Latent semantic indexing, by definition, is a mathematical or statistical technique for extracting and representing the similarity of meaning of words and passages by analysis of large bodies of text.
The definition may be a little difficult to understand, but basically latent semantic indexing takes the keywords you put into your search engine and goes through each and every web page, searching out the best results for the keywords you are seeking.
There are many possible mappings from a high-dimensional space to a low-dimensional space that latent semantic indexing could use.
LSI chooses the mapping that is optimal in the sense that it minimizes the distance between the original data and its lower-dimensional approximation. Choosing the number of dimensions is a problem in its own right. A reduction can remove much of the noise, but keeping too few dimensions may lose important information.
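
The idea of a mapping that minimizes the distance can be sketched in a few lines. This is only an illustration under stated assumptions: it assumes the mapping is computed with a truncated singular value decomposition (the usual way LSI is implemented) and that "distance" means the Frobenius norm of the difference between the original matrix and its approximation. The term-document counts and the function names rank_k_approximation and approximation_errors are made up for this example.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents).
# The counts are invented purely for illustration.
A = np.array([
    [2, 0, 1, 0, 0],   # car
    [0, 2, 1, 0, 0],   # automobile
    [1, 1, 2, 0, 0],   # engine
    [0, 0, 0, 2, 1],   # flower
    [0, 0, 0, 1, 2],   # petal
], dtype=float)


def rank_k_approximation(matrix, k):
    """Return the rank-k approximation of `matrix` built from its top k singular values."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


def approximation_errors(matrix):
    """Frobenius distance between `matrix` and its rank-k approximation, for k = 1..min(shape)."""
    max_k = min(matrix.shape)
    return [
        float(np.linalg.norm(matrix - rank_k_approximation(matrix, k)))
        for k in range(1, max_k + 1)
    ]


errors = approximation_errors(A)
for k, err in enumerate(errors, start=1):
    print(f"k={k}: distance to original = {err:.3f}")
```

The distance shrinks as more dimensions are kept and reaches zero (up to rounding) when all of them are kept. For any given number of dimensions, no other matrix of that rank is closer to the original in this norm, which is the sense in which the mapping is optimal.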
LSI performance improves considerably after ten to twenty dimensions, peaks at seventy to one hundred dimensions, and then slowly begins to diminish again. This pattern of performance is observed with other datasets as well.
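
To see why the number of dimensions matters at all, here is a small sketch on a toy matrix. It is not a reproduction of the performance figures above: the counts are invented, the helper names document_vectors, cosine, and similarity are hypothetical, and it assumes documents are compared by cosine similarity after a truncated singular value decomposition. Documents 0 and 1 use different words ("car" and "automobile") but share the word "engine".

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents); counts are invented.
A = np.array([
    [2, 0, 1, 0, 0],   # car
    [0, 2, 1, 0, 0],   # automobile
    [1, 1, 2, 0, 0],   # engine
    [0, 0, 0, 2, 1],   # flower
    [0, 0, 0, 1, 2],   # petal
], dtype=float)


def document_vectors(matrix, k):
    """Map each document (column) to a k-dimensional vector via truncated SVD."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k, :]).T   # one row per document


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is (near) zero."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 1e-12 else 0.0


def similarity(matrix, k, i, j):
    """Cosine similarity between documents i and j in the k-dimensional space."""
    docs = document_vectors(matrix, k)
    return cosine(docs[i], docs[j])


for k in (2, 3, 5):
    print(f"k={k}: similarity(doc 0, doc 1) = {similarity(A, k, 0, 1):.3f}")
```

In this toy case, two dimensions are enough to place the "car" and "automobile" documents almost on top of each other, while keeping all five dimensions gives back the plain word-overlap similarity of the original columns. Real collections are far larger and noisier, so the best number of dimensions has to be found by experiment.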
Latent semantic indexing considers pages that have many words in common and are close in meaning, sorts them out, and...