To understand how Latent Semantic Indexing works, it helps to know some basic high school math, particularly Cartesian coordinates.
Typically, a term-document matrix is created when pages are processed, before any search query is sent. When a query arrives, it is compared against these previously processed pages, and the pages whose content is closest in semantic meaning are returned as results.
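As a rough illustration of what a term-document matrix looks like, the sketch below counts how often each word occurs on each page. It assumes every page has already been reduced to a list of words; the function name term_document_matrix and the sample pages are made up for this example and do not come from any particular search engine.

```python
def term_document_matrix(pages):
    """Build a term-document count matrix.

    pages: dict mapping a page name to its list of words (assumed input shape).
    Returns (terms, names, matrix), where matrix[i][j] is the number of
    times terms[i] occurs on the page names[j].
    """
    names = list(pages)
    # Every distinct word across all pages becomes one row of the matrix.
    terms = sorted({word for words in pages.values() for word in words})
    matrix = [[pages[name].count(term) for name in names] for term in terms]
    return terms, names, matrix


# Hypothetical sample data: two tiny "pages".
sample_pages = {
    "page_a": ["cat", "dog", "cat"],
    "page_b": ["dog", "bird"],
}
terms, names, matrix = term_document_matrix(sample_pages)
```

Each row corresponds to one term and each column to one page, so reading down a column gives the word counts for a single page.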
All formatting is removed from the pages, including capitalization, punctuation and extraneous markup.
Next, conjunctions, common verbs, pronouns and prepositions are removed. Lastly, common endings are stripped, leaving only the stem words.
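The cleanup steps above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the stop-word list is a tiny made-up sample, and strip_suffix is a deliberately naive stand-in for a real stemming algorithm such as Porter's.

```python
import re

# Illustrative stop words only (an assumption); real lists are far longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to",
              "is", "are", "it", "he", "she", "they", "or", "but"}

# Hypothetical, naive suffix list; a real system would use a proper stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def strip_suffix(word):
    """Remove the first matching common ending, keeping at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Reduce raw page text to a list of stem words."""
    text = re.sub(r"<[^>]+>", " ", text)   # drop markup tags
    text = text.lower()                    # remove capitalization
    words = re.findall(r"[a-z]+", text)    # keep letters only, dropping punctuation
    return [strip_suffix(w) for w in words if w not in STOP_WORDS]


stems = preprocess("<p>The cats are Jumping, and the dog barked!</p>")
```

For the sample sentence, the markup, capital letters and punctuation disappear, "the", "are" and "and" are dropped as stop words, and the remaining words are cut back to "cat", "jump", "dog" and "bark".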
To plot the position of a web page, think of the page as a point in a three-dimensional space.
Instead of the usual three coordinate axes, each axis stands for one of three words. The space defined by these three words is known as a term space.
Each page forms a vector in this space, and the vector's direction and magnitude are determined by how many times each of the three words appears on the page.
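To make the geometry concrete, here is a small sketch that places a page in a three-word term space and measures its vector. The three words in TERMS are arbitrary placeholders, and comparing directions with the cosine of the angle between vectors is a common convention assumed here, not something the text above prescribes.

```python
import math

# Hypothetical three-word vocabulary: one word per axis.
TERMS = ("cat", "dog", "bird")


def page_vector(stems, terms=TERMS):
    """Coordinates of a page: how often each axis word occurs in its stems."""
    return tuple(stems.count(term) for term in terms)


def magnitude(vector):
    """Length of the vector; it grows as the words appear more often."""
    return math.sqrt(sum(x * x for x in vector))


def cosine(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    denominator = magnitude(u) * magnitude(v)
    if denominator == 0:
        return 0.0  # a page with none of the words has no direction
    return sum(a * b for a, b in zip(u, v)) / denominator


vector = page_vector(["cat", "cat", "dog"])
```

A page whose stems are "cat", "cat", "dog" lands at (2, 1, 0): two steps along the "cat" axis, one along the "dog" axis and none along the "bird" axis. Two pages that use the words in the same proportions point in the same direction even if one page is much longer than the other.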
With three words, it is easy to imagine what the resulting form may look like, and the resulting query would turn up a...