Latent semantic indexing is a complex topic, and following the details usually takes a solid background in math.
There are a few methods that can be used to index and retrieve the pages relevant to a user’s query.
The obvious method is to match the words in a search query against the same words in the text of the web pages that are available.
The problem with simple word matching is that it is extremely inaccurate. One reason is that there are many ways for a user to express the concept they are looking for.
This is known as synonymy. The other reason is that many words have multiple meanings. This is known as polysemy.
With synonymy, the user’s query may not actually match the text on the relevant pages, so those pages will be overlooked. With polysemy, the terms in a user’s query will often match terms in irrelevant pages.
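Both failures are easy to reproduce with a small sketch of simple word matching. The page names, page texts, and the helper functions tokenize and word_match below are made up for illustration and do not come from any real search engine.

```python
import re

def tokenize(text):
    """Lowercase the text and return the set of words it contains."""
    return set(re.findall(r"[a-z]+", text.lower()))

def word_match(query, pages):
    """Return the names of pages that share at least one word with the query."""
    query_words = tokenize(query)
    return [name for name, text in pages.items() if query_words & tokenize(text)]

# Hypothetical toy pages, invented for this example.
pages = {
    "auto_repair": "How to fix an automobile engine at home",
    "car_care": "Tips for washing your car",
    "river_walk": "A walk along the river bank at sunset",
    "savings": "Open a savings account at your local bank",
}

# Synonymy: the page about automobiles is relevant but never says "car".
synonymy_hits = word_match("car", pages)
print(synonymy_hits)   # ['car_care']

# Polysemy: "bank" matches the river page as well as the financial one.
polysemy_hits = word_match("bank", pages)
print(polysemy_hits)   # ['river_walk', 'savings']
```

The first query overlooks a relevant page, and the second returns a page that a user looking for financial services would consider irrelevant.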
Latent semantic indexing, or LSI, is an attempt to overcome this problem by looking at the patterns of words distributed across the entire web.
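As a rough sketch of the idea, LSI is usually described as taking a truncated singular value decomposition (SVD) of a term-by-document matrix and comparing queries and pages in the resulting reduced space. The example below assumes NumPy is installed. The four tiny documents, the choice of two dimensions, and the helper names term_vector, to_concept_space, and cosine are all assumptions made for illustration. Real systems use far larger collections and usually weight the word counts first.

```python
import numpy as np

# Hypothetical toy corpus, invented for this example.
docs = [
    "car engine repair",
    "automobile engine repair",
    "car automobile dealer",
    "banana fruit smoothie",
]

vocab = sorted({word for doc in docs for word in doc.split()})
index = {word: i for i, word in enumerate(vocab)}

def term_vector(text):
    """Count how often each vocabulary word occurs in the text."""
    vec = np.zeros(len(vocab))
    for word in text.split():
        if word in index:
            vec[index[word]] += 1.0
    return vec

# Term-by-document matrix: one row per word, one column per document.
A = np.column_stack([term_vector(doc) for doc in docs])

# Truncated SVD: keep only the k strongest patterns of word usage.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # illustrative choice, not a recommended setting
U_k, s_k = U[:, :k], s[:k]

def to_concept_space(term_vec):
    """Project a term-count vector into the k-dimensional reduced space."""
    return (U_k.T @ term_vec) / s_k

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vec = to_concept_space(term_vector("car"))
scores = [cosine(query_vec, to_concept_space(A[:, j])) for j in range(len(docs))]

for doc, score in zip(docs, scores):
    print(f"{score:6.3f}  {doc}")
```

In this toy collection the second document never contains the word "car", yet it should score close to the documents that do, because "car" and "automobile" appear alongside the same neighboring words. The unrelated fruit document should score near zero.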
Pages are considered that have many words in common and are thought...