Google goes LSI?
In August 2003, I wrote about Google Synonyms. This was just after Google acquired Applied Semantics and, with it, the company’s ontology, ASO (Applied Semantics Ontology).
Back then I suggested using the ~ (tilde) operator when doing keyword research.
The issue where I originally talked about Google Synonyms is:
Keyword Hawk Issue #20
Here’s why that was useful: to come up with a synonym for a keyword, Google would first have to map the keyword to one or more higher-level categories or concepts.
With these concepts in place, Google could then retrieve additional keywords from the same category. Words or terms from the same category that share the same meaning (semantics) would be considered synonyms.
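To make the category-first idea concrete, here is a toy sketch in Python. The ontology below is made up for illustration (it is not ASO, and I don’t know how Google actually stores its concepts): each keyword maps to one or more concepts, and a “synonym” is any other keyword that shares a concept with it.

```python
# Toy, hypothetical ontology: keyword -> set of concepts it belongs to.
# Invented for illustration; NOT Applied Semantics' real data.
KEYWORD_TO_CONCEPTS = {
    "car": {"automobile"},
    "auto": {"automobile"},
    "vehicle": {"automobile", "transport"},
    "bus": {"transport"},
    "jaguar": {"automobile", "animal"},
    "leopard": {"animal"},
}

def concept_index(mapping):
    """Invert keyword -> concepts into concept -> keywords."""
    index = {}
    for keyword, concepts in mapping.items():
        for concept in concepts:
            index.setdefault(concept, set()).add(keyword)
    return index

def synonyms(keyword, mapping=KEYWORD_TO_CONCEPTS):
    """Return the other keywords that share at least one concept with `keyword`."""
    index = concept_index(mapping)
    related = set()
    for concept in mapping.get(keyword, set()):
        related |= index[concept]
    related.discard(keyword)
    return sorted(related)

print(synonyms("car"))     # ['auto', 'jaguar', 'vehicle']
print(synonyms("jaguar"))  # ['auto', 'car', 'leopard', 'vehicle']
```

Notice that “jaguar” pulls in both car words and “leopard”: an ambiguous keyword sits in several concepts, which is exactly why the keyword has to be mapped to a concept before synonyms make sense.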
Back then, synonyms weren’t part of the general Google search results, but optimal AdWords delivery through AdSense partners relied on Google’s ability to match any given page carrying AdSense code with the most appropriate AdWords ad.
So my reasoning was that if you included different synonyms on a page carrying your AdSense code, Google could more easily determine the theme of the page and deliver better-targeted AdWords ads.
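Here’s a rough illustration of why extra synonyms could help with theme detection. This is a hypothetical sketch, not how AdSense works internally: the concept vocabularies and the simple “count the matching words” rule are my own assumptions.

```python
from collections import Counter

# Hypothetical concept vocabularies; a real ontology would be vastly larger.
CONCEPTS = {
    "cars": {"jaguar", "car", "auto", "engine", "sedan", "dealer"},
    "wildlife": {"jaguar", "leopard", "rainforest", "predator", "cat"},
}

def theme_scores(text):
    """For each concept, count the page's words that belong to its vocabulary."""
    words = text.lower().split()
    return Counter({concept: sum(w in vocab for w in words)
                    for concept, vocab in CONCEPTS.items()})

def theme(text):
    """Return the best-scoring concept, or None if the page is off-topic or ambiguous."""
    ranked = theme_scores(text).most_common()
    if not ranked or ranked[0][1] == 0:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: the words alone don't reveal the theme
    return ranked[0][0]

print(theme("new jaguar for sale"))                      # None (ambiguous)
print(theme("new jaguar sedan for sale at our dealer"))  # cars
```

A single ambiguous keyword leaves the theme undecided; a couple of related words (“sedan”, “dealer”) settle it.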
Time has passed, and I still recommend using synonyms when writing new content, but I’ve just been made aware of an additional reason to do so.
The February 15 issue of Axandra’s newsletter talks about LSI (Latent Semantic Indexing), which is exactly what I talked about in 2003.
The news about Google and LSI comes from:
February 15, Axandra newsletter
The difference between now and then is that Google now seems to have begun integrating LSI into its ranking algorithm, and Google Synonyms seems to be the way we can “view” the LSI effect from the “outside”.
Here’s my personal understanding of what LSI is. I’m not a scientist or mathematician, so please keep that in mind as you read on.
When Google processes a page, it maps the page into an n-dimensional term space based on its meaning-bearing words; in other words, it produces a term space vector for the page.
Term space vectors that are close to each other in this n-dimensional space are considered similar.
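A minimal Python sketch of the term space idea, assuming a plain bag-of-words model: each distinct word is one dimension, its value is how often the word appears, and “closeness” is measured with cosine similarity (a common choice in information retrieval; whether Google uses it is an assumption). The tiny stop-word list is a hypothetical stand-in for filtering out words that don’t carry meaning.

```python
import math
import re
from collections import Counter

# Tiny, hypothetical stop-word list standing in for "non-meaning-bearing" words.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "on", "for", "to", "with", "is"}

def term_vector(text):
    """Map page text to a sparse term-space vector: {word: count}, stop words removed."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

def cosine(u, v):
    """Cosine of the angle between two sparse vectors: 1.0 = same direction, 0.0 = nothing shared."""
    dot = sum(count * v[term] for term, count in u.items())  # Counter returns 0 for missing terms
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

page_a = term_vector("The parrot sits in a cage and the bird sings")
page_b = term_vector("A bird cage for a parrot")
page_c = term_vector("Molecules and a wrench")
print(round(cosine(page_a, page_b), 3))  # 0.775
print(cosine(page_a, page_c))            # 0.0
```

Cosine similarity ignores page length, so a long page and a short page about the same thing can still count as “close”.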
On a large scale, synonyms and related keywords tend to appear on the same pages.
That is, you’ll find the keywords “bird”, “parrot” and “cage” on the same page more often than you’ll find “bird”, “molecules” and “wrench” together.
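Co-occurrence is easy to count. This sketch, over a made-up mini-collection of pages, counts how many pages contain each pair of words.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(pages):
    """Count, for every unordered pair of words, how many pages contain both."""
    counts = Counter()
    for page in pages:
        words = sorted(set(page.lower().split()))  # each word once per page, alphabetical
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts

def pair_count(counts, a, b):
    """Look up a pair regardless of argument order (keys are stored alphabetically)."""
    return counts[tuple(sorted((a, b)))]

# A made-up mini-collection of five "pages".
pages = [
    "bird parrot cage seed",
    "parrot bird feathers",
    "bird cage perch",
    "molecules atoms chemistry",
    "wrench bolt tools",
]
counts = cooccurrence(pages)
print(pair_count(counts, "bird", "parrot"))     # 2
print(pair_count(counts, "cage", "bird"))       # 2
print(pair_count(counts, "bird", "molecules"))  # 0
```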
These n-dimensional term spaces are complex to handle because of the diversity of pages: every distinct term adds another dimension, so the space is huge and most of any single page’s vector is empty.
What LSI does is compress, cluster, categorise and collapse these term spaces into term spaces with far fewer dimensions.
Pages that are “close” to each other in this reduced term space will then “belong” to the same cluster, and it’s the combined keywords of these pages that are considered synonymous or related.
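Textbook LSI does the “collapsing” with a truncated singular value decomposition (SVD) of the term-by-page matrix. The sketch below is a dependency-free illustration of that textbook technique, not Google’s implementation (which isn’t public): it finds the top k components with power iteration and deflation instead of calling a library SVD, and the four “pages” are made up. Note that “parrot” and “budgie” never appear on the same page, yet they end up pointing in the same direction in the reduced space because both co-occur with “bird” and “cage”.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def normalize(x):
    n = math.sqrt(sum(v * v for v in x))
    return [v / n for v in x] if n else x

def top_eigenpairs(m, k, iters=500):
    """Largest k eigenpairs of a symmetric positive semi-definite matrix,
    found by power iteration with deflation."""
    m = [row[:] for row in m]  # work on a copy: deflation modifies it
    size = len(m)
    pairs = []
    for _ in range(k):
        x = normalize([float(i + 1) for i in range(size)])  # fixed, non-uniform start
        for _ in range(iters):
            x = normalize(matvec(m, x))
        lam = sum(a * b for a, b in zip(x, matvec(m, x)))  # Rayleigh quotient
        pairs.append((lam, x))
        # Deflate: subtract this component so the next pass finds the next one.
        m = [[m[i][j] - lam * x[i] * x[j] for j in range(size)] for i in range(size)]
    return pairs

def lsi(a, k):
    """Rank-k LSI of a term-by-page matrix `a` (rows = terms, columns = pages).

    Uses the eigenvectors v_i of A^T A (the right singular vectors of A):
    page j -> sigma_i * v_i[j], term t -> (A v_i)[t] = sigma_i * u_i[t].
    """
    at = transpose(a)
    ata = [[sum(x * y for x, y in zip(r1, r2)) for r2 in at] for r1 in at]  # pages x pages
    pairs = top_eigenpairs(ata, k)
    page_coords = [[math.sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs] for j in range(len(at))]
    term_coords = [[sum(x * y for x, y in zip(row, v)) for _, v in pairs] for row in a]
    return term_coords, page_coords

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

terms = ["bird", "parrot", "cage", "budgie", "wrench", "bolt", "nut"]
pages = ["bird parrot cage", "bird budgie cage", "wrench bolt", "wrench bolt nut"]
A = [[float(p.split().count(t)) for p in pages] for t in terms]

term_xy, page_xy = lsi(A, k=2)
t = dict(zip(terms, term_xy))
print(cosine(A[1], A[3]))                           # parrot vs budgie, raw term space: 0.0
print(round(cosine(t["parrot"], t["budgie"]), 3))   # same pair in 2-D LSI space: 1.0
print(round(cosine(page_xy[0], page_xy[1]), 3))     # the two bird pages: 1.0
print(abs(cosine(page_xy[0], page_xy[2])) < 1e-6)   # bird page vs tools page: True
```

In practice you’d use a library routine (for example NumPy’s `numpy.linalg.svd`) on a much larger, weighted matrix; power iteration just keeps the example self-contained.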
So what implications could this have for the way you create or optimize pages?
Well, there are two different ways to look at page creation.
One is from the point of view of AdSense ads.
Google obviously aims to serve the best-targeted AdWords ads, because Google profits from the clickthroughs.
My guess - and it is a guess - is that Google maps the AdWords ad stats to the term space vectors.
AdSense (for Content - not Search) clearly can’t rely on single keywords, but a well-performing ad on page A will probably also perform well on pages that are close to page A in term space.
If these guesses or assumptions about how Google handles AdWords ad performance and the term space are correct, then the closer you stick to a theme and use similar and associated words, the better Google will be at serving high-performing ads.
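To make my guess concrete: if ad performance really does carry over between pages that are close in term space, a simple way to estimate it would be k-nearest-neighbour regression, i.e. a similarity-weighted average over the closest pages where the ad has already run. Everything here is hypothetical: the function name `predict_ctr`, the k=2 neighbourhood and the click-through rates are made up, and I have no knowledge of how AdSense actually predicts performance.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_ctr(new_page, observed, k=2):
    """Hypothetical estimate of an ad's click-through rate (CTR) on `new_page`:
    a similarity-weighted average of the CTRs seen on the k closest pages.

    `observed` is a list of (page_vector, ctr) pairs. Returns None if no
    observed page shares any terms with `new_page`.
    """
    nearest = sorted(((cosine(new_page, vec), ctr) for vec, ctr in observed),
                     reverse=True)[:k]
    total = sum(sim for sim, _ in nearest)
    if total == 0:
        return None
    return sum(sim * ctr for sim, ctr in nearest) / total

# Made-up observations for one ad: (term vector of a page it ran on, its CTR there).
observed = [
    ({"parrot": 2, "cage": 1, "bird": 1}, 0.040),
    ({"bird": 1, "seed": 1, "cage": 1}, 0.030),
    ({"wrench": 1, "bolt": 2}, 0.002),
]
new_page = {"bird": 1, "cage": 1, "perch": 1}
print(round(predict_ctr(new_page, observed), 4))  # 0.0341
print(predict_ctr({"molecules": 1}, observed))    # None
```

The tools page contributes nothing to the estimate because it shares no terms with the new page; the more on-theme your page is, the more clearly it lands near pages where the ad is known to perform.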
But how about organic searches?
How can sticking with a subject matter and using synonymous words improve your rankings?
Well, it might not improve your rankings for specific keywords, but you’ll be able to appear on additional result pages, even though the queried keyword doesn’t appear on your page.
For this to have a significant impact, Google must shift its focus from keyword matching toward concept matching, and your page should be a more authoritative page on the concept or theme than the other pages found for the exact keyword.
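Here’s a hedged sketch of the concept-matching idea, using simple query expansion as a stand-in for whatever Google may actually do. The related-terms table and the 0.5 weight for related-term hits are illustrative assumptions; in practice such a table could come from co-occurrence or LSI analysis.

```python
# Hypothetical related-terms table, e.g. the output of co-occurrence or LSI analysis.
RELATED = {
    "budgie": {"parrot", "bird", "cage"},
}

def score(query, page_text, related=RELATED, related_weight=0.5):
    """Score a page for a query: exact keyword hits count 1.0 each; hits on
    related (same-concept) terms count `related_weight` each."""
    words = set(page_text.lower().split())
    total = 0.0
    for term in query.lower().split():
        if term in words:
            total += 1.0
        total += related_weight * len(related.get(term, set()) & words)
    return total

pages = {
    "parrot-care": "parrot cage bird seed perch",
    "budgie-blog": "my budgie",
    "tool-shop": "wrench bolt nut",
}
ranking = sorted(pages, key=lambda name: score("budgie", pages[name]), reverse=True)
print(ranking)  # ['parrot-care', 'budgie-blog', 'tool-shop']
```

The “parrot-care” page never mentions “budgie”, yet it outranks the page that does, because it covers the concept more thoroughly. That’s the scenario where a focused, theme-rich page could win.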
I think that LSI is still only a very small part of Google’s ranking algorithm for organic searches, but I also think it’s fair to assume that Google will put more emphasis on it in the future.
February 20th, 2005