Forum Talk: How Yandex Perceives Words

How Yandex processes and perceives words with similar meanings is a hotly contested topic, with much debate about exactly how Yandex’s algorithms work (much as there is with Google’s).

This topic has arisen within the Yandex Webmaster Forums this month, with RuNet optimizers ready to contribute to the discussion.

The question originally raised by Annetkas007 was:

Hello everyone!

A question has come up: how does Yandex perceive words such as “a car” and “a car”, “children” and “a child”, and the like?

In Direct, as far as I know, it treats these words as the same; that is, if you target an ad at one variant of “car”, the ad will also be displayed for the other.

What about search? If you optimize a page for one variant, will it also appear for queries that use the other?

This was met with responses from the user Victor Petrov:

Yandex claims to be trying to understand the meaning of the query and its intent. If you take a unigram, the process becomes more complicated: you need some “tails” so that it understands what the query is about. It understands synonyms, but it tries to work out what the people who arrive with such a search most often want to see. Otherwise, it returns a spectrum of results to cover all cases.
So, for example, for the query “brandy”, it will return “cognac” and highlight it in the search results, but never the other way around.

And Mahmoud Abbas:

I have a feeling that after introducing its clumsy analogue of Google’s EAT / YMYL, Yandex began to relate some synonyms directly to each other. In the financial sector, because of this, banks shot to the top of results where they had not previously been seen (at least not in such numbers). And yes, the top-ranking pages of these banks do not contain the specific words from the query, but they do contain synonyms.

These are, for example, queries with the words “loans / credits / money”.

This feeds into the discussion of two things:

  1. Whether Yandex can understand the meaning and context around word usage; and
  2. Whether Yandex can then understand the real-world implications of what users are looking for (the intent) with these queries, and whether it places more weight on more trusted results (in a YMYL/EAT fashion).

Regarding the second point, we know this is what the Proxima metric attempts to do. To achieve the goal of a YMYL/EAT classification system, the machine learning capabilities (Vega et al.) must therefore be able to complete point 1.
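
To make the one-directional behaviour described in the forum reply more concrete (a query for “brandy” surfacing “cognac”, but not the reverse), here is a toy sketch in Python. It is purely illustrative and is not how Yandex implements synonym handling: the `DIRECTED_SYNONYMS` map, the `expand_query` and `matches` functions, and the sample page texts are all hypothetical names and data made up for this example.

```python
from __future__ import annotations

# Toy illustration only: hypothetical data, not Yandex's actual synonym handling.
# Directed synonym map: a query term expands to the listed terms,
# but the reverse direction is deliberately not implied.
DIRECTED_SYNONYMS: dict[str, set[str]] = {
    "brandy": {"cognac"},
}


def expand_query(query: str) -> set[str]:
    """Return the query terms plus any one-directional synonyms."""
    terms = set(query.lower().split())
    expanded = set(terms)
    for term in terms:
        expanded |= DIRECTED_SYNONYMS.get(term, set())
    return expanded


def matches(query: str, page_text: str) -> bool:
    """A page matches if it contains any original or expanded query term."""
    page_terms = set(page_text.lower().split())
    return bool(expand_query(query) & page_terms)


if __name__ == "__main__":
    # "brandy" expands to include "cognac", so a cognac page matches.
    print(matches("brandy", "aged cognac from france"))
    # "cognac" has no expansion, so a brandy-only page does not match.
    print(matches("cognac", "cheap brandy online"))
```

The point of the directed map is simply that synonym relations in search need not be symmetric: a broad term can pull in a narrower one without the narrower term pulling in the broad one.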