Yandex.Webmaster Will Help You Find Duplicate Pages With Insignificant GET Parameters

Now it has become easier to find identical pages on the site.

In the “Site Diagnostics” section of Yandex Webmaster Tools, a special notification has appeared that will tell you about duplicates due to GET parameters.

You do not need to subscribe to anything: the notification appears automatically. If duplicates are found:

Go to Webmaster, open the "Pages in search" section, and select "Excluded pages" in the table.

Download the archive (you can select a suitable format at the bottom of the page) and view the downloaded file: the duplicate pages will have the DUPLICATE status.

This is useful because:

  • When there are many duplicates on a site, robots spend more time and resources crawling them instead of crawling valuable content. This means that valuable pages on the site will be slower to get into the search.
  • Since the search robot randomly chooses which of the duplicates to show in the search, the wrong pages may end up in the search results.
  • If insignificant parameters are not listed in the Clean-param directive, the robot can crawl these pages and treat them as different pages without combining them in the search. The search robot will then receive separate, non-aggregated signals for each of them. If all the signals went to a single page, that page would have a chance to be shown higher in the search.
  • Excessive crawling by the robot also puts unnecessary load on the site.
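
To make the idea of "duplicates due to GET parameters" concrete, here is a minimal Python sketch that groups URLs which differ only in insignificant parameters. It only illustrates the concept and is not how the Yandex robot works internally. The parameter names (`ref`, `sid`, `utm_source`, `utm_medium`) and the example.com URLs are assumptions chosen for the example.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of GET parameters that do not change the page content.
INSIGNIFICANT_PARAMS = {"ref", "sid", "utm_source", "utm_medium"}


def strip_insignificant(url, insignificant=INSIGNIFICANT_PARAMS):
    """Return the URL with insignificant GET parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in insignificant
    ]
    # Rebuild the URL from the remaining parameters, dropping any fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_duplicates(urls):
    """Map each cleaned URL to the original URLs that collapse into it.

    Only groups with more than one original URL are returned.
    """
    groups = defaultdict(list)
    for url in urls:
        groups[strip_insignificant(url)].append(url)
    return {clean: found for clean, found in groups.items() if len(found) > 1}


if __name__ == "__main__":
    sample = [
        "https://example.com/book?id=123&ref=site_1",
        "https://example.com/book?ref=site_2&id=123",
        "https://example.com/book?id=124",
    ]
    for clean, found in group_duplicates(sample).items():
        print(clean, "<-", found)
```

In this sample, the first two URLs collapse into the same address, so a crawler that does not know `ref` is insignificant would fetch the same content twice.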

How To Influence This

  1. Add the Clean-param directive to the robots.txt file so that the robot ignores insignificant GET parameters in the URL. With this directive in place, the Yandex robot will not crawl the same content repeatedly. As a result, crawl efficiency increases and the load on the site decreases.
  2. If it is not possible to add the Clean-param directive, specify the canonical address of the page that should participate in the search. This will not reduce the load on the site: the Yandex robot will still have to crawl the page to learn about rel="canonical". Therefore, it is recommended to use Clean-param as the main method.
  3. If for some reason neither of the previous options is suitable, you can simply block duplicates from indexing with the Disallow directive. In this case, however, Yandex search will not receive any signals from the prohibited pages. Therefore, it is still better to use Clean-param as the main method.
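
The first two options can be sketched with a small Python helper that prints a Clean-param line for robots.txt and a canonical link tag for the page's head section. The Clean-param line follows the documented form `Clean-param: p0[&p1&...] [path]`. The helper names, the parameter names `ref` and `sid`, the `/catalog/` path, and the example.com URL are hypothetical and serve only as an illustration.

```python
from html import escape


def clean_param_line(params, path_prefix=""):
    """Build a robots.txt Clean-param line.

    params: insignificant GET parameter names, joined with "&".
    path_prefix: optional path prefix the rule applies to.
    """
    joined = "&".join(params)
    return f"Clean-param: {joined} {path_prefix}".rstrip()


def canonical_tag(canonical_url):
    """Build a <link rel="canonical"> tag for a page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


if __name__ == "__main__":
    # Option 1: lines to place in robots.txt (hypothetical parameters and path).
    print("User-agent: Yandex")
    print(clean_param_line(["ref", "sid"], "/catalog/"))

    # Option 2: tag to place on each duplicate page (hypothetical URL).
    print(canonical_tag("https://example.com/catalog/book?id=123"))
```

With these sample values, the robots.txt rule reads `Clean-param: ref&sid /catalog/`, which asks the robot to treat URLs under `/catalog/` that differ only in `ref` or `sid` as the same page.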


Source: Yandex Webmasters Blog