Conversation

I don't think DuckDuckGo is going to start paying sites they include in their search results. I would expect them to take a stand against being forced to do that. It's hard to understand news sites complaining about links to their articles built from their own titles and descriptions.
You raise a good point. I think the news sites are angry at their lack of market power, because Google has 90+% market share and makes over 3 billion out of the AU market. But you are correct that DuckDuckGo is not going to pay; I don't have any useful fixes for this.
They're blaming search engines for taking away money from them, but the search engines are directing users to their articles. If they didn't want to have snippets included in the results, they would disable it. twitter.com/DanielMicay/st They want to have snippets included though.
Quote Tweet
Replying to @tiraniddo and @semibogan
Sites have control over how their content is used in search results: developers.google.com/search/referen If they don't want snippets of their articles used in search results, they can disable it. They can also set a maximum length on how much is used. Can set it for a specific crawler too.
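For example, roughly the kind of markup this refers to (the lengths here are just placeholder values):
    <meta name="robots" content="nosnippet">              <!-- no snippet at all -->
    <meta name="robots" content="max-snippet:160">        <!-- cap snippets at 160 characters -->
    <meta name="googlebot" content="max-snippet:50">      <!-- apply only to Google's crawler -->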
You can also mark a section of the content with data-nosnippet to exclude only a portion of the content from being included in snippets. In general, it's usually in the best interest of the site to let search engines index all of it and include all of it as snippets in results.
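Roughly, that looks like this (Google documents data-nosnippet on span, div and section elements; the text is a placeholder):
    <section>
      <p>This part can appear in snippets.</p>
      <span data-nosnippet>This part is excluded from snippets.</span>
    </section>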
Letting search engines index the content helps to drive traffic to the site. The opt-out exists so that if a site really doesn't want their content used this way, they can disable it. This is the best way to control what search engines index and include in their results though; robots.txt is to control crawling.
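A minimal robots.txt sketch of that crawl control (the path is a placeholder):
    User-agent: *
    Disallow: /search/
    Allow: /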
If you disallow access to a section of the site in robots.txt, that stops bots that respect robots.txt from crawling it. They can and will still index it based on links to it, without being able to crawl it. It's better to let them crawl and set noindex, etc. via header or meta tag.
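E.g. either of these (one in the page's HTML, one as an HTTP response header):
    <meta name="robots" content="noindex">
    X-Robots-Tag: noindex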
They can still find and use links to other content on a page that's marked noindex. It stops them indexing the page and including it in results. robots.txt is really a legacy thing as properly made modern sites shouldn't have a reason to disallow bots from crawling parts of them.
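So a page can be kept out of results while its links are still used, or both can be blocked; a sketch using the standard robots directives:
    <meta name="robots" content="noindex">             <!-- not indexed; links may still be followed -->
    <meta name="robots" content="noindex, nofollow">   <!-- not indexed; links not followed either -->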
Agree about it being legacy. I have seen some AU govt agencies rely too much on it and basically disallow certain files and/or directories, which is not smart. This was a good read; a decent Dunning-Kruger reminder that my opinions do not equal actual subject knowledge.
It was originally based on sites being overloaded by queries from bots. No modern site should have a problem with that load, and if it does, it's going to have serious problems with DoS attacks. Sites using it that way are basically documenting how to overload their servers instead of fixing it.
The robots meta tag / header is a lot more useful, and rel="canonical", redirects, proper access control and sitemaps (to prioritize crawling) cover most of the other reasons people use it. I'm sure most of these news sites are all over this to try to rank better in searches.
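Sketches of two of those mechanisms (the URLs are placeholders):
    <link rel="canonical" href="https://example.com/news/article">   <!-- in the page's head -->
    Sitemap: https://example.com/sitemap.xml                          <!-- line in robots.txt -->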
If you look at their pages, they generally add OpenGraph metadata, Twitter metadata, structured data (json-ld, etc.) and various other things specifically to show richer snippets, etc. on social media. If they could do it, I'm sure they'd pay Google to rank higher in searches...
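A typical sketch of that metadata (all values are placeholders):
    <meta property="og:title" content="Example headline">
    <meta property="og:description" content="Example description">
    <meta name="twitter:card" content="summary_large_image">
    <script type="application/ld+json">
      {"@context": "https://schema.org", "@type": "NewsArticle", "headline": "Example headline"}
    </script>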