More than 250 websites now block Google-Extended, the “standalone product token” Google introduced Sept. 28 to let you block Bard, Vertex AI generative APIs and future generations of models from accessing your content.
That’s according to research shared exclusively with Search Engine Land by the Detailed.com team.
Why we care. There has been much debate and discussion about whether brands and businesses should block bots (e.g., GPTBot, CCBot) that crawl content later used to train LLMs. Only a minority of sites have decided to block so far, but the numbers have continued to climb in recent months, as publishers don’t want their content to help AI companies profit and compete against them.
A big increase. As of Nov. 19, 252 websites out of a set of 3,000 popular websites had blocked Google-Extended. Just over a month earlier (Oct. 8), only 89 of those sites had blocked Google-Extended.
- That means the number of sites blocking Google-Extended jumped more than 180% in about six weeks.
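
Checking the math. The figures above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the counts (89, 252, 3,000) come from the Detailed.com data cited here, while the function and variable names are made up for illustration.

```python
def percent_increase(before: int, after: int) -> float:
    """Relative growth from `before` to `after`, expressed as a percentage."""
    return (after - before) / before * 100


# Counts reported in the article; the variable names are hypothetical.
blocked_oct_8 = 89
blocked_nov_19 = 252
sample_size = 3000

growth = percent_increase(blocked_oct_8, blocked_nov_19)
share = blocked_nov_19 / sample_size * 100

print(f"{growth:.1f}% increase; {share:.1f}% of sampled sites now block")
# prints "183.1% increase; 8.4% of sampled sites now block"
```

So even after the jump, fewer than one in ten of the sampled sites block Google-Extended.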
Websites blocking Google-Extended crawling. They include:
- Ziff Davis properties (e.g., PC Mag, Mashable).
- Vox properties (e.g., The Verge and NYMag).
- The New York Times.
- Condé Nast (22 sites, including GQ, Vogue, Wired).
- Yelp (frequent Google critic and legal opponent).
Reminder. While you can block Google-Extended in robots.txt, that does not stop your content from appearing in Google’s Search Generative Experience, nor does it prevent Google from using your content to train SGE. To opt out fully, you’d have to block Googlebot (which would also take you out of Search). However, you can opt out of SGE overviews using nosnippet.
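
In practice. A minimal sketch of what such a rule looks like, and of how a standards-following parser reads it, using Python’s built-in `urllib.robotparser`. The robots.txt contents, the `example.com` URL and the `can_fetch` helper are assumptions for illustration. The check only models how the rules are parsed; it says nothing about how Google itself treats your content.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallow the Google-Extended token site-wide
# and leave every other user agent (including Googlebot) unrestricted.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `robots_txt` permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/article"  # placeholder URL
print(can_fetch("Google-Extended", url))  # False: blocked by the first group
print(can_fetch("Googlebot", url))        # True: still eligible for Search
```

Because Googlebot remains allowed under rules like these, the page stays in Search, which is why blocking Google-Extended alone does not opt you out of SGE.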