AI companies (e.g., Google, OpenAI) are primarily using high-quality content created by news publishers to train generative AI systems, which then compete directly against those publishers.
That’s the core argument made in a new report from News Media Alliance, a trade association that says it represents about 2,000 publishers in the U.S. and Canada.
- The report (PDF) has been submitted along with commentary (PDF) for the U.S. Copyright Office’s Artificial Intelligence Study.
Why we care. Since the arrival of Bing Chat, Google Bard and Google’s Search Generative Experience, publishers of all sizes have been concerned about generative AI replacing search, which could lead to a devastating impact on organic traffic, revenue and even the brand’s image (e.g., through hallucinations, such as Bing Chat discussing the New York Times endorsing Donald Trump as the 2024 Republican nominee for president).
What News Media Alliance is saying. The report proves the trade association would have a good case in court, according to comments given by Danielle Coffey, News Media Alliance president and CEO, to the New York Times.
- “It genuinely acts as a substitution for our very work. You can see our articles are just taken and regurgitated verbatim,” Coffey said.
What Google and OpenAI are saying. Nothing so far. But we know Google believes all online content should be available for AI training unless publishers opt out. And the New York Times was one of the first to “opt out” by adding a line to its terms of service prohibiting developers of AI systems from using their content for training.
Some control for news publishers. The AI companies will continue to have ways to access content for training purposes (e.g., through licensing deals or crawling) unless you’ve blocked bots like Googlebot or CCBot (Common Crawl) entirely. However:
- On Google, SGE overviews won’t show any content blocked using nosnippet, and you can use Google-Extended (for Bard, Vertex and future models).
- You can block GPTBot (and many popular websites have).
- You can disallow content from showing in Bing Chat using NOCACHE and NOARCHIVE.
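As a rough illustration of the crawler controls listed above, here is a minimal sketch that checks which bots a robots.txt file would let through, using Python's standard-library parser. The sample rules, the example.com URL and the `allowed_agents` helper are assumptions made for illustration, not any publisher's actual configuration. The nosnippet, NOCACHE and NOARCHIVE controls are page-level directives rather than robots.txt rules, so they are not covered here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks three AI-related crawler tokens
# while leaving every other crawler (including regular search bots) allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed_agents(robots_txt, agents, url):
    """Return {agent: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Placeholder URL; substitute a real page path when testing your own rules.
result = allowed_agents(
    ROBOTS_TXT,
    ["GPTBot", "Google-Extended", "CCBot", "Googlebot"],
    "https://example.com/news/story.html",
)
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' in str(allowed) or ('allowed' if allowed else 'blocked')}")
```

With these sample rules, the three AI-related tokens are reported as blocked while Googlebot remains allowed, which mirrors the trade-off described above: blocking Googlebot entirely would also remove a site from regular search. Note that robots.txt only states a preference; whether a given crawler honors it is up to that crawler's operator.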
Dig deeper. What is generative AI and how does it work?