A reader has sent us examples of Google misattributing content from dozens of large online news publications. There are hundreds of thousands of cases in which Google indexes a publisher's URL but shows content pulled from a different source.
For example, if you search for [hometownlocator site:post-gazette.com], the first result is local.post-gazette.com/boardman+florist.9.125954212p.home.html:
If you look at the cached result, it brings up a page from hometownlocator.com instead of from post-gazette.com. Here is the cached result:
But when you click through it takes you to the post-gazette.com page.
The issue can be with Google or with the publisher; I’ve seen examples of both.
We’ve emailed Google for a statement on what is going on here but have not heard back after about 12 hours. We will update this post as soon as we hear back.
Postscript: The first example we provided seems to have been an issue on the webmaster end, so we removed it to focus on the issue that seems to be related to Google.
Postscript on April 9, 2014: Google’s Matt Cutts responded to us saying they may show the canonicalized URL in some cases.
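
For readers who want to check the publisher side themselves, here is a minimal sketch of one way to see which URL a page declares as its canonical. It assumes the page declares one through a `<link rel="canonical">` tag in its HTML, which is only one of the signals a search engine may use. The `CanonicalFinder` class, the `find_canonical` helper, and the example.com markup are hypothetical, written for illustration; this is not a description of how Google picks a canonical URL.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <link> tags matter, and only the first canonical is kept.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # rel can hold several space-separated tokens, so split before matching.
        rel_tokens = attr_map.get("rel", "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonical = attr_map["href"]


def find_canonical(html: str) -> Optional[str]:
    """Return the declared canonical URL, or None if the page declares none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical markup standing in for a publisher page (assumption, not real data).
sample_html = """
<html><head>
<title>Example listing</title>
<link rel="canonical" href="http://www.example.com/original-listing">
</head><body></body></html>
"""

print(find_canonical(sample_html))
```

If the URL returned points at a different site than the one hosting the page, the publisher's own markup may be telling search engines to credit the other source.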