A newly published patent application from Yahoo, Using community annotations as anchortext, hints at some of the research and work that Yahoo is doing to incorporate user-created tags, annotations, bookmarks, and social profiles into the way it indexes, organizes, and ranks information.
Danny asked me if I would mention a recent SEO by the Sea post here, where I went into some depth on the processes described in that document. It is at Social Trustrank and User Annotations as Anchor Text. One of the most interesting aspects the patent covers is how trustrank might benefit from interactions between Yahoo and users of Yahoo services. You may have heard of trustrank in connection with the paper Combating Web Spam with TrustRank (pdf).
Instead of just pointing at my post, though, I thought it might be interesting to also point to some little-cited and little-discussed papers that cover user tagging and annotations, and how those might help strengthen the offerings from Yahoo (with a Google Base mention for good measure)…
Raghu Ramakrishnan, who joined Yahoo this summer as Vice President and Research Fellow, provides a very detailed presentation on the subject in Community Systems: The World Online. It’s a long presentation, but is a great way to come up to speed with what Yahoo is doing in the area of social search. An extended abstract of the presentation gives a hint as to what is included:
A natural question is whether we can exploit shared community interactions to improve other Web activities, in particular, search. We call this social search, and there are broadly three ways to use social interactions to improve search: 1) Use shared annotations (tags, comments, ratings, etc.) as metadata to improve search result ranking; 2) Use shared activity profiles to connect users with a mutual interest in being connected, as an extension of search; and 3) Create communities of purpose that are empowered to collect and integrate repositories of data harvested by crawling the Web.
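To make the first of those three ideas a little more concrete, here is a minimal Python sketch of my own. It is not taken from the presentation or from Yahoo's patent filing, and every name in it (`score`, `rank`, `tag_weight`, the example URLs) is a made-up assumption for illustration. It simply treats community tags as extra metadata that counts toward a page's score for a query, much as anchor text from links might.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def score(query, body, tags, tag_weight=2.0):
    """Hypothetical score: body term matches plus weighted tag matches.

    tag_weight is an assumed tuning knob, not a value from any Yahoo paper.
    """
    body_counts = Counter(tokenize(body))
    tag_counts = Counter(tag.lower() for tag in tags)
    return sum(body_counts[term] + tag_weight * tag_counts[term]
               for term in tokenize(query))


def rank(query, docs, tag_weight=2.0):
    """Return the documents sorted from highest to lowest score."""
    return sorted(
        docs,
        key=lambda doc: score(query, doc["body"], doc["tags"], tag_weight),
        reverse=True,
    )


# Made-up example data: the second page never uses the query words itself,
# but several users have tagged it with them.
docs = [
    {"url": "a.example", "body": "photo sharing site", "tags": []},
    {"url": "b.example", "body": "upload your pictures",
     "tags": ["photo", "photo", "sharing"]},
]

for doc in rank("photo sharing", docs):
    print(doc["url"], score("photo sharing", doc["body"], doc["tags"]))
```

In this toy version every tag counts the same no matter who applied it. A system along the lines the patent application describes would presumably weight annotations by how much the annotating user is trusted, which is where trustrank could come in.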
I was able to locate some other documents from Yahoo that explore this area. One that comes very close to the processes described in the patent application is Towards the Semantic Web: Collaborative Tag Suggestions, and it shares a couple of authors with the patent filing. It goes into detail on some of the lessons that they’ve learned by looking at how people tagged pages in My Web 2.0, and discusses steps that they will take to try to improve the service based upon that analysis.
The Bulletin of the Technical Committee on Data Engineering, December 2006 Vol. 29 No. 4, has a Special Issue on Web-Scale Data, Systems, and Semantics (pdf), which includes a couple of interesting documents from Yahoo and Google. My link leads to the full issue, but I’ve also linked to the Google cache text copies of the documents if you would like to go directly to those:
- Content, Metadata, and Behavioral Information: Directions for Yahoo! Research
- Structured Data Meets the Web: A Few Observations
The Yahoo document focuses on the future direction of the Yahoo Research group as a whole, and gives social information a very strong role in its plans for that future:
Today, however, the number of distinct users generating useful metadata is growing rapidly due to three factors. First, the emergence of simple Web authoring tools such as hosted blogging software makes it possible for authorship to migrate from the elite to a much larger base of online users motivated to express themselves. Second, the introduction of new models for explicit creation of metadata versus content, such as tagging and bookmarking (e.g., through del.icio.us), the creation of rich profiles (e.g., myspace.com), or even the creation and publishing of multimedia content (e.g., youtube.com) lowers the barrier from authorship to lighter-weight interactions like commenting on somebody else’s content. And finally, there are situations in which content consumption itself is a generator of useful metadata.
I thought it was worth pointing out the Google document in this Bulletin because of some of the insights that it provides into user interaction in the building of Google Base. In the creation of customized search engines through Google Base, it’s the creator of that search engine and chosen collaborators who decide what is important, and what should be annotated. They talk a little about these kinds of annotations in comparing them to those from Flickr:
There is a third class of structured data on the web which is the result of a variety of annotation schemes. Annotation schemes enable users to add tags describing underlying content (e.g., photos) to enable better search over the content. The Flickr Service by Yahoo! is a prime example of an annotation service for photo collections. von Ahn took this idea to the next level by showing how mass collaboration can be used to create high-quality annotations of photos.
Research on Flickr
Speaking of Flickr, what kinds of things might Yahoo be learning from looking at the ways that people tag images? Inducing Ontology from Flickr Tags provides one look at some research being done on that subject. Here’s a snippet:
Based upon our experience and that of others (e.g., Naaman et al.), we hypothesized that images will be annotated and most easily retrieved when emphasizing several key facets: place, activity and depictions. The Flickr community also seems to emphasize another facet that might best be described as emotion or response. In our results a large proportion of the shared vocabulary is tied to placenames, although we expect that model refinements will produce more of a balance with other facets.
One area that has to be of concern when user-based information is used in a service like Flickr is privacy. You may have heard of Yahoo’s ZoneTag service, in which people can upload photos from camera phones, and have the locations of those images recorded based upon GPS or cell phone triangulation. A study that looked at the privacy concerns of ZoneTag users shows another aspect of the use of user-created information: Privacy Decisions for Location-Tagged Media.
Yahoo’s research into how people interact with applications on the web goes well beyond counting and indexing tags and annotations and trying to understand social networks, though. This paper on how people remix media, and on the decisions they make when creating videos, shows another area that they are exploring: Community Annotation and Remix: a Research Platform and Pilot Deployment.
The statement from Yahoo Developers Network’s Chad Dickerson about Yahoo’s acquisition of MyBlogLog, Bloggers unite! Yahoo! joins forces with MyBlogLog, is in part an announcement of the acquisition, but it’s also a welcome from the Developers Network to the folks behind MyBlogLog. There are quite a few folks behind the scenes at Yahoo working on how to use social search in the services they provide.
To return to Raghu Ramakrishnan’s statement above, how will Yahoo “exploit shared community interactions to improve other Web activities”? MyBlogLog provides some opportunities for collecting community information that Yahoo didn’t have before. It will be interesting to see how they might use that information.
Opinions expressed in this article are those of the guest author and not necessarily those of Search Engine Land. Staff authors are listed here.