Google returns results based on the content that appears on individual pages, or at specific URLs. But that content could come from different authors, who have different levels of control over it. For example, a blog page may have posts written by more than one author, comments written by others, and advertisements that even the site owner has no direct control over. A forum might have many different authors responding to an initial post, and may also display advertisements.
Imagine a system that, instead of ranking content at the page level, breaks pages down into smaller content items and associates those items with digital signatures. Content creators could be given reputation scores, which could influence the rankings of pages where their content appears, or of pages they own, edit, or endorse.
That’s a broad overview of a new patent application from Google…
Invented by David Minogue and Paul A. Tucker
US Patent Application 20070033168
Published: February 8, 2007
Filed: August 8, 2005
The present invention provides methods and apparatus, including computer program products, implementing techniques for searching and ranking linked information sources. The techniques include receiving multiple content items from a corpus of content items; receiving digital signatures each made by one of multiple agents, each digital signature associating one of the agents with one or more of the content items; and assigning a score to a first agent of the multiple agents, wherein the score is based upon the content items associated with the first agent by the digital signatures.
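To make the abstract's three steps concrete, here is a minimal Python sketch: a corpus of content items, signatures that each associate one agent with one or more items, and a score for each agent derived from the items it signed. The `Signature` class, the per-item quality values, and the choice to average them are hypothetical; the application does not say how item quality is measured or combined.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Signature:
    """One agent's signature over one or more content items (hypothetical model)."""
    agent: str
    content_ids: tuple[str, ...]


def score_agents(quality: dict[str, float], signatures: list[Signature]) -> dict[str, float]:
    """Score each agent by the average quality of the items its signatures cover."""
    items_by_agent: dict[str, set[str]] = defaultdict(set)
    for sig in signatures:
        for cid in sig.content_ids:
            if cid in quality:  # skip signatures over items outside the corpus
                items_by_agent[sig.agent].add(cid)
    return {agent: mean(quality[c] for c in items) for agent, items in items_by_agent.items()}


# Hypothetical corpus: item id -> quality signal in [0, 1].
corpus = {"post-1": 1.0, "post-2": 0.5, "comment-7": 0.25}
signatures = [
    Signature("alice", ("post-1", "post-2")),  # alice authored two posts
    Signature("bob", ("comment-7",)),          # bob wrote a comment
    Signature("carol", ("post-1",)),           # carol co-signed post-1, e.g. as editor
]
print(score_agents(corpus, signatures))  # {'alice': 0.75, 'bob': 0.25, 'carol': 1.0}
```

Because one item can carry several signatures and one agent can sign many items, the agent-to-item mapping is many-to-many, which is what lets carol share credit for post-1.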
Agents and Authority
When we perform a search at Google, we receive results ordered by how relevant they might be to our search terms. That order is based on rankings influenced by both query-dependent and query-independent criteria.
Query-dependent criteria are signals that try to identify how semantically related a document is to a query, such as word frequency distributions.
Query-independent criteria are signals that attempt to identify how authoritative, intelligible, or trustworthy a document might be, such as PageRank. PageRank looks not only at the number of references to a document, but also at the quality of those references.
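A toy illustration of how the two kinds of signals might be blended. The term-frequency measure, the `authority` numbers (standing in for something like PageRank), and the 50/50 weighting are assumptions for illustration only, not how Google actually combines its signals.

```python
from collections import Counter


def term_frequency_score(query: str, text: str) -> float:
    """Query-dependent signal: share of the document's tokens that are query terms."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[t] for t in set(query.lower().split())) / len(tokens)


def combined_score(query: str, text: str, authority: float, weight: float = 0.5) -> float:
    """Blend query-dependent relevance with a query-independent authority score."""
    return weight * term_frequency_score(query, text) + (1 - weight) * authority


# Hypothetical documents: (text, authority), where authority stands in for PageRank.
docs = {
    "a": ("celebrity celebrity news", 0.1),  # more keyword matches, weak authority
    "b": ("celebrity news today", 0.9),      # fewer matches, strong authority
}
ranking = sorted(docs, key=lambda d: combined_score("celebrity", *docs[d]), reverse=True)
print(ranking)  # ['b', 'a']
```

Document "a" matches the query more often, but the query-independent authority signal is enough to rank "b" first.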
Could authority or trustworthiness be measured in a different way, by using digital signatures to identify who authored the content on a page? Could query-independent signals be tied to that author, so that a score for content the author created, controlled, edited, or reviewed could be used to rank pages?
This patent application describes a system where that might be possible.
Agent Control of a Resource
The document begins by looking at how much control agents might have over specific resources.
When all of the content on a resource is under the control of a single agent, the agent's reputation can be directly related to that resource's content. But a page may have more than one pair of hands involved, each controlling a different part of it. In that case, if the different partitions of information can be identified, a reputation for each agent might be calculated at the partition level.
Complications with this approach include the facts that an agent may contribute content to many different resources, a single resource may be created or controlled by multiple agents, and the ownership and control of a resource may change over time.
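One way to picture partition-level reputation is a page split into parts, each claimed by an agent or by nobody (an ad slot, say), with the page's score taken as a size-weighted average of the claiming agents' reputations. The `Partition` type, weighting by text length, and skipping unclaimed parts are hypothetical choices, not details from the filing.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Partition:
    text: str
    agent: Optional[str]  # None means no agent claims it, e.g. an ad slot


def page_reputation(partitions: list[Partition], reputation: dict[str, float],
                    default: float = 0.0) -> float:
    """Size-weighted average reputation of the agents who control a page's partitions."""
    claimed = [p for p in partitions if p.agent is not None]
    total = sum(len(p.text) for p in claimed)
    if total == 0:
        return default  # nothing on the page is attributable to any agent
    return sum(len(p.text) * reputation.get(p.agent, default) for p in claimed) / total


page = [
    Partition("x" * 300, "alice"),  # the blog post
    Partition("y" * 100, "bob"),    # a comment
    Partition("z" * 200, None),     # an ad nobody vouches for
]
print(page_reputation(page, {"alice": 0.75, "bob": 0.25}))  # 0.625
```

The ad neither helps nor hurts the page here; a different design might instead treat unclaimed space as a penalty.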
Benefits of the Approach
The patent filing describes a number of features and approaches, and they are worth looking over, but I want to focus on the benefits the inventors say this approach will bring:
- Identifying individual agents responsible for content can be used to influence search ratings.
- The identity of agents can be reliably associated with content.
- The granularity of association can be smaller than an entire web page, so agents can disassociate themselves from information appearing near the information for which the agent is responsible.
- An agent can disclaim association with portions of content, such as advertising, that appear on the agent’s web site.
- The same agent identity can be attached to content at multiple locations.
- Multiple agents can contribute to a single web page, with each agent associated only with the content it provided.
Digital Signatures for Content
Different content pieces on a page can be signed with a digital signature, either directly by the agent or indirectly on the agent's behalf. These signatures identify who actually created each content piece on a page. One example of a method for creating and validating digital signatures is the World Wide Web Consortium's XML-Signature Syntax and Processing specification.
Content pieces can carry multiple signatures, reflecting the different roles agents may play with respect to the content, such as author, publisher, editor, or reviewer.
An agent would have exclusive access to the private key used to sign a content piece, and the digital signature could also carry metadata such as a creation date, a review score, or recommended search keywords.
Agents could sign only a portion of a page, and exclude content over which they don’t claim any responsibility, such as ads served alongside the document.
Signed content can range from individual hyperlinks to entire documents, and can include text, images, audio, or video. The signature also allows people to verify that the signed content hasn't been materially altered since the signature was generated.
If you want your content and signature to be portable, as with a syndicated article, you could state that in the metadata associated with the content.
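The signing flow described above (sign a piece of content plus its metadata with a private key, let anyone verify it with the public key, and detect later alterations) can be sketched with textbook RSA over a SHA-256 digest. This is a toy under loud assumptions: the tiny key built from two known Mersenne primes and the lack of padding make it insecure, and a real deployment would use a standard scheme such as XML-Signature. Only the post text is signed, so a neighboring ad stays outside the signature; the metadata fields and the `portable`/`origin_url` convention for syndication are hypothetical.

```python
import hashlib
import json

# Toy textbook-RSA key from two known Mersenne primes. Far too small and unpadded
# for real use; it only illustrates the private-sign / public-verify split.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, held only by the agent


def _digest(content: str, metadata: dict) -> int:
    """SHA-256 over the content plus its metadata, serialized canonically."""
    payload = json.dumps({"content": content, "meta": metadata}, sort_keys=True)
    return int.from_bytes(hashlib.sha256(payload.encode("utf-8")).digest(), "big") % N


def sign(content: str, metadata: dict) -> int:
    return pow(_digest(content, metadata), D, N)


def verify(content: str, metadata: dict, signature: int) -> bool:
    # Anyone holding (N, E) can check; any change to content or metadata breaks it.
    return pow(signature, E, N) == _digest(content, metadata)


def accept_at(url: str, content: str, metadata: dict, signature: int) -> bool:
    """Honor a signed piece at `url` if it verifies and is either at its original
    location or marked portable (hypothetical syndication convention)."""
    if not verify(content, metadata, signature):
        return False
    return url == metadata.get("origin_url") or bool(metadata.get("portable", False))


post = "Agent Rank ties reputation to signed content."  # the ad beside it is not signed
meta = {"agent": "alice", "role": "author", "created": "2007-02-08",
        "keywords": ["agent rank"], "origin_url": "https://example.com/post",
        "portable": True}
sig = sign(post, meta)
print(verify(post, meta, sig))                # True
print(verify(post + " (edited)", meta, sig))  # False
print(accept_at("https://other.example/syndicated", post, meta, sig))  # True
```

Because the metadata is hashed along with the content, nobody can quietly change the role, date, or portability flag without invalidating the signature.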
Ranking and Reputation Scores
Tying a page to an author can influence the ranking of that page. If the author has a high reputation, content he or she created may be considered more authoritative than similar content on other pages. If the agent reviewed or edited the content instead of authoring it, the content might be scored differently.
An agent may have a high reputation score for certain kinds of content and not for others. Someone working on a celebrity news site might have a strong reputation score for that kind of content, but a much weaker score for content involving professional medical advice.
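Role and topic could both feed into how much a signature lifts a piece of content. In this sketch, reputation is stored per (agent, topic), each role gets a multiplier, and a piece of content takes its strongest endorsement. The multipliers, the topic labels, and the use of `max` are made-up illustration values; the application only says such scores might differ.

```python
from __future__ import annotations

# Hypothetical role multipliers: the filing says roles may count differently
# but gives no numbers, so these are invented for illustration.
ROLE_WEIGHT = {"author": 1.0, "editor": 0.5, "reviewer": 0.25}


def content_score(signers: list[tuple[str, str]], topic: str,
                  reputation: dict[tuple[str, str], float]) -> float:
    """Score content by its strongest signer, using topic-specific reputation."""
    return max(
        (ROLE_WEIGHT.get(role, 0.0) * reputation.get((agent, topic), 0.0)
         for agent, role in signers),
        default=0.0,
    )


reputation = {
    ("dana", "celebrity"): 0.75,  # strong on celebrity news
    ("dana", "medical"): 0.125,   # weak on medical advice
    ("lee", "medical"): 0.875,    # a trusted medical reviewer
}
print(content_score([("dana", "author")], "celebrity", reputation))  # 0.75
print(content_score([("dana", "author")], "medical", reputation))    # 0.125
print(content_score([("dana", "author"), ("lee", "reviewer")], "medical", reputation))  # 0.21875
```

The same author scores very differently by topic, and a reputable reviewer's signature can lift a weak author's medical article, though less than if that reviewer had written it.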
Reputation systems are often judged by how difficult they are to attack and manipulate. Here, at least two factors may help prevent manipulation:
- Reputation scores may be designed to be relatively difficult to increase and relatively easy to decrease, so that an agent may not want to put his or her reputation at risk by endorsing content inappropriately.
- Since signatures of reputable agents can promote ranking of signed content in search results, agents are provided a powerful incentive to establish and maintain good reputational scores.
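The asymmetry in the first point can be modeled with an update rule that moves a score slowly toward 1.0 after a good endorsement and cuts it sharply after a bad one. The rule and its `gain` and `penalty` values are assumptions chosen to make the effect visible, not parameters from the filing.

```python
def update_reputation(score: float, good: bool, gain: float = 0.05, penalty: float = 0.5) -> float:
    """Asymmetric update: small step toward 1.0 when an endorsement proves good,
    a large proportional cut when it proves bad. gain/penalty are arbitrary."""
    if good:
        return score + gain * (1.0 - score)
    return score * (1.0 - penalty)


score = 0.5
for _ in range(10):  # ten endorsements that hold up
    score = update_reputation(score, True)
after_good = score
score = update_reputation(score, False)  # one inappropriate endorsement
print(round(after_good, 3), round(score, 3))  # 0.701 0.35
```

With these settings, ten good endorsements lift a score from 0.5 to about 0.70, and a single bad one drops it to about 0.35, below where it started.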
The method of ranking based on reputation scores is described by analogy to PageRank. There's also some discussion of an alternative: using a seed group of trusted agents to endorse other content, with agents whose content consistently receives strong endorsements gaining reputation. In either implementation, an agent's reputation ultimately depends on the quality of the content it signs.
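The PageRank analogy, and the seed-group alternative, can be sketched as power iteration over an endorsement graph in which an edge from A to B means agent A signed or endorsed B's content. With no seeds the random jump is uniform, as in PageRank; with a seed set the jump returns only to trusted agents, similar in spirit to personalized PageRank. The graph format, damping factor, and iteration count are assumptions, not values from the filing.

```python
from __future__ import annotations


def agent_rank(endorsements: dict[str, list[str]], seeds: set[str] | None = None,
               damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """Power iteration over an endorsement graph (A -> B: A signed/endorsed B's content).

    seeds=None: uniform random jump, as in PageRank.
    seeds given: the jump returns only to those trusted agents (seed-based variant).
    """
    agents = sorted(set(endorsements) | {b for outs in endorsements.values() for b in outs})
    trusted = set(seeds or ()) & set(agents)
    jump_to = trusted or set(agents)
    teleport = {a: (1.0 / len(jump_to) if a in jump_to else 0.0) for a in agents}
    rank = dict(teleport)
    for _ in range(iterations):
        new = {a: (1 - damping) * teleport[a] for a in agents}
        for a in agents:
            outs = endorsements.get(a, [])
            if outs:
                for b in outs:  # split this agent's rank across what it endorses
                    new[b] += damping * rank[a] / len(outs)
            else:  # dangling agent: hand its rank back through the jump vector
                for b in agents:
                    new[b] += damping * rank[a] * teleport[b]
        rank = new
    return rank


graph = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "spammer": ["shill"],  # two accounts endorsing only each other
    "shill": ["spammer"],
}
plain = agent_rank(graph)
seeded = agent_rank(graph, seeds={"alice"})
print(round(plain["spammer"], 3), seeded["spammer"])  # 0.2 0.0
```

In the uniform version the two accounts endorsing only each other still collect rank; once the jump is restricted to a trusted seed, that closed loop earns nothing, which is the kind of manipulation resistance a seed-based approach aims for.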
The use of digital signatures enables the reputation system to link reputations with individual agents, and adjust the relative rankings based on all of the content each agent chooses to associate himself or herself with, no matter where the content may be located. That could even include content that isn’t on the internet.
This is a very different way of ranking pages: based on the reputations of agents who may have interacted with, and digitally signed, content on those pages.
Ted Nelson, one of the early pioneers of hypertext, spoke at Google a couple of weeks ago (Transclusion: Fixing Electronic Literature). He described a very different kind of hypertext from the one we are familiar with: a system for connecting electronic documents so that content from multiple sources appears together on the same page. The last question in the Q&A part of the presentation asked how his electronic documents might be connected so that they could be found easily. His answer: "I guess Google will do that." This isn't the system Ted Nelson envisioned, but it shares some similarities.
I could see blogging systems building tools that allow for digital signatures like the ones described here, along the lines of the TypeKey feature in TypePad, which authenticates the identity of commenters across multiple blogs.
Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land.