Back in October 2006, Google announced on the Official Google Blog that they were enabling people to create their own custom search engines.
If you asked yourself why they were doing this, and how it might benefit individual site owners, searchers as a whole, and Google itself, some answers came out yesterday at the US Patent Office…
Google has published a series of five new patent applications on “programmable search engines,” with Ramanathan V. Guha listed as the inventor on the patents (his name was also on the announcement linked to above on the Google Blog). From reading through the patent filings, I’m thinking that it’s safe to assume that the “programmable search engines” described are Google’s custom search engines, though the applications may describe aspects that differ somewhat or have not been fully developed yet.
Ramanathan Guha is listed as the sole inventor on these documents, and he has an interesting history. He joined Google in May of 2005, and had been a principal scientist for Apple Computer and for Netscape, a co-founder and the CTO of Epinions, and one of the developers of the RDF Site Summary (RSS) 0.9 standard, and has a rich resume of other accomplishments.
These are the patent filings covering the programmable search engines published this week:
- Programmable search engine
- Aggregating context data for programmable search engines
- Sharing context data across programmable search engines
- Generating and presenting advertisements based on context data for programmable search engines
- Detecting spam related and biased contexts for programmable search engines
The easiest way to learn about the features of Google’s custom search engines is to create one or two, so I’m not going to go into depth describing what the patent filings say about those. The sections involving the background of the invention are pretty interesting, though. I’m going to summarize parts of those to see if they can provide us with some insight into why these were developed and offered by Google.
Search as an unchangeable black box
We’re told that work on information retrieval systems is mainly focused upon improving search result quality, which is typically measured in terms of how precise the results are and how many relevant results are recalled. While there may be other quantifiable ways to measure performance, those are two of the main goals.
Techniques used by Web search engines involve designs which encompass basic indexing algorithms and representation of documents, query analysis and modification, relevance ranking and results presentation, and many other methods. However they function, the processes search engines use are controlled internally, and can’t be changed by outside entities.
In other words, search engines operate as black boxes, receiving and processing queries using complex and preprogrammed algorithms and models which rank relevance to provide and order search results. Even if parts of the process are known, the search engine will only operate according to those algorithms and models.
Difficulties with User Intent
The relevance of search results depends upon a user’s search intent: why they are searching and what they need the information for. Two different people using the same query may be looking for completely different answers.
Attempts to solve this problem are often based upon relatively weak indicators, such as static user preferences, or predefined ways of refining queries, often amounting to educated guesses of user interest based on the query terms. These approaches can fail because of the highly variable nature of intent and situational facts that query terms may not clearly indicate.
Context and Informational Needs
The patent filing presents an example of a search using the query “Canon Digital Rebel.”
Does a searcher entering that query want to buy the camera? Do they already own it and want technical support? Are they comparing it with other cameras, or interested in learning how to use it?
Those situational facts and a searcher’s information need cannot be reliably determined either by analysis of query terms or by looking at previously stored preference data about the user.
The Failure of Inferring Intent by Tracking
Intent might also be inferred by tracking and analyzing prior user queries so that a model of a user’s interests might be created. Search queries from individual users might be collected, so that interests may be determined based on the frequency of key words appearing in search queries, as well as by looking at which search results the user accesses. See, for instance, Retroactive Answering of Search Queries (pdf).
The assumption that queries can accurately reflect a user’s short term or long term interests may be a problem.
Another potential problem is the assumption that there may be a direct and identifiable relationship between a given information need, such as shopping for a digital camera, and the query terms being used to meet that need. We’re told that assumption is incorrect because the same query terms can be used by the same user (or different users) with quite different information needs.
Turning to Specialized Web Sites
Because people can’t consistently rely on search engines to locate information to satisfy their informational needs, they often visit sites offering highly specialized information about particular topics, built by individuals, groups, or organizations with an expertise in those subjects.
These sites, vertical content sites, often include specifically created content providing in-depth information on a topic, as well as organized collections of links to related sources of information.
So, a site about digital cameras may include:
- Product reviews,
- Guidance on how to purchase a digital camera,
- Links to camera manufacturers’ sites,
- Price comparison engines,
- Other sources of expert opinion, and
- Other helpful information.
People running these sites, subject domain experts, often have considerable knowledge about the value of other sites on the Web. Using their expertise, these content developers can also best structure their site’s content to address the variety of different information needs of users.
A Need to Share Search with Subject Matter Experts
Someone visits one of these vertical content sites, where they find a good amount of useful information related to their needs. They may then return to a general search engine to find more relevant information. But when they do, the expertise they found at the vertical content site is no longer available to them from the search engine.
It’s not unusual for vertical content sites to provide search fields letting people access a general search engine. But those just pass search queries back to the general search engine.
Can the expertise of the owner of the vertical content site become available to a search engine at the time of a searcher’s query, to provide more meaningful search results? If the search engine were a custom one, with some aspects of it programmed by the vertical site owner, it might allow their expertise to be shared with the searcher, with other similar sites using custom search engines, and with the search engine itself.
Aggregated context information might also be collected from a number of these programmable search engines, and become available to searchers even when they are entering a search at the general search engine instead of at a vertical search site.
Other Aspects of Using Programmable Search Engines
In short, custom search engines at vertical sites allow people to search using content sources decided upon and possibly annotated by the site owners. Information collected from the source choices and the labeling and annotation of those sources, and from the use of those custom searches may help inform results at other custom search engines involving related searches, and in query suggestions offered by Google on search results pages from regular Web searches.
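To make the idea concrete, here is a minimal sketch of how an owner’s annotated source list might bias the ranking of results. All names, patterns, and weights here are hypothetical illustrations, not anything specified in the patent filings or in Google’s actual implementation:

```python
# Hypothetical sketch: a site owner's annotations (source patterns with
# boost weights) re-rank generic search results for a vertical topic.
from urllib.parse import urlparse

# Owner-supplied "context": trusted sources mapped to boost weights.
EXPERT_ANNOTATIONS = {
    "dpreview.com": 2.0,    # trusted review site (hypothetical weight)
    "usa.canon.com": 1.5,   # manufacturer support pages
}

def boost_for(url):
    """Return the owner's boost for a result URL; 1.0 if unannotated."""
    host = urlparse(url).netloc
    for pattern, weight in EXPERT_ANNOTATIONS.items():
        if host == pattern or host.endswith("." + pattern):
            return weight
    return 1.0

def rerank(results):
    """Re-order (url, base_score) pairs using the owner's annotations."""
    return sorted(results, key=lambda r: r[1] * boost_for(r[0]), reverse=True)

results = [
    ("http://example.com/canon-rebel", 0.9),
    ("http://www.dpreview.com/reviews/canon-rebel", 0.6),
]
# After re-ranking, the annotated review site moves ahead of the
# higher base-scored generic page (0.6 * 2.0 > 0.9 * 1.0).
print(rerank(results)[0][0])
```

The point of the sketch is only that a small amount of owner-supplied context can change which results surface first, which is the kind of expertise-sharing the patent filings describe.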
A couple of other important topics are each discussed in individual patent applications – advertising and spam or bias.
Of course, Google would want to show advertisements with search results. Can the context (or user intent) taken from such searches be used to inform the content of advertisements shown to searchers, or associated with the content shown on one of these vertical search pages?
There is a potential that people will try to abuse a system like this. The patent application focusing primarily upon spam-related and biased content describes filtering processes that may be used to avoid abuse.
If you haven’t tried out Google’s custom search engines, they are very easy to set up, and to use. If you own a site that focuses upon a particular subject, and consider yourself an expert on that subject, your expertise in setting up a custom search engine may influence results on other custom search engines from Google, and in suggestions on Google’s results pages in response to certain queries.
The only issue that I have with these patent applications is that they appear to assume that people setting up custom search engines on specific topics are experts on those subjects. Then again, if you visit a site on a topic and find value and expertise on the site, you may well find value and expertise in a custom search engine set up on that site, too.
Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land. Staff authors are listed here.