Local Data: Not Sexy Just Critical

A few weeks ago Steven Aldrich, a VP of strategy for small business software (and now marketing) company Intuit, delivered a speech in which he presented an amazing statistic. According to Aldrich, roughly 6 million businesses are started annually in the U.S. but another approximately 5.6 million go under. Think about it.

From the point of view of these small businesses, there’s a fundamental challenge to survive. From the point of view of search engines and directories trying to reflect and catalog their fleeting existence, there’s another kind of challenge – a data challenge.

According to the U.S. Census Bureau there are 23,343,821 “firms” in this country. Of those, 5,697,759 firms have one or more employees. But almost 99% of U.S. businesses have fewer than 20 employees and most have fewer than four. These data are mirrored by similar statistics (where they exist) around the world.

Beyond this, the majority of small businesses conduct a majority of their buying and selling (B2B and B2C) within 50 miles (even 20 miles) of their physical location. What all this means is that most U.S. business is fundamentally local. And the essence of what we broadly call “local search” is about capturing data on where these businesses are and what they do.

This is not sexy. But it’s the “bread and butter” of local.

No matter how fabulous the maps and the “Asynchronous JavaScript and XML” (Ajax) interfaces, if the basic local data aren’t there or are flawed, the application invariably disappoints. We’ve all had the experience of looking for a local business we know is there and not finding it; or, conversely, looking for local “cafes” and only finding Starbucks listings.

Starbucks is there because it’s easier to get Starbucks’ location information than it is to collect data on independent local coffee houses. Getting good and accurate “local local” data, partly because of Aldrich’s equation above, is very hard.

Every top-tier U.S. local search competitor relies at its core on one or more of the big commercial databases. There are basically a handful of major providers in the U.S.:

I recently tried to do an assessment of the cost and quality of the first three and it’s quite challenging. Within the industry there’s a lot of criticism and, to borrow a sports phrase, “trash talk” about the relative quality and freshness of the data. And the databases tend to be very costly (although not across the board).

But if the databases are imperfect it’s because collecting data on millions of businesses is extremely difficult. And most people – even some of those working in local search – often fail to appreciate the Herculean task of doing so.

Most of these commercial databases are built from telephone company records (or phone directories) and then supplemented in various ways. InfoUSA, for its part, has an Omaha, Nebraska call center where it does out-calling to verify the accuracy of the business listings information in its database. But even this can fail to correct all the potential errors.

These core databases form what might be called the foundational layer of local search. But they certainly don’t complete the structure. The other layers include information gleaned from crawling the Internet and user-submitted content (from both businesses and consumers).

Crawling captures local information that can be missed in the telco databases. And some of that online local data is fresher and more accurate. But crawling can also yield inaccuracy (because it recapitulates mistakes published elsewhere). Thus getting the data directly from the source or the community is the ultimate prize.

Given that there are many local businesses (more than 50%) that still don’t have websites – even though their data may in fact be somewhere online – you would have thought that the search engines and directories would have been very aggressive and accommodating in encouraging them to directly input their information. To its credit, Yahoo has for some time allowed business owners and more recently consumers to correct and update information. (See, for example, “edit this listing.”) And Google not long ago enhanced its Local Business Center and greatly expanded the information that could be included.

Indeed, most search and directory sites now have places where local businesses can directly input information. (See Stacy Williams’ five-part article on the subject.) But those screens are commonly buried and not easy to access.

The final and, in many ways, most promising layer of local data is from the community. Sites like Citysearch and Yelp, among others, are helping build out the local database with user-generated content that provides intangibles (opinions, recommendations) that are becoming an increasingly important part of the local search experience. The community, as suggested above, can also rectify inaccurate listings information.

In a bold experiment, a couple of years ago, UK-based entrepreneur Paul Youlten created Yellowikis, a global directory site to be populated entirely by the community, Wikipedia style. Yet this is a long-term project, especially when there are so many competing directory products in the market.

All these data sources are symbiotic rather than mutually exclusive. An empty container is unlikely to be filled entirely by a community (Yellowikis notwithstanding); there must be something there to react to and modify. Also, increasingly, a skeletal database of business listings and contact information is not going to satisfy users, who are typically also looking for recommendations and other tools to help them make buying decisions.
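
To make the layering concrete, here is a minimal sketch in Python of how a site might combine the three kinds of sources discussed above. Everything in it is an assumption made for illustration: the function name merge_layers, the business ids, the field names and the sample records are all hypothetical, and no particular search engine or data provider is known to work this way. The one design choice it encodes, that later layers override earlier ones field by field, follows the argument that data from the business or the community is the most valuable.

```python
def merge_layers(layers):
    """Merge listing records keyed by a business id.

    `layers` is an ordered list of dicts, lowest priority first.
    Later layers override earlier ones field by field (an assumed policy).
    """
    merged = {}
    for layer in layers:
        for biz_id, record in layer.items():
            target = merged.setdefault(biz_id, {})
            for key, value in record.items():
                if value:  # skip empty values so a sparse layer never erases data
                    target[key] = value
    return merged


# Hypothetical sample data: a commercial database record, crawled pages,
# and community corrections and opinions.
commercial_db = {"cafe-1": {"name": "Main St Cafe", "phone": "555-0100"}}
crawl = {
    "cafe-1": {"url": "http://example.com"},
    "cafe-2": {"name": "Corner Coffee"},  # an independent missing from the database
}
community = {"cafe-1": {"phone": "555-0199", "review": "Great espresso"}}

merged = merge_layers([commercial_db, crawl, community])
print(merged["cafe-1"])
```

In this toy case the community's phone correction replaces the commercial record's number, the crawl adds a business the base database missed, and the original name survives because no later layer contradicts it. A real system would also have to decide when two records describe the same business, which this sketch sidesteps by assuming shared ids.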

And there are other non-traditional data sources, such as Urban Mapping, that help complete the application.

These layers and the challenge of capturing and updating information illustrate one of the least visible (or most visible) but critically important aspects of local search: the data. It’s messy, often ugly and typically hard to get. Yet, as I’ve argued, it’s the heart and soul of local and one of the things that makes it a good deal more complicated than search in general.

Greg Sterling is the founding principal of Sterling Market Intelligence and publishes Screenwerk, a blog focusing on the relationship between the Internet and traditional media, with an emphasis on the local search marketplace. The Locals Only column appears on Mondays at Search Engine Land.

Opinions expressed in this article are those of the guest author and not necessarily those of Search Engine Land.

