{"id":337,"date":"2024-05-13T12:00:00","date_gmt":"2024-05-13T12:00:00","guid":{"rendered":"https:\/\/cherylroll.com\/entity-oriented-search-the-evolution-of-information-retrieval-explained-440395\/"},"modified":"2024-05-13T12:00:00","modified_gmt":"2024-05-13T12:00:00","slug":"entity-oriented-search-the-evolution-of-information-retrieval-explained-440395","status":"publish","type":"post","link":"https:\/\/cherylroll.com\/entity-oriented-search-the-evolution-of-information-retrieval-explained-440395\/","title":{"rendered":"Entity-oriented search: The evolution of information retrieval, explained"},"content":{"rendered":"
We rarely stop to think about the lightning speed of modern information access. Try picturing a time when answers lived only in libraries – it seems archaic now. <\/p>\n
Search tools have become so powerful that they grasp the meaning behind your questions, not just the individual words. This capability is the result of an evolution from keyword-based to entity-oriented search. While it may seem complex, today we are going to break it down.<\/p>\n
Think of a simplified world where websites are replaced by books, and answers are found by a team of 1 million dedicated workers. This analogy will help us understand the systems powering entity search, giving you a newfound appreciation for the speed and accuracy we enjoy today.<\/p>\n
Through this exercise, you’ll understand:<\/p>\n
Imagine you are responsible for a vast library with thousands of books and access to a million diligent workers. Unlike in a normal library, customers want answers to their questions and are not looking for books to read from front to back. <\/p>\n
Customers constantly approach with questions (queries), eager for answers. Your mission is to find the information they need as quickly as possible. <\/p>\n
For your library to be successful, you’ll need to return better answers, and save customers more time, than other libraries do. <\/p>\n
Let’s imagine someone asks, “How fast is the fastest animal?”<\/p>\n
If you ran a traditional library, you’d begin by scanning titles, hoping for a similarity match. The customer would likely receive a stack of books, and it would be their job to read through them and try to find the answer. <\/p>\n
This process may take hours. Worse, better books might never be returned at all because their titles don’t resemble the question. <\/p>\n
You decide this process is too slow and might be better handled by your workforce. To speed things up, you enlist your million-strong team to create a comprehensive index. <\/p>\n
Instead of focusing on whole books or titles like your original index, they catalog each individual page. Each worker meticulously records every word on a page, along with its location.<\/p>\n
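To make the page-level catalogue concrete, here is a minimal Python sketch of an inverted index. It assumes a tiny hypothetical set of pages (PAGES), a simple lowercase tokenizer, and a small hand-picked stop-word list (STOP_WORDS) so that a question reduces to words like “fastest” and “animal.” Each word maps to the pages it appears on and its positions there, and one simple way to answer a question is to keep only the pages that appear in every word’s list. It illustrates the idea only; it is not how any particular search engine implements it.<\/p>\n

```python
import re
from collections import defaultdict

# Hypothetical pages for illustration only: page ID -> page text.
PAGES = {
    1: 'The cheetah is the fastest land animal.',
    2: 'An apple pie recipe with fresh apples.',
    3: 'The peregrine falcon is the fastest bird when diving.',
    4: 'Every animal in the zoo needs water.',
}

# Assumed, hand-picked stop words so queries reduce to their content words.
STOP_WORDS = {'a', 'an', 'the', 'is', 'what', 'how'}


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits; no stemming, so 'apples' != 'apple'.
    return re.findall(r'[a-z0-9]+', text.lower())


def build_inverted_index(pages: dict[int, str]) -> dict[str, dict[int, list[int]]]:
    # term -> {page ID -> word positions of that term on the page}
    index = defaultdict(lambda: defaultdict(list))
    for page_id, text in pages.items():
        for position, term in enumerate(tokenize(text)):
            index[term][page_id].append(position)
    return index


def pages_with_all_terms(index: dict[str, dict[int, list[int]]], query: str) -> list[int]:
    # Boolean AND: keep only pages that appear in the page list of every content word.
    terms = [t for t in tokenize(query) if t not in STOP_WORDS]
    if not terms:
        return []
    page_sets = [set(index.get(term, {})) for term in terms]
    return sorted(set.intersection(*page_sets))


if __name__ == '__main__':
    index = build_inverted_index(PAGES)
    print(dict(index['fastest']))  # {1: [4], 3: [5]}
    print(pages_with_all_terms(index, 'What is the fastest animal?'))  # [1]
```

Pages 3 and 4 each match only one of the two words, so this strict intersection drops them; a production engine would typically rank partial matches rather than discard them.<\/p>\n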
The result is called an inverted index: for each word, it records every page where that word appears, along with its position on that page. The structure looks like this: <\/p>\n Now, when a customer asks, “What is the fastest animal?” your team consults the index, looks up “fastest” and “animal,” and delivers the relevant pages, including any page that appears in both lists. <\/p>\n This mirrors a traditional search engine: we’re matching keywords, but we don’t yet understand their deeper meaning. <\/p>\n Now the customer gets a list of hundreds to thousands of pages that may contain the answer. This still saves them a great deal of time, because they can jump straight to the relevant pages to find their answer. <\/p>\n Our inverted index was a major leap forward, saving time for both your team and your customers. <\/p>\n Word of your improved system spreads, and soon patrons are lining up at the door. <\/p>\n However, complaints start to arise about irrelevant results and factual errors. Striving for excellence, we recognize the need to address these concerns.<\/p>\n Issues<\/strong><\/p>\n A word like “apple” produces an overwhelming response: recipes, science, you name it. How can we address this?<\/p>\n This is a tricky problem, and we will need to train your workforce on a few different approaches. <\/p>\n The first approach that makes sense is to train the workforce to grasp context <\/strong>to distinguish (disambiguate) between multiple meanings of a word. For example, if “Apple” is followed by “computer” or “iPhone,” it signifies a different entity than when it appears near “pie” or “tree.” <\/p>\n While using contextual clues is a powerful approach, it’s deceptively difficult. Your workforce needs to learn to identify the subtle cues that reveal an entity’s true meaning within the surrounding text. 
It requires a nuanced understanding of language, along with subject-matter expertise that machines may take years to replicate.<\/p>\n To use context effectively in distinguishing word meanings, we must first build a robust foundation that lets our workforce reorganize the index.<\/p>\n Here are the three steps we will work through below: <\/p>\n Your workforce will be trained on the following three steps to help build clues as to which entity is used in the text: <\/p>\n Over time, these observations form the foundation of your guidebook. It could include:<\/p>\n Just like search engines, this system isn’t perfect. The workforce will still encounter ambiguity, but the guidebook dramatically increases their ability to identify the correct entity based on context. <\/p>\n This guidebook can then be used to identify new entities and to link existing text to known entities (a process called entity linking). <\/p>\n Building a comprehensive knowledge base from scratch would be a mammoth task. Fortunately, resources like encyclopedias provide a valuable foundation. <\/p>\n Just like Google, we can leverage existing knowledge sources like DBpedia. DBpedia offers well-structured categories and attributes (think of these as specialized tags), giving us a head start in organizing your library’s knowledge.<\/p>\n A key decision about your knowledge graph is which ontologies to use. We will try to develop ontologies that correspond to the types of queries we see coming into your library. <\/p>\n Next, your tireless workers must transform raw, unstructured information, such as the words on a page, into linked knowledge. They’ll re-analyze the library’s books and incoming content, using contextual clues to identify entities and connect them to DBpedia’s structure.<\/p>\n Example<\/strong>:<\/em> Let’s say a page describes a cheetah’s incredible running speed. 
Your workers might: <\/p>\n Let’s quickly go through an example of the entity linking process: <\/p>\n Each entity your team identifies becomes a node in your growing knowledge graph, and each relationship becomes an edge, forming a visual map of connected information. <\/p>\n This structured format allows us to move beyond simple keyword matching and truly understand the meaning behind text. With the knowledge graph, we can augment our index with entities, not just terms. <\/p>\n Unlike plain text, entities have rich attributes associated with them. This deeper understanding will empower us to analyze unstructured text more effectively, interpret user queries more accurately, and provide highly relevant answers.<\/p>\n \t\t\t\t\t\t\t<\/figure>\n
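The guidebook, entity linking and knowledge-graph steps described above can also be sketched in a few lines of Python. This is a minimal sketch under several assumptions. The guidebook (GUIDEBOOK) is a hypothetical hand-written table of context clue words for each candidate entity. The entity names are illustrative labels loosely styled after DBpedia resource names, not data pulled from DBpedia. A mention is scored simply by counting clue words within three tokens on either side. Each linked mention becomes a mentionedOn edge in a set of (subject, predicate, object) triples, and an entity-to-pages map shows what it means to index by entities rather than by terms.<\/p>\n

```python
import re
from collections import defaultdict

# Hypothetical guidebook: surface form -> candidate entity -> context clue words.
# Entity names are illustrative labels, loosely styled after DBpedia resource names.
GUIDEBOOK = {
    'apple': {
        'Apple_Inc.': {'computer', 'iphone', 'mac', 'company'},
        'Apple': {'pie', 'tree', 'fruit', 'orchard'},
    },
    'cheetah': {
        'Cheetah': {'speed', 'running', 'hunt', 'fastest'},
    },
}

# Hypothetical starter facts, standing in for an existing knowledge base.
KNOWLEDGE_BASE = {
    ('Apple_Inc.', 'type', 'Company'),
    ('Apple', 'type', 'Fruit'),
    ('Cheetah', 'type', 'Animal'),
}

# Hypothetical pages for illustration only.
PAGES = {
    1: 'A cheetah running at top speed.',
    2: 'Bake an apple pie with fruit from the apple tree.',
    3: 'The new Apple computer ships with an iPhone app.',
}


def tokenize(text: str) -> list[str]:
    return re.findall(r'[a-z0-9]+', text.lower())


def link_entities(tokens: list[str], window: int = 3) -> list[tuple[int, str]]:
    # Score each candidate by how many of its clue words occur within
    # window tokens on either side of the mention, then pick the best one.
    links = []
    for i, token in enumerate(tokens):
        candidates = GUIDEBOOK.get(token)
        if not candidates:
            continue
        context = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
        scores = {entity: len(clues.intersection(context)) for entity, clues in candidates.items()}
        best = max(scores, key=scores.get)
        if scores[best]:
            # Only link when at least one clue matched; otherwise leave it ambiguous.
            links.append((i, best))
    return links


def build_graph(pages: dict[int, str]):
    # Edges are (subject, predicate, object) triples; entity_index maps entity -> page IDs.
    edges = set(KNOWLEDGE_BASE)
    entity_index = defaultdict(set)
    for page_id, text in pages.items():
        for _, entity in link_entities(tokenize(text)):
            edges.add((entity, 'mentionedOn', f'page:{page_id}'))
            entity_index[entity].add(page_id)
    return edges, entity_index


def search_entities(query: str, entity_index) -> list[int]:
    # Link the query itself, then retrieve pages by entity instead of by raw keyword.
    found = set()
    for _, entity in link_entities(tokenize(query)):
        found |= entity_index.get(entity, set())
    return sorted(found)


if __name__ == '__main__':
    edges, entity_index = build_graph(PAGES)
    print(sorted(e for e in edges if e[1] == 'mentionedOn'))
    print(search_entities('apple iphone', entity_index))  # [3]
    print(search_entities('apple pie', entity_index))  # [2]
```

Here a query like “apple iphone” is linked to Apple_Inc. and returns only page 3, whereas a plain keyword lookup for “apple” would also return the pie recipe on page 2. A bare “apple” with no surrounding clues stays unlinked in this sketch, which mirrors the ambiguity the guidebook cannot resolve on its own.<\/p>\n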
Isolating entities: Beyond keywords<\/h3>\n
\n
Step 1: Building the guidebook<\/h2>\n
\n
\n
Step 2: Creating a knowledge base (hint: we won’t build this from scratch) <\/em><\/h2>\n
Embracing existing knowledge<\/h3>\n
<\/figure>\n
Entity linking: The art of connection<\/h3>\n
\n
<\/figure>\n
Step 3: The knowledge graph takes shape<\/h2>\n