Searching for meaning
Internet search today is mostly based on scanning for specific text, although image search is beginning to add some graphical features. Present search algorithms work remarkably well, but they have serious limitations and often turn up irrelevant links.
For example, if you ask a question, search engines will be unaware of the context and will instead look for pages containing the specific words used in the question. So if your search is “How many lawyers work at Microsoft”, the search engine will give you links to pages containing “Microsoft”, “lawyers”, and “work” but will not answer your question. I tried it on Google and got 2,650,000 hits. None of the first few hundred links answered the question, although one estimated the number of lawyers at Disney. The answer may have been buried somewhere in the hundreds of thousands of pages, but as a practical matter the question was not answered. When I searched for the exact phrase “How many lawyers work at Microsoft”, Google said nothing was found. I tried “size of Microsoft legal staff” but had no luck. Bing didn’t know either.
Then I tried “legal budget Microsoft”. There were 7,270,000 hits, but the answer was in the very first one. (If you are curious, the answer is an annual legal budget of $900 million and 1,050 employees in its legal department, including 450 attorneys.) So if you try different phrases, synonyms, and related subjects, you may finally get your answer.
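To make the behavior above concrete, here is a minimal sketch of word-based matching. It is not how Google or Bing actually work; the three-page corpus, the page names, and the functions `keyword_search` and `phrase_search` are all invented for illustration. It only shows why a page about Disney can outrank a page about Microsoft when the engine counts shared words, and why an exact-phrase search can come back empty.

```python
import re

# Hypothetical three-page corpus; the page names and text are invented.
PAGES = {
    "page_a": "Microsoft lawyers work on licensing deals.",
    "page_b": "How many lawyers work at Disney? One estimate is given here.",
    "page_c": "Microsoft spends heavily on its legal department and attorneys.",
}

def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keyword_search(query, pages):
    """Rank pages by how many query words they contain; drop pages with none."""
    terms = tokenize(query)
    scores = {name: len(terms & tokenize(body)) for name, body in pages.items()}
    hits = [name for name, score in scores.items() if score > 0]
    # Highest score first; ties broken alphabetically by page name.
    return sorted(hits, key=lambda name: (-scores[name], name))

def phrase_search(phrase, pages):
    """Return only the pages that contain the exact phrase, ignoring case."""
    needle = phrase.lower()
    return [name for name, body in pages.items() if needle in body.lower()]

question = "How many lawyers work at Microsoft"
print(keyword_search(question, PAGES))  # the Disney page shares the most words
print(phrase_search(question, PAGES))   # no page contains the exact phrase
```

In this toy corpus the Disney page ranks first because it shares five words with the question, while the page that mentions Microsoft's legal department shares only one. Nothing in the scoring knows what the question is asking.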
The Microsoft example illustrates some of the problems with search algorithms, but at other times the results are remarkably good. I searched Google for the question “How many people live in Schenectady NY” and got an answer right at the top of the search listing. But it was the answer for the county, not the city. The city population was several entries down. Bing gave the city population in the third entry.
Ideally, the computer would understand the meaning and context of our search queries, but that’s a tall order. Still, the goal of semantic search is being pursued and progress is being made. Science Daily reports a recent development:
European researchers have created the first integrated semantic search platform that integrates text, video and audio. The system can ‘watch’ films, ‘listen’ to audio and ‘read’ text to find relevant responses to semantic search terms. At last, computers are able to look for meaning in our multimedia searches.
Further on, the article discusses semantic search:
Right now, text in computing is defined by a series of numbers, most commonly the Unicode standard. Each number signifies a particular letter, and computers can scan these codes very quickly. So when you enter a search term, the machine has no idea what those letters signify. It simply looks for the pattern — it has no inkling of the concept behind the pattern.
But in semantic search, every bit of information is defined by potentially dozens of meaningful concepts. When a copywriter invoices for his or her work, for example, the date could be defined in terms of calendar, invoice, billing period, and so on. All these definitions for one piece of information are called ‘metadata’, or information about information.
Collections of agreed metadata terms for a particular field or task, like medicine or accounting, are called ontologies.
So the computer not only searches for the term, it searches for related metadata that defines types of information in specific ways. In reality, the computer still does not ‘understand’ a concept in its semantic search — it continues to look for patterns of letters. But because the concepts behind the search terms are included, it can return results based on concepts as well as text patterns.
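The idea in the quoted passage can be sketched in a few lines. This is only an illustration under stated assumptions, not the European platform's method: the tiny `ONTOLOGY`, the concept names, the documents in `DOCS`, and the functions below are all hypothetical. Each document carries concept metadata, and a query is matched on its words and also on the concepts those words belong to.

```python
import re

# Hypothetical mini-ontology: each concept lists words that express it.
ONTOLOGY = {
    "legal_staff": {"lawyers", "attorneys", "counsel"},
    "legal_spending": {"budget", "spending"},
}

# Hypothetical documents, each tagged with concept metadata.
DOCS = [
    {"id": "doc1", "text": "The company employs many attorneys.", "concepts": {"legal_staff"}},
    {"id": "doc2", "text": "Lawyers discuss a film studio.", "concepts": {"legal_staff"}},
    {"id": "doc3", "text": "Quarterly sales rose.", "concepts": set()},
]

def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def text_search(query, docs):
    """Plain pattern matching: a document matches if it shares a word with the query."""
    terms = tokenize(query)
    return [d["id"] for d in docs if terms & tokenize(d["text"])]

def query_concepts(query, ontology):
    """Map the query's words to the ontology concepts they express."""
    terms = tokenize(query)
    return {concept for concept, words in ontology.items() if terms & words}

def semantic_search(query, docs, ontology):
    """Match on shared words or on shared concept metadata."""
    terms = tokenize(query)
    concepts = query_concepts(query, ontology)
    return [
        d["id"]
        for d in docs
        if (terms & tokenize(d["text"])) or (concepts & d["concepts"])
    ]

print(text_search("lawyers", DOCS))                # misses the "attorneys" document
print(semantic_search("lawyers", DOCS, ONTOLOGY))  # finds it through the shared concept
```

As the quoted article says, the program still does not understand anything. It is comparing sets of strings, but because the concept labels are part of what gets compared, a search for "lawyers" can reach a document that only says "attorneys".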