Searching the Deep Web
Search engines crawl only a fraction of the information available on networks. The so-called “Deep Web” remains largely inaccessible. The New York Times discusses the problem of the Deep Web and how new technologies may bring some of this hidden information to light:
One day last summer, Google’s search engine trundled quietly past a milestone. It added the one trillionth address to the list of Web pages it knows about. But as impossibly big as that number may seem, it represents only a fraction of the entire Web.
Beyond those trillion pages lies an even vaster Web of hidden data: financial information, shopping catalogs, flight schedules, medical research and all kinds of other material stored in databases that remain largely invisible to search engines.
The challenges that the major search engines face in penetrating this so-called Deep Web go a long way toward explaining why they still can’t provide satisfying answers to questions like “What’s the best fare from New York to London next Thursday?” or “When will the Yankees play the Red Sox this year?” The answers are readily available — if only the search engines knew how to find them.
Did you enjoy this post? Why not leave a comment below and continue the conversation, or subscribe to my feed and get articles like this delivered automatically to your feed reader.


Comments
No comments yet.
Sorry, the comment form is closed at this time.