This guide is meant to help IT researchers find higher-quality information in less time. In simplified terms, the Web consists of two parts: the surface Web and the deep Web (also called the invisible Web or hidden Web). The deep Web came into public awareness only recently with the publication of the landmark book by Sherman & Price (2001), “The Invisible Web: Uncovering Information Sources Search Engines Can’t See.” Since then, many books, papers and websites have emerged to help the searcher further explore this vast landscape.
Why the fuss? Don’t search engines and directories do everything needed by a researcher? Let’s explore this further. Search engines and directories provide great services, but they are limited. Search engines index less than 1% of the Web (BrightPlanet, 2005, Deep Web FAQ). The remaining 99% of the Web is located in the deep Web. In addition, information in the deep Web is of higher quality; that is, it contains less “noise” and is more focused. If you are searching for information using only surface Web search engines, you are missing 99% of the content of the Web. Moreover, 95% of the deep Web is free, publicly accessible information (Deep Web FAQ).
Today’s search engines are marvelous research tools; however, searches often yield more trash than treasure. Sifting through the junk to find the gems can consume large amounts of time. It is noteworthy that the majority of users are frustrated by search engines; Chamy (2000, para. 2) found that “Web-rage is uncaged after twelve minutes of fruitless searching.” A typical keyword search may uncover millions of “hits.” Even fine-tuning, by tweaking your keywords and using the advanced search features of search engines, can yield results that are less than desirable. More important, however, is the vast amount of information missed by search engines. It is in these situations that the deep Web can help. The deep Web is not a substitute for surface search engines, but a complement to them within a complete search approach.
The imagery used for the Web is a spider’s web that covers the planet. Search engines are the spiders that crawl all over the Web to extract and index text from websites. Hence, these search engines are called spiders or crawlers. Surface search engines crawl from static web page to static web page, extract text from the HTML, and then index the words. Information stored in databases is not in a format these search engines can access. Databases are accessed dynamically, by queries issued through the retrieval tools unique to each database. An analogy would be that surface search engines can see all the birds floating on the ocean, but cannot see the fish. You need sonar to look through the depths of the water to see the fish and a fishing pole or net to catch the fish.
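The difference between crawling static pages and querying a database can be sketched in a few lines of Python. This is only an illustrative toy, not how any real search engine is built: the in-memory PAGES "Web", the DATABASE record, the URLs, and the function names are all hypothetical, and no network access is involved. The crawler follows links from page to page and indexes the words it finds, yet it never sees the record that is returned only in response to a query.

```python
from collections import defaultdict
from html.parser import HTMLParser
import re

# Hypothetical in-memory "surface Web": URL -> static HTML (no network used).
PAGES = {
    "/home": '<html><body><h1>Bird guide</h1><a href="/gulls">Gulls</a>'
             '<form action="/search"><input name="q"></form></body></html>',
    "/gulls": "<html><body><p>Gulls float on the ocean.</p></body></html>",
}

# Hypothetical "deep Web" database, reachable only by submitting a query.
DATABASE = {"tuna": "Tuna swim in deep water."}


class PageParser(HTMLParser):
    """Collects the text of a page and the href of every <a> link."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def crawl(start_url):
    """Follow links from static page to static page; return word -> URLs."""
    index = defaultdict(set)
    seen = set()
    queue = [start_url]
    while queue:
        url = queue.pop(0)
        if url in seen or url not in PAGES:
            continue
        seen.add(url)
        parser = PageParser()
        parser.feed(PAGES[url])
        text = " ".join(parser.text_parts).lower()
        for word in re.findall(r"[a-z]+", text):
            index[word].add(url)
        queue.extend(parser.links)  # forms are not links, so /search is skipped
    return index


def query_database(term):
    """Dynamic access: the record appears only in response to a query."""
    return DATABASE.get(term.lower())


index = crawl("/home")
print(sorted(index["gulls"]))   # pages on which the crawler saw "gulls"
print("tuna" in index)          # False: the crawler never reached the database
print(query_database("tuna"))   # the record is found only by querying
```

In terms of the analogy, the index holds the birds on the surface, while query_database plays the part of the fishing pole.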
Bergman (2001) contrasts these two parts of the Web:
If you know the URLs of deep Web databases and understand what information is contained in these databases, you can access the deep Web information. However, with hundreds of thousands of databases, and more being added daily, this can be a daunting task. Fortunately, elves on the Internet are busy at work creating portals to this information. Also, surface search engines are beginning to add small quantities of deep Web content to their searches.
An example of a deep Web resource would be the NLM Gateway sponsored by the National Library of Medicine (NLM). Go to this site and type in some keywords. The quality of the medical information you will find in seconds will surpass anything you can find by searching for hours on the surface Web. This example illustrates the value of the deep Web. The secret is in knowing where to look. Part of the purpose of this presentation is to guide IT professionals to some of the best places to find deep Web content.
Tools and Websites
Distinguishing between surface and deep Web sites can sometimes be tricky; many websites have both surface and deep (database) content. Furthermore, some sites have both free and pay areas. Additionally, many general sites contain IT information. To simplify categorization and to provide ease of use for the reader, the websites were placed into categories that would be most useful to the IT professional. For example, membership in a pay site like Educause is only open to organizations and their employees, not to individuals; however, most of Educause’s content is free to visitors, so Educause was placed under the Free Site category.
Free vs. Pay Sites
There are many “free” sources on the Web that follow this same spirit. While there are many free sources of excellent information, fee-based information sources are also worth considering, and a cost-benefit approach is useful in evaluating them. How much is your time worth? How much time is saved, and how valuable is the information to you? Each person needs to decide this for themselves. The author has found the resources listed below most worthy of consideration.