Tips for Finding Content on Archive.org

Because Internet Archive holds a vast amount of data, and because its search is full-text (based on optical character recognition) rather than keyword-based, broad searches often return too many results. Using some of the following techniques to narrow your search can be critical.
- Search By Collection -- One trick that I use frequently is to narrow my search by collection. The key here is to learn the collection identifier. In most cases the identifier is the last portion of the URL. For example, the identifier for the American Methodism Project on Internet Archive is "americanmethodism," seen at the end of the main URL for the collection (http://archive.org/details/americanmethodism). You can alternately find the identifier by selecting the link to Browse by Subject / Keywords or the link for All items (most recently added first), where you will see the collection identified as collection:americanmethodism. To limit a search to that collection, combine your search term with the collection identifier in a single query, such as "kennedy collection:americanmethodism", which brings up search results for the term "kennedy" within the American Methodism Project. You can also search by collection from the Advanced Search page, but some collections can be hard to identify this way, and the page confusingly presents two consecutive alphabetical lists of collections -- one beginning with capital letters, and a second with lower-case (americanmethodism doesn't appear under the collections that begin with capital A, but does appear under the lower-case a's if you keep scrolling down the extremely long list).
- Search by Title or Series -- If you are looking for a book or series with a particular title, use the search qualifier title: to limit your search to items with those words in the title. Example: title:(pennsylvania archives) brings up a list of results with "pennsylvania archives" in the title.
- Search via Google -- Use Google's website limiter to search the Internet Archive. Google allows you to limit your search to a particular website by using the search qualifier site: followed directly (no spaces) with the URL for the website (everything after the www or http://). For example, a search for "pennsylvania archives" site:archive.org returns results for the phrase "pennsylvania archives" on the Internet Archive website. There is no guarantee that everything available on Internet Archive will be found through a Google Search, but if you are more familiar with Google's search syntax, this can sometimes help to simplify your search. And sometimes it will bring up results that do not appear in Internet Archive's own search results! Try a search for fernhaver from Internet Archive's home page, and then try the same search in Google (fernhaver site:archive.org), and you'll see what I mean.
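If you run these searches often, the three query patterns above can be assembled with a few lines of Python. This is only a rough sketch: the function names are my own invention for illustration, and the search URL pattern (https://archive.org/search?query=...) is an assumption about how the site's search page accepts a query, so check it against the URL you see in your own browser.

```python
from urllib.parse import urlencode

# Assumed base URL for Internet Archive's search page (not confirmed by the article).
ARCHIVE_SEARCH_URL = "https://archive.org/search"


def collection_query(term: str, collection: str) -> str:
    """Combine a search term with a collection identifier."""
    return f"{term} collection:{collection}"


def title_query(title: str) -> str:
    """Limit a search to items with the given words in the title."""
    return f"title:({title})"


def google_site_query(phrase: str, site: str = "archive.org") -> str:
    """Build a Google query for an exact phrase limited to one website."""
    return f'"{phrase}" site:{site}'


def search_url(query: str) -> str:
    """Turn a query string into a URL-encoded Internet Archive search link."""
    return f"{ARCHIVE_SEARCH_URL}?{urlencode({'query': query})}"


if __name__ == "__main__":
    print(collection_query("kennedy", "americanmethodism"))
    print(title_query("pennsylvania archives"))
    print(google_site_query("pennsylvania archives"))
    print(search_url(collection_query("kennedy", "americanmethodism")))
```

The first three functions only produce the text you would type into a search box; the last one wraps that text into a link you can paste into a browser.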
Key things to keep in mind as you search Internet Archive:
1) Most content is full-text searchable due to the wonders (and limits) of OCR (optical character recognition) technology. There will be many words and even pages, however, that will not convert correctly from image to text. If you find a particularly promising book or record, then browse it manually instead of just relying on search.
2) Not all content on Internet Archive is OCR'd and searchable. This primarily applies to handwritten content, such as the previously mentioned U.S. Census records. Again, you'll need to browse these records manually. There are also many books, such as the Pennsylvania Archives series, that have been scanned and can be searched individually, but don't appear in Internet Archive's broad search results -- even if you limit the search by title. However, you can search each book individually, either by using the search feature for the PDF version, or by selecting the "full text" version and then using the "find" feature in your browser to search.
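Because OCR often garbles a letter or two, an exact "find" in the full-text version can miss a name that is actually there. As a hedged illustration, and not something Internet Archive itself provides, the sketch below scans text you have already saved from a "full text" page and reports words that are merely close to your search term, using Python's standard difflib module. The function name, the sample text, and the 0.8 similarity cutoff are all assumptions chosen for the example.

```python
import difflib
import re


def find_ocr_tolerant(text: str, term: str, cutoff: float = 0.8) -> list[tuple[int, str]]:
    """Return (line number, word) pairs for words that closely resemble term."""
    hits = []
    term = term.lower()
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Split the line into lower-case alphabetic words.
        words = re.findall(r"[a-z]+", line.lower())
        # Keep every word on the line whose similarity to term meets the cutoff.
        matches = difflib.get_close_matches(term, words, n=max(len(words), 1), cutoff=cutoff)
        hits.extend((lineno, word) for word in matches)
    return hits


if __name__ == "__main__":
    # Made-up sample standing in for text copied from a "full text" page.
    sample = "The Kennedy family\nJohn Kenncdy of Lancaster\nno match here"
    for lineno, word in find_ocr_tolerant(sample, "kennedy"):
        print(lineno, word)
```

A looser cutoff catches more OCR damage but also more false matches, so treat the hits as places to go and read the page image, not as confirmed results.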
1. David Rinehart, "10,000,000,000,000,000 bytes archived!," Internet Archive Blogs, posted 26 October 2012 (http://blog.archive.org : accessed 27 January 2013).
2. Brewster Kahle, "Wayback Machine: Now With 240,000,000,000 URLs," Internet Archive Blogs, posted 9 January 2013 (http://blog.archive.org : accessed 27 January 2013).