Have you ever been researching something and come across a website that looks like it’s exactly what you’re looking for, only to click the link and find that it doesn’t exist anymore? Unfortunately, the Internet changes quickly, and things as simple as reorganizing a site can completely invalidate any existing links — even if the information is still there! And, of course, it takes money and effort to keep websites running and well-maintained, so less-popular sites have a tendency to eventually disappear when their owners tire of them.
Fortunately, this has been a problem for long enough that people have started to develop ways to get around it. Unfortunately, most of us still don’t know about them, and I find myself teaching people often enough that I thought I ought to write a brief article. I will present three techniques, from simple to more complex.
1: Check the Google cache
This technique is useful if a site is experiencing temporary downtime, or if you click on a search result and find the page has moved to a different URL.
When you search Google (or any other Web search engine), you’re not actually sending a request to every website that the content might be on. Instead, you’re searching an index of the contents of all the pages the search engine knows about. In addition to the indexed form, most search engines store copies of all the pages they’ve been to (usually excluding media to save space). Here’s the secret: you can access them too, and you can access them even if the original site is offline.
On Google, there’s a teensy little down arrow next to the page’s URL (in green). If you click on that, there’s a “cached” option. (Google likes to move this option around, so if it’s not there, poke around and see if it’s somewhere nearby.) Bing has a similar little arrow; Yahoo is the sensible one and just has the word “cached” next to the URL.
If the page won’t come up in a search, or if the cached version also gives a 404 error or doesn’t contain all the information you need, you’ll have to try a different method.
2: Access old Geocities pages
This is a very specific technique, but if you happen to want the contents of a page whose URL starts with http://geocities.com, know that Yahoo permanently deleted all the contents of this former free web hosting service. A lot of it was junk, yes, but there was also some very useful information on Geocities websites. Fortunately, the nice folks at Reocities copied a large portion of the content before it went down. To look for a page there, simply change the ‘g’ in “geocities” to an ‘r’ in the URL you’re trying to access, so that it starts with http://reocities.com.
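If you have a whole list of dead Geocities links to check, the letter swap is easy to automate. Here is a minimal Python sketch of the idea; the function name to_reocities and the example path are made up for illustration, and it assumes the rest of the address stays the same on the Reocities side.

```python
from urllib.parse import urlsplit, urlunsplit


def to_reocities(url: str) -> str:
    """Swap the Geocities host for its Reocities counterpart.

    Hypothetical helper: it only rewrites the host name and assumes the
    path of the page is unchanged in the copy.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host not in ("geocities.com", "www.geocities.com"):
        raise ValueError(f"not a Geocities URL: {url}")
    # Change the 'g' to an 'r' in the host name only, never in the path.
    new_host = host.replace("geocities", "reocities", 1)
    return urlunsplit(parts._replace(netloc=new_host))


# The path below is an invented example, not a real page.
print(to_reocities("http://geocities.com/SomeNeighborhood/1234/index.html"))
```

Rewriting only the host matters because a path could itself contain the word “geocities”, and a blind search-and-replace over the whole URL would mangle it.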
3: Use the WayBackMachine
Even if the page you’re trying to access was not hosted on Geocities and has been down for years, there’s still hope. The Internet Archive’s “WayBackMachine” keeps permanent snapshots of over 390 billion pages (at latest count), so there’s a good chance that the one you’re looking for is there.
Using it is straightforward: browse to http://archive.org and paste the URL you’re trying to access into the WayBackMachine box. (You have to remove the http:// first if you’re pasting.) You’ll then be prompted to select a snapshot date; the more popular the site, the more frequent the snapshots. You can start by selecting the latest one, but if that only contains an error page you can move back several until you find what you’re looking for. This is also handy if you want to view an old version of an article or manual, or if you just want to have fun seeing what the Google homepage looked like in the nineties.
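If you would rather skip the search box, you can also build a snapshot address yourself. The sketch below assumes the address pattern the archive has commonly used for snapshots (web.archive.org/web/, then a date stamp or an asterisk, then the original address); the helper names strip_scheme and wayback_url are my own, and the pattern could change, so treat this as an illustration rather than an official interface.

```python
from datetime import date
from typing import Optional

# Assumed prefix for snapshot addresses; not taken from the article itself.
WAYBACK_PREFIX = "https://web.archive.org/web"


def strip_scheme(url: str) -> str:
    """Drop a leading http:// or https://, as you would before pasting."""
    for prefix in ("http://", "https://"):
        if url.lower().startswith(prefix):
            return url[len(prefix):]
    return url


def wayback_url(url: str, when: Optional[date] = None) -> str:
    """Build an archive address for `url`.

    With no date, an asterisk asks for the overview of all snapshots.
    With a date, the YYYYMMDD stamp asks for a snapshot near that day.
    """
    stamp = when.strftime("%Y%m%d") if when else "*"
    return f"{WAYBACK_PREFIX}/{stamp}/{strip_scheme(url)}"


print(wayback_url("http://google.com"))
print(wayback_url("http://google.com", date(1998, 12, 2)))
```

Whether a snapshot actually exists for the date you pick is up to the archive, so if the result is an error page, try a different date just as you would when browsing by hand.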
Now you can be a little bit less frustrated next time you get the dreaded 404 error!