Updated: 20th October 2019
Today we are going to travel back in time. Don’t worry, I haven’t lost my mind: I’m not talking about breaking the laws of physics! I am, however, talking about looking back in time… at websites. How is this possible? Read on to learn about the Wayback Machine (operated by the Internet Archive) and why it is such an incredibly useful tool.
Have you ever accidentally deleted a page on your site? What if you need to restore it but you don’t have a backup version?
What happens if that page is a critical page?
Do you try to rewrite it from memory? Or perhaps you would try to rewrite it from scratch?
Well, those are valid, if difficult, ways to try to restore your page… but what about all the optimisations you’d made in the original version?
You might be okay if the page is still cached in a search engine index. If so, you could look at that cached version and possibly restore the copy, creative and optimisations from there.
But what if that cached page has already dropped out of the indexes?
What would you do?
The good news is that there are places on the web you can visit to see how your site’s pages appeared in the past. You could potentially use them to copy what you have lost and restore it. One such place is the Wayback Machine.
What is Wayback Machine?
“What we’re gonna do right here is go back, way back, back into time.” (Lyrics from “Troglodyte (Cave Man)”, 1972, by The Jimmy Castor Bunch)
The Wayback Machine is an archive of sites, pages and other artifacts found on the web. It is run by the Internet Archive, a non-profit organisation founded in San Francisco, California in 1996.
The objective of the Internet Archive is to preserve these assets and create a reference for researchers, historians and scholars. More than this, though, its stated mission is:
“To provide Universal Access to All Knowledge” (The Internet Archive)
As such, the Archive works closely with organisations such as the Library of Congress and the Smithsonian Institution.
How Big is the Wayback Machine Archive?
The current estimate is that the archive has captured over 362 billion web artifacts since its inception.
(Pie chart: Wayback Machine archived artifacts grouped by type)
The pie chart clearly shows that web pages make up the majority of the Archive. They represent 91.24% of the total number of artifacts documented.
This is an enormous archive… though Google’s search index is on a vast scale too: Google says it contains hundreds of billions of web pages.
However, the Wayback Machine can show you a number of different past versions of a particular web page. Google’s index does not do this.
The great thing about this is that you can run a Wayback Machine search on any website to see how its content has changed, assuming, of course, that it is present in the archive in the first place.
Why Would a Website Not Appear in the Archive?
There are a couple of reasons why a site might not appear in Wayback Machine search results.
- If a site is password protected or refuses crawler access, the archive will be unable to follow links to snapshot it. The site won’t be archived, so Wayback searches won’t find it.
- It’s possible that a particular site has never been found and so has not been crawled and subsequently archived.
Just as with the Google index, you can submit sites to the Wayback Machine that do not presently appear in the archive. I’ll come on to this later.
The Ghosts of Pages Past 1: Why Might You Use Wayback Machine?
Looking at Site Changes
The first reason you’d use Wayback Machine is to look at old versions of pages within a site.
This is useful for several reasons.
- You may have deleted a page accidentally from your site and need to reinstate it but don’t have a backup. You may be able to use the Wayback Machine to recreate your lost page… if it is in the archive!
- If you’ve seen a drop in visitors to certain pages, you might check whether it’s because you changed something. You could use the archive to look at an older version of the page and compare it to the current one.
- You might need proof that a detrimental change made in the past had nothing to do with you. Wayback Machine could prove that the change was made prior to you having access to the site.
- Wayback Machine could demonstrate your link building activities to clients. You could use it to show archived pages on sites where your inbound links appear after a certain date.
Looking at robots.txt
As you can see in the pie chart above, the Wayback Machine doesn’t only crawl and archive web pages. It also archives other file types on your domain, such as your robots.txt file.
Looking at an archived version of robots.txt might give you pointers if you are having search engine crawlability problems. You could look at a past version of it to determine if any change you made caused the issues.
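If you want to pin down exactly what changed, one quick approach is to diff the archived robots.txt against the live one. The sketch below is a minimal illustration, assuming Python 3.9+ and the Wayback Machine’s replay URL pattern https://web.archive.org/web/<timestamp>/<url>, where a partial timestamp such as 2018 takes you to the nearest snapshot. The helper names `wayback_url` and `diff_robots` are hypothetical, and fetching the two files (for example by opening the URLs in your browser and copying the text) is left to you.

```python
import difflib


def wayback_url(original_url: str, timestamp: str) -> str:
    """Build a Wayback Machine replay URL (hypothetical helper).

    Assumes the public pattern https://web.archive.org/web/<timestamp>/<url>,
    where timestamp is YYYYMMDDhhmmss or a shorter prefix such as "2018".
    """
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


def diff_robots(archived: str, current: str) -> list[str]:
    """Return a unified diff between an archived and a current robots.txt."""
    return list(
        difflib.unified_diff(
            archived.splitlines(),
            current.splitlines(),
            fromfile="robots.txt (archived)",
            tofile="robots.txt (current)",
            lineterm="",  # splitlines() already removed the newlines
        )
    )


if __name__ == "__main__":
    print(wayback_url("https://example.com/robots.txt", "2018"))
    old = "User-agent: *\nDisallow: /admin/\n"
    new = "User-agent: *\nDisallow: /\n"
    for line in diff_robots(old, new):
        print(line)
```

In this made-up example the diff contains `-Disallow: /admin/` and `+Disallow: /`, the kind of change that would block compliant crawlers from the whole site.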
Checking for Intellectual Property Infringements
Let’s say you’ve seen that someone has been blatantly and illegally trading off your protected trademarks. Or maybe they’ve plagiarised your valuable intellectual property.
You may have sent a cease and desist letter asking the offenders to remove your intellectual property from their site.
The guilty party may have ignored your legal threats completely, so you decide upon the potentially costly path of litigation.
Your lawyer sets things in motion and all of a sudden your intellectual property disappears from the offending site to “bury the evidence”.
The Wayback Machine might be able to show snapshots of the pages on their site where the infringement was committed. This could be strong evidence that you have been wronged.
Looking at How a Site Has Changed Over Time
If you take on a new client and want to understand how their website has evolved, Wayback Machine might be the perfect place to provide an overview.
The archive could show you technical changes made or even tell you a story of how the company has developed.
You could even use Wayback Machine in your preparation to pitch to a new client for their business. This might help you demonstrate a deeper appreciation of their story than your competitors who are also pitching.
Looking for Changed URL Structures
Suppose the URL structure of a site you manage for a client changed a while back, and organic traffic to the site fell sharply as a result. The changes weren’t documented, so nobody knows how to revert them.
In this scenario you might be able to use the archive to check URL structures and either reinstate them or set up redirections correctly.
N.B. If you’ve noticed decreased visits in Google Analytics, you can identify your historical URL structures there too.
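If you’re comfortable with a little scripting, the Internet Archive’s CDX server API can list the URLs it has captured for a domain, which is handy for rebuilding an old URL structure or drafting a redirect map. The sketch below assumes the endpoint https://web.archive.org/cdx/search/cdx with its `url`, `matchType`, `output`, `fl`, `collapse`, `from` and `to` parameters, and that JSON output starts with a header row. The function names are hypothetical and example.com is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain: str, year_from: str, year_to: str) -> str:
    """Build a CDX query listing distinct archived URLs for a domain."""
    params = {
        "url": domain,
        "matchType": "domain",  # include subdomains as well
        "output": "json",
        "fl": "original,timestamp",  # only return these two fields
        "collapse": "urlkey",  # one row per distinct URL
        "from": year_from,
        "to": year_to,
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(payload: str) -> list[dict]:
    """Convert CDX JSON (a header row followed by data rows) into dicts."""
    if not payload.strip():
        return []  # treat an empty response as "no captures"
    rows = json.loads(payload)
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def fetch_archived_urls(domain: str, year_from: str, year_to: str) -> list[dict]:
    """Query the live CDX API (needs network access; may be slow or rate-limited)."""
    with urlopen(cdx_query_url(domain, year_from, year_to), timeout=60) as resp:
        return parse_cdx_json(resp.read().decode("utf-8"))
```

Calling `fetch_archived_urls("example.com", "2016", "2017")` would return one dict per archived URL, with `original` and `timestamp` keys, which you could then map to the current URLs when setting up redirects.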
Looking at the Historical Information Architecture of the Site
The archive might be able to show you how a website was organised in terms of the page or category hierarchy. It could even demonstrate the previous navigation structure.
This could be extremely useful when trying to understand whether categories or pages have been merged at some point. Equally it could present you with a better understanding of how past navigation structures have impacted conversion rates.
The Ghosts of Pages Past 2: How to Use Wayback Machine
At the top of the Wayback Machine homepage you’ll see a search box. Type in the domain you’d like to examine and, if it has been archived, you’ll see a timeline of years above a calendar of snapshot dates.
You can use the timeline at the top of the page to select a particular year. You can also click one of the circles in the calendar for the year currently shown. Remember, though, that only days highlighted with a coloured circle have archived pages.
Hovering on a coloured circle will show you the number of snapshots Wayback Machine took on that day.
Clicking one of the snapshots takes you to the archived version of the page as it looked at that time.
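You can click on any links you see on the archived page to browse an archived version of the site. You’ll then see how other pages within the site appeared at around that time too. Bear in mind that the archive shows the nearest snapshot it has for each linked page, so the dates can differ.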
Alternatively, you can click on the timeline at the top of the page to examine archives from a different year.
It’s that simple!
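The calendar is the easiest way to browse, but if you only want to know whether a snapshot exists near a given date, the Wayback Machine also offers a simple availability API. This minimal sketch assumes the endpoint https://archive.org/wayback/available, which takes a `url` and an optional `timestamp` (YYYYMMDDhhmmss or a prefix of it) and returns JSON with the closest snapshot under `archived_snapshots.closest`. The helper names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build an availability query for page_url, optionally near a date."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(payload: str) -> Optional[dict]:
    """Pull the closest snapshot's URL and timestamp out of the JSON reply."""
    data = json.loads(payload)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return {"url": closest["url"], "timestamp": closest["timestamp"]}
    return None  # nothing archived for this URL


def find_snapshot(page_url: str, timestamp: Optional[str] = None) -> Optional[dict]:
    """Ask the live API (needs network access)."""
    with urlopen(availability_url(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

For example, `find_snapshot("example.com", "20060101")` would return the replay URL and timestamp of the capture closest to 1 January 2006, or `None` if the page has never been archived.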
But What if a Page I Want to See Is Not in the Archive?
Firstly… don’t panic!
It would be a pain if a page you wanted to examine was not in the archive, especially if you wanted to do some of the research I’ve discussed above. Fortunately, the Wayback Machine homepage has a tool you can use to snapshot a page immediately. Of course, this won’t help you examine a particular issue in the past, but you could at least start archiving the site so it’s available in future.
Type the page URL into the “Save Page Now” box and Wayback Machine will add it to the archive immediately.
The tool will save the page along with any images and CSS it finds there. However, it will not crawl any links it finds on the page and so will not archive the whole domain.
You can add more pages to the archive from a site, but you have to use the “Save Page Now” tool for each one.
If you have concerns about privacy, archive.org does not retain IP addresses on submissions you make to it. So whenever you use the tool your activity is anonymous.
One final note: when a page is archived, there is no guarantee of when it will be snapshotted again. So you might come back to the archive later and see only the version that you submitted. Having said this, the Wayback Machine will revisit archived pages at some point, and the calendar will show this.
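Submitting pages one at a time through “Save Page Now” gets tedious on a larger site, so a short script can loop over a list of pages for you. This is a sketch under a stated assumption: that a plain GET request to https://web.archive.org/save/ followed by the page URL triggers a capture, as the public Save Page Now feature has historically allowed for anonymous users. The archive rate-limits these requests and offers other, authenticated options, so check its current guidance before relying on this. The helper names and the pause length are hypothetical.

```python
import time
from typing import Dict, List, Union
from urllib.request import Request, urlopen

SAVE_ENDPOINT = "https://web.archive.org/save/"  # assumed Save Page Now prefix


def save_request_urls(pages: List[str]) -> List[str]:
    """Return one Save Page Now request URL per page."""
    return [SAVE_ENDPOINT + page for page in pages]


def submit_pages(pages: List[str], pause_seconds: float = 10.0) -> Dict[str, Union[int, str]]:
    """Request a capture of each page in turn, pausing between requests.

    Returns the HTTP status per page, or the error message if a request failed.
    Needs network access; the pause is a guess at a polite rate, not a documented limit.
    """
    results: Dict[str, Union[int, str]] = {}
    for page, save_url in zip(pages, save_request_urls(pages)):
        request = Request(save_url, headers={"User-Agent": "save-page-sketch/0.1"})
        try:
            with urlopen(request, timeout=120) as response:
                results[page] = response.status
        except OSError as exc:  # URLError and HTTPError both subclass OSError
            results[page] = str(exc)
        time.sleep(pause_seconds)
    return results
```

As with the form on the homepage, only the pages you list are submitted, because nothing here follows links to the rest of the domain.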
- The Wayback Machine is the most important archive site on the Web.
- We can use it to see page versions at given points in the past for a number of useful activities. This assumes there is an archived version of the page of course.
- For fun, you can use it to look at how bad some sites were… if you’re really bored!
Check out my video on how to make a Wayback Machine search and also see how to request an archive for a page.
That’s it for now.
Perhaps you’ve used Wayback Machine in ways that I haven’t identified here? Drop a comment below and let me know how you’ve benefited from it, or feel free to ask me a question.
Thanks for visiting!