Continuing our series on the Internet Archive, we come to the thing it’s probably best known for: the Wayback Machine! There are over 462 billion web pages saved in the Wayback Machine, which opens up some powerful options.
The Wayback Machine is named for the WABAC time machine from the Peabody’s Improbable History segment of The Rocky and Bullwinkle Show. Like any good time machine, it’s fun to play with, and most people have tried its most basic use. Want to know what WordPress.org looked like in 2003? No problem, the Wayback Machine has it. How about what Apple.com looked like in 1997, or what Mozilla.org looked like in 1998? The Wayback Machine will give you hours of fun if that’s what you’re looking for, but what else does it offer?
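If you’d rather script those trips than click through the calendar, here is a minimal Python sketch. It assumes the publicly documented address pattern web.archive.org/web/<timestamp>/<site> and the availability endpoint at archive.org/wayback/available; the function names are made up for this post, and the code only builds addresses and reads a response, so fetching is left to you.

```python
from typing import Optional
from urllib.parse import urlencode


def wayback_url(site: str, timestamp: str) -> str:
    """Address of the archived copy of `site` closest to `timestamp`.

    `timestamp` is 1 to 14 digits of YYYYMMDDhhmmss, so "2003" is enough
    to ask for something from that year.
    """
    return f"https://web.archive.org/web/{timestamp}/{site}"


def availability_query(site: str, timestamp: str) -> str:
    """Address that asks the availability endpoint for the closest snapshot."""
    params = urlencode({"url": site, "timestamp": timestamp})
    return f"https://archive.org/wayback/available?{params}"


def closest_snapshot(response: dict) -> Optional[str]:
    """Pull the snapshot address out of a decoded availability response.

    Assumes the documented JSON shape:
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}
    """
    closest = response.get("archived_snapshots", {}).get("closest", {})
    return closest.get("url") if closest.get("available") else None
```

For example, wayback_url("wordpress.org", "2003") gives an address you can paste straight into a browser.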
The power of the Wayback Machine is in what it stores: everything. The entire source of the page, along with any available media, is stored. First of all, you might be thinking, “I’d better block that immediately!” Don’t. No one is going to purposefully visit your site through the Wayback Machine instead of just visiting it normally; that would be silly. Allow your site to be archived for history. There’s no reason not to.
So, what does this “everything” get you? Quite a bit, actually. Ever wonder what would happen to your site if you found out your backups were bad? The Wayback Machine lets you copy and paste whatever text you need, and re-upload any media it was able to archive. Does something seem odd on your site lately, something you can’t quite identify? Instead of fully restoring an old backup, compare your site to last month’s archive on the Wayback Machine. If you can spot what’s different, you can even view the source, just as you would on any normal web page, to dig into the details.
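Comparing by eye works, but a quick script can do the squinting for you. This is only a sketch, assuming you have already saved the archived source and the current source as two strings; the function name is made up for this post. Keep in mind that archived pages have their links rewritten to point back at the archive, so expect some noise in the results.

```python
import difflib


def changed_lines(archived: str, current: str) -> list:
    """Return only the lines that differ between two versions of a page.

    Lines starting with "- " exist only in the archived copy;
    lines starting with "+ " exist only in the current copy.
    """
    diff = difflib.ndiff(archived.splitlines(), current.splitlines())
    # ndiff also emits unchanged lines ("  ") and hint lines ("? "); skip both.
    return [line for line in diff if line.startswith(("- ", "+ "))]
```

An empty list means the two versions match line for line.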
As a true story of its power, we use the Wayback Machine almost every day in Jetpack support. When you connect Jetpack with your blog, it ties everything to your blog’s URL, and assigns that URL a unique blog ID. If you’re running the Stats module, you can find that ID in the source output, towards the bottom. Just look in the source for “blog:'number'”; that number is the blog ID. Sometimes people move their blog to a new domain, and Jetpack gets confused and thinks it’s a new site (we’re working on ways to improve that). If we can find the old site in the Wayback Machine, we can find the old blog ID in the source, and then we can fix everything.
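Hunting for that number by hand gets old quickly, so here is a rough sketch of the search in Python. It assumes only what is described above, that the ID appears in the page source as blog:'number'; the function name is invented for this post, and the pattern is deliberately loose about quotes and spacing because the exact markup may vary.

```python
import re
from typing import Optional

# Matches blog:'12345', blog:"12345" or blog: 12345 (assumed variations).
BLOG_ID_PATTERN = re.compile(r"""blog\s*:\s*['"]?(\d+)""")


def find_blog_id(page_source: str) -> Optional[str]:
    """Return the first blog ID found in the page source, or None."""
    match = BLOG_ID_PATTERN.search(page_source)
    return match.group(1) if match else None
```

Feed it the source of an archived page; if it returns None, the Stats output probably wasn’t on the page when that copy was saved.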
The Wayback Machine has a lot to offer, and you only need to start digging to see just how much there is. Storing so much data isn’t cheap, though, and the Internet Archive needs your donations to keep it running. Dive into history with the Wayback Machine and see what you can uncover! Next time? Smart 404 handler!