Automated Downloading with Wget

Programming Historian Tutorial, by Ian Milligan
by emilykkeyes (01-27-2016)

Each week in #hist5702w, we’re assigned tutorials/readings. The tutorials usually come from The Programming Historian, a website that offers tutorials on digital tools/techniques for historians/individuals.

This week’s tutorial focused on a tool called Wget, which can help you download online material. I was very excited about how I might be able to use this tool, since in my part-time work with Know History we often need to download a large number of historical documents (in most cases census records, birth records, etc.). Usually in these cases the task of copying and saving these files falls to me. Not only is this tedious, but there is a huge margin for error, not to mention the decisions that need to be made about how these documents should be filed and stored once they are downloaded.

Just looking at the tutorial, I already appreciated how it was broken down for Mac and Windows users. In this course I’ve been cautioned that Windows is very different from (i.e. more problematic than) Mac.

The first issue I ran into was the downloading instructions. Admittedly, this was more of a reader/user error. Eagerly following the link to download Wget, I was immediately confused by all the download options:

So many options, what version do I need?!

Luckily, per the #hist5702w mantra, I turned to my classmates (shout out to Laurel), and the problem was easily solved. What I needed was just wget.exe.

Second problem, again a great reader/user problem! When downloading wget.exe, I did not put it in the right directory. Instead of putting it in C:\Windows, I left it in C:\. This would cause problems for me later, forcing me to start at the top of the tutorial and read carefully again (something I’m growing increasingly familiar with doing… there’s a lesson to be learned there, but maybe I just need to be hit with it a few more times). Again, shout out to Laurel for bringing this error to my attention.
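For anyone who hits the same wall: the command line can only find wget.exe if it sits in a folder on the system's search path (PATH), which is presumably why the tutorial points to C:\Windows. Here is a minimal Python sketch of how one might check this before running any Wget commands. The function name find_wget is my own invention for illustration, not something from the tutorial.

```python
import os
import shutil


def find_wget(path=None):
    """Return the full path to wget if it is on the search path, else None.

    find_wget is a hypothetical helper name, not part of the Wget tutorial.
    """
    # shutil.which looks through the folders listed in PATH (or in the
    # `path` argument, if one is given). On Windows it also tries the
    # extensions listed in PATHEXT, such as .exe.
    return shutil.which("wget", path=path)


if __name__ == "__main__":
    location = find_wget()
    if location:
        print("wget found at", location)
    else:
        # Show which folders were searched, so it is clear where wget.exe
        # would need to live for the command line to find it.
        print("wget is not on PATH; folders searched:")
        for folder in os.environ.get("PATH", "").split(os.pathsep):
            print("  ", folder)
```

If the script reports that wget is not on PATH, moving wget.exe into one of the listed folders (or into C:\Windows, as the tutorial suggests) should fix it.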

SO, finally armed and ready with wget.exe, I proceeded to input the commands using PowerShell. PowerShell was used in the Command Line Bootcamp tutorial, which was recommended at the beginning of the Wget tutorial, so I went with that. (Side bar: that was a great tutorial. Really clear and easy to follow along.)

Moving along, I was fine, right up until this appeared on my screen:

…well, this doesn’t look like what the tutorial said it would.

Again, I turned to my trusty DH guide, and Laurel stepped in. Deciding to give PowerShell the proverbial finger, Laurel switched me to Command Prompt. Repeating the steps, things worked fine, and I finally completed the tutorial. Yahoo!

Thoughts on Wget: It seems like it could be an immensely useful program for downloading journal articles, historical documents, etc. However, I’m curious to see what kinds of problems you’d encounter trying to download from an archival repository such as Library and Archives Canada. Do they have safeguards in place? Also, what are the implications of ripping documents from their archival context?
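One safeguard a site can publish is a robots.txt file, which states which paths automated tools are asked to stay out of. Wget honours it by default during recursive downloads. The Python sketch below shows how one might read such a file and build a "polite" Wget command that waits between requests and caps its download speed. Everything specific here is an assumption made for illustration: the example.org addresses, the sample robots.txt rules, and the helper names may_fetch and polite_wget_command. I have not checked what Library and Archives Canada actually publishes.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt for illustration only; a real site's rules will differ.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt, url, user_agent="Wget"):
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def polite_wget_command(url, wait_seconds=2, rate="20k"):
    """Build a recursive Wget command that pauses between requests.

    -r            follow links recursively
    --no-parent   never climb above the starting folder
    -w            seconds to wait between retrievals
    --limit-rate  cap on download speed
    """
    return [
        "wget", "-r", "--no-parent",
        "-w", str(wait_seconds),
        f"--limit-rate={rate}",
        url,
    ]


if __name__ == "__main__":
    # Hypothetical addresses, not a real archive.
    for page in ("https://example.org/records/1901.html",
                 "https://example.org/private/file.pdf"):
        print(page, "allowed:", may_fetch(EXAMPLE_ROBOTS, page))
    print(" ".join(polite_wget_command("https://example.org/records/")))
```

A robots.txt file only covers the technical courtesy side of the question; it says nothing about terms of use or about what is lost when a document is separated from its archival context.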

Complete the Wget tutorial here.

