Deceptive data visualisations

As part of my final project I’ll be using CartoDB to make a map of the movements of a family throughout time and space. So, doing some due diligence, I thought I would read a bit about data visualizations. I came across a paper, “How Deceptive are Deceptive Visualizations?”, and thought I would take a look to see what they found.

They start off the article by explaining how useful visualizations can be, but how with the wrong selection of colours, scaling etc., the data can be misinterpreted.

In a way this question reminds me of the pictures that show two images at once, such as the one below (which I talked about in an older blog post about an art exhibit I saw).

“All is Vanity,” Charles Allan Gilbert, 1892

To see just how deceptive bad visualizations can be, the authors (in connection with an NYU lab class) tested a series of well-known graphical distortions on participants. Below are some examples:

[Images: an example of an inverted-axis chart and an example of a truncated-axis chart]

For the study, half of the participants received a deceptive chart and the other half a control chart. Each participant was asked the same questions, which essentially asked them to estimate the difference between the two values shown (e.g., “How much better are the drinking water conditions in Willowtown as compared to Silvatown?”). What the tests showed was that the deceptive chart led more participants to answer with a larger estimate.
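To get a feel for why a truncated axis inflates people's estimates, here is a minimal Python sketch. The numbers are invented for illustration and are not taken from the study; the function name `apparent_ratio` is also my own. It compares the true ratio between two values with the ratio of the bar heights a reader would actually see once the axis no longer starts at zero.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of the drawn bar heights for values a and b
    when the vertical axis begins at axis_start instead of zero."""
    if axis_start >= min(a, b):
        raise ValueError("axis_start must be below both values")
    return (a - axis_start) / (b - axis_start)


# Hypothetical scores for the two towns (not data from the paper).
willowtown, silvatown = 50.0, 40.0

honest = apparent_ratio(willowtown, silvatown)                      # axis starts at 0
truncated = apparent_ratio(willowtown, silvatown, axis_start=35.0)  # axis starts at 35

print(f"Axis from 0:  Willowtown's bar is {honest:.2f}x as tall")
print(f"Axis from 35: Willowtown's bar is {truncated:.2f}x as tall")
```

With these made-up values the data says Willowtown is 1.25 times Silvatown, but on the truncated chart its bar is three times as tall, which is the kind of gap that could push estimates upward.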

So, how can I transfer some of these ideas over into my own final project? It made me think about the options that will be available to me when I create my map. I know that CartoDB features different visualization options, including changing the shapes of markers, animating the map, and changing the basemap. Below are some screenshots of the map options.

While absolutely none of the data has changed in any of these maps, at first glance they do appear to be very different.

Variety can be a great thing, but evidently if we don’t think about how the data will be used and who will be using it, we can run into some problems.

Using historical documents for mapping

Examples of Mapping using Genealogy and Census records

In preparation for my final project (which I discussed a bit in my last post), I decided to check out my competition and see what other kinds of maps are being generated based on census records and other historical documents. There are many out there, so in this post I’ll highlight two, and talk about what I found interesting, problematic etc.

The first one I looked at was from the Smithsonian magazine, by Lincoln Mullen, on slavery in the United States from 1790-1860 (which you can access here). Mullen created two interactive maps, using data taken from census records, that feature a time lapse.

One of the first things that caught my attention was that the accompanying article was centered around the maps Mullen generated, rather than the other way around. Reflecting on this, I thought it was really important, and showed that Mullen wasn’t just using the maps as filler, or to make this article look cool. He had created the maps, and then was reflecting on them, and using them to build an argument.

The trends seen in the maps are very clear. Moving forward through time, the Western United States becomes increasingly populated with slaves. Mullen uses this to talk about how, rather than being something that was confined to the Southern United States, slavery was widespread across the country. Additionally, he mentions the difficulties of using data from sources such as censuses, citing the example of how no slaves were enumerated in the State of Vermont in 1860, but how historical research has shown that African-Americans were kept in bondage in Vermont during this period.

The second map I looked at, from The Guardian, was made using a program called CartoDB (a program I think I’ll be using for my own project, and something I’ll discuss in another post). It shows the movement (reportedly over 1 million locations) of the United Kingdom’s Royal Navy during WWI. Its data set was created using captains’ logs. (You can see it here.)

The map is animated using a function called “Torque”, which allows you to create animations based on the locations in your data set. The result is the ability to watch your data zip around your screen as the time-lapse moves forward. The clock on the bottom counts us forward through time, and this map even features short summaries of events (“Trade routes resume” and “War begins with Germany”).

One of the first things I noticed was the inability to control the time. In Mullen’s maps you can move forwards and backwards, but with The Guardian’s we can only advance forward (and start/stop). While it was aesthetically much nicer than Mullen’s, I found this less user-friendly. The Guardian’s map was also completely standalone and didn’t feature any accompanying text. While initially I liked this, after giving it some thought I found it didn’t really prompt me to do any reflection. Instead, I watched the map a few times, mostly noting that it looked cool. I had to actually stop and prompt myself to think about what all the lights meant, and what kinds of trends were visible.

Something I noticed about both maps was that they were working with large data sets. They used the data from thousands of documents, spanning hundreds of years. For my final project I want to keep my data set small, specifically focusing on one particular family. I’m curious then how features like Torque in CartoDB will work with my data set.

“…I scarcely know what to call them.”

For my Digital History class, we are working on a final project, which is supposed to be a video game built using a tool called Twine.

I’ll be partnering with someone in my class to work on a game that highlights how you can use a dataset to generate a timeline or map of an individual or group’s movements over time.

For the project, I’m hoping to follow up on a note I read in a journal article by Michelle Hamilton on the enumeration of First Nations and Métis people in Canadian censuses. In her article Hamilton draws attention to the written comments made by census enumerators, some of whom were obviously frustrated or uncertain about what they should put for groups who appeared to be of “mixed race.” She includes one remark made by Thomas H. Johnson in the 1861 census that said:

 “These people are so mixed up with Indian that I scarcely know what to call them. The principle mixture is white, and they cultivate the soil, so I call them white.”

Library and Archives Canada, RG 31, Statistics Canada, Census of Canada, 1861, C-1091, District 6, Nipissing, 42, lines 4-17.

 

This note is made in the “Remarks” column referring to a group of individuals from the Nipissing District. What I am hoping to do with this project is to track a family in this group and their descendants forward through time (using documents available through Ancestry.ca) and to see how their recorded ethnicity changes based on their location.

What’s interesting is that if you search for this page of the census through Library and Archives Canada, you cannot find it. That is because the 1861 census had two pages for every entry. The first page contained data on name, age, ethnicity, sex, religion etc., while the second contained information on land, livestock, and the “Remarks” column. The first page was digitised, but the second never was.

I spoke about this specific example (and other related “buyer beware” issues of Ancestry.ca) at the Canadian Historical Association Annual Meeting in Ottawa in 2015. I gave the presentation from the standpoint of a genealogy researcher, but now (since this is a Digital History course), I thought I’d try and see what else can be gained from putting on my DH hat.

Trevor Owens, in the draft of his article “Digital Sources & Digital Archives: The Evidentiary Basis of Digital History,” outlines some important questions to think about when consulting a digital archive. While he encourages researchers and historians to ask the same questions of a digital archive as they would of a physical one, he also outlines some other important questions to keep in mind. Below, I am going to work through these questions, specifically with the one-page 1861 census in mind, as available through Library and Archives Canada/Ancestry.ca.

 

1) Why was this digitized and not something else?

The Canadian censuses were digitised by Ancestry (free of charge, I should add) to be included on their genealogy website. Ancestry charges a fee for users to access these documents, which are frequently used by both amateur and professional genealogists. Library and Archives Canada likely agreed to have them digitised because 1) a digital copy is a safeguard against overuse of, and damage to, the original copies, and 2) Ancestry provides them with a digital master copy, which includes the scans and the metadata needed to index them. [1]

 

2) Is this copy of sufficient quality for my purpose?

Yes, and no. It does contain the data for the individuals I am interested in, but it does not include the notes I know are on the second page.

 

3) How did I find it and how does that affect what I can say about it?

7) What role did search play in the original experience of content?

(Answering both 3 and 7) I found the page by searching Library and Archives Canada, but I truly found it first through Hamilton’s note. Search played a large role in the retrieval, and will undoubtedly play an even larger role as I search for more information on the family. The search function on Ancestry works by checking the search terms against an index created from the source. To the best of my knowledge, this index is created by individuals (not through OCR). Therefore, the index is “as reliable as the competency of the people hired to decipher original records.”[2] Each individual indexer must interpret what they see in front of them: is that a lazy “L” or a “J”? What do you put down if something has been crossed out, with a new entry written over top?
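To make the “lazy L or J” problem concrete, here is a small Python sketch of how a single mis-transcribed letter can hide a record from an exact-match search. This is only an illustration under my own assumptions: the tiny index is invented, the function names are mine, and I have no knowledge of how Ancestry's search is actually implemented. The fuzzy lookup uses `difflib.get_close_matches` from Python's standard library.

```python
from difflib import get_close_matches

# Invented index entries; "Lohnson" stands in for a "Johnson"
# whose first letter an indexer read as an "L".
index = ["Lohnson", "Jackson", "Hamilton"]


def exact_search(query, names):
    """Return only the index entries that match the query exactly."""
    return [name for name in names if name == query]


def fuzzy_search(query, names, cutoff=0.8):
    """Return index entries that are merely similar to the query."""
    return get_close_matches(query, names, n=3, cutoff=cutoff)


print(exact_search("Johnson", index))  # the mis-indexed record is missed
print(fuzzy_search("Johnson", index))  # a similarity search can still surface it
```

The point is less about the code than about the dependency: whatever I type into the search box is only ever compared against what the indexer wrote down, not against the original page.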

 

4) What are you not seeing on the screen?

5) What is lost in how it was/is rendered?

6) How was this created, managed and used and how does that impact what one can say about it?

(I’ll answer these three together) As I’ve already discussed, what is not seen on the screen is the second page. Perhaps someone at Ancestry weighed the value of knowing how many cows your ancestor owned, and decided it was more cost-effective not to scan the second page. Additionally, the majority of the people enumerated in the 1861 census didn’t receive any “Remarks.” But not including this information, or at least not making it known to those searching that a second page exists, is extremely problematic, particularly for individuals who are using Ancestry to prove Aboriginal heritage. Furthermore, what is not seen on Ancestry (or on Library and Archives Canada for that matter) are the motives and manner in which the census was collected, who the enumerator was, who the indexer was, etc.

A lot of food for thought heading into the project!

 

[1] Christine Garrett, “Genealogical Research, Ancestry.com, and Archives,” (master’s thesis, Auburn University, 2010), 28-9.

[2] Brenda Dougall Merriman, Genealogy in Ontario: Searching the Records (Toronto: Ontario Genealogical Society, 2013), 17.

The Tenement Museum, NYC

Two weekends ago (during the snowpocalypse) I went to The Tenement Museum in New York City.

(Does the Tenement Museum sound familiar? That’s because I name dropped them in this old blog post.)

The museum is tucked away in the Lower East Side of Manhattan, historically an immigrant neighbourhood. While the museum is only 24 years old, the actual building was constructed in 1863 when the area was known as Little Germany, or Kleindeutschland. Over its lifetime, the building has been home to approximately 7000 residents.

Dedicated to sharing the stories of these residents, the museum offers a variety of tours. The tour I went on takes visitors inside the building, showing them the apartments where families lived. Guides share the stories of residents, pieced together from archival documents, genealogy research, and oral history interviews. Rather than providing all the answers, guides prompt visitors to use their imagination and put themselves in the shoes of the residents.

The building itself is a character in each of the stories highlighted in the tours. Guides call attention to the visible layers of history that can be seen by visitors, including layers of wallpaper and flooring. Having sat vacant for almost 50 years, the residential portion of the building has been left almost untouched by museum staff.

At the end of each tour, visitors are asked to reflect on the evolution of the families that lived in the tenement. Residents struggled to survive, but ultimately got their slice of the American Dream for their children and grandchildren. Visitors are asked to think on the situation of present immigrants in the United States and to compare them to the stories they’ve heard.

This final exchange is where The Tenement Museum truly shines as an exceptional institution. Rather than simply leaving each visitor thinking “What a great story”, they encourage visitors to think long and hard about their interactions. From this reflection, hopefully visitors will achieve a new degree of compassion and empathy in their lives; rather than seeing people whose customs and language are foreign, they’ll see the seeds of future generations and thriving citizens.

A big lesson from a little museum.

To learn more about The Tenement Museum, visit their website, or their blog.

A note on pre-requisite tutorials

by emilykkeyes (01-29-2016)

Having now managed to get a few Programming Historian tutorials under my belt (with great difficulty), I have a suggestion to make.

Almost all of the Programming Historian tutorials I’ve completed refer back to previous Programming Historian tutorials, suggesting that the reader/user complete those prior to completing the current one. This in itself is fine. Obviously, to move forward, the idea is to build on previous knowledge.

The problem arises when you get to the pre-requisite tutorial, and IT recommends you complete a pre-requisite tutorial. For example, in Data Mining the Internet Archive Collection, they explain that you will need something called pip. They recommend that you download this using the instructions in the Installing Python Modules with pip tutorial. Okay, fine. But then when you get to Installing Python Modules with pip, they explain that the easiest way to install pip is by using a Python program... what if I don’t have Python installed? Or even know what Python is? Well, there’s a Programming Historian tutorial for that, which is great. But what ends up happening is that I spend the time I should have spent on my first tutorial completing this other tutorial, and then have to work my way back.
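Now that I do have Python, the chain I fell into can be written down as a small dependency graph. This is just a sketch: the first two titles are the tutorials named above, while the third entry is a placeholder I made up because I am not naming the specific Python introduction here. `graphlib.TopologicalSorter` (standard library, Python 3.9 or later) works out the order you would actually have to complete them in.

```python
from graphlib import TopologicalSorter

# Each tutorial maps to the set of tutorials it sends you to first.
# The last title is a hypothetical placeholder, not a real tutorial name.
prerequisites = {
    "Data Mining the Internet Archive Collection": {
        "Installing Python Modules with pip",
    },
    "Installing Python Modules with pip": {
        "Introduction to Python (placeholder title)",
    },
    "Introduction to Python (placeholder title)": set(),
}

# static_order() yields every tutorial after all of its prerequisites.
reading_order = list(TopologicalSorter(prerequisites).static_order())

for step, title in enumerate(reading_order, start=1):
    print(f"{step}. {title}")
```

The tutorial I actually sat down to do ends up last in the list, which is exactly how it felt.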

My advice would be that each tutorial be self-contained. Obviously, this can’t be the case for every one, but by continually sending users to other tutorials you risk losing them. As one of the targeted users for the Programming Historian tutorials, I likely would have given up if I hadn’t been required to complete it for class. Harsh, but true.

Otherwise, keep up the good work, Programming Historians!

Automated Downloading with Wget

Programming Historian Tutorial, by Ian Milligan
by emilykkeyes (01-27-2016)

Each week in #hist5702w, we’re assigned tutorials/readings. The tutorials usually come from The Programming Historian, a website that offers tutorials on digital tools/techniques for historians/individuals.

This week’s tutorial focused on a tool called Wget, which can help you download online material. I was very excited about how I might be able to use this tool, since in my part-time work with Know History we often need to download a large number of historical documents (in most cases census records, birth records etc.). Usually in these cases the task of copying and saving these files falls to me. Not only is this usually tedious, but there is a huge margin for error, not to mention the decisions that need to be made on how these documents should be filed and stored once they are downloaded.

Just looking at the tutorial, I already appreciated how it was broken down for Mac and Windows users. In this course I’ve been cautioned that Windows is very different from (i.e. more problematic than) Mac.

The first issue I ran into was the downloading instructions. Admittedly, this was more of a reader/user error. Eagerly following the link to download Wget, I was immediately confused with all the download options:

So many options, what version do I need?!

Luckily, per the #hist5702w mantra, I turned to my classmates (shout out to Laurel), and the problem was easily solved. What I needed was just wget.exe.

Second problem, again a great reader/user problem! When downloading wget.exe, I did not put it in the right directory. Instead of putting it in C:\Windows, I left it in C:\. This would cause problems for me later, forcing me to start at the top of the tutorial and read carefully again (something I’m growing increasingly familiar with doing... there’s a lesson to be learned there, but maybe I just need to be hit with it a few more times). Again, shout out to Laurel for bringing this error to my attention.
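As I understand it, the directory matters because the command line only looks for programs in the folders listed in the PATH environment variable, and C:\Windows is normally one of them. A quick way to check whether a program can be found is Python's `shutil.which`. This is a minimal sketch, with a helper name (`locate`) of my own choosing; what it prints depends entirely on the machine it runs on.

```python
import shutil


def locate(program):
    """Return the full path to program if it can be found on PATH, else None."""
    return shutil.which(program)


path = locate("wget")
if path is None:
    print("wget was not found on PATH; move wget.exe into a PATH folder.")
else:
    print(f"wget found at {path}")
```

Had I run something like this before starting, the misplaced wget.exe would have shown up right away instead of several steps into the tutorial.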

So, finally armed and ready with wget.exe, I proceeded to input the commands using PowerShell. PowerShell was used in the Command Line Bootcamp tutorial that was recommended at the beginning of the Wget tutorial, so I went with that. (Side bar: that was a great tutorial. Really clear and easy to follow along.)

Moving along, I was fine, right up until this appeared on my screen:

…well this doesn’t look like what the tutorial said it would.

Again, I turned to my trusty DH guide, and Laurel stepped in. Deciding to give PowerShell the proverbial finger, Laurel switched me to Command Prompt. Repeating the steps, things worked fine, and I finally completed the tutorial. Yahoo!

Thoughts on Wget: It seems like it could be an immensely useful program for downloading journal articles, historical documents etc. However, I’m curious to see what kinds of problems you’d encounter trying to download from an archival repository such as Library and Archives Canada. Do they have safeguards in place? Also, what are the implications of ripping documents from their archival context?
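One safeguard a site can publish is a robots.txt file, which tells automated tools which paths they are asked to stay out of and how long to wait between requests. Here is a minimal Python sketch using the standard library's `urllib.robotparser`. The rules and the `archive.example` address are invented for illustration; they are not the actual rules of Library and Archives Canada or any other repository.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt content for a hypothetical archive website.
sample_robots = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
""".splitlines()

rules = RobotFileParser()
rules.parse(sample_robots)


def may_download(url, agent="*"):
    """Return True if the sample rules allow this agent to fetch the URL."""
    return rules.can_fetch(agent, url)


print(may_download("https://archive.example/census/1861/page1.jpg"))
print(may_download("https://archive.example/private/staff-notes.pdf"))
print("Requested delay between downloads:", rules.crawl_delay("*"), "seconds")
```

A file like this only expresses the site's wishes; it does not answer the larger question of what gets lost when a document is pulled out of its archival context.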

Complete the Wget tutorial here.