The majority of our readings will be available online or through a digital course packet. We will read a few books, however, which you will need to purchase:
- Stephen Ramsay, Reading Machines: Toward an Algorithmic Criticism, University of Illinois Press, 2011.
For learning R, we’ll primarily use two open-source (and both in-progress) textbooks. Because I expect you to need these frequently, I’ve added quick links to them in the class website sidebar:
- Lincoln Mullen, Digital History Methods in R
- Garrett Grolemund and Hadley Wickham, R for Data Science
There are a few other R books worth knowing about, though we won’t be using them directly in this class. If you’d like to expand your reference set, these would be the next books to get:
- Taylor Arnold and Lauren Tilton, Humanities Data in R: Exploring Networks, Geospatial Data, Images, and Text, Springer, 2015.
- Matthew Jockers, Text Analysis with R for Students of Literature, Springer, 2014.
As a phrase, “humanities data analysis” has two distinct valences: it could mean the (computational) analysis of humanistic data OR the humanistic analysis of data. In this “Humanities Data Analysis” class you will do both:
Our readings this semester will help us think through how computational data analysis techniques might help us understand typical objects of humanistic study, such as books, newspapers, magazines, or journals, while also historicizing and theorizing data itself from humanistic perspectives. You will also engage with these questions through five blog post responses published throughout the semester (at least two before the midterm). Each of you will lead our class discussion once during the semester. That week you will be responsible for both the required and optional readings listed on the syllabus, and should come to class with a set of guiding questions or observations for our discussion.
You will be expected to apply data analysis techniques throughout the semester, both to data I will provide and, later in the semester, to data you will bring to the class. These exercises will take various forms. In some cases you will be completing exercises from our R textbooks and online resources, and in others I will provide you with exercises or problem sets. See the section on coding below for more on these expectations.
Through the semester you will be conceptualizing, planning, and executing a project, either exploring an interesting facet of a course dataset or, better, employing a dataset related to your own research interests. We will talk more about these projects throughout the term, but in general they will blend humanities and computational methodologies, and will include both code and written reflection on that code’s effects.
In this class, you will think about coding and you will have to do some coding. If you’ve never coded before, this will be frustrating from time to time. In fact, if you’ve done a lot of coding before, it will still be frustrating from time to time!
Since at least Stephen Ramsay’s “pithy, underdeveloped position paper at the ‘History and Future of Digital Humanities’ panel at the 2011 MLA”, the question of whether humanists should code has been a vexed one in the digital humanities. In this course we won’t dwell on these debates, except to say that the answer to “should I learn to code?” is almost always, “what is your research question?” This course will presume that your research questions involve either the analysis of data (in which case coding may be the only way to realize your specific vision) or building resources other scholars might want to analyze (in which case you should know the kinds of things sophisticated users will want to do with your tools, so you can make them work better). In other words, this course will not argue that every humanist needs to learn to code, but it presumes you might.
I certainly do not expect anyone to come out of this class a full-fledged developer, nor could I teach you how to become one. We’ll focus on building skills less in “programming” than in “scripting.” That means instructing a computer in every stage of your workflow, which often involves tweaking code written by others rather than starting from scratch. I hope that by doing some scripting, you’ll come to see that debates over learning to code brush over a lot of intermediate stages and flatten a range of skills into a simple binary (pun intended) achievement.
Even scripting will require you to use a programming language rather than a Graphical User Interface (GUI), the kind of interface behind almost every program you’ve likely used before. Using a language takes more time at first, but it has some distinct advantages over working in a GUI:
- Your work is saved and open to inspection.
- If you discover an error, you can correct it without losing the work done after the error was made.
- If you want to perform the same analysis on a different or larger body of material (a hundred books instead of ten, for instance), you need to alter the code only slightly.
- Perhaps most importantly, working in a programming language will help you better understand the step-by-step processes involved in computational analysis, including the computational analyses that underlie GUIs. Doing this work should help you be more aware of how computers think (or, better, how people think with computers). Even if you never touch a line of code after leaving this class, I hope the experience will make you a more thoughtful and critical user of all sorts of programs hereafter.
Save for a few brief detours, we’ll be working almost entirely in the “R” language, developed specifically for statistical computing. This has three main advantages for the sort of work that historians, literary scholars, and other humanists do:
- R is easy to download and install, and is easiest to work with through the program RStudio. You should get both installed on your computer as soon as possible, as we’ll start using them the second week of class. RStudio makes it easy to write scripts and test your results step by step, and it offers a number of features that make it easier to explore data interactively. Of anything this side of Excel, R also takes the least time to get from raw data to pretty plots.
- R has lots of packages we can use for data analysis, such as dplyr, tidyr, and ggplot2. These are not part of base R, but they are widely used and offer the most intellectually coherent approach to data analysis and presentation of any computing framework in existence. That means that even if you don’t use these particular tools in the future, working with them should help you develop a more coherent way of thinking about what data is from the computational side, and what you as a humanist might be able to do with it. These tools are rooted in a long line of software designed to make it easy for individuals to manipulate data: read the optional source on the history of database populism to see more. The ways of thinking you get from this will serve you well in thinking about relational databases, structured data for archives, and a welter of other sources.
- R is free: both “free as in beer,” and “free as in speech,” in the mantra of the Free Software Foundation. That means that R, like the rest of the peripheral tools we’ll talk about, won’t suddenly become inaccessible if you lose a university affiliation.
- It’s a pirate’s favorite programming language (give it a second). Pirates are important historical and literary figures.
Other Languages and Software
If you have a background in another programming language (Python is quite common), you can talk with me about using it in certain situations during the semester: either to perform a task it is better suited to than R or to accomplish something beyond the capabilities of a novice R scripter. If you think another language will be essential for something you want to do in this class, let me know and we can discuss.
I also won’t entirely rule out your using an out-of-the-box tool such as GIS software, perhaps to display data you first processed in R. However:
- A GUI tool should only be used as part of a final step, and only to supplement or enhance work you have done in R.
- I cannot promise any support for work using a tool not explicitly taught in class. I suppose I can’t absolutely promise help for all the things you might do in R, either, but if you want to step outside the class toolset, do so only with a tool you are already comfortable using and troubleshooting yourself.