I am committed to a flexible schedule that meets your learning goals. You will see that I provide an outline for the semester, but I expect the precise readings to evolve as we go. Indeed, one of your assignments will require each of you to take the lead in discussion for one week, which may involve adjusting that week’s readings. Our topic for the last week of class will be decided by vote.
In summary: the schedule below may change, so check it before beginning a week’s readings. I commit to finalizing the readings for any given Thursday by the end of the previous Thursday (essentially, I reserve the right to make some tweaks immediately after class). For you, that means thinking well ahead of time about any adjustments you want to make during your week, so that I can update the schedule in time.
Reading the Schedule
Each week’s class will blend discussion of assigned readings with hands-on practicum in a given method of data analysis within the R programming environment. At the end of each session, you will be given practice exercises for developing your proficiency with the week’s methods. In reading the schedule for any particular week, then,
- Core Readings are the articles, books, or web resources you are responsible for reviewing before class.
- Penumbral Readings are pieces pertinent to the week’s topic that will expand your understanding if you are so inclined (plus they’re good places to start additional research for your Notebooks and final papers). I would encourage you to choose one Penumbral reading each week that looks interesting and at least browse it.
- Dependencies outline the applications, packages, or other software you should install, any datasets you should download, and any technical tutorials you should complete before class.
- Practicums describe the technical elements we will work on in class.
- Exercises describe resources that will be useful as you complete your practice exercises and problem sets following class. If a specific outcome is not outlined, you should complete your exercises in R or RMD files for submission and/or later in-class reference. We will discuss how to do this in the first week of class, so do not worry if these abbreviations mean nothing to you now.
Readings not freely available online are available in a password-protected, zipped course packet. I will give out the password in class on day 1.
%>% January 12
%>% What Is HDA and What Might It Be?
The articles in Debates in the Digital Humanities 2016’s “Forum: Text Analysis at Scale”. Important Note: I realize this looks like a crazy-long first-day assignment, but these are all position papers, each the length of a short blog post. Together they add up to about one typical article’s worth of brilliant thoughts that should give us much to discuss.
- Matthew K. Gold and Lauren F. Klein, “Introduction”
- Stephen Ramsay, “Humane Computation”
- Ted Underwood, “Distant Reading and Recent Intellectual History”
- Tanya Clement, “The Ground Truth of DH Text Mining”
- Lisa Marie Rhody, “Why I Dig: Feminist Approaches to Text”
- Tressie McMillan Cottom, “More Scale, More Questions: Observations from Sociology”
- Benjamin M. Schmidt, “Do Humanists Need to Understand Algorithms?”
- Joanna Swafford, “Messy Data and Faulty Tools”
- Alan Liu, “N + 1: A Plea for Cross-Domain Data in the Digital Humanities”
- Dan Cohen, “Searching for the Victorians” (4 October 2010)
- Cecily Carver, “Things I Wish Someone Had Told Me When I Was Learning How to Code” (22 November 2013)
- Ted Underwood, “Seven Ways Humanists are Using Computers to Understand Text” (4 June 2015)
Dependencies: A text editor with full regular expression capabilities.
- For Macs, a good one is TextWrangler or Atom.
- For Windows, Notepad++ (not Notepad, which you most likely already have).
- For Linux, gedit or Atom.
Practicum: regular expressions
Exercises: RegEx problem sets (available through course website)
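To preview the kind of pattern matching we’ll practice, here is a minimal sketch in base R; the same regular expressions work in the editors listed above. The bibliography-style lines are invented for illustration.

```r
# Invented bibliography lines for illustration
lines <- c("CORDELL, Ryan. 1982.", "SMITH, Jane. 1975.")

# Match lines that begin with an all-caps surname followed by a comma
grepl("^[A-Z]+,", lines)

# Capture groups let us reorder surname, given name, and year:
# "CORDELL, Ryan. 1982." becomes "Ryan CORDELL (1982)"
sub("^([A-Z]+), (\\w+)\\. (\\d{4})\\.$", "\\2 \\1 (\\3)", lines)
```

Small transformations like this, applied across thousands of lines, are where regular expressions earn their keep in data cleaning.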
%>% January 19
- Louis T. Milic, “The Next Step,” Computers and the Humanities 1.1 (1966)
- Roberto Busa, “Why a Computer Can Do So Little,” ALLC Bulletin 4.1 (1976)
- Yohei Igarashi, “Statistical Analysis at the Birth of Close Reading,” New Literary History 46.3 (2015)
- Cameron Blevins, “Digital History’s Perpetual Future Tense,” Debates in the Digital Humanities 2016
- Rosanne G. Potter, “Literary Criticism and Literary Computing: The Difficulties of a Synthesis,” Computers and the Humanities 22.2 (1988)
- Susan Hockey, “The History of Humanities Computing”, A Companion to Digital Humanities (2004)
- Michael S. Mahoney, “The Histories of Computing(s),” Interdisciplinary Science Reviews 30.2 (2005)
- Ted Underwood, “Theorizing Research Practices We Forgot to Theorize Twenty Years Ago,” Representations 127.1 (Summer 2014)
- Install RStudio on your computer. Note that on some systems, this also requires you to install R itself. Please attempt this early in the week so that we can help you if anything goes awry.
- If you’re feeling ambitious, work through Taylor Arnold and Lauren Tilton’s “Basic Text Processing In R” at the Programming Historian and browse Lincoln Mullen’s “Introduction” to his book-in-progress, Digital History Methods in R. We will use Mullen’s exercises throughout the semester; hereafter his book is designated as DHMR. To follow along with much of DHMR, you will need Mullen’s data, which he provides in the book’s GitHub repository.
%>% January 26
%>% Humanistic Data
- Michael Witmore, “Text: A Massively Addressable Object”, Debates in the Digital Humanities, University of Minnesota Press (2012)
- Katie Rawson and Trevor Muñoz, “Against Cleaning”, Curating Menus, 7 July 2016.
- Melissa Terras and Julianne Nyhan, “Father Busa’s Female Punch Card Operatives,” Debates in the Digital Humanities 2016, University of Minnesota Press (2016)
- Frederick W. Gibbs, “New Forms of History: Critiquing Data and Its Representations,” The American Historian (February 2016)
- Daniel Rosenberg, “Data Before the Fact,” “Raw Data” Is an Oxymoron, MIT Press (2013)
- Ellen Gruber Garvey, “facts and FACTS: Abolitionists’ Database Innovations,” “Raw Data” Is an Oxymoron, MIT Press (2013)
- David Mimno, “Data Carpentry” (2015)
- Michael Hancher, “Re: Search and Close Reading,” Debates in the Digital Humanities 2016, University of Minnesota Press (2016)
- Sarah Allison, “Other People’s Data: Humanities Edition,” CA: Journal of Cultural Analytics (2016)
Dependencies: Install the ‘tidytext’ and ‘tidyverse’ R packages
Practicum: Data frames and tibbles
- DHMR, “Working with Data” sections
- R For Data Science (hereafter RDS), “Data Transformation” and “Exploratory Data Analysis”
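As a taste of what we’ll build in class, here is a minimal sketch of a tibble and a simple transformation; the titles and dates are just illustrative data.

```r
library(tibble)
library(dplyr)

# A tibble is the tidyverse's take on the data frame
novels <- tibble(
  title = c("Moby-Dick", "Uncle Tom's Cabin", "Ruth Hall"),
  year  = c(1851, 1852, 1854)
)

glimpse(novels)              # compact structural summary
filter(novels, year > 1851)  # keep only rows matching a condition
```

The payoff of keeping data in this tidy, one-observation-per-row shape is that every tool we learn later (plotting, modeling, mapping) expects exactly this structure.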
%>% February 2
%>% Exploratory Data Analysis
- D. Sculley and Bradley M. Pasenek, “Meaning and Mining: the Impact of Implicit Assumptions in Data Mining for the Humanities,” Literary and Linguistic Computing 23.4 (2008)
- Hadley Wickham, “The Split-Apply-Combine Strategy for Data Analysis”, Journal of Statistical Software 40.1 (2011)
- Browse Micki Kaufman, “Quantifying Kissinger” posts (2014-2015)
- Ryan Cordell, “Reprinting, Circulation, and the Network Author in Antebellum Newspapers”, American Literary History 27.3 (August 2015) with accompanying methods paper by David Smith, Ryan Cordell, and Abby Mullen, “Computational Methods for Uncovering Reprinted Texts in Antebellum Newspapers”
- John T. Behrens, “Principles and Procedures of Exploratory Data Analysis,” Psychological Methods 2.2 (1997)
- Bryan Santin, Daniel Murphy, and Matthew Wilkens, “Is or Are: The ‘United States’ in Nineteenth-Century Print Culture,” American Quarterly 68.1 (March 2016)
- Sarah Wilson, “Black Folk by the Numbers: Quantification in Du Bois,” American Literary History 28.1 (2016)
Dependencies: Install the ‘dplyr’ R package
Practicum: More work with tabular data
- RDS, “Tibbles,” “Data Import,” and “Tidy Data”
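Wickham’s split-apply-combine strategy maps directly onto dplyr’s `group_by()` and `summarise()`. A minimal sketch, with invented circulation figures:

```r
library(dplyr)

# Invented newspaper circulation figures for illustration
papers <- tibble::tibble(
  city        = c("Boston", "Boston", "New York", "New York"),
  circulation = c(1200, 800, 2000, 1500)
)

papers %>%
  group_by(city) %>%                   # split the rows by city
  summarise(total = sum(circulation))  # apply sum, combine into one row per city
```

The same three-step pattern scales from this toy table to millions of rows without changing shape.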
%>% February 9
%>% Snow Day
%>% February 16
Discussion Leader: Matthew Hitchcock
- Johanna Drucker, “Humanities Approaches to Graphical Display,” Digital Humanities Quarterly 5.1 (2011)
- Lauren F. Klein, “The Image of Absence: Archival Silence, Data Visualization, and James Hemings,” American Literature 85.4 (2013)
- Catherine D’Ignazio and Lauren F. Klein, “Feminist Data Visualization” (2016)
- Hadley Wickham, “A Layered Grammar of Graphics,” Journal of Computational and Graphical Statistics 19.1 (2010)
- Lev Manovich, “What Is Visualisation?” Visual Studies 21.1 (March 2011)
- Shaowen Bardzell, “Feminist HCI: Taking Stock and Outlining an Agenda for Design” (CHI 2010)
Dependencies: Install the ‘ggplot2’ R package
Practicum: The plots thicken
- DHMR, “Plotting” and “Interactive Plotting”
- RDS, “Data Visualization”
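A sketch of Wickham’s layered grammar in practice, using ggplot2’s built-in `mpg` data so it runs as-is:

```r
library(ggplot2)

# Each + adds a layer: data and aesthetics, then geometries, then labels
p <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Engine displacement (litres)", y = "Highway miles per gallon")

print(p)  # render the plot
```

Notice that the plot is an object you can build up, modify, and save; that composability is what “grammar” means here.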
%>% February 23
Discussion Leader: Gregory Palermo
- Anna Lowenhaupt Tsing, “On Nonscalability: The Living World Is Not Amenable to Precision-Nested Scales,” Common Knowledge 18.3 (2012)
- Andrew Piper, “Novel Devotions: Conversional Reading, Computational Modeling, and the Modern Novel,” New Literary History 46.1 (2015)
- Julia Flanders and Fotis Jannidis, “Data Modeling,” A New Companion to the Digital Humanities (Wiley Blackwell, 2016).
- Willard McCarty, “Knowing…: Modeling in Literary Studies,” A Companion to Digital Literary Studies (2008)
- Molly O’Hagan Hardy, “‘Black Printers’ on White Cards: Information Architecture in the Data Structures of the Early American Book Trades,” Debates in the Digital Humanities 2016
Practicum: Text analysis (led by Fitz)
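For a sense of the tidy approach to text we’ll use in this practicum, here is a minimal sketch with the tidytext package installed earlier in the semester; the one-sentence “corpus” is invented.

```r
library(dplyr)
library(tidytext)

# An invented one-document "corpus"
texts <- tibble::tibble(
  doc  = "example",
  text = "The whale, the whale! A grand, ungodly, god-like man."
)

counts <- texts %>%
  unnest_tokens(word, text) %>%           # one lowercase word per row
  anti_join(stop_words, by = "word") %>%  # drop common function words
  count(word, sort = TRUE)                # term frequencies
counts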
%>% March 2
%>% Topic Models
- Ted Underwood, “Topic Modeling Made Just Simple Enough” (7 April 2012)
- Lisa Marie Rhody, “Topic Modeling and Figurative Language,” Journal Of Digital Humanities 2.1 (Winter 2012)
- Lauren Klein, “The Carework and Codework of the Digital Humanities” (May 2015)
- Benjamin M. Schmidt, “Words Alone: Dismantling Topic Models in the Humanities,” Journal Of Digital Humanities 2.1 (Winter 2012)
- If the principles of topic modeling are still unclear, I highly recommend this recorded presentation: David Mimno, “The Details: Training and Validating Big Models on Big Data” (starting in earnest at 3:30)
- Robert K. Nelson and Digital Scholarship Lab, University of Richmond, “Mining the Dispatch” Project (2011)
- Matthew Jockers, “The LDA Buffet is Now Open; or, Latent Dirichlet Allocation for English Majors” (29 September 2011)
- Scott Weingart, “Topic Modeling for Humanists: A Guided Tour” (25 July 2012)
- David M. Blei, “Topic Modeling and Digital Humanities,” Journal Of Digital Humanities 2.1 (Winter 2012)
- Megan R. Brett, “Topic Modeling: A Basic Introduction,” Journal Of Digital Humanities 2.1 (Winter 2012)
- Andrew Goldstone and Ted Underwood, “The Quiet Transformations of Literary Studies: What Thirteen Thousand Scholars Could Tell Us” New Literary History 45.3 (June 2014), with online supplement
Dependencies: Install the ‘mallet’ R package
Practicum: Topic modeling with RMallet
- DHMR, “Topic Modeling”
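A hedged sketch of the RMallet workflow we’ll walk through in class. The function names come from the mallet package, but the `docs` data frame (with `id` and `text` columns) and the `stoplist.txt` file are hypothetical stand-ins for your own corpus, so this will not run as-is.

```r
library(mallet)  # requires a working Java installation

# docs: a hypothetical data frame with `id` and `text` columns
instances <- mallet.import(docs$id, docs$text, "stoplist.txt")

topic_model <- MalletLDA(num.topics = 20)
topic_model$loadDocuments(instances)
topic_model$train(200)  # number of Gibbs sampling iterations

# Inspect the top words in the first topic
word_weights <- mallet.topic.words(topic_model, smoothed = TRUE, normalized = TRUE)
mallet.top.words(topic_model, word_weights[1, ], 10)
```

The number of topics and iterations are choices you must justify, not defaults to trust; the readings above give you language for doing so.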
%>% March 16
Discussion Leader: Lara Rose Roberts
- Jo Guldi, “What Is the Spatial Turn?”, “The Spatial Turn in History”, and “The Spatial Turn in Literature”. Feel free to read any of the other subject articles that interest you.
- Benjamin Schmidt, “Reading Digital Sources: a Case Study in Ship’s Logs” (15 November 2012). Review the visualizations, and look also at some of the linked shorter posts from the shipping logs series.
- Cameron Blevins, “Space, Nation, and the Triumph of Region: A View of the World from Houston,” Journal of American History 101.1 (2014) with accompanying digital exhibit, “Mining and Mapping the Production of Space”
- Richard White, “What Is Spatial History?” (1 February 2010)
- Matthew Wilkens, “The Geographic Imagination of Civil War-Era American Fiction,” American Literary History 25.4 (2013)
Dependencies: Install the ‘ggmap’ R package
Practicum: Mapping in R
- DHMR, “Mapping”
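A hedged sketch of mapping point data with ggmap. The `places` data frame (with `lon`, `lat`, and `mentions` columns) is hypothetical, and tile sources change over time, so treat this as the shape of the workflow rather than a recipe.

```r
library(ggmap)

# Fetch a base map of the continental US from a tile server
basemap <- get_stamenmap(
  bbox = c(left = -125, bottom = 24, right = -66, top = 50),
  zoom = 4
)

# places: hypothetical data frame of geocoded place mentions
ggmap(basemap) +
  geom_point(aes(x = lon, y = lat, size = mentions), data = places)
```

Because `ggmap()` returns an ordinary ggplot object, everything from the visualization weeks (layers, scales, labels) applies here unchanged.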
%>% March 23
%>% Vector Space Models
Discussion Leader: Thanasis Kinias
- Sarah Allison, Ryan Heuser, Matthew L. Jockers, Franco Moretti, and Michael Witmore, Quantitative Formalism: an Experiment, Stanford Literary Lab Pamphlet 1 (15 January 2011)
- Michael A. Gavin, “The Arithmetic of Concepts: a response to Peter de Bolla” (18 September 2015)
- Benjamin M. Schmidt, “Vector Space Models for Digital Humanities” (25 October 2015) and “Rejecting the gender binary: a vector-space operation” (30 October 2015)
- Benjamin M. Schmidt, “Plot Arceology: a vector-space model of narrative structure” (2015)
- Ryan Heuser, “Word Vectors in the Eighteenth Century, Episode 1: Concepts” and “Episode 2: Methods”
Dependencies: Install the ‘wordVectors’ R package
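A hedged sketch of the wordVectors workflow (the package lives on GitHub, so it installs via devtools; `corpus.txt` is a hypothetical plain-text corpus, so this will not run without your own data):

```r
# install.packages("devtools"); devtools::install_github("bmschmidt/wordVectors")
library(wordVectors)

# Train a word2vec model on a plain-text corpus (hypothetical file)
model <- train_word2vec("corpus.txt", "corpus_vectors.bin",
                        vectors = 100, window = 12)

# Which words occupy nearby positions in the vector space?
closest_to(model, "whale", n = 10)

# Vector arithmetic of the kind Schmidt and Heuser describe
closest_to(model, model[["king"]] - model[["man"]] + model[["woman"]])
```

The arithmetic in the last line is exactly the operation Schmidt performs in his “Rejecting the gender binary” post; the readings this week are about what such operations do and do not mean.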
%>% March 30
%>% Classifying and Clustering
Discussion Leader: Cara Messina
- Lena Hettinger, Martin Becker, Isabella Reger, Fotis Jannidis, and Andreas Hotho, “Genre Classification on German Novels” (2015)
- Hoyt Long and Richard Jean So, “Literary Pattern Recognition: Modernism between Close Reading and Machine Learning,” Critical Inquiry 42.2 (2016)
- Ted Underwood and David Bamman, “The Gender Balance of Fiction, 1800-2007” (28 December 2016)
- Cameron Blevins, “Still Playing Catch-Up” (3 March 2014)
- Annie Swafford, “Why Syuzhet Doesn’t Work and How We Know” (30 March 2015)
- Review Eileen Clancy’s Storify of the entire Syuzhet debate (primarily between Matthew Jockers and Annie Swafford) and at least browse a few of the central blog posts
- Tanya Clement and Stephen McLaughlin, “Measured Applause: Toward a Cultural Analysis of Audio Collections,” CA: Journal of Cultural Analytics (23 May 2016)
- Andrew Piper, “Fictionality,” CA: Journal of Cultural Analytics (20 December 2016)
%>% April 6
%>% Working groups
Prof. Cordell away for Rhode Island Humanities Festival
%>% April 13
%>% Ludic Data Analysis
Discussion Leader: David Medina
- Stephen Ramsay, Reading Machines, University of Illinois Press (2011)
- Bethany Nowviskie, “Ludic Algorithms,” Pastplay: Teaching and Learning History with Technology, University of Michigan Press (2014)
- Stephen Ramsay, “The Hermeneutics of Screwing Around; or What You Do with a Million Books,” Pastplay: Teaching and Learning History with Technology, University of Michigan Press (2014)
- David L. Hoover,“Argument, Evidence, and the Limits of Digital Literary Studies”, Debates in the Digital Humanities 2016
Practicum: Screwing around with data