Teaching History with XML/TEI: A Contribution to Liberal Education

During the discussion period of a NITLE webinar I participated in last week, a member of the audience asked me why we choose to use eXtensible Markup Language (XML) compatible with the Guidelines of the Text Encoding Initiative (TEI) in the Wheaton College Digital History Project.*  And I think a response to that question merits a post here since I use this blog as a space to offer information about digital humanities methods and their use in digital history.  I focus here on the practice as part of my work as an educator.  In a future post, I will speak to the question of using XML/TEI in historical scholarship.

Fundamentally, using XML/TEI in a teaching project like ours gives students a chance to learn something about the digital tools we use every day.  I think this kind of opportunity is an important component of liberal education as defined by the Association of American Colleges and Universities (AAC&U):

a philosophy of education that empowers individuals with broad knowledge and transferable skills, and a strong sense of value, ethics, and civic engagement.

Because I am a historian, I understand the broad knowledge and transferable skills referred to in this definition as contextual, as dependent on time and place.  So in my view, the technological developments of the past twenty or so years have created for those of us who live and work in the United States a culture so mediated by digital devices of various types that a basic understanding of those devices has become an essential part of a liberal education.

That is, I think it is part of my responsibility as an educator to help students understand the laptops and tablets and smart phones of our daily lives as comprehensible machines because we use them both to consume and produce the stuff of our culture.  I also think that a minimal understanding of how those devices work empowers students to put their values and ethics to use in the form of civic engagement and other elements of a fulfilling human life.

Now, this does not mean that I think I need to teach my students to become programmers or even that I think I need to be a programmer.  My colleagues in computer science teach students programming and machine structures and computational thinking.  And those colleagues are better able than I to speak to larger questions of the strengths and weaknesses of XML from those perspectives.

I am a historian, and my main goal in using XML/TEI in my teaching is to give students an opportunity to spend time with primary sources in a particular kind of way that is facilitated by using these tools.  But before I explain this point, I want to say just a bit more about the value of knowing at least a little bit about XML as an educated citizen of our world.  And that requires defining XML without getting too technical.  So here goes.

XML stands for eXtensible Markup Language.  You can look it up on Wikipedia, which also has a more general entry on markup language.  But those entries go into a lot of historical and somewhat technical detail.**  Boiled down to essentials, there are only a few things that make knowing a little bit about XML a useful thing at our moment in time and place:

  • A lot of the applications that we use every day store our data using XML.  If you use Microsoft Office (Word, or Excel, or PowerPoint) or analogous applications from OpenOffice.org or Apple iWork, when you save your work, the application preserves your work in XML.
  • XML is commonly used not only for storing data but also for its exchange over the Internet.

So XML is all around us.  We use it all the time.  And so do professionals who specialize in storing and accessing information.

  • XML is a very stable format for storing data and metadata (that is, information about information).
  • XML is so stable that it is a preferred archival format among libraries and other cultural heritage organizations all over the world.  This means that even if the applications you use now disappear, new software can be written to display your information on whatever new generations of devices exist at the time.
  • XML is built to be used internationally, with the facility to include characters from virtually any writing system.  So you can store data that uses Chinese logograms or Cyrillic characters; you need not confine your language to English or French or some other European language.

So, XML is one of the important building blocks of the way we store and exchange information every day.  We don’t usually think about it, but it underlies a lot of what we do, and thus we can say that knowing a bit about it could be part of the broad knowledge and transferable skills that make up a liberal education.
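
For readers who would like to see what this looks like in practice, here is a minimal sketch in Python, using only the standard library.  The little file and its tag names (notes, note, lang) are invented for illustration; they are not drawn from any real application or from our project.  The sketch shows the two points made above: an XML file is plain, labeled text that any program can read back, and it can hold Chinese and Cyrillic characters alongside English ones.

```python
import xml.etree.ElementTree as ET

# A made-up XML record. The tag and attribute names are hypothetical,
# chosen only to show labeled text in several writing systems.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<notes>
  <note lang="zh">歷史</note>
  <note lang="ru">история</note>
  <note lang="en">history</note>
</notes>"""


def read_notes(xml_text):
    """Return a dict mapping each note's language code to its text."""
    # Encode to bytes so the parser honors the declared UTF-8 encoding.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {note.get("lang"): note.text for note in root.findall("note")}


notes = read_notes(SAMPLE)
for lang, text in notes.items():
    print(lang, text)
```

Nothing here depends on the program that first wrote the file, which is the sense in which XML is a stable format: the labels travel with the data.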

Why use XML to teach students how to do history?

A lot of teaching students how to do history involves giving them many opportunities to spend time examining primary sources, which are the evidence out of which we create historical knowledge.  As historians, we explore information that people created in the past and make arguments about what those people did and why or how their actions were significant.  We ask questions prompted by the documents, and we look for information in other documents based on those questions.  But how do we know what questions to ask?  How do we learn enough about a particular document to have a good idea of what other documents to examine next?

One way we do these tasks is through close reading, by which I mean getting to know a document, its author, its audience, its context.  And historians have been transcribing documents as a practice related to close reading for a long time.  In fact, transcribing sources is a basic research skill that students learn early in their educations; it is not a skill restricted to the practice of history.  When we do research, we take notes.  We might say that good transcription and note-taking are some of the transferable skills of a liberal education.

Teaching students to use XML as they transcribe primary sources promotes close reading.  That is, asking students to transcribe primary sources and embed information about the sources in the files that hold transcriptions gives students opportunities to get to know the sources deeply in ways that help students learn how to interpret the sources, ask questions about them, find related sources, and build arguments grounded in historical evidence.
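
To make the idea of embedding information in a transcription concrete, here is a small sketch in Python.  Everything in it is an assumption for the sake of illustration: the customers, dates, and amounts are invented, and the tag and attribute names are simplified stand-ins rather than the encoding we actually use.  The point is only that once a student has decided what each piece of a line is (a date, a person, a quantity of some commodity), a program can read those decisions back out.

```python
import xml.etree.ElementTree as ET

# An invented transcription of two account-book lines. The people, dates,
# and amounts are made up, and the tags are illustrative assumptions,
# not the project's actual encoding.
PAGE = """<page>
  <entry>
    <date when="1835-10-12">Oct 12</date>
    <person>J. Smith</person> Dr to
    <measure quantity="10" unit="bushel" commodity="potatoes">10 bu potatoes</measure>
  </entry>
  <entry>
    <date when="1835-10-13">Oct 13</date>
    <person>A. Brown</person> Dr to
    <measure quantity="4" unit="pound" commodity="starch">4 lb starch</measure>
  </entry>
</page>"""


def list_transactions(xml_text):
    """Return (date, person, commodity, quantity, unit) for each measure."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.iter("entry"):
        when = entry.find("date").get("when")
        person = entry.find("person").text
        for measure in entry.iter("measure"):
            rows.append((
                when,
                person,
                measure.get("commodity"),
                float(measure.get("quantity")),
                measure.get("unit"),
            ))
    return rows


rows = list_transactions(PAGE)
for row in rows:
    print(row)
```

The transcription keeps the words as written ("10 bu potatoes"), while the markup records the student's reading of them.  Making that reading explicit is where the slowing down happens.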

The story I like to tell to illustrate this process comes from a time I was teaching a course on historical methods a few years ago.  I asked the students to transcribe and mark up some pages from an account book that was kept in a store in a nineteenth-century New England town with a mixed agricultural and industrial economy.  The students happened to be transcribing pages that included the purchase and sale of a lot of potatoes, and they wanted to know more.  So we talked about agriculture and the seasonal cycles of planting and harvest.  We talked about how potatoes grow and buying seed potatoes.  And we considered potato blight, the Irish famine, and the dates of the transactions the students were transcribing.  All of this discussion was fine enough, but none of it led to any particularly satisfying interpretations of the information the students had found.

So we all did some more research, this time in secondary sources.  And we finally found an article in a journal focused on Vermont history that helped us make sense of all those potatoes.***  Because in that article, we read about the need for starch in the process of textile production in New England factories.  And we also learned that around the same time we had discovered all those potatoes being bought and sold, the people who ran textile factories used starch that was made from potatoes.

Now, I do not by any means wish to claim that this anecdote is a story of professional scholarship.  If I were using the primary source my students were transcribing as part of a scholarly research project, I might or might not focus on the potato question as a significant one for the larger project.  And even if I did for some reason need to know more about those potatoes, I would probably go about the next steps in my research differently from the way that my students and I had time to do in one assignment in a semester-long course.

But I do feel comfortable claiming that this exercise in figuring out a possible story behind all those potatoes was an effective lesson for the students in the process of doing historical research.  The students had a genuine intellectual experience that arose from close reading of a primary source.  They learned that spending time with a source can lead to interesting questions and that following where those questions lead can turn up unexpected information about the past.

For me as an educator, the value of the great potato quest lay in the opportunity it gave students to practice historical research.  And I would argue that asking students to transcribe the source and embed information about the source using XML facilitated the slowing down, the taking time, the close reading that is a significant skill for the practice of history.  In this case, XML was a tool for creating the conditions that helped students learn.  And that is the only good reason to use any technology in the classroom.

I haven’t said anything in this post about the Guidelines of the Text Encoding Initiative (TEI), which shape the kinds of information we embed in XML files in the Wheaton College Digital History Project.  Those guidelines are part of the use of XML in research and scholarship, so I will speak to them in a future post.


*Michelle Moravec organized the webinar, and Georgianne Hewett managed the tools that we used to present it.  Presenters focused on using digital tools in our history teaching.  Aaron Cohen presented his work using Historypin–a tool for managing images and creating exhibits–with students at Slippery Rock University.  Michelle showed a website that she and her students created using WordPress along with images of stained glass windows and a map of the college chapel at Rosemont College.  And I offered my usual presentation about our use of XML/TEI.  The slides from all of the presentations are available here.  Amanda Hagood, who is Director of Blended Learning at Associated Colleges of the South, asked the question that prompted me to write this post.

**For more detail and an introduction to working with XML, see Joe Fawcett, Liam R.E. Quin, Danny Ayers, Beginning XML, 5th Ed. (Indianapolis, Ind.: John Wiley & Sons, Inc., 2012).

***David Demeritt, “Climate, Cropping, and Society in Vermont, 1820-1850,” Vermont History (1991) 59/1: 133-165.






Digital Humanities, Libraries, and Scholarly Communication

For me, the lines between digital humanities, libraries, and scholarly communication are so faint as to be insignificant.  And my perception of the equivalences among these entities that often seem siloed to my colleagues presents a real challenge as I try to help people–both at my own institution and at other campuses–think about possible futures for higher education in our digital culture.

The source of my perception lies in my having begun to learn about how digital innovations are changing libraries and publishing as a result of my first forays into digital humanities.  In 2004, I participated in a series of workshops at Wheaton College that were sponsored by the National Institute for Technology in Liberal Education and funded by the Andrew W. Mellon Foundation.  Those workshops focused on two sets of encoding standards that use extensible markup language (XML): the Encoded Archival Description Document Type Definition (EAD DTD) and the Text Encoding Initiative (TEI).  The hands-on workshop sessions focused on TEI, and I attended the workshops out of interest in testing the use of TEI in teaching my undergraduate history students.  But the EAD component of the initial workshops meant that librarians attended too, so perhaps I have found one source of my elision of digital humanities, libraries, and scholarly communication.

Perhaps I have also identified a significant point about how these three often siloed entities are in fact connected.  I don’t mean to claim originality here.  Folks involved in digital humanities have been working on these questions for quite some time, as is clear from the discussion of the development of EAD at the Library of Congress website.  EAD and TEI were both developed in the 1990s.  Both began using Standard Generalized Markup Language (SGML), and both shifted to use of XML.  And both are used by libraries.

In fact, according to the TEI website cited above, “Since 1994, the TEI Guidelines have been widely used by libraries, museums, publishers, and individual scholars to present texts for online research, teaching, and preservation.” A search of the TEI consortium’s website led me to slides from a talk by Susan Hockey of University College London, “Markup, TEI, Digital Libraries.” The talk was presented at the TEI Members Meeting in 2002, and it offers a good overview of the relationships between the changes that digital innovations were bringing to libraries and digital scholarship at that time.  The TEI has a Libraries special interest group (SIG), which recently released an update to its recommendations for best practices for use of TEI by libraries.

So TEI–the flavor of digital humanities that I practice–does have clear connections to libraries that can be traced back for at least two decades.  I’m not making that up.  What a relief!

Scholarly communication, the third of my equivalences, belongs in the set as a result of the ways that digital innovations have affected communication in general, that is in the ongoing shift from print to digital formats.  The most obvious example–the one that has received the most public outcry in the past couple of years–is the case of newspapers.  Like many people, I no longer subscribe to print newspapers; I read them online.  And I resented the introduction of a pay wall by my newspaper of choice, the New York Times, as the publisher sought a new way to make the newspaper profitable as a business.  But eventually I gave in, and I pay my fifteen dollars every month.

Like newspaper publishers, university presses have been changing their production practices for at least the past twenty years, as various word processing programs have become the tools of choice for scholars writing articles and books.  I began to hear about changes in scholarly publication when I attended a NITLE meeting on scholarly communication that was held at Pomona College in January 2008.  (I think that’s the right date.)  Like all NITLE meetings, this one gave me plenty to think about, especially the idea of open peer review.  And in the intervening years, I’ve had opportunities to sit in on discussions in which I’ve heard editors talk about workflows and publishing software.  Now, I have an essay in a volume that is undergoing open peer review and that is under contract (the volume, not necessarily my essay) with the Digital Culture series at the University of Michigan Press.

All of this seems perfectly transparent and logical to me, and I understand digital scholarship–which is the term I use to encompass my three equivalences–to be the future of scholarship and higher education.  My greatest challenge lies in parsing out how that is the case for folks who haven’t had the advantages I have had over the past seven years as I’ve learned from my digital humanities colleagues.

Is It Out There?: Undergraduate Research as Digitization at Analog Pace

In the poster session at the NITLE Summit, I presented the portion of the Wheaton College Digital History Project on which my students are currently working.  This is the second time that students in my iteration of the methods course for history majors have transcribed and encoded transactions from a daybook that Laban Morey Wheaton kept in Norton, Massachusetts, between 1828 and 1859.

Viewers of the poster saw images of Wheaton and his wife, Eliza Baylies Wheaton, as well as sample images of the daybook, XML files, and a visualization based on student interest in commodities traded on days of the week from spring 2009.  I explained that asking students to transcribe and encode financial records gives them an opportunity to learn a host of principles and skills meant to prepare them for doing their own research in primary sources for their senior seminar projects.
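
A question like "which commodities were traded on which days of the week" can be sketched in a few lines of Python once transactions have been encoded with machine-readable dates.  What follows is only an illustration under stated assumptions: the transactions are invented, not taken from the daybook, and the function name tally_by_weekday is hypothetical.  It is not the method behind the visualization on the poster.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Invented (ISO date, commodity) pairs standing in for encoded transactions.
TRANSACTIONS = [
    ("1835-10-12", "potatoes"),
    ("1835-10-12", "molasses"),
    ("1835-10-13", "potatoes"),
    ("1835-10-19", "potatoes"),
]


def tally_by_weekday(transactions):
    """Count how often each commodity appears on each day of the week."""
    counts = Counter()
    for iso_day, commodity in transactions:
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        weekday = WEEKDAYS[date.fromisoformat(iso_day).weekday()]
        counts[(weekday, commodity)] += 1
    return counts


counts = tally_by_weekday(TRANSACTIONS)
for (weekday, commodity), n in sorted(counts.items()):
    print(weekday, commodity, n)
```

A tally like this only becomes meaningful once enough pages have been transcribed, which is one reason the pace of the work matters.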

More than once, viewers asked me whether the data was “out there” for other students and faculty members to use, and I had to reply that we have not yet reached that stage.  Our college is participating in a planning grant for a presentation tool, but the tool is very much still in the planning stages.  This leads me to reflect that data produced through undergraduate pedagogy might appear at a pace closer to that of analog publication than we are accustomed to in a digital world.

As a comparative novice in TEI, I have only recently come to realize some of the complexities that result from our collaboration among students, the College Archivist, an academic technologist, and a faculty member.  Among these is the fact that we are creating digital versions of documents for at least three related but distinct purposes:  pedagogical, archival, and scholarly.  And for all three of these, results and publication are far from instant.

As students transcribe and encode the daybook, the pace can seem positively glacial, not least because learning to decipher nineteenth-century handwriting takes time.  We assign each student a single page spread, so at the end of this semester, we will have completed transcription and encoding of about forty pages.  And the daybook is only one of numerous account books in the collection.  From a certain pedagogical perspective, pace does not matter, and we will have plenty of material for the students to work on the next many times I hope to teach this assignment.  Aggregation of the data means only that future students will have more material to query.

Similarly, for archival purposes, pace is less important than having someone do the work.  And from the archival perspective, accuracy of transcription is often more important than speed.  In fact, the need for multiple instances of proofreading has become one of the most significant obstacles in online publication of the letters, travel journal, and pocket diaries that student workers finished transcribing, encoding, and proofing at least two summers ago.

And, just as the lack of adequate time to proofread behind the students stalls archival publication, lack of time slows my own ability to reflect and produce scholarly versions of this material.  Scholarly use of the financial records awaits digitization of adequate amounts of data to aggregate and query meaningfully.

So, no.  The data we are producing is not out there yet.  Digital methods offer important learning opportunities for our students.  They do little to speed the pace of careful archiving and scholarship as yet.  I do remain convinced that eventually there will be significant research value in the data that we will produce.  Especially if we can manage to tolerate the incremental (analog) within the digital.





Tweeting Conferences: NITLE Summit and UVA Shape of Things to Come

By about 4:30 yesterday afternoon, John Unsworth had reached the conclusion that Brett Bobley’s tweet had foreshadowed for me the previous evening.  As he was arriving in Charlottesville, Bobley had tweeted he could feel emanations of Worthy Martin’s presence.  I had replied that Martin was in New Orleans, but I had left off the NITLE hashtag.  (Someday I will become a more adept tweeter.)  Fortunately, Kathleen Fitzpatrick, who was also following Bobley, included it.

In the morning I sat in the first session of the NITLE Summit, ready to tweet Brian Hawkins’s talk on the role of information professionals in our current brave new world.  I noticed Bethany Nowviskie’s tweet from UVA and retweeted: “RT @nowviskie #uvashape: Robert Darnton calls for creation of a National Digital Library. #NITLE Wonder what Brian Hawkins might think.”

More and more, as I followed the backchannel from UVA and tweeted from NITLE, I experienced the convergence of the two conferences.  At one point I tweeted: “weirdness of reading @mkirschenbaum ‘scholarship has to accommodate the idiosyncratic’ while Hawkins talks learning from for-profits #NITLE.”  And soon, I was seeing numerous tweets about others following multiple streams at once.  Mark Wardecker tweeted: “Too many great conferences to listen in on today. Great stuff fr the #NITLE Summit in NOLA and Mediterranean Identities #medid.”  Others were following a conference for education leaders at Yale and a Turning Technologies User Conference at Northwestern.

By the end of the session, my head was spinning, and I was having trouble coming down enough from the tweeting to focus on the next session, for which I had become the lone faculty member talking about digital humanities at liberal arts colleges.  After Worthy Martin talked about the opportunities that are offered by NEH Fellowships at Digital Humanities Centers, we presented in alphabetical order.  Janet Simons talked about her work at Hamilton College, where a critical mass of faculty members have brought about the development of a comprehensive institutional plan for presenting their work.  (Unfortunately, Angel David Nieves was unable to attend.  This is the second time I have missed meeting him.  I hope the third opportunity will be more successful when he returns to Wheaton in April.)  Bob Kieft presented the organization of the Center for Digital Learning and Research–directed by Marsha Schnirring–as the strategy for making sure Occidental College is ready for the influx of new digital-ready faculty members sure to come in the next five years.  And Scott Hamlin from Wheaton College described the project-based approach we have taken so far and how it has led to his collaboration with counterparts at several other small liberal arts colleges on planning a presentation tool for TEI projects with funding from the Institute of Museum and Library Services.

So my remarks came as testimonial to the way that opportunities through NITLE had led to my work with TEI in teaching, fundamentally redirecting and reinvigorating my scholarship.  When I started taking students in my course on nineteenth-century U.S. Women’s History to the archives and having them transcribe a woman’s journal, I learned the joy that students can experience from firsthand knowledge of a primary source.  I became an advocate for undergraduate research.  Students in my iteration of the methods course in history are transcribing and encoding financial documents from the second quarter of the nineteenth century.  And I now represent myself rather grandly as the director of the Wheaton College Digital History Project.

So when someone tweeted from UVA this afternoon about the potential for undergraduate collaborations in digital scholarship, I retweeted, adding:  “we do this @wheatoncollege.edu.”  I’m looking forward to discussing this with Ryan Cordell, who will begin teaching at a liberal arts college in the fall.

I agree with Unsworth: “It would be really interesting to do a contrapuntal edition of #uvashape and #NITLE tweets.”

And thanks for the dart game pics on Flickr, John.  Looks like fun.




My Introduction to Kathleen Fitzpatrick’s Closing Keynote Presentation at NITLE Summit

Kathleen Fitzpatrick needs no introduction.  Everything there is to know about her is on the web.  If you google her, you will find that she is neither one of the 25 professionals named Kathleen Fitzpatrick on LinkedIn nor the Australian academic who died in 1990 (more on that in a minute).  Rather, she is the other Kathleen Fitzpatrick on Wikipedia, which links to her homepage at Pomona College, where she is Associate Professor of Media Studies.

She is the recipient of numerous awards, most notably from the Mellon Foundation and the National Endowment for the Humanities.

Her first book, The Anxiety of Obsolescence: The American Novel in the Age of Television, was published.  Eventually.  In the introduction to her second book, she referenced the challenges she had faced in the publication process, concluding:

“The first academic book isn’t dead, it is undead.”

Now I ask you, what’s not to admire in an academic who can work a zombie metaphor into a serious book about “the crisis in scholarly publishing”?

And how appropriate that she should offer a keynote presentation for us here in New Orleans.

More significantly, Kathleen Fitzpatrick is a strong advocate for change in institutional thinking about scholarly communication.  And she practices what she preaches.  Her second book, Planned Obsolescence: Publishing, Technology, and the Future of the Academy, will be published this year by New York University Press.  A draft version is available for open peer review at Media Commons.







