Historians and personal digital archiving


Note: This post was contributed by Eric Novotny, Acting Head of Penn State’s Arts & Humanities Library, and History Librarian. Eric is one of the subject librarians participating in this project.

As the library liaison to Penn State’s history department, I was eager to examine historians’ responses to our Personal Scholarly Archiving Survey. With the invaluable aid of Alan Shay, we isolated responses from forty scholars with historical research interests. Comparing these self-identified historians (who may or may not be in a history department) to the larger set of responses revealed some interesting findings.

Not surprisingly, and like scholars in other disciplines, nearly all historians reported storing relevant research materials.

Survey Question 7: Are you storing (on a computer or online) materials relevant to your research and work?

[Figure: Novotny 1]

Compared with the larger set of respondents, the historians in our survey were less likely to use cloud storage (Dropbox, Google Drive) and less likely to use citation software such as Zotero or EndNote.

Survey Question 10: Do you use citation management software (Endnote, Mendeley, Refworks, Zotero, etc.) to create bibliographies and/or organize references?

[Figure: Novotny 2]

Survey Question 12: Do you use any software/online services for sharing your research articles/data with others?

[Figure: Novotny 3]

These findings present libraries with an opportunity.  A significant percentage of respondents expressed a desire for training in the areas of citation management, personal archiving, and asset management.  There is a role for librarians to play in highlighting the available productivity tools.

[Figure: Novotny 4]

There is also an advocacy role for libraries and librarians in spurring the creation of universal integrated solutions. Historians want tools that accommodate their unique needs, including support for citing unpublished primary sources. They want a seamless transition from source gathering to note-taking and writing[i]. The project team is planning to conduct focus group interviews this summer to further identify the needs of historians and will share our additional findings.

[i] For a good discussion of historians’ citation management needs see: “Supporting the Changing Research Practices of Historians,” http://www.sr.ithaka.org/research-publications/supporting-changing-research-practices-historians

Initial Survey Results – Searching

This post is the second of several installments in which we describe the initial findings from our scholarly workflow survey. It centers on our participants’ thoughts on the searching component of the research process.
When searching for new relevant research-related information (books, articles, etc.), our respondents largely rely on library databases (72%), Google Scholar (64%), and Google (48%). A large majority of survey participants (87%) agreed or strongly agreed that it is easy to find relevant research articles and other information needed for their work.
[Figure: searching graph]

Penn State’s 2012 FACAC survey of technology use by faculty and students provides an interesting comparison with our study’s data. FACAC was administered to approximately 2000 respondents, 11% of whom were faculty. In the FACAC study, the library resources and services most used by faculty were: The CAT (library catalog) (73.4%), library databases (58.7%), Google Scholar (48.0%), and My Library Account (45.7%). Our study placed the CAT under the general category of library databases, and our numbers correspond with the FACAC findings. The only disparity between the two studies is that a larger share of faculty in our study (64%) than in FACAC (48.0%) indicated that they use Google Scholar on a regular basis (see chart below).

[Figure: FACAC searching]

What does all of this say about faculty’s searching habits and preferences? Both the FACAC and our study’s findings indicate that the library is still a primary destination for faculty to search for, find, and retrieve information. This runs somewhat contrary to the recent ITHAKA study of historians, which indicated that faculty are turning to Google first for research. Taken together, our findings suggest that, for the time being, libraries should continue to direct significant energy toward refining, embedding, and enhancing their web presence. Even so, the library web site’s primacy in scholarly research may not last much longer.

Some Thoughts on Acquiring Personal Archives in the Digital Age


What follows is the text of the talk I gave at the DLF Forum on November 4, 2012, as part of the presentation headed by Ellysa and Smiljana.

Research libraries typically acquire personal archives from creators at the end of their careers or lives, or even after their deaths, which means that material can be acquired decades after its creation. Because of technological obsolescence and how fluid our personal uses of technology are, it’s logical to conclude that creators need to become more active in curation activities, and libraries need to engage them much earlier in the creation process, or else we risk losing valuable cultural material down the digital black hole.

But I think there are some challenges to compressing this distance between creation and acquisition. For starters, do creators want to be bothered with our curatorial recommendations? In a report on born-digital literary manuscripts, Matthew Kirschenbaum and others concluded, based on interviews with working authors, that suggesting certain technological behaviors to them might impinge on their creative process, and hence be unwelcome advice. In our grant’s initial survey, fewer than half of respondents answered affirmatively to the question of whether librarians should offer them personal archiving and digital preservation assistance. In a free-text comment field, several respondents even explicitly stated their opposition to being told how to preserve their own material.

Of course, we shouldn’t just assume that creators are absent from the preservation process to begin with. For example, 86% of respondents to our initial survey indicated they regularly back up important material. I know redundancy does not constitute preservation, but it was encouraging to see that most of the respondents back up with a frequency falling between “continually” and “monthly.”

My favorite answer, and the funniest one, to this question about backup frequency was a single word: “episodically”. I actually think this answer could be instructive to archivists and librarians. For purposes of preservation we may be inclined to isolate material into neat little piles based on documentary types or by file type, as discrete items on a file system, or even as atomized chunks of data, but creators themselves may instead view them as events in a continuum of research: they progress from one to the other and together form a whole, a whole that is not entirely separable from other aspects of their appointments, including teaching, dissertation advising, etc.

And it’s not entirely separable from the physical spaces in which they work either. In Penn State Special Collections we have a physical reconstruction of the author John O’Hara’s study: his writing desk, typewriter, bookshelves, even his fireplace and mantel. Now, projects like the Rushdie one out of Emory are starting to make us think of the author’s study as a virtual place, his reconstructed computer desktop screen replacing the physical reconstruction of a creative space. But in our first in-person interview, with a Penn State faculty member in the Communications department, we got an intimate view into how these spaces are really wrapped up with one another in complex ways we do not yet fully understand. This researcher commented on the ways in which his forty-plus years in the same office and his pending move have had an impact on the way he works. At a much higher level, being with one institution for so long influenced his relationship to technology over the years. In the early 80s he learned of a mainframe on campus and successfully petitioned to have terminal access to it, which he used to draft a book in the SCRIPT markup language. As we proceed with the interviews, we’ll be documenting both the physical and virtual desktop environments graphically, and I think this is an area of the grant work that can be especially enlightening and fruitful.

The AIMS project, another Mellon-funded digital archives grant conducted by Stanford, Yale, UVa, and Hull, concluded that institutions need to spend more pre-acquisition time collaborating with potential donors and getting to know the details of their digital ecologies in order to make informed decisions about ingest, management, and preservation. The AIMS project produced a comprehensive model donor survey that can and should be used by archivists, and one that helped guide some of my recommendations to Ellysa on this grant. It contains questions about general computing habits, tools, use of mobile devices, social networking, and even privacy. I think if we could always get this level of detail from potential donors, then a grant like ours wouldn’t be necessary. But donors of personal papers are busy, distracted, outgoing people who lead complicated and active lives just like the rest of us. In my own very limited experience with donors, I have not always had great luck teasing even the most basic responses out of them, and I think it will be hard to have the kind of in-depth discussions about their digital desktops that we as archivists desire.

With this in mind, I think what we need are more case studies about different classes of creators to draw on for insight. There are two common methods for deciding what to acquire in our profession: one suggests that we appraise a collection based on the value of the content, and another (called macro-appraisal) suggests an emphasis on the role played by the records creator and how that person fits within an established and documented institutional collecting area. The latter has become increasingly common, but it presents significant challenges when the material to be acquired is digital, distributed, duplicated (in both digital and analog formats), and dependent on specific hardware or software.

In the absence of a collaborative relationship with potential donors, a condition which will likely persist, I think archives and libraries need a community-produced set of data about different types of creators to fall back on. This kind of appraisal might fall somewhere between micro and macro. Let’s call it meso-appraisal: the idea being that we start to document, quantitatively, the tools and habits used and exhibited by creators working in different fields. We’ve seen this kind of work already coalescing around literary archives. And in a way, I think this approach might succeed in aligning our appraisal efforts with our preservation efforts, which focus on developing preservation strategies at an intermediate level, usually for different formats. Not individual items, and not whole collections of disparate items.

For instance, heading into this grant I was curious to see how many unusual or non-standard file formats we might encounter. With the exception of data sets, the survey results have shown a surprising uniformity in the common types of digital material people have. There is persistent use of documentary types familiar to us all (Word docs, spreadsheets, email, image files), and already some indication that certain formats, like PDF, are rather ubiquitous. Only 5% of respondents checked the “other” option on the question related to formats.

I think we can also produce some useful information about how people are using certain tools. Email is already turning out to be an interesting case study. People use it for everything, and not just communication. They use it for sharing documents, obviously, but also rudimentary backup, and even version control.

I think for digital library professionals, some of these findings can help inform our approach to the development of tools, especially those related to repository services, which is something I’ve been thinking about a lot since Penn State just released its Hydra-based repository, ScholarSphere.

If there’s anything our initial findings are demonstrating, it’s that researchers on campus are not shy about going outside the academy to get the tools they need. The communications professor we interviewed uses Dropbox because it’s easier and provides ample space, even though it adds a financial burden. He uses Gmail because he feels it’s more user friendly than the email system his department offers. In general, researchers appear to use such tools interchangeably for a variety of purposes, including sharing, versioning, and redundancy. But they don’t know where to go for things like data recovery, and have simply accepted certain levels of data loss as part of doing business. I think the data is already trending in a way that demands that our repository efforts reduce as many barriers to adoption as possible, so that our repository is as easy to use as Dropbox or email, or even interoperable with these tools. Most important is that any repository services we develop fall comfortably within researchers’ existing workflows. And if we can at the same time sell the benefits that they cannot get from other tools (greater accessibility, data integrity, etc.), then I think we’ll be able to spur much greater adoption.

— Ben Goldman, Digital Records Archivist