The Data Hoover Project (DHP) and its aims:
The Data Hoover Project aims to “sweep up” information about scholarly use of historical datasets, as well as the datasets themselves. It proceeds through surveys, data collection, and the development of data collection methodology.
• “Data Hoover” – the project. (From 2013 to 2015, the project is developing standards and protocols for communicating with data owners.)
• “data hoover” – the researcher. (The data hoover conducts surveys to establish optimal communication strategies for data collection.)
The Data Hoover Project is an initiative of the larger CHIA Project and is supported by the National Science Foundation. Its aims are to:
• Study data use, creation and curation in the historical quantitative social science (HQSS) fields.
• Ingest and describe HQSS data in order to populate the CHIA Dataverse repository, inform repository and metadata design, and identify the technical, political, semantic, and other opportunities and barriers that affect eager and active participation in CHIA by HQSS data owners.
In order to meet these goals, the Data Hoover Project will:
• Survey HQSS faculty and graduate students about their current data repository practices
• Collect, describe and categorize data
• Explore a range of potential methods for future data collection.
These activities are intended to further CHIA’s goals of obtaining a diverse range of data, acquiring enough data to match the global scale of the project, and enabling HQSS data owners to share their work efficiently and enthusiastically.
Dr. Ruth Mostern – Data Hoover Project Principal Investigator
Ruth Mostern is a member of the Founding Faculty at the University of California, Merced, and an Associate Professor in the interdisciplinary School of Social Sciences, Humanities and Arts. She received her Ph.D. from the UC Berkeley Department of History. Before joining the UC Merced faculty in 2004, she was Head of Collection Development at the Electronic Cultural Atlas Initiative, a Berkeley-based international consortium promoting the use of GIS and digital library technologies for communicating about culture and history.
Ruth Mostern is a world historian with an emphasis on imperial Chinese history and political geography. She is also a pioneer in the use of information technology in the humanities, including digital mapping and digital libraries. This combination blends old and new, bringing ancient political, religious, cultural, and other material to life in a state-of-the-art format useful at all levels of learning. Her interests include the relationship between territory and state power in imperial China and the development of methods to improve digital maps and timelines that support visualization and analysis in history and cultural heritage.
Marieka Arksey – Data Hoover Project Graduate Student Researcher (‘data hoover’)
Marieka Arksey is a third-year Ph.D. student in the World Cultures interdisciplinary Ph.D. program at the University of California, Merced, under the supervision of Dr. Holley Moyes. She earned her B.S. in Archaeological Science from the University of Toronto, Canada, and her M.A. in Arts, Histories and Cultures from the University of Manchester, England. She has worked on Mesoamerican and Middle Eastern collection digitization projects at the Royal Ontario Museum in Toronto and the British Museum in London.
Over the past nine years, she has worked on archaeological projects in South Africa and Belize, with an emphasis on cave sites. Her current research focuses on ancient Maya ritual use of caves and their surroundings as it relates to ritual procession, the goals of these activities, the delineation of ritual space, the social accessibility of these spaces, and potentially their proprietorship. She has been working with Dr. Ruth Mostern on the CHIA project since Spring 2013.
DHP Work Plan 2013–15
Planning Phase (Spring 2013)
The DHP completed a Planning Phase in Spring 2013. The goals of the Planning Phase were to:
• Draft the DHP Work Plan
• Write a description of the DHP for the CHIA website
• Draft a preliminary survey to administer to HQSS data owners
• Hire the DHP Research Assistant
• Acquire Human Subjects Approval for the survey.
Phase 1 (Fall 2013–Spring 2014): Test the DHP Survey and Protocol
The first stage of the DHP will focus on developing protocols for communicating with HQSS data owners and for ingesting and describing HQSS data. During this phase, the data hoover will work with people who are already part of CHIA, starting with CHIA PI Pat Manning and DHP PI Ruth Mostern and continuing with the other CHIA PIs and close collaborators. The aim is to complete a full workflow cycle with each collaborator, beginning with the administration of the DHP survey and continuing through data acquisition and description. The Phase 1 goals are to:
• Refine the DHP survey
• Ingest data for hosting on the CHIA Dataverse server at the University of Pittsburgh and articulate a digital stewardship procedure for managing acquired data
• Develop effective data descriptions through an iterative process with data owners (without writing formally structured metadata)
• Draft a controlled vocabulary of data categories according to temporal and spatial scale, discipline, and topic.
• Acquire names of additional data owners from core CHIA participants
• Document the DHP process and develop good practices for communication with HQSS data owners.
• Document affordances and barriers involved in HQSS data submission, description, and use.
By the end of Phase 1, the survey design and workflow cycle should be sufficiently refined, and repository issues sufficiently resolved, to expand the DHP beyond the core CHIA participants.
Phase 2 (Spring 2014–Spring 2015): Expand DHP Activity
Phase 2 is an expanded version of Phase 1. The DHP will identify and contact participants in Phase 2 by:
• Communicating with individuals recommended by Phase 1 participants (and earlier Phase 2 participants)
• Systematically communicating with HQSS data users and data owners at UC Merced, the DHP home institution
• Communicating with HQSS data users and data owners associated with the UC Berkeley D-Lab.
• Identifying major HQSS centers and repositories worldwide and communicating with their affiliates.
• Considering the development and administration of a modified survey for HQSS center directors.
Where possible, the survey will be administered in person to gather immediate feedback and participant comments, with follow-up by electronic communication.
The Phase 2 goals are to:
• Finalize, streamline, and document the protocols, categories, infrastructure design, and documents drafted and tested in Phase 1, incorporating insights gained from the participation of a larger community and the acquisition of more data.
• Acquire a larger quantity and greater diversity of data for the CHIA Dataverse repository.
• Communicate with HQSS scholars and students in their roles as data users as well as data owners (recognizing that most data owners are also users of other people’s data) in order to understand the use of HQSS data in teaching and research and the ways in which the CHIA Dataverse repository can best meet community needs.
Phase 3 (Spring 2015): Write-Up
Phase 3 is the documentation stage of the DHP. The goals of Phase 3 are to:
• Write a white paper for CHIA about good practices for ingesting and describing HQSS data and communicating with HQSS data owners.
• Write an article about HQSS data and repository development for submission to a leading peer-reviewed journal.
• Update the list of HQSS centers and repositories on the CHIA website.
• Assess whether the DHP should be an ongoing endeavor, and if so, seek additional funding.