Research Data Task Force: a Report on Data Curation at Yale

The Research Data Task Force, sponsored by ODAI, has released its Final Report (linked below) on the data-related processes of Yale faculty. The report covers perceived gaps in the technology (hardware, software, and network), services, and policies that support the research enterprise throughout the data life cycle. It is based on information gathered from direct interviews with faculty across the university.

"From telescope images to gene sequences, Yale researchers produce tens of thousands of gigabytes of digital data each year that can further research if properly managed, shared, and preserved — or it can vanish into cyberspace if neglected, Office of Digital Assets and Infrastructure Director Meg Bellinger said." (Yale Daily News, 4/13/2010)

Researchers are challenged by the demands of "data curation": storing and managing research data throughout the digital life cycle, producing data management and sharing plans, describing their data in ways that make them identifiable and usable, and determining the best formats and options for storing and sharing data securely over long periods of time. Loss of data, unmet mandates from funding agencies to preserve data, and difficulties in securely sharing data with collaborators are challenges faced across multiple academic disciplines.

To understand the current state, breadth, and diversity of technologies used in research, the task force interviewed 34 faculty members drawn from a range of departments and professional schools. Of the 34 interviews, 10 were in the social sciences, 21 in the sciences, and 3 in the humanities.

Five specific aspects of the life cycle of research data were addressed in the interviews:

1.    Data sharing: What mechanisms are in place to allow data sharing within the research group, within the institution, and with others outside the institution?
2.    Data management: How do researchers collect and manage collections of data during the research life cycle?
3.    Long-term persistent access and preservation: What is the life span of research data, and what policies and/or strategies are in place to access and preserve data not currently in use?
4.    Data ownership: What policies and practices are in place, either locally or institutionally, regarding ownership of data?
5.    Technical infrastructure: What is the current state of existing facilities and how do these impact faculty research?

The overriding messages taken from these interviews are clear. It is critical that the University develop strategies to address the complex issues surrounding the research data life cycle and the stewardship of research outputs. As the volume of data produced, stored, shared, and repurposed over time increases exponentially, and the costs of managing, documenting, and preserving those outputs increase in parallel, the University needs to understand and evaluate the incentives, costs, and mandates behind digital stewardship.

This report proposes a set of strategic recommendations for meeting the requirements and filling the gaps identified through the interviews. The next steps will be to vet these recommendations, develop business cases with ODAI partners, and pursue resource allocations.