This is the old site for the DataStaR project. We have changed the focus of the DataStaR project to that of a data set registry. More details can be found here.

Welcome to DataStaR, an experimental Data Staging Repository hosted by Albert R. Mann Library, at Cornell University.

The purpose of DataStaR is to support collaboration and data sharing among researchers during the research process, and to promote publishing or archiving data and high-quality metadata to discipline-specific data centers, and/or to Cornell's own digital repository (eCommons).

The project includes two main proposed innovations. The first is the development of a metadata management architecture which would allow managers of data staging repositories to approach heterogeneous data and metadata in a more flexible way while still leveraging the significant investment that has already been made in discipline-specific metadata schemas. The second is a model for a local data staging repository that provides data curation services early in the research cycle, and then promotes the transmission of data to repositories better suited for long-term curation and preservation, thereby improving access to research data sets.

Figure: Movement of data from local storage to public access

The conceptual model above shows the movement of metadata and data from individuals and research groups to systems supporting sharing with collaborators, and eventually with the public. The staging area represents local infrastructure to support sharing within a group of researchers. When data and metadata are ready for public release, they may be submitted to an institutional repository and/or discipline-specific repositories, which may in turn expose their content for harvesting by other repositories. In this particular example, the National Biological Information Infrastructure (NBII) harvests metadata submitted to the Knowledge Network for Biocomplexity (KNB), and Geospatial One Stop (GOS) harvests metadata from the Cornell Geospatial Information Repository (CUGIR). Institutional repositories may be indexed by web search engines.

The DataStaR team is developing a technical architecture for a local data staging repository where a researcher can:

  • create preliminary metadata for research data sets;
  • share preliminary data publicly, or only with selected colleagues;
  • complete a more detailed metadata record using a form-based editor, and optionally upload completed data sets to the staging repository;
  • export metadata in any number of domain-specific formats;
  • re-use elements of existing metadata records in the creation of new metadata records;
  • and finally, obtain assistance with any of these processes from librarians with domain-specific or general curatorial expertise.
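
As a rough illustration of the metadata workflow listed above (start with a preliminary record, re-use its elements in a more detailed record, then export to more than one format), here is a minimal Python sketch. It is not DataStaR's implementation: the MetadataRecord fields, the derive_record and export helpers, and the two output formats (plain JSON and a generic XML layout) are all hypothetical stand-ins for real discipline-specific schemas.

```python
from dataclasses import dataclass, replace
import json
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical, simplified metadata record (not DataStaR's actual schema)."""
    title: str
    creators: tuple = ()
    keywords: tuple = ()
    description: str = ""


def derive_record(existing, **changes):
    """Re-use an existing record's elements, overriding only the given fields."""
    return replace(existing, **changes)


def to_json(record):
    """Export as JSON (a stand-in for one target format)."""
    return json.dumps(
        {
            "title": record.title,
            "creators": list(record.creators),
            "keywords": list(record.keywords),
            "description": record.description,
        },
        sort_keys=True,
    )


def to_xml(record):
    """Export as generic XML (a stand-in for a domain-specific schema)."""
    root = ET.Element("record")
    ET.SubElement(root, "title").text = record.title
    for name in record.creators:
        ET.SubElement(root, "creator").text = name
    for word in record.keywords:
        ET.SubElement(root, "keyword").text = word
    if record.description:
        ET.SubElement(root, "description").text = record.description
    return ET.tostring(root, encoding="unicode")


# Registry of assumed export formats; a real system would map to actual schemas.
EXPORTERS = {"json": to_json, "xml": to_xml}


def export(record, fmt):
    """Dispatch to the exporter registered for the requested format."""
    if fmt not in EXPORTERS:
        raise ValueError(f"unsupported format: {fmt}")
    return EXPORTERS[fmt](record)


# Usage: a preliminary record is later completed, then exported twice.
preliminary = MetadataRecord(title="Lake survey 2008", creators=("A. Researcher",))
detailed = derive_record(
    preliminary,
    keywords=("limnology",),
    description="Water chemistry samples.",
)
json_output = export(detailed, "json")
xml_output = export(detailed, "xml")
```

Keeping the record separate from its exporters means a single stored description can feed several target formats, which is the general idea behind exporting one metadata record to multiple domain-specific schemas.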

This work extends activities initiated under NSF grant 0437603 (Small Grant for Exploratory Research), "Planning Information Infrastructure through a New Library Research Partnership." Under that grant, we developed a conceptual model for library-laboratory collaborations in data curation, which is described more fully in our final report. In the DataStaR project, we propose to continue working with two research groups, as well as additional partners, to provide local assistance with research collaboration and data curation during the research process, using the proposed local data staging repository as a platform. Ultimately, the intent is to pass "publication-ready" data sets on to domain-specific repositories, or to Cornell's institutional repositories, as appropriate. If successful, this work will serve as a model for academic libraries that wish to provide a data staging repository for researchers at their institutions. The model leverages the ability of a researcher's local institution to provide accessible support and services related to research data early in the research process, and it promotes the deposit of data in domain-specific repositories, thus making data available to the larger research community.

The DataStaR platform consists of a semantic metadata repository (based on the Vitro software) and a Fedora instance for storage of data sets.

DataStaR team members:

  • Brian Caruso, Programmer/Analyst
  • Kathy Chiang, Head, Life Science and Specialized Services
  • Jon Corson-Rikert, Head, Information Technology Services
  • Dianne Dietrich, Physics and Astronomy Librarian
  • Ann Green, Independent Research Consultant, Digital Life Cycle Research & Consulting
  • Huda Khan, Programmer/Analyst
  • Brian Lowe, Programmer/Analyst
  • Janet McCue, Director, Albert R. Mann Library, co-Principal Investigator
  • Gail Steinhart, Research Data & Environmental Sciences Librarian, Principal Investigator

DataStaR publications and presentations

Cornell researchers with data to share with collaborators or make publicly available are invited to contact the DataStaR team.

This material is based upon work supported by the National Science Foundation under Grant No. III-0712989.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
