Tuesday, September 23, 2008

NISO Thought Leaders Meeting on Research Data

On October 1st, NISO will hold a Thought Leaders meeting on the topic of research data. The meeting is part of a series that NISO is holding in conjunction with a Mellon Foundation grant. The goal of these meetings is to incubate new standards initiatives by discussing issues and areas where standards can help address pain points, push forward use, or drive application of systems in research and information exchange.

Our goal for the meeting will be to 1) brainstorm about what barriers exist to wider sharing of research data and then 2) identify a small list of standards-related initiatives that could make a difference in this area. Following the Thought Leaders meeting, NISO will organize a Technical Working Group to further examine opportunities for development of standards in this area.

In advance of the meeting, invited experts are encouraged to provide comments here that will help us identify key issues for discussion on October 1st. Topics for consideration include: issues around provenance, metadata, citation and reference, version control and tracking, preservation, privacy, intellectual property, packaging and facilitating reuse.

Thanks for your help in preparing for this meeting.


stu said...

I have two general themes that I believe should be part of the discussions this week:

Diversity of Structure
The conduct of science is to a large degree similar to a diverse and vibrant small business environment – a wide variety of ‘products’ developed with little attention to interoperability across data types and structures.
Reusability will benefit from efforts to generalize the data ‘parts’ so that the distance between variant structures, and their associated semantics, is reduced.

Diversity of Identification
Bibliography is plagued with issues of duplication and lack of canonical identification. There are many identifiers, each designed to support a different purpose or business case. Reconciling and coordinating them is costly and fraught with ambiguity.
Canonical identification of data sets is a critical aspect of being able to relate them accurately to publications, instruments, experimental protocols, interpretive and rendering software, people, organizations, and other data sets.
As it is early in the management of digital datasets, there is an opportunity to establish conventions and namespaces that will support this objective over the long term, and the resulting identifiers will themselves carry a measure of branding and surety of fixation in the scaffolding of science.

maxine said...

I'd like to draw attention to an Editorial in Nature Chemical Biology 4, 575 (2008) (this month) about the practice of citing "data not shown" in scientific papers.
'Data not shown' is an outdated caveat that obscures the transparency of a scientific report and weakens the peer review process.