Comments on the draft NIH Policy for Data Management and Sharing

Response to the “Request for Public Comments on a DRAFT NIH Policy for Data Management and Sharing and Supplemental DRAFT Guidance”.

Section I: Purpose

There should be a presumption that all research data underlying a publication is shared at time of publication. The current language is weak and has statements such as “shared data should be made accessible” or “not all data generated in the course of research may be necessary to validate and replicate research findings.” Instead the policy should say that shared data MUST be made accessible, except when justified by a small number of reasons, such as participant privacy concerns that cannot be overcome by protective measures, or studies on vulnerable populations.

The draft lists an expectation of “timely” data sharing. This should be defined as generally at the time of publication. Funding opportunities specifically designated to create a shared resource should specify a date by which data must be available even in the absence of a publication. The draft's vagueness here is a step backwards from previous NIH policy, which clearly defines “timely” as “no later than the acceptance for publication of the main findings from the final data set.” The relaxation of this existing requirement is not justified.

Section II: Definitions

This section should include definitions of FAIR (Findable, Accessible, Interoperable, Reusable) data and the 15 FAIR principles.

Section III: Scope

Scope should make clear that the policy continues to apply for scientific data produced by funding in whole or in part from NIH after the NIH funding period is over.

Section IV: Effective Date(s)

The current absence of an effective data management and sharing policy and the lack of enforcement cause a serious negative impact on health research and enable an ongoing waste of public funds. The noncommittal implementation date of the draft is unacceptable. The final policy should have a “no later than” date for implementation, ideally 12 months after issuance of the final policy.

Section V: Requirements

To ensure good data management, any data described as collected in a progress report must be deposited independently and an accession code or digital object identifier (DOI) supplied. Except when specified by the funding opportunity announcement, researchers may embargo this data until publication. Grant opportunities specifically designated to create a shared resource should specify a date by which data must be available even in the absence of a publication.

It should be clear that these requirements apply not just to research project grants and contracts, but also to most other forms of requests for support that will lead to the creation of scientific data. This includes cooperative agreements, career grants, fellowships, scholarships, and training grants.

Absent a compelling reason otherwise, contract solicitations should specify that collected data is the property of NIH. They should also include specific requirements that data should be made publicly available in a third-party repository as a periodic deliverable, upon which further funding can be conditioned.

There are a large number of digital repositories with different policies. You should require that acceptable digital repositories not allow recipients to unilaterally change or delete deposited data. The repositories may, however, allow recipients to add new versions of data, advertised in the metadata for the original dataset.

It is important to protect human participant privacy but it is also important that concerns about human participant privacy not be abused to eliminate appropriate data sharing. It is especially worth considering that many human participants expect that data from their participation will be shared with other qualified researchers. Ineffective sharing of the resulting data (assuming appropriate protective measures such as de-identification are in place) is unethical as it wastes human participants’ contributions to research and may result in more patients being exposed to harm. Therefore it should be an explicit goal of this policy and any submitted Data Management and Sharing Plans to maximize access subject to necessary restrictions.

Section VI: Data Management and Sharing Plans

The draft states that NIH encourages scientific data to be made available. Instead, it should REQUIRE that scientific data are shared.

An effective Data Management and Sharing Plan should increase the overall impact of a grant and an ineffective one will decrease it. It is important that Data Management and Sharing Plans be provided to NIH peer reviewers and ICO advisory councils so they can consider the plan’s effect on the application’s overall impact, significance, and approach. Guidance to reviewers on how to score review criteria such as significance and approach should include review of the Data Management and Sharing Plan.

Therefore, NIH should require Data Management and Sharing Plans at the regular submission due date for an application, and not as a Just-in-Time submission. Overcoming deficiencies in the Data Management and Sharing Plan identified in summary statements could be provided as a Just-in-Time submission.

NIH should require that data management plans must describe how the researchers address each of the 15 FAIR Principles.

NIH should publish data management plans for funded grants and contracts alongside abstracts in public databases such as RePORTER. This will increase transparency and let other researchers and the public know what the grantees promised to NIH. This is the only thing that will make enforcement of individual plan items possible, given that NIH does not have the resources for exhaustive, systematic checks on compliance. Grantees knowing that their data management and sharing promises are readily available to the public will provide some measure of self-enforcement. Currently data sharing plans are available through Freedom of Information Act requests, and putting them on RePORTER will reduce the burden on data requesters.

The draft says that only data “deemed useful to the research community or the public” need be shared. It should be clear that applicants do not get to unilaterally decide what data is deemed useful. Any exceptions to the general principle that scientific data must be shared must be justified and funding conditioned on prior approval by an NIH advisory committee of data management experts that includes data scientists and librarians.

For intramural research, you should not give a single NIH official (such as Scientific Director or Clinical Director) the ability to assess Data Management and Sharing Plans without oversight. Data Management and Sharing Plans must be reviewed and approved by Boards of Scientific Counselors and ICO advisory councils during the existing periodic peer review and site visit process.

Section VII: Compliance and Enforcement

It is currently unclear where to turn when NIH data sharing expectations and policies are not followed. To solve this, RePORTER should list, for each grant, contact information to request corrective action for violations of the Data Management and Sharing policy or published Data Management and Sharing plans. This should include contact email addresses for the principal investigators/project directors of the grant, contact email addresses for officials representing the grantee institution, and a contact email address at NIH. That will allow issues to be solved at the most local level when possible, and escalated when the more local contact proves ineffective. Similar information should be available for contracts and for intramural research projects.

In addition to reviewing progress reports and addressing complaints, NIH ICOs should also perform more thorough random audits to ensure grantees are performing data management as expected.

Current sanctions listed in the draft policy are incredibly weak and will have no deterrent effect. The policy should mention that failure to follow the Data Management and Sharing policy can be considered research misconduct by NIH. It should also specify that a violation of the policy in place at the time of the competing award can result in sanctions whenever the violation occurs, including after the end of the award period. These sanctions can include publication of a notice describing the violation in the NIH Guide to Grants and Contracts, debarment and suspension from contracting, subcontracting, or financial assistance from the federal government, and prohibition of service to the Public Health Service on advisory committees, boards, or peer review committees, or as a consultant. Because it touches on potential research misconduct, this policy must be reviewed by the HHS Office of Research Integrity.

Supplemental DRAFT Guidance: Allowable Costs for Data Management and Sharing

The guidance should specify that fees that preserve data beyond the funding period are allowed, as are personnel expenses related to data sharing.

Supplemental DRAFT Guidance: Elements of a NIH Data Management and Sharing Plan

An entry of “to be determined” in a Plan is not acceptable. This language will encourage useless Plans and should be removed.

Statements like “NIH does not expect researchers to share all scientific data generated in a study” defeat the purpose of this policy. Instead, NIH should make clear that it does expect and require sharing of scientific data, with only limited exceptions that are justified by the applicant and approved in advance by peer reviewers, program staff, and an NIH advisory committee of data management experts that includes data scientists and librarians.

Section 1 describes “consistency with community practices” as a potential rationale for deciding which data are preserved and shared. In many scientific disciplines, community practices lag far behind general best practices and what the public expects for data management and sharing. This language allows certain communities to settle for mediocrity in data management and sharing, and so defeats the aim of this policy to improve both. It should be removed. This also illustrates why decisions to withhold scientific data from sharing should be reviewed not only by study section members trained in the same discipline but also by an NIH advisory committee of data management experts that includes data scientists and librarians.

Section 4 says that “if an existing data repository(ies) will not be used, consider indicating why not”. This policy should require the use of established repositories, except when exceptions are justified and approved. It should not be up to applicants to unilaterally decide not to use standard established repositories without even justifying that decision.

Section 5 anticipates that applicants may have restrictions on sharing imposed by existing or future agreements. This provides a major loophole in the policy, in that applicants may choose to enter into more restrictive agreements than necessary so that they can avoid data sharing. This can be overcome by (1) providing data sharing plans as part of initial peer review so that peer reviewers can appropriately score any decrease in impact that may come about from restrictions on sharing, and (2) requiring review by an NIH advisory committee that includes data scientists and librarians.

Other Considerations Relevant to this DRAFT Policy Proposal

I applaud your efforts to establish an excellent research data management and sharing policy. As written, I do not think this policy will provide a substantive change in data sharing. To maximize the benefit to the public of providing research funds, it is essential that the policy and enforcement be strengthened as described in this response.

In general, the draft policy is overly cautious and fails to consider the burden an ineffective policy will place on researchers who seek to use shared NIH-funded scientific data. The current system is incredibly burdensome on those seeking to obtain shared data because when data are not available as per existing NIH expectations, investigators can stonewall requests. There is no enforcement and the way to request enforcement is unclear. My most serious concern about this policy is that it is too vague on requirements in some places and lacks sufficient detail on enforcement.

A policy with ineffective, vague requirements and no real enforcement will have a serious negative impact on researchers who seek to use scientific data produced with public funds. There is a huge waste of researcher time and money attempting to obtain data that is lost, improperly described, or withheld. Failure to follow good data management practices leads to great inefficiency and slows the work of many researchers. There is also a large impact on our research communities, which lose opportunities to aggregate data and create a whole that is greater than the sum of its parts.

It is good to have both requirements and incentives to encourage high-quality data management. I suggest that an “Incentives for High-Quality Data Management and Sharing” section be added to the policy, including the following incentives:

  1. Add to the NIH biosketch a section for key personnel to describe their most significant contributions to data management and resource sharing (including data, code, reagents, samples, and other materials). This should be separate from other contributions to avoid it getting short shrift due to lack of space. The past record of the principal investigator and other key personnel should be explicitly added to the scored review criteria.
  2. NIH should create awards to recognize and cultivate excellence in data management and resource sharing, both at the individual researcher and institutional levels.


Thanks to Lucia Peixoto and Tim Triche, Jr. for helpful comments. Thanks to Casey Greene and Anna Greene for sharing a draft of their thoughts on this policy which influenced my own. Much of the text I used here comes from my previous “Feedback on the draft Tri-Agency Research Data Management Policy”.