Data provided by human participants are subject to a variety of regulations and policies, and several offices at Washington University have oversight of human participant research. Below is a list of considerations for managing and sharing human participant data, along with resources for each consideration. All human participant research is subject to review and approval by the Institutional Review Board (IRB), which works closely with the Human Research Protection Office (HRPO).


If a project will collect data from human participants, informed consent must be obtained from each person who volunteers to provide data for the study. The IRB will rarely approve sharing data from participants who did not provide consent. A written Informed Consent Form (ICF) must be developed and approved by the Washington University IRB or another IRB (e.g., another institution’s IRB serving as the single IRB for a multi-site study, or a commercial IRB when working with industry collaborators). The ICF should include a description of data management and sharing that is consistent with the Data Management and Sharing Plan (DMSP). Depending on the IRB’s procedures, the consent form may be evaluated for consistency with the DMSP at the time of IRB submission.

It is important to ensure that the language in the ICF is consistent with the DMSP. The consent form does not need to include every detail of the DMSP; a summary describing how the data will be collected, used, and shared is sufficient. If the consent language is not consistent with the DMSP, participants may need to be re-consented with modified language. For example, if the DMSP states that data will be shared publicly in a data repository but the ICF states, “data will be shared with other researchers at Washington University,” the data cannot be shared publicly in a repository unless participants are re-consented. In cases of inconsistency between the DMSP and the consent form, it may be necessary to work with the funding agency to modify the DMSP and obtain approval of the revised DMSP. If participants are not re-consented and the data are not shared, the project would be out of compliance with an approved DMSP, which could result in withheld funds or the loss of future funding opportunities.

Which to write first? ICF or DMSP?

In most cases, the DMSP will be written before the IRB approves the ICF, because the IRB does not review projects at the time of grant submission. An IRB submission that includes the ICF will be accepted once a Just-In-Time (JIT) notice has been received. The DMSP should be included in the IRB submission.

If the DMSP is prepared before the IRB approves the ICF and you have questions about consent language, consulting with HRPO/IRB is strongly recommended, especially if this is the first time you will submit a DMSP to a funder. Consider whether the DMSP is feasible with respect to IRB approval and consent. Consult with HRPO/IRB before submitting your DMSP if you propose to:

  • Provide open access to the data.
  • Share data that was collected without consent.
  • Share data with identifiers.

An ICF may be prepared first if pilot data are needed for a grant submission. In this scenario, a protocol and ICF are prepared and submitted to the IRB for approval, after which participants are enrolled and data are collected for use in the grant application. The ICF under which the data were collected should be taken into consideration when developing the DMSP.

WashU HRPO and IRB Help Services

WashU HRPO and the IRB offer Help Services that provide guidance on ICF language related to human participant data:

  1. SWAT On-Call Service: 314-747-6800
  2. Virtual Office Hours
  3. IRB Consultation Request Form

The Washington University One Protocol One Consent is a standardized genomic protocol and consent form that incorporates best-practice language for data sharing and permits linking a participant’s or patient’s genetic data to the research copy of their electronic health record. This protocol and consent form can be added to an investigator’s clinical or research projects.

Institutional Certifications

Investigators working with large-scale human genomic data are required to submit an Institutional Certification to NIH. Learn about this important document and how to prepare it.


HIPAA

Any researcher collecting data from human participants is required to complete HIPAA training outlined by the Washington University HIPAA Privacy Office. Following training, researchers must follow the Policies and Procedures created by the HIPAA Privacy Office in order to maintain research participant privacy, security and rights. Researchers must familiarize themselves with the 18 HIPAA identifiers and how to prevent the disclosure of these identifiers.


De-Identification

Human participant data must be properly de-identified before sharing to protect individual participant privacy. De-identification should be thorough enough that individuals in the dataset are not at risk of being re-identified, including by recognizing themselves in the data. It is also important to consider the population under study. For example, a population of individuals with a rare disease could be so small that the data would be considered identifiable regardless of which identifiers or other information are removed. Imaging data can also be challenging to de-identify. In these cases, data sharing may not be possible if the data cannot be fully de-identified. If you are unsure whether your data can be fully de-identified and shared, contact the IRB using the resources described above when creating your DMSP.

Removing 18 HIPAA Identifiers

The most basic step in de-identification is removing the 18 HIPAA identifiers from a dataset. Stripping these identifiers is often straightforward; however, removing certain data completely can reduce the utility of the dataset. For some types of data, there are techniques that decrease the risk of identification without removing the data entirely. For example, dates (e.g., clinic or lab visits, follow-up survey completion) and unique identifying numbers (e.g., MRN, record ID, study ID, patient ID) are both on the list of HIPAA identifiers. Date Shifting and Record Hashing are methods that reduce the risk of re-identification from dates and unique identifiers while maintaining the utility of the information.
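As a minimal sketch of this first step, the Python example below drops direct-identifier fields from records held as dictionaries. The field names in DIRECT_IDENTIFIERS are hypothetical placeholders, not a complete mapping of the 18 HIPAA identifiers. The unique record ID is deliberately left in place here because it is a candidate for record hashing rather than deletion.

```python
# Hypothetical field names; replace with the identifier fields actually present in your dataset.
DIRECT_IDENTIFIERS = {"name", "street_address", "phone", "email", "mrn", "ssn"}


def drop_direct_identifiers(records: list[dict]) -> list[dict]:
    """Return copies of the records with every direct-identifier field removed."""
    return [
        {field: value for field, value in record.items() if field not in DIRECT_IDENTIFIERS}
        for record in records
    ]


rows = [{"record_id": "1001", "name": "Jane Doe", "mrn": "555123", "age": 47, "score": 12}]
print(drop_direct_identifiers(rows))
# [{'record_id': '1001', 'age': 47, 'score': 12}]
```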

Date Shifting uses software that applies an algorithm to shift dates by a random value between 0 and 364 days. Because all of a participant’s dates are shifted by the same amount, the intervals between them are preserved (e.g., the time between a baseline and a follow-up visit), but the actual dates on which the events occurred are removed.
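A minimal sketch of date shifting in Python follows, assuming visit dates are grouped by a record ID. It draws one random offset of 0 to 364 days per participant (an assumption; some software may apply a single offset to the whole project instead) and subtracts it from every date for that participant. Shifting backward rather than forward is an arbitrary choice, and the function name and data layout are hypothetical.

```python
import secrets
from datetime import date, timedelta


def shift_dates(visits: dict[str, list[date]]) -> dict[str, list[date]]:
    """Shift all of each participant's dates back by one random offset of 0-364 days."""
    shifted = {}
    for record_id, dates in visits.items():
        # secrets.randbelow(365) returns a uniform integer in 0..364; one offset per participant.
        offset = timedelta(days=secrets.randbelow(365))
        shifted[record_id] = [d - offset for d in dates]
    return shifted


visits = {
    "1001": [date(2023, 1, 10), date(2023, 4, 10)],  # baseline and follow-up
    "1002": [date(2023, 2, 1), date(2023, 2, 15)],
}
shifted = shift_dates(visits)
for record_id, dates in shifted.items():
    print(record_id, (dates[1] - dates[0]).days)
# prints "1001 90", then "1002 14"
```

The interval between each participant’s visits is unchanged after shifting, while the calendar dates are not recoverable from the output alone.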

Record Hashing uses software to convert unique identifiers, such as a record ID, into an unrecognizable value. This allows datasets to be shared with a record ID that differs from, and cannot be recognized as, the record ID used internally by the study team. It is important to create a Record ID “Crosswalk File” at the time of record hashing. The Crosswalk File links each original record ID to its newly created hashed value and should be stored by the study team in a secure location separate from any study data.
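The sketch below illustrates record hashing and a crosswalk file in Python. It uses a keyed hash (HMAC-SHA256 from the standard library) so that someone who knows the ID format cannot simply hash every possible ID and match the results; this is one reasonable choice, not necessarily the method any particular software, such as REDCap, uses. The function names, the 16-character truncation, and the CSV column names are illustrative assumptions.

```python
import csv
import hashlib
import hmac
import io
import secrets


def hash_record_ids(record_ids: list[str], key: bytes) -> dict[str, str]:
    """Map each original record ID to a keyed SHA-256 hash (HMAC), truncated to 16 hex characters."""
    return {
        rid: hmac.new(key, rid.encode("utf-8"), hashlib.sha256).hexdigest()[:16]
        for rid in record_ids
    }


def write_crosswalk(crosswalk: dict[str, str], out) -> None:
    """Write the original-to-hashed mapping as CSV to a text file object."""
    writer = csv.writer(out)
    writer.writerow(["original_record_id", "hashed_record_id"])
    writer.writerows(crosswalk.items())


# The key must be kept secret and stored apart from the shared data.
key = secrets.token_bytes(32)
crosswalk = hash_record_ids(["1001", "1002", "1003"], key)

# In practice, write to a file in a secure location separate from the study data;
# an in-memory buffer is used here only for demonstration.
buffer = io.StringIO()
write_crosswalk(crosswalk, buffer)
print(buffer.getvalue())
```

Because the same key always produces the same hashed value, exports made at different times remain linkable to one another; generating a new key breaks that link.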

Depending on the nature of the data, removing the 18 identifiers may be sufficient for de-identification; however, additional steps are often required. Beyond the 18 HIPAA identifiers, datasets can contain indirect identifiers or quasi-identifiers: information that can be combined with other data in the set or with external information (e.g., government databases, social media profiles) to re-identify an individual. Examples include location, salary, occupation, race, ethnicity, disease status, veteran status, pregnancy status, and any values that are outliers relative to the rest of the study sample or population (e.g., ages over 89 are HIPAA identifiers and should be reported as a single category of 90 or older).
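One way to act on these points is sketched below in Python: ages over 89 are top-coded to a single “90+” category, and a check in the spirit of k-anonymity (a technique not named in the guidance above) flags combinations of quasi-identifiers shared by fewer than k records. The field names, sample records, and threshold k are hypothetical; an appropriate threshold depends on the study.

```python
from collections import Counter


def top_code_age(age: int) -> str:
    """Report ages over 89 as a single "90+" category; other ages are kept as-is."""
    return "90+" if age > 89 else str(age)


def small_cells(records: list[dict], quasi_identifiers: list[str], k: int = 5) -> dict[tuple, int]:
    """Return each quasi-identifier combination that appears in fewer than k records."""
    counts = Counter(tuple(record[q] for q in quasi_identifiers) for record in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical sample records.
records = [
    {"age": top_code_age(93), "county": "St. Louis", "occupation": "pilot"},
    {"age": top_code_age(45), "county": "St. Louis", "occupation": "nurse"},
    {"age": top_code_age(45), "county": "St. Louis", "occupation": "nurse"},
]
print(small_cells(records, ["age", "county", "occupation"], k=2))
# {('90+', 'St. Louis', 'pilot'): 1}
```

Records in flagged combinations are candidates for further generalization (e.g., wider age ranges or coarser locations) or suppression before sharing.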

REDCap De-Identification

REDCap offers built-in features for removing identifiers, date shifting and record hashing. To learn more, open the REDCap De-Identification Tutorial and watch the REDCap in a Flash webinar Preparing De-Identified Data Exports.

Additional De-Identification Resources

  1. SAS Based Approach to De-Identification
    1. Jack Shostak, Duke Clinical Research Institute (DCRI), Durham, NC
  2. Johns Hopkins Resource for De-identification
  3. Data Curation Network Data Primers
    1. Consent Form Primer
    2. Human Participant Data Essentials Primer
  4. National Institute of Standards and Technology Tools for de-identification
  5. Department of Health and Human Services Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule

Restricted Access Data Sharing

Another approach to protecting participant privacy is to share data in a repository that allows restricted access. With restricted access data sharing, a dataset is submitted to a repository but is not available for public use. Potential users must complete a form describing their credentials and proposed use of the data before receiving access. Once granted access, the user is bound by a licensing agreement or may be required to enter into a formal data use agreement.


Data Use Agreements

Data Use Agreements are formal contracts between the party that generated the data and the party that will reuse them, with strict language governing how the data can be used, preserved, and destroyed.

  1. Joint Research Office for Contracts (JROC)
  2. Data Use Agreement Intake Form

NIH Guidance

NIH has also released guidance on Informed Consent for Secondary Research with Data and Biospecimens. In addition, some Funding Opportunity Announcements (FOAs) or other grant awards require awardees to submit data generated from the award to a domain-specific repository. The FOA may name a single domain-specific repository or list a few from which to choose; NIH, for example, supports several domain-specific repositories. Some repositories also provide guidance for developing informed consent language. Below are two examples:

  1. NIMH Data Archive (NDA): Crafting Informed Consent Language
  2. OpenNeuro recommended: Open Brain Consent Ultimate Consent Form

Protecting Privacy when Sharing Human Participant Data

To address concerns about protecting privacy when sharing human research participant data, NIH released a notice (NOT-OD-22-213):

Supplemental Information to the NIH Policy for Data Management and Sharing: Protecting Privacy When Sharing Human Research Participant Data (NOT-OD-22-213)