Data provided by human participants are subject to a variety of regulations and policies. Several offices at Washington University have oversight of human participant research. Below is a list of data management and sharing considerations for human participant data, with resources for each consideration. All human participant research is subject to review and approval by the Institutional Review Board (IRB), which works closely with the Human Research Protection Office (HRPO).


If a project will collect data from human participants, informed consent must be obtained from each person who volunteers to provide data for the study unless a waiver of consent has been obtained. When consent is required, a written Informed Consent Form (ICF) must be developed and approved by the Washington University IRB or another IRB (e.g., another institution’s IRB for a multi-site study with a single IRB). The ICF should include a description of how human participant data will be collected, stored, protected, preserved, and shared.

A description of informed consent language is often included in data management and sharing (DMS) plans required by funders as well. It is important to ensure that the details of the processes described above (data collection, storage, protection, preservation, and sharing) are the same in the ICF approved by the IRB and the DMS plan submitted to a funder. Specifically, if your funder requires data sharing, it is crucial that the data sharing language in the ICF matches the data sharing language in the DMS plan. If the ICF includes language that restricts sharing data as described in the DMS plan, participants may need to be re-consented with modified language. For example, suppose the DMS plan states that data will be shared publicly in a data repository but the ICF states, “data will be shared with other researchers at Washington University.” Because the consent language limits sharing to WashU researchers, data from participants who signed an ICF with this language cannot be shared publicly unless participants are re-consented or another process is approved by the IRB. If participants are not re-consented and data are not shared, this would be considered noncompliance with the DMS plan and could lead to withheld funds or loss of future funding opportunities.

For additional information on data sharing language and ICFs, please review the Curation of Data Collected by Informed Consent Primer from the Data Curation Network. 

Which to write first? ICF or DMS Plan?

Depending on the nature of your project, either the ICF or the DMS plan may be prepared first. For example, an ICF may be prepared first if pilot data are needed for a grant submission. In this scenario, a protocol and ICF would be prepared and submitted to the IRB for approval, followed by enrolling participants and collecting data for use in the grant. However, if the grant submission relies on data from a previous study, Electronic Health Records (EHR), etc. rather than pilot data, the researchers may not have an approved ICF prior to submitting the grant. In this scenario, the DMS plan would be prepared before the ICF is approved by the IRB. If the DMS plan is prepared prior to receiving ICF approval and you have questions about these processes, it is strongly recommended that you consult with HRPO/IRB on the details of the DMS plan, especially if this is the first time you will submit a DMS plan to a funder.

WashU HRPO and IRB HELP Services

WashU HRPO and IRB offer HRPO Help Services to provide guidance on language included in the ICF related to human participant data:

  1. SWAT On-Call Service: 314-747-6800
  2. Virtual Office Hours, Wednesdays, 9:30 a.m. – 1 p.m.
  3. IRB Consultation Request Form

The Washington University One Protocol One Consent is a standardized genomic protocol and consent that incorporates best-practices language for data sharing and permits the linkage of a participant’s or patient’s genetic data to the research copy of their electronic health record. This protocol and consent form can be used as an addition to an investigator’s clinical or research projects.

Institutional Certifications

Investigators working with large-scale human genomic data are required to submit an Institutional Certification to NIH. Learn about this important document and how to prepare it.


HIPAA

Any researcher collecting data from human participants is required to complete HIPAA training outlined by the Washington University HIPAA Privacy Office. Following training, researchers must follow the Policies and Procedures created by the HIPAA Privacy Office in order to maintain research participant privacy, security and rights. Researchers must familiarize themselves with the 18 HIPAA identifiers and how to prevent the disclosure of these identifiers.


De-Identification

Human participant data must be properly de-identified prior to sharing to ensure individual participant privacy. De-identification should be thorough enough that individuals in the dataset are not at risk of being re-identified, and that participants cannot even identify themselves in the data.

Removing 18 HIPAA Identifiers

The most basic step for de-identification is removing the 18 HIPAA identifiers from a dataset. Ideally, stripping the dataset of the 18 HIPAA identifiers is straightforward. However, complete removal of certain data can cause loss of data utility. For certain types of data, there are techniques that decrease the risk of identification without completely removing the data. For example, dates (such as clinic or lab visits, follow-up survey completion, etc.) and unique identifying numbers (e.g., MRN, record ID, study ID, patient ID, etc.) are included in the list of HIPAA identifiers. Date Shifting and Record Hashing are methods that reduce the risk of re-identification from dates and unique identifiers while maintaining the utility of the information.

Date Shifting involves software that uses an algorithm to randomly shift dates by a value between 0 and 364 days. The duration of time between dates within the project is maintained (e.g., the amount of time between a baseline and follow-up visit), but the actual dates on which the events occurred are removed.
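The idea behind date shifting can be illustrated with a minimal Python sketch. This is not the algorithm used by REDCap or any other specific tool. It assumes that each participant receives a single random offset, that dates are shifted backward in time, and that records are plain dictionaries; the function name, field names, and sample values are all hypothetical.

```python
import random
from datetime import date, timedelta


def shift_dates(records, date_fields, id_field="record_id",
                max_shift_days=364, seed=None):
    """Shift all dates for a participant by one random per-participant offset.

    Assumption: dates move backward in time. Using the same offset for
    every date of a participant preserves the intervals between events.
    """
    rng = random.Random(seed)
    offsets = {}   # participant id -> number of days shifted
    shifted = []
    for rec in records:
        pid = rec[id_field]
        if pid not in offsets:
            offsets[pid] = rng.randint(0, max_shift_days)
        delta = timedelta(days=offsets[pid])
        new_rec = dict(rec)  # copy so the original record is left untouched
        for field in date_fields:
            if new_rec.get(field) is not None:
                new_rec[field] = new_rec[field] - delta
        shifted.append(new_rec)
    return shifted, offsets


# Hypothetical example data
records = [
    {"record_id": "A1", "baseline": date(2023, 1, 10), "follow_up": date(2023, 4, 10)},
    {"record_id": "B2", "baseline": date(2023, 2, 1), "follow_up": date(2023, 2, 15)},
]
shifted, offsets = shift_dates(records, ["baseline", "follow_up"], seed=42)

# The gap between baseline and follow-up is unchanged after shifting.
interval_before = records[0]["follow_up"] - records[0]["baseline"]
interval_after = shifted[0]["follow_up"] - shifted[0]["baseline"]
```

In this sketch the `offsets` dictionary would allow the shift to be reversed, so it would need the same protection as the original dates.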

Record Hashing involves software converting unique identifiers, such as a record ID, to an unrecognizable value. This allows for datasets to be shared with a record ID that is different and unrecognizable from the record ID used internally by the study team. It is important that a Record ID “Crosswalk File” is created at the time of record hashing. The Crosswalk File links the original record ID to the newly created hashed value. This file should be stored by the study team in a secure location that is separate from any study data. 
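A minimal Python sketch of record hashing with a crosswalk is shown below. It is an illustration under stated assumptions, not the method used by REDCap or any particular tool: it uses a keyed hash (HMAC-SHA-256) with a randomly generated secret key so that short, guessable record IDs cannot be recovered by simply hashing every possible ID. The function name, the truncation length, and the sample IDs are hypothetical.

```python
import hashlib
import hmac
import secrets


def hash_record_ids(record_ids, secret_key, length=16):
    """Return a crosswalk mapping each original record ID to a hashed value.

    Assumption: a keyed hash (HMAC-SHA-256) truncated to `length` hex
    characters is acceptable for the project.
    """
    crosswalk = {}
    for rid in record_ids:
        digest = hmac.new(secret_key, str(rid).encode("utf-8"),
                          hashlib.sha256).hexdigest()
        crosswalk[rid] = digest[:length]
    # Truncation makes collisions possible in principle, so check for them.
    if len(set(crosswalk.values())) != len(crosswalk):
        raise ValueError("Hash collision detected; increase `length`.")
    return crosswalk


# Hypothetical example: the key and the crosswalk stay with the study team.
secret_key = secrets.token_bytes(32)
crosswalk = hash_record_ids(["1001", "1002", "1003"], secret_key)

# The shared dataset would carry only the hashed values.
shared_ids = list(crosswalk.values())
```

Here the `crosswalk` dictionary plays the role of the Crosswalk File described above. Both it and `secret_key` would be stored securely and separately from the study data.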

Depending on the nature of the data, removing the 18 identifiers may be sufficient for de-identification. However, additional steps are often required. In addition to the 18 HIPAA identifiers, datasets can contain indirect or quasi-identifiers. Indirect and quasi-identifiers are information that can be combined together or with external information (e.g., government databases, social media profiles) to re-identify an individual. Examples include location, salary, occupation, race, ethnicity, disease status, veteran status, pregnancy status, or any values that are outliers relative to the rest of the study sample or population (e.g., age greater than 89 is considered an identifier and should be reported as a range).
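One way to look for risky quasi-identifiers is sketched below in Python. It is a simplified illustration, not a complete de-identification procedure. It reports ages as ranges, collapses ages of 90 and above into a single category, and flags combinations of quasi-identifier values shared by fewer than `k` records. The small-cell check is in the spirit of k-anonymity, a technique not named in the text above; the function names, the threshold, and the sample records are hypothetical.

```python
from collections import Counter


def generalize_age(age, bin_width=10, top_code=90):
    """Report age as a range, with one open-ended category at the top."""
    if age >= top_code:
        return f"{top_code}+"
    low = (age // bin_width) * bin_width
    return f"{low}-{low + bin_width - 1}"


def small_cells(records, quasi_identifiers, k=5):
    """Return quasi-identifier combinations shared by fewer than k records."""
    counts = Counter(tuple(rec[q] for q in quasi_identifiers) for rec in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical example data
sample = [
    {"age": 34, "occupation": "teacher"},
    {"age": 37, "occupation": "teacher"},
    {"age": 92, "occupation": "retired"},
]
generalized = [{**rec, "age": generalize_age(rec["age"])} for rec in sample]

# With k=2, only the combination that appears once is flagged.
flagged = small_cells(generalized, ["age", "occupation"], k=2)
```

Flagged combinations are candidates for further generalization (wider ranges, broader categories) or suppression before the data are shared.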

REDCap De-Identification

REDCap offers built-in features for removing identifiers, date shifting and record hashing. To learn more, open the REDCap De-Identification Tutorial and watch the REDCap in a Flash webinar Preparing De-Identified Data Exports.

Additional De-Identification Resources

  1. SAS Based Approach to De-Identification
    1. Jack Shostak, Duke Clinical Research Institute (DCRI), Durham, NC
    1. Includes source code that is useful for de-identification
  2. Johns Hopkins Resource for De-identification
  3. Data Curation Network Data Primers
    1. Consent Form Primer
    1. Human Participant Data Essentials Primer
  4. National Institute of Standards and Technology Tools for de-identification
  5. Department of Health and Human Services Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule

Restricted Access Data Sharing

Another approach to protecting patient privacy is sharing data in a repository that allows restricted access. Restricted Access data sharing involves submitting a dataset to a repository without making the dataset available for public use. Potential users are prompted to complete a form with information about their credentials and proposed use of the data before receiving access. After receiving access, the user is subject to a licensing agreement or may be required to enter into a formal data use agreement.


Data Use Agreements

Data Use Agreements are formal contracts between a person who generated data and a person re-using data that have strict language regarding how data can be used, preserved, and destroyed. 

  1. Joint Research Office for Contracts (JROC)
  2. Data Use Agreement Intake Form

NIH Guidance

The NIH also released guidance on Informed Consent for Secondary Research with Data and Biospecimens. In addition, some Funding Opportunity Announcements (FOAs) or other grant awards state that awardees are required to submit data generated from the award to a domain-specific repository. The FOA may list a single domain-specific repository or provide a few repositories from which to choose. For example, the NIH has several domain-specific repositories. Some repositories also provide guidance for developing informed consent language. Below are two examples:

  1. NIMH Data Archive (NDA): Crafting Informed Consent Language
  2. OpenNeuro recommended: Open Brain Consent Ultimate Consent Form

Protecting Privacy when Sharing Human Participant Data

To address the concerns about protecting privacy when sharing human research participant data, NIH released a notice (NOT-OD-22-213).

Supplemental Information to the NIH Policy for Data Management and Sharing: Protecting Privacy When Sharing Human Research Participant Data (NOT-OD-22-213)