Page Last Updated: October 10, 2025

Frequently Asked Questions

Data Access/Use

Please see instructions for data access here.

HBCD data are openly available to qualified researchers who have a legitimate research purpose and are affiliated with an institution that holds an active Federal Wide Assurance, a designation held by numerous institutions worldwide.

Please also see Executive Order 14117, which may cause data access issues for users in certain countries.

There is no direct cost or fee for HBCD data access.

The HBCD Administrative Core (HCAC) and the HBCD Data Coordinating Center (HDCC) do not provide letters that could be interpreted as endorsements for specific applications or projects proposing secondary analyses of HBCD Study data. However, we are committed to fostering open communication with the broader scientific community. To support this, we can provide a letter outlining our commitment to facilitating access to information about the HBCD Data Resource, helping researchers obtain the details they need to pursue their specific aims. To request such a letter, please contact Lilly Tureaud at ltureaud@health.ucsd.edu.

Each institution has its own definitions and requirements for determining whether use of the de-identified HBCD data is considered human subjects research. Please consult directly with your IRB.

It is prohibited to input HBCD data into generative AI tools (e.g., ChatGPT) because doing so would violate the terms of the data use agreement. These agreements strictly limit access to approved individuals to protect sensitive information. Generative AI tools process input data in ways that may result in unauthorized access or unintended use, making them unsuitable for handling restricted data.

Protocol

Refer to the resources below for protocol information. Note that specific measures for proprietary instruments are generally not available.

Resource	Description
HBCD Study Protocols	Available on the main HBCD Study site.
Lasso and DEAP Data Dictionary Explorers	Show survey questions and response options (see related known issue).
HBCD Study Instrument Documentation	Each instrument has a dedicated page on this site with links to source documentation and available surveys. Click instrument names in the domain tables to view their documentation pages.
DCN Special Issue on HBCD	A collection of articles describing the HBCD study design and protocol development for specific measures.

Release Data

HBCD Study data are provided in both tabulated and file-based formats. Tabulated data are in a standardized table format, with one table per measure containing the data for all participants, and include instrument data (e.g., demographics, behavior, environmental determinants) as well as data derived from the file-based data. File-based data are imaging and biosignal data provided in varied formats depending on the modality, including MRI & MRS, EEG, and wearable sensor recordings. See the Data Structure Overview section for further details, including the section Which file-based data are also available as tabulated data?.

The source element in the NBDC Data Dictionary indicates whether the data came from the caregiver, child, etc. Source is also typically included in the table name itself, with some exceptions - see Naming Conventions for details. Note that, in the HBCD Study, all data are collected under the child’s subject ID, even when provided by the birth parent or another caregiver. This is because most information collected from caregivers pertains to the child. Please see details of the design logic here.

Fields reporting age include global, single-point (i.e., static) variables in Basic Demographics (see here); instrument-specific age variables in tabulated data that vary depending on the date of administration for a given instrument; and age variables for raw file-based data that vary based on the date of acquisition. See full details under Age Variable Definitions.

Instrument table and field names may contain either single or double underscores. Single underscores separate main naming components (e.g., the domain or source of the data), while double underscores separate subcomponents that provide additional details nested within the main naming components. See the Naming Conventions section for full details.
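As an illustration of the convention above, the following Python sketch splits a name into its main components and their nested subcomponents. The name `dom_src_inst__part__001` is made up for illustration and is not an actual HBCD table or field name, and `split_name` is a hypothetical helper rather than part of any HBCD tooling.

```python
import re

# Matches a single underscore that is not adjacent to another underscore.
_SINGLE_UNDERSCORE = re.compile(r"(?<!_)_(?!_)")


def split_name(name: str) -> list[list[str]]:
    """Split a name into main components, each given as a list of subcomponents."""
    # Single underscores separate the main naming components.
    main_components = _SINGLE_UNDERSCORE.split(name)
    # Double underscores separate subcomponents within a main component.
    return [component.split("__") for component in main_components]


# Hypothetical name, for illustration only (not a real HBCD field).
print(split_name("dom_src_inst__part__001"))
# [['dom'], ['src'], ['inst', 'part', '001']]
```

Consult the Naming Conventions section for what each component means in an actual table or field name.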

Imaging Data

Please see HBCD Study MRI Protocols.

Please refer to the HBCD Processing Pipelines for an overview of the pipelines and software standards. For full documentation on how each pipeline was used for HBCD processing, please visit the external HBCD Processing page.

The full MRI processing workflow includes BIBSNet (deep learning model-derived brain segmentation), Infant fMRIPrep/Nibabies (structural and functional preprocessing), and XCP-D (functional post-processing and noise regression). The current release includes V02 and V03 BIBSNet derivatives, but only V02 derivatives for the remaining pipelines.

As a result, BrainSwipes quality control results generated from XCP-D visual reports also include only V02. Note also that in this age range, Infant fMRIPrep performs T2w-based surface reconstruction using M-CRIB-S, so T1w surface delineation and atlas registration QC is missing from BrainSwipes. However, the T1w, if present, was still used to inform the brain segmentation generated in BIBSNet, which is provided as an external input to Infant fMRIPrep processing.

FreeSurfer outputs, generated as part of Infant fMRIPrep pipeline processing, are included in the data release within the freesurfer/ folder of the derivatives. See M-CRIB-S & FreeSurfer Source Directories for details. M-CRIB-S, a surface reconstruction method optimized for neonates, is used in place of FreeSurfer for processing. The FreeSurfer files are derived from the M-CRIB-S outputs, which are converted and remapped into a FreeSurfer-compatible format.

Unprocessed raw imaging DICOM files will be made publicly available in the interim Release 1.1. However, raw data converted to the Brain Imaging Data Structure (BIDS) standard is included in HBCD Release 1.0 (see details).

Raw dMRI gradient tables can be found in the raw/ folder containing raw data standardized to the Brain Imaging Data Structure (BIDS). See here for an overview of BIDS and here for details of raw dMRI data. Processed gradient tables, adjusted for head rotation, are additionally provided in the QSIPrep derivatives.

HBCD image processing pipelines use fieldmaps to perform distortion correction for structural and functional MRI data. Most researchers will likely use the processed data for their analyses and therefore will not need the fieldmaps, as all pipeline output derivatives are already distortion corrected. However, if you use the raw BIDS data for your research, note that each fMRI acquisition has a specific pair of fieldmaps associated with it, acquired in the AP and PA phase-encoding directions and located under fmap/. The matching EPI fieldmaps can be identified by the run number, specified by run-{X} in the filename (see details).
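For those working with the raw BIDS data, the run-number matching described above could be sketched in Python as follows. The filenames are invented for illustration and assume BIDS-style `key-value` entities with a `dir-AP`/`dir-PA` entity and an `epi` suffix; actual HBCD filenames may contain additional entities, so check the linked details before relying on this. `parse_entities` and `fieldmaps_for_run` are hypothetical helpers.

```python
def parse_entities(filename: str) -> tuple[dict[str, str], str]:
    """Split a BIDS-style filename into its key-value entities and its suffix."""
    stem = filename.split(".", 1)[0]  # drop the extension (.nii.gz, .json, ...)
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs if "-" in pair)
    return entities, suffix


def fieldmaps_for_run(fmap_files: list[str], run: int) -> dict[str, str]:
    """Return the EPI fieldmap images for one run, keyed by phase-encoding direction."""
    matches = {}
    for name in fmap_files:
        entities, suffix = parse_entities(name)
        run_label = entities.get("run", "")
        if (
            suffix == "epi"
            and name.endswith(".nii.gz")
            and run_label.isdigit()
            and int(run_label) == run
        ):
            matches[entities.get("dir", "unknown")] = name
    return matches


# Invented filenames, as they might appear under fmap/ (assumption).
FMAP_FILES = [
    "sub-0001_ses-V02_dir-AP_run-1_epi.nii.gz",
    "sub-0001_ses-V02_dir-PA_run-1_epi.nii.gz",
    "sub-0001_ses-V02_dir-AP_run-2_epi.nii.gz",
    "sub-0001_ses-V02_dir-PA_run-2_epi.nii.gz",
]

pair = fieldmaps_for_run(FMAP_FILES, run=2)
print(pair["AP"])
print(pair["PA"])
```

Parsing entities by key rather than by position keeps the lookup working if the entity order in a filename differs from the one assumed here.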

Quality control (QC) metrics derived from automated and manual raw data QC procedures (described in the section Raw MR Data QC) are provided for each scan in the session-level sub-{ID}_ses-{V0X}_scans.tsv file. A sampling approach was used to select a subset of data for manual review based on the automated QC metrics. Therefore, while automated QC metrics are available for all scans, not all will include manual QC metrics in the scans.tsv file. Also note that although the QC field is the overall manual QC score of 1 (pass) or 0 (fail), this field will automatically have a score of 1 if only automated QC was performed.
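A minimal Python sketch of reading the overall QC score from a scans.tsv file is shown below. The sample content is made up; only the `QC` column is named in the text above, `filename` is the standard first column of a BIDS scans.tsv file, and the real files contain further QC metric columns that are omitted here. `scans_passing_qc` is a hypothetical helper.

```python
import csv
import io

# Invented excerpt of a scans.tsv file (tab-separated), for illustration only.
SAMPLE_SCANS_TSV = (
    "filename\tQC\n"
    "anat/sub-0001_ses-V02_run-1_T2w.nii.gz\t1\n"
    "func/sub-0001_ses-V02_task-rest_dir-PA_run-1_bold.nii.gz\t0\n"
)


def scans_passing_qc(tsv_text: str) -> list[str]:
    """Return the filenames whose overall QC score is 1 (pass)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["filename"] for row in reader if row["QC"].strip() == "1"]


print(scans_passing_qc(SAMPLE_SCANS_TSV))
# ['anat/sub-0001_ses-V02_run-1_T2w.nii.gz']
```

Keep in mind that a QC score of 1 does not by itself indicate that a scan was manually reviewed, since the field is also set to 1 when only automated QC was performed.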

Only imaging data that have passed quality control (QC) and compliance checks are included in this release. To help researchers make informed decisions, QC metrics are provided in various formats. Please refer to the following sections in the Release Notes for more details:

Raw Imaging Data:

Only data that meet QC standards, as described in Raw MR Data QC, are included.
QC metrics for raw data are available in the sub-{ID}_ses-{V0X}_scans.tsv file within each subject session folder under rawdata/.
Additional exclusion criteria include acquisition parameter checks and processing pipeline requirements.
Structural and functional MRI data undergo MRIQC processing to generate image quality metrics. See the sMRI and fMRI MRIQC derivatives for more information. Researchers may use these outputs for further curation if needed.
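To locate the scans.tsv file for a given subject and session, the filename pattern above can be filled in programmatically. The sketch below assumes the usual BIDS nesting of `sub-{ID}/ses-{V0X}/` folders under the raw data root; the subject ID `0001` is invented, the root folder name is left as a parameter, and `scans_tsv_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def scans_tsv_path(subject_id: str, session: str, root: str = "rawdata") -> PurePosixPath:
    """Build the expected path of the session-level scans.tsv file."""
    sub = f"sub-{subject_id}"
    ses = f"ses-{session}"
    # Assumed layout: <root>/sub-<ID>/ses-<V0X>/sub-<ID>_ses-<V0X>_scans.tsv
    return PurePosixPath(root) / sub / ses / f"{sub}_{ses}_scans.tsv"


# Invented subject ID, for illustration only.
print(scans_tsv_path("0001", "V02"))
# rawdata/sub-0001/ses-V02/sub-0001_ses-V02_scans.tsv
```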

Processed ("Derivative") Imaging Data:

Included raw data are processed through pipelines that generate analysis-ready derivatives.
Processing pipelines, such as XCP-D (for structural and functional MRI) and QSIPrep (for diffusion MRI), produce visual reports that can help guide data selection.
Visual QC is performed on these reports using BrainSwipes, and the results are available in the BIDS phenotype/ folder.

Due to the relatively limited brain coverage in dMRI and fMRI acquisitions, the superior or inferior edges of the brain may occasionally fall outside the slice stack, referred to as field of view (FOV) cutoff. In cases where the cutoff is extreme (>30% of the image), the dMRI and fMRI series fail QC and are therefore excluded from the data release. However, mild (<10%) to moderate (10–30%) FOV cutoff does not lead to QC failure. Brain regions outside of the FOV will have missing values in the tabulated imaging data, but the remaining areas remain usable. Automated post-processing QC metrics provide measurements of superior and inferior FOV cutoff, which researchers can use to exclude participants with significant FOV cutoff from their analyses. See HBCD Raw MRI Data QC in the Release Notes for a description of automated and manual quality control procedures for raw imaging data.
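The thresholds above can be expressed as a small Python function, for example when deciding which participants to exclude from an analysis. This is a sketch only: `classify_fov_cutoff` is a hypothetical helper, its input is assumed to be a cutoff percentage between 0 and 100, and the names and units of the actual QC metrics should be taken from the Release Notes.

```python
def classify_fov_cutoff(percent: float) -> str:
    """Classify an FOV cutoff percentage using the thresholds stated above."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent > 30:
        return "extreme"   # fails QC; series excluded from the release
    if percent >= 10:
        return "moderate"  # 10-30%; does not lead to QC failure
    return "mild"          # <10%; does not lead to QC failure


for value in (5, 10, 30, 45):
    print(value, classify_fov_cutoff(value))
```

An analysis that is sensitive to superior or inferior regions might, for instance, keep only series classified as mild; the appropriate threshold depends on the research question.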