Page Last Updated: May 18, 2026

Data Structure Overview

The HBCD dataset follows NBDC data structure standards established as part of the ABCD Study (see details). These standards incorporate the Brain Imaging Data Structure (BIDS) wherever possible for cross-study consistency. At a high level, data are organized into two categories: tabulated data and file-based data.

Tabulated Data
Tidy tables with one row per participant session and one column per variable.
Includes: demographics, behavioral/phenotypic questionnaires (see details), and select pipeline derivatives, tabulated to match the format of other instrument data (see details).
See detailed documentation →
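As a minimal sketch of what "one row per participant session" means in practice, the Python snippet below indexes a tab-separated instrument table (such as `rawdata/phenotype/{INSTRUMENT_NAME}.tsv`) by participant and session, and raises an error if any participant session appears twice. The key column names `participant_id` and `session_id`, the function name `load_tidy_table`, and the example IDs and values are assumptions for illustration only; check the headers in the actual release.

```python
import csv
import io
from typing import Dict, Iterable, Tuple

# Assumed key column names (BIDS-style); confirm against the real file headers.
PARTICIPANT_COL = "participant_id"
SESSION_COL = "session_id"


def load_tidy_table(lines: Iterable[str]) -> Dict[Tuple[str, str], Dict[str, str]]:
    """Index a tab-separated table by (participant, session).

    Raises ValueError if a participant session appears more than once,
    i.e. if the table is not one row per participant session.
    """
    table: Dict[Tuple[str, str], Dict[str, str]] = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        key = (row[PARTICIPANT_COL], row[SESSION_COL])
        if key in table:
            raise ValueError(f"duplicate row for {key}")
        # Store only the variable columns; the key columns form the dict key.
        table[key] = {
            name: value
            for name, value in row.items()
            if name not in (PARTICIPANT_COL, SESSION_COL)
        }
    return table


# Usage with hypothetical in-memory data; for a real file, pass open(path, newline="").
example = io.StringIO(
    "participant_id\tsession_id\tscore\n"
    "sub-0001\tses-V01\t12\n"
    "sub-0001\tses-V02\t15\n"
)
table = load_tidy_table(example)
print(table[("sub-0001", "ses-V02")]["score"])  # prints 15
```

Keying on the (participant, session) pair rather than on participant alone reflects the fact that the same participant contributes one row per visit.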

File-Based Data
File-based data is an umbrella term for all data that are not tabulated. These data typically require a file-based format because of the complex or multidimensional nature of certain data modalities.
Includes: raw (BIDS) and processed (derivative) imaging, EEG, and wearable sensor recording data, organized under separate subject folders, as well as concatenated data for measures such as genomics, in which participant-level files are aggregated across all subjects.
See detailed documentation →

Folder Structure

The following conventions are used to improve readability of file tree diagrams throughout this site:

Square brackets [ ] indicate placeholders with many possible values that are not exhaustively listed, e.g., sub-[ID]
Curly brackets { } indicate a defined set of all included values. These values are either listed directly inside the brackets (separated by |) or defined in a Label Values Legend below the file tree.
Sidecar JSON files are either omitted entirely for brevity or indicated by appending (+JSON) to the file they accompany.
Some pipelines generate an .html visual summary report for quality assessment. These reports source images from a figures/ directory within the derivatives folder. The contents of figures/ are not listed for brevity.

hbcd/
├── rawdata/
│
│   # Tabulated data (demographics, behavior, etc.)
│   ├── phenotype/
│   │   └── {INSTRUMENT_NAME}.tsv
│
│   # Raw BIDS (MRI/MRS, EEG, biosensors)
│   ├── sub-[ID]/
│   │   ├── ses-[V0X]/   # Modality-specific subfolders
│   │   │   ├── anat/
│   │   │   ├── dwi/
│   │   │   ├── eeg/
│   │   │   ├── ...
│   │   │   └── sub-[ID]_ses-[V0X]_scans.tsv
│   │   └── sub-[ID]_sessions.tsv
│
│   # Dataset-level metadata
│   ├── dataset_description.json
│   └── participants.tsv
│
├── derivatives/
│   # Processed outputs by pipeline
│   └── {PIPELINE_NAME}/
│       └── sub-[ID]/
│           └── ses-[V0X]/   # Mirrors rawdata structure
│
└── concatenated/
    # Aggregated cross-subject datasets
    ├── genetics/
    ├── geocoding/
    └── study_navigator/
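The folder layout above can also be navigated programmatically. The sketch below assumes a local copy rooted at the `hbcd/` directory shown in the tree and uses hypothetical helper names; it builds raw and derivative session paths with `pathlib` and lists every `sub-*/ses-*` folder under `rawdata/`. The pipeline name `example_pipeline` and the IDs are placeholders, not real HBCD values.

```python
import tempfile
from pathlib import Path
from typing import Iterator, Tuple


def raw_session_dir(root: Path, sub_id: str, ses: str) -> Path:
    """rawdata/sub-[ID]/ses-[V0X]/ for one participant session."""
    return root / "rawdata" / f"sub-{sub_id}" / f"ses-{ses}"


def scans_tsv(root: Path, sub_id: str, ses: str) -> Path:
    """Per-session scans table: sub-[ID]_ses-[V0X]_scans.tsv."""
    return raw_session_dir(root, sub_id, ses) / f"sub-{sub_id}_ses-{ses}_scans.tsv"


def derivative_session_dir(root: Path, pipeline: str, sub_id: str, ses: str) -> Path:
    """derivatives/{PIPELINE_NAME}/sub-[ID]/ses-[V0X]/, mirroring rawdata."""
    return root / "derivatives" / pipeline / f"sub-{sub_id}" / f"ses-{ses}"


def iter_raw_sessions(root: Path) -> Iterator[Tuple[str, str]]:
    """Yield (ID, session label) for every session folder under rawdata/."""
    for sub_dir in sorted((root / "rawdata").glob("sub-*")):
        if not sub_dir.is_dir():
            continue
        for ses_dir in sorted(sub_dir.glob("ses-*")):
            if ses_dir.is_dir():  # skips files such as sub-[ID]_sessions.tsv
                yield sub_dir.name[len("sub-"):], ses_dir.name[len("ses-"):]


# Usage: build a tiny placeholder tree in a temporary directory and list it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "hbcd"
    (raw_session_dir(root, "0001", "V02") / "anat").mkdir(parents=True)
    found = list(iter_raw_sessions(root))
    print(found)  # prints [('0001', 'V02')]
```

Deriving the derivative path from the same (ID, session) pair as the raw path relies on the derivatives folder mirroring the rawdata structure, as noted in the tree.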

Tabulated Pipeline Derivatives

Processing pipelines for imaging, EEG, and wearable sensor recordings write derivative files to separate subject- and session-specific directories. Whenever possible, derivative data are also combined across participants into a single file provided with the tabulated data. Users may choose either the file-based or the tabulated data for their analyses, depending on their needs. See file-naming conventions for tabulated derivatives here.
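To illustrate what combining derivatives across participants involves, the following sketch merges per-session summary values into a single tab-separated table with one row per participant session and one column per variable. It is not the HBCD pipelines' actual tabulation code; the function name `tabulate_derivatives`, the `participant_id`/`session_id` columns, and the variable names are hypothetical.

```python
import csv
import io
from typing import Mapping, Tuple

# (participant_id, session_id); these column names are assumptions.
SessionKey = Tuple[str, str]


def tabulate_derivatives(per_session: Mapping[SessionKey, Mapping[str, float]]) -> str:
    """Combine per-session summary values into one tab-separated table.

    One output row per participant session; one column per variable.
    Variables missing for a session are left blank.
    """
    variables = sorted({name for values in per_session.values() for name in values})
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["participant_id", "session_id", *variables],
        delimiter="\t",
        restval="",  # blank cell when a session lacks a variable
        lineterminator="\n",
    )
    writer.writeheader()
    # Sort rows by (participant, session) for a stable, readable order.
    for (sub, ses), values in sorted(per_session.items()):
        writer.writerow({"participant_id": sub, "session_id": ses, **values})
    return out.getvalue()


# Usage with hypothetical per-session summaries.
summaries = {
    ("sub-0002", "ses-V01"): {"var_a": 1.5},
    ("sub-0001", "ses-V01"): {"var_a": 2.0, "var_b": 7},
}
print(tabulate_derivatives(summaries))
```

Taking the union of variable names and leaving missing cells blank keeps the table rectangular even when some sessions lack a given output.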

Not all processed data are available in tabulated form. Because tabulated datasets have one row per participant session, only derivatives that can be summarized in that row-and-column structure are tabulated. If no tabulated file exists for the derivatives you need, use the file-based data instead.
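The rule above can be sketched as two small checks: whether a per-session derivative reduces to scalar values that fit in a single row, and where to look for it. Both functions are hypothetical. In particular, placing tabulated derivative files under `rawdata/phenotype/` and passing their file name in directly are assumptions, since the actual file-naming conventions are documented separately; `example_table.tsv` and `example_pipeline` are placeholders.

```python
import tempfile
from pathlib import Path
from typing import Any, Mapping, Optional


def is_tabulatable(summary: Mapping[str, Any]) -> bool:
    """True if every value is a scalar, so the summary fits in one table row.

    Array-like outputs (e.g. a time series) must be summarized first or stay file-based.
    """
    return all(v is None or isinstance(v, (str, int, float)) for v in summary.values())


def locate_derivative(root: Path, tabulated_name: Optional[str], pipeline: str) -> Path:
    """Prefer a tabulated file; otherwise fall back to the file-based derivatives.

    Assumption: tabulated derivatives sit under rawdata/phenotype/ with a known name.
    """
    if tabulated_name is not None:
        candidate = root / "rawdata" / "phenotype" / tabulated_name
        if candidate.is_file():
            return candidate
    return root / "derivatives" / pipeline


print(is_tabulatable({"var_a": 1.5, "qc_pass": True}))  # prints True
print(is_tabulatable({"timeseries": [0.1, 0.2, 0.3]}))  # prints False

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # No tabulated file yet, so the lookup falls back to derivatives/example_pipeline.
    before = locate_derivative(root, "example_table.tsv", "example_pipeline").relative_to(root)
    tab = root / "rawdata" / "phenotype" / "example_table.tsv"
    tab.parent.mkdir(parents=True)
    tab.write_text("participant_id\tsession_id\n")
    # Once the tabulated file exists, it is preferred.
    after = locate_derivative(root, "example_table.tsv", "example_pipeline").relative_to(root)
```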