Page Last Updated: October 10, 2025

Metadata & Naming Conventions

NBDC Data Dictionary

Tabulated HBCD data (instrument and derived data in tabulated format) is organized into standardized tables, each of which contains a set of variables. The metadata for studies released via the NBDC Data Hub consists of:

  • Data dictionary: Provides detailed information about the variables in the tabulated data resource, including the variable name, label, description, data type, and other relevant information.
  • Levels table: Provides information about the levels of categorical variables in the tabulated format data (label, order, etc.)

Below are the definitions for the columns in the data dictionary and levels table. Note that some columns also correspond to elements in the BIDS JSON files that accompany all tabulated data (instrument and derived data in tabulated format); these correspondences are noted in the tables below.

See here for an overview of tabulated vs. file-based data.

Data Dictionary & Levels Column Definitions

CAUTION: Instruction text may be incomplete or misaligned! Review the known issue before use.

Data Dictionary Column Definitions
| Name | Label | Description | { Possible Values } / Example(s) | Mutable (values may vary across releases) |
|---|---|---|---|---|
| study | Study | Indicates whether the table/measure is a core component of the study or belongs to a substudy/ancillary study | { Core; Substudy } | |
| domain | Domain | Domain/HBCD Workgroup | { Behavior/Child-Caregiver Interaction; Biospecimens; Demographics; Neurocognition & Language; Novel Tech; Physical Health; Pregnancy/Exposure Including Substance; Social & Environmental Determinants; Tabular EEG; Tabular imaging } | |
| source | Source | Source of information for this table/measure | { Biological Mother; Caregiver (Responsible Adult); Child; General } | |
| table_name | Table name | Name of table/measure | mh_p_cbcl | |
| table_label (corresponds to MeasurementToolMetadata > Description in BIDS JSON) | Table label | Label for table/measure | Child Behavior Checklist [Parent] | |
| name | Variable name | Name of column/variable/question | mh_p_cbcl__aggr_001 | |
| label (corresponds to Description in BIDS JSON) | Variable label | Label for column/variable/question | "Demands a lot of attention" | |
| instruction | Instruction | Instructions preceding table/measure questions | "The next set of questions is about your child's behavior in different situations and contexts. Please fill in a response to all questions." | |
| header | Header | Header/instructions for a set of questions | "Below is a list of items that describe children and youths. For each item that describes your child ... ... now or within the past 6 months, please choose whether the item is very true or often true of your child, somewhat or sometimes true of your child, or not true of your child. Please answer all items as well as you can, even if some do not seem to apply to your child." | |
| note | Note | Note displayed to participants | "Enter weight in pounds." | |
| unit (corresponds to Units in BIDS JSON) | Unit | Unit of measurement | m, cm2, lbs | |
| type_var (the Derivative element in BIDS JSON is set to true if type_var = summary score or derived item) | Variable type | Type of column/variable/question | { administrative (data that gives context to the assessments, e.g. date of assessment, language, quality control, etc.); item (original data provided by the participant, e.g. questions in a questionnaire); derived item (derived from original data provided by the participant, e.g. if the participant filled in two fields to enter their height in feet and inches, a derived item could integrate this information into one field that provides the height in inches); summary score (summary and/or score output based on algorithmic conversions of items/raw data) } | |
| type_data | Data type | Data type (in database) | { date; timestamp; time; character (only used for categorical columns); text; integer; double } | |
| type_level | Level of measurement | Measurement level/scale type | { nominal; ordinal; interval; ratio } | |
| type_field | Field type | Field type in data capture system as presented to participant | dropdown; radio; checkbox | |
| order_display | Display order | Display order of item within measure | | |
| branching_logic | Branching logic | Branching logic applied to column/variable/question | | |
| label_es | Label (Spanish) | Label (Spanish) | | |
| instruction_es | Instruction (Spanish) | Instruction (Spanish) | | |
| header_es | Header (Spanish) | Header (Spanish) | | |
| note_es | Note (Spanish) | Note (Spanish) | | |
| unique_identifiers | Identifier column(s) | Unique identifier column names (variable/table) | | |
| url_table | Documentation for table | Link to study instrument documentation | | |
| url_table_warn_use | Responsible Use Warning (table) | Link to responsible use warning (table) | | |
| url_table_warn_data | Data Warning (table) | Link to data warning (table) | | |
| url_warn_use | Responsible Use Warning (variable) | Link to responsible use warning (variable) | | |
| url_warn_data | Data Warning (variable) | Link to data warning (variable) | | |
| order_sort | Sort order | Standard sort order in table/measure (and thus column order in data/database) | | |
Levels Definitions
| Name | JSON Element | Description | Example | Mutable (values may vary across releases) |
|---|---|---|---|---|
| name | | Name of the categorical column/variable/question for which value/label pairs are reported | | |
| value | left hand side | Value of the level | 1 | |
| order_level | | Order of response options (in the data and as displayed to participants) | 2 | |
| label | right hand side | Label of the level | Yes | |
| label_es | | Label of the level (Spanish) | Si | |
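The correspondence between these columns and the BIDS JSON elements can be illustrated with a minimal sketch. It assumes the data dictionary and levels table have already been read into lists of dictionaries keyed by the column names above. The variable name, labels, and values are hypothetical, and the exact layout of the released JSON files (for example, whether Derivative is written when it is false) is an assumption rather than something stated here.

```python
import json

# Hypothetical dictionary row and levels rows. The column names come from the
# tables above; the values are made up for illustration.
dictionary_row = {
    "name": "example_var_001",
    "label": "Example yes/no question",
    "unit": "",
    "type_var": "item",
}
levels_rows = [
    {"name": "example_var_001", "value": "0", "order_level": "2", "label": "No"},
    {"name": "example_var_001", "value": "1", "order_level": "1", "label": "Yes"},
    {"name": "other_var_001", "value": "1", "order_level": "1", "label": "Never"},
]


def sidecar_entry(row, levels):
    """Build one BIDS-style JSON entry for a variable (assumed layout)."""
    entry = {"Description": row["label"]}
    if row.get("unit"):
        entry["Units"] = row["unit"]
    # Derivative is true only for summary scores and derived items.
    entry["Derivative"] = row["type_var"] in {"summary score", "derived item"}
    own = [lv for lv in levels if lv["name"] == row["name"]]
    if own:
        own.sort(key=lambda lv: int(lv["order_level"]))
        # value is the left-hand side, label the right-hand side.
        entry["Levels"] = {lv["value"]: lv["label"] for lv in own}
    return entry


entry = sidecar_entry(dictionary_row, levels_rows)
print(json.dumps({dictionary_row["name"]: entry}, indent=2))
```

Sorting by order_level before building the mapping keeps the levels in the order in which they were displayed to participants.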

Lasso User Warnings - HBCD

Additional Columns ('cohort' & 'site') Not Defined in Data Dictionary

    Dataset downloads contain two additional columns that are not described in the data dictionary: cohort and site. These are identical to the Visit Information variables par_visit_data_<cohort|site>.

Blank Columns in Lasso Query Tool

Columns whose names end in _es are currently blank in the Lasso Dictionary Query Tool and will become available in a future release. Some columns in the data dictionary are not applicable to HBCD study data. These columns appear in Lasso Portal queries but have blank values. Examples include atlas, metric, sub_domain, and columns whose names include nda, deap, or redcap. These columns can be safely ignored.

Naming Conventions

A standardized naming convention is used across most tables and fields in the tabulated release data (instrument and derived data in tabulated format). These conventions are adapted from the ABCD Study and ensure consistency across instruments and derived datasets, so that a variable's meaning and structure can be parsed from its name.

Convention Logic & Rules

The standard variable naming format comprises four or five main components:

domain_source_table_{scale}_item

  • Main components are generally separated by a single underscore ( _ ). Most instruments with multiple scales will additionally include the scale component (this component is otherwise optional and not included in all variable names).
  • Subcomponents are separated by double underscores ( __ ) to indicate nested components of table, scale, and/or item. Subcomponents distinguish finer details such as subscales, versions, or counter types. Multiselect fields are preceded by triple underscores ( ___ ); this is mainly relevant for V01 Demographics (sed_bm_demo) variables.
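The separator rules above can be sketched as a small tokenizer. This is an illustration rather than an official parser: the function name and return layout are hypothetical, and the multiselect example name is made up. It splits main components on single underscores only, then splits each main component into subcomponents on double underscores.

```python
import re

# A single underscore that is not part of a longer run of underscores.
_MAIN_SEP = re.compile(r"(?<!_)_(?!_)")


def split_variable_name(name):
    """Split a variable name into main components and subcomponents."""
    # A triple underscore precedes the multiselect part, if present.
    base, _, multiselect = name.partition("___")
    main = _MAIN_SEP.split(base)
    return {
        "main": main,
        "sub": [part.split("__") for part in main],
        "multiselect": multiselect or None,
    }


parts = split_variable_name("ncl_cg_spm2__inf_soc_001")
print(parts["main"])  # ['ncl', 'cg', 'spm2__inf', 'soc', '001']

# Hypothetical multiselect variable name, for illustration only.
print(split_variable_name("sed_bm_demo_x_001___2")["multiselect"])  # 2
```

Administrative and score labels such as date_taken contain single underscores, so this simple split would break them into two parts. Names of that kind need separate handling.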

Naming Component Definitions

| Component | Definition | Example Values |
|---|---|---|
| domain | Data domain (e.g. biospecimens, imaging) | bio (Biospecimens); img (Imaging/MRI); sed (Social & Environmental Determinants); pex (Pregnancy & Exposures, Including Substance Use); see full list |
| source | Subject (who the protocol element is about) / respondent (who completed the assessment), e.g., child, birth parent | bm (Biological Mother); ch (Child); see full list |
| table | Instrument/protocol element name | Varies by instrument |
| {scale} | Name of scale within instrument/protocol element; only present if the instrument contains multiple scales | Varies by instrument - see details |
| item | Either an item number corresponding to an individual question in a scale, or an admin field/score label for administrative/summary score variables - see details | 001; 001__01; etc., or admin field/score label |

Details

Domain & Source: Possible Values

Possible Values: domain

  • bio: BioSpecimens
  • eeg: Tabular EEG
  • mh: Behavior/Child-Caregiver Interaction
  • img: Tabular Imaging
  • ncl: Neurocognition and Language
  • nt: Novel Tech (Novel Technology & Wearable Sensors)
  • pex: Pregnancy/Exposure Including Substance
  • ph: Physical Health
  • sed: Social and Environmental Determinants

Possible Values: source

  • bm: Biological Mother
  • cg: Caregiver (Responsible Adult)
  • ch: Child
  • ld: Linked Data
  • ra: RA (research assistant)
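As a rough sketch, the two code lists can be used as lookup tables to decode the first two components of a variable name. The function name is hypothetical. Names that do not start with a listed domain and source code, such as the derived and imaging exceptions, return None instead of a guess.

```python
# Code-to-label lookups copied from the lists above.
DOMAINS = {
    "bio": "BioSpecimens",
    "eeg": "Tabular EEG",
    "mh": "Behavior/Child-Caregiver Interaction",
    "img": "Tabular Imaging",
    "ncl": "Neurocognition and Language",
    "nt": "Novel Tech (Novel Technology & Wearable Sensors)",
    "pex": "Pregnancy/Exposure Including Substance",
    "ph": "Physical Health",
    "sed": "Social and Environmental Determinants",
}
SOURCES = {
    "bm": "Biological Mother",
    "cg": "Caregiver (Responsible Adult)",
    "ch": "Child",
    "ld": "Linked Data",
    "ra": "RA (research assistant)",
}


def decode_prefix(name):
    """Return (domain label, source label), or None if a prefix is unknown."""
    parts = name.split("_")
    if len(parts) < 3:
        return None
    domain, source = parts[0], parts[1]
    if domain not in DOMAINS or source not in SOURCES:
        return None
    return DOMAINS[domain], SOURCES[source]


print(decode_prefix("mh_cg_ibqr_beh_001"))
print(decode_prefix("par_visit_data_site"))  # None: not a domain_source prefix
```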
Scale Details

Most variables of instruments/tables composed of multiple scales include an additional naming component for scale (with the exception of administrative/summary score variables - see details). The following instruments in the current release are examples of tables that include the scale component in their variable names. Note that this is not a comprehensive list.



| Domain | Instrument | Table Name | Example Variable |
|---|---|---|---|
| BCGI (Behavior & Child-Caregiver Interaction) | IBQ-R (VSF)+BI | mh_cg_ibqr | mh_cg_ibqr_beh_001 |
| PEX (Pregnancy & Exposure, Including Substance Use) | FAM MH | pex_bm_psych | pex_bm_psych_bf_001 |
| SED (Social & Environmental Determinants) | BFY | sed_bm_bfy | sed_bm_bfy_econstr_001 |
| SED (Social & Environmental Determinants) | PROMIS | sed_bm_strsup | sed_bm_strsup_socspprt_001 |

Exceptions

Some variables do not fully follow the standard naming convention; this will be improved in future releases. Notable exceptions are as follows:

Administrative & Summary Score Variables

Administrative (e.g., language or date of administration) and summary score (e.g., sums or means of individual items in a table) variables include administrative fields and score labels in place of item (or {scale}_item where relevant). Admin and score labels often include single underscores, but represent single main components. For example, possible values include:

  • Admin fields: administration; location; lang; date_taken; candidate_age; gestational_age; adjusted_age
  • Score labels: score; summary_score; total_score; etc.
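Because these labels can contain single underscores, one way to handle them is to match known labels at the end of a variable name before splitting the rest. The sketch below assumes only the labels listed above. The source list ends with "etc.", so it is not exhaustive, and the function name and example variable names are hypothetical.

```python
# Labels listed above; the source list is not exhaustive.
ADMIN_FIELDS = ["administration", "location", "lang", "date_taken",
                "candidate_age", "gestational_age", "adjusted_age"]
SCORE_LABELS = ["score", "summary_score", "total_score"]


def split_admin_or_score(name):
    """Return (prefix, label, kind) if name ends with a known label, else None."""
    candidates = [(lab, "admin") for lab in ADMIN_FIELDS]
    candidates += [(lab, "score") for lab in SCORE_LABELS]
    # Try longer labels first so "summary_score" wins over "score".
    candidates.sort(key=lambda pair: len(pair[0]), reverse=True)
    for label, kind in candidates:
        suffix = "_" + label
        if name.endswith(suffix):
            return name[: -len(suffix)], label, kind
    return None


# Hypothetical variable names built on the mh_cg_ibqr table name.
print(split_admin_or_score("mh_cg_ibqr_date_taken"))
print(split_admin_or_score("mh_cg_ibqr_summary_score"))
print(split_admin_or_score("mh_cg_ibqr_beh_001"))
```

Matching the longest label first keeps multi-word labels intact as single main components.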
Derived Variables

Derived tables do not follow the naming conventions outlined above. These include Basic Demographics (sed_basic_demographics), which contains global, static variables, and Visit Information (par_visit_data), which contains dynamic/longitudinal visit-level data. For example, both fall under the domain Demographics and source General in the NBDC Data Dictionary, but use sed_basic (in reference to Social & Environmental Determinants, from which the Basic Demographics information is derived) and par_visit (for participant information from visit-level data) in place of the domain_source naming components.

Biospecimens

Biospecimen table names are largely descriptive, e.g. bio_bm_biosample_nails_results and bio_bm_biosample_urine.

Tabulated MRI, MRS, & EEG Data

Tabulated data derived from MRI, MRS, and EEG file-based data follow a distinct naming convention. All file names begin with the domain (img or eeg) in accordance with the conventions described above, followed by the pipeline name (pipeline) and the basename of the derivative output by that pipeline (derivative):

domain_pipeline_derivative

For example, the following subject/session-level XCP-D derivatives are combined into a single tabulated file:

  • File-based derivatives: sub-{ID}_ses-{V0X}_task-rest_dir-PA_run-{X}_space-fsLR_seg_Gordon_stat-alff_bold.tsv
  • Tabulated file: img_xcpd_space-fsLR_seg_Gordon_stat-alff_bold.tsv
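The mapping in this example can be sketched as a small helper. It is inferred from the single example above, not from a documented rule. The set of leading entities that are dropped (sub, ses, task, dir, run) is an assumption, and the function name, subject ID, and session label are hypothetical.

```python
import re

# Assumption: these subject/session/run-level entities are the ones removed.
DROPPED_ENTITIES = ("sub", "ses", "task", "dir", "run")


def tabulated_name(file_name, domain="img", pipeline="xcpd"):
    """Derive a tabulated file name from a file-based derivative name."""
    keys = "|".join(DROPPED_ENTITIES)
    # Strip leading key-value entities such as "sub-123_" or "run-1_".
    pattern = rf"^(?:(?:{keys})-[^_]+_)+"
    derivative = re.sub(pattern, "", file_name)
    return f"{domain}_{pipeline}_{derivative}"


# Hypothetical subject ID and session label.
source_file = ("sub-123_ses-V02_task-rest_dir-PA_run-1_"
               "space-fsLR_seg_Gordon_stat-alff_bold.tsv")
print(tabulated_name(source_file))
# img_xcpd_space-fsLR_seg_Gordon_stat-alff_bold.tsv
```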

Example

Let's break down the following example: ncl_cg_spm2__inf_soc_001

  • ncl: Neurocognition & Language (domain)
  • cg: Caregiver (source)
  • spm2__inf: nested table name
    • spm2: the SPM-2 instrument (table)
    • inf: Infant version of SPM-2 (table subcomponent)
  • soc: scale for metrics of socialization (scale)
  • 001: item number (item)

Study Design Logic: Child-Centric Data Structure

The HBCD Study organizes data around the Child ID as the central key. All caregiver-provided data (e.g., from biological mothers or other caregivers) is nested under the corresponding Child ID. This structure supports the study’s goal of enabling longitudinal analyses of child development by:

  • Simplifying child-focused analysis: Researchers can track each child’s data over time without remapping caregiver information.
  • Handling multi-birth cases cleanly: When a caregiver reports on multiple children (e.g., twins), each child’s data remains distinct, avoiding complex joins or disambiguation.
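A minimal sketch of what this structure means in practice: grouping rows by the child key gives each child's longitudinal record directly, even when the same caregiver reported on twins. The column names (child_id, session, respondent, score) and all values are hypothetical and are not actual HBCD field names.

```python
from collections import defaultdict

# Hypothetical rows: one row per child per visit, all reported by the same
# biological mother (twins C001 and C002).
rows = [
    {"child_id": "C001", "session": "V01", "respondent": "bm", "score": 12},
    {"child_id": "C002", "session": "V01", "respondent": "bm", "score": 9},
    {"child_id": "C001", "session": "V02", "respondent": "bm", "score": 14},
]

# Group by the child key; no caregiver-to-child remapping is needed.
by_child = defaultdict(list)
for row in rows:
    by_child[row["child_id"]].append(row)

for child_id, visits in by_child.items():
    visits.sort(key=lambda r: r["session"])
    print(child_id, [(v["session"], v["score"]) for v in visits])
```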