Page Last Updated: October 17, 2025

Naming Conventions

The instrument table and variable names used for tabulated HBCD study data largely follow standardized naming conventions adapted from the ABCD Study. This ensures consistency across instruments and derived datasets, allowing for intuitive parsing of variable meaning and structure.

Convention Logic & Rules

The standard variable naming format consists of four or five main components, each separated by a single underscore ( _ ). The scale component is present only for instruments that contain multiple scales:

domain_source_table_{scale}_item

Variable names may also include subcomponents, separated by a double underscore ( __ ), to indicate nested components of table, scale, and/or item. Subcomponents distinguish finer details such as subscales, versions, or counter types. Finally, multiselect fields are preceded by a triple underscore ( ___ ); this mainly applies to V01 Demographics (sed_bm_demo) variables.

Let's break down the following example: ncl_cg_spm2__inf_soc_001

ncl: Neurocognition & Language (domain)
cg: Caregiver (source)
spm2__inf: nested table name
- spm2: the SPM-2 instrument (table)
- inf: Infant version of SPM-2 (table subcomponent)
soc: scale for metrics of socialization (scale)
001: item number (item)
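
The breakdown above can be sketched in code. This is a minimal Python sketch under stated assumptions, not an official HBCD tool: `parse_variable_name` and `VariableName` are hypothetical names, the text after a triple underscore is assumed to be a multiselect option, and the item is assumed to be a single token such as `001`, so administrative and summary score labels that contain underscores (e.g. `date_taken`) are not handled here.

```python
import re
from typing import NamedTuple, Optional

# Matches a single underscore that is not part of a double or triple run.
SINGLE_UNDERSCORE = re.compile(r"(?<!_)_(?!_)")


class VariableName(NamedTuple):
    domain: str
    source: str
    table: str             # may carry subcomponents, e.g. "spm2__inf"
    scale: Optional[str]   # None when the instrument has no scale component
    item: str
    option: Optional[str]  # text after a triple underscore, if any


def parse_variable_name(name: str) -> VariableName:
    """Split a variable name into its main components (hypothetical helper)."""
    # Assumption: anything after the first triple underscore is a multiselect option.
    head, sep, option = name.partition("___")
    parts = SINGLE_UNDERSCORE.split(head)
    if len(parts) == 4:
        domain, source, table, item = parts
        scale = None
    elif len(parts) == 5:
        domain, source, table, scale, item = parts
    else:
        raise ValueError(f"expected 4 or 5 main components, got {len(parts)}: {name!r}")
    return VariableName(domain, source, table, scale, item, option if sep else None)


parsed = parse_variable_name("ncl_cg_spm2__inf_soc_001")
print(parsed.table.split("__"))  # prints ['spm2', 'inf']
```

Splitting on the double underscore only after the main components are separated keeps the nested table name (`spm2__inf`) intact as one component.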

Naming Component Definitions

Details of individual naming components are as follows:

`domain`	Data domain, e.g. `bio` (Biospecimens), `img` (Imaging) - see values key
`source`	Either the subject (who the protocol element is about) or the respondent (who completed the assessment). Examples include `cg` (Caregiver), `ch` (Child), etc. - see values key
`table`	Instrument/protocol element name
`{scale}`	Name of scale within instrument/protocol element for instruments with multiple scales (not including administrative/summary score variables). For example, the IBQ-R (VSF)+BI includes 4 scales, each indicated by a separate scale component (e.g. Behavioral Inhibition scale annotated by a value of `beh` in variable name `mh_cg_ibqr_beh_001`).
`item`	Either an item number corresponding to an individual question in a scale (e.g. `001`) or an admin field/score label for administrative/summary score variables - see details

Domain Values	Description
`bio`	BioSpecimens
`mh`	Behavior/Child-Caregiver Interaction
`eeg`	Tabular EEG
`img`	Tabular Imaging
`ncl`	Neurocognition and Language
`nt`	Novel Tech
`pex`	Pregnancy/Exposure Including Substance
`ph`	Physical Health
`sed`	Social and Environmental Determinants

Source Values	Description
`bm`	Biological Mother
`cg`	Caregiver (Responsible Adult)
`ch`	Child
`ld`	Linked Data
`ra`	RA (research assistant)
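
The two value keys above can be used as lookup tables when reading a variable name. The following is a minimal sketch, assuming the codes listed in the tables are complete; `describe_prefix` is a hypothetical helper, and it returns `None` for any code it does not recognize (for example, names covered by the exceptions).

```python
from typing import Optional, Tuple

# Codes and descriptions copied from the value keys above.
DOMAINS = {
    "bio": "BioSpecimens",
    "mh": "Behavior/Child-Caregiver Interaction",
    "eeg": "Tabular EEG",
    "img": "Tabular Imaging",
    "ncl": "Neurocognition and Language",
    "nt": "Novel Tech",
    "pex": "Pregnancy/Exposure Including Substance",
    "ph": "Physical Health",
    "sed": "Social and Environmental Determinants",
}

SOURCES = {
    "bm": "Biological Mother",
    "cg": "Caregiver (Responsible Adult)",
    "ch": "Child",
    "ld": "Linked Data",
    "ra": "RA (research assistant)",
}


def describe_prefix(name: str) -> Tuple[Optional[str], Optional[str]]:
    """Look up the first two components of a name (hypothetical helper)."""
    domain, _, rest = name.partition("_")
    source = rest.partition("_")[0]
    # Unknown codes map to None instead of raising an error.
    return DOMAINS.get(domain), SOURCES.get(source)


print(describe_prefix("mh_cg_ibqr_beh_001"))
```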

Administrative & Summary Score Variables

Administrative and summary score variables replace the item naming component with an administrative field or a score label, respectively. Possible values include:

Admin fields	`administration`; `location`; `lang`; `date_taken`; `candidate_age`; `gestational_age`; `adjusted_age`
Score labels	`score`; `summary_score`; `total_score`; etc.
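
Because some of these labels contain underscores themselves, one way to recognize them is to match known labels at the end of a variable name before splitting it. The sketch below assumes exactly that approach; `split_label` is a hypothetical helper, the example variable names are illustrative rather than confirmed, and the score label list is incomplete because the list above ends with "etc.".

```python
from typing import Optional, Tuple

ADMIN_FIELDS = (
    "administration", "location", "lang", "date_taken",
    "candidate_age", "gestational_age", "adjusted_age",
)
# Not exhaustive: the source list of score labels ends with "etc."
SCORE_LABELS = ("score", "summary_score", "total_score")


def split_label(name: str) -> Optional[Tuple[str, str, str]]:
    """Return (prefix, label, kind) if the name ends in a known label, else None."""
    labels = [(label, "admin") for label in ADMIN_FIELDS]
    labels += [(label, "score") for label in SCORE_LABELS]
    # Try longer labels first so "summary_score" wins over plain "score".
    for label, kind in sorted(labels, key=lambda pair: len(pair[0]), reverse=True):
        suffix = "_" + label
        if name.endswith(suffix):
            return name[: -len(suffix)], label, kind
    return None


# Illustrative (unconfirmed) variable names:
print(split_label("mh_cg_ibqr_date_taken"))
print(split_label("mh_cg_ibqr_summary_score"))
```

Names that do not end in a known label, such as item-level variables, return `None` and can be handled by ordinary component splitting.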

Exceptions

Some variables deviate from the standard naming conventions. These exceptions are temporary and will be standardized in future releases.

Derived data: variables such as sed_basic_demographics and par_visit_data in the Demographics domain
Biospecimen data, e.g. bio_bm_biosample_nails_results (see instrument list)
Tabulated MRI and EEG derivatives (see details), which follow the naming convention domain_pipeline_derivative, where:
- domain: either img or eeg
- pipeline: processing pipeline name
- derivative: basename of the derivative output files sourced across participants to generate the tabulated data
Administrative and summary score variables (see details) often contain additional single underscores within a label (e.g. date_taken, summary_score); each such label still counts as a single main component
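
For the tabulated MRI and EEG derivatives listed above, the domain_pipeline_derivative pattern can be read by splitting on the first two underscores only. This is a rough sketch that assumes pipeline names contain no underscores, which the source does not state; `parse_derivative_name` is a hypothetical helper and the example name is invented for illustration.

```python
from typing import NamedTuple


class DerivativeTable(NamedTuple):
    domain: str      # "img" or "eeg"
    pipeline: str    # processing pipeline name
    derivative: str  # basename of the derivative output files


def parse_derivative_name(name: str) -> DerivativeTable:
    """Split a derivative table name (hypothetical helper)."""
    # Assumption: the pipeline name has no underscores, so everything
    # after the second underscore belongs to the derivative basename.
    parts = name.split("_", 2)
    if len(parts) != 3 or parts[0] not in {"img", "eeg"}:
        raise ValueError(f"not a domain_pipeline_derivative name: {name!r}")
    return DerivativeTable(*parts)


# Invented example name, not taken from a release:
print(parse_derivative_name("img_examplepipe_some_output"))
```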

Study Design Logic: Child-Centric Data Structure

The HBCD Study organizes data around the Child ID as the central key. All caregiver-provided data (e.g., from biological mothers or other caregivers) is nested under the corresponding Child ID. This structure supports the study’s goal of enabling longitudinal analyses of child development by:

Simplifying child-focused analysis: Researchers can track each child’s data over time without remapping caregiver information.
Handling multi-birth cases cleanly: When a caregiver reports on multiple children (e.g., twins), each child’s data remains distinct, avoiding complex joins or disambiguation.
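
The child-centric structure described above can be illustrated with a toy example. Everything in this sketch is hypothetical: the field names `child_id` and `respondent`, the IDs, and the values are invented and do not reflect the actual release files. It only shows that keying records by child keeps twins' data separate even when the same caregiver reports on both.

```python
from collections import defaultdict

# Invented rows: one caregiver answers the same item for two children.
rows = [
    {"child_id": "C001", "respondent": "cg", "variable": "mh_cg_ibqr_beh_001", "value": 3},
    {"child_id": "C002", "respondent": "cg", "variable": "mh_cg_ibqr_beh_001", "value": 5},
]

# Nest each caregiver-provided value under the child it describes.
by_child = defaultdict(dict)
for row in rows:
    by_child[row["child_id"]][row["variable"]] = row["value"]

for child_id, variables in by_child.items():
    print(child_id, variables)
```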