Page Last Updated: October 17, 2025

Naming Conventions

The instrument table and variable names used for tabulated HBCD study data largely follow standardized naming conventions adapted from the ABCD Study. This ensures consistency across instruments and derived datasets, allowing for intuitive parsing of variable meaning and structure.

Convention Logic & Rules

The standard variable name consists of four or five main components separated by single underscores ( _ ). The scale component appears only for instruments that contain multiple scales:

domain_source_table_{scale}_item

Variable names may also include subcomponents, separated by double underscores ( __ ), that indicate nested parts of the table, scale, and/or item components. Subcomponents distinguish finer details such as subscales, versions, or counter types. Finally, multiselect fields are preceded by triple underscores ( ___ ); this mainly applies to V01 Demographics (sed_bm_demo) variables.

Example

Let's break down the following example: ncl_cg_spm2__inf_soc_001

  • ncl: Neurocognition & Language (domain)
  • cg: Caregiver (source)
  • spm2__inf: nested table name
    • spm2: the SPM-2 instrument (table)
    • inf: Infant version of SPM-2 (table subcomponent)
  • soc: scale for metrics of socialization (scale)
  • 001: item number (item)
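
The breakdown above can be sketched in code. The following is a minimal, non-authoritative Python sketch, not an official HBCD tool: the function name parse_variable_name and the returned dictionary keys are illustrative assumptions. It also assumes that the text after a triple underscore identifies the multiselect option, and that the item component contains no single underscores.

```python
import re

# Matches an underscore that is neither preceded nor followed by another
# underscore, i.e. the single-underscore separator between main components.
SINGLE_UNDERSCORE = re.compile(r"(?<!_)_(?!_)")


def parse_variable_name(name):
    """Split a standard variable name into its naming components.

    Hypothetical helper: the key names below are illustrative only.
    """
    # Assumption: anything after a triple underscore is a multiselect option.
    base, _, option = name.partition("___")
    parts = SINGLE_UNDERSCORE.split(base)

    if len(parts) == 4:
        domain, source, table, item = parts
        scale = None
    elif len(parts) == 5:
        domain, source, table, scale, item = parts
    else:
        raise ValueError(f"{name!r} does not have 4 or 5 main components")

    return {
        "domain": domain,
        "source": source,
        # Each list holds the main component followed by any subcomponents.
        "table": table.split("__"),
        "scale": scale.split("__") if scale is not None else None,
        "item": item.split("__"),
        "multiselect_option": option or None,
    }


parsed = parse_variable_name("ncl_cg_spm2__inf_soc_001")
print(parsed["table"], parsed["scale"], parsed["item"])
```

Names whose item component itself contains single underscores (for example date_taken), or that follow one of the exceptions described on this page, would be mis-split or rejected by this sketch and need separate handling.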

Naming Component Definitions

Details of individual naming components are as follows:

  • domain: Data domain, e.g. bio (Biospecimens), img (Imaging) - see values key
  • source: Either the subject (who the protocol element is about) or the respondent (who completed the assessment). Examples include cg (Caregiver) and ch (Child) - see values key
  • table: Instrument/protocol element name
  • {scale}: Name of a scale within the instrument/protocol element, used only for instruments with multiple scales (not including administrative/summary score variables). For example, the IBQ-R (VSF)+BI includes 4 scales, each indicated by its own scale component; the Behavioral Inhibition scale is indicated by beh in the variable name mh_cg_ibqr_beh_001.
  • item: Either an item number corresponding to an individual question in a scale (e.g. 001) or an admin field/score label for administrative/summary score variables - see details
Values Key: domain & source

Domain Values   Description
bio             BioSpecimens
mh              Behavior/Child-Caregiver Interaction
eeg             Tabular EEG
img             Tabular Imaging
ncl             Neurocognition and Language
nt              Novel Tech
pex             Pregnancy/Exposure Including Substance
ph              Physical Health
sed             Social and Environmental Determinants

Source Values   Description
bm              Biological Mother
cg              Caregiver (Responsible Adult)
ch              Child
ld              Linked Data
ra              RA (research assistant)
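
The values key can be held in plain lookup tables. This is a minimal Python sketch under stated assumptions: the dictionary contents are copied from the key above, while the function name describe_prefix and its return shape are hypothetical and not part of any HBCD tooling.

```python
# Domain and source codes, copied from the values key above.
DOMAINS = {
    "bio": "BioSpecimens",
    "mh": "Behavior/Child-Caregiver Interaction",
    "eeg": "Tabular EEG",
    "img": "Tabular Imaging",
    "ncl": "Neurocognition and Language",
    "nt": "Novel Tech",
    "pex": "Pregnancy/Exposure Including Substance",
    "ph": "Physical Health",
    "sed": "Social and Environmental Determinants",
}

SOURCES = {
    "bm": "Biological Mother",
    "cg": "Caregiver (Responsible Adult)",
    "ch": "Child",
    "ld": "Linked Data",
    "ra": "RA (research assistant)",
}


def describe_prefix(name):
    """Return (domain description, source description) for a variable name.

    Hypothetical helper: either element is None when the code is not in
    the key, which is expected for names that follow one of the exceptions.
    """
    parts = name.split("_")
    domain = DOMAINS.get(parts[0])
    source = SOURCES.get(parts[1]) if len(parts) > 1 else None
    return domain, source


print(describe_prefix("ncl_cg_spm2__inf_soc_001"))
```

Returning None instead of raising an error is a design choice for this sketch, so that names outside the standard convention can be flagged rather than stopping a batch run.
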
Administrative & Summary Score Variables

Administrative and summary score variables use an administrative field name or a score label, respectively, in place of the item naming component. Possible values include:

Admin fields: administration; location; lang; date_taken; candidate_age; gestational_age; adjusted_age
Score labels: score; summary_score; total_score; etc.
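
Because several of these labels contain single underscores, they are easier to detect by suffix than by splitting on underscores. The following minimal Python sketch illustrates this. The function name classify_item is hypothetical, the score label list covers only the three values named above (the source list ends in "etc."), and the example variable names other than mh_cg_ibqr_beh_001 are made up for illustration.

```python
# Labels copied from the lists above; the score list is not exhaustive.
ADMIN_FIELDS = (
    "administration", "location", "lang", "date_taken",
    "candidate_age", "gestational_age", "adjusted_age",
)
SCORE_LABELS = ("score", "summary_score", "total_score")


def classify_item(name):
    """Return (stem, kind, label) for a variable name.

    kind is "admin", "score", or "item". Hypothetical helper.
    """
    labels = [(label, "admin") for label in ADMIN_FIELDS]
    labels += [(label, "score") for label in SCORE_LABELS]
    # Try longer labels first so "summary_score" wins over "score".
    for label, kind in sorted(labels, key=lambda p: len(p[0]), reverse=True):
        if name.endswith("_" + label):
            return name[: -len(label) - 1], kind, label
    return name, "item", None


# Hypothetical names, used only to exercise the three branches.
print(classify_item("mh_cg_ibqr_date_taken"))
print(classify_item("mh_cg_ibqr_summary_score"))
print(classify_item("mh_cg_ibqr_beh_001"))
```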

Exceptions

Some variables deviate from the standard naming conventions. These exceptions are temporary and will be standardized in future releases.

  • Derived data, e.g. sed_basic_demographics and par_visit_data in the Demographics domain
  • Biospecimen data, e.g. bio_bm_biosample_nails_results (see instrument list)
  • Tabulated MRI and EEG derivatives (see details):
    Follow the naming convention domain_pipeline_derivative, where:
    • domain: either img or eeg
    • pipeline: processing pipeline name
    • derivative: basename of the derivative output files sourced across participants to generate the tabulated data
  • Administrative and summary score variables (see details) often include additional single underscores within the item component (e.g. date_taken, summary_score); each such label still counts as a single main component
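
For the domain_pipeline_derivative pattern used by tabulated MRI and EEG derivatives, a minimal Python sketch might look as follows. It assumes, without confirmation from this page, that the pipeline name contains no underscores, so everything after the second underscore is treated as the derivative basename. The function name parse_derivative_name and the example name are hypothetical.

```python
def parse_derivative_name(name):
    """Split a tabulated derivative name into (domain, pipeline, derivative).

    Assumption: the pipeline name has no underscores, so the name is split
    on the first two underscores only.
    """
    parts = name.split("_", 2)
    if len(parts) != 3:
        raise ValueError(f"{name!r} does not match domain_pipeline_derivative")
    domain, pipeline, derivative = parts
    if domain not in ("img", "eeg"):
        raise ValueError(f"unexpected domain {domain!r}; expected img or eeg")
    return domain, pipeline, derivative


# Hypothetical name, not taken from the HBCD data dictionary.
print(parse_derivative_name("img_examplepipe_example_derivative"))
```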

Study Design Logic: Child-Centric Data Structure

The HBCD Study organizes data around the Child ID as the central key. All caregiver-provided data (e.g., from biological mothers or other caregivers) is nested under the corresponding Child ID. This structure supports the study’s goal of enabling longitudinal analyses of child development by:

  • Simplifying child-focused analysis: Researchers can track each child’s data over time without remapping caregiver information.
  • Handling multi-birth cases cleanly: When a caregiver reports on multiple children (e.g., twins), each child’s data remains distinct, avoiding complex joins or disambiguation.
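
The child-centric structure can be illustrated with a small Python sketch. The field names (child_id, visit, variable, value), the IDs, and the values are all hypothetical and do not reflect the actual HBCD file layout. The point is only that caregiver-reported rows are keyed by the child they describe, so twins reported on by the same caregiver stay separate.

```python
from collections import defaultdict

# Hypothetical long-format rows. C001 and C002 stand in for twins whose
# caregiver answered the same item once per child.
rows = [
    {"child_id": "C001", "visit": "V01", "variable": "mh_cg_ibqr_beh_001", "value": 3},
    {"child_id": "C002", "visit": "V01", "variable": "mh_cg_ibqr_beh_001", "value": 5},
    {"child_id": "C001", "visit": "V02", "variable": "mh_cg_ibqr_beh_001", "value": 4},
]


def nest_by_child(rows):
    """Group rows as child_id -> visit -> variable -> value."""
    nested = defaultdict(lambda: defaultdict(dict))
    for row in rows:
        nested[row["child_id"]][row["visit"]][row["variable"]] = row["value"]
    # Convert to plain dicts so missing keys raise instead of auto-creating.
    return {child: dict(visits) for child, visits in nested.items()}


by_child = nest_by_child(rows)
# One child's values over time, with no caregiver remapping needed.
print(by_child["C001"])
```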