Page Last Updated: October 10, 2025

Metadata & Naming Conventions

NBDC Data Dictionary

Tabulated HBCD data (instrument and derived data in tabulated format) is organized into standardized tables, each of which contains a set of variables. The metadata for studies released via the NBDC Data Hub consists of:

Data dictionary: Provides detailed information about the variables in the tabulated data resource, including the variable name, label, description, data type, and other relevant information.
Levels table: Provides information about the levels of categorical variables in the tabulated format data (label, order, etc.)

Below are the definitions for the columns in the data dictionary and levels table. Note that some columns also correspond to elements in the BIDS JSON files that accompany all tabulated (instrument and derived) data; these correspondences are noted next to the column name in the tables below.

Data Dictionary & Levels Column Definitions

Name	Label	Description	{ Possible Values } / Example(s)
`study`	Study	Indicates whether the table/measure is a core component of the study or belongs to a substudy/ancillary study	{ Core; Substudy }
`domain`	Domain	Domain/HBCD Workgroup	{ Behavior/Child-Caregiver Interaction; Biospecimens; Demographics; Neurocognition & Language; Novel Tech; Physical Health; Pregnancy/Exposure Including Substance; Social & Environmental Determinants; Tabular EEG; Tabular imaging }
`source`	Source	Source of information for this table/measure	{ Biological Mother; Caregiver (Responsible Adult); Child; General }
`table_name`	Table name	Name of table/measure	`mh_p_cbcl`
`table_label` (corresponds to MeasurementToolMetadata > Description in BIDS JSON)	Table label	Label for table/measure	Child Behavior Checklist [Parent]
`name`	Variable name	Name of column/variable/question	`mh_p_cbcl__aggr_001`
`label` (corresponds to Description in BIDS JSON)	Variable label	Label for column/variable/question	"Demands a lot of attention"
`instruction`	Instruction	Instructions preceding table/measure questions	"The next set of questions is about your child's behavior in different situations and contexts. Please fill in a response to all questions."
`header`	Header	Header/instructions for a set of questions	"Below is a list of items that describe children and youths. For each item that describes your child ... ... now or within the past 6 months, please choose whether the item is very true or often true of your child, somewhat or sometimes true of your child, or not true of your child. Please answer all items as well as you can, even if some do not seem to apply to your child."
`note`	Note	Note displayed to participants	"Enter weight in pounds."
`unit` (corresponds to Units in BIDS JSON)	Unit	Unit of measurement	m, cm2, lbs
`type_var` (Derivative element in BIDS JSON is set to true if type_var = summary score or derived item)	Variable type	Type of column/variable/question	{ administrative: data that gives context to the assessments, e.g. date of assessment, language, quality control, etc.; item: original data provided by the participant, e.g. questions in a questionnaire; derived item: derived from original data provided by the participant, e.g. if the participant filled in two fields to enter their height in feet and inches, a derived item could integrate this information into one field that provides the height in inches; summary score: summary and/or score output based on algorithmic conversions of items/raw data }
`type_data`	Data type	Data type (in database)	{ date; timestamp; time; character (only used for categorical columns); text; integer; double }
`type_level`	Level of measurement	Measurement level/scale type	{ nominal; ordinal; interval; ratio }
`type_field`	Field type	Field type in data capture system as presented to participant	dropdown; radio; checkbox
`order_display`	Display order	Display order of item within measure
`branching_logic`	Branching logic	Branching logic applied to column/variable/question
`label_es`	Label (Spanish)	Label (Spanish)
`instruction_es`	Instruction (Spanish)	Instruction (Spanish)
`header_es`	Header (Spanish)	Header (Spanish)
`note_es`	Note (Spanish)	Note (Spanish)
`unique_identifiers`	Identifier column(s)	Unique identifier column names (variable/table)
`url_table`	Documentation for table	Link to study instrument documentation
`url_table_warn_use`	Responsible Use Warning (table)	Link to responsible use warning (table)
`url_table_warn_data`	Data Warning (table)	Link to data warning (table)
`url_warn_use`	Responsible Use Warning (variable)	Link to responsible use warning (variable)
`url_warn_data`	Data Warning (variable)	Link to data warning (variable)
`order_sort`	Sort order	Standard sort order in table/measure (which also determines the column order in data/database)
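
The correspondences between data dictionary columns and BIDS JSON elements noted above can be sketched as a small mapping function. This is a minimal illustration, not the pipeline that produces the released JSON files: the function name `sidecar_entry` is hypothetical, the row values are illustrative, and omitting `Derivative` when it is not true is an assumption (the table only states when it is set to true).

```python
import json

# Values of type_var for which the table above sets Derivative to true.
DERIVATIVE_TYPES = {"summary score", "derived item"}


def sidecar_entry(row):
    """Build a BIDS-style JSON entry from one data dictionary row (a dict)."""
    entry = {"Description": row["label"]}  # label -> Description
    if row.get("unit"):
        entry["Units"] = row["unit"]       # unit -> Units
    if row.get("type_var") in DERIVATIVE_TYPES:
        # Assumption: the key is simply left out when it would not be true.
        entry["Derivative"] = True
    return entry


# Illustrative row; only the column names are taken from the data dictionary.
row = {
    "name": "mh_p_cbcl__aggr_001",
    "label": "Demands a lot of attention",
    "unit": "",
    "type_var": "item",
}
print(json.dumps({row["name"]: sidecar_entry(row)}, indent=2))
```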

Name	JSON Element	Description	Example
`name`		Name of the categorical column/variable/question for which value/label pairs are reported
`value`	left hand side	Value of the level	1
`order_level`		Order of response option (in data and how they were displayed to participants)	2
`label`	right hand side	Label of the level	Yes
`label_es`		Label of the level (Spanish)	Si
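
As a rough sketch of how the levels table can be used, the snippet below turns levels rows into a value-to-label lookup per variable and applies it to raw responses. The rows, values, and labels shown are hypothetical; only the column names (`name`, `value`, `order_level`, `label`) come from the table above.

```python
from collections import defaultdict

# Hypothetical levels-table rows; values and labels are illustrative only.
levels_rows = [
    {"name": "mh_p_cbcl__aggr_001", "value": "2", "order_level": 3,
     "label": "Very true or often true"},
    {"name": "mh_p_cbcl__aggr_001", "value": "0", "order_level": 1,
     "label": "Not true"},
    {"name": "mh_p_cbcl__aggr_001", "value": "1", "order_level": 2,
     "label": "Somewhat or sometimes true"},
]


def build_levels(rows):
    """Return {variable name: {value: label}}, with levels in order_level order."""
    lookup = defaultdict(dict)
    for r in sorted(rows, key=lambda r: (r["name"], r["order_level"])):
        lookup[r["name"]][r["value"]] = r["label"]
    return dict(lookup)


levels = build_levels(levels_rows)
raw_responses = ["0", "2", "1"]
labels = [levels["mh_p_cbcl__aggr_001"][v] for v in raw_responses]
print(labels)
```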

Lasso User Warnings - HBCD

Dataset downloads contain two additional columns not described in the data dictionary: cohort and site. These are identical to the Visit Information variables par_visit_data_<cohort|site>.

Columns whose names end in _es are currently blank in the Lasso Dictionary Query Tool and will become available in a future release. Some columns in the data dictionary are not applicable to HBCD study data. These columns appear in Lasso Portal queries but have blank values. Examples include atlas, metric, sub_domain, and columns whose names include nda, deap, or redcap. These columns can be safely ignored.

Naming Conventions

A standardized naming convention is used across most tables and fields in the tabulated (instrument and derived) release data. These conventions are adapted from the ABCD Study and ensure consistency across instruments and derived datasets, allowing for intuitive parsing of variable meaning and structure.

Convention Logic & Rules

The standard variable naming format consists of 4 or 5 main components:

domain_source_table_{scale}_item

Main components are generally separated by a single underscore ( _ ). Most instruments with multiple scales will additionally include the scale component (this component is otherwise optional and not included in all variable names).
Subcomponents are separated by double ( __ ) underscores to indicate nested components of table, scale, and/or item. Subcomponents distinguish finer details such as subscales, versions, or counter types. Multiselect fields are preceded by triple underscores ( ___ ), mainly relevant for V01 Demographics (sed_bm_demo) variables.
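
The delimiter rules above can be sketched as a small helper that composes a name from its parts. This is an illustration only: `build_name` is a hypothetical function, and the multiselect name in the usage lines is made up to show the triple underscore, not an actual HBCD variable.

```python
def build_name(domain, source, table, item, scale=None, option=None):
    """Compose a variable name from its components.

    table, scale, and item may each be a string or a list of subcomponents.
    """
    def nest(component):
        # Subcomponents of one main component are joined by a double underscore.
        return component if isinstance(component, str) else "__".join(component)

    parts = [domain, source, nest(table)]
    if scale is not None:          # the scale component is optional
        parts.append(nest(scale))
    parts.append(nest(item))
    name = "_".join(parts)         # main components: single underscore
    if option is not None:         # multiselect option: triple underscore
        name += "___" + str(option)
    return name


print(build_name("mh", "cg", "ibqr", "001", scale="beh"))
print(build_name("ncl", "cg", ["spm2", "inf"], "001", scale="soc"))
print(build_name("sed", "bm", "demo", "001", option=2))  # hypothetical name
```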

Naming Component Definitions

Component	Definition	Example Values
`domain`	Data domain (e.g. biospecimens, imaging)	`bio` (Biospecimens); `img` (Imaging/MRI); `sed` (Social & Environmental Determinants); `pex` (Pregnancy & Exposures, Including Substance Use); see full list
`source`	Subject (who the protocol element is about) or respondent (who completed the assessment), e.g., child, birth parent	`bm` (Biological Mother); `ch` (Child); see full list
`table`	Instrument/protocol element name	Varies by instrument
`{scale}`	Name of scale within instrument/protocol element - only if instrument contains multiple scales	Varies by instrument - see details
`item`	Will either be an item number corresponding to individual questions in a scale or admin field/score label for administrative/summary score variables - see details	`001`; `001__01`; etc. or admin field/score label

Details

Domain & Source: Possible Values

Possible Values: `domain`
`bio`	BioSpecimens
`eeg`	Tabular EEG
`mh`	Behavior/Child-Caregiver Interaction
`img`	Tabular Imaging
`ncl`	Neurocognition and Language
`nt`	Novel Tech (Novel Technology & Wearable Sensors)
`pex`	Pregnancy/Exposure Including Substance
`ph`	Physical Health
`sed`	Social and Environmental Determinants

Possible Values: `source`
`bm`	Biological Mother
`cg`	Caregiver (Responsible Adult)
`ch`	Child
`ld`	Linked Data
`ra`	RA (research assistant)
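
For convenience, the two code lists above can be held in lookup tables and used to decode the first two components of a variable name. This is a minimal sketch: `decode_prefix` is a hypothetical helper, and it returns `None` for names that do not follow the `domain_source` pattern.

```python
DOMAINS = {
    "bio": "BioSpecimens",
    "eeg": "Tabular EEG",
    "mh": "Behavior/Child-Caregiver Interaction",
    "img": "Tabular Imaging",
    "ncl": "Neurocognition and Language",
    "nt": "Novel Tech (Novel Technology & Wearable Sensors)",
    "pex": "Pregnancy/Exposure Including Substance",
    "ph": "Physical Health",
    "sed": "Social and Environmental Determinants",
}

SOURCES = {
    "bm": "Biological Mother",
    "cg": "Caregiver (Responsible Adult)",
    "ch": "Child",
    "ld": "Linked Data",
    "ra": "RA (research assistant)",
}


def decode_prefix(variable):
    """Return (domain label, source label); None where a code is not listed."""
    parts = variable.split("_", 2)
    if len(parts) < 3:
        return None, None
    return DOMAINS.get(parts[0]), SOURCES.get(parts[1])


print(decode_prefix("mh_cg_ibqr_beh_001"))
print(decode_prefix("par_visit_data_cohort"))  # does not follow domain_source
```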

Most variables of instruments/tables composed of multiple scales include an additional naming component for scale (with the exception of administrative/summary score variables - see details). The following instruments in the current release are examples of tables that include the scale component in their variable names. Note that this is not a comprehensive list.

Domain	Instrument	Table Name	Example Variable
BCGI (Behavior & Child-Caregiver Interaction)	IBQ-R (VSF)+BI	`mh_cg_ibqr`	`mh_cg_ibqr_beh_001`
PEX (Pregnancy & Exposure, Including Substance Use)	FAM MH	`pex_bm_psych`	`pex_bm_psych_bf_001`
SED (Social & Environmental Determinants)	BFY	`sed_bm_bfy`	`sed_bm_bfy_econstr_001`
SED (Social & Environmental Determinants)	PROMIS	`sed_bm_strsup`	`sed_bm_strsup_socspprt_001`

Exceptions

Some variables do not fully follow the standard naming convention; this will be improved in future releases. Notable exceptions are as follows:

Administrative (e.g., language or date of administration) and summary score (e.g., sums or means of individual items in a table) variables include administrative fields and score labels in place of item (or {scale}_item where relevant). Admin and score labels often include single underscores, but represent single main components. For example, possible values include:

Admin fields	`administration`; `location`; `lang`; `date_taken`; `candidate_age`; `gestational_age`; `adjusted_age`
Score labels	`score`; `summary_score`; `total_score`; etc.
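
Because these labels contain single underscores, splitting a name on underscores alone would break them apart. One way to handle this, sketched below, is to check for a known label at the end of the name first. The variable names in the usage lines are hypothetical, and the score label list is incomplete by construction (the list above ends in "etc.").

```python
ADMIN_FIELDS = (
    "administration", "location", "lang", "date_taken",
    "candidate_age", "gestational_age", "adjusted_age",
)
# Longer labels first, so "total_score" is matched before plain "score".
SCORE_LABELS = ("summary_score", "total_score", "score")


def classify_suffix(variable):
    """Return (kind, label) based on a known admin/score label at the end."""
    groups = (("administrative", ADMIN_FIELDS), ("summary score", SCORE_LABELS))
    for kind, candidates in groups:
        for label in candidates:
            if variable.endswith("_" + label):
                return kind, label
    return "item", None


print(classify_suffix("mh_cg_ibqr_date_taken"))   # hypothetical name
print(classify_suffix("mh_cg_ibqr_total_score"))  # hypothetical name
print(classify_suffix("mh_cg_ibqr_beh_001"))
```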

Derived tables, including Basic Demographics (sed_basic_demographics), containing global, static variables, and Visit Information (par_visit_data), containing dynamic/longitudinal visit-level data, do not follow the naming conventions outlined above. For example, both fall under the domain Demographics and source General in the NBDC Data Dictionary, but use sed_basic (in reference to Social & Environmental Determinants from which the Basic Demographics information is derived) and par_visit (for participant information from visit-level data) in place of the domain_source naming components.

Biospecimen table names are largely descriptive, e.g. bio_bm_biosample_nails_results and bio_bm_biosample_urine.

Tabulated data derived from MRI & MRS and EEG file-based data follow a unique naming convention. All file names begin with the domain (img or eeg), in accordance with the conventions described above, but the elements that follow are the pipeline name (pipeline) and the basename of the derivative output by that pipeline (derivative):

domain_pipeline_derivative

For example, the following subject/session-level XCP-D derivatives are combined into a single tabulated file:

File-based derivatives	`sub-{ID}_ses-{V0X}_task-rest_dir-PA_run-{X}_space-fsLR_seg_Gordon_stat-alff_bold.tsv`
Tabulated file	`img_xcpd_space-fsLR_seg_Gordon_stat-alff_bold.tsv`
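
The mapping in this example can be approximated by dropping the subject-, session-, and run-specific entities from the derivative file name and prefixing the domain and pipeline. The set of dropped entities below is inferred from this single example and is an assumption, as is the helper name `tabulated_name`; other pipelines may differ.

```python
# Entities assumed to be specific to one subject/session/run (inferred from
# the example above, not from a documented rule).
DROPPED_ENTITIES = ("sub-", "ses-", "task-", "dir-", "run-")


def tabulated_name(derivative_file, domain, pipeline):
    """Derive a tabulated file name of the form domain_pipeline_derivative."""
    parts = derivative_file.split("_")
    kept = [p for p in parts if not p.startswith(DROPPED_ENTITIES)]
    return "_".join([domain, pipeline] + kept)


# Placeholder IDs filled in for illustration.
example = ("sub-001_ses-V02_task-rest_dir-PA_run-1_"
           "space-fsLR_seg_Gordon_stat-alff_bold.tsv")
print(tabulated_name(example, "img", "xcpd"))
```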

Example

Let's break down the following example: ncl_cg_spm2__inf_soc_001

ncl: Neurocognition & Language (domain)
cg: Caregiver (source)
spm2__inf: nested table name
- spm2: the SPM-2 instrument (table)
- inf: Infant version of SPM-2 (table subcomponent)
soc: scale for metrics of socialization (scale)
001: item number (item)
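
The same breakdown can be sketched in code. This is a minimal parser for names that follow the standard 4- or 5-component format; `parse_name` is a hypothetical helper, and it will mislabel the exceptions described above (for example, admin or score labels that contain single underscores), which need separate handling.

```python
import re

# A single underscore that is not part of a "__" or "___" run.
SINGLE_UNDERSCORE = re.compile(r"(?<!_)_(?!_)")


def parse_name(variable):
    """Label the components of a standard-format variable name."""
    # A triple underscore, if present, precedes the multiselect option.
    stem, _, option = variable.partition("___")
    parts = SINGLE_UNDERSCORE.split(stem)
    if len(parts) not in (4, 5):
        raise ValueError(f"{variable!r} does not have 4 or 5 main components")
    domain, source, table, *scale, item = parts
    return {
        "domain": domain,
        "source": source,
        "table": table.split("__"),   # nested subcomponents, if any
        "scale": scale[0].split("__") if scale else None,
        "item": item.split("__"),
        "option": option or None,
    }


print(parse_name("ncl_cg_spm2__inf_soc_001"))
```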

Study Design Logic: Child-Centric Data Structure

The HBCD Study organizes data around the Child ID as the central key. All caregiver-provided data (e.g., from biological mothers or other caregivers) is nested under the corresponding Child ID. This structure supports the study’s goal of enabling longitudinal analyses of child development by:

Simplifying child-focused analysis: Researchers can track each child’s data over time without remapping caregiver information.
Handling multi-birth cases cleanly: When a caregiver reports on multiple children (e.g., twins), each child’s data remains distinct, avoiding complex joins or disambiguation.
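
To illustrate the idea, the sketch below groups rows by child ID and orders each child's rows by visit. The column names (`child_id`, `visit`, `respondent`, `score`) and all values are hypothetical and do not correspond to actual HBCD fields; the point is only that a caregiver reporting on twins yields separate rows, each keyed by its own child ID.

```python
from collections import defaultdict

# Hypothetical rows: one caregiver (CG-A) reports on two children.
rows = [
    {"child_id": "C001", "visit": "V02", "respondent": "CG-A", "score": 14},
    {"child_id": "C002", "visit": "V01", "respondent": "CG-A", "score": 9},
    {"child_id": "C001", "visit": "V01", "respondent": "CG-A", "score": 11},
]


def by_child(rows):
    """Group rows by child ID and sort each child's rows by visit."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["child_id"]].append(row)
    for visits in grouped.values():
        visits.sort(key=lambda r: r["visit"])
    return dict(grouped)


trajectories = by_child(rows)
print([r["score"] for r in trajectories["C001"]])
```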