Page Last Updated: October 10, 2025
Metadata & Naming Conventions
NBDC Data Dictionary
Tabulated HBCD data (instrument and derived data in tabulated format) is organized into standardized tables, each of which contains a set of variables. The metadata for studies released via the NBDC Data Hub consists of:
- Data dictionary: Provides detailed information about the variables in the tabulated data resource, including the variable name, label, description, data type, and other relevant information.
- Levels table: Provides information about the levels of categorical variables in the tabulated format data (label, order, etc.)
Below are the definitions for the columns in the data dictionary and levels table. Note that some columns also correspond to elements in the BIDS JSON files that accompany all tabulated (instrument and derived) data; where this applies, the corresponding JSON element is noted in the tables below.
Data Dictionary & Levels Column Definitions
Data dictionary:

| Name | Label | Description | { Possible Values } / Example(s) | Mutable (values may vary across releases) |
|---|---|---|---|---|
| `study` | Study | Indicates whether the table/measure is a core component of the study or belongs to a substudy/ancillary study | { Core; Substudy } | |
| `domain` | Domain | Domain/HBCD Workgroup | { Behavior/Child-Caregiver Interaction; Biospecimens; Demographics; Neurocognition & Language; Novel Tech; Physical Health; Pregnancy/Exposure Including Substance; Social & Environmental Determinants; Tabular EEG; Tabular imaging } | |
| `source` | Source | Source of information for this table/measure | { Biological Mother; Caregiver (Responsible Adult); Child; General } | |
| `table_name` | Table name | Name of table/measure | mh_p_cbcl | |
| `table_label` (corresponds to MeasurementToolMetadata > Description in BIDS JSON) | Table label | Label for table/measure | Child Behavior Checklist [Parent] | |
| `name` | Variable name | Name of column/variable/question | mh_p_cbcl__aggr_001 | |
| `label` (corresponds to Description in BIDS JSON) | Variable label | Label for column/variable/question | "Demands a lot of attention" | |
| `instruction` | Instruction | Instructions preceding table/measure questions | "The next set of questions is about your child's behavior in different situations and contexts. Please fill in a response to all questions." | |
| `header` | Header | Header/instructions for a set of questions | "Below is a list of items that describe children and youths. For each item that describes your child ... ... now or within the past 6 months, please choose whether the item is very true or often true of your child, somewhat or sometimes true of your child, or not true of your child. Please answer all items as well as you can, even if some do not seem to apply to your child." | |
| `note` | Note | Note displayed to participants | "Enter weight in pounds." | |
| `unit` (corresponds to Units in BIDS JSON) | Unit | Unit of measurement | m, cm2, lbs | |
| `type_var` (the Derivative element in BIDS JSON is set to true if type_var = summary score or derived item) | Variable type | Type of column/variable/question | { administrative (data that gives context to the assessments, e.g. date of assessment, language, quality control); item (original data provided by the participant, e.g. questions in a questionnaire); derived item (derived from original data provided by the participant, e.g. if the participant filled in two fields to enter their height in feet and inches, a derived item could integrate this information into one field that provides the height in inches); summary score (summary and/or score output based on algorithmic conversions of items/raw data) } | |
| `type_data` | Data type | Data type (in database) | { date; timestamp; time; character (only used for categorical columns); text; integer; double } | |
| `type_level` | Level of measurement | Measurement level/scale type | { nominal; ordinal; interval; ratio } | |
| `type_field` | Field type | Field type in data capture system as presented to participant | dropdown; radio; checkbox | |
| `order_display` | Display order | Display order of item within measure | | |
| `branching_logic` | Branching logic | Branching logic applied to column/variable/question | | |
| `label_es` | Label (Spanish) | Label (Spanish) | | |
| `instruction_es` | Instruction (Spanish) | Instruction (Spanish) | | |
| `header_es` | Header (Spanish) | Header (Spanish) | | |
| `note_es` | Note (Spanish) | Note (Spanish) | | |
| `unique_identifiers` | Identifier column(s) | Unique identifier column names (variable/table) | | |
| `url_table` | Documentation for table | Link to study instrument documentation | | |
| `url_table_warn_use` | Responsible Use Warning (table) | Link to responsible use warning (table) | | |
| `url_table_warn_data` | Data Warning (table) | Link to data warning (table) | | |
| `url_warn_use` | Responsible Use Warning (variable) | Link to responsible use warning (variable) | | |
| `url_warn_data` | Data Warning (variable) | Link to data warning (variable) | | |
| `order_sort` | Sort order | Standard sort order in table/measure (and corresponding column order in data/database) | | |

Levels table:

| Name | JSON Element | Description | Example | Mutable (values may vary across releases) |
|---|---|---|---|---|
| `name` | | Name of the categorical column/variable/question for which value/label pairs are reported | | |
| `value` | left hand side | Value of the level | 1 | |
| `order_level` | | Order of response option (in data and how they were displayed to participants) | 2 | |
| `label` | right hand side | Label of the level | Yes | |
| `label_es` | | Label of the level (Spanish) | Si | |
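
The correspondence between dictionary columns and BIDS JSON elements described above can be sketched in a few lines of Python. This is only an illustration, not the NBDC implementation: the function name `build_sidecar_entry`, the sample variable `example_var`, and its level rows are hypothetical, and the sketch assumes that the value/label pairs of the levels table are stored as the left- and right-hand sides of a `Levels` object in the JSON file.

```python
import json

def build_sidecar_entry(dd_row, level_rows):
    """Build one BIDS-style JSON entry from a data dictionary row and its levels."""
    entry = {}
    if dd_row.get("label"):
        entry["Description"] = dd_row["label"]   # label -> Description
    if dd_row.get("unit"):
        entry["Units"] = dd_row["unit"]          # unit -> Units
    if dd_row.get("type_var") in ("summary score", "derived item"):
        entry["Derivative"] = True               # per the type_var note above
    # Keep only the levels of this variable, in their documented order.
    own_levels = [r for r in level_rows if r["name"] == dd_row["name"]]
    own_levels.sort(key=lambda r: r["order_level"])
    if own_levels:
        # Assumption: value is the left-hand side, label the right-hand side.
        entry["Levels"] = {str(r["value"]): r["label"] for r in own_levels}
    return entry

# Hypothetical rows, for illustration only.
dd_row = {"name": "example_var", "label": "Example yes/no question",
          "unit": "", "type_var": "item"}
level_rows = [
    {"name": "example_var", "value": 1, "order_level": 2, "label": "Yes"},
    {"name": "example_var", "value": 0, "order_level": 1, "label": "No"},
]
entry = build_sidecar_entry(dd_row, level_rows)
print(json.dumps({dd_row["name"]: entry}, indent=2))
```

Sorting by `order_level` before building the mapping keeps the levels in the order in which they were displayed to participants.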
Lasso User Warnings - HBCD
Dataset downloads contain two additional columns not described in the data dictionary: cohort and site. They are identical to the Visit Information variables par_visit_data_<cohort|site>.
Columns with names ending in _es are currently blank in the Lasso Dictionary Query Tool and will become available in a future release.
Some columns in the data dictionary are not applicable to HBCD study data. These columns appear in Lasso Portal queries but have blank values. Examples include atlas, metric, sub_domain, and columns whose names include nda, deap, or redcap. These columns can be safely ignored.
Naming Conventions
A standardized naming convention is used across most tables and fields in the tabulated (instrument and derived) release data. These conventions are adapted from the ABCD Study and ensure consistency across instruments and derived datasets, allowing for intuitive parsing of variable meaning and structure.
Convention Logic & Rules
The standard variable naming format comprises 4 or 5 main components:
`domain_source_table_{scale}_item`
- Main components are generally separated by a single underscore (`_`). Most instruments with multiple scales additionally include the `scale` component (this component is otherwise optional and not included in all variable names).
- Subcomponents are separated by double underscores (`__`) to indicate nested components of `table`, `scale`, and/or `item`. Subcomponents distinguish finer details such as subscales, versions, or counter types. Multiselect fields are preceded by triple underscores (`___`), mainly relevant for V01 Demographics (`sed_bm_demo`) variables.
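
The separator rules above can be illustrated with a short Python sketch. It is not an official parser: the function name `split_variable_name` and the returned keys are hypothetical, and the sketch assumes that no main component itself contains a single underscore, which does not hold for the admin and score labels covered under Exceptions.

```python
import re

def split_variable_name(name):
    """Split a variable name into main components and subcomponents."""
    # A triple underscore introduces the multiselect part of the name.
    base, _, multiselect = name.partition("___")
    # Split only on underscores that are not part of a longer run.
    main = re.split(r"(?<!_)_(?!_)", base)
    # Each main component may nest subcomponents separated by "__".
    nested = [part.split("__") for part in main]
    return {"main": main, "nested": nested, "multiselect": multiselect or None}

parsed = split_variable_name("ncl_cg_spm2__inf_soc_001")
print(parsed["main"])    # ['ncl', 'cg', 'spm2__inf', 'soc', '001']
print(parsed["nested"])  # [['ncl'], ['cg'], ['spm2', 'inf'], ['soc'], ['001']]
```

Whether the fourth component is a scale or already the item cannot be determined from the name alone; that depends on the instrument.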
Naming Component Definitions

| Component | Definition | Example Values |
|---|---|---|
| `domain` | Data domain (e.g. biospecimens, imaging) | `bio` (Biospecimens); `img` (Imaging/MRI); `sed` (Social & Environmental Determinants); `pex` (Pregnancy & Exposures, Including Substance Use); see full list under Details |
| `source` | Subject (who the protocol element is about) or respondent (who completed the assessment), e.g. child, birth parent | `bm` (Biological Mother); `ch` (Child); see full list under Details |
| `table` | Instrument/protocol element name | Varies by instrument |
| `{scale}` | Name of scale within instrument/protocol element; included only if the instrument contains multiple scales | Varies by instrument (see Details) |
| `item` | Either an item number corresponding to an individual question in a scale, or an admin field/score label for administrative/summary score variables (see Exceptions) | `001`; `001__01`; etc., or admin field/score label |

Details
Domain codes:

| Code | Domain |
|---|---|
| `bio` | Biospecimens |
| `eeg` | Tabular EEG |
| `mh` | Behavior/Child-Caregiver Interaction |
| `img` | Tabular Imaging |
| `ncl` | Neurocognition and Language |
| `nt` | Novel Tech (Novel Technology & Wearable Sensors) |
| `pex` | Pregnancy/Exposure Including Substance |
| `ph` | Physical Health |
| `sed` | Social and Environmental Determinants |

Source codes:

| Code | Source |
|---|---|
| `bm` | Biological Mother |
| `cg` | Caregiver (Responsible Adult) |
| `ch` | Child |
| `ld` | Linked Data |
| `ra` | RA (research assistant) |

Most variables of instruments/tables composed of multiple scales include an additional naming component for scale (with the exception of administrative/summary score variables - see details). The following instruments in the current release are examples of tables that include the scale component in their variable names. Note that this is not a comprehensive list.

| Domain | Instrument | Table Name | Example Variable |
|---|---|---|---|
| BCGI (Behavior & Child-Caregiver Interaction) | IBQ-R (VSF)+BI | mh_cg_ibqr | mh_cg_ibqr_beh_001 |
| PEX (Pregnancy & Exposure, Including Substance Use) | FAM MH | pex_bm_psych | pex_bm_psych_bf_001 |
| SED (Social & Environmental Determinants) | BFY | sed_bm_bfy | sed_bm_bfy_econstr_001 |
| SED (Social & Environmental Determinants) | PROMIS | sed_bm_strsup | sed_bm_strsup_socspprt_001 |

Exceptions
Some variables do not fully follow the standard naming convention; this will be improved in future releases. Notable exceptions are as follows:
Administrative (e.g., language or date of administration) and summary score (e.g., sums or means of individual items in a table) variables include administrative fields and score labels in place of item (or {scale}_item where relevant). Admin and score labels often include single underscores, but represent single main components. For example, possible values include:
| Admin fields | administration; location; lang; date_taken; candidate_age; gestational_age; adjusted_age |
| Score labels | score; summary_score; total_score; etc. |
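
Because these labels contain single underscores, a plain split on `_` would break them apart. One way to handle this, sketched below, is to match a known label at the end of the variable name, trying the longest labels first. The function name `split_label_suffix` and the variable names passed to it are hypothetical, and the label lists are limited to the values shown above, so they are not exhaustive.

```python
ADMIN_FIELDS = ("administration", "location", "lang", "date_taken",
                "candidate_age", "gestational_age", "adjusted_age")
SCORE_LABELS = ("score", "summary_score", "total_score")  # not exhaustive

def split_label_suffix(name):
    """Return (prefix, label, kind) if the name ends in a known label, else None."""
    candidates = [(label, "admin") for label in ADMIN_FIELDS]
    candidates += [(label, "score") for label in SCORE_LABELS]
    # Try longer labels first so "summary_score" wins over "score".
    for label, kind in sorted(candidates, key=lambda c: len(c[0]), reverse=True):
        suffix = "_" + label
        if name.endswith(suffix):
            return name[: -len(suffix)], label, kind
    return None

# Hypothetical variable names, for illustration only.
print(split_label_suffix("mh_cg_ibqr_date_taken"))   # ('mh_cg_ibqr', 'date_taken', 'admin')
print(split_label_suffix("sed_bm_bfy_total_score"))  # ('sed_bm_bfy', 'total_score', 'score')
print(split_label_suffix("mh_cg_ibqr_beh_001"))      # None
```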
Derived tables, including Basic Demographics (sed_basic_demographics), containing global, static variables, and Visit Information (par_visit_data), containing dynamic/longitudinal visit-level data, do not follow the naming conventions outlined above. For example, both fall under the domain Demographics and source General in the NBDC Data Dictionary, but use sed_basic (in reference to Social & Environmental Determinants from which the Basic Demographics information is derived) and par_visit (for participant information from visit-level data) in place of the domain_source naming components.
Biospecimen names are largely descriptive, e.g. bio_bm_biosample_nails_results and bio_bm_biosample_urine table names.
Tabulated data derived from MRI & MRS and EEG file-based data follow a unique naming convention. All file names begin with the domain (img or eeg) in accordance with the conventions described above, followed by the pipeline name (pipeline) and the basename of the derivative output by that pipeline (derivative):
domain_pipeline_derivative
For example, the following subject/session-level XCP-D derivatives are combined into a single tabulated file:
| File-based derivatives | sub-{ID}_ses-{V0X}_task-rest_dir-PA_run-{X}_space-fsLR_seg_Gordon_stat-alff_bold.tsv |
| Tabulated file | img_xcpd_space-fsLR_seg_Gordon_stat-alff_bold.tsv |
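
As a minimal sketch, a tabulated file name of this kind can be separated into its three elements by splitting on the first two underscores only. The function name `split_tabulated_filename` is hypothetical, and the sketch assumes that the pipeline name itself contains no underscore.

```python
def split_tabulated_filename(filename):
    """Split 'domain_pipeline_derivative' on the first two underscores."""
    # Raises ValueError if the name has fewer than three elements.
    domain, pipeline, derivative = filename.split("_", 2)
    if domain not in ("img", "eeg"):
        raise ValueError(f"unexpected domain: {domain!r}")
    return {"domain": domain, "pipeline": pipeline, "derivative": derivative}

parts = split_tabulated_filename("img_xcpd_space-fsLR_seg_Gordon_stat-alff_bold.tsv")
print(parts["pipeline"])    # xcpd
print(parts["derivative"])  # space-fsLR_seg_Gordon_stat-alff_bold.tsv
```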
Example
Let's break down the following example: `ncl_cg_spm2__inf_soc_001`
- `ncl`: Neurocognition & Language (domain)
- `cg`: Caregiver (source)
- `spm2__inf`: nested table name
  - `spm2`: the SPM-2 instrument (table)
  - `inf`: Infant version of SPM-2 (table subcomponent)
- `soc`: scale for metrics of socialization (scale)
- `001`: item number (item)
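
The same breakdown can be reproduced programmatically. The sketch below is illustrative only: `describe_variable` and its `has_scale` parameter are hypothetical, the lookup tables are copied from the domain and source code lists in Details, and the caller has to state whether the instrument uses the scale component because the name alone does not reveal it.

```python
import re

DOMAINS = {
    "bio": "Biospecimens", "eeg": "Tabular EEG",
    "mh": "Behavior/Child-Caregiver Interaction", "img": "Tabular Imaging",
    "ncl": "Neurocognition and Language", "nt": "Novel Tech",
    "pex": "Pregnancy/Exposure Including Substance", "ph": "Physical Health",
    "sed": "Social and Environmental Determinants",
}
SOURCES = {
    "bm": "Biological Mother", "cg": "Caregiver (Responsible Adult)",
    "ch": "Child", "ld": "Linked Data", "ra": "RA (research assistant)",
}

def describe_variable(name, has_scale=True):
    """Label the components of a standard-format variable name."""
    # Split on single underscores only; "__" stays inside a component.
    parts = re.split(r"(?<!_)_(?!_)", name)
    expected = 5 if has_scale else 4
    if len(parts) != expected:
        raise ValueError(f"expected {expected} main components, got {len(parts)}")
    domain, source, table = parts[:3]
    table_parts = table.split("__")
    return {
        "domain": DOMAINS.get(domain, domain),
        "source": SOURCES.get(source, source),
        "table": table_parts[0],
        "table_subcomponents": table_parts[1:],
        "scale": parts[3] if has_scale else None,
        "item": parts[-1],
    }

info = describe_variable("ncl_cg_spm2__inf_soc_001")
for key, value in info.items():
    print(f"{key}: {value}")
```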
Study Design Logic: Child-Centric Data Structure
The HBCD Study organizes data around the Child ID as the central key. All caregiver-provided data (e.g., from biological mothers or other caregivers) is nested under the corresponding Child ID. This structure supports the study's goal of enabling longitudinal analyses of child development by:
- Simplifying child-focused analysis: Researchers can track each child's data over time without remapping caregiver information.
- Handling multi-birth cases cleanly: When a caregiver reports on multiple children (e.g., twins), each child's data remains distinct, avoiding complex joins or disambiguation.
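
To make the child-centric structure concrete, the sketch below groups caregiver-provided records under each child's ID. Everything in it is hypothetical and for illustration only: the field names (`child_id`, `caregiver_id`, `visit`, `value`), the IDs, and the helper `group_by_child` are assumptions, not the actual HBCD schema.

```python
from collections import defaultdict

# Hypothetical records: one caregiver (G01) reports on twins C001 and C002.
records = [
    {"child_id": "C001", "caregiver_id": "G01", "visit": "V01", "value": 1},
    {"child_id": "C002", "caregiver_id": "G01", "visit": "V01", "value": 0},
    {"child_id": "C001", "caregiver_id": "G01", "visit": "V02", "value": 2},
]

def group_by_child(rows):
    """Nest all records under the child they describe."""
    by_child = defaultdict(list)
    for row in rows:
        by_child[row["child_id"]].append(row)
    return dict(by_child)

grouped = group_by_child(records)
for child_id, rows in grouped.items():
    visits = [row["visit"] for row in rows]
    print(child_id, visits)
```

Each twin keeps a separate record list even though both share the same caregiver, so a longitudinal view of one child needs no join on caregiver information.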