Page Last Updated: September 15, 2025
ReproSchema🔗
What is ReproSchema?🔗
Ensuring consistency in research data collection is critical, especially in large-scale and multi-site studies like HBCD. However, tracking changes in questionnaires, maintaining structured formats, and ensuring version control across different time points can be challenging.
ReproSchema is a framework designed to address these challenges by providing a structured, standardized, and version-controlled system for managing research questionnaires. It is not a standalone survey tool like Qualtrics or REDCap. Instead, it is a modular framework that integrates with data collection systems, providing a structured backbone to reduce inconsistencies in questionnaire administration, improve data traceability, and ensure reproducibility in longitudinal research. ReproSchema functions as both a schema and software platform:
As a schema, ReproSchema defines a common structured format for organizing questionnaires, ensuring consistency regardless of where or when they are administered:
- Questionnaires are consistently formatted across different studies and time points.
- Each question, response option, and skip pattern is clearly defined and linked to metadata.
- Version history is preserved so researchers can track changes over time.
As a platform, it provides a set of tools to store, version, and track questionnaires, allowing researchers to:
- Store and version questionnaires to prevent discrepancies in multi-site studies.
- Track and document changes, such as wording adjustments, adding or removing questions, or updating response options.
- Access and compare different versions of a questionnaire used at any given time to understand how changes impact data collection.
-
Release 1.0 (Initial Version):
- Question: "How many hours do you sleep on a typical night?"
- Response Options: Free-text input (participants enter a number).
-
Release 2.0 (Revised Version):
- Question: "On average, how many hours of sleep do you get per night?"
- Response Options: Dropdown menu:
- Less than 5 hours
- 5-6 hours
- 7-8 hours
- More than 8 hours
-
Release 3.0 (Further Revision):
- Question: "On average, how many hours of sleep do you get in a 24-hour period, including naps?"
- Response Options: Same as Release 2.0.
-
Implications for Data Analysis:
- Data Consistency: The shift from free-text to predefined categories (Release 1.0 → 2.0) standardizes responses, reducing variability but limiting detail.
- Comparability: The addition of naps (Release 2.0 → 3.0) may impact cross-release trend analysis.
- Data Harmonization: ReproSchema’s version management ensures that each change is documented for consistent longitudinal studies.
ReproSchema Workflow🔗
The ReproSchema workflow for standardizing survey data collection consists of six key components, illustrated in Figure 1 below. For details, please see https://www.repronim.org/reproschema and Chen et al., 2024.
redcap2reproschema). (B) The reproschema-protocol-cookiecutter tool provides a structured, stepwise process for researchers to create and publish a protocol on GitHub, ensuring organized metadata and version control. This tool enables schema validation and user interface (UI) serving. (C) ReproSchema protocols are stored in GitHub repositories, where version-controlled URIs ensure persistent access to protocols, activities, and assessment items, supporting reproducibility and provenance tracking. (D) The ReproSchema-UI provides a browser-based interface for interactive survey deployment, allowing researchers to collect structured data while maintaining schema integrity. (E) Survey responses are stored in JSON-LD format, with embedded URIs linking each protocol, activity, and item to their respective sources in the ReproSchema library. This structure ensures data provenance, traceability, and semantic interoperability. (F) The reproschema-py tools facilitate output conversion into various standardized formats, including National Institute of Mental Health (NIMH) Common Data Elements (reproschema2cde), Brain Imaging Data Structure (BIDS) (reproschema2bids), and REDCap CSV (output2redcap).Overview of Version Management🔗
In longitudinal studies like HBCD, where data is collected over extended periods, tracking every change in data collection instruments is vital. ReproSchema's systematic documentation allows researchers to:
- Maintain Data Consistency: By having a clear record of all modifications, researchers can ensure that data remains comparable across different time points.
- Enhance Data Reliability: Understanding the history of changes helps in assessing the reliability of the data, as it provides context for any variations observed in the responses.
- Facilitate Transparent Reporting: Comprehensive documentation supports transparent reporting of methodologies, which is essential for reproducibility and credibility in scientific research.
Here's an overview of how ReproSchema handles various modifications:
Fixing Typographical Errors – Ensures corrected versions are used in future data collection and allows for tracking the impact of the error on past data.
Adjusting Answer Choices – Documents changes in options to ensure variations are considered during data analysis.
Modifying Question Order – Tracks reordering to assess potential effects on responses, as the sequence of questions can influence how respondents perceive and answer them.
Adding or Removing Questions – Ensures structural changes are documented for accurate longitudinal analysis and data comparability.
How ReproSchema Organizes Questionnaires🔗
ReproSchema structures research questionnaires into three levels, making it easy to track, update, and maintain consistency in data collection over time.
Protocol Level: The Big Picture
At the highest level, a protocol represents the entire study’s questionnaire framework. It includes all assessments and surveys used in the research. Each protocol is tied to a specific data release, ensuring everything is version-controlled and well-documented. This structure helps researchers keep surveys consistent across different phases of a study.
Activity Level: The Questionnaires
Within each protocol, the activity level includes individual questionnaires or assessments, such as the PHQ-9 (depression scale) or GAD-7 (anxiety scale). ReproSchema tracks changes at this level, including when assessments are added, removed, or modified. This tracking ensures that researchers can compare data across time while maintaining accuracy.
Item Level: The Questions
The smallest unit in ReproSchema is an item, which refers to individual questions within a questionnaire. Each question is carefully tracked, including:
- Question Text: The exact wording of the question.
- Response Options: The answer choices given to participants.
- Skip Patterns: If certain answers lead to different follow-up questions.
Tracking these elements allows researchers to see when and how questions change, which is critical in studies that collect data over multiple years.
Additional Resources🔗
- https://www.repronim.org/reproschema
- hbcd-loris2reproschema GitHub repository
References🔗
Chen Y, Jarecka D, Abraham SA, Gau R, Ng E, Low DM, Bevers I, Johnson A, Keshavan A, Klein A, Clucas J, Rosli Z, Hodge SM, Linkersdörfer J, Bartsch H, Das S, Fair D, Kennedy D, Ghosh SS. Standardizing Survey Data Collection to Enhance Reproducibility: An Evaluation of ReproSchema. J Med Internet Res. http://dx.doi.org/10.2196/63343