Background/Objective: Data registries such as T1D Exchange advance data-driven innovation by compiling data from centers across the nation and using them to answer complex questions and develop strategies to improve patient outcomes. This requires member institutions to map, transform, and validate their data, a challenging and detail-oriented task. Our institution reduced submission errors and improved data quality by using the open-source framework Pandera (Bantilan, 2020) to validate data before submission.
Methods: Pandera is a lightweight schema and data validation framework built in Python (3.7, 3.8, 3.9). It allows users to define a schema for their data and to specify a wide variety of data quality checks. Our team translated the mapping documentation provided by the T1D Exchange into data tests in Python using Pandera. This validation step was added after data extraction and mapping, before any data were submitted to the T1D Exchange.
Results: In the submission prior to using Pandera, the T1D Exchange reported 26 data schema and validation errors back to our institution. In the first monthly submission after adding Pandera schema validation to the workflow, only one error was reported.
Schema and data validation of T1D Exchange mapped data using Pandera framework