T1DiabetesGranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus
- Rodriguez-Leon, Ciro 1
- Aviles Perez, Maria Dolores 234
- Banos, Oresti 1
- Quesada-Charneco, Miguel 2
- Lopez-Ibarra, Pablo J 23
- Villalonga, Claudia 1
- Munoz-Torres, Manuel 1234
-
1
Universidad de Granada
info
- 2 University Hospital Clínico San Cecilio
-
3
Instituto de Investigación Biosanitaria de Granada
info
Instituto de Investigación Biosanitaria de Granada
Granada, España
-
4
Instituto de Salud Carlos III
info
Editor: Zenodo
Año de publicación: 2023
Tipo: Dataset
Resumen
T1DiabetesGranada A longitudinal multi-modal dataset of type 1 diabetes mellitus Documented by: Rodriguez-Leon, C., Aviles-Perez, M. D., Banos, O., Quesada-Charneco, M., Lopez-Ibarra, P. J., Villalonga, C., & Munoz-Torres, M. (2023). T1DiabetesGranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus. Scientific Data, 10(1), 916. https://doi.org/10.1038/s41597-023-02737-4 Background Type 1 diabetes mellitus (T1D) patients face daily difficulties in keeping their blood glucose levels within appropriate ranges. Several techniques and devices, such as flash glucose meters, have been developed to help T1D patients improve their quality of life. Most recently, the data collected via these devices is being used to train advanced artificial intelligence models to characterize the evolution of the disease and support its management. The main problem for the generation of these models is the scarcity of data, as most published works use private or artificially generated datasets. For this reason, this work presents T1DiabetesGranada, a open under specific permission longitudinal dataset that not only provides continuous glucose levels, but also patient demographic and clinical information. The dataset includes 257780 days of measurements over four years from 736 T1D patients from the province of Granada, Spain. This dataset progresses significantly beyond the state of the art as one the longest and largest open datasets of continuous glucose measurements, thus boosting the development of new artificial intelligence models for glucose level characterization and prediction. Data Records The data are stored in four comma-separated values (CSV) files which are available in T1DiabetesGranada.zip. These files are described in detail below. Patient_info.csv Patient_info.csv is the file containing information about the patients, such as demographic data, start and end dates of blood glucose level measurements and biochemical parameters, number of biochemical parameters or number of diagnostics. This file is composed of 736 records, one for each patient in the dataset, and includes the following variables: Patient_ID – Unique identifier of the patient. Format: LIB19XXXX. Sex – Sex of the patient. Values: F (for female), masculine (for male) Birth_year – Year of birth of the patient. Format: YYYY. Initial_measurement_date – Date of the first blood glucose level measurement of the patient in the Glucose_measurements.csv file. Format: YYYY-MM-DD. Final_measurement_date – Date of the last blood glucose level measurement of the patient in the Glucose_measurements.csv file. Format: YYYY-MM-DD. Number_of_days_with_measures – Number of days with blood glucose level measurements of the patient, extracted from the Glucose_measurements.csv file. Values: ranging from 8 to 1463. Number_of_measurements – Number of blood glucose level measurements of the patient, extracted from the Glucose_measurements.csv file. Values: ranging from 400 to 137292. Initial_biochemical_parameters_date – Date of the first biochemical test to measure some biochemical parameter of the patient, extracted from the Biochemical_parameters.csv file. Format: YYYY-MM-DD. Final_biochemical_parameters_date – Date of the last biochemical test to measure some biochemical parameter of the patient, extracted from the Biochemical_parameters.csv file. Format: YYYY-MM-DD. Number_of_biochemical_parameters – Number of biochemical parameters measured on the patient, extracted from the Biochemical_parameters.csv file. Values: ranging from 4 to 846. Number_of_diagnostics – Number of diagnoses realized to the patient, extracted from the Diagnostics.csv file. Values: ranging from 1 to 24. Glucose_measurements.csv Glucose_measurements.csv is the file containing the continuous blood glucose level measurements of the patients. The file is composed of more than 22.6 million records that constitute the time series of continuous blood glucose level measurements. It includes the following variables: Patient_ID – Unique identifier of the patient. Format: LIB19XXXX. Measurement_date – Date of the blood glucose level measurement. Format: YYYY-MM-DD. Measurement_time – Time of the blood glucose level measurement. Format: HH:MM:SS. Measurement – Value of the blood glucose level measurement in mg/dL. Values: ranging from 40 to 500. Biochemical_parameters.csv Biochemical_parameters.csv is the file containing data of the biochemical tests performed on patients to measure their biochemical parameters. This file is composed of 87482 records and includes the following variables: Patient_ID – Unique identifier of the patient. Format: LIB19XXXX. Reception_date – Date of receipt in the laboratory of the sample to measure the biochemical parameter. Format: YYYY-MM-DD. Name – Name of the measured biochemical parameter. Values: 'Potassium', 'HDL cholesterol', 'Gammaglutamyl Transferase (GGT)', 'Creatinine', 'Glucose', 'Uric acid', 'Triglycerides', 'Alanine transaminase (GPT)', 'Chlorine', 'Thyrotropin (TSH)', 'Sodium', 'Glycated hemoglobin (Ac)', 'Total cholesterol', 'Albumin (urine)', 'Creatinine (urine)', 'Insulin', 'IA ANTIBODIES'. Value – Value of the biochemical parameter. Values: ranging from -4.0 to 6446.74. Diagnostics.csv Diagnostics.csv is the file containing diagnoses of diabetes mellitus complications or other diseases that patients have in addition to type 1 diabetes mellitus. This file is composed of 1757 records and includes the following variables: Patient_ID – Unique identifier of the patient. Format: LIB19XXXX. Code – ICD-9-CM diagnosis code. Values: subset of 594 of the ICD-9-CM codes (https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes). Description – ICD-9-CM long description. Values: subset of 594 of the ICD-9-CM long description (https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes). Technical Validation Blood glucose level measurements are collected using FreeStyle Libre devices, which are widely used for healthcare in patients with T1D. Abbott Diabetes Care, Inc., Alameda, CA, USA, the manufacturer company, has conducted validation studies of these devices concluding that the measurements made by their sensors compare to YSI analyzer devices (Xylem Inc.), the gold standard, yielding results of 99.9% of the time within zones A and B of the consensus error grid. In addition, other studies external to the company concluded that the accuracy of the measurements is adequate. Moreover, it was also checked in most cases the blood glucose level measurements per patient were continuous (i.e. a sample at least every 15 minutes) in the Glucose_measurements.csv file as they should be. Usage Notes For data downloading, it is necessary to be authenticated on the Zenodo platform, accept the Data Usage Agreement and send a request specifying full name, email, and the justification of the data use. This request will be processed by the Secretary of the Department of Computer Engineering, Automatics, and Robotics of the University of Granada and access to the dataset will be granted. The files that compose the dataset are CSV type files delimited by commas and are available in T1DiabetesGranada.zip. A Jupyter Notebook (Python v. 3.8) with code that may help to a better understanding of the dataset, with graphics and statistics, is available in UsageNotes.zip. Graphs_and_stats.ipynb The Jupyter Notebook generates tables, graphs and statistics for a better understanding of the dataset. It has four main sections, one dedicated to each file in the dataset. In addition, it has useful functions such as calculating the patient age, deleting a patient list from a dataset file and leaving only a patient list in a dataset file. Code Availability The dataset was generated using some custom code located in CodeAvailability.zip. The code is provided as Jupyter Notebooks created with Python v. 3.8. The code was used to conduct tasks such as data curation and transformation, and variables extraction. Original_patient_info_curation.ipynb In the Jupyter Notebook is preprocessed the original file with patient data. Mainly irrelevant rows and columns are removed, and the sex variable is recoded. Glucose_measurements_curation.ipynb In the Jupyter Notebook is preprocessed the original file with the continuous glucose level measurements of the patients. Principally rows without information or duplicated rows are removed and the variable with the timestamp is transformed into two new variables, measurement date and measurement time. Biochemical_parameters_curation.ipynb In the Jupyter Notebook is preprocessed the original file with patient data of the biochemical tests performed on patients to measure their biochemical parameters. Mainly irrelevant rows and columns are removed and the variable with the name of the measured biochemical parameter is translated. Diagnostic_curation.ipynb In the Jupyter Notebook is preprocessed the original file with patient data of the diagnoses of diabetes mellitus complications or other diseases that patients have in addition to T1D. Get_patient_info_variables.ipynb In the Jupyter Notebook it is coded the feature extraction process from the files Glucose_measurements.csv, Biochemical_parameters.csv and Diagnostics.csv to complete the file Patient_info.csv. It is divided into six sections, the first three to extract the features from each of the mentioned files and the next three to add the extracted features to the resulting new file. Data Usage Agreement The conditions for use are as follows: You confirm that you will not attempt to re-identify research participants for any reason, including for re-identification theory research. You commit to keeping the T1DiabetesGranada dataset confidential and secure and will not redistribute data or Zenodo account credentials. You will require anyone on your team who utilizes these data to comply with this Data Use Agreement by accessing the data themselves through Zenodo. Each user wishing to access controlled data must individually agree to the Conditions for Use. You understand that these data may not be used for commercial use or to re-contact research participants. You agree to report any misuse, unauthorized access, or data release, intentional or inadvertent within 5 business days. For this purpose, please write an email to icarsecretaria@ugr.es explaining the detected data breach. You agree to cite the research paper under which T1DiabetesGranada dataset was published on all publications or presentations which result from using the T1DiabetesGranada dataset. Please note that in order to request access to the dataset you need to be authenticated on Zenodo, accept the Data Usage Agreement and specify in the request your full name, email, and the justification of the data use.