
Machine Learning as a New Method for “Case-Finding” in Celiac Disease (Needle Study)

Open

Salvatore Oliva, 'La Sapienza' University, Rome, Italy

2-year project

Celiac disease 
Area: Clinics

Background

Machine learning (ML) is a discipline at the intersection of statistics and computer science, with the potential to power support systems for medical decision-making that help assign diagnoses to patients based on clinical data. Efforts to apply ML to celiac disease (CD) have recently increased. Several authors have used ML techniques to identify patients at high risk of CD who need specific screening. A recent systematic review examined computer-aided CD diagnosis based on image-processing techniques for the detection of villous atrophy. However, no study has yet assessed the role of ML in a case-finding strategy for detecting CD in the general population.

Hypothesis, Rationale and Aims

In CD, screening remains an important challenge: two strategies have been proposed (mass screening and case finding), and neither seems entirely feasible. In this study, we aim to apply a case-finding approach using a new “probabilistic score” derived from supervised ML prediction models. The models will include several parameters (common and uncommon symptoms, laboratory tests, gender, family history of CD or other autoimmune diseases, and auxological, i.e. growth, parameters) and will be independent of antibody values. Such models could be applied in the clinical setting, enabling clinicians to efficiently identify patients (especially children) at high risk of having or developing CD, and in particular to recognize patients with common or uncommon symptoms and laboratory signs who likely need CD screening. Under our case-finding strategy, recognizing common, mild and uncommon symptoms through this ML prediction score could substantially reduce the rate of missed CD diagnoses in the general population.
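As a rough illustration of what an antibody-independent “probabilistic score” could look like, the sketch below encodes a patient record into numeric features and maps a weighted sum to a probability with the logistic function. The proposal does not say which ML models will be used; logistic regression is chosen here only as the simplest stand-in. All field names (`PatientRecord`, `chronic_diarrhea`, `height_z_score`, and so on) and any weights passed in are hypothetical; in the study, the weights would be learned from the discovery cohort. No serological (antibody) feature is included.

```python
import math
from dataclasses import dataclass


@dataclass
class PatientRecord:
    # HYPOTHETICAL variables illustrating the feature groups named in the proposal;
    # the study's actual variable list and coding are not specified.
    chronic_diarrhea: bool           # example of a common symptom
    iron_deficiency_anemia: bool     # example of an uncommon presentation
    elevated_transaminases: bool     # example laboratory sign
    female: bool                     # gender
    family_history_cd: bool          # relative with CD
    family_history_autoimmune: bool  # relative with another autoimmune disease
    other_autoimmune_disease: bool   # e.g. type 1 diabetes in the patient
    associated_syndrome: bool        # e.g. Down or Turner syndrome
    months_since_onset: float        # time from symptom onset
    height_z_score: float            # auxological (growth) parameter
    # Deliberately no antibody titres: the score is meant to be serology-independent.


def to_features(p: PatientRecord) -> list[float]:
    """Encode a record as a numeric vector (booleans become 0.0 / 1.0)."""
    return [
        float(p.chronic_diarrhea),
        float(p.iron_deficiency_anemia),
        float(p.elevated_transaminases),
        float(p.female),
        float(p.family_history_cd),
        float(p.family_history_autoimmune),
        float(p.other_autoimmune_disease),
        float(p.associated_syndrome),
        p.months_since_onset,
        p.height_z_score,
    ]


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cd_risk_score(p: PatientRecord, weights: list[float], bias: float) -> float:
    """Logistic risk score in [0, 1]; weights would come from a model fit on real data."""
    x = to_features(p)
    if len(weights) != len(x):
        raise ValueError(f"expected {len(x)} weights, got {len(weights)}")
    return _sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


child = PatientRecord(True, False, False, True, True, False, False, False, 6.0, -1.5)
print(cd_risk_score(child, weights=[0.0] * 10, bias=0.0))  # prints: 0.5
```

With all-zero weights every child scores 0.5; the discriminating power comes entirely from weights estimated on labelled cases and controls.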

Research Plan (Experimental design and methodologies)

A pediatric discovery cohort will be recruited to build the new prediction score. Our proposed ML models will be trained on this cohort, and their performance will then be evaluated on a pediatric validation cohort. In a subsequent phase, if the model proves reliable in children, we plan to validate it on a dataset of adult subjects as well. To create the study cohort, we will review the clinical charts of patients diagnosed with CD at the Pediatric Gastroenterology and Liver Unit of Sapienza University of Rome, Italy, between 2018 and 2020 (n = 200), together with 600 controls matched for age and gender. The prediction score will include the following clinical data: symptoms, time since symptom onset, presence of other autoimmune diseases, presence of associated syndromes, laboratory tests, gender, family history of CD, family history of other autoimmune diseases, and auxological parameters. All these data will serve as input features to an ML-based statistical model, with the presence of CD as the label. After development in the discovery cohort, we aim to validate the prediction model in order to verify its reliability.
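The discovery/validation workflow can be sketched as follows, with heavy caveats: the data are synthetic (random numbers generated only so the code runs, not study data); the model is plain logistic regression fit by batch gradient descent, a stand-in for the unspecified ML models; and discrimination is summarized with the ROC AUC, a common metric that the proposal itself does not name. The discovery cohort mirrors the planned 1:3 case-to-control ratio (200 cases, 600 controls); the validation cohort size is an arbitrary choice, and age/gender matching is not modelled. All function names are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression by full-batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X):
    """Predicted probability of CD for each row of X."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def roc_auc(scores, labels):
    """Probability that a random case outscores a random control (ties count half)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def synthetic_cohort(n_cases, n_controls, n_features, rng):
    """SYNTHETIC data only: cases get every feature mean shifted by +0.8."""
    X, y = [], []
    for label, count in ((1, n_cases), (0, n_controls)):
        for _ in range(count):
            X.append([rng.gauss(0.8 * label, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


rng = random.Random(0)
X_disc, y_disc = synthetic_cohort(200, 600, 5, rng)  # discovery: 1:3 ratio as planned
X_val, y_val = synthetic_cohort(100, 300, 5, rng)    # validation: arbitrary size
w, b = fit_logistic(X_disc, y_disc)
auc = roc_auc(predict(w, b, X_val), y_val)
print(f"validation AUC on synthetic data: {auc:.3f}")
```

The AUC is computed on the held-out validation cohort rather than on the discovery data, because performance measured on the data used for fitting is optimistically biased.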

Expected results and Impact

The proposed supervised ML models and prediction score could be used in the clinical setting to identify patients at higher risk of CD, who could then receive a timely diagnosis through screening.
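In a case-finding workflow, a continuous risk score has to be turned into a referral decision. One common way to do this, assumed here rather than taken from the proposal, is to choose the highest score threshold that still flags a target fraction of known cases (sensitivity) in validation data, and then check what share of all children would be referred for screening. The function names and the toy scores below are hypothetical.

```python
import math


def threshold_for_sensitivity(scores, labels, target_sensitivity):
    """Highest threshold t such that at least target_sensitivity of cases score >= t."""
    case_scores = sorted((s for s, lab in zip(scores, labels) if lab == 1), reverse=True)
    if not case_scores:
        raise ValueError("need at least one case")
    # Number of cases that must be flagged; the small epsilon guards against float error.
    k = math.ceil(target_sensitivity * len(case_scores) - 1e-9)
    k = max(1, min(k, len(case_scores)))
    return case_scores[k - 1]


def referral_rate(scores, threshold):
    """Fraction of all children whose score would trigger referral for CD screening."""
    return sum(s >= threshold for s in scores) / len(scores)


# Toy scores (not study data): four cases followed by four controls.
scores = [0.9, 0.8, 0.7, 0.2, 0.6, 0.3, 0.1, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
t = threshold_for_sensitivity(scores, labels, 0.75)
print(t, referral_rate(scores, t))  # prints: 0.7 0.375
```

Raising the target sensitivity lowers the threshold and increases the number of children referred: in the toy data, moving from 75% to 100% sensitivity raises the referral rate from 37.5% to 75%.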

This prediction model could enable clinicians to selectively identify patients with common and uncommon symptoms and laboratory signs, thereby reducing the need for CD mass screening campaigns. A major aim of the scientific community is to diagnose as many patients as possible by uncovering the hidden part of the “CD iceberg” and reducing the number of “submerged” patients who remain undiagnosed and untreated. We aim to publish the results of this study in high-impact journals, since we believe they might change clinical practice.
