Journées de Biostatistique 2024

21-22 Nov. 2024, Paris (France)

sciencesconf.org:jdb2024:579356

Generative Models for Screening Genetic Mutations in Marfan Syndrome

Antonin Della Noce (1, 2), Vesna Lukic (3), Hakim Benkirane (3, 4), Pauline Arnaud (5, 6, 7), Agathe Le Galiot, Nadine Hanna (5, 6, 7), Olivier Milleron (7), Carine Le Goff (5), Catherine Boileau (5, 6), Paul-Henry Cournède (3, @), Guillaume Jondeau (5, 7, @)

1 : MATHERIALS, Paris

INRIA

2 : CERMICS, École des Ponts, Marne-la-Vallée

IP Paris

3 : MICS, CentraleSupélec, Gif-sur-Yvette

Université Paris-Saclay

4 : Inserm U1018 Oncostat, Villejuif

Inserm

5 : LVTS, U1148

Inserm, Université Paris-Cité

6 : Département de Génétique

Hôpital Bichat, APHP

7 : Centre de référence pour le syndrome de Marfan et apparentés

Hôpital Bichat, APHP

Background: Genetic disorders of the connective tissue spectrum, such as Marfan, Loeys-Dietz, and Ehlers-Danlos syndromes, carry a heightened risk of early-onset cardiovascular complications, including thoracic aortic aneurysm and dissection. Early detection of mutations, particularly in genes such as FBN1, TGFBR1, and TGFBR2, is crucial for timely prophylactic interventions that improve patient outcomes. However, reference centers specializing in these rare disorders face budgetary and capacity constraints that prevent them from testing every patient referred for a suspected connective tissue disorder. Current screening strategies, such as those based on the Ghent nosology, struggle to reliably distinguish individuals who carry pathogenic mutations from those who do not. Our goal is to improve screening by modeling the distribution of mutation types conditional on an arbitrary subset of the clinical characteristics used by the Ghent nosology.

Methods: We analyzed a cohort of 3,982 patients referred to the Reference Center for Marfan Syndrome and Related Disorders at Bichat-Claude Bernard Hospital, Paris, between 1988 and 2018. A panel of 36 connective tissue-associated genes was sequenced, and identified mutations were classified into three categories: mutations in FBN1, mutations in TGFBR1 or TGFBR2, and mutations in the other genes of the panel. Clinical covariates such as age, sex, height, aortic dimensions, and skeletal features were collected. As controls, we used 954 wild-type individuals who had been sequenced as part of a family investigation. Because the proportion of controls in the cohort does not reflect that of the general population, a traditional classifier trained on these data would be limited to screening patients within the reference center. To extend screening to the general population, where the proportion of control individuals is higher, we applied Bayes' theorem, which turns the classification problem into estimating the multivariate phenotype distribution for each mutation type. We developed an arbitrary-conditioning generative model that uses a series of residual neural networks to output the conditional distributions of both continuous and categorical variables.
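
Illustration (not part of the study): the Bayes step described in the Methods can be sketched as follows. Once a model provides the class-conditional density p(x | m) of a clinical profile x for each mutation type m, the posterior P(m | x) is proportional to p(x | m) P(m), so the prior P(m) can be swapped from the reference-center proportions to general-population proportions without retraining. Everything specific in this sketch is an assumption made for illustration: the univariate Gaussian densities stand in for the neural generative model, the single covariate is a hypothetical standardized aortic measurement, and all means, standard deviations, and priors are placeholder numbers, not estimates from the cohort.

```python
import numpy as np

CLASSES = ["control", "FBN1", "TGFBR1/2", "other genes"]

def gaussian_log_density(x, mean, std):
    """Log-density of a univariate normal (toy stand-in for the generative model)."""
    return -0.5 * np.log(2.0 * np.pi * std**2) - 0.5 * ((x - mean) / std) ** 2

# ASSUMPTION: placeholder (mean, std) per class for one hypothetical
# standardized covariate. These are NOT values estimated from the cohort.
TOY_PARAMS = np.array([
    [0.0, 1.0],   # control
    [3.0, 1.5],   # FBN1
    [2.5, 1.5],   # TGFBR1/2
    [1.5, 1.5],   # other genes
])

def class_log_likelihoods(x):
    """Return log p(x | m) for every class m, in the order of CLASSES."""
    return gaussian_log_density(x, TOY_PARAMS[:, 0], TOY_PARAMS[:, 1])

def posterior(log_lik, prior):
    """Bayes' theorem: P(m | x) is proportional to p(x | m) * P(m)."""
    log_post = np.asarray(log_lik, dtype=float) + np.log(np.asarray(prior, dtype=float))
    log_post -= log_post.max()          # subtract the max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()            # normalize so the probabilities sum to 1

# ASSUMPTION: placeholder priors. The first mimics an enriched referral
# population, the second a population where controls dominate.
center_prior = np.array([0.25, 0.45, 0.10, 0.20])
population_prior = np.array([0.997, 0.002, 0.0005, 0.0005])

log_lik = class_log_likelihoods(2.0)    # one hypothetical patient profile
for name, prior in [("reference center", center_prior),
                    ("general population", population_prior)]:
    probs = posterior(log_lik, prior)
    print(name, {c: round(float(p), 4) for c, p in zip(CLASSES, probs)})
```

The same likelihoods yield very different posteriors under the two priors, which is why estimating p(x | m) rather than training a classifier directly on the cohort makes the screening rule portable across populations.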

Results: The generative model achieved a global reconstruction coefficient of determination (R²) of 0.9 on the validation set. The learned joint distribution allows mutation probabilities to be estimated from clinical profiles, enabling a more precise and effective screening strategy.

Type: Abstract for oral presentation
Topics: Session 2