
Whole exome sequencing and machine learning germline analysis of individuals presenting with extreme phenotypes of high and low risk of developing tobacco-associated lung adenocarcinoma

A. Patiño-García, E. Guruceaga, MP. Andueza, M. Ocón, JJ. Fodop Sokoudjou, N. de Villalonga Zornoza, G. Alkorta-Aranburu, IT. Uria, A. Gurpide, C. Camps, E. Jantus-Lewintre, M. Navamuel-Andueza, MF. Sanmamed, I. Melero, M. Elgendy, JP. Fusco, JJ....

EBioMedicine. 2024 ; 102 (-) : 105048. [pub] 20240313

Language English Country Netherlands

Document type Journal Article

BACKGROUND: Tobacco is the main risk factor for developing lung cancer. Yet, while some heavy smokers develop lung cancer at a young age, other heavy smokers never develop it, even at an advanced age, suggesting a remarkable variability in the individual susceptibility to the carcinogenic effects of tobacco. We characterized the germline profile of subjects presenting these extreme phenotypes with Whole Exome Sequencing (WES) and Machine Learning (ML).

METHODS: We sequenced germline DNA from heavy smokers who either developed lung adenocarcinoma at an early age (extreme cases) or who did not develop lung cancer at an advanced age (extreme controls), selected from databases including over 6600 subjects. We selected individual coding genetic variants and variant-rich genes showing a significantly different distribution between extreme cases and controls. We validated the results from our discovery cohort, in which we analysed by WES extreme cases and controls presenting similar phenotypes. We developed ML models using both cohorts.

FINDINGS: Mean age for extreme cases and controls was 50.7 and 79.1 years respectively, and mean tobacco consumption was 34.6 and 62.3 pack-years. We validated 16 individual variants and 33 variant-rich genes. The gene harbouring the most validated variants was HLA-A in extreme controls (4 variants in the discovery cohort, p = 3.46E-07; and 4 in the validation cohort, p = 1.67E-06). We trained ML models using as input the 16 individual variants in the discovery cohort and tested them on the validation cohort, obtaining an accuracy of 76.5% and an AUC-ROC of 83.6%. Functions of validated genes included candidate oncogenes, tumour-suppressors, DNA repair, HLA-mediated antigen presentation and regulation of proliferation, apoptosis, inflammation and immune response.

INTERPRETATION: Individuals presenting extreme phenotypes of high and low risk of developing tobacco-associated lung adenocarcinoma show different germline profiles. Our strategy may allow the identification of high-risk subjects and the development of new therapeutic approaches.

FUNDING: See a detailed list of funding bodies in the Acknowledgements section at the end of the manuscript.
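
The FINDINGS above report an accuracy and an AUC-ROC for models tested on a held-out validation cohort. As a minimal sketch of how those two metrics can be computed from held-out predictions, the Python below uses made-up toy labels and scores. It is not the authors' pipeline, and the function names `accuracy` and `auc_roc` are hypothetical helpers introduced only for illustration.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of subjects whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def auc_roc(labels, scores):
    """AUC-ROC as the probability that a random case outscores a random control.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy held-out set (illustrative values only, not study data):
# 1 = extreme case, 0 = extreme control.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

toy_accuracy = accuracy(toy_labels, toy_scores)
toy_auc = auc_roc(toy_labels, toy_scores)
```

Accuracy depends on the chosen decision threshold, whereas AUC-ROC summarises ranking quality across all thresholds, which is why the two figures can differ for the same model.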

Bioinformatics Platform Cima and IdisNA University of Navarra Pamplona Spain

CIMA LAB Diagnostics and IdisNA University of Navarra Pamplona Spain

Computational Biology Program Cima Data Science and Artificial Intelligence Institute CCUN IdisNA and CIBERONC University of Navarra Pamplona Spain

Department of Biotechnology Universitat Politècnica de València Unidad Mixta TRIAL and CIBERONC Valencia Spain

Department of Medical Oncology Hospital General Universitario de Valencia Unidad Mixta TRIAL Valencia Spain

Department of Medical Oncology Hospital La Luz Quirón Madrid Spain

Department of Oncology CUN CCUN and IdisNA University of Navarra Pamplona Spain

Department of Oncology CUN CCUN IdisNA and CIBERONC University of Navarra Pamplona Spain

Department of Oncology CUN Division of Immunology Cima CCUN IdisNA and CIBERONC University of Navarra Pamplona Spain

Department of Pediatrics and Clinical Genetics Clínica Universidad de Navarra University of Navarra Pamplona Spain

Department of Radiology CUN CCUN and IdisNA Pamplona Spain

Division of Immunology Cima and Immunotherapy CUN CCUN IdisNA and CIBERONC University of Navarra Pamplona Spain

Electrical and Electronic Engineering Department Tecnun DATAI University of Navarra San Sebastian Spain

Electrical and Electronic Engineering Department Tecnun University of Navarra San Sebastian Spain

Institute for Clinical Chemistry and Laboratory Medicine Mildred Scheel Early Career Center National Center for Tumor Diseases Dresden University Hospital and Faculty of Medicine Medical Clinic 1 University Hospital Carl Gustav Carus Technische Universität Dresden Dresden Germany

Laboratory of Cancer Cell Biology Institute of Molecular Genetics of the Czech Academy of Sciences Prague Czech Republic

Program in Solid Tumors Cima CCUN Department of Biochemistry and Genetics School of Science IdisNA and CIBERONC University of Navarra Pamplona Spain

Program in Solid Tumors Cima Department of Pathology Anatomy and Physiology Schools of Medicine and Sciences CCUN IdisNA and CIBERONC University of Navarra Pamplona Spain

Pulmonary Critical Care and Sleep Division Mount Sinai Morningside Hospital New York USA

Pulmonary Department CUN CCUN and Centro de Investigación Biomédica en Red de Enfermedades Respiratorias University of Navarra Madrid Spain

Pulmonary Department CUN CCUN and IdisNA University of Navarra Pamplona Spain


000      
00000naa a2200000 a 4500
001      
bmc24014443
003      
CZ-PrNML
005      
20240905134309.0
007      
ta
008      
240725e20240313ne f 000 0|eng||
009      
AR
024    7_
$a 10.1016/j.ebiom.2024.105048 $2 doi
035    __
$a (PubMed)38484556
040    __
$a ABA008 $b cze $d ABA008 $e AACR2
041    0_
$a eng
044    __
$a ne
100    1_
$a Patiño-García, Ana $u Department of Pediatrics and Clinical Genetics, Clínica Universidad de Navarra (CUN), Cancer Center Clínica Universidad de Navarra (CCUN), Program in Solid Tumors, Center for Applied Medical Research (Cima) and Navarra Institute for Health Research (IdisNA), University of Navarra, Pamplona, Spain
245    10
$a Whole exome sequencing and machine learning germline analysis of individuals presenting with extreme phenotypes of high and low risk of developing tobacco-associated lung adenocarcinoma / $c A. Patiño-García, E. Guruceaga, MP. Andueza, M. Ocón, JJ. Fodop Sokoudjou, N. de Villalonga Zornoza, G. Alkorta-Aranburu, IT. Uria, A. Gurpide, C. Camps, E. Jantus-Lewintre, M. Navamuel-Andueza, MF. Sanmamed, I. Melero, M. Elgendy, JP. Fusco, JJ. Zulueta, JP. de-Torres, G. Bastarrika, L. Seijo, R. Pio, LM. Montuenga, M. Hernáez, I. Ochoa, JL. Perez-Gracia
520    9_
$a BACKGROUND: Tobacco is the main risk factor for developing lung cancer. Yet, while some heavy smokers develop lung cancer at a young age, other heavy smokers never develop it, even at an advanced age, suggesting a remarkable variability in the individual susceptibility to the carcinogenic effects of tobacco. We characterized the germline profile of subjects presenting these extreme phenotypes with Whole Exome Sequencing (WES) and Machine Learning (ML). METHODS: We sequenced germline DNA from heavy smokers who either developed lung adenocarcinoma at an early age (extreme cases) or who did not develop lung cancer at an advanced age (extreme controls), selected from databases including over 6600 subjects. We selected individual coding genetic variants and variant-rich genes showing a significantly different distribution between extreme cases and controls. We validated the results from our discovery cohort, in which we analysed by WES extreme cases and controls presenting similar phenotypes. We developed ML models using both cohorts. FINDINGS: Mean age for extreme cases and controls was 50.7 and 79.1 years respectively, and mean tobacco consumption was 34.6 and 62.3 pack-years. We validated 16 individual variants and 33 variant-rich genes. The gene harbouring the most validated variants was HLA-A in extreme controls (4 variants in the discovery cohort, p = 3.46E-07; and 4 in the validation cohort, p = 1.67E-06). We trained ML models using as input the 16 individual variants in the discovery cohort and tested them on the validation cohort, obtaining an accuracy of 76.5% and an AUC-ROC of 83.6%. Functions of validated genes included candidate oncogenes, tumour-suppressors, DNA repair, HLA-mediated antigen presentation and regulation of proliferation, apoptosis, inflammation and immune response. INTERPRETATION: Individuals presenting extreme phenotypes of high and low risk of developing tobacco-associated lung adenocarcinoma show different germline profiles. Our strategy may allow the identification of high-risk subjects and the development of new therapeutic approaches. FUNDING: See a detailed list of funding bodies in the Acknowledgements section at the end of the manuscript.
650    _2
$a Humans $7 D006801
650    _2
$a Middle Aged $7 D008875
650    _2
$a Aged $7 D000368
650    _2
$a Exome Sequencing $7 D000073359
650    _2
$a Genetic Predisposition to Disease $7 D020022
650    12
$a Adenocarcinoma of Lung $x genetics $7 D000077192
650    12
$a Lung Neoplasms $x genetics $x pathology $7 D008175
650    _2
$a Phenotype $7 D010641
650    _2
$a Germ Cells $x pathology $7 D005854
655    _2
$a Journal Article $7 D016428
700    1_
$a Guruceaga, Elizabeth $u Bioinformatics Platform, Cima and IdisNA, University of Navarra, Pamplona, Spain
700    1_
$a Andueza, Maria Pilar $u Department of Oncology, CUN, CCUN and IdisNA, University of Navarra, Pamplona, Spain
700    1_
$a Ocón, Marimar $u Pulmonary Department, CUN, CCUN and IdisNA, University of Navarra, Pamplona, Spain
700    1_
$a Fodop Sokoudjou, Jafait Junior $u Electrical and Electronic Engineering Department, Tecnun, University of Navarra, San Sebastian, Spain
700    1_
$a de Villalonga Zornoza, Nicolás $u Electrical and Electronic Engineering Department, Tecnun, University of Navarra, San Sebastian, Spain
700    1_
$a Alkorta-Aranburu, Gorka $u CIMA LAB Diagnostics and IdisNA, University of Navarra, Pamplona, Spain
700    1_
$a Uria, Ibon Tamayo $u Bioinformatics Platform, Cima and IdisNA, University of Navarra, Pamplona, Spain
700    1_
$a Gurpide, Alfonso $u Department of Oncology, CUN, CCUN and IdisNA, University of Navarra, Pamplona, Spain
700    1_
$a Camps, Carlos $u Department of Medical Oncology, Hospital General Universitario de Valencia, Unidad Mixta TRIAL (Fundación para la Investigación del Hospital General Universitario de Valencia y Centro de Investigación Príncipe Felipe) and Centro de Investigación Biomédica en Red Cáncer (CIBERONC), Valencia, Spain
700    1_
$a Jantus-Lewintre, Eloísa $u Department of Biotechnology, Universitat Politècnica de València, Unidad Mixta TRIAL (Fundación para la Investigación del Hospital General Universitario de Valencia y Centro de Investigación Príncipe Felipe) and CIBERONC, Valencia, Spain
700    1_
$a Navamuel-Andueza, Maria $u Pulmonary Department, CUN, CCUN and IdisNA, University of Navarra, Pamplona, Spain
700    1_
$a Sanmamed, Miguel F $u Department of Oncology, CUN, Division of Immunology, Cima, CCUN, IdisNA and CIBERONC, University of Navarra, Pamplona, Spain
700    1_
$a Melero, Ignacio $u Division of Immunology, Cima and Immunotherapy, CUN, CCUN, IdisNA and CIBERONC, University of Navarra, Pamplona, Spain
700    1_
$a Elgendy, Mohamed $u Institute for Clinical Chemistry and Laboratory Medicine, Mildred-Scheel Early Career Center, National Center for Tumor Diseases Dresden (NCT/UCC), University Hospital and Faculty of Medicine, Medical Clinic I, University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany. Laboratory of Cancer Cell Biology, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
700    1_
$a Fusco, Juan Pablo $u Department of Medical Oncology Hospital La Luz, Quirón, Madrid, Spain
700    1_
$a Zulueta, Javier J $u Pulmonary, Critical Care, and Sleep Division, Mount Sinai Morningside Hospital, New York, USA
700    1_
$a de-Torres, Juan P $u Pulmonary Department, CUN, CCUN and IdisNA, University of Navarra, Pamplona, Spain
700    1_
$a Bastarrika, Gorka $u Department of Radiology, CUN, CCUN and IdisNA, Pamplona, Spain
700    1_
$a Seijo, Luis $u Pulmonary Department, CUN, CCUN and Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBERES), University of Navarra, Madrid, Spain
700    1_
$a Pio, Ruben $u Program in Solid Tumors, Cima -CCUN, Department of Biochemistry and Genetics, School of Science, IdisNA and CIBERONC, University of Navarra, Pamplona, Spain
700    1_
$a Montuenga, Luis M $u Program in Solid Tumors, Cima, Department of Pathology, Anatomy and Physiology, Schools of Medicine and Sciences, CCUN, IdisNA and CIBERONC, University of Navarra, Pamplona, Spain
700    1_
$a Hernáez, Mikel $u Computational Biology Program, Cima, Data Science and Artificial Intelligence Institute (DATAI), CCUN, IdisNA and CIBERONC, University of Navarra, Pamplona, Spain
700    1_
$a Ochoa, Idoia $u Electrical and Electronic Engineering Department, Tecnun, DATAI, University of Navarra, San Sebastian, Spain
700    1_
$a Perez-Gracia, Jose Luis $u Department of Oncology, CUN, CCUN, IdisNA and CIBERONC, University of Navarra, Pamplona, Spain. Electronic address: jlgracia@unav.es
773    0_
$w MED00190061 $t EBioMedicine $x 2352-3964 $g Vol. 102 (20240313), p. 105048
856    41
$u https://pubmed.ncbi.nlm.nih.gov/38484556 $y Pubmed
910    __
$a ABA008 $b sig $c sign $y - $z 0
990    __
$a 20240725 $b ABA008
991    __
$a 20240905134303 $b ABA008
999    __
$a ok $b bmc $g 2143914 $s 1226309
BAS    __
$a 3
BAS    __
$a PreBMC-MEDLINE
BMC    __
$a 2024 $b 102 $c - $d 105048 $e 20240313 $i 2352-3964 $m EBioMedicine $n EBioMedicine $x MED00190061
LZP    __
$a Pubmed-20240725
