Integrating next-generation sequencing and artificial intelligence for the identification and validation of pathogenic variants in colorectal cancer Academic Article

abstract

  • Background: Colorectal cancer (CRC) is recognized as a multifactorial disease, where both genetic and environmental factors play critical roles in its development and progression. The identification of pathogenic germline variants has proven to be a valuable tool for early diagnosis, the implementation of surveillance strategies, and the identification of individuals at increased cancer risk. Next-generation sequencing (NGS) has facilitated comprehensive multigene analysis in both hereditary and sporadic cases of CRC. Patients and methods: In this study, we analyzed 100 unselected Colombian patients with CRC to identify pathogenic (P) and likely pathogenic (LP) germline variants, classified according to the guidelines established by the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP). Using the BoostDM artificial intelligence method, we were able to identify oncodriver germline variants with potential implications for disease progression. We assessed the model’s accuracy in predicting germline variants by comparing its results with the AlphaMissense pathogenicity prediction model. Additionally, a minigene assay was employed for the functional validation of intronic mutations. Results: Our findings revealed that 12percent-flag-change of the patients carried pathogenic/likely pathogenic (P/LP) variants according to ACMG/AMP criteria. Using BoostDM, we identified oncodriver variants in 65percent-flag-change of the cases. These results highlight the significance of expanded multigene analysis and the integration of artificial intelligence in detecting germline variants associated with CRC. The average overall AUC values for the comparison between BoostDM and AlphaMissense were 0.788 for the entire BoostDM dataset and 0.803 for the genes within our panel, with individual gene AUC values ranging from 0.606 to 0.983. Functional validation through the minigene assay revealed the generation of aberrant transcripts, potentially linked to the molecular etiology of the disease. Conclusion: Our study provided valuable insights into the prevalence and frequency of P/LP germline variants in unselected Colombian CRC patients through NGS. Integrating advanced genomic analysis and artificial intelligence has proven instrumental in enhancing variant detection beyond conventional methods. Our functional validation results provide insights into the potential pathogenicity of intronic variants. These findings underscore the necessity of a multifaceted approach to unravel the complex genetic landscape of CRC.

publication date

  • 2025-1-1

edition

  • 15