Improving structural variant clustering to reduce the negative effect of the breakpoint uncertainty problem

2021 Sep 27; 22(1): 464. [epub] 20210927

Language: English. Country: United Kingdom, England. Medium: electronic

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid34579642

Grant support
VI20172020102 Ministry of Interior of the Czech Republic

Links

PubMed 34579642
PubMed Central PMC8474851
DOI 10.1186/s12859-021-04374-3
PII: 10.1186/s12859-021-04374-3

BACKGROUND: Structural variants (SVs) are an important source of genetic variation. One of the most critical problems in their detection is breakpoint uncertainty, that is, the inability to determine their exact genomic position. Breakpoint uncertainty is characteristic of structural variants detected by short-read sequencing and complicates subsequent population analyses. The commonly used heuristic strategy reduces this issue by clustering/merging nearby structural variants of the same type before the data from individual samples are merged.

RESULTS: We compared the two most commonly used dissimilarity measures for SV clustering in terms of Mendelian inheritance errors (MIE), kinship prediction, and deviation from Hardy-Weinberg equilibrium. As a new measure of dataset consistency, we analyzed the occurrence of Mendelian-inconsistent SV clusters that can be collapsed into one Mendelian-consistent SV. We also developed a new method based on constrained clustering that explicitly identifies these types of clusters.

CONCLUSIONS: We found that the dissimilarity measure based on the distance between SV breakpoints produces slightly better results than the measure based on SV overlap. This difference is evident in the trivial and corrected clustering strategies, but not in the constrained clustering strategy. However, the constrained clustering strategy provided the best results in all aspects, regardless of the dissimilarity measure used.
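To make the two kinds of dissimilarity measure concrete, here is a minimal Python sketch. The abstract does not give the exact formulas, so both definitions are assumptions chosen for illustration: the breakpoint-based measure is taken as the larger of the two breakpoint shifts, and the overlap-based measure as one minus the Jaccard index of the two intervals. The `SV` type and both function names are hypothetical and are not taken from the authors' software.

```python
from typing import NamedTuple


class SV(NamedTuple):
    """Hypothetical SV record: half-open interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "DUP"


def breakpoint_distance(a: SV, b: SV) -> float:
    """Assumed definition: the larger shift between corresponding breakpoints (bp).

    SVs on different chromosomes or of different types are never comparable,
    so they get an infinite distance.
    """
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return float("inf")
    return float(max(abs(a.start - b.start), abs(a.end - b.end)))


def overlap_dissimilarity(a: SV, b: SV) -> float:
    """Assumed definition: 1 - (shared length / combined length), in [0, 1]."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 1.0
    shared = max(0, min(a.end, b.end) - max(a.start, b.start))
    combined = (a.end - a.start) + (b.end - b.start) - shared
    if combined <= 0:  # degenerate zero-length intervals: treat as dissimilar
        return 1.0
    return 1.0 - shared / combined


# Two deletion calls that plausibly describe the same event.
a = SV("chr1", 1000, 2000, "DEL")
b = SV("chr1", 1050, 2100, "DEL")
print(breakpoint_distance(a, b))    # larger of the 50 bp and 100 bp shifts
print(overlap_dissimilarity(a, b))  # 1 - 950/1100
```

Under these assumed definitions, the breakpoint-based measure is independent of SV length, while the overlap-based measure penalizes the same breakpoint shift more heavily for short SVs than for long ones. Either function could serve as the pairwise dissimilarity passed to a clustering routine.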

