Improving structural variant clustering to reduce the negative effect of the breakpoint uncertainty problem
Language: English. Country: United Kingdom, England. Medium: electronic
Document type: journal article
Grant support
VI20172020102
Ministry of Interior of the Czech Republic
PubMed
34579642
PubMed Central
PMC8474851
DOI
10.1186/s12859-021-04374-3
PII: 10.1186/s12859-021-04374-3
- Keywords
- Breakpoint uncertainty problem, Constrained clustering, Mendelian inheritance error, Structural variants, Whole genome sequencing
- MeSH
- Genome, Human * MeSH
- Genomics MeSH
- Humans MeSH
- Uncertainty MeSH
- Cluster Analysis MeSH
- Genomic Structural Variation * MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
BACKGROUND: Structural variants (SVs) represent an important source of genetic variation. One of the most critical problems in their detection is breakpoint uncertainty, the inability to determine their exact genomic position. Breakpoint uncertainty is a characteristic issue of structural variants detected via short-read sequencing methods and complicates subsequent population analyses. The commonly used heuristic strategy reduces this issue by clustering/merging nearby structural variants of the same type before the data from individual samples are merged.
RESULTS: We compared the two most commonly used dissimilarity measures for SV clustering in terms of Mendelian inheritance errors (MIE), kinship prediction, and deviation from Hardy-Weinberg equilibrium. As a new measure of dataset consistency, we analyzed the occurrence of Mendelian-inconsistent SV clusters that can be collapsed into one Mendelian-consistent SV. We also developed a new method based on constrained clustering that explicitly identifies these types of clusters.
CONCLUSIONS: We found that the dissimilarity measure based on the distance between SV breakpoints produces slightly better results than the measure based on SV overlap. This difference is evident in the trivial and corrected clustering strategies, but not in the constrained clustering strategy. However, the constrained clustering strategy provided the best results in all aspects, regardless of the dissimilarity measure used.
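The abstract contrasts a breakpoint-distance dissimilarity with an overlap-based one but does not give their formulas. The following is a minimal sketch of what two such measures might look like, not the authors' definitions: the `SV` record, the function names, the use of the maximum breakpoint offset, and the use of Jaccard overlap are all assumptions made for illustration.

```python
from typing import NamedTuple


class SV(NamedTuple):
    """Hypothetical minimal SV record on a single chromosome."""
    start: int  # start breakpoint (bp)
    end: int    # end breakpoint (bp), assumed greater than start


def breakpoint_distance(a: SV, b: SV) -> int:
    """Assumed distance-based measure: the larger of the two breakpoint offsets."""
    return max(abs(a.start - b.start), abs(a.end - b.end))


def overlap_dissimilarity(a: SV, b: SV) -> float:
    """Assumed overlap-based measure: 1 minus the Jaccard overlap of the intervals."""
    shared = max(0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - shared
    if union == 0:
        # Both intervals are empty; treat identical positions as identical calls.
        return 0.0 if a == b else 1.0
    return 1.0 - shared / union


if __name__ == "__main__":
    call_a = SV(1000, 2000)
    call_b = SV(1050, 2100)
    print(breakpoint_distance(call_a, call_b))
    print(round(overlap_dissimilarity(call_a, call_b), 4))
```

Under these assumed definitions, the distance measure is expressed in base pairs and ignores SV length, whereas the overlap measure is relative to SV size, so the same breakpoint shift weighs more heavily on short variants than on long ones.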