Probably Correct: Rescuing Repeats with Short and Long Reads
M. Cechova
Language: English
Country: Switzerland
Document type: Journal Article, Research Support, Non-U.S. Gov't, Review
NLK
Free Medical Journals
from 2010
PubMed Central
from 2010
Europe PubMed Central
from 2010
ProQuest Central
from 2010-03-01
Open Access Digital Library
from 2010-01-01
ROAD: Directory of Open Access Scholarly Resources
from 2010
PubMed: 33396198
DOI: 10.3390/genes12010048
- MeSH
- Centromere / chemistry
- Genome Size
- Genome, Human *
- Humans
- Chromosome Mapping / methods
- DNA Methylation
- Microsatellite Repeats *
- Sex Chromosomes / chemistry
- Telomere / chemistry
- Computational Biology / methods
- Check Tag
- Humans
- Publication type
- Journal Article
- Research Support, Non-U.S. Gov't
- Review
Ever since the introduction of high-throughput sequencing following the Human Genome Project, assembling short reads into a reference of sufficient quality has posed a significant problem, as a large portion of the human genome (an estimated 50-69%) is repetitive. As a result, a sizable proportion of sequencing reads are multi-mapping, i.e., without a unique placement in the genome. The two key parameters that determine whether a read is multi-mapping are read length and genome complexity. Long reads are now able to span difficult, heterochromatic regions, including full centromeres, and characterize chromosomes from "telomere to telomere". Moreover, identical reads or repeat arrays can be differentiated based on their epigenetic marks, such as methylation patterns, aiding the assembly process. This is despite the fact that long reads still contain a modest percentage of sequencing errors, which impair both the accuracy and the speed of aligners and assemblers. Here, I review the proposed and implemented solutions to repeat resolution and the multi-mapping read problem, as well as the downstream consequences of reference choice, repeat masking, and proper representation of sex chromosomes. I also consider the forthcoming challenges and solutions with regard to long reads, where we expect a shift from the problem of repeat localization within a single individual to the problem of repeat positioning within pangenomes.
- 000
- 00000naa a2200000 a 4500
- 001
- bmc21026320
- 003
- CZ-PrNML
- 005
- 20211026133008.0
- 007
- ta
- 008
- 211013s2020 sz f 000 0|eng||
- 009
- AR
- 024 7_
- $a 10.3390/genes12010048 $2 doi
- 035 __
- $a (PubMed)33396198
- 040 __
- $a ABA008 $b cze $d ABA008 $e AACR2
- 041 0_
- $a eng
- 044 __
- $a sz
- 100 1_
- $a Cechova, Monika $u Genetics and Reproductive Biotechnologies, Veterinary Research Institute, Central European Institute of Technology (CEITEC), 621 00 Brno, Czech Republic
- 245 10
- $a Probably Correct: Rescuing Repeats with Short and Long Reads / $c M. Cechova
- 520 9_
- $a Ever since the introduction of high-throughput sequencing following the Human Genome Project, assembling short reads into a reference of sufficient quality has posed a significant problem, as a large portion of the human genome (an estimated 50-69%) is repetitive. As a result, a sizable proportion of sequencing reads are multi-mapping, i.e., without a unique placement in the genome. The two key parameters that determine whether a read is multi-mapping are read length and genome complexity. Long reads are now able to span difficult, heterochromatic regions, including full centromeres, and characterize chromosomes from "telomere to telomere". Moreover, identical reads or repeat arrays can be differentiated based on their epigenetic marks, such as methylation patterns, aiding the assembly process. This is despite the fact that long reads still contain a modest percentage of sequencing errors, which impair both the accuracy and the speed of aligners and assemblers. Here, I review the proposed and implemented solutions to repeat resolution and the multi-mapping read problem, as well as the downstream consequences of reference choice, repeat masking, and proper representation of sex chromosomes. I also consider the forthcoming challenges and solutions with regard to long reads, where we expect a shift from the problem of repeat localization within a single individual to the problem of repeat positioning within pangenomes.
- 650 _2
- $a centromera $x chemie $7 D002503
- 650 _2
- $a mapování chromozomů $x metody $7 D002874
- 650 _2
- $a výpočetní biologie $x metody $7 D019295
- 650 _2
- $a metylace DNA $7 D019175
- 650 _2
- $a délka genomu $7 D059646
- 650 12
- $a genom lidský $7 D015894
- 650 _2
- $a lidé $7 D006801
- 650 12
- $a mikrosatelitní repetice $7 D018895
- 650 _2
- $a pohlavní chromozomy $x chemie $7 D012730
- 650 _2
- $a telomery $x chemie $7 D016615
- 655 _2
- $a časopisecké články $7 D016428
- 655 _2
- $a práce podpořená grantem $7 D013485
- 655 _2
- $a přehledy $7 D016454
- 773 0_
- $w MED00174652 $t Genes $x 2073-4425 $g Roč. 12, č. 1 (2020)
- 856 41
- $u https://pubmed.ncbi.nlm.nih.gov/33396198 $y Pubmed
- 910 __
- $a ABA008 $b sig $c sign $y p $z 0
- 990 __
- $a 20211013 $b ABA008
- 991 __
- $a 20211026133014 $b ABA008
- 999 __
- $a ok $b bmc $g 1715136 $s 1146827
- BAS __
- $a 3
- BAS __
- $a PreBMC
- BMC __
- $a 2020 $b 12 $c 1 $e 20201231 $i 2073-4425 $m Genes $n Genes $x MED00174652
- LZP __
- $a Pubmed-20211013