Probably Correct: Rescuing Repeats with Short and Long Reads

M. Cechova

Genes. 2020; 12 (1). [pub] 20201231

Language: English; Country: Switzerland

Document type: journal articles, grant-supported work, reviews

Persistent link: https://www.medvik.cz/link/bmc21026320

Ever since the introduction of high-throughput sequencing following the Human Genome Project, assembling short reads into a reference of sufficient quality has posed a significant problem, as a large portion of the human genome (an estimated 50-69%) is repetitive. As a result, a sizable proportion of sequencing reads is multi-mapping, i.e., without a unique placement in the genome. The two key parameters for whether or not a read is multi-mapping are the read length and genome complexity. Long reads are now able to span difficult, heterochromatic regions, including full centromeres, and characterize chromosomes from "telomere to telomere". Moreover, identical reads or repeat arrays can be differentiated based on their epigenetic marks, such as methylation patterns, aiding in the assembly process. This is despite the fact that long reads still contain a modest percentage of sequencing errors, disorienting the aligners and assemblers both in accuracy and speed. Here, I review the proposed and implemented solutions to the repeat resolution and the multi-mapping read problem, as well as the downstream consequences of reference choice, repeat masking, and proper representation of sex chromosomes. I also consider the forthcoming challenges and solutions with regard to long reads, where we expect the shift from the problem of repeat localization within a single individual to the problem of repeat positioning within pangenomes.

000      
00000naa a2200000 a 4500
001      
bmc21026320
003      
CZ-PrNML
005      
20211026133008.0
007      
ta
008      
211013s2020 sz f 000 0|eng||
009      
AR
024    7_
$a 10.3390/genes12010048 $2 doi
035    __
$a (PubMed)33396198
040    __
$a ABA008 $b cze $d ABA008 $e AACR2
041    0_
$a eng
044    __
$a sz
100    1_
$a Cechova, Monika $u Genetics and Reproductive Biotechnologies, Veterinary Research Institute, Central European Institute of Technology (CEITEC), 621 00 Brno, Czech Republic
245    10
$a Probably Correct: Rescuing Repeats with Short and Long Reads / $c M. Cechova
520    9_
$a Ever since the introduction of high-throughput sequencing following the Human Genome Project, assembling short reads into a reference of sufficient quality has posed a significant problem, as a large portion of the human genome (an estimated 50-69%) is repetitive. As a result, a sizable proportion of sequencing reads is multi-mapping, i.e., without a unique placement in the genome. The two key parameters for whether or not a read is multi-mapping are the read length and genome complexity. Long reads are now able to span difficult, heterochromatic regions, including full centromeres, and characterize chromosomes from "telomere to telomere". Moreover, identical reads or repeat arrays can be differentiated based on their epigenetic marks, such as methylation patterns, aiding in the assembly process. This is despite the fact that long reads still contain a modest percentage of sequencing errors, disorienting the aligners and assemblers both in accuracy and speed. Here, I review the proposed and implemented solutions to the repeat resolution and the multi-mapping read problem, as well as the downstream consequences of reference choice, repeat masking, and proper representation of sex chromosomes. I also consider the forthcoming challenges and solutions with regard to long reads, where we expect the shift from the problem of repeat localization within a single individual to the problem of repeat positioning within pangenomes.
650    _2
$a centromere $x chemistry $7 D002503
650    _2
$a chromosome mapping $x methods $7 D002874
650    _2
$a computational biology $x methods $7 D019295
650    _2
$a DNA methylation $7 D019175
650    _2
$a genome size $7 D059646
650    12
$a human genome $7 D015894
650    _2
$a humans $7 D006801
650    12
$a microsatellite repeats $7 D018895
650    _2
$a sex chromosomes $x chemistry $7 D012730
650    _2
$a telomeres $x chemistry $7 D016615
655    _2
$a journal articles $7 D016428
655    _2
$a grant-supported work $7 D013485
655    _2
$a reviews $7 D016454
773    0_
$w MED00174652 $t Genes $x 2073-4425 $g Vol. 12, No. 1 (2020)
856    41
$u https://pubmed.ncbi.nlm.nih.gov/33396198 $y Pubmed
910    __
$a ABA008 $b sig $c sign $y p $z 0
990    __
$a 20211013 $b ABA008
991    __
$a 20211026133014 $b ABA008
999    __
$a ok $b bmc $g 1715136 $s 1146827
BAS    __
$a 3
BAS    __
$a PreBMC
BMC    __
$a 2020 $b 12 $c 1 $e 20201231 $i 2073-4425 $m Genes $n Genes $x MED00174652
LZP    __
$a Pubmed-20211013
