The complete sequence of a human Y chromosome

2023 Sep; 621(7978): 344-354. Epub 2023 Aug 23.

Language: English; Country: England, United Kingdom; Medium: print-electronic

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid37612512

Grant support
U01 HG010961 NHGRI NIH HHS - United States
R35 GM124827 NIGMS NIH HHS - United States
R01 GM130691 NIGMS NIH HHS - United States
T32 GM007454 NIGMS NIH HHS - United States
Z99 HG999999 Intramural NIH HHS - United States
R01 HG002939 NHGRI NIH HHS - United States
K99 GM147352 NIGMS NIH HHS - United States
R01 HG009190 NHGRI NIH HHS - United States
ZIA HG200398 Intramural NIH HHS - United States
R35 GM133747 NIGMS NIH HHS - United States
U24 HG010263 NHGRI NIH HHS - United States
R01 GM136684 NIGMS NIH HHS - United States
R01 HG010040 NHGRI NIH HHS - United States
U41 HG010972 NHGRI NIH HHS - United States
R21 CA240199 NCI NIH HHS - United States
R01 CA266339 NCI NIH HHS - United States
R00 GM147352 NIGMS NIH HHS - United States
U41 HG006620 NHGRI NIH HHS - United States
R01 HG010169 NHGRI NIH HHS - United States
U41 HG007234 NHGRI NIH HHS - United States
U01 CA253481 NCI NIH HHS - United States
U24 HG007234 NHGRI NIH HHS - United States
R01 HG011274 NHGRI NIH HHS - United States
U24 HG006620 NHGRI NIH HHS - United States
U24 HG010136 NHGRI NIH HHS - United States
R21 HG010548 NHGRI NIH HHS - United States
S10 OD028587 NIH HHS - United States
U01 HG010971 NHGRI NIH HHS - United States
U01 DA047638 NIDA NIH HHS - United States
R01 GM123312 NIGMS NIH HHS - United States
R01 GM072264 NIGMS NIH HHS - United States
R01 HG002385 NHGRI NIH HHS - United States
U01 HG011758 NHGRI NIH HHS - United States
Howard Hughes Medical Institute - United States

Links

PubMed 37612512
PubMed Central PMC10752217
DOI 10.1038/s41586-023-06457-y
PII: 10.1038/s41586-023-06457-y
Knihovny.cz e-resources

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence, and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.

Biosystems and Biomaterials Division National Institute of Standards and Technology Gaithersburg MD USA

Cancer Genetics and Comparative Genomics Branch National Human Genome Research Institute National Institutes of Health Bethesda MD USA

Center for Algorithmic Biotechnology Saint Petersburg State University St Petersburg Russia

Center for Computational Biology and Bioinformatics Pennsylvania State University University Park PA USA

Center for Evolution and Medicine School of Life Sciences Arizona State University Tempe AZ USA

Department of Anatomy and Anthropology and Department of Human Molecular Genetics and Biochemistry Sackler Faculty of Medicine Tel Aviv University Tel Aviv Yafo Israel

Department of Biochemistry and Molecular Biology Pennsylvania State University University Park PA USA

Department of Bioengineering Department of Physics Northeastern University Boston MA USA

Department of Biology Johns Hopkins University Baltimore MD USA

Department of Biology Pennsylvania State University University Park PA USA

Department of Biomedical Engineering Johns Hopkins University Baltimore MD USA

Department of Biomedical Engineering Pennsylvania State University State College PA USA

Department of Biomedical Informatics Harvard Medical School Boston MA USA

Department of Biomolecular Engineering University of California Santa Cruz Santa Cruz CA USA

Department of Computer Science and Engineering Pennsylvania State University University Park PA USA

Department of Computer Science Johns Hopkins University Baltimore MD USA

Department of Computer Science Rice University Houston TX USA

Department of Data Sciences Dana Farber Cancer Institute Boston MA USA

Department of Genetics and Genome Sciences UConn Health Farmington CT USA

Department of Genetics Genomics and Informatics University of Tennessee Health Science Center Memphis TN USA

Department of Genetics University of Cambridge Cambridge UK

Department of Genome Sciences University of Washington School of Medicine Seattle WA USA

Department of Molecular and Cell Biology University of California Berkeley CA USA

Department of Molecular and Cell Biology University of Connecticut Storrs CT USA

Departments of Biomedical Engineering Computer Science and Biostatistics Johns Hopkins University Baltimore MD USA

DNAnexus Inc Mountain View CA USA

European Molecular Biology Laboratory European Bioinformatics Institute Wellcome Genome Campus Hinxton Cambridge UK

Faculty of Informatics Masaryk University Brno Czech Republic

Federal Research Center of Biotechnology of the Russian Academy of Sciences Moscow Russia

Foundation of Biological Data Science Belmont CA USA

GeneDX Holdings Corp Stamford CT USA

Genome Informatics Section Computational and Statistical Genomics Branch National Human Genome Research Institute National Institutes of Health Bethesda MD USA

Genome Technology Access Center at the McDonnell Genome Institute Washington University St Louis MO USA

Genomics Research Centre Human Technopole Milan Italy

Google Inc Mountain View CA USA

Graduate Program in Bioinformatics and Systems Biology University of California San Diego CA USA

Human Genome Sequencing Center Baylor College of Medicine One Baylor Plaza Houston TX USA

Institute for Systems Biology Seattle WA USA

Institute for Systems Genomics University of Connecticut Storrs CT USA

Institute of Bioinformatics Faculty of Medicine University of Münster Münster Germany

Institute of Molecular Genetics Moscow Russia

Investigator Howard Hughes Medical Institute University of Washington Seattle WA USA

Masters Program in National Research University Higher School of Economics Moscow Russia

National Center for Biotechnology Information National Library of Medicine National Institutes of Health Bethesda MD USA

Oxford Nanopore Technologies Inc Oxford UK

Pacific Biosciences Menlo Park CA USA

Stowers Institute for Medical Research Kansas City MO USA

The Rockefeller University New York NY USA

UC Santa Cruz Genomics Institute University of California Santa Cruz Santa Cruz CA USA

UCL Queen Square Institute of Neurology UCL London UK

University of Kansas Medical Center Kansas City MO USA

XDBio Program Johns Hopkins University Baltimore MD USA

Show more in PubMed

Skaletsky H et al. The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature 423, 825–837 (2003). PubMed

Miga KH et al. Centromere reference models for human chromosomes X and Y satellite arrays. Genome Res. 24, 697–707 (2014). PubMed PMC

Vollger MR et al. Segmental duplications and their variation in a complete human genome. Science 376, eabj6965 (2022). PubMed PMC

Nurk S et al. The complete sequence of a human genome. Science 376, 44–53 (2022). PubMed PMC

Schneider VA et al. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res. 27, 849–864 (2017). PubMed PMC

Gustafson ML & Donahoe PK Male sex determination: current concepts of male sexual differentiation. Annu. Rev. Med 45, 505–524 (1994). PubMed

Vogt PH et al. Human Y Chromosome Azoospermia Factors (AZF) Mapped to Different Subregions in Yq11. Hum. Mol. Genet 5, 933–943 (1996). PubMed

Miga KH et al. Telomere-to-telomere assembly of a complete human X chromosome. Nature 585, 79–84 (2020). PubMed PMC

Logsdon GA et al. The structure, function and evolution of a complete human chromosome 8. Nature 593, 101–107 (2021). PubMed PMC

Wenger AM et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol 37, 1155–1162 (2019). PubMed PMC

Jain M et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol 36, 338–345 (2018). PubMed PMC

Nurk S et al. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res. gr.263566.120 (2020) doi:10.1101/gr.263566.120. PubMed DOI PMC

Rautiainen M & Marschall T GraphAligner: rapid and versatile sequence-to-graph alignment. Genome Biol. 21, 253 (2020). PubMed PMC

Formenti G et al. Merfin: improved variant filtering, assembly evaluation and polishing via k-mer validation. Nat. Methods 19, 696–704 (2022). PubMed PMC

Kirsche M et al. Jasmine and Iris: population-scale structural variant comparison and analysis. Nat. Methods 20, 408–417 (2023). PubMed PMC

Jain C, Rhie A, Hansen NF, Koren S & Phillippy AM Long-read mapping to repetitive reference sequences using Winnowmap2. Nat. Methods 19, 705–710 (2022). PubMed PMC

Mc Cartney AM et al. Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies. Nat. Methods 19, 687–695 (2022). PubMed PMC

Rhie A, Walenz BP, Koren S & Phillippy AM Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020). PubMed PMC

Wang T et al. The Human Pangenome Project: a global resource to map genomic diversity. Nature 604, 437–446 (2022). PubMed PMC

Jarvis ED et al. Semi-automated assembly of high-quality diploid human reference genomes. Nature 611, 519–531 (2022). PubMed PMC

Shumate A et al. Assembly and annotation of an Ashkenazi human reference genome. Genome Biol. 21, 129 (2020). PubMed PMC

Zook JM et al. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci. Data 3, 160025 (2016). PubMed PMC

Landrum MJ et al. ClinVar: improvements to accessing data. Nucleic Acids Res. 48, D835–D844 (2020). PubMed PMC

Welter D et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014). PubMed PMC

Smigielski EM, Sirotkin K, Ward M & Sherry ST dbSNP: a database of single nucleotide polymorphisms. Nucleic Acids Res. 28, 352–355 (2000). PubMed PMC

Karczewski KJ et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020). PubMed PMC

Byrska-Bishop M et al. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell 185, 3426–3440.e19 (2022). PubMed PMC

Mallick S et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538, 201–206 (2016). PubMed PMC

Dunham I et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012). PubMed PMC

Ebert P et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372, eabf7117 (2021). PubMed PMC

Sanders AD et al. Single-cell analysis of structural variations and complex rearrangements with tri-channel processing. Nat. Biotechnol 38, 343–354 (2020). PubMed PMC

Hallast P et al. Assembly of 43 diverse human Y chromosomes reveals extensive complexity and variation. Nature (2023). In press, doi:10.1038/s41586-023-06425-6 PubMed DOI PMC

Hammer MF et al. Extended Y chromosome haplotypes resolve multiple and unique lineages of the Jewish priesthood. Hum. Genet 126, 707 (2009). PubMed PMC

Poznik GD et al. Punctuated bursts in human male demography inferred from 1,244 worldwide Y-chromosome sequences. Nat. Genet 48, 593–599 (2016). PubMed PMC

Jiang Z, Hubley R, Smit A & Eichler EE DupMasker: a tool for annotating primate segmental duplications. Genome Res. 18, 1362–1368 (2008). PubMed PMC

Vollger MR, Kerpedjiev P, Phillippy AM & Eichler EE StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps. Bioinformatics 38, 2049–2051 (2022). PubMed PMC

Vegesna R, Tomaszkiewicz M, Medvedev P & Makova KD Dosage regulation, and variation in gene expression and copy number of human Y chromosome ampliconic genes. PLOS Genet. 15, e1008369 (2019). PubMed PMC

NCBI RefSeq v110 Browser. Homo sapiens isolate NA24385 chromosome Y, alternate assembly T2T-CHM13v2.0; https://tinyurl.com/bdfudexn (2022). More tracks are visible via “Tracks shown” option.

Hoyt SJ et al. From telomere to telomere: The transcriptional and epigenetic state of human repeat elements. Science 376, eabk3112 (2022). PubMed PMC

Warburton PE et al. Analysis of the largest tandemly repeated DNA families in the human genome. BMC Genomics 9, 533 (2008). PubMed PMC

Halabian R & Makałowski W A Map of 3′ DNA Transduction Variants Mediated by Non-LTR Retroelements on 3202 Human Genomes. Biology 11, 1032 (2022). PubMed PMC

Weissensteiner MH et al. Accurate sequencing of DNA motifs able to form alternative (non-B) structures. Genome Res. 33, 907–922 (2023). PubMed PMC

Tyler-Smith C, Taylor L & Müller U Structure of a hypervariable tandemly repeated DNA sequence on the short arm of the human Y chromosome. J. Mol. Biol 203, 837–848 (1988). PubMed

Xue Y & Tyler-Smith C An Exceptional Gene: Evolution of the TSPY Gene Family in Humans and Other Great Apes. Genes 2, 36–47 (2011). PubMed PMC

Saxena R et al. Four DAZ Genes in Two Clusters Found in the AZFc Region of the Human Y Chromosome. Genomics 67, 256–267 (2000). PubMed

Altemose N et al. Complete genomic and epigenetic maps of human centromeres. Science 376, eabl4178 (2022). PubMed PMC

Jain M et al. Linear assembly of a human centromere on the Y chromosome. Nat. Biotechnol 36, 321–323 (2018). PubMed PMC

Gershman A et al. Epigenetic patterns in a complete human genome. Science 376, eabj5089 (2022). PubMed PMC

Kasinathan S & Henikoff S Non-B-Form DNA Is Enriched at Centromeres. Mol. Biol. Evol 35, 949–962 (2018). PubMed PMC

Skene PJ & Henikoff S An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites. eLife 6, e21856 (2017). PubMed PMC

Nailwal M & Chauhan JB Azoospermia factor C subregion of the Y chromosome. J. Hum. Reprod. Sci 10, 256 (2017). PubMed PMC

Kuroda-Kawaguchi T et al. The AZFc region of the Y chromosome features massive palindromes and uniform recurrent deletions in infertile men. Nat. Genet 29, 279–286 (2001). PubMed

Repping S et al. A family of human Y chromosomes has dispersed throughout northern Eurasia despite a 1.8-Mb deletion in the azoospermia factor c region. Genomics 83, 1046–1052 (2004). PubMed

Porubsky D et al. Recurrent inversion polymorphisms in humans associate with genetic instability and genomic disorders. Cell 185, 1986–2005.e26 (2022). PubMed PMC

Teitz LS, Pyntikova T, Skaletsky H & Page DC Selection Has Countered High Mutability to Preserve the Ancestral Copy Number of Y Chromosome Amplicons in Diverse Human Lineages. Am. J. Hum. Genet 103, 261–275 (2018). PubMed PMC

Jobling MA Copy number variation on the human Y chromosome. Cytogenet. Genome Res 123, 253–262 (2008). PubMed

Navarro-Costa P, Plancha CE & Gonçalves J Genetic Dissection of the AZF Regions of the Human Y Chromosome: Thriller or Filler for Male (In)fertility? BioMed Res. Int 2010, e936569 (2010). PubMed PMC

Evans HJ, Gosden JR, Mitchell AR & Buckland RA Location of human satellite DNAs on the Y chromosome. Nature 251, 346–347 (1974).

Schmid M, Guttenbach M, Nanda I, Studer R & Epplen JT Organization of DYZ2 repetitive DNA on the human Y chromosome. Genomics 6, 212–218 (1990). PubMed

Manz E, Alkan M, Bühler E & Schmidtke J Arrangement of DYZ1 and DYZ2 repeats on the human Y-chromosome: a case with presence of DYZ1 and absence of DYZ2. Mol. Cell. Probes 6, 257–259 (1992). PubMed

Altemose N A classical revival: Human satellite DNAs enter the genomics era. Semin. Cell Dev. Biol 128, 2–14 (2022). PubMed

Gripenberg U Size variation and orientation of the human Y chromosome. Chromosoma 15, 618–629 (1964). PubMed

Mathias N, Bayés M & Tyler-Smith C Highly informative compound haplotypes for the human Y chromosome. Hum. Mol. Genet 3, 115–123 (1994). PubMed

Altemose N, Miga KH, Maggioni M & Willard HF Genomic Characterization of Large Heterochromatic Gaps in the Human Genome Assembly. PLOS Comput. Biol 10, e1003628 (2014). PubMed PMC

Cooke H Repeated sequence specific to human males. Nature 262, 182–186 (1976). PubMed

Frommer M, Prosser J & Vincent PC Human satellite I sequences include a male specific 2.47 kb tandemly repeated unit containing one Alu family member per repeat. Nucleic Acids Res. 12, 2887–2900 (1984). PubMed PMC

Babcock M, Yatsenko S, Stankiewicz P, Lupski JR & Morrow BE AT-rich repeats associated with chromosome 22q11.2 rearrangement disorders shape human genome architecture on Yq12. Genome Res. 17, 451–460 (2007). PubMed PMC

Webster TH et al. Identifying, understanding, and correcting technical artifacts on the sex chromosomes in next-generation sequencing data. GigaScience 8, giz074 (2019). PubMed PMC

Aganezov S et al. A complete reference genome improves analysis of human genetic variation. Science 376, eabl3533 (2022). PubMed PMC

Bekritsky MA, Colombo C & Eberle MA Identifying Genomic Regions with High Quality Single Nucleotide Variant Calling. https://www.illumina.com/content/illumina-marketing/amr/en_US/science/genomics-research/articles/identifying-genomic-regions-with-high-quality-single-nucleotide-.html.

Robinson JT et al. Integrative genomics viewer. Nat. Biotechnol 29, 24–26 (2011). PubMed PMC

Breitwieser FP, Pertea M, Zimin AV & Salzberg SL Human contamination in bacterial genomes has created thousands of spurious proteins. Genome Res. 29, 954–960 (2019). PubMed PMC

Steinegger M & Salzberg SL Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank. Genome Biol. 21, 115 (2020). PubMed PMC

Chrisman B et al. The human “contaminome”: bacterial, viral, and computational contamination in whole genome sequences from 1000 families. Sci. Rep 12, 9863 (2022). PubMed PMC

Kent WJ et al. The Human Genome Browser at UCSC. Genome Res. 12, 996–1006 (2002). PubMed PMC

Rautiainen M et al. Telomere-to-telomere assembly of diploid chromosomes with Verkko. Nat. Biotechnol 1–9 (2023) doi:10.1038/s41587-023-01662-6. PubMed DOI PMC

Liao W-W et al. A draft human pangenome reference. Nature 617, 312–324 (2023). PubMed PMC

Shafin K et al. Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes. Nat. Biotechnol 38, 1044–1053 (2020). PubMed PMC

Koren S et al. De novo assembly of haplotype-resolved genomes with trio binning. Nat. Biotechnol 36, 1174–1182 (2018). PubMed PMC

Kolmogorov M, Yuan J, Lin Y & Pevzner PA Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol 37, 540–546 (2019). PubMed

Poplin R et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol 36, 983–987 (2018). PubMed

Shafin K et al. Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads. Nat. Methods 18, 1322–1332 (2021). PubMed PMC

Sedlazeck FJ et al. Accurate detection of complex structural variations using single molecule sequencing. Nat. Methods 15, 461–468 (2018). PubMed PMC

Jiang T et al. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol. 21, 189 (2020). PubMed PMC

Bzikadze AV, Mikheenko A & Pevzner PA Fast and accurate mapping of long reads to complete genome assemblies with VerityMap. Genome Res. gr.276871.122 (2022) doi:10.1101/gr.276871.122. PubMed DOI PMC

Li H & Durbin R Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009). PubMed PMC

Porubsky D et al. breakpointR: an R/Bioconductor package to localize strand state changes in Strand-seq data. Bioinformatics 36, 1260–1261 (2020). PubMed

PacBio Revio WGS Dataset. Homo sapiens – GIAB trio HG002-4. https://downloads.pacbcloud.com/public/revio/2022Q4/ (2022).

Poznik D yhaplo | Identifying Y-Chromosome Haplogroups. Last accessed: 2022-11-29. https://github.com/23andMe/yhaplo (2022).

Tseng B et al. Y-SNP Haplogroup Hierarchy Finder: a web tool for Y-SNP haplogroup assignment. J. Hum. Genet 67, 487–493 (2022). PubMed

Li H Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018). PubMed PMC

Li H Identifying centromeric satellites with dna-brnn. Bioinformatics 35, 4408–4410 (2019). PubMed PMC

Harris RS Improved Pairwise Alignment of Genomic DNA. (Penn State, 2007).

Morgulis A, Gertz EM, Schäffer AA & Agarwala R WindowMasker: window-based masker for sequenced genomes. Bioinformatics 22, 134–141 (2006). PubMed

Chin C-S et al. Multiscale analysis of pangenomes enables improved representation of genomic diversity for repetitive and clinically relevant genes. Nat. Methods 1–9 (2023) doi:10.1038/s41592-023-01914-y. PubMed DOI PMC

Frankish A et al. GENCODE 2021. Nucleic Acids Res. 49, D916–D923 (2021). PubMed PMC

Armstrong J et al. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature 587, 246–251 (2020). PubMed PMC

Kovaka S et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 278 (2019). PubMed PMC

Stanke M, Diekhans M, Baertsch R & Haussler D Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644 (2008). PubMed

Fiddes IT et al. Comparative Annotation Toolkit (CAT)—simultaneous clade and personal genome annotation. Genome Res. 28, 1029–1038 (2018). PubMed PMC

Shumate A & Salzberg SL Liftoff: accurate mapping of gene annotations. Bioinformatics 37, 1639–1643 (2021). PubMed PMC

Dale RK, Pedersen BS & Quinlan AR Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27, 3423–3424 (2011). PubMed PMC

Rhie A et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature 592, 737–746 (2021). PubMed PMC

Pruitt KD et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 42, D756–763 (2014). PubMed PMC

Kapustin Y, Souvorov A, Tatusova T & Lipman D Splign: algorithms for computing spliced alignments with identification of paralogs. Biol. Direct 3, 20 (2008). PubMed PMC

Katoh K & Standley DM MAFFT: Iterative Refinement and Additional Methods. in Multiple Sequence Alignment Methods (ed. Russell DJ) 131–146 (Humana Press, 2014). doi:10.1007/978-1-62703-646-7_8. PubMed DOI

Slater GSC & Birney E Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31 (2005). PubMed PMC

Zook JM et al. Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat. Biotechnol 32, 246–251 (2014). PubMed

Numanagić I et al. Fast characterization of segmental duplications in genome assemblies. Bioinformatics 34, i706–i714 (2018). PubMed PMC

Benson G Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27, 573–580 (1999). PubMed PMC

Smit AFA, Hubley R & Green P RepeatMasker Open-4.0 2013-2015. http://www.repeatmasker.org (2015).

Storer J, Hubley R, Rosen J, Wheeler TJ & Smit AF The Dfam community resource of transposable element families, sequence models, and genome annotations. Mob. DNA 12, 2 (2021). PubMed PMC

Olson D & Wheeler T ULTRA: a model based tool to detect tandem repeats. ACM BCB 2018, 37–46 (2018). PubMed PMC

Quinlan AR & Hall IM BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010). PubMed PMC

Storer JM, Hubley R, Rosen J & Smit AFA Curation Guidelines for de novo Generated Transposable Element Families. Curr. Protoc 1, e154 (2021). PubMed PMC

Kent WJ BLAT—The BLAST-Like Alignment Tool. Genome Res. 12, 656–664 (2002). PubMed PMC

Szak ST et al. Molecular archeology of L1 insertions in the human genome. Genome Biol. 3, research0052.1 (2002). PubMed PMC

Altschul SF, Gish W, Miller W, Myers EW & Lipman DJ Basic local alignment search tool. J. Mol. Biol 215, 403–410 (1990). PubMed

Cer RZ et al. Searching for Non-B DNA-Forming Motifs Using nBMST (Non-B DNA Motif Search Tool). Curr. Protoc. Hum. Genet 73, 18.7.1–18.7.22 (2012). PubMed PMC

Zou X et al. Short inverted repeats contribute to localized mutability in human somatic cells. Nucleic Acids Res. 45, 11213–11221 (2017). PubMed PMC

Svetec Miklenić M et al. Size-dependent antirecombinogenic effect of short spacers on palindrome recombinogenicity. DNA Repair 90, 102848 (2020). PubMed

Sahakyan AB et al. Machine learning model for sequence-driven DNA G-quadruplex formation. Sci. Rep 7, 14535 (2017). PubMed PMC

Hao Z et al. RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Comput. Sci 6, e251 (2020). PubMed PMC

Dotmatics. GraphPad Prism v.9.1.0 for Windows; https://www.graphpad.com

Vollger MR SafFire. Last accessed: 2022-11-29. https://github.com/mrvollger/SafFire (2022).

Pendleton AL et al. Comparison of village dog and wolf genomes highlights the role of the neural crest in dog domestication. BMC Biol. 16, 64 (2018). PubMed PMC

Hach F et al. mrsFAST: a cache-oblivious algorithm for short-read mapping. Nat. Methods 7, 576–577 (2010). PubMed PMC

Escalona M et al. Whole-genome sequence and assembly of the Javan gibbon (Hylobates moloch). J. Hered 114, 35–43 (2023). PubMed PMC

Cortez D et al. Origins and functional evolution of Y chromosomes across mammals. Nature 508, 488–493 (2014). PubMed

Stamatakis A RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014). PubMed PMC

Dotmatics. Geneious. v2019.2.3; https://www.geneious.com/

Rambaut A et al. FigTree v1.4.4; http://tree.bio.ed.ac.uk/software/figtree/

Tyler-Smith C & Brown WRA Structure of the major block of alphoid satellite DNA on the human Y chromosome. J. Mol. Biol 195, 457–470 (1987). PubMed

Shepelev VA et al. Annotation of suprachromosomal families reveals uncommon types of alpha satellite organization in pericentromeric regions of hg38 human genome assembly. Genomics Data 5, 139–146 (2015). PubMed PMC

Lee I et al. Simultaneous profiling of chromatin accessibility and methylation on human cell lines with nanopore sequencing. Nat. Methods 17, 1191–1199 (2020). PubMed PMC

Krumsiek J, Arnold R & Rattei T Gepard: a rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics 23, 1026–1028 (2007). PubMed

Rice P, Longden I & Bleasby A EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 16, 276–277 (2000). PubMed

Sun C et al. Deletion of azoospermia factor a (AZFa) region of human Y chromosome caused by recombination between HERV15 proviruses. Hum. Mol. Genet 9, 2291–2296 (2000). PubMed

Lassmann T Kalign 3: multiple sequence alignment of large datasets. Bioinformatics 36, 1928–1929 (2020). PubMed PMC

Wheeler TJ & Eddy SR nhmmer: DNA homology search with profile HMMs. Bioinformatics 29, 2487–2489 (2013). PubMed PMC

Stephens ZD et al. Simulating Next-Generation Sequencing Datasets from Empirical Mutation and Sequencing Models. PLOS ONE 11, e0167047 (2016). PubMed PMC

Bushnell B BBMap: A Fast, Accurate, Splice-Aware Aligner. https://www.osti.gov/biblio/1241166 (2014).

Aken BL et al. Ensembl 2017. Nucleic Acids Res. 45, D635–D642 (2017). PubMed PMC

Poznik GD et al. Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females. Science 341, 562–565 (2013). PubMed PMC

McKenna A et al. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010). PubMed PMC

Schatz MC et al. Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space. Cell Genomics 2, 100085 (2022). PubMed PMC

Danecek P et al. Twelve years of SAMtools and BCFtools. GigaScience 10, giab008 (2021). PubMed PMC

Talenti A & Prendergast J nf-LO: A Scalable, Containerized Workflow for Genome-to-Genome Lift Over. Genome Biol. Evol 13, evab183 (2021). PubMed PMC

Guarracino A, Mwaniki N, Marco-Sola S, & Garrison E wfmash: whole-chromosome pairwise alignment using the hierarchical wavefront algorithm. GitHub https://github.com/ekg/wfmash (2021).

Sherry ST, Ward M & Sirotkin K dbSNP—Database for Single Nucleotide Polymorphisms and Other Classes of Minor Genetic Variation. Genome Res. 9, 677–679 (1999). PubMed

Landrum MJ et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018). PubMed PMC

Buniello A et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019). PubMed PMC

Van der Auwera GA & O’Connor BD. Genomics in the Cloud: Using Docker, GATK, and WDL in Terra. (O’Reilly Media, 2020).

Langmead B & Salzberg SL Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012). PubMed PMC

Ramírez F et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016). PubMed PMC

Zhao H et al. CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics 30, 1006–1007 (2014). PubMed PMC

Marçais G et al. MUMmer4: A fast and versatile genome alignment system. PLOS Comput. Biol 14, e1005944 (2018). PubMed PMC

Ondov BD, Bergman NH & Phillippy AM Interactive metagenomic visualization in a Web browser. BMC Bioinformatics 12, 385 (2011). PubMed PMC

Falconer E et al. DNA template strand sequencing of single-cells maps genomic rearrangements at high resolution. Nat. Methods 9, 1107–1112 (2012). PubMed PMC

Rhie A Repositories for the analysis of T2T-Y and T2T-CHM13v2.0. Zenodo 10.5281/zenodo.8136598 (2023). DOI
