The complete sequence of a human Y chromosome
Language: English; Country: England, United Kingdom; Medium: print-electronic
Document type: journal article
Grant support
U01 HG010961
NHGRI NIH HHS - United States
R35 GM124827
NIGMS NIH HHS - United States
R01 GM130691
NIGMS NIH HHS - United States
T32 GM007454
NIGMS NIH HHS - United States
Z99 HG999999
Intramural NIH HHS - United States
R01 HG002939
NHGRI NIH HHS - United States
K99 GM147352
NIGMS NIH HHS - United States
R01 HG009190
NHGRI NIH HHS - United States
ZIA HG200398
Intramural NIH HHS - United States
R35 GM133747
NIGMS NIH HHS - United States
U24 HG010263
NHGRI NIH HHS - United States
R01 GM136684
NIGMS NIH HHS - United States
R01 HG010040
NHGRI NIH HHS - United States
U41 HG010972
NHGRI NIH HHS - United States
R21 CA240199
NCI NIH HHS - United States
R01 CA266339
NCI NIH HHS - United States
R00 GM147352
NIGMS NIH HHS - United States
U41 HG006620
NHGRI NIH HHS - United States
R01 HG010169
NHGRI NIH HHS - United States
U41 HG007234
NHGRI NIH HHS - United States
U01 CA253481
NCI NIH HHS - United States
U24 HG007234
NHGRI NIH HHS - United States
R01 HG011274
NHGRI NIH HHS - United States
U24 HG006620
NHGRI NIH HHS - United States
U24 HG010136
NHGRI NIH HHS - United States
R21 HG010548
NHGRI NIH HHS - United States
S10 OD028587
NIH HHS - United States
U01 HG010971
NHGRI NIH HHS - United States
U01 DA047638
NIDA NIH HHS - United States
R01 GM123312
NIGMS NIH HHS - United States
R01 GM072264
NIGMS NIH HHS - United States
R01 HG002385
NHGRI NIH HHS - United States
U01 HG011758
NHGRI NIH HHS - United States
Howard Hughes Medical Institute - United States
PubMed
37612512
PubMed Central
PMC10752217
DOI
10.1038/s41586-023-06457-y
PII: 10.1038/s41586-023-06457-y
- MeSH
- Genetic Variation genetics MeSH
- Genomics * methods standards MeSH
- Heterochromatin genetics MeSH
- Humans MeSH
- Chromosomes, Human, Y * genetics MeSH
- Multigene Family genetics MeSH
- Genetics, Population MeSH
- Reference Standards MeSH
- DNA, Satellite genetics MeSH
- Segmental Duplications, Genomic genetics MeSH
- Base Sequence MeSH
- Sequence Analysis, DNA * standards MeSH
- Tandem Repeat Sequences genetics MeSH
- Telomere genetics MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Substance names
- DAZ1 protein, human MeSH Browser
- Heterochromatin MeSH
- RBMY1A1 protein, human MeSH Browser
- DNA, Satellite MeSH
- TSPY1 protein, human MeSH Browser
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.
Center for Algorithmic Biotechnology Saint Petersburg State University St Petersburg Russia
Center for Evolution and Medicine School of Life Sciences Arizona State University Tempe AZ USA
Department of Bioengineering Department of Physics Northeastern University Boston MA USA
Department of Biology Johns Hopkins University Baltimore MD USA
Department of Biology Pennsylvania State University University Park PA USA
Department of Biomedical Engineering Johns Hopkins University Baltimore MD USA
Department of Biomedical Engineering Pennsylvania State University State College PA USA
Department of Biomedical Informatics Harvard Medical School Boston MA USA
Department of Biomolecular Engineering University of California Santa Cruz Santa Cruz CA USA
Department of Computer Science and Engineering Pennsylvania State University University Park PA USA
Department of Computer Science Johns Hopkins University Baltimore MD USA
Department of Computer Science Rice University Houston TX USA
Department of Data Sciences Dana Farber Cancer Institute Boston MA USA
Department of Genetics and Genome Sciences UConn Health Farmington CT USA
Department of Genetics University of Cambridge Cambridge UK
Department of Genome Sciences University of Washington School of Medicine Seattle WA USA
Department of Molecular and Cell Biology University of California Berkeley CA USA
Department of Molecular and Cell Biology University of Connecticut Storrs CT USA
DNAnexus Inc Mountain View CA USA
Faculty of Informatics Masaryk University Brno Czech Republic
Federal Research Center of Biotechnology of the Russian Academy of Sciences Moscow Russia
Foundation of Biological Data Science Belmont CA USA
GeneDX Holdings Corp Stamford CT USA
Genomics Research Centre Human Technopole Milan Italy
Google Inc Mountain View CA USA
Graduate Program in Bioinformatics and Systems Biology University of California San Diego CA USA
Human Genome Sequencing Center Baylor College of Medicine One Baylor Plaza Houston TX USA
Institute for Systems Biology Seattle WA USA
Institute for Systems Genomics University of Connecticut Storrs CT USA
Institute of Bioinformatics Faculty of Medicine University of Münster Münster Germany
Institute of Molecular Genetics Moscow Russia
Investigator Howard Hughes Medical Institute University of Washington Seattle WA USA
Masters Program in National Research University Higher School of Economics Moscow Russia
Oxford Nanopore Technologies Inc Oxford UK
Pacific Biosciences Menlo Park CA USA
Stowers Institute for Medical Research Kansas City MO USA
The Rockefeller University New York NY USA
UC Santa Cruz Genomics Institute University of California Santa Cruz Santa Cruz CA USA
UCL Queen Square Institute of Neurology UCL London UK
References
Skaletsky H et al. The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature 423, 825–837 (2003). PubMed
Miga KH et al. Centromere reference models for human chromosomes X and Y satellite arrays. Genome Res. 24, 697–707 (2014). PubMed PMC
Vollger MR et al. Segmental duplications and their variation in a complete human genome. Science 376, eabj6965 (2022). PubMed PMC
Nurk S et al. The complete sequence of a human genome. Science 376, 44–53 (2022). PubMed PMC
Schneider VA et al. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res. 27, 849–864 (2017). PubMed PMC
Gustafson ML & Donahoe PK Male sex determination: current concepts of male sexual differentiation. Annu. Rev. Med 45, 505–524 (1994). PubMed
Vogt PH et al. Human Y Chromosome Azoospermia Factors (AZF) Mapped to Different Subregions in Yq11. Hum. Mol. Genet 5, 933–943 (1996). PubMed
Miga KH et al. Telomere-to-telomere assembly of a complete human X chromosome. Nature 585, 79–84 (2020). PubMed PMC
Logsdon GA et al. The structure, function and evolution of a complete human chromosome 8. Nature 593, 101–107 (2021). PubMed PMC
Wenger AM et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol 37, 1155–1162 (2019). PubMed PMC
Jain M et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol 36, 338–345 (2018). PubMed PMC
Nurk S et al. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res. gr.263566.120 (2020) doi:10.1101/gr.263566.120. PubMed DOI PMC
Rautiainen M & Marschall T GraphAligner: rapid and versatile sequence-to-graph alignment. Genome Biol. 21, 253 (2020). PubMed PMC
Formenti G et al. Merfin: improved variant filtering, assembly evaluation and polishing via k-mer validation. Nat. Methods 19, 696–704 (2022). PubMed PMC
Kirsche M et al. Jasmine and Iris: population-scale structural variant comparison and analysis. Nat. Methods 20, 408–417 (2023). PubMed PMC
Jain C, Rhie A, Hansen NF, Koren S & Phillippy AM Long-read mapping to repetitive reference sequences using Winnowmap2. Nat. Methods 19, 705–710 (2022). PubMed PMC
Mc Cartney AM et al. Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies. Nat. Methods 19, 687–695 (2022). PubMed PMC
Rhie A, Walenz BP, Koren S & Phillippy AM Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020). PubMed PMC
Wang T et al. The Human Pangenome Project: a global resource to map genomic diversity. Nature 604, 437–446 (2022). PubMed PMC
Jarvis ED et al. Semi-automated assembly of high-quality diploid human reference genomes. Nature 611, 519–531 (2022). PubMed PMC
Shumate A et al. Assembly and annotation of an Ashkenazi human reference genome. Genome Biol. 21, 129 (2020). PubMed PMC
Zook JM et al. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci. Data 3, 160025 (2016). PubMed PMC
Landrum MJ et al. ClinVar: improvements to accessing data. Nucleic Acids Res. 48, D835–D844 (2020). PubMed PMC
Welter D et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014). PubMed PMC
Smigielski EM, Sirotkin K, Ward M & Sherry ST dbSNP: a database of single nucleotide polymorphisms. Nucleic Acids Res. 28, 352–355 (2000). PubMed PMC
Karczewski KJ et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020). PubMed PMC
Byrska-Bishop M et al. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell 185, 3426–3440.e19 (2022). PubMed PMC
Mallick S et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538, 201–206 (2016). PubMed PMC
Dunham I et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012). PubMed PMC
Ebert P et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372, eabf7117 (2021). PubMed PMC
Sanders AD et al. Single-cell analysis of structural variations and complex rearrangements with tri-channel processing. Nat. Biotechnol 38, 343–354 (2020). PubMed PMC
Hallast P et al. Assembly of 43 diverse human Y chromosomes reveals extensive complexity and variation. Nature (2023). In press, doi:10.1038/s41586-023-06425-6 PubMed DOI PMC
Hammer MF et al. Extended Y chromosome haplotypes resolve multiple and unique lineages of the Jewish priesthood. Hum. Genet 126, 707 (2009). PubMed PMC
Poznik GD et al. Punctuated bursts in human male demography inferred from 1,244 worldwide Y-chromosome sequences. Nat. Genet 48, 593–599 (2016). PubMed PMC
Jiang Z, Hubley R, Smit A & Eichler EE DupMasker: a tool for annotating primate segmental duplications. Genome Res. 18, 1362–1368 (2008). PubMed PMC
Vollger MR, Kerpedjiev P, Phillippy AM & Eichler EE StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps. Bioinformatics 38, 2049–2051 (2022). PubMed PMC
Vegesna R, Tomaszkiewicz M, Medvedev P & Makova KD Dosage regulation, and variation in gene expression and copy number of human Y chromosome ampliconic genes. PLOS Genet. 15, e1008369 (2019). PubMed PMC
NCBI RefSeq v110 Browser. Homo sapiens isolate NA24385 chromosome Y, alternate assembly T2T-CHM13v2.0; https://tinyurl.com/bdfudexn (2022). More tracks are visible via “Tracks shown” option.
Hoyt SJ et al. From telomere to telomere: The transcriptional and epigenetic state of human repeat elements. Science 376, eabk3112 (2022). PubMed PMC
Warburton PE et al. Analysis of the largest tandemly repeated DNA families in the human genome. BMC Genomics 9, 533 (2008). PubMed PMC
Halabian R & Makałowski W A Map of 3′ DNA Transduction Variants Mediated by Non-LTR Retroelements on 3202 Human Genomes. Biology 11, 1032 (2022). PubMed PMC
Weissensteiner MH et al. Accurate sequencing of DNA motifs able to form alternative (non-B) structures. Genome Res. 33, 907–922 (2023). PubMed PMC
Tyler-Smith C, Taylor L & Müller U Structure of a hypervariable tandemly repeated DNA sequence on the short arm of the human Y chromosome. J. Mol. Biol 203, 837–848 (1988). PubMed
Xue Y & Tyler-Smith C An Exceptional Gene: Evolution of the TSPY Gene Family in Humans and Other Great Apes. Genes 2, 36–47 (2011). PubMed PMC
Saxena R et al. Four DAZ Genes in Two Clusters Found in the AZFc Region of the Human Y Chromosome. Genomics 67, 256–267 (2000). PubMed
Altemose N et al. Complete genomic and epigenetic maps of human centromeres. Science 376, eabl4178 (2022). PubMed PMC
Jain M et al. Linear assembly of a human centromere on the Y chromosome. Nat. Biotechnol 36, 321–323 (2018). PubMed PMC
Gershman A et al. Epigenetic patterns in a complete human genome. Science 376, eabj5089 (2022). PubMed PMC
Kasinathan S & Henikoff S Non-B-Form DNA Is Enriched at Centromeres. Mol. Biol. Evol 35, 949–962 (2018). PubMed PMC
Skene PJ & Henikoff S An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites. eLife 6, e21856 (2017). PubMed PMC
Nailwal M & Chauhan JB Azoospermia factor C subregion of the Y chromosome. J. Hum. Reprod. Sci 10, 256 (2017). PubMed PMC
Kuroda-Kawaguchi T et al. The AZFc region of the Y chromosome features massive palindromes and uniform recurrent deletions in infertile men. Nat. Genet 29, 279–286 (2001). PubMed
Repping S et al. A family of human Y chromosomes has dispersed throughout northern Eurasia despite a 1.8-Mb deletion in the azoospermia factor c region. Genomics 83, 1046–1052 (2004). PubMed
Porubsky D et al. Recurrent inversion polymorphisms in humans associate with genetic instability and genomic disorders. Cell 185, 1986–2005.e26 (2022). PubMed PMC
Teitz LS, Pyntikova T, Skaletsky H & Page DC Selection Has Countered High Mutability to Preserve the Ancestral Copy Number of Y Chromosome Amplicons in Diverse Human Lineages. Am. J. Hum. Genet 103, 261–275 (2018). PubMed PMC
Jobling MA Copy number variation on the human Y chromosome. Cytogenet. Genome Res 123, 253–262 (2008). PubMed
Navarro-Costa P, Plancha CE & Gonçalves J Genetic Dissection of the AZF Regions of the Human Y Chromosome: Thriller or Filler for Male (In)fertility? BioMed Res. Int 2010, e936569 (2010). PubMed PMC
Evans HJ, Gosden JR, Mitchell AR & Buckland RA Location of human satellite DNAs on the Y chromosome. Nature 251, 346–347 (1974).
Schmid M, Guttenbach M, Nanda I, Studer R & Epplen JT Organization of DYZ2 repetitive DNA on the human Y chromosome. Genomics 6, 212–218 (1990). PubMed
Manz E, Alkan M, Bühler E & Schmidtke J Arrangement of DYZ1 and DYZ2 repeats on the human Y-chromosome: a case with presence of DYZ1 and absence of DYZ2. Mol. Cell. Probes 6, 257–259 (1992). PubMed
Altemose N A classical revival: Human satellite DNAs enter the genomics era. Semin. Cell Dev. Biol 128, 2–14 (2022). PubMed
Gripenberg U Size variation and orientation of the human Y chromosome. Chromosoma 15, 618–629 (1964). PubMed
Mathias N, Bayés M & Tyler-Smith C Highly informative compound haplotypes for the human Y chromosome. Hum. Mol. Genet 3, 115–123 (1994). PubMed
Altemose N, Miga KH, Maggioni M & Willard HF Genomic Characterization of Large Heterochromatic Gaps in the Human Genome Assembly. PLOS Comput. Biol 10, e1003628 (2014). PubMed PMC
Cooke H Repeated sequence specific to human males. Nature 262, 182–186 (1976). PubMed
Frommer M, Prosser J & Vincent PC Human satellite I sequences include a male specific 2.47 kb tandemly repeated unit containing one Alu family member per repeat. Nucleic Acids Res. 12, 2887–2900 (1984). PubMed PMC
Babcock M, Yatsenko S, Stankiewicz P, Lupski JR & Morrow BE AT-rich repeats associated with chromosome 22q11.2 rearrangement disorders shape human genome architecture on Yq12. Genome Res. 17, 451–460 (2007). PubMed PMC
Webster TH et al. Identifying, understanding, and correcting technical artifacts on the sex chromosomes in next-generation sequencing data. GigaScience 8, giz074 (2019). PubMed PMC
Aganezov S et al. A complete reference genome improves analysis of human genetic variation. Science 376, eabl3533 (2022). PubMed PMC
Bekritsky MA, Colombo C & Eberle MA Identifying Genomic Regions with High Quality Single Nucleotide Variant Calling. https://www.illumina.com/content/illumina-marketing/amr/en_US/science/genomics-research/articles/identifying-genomic-regions-with-high-quality-single-nucleotide-.html.
Robinson JT et al. Integrative genomics viewer. Nat. Biotechnol 29, 24–26 (2011). PubMed PMC
Breitwieser FP, Pertea M, Zimin AV & Salzberg SL Human contamination in bacterial genomes has created thousands of spurious proteins. Genome Res. 29, 954–960 (2019). PubMed PMC
Steinegger M & Salzberg SL Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank. Genome Biol. 21, 115 (2020). PubMed PMC
Chrisman B et al. The human “contaminome”: bacterial, viral, and computational contamination in whole genome sequences from 1000 families. Sci. Rep 12, 9863 (2022). PubMed PMC
Kent WJ et al. The Human Genome Browser at UCSC. Genome Res. 12, 996–1006 (2002). PubMed PMC
Rautiainen M et al. Telomere-to-telomere assembly of diploid chromosomes with Verkko. Nat. Biotechnol 1–9 (2023) doi:10.1038/s41587-023-01662-6. PubMed DOI PMC
Liao W-W et al. A draft human pangenome reference. Nature 617, 312–324 (2023). PubMed PMC
Shafin K et al. Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes. Nat. Biotechnol 38, 1044–1053 (2020). PubMed PMC
Koren S et al. De novo assembly of haplotype-resolved genomes with trio binning. Nat. Biotechnol 36, 1174–1182 (2018). PubMed PMC
Kolmogorov M, Yuan J, Lin Y & Pevzner PA Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol 37, 540–546 (2019). PubMed
Poplin R et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol 36, 983–987 (2018). PubMed
Shafin K et al. Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads. Nat. Methods 18, 1322–1332 (2021). PubMed PMC
Sedlazeck FJ et al. Accurate detection of complex structural variations using single molecule sequencing. Nat. Methods 15, 461–468 (2018). PubMed PMC
Jiang T et al. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol. 21, 189 (2020). PubMed PMC
Bzikadze AV, Mikheenko A & Pevzner PA Fast and accurate mapping of long reads to complete genome assemblies with VerityMap. Genome Res. gr.276871.122 (2022) doi:10.1101/gr.276871.122. PubMed DOI PMC
Li H & Durbin R Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009). PubMed PMC
Porubsky D et al. breakpointR: an R/Bioconductor package to localize strand state changes in Strand-seq data. Bioinformatics 36, 1260–1261 (2020). PubMed
PacBio Revio WGS Dataset. Homo sapiens – GIAB trio HG002-4. https://downloads.pacbcloud.com/public/revio/2022Q4/ (2022).
Poznik D yhaplo: Identifying Y-Chromosome Haplogroups. Last accessed: 2022-11-29. https://github.com/23andMe/yhaplo (2022).
Tseng B et al. Y-SNP Haplogroup Hierarchy Finder: a web tool for Y-SNP haplogroup assignment. J. Hum. Genet 67, 487–493 (2022). PubMed
Li H Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018). PubMed PMC
Li H Identifying centromeric satellites with dna-brnn. Bioinformatics 35, 4408–4410 (2019). PubMed PMC
Harris RS Improved Pairwise Alignment of Genomic DNA. (Penn State, 2007).
Morgulis A, Gertz EM, Schäffer AA & Agarwala R WindowMasker: window-based masker for sequenced genomes. Bioinformatics 22, 134–141 (2006). PubMed
Chin C-S et al. Multiscale analysis of pangenomes enables improved representation of genomic diversity for repetitive and clinically relevant genes. Nat. Methods 1–9 (2023) doi:10.1038/s41592-023-01914-y. PubMed DOI PMC
Frankish A et al. GENCODE 2021. Nucleic Acids Res. 49, D916–D923 (2021). PubMed PMC
Armstrong J et al. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature 587, 246–251 (2020). PubMed PMC
Kovaka S et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 278 (2019). PubMed PMC
Stanke M, Diekhans M, Baertsch R & Haussler D Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644 (2008). PubMed
Fiddes IT et al. Comparative Annotation Toolkit (CAT)—simultaneous clade and personal genome annotation. Genome Res. 28, 1029–1038 (2018). PubMed PMC
Shumate A & Salzberg SL Liftoff: accurate mapping of gene annotations. Bioinformatics 37, 1639–1643 (2021). PubMed PMC
Dale RK, Pedersen BS & Quinlan AR Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27, 3423–3424 (2011). PubMed PMC
Rhie A et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature 592, 737–746 (2021). PubMed PMC
Pruitt KD et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 42, D756–763 (2014). PubMed PMC
Kapustin Y, Souvorov A, Tatusova T & Lipman D Splign: algorithms for computing spliced alignments with identification of paralogs. Biol. Direct 3, 20 (2008). PubMed PMC
Katoh K & Standley DM MAFFT: Iterative Refinement and Additional Methods. in Multiple Sequence Alignment Methods (ed. Russell DJ) 131–146 (Humana Press, 2014). doi:10.1007/978-1-62703-646-7_8. PubMed DOI
Slater GSC & Birney E Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31 (2005). PubMed PMC
Zook JM et al. Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat. Biotechnol 32, 246–251 (2014). PubMed
Numanagić I et al. Fast characterization of segmental duplications in genome assemblies. Bioinformatics 34, i706–i714 (2018). PubMed PMC
Benson G Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27, 573–580 (1999). PubMed PMC
Smit AFA, Hubley R & Green P RepeatMasker Open-4.0 2013-2015. http://www.repeatmasker.org (2015).
Storer J, Hubley R, Rosen J, Wheeler TJ & Smit AF The Dfam community resource of transposable element families, sequence models, and genome annotations. Mob. DNA 12, 2 (2021). PubMed PMC
Olson D & Wheeler T ULTRA: a model based tool to detect tandem repeats. ACM BCB 2018, 37–46 (2018). PubMed PMC
Quinlan AR & Hall IM BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010). PubMed PMC
Storer JM, Hubley R, Rosen J & Smit AFA Curation Guidelines for de novo Generated Transposable Element Families. Curr. Protoc 1, e154 (2021). PubMed PMC
Kent WJ BLAT—The BLAST-Like Alignment Tool. Genome Res. 12, 656–664 (2002). PubMed PMC
Szak ST et al. Molecular archeology of L1 insertions in the human genome. Genome Biol. 3, research0052.1 (2002). PubMed PMC
Altschul SF, Gish W, Miller W, Myers EW & Lipman DJ Basic local alignment search tool. J. Mol. Biol 215, 403–410 (1990). PubMed
Cer RZ et al. Searching for Non-B DNA-Forming Motifs Using nBMST (Non-B DNA Motif Search Tool). Curr. Protoc. Hum. Genet 73, 18.7.1–18.7.22 (2012). PubMed PMC
Zou X et al. Short inverted repeats contribute to localized mutability in human somatic cells. Nucleic Acids Res. 45, 11213–11221 (2017). PubMed PMC
Svetec Miklenić M et al. Size-dependent antirecombinogenic effect of short spacers on palindrome recombinogenicity. DNA Repair 90, 102848 (2020). PubMed
Sahakyan AB et al. Machine learning model for sequence-driven DNA G-quadruplex formation. Sci. Rep 7, 14535 (2017). PubMed PMC
Hao Z et al. RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Comput. Sci 6, e251 (2020). PubMed PMC
Dotmatics. GraphPad Prism v.9.1.0 for Windows; https://www.graphpad.com
Vollger MR SafFire. Last accessed: 2022-11-29. https://github.com/mrvollger/SafFire (2022).
Pendleton AL et al. Comparison of village dog and wolf genomes highlights the role of the neural crest in dog domestication. BMC Biol. 16, 64 (2018). PubMed PMC
Hach F et al. mrsFAST: a cache-oblivious algorithm for short-read mapping. Nat. Methods 7, 576–577 (2010). PubMed PMC
Escalona M et al. Whole-genome sequence and assembly of the Javan gibbon (Hylobates moloch). J. Hered 114, 35–43 (2023). PubMed PMC
Cortez D et al. Origins and functional evolution of Y chromosomes across mammals. Nature 508, 488–493 (2014). PubMed
Stamatakis A RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014). PubMed PMC
Dotmatics. Geneious. v2019.2.3; https://www.geneious.com/
Rambaut et al. FigTree v1.4.4; http://tree.bio.ed.ac.uk/software/figtree/
Tyler-Smith C & Brown WRA Structure of the major block of alphoid satellite DNA on the human Y chromosome. J. Mol. Biol 195, 457–470 (1987). PubMed
Shepelev VA et al. Annotation of suprachromosomal families reveals uncommon types of alpha satellite organization in pericentromeric regions of hg38 human genome assembly. Genomics Data 5, 139–146 (2015). PubMed PMC
Lee I et al. Simultaneous profiling of chromatin accessibility and methylation on human cell lines with nanopore sequencing. Nat. Methods 17, 1191–1199 (2020). PubMed PMC
Krumsiek J, Arnold R & Rattei T Gepard: a rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics 23, 1026–1028 (2007). PubMed
Rice P, Longden I & Bleasby A EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 16, 276–277 (2000). PubMed
Sun C et al. Deletion of azoospermia factor a (AZFa) region of human Y chromosome caused by recombination between HERV15 proviruses. Hum. Mol. Genet 9, 2291–2296 (2000). PubMed
Lassmann T Kalign 3: multiple sequence alignment of large datasets. Bioinformatics 36, 1928–1929 (2020). PubMed PMC
Wheeler TJ & Eddy SR nhmmer: DNA homology search with profile HMMs. Bioinformatics 29, 2487–2489 (2013). PubMed PMC
Stephens ZD et al. Simulating Next-Generation Sequencing Datasets from Empirical Mutation and Sequencing Models. PLOS ONE 11, e0167047 (2016). PubMed PMC
Bushnell B BBMap: A Fast, Accurate, Splice-Aware Aligner. https://www.osti.gov/biblio/1241166 (2014).
Aken BL et al. Ensembl 2017. Nucleic Acids Res. 45, D635–D642 (2017). PubMed PMC
Poznik GD et al. Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females. Science 341, 562–565 (2013). PubMed PMC
McKenna A et al. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010). PubMed PMC
Schatz MC et al. Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space. Cell Genomics 2, 100085 (2022). PubMed PMC
Danecek P et al. Twelve years of SAMtools and BCFtools. GigaScience 10, giab008 (2021). PubMed PMC
Talenti A & Prendergast J nf-LO: A Scalable, Containerized Workflow for Genome-to-Genome Lift Over. Genome Biol. Evol 13, evab183 (2021). PubMed PMC
Guarracino A, Mwaniki N, Marco-Sola S, & Garrison E wfmash: whole-chromosome pairwise alignment using the hierarchical wavefront algorithm. GitHub https://github.com/ekg/wfmash (2021).
Sherry ST, Ward M & Sirotkin K dbSNP—Database for Single Nucleotide Polymorphisms and Other Classes of Minor Genetic Variation. Genome Res. 9, 677–679 (1999). PubMed
Landrum MJ et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018). PubMed PMC
Buniello A et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019). PubMed PMC
Van der Auwera GA & O’Connor BD. Genomics in the Cloud: Using Docker, GATK, and WDL in Terra. (O’Reilly Media, 2020).
Langmead B & Salzberg SL Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012). PubMed PMC
Ramírez F et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016). PubMed PMC
Zhao H et al. CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics 30, 1006–1007 (2014). PubMed PMC
Marçais G et al. MUMmer4: A fast and versatile genome alignment system. PLOS Comput. Biol 14, e1005944 (2018). PubMed PMC
Ondov BD, Bergman NH & Phillippy AM Interactive metagenomic visualization in a Web browser. BMC Bioinformatics 12, 385 (2011). PubMed PMC
Falconer E et al. DNA template strand sequencing of single-cells maps genomic rearrangements at high resolution. Nat. Methods 9, 1107–1112 (2012). PubMed PMC
Rhie A Repositories for the analysis of T2T-Y and T2T-CHM13v2.0. Zenodo doi:10.5281/zenodo.8136598 (2023). DOI