Most cited article - PubMed ID 37433640
Accurate sequencing of DNA motifs able to form alternative (non-B) structures
Non-canonical (non-B) DNA structures, such as bent DNA, hairpins, G-quadruplexes (G4s), and Z-DNA, form at certain sequence motifs (e.g., A-phased repeats and inverted repeats) and have emerged as important regulators of cellular processes and drivers of genome evolution. Yet they have been understudied because of their repetitive nature and the potentially inaccurate sequences that short-read technologies generate for them. Here we comprehensively characterize such motifs in the long-read telomere-to-telomere (T2T) genomes of human, bonobo, chimpanzee, gorilla, Bornean orangutan, Sumatran orangutan, and siamang. Non-B DNA motifs are enriched in the genomic regions added to the T2T assemblies and occupy 9%-15% of the autosomes, 9%-11% of chromosome X, and 12%-38% of chromosome Y. G4s and Z-DNA are enriched at promoters and enhancers, as well as at origins of replication. Repetitive sequences harbor more non-B DNA motifs than non-repetitive sequences, especially in the short arms of acrocentric chromosomes. Most centromeres and/or their flanking regions are enriched in at least one non-B DNA motif type, consistent with a potential role for non-B structures in determining centromeres. Our results highlight the uneven distribution of predicted non-B DNA structures across ape genomes and suggest novel functions for them in previously inaccessible genomic regions.
- MeSH
- DNA * chemistry genetics
- G-Quadruplexes
- Genome, Human
- Genome *
- Hominidae * genetics
- Humans
- Nucleotide Motifs
- Pan troglodytes genetics
- Repetitive Sequences, Nucleic Acid
- Telomere * genetics
- Animals
- Check Tag
- Humans
- Animals
- Publication type
- Journal Article
- Names of Substances
- DNA *
Non-canonical (non-B) DNA structures, such as bent DNA, hairpins, G-quadruplexes (G4s), and Z-DNA, form at certain sequence motifs (e.g., A-phased repeats and inverted repeats) and have emerged as important regulators of cellular processes and drivers of genome evolution. Yet they have been understudied because of their repetitive nature and the potentially inaccurate sequences that short-read technologies generate for them. Here we comprehensively characterize such motifs in the long-read telomere-to-telomere (T2T) genomes of human, bonobo, chimpanzee, gorilla, Bornean orangutan, Sumatran orangutan, and siamang. Non-B DNA motifs are enriched in the genomic regions added to the T2T assemblies and occupy 9-15% of the autosomes, 9-11% of chromosome X, and 12-38% of chromosome Y. G4s and Z-DNA are enriched at promoters and enhancers, as well as at origins of replication. Repetitive sequences harbor more non-B DNA motifs than non-repetitive sequences, especially in the short arms of acrocentric chromosomes. Most centromeres and/or their flanking regions are enriched in at least one non-B DNA motif type, consistent with a potential role for non-B structures in determining centromeres. Our results highlight the uneven distribution of predicted non-B DNA structures across ape genomes and suggest novel functions for them in previously inaccessible genomic regions.
- Publication type
- Journal Article
- Preprint
Nucleic acids are not only static carriers of genetic information but also play vital roles in controlling cellular lifecycles through their fascinating structural diversity [...].
- MeSH
- DNA * chemistry metabolism
- Nucleic Acid Conformation *
- Humans
- RNA * chemistry metabolism
- Computational Biology * methods
- Check Tag
- Humans
- Publication type
- Introductory Journal Article
- Editorial
- Names of Substances
- DNA *
- RNA *
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure, which includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence, and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y), which corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference. T2T-Y shows the complete ampliconic structures of the gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.
- MeSH
- Genetic Variation genetics
- Genomics * methods standards
- Heterochromatin genetics
- Humans
- Chromosomes, Human, Y * genetics
- Multigene Family genetics
- Genetics, Population
- Reference Standards
- DNA, Satellite genetics
- Segmental Duplications, Genomic genetics
- Base Sequence
- Sequence Analysis, DNA * standards
- Tandem Repeat Sequences genetics
- Telomere genetics
- Check Tag
- Humans
- Publication type
- Journal Article
- Names of Substances
- DAZ1 protein, human
- Heterochromatin
- RBMY1A1 protein, human
- DNA, Satellite
- TSPY1 protein, human