Most cited article - PubMed ID 32902599
Fundamentally different repetitive element composition of sex chromosomes in Rumex acetosa
BACKGROUND: Long terminal repeats (LTRs) represent important parts of LTR retrotransposons and retroviruses found in high copy numbers in a majority of eukaryotic genomes. LTRs contain regulatory sequences essential for the life cycle of the retrotransposon. Previous experimental and sequence studies have provided only limited information about LTR structure and composition, mostly from model systems. To enhance our understanding of these key sequence modules, we focused on the contrasts between LTRs of various retrotransposon families and other genomic regions. Furthermore, this approach can be utilized for the classification and prediction of LTRs.
RESULTS: We used machine learning methods suitable for DNA sequence classification and applied them to a large dataset of plant LTR retrotransposon sequences. We trained three machine learning models using (i) traditional model ensembles (Gradient Boosting), (ii) hybrid convolutional/long short-term memory network models, and (iii) a DNA pre-trained transformer-based model using k-mer sequence representation. All three approaches were successful in classifying and isolating LTRs in this data, as well as providing valuable insights into LTR sequence composition. The best classification (expressed as F1 score) achieved for LTR detection was 0.85 using the hybrid network model. The most accurate classification task was superfamily classification (F1=0.89) while the least accurate was family classification (F1=0.74). The trained models were subjected to explainability analysis. Positional analysis identified a mixture of interesting features, many of which had a preferred absolute position within the LTR and/or were biologically relevant, such as a centrally positioned TATA-box regulatory sequence, and TG..CA nucleotide patterns around both LTR edges.
CONCLUSIONS: Our results show that the models used here recognized biologically relevant motifs, such as core promoter elements in the LTR detection task, and a development and stress-related subclass of transcription factor binding sites in the family classification task. Explainability analysis also highlighted the importance of the 5' and 3' edges in LTR identity and revealed the need to analyze more than just dinucleotides at these ends. Our work shows the applicability of machine learning models to regulatory sequence analysis and classification, and demonstrates the important role of the identified motifs in LTR detection.
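As an illustration of two ideas mentioned in this abstract, the sketch below splits a DNA sequence into overlapping k-mers and checks for the TG..CA pattern at the sequence edges. It is a minimal example, not the authors' pipeline: the function names, the choice of k = 6, the stride of 1, and the toy sequence are all assumptions made for illustration.

```python
def kmer_tokens(seq, k=6):
    """Split a DNA sequence into overlapping k-mers (stride 1).

    k=6 is an assumed value; the abstract does not state which k was used.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def has_tg_ca_edges(ltr):
    """Return True if the sequence starts with TG and ends with CA."""
    ltr = ltr.upper()
    return len(ltr) >= 4 and ltr.startswith("TG") and ltr.endswith("CA")


# Toy sequence (hypothetical, not taken from the dataset).
toy_ltr = "TGTTAACA"
tokens = kmer_tokens(toy_ltr, k=6)
edges_ok = has_tg_ca_edges(toy_ltr)
```

The abstract notes that more than the terminal dinucleotides matter for LTR identity, so an edge check like this would at most serve as a coarse pre-filter.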
- Keywords
- CNN-LSTM, DNABERT, Deep learning, Eukaryote, Regulatory mechanisms, Repeat, SHAP score, Sequence analysis, TFBS, Transcription factor binding sites, Transposable elements
- Publication type
- Journal Article MeSH
Sex chromosomes have evolved in many plant species with separate sexes. Current plant research is shifting from examining the structure of sex chromosomes to exploring their functional aspects. New studies are progressively unveiling the specific genetic and epigenetic mechanisms responsible for shaping distinct sexes in plants. While the fundamental methods of molecular biology and genomics are generally employed for the analysis of sex chromosomes, it is often necessary to modify classical procedures not only to simplify and expedite analyses but sometimes to make them possible at all. In this review, we demonstrate how, at the level of structural and functional genetics, cytogenetics, and bioinformatics, it is essential to adapt established procedures for sex chromosome analysis.
- Keywords
- Bioinformatics, chromosome dissection, cytogenetics, dioecious plants, epigenetics, functional genetics, sex chromosomes, tandem repeats, transposable elements
- MeSH
- Chromosomes, Plant * genetics MeSH
- Sex Chromosomes * genetics MeSH
- Plants genetics MeSH
- Computational Biology methods MeSH
- Publication type
- Journal Article MeSH
- Review MeSH
Telomeres are essential structures formed from satellite DNA repeats at the ends of chromosomes in most eukaryotes. Satellite DNA repeat sequences are useful markers for karyotyping, but have a more enigmatic role in the eukaryotic cell. Much work has been done to investigate the structure and arrangement of repetitive DNA elements in classical models with implications for species evolution. Still more is needed until there is a complete picture of the biological function of DNA satellite sequences, particularly when considering non-model organisms. Celebrating Gregor Mendel's anniversary by going to the roots, this review is designed to inspire and aid new research into telomeres and satellites with a particular focus on non-model organisms and accessible experimental and in silico methods that do not require specialized equipment or expensive materials. We describe how to identify telomere (and satellite) repeats giving many examples of published (and some unpublished) data from these techniques to illustrate the principles behind the experiments. We also present advice on how to perform and analyse such experiments, including details of common pitfalls. Our examples are a selection of recent developments and underexplored areas of research from the past. As a nod to Mendel's early work, we use many examples from plants and insects, especially as much recent work has expanded beyond the human and yeast models traditional in telomere research. We give a general introduction to the accepted knowledge of telomere and satellite systems and include references to specialized reviews for the interested reader.
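One simple in silico check of the kind this review alludes to is counting how many consecutive copies of a candidate telomere motif occur in a sequence, on either strand. The sketch below is an assumption-laden illustration rather than a method taken from the review: the function names are hypothetical, and the example uses the Arabidopsis-type plant motif TTTAGGG in a made-up sequence.

```python
import re


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def longest_tandem_run(seq, motif):
    """Return the largest number of consecutive motif copies on either strand."""
    seq = seq.upper()
    best = 0
    # Search for the motif itself and for its reverse complement.
    for m in (motif.upper(), reverse_complement(motif)):
        for match in re.finditer(f"(?:{re.escape(m)})+", seq):
            best = max(best, len(match.group(0)) // len(m))
    return best


# Hypothetical read: five copies of TTTAGGG flanked by unrelated bases.
read = "ACGT" + "TTTAGGG" * 5 + "ACGT"
copies = longest_tandem_run(read, "TTTAGGG")
```

Exact matching is deliberately strict here; real telomere arrays often contain variant repeats, so a tolerant search would be needed in practice.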
- Keywords
- FISH, NGS, TRAP, eukaryotic tree of life, interstitial telomere sequences, retroelements, satellite, subtelomere structure, telomerase RNA, telomere evolution
- MeSH
- DNA MeSH
- Humans MeSH
- Repetitive Sequences, Nucleic Acid MeSH
- DNA, Satellite * MeSH
- Base Sequence MeSH
- Telomere * genetics MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Review MeSH
- Substance names
- DNA MeSH
- DNA, Satellite * MeSH
Young sex chromosomes possess unique and ongoing dynamics that allow us to understand processes that have an impact on their evolution and divergence. The genus Silene includes species with evolutionarily young sex chromosomes, and two species of section Melandrium, namely Silene latifolia (24, XY) and Silene dioica (24, XY), are well-established models of sex chromosome evolution, Y chromosome degeneration, and sex determination. In both species, the X and Y chromosomes are strongly heteromorphic and differ from the autosomes in genomic composition. It is generally accepted that for proper cell division, the longest chromosomal arm must not exceed half of the average length of the spindle axis at telophase. Yet it is not clear what the dynamics are between males and females during mitosis and how the cell compensates for the presence of the large Y chromosome in one sex. Using hydroxyurea cell synchronization and 2D/3D microscopy, we determined the position of the sex chromosomes during the mitotic cell cycle and determined the upper limit for the expansion of the sex chromosome non-recombining region. Using 3D specimen preparations, we found that the velocity of the large chromosomes is compensated by the distant positioning from the central interpolar axis, confirming previous mathematical modulations.
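The arm-length rule stated in this abstract (the longest chromosomal arm must not exceed half of the average telophase spindle length) can be written as a one-line check. The sketch below is only an illustration of that inequality: the function name and the numeric values are hypothetical and are not measurements from the study.

```python
def arm_fits_spindle(arm_length_um, spindle_lengths_um):
    """Check that an arm is no longer than half the mean spindle axis length.

    Both arguments are in micrometres; spindle_lengths_um is a list of
    telophase spindle axis measurements.
    """
    mean_spindle = sum(spindle_lengths_um) / len(spindle_lengths_um)
    return arm_length_um <= mean_spindle / 2


# Hypothetical values: mean spindle length is 11.0, so the limit is 5.5.
fits = arm_fits_spindle(5.0, [10.0, 12.0])
too_long = arm_fits_spindle(6.0, [10.0, 12.0])
```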
- Keywords
- Silene, central interpolar axis, chromosome velocity, sex chromosomes, sister chromatid division
- MeSH
- Chromatids physiology MeSH
- Chromosomes, Plant physiology MeSH
- In Situ Hybridization, Fluorescence MeSH
- Hydroxyurea pharmacology MeSH
- Microscopy, Confocal MeSH
- Mitosis MeSH
- Evolution, Molecular MeSH
- Sex Chromosomes physiology MeSH
- Silene genetics physiology MeSH
- Publication type
- Journal Article MeSH
- Substance names
- Hydroxyurea MeSH