Satellite DNAs (satDNA) are tandemly arrayed repeated sequences that are widespread in eukaryotic genomes and play important roles in genome evolution and function, which makes their analysis vital. Here, we describe the isolation of a novel satellite DNA family (PMSat) from the rodent Peromyscus eremicus (Cricetidae, Rodentia), which is located in pericentromeric regions and exhibits a typical satellite DNA genome organization. Orthologous PMSat sequences were isolated and characterized from three species belonging to Cricetidae: Cricetus cricetus, Phodopus sungorus and Microtus arvalis. In these species, PMSat is highly conserved, with no fixed species-specific mutations. Strikingly, different numbers of copies of this sequence were found among the species, suggesting evolution by copy number fluctuation. Repeat units of PMSat were also found in the Peromyscus maniculatus bairdii BioProject, but our results suggest that these repeat units are from genome regions outside the pericentromere. The remarkably high evolutionary sequence conservation along with the preservation of a small number of copies of this sequence in the analyzed genomes may suggest functional significance but a different sequence nature/organization. Our data highlight that repeats are difficult to analyze due to the limited tools available to dissect genomes and the fact that assemblies do not cover regions of constitutive heterochromatin.
- Keywords
- Copy number, Laser microdissection, Rodentia, Satellite DNA
- MeSH
- Species Specificity MeSH
- Phylogeny MeSH
- Physical Chromosome Mapping MeSH
- Genome * MeSH
- Gene Dosage * MeSH
- Peromyscus genetics MeSH
- Evolution, Molecular * MeSH
- Molecular Sequence Data MeSH
- Computer Simulation MeSH
- Restriction Mapping MeSH
- DNA, Satellite genetics isolation & purification MeSH
- Base Sequence MeSH
- Sequence Analysis, DNA MeSH
- Sequence Alignment MeSH
- Blotting, Southern MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- DNA, Satellite MeSH
Protein evolution and protein engineering techniques are of great interest in basic science and industrial applications such as pharmacology, medicine, or biotechnology. Ancestral sequence reconstruction (ASR) is a powerful technique for probing evolutionary relationships and engineering robust proteins with good thermostability and broad substrate specificity. The following protocol describes the setting up and execution of an automated FireProtASR workflow using a dedicated web site. The service allows for inference of ancestral proteins automatically, from a single protein sequence. Once a protein sequence is submitted, the server will build a dataset of homologous sequences, perform a multiple sequence alignment (MSA), build a phylogenetic tree, and reconstruct ancestral nodes. The protocol is also highly flexible and allows for multiple forms of input, advanced settings, and the ability to start jobs from: (i) a single sequence, (ii) a set of homologous sequences, (iii) an MSA, and (iv) a phylogenetic tree. This approach automates all necessary steps and offers a way for novices with limited exposure to ASR techniques to improve the properties of a protein of interest. The technique can even be used to introduce catalytic promiscuity into an enzyme. A web server for accessing the fully automated workflow is freely accessible at https://loschmidt.chemi.muni.cz/fireprotasr/. © 2021 Wiley Periodicals LLC. Basic Protocol: ASR using the Web Server FireProtASR.
- Keywords
- ancestral sequence reconstruction, automation, protein engineering, protein evolution, thermostability
- MeSH
- Phylogeny MeSH
- Evolution, Molecular * MeSH
- Proteins * genetics MeSH
- Amino Acid Sequence MeSH
- Sequence Alignment MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- Proteins * MeSH
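The final step of the workflow in the FireProtASR abstract above, reconstructing ancestral nodes from an alignment and a tree, can be illustrated with a minimal sketch. The abstract does not state which inference method the server uses, so the example below swaps in Fitch small parsimony, a much simpler technique than the likelihood-based methods usually used for ASR. The tree shape, the leaf names and the residues in the alignment column are all hypothetical illustration data.

```python
# Fitch small-parsimony reconstruction for ONE alignment column.
# Assumption: this is a teaching sketch, not the method used by FireProtASR.
# A tree is a nested tuple; a leaf is a string naming a sequence.

def fitch_sets(node, leaf_states, sets):
    """Bottom-up pass: compute the candidate residue set for every node."""
    if isinstance(node, str):
        sets[node] = {leaf_states[node]}
    else:
        left, right = node
        a = fitch_sets(left, leaf_states, sets)
        b = fitch_sets(right, leaf_states, sets)
        # Intersection if the children agree on something, otherwise union.
        sets[node] = (a & b) or (a | b)
    return sets[node]

def fitch_assign(node, sets, parent_state, result):
    """Top-down pass: choose one residue per internal node."""
    candidates = sets[node]
    # Keep the parent's residue when allowed; otherwise break ties alphabetically.
    state = parent_state if parent_state in candidates else min(candidates)
    if not isinstance(node, str):
        result[node] = state
        for child in node:
            fitch_assign(child, sets, state, result)

def reconstruct_column(tree, leaf_states):
    """Return a dict mapping each internal node to its inferred residue."""
    sets = {}
    fitch_sets(tree, leaf_states, sets)
    result = {}
    fitch_assign(tree, sets, None, result)
    return result

# Hypothetical four-sequence tree and one alignment column.
tree = (("A", "B"), ("C", "D"))
column = {"A": "L", "B": "L", "C": "I", "D": "L"}
ancestors = reconstruct_column(tree, column)
root_residue = ancestors[tree]
print(root_residue)
```

Running the function once per alignment column and joining the root residues would give a full ancestral sequence under the same assumptions. Parsimony ignores branch lengths and substitution rates, which is the main reason real ASR pipelines prefer probabilistic models.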
Linear chromosomes of eukaryotic organisms invariably possess centromeres and telomeres to ensure proper chromosome segregation during nuclear divisions and to protect the chromosome ends from deterioration and fusion, respectively. While centromeric sequences may differ between species, with arrays of tandemly repeated sequences and retrotransposons being the most abundant sequence types in plant centromeres, telomeric sequences are usually highly conserved among plants and other organisms. The genome size of the carnivorous genus Genlisea (Lentibulariaceae) is highly variable. Here we study evolutionary sequence plasticity of these chromosomal domains at an intrageneric level. We show that Genlisea nigrocaulis (1C = 86 Mbp; 2n = 40) and G. hispidula (1C = 1550 Mbp; 2n = 40) differ as to their DNA composition at centromeres and telomeres. G. nigrocaulis and its close relative G. pygmaea revealed mainly 161 bp tandem repeats, while G. hispidula and its close relative G. subglabra displayed a combination of four retroelements at centromeric positions. G. nigrocaulis and G. pygmaea chromosome ends are characterized by the Arabidopsis-type telomeric repeats (TTTAGGG); G. hispidula and G. subglabra instead revealed two intermingled sequence variants (TTCAGG and TTTCAGG). These differences in centromeric and, surprisingly, also in telomeric DNA sequences, uncovered between groups with on average a > 9-fold genome size difference, emphasize the fast genome evolution within this genus. Such intrageneric evolutionary alteration of telomeric repeats with cytosine in the guanine-rich strand, not yet known for plants, might impact the epigenetic telomere chromatin modification.
- Keywords
- G. hispidula, Genlisea nigrocaulis, Lentibulariaceae, centromeric retrotransposons, centromeric tandem repeat, genome evolution, plant telomeric repeat variants, telomerase
- MeSH
- Biological Evolution * MeSH
- Time Factors MeSH
- Centromere genetics MeSH
- Chromosomes, Plant genetics MeSH
- Species Specificity MeSH
- Genetic Variation MeSH
- Genome, Plant genetics physiology MeSH
- Magnoliopsida genetics physiology MeSH
- Molecular Sequence Data MeSH
- Base Sequence MeSH
- Telomere genetics MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
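The telomeric repeat variants named in the Genlisea abstract above (TTTAGGG, TTCAGG and TTTCAGG) can be tallied in a sequence with a short scan. The sketch below is only an illustration: the input strings are made-up toy arrays, not Genlisea data, and the greedy matching rule is an assumption chosen because TTCAGG is a substring of TTTCAGG, so a naive substring count would count some units twice.

```python
from collections import Counter

# Motifs taken from the abstract; TTTCAGG is listed before TTCAGG so the
# longer variant is tried first at each position.
MOTIFS = ("TTTAGGG", "TTTCAGG", "TTCAGG")

def count_telomere_units(seq):
    """Greedy left-to-right scan counting non-overlapping motif units."""
    seq = seq.upper()
    counts = Counter()
    i = 0
    while i < len(seq):
        for motif in MOTIFS:
            if seq.startswith(motif, i):
                counts[motif] += 1
                i += len(motif)
                break
        else:
            i += 1  # no motif starts here; advance by one base
    return counts

# Toy arrays (hypothetical): an Arabidopsis-type array and an intermingled one.
arabidopsis_like = "TTTAGGG" * 4
intermingled = "TTCAGG" + "TTTCAGG" + "TTCAGG" + "TTTCAGG" + "TTTCAGG"

print(count_telomere_units(arabidopsis_like))
print(count_telomere_units(intermingled))
```

On real reads the scan would also need to handle the reverse-complement strand and sequencing errors, which this sketch deliberately leaves out.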
The nucleic acid sequence of the preproinsulin cDNA of carp (Cyprinus carpio), cloned in the PstI site of pBR322 (Liebscher et al. 1980), has been determined. The sequenced insert of 439 bp includes the complete coding information for carp preproinsulin (108 amino acids), 10 nucleotides of the 5'- and 105 nucleotides of the 3'-nontranslated regions. The nucleotide sequence confirms the previously established amino acid sequence of carp insulin (Makower et al. 1982) and determines those of the signal peptide (21 amino acids) and C peptide (35 amino acids). The observed shortness of the signal peptide of carp preproinsulin and the N-terminal addition of 2 amino acids to the carp insulin B chain suggest that the cleavage site of the signal peptidase has moved. Calculations based on the comparison of known preproinsulin cDNA sequences showed that the evolutionary distance between freshwater and saltwater teleosts is not smaller than that between man and chicken.
- MeSH
- Biological Evolution * MeSH
- C-Peptide MeSH
- Cyprinidae genetics MeSH
- DNA MeSH
- Insulin genetics MeSH
- Carps genetics MeSH
- Cloning, Molecular MeSH
- Rats MeSH
- Chickens MeSH
- Humans MeSH
- Peptides MeSH
- Proinsulin genetics MeSH
- Protein Precursors genetics MeSH
- Protein Sorting Signals MeSH
- Fishes MeSH
- Amino Acid Sequence MeSH
- Base Sequence MeSH
- Animals MeSH
- Check Tag
- Rats MeSH
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Comparative Study MeSH
- Names of Substances
- C-Peptide MeSH
- DNA MeSH
- Insulin MeSH
- Peptides MeSH
- preproinsulin MeSH Browser
- Proinsulin MeSH
- Protein Precursors MeSH
- Protein Sorting Signals MeSH
The quest to predict and understand protein evolution has been hindered by limitations on both the theoretical and the experimental fronts. Most existing theoretical models of evolution are descriptive, rather than predictive, leaving the final modifications in the hands of researchers. Existing experimental techniques to help probe the evolutionary sequence space of proteins, such as directed evolution, are resource-intensive and require specialised skills. We present the Successor Sequence Predictor (SSP) as an innovative solution. Successor Sequence Predictor is an in silico protein design method that mimics laboratory-based protein evolution by reconstructing a protein's evolutionary history and suggesting future amino acid substitutions based on trends observed in that history through carefully selected physicochemical descriptors. This approach enhances specialised proteins by predicting mutations that improve desired properties, such as thermostability, activity, and solubility. Successor Sequence Predictor can thus be used as a general protein engineering tool to develop practically useful proteins. The code of the Successor Sequence Predictor is provided at https://github.com/loschmidt/successor-sequence-predictor, and the design of mutations will also be possible via an easy-to-use web server https://loschmidt.chemi.muni.cz/fireprotasr/. SCIENTIFIC CONTRIBUTION: The Successor Sequence Predictor advances protein evolution prediction at the amino acid level by integrating ancestral sequence reconstruction with a novel in silico approach that models evolutionary trends through selected physicochemical descriptors. Unlike prior work, SSP can forecast future amino acid substitutions that enhance protein properties such as thermostability, activity, and solubility. This method reduces reliance on resource-intensive directed evolution techniques while providing a generalizable, predictive tool for protein engineering.
- Keywords
- Activity, Adaptation, Evolution, Evolutionary trajectory, Protein design, Solubility, Thermostability
- Publication type
- Journal Article MeSH
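The idea in the Successor Sequence Predictor abstract above, suggesting a future substitution from a trend in a physicochemical descriptor along a reconstructed lineage, can be made concrete with a toy example. This is not the published SSP algorithm: the choice of Kyte-Doolittle hydropathy as the single descriptor, the least-squares line, the nearest-value selection rule and the three-residue lineage are all assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values, used here as one example descriptor.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def linear_fit(ys):
    """Least-squares slope and intercept for points (0, y0), (1, y1), ..."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict_successor(lineage):
    """Extrapolate the descriptor one step ahead and pick the closest residue.

    `lineage` lists the residue at one position from the oldest ancestor to
    the present-day sequence and must contain at least two entries.
    """
    values = [KD[aa] for aa in lineage]
    slope, intercept = linear_fit(values)
    target = slope * len(values) + intercept
    best = min(KD, key=lambda aa: abs(KD[aa] - target))
    return best, target

# Hypothetical lineage at one position: hydropathy rises over time.
lineage = ["S", "A", "V"]
predicted, target = predict_successor(lineage)
print(predicted, round(target, 2))
```

A practical predictor would combine several descriptors, weight ancestral nodes by reconstruction confidence, and check that a trend is statistically supported before proposing a mutation; none of that is attempted here.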
Karyotypic changes in chromosome number and structure are drivers in the divergent evolution of diverse plant species and lineages. This study aimed to reveal the origins of the unique karyotype (2n = 12) and phylogenetic relationships of the genus Megadenia (Brassicaceae). A high-quality chromosome-scale genome was assembled for Megadenia pygmaea using Nanopore long reads and high-throughput chromosome conformation capture (Hi-C). The assembled genome is 215.2 Mb and is anchored on six pseudochromosomes. We annotated a total of 25,607 high-confidence protein-coding genes and corroborated the phylogenetic affinity of Megadenia with the Brassicaceae expanded lineage II, containing numerous agricultural crops. We dated the divergence of Megadenia from its closest relatives to 27.04 (19.11-36.60) million years ago. A reconstruction of the chromosomal composition of the species was performed based on the de novo assembled genome and comparative chromosome painting analysis. The karyotype structure of M. pygmaea is very similar to the previously inferred proto-Calepineae karyotype (PCK; n = 7) of the lineage II. However, an end-to-end translocation between two ancestral chromosomes reduced the chromosome number from n = 7 to n = 6 in Megadenia. Our reference genome provides fundamental information for karyotypic evolution and evolutionary study of this genus.
The rightmost 2016 bp of the Bacillus subtilis phage phi 15 genome were sequenced. The nucleotide sequence was compared with the homologous regions of the related phages PZA and phi 29. There are six open reading frames (ORFs) in this region of the phi 15 genome; all of them are present in the PZA and phi 29 genomes. One of the ORFs was assigned to gene 17, which is involved in the replication of the phage DNA. Gene 17 has undergone reorganization during the evolution of this phage family. Comparison of the nucleotide sequence of its mRNA-like strand in phi 15, PZA and phi 29 showed that deletions in its central and 3'-end-proximal parts are tolerated and do not interfere with the gene 17 product function. It seems that the only portion of gene 17 that has to be conserved to encode the functional product is its 5'-end-proximal part.
- MeSH
- Bacillus subtilis genetics MeSH
- Bacteriophages genetics MeSH
- Biological Evolution * MeSH
- DNA, Viral MeSH
- Gene Rearrangement * MeSH
- Molecular Sequence Data MeSH
- Restriction Mapping MeSH
- Amino Acid Sequence MeSH
- Base Sequence MeSH
- Sequence Homology, Nucleic Acid MeSH
- Genes, Viral * MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- DNA, Viral MeSH
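Identifying open reading frames in a sequenced region, as reported for the 2016 bp phi 15 fragment in the abstract above, can be sketched as a six-frame scan. The abstract does not describe how its ORFs were called, so the rules below are assumptions: only ATG counts as a start codon, the three standard stop codons end an ORF, and a minimum length filter is applied. The test sequence is a made-up toy string, not phage data.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def orfs_in_strand(seq, min_codons):
    """Yield (start, end) of ATG..stop ORFs in the three frames of one strand.

    `end` is exclusive and includes the stop codon; the codon count used for
    filtering excludes the stop codon.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None

def find_orfs(seq, min_codons=30):
    """Scan both strands; '-' coordinates refer to the reverse complement."""
    seq = seq.upper()
    found = [("+", s, e) for s, e in orfs_in_strand(seq, min_codons)]
    rc = reverse_complement(seq)
    found += [("-", s, e) for s, e in orfs_in_strand(rc, min_codons)]
    return found

# Toy sequence (hypothetical): one ORF of six codons plus a stop codon.
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
orfs = find_orfs(toy, min_codons=3)
print(orfs)
```

Bacterial and phage genes can also start with GTG or TTG, so a scan restricted to ATG may miss genuine ORFs; extending the start codon set is a one-line change under the same structure.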
In recent years, considerable progress has been made in topologically and functionally characterizing integral outer membrane proteins (OMPs) of Treponema pallidum subspecies pallidum, the syphilis spirochete, and identifying its surface-exposed β-barrel domains. Extracellular loops in OMPs of Gram-negative bacteria are known to be highly variable. We examined the sequence diversity of β-barrel-encoding regions of tprC, tprD, and bamA in 31 specimens from Cali, Colombia; San Francisco, California; and the Czech Republic and compared them to allelic variants in the 41 reference genomes in the NCBI database. To establish a phylogenetic framework, we used T. pallidum 0548 (tp0548) genotyping and tp0558 sequences to assign strains to the Nichols or SS14 clades. We found that (i) β-barrels in clinical strains could be grouped according to allelic variants in T. pallidum subsp. pallidum reference genomes; (ii) for all three OMP loci, clinical strains within the Nichols or SS14 clades often harbored β-barrel variants that differed from the Nichols and SS14 reference strains; and (iii) OMP variable regions often reside in predicted extracellular loops containing B-cell epitopes. On the basis of structural models, nonconservative amino acid substitutions in predicted transmembrane β-strands of T. pallidum repeat C (TprC) and TprD2 could give rise to functional differences in their porin channels. OMP profiles of some clinical strains were mosaics of different reference strains and did not correlate with results from enhanced molecular typing. Our observations suggest that human host selection pressures drive T. pallidum subsp. pallidum OMP diversity and that genetic exchange contributes to the evolutionary biology of T. pallidum subsp. pallidum. They also set the stage for topology-based analysis of antibody responses to OMPs and help frame strategies for syphilis vaccine development.
IMPORTANCE: Despite recent progress characterizing outer membrane proteins (OMPs) of Treponema pallidum, little is known about how their surface-exposed, β-barrel-forming domains vary among strains circulating within high-risk populations. In this study, sequences for the β-barrel-encoding regions of three OMP loci, tprC, tprD, and bamA, in T. pallidum subsp. pallidum isolates from a large number of patient specimens from geographically disparate sites were examined. Structural models predict that sequence variation within β-barrel domains occurs predominantly within predicted extracellular loops. Amino acid substitutions in predicted transmembrane strands that could potentially affect porin channel function were also noted. Our findings suggest that selection pressures exerted within human populations drive T. pallidum subsp. pallidum OMP diversity and that recombination at OMP loci contributes to the evolutionary biology of syphilis spirochetes. These results also set the stage for topology-based analysis of antibody responses that promote clearance of T. pallidum subsp. pallidum and frame strategies for vaccine development based upon conserved OMP extracellular loops.
- Keywords
- Treponema pallidum, molecular subtyping, outer membrane proteins, spirochetes, syphilis
- MeSH
- Phylogeny MeSH
- Genetic Variation MeSH
- Humans MeSH
- Evolution, Molecular * MeSH
- Molecular Sequence Data MeSH
- Protein Domains MeSH
- Bacterial Outer Membrane Proteins chemistry genetics metabolism MeSH
- Amino Acid Sequence MeSH
- Base Sequence MeSH
- Sequence Alignment MeSH
- Spirochaetales classification genetics growth & development isolation & purification MeSH
- Syphilis microbiology MeSH
- Treponema pallidum classification genetics growth & development isolation & purification MeSH
- Check Tag
- Humans MeSH
- Male MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
- Names of Substances
- Bacterial Outer Membrane Proteins MeSH
All grass species evolved from an ancestor that underwent a whole-genome duplication (WGD) approximately 70 million years ago. Interestingly, the short arms of rice chromosomes 11 and 12 (and independently their homologs in sorghum) were found to be much more similar to each other than other homeologous regions within the duplicated genome. Based on detailed analysis of rice chromosomes 11 and 12 and their homologs in seven grass species, we propose a mechanism that explains the apparently 'younger' age of the duplication in this region of the genome, assuming a small number of reciprocal translocations at the chromosome termini. In each case the translocations were followed by unbalanced transmission and subsequent lineage sorting of the involved chromosomes to offspring. Molecular dating of these translocation events also allowed us to date major chromosome 'fusions' in the evolutionary lineages that led to Brachypodium and Triticeae. Furthermore, we provide evidence that rice is exceptional regarding the evolution of chromosomes 11 and 12, inasmuch as in other species the process of sequence exchange between homeologous chromosomes ceased much earlier than in rice. We presume that random events rather than selective forces are responsible for the observed high similarity between the short arm ends of rice chromosomes 11 and 12.
- Keywords
- genome evolution, grass ancestor, inter-homeolog recombination, reciprocal translocation, whole-genome duplication
- MeSH
- Time Factors MeSH
- Chromosomes, Plant genetics MeSH
- Species Specificity MeSH
- Gene Duplication MeSH
- Phylogeny MeSH
- Genetic Variation MeSH
- Genome, Plant genetics MeSH
- Poaceae classification genetics MeSH
- Evolution, Molecular * MeSH
- Molecular Sequence Data MeSH
- Recombination, Genetic MeSH
- Base Sequence MeSH
- Sequence Analysis, DNA MeSH
- Sequence Homology, Nucleic Acid MeSH
- Selection, Genetic MeSH
- Synteny MeSH
- Translocation, Genetic MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, U.S. Gov't, Non-P.H.S. MeSH
B chromosomes are enigmatic elements in thousands of plant and animal genomes that persist in populations despite being nonessential. They circumvent the laws of Mendelian inheritance but the molecular mechanisms underlying this behavior remain unknown. Here we present the sequence, annotation, and analysis of the maize B chromosome providing insight into its drive mechanism. The sequence assembly reveals detailed locations of the elements involved with the cis and trans functions of its drive mechanism, consisting of nondisjunction at the second pollen mitosis and preferential fertilization of the egg by the B-containing sperm. We identified 758 protein-coding genes in 125.9 Mb of B chromosome sequence, of which at least 88 are expressed. Our results demonstrate that transposable elements in the B chromosome are shared with the standard A chromosome set but multiple lines of evidence fail to detect a syntenic genic region in the A chromosomes, suggesting a distant origin. The current gene content is a result of continuous transfer from the A chromosomal complement over an extended evolutionary time with subsequent degradation but with selection for maintenance of this nonvital chromosome.
- Keywords
- B chromosome, genetic drive, nondisjunction, preferential fertilization
- MeSH
- Chromosomes, Plant genetics MeSH
- Zea mays genetics MeSH
- Meiosis genetics MeSH
- Mitosis genetics MeSH
- Evolution, Molecular * MeSH
- Pollen genetics MeSH
- Pregnancy Proteins genetics MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, U.S. Gov't, Non-P.H.S. MeSH
- Names of Substances
- Pregnancy Proteins MeSH