SARS-CoV-2 is a novel positive-sense single-stranded RNA virus from the Coronaviridae family (genus Betacoronavirus), which has been established as the cause of the COVID-19 pandemic. The genome of SARS-CoV-2 is one of the largest among known RNA viruses, comprising at least 26 known protein-coding loci. Studies thus far have outlined the coding capacity of the positive-sense strand of the SARS-CoV-2 genome, which can be used directly for protein translation. However, it has recently been shown that negative-sense viral RNA intermediates, which arise during genome replication of positive-sense viruses, can also code for proteins. No studies have yet explored the potential for negative-sense SARS-CoV-2 RNA intermediates to contain protein-coding loci. Thus, using sequence- and structure-based bioinformatics methodologies, we have investigated the presence and validity of putative negative-sense ORFs (nsORFs) in the SARS-CoV-2 genome. Nine nsORFs were found to contain strong eukaryotic translation initiation signals and high codon adaptability scores, and several of the nsORFs were predicted to interact with RNA-binding proteins. Evolutionary conservation analyses indicated that some of the nsORFs are deeply conserved among related coronaviruses. Three-dimensional protein modeling revealed the presence of higher-order folding among all putative SARS-CoV-2 nsORFs, and subsequent structural mimicry analyses suggest similarity of the nsORFs to DNA/RNA-binding proteins and proteins involved in immune signaling pathways. Altogether, these results suggest the potential existence of still undescribed SARS-CoV-2 proteins, which may play an important role in the viral life cycle and COVID-19 pathogenesis.
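The idea of searching the negative-sense intermediate for coding loci can be illustrated with a minimal sketch. It is not the pipeline used in the study: it assumes a DNA-alphabet genome string, the standard stop codons, and a plain ATG-to-stop definition of an ORF, and the function names (`reverse_complement`, `find_orfs`, `find_negative_sense_orfs`) are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def normalize(seq):
    """Upper-case the sequence and write RNA (U) in the DNA alphabet (T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    """Return the reverse complement, i.e. the negative-sense strand read 5'->3'."""
    return normalize(seq).translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Return (frame, start, end) for every ATG...stop ORF in the three frames.

    Coordinates are 0-based, end-exclusive, and relative to `seq`.
    `min_codons` counts the start and stop codons (an assumption of this sketch).
    """
    seq = normalize(seq)
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


def find_negative_sense_orfs(genome, min_codons=3):
    """Scan the negative-sense strand; coordinates refer to that strand."""
    return find_orfs(reverse_complement(genome), min_codons)
```

Calling `find_negative_sense_orfs` on a genome string returns coordinates on the reverse-complemented strand. Candidates found this way would still need the downstream checks the abstract describes (initiation context, codon adaptation, conservation, structure).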
- MeSH
- COVID-19 * genetics MeSH
- Genome, Viral MeSH
- Humans MeSH
- Pandemics MeSH
- RNA-Binding Proteins genetics MeSH
- RNA, Viral chemistry genetics MeSH
- SARS-CoV-2 * genetics MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Due to the rapid global spread of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), prevention and treatment options are urgently needed to control infection-related morbidity, mortality, and economic losses. Although the development of drugs and of inactivated and attenuated virus vaccines can require significant amounts of time and resources, DNA and RNA vaccines offer a quick, simple, and cheap alternative, even when produced on a large scale. The spike protein, which has been shown to be the most antigenic SARS-CoV-2 protein, has been widely selected as the target of choice for DNA/RNA vaccines. Vaccination campaigns have reported high vaccination rates and protection, but numerous unintended effects, ranging from muscle pain to death, have led to concerns about the safety of RNA/DNA vaccines. In parallel to these studies, several open reading frames (ORFs) have been found to overlap SARS-CoV-2 accessory genes, two of which, ORF2b and ORF-Sh, overlap the spike protein sequence. Thus, the presence of these, and potentially other, ORFs on SARS-CoV-2 DNA/RNA vaccines could lead to the translation of undesired proteins during vaccination. Herein, we discuss the translation of overlapping genes in connection with DNA/RNA vaccines. Two mRNA vaccine spike protein sequences, which have been made publicly available, were compared to the wild-type sequence in order to uncover possible differences in putative overlapping ORFs. Notably, the Moderna mRNA-1273 vaccine sequence is predicted to contain no frameshifted ORFs on the positive-sense strand, which highlights the utility of codon optimization in DNA/RNA vaccine design to remove undesired overlapping ORFs. Since little information is available on ORF2b or ORF-Sh, we use structural bioinformatics techniques to investigate the structure-function relationship of these proteins.
The presence of putative ORFs on DNA/RNA vaccine candidates implies that overlapping genes may contribute to the translation of smaller peptides, potentially leading to unintended clinical outcomes, and that the protein-coding potential of DNA/RNA vaccines should be rigorously examined prior to administration.
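How synonymous recoding can remove a frameshifted ORF may be easier to see on a toy example. The sketch below is an illustration under stated assumptions, not the comparison performed in the study: the two 18-nt sequences are invented, an ORF is taken to be a simple ATG-to-stop run in frame +1 or +2, and `translate` and `frameshifted_orfs` are hypothetical helper names.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}
# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(cds):
    """Translate a coding sequence in frame 0 ('*' marks a stop codon)."""
    cds = cds.upper().replace("U", "T")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def frameshifted_orfs(cds, min_codons=3):
    """Return (frame, start, end) for ATG...stop ORFs in frames +1 and +2.

    Coordinates are 0-based and end-exclusive; `min_codons` includes the
    start and stop codons (an assumption of this sketch).
    """
    cds = cds.upper().replace("U", "T")
    hits = []
    for frame in (1, 2):
        start = None
        for i in range(frame, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    hits.append((frame, start, i + 3))
                start = None
    return hits


# Invented toy sequences: both encode M-D-E-I-S-stop in frame 0.
TOY_WILD_TYPE = "ATGGATGAAATAAGCTAA"   # frame +1 contains ATG AAA TAA
TOY_RECODED = "ATGGACGAGATCTCCTGA"     # synonymous codons, no frame +1/+2 ORF
```

With these toy inputs, `translate` gives the same peptide for both sequences, while `frameshifted_orfs` reports one hit for `TOY_WILD_TYPE` and none for `TOY_RECODED`. Real vaccine sequences would need a more careful ORF definition (alternative start codons, minimum length, initiation context).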
- MeSH
- Vaccines, DNA adverse effects genetics MeSH
- Spike Glycoprotein, Coronavirus genetics MeSH
- Codon MeSH
- Nucleic Acid Conformation MeSH
- Humans MeSH
- RNA, Messenger MeSH
- mRNA Vaccines adverse effects genetics MeSH
- Open Reading Frames MeSH
- Genes, Overlapping * MeSH
- Protein Domains MeSH
- Protein Biosynthesis MeSH
- COVID-19 Vaccines adverse effects genetics MeSH
- Genes, Viral * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
SARS-CoV-2 is an intensively investigated virus from the order Nidovirales (Coronaviridae family) that causes COVID-19 in humans. Through enormous scientific effort, thousands of viral strains have been sequenced to date, thereby creating a strong foundation for in-depth bioinformatics studies of the SARS-CoV-2 genome. In this study, we inspected high-frequency mutations of SARS-CoV-2 and carried out systematic analyses of their overlap with inverted repeat (IR) loci and CpG islands. The main conclusion of our study is that SARS-CoV-2 hot-spot mutations are significantly enriched within both IR and CpG island loci. This points to a role for these loci in genomic instability and may help predict the further mutational drift of the SARS-CoV-2 genome. Moreover, CpG islands are strongly enriched upstream of viral ORFs and thus could play important roles in transcription and the viral life cycle. We hypothesize that hypermethylation of these loci will decrease the transcription of viral ORFs and could therefore limit the progression of the disease.
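The overlap analysis described above can be sketched for the CpG part as follows. This is a rough illustration, not the study's method: it assumes fixed, non-adaptive windows, thresholds in the spirit of the commonly cited Gardiner-Garden and Frommer criteria (GC fraction above 0.5, observed/expected CpG above 0.6), and hypothetical function names; the abstract does not state which island definition or statistics were used.

```python
def gc_fraction(window):
    """Fraction of G and C bases in the window."""
    return (window.count("G") + window.count("C")) / len(window)


def cpg_obs_exp(window):
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G); 0.0 if undefined."""
    c, g = window.count("C"), window.count("G")
    if c == 0 or g == 0:
        return 0.0
    return window.count("CG") * len(window) / (c * g)


def cpg_windows(seq, window=200, step=1, min_gc=0.5, min_oe=0.6):
    """Return (start, end) for windows exceeding both thresholds.

    Coordinates are 0-based and end-exclusive. Overlapping windows are
    reported individually; merging them into islands is left out here.
    """
    seq = seq.upper().replace("U", "T")
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        if gc_fraction(w) > min_gc and cpg_obs_exp(w) > min_oe:
            hits.append((start, start + window))
    return hits


def fraction_in_regions(positions, regions):
    """Fraction of 0-based mutation positions falling inside any region."""
    if not positions:
        return 0.0
    inside = sum(any(s <= p < e for s, e in regions) for p in positions)
    return inside / len(positions)
```

Comparing `fraction_in_regions` for observed hot-spot positions against the share of the genome covered by the regions gives a first impression of enrichment. A formal significance test would be needed to support a claim like the one in the abstract.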
- MeSH
- COVID-19 virology MeSH
- CpG Islands * MeSH
- Genome, Viral MeSH
- Humans MeSH
- DNA Methylation MeSH
- Mutation * MeSH
- SARS-CoV-2 genetics MeSH
- Protein Binding MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Non-canonical nucleic acid structures play important roles in the regulation of molecular processes. Considering the importance of the ongoing coronavirus crisis, we decided to evaluate the genomes of all coronaviruses sequenced to date (stated more broadly, the order Nidovirales) to determine whether they contain non-canonical nucleic acid structures. We found abundant evidence of putative G-quadruplex sites and even more inverted repeat (IR) loci, which are in fact ubiquitous along the whole genomic sequence and indicate a possible mechanism for genomic RNA packaging. The most notable enrichment of IRs was found inside the 5'UTR for IRs of 12 or more nucleotides, and the most notable enrichment of putative quadruplex sites (PQSs) was located before the 3'UTR, inside the 5'UTR, and before mRNA. This indicates crucial regulatory roles for both IRs and PQSs. Moreover, we found multiple G-quadruplex binding motifs in human proteins with the potential to bind SARS-CoV-2 RNA. Non-canonical nucleic acid structures in Nidovirales and in the novel SARS-CoV-2 are therefore promising druggable structures that can be targeted and utilized in the future.
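A putative quadruplex site search can be approximated with a simple pattern match. The sketch below uses the classic motif of four G-tracts (three or more Gs each) separated by loops of 1 to 7 nucleotides. This is a stand-in technique chosen for brevity, since the abstract does not say which predictor was used; the function name `find_pqs` and the loop limits are assumptions.

```python
import re

# Four runs of >=3 G (or C, for the complementary strand) separated by
# loops of 1-7 nucleotides. The loop length limits are an assumption.
_G_PATTERN = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")
_C_PATTERN = re.compile(r"(?:C{3,}[ACGT]{1,7}){3}C{3,}")


def find_pqs(seq):
    """Return (start, end, strand) for non-overlapping motif matches.

    '+' means the G-rich motif is on the given strand; '-' means a C-rich
    motif on the given strand, i.e. a G-rich motif on its complement.
    Coordinates are 0-based, end-exclusive, on the given strand.
    """
    seq = seq.upper().replace("U", "T")
    hits = [(m.start(), m.end(), "+") for m in _G_PATTERN.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in _C_PATTERN.finditer(seq)]
    return sorted(hits)
```

Because `re.finditer` reports non-overlapping matches only, nested or overlapping candidates are merged into a single hit or missed. A scoring approach would be needed to rank candidates by predicted stability.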
- Publication type
- Journal Article MeSH