Protein structural families are groups of homologous proteins defined by the organization of secondary structure elements (SSEs). Many families now contain vast numbers of structures, and SSEs can help users orient themselves within them. Communities around specific protein families have even developed specialized SSE annotations that always assign the same name to equivalent SSEs in homologous proteins. A detailed analysis of the groups of equivalent SSEs provides an overview of the studied family and enriches the analysis of any particular protein at hand. We developed a workflow for analyzing the secondary structure anatomy of a protein family and applied it to the model family of cytochromes P450 (CYPs), a family of important biotransformation enzymes with an SSE annotation used community-wide. We report the occurrence, typical length and amino acid sequence of the equivalent SSE groups, the conservation or variability of these properties, and their relationship to the substrate recognition sites. We also suggest a generic residue numbering scheme for the CYP family. Comparing the bacterial and eukaryotic parts of the family highlights significant differences and reveals a well-known anomalous group of bacterial CYPs with some typically eukaryotic features. Our workflow for SSE annotation of CYP and other families is freely available at https://sestra.ncbr.muni.cz.
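The per-group statistics mentioned in this abstract (occurrence and typical length of equivalent SSEs) could be computed along the following lines. This is a minimal sketch, not the authors' workflow: the input layout (one dict per structure, mapping an SSE label to an inclusive start/end residue index pair), the function name `sse_group_stats`, and the use of the median as the "typical length" are all assumptions made for illustration.

```python
from statistics import median


def sse_group_stats(annotations):
    """Summarize equivalent SSE groups across a family.

    `annotations` is a hypothetical input: a list with one dict per
    structure, mapping an SSE label to an inclusive (start, end)
    residue index pair.
    Returns {label: (occurrence_fraction, median_length)}.
    """
    lengths = {}
    for structure in annotations:
        for label, (start, end) in structure.items():
            # Inclusive residue range, hence the +1.
            lengths.setdefault(label, []).append(end - start + 1)
    n_structures = len(annotations)
    return {
        label: (len(values) / n_structures, median(values))
        for label, values in lengths.items()
    }


# Toy data: two structures, labels "A" and "B" are made up.
family = [{"A": (10, 19), "B": (30, 34)}, {"A": (12, 23)}]
stats = sse_group_stats(family)
```

With this toy input, group "A" occurs in both structures and group "B" in only one, so the occurrence fraction separates conserved SSEs from optional ones.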
CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies these into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data, such as predicted sequence domains and functionally coherent sequence subsets (Functional Families or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, with an addition of 65,351 fully classified domain structures (+15%), providing 500,238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH, including SCOP, InterPro, Aquaria and 2DProt.
- MeSH
- Molecular Sequence Annotation MeSH
- COVID-19 / epidemiology / prevention & control / virology MeSH
- Databases, Protein / statistics & numerical data MeSH
- Epidemics MeSH
- Internet MeSH
- Humans MeSH
- Protein Domains * MeSH
- Proteins / chemistry / genetics / metabolism MeSH
- SARS-CoV-2 / genetics / metabolism / physiology MeSH
- Amino Acid Sequence MeSH
- Sequence Analysis, Protein / methods MeSH
- Sequence Homology, Amino Acid MeSH
- Viral Proteins / chemistry / genetics / metabolism MeSH
- Computational Biology / methods / statistics & numerical data MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
The Protein Data Bank in Europe (PDBe, pdbe.org) is actively engaged in the deposition, annotation, remediation, enrichment and dissemination of macromolecular structure data. This paper describes new developments and improvements at PDBe addressing three challenging areas: data enrichment, data dissemination and functional reusability. New features of the PDBe Web site are discussed, including a context-dependent menu providing links to raw experimental data and improved presentation of structures solved by hybrid methods. The paper also summarizes the features of the LiteMol suite, a set of services enabling fast and interactive 3D visualization of structures, with associated experimental maps, annotations and quality assessment information. We introduce a library of Web components that can be easily reused to port data and functionality available at PDBe to other services. We also introduce updates to the SIFTS resource, which maps PDB data to other bioinformatics resources, and to the PDBe REST API.
- MeSH
- Molecular Sequence Annotation MeSH
- Databases as Topic MeSH
- Databases, Protein * MeSH
- Internet MeSH
- Protein Conformation, alpha-Helical MeSH
- Protein Conformation, beta-Strand MeSH
- Humans MeSH
- Models, Molecular MeSH
- Computer Graphics MeSH
- Proteins / chemistry / genetics / metabolism MeSH
- Amino Acid Sequence MeSH
- Sequence Analysis, Protein / methods MeSH
- Information Dissemination MeSH
- User-Computer Interface * MeSH
- Computational Biology / methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Geographic names
- Europe MeSH
Summary: CrocoBLAST is a tool for dramatically speeding up BLAST+ execution on any computer. Alignments that would take days or weeks with NCBI BLAST+ can be run overnight with CrocoBLAST. Additionally, CrocoBLAST provides features critical for NGS data analysis, including: results identical to those of BLAST+; compatibility with any BLAST+ version; real-time information regarding calculation progress and remaining run time; access to partial alignment results; queueing, pausing, and resuming BLAST+ calculations without information loss. Availability and implementation: CrocoBLAST is freely available online, with ample documentation (webchem.ncbr.muni.cz/Platform/App/CrocoBLAST). No installation or user registration is required. CrocoBLAST is implemented in C, while the graphical user interface is implemented in Java. CrocoBLAST is supported under Linux and Windows, and can be run under Mac OS X in a Linux virtual machine. Contact: jkoca@ceitec.cz. Supplementary information: Supplementary data are available at Bioinformatics online.
We propose a sequence-based approach that predicts microorganisms as possible sources of autoantigen-related molecular mimicry in Idiopathic Pulmonary Arterial Hypertension (IPAH) and in the related hypertension that mostly accompanies autoimmune diseases and AIDS (APAH). This approach (SPECIES_VALENCE) processes the database occurrences of short linear autoepitope-related Dense Quasi-Pattern Sequences (DQPA), generated on the basis of identities of important autoantigenic sequences. The corresponding enumeration comprises two types of statistical evaluation performed in each of eight proposed models. Based on this enumeration, we selected nine microorganisms; re-evaluation of the resulting scores narrowed these down to Pseudomonas aeruginosa, Aspergillus fumigatus and the two co-infecting herpesviruses (Epstein-Barr virus and cytomegalovirus) as the most favourable candidates. The results are discussed in terms of (a) the validity of increased DQPA occurrence in functionally correlated sequences, (b) the possible mechanisms leading to autoantibody response, (c) selected additional pathogenic effects of the predicted microorganisms and (d) possible effects of cross-reactivities and immune tolerance.
- MeSH
- Species Specificity MeSH
- Epitopes / chemistry / genetics / immunology MeSH
- Familial Primary Pulmonary Hypertension / genetics / immunology / microbiology MeSH
- Conserved Sequence MeSH
- Humans MeSH
- Molecular Mimicry / genetics / immunology MeSH
- Molecular Sequence Data MeSH
- Amino Acid Sequence MeSH
- Sequence Analysis, Protein / methods MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
UNLABELLED: The most widely used pipeline for protein identification currently consists of comparing MS/MS spectra to reference databases. Search algorithms compare the acquired spectra to an in silico digestion of a sequence database to find exact matches. In this context, the database is of paramount importance and largely determines the number of identifications and their quality, which is especially relevant for non-model plant species. Using a single Viridiplantae database (NCBI, UniProt) and TAIR is not the best choice for non-model species, since they are underrepresented in databases, resulting in poor identification rates. We demonstrate how the rate and quality of identifications can be improved in two orphan species, Quercus ilex and Pinus radiata, by using SEQUEST with a combination of public databases (Viridiplantae NCBI, UniProt) and a custom-built specific database, which contained 593,294 and 455,096 peptide sequences (Quercus and Pinus, respectively). These databases were built by gathering and processing (trimming, contiging, 6-frame translation) publicly available RNA sequences, mostly ESTs and NGS reads. A total of 149 and 1533 proteins were identified from Quercus seeds and Pinus needles, representing a 3.1- or 1.5-fold increase in the number of protein identifications and scores compared to the use of a single database. Since this approach greatly improves the identification rate and is not significantly more complicated or time-consuming than other approaches, we recommend its routine use when working with non-model species. BIOLOGICAL SIGNIFICANCE: In this work we demonstrate how the construction of a custom database (DB) gathering all available RNA sequences, used in combination with Viridiplantae public DBs (NCBI, UniProt), significantly improves protein identification when working with non-model species.
Protein identification rate and quality are higher than those obtained with routine procedures based on only one database (commonly Viridiplantae from NCBI), as we demonstrated by analyzing Quercus seeds and Pinus needles. The proposed approach, based on building a custom database, is not difficult or time-consuming, so we recommend its routine use when working with non-model species. This article is part of a Special Issue entitled: Proteomics of non-model organisms.
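The 6-frame translation step named in this abstract can be illustrated with a short sketch. It assumes the standard genetic code and an input already trimmed and assembled; the function names are hypothetical, and nothing here reproduces the authors' actual pipeline, which is not described in detail.

```python
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons; ambiguous codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return translations of the 3 forward and 3 reverse-complement frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    reverse = seq.translate(COMPLEMENT)[::-1]
    return [
        translate(strand[offset:])
        for strand in (seq, reverse)
        for offset in range(3)
    ]


frames = six_frame_translation("ATGGCC")
# frames == ["MA", "W", "G", "GH", "A", "P"]
```

In a real database build, each frame would typically be split at stop codons and short fragments discarded before the peptides are added to the search database; those thresholds are a design choice not specified in the abstract.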
- MeSH
- Pinus / genetics / metabolism MeSH
- Databases, Protein * MeSH
- Quercus / genetics / metabolism MeSH
- Proteome / genetics / metabolism MeSH
- Proteomics / methods MeSH
- Plant Proteins / genetics / metabolism MeSH
- Amino Acid Sequence MeSH
- Base Sequence MeSH
- Sequence Analysis, Protein / methods MeSH
- Sequence Analysis, RNA / methods MeSH
- Seeds / genetics / metabolism MeSH
- Publication type
- Journal Article MeSH
- Dataset MeSH
- Research Support, Non-U.S. Gov't MeSH
Various sources of protein data, such as knowledgebases and scientific literature, are currently available, as are numerous tools for their analysis. The matter becomes one of choosing the tools that are most appropriate for the specific task and for the specific proteins. A combination of standard and alternative tools may lead to biologically significant results. Here, a computational classification of proteins is made using standard multiple sequence alignment in combination with an alternative method for analysis of hydropathy distribution in proteins. Both of these methods are applied to the Na⁺/Cl⁻-dependent neurotransmitter symporters (NSSs), resulting in two alternative classifications. The classifications are validated and interpreted biologically by literature and knowledgebase annotation mining, producing a consensus classification. The classification leads to the identification and functional characterization of three families of largely structurally and functionally uncharacterized orphan NSSs. The literature and knowledgebase annotations are mined to functionally characterize the NSSs in these families. The presented work also demonstrates that, in specific cases, the analysis of the hydropathy distribution in proteins is capable of revealing functional properties of proteins.
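To make the idea of a hydropathy distribution concrete, the sketch below computes a sliding-window hydropathy profile using the Kyte-Doolittle scale. This is a generic, commonly used technique substituted here for illustration; the abstract does not specify the authors' alternative method, and the function name and default window size are assumptions.

```python
# Kyte-Doolittle hydropathy index per residue (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Mean hydropathy over each window of `window` consecutive residues.

    Raises KeyError for residues outside the 20 standard amino acids.
    Returns an empty list if the window does not fit the sequence.
    """
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    if window < 1 or window > len(values):
        return []
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


profile = hydropathy_profile("AAAKKK", window=3)
```

Profiles of this kind can be compared between proteins (for example, by the positions of hydrophobic peaks that suggest membrane-spanning segments), which is one way hydropathy could complement alignment-based classification.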
- MeSH
- Databases, Protein MeSH
- Financing, Organized MeSH
- Hydrophobic and Hydrophilic Interactions MeSH
- Protein Interaction Mapping / classification MeSH
- Plasma Membrane Neurotransmitter Transport Proteins / classification / metabolism MeSH
- Sequence Analysis, Protein / methods MeSH
- Sequence Alignment MeSH
- Computational Biology / methods MeSH
- Knowledge Bases MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Comparative Study MeSH
- Validation Study MeSH
Open tubular capillary enzyme reactors were studied for rapid protein digestion and possible on-line integration into a CE/ESI/MS system. The need to minimize the time required for analyte molecules to diffuse to the surface-immobilized enzyme, and to maximize the surface-to-volume (S/V) ratio of the open tubular reactors, dictated the use of very narrow bore capillaries. Extremely small protein amounts (attomoles to femtomoles loaded) could be digested with enzymes immobilized directly on the inside wall of a 10 µm I.D. capillary. Covalently immobilized L-1-tosylamido-2-phenylethyl chloromethyl ketone (TPCK)-trypsin and pepsin A were tested for the surface immobilization. The enzymatic activity was characterized in the flow-through mode with on-line coupling to an electrospray ionization time-of-flight mass spectrometer (ESI/TOF-MS) under a range of protein concentrations, buffer pH values, temperatures and reaction times. The optimized reactors were tested as nanospray needles for fast identification of proteins using CE-ESI/TOF-MS.
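The link between bore diameter and S/V ratio can be made explicit with a short worked equation. It assumes an idealized cylindrical capillary of inner diameter d and length L, counting only the inner wall as enzyme-bearing surface; the symbols d and L are introduced here for illustration.

```latex
\[
\frac{S}{V} \;=\; \frac{\pi d L}{\pi (d/2)^{2} L} \;=\; \frac{4}{d},
\qquad
d = 10\,\mu\mathrm{m}
\;\Rightarrow\;
\frac{S}{V} = 0.4\,\mu\mathrm{m}^{-1} = 4 \times 10^{5}\,\mathrm{m}^{-1}.
\]
```

Under this idealization, halving the inner diameter doubles the S/V ratio and also shortens the maximum distance an analyte molecule must diffuse to reach the wall, which is consistent with the preference for very narrow bore capillaries stated in the abstract.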
- MeSH
- Coated Materials, Biocompatible MeSH
- Electrophoresis, Capillary / methods MeSH
- Electrophoresis, Microchip MeSH
- Electrolytes MeSH
- Enzymes, Immobilized / chemical synthesis / classification MeSH
- Financing, Organized MeSH
- Spectrometry, Mass, Electrospray Ionization / methods MeSH
- Microchemistry MeSH
- Online Systems MeSH
- Pepsin A / metabolism MeSH
- Proteins / analysis / chemistry MeSH
- Sequence Analysis, Protein / methods MeSH
- Sensitivity and Specificity MeSH
- Feasibility Studies MeSH
- Trypsin / metabolism MeSH
3rd ed. xviii, 540 p. : ill. ; 29 cm