The number of identified genes that cause a hereditary predisposition to a particular disease when they are damaged, or when their function or regulatory relationships are altered, is currently growing rapidly. Molecular genetic diagnostics is now an accessible part of the work-up for many genetically determined diseases. Laboratory methods can detect a wide range of mutations, which can generally be defined as deviations from a specific DNA sequence compared with the reference sequence published in a gene database. In some cases, however, it is difficult to tell whether a detected sequence variant is the disease-causing mutation being sought or a neutral (polymorphic) variant unrelated to the individual's disease. Hereditary forms of complex diseases, such as hereditary cancers, are a particularly problematic group when it comes to interpreting the severity of a mutation. Further analyses at the DNA and protein levels using bioinformatics can, however, reveal the degree of pathogenicity of sequence variants of uncertain significance. Determining the specific cause of a genetic predisposition to disease, and the pathogenicity of the responsible mutation, is important for the early identification of individuals at high risk of disease and for targeted preventive and therapeutic measures. In severe cases, it also enables prenatal or preimplantation diagnostics.
Molecular genetic diagnostics is available for an increasing number of genetically determined diseases. A wide spectrum of mutations can be detected by laboratory methods. A mutation can be defined as a change in a specific DNA sequence compared with the reference sequence published in a gene database. However, in some cases it is difficult to determine whether a detected sequence variant is a causal mutation or a neutral (polymorphic) variant without any effect on the phenotype. The interpretation of rare sequence variants of unknown significance detected in disease-causing genes is becoming an increasingly important problem. Further analyses at the DNA and protein levels, using bioinformatics, are needed to reveal the effect of rare sequence variants. Inherited complex disorders, for example rare hereditary forms of cancer, represent a challenge to molecular geneticists. Identifying the exact causal mutation directly responsible for the development of the disease, and assessing the disease risk that results from this genetic variation, has further implications. Predictive genetic diagnostics allows the identification of relatives at high risk of a genetically determined disease and the use of targeted preventive and therapeutic approaches. In severe cases, it also allows prenatal or preimplantation diagnostics.
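The definition of a mutation as a departure from a reference sequence can be made concrete with a minimal sketch. It assumes that the patient and reference sequences are already aligned and of equal length, and it detects only single-nucleotide substitutions. The function name `list_substitutions` and the toy sequences are hypothetical, and the `4G>A`-style labels only loosely resemble HGVS substitution notation. Such a comparison only finds sequence variants. Deciding whether a variant is causal or neutral still requires the further DNA- and protein-level analyses described above.

```python
from typing import List


def list_substitutions(reference: str, sample: str) -> List[str]:
    """List single-nucleotide differences between two pre-aligned sequences.

    Assumption: both sequences are already aligned to the same length, so
    insertions and deletions are not handled. Positions are 1-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    # Compare base by base, case-insensitively, and record every mismatch.
    for pos, (ref_base, alt_base) in enumerate(
        zip(reference.upper(), sample.upper()), start=1
    ):
        if ref_base != alt_base:
            variants.append(f"{pos}{ref_base}>{alt_base}")
    return variants


# Toy sequences, for illustration only.
print(list_substitutions("ATGGCTTCA", "ATGACTTCG"))  # ['4G>A', '9A>G']
```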
Next-generation DNA sequencing technologies currently play an irreplaceable role in research and are gradually making their way into clinical practice. Sequencers produce large amounts of data, and analysing these data with bioinformatics methods is essential for obtaining relevant results. Sequencing therefore cannot do without advanced computational processing by specialized algorithms. This review introduces the basic concepts of the computational processing of sequencing data, with attention to aspects specific to oncology. It also describes the most common problems and obstacles that complicate data processing and the biological interpretation of the results.
Next-generation sequencing (NGS) technologies are currently well established in research and are progressively finding their way into clinical applications. Sequencers produce vast amounts of data, so bioinformatics methods are needed to process them. Without computational methods, sequencing would not yield relevant biological information. In this review, we introduce the basics of common NGS-related bioinformatics methods used in oncological research. We also outline some of the common problems that complicate data processing and the interpretation of results.
Key words: bioinformatics – high-throughput nucleotide sequencing – mutations – cancer research – clinical application
This study was supported by the European Regional Development Fund and the State Budget of the Czech Republic (RECAMO, CZ.1.05/2.1.00/03.0101), by the project MEYS – NPS I – LO1413, MH CZ – DRO (MMCI, 00209805) and BBMRI_CZ (LM2010004).
The authors declare they have no potential conflicts of interest concerning drugs, products, or services used in the study.
The Editorial Board declares that the manuscript met the ICMJE "uniform requirements" for biomedical papers.
Submitted: 21. 4. 2015
Accepted: 26. 6. 2015
BACKGROUND: Recent advances in genomics indicate that the majority of genome sequences, and their long-range interactions, are functionally significant. Because a detailed examination of genome organization and function requires a very high-quality genome sequence, the objective of this study was to improve the reference genome assembly of banana (Musa acuminata). RESULTS: We have developed a modular bioinformatics pipeline, able to handle various types of data, to improve genome sequence assemblies. The pipeline comprises several semi-automated tools. Unlike classical automated tools based on global parameters, these semi-automated tools offer an expert mode in which the user can decide on suggested improvements through local compromises. The pipeline was used to improve the draft genome sequence of Musa acuminata. Genotyping by sequencing (GBS) of a segregating population and paired-end sequencing were used to detect and correct scaffold misassemblies. Long-insert-size paired-end reads identified scaffold junctions and fusions missed by automated assembly methods. GBS markers were used to anchor scaffolds to pseudo-molecules with a new bioinformatics approach that avoids the tedious step of marker ordering during genetic map construction. Furthermore, a genome map was constructed and used to assemble scaffolds into super scaffolds. Finally, a consensus gene annotation was projected onto the new assembly from two pre-existing annotations. This approach reduced the total number of Musa scaffolds from 7513 to 1532 (i.e. by 80%), while the N50 increased from 1.3 Mb (65 scaffolds) to 3.0 Mb (26 scaffolds). In total, 89.5% of the assembly was anchored to the 11 Musa chromosomes, compared with 70% previously. Unknown sites (N) were reduced from 17.3% to 10.0%. CONCLUSION: The release of version 2 of the Musa acuminata reference genome provides a platform for detailed analysis of banana genome variation, function and evolution.
Bioinformatics tools developed in this work can be used to improve genome sequence assemblies in other species.
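The N50 values quoted above (1.3 Mb with 65 scaffolds, 3.0 Mb with 26 scaffolds) follow the usual definition. Scaffolds are sorted from longest to shortest, and N50 is the length at which the running total first reaches at least half of the assembly length. The accompanying scaffold count is commonly called L50. Below is a minimal sketch under that "at least half" convention. The function names and the toy lengths are hypothetical, not Musa data. The second helper checks the stated 80% reduction in scaffold number: (7513 − 1532) / 7513 ≈ 79.6%.

```python
from typing import Sequence, Tuple


def n50_l50(lengths: Sequence[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig or scaffold lengths.

    N50: length of the shortest scaffold among the longest ones that together
    cover at least half of the total assembly length.
    L50: number of scaffolds needed to reach that half.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    # Walk scaffolds from longest to shortest until half the assembly is covered;
    # integer comparison (2 * running >= total) avoids float rounding.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:
            return length, count
    raise AssertionError("unreachable for non-negative lengths")


def percent_reduction(before: int, after: int) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


print(n50_l50([100, 80, 60, 40, 20]))            # toy lengths -> (80, 2)
print(round(percent_reduction(7513, 1532), 1))   # scaffold counts from the abstract -> 79.6
```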
Several new viral infections have emerged in the human population, and some have become established as global pandemics. With advances in translational research, the scientific community has developed potential therapeutics to eradicate or control certain viral infections, such as smallpox and polio, which were responsible for billions of disabilities and deaths in the past. Unfortunately, some viral infections, such as those caused by dengue virus (DENV) and human immunodeficiency virus-1 (HIV-1), remain prevalent owing to a lack of specific therapeutics, while new pathogenic viral strains or variants continue to emerge through frequent genetic recombination or cross-species transmission. Consequently, bioinformatics-based strategies have been developed to characterize emerging viruses and to develop new effective therapeutics for their eradication or management. This review attempts to provide a single platform for the wide range of available bioinformatics-based approaches, including methods for identifying and managing emerging or evolved viral strains, genome analysis relating to pathogenicity and epidemiology, computational methods for designing viral therapeutics, and databases that consolidate information on known pathogenic viruses. This review of generally applicable viral informatics approaches aims to provide an overview of the resources available for these tasks, which may be used to develop additional strategies and to improve the quality of translational viral informatics research.
Computational methods that predict the effects of nonsynonymous substitutions are an integral part of exome studies. Here, we validated these methods and improved their specificity by performing a comprehensive bioinformatics analysis combined with experimental and clinical data, using glucokinase (GCK) as a model. The dataset comprised 8835 putative variations, including 515 disease-associated variations from 1596 families diagnosed with monogenic diabetes (GCK-MODY) or persistent hyperinsulinemic hypoglycemia of infancy (PHHI), and 126 variations with available or newly reported (19 variations) data on enzyme kinetics. We also showed that the high frequency of disease-associated variations found in patients is closely related to their evolutionary conservation. The default set of prediction methods correctly predicted the effects of only some of the GCK-MODY-associated variations and completely failed to predict the normoglycemic or PHHI-associated variations. We therefore calculated evidence-based thresholds that significantly improved the specificity of predictions (≤75%). The combined prediction analysis even made it possible to distinguish activating from inactivating variations. It also identified a group of putatively highly pathogenic variations (EVmutation score < -7.5 and SNAP2 score > 70), which were surprisingly underrepresented among MODY patients and thus under negative selection during molecular evolution. We propose and validate the first robust, evidence-based thresholds that allow improved, highly specific predictions of disease-associated GCK variations.
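The combined rule for the putatively highly pathogenic group can be expressed as a simple two-predictor filter. A variant is flagged only when both conditions hold: its EVmutation score is strictly below -7.5 and its SNAP2 score is strictly above 70. In line with these thresholds, lower EVmutation and higher SNAP2 scores point toward a damaging effect. The sketch below is a minimal illustration under stated assumptions. The scores are taken as already computed by the respective tools, which are not invoked here. The variant names and scores are hypothetical, not GCK data. The abstract does not give the other evidence-based thresholds, such as those used to separate activating from inactivating variations, so they are not reproduced.

```python
from dataclasses import dataclass
from typing import List

# Thresholds quoted in the abstract for the putatively highly pathogenic group.
EVMUTATION_MAX = -7.5  # EVmutation score must be strictly below this value
SNAP2_MIN = 70.0       # SNAP2 score must be strictly above this value


@dataclass
class Variant:
    name: str          # hypothetical identifier
    evmutation: float  # precomputed EVmutation score (assumed input)
    snap2: float       # precomputed SNAP2 score (assumed input)


def is_putatively_highly_pathogenic(v: Variant) -> bool:
    """Both predictors must pass their thresholds (strict inequalities)."""
    return v.evmutation < EVMUTATION_MAX and v.snap2 > SNAP2_MIN


def flag(variants: List[Variant]) -> List[str]:
    """Return the names of variants that fall into the high-pathogenicity group."""
    return [v.name for v in variants if is_putatively_highly_pathogenic(v)]


# Hypothetical variants and scores, for illustration only.
example = [
    Variant("var_A", evmutation=-9.1, snap2=85),
    Variant("var_B", evmutation=-9.1, snap2=40),  # fails the SNAP2 condition
    Variant("var_C", evmutation=-3.0, snap2=90),  # fails the EVmutation condition
    Variant("var_D", evmutation=-7.5, snap2=75),  # exactly -7.5 is not below -7.5
]
print(flag(example))  # ['var_A']
```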
