BACKGROUND: One of the major challenges in the analysis of gene expression data is to identify local patterns composed of genes showing coherent expression across subsets of experimental conditions. Such patterns may provide an understanding of underlying biological processes related to these conditions. This understanding can be further improved by providing concise characterizations of the genes and situations delimiting the pattern. RESULTS: We propose a method called semantic biclustering, which aims to detect interpretable rectangular patterns in binary data matrices. As usual in biclustering, we seek homogeneous submatrices; however, we also require that the included elements can be jointly described in terms of semantic annotations pertaining to both rows (genes) and columns (samples). To find such interpretable biclusters, we explore two strategies. The first endows an existing biclustering algorithm with the semantic ingredients. The other is based on rule and tree learning known from machine learning. CONCLUSIONS: The two alternatives are tested in experiments with two Drosophila melanogaster gene expression datasets. Both strategies are shown to detect sets of compact biclusters with semantic descriptions that also remain largely valid for unseen (testing) data. This desirable generalization aspect is more pronounced in the strategy stemming from conventional biclustering, although it is traded off against the complexity of the descriptions (number of ontology terms employed), which is lower for the alternative strategy.
BACKGROUND: Set-level classification of gene expression data has received significant attention recently. In this setting, high-dimensional vectors of features corresponding to genes are converted into lower-dimensional vectors of features corresponding to biologically interpretable gene sets. The dimensionality reduction brings the promise of a decreased risk of overfitting, potentially resulting in improved accuracy of the learned classifiers. However, recent empirical research has not confirmed this expectation. Here we hypothesize that the reported unfavorable classification results in the set-level framework were due to the adoption of unsuitable gene sets, typically defined on the basis of the Gene Ontology and the KEGG database of metabolic networks. We explore an alternative approach to defining gene sets, based on regulatory interactions, which we expect to collect genes with more correlated expression. We hypothesize that such more correlated gene sets will make it possible to learn more accurate classifiers. METHODS: We define two families of gene sets using information on regulatory interactions, and evaluate them on phenotype-classification tasks using public prokaryotic gene expression data sets. From each of the two gene-set families, we first select the best-performing subtype. The two selected subtypes are then evaluated on independent (testing) data sets against state-of-the-art gene sets and against the conventional gene-level approach. RESULTS: The novel gene sets are indeed more correlated than the conventional ones, and lead to significantly more accurate classifiers. CONCLUSION: Novel gene sets defined on the basis of regulatory interactions improve set-level classification of gene expression data.
The experimental scripts and other material needed to reproduce the experiments are available at http://ida.felk.cvut.cz/novelgenesets.tar.gz.
BACKGROUND: Delayed graft function (DGF) caused by ischemia/reperfusion injury (I/RI) negatively influences the outcome of kidney transplantation. This prospective single-center study characterized the intrarenal transcriptome during I/RI as a means of identifying genes associated with DGF development. METHODS: Characterization of the intrarenal transcription profile associated with I/RI was carried out on three sequential graft biopsies from respective allografts before and during transplantation. The intragraft expression of 92 candidate genes was measured using quantitative real-time reverse transcriptase polymerase chain reaction (2(-ΔΔCt)) in delayed (n=9) and primary function allografts (n=26). RESULTS: Cold storage was not associated with significant changes to the expression profile of the target gene transcripts; however, up-regulation of 16 genes associated with enhanced activation of innate and adaptive immune responses and apoptosis was observed after reperfusion. Multivariate logistic regression analysis revealed that higher tubular atrophy scores (ct) together with a lower expression of Netrin-1 might predict DGF development (training area under the receiver operating curve=0.89, cross-validated area under the receiver operating curve=0.81). CONCLUSIONS: Poor baseline tubular cell quality (defined by a higher rate of tubular atrophy) combined with the reduced potential of apoptotic survival factors represented by decreased Netrin-1 gene expression were associated with delayed kidney graft function.
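The abstract above reports a training and a cross-validated area under the receiver operating curve. As a minimal sketch of what that quantity measures, the AUC can be computed as the fraction of (positive, negative) pairs in which the positive case receives the higher predicted score, with ties counted as one half. The labels and scores below are hypothetical illustration values, not data from the study, and `roc_auc` is an illustrative helper rather than the authors' code.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive case outscores
    a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = DGF, 0 = primary function; scores are
# made-up predicted probabilities from some classifier.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
```

A cross-validated AUC applies the same computation to scores predicted for cases held out from model fitting, which is why it is typically lower than the training value.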
- MeSH
- Principal Component Analysis MeSH
- Atrophy MeSH
- Biopsy MeSH
- Immunohistochemistry MeSH
- Kidney Tubules / pathology MeSH
- Humans MeSH
- Logistic Models MeSH
- Tumor Suppressor Proteins / analysis / genetics MeSH
- Nerve Growth Factors / analysis / genetics MeSH
- Delayed Graft Function / etiology / metabolism / pathology MeSH
- Prospective Studies MeSH
- Gene Expression Regulation MeSH
- Reperfusion Injury / complications MeSH
- Kidney Transplantation / adverse effects MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Publication Type
- Meeting Abstract MeSH
BACKGROUND: Induction therapy is associated with excellent short-term kidney graft outcome. The aim of this study was to evaluate differences in the intragraft transcriptome after successful induction therapy using two rabbit antithymocyte globulins. METHODS: The expression of 376 target genes involved in tolerance, inflammation, T- and B-cell immune response, and apoptosis was evaluated using the quantitative real-time reverse-transcriptase polymerase chain reaction (2(-ΔΔCt)) method in kidney graft biopsies with normal histological findings and stable renal function, 3 months posttransplantation after induction therapy with Thymoglobulin, ATG-Fresenius S (ATG-F), and a control group without induction therapy. RESULTS: The transcriptional pattern induced by Thymoglobulin differed from ATG-F in 18 differentially expressed genes. Down-regulation of genes involved in the nuclear factor-κB pathway (TLR4, MYD88, and CD209), costimulation (CD80 and CTLA4), apoptosis (NLRP1), chemoattraction (CCR10), and dendritic cell function (CLEC4C) was observed in the biopsies from patients treated with Thymoglobulin. A hierarchical clustering analysis clearly separated the Thymoglobulin group from the ATG-F group, while the control group had a similar profile as the Thymoglobulin group. CONCLUSIONS: Despite normal morphology in graft biopsy taken 3 months posttransplantation, the intrarenal transcriptome differed in patients treated with induction therapy using different rATGs. In the Thymoglobulin high-risk group, the transcriptome profile was identical to the low-risk group. Therefore, the down-regulation of the nuclear factor-κB pathway after Thymoglobulin induction in vivo is likely to explain the clinical success of this biologic.
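The 2(-ΔΔCt) method named in the abstract above is the standard relative quantification formula for real-time PCR: the target gene's threshold cycle (Ct) is first normalized to a reference gene, then compared between a sample and a control. A minimal sketch follows; the Ct values and the function name `fold_change` are hypothetical and do not come from the study.

```python
def fold_change(ct_target_sample, ct_ref_sample,
                ct_target_control, ct_ref_control):
    """Relative expression by the 2^(-ddCt) formula."""
    # dCt: target gene normalized to the reference gene, per group
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # ddCt: normalized sample relative to normalized control
    ddct = delta_sample - delta_control
    return 2.0 ** (-ddct)


# Hypothetical Ct values: the target crosses the threshold two cycles
# later in the sample than in the control, with equal reference Ct.
fc = fold_change(26.0, 18.0, 24.0, 18.0)  # ddCt = 2, so fc = 0.25
```

A value below 1 indicates down-regulation relative to the control, and a value above 1 indicates up-regulation.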
- MeSH
- Antilymphocyte Serum / pharmacology MeSH
- Apoptosis MeSH
- Biopsy MeSH
- Adult MeSH
- Down-Regulation / drug effects / immunology / physiology MeSH
- Immunosuppression Therapy / methods MeSH
- Rabbits MeSH
- Kidney / metabolism / pathology MeSH
- Middle Aged MeSH
- Humans MeSH
- RNA, Messenger / metabolism MeSH
- Follow-Up Studies MeSH
- NF-kappa B / genetics / metabolism MeSH
- Graft Rejection / immunology / pathology / prevention & control MeSH
- Aged MeSH
- Signal Transduction / drug effects / immunology / physiology MeSH
- Gene Expression Profiling MeSH
- Kidney Transplantation / immunology / pathology / physiology MeSH
- Animals MeSH
- Check Tag
- Adult MeSH
- Rabbits MeSH
- Middle Aged MeSH
- Humans MeSH
- Male MeSH
- Aged MeSH
- Female MeSH
- Animals MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Comparative Study MeSH
We contribute a novel, ball-histogram approach to DNA-binding propensity prediction of proteins. Unlike state-of-the-art methods based on constructing an ad-hoc set of features describing physicochemical properties of the proteins, the ball-histogram technique enables a systematic, Monte-Carlo exploration of the spatial distribution of amino acids complying with automatically selected properties. This exploration yields a model for the prediction of DNA binding propensity. We validate our method in prediction experiments, improving on state-of-the-art accuracies. Moreover, our method also provides interpretable features involving spatial distributions of selected amino acids.
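To illustrate the idea of a Monte-Carlo exploration of spatial amino-acid distributions, the following is a rough sketch only. It assumes that balls of a fixed radius are centred at points drawn uniformly from the bounding box of the structure, and that the histogram records how often each tuple of per-type residue counts occurs inside a ball. The sampling scheme, the function `ball_histogram`, its parameters, and the toy coordinates are all assumptions made for illustration; the abstract does not specify these details.

```python
import random
from collections import Counter


def ball_histogram(residues, types, radius, n_samples, seed=0):
    """residues: list of (x, y, z, residue_type) tuples.
    Returns relative frequencies of count tuples, one count per entry
    of `types`, observed in randomly placed balls."""
    rng = random.Random(seed)
    xs = [r[0] for r in residues]
    ys = [r[1] for r in residues]
    zs = [r[2] for r in residues]
    hist = Counter()
    for _ in range(n_samples):
        # Assumption: ball centre sampled uniformly in the bounding box
        cx = rng.uniform(min(xs), max(xs))
        cy = rng.uniform(min(ys), max(ys))
        cz = rng.uniform(min(zs), max(zs))
        counts = [0] * len(types)
        for x, y, z, t in residues:
            inside = (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= radius ** 2
            if inside and t in types:
                counts[types.index(t)] += 1
        hist[tuple(counts)] += 1
    return {k: v / n_samples for k, v in hist.items()}


# Toy structure with made-up coordinates and residue types
toy = [(0.0, 0.0, 0.0, "ARG"), (1.0, 0.5, 0.2, "LYS"),
       (4.0, 4.0, 4.0, "ARG"), (4.5, 3.5, 4.2, "GLY")]
hist = ball_histogram(toy, types=["ARG", "LYS"], radius=2.0, n_samples=200)
```

Each histogram bin could then serve as one feature of the protein for a downstream classifier.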
BACKGROUND: Analysis of gene expression data in terms of a priori-defined gene sets has recently received significant attention, as this approach typically yields more compact and interpretable results than those produced by traditional methods that rely on individual genes. The set-level strategy can also be adopted with similar benefits in predictive classification tasks accomplished with machine learning algorithms. Initial studies into the predictive performance of set-level classifiers have yielded rather controversial results. The goal of this study is to provide a more conclusive evaluation by testing various components of the set-level framework within a large collection of machine learning experiments. RESULTS: Genuine curated gene sets constitute better features for classification than sets assembled without biological relevance. For identifying the best gene sets for classification, the Global test outperforms the gene-set methods GSEA and SAM-GS as well as two generic feature selection methods. To aggregate expressions of genes into a feature value, the singular value decomposition (SVD) method as well as the SetSig technique improve on simple arithmetic averaging. Set-level classifiers learned with 10 features constituted by the Global test slightly outperform baseline gene-level classifiers learned with all original data features, although they are slightly less accurate than gene-level classifiers learned with a prior feature-selection step. CONCLUSION: Set-level classifiers do not boost predictive accuracy; however, they do achieve competitive accuracy if learned with the right combination of ingredients. AVAILABILITY: Open-source, publicly available software was used for classifier learning and testing. The gene expression datasets and the gene set database used are also publicly available. The full tabulation of experimental results is available at http://ida.felk.cvut.cz/CESLT.
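The aggregation step mentioned above, turning gene-level expression values into one feature per gene set, can be sketched in its simplest form, arithmetic averaging, which the abstract names as the baseline that SVD and SetSig improve on. The gene names, set names, and the helper `set_level_features` are hypothetical; the SVD and SetSig variants are not shown.

```python
from statistics import mean


def set_level_features(expression, gene_sets):
    """expression: dict mapping gene -> expression value for one sample.
    gene_sets: dict mapping set name -> list of member genes.
    Returns one averaged feature per gene set that has measured genes."""
    features = {}
    for name, genes in gene_sets.items():
        # Ignore set members that were not measured in this sample
        values = [expression[g] for g in genes if g in expression]
        if values:
            features[name] = mean(values)
    return features


# Hypothetical sample and gene sets
sample = {"geneA": 2.0, "geneB": 4.0, "geneC": 1.0, "geneD": 3.0}
gene_sets = {"set1": ["geneA", "geneB"], "set2": ["geneC", "geneD", "geneX"]}
features = set_level_features(sample, gene_sets)
```

Applying this to every sample yields a lower-dimensional data matrix whose columns are gene sets rather than genes, which is then passed to the classifier.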
BACKGROUND: The process of protein-DNA binding has an essential role in the biological processing of genetic information. We use relational machine learning to predict DNA-binding propensity of proteins from their structures. Automatically discovered structural features are able to capture some characteristic spatial configurations of amino acids in proteins. RESULTS: On several protein datasets, prediction based only on structural relational features already achieves results competitive with existing methods based on physicochemical properties. Predictive performance is further improved when structural features are combined with physicochemical features. Moreover, the structural features provide some insights not revealed by physicochemical features. Our method is able to detect common spatial substructures. We demonstrate this in experiments with zinc finger proteins. CONCLUSIONS: We introduced a novel approach for DNA-binding propensity prediction using relational machine learning, which could potentially also be used for protein function prediction in general.
- Publication Type
- Journal Article MeSH
Sequential data are an important source of medical knowledge. Such data can arise in many different ways. In this article we use a specific study to present general procedures for mining them. The study is a long-term preventive study of atherosclerosis: the data result from two decades of monitoring the development of risk factors and associated phenomena. The main goal is to identify frequent sequential patterns, i.e. recurring temporal phenomena, and to study their possible relation to the onset of one of the monitored cardiovascular diseases. From the wider range of available methods we focus on inductive logic programming, which expresses potential patterns as features in first-order predicate logic. The features are first extracted automatically and then grouped into rules, which constitute the output form of the acquired knowledge. The proposed procedure is compared with more traditional, previously published methods, namely sliding windows and episode rules.
Sequential data represent an important source of automatically mined and potentially new medical knowledge. They can originate in various ways. Within the presented domain they come from a longitudinal preventive study of atherosclerosis - the data consist of series of long-term observations recording the development of risk factors and associated conditions. The intention is to identify frequent sequential patterns related to the onset of any of the observed cardiovascular diseases. This paper focuses on the application of inductive logic programming. The prospective patterns are based on first-order features automatically extracted from the sequential data. The features are further grouped in order to reach final complex patterns expressed as rules. The presented approach is also compared with the approaches published earlier (windowing, episode rules).
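As a rough illustration of the simpler windowing baseline mentioned above (not the inductive logic programming approach itself), frequent sequential patterns can be found by sliding a fixed-width window over each patient's sequence of observations and keeping the windows seen in at least a minimum number of patients. The event labels, the function `frequent_windows`, and its parameters are hypothetical and are not taken from the study.

```python
from collections import Counter


def frequent_windows(sequences, width, min_support):
    """Count, for each contiguous window of `width` events, the number
    of sequences containing it; keep windows with enough support."""
    support = Counter()
    for seq in sequences:
        # A set, so each pattern counts at most once per patient
        seen = {tuple(seq[i:i + width]) for i in range(len(seq) - width + 1)}
        support.update(seen)
    return {p: c for p, c in support.items() if c >= min_support}


# Hypothetical per-patient sequences of discretized observations
patients = [
    ["normal", "high_bp", "high_bp", "event"],
    ["normal", "normal", "high_bp", "high_bp"],
    ["high_bp", "high_bp", "event"],
]
patterns = frequent_windows(patients, width=2, min_support=2)
```

Patterns ending in a disease onset could then be inspected as candidate precursors, whereas the first-order features described in the abstract can express relations that fixed-width windows cannot.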
- MeSH
- Algorithms MeSH
- Atherosclerosis / diagnosis / etiology MeSH
- Biomedical Research / methods / instrumentation / trends MeSH
- Databases, Factual / trends / utilization MeSH
- Financing, Organized MeSH
- Cardiovascular Diseases / diagnosis / etiology MeSH
- Medical Informatics / methods / instrumentation / trends MeSH
- Humans MeSH
- Longitudinal Studies MeSH
- Risk Factors MeSH
- Data Collection / methods / instrumentation / trends MeSH
- Statistics as Topic / methods / instrumentation / trends MeSH
- Models, Theoretical MeSH
- Check Tag
- Humans MeSH