Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd

2016 Sep 26; 7: 12846. [Epub 2016 Sep 26]

Status PubMed-not-MEDLINE Language English Country Great Britain, England Medium electronic

Document type journal article

Persistent link   https://www.medvik.cz/link/pmid27667448

Grant support
R01 NS099068 NINDS NIH HHS - United States
U54 HL127624 NHLBI NIH HHS - United States
T32 HL007824 NHLBI NIH HHS - United States
R01 GM098316 NIGMS NIH HHS - United States
U54 CA189201 NCI NIH HHS - United States

Gene expression data are accumulating exponentially in public repositories. Reanalysis and integration of themed collections from these studies may provide new insights, but requires further human curation. Here we report a crowdsourcing project to annotate and reanalyse a large number of gene expression profiles from Gene Expression Omnibus (GEO). Through a massive open online course on Coursera, over 70 participants from over 25 countries identify and annotate 2,460 single-gene perturbation signatures, 839 disease versus normal signatures, and 906 drug perturbation signatures. All these signatures are unique and are manually validated for quality. Global analysis of these signatures confirms known associations and identifies novel associations between genes, diseases and drugs. The manually curated signatures are used as a training set to develop classifiers for extracting similar signatures from the entire GEO repository. We develop a web portal to serve these signatures for query, download and visualization.

Anna Blamansingel 216 Amsterdam 102 SW Netherlands

Center for Interdisciplinary Cardiovascular Sciences Brigham and Women's Hospital 3 Blackfan Circle Boston Massachusetts 02115 USA

Center for Research in Myology Sorbonne Universités UPMC Univ Paris 06 INSERM UMRS975 CNRS FRE3617 47 Boulevard de l'hôpital Paris 75013 France

Center for Space Medicine Baylor College of Medicine 1 Baylor Plaza Houston Texas 77030 USA

CICAB Clinical Research Centre Extremadura University Hospital Elvas Av s n 06006 Badajoz 06006 Spain

Consejo Superior de Investigaciones Científicas Centro Nacional de Biotecnología Department of Immunology and Oncology c Darwin 3 Madrid 28049 Spain

David H Koch Institute for Integrative Cancer Research Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA

Daylesford the Fairway Weybridge Surrey KT13 0RZ UK

Department of Biochemistry 3 University of Regensburg Universitätsstrasse 31 Regensburg 93053 Germany

Department of Biological Engineering Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA

Department of Biological Sciences 600 Fairchild Center Mail Code 2402 Columbia University New York New York 10032 USA

Department of Biology and Institute of Genetics Universidad Nacional de Colombia Bogota Cr 30 45 08 Colombia

Department of Biology Faculty of Medicine Masaryk University Brno 625 00 Czech Republic

Department of Biology Shenandoah University 1460 University Dr Winchester Winchester Virginia 22601 USA

Department of Human Genetics Faculty of Medicine and Health Sciences University of Oldenburg Ammerländer Heerstrasse 114 118 Oldenburg 26129 Germany

Department of Life Sciences School of Sciences European University Cyprus 6 Diogenes Str Engomi P O Box 22006 Nicosia 1516 Cyprus

Department of Materials Science and Engineering Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA

Department of Neurosurgery Stanford School of Medicine Stanford California 94304 USA

Department of Pharmacological Sciences BD2K LINCS Data Coordination and Integration Center Illuminating the Druggable Genome Knowledge Management Center Icahn School of Medicine at Mount Sinai One Gustave L Levy Place Box 1215 New York New York 10029 USA

Department of Pharmacology and Toxicology University of Navarra Pamplona Irunlarrea 1 Pamplona 31008 Spain

Department of Research Institute of Liver and Biliary Sciences D1 Vasant Kunj New Delhi 110070 India

Division of Clinical Immunology Department of Laboratory Medicine Karolinska Institute Alfred Nobels Allé 8 level 7 Stockholm SE141 86 Sweden

Fluid Physics and Transport Processes Branch NASA Glenn Research Center 21000 Brookpark Rd Cleveland Ohio 44135 USA

IBM India Pvt Ltd Bengaluru 560045 India

IMIM Hospital Del Mar PRBB Barcelona Dr Aiguader Barcelona 88 08003 Spain

Paediatric Allergology and Pulmonology Dr von Hauner University Children's Hospital Ludwig Maximilians University of Munich Member of the German Centre for Lung Research Lindwurmstrasse 4 Munich 80337 Germany

Protein Chemistry and Proteomics Unit Biotechnology Research Center Pasteur Institute of Iran No 358 12th Farwardin Ave Jomhhoori St Tehran 13164 Iran

School of Biological Sciences Institute for Researches in Fundamental Sciences Niavaran Square P O Box Tehran 19395 5746 Iran

School of Biosciences University of Nottingham Sutton Bonington Campus Sutton Bonington Leicestershire LE12 5RD UK

Spinal Cord Injury Service Veteran Affairs Palo Alto Health Care System Palo Alto California 94304 USA

Technical University of Denmark National Veterinary Institute Bülowsvej 27 Building 2 3 Frederiksberg C 1870 Denmark

The Ragon Institute of MGH MIT and Harvard 400 Technology Square Cambridge Massachusetts 02139 USA

University of Salamanca Salamanca Madrid 37008 Spain

Warsaw School of Information Technology under the auspices of the Polish Academy of Sciences 6 Newelska St Warsaw 01 447 Poland
