Combining Shapley value and statistics to the analysis of gene expression data in children exposed to air pollution
Language: English
Country: England, Great Britain
Media: electronic
Document type: Evaluation Study, Journal Article, Research Support, Non-U.S. Gov't
PubMed: 18764936
PubMed Central: PMC2556684
DOI: 10.1186/1471-2105-9-361
PII: 1471-2105-9-361
MeSH
- Algorithms
- Biomarkers / analysis
- Models, Biological
- Child
- Epidemiologic Methods
- Risk Assessment / methods
- Data Interpretation, Statistical *
- Humans
- Computer Simulation
- Proteome / analysis
- Risk Factors
- Gene Expression Profiling / methods / statistics & numerical data
- Models, Statistical
- Environmental Exposure / analysis / statistics & numerical data
- Air Pollution / statistics & numerical data
Check Tag
- Child
- Humans
Publication type
- Journal Article
- Evaluation Study
- Research Support, Non-U.S. Gov't
Geographicals
- Czech Republic / epidemiology
Names of Substances
- Biomarkers
- Proteome
BACKGROUND: In gene expression analysis, statistical tests for differential gene expression provide lists of candidate genes that each have, individually, a sufficiently low p-value. However, interpreting each single p-value within complex systems involving several interacting genes is problematic. In parallel, over the last sixty years, game theory has been applied to political and social problems to assess the power of interacting agents in forcing a decision and, more recently, to represent the relevance of genes in response to certain conditions.
RESULTS: In this paper we introduce a bootstrap procedure to test the null hypothesis that each gene has the same relevance in two conditions, where relevance is represented by the Shapley value of a particular coalitional game defined on a microarray data-set. This method, called Comparative Analysis of Shapley value (CASh for short), is applied to data on gene expression in children differentially exposed to air pollution. The results provided by CASh are compared with the results of a parametric statistical test for differential gene expression (the t-test). The gene lists provided by CASh and by the t-test are both informative enough to discriminate exposed subjects on the basis of their gene expression profiles. While many genes are selected by both CASh and the parametric test, the differences between the two selections turn out to be the more interesting biologically, suggesting a different interpretation of the main biological pathways in gene expression regulation for exposed individuals. A simulation study suggests that CASh offers more power than the t-test for detecting differential gene expression variability.
CONCLUSION: CASh is successfully applied to gene expression analysis of a data-set where the joint expression behavior of genes may be critical to characterizing the expression response to air pollution. We demonstrate a synergistic effect between coalitional games and statistics that resulted in a selection of genes with a potential impact on the regulation of complex pathways.
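The idea of comparing Shapley values between two conditions by bootstrap can be illustrated with a minimal Python sketch. It assumes the expression data have already been discretized into Boolean matrices (genes by samples, with 1 marking an abnormally expressed gene) and uses the closed form of the Shapley value for a microarray game in the sense of Moretti, Patrone and Bonassi: each sample splits one unit equally among its abnormally expressed genes, and the shares are averaged over samples. The pooled resampling scheme, the two-sided p-value, and all function names below are assumptions made for illustration; they are not taken from the CASh paper and may differ from the authors' actual procedure.

```python
import random


def shapley_microarray(B):
    """Shapley value of the microarray game for a Boolean matrix B.

    B is a list of gene rows; B[i][j] is 1 if gene i is abnormally
    expressed in sample j. Samples with no abnormal gene contribute 0.
    """
    n_genes, n_samples = len(B), len(B[0])
    phi = [0.0] * n_genes
    for j in range(n_samples):
        support = [i for i in range(n_genes) if B[i][j]]
        for i in support:
            phi[i] += 1.0 / len(support)  # equal split within the sample
    return [p / n_samples for p in phi]


def _columns(B):
    """Return the samples (columns) of B as a list of lists."""
    return [list(col) for col in zip(*B)]


def _from_columns(cols):
    """Rebuild a genes-by-samples matrix from a list of sample columns."""
    return [list(row) for row in zip(*cols)]


def bootstrap_shapley_test(B1, B2, n_boot=1000, seed=0):
    """Per-gene bootstrap p-values for equal Shapley value in two conditions.

    Assumption (illustrative only): under the null hypothesis the samples
    of both conditions are exchangeable, so bootstrap samples are drawn
    with replacement from the pooled set of columns.
    """
    rng = random.Random(seed)
    cols1, cols2 = _columns(B1), _columns(B2)
    pooled = cols1 + cols2
    observed = [a - b for a, b in
                zip(shapley_microarray(B1), shapley_microarray(B2))]
    exceed = [0] * len(B1)
    for _ in range(n_boot):
        r1 = _from_columns(rng.choices(pooled, k=len(cols1)))
        r2 = _from_columns(rng.choices(pooled, k=len(cols2)))
        diff = [a - b for a, b in
                zip(shapley_microarray(r1), shapley_microarray(r2))]
        for i, d in enumerate(diff):
            if abs(d) >= abs(observed[i]):
                exceed[i] += 1
    # Add-one correction keeps p-values strictly positive.
    p_values = [(e + 1) / (n_boot + 1) for e in exceed]
    return observed, p_values
```

With hypothetical inputs `exposed` and `control` (Boolean matrices over the same genes), `bootstrap_shapley_test(exposed, control)` returns the observed Shapley value differences and one p-value per gene. Any real analysis would still need a discretization rule for the raw expression values and a correction for multiple testing, neither of which is covered here.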