• This record comes from PubMed

Combining Shapley value and statistics to the analysis of gene expression data in children exposed to air pollution

2008 Sep 02; 9: 361. [epub] 20080902

Language: English; Country: England, Great Britain; Media: electronic

Document type: Evaluation Study, Journal Article, Research Support, Non-U.S. Gov't

BACKGROUND: In gene expression analysis, statistical tests for differential gene expression provide lists of candidate genes that each have a sufficiently low p-value. However, interpreting each p-value in isolation is problematic in complex systems involving several interacting genes. In parallel, over the last sixty years, game theory has been applied to political and social problems to assess the power of interacting agents in forcing a decision and, more recently, to represent the relevance of genes in response to certain conditions.

RESULTS: In this paper we introduce a bootstrap procedure to test the null hypothesis that each gene has the same relevance under two conditions, where relevance is represented by the Shapley value of a particular coalitional game defined on a microarray data-set. This method, called Comparative Analysis of Shapley value (CASh), is applied to data on gene expression in children differentially exposed to air pollution. The results provided by CASh are compared with those of a parametric statistical test (t-test) for differential gene expression. The gene lists provided by CASh and by the t-test are both informative enough to discriminate exposed subjects on the basis of their gene expression profiles. While many genes are selected by both CASh and the parametric test, the biological interpretation of the differences between the two selections turns out to be more interesting, suggesting a different interpretation of the main biological pathways in gene expression regulation for exposed individuals. A simulation study suggests that CASh offers more power than the t-test for detecting differential gene expression variability.

CONCLUSION: CASh is successfully applied to gene expression analysis of a data-set where the joint expression behavior of genes may be critical to characterize the expression response to air pollution. We demonstrate a synergistic effect between coalitional games and statistics that resulted in a selection of genes with a potential impact on the regulation of complex pathways.
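The idea of comparing Shapley values between two conditions with a bootstrap can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the procedure published as CASh: the abstract does not give the game's definition or the resampling scheme. Here each sample is assumed to be summarized by a Boolean vector marking abnormally expressed genes, the game is assumed to be the average of one unanimity game per sample (so a gene's Shapley value is its average share across the samples that flag it), and the null distribution is assumed to come from resampling the pooled samples with replacement. The function names `shapley_microarray_game` and `bootstrap_shapley_test` are hypothetical.

```python
import numpy as np


def shapley_microarray_game(B):
    """Shapley value of the game that averages one unanimity game per sample.

    B is a Boolean array (genes x samples); B[i, j] is True when gene i is
    flagged as abnormally expressed in sample j (assumed input encoding).
    """
    B = np.asarray(B, dtype=float)
    support = B.sum(axis=0)  # number of flagged genes in each sample
    # Each flagged gene gets an equal share 1/|support|; empty samples give 0.
    weights = np.divide(1.0, support, out=np.zeros_like(support),
                        where=support > 0)
    return (B * weights).mean(axis=1)


def bootstrap_shapley_test(B1, B2, n_boot=2000, seed=0):
    """Per-gene bootstrap p-values for H0: same Shapley value in both conditions.

    Assumption: the null distribution is built by drawing samples with
    replacement from the pooled columns of B1 and B2.
    """
    rng = np.random.default_rng(seed)
    B1 = np.asarray(B1, dtype=bool)
    B2 = np.asarray(B2, dtype=bool)
    n1, n2 = B1.shape[1], B2.shape[1]
    observed = shapley_microarray_game(B1) - shapley_microarray_game(B2)
    pooled = np.concatenate([B1, B2], axis=1)
    exceed = np.zeros(B1.shape[0])
    for _ in range(n_boot):
        idx1 = rng.integers(0, n1 + n2, size=n1)
        idx2 = rng.integers(0, n1 + n2, size=n2)
        diff = (shapley_microarray_game(pooled[:, idx1])
                - shapley_microarray_game(pooled[:, idx2]))
        exceed += np.abs(diff) >= np.abs(observed)
    # Add-one correction keeps every p-value strictly positive.
    return observed, (exceed + 1) / (n_boot + 1)
```

Genes with small p-values would be the candidates whose relevance differs between conditions. In practice the per-gene p-values would also need a multiple-testing correction, which this sketch omits.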

