• This record comes from PubMed

A nitty-gritty aspect of correlation and network inference from gene expression data

2008 Aug 20; 3: 35. [Epub 2008 Aug 20]

Language: English. Country: Great Britain, England. Media: electronic.

Document type: Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't; Review

Grant support
R01 GM075299 NIGMS NIH HHS - United States
R21 GM079259 NIGMS NIH HHS - United States
T32 ES007271 NIEHS NIH HHS - United States
T32 ES 0072 NIEHS NIH HHS - United States

BACKGROUND: All currently available methods of network/association inference from microarray gene expression measurements implicitly assume that such measurements represent the actual expression levels of different genes within each cell included in the biological sample under study. Contrary to this common belief, modern microarray technology produces signals aggregated over a random number of individual cells, a "nitty-gritty" aspect of such arrays, thereby causing a random effect that distorts the correlation structure of intra-cellular gene expression levels.

RESULTS: This paper provides a theoretical consideration of the random effect of signal aggregation and its implications for correlation analysis and network inference. An attempt is made to quantitatively assess the magnitude of this effect from real data. Some preliminary ideas are offered to mitigate the consequences of random signal aggregation in the analysis of gene expression data.

CONCLUSION: Because they result from the summation of expression intensities over a random number of individual cells, the observed signals may not adequately reflect the true dependence structure of intra-cellular gene expression levels needed as a source of information for network reconstruction. Whether the reported effect is extreme or not, the important point is to recognize this source of signal variation and incorporate it for proper inference. The usefulness of inference on genetic regulatory structures from microarray data depends critically on the ability of investigators to overcome this obstacle in a scientifically sound way.

REVIEWERS: This article was reviewed by Byung Soo Kim, Jeanne Kowalski and Geoff McLachlan.
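The aggregation effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' model or code. It assumes, purely for illustration, that per-cell expression levels of two genes are independent and identically distributed bivariate Gaussian draws and that the number of cells N in a sample is independent of those levels. Under these assumptions the law of total covariance gives Cov(U, V) = E[N]·Cov(X, Y) + Var(N)·E[X]·E[Y] for the aggregated signals U and V. The function names and all parameter values are hypothetical.

```python
import numpy as np


def aggregated_correlation(mean_n, var_n, mu_x, mu_y, sd_x, sd_y, rho):
    """Closed-form correlation of U = sum(X_i) and V = sum(Y_i) over N cells.

    Assumes i.i.d. cells and N independent of the per-cell levels
    (an illustrative assumption, not a claim about the paper's model).
    """
    cov_uv = mean_n * rho * sd_x * sd_y + var_n * mu_x * mu_y
    var_u = mean_n * sd_x ** 2 + var_n * mu_x ** 2
    var_v = mean_n * sd_y ** 2 + var_n * mu_y ** 2
    return cov_uv / np.sqrt(var_u * var_v)


def simulate_aggregated_correlation(n_samples, n_low, n_high,
                                    mu_x, mu_y, sd_x, sd_y, rho, seed=0):
    """Monte Carlo estimate of the correlation between aggregated signals."""
    rng = np.random.default_rng(seed)
    totals = np.empty((n_samples, 2))
    for s in range(n_samples):
        # Random cell count, uniform on the integers n_low..n_high (assumed).
        n_cells = rng.integers(n_low, n_high + 1)
        z = rng.standard_normal((n_cells, 2))
        # Per-cell levels with intra-cellular correlation rho.
        x = mu_x + sd_x * z[:, 0]
        y = mu_y + sd_y * (rho * z[:, 0] + np.sqrt(1.0 - rho ** 2) * z[:, 1])
        # The array only "sees" the sums over all cells in the sample.
        totals[s] = x.sum(), y.sum()
    return np.corrcoef(totals[:, 0], totals[:, 1])[0, 1]


if __name__ == "__main__":
    # Two genes that are uncorrelated within each cell (rho = 0).
    n_low, n_high = 500, 1500
    mean_n = (n_low + n_high) / 2
    var_n = ((n_high - n_low + 1) ** 2 - 1) / 12  # discrete uniform variance
    theory = aggregated_correlation(mean_n, var_n, 10.0, 10.0, 3.0, 3.0, 0.0)
    observed = simulate_aggregated_correlation(500, n_low, n_high,
                                               10.0, 10.0, 3.0, 3.0, 0.0)
    print(f"closed form: {theory:.4f}, simulated: {observed:.4f}")
```

In this toy setting the variation in cell count dominates both aggregated signals, so two genes that are uncorrelated within a cell appear almost perfectly correlated across samples. When the cell count is fixed (Var(N) = 0), the closed form reduces to the intra-cellular correlation.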
