SHOGoiN CELLBLAST
Guide to CELLBLAST
What is CELLBLAST?
CELLBLAST is a system for searching gene expression databases for
cells similar to the query gene expression profile. The
similarity of two profiles is computed by comparing the order of
genes ranked by expression. Although this is a
simple measure, we have observed that it is sufficient to characterize
cell types across different next-generation sequencer platforms
(see also Example Results).
What characterizes cells?
Expression value ranges differ between platforms, making direct
comparison impossible. Given this situation, we use "gene expression
ranks" as a way to compare expression data across platforms.
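As a minimal sketch of why ranks help, the snippet below converts two made-up expression vectors, measured on very different scales, into ranks. The function name `expression_ranks`, the sample values, and the choice of average ranks for ties are all assumptions for illustration and are not taken from CELLBLAST itself.

```python
def expression_ranks(values):
    """Return 1-based ranks (1 = highest expression); ties share the average rank."""
    # Gene indices sorted from highest to lowest expression.
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend the run over all genes tied with the gene at position `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


# Hypothetical values for the same five genes on two platforms.
platform_a = [120.0, 5.0, 43.0, 0.5, 15.0]  # count-like scale (made up)
platform_b = [8.1, 3.0, 6.9, 0.2, 5.5]      # log-like scale (made up)

ranks_a = expression_ranks(platform_a)
ranks_b = expression_ranks(platform_b)
print(ranks_a, ranks_b)
```

Although the raw values are not comparable, both vectors yield the same gene ordering, which is the property the rank-based comparison relies on.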
Spearman's rank correlation coefficient
Spearman's method computes the correlation coefficient \(r\) between two sets of ranks as
\[ r = 1 - \frac{6 \sum_{i=1}^{n} D_i^2}{n(n^2 - 1)}, \]
where \(D_i\) is the difference between the two ranks of gene \(i\) and \(n\) is the number of genes used in the calculation.
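The rank-difference form of Spearman's coefficient can be sketched in a few lines. This is an illustrative implementation, not CELLBLAST's code; the function name `spearman_r` and the example ranks are hypothetical, and the closed form used here is exact only when there are no tied ranks.

```python
def spearman_r(ranks_x, ranks_y):
    """Spearman's r from two rank lists via r = 1 - 6*sum(D_i^2) / (n*(n^2 - 1)).

    Assumes both lists rank the same n genes and contain no ties.
    """
    n = len(ranks_x)
    if n != len(ranks_y) or n < 2:
        raise ValueError("rank lists must have the same length (at least 2)")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical ranks of five genes in a query profile and a database profile.
query_ranks = [1, 2, 3, 4, 5]
db_ranks = [2, 1, 3, 5, 4]

r = spearman_r(query_ranks, db_ranks)  # sum of D_i^2 is 4, so r = 1 - 24/120
print(r)
```

Identical orderings give \(r = 1\) and fully reversed orderings give \(r = -1\).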
Fisher's Z-transformation
The database profiles most similar to the query expression profile are ranked by their statistical significance,
based on a Z-test via Fisher's Z-transformation of the rank correlation coefficient.
The distribution of the Fisher's Z-transformed sample correlation coefficient \(z_r = \frac{1}{2}\ln\frac{1+r}{1-r}\) approximately follows the normal distribution with a mean
\(z_\rho\) and a standard deviation \(1/\sqrt{n-3}\) regardless of the size of \(n\).
In CELLBLAST, the \(z_\rho\) that appears in the standardization is
approximated by the average of the Z-transformed sample correlation coefficients \(z_r\).
Thus, CELLBLAST profile search is statistically robust even when the population correlation coefficient between query and database profiles is nonzero.
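The standardization described above can be sketched as follows. This is a hedged illustration under stated assumptions: the function names, the example correlations, and \(n = 103\) are made up, and the one-sided upper-tail p-value is an assumed convention, since this guide does not state how CELLBLAST reports significance.

```python
import math
from statistics import NormalDist, fmean


def fisher_z(r):
    """Fisher's Z-transformation, 0.5 * ln((1 + r) / (1 - r)); requires -1 < r < 1."""
    return math.atanh(r)


def standardized_scores(rs, n):
    """Standardize Z-transformed correlations of one query against many profiles.

    rs: sample rank correlations, one per database profile
    n:  number of genes used to compute each correlation
    """
    z_rs = [fisher_z(r) for r in rs]
    z_rho = fmean(z_rs)           # z_rho approximated by the average of z_r
    scale = math.sqrt(n - 3)      # inverse of the standard deviation 1/sqrt(n - 3)
    zs = [(z_r - z_rho) * scale for z_r in z_rs]
    # Assumption: one-sided test for "more similar than the typical profile".
    p_values = [1 - NormalDist().cdf(z) for z in zs]
    return z_rho, zs, p_values


# Hypothetical correlations between one query and four database profiles.
example_rs = [0.30, 0.40, 0.50, 0.85]
z_rho_hat, zs, p_values = standardized_scores(example_rs, n=103)
print(z_rho_hat, zs, p_values)
```

Because \(z_\rho\) is estimated from the data, a profile is scored as significant only if it is more similar to the query than the typical database profile, not merely more similar than zero correlation.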
Distributions of \(r\), \(z_r\), and \(z\)
The following figure shows a graphical abstract of the statistical evaluation in CELLBLAST and CellMontage (the previous version of CELLBLAST).
The histograms show the distributions of the sample correlation coefficient \(r\), the t-statistic \(t_r = r\sqrt{(n-2)/(1-r^2)}\),
the Z-transformed sample correlation coefficient \(z_r\), and the standardized Z-transformed sample correlation
coefficient \(z = (z_r - z_\rho)\sqrt{n-3}\), respectively,
for the case in which the query is a mouse lung cell sample (GSM1271921) and "SINGLECELL: all" and
"MF:transcription factor activity, protein binding" are selected as the database settings.
In CellMontage, the distribution of \(t_r\) does not follow the t-distribution when
the population correlation coefficient between query and database profiles is nonzero.
In CELLBLAST, however, the distribution of \(z_r\) approximately follows the normal distribution whose mean is
the estimated Z-transformed population correlation coefficient \(z_\rho=0.42\).
The standardized Z-transformed sample correlation coefficient \(z\) follows the
standard normal distribution.
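To make the contrast concrete, the sketch below scores a database profile whose correlation with the query sits exactly at the estimated baseline \(z_\rho = 0.42\). The value \(n = 103\) and the function names are hypothetical, chosen only for illustration; the guide does not state the number of genes used in the figure.

```python
import math


def t_statistic(r, n):
    """t_r = r * sqrt((n - 2) / (1 - r^2)), which tests against zero correlation."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def standardized_z(r, n, z_rho):
    """z = (z_r - z_rho) * sqrt(n - 3), which tests against the baseline z_rho."""
    return (math.atanh(r) - z_rho) * math.sqrt(n - 3)


n = 103       # hypothetical number of genes
z_rho = 0.42  # estimated baseline reported for the example query

# A profile that is only as similar to the query as the typical database profile.
r_typical = math.tanh(z_rho)
t_typical = t_statistic(r_typical, n)
z_typical = standardized_z(r_typical, n, z_rho)
print(t_typical, z_typical)
```

Under these assumptions the t-statistic is large even though the profile is merely typical, whereas the standardized \(z\) is essentially zero, which is the behavior described for the two systems.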
References

Fujibuchi W, Kiseleva L, Taniguchi T, Harada H, Horton P. "CellMontage: similar expression profile search server." Bioinformatics. 2007 Nov 15;23(22):3103-4.
Natalia Polouliakh, Tohru Natsume, Hajime Harada, Wataru Fujibuchi, & Paul Horton, "Comparative Genomic Analysis of Transcription Regulation Elements Involved In Human Map Kinase G-Protein Coupling Pathway", Journal of Bioinformatics and Computational Biology, 2006 Apr;4(2):469-82.
Wataru Fujibuchi, Larisa Kiseleva, Takeaki Taniguchi & Paul Horton, "Development of Cell Knowledge Base and Prediction of Cell Types and Characteristics by Gene Expression Profiles" (in Japanese), IPSJ SIG Technical Report 2005-BIO-2, pp. 33-37. 2005.
"GENE EXPRESSION PROFILE RETRIEVING APPARATUS, GENE EXPRESSION PROFILE RETRIEVING METHOD, AND PROGRAM" US patent [US_11/235150] 2005/09/27
Reality for finding homologous gene expression profiles, Fujibuchi, W. and Horton, P., poster presentation at BITS 2004, Oct. 30, at Kazusa DNA Research Institute, Chiba. http://www.kap.co.jp/bits2004/
CellMontage - Cell type retrieval system by gene expression profiles, Fujibuchi, W., oral presentation at AIST bioinformatics educational course symposium, 2004 Oct. 1.
Microarray analysis on many genes determine a cell type, Fujibuchi, W., poster presentation at ISMB 2004, Aug., in Glasgow. http://www.iscb.org/ismb2004/cgibin/posterabstracts.cgi
Development of similar cell search system, "Cell Montage" from gene expression profiles, Fujibuchi, W. and Horton, P., poster presentation at life science field research workshop, 2004 Feb. 3 (in Japanese).
Barrett T, Suzek TO, Troup DB, Wilhite SE, Ngau WC, Ledoux P, Rudnev D, Lash AE, Fujibuchi W, Edgar R. "NCBI GEO: mining millions of expression profiles--database and tools." Nucleic Acids Res. 2005 Jan 1;33(Database issue):D562-6.