Guide to CELLBLAST
What is CELLBLAST?
CELLBLAST is a system for searching gene expression databases for cells whose profiles are similar to a query gene expression profile. The similarity of two profiles is computed by comparing the order of genes ranked by expression. Although this is a simple measure, we have observed that it is sufficient to characterize cell types across different next-generation sequencer platforms (see also Example Results).
What characterizes cells?
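Expression value ranges differ between platforms, making direct comparison of raw values impossible. Given this situation, we use "gene expression ranks" as a way to compare expression data across platforms: ranks depend only on the order of genes within a sample, so they are unaffected by any monotonic rescaling of the values.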
Spearman's rank correlation coefficient
Spearman's method uses the correlation coefficient \(r\) between two sets of ranks:
\[ r = 1 - \frac{6 \sum_{i=1}^{n} D_i^2}{n(n^2 - 1)} \]
where \(D_i\) is the difference between the two ranks of gene \(i\) and \(n\) is the number of genes used in the calculation.
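As an illustration, the rank correlation can be sketched in a few lines of Python using the standard no-ties form \(r = 1 - 6\sum D_i^2 / (n(n^2-1))\). This is a minimal sketch, not the CELLBLAST implementation: the function names, the "rank 1 = highest expression" convention, and the assumption that no two genes have equal expression values are ours.

```python
def rank_genes(expression):
    """Rank genes by expression (1 = highest). Assumes no tied values."""
    order = sorted(range(len(expression)),
                   key=lambda i: expression[i], reverse=True)
    ranks = [0] * len(expression)
    for rank, gene_index in enumerate(order, start=1):
        ranks[gene_index] = rank
    return ranks


def spearman_r(query, target):
    """Spearman's r between two profiles measured over the same genes."""
    if len(query) != len(target):
        raise ValueError("profiles must cover the same genes")
    n = len(query)
    if n < 2:
        raise ValueError("at least two genes are required")
    query_ranks = rank_genes(query)
    target_ranks = rank_genes(target)
    # D_i is the rank difference of gene i between the two profiles.
    sum_d_squared = sum((q - t) ** 2
                        for q, t in zip(query_ranks, target_ranks))
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))


# Hypothetical profiles on very different scales (e.g. two platforms).
query_profile = [5.0, 120.0, 33.0, 0.5]
target_profile = [900.0, 15000.0, 4100.0, 20.0]
print(spearman_r(query_profile, target_profile))
```

Because only the gene order enters the calculation, the two hypothetical profiles above are treated as identical even though their value ranges differ by orders of magnitude.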
Database profiles are ranked by the statistical significance of their similarity to the query expression profile, using a Z-test on the Fisher Z-transformed rank correlation coefficient. The Z-transformed population correlation coefficient needed to standardize the test statistic is approximated by the average of the Z-transformed sample correlation coefficients.
The distribution of Fisher's Z-transformed sample correlation coefficient is approximately normal. Thus, the CELLBLAST profile search is statistically robust even when the population correlation coefficient between query and database profiles is non-zero.
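The ranking step described above might look roughly like the following Python sketch. It is not the CELLBLAST code. The text does not state the standard error or the sidedness of the test, so the sketch assumes the textbook standard error \(1/\sqrt{n-3}\) for Fisher's transformation and a one-sided upper-tail test; the function names and sample identifiers are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher's Z-transformation, z_r = atanh(r).

    math.atanh raises ValueError when |r| = 1.
    """
    return math.atanh(r)


def rank_hits(correlations, n_genes):
    """Order database samples by Z-score, most significant first.

    correlations: dict mapping sample name -> rank correlation r with the query
    n_genes:      number of genes used to compute each r (must exceed 3)
    """
    z_values = {name: fisher_z(r) for name, r in correlations.items()}
    # Population value z_rho approximated by the mean of the sample z_r values.
    z_rho = sum(z_values.values()) / len(z_values)
    # Assumption: textbook standard error of Fisher's z.
    std_err = 1 / math.sqrt(n_genes - 3)
    hits = []
    for name, z_r in z_values.items():
        z_score = (z_r - z_rho) / std_err
        # Assumption: one-sided upper-tail p-value under the standard normal.
        p_value = 0.5 * math.erfc(z_score / math.sqrt(2))
        hits.append((name, z_score, p_value))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


# Hypothetical correlations between a query and three database samples.
example = {"sample_A": 0.95, "sample_B": 0.90, "sample_C": 0.90}
for name, z_score, p_value in rank_hits(example, n_genes=103):
    print(name, round(z_score, 2), p_value)
```

Standardizing against the mean of the \(z_r\) values rather than against zero is what lets a hit stand out even when most database profiles already correlate positively with the query.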
Distribution of \(r\) and \(z_r\)
The left and right panels show the distributions of the sample correlation coefficient \(r\) and the Z-transformed sample correlation coefficient \(z_r\), respectively, for the case where the query is a human myoblast cell sample (GSM1268960) and "SINGLECELL: all" and "all genes" are selected as the database settings. The distribution of \(z_r\) is approximately normal, with its mean at the estimated Z-transformed population correlation coefficient \(z_\rho=1.84\).
Fujibuchi W, Kiseleva L, Taniguchi T, Harada H, Horton P. "CellMontage: similar expression profile search server." Bioinformatics. 2007 Nov 15;23(22):3103-4.
Natalia Polouliakh, Tohru Natsume, Hajime Harada, Wataru Fujibuchi, & Paul Horton, "Comparative Genomic Analysis of Transcription Regulation Elements Involved In Human Map Kinase G-Protein Coupling Pathway", Journal of Bioinformatics and Computational Biology, 2006 Apr;4(2):469-82.
Wataru Fujibuchi, Larisa Kiseleva, Takeaki Taniguchi & Paul Horton, "Development of Cell Knowledge Base and Prediction of Cell Types and Characteristics by Gene Expression Profiles" (in Japanese), IPSJ SIG Technical Report 2005-BIO-2, pp. 33-37. 2005.
"GENE EXPRESSION PROFILE RETRIEVING APPARATUS, GENE EXPRESSION PROFILE RETRIEVING METHOD, AND PROGRAM" US patent [US_11/235150] 2005/09/27
Reality for finding homologous gene expression profiles, Fujibuchi, W. and Horton, P., poster presentation at BITS 2004 Oct. 30 in Kazusa DNA research institute, Chiba. http://www.kap.co.jp/bits2004/
CellMontage - Cell type retrieval system by gene expression profiles, Fujibuchi, W., oral presentation at AIST bioinformatics educational course symposium, 2004 Oct. 1.
Microarray analysis on many genes determine a cell type., Fujibuchi, W., poster presentation at ISMB 2004 Aug. in Glasgow. http://www.iscb.org/ismb2004/cgi-bin/posterabstracts.cgi
Development of similar cell search system, "Cell Montage" from gene expression profiles., Fujibuchi, W. and Horton, P., poster presentation at life science field research workshop, 2004 Feb. 3(Japanese).
NCBI GEO: mining millions of expression profiles--database and tools.: Barrett T, Suzek TO, Troup DB, Wilhite SE, Ngau WC, Ledoux P, Rudnev D, Lash AE, Fujibuchi W, Edgar R., Nucleic Acids Res. 2005 Jan. 1;33 Database Issue:D562-6.