SHOGoiN CELLBLAST

Downloads

Installation
Requirements
A working installation of the Boost C++ libraries is required.

Standard Installation
On Linux, run the following commands:
$ ./configure
$ make
$ make install

Individual Installation
By default, "make install" installs all files under "/usr/local" (in "/usr/local/bin", "/usr/local/lib", etc.). To install under a different location, pass "--prefix" to "configure", for instance "--prefix=$HOME":
$ ./configure --prefix=$HOME

Running
Database file generation
Prepare a database file by concatenating gene expression files in CM format, as in the following example:

db.CM
>GSM1269135 |GPL16791 (HiSeq 2500)|H.sapiens|8000301-738 (Muscle, Myoblast_T24 (Skeletal muscle))
ENSG00000000003:469 ENSG00000000005:0 ENSG00000000419:0 ...
...
ENSG00000283122:0 ENSG00000283123:8 ENSG00000283125:0
>GSM1269137 |GPL16791 (HiSeq 2500)|H.sapiens|8000301-738 (Muscle, Myoblast_T24 (Skeletal muscle))
ENSG00000000003:237 ENSG00000000005:0 ENSG00000000419:45 ...
...
ENSG00000283122:0 ENSG00000283123:0 ENSG00000283125:0
>GSM1269130 |GPL16791 (HiSeq 2500)|H.sapiens|8000301-738 (Muscle, Myoblast_T24 (Skeletal muscle))
...
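
To make the layout of the example above concrete, here is a minimal Python sketch that reads such a file into memory. It is not part of CELLBLAST. It assumes, based only on the example, that each record starts with a header line beginning with ">" and is followed by one or more lines of whitespace-separated "geneId:value" tokens. The function name read_cm is hypothetical.

```python
import io


def read_cm(handle):
    """Yield (header, {gene_id: value}) for each record in a CM-format stream.

    Assumed layout (inferred from the example, not from a format spec):
    a header line starting with ">", then lines of "geneId:value" tokens.
    """
    header, values = None, {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, values
            header, values = line[1:].strip(), {}
        elif header is not None:
            for token in line.split():
                gene_id, sep, value = token.rpartition(":")
                if sep and gene_id:  # ignore tokens that are not "gene:value"
                    values[gene_id] = float(value)
    if header is not None:
        yield header, values


if __name__ == "__main__":
    demo = io.StringIO(
        ">GSM1269135 |GPL16791 (HiSeq 2500)|H.sapiens|8000301-738 "
        "(Muscle, Myoblast_T24 (Skeletal muscle))\n"
        "ENSG00000000003:469 ENSG00000000005:0 ENSG00000000419:0\n"
    )
    for header, values in read_cm(demo):
        print(header.split()[0], len(values))
```

Lines that do not start with ">" and appear before the first header are skipped by this sketch, so a record whose header is missing its ">" would be merged into the previous record.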

Generate the index file, "db_geneIds.txt".
$ ./genIndex.pl db.CM | sort | uniq > db_geneIds.txt
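
The output format of genIndex.pl is not shown here. The "sort | uniq" pipeline suggests that the index is a sorted list of unique gene IDs, one per line. Under that assumption only, the following Python sketch produces a comparable list from CM-format lines. The function name collect_gene_ids is hypothetical, and Python's sort order may differ from that of "sort" under some locales.

```python
def collect_gene_ids(lines):
    """Return the sorted unique gene IDs found in CM-format lines.

    Assumption: data lines hold whitespace-separated "geneId:value" tokens
    and header lines start with ">". This mimics what the
    "genIndex.pl | sort | uniq" pipeline appears to produce; it is not
    a reimplementation of genIndex.pl.
    """
    gene_ids = set()
    for line in lines:
        if line.startswith(">"):
            continue  # header line, carries no gene IDs
        for token in line.split():
            gene_id, sep, _value = token.rpartition(":")
            if sep and gene_id:
                gene_ids.add(gene_id)
    return sorted(gene_ids)


if __name__ == "__main__":
    import sys

    # Usage sketch: python collect_gene_ids.py < db.CM > db_geneIds.txt
    for gene_id in collect_gene_ids(sys.stdin):
        print(gene_id)
```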

Generate the binary database file, "db.bin".
$ ./runGerIndexer db_geneIds.txt < db.CM > db.bin

Run the CELLBLAST profile matcher.
Prepare a query file in CM format and run the CELLBLAST profile matcher as follows:
$ ./runGerMatcher db.bin query.CM > result.txt
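
The result file reports a Spearman's rank correlation coefficient and the number of genes used for matching. As a rough illustration of that statistic only, the following Python sketch computes Spearman's rho between two expression profiles over the genes they share. It is not CELLBLAST's implementation: restricting to shared genes is an assumption, the P-value is not computed, and all function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def spearman_on_shared_genes(query, reference):
    """Return (rho, number of shared genes) for two {gene_id: value} dicts."""
    shared = sorted(set(query) & set(reference))
    if len(shared) < 2:
        return float("nan"), len(shared)
    # Spearman's rho is the Pearson correlation of the rank vectors.
    rho = pearson(
        average_ranks([query[g] for g in shared]),
        average_ranks([reference[g] for g in shared]),
    )
    return rho, len(shared)
```

Ranking with averaged ties matters for expression data, where many genes share a value of 0.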

Example result
Columns: Sample ID, P-value, Spearman's rank correlation coefficient, number of genes used for matching, header information of the CM-format entry in the database file
GSM1901473 0 1.00 588 GSM1901473 |GPL11154 (HiSeq 2000)|H.sapiens|3110001010000000000000-020 (Pancreas, Alpha cell (Pancreatic islet))
GSM1901487 3.62266e-13 0.556442 588 GSM1901487 |GPL11154 (HiSeq 2000)|H.sapiens|3110001010000000000000-020 (Pancreas, Alpha cell (Pancreatic islet))
GSM1901493 7.22888e-12 0.544295 588 GSM1901493 |GPL11154 (HiSeq 2000)|H.sapiens|3110001010000000000000-020 (Pancreas, Alpha cell (Pancreatic islet))
GSM1901488 1.94848e-10 0.529727 588 GSM1901488 |GPL11154 (HiSeq 2000)|H.sapiens|3110001010000000000000-020 (Pancreas, Alpha cell (Pancreatic islet))
GSM1901458 3.72947e-10 0.526685 588 GSM1901458 |GPL11154 (HiSeq 2000)|H.sapiens|3110001010000000000000-212 (Pancreas, PP cell (Pancreatic islet))
GSM1901497 1.09923e-09 0.521478 588 GSM1901497 |GPL11154 (HiSeq 2000)|H.sapiens|3110001010000000000000-020 (Pancreas, Alpha cell (Pancreatic islet))
GSM1901464 1.62965e-09 0.519536 588 GSM1901464 |GPL11154 (HiSeq 2000)|H.sapiens|3110002050000000000000-090 (Pancreas, Duct cell (Pancreatic islet))
GSM1901519 3.53105e-09 0.515645 588 GSM1901519 |GPL11154 (HiSeq 2000)|H.sapiens|3110001010000000000000-026 (Pancreas, Beta cell (Pancreatic islet))
... ... ... ... ...
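
For downstream processing, the result rows above can be read with a short Python sketch such as the following. It assumes that the first four fields are separated by whitespace and contain no whitespace themselves, and that the rest of the line is the database header. The names Match and parse_result_line are hypothetical.

```python
from typing import NamedTuple, Optional


class Match(NamedTuple):
    sample_id: str
    p_value: float
    rho: float      # Spearman's rank correlation coefficient
    n_genes: int    # number of genes used for matching
    header: str     # header of the matching CM-format entry


def parse_result_line(line: str) -> Optional[Match]:
    """Parse one result row; return None for rows that are not data.

    Assumption: fields are whitespace-delimited and only the trailing
    header field may contain spaces.
    """
    fields = line.rstrip("\n").split(None, 4)
    if len(fields) < 5:
        return None
    try:
        return Match(
            fields[0], float(fields[1]), float(fields[2]), int(fields[3]), fields[4]
        )
    except ValueError:
        # Column titles or "..." filler rows are not numeric.
        return None


if __name__ == "__main__":
    # Usage sketch: list matches from result.txt whose rho is at least 0.5.
    with open("result.txt") as handle:
        for row in filter(None, map(parse_result_line, handle)):
            if row.rho >= 0.5:
                print(row.sample_id, row.rho, row.p_value)
```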