This is the readme file for version 2.0 of ChemGenome. If you encounter any problem or error while downloading or running the program, please inform us at scfbio@scfbio-iitd.res.in.

About ChemGenome
ChemGenome is a physico-chemical method that accepts a DNA sequence in FASTA format and predicts genes based on the hydrogen-bonding energy, stacking energy and protein-nucleic acid interaction parameter of each trinucleotide (codon).
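Because ChemGenome reads its input genome in FASTA format, the Python sketch below shows one generic way to load such a file into plain sequence strings. It is not ChemGenome's own parser, and the function name read_fasta is hypothetical. It assumes the usual FASTA conventions: each record starts with a header line beginning with '>', followed by one or more sequence lines.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                # Close the previous record before starting a new one.
                records[header] = "".join(chunks).upper()
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks).upper()
    return records


example = """>toy_genome
atgaaacccggg
TTTTAA
"""
print(read_fasta(example))  # {'toy_genome': 'ATGAAACCCGGGTTTTAA'}
```

Sequence lines are joined and upper-cased so that later codon lookups do not depend on the case used in the file.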

ChemGenome is ab initio in nature and has been tested on 372 prokaryotic genomes. Averaged over 356,208 genes and an equal number of frame-shifted genes (non-genes), its sensitivity, specificity and correlation coefficient are 97.5%, 97.20% and 94.25%, respectively. The software can be accessed online at the following link:

www.scfbio-iitd.res.in/chemgenome/chemgenomenew.jsp
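The sensitivity, specificity and correlation coefficient quoted above come from counting correctly and incorrectly classified genes and non-genes. This readme does not spell out the formulas. The sketch below assumes the standard definitions: Sn = TP/(TP+FN), Sp = TN/(TN+FP), and the Matthews correlation coefficient for CC. The counts in the example are made up for illustration and are not ChemGenome results. The function name gene_prediction_metrics is hypothetical.

```python
import math


def gene_prediction_metrics(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float, float]:
    """Sensitivity, specificity and Matthews correlation coefficient.

    tp: genes predicted as genes      fn: genes missed
    tn: non-genes rejected            fp: non-genes predicted as genes
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, cc


# Hypothetical counts for 100 genes and 100 frame-shifted non-genes.
sn, sp, cc = gene_prediction_metrics(tp=90, fn=10, tn=85, fp=15)
print(f"Sn={sn:.1%} Sp={sp:.1%} CC={cc:.1%}")
```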
------------------------------------------------------------------------------------------------------------------------

Installing and Running ChemGenome
ChemGenome has been written and compiled in a Linux environment. The following instructions are for a Linux system.


Installation

To install ChemGenome, download the file Chemgenome2.0.tar from the website. The archive is 30 KB in size.

Copy the tar file into your current directory and extract it with the following command:
 	$ tar -xvf Chemgenome2.0.tar 

The extracted ChemGenome2.0 package contains three items: the Chemgenome2.0 executable, the data directory and readme.txt.

For ChemGenome2.0 to run properly, copy the data directory into your current directory before running Chemgenome2.0. After ChemGenome2.0 finishes, all result files are copied into the current directory.

Running

Run ChemGenome2.0 with the following parameters:
        $ ./Chemgenome2.0 <genome_file_name> <orf_length> <method> <Start Codon(ATG OR|AND CTG OR|AND GTG OR|AND TTG) > 

Arguments
Threshold value (<orf_length>): For a small genome, you can specify a lower
threshold value to find smaller genes. For a large genome, you can specify a
higher threshold value to weed out false positives.

Start codon: Specify the start codon(s) with which genes should begin: ATG,
CTG, GTG and/or TTG, as shown in the command above.

Method:
DNA Space: This method takes a complete or partial genome sequence of a
prokaryotic species in FASTA format as the input file. It searches for genes
based on the physico-chemical properties of double-helical deoxyribonucleic
acid (DNA).

Protein Space: This method takes the result generated by DNA space as its
input file and acts as a filter, based on the stereochemical properties of
protein sequences, to reduce false positives.

Swissprot Space: This method takes the result generated by protein space as
its input file and calculates the standard deviation of a query nucleotide
sequence (predicted gene sequence) from the Swissprot proteins, based on the
frequency of occurrence of amino acids. A threshold standard deviation is
chosen to keep false positives at a minimum and precision at a maximum.
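The threshold value and start codon arguments both act on open reading frames (ORFs). The sketch below shows a generic single-frame ORF scan. It assumes that an ORF runs from the first chosen start codon to the next in-frame stop codon (TAA, TAG or TGA), and that the threshold is a minimum length in nucleotides. This readme confirms neither detail, and find_orfs is a hypothetical name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, frame: int, start_codons: set[str], min_length: int) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) ORF coordinates in one forward frame.

    An ORF runs from the first start codon to the next in-frame stop codon,
    stop codon included. ORFs shorter than min_length nucleotides are dropped.
    """
    orfs = []
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon in start_codons:
            start = i  # open an ORF at the first start codon seen
        elif start is not None and codon in STOP_CODONS:
            end = i + 3
            if end - start >= min_length:  # the length threshold
                orfs.append((start, end))
            start = None  # look for the next start codon
    return orfs


genome = "ATGAAATAGCCCTTGAAACCCGGGTAA"
print(find_orfs(genome, 0, {"ATG"}, 0))          # [(0, 9)]
print(find_orfs(genome, 0, {"ATG", "TTG"}, 12))  # [(12, 27)]
```

In the second call, adding TTG as a start codon uncovers a new ORF, and the threshold of 12 discards the 9-nucleotide one. This mirrors how a higher threshold trades small genes for fewer false positives.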
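The DNA-space search uses the per-codon hydrogen-bonding, stacking and protein-nucleic acid interaction parameters described in the About section. The sketch below illustrates only the bookkeeping of turning an ORF into a single three-component descriptor by averaging those parameters over its codons. Averaging is our assumption, not a documented step of ChemGenome. The parameter values are hypothetical placeholders, not the values ChemGenome uses, and codon_profile is a hypothetical name.

```python
Vector = tuple[float, float, float]

# HYPOTHETICAL placeholder values for illustration only; these are NOT the
# hydrogen-bonding, stacking or protein-nucleic acid parameters used by ChemGenome.
PLACEHOLDER_PARAMS: dict[str, Vector] = {
    "ATG": (1.0, 2.0, 3.0),
    "AAA": (3.0, 2.0, 1.0),
    "TAA": (0.0, 0.0, 0.0),
}


def codon_profile(orf: str, params: dict[str, Vector]) -> Vector:
    """Average the per-codon (h, s, p) parameters over all complete codons of an ORF."""
    codons = [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]
    if not codons:
        raise ValueError("ORF is shorter than one codon")
    # zip(*...) groups the 1st, 2nd and 3rd parameter of every codon together.
    h, s, p = (sum(column) / len(codons) for column in zip(*(params[c] for c in codons)))
    return (h, s, p)


print(codon_profile("ATGAAA", PLACEHOLDER_PARAMS))  # (2.0, 2.0, 2.0)
```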
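The protein-space filter works on protein sequences, so each predicted gene must be translated first. This readme does not describe the stereochemical criteria, so the sketch below covers only the translation step. It uses the codon assignments of the standard genetic code, which bacteria also use, and reads the first codon as methionine, as happens for bacterial ATG, GTG, TTG and CTG start codons. CODON_TABLE and translate_orf are hypothetical names.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for the 1st, 2nd and 3rd base.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: aa for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_orf(orf: str) -> str:
    """Translate an ORF codon by codon, stopping at the first stop codon ('*').

    The first codon is read as Met (M), as bacterial start codons are.
    """
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE[orf[i:i + 3]]
        if aa == "*":
            break
        protein.append("M" if i == 0 else aa)
    return "".join(protein)


print(translate_orf("TTGAAACCCTAA"))  # MKP
```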
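This readme does not give the exact formula for the Swissprot-space standard deviation. One plausible reading, shown here only as an assumption, is the root-mean-square difference between the query protein's amino-acid percentages and a reference composition, compared against a cutoff. The uniform reference table and the threshold below are hypothetical placeholders, not Swissprot statistics or ChemGenome's cutoff. The function names are hypothetical too.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(protein: str) -> dict[str, float]:
    """Percentage frequency of each of the 20 standard amino acids."""
    counts = Counter(aa for aa in protein.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids in the sequence")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def composition_deviation(protein: str, reference: dict[str, float]) -> float:
    """Root-mean-square deviation (in percentage points) from a reference composition."""
    freqs = aa_frequencies(protein)
    squared = sum((freqs[aa] - reference[aa]) ** 2 for aa in AMINO_ACIDS)
    return math.sqrt(squared / len(AMINO_ACIDS))


# HYPOTHETICAL reference: 5% for every amino acid. Real Swissprot frequencies differ.
UNIFORM_REFERENCE = {aa: 5.0 for aa in AMINO_ACIDS}
THRESHOLD = 10.0  # HYPOTHETICAL cutoff, not ChemGenome's value

query = "MKPLLAAGEEVKRLIDSTW"
deviation = composition_deviation(query, UNIFORM_REFERENCE)
print(f"deviation = {deviation:.2f}, kept = {deviation <= THRESHOLD}")
```

A lower cutoff keeps only predictions whose composition looks protein-like, which is the trade-off between false positives and precision described above.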
---------------------------------------------------------------------------------------------------------------------------
Output of the Program

The online version produces graphical output. The downloadable version
creates the following output files in the tmp folder:
1. 1main_orfs - Genes predicted in 1st main reading frame
2. 2main_orfs - Genes predicted in 2nd main reading frame
3. 3main_orfs - Genes predicted in 3rd main reading frame
4. 1complementary_orfs - Genes predicted in 1st complementary reading frame
5. 2complementary_orfs - Genes predicted in 2nd complementary reading frame
6. 3complementary_orfs - Genes predicted in 3rd complementary reading frame
7. gene_sequences - Gene Sequences of the predicted genes
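The first six output files each correspond to one of the six reading frames of the double-stranded genome. The minimal sketch below shows how three main (forward) and three complementary (reverse-complement) frames can be derived. It assumes that frame n starts at offset n-1 on each strand, because this readme does not state ChemGenome's frame-numbering convention. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G, N kept as N)."""
    return seq.translate(COMPLEMENT)[::-1]


def split_codons(frame_seq: str) -> list[str]:
    """Split a sequence into complete codons, dropping a trailing partial codon."""
    return [frame_seq[i:i + 3] for i in range(0, len(frame_seq) - 2, 3)]


def six_frames(seq: str) -> dict[str, list[str]]:
    """Codons of the three main and three complementary reading frames."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"{offset + 1}main"] = split_codons(seq[offset:])
        frames[f"{offset + 1}complementary"] = split_codons(rc[offset:])
    return frames


for name, codons in six_frames("ATGCCCTAA").items():
    print(name, codons)
```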


Speed
The time taken by the program depends on the genome size and the speed of the system on which it is run. With Swissprot space, it usually takes 1-2 minutes for a 1 MB genome on a Pentium 4 (2.40 GHz CPU, 248 MB RAM).
