Genome Analysis

Regulation of gene expression is, at the molecular level, a matter of chemistry between DNA and proteins. Although remarkable advances have been made over the last two decades in gene identification using statistical and mathematical models that rely heavily on databases and computational protocols, a gene-finding model that directly captures the physicochemical properties intrinsic to DNA sequences and the chemistry of protein-DNA interactions has yet to be realized.

Working towards an ab initio physico-chemical model, christened ChemGenome, we constructed a three-dimensional vector for each trinucleotide (codon) from its hydrogen-bond energy, its stacking energy and a third parameter that we provisionally identify with groove potentials. When these codon vectors are added successively along a genome, the net orientation of the resultant vector differs significantly between gene and non-gene regions. The model works well for prokaryotic genomes and shows promise of universal applicability. Efforts to develop ChemGenome into a stand-alone gene-prediction algorithm are in progress.
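The vector walk described above can be sketched in a few lines of Python. This is a minimal illustration, not the ChemGenome implementation: the actual per-codon hydrogen-bond, stacking and groove values are not given here, so `placeholder_vector` builds hypothetical stand-ins (Watson-Crick hydrogen-bond counts, a crude G/C stacking count and a purine/pyrimidine term). The names `placeholder_vector`, `CODON_VECTORS`, `walk` and `orientation` are all hypothetical. Each component is centred over the 64 codons so that a random sequence has no built-in drift; that centring is a design choice of this sketch only.

```python
import math
from itertools import product

BASES = "ACGT"
HBONDS = {"A": 2, "T": 2, "G": 3, "C": 3}  # Watson-Crick H-bonds per base pair


def placeholder_vector(codon: str) -> tuple:
    """HYPOTHETICAL stand-in for a codon's (H-bond, stacking, groove) parameters."""
    x = sum(HBONDS[b] for b in codon)  # proxy for hydrogen-bond energy
    # crude stacking proxy: count G/C-G/C dinucleotide steps inside the codon
    y = sum(1.0 for a, b in zip(codon, codon[1:]) if a in "GC" and b in "GC")
    # purine (+1) / pyrimidine (-1) balance as a proxy for the groove term
    z = sum(1.0 if b in "AG" else -1.0 for b in codon)
    return (x, y, z)


# Centre each component over all 64 codons (assumption of this sketch).
_raw = {"".join(c): placeholder_vector("".join(c)) for c in product(BASES, repeat=3)}
_mean = [sum(v[i] for v in _raw.values()) / 64 for i in range(3)]
CODON_VECTORS = {c: tuple(v[i] - _mean[i] for i in range(3)) for c, v in _raw.items()}


def walk(seq: str, frame: int = 0) -> tuple:
    """Add codon vectors along one reading frame and return the resultant."""
    seq = seq.upper()
    total = [0.0, 0.0, 0.0]
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in CODON_VECTORS:  # skip codons containing N or other symbols
            for k in range(3):
                total[k] += CODON_VECTORS[codon][k]
    return tuple(total)


def orientation(v) -> tuple:
    """Polar angle theta and azimuth phi (degrees) of a 3-D vector."""
    r = math.sqrt(sum(c * c for c in v))
    if r == 0:
        return (0.0, 0.0)
    cos_theta = max(-1.0, min(1.0, v[2] / r))  # clamp against rounding error
    theta = math.degrees(math.acos(cos_theta))
    phi = math.degrees(math.atan2(v[1], v[0]))
    return (theta, phi)
```

For a candidate region one would compute `orientation(walk(region))` in each frame and compare the angles between coding and non-coding stretches; with these placeholder parameters the sketch only shows the mechanics of the walk and is not expected to reproduce the gene/non-gene separation reported for ChemGenome.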


ChemGenome 1.1 Accuracy

S.No. | NCBI_ID   | Species Name                            | Genes | TP# | FP# | SS#  | SP#  | CC#
------+-----------+-----------------------------------------+-------+-----+-----+------+------+-----
1     | NC_000117 | Chlamydia trachomatis                   | 463   | 458 | 4   | 0.98 | 0.99 | 0.98
2     | NC_000853 | Thermotoga maritima MSB8                | 641   | 619 | 3   | 0.96 | 0.99 | 0.96
3     | NC_000854 | Aeropyrum pernix K1                     | 561   | 532 | 7   | 0.94 | 0.98 | 0.93
4     | NC_000868 | Pyrococcus abyssi GE5                   | 632   | 630 | 241 | 0.99 | 0.63 | 0.49
5     | NC_000907 | Haemophilus influenzae                  | 955   | 953 | 7   | 0.99 | 0.99 | 0.99
6     | NC_000908 | Mycoplasma genitalium G-37              | 189   | 186 | 2   | 0.98 | 0.98 | 0.97
7     | NC_000909 | Methanocaldococcus jannaschii           | 720   | 708 | 9   | 0.98 | 0.98 | 0.97
8     | NC_000912 | Mycoplasma pneumoniae M129              | 243   | 241 | 2   | 0.99 | 0.99 | 0.98
9     | NC_000913 | Escherichia coli K12                    | 2759  | 175 | 659 | 0.63 | 0.72 | 0.39
10    | NC_000915 | Helicobacter pylori                     | 731   | 727 | 4   | 0.99 | 0.99 | 0.98
11    | NC_000916 | Methanobacterium thermoautotrophicum    | 719   | 711 | 4   | 0.98 | 0.99 | 0.98
12    | NC_000917 | Archaeoglobus fulgidus                  | 782   | 774 | 8   | 0.98 | 0.98 | 0.97
13    | NC_000917 | Archaeoglobus fulgidus DSM4304          | 782   | 774 | 8   | 0.98 | 0.98 | 0.98
14    | NC_000918 | Aquifex aeolicus VF5                    | 584   | 575 | 3   | 0.98 | 0.99 | 0.97
15    | NC_000921 | Helicobacter pylori strain J99          | 658   | 648 | 9   | 0.98 | 0.98 | 0.97
16    | NC_000922 | Chlamydophila pneumoniae CWL029         | 597   | 590 | 9   | 0.98 | 0.98 | 0.97
17    | NC_000948 | Borrelia burgdorferi B31 plasmid cp32-1 | 11    | 11  | 0   | 1.0  | 1.0  | 1.0
18    | NC_000949 | Borrelia burgdorferi B31 plasmid cp32-3 | 11    | 11  | 0   | 1.0  | 1.0  | 1.0
19    | NC_000950 | Borrelia burgdorferi B31 plasmid cp32-4 | 11    | 11  | 0   | 1.0  | 1.0  | 1.0
20    | NC_000951 | Borrelia burgdorferi B31 plasmid cp32-6 | 10    | 10  | 0   | 1.0  | 1.0  | 1.0

# True positives (TP): genes evaluated as genes.
# False positives (FP): non-genes evaluated as genes.
# True negatives (TN): non-genes evaluated as non-genes.
# False negatives (FN): genes evaluated as non-genes.
Number of actual positives (AP) = TP + FN. Number of actual negatives (AN) = FP + TN.
Number of predicted positives (PP) = TP + FP. Number of predicted negatives (PN) = TN + FN.
Sensitivity (SS) = TP / (TP + FN).
Specificity (SP) = TP / (TP + FP).
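The accuracy measures defined above can be computed with a small helper. This is a minimal sketch; the `Counts` class and the function names are hypothetical. SP follows the definition given here, TP / (TP + FP), which other fields often call precision or positive predictive value. The CC# column is not defined in the notes, so `correlation` assumes the standard Burset and Guigó correlation coefficient, CC = (TP·TN − FN·FP) / √(PP·PN·AP·AN), which is built from exactly the four totals listed above.

```python
import math
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical container for one genome's confusion counts."""
    tp: int
    fp: int
    tn: int
    fn: int


def sensitivity(c: Counts) -> float:
    return c.tp / (c.tp + c.fn)  # SS = TP / AP


def specificity(c: Counts) -> float:
    return c.tp / (c.tp + c.fp)  # SP = TP / PP, as defined in the notes


def correlation(c: Counts) -> float:
    """ASSUMED Burset-Guigo correlation coefficient; 0.0 if undefined."""
    ap, an = c.tp + c.fn, c.fp + c.tn
    pp, pn = c.tp + c.fp, c.tn + c.fn
    denom = math.sqrt(pp * pn * ap * an)
    return (c.tp * c.tn - c.fn * c.fp) / denom if denom else 0.0
```

If the Genes column is read as AP (an assumption), FN = Genes − TP; for Chlamydia trachomatis that gives `sensitivity(Counts(tp=458, fp=4, tn=0, fn=463 - 458))` = 458/463. TN is not reported in the table, so CC cannot be recomputed from these rows alone; the placeholder `tn=0` does not affect SS or SP.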

References

1. Dutta, S., Singhal, P., Agrawal, P., Tomer, R., Kritee, Khurana, E. and Jayaram, B. A Physico-Chemical Model for Analyzing DNA Sequences. J. Chem. Inf. Model. 2005.

2. Jayaram, B. Beyond the Wobble: The Rule of Conjugates. J. Mol. Evol. 1997, 45, 704-705.