Genome analysis entails predicting genes in uncharacterized genomic sequences. The 21st century has seen the announcement of the draft human genome sequence, and model organisms from both the plant and animal kingdoms have been sequenced as well.
However, genome annotation has not kept pace with genome sequencing. Experimental annotation is slow and labor-intensive, so there is a demand for computational tools for gene prediction.
Computational gene prediction is relatively simple for prokaryotes, where a protein-coding gene is an uninterrupted stretch of DNA that is transcribed into mRNA and then translated into protein. The process is more complex for eukaryotic cells, where the coding DNA sequence is interrupted by non-coding sequences called introns, which are removed from the transcript before translation.
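To illustrate why the uninterrupted case is the easier one, here is a minimal sketch of an open reading frame (ORF) scan. It is not a method described in this text: the function name `find_orfs`, the `min_codons` parameter, the restriction to the forward strand, and the use of the standard start codon ATG and stop codons TAA, TAG, TGA are all assumptions made for illustration. Real gene finders also examine the reverse strand and use statistical models.

```python
# Hypothetical helper: a naive ORF scan over the three forward reading frames.
# Assumes the standard start codon (ATG) and stop codons (TAA, TAG, TGA).
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) pairs for ORFs on the forward strand.

    Indices are 0-based and `end` is exclusive. Each ORF runs from the first
    ATG seen in a frame up to and including the next in-frame stop codon.
    `min_codons` counts every codon in the ORF, start and stop included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the ORF and look for the next ATG
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGGG"))
```

A scan like this fails on eukaryotic genes because an intron can shift the reading frame or introduce a stop codon in the middle of a gene, so the coding sequence no longer appears as one continuous ORF in the genomic DNA.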
Some of the questions biologists want to answer today are:
- Given a DNA sequence, which parts code for a protein and which parts are non-coding ("junk") DNA?
- Can the non-coding DNA be classified as introns, untranslated regions, transposons, pseudogenes ("dead genes"), regulatory elements, etc.?
- How can a newly sequenced genome be divided into genes (coding regions) and non-coding regions?
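The last question can be made concrete with a small sketch. Assuming gene predictions are already available as coordinate pairs, the non-coding regions are the gaps between them. The function name `noncoding_regions` and the 0-based, end-exclusive interval format are illustrative assumptions, not conventions taken from this text.

```python
# Hypothetical helper: derive non-coding intervals from predicted gene intervals.
def noncoding_regions(genome_length, genes):
    """Return the (start, end) gaps not covered by any gene.

    `genes` is an iterable of 0-based, end-exclusive (start, end) pairs.
    Overlapping or adjacent genes are handled by tracking the furthest
    gene end seen so far.
    """
    regions = []
    cursor = 0  # first position not yet covered by a gene
    for start, end in sorted(genes):
        if start > cursor:
            regions.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < genome_length:
        regions.append((cursor, genome_length))
    return regions


if __name__ == "__main__":
    print(noncoding_regions(100, [(10, 30), (25, 40), (60, 80)]))
```

The hard part of the problem is producing the gene intervals in the first place. This sketch only shows how the coding and non-coding partition follows once they are known.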
The importance of genome analysis can be understood by comparing the human and chimpanzee genomes. The two genomes differ by only about 2% on average, i.e. by roughly 160 enzymes. A complete analysis of both genomes would give strong insight into the mechanisms responsible for the differences between the two species.
The table below lists the estimated sizes of selected genomes and the number of genes in each.
|