BOSC 2000

Telegraph: A Free Library For Probabilistic Sequence Analysis

Ian Holmes, Guy Slater and Ewan Birney

Ian Holmes <ihh@fruitfly.org>
Guy Slater <guy@ebi.ac.uk>
Ewan Birney <birney@ebi.ac.uk>

Hidden Markov Models (HMMs) have been used successfully for a wide range of applications in bioinformatics including protein domain classification, signal peptide recognition and gene prediction, among others. New developments such as Fisher kernels suggest that HMMs have plenty more to offer and that many ideas remain unexplored.

Many computational biologists express interest in trying out new kinds of HMM architecture for novel problems. In practice, this means writing or otherwise reusing code to handle:

  1. arbitrarily structured HMMs
  2. mathematical manipulation of likelihoods and their derivatives (likelihood calculus)
  3. training algorithms
  4. very fast dynamic programming routines for database searching

Several free software packages exist that can do some of these things, but the absence of a free package with all these capabilities means that experimenting with HMMs often necessitates writing a lot of specialised code.
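
To make items 1 and 4 of the list above concrete, here is a minimal sketch of the forward algorithm for a discrete HMM whose transition structure is supplied as data rather than hard-coded. It is an illustration only and is not Telegraph or Dynamite code: the function name, the dictionary layout and the two-state toy model with its probabilities are all assumptions made for this example, and a production search engine would work in log space or with scaling, typically in C.

```python
def forward_likelihood(states, start, trans, emit, sequence):
    """Return P(sequence) under a discrete HMM using the forward algorithm.

    Hypothetical data layout (an assumption for this sketch):
      start[s]    : probability of starting in state s
      trans[s][t] : probability of moving from s to t (missing entry = 0)
      emit[s][x]  : probability that state s emits symbol x
    The model has no explicit end state.
    """
    if not sequence:
        return 1.0
    # Initialisation: start in each state and emit the first symbol.
    f = {s: start.get(s, 0.0) * emit[s].get(sequence[0], 0.0) for s in states}
    # Recursion: sum over every predecessor state allowed by the topology.
    for x in sequence[1:]:
        f = {
            t: emit[t].get(x, 0.0)
            * sum(f[s] * trans.get(s, {}).get(t, 0.0) for s in states)
            for t in states
        }
    # Termination: sum over the state occupied at the final position.
    return sum(f.values())


# Toy two-state model; all numbers are made up for illustration.
states = ["AT_rich", "GC_rich"]
start = {"AT_rich": 0.5, "GC_rich": 0.5}
trans = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.2, "GC_rich": 0.8},
}
emit = {
    "AT_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "GC_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}

likelihood = forward_likelihood(states, start, trans, emit, "GGCA")
```

Because the topology lives in the `trans` dictionary, changing the architecture means changing data, not code. That separation is the kind of flexibility item 1 asks for, while item 4 concerns making the same recursion fast enough for database searches.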

Telegraph picks up where one previous Open Source project -- Dynamite -- left off. Begun as a collaboration between the EnsEMBL group in Cambridge and the Berkeley Drosophila Genome Project, Telegraph's primary goal was to formalise the functionality of Dynamite while adding capabilities essential for machine learning.

Telegraph's enhancements over Dynamite include supporting infrastructure for likelihood calculus (including training, priors, sampling, posterior probabilities and Fisher kernels) and a modular design in which an XML exchange format links the higher-level Perl object model to a fast C engine.

The BOSC and ISMB 2000 conferences coincide with Telegraph's "going public", and we strongly hope that potential developers, users and testers will contact us during or after the meetings. As an incentive, there will be a repeat of Dynamite's now-legendary offer of a bottle of champagne for the best bugfinder at each release.

As the number of biologists interested in experimenting with HMMs rises, and the computational demands of the latest algorithms exceed the capabilities of any one centre, there is great potential for a "lingua franca" in which sequence analysis tools can be described and exchanged. We very much hope that you will be interested in becoming involved in any capacity, whether as a user or a developer, and that you will contact one of the following Telegraph co-ordinators: