2. Administration of EMBOSS

EMBOSS was conceived as an open platform for bioinformatics applications and their development after the libraries of the Wisconsin (GCG) package were made proprietary. It now supersedes that package in most areas. EMBOSS was designed to cater for professional bioinformaticians, who need simple, or at least well-documented, tooling that is easily interfaced with diverse systems. The programs are adaptable for use in different situations. AJAX Command Definition (ACD) files, written in plain text, define all application parameters and command line behaviour, such as default and permissible values, allowing applications to be customised and configured for different purposes without recompilation. Because the EMBOSS source code is freely accessible, where requirements go beyond what can be handled by changes to the ACD file, novel applications can be developed rapidly and with minimal overhead. There are no arbitrary limits on the amount of data that can be processed; the upper limit is determined by the available system memory.

The command line interface is powerful and consistent. Crucially, all user input, including command line parsing and user prompting, is handled automatically before the main application starts. This guarantees that a running application has values for all required options and will not prompt the user for more information part-way through a run, which makes the tools particularly suitable for scripting and for web interfaces. EMBOSS configuration files allow both installation-wide configuration and individual user-specific configurations. Environment variables allow the global behaviour of the programs, including paths to data and other files, to be set and controlled conveniently.
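
Because all input is resolved before an application starts, EMBOSS programs can be driven from a script without any interactive step. The following is a minimal Python sketch of that idea, not a definitive wrapper. It assumes the seqret program with its -sequence and -outseq qualifiers and the global -auto qualifier (which turns off prompting). The helper names build_emboss_command and run_emboss, the file names, and the extra_env argument are hypothetical and exist only for illustration.

```python
import os
import shutil
import subprocess


def build_emboss_command(program, options):
    """Return an argument list for an EMBOSS program with prompting disabled.

    `options` maps qualifier names (without the leading '-') to values.
    """
    argv = [program]
    for name, value in options.items():
        argv.extend([f"-{name}", str(value)])
    # Assumption: -auto is the global qualifier that suppresses prompts.
    argv.append("-auto")
    return argv


def run_emboss(program, options, extra_env=None):
    """Run an EMBOSS program, or return None if it is not installed.

    `extra_env` (hypothetical) holds environment variables to set for this
    run only, e.g. to override a data path for one job.
    """
    if shutil.which(program) is None:
        return None  # program not found on PATH; nothing is executed
    env = dict(os.environ)
    env.update(extra_env or {})
    return subprocess.run(
        build_emboss_command(program, options),
        env=env,
        capture_output=True,
        text=True,
        check=False,
    )


# Build (but do not run) a command that copies a sequence file.
# The file names are placeholders.
cmd = build_emboss_command(
    "seqret", {"sequence": "input.fasta", "outseq": "output.fasta"}
)
print(" ".join(cmd))
```

Keeping command construction separate from execution lets a script or web front end log or validate the exact command line before anything is run.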

Database integration in EMBOSS is flexible. EMBOSS supports a variety of local databases and access methods, including flat and indexed files such as EMBL flatfiles and some BLAST indices. Local utilities and database systems may be defined as access methods, and new databases and access methods are easily added. Retrieval of sequence data over the web is transparent, and remote servers (e.g. SRS and MRS) may be defined as databases, often giving the user the same access as a local database.

EMBOSS runs on almost every available UNIX platform, MS Windows XP, MS Windows Vista and Mac OS X. The applications are reliable and will hold up in demanding, high-throughput environments. Nightly compilation tests are performed on a variety of platforms, and quality assurance (QA) tests are run on all applications to check that they work as expected. Applications are also tested for memory usage to ensure that they do not leak or corrupt memory. Regular updates and fixes arising from these tests and from community bug reports are made available in a timely manner.