1.3. Developer Documentation

EMBOSS is richly documented. Depending on your experience and requirements you will want to approach it in different ways:

1.3.1. Application Documentation

You should familiarise yourself with the applications and get to know what has or hasn't been done already. Every EMBOSS application is well documented:

CVS (Developers) Release Documentation

http://emboss.open-bio.org/rel/dev/apps

Stable Release 6 Documentation

http://emboss.open-bio.org/rel/rel6/apps

1.3.2. Library Documentation

AJAX and NUCLEUS contain hundreds of library calls, which can be daunting at first. Documentation for both libraries is available on the EMBOSS website for the CVS (Developers) Release and for major versions of the Stable Release. It is derived from structured comments in the source code itself (see Appendix D, Code Documentation Standards). The documentation is easy to navigate, especially once you know the libraries well enough to guess which library file a function lives in.

1.3.2.1. AJAX Library Documentation

AJAX is the core library used by all EMBOSS applications. It covers standard data structures and algorithms:

CVS (Developers) Release Documentation

http://emboss.open-bio.org/rel/dev/libs

Stable Release 6 Documentation

http://emboss.open-bio.org/rel/rel6/libs

1.3.2.2. NUCLEUS Library Documentation

NUCLEUS provides higher-level functions specific to molecular sequence analysis:

CVS (Developers) Release Documentation

http://emboss.open-bio.org/rel/dev/libs/

Stable Release 6 Documentation

http://emboss.open-bio.org/rel/rel6/libs

1.3.3. Navigating the Libraries

It is easy to navigate the library documentation available from the EMBOSS homepage (http://emboss.open-bio.org/).

  • From the EMBOSS homepage, click on "AJAX" or "NUCLEUS".

This will bring up a table for the AJAX or NUCLEUS library.

Each row in the AJAX or NUCLEUS library table corresponds to an individual library file, e.g. for Alignments, Array handling, Assert Functions and so on. There are columns in the table for:

Library documentation

Links here bring up the library file documentation (see below) which references all the available objects (C data structures) and functions for that library file.

Short description

A short description of the library file.

Programming Guide

Links here bring up a detailed programming guide and usage notes for the library file, if available (see Section 6.2, “Programming Guides”).

Example application (C source code)

Links to the C source code for an example application that illustrates the use of the library, if available (see Section 6.1, “Demonstration Applications”).

Example application (ACD code)

Links to the ACD code for an example application (see Section 6.1, “Demonstration Applications”).

1.3.3.1. Library File Documentation

  • Find "String manipulation" in the table and follow the link under "Library documentation".

This will bring up the documentation available for string handling (ajstr.c/h library files).

The library file documentation includes the following sections:

Summary

A short description of the library file.

Description

A longer description of the library file.

Data structures

Table of names, short description and links to further information for each object (C data structure).

Description of Function Categories

Formal description of each function category in the library file, organised by object type.

Functions (organised by object and category)

Table of names, short description and links to formal description for each function in the library, organised by object type and function category.

Functions (alphabetic listing)

Table of names, short description and URL to a formal description for each function in the library, organised alphabetically.

Following a link in the tables of objects or functions brings up information on the objects and functions themselves (see below).

1.3.3.2. Function Documentation

The function documentation gives the information you need to call a function correctly.

The sections in the file are as follows:

Function Synopsis

This includes the function name, short description and the EMBOSS version number when it was first made available.

Function Prototype

The function prototype is given in standard C form.

Function Parameters Table

The function parameters are summarised in a table that groups them by their relationship to the function:

  • INPUT parameters are read by the function.

  • OUTPUT parameters are written by it.

  • UPDATE parameters may be read and written.

Returns

Description of return value(s).

Description

Full description of function.

Source Code

C source code of function.

Usage Example

A typical use of the function, generated automatically.

Notes

Peripheral documentation such as usage notes.

Warnings

Cautionary usage advice, known bugs etc.

Exceptions

Exception and other messages the function might generate.

Dependencies

External entities the function is dependent upon, for example, environment variables and files.

See Also

Links to functions in the same category.

Some of these fields may be blank. They will be filled in as documentation of the software libraries progresses.
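The three parameter classes listed in the Function Parameters Table can be illustrated with a small, self-contained C sketch. The function demo_count_gc and its parameter names are hypothetical and are not part of AJAX or NUCLEUS. The comment block only imitates the INPUT, OUTPUT and UPDATE labels; it is not the EMBOSS comment syntax (see Appendix D, Code Documentation Standards, for that).

```c
#include <assert.h>
#include <stddef.h>

/*
 * demo_count_gc (hypothetical, not an EMBOSS function)
 *
 * INPUT  seq    sequence text; read only, never modified
 * OUTPUT gc     set to the number of G and C characters in seq
 * UPDATE total  running total of characters; read, then added to
 *
 * Returns 1 on success, 0 if seq is NULL.
 */
int demo_count_gc(const char *seq, size_t *gc, size_t *total)
{
    size_t i;

    assert(gc != NULL && total != NULL);

    if (seq == NULL)
        return 0;

    *gc = 0;                    /* OUTPUT: any previous value is discarded */

    for (i = 0; seq[i] != '\0'; i++) {
        if (seq[i] == 'G' || seq[i] == 'C' ||
            seq[i] == 'g' || seq[i] == 'c')
            (*gc)++;
    }

    *total += i;                /* UPDATE: previous value is kept and extended */

    return 1;
}
```

A caller would pass the address of a size_t for gc without initialising it, but must initialise total before the first call, because an UPDATE parameter is read as well as written.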

1.3.3.3. Object (C Data Structure) Documentation

The objects are comprehensively described.

The sections are as follows:

Structure Synopsis

This includes the C data structure name, short description and EMBOSS version number when it was first made available.

Synopsis

Object synopsis (datatypes and variable names).

Data definitions

Definitions of datatypes for the object.

Description

Full description of object.

Elements

Description of elements in the data structure.

Functions

Functions that operate on the object.

Source Code

C source code of the data structure.

Usage Example

Typical usage example, generated automatically.

Notes

Peripheral documentation such as usage notes.

Warnings

Cautionary usage advice, known bugs etc.

See Also

Links to structures in the same library file.

Again, some fields may be blank and will be filled in as documentation of the software libraries progresses.

1.3.4. The Source Code

The source code is a vital reference. A simple method for searching the library or application code is to use the UNIX command grep to search the C source files for keywords. This is a convenient and direct way to find objects or functions quickly.

If you are unsure how to do a particular task, for example reading in a data file, then you should quickly be able to find a program that does something similar to what you need. Bear in mind there are many ways to solve a problem and the example you find might not necessarily be the best way.

There are two files (the C source code and the ACD file) to look at for each application. They're kept in the directories:

/home/auser/emboss/emboss/emboss/c
/home/auser/emboss/emboss/emboss/acd/

1.3.4.1. Navigating the Source Code using SRS

The source code (for the CVS (Developers) Release and the latest Stable Release) may be inspected directly and navigated using SRS. The library source code is indexed in SRS at the EBI SRS Server:

http://srs.ebi.ac.uk/

There are separate SRS databases for objects (C data structures) and for functions: EDATA and EDATAREL hold the objects, EFUNC and EFUNCREL the functions.

1.3.4.1.1. Searching EDATA

From http://www.ebi.ac.uk/srs/:

  1. Click on the Library Page tab at the top of the screen.

  2. Expand the Other databases section by clicking on the + to the left of Other databases.

    You will see EDATA, EDATAREL, EFUNC and EFUNCREL listed.

  3. Highlight the check-box next to EMBOSS Data Structures (CVS) and then click on the Query Form tab.

  4. Change one of the AllText options to ID and type a * character in its associated box, then click on Search.

You will see a list of every available object. Here is a more specific search:

  1. Return to the query form and replace the * by ajpstr (the AJAX string object).

  2. Click on Search.

You'll see that two entries are returned, AjPStr and AjPStrTok. Click on the link for AjPStr.

The documentation here is in several sections. The first three give the name, description and "aliases" of the object:

  • AjSStr is the name of the string object.

  • AjPStr is the datatype for the object pointer.

  • AjPPStr is the datatype for a pointer to the object pointer.

  • AjOStr is the datatype for the object proper.

Meaning of AjSStr, AjOStr, AjPStr

AjSStr is the formal name of the string object, AjOStr is the datatype name for the object, and AjPStr is the datatype name for the object pointer. In practice AjOStr (and every other AjO* datatype) is never used in EMBOSS. Instead, memory for an instance of the object is dynamically allocated and assigned to a pointer of type AjPStr. For this reason AjPStr is given after "Name" in SRS and, for the sake of brevity, "object" is often used to refer to an AjPStr (for example) when what is really meant is "object pointer". The use of objects and pointers is covered in depth elsewhere (Section 5.5, “Programming with Objects”).
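The naming scheme can be sketched in self-contained C. The Demo* names below are hypothetical stand-ins that imitate the S / O / P / PP prefixes; they are not the AJAX definitions, and the structure members are chosen for illustration only. The sketch also shows the point made above: the object is never declared directly, and memory for it is allocated and handed back through the pointer type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins that imitate the AjS / AjO / AjP / AjPP naming. */
typedef struct DemoSStr {       /* DemoSStr: formal name (struct tag)       */
    size_t Len;                 /* current length of the text               */
    char  *Ptr;                 /* heap-allocated, NUL-terminated text      */
} DemoOStr;                     /* DemoOStr: the object proper              */

typedef DemoOStr *DemoPStr;     /* DemoPStr: pointer to the object          */
typedef DemoPStr *DemoPPStr;    /* DemoPPStr: pointer to the object pointer */

/* Allocate an object on the heap and return the object pointer. */
DemoPStr demoStrNewC(const char *txt)
{
    DemoPStr thys;

    assert(txt != NULL);

    thys = malloc(sizeof(DemoOStr));
    if (thys == NULL)
        return NULL;

    thys->Len = strlen(txt);
    thys->Ptr = malloc(thys->Len + 1);
    if (thys->Ptr == NULL) {
        free(thys);
        return NULL;
    }
    memcpy(thys->Ptr, txt, thys->Len + 1);   /* copy text and its NUL */

    return thys;
}

/* Free the object and reset the caller's pointer to NULL. */
void demoStrDel(DemoPPStr Pstr)
{
    if (Pstr == NULL || *Pstr == NULL)
        return;

    free((*Pstr)->Ptr);
    free(*Pstr);
    *Pstr = NULL;
}
```

Application code written in this style only ever declares DemoPStr variables, which is why "object" and "object pointer" tend to be used interchangeably.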

EDATA and EDATAREL include links to functions that use each object, which is handy if you want to know what you can do with an object. The functions in EFUNC and EFUNCREL are organised into categories of related functionality that correspond to sections in the C source file (see Appendix D, Code Documentation Standards and below).

After the Alias(es) section you'll see several more blocks, which correspond to the function categories. Each block contains a list of available functions within that category. The categories you see will depend upon the library file, but might include:

  • Iterators - iteration, e.g. over individual characters in a string.

  • Constructors - create new instances of an object (allocate memory).

  • Destructors - destroy instances of an object (free memory).

  • Assignments - initialise an object, replace contents if necessary.

  • Modifiers - change or replace the contents of an object.

  • Operators - use, but do not change, the contents of an object.

  • Outputs - write the contents of an object to an external file.

  • Casts - convert an object into an object or data of another type.
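As a rough illustration of how these categories divide up the functions for one object, here is a self-contained C sketch built around a hypothetical range object. None of the Demo* names exist in AJAX or NUCLEUS, and iterators and outputs are left out to keep the example short.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical object: an inclusive range of positions. Not an AJAX type. */
typedef struct DemoSRange {
    int Start;
    int End;
} DemoORange;
typedef DemoORange *DemoPRange;

/* Constructor: allocate a new, zeroed instance. */
DemoPRange demoRangeNew(void)
{
    return calloc(1, sizeof(DemoORange));
}

/* Destructor: free the instance and clear the caller's pointer. */
void demoRangeDel(DemoPRange *Prange)
{
    if (Prange == NULL || *Prange == NULL)
        return;
    free(*Prange);
    *Prange = NULL;
}

/* Assignment: initialise the object, allocating it first if necessary. */
int demoRangeAssign(DemoPRange *Prange, int start, int end)
{
    assert(Prange != NULL);
    if (*Prange == NULL && (*Prange = demoRangeNew()) == NULL)
        return 0;
    (*Prange)->Start = start;
    (*Prange)->End   = end;
    return 1;
}

/* Modifier: change the contents of an existing object. */
void demoRangeShift(DemoPRange range, int offset)
{
    range->Start += offset;
    range->End   += offset;
}

/* Operator: use the contents without changing them. */
int demoRangeLen(const DemoORange *range)
{
    return range->End - range->Start + 1;
}

/* Cast: convert the object into data of another type (here, text). */
int demoRangeToText(const DemoORange *range, char *buf, size_t size)
{
    return snprintf(buf, size, "%d..%d", range->Start, range->End);
}
```

Note how the constructor, destructor and assignment take or return the object pointer (or its address) because they may allocate or free memory, whereas the operator and cast take a read-only pointer.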

At the bottom of the page you'll see the following sections:

  • Attributes lists the elements of the C data structure.

  • Body gives the C code for the object definition.

1.3.4.1.2. Searching EFUNC

The EFUNC database can be searched directly. This is useful if you know the kind of function you want but don't know its name. Function names, and the names and order of function parameters, have been standardised (see Appendix D, Code Documentation Standards) to be intuitive and consistent.

Let's assume you want to search for a function that appends one string to another:

  1. Return to the SRS databases page, uncheck the EDATA database and check the check-box for the EFUNC database.

  2. Select the query form.

  3. It's often best to limit the search to the description field so as to retrieve more specific matches. So:

    Change AllText to Description

  4. Type append & string into the associated box, then click on Search.

A list of functions will appear. You can only use the functions that begin with aj or emb, which are the public functions in the AJAX and NUCLEUS libraries respectively. The others are hidden functions, used by the internals of EMBOSS and not intended for general use.

From looking at the names, the functions you need are those in the ajStrAppend* family. You'll see that some of the functions accept other string objects, character strings or just single characters.

This search method is of course limited by the vocabulary used in the function descriptions. For instance, the term "append" is used rather than "catenate". You can see this for yourself by repeating the above search using catenate & string.

To show the advantage of limiting the search:

  • Change the Description field back to AllText and repeat the append & string query.

You'll see that there is a significant amount of noise in the results list.

You can also use SRS when you already know the name of a function and want to examine its source code.

  1. Return to the EFUNC page and change AllText to ID.

  2. Now use ajstrappend as the search term. Perform the search and then click on EFUNC:ajStrAppendS.

You should see the source code for ajStrAppendS on screen. Again, the output is in several sections. The name of the function indicates the library file in which it is found; the Str in ajStrAppendS indicates the ajstr library. The Description field holds the text that is matched when you search on Description.

The most useful fields for a user of the library are Input, Returns and Prototype.

The Input field shows that this function takes the address of a string object pointer as its first parameter and a string object pointer per se as its second parameter. The Returns field shows, as expected, the return value of the function (AjBool, a boolean value). All this information is given at-a-glance in the Prototype field for the function (the prototypes are included in the library code so you don't need to declare them in your applications). A prototype tells the compiler what a function is expecting and what it will return.
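To make the shape of such a prototype concrete, the following self-contained C sketch defines a hypothetical append function with the same pattern of parameters: the address of a string object pointer first, a read-only string object second, and a boolean return. The Demo* types and demoStrAppendS are stand-ins written for this example; they are not the AJAX source code, which you can read in SRS as described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef int DemoBool;               /* stand-in for a boolean return type */

typedef struct DemoSStr {
    size_t Len;                     /* length of the text, excluding NUL  */
    char  *Ptr;                     /* heap-allocated text, or NULL       */
} DemoOStr;
typedef DemoOStr *DemoPStr;

/* Hypothetical prototype: target passed by address, source read only. */
DemoBool demoStrAppendS(DemoPStr *Pstr, const DemoOStr *str);

DemoBool demoStrAppendS(DemoPStr *Pstr, const DemoOStr *str)
{
    DemoPStr thys;
    char *grown;
    size_t addlen;

    assert(Pstr != NULL);

    if (str == NULL)
        return 0;

    /* Create the target on demand; this is why its address is passed. */
    if (*Pstr == NULL) {
        *Pstr = calloc(1, sizeof(DemoOStr));
        if (*Pstr == NULL)
            return 0;
    }
    thys = *Pstr;

    /* Record the source length first, in case str and *Pstr are the same. */
    addlen = str->Len;

    grown = realloc(thys->Ptr, thys->Len + addlen + 1);
    if (grown == NULL)
        return 0;
    thys->Ptr = grown;

    if (addlen > 0)
        memcpy(thys->Ptr + thys->Len, str->Ptr, addlen);

    thys->Len += addlen;
    thys->Ptr[thys->Len] = '\0';

    return 1;
}
```

Because the function may allocate the target or move its text to a larger buffer, the caller passes &target rather than target, and checks the boolean return before using the result.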

Below the prototype is the body of the function, which contains its source code. C language reserved words are highlighted in red. Calls to other EMBOSS functions are marked up as links, so you could click on, for example, ajFatal and see the code for that function. Unhighlighted function calls are standard C library calls.

Clicking on the red arrow on the prototype line will show all the EMBOSS functions that use this particular function. Clicking on the blue arrow will show all the EMBOSS functions that are called by this particular function.

As an EMBOSS application programmer you really don't need to know most of the detailed information above, just the inputs and returns. As a library developer, all the information is useful.

1.3.5. Demonstration Applications

EMBOSS includes, for certain AJAX and NUCLEUS library files, an application which illustrates the correct usage of the common functions. Currently, these "demonstration applications" are kept in the myembossdemo package and have the prefix "demo". There is an ACD file for each application. For example, the following files illustrate the use of the string library:

/home/auser/emboss/emboss/embassy/myembossdemo/emboss_src/demostring.c
/home/auser/emboss/emboss/embassy/myembossdemo/emboss_acd/demostring.acd

For information on compiling and using these applications see Section 3.1, “EMBOSS Programming”.

1.3.6. Programming Guides

Programming guides (Section 6.2, “Programming Guides”) are available for most AJAX sub-libraries. They summarise the available C data structures and functions and give examples of their use. They are very useful if you want to learn all about a particular area of EMBOSS programming.

1.3.7. AJAX Command Definition (ACD) Developers Guide and Syntax

Every EMBOSS application has an AJAX Command Definition (ACD) file which contains a complete definition of the command line interface and defines all the information the application needs to run. A single library function call from the application source code parses the ACD file and command line and prompts the user for any values still needed.

ACD files are written in the ACD syntax (Appendix A, ACD Syntax Reference) which defines a set of datatypes available to the applications, attributes for qualifying the datatypes, and much more besides. To develop new applications you will need to master ACD programming (see Chapter 5, C Programming).

1.3.8. C Coding Standards and Guidelines

To ensure consistency, all code should conform to a basic style and standards. You should familiarise yourself with these C coding standards (Appendix C, C Coding Standards), most of which concern the layout of code.

1.3.9. Quality Assurance Guidelines

Various quality assurance (QA) tests are performed on the code and documentation to maintain the quality and integrity of the package. These include application test runs, compilation and memory leak tests, and validation of the structured documentation used for objects and functions.

All code should be thoroughly tested and new library code should be documented to the EMBOSS standard (see below) so that checks can be performed. QA testing is handled by the EMBOSS developers but there are ways to help; if you develop a new application you should also provide test data for it (see Chapter 7, Quality Assurance).

1.3.10. Code and Application Documentation Standards

Software without documentation often has little value whereas good documentation can enhance the usefulness of software immensely. All contributed code should be adequately documented. End-user documentation is also required for any new applications. To ensure consistency, the documentation should conform to a basic style and standards that are defined for the code (Appendix D, Code Documentation Standards) and the applications (Section 8.1, “Application Documentation Standards”).

1.3.11. EMBOSS Software Development Course

Hands-on courses in "Bioinformatics Software Development using EMBOSS" provide a good introduction to programming in EMBOSS, including all the steps to writing a basic bioinformatics application using the EMBOSS programming libraries. If you would like to attend or host a course then get in touch with the EMBOSS developers (emboss-bug@emboss.open-bio.org).