2.2. helloworld in EMBOSS

You probably write your programs to a standard pattern: you write a file of source code, compile it, then debug the resulting executable program. Finally, you run your debugged binary to achieve the task at hand.

There are a couple of additional steps when writing an EMBOSS program. The key difference is that, in addition to writing the source code, you must also write an ACD file for your new application. An ACD file contains a description of the command-line interface. It specifies exactly which input values are required and how to verify them, and what is output; it also controls the behaviour of the application on the command line, in particular the user input operations.

All of the parameters required for an application are prompted for before the application proper begins. The input values are read and held in memory, files are opened as required and so forth, so that all the parameters are available when the application proper starts. An EMBOSS application cannot ask the user for more information after several hours of processing!

It's good practice to write your ACD file before the source code because this forces you to think closely about the application inputs and outputs and exactly what's required from the user. You should then test the ACD file by using an EMBOSS application called acdc (more on this shortly). In addition, you must integrate your application into EMBOSS or EMBASSY.

So, the basic steps to writing your first EMBOSS application are:

  1. Write ACD file

  2. Test ACD file

  3. Write source code

  4. Integrate application into EMBOSS or EMBASSY (e.g. myemboss)

  5. Compile

Additionally, any software project involves several other steps: planning and design, debugging, testing and documentation.

These additional steps, summarised below, cannot normally be omitted for deployment in production environments. They are covered in detail later (Section 3.1, “EMBOSS Programming”).

2.2.1. Planning and Design

You should think carefully about the task at hand and plan or design your software before coding. Think about the inputs and outputs and the major logical steps in the source code. This is a very simple program, so a simple list of steps should suffice. In very broad terms the program needs to:

  1. Read and process the ACD file

  2. Print "Hello World!" to the screen

  3. Exit cleanly

Further suggestions for planning and designing software are given in Section 3.1, “EMBOSS Programming”.

2.2.2. Writing the ACD File

The input and output of helloworld are trivial. All the program has to do is print "Hello World!" to the screen, so nothing is required from the user. It's no surprise, then, that the ACD file is pretty sparse. As a minimum, though, every ACD file must contain an application definition with a documentation: attribute:

application: helloworld
[
    documentation: "Prints 'Hello World!' to the screen."
]

Every ACD file must contain an application definition, and this should come first in the file. The definition consists of the application: token, followed by the application name and a block of attributes held between square brackets. Each attribute is a name: value pair. The definition above contains a single documentation: attribute. The text should be a succinct description of the program and will be printed to the screen when the program is run. If the documentation: attribute is missing, a warning will be issued when you run the program.

Typically you will develop new code in a special EMBASSY package called myemboss that is reserved for applications that are not yet ready to be incorporated into the main EMBOSS or EMBASSY packages (see Section 3.1, “EMBOSS Programming”). Save your ACD file in the myemboss ACD directory:

.../myemboss/emboss_acd

If the program were to be added to EMBOSS itself, the directory would be:

.../emboss/acd

ACD files have a filename of the form ApplicationName.acd, where ApplicationName is the name of the application. The file extension .acd is mandatory. It's sensible (but not mandatory) that the filename (without the .acd extension) is identical to the name of the C source code file.

See the detailed information on the ACD syntax (Appendix A, ACD Syntax Reference) and ACD file development (Section 4.1, “Introduction to ACD File Development”).

2.2.3. Testing the ACD file

Special utilities (Section 4.6, “ACD Utilities”) are provided to help you test and validate your ACD files. The main one you'll need is called acdc (the ACD compiler) which, when given the name of an ACD file as the first argument on the command line, will parse the file, validate it, parse the command line and "run" the application command-line interface as if the application proper were running.

So, testing the ACD file is easy. You simply run acdc, giving your application name as an argument:

acdc ApplicationName

where ApplicationName is the name of the application. So, for helloworld:

% acdc helloworld

Prints 'Hello World!' to the screen.

%

acdc reads helloworld.acd and reads in any required data just as if the application itself were running. It will also check anything on the command line and report errors in exactly the same way as the real application. In this case there is no required data and nothing else on the command line. As acdc didn't report an error in the example above, we can assume all is well.

2.2.4. Writing the Source Code

Happy in the knowledge that you have a working ACD file, you can turn to the C source code itself, which should look something like this:

/* @source helloworld Prints "Hello World!" to the screen.
**
** @author: Copyright (C) Arthur Geek (ageek@ebi.ac.uk)
**                        
** @@
**
** This program is free software; you can redistribute it and/or
** modify it under the terms of the GNU General Public License
** as published by the Free Software Foundation; either version 2
** of the License, or (at your option) any later version.
** 
** This program is distributed in the hope that it will be useful,
** but WITHOUT ANY WARRANTY; without even the implied warranty of
** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
** GNU General Public License for more details.
** 
** You should have received a copy of the GNU General Public License
** along with this program; if not, write to the Free Software
** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
******************************************************************************/

#include "emboss.h"

/* @prog helloworld **********************************************************
**
** Prints "Hello World!" to the screen.
**
******************************************************************************/
int main(int argc, char **argv)
{  
      embInit("helloworld", argc, argv);

      ajFmtPrint("Hello World!\n");

      embExit();
      return 0;
}

There are three main parts to the helloworld.c file and, indeed, to all EMBOSS programs:

  • The standard EMBOSS application documentation header

  • #include statements

  • The application code proper

For helloworld.c the program itself consists of the main() function only, but most programs have other functions besides main(). In C (and EMBOSS code is no exception), every program must have a main() function.

The source begins with the standard EMBOSS header. This block of comments includes the name of the program and its short description, copyright notice, licence information, disclaimer, author name and contact details. The tags, for instance @source, allow EMBOSS to generate documentation automatically from the code. EMBOSS applications are licensed under the GNU General Public License, so these comments must be included in the source.

Next you have the preprocessor directive #include "emboss.h". Whereas #include <stdio.h> makes only the C standard input/output functions available, this imports the entire EMBOSS interface, i.e. it makes all the EMBOSS library calls available to you. This must be included at the start of every EMBOSS program.

In the EMBOSS version of helloworld, the filename emboss.h is surrounded by double quotes rather than angle brackets, which means that the preprocessor will look first in the current directory and then in any other directories defined in the configuration file emboss/Makefile.am.

emboss.h is the master include file. It includes all the other header files for the AJAX and NUCLEUS C programming language libraries. If you look inside the header files you'll see that eventually stdio.h is itself included:

% more nucleus/emboss.h 

#ifndef emboss_h
#define emboss_h

#include "ajax.h"
#include "ajgraph.h"
#include "embaln.h"
#include "embcom.h"
#include "embcons.h"
#include "embdbi.h"
.
.
% more ajax/core/ajax.h

#ifdef __cplusplus
extern "C"
{
#endif
#ifndef ajax_h
#define ajax_h

#include "ajarch.h"
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "ajassert.h"
#include "ajdefine.h"
#include "ajstr.h"
#include "ajtime.h"
#include "ajfile.h"
.
.

Note

The #include "emboss.h" statement is a directive for the C preprocessor. Any line beginning with a # character is processed by the C preprocessor before the source code is compiled. For example, the line #include "emboss.h" tells the preprocessor to replace that line with the whole text of the file emboss.h before compilation.

Following the preprocessor directive there is documentation for the main() function. Every function, including main(), must be documented. Undocumented code often has little value, unless it is self-explanatory. Even then it's often helpful, especially in larger programs, to document at least the main steps in the program logic.

EMBOSS uses a standard format for function documentation (see Appendix D, Code Documentation Standards). For now, all you need to know is that the @prog token is used for documenting the main() function. You have already seen @source and @author in the header documentation. These tokens are read by a program that parses the source code and automatically generates the documentation that goes on the web and into SRS.

The source code proper begins with the main() function. The int indicates that main() returns a value of type int: when helloworld finishes, it returns an integer status to the operating system. By convention, main() functions in C are of type int, and a return value of 0 (as in return 0; above) indicates success.

The program must be able to see the command line, so main() must accept it as parameters. This is done in the parameter list using int argc and char **argv. This is the standard way in C of passing the words on the command line to the program as an array of character strings. argc is the number of arguments (words on the command line separated by whitespace, counting the program name itself) and argv is the array of strings itself.

For example, let's pretend that helloworld took a parameter (it doesn't) and was invoked like this:

helloworld "Print this message"

then argv[0] would have the value "helloworld", argv[1] the value "Print this message" (the quotes make the shell pass the three words as a single argument) and argc the value 2.

Three calls to the EMBOSS libraries are made: embInit, ajFmtPrint and embExit.

A big advantage of EMBOSS programming is that you don't need to write your own code to process the command line (the argv array); this functionality is built into the function embInit. Every EMBOSS application must call this function, which handles all of the user input processing, right at the start of the application. embInit does the following:

  • Reads in local database definitions

  • Finds the right ACD file to use (the application name is "helloworld" so it looks for helloworld.acd in the ACD directory)

  • Reads the ACD file

  • Processes the command line (it uses argc and argv from main)

embInit handles all prompting of the user for values that are not entered on the command line, including functionality such as re-prompting the user for values that are out of range. If our ACD file were more complicated, and required a sequence as input and a file as output, for example, then by the time the call returned it would have read in the sequence and stored it in memory and also opened the output file.

ajFmtPrint is used to print text to the screen. ajFmtPrint is the EMBOSS version of the printf() C function which you'll know from the C stdio (standard input/output) library. embExit calls some internal clean-up and statistical routines.

Programming with most of the available library files is covered in detail elsewhere (see Section 6.2, “Programming Guides”).

2.2.5. Integration (Adding the Application to EMBOSS)

Once you have your C source code and an ACD file, you must add your application to myemboss (or EMBOSS itself) before you compile it. myemboss includes two files, both called Makefile.am, which together contain information about every C source file and ACD file known to the package. To add helloworld to myemboss you must therefore edit these files. Assuming you checked out the CVS version of EMBOSS into /home/auser/emboss, you'll have the following directories:

The 'executables directory' for C source files and executables:

/home/auser/emboss/emboss/embassy/myemboss/src

The 'acd directory' for ACD files:

/home/auser/emboss/emboss/embassy/myemboss/emboss_acd

The files you have to edit are:

/home/auser/emboss/emboss/embassy/myemboss/src/Makefile.am
/home/auser/emboss/emboss/embassy/myemboss/emboss_acd/Makefile.am

Were you adding the application to the main EMBOSS package, the files would be:

/home/auser/emboss/emboss/emboss/Makefile.am
/home/auser/emboss/emboss/emboss/acd/Makefile.am

The Makefile.am in the executables directory contains information about each C source file. First, you must add your program name to the bin_PROGRAMS list. This is usually done in alphabetical order. The before and after editing stages are shown below for EMBOSS but the edits are the same for myemboss.

Before editing. 

bin_PROGRAMS = aaindexextract abiview acdc antigenic \
...
garnier geecee getorf helixturnhelix hmoment \
...

After editing. 

bin_PROGRAMS = aaindexextract abiview acdc antigenic \
...
garnier geecee getorf helixturnhelix helloworld hmoment \
...

Important

When editing Makefile.am, the line continuation characters ('\') must be explicitly added to break the entries over more than one line.

Secondly, you must add your application source file to the SOURCES section. The line to add has the following general syntax:

ApplicationName_SOURCES = ApplicationName.c

where ApplicationName is the name of the application. This line should be added in alphabetic order. So, the appearance of the file would be as follows.

Before editing. 

...
geecee_SOURCES = geecee.c
getorf_SOURCES = getorf.c
helixturnhelix_SOURCES = helixturnhelix.c
hmoment_SOURCES = hmoment.c
iep_SOURCES = iep.c
infoalign_SOURCES = infoalign.c
...

After editing. 

...
geecee_SOURCES = geecee.c
getorf_SOURCES = getorf.c
helixturnhelix_SOURCES = helixturnhelix.c
helloworld_SOURCES = helloworld.c
hmoment_SOURCES = hmoment.c
iep_SOURCES = iep.c
infoalign_SOURCES = infoalign.c
...

The Makefile.am in the ACD directory contains information about each ACD file. All that needs to be done for this file is to add the name of the new ACD file. Again, it is usual to do this alphabetically. Here's what the file for the main EMBOSS package looks like:

Before editing.

pkgdata_DATA = codes.english \
        aaindexextract.acd abiview.acd ajbad.acd ajfeatest.acd ajtest.acd \
...
        garnier.acd geecee.acd getorf.acd helixturnhelix.acd hmoment.acd \
        histogramtest.acd iep.acd infoalign.acd infoseq.acd isochore.acd \
        lindna.acd listor.acd \
        marscan.acd maskfeat.acd maskseq.acd \
        matcher.acd

After editing.

pkgdata_DATA = codes.english \
        aaindexextract.acd abiview.acd ajbad.acd ajfeatest.acd ajtest.acd \
...
        garnier.acd geecee.acd getorf.acd helixturnhelix.acd helloworld.acd \
        hmoment.acd histogramtest.acd iep.acd infoalign.acd infoseq.acd \
        isochore.acd lindna.acd listor.acd \
        marscan.acd maskfeat.acd maskseq.acd \
        matcher.acd

Again, line continuation characters ('\') must be added explicitly.

2.2.6. Compilation

You compile the application by typing one of the following from the executables directory (.../myemboss/src):

make helloworld
make

The latter option may be slower, as it builds every out-of-date program in the directory and will sometimes (when library changes have been made) have to compile everything.

The GNU build tools recognise when a Makefile.am file has been edited and regenerate the corresponding Makefile when a make command is given. It is bad practice to edit the Makefile files themselves, as any changes would be lost when they are regenerated.

Here's the example for helloworld compiled in the main EMBOSS package:

% pwd
/home/auser/emboss/emboss/emboss/

% make helloworld
/bin/sh ../libtool --tag=CC --mode=link gcc  -O2 -Wall -fno-strict-aliasing   -o helloworld  helloworld.o 
../nucleus/libnucleus.la ../ajax/libajaxg.la ../ajax/libajax.la ../plplot/libplplot.la -L/usr/X11R6/lib 
-lX11  -lm -lgd -lpng -lz -lm

gcc -O2 -Wall -fno-strict-aliasing -o .libs/helloworld helloworld.o  ../nucleus/.libs/libnucleus.so 
../ajax/.libs/libajaxg.so ../ajax/.libs/libajax.so ../plplot/.libs/libplplot.so -L/usr/X11R6/lib -lX11 
-lgd -lpng -lz -lm -Wl,--rpath -Wl,/home/auser/emboss_test_installation_for_course/emboss/lib

creating helloworld

Finally, to run the program:

% helloworld
Prints 'Hello World!' to the screen.
Hello World!
%

2.2.7. Debugging

No debugging should be required in this case but larger programs will invariably contain bugs that need fixing before the application will run to completion, or even run at all. Debugging is covered in greater detail elsewhere (Section 3.3, “Debugging”).

2.2.8. Testing

Thorough testing is an essential part of software development. For EMBOSS this includes formal quality assurance tests that are run on a regular basis by the EMBOSS developers to ensure the applications work as anticipated. If you want to contribute your applications you will need to write these (see Chapter 7, Quality Assurance).

2.2.9. Documentation

You should ensure that the main() function is appropriately documented and the C source file includes the standard documentation block. helloworld is so simple it doesn't require end-user documentation other than the basics that are automatically generated from the source and ACD file.

More complex programs should be fully documented. This includes documentation in the code (see Appendix D, Code Documentation Standards), e.g. for datatypes and functions, and end-user documentation for the application as a whole (see Section 8.1, “Application Documentation Standards”).