3.3. Debugging

Bugs are problems with code that cause it to crash or operate in an unexpected way. They arise through erroneous use of syntax (which is not always caught by compilers) and errors in the code logic. This section gives some practical hints for debugging EMBOSS code.

3.3.1. Direct Debugging

Very broadly, debugging proceeds in four stages:

  1. Fixing bugs that prevent the program from compiling

  2. Fixing bugs that cause the program to crash

  3. Fixing bugs that cause the program to operate incorrectly

  4. Fixing bugs that manifest under extensive test conditions

With experience, most bugs are obvious from visual inspection of the code. It is highly recommended that, before you compile your code, you read then re-read it until you're satisfied it should work as expected. Be scrupulous when writing the code itself. Avoid the temptation to code too quickly; the extra time spent avoiding errors in the first place will be very well rewarded later.

Tip

A simple and powerful debugging method is to use ajFmtPrint and fflush(stdout) statements to report the values of key variables at different stages of execution, allowing you to trace and identify problems. ajFmtPrint prints the variables, and fflush is called immediately afterwards to flush the output buffer. This matters because any output still held in the buffer when the program crashes is lost; calling fflush(stdout) ensures it reaches the screen first. Most bugs are easily squashed using this method.
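As a minimal sketch of this tracing pattern, the following uses the standard C printf as a stand-in for ajFmtPrint so that it compiles without the EMBOSS libraries. The function names trace_int and sum_to are hypothetical and exist only for illustration.

```c
#include <stdio.h>

/* Print a labelled value and flush at once so the line is not lost
 * if the program crashes later. Returns the number of characters
 * written. In EMBOSS code, ajFmtPrint would take the place of printf
 * (an assumption of this sketch; the helper itself is hypothetical). */
int trace_int(const char *label, int value)
{
    int n = printf("TRACE %s = %d\n", label, value);
    fflush(stdout);
    return n;
}

/* Example function instrumented with a trace statement in its loop. */
int sum_to(int limit)
{
    int total = 0;
    int i;

    for (i = 1; i <= limit; i++) {
        total += i;
        trace_int("total", total);   /* report the running total */
    }
    return total;
}
```

Calling sum_to(4) prints one trace line per iteration, so a wrong running total can be spotted at the step where it first appears.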

When writing your code, there will be many stages where you know in advance what value(s) a variable should (or should not) have, particularly when checking a function's arguments or return values. For instance, the value of a pointer used by a function should in most cases not be NULL. At all such places, at least in early versions of the code, code should be added to trap errors and raise appropriate warnings, alerting you to potential bugs before they manifest.
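A check of this kind might look like the sketch below. It is written in plain C and reports the problem with fprintf to stderr; in EMBOSS code the warning would more likely be raised with one of the ajmess.h functions such as ajWarn. The function name checked_length is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Return the length of seq, or -1 after printing a warning if the
 * pointer is NULL. The NULL test traps a likely bug before it can
 * cause a crash inside strlen. */
long checked_length(const char *seq)
{
    if (seq == NULL) {
        fprintf(stderr, "Warning: checked_length called with NULL pointer\n");
        return -1;
    }
    return (long) strlen(seq);
}
```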

AJAX functions in ajmess.h provide various levels of error handling. Each of the following formats and outputs a message (provided as a string):

ajUser

Report an informative message

ajWarn

Report a warning message

ajErr

Report an error message

ajExit

Report a message then exit

ajDie

Report a message then crash (kill) the application

ajDebug

Report a general debugging message to the file programname.dbg if the switch -debug was given on the command line

Messages go to stdout or stderr (in both cases usually the screen) or, in the case of ajDebug, to the file programname.dbg. The EMBOSS code makes extensive use of ajDebug so that bugs reported by users can easily be traced. The typical way to debug an application is therefore to produce a debug trace using -debug. In practice, much of the error-trapping code can be commented out or removed for speed once the code has been extensively tested. This is especially true of library function code, where speed is paramount.
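To illustrate the idea of a debug trace that is only written when debugging is switched on, here is a small stand-alone sketch. It is not the AJAX implementation of ajDebug: the names debug_open, debug_msg and debug_close are hypothetical, and the only behaviour borrowed from the text is that messages go to a file called programname.dbg when debugging is enabled.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for an ajDebug-style logger. */
static int   debug_enabled = 0;     /* would be set when -debug is given */
static FILE *debug_file    = NULL;

/* Turn debugging on and open "<progname>.dbg". Returns 1 on success. */
int debug_open(const char *progname)
{
    char path[256];

    snprintf(path, sizeof path, "%s.dbg", progname);
    debug_file = fopen(path, "w");
    debug_enabled = (debug_file != NULL);
    return debug_enabled;
}

/* Write a formatted message only when debugging is on.
 * Returns 1 if the message was written, 0 if it was skipped. */
int debug_msg(const char *fmt, ...)
{
    va_list args;

    if (!debug_enabled)
        return 0;
    va_start(args, fmt);
    vfprintf(debug_file, fmt, args);
    va_end(args);
    fflush(debug_file);              /* keep the trace complete after a crash */
    return 1;
}

/* Close the debug file and switch debugging off again. */
void debug_close(void)
{
    if (debug_file != NULL)
        fclose(debug_file);
    debug_file = NULL;
    debug_enabled = 0;
}
```

Because debug_msg returns immediately when debugging is off, calls can be left in the code at little cost, which is the same reason ajDebug calls can remain in released EMBOSS code.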

For the vast majority of applications, ajDebug and the other functions above will suffice. In special circumstances, however, you might need to write your own exception handling functions. The AJAX library file ajmess is provided for this (see Section 6.23, “Handling Exception Messages”).

3.3.2. AJAX Debugging Functions

Most AJAX library files include functions for debugging the code in that library file. These usually call ajDebug. Some libraries provide more comprehensive debugging functions than others, but typically functions are provided to report on the internal state of data structures defined in that library file. Generally, debugging functions are organised under their own "Debug" section in the library C source code and online documentation. In some cases special debugging files are provided. For example, there is a "debug" output sequence format used when debugging sequence output, and a "trace feature table" used when debugging report formats.

For more information, see the library documentation for AJAX and NUCLEUS. See also the programming guides (Section 3.1, “EMBOSS Programming”) for individual library files.

3.3.3. Controlling Debugging Behaviour

The default behaviour of EMBOSS is not to report debugging information generated by calls to ajDebug. The global command line qualifier -debug can be used to turn debugging on for any EMBOSS application. For example, if you think you have found a bug when the following command is issued:

seqret sequence.seq

then debugging can be turned on as follows:

seqret sequence.seq -debug

This will create a debug file called seqret.dbg. Debugging could be explicitly turned off by prefixing the qualifier name with no:

seqret sequence.seq -nodebug

but there's normally no need to do this as the default is false (no debugging) anyway. It could however be useful if debugging was turned on by default in the EMBOSS configuration files or by an environment variable. Debugging can be globally switched on using the EMBOSS environment variable:

EMBOSS_DEBUG

If this is set to TRUE, all programs act as if they have -debug set on the command line. They create a file called programname.dbg containing debugging information.

Logging of the processing of the EMBOSS configuration files emboss.default and .embossrc can be turned on using:

EMBOSS_NAMDEBUG

This processing takes place before the -debug command line switch is processed. The functions that are called are described in ajnam.c. The debugging information includes:

  • A report of all defined databases, variables and environment variables

  • A report of defined attributes for a database definition

  • A report of defined attributes for a resource definition

3.3.4. Debuggers

Some bugs might not be obvious from visual inspection or easily traced using ajFmtPrint and fflush(stdout). For these, it is worth using specialised debugging software. A debugger executes the bugged program and traces its internal state to allow problems with the code to be rapidly identified and fixed. The functions available depend on the debugger used. Most give control over the program, allowing it to be executed in a stepwise manner, variables to be given values and so on, providing a quicker and more reliable means of stepping through the logic of the code than doing this mentally. Most debuggers should provide at least the following information:

  • The line of code and statement the program crashed on

  • If the error occurred within a function, the line the function was invoked from and the arguments

  • The values of variables (local to a function or global variables) at a particular point during execution of the program

  • The result of a particular expression in a program

The most popular UNIX debugger is GDB, the GNU debugger (see http://sourceware.org/gdb/). It includes powerful features for tracing and altering the execution of a program, but you'll only need to use the very basics for it to be extremely useful.

If you intend using GDB to debug EMBOSS code, it's necessary to configure the package using:

--enable-debug

before you build the package (see Section 1.2, “Installation of CVS (Developers) Release”).
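The basics are easiest to try on a small function first. The sketch below is a plain C example, not EMBOSS code, and the function name count_gc and the file name demo.c are hypothetical. The comment lists standard GDB commands that cover the kinds of information described above.

```c
/* After adding a main() that calls count_gc, build with debugging
 * symbols, for example:
 *     gcc -g -O0 demo.c -o demo
 * A basic GDB session might then use:
 *     gdb ./demo
 *     (gdb) break count_gc    stop when the function is entered
 *     (gdb) run               start the program
 *     (gdb) backtrace         show the calling line and the arguments
 *     (gdb) next              execute the next line
 *     (gdb) print n           show the current value of a variable
 *     (gdb) print seq[i]      evaluate an expression
 */
#include <stddef.h>

/* Count the G and C characters in a sequence string. */
int count_gc(const char *seq)
{
    int n = 0;
    size_t i;

    for (i = 0; seq[i] != '\0'; i++) {
        if (seq[i] == 'G' || seq[i] == 'C')
            n++;
    }
    return n;
}
```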

Bear in mind that the output of debuggers cannot always be entirely trusted. Rare cases can arise where the behaviour of the executable compiled for debugging is subtly different to the standard executable. For this reason, ajFmtPrint and fflush statements (see above), which are less invasive as far as the executable is concerned, remain the tool of last resort.

3.3.5. Tracing Memory Problems

Some bugs evade identification by direct means or debuggers. Such bugs are usually caused by serious programming errors including the invalid use of pointers, memory corruption, memory leaks, or invalid memory access (for instance, reading or writing beyond the bounds of an array). These errors can be extremely difficult to trace but arise easily in C programming, especially where pointers are used extensively. Fortunately, there are several excellent programs available that can trap and identify such problems. Even if your code appears to be working correctly, it is well worth running a memory checker to confirm that memory is not being violated. Doing so can avoid hours of frustration later on and will help ensure your code remains stable in different use cases. Popular programs include:

Valgrind (http://valgrind.org/)

A suite of tools for debugging and profiling Linux programs. It detects many memory management and threading bugs. It also performs profiling which helps you to identify ways to speed up and reduce memory usage of your programs. Valgrind runs on X86/Linux, AMD64/Linux, PPC32/Linux and PPC64/Linux.

PurifyPlus

Commercial software for memory corruption detection, memory leak detection, application performance profiling and code coverage analysis.

Insure++

Powerful commercial software checking the integrity of memory usage and detecting potential defects and inefficiencies in memory usage.
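The following sketch shows the kind of code where such tools are most useful. It is plain C rather than EMBOSS code, and the function name copy_string is hypothetical. The version shown is correct; the comments point out the two mistakes a memory checker would typically report if they were made.

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of src, or NULL if allocation fails.
 * The caller must free the result; forgetting to do so is a memory
 * leak that a memory checker reports when the program exits. */
char *copy_string(const char *src)
{
    size_t len = strlen(src);
    /* Allocate len + 1 bytes to leave room for the terminating '\0'.
     * Allocating only len bytes would make the copy below write one
     * byte out of bounds, which often goes unnoticed in normal runs. */
    char *copy = malloc(len + 1);

    if (copy == NULL)
        return NULL;
    memcpy(copy, src, len + 1);
    return copy;
}
```

With Valgrind, a program using this function could be checked with a command such as valgrind --leak-check=full ./prog, where prog is a placeholder for your own executable.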