5.1. Basic Datatypes

5.1.1. Fundamental C Datatypes

The fundamental datatypes that the C programming language supports include two basic arithmetic types. Integer types represent characters or whole numbers whereas floating types represent floating-point numbers and have a sign bit, mantissa and exponent.

The types may be prefixed with modifiers that specify whether the type is signed (integer types only), its size (not for char, which is always one byte and almost always 8 bits) and its precision (floating types only). The integer types short, int and long are signed unless specified otherwise, whereas the signedness of a plain char depends on the implementation, so some systems (e.g. IRIX) are an exception to the rule that integer types are signed by default. Unsigned integers can hold a greater range of positive values than signed integers of the same size. The exact types available and their sizes depend on the compiler. The common types and nominal sizes are given in the table below (Table 5.1, “C Fundamental Datatypes”).

Table 5.1. C Fundamental Datatypes
Type                 Description                              Size (bytes)
char                 Character                                1
short int            Short integer                            2
int                  Integer                                  4
long int             Long integer                             8
unsigned char        Unsigned character                       1
unsigned short int   Unsigned short integer                   2
unsigned int         Unsigned integer                         4
unsigned long int    Unsigned long integer                    8
float                Floating point number                    4
double               Double precision floating point number   8
long double          Extra precision floating point number    12

Others may be available, for example the long long integer type. The standard C header limits.h defines the number of bits in a char (e.g. 8 bits) and the largest and smallest values of the other integer types on the current implementation. The C standard sets minimum magnitudes for these limits, so that each type is guaranteed to be of a minimum size and no larger than the following type. For example, an int is never longer than a long int. Similarly, the header float.h (or equivalent) defines constants such that a floating type is always at least as precise as the previous type. For example, a double is always at least as precise as a float.
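
As a minimal sketch of how these guarantees can be checked on a given platform, the functions below compare the constants from limits.h and float.h. The function names are illustrative only and are not part of C or AJAX.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <stdio.h>

/* Returns 1 if each signed integer type can hold every value of the
   previous type, as the C standard guarantees. */
int integer_ranges_ordered(void)
{
    return SCHAR_MAX <= SHRT_MAX
        && SHRT_MAX <= INT_MAX
        && INT_MAX <= LONG_MAX;
}

/* Returns 1 if each floating type has at least as many decimal digits
   of precision as the previous type. */
int float_precision_ordered(void)
{
    return FLT_DIG <= DBL_DIG && DBL_DIG <= LDBL_DIG;
}

/* Prints the sizes found on the current platform. The values vary
   between compilers, so no particular output is assumed here. */
void print_type_sizes(void)
{
    assert(CHAR_BIT >= 8);   /* minimum required by the C standard */
    printf("bits per char: %d\n", CHAR_BIT);
    printf("short:         %lu bytes\n", (unsigned long) sizeof(short));
    printf("int:           %lu bytes\n", (unsigned long) sizeof(int));
    printf("long:          %lu bytes\n", (unsigned long) sizeof(long));
    printf("float:         %lu bytes\n", (unsigned long) sizeof(float));
    printf("double:        %lu bytes\n", (unsigned long) sizeof(double));
    printf("long double:   %lu bytes\n", (unsigned long) sizeof(long double));
}
```

Calling print_type_sizes() on your own system is a quick way to see how far the nominal sizes in Table 5.1 apply to your compiler.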

In addition to the fundamental types described above, C defines the void type which specifies an empty value and is used, for example, for the return type of functions that do not return a value.

There are also enumerations, unique types that are associated with a set of named constant integer values.
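
The following short sketch illustrates an enumeration and a void function. The type SeqKind and both function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enumeration: the constants take the values 0, 1 and 2
   in the order listed. */
enum SeqKind { SEQ_UNKNOWN, SEQ_NUCLEOTIDE, SEQ_PROTEIN };

/* Returns a printable name for a sequence kind. */
const char *seq_kind_name(enum SeqKind kind)
{
    switch (kind) {
    case SEQ_NUCLEOTIDE: return "nucleotide";
    case SEQ_PROTEIN:    return "protein";
    default:             return "unknown";
    }
}

/* A function declared void returns no value to its caller. */
void reset_kind(enum SeqKind *kind)
{
    assert(kind != NULL);
    *kind = SEQ_UNKNOWN;
}
```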

5.1.2. Fundamental AJAX Datatypes

5.1.2.1. Integer Types

The exact size and precision of these C datatypes (excluding char) is in fact implementation dependent. To avoid any potential problems with these system-dependent limits, AJAX defines a new set of fundamental datatypes in the file ajarch.h, as shown in the table (Table 5.2, “Fundamental AJAX Datatypes”).

Table 5.2. Fundamental AJAX Datatypes
Type       Description
ajshort    Short integer
ajint      Standard integer
ajlong     Long integer
ajushort   Unsigned short integer
ajuint     Unsigned standard integer
ajulong    Unsigned long integer

There are some differences between the systems listed in ajarch.h but the typical definitions are as follows:

typedef short ajshort;
typedef int ajint;
typedef long long ajlong;

typedef unsigned int ajuint;
typedef unsigned short ajushort;
typedef unsigned long long ajulong;

An ajint is typically 32 bits and an ajlong typically 64 bits. Use ajint, if 32 bits is enough, instead of int. Use ajlong instead of long or long long. That said, standard C int and long should be used in some circumstances, for example as parameters to C system library functions.

You should match your datatype to what you need. If, for example, you are using an Alpha box then both your int and long variables will be 64 bits. Even so, do not use ajlong everywhere out of laziness, as your code may run more slowly on other platforms.
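
As a minimal sketch of matching the type to the need, the function below keeps individual lengths in ajint but accumulates their sum in ajlong, because the total may exceed 32 bits. The typedefs repeat the typical definitions quoted above so that the example stands alone (real code obtains them from emboss.h), and total_length is a hypothetical helper, not an AJAX function.

```c
#include <assert.h>

/* Typical definitions quoted in the text; assumed here so that the
   sketch compiles without the EMBOSS headers. */
typedef int ajint;
typedef long long ajlong;
typedef unsigned int ajuint;

/* Hypothetical helper: sums n lengths. Each length fits in an ajint,
   but the total may not, so it is accumulated in an ajlong. */
ajlong total_length(const ajint *lengths, ajuint n)
{
    ajlong total = 0;
    ajuint i;

    assert(lengths != 0 || n == 0);
    for (i = 0; i < n; i++)
        total += lengths[i];
    return total;
}
```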

5.1.2.2. Other AJAX Types

For convenience ajdefine.h also defines a few datatypes given in the table below (Table 5.3, “Other AJAX Types”).

Table 5.3. Other AJAX Types
Type           Description
AjBool         Boolean
AjStatus       Status code
AjIntArray     Integer array (int*)
AjFloatArray   Float array (float*)

AjBool is used to store true (ajTrue, AJTRUE) and false (ajFalse, AJFALSE) values. On output, the conversion code %b writes Y or N while conversion code %B writes Yes or No.

There is also a macro that converts a boolean value to a string, which is useful when printing it:

#define AJBOOL(b) (b ? "TRUE" : "FALSE")
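
The sketch below shows a function returning AjBool and the AJBOOL macro applied to its result. The definitions of AjBool, ajTrue and ajFalse are assumptions made so that the example compiles on its own (real code gets them from emboss.h), and is_dna_base is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Assumed stand-in definitions for a stand-alone sketch. */
typedef int AjBool;
#define ajTrue  1
#define ajFalse 0

/* Macro as quoted in the text above. */
#define AJBOOL(b) (b ? "TRUE" : "FALSE")

/* Hypothetical helper: true if base is one of A, C, G or T. */
AjBool is_dna_base(char base)
{
    /* strchr also matches the terminating '\0', so exclude it. */
    if (base != '\0' && strchr("ACGT", base) != NULL)
        return ajTrue;
    return ajFalse;
}
```

With these definitions, AJBOOL(is_dna_base('G')) yields the string "TRUE" and AJBOOL(is_dna_base('N')) yields "FALSE".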

AjStatus is intended as a general return code for functions, but is currently unused because AjBool is sufficient. Constants are defined to indicate success (ajStatusOK), message (ajStatusInfo), warning (ajStatusWarn), error (ajStatusError) and fatal error (ajStatusFatal).

AjIntArray is a simple C-type array of integers:

typedef int* AjIntArray;

AjFloatArray is a simple C-type array of floats:

typedef float* AjFloatArray;
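
Because these array types are plain pointers, they carry no length information and the memory must be managed by the programmer. The sketch below illustrates this with standard C allocation. The helper int_array_filled is hypothetical, and real EMBOSS code would normally use the AJAX memory management facilities instead of calling malloc directly.

```c
#include <assert.h>
#include <stdlib.h>

typedef int* AjIntArray;   /* definition quoted in the text above */

/* Hypothetical helper: allocates n integers, each set to value.
   Returns NULL on failure; the caller releases the array with free(). */
AjIntArray int_array_filled(size_t n, int value)
{
    AjIntArray array = malloc(n * sizeof *array);
    size_t i;

    if (array == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        array[i] = value;
    return array;
}
```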

ajdefine.h defines some other datatypes and constants that are more specialised. See the source code for further information.

To use these AJAX datatypes your code must include the files ajdefine.h and ajarch.h. In an application this happens indirectly: all applications must include at the start of the code the preprocessor directive #include "emboss.h" (see Chapter 2, Your First EMBOSS Application).

emboss.h is the master include file and imports the entire EMBOSS interface: it includes all the header files in the AJAX and NUCLEUS C programming libraries making all the code available to you. If you inspect the file you'll see that ajax.h is included:

% more nucleus/emboss.h 

#ifndef emboss_h
#define emboss_h

#include "ajax.h"
#include "ajgraph.h"
.
.

which itself includes ajdefine.h, which includes ajarch.h:

#ifdef __cplusplus
extern "C"
{
#endif

#ifndef ajdefine_h
#define ajdefine_h

#include "ajarch.h"
.
.

If you develop library code that uses the fundamental types, you must include ajdefine.h explicitly. For example from ajstr.h:

#ifdef __cplusplus
extern "C"
{
#endif

#ifndef ajstr_h
#define ajstr_h

#include "ajdefine.h"
#include "ajtable.h"

.
.

5.1.3. Derived Types

A potentially infinite number of other types may be derived from the fundamental C datatypes as follows:

  • Arrays of objects of a single type

  • Functions returning objects of a single type

  • Pointers to objects of a given type

  • Structures of objects of various types

  • Unions capable of holding one of several objects of different datatypes

In general these methods can be applied in a compound manner. It is possible, for instance, to have a data structure that includes an array of pointers to functions which all return a pointer to an array of float variables (C does not allow arrays of functions themselves, nor functions that return arrays). In other words, "object" here might refer to a variable with a primitive datatype, to a function, to a data structure and so on. Pointers provide a handle on objects of a particular type and are used when managing memory for objects.
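
The compound type described above can be sketched as follows. Since C does not permit arrays of functions, the structure holds an array of pointers to functions, each returning a pointer to an array of float. All names here are hypothetical; the typedefs are used only to keep the declarations readable.

```c
#include <assert.h>

#define NVALUES 3

/* Pointer to an array of NVALUES floats. */
typedef float (*FloatArrayPtr)[NVALUES];

/* Pointer to a function that takes no arguments and returns a
   pointer to an array of NVALUES floats. */
typedef FloatArrayPtr (*TableFunc)(void);

static float weights[NVALUES] = {0.1f, 0.2f, 0.3f};
static float scores[NVALUES]  = {1.0f, 2.0f, 3.0f};

FloatArrayPtr get_weights(void) { return &weights; }
FloatArrayPtr get_scores(void)  { return &scores; }

/* Structure that includes an array of function pointers. */
struct Tables {
    TableFunc getters[2];
};

/* Calls one of the stored functions and returns the first float of
   the array that it points to. */
float first_value(const struct Tables *tables, int which)
{
    FloatArrayPtr array;

    assert(tables != 0 && which >= 0 && which < 2);
    array = tables->getters[which]();
    return (*array)[0];
}
```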

When programming under EMBOSS things are, for most purposes, simplified:

  • There is a standard way for defining new data structures and pointers to them. Data structure and pointer types are referred to as "objects" and "object pointers" respectively.

  • Non-void functions typically return either a primitive datatype or an object pointer.

  • Structures are passed to functions by reference (object pointer): the structure itself is never passed.

  • Constructor and destructor functions handle object memory management.

  • Macros are provided for general-purpose memory management.

  • AJAX implements dynamic arrays of common fundamental datatypes for which memory management is handled automatically. Memory management is also handled automatically for some other datatypes, for example strings.
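
The conventions listed above can be sketched in plain C as follows. This is an illustration under stated assumptions, not AJAX code: the DemoS/DemoO/DemoP type names and the demoRange functions are hypothetical, and malloc and free stand in for the AJAX memory management macros.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object: a structure, an object type and an object
   pointer type. */
typedef struct DemoSRange {
    int start;
    int end;
} DemoORange;
typedef DemoORange* DemoPRange;

/* Constructor: allocates and initialises a new object.
   Returns NULL if memory cannot be allocated. */
DemoPRange demoRangeNew(int start, int end)
{
    DemoPRange range = malloc(sizeof *range);

    if (range == NULL)
        return NULL;
    range->start = start;
    range->end = end;
    return range;
}

/* Destructor: frees the object and resets the caller's pointer to
   NULL so that it cannot be used by mistake afterwards. */
void demoRangeDel(DemoPRange *prange)
{
    if (prange == NULL || *prange == NULL)
        return;
    free(*prange);
    *prange = NULL;
}

/* The object is passed by reference (object pointer), never by value. */
int demoRangeLength(const DemoORange *range)
{
    assert(range != NULL);
    return range->end - range->start + 1;
}
```

Taking the address of the object pointer in the destructor is a design choice: it lets the function clear the caller's pointer as well as free the memory.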

Derived types, in particular pointers and structures, and methods for memory management are discussed below in greater detail.

5.1.4. Storage Class and Linkage

There are two storage classes in C, automatic and static. Automatic objects are initialised whenever the code block in which they are declared is entered (excluding jumps into the code), and in the order in which they are declared. In contrast, static objects are initialised only once before the program proper starts. The storage class of an object depends on the context of its declaration and the keywords used.

Automatic objects are local to a block and are discarded when the block is exited. Declarations in a block are automatic by default although this may be made explicit with the auto keyword. Objects declared with register are automatic and, where possible, are stored in fast machine registers.

Static objects may be local to a block or external to all blocks, at the same level as the function definitions. In either case they retain their value when the block is exited and re-entered. Static objects that are local to a block are declared with the keyword static.

Objects that are external to all blocks are always static. In such cases the static keyword gives them internal linkage which means they are only visible in the local file. Otherwise, they have external linkage which means they will be global to the entire program or other compiled unit.
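
The following sketch illustrates the storage classes and linkage described above. All names are hypothetical.

```c
#include <assert.h>

/* External object with internal linkage: static, and visible only in
   this source file. */
static int call_total = 0;

/* External object with external linkage: static storage, and visible
   to the whole program. */
int demo_verbose_level = 0;

/* Returns 1, 2, 3, ... on successive calls. */
int next_id(void)
{
    static int last_id = 0;   /* static: initialised once, keeps its value */
    int step = 1;             /* automatic: initialised on every call */

    assert(step > 0);
    last_id += step;
    call_total++;
    return last_id;
}

/* Reports how many times next_id() has been called. */
int calls_made(void)
{
    return call_total;
}
```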

There are a few implications here when programming for EMBOSS. Any unions and C data structures (objects) that are private to a library file or application should be declared static in the library or application C source code file. Any public (external) unions and structures are given in appropriate library header files and should not include the static keyword. Similarly, all application functions and private functions in the libraries should be declared static. Public functions in the library should not include this keyword.

Avoid exporting names outside individual C source files; i.e., declare as static (in the library source file or application code) every function that you possibly can. Where code is specific to a single application, it should stay in the application C source code file and not be moved to the libraries until it is of more general use.
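
A minimal sketch of this advice, using hypothetical function names, is a private helper declared static alongside a public function that calls it:

```c
#include <assert.h>

/* Private helper: static gives it internal linkage, so the name is
   not exported from this source file. */
static int clamp_percent(int value)
{
    if (value < 0)
        return 0;
    if (value > 100)
        return 100;
    return value;
}

/* Public function: no static keyword, so it has external linkage.
   In a library, its prototype would be given in the header file. */
int demoPercentGC(int gc_count, int length)
{
    assert(length > 0);
    return clamp_percent((100 * gc_count) / length);
}
```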

All datatypes should be defined in the EMBOSS style and functions must be prototyped using the full ANSI C style (see Appendix C, C Coding Standards).