6.18. Handling Tables

6.18.1. Introduction

A hash table is a data structure holding key:value pairs which allows for very efficient retrieval of a value given a key. For example, a key might be the accession number of a sequence and a value might be the sequence itself. Hash tables work by translating the key into a number called a "hash" using a hashing function. The hash then provides the index of an array element or "bucket" in the data structure holding the value. Hash tables have the great advantage that the search time is independent of the number of items stored, and that new key:value pairs can be added efficiently.

EMBOSS supports a standard hash table using strings for keys and generic hash tables with keys of arbitrary datatype. All the basic functionality you would expect for creating, manipulating and using hash tables is provided. This includes:

  • Hash functions for the construction of string hash functions

  • Functions to add and remove key:value pairs or change the value associated with a key

  • Functions to return the true key or value associated with a given key

  • Ability to apply a function to each key:value pair

  • Printing a string hash table

There is no ACD datatype for hash tables, they are always created directly from within the C source code.

6.18.2. AJAX Library Files

AJAX library files for handling hash tables are listed in the table (Table 6.31, “AJAX Library Files for Handling Tables”). Library file documentation, including a complete description of datatypes and functions with usage notes is available at:

http://emboss.open-bio.org/rel/dev/libs/
Table 6.31. AJAX Library Files for Handling Tables
Library File DocumentationDescription
ajtableHash table functions

ajtable.h/cDefines the hash table object (AjPTable) and functions for handling hash tables. They also contains static functions for handling same at a low level. You are unlikely to need the static functions unless you plan to extend the hash table code.

6.18.3. ACD Datatypes

There are no ACD datatypes dedicated for tables.

6.18.4. AJAX Datatypes

For handling tables use:

AjPTable

Hash table - a set of key/value pairs using a simple hash function to provide rapid searching for keys. Tables can hold any data type.

6.18.5. Table Object Memory Management

6.18.5.1. Default Object Construction

To use a hash table object you must first instantiate the appropriate object pointer. Default constructor functions are provided for generic hash tables and EMBOSS string (AjPStr) hash tables:

/* Generic hash table */
AjPTable   ajTableNewLen(ajuint size)

/* Generic hash table with compare and hash functions */
AjPTable   ajTableNewFunctionLen (ajint n,
                       ajint cmp(const void* x, const void* y),  
                       unsigned hash(const void* key, unsigned hashsize));   

/* String hash table  */
AjPTable   ajTablestrNew  (ajint n);
AjPTable   ajTablestrNewLen (ajint n);
AjPTable   ajTablestrNewCaseLen (ajint n);

Both functions will create, initialise and return a new empty hash table that can hold an arbitrary number of key:value pairs. For a generic hash table, created with ajTableNewFunctionLen, an estimate of the number of unique keys (n) is passed, and functions for comparison (cmp) and for hashing the keys (hash) should be provided. If either of the cmp or hash arguments are NULL then it will use a default general-purpose hash function. The ajTableNewLen uses the general-purpose functions. Note that these cannot inspect the data, and can only use the addresses or values provided.

All constructors return the address of a new object. In the following code comp is the comparison function and hash is the hash function. An initial size of 50 is used. The pointer does not need to be initialised to NULL but it is good practice to do so:

    AjPTable  table = NULL;

    table   = ajTableNewFunctionLen(50, comp, hash);

    /* The table is instantiated and ready for use */

The comparison function must return the result of comparing two keys as an integer which is 0 if the comparison is equal, +1 if the x is higher than y or -1 otherwise. The hash function must return the hash value for a given key.

Assume that you want a hash table with a case-insensitive C-type character string (char *) key. The comparison function may then be defined as ajCharCmpCase (available from ajstr.h) which returns the sort order of two text strings:

ajint comp(const void* x, const void* y)
{
    const char* sx;
    const char* sy;

    sx = (const char*) x;
    sy = (const char*) y;

    return (ajint) ajCharCmpCase(sx, sy);
}

The following hash function may be used:

ajuint hash(const void* key, ajuint hashsize)
{
    ajuint hash;
    const char* s;

    s = (const char*) key;

    for(hash = 0; *s; s++)
        hash = (hash * 127 + toupper((ajint)*s)) % hashsize;

    return hash;
}

6.18.5.2. Default Object Destruction

You must free the memory for an object once your are finished with it. A default destructor function is provided for generic and string hash tables:

void  ajTableFree (AjPTable* Ptable);
void  ajTablestrFree (AjPTable* Ptable);

It is the responsibility of the calling function to destroy any objects once they are finished with:

    AjPTable  table = NULL;

    table = ajTableNewFunctionLen(50, comp, hash);

    /* Do something with the instantiated table */

    ajTableFree(&table);

    /* The memory is freed and the pointer reset to NULL, ready for re-use. */

    table = ajTablestrNew(50);  

    /* The pointer variable is reallocated.  Do something else with the new table. */

    ajTableFree(&table);

    /* Done with the object so the memory is freed. */

6.18.5.3. Alternative Object Construction and Loading

To create a generic hash table with an explicit (rather than estimated) number of key:value pairs use:

AjPTable  ajTableNewLen (ajint n,                                             
                         ajint cmp(const void* x, const void* y),
                         unsigned hash(const void* key, unsigned hashsize));

6.18.6. Table Hash Functions

A few table hash functions used during hash table construction (see above) are provided for convenience for string tables only. These provide case-sensitive and case-insensitive hashing for hash tables of EMBOSS strings (AjPStr) and C-type (char *) strings:

/* Case-insensitive hashing of EMBOSS strings */
unsigned  ajTablestrHash (const void* key, unsigned hashsize);   

/* Case-sensitive hashing of EMBOSS strings   */
unsigned  ajTablestrHashCase (const void* key, unsigned hashsize);

6.18.7. Table Comparison Functions

A few table comparison functions used during hash table construction (see above) are provided for convenience for string tables only. These provide case-sensitive and case-insensitive comparison for hash tables of EMBOSS strings (AjPStr):

/* Case-sensitive comparison of EMBOSS strings */
ajint  ajTablestrCmp (const void* ptrx, const void* ptry);    

/* Case-insensitive comparison of EMBOSS strings */
ajint  ajTablestrCmpCase (const void* ptrx, const void* ptry);

6.18.8. Table Edit Functions

Key:value pairs may be added to a table, removed from a table or the value associated with a key may be changed.

ajTablePut is used to set the value associated with a given key in a hash table. It will return the previous value or, if the table does not hold the specified key, it will add the key and value and return NULL.

void*  ajTablePut (AjPTable table, const void* key, void* value);

ajTableRemove will remove the key:value pair from a table and returns the removed value. If the table does not hold the specified key then the function merely returns NULL:

void*  ajTableRemove (AjPTable table, const void* key);

6.18.9. Table Query Functions

The true key or value associated with a given key may be returned by using:

/* Return the value */
void*  ajTableFetch (const AjPTable table, const void* key);  

/* Return the key   */
void*  ajTableFetchKey (const AjPTable table, const void* key);  

ajTableFetch is the standard function that returns a value given a query key.

ajTableFetchKey is provided to return the true key (with any capitalisation) in cases where a case-insensitive hash table is used.

To return the number of key:value pairs in a hash table use:

ajint  ajTableGetLength (const AjPTable table);

6.18.10. Table Map Functions

It is possible to apply a function to each key:value in a hash table using:

void  ajTableMap (AjPTable table, void apply(const void* key, void** Pvalue, void* ptr), void* ptr);     
void  ajTableMapDel (AjPTable table, void apply(const void** Pkey, void** Pvalue, void* ptr), void* ptr);

The function called by ajTableMap should not delete any keys (although values can be modified) whereas the function called by ajTableMapDel may delete keys. In both cases a function must be provided.

Assume that you have a hash table with EMBOSS string keys and integer values and that you want to apply one function to print the key:value pair and another to free a key:value-pair.

The print functions will look something like:

static void print(const void* key, void** value, void* cl)
{
    AjPStr keystr;
    ajint *valptr;

    keystr = (AjPStr) key;
    valptr = (ajint *) *value;

    ajUser("type '%S' found %d times", keystr, *valptr);

    return;
}

whereas the delete function would be:

static void delete(const void** key, void** value, void* cl)
{
    AjPStr keystr;
    ajint *valptr;

    keystr = (AjPStr) *key;
    valptr = (ajint *) *value;

    ajStrDel(&keystr);
    AJFREE(valptr);

    *key = NULL;
    *value = NULL;

    return;
}

To call the functions you would use something like:

    AjPTable table = NULL;

    /* Allocate table */

    ajTableMap(table, print, NULL);

    ajTableMapDel(table, delete, NULL);

6.18.11. Print Functions

Functions are provided to print a string hash table of EMBOSS strings (AjPStr):

void  ajTablestrPrint (const AjPTable table);