6.15. Handling Application Reports

6.15.1. Introduction

Applications should use the standard set of report formats (see the EMBOSS Users Guide) for output where possible. Each report format defines the content and layout of text in a file. Most of the reports are for describing features, usually short sequence motifs, in a sequence. The sequence motifs, ID codes, sequence position numbers and the corresponding features can be output in a variety of formats. A set of command line qualifiers (Section A.5.3.17, “report”) is used to control the data written to the report and to set such things as the name and format of the file. Non-printable control characters are not used, so reports are suitable for viewing on-screen or for printing.

Reports have a consistent look and feel, which helps the end-user and improves interoperation of the applications. In principle an application can use different report formats for different purposes, depending on what the output file will be used for. A human-readable report is required for the end-user whereas a more compact format might be suitable if the output is to be processed by another application. All of the standard sequence feature table formats (see the EMBOSS Users Guide) are also report formats. Other report formats cater for non-sequence information. For a complete list, descriptions and examples of the supported report formats see the EMBOSS Users Guide.

All report formats, excluding the TAB-delimited (excel) one and those using the standard feature table formats, have a block of reference information at the start of the report. This gives:

  • The program name

  • Date the report was generated

  • The output file name

  • The ID name of the sequence

  • The region of the sequence for which features are being reported

  • Some of the parameters and statistics of the report.

The reported data are taken from the report feature table. Different report formats report different values or different tags from the feature table. The tags that might be reported are as follows:

start

Start position of match in sequence.

end

End position of match in sequence.

length

Length of match in sequence.

name

Name of sequence.

sequence

Sequence of match.

strand

DNA strand in which feature occurs.

tagname

Printable name of feature.

tagvalue

Feature in sequence.

type

Type of feature.

Finally, there is a block of summary information at the end of the report.

Internally all reports are held in a very close analogue of the General Feature Format (GFF). This flexible file format holds sequence feature information in a feature table as tag/value pairs. Features can easily be converted to any of the supported report formats (see the EMBOSS Users Guide). The format to use can be selected with the -rformat built-in qualifier.

GFF has some fields that must exist including:

  • A feature type that can be any string but is best specified using the Sequence Ontology (for example, the EMBOSS application antigenic uses 'SO:0001018' as the predicted features are of type 'epitope'). Using SO terms allows some report formats, for example GFF3 or DAS, to correctly label features with biological properties. Any unrecognised string is reported as a 'miscellaneous feature' or 'region' by these formats, with the original string retained and reported in the feature tags.

  • The sequence itself which may or may not be output depending on the output report format used.

  • The start and end positions of the features. GFF requires that the start is before the end, with the strand identifying nucleotide features in the reverse direction.

  • A score (used if the program reports a score for the feature).
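The start/end rule in the list above can be illustrated with a minimal sketch in plain C. It does not use the AJAX library: FeatRange, feat_range_set and feat_range_length are hypothetical names invented for this illustration. The sketch assumes 1-based inclusive positions and the GFF convention of '+' and '-' strand characters.

```c
#include <assert.h>

/* Hypothetical illustration type: not part of AJAX */
typedef struct FeatRange
{
    int  start;   /* always <= end, as GFF requires */
    int  end;
    char strand;  /* '+' forward, '-' reverse */
} FeatRange;

/* Store a match that runs from 'from' to 'to' (1-based, inclusive).
** If the match runs in the reverse direction (from > to) the
** positions are swapped and the strand records the direction. */
FeatRange feat_range_set(int from, int to)
{
    FeatRange ret;

    assert(from > 0 && to > 0);

    if(from <= to)
    {
        ret.start  = from;
        ret.end    = to;
        ret.strand = '+';
    }
    else
    {
        ret.start  = to;
        ret.end    = from;
        ret.strand = '-';
    }

    return ret;
}

/* Length of the match, assuming inclusive positions */
int feat_range_length(const FeatRange *range)
{
    assert(range);

    return range->end - range->start + 1;
}
```

A match found from position 120 back to position 45 would be stored with start 45, end 120 and strand '-'. The direction is carried by the strand alone, so the start is never greater than the end.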

The score and most other standard GFF tags are set to zero or NULL automatically by the library. You have to explicitly use them (assign values to them) if you need them, otherwise they'll be ignored. To cope with non-standard feature values the AJAX library uses the default '/note' tag. This is transparent to the programmer, unless you select EMBL or another output format in which the '/note' tag is visible.

Any report output files should be defined in the application ACD file and retrieved from within the C source code by a call to ajAcdGetReport. Some programming is required to load the report with data and write it to file as described below.

6.15.2. AJAX Library Files

AJAX library files for handling reports are listed in the table below (Table 6.26, “AJAX Library Files for Handling Reports”). Library file documentation, including a complete description of datatypes and functions, is available at:

http://emboss.open-bio.org/rel/dev/libs/

Table 6.26. AJAX Library Files for Handling Reports

Library File Documentation    Description
ajreport                      Application reports

ajreport.h/c

These files define the AjPReport object and the functions for handling reports. They also contain static data structures and functions for handling reports at a low level. You are unlikely to need the static data structures and functions unless you plan to implement code to support new report formats for EMBOSS. For advice on how to do this ask the EMBOSS developers.

6.15.3. AJAX Datatypes

For handling application report output files defined in the ACD file use:

AjPReport

Application report (for report ACD datatypes).

6.15.4. ACD Datatypes

The datatype for handling report output is:

report

Output file for sequence annotation (report).

6.15.5. ACD Data Definition

A typical ACD definition for report output:

report: outfile 
[
    parameter: "Y"
    rformat: "diffseq"
    taglist: "int:start int:end int:length str:name str:sequence
              str:first_feature str:second_feature"
]

6.15.5.1. Parameter Name

All data definitions for report output should have standard parameter names (see Appendix A, ACD Syntax Reference):

  • outfile

  • *file

6.15.5.2. Common Attributes

Attributes that are typically specified are summarised below. They are datatype-specific (Section A.5, “Datatype-specific Attributes”) unless they are indicated as being global attributes (Section A.4, “Global Attributes”).

parameter: Reports are typically the primary output of an EMBOSS application and, as such, should be defined as parameters by using the global attribute parameter: "Y".

rformat: Specifies the report format to use, which must be one of the supported report formats (see the EMBOSS Users Guide). There is a default report format that will be used if the user doesn't change it with -rformat on the command line.

multiple: A boolean attribute which should be set to "Y" if the output can contain more than one report from the same input. This will generally have a value of "N" if you are using a single input sequence (the sequence ACD datatype) and "Y" if you're processing multiple sequences from an input stream (the seqall ACD datatype).

type: Applies where the report format is one of the standard feature table formats (see the EMBOSS Users Guide). It defines whether the report output is "protein" or "nucleotide". There is a default based on the type of any input sequence, but a value should always be specified.

taglist: Defines the tag / value pairs from the internal feature table that are to be reported in the output. Each tag is in the general format type:tagname and can (optionally) be appended with [=columnname] to indicate a column name to use in the report:

type:tagname[=columnname]

For example:

int:length
string:gc=GC%

Typical taglist datatypes are:

float
int
str

The name can be anything within reason but tag names and types must match those used in the C source code of the application.

In the example application described later (Section 6.15.9, “Example Report Application”), float:molwt indicates that one of the columns is called molwt and that it will contain floating point values.
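The type:tagname[=columnname] layout can be made concrete with a minimal parsing sketch in plain C. This is not how the AJAX library parses a taglist: TagEntry and tag_entry_parse are hypothetical names invented for this illustration. The sketch assumes that the tag name doubles as the column heading when no =columnname is given.

```c
#include <assert.h>
#include <string.h>

#define TAG_MAXLEN 64

/* Hypothetical illustration type: not part of AJAX */
typedef struct TagEntry
{
    char type[TAG_MAXLEN];    /* e.g. "int", "float", "str" */
    char name[TAG_MAXLEN];    /* tag name used in the C code */
    char column[TAG_MAXLEN];  /* column heading in the report */
} TagEntry;

/* Parse one "type:tagname[=columnname]" token.
** Returns 1 on success, 0 if the token is malformed or too long. */
int tag_entry_parse(const char *token, TagEntry *entry)
{
    const char *colon;
    const char *equals;
    size_t typelen;
    size_t namelen;

    assert(token && entry);

    colon = strchr(token, ':');
    if(!colon)
        return 0;

    equals  = strchr(colon + 1, '=');
    typelen = (size_t) (colon - token);
    namelen = equals ? (size_t) (equals - colon - 1) : strlen(colon + 1);

    if(!typelen || !namelen || typelen >= TAG_MAXLEN || namelen >= TAG_MAXLEN)
        return 0;

    /* An "=" must be followed by a column name that fits */
    if(equals && (!equals[1] || strlen(equals + 1) >= TAG_MAXLEN))
        return 0;

    memcpy(entry->type, token, typelen);
    entry->type[typelen] = '\0';

    memcpy(entry->name, colon + 1, namelen);
    entry->name[namelen] = '\0';

    /* Assumption: with no "=columnname" the tag name is the heading */
    strcpy(entry->column, equals ? equals + 1 : entry->name);

    return 1;
}
```

Given the token string:gc=GC% the sketch yields the type string, the tag name gc and the column heading GC%. Given int:length it yields the tag name length, which is reused as the heading.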

6.15.6. ACD File Handling

Datatypes and functions for handling report output via the ACD file are shown below (Table 6.27, “Datatypes and Functions for Report Output”).

Table 6.27. Datatypes and Functions for Report Output

                        To write a report
ACD datatype            report
AJAX datatype           AjPReport
To retrieve from ACD    ajAcdGetReport

Your application code will call embInit to process the ACD file and command line (see Section 6.3, “Handling ACD Files”). All values from the ACD file are read into memory and files are opened as necessary. You have a handle on the files and memory through the ajAcdGet* family of functions which return pointers to appropriate objects.

6.15.6.1. Report Retrieval

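To retrieve an output report stream an object pointer is declared and initialised using ajAcdGetReport. The string passed to the function must be the name given to the report in the ACD file, which is outfile in the definition above:

    AjPReport report = NULL;

    report = ajAcdGetReport("outfile");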

6.15.6.2. Processing Command line Options and ACD Attributes

Currently there are no functions for this.

6.15.6.3. Memory and File Management

It is your responsibility to close any files and free up memory at the end of the program.

6.15.6.3.1. Closing Report Files

Any report file must be closed by calling ajReportClose:

void  ajReportClose (AjPReport thys);

6.15.6.3.2. Freeing Memory

You must call the default destructor function (see below) on any report objects returned by calls to ajAcdGetReport.

Additionally you must call ajReportExit to free up any internal memory allocated for report housekeeping:

void  ajReportExit (void);

6.15.7. Report Object Memory Management

6.15.7.1. Default Object Construction

Report objects are typically loaded from ACD file processing via a call to ajAcdGetReport (see above). You should therefore never need to instantiate the report object (AjPReport) manually. In the unlikely event you do then the default constructor is:

AjPReport  ajReportNew (void);    

The constructor returns the address of a new report object. In the following code the pointer does not need to be initialised to NULL but it is good practice to do so:

    AjPReport      report = NULL;

    report    = ajReportNew();

    /* The object is instantiated and ready for use */

6.15.7.2. Default Object Destruction

You must free the memory for an object once you are finished with it. The default destructor function is:

void  ajReportDel (AjPReport* Preport);

It is used as follows:

    AjPReport     report =NULL;

    report = ajAcdGetReport("outfile");

    ...

    ajReportDel(&report);

Or if you manually instantiated the object:

    AjPReport report = NULL;

    report = ajReportNew();

    /* Do something with the instantiated objects */

    ajReportDel(&report);

    /* Done with the objects so the memory is freed. */

6.15.7.3. Alternative Object Construction and Loading

There are currently no alternative constructors or loading functions.

6.15.8. Preparing a Report

Each report has a header, a body and a tail. The process of preparing a report for output is:

  1. Define a suitable report datatype in ACD.

  2. Collect the ACD report information within the application in a report object (AjPReport).

  3. Create a feature table object (AjPFeattable) using ajFeattableNewSeq.

  4. For each 'line' of information in the body of the report create a feature object (AjPFeature) and load the feature object with your data values (by using ajFeatNewII and ajFeatTagAdd).

  5. Set any data for the header and tail of the report.

  6. Pass the report (AjPReport) and feature table (AjPFeattable) objects to the general report writing function (ajReportWrite).

  7. Clean up memory as normal and exit.

6.15.9. Example Report Application

Consider a simple application to report the sequence length and molecular weight of a sequence.

6.15.9.1. ACD File

In the following ACD file, a sequence input and a report output are defined. The datafile datatype is just there so that a file of molecular weight data in the EMBOSS data area can be read.

application: wreport 
[
    documentation: "Example report program"
]

section: input [ info: "input Section" type: page ]
sequence: sequence  
[
    parameter: "Y"
    type: "Protein"
]

endsection: input


section: advanced [ info: "advanced Section" type: page ]

datafile: aadata  
[
    information: "Amino acid data file"
    help: "Molecular weight data for amino acids"
    default: "Eamino.dat"
]

endsection: advanced


section: output [ info: "output Section" type: page ]

report: outfile  
[
    parameter: "Y"
    rformat: "table"
    multiple: "N"
    precision: "1"
    taglist: "float:molwt int:len"
]
endsection: output

The section: and endsection: definitions provide a means by which GUIs can be instructed to organise the ACD information on the screen. Each section must always have a corresponding endsection. It is standard practice to have at least an input and an output section definition, adding others as appropriate.

6.15.9.2. C Source Code

The complete source code for the application is given below.

/* @prog wreport **************************************************************
**
** Show sequence length and molwt as a report
**
******************************************************************************/

#include "emboss.h"

int main(int argc, char **argv)
{
    AjPSeq       seq    = NULL;
    AjPReport    report = NULL;
    AjPFeattable ftable = NULL;
    AjPFeature   feat   = NULL;
    AjPStr       tmpstr = NULL;

    double       molwt;
    int          len;

    AjPFile      mfptr  = NULL;


    embInit ("wreport", argc, argv);

    seq    = ajAcdGetSeq ("sequence");
    report = ajAcdGetReport("outfile");

    /* This bit just reads in an EMBOSS data table of molwt info */
    mfptr  = ajAcdGetDatafile("aadata");


    embPropEaminoRead(mfptr);
    /* End of data file reading */

    /* Calculate the values to output */
    len   = ajSeqGetLen(seq);
    molwt = embPropCalcMolwt(ajSeqGetSeqC(seq),0,len-1);


    /* Create a feature table */
    ftable = ajFeattableNewSeq(seq);


    tmpstr = ajStrNew();


    /* Fill head and tail information for the report */
    ajFmtPrintS(&tmpstr,"This is some Header Text");
    ajReportSetHeaderS(report, tmpstr);

    ajFmtPrintS(&tmpstr,"This is some Tail Text");
    ajReportSetTailS(report, tmpstr);


    /* Create feature object and load with the output values */
    feat = ajFeatNewII(ftable,1,len);

    ajFmtPrintS(&tmpstr,"*molwt %.1f", (float)molwt);
    ajFeatTagAdd(feat,NULL,tmpstr);

    ajFmtPrintS(&tmpstr,"*len %d", len);
    ajFeatTagAdd(feat,NULL,tmpstr);


    /* Write report and clean up */
    ajReportWrite(report,ftable,seq);


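    /* The report file is closed and the report object freed */
    ajReportClose(report);
    ajReportDel(&report);

    ajFeattableDel(&ftable);
    ajSeqDel(&seq);
    ajStrDel(&tmpstr);
    ajFileClose(&mfptr);

    embExit();

    return 0;
}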

First the variables are defined. Objects are needed for a sequence (AjPSeq) and a feature table (AjPFeattable) to hold the feature objects containing the column values. Also needed is an object for individual features (AjPFeature) to contain the values that will be printed in the report. Some housekeeping variables are required and, of course, the report itself (AjPReport):

AjPSeq       seq    = NULL;   /* Input sequence */
AjPReport    report = NULL;   /* Output report */
AjPFeattable ftable = NULL;   /* Feature table */
AjPFeature   feat   = NULL;   /* Individual feature */
AjPStr       tmpstr = NULL;

double       molwt;
int          len;

AjPFile      mfptr  = NULL;

The program can then be initialised and the ACD file processed:

    embInit ("wreport", argc, argv);

    seq    = ajAcdGetSeq ("sequence");
    report = ajAcdGetReport("outfile");

Code is then needed to read in the EMBOSS data file containing amino acid molecular weight data and to calculate the molecular weight and length values to be reported. Note that len and molwt are used for both the variable names and the column names. This is not necessary but does make the code clearer:

    mfptr = ajAcdGetDatafile("aadata");

    embPropEaminoRead(mfptr);

    len   = ajSeqGetLen(seq);
    molwt = embPropCalcMolwt(ajSeqGetSeqC(seq),0,len-1);

The report header and tail information is then set. This could have been done at any time after the call to ajAcdGetReport but before the report is written. The header and tail are optional but recommended. The temporary string object is used for this:

    tmpstr = ajStrNew();

    ajFmtPrintS(&tmpstr,"This is some Header Text");
    ajReportSetHeaderS(report, tmpstr);

    ajFmtPrintS(&tmpstr,"This is some Tail Text");
    ajReportSetTailS(report, tmpstr);

The feature table object is then created. Only one feature table is required per report. The sequence object is passed as a parameter so that the sequence name can be automatically loaded into the internal GFF format. A feature object can now be created into which the column values can be loaded. A single feature object is used here, holding one tag/value pair for each output value. The column values are printed into a string object (tmpstr) which is used to load the feature object.

ajFeatNewII requires three parameters. The first is the feature table object and the last two are the sequence start and end positions. In this example they are simply set to 1 and the sequence length, so that the feature covers the whole sequence. One feature object has to be created for each line of output in the final report. ajFeatTagAdd is used to load the molecular weight and length values into the feature object:

    ftable = ajFeattableNewSeq(seq);

    feat = ajFeatNewII(ftable,1,len);

    ajFmtPrintS(&tmpstr,"*molwt %.1f", (float)molwt);
    ajFeatTagAdd(feat,NULL,tmpstr);

    ajFmtPrintS(&tmpstr,"*len %d", len);
    ajFeatTagAdd(feat,NULL,tmpstr);

This is the bit where the ACD column names (molwt and len) must match the ones in the ajFmtPrintS calls. Within the C program these names must be preceded by an asterisk (*). The datatypes specified in the ACD file (float and int) must also match what's given in the C code. The NULL parameter just means that only a value is being added (the library will add the /note tag automatically).
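The naming rule can be shown in isolation with a minimal sketch in plain C, using snprintf in place of ajFmtPrintS. The helpers tag_value_float and tag_value_int are hypothetical names invented for this illustration and are not part of AJAX. They only build strings of the "*tagname value" form that the example passes to ajFeatTagAdd.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers, not part of AJAX. Each builds the
** "*tagname value" string used for one report tag value.
** Both return 1 on success and 0 if the buffer is too small. */

int tag_value_float(char *buf, size_t size, const char *tagname,
                    double value, int precision)
{
    int n;

    assert(buf && tagname && strlen(tagname) > 0 && precision >= 0);

    /* "%.*f" takes the number of decimal places from 'precision' */
    n = snprintf(buf, size, "*%s %.*f", tagname, precision, value);

    return (n >= 0 && (size_t) n < size);
}

int tag_value_int(char *buf, size_t size, const char *tagname, int value)
{
    int n;

    assert(buf && tagname && strlen(tagname) > 0);

    n = snprintf(buf, size, "*%s %d", tagname, value);

    return (n >= 0 && (size_t) n < size);
}
```

With the tag name molwt, the value 110.5 and a precision of 1, the first helper produces the string "*molwt 110.5". The name after the asterisk is the one that has to agree with float:molwt in the ACD taglist.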

The report can now be written. Three objects are passed: the report, the feature table and the sequence. The sequence object is passed in case you choose a report format that prints out the (sub)sequence used:

    ajReportWrite(report,ftable,seq);

Finally the dynamic memory is recovered in a clean-up prior to exiting:

    ajReportClose(report);
    ajReportDel(&report);

    ajFeattableDel(&ftable);
    ajSeqDel(&seq);
    ajStrDel(&tmpstr);
    ajFileClose(&mfptr);

    embExit ();

    return 0;

6.15.10. Report File Management

As an alternative to ACD processing, a report file can be opened directly by calling ajReportOpen. A report file is closed by calling ajReportClose:

AjBool  ajReportOpen (AjPReport thys, const AjPStr name);
void    ajReportClose (AjPReport thys);

To add an additional file name and description to the report call ajReportAddFileF:

void  ajReportAddFileF (AjPReport report, AjPFile file, const AjPStr type);

The file name and type will be listed in the report header as "Additional files" to let the user know that there are associated output files available.

To write the report file call ajReportWrite:

AjBool  ajReportWrite (AjPReport report, const AjPFeattable ftable, const AjPSeq seq);

The function writes a report file (report) from the feature table object (ftable) for the given sequence (seq). It returns ajTrue if data was written or ajFalse if maximum output has already been reached. The steps needed to prepare the report are described above (Section 6.15.8, “Preparing a Report”).

The report header and tail can be written separately by calling ajReportWriteHeader or ajReportWriteTail. These take as parameters the report, the feature table and the sequence being reported on:

void  ajReportWriteHeader (AjPReport report, const AjPFeattable ftable, const AjPSeq seq); 
void  ajReportWriteTail (AjPReport report, const AjPFeattable ftable, const AjPSeq seq);

You will never normally need to call these functions as the header and tail are written automatically when you call ajReportWrite.

6.15.11. Setting Elements of a Report Object

The functions in this section are used to set the elements in a report object directly.

To set some preliminary text as the report header or subheader call ajReportSetHeaderS or ajReportSetSubheaderS:

void  ajReportSetHeaderS (AjPReport report, const AjPStr header);
void  ajReportSetHeaderC (AjPReport report, const char* header);
void  ajReportSetSubheaderS (AjPReport report, const AjPStr header);
void  ajReportSetSubheaderC (AjPReport report, const char* header);

To append some text to the report header or subheader call ajReportAppendHeaderS or ajReportAppendSubheaderS:

void  ajReportAppendHeaderS (AjPReport report, const AjPStr header);
void  ajReportAppendHeaderC (AjPReport report, const char* header);
void  ajReportAppendSubheaderS (AjPReport report, const AjPStr header);
void  ajReportAppendSubheaderC (AjPReport report, const char* header);

To set some trailing text as the report tail or subtail call ajReportSetTailS or ajReportSetSubtailS:

void  ajReportSetTailS (AjPReport report, const AjPStr tail);
void  ajReportSetTailC (AjPReport report, const char* tail);
void  ajReportSetSubtailS (AjPReport report, const AjPStr tail);
void  ajReportSetSubtailC (AjPReport report, const char* tail);

To append some text to the report tail or subtail call ajReportAppendTailS or ajReportAppendSubtailS:

void  ajReportAppendTailS (AjPReport report, const AjPStr tail);
void  ajReportAppendTailC (AjPReport report, const char* tail);
void  ajReportAppendSubtailS (AjPReport report, const AjPStr tail);
void  ajReportAppendSubtailC (AjPReport report, const char* tail);

To set the tag list for a report call ajReportSetTagsS:

AjBool  ajReportSetTagsS (AjPReport report, const AjPStr taglist);

To set the report type if it is not set already call ajReportSetType:

void  ajReportSetType (AjPReport report, const AjPFeattable ftable, const AjPSeq seq);

To set the default format for a feature report call ajReportFormatDefault:

AjBool  ajReportFormatDefault (AjPStr* pformat);

6.15.12. Getting Elements of a Report Object

It is sometimes necessary to query the properties of a report object. Although you will typically not need the functions in this section, they are useful if you have created a report object manually (using the constructor) rather than by picking up one from ACD processing.

To return the sequence name or USA depending on the setting in the report object (derived from the ACD and command line -rusashow option) call ajReportGetSeqnameSeq:

const AjPStr  ajReportGetSeqnameSeq (const AjPReport report, const AjPSeq seq);

6.15.13. Debugging Report Objects

To test that a report is valid call ajReportValid:

AjBool  ajReportValid (AjPReport report); 

This will check that the specified format works with the number of tags and with the type (protein or nucleotide) of the sequence. The report format is set if it is not already defined.

To print information about the current report format call ajReportPrintFormat. If the full argument is ajTrue then information on all report formats is generated:

void  ajReportPrintFormat (AjPFile outf, AjBool full);