6.14. Handling Files

6.14.1. Introduction

EMBOSS provides comprehensive and flexible functionality for file handling, including:

  • Input and output general files

  • Input and output data files

  • Input and output directories

  • Lists of files

  • Lists of files from directories

ACD datatypes are provided for handling all of the above. In addition, functions are provided to read and write files and handle directories directly if this is required.

6.14.2. AJAX Library Files

AJAX library files for handling files are listed in the table (Table 6.24, “AJAX Library Files for Handling Files”). Library file documentation, including a complete description of datatypes and functions, is available at:

http://emboss.open-bio.org/rel/dev/libs/

Table 6.24. AJAX Library Files for Handling Files

Library File Documentation    Description
ajfile                        General file handling

ajfile.h/c

Objects and functions for handling of data files, general files and directories and static functions for manipulating them at a low level. You are unlikely to need the static functions unless you plan to implement code to extend the core functionality of the library.

6.14.3. ACD Datatypes

There are five datatypes for handling file input:

infile

General input file.

datafile

Input data file.

directory

A directory with an optional specified file extension and filename prefix.

dirlist

A list of file names that are read from a directory with an optional specified file extension and filename prefix.

filelist

A list of input files.

There are four datatypes for handling file output:

outfile

General output file.

outdata

Output data file.

outdir

Output directory for writing of multiple output files.

directory

A directory that can be used for input or output.

6.14.4. ACD Data Definition

Typical ACD definitions for file input and output are given below.

6.14.4.1. infile

Input file:

infile: pepstatsfile 
[
    parameter: "Y"
    information: "Pepstats program output file (optional)"
    knowntype: "pepstats output"
]

6.14.4.2. filelist

File list input:

filelist: files 
[
    parameter: "Y"
    information: "Molecular weights file list"
    knowntype: "molwt"
]

6.14.4.3. datafile

Input data file:

datafile: aaproperties 
[
    name: "Eaa_properties.dat"
    information: "Amino acid chemical classes data file"
    knowntype: "amino acid classification"
]

6.14.4.4. dirlist

Input directory list:

dirlist: directory 
[
    parameter: "Y"
    information: "Codon usage directories"
    knowntype: "codon usage"
]

6.14.4.5. directory

Input directory:

directory: directory 
[
    parameter: "Y"
    information: "Database directory"
    knowntype: "emboss database directory"
]

6.14.4.6. outfile

Output file:

outfile: outfile 
[
    parameter: "Y"
    information: "Pepstats program output file"
    knowntype: "pepstats output"
]

6.14.4.7. outdata

Output data file:

outdata: outfile
[
    default: "Eaa_properties.dat"
    information: "Amino acid chemical classes data file"
    knowntype: "amino acid classification"
]

6.14.4.8. outdir

Output directory:

outdir: indexoutdir 
[
    parameter: "Y"
    information: "Index file output directory"
    knowntype: "emboss database index"
]

6.14.4.9. directory

Output directory:

directory: directory 
[
    parameter: "Y"
    information: "Database directory"
    knowntype: "emboss database directory"
]

6.14.4.10. Parameter Names

All data definitions for file input and output should have standard parameter names. These include:

  • "infile" for infile

  • "*files" for filelist

  • "directory" for dirlist

  • "directory" for directory

  • "outfile" for outfile

  • "outdir" for outdir

Alternatives and variations are allowed. For more information, see Appendix A, ACD Syntax Reference.

6.14.4.11. Common Attributes

Attributes that are typically specified are summarised below. They are datatype-specific (Section A.5, “Datatype-specific Attributes”) unless they are indicated as being global attributes (see Section 4.3, “Data Definition”).

parameter: File datatypes are typically the primary input or output of an EMBOSS application and, as such, should be defined as parameters by using the global attribute parameter: "Y".

information: A global attribute that specifies the user prompt. It is used in the application documentation.

knowntype: A global attribute that is typically specified for all file input and output types and is required for outfile and outdir. If the output is not one of the standard EMBOSS known types (Section 4.3.5.3.1, “Application Data Known Types File (knowntypes.standard)”) then ApplicationName output is the recommended value.

name: Datafiles often have a hardcoded filename. You are free to define this using the name: attribute.

default: Datafiles often have a hardcoded filename. A default file name is defined for outdata: using the default: global attribute (no name: attribute is available as you might have expected).

6.14.5. AJAX Datatypes

For handling input files and directories defined in the ACD file use:

AjPFile

An unbuffered file that can be used for input or output (for infile and datafile ACD datatypes).

AjPList

General list (for lists of files from filelist and dirlist ACD datatypes).

AjPDir

A directory that can be used for input or output (for directory ACD datatype).

For handling output files and directories defined in the ACD file use:

AjPFile

General file (for outfile ACD datatype).

AjPDir

Directory (for outdir and directory ACD datatypes).

For handling general output files including output data files defined in the ACD file use:

AjPOutfile

An unbuffered output file.

For handling buffered files use:

AjPFilebuff

Buffered file object holding information for a buffered input file.

AjPFilebufflist

File buffer holding a simple linked list of buffered lines. This is a substructure of the AjPFilebuff object.

6.14.6. ACD File Handling

Datatypes and functions for handling files via the ACD file are shown below (Table 6.25, “Datatypes and Functions for File Input and Output”).

Table 6.25. Datatypes and Functions for File Input and Output

ACD datatype    AJAX datatype    To retrieve from ACD

File Input
infile          AjPFile          ajAcdGetInfile
filelist        AjPList          ajAcdGetFilelist
datafile        AjPFile          ajAcdGetDatafile
dirlist         AjPList          ajAcdGetDirlist
directory       AjPDir           ajAcdGetDirectory

File Output
outfile         AjPFile          ajAcdGetOutfile
outdata         AjPOutfile       ajAcdGetOutdata
outdir          AjPDir           ajAcdGetOutdir
directory       AjPDir           ajAcdGetDirectory

Your application code will call embInit to process the ACD file and command line (see Section 6.3, “Handling ACD Files”). All values from the ACD file are read into memory and files are opened as necessary. You have a handle on the files and memory through the ajAcdGet* family of functions which return pointers to appropriate objects.

6.14.6.1. Input File Retrieval

To retrieve an input file, list of files etc. an object pointer is declared and then initialised using the appropriate ajAcdGet* function.

6.14.6.1.1. Input file

    AjPFile pepstatsfile = NULL;

    pepstatsfile = ajAcdGetInfile("pepstatsfile");

6.14.6.1.2. Input list of files

    AjPList files = NULL;

    files = ajAcdGetFilelist("files");

6.14.6.1.3. Input data file

    AjPFile aaproperties = NULL;

    aaproperties = ajAcdGetDatafile("aaproperties");

6.14.6.1.4. Input list of files from a directory

    AjPList filelist = NULL;

    filelist = ajAcdGetDirlist("directory");

6.14.6.1.5. Directory (input or output)

    AjPDir directory = NULL;

    directory = ajAcdGetDirectory("directory");

6.14.6.2. Output File Retrieval

To retrieve an output file stream, directory etc. an object pointer is declared and initialised using the appropriate ajAcdGet* function.

6.14.6.2.1. Output file

    AjPFile outfile = NULL;

    outfile = ajAcdGetOutfile("outfile");

6.14.6.2.2. Output data file

    AjPOutfile outfile = NULL;

    outfile = ajAcdGetOutdata("outfile");

6.14.6.2.3. Output directory

    AjPDir indexoutdir = NULL;

    indexoutdir = ajAcdGetOutdir("indexoutdir");

6.14.6.2.4. Directory (input or output)

    AjPDir directory = NULL;

    directory = ajAcdGetDirectory("directory");

6.14.6.3. Alternative ACD Retrieval Functions

To retrieve a directory name rather than the directory itself call either ajAcdGetDirectoryName (directory ACD object) or ajAcdGetOutdirName (outdir ACD object). Both functions return the directory name as a string object. The directory and name strings are owned by the caller and should be freed when the application exits. For example:

    AjPDir directory   = NULL;
    AjPDir indexoutdir = NULL;
    AjPStr indir       = NULL;
    AjPStr outdir      = NULL;

    directory   = ajAcdGetDirectory("directory");
    indexoutdir = ajAcdGetOutdir("indexoutdir");

    indir  = ajAcdGetDirectoryName("directory");
    outdir = ajAcdGetOutdirName("indexoutdir");

    /* ... */

    ajStrDel(&indir);
    ajStrDel(&outdir);
    ajDirDel(&directory);
    ajDirDel(&indexoutdir);

Directories have attributes defined through ACD including default file extensions which can be useful for processing selected input files, or writing output files with a defined (or user-defined) file type.

6.14.6.4. Processing Command line Options and ACD Attributes

Currently there are no functions for this.

6.14.6.5. Memory and File Management

It is your responsibility to close any files and free up memory at the end of the program.

6.14.6.5.1. Closing Files

To close an input file or input data file call ajFileClose with the address of the input file.

To close an output file held as an AjPOutfile object (such as an output data file) call ajOutfileClose with the address of the output file. An output file held as an AjPFile object is closed with ajFileClose.

Following from the examples above:

    AjPFile pepstatsfile = NULL;
    AjPFile outfile      = NULL;

    pepstatsfile = ajAcdGetInfile("pepstatsfile");
    outfile      = ajAcdGetOutfile("outfile");

    /* Do something with files */

    /* Both objects are of type AjPFile, so both are closed with ajFileClose */
    ajFileClose(&pepstatsfile);
    ajFileClose(&outfile);

In both cases the file is closed and memory for the file object is freed.

6.14.6.5.2. Freeing Memory

You must call the destructor functions (see below) to free the memory used for any objects returned by calls to ajAcdGet*.

6.14.7. File and Directory Object Memory Management

6.14.7.1. Default Object Construction

To use a file or directory object that is not defined in the ACD file you must first instantiate the appropriate object pointer. The default constructor functions described below take the name of the file or directory to open.

6.14.7.1.1. General input and output files

The basic constructors for general input and output files are:

/* Input file  */
AjPFile  ajFileNewInNameS (const AjPStr name);          

/* Input file  */
AjPFile  ajFileNewInNameC (const char* name);           

/* Output file */
AjPFile  ajFileNewOutNameS (const AjPStr name);         

/* Output file */
AjPFile  ajFileNewOutNameC (const char* name);          

/* Perl-style Pipe. */
AjPFile  ajFileNewInPipe (const AjPStr str);

/* Output file */
AjPOutfile  ajOutfileNewNameS (const AjPStr name);

You will typically use an AjPFile for input and an AjPOutfile for output and therefore use ajOutfileNewNameS in preference to ajFileNewOutNameS or ajFileNewOutNameC.

6.14.7.1.2. Data files

The default constructor functions for input data files are:

/* Input data file */
AjPFile  ajDatafileNewInNameS (const AjPStr name);     

/* Input data file */
AjPFile  ajDatafileNewInNameC (const char* name);      

The functions take the file name as a C-type (char *) or string object (AjPStr) but are otherwise identical. They will open the named file if it exists in one of the following paths which are searched in the order shown:

  1. . (current working directory)

  2. ./.embossdata (EMBOSS data directory)

  3. ~/ (user's home directory)

  4. ~/.embossdata (EMBOSS data directory)

  5. $EMBOSS_DATA (EMBOSS data directory)
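The search order above can be pictured with a short sketch in plain C. It is not the AJAX implementation: find_data_file and its arguments are hypothetical names, and the caller is assumed to supply the directory list already expanded (for example with ~ and $EMBOSS_DATA resolved).

```c
#include <stdio.h>

/* Hypothetical helper, not part of AJAX: try each directory in order and
** stop at the first one in which the named file can be opened for reading.
** The full path is written to 'found'. Returns the index of the directory
** that matched, or -1 if the file is in none of them. */
static int find_data_file(const char *name, const char *const *dirs,
                          size_t ndirs, char *found, size_t foundlen)
{
    size_t i;

    for(i = 0; i < ndirs; i++)
    {
        FILE *fp;
        int n = snprintf(found, foundlen, "%s/%s", dirs[i], name);

        if(n < 0 || (size_t) n >= foundlen)
            continue;              /* path would not fit; skip it */

        fp = fopen(found, "r");

        if(fp)
        {
            fclose(fp);
            return (int) i;        /* first match wins */
        }
    }

    if(foundlen)
        found[0] = '\0';           /* nothing found */

    return -1;
}
```

Because the loop returns on the first match, a file in the current working directory hides a file of the same name in any directory later in the list.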

The default constructor function for output data files is:

AjPFile  ajDatafileNewOutNameS (const AjPStr name);

ajDatafileNewOutNameS will return a pointer to a file object for a file in the EMBOSS data directory ($EMBOSS_DATA).

6.14.7.1.3. Buffered input files

There is no ACD datatype for buffered files although there is an AJAX datatype (AjPFilebuff). To use a buffered file you must first instantiate the object pointer using a filename, an open file, or buffered lines of data. The basic constructor functions are:

/* Named buffered input file */
AjPFilebuff  ajFilebuffNewNameS (const AjPStr name);

/* Named buffered input file and directory  */
AjPFilebuff  ajFilebuffNewNamePathS (const AjPStr name, const AjPStr path);

/* Named directory and wildcard filename to read all matching files */
AjPFilebuff  ajFilebuffNewPathWild (const AjPStr path,
                                    const AjPStr wildname);

/* Named directory and wildcard filename to read all matching files with exceptions */
AjPFilebuff  ajFilebuffNewPathWildExclude(const AjPStr path,
                                          const AjPStr wildname,
                                          const AjPStr exclude);

/* no file - to be loaded with data for re-reading */
AjPFilebuff  ajFilebuffNewNofile (void);

ajFilebuffNewNofile is a primitive constructor that creates the object without a file stream. It is used in several places in EMBOSS where the buffer will be loaded with data records; examples can be found in the code that processes sequence databases. The usual constructor is ajFilebuffNewNameS, which opens the named file.

6.14.7.1.4. Directories

Default constructors are provided for input and output:

/* Directory for input  */
AjPDir  ajDirNewPath (const AjPStr path);     

/* Directory for output */
AjPDir  ajDiroutNewPath (const AjPStr path);  

The constructors return the address of a new object. The pointers do not need to be initialised to NULL but it is good practice to do so:

    AjPFile      filein  = NULL;
    AjPFilebuff  buffin  = NULL;
    AjPFile      datain  = NULL;

    AjPOutfile   fileout = NULL;

    AjPDir       dirin   = NULL;
    AjPDir       dirout  = NULL;

    AjPStr       fileout_name= NULL;
    AjPStr       buffin_name = NULL;
    AjPStr       dirin_name  = NULL;
    AjPStr       dirout_name = NULL;


    fileout_name = ajStrNewC("outfile");
    buffin_name  = ajStrNewC("bufffile");
    dirin_name   = ajStrNewC("./dirin");
    dirout_name  = ajStrNewC("./dirout");

    filein  = ajFileNewInNameC("infile");
    fileout = ajOutfileNewNameS(fileout_name);
    datain  = ajDatafileNewInNameC("datafile");
    buffin  = ajFilebuffNewNameS(buffin_name);
    dirin   = ajDirNewPath(dirin_name);
    dirout  = ajDiroutNewPath(dirout_name);


    /* The objects are instantiated and ready for use */

6.14.7.2. Default Object Destruction

You must free the memory for any object once you are finished with it. All the default destructor functions take the address of the object being freed:

/* File object destructor */
void  ajFileClose (AjPFile* Pfile);           

/* Output file object destructor */
void  ajOutfileClose (AjPOutfile* Pfile);      

/* Buffered file object destructor */
void  ajFilebuffDel (AjPFilebuff* Pfile);   

/* Directory destructor */
void  ajDirDel (AjPDir* Pdir); 

The destructor functions close the file (if appropriate) and free memory for the file or directory object. Following from the example above:

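    ajFileClose(&filein);
    ajFileClose(&datain);
    ajOutfileClose(&fileout);
    ajFilebuffDel(&buffin);
    ajDirDel(&dirin);
    ajDirDel(&dirout);
    ajStrDel(&fileout_name);
    ajStrDel(&buffin_name);
    ajStrDel(&dirin_name);
    ajStrDel(&dirout_name);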

For a list of files (filelist or dirlist) it is necessary to free the list of filenames (which are stored as strings), once you are done with them, by calling ajListstrFreeData:

    AjPList files = NULL;
    AjPList filelist = NULL;
    AjIList iter     = NULL;
    AjPFile infile   = NULL;
    AjPStr filename  = NULL;

    files    = ajAcdGetFilelist("files");
    filelist = ajAcdGetDirlist("directory");

    /* both lists are filenames as strings */

    iter = ajListIterNewread(filelist);
    while(!ajListIterDone(iter))
    {
        filename = (AjPStr) ajListIterGet(iter);
        infile = ajFileNewInNameS(filename);

        if(infile) 
        {

            /* Do something with the file */

            ajFileClose(&infile);
        }
    }

    ajListIterDel(&iter);

    ajListstrFreeData(&files);
    ajListstrFreeData(&filelist);

Where a directory is used, either for input (directory) or for output (outdir), you must call the directory destructor function ajDirDel once you are done with it:

    AjPDir indexoutdir = NULL;
    AjPDir directory = NULL;
    AjPList filelist = NULL;

    indexoutdir = ajAcdGetOutdir("indexoutdir");
    directory = ajAcdGetDirectory("directory");

    filelist = ajListstrNew();
    ajFilelistAddDirectory(filelist, directory);

    /* Do something with directory or list of files */

    ajDirDel(&indexoutdir);
    ajDirDel(&directory);
    ajListstrFreeData(&filelist);

6.14.7.3. Alternative Object Construction and Loading

The numerous alternative constructor functions described below provide maximum flexibility in file object construction and loading. All constructors return the address of a new object.

6.14.7.3.1. General input file

For ajFileNewListinPathWild and ajFileNewListinPathWildExclude the filename may be wildcarded and all files matching the name will be opened for reading. ajFileNewListinPathWildExclude also takes a second string (exclude), a wildcard string specifying which file names to exclude. ajFileNewListinNameDirS and ajFileNewListinDirPre also open a named file from a directory but, in contrast to the other functions mentioned, add a default extension defined in a directory object (AjPDir):

/* List of input file names as strings (AjPStr) */
AjPFile  ajFileNewListinList (AjPList list);

/* Specify a directory object. */
AjPFile  ajFileNewListinNameDirS (const AjPStr name, const AjPDir dir);       

/* Specify a directory object and filename prefix. The directory object supplies the filename extension. */
AjPFile  ajFileNewListinDirPre (const AjPDir dir, const AjPStr prefix);

/* Read all files matching wild card. */
AjPFile  ajFileNewListinPathWild (const AjPStr name, const AjPStr path); 

/* Read all files matching / excluding wild card. */
AjPFile  ajFileNewListinPathWildExclude (const AjPStr name,const AjPStr path, const AjPStr exclude);

/* Read using blocked fread calls */
AjPFile  ajFileNewInBlockS (const AjPStr name, ajuint blocksize);

Functions with the prefix ajFileNewFrom construct the file object from some other entity; either a UNIX pipe, a list of input files or from a C-type (FILE *) file pointer:

/* C-type FILE pointer. */
AjPFile  ajFileNewFromCfile(FILE* file);

6.14.7.3.2. infile (data file)

A named input data file may be read from a specific data subdirectory by calling:

AjPFile  ajDatafileNewInNamePathS (const AjPStr name, const AjPStr path);

The file will be searched for in the usual EMBOSS data directories (see above) and in the subdirectory path of the active EMBOSS data directory.

6.14.7.3.3. Buffered input files

Buffered input files may be read from a named directory using the following functions, which are otherwise identical to the general file handling functions with similar names (see above):

/* Specify a path. */
AjPFilebuff  ajFilebuffNewNamePathS (const AjPStr name, const AjPStr path);    

/* Specify a path. */
AjPFilebuff  ajFilebuffNewNamePathC (const char* name, const AjPStr path);     

/* Read all files matching wild card. */
AjPFilebuff  ajFilebuffNewPathWild (const AjPStr path, const AjPStr wildname);

/* Construct from all files matching / excluding wild card. */
AjPFilebuff  ajFilebuffNewPathWildExclude (const AjPStr path, const AjPStr wildname, const AjPStr exclude);

Buffered input files may be constructed from some other entity; either a file object, a C-type (FILE *) file pointer, from another buffered input file (in which case the file is copied), from a string (rather than a file) or from a list of file names:

/* from an already open file */
AjPFilebuff  ajFilebuffNewFromFile (AjPFile file);

/* from an already open C file */
AjPFilebuff  ajFilebuffNewFromCfile (FILE* file);

/* no file, but with one line of buffered data */
AjPFilebuff  ajFilebuffNewLine (const AjPStr line);

/* list of filenames to be read in turn */
AjPFilebuff  ajFilebuffNewListinList(AjPList);

The string passed to ajFilebuffNewLine should contain one line of buffered data for reading.

ajFilebuffNewFromFile and ajFilebuffNewFromCfile both take ownership of the file pointer that is passed, so you must be careful to avoid calling the destructor twice on the same object when managing the memory for your objects:

AjPFile     file = NULL;
AjPFilebuff buff = NULL;

file  = ajFileNewInNameC("infile");
buff  = ajFilebuffNewFromFile(file);

/* Do something with buffered file */

/* file is owned by buff so we only need to call one destructor (close) function */
ajFilebuffDel(&buff);

6.14.7.3.4. directory

A directory object can be constructed to exclusively read or write files with a specified file extension (ext) or a specified prefix (pre) and file extension:

/* Input directory  */
AjPDir  ajDirNewPathExt (const AjPStr path, const AjPStr ext);
AjPDir  ajDirNewPathPreExt (const AjPStr path, const AjPStr pre, const AjPStr ext);

/* Output directory */
AjPDirout  ajDiroutNewPath (const AjPStr path);
AjPDirout  ajDiroutNewPathExt (const AjPStr path, const AjPStr ext);

6.14.7.3.5. General output file

A named general output file may be constructed in a given directory which may be specified as a directory object (AjPDir) or as a file path:

/* Directory object. */
AjPFile  ajFileNewOutNameDirS (const AjPStr name, const AjPDir dir);   

/* Path. */
AjPFile  ajFileNewOutNamePathS (const AjPStr name, const AjPStr path); 

To open a named general output file for appending you can use:

AjPFile  ajFileNewOutappendNameS (const AjPStr name);

6.14.8. Reading from File

Functions for reading lines from a general file have the prefix ajReadline. The file position before the read can be retained, newline characters can be trimmed from the line, and the line can be appended to a provided buffer:

/* Read a line from a file. */
AjBool  ajReadline (AjPFile file, AjPStr* Pstr); 

/* Read and retain file position before read. */
AjBool  ajReadlinePos (AjPFile file, AjPStr* Pstr, ajlong* fpos); 

/* Read and trim any trailing newline character. */
AjBool  ajReadlineTrim (AjPFile file, AjPStr* Pstr); 

/* Read, trim newline characters and retain file position before read.  */
AjBool  ajReadlineTrimPos (AjPFile file, AjPStr* Pstr, ajlong* fpos); 

/* Read and append to buffer. */
AjBool  ajReadlineAppend (AjPFile file, AjPStr* Pbuff); 

To read a binary file call:

/* Binary read from a file using the C 'fread' function. */
AjBool  ajReadbinBinary (AjPFile file, size_t count, size_t element_size,
                         void* buffer);   

/* Read unsigned int */
AjBool  ajReadbinUint (AjPFile file,  ajuint* ret);

/* Read a big-endian unsigned int, converting if needed. */
AjBool  ajReadbinUintEndian (AjPFile file,  ajuint* ret);

ajReadbinBinary will read count elements of size element_size from an input file. ajReadbinUint and ajReadbinUintEndian read an unsigned integer and convert from little-endian (the form used for binary data output, e.g. database index files) or big-endian byte order respectively. Both functions use the C function fread.
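The conversion step can be illustrated in plain C. This is a minimal sketch of byte-order handling in general, not the code used by ajReadbinUint or ajReadbinUintEndian; swap_u32, host_is_little_endian and u32_read_le are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers, not part of AJAX. */

/* Reverse the byte order of a 32-bit unsigned integer. */
static uint32_t swap_u32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}

/* Return 1 if the host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);

    return first == 1;
}

/* Interpret 4 bytes that were stored least significant byte first.
** The bytes are copied as fread would deliver them and are swapped
** only when the host byte order differs from the stored order. */
static uint32_t u32_read_le(const unsigned char *buf)
{
    uint32_t raw;

    memcpy(&raw, buf, sizeof raw);

    return host_is_little_endian() ? raw : swap_u32(raw);
}
```

The swap is its own inverse, so the same helper serves for reading big-endian data on a little-endian host.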

6.14.8.1. Buffered files

Functions with the prefix ajBuffread read a line from a buffered file. The line that is read can be appended to a string and the file position before the read can be retained:

/* Read a line from a buffered file. */
AjBool  ajBuffreadLine (AjPFilebuff file, AjPStr* Pstr); 

/* Read a line and note the file position. */
AjBool  ajBuffreadLinePos (AjPFilebuff file, AjPStr* Pstr, ajlong* fpos); 

/* Read a line and optionally append to a Pstore buffer */
AjBool  ajBuffreadLinePosStore (AjPFilebuff buff, AjPStr* Pdest, ajlong* Ppos,
                                AjBool dostore, AjPStr *Pstore);

6.14.9. Writing to Files

6.14.9.1. General files

There are three functions from ajfmt.c/h for writing formatted output. The most widely used is ajFmtPrintF, which writes to an AJAX file object. The alternative functions write to a C FILE stream or to standard output:

/* Format and emit the "..." arguments according to fmt; writes to a file object */
void  ajFmtPrintF (AjPFile file,
                   const char *fmt, ...);

/* Format and emit the "..." arguments according to fmt; writes to  C FILE stream */
void  ajFmtPrintFp (FILE *stream,
                    const char *fmt, ...);

/* Format and emit the "..." arguments according to fmt; writes to stdout */
void  ajFmtPrint (const char *fmt, ...);

6.14.9.2. Binary files

Functions for writing to a binary file have the prefix ajWritebin. A single byte can be written, a C-type string (char *) or a string object can be written to a field of defined size (len), and numeric values can be written with the correct byte order:

/* Writes a single byte. */
ajint  ajWritebinByte (AjPFile file, char chr);

/* Writes a C-type string in a field of defined size. */
ajint  ajWritebinChar (AjPFile file, const char* txt, ajint len);

/* Writes a string object in a field of defined size. */
ajint  ajWritebinStr (AjPFile file, const AjPStr str, ajint len);

/* Writes a 2-byte integer . */
ajint  ajWritebinInt2 (AjPFile file, ajlong i); 

/* Writes a 4-byte integer. */
ajint  ajWritebinInt4 (AjPFile file, ajlong i); 

/* Writes a 8-byte integer. */
ajint  ajWritebinInt8 (AjPFile file, ajlong i); 
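The two ideas behind these functions, a fixed byte order for numbers and a fixed width for text fields, can be sketched in plain C. This is an illustration only and not the AJAX code: write_u32_le and write_field are hypothetical names, and padding short text with NUL bytes is an assumption of the sketch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers, not part of AJAX. */

/* Write the low 4 bytes of 'value' least significant byte first,
** whatever the byte order of the host. Returns 1 on success. */
static int write_u32_le(FILE *fp, unsigned long value)
{
    unsigned char bytes[4];
    int i;

    for(i = 0; i < 4; i++)
        bytes[i] = (unsigned char) ((value >> (8 * i)) & 0xFFu);

    return fwrite(bytes, 1, sizeof bytes, fp) == sizeof bytes;
}

/* Write 'txt' into a field of exactly 'len' bytes, truncating long text
** and padding short text with NUL bytes (an assumption of this sketch).
** Returns 1 on success. */
static int write_field(FILE *fp, const char *txt, size_t len)
{
    size_t n = strlen(txt);
    size_t i;

    if(n > len)
        n = len;

    if(fwrite(txt, 1, n, fp) != n)
        return 0;

    for(i = n; i < len; i++)
        if(fputc('\0', fp) == EOF)
            return 0;

    return 1;
}
```

Writing bytes one at a time in a defined order means the file has the same layout on every platform, so a reader only needs to know the stored order.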

6.14.9.3. Buffered files

In some cases it is necessary to write to a buffered file that is otherwise being used for input. Three functions are provided to write lines to a buffered file for input. These are used internally, for example when reading binary sequence formats.

/* Writes a C-type string to the buffer */
void  ajFilebuffLoadC (AjPFilebuff file, const char* txt);   

/* Writes a string object to the buffer */
void  ajFilebuffLoadS (AjPFilebuff file, const AjPStr str);  

/* Writes all lines from a file into the buffer. */
void  ajFilebuffLoadAll (AjPFilebuff file);  

6.14.10. Manipulating Files

Three functions are provided to set the properties of a file object:

/* Close current file and open next one. */
AjBool  ajFileReopenNext (AjPFile* Pfile);  

/* Reopens a file with a new name. */
FILE*  ajFileReopenName (AjPFile* Pfile, const AjPStr name); 

/* Sets the current position in an open file. */
ajint  ajFileSeek (AjPFile* Pfile, ajlong offset, ajint from); 

For a file object that includes a list of input files, ajFileReopenNext will close the current input file and open the next one. ajTrue is returned on success or ajFalse otherwise. ajFileReopenName calls the C function freopen to close the current file and open a file with a new name. ajFileSeek is, essentially, a wrapper to the C function fseek and sets the current position in an open file. In cases where end-of-file was reached and this function is used to move back somewhere in the file, then the end-of-file flag in the file object is reset.
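The end-of-file behaviour described for ajFileSeek can be pictured with a plain C sketch. SketchFile, sketch_getc and sketch_seek are hypothetical names that stand in for a file object and its functions; the real AjPFile structure is not shown here.

```c
#include <stdio.h>

/* Hypothetical stand-in for a file object: a C stream plus an
** end-of-file flag maintained by the wrapper functions. */
typedef struct SketchFile
{
    FILE *fp;
    int   eof;
} SketchFile;

/* Read one character, recording end-of-file in the object. */
static int sketch_getc(SketchFile *file)
{
    int c = fgetc(file->fp);

    if(c == EOF)
        file->eof = 1;

    return c;
}

/* Wrapper around fseek: on success, clear the object's end-of-file flag
** so that reading can resume. Returns the fseek result (0 on success). */
static int sketch_seek(SketchFile *file, long offset, int from)
{
    int ret = fseek(file->fp, offset, from);

    if(ret == 0)
        file->eof = 0;

    return ret;
}
```

Keeping the flag in the object, rather than relying on the stream alone, lets other functions test for end-of-file without touching the underlying FILE pointer.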

Functions for setting the properties of buffered file objects have the prefix ajFilebuffSet and include:

/* Sets file to be buffered. */
AjBool  ajFilebuffSetBuffered (AjPFilebuff* Pfile); 

/* Sets file to be unbuffered. */
AjBool  ajFilebuffSetUnbuffered (AjPFilebuff* Pfile); 

/* Sets next read to start at first buffered line. */
AjBool  ajFilebuffReset (AjPFilebuff* Pfile); 

/* Sets next read position and resets file position. */
AjBool  ajFilebuffResetPos (AjPFilebuff* Pfile); 

/* Resets the pointer and current record and optionally clears Pstore buffer */
void  ajFilebuffResetStore (AjPFilebuff buff, AjBool dostore, AjPStr *Pstore);

/* Deletes processed lines from a file buffer. */
AjBool  ajFilebuffClear (AjPFilebuff* Pfile, ajint n); 

/* Clears processed records and removes extra record(s) from the Pstore buffer  */
AjBool  ajFilebuffClearStore (AjPFilebuff buff, ajint lines, const AjPStr lastline,
                              AjBool dostore, AjPStr *Pstore); 

Buffering of file input can be turned on and off by calling ajFilebuffSetBuffered and ajFilebuffSetUnbuffered respectively (buffering for an AjPFilebuff object is on by default). Functions with the prefix ajFilebuffReset set the next read position to the start of the first buffered line. ajFilebuffResetPos also resets the current file position to the last known read position, as a precaution in cases where this might have been changed by some other function. ajFilebuffResetStore also clears the caller's record of processed lines, using the same control parameter as ajBuffreadLinePosStore. ajFilebuffClear and ajFilebuffClearStore delete the buffered lines processed so far. ajFilebuffClearStore also clears the caller's record of processed lines, again using the same control parameter as ajBuffreadLinePosStore.
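The difference between resetting and clearing a buffer can be shown with a small line buffer in plain C. This is a sketch of the idea only, not the AjPFilebuff implementation (which holds its lines in a linked list); LineBuf and the linebuf_* functions are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical line buffer, not the AjPFilebuff implementation:
** lines are kept after they are read so that they can be read again. */
typedef struct LineBuf
{
    char  **lines;   /* buffered lines, oldest first   */
    size_t  count;   /* number of buffered lines       */
    size_t  next;    /* index of the next line to read */
} LineBuf;

/* Append a copy of 'txt' to the buffer. Returns 1 on success. */
static int linebuf_load(LineBuf *buf, const char *txt)
{
    char **grown = realloc(buf->lines, (buf->count + 1) * sizeof *grown);
    char *copy;

    if(!grown)
        return 0;

    buf->lines = grown;
    copy = malloc(strlen(txt) + 1);

    if(!copy)
        return 0;

    strcpy(copy, txt);
    buf->lines[buf->count++] = copy;

    return 1;
}

/* Return the next buffered line, or NULL when all have been read. */
static const char *linebuf_read(LineBuf *buf)
{
    return buf->next < buf->count ? buf->lines[buf->next++] : NULL;
}

/* Reset: the next read starts again at the first buffered line. */
static void linebuf_reset(LineBuf *buf)
{
    buf->next = 0;
}

/* Clear: discard the lines processed so far; unread lines are kept. */
static void linebuf_clear(LineBuf *buf)
{
    size_t i;

    if(buf->next == 0)
        return;

    for(i = 0; i < buf->next; i++)
        free(buf->lines[i]);

    memmove(buf->lines, buf->lines + buf->next,
            (buf->count - buf->next) * sizeof *buf->lines);
    buf->count -= buf->next;
    buf->next = 0;
}
```

A reset is what allows a parser to try one format, fail, and re-read the same lines with another; a clear releases lines once they have been accepted.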

6.14.11. Querying Properties of Files

6.14.11.1. Basic Properties

The file name, C-type file pointer (FILE *) and file length of a file object are available by calling:

/* File name. */
AjPStr  ajFileGetNameS (const AjPFile file);   

/* File name. */
const char*  ajFileGetNameC (const AjPFile file);   

/* C file pointer. */
FILE*  ajFileGetFileptr (const AjPFile file); 

These functions (and other *Get* functions below) return elements (or values calculated from them) of the object itself, therefore care must be taken not to inadvertently change the object in cases where a pointer is returned.

Functions with the prefix ajFileIs check whether a file is a certain type or if the object satisfies some other property:

/* File is stderr. */
AjBool  ajFileIsStderr (const AjPFile file);  

/* File is stdin. */
AjBool  ajFileIsStdin (const AjPFile file);   

/* File is stdout. */
AjBool  ajFileIsStdout (const AjPFile file);  

/* File is set for appending. */
AjBool  ajFileIsAppend (const AjPFile file);      

/* File position is at end-of-file. */
AjBool  ajFileIsEof (const AjPFile file);

These functions (and *Is* functions for other datatypes below) return ajTrue if the condition is met.

6.14.11.2. Output files

The file object, C-type file pointer (FILE *) from the file object and format are available for retrieval from an output file object:

/* File object.    */
AjPFile  ajOutfileGetFile (const AjPOutfile file);      

/* Output format.  */
AjPStr  ajOutfileGetFormat (const AjPOutfile file); 

/* C file pointer. */
FILE*  ajOutfileGetFileptr (const AjPOutfile file);

6.14.11.3. Buffered files

The file object, C-type file pointer (FILE *) from the file object and standard record buffer size are available for retrieval from a buffered file object:

/* File object */
AjPFile  ajFilebuffGetFile (const AjPFilebuff file); 

/* C file pointer from file object */
FILE*  ajFilebuffGetFileptr (const AjPFilebuff file);

A buffered file can be tested for whether it is empty, buffered, whether end-of-file has been reached or whether the file is exhausted (EOF is reached and the buffer is empty):

/* File buffer is empty. */
AjBool  ajFilebuffIsEmpty (const AjPFilebuff file); 

/* Input file is buffered. */
AjBool  ajFilebuffIsBuffered (const AjPFilebuff file); 

/* File is exhausted (EOF is reached and buffer is empty). */
AjBool  ajFilebuffIsEnded (const AjPFilebuff file); 

/* EOF is reached.  */
AjBool  ajFilebuffIsEof (const AjPFilebuff file); 

6.14.11.4. Directories

The directory name and file extension of files for reading are available for a directory object:

AjPStr  ajDirGetPath (const AjPDir Pdir);
AjPStr  ajDirGetExt (const AjPDir Pdir);
AjPStr  ajDirGetPrefix (const AjPDir Pdir); 

The directory object can be used to generate a list of full path names of files in the directory and matching any specified prefix and extension:

ajint  ajFilelistAddDirectory (AjPList list, const AjPDir dir); 

6.14.12. Querying and Manipulating File and Directory Names

Functions with the prefix ajFilenameTest test a file name against wildcards and are described later in this section. To check whether a string representing a file name includes a path, call ajFilenameHasPath, which returns ajTrue if the property is satisfied:

/* Checks for path. */
AjBool  ajFilenameHasPath (const AjPStr name);

There are various functions to test a file name (with or without a path specification) against wildcards defining file names to include or exclude. File names may be included or excluded by default. The inclusion wildcard is used to select files, and the exclusion wildcard is then used to exclude selected files again:

/* Tests a file name. Default is to exclude. */
AjBool  ajFilenameTestExclude(const AjPStr filename,
                              const AjPStr exclude,
                              const AjPStr include);

/* Tests a full path file name. Default is to exclude. */
AjBool  ajFilenameTestExcludePath(const AjPStr filename,
                                  const AjPStr exclude,
                                  const AjPStr include);

/* Tests a file name. Default is to include. */
AjBool  ajFilenameTestInclude(const AjPStr filename,
                              const AjPStr exclude,
                              const AjPStr include);

/* Tests a full path file name. Default is to include. */
AjBool  ajFilenameTestIncludePath(const AjPStr filename,
                                  const AjPStr exclude,
                                  const AjPStr include); 
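
The two default behaviours can be sketched in standalone C using the POSIX fnmatch function. This is only an illustration under stated assumptions: the demo_ functions are hypothetical, each takes a single wildcard rather than a list, and the "default is to include" rule is assumed to be the mirror image of the rule described above (the exclusion wildcard removes files, and the inclusion wildcard then restores them). The AJAX functions may differ in detail.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* True if name matches the shell-style wildcard pattern. */
bool demo_match(const char *name, const char *pattern)
{
    assert(name != NULL && pattern != NULL);
    return fnmatch(pattern, name, 0) == 0;
}

/* Default is to exclude: the inclusion wildcard selects files,
 * then the exclusion wildcard removes selected files again. */
bool demo_test_default_exclude(const char *name,
                               const char *exclude,
                               const char *include)
{
    return demo_match(name, include) && !demo_match(name, exclude);
}

/* Default is to include (ASSUMED mirror image): the exclusion wildcard
 * removes files, then the inclusion wildcard restores them. */
bool demo_test_default_include(const char *name,
                               const char *exclude,
                               const char *include)
{
    return !demo_match(name, exclude) || demo_match(name, include);
}
```

With exclude "tmp*" and include "*.fasta", the file notes.txt is rejected when the default is to exclude (nothing selects it) but accepted when the default is to include (nothing removes it).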

There are functions to check whether a named file exists and, if so, whether it is a directory or is readable, writeable or executable:

/* File exists */
AjBool  ajFilenameExists (const AjPStr filename);    

/* Exists and is a directory */
AjBool  ajFilenameExistsDir (const AjPStr filename); 

/* Exists and is executable */
AjBool  ajFilenameExistsExec (const AjPStr filename); 

/* Exists and is readable */
AjBool  ajFilenameExistsRead (const AjPStr filename); 

/* Exists and is writeable */
AjBool  ajFilenameExistsWrite (const AjPStr filename); 
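
For readers unfamiliar with what such checks involve, the following standalone sketch shows equivalent tests written with the POSIX calls access and stat. The demo_ functions are hypothetical and take plain C strings; whether the AJAX functions are implemented this way is not stated here and should not be assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/* The path exists. */
bool demo_exists(const char *path)
{
    assert(path != NULL);
    return access(path, F_OK) == 0;
}

/* The path exists and is a directory. */
bool demo_exists_dir(const char *path)
{
    struct stat st;

    assert(path != NULL);
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* The path exists and the calling process may read it. */
bool demo_exists_read(const char *path)
{
    assert(path != NULL);
    return access(path, R_OK) == 0;
}

/* The path exists and the calling process may write to it. */
bool demo_exists_write(const char *path)
{
    assert(path != NULL);
    return access(path, W_OK) == 0;
}

/* The path exists and the calling process may execute it. */
bool demo_exists_exec(const char *path)
{
    assert(path != NULL);
    return access(path, X_OK) == 0;
}
```

Checks of this kind are advisory: the file can change between the test and its later use, so the result of opening the file should still be checked.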

To return the size of a named file, call ajFilenameGetSize:

ajlong  ajFilenameGetSize (const AjPStr name);

Functions for setting a string representing a file name have the prefix ajFilenameTrim if they remove the path and/or file extension from the name. The trim functions include:

/* Truncates a filename to a basic file name. */
AjBool  ajFilenameTrimAll (AjPStr* Pname);

/* Remove directory path and file extension from a file name. */
AjBool  ajFilenameTrimPathExt (AjPStr* Pname);       

/* Remove directory path (if any) only. */
AjBool  ajFilenameTrimPath (AjPStr* Pname);  

/* Remove extension (if any) only. */
AjBool  ajFilenameTrimExt (AjPStr* Pname);
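
The effect of trimming a path and an extension can be illustrated with a standalone sketch on plain C strings. The demo_ functions are hypothetical, edit the buffer in place, and make two assumptions that are not taken from the AJAX documentation: the return value reports whether the name changed, and a leading dot (as in ".profile") is not treated as an extension.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Remove any leading directory path in place.
 * Returns true if the name changed (an assumption of this sketch). */
bool demo_trim_path(char *name)
{
    char *slash;

    assert(name != NULL);
    slash = strrchr(name, '/');
    if (slash == NULL)
        return false;                       /* no path to remove */
    memmove(name, slash + 1, strlen(slash + 1) + 1);
    return true;
}

/* Remove the final extension in place, looking only at the base name
 * so that dots in directory names are left alone. */
bool demo_trim_ext(char *name)
{
    char *slash, *base, *dot;

    assert(name != NULL);
    slash = strrchr(name, '/');
    base = slash ? slash + 1 : name;
    dot = strrchr(base, '.');
    if (dot == NULL || dot == base)         /* no extension, or a dotfile */
        return false;
    *dot = '\0';
    return true;
}

/* Remove both the directory path and the extension. */
bool demo_trim_path_ext(char *name)
{
    bool changed = demo_trim_path(name);

    if (demo_trim_ext(name))
        changed = true;
    return changed;
}
```

Applied to "/data/seqs/globin.fasta", demo_trim_path_ext leaves "globin" in the buffer.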

Functions with the prefix ajFilenameReplace replace the file extension or directory path with another value. Functions with the prefix ajFilenameSet set the file name in some other way; for example, ajFilenameSetTempname sets the name to an available temporary file name, and ajFilenameSetTempnamePathS does the same with a specified path:

/* Sets the file extension.*/
AjBool  ajFilenameReplaceExtS (AjPStr* Pname, const AjPStr extn);  
AjBool  ajFilenameReplaceExtC (AjPStr* Pname, const char* extn);

/* Sets the directory path. */
AjBool  ajFilenameReplacePathS (AjPStr* Pname, const AjPStr path);  
AjBool  ajFilenameReplacePathC (AjPStr* Pname, const char* path); 

/* Sets name to an available temporary file name with a defined path. */
AjBool  ajFilenameSetTempnamePathS (AjPStr* Pfilename, const AjPStr str);

/* Sets name to an available temporary file name. */
AjBool  ajFilenameSetTempname (AjPStr* Pfilename);

There are a few functions for manipulating strings representing file paths:

/* Changes directory path to one level up. */
AjBool  ajDirnameUp (AjPStr* Pname);

/* Add trailing '/' if missing. */
AjBool  ajDirnameFix (AjPStr* Path);      

/* Checks for valid path, and appends trailing '/' if missing. */
AjBool  ajDirnameFixExists (AjPStr* Ppath);  

/* Checks for valid path and ensures full path definition is included. */
AjBool  ajDirnameFillPath (AjPStr* Ppath); 

ajDirnameFixExists and ajDirnameFillPath check that the specified path (Ppath) is a valid directory. A trailing / is appended to the path if missing. ajDirnameFillPath will navigate up the directory structure and modify the path if necessary to ensure that the full path definition is included.
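
Appending a missing trailing / is simple but has a buffer-size pitfall in plain C, which the AJAX string type avoids by growing as needed. The standalone sketch below shows the idea on a fixed-size character buffer. The function demo_dirname_fix and its capacity parameter are hypothetical, and rejecting an empty path is a choice made for this sketch only.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Append a trailing '/' to path if it is missing.
 * cap is the total size of the buffer holding path.
 * Returns false for an empty path or if the buffer is too small. */
bool demo_dirname_fix(char *path, size_t cap)
{
    size_t len;

    assert(path != NULL);
    len = strlen(path);
    if (len == 0)
        return false;               /* nothing to fix (sketch's choice) */
    if (path[len - 1] == '/')
        return true;                /* already terminated correctly */
    if (len + 2 > cap)
        return false;               /* no room for '/' and the NUL byte */
    path[len] = '/';
    path[len + 1] = '\0';
    return true;
}
```

Calling the function twice on the same buffer is harmless: the second call finds the trailing / and leaves the path unchanged.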