1.6. Post-installation of EMBOSS

The most important post-installation step is to set your operating system environment so that it knows where to find the EMBOSS applications. Assuming that you followed our suggestion and configured EMBOSS using --prefix=/usr/local/emboss then you need to add the directory /usr/local/emboss/bin to your PATH. How to do this will depend on your operating system and the command shell you use. You can find out which shell you are using by typing:

env | grep SHELL

For users of the sh or bash shells (or derivatives) the PATH is altered using the following lines.

PATH="$PATH:/usr/local/emboss/bin"
export PATH

If you want to make these definitions available for all users then you would typically add the lines to the system /etc/profile file. If you just want to use EMBOSS yourself then you can add the lines to (e.g.) the .bashrc file in your home directory.
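Because startup files such as .bashrc can be read more than once in a session, the PATH can end up with repeated entries. The following is a minimal sketch of a guard against that for sh-type shells; the function name add_to_path is purely illustrative and is not part of EMBOSS.

```shell
# Append a directory to PATH only if it is not already listed.
# add_to_path is a hypothetical helper name, not an EMBOSS command.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present: leave PATH alone
        *) PATH="$PATH:$1" ;;       # otherwise append it
    esac
    export PATH
}

add_to_path /usr/local/emboss/bin
```

Calling the function a second time with the same directory leaves the PATH unchanged.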

For users of csh or tcsh shells the PATH is altered using the following line.

set path=($path /usr/local/emboss/bin)

If you want to make this definition available for all users then you would typically add the line to the system /etc/csh.cshrc file. If you just want to use EMBOSS yourself then you can add the line to (e.g.) the .cshrc file in your home directory.

Note

You may have to log out and log back in again for the changes to your PATH to take effect.

1.6.1. Testing the EMBOSS Installation

An easy way to check that all is working is to use the EMBOSS application embossversion.

% embossversion
Writes the current EMBOSS version number
6.1.0

If the version number of EMBOSS is not printed similarly to the above then all is not well; if it is printed then celebrate appropriately.

1.6.1.1. Common Errors During Testing

The most common error is Command not found whenever you type in an EMBOSS application name. This is caused by incorrectly setting up the PATH (see above). Double-check that you set up the PATH correctly and, if necessary, take advice from someone familiar with the operating system you're using.
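A quick way to diagnose the Command not found case in an sh-type shell is to ask the shell whether it can locate embossversion at all. The sketch below uses only the standard command -v builtin; the function name check_emboss is an illustrative assumption.

```shell
# Report whether the shell can find embossversion on the current PATH.
# check_emboss is a hypothetical helper name, not an EMBOSS command.
check_emboss() {
    if command -v embossversion >/dev/null 2>&1; then
        echo "embossversion found at: $(command -v embossversion)"
    else
        echo "embossversion not found; PATH is currently: $PATH" >&2
        return 1
    fi
}
```

If check_emboss reports that nothing was found, compare the printed PATH with the bin directory under the --prefix you configured.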

The second most common error is a report by the program that it cannot find the libnucleus library. This is one of the EMBOSS libraries and, if you followed our suggestion, it will be found in the /usr/local/emboss/lib directory after the installation phase. As long as you have set up your PATH correctly then EMBOSS should always be able to find its libraries. It has, however, been reported that some systems (notably SuSE Linux variants) have problems. In this case there are a few solutions.

  1. With [Open]SuSE this error often happens if you have not specified a --prefix option or have otherwise installed EMBOSS at the root of the /usr/local directory tree such that the EMBOSS libraries are in the /usr/local/lib directory. [Open]SuSE maintains a cache of the contents of that directory which you will need to rebuild by typing as the superuser:

    ldconfig

    Do this also for other operating systems that maintain such a cache. If the error happens on other operating systems or distributions then you could do one of the following:

    • Add the path to the EMBOSS libraries to the LD_LIBRARY_PATH environment variable. For example:

      LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/fu/bar/lib"
      export LD_LIBRARY_PATH
    • Or, for csh shells:

      setenv LD_LIBRARY_PATH "${LD_LIBRARY_PATH}:/fu/bar/lib"
  2. Add the path to the EMBOSS libraries system-wide. This is perhaps the preferable way. For example, under Linux you could add the following line to the /etc/ld.so.conf file:

    /fu/bar/lib

    and then type:

    ldconfig

    For other operating systems, check the manual pages to see how to do the equivalent operations.
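On Linux systems that provide the ldd utility, you can check which shared libraries an EMBOSS program fails to resolve before and after changing the library settings. This is a sketch under that assumption; the function name missing_libs is illustrative, and the example path assumes the /usr/local/emboss prefix suggested earlier.

```shell
# List shared libraries that the dynamic linker cannot resolve for a program.
# Assumes a Linux-style ldd; missing_libs is a hypothetical helper name.
# Exit status: 0 if unresolved libraries were listed, 1 if none, 2 if the
# program itself could not be found.
missing_libs() {
    [ -x "$1" ] || { echo "no such executable: $1" >&2; return 2; }
    ldd "$1" 2>/dev/null | grep 'not found'
}

# Example (path assumes --prefix=/usr/local/emboss):
# missing_libs /usr/local/emboss/bin/embossversion
```

If libnucleus appears in the output, the library directory is still not being searched.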

1.6.2. Post-installation of Data Files

If you wish to use the restriction mapping, domain recognition and amino acid index applications in EMBOSS then you will need to download the following databases from the Internet; all are relatively small. Download them all to a temporary directory.

1.6.2.1. REBASE

This is available from ftp://ftp.neb.com/pub/rebase/

You need the withrefm and proto files from that directory. A common error is to download the withref file by mistake - it must be the withrefm file. The file extensions for these files change on the server every month to reflect the date.

Then type:

rebaseextract

and follow the prompts.
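Since the withref/withrefm mix-up is common, it can help to check the download directory before running rebaseextract. The sketch below matches on the file stem only, because the extension changes monthly; the function name check_rebase_files is an illustrative assumption.

```shell
# Confirm that both REBASE files needed by rebaseextract are present.
# Matches "withrefm.*" and "proto.*"; a "withref.*" file does not count.
# check_rebase_files is a hypothetical helper name.
check_rebase_files() {
    dir=${1:-.}
    status=0
    for stem in withrefm proto; do
        found=no
        for f in "$dir/$stem".*; do
            if [ -e "$f" ]; then found=yes; fi
        done
        if [ "$found" = yes ]; then
            echo "found $stem file in $dir"
        else
            echo "missing $stem file in $dir" >&2
            status=1
        fi
    done
    return "$status"
}
```

Run it with the temporary download directory as its argument, or with no argument to check the current directory.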

1.6.2.2. AAINDEX

This is available from ftp://ftp.genome.ad.jp/pub/db/community/aaindex/

You need the aaindex1 file from that directory.

Then type:

aaindexextract

and follow the prompts.

1.6.2.3. PRINTS

This is available from ftp://ftp.ebi.ac.uk/pub/databases/prints/

You need the prints.dat file from that directory.

Then type:

printsextract

and follow the prompts.

1.6.2.4. PROSITE

This is available from ftp://ftp.ebi.ac.uk/pub/databases/prosite/release/

You need the prosite.dat and the prosite.doc files from that directory.

Then type:

prosextract

and follow the prompts.

You can now delete the data files you downloaded.

1.6.2.5. JASPAR

This is available from http://jaspar.genereg.net/html/DOWNLOAD/

You need the Archive.zip file. Uncompress it and then run:

jaspextract

and specify the all_data/FlatFileDir directory in response to the prompt. You can now delete the source directory contents.

1.6.3. Deleting the EMBOSS Package

If you followed our advice and gave a --prefix option to the configure command, thereby specifying a directory where EMBOSS alone would be installed, then there are two methods for deleting EMBOSS.

  1. If you've kept the source code tree from which you ran make install.

    In this case, deleting the installation is easy. Just type:

    make uninstall

    This has the advantage that it will delete EMBOSS but will not delete any configuration files you have spent ages developing for your system. This is useful if you wish to reinstall a new version of EMBOSS after the deletion.

  2. If you didn't keep the source code tree.

    As long as you specified a suitable --prefix option to the configure command then you can use a UNIX rm -rf directoryname command to delete the EMBOSS installation tree.

    If you didn't specify a --prefix option to the configure command but did do a make install then you'll have to clean EMBOSS out of the /usr/local directory tree manually or, better, reinstall the same version of EMBOSS on top of itself and then use the make uninstall method.
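Because rm -rf is unforgiving, you may prefer to wrap it in a small check. The following is a minimal sketch, not a complete safeguard: it refuses a few obviously shared directories and requires that the target contains bin/embossversion. The function name remove_emboss_tree is an illustrative assumption.

```shell
# Delete an EMBOSS installation tree, with two basic sanity checks first.
# remove_emboss_tree is a hypothetical helper name, not an EMBOSS command.
remove_emboss_tree() {
    dir=$1
    # Strip trailing slashes so "/usr/local/" is treated like "/usr/local".
    while [ "$dir" != "${dir%/}" ]; do dir=${dir%/}; done
    case "$dir" in
        ""|/usr|/usr/local)
            echo "refusing to delete shared directory '$1'" >&2
            return 1 ;;
    esac
    if [ ! -x "$dir/bin/embossversion" ]; then
        echo "'$dir' does not look like an EMBOSS install prefix" >&2
        return 1
    fi
    rm -rf -- "$dir"
}

# Example (assumes --prefix=/usr/local/emboss):
# remove_emboss_tree /usr/local/emboss
```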

1.6.4. Keeping EMBOSS Up To Date

From time to time, bug fixes or new functionality are provided, which can be applied to the version of EMBOSS you have installed. At such times new source code files will appear on our FTP server in the directory:

ftp://emboss.open-bio.org/pub/EMBOSS/fixes/

Usually these source code files are replacements for files that came with the EMBOSS distribution. You should read the README.fixes file in the above directory to see what the file fixes and whereabouts in the EMBOSS source directory it lives.

To apply the fixes, copy the source code file to its correct location, return to the top level of your EMBOSS source code tree and type:

make clean
make install

This is, of course, another very good reason for not deleting your EMBOSS source code tree.

A more convenient way of applying all the fixes from the above directory is to use the patch file in the subdirectory:

ftp://emboss.open-bio.org/pub/EMBOSS/fixes/patches/

The patch files are of the form patch-1-n.gz where n refers to the latest source code correction in the README.fixes file in the directory above. So, if there are ten corrections in the latter file then the patch file would be called patch-1-10.gz.

The patch files are applied using the UNIX patch command e.g.:

gunzip EMBOSS-6.1.0.tar.gz
tar xf EMBOSS-6.1.0.tar
cd EMBOSS-6.1.0
gunzip -c patch-1-10.gz | patch -p1

Or, if the file has been uncompressed in transit:

patch -p1 < patch-1-10

You should always start with freshly extracted EMBOSS source code, as above, before applying a patch. This allows you to see any errors more easily. On rare occasions the developers will provide a patch file that contains fixes to a binary file. Some operating systems (e.g. FreeBSD) cannot handle binary patches and will report that such a patch file is malformed. In those circumstances follow the instructions in the nonbinary directory.
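If your patch command supports a trial run (GNU patch provides --dry-run), you can confirm that a patch applies cleanly before any file is changed. The sketch below assumes such a patch command and a gzip-compressed patch file; the function name apply_patch_checked is illustrative.

```shell
# Apply a compressed patch only if a trial run succeeds.
# Run from the top of the freshly extracted EMBOSS source tree.
# Assumes a patch command that supports --dry-run (e.g. GNU patch);
# apply_patch_checked is a hypothetical helper name.
apply_patch_checked() {
    patchfile=$1
    # Trial run: reports problems without modifying any file.
    if gunzip -c "$patchfile" | patch -p1 --dry-run >/dev/null; then
        gunzip -c "$patchfile" | patch -p1
    else
        echo "$patchfile does not apply cleanly; source tree left untouched" >&2
        return 1
    fi
}

# Example, using the file name from the text above:
# apply_patch_checked ../patch-1-10.gz
```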

1.6.5. Installing a New Version of EMBOSS

A new version of EMBOSS is released at least once per year, typically on St Swithun's Day (15th July). Before installing the new version you should either delete the existing EMBOSS version (if installing to the same directory) or install EMBOSS in a new location. Do not install a new version of EMBOSS on top of an existing installation as files from previous versions may cause compatibility problems.

Note

If you changed any system library or execution paths when you first installed EMBOSS then make sure you update these as necessary. A new version of EMBOSS is unlikely to work if new executables are trying to access older versions of the EMBOSS libraries.

1.6.6. EMBOSS Configuration Files

EMBOSS includes two files that are used to configure the package, particularly for defining databases and for making global settings that influence the behaviour of all EMBOSS programs.

The file emboss.default is used for site-wide configuration. Template files are included:

Stable release (.../share/EMBOSS/emboss.default.template)
CVS releases (.../emboss/emboss/emboss.default.template)

The file .embossrc, which you can create in your personal home directory, is used for user-specific customisation. Typically you might test, for example, database definitions in your own ~/.embossrc file before adding them to emboss.default.

1.6.6.1. Syntax of emboss.default and .embossrc Files

1.6.6.1.1. Blank lines and comments

Blank lines are ignored. Comments start with a '#' character in the first position of a line. For example:

# this is a comment

1.6.6.1.2. Includes

INCLUDE allows you to include a subsidiary file as part of the text of the main emboss.default or .embossrc file at the position of the INCLUDE command. This is useful for keeping the configuration files tidy. For example, to include the contents of the file project_databases.def:

INCLUDE "project_databases.def"

1.6.6.1.3. Variable definitions

Variables may be set with the keyword SETENV (usually shortened to SET or ENV; either is acceptable), followed by the variable name and then the value to which you wish it set. For example:

SET dbdir /data/sequencedbs

This variable may now be used in the rest of the file emboss.default by preceding it with a $. For example:

file: $dbdir/data.dat

The name of the variable is case-insensitive when used within emboss.default or .embossrc.

1.6.6.2. Configuring EMBOSS for Different Groups of Users

When maintaining EMBOSS for multiple users, more than one configuration might be required, for example to provide access to different sets of databases or data directories. It can be time consuming and error prone to maintain a series of individual .embossrc files in each user directory, or to force users to work in the same directory.

An alternative is to maintain one central copy of each of the different configuration files (.embossrc) in its own directory. All a user then needs to do is set the environment variable EMBOSSRC in their .cshrc (csh) or .profile (bash) file to point to the appropriate directory.
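For sh-type shells, the selection of a central configuration directory might be sketched as follows. The function name use_emboss_config and the example directory are hypothetical; only the EMBOSSRC variable itself comes from the text above.

```shell
# Point EMBOSSRC at a central configuration directory, but only if that
# directory really contains a .embossrc file.
# use_emboss_config is a hypothetical helper name, not an EMBOSS command.
use_emboss_config() {
    confdir=$1
    if [ -f "$confdir/.embossrc" ]; then
        EMBOSSRC=$confdir
        export EMBOSSRC
    else
        echo "no .embossrc found in $confdir" >&2
        return 1
    fi
}

# Example for a user's .profile (the directory name is an assumption):
# use_emboss_config /site/emboss/config/genomics
```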

1.6.7. EMBOSS Environment Variables

Caution

It is possible to make EMBOSS unusable if you adjust the global variables. For example:

SET EMBOSS_HELP 1

will make all EMBOSS programs only display their help when they are run.

Table 1.1. Environment variables

| Environment Variable | Description | Type | Default value |
|---|---|---|---|
| EMBOSS_ACDCOMMANDLINELOG | Log file for full commandline, used to convert QA test definitions into memory leak test command lines | string | "" |
| EMBOSS_ACDFILENAME | Use filename rather than sequence name as default for file naming | boolean | N |
| EMBOSS_ACDLOG | Log ACD processing to file program.acdlog to debug ACD processing | boolean | N |
| EMBOSS_ACDPROMPTS | Number of times to prompt for a value interactively | integer | 1 |
| EMBOSS_ACDROOT | EMBOSS root directory for finding files | string | (install directory) |
| EMBOSS_ACDUTILROOT | EMBOSS source directory for finding files | string | (source directory) |
| EMBOSS_ACDWARNRANGE | Warn if a number is out of range and fixed to be within limits | boolean | N |
| EMBOSS_AUTO | Run with all default values unless -noauto is on the command line | boolean | N |
| EMBOSS_CACHESIZE | Cache size to use for database indexing | integer | 2048 |
| EMBOSS_DATA | EMBOSS directory for finding data files | string | (install directory) |
| EMBOSS_DEBUG | Write debug output to program.dbg unless -nodebug is on the command line | boolean | N |
| EMBOSS_DEBUGBUFFER | Buffer debug output to save I/O time but risk losing output on a crash | boolean | N |
| EMBOSS_DIE | Print program abort messages to standard error | boolean | Y |
| EMBOSS_DOCROOT | EMBOSS directory for finding application documentation | string | (install directory) |
| EMBOSS_EMBOSSRC | Directory to search for an additional .embossrc file | string | (current directory) |
| EMBOSS_FEATWARN | Print warning messages when parsing feature table input | boolean | Y |
| EMBOSS_FILTER | By default read standard input and write to standard output unless -nofilter is on the command line | boolean | Y |
| EMBOSS_FORMAT | Input sequence format | string | unknown |
| EMBOSS_GRAPHICS | Default graphics output device | string | x11 |
| EMBOSS_HOMERC | Read the .embossrc file in the user's home directory | boolean | Y |
| EMBOSS_HTTPVERSION | HTTP version | string | 1.1 |
| EMBOSS_LANGUAGE | (Obsolete) Language used for the codes.language file | string | english |
| EMBOSS_LOGFILE | System statistics log file | string | "" |
| EMBOSS_MYEMBOSSACDROOT | MYEMBOSS package source directory for user's uninstalled utility ACD files | string | (source directory) |
| EMBOSS_NAMDEBUG | Write log messages to standard error while processing .embossrc and emboss.default | string | N |
| EMBOSS_NAMVALID | Detailed validation while processing .embossrc and emboss.default | string | N |
| EMBOSS_OPTIONS | Prompt for optional command line values unless -nooptions is on the command line | boolean | N |
| EMBOSS_OUTDIRECTORY | Directory used to write output | string | (current directory) |
| EMBOSS_OUTFEATFORMAT | Output feature format | string | gff |
| EMBOSS_OUTFORMAT | Output sequence format | string | fasta |
| EMBOSS_PAGER | Application to use for pages output to screen | string | more |
| EMBOSS_PAGESIZE | Page size to use for database indexing | integer | 2048 |
| EMBOSS_PROXY | HTTP proxy server address in the form proxy.xyz.ac.uk:7890 | string | "" |
| EMBOSS_RCHOME | Process the .embossrc file in the home directory | boolean | Y |
| EMBOSS_SEQWARN | Print warning messages when parsing standard sequence characters | boolean | N |
| EMBOSS_STDOUT | By default write to standard output unless -nostdout is on the command line | boolean | Y |
| EMBOSS_TIMETODAY | Date and time to override the current date - used to give a standard date and time for test runs | string | 2010-07-15 12:00:00 |
| EMBOSS_VERBOSE | Print verbose help output | boolean | N |
| EMBOSS_WARNOBSOLETE | Print warning messages when ACD file declares an application as 'obsolete' | boolean | Y |

Table 1.2. Environment variables associated with global qualifiers.

| Environment Variable | Description | Type | Default value |
|---|---|---|---|
| EMBOSS_ERROR | Print error messages to standard error | boolean | Y |
| EMBOSS_FATAL | Print fatal error messages to standard error | boolean | Y |
| EMBOSS_WARNING | Print warning messages to standard error | boolean | Y |

Table 1.3. Environment variables to launch external applications.

| Environment variable | Description | Type | Default value |
|---|---|---|---|
| EMBOSS_CLUSTALW | Name or path to launch clustalw | string | clustalw |
| EMBOSS_PRIMER3_CORE | Name or path to launch primer3_core | string | primer3_core |
| EMBOSS_HMMALIGN | Name or path to launch hmmalign | string | hmmalign |
| EMBOSS_HMMBUILD | Name or path to launch hmmbuild | string | hmmbuild |
| EMBOSS_HMMCALIBRATE | Name or path to launch hmmcalibrate | string | hmmcalibrate |
| EMBOSS_HMMCONVERT | Name or path to launch hmmconvert | string | hmmconvert |
| EMBOSS_HMMEMIT | Name or path to launch hmmemit | string | hmmemit |
| EMBOSS_HMMFETCH | Name or path to launch hmmfetch | string | hmmfetch |
| EMBOSS_HMMINDEX | Name or path to launch hmmindex | string | hmmindex |
| EMBOSS_HMMPFAM | Name or path to launch hmmpfam | string | hmmpfam |
| EMBOSS_HMMSEARCH | Name or path to launch hmmsearch | string | hmmsearch |
| EMBOSS_MAST | Name or path to launch mast | string | mast |
| EMBOSS_MEME | Name or path to launch meme | string | meme |
| EMBOSS_MIRA | Name or path to launch mira | string | mira |
| EMBOSS_MIRAEST | Name or path to launch miraEST | string | miraEST |
| EMBOSS_BLASTPGP | Name or path to launch blastpgp | string | blastpgp |
| EMBOSS_FORMATDB | Name or path to launch formatdb | string | formatdb |
| EMBOSS_MODELFROMALIGN | Name or path to launch modelfromalign | string | modelfromalign |
| EMBOSS_NACCESS | Name or path to launch naccess | string | naccess |
| EMBOSS_RPSBLAST | Name or path to launch rpsblast | string | rpsblast |
| EMBOSS_STAMP | Name or path to launch stamp | string | stamp |
| EMBOSS_STRIDE | Name or path to launch stride | string | stride |

EMBOSS defines various environment variables. They include global variables used to control the behaviour of all EMBOSS programs, and variables to set the location of system files or directories, specify default values etc. There is normally no need to set the environment variables, but you may do so to customise the behaviour of your instance of EMBOSS.

Environment variables are useful for simplifying maintenance of your .embossrc file. If, for example, you specify the location of your databases as an environment variable, then if you move the databases you only have to update one line in the configuration file. For example, for the data directory:

/data/databases/flatfiles/

you might have something like this:

set EMBOSS_database_dir /data/databases/flatfiles

SET EMBOSS_embldir $EMBOSS_database_dir/embl

The second line sets another variable to the directory:

/data/databases/flatfiles/embl

Global environment variables must have UPPERCASE names and usually have Boolean values; they can be turned on by setting them to "1" or "Y" (most are off by default, as the default values in Table 1.1 show). The global variables can also be set in the UNIX session by defining an environment variable with the commands:

setenv NAME value (csh type shells)
export NAME=value (sh type shells)

where NAME is the name of the variable and value is the value you wish to set it to.

1.6.7.1. Global Qualifiers

EMBOSS includes several global qualifiers (see the EMBOSS Users Guide) that are available to all the applications. They are typically used by advanced users (who use -options or -verbose) or by developers (who use -debug, -acdlog). They may be set as follows:

set EMBOSS_QUALIFIER 1

where QUALIFIER is one of the global qualifiers. The value above is 1 but can be:

1 or Y for true.
0 or N for false.

Setting the qualifier value to true has the effect of running every program with that qualifier set. Qualifiers, when set, will work in the same way as if you used them when running the program. For example you can:

set EMBOSS_VERBOSE Y

and the program will run normally, but when the program is run with the -help qualifier, the output will be in verbose form.

Other program options that can be set include:

  • EMBOSS_FORMAT

  • EMBOSS_ACDROOT

  • EMBOSS_DATA

The value of EMBOSS_FORMAT determines the default sequence format for input; EMBOSS_OUTFORMAT does the same for output. For example, if you are running EMBOSS alongside GCG you may wish to have the following entries in your .embossrc:

set EMBOSS_FORMAT gcg
set EMBOSS_OUTFORMAT gcg

which has the effect of using GCG format for input and output by default.

If you wish to use a different directory for the ACD files then this can be set:

set EMBOSS_ACDROOT /path/to/acd

If you wish to maintain a separate data directory then use:

set EMBOSS_DATA /path/to/data

1.6.7.2. Logging

System administrators may wish to make use of the logging facilities of EMBOSS. Setting the variable EMBOSS_LOGFILE forces the system to keep a log of which programs are used when and by whom:

set EMBOSS_LOGFILE /site/log/emboss.log

The log file structure is very simple. Three tab-separated fields are stored (program name, user name, and the date and time):

prettyplot      joeuser        Wed Aug 02 14:29:13 2000

The file defined by EMBOSS_LOGFILE should be world writable. The following command ensures logging can occur:

chmod o+w /site/log/emboss.log

All settings can be overridden in a user's .embossrc file by redefining the relevant variables. So to prevent your system usage being logged you can redefine EMBOSS_LOGFILE by putting the following entry in your .embossrc file:

set EMBOSS_LOGFILE /dev/null

This behaviour may change in the future to prevent users redefining some system settings.
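Because the log is a plain tab-separated file, standard UNIX tools are enough to summarise it. The sketch below counts how often each program appears, assuming the three-field layout described above; the function name count_program_runs is illustrative.

```shell
# Count runs per program in an EMBOSS log file, most frequent first.
# Assumes tab-separated records with the program name in field 1.
# count_program_runs is a hypothetical helper name, not an EMBOSS command.
count_program_runs() {
    awk -F '\t' '{ n[$1]++ } END { for (p in n) print n[p] "\t" p }' "$1" |
        sort -k1,1nr -k2,2
}

# Example, using the log file path from the text above:
# count_program_runs /site/log/emboss.log
```

Each output line holds a count and a program name separated by a tab.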

1.6.7.3. Environment Variables File (variables.standard)

Descriptions of the environment variables are stored in the EMBOSS system file variables.standard which is stored and installed in the application ACD file directory. An excerpt of this file is shown below:

acdcommandlinelog string  "" "Log file for full commandline, used to convert QA test definitions into memory leak test command lines"
acdlog boolean "N" "Log ACD processing to file program.acdlog"
acdprompts integer "1" "Number of times to prompt for a value when interactive"
acdroot string "(install directory)" "EMBOSS root directory for finding files"
acdutilroot string "(source directory)" "EMBOSS source directory for finding files"

1.6.8. EMBOSS Data Files

EMBOSS data files are included in the distribution and stored in the standard EMBOSS data directory, which can be defined by the EMBOSS environment variable EMBOSS_DATA.

If you built EMBOSS using make install, EMBOSS will by default install the data files, including those installed with rebaseextract, prosextract, printsextract, aaindexextract or cutgextract, in the directory:

share/EMBOSS/data

under the install directory, which is defined by the --prefix when you configured the package (see the EMBOSS Users Guide). Typically this is:

/usr/local/emboss/share/EMBOSS/data

If EMBOSS was not installed using make install but just compiled using make, then by default the data files are in:

emboss/data

under the directory where EMBOSS was built.

If you want to keep your data files somewhere else, or have a set of datafiles you want to keep separate from those distributed with the package, then you can set the EMBOSS_DATA environment variable in your emboss.default or .embossrc file.

To see the available EMBOSS data files, run:

embossdata -showall

To fetch one of the data files into your current directory for you to inspect or modify, run:

embossdata -fetch -file EDatafileName.dat

where EDatafileName.dat is the name of the data file.

Users can provide their own data files in their own directories. Project specific files can be put in the current directory or, for tidier directory listings, in a subdirectory called ".embossdata". Similarly, for files to be accessible to all EMBOSS applications, invoked from any location, they can be put in your home directory, or in a subdirectory under it called ".embossdata".

The directories are searched in the following order:

  • . (your current directory)

  • .embossdata (under your current directory)

  • ~/ (your home directory)

  • ~/.embossdata