EMBOSS can be integrated with several common non-sequence biological databases. These are described in this section.
REBASE is the restriction enzyme database maintained by New England Biolabs. It is needed for programs such as remap and restrict. The latest version of REBASE can be obtained by anonymous FTP (ftp.neb.com/pub/rebase/). EMBOSS needs the withrefm and proto files. The data is extracted for EMBOSS with the program rebaseextract:
% mkdir /site/prog/emboss/data/REBASE
% rebaseextract
Extract data from REBASE
REBASE database withrefm file: /data/rebase/withrefm.208
REBASE database proto file: /data/rebase/proto.208
REBASE is now installed and ready to use.
TRANSFAC is the transcription factor binding site database. It is available by anonymous FTP (ftp.ebi.ac.uk/pub/databases/transfac/). Unpacking the distribution reveals a file called site.dat. This is the one EMBOSS needs. Run tfextract to extract the data from TRANSFAC:
% tfextract
Extract data from TRANSFAC
Full pathname of transfac SITE.DAT: /databases/transfac/site.dat
tfscan can now access the TRANSFAC database.
PROSITE is a database of regular expressions that match potentially diagnostic regions for structural/functional classification of proteins. EMBOSS needs this database for the patmatmotifs program. PROSITE can be obtained via anonymous FTP from the EMBL-EBI. Download the prosite.doc files to the same directory. Then run prosextract to build the EMBOSS PROSITE database, specifying the download directory:
% prosextract
Builds the PROSITE motif database for patmatmotifs to search
Enter name of prosite directory: /data/prosite
PROSITE is now integrated into your EMBOSS installation.
PRINTS is a database of diagnostic patterns of blocks of sequence homology in protein families. The PRINTS database can be searched using the EMBOSS program pscan. PRINTS can be obtained via anonymous FTP from the EMBL-EBI. The database is made available as compressed files which should be uncompressed using gzip before integrating them into EMBOSS. PRINTS is integrated with EMBOSS using the program printsextract:
% printsextract
Extract data from PRINTS
Input file: /data/prints/prints28_0.dat
The PRINTS database is now integrated with EMBOSS.
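The decompression step mentioned above for PRINTS is normally done with the gzip command-line tool. As a minimal sketch of the same step in Python, the helper below unpacks one gzip-compressed file using only the standard library. The function name `gunzip_file` and the compressed file name used in the usage note are hypothetical, chosen for illustration.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(src: Path, dest_dir: Path) -> Path:
    """Decompress one .gz file into dest_dir and return the new file's path.

    Hypothetical helper, not part of EMBOSS.
    """
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src.name}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Drop the trailing ".gz" to get the uncompressed file name.
    dest = dest_dir / src.with_suffix("").name
    # Stream the data so large database files are not read into memory at once.
    with gzip.open(src, "rb") as compressed, open(dest, "wb") as plain:
        shutil.copyfileobj(compressed, plain)
    return dest
```

Assuming the download was saved as /data/prints/prints28_0.dat.gz (an assumed name), `gunzip_file(Path("/data/prints/prints28_0.dat.gz"), Path("/data/prints"))` would produce the prints28_0.dat file that printsextract is pointed at in the session above.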
An amino acid index is a set of 20 numerical values representing any of the different physicochemical and biological properties of amino acids. The AAindex1 section of the Amino Acid Index Database is a collection of published indices together with the result of cluster analysis using the correlation coefficient as the distance between two indices. This section currently contains 437 indices in release 4.0 of the database.
% aaindexextract
Extract data from AAINDEX
Full pathname of file aaindex1: /data/aaindex/aaindex1
The AAINDEX database is now integrated with EMBOSS.
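To make the idea of an amino acid index concrete, the sketch below stores one index as a mapping from the 20 one-letter residue codes to numbers and computes the Pearson correlation coefficient between two such indices. The values are the widely cited Kyte-Doolittle hydropathy scale, used here only as a familiar example. The two comparison indices are derived from it arithmetically for illustration and are not real database entries. How the database turns the coefficient into a clustering distance is not specified here, so the sketch stops at the coefficient itself.

```python
import math

# Kyte-Doolittle hydropathy values, one number per standard amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pearson(index_a: dict, index_b: dict) -> float:
    """Pearson correlation between two indices over the same residues."""
    residues = sorted(index_a)
    xs = [index_a[r] for r in residues]
    ys = [index_b[r] for r in residues]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative derived indices (not real AAindex entries).
rescaled = {r: 2.0 * v + 1.0 for r, v in KYTE_DOOLITTLE.items()}
inverted = {r: -v for r, v in KYTE_DOOLITTLE.items()}

r_same = pearson(KYTE_DOOLITTLE, rescaled)      # linear rescaling: correlation near +1
r_opposite = pearson(KYTE_DOOLITTLE, inverted)  # sign flip: correlation near -1
```

Two indices that measure the same property on different scales correlate strongly, which is why a correlation-based measure is a natural way to group published indices.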
The CUTG database contains a series of codon usage tables calculated from GenBank. CUTG can be obtained via anonymous FTP from the EMBL-EBI server. CUTG is integrated with EMBOSS using the program cutgextract, which writes files to the CODONS data directory:
% cutgextract
Extract data from CUTG
CUTG directory [.]: /data/cutg/
The CUTG database is now integrated with EMBOSS.
Download and unzip the Archive.zip file and then run jaspextract, specifying the JASPAR database directory:
% jaspextract
Extract data from JASPAR
JASPAR database directory [.]: /data/jaspar/all_data/FlatFileDir
Other data files should be kept in the data directory under the main EMBOSS installation.
Personal (user) data files can be kept in:
The current working directory
.embossdata of the current directory
The user's home directory
.embossdata of the user's home directory
EMBOSS will search these locations in this order and will stop as soon as it finds a matching file. If the personal directories do not contain the desired file, EMBOSS will search the system-wide data directory.
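The search order described above can be modelled in a few lines. The following is a minimal sketch, not the EMBOSS implementation: the function names `candidate_dirs` and `find_data_file` are hypothetical, and the location of the system-wide data directory is passed in as a parameter because it depends on the installation.

```python
from pathlib import Path
from typing import List, Optional


def candidate_dirs(cwd: Path, home: Path, system_data: Path) -> List[Path]:
    """Directories in the order the text says they are searched."""
    return [
        cwd,                   # the current working directory
        cwd / ".embossdata",   # .embossdata of the current directory
        home,                  # the user's home directory
        home / ".embossdata",  # .embossdata of the home directory
        system_data,           # system-wide data directory, searched last
    ]


def find_data_file(name: str, cwd: Path, home: Path,
                   system_data: Path) -> Optional[Path]:
    """Return the first existing file called `name`, or None if none exists."""
    for directory in candidate_dirs(cwd, home, system_data):
        path = directory / name
        if path.is_file():
            # Stop at the first match, so earlier locations shadow later ones.
            return path
    return None
```

The sketch shows why a forgotten copy of a data file in the working directory or in a .embossdata directory silently takes precedence over the system-wide copy.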
Apparently inexplicable errors when running EMBOSS programs may be caused by the system not using the data files one expects. The search path can be displayed in search order using the command
For more information on EMBOSS data files, see the EMBOSS Users Guide.