6.1. Introduction to the EMBOSS Command Line

6.1.1. Finding and Running EMBOSS Applications

EMBOSS applications are invoked by typing their name at a command line prompt. For example, to run the seqret application you would type:

seqret

If you're not certain of the application you need, see the tables of application names and short descriptions (Section 3.1, “Application Documentation”).

The same information can be retrieved by running the wossname application, which searches for keywords or parts of words in each application's short description (the text a program displays when it first starts). If no keywords are specified, details of all the EMBOSS programs are output. Simply type:

wossname
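The keyword search described above can be pictured with a short Python sketch. It is a model only: it does not call wossname, the function name find_applications is hypothetical, and the catalogue is a hand-typed stand-in whose descriptions for water and wossname are assumptions rather than text read from an EMBOSS installation.

```python
# Illustrative catalogue only: hand-typed stand-ins, not data read from EMBOSS.
CATALOGUE = {
    "seqret": "Reads and writes (returns) sequences",
    "water": "Smith-Waterman local alignment of sequences",       # assumed wording
    "wossname": "Finds programs by keywords in their short description",  # assumed wording
}


def find_applications(keyword="", catalogue=CATALOGUE):
    """Return sorted (name, description) pairs whose description contains keyword.

    Matching is a case-insensitive substring test, so parts of words match.
    An empty keyword matches every entry, mirroring the "no keywords" case.
    """
    needle = keyword.lower()
    return sorted(
        (name, text) for name, text in catalogue.items() if needle in text.lower()
    )


if __name__ == "__main__":
    for name, text in find_applications("align"):
        print(f"{name:10s} {text}")
```

Calling find_applications() with no argument returns every entry in the stand-in catalogue, which corresponds to running wossname without keywords.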

6.1.2. Application Options

Every application has a set of options allowing you to specify all of the inputs and outputs, including input and data files and values that control how the application operates. Options might be application-specific, available for particular datatypes only (datatype-specific), or available for all datatypes (global). All options are described in the application documentation.

Application-specific options are defined in an Ajax Command Definition (ACD) file, associated with the EMBOSS program. To retrieve this list of options from the command line, run the application with -help (and nothing else):

seqret -help

To get a complete list of options that includes datatype-specific options (inbuilt options associated with the datatypes the application processes), and global options (ones available to all applications), run the application with -help -verbose:

seqret -help -verbose

Some application options must be specified and some are optional. EMBOSS makes the distinction between application parameters and qualifiers. Parameters are always required and are prompted for if not supplied, whereas qualifiers may or may not be required and prompted for, depending on how they are defined in the ACD file.

Values for parameters and qualifiers are set either on the command line used to run the program, or as a response to a prompt generated by EMBOSS before the main application code runs. Any required values that you have not already given on the command line will be prompted for automatically.

For example, the seqret application can be run with an input sequence by typing:

seqret input.seq

seqret, however, has two parameters: the input and output sequence files. If you type the above command, you will therefore be prompted for the output sequence.
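The prompting rule can be pictured with a minimal Python model. It is a sketch under stated assumptions, not EMBOSS code: the function name missing_parameters is hypothetical, and the parameter names sequence and outseq are taken from the seqret ACD excerpt shown later in this section.

```python
def missing_parameters(parameter_names, supplied_values):
    """Return the parameters that would still need a value.

    parameter_names: parameter names in the order they are defined.
    supplied_values: unflagged values given on the command line, in order.
    """
    # Pair each supplied value with the next parameter in definition order.
    assigned = dict(zip(parameter_names, supplied_values))
    return [name for name in parameter_names if name not in assigned]


# Names taken from the seqret ACD excerpt in this section.
SEQRET_PARAMETERS = ["sequence", "outseq"]

if __name__ == "__main__":
    # Models "seqret input.seq": only the input sequence was supplied.
    print(missing_parameters(SEQRET_PARAMETERS, ["input.seq"]))
```

With only input.seq supplied, the model reports outseq as the value still to be prompted for.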

Datatype-specific qualifiers (Section 6.4, “Datatype-specific Command Line Qualifiers”) are available for specific input and output datatypes, for any application which uses these datatypes. They are used to specify a particular input or output in more detail, for instance the format of an output sequence file. The command below calls seqret with the -osformat qualifier to set the output format of the sequence file to embl. -osformat is specific to the sequence output datatypes:

seqret input.seq -osformat embl

Global qualifiers (Section 6.3, “Global Command Line Qualifiers”) are available to all EMBOSS applications. They change the behaviour of the program for which they are set. You've already come across the use of -help, a global qualifier that retrieves application options:

seqret -help

6.1.3. Parameters and Qualifiers

Application-specific options are defined in the Ajax Command Definition (ACD) file that is associated with the EMBOSS program. The ACD file determines exactly what can appear on the command line and how values are prompted for. If you only intend to use but not write ACD files, then you don't need to know the ACD syntax or even look at the ACD file. All parameters and qualifiers are described in the application documentation and help is available at the command line by using -help.

Every application option has a corresponding definition in the ACD file and is defined as one of:

  • parameter

  • standard qualifier

  • additional qualifier

If it is not marked as any of these, it defaults to:

  • advanced qualifier

Parameters are usually the primary input and output files whereas qualifiers are used for other options.

You don't need to use a flag to specify a value for a parameter on the command line. Values are typically specified like this:

ApplicationName ParameterValue

It is, however, necessary to give such unqualified parameter values in the same order as the corresponding data definitions appear in the ACD file (and documentation).

In contrast, you must use a flag to give a value for a qualifier. Values for standard, additional and advanced qualifiers are specified like this:

ApplicationName -QualifierName QualifierValue

The flag can optionally be given for a parameter too:

ApplicationName -ParameterName ParameterValue

In either case, where the flag is given, values can be given in any order. The flags (parameter or qualifier names) are listed in the documentation, are shown when running the application with -help, and can be seen in the ACD file itself (they are the text tokens given after the colon (:) on the first line of each data definition).

Example. In seqret.acd two parameters are defined: an input sequence (with the parameter name sequence) and an output sequence (called outseq). The input sequence is defined before the output sequence:

application: seqret 
[
  documentation: "Reads and writes (returns) sequences"
  groups: "Edit"
]

section: input 
[
  information: "Input section"
  type: "page"
]

seqall: sequence 
[
    parameter: "Y"
]

endsection: input

.
.
.

section: output 
[
  information: "Output section"
  type: "page"
]

seqoutall: outseq 
[
    parameter: "Y"
]

endsection: output

Assuming your input sequence is in the file input.seq and you want to write a file called output.seq, the following command is perfectly valid:

seqret input.seq output.seq

Whereas the following command would swap the roles of the two files:

seqret output.seq input.seq

EMBOSS would try to open a file called output.seq for reading, and would also open a file called input.seq for writing, possibly overwriting a valuable data file in the latter case!

Where the flags are used, values can be given in any order, so either of the following is perfectly valid:

seqret -sequence input.seq -outseq output.seq
seqret -outseq output.seq -sequence input.seq
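The difference between unflagged and flagged values can be sketched in Python. This is a simplified model, not the actual EMBOSS command line parser: the function name assign_values is hypothetical, and the rule that an unflagged value fills the first parameter still unset is an assumption made for illustration.

```python
def assign_values(parameters, argv):
    """Map command line tokens to parameter names.

    parameters: parameter names in ACD order.
    argv: the tokens that follow the application name.
    A token starting with '-' is treated as a flag whose value is the next
    token. Any other token fills the first parameter (in ACD order) that has
    no value yet (an assumption of this model).
    """
    assigned = {}
    tokens = iter(argv)
    for token in tokens:
        if token.startswith("-"):
            name = token[1:]
            if name not in parameters:
                raise ValueError(f"unknown flag: {token}")
            value = next(tokens, None)
            if value is None:
                raise ValueError(f"missing value for {token}")
            assigned[name] = value
        else:
            unset = [p for p in parameters if p not in assigned]
            if not unset:
                raise ValueError(f"unexpected value: {token}")
            assigned[unset[0]] = token
    return assigned


SEQRET_PARAMETERS = ["sequence", "outseq"]

if __name__ == "__main__":
    for line in (
        "input.seq output.seq",
        "output.seq input.seq",
        "-outseq output.seq -sequence input.seq",
    ):
        print(line, "->", assign_values(SEQRET_PARAMETERS, line.split()))
```

In this model the first and third command lines produce the same assignment, while the second assigns output.seq to the input parameter, which is the mistake described above.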

6.1.4. Datatype-specific Qualifiers

Datatype-specific qualifiers (Section 6.4, “Datatype-specific Command Line Qualifiers”) are available for specific input and output datatypes. They are used to specify a particular input or output in more detail, for instance the format of an output sequence file, or the types of data that are written in an application report.

6.1.4.1. Multiple Qualifiers

In cases where an application has two or more options of the same ACD datatype, a command line flag refers to the option that precedes it on the command line, not to any option appearing afterwards. Flags that are specific to options of different datatypes can be intermixed: their order is not important.

In the example below, the program seqret takes two parameters, an input sequence (file in.seq) and an output sequence (file out.seq). The order of the command line flags that follow is irrelevant, as the two qualifiers refer to different datatypes:

seqret in.seq out.seq -sformat fasta -osformat gcg

In the following example, the program water takes two parameters, both input sequences (files aap.seq and noot.seq, of datatypes sequence and seqall, each of which can have a -sformat qualifier), and here the order of the qualifiers is important. Assuming aap.seq is in FASTA format and noot.seq is in GCG format we have:

water aap.seq -sformat fasta noot.seq -sformat gcg
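The "refers to the preceding option" rule can be modelled in a few lines of Python. This is a sketch, not the EMBOSS parser: the function name bind_qualifiers is hypothetical, the parameter names asequence and bsequence for water are assumptions (they are not given in this section), and each parameter is described only by the set of datatype-specific qualifiers it accepts.

```python
def bind_qualifiers(parameters, argv):
    """Attach each unnumbered qualifier to the nearest preceding option.

    parameters: list of (name, accepted_qualifiers) in ACD order.
    argv: the tokens that follow the application name.
    Returns {parameter_name: {"value": ..., qualifier: ...}}.
    """
    bound = {}
    supplied = []               # options whose values have appeared so far
    pending = list(parameters)  # options still waiting for a value
    tokens = iter(argv)
    for token in tokens:
        if not token.startswith("-"):
            if not pending:
                raise ValueError(f"unexpected value: {token}")
            name, accepted = pending.pop(0)
            bound[name] = {"value": token}
            supplied.append((name, accepted))
            continue
        qualifier = token[1:]
        value = next(tokens, None)
        if value is None:
            raise ValueError(f"missing value for {token}")
        # Walk backwards to the most recent option that accepts this qualifier.
        for name, accepted in reversed(supplied):
            if qualifier in accepted:
                bound[name][qualifier] = value
                break
        else:
            raise ValueError(f"{token} has no preceding option to apply to")
    return bound


# Assumed parameter names for water; both inputs accept -sformat.
WATER = [("asequence", {"sformat"}), ("bsequence", {"sformat"})]
# seqret: the input accepts -sformat, the output accepts -osformat.
SEQRET = [("sequence", {"sformat"}), ("outseq", {"osformat"})]

if __name__ == "__main__":
    print(bind_qualifiers(WATER, "aap.seq -sformat fasta noot.seq -sformat gcg".split()))
```

For seqret, swapping the two flags gives the same result in this model because -sformat and -osformat belong to different options; for water, moving a -sformat flag changes which input it describes.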

6.1.4.2. Numbering Qualifiers

Instead of having to adhere to a rigorous order for command line flags when two or more options of the same (class of) datatype are defined, it is also possible to use numbers with the qualifier/parameter names, to indicate the option to which the flag refers.

This is formalised as follows:

-qualifiernameQualifierPosition qualifiervalue

where QualifierPosition is an integer, appended directly to the qualifier name (as in -sformat2), indicating the option to which the flag refers. The number should reflect the order of that option in the ACD file relative to other options of the same type: it is not the absolute position of the data definition! For example, if an ACD file contains two sequence input parameters (at the top of the ACD file) and two align parameters for alignment output (at the bottom of the file), the align parameters would be numbered 1 and 2 respectively, not 3 and 4, which would be their absolute positions in the file.

In the following example, qualifier numbering indicates that the format of the first parameter is fasta and the second gcg:

someprogram aap.seq noot.seq -sformat2 gcg -sformat1 fasta

As a further example, consider the ACD file below:

application: seqtest

sequence: asequence
[
    parameter: Y
]

int: wibblefactor
[
    parameter: Y
]

sequence: bsequence 
[
    parameter: Y
]

The following command line:

seqtest seqtest.in 5 seqtest.out -sformat1 gcg -sformat2 fasta

defines that the first sequence file (seqtest.in) is in GCG format and the second sequence file (seqtest.out) is in FASTA format. Note that the second -sformat qualifier has been numbered 2 because it is the second sequence parameter, even though it is the third parameter in the file.
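The relative numbering can be checked with a small Python sketch of the seqtest example. It is a model under stated assumptions: the function name target_of is hypothetical, and treating sequence and seqall as one class of datatype for -sformat is an assumption based on the water example earlier in this section.

```python
import re

# Datatypes assumed to share the -sformat qualifier in this model.
SEQUENCE_INPUT_TYPES = {"sequence", "seqall"}

# (name, datatype) pairs in the order they appear in the seqtest ACD file.
SEQTEST = [
    ("asequence", "sequence"),
    ("wibblefactor", "int"),
    ("bsequence", "sequence"),
]


def target_of(flag, parameters, datatypes=SEQUENCE_INPUT_TYPES):
    """Return the name of the option that a numbered flag refers to.

    The number counts only options whose datatype is in `datatypes`,
    so it is a relative position, not the absolute position in the file.
    """
    match = re.fullmatch(r"-([a-z]+)(\d+)", flag)
    if match is None:
        raise ValueError(f"not a numbered qualifier: {flag}")
    position = int(match.group(2))
    candidates = [name for name, datatype in parameters if datatype in datatypes]
    if not 1 <= position <= len(candidates):
        raise ValueError(f"{flag}: no option number {position} of this type")
    return candidates[position - 1]


if __name__ == "__main__":
    for flag in ("-sformat1", "-sformat2"):
        print(flag, "->", target_of(flag, SEQTEST))
```

Here -sformat2 resolves to bsequence even though bsequence is the third definition in the file, because wibblefactor is skipped when counting sequence options.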

6.1.5. Global Qualifiers

Global qualifiers (Section 6.3, “Global Command Line Qualifiers”) are command line qualifiers that are available to all EMBOSS applications. They change the behaviour of the program for which they are set. They are used in the same way as any other qualifier, but are usually given on the command line after the application name and other parameters.

6.1.6. Command Line Styles

EMBOSS supports three different command line styles. In the examples below, the seqret application is used to retrieve the first 100 nucleotides of the sequence P10932 from the EMBL database. The global qualifier -auto is used to turn off any prompting of the user.

Unix style:

% seqret embl:P10932 -send 100 -auto
% seqret -send 100 embl:P10932 -auto

SeqPup style:

% seqret embl:P10932 -end=100 -auto

VMS style:

% seqret /SEQUENCE=EMBL:P10932 /SEND=100 /AUTO

As you can see, the command line syntax is very versatile. To avoid confusion, it is strongly recommended that you use the Unix style.
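One way to see how the three styles relate is to rewrite the SeqPup and VMS forms as Unix-style tokens. The Python sketch below is illustrative only: the function name to_unix_style is hypothetical, it works purely on the token syntax shown in the examples above, and it does not reproduce how EMBOSS itself interprets these command lines.

```python
def to_unix_style(tokens):
    """Rewrite '-name=value' and '/NAME=VALUE' tokens as '-name value' pairs.

    Limitations of this sketch: any token starting with '/' is treated as a
    VMS-style qualifier (so an absolute file path would be misread), and
    qualifier names from VMS-style tokens are simply lower-cased.
    """
    unix = []
    for token in tokens:
        if token.startswith("/"):
            name, separator, value = token[1:].partition("=")
            name = name.lower()
        elif token.startswith("-"):
            name, separator, value = token[1:].partition("=")
        else:
            unix.append(token)  # an unflagged parameter value
            continue
        unix.append("-" + name)
        if separator:           # the token carried a value after '='
            unix.append(value)
    return unix


if __name__ == "__main__":
    print(to_unix_style(["/SEQUENCE=EMBL:P10932", "/SEND=100", "/AUTO"]))
    print(to_unix_style(["embl:P10932", "-end=100", "-auto"]))
```

Tokens that are already in Unix style pass through unchanged, which fits the recommendation to use that style throughout.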

6.1.7. Environment Variables

The general behaviour of EMBOSS programs, such as prompting for values, the directory to be searched for data files, default sequence formats and messaging, may be controlled with environment variables. See Section 2.8, “Maintenance” for more information.