A.3. Supported Alignment Formats

A.3.1. FASTA

This is the standard FASTA sequence format (Section A.1, “Supported Sequence Formats”) with gap characters added. The aligned sequences are written one after another, each under its own title line. FASTA format is used by Bill Pearson's suite of FASTA programs. For further information see:

http://rcr-www.med.nyu.edu/rcr/fastaman.html

>IXI_234
TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQATGGWKTCSGTC
TTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAGSRPNRFAPTLMSSCITSTTG
PPAWAGDRSHE
>IXI_235
TSPASIRPPAGPSSR---------RPSPPGPRRPTGRPCCSAAPRRPQATGGWKTCSGTC
TTSTSTRHRGRSGW----------RASRKSMRAACSRSAGSRPNRFAPTLMSSCITSTTG
PPAWAGDRSHE
>IXI_236
TSPASIRPPAGPSSRPAMVSSR--RPSPPPPRRPPGRPCCSAAPPRPQATGGWKTCSGTC
TTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSR--GSRPPRFAPPLMSSCITSTTG
PPPPAGDRSHE
>IXI_237
TSPASLRPPAGPSSRPAMVSSRR-RPSPPGPRRPT----CSAAPRRPQATGGYKTCSGTC
TTSTSTRHRGRSGYSARTTTAACLRASRKSMRAACSR--GSRPNRFAPTLMSSCLTSTTG
PPAYAGDRSHE
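
Reading such a file back is straightforward. The following Python sketch is illustrative only and is not part of EMBOSS; the function name `read_fasta_alignment` is an assumption. It joins the wrapped lines of each record and checks that all aligned sequences have the same length, which is what distinguishes an alignment from a plain sequence file.

```python
def read_fasta_alignment(text):
    """Parse gapped FASTA text into a dict of name -> aligned sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The name is taken to be the first word after '>'.
            name = (line[1:].split() or [""])[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    alignment = {n: "".join(parts) for n, parts in records.items()}
    # Every row of an alignment must have the same number of columns.
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("aligned sequences differ in length: %s" % sorted(lengths))
    return alignment


example = """\
>IXI_234
TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQATGGWKTCSGTC
TTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAGSRPNRFAPTLMSSCITSTTG
PPAWAGDRSHE
>IXI_235
TSPASIRPPAGPSSR---------RPSPPGPRRPTGRPCCSAAPRRPQATGGWKTCSGTC
TTSTSTRHRGRSGW----------RASRKSMRAACSRSAGSRPNRFAPTLMSSCITSTTG
PPAWAGDRSHE
"""
aln = read_fasta_alignment(example)
print({name: len(seq) for name, seq in aln.items()})
```

The sketch keeps gap characters as they are, so the returned strings can be compared column by column.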

A.3.2. Markx0

This is the default output format used by Bill Pearson's suite of FASTA programs. For further information see:

http://rcr-www.med.nyu.edu/rcr/fastaman.html

########################################
# Program:  water
# Rundate:  Wed Jan 16 17:21:36 2002
# Report_file: stdout
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: IXI_234
# 2: IXI_235
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 131
# Identity:     112/131 (85.5%)
# Similarity:   112/131 (85.5%)
# Gaps:          19/131 (14.5%)
# Score: 591.5
#
#
#=======================================


               10        20        30        40        50
IXI_23 TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT
       :::::::::::::::         ::::::::::::::::::::::::::
IXI_23 TSPASIRPPAGPSSR---------RPSPPGPRRPTGRPCCSAAPRRPQAT
               10                 20        30        40

               60        70        80        90       100
IXI_23 GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAG
       ::::::::::::::::::::::::          ::::::::::::::::
IXI_23 GGWKTCSGTCTTSTSTRHRGRSGW----------RASRKSMRAACSRSAG
              50        60                  70        80

              110       120       130
IXI_23 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
       :::::::::::::::::::::::::::::::
IXI_23 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
              90       100       110

#---------------------------------------
#---------------------------------------        
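
The '#' header shown above recurs in most of the formats in this section, and its "Key: value" lines are easy to extract. The Python sketch below is an illustration under stated assumptions, not an EMBOSS API: the names `parse_report_header` and `parse_fraction` are hypothetical, and numbered lines such as "# 1: IXI_234" are assumed to carry the sequence names.

```python
import re

# '# Key: value' with arbitrary spacing; separator lines such as '#====' do not match.
HEADER_LINE = re.compile(r"#\s*(\w+):\s*(.*\S)\s*$")


def parse_report_header(text):
    """Return (fields, names) collected from the '#' header lines of a report."""
    fields, names = {}, []
    for line in text.splitlines():
        match = HEADER_LINE.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key.isdigit():
            # Numbered lines are assumed to list the aligned sequence names.
            names.append(value)
        else:
            fields[key] = value
    return fields, names


def parse_fraction(value):
    """Turn a string such as '112/131 (85.5%)' into the pair (112, 131)."""
    numerator, denominator = value.split()[0].split("/")
    return int(numerator), int(denominator)


header = """\
# Program:  water
# Aligned_sequences: 2
# 1: IXI_234
# 2: IXI_235
# Matrix: EBLOSUM62
# Length: 131
# Identity:     112/131 (85.5%)
# Gaps:          19/131 (14.5%)
# Score: 591.5
"""
fields, names = parse_report_header(header)
print(names, parse_fraction(fields["Identity"]), float(fields["Score"]))
```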

A.3.3. Markx1

This is an alternative output format used by Bill Pearson's suite of FASTA programs in which identities are not marked. Instead, conservative replacements are denoted by 'x' and non-conservative substitutions by 'X'. For further information see:

http://rcr-www.med.nyu.edu/rcr/fastaman.html

########################################
# Program:  water
# Rundate:  Wed Jan 16 17:22:07 2002
# Report_file: stdout
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: IXI_234
# 2: IXI_235
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 131
# Identity:     112/131 (85.5%)
# Similarity:   112/131 (85.5%)
# Gaps:          19/131 (14.5%)
# Score: 591.5
#
#
#=======================================


               10        20        30        40        50
IXI_23 TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT

IXI_23 TSPASIRPPAGPSSR---------RPSPPGPRRPTGRPCCSAAPRRPQAT
               10                 20        30        40

               60        70        80        90       100
IXI_23 GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAG

IXI_23 GGWKTCSGTCTTSTSTRHRGRSGW----------RASRKSMRAACSRSAG
              50        60                  70        80

              110       120       130
IXI_23 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE

IXI_23 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
              90       100       110

#---------------------------------------
#---------------------------------------  

A.3.4. Markx2

This is an alternative output format used by Bill Pearson's suite of FASTA programs in which the residues in the second sequence are only shown if they are different from the first. For further information see:

http://rcr-www.med.nyu.edu/rcr/fastaman.html

########################################
# Program:  water
# Rundate:  Wed Jan 16 17:22:25 2002
# Report_file: stdout
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: IXI_234
# 2: IXI_235
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 131
# Identity:     112/131 (85.5%)
# Similarity:   112/131 (85.5%)
# Gaps:          19/131 (14.5%)
# Score: 591.5
#
#
#=======================================


               10        20        30        40        50
IXI_23 TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT
IXI_23 ...............---------..........................

               60        70        80        90       100
IXI_23 GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAG
IXI_23 ........................----------................

              110       120       130
IXI_23 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
IXI_23 ...............................

#---------------------------------------
#---------------------------------------        
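
Because the second line only records differences, the full second sequence has to be restored from the first. A minimal Python sketch of that step follows; the function name `expand_markx2` is hypothetical, and it assumes '.' always means "same residue as the first sequence" while any other character, including '-', is kept as is.

```python
def expand_markx2(first, second):
    """Restore the second sequence of a Markx2 block from the first."""
    if len(first) != len(second):
        raise ValueError("both lines of a block must have the same width")
    # '.' copies the residue from the first sequence; anything else is kept.
    return "".join(a if b == "." else b for a, b in zip(first, second))


first = "TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT"
second = "." * 15 + "-" * 9 + "." * 26  # the first block of the example above
restored = expand_markx2(first, second)
print(restored)
```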

A.3.5. Markx3

This is an alternative output format used by Bill Pearson's suite of FASTA programs in which the aligned sequences are displayed in FASTA sequence format. These can be used to build a primitive multiple alignment. For further information see:

http://rcr-www.med.nyu.edu/rcr/fastaman.html

########################################
# Program:  water
# Rundate:  Wed Jan 16 17:22:42 2002
# Report_file: stdout
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: IXI_234
# 2: IXI_235
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 131
# Identity:     112/131 (85.5%)
# Similarity:   112/131 (85.5%)
# Gaps:          19/131 (14.5%)
# Score: 591.5
#
#
#=======================================

>IXI_234 ..
TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT
GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAG
SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
>IXI_235 ..
TSPASIRPPAGPSSR---------RPSPPGPRRPTGRPCCSAAPRRPQAT
GGWKTCSGTCTTSTSTRHRGRSGW----------RASRKSMRAACSRSAG
SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE

#---------------------------------------
#---------------------------------------    

A.3.6. Markx10

This is an alternative output format used by Bill Pearson's suite of FASTA programs in which the aligned sequences are displayed in FASTA sequence format. The sequence length and the alignment start and stop positions are given in lines starting with a ';' character, immediately after the title line of each sequence. It is intended to be easily parsed by other programs. For further information see:

http://rcr-www.med.nyu.edu/rcr/fastaman.html

########################################
# Program:  water
# Rundate:  Wed Jan 16 17:23:00 2002
# Report_file: stdout
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: IXI_234
# 2: IXI_235
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 131
# Identity:     112/131 (85.5%)
# Similarity:   112/131 (85.5%)
# Gaps:          19/131 (14.5%)
# Score: 591.5
#
#
#=======================================

>IXI_234 ..
; sq_len: 131
; al_start: 1
; al_stop: 131
; al_display_start: 1
TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT
GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAG
SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
>IXI_235 ..
; sq_len: 131
; al_start: 1
; al_stop: 131
; al_display_start: 1
TSPASIRPPAGPSSR---------RPSPPGPRRPTGRPCCSAAPRRPQAT
GGWKTCSGTCTTSTSTRHRGRSGW----------RASRKSMRAACSRSAG
SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE

#---------------------------------------
#---------------------------------------                 
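
Since the format is meant to be machine-readable, a short parser illustrates the layout. This Python sketch is an assumption-laden illustration, not EMBOSS code: the name `parse_markx10` and the record layout (a dict with "name", "attrs" and "seq") are hypothetical. Attribute values are kept as strings.

```python
def parse_markx10(text):
    """Return a list of records, one per '>' title line."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.rstrip()
        if line.startswith(">"):
            name = (line[1:].split() or [""])[0]
            current = {"name": name, "attrs": {}, "seq": ""}
            records.append(current)
        elif current is None or not line or line.startswith("#"):
            # Skip the report header, blank lines and the '#---' footer.
            continue
        elif line.startswith(";"):
            # '; key: value' lines follow the title line of each sequence.
            key, _, value = line[1:].partition(":")
            current["attrs"][key.strip()] = value.strip()
        else:
            current["seq"] += line.strip()
    return records


example = """\
>IXI_235 ..
; sq_len: 131
; al_start: 1
; al_stop: 131
; al_display_start: 1
TSPASIRPPAGPSSR---------RPSPPGPRRPTGRPCCSAAPRRPQAT
GGWKTCSGTCTTSTSTRHRGRSGW----------RASRKSMRAACSRSAG
SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE

#---------------------------------------
"""
records = parse_markx10(example)
print(records[0]["name"], records[0]["attrs"])
```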

A.3.7. Match

This is a format defined by EMBOSS in which the start and end of matches between pairs of sequences are highlighted.

   131 IXI_234         +        1..131      IXI_235         +        1..131

#---------------------------------------
#---------------------------------------
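
Each match line splits cleanly on whitespace. The Python sketch below is illustrative only; the names `Match` and `parse_match_line` are hypothetical, and the first column is assumed (not confirmed by the text above) to be the length of the match, followed by name, strand and start..end range for each of the two sequences.

```python
from typing import NamedTuple


class Match(NamedTuple):
    length: int      # assumed meaning of the first column
    name_a: str
    strand_a: str
    start_a: int
    end_a: int
    name_b: str
    strand_b: str
    start_b: int
    end_b: int


def parse_match_line(line):
    """Split one line of Match output into its seven whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError("expected 7 fields, found %d" % len(fields))
    length, name_a, strand_a, range_a, name_b, strand_b, range_b = fields
    start_a, end_a = (int(x) for x in range_a.split(".."))
    start_b, end_b = (int(x) for x in range_b.split(".."))
    return Match(int(length), name_a, strand_a, start_a, end_a,
                 name_b, strand_b, start_b, end_b)


line = "   131 IXI_234         +        1..131      IXI_235         +        1..131"
match = parse_match_line(line)
print(match)
```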

A.3.8. MSF

MSF is the multiple sequence format used by the Accelrys GCG package, formerly known as the GCG Wisconsin package. GCG was a commercial software package of programs and utilities for gene and protein analysis. For further information see:

http://www.accelrys.com/products/gcg/

!!AA_MULTIPLE_ALIGNMENT 1.0

  stdout MSF:  131 Type: P 16/01/02 CompCheck: 3003 ..

  Name: IXI_234 Len: 131  Check: 6808 Weight: 1.00
  Name: IXI_235 Len: 131  Check: 4032 Weight: 1.00
  Name: IXI_236 Len: 131  Check: 2744 Weight: 1.00
  Name: IXI_237 Len: 131  Check: 9419 Weight: 1.00

//

           1                                               50
IXI_234    TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT
IXI_235    TSPASIRPPAGPSSR.........RPSPPGPRRPTGRPCCSAAPRRPQAT
IXI_236    TSPASIRPPAGPSSRPAMVSSR..RPSPPPPRRPPGRPCCSAAPPRPQAT
IXI_237    TSPASLRPPAGPSSRPAMVSSRR.RPSPPGPRRPT....CSAAPRRPQAT

           51                                             100
IXI_234    GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAG
IXI_235    GGWKTCSGTCTTSTSTRHRGRSGW..........RASRKSMRAACSRSAG
IXI_236    GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSR..G
IXI_237    GGYKTCSGTCTTSTSTRHRGRSGYSARTTTAACLRASRKSMRAACSR..G

           101                         131
IXI_234    SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
IXI_235    SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
IXI_236    SRPPRFAPPLMSSCITSTTGPPPPAGDRSHE
IXI_237    SRPNRFAPTLMSSCLTSTTGPPAYAGDRSHE
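
An MSF file can be read by skipping to the '//' divider and then joining the interleaved blocks by name. The Python sketch below is a simplified illustration, not a complete MSF reader: the function name `parse_msf` is hypothetical, checksums and weights are ignored, and the '.' gap character (plus '~', which GCG uses for terminal gaps) is rewritten as '-'. The sample is a trimmed excerpt of the example above.

```python
def parse_msf(text):
    """Return name -> aligned sequence from the body of an MSF file."""
    lines = text.splitlines()
    start = None
    for index, line in enumerate(lines):
        # The alignment body begins after the '//' divider.
        if line.strip() == "//":
            start = index + 1
            break
    if start is None:
        raise ValueError("no '//' divider found")
    seqs = {}
    for line in lines[start:]:
        fields = line.split()
        # Skip blank lines and coordinate rulers such as '1      50'.
        if not fields or all(field.isdigit() for field in fields):
            continue
        name, chunks = fields[0], fields[1:]
        seqs[name] = seqs.get(name, "") + "".join(chunks)
    return {n: s.replace(".", "-").replace("~", "-") for n, s in seqs.items()}


example = """\
!!AA_MULTIPLE_ALIGNMENT 1.0

  Name: IXI_234 Len: 131  Check: 6808 Weight: 1.00
  Name: IXI_235 Len: 131  Check: 4032 Weight: 1.00

//

           1                                               50
IXI_234    TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT
IXI_235    TSPASIRPPAGPSSR.........RPSPPGPRRPTGRPCCSAAPRRPQAT

           101                         131
IXI_234    SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
IXI_235    SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
"""
msf = parse_msf(example)
print(msf["IXI_235"])
```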

A.3.9. Multiple

This is a simple format for multiple sequences. It displays the sequence names, positions and sequences, and puts the markup line underneath the sequences.

########################################
# Program: demoalign
# Rundate: Thu 18 Mar 2010 10:03:39
# Commandline: demoalign
#    [-sequence] Msf
#    [-outfile] Multiple
#    -aformat2 multiple
# Align_format: multiple
# Report_file: Multiple
########################################

#=======================================
#
# Aligned_sequences: 4
# 1: IXI_234
# 2: IXI_235
# 3: IXI_236
# 4: IXI_237
# Matrix: EBLOSUM62
# Gap_penalty: 9
# Extend_penalty: -1
#
# Length: 131
# Identity:      95/131 (72.5%)
# Similarity:   127/131 (96.9%)
# Gaps:          25/131 (19.1%)
# 
#
#=======================================

IXI_234            1 TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT     50
IXI_235            1 TSPASIRPPAGPSSR---------RPSPPGPRRPTGRPCCSAAPRRPQAT     41
IXI_236            1 TSPASIRPPAGPSSRPAMVSSR--RPSPPPPRRPPGRPCCSAAPPRPQAT     48
IXI_237            1 TSPASLRPPAGPSSRPAMVSSRR-RPSPPGPRRPT----CSAAPRRPQAT     45
                     |||||:|||||||||:::::::  |||||:||||:::::|||||:|||||

IXI_234           51 GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAG    100
IXI_235           42 GGWKTCSGTCTTSTSTRHRGRSGW----------RASRKSMRAACSRSAG     81
IXI_236           49 GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSR--G     96
IXI_237           46 GGYKTCSGTCTTSTSTRHRGRSGYSARTTTAACLRASRKSMRAACSR--G     93
                     ||:||||||||||||||||||||:::::::::::|||||||||||||  |

IXI_234          101 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE    131
IXI_235           82 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE    112
IXI_236           97 SRPPRFAPPLMSSCITSTTGPPPPAGDRSHE    127
IXI_237           94 SRPNRFAPTLMSSCLTSTTGPPAYAGDRSHE    124
                     |||:||||:|||||:|||||||::|||||||


#---------------------------------------
#---------------------------------------
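
The positions printed at either end of each row count residues only, so gaps do not advance them. The Python sketch below shows one way to derive those numbers from a gapped sequence. It is an illustration, not the EMBOSS implementation: `block_positions` is a hypothetical name, and the handling of a block made up entirely of gaps is this sketch's own convention.

```python
def block_positions(aligned, width=50, gap="-"):
    """Return (start, end) residue positions for each block of `width` columns."""
    positions = []
    seen = 0  # residues counted in earlier blocks
    for offset in range(0, len(aligned), width):
        block = aligned[offset:offset + width]
        residues = len(block) - block.count(gap)
        # An all-gap block yields end == start - 1 in this sketch.
        positions.append((seen + 1, seen + residues))
        seen += residues
    return positions


ixi_235 = (
    "TSPASIRPPAGPSSR" + "-" * 9 + "RPSPPGPRRPTGRPCCSAAPRRPQAT"
    + "GGWKTCSGTCTTSTSTRHRGRSGW" + "-" * 10 + "RASRKSMRAACSRSAG"
    + "SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE"
)
positions = block_positions(ixi_235)
print(positions)  # [(1, 41), (42, 81), (82, 112)]
```

The result agrees with the numbers shown for IXI_235 in the example above.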

A.3.10. Pair

This is the default format when there are only two sequences. It is also used when "simple" format is selected but only two sequences are aligned. The sequences have a markup line between them.

########################################
# Program:  water
# Rundate:  Wed Jan 16 17:23:19 2002
# Report_file: stdout
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: IXI_234
# 2: IXI_235
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 131
# Identity:     112/131 (85.5%)
# Similarity:   112/131 (85.5%)
# Gaps:          19/131 (14.5%)
# Score: 591.5
#
#
#=======================================

IXI_234            1 TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT     50
                     |||||||||||||||         ||||||||||||||||||||||||||
IXI_235            1 TSPASIRPPAGPSSR---------RPSPPGPRRPTGRPCCSAAPRRPQAT     41

IXI_234           51 GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAG    100
                     ||||||||||||||||||||||||          ||||||||||||||||
IXI_235           42 GGWKTCSGTCTTSTSTRHRGRSGW----------RASRKSMRAACSRSAG     81

IXI_234          101 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE    131
                     |||||||||||||||||||||||||||||||
IXI_235           82 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE    112


#---------------------------------------
#---------------------------------------       
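
The markup line and the Identity and Gaps counts in the header can be reproduced from the two gapped sequences. The Python sketch below marks identities only, which is enough for this example; marking similar residues would also need the substitution matrix and is not attempted. The names `identity_markup` and `summarize` are hypothetical.

```python
def identity_markup(seq_a, seq_b, gap="-"):
    """Return a line with '|' under identical residues and ' ' elsewhere."""
    return "".join(
        "|" if a == b and a != gap else " " for a, b in zip(seq_a, seq_b)
    )


def summarize(seq_a, seq_b, gap="-"):
    """Return (identical columns, gapped columns, alignment length)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    identical = identity_markup(seq_a, seq_b, gap).count("|")
    gaps = sum(1 for a, b in zip(seq_a, seq_b) if a == gap or b == gap)
    return identical, gaps, len(seq_a)


ixi_234 = (
    "TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT"
    "GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAG"
    "SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE"
)
ixi_235 = (
    "TSPASIRPPAGPSSR" + "-" * 9 + "RPSPPGPRRPTGRPCCSAAPRRPQAT"
    + "GGWKTCSGTCTTSTSTRHRGRSGW" + "-" * 10 + "RASRKSMRAACSRSAG"
    + "SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE"
)
summary = summarize(ixi_234, ixi_235)
print(summary)  # (112, 19, 131)
```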

A.3.11. Score

This does not display the sequence alignment. It shows only the names of the sequences, the length of the alignment and the score.

IXI_234 IXI_235 131 (100.0)

#---------------------------------------
#---------------------------------------      
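
A score line can be picked apart with a single regular expression. The Python sketch below is illustrative; `parse_score_line` is a hypothetical name, and the layout (two names, the alignment length, then the score in parentheses) is taken from the description and example above.

```python
import re

SCORE_LINE = re.compile(r"^(\S+)\s+(\S+)\s+(\d+)\s+\(([-+]?\d+(?:\.\d+)?)\)$")


def parse_score_line(line):
    """Return (name_a, name_b, length, score) from one line of Score output."""
    match = SCORE_LINE.match(line.strip())
    if match is None:
        raise ValueError("not a score line: %r" % line)
    name_a, name_b, length, score = match.groups()
    return name_a, name_b, int(length), float(score)


result = parse_score_line("IXI_234 IXI_235 131 (100.0)")
print(result)  # ('IXI_234', 'IXI_235', 131, 100.0)
```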

A.3.12. Simple

Simple format is known by the synonyms simple, multiple and unknown. This format displays the sequence names, positions and sequences and then puts the markup line underneath the sequences. When multiple sequences are aligned, the format is identical to "Multiple" format. When only two sequences are aligned, the format is identical to "Pair" format.

A.3.13. SRS

This is the format used by the SRS system. SRS is a scalable and robust platform that provides fast access to diverse data resources in the life sciences from public and proprietary sources. For further information see:

http://www.biowisdom.com/solutions/srs/

The SRS format shows the sequence ID name, the sequence position and the sequence.

########################################
# Program:  alignret
# Rundate:  Wed Jan 16 17:18:29 2002
# Report_file: stdout
########################################
#=======================================
#
# Aligned_sequences: 4
# 1: IXI_234
# 2: IXI_235
# 3: IXI_236
# 4: IXI_237
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 131
# Identity:      95/131 (72.5%)
# Similarity:   127/131 (96.9%)
# Gaps:          25/131 (19.1%)
# Score: 100.0
#
#
#=======================================

IXI_234            1 TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT     50
IXI_235            1 TSPASIRPPAGPSSR---------RPSPPGPRRPTGRPCCSAAPRRPQAT     41
IXI_236            1 TSPASIRPPAGPSSRPAMVSSR--RPSPPPPRRPPGRPCCSAAPPRPQAT     48
IXI_237            1 TSPASLRPPAGPSSRPAMVSSRR-RPSPPGPRRPT----CSAAPRRPQAT     45

IXI_234           51 GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAG    100
IXI_235           42 GGWKTCSGTCTTSTSTRHRGRSGW----------RASRKSMRAACSRSAG     81
IXI_236           49 GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSR--G     96
IXI_237           46 GGYKTCSGTCTTSTSTRHRGRSGYSARTTTAACLRASRKSMRAACSR--G     93

IXI_234          101 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE    131
IXI_235           82 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE    112
IXI_236           97 SRPPRFAPPLMSSCITSTTGPPPPAGDRSHE    127
IXI_237           94 SRPNRFAPTLMSSCLTSTTGPPAYAGDRSHE    124


#---------------------------------------
#---------------------------------------
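
Rows of the form "name start sequence end" can be stitched back into whole aligned sequences. The Python sketch below is an illustration under that assumption, with the hypothetical name `read_numbered_blocks`. Header and footer lines starting with '#' are skipped, and any line that does not match the four-field layout is ignored.

```python
import re

# name, start position, sequence chunk, end position
BODY_LINE = re.compile(r"^(\S+)\s+(\d+)\s+(\S+)\s+(\d+)\s*$")


def read_numbered_blocks(text):
    """Return name -> aligned sequence joined across all blocks."""
    seqs = {}
    for line in text.splitlines():
        if line.startswith("#"):
            continue
        match = BODY_LINE.match(line)
        if match is None:
            continue
        name, _start, chunk, _end = match.groups()
        seqs[name] = seqs.get(name, "") + chunk
    return seqs


example = """\
# Length: 131
IXI_234            1 TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT     50
IXI_235            1 TSPASIRPPAGPSSR---------RPSPPGPRRPTGRPCCSAAPRRPQAT     41

IXI_234          101 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE    131
IXI_235           82 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE    112
#---------------------------------------
"""
seqs = read_numbered_blocks(example)
print({name: len(seq) for name, seq in seqs.items()})
```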

A.3.14. SRS Pair

This is the format used by the SRS system for pairwise alignments. It is similar in style to "pair" format.

########################################
# Program:  water
# Rundate:  Wed Jan 16 17:23:40 2002
# Report_file: stdout
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: IXI_234
# 2: IXI_235
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 131
# Identity:     112/131 (85.5%)
# Similarity:   112/131 (85.5%)
# Gaps:          19/131 (14.5%)
# Score: 591.5
#
#
#=======================================

IXI_234            1 TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT     50
                     |||||||||||||||         ||||||||||||||||||||||||||
IXI_235            1 TSPASIRPPAGPSSR---------RPSPPGPRRPTGRPCCSAAPRRPQAT     41

IXI_234           51 GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAG    100
                     ||||||||||||||||||||||||          ||||||||||||||||
IXI_235           42 GGWKTCSGTCTTSTSTRHRGRSGW----------RASRKSMRAACSRSAG     81

IXI_234          101 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE    131
                     |||||||||||||||||||||||||||||||
IXI_235           82 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE    112


#---------------------------------------
#---------------------------------------     

A.3.15. TCOFFEE

This is the alignment format used by the TCOFFEE package. TCOFFEE is a collection of tools for computing, evaluating and manipulating multiple alignments of DNA, protein sequences and structures. For further information see:

http://www.igs.cnrs-mrs.fr/Tcoffee/tcoffee_cgi/index.cgi

2
IXI_234 131 TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQATGGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAGSRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
IXI_235 112 TSPASIRPPAGPSSRRPSPPGPRRPTGRPCCSAAPRRPQATGGWKTCSGTCTTSTSTRHRGRSGWRASRKSMRAACSRSAGSRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
! score=591.5
! matrix=EBLOSUM62
! gapopen=10.0 gapext=0.5
#1 2
1 1 0 1 0
2 2 0 1 0
3 3 0 1 0
4 4 0 1 0
5 5 0 1 0
6 6 0 1 0

... < data omitted for brevity >

125 106 0 1 0
126 107 0 1 0
127 108 0 1 0
128 109 0 1 0
129 110 0 1 0
130 111 0 1 0
131 112 0 1 0
! SEQ_1_TO_N
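
In the example above the alignment is stored as a list of matched residue pairs, not as gapped text. The Python sketch below shows how a gapped pairwise alignment could be rebuilt from such a list. It rests on several assumptions: the first two columns of each pair line are taken to be 1-based residue numbers in the two sequences, the remaining columns are ignored, and the pairs are assumed to increase in both columns. The function names and the toy input are made up for illustration.

```python
def parse_tcoffee_lib(text):
    """Return (sequences, pairs) from a library-style file."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    count = int(lines[0])
    seqs = []
    for line in lines[1:1 + count]:
        name, _length, residues = line.split()
        seqs.append((name, residues))
    pairs, current = {}, None
    for line in lines[1 + count:]:
        if line.startswith("!"):
            continue  # comment and metadata lines
        if line.startswith("#"):
            i, j = (int(x) for x in line[1:].split())
            current = pairs.setdefault((i, j), [])
        elif current is not None:
            cols = line.split()
            current.append((int(cols[0]), int(cols[1])))
    return seqs, pairs


def rebuild_pairwise(seq_a, seq_b, matches):
    """Rebuild two gapped strings from matched (i, j) residue pairs."""
    out_a, out_b = [], []
    pos_a = pos_b = 0  # residues already written from each sequence
    for i, j in sorted(matches):
        # Unmatched residues before this pair are placed opposite gaps.
        skip_a, skip_b = seq_a[pos_a:i - 1], seq_b[pos_b:j - 1]
        out_a.append(skip_a + "-" * len(skip_b) + seq_a[i - 1])
        out_b.append("-" * len(skip_a) + skip_b + seq_b[j - 1])
        pos_a, pos_b = i, j
    tail_a, tail_b = seq_a[pos_a:], seq_b[pos_b:]
    out_a.append(tail_a + "-" * len(tail_b))
    out_b.append("-" * len(tail_a) + tail_b)
    return "".join(out_a), "".join(out_b)


toy = """\
2
A 5 ABCDE
B 4 ABDE
#1 2
1 1 0 1 0
2 2 0 1 0
4 3 0 1 0
5 4 0 1 0
! SEQ_1_TO_N
"""
seqs, pairs = parse_tcoffee_lib(toy)
gapped = rebuild_pairwise(seqs[0][1], seqs[1][1], pairs[(1, 2)])
print(gapped)  # ('ABCDE', 'AB-DE')
```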

A.3.16. Trace (debugging only)

This is a special verbose format for use in debugging. It might be useful to software developers but not to typical users of EMBOSS.

Trace output
============
a: Type:'P' Formatstr:'debug' Format:15
b: File:Trace
Show:     ShowAcc:No  ShowDes:No  ShowUsa:No
Booleans: Multi:No  Global:No  SeqOnly:No  SeqExternal:No
Numbers:  NMin:0  NMax:0  Nseqs:4  Count:0  Width:50
Matrices: IMatrix:'EBLOSUM62'(25)  FMatrix:''(0)
Strings:  Matrix:'EBLOSUM62'  GapPen:'9'  ExtPen:'-1'
Header: '<null>'
SubHeader: '<null>'
Tail: '<null>'
SubTail: '<null>'
Key: seqlen/len offset> start..end <offend (suboffset) rev Begin..End GapBegin..End

align0: Nseqs:4  LenAli:131  NumId:0  NumSim:0  NumGap:0  Score:'<null>'
fixed0: Nseqs:4  LenAli:131  NumId:95  NumSim:127  NumGap:25  Score:'<null>'
Num0.0: 131/131 0> 1..131 <0 (0) Rev:No 1..131 0..0
Seq0.0: 131 'TSPASIRPPAGPSSRPAMVS...SSCITSTTGPPAWAGDRSHE'
Num0.1: 112/112 0> 1..131 <0 (0) Rev:No 1..131 0..0
Seq0.1: 131 'TSPASIRPPAGPSSR-----...SSCITSTTGPPAWAGDRSHE'
Num0.2: 127/127 0> 1..131 <0 (0) Rev:No 1..131 0..0
Seq0.2: 131 'TSPASIRPPAGPSSRPAMVS...SSCITSTTGPPPPAGDRSHE'
Num0.3: 124/124 0> 1..131 <0 (0) Rev:No 1..131 0..0
Seq0.3: 131 'TSPASLRPPAGPSSRPAMVS...SSCLTSTTGPPAYAGDRSHE'

#---------------------------------------
#---------------------------------------