4.6. Patterns, Profiles and Multiple Sequence Alignment

BLAST and FASTA searches are not covered in this tutorial because they are not currently part of the main EMBOSS package (although interfaces are available); these searches are offered at many web sites worldwide. Database searches are an important part of the bioinformatician's arsenal. When you screen a new sequence against a database of known sequences, you are trying to find out whether it is related to any sequence that has already been characterized and, if so, what that relationship tells you about its structure and function.

If you can identify a relationship to a protein of known structure, it is possible to infer that the new protein shares a common structure with its relative and to assign its general fold. However, what if the homologue has no known structure? If its function has been identified then you might expect the unknown protein to have a similar or related function. However, many exceptions do exist. A classic example is lysozyme, which shares around 50% sequence identity and 70% sequence similarity with alpha-lactalbumin. The two proteins also share similar folds, but their functions are entirely different: the two key catalytic residues of lysozyme are not conserved in alpha-lactalbumin, and the acidic calcium-binding motif important to the function of alpha-lactalbumin is not present in most lysozymes. It is essential that, where possible, you confirm any computer-based predictions with benchwork.

What can you do if sequence similarity alone does not satisfactorily identify a relative? The following sections introduce a few more applications that can help you predict the function of your sequence.

4.6.1. Pattern Matching

In a number of cases, the active site of a protein can be recognized by a specific 'fingerprint' or 'template', a fairly small set of residues that is characteristic of a family of proteins. An example is the sequence GXGXXG (where G=glycine and X=any amino acid), which is characteristic of a nucleotide binding site. Searching a sequence for a (rather loose) predefined string of characters is called 'pattern matching'.
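The GXGXXG example maps directly onto a regular expression, with each X becoming a "match any character" position. The following is a minimal sketch in Python, not part of EMBOSS; the sequence toy_sequence is invented for illustration.

```python
import re

# GXGXXG: G = glycine, X = any amino acid ('.' in regular expression syntax).
PATTERN = re.compile(r"G.G..G")

def find_motif(sequence):
    """Return (1-based start, matched text) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in PATTERN.finditer(sequence)]

# Hypothetical sequence, made up for this example.
toy_sequence = "MKVAVLGAGGIGSTLA"
hits = find_motif(toy_sequence)
print(hits)
```

A pattern this short will match many unrelated sequences by chance, which is why curated pattern collections use longer and more specific patterns.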

The EMBOSS program patmatmotifs searches a protein sequence for the patterns defined in the PROSITE database. PROSITE is a database of protein families and domains, based on the observation that, while there are a huge number of different proteins, most of them can be grouped, on the basis of similarities in their sequences, into a limited number of families. Proteins or protein domains belonging to a particular family generally share functional attributes and are derived from a common ancestor.
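PROSITE patterns are written in their own notation: elements are separated by hyphens, x stands for any residue, square brackets list permitted residues, curly brackets list forbidden residues, and a number in parentheses gives a repeat count. The sketch below translates that notation into a Python regular expression. It is a simplified illustration, not the algorithm patmatmotifs uses, and it ignores rarer constructs such as anchors placed inside square brackets. The query sequence is hypothetical.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")           # patterns end with a full stop
    parts = []
    for element in pattern.split("-"):      # elements are hyphen-separated
        element = element.replace("{", "[^").replace("}", "]")  # forbidden residues
        element = element.replace("(", "{").replace(")", "}")    # repeat counts
        element = element.replace("x", ".")                      # any residue
        element = element.replace("<", "^").replace(">", "$")    # sequence ends
        parts.append(element)
    return "".join(parts)

# Illustrative pattern: Ala or Cys, any residue, Val, any four residues,
# then anything except Glu or Asp.
regex = prosite_to_regex("[AC]-x-V-x(4)-{ED}.")
print(regex)

# Hypothetical sequence, made up for this example.
match = re.search(regex, "MACVKLMNQ")
print(match.start() + 1, match.group())
```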

4.6.2. Exercise: patmatmotifs

% patmatmotifs
Search a motif database with a protein sequence
Input protein sequence: L07770.pep
Output report [l07770_1.patmatmotifs]: L07770.patmatmotifs


% more L07770.patmatmotifs

########################################
# Program: patmatmotifs
# Rundate: Wed 18 Feb 2009 14:58:32
# Commandline: patmatmotifs
#    -sequence L07770.pep
# Report_format: dbmotif
# Report_file: l07770_1.patmatmotifs
########################################

#=======================================
#
# Sequence: L07770_1     from: 1   to: 354
# HitCount: 2
#
# Full: No
# Prune: Yes
# Data_file: /m2/emboss/emboss/emboss/data/PROSITE/prosite.lines
#
#=======================================

Length = 17
Start = position 123 of sequence
End = position 139 of sequence

Motif = G_PROTEIN_RECEP_F1_1

TLGGEVALWSLVVLAVERYMVVCKPMA
     |               |
   123               139

Length = 17
Start = position 290 of sequence
End = position 306 of sequence

Motif = OPSIN

PVFMTVPAFFAKSSAIYNPVIYIVLNK
     |               |
   290               306


#---------------------------------------
#---------------------------------------

In this case you already know that the sequence is a rhodopsin. However, if you had an unknown sequence, identifying motifs might provide you with information to help you plan further experiments.

4.6.3. Protein Fingerprints

PRINTS is a database that defines functional protein families, identifying each domain by a number of short, particularly well conserved sequences. A full match to one of these "fingerprints" will match all the relevant short sequences in the correct order. A partial match is recorded if some are missing or if they occur in an incorrect order. The PRINTS database can be searched using the pscan program which is available within EMBOSS. However, PRINTS is now part of InterPro and so it is advisable to install and use the EMBASSY wrapper to the IPRSCAN package assuming it's available for your platform.
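The distinction between full and partial fingerprint matches can be sketched as a small classification function. This is an illustration only: it assumes that "in order" means strictly increasing start positions, which may be looser than the test pscan really applies. The first set of start positions is taken from the GPCRRHODOPSN match in the pscan exercise; the other inputs are hypothetical.

```python
def classify_fingerprint(n_elements, hits):
    """Return a class number from 1 to 4 for a fingerprint match.

    n_elements: number of elements (short motifs) the fingerprint defines.
    hits: dict mapping element number to the start position of its match.
    """
    starts = [hits[element] for element in sorted(hits)]
    # Assumption: elements are "in order" if their start positions increase.
    in_order = all(a < b for a, b in zip(starts, starts[1:]))
    complete = len(hits) == n_elements
    if complete and in_order:
        return 1  # all elements present and in order
    if complete:
        return 2  # all elements present, but not all in order
    if in_order:
        return 3  # some elements missing, the rest in order
    return 4      # remaining partial matches

# Start positions reported for GPCRRHODOPSN (7 elements) in the exercise.
gpcr_hits = {1: 39, 2: 72, 3: 117, 4: 152, 5: 204, 6: 250, 7: 288}
print(classify_fingerprint(7, gpcr_hits))

# Hypothetical partial match: elements 2 and 4 of a 4-element fingerprint missing.
print(classify_fingerprint(4, {1: 10, 3: 60}))
```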

4.6.4. Exercise: pscan

% pscan
Scans proteins using PRINTS
Input protein sequence(s): L07770.pep
Minimum number of elements per fingerprint [2]:
Maximum number of elements per fingerprint [20]:
Output file [L07770_1.pscan]: L07770.pscan

Scanning L07770_1...

% more L07770.pscan


CLASS 1
Fingerprints with all elements in order

Fingerprint GPCRRHODOPSN Elements 7
    Accession number PR00237
    Rhodopsin-like GPCR superfamily signature
  Element 1 Threshold 54% Score 61%
         Start position 39 Length 25
  Element 2 Threshold 49% Score 49%
         Start position 72 Length 22
  Element 3 Threshold 48% Score 55%
         Start position 117 Length 23
  Element 4 Threshold 50% Score 69%
         Start position 152 Length 22
  Element 5 Threshold 51% Score 82%
         Start position 204 Length 24
  Element 6 Threshold 42% Score 72%
         Start position 250 Length 25
  Element 7 Threshold 46% Score 68%
         Start position 288 Length 27

CLASS 2
All elements match but not all in the correct order

Fingerprint RHODOPSIN Elements 6
    Accession number PR00579
    Rhodopsin signature
  Element 1 Threshold 80% Score 100%
         Start position 3 Length 19
  Element 2 Threshold 76% Score 94%
         Start position 22 Length 17
  Element 3 Threshold 53% Score 90%
         Start position 85 Length 17
  Element 4 Threshold 71% Score 100%
         Start position 191 Length 17
  Element 5 Threshold 56% Score 97%
         Start position 271 Length 19
  Element 6 Threshold 81% Score 95%
         Start position 319 Length 14

CLASS 3
Not all elements match but those that do are in order


CLASS 4
Remaining partial matches

4.6.5. Multiple Sequence Analysis

The simultaneous alignment of many nucleotide or amino acid sequences is now an essential tool in molecular biology. Multiple alignments are used to find diagnostic patterns to characterize protein families, to detect or demonstrate homology between new sequences and existing families of sequences, to help predict the secondary and tertiary structures of the new sequences, to suggest oligonucleotide primers for PCR and as an essential prelude to molecular evolutionary analysis.

One of the most popular programs for performing multiple sequence alignments is clustalw. EMBOSS has an interface to clustalw called emma. clustalw (and thus emma) creates a multiple sequence alignment from a group of related sequences using progressive pairwise alignments. It can also produce a dendrogram showing the clustering relationships used to create the alignment. The dendrogram shows the order of the pairwise alignments of sequences and clusters of sequences that together generate the final alignment, but it is not an evolutionary tree (although the length of the branches is related to the relative distance of the sequences).
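To see how pairwise scores determine the order in which sequences are joined, here is a rough sketch that repeatedly merges the two most similar clusters. It uses average-linkage clustering for brevity, which is an assumption made for this example; clustalw's own guide-tree method may differ. The scores are the pairwise scores reported for sequences 5, 7, 8, 9 and 10 in the emma exercise output.

```python
from itertools import combinations

# Pairwise scores for sequences 5, 7, 8, 9 and 10 from the emma run
# (OPSD2_ANGAN, OPSG2_ASTFA, OPSG2_CARAU, OPSG2_DANRE, OPSR2_DANRE).
scores = {
    (5, 7): 35, (5, 8): 67, (5, 9): 66, (5, 10): 37,
    (7, 8): 39, (7, 9): 38, (7, 10): 82,
    (8, 9): 85, (8, 10): 42, (9, 10): 41,
}

def pair_score(a, b):
    """Look up a pairwise score regardless of the order of the two sequences."""
    return scores[(a, b)] if (a, b) in scores else scores[(b, a)]

def average_score(c1, c2):
    """Mean pairwise score between all members of two clusters."""
    return sum(pair_score(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

def merge_order(names):
    """Return the sequence of cluster merges, most similar pair first."""
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda p: average_score(*p))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(tuple(sorted(c1 + c2)))
        merges.append((c1, c2))
    return merges

for step, (c1, c2) in enumerate(merge_order([5, 7, 8, 9, 10]), start=1):
    print(step, c1, "+", c2)
```

The two most similar pairs (8 with 9, then 7 with 10) are joined first, and the more distant clusters are combined last.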

clustalw produces global alignments by a progressive, heuristic procedure, so the result is not guaranteed to be the mathematically optimal multiple alignment. The alignment procedure begins with the pairwise alignment of the two most similar sequences, producing a cluster of two aligned sequences. This cluster can then be aligned to the next most related sequence or cluster of aligned sequences. Two clusters of sequences can be aligned by a simple extension of the pairwise alignment of two individual sequences. The final alignment is achieved by a series of progressive, pairwise alignments that include increasingly dissimilar sequences and clusters, until all sequences have been included in the final pairwise alignment. When gaps are inserted into a sequence to produce an alignment, they are inserted at the same position in all the sequences of the cluster. Each pairwise alignment uses the method of Needleman and Wunsch extended for use with clusters of aligned sequences.
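The Needleman and Wunsch method for two individual sequences can be sketched in a few lines of dynamic programming. The version below is deliberately simplified: it scores +1 for a match, -1 for a mismatch and -1 per gap position, all of which are assumptions for this example. clustalw itself uses substitution matrices and separate gap opening and extension penalties, and extends the method to clusters. The two toy sequences are invented.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of two sequences; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):          # leading gaps in b
        score[i][0] = i * gap
    for j in range(1, cols):          # leading gaps in a
        score[0][j] = j * gap

    def substitution(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix: each cell is the best of diagonal, up and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + substitution(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one best alignment.
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))

result = needleman_wunsch("ACGT", "AGT")
print(result)
```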

pscan reported that the sequence belongs to the rhodopsin family. This is a very large family of sequences - for example, you can see the Pfam entry for rhodopsin by doing a keyword search at http://www.sanger.ac.uk/Software/Pfam

You will now retrieve some further members of the family from SwissProt and produce a multiple alignment. You will then use this multiple alignment to build a profile of the group of sequences, and use the profile to align the original sequence to the family.

First, let's retrieve the sequences using seqret:

4.6.6. Exercise: Retrieving a Set of Sequences

% seqret
Reads and writes (returns) a set of sequences
Input (gapped) sequence(s): sw:ops*2_*
output sequence(s) [ops2_drome.fasta]: ops2.fasta

Note the use of the wildcard character * to retrieve all SwissProt sequences whose identifiers match the pattern ops*2_*, that is, identifiers that begin with ops and contain 2_ later in the name.
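The effect of the wildcard can be checked with a short Python sketch. It assumes that the shell-style matching in Python's fnmatch module behaves like the EMBOSS wildcard for this simple pattern; it is not the EMBOSS implementation. The first three identifiers come from the emma exercise, and the last two are added only as examples of identifiers that should not match.

```python
from fnmatch import fnmatchcase

pattern = "ops*2_*"

ids = [
    "OPS2_DROME",   # '*' matches nothing between 'ops' and '2_'
    "OPSG2_CARAU",  # '*' matches 'g'
    "OPSD2_ANGAN",  # '*' matches 'd'
    "OPS1_DROME",   # no '2_' in the name: should not match
    "OPSD_HUMAN",   # no '2_' in the name: should not match
]

# Compare in lower case so the match does not depend on the case of the IDs.
selected = [name for name in ids if fnmatchcase(name.lower(), pattern)]
print(selected)
```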

4.6.7. Exercise: emma

% emma
Multiple alignment program - interface to ClustalW program
Input sequence: ops2.fasta
Output sequence set [ops2_drome.aln]: ops2.aln
Dendrogram (tree file) from clustalw output file [ops2_drome.dnd]: ops2.dnd

 CLUSTAL W (1.83) Multiple Sequence Alignments



Sequence type explicitly set to Protein
Sequence format is Pearson
Sequence 1: OPS2_DROME       381 aa
Sequence 2: OPS2_DROPS       381 aa
Sequence 3: OPS2_SCHGR       380 aa
Sequence 4: OPSC2_HEMSA      377 aa
Sequence 5: OPSD2_ANGAN      352 aa
Sequence 6: OPSD2_PATYE      399 aa
Sequence 7: OPSG2_ASTFA      353 aa
Sequence 8: OPSG2_CARAU      349 aa
Sequence 9: OPSG2_DANRE      349 aa
Sequence 10: OPSR2_DANRE      356 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score:  92
Sequences (1:3) Aligned. Score:  33
Sequences (1:4) Aligned. Score:  38
Sequences (1:5) Aligned. Score:  22
Sequences (1:6) Aligned. Score:  21
Sequences (1:7) Aligned. Score:  26
Sequences (1:8) Aligned. Score:  23
Sequences (1:9) Aligned. Score:  23
Sequences (1:10) Aligned. Score:  25
Sequences (2:3) Aligned. Score:  32
Sequences (2:4) Aligned. Score:  37
Sequences (2:5) Aligned. Score:  20
Sequences (2:6) Aligned. Score:  22
Sequences (2:7) Aligned. Score:  23
Sequences (2:8) Aligned. Score:  24
Sequences (2:9) Aligned. Score:  23
Sequences (2:10) Aligned. Score:  21
Sequences (3:4) Aligned. Score:  33
Sequences (3:5) Aligned. Score:  22
Sequences (3:6) Aligned. Score:  20
Sequences (3:7) Aligned. Score:  21
Sequences (3:8) Aligned. Score:  20
Sequences (3:9) Aligned. Score:  19
Sequences (3:10) Aligned. Score:  22
Sequences (4:5) Aligned. Score:  24
Sequences (4:6) Aligned. Score:  24
Sequences (4:7) Aligned. Score:  24
Sequences (4:8) Aligned. Score:  25
Sequences (4:9) Aligned. Score:  22
Sequences (4:10) Aligned. Score:  23
Sequences (5:6) Aligned. Score:  21
Sequences (5:7) Aligned. Score:  35
Sequences (5:8) Aligned. Score:  67
Sequences (5:9) Aligned. Score:  66
Sequences (5:10) Aligned. Score:  37
Sequences (6:7) Aligned. Score:  20
Sequences (6:8) Aligned. Score:  23
Sequences (6:9) Aligned. Score:  22
Sequences (6:10) Aligned. Score:  20
Sequences (7:8) Aligned. Score:  39
Sequences (7:9) Aligned. Score:  38
Sequences (7:10) Aligned. Score:  82
Sequences (8:9) Aligned. Score:  85
Sequences (8:10) Aligned. Score:  42
Sequences (9:10) Aligned. Score:  41
Guide tree        file created:   [00004255C]
Start of Multiple Alignment
There are 9 groups
Aligning...
Group 1: Sequences:   2      Score:5375
Group 2: Sequences:   3      Score:5387
Group 3: Sequences:   2      Score:5429
Group 4: Sequences:   5      Score:2568
Group 5: Sequences:   2      Score:6128
Group 6: Sequences:   3      Score:2747
Group 7: Sequences:   4      Score:2500
Group 8: Sequences:   9      Score:1827
Group 9:                     Delayed
Sequence:6     Score:1930
Alignment Score 27426
GCG-Alignment file created      [00004255B]

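We have aligned ten opsin sequences: two from fruit fly species (DROME, DROPS), one each from a locust (SCHGR), a crab (HEMSA) and a scallop (PATYE), and five from fish (ANGAN, ASTFA, CARAU and two from DANRE). Let's see what emma made of them: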

% more ops2.aln

>OPSG2_CARAU
--------------------MNGTEGNNFYVPLSNRTGLVRSPFEYPQYYLAEPWQFKLL
AVYMFFLICLGLPINGLTLICTAQHKKLRQPLNFILVNLAVAGAIMVCFGFTVTFYT-AI
NGYFALGPTGCAVEGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFSSTHASAGIAF
TWVMAMACAAPPLVG-WSRYIPEGIQCSCGPDYYTLNPEYNNESYVLYMFICHFILPVTI
IFFTYGRLVCTVKAAAAQQQD------------SASTQKAEREVTKMVILMVLGFLVAWT
PYATVAAWIFFNKGAAFSAQFMAIPAFFSKTSALYNPVIYVLLNKQFRSCMLTTLFCGKN
PLGDEESSTVSTSKTEVSSVSPA-------------------------------------
---------------------------
>OPSG2_DANRE
--------------------MNGTEGNNFYIPMSNRTGLVRSPYEYTQYYLADPWQFKAL
AFYMFFLICFGLPINVLTLLVTAQHKKLRQPLNYILVNLAFAGTIMAFFGFTVTFYC-SI
NGYMALGPTGCAIEGFFATLGGQVALWSLVVLAIERYIVVCKPMGSFKFSSNHAMAGIAF
TWVMASSCAVPPLFG-WSRYIPEGMQTSCGPDYYTLNPEFNNESYVLYMFSCHFCVPVTT
IFFTYGSLVCTVKAAAAQQQE------------SESTQKAEREVTRMVILMVLGFLVAWV
PYASFAAWIFFNRGAAFSAQAMAIPAFFSKASALFNPIIYVLLNKQFRSCMLNTLFCGKS
PLGDDESSSVSTSKTEVSSVSPA-------------------------------------
---------------------------
>OPSD2_ANGAN
--------------------MNGTEGPNFYVPMSNVTGVVRSPFEYPQYYLAEPWAYSAL
AAYMFFLIIAGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYT-SM
HGYFVFGPTGCNIEGFFATLGGEIALWCLVVLAVERWMVVCKPMSNFRFGENHAIMGVAF
TWVMALACAAPPLFG-WSRYIPEGMQCSCGMDHYAPNPETYNESFVIYMFICHFTIPLTV
ISFCYGRLVCTVKEATAQQQE------------SETTQRAEREVTRMVIIMVISFLVCWV
PYASVAWYIFTHQGSSFGPIFMTIPAFFAKSSSLYNPLIYICMNKQSRNCMITTLCCGKN
PFEEEEGASTTASKTEASSVSSVSPA----------------------------------
---------------------------
>OPSG2_ASTFA
---------MAAHEPVFAARRHNEDTTRESAFVYTNANNTRDPFEGPNYHIAPRWVYNVS
SLWMIFVVIASVFTNGLVIVATAKFKKLRHPLNWILVNLAIADLGETVLASTISVIN-QI
FGYFILGHPMCVFEGWTVSVCGITALWSLTIISWERWVVVCKPFGNVKFDGKWAAGGIIF
SWVWAIIWCTPPIFG-WSRYWPHGLKTSCGPDVFSGSEDPGVASYMITLMLTCCILPLSI
IIICYIFVWSAIHQVAQQQKD------------SESTQKAEKEVSRMVVVMILAFIVCWG
PYASFATFSAVNPGYAWHPLAAAMPAYFAKSATIYNPIIYVFMNRQFRSCIMQLFGKKVE
-----DASEVSGSTTEVSTAS---------------------------------------
---------------------------
>OPSR2_DANRE
--------MAEWANAAFAARRRGDETTRDNAFSYTNSNNTRDPFEGPNYHIAPRWVYNVA
TVWMFFVVVASTFTNGLVLVATAKFKKLRHPLNWILVNLAIADLGETLFASTISVIN-QV
FGYFILGHPMCIFEGYTVSVCGIAGLWSLTVISWERWVVVCKPFGNVKFDGKWASAGIIF
SWVWAAVWCAPPIFG-WSRYWPHGLKTSCGPDVFGGNEDPGVQSYMLVLMITCCILPLAI
IILCYIAVFLAIHAVAQQQKD------------SESTQKAEKEVSRMVVVMILAFCLCWG
PYTAFACFAAANPGYAFHPLAAAMPAYFAKSATIYNPIIYVFMNRQFRVCIMQLFGKKVD
-----DGSEVSTSKTEVSSVAPA-------------------------------------
---------------------------
>OPS2_DROME
MERSHLPETPFDLAHSGPRFQAQSSGNGSVLDNVLPDMAHLVNPYWSRFAPMDPMMSKIL
GLFTLAIMIISCCGNGVVVYIFGGTKSLRTPANLLVLNLAFSDFCMMASQSPVMIIN-FY
YETWVLGPLWCDIYAGCGSLFGCVSIWSMCMIAFDRYNVIVKGINGTPMTIKTSIMKILF
IWMMAVFWTVMPLIG-WSAYVPEGNLTACSIDYMTRMWNPRSYLITYSLFVYYTPLFLIC
YSYWFIIAAVAAHEKAMREQAKKMNVKSLRS-SEDCDKSAEGKLAKVALTTISLWFMAWT
PYLVICYFGLF-KIDGLTPLTTIWGATFAKTSAVYNPIVYGISHPKYRIVLKEKCPMCVF
GNTDEPKPDAPASDTETTSEADSKA-----------------------------------
---------------------------
>OPS2_DROPS
MERSLLPEPPLAMALLGPRFEAQTGGNRSVLDNVLPDMAPLVNPYWSRFAPMDPTMSKIL
GLFTLVILIISCCGNGVVVYIFGGTKSLRTPANLLVLNLAFSDFCMMASQSPVMIIN-FY
YETWVLGPLWCDIYAACGSLFGCVSIWSMCMIAFDRYNVIVKGINGTPMTIKTSIMKIAF
IWMMAVFWTIMPLIG-WSSYVPEGNLTACSIDYMTRQWNPRSYLITYSLFVYYTPLFMIC
YSYWFIIATVAAHEKAMRDQAKKMNVKSLRS-SEDCDKSAENKLAKVALTTISLWFMAWT
PYLIICYFGLF-KIDGLTPLTTIWGATFAKTSAVYNPIVYGISHPKYRLVLKEKCPMCVC
GSTDEPKPDAPPSDTETTSEAESKA-----------------------------------
---------------------------
>OPSC2_HEMSA
---MTNATGPQMAYYGAASMDFGYPEGVSIVDFVRPEIKPYVHQHWYNYPPVNPMWHYLL
GVIYLFLGTVSIFGNGLVIYLFNKSAALRTPANILVVNLALSDLIMLTTNVPFFTYNCFS
GGVWMFSPQYCEIYACLGAITGVCSIWLLCMISFDRYNIICNGFNGPKLTTGKAVVFALI
SWVIAIGCALPPFFG-WGNYILEGILDSCSYDYLTQDFNTFSYNIFIFVFDYFLPAAIIV
FSYVFIVKAIFAHEAAMRAQAKKMNVSTLRS-NEADAQRAEIRIAKTALVNVSLWFICWT
PYALISLKGVMGDTSGITPLVSTLPALLAKSCSCYNPFVYAISHPKYRLAITQHLPWFCV
HETETKSNDDSQSNSTVAQDKA--------------------------------------
---------------------------
>OPS2_SCHGR
-----MVNTTDFYPVPAAMAYESSVGLPLLGWNVPTEHLDLVHPHWRSFQVPNKYWHFGL
AFVYFMLMCMSSLGNGIVLWIYATTKSIRTPSNMFIVNLALFDVLMLL-EMPMLVVSSLF
YQRPVGWELGCDIYAALGSVAGIGSAINNAAIAFDRYRTISCPIDG-RLTQGQVLALIAG
TWVWTLPFTLMPLLRIWSRFTAEGFLTTCSFDYLTDDEDTKVFVGCIFAWSYAFPLCLIC
CFYYRLIGAVREHEKMLRDQAKKMNVKSLQSNADTEAQSAEIRIAKVALTIFFLFLCSWT
PYAVVAMIGAFGNRAALTPLSTMIPAVTAKIVSCIDPWVYAINHPRFRAEVQKRMKWLHL
GEDARSSKSDTSSTATDRTVGNVSASA---------------------------------
---------------------------
>OPSD2_PATYE
--------------------------------------MPFPLNRTDTALVISPSEFRII
GIFISICCIIGVLGNLLIIIVFAKRRSVRRPINFFVLNLAVSDLIVALLGYPMTAAS-AF
SNRWIFDNIGCKIYAFLCFNSGVISIMTHAALSFCRYIIICQYGYRKKITQTTVLRTLFS
IWSFAMFWTLSPLFG-WSSYVIEVVPVSCSVNWYGHGLGDVSYTISVIVAVYVFPLSIIV
FSYGMILQEKVCKDSRKNGIR------AQQRYTPRFIQDIEQRVTFISFLMMAAFMVAWT
PYAIMSALAIG--SFNVENSFAALPTLFAKASCAYNPFIYAFTNANFRDTVVEIMAPWTT
RRVGVSTLPWPQVTYYPRRRTSAVNTTDIEFPDDNIFIVNSSVNGPTVKREKIVQRNPIN
VRLGIKIEPRDSRAATENTFTADFSVI

The sequences share clearly conserved regions, but there are also differences; note the gaps that have been inserted. Also note that since this is a global alignment algorithm, gaps have been inserted to make all the sequences the same length.
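You can confirm that every aligned sequence has the same length, and count the gaps in each, with a few lines of Python. This is a minimal sketch with a hand-written FASTA reader; the two-sequence alignment in toy is invented so that the example is self-contained.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of name -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]   # first word after '>' is the ID
            records[name] = ""
        elif name is not None:
            records[name] += line        # sequences may span several lines
    return records

# Hypothetical aligned sequences, made up for this example.
toy = """>seq1
--MNGTEG
PNF
>seq2
MAMNG-EG
PN-
"""

alignment = read_fasta(toy)
lengths = {len(seq) for seq in alignment.values()}
gaps = {name: seq.count("-") for name, seq in alignment.items()}
print(lengths)
print(gaps)
```

To run the same check on the exercise output, pass the contents of ops2.aln to read_fasta instead of toy.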

Differences in alignment can be very difficult to see in this format. The program prettyplot can enhance visualisation of your results, by aligning the sequences on top of one another.

4.6.8. Exercise: prettyplot

% prettyplot
Displays aligned sequences, with colouring and boxing
Input sequence set: ops2.aln
Graph type [x11]:

A graphic display will appear on your screen detailing your alignment. Identical residues are shown in red, and similar residues in green. This type of display can give you a first impression of regions of conservation.

As with all EMBOSS graphical programs, you can capture the output in a file rather than just viewing it on screen. The output is controlled by the -graph family of associated qualifiers (type prettyplot -help -verbose to get a complete listing of options).

We will save the pretty plot to a file rhodopsin.ps in colour postscript format. To do this you use -graph cps and -goutfile rhodopsin:

% prettyplot ops2.aln -goutfile rhodopsin -graph cps
Displays aligned sequences, with colouring and boxing
Created rhodopsin.ps

This has created a file rhodopsin.ps that can be printed on a postscript printer or turned into a PDF document with ps2pdf (not an EMBOSS program but commonly found on many UNIX/Linux systems). PDF documents can then be viewed with a PDF viewer such as Acrobat Reader.

To adjust the output of prettyplot (e.g. to increase the number of residues per line) there are a number of options that can be set. Read the help file and try to plot with/without a consensus, different numbers of residues per line and so on. (hint: prettyplot -help)

4.6.9. Profiles

A very powerful technique for characterizing the putative structure and function of a sequence is profile analysis. This is a sequence comparison method for finding and aligning distantly related sequences. The comparison allows a new sequence to be aligned optimally to a family of similar sequences. The comparison uses a scoring matrix and an existing optimal alignment of two or more similar protein sequences. The group or 'family' of similar sequences is first aligned to create a multiple sequence alignment. The information in the multiple sequence alignment is then represented as a table of position-specific symbol comparison values and gap penalties. This table is called a profile. The similarity of new sequences to an existing profile can be tested by comparing each new sequence to the profile using a modification of the Smith-Waterman algorithm.
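The idea of a position-specific table can be illustrated with a very small frequency profile. The sketch below is a simplification and makes several assumptions: it stores plain residue frequencies per column, it has no gap penalties, and it scores ungapped windows only, so it is neither the Gribskov method nor the file format written by prophecy. The three aligned fragments are taken from the same columns of the emma alignment in the earlier exercise, and the query fragment is taken from L07770_1.

```python
from collections import Counter

def build_profile(alignment):
    """Return one dict of residue frequencies per alignment column."""
    n = len(alignment)
    return [{res: count / n for res, count in Counter(column).items()}
            for column in zip(*alignment)]

def score(profile, segment):
    """Sum the profile frequency of each residue in an ungapped segment."""
    return sum(column.get(res, 0.0) for column, res in zip(profile, segment))

def best_window(profile, sequence):
    """Return the 0-based start of the best-scoring ungapped window."""
    width = len(profile)
    return max(range(len(sequence) - width + 1),
               key=lambda start: score(profile, sequence[start:start + width]))

# Four aligned columns from three of the opsin sequences in ops2.aln.
family = ["WSRY", "WSAY", "WGNY"]
profile = build_profile(family)

# Fragment of L07770_1 around the same region.
query = "PPLFGWSRYIPEG"
start = best_window(profile, query)
print(start + 1, query[start:start + len(profile)], score(profile, query[start:start + len(profile)]))
```

Positions where the family is invariant (W and Y here) contribute most to the score, which is what makes a profile more sensitive than comparison against any single family member.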

4.6.10. Exercise: prophecy

prophecy is an EMBOSS program for creating a profile from a set of multiple-aligned sequences. You'll use the ops2 alignment to try out prophecy:

% prophecy
Creates matrices/profiles from multiple alignments
Input sequence: ops2.aln
Profile type
F : Frequency
G : Gribskov
H : Henikoff
Select type [F]: g
Scoring matrix [Epprofile]:
Enter a name for the profile [mymatrix]: ops2 sequences
Gap opening penalty [3.0]:
Gap extension penalty [0.3]:
Output file [ops2.prophecy]: ops2.prophecy

4.6.11. Exercise: prophet

Now let's use the profile you just created to align L07770.pep to the opsin2 sequences:

% prophet
Gapped alignment for profiles
Input sequence(s): L07770.pep
Profile or weight matrix file: ops2.prophecy
Gap opening coefficient [1.0]:
Gap extension coefficient [0.1]:
Output alignment [l07770_1.prophet]: ops2.prophet

% more ops2.prophet

########################################
# Program: prophet
# Rundate: Wed 18 Feb 2009 15:58:33
# Commandline: prophet
#    -sequence L07770.pep
#    -infile ops2.prophecy
#    -outfile ops2.prophet
# Align_format: simple
# Report_file: ops2.prophet
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: ops2
# 2: L07770_1
# Matrix: EBLOSUM62
#
# Length: 368
# Identity:     193/368 (52.4%)
# Similarity:   250/368 (67.9%)
# Gaps:          16/368 ( 4.3%)
# Score: 2671.089
# 
#
#=======================================

ops2              21 MNGTEGNNFYVDNVNPTGLPRVPFEWPNYYLADPWMFKILALFMFFLIIA     70
                     ||||||.||||...|.||:.|.||::|.||||:||.:..||.:||.||:.
L07770_1           1 MNGTEGPNFYVPMSNKTGVVRSPFDYPQYYLAEPWQYSALAAYMFLLILL     50

ops2              71 SCFGNGLVLYITAKHKKLRTPLNFILVNLAFADLIMALFGSPVTVINCFI    120
                     ....|.:.|::|.:|||||||||:||:||.||:..|.|.|..||:... :
L07770_1          51 GLPINFMTLFVTIQHKKLRTPLNYILLNLVFANHFMVLCGFTVTMYTS-M     99

ops2             121 YGYFVLGPLGCDIEAFLGSLGGIVSLWSLCVIAFERYIVICKPFGGFKFT    170
                     :|||:.|..||.||.|..:|||.|:||||.|:|.|||:|:|||...|:|.
L07770_1         100 HGYFIFGQTGCYIEGFFATLGGEVALWSLVVLAVERYMVVCKPMANFRFG    149

ops2             171 GKHAIAGIAFTWVMAIFWAAPPLFGIWSRYIPEGILTSCGPDYYTGNEDP    220
                     ..|||.|:||||:||:..||||||| ||||||||:..|||.||||...:.
L07770_1         150 ENHAIMGVAFTWIMALSCAAPPLFG-WSRYIPEGMQCSCGVDYYTLKPEV    198

ops2             221 GSYSIVIYLFIYHFPLPLICISYCYIILACAAHEAAAQQQAKKMNVKSLR    270
                     .:.|.|||:||.||.:|||.|.:||..|.|...|||||||.         
L07770_1         199 NNESFVIYMFIVHFTIPLIVIFFCYGRLLCTVKEAAAQQQE---------    239

ops2             271 SNSSESTQKAEREVAKMVLLMILLFLVAWTPYASFAAFGAFNKGAAFTPL    320
                        |.:|||||:||.:||::|::.||:.|.|||..|.:...::|:.|.|:
L07770_1         240 ---SATTQKAEKEVTRMVVIMVVFFLICWVPYAYVAFYIFTHQGSNFGPV    286

ops2             321 AAAIPAFFAKSSALYNPIIYVIMNPQFRSCIKETLPCGVNGETDEESSDV    370
                     ...:|||||||||:|||:||:::|.|||:|:..||.||.|...||:.|..
L07770_1         287 FMTVPAFFAKSSAIYNPVIYIVLNKQFRNCLITTLCCGKNPFGDEDGSSA    336

ops2             371 STSKTEVSSVSPA--KAA    386
                     :|||||.||||.:  ..|
L07770_1         337 ATSKTEASSVSSSQVSPA    354


#---------------------------------------
#---------------------------------------

The vertical bars (|) represent residues that are identical between the ops2 consensus and the rhodopsin, while the colons (:) represent conservative substitutions. Aligning members of a family can reveal conserved regions that may be important for structure and/or function.
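The summary figures in the report header follow directly from counts over the alignment columns. The sketch below recomputes the percentages from the counts given in the header, and counts identities and gaps for the final block of the alignment. It does not compute similarity, because that depends on the scoring matrix. The function name alignment_stats is chosen for this example.

```python
def alignment_stats(top, bottom):
    """Return (length, identical positions, gapped positions) for two aligned rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    identical = sum(a == b and a != "-" for a, b in zip(top, bottom))
    gapped = sum(a == "-" or b == "-" for a, b in zip(top, bottom))
    return len(top), identical, gapped

# Final block of the prophet alignment (ops2 371-386, L07770_1 337-354).
top = "STSKTEVSSVSPA--KAA"
bottom = "ATSKTEASSVSSSQVSPA"
print(alignment_stats(top, bottom))

# Percentages from the counts in the report header (alignment length 368).
for label, count in [("Identity", 193), ("Similarity", 250), ("Gaps", 16)]:
    print(f"{label}: {count}/368 ({100 * count / 368:.1f}%)")
```

The ten identities counted in the final block correspond to the ten vertical bars in the match line of that block.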