Structural Biochemistry and Bioinformatics

Application Ssearch



SSEARCH(1)                User Commands                SSEARCH(1)



NAME
     ssearch - scan a protein or DNA sequence library for similar
     sequences


SYNOPSIS
     ssearch [-a -b # -d # -E # -f # -g # -h -i -l FASTLIBS    -L
     -r  STATFILE -m # -O filename -Q -s SMATRIX -w # -z ] query-
     sequence-file library-file

     ssearch [-QabdEfghilmOrswz] query-file @library-name-file

     ssearch [-QabdEfghilmOrswz] query-file "%PRMVI"

     ssearch [-aEfghilmrsw] - interactive mode


DESCRIPTION
     ssearch compares a protein or DNA sequence  to  all  of  the
     entries  in  a  sequence  library  using the rigorous Smith-
     Waterman algorithm (Smith and Waterman, J. Mol. Biol. (1981)
     147:195-197).  For  example,  ssearch  can compare a protein
     sequence to all of the sequences in  the  NBRF  PIR  protein
     sequence   database.    ssearch  will  automatically  decide
     whether the query sequence is DNA or protein by reading  the
     query  sequence  as  protein  and  determining  whether  the
     `amino-acid composition' is more than 85% A+C+G+T.  The pro-
     gram can be invoked either with command line arguments or in
     interactive mode.  ssearch compares a query  sequence  to  a
     sequence  library  which  consists  of  sequence data inter-
     spersed with  comments,  see  below.   The  fasta  programs,
     including ssearch, use a standard text format sequence file.
     Lines beginning with '>' or ';' are treated as comments;
     sequences can be upper or lower case, and blanks, tabs and
     unrecognizable characters are ignored.  ssearch expects
     sequences to use the single letter amino acid codes, see
     protcodes(1).  Library files for ssearch should have the
     form shown below.
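
     The file format and the DNA/protein decision described above
     can be sketched in Python.  This is only an illustration, not
     code from ssearch: the function names are hypothetical, the
     handling of ';' lines is an assumption, and "unrecognizable
     characters" is simplified to "anything that is not a letter".

```python
def parse_library(text):
    """Split FASTA-style text into (comment, sequence) pairs."""
    entries = []
    comment, residues = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            # A '>' line starts a new entry; flush the previous one.
            if comment is not None:
                entries.append((comment, "".join(residues)))
            comment, residues = line[1:].strip(), []
        elif line.startswith(";"):
            continue  # assumption: ';' lines are extra comments
        elif comment is not None:
            # Keep letters only (upper-cased); blanks, tabs and other
            # characters are dropped.
            residues.extend(ch.upper() for ch in line if ch.isalpha())
    if comment is not None:
        entries.append((comment, "".join(residues)))
    return entries


def looks_like_dna(sequence, threshold=0.85):
    """True when more than `threshold` of the residues are A, C, G or T."""
    if not sequence:
        return False
    acgt = sum(sequence.count(base) for base in "ACGT")
    return acgt / len(sequence) > threshold


library = """>LCBO bovine preprolactin
WILLLSQ
>TEST1 hypothetical DNA entry
acgt acgt
acgtn
"""
entries = parse_library(library)
```

     With this input, the first entry is classified as protein and
     the second (12 of 13 residues are A, C, G or T) as DNA.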

OPTIONS
     ssearch can be directed to change the scoring matrix, search
     parameters, output format, and default search directories by
     entering options on the command line (preceded by  a  `-').
     All  of  the  options  should precede the file name and ktup
     arguments.  Alternately, these options can  be  changed  by
     setting  environment variables.  The options and environment
     variables are:

     -a   (SHOWALL) Modifies the display of the two sequences  in
          alignments.  Normally,  both  sequences  are shown only
          where they overlap (SHOWALL=0); if -a is given  or  the
          environment  variable  SHOWALL = 1, both sequences are
          shown in their entirety.



SunOS 5.5.1            Last change: local                       1

     -b # The number of similarity scores to be shown when the -Q
          option is used.  This value is usually calculated based
          on the actual scores.

     -d # The  number  of  alignments  to  be  shown.   Normally,
          ssearch shows the same number of alignments as similar-
          ity scores.  By using ssearch -Q  -b  200  -d  50,  one
          would  see the top scoring 200 sequences and alignments
          for the 50 best scores.

     -E # The expectation value threshold for displaying
          similarity scores and sequence alignments.  ssearch -Q
          -E 2.0 would show all library sequences with scores
          expected to occur no more than 2 times by chance in a
          search of the library.

     -f # Penalty  for  the  first  residue  in  a  gap  (-12  by
          default).

     -g # Penalty  for  additional  residues  in  a  gap  (-2  by
          default).
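
     To show how the -f and -g penalties enter a Smith-Waterman
     score, here is a minimal Python sketch of local alignment
     scoring with that gap model, where a gap of k residues costs
     f + (k - 1) * g.  It is not ssearch's implementation: the
     function name is hypothetical, and the match/mismatch values
     (+5/-4) are placeholders standing in for a real scoring
     matrix such as BLOSUM50.

```python
def smith_waterman_score(a, b, match=5, mismatch=-4,
                         gap_first=-12, gap_next=-2):
    """Best local alignment score of a and b with affine gap penalties."""
    neg = float("-inf")
    n, m = len(a), len(b)
    # H: best score of a local alignment ending at (i, j), floored at 0.
    # E: best score ending with a gap that consumes b[j-1].
    # F: best score ending with a gap that consumes a[i-1].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap (first residue) or extend one.
            E[i][j] = max(H[i][j - 1] + gap_first, E[i][j - 1] + gap_next)
            F[i][j] = max(H[i - 1][j] + gap_first, F[i - 1][j] + gap_next)
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


one_gap = smith_waterman_score("ACDEFGHIKL", "ACDEFHIKL")
two_gap = smith_waterman_score("ACDEFGGHIKL", "ACDEFHIKL")
```

     Here one_gap is 33 (nine matches at +5, one gap residue at
     -12) and two_gap is 31 (the second gap residue costs a
     further -2).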

     -h   Do not display histogram of similarity scores.

     -l file
          (FASTLIBS) The name of the library menu file.  Normally
          this  will  be  determined  by the environment variable
          FASTLIBS.  However, a library menu  file  can  also  be
          specified with -l.

     -L   display more information about the library sequence  in
          the alignment.

     -m # (MARKX) =0,1,2,3.  Alternate  display  of  matches  and
          mismatches in alignments.  MARKX=0 uses ":", ".", and
          " " for identities, conservative replacements, and
          non-conservative replacements, respectively.  MARKX=1
          uses " ", "x", and "X".  MARKX=2 does not show the second
          sequence, but uses the second alignment line to display
          matches  with  a  "."   for  identity,  or   with   the
          mismatched  residue  for mismatches.  MARKX=2 is useful
          for  aligning  large  numbers  of  similar   sequences.
          MARKX=3 writes out a file of library sequences in FASTA
          format.   MARKX=3  should  always  be  used  with   the
          "SHOWALL"  (-a)  option,  but  this does not completely
          ensure  that  all  of  the  sequences  output  will  be
          aligned.
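
     The MARKX=0, 1 and 2 conventions can be illustrated with a
     short Python sketch that builds the line shown under the
     query sequence.  This is an assumption-laden illustration:
     the function name is hypothetical, and the CONSERVATIVE set
     below is a deliberately tiny, made-up list of residue pairs,
     since how ssearch decides which replacements are conservative
     is not described here.

```python
# Hypothetical, incomplete set of "conservative" residue pairs.
CONSERVATIVE = {
    frozenset("IL"), frozenset("IV"), frozenset("LV"),
    frozenset("KR"), frozenset("DE"), frozenset("ST"),
}

# (identity, conservative, non-conservative) symbols per MARKX value.
SYMBOLS = {0: (":", ".", " "), 1: (" ", "x", "X")}


def match_line(seq1, seq2, markx=0):
    """Build the second alignment line for two equal-length aligned strings."""
    out = []
    for a, b in zip(seq1, seq2):
        if markx == 2:
            # MARKX=2: "." for identity, otherwise the mismatched residue.
            out.append("." if a == b else b)
            continue
        ident, cons, noncons = SYMBOLS[markx]
        if a == b:
            out.append(ident)
        elif frozenset((a, b)) in CONSERVATIVE:
            out.append(cons)
        else:
            out.append(noncons)
    return "".join(out)


line0 = match_line("WILLSQ", "WIVLSA", markx=0)
line1 = match_line("WILLSQ", "WIVLSA", markx=1)
line2 = match_line("WILLSQ", "WIVLSA", markx=2)
```

     For this pair, line0 is "::.:: ", line1 is "  x  X", and
     line2 is "..V..A".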

     -O filename
          Sends copy of results to "filename".

     -Q   Quiet option.  This allows ssearch to search a database
          and report the results without asking any questions.
          ssearch -Q file library > output can be put in the
          background or run at a later time with the unix 'at'
          command.  The number of similarity scores and alignments
          displayed with the -Q option can be modified with the
          -b (scores) and -d (alignments) options.

     -r   STATFILE Causes ssearch to write out the sequence iden-
          tifier, superfamily number (if available), and similar-
          ity scores  to  STATFILE  for  every  sequence  in  the
          library.  These results are not sorted.

     -s str
          (SMATRIX) the filename of an alternative scoring matrix
          file.   For  protein  sequences,  BLOSUM50  is  used by
          default; PAM250 can  be  used  with  the  command  line
          option -s 250.

     -w # (LINLEN) output line length  for  sequence  alignments.
          (normally 60, can be set up to 200).

     -z   Do not do statistical significance calculation.
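
     For scripted use, the options above can be assembled into a
     command line programmatically.  The Python sketch below is
     one possible wrapper, not part of the ssearch distribution:
     the function and parameter names are hypothetical, it uses
     only flags documented on this page, and it places all options
     before the file names as required.

```python
import subprocess


def build_ssearch_command(query, library, scores=None, alignments=None,
                          evalue=None, matrix=None, line_length=None,
                          show_all=False, no_histogram=False):
    """Return an argv list for a quiet (-Q) ssearch run."""
    cmd = ["ssearch", "-Q"]
    if show_all:
        cmd.append("-a")
    if no_histogram:
        cmd.append("-h")
    if scores is not None:
        cmd += ["-b", str(scores)]
    if alignments is not None:
        cmd += ["-d", str(alignments)]
    if evalue is not None:
        cmd += ["-E", str(evalue)]
    if matrix is not None:
        cmd += ["-s", str(matrix)]
    if line_length is not None:
        cmd += ["-w", str(line_length)]
    # Options must precede the query and library file names.
    cmd += [query, library]
    return cmd


def run_to_file(cmd, output_path):
    """Run the command with its standard output redirected to a file."""
    with open(output_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)


cmd = build_ssearch_command("musplfm.aa", "lcbo.aa",
                            scores=200, alignments=50)
```

     Calling run_to_file(cmd, "output") would then correspond to
     ssearch -Q -b 200 -d 50 musplfm.aa lcbo.aa > output, assuming
     ssearch is installed and on the search path.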

EXAMPLES
     (1)  ssearch musplfm.aa $AABANK

     Compare the amino acid sequence in the file musplfm.aa  with
     the   complete   PIR  protein  sequence  library.   This  is
     extremely slow and should almost never be done.  ssearch  is
     designed to search very small libraries of sequences.

          >LCBO bovine preprolactin
          WILLLSQ ...
          >LCHU human ...
          ...


     (2)  ssearch -a -w 80 musplfm.aa lcbo.aa

     Compare the amino acid sequence in the file musplfm.aa  with
     the sequences in the file lcbo.aa using ktup = 1.  Show both
     sequences in their entirety, with 80 residues on each output
     line.

     (3)  ssearch

     Run the ssearch program in interactive  mode.   The  program
     will  prompt  for the file name for the query sequence, list
     alternative libraries to be searched (if FASTLIBS is  set),
     and prompt for the ktup.




     You can use your own sequence files  for  ssearch,  just  be
     certain  to  put  a '>' and comment as the first line before
     the sequence.
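
     A sequence file of this form can also be produced from a
     script.  The following Python sketch formats one entry with
     a '>' comment line followed by the sequence; the function
     name is hypothetical, and wrapping the sequence at 60
     residues per line is an assumption chosen for readability,
     not a requirement stated here.

```python
def format_entry(identifier, comment, sequence, width=60):
    """Return one library entry: a '>' comment line, then the sequence."""
    header = f">{identifier} {comment}".rstrip()
    # Break the sequence into fixed-width lines.
    body = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + body) + "\n"


entry = format_entry("LCBO", "bovine preprolactin", "WILLLSQ")
```

     Writing entry to a file, for example with
     open("lcbo.aa", "w").write(entry), gives a file that starts
     with the required '>' line.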

SEE ALSO
     rss(1), align(1), fasta(1), rdf2(1), protcodes(5),
     dnacodes(5)

AUTHOR
     Bill Pearson
     wrp@virginia.EDU

Arne Elofsson Department of Biochemistry, Arrheniuslaboratoriet Stockholms Universitet 10691 Stockholm, Sweden	Tel: +46-(0)8/161553 Fax: +46-(0)8/153679 Hem: +46-(0)8/6413158 Email: arne@rune.biokemi.su.se WWW: /~arne/