Examples -- The JPEG versions don't look nearly as good as the PostScript or TIFF files: color.ps, black-and-white.ps

Getting Started

Of course, this page is no substitute for the actual Alscript Manual.

Getting the alignment file

This is what I did.
Start with FASTA-format files. An example:

$ cat seq1.fa seq2.fa seq3.fa > 3seqs.fa
$ clustalw -align -outorder=input -infile=3seqs.fa -output=gcg
$ msf2blc -q < 3seqs.msf > 3seqs.blc
I got clustalw1.82.UNIX.tar.gz from the IUBio Archive.

Replace periods in the .blc file with spaces. This is necessary because you can tell Alscript to use the sequence numbering of one particular protein, and it seems to count periods as residues but spaces as gaps.
In vi, go to the start of the sequence data, then type

    :.,$ s/\./ /g
Or
sed '/iteration/,/endoffile/ s/\./ /g' 3seqs.blc > final.blc
I use the script fa2blc to do this automatically.
Here is the Block file I used for this page.
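
The period-to-space step can be tried on a throwaway file first. The sketch below builds a tiny made-up block file (demo.blc and its contents are assumptions for illustration, not real msf2blc output) and applies the same sed command. Because the address range starts at the line containing "iteration", periods in the identifier lines above it are left alone; "endoffile" never matches, so the range runs to the end of the file.

```shell
# Build a tiny, hypothetical block file (layout assumed for illustration only).
cat > demo.blc << 'EOF'
>seq1.v1
>seq2
* iteration 1
MM
.K
A.
*
EOF

# Same command as above: only lines from "iteration" onward are edited.
sed '/iteration/,/endoffile/ s/\./ /g' demo.blc > demo_final.blc

cat demo_final.blc
```

The identifier ">seq1.v1" keeps its period, while the gap characters in the alignment columns become spaces.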

Getting the secondary structure input

The following script either uses molauto (part of the Molscript package) to generate the secondary structure portion of the input file OR takes the information from an existing Molscript input file. I call the script "addstruct". Run it just by typing "addstruct".

The number you enter at the "Number to add?" prompt is added to each residue number, which lets you slide the secondary structure elements along the alignment if necessary. If you want to adapt this script to use output from a different program and can't figure out how, contact me.

This creates a file that gets incorporated into the Alscript input.
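
As a rough illustration of what the offset does, the sketch below shifts two secondary structure elements by 10 residues. The input lines imitate molauto output for chain A; their exact form, the file name demo.ms, and the numbers are assumptions made for the example.

```shell
# Hypothetical molauto-style lines for chain A (format assumed).
printf '%s\n' 'helix from A250 to A260;' 'strand from A112 to A119;' > demo.ms

chain=A   # chain identifier to strip from the residue names
add=10    # offset added to every residue number

# Strip the chain letter and the trailing semicolon, then shift both ends.
shifted=$(sed -e "s/$chain//g" -e 's/;//' demo.ms |
  awk -v add="$add" '{print $1, $3 + add, $5 + add}')

echo "$shifted"
```

Note that the chain letter is removed everywhere on the line, so a lowercase chain identifier such as "a" would also damage the word "strand" and the element would be lost in the later awk step.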

 
[dcoop Temp2]->cat addstruct
#!/bin/sh
#usage addstruct 
echo -n "What type of input?  [pdb] / ms  "; read type
  if [ -z $type ] ; then type=pdb; fi
echo "Please pick an input file.  These are available:"
 echo ; ls *.$type 2> /dev/null
 echo ; echo -n "Which one?  " ;read file
echo -n "What is the alscript name?  These are some possibilities:"
 echo ; ls *.blc 2> /dev/null | awk -F . '{printf $1"  "}'
 echo ; echo -n "What name?  " ;read name
echo -n "Number to add? [ 0 ] "; read add
 if [ -z $add ] ; then add=0; fi
echo -n "Chain? [A] "; read chain
 if [ -z $chain ] ; then chain="A"; fi
echo "s/$chain//g" >tEmP

echo "It's a little slower than you would think."
#Is type PDB?
if [ $type = "pdb" ] ; then
#    IF type is PDB, Check to see if SS info is present
     if ! [ `grep -l HELIX $file` ] 
#    if SS info not present run molauto -- should change to DSSP
     then 
  molauto $file  | grep $chain |egrep -e helix -e strand |sed -f tEmP -e 's/;//' >tEmP2
#    if present (assumes dssp2pdb version 0.02 output) get info 
#       puts in molscript format just to be compatible with rest of script
     else
     grep $chain $file | egrep -e HELIX -e SHEET | \
      awk '$1=="HELIX" {print ("helix from "$6" to "$9)}
           $1=="SHEET" {print ("strand from "$7" to "$10)}' > tEmP2     
     fi
#If Molscript input, just extract
  else cat $file | grep $chain |grep -v "!" |egrep -e helix -e strand |sed -f tEmP -e 's/;//' >tEmP2
fi
echo "     #Secondary Structure" > $name.ss
awk -v add=$add '
     /strand/ {print (" col#COLOUR_TEXT_REGION ",$3+add,"$ss",$5+add,"$ss 4")} 
      /strand/ {print ("     STRAND ",$3+add,"$ss",$5+add)}
     /helix/ {print (" col#COLOUR_TEXT_REGION ",$3+add,"$ss",$5+add,"$ss 5")} 
      /helix/ {print ("     HELIX  ",$3+add,"$ss",$5+add)} 
' tEmP2 >>$name.ss

echo "     #Secondary Structure Labels" >>$name.ss
echo '     FONT_REGION 1 $ssl $loa $ssl 3'>>$name.ss
awk -v add=$add '
      /strand/ {ns=ns+1}
      /strand/ {print ("     TEXT",int(($3+$5)/2+add),"$ssl \"b"ns"\"")}
      /helix/ {nh=nh+1}
      /helix/ {print ("     TEXT",int(($3+$5)/2+add),"$ssl \"a"nh"\"")}
' tEmP2>>$name.ss
 
rm -f tEmP tEmP2
echo "Thanks.  Drive through."

The file created looks like this:

     #Secondary Structure
 col#COLOUR_TEXT_REGION  112 $ss 119 $ss 4
     STRAND  112 $ss 119
 col#COLOUR_TEXT_REGION  250 $ss 260 $ss 5
     HELIX   250 $ss 260
     #Secondary Structure Labels
     FONT_REGION 1 $ssl $loa $ssl 3
     TEXT 115 $ssl "b1"
     TEXT 250 $ssl "a5"
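
The $ss, $ssl and $loa entries in this file are literal placeholders that get filled in later, once the line numbers are known. A minimal sketch of that substitution follows; the values 12, 13 and 161 and the file name demo.ss are invented for the example. The longer name $ssl has to be replaced before $ss, otherwise "$ssl" would turn into "12l".

```shell
# Hypothetical fragment of a .ss file with literal placeholders.
cat > demo.ss << 'EOF'
     STRAND  112 $ss 119
     FONT_REGION 1 $ssl $loa $ssl 3
     TEXT 115 $ssl "b1"
EOF

ss=12; ssl=13; loa=161   # made-up line numbers and alignment length

# Replace $ssl first so that the shorter pattern $ss cannot clip it.
sed -e 's/\$ssl/'"$ssl"'/g' \
    -e 's/\$ss/'"$ss"'/g' \
    -e 's/\$loa/'"$loa"'/g' demo.ss > demo_filled.ss

cat demo_filled.ss
```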

The Meat of the script

The following script was designed to create an image that looks like one of the examples above. It draws the sequence alignment with several colors of conservation, secondary structures with labels, and three label blocks. The real utility of this script is that if you decide to move things around (i.e., move the secondary structure from below to above the alignment, or add a blank line between two elements), you only have to change a few lines at the top of the script. The traditional way would require you to change dozens of numbers scattered throughout the input file, which turns visually simple changes into an editing nightmare.
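
To make the idea concrete, here is a small sketch of the line arithmetic involved. The block file demo4.blc and its contents are made up, and the variable names simply mirror the ones in the script further down; treat it as an illustration of the approach rather than a replacement for that script.

```shell
# Hypothetical four-sequence block file with a three-column alignment.
cat > demo4.blc << 'EOF'
>seq1
>seq2
>seq3
>seq4
* iteration 1
MMMM
KKRK
A AA
*
EOF

nb=4                                   # lines reserved above the sequences
ns=$(grep -c '>' demo4.blc)            # number of sequences
fs=$((nb + 1))                         # line of the first sequence
ls=$((nb + ns))                        # line of the last sequence
ss=$((nb + ns + 2))                    # line for secondary structure symbols
ssl=$((nb + ns + 3))                   # line for secondary structure labels
ab=$(grep -n '\*' demo4.blc | head -1 | cut -d: -f1)   # first "*" line
ae=$(grep -n '\*' demo4.blc | tail -1 | cut -d: -f1)   # last "*" line
loa=$((ae - ab - 1))                   # alignment length in columns
nc=$((ns * 3 / 4))                     # sequences needed to count as conserved

echo "fs=$fs ls=$ls ss=$ss ssl=$ssl loa=$loa nc=$nc"
```

Changing nb alone moves every derived line number with it, which is what keeps layout changes down to a one-line edit.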

At this point the script is not entirely automatic and does require some manual editing. Generally these are the start and stop points of various labels, boxes, and groupings (things you would need to figure out anyway). In the beginning, it will probably help if you uncomment the NUMBER_SEQS and DO_TICKS lines and comment out the NO_NUMBERS line. This will give you output that looks like this:

This script

[dcoop Temp2]->cat doals
#!/bin/sh
#Use this section to describe what goes on what line
name=6seqs           # name.blc -> name.als -> name.ps
nb=4                 # number of lines before sequences
na=3                 # number of lines after sequences
b1="113  1  273  1"  # Block 1 (start - line# - end - line#)
b2="113  3  192  3"  # Block 2
b3="197  3  273  3"  # Block 3
color=yes             # "yes" for color 

# Necessary Calculations -- no changes should be necessary
#  EXCEPT fraction in nc
ns=`grep -c ">" $name.blc`        # number of seqs
fs=`expr $nb + 1`                 # line of first sequence
ls=`expr $nb + $ns`               # line of last sequence
ss=`expr $nb + $ns + 2`           # line of secondary structure
ssl=`expr $nb + $ns + 3`          # line of secondary structure labels
b11=`echo $b1 |awk '{print $2}'`  # The seq# of block 1
b22=`echo $b2 |awk '{print $2}'`  # The seq# of block 2
b33=`echo $b3 |awk '{print $2}'`  # The seq# of block 3
ab=`grep -n "*" $name.blc |head -1 |awk -F ":" '{print $1}'`
ae=`grep -n "*" $name.blc |tail -1 |awk -F ":" '{print $1}'`
loa=`expr  $ae - $ab - 1`         #Length of alignment
#nc defines the percentage of conservation
nc=`expr $ns \* 3 / 4`


#Generate Alscript Input
cat << end-top > temp.als
     #Comments in ALscript command files start with a #
     #Commands are free format - separated by blank, tab or comma characters
     #But no blank lines.  Blank lines must have a comment character #
     #
     BLOCK_FILE 	$name.blc
     OUTPUT_FILE 	$name.ps
     PORTRAIT
     POINTSIZE 6
     DEFINE_FONT 0 Helvetica DEFAULT
     DEFINE_FONT 1 Helvetica-Bold DEFAULT
     DEFINE_FONT 2 Helvetica REL .5
     DEFINE_FONT 3 Symbol REL 1.2
     IDENT_WIDTH 6
     ADD_SEQ 0 $nb
     ADD_SEQ $ns $na
  bw#DEFINE_COLOUR 1 .2 .2 .2
  bw#DEFINE_COLOUR 2 .4 .4 .4
  bw#DEFINE_COLOUR 3 .8 .8 .8
 col#DEFINE_COLOUR 1 1 0 0
 col#DEFINE_COLOUR 2 0 0 1
 col#DEFINE_COLOUR 3 0 1 0
 col#DEFINE_COLOUR 4 0 1 1
 col#DEFINE_COLOUR 5 1 0 1
 col#DEFINE_COLOUR 6 1 .4 .4
 col#DEFINE_COLOUR 7 .4 .4 1
 col#DEFINE_COLOUR 8 .4  1 .4
     #NUMBER_SEQS
     #DO_TICKS
     NO_NUMBERS
     SETUP		#Tell the program to get on with the formatting.
     #
     RELATIVE_TO $fs  1 #assumes your sequence on top
     #
     #Block 1
     FONT_REGION $b1 1
     COLOUR_TEXT_REGION $b1 99
  bw#SHADE_REGION $b1 .2
 col#COLOUR_REGION $b1 6
     TEXT 114 $b11 "PDZ TANDEM"
     #Block2
     FONT_REGION $b2 1
     COLOUR_TEXT_REGION $b2 99
  bw#SHADE_REGION $b2 .5 
 col#COLOUR_REGION $b2 7
     TEXT 116 $b22 "PDZ-1"
     TEXT 150 $b22 "PDZ-1"
     #BLOCK3
     FONT_REGION $b3 1
  bw#SHADE_REGION $b3 .8 
 col#COLOUR_REGION $b3 8
     TEXT 210 $b33 "PDZ-2"
     TEXT 255 $b33 "PDZ-2"
     #
end-top

sed -e s/\$ssl/$ssl/g -e s/\$ss/$ss/g -e s/\$loa/$loa/g $name.ss >> temp.als
cat << end-bottom >> temp.als
     #
     #  Mask for conservative substitution
     #  Next line assumes your sequence is on top of the alignment
     RELATIVE_TO 0
     SUB_CHARS 1 $fs $loa $ls SPACE "-"
     calcons  1 $fs $loa $ls
     mask SETUP
     mask ILLEGAL "-"
     #Last number on next line is % conserved for conservative sub
     mask CONSERVATION 1 $fs $loa $ls 4
     mask SCOL 1 $fs $loa $ls 2
     mask RESET
     #  Mask for conserved
     mask SETUP
     mask ILLEGAL "-"
     #Last number on next line is #/total seqs for conserved
     mask  ID 1 $fs $loa $ls $nc
     mask SCOL 1 $fs $loa $ls 3
     mask RESET
     #  Mask for total identity
     mask CONSERVATION 1 $fs $loa $ls 10
     mask SCOL 1 $fs $loa $ls 1
     mask CCOL 1 $fs $loa $ls 99
     mask FONT 1 $fs $loa $ls 1
     mask RESET
     # End of Alscript input file
end-bottom

if [ "$color" = "yes" ] ; then grep -v "bw#" temp.als | sed 's/col#/    /g' > $name.als
 else grep -v "col#" temp.als | sed 's/bw#/   /g' > $name.als
fi
alscript $name.als > $name.ps   && kghostview $name.ps &

Here is the Alscript input file this created for the color figure above.
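
The last step of the script selects between the color and black-and-white variants by treating the "col#" and "bw#" prefixes as switches. The following sketch shows that filter on a three-line fragment; demo_temp.als and demo.als are hypothetical file names, and the fragment is cut down from the listing above.

```shell
# Fragment with one shared line, one black-and-white line, one color line.
cat > demo_temp.als << 'EOF'
     POINTSIZE 6
  bw#SHADE_REGION 113 1 273 1 .2
 col#COLOUR_REGION 113 1 273 1 6
EOF

color=yes   # anything other than "yes" selects the black-and-white lines

if [ "$color" = "yes" ]; then
  # Drop the bw# lines and blank out the col# prefix to activate those lines.
  grep -v "bw#" demo_temp.als | sed 's/col#/    /g' > demo.als
else
  # Drop the col# lines and blank out the bw# prefix instead.
  grep -v "col#" demo_temp.als | sed 's/bw#/   /g' > demo.als
fi

cat demo.als
```

Replacing each prefix with the same number of spaces keeps the commands aligned with the unprefixed lines.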

Notes

Final note (bug in Alscript?)

For some reason, the text in some parts of the identity box occasionally remained black. To fix this, open the file in vi and type
   :1,$ s/F1/F1 C99
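
The same edit can be made non-interactively with sed, much like the period replacement earlier on this page. This is only a sketch: it assumes the file being edited is the PostScript output, the two input lines are placeholders rather than real Alscript output, and demo.ps is a hypothetical file name. Like the vi command, it changes the first F1 on each line.

```shell
# Placeholder lines standing in for the output file (not real Alscript output).
printf '%s\n' 'F1 (A) show' 'F2 (B) show' > demo.ps

# Append C99 after the first F1 on every line, writing a corrected copy.
sed 's/F1/F1 C99/' demo.ps > demo_fixed.ps

cat demo_fixed.ps
```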