Browsing protein families via the "Rich Family Description" format

Bioinformatics Vol. 15, no 12 (1999) pages 1020-1027


Florence CORPET (1), Jerome GOUZY (2) and Daniel KAHN (2,*)

(1) Laboratoire de Génétique Cellulaire,
(2) Laboratoire de Biologie Moléculaire des Relations Plantes-Microorganismes,

I.N.R.A./C.N.R.S., BP27, F-31326 Castanet-Tolosan Cedex, France

* to whom correspondence should be addressed.

Abstract


Motivation: Multiple alignments of protein sequences are at the basis of structural and functional analysis of protein families. It is however difficult even for an expert biologist to comprehend an alignment of more than 50 to 100 homologous sequences.
Results: This paper presents a browser for the analysis of multiple alignments of large numbers of protein sequences. Phylogenetic trees and consensus sequences are computed and used to summarise the alignments; these data are stored in a structure called Rich Family Description. Summary alignments and trees are displayed in HTML pages and can be developed or reduced by the user. This browser is used to display the ProDom domain families on the Web. Its zooming facilities allow extracting information from alignments of more than one thousand homologous sequences.
Availability: Programs are available free of charge on request from the authors. The ProDom database can be consulted at http://www.toulouse.inra.fr/prodom.html.

Contact: E-mail address: proquest@toulouse.inra.fr

Installation

 The default configuration of DisplayFam.pl expects that:

1. the program is installed into the following directory tree
/www/                                 WWW Server root
/www/prodom/DisplayFam/               DisplayFam main directory
/www/prodom/DisplayFam/tmp/           DisplayFam temporary directory (the web server daemon needs read-write access to it)
/www/prodom/DisplayFam/images/        images directory

	% pwd
	/www/prodom/DisplayFam

	% ls -alR
	.:
	total 654
	drwxrwxr-x   4 gouzy    nobody       512 May 11 18:52 .
	drwxr-xr-x  13 gouzy    inf          512 Mar 31 22:08 ..
	-rw-r--r--   1 gouzy    prodom      3087 May 11 18:27 DisplayFam.html
	-rwxr-xr-x   1 gouzy    prodom     54113 May 11 17:11 DisplayFam.pl
	-rw-rw-r--   1 gouzy    prodom      9194 May 11 18:38 INSTALL.html
	-rw-rw-r--   1 gouzy    prodom      8791 May 11 18:39 README
	-rwxrwxr-x   1 gouzy    prodom     19336 Mar 31 22:04 TriangleTree
	-rwxrwxr-x   1 gouzy    prodom     12504 Mar 31 22:04 bionj
	-rwxr-xr-x   1 gouzy    prodom     15109 May  9 15:46 cgi-lib2.pl
	-rwxrwxr-x   1 gouzy    prodom     20236 Mar 31 22:04 consens
	-rwxrwxr-x   1 gouzy    prodom     77192 Mar 31 22:04 drawtree
	drwxrwxr-x   2 gouzy    prodom       512 May  9 16:23 images
	-rwxrwxr-x   1 gouzy    prodom     16176 Mar 31 22:04 protdist
	-rwxrwxr-x   1 gouzy    prodom     15776 Mar 31 22:04 reroot
	-rw-r--r--   1 gouzy    prodom     56398 Mar 31 22:05 species
	-rw-r--r--   1 gouzy    prodom       187 Mar 31 22:05 speciesCG
	-rwxrwxr-x   1 gouzy    prodom     15568 Mar 31 22:04 subtree
	drwxrwxrwx   2 gouzy    prodom       512 May 11 18:54 tmp   <-- the web server daemon needs read-write access to this directory

	./images:
	total 34
	drwxrwxr-x   2 gouzy    prodom       512 May  9 16:23 .
	drwxrwxr-x   4 gouzy    nobody       512 May 11 18:52 ..
	-rwxrwxr-x   1 gouzy    prodom      2226 May  9 15:48 alignement.gif
	-rw-rw-r--   1 gouzy    prodom       820 May  9 16:23 blacksquare.gif
	-rwxrwxr-x   1 gouzy    prodom       820 May  9 15:48 bluesquare.gif
	-rwxrwxr-x   1 gouzy    prodom      1149 May  9 15:48 domains.gif
	-rwxrwxr-x   1 gouzy    prodom       820 May  9 15:48 greensquare.gif
	-rwxrwxr-x   1 gouzy    prodom       820 May  9 15:48 greysquare.gif
	-rwxrwxr-x   1 gouzy    prodom        53 May  9 15:47 plus.gif
	-rwxrwxr-x   1 gouzy    prodom       267 May  9 15:48 redball.gif
	-rwxrwxr-x   1 gouzy    prodom       820 May  9 15:48 redsquare.gif
	-rwxrwxr-x   1 gouzy    prodom      1003 May  9 15:48 tree.gif
	-rwxrwxr-x   1 gouzy    prodom       878 May  9 15:49 triangle.gif
	-rwxrwxr-x   1 gouzy    prodom       820 May  9 15:48 yellowsquare.gif

	./tmp:
	total 4
	drwxrwxrwx   2 gouzy    prodom       512 May 11 18:54 .
	drwxrwxrwx   4 gouzy    nobody       512 May 11 18:52 ..

2. the web server configuration file (srm.conf for Apache) contains
DocumentRoot /www
AddHandler imap-file cf
ScriptAlias /DisplayFam/cgi-bin/ /www/prodom/DisplayFam/

3. the perl path is /usr/sbin/perl
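
The three expectations above can be checked before the first run. The following is a minimal sketch, not part of the DisplayFam package: the function name check_displayfam_layout is hypothetical, and the write test runs as the current user, which is not necessarily the user the web server daemon runs as.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DisplayFam): verify the default layout.
# Usage: check_displayfam_layout DISPLAYFAM_DIR [PERL_PATH]
check_displayfam_layout() {
    df_dir=$1
    df_perl=${2:-/usr/sbin/perl}   # default perl path expected by DisplayFam.pl
    df_rc=0

    # The CGI script itself must be present and executable.
    if [ ! -x "$df_dir/DisplayFam.pl" ]; then
        echo "missing or not executable: $df_dir/DisplayFam.pl" >&2
        df_rc=1
    fi
    # The images directory holds the GIF files used in the HTML pages.
    if [ ! -d "$df_dir/images" ]; then
        echo "missing directory: $df_dir/images" >&2
        df_rc=1
    fi
    # tmp must exist and be writable (tested here as the current user only).
    if [ ! -d "$df_dir/tmp" ] || [ ! -w "$df_dir/tmp" ]; then
        echo "missing or not writable: $df_dir/tmp" >&2
        df_rc=1
    fi
    # The interpreter named on the first line of DisplayFam.pl must exist.
    if [ ! -x "$df_perl" ]; then
        echo "perl not found at: $df_perl" >&2
        df_rc=1
    fi
    return "$df_rc"
}

# Example (assumes the default tree described above):
# check_displayfam_layout /www/prodom/DisplayFam /usr/sbin/perl
```

The function returns 0 when every check passes and 1 otherwise, printing one message per problem on standard error.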


If your configuration doesn't match and you want to install the programs under the /Your/Web/Server/RootDocumentDir tree:
1. go to /Your/Web/Server/RootDocumentDir
% cd /Your/Web/Server/RootDocumentDir

2. unpack the package
% gzip -cd DisplayFam.tar.gz | tar xvf -

3. give the web server daemon read-write access to the tmp directory
% chmod 777 prodom/DisplayFam/tmp

4. edit DisplayFam.pl and modify the first line to specify the correct path to the perl program:
#!/Your/perl/path/perl -w

5. go to the InitFlo subroutine and modify $ROOT
$ROOT="/Your/Web/Server/RootDocumentDir/"; <--- be careful, don't forget the trailing /

6. edit the web server configuration file (srm.conf) according to your directory tree
AddHandler imap-file cf
ScriptAlias /DisplayFam/cgi-bin/ /Your/Web/Server/RootDocumentDir/prodom/DisplayFam/
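
The two edits to DisplayFam.pl (perl path and $ROOT) can also be scripted. The sketch below is an illustration only: the function name configure_displayfam is hypothetical, and it assumes that the $ROOT assignment sits on a single line that begins with optional whitespace followed by $ROOT, as in the example above. Check the result by hand, since the actual layout of the InitFlo subroutine may differ.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DisplayFam): set the perl path and $ROOT.
# Usage: configure_displayfam SCRIPT PERL_PATH ROOT_DIR
# Assumption: paths contain no '|' or '&', which are special to sed here.
configure_displayfam() {
    df_script=$1
    df_perl=$2
    df_root=$3

    # $ROOT must end with a slash; add one if it is missing.
    case $df_root in
        */) ;;
        *)  df_root="$df_root/" ;;
    esac

    # Keep a backup, then rewrite the original file in place so that its
    # permissions (executable bit) are preserved.
    cp -p "$df_script" "$df_script.orig" || return 1
    sed -e '1s|^#!.*|#!'"$df_perl"' -w|' \
        -e 's|^\([[:space:]]*\)[$]ROOT[[:space:]]*=.*|\1$ROOT="'"$df_root"'";|' \
        "$df_script.orig" > "$df_script"
}

# Example, following the manual steps above (paths are placeholders):
# cd /Your/Web/Server/RootDocumentDir
# gzip -cd DisplayFam.tar.gz | tar xvf -
# chmod 777 prodom/DisplayFam/tmp
# configure_displayfam prodom/DisplayFam/DisplayFam.pl /Your/perl/path/perl "$PWD"
```

The original script is kept as DisplayFam.pl.orig, so the change can be reverted by copying the backup over the edited file.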

Utilisation

The input for DisplayFam is a multiple alignment in MSF format.

Form mode

1. with your favorite browser, open the document http://your.machine/prodom/DisplayFam/DisplayFam.html
2. copy/paste or open an MSF document (see example)
3. submit the form

URL mode

You can also run DisplayFam on precomputed multiple alignments by adding links such as the following to HTML documents:

<A HREF="http://your.machine/DisplayFam.pl?id_fam=new&msf=/my/home/dir/example.msf&mindist=20&search=SMc&maxleaves=30">DisplayFam example</A>

msf:                      Path to the MSF file
mindist:numeric_value     Minimal distance between sequences (in PAM)
search:string             If possible, cluster IDs should contain the value string (e.g. human)
maxleaves:numeric_value   Maximal number of clusters
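
When many precomputed alignments have to be linked, the URL can be generated rather than typed. This is a minimal sketch under stated assumptions: the function names displayfam_url and displayfam_link are hypothetical, id_fam=new is simply copied from the example above (its meaning is not documented here), and no URL-encoding is done, so the values must not contain spaces, & or other reserved characters.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with DisplayFam): build URL-mode links.

# Usage: displayfam_url BASE_URL MSF_PATH MINDIST SEARCH MAXLEAVES
displayfam_url() {
    # Parameter names and order follow the example link in this document.
    printf '%s?id_fam=new&msf=%s&mindist=%s&search=%s&maxleaves=%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Usage: displayfam_link LABEL BASE_URL MSF_PATH MINDIST SEARCH MAXLEAVES
displayfam_link() {
    df_label=$1
    shift
    # Wrap the URL in an HTML anchor ready to paste into a page.
    printf '<A HREF="%s">%s</A>\n' "$(displayfam_url "$@")" "$df_label"
}

# Example:
# displayfam_link "DisplayFam example" http://your.machine/DisplayFam.pl \
#     /my/home/dir/example.msf 20 SMc 30
```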

References

  • Boutell, T. (1995) gd 1.2, a graphics library for fast GIF creation, Cold Spring Harbor Labs,
    http://www.boutell.com/gd/
  • Corpet, F. (1988) Multiple sequence alignment with hierarchical clustering, Nucleic Acids Res., 16, 10881-10890,
    http://www.toulouse.inra.fr/multalin.html
  • Corpet, F., Gouzy, J. and Kahn, D. (1999) Recent improvements of the ProDom database of protein domain families, Nucleic Acids Res., 27, 263-267,
    http://www.toulouse.inra.fr/prodom.html
  • Felsenstein, J. (1993) PHYLIP: Phylogeny Inference Package, Version 3.5, Univ. of Washington, Seattle,
    http://evolution.genetics.washington.edu/phylip.html
  • Gascuel, O. (1997) BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data, Mol. Biol. Evol., 14, 685-695.
  • Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) Improved sensitivity of profile searches through the use of sequence weights and gap excision, Comput. Appl. Biosci., 10, 19-29.
  • Guenoche and Grandcolas, personal communication.

    Note: This is not the complete list of references cited in the article. It contains only the references used to implement the DisplayFam package.

Warnings


1. The DisplayFam package does not clean up the tmp directory.
2. We use DisplayFam daily on Linux and Solaris, but not on IRIX.
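
Because the tmp directory is never cleaned, old files have to be removed by other means, for instance from cron. The sketch below is one possible approach, not part of the package: the function name clean_displayfam_tmp and the default age of one day are assumptions, and it deletes every regular file under the given directory that find considers older than the limit.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DisplayFam): purge old temporary files.
# Usage: clean_displayfam_tmp TMP_DIR [DAYS]
clean_displayfam_tmp() {
    df_tmp=$1
    df_days=${2:-1}    # assumed default: keep files for at least one day

    # Refuse to run if the directory does not exist.
    [ -d "$df_tmp" ] || return 1

    # Remove regular files that match find's "-mtime +DAYS" test,
    # i.e. files last modified more than DAYS full 24-hour periods ago.
    find "$df_tmp" -type f -mtime +"$df_days" -exec rm -f {} +
}

# Example crontab entry (runs the equivalent find command nightly at 03:00):
# 0 3 * * * find /www/prodom/DisplayFam/tmp -type f -mtime +1 -exec rm -f {} +
```

Choose a limit longer than the time users are expected to keep browsing a result, since the HTML pages may still refer to files in tmp.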


Last updated May 10, 2000