PROBCONS Version 1.08 User Manual

Written by Mahathi Mahabhashyam (mmahathi@cs.stanford.edu) and Chuong Do (chuongdo@cs.stanford.edu)
Last updated 8/31/2004

Overview

PROBCONS is a novel tool for generating multiple alignments of protein sequences. It combines probabilistic modeling with consistency-based alignment techniques. At the time of writing, PROBCONS achieved the highest accuracy of any alignment method on the benchmarks reported in the papers listed under References.

The PROBCONS algorithm is built on pairwise posterior probability matrices, P(x_i ~ y_j | x, y). Each entry gives the probability that letters x_i and y_j should be matched when aligning two sequences x and y. PROBCONS uses a simple probabilistic model that lets it compute these probabilities efficiently. It then applies the probabilistic consistency transformation to these matrices, which incorporates evidence from intermediate sequences. Finally, PROBCONS performs progressive alignment using a sum-of-pairs maximum expected accuracy objective function.

Algorithm Summary

Given a set of sequences to be aligned:
1. Compute posterior probability matrices for each pair of sequences.
2. Compute the expected accuracy of each pairwise alignment.
3. Apply the probabilistic consistency transformation to the posterior matrices.
4. Compute a guide tree using the expected accuracies.
5. Progressively align the sequences using the guide tree.

References

PROBCONS is discussed in the following papers:
- Do, C.B., Brudno, M., and Batzoglou, S. (2004) PROBCONS: Probabilistic Consistency-based Multiple Alignment of Amino Acid Sequences. To appear in ISMB.
- Do, C.B., Brudno, M., and Batzoglou, S. (2004) ProbCons: Probabilistic Consistency-based Multiple Alignment of Amino Acid Sequences. To appear in AAAI.

PROBCONS is public domain software; see the README for details.

Getting Started

To install and use PROBCONS:
1. Download the latest version of the PROBCONS source code from http://probcons.stanford.edu/download.html
2. Decompress the files:
   gunzip probcons_vX_XX.tar.gz
   tar xvf probcons_vX_XX.tar
3. This creates a subdirectory called probcons inside the current directory.
4. Change to the probcons directory and build the PROBCONS executable:
   cd probcons
   make
5. Align the sequences in the file input and send the result to the file output:
   probcons input > output

That's it!

Input/Output Format

Any file used as input for PROBCONS must be a plain text file. The program will not work with .doc files from MS Word or other formatted word-processing files. Most word processors can save a plain text file through "Save As" in the File menu.

MFA format for input/output

PROBCONS reads files in MFA format and writes its output in MFA format. The format is as follows:
- An MFA file contains multiple sequences.
- Each sequence begins with a single-line description, followed by one or more lines of sequence data. A greater-than (>) symbol in the first column marks the description line and distinguishes it from the sequence data.

ClustalW ALN format for output

If the -clustalw option is specified, PROBCONS writes a ClustalW output file instead of the regular MFA. The ClustalW format consists of a single header line, followed by sequence data in blocks of 50 alignment positions. Each block consists of:
- one line for each sequence in the alignment, giving the name of the sequence followed by 50 characters of the alignment;
- one annotation line marking fully conserved, strongly conserved, or weakly conserved columns.

Example Usage

Running PROBCONS on an input file containing the five sequences plas_horvu, plas_chlre, plas_anava, plas_proho and azup_achcy generates an MFA-format multiple alignment of those sequences.

[Input and output sequence listings not recoverable from the source text.]

These sequences are 1plc_ref1 from the BAliBASE collection (Thompson, J.D., Plewniak, F. and Poch, O. (1999a) BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics 15(1):87-88).

If the -clustalw option is specified, PROBCONS produces ClustalW-format output instead. The output begins with the header line

PROBCONS version 1.08 multiple sequence alignment

This is followed by blocks of the five aligned sequences, and each block ends with a conservation annotation line.

[ClustalW alignment listing not recoverable from the source text.]

Changing the number of insertion state pairs

For efficiency, the number of insertion state pairs used in PROBCONS is fixed at compile time. To change it, edit the following line in the Makefile:

OTHERFLAGS = -DNumInsertStates=1 -DVERSION="1.08"

Then recompile the program with make. Any positive integer may be specified in the Makefile, but default parameter values exist only when the number of insert state pairs is 1 or 2.

Command-line Options

PROBCONS offers several command-line options, which are detailed below. General usage:

probcons [OPTION]... [MFAFILE]...

-clustalw
Use CLUSTALW output format instead of MFA.

Description: Generates alignments in the ClustalW output format.

Example usage:
probcons -clustalw input.mfa > output.aln

-c, --consistency REPS
Use 0 <= REPS <= 5 (default: 2) passes of consistency transformation.

Description: Each pass applies one round of the consistency transformation to the set of sequences. The consistency transformation is described in detail in the papers listed under References. In each round, the aligner computes the consistency transformation for each pair of sequences using all other sequences. It then updates the posterior probability matrices of the pairwise alignments.

Example usage:
probcons -c 1 input.mfa > output.mfa
probcons --consistency 1 input.mfa > output.mfa

-ir, --iterative-refinement REPS
Use 0 <= REPS <= 1000 (default: 100) passes of iterative refinement.

Description: Specifies the number of iterations of iterative refinement to perform. In each stage of iterative refinement, the sequences in the alignment are randomly partitioned into two groups. The alignment is projected onto these groups, and the two groups are then realigned. The objective score of the resulting alignment is guaranteed to be at least that of the original alignment.

Example usage:
probcons -ir 1000 input.mfa > output.mfa
probcons --iterative-refinement 1000 input.mfa > output.mfa

-pre, --pre-training REPS
Use 0 <= REPS <= 20 (default: 0) rounds of pre-training before aligning the sequences.

Description: Specifies the number of rounds of EM to apply to the set of sequences being aligned. This option is intended for cases where the default parameters are not appropriate for the particular sequences being aligned. In general it is not recommended, because it may lead to unstable alignment parameters.

Example usage:
probcons -pre 1 input.mfa > output.mfa
probcons --pre-training 1 input.mfa > output.mfa

-pairs
Generate all pairwise alignments of all possible pairs of sequences.

Description: When this option is selected, PROBCONS generates all-pairs pairwise maximum expected accuracy alignments from the posterior matrices, without generating a full multiple alignment. Each output file is named after the header comments of its two sequences in the original input file, with .fasta appended. When the -clustalw option is also selected, .aln is used as the suffix instead.

Example usage:
probcons -pairs input.mfa > output.mfa
where input.mfa consists of
>seq1
ATGC
>seq2
ATGC
>seq3
ATGC
generates the files seq1-seq2.fasta, seq1-seq3.fasta, and seq2-seq3.fasta.

-viterbi
Use Viterbi decoding rather than maximum expected accuracy alignment.

Description: Generates all-pairs pairwise alignments using the Viterbi algorithm. This option automatically turns on -pairs. It is not recommended, but it is available for comparison with the maximum expected accuracy alignments.

Example usage:
probcons -viterbi input.mfa > output.mfa

-v, --verbose
Report progress while aligning (default: off).

Description: Instructs the aligner to report progress on all pairwise alignments during the initial alignment step, on all consistency transformation calculations, and on all iterative refinement steps.

Example usage:
probcons -v input.mfa > output.mfa
probcons --verbose input.mfa > output.mfa

-annot FILENAME
Write annotation for the multiple alignment to FILENAME.

Description: Writes quality scores for the columns of the produced alignment to FILENAME. The quality score for each column appears on a separate line. It is an integer between 0 and 100 inclusive and represents the expected percentage of correct pairwise matches in the column. Columns containing only one non-gap character automatically receive a quality score of 0.

Example usage:
probcons -annot output.mfa.annotations input.mfa > output.mfa
generates the file output.mfa.annotations containing
96
94
84

-t, --train FILENAME
Compute EM transition probabilities and store them in FILENAME (default: no training).

Description: Trains the aligner on a set of sequences. The training sequences are read from the files specified in MFAFILE(s), and each file is treated as a separate training instance. Each call performs exactly one round of EM training, so multiple calls to PROBCONS are needed to reach convergence. The trained parameters are written to FILENAME as three lines:
initMatchProb initInsertXProb initInsertYProb
startInsertXProb startInsertYProb
extendInsertXProb extendInsertYProb

Example usage:
probcons -t trained.params input.mfa input2.mfa input3.mfa
probcons --train trained.params input.mfa input2.mfa input3.mfa
generates the file trained.params with contents
0.9999138713 0.0000430496 0.0000430496
0.0144627076 0.0144627076
0.6306074262 0.6306074262

-p, --paramfile FILENAME
Read initial/final and transition probabilities from FILENAME (default: parameters read from Defaults.h).

Description: This file specifies the initial/final probabilities and transition probabilities for the HMM used by the aligner. The HMM consists of a Match state, an Insert X state, and an Insert Y state, and is described in more detail in the papers listed under References. The file consists of three lines containing:
initMatchProb initInsertXProb initInsertYProb
startInsertXProb startInsertYProb
extendInsertXProb extendInsertYProb

Example usage:
probcons -p trained.params input.mfa > output.mfa
probcons --paramfile trained.params input.mfa > output.mfa
where the file trained.params has contents
0.9999138713 0.0000430496 0.0000430496
0.0144627076 0.0144627076
0.6306074262 0.6306074262

-m, --matrixfile FILENAME
Read scoring matrix parameters from FILENAME (default: matrix read from Defaults.h).

Description: This file specifies the emission probabilities used for scoring alignments. By default, emission probabilities based on the BLOSUM62 matrix are used. The file format consists of:
- one line of twenty letters giving the order of the amino acid alphabet used for the pair emission probabilities;
- twenty lines of pair emission probabilities, where the nth line contains n entries. The mth entry of the nth line is the joint probability of emitting amino acid m together with amino acid n (the matrix is assumed to be symmetric);
- one line of twenty letters giving the order of the amino acid alphabet used for the single emission probabilities;
- one line of single emission probabilities, i.e., the mth entry is the probability of emitting amino acid m in an insertion state (assumed to be the same for the Insert X and Insert Y states).

Example usage:
probcons -m blosum62.matrix input.mfa > output.mfa
probcons --matrixfile blosum62.matrix input.mfa > output.mfa
where the file blosum62.matrix contains
ARNDCQEGHILKMFPSTWYV
0.02373072
0.00244502 0.01775118
0.00210228 0.00207782 0.01281864
0.00223549 0.00161657 0.00353540 0.01911178
0.00145515 0.00044701 0.00042479 0.00036798 0.01013470
...
ARNDCQEGHILKMFPSTWYV
0.07831005 0.05246024 0.04433257 0.05130349 0.02189704 ...
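The following minimal Python sketch (not PROBCONS's own code) makes the maximum expected accuracy idea from the Overview and Algorithm Summary concrete. It assumes a posterior matrix is already available as a list of lists, post[i][j] = P(x_i ~ y_j | x, y). It chooses the pairwise alignment that maximizes the sum of posteriors over matched pairs, with gaps scoring zero. It then reports expected accuracy as that sum divided by the length of the shorter sequence, which is the definition used in the PROBCONS papers. The function name mea_align is hypothetical.

```python
def mea_align(post):
    """Return (matched_pairs, expected_accuracy) for one posterior matrix.

    post[i][j] is the posterior probability that x[i] is matched to y[j].
    Gaps add nothing, so the best alignment maximizes the summed posteriors
    of its matched pairs (a maximum expected accuracy alignment).
    """
    n = len(post)
    m = len(post[0]) if n else 0
    # score[i][j] = best summed posterior for aligning x[:i] with y[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + post[i - 1][j - 1],  # match
                              score[i - 1][j],   # x[i-1] against a gap
                              score[i][j - 1])   # y[j-1] against a gap
    # Trace back from the bottom-right cell to recover the matched pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + post[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    shorter = min(n, m)
    accuracy = score[n][m] / shorter if shorter else 0.0
    return pairs, accuracy
```

For example, a 3×2 matrix whose only strong entries are post[0][0] = 0.9 and post[2][1] = 0.95 yields the pairs (0, 0) and (2, 1), with expected accuracy 1.85 / 2.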
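The probabilistic consistency transformation described in the PROBCONS papers re-estimates each posterior matrix by routing through every sequence z in the input set S: P'_xy = (1/|S|) sum over z in S of P_xz P_zy, where P_xx and P_yy are identity matrices. The sketch below performs one such pass; the -c option controls how many passes PROBCONS runs. Dense Python lists are used only for clarity. The dictionary layout keyed by (name, name) pairs and all function names are assumptions made for illustration.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of lists."""
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(len(b))) for j in range(cols)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def get_post(posteriors, lengths, a, b):
    """Return P_ab, treating P_aa as identity and P_ba as transpose(P_ab)."""
    if a == b:
        return identity(lengths[a])
    if (a, b) in posteriors:
        return posteriors[(a, b)]
    return transpose(posteriors[(b, a)])


def consistency_pass(posteriors, lengths):
    """One pass of P'_xy = (1/|S|) * sum over z in S of P_xz @ P_zy."""
    names = list(lengths)
    updated = {}
    for (x, y) in posteriors:
        total = [[0.0] * lengths[y] for _ in range(lengths[x])]
        for z in names:  # z ranges over all sequences, including x and y
            prod = matmul(get_post(posteriors, lengths, x, z),
                          get_post(posteriors, lengths, z, y))
            for i in range(lengths[x]):
                for j in range(lengths[y]):
                    total[i][j] += prod[i][j]
        updated[(x, y)] = [[v / len(names) for v in row] for row in total]
    return updated
```

With only two sequences, the pass leaves P_xy unchanged, because both terms reduce to P_xy itself. A third sequence z that confidently links x and y raises a weak direct estimate.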
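Here is a minimal MFA reader that follows the rules in the MFA format section: a '>' in the first column starts a new record, and the lines after it are concatenated as that record's sequence. parse_mfa is a hypothetical helper, not part of PROBCONS. Skipping blank lines is an assumption about tolerant input handling.

```python
def parse_mfa(text):
    """Parse MFA text into a list of (description, sequence) tuples."""
    records = []
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            continue  # tolerate blank lines (an assumption, not from the manual)
        if line.startswith(">"):
            # A '>' in the first column starts a new sequence record.
            records.append((line[1:].strip(), []))
        elif records:
            # Sequence data lines are concatenated onto the current record.
            records[-1][1].append(line.strip())
        else:
            raise ValueError("sequence data found before the first '>' line")
    return [(desc, "".join(parts)) for desc, parts in records]
```

Applied to the three-sequence input from the -pairs example, this returns [("seq1", "ATGC"), ("seq2", "ATGC"), ("seq3", "ATGC")].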
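The ClustalW layout described in the ClustalW ALN format section can be sketched as follows: one header line, then blocks of 50 columns, each with one line per sequence and an annotation line. The header text is the one PROBCONS 1.08 prints. The name padding, the blank separator lines, and the use of '*' for identical non-gap columns are assumptions. The strongly and weakly conserved classes (conventionally ':' and '.' in ClustalW) are left out because this manual does not define them. to_clustalw is a hypothetical name.

```python
def to_clustalw(aligned, header="PROBCONS version 1.08 multiple sequence alignment",
                width=50):
    """Format aligned rows [(name, row), ...] as ClustalW-style text."""
    length = len(aligned[0][1])
    if any(len(row) != length for _, row in aligned):
        raise ValueError("all aligned rows must have the same length")
    pad = max(len(name) for name, _ in aligned) + 1  # hypothetical name column width
    lines = [header, ""]
    for start in range(0, length, width):
        block = [row[start:start + width] for _, row in aligned]
        for (name, _), chunk in zip(aligned, block):
            lines.append(name.ljust(pad) + chunk)
        # '*' marks a fully conserved (identical, non-gap) column; ':' and '.'
        # classes are omitted in this sketch.
        marks = "".join(
            "*" if col[0] != "-" and all(c == col[0] for c in col) else " "
            for col in zip(*block))
        lines.append(" " * pad + marks)
        lines.append("")
    return "\n".join(lines)
```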
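Each stage of -ir iterative refinement begins by randomly splitting the sequences into two groups and projecting the current alignment onto each group. The sketch below covers only those two steps and omits the realignment of the two groups. The manual says only that the partition is random, so two details here are assumptions: each sequence joins a group with probability 1/2, and the split is redrawn until both groups are non-empty. random_bipartition and project are hypothetical names.

```python
import random


def random_bipartition(names, rng=random):
    """Randomly split sequence names into two non-empty groups."""
    if len(names) < 2:
        raise ValueError("need at least two sequences to partition")
    while True:
        mask = [rng.random() < 0.5 for _ in names]
        if any(mask) and not all(mask):  # redraw until both groups are non-empty
            group_a = [n for n, keep in zip(names, mask) if keep]
            group_b = [n for n, keep in zip(names, mask) if not keep]
            return group_a, group_b


def project(alignment, group):
    """Restrict an alignment {name: row} to `group`, dropping all-gap columns."""
    rows = [alignment[name] for name in group]
    keep = [i for i in range(len(rows[0])) if any(r[i] != "-" for r in rows)]
    return {name: "".join(alignment[name][i] for i in keep) for name in group}
```

Passing a seeded random.Random instance as rng makes the split reproducible for testing.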
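With -pairs, PROBCONS writes one output file per unordered pair of input sequences, so n sequences yield n(n-1)/2 files. The helper below lists the expected file names. It assumes the two header names are joined with '-' (as in seq1-seq2.fasta) and that .aln replaces .fasta under -clustalw. pair_output_names is hypothetical.

```python
from itertools import combinations


def pair_output_names(headers, clustalw=False):
    """Expected -pairs output file names, one per unordered pair in input order."""
    suffix = ".aln" if clustalw else ".fasta"
    # Assumption: header names are joined with '-' as in seq1-seq2.fasta.
    return [f"{a}-{b}{suffix}" for a, b in combinations(headers, 2)]
```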
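The manual defines each -annot score as the expected percentage of correct pairwise matches in a column, but it does not give the formula. As an illustration only, the sketch below averages a posterior match probability over all pairs of residues in a column and scales the result to 0-100. The pair_posterior callback and the averaging rule are assumptions, not PROBCONS's documented computation. As the manual states, a column with a single non-gap character scores 0.

```python
def column_quality(column, pair_posterior):
    """Estimate a 0-100 quality score for one alignment column.

    column: list of (sequence_name, residue_index) for the non-gap entries.
    pair_posterior(a, b): hypothetical callback giving the posterior
    probability that residues a and b are correctly matched.
    """
    if len(column) < 2:
        return 0  # only one non-gap character: score 0
    pairs = [(a, b) for k, a in enumerate(column) for b in column[k + 1:]]
    expected = sum(pair_posterior(a, b) for a, b in pairs) / len(pairs)
    return round(100 * expected)
```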
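Because -t performs exactly one round of EM, reaching convergence requires repeated runs. The sketch below shows one way to script this. Each round after the first passes the previous round's parameter file with -p. This assumes -p and -t can be combined in a single invocation, which this manual does not state explicitly. A simple tolerance check decides when to stop. The file-naming scheme, the tolerance, and both function names are hypothetical.

```python
def training_command(round_no, inputs, prefix="trained"):
    """Return the argv list for EM round `round_no` (1-based)."""
    cmd = ["probcons"]
    if round_no > 1:
        # Assumption: -p supplies the starting parameters for this round.
        cmd += ["-p", f"{prefix}.{round_no - 1}.params"]
    cmd += ["-t", f"{prefix}.{round_no}.params"]
    return cmd + list(inputs)


def params_converged(old, new, tol=1e-4):
    """True when every parameter moved by less than tol between two rounds.

    old and new are parameter files already split into rows of floats,
    e.g. [[0.9999, 0.00004, 0.00004], [0.0145, 0.0145], [0.63, 0.63]].
    """
    flat_old = [v for row in old for v in row]
    flat_new = [v for row in new for v in row]
    if len(flat_old) != len(flat_new):
        raise ValueError("parameter files have different shapes")
    return all(abs(a - b) < tol for a, b in zip(flat_old, flat_new))
```

Each command list could be run with subprocess.run(cmd, check=True), with the loop stopping once params_converged reports True for two consecutive parameter files.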
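The three-line parameter file read by -p and written by -t can be loaded as shown below. The transition_matrix helper assumes a standard pair-HMM topology. Under this assumption, Match opens a gap in X or Y with the start probabilities, and an insert state repeats with its extend probability and otherwise returns to Match. It also assumes there are no direct X-to-Y transitions. This is a reading of the parameter names, not a copy of PROBCONS's model, and final (termination) probabilities are not modeled. The sample values are the ones shown in the manual; the function names are hypothetical.

```python
def parse_params(text):
    """Parse the three-line parameter file for one insert-state pair."""
    rows = [list(map(float, line.split()))
            for line in text.strip().splitlines() if line.strip()]
    if [len(r) for r in rows] != [3, 2, 2]:
        raise ValueError("expected lines of 3, 2 and 2 numbers")
    (init_m, init_x, init_y), (start_x, start_y), (ext_x, ext_y) = rows
    return {"init": {"M": init_m, "X": init_x, "Y": init_y},
            "start": {"X": start_x, "Y": start_y},
            "extend": {"X": ext_x, "Y": ext_y}}


def transition_matrix(p):
    """Transitions for an assumed Match/InsertX/InsertY pair-HMM topology."""
    sx, sy = p["start"]["X"], p["start"]["Y"]
    ex, ey = p["extend"]["X"], p["extend"]["Y"]
    return {
        "M": {"M": 1.0 - sx - sy, "X": sx, "Y": sy},
        "X": {"M": 1.0 - ex, "X": ex, "Y": 0.0},  # no direct X -> Y (assumption)
        "Y": {"M": 1.0 - ey, "X": 0.0, "Y": ey},
    }


SAMPLE = """0.9999138713 0.0000430496 0.0000430496
0.0144627076 0.0144627076
0.6306074262 0.6306074262"""
```

For the sample file, the initial probabilities sum to 1 within about 3e-8, and the Match-to-Match probability works out to 1 - 2 × 0.0144627076 = 0.9710745848.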
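The -m matrix file can be parsed into symmetric lookup tables as sketched below. Its layout is an alphabet line, lower-triangular pair emission rows, a second alphabet line, and a line of single emission probabilities. parse_matrix_file is a hypothetical helper. It accepts any alphabet length so that it can be tried on a toy example, whereas PROBCONS expects twenty amino acids.

```python
def parse_matrix_file(text):
    """Parse a -m matrix file into (pair_probs, single_probs) dictionaries.

    Row n of the pair section holds n entries (lower triangle); each value
    is stored under both (a, b) and (b, a) because the matrix is symmetric.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    alphabet = lines[0][0]
    n = len(alphabet)
    pair = {}
    for row_idx in range(n):
        values = lines[1 + row_idx]
        if len(values) != row_idx + 1:
            raise ValueError(f"row {row_idx + 1} should have {row_idx + 1} entries")
        for col_idx, value in enumerate(values):
            a, b = alphabet[row_idx], alphabet[col_idx]
            pair[(a, b)] = pair[(b, a)] = float(value)
    single_alphabet = lines[1 + n][0]
    single = {c: float(v) for c, v in zip(single_alphabet, lines[2 + n])}
    return pair, single


TOY = """ACG
0.2
0.05 0.2
0.05 0.05 0.2
ACG
0.3 0.3 0.4"""
```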