
T-Coffee User Guide and Reference Manual

-evaluate_mode
Generic Output
Building a Server
Common Problems when setting up Servers
Output of the .dnd file
Permissions
Other Programs
Sequence Name Handling
Automatic Format Recognition
T-COFFEE_LIB_FORMAT_01
T-COFFEE_LIB_FORMAT_02
Library List
Substitution Matrices
ClustalW Style (Deprecated)
BLAST Format (Recommended)
Sequences Weights
Known Problems
-seq_name_for_quadruplet
Tree Computation
-distance_matrix_mode
-quicktree
Pair-wise Alignment Computation
-dp_mode
-diag_threshold
-nomatch
-gapopen
-fgapopen
-cosmetic_penalty
-tg_mode
Multiple Alignment Computation
-msa_mode
-profile_comparison
-profile_mode
Alignment Post-Processing
-clean_aln
-clean_threshold
-clean_iteration
-clean_evaluation_mode
CPU Control
Multithreading
...an identifier that tells the program to keep the sequences aligned.

Aligning Nucleic Acids

Nucleic acid sequences are difficult to align, and T-Coffee is not especially well suited to the task. However, if you want to give it a try, you are advised to use the following command line:

$ t_coffee sample_dnaseq1.fasta -special_mode dna

This special mode triggers the use of slow_pair4dna and lalign_id_pair4dna, which use lower gap extension penalties and the identity matrix. If you would rather use your own matrix, use:

$ t_coffee sample_dnaseq1.fasta -in Mlalign_id_pair4dna@EP@MATRIX@idmat

where you should replace idmat with your own matrix in BLAST format (see the Format section).

Aligning Sequences and Structures

Assuming some structures are associated with your sequences, it is possible to align these sequences while using the associated structural information. The easiest way to do this is to use 3D-Coffee.

Aligning Sequences and Profiles

T-Coffee can make multiple profile alignments. In this context, the alignments are treated as single sequences and aligned to one another in a progressive fashion. Currently we only support profiles in the form of standard multiple sequence alignments. The profile must either be entered via the -profile flag:

$ t_coffee -profile sample_aln1.aln sample_aln2.aln -outfile combined_profiles.aln

It is also possible to read the profile via the -in flag as long as they are preceded w
Usage: -cosmetic_penalty=<negative value>
Default: -cosmetic_penalty=-50
Indicates the penalty applied for opening a gap. This penalty is set to a very low value. It will only have an influence on the portions of the alignment that are unalignable. It will not make them more correct, only more pleasing to the eye (i.e. it avoids stretches of lonely residues). The cosmetic penalty is automatically turned off if a substitution matrix is used rather than a library.

-tg_mode
Usage: -tg_mode=<0, 1 or 2>
Default: -tg_mode=1
0: terminal gaps penalized with -gapopen + -gapext*len
1: terminal gaps penalized with -gapext*len
2: terminal gaps unpenalized

Weighting Schemes

-seq_weight
Usage: -seq_weight=<t_coffee or <file_name>>
Default: -seq_weight=t_coffee
These are the individual weights assigned to each sequence. The t_coffee weights try to compensate for the bias in consistency caused by redundancy in the sequences:
$\mathrm{sim}(A,B)$ = similarity between A and B (between 0 and 1)
$\mathrm{weight}(A) = 1 / \sum_{X} \mathrm{sim}(A,X)^{3}$
Weights are normalized so that their sum equals the number of sequences. They are applied to the primary library in the following manner:
$\mathrm{res\_score}(A_x, B_y) \leftarrow \min(\mathrm{weight}(A), \mathrm{weight}(B)) \cdot \mathrm{res\_score}(A_x, B_y)$
These are very simple weights. Their main goal is to prevent a single sequence present in many copies from dominating the alignment.
Note: The library output by -out_lib is the unweighted library.
Note: We
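The t_coffee weighting formula above can be sketched in a few lines of Python. This is a minimal illustration, not T-Coffee's own code: the function names are hypothetical, the pairwise similarities are supplied by hand instead of being computed from alignments, and we assume the sum over X includes A itself (sim(A,A) = 1), which also keeps the denominator non-zero.

```python
from typing import Dict, List, Tuple


def sequence_weights(names: List[str], sim: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """weight(A) = 1 / sum_X sim(A, X)**3, normalised so the weights sum to len(names).

    Assumption: X runs over every sequence, including A itself (sim(A, A) = 1).
    Missing pairs in `sim` are treated as similarity 0.
    """
    def s(a: str, b: str) -> float:
        if a == b:
            return 1.0
        return sim.get((a, b), sim.get((b, a), 0.0))

    raw = {a: 1.0 / sum(s(a, x) ** 3 for x in names) for a in names}
    # Normalise so that the weights add up to the number of sequences.
    factor = len(names) / sum(raw.values())
    return {a: w * factor for a, w in raw.items()}


def weighted_res_score(res_score: float, weights: Dict[str, float], a: str, b: str) -> float:
    """Scale a primary-library score by Min(weight(A), weight(B))."""
    return min(weights[a], weights[b]) * res_score


if __name__ == "__main__":
    # Two identical copies (A1, A2) and one unrelated sequence (B).
    w = sequence_weights(["A1", "A2", "B"], {("A1", "A2"): 1.0})
    print(w)  # {'A1': 0.75, 'A2': 0.75, 'B': 1.5}
```

The redundant copies split their weight, so a pair involving either copy is scaled down relative to pairs between distinct sequences.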
Usage: -type=<DNA, PROTEIN or DNA_PROTEIN>
Default: -type=<automatically set>
This flag sets the type of the sequences. If omitted, the type is guessed automatically. This flag is compatible with ClustalW.
Note: In the case of low-complexity or short sequences, it is recommended to set the type manually.

-seq
Usage: -seq=[<P,S><name>,]
Default: none
-seq is now the recommended flag to provide your sequences. It behaves mostly like the -in flag.

-seq_source
Usage: -seq_source=<ANY or _LS or LS>
Default: ANY
You may not want to combine all the provided sequences into a single sequence list. You can do so by specifying that you do not want to treat all the -in files as potential sequence sources.
-seq_source=_LA indicates that neither the sequences provided via the A (Alignment) flag nor those provided via the L (Library) flag should be added to the sequence list.
-seq_source=S means that only sequences provided via the S tag will be considered. All the other sequences will be ignored.
Note: This flag is mostly designed for interactions between T-Coffee and T-CoffeeDPA (the large-scale version of T-Coffee).

Structure Input

-pdb
Usage: -pdb=<pdbid1>,<pdbid2>... [Max 200]
Default: None
Reads or fetches a PDB file. It is possible to specify a chain or even a sub-chain:
PDBID(PDB_CHAIN)[opt] (FIRST,LAST)[opt]

Tree Input

-usetree
Usage: -usetree=<tree file>
Default: No file specified
...I want to output an HTML file and a regular file.
A: See the next question.

Q: I would like to output more than one alignment format at the same time.
A: The flag -output accepts more than one parameter. For instance:
$ t_coffee sample_seq1.fasta -output clustalw score_html score_ps msf
This will output four alignment files in the corresponding formats. Alignment file names will have the format name as an extension.
Note: You need to have the converter ps2pdf installed on your system (standard under Linux and Cygwin). The latest versions of Internet Explorer and Netscape now allow the user to print the HTML display. Do not forget to request background printing.

Alignment Computation

Q: Can t_coffee align nucleic acids?
A: Yes, it can, but you must use -special_mode dna:
$ t_coffee sample_dnaseq1.fasta -special_mode dna

Q: I do not want to compute the alignment.
A: Use the -convert flag:
$ t_coffee sample_aln1.aln -convert -output gcg
This command will read the .aln file and turn it into an MSF alignment.

Q: I would like to force some residues to be aligned.
A: If you want to brutally force some residues to be aligned, you may use the force_aln function of seq_reformat as a post-processing step:
$ t_coffee -other_pg seq_reformat -in sample_aln4.aln -action +force_aln seq1 10 seq2 15
$ t_coffee -other_pg seq_reformat -in sample_aln4.aln -action +force_aln sample_lib4.tc_lib02
sample_lib4.tc
-profile_mode
Usage: -profile_mode=<cw_profile_profile, muscle_profile_profile, multi_channel>
Default: -profile_mode=cw_profile_profile
When -profile_comparison=profile, this flag selects a profile scoring function.

Alignment Post-Processing

-clean_aln
Usage: -clean_aln
Default: -clean_aln
This flag causes T-Coffee to post-process the multiple alignment. Residues that have a reliability score smaller than or equal to -clean_threshold (as given by an evaluation that uses -clean_evaluation_mode) are realigned to the rest of the alignment. Residues with a score higher than the threshold constitute a rigid framework that cannot be altered.
The cleaning algorithm is greedy. It starts from the top-left segment of low-consistency residues and works its way left to right, top to bottom along the alignment. You can require this operation to be carried out for several cycles using the -clean_iteration flag.
The rationale behind this operation is mostly cosmetic. In order to ensure a decent-looking alignment, the gop is set to 20 and the gep to 1. There is no penalty for terminal gaps, and the matrix is blosum62mt.
Note: Gaps are always considered to have a reliability score of 0.
Note: The use of the cleaning option can result in memory overflow when aligning large sequences.

-clean_threshold
Usage: -clean_threshold=<0-9>
Default: -clean_threshold=1
See -clean_aln for details.

-clean_iteration
Usage: -clean_iteration=<value betw
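The selection step of -clean_aln (which positions are flexible and which form the rigid framework) can be sketched as follows. This is a hedged illustration only: the function name is hypothetical, the per-residue reliability scores (0-9, as in the -clean_threshold usage) are supplied by hand, and the greedy realignment itself is not shown.

```python
from typing import List


def flexible_positions(aln: List[str], scores: List[List[int]], threshold: int = 1) -> List[List[bool]]:
    """Mark which alignment positions a clean_aln-style pass would realign.

    True  -> reliability score <= threshold: the position may be realigned.
    False -> score > threshold: part of the rigid framework.
    Gaps ('-') are given a reliability score of 0, as the manual states,
    whatever value the score matrix holds for them.
    """
    mask = []
    for row, row_scores in zip(aln, scores):
        mask.append([(0 if ch == "-" else sc) <= threshold for ch, sc in zip(row, row_scores)])
    return mask


if __name__ == "__main__":
    aln = ["AC-D", "ACGD"]
    scores = [[9, 1, 5, 9], [9, 1, 0, 2]]
    for row, flags in zip(aln, flexible_positions(aln, scores)):
        # 'x' marks a flexible position, '.' a rigid one
        print(row, "".join("x" if f else "." for f in flags))
```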
...whose order will be used in the final alignment.

-seqnos
Usage: -seqnos=<on or off>
Default: -seqnos=off
Causes the output alignment to contain residue numbers at the end of each line.
(Example output not recoverable from the source.)

Libraries

Although it does not necessarily do so explicitly, T-Coffee always ends up combining libraries. Libraries are collections of pairs of residues. Given a set of libraries, T-Coffee makes an attempt to assemble the alignment with the highest level of consistency. You can think of the alignment as a timetable. Each library pair would be a request from students or teachers, and the job of T-Coffee would be to assemble the timetable that makes as many people as possible happy.

-out_lib
Usage: -out_lib=<name of the library, default, no>
Default: -out_lib=default
Sets the name of the library output. Default implies <run_name>.tc_lib

-lib_only
Usage: -lib_only
Default: unset
Causes the program to stop once the library has been computed. Must be used in conjunction with the flag -out_lib.

Trees

-newtree
Usage: -newtree=<tree file>
Default: No file specified
Indicates the name of the file into which the guide tree will be written. The default will be <sequence_name>.dnd or <run_name>.dnd. The tree is written in the parenthesis format known as Newick or New Hampshire and u
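What -seqnos adds can be illustrated with a small formatter that appends the running residue count (gaps excluded) to each alignment line. This is a hypothetical sketch: the block width, name padding and spacing are assumptions and do not reproduce T-Coffee's exact layout.

```python
from typing import List


def format_with_seqnos(names: List[str], rows: List[str], width: int = 60) -> List[str]:
    """Lay out an alignment in blocks, ending each line with the running residue count."""
    counts = [0] * len(rows)
    pad = max(len(n) for n in names)
    lines = []
    for start in range(0, len(rows[0]), width):
        for i, (name, row) in enumerate(zip(names, rows)):
            chunk = row[start:start + width]
            counts[i] += sum(1 for ch in chunk if ch != "-")  # gaps are not counted
            lines.append(f"{name.ljust(pad)} {chunk} {counts[i]}")
        lines.append("")  # blank line between blocks
    return lines


if __name__ == "__main__":
    print("\n".join(format_with_seqnos(["seq1", "seq2"], ["aaa--aaaa", "a---aa--a"], width=5)))
```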
...with -sim_matrix.

-ndiag
Usage: -ndiag=<value>
Default: -ndiag=0
Indicates the number of diagonals used by the fasta pair-wise algorithm (cf. -dp_mode). When -ndiag=0, n_diag = log(length of the smallest sequence) + 1. When -ndiag and -diag_threshold are both set, diagonals are selected if and only if they fulfill both conditions.

-diag_mode
Usage: -diag_mode=<value>
Default: -diag_mode=0
Indicates the manner in which diagonals are scored during the fasta hashing.
0 indicates that the score of a diagonal is equal to the sum of the scores of the exact matches it contains.
1 indicates that this score is set equal to the score of the best uninterrupted segment (useful when dealing with fragments of sequences).

-diag_threshold
Usage: -diag_threshold=<value>
Default: -diag_threshold=0
Sets the value of the threshold when selecting diagonals.
0 indicates that -ndiag should be used to select the diagonals (cf. the -ndiag section).

-sim_matrix
Usage: -sim_matrix=<string>
Default: -sim_matrix=vasiliky
Indicates the manner in which the amino acid alphabet is degenerated when hashing in the fasta pairwise dynamic programming. Standard ClustalW matrices are all valid. They are used to define groups of amino acids having positive substitution values. In T-Coffee, the default is a 13-letter grouping named Vasiliky, with residues grouped as follows: rk, de, ah, vilm, fy (other residues kept alone). This
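The Vasiliky grouping and the diag_mode 0 idea can be sketched together: residues are mapped onto a degenerate alphabet, exact k-tuples are hashed, and each diagonal (i - j) is scored by the number of exact matches it contains. This is an illustration, not T-Coffee's hashing code: the function names are hypothetical, and counting each exact match as 1 (rather than using a substitution score) is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Vasiliky grouping as described above: rk, de, ah, vilm, fy; other residues kept alone.
VASILIKY_GROUPS = ["RK", "DE", "AH", "VILM", "FY"]


def degenerate_map(groups: Iterable[str]) -> Dict[str, str]:
    """Map every residue of a group onto the group's first letter."""
    table = {}
    for group in groups:
        for aa in group:
            table[aa] = group[0]
    return table


def degenerate(seq: str, table: Dict[str, str]) -> str:
    """Rewrite a sequence in the degenerate alphabet (unlisted residues kept as-is)."""
    return "".join(table.get(aa, aa) for aa in seq.upper())


def diagonal_scores(a: str, b: str, k: int = 2, table: Optional[Dict[str, str]] = None) -> Counter:
    """Score each diagonal (i - j) by its number of exact k-tuple matches."""
    if table is not None:
        a, b = degenerate(a, table), degenerate(b, table)
    index: Dict[str, list] = {}
    for j in range(len(b) - k + 1):
        index.setdefault(b[j:j + k], []).append(j)
    scores: Counter = Counter()
    for i in range(len(a) - k + 1):
        for j in index.get(a[i:i + k], []):
            scores[i - j] += 1
    return scores


if __name__ == "__main__":
    table = degenerate_map(VASILIKY_GROUPS)
    print(len({degenerate(aa, table) for aa in "ACDEFGHIKLMNPQRSTVWY"}))  # 13
    print(diagonal_scores("KE", "RD", table=table))  # Counter({0: 1})
```

The 20 standard residues collapse to 13 symbols, which is why "KE" and "RD" share a k-tuple once degenerated.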
...(layout of the library header not recoverable from the source)
S1, S2: name of sequence 1 and 2
SEQ1: sequence of S1
Ri1, Ri2: index of the residues in their respective sequences
R1, R2: residue type
V1, V2, V3: integer values. V2 and V3 are optional.

Library List

These are lists of pairs of sequences that must be used to compute a library. The format is:
<nseq> <S1> <S2> ...
2 hamg2 globav
3 hamgw hemog singa

Substitution Matrices

If the required substitution matrix is not available, write your own in a file using one of the following formats:

ClustalW Style (Deprecated)
v1
v2 v3
v4 v5 v6
...
v1, v2, ... are integers, possibly negative. The order of the amino acids is ABCDEFGHIKLMNPQRSTVWXYZ, which means that v1 is the substitution value for A vs A, v2 for A vs B, v3 for B vs B, v4 for A vs C, and so on.

BLAST Format (Recommended)
# BLAST_MATRIX_FORMAT
#ALPHABET=AGCT
(example matrix rows not recoverable from the source)
The alphabet can be freely defined.

Sequences Weights

Create your own weight file using the -seq_weight flag:
# SINGLE_SEQ_WEIGHT_FORMAT_01
seq_name1 v1
seq_name2 v2
...
No duplicates allowed. Sequences not included in the set of sequences provided to t_coffee will be ignored. Order is free. V1 is a float. Un-weighted sequenc
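The ClustalW-style layout (v1 for A vs A, v2 for A vs B, v3 for B vs B, v4 for A vs C, ...) is a row-wise lower triangle. A minimal Python reader for it might look like this; it assumes the standard 23-symbol ClustalW order ABCDEFGHIKLMNPQRSTVWXYZ and whitespace-separated integers, and the function names are hypothetical rather than part of T-Coffee.

```python
from typing import Dict, List, Tuple

# Standard ClustalW residue order (23 symbols).
CLUSTALW_ORDER = "ABCDEFGHIKLMNPQRSTVWXYZ"


def expand_lower_triangle(values: List[int], order: str = CLUSTALW_ORDER) -> Dict[Tuple[str, str], int]:
    """Expand row-wise lower-triangular values (AA, BA, BB, CA, CB, CC, ...) into a symmetric matrix."""
    n = len(order)
    expected = n * (n + 1) // 2
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    matrix: Dict[Tuple[str, str], int] = {}
    k = 0
    for i in range(n):
        for j in range(i + 1):
            matrix[(order[i], order[j])] = values[k]
            matrix[(order[j], order[i])] = values[k]  # substitution matrices are symmetric
            k += 1
    return matrix


def parse_clustalw_style(text: str, order: str = CLUSTALW_ORDER) -> Dict[Tuple[str, str], int]:
    """Read whitespace-separated integers; line breaks carry no meaning here."""
    return expand_lower_triangle([int(tok) for tok in text.split()], order)


if __name__ == "__main__":
    # Toy 3-letter alphabet to show the layout: v1 / v2 v3 / v4 v5 v6
    m = parse_clustalw_style("1\n2 3\n4 5 6", order="ABC")
    print(m[("A", "C")], m[("C", "B")])  # 4 5
```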
*****************************************************
*                      IN_FLAG                      *
*****************************************************
* flag indicating the name of the in-coming sequences
* IN_FLAG no_name <=> no flag
IN_FLAG -infile=
*****************************************************
*                      OUT_FLAG                     *
*****************************************************
* flag indicating the name of the out-coming data
* same conventions as IN_FLAG
* OUT_FLAG no_name <=> no flag
OUT_FLAG -outfile=
*****************************************************
*                      SEQ_TYPE                     *
*****************************************************
* G: Genome, S: Sequences, P: PDB, R: Profile
* Examples:
* SEQ_TYPE S   sequences against sequences (default)
* (the codes for sequence against structure, structure against
*  structure, and mix of sequences and structure are not
*  recoverable from the source)
SEQ_TYPE S
*****************************************************
*                       PARAM                       *
*****************************************************
* parameters passed to the executable; there can be more than 1 PARAM line
...a list of pairs of residues that could be aligned. It is like a Xmas list: you can ask for anything you fancy, but it is down to Santa to assemble a collection of toys that won't get him stuck at the airport while going through the metal detector.

Given a standard library, it is not possible to have all the residues aligned at the same time, because all the lines of the library may not agree. For instance, line 1 may say "Residue 1 of seq A with Residue 5 of seq B", and line 100 may say "Residue 1 of seq A with Residue 29 of seq B". Each of these constraints comes with a weight, and in the end the T-Coffee algorithm tries to generate the multiple alignment whose satisfied constraints have the highest sum of weights. In other words, it tries to satisfy as many constraints as possible (replace the word "constraint" with friends, family members or collaborators and you will know exactly what we mean).

You can generate this list of constraints however you like. You may even provide it yourself, forcing important residues to be aligned by giving them high weights (see the FAQ). For your convenience, T-Coffee can generate its own list (this is the default) by making all the possible global pairwise alignments and the 10 best local alignments associated with each pair of sequences. Each pair of residues observed aligned in these pairwise alignments becomes a line in the library. Yet be aware that nothing forces you to use this library
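The idea of scoring an alignment against a library of weighted constraints can be made concrete with a short Python sketch. It only evaluates a given alignment (summing the weights of the constraints it satisfies); it does not search for the best alignment. The data layout (a dict keyed by (seq_a, residue_a, seq_b, residue_b) with 1-based residue indices) and the function names are assumptions for illustration, not T-Coffee's library format.

```python
from typing import Dict, Tuple

# (seq_a, residue_a, seq_b, residue_b) with 1-based residue indices.
Constraint = Tuple[str, int, str, int]


def residue_columns(row: str) -> Dict[int, int]:
    """Map each residue index (1-based, gaps skipped) to its alignment column."""
    columns, residue = {}, 0
    for col, ch in enumerate(row):
        if ch != "-":
            residue += 1
            columns[residue] = col
    return columns


def library_score(aln: Dict[str, str], library: Dict[Constraint, int]) -> int:
    """Sum the weights of the library constraints that the alignment satisfies."""
    cols = {name: residue_columns(row) for name, row in aln.items()}
    total = 0
    for (sa, ra, sb, rb), weight in library.items():
        col_a = cols[sa].get(ra)
        if col_a is not None and col_a == cols[sb].get(rb):
            total += weight
    return total


if __name__ == "__main__":
    # Two conflicting constraints on residue 1 of seq A: only one can be satisfied.
    library = {("A", 1, "B", 2): 10, ("A", 1, "B", 3): 4}
    print(library_score({"A": "-M-", "B": "KLV"}, library))  # 10
    print(library_score({"A": "--M", "B": "KLV"}, library))  # 4
```

Of the two candidate alignments, the first wins because it honours the heavier constraint.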
...alphabet is set with the flag -sim_matrix=vasiliky. In order to keep the alphabet non-degenerated, -sim_matrix=idmat can be used to retain the standard alphabet.

-matrix (CW)
Usage: -matrix=<blosum62mt>
Default: -matrix=blosum62mt
The usage of this flag has been modified from previous versions due to frequent mistakes in its usage. This flag sets the matrix that will be used by alignment methods within t_coffee (slow_pair, lalign_id_pair). It does not affect external methods (like clustal_pair, clustal_aln...).

-nomatch
Usage: -nomatch=<positive value>
Default: -nomatch=0
Indicates the penalty to associate with a null match. When using a library, all matches are positive or equal to 0. Matches equal to 0 are unsupported by the library but not penalized. Setting -nomatch to a non-negative value makes it possible to penalize these null matches and prevent unrelated sequences from being aligned (this can be useful when the alignments are meant to be used for structural modeling).

-gapopen
Usage: -gapopen=<negative value>
Default: -gapopen=0
Indicates the penalty applied for opening a gap. The penalty must be negative. If no value is provided when using a substitution matrix, a value will be automatically computed.

-gapext
Usage: -gapext=<negative value>
Default: -gapext=0
Indicates the penalty applied for extending a gap (cf. -gapopen).

-fgapopen
Unsupported

-fgapext
Unsupported

-cosmetic_penalty
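How -gapopen, -gapext and the three -tg_mode values combine for a single gap can be sketched as below. The cost for internal gaps (gapopen + gapext*len) is an assumption: the manual only states that formula for tg_mode 0 terminal gaps. The values -10 and -1 in the usage lines are arbitrary examples, not T-Coffee defaults, and the function name is hypothetical.

```python
def gap_penalty(length: int, gapopen: int, gapext: int, terminal: bool = False, tg_mode: int = 1) -> int:
    """Penalty (a negative number) for one gap of `length` columns.

    Internal gaps: gapopen + gapext * length (assumed affine cost).
    Terminal gaps follow tg_mode: 0 -> gapopen + gapext * length,
    1 -> gapext * length, 2 -> no penalty.
    """
    if tg_mode not in (0, 1, 2):
        raise ValueError("tg_mode must be 0, 1 or 2")
    if length <= 0:
        return 0
    if not terminal or tg_mode == 0:
        return gapopen + gapext * length
    if tg_mode == 1:
        return gapext * length
    return 0


if __name__ == "__main__":
    print(gap_penalty(3, -10, -1))                            # internal gap: -13
    print(gap_penalty(3, -10, -1, terminal=True, tg_mode=1))  # -3
    print(gap_penalty(3, -10, -1, terminal=True, tg_mode=2))  # 0
```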
...and causes t_coffee to evaluate a pre-computed alignment provided via -infile=<alignment>. The flag -output must be set to an appropriate format (i.e. -output=score_ascii, score_html or score_pdf). The main purpose of -evaluate is to let you control every aspect of the evaluation. Yet it is advisable to use the pre-defined parameterization -special_mode=evaluate:
$ t_coffee -infile sample_aln1.aln -special_mode evaluate
$ t_coffee -infile sample_seq1.aln -in Lsample_lib1.tc_lib -special_mode evaluate

-convert (CW)
Usage: -convert
Default: turned off
Toggles on the conversion mode and causes T-Coffee to convert the sequences, alignments, libraries or structures provided via the -infile and -in flags. The output format must be set via the -output flag. This flag can also be used if you simply want to compute a library (i.e. you have an alignment and you want to turn it into a library). This flag is ClustalW compliant.

-do_align (CW)
Usage: -do_align
Default: turned on

Special Parameters

-version
Usage: -version
Default: not used
Returns the current version number.

-check_configuration
Usage: -check_configuration
Default: not used
Checks your system to determine whether all the programs T-Coffee can interact with are installed.

-cache
Usage: -cache=<use, update, ignore, <filename>>
Default: -cache=use
By default, t_coffee stores in a cache directory the results of computationally ex
-multi_thread (Not Supported)
Limits
-mem_mode
-dpa_master_aln
-dpa_maxnseq
-dpa_tree (NOT IMPLEMENTED)
Using Structures
Generic
-special_mode
-check_pdb_status
3D-Coffee: Using SAP
Using/finding PDB templates for the Sequences
Multiple Local Alignments
-domain_interactive [Examples]
Generic
Conventions Regarding Filenames
Identifying the Output files automatically
Alignments
...either modifying METHODS_4_TCOFFEE in define_headers.h and recompiling, or by modifying the environment variable METHODS_4_TCOFFEE.

Advanced Method Integration

It may sometimes be difficult to customize the program you want to use through a tc_method file. In that case, you may rather use an external Perl script to run your external application. This can easily be achieved using the generic_method.tc_method file:

*TC_METHOD_FORMAT_01
*************** generic_method.tc_method ***************
EXECUTABLE tc_generic_method.pl
ALN_MODE   pairwise
IN_FLAG    -infile=
OUT_FLAG   -outfile=
OUT_MODE   aln
PARAM      -method=clustalw
PARAM      -gapopen=10
SEQ_TYPE   S
********************************************************

Note: &bsnp can be used for white spaces.

When you run this method:
$ t_coffee sample_seq1.fasta -in Mgeneric_method.tc_method
T-Coffee runs the script tc_generic_method.pl on your data. It also provides the script with parameters. In this case, -method=clustalw indicates that the script should run clustalw on your data.

The script tc_generic_method.pl is a Perl file automatically generated by t_coffee and incorporated in it. Over time, this
file will become the place where novel methods are integrated, and it will make it possible to run all available methods. You can dump the script using the following command:
$ t_coffee -other_pg unpack_tc_generic_method.pl
Note: If there is a copy of that script in your local directory, that copy will be used in place of the internal copy of T-Coffee.

The Mother of All Method Files

*TC_METHOD_FORMAT_01
*****************************************************
* Incorporating new methods in T-Coffee
* Cedric Notredame 17/04/05
*****************************************************
* This file is a method file.
* Copy it and adapt it to your need so that the method
* you want to use can be incorporated within T-Coffee.
*****************************************************
*                       USAGE                       *
*****************************************************
* This file is passed to t_coffee via -in:
*
*   t_coffee -in Mgeneric_method.method
*
* The method is passed to the shell using the following call:
* <EXECUTABLE> <IN_FLAG> <seq_file> <OUT_FLAG> <outname> <PARAM>
*
* Conventions:
* <FLAG_NAME> <TYPE> <VALUE>
* <VALUE>: no_name <=> Replaced with a space
* <VALUE>: &nbsp   <=> Replaced with a space
*****************************************************
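The shell call described in the USAGE block (<EXECUTABLE> <IN_FLAG> <seq_file> <OUT_FLAG> <outname> <PARAM>) can be sketched in Python. This is a hypothetical reader and command builder, not T-Coffee's parser: it assumes one KEY VALUE pair per line, '*' comment lines, flags glued directly to the file name (as with -infile=), and it replaces both spellings of the space token that appear in the manual (&nbsp and &bsnp).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Method:
    executable: str = ""
    in_flag: str = ""
    out_flag: str = ""
    params: List[str] = field(default_factory=list)


def parse_method_file(text: str) -> Method:
    """Read KEY VALUE lines; '*' lines are comments and PARAM may repeat."""
    m = Method()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue
        parts = line.split(None, 1)
        key, value = parts[0], (parts[1].strip() if len(parts) > 1 else "")
        if key == "EXECUTABLE":
            m.executable = value
        elif key == "IN_FLAG":
            m.in_flag = value
        elif key == "OUT_FLAG":
            m.out_flag = value
        elif key == "PARAM":
            m.params.append(value)
        # ALN_MODE, OUT_MODE, SEQ_TYPE, ... are ignored in this sketch
    return m


def build_command(m: Method, seq_file: str, outname: str) -> str:
    """Assemble <EXECUTABLE> <IN_FLAG><seq_file> <OUT_FLAG><outname> <PARAM>..."""
    def flag(value: str) -> str:
        return "" if value == "no_name" else value  # no_name: no flag at all

    parts = [m.executable, flag(m.in_flag) + seq_file, flag(m.out_flag) + outname, *m.params]
    # The manual spells the space token both ways; replace either one.
    return " ".join(parts).replace("&nbsp", " ").replace("&bsnp", " ")


if __name__ == "__main__":
    text = "EXECUTABLE tc_generic_method.pl\nIN_FLAG -infile=\nOUT_FLAG -outfile=\nPARAM -method=clustalw"
    print(build_command(parse_method_file(text), "seqs.fa", "out.aln"))
```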
lib02 is a T-Coffee library using the tc_lib02 format:
TC_LIB_FORMAT_02
SeqX ResY ResY_index SeqZ ResZ ResZ_index
The TC_LIB_FORMAT_02 is still experimental and unsupported. It can only be used in the context of the force_aln function described here.
Given more than one constraint, these will be applied one after the other in the order they are provided. This greedy procedure means that the Nth constraint may disrupt the (N-1)th previously imposed constraint, hence the importance of forcing the constraints in the right order, with the most important coming last.
We do not recommend imposing hard constraints on an alignment; it is much more advisable to use the soft constraints provided by standard t_coffee libraries (cf. the Building Your Own Libraries section).

Q: I would like to use structural alignments.
A: See the section Using Structures in Multiple Sequence Alignments, or see the question "I want to build my own libraries".

Q: I want to build my own libraries.
A: Turn your alignment (for instance a structure-based one) into a library, forcing the residues to have a very high weight:
$ t_coffee -in Asample_seq1.aln -weight 1000 -out_lib sample_seq1.tc_lib -lib_only
The value 1000 is simply a high value that should make it more likely for the substitutions found in your alignment to reoccur in the final alignment. This will produce the library sample_seq1.tc_lib, which you can later use when aligning all the sequences:
$ t_co
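Turning an alignment into a library with a fixed weight (the -weight 1000 idea above) amounts to emitting every pair of residues that share a column. The Python sketch below shows this under stated assumptions: the in-memory layout (a dict keyed by (seq_a, residue_a, seq_b, residue_b), 1-based residues) and the function name are hypothetical, and it does not write the T-Coffee library file format.

```python
from itertools import combinations
from typing import Dict, Tuple

Constraint = Tuple[str, int, str, int]  # (seq_a, residue_a, seq_b, residue_b), 1-based


def alignment_to_library(aln: Dict[str, str], weight: int = 1000) -> Dict[Constraint, int]:
    """Every pair of residues sharing a column becomes a constraint with the same weight."""
    names = list(aln)
    counters = {name: 0 for name in names}  # running residue index per sequence
    library: Dict[Constraint, int] = {}
    for col in range(len(aln[names[0]])):
        present = []
        for name in names:
            if aln[name][col] != "-":
                counters[name] += 1
                present.append((name, counters[name]))
        for (sa, ra), (sb, rb) in combinations(present, 2):
            library[(sa, ra, sb, rb)] = weight
    return library


if __name__ == "__main__":
    for pair, w in alignment_to_library({"a": "AC-", "b": "A-G", "c": "ACG"}).items():
        print(pair, w)
```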
...matrix_file> or <integer value>>
Default: -weight=sim
-weight defines the way alignments are weighted when turned into a library.
winsimN indicates that the weight assigned to a given pair will be equal to the percent identity within a window of 2N+1 columns centered on that pair. For instance, winsim10 defines a window extending 10 residues on either side of the pair being considered. This gives its own weight to each residue in the output library. In our hands, this type of weighting scheme has not provided any significant improvement over the standard sim value.
$ t_coffee sample_seq1.fasta -weight winsim10 -out_lib test.tc_lib
sim indicates that the weight equals the average identity within the sequences containing the matched residues.
sim_matrix_name indicates the average identity, with two residues regarded as identical when their substitution value is positive. The valid matrix names are in matrices.h (e.g. pam250mt). Matrices not found in this header are considered to be filenames. See the Format section for matrices. For instance, -weight=sim_pam250mt indicates that the grouping used for similarity will be the set of classes with positive substitutions.
$ t_coffee sample_seq1.fasta -weight sim_pam250mt -out_lib test.tc_lib
Other groups include:
sim_clustalw_col: categories of ClustalW marked with ":"
sim_clustalw_dot: categories of ClustalW marked with "."
Value indicates that all the pairs found in the alignments must be given the
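The winsimN weight can be illustrated with a small function that computes percent identity in a window of 2N+1 columns centred on the pair. Two details are assumptions rather than documented behaviour: columns where either sequence has a gap are skipped, and the window is clipped at the alignment ends. The function name is hypothetical.

```python
def winsim_weight(row_a: str, row_b: str, col: int, n: int) -> int:
    """Percent identity of two aligned rows in a window of 2N+1 columns centred on `col`.

    Assumptions: gapped columns are skipped and the window is clipped at the ends.
    Returns 0 when the window contains no comparable column.
    """
    lo, hi = max(0, col - n), min(len(row_a), col + n + 1)
    compared = identical = 0
    for c in range(lo, hi):
        x, y = row_a[c], row_b[c]
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            identical += 1
    return round(100 * identical / compared) if compared else 0


if __name__ == "__main__":
    print(winsim_weight("ACDEF", "ACDQF", col=2, n=1))  # window C/C, D/D, E/Q -> 67
```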
...other coordinates for the repeat, such as:
$ t_coffee -in sample_lib1.mocca_lib -domain -start 100 -len 60
This run will use the fragment 100-160 and will be much faster because it does not need to re-compute the lalign library.

-start
Usage: -start=<int value>
Default: not set
This flag indicates the starting position of the portion of sequence that will be used as a template for the repeat extraction. The value assumes that all the sequences have been concatenated and is given on the resulting sequence.

-len
Usage: -len=<int value>
Default: not set
This flag indicates the length of the portion of sequence that will be used as a template.

-scale
Usage: -scale=<int value>
Default: -scale=100
This flag indicates the value of the threshold for extracting the repeats. The actual threshold is computed from motif_len and scale. Increasing the scale increases sensitivity (more alignments).

-domain_interactive [Examples]
Usage: -domain_interactive
Default: unset
Launches an interactive mocca session:
$ t_coffee -in Lsample_lib3.tc_lib Mlalign_rs_s_pair -domain -start 100 -len 60
[row label not recoverable] SKLAYVTFESGR-SALVIQTLANGAVRQV-ASFPRHNGAPAFSPDGSKLAFA
TOLB_ECOLI_165_218 164 TRIAYVVQTNGGQFPYELRVSDYDGYNQFVVHRSPQPLMSPAWSPDGSKLAYV
TOLB_ECOLI_256_306 255 SKLAFALSKTGS-LNLYVMDLASGQIRQV-TDGRSNNTEPTWFPDSQNLAFT
TOLB_ECOLI_307_350 306 SDQAGRPQVYKVNINGGAPQRITWE
same weight, equal to value. This is useful when the alignment one wishes to turn into a library must be given a pre-specified score (for instance, if it comes from a structure superimposition program). Value is an integer:
$ t_coffee sample_seq1.fasta -weight 1000 -out_lib test.tc_lib

Tree Computation

-distance_matrix_mode
Usage: -distance_matrix_mode=<slow, fast, very_fast>
Default: very_fast
This flag indicates the method used for computing the distance matrix (distance between every pair of sequences) required for the computation of the dendrogram.
slow: the chosen dp_mode using the extended library
fast: the fasta dp_mode using the extended library
very_fast: the fasta dp_mode using blosum62mt
ktup: ktup matching (Muscle kind)
aln: read the distances on a precomputed MSA

-quicktree (CW)
Usage: -quicktree
Description: Causes T-Coffee to compute a fast approximate guide tree. This flag is kept for compatibility with ClustalW. It indicates that the following two commands are equivalent:
$ t_coffee sample_seq1.fasta -distance_matrix_mode very_fast
$ t_coffee sample_seq1.fasta -quicktree

Pair-wise Alignment Computation

-dp_mode
Usage: -dp_mode=<string>
Default: -dp_mode=cfasta_pair_wise
This flag indicates the type of dynamic programming used by the program:
$ t_coffee sample_seq1.fasta -dp_mode myers_miller_pair_wise
gotoh_pair_wise: implementation of the Gotoh algorithm (quadratic in memory and time)
myers_miller_pai
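To give a feel for what a "ktup matching" distance matrix involves, here is a generic k-tuple distance in Python: 1 minus the fraction of shared k-tuples, relative to the shorter sequence. This is a common textbook-style measure used to illustrate fast guide-tree distances, not T-Coffee's implementation of very_fast or ktup; all names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Tuple


def kmer_counts(seq: str, k: int) -> Counter:
    """Multiset of all k-tuples in `seq`."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def ktuple_distance(a: str, b: str, k: int = 2) -> float:
    """1 - (shared k-tuples / k-tuples in the shorter sequence); 1.0 if either is too short."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    shared = sum((ca & cb).values())  # multiset intersection keeps the minimum count
    denom = min(sum(ca.values()), sum(cb.values()))
    return 1.0 - shared / denom if denom else 1.0


def distance_matrix(seqs: Dict[str, str], k: int = 2) -> Dict[Tuple[str, str], float]:
    """Symmetric all-against-all distance matrix."""
    d = {(name, name): 0.0 for name in seqs}
    for x, y in combinations(seqs, 2):
        d[(x, y)] = d[(y, x)] = ktuple_distance(seqs[x], seqs[y], k)
    return d


if __name__ == "__main__":
    print(distance_matrix({"s1": "ACDE", "s2": "ACDF", "s3": "WWWW"}))
```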
...will output the tree in New Hampshire format and the alignment to stdout.

Q: Is it possible to pipe stuff INTO t_coffee?
A: If, as a file name, you specify stdin, the content of this file will be expected through a pipe:
$ cat sample_seq1.fasta | t_coffee -infile stdin
will be equivalent to
$ t_coffee sample_seq1.fasta
If you do not give any argument to t_coffee, they will be expected to come from the pipe:
$ cat sample_param_file.param | t_coffee -parameters stdin
For instance:
$ echo "-in Ssample_seq1.fasta Mclustalw_pair" | t_coffee -parameters stdin

Q: Can I read my parameters from a file?
A: See the well-behaved parameters section.

Q: I want to decide myself on the name of the output files.
A: Use the -run_name flag:
$ t_coffee sample_seq1.fasta -run_name guacamole

Q: I want to use the sequences in an alignment file.
A: Simply feed your alignment any way you like, but do not forget to prepend the prefix S (for sequence):
$ t_coffee Ssample_aln1.aln
$ t_coffee -infile Ssample_aln1.aln
$ t_coffee -in Ssample_aln1.aln Mslow_pair Mlalign_id_pair -outfile outaln
This means that the gaps will be reset and that the alignment you provide will not be considered as an alignment, but as a set of sequences.

Q: I only want to produce a library.
A: Use the -lib_only flag:
$ t_coffee sample_seq1.fasta -out_lib sample_lib1.tc_lib -lib_only
Please note that the previous usage
Important: All the files mentioned here (sample_seq...) can be found in the example directory of the distribution.

Fetching Sequences
T-Coffee will NOT fetch sequences for you: you must select the sequences you want to align beforehand. We suggest you use any BLAST server and format your sequences in FASTA so that T-Coffee can use them easily.

Aligning Sequences
Making accurate multiple alignments of DNA, RNA or protein sequences.

Combining Alignments
T-Coffee allows you to combine results obtained with several alignment methods. For instance, if you have an alignment coming from ClustalW, another alignment coming from Dialign, and a structural alignment of some of your sequences, T-Coffee will combine all that information and produce a new multiple sequence alignment having the best agreement with all these methods (see the FAQ for more details):
$ t_coffee -in Asample_aln1.aln Asample_aln2.aln Asample_aln3.aln -outfile combined_aln.aln

Evaluating Alignments
You can use T-Coffee to measure the reliability of your multiple sequence alignment. If you want to find out about that, read the FAQ or the documentation for the -output flag:
$ t_coffee -infile sample_aln1.aln -special_mode evaluate

Combining Sequences and Structures
One of the latest improvements of T-Coffee is to let you combine sequences and structures, so that your alignments are of higher quality. You need to have the SAP package installed to fully ben
Format: Newick tree format (ClustalW style)
This flag indicates that, rather than computing a new dendrogram, t_coffee must use a pre-computed one. The tree files are in Phylip format and compatible with ClustalW. In most cases, using a pre-computed tree will halve the computation time required by t_coffee. It is also possible to use trees output by ClustalW, Phylip and any other program.

Methods and Library Input

-in
Usage: -in=[<P,S,A,L,M,X><name>,]
Default: -in=Mlalign_id_pair,Mclustalw_pair
See the box for an explanation of the -in flag. Consider the following arguments passed via -in:
$ t_coffee -in Ssample_seq1.fasta Asample_aln1.aln Asample_aln2.msf Mlalign_id_pair Lsample_lib1.tc_lib -outfile outaln
This command will trigger the following chain of events:
1. Gather all the sequences. Sequences within all the provided files are pooled together. Format recognition is automatic. Duplicates are removed if they have the same name. Duplicates in a single file are only tolerated in FASTA format files, although they will cause sequences to be renamed. In the above case, the total set of sequences will be made of the sequences contained in sample_seq1.fasta, sample_aln1.aln, sample_aln2.msf and sample_lib1.tc_lib, plus the sequences initially gathered by -infile.
2. Turn alignments into libraries. sample_aln1.aln and sample_aln2.msf will be read and turned into libraries. Another library will be produced by applying the method lalig
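The first two steps of the chain above (sorting -in arguments by their one-letter prefix, then pooling sequences while dropping same-name duplicates) can be sketched in Python. Assumptions to flag: the meanings attached to P (structure) and X (substitution matrix) are our reading, since this excerpt only lists the letters; keeping the first copy of a duplicated name is a guess; and the function names are hypothetical. Real T-Coffee also recognises file formats, which is not modelled here.

```python
from typing import Dict, List

# Prefix letters accepted by -in; the P and X meanings are assumptions.
IN_PREFIXES = {"P": "structure", "S": "sequences", "A": "alignment",
               "L": "library", "M": "method", "X": "matrix"}


def classify_in_args(args: List[str]) -> Dict[str, List[str]]:
    """Split -in arguments such as 'Asample_aln1.aln' into (kind -> names)."""
    groups: Dict[str, List[str]] = {kind: [] for kind in IN_PREFIXES.values()}
    for arg in args:
        kind = IN_PREFIXES.get(arg[:1])
        if kind is None:
            raise ValueError(f"unknown -in prefix in {arg!r}")
        groups[kind].append(arg[1:])
    return groups


def pool_sequences(sources: List[Dict[str, str]]) -> Dict[str, str]:
    """Pool sequences from several files; a name seen again is treated as a duplicate."""
    pooled: Dict[str, str] = {}
    for source in sources:
        for name, seq in source.items():
            pooled.setdefault(name, seq)  # keep the first copy of each name
    return pooled


if __name__ == "__main__":
    args = ["Ssample_seq1.fasta", "Asample_aln1.aln", "Mlalign_id_pair", "Lsample_lib1.tc_lib"]
    print(classify_in_args(args))
```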
25. MENU: Type Letter Flag[number] and Return (ex: 10)
x        --> Set the START to x
>x       --> Set the LEN to x
Cx       --> Set the sCale to x
S<name>  --> Save the Alignment
Bx       --> Goes back x it
return   --> Compute the Alignment
X        --> eXit
ITERATION: 1 START: 211 LEN: 50 SCALE: 100 YOUR CHOICE:
For instance, to set the length of the domain to 40, type:
ITERATION: 1 START: 211 LEN: 50 SCALE: 100 YOUR CHOICE: >40 [return]
[return]
which will generate:
TOLB_ECOLI_212_252 211 SKLAYVTFESGRSALVIQTLANGAVRQVASFPRHNGAPAF 251
TOLB_ECOLI_256_296 255 SKLAFALSKTGSLNLYVMDLASGQIRQVIDGRSNNTEPTW 295
TOLB_ECOLI_300_340 299 QNLAFTSDQAGRPQVYKVNINGGAPQRITWEGSQNQDADV 339
TOLB_ECOLI_344_383 343 KFMVMVSSNGGQQHIAKQDLATGGV-QVLSSTFLDETPSL 382
TOLB_ECOLI_387_427 386 TMVIYSSSQGMGSVLNLVSTDGRFKARLPATDGQVKFPAW 426
ITERATION: 3 START: 211 LEN: 40 SCALE: 100 YOUR CHOICE:
If you want to indicate the coordinates relative to a specific sequence, type <seq_name> start. Type S<your name> to save the current alignment and extract a new motif. Type X when
26.
*******************************************************
*                      EXECUTABLE
*******************************************************
*name of the executable
*passed to the shell: executable
*
EXECUTABLE     tc_generic_method.pl
*
*******************************************************
*                      ALN_MODE
*******************************************************
*pairwise   -> all vs all (no self)
*m_pairwise -> all vs all (no self)
*s_pairwise -> all vs all (self)
*multiple   -> all the sequences in one go
*
ALN_MODE       pairwise
*
*******************************************************
*                      OUT_MODE
*******************************************************
*mode for the output:
*External methods:
* aln -> alignment file (Fasta or ClustalW format)
*Internal methods:
* fL  -> internal function returning a Lib (library)
* fA  -> internal function returning an alignment
*
OUT_MODE       aln
*******************************************************
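The keyword layout of a method file can be read with a few lines of Python. This is a minimal sketch under stated assumptions: lines starting with '*' are treated as comments, every other line is read as "KEYWORD value", and ALN_MODE pairwise is taken to mean all-vs-all without self comparisons, as the comment block above indicates. parse_method_file and pairwise_job_count are hypothetical names; T-Coffee's own reader may accept more than this.

```python
# Sketch only: assumed conventions are '*' comment lines and "KEYWORD value" lines.


def parse_method_file(text):
    """Return {KEYWORD: value}; repeated PARAM lines are kept in a list."""
    settings = {"PARAM": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "PARAM":
            settings["PARAM"].append(value)
        else:
            settings[key] = value
    return settings


def pairwise_job_count(n_sequences):
    """ALN_MODE pairwise (all vs all, no self): one run per unordered pair."""
    return n_sequences * (n_sequences - 1) // 2


EXAMPLE = """\
*******************************
*          EXECUTABLE
*******************************
EXECUTABLE    tc_generic_method.pl
ALN_MODE      pairwise
OUT_MODE      aln
PARAM         -method clustalw
"""

if __name__ == "__main__":
    print(parse_method_file(EXAMPLE))
    print(pairwise_job_count(10))
```

The job count shows why pairwise methods dominate run time: their number grows with the square of the number of sequences.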
27. MPLEMENTED
Usage: -dpa_tree=<filename>
Default: unset
Guide tree used in DPA. This is a newick tree where the distance associated with each node is set to the minimum pairwise distance among all considered sequences.
Using Structures
Generic
-special_mode
Usage: -special_mode=3dcoffee
Default: turned off
Runs t_coffee with the 3dcoffee mode (cf. next section).
-check_pdb_status
Usage: -check_pdb_status
Default: turned off
Forces t_coffee to run extract_from_pdb to check the PDB status of each sequence. This can considerably slow down the program.
3D-Coffee: Using SAP
It is possible to use t_coffee to compute multiple structural alignments. To do so, ensure that you have the sap program installed.
$ t_coffee -in struc1.pdb struc2.pdb struc3.pdb Msap_pair
will combine the pairwise alignments produced by SAP. There are currently two methods that can be interfaced with t_coffee: sap_pair, which uses the sap algorithm, and align_pdb, which uses a t_coffee implementation of sap (not as accurate). By default, the computation will be made only on the first chain contained in the PDB file. If your structure is an NMR structure, you are advised to provide the program with one structure only. If you wish to align only a portion of the structure, you should extract it yourself from the PDB file, using t_coffee -other_pg extract_from_pdb or any PDB-handling program. You can provide t_coffee with a mixture of sequen
28. Manual
CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE
Cédric Notredame
T-Coffee User Guide and Reference Manual
T-Coffee User Guide Version 3 18 September 2005
© Cédric Notredame and Centre National de la Recherche Scientifique, France
T-COFFEE REFERENCE MANUAL
License and Terms of Use
T-Coffee is distributed under the Gnu Public License
T-Coffee code can be re-used freely
Addresses and Contacts
Contributors
Addresses
What is T-COFFEE?
What is T-Coffee?
What can it align?
How can I use it?
Is T-Coffee different from ClustalW?
What T-Coffee Can and Cannot do for you
NOT Fetching Sequences
Aligning Sequences
Combining Alignments
Evaluating Alignments
Combining Sequences and Structures
Identifying Occurrences of a Motif: Mocca
How Does T-Coffee Work?
Standard Installation
Extended Installation and other Packages
St
29. ameterization associated with -special_mode turns off every memory-expensive heuristic within T-Coffee. For version 2.11, this amounts to:
$ t_coffee sample_seq1.fasta -in Mslow_pair Mlalign_id_pair -distance_matrix_mode slow -dp_mode myers_miller_pair_wise
If you keep running out of memory, you may also want to lower -maxnseq to ensure that t_coffee_dpa will be used.
Input/Output Control
Q: How many sequences can t_coffee handle?
A: T-Coffee is limited to a maximum of 50 sequences. Above this number, the program automatically switches to a heuristic mode named DPA, where DPA stands for Double Progressive Alignment. DPA is still in development, and the version currently shipped with T-Coffee is only a beta version.
Q: How many ways are there to pass parameters to t_coffee?
A: See the section Well Behaved Parameters.
Q: How can I change the default output format?
A: See the -output option. Common output formats are:
$ t_coffee sample_seq1.fasta -output msf,fasta_aln
Q: My sequences are slightly different between all the alignments.
A: It does not matter. T-Coffee will reconstruct a set of sequences that incorporates all the residues potentially missing in some of the sequences (see flag -in).
Q: Is it possible to pipe stuff OUT of t_coffee?
A: Specify stderr or stdout as the output filename; the output will be redirected accordingly. For instance:
$ t_coffee sample_seq1.fasta -outfile stdout -out_lib stdout
This instruction
30. and that you could build it using other methods (see the FAQ). In protein language, T-COFFEE is synonymous with freedom: the freedom of being aligned however you fancy. It is with that sort of statement that I got elected Chief Tryptophan Officer in some previous life.
Standard Installation
1. Decompress distribution.tar.gz:
gunzip distribution.tar.gz
2. Untar distribution.tar:
tar xvf distribution.tar
3. This will create the distribution directory with the following structure:
distribution/bin
distribution/doc/t_coffee_doc.pdf, t_coffee_doc.html
distribution/t_coffee_source
distribution/example
distribution/html
4. Go into the main directory and type install. You will know the installation proceeded completely with the mention: Installation of t_coffee Successful.
5. Add the bin folder to your path:
set path = ($path <address of the t_coffee bin folder>)
Note: The latest t_coffee distribution (2.15 and higher) is self-contained and only requires one executable. You may still require external modules (sap, blast, ClustalW) if you wish to use another mode than the default.
Note: When updating, make sure to remove the old distribution and any associated program from your path.
6. If you have PDB installed (assuming you have a standard PDB installation in your file system):
setenv PDB_DIR <pdb_dir>/structures/all
Note: This must be added to your login file.
Extended Installation and other Packages
By de
31. andard Alignments
Alignment Combination
Aligning Sequences and Structures
Aligning Sequences and Profiles
Using Structures or Templates Within Profiles
Using New and Existing Methods
Using Methods Integrated in T-Coffee
Integrating External Methods
Managing a Collection of Method Files
Advanced Method Integration
The Mother of All Method Files
Creating Your Own T-Coffee Libraries
Using Pre-Computed Alignments
Customizing the Weighting Scheme
Generating Your Own Libraries
Frequently Asked Questions
Abnormal Terminations and Wrong Results
Q: The program keeps crashing when I give my sequences
Q: The default alignment is not good enough
Q: The alignment contains obvious mistakes
Q: The program is crashing
Q: I am running out of memory
Input/Output Control
Q: How many sequences can t_coffee handle?
Q: How many ways to pas
32. ans
Technical Notes
Command Line List
T-Coffee is distributed under the Gnu Public License
Please make sure you have agreed with the terms of the license attached to the package before using the T-Coffee package or its documentation. T-Coffee is a freeware, open-source program distributed under a GPL license. This means that there is no restriction on its use, either in an academic or a non-academic environment.
T-Coffee code can be re-used freely
Our philosophy is that code is meant to be re-used, including ours. No permission is needed, although we are always happy to receive pieces of improved code.
Contributors
T-Coffee is developed by a dedicated team that includes Cédric Notredame, Olivier Poirot, Fabrice Armougom and Sebastien Moretti.
Addresses
We are always very eager to get some user feedback. Please do not hesitate to drop us a line at cedric.notredame@europe.com. The latest updates of T-Coffee are always available on http://igs-server.cnrs-mrs.fr/~cnotred. On this address, you will also find a link to some of the online T-Coffee servers, including Tcoffee@igs: http://igs-server.cnrs-mrs.fr/Tcoffee
T-Coffee can be used to automatically check if an updated version is available; however, the program will not update automatically, as this can cause endless reproducibility problems.
$ t_coffee -update
It is important that you cite T-Coffee when you use it. Citing us is almost like giving us money: it helps us convince our institutions that what we do is useful and that they should keep paying our salaries and delivering Donuts to our offices from time to time (not that they ever did it, but it would be nice anyway). Cite the server if you used it; otherwise, cite the original paper from 2000 (no, it was never named "T-Coffee 2000"):
Notredame C, Higgins DG, Heringa J. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000 Sep 8;302(1):205-17. PMID: 10964570.
Other useful publications include:
Claude JB, Suhre K, Notredame C, Claverie JM, Abergel C. CaspR: a web server for automated molecular replacement using homology modelling. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W606-9. PMID: 15215460.
Poirot O, Suhre K, Abergel C, O'Toole E, Notredame C. 3DCoffee@igs: a web server for combining sequences and structures into a multiple sequence alignment. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W37-40. PMID: 15215345.
O'Sullivan O, Suhre K, Abergel C, Higgins DG, Notredame C. 3DCoffee: combining protein sequences and structures within multiple seque
34.
-type
Structure Input
Tree Input
Methods and Library Input
-in
-profile1 [cw]
-profile2 [cw]
Library Computation: Methods
-lalign_n_top
-align_pdb_hasch_mode
Library Computation: Extension
-lib_list [Unsupported]
-extend
-seq_name_for_quadruplet
35. ces and structures. In this case, you should use the special mode:
$ t_coffee -special_mode 3dcoffee -seq 3d_sample3.fasta -template_file template_file.template
Using/Finding PDB Templates for the Sequences
-template_file
Usage: -template_file=<filename, SCRIPT_scriptname, SELF_TAG, SEQFILE_TAG_filename, no>
Default: no
This flag instructs t_coffee on the templates that will be used when combining several types of information. For instance, when using structural information, this file will indicate the structural template that corresponds to your sequences. The identifier T indicates that the file should be a FASTA-like file, formatted as follows. There are several ways to pass the templates:
1. File name. This file contains the sequence/template association. It uses a FASTA-like format, as follows:
><sequence name> _P_ <pdb template>
><sequence name> _G_ <gene template>
><sequence name> _R_ <MSA template>
Each template will be used in place of the sequence with the appropriate method. For instance, structural templates will be aligned with sap_pair, and the information thus generated will be transferred onto the alignment. Note the following rules:
Each sequence can have one template of each type (structural, genomic, ...).
Each sequence can only have one template of a given type.
Several sequences can share the same template.
All the sequences do not need to have a temp
36.
PARAM     -method clustalw
PARAM     -OUTORDER=INPUT -NEWTREE=core -align -gapopen=15
*******************************************************
*                        END
*******************************************************
Creating Your Own T-Coffee Libraries
If the method you want to use is not integrated, or impossible to integrate, you can generate your own libraries, either directly or by turning existing alignments into libraries. You may also want to precompute your libraries in order to combine them at your convenience.
Using Pre-Computed Alignments
If the method you wish to use is not supported, or if you simply have the alignments, the simplest thing to do is to generate the pairwise/multiple alignments yourself, in FASTA, ClustalW, MSF or PIR format, and feed them into t_coffee using the -in flag:
$ t_coffee -in Asample_aln1_1.aln Asample_aln1_2.aln -outfile combined_aln.aln
Customizing the Weighting Scheme
The previous integration method forces you to use the same weighting scheme for each alignment and for the rest of the libraries generated on the fly. This weighting scheme is based on global pairwise sequence identity. If you want to use a more specific weighting scheme with a given method, you should either generate your o
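Turning an existing alignment into a library, with every aligned residue pair weighted by the global identity of its two sequences, can be sketched in Python. This is a conceptual illustration only: the identity definition used here (identical residues over columns where both sequences have a residue) is our assumption, the function names are hypothetical, and the returned tuples are not T-Coffee's library file format.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Percent identity over columns where both rows carry a residue."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0
    same = sum(1 for x, y in pairs if x.upper() == y.upper())
    return round(100 * same / len(pairs))


def alignment_to_library(msa):
    """msa: dict name -> gapped row (all rows the same length).

    Returns (name_i, name_j, pos_i, pos_j, weight) tuples with 1-based
    residue positions; every residue pair aligned between two sequences
    receives the identity of that sequence pair as its weight.
    """
    library = []
    for (ni, ri), (nj, rj) in combinations(msa.items(), 2):
        weight = pairwise_identity(ri, rj)
        pi = pj = 0  # residue counters, advanced only on non-gap characters
        for x, y in zip(ri, rj):
            if x != "-":
                pi += 1
            if y != "-":
                pj += 1
            if x != "-" and y != "-":
                library.append((ni, nj, pi, pj, weight))
    return library


if __name__ == "__main__":
    for entry in alignment_to_library({"a": "AC-GT", "b": "ACTGA"}):
        print(entry)
```

Because the weight is shared by all pairs from one sequence pair, every alignment fed through -in is weighted the same way, which is exactly the limitation the next section addresses.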
37. e Profiles
Q: Can I align two profiles according to the structures they contain?
Q: How good is my alignment?
Q: What is that color index?
Q: Can I evaluate alignments NOT produced with T-Coffee?
Q: Can I compare two alignments?
Q: I am aligning sequences with long regions of very good overlap
Entering the right parameters
Parameters
-special_mode
-score [Deprecated]
-align [cw]
Special Parameters
-version
-check_configuration
-update
-other_pg
Sequence Input
-infile [cw]
-in (cf. -in from the Method and Library Input section)
38. e named according to the sequences. For instance, if your protein sequences have been recoded with exon/intron information, you should have the recoded sequences named according to the original:
SEQFILE_G_recodedprotein.fasta
-struc_to_use
Usage: -struc_to_use=<struc1, struc2, ...>
Default: -struc_to_use=NULL
Restricts 3D-Coffee to a set of predefined structures.
Multiple Local Alignments
It is possible to compute multiple local alignments using the moca routine. MOCA is a routine that allows extracting all the local alignments that show some similarity with another predefined fragment. mocca is a perl script that calls t_coffee and provides it with the appropriate parameters.
-domain/mocca
Usage: -domain
Default: not set
This flag indicates that t_coffee will run using the domain mode. All the sequences will be concatenated, and the resulting sequence will be compared to itself using the lalign_rs_s_pair mode (lalign of the sequence against itself, keeping the lalign raw score). This step is the most computer-intensive, and it is advisable to save the resulting file:
$ t_coffee -in Ssample_seq1.fasta Mlalign_rs_s_pair -out_lib sample_lib1.mocca_lib -domain -start 100 -len 50
This instruction will use the fragment 100-150 on the concatenated sequences as a template for the extracted repeats. The extraction will only be made once. The library will be placed in the file <lib name>. If you want, you can test
39. een 1 and >
Default: -clean_iteration=1
See -clean_aln for details.
-clean_evaluation_mode
Usage: -clean_evaluation_mode=<evaluation mode>
Default: -clean_evaluation_mode=t_coffee_non_extended
Indicates the mode used for the evaluation that will indicate the segments that should be realigned. See -evaluation_mode for the list of accepted modes.
-iterate
Usage: -iterate=<integer>
Default: -iterate=0
Sequences are extracted in turn and realigned to the MSA. If -iterate is set to 1, each sequence is realigned; otherwise, the number of iterations is set by -iterate.
CPU Control
Multithreading
-multi_thread [NOT Supported]
Usage: -multi_thread=<N>
Default: 0
Specifies that the program should be used in multithreading mode. N specifies the number of processors available. If you are using a quadriprocessor:
$ t_coffee sample_seq2.fasta -multi_thread 4
Limits
-mem_mode
Usage: deprecated
-ulimit
Usage: -ulimit=<value>
Default: -ulimit=0
Specifies the upper limit of memory usage (in megabytes). Processes exceeding this limit will automatically exit. A value of 0 indicates that no limit applies.
-maxlen
Usage: -maxlen=<value, 0=nolimit>
Default: -maxlen=1000
Indicates the maximum length of the sequences.
Aligning more than 100 sequences with DPA
-maxnseq
Usage: -maxnseq=<value, 0=nolimit>
Default: -maxnseq=50
Indicates the maximum number of sequences before triggering
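The -maxnseq threshold can be restated as a one-line predicate. A minimal Python sketch, assuming (as the FAQ on sequence limits and the -profile_mode description state) that DPA takes over when the number of sequences is strictly higher than -maxnseq, and that a value of 0 removes the limit; uses_dpa is a hypothetical name.

```python
def uses_dpa(n_sequences, maxnseq=50):
    """True when DPA would be triggered: more sequences than -maxnseq.

    A -maxnseq of 0 means "no limit", so the count never triggers DPA.
    """
    return maxnseq != 0 and n_sequences > maxnseq


if __name__ == "__main__":
    for n in (10, 50, 51, 300):
        print(n, "sequences ->", "DPA" if uses_dpa(n) else "standard")
```

Lowering -maxnseq, as suggested for memory problems, simply moves this threshold down.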
40. efit of this facility:
$ t_coffee 3d.fasta -special_mode 3dcoffee
Using this mode will cause T-Coffee to automatically identify the target corresponding to your sequence, as indicated by an NCBI BLAST. T-Coffee then obtains the required PDB sequences from RCSB. However, if you are also using -template_file, the program will use the template you specified and the corresponding files on your disk. All these network-based operations are carried out using wget. If wget is not installed on your system, you can get it for free from www.wget.org. To make sure wget is installed on your system, type:
$ which wget
Identifying Occurrences of a Motif: Mocca
Mocca is a special mode of T-Coffee that allows you to extract a series of repeats from a single sequence or a set of sequences. In other words, if you know the coordinates of one copy of a repeat, you can extract all the other occurrences. If you want to use Mocca, simply type:
$ t_coffee -other_pg mocca sample_seq1.fasta
The program needs some time to compute a library, and it will then prompt you with an interactive menu. Follow the instructions.
How Does T-Coffee Work?
If you only want to make standard multiple alignments, you may skip these explanations. But if you want to do more sophisticated things, these few indications may help before you start reading the doc and the papers. When you run T-Coffee, the first thing it does is to compute a library. The library is
41. es will see their weight set to 1.
1. Sensitivity to sequence order: it is difficult to implement an MSA algorithm totally insensitive to the order of input of the sequences. In t_coffee, robustness is increased by sorting the sequences alphabetically before aligning them. Beware that this can result in confusing output, where sequences with similar names are unexpectedly close to one another in the final alignment.
2. Nucleotide sequences with long stretches of Ns will cause problems to lalign, especially when using Mocca. To avoid any problem, filter out these nucleotides before running mocca.
3. Stop codons are sometimes coded with '*' in protein sequences. This will cause the program to crash or hang. Please replace the '*' signs with an X.
4. Results can differ from one architecture to another, due to rounding differences. This is caused by the tree estimation procedure. If you want to make sure an alignment is reproducible, you should keep the associated dendrogram.
These notes are only meant for internal development.
Development
The following examples are only meant for internal development and are used to ensure stability from release to release.
PROFILE2LIST
prf1: profile containing one structure
prf2: profile containing one structure
$ t_coffee Rsample_profile1.aln Rsample_profile2.aln -special_mode 3dcoffee -outfile aligned_prf.aln
Command Line List
These command lines have been checked before every release, along with the other command lines in this documentation.
external methods
$ t_coffee sample_seq1.fasta -in Mclustalw_pair Mclustalw_msa Mslow_pair -outfile clustal_text
fugue client
$ t_coffee -in Ssample_seq5.fasta Pstruc4.pdb Mfugue_pair
implement UPGMA tree computation
implement seq2dpa_tree
debug dpa
Reconciliate sequences and template when reading the template
Add the server command lines to the checking procedure
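Known problems 2 and 3 in the list above (stretches of Ns, '*' stop codons) can be handled before running t_coffee. A minimal Python sketch with hypothetical function names; the manual only says "long stretches" of Ns, so the min_run threshold of 10 is an arbitrary assumption.

```python
import re


def replace_stop_codons(protein):
    """Replace '*' stop-codon symbols with X, as recommended in problem 3."""
    return protein.replace("*", "X")


def strip_n_runs(dna, min_run=10):
    """Remove runs of at least min_run N/n characters (threshold is assumed).

    Removing residues shifts coordinates, so keep the original sequence
    if you need to map positions back later.
    """
    return re.sub(r"[Nn]{%d,}" % min_run, "", dna)


if __name__ == "__main__":
    print(replace_stop_codons("MKV*LL*"))
    print(strip_n_runs("ACGT" + "N" * 12 + "TTA"))
```

Short runs below the threshold are left untouched, so isolated ambiguous bases survive the filter.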
43. fault, T-Coffee does not require any other package than those included in the distribution. However, depending on your needs, you may want to install some of the following:
ClustalW: can interact with t_coffee
wget: 3DCoffee (automatic downloading of structures), remote use of the Fugue server
sap: structure/structure comparisons (obtain it from W. Taylor, NIMR-MRC)
Blast: www.ncbi.nlm.nih.gov
Once the package is installed, make sure that the executable is on your path, so that t_coffee can find it automatically.
IMPORTANT: All the files mentioned here (sample_seq) can be found in the example directory of the distribution.
T-COFFEE
Write your sequences in the same file (Swiss-Prot, FASTA or PIR) and type:
$ t_coffee sample_seq1.fasta
This will output two files:
sample_seq1.aln: your multiple sequence alignment
sample_seq1.dnd: the guide tree (newick format)
IMPORTANT: If you are trying to align nucleic acids, use -special_mode dna:
$ t_coffee -in sample_dnaseq1.fasta -special_mode dna
MOCCA
Write your sequences in the same file (Swiss-Prot, FASTA or PIR) and type:
$ t_coffee -other_pg mocca sample_seq1.fasta
This command outputs one file (<your sequences>.mocca_lib) and starts an interactive menu.
Use of as a separator when specifying method parameters
The most notable modifications have to do with the struct
44. ffee -in Ssample_seq1.fasta Lsample_seq1.tc_lib -outfile sample_seq1.aln
If you only want some of these residues to be aligned, or want to give them individual weights, you will have to edit the library file yourself or use the -force_aln option (cf. FAQ: I would like to force some residues to be aligned). A value of N*N*1000 (N being the number of sequences) usually ensures the respect of a constraint.
Q: I want to use my own tree.
A: Use the -usetree=<your own tree> flag:
$ t_coffee sample_seq1.fasta -usetree sample_tree.dnd
Q: I want to align coding DNA.
A: Use the cdna_fast_pair method, which compares two cDNAs using the best reading frame and taking frameshifts into account:
$ t_coffee sample_seq4.fasta -in Mcdna_fast_pair
Notice that in the resulting alignments all the gaps are of modulo 3, except one small gap in the first line of sequence hmgl_trybr. This is a frameshift made on purpose. You can realign the same sequences while ignoring their coding potential and treating them like standard DNA:
$ t_coffee sample_seq4.fasta
Note: This method has not yet been fully tested and is only provided as is, with no warranty.
Q: I do not want to use all the possible pairs when computing the library.
Q: I only want to use specific pairs to compute the library.
A: Simply write in a file the list of sequence groups you want to use:
$ t_coffee sample_seq1.fasta -in Mclustalw_pair Mclustalw_msa -lib_l
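The "gaps of modulo 3" remark about cDNA alignments can be checked mechanically. A minimal Python sketch (non_codon_gaps is a hypothetical name) that scans one aligned row and reports every gap run whose length is not a multiple of 3, which is how a deliberate frameshift such as the one in hmgl_trybr would show up.

```python
import re


def non_codon_gaps(row):
    """Return (start, length) for each gap run not divisible by 3.

    start is the 0-based column where the run of '-' characters begins.
    """
    return [(m.start(), len(m.group()))
            for m in re.finditer(r"-+", row)
            if len(m.group()) % 3 != 0]


if __name__ == "__main__":
    print(non_codon_gaps("ATG---AAA--C"))
```

Running it on each row of the cdna_fast_pair output would list only the frameshift gaps, since codon-sized gaps are skipped.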
45. handling of names is consistent with ClustalW (cf. Sequence Name Handling in the Format section). If your dataset contains sequences with identical names, these will automatically be renamed:
*************************
>seq1
>seq1
*************************
becomes
*************************
>seq1
>seq1_1
*************************
Warning: The behaviour is undefined when this creates two sequences with similar names.
This reference manual gives a list of all the flags that can be used to modify the behavior of T-Coffee. For your convenience, we have grouped them according to their nature. To display a list of all the flags used in the version of T-Coffee you are using, along with their default values, type:
$ t_coffee
or
$ t_coffee -help
or
$ t_coffee -help -in
(or any other parameter).
Well Behaved Parameters
Separation
You can use any kind of separator you want (e.g. <space>). The syntax used in this document is meant to be consistent with that of ClustalW. However, in order to take advantage of the automatic filename completion provided by many shells, you can replace these separators with a space.
Posix
T-Coffee is not POSIX compliant.
Entering the right parameters
There are many ways to enter parameters in T-Coffee (see the -parameter flag).
Parameters Syntax
No Flag
If no flag is used, <your sequence> must be the first a
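The duplicate-name rule (>seq1, >seq1 becomes >seq1, >seq1_1) can be sketched in Python. The manual only shows the second occurrence receiving _1; numbering further copies _2, _3 and so on is our assumption, and rename_duplicates is a hypothetical helper. Like the program, it does not guard against a generated name colliding with a name already in the input.

```python
def rename_duplicates(names):
    """Append _1, _2, ... to repeated names: seq1, seq1 -> seq1, seq1_1.

    A generated name may collide with an existing one (e.g. a real 'seq1_1');
    the manual calls that case undefined and this sketch does not handle it.
    """
    seen = {}
    renamed = []
    for name in names:
        count = seen.get(name, 0)
        renamed.append(name if count == 0 else f"{name}_{count}")
        seen[name] = count + 1
    return renamed


if __name__ == "__main__":
    print(rename_duplicates(["seq1", "seq1", "seq2"]))
```

The collision case is exactly why the warning above advises against datasets whose names differ only by such suffixes.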
46. ights can be output using the -outseqweight flag.
Note: You can use your own weights (see the format section).
Multiple Alignment Computation
-msa_mode
Usage: -msa_mode=<tree, graph, precomputed>
Default: -msa_mode=tree
[Unsupported]
-profile_comparison
Usage: -profile_mode=<fullN, profile>
Default: -profile_mode=full50
The -profile_mode flag controls the multiple profile alignments in T-Coffee. There are two instances where t_coffee can make multiple profile alignments:
1. When N, the number of sequences, is higher than -maxnseq, the program switches to its multiple profile alignment mode (t_coffee_dpa).
2. When MSAs are provided via the -profile flag, or via -profile1 and -profile2.
In these situations, the -profile_mode value influences the alignment computation. These values are:
-profile_comparison=profile: the MSAs provided via -profile are vectorized, and the function specified by -profile_comparison is used to make profile/profile alignments. In that case, the complexity is N*L^2.
-profile_comparison=fullN: N is an integer value that can be omitted. Full indicates that, given two profiles, the alignment will be based on a library that includes every possible pair of sequences between the two profiles. If N is set, then the library will be restricted to the N most similar pairs of sequences between the two profiles, as judged from a measure made on a pairwise alignment of these two profiles.
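The fullN restriction amounts to ranking all cross-profile sequence pairs and keeping the N best. A minimal Python sketch under stated assumptions: the similarity function is a stand-in you supply (T-Coffee derives it from a pairwise alignment of the two profiles), n=None plays the role of plain "full", and restrict_to_top_pairs is a hypothetical name.

```python
import heapq
from itertools import product


def restrict_to_top_pairs(profile_a, profile_b, similarity, n=50):
    """Keep the n most similar (seq_a, seq_b) pairs across two profiles.

    similarity(a, b) is a caller-supplied stand-in for T-Coffee's measure.
    With n=None every cross-profile pair is kept, as in plain "full".
    """
    pairs = list(product(profile_a, profile_b))
    if n is None:
        return pairs
    return heapq.nlargest(n, pairs, key=lambda pair: similarity(*pair))


if __name__ == "__main__":
    scores = {("a1", "b1"): 10, ("a1", "b2"): 80,
              ("a2", "b1"): 50, ("a2", "b2"): 20}
    top = restrict_to_top_pairs(["a1", "a2"], ["b1", "b2"],
                                lambda x, y: scores[(x, y)], n=2)
    print(top)
```

The default n=50 mirrors the documented default of full50; the number of library pairs then stays bounded no matter how large the two profiles are.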
47. ignments can be viewed as collections of constraints that must be fitted within the final alignment. Of course, the constraints do not have to agree with one another. This section shows you what the available methods in T-Coffee are, and how you can add your own methods, either through direct parameterization or via a perl script.
Using Methods Integrated in T-Coffee
Some packages already have an interface with t_coffee. These include:
align_pdb: ALIGN_PDB_4_TCOFFEE
sap: SAP_4_TCOFFEE
lalign2list: LALIGN_4_TCOFFEE
clustalw: CLUSTALW_4_TCOFFEE
If these programs are installed on your system and you want t_coffee to use a specific version:
setenv CLUSTALW_4_TCOFFEE <path to your version>
LIST OF INTERNAL METHODS
Built-in methods can be requested using the following names:
fast_pair: Makes a global fasta-style pairwise alignment. For proteins: matrix=blosum62mt, gep=1, gop=10, ktup=2. For DNA: matrix=idmat, id=10, gep=1, gop=20, ktup=5. Each pair of residues is given a score that is a function of the weighting mode defined by -weight.
slow_pair: Identical to fast_pair, but does a full dynamic programming, using the Myers and Miller algorithm. This method is recommended if your sequences are distantly related.
ifast_pair, islow_pair: Makes a global fasta alignment using the previously computed pairs as a library (i stands for iterative). Each pair of residues is given a score that is a function of the weighting mode defined b
48. ion should be provided. For instance:
$ t_coffee sample_seq1.fasta -in Xpam250mt -gapopen 10 -gapext 1
This command results in a progressive alignment carried out on the sequences in seqfile. The procedure is very similar to Pileup. In this context, appropriate gap penalties should be provided. The matrices are in the file source/matrices.h. Ad hoc matrices can also be provided by the user (see the matrices format section at the end of this manual).
Profile Input
-profile
Usage: -profile=<name> (maximum of 200 profiles)
Default: no default
This flag causes T-Coffee to treat multiple alignments as single sequences, thus making it possible to make multiple profile alignments. The profile/profile alignment is controlled by -profile_mode and -profile_comparison. When provided with the -in flag, profiles must be preceded with the letter R:
$ t_coffee -profile sample_aln1.aln sample_aln2.aln -outfile profile.aln
$ t_coffee -in Rsample_aln1.aln Rsample_aln2.aln Mslow_pair Mlalign_id_pair -outfile profile.aln
Note that, when using -template_file, the program will also look for the templates associated with the profiles, even if the profiles have been provided as templates themselves. However, it will not look for the templates of the profile templates.
-profile1 [cw]
Usage: -profile1=<name> (one name only)
Default: no default
Similar to the previous one and was pro
49. ist sample_list1.lib_list -outfile test
***************sample_list1.lib_list***************
2 hmgl_trybr hmgt_mouse
2 hmgl_trybr hmgb_chite
2 hmgl_trybr hmgl_wheat
3 hmgl_trybr hmgl_wheat hmgl_mouse
***************************************************
Note: Pairwise methods (slow_pair, ...) will only be applied to lists of pairs of sequences, while multiple methods (clustalw_aln) will be applied to any dataset having more than two sequences.
Q: There are duplicates or quasi-duplicates in my set.
A: If you can remove them, this will make the program run faster; otherwise, the t_coffee scoring scheme should be able to avoid over-weighting of over-represented sequences.
Using Structures and Profiles
Q: Can I align sequences to a profile with T-Coffee?
A: Yes, you simply need to indicate that your alignment is a profile with the R tag:
$ t_coffee sample_seq1.fasta Rsample_aln2.aln -outfile tacos
Q: Can I align sequences to two or more profiles?
A: Yes, you simply tag your profiles with the letter R, and the program will treat them like standard sequences:
$ t_coffee Rsample_aln1.fasta Rsample_aln2.aln -outfile tacos
Q: Can I align two profiles according to the structures they contain?
A: Yes, as long as the structure sequences are named according to their PDB identifier:
$ t_coffee Rsample_profile1.aln Rsample_profile2.aln -special_mode 3dcoffee -outfile aligned_prf.aln
Q: T-Coffee becomes very slow when combining sequences and structures.
A: This is true. By defaul
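The lib_list file shown above is simple enough to read in a few lines. A minimal Python sketch (parse_lib_list and split_by_method_type are hypothetical names): each data line gives a group size followed by that many sequence names; the decorative '*' borders are skipped. The split mirrors the note that pairwise methods only see the pairs, while multiple methods apply to groups of more than two sequences.

```python
def parse_lib_list(text):
    """Parse lines like '2 hmgl_trybr hmgt_mouse' into tuples of names.

    The leading integer is the group size and is checked against the names.
    Lines that do not start with an integer (blank lines, '*' borders) are skipped.
    """
    groups = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue
        count, names = int(fields[0]), fields[1:]
        if count != len(names):
            raise ValueError(f"expected {count} names in: {line!r}")
        groups.append(tuple(names))
    return groups


def split_by_method_type(groups):
    """Pairs go to pairwise methods; groups of 3+ suit multiple methods."""
    pairs = [g for g in groups if len(g) == 2]
    multi = [g for g in groups if len(g) > 2]
    return pairs, multi


SAMPLE_LIST = """\
2 hmgl_trybr hmgt_mouse
2 hmgl_trybr hmgb_chite
2 hmgl_trybr hmgl_wheat
3 hmgl_trybr hmgl_wheat hmgl_mouse
"""

if __name__ == "__main__":
    print(split_by_method_type(parse_lib_list(SAMPLE_LIST)))
```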
ith the R identifier:

  t_coffee Ssample_seq1.fasta Rsample_aln2.aln -outfile=seqprofile.aln

All the internal methods should support profiles. External methods do not support profiles unless specified otherwise.

Using Structures or Templates Within Profiles

If your profiles contain structures, you can make sure these will be used during the computation by specifying the 3dcoffee special mode:

  t_coffee Rsample_profile1.aln Rsample_profile2.aln -special_mode=3dcoffee -outfile=aligned_prf.aln

Note that when providing a collection of templates, the program will use the -template_file flag to look for templates within the sequences AND within the profiles associated with some sequences.

Using New and Existing Methods

Although it does not necessarily do so explicitly, T-Coffee always ends up combining libraries. Libraries are collections of pairs of residues. Given a set of libraries, T-Coffee makes an attempt to assemble the alignment with the highest level of consistency. You can think of the alignment as a timetable: each library pair would be a request from students or teachers, and the job of T-Coffee would be to assemble the timetable that makes as many people as possible happy. In T-Coffee, methods replace the students and professors as constraint generators. These methods can be any standard or non-standard alignment methods that can be used to generate alignments (pairwise, most of the time). These al
late. The type of template on which a method works is declared with the SEQ_TYPE parameter in the method configuration file:

  SEQ_TYPE S: a method that uses sequences
  SEQ_TYPE PS: a pairwise method that aligns sequences and structures
  SEQ_TYPE P: a method that aligns structures (sap, for instance)

There are 3 tags identifying the template type:

  _P_: Structural templates: a PDB identifier OR a PDB file
  _G_: Genomic templates: a protein sequence where boundary amino acids have been recoded with 0 0 i 1 j 2
  _R_: Profile templates: a file containing a multiple sequence alignment

More than one template file can be provided. There is no need to have one template for every sequence in the dataset.

_P_, _G_ and _R_ are known as template TAGS.

2. SCRIPT_<scriptname>: indicates that filename is a script that will be used to generate a valid template file. The script will run on a file containing all your sequences, using the following syntax:

  scriptname -infile=<your sequences> -outfile=<template file>

It is also possible to pass some parameters: use @ as a separator, i.e. the @ will be turned into a space:

  SCRIPT_myscript1@val1@val2

3. SELF_TAG: the original name of the sequence will be used to fetch the template:

  t_coffee 3d_sample2.fasta -template_file=SELF_P_

The previous command will work because the sequences in 3d_sample2.fasta are named 

4. SEQFILE_TAG_filename: use this flag if your templates are in filename and ar
ly carried out on pairs having a weight value superior to the specified limit.

-extend_mode
Usage: -extend_mode=<string>
Default: -extend_mode=very_fast_triplet
Warning: Development Only

Controls the algorithm for matrix extension. Available modes include:
  relative_triplet: Unsupported
  g_coffee: Unsupported
  g_coffee_quadruplets: Unsupported
  fast_triplet: Fast triplet extension
  very_fast_triplet: slow_triplet extension limited to the -max_n_pair best sequence pairs when aligning two profiles
  slow_triplet: Exhaustive use of all the triplets
  mixt: Unsupported
  quadruplet: Unsupported
  test: Unsupported
  matrix: Use of the matrix given by -matrix
  fast_matrix: Use of the matrix given by -matrix. Profiles are turned into consensus sequences

-max_n_pair
Usage: -max_n_pair=<integer>
Default: -max_n_pair=10
Development Only

Controls the number of pairs considered by -extend_mode=very_fast_triplet. Setting it to 0 forces all the pairs to be considered (equivalent to -extend_mode=slow_triplet).

-seq_name_for_quadruplet
Usage: Unsupported

-compact
Usage: Unsupported

-clean
Usage: Unsupported

-maximise
Usage: Unsupported

-do_self
Usage: -do_self (flag)
Default: No

This flag causes the extension to be carried out within the sequences, as opposed to between sequences. This is necessary when looking for internal repeats with Mocca.

-weight
Usage: -weight=<winsimN, sim or sim_matrix_name or
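The triplet extension modes combine library pairs through intermediate sequences. As a rough illustration of the exhaustive case (slow_triplet), the Python sketch below uses the weighting described in the original T-Coffee paper: the extended weight of a residue pair is its direct weight plus, for every residue of a third sequence aligned to both members, the smaller of the two weights. Residues are modelled as (sequence_id, position) tuples, which is an assumption of this sketch; the -max_n_pair shortcut used by very_fast_triplet and the profile handling are not modelled.

```python
from collections import defaultdict


def _key(a, b):
    """Canonical (ordered) key for an unordered pair of residues."""
    return (a, b) if a <= b else (b, a)


def extend_library(primary):
    """Exhaustive triplet extension (in the spirit of slow_triplet).

    primary: {(res_a, res_b): weight}, where a residue is (sequence_id, position).
    Returns {pair: extended weight} = direct weight + sum over third residues c
    aligned to both members of min(w(a, c), w(c, b)).
    """
    direct = defaultdict(int)
    for (a, b), w in primary.items():
        direct[_key(a, b)] += w

    partners = defaultdict(dict)  # residue -> {partner residue: weight}
    for (a, b), w in direct.items():
        partners[a][b] = w
        partners[b][a] = w

    extended = defaultdict(int, direct)
    for c, links in partners.items():
        items = list(links.items())
        for i, (a, w_ca) in enumerate(items):
            for b, w_cb in items[i + 1:]:
                if a[0] == b[0] or c[0] in (a[0], b[0]):
                    continue  # a, b and c must come from three different sequences
                extended[_key(a, b)] += min(w_ca, w_cb)
    return dict(extended)


lib = {(("A", 1), ("B", 1)): 50, (("A", 1), ("C", 1)): 80, (("B", 1), ("C", 1)): 60}
print(extend_library(lib)[(("A", 1), ("B", 1))])  # prints: 110
```

Note that a pair with no direct weight can still receive an extended weight when both residues are aligned to the same third residue.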
n_id_pair to the set of sequences previously obtained. The final library used for the alignment will be the combination of all this information.

Note as well the following rules:

1. Order: the order in which sequences, methods, alignments and libraries are fed in is irrelevant.
2. Heterogeneity: there is no need for each element (A, S, L) to contain the same sequences.
3. No Duplicate: each file should contain only one copy of each sequence. Duplicates are only allowed in FASTA files, but they will cause the sequences to be renamed.
4. Reconciliation: if two files (for instance two alignments) contain different versions of the same sequence due to an indel, a new sequence will be reconstructed and used instead:

  aln 1: hgabl AAAAABAAAAA
  aln 2: hgabl AAAAAAAAAACCC

will cause the program to reconstruct and use the following sequence:

  hgabl AAAAABAAAAACCC

This can be useful if you are trying to combine several runs of BLAST, or structural information where residues may have been deleted. However, substitutions are forbidden: if two sequences with the same name cannot be merged, they will cause the program to exit with an information message.
5. Methods: the method describer can either be built-in (see the list of available methods) or be a file describing the method to be used. The exact syntax is provided in part 4 of this manual.
6. Substitution Matrices: if the method is a substitution matrix (X), then no other type of informat
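The reconciliation rule can be pictured as building the shortest sequence that contains both versions, allowing indels but not substitutions. The Python sketch below does this with a longest-common-subsequence table and reproduces the hgabl example. It is not T-Coffee's code: in particular, treating a residue found only in one version that sits directly next to a residue found only in the other as a substitution is an assumption made here to mimic the "substitutions are forbidden" rule.

```python
def reconcile(s1, s2):
    """Merge two versions of one sequence that differ only by indels.

    Builds a shortest common supersequence from a longest-common-subsequence
    table. A residue present only in s1 sitting right next to one present only
    in s2 is treated as a substitution and rejected (an assumption of this sketch).
    """
    n, m = len(s1), len(s2)
    lcs = [[0] * (m + 1) for _ in range(n + 1)]  # lcs[i][j]: LCS length of s1[i:], s2[j:]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if s1[i] == s2[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    merged, origin = [], []  # origin: "M" shared, "1" only in s1, "2" only in s2
    i = j = 0
    while i < n or j < m:
        if i < n and j < m and s1[i] == s2[j]:
            merged.append(s1[i])
            origin.append("M")
            i += 1
            j += 1
        elif j == m or (i < n and lcs[i + 1][j] >= lcs[i][j + 1]):
            merged.append(s1[i])
            origin.append("1")
            i += 1
        else:
            merged.append(s2[j])
            origin.append("2")
            j += 1

    trace = "".join(origin)
    if "12" in trace or "21" in trace:
        raise ValueError("sequences differ by a substitution and cannot be merged")
    return "".join(merged)


print(reconcile("AAAAABAAAAA", "AAAAAAAAAACCC"))  # prints: AAAAABAAAAACCC
```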
nce alignments. J Mol Biol. 2004 Jul 2;340(2):385-95. PMID: 15201059

Poirot O, O'Toole E, Notredame C. Tcoffee@igs: a web server for computing, evaluating and combining multiple sequence alignments. Nucleic Acids Res. 2003 Jul 1;31(13):3503-6. PMID: 12824354

Notredame C. Mocca: semi-automatic method for domain hunting. Bioinformatics. 2001 Apr;17(4):373-4. PMID: 11301309

Notredame C, Higgins DG, Heringa J. T-Coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000 Sep 8;302(1):205-17. PMID: 10964570

Notredame C, Holm L, Higgins DG. COFFEE: an objective function for multiple sequence alignments. Bioinformatics. 1998 Jun;14(5):407-22. PMID: 9682054

Mocca
Notredame C. Mocca: semi-automatic method for domain hunting. Bioinformatics. 2001 Apr;17(4):373-4. PMID: 11301309

CORE
http://igs-server.cnrs-mrs.fr/~cnotred/Publications/Pdf/core.pp.pdf

Other Contributions

Some pieces of code from other packages have been incorporated within the T-Coffee package. These include the Sim algorithm of Huang and Miller that, given two sequences, computes the N be
non-character signs are ignored in the sequence field (such as numbers or annotation).
Note: a different number of lines in the different blocks will cause the program to crash or hang.

Libraries

T-COFFEE_LIB_FORMAT_01
This is currently the only supported format:

  ! TC_LIB_FORMAT_01
  <nseq>
  <seq1 name> <seq1 length> <seq1>
  <seq2 name> <seq2 length> <seq2>
  <seq3 name> <seq3 length> <seq3>
  !Comment
  !Comment
  #S1 S2
  R1 R2 V1 (V2 V3)
  ...
  ! SEQ_1_TO_N

S1: index of Sequence 1
R1: index of residue 1 in seq1
V1: Integer Value (Weight)
V2, V3: optional values

Note 1: There is a space between the ! and SEQ_1_TO_N.
Note 2: The last line (! SEQ_1_TO_N) indicates that sequences and residues are numbered from 1 to N, unless the token SEQ_1_TO_N is omitted, in which case the sequences are numbered from 0 to N-1 and the residues from 1 to N.

Residues do not need to be sorted, and neither do the sequences. The same pair can appear several times in the library. For instance, the following file would be legal:

  #0 1
  12 13 99
  #0 2
  15 16 99
  #0 1
  12 14 79

T-COFFEE_LIB_FORMAT_02
A simpler format is being developed. However, it is not yet fully supported and is only mentioned here for development purposes.
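To make the TC_LIB_FORMAT_01 layout described above concrete, here is a minimal Python reader. It is a sketch, not T-Coffee's parser: it assumes comment lines start with "!", that each sequence line is "name length residues", and it returns residue pairs with sequence indices converted to 0-based (subtracting 1 when the "! SEQ_1_TO_N" token is present) and residue indices left 1-based, following Note 2.

```python
def parse_tc_lib(text):
    """Read a TC_LIB_FORMAT_01 library.

    Returns (sequences, pairs): sequences is a list of (name, length, residues);
    pairs is a list of (seq_i, res_i, seq_j, res_j, weight) with sequence indices
    converted to 0-based and residue indices left 1-based.
    Assumption: comment lines start with '!' (like the two format tokens).
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if lines[0].replace(" ", "") != "!TC_LIB_FORMAT_01":
        raise ValueError("missing '! TC_LIB_FORMAT_01' header")
    nseq = int(lines[1])
    sequences = []
    for line in lines[2:2 + nseq]:
        name, length, residues = line.split()[:3]
        sequences.append((name, int(length), residues))

    # With '! SEQ_1_TO_N' present, '#S1 S2' headers count sequences from 1.
    one_based = any(line.replace(" ", "") == "!SEQ_1_TO_N" for line in lines)
    offset = 1 if one_based else 0
    pairs, current = [], None
    for line in lines[2 + nseq:]:
        if line.startswith("!"):
            continue  # comments and the SEQ_1_TO_N token
        if line.startswith("#"):
            s1, s2 = (int(x) - offset for x in line[1:].split())
            current = (s1, s2)
            continue
        if current is None:
            raise ValueError(f"residue pair before any '#S1 S2' line: {line!r}")
        r1, r2, weight = (int(x) for x in line.split()[:3])
        pairs.append((current[0], r1, current[1], r2, weight))
    return sequences, pairs
```

Only V1 is kept; the optional V2 and V3 values are ignored by this sketch.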
o run the

  t_coffee -other_pg=unpack_clustalw_method.tc_method
  t_coffee -other_pg=unpack_generic_method.tc_method

The second file, generic_method.tc_method, contains many hints on how to customize your new method. The first file is a very straightforward example of how to have t_coffee run ClustalW with a set of parameters you may be interested in:

  *TC_METHOD_FORMAT_01
  ***************clustalw_method.tc_method*********
  EXECUTABLE clustalw
  ALN_MODE   pairwise
  IN_FLAG    -INFILE=
  OUT_FLAG   -OUTFILE=
  OUT_MODE   aln
  PARAM      -gapopen=-10
  SEQ_TYPE   S
  *************************************************

This configuration file will cause T-Coffee to emit the following system call:

  clustalw -INFILE=tmpfile1 -OUTFILE=tmpfile2 -gapopen=-10

Note that ALN_MODE instructs t_coffee to run clustalw on every pair of sequences (cf. generic_method.tc_method for more details).

The tc_method files are treated like any standard established method in T-Coffee. For instance, if the file clustalw_method.tc_method is in your current directory, run:

  t_coffee sample_seq1.fasta -in=Mclustalw_method.tc_method

Managing a collection of method files

It may be convenient to store all the method files in a single location on your system. By default, t_coffee will go looking into the directory ~/.t_coffee/methods. You can change this by
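The relationship between the clustalw_method.tc_method fields and the system call can be sketched in a few lines of Python: EXECUTABLE, then IN_FLAG glued to the input file name, then OUT_FLAG glued to the output file name, then PARAM. This is only a reading of the example, not T-Coffee's own parser. The function names are hypothetical, lines starting with "*" are assumed to be decoration, and ALN_MODE, OUT_MODE and SEQ_TYPE are read but not acted upon.

```python
def read_tc_method(text):
    """Collect 'KEY value' lines of a tc_method file; '*' lines are decoration."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("*"):
            continue
        key, _, value = line.partition(" ")
        fields[key] = value.strip()
    return fields


def system_call(fields, infile, outfile):
    """Assemble the command line: executable, IN_FLAG+infile, OUT_FLAG+outfile, PARAM."""
    parts = [fields["EXECUTABLE"], fields["IN_FLAG"] + infile, fields["OUT_FLAG"] + outfile]
    if fields.get("PARAM"):
        parts.append(fields["PARAM"])
    return " ".join(parts)


method = """*TC_METHOD_FORMAT_01
EXECUTABLE clustalw
ALN_MODE   pairwise
IN_FLAG    -INFILE=
OUT_FLAG   -OUTFILE=
OUT_MODE   aln
PARAM      -gapopen=-10
SEQ_TYPE   S
"""
print(system_call(read_tc_method(method), "tmpfile1", "tmpfile2"))
```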
pair (the current default):

  t_coffee -in=Ssample_seq1.fasta,Mfast_pair,Mlalign_id_pair

MODIFYING THE PARAMETERS

It is possible to modify the parameters of hard-coded methods on the fly:

  t_coffee sample_seq1.fasta -in=slow_pair@EP@MATRIX@pam250mt@GOP@-10@GEP@-1

EP stands for Extra Parameters. These parameters will supersede any other parameters.

Integrating External Methods

DIRECT ACCESS TO EXTERNAL METHODS

A special method exists in T-Coffee that can be used to invoke any existing program:

  t_coffee sample_seq1.fasta -in=Mem@clustalw@pairwise

In this context, ClustalW is a method that can be run with the following command line:

  method -infile=<infile> -outfile=<outfile>

ClustalW can be replaced with any method using a similar syntax. If the program you want to use cannot be run this way, you can either write a Perl wrapper that fits the bill or write a tc_method file adapted to your program (cf. next section).

This special method, em (external method), uses the following syntax:

  em@<method>@<aln_mode: pairwise, s_pairwise or multiple>

CUSTOMIZING AN EXTERNAL METHOD (WITH PARAMETERS) FOR T-COFFEE

T-Coffee can run external methods using a tc_method file that can be used in place of an established method. Two such files are incorporated in T-Coffee. You can dump them and customize them according to your needs. For instance, if you have ClustalW installed, you can use the following file t
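Method specifiers pack several fields into one token. Assuming "@" separates the fields, as in slow_pair@EP@MATRIX@pam250mt@GOP@-10@GEP@-1 and em@clustalw@pairwise, the Python sketch below shows how such a token splits into a method name plus either extra parameters (after EP) or an external alignment mode. The parse_method_spec helper is hypothetical and does not reproduce how t_coffee validates these strings.

```python
def parse_method_spec(spec):
    """Split an '@'-separated method specifier into its parts.

    'slow_pair@EP@MATRIX@pam250mt@GOP@-10@GEP@-1' -> built-in method + extra parameters
    'em@clustalw@pairwise' -> external method + alignment mode
    """
    name, *rest = spec.split("@")
    if name.lower() == "em":
        if len(rest) != 2 or rest[1] not in {"pairwise", "s_pairwise", "multiple"}:
            raise ValueError(f"expected em@<method>@<aln_mode>: {spec!r}")
        return {"method": rest[0], "external": True, "aln_mode": rest[1], "extra": {}}
    extra = {}
    if rest:
        # 'EP' followed by name/value pairs, so len(rest) must be odd.
        if rest[0] != "EP" or len(rest) % 2 == 0:
            raise ValueError(f"expected <method>@EP@<name>@<value>...: {spec!r}")
        extra = dict(zip(rest[1::2], rest[2::2]))
    return {"method": name, "external": False, "aln_mode": None, "extra": extra}


print(parse_method_spec("slow_pair@EP@MATRIX@pam250mt@GOP@-10@GEP@-1"))
print(parse_method_spec("em@clustalw@pairwise"))
```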
pensive structural alignment or network-intensive BLAST search operations.

-update
Usage: -update
Default: turned off

Causes a wget access that checks whether the t_coffee version you are using needs updating.

-full_log
Usage: -full_log=<filename>
Default: turned off

Causes t_coffee to output a full log file that contains all the input/output files.

-other_pg
Usage: -other_pg=<filename>
Default: turned off

Some rumours claim that Tetris is embedded within T-Coffee and could be run using some special set of commands. We wish to deny these rumours, although we may admit that several interesting reformatting programs are now embedded in t_coffee and can be run through the -other_pg flag:

  t_coffee -other_pg=seq_reformat
  t_coffee -other_pg=unpack_all
  t_coffee -other_pg=unpack_extract_from_pdb

Input

Sequence Input

-infile [cw]
To remain compatible with ClustalW, it is possible to indicate the sequences with this flag:

  t_coffee -infile=sample_seq1.fasta

Note: common multiple sequence alignment formats constitute a valid input format.
Note: T-Coffee automatically removes the gaps before doing the alignment. This behaviour is different from that of ClustalW, where the gaps are kept in. Cf. -in in the Method and Library Input section.

-get_type
Usage: -get_type
Default: turned off

Forces t_coffee to identify the sequence type (PROTEIN, DNA).

-type [cw]
quences may be edited when coming out of the program. Five rules apply:

Naming Your Sequences the Right Way

1. No Space: names that do contain spaces, for instance ">seq1 human myc", will be turned into ">seq1". It is your responsibility to make sure that the names you provide are not ambiguous after such an editing. This editing is consistent with ClustalW (Version 1.73).
2. No Strange Character: some non-alphabetical characters are replaced with underscores. These are: [...]. Other characters are legal and will be kept unchanged. This editing is meant to keep in line with ClustalW (Version 1.75).
3. ">" is NEVER legal, except as a header token in a FASTA file.
4. Name length must be below 100 characters, although 15 is recommended for compatibility with other programs.
5. Duplicated sequences will be renamed, i.e. sequences with the same name in the same dataset are allowed but will be renamed according to their original order. When sequences come from multiple sources via the -in flag, consistency of the renaming is not guaranteed. You should avoid duplicated sequences, as they will cause your input to differ from your output, thus making it difficult to track data.

Automatic Format Recognition

Most common formats are automatically recognized by t_coffee. See -in and the next section for more details. If your format is not recognized, use readseq or clustalw to switch to another format. We recommend FASTA.

Structu
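The naming rules above can be sketched as a small Python function. Two parts are assumptions, labelled as such in the code: the exact set of characters replaced by underscores is not reproduced in this copy of the manual, so an illustrative set is used, and the "_2", "_3" suffixes for duplicates are a hypothetical renaming scheme, not necessarily the one t_coffee applies.

```python
# Assumption: the manual's exact list of characters replaced by underscores is
# not reproduced here; this set is illustrative only.
ASSUMED_REPLACED = set("(),;:")
MAX_NAME_LENGTH = 100  # rule 4: names must be below 100 characters


def clean_names(headers):
    """Apply the naming rules to FASTA-style headers ('>name description')."""
    seen = {}
    names = []
    for header in headers:
        fields = header.lstrip(">").split()  # rule 3: '>' only as the header token
        if not fields:
            raise ValueError(f"empty name in header {header!r}")
        name = fields[0]  # rule 1: everything after the first space is dropped
        name = "".join("_" if c in ASSUMED_REPLACED else c for c in name)  # rule 2
        if len(name) >= MAX_NAME_LENGTH:
            raise ValueError(f"name too long: {name[:20]}...")
        count = seen.get(name, 0) + 1  # rule 5: duplicates renamed in input order
        seen[name] = count
        names.append(name if count == 1 else f"{name}_{count}")  # hypothetical suffix scheme
    return names


print(clean_names([">seq1 human myc", ">a(b)", ">seq1"]))  # prints: ['seq1', 'a_b_', 'seq1_2']
```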
r_wise: implementation of the Myers and Miller dynamic programming algorithm (quadratic in time and linear in space). This algorithm is recommended for very long sequences. It is about 2 times slower than gotoh and only accepts -tg_mode=1 or 2 (i.e. gaps penalized for opening).

fasta_pair_wise: implementation of the FASTA algorithm. The sequence is hashed, looking for ktuples (words). Dynamic programming is only carried out on the ndiag best-scoring diagonals. This is much faster but less accurate than the two previous modes. This mode is controlled by the parameters -ktuple, -diag_mode and -ndiag.

cfasta_pair_wise: c stands for checked. It is the same algorithm. The dynamic programming is made on the ndiag best diagonals, then on the 2*ndiag best, and so on until the scores converge. Complexity will depend on the level of divergence of the sequences, but will usually be L*log(L), with an accuracy comparable to the first two modes (this was checked on BaliBase). This mode is controlled by the parameters -ktuple, -diag_mode and -ndiag.

Note: users may find, by looking into the code, that other modes with fancy names exist (viterby_pair_wise...). Unless mentioned in this documentation, these modes are not supported.

-ktuple
Usage: -ktuple=<value>
Default: -ktuple=1 or 2

Indicates the ktuple size for cfasta_pair_wise dp_mode and fasta_pair_wise. It is set to 1 for proteins and 2 for DNA. The alphabet used for protein can be a degenerated version, set
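The diagonal selection step that fasta_pair_wise and cfasta_pair_wise rely on can be illustrated with a short Python sketch of the general FASTA technique: hash the ktuples of one sequence, count ktuple hits of the other sequence on each diagonal, and keep the ndiag best diagonals. The function name and the tie-breaking rule are assumptions; the subsequent dynamic programming restricted to these diagonals, and the convergence loop of cfasta_pair_wise, are not shown.

```python
from collections import defaultdict


def best_diagonals(a, b, ktuple=2, ndiag=3):
    """FASTA-style diagonal selection.

    Hash every k-tuple (word) of `a`, look up each k-tuple of `b`, and count the
    hits on each diagonal d = j - i. Returns the `ndiag` diagonals with most hits
    (ties broken by diagonal value); dynamic programming would then be restricted
    to these diagonals.
    """
    words = defaultdict(list)
    for i in range(len(a) - ktuple + 1):
        words[a[i:i + ktuple]].append(i)
    hits = defaultdict(int)
    for j in range(len(b) - ktuple + 1):
        for i in words.get(b[j:j + ktuple], ()):
            hits[j - i] += 1
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return [diagonal for diagonal, _ in ranked[:ndiag]]


print(best_diagonals("ACGTACGT", "ACGTACGT", ktuple=2, ndiag=3))  # prints: [0, -4, 4]
```

A larger ktuple makes each hit more specific and the lookup cheaper, which is why increasing it helps with long, highly similar sequences.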
rating a colored version of the output with the -output flag:

  t_coffee sample_seq1.fasta -evaluate_mode=t_coffee_slow -output=score_ascii,score_html
  t_coffee sample_seq1.fasta -evaluate_mode=t_coffee_fast -output=score_ascii,score_html
  t_coffee sample_seq1.fasta -evaluate_mode=t_coffee_non_extended -output=score_ascii,score_html

Generic Output

-run_name
Usage: -run_name=<your run name>
Default: no default set

This flag causes the prefix <your sequences> to be replaced by <your run name> when renaming the default output files.

-quiet
Usage: -quiet=<stderr, stdout, file name OR nothing>
Default: -quiet=stderr

Redirects the standard output to either stderr, stdout or a file. -quiet on its own redirects the output to /dev/null.

-align [cw]
This flag indicates that the program must produce the alignment. It is here for compatibility with ClustalW.

We maintain a T-Coffee server (igs-server.cnrs-mrs.fr/Tcoffee). We will be pleased to provide anyone who wants to set up a similar service with the sources.

Common Problems when setting up servers

CACHE Directory
T-Coffee needs a cache directory where it stores temporary files, caches alignments and all sorts of other messy things. For a normal user, this cache is usually built in $HOME/.t_coffee. Yet in the case of a server, such permissions may not be available. You can therefore redirect the cache by setting the environment va
re. We recently introduced a new mode that makes T-Coffee able to accurately align large datasets.

How can I use it?
T-Coffee is not an interactive program. It runs from your UNIX or Linux command line, and you must provide it with the correct parameters. If you do not like typing commands, here is the simplest available mode, where T-Coffee only needs the name of the sequence file:

  t_coffee sample_seq1.fasta

Installing and using T-Coffee requires a minimum acquaintance with the Linux/Unix operating system. If you feel this is beyond your computer skills, we suggest you use one of the available online servers.

Is T-Coffee different from ClustalW?
According to several benchmarks, T-Coffee appears to be more accurate than ClustalW. Yet this increased accuracy comes at a price: T-Coffee is slower than ClustalW (about N times).

If you are familiar with ClustalW, or if you run a ClustalW server, you will find that we have made some efforts to ensure as much compatibility as possible between ClustalW and T-Coffee. Whenever it was relevant, we have kept the flag names and the flag syntax of ClustalW. Yet you will find that T-Coffee also has many extra possibilities.

If you want to align closely related sequences, T-Coffee can also be used in a fast mode, much faster than ClustalW and about as accurate (T-Coffee very fast). This mode is especially useful to align long sequences.

What T-Coffee Can and Cannot do for you

IMPORT
res
PDB format is recognized by T-Coffee. T-Coffee uses extract_from_pdb (cf. the -other_pg flag). extract_from_pdb is a small embedded module that can be used on its own to extract information from PDB files.

Sequences
Sequences can come in the following formats: fasta, pir, swiss-prot, clustal aln, msf aln and t_coffee aln. These formats are the ones automatically recognized. Please replace the * sign sometimes used for stop codons with an X.

Alignments
Alignments can come in the following formats: msf, ClustalW, FASTA, PIR and t_coffee. The t_coffee format is very similar to the ClustalW format, but slightly more flexible. Any interleaved format with the sequence name on each line will be correctly parsed:

  <empty line>                    Facultative n
  <line of text>                  Required
  <line of text>                  Facultative n
  <empty line>                    Required
  <empty line>                    Facultative n
  <seq1 name><space><seq1>
  <seq2 name><space><seq2>
  <seq3 name><space><seq3>
  <empty line>                    Required
  <empty line>                    Facultative n
  <seq1 name><space><seq1>
  <seq2 name><space><seq2>
  <seq3 name><space><seq3>
  <empty line>                    Required
  <empty line>                    Facultative n

An empty line is a line that does NOT contain amino acids. A line that contains the ClustalW annotation is empty.

Spaces are forbidden in the name. When the alignment is being read
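The interleaved layout above translates fairly directly into a small block parser. The Python sketch below follows that grammar: skip optional empty lines, skip the required title line and any further text lines up to the first empty line, then concatenate each name's chunks across blocks. It approximates "empty" as "contains no letter", so ClustalW annotation lines are skipped, and it rejects results where sequences end up with different lengths. It is an illustration under those assumptions, not the parser t_coffee uses.

```python
def _is_empty(line):
    # Per the manual: an empty line has no residues, so ClustalW annotation
    # lines (only '*', ':', '.', spaces) count as empty. Approximated here as
    # "contains no letter".
    return not any(c.isalpha() for c in line)


def parse_interleaved(text):
    """Parse the interleaved layout into {name: aligned sequence}."""
    lines = text.splitlines()
    i = 0
    while i < len(lines) and _is_empty(lines[i]):  # optional leading empty lines
        i += 1
    if i == len(lines):
        raise ValueError("no header line")
    while i < len(lines) and not _is_empty(lines[i]):  # required + optional text lines
        i += 1
    chunks = {}
    for line in lines[i:]:
        if _is_empty(line):
            continue  # block separators and annotation lines
        fields = line.split(None, 1)  # names cannot contain spaces
        residues = fields[1] if len(fields) > 1 else ""
        chunks.setdefault(fields[0], []).append("".join(residues.split()))
    aligned = {name: "".join(parts) for name, parts in chunks.items()}
    if len({len(s) for s in aligned.values()}) > 1:
        raise ValueError("sequences have different lengths; blocks are inconsistent")
    return aligned
```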
rgument. See the format section for further information:

  t_coffee sample_seq1.fasta

which is equivalent to:

  t_coffee -in=Ssample_seq1.fasta

When you do so, sample_seq1 is used as a name prefix for every file the program outputs.

-parameters
Usage: -parameters=<parameters_file>
Default: no parameters file

Indicates a file containing extra parameters. Parameters read this way behave as if they had been added on the right end of the command line, so that they either supersede one-value parameters or complete lists of values. For instance, the following file (parameter_file) could be used:

  -in=Ssample_seq1.fasta,Mfast_pair
  -output=msf_aln

Note: this is one of the exceptions (with -infile) where the identifier tag (S, A, L, M) can be omitted. Any dataset provided this way will be assumed to be a sequence (S). These exceptions have been designed to keep the program compatible with ClustalW.

Note: this parameter file can ONLY contain valid parameters. Comments are not allowed. Parameters passed this way will be checked like normal parameters.

Used with:

  t_coffee -parameters=sample_param_file.param

it will cause t_coffee to apply the fast_pair method onto the sequences contained in sample_seq1.fasta. If you wish, you can also pipe these arguments into t_coffee by naming the parameter file stdin; as a rule, any file named stdin is expected to receive it
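One way to picture "added on the right end of the command line" is to append the parameter-file tokens after the command-line tokens and fold them left to right. In the Python sketch below, a later single-value flag replaces an earlier one, while a list flag accumulates values. Which flags count as lists is passed in by the caller (only -in by default), and bare arguments such as a sequence file name are ignored. Both choices are simplifying assumptions, and merge_parameters is a hypothetical name.

```python
def merge_parameters(command_line, param_file_text, list_flags=frozenset({"-in"})):
    """Fold command-line tokens and parameter-file tokens, the latter appended
    on the right. Single-value flags: the right-most value wins. List flags
    (which ones are lists is an assumption passed by the caller): values accumulate.
    Only '-flag=v1,v2' tokens are handled; bare arguments are ignored here."""
    settings = {}
    for token in command_line.split() + param_file_text.split():
        if not token.startswith("-"):
            continue
        flag, _, value = token.partition("=")
        values = value.split(",") if value else []
        if flag in list_flags:
            settings.setdefault(flag, []).extend(values)
        else:
            settings[flag] = values
    return settings


params = "-in=Ssample_seq1.fasta,Mfast_pair\n-output=msf_aln\n"
print(merge_parameters("t_coffee -in=Mlalign_id_pair -output=clustalw", params))
```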
riable CACHE_4_TCOFFEE to some more suitable value in your scratch area.

Output of the .dnd file
This is a common source of error when running a server: T-Coffee MUST output the .dnd file, because it re-reads it to carry out the progressive alignment. By default, T-Coffee outputs this file in the directory where the process is running. If the T-Coffee process does not have permission to write in that directory, the computation will abort. To avoid this, simply specify the name of the output tree:

  -newtree=<writable file (usually in /tmp)>

Choose the name so that two processes cannot overwrite each other's .dnd file.

Permissions
The t_coffee process MUST be allowed to write in some scratch area, even when it is run by Mr nobody. Make sure the /tmp partition is not protected.

Other Programs
T-Coffee may call various programs while it runs (lalign2list by default). Make sure your process knows where to find these executables.

Parameter files
Parameter files used with -parameters, -t_coffee_defaults, -dali_defaults... must contain a valid parameter string where line breaks are allowed. These files cannot contain any comment. The recommended format is one parameter per line:

  -<parameter name>=<value1>,<value2>...
  -<parameter name>=...

Sequence Name Handling
Sequence name handling is meant to be fully consistent with ClustalW (Version 1.75). This implies that in some cases the names of your se
rned into a library where matched nucleotides receive a score equal to the average level of identity at the amino acid level. This mode is intended to clean cDNA obtained from ESTs or to align pseudo-genes.

LIST OF EXTERNAL METHODS

The following methods are external. They correspond to packages developed by other groups that you may want to run within T-Coffee. We are very open to extending these options, and we welcome any request to add an extra interface.

clustalw_pair: uses ClustalW default parameters to align two sequences. Each pair of residues is given a score function of the weighting mode defined by -weight.
clustalw_msa: makes a multiple alignment using ClustalW and adds it to the library. Each pair of residues is given a score function of the weighting mode defined by -weight.
probcons_pair: ProbCons package; install the latest version from http://probcons.stanford.edu/
probcons_msa: idem
muscle_pair: Muscle package; install the latest version from http://www.drive5.com/muscle/
muscle_msa: idem
sap_pair: uses sap to align two structures. Each pair of residues is given a score function defined by sap. You must have sap installed on your system to use this method.
fugue_pair: uses Fugue to align a structure and a sequence. Fugue does not need to be installed; the call is made through wget. Unsupported.

To request a method, see the -in flag. For instance, if you wish to request the use of fast_pair and lalign_id_
roduced with T-Coffee?
A: Yes. You may have an alignment produced from any source you like. To evaluate it, do:

  t_coffee -infile=sample_aln1.aln -in=Lsample_aln1.tc_lib -special_mode=evaluate

If you have no library available, the library will be computed on the fly using the following command. This can take some time depending on your sample size. To monitor the progress in a situation where the default library is being built, use:

  t_coffee -infile=sample_aln1.aln -special_mode=evaluate

Q: Can I compare two alignments?
A: Yes. You can treat one of your alignments as a library and compare it with the second alignment:

  t_coffee -infile=sample_aln1_1.aln -in=Asample_aln1_2.aln -special_mode=evaluate

Q: I am aligning sequences with long regions of very good overlap.
A: Increase the ktuple size (up to 4 or 5 for DNA and up to 3 for proteins):

  t_coffee sample_seq1.fasta -ktuple=3

This will speed up the program. It can be very useful, especially when aligning ESTs.

Q: Why is T-Coffee changing the names of my sequences?
A: If there is no duplicated name in your sequence set, T-Coffee s
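To see what treating one alignment as a library of residue pairs means, the Python sketch below reduces each alignment to the set of residue pairs that share a column and reports what fraction of one alignment's pairs the other also contains. This is a plain pair-overlap measure chosen for illustration; it is not the CORE score and not the computation that -special_mode=evaluate performs. Alignments are assumed to be given as a dict of name to gapped sequence, with "-" or "." as gap characters.

```python
from itertools import combinations


def aligned_pairs(aln):
    """Set of residue pairs sharing a column. aln maps name -> gapped sequence;
    residues are (name, position) with positions counted from 1, gaps skipped."""
    names = sorted(aln)
    position = {name: 0 for name in names}
    pairs = set()
    for column in range(len(aln[names[0]])):
        present = []
        for name in names:
            if aln[name][column] not in "-.":
                position[name] += 1
                present.append((name, position[name]))
        pairs.update(combinations(present, 2))
    return pairs


def pair_agreement(reference, test):
    """Fraction of the residue pairs aligned in `test` that `reference` also aligns."""
    test_pairs = aligned_pairs(test)
    if not test_pairs:
        return 0.0
    return len(aligned_pairs(reference) & test_pairs) / len(test_pairs)


ref = {"a": "AC-D", "b": "ACED"}
alt = {"a": "ACD-", "b": "ACED"}
print(round(pair_agreement(ref, alt), 3))  # prints: 0.667
```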
s content via the stdin:

  cat sample_param_file.param | t_coffee -parameters=stdin

-t_coffee_defaults
Usage: -t_coffee_defaults=<file_name>
Default: not used

This flag tells the program to use some default parameter file for t_coffee. The format of that file is the same as the one used with -parameters. The file used is either:
1. <file_name>, if a name has been specified
2. ~/.t_coffee_defaults, if no file was specified
3. the file indicated by the environment variable TCOFFEE_DEFAULTS

-special_mode
Usage: -special_mode=<hard_coded_mode>
Default: not used

It indicates that t_coffee will use some hard-coded parameters. These include:
  quickaln: very fast approximate alignment
  dali: a mode used to combine DALI pairwise alignments
  evaluate: defaults for evaluating an alignment
  3dcoffee: runs t_coffee with the 3dcoffee parameterization
  dna: runs t_coffee with appropriate parameters
Other modes exist that are not yet fully supported.

-score (Deprecated)
Usage: -score
Default: not used

Toggles on the evaluate mode and causes t_coffee to evaluate a precomputed alignment provided via -infile=<alignment>. The flag -output must be set to an appropriate format (i.e. -output=score_ascii, score_html or score_pdf). A better default parameterization is obtained when using the flag -special_mode=evaluate.

-evaluate
Usage: -evaluate
Default: not used

Replaces -score. This flag toggles on the evaluate mode
s parameters to t_coffee?
Q: How can I change the default output format?
Q: My sequences are slightly different between all the alignments.
Q: Is it possible to pipe stuff OUT of t_coffee?
Q: Is it possible to pipe stuff INTO t_coffee?
Q: Can I read my parameters from a file?
Q: I want to decide myself on the name of the output files!
Q: I want to use the sequences in an alignment file.
Q: I only want to produce a library.
Q: Can t_coffee align nucleic acids?
Q: I do not want to compute the alignment.
Q: I would like to force some residues to be aligned.
Q: I would like to use structural alignments.
Q: I want to build my own libraries.
Q: I want to align coding DNA.
Q: I do not want to use all the possible pairs when computing the library.
Q: I only want to use specific pairs to compute the library.
Q: There are duplicates or quasi-duplicates in my set.
Using Structures and Profiles
Q: Can I align sequences to a profile with T-Coffee?
Q: Can I align sequences, two or mor
sed by Phylip (see the format section). Do NOT confuse this guide tree with a phylogenetic tree.

Reliability Estimation

CORE Computation
The CORE is an index that indicates the consistency between the library of pairwise alignments and the final multiple alignment. Our experiments indicate that the higher this consistency, the more reliable the alignment. A publication describing the CORE index can be found at http://igs-server.cnrs-mrs.fr/~cnotred/Publications/Pdf/core.pp.pdf

-evaluate_mode
Usage: -evaluate_mode=<t_coffee_fast, t_coffee_slow, t_coffee_non_extended>
Default: -evaluate_mode=t_coffee_fast

This flag indicates the mode used to normalize the t_coffee score when computing the reliability score.

t_coffee_fast: normalization is made using the highest score in the MSA. This evaluation mode was validated: in our hands, pairs of residues with a score of 5 or higher have a 90% chance of being correctly aligned to one another.

t_coffee_slow: normalization is made using the library. This usually results in lower scores and a scoring scheme more sensitive to the number of sequences in the dataset. Note that this scoring scheme is no longer slower, thanks to the implementation of a faster heuristic algorithm.

t_coffee_non_extended: the score of each residue is the ratio between the sum of its non-extended scores with the column and the sum of all its possible non-extended scores.

These modes will be useful when gene
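The t_coffee_non_extended definition is concrete enough to sketch. In the Python illustration below, a residue's score is the library weight it shares with the other residues of its own column, divided by the total library weight that residue has in the library. This is one reading of the sentence above, not T-Coffee's implementation: the data layout (a dict of name to gapped sequence, and a library keyed by ((name, position), (name, position)) with 1-based positions) is assumed, and the raw ratio returned here is not binned the way t_coffee displays scores.

```python
from collections import defaultdict


def non_extended_scores(aln, library):
    """Per-residue score under one reading of t_coffee_non_extended.

    aln: {name: gapped sequence}; library: {((name, pos), (name, pos)): weight}
    with residue positions counted from 1. For each residue, returns the library
    weight it shares with residues in its own column divided by its total library
    weight (None for gaps and for residues absent from the library).
    """
    weight = defaultdict(float)
    total = defaultdict(float)
    for (a, b), w in library.items():
        weight[(a, b)] += w
        weight[(b, a)] += w
        total[a] += w
        total[b] += w

    names = list(aln)
    position = {name: 0 for name in names}
    scores = {name: [] for name in names}
    for column in range(len(aln[names[0]])):
        residues = {}
        for name in names:
            if aln[name][column] != "-":
                position[name] += 1
                residues[name] = (name, position[name])
        for name in names:
            r = residues.get(name)
            if r is None or total.get(r, 0) == 0:
                scores[name].append(None)
                continue
            shared = sum(weight.get((r, other), 0) for other in residues.values() if other != r)
            scores[name].append(shared / total[r])
    return scores


lib = {(("a", 1), ("b", 1)): 10, (("a", 1), ("b", 2)): 10, (("a", 2), ("b", 2)): 20}
print(non_extended_scores({"a": "AC", "b": "AC"}, lib))
```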
st scoring local alignments.

The tree reading/computing routines are taken from the ClustalW package, courtesy of Julie Thompson, Des Higgins and Toby Gibson (Thompson, Higgins, Gibson, 1994, 4673-4680, vol. 22, Nucleic Acids Research).

The implementation of the algorithm for aligning two sequences in linear space was adapted from Myers and Miller, in CABIOS, 1988, 11-17, vol. 1.

Various techniques and algorithms have been implemented. Whenever relevant, the source of the code/algorithm/idea is indicated in the corresponding function.

64-bit compliance was implemented by Benjamin Sohn, Performance Computing Center Stuttgart (HLRS), Germany.

Prof. David Jones (UCL) reported and corrected the PDB1K bug: t_coffee/sap can now align PDB sequences longer than 1000 AA.

What is T-Coffee?

Before going deep into the core of the matter, here are a few words to quickly explain some of the things T-Coffee will do for you.

What does it do?
T-Coffee is a multiple sequence alignment program: given a set of sequences previously gathered using database search programs like BLAST, FASTA or Smith and Waterman, T-Coffee will produce a multiple sequence alignment. To use T-Coffee, you must already have your sequences ready.

What can it align?
T-Coffee will align DNA and protein sequences alike, although it does better at aligning proteins than nucleic acids. It will be able to use structural information for protein sequences with a known structu
supersedes the use of the -convert flag. Its main advantage is to restrict computation time to the actual library computation.

Q: I want to turn an alignment into a library.
A: Use the -lib_only flag:

  t_coffee -in=Asample_aln1.aln -out_lib=sample_lib1.tc_lib -lib_only

It is also possible to control the weight associated with this alignment (see the -weight section):

  t_coffee -in=Asample_aln1.aln -out_lib=sample_lib1.tc_lib -lib_only -weight=1000

Q: I want to concatenate two libraries.
A: You cannot concatenate these files on their own. You will have to use t_coffee. Assume you want to combine tc_lib1.tc_lib and tc_lib2.tc_lib:

  t_coffee -in=Lsample_lib1.tc_lib,Lsample_lib2.tc_lib -lib_only -out_lib=sample_lib3.tc_lib

Q: What happens to the gaps when an alignment is fed to T-Coffee?
A: An alignment is ALWAYS considered as a library AND a set of sequences. If you want your alignment to be considered as a set of sequences only, use the S identifier:

  t_coffee Ssample_aln1.aln -outfile=outaln

It will be seen as a sequence file even if it has an alignment format (gaps will be removed).

Q: I cannot print the html graphic display!
A: This is a problem that has to do with your browser. Instead of requesting the score_html output, request the score_ps output, which can be read using ghostview:

  t_coffee sample_seq1.fasta -output=score_ps

or

  t_coffee sample_seq2.fasta -output=score_pdf

Q
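The -lib_only examples turn an alignment into a library file. As a rough picture of what such a file contains, the Python sketch below writes every pair of residues sharing a column as a TC_LIB_FORMAT_01 entry with one constant weight, which is what a fixed value such as -weight=1000 suggests. This is an assumption for illustration: the default weighting t_coffee applies to alignment-derived pairs is not modelled, and alignment_to_tc_lib is a hypothetical name.

```python
from itertools import combinations


def alignment_to_tc_lib(aln, weight=1000):
    """Write every pair of residues sharing a column as a TC_LIB_FORMAT_01 entry
    with one constant weight (how t_coffee weights pairs by default is not modelled)."""
    names = list(aln)
    position = [0] * len(names)
    rows = {}  # (seq_i, seq_j) -> [(res_i, res_j), ...]
    for column in range(len(aln[names[0]])):
        present = []
        for k, name in enumerate(names):
            if aln[name][column] != "-":
                position[k] += 1
                present.append((k, position[k]))
        for (si, ri), (sj, rj) in combinations(present, 2):
            rows.setdefault((si, sj), []).append((ri, rj))

    out = ["! TC_LIB_FORMAT_01", str(len(names))]
    for name in names:
        residues = aln[name].replace("-", "")
        out.append(f"{name} {len(residues)} {residues}")
    for (si, sj), pairs in sorted(rows.items()):
        out.append(f"#{si + 1} {sj + 1}")  # 1-based, hence the SEQ_1_TO_N token below
        out.extend(f"{ri} {rj} {weight}" for ri, rj in pairs)
    out.append("! SEQ_1_TO_N")
    return "\n".join(out) + "\n"


print(alignment_to_tc_lib({"a": "AC-", "b": "A-C"}))
```

Because every pair is recorded against named sequences, two such files cannot simply be concatenated: the sequence headers and indices would have to be merged, which is why the manual routes library merging through t_coffee.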
  t_coffee sample_seq1.fasta -output=clustalw,gcg,score_html

A publication describing the CORE index is available at http://igs-server.cnrs-mrs.fr/~cnotred/Publications/Pdf/core.pp.pdf

-outseqweight
Usage: -outseqweight=<filename>
Default: not used

Indicates the name of the file in which the sequence weights should be saved.

-case
Usage: -case=<keep, upper, lower>
Default: -case=keep

Instructs the program on the case to be used in the output file. ClustalW uses upper case. The default keeps the case and makes it possible to maintain a mixture of upper- and lower-case residues. If you need to change the case of your file, you can use seq_reformat:

  t_coffee -other_pg=seq_reformat -in=sample_aln1.aln -action=+lower -output=clustalw

-cpu
Usage: deprecated

-outseqweight
Usage: -outseqweight=<name of the file containing the weights applied>
Default: -outseqweight=no

Will cause the program to output the weights associated with every sequence in the dataset.

-outorder [cw]
Usage: -outorder=<input OR aligned OR filename>
Default: -outorder=input

Sets the order of the sequences in the output alignment. -outorder=input means the sequences are kept in the original order. -outorder=aligned means the sequences come in the order indicated by the tree; this order can be seen as a one-dimensional projection of the tree distances. -outorder=<filename>: filename is a legal FASTA file
The structures are fetched on the net using RCSB. The problem arises when T-Coffee looks for the structure of sequences WITHOUT structures. One solution is to install the PDB locally. In that case, you will need to set two environment variables:

    setenv PDB_DIR <directory containing the pdb structures>
    setenv NO_REMOTE_PDB_DIR 1

Interestingly, the observation that sequences without structures are those that take the most time to be checked is a reminder of the strongest rational argument that I know of against torture: any innocent would require the maximum amount of torture to establish his/her innocence, which sounds, ahem, strange and at least inefficient. Then again, I was never struck by the efficiency of the Bush administration.

Alignment Evaluation

Q: How good is my alignment?
A: See what the color index is.

Q: What is that color index?
A: T-Coffee can provide you with a measure of consistency among all the methods used. You can produce such an output using:

    t_coffee sample_seq1.fasta -output=score_html

This will compute sample_seq1.score_html, which you can view using netscape. An alternative is to use score_ps or score_pdf, which can be viewed using ghostview or acroread. score_ascii will give you an alignment that can be parsed as a text file.

A book chapter describing the CORE index is available on http://igs-server.cnrs-mrs.fr/~cnotred/Publications/Pdf/core.pp.pdf

Q: Can I evaluate alignments NOT p
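The color index above measures how consistent the alignment is with the methods used. The toy version below conveys the idea. For each column, it computes the fraction of aligned residue pairs that also occur in a library. This is deliberately simplified and is NOT the CORE formula described in the cited chapter. `column_consistency` and `to_index` are hypothetical names, and the 0 to 9 digit mapping is only an illustrative display choice.

```python
from itertools import combinations

def column_consistency(alignment, library_pairs):
    """Simplified per-column consistency: the fraction of residue pairs
    aligned in a column that also appear in `library_pairs`.
    Keys follow (seq_a, pos_a, seq_b, pos_b), with seq_a listed before
    seq_b in `alignment`. Columns without any residue pair give None."""
    names = list(alignment)
    position = dict.fromkeys(names, 0)  # 1-based residue counters
    scores = []
    for col in range(len(alignment[names[0]])):
        present = {}
        for name in names:
            if alignment[name][col] != "-":
                position[name] += 1
                present[name] = position[name]
        pairs = [(a, present[a], b, present[b])
                 for a, b in combinations(names, 2)
                 if a in present and b in present]
        if not pairs:
            scores.append(None)  # nothing to evaluate in this column
        else:
            supported = sum(pair in library_pairs for pair in pairs)
            scores.append(supported / len(pairs))
    return scores

def to_index(score):
    """Map a 0..1 score onto a 0..9 digit (illustrative display only)."""
    return None if score is None else min(9, int(score * 10))
```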
the use of t_coffee_dpa.

-dpa_master_aln
Usage: -dpa_master_aln=<File, method>
Default: -dpa_master_aln=NO
When using DPA, t_coffee needs a seed alignment that can be computed using any appropriate method. By default, t_coffee computes a fast approximate alignment. A pre-alignment can be provided through this flag, as well as any program using the following syntax:

    your_script -in <fasta_file> -out <file_name>

-dpa_maxnseq
Usage: -dpa_maxnseq=<integer value>
Default: -dpa_maxnseq=30
Maximum number of sequences aligned simultaneously when DPA is run. Given the tree computed from the master alignment, a node is sent to computation if it controls more than dpa_maxnseq sequences OR if it controls a pair of sequences having less than dpa_min_score2 percent identity.

-dpa_min_score1
Usage: -dpa_min_score1=<integer value>
Default: -dpa_min_score1=95
Threshold for not realigning the sequences within the master alignment. Given this alignment and the associated tree, sequences below a node are not realigned if none of them has less than dpa_min_score1 % identity.

-dpa_min_score2
Usage: -dpa_min_score2
Default: -dpa_min_score2
Percent identity threshold used together with dpa_maxnseq. Given the tree computed from the master alignment, a node is sent to computation if it controls more than dpa_maxnseq sequences OR if it controls a pair of sequences having less than dpa_min_score2 percent identity.

-dap_tree
NOT I
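The two DPA node rules described above (dpa_min_score1 for keeping the master alignment, and dpa_maxnseq / dpa_min_score2 for sending a node to computation) can be written as plain predicates. This is a sketch under assumptions. The function names are hypothetical. Pairwise identities are assumed to be precomputed and stored under `frozenset({a, b})` keys. dpa_min_score2 has no documented default, so it is a required argument here.

```python
from itertools import combinations

def min_pairwise_identity(leaves, identity):
    """Lowest percent identity among all pairs of sequences under a node.
    `identity` maps frozenset({name_a, name_b}) -> percent identity."""
    values = [identity[frozenset(pair)] for pair in combinations(leaves, 2)]
    return min(values, default=100.0)  # a single sequence has no pair

def keep_master_alignment(leaves, identity, dpa_min_score1=95):
    """Sequences below a node are not realigned if no pair falls
    below dpa_min_score1 percent identity."""
    return min_pairwise_identity(leaves, identity) >= dpa_min_score1

def send_to_computation(leaves, identity, dpa_min_score2, dpa_maxnseq=30):
    """A node is sent to computation if it controls more than dpa_maxnseq
    sequences OR a pair below dpa_min_score2 percent identity."""
    return (len(leaves) > dpa_maxnseq  # checked first, short-circuits
            or min_pairwise_identity(leaves, identity) < dpa_min_score2)
```

Because `or` short-circuits, a node that is simply too large is flagged without computing any identity.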
ure of the input. From version 2.20, all files must be tagged to indicate their nature (A: Alignment, S: Sequence, L: Library). We are becoming stricter, but that's for your own good. Another important modification has to do with the flag -matrix: it now controls the matrix being used for the computation.

This manual is at a very preliminary stage of writing and will only show you how to do the very basics with T-Coffee. In order to solve a more specific problem or answer a query, we suggest you first go through the FAQ to see if your problem has been addressed, read it, and then carefully read the documentation associated with the corresponding flags. Of course, we also welcome queries and do our best to provide answers and clues in a timely manner.

Using T-Coffee

Standard Alignments

T-Coffee can align sequences, structures and profiles. The default mode when using t_coffee is:

    t_coffee sample_seq1.fasta

It is also possible to combine sequences from various sources:

    t_coffee sample_seq1.fasta sample_seq2.fasta

Or even sequences coming from sequence and alignment files:

    t_coffee sample_seq1.fasta Ssample_aln2.aln

Note: the S identifier tells the program to use the alignment as a collection of unaligned sequences.

Alignment Combination

It is possible to combine several alignments into one final alignment:

    t_coffee -in Asample_aln1_1.aln Asample_aln1_2.aln -outfile=combined_aln.aln

Note the
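The A / S / L tags described above are single-letter prefixes on the file name. The sketch below shows one way such an argument could be split into its kind and file name. `split_tag` is hypothetical and is not T-Coffee's parser. Note its limitation: a purely prefix-based split cannot tell an untagged file that happens to start with a capital A, S or L from a tagged one.

```python
# Tags listed in the manual: A = alignment, S = sequence, L = library.
TAGS = {"A": "alignment", "S": "sequence", "L": "library"}

def split_tag(argument):
    """Split a tagged input such as 'Asample_aln1.aln' into
    (kind, filename). Untagged arguments return (None, argument)."""
    if len(argument) > 1 and argument[0] in TAGS:
        return TAGS[argument[0]], argument[1:]
    return None, argument
```

With this sketch, `split_tag("Ssample_aln2.aln")` gives `("sequence", "sample_aln2.aln")`, matching the S example above.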
provided for compatibility with ClustalW.

-profile2 (cw)
Usage: -profile2=<name>, one name only
Default: no default
Similar to the previous one; it was provided for compatibility with ClustalW.

Alignment Computation

Library Computation Methods

-lalign_n_top
Usage: -lalign_n_top=<Integer>
Default: -lalign_n_top=10
Number of alignments reported by the local method (lalign).

-align_pdb_param_file
Unsupported

-align_pdb_hasch_mode
Unsupported

Library Computation Extension

-lib_list (Unsupported)
Usage: -lib_list=<filename>
Default: unset
Use this flag if you do not want the library computation to take into account all the possible pairs in your dataset. For instance:

Format:
    2 Name1 Name2
    2 Name1 Name4
    3 Name1 Name2 Name3

The last line, which lists three sequences, would be used by a multiple alignment method.

-do_normalise
Usage: -do_normalise=<0 or a positive value>
Default: -do_normalise=1000
Development only. When using a value different from 0, this flag sets the score of the highest scoring pair to 1000.

-extend
Usage: -extend=<0, 1 or a positive value>
Default: -extend=1
Development only. When turned on, this flag indicates that the library extension should be carried out when performing the multiple alignment. If -extend=0, the extension is not made; if it is set to 1, the extension is made on all the pairs in the library. If the extension is set to another positive value, the extension is on
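The -lib_list format and the -do_normalise rescaling described above are small enough to sketch directly. Both functions are hypothetical illustrations. For the rescaling, it is an assumption that the value passed becomes the top score and that all other scores scale proportionally. The manual only states that the highest-scoring pair is set to 1000.

```python
def parse_lib_list(text):
    """Parse the -lib_list format: each line is '<n> name_1 ... name_n'."""
    groups = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        count, names = int(fields[0]), tuple(fields[1:])
        if len(names) != count:
            raise ValueError(f"expected {count} names in line: {line!r}")
        groups.append(names)
    return groups

def do_normalise(scores, value=1000):
    """Rescale pair scores so that the highest one becomes `value`
    (assumed proportional scaling); 0 leaves the scores unchanged."""
    if value == 0 or not scores:
        return dict(scores)
    top = max(scores.values())
    if top <= 0:
        return dict(scores)  # nothing sensible to scale against
    return {pair: score * value / top for pair, score in scores.items()}
```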
own library (cf. next section), or convert your aln into a lib using the -weight flag:

    t_coffee -in Asample_aln1.aln -out_lib=test_lib.tc_lib -lib_only -weight=sim_pam250mt
    t_coffee -in Asample_aln1.aln Ltest_lib.tc_lib -outfile=outaln
    t_coffee -in Asample_aln1_1.aln Asample_aln1_2.aln Mfast_pair Mlalign_id_pair -outfile=out_aln

Generating Your Own Libraries

This is suitable if you have local alignments or very detailed information about your potential residue pairs, or if you want to use a very specific weighting scheme. You will need to generate your own libraries using the format described in the last section.

You may also want to pre-compute your libraries in order to save them for further use. For instance, in the following example we generate the local and the global libraries and later re-use them for combination into a multiple alignment:

    t_coffee sample_seq1.fasta -in Mslow_pair -out_lib=slow_pair_seq1.tc_lib -lib_only
    t_coffee sample_seq1.fasta -in Mlalign_id_pair -out_lib=lalign_id_pair_seq1.tc_lib -lib_only

Once these libraries have been computed, you can combine them at your convenience in a single MSA. Of course, you can decide to use only the local or only the global library.

    t_coffee sample_seq1.fasta -in Llalign_id_pair_seq1.tc_lib Lslow_pair_seq1.tc_lib

IMPORTANT: All the files mentioned here (sample_seq...) can be found in the example/ directory
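When a precomputed local library and a global library are combined as described above, some residue pairs occur in both. The sketch below assumes the merge rule from the original T-Coffee description: a duplicated pair becomes a single entry whose weight is the sum of its weights. Libraries are modelled as plain dictionaries, and `combine_libraries` is a hypothetical name. This is not the program's library reader.

```python
from collections import Counter

def combine_libraries(*libraries):
    """Merge libraries given as {residue_pair: weight}; a pair present
    in several libraries receives the sum of its weights (assumed rule)."""
    combined = Counter()
    for library in libraries:
        combined.update(library)  # Counter.update adds values key by key
    return dict(combined)
```

A pair supported by both the local and the global method thus ends up with more weight than a pair supported by only one of them.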
y weight. The library used for the computation is the one computed before the method is used. The result therefore depends on the order in which methods and libraries are set via the -in flag.

align_pdb_pair
Uses the align_pdb routine to align two structures. The pairwise scores are those returned by the align_pdb program. If a structure is missing, fast_pair is used instead. Each pair of residues is given a score function defined by align_pdb. UNSUPPORTED

lalign_id_pair
Same as lalign_rs_pair, but using the level of identity as a weight.

lalign_s_pair
Same as above, but also does the self comparison (s stands for self). This is needed when extracting repeats. The weights used that way are based on identity.

lalign_rs_s_pair
Same as above, but also does the self comparison (s stands for self). This is needed when extracting repeats.

Matrix
Any matrix can be requested. Simply indicate as a method the name of the matrix preceded with an X (e.g. Xpam250mt). If you indicate such a matrix, all the other methods will simply be ignored and a standard fast progressive alignment will be computed. If you want to change the substitution matrix used by the methods, use the -matrix flag.

fast_cdna_pair
This method computes the pairwise alignment of two cDNA sequences. It is a fast_pair alignment that only takes into account the amino acid similarity and uses different penalties for amino acid insertions and frameshifts. This alignment is tu
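lalign_id_pair is described above as using the level of identity as a weight. The sketch below shows that idea: the percent identity of one local alignment becomes the weight of every residue pair it aligns. The functions are hypothetical. Identity is assumed to be computed over gap-free columns only and to be case-insensitive, and T-Coffee's exact identity definition may differ.

```python
def percent_identity(row_a, row_b):
    """Percent identity over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("rows of a pairwise alignment must have equal length")
    aligned = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    same = sum(1 for x, y in aligned if x.upper() == y.upper())
    return 100.0 * same / len(aligned)

def identity_weighted_pairs(row_a, row_b, start_a=1, start_b=1):
    """Give every aligned residue pair (pos_a, pos_b) the alignment's
    percent identity as weight; start_* are 1-based start positions."""
    weight = percent_identity(row_a, row_b)
    pos_a, pos_b = start_a - 1, start_b - 1
    pairs = {}
    for x, y in zip(row_a, row_b):
        if x != "-":
            pos_a += 1
        if y != "-":
            pos_b += 1
        if x != "-" and y != "-":
            pairs[(pos_a, pos_b)] = weight
    return pairs
```

For instance, `ACGT-A` against `ACCTGA` has 4 identities over 5 gap-free columns, so all five aligned pairs get weight 80.0.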
y of the distribution.

Abnormal Terminations and Wrong Results

Q: The program keeps crashing when I give my sequences.
A: This may be a format problem. Try to reformat your sequences using any utility (readseq...). We recommend the FASTA format. If the problem persists, contact us.
A: Your sequences may not be recognized for what they really are. Normally, T-Coffee recognizes the type of your sequences automatically, but if it fails, use:

    t_coffee sample_seq1.fasta -type=PROTEIN

Q: The default alignment is not good enough.
A: See next question.

Q: The alignment contains obvious mistakes.
A: This happens with most multiple alignment procedures. However, wrong alignments are sometimes caused by bugs or an implementation mistake. Please report the most unexpected results to the authors.

Q: The program is crashing.
A: If you get the message FAILED TO ALLOCATE REQUIRED MEMORY, see the next question. If the program crashes for some other reason, please check whether you are using the right syntax and, if the problem persists, get in touch with the authors.

Q: I am running out of memory.
A: You can use a more accurate, slower and less memory-hungry dynamic programming mode called myers_miller_pair_wise. Simply indicate the flag:

    t_coffee sample_seq1.fasta -special_mode=low_memory

Note that this mode will be much less time-efficient than the default, although it may be slightly more accurate. In practice the par
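The memory saving of myers_miller_pair_wise mentioned above comes from the linear-space strategy of Myers and Miller (a Hirschberg-style method). It keeps only two rows of the dynamic programming matrix. The sketch below shows that core scoring pass for a simple global alignment with a linear gap penalty. It is a simplification with assumed scores (match 1, mismatch -1, gap -2). It does not include the divide-and-conquer traceback or the affine gaps a real implementation uses. The full method recomputes such rows recursively to recover the alignment, which is why it trades time for memory.

```python
def nw_last_row(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment scores of `a` against every prefix of `b`,
    keeping only two rows: O(len(b)) memory instead of O(len(a)*len(b))."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr  # the older row is discarded
    return prev
```

The last entry, `nw_last_row(a, b)[-1]`, is the optimal global score of `a` against `b`.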
you are done.

Output Control

Generic

Conventions Regarding Filenames
stdout, stderr, stdin, no and /dev/null are valid filenames. stdout and stderr cause the corresponding file to be output on standard output or standard error. For an input file, stdin causes the program to request the corresponding file through a pipe. no suppresses the output, as does /dev/null.

Identifying the Output Files Automatically
In the t_coffee output, each output file appears on a line of the form:

    FILENAME <name> TYPE <Type> FORMAT <Format>

Alignments

-outfile
Usage: -outfile=<out_aln file, default, no>
Default: -outfile=default
Indicates the name of the alignment output by t_coffee. If the default is used, the alignment is named <your sequences>.aln

-output
Usage: -output=<format1,format2>
Default: -output=clustalw
Indicates the format used for outputting the -outfile. Supported formats are:

    clustalw_aln, clustalw : ClustalW format
    gcg, msf_aln           : MSF alignment
    pir_aln                : pir alignment
    fasta_aln              : fasta alignment
    phylip                 : Phylip format
    pir_seq                : pir sequences (no gap)
    fasta_seq              : fasta sequences (no gap)

As well as:

    score_ascii : causes the output of a reliability flag
    score_html  : causes the output to be a reliability plot in HTML
    score_pdf   : idem in PDF (if ps2pdf is installed on your system)
    score_ps    : idem in postscript

More than one format can be indicated.
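Two of the conventions above lend themselves to a short sketch. The first is mapping the special output filenames onto streams. The second is collecting the per-file report lines so a wrapper script can find its outputs automatically. Both functions are hypothetical. The report-line layout is assumed from the template in the manual, with an optional colon after each keyword tolerated because the exact punctuation may vary between versions. stdin is not handled because it applies to inputs only.

```python
import re
import sys

def resolve_output(name):
    """Map special output filenames: stdout/stderr -> the stream,
    'no' or '/dev/null' -> None (output suppressed), else the path."""
    if name in ("no", "/dev/null"):
        return None
    return {"stdout": sys.stdout, "stderr": sys.stderr}.get(name, name)

# Assumed layout of report lines: FILENAME <name> TYPE <Type> FORMAT <Format>
REPORT_LINE = re.compile(r"FILENAME:?\s+(\S+)\s+TYPE:?\s+(\S+)\s+FORMAT:?\s+(\S+)")

def parse_output_report(text):
    """Return (name, type, format) for every output file listed in `text`."""
    return [match.groups() for match in REPORT_LINE.finditer(text)]
```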
