
Mascot 2.4: Installation & Setup Manual


Contents

1. [Figure: Mascot Cluster Topology. The Mascot master connects to the external intranet over TCP/IP and, through a 100Base-T hub, to Mascot nodes 2 and 3.]

Hardware Requirements

All machines in a cluster should have processors of the same speed. Otherwise, the box with the slower processor(s) will become a bottleneck. Two network ports are required on the master server: one for external access and one for communication on the private local area network (LAN) that connects the master to the nodes. The LAN for the cluster should run at 100 Mb/s or faster.

The total amount of RAM required in a cluster is a function of how many sequence databases need to be held in memory simultaneously. Mascot supports an unlimited number of databases, but only those that are searched frequently justify being locked in memory. The others can be allowed to swap in and out of memory as needed. For example, a 5-node, 8-processor cluster (non-searching head node) might have 12 GB on the master and 6 GB on each of the search nodes. Assuming memory requirements for the OS are negligible, this gives nearly 24 GB in total for searches and databases. Even though 5 or 10 searches might be running, this should be sufficient to allow the more popular databases to remain resident in
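A minimal worked version of the memory example, assuming the five nodes are the master plus four search nodes (the reading under which the total comes to 24 GB) and that the non-searching master contributes nothing to the memory available for searches. The function name is illustrative, not part of Mascot.

```python
def search_memory_gb(num_nodes: int, gb_per_search_node: float,
                     master_searches: bool = False) -> float:
    """Total RAM available for searches and databases across the search nodes.

    Assumes OS memory is negligible and that a non-searching master
    (head node) adds nothing to the total.
    """
    search_nodes = num_nodes if master_searches else num_nodes - 1
    return search_nodes * gb_per_search_node


# 5-node cluster, non-searching head node, 6 GB on each search node
total = search_memory_gb(5, 6)
print(total)  # 24
```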
2. Directory structure for sequence databases:
mascot/sequence/SwissProt/incoming, current, old
mascot/sequence/NCBInr/incoming, current, old

For each database, the incoming directory provides a workspace for downloading and expanding a new database file. The current directory contains the active database, and this is where Mascot Monitor creates the memory-mapped compressed files. The old directory is where the immediate past database file is archived, just in case.

In the Mascot configuration, the filename for each database must include a wild card. This enables the automatic recognition and exchange of an update file. For example, the filename for the SwissProt database might be defined as SwissProt_*.fasta. This would match filenames that include a release number (e.g. SwissProt_2012_03.fasta) or a date stamp (e.g. SwissProt_20120311.fasta). Whenever Monitor sees a file in that directory which matches the database name and is not the current database, it will initiate the exchange process. This is why the wild card is important, even though you may not wish to track database dates or revision numbers. Even if you never intend to swap a database and have called it, say, SwissProt.fasta, you must still define it in the Mascot configuration using a wild card, as SwissProt*.fasta.

Database File Update Procedure

Mascot Database Manager can update database
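The wild-card rule can be illustrated with Python's fnmatch module. This is only a sketch: it assumes shell-style, case-sensitive matching, and Mascot Monitor's own matching rules may differ in detail. The helper name exchange_candidates is hypothetical.

```python
from fnmatch import fnmatchcase


def exchange_candidates(filenames, pattern, current):
    """Files that match the database's wild-card name but are not the
    current database file, i.e. files that would trigger an exchange."""
    return [name for name in filenames
            if fnmatchcase(name, pattern) and name != current]


files = ["SwissProt_2012_03.fasta", "SwissProt_20120311.fasta", "NCBInr.fasta"]
print(exchange_candidates(files, "SwissProt_*.fasta", "SwissProt_2012_03.fasta"))
# ['SwissProt_20120311.fasta']

# Without a wild card, the only file that can ever match is the current one,
# so an update file could never be recognised.
print(exchange_candidates(["SwissProt.fasta"], "SwissProt.fasta", "SwissProt.fasta"))
# []
```

Note that SwissProt*.fasta still matches a plain SwissProt.fasta, because * also matches an empty string.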
3. Important: Because Mascot frequently writes status information to nodelist.txt, you should open the file in a text editor that puts a lock on the file (e.g. vi or WordPad). This will prevent Mascot from modifying the file while it is being edited. nodelist.txt can be viewed using Mascot Status > Node.

There must be one or more node entries:

Node IP address, Port, Host name, [Number of processors], [OS], [Home dir], [Home dir from master]

Items in square brackets are optional, but the commas must always be supplied. IP address, port and host name must always be specified. The number of processors to be used on the node can be less than the number of processors available. If the total number of processors specified in all the nodes exceeds the number of licences available, only the licensed number will be used at any one time. If the OS is not specified, then DefaultNodeOS is used; it must be one of the choices shown under DefaultNodeOS. The home directory is the local path on the node to the root of the Mascot directory structure. If this is not specified, then DefaultNodeHomeDir is used. Home directory from master is the home directory on the node as seen from the master. This parameter is only applicable to a Windows cluster and must be omitted for a Linux cluster. Once a cluster has been started, an additional four status values will be written periodically to node
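A sketch of how a Node line could be read and the documented defaults applied. It assumes the field order shown above, treats an empty processor field as unspecified, and ignores anything after the seventh field (such as the status values Mascot writes once the cluster is running). The class and function names, and the IP address and port in the example, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeEntry:
    ip: str
    port: int
    host: str
    processors: Optional[int]            # None: not specified on the line
    os: str                              # falls back to DefaultNodeOS
    home_dir: str                        # falls back to DefaultNodeHomeDir
    home_dir_from_master: Optional[str]  # Windows clusters only


def parse_node_line(line: str, default_os: str, default_home_dir: str) -> NodeEntry:
    if not line.startswith("Node "):
        raise ValueError("node entries begin with the word Node and a space")
    fields = [f.strip() for f in line[len("Node "):].split(",")]
    # Commas are always supplied, so omitted optional items arrive as "".
    # Only the seven administrator-entered fields are read; later ones are ignored.
    fields += [""] * max(0, 7 - len(fields))
    ip, port, host, cpus, os_name, home, home_from_master = fields[:7]
    if not (ip and port and host):
        raise ValueError("IP address, port and host name must always be specified")
    return NodeEntry(
        ip=ip,
        port=int(port),
        host=host,
        processors=int(cpus) if cpus else None,
        os=os_name or default_os,
        home_dir=home or default_home_dir,
        home_dir_from_master=home_from_master or None,
    )


entry = parse_node_line("Node 192.168.0.2,5001,node2,2,,,", "Linux", "/usr/local/mascotnode")
print(entry)
```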
4. If you chose to configure Mascot as a single SMP server, you will see a screen similar to the one above and can proceed to start Mascot Monitor. If you chose cluster mode, refer to Chapter 11 for further configuration information.

Start Monitor at a shell prompt. You must have root privileges for this:

cd /usr/local/mascot/bin
./ms-monitor.exe

Then follow the hyperlink to the Database Status page to register your product key.

Step 6: Licence Registration

[Screenshot: browser showing the Mascot Server Product Key Registration page (ms-status.exe, Show REGPRODUCTKEY), with links to View current licence information and View database status.]

You are about to be transferred to the Matrix Science licensing web site to register a new product key. When the registration process has been completed, a licence file will be sent via e-mail, which should then be saved into the following directory on this server: /usr/local/mascot_2_4_0_64/config/licdb

Register Online Now

Offline Registration: If you are unable to view this page from a computer that can access the Internet, then click the button below to download a product registration file. You can then transfer this file to another computer which does have Internet access and open the URL shown below in a web browser. When prompted, select the registration file that you saved: http://www.matrixscience.com/licensing/register

Save Registr
5. Mascot Monitor:
1. Creates compressed files from the databases, checking that the FASTA database files are valid. Minor errors in the files are reported as warnings; more serious errors stop the databases from being used.
2. These files can then be mapped into memory to improve search times.
3. Allows swapping and updating of databases without interruption to executing searches. This means that Mascot can be available for running searches 24/7.
4. Deletes old copies of the FASTA databases to stop the disk becoming full; only the most recent copy is kept.
5. Optionally emails a system administrator with serious errors requiring immediate attention. Configuration of email settings in the options section of mascot.dat is described in Chapter 6.
6. Optionally emails users with their results if they didn't wait for them. Configuration of email settings in the options section of mascot.dat is described in Chapter 6.

Sequence Of Events When A New Database Is Added

When a new or updated database is added to a directory, the following sequence of events takes place:
1. If the entry in the mascot.dat file indicates that there should also be a reference file containing full text entries, Monitor looks for a file with the same name as the new file but with a .ref or .dat extension instead of .fasta. If there is no such file, the swap to the new database stops.
2. Compressed index files are made from the .fasta and reference files. For example, the fo
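Step 1 can be sketched as follows. The order in which .ref and .dat are checked is an assumption, and the helper names are hypothetical; this is not Monitor's implementation.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_reference_file(fasta: Path) -> Optional[Path]:
    """Return a file with the same name as the FASTA file but a .ref or .dat
    extension, or None if there is no such file."""
    for ext in (".ref", ".dat"):
        candidate = fasta.with_suffix(ext)
        if candidate.exists():
            return candidate
    return None


def swap_can_proceed(fasta: Path, reference_required: bool) -> bool:
    # The swap stops if mascot.dat calls for a reference file and none is found.
    return not reference_required or find_reference_file(fasta) is not None


with tempfile.TemporaryDirectory() as d:
    fasta = Path(d) / "SwissProt_2012_03.fasta"
    fasta.touch()
    before = swap_can_proceed(fasta, reference_required=True)
    fasta.with_suffix(".dat").touch()
    after = swap_can_proceed(fasta, reference_required=True)
    ref_name = find_reference_file(fasta).name

print(before, after, ref_name)  # False True SwissProt_2012_03.dat
```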
6. Insufficient data segment size will cause a large master_results.pl script to fail, and a Mascot search to fail with the error M00000 Out of memory (malloc, number of bytes requested).

Swap space: When all physical memory is exhausted, swap space is used. When all swap space is used, no more memory can be allocated and an error will be reported. Swap space is set up differently on each system; see the system documentation. Mascot shows free swap space for cluster nodes only.

Stack space: Has not been a problem yet.

Thread stack space: This is not normally an issue, since it is increased to 128k by all the binaries at run time.

Installation: Microsoft Windows

Release Notes

Mascot 2.4 is compiled for 32-bit and 64-bit Windows. Refer to the release notes for last-minute additions to the documentation, and to the Matrix Science web site support page for patches and known issues: http://www.matrixscience.com/mascot_support.html

Cluster Mode

If you have a licence to run Mascot on multiple processors and plan to do so on a networked cluster of machines, please familiarise yourself with the material in Chapter 11, Cluster Mode, before proceeding with the installation.

Overview

To install or upgrade Mascot, the following steps need to be performed:
1. Verify that the computer has sufficient memory and disk space.
2. Verify that the computer
7. To override this setting for a particular node, enter the directory on the node line.

DefaultNodeHomeDirFromMaster: This is the directory on the node as seen from the master. For a Windows cluster, this must be present and specified as a UNC name. The text <host_name> will be replaced by the host name as specified in the Node line. For a Linux cluster, this parameter must be commented out.

MascotNodeScript: This script is run for each node with the following parameters:
-i  IP address of node (required)
-t  The task to be performed (required): either StopNode (the script will try to stop the Mascot Node daemon or service on the specified node) or StartNode (the script will unconditionally update ms-mascotnode.exe, mascot.license and mascot.dat on the specified node, then start the Mascot Node daemon or service)
-f  Full path to the node's home directory (required)
-r  Port number of node (required)
-O  The operating system running on the node (required)

For a Linux cluster, the master and search nodes must be able to communicate using either ssh (preferred) or rsh without requiring passwords or passphrases. In the case of ssh, key-based authentication is the preferred mechanism. A less secure alternative for rsh is provided by file-based authentication using .rhosts or hosts.equiv. As shipped, load_node.pl executes ms-mascotnode.exe as root on each search node. If this is not acceptable, the
8. Sequence of events (continued):
6. When there are no more searches running that use the old database, the files for the old database will be unmapped from memory and the new files are then mapped into memory.
7. Any files in the old directory for the database which have the same base name as the current files are deleted.
8. The .fasta and .ref files for the outgoing database are moved to the old directory.
9. The compressed index files for the outgoing database are deleted.

Why Memory Map and Lock the FASTA Files?

To speed up the processing of the FASTA files, they should be mapped into memory. Databases can be configured in three operational modes:
1. Without memory mapping. Do not choose this option; it will make searches very slow.
2. Memory mapping the database files but not locking the memory. This gives the best performance in most cases. When the system gets low on memory, the files are swapped out of memory to disk. On most platforms, this will give better performance than simply relying on the system file cache.
3. Memory mapping the files and locking the memory. This gives the best possible performance, but does require sufficient RAM for the databases, the operating system, searches and any other applications that are to run concurrently with Mascot.

In order to reduce the amount of memory required and to prevent memory fragmentation, the sequence strings from the FASTA database are sav
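Steps 7 to 9 can be sketched on a pair of directories. This assumes the index files share the outgoing FASTA file's base name (real Mascot index file names and suffixes are not given here; .idx below is made up) and that a reference file uses .ref or .dat. The function name is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

ARCHIVED_SUFFIXES = (".fasta", ".ref", ".dat")  # FASTA and reference files


def retire_outgoing(current_dir: Path, old_dir: Path, stem: str) -> None:
    # Step 7: remove earlier copies in 'old' that share the base name.
    for f in list(old_dir.glob(stem + ".*")):
        f.unlink()
    for f in list(current_dir.glob(stem + ".*")):
        if f.suffix in ARCHIVED_SUFFIXES:
            # Step 8: move the FASTA and reference files to 'old'.
            shutil.move(str(f), str(old_dir / f.name))
        else:
            # Step 9: delete the compressed index files.
            f.unlink()


with tempfile.TemporaryDirectory() as root:
    current, old = Path(root, "current"), Path(root, "old")
    current.mkdir()
    old.mkdir()
    (current / "SwissProt_2012_03.fasta").write_text(">sp|P1\nMK\n")
    (current / "SwissProt_2012_03.idx").touch()     # hypothetical index file
    (current / "SwissProt_2012_04.fasta").touch()   # the incoming database
    (old / "SwissProt_2012_03.fasta").write_text("stale copy")
    retire_outgoing(current, old, "SwissProt_2012_03")
    left_in_current = sorted(p.name for p in current.iterdir())
    in_old = sorted(p.name for p in old.iterdir())
    archived_text = (old / "SwissProt_2012_03.fasta").read_text()

print(left_in_current, in_old)  # ['SwissProt_2012_04.fasta'] ['SwissProt_2012_03.fasta']
```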
9. The file system (NFS or a local file system) needs to support file locking and memory mapping. The following files will be locked and unlocked using the fcntl F_SETLKW system call:
mascot.job
getseq.job
mascot.control
mascotnode.control
If Mascot Daemon, Mascot Distiller or any application using the task management functions in client.pl is used, there will also be a task_id file in each data/yyyymmdd directory that will be locked and unlocked. The following files will be memory mapped for read/write:
mascot.control
mascotnode.control
The location of all these files can be specified in the options section of mascot.dat so that, if necessary, they can be put on a local file system. FASTA files larger than 2 GB are fully supported on ext2, ext3 and ext4 partitions.

System limits

Memory limits: There are several types of memory limits that can stop Mascot from running:
1. Virtual address space: When files are memory mapped, the address space required can be large (the amount of physical RAM or swap space is not an issue here).
2. The amount of memory that can be locked: On most systems, memory can only be locked by root.
3. Physical memory: It is obviously not possible to lock more memory than is physically available.
4. Data segment size: The amount of memory that an executable or Perl script has access to. The default is sometimes too small to run master_results.pl and big searches.
5. Swap space
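A quick way to check that a candidate directory (for example an NFS mount) supports both requirements is to lock and memory-map a scratch file there. This Python sketch is Unix-only; it uses fcntl.lockf, which in CPython issues fcntl() record locks (F_SETLKW when blocking), but it does not replicate Mascot's own locking of mascot.job or mascot.control. The function name is hypothetical.

```python
import fcntl
import mmap
import os
import tempfile


def supports_locking_and_mmap(directory: str) -> bool:
    """Create a scratch file in `directory`, take a blocking fcntl() record lock,
    write through a shared read/write memory map, then release everything."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, b"\0" * mmap.PAGESIZE)
        # lockf() without LOCK_NB is a blocking fcntl() lock (F_SETLKW).
        fcntl.lockf(fd, fcntl.LOCK_EX)
        try:
            with mmap.mmap(fd, mmap.PAGESIZE, access=mmap.ACCESS_WRITE) as m:
                m[:4] = b"test"
                m.flush()
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
        with open(path, "rb") as f:
            return f.read(4) == b"test"
    except OSError:
        # e.g. ENOLCK when the file system does not support locking
        return False
    finally:
        os.close(fd)
        os.unlink(path)


ok = supports_locking_and_mmap(tempfile.gettempdir())
print(ok)
```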
10. or enter, which is only shown in ambiguous cases. Bold italic fixed-pitch font indicates a variable for which an appropriate value should be entered.

Introduction

Mascot is a software system for protein identification by matching mass spectrometry (MS) data against FASTA-format protein or nucleic acid sequence databases. This can be done in three different ways:
1. A Peptide Mass Fingerprint (PMF), in which the MS data are peptide molecular masses from the digestion of a protein by an enzyme.
2. A Sequence Query (SQ), also called a sequence tag, in which MS data are combined with amino acid sequence or composition data.
3. An MS/MS Ions Search (MIS), which uses MS/MS data from one or more peptides.

MS data are submitted to Mascot in the form of peak lists, that is, lists of centroided mass values, possibly with associated intensity values. The result of a search is a ranked list of the most closely matching proteins. Mascot uses a probability-based scoring algorithm, so it is possible to report whether a match is statistically significant. If an exact match is not present in the database, the highest scoring matches will be those entries which exhibit the greatest homology.

Overview

This manual describes how to install, configure and administer Mascot. It is not a User Guide. Mascot includes a linked collection of HTML help pages that provide guidance and application-related reference material for end users. Mascot
11. refers to the data system on which the Mascot search engine executes. The term client is used very loosely. It may refer to a data system attached to a mass spectrometer, or it may refer to any system at which a user interacts with the Mascot server via a web browser. In a small laboratory, the server and client may be one and the same computer. This doesn't affect installing or using Mascot, but it does introduce additional considerations, such as the need to adjust system priorities to ensure that the instrument control and data acquisition software is responsive to the real-time needs of both instrument and operator.

Configuration

Mascot configuration files are structured text files. Modifications can be made using a browser-based configuration editor and take effect without a system restart.

Search Engine

The Mascot search engine accepts data and parameters on STDIN in MIME format, executes a search of the specified FASTA-format database, and outputs a structured text file containing the search results together with the input data and the complete set of search parameters. The results file contains everything necessary to repeat the search at a later date, should the need arise. In the default configuration, a new directory is created on the server for each day's results files. If required, the contents of these results files can be parsed into an external database to be queried and analysed.

Monitor

Swapping datab
12. unigene contains sub-directories for species-specific UniGene indexes.
x-cgi is a directory for administrative CGI executables, to which access may need to be restricted. This can be achieved using either Mascot security or web server security.

Installation

Clean installation: Create a directory for the Mascot program files. In this documentation it is assumed to be called mascot, but any name can be used. This directory should not be in a path mapped to a web server URL.

Version upgrade: If upgrading Perl, do this before upgrading Mascot. Ensure that no one will try to use Mascot during the upgrade procedure. Kill the ms-monitor.exe process. You might wish to make a backup of the existing files before they are overwritten. All configuration files in the config directory, apart from mascot.dat and the security settings, will be overwritten by new files. All results files and sequence databases, apart from SwissProt, will be retained. The installation script will update mascot.dat by adding any new options, but will retain all existing sequence database configuration settings and other options.

Unpack the Mascot file system

Copy the files mascot.tar.bz2 and swissprot.tar.bz2 from the Mascot DVD into the mascot directory and unpack the archives:

bzip2 -d mascot.tar.bz2
bzip2 -d swissprot.tar.bz2
tar xvf mascot.tar
tar xvf swissprot.tar

For 32-bit Linux, the 32-bit binaries should be u
13. For example, to add a choice of Ferns or human, add the following to the taxonomy file:
[Example entry with Title, Include and Exclude lines for Ferns or human; taxonomy ID 3263 appears in the entry.]
And to add the choice of Not human or mice, add the following to the taxonomy file:
[Example entry with Title, Include and Exclude lines for Not human or mice; taxonomy ID 10088 appears in the entry.]

Note that "all species" (root) has the ID 1. It is, of course, possible to accidentally specify a selection that will result in no species matching, for example, include humans and exclude animals. If you wish to include species in the taxonomy file without having them appear on the search form, the keyword Hidden should appear on the line following the title line.

Location and format of species lists

3701  Arabidopsis  (scientific name)
3701  Cardaminopsis  (synonym)
3702  Arabidopsis thaliana  (scientific name)
3702  Arbisopsis thaliana  (misspelling)
3702  thale cress  (preferred common name)
3702  thale cress  (common name)
3702  mouse-ear cress  (common name)

NCBI Files

The NCBI provides two files that list all the species for which it has one or more sequences. These files are called names.dmp and nodes.dmp. As shown above, names.dmp is a list of scientific names, synonyms and misspellings for the species. From this list, you can easily find the ID for a given species. For example:

The file nodes.dmp specifies the tree structure. The first column is a taxonomy ID and the
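As an illustration of using the NCBI files, the sketch below looks up taxonomy IDs by name in names.dmp and walks parent links in nodes.dmp, where the second column is the parent's taxonomy ID and the root (ID 1) is its own parent. The include/exclude test is a simplified rule assumed for illustration, not a description of Mascot's taxonomy filter, and the sample nodes form a toy tree with the many intermediate ranks omitted. All function names are hypothetical.

```python
def parse_dmp(lines):
    """Yield the fields of NCBI .dmp lines, which look like 'a\t|\tb\t|\t...\t|'."""
    for line in lines:
        if line.strip():
            yield [field.strip() for field in line.split("|")][:-1]


def taxids_for_name(names_lines, name):
    """All taxonomy IDs in names.dmp whose name text matches `name` (case-insensitive)."""
    wanted = name.casefold()
    return {int(f[0]) for f in parse_dmp(names_lines) if f[1].casefold() == wanted}


def parent_map(nodes_lines):
    """nodes.dmp: first column is the taxonomy ID, second is the parent's ID."""
    return {int(f[0]): int(f[1]) for f in parse_dmp(nodes_lines)}


def lineage(taxid, parents):
    """IDs from `taxid` up to the root; the root (ID 1) is its own parent."""
    path = [taxid]
    while parents[taxid] != taxid:
        taxid = parents[taxid]
        path.append(taxid)
    return path


def is_selected(taxid, include, exclude, parents):
    # Simplified, assumed rule: the lineage must reach an included ID
    # and must not pass through an excluded ID.
    ids = set(lineage(taxid, parents))
    return bool(ids & include) and not (ids & exclude)


NAMES = [
    "3701\t|\tArabidopsis\t|\t\t|\tscientific name\t|\n",
    "3702\t|\tArabidopsis thaliana\t|\t\t|\tscientific name\t|\n",
    "3702\t|\tthale cress\t|\t\t|\tcommon name\t|\n",
    "3702\t|\tmouse-ear cress\t|\t\t|\tcommon name\t|\n",
]
# Toy tree: real lineages have many intermediate ranks between these IDs.
NODES = [
    "1\t|\t1\t|\tno rank\t|\n",
    "3701\t|\t1\t|\tgenus\t|\n",
    "3702\t|\t3701\t|\tspecies\t|\n",
    "9606\t|\t1\t|\tspecies\t|\n",
]
parents = parent_map(NODES)
print(taxids_for_name(NAMES, "Thale Cress"))  # {3702}
print(lineage(3702, parents))                 # [3702, 3701, 1]
```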
14. Mascot will reconfigure the sub-cluster to exclude the faulty node and restart.

Configuration

mascot.dat

SubClusterSet X Y
Large clusters can be divided into sub-clusters. X is a unique integer value (0-based) used to identify the sub-cluster. Y is the maximum number of licensed processors assigned to the sub-cluster. Since a licensed processor is good for up to 4 cores, it may be clearer to think of Y as cores/4. A single cluster must have a single entry with X set to 0.

nodelist.txt

This file is used to define the nodes that belong to the cluster. For a very large cluster, it is advisable to define a few percent of additional nodes as spares. For example, if 51 nodes with 102 processors were available and Mascot was configured to use 2 sub-clusters, each of 50 processors, the node with the 2 spare processors could be used to replace a failed node automatically. At start-up, ms-monitor starts each sub-cluster in turn, taking the required number of nodes from nodelist.txt in the order specified in the file. If you wish to override this behaviour, specify a sub-cluster number in nodelist.txt. Each line begins with the word Node, followed by a space and then a comma-delimited list of configuration parameters: IP address, port, computer host name, maximum number of node CPUs to be used, operating system, local path to home directory, status (0 = available), sub-c
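A sketch of the default start-up allocation described above: sub-clusters are filled in turn from nodelist.txt order until each reaches Y processors, and the leftover nodes are spares. It assumes every listed processor counts toward Y and ignores explicit sub-cluster numbers in nodelist.txt; the function and host names are hypothetical.

```python
def allocate_subclusters(nodes, subclusters):
    """nodes: (host, processors) pairs in nodelist.txt order.
    subclusters: (X, Y) pairs as in 'SubClusterSet X Y'.
    Returns ({X: [hosts]}, [spare hosts])."""
    queue = list(nodes)
    allocation = {}
    for x, max_processors in subclusters:
        hosts, total = [], 0
        while queue and total < max_processors:
            host, processors = queue.pop(0)
            hosts.append(host)
            total += processors
        allocation[x] = hosts
    spares = [host for host, _ in queue]
    return allocation, spares


nodes = [(f"node{i:02d}", 2) for i in range(1, 52)]  # 51 nodes, 102 processors
allocation, spares = allocate_subclusters(nodes, [(0, 50), (1, 50)])
print(len(allocation[0]), len(allocation[1]), spares)  # 25 25 ['node51']
```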
15. series, single charge
5  b series, double charge
6  y series, single charge
7  y-NH3 series, single charge
8  y series, double charge
9  c series, single charge
10  c series, double charge
11  x series, single charge
12  x series, double charge
13  z series, single charge
14  z series, double charge
15  a-H2O series, single charge
16  a-H2O series, double charge
17  b-H2O series, single charge
18  b-H2O series, double charge
19  y-H2O series, single charge
20  y-H2O series, double charge
21  a-NH3 series, double charge
22  b-NH3 series, double charge
23  y-NH3 series, double charge
25  internal yb series, single charge
26  internal ya series, single charge
27  z+H series, single charge
28  z+H series, double charge
29  high-energy d and d' series, single charge
31  high-energy v series, single charge
32  high-energy w and w' series, single charge
33  z+2H series, single charge
34  z+2H series, double charge

If there are multiple tags for a query, comma-separated groups of these numbers are output for each tag.

hn_qm_drange is output for a query that includes an error-tolerant sequence tag. It defines the range of positions within which an unsuspected modification has been located. For a peptide of 1
16. If this is not the case, make the necessary changes, then save mascot.dat. For a 5-node, 10-CPU cluster, typical entries might be:

Cluster
# Enable (1) or disable (0) cluster mode
Enabled 1
# MasterComputerName must be the hostname
MasterComputerName zx80
# Node defaults
DefaultNodeOS Linux
DefaultNodeHomeDir /usr/local/mascotnode
# Following line must be commented out WHEN this is a homogeneous
MascotNodeScript /usr/local/mascot/bin/load_node.pl
# Sub-cluster definition
# Syntax is SubClusterSet X Y
# where X is the sub-cluster number and Y is the maximum
# number of CPUs to use within the given sub-
SubClusterSet 0 10
# Time-outs, log files
IPCTimeout 5                     # seconds with no response before timeout
IPCLogging 0                     # no logging = 0, minimal = 1, verbose = 2
IPCLogfile ../logs/ipc.log       # relative path
CheckNodesAliveFreq 30           # seconds between node health checks
SecsToWaitForNodeAtStartup 20    # seconds to wait for node to
end

3. Open mascot/config/nodelist.txt in a text editor. Enter configuration information for the cluster; the parameters are fully described below in the Reference section. Save as nodelist.txt. For a 5-node, 10-CPU cluster, typical entries might be:

# Cluster node definitions
# Each line begins with the word Node, followed by a space and then a
# comma-delimited list of configuration parameters: ip address, port, computer host n
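A sketch for reading a section such as the Cluster entries above, assuming that the section starts on a line containing only its name, ends at a line reading end, and that # begins a comment. Repeated keys such as SubClusterSet are kept as lists. The function name is hypothetical and this is not a Mascot utility.

```python
def read_section(text: str, section: str) -> dict:
    """Return {key: [values...]} for the 'Key value' lines between a line
    containing only `section` and the following 'end' line."""
    values = {}
    inside = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if not inside:
            inside = (line == section)
            continue
        if line == "end":
            break
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        values.setdefault(key, []).append(value)
    return values


SAMPLE = """Cluster
# Enable (1) or disable (0) cluster mode
Enabled 1
MasterComputerName zx80
DefaultNodeOS Linux
DefaultNodeHomeDir /usr/local/mascotnode
MascotNodeScript /usr/local/mascot/bin/load_node.pl
SubClusterSet 0 10
IPCTimeout 5    # seconds with no response before timeout
end
"""

cluster = read_section(SAMPLE, "Cluster")
sub_x, sub_y = map(int, cluster["SubClusterSet"][0].split())
print(cluster["MasterComputerName"], sub_x, sub_y)  # ['zx80'] 0 10
```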
17. If set to 1, each charge state will be searched, but only the charge state that gets the highest scoring match is saved to the result file and reported. This is the recommended setting. Note that this switch only applies to MS/MS queries, including tags. Independent queries are always generated if multiple charge states are specified for molecular mass queries.

CacheDirectory ../data/cache/%Y/%m

Cache files are created to improve performance when viewing large search results. This option specifies the relative path from the cgi directory to the location for saving report cache files. The actual directory will be, for example, ../data/cache/2010/02/uwcuxlsxx3s524f4vnnz3btmni, where the lowest-level directory is an md5sum of the .dat filename and the size and last modified date of the .dat file. The % tokens are followed by any of the conversion specifiers supported by the strftime function (http://www.cplusplus.com/reference/clibrary/ctime/strftime/). For example, %Y gets converted to the year as a decimal number including the century, %m to the month as a decimal number (range 01 to 12), and %d to the day of the month as a decimal number (range 01 to 31). The date used will be the last modified date of the .dat file rather than the time that the search started. See also ResfileCache and ResultsCache.

CentroidWidth 0.25
CentroidWidthCount 1000

CentroidWidth is the width in Daltons of the sliding window used for
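The date part of the cache path can be reproduced with strftime. This sketch covers only the % token expansion; the lowest-level directory name (derived from an md5sum of the .dat file's name, size and modification date) is not reproduced because its exact encoding is not specified here. Local time is assumed, and the function names are hypothetical.

```python
import os
from datetime import datetime


def cache_parent_dir(template: str, dat_file_modified: datetime) -> str:
    # Expand the strftime-style % tokens using the .dat file's last-modified date.
    return dat_file_modified.strftime(template)


def cache_parent_dir_for(template: str, dat_path: str) -> str:
    # Uses the file's modification time (local time), not the search start time.
    return cache_parent_dir(template, datetime.fromtimestamp(os.path.getmtime(dat_path)))


print(cache_parent_dir("../data/cache/%Y/%m", datetime(2010, 2, 14)))  # ../data/cache/2010/02
```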
18. MailTransport 2
MonitorEmailCheckFreq 300
SendmailPath /usr/lib/sendmail

Mascot can be configured to use email for two purposes:
1. When the search engine executes as a CGI application, email can be used to send the results of a search to a user who accidentally or deliberately disconnected before the search was complete. This facility can be enabled by setting EmailUsersEnabled to 1, or disabled by setting it to 0.
2. Serious error messages can be emailed to an administrator. This facility can be enabled by setting EmailErrorsEnabled to 1, or disabled by setting it to 0. Error messages that are considered serious are identified in the file errors.html. This file can be found in the root directory of the installation CD-ROM, and is displayed by clicking on the link Error message descriptions at the top of the database status page.

A number of parameters are used to define how email should be sent:
MailTransport should be set to one of the following values: 0 for CMC, 1 for MAPI, 2 for sendmail, 3 for Blat.
EmailService is the service name (CMC only).
EmailPassword is the password, if any, required to log onto MAPI or CMC.
EmailProfile is the MAPI profile name.
SendmailPath is the path to sendmail or an equivalent.
EmailFromUser is the name which will appear in the From field of the email message.
EmailFromTextName will appear in the Title field of the message.
If Email
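A small lookup that records which of the parameters listed above apply to each MailTransport value, for example when checking a mascot.dat options section by script. The structure and names are illustrative only; Blat-specific settings are not covered in this excerpt.

```python
MAIL_TRANSPORTS = {0: "CMC", 1: "MAPI", 2: "sendmail", 3: "Blat"}

# Transport-specific parameters as described above (Blat settings not covered).
TRANSPORT_PARAMETERS = {
    "CMC": ["EmailService", "EmailPassword"],
    "MAPI": ["EmailProfile", "EmailPassword"],
    "sendmail": ["SendmailPath"],
    "Blat": [],
}
COMMON_PARAMETERS = ["EmailFromUser", "EmailFromTextName"]


def parameters_for(mail_transport: int) -> list:
    if mail_transport not in MAIL_TRANSPORTS:
        raise ValueError("MailTransport must be 0, 1, 2 or 3")
    return TRANSPORT_PARAMETERS[MAIL_TRANSPORTS[mail_transport]] + COMMON_PARAMETERS


print(parameters_for(2))  # ['SendmailPath', 'EmailFromUser', 'EmailFromTextName']
```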
19. [Screenshot: Mascot protein summary report listing 60 kDa chaperonin (groL) matches from Stenotrophomonas maltophilia and Xanthomonas species (e.g. CH60_STRMS, CH60_STRMK, CH60_XANAC), each with score 42, including a "Proteins matching the same set of peptides" section.]
20. There are often many names for one particular species, e.g. homo sapiens, human, man.
3. Names are sometimes misspelled, e.g. homo sapeins.
4. Continual re-classification of species is taking place.
5. Some non-redundant databases only reliably give one species when several submissions from different species have identical sequences.
6. There are differences of opinion regarding the taxonomy tree structure.

This section describes how the Mascot taxonomy filter works and how to configure it. Most of the configuration that will be required should be simple to change. For example, the list of species displayed in the search form can be modified easily, and it is fairly simple to download updated taxonomy lists from the vendors or public web sites. However, modifying the configuration to use a new format and a different numbering system is a more complex task that may take some time.

The NCBI keeps its list of taxonomy IDs up to date and guarantees that the ID for a given species will not change, although some of the names used for that ID may change. Mascot configurations all use the NCBI IDs, but it would be possible to configure Mascot to use a different system.

Modifying the list in the search form window

The list in the search form is taken from the taxonomy file in the mascot/config directory.

[Screenshot: search form showing the Database (cRAP, IPI_human), Quantitation (None) and Taxonomy (All entries) fields.]
21. [Screenshot: Mascot Configuration, Enzymes page (x-cgi/ms-config.exe), listing the Title, Sense, Cleave at, Restrict, Independent and Semispecific settings of each enzyme, for example Trypsin (C-Term, KR, restrict P), Arg-C (C-Term, R, restrict P), Asp-N (N-Term, BD), Asp-N_ambic (N-Term, DE), Formic_acid (C-Term, D), Lys-C/P (C-Term, K), Trypsin/P (C-Term, KR) and semitrypsin (C-Term, KR, semispecific), with Edit and Delete links and a link to add a new enzyme.]

Enzyme None is a special case, which cannot be modified or deleted. All the other enzyme definitions can be edited or deleted, and new ones added. The edit page allows you to test a new enzyme definition against a protein

[Screenshot: enzyme edit page in the Mascot configuration editor.]
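The Cleave at, Restrict and Sense columns can be read as in the sketch below: for a C-Term enzyme, cut after a Cleave at residue unless the following residue is a Restrict residue; for an N-Term enzyme, cut before it unless the preceding residue is restricted. It is a fully specific digest only. The Independent and Semispecific settings and enzymes with more than one rule are not modelled, the N-Term treatment of Restrict is an assumption, and the function name is hypothetical.

```python
def digest(sequence: str, cleave_at: str, restrict: str = "", sense: str = "C") -> list:
    """Fully specific in-silico digest driven by Cleave at / Restrict / Sense."""
    if sense not in ("C", "N"):
        raise ValueError("sense must be 'C' (C-Term) or 'N' (N-Term)")
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue not in cleave_at:
            continue
        if sense == "C":
            # Cut after this residue unless the next residue is a Restrict residue.
            cut = i + 1
            if cut < len(sequence) and sequence[cut] not in restrict:
                peptides.append(sequence[start:cut])
                start = cut
        else:
            # Cut before this residue unless the previous residue is restricted.
            if i > 0 and sequence[i - 1] not in restrict:
                peptides.append(sequence[start:i])
                start = i
    peptides.append(sequence[start:])
    return peptides


print(digest("MKPAKRGLR", "KR", "P"))     # Trypsin:   ['MKPAK', 'R', 'GLR']
print(digest("MKPAK", "KR"))              # Trypsin/P: ['MK', 'PAK']
print(digest("AADGDC", "BD", sense="N"))  # Asp-N:     ['AA', 'DG', 'DC']
```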
22. and provided that you do these two things:
a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities. This must be distributed under the terms of the Sections above.
b) Give prominent notice with the combined library of the fact that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work.
8. You may not copy, modify, sublicense, link with, or distribute the Library except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, link with, or distribute the Library is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.
9. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Library or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Library (or any work based on the Library), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Library or works based on it.
10. Each time you redistribute the Library (or any work based on the Libra
23. and then click Settings.

[Screenshot: Windows Firewall, Advanced tab, showing Network Connection Settings (1394 Connection, Local Area Connection, Wireless Network Connection), Security Logging, ICMP and Default Settings (Restore Defaults).]

Choose ICMP Settings, check Allow incoming echo request, and choose OK.

[Screenshot: ICMP Settings dialog with Allow incoming echo request checked. Description: Messages sent to this computer will be repeated back to the sender. This is commonly used for troubleshooting, for example, to ping a machine. Requests of this type are automatically allowed if TCP port 445 is enabled.]

Now go to the Exceptions tab and ensure that File and Printer Sharing is che
24. daemon file data F981122.dat
daemon release MSDB_20020121.fasta
daemon queries 8
daemon num_hits 6
daemon h1 1A6K 103 1 00 17004 1
daemon h1_text myoglobin sperm whale
daemon reptype concise
daemon Sigscoreprot 72
daemon ionquery1 734.992175 from 736.000000 1
daemon ionquery2 746.992175 from 748.000000 1
daemon ionquery3 939.992175 from 941.000000 1
daemon ionquery4 1515.992175 from 1517.000000 1
daemon ionquery5 1591.992175 from 1593.000000 1
daemon ionquery6 1853.992175 from 1855.000000 1
daemon ionquery7 1980.992175 from 1982.000000 1
daemon ionquery8 2111.992175 from 2113.000000 1
daemon Selectpeptides 0

For an MS/MS ions search, the output is of the form:

daemon file data F981123.dat
daemon release MSDB_20020121.fasta
daemon queries 4
daemon num_hits 1
daemon h1 Q9XZI2 286 477 1 00 79480 1
daemon h1_text HEAT SHOCK PROTEIN 70 Crassostrea gigas Pacific oyster
daemon reptype peptide
daemon Sigscoreprot 72
daemon ionquery1 1341.784350 from 671.900000 2 query 1
daemon score1 95.12
daemon Sigscore1 49
daemon ionquery2 1614.584350 from 808.300000 2 query 2
daemon score2 74.55
daemon Sigscore2 48
daemon ionquery3 1945.784350 from 973.900000 2 query 3
daemon score3 89.84
daemon Sigscore3 47
daem
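The ionquery values pair a relative molecular mass with an observed m/z and charge. The pairs listed above are consistent with Mr = z × (m/z) - z × 1.007825, where 1.007825 Da is the mass of a hydrogen atom. The check below is derived from those numbers as an illustration; it is not a statement of how Mascot computes them internally.

```python
H_ATOM = 1.007825  # mass of a hydrogen atom in Da


def mr_from_mz(mz: float, charge: int) -> float:
    """Relative molecular mass implied by an observed m/z and charge."""
    return mz * charge - charge * H_ATOM


# (observed m/z, charge, Mr as listed in the daemon output above)
listed = [
    (736.0, 1, 734.992175),
    (2113.0, 1, 2111.992175),
    (671.9, 2, 1341.784350),
    (808.3, 2, 1614.584350),
    (973.9, 2, 1945.784350),
]
for mz, z, mr in listed:
    print(f"{mz:.6f} ({z}+) -> {mr_from_mz(mz, z):.6f}, listed {mr:.6f}")
```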
25. distribution of files and executables is all handled when Mascot Monitor starts.
Windows
During Mascot installation on a Windows system, the following dialog will be displayed:
[Screenshot: Mascot Server Setup, Cluster Configuration page. "Choose whether to use Mascot cluster mode. Your Mascot licence permits you to use cluster mode. If you wish to enable this feature, please select the option below and then click the Configure button to specify the nodes that will be in the cluster." Controls: an Enable Mascot cluster mode checkbox and a Configure button.]
At least one node must be defined in the cluster. If you enable cluster mode, the Configure button invokes the following dialog:
[Screenshot: cluster node list with columns Node Address, Port, Processors, UNC Node Path and Node Directory, and buttons including Add and Delete.]
Choose Add to configure each cluster node.
[Screenshot: Mascot Cluster Node dialog.]
Enter the UNC path to the location on the node where Mascot will install its cluster node files. Make sure that this directory path is unique to this node entry. (The example shows a path on the node koala, with a Browse button.)
Enter the equivalent of the above path as seen locally on the node: c:\mascotnode
Node Address: The node name or IP address can usually be determined from the UNC path above. However, you may override these values below if desired.
Use this specific host name (checked): koala
Use this specific IP address: 192.168
26. engine is run by the web server as a CGI application. It is also possible to execute the search engine as a console or command line application. This chapter provides the information required to write scripts or applications that interface to the Mascot search engine and associated programs.
Mascot Search Engine
The Mascot search engine, cgi/nph-mascot.exe, accepts command line arguments and a MIME format ASCII text file on standard input (STDIN) containing search data and parameters:
nph-mascot.exe 1 -commandline -f path -taskID number -sessionID string < in.asc
The first argument is required and is a digit between 1 and 4, which determines the mode of operation:
1 Normal search. MS/MS data, if any, form part of the MIME format input file.
2 Monitor test mode 0
3 Monitor test mode 1
4 Repeat search. The MIME format input file contains a reference to a Mascot results file, which may contain MS/MS data.
Optional argument -commandline is a flag. If present, HTML formatted output is not written to STDOUT.
Optional argument -f allows a result file path to be specified. In the absence of this argument, the result file will be written to a daily sub-directory of mascot/data and have the filename F123456.dat, where 123456 is an auto-incremented job number.
Optional argument -taskID is used to specify a unique numeric identifier. This identifier should be
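A minimal Python sketch of scripting a command-line search. It assumes the single-dash option spellings shown above and that the caller supplies the working directory containing nph-mascot.exe; `build_nph_mascot_args` and `run_search` are hypothetical helper names, not part of Mascot.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# Modes for the required first argument, as listed above.
MODES = {
    1: "normal search",
    2: "monitor test mode 0",
    3: "monitor test mode 1",
    4: "repeat search",
}


def build_nph_mascot_args(
    mode: int,
    exe: str = "./nph-mascot.exe",
    commandline: bool = True,
    result_path: Optional[str] = None,
    task_id: Optional[int] = None,
    session_id: Optional[str] = None,
) -> List[str]:
    """Return an argv list for nph-mascot.exe (option spelling assumed)."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}, got {mode}")
    args = [exe, str(mode)]
    if commandline:
        args.append("-commandline")  # suppress HTML output on STDOUT
    if result_path is not None:
        args += ["-f", result_path]  # otherwise a daily sub-directory of mascot/data
    if task_id is not None:
        args += ["-taskID", str(task_id)]
    if session_id is not None:
        args += ["-sessionID", session_id]
    return args


def run_search(args: List[str], mime_input: Path, cwd: str) -> subprocess.CompletedProcess:
    """Feed the MIME-format input file to the search engine on STDIN."""
    with mime_input.open("rb") as stdin:
        return subprocess.run(args, stdin=stdin, capture_output=True, cwd=cwd)
```

For example, `run_search(build_nph_mascot_args(4, result_path=...), Path("in.asc"), cwd=...)` would repeat a search; the caller should check `returncode` and `stderr` on the returned object.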
27. one or more accession strings and an optional text string describing the entry. Apart from the use of the greater-than character, the precise syntax of the title line varies from database to database. The title line is delimited from the sequence that follows by a platform-dependent new-line character.
The title line is followed by lines of contiguous sequence characters. Line lengths vary between databases, anything from 60 characters to a thousand or more. Mascot can handle lines up to 50,000 characters long. The end of a sequence is indicated when the following line is either a new title line or the end of the file. For example:
VYEYVRKYAEHRMLVVAEOPLHAMRKGLLDVLPKNSLEDLTAEDFRLLVNGCGEVNVOML
ISFTSFNDESGENAEKLLOFKRWFWSIVERMSMTERQDLVYFWTSSPSLPASEEGFOPMP
SITIRPPDDOQHLPTANTCISRLYVPLYSSKOILKOKLLLAIKTKNFGFV
>104K_THEPA P15711 104 KD MICRONEME RHOPTRY ANTIGEN
MKFLILLFNILCLFPVLAADNHGVGPOGASGVDPITFDINSNOTGPAFLTAVEMAGVKYL
OQVOHGSNVNIHRLVEGNVVIWENASTPLYTGAIVTNNDGPYMAYVEVLGDPNLOFFIKSG
DAWVTLSEHEYLAKLOETROAVHIESVFSLNMAFOQLENNKYEVETHAKNGANMVTFIPRN
Mascot doesn't search the Fasta file directly. When a new database is recognised, Mascot Monitor uses the Fasta file to create a set of compressed files. One reason for doing this is to separate the sequence string from the title line, because only the sequence string needs to be memory mapped. In the case of a database with predominantly short sequences, this greatly reduces the amount of memory r
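The Fasta layout described above (a `>` title line, then sequence lines until the next title line or the end of the file) can be read with a short Python generator. This is a sketch, not Mascot's own reader; splitting the title at the first space is an assumption, since title syntax varies between databases.

```python
from typing import Iterable, Iterator, List, Tuple

MAX_LINE = 50_000  # longest sequence line Mascot can handle, per the text


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (title, sequence) pairs; the title excludes the leading '>'."""
    title = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")  # the new-line convention is platform dependent
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif line:
            if title is None:
                raise ValueError("sequence data before the first title line")
            if len(line) > MAX_LINE:
                raise ValueError(f"sequence line longer than {MAX_LINE} characters")
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)  # the end of the file ends the last entry


def split_title(title: str) -> Tuple[str, str]:
    """Split a title at the first space into (accession, description)."""
    accession, _, description = title.partition(" ")
    return accession, description
```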
28. t be more than 10 Mb. All request parameter names are case-insensitive. Any parameter value can optionally be quoted.
DB: mandatory parameter; can only appear once. If several databases are searched, then ms-getseq must be called separately for each database.
ACCESSION: must appear at least once and consist of entries in the format "accession_string" frameNo. Quotes around accession strings are mandatory. The frame number can be an integer from 0 to 6 and can only be specified for NA databases; otherwise an error will be reported. Accessions can be delimited with commas, spaces, tabs or new-line characters. Several ACCESSION fields will be merged by ms-getseq.exe into one internally.
SHOWPI: can appear only once. If set to TRUE, pI values are calculated and output for each sequence.
SHOWTITLE: can appear only once. If set to TRUE, a description is output for each database entry.
SHOWLEN: can appear only once. If set to TRUE, the length of the sequence string is output for each database entry.
SHOWSEQUENCE: can appear only once. If set to TRUE, the sequence string is output for every database entry.
SHOWREFERENCE: can appear only once. If set to TRUE, reference lines are output for each database entry.
SESSIONID: an optional parameter; can appear at most once. If no session ID is supplied, then ms-getseq can either process the request (when security is disabled) or try to retrieve the ID from cooki
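A Python sketch of the ACCESSION rules above: fields are merged, split on commas and white space, and frame numbers are range-checked and rejected for non-NA databases. The `"accession":frameNo` spelling used here is a hypothetical stand-in, because the exact separator between accession and frame number is not given in this excerpt; quoted accessions that themselves contain delimiters are not handled.

```python
import re
from typing import List, Optional, Tuple

# Commas, spaces, tabs and new lines all delimit accessions.
DELIMITERS = re.compile(r"[,\s]+")
# Hypothetical entry syntax: a quoted accession, optionally followed by :frameNo.
ENTRY = re.compile(r'^"([^"]+)"(?::(\d+))?$')


def merge_accessions(fields: List[str], is_na_db: bool) -> List[Tuple[str, Optional[int]]]:
    """Merge several ACCESSION values into one list of (accession, frame) pairs."""
    merged: List[Tuple[str, Optional[int]]] = []
    for field in fields:
        for token in DELIMITERS.split(field.strip()):
            if not token:
                continue  # empty field
            match = ENTRY.match(token)
            if match is None:
                raise ValueError(f"accession strings must be quoted: {token!r}")
            accession, frame_text = match.groups()
            frame: Optional[int] = None
            if frame_text is not None:
                if not is_na_db:
                    raise ValueError("frame numbers are only allowed for NA databases")
                frame = int(frame_text)
                if not 0 <= frame <= 6:
                    raise ValueError(f"frame number must be 0 to 6, got {frame}")
            merged.append((accession, frame))
    return merged
```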
29. taxonomy id
SearchControl
Any helper application can call bin/ms-searchcontrol.exe to implement asynchronous automation of search submission. Available commands are: status, result_file_name, result_file_mime, result_file_ini, results, xmlresults, create_task_id, mascot_job_number, kill_job, pause_job, resume_job, nice_job, set_to_queued and version.
ms-searchcontrol.exe status taskID <number> sessionID <string>
The status command will return one of the following:
unknown_id (referring to task ID)
id_assigned (referring to task ID)
error nnnn
running yy
complete
queued
searchcontrol error nnn
where error indicates an error in the search, and nnnn will be the Mascot error number or one of:
TASK_ERROR_NO_ERROR 0
TASK_ERROR_JOB_CRASHED 1
TASK_ERROR_JOB_KILLED 2
and searchcontrol error indicates a problem with the ms-searchcontrol.exe program itself. Values will be one of:
ERR_TASKID_NOERROR 0
ERR_TASKID_FAILOPEN 1
ERR_TASKID_FAILCREATE 2
ERR_TASKID_FAILREAD 3
ERR_TASKID_FAILWRITE 4
ERR_TASKID_FAILCLOSE 5
ERR_TASKID_CHANGEDRECORD 6
ERR_TASKID_INVALIDMASCOTDAT 7
ERR_TASKID_MISSINGRESULTSFILE 8
ERR_TASKID_FILENAMETOOLONG 9
ERR_TASKID_SESSIONTIMEDOUT 10
ERR_TASKID_PERMISSIONDENIED 11
ms-searchcontrol.exe result_file_name taskID <
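A small Python sketch for a helper application that polls the status command and needs to interpret its reply. The code tables restate the values listed above; the assumption that the reply is a single line, with the numeric code as the last token in the searchcontrol error case, is ours, and the helper names are hypothetical.

```python
from typing import Optional, Tuple

TASK_ERROR = {
    0: "TASK_ERROR_NO_ERROR",
    1: "TASK_ERROR_JOB_CRASHED",
    2: "TASK_ERROR_JOB_KILLED",
}
ERR_TASKID = {
    0: "ERR_TASKID_NOERROR",
    1: "ERR_TASKID_FAILOPEN",
    2: "ERR_TASKID_FAILCREATE",
    3: "ERR_TASKID_FAILREAD",
    4: "ERR_TASKID_FAILWRITE",
    5: "ERR_TASKID_FAILCLOSE",
    6: "ERR_TASKID_CHANGEDRECORD",
    7: "ERR_TASKID_INVALIDMASCOTDAT",
    8: "ERR_TASKID_MISSINGRESULTSFILE",
    9: "ERR_TASKID_FILENAMETOOLONG",
    10: "ERR_TASKID_SESSIONTIMEDOUT",
    11: "ERR_TASKID_PERMISSIONDENIED",
}
PLAIN_STATES = {"unknown_id", "id_assigned", "running", "complete", "queued"}


def parse_status(output: str) -> Tuple[str, Optional[str]]:
    """Classify the text printed by the status command as (state, detail)."""
    text = output.strip()
    if text.startswith("searchcontrol"):
        # A problem in ms-searchcontrol.exe itself; the last token is the code.
        code = int(text.split()[-1])
        return "searchcontrol_error", ERR_TASKID.get(code, f"code {code}")
    state, _, rest = text.partition(" ")
    if state == "error":
        # Either a Mascot error number or a TASK_ERROR code, so keep the number.
        code = int(rest)
        return "error", f"{code} {TASK_ERROR.get(code, '')}".strip()
    if state in PLAIN_STATES:
        return state, rest or None
    raise ValueError(f"unrecognised status output: {text!r}")
```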
30. this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. The end-user documentation included with the redistribution, if any, must include the following acknowledgment: "This product includes software developed by the Apache Software Foundation (http://www.apache.org/)." Alternately, this acknowledgment may appear in the software itself, if and wherever such third-party acknowledgments normally appear.
4. The names "Xerces" and "Apache Software Foundation" must not be used to endorse or promote products derived from this software without prior written permission. For written permission, please contact apache@apache.org.
5. Products derived from this software may not be called "Apache", nor may "Apache" appear in their name, without prior written permission of the Apache Software Foundation.
THIS SOFTWARE IS PROVIDED ''AS IS'' AND ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CON
31. 01
The definition block for NCBI is number 8, and contains the following:
## TAXONOMY FOR NCBInr using GI2TAXID
Taxonomy 8
Enabled 1 (0 to disable it)
FromRefFile 0
ErrorLevel 0
DescriptionLineSep 1 (ctrl-a, hex code 1; for multiple descriptions per entry)
The remaining entries (SpeciesFiles, NodesFiles, DefaultRule, Identifier and AccFromSpeciesLine, followed by End) refer to the GI2TAXID file gi_taxid_prot.dmp and the NCBI files names.dmp, nodes.dmp and merged.dmp, and use a GI2TAXID CHOP rule and an AccFromSpeciesLine rule that match "gi" followed by digits [0-9], the gi number in the NCBI protein FASTA file.
To turn off taxonomy for NCBInr, set the Enabled flag to 0.
FromRefFile set to 0 indicates that the taxonomy should be found in the Fasta file rather than in a reference file.
ErrorLevel, set here to zero, determines which warnings or errors are reported when creating the taxonomy information. If it is set to 0, an entry is put into the NoTaxonomyMatch.txt file for every sequence where no taxonomy information is found. If it is set to 1, an entry is put into NoTaxonomyMatch.txt for every sequence that had any gi number without a match. Since some sequences have up to 200 gi numbers (sources), there is a reasonable chance that some of these entries will not have species information, and this would cause the error files to become very large.
As mentioned earlier, each entry in the description line is separated by a CTRL-A, so DescriptionLin
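The Ctrl-A convention can be illustrated with a few lines of Python that split a non-redundant title line into its component descriptions and pull the gi number out of each. The `gi|<digits>|` pattern is an assumption about NCBI title lines; it is not the regular expression used in Mascot's taxonomy configuration.

```python
import re
from typing import List, Optional

CTRL_A = "\x01"  # DescriptionLineSep 1: hex code 1 separates merged descriptions
GI = re.compile(r"gi\|(\d+)")  # assumed "gi|<digits>|" layout of NCBI title lines


def split_descriptions(title: str) -> List[str]:
    """Return one description per source entry merged into a non-redundant title."""
    return [part for part in title.split(CTRL_A) if part]


def gi_per_description(title: str) -> List[Optional[int]]:
    """First gi number in each description, or None where none is present."""
    numbers: List[Optional[int]] = []
    for part in split_descriptions(title):
        match = GI.search(part)
        numbers.append(int(match.group(1)) if match else None)
    return numbers
```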
32. 12 Mixtures in decoy results (if automatic decoy PMF)
13 Peptides (if SQ or MIS)
14 Decoy peptides (if SQ or MIS and automatic ET)
15 Error tolerant peptides (if SQ or MIS and automatic ET)
16 Proteins (if SQ or MIS)
17 Query data (one block for each query)
18 Index
General Notes
1. Values are shown in italics.
2. Label case doesn't matter.
3. Labels are used to assist readability, but kept short to minimise file size.
4. Parameters are grouped logically.
5. The order of blocks is not important, except that the index block must be the last block. Blank lines within the index block may cause a problem.
6. Because the MIME type is defined as an unknown application, if this file passes through a mail agent it will be treated as an octet stream and encoded base64 for transmission.
Search parameters
--gc0p4Jq0M2Yt08jU534c0p
Content-Type: application/x-Mascot; name="parameters"

USERNAME=user name in plain text
USEREMAIL=email address in plain text
SEARCH=PMF
COM=search title text
DB=SwissProt
CLE=Trypsin
MASS=Monoisotopic
MODS=Mod 1,Mod 2
RULES=1,2,5,6,8,9,13,14
--gc0p4Jq0M2Yt08jU534c0p
The Parameters section contains the complete set of parameter values from the search form, apart from the contents of the uploaded data file or the query window. Labels must be unique, independent of case. Where a parameter can be multivalued (e.g. mods), the values are list
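A minimal Python sketch of reading the named sections of a results file, based on the boundary string, the `Content-Type: application/x-Mascot; name="..."` header and the `LABEL=value` lines shown above. Lower-casing labels follows the note that label case doesn't matter; treating multivalued parameters as comma-separated (as in the MODS example) and the MIME preamble used in the test are assumptions.

```python
import re
from typing import Dict

BOUNDARY = "gc0p4Jq0M2Yt08jU534c0p"
SECTION_NAME = re.compile(r'Content-Type: application/x-Mascot; name="([^"]+)"')


def parse_sections(text: str, boundary: str = BOUNDARY) -> Dict[str, Dict[str, str]]:
    """Map each named section of a results file to its label/value pairs.

    Labels are lower-cased because label case does not matter.
    """
    text = text.replace("\r\n", "\n")  # accept either line-ending convention
    sections: Dict[str, Dict[str, str]] = {}
    for part in text.split("--" + boundary):
        header, _, body = part.strip().partition("\n\n")
        match = SECTION_NAME.search(header)
        if match is None:
            continue  # MIME preamble or the closing "--" marker
        values: Dict[str, str] = {}
        for line in body.splitlines():
            label, sep, value = line.partition("=")
            if sep:  # skip blank lines and lines without a label
                values[label.lower()] = value
        sections[match.group(1)] = values
    return sections
```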
33. 2500
vmemory 1048576 kbytes
threads 1024
These values will be different for root and a normal user, and possibly different again for the owner of CGI processes (apache or www-data). Since you may not be able to log in as the CGI user, it can be hard to find out what the real values are. If a script or binary is failing in the web browser, try running it from the command line as both root and a normal user.
Changing the default limits
There are different utilities and configuration files on every system. Refer to your system documentation.
Detailed information on each memory limit
This section gives details about how the Mascot software reports errors and tries to increase the limits where appropriate.
Virtual address space
Mascot executables are compiled as both 32-bit and 64-bit programs. If you use the 32-bit executables, the amount of virtual address space is limited to 3 or 4 GB, according to platform, and this limit cannot be exceeded. However, the operating system may set a lower default limit. If memory cannot be mapped, the error M00048 (Failed to create memory map for filename, followed by a detailed error message) will be displayed and put into the errorlog.txt file.
The amount of memory that can be locked
As well as the obvious limitation of physical memory, there is generally a limit set on the amount of memory that can be locked. Another frequently used term for locked is wired. On most systems memory ca
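On Unix, the limits that apply to a given account can be read with Python's standard `resource` module (not available on Windows), which makes it easy to compare root with a normal user. This is a diagnostic sketch only; note that `getrlimit` reports bytes for the memory limits, whereas shell `ulimit` output such as the vmemory line above is in kbytes.

```python
import resource
from typing import Dict, Tuple

# The limits discussed in this section, keyed by a readable description.
LIMITS = {
    "virtual address space (RLIMIT_AS)": resource.RLIMIT_AS,
    "locked memory (RLIMIT_MEMLOCK)": resource.RLIMIT_MEMLOCK,
    "processes/threads (RLIMIT_NPROC)": resource.RLIMIT_NPROC,
}


def fmt(value: int) -> str:
    """Show RLIM_INFINITY as 'unlimited'; other values are bytes or counts."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits() -> Dict[str, Tuple[int, int]]:
    """Return {description: (soft, hard)} for the calling process."""
    return {name: resource.getrlimit(which) for name, which in LIMITS.items()}


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={fmt(soft)} hard={fmt(hard)}")
```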
34. [Screenshot: part of a Mascot search results report. Each protein hit is followed by a peptide match table with columns Query, Observed, Mr(expt), Mr(calc), ppm, Miss, Score, Expect, Rank, Unique and Peptide, and a checkbox "Check to include this hit in error tolerant search or archive report". Hits visible include 2. CH60_DROME (Mass: 60885, Score: 175, Matches: 4(2), Sequences: 4(2); 60 kDa heat shock protein, mitochondrial OS=Drosophila melanogaster) and 3. CH60_CAEEL (Mass: 60235, Score: 139, Matches: 3(3), Sequences: 2(2); Chaperonin homolog Hsp-60, mitochondrial OS=Caenorhabditis elegans).]
35. 65536
Maximum number of nodes in nodelist.txt: 4096
Web Server Configuration
Mascot Directory Structure
The Mascot directory structure is described in Chapter 2, Installation: Linux.
Microsoft Internet Information Services
The Mascot installation program automatically configures Microsoft IIS 5.0 or later.
CGI Timeout
The CGI timeout is set to 1 day, and any searches running longer than this will be terminated. If you wish, you can increase this timeout.
As of Mascot 2.2, the CGI timeout value is set only on the parent node (w3svc/1/root/mascot), so that it is inherited by both the cgi and x-cgi nodes. If a value is also set on w3svc/1/root/mascot/cgi (e.g. from a previous Mascot installation, or set by an administrator), then it will override any inherited value.
IIS 5.x and 6.0 (2000, XP, 2003 Server)
At the command prompt, go to the c:\Inetpub\AdminScripts directory. To get the value of the current Mascot cgi timeout:
cscript adsutil.vbs get w3svc/1/root/mascot/cgi/cgitimeout
If you get an error message saying "not set at this node", go up one level to the mascot node:
cscript adsutil.vbs get w3svc/1/root/mascot/cgitimeout
If you still get an error message saying "not set at this node", then you should set a value at this node. If cgitimeout was already set at this node, or at the cgi node, you can change the value. The default value as set by Mascot will be 86400 se
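The lookup order described above (a value on the cgi node overrides the one inherited from w3svc/1/root/mascot) can be modelled in a few lines of Python. This is only an illustration: the `settings` dictionary is a hypothetical stand-in for values read with adsutil.vbs, not an IIS API.

```python
from typing import Dict, Optional

MASCOT_DEFAULT = 24 * 60 * 60  # 86400 s = 1 day


def effective_cgi_timeout(settings: Dict[str, int], node: str) -> Optional[int]:
    """Return the CGITimeout in force at `node`: its own value, else the nearest ancestor's."""
    parts = node.strip("/").split("/")
    while parts:
        path = "/".join(parts)
        if path in settings:
            return settings[path]
        parts.pop()  # "not set at this node": go up one level
    return None
```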
36. [Screenshot: end of a Mascot protein summary. Further hits each have Mass: 57121, Score: 42, Matches: 1(1), Sequences: 1(1), and are all 60 kDa chaperonins from Xanthomonas species: CH60_XANOM (OS=Xanthomonas oryzae pv. oryzae strain MAFF 311018), CH60_XANOP (OS=Xanthomonas oryzae pv. oryzae strain PXO99A) and CH60_XANOR (OS=Xanthomonas oryzae pv. oryzae). The report continues with the table "Peptide matches not assigned to protein hits (no details means no match)", with columns Query, Observed, Mr(expt), Mr(calc), ppm, Miss, Score, Expect, Rank, Unique and Peptide.]
37. a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program.
In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this Licen
38. an empty string, false, FALSE, False, 0, off. All missing parameters default to a false value. The translation table number is always output, as well as the taxonomy ID and scientific name.
Output format
In response to any POST request, XML format output is returned. UTF-8 encoding is used for output. XML output is schema-validated and schema-versioned. All XML output must be XML-escaped using the following substitutions: > as &gt;, < as &lt;, & as &amp;, ' as &apos; and " as &quot;.
Taxonomy information is returned in the order requested. A <msgt:frame> element will only be output for an NA database.
The example input file would produce output similar to this (edited for brevity):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<msgt:ms_gettaxonomy_out xmlns:msgt="http://www.matrixscience.com/xmlns/schema/msgettaxonomy_1" majorVersion="1" minorVersion="0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.matrixscience.com/xmlns/schema/msgettaxonomy_1 msgettaxonomy_1.xsd">
  <msgt:results jobid="874">
    <msgt:db_entry>
      <msgt:db>SwissProt</msgt:db>
      <msgt:accession_str>RL19_YEAST</msgt:accession_str>
      <msgt:title>&gt;sp|P05735|RL19_YEAST 60S ribosomal protein L19 OS=Saccharomyces cerevisiae GN=RPL19A PE=1 SV=5</msgt:title>
      <msgt:all_accessions>
        <msgt:accession>13
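The output rules above translate into two small Python helpers. `xml_escape` uses the standard library's `xml.sax.saxutils.escape` with the two extra quote entities. `parse_flag` treats a missing parameter, or any of the listed false spellings compared case-insensitively, as false and everything else as true; both the case-insensitive comparison and the "everything else is true" rule are assumptions, since the list of true values is not part of this excerpt.

```python
from typing import Dict
from xml.sax.saxutils import escape

FALSE_VALUES = {"", "false", "0", "off"}  # compared after lower-casing


def parse_flag(params: Dict[str, str], name: str) -> bool:
    """Missing parameters default to false; listed false spellings are false."""
    value = params.get(name)
    if value is None:
        return False
    return value.strip().lower() not in FALSE_VALUES  # anything else: true (assumed)


def xml_escape(text: str) -> str:
    """Escape &, < and > (done by escape) plus ' and " as &apos; and &quot;."""
    return escape(text, {"'": "&apos;", '"': "&quot;"})
```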
39. be found in the file config/apache.conf.
Installation Script
Step 1: Web Server Operation
Launch a JavaScript-aware web browser and navigate to the URL corresponding to install.html, e.g. http://your-domain/mascot/install.html. Follow the instructions on this web page, and those that follow, to perform some simple system checks and to create or update the Mascot configuration file mascot.dat.
Step 2: Perl
If you get an error message or a File Save As dialog box after clicking on the Test Perl button, then Perl is not functioning correctly. This must be corrected before proceeding. Possible reasons for this problem are listed, along with useful links:
• Perl is not installed, or was installed incorrectly
• Perl, or a soft link to Perl, was not found at /usr/local/bin/perl
• The mascot/cgi directory is not configured for CGI execution
• JavaScript is disabled
If Perl is functioning correctly, the next page displays the Perl version number. If any of the required modules are missing, there will be a warning. Instructions for installing the required modules can be found towards the beginning of this chapter.
Step 3: GD Graphics Library
Assuming Perl and the required modules are present, click on the button to test the GD Graphics library. If GD is installed and working, the next page contains a small graphic ("GD OK") to confirm this. If you do not see this picture, or get an error message, refer to
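If the Test Perl step fails, a quick Python check on the server can distinguish a missing Perl from a broken soft link at /usr/local/bin/perl. This is a hypothetical diagnostic, not part of the Mascot installer; it is most meaningful when run as the account that executes CGI scripts.

```python
import os
import shutil
from typing import Dict, Optional


def perl_diagnostics(expected: str = "/usr/local/bin/perl") -> Dict[str, Optional[str]]:
    """Report whether the Perl path expected by the Mascot CGI scripts resolves."""
    present = os.path.lexists(expected)  # True even for a broken soft link
    return {
        "expected": expected,
        "link_present": "yes" if present else "no",
        # realpath follows soft links; a broken link points at a missing file
        "resolves_to": os.path.realpath(expected) if present else None,
        "target_exists": "yes" if os.path.exists(expected) else "no",
        "perl_on_path": shutil.which("perl"),  # any perl on PATH, for comparison
    }
```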
40. The Microsoft web server for Vista is IIS 7.0, which is provided as part of the standard distribution. A default installation of IIS 7.0 does not support running a CGI application such as Mascot. From the Control Panel, choose Programs and Features, then choose Turn Windows features on or off. Expand the node for Internet Information Services and ensure that all the checkboxes shown below are checked, in addition to any default selections. Then choose OK.
In Vista Home Premium, the IIS 7 simultaneous request execution limit is 3. In the Vista Business, Enterprise and Ultimate editions, the limit is 10. This limits the number of simultaneous searches that can be run from a simple web browser form.
Server 2008 (including R2)
Mascot will run under all Server 2008 editions except for Core. It is advisable to ensure that the latest service pack has been installed. Check the following URLs for current information:
http://msdn.microsoft.com/en-us/windowsserver/bb794698
http://support.microsoft.com/ph/1163#tab1
The Microsoft web server for Server 2008 is IIS 7.0 (IIS 7.5 for Server 2008 R2). From the Control Panel, choose Turn Windows features on or off to launch Server Manager. Select Go to Roles, scroll down to Web Server (IIS) and choose Add Role Services. Then follow the configuration notes under the Windows Vista section above.
Windows 7
Mascot will run under all Windows 7 edit
41. command line
This pseudo user is always used when running programs from the command line, and can perform any task without restriction. This user doesn't appear in the security administration utility, and hence the account cannot be deleted or disabled. The userid is 3.
daemon
This user should be used to run searches in Mascot Daemon. See the Mascot Daemon help for details. The user account is disabled by default, so it will need to be enabled before use. The userid is 4.
public_searches
This is a pseudo user that is used for the example searches. This user doesn't appear in the security administration utility, and hence the account cannot be deleted or disabled. It isn't possible to log in as this user. The userid is 5.
system
The Mascot Integra system account is used to query data on the Mascot server. Do not change the name of this account or the type of the account. There is no password associated with this account, since it can only be called from the secure Mascot Integra server. The userid is 6.
Types of user
Six types of user are available, and the appropriate type should be selected using the drop-down list in the administration screen.
Standard Mascot User
The user name and password are stored by Mascot.
Mascot Integra User
The password, password expiry and timeouts for these users are set in the Mascot Integra administration screens. The standa
42. conforms to a client/server architecture, and the primary user interface is a JavaScript-aware web browser. Searches can be submitted from web browser forms customised for different types of searches, or from a variety of client software. Mascot Daemon is a client application, bundled with Mascot Server, for batch automation of search submission. Mascot Distiller is a powerful application, licensed separately, that can process a wide range of native file formats into peak lists, submit searches to a Mascot Server, and import the search results for examination or further processing. There are also a number of third-party clients, including many mass spectrometry data systems, that support search submission to Mascot.
In most cases, the Mascot search engine is executed as a CGI program. On completion of a search, it calls a Perl CGI script that reads the results file and returns an HTML report, or some other machine-readable digest of the results, to the client. Links to additional CGI scripts provide more detailed views of the results.
[Figure: Mascot architecture. A mass spectrometer and its MS data system submit searches to the Mascot Server, which comprises the HTML and CGI scripts, the Mascot search engine and the search results. Public sequence databases (FASTA) are downloaded by FTP and handled by Database Management to provide the local sequence databases.]
Mascot Components
In this manual, server
43. [Screenshot: Windows Firewall with Advanced Security, inbound rules list showing Core Networking rules (Destination Unreachable, Dynamic Host Configuration, Internet Group Management, IPHTTPS (TCP-In), IPv6 (IPv6-In)), with Filter by Group and Refresh actions.]
There will be two entries for Apache, one for the UDP protocol and one for TCP. Double-click the TCP row. On the Protocols and Ports tab, configure as shown:
[Screenshot: rule properties with tabs General, Programs and Services, Protocols and Ports, Scope and Advanced. Protocols and Ports tab: local port set to Specific Ports, 80; remote port set to All Ports.]
On the Advanced tab, check all three profiles (domain, private and public) and choose Apply. Back in the top-level dialog, with the Apache TCP row selected, choose Enable Rule.
Security
Mascot security is disabled on installation. To enable Mascot security, refer to Chapter 12, Miscellaneous.
LCQ DTA
This utility, an option on the Mascot search form selection page, makes it possible to upload a Thermo raw file (as opposed to a peak list) w
44. dat taxonomy definition to specify, at a database level, which code is to be used. For further information on genetic codes, see http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c
Modifying the Taxonomy lineage link
In the protein view, a link to the taxonomy lineage is shown:
[Screenshot: Mascot Search Results, Protein View, in Microsoft Internet Explorer. Match to 143E_HUMAN (14-3-3 PROTEIN EPSILON, MITOCHONDRIAL IMPORT STIMULATION FACTOR). This sequence is common to the following entries: 143E_HUMAN from Homo sapiens, Rattus norvegicus, Mus musculus and Ovis aries. Nominal mass of protein (Mr): 29155. Cleavage by Trypsin: cuts C-term side of KR unless next residue is P. Matched peptides shown in Bold Red, followed by the numbered protein sequence.]
45. [Continuation of a multipart/form-data search submission. Each part is separated by a boundary line ending in 243501029130836 and carries a Content-Disposition: form-data header naming the field. Parts shown include name="COM", name="DB", name="CLE", name="PFA", name="QUANTITATION", name="TAXONOMY", name="MODS", name="IT_MODS", name="TOL", name="TOLU", name="PEP_ISOTOPE_ERROR" and name="ITOL", followed by:]
name="ITOLU": Da
name="CHARGE": 1+
name="MASS": Monoisotopic
name="FILE"; filename="test_search.mgf", with Content-Type: application/octet-stream, containing:
BEGIN IONS
PEPMASS=498.272888
CHARGE=1+
157.096962 23.72
185.160000 26.69
286.134951 80.7
385.210000 13.49
2000.120000 3.142
2000.568167 4.108
2001.020697 2.098
2001.820000 1.103
END IONS
name="FORMAT": Mascot generic
name="PRECURSOR": (empty)
name="INSTRUMENT": ESI-QUAD-TOF
name="REPORT": AUTO
Results File
The results file contains the search results together with the search input parameters and peak li
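A Python sketch of assembling a multipart/form-data body like the one above, for example to produce a MIME-format input file for the search engine. The field names and values come from the example; the random boundary, placing the file part last, and the helper name `build_form_data` are our own choices, and whether a given Mascot version needs further fields is not covered here.

```python
import uuid
from typing import Dict, Tuple


def build_form_data(fields: Dict[str, str],
                    upload: Tuple[str, str, str]) -> Tuple[str, str]:
    """Return (Content-Type header value, body) for a multipart/form-data request.

    `upload` is (field name, file name, file contents), e.g. the FILE part.
    """
    boundary = uuid.uuid4().hex  # any string absent from the data would do
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    name, filename, contents = upload
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
        f"{contents}\r\n"
    )
    parts.append(f"--{boundary}--\r\n")  # closing delimiter
    return f"multipart/form-data; boundary={boundary}", "".join(parts)
```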
46. files automatically to a specified schedule. This section describes how to update the files for a database if your Mascot Server is not connected to the Internet, or if you choose not to use Database Manager.
When a new release of a database becomes available, it should be copied or downloaded into the incoming directory. In many cases, the downloaded file will have to be decompressed. The filename may or may not be constant from release to release. The Fasta database should be renamed to a name that includes a version or date stamp and matches the wild card path for the database, then moved to the current directory. Never copy a large file to the current directory under its final name, because this will take time and the exchange process may be triggered prematurely.
If you are using a local reference file, rename and move this file first. Otherwise, the exchange process will be triggered by the appearance of the Fasta file, but will immediately fail because the new reference file is not yet available. Note that Fasta and reference files must have identical names, apart from the filename extension.
Once Mascot Monitor sees a new Fasta file that matches the wild card path for the database, it will begin the exchange process. Progress can be monitored from the Mascot Database Status page.
Obtaining Fasta files
If your Mascot Server has an Internet connection and you are able to use Database Manager, ignore the information in this secti
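The ordering rules above (a stamped name, the reference file first, and never a partially written file under the final name) can be sketched in Python. The `.partial` suffix, the date-stamp format and the helper names are assumptions; copy-then-rename is one way to honour the rule, not necessarily what Database Manager does.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path
from typing import List, Optional


def _place(src: Path, dst: Path) -> None:
    """Copy src next to dst under a temporary name, then rename it into place.

    The temporary name ends in ".partial", so it never matches a *.fasta
    wild card while the (possibly slow) copy is in progress.
    """
    tmp = dst.with_name(dst.name + ".partial")
    shutil.copyfile(src, tmp)
    tmp.replace(dst)  # rename within one directory; atomic on POSIX
    src.unlink()


def install_release(fasta: Path, current: Path, stem: str,
                    reference: Optional[Path] = None,
                    stamp: Optional[str] = None) -> Path:
    """Move a new release from incoming/ into current/ under a stamped name."""
    stamp = stamp or date.today().strftime("%Y%m%d")
    target = current / f"{stem}{stamp}.fasta"
    if reference is not None:
        # Reference file first: same name as the Fasta file apart from the extension.
        _place(reference, target.with_suffix(reference.suffix))
    _place(fasta, target)  # this is the file Mascot Monitor reacts to
    return target


def demo() -> List[str]:
    """Run install_release in a throwaway directory tree and list current/."""
    with tempfile.TemporaryDirectory() as tmpdir:
        root = Path(tmpdir)
        incoming, current = root / "incoming", root / "current"
        incoming.mkdir()
        current.mkdir()
        (incoming / "uniprot_sprot.fasta").write_text(">A test\nMKV\n")
        (incoming / "uniprot_sprot.ref").write_text("reference\n")
        install_release(incoming / "uniprot_sprot.fasta", current, "SwissProt_",
                        reference=incoming / "uniprot_sprot.ref", stamp="20120311")
        return sorted(p.name for p in current.iterdir())
```

Here `stem` is the part of the wild card path before the `*`, so `SwissProt_` corresponds to `SwissProt_*.fasta`.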
47. for clarity. The individual columns contain the following information:
Column 1: Mascot job number. Job numbers are allocated sequentially, but will appear in the log in the order in which searches are completed. If the submitted search contained an error which prevented the search starting, there will be no entry in searches.log, but there should be an entry in errorlog.txt.
Column 2: Process ID
Column 3: Sequence database searched
Column 4: User name. User names are required by the JavaScript search forms, but not by the search engine, so this field may be empty. If an entry logs utility program activity rather than a search, this field contains the name of the utility, e.g. TESTPARSE or GETSEQ.
Column 5: User email address. User email addresses are required by the JavaScript search forms, but not by the search engine, so this field may be empty.
Column 6: Search title. Empty if none supplied.
Column 7: Relative path to the Mascot search results file
Column 8: Start time, in the format illustrated in the example above
Column 9: Duration in seconds
Column 10: Completion status, normally "User read res". If EmailUsersEnabled is set to 1 and the user disconnected before the search was complete, this entry would read "user emailed".
Column 11: Job priority. Not currently implemented.
Column 12: Type of search: PMF, SQ or MIS
Column 13: Enzyme. Either yes, if user selected an enzyme
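If the columns are tab-separated (an assumption; the separator is not stated in this excerpt), a short Python sketch can split each entry and restore job-number order. The column labels and the `UTILITIES` set are illustrative choices, not names used by Mascot.

```python
from typing import Dict, List, Union

# Short labels for columns 1 to 13 as described above (the labels are chosen here).
COLUMNS = [
    "job", "pid", "database", "user", "email", "title", "result_file",
    "start_time", "duration_s", "status", "priority", "search_type", "enzyme",
]
UTILITIES = {"TESTPARSE", "GETSEQ"}  # examples named in the text, not a full list

Record = Dict[str, Union[str, List[str]]]


def parse_log_line(line: str) -> Record:
    """Split one searches.log entry; assumes tab-separated columns."""
    fields = line.rstrip("\r\n").split("\t")
    record: Record = dict(zip(COLUMNS, fields))
    record["extra"] = fields[len(COLUMNS):]  # columns after 13, kept unparsed
    return record


def searches_in_job_order(lines: List[str]) -> List[Record]:
    """Entries are logged in completion order; drop utility entries and sort by job."""
    records = [parse_log_line(line) for line in lines if line.strip()]
    searches = [r for r in records if r.get("user") not in UTILITIES]
    return sorted(searches, key=lambda r: int(r["job"]))
```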
48. freely. A line which starts with the # character (pound in the US, hash in Europe) is a comment line.
Databases
Do not modify this section if you ever use Database Manager.
Databases
NCBInr c:/inetpub/mascot/sequence/NCBInr/current/NCBInr_*.fasta AA 1234 1411 10067 0 8
SwissProt c:/inetpub/mascot/sequence/SwissProt/current/SwissProt_*.fasta AA 1234 15 11 101 33 13 15 3
end
A line that is commented out with a # character at the start is an inactive database definition.
Each line defines a database using the following 14 parameters:
1 Name
Each database must have a unique name. Ideally, the name should be short and descriptive. Note that these names are case sensitive, and much confusion can be caused by creating, say, Sprot and SPROT. The name does not need to be the same as, or even similar to, the filename of the actual FASTA file. Allowed characters are alphanumerics and S amp
2 Path
FASTA database files must be available locally. Mascot creates its compressed files in the same directory as the original FASTA file. The location of the FASTA file is defined in the Path field. This must be the fully qualified path to the FASTA file, with a wild card in the filename, to allow incoming and outgoing database files with different version or date stamps to be present in the current directory simultaneously. The delimiters between directories must always be forward slashes, even if Mas
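A Python sketch that reads the Databases section and applies the rules above: commented lines are inactive, names must be unique (case-sensitively), each line has 14 parameters, and the path uses forward slashes with a wild card in the filename. Splitting on any white space is an assumption that breaks for paths containing spaces; the numeric fields in the test are placeholders, not real settings.

```python
from typing import Dict, List


def parse_databases(lines: List[str]) -> Dict[str, List[str]]:
    """Return {name: [path, ...remaining fields]} for active Databases entries."""
    dbs: Dict[str, List[str]] = {}
    inside = False
    for raw in lines:
        line = raw.strip()
        if line == "Databases":
            inside = True
            continue
        if not inside or not line or line.startswith("#"):
            continue  # outside the section, blank, or commented out (inactive)
        if line == "end":
            break
        fields = line.split()
        if len(fields) != 14:
            raise ValueError(f"expected 14 parameters, got {len(fields)}: {line!r}")
        name, path = fields[0], fields[1]
        if "\\" in path:
            raise ValueError(f"{name}: directory delimiters must be forward slashes")
        if "*" not in path.rsplit("/", 1)[-1]:
            raise ValueError(f"{name}: the filename must contain a wild card")
        if name in dbs:  # names are case sensitive, so Sprot and SPROT differ
            raise ValueError(f"duplicate database name {name!r}")
        dbs[name] = fields[1:]
    return dbs
```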
49. [Screenshot: Mascot search status page (Version 2.3.00, licensed for 10 processors, using 5 nodes and 10 processors, 0 searches running). It shows the SwissProt database (SwissProt_57.12.fasta) In use, memory mapped and memory locked. A Cluster Nodes table shows the five Windows NT nodes sleepy, dopey, grumpy, bashful and happy (192.168.70.1 to 192.168.70.5) all responding OK.]
If all is well, you will see rows of ha
50. in httpd.conf in the Mascot config directory. Also ensure that ForkForUnixApache in the Options section of mascot.dat is set to 1. Further information on web server configuration can be found in Appendix D.
Installation is finished, but don't clear the checkbox.
Licence Registration
If you cleared the checkbox at the end of the installation wizard, choose Programs > Mascot > Admin > Database Status from the Windows Start menu. The following screen will be displayed in your default web browser:
[Screenshot: Mascot Server Product Key Registration page (ms-status.exe), with links to register a product key, view current licence information and view database status. The page explains that you are about to be transferred to the Matrix Science licensing website to register a new product key. When registration is complete, a licence file will be sent by e-mail and should be saved into C:\inetpub\mascot\config\licdb on the server. If the server cannot access the Internet, the Offline Registration button downloads a product registration file. You can transfer this file to another computer and submit it at http://www.matrixscience.com/licensing/register.]
51. included the
[Screenshot: Mascot Server Setup, licence agreement page with the option "I accept the terms in the Licence Agreement".]
[Screenshot: Mascot Server Setup, "Prepare for product key registration" page. It asks you to make sure you have a product key for Mascot Server before proceeding, and warns that Mascot Server will not function until the key has been registered to obtain a licence. For software received on physical media, the key is printed on a label attached to the case (example: A12B C34D E56F G78H I90).]
This is a reminder that you will need to register a product key to create a licence file. This product key may be printed on a sticker on the CD case, or it may have been sent by email. If you cannot locate your product key, contact support@matrixscience.com for assistance.
The next screen allows you to choose which components will be installed:
[Screenshot: Mascot Server Setup, Custom Setup page. The feature tree contains the main application components for Mascot Server, IIS Web Site, Apache Web Site and Sequence Databases (SwissProt 2012_03, noted as requiring 490MB).]
52. is there a taxonomy section in mascot.dat for that number? When the compressed files are built, the taxonomy index has the name database_name.t00. If this file doesn't exist for the database, it may be necessary to stop Mascot Monitor, delete the stats file for the database, and restart Monitor.
How Mascot gets a species ID for sequences
This section contains complex configuration information. It is normally only necessary to read and understand this section when adding a new database of a different type.
When ms-monitor creates the compressed files, it also makes a file containing the taxonomy IDs for each sequence. To do this, it needs to follow certain rules, which are defined in mascot.dat. The rule number for each database is specified as the 14th parameter in the Databases section of the mascot.dat file. To help explain these rules, the following sections describe them for NCBInr, SwissProt and EST_others. All text searches and comparisons are case insensitive, except where stated. Taxonomy definition keywords in the mascot.dat file are also case insensitive. Several taxonomy definition blocks are obsolete and retained only for backwards compatibility; only the current definitions are described below.
NCBInr
This non-identical protein database from the NCBI may contain multiple title lines for each sequence. The titles are separated by a control-A character (character code
53. it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Also add information on how to contact you by electronic and paper mail.
You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the library, if necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the library "Frob" (a library for tweaking knobs) written by James Random Hacker.
<signature of Ty Coon>, 1 April 1990
Ty Coon, President of Vice
That's all there is to it!
Regex
Copyright 1992, 1993, 1994 Henry Spencer. All rights reserved. This software is not subject to any license of the American Telephone and Telegraph Company or of the Regents of the University of California. Permission is granted to anyone to use this software for an
54. mixtures found
h1_score=total score for mixture 1
h1_numprot=number of proteins in mixture 1
h1_nummatch=number of queries matched
h1_m1=accession string for protein component 1
h1_m2=accession string for protein component 2
...
h1_mm=accession string for protein component m
h2_score=...
...
hn_mm=...
--gc0p4Jq0M2Yt08jU534c0p
The Mixture section is only output for a peptide mass fingerprint. If any statistically significant protein mixtures are found, the mixture components are summarised. For details of individual components, use the accession strings to refer back to the Summary section.
If this is an automatic decoy database search, a second mixture block appears containing the second set of results. The section name is decoy_mixture. The syntax of the contents is identical.
Peptides
--gc0p4Jq0M2Yt08jU534c0p
Content-Type: application/x-Mascot; name="peptides"
q1_p1=missed cleavages (-1 indicates no match),peptide Mr,delta,number of ions matched,peptide string,peaks used from Ions1,variable modifications string,ions score,ion series found,peaks used from Ions2,peaks used from Ions3;"accession string":frame number:start:end:multiplicity (data for first protein),"accession string":frame number:start:end:multiplicity (data for second protein), etc.
q1_p1_et_mods=modification mass,neutral loss mass,modification description
q1_p1_et_mods_master=neutral loss m
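The qN_pM line packs a fixed list of peptide fields and then a list of protein hits into a single value. Below is a minimal Python sketch that unpacks one such line. It assumes commas between the peptide fields, a semicolon before the protein list, and protein entries of the form `"accession":frame:start:end:multiplicity`. These delimiters did not survive extraction cleanly, so check them against a real results file. The accessions in the example are made up.

```python
import re
from typing import Dict, Optional

PEPTIDE_FIELDS = [
    "missed_cleavages", "peptide_mr", "delta", "ions_matched", "sequence",
    "peaks_ions1", "var_mods", "ions_score", "ion_series",
    "peaks_ions2", "peaks_ions3",
]
# One protein entry: "accession":frame:start:end:multiplicity
PROTEIN_RE = re.compile(r'"([^"]*)":(-?\d+):(\d+):(\d+):(\d+)')


def parse_peptide_match(line: str) -> Optional[Dict[str, object]]:
    """Parse a qN_pM line from the peptides section; None means no match."""
    key, _, value = line.strip().partition("=")
    head, _, proteins = value.partition(";")
    fields = head.split(",")
    if fields[0] == "-1":          # -1 in the first field indicates no match
        return None
    match: Dict[str, object] = dict(zip(PEPTIDE_FIELDS, fields))
    match["key"] = key
    match["missed_cleavages"] = int(fields[0])
    match["peptide_mr"] = float(fields[1])
    match["delta"] = float(fields[2])
    match["ions_matched"] = int(fields[3])
    match["ions_score"] = float(fields[7])
    match["proteins"] = [
        {"accession": acc, "frame": int(frame), "start": int(start),
         "end": int(end), "multiplicity": int(mult)}
        for acc, frame, start, end, mult in PROTEIN_RE.findall(proteins)
    ]
    return match
```

Numeric fields are converted only where the meaning is unambiguous. Strings such as the variable modifications and ion series are returned unchanged.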
55. number> sessionID <string>
This will return either the results file name (filename <filename>) or searchcontrol error nnn, with values of nnn as for status. Note that <filename> may be empty for some states; this is not an error.
This may then be used from the command line by other applications to provide functionality that is not in ms-searchcontrol.exe. For example, suppose a client application needs the user name from a search. In this case, a Perl script getusername.pl could be written that takes the passed unique task ID, finds the results file name using ms-searchcontrol.exe result_file_name, and then looks for the user name in the results file.
ms-searchcontrol.exe result_file_mime taskID <number> sessionID <string>
This will return the results file as a MIME format file, or searchcontrol error nnn, with values of nnn as for status.
ms-searchcontrol.exe result_file_ini taskID <number> sessionID <string>
This will return the results file as a Windows ini format file, or searchcontrol error nnn, with values of nnn as for status.
ms-searchcontrol.exe results taskID <number> sessionID <string>
If the job is complete, this will return the search results in a format recognised by Mascot Daemon. For a peptide mass fingerprint, the output is of the form
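The getusername.pl idea can be sketched in Python once the results file name is known. The sketch below reads the MIME-format results file and returns the user name from its parameters section. It assumes three things that should be checked against a real file: a part named "parameters", a key called USERNAME, and the boundary string used in the results-file layout in this manual. It does not call ms-searchcontrol.exe itself, and the function names are hypothetical.

```python
import sys
from typing import Dict

# Boundary string as used in the results-file layout in this manual (assumption).
BOUNDARY = "--gc0p4Jq0M2Yt08jU534c0p"


def read_section(results_text: str, name: str) -> Dict[str, str]:
    """Collect key=value lines from the part whose Content-Type has name="<name>"."""
    values: Dict[str, str] = {}
    in_section = False
    for line in results_text.splitlines():
        if line.startswith(BOUNDARY):
            in_section = False          # new MIME part; wait for its header
        elif line.startswith("Content-Type:"):
            in_section = f'name="{name}"' in line
        elif in_section and "=" in line:
            key, _, value = line.partition("=")
            values.setdefault(key, value)
    return values


def get_username(results_text: str) -> str:
    """Return USERNAME from the parameters part, or '' if it is absent."""
    return read_section(results_text, "parameters").get("USERNAME", "")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python getusername.py path/to/results_file
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        print(get_username(handle.read()))
```

Reading the parts by name, rather than by position, keeps the sketch independent of the order in which sections appear in the file.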
56. obtained from the SearchControl utility, described later in this chapter. By specifying an identifier, progress reports and search results can be obtained asynchronously from SearchControl. Optional argument sessionID is used to specify a Mascot security session identifier (see Chapter 12). The file piped to STDIN must be a MIME format file containing the search parameters and mass spectrometry data.
Monitor test mode has a different syntax:
nph-mascot.exe 2 3 path number < in.asc
Required argument path is the path to a flag file, e.g. data/test/SwissProt_2011_06.fasta.bu253neb5renpqtv2jiiannc2y.testedok, and optional argument number is the cluster number. The input file, e.g. data/test/SwissProt.asc, is created automatically from the do_not_delete.asc template.
The Monitor application must be running before the search engine can be invoked.
During search execution, warnings, errors, progress reports, etc. are written to standard output (STDOUT). This output is formatted as HTML text for viewing in a web browser. If the search engine is not being executed as a CGI application, the calling application may need to parse the output to remove the HTML tags. When a search is complete, an HTML string is written to STDOUT, which causes the client browser to invoke the script defined in mascot.dat for displaying a results report (master_results.pl or master_results_2.pl). If the search engine is not being executed as a CGI appli
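A non-CGI caller needs to strip the HTML from the search engine's STDOUT before using it. Below is a minimal Python sketch that does this with the standard library's html.parser. It keeps the text content, decodes entities, and drops blank lines. How you capture the output (for example with subprocess) is left out, because the invocation details depend on your setup. The function name is hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Accumulates the text content of an HTML document, ignoring tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)   # decode &amp; etc.
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def strip_html(output: str) -> str:
    """Return the visible text of HTML output, one non-blank line per line."""
    parser = _TextExtractor()
    parser.feed(output)
    parser.close()                                # flush any buffered text
    text = "".join(parser.chunks)
    return "\n".join(line.strip() for line in text.splitlines() if line.strip())
```

A real HTML parser is used instead of a regular expression so that tags split across lines and encoded characters are handled correctly.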
57. or no, if user selected enzyme type None.
Column 14: User IP address.
Monitor Log
Mascot Monitor activity, such as sequence database exchange, is logged to logs/monitor.log. The following extract shows a typical example of the contents:
Fri Apr 20 17:21:28 2012 ms-monitor 2.4.0 started
Fri Apr 20 17:21:28 2012 Locked memory for file data/mascot.control
Fri Apr 20 17:21:28 2012 Waiting for valid licence
Fri Apr 20 17:30:28 2012 Licensed to Edman University XQ5P TFRR 3APW FB33 7H6X
Fri Apr 20 17:30:28 2012 Starting up to Checking that Mascot Nodes exist
Fri Apr 20 17:30:28 2012 Checking that Mascot Nodes exist to Loading DB information
Fri Apr 20 17:30:28 2012 Loading DB information to Started up successfully
Fri Apr 20 17:30:29 2012 SwissProt(0) Not in use to Preparing to run 1st test
Fri Apr 20 17:30:29 2012 SwissProt(0) Preparing to run 1st test to Waiting
Fri Apr 20 17:30:30 2012 SwissProt(0) Waiting to About to compress files
Fri Apr 20 17:30:30 2012 SwissProt(0) About to compress files to Creating compressed files
Fri Apr 20 17:30:33 2012 Creating compressed files from /usr/local/mascot/sequence/SwissProt/current/SwissProt_2012_03.fasta
Fri Apr 20 17:30:33 2012 Creating compress file /usr/local/mascot/sequence/SwissProt/current/SwissProt_2012_03.i00
Fri Apr 20 17:30:33 2012 Creating compress file /usr/local/mascot/sequence/SwissProt/current/SwissProt_2012_03.s00
Fri Apr 20
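Each monitor.log line starts with a ctime-style timestamp, so the log is easy to filter by database. Below is a minimal Python sketch that splits a line into a datetime and a message, and collects the messages for one database. It assumes the timestamp occupies the first 24 characters (as in "Fri Apr 20 17:21:28 2012"). It also tolerates an optional " - " separator, because the exact separator is not visible in this text. The function names are hypothetical.

```python
from datetime import datetime
from typing import List, Tuple

STAMP_LEN = 24  # length of a ctime-style stamp, e.g. 'Fri Apr 20 17:21:28 2012'


def parse_monitor_line(line: str) -> Tuple[datetime, str]:
    """Split a monitor.log line into (timestamp, message)."""
    stamp = datetime.strptime(line[:STAMP_LEN], "%a %b %d %H:%M:%S %Y")
    message = line[STAMP_LEN:].strip()
    if message.startswith("- "):       # tolerate an optional ' - ' separator
        message = message[2:]
    return stamp, message


def entries_for(log_text: str, database: str) -> List[Tuple[datetime, str]]:
    """Timestamped messages whose text starts with the given database name."""
    entries = []
    for line in log_text.splitlines():
        try:
            stamp, message = parse_monitor_line(line)
        except ValueError:             # truncated or non-standard line
            continue
        if message.startswith(database):
            entries.append((stamp, message))
    return entries
```

The default C locale is assumed, so that %a and %b match English day and month names.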
58. proteins.
EMBL EST
TAXONOMY FOR EMBL EST
Taxonomy_13
Identifier EMBL EST Fasta
Enabled 1 # 0 to disable it
FromRefFile 0
ErrorLevel 0
SpeciesFiles ACC2TAXID acc_to_taxid.mapping.txt NCBI names.dmp
NodesFiles NCBI nodes.dmp NCBI merged.dmp
DefaultRule ACC2TAXID CHOP >EM_EST A Z0 9
GencodeFiles NCBI gencode.dmp
MitochondrialTranslation 0
end
The ACC2TAXID identifier is used to identify a file that contains a simple mapping of accession to taxonomy ID. It has two values per line: Accession taxonomyID. For example:
A00001 10641
A00002 9913
where A00001 and A00002 are accessions, 10641 is the NCBI taxonomy ID for Cauliflower mosaic virus, and 9913 is the NCBI taxonomy ID for Bos taurus. The accession and ID can be separated using any white space.
The acc_to_taxid.mapping.txt file from the EMBL contains entries for all the EMBL EST databases, so it is very large (approximately 3 GB). The file is created at the start of each EMBL release (every 3 months), and so does not include the latest entries in the cumulative FASTA files. Creating the compressed files is faster if the order of entries in the taxonomy file is the same as the order of sequences in the FASTA file.
When the ACC2TAXID file is first used or is updated, lookup files (.cdb) are created in the taxonomy directory. These files are only used when compressing the database and are not
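The ACC2TAXID format is simple enough to load directly. Below is a minimal Python sketch that reads "accession taxonomyID" pairs separated by any white space and skips blank or malformed lines. It only illustrates the format. For the real file of about 3 GB, an in-memory dictionary may be impractical, which is presumably why Mascot builds its own on-disk lookup files. The function name is hypothetical.

```python
from typing import Dict, Iterable


def load_acc2taxid(lines: Iterable[str]) -> Dict[str, int]:
    """Map accession -> NCBI taxonomy ID from 'accession taxid' lines."""
    mapping: Dict[str, int] = {}
    for line in lines:
        parts = line.split()           # any white space separates the two values
        if len(parts) != 2 or not parts[1].isdigit():
            continue                   # skip blank or malformed lines
        accession, taxid = parts
        mapping[accession] = int(taxid)
    return mapping
```

Because it accepts any iterable of lines, it can read a file handle directly, e.g. `load_acc2taxid(open(path))`, without loading the whole text first.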
59. re-centroiding profile data. Must be a floating point number between 0 and 10. Re-centroiding is applied whenever the number of peaks in a single scan exceeds CentroidWidthCount.
DecoyTypeNoEnzyme 3
DecoyTypeSpecific 1
These parameters determine how decoy sequences are created for Mascot automatic decoy searches. DecoyTypeSpecific applies to MS/MS searches using fully specific or semi-specific enzymes. DecoyTypeNoEnzyme applies to MS/MS searches with no enzyme. For PMF, random protein sequences are used, whatever the settings. For NA databases, the sequences are randomized before translation. Classifications are based on G. Wang et al. (2009) Decoy Methods for Assessing False Positives and False Discovery Rates in Shotgun Proteomics, Anal. Chem. 81(1), 146-159. Values supported in Mascot 2.4 are:
1: Reverse the sequence of each protein entry.
3: For each protein entry, generate a random sequence of the same length, with the composition based on the average composition of the whole database. This is the default in Mascot 2.3 and earlier.
4: Digest each protein sequence into peptides, then generate a random sequence for each peptide, but keep the same terminal residues and don't introduce new cutting sites.
EmailErrorsEnabled 0
EmailFromTextName
EmailFromUser
EmailPassword
EmailProfile
EmailService
EmailTimeOutPeriod 120
EmailUsersEnabled 0
ErrMessageEmailTo
MailTempFile C:/TEMP/MXXXXXX
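Decoy types 1 and 3 can be illustrated in a few lines. Below is a minimal Python sketch of both: reversing each protein, and drawing a random sequence of the same length from the average residue composition of the whole database. It shows the idea only; it is not Mascot's implementation, and type 4 (peptide-level randomisation that preserves termini and cleavage sites) is not sketched. The function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Iterable


def reverse_decoy(sequence: str) -> str:
    """Decoy type 1: reverse the sequence of a protein entry."""
    return sequence[::-1]


def database_composition(sequences: Iterable[str]) -> Dict[str, float]:
    """Average residue frequencies over all sequences in the database."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(seq)
    total = sum(counts.values())
    return {residue: n / total for residue, n in counts.items()}


def random_decoy(length: int, composition: Dict[str, float],
                 rng: random.Random) -> str:
    """Decoy type 3 (sketch): random sequence of the same length,
    with residues drawn according to the database composition."""
    residues = list(composition)
    weights = [composition[r] for r in residues]
    return "".join(rng.choices(residues, weights=weights, k=length))
```

A seeded `random.Random` is passed in explicitly so that the same decoys can be regenerated when comparing runs.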
60. returned task_id.
• You can monitor/control the running search using ms-searchcontrol.exe.
A simpler, static system could be implemented by adding a SUBCLUSTER command to a Daemon parameter set. SwissProt_SC1.par might contain SUBCLUSTER=1, so selecting this for a task would direct searches to sub-cluster 1, etc.
Database Status
If multiple sub-clusters are defined, the database status screen (ms-status.exe) only shows one sub-cluster at a time. An additional summary table is shown at the bottom of the page, with links for the other sub-clusters.
Security
Overview
The security model allows a Mascot administrator to:
• Prevent unauthorised changes of Mascot server configuration files, using, for example, the database maintenance utility
• Restrict access to results files and sequence databases based on group and user definitions
• Provide standard session support with time-outs, so users do not need to continually re-enter passwords
• Restrict access to Mascot server based utilities that allow deletion of searches and other job control functions
• Provide read-only access to configuration files for third party applications without requiring login
• Optionally allow submission of searches, etc. for 3rd party applications without a login
• Switch OS platform painlessly if Mascot or Mascot Integra authentication is used
• Easily set up Mascot Daemon
61. should only be run if the search fulfils the criteria for running Percolator. The title string will be displayed in the search progress while the process is running. This string must not contain a comma. The command string can include literals and also the following tags, which will be substituted at run time:
Tag: Replaced with
resultfilepath: Relative path from the cgi directory to the results file
resultfilename: File name part of resultfilepath
percolator_pip: Relative path from the cgi directory to the Percolator input file
percolator_decoy_pop: Relative path from the cgi directory to the Percolator output file for the decoy matches
percolator_target_pop: Relative path from the cgi directory to the Percolator output file for the target matches
session_id: The session identifier of the logged-in user, when Mascot Security is enabled
task_id: The task identifier assigned using client.pl, when called from client applications
PercolatorExeFlags: The value of PercolatorExeFlags
Paths to executables, and any paths included as arguments, should use forward slashes and should not include spaces.
FeatureTableLength 30000
If a nucleic acid sequence is longer than 30000 bases, the protein view report will automatically switch to feature table mode and output the matches as a GenBank feature table. The threshold for switching to feature table mode can be altered using the parameter FeatureTableLength in the Options section of mascot.dat. o
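The run-time substitution described above can be sketched as a template expansion, with checks for the comma and path rules. The delimiters that surround each tag in the command string are not shown in this text. This Python sketch assumes a hypothetical `<tagname>` syntax purely for illustration. Unknown tags are left untouched, and the command and paths in the example are made up.

```python
import re
from typing import Dict

TAGS = {
    "resultfilepath", "resultfilename", "percolator_pip",
    "percolator_decoy_pop", "percolator_target_pop",
    "session_id", "task_id", "PercolatorExeFlags",
}
# Hypothetical tag syntax <tagname>; the real delimiters are not given here.
TAG_RE = re.compile(r"<(\w+)>")


def substitute_tags(command: str, values: Dict[str, str]) -> str:
    """Replace known tags with run-time values; leave anything else as-is."""
    def repl(m: "re.Match[str]") -> str:
        tag = m.group(1)
        return values.get(tag, "") if tag in TAGS else m.group(0)
    return TAG_RE.sub(repl, command)


def path_ok(path: str) -> bool:
    """Paths should use forward slashes and contain no spaces."""
    return "\\" not in path and " " not in path


def title_ok(title: str) -> bool:
    """The title string must not contain a comma."""
    return "," not in title
```

Known tags with no supplied value are replaced by an empty string. This mirrors, for example, session_id being meaningless when Mascot Security is disabled.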
62. source and binary forms, with or without modification, are permitted provided that the following conditions are met:
• Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
• Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
• Neither the name of the University of Chicago nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE UNIVERSITY OF CHICAGO AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY OF CHICAGO OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
II. Copyright (c) 1995-1998 The University of Ut
63. the fragment peaks, which may give rise to spurious fragment ion matches. It is usually best if the precursor is removed before the search.
With the default arguments of -1,-1, a smart filter is created. This removes peaks within the fragment ion tolerance window about each of the precursor isotope peaks. The number of isotopes is assumed to be as follows:
Mr            Number
< 1000        3
1000-1999     4
2000-2999     5
3000-3999     6
4000-4999     7
5000-5999     8
6000-6999     9
> 7000        10
So, if the precursor m/z was 800, the charge was 2+ and the fragment ion tolerance was 0.1 Da, the filter would remove 4 notches:
m/z 800.0 ± 0.1
m/z 800.5 ± 0.1
m/z 801.0 ± 0.1
m/z 801.5 ± 0.1
At first sight, this may seem a strange mix of m/z and Da. The reason is that we need to avoid matches from 1+ fragment ions, whatever the charge on the precursor.
If the arguments are anything other than -1,-1, a single notch is used. The first argument is the mass offset of the beginning of the notch, and the second is the mass offset of the end of the notch. For the precursor in the last example, if the arguments were -1,4, the notch would run from m/z 799.5 to m/z 802.0. However, if the precursor charge was 1+, the notch would run from m/z 799 to m/z 804.
The mascot.dat setting can be overridden in a search by using the search parameter CUTOUT. Note that the peaks removed by this filter are not recorded in the resul
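The worked examples above can be reproduced numerically. Below is a minimal Python sketch of both modes. The smart filter places one notch of ± tolerance at each precursor isotope peak, spaced 1/charge apart in m/z. The single notch converts the two mass offsets to m/z by dividing by the charge, which matches both the 2+ and 1+ examples. Two choices here are my own and are not stated in the text: the proton mass constant used to compute Mr from m/z, and treating Mr of exactly 7000 as the top bin. The function names are hypothetical.

```python
from typing import List, Tuple

PROTON = 1.007276  # proton mass in Da (assumption used to compute Mr)


def isotope_count(mr: float) -> int:
    """Number of precursor isotope peaks assumed for a given Mr."""
    if mr < 1000:
        return 3
    # 1000-1999 -> 4, 2000-2999 -> 5, ..., 7000 and above -> 10
    return min(4 + int((mr - 1000) // 1000), 10)


def smart_notches(mz: float, charge: int, tol: float) -> List[Tuple[float, float]]:
    """Default -1,-1 behaviour: a notch of +/- tol around each isotope peak."""
    mr = (mz - PROTON) * charge
    return [(mz + k / charge - tol, mz + k / charge + tol)
            for k in range(isotope_count(mr))]


def single_notch(mz: float, charge: int, lower: float, upper: float) -> Tuple[float, float]:
    """Any other arguments: one notch from the lower to the upper mass offset."""
    return (mz + lower / charge, mz + upper / charge)
```

For m/z 800 at 2+, Mr is about 1598, so four notches are produced, centred on 800.0, 800.5, 801.0 and 801.5.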
64. the Mascot authentication. A typical case for this might be a service lab manager running Windows and IIS with integrated authentication. This user would not typically want to create a separate Windows login account for the administrator, but would choose to log in explicitly as administrator to update configuration files, etc. For an Apache server with authentication switched on, most users would want to be set to use the authenticated login.
Users
New users are added using the Mascot security administration utility. There are 6 special system user accounts:
guest
The guest user is not enabled by default. If this account is enabled, any user is automatically logged in as guest and needs to explicitly log in as another user to gain further access rights. The guest account cannot be deleted, but the account can be disabled. The userid is 1.
admin
This account should be used to perform administration on the Mascot server. It is recommended that you always log in as administrator to perform security and other administration, rather than assign administrator rights to another user. The administrator account cannot be deleted or disabled, and the admin user cannot be removed from the administrators group. By default, the administrator can access all the administrator screens but cannot submit searches. The userid is 2. The initial password for admin is admin, but this must be changed on first login.
65. the add_user.pl script in the mascot/bin directory can be used. Usage:
add_user.pl -u <username> -p <password> -x <password expiry> -f <fullname> -e <email address> -g <group to which user should belong>
The password expiry should be 0 for never expires, or 1 to force the user to change the password when they first log in.
Resetting the administrator password
If the admin user password is lost, the easiest way to reset it is to re-run enable_security.pl from the command line, as described above. This will not affect any existing groups or users; it will just reset the password.
User ID
The user ID for each search is saved in the results file. If security is disabled, the user ID will be set to zero. Special user IDs are listed above. Other users will have automatically assigned IDs, starting at 1000.
Basic Regular Expressions
Sequence database parsing in Mascot is defined using rules which conform to Basic Regular Expression (BRE) notation, as defined in standard ISO/IEC 9945-2:1993. BRE notation is widely used in Unix, e.g. in the grep command, but it may be less familiar to those from a DOS or Windows background. Man pages containing a rigorous definition of BRE notation can be found on most Unix systems. The following description is much simplified, and is intended to provide just enough information to understand the existing rules in mascot
66. the first part of this chapter for information on installing GD.pm.
Step 4 Configuration
[Screenshot: Mascot Installation, Step 4 Configuration page (install2.pl). A blue rectangle containing the text "GD OK" in red confirms that the GD library is working; if this picture does not appear, the GD library is not functioning correctly. Options are provided to configure Mascot for single server operation or for the master node of a cluster, followed by a Configure Mascot button.]
Indicate whether you plan to configure Mascot as a single SMP server or a cluster, and choose Configure Mascot. If this is a version upgrade, the main configuration file, mascot.dat, will be updated. If it is a clean install, a new mascot.dat will be created.
Step 5 Start Mascot Monitor
[Screenshot: Mascot Installation, Step 5 Start Mascot Monitor page.] The page instructs you to enter the following at a shell prompt, as root:
cd /usr/local/mascot_2_4_0_64/bin
./ms-monitor.exe
Then follow the link to the Mascot Database Status page. Unless there is already a valid licence file in place for this version of Mascot, you will be asked to register your product key. The system will be ready for searches when all databases show as In Use.
67. this section if you ever use Database Manager.
The WWW section defines where CGI scripts look for the information needed to compile a results report. At least one line is required for each database, to define the source from which the sequence string of a database entry can be obtained. A second line can optionally define the source from which the full text report of an entry can be obtained. The syntax is very similar in both cases, independent of whether the information originates locally or on a remote system.
Sequence strings can always be retrieved locally, because the FASTA file must be present on a local disk. The Mascot utility ms-getseq.exe is normally used to retrieve a sequence string. If full text for an entry is available locally, and the database has been defined as including a ref file (Column 10 in the Databases section of mascot.dat), ms-getseq.exe can be used to retrieve the full text. Otherwise, a utility or URL must be identified which can accept an accession string and return the report text in a parseable format. An example of a suitable external URL for full annotation text is shown in the example for Trembl below.
Each line in the WWW section contains 5 columns:
WWW
Trembl_SEQ "8" "localhost" "80" "c:/inetpub/mascot/x-cgi/ms-getseq.exe Trembl #ACCESSION# seq"
Trembl_REP "23" "www.uniprot.org" "80" "/uniprot/#ACCESSION#.txt"
end
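The five-column WWW lines pair a host and port with a path template containing the accession placeholder. Below is a minimal Python sketch that splits such a line and fills in an accession. It assumes the quoted, whitespace-separated layout and the #ACCESSION# placeholder in the Trembl example above; check both against your own mascot.dat. The accession in the usage line is made up, and the function names are hypothetical.

```python
import shlex
from typing import Dict, Union


def parse_www_line(line: str) -> Dict[str, Union[str, int]]:
    """Split a WWW line such as 'Trembl_REP "23" "host" "80" "/path"'."""
    name, code, host, port, path = shlex.split(line)   # honours the quotes
    database, _, kind = name.rpartition("_")            # e.g. Trembl, REP
    return {"database": database, "kind": kind, "code": code,
            "host": host, "port": int(port), "path": path}


def expand(entry: Dict[str, Union[str, int]], accession: str) -> str:
    """Substitute the accession into the path or command template."""
    return str(entry["path"]).replace("#ACCESSION#", accession)


def report_url(entry: Dict[str, Union[str, int]], accession: str) -> str:
    """Build an HTTP URL for a remote (non-localhost) entry."""
    return f"http://{entry['host']}:{entry['port']}{expand(entry, accession)}"
```

For example, `report_url(parse_www_line(rep_line), "Q9XYZ1")` builds a UniProt text-report URL for the REP line. For the local SEQ line, `expand` gives the ms-getseq.exe command line instead.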
68. to main status page, from which links allow details of any specific search to be displayed:
[Screenshot: Mascot Job status page for job 1276 (database SwissProt, process ID 6547, search title MS/MS Example, 27% complete, started Sun Apr 22 13:42:04 2012, job status Searching, type of search MIS, IP address 192.168.8.3). The page has controls to change the job priority, Kill, Pause and Resume links, and a link back to the main status page.]
Status can also be used to print Mascot configuration and result files to STDOUT. This provides a method to display these files in a browser. For example:
http://your_server/mascot/x-cgi/ms-status.exe?Show=MS_ENZYMES
where the argument to Show determines the file to be displayed:
MS_ENZYMES                enzymes
MS_FRAGMENTATION_RULES    fragmentation_rules
MS_MASCOT_DAT             mascot.dat
MS_MASSES                 masses
MS_MOD_FILE               mod_file
MS_QUANTITATIONXML        quantitation.xml
MS_SUBSTITUTIONS          substitutions
MS_TAXONOMY               taxonomy
MS_UNIMODXML              unimod.xml
MS_USERS                  users.xml
The above files are all displayed as plain text, without any formatting. If Show=RESULTFILE, a results file from any directory under mascot/data can be returned with HTML forma
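Because the configuration files are exposed through a simple query string, a script can fetch them without a browser. Below is a minimal Python sketch that builds the ms-status.exe URL for one of the Show values in the table. It assumes the default /mascot/x-cgi/ location used in the example URL. RESULTFILE is deliberately not handled, because its extra arguments are not described here. The function name is hypothetical.

```python
from urllib.parse import urlencode

# Show values and the files they display, as listed in the table above.
SHOW_FILES = {
    "MS_ENZYMES": "enzymes",
    "MS_FRAGMENTATION_RULES": "fragmentation_rules",
    "MS_MASCOT_DAT": "mascot.dat",
    "MS_MASSES": "masses",
    "MS_MOD_FILE": "mod_file",
    "MS_QUANTITATIONXML": "quantitation.xml",
    "MS_SUBSTITUTIONS": "substitutions",
    "MS_TAXONOMY": "taxonomy",
    "MS_UNIMODXML": "unimod.xml",
    "MS_USERS": "users.xml",
}


def show_url(server: str, key: str) -> str:
    """URL that asks ms-status.exe to print one configuration file."""
    if key not in SHOW_FILES:
        raise ValueError(f"unsupported Show value: {key}")
    return f"http://{server}/mascot/x-cgi/ms-status.exe?{urlencode({'Show': key})}"
```

The page can then be fetched with `urllib.request.urlopen(show_url("your_server", "MS_MASCOT_DAT")).read()`. If Mascot Security is enabled, access may additionally depend on your session and permissions.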
69. to the list of Inbound Rules.
Windows 7
On each search node, log in as a user with local administrator rights. Go to Control Panel > Network & Sharing Center and ensure the network connection to the master node is described as Work. If it shows as Public, click on the hyperlink to change it. Choose Change advanced sharing settings and ensure File and Printer Sharing is enabled.
[Screenshot: Windows 7 Control Panel, Network and Sharing Center, showing the active network (Network 2, a Work network, connected through Local Area Connection) and the Change advanced sharing settings link.]
70. wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and modification follow.
GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you".
Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does.
1.
71. you will need to change the shebang lines of all scripts to something similar to:
#!c:/perl/bin/perl.exe
User authentication
Apache provides several ways to restrict access to directories or files. One method is to limit access to clients from a range of IP addresses or a particular domain. Another method is to require a username and password, which may be a convenient way for a system administrator to limit access to the x-cgi directory. Setting up user authentication takes two steps: firstly, creating a file containing the usernames and passwords; secondly, telling the server what resources are to be protected and which users are allowed (after entering a valid password) to access them.
Creating a User Database
A list of users and passwords needs to be created in a file. For security reasons, this file should not be under the document root. This example assumes the file is called /usr/local/mascot/config/passwd. The file will consist of a list of usernames and a password for each. The format is similar to the standard Unix password file, with the username and password separated by a colon. However, you cannot just type in the usernames and passwords, because the passwords are stored in an encrypted format.
The program htpasswd is used to create a user file and to add or modify users. It can be found in the bin directory of the Apache distribution. To create a new user file and add
72. 0-23), day of the month (1-31), month of the year (1-12), day of the week (0-6, with 0 = Sunday). Each of these patterns may be an asterisk (meaning all legal values), a range of integers, or a list of comma-separated integers. An element is either a number or two numbers separated by a minus sign (meaning an inclusive range). Note that days may be specified in two different ways: day of the month and day of the week. If both are specified as a list of elements, both are adhered to. For example:
0 0 1,15 * 1
would run a command on the first and fifteenth of each month, as well as on every Monday. To specify days by only one field, the other field should be set to *. For example:
0 0 * * 1
would run a command only on Mondays.
The sixth field is a string that is executed by the shell/command prompt at the specified times. The string must be on a single line. The entire string, up to the end of the line, is passed to the command prompt for execution. The part of the string up to the first space must be the fully qualified path to an executable. The remainder of the line will be passed to the command as parameters.
Log files
Mascot maintains several log files, which are described below. When troubleshooting, it can also be useful to inspect the web server log files. Errors in Perl scripts, for example, will appear in the web server error log, not the Mascot error log.
Error Log
All errors are logg
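The day-matching rule is the least obvious part of the schedule syntax. When both day fields are restricted, a match on either one is enough; when one of them is *, only the other one matters. Below is a minimal Python sketch of that rule for the five time fields, with the sixth field returned as the command string. It covers only the syntax described above (*, single numbers, inclusive ranges and comma lists) and is not Mascot's scheduler code. The function names are hypothetical.

```python
from datetime import datetime
from typing import List, Set, Tuple


def _parse_field(field: str, lo: int, hi: int) -> Set[int]:
    """Expand '*', 'n', 'a-b' and comma lists into a set of integers."""
    if field == "*":
        return set(range(lo, hi + 1))
    values: Set[int] = set()
    for element in field.split(","):
        if "-" in element:
            start, end = (int(x) for x in element.split("-"))
            values.update(range(start, end + 1))     # inclusive range
        else:
            values.add(int(element))
    return values


def split_entry(line: str) -> Tuple[List[str], str]:
    """Return the five time fields and the command string (sixth field)."""
    parts = line.split(None, 5)
    return parts[:5], parts[5] if len(parts) > 5 else ""


def matches(line: str, when: datetime) -> bool:
    (minute, hour, mday, month, wday), _ = split_entry(line)
    if when.minute not in _parse_field(minute, 0, 59):
        return False
    if when.hour not in _parse_field(hour, 0, 23):
        return False
    if when.month not in _parse_field(month, 1, 12):
        return False
    cron_wday = (when.weekday() + 1) % 7            # Python Monday=0; here Sunday=0
    day_ok = when.day in _parse_field(mday, 1, 31)
    wday_ok = cron_wday in _parse_field(wday, 0, 6)
    if mday != "*" and wday != "*":
        return day_ok or wday_ok                    # both restricted: either matches
    return day_ok and wday_ok                       # a '*' field always matches
```

With "0 0 1,15 * 1", midnight on Sunday 15 April 2012 and on Monday 16 April 2012 both match, but Tuesday 17 April does not.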
73. <msgt:accession_str>RL19_YEAST</msgt:accession_str>
   <msgt:taxonomy>
      <msgt:db>SwissProt</msgt:db>
      <msgt:taxonomy_id>4932</msgt:taxonomy_id>
      <msgt:scientific_name>Saccharomyces cerevisiae</msgt:scientific_name>
      <msgt:translation_table_id>1</msgt:translation_table_id>
      <msgt:common_names>
         <msgt:synonym>Candida robusta</msgt:synonym>
         <msgt:synonym>Saccaromyces cerevisiae</msgt:synonym>
         <msgt:synonym>Saccharomyces capensis</msgt:synonym>
         <msgt:synonym>Saccharomyces italicus</msgt:synonym>
         <msgt:synonym>Saccharomyces oviformis</msgt:synonym>
         <msgt:synonym>Saccharomyces uvarum var. melibiosus</msgt:synonym>
         <msgt:synonym>Saccharomyes cerevisiae</msgt:synonym>
         <msgt:synonym>Sccharomyces cerevisiae</msgt:synonym>
         <msgt:synonym>YEAST</msgt:synonym>
         <msgt:synonym>baker&apos;s yeast</msgt:synonym>
         <msgt:synonym>brewer&apos;s yeast</msgt:synonym>
         <msgt:synonym>lager beer yeast</msgt:synonym>
         <msgt:synonym>yeast</msgt:synonym>
      </msgt:common_names>
      <msgt:tree>
         <msgt:node level="12">Saccharomyces cerevisiae</msgt:node>
         <msgt:node level="11">Saccharomyces</msgt:node>
         <msgt:node level="10">Saccharomycetaceae</msgt:no
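To show how this XML can be consumed, here is a minimal sketch using Python's xml.etree.ElementTree. The sample is trimmed from the listing above; the wrapping root element msgt:entry and the namespace URI urn:example:msgt are hypothetical placeholders, so substitute whatever the real output declares for the msgt prefix.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: use the one the real output declares for "msgt".
NS = {"msgt": "urn:example:msgt"}

SAMPLE = """<msgt:entry xmlns:msgt="urn:example:msgt">
  <msgt:accession_str>RL19_YEAST</msgt:accession_str>
  <msgt:taxonomy>
    <msgt:db>SwissProt</msgt:db>
    <msgt:taxonomy_id>4932</msgt:taxonomy_id>
    <msgt:scientific_name>Saccharomyces cerevisiae</msgt:scientific_name>
    <msgt:common_names>
      <msgt:synonym>Candida robusta</msgt:synonym>
      <msgt:synonym>baker&apos;s yeast</msgt:synonym>
    </msgt:common_names>
    <msgt:tree>
      <msgt:node level="12">Saccharomyces cerevisiae</msgt:node>
      <msgt:node level="11">Saccharomyces</msgt:node>
      <msgt:node level="10">Saccharomycetaceae</msgt:node>
    </msgt:tree>
  </msgt:taxonomy>
</msgt:entry>"""


def summarise(xml_text: str) -> dict:
    """Pull the accession, taxonomy ID, name, synonyms and lineage from one entry."""
    root = ET.fromstring(xml_text)
    tax = root.find("msgt:taxonomy", NS)
    nodes = tax.findall("msgt:tree/msgt:node", NS)
    return {
        "accession": root.findtext("msgt:accession_str", namespaces=NS),
        "taxonomy_id": int(tax.findtext("msgt:taxonomy_id", namespaces=NS)),
        "scientific_name": tax.findtext("msgt:scientific_name", namespaces=NS),
        "synonyms": [s.text for s in tax.findall("msgt:common_names/msgt:synonym", NS)],
        # Highest level number first, matching the order in the listing above.
        "lineage": [n.text for n in sorted(nodes, key=lambda n: int(n.get("level")), reverse=True)],
    }


if __name__ == "__main__":
    print(summarise(SAMPLE))
```

ElementTree decodes the &apos; entity, so the second synonym comes back as baker's yeast.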
74. 0 residues, position 0 would indicate the amino terminus and position 11 would indicate the carboxy terminus. If there is no location information, the range is output as 0 256.
hn_qm_terms shows the residues that bracket the peptide in the protein. If the peptide forms the terminus of the protein, a hyphen is used instead.
hn_qm_subst is output when the matched peptide contained an ambiguous residue (B, X or Z). The argument is one or more triplets of comma-separated values. For each triplet, the first value is the residue position, the second is the ambiguous residue, and the third is the residue that has been substituted to obtain the reported match.
For a large MS/MS search, num_hits is set to zero and the summary block only contains entries for qmassn, qexpn, qmatchn and qplugholen. The threshold for switching to this mode is specified using two parameters in the Options section of mascot.dat: SplitDataFileSize is the size of the search process in bytes (default 10000000), and SplitNumberOfQueries is the size of the search in queries (default 1000).
If this is a two-pass search (either an automatic decoy database search or an automatic error tolerant search), a second summary block appears containing the second set of results. The section name is either et_summary or decoy_summary. The syntax of the contents is identical.
Mixture
   --gc0p4Jq0M2Yt084jU534c0p
   Content-Type: application/x-Mascot; name="mixture"
   num_hits=number of
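A minimal Python sketch of splitting an hn_qm_subst argument into its triplets. It assumes the triplets are simply concatenated, so that every value (including those of successive triplets) is separated by a comma; the values in the usage example are hypothetical.

```python
from typing import NamedTuple


class Substitution(NamedTuple):
    position: int     # residue position
    ambiguous: str    # ambiguous residue: B, X or Z
    substituted: str  # residue substituted to obtain the reported match


def parse_hn_qm_subst(value: str) -> list[Substitution]:
    """Split a comma-separated hn_qm_subst argument into triplets."""
    fields = [field.strip() for field in value.split(",")]
    if len(fields) % 3 != 0:
        raise ValueError(f"expected whole triplets, got {len(fields)} values")
    triplets = [Substitution(int(fields[i]), fields[i + 1], fields[i + 2])
                for i in range(0, len(fields), 3)]
    for triplet in triplets:
        if triplet.ambiguous not in {"B", "X", "Z"}:
            raise ValueError(f"unexpected ambiguous residue {triplet.ambiguous!r}")
    return triplets


if __name__ == "__main__":
    print(parse_hn_qm_subst("3,X,K,7,B,D"))  # hypothetical value
```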
75. 1 is contained in the HTML help pages: choose Help from the Mascot main menu bar and then choose Quantitation.
Database Manager
Database Manager is mainly described in the HTML help pages: choose Help from the Mascot main menu bar and then choose Sequence Database Setup, Database Manager.
Configuration Options
This is a simple interface to the Options section of mascot.dat, which contains a variety of global settings. Reference material can be found below.
Chapter 6 Configuration & Log Files
mascot.dat
Two sections of mascot.dat, Processors and Cluster, have no interface in either Database Manager or Configuration Options; the only way to change them is to edit mascot.dat.
Windows users should note that the path delimiters used in mascot.dat must always be forward slashes, never the backslashes used at the command prompt. If sequence database files are not on a local disk drive, the remote drive must be mapped to a local drive letter; UNC path specifications cannot be used. Finally, spaces are not allowed in file or directory names. Hence:
   C:/InetPub/mascot/config/mascot.dat   correct
   C:\InetPub\mascot\config\mascot.dat   wrong
   \\matrix_nt_01\InetPub\mascot\config\mascot.dat   wrong
   //matrix_nt_01/InetPub/mascot/config/mascot.dat   wrong
General
mascot.dat is divided into sections. Each section starts with a unique keyword and ends with the keyword end. Comments and blank lines can be used
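The three path rules for mascot.dat (forward slashes only, no UNC paths, no spaces) can be checked mechanically before a file is edited. This is a minimal Python sketch of such a check; it is a hypothetical helper, not a tool supplied with Mascot, and it only tests the rules stated above.

```python
def mascot_dat_path_problems(path: str) -> list[str]:
    """Return the rule violations found in a path destined for mascot.dat."""
    problems = []
    if path.startswith(("//", "\\\\")):
        problems.append("UNC path: map the remote drive to a local drive letter")
    if "\\" in path:
        problems.append("backslash: use forward slashes")
    if " " in path:
        problems.append("space in file or directory name")
    return problems


if __name__ == "__main__":
    for candidate in ("C:/InetPub/mascot/config/mascot.dat",
                      "C:\\InetPub\\mascot\\config\\mascot.dat"):
        print(candidate, mascot_dat_path_problems(candidate) or "OK")
```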
76.
   1. Identifier: an identifier constructed from the name of the database, an underscore character, and either the keyword SEQ or REP. Thus, Trembl_SEQ is the source for the sequence string of an entry in the database called Trembl.
   2. Parse rule: the index of a rule in the PARSE section that can be used to extract the information required. Note that the rule for parsing a sequence string from ms-getseq.exe is the same for all databases.
   3. Host: the information source. For ms-getseq.exe or a similar local executable, this column should contain localhost. For a remote source, or a local source that will be queried as a CGI application, enter the hostname. NB: the word localhost is used to determine whether the application is a command line executable or a CGI application. If you want to specify a CGI application on the local server, just specify the hostname in some other way, for example 127.0.0.1.
   4. Port: the port number. This should be left at 80 unless another value is required to access a web server operating on a non-default port.
   5. Path: a string containing the path to the executable and its parameters, some of which are variables. In the case of a command line executable, the parameters will generally be delimited by spaces. In the case of a CGI application, the parameters may be delimited from the executable by a question mark, and there must be no spaces within the parameter string. In general, spaces in URLs must be replaced by plus sy
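The Host, Port and Path columns work together: the literal word localhost selects a command line executable, anything else is treated as a CGI application whose parameter string must not contain spaces. A minimal Python sketch of that decision, using urllib.parse.quote_plus to turn spaces into plus signs; the paths in the usage example are hypothetical, and this is not Mascot's own code.

```python
from urllib.parse import quote_plus


def describe_source(host: str, port: int, path: str) -> str:
    """Classify an information source and, for CGI sources, build its URL."""
    if host == "localhost":
        # The literal word localhost marks a command line executable.
        return f"command line: {path}"
    executable, has_query, params = path.partition("?")
    url = f"http://{host}:{port}/{executable.lstrip('/')}"
    if has_query:
        # Spaces become '+'; '&' and '=' are kept so the parameter structure survives.
        url += "?" + quote_plus(params, safe="&=")
    return f"CGI: {url}"


if __name__ == "__main__":
    print(describe_source("127.0.0.1", 80, "/cgi-bin/lookup.pl?db=Trembl&q=some text"))
```

Specifying 127.0.0.1 rather than localhost is exactly the trick described above for reaching a CGI application on the local server.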
77. 100
   longest: Longest sequence of matched ions, reported separately for each ion series as with fracIonsMatched (backbone only)
   fracIonsMatched: Fraction of calculated ions matched, reported separately for each ion series, with NLs lumped together, e.g. fracIonsMatchedB1, fracIonsMatchedB1deriv, fracIonsMatchedB2, fracIonsMatchedB2deriv
   matchedIntensity: Matched ion intensity, reported separately for each ion series as with fracIonsMatched
   numUniqPeps: The excess of the number of unique peptide matches (unique primary sequence) over the number of matches expected by chance
   qmatch: The number of peptide matches for which an MS/MS match was attempted
   peptide: The peptide string that was matched
   proteins: A tab-separated list of accessions of proteins that contain this peptide. Must be last.
indicates that this feature is not implemented.
Error codes
   Return   Description
   1        Invalid parameters. Use help for help.
   2        Missing or invalid mascot.dat
   3        No MS/MS spectra in results file
   4        Automatic decoy search not enabled
   5        Insufficient number of queries
   6        Insufficient number of sequences searched
   7        Cannot read the results file
   8        Failed to create output file
   9        Invalid feature in mascot.dat options
   10       Invalid feature for option a
   11       Invalid feature for option r
Miscellaneous Utilities
Service (supplied for Windows only): This application shows the status of t
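When this utility is driven from a script, its return code is the quickest way to see what went wrong. A minimal Python sketch that turns the codes in the table above into messages; the dictionary simply mirrors that table, and any code the table does not list (including whatever value signals success) is reported as unlisted rather than guessed.

```python
RETURN_CODES = {
    1: "Invalid parameters",
    2: "Missing or invalid mascot.dat",
    3: "No MS/MS spectra in results file",
    4: "Automatic decoy search not enabled",
    5: "Insufficient number of queries",
    6: "Insufficient number of sequences searched",
    7: "Cannot read the results file",
    8: "Failed to create output file",
    9: "Invalid feature in mascot.dat options",
    10: "Invalid feature for option a",
    11: "Invalid feature for option r",
}


def explain_return_code(code: int) -> str:
    """Translate a return code from the table above into its description."""
    return RETURN_CODES.get(code, f"unlisted return code {code}")
```

In practice the code would come from something like subprocess.run(command).returncode, with command being the utility's actual command line.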
78. 17:30:33 2012 Creating compress file /usr/local/mascot/sequence/SwissProt/current/SwissProt_2012_03.a00
   Fri Apr 20 17:30:33 2012 Creating compress file /usr/local/mascot/sequence/SwissProt/current/SwissProt_2012_03.t00
   Fri Apr 20 17:30:33 2012 Creating compress file /usr/local/mascot/sequence/SwissProt/current/SwissProt_2012_03.stats
   Fri Apr 20 17:32:26 2012 SwissProt00 Creating compressed files to Finished compressing files
   Fri Apr 20 17:32:26 2012 SwissProt00 Finished compressing files to Running 1st test
   Fri Apr 20 17:32:33 2012 SwissProt00 Running 1st test to First test just run OK
   Fri Apr 20 17:32:33 2012 SwissProt00 First test just run OK to Waiting for other DB to end
   Fri Apr 20 17:32:33 2012 SwissProt00 Waiting for other DB to end to Trying to memory map files
   Fri Apr 20 17:32:33 2012 SwissProt00 Trying to memory map files to Just enabled memory mapping
   Fri Apr 20 17:32:33 2012 SwissProt00 Just enabled memory mapping to In use
IPC Log
In cluster mode only, an interprocess communication log can be enabled by setting IPCLogging in the Cluster section of mascot.dat to 1 or 2. This log can be used to investigate communications errors at the socket level.
Program Reference
Mascot implements a client/server architecture using the HTTP protocol (web server/web browser). In this mode, the search
79. Peptide match table from the printed Mascot search report (Chapter 4 Validation).
   Search Parameters
   Type of search: MS/MS Ion Search
   Enzyme: Trypsin/P
   Fixed modifications: Carbamidomethyl (C)
   Variable modifications: Oxidation (M)
   Mass values: Monoisotopic
   Protein Mass: Unrestricted
   Peptide Mass Tolerance: ±100 ppm
   Fragment Mass Tolerance: ±0.1 Da
   Max Missed Cleavages: 4
   Instrument type: ESI-QUAD-TOF
   Number of queries: 67
Sequence Database Setup
Sequence database URLs and formats change constantly. Provided your Mascot Server can connect to the Internet, Mascot Database Manager will keep database definitions up to date automatically for many popular public databases. For each database, you can confi
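The search parameters above mix a relative peptide tolerance (±100 ppm) with an absolute fragment tolerance (±0.1 Da). A relative tolerance scales with mass: 100 ppm is 0.1 Da at 1000 Da but 0.2 Da at 2000 Da. This minimal Python sketch shows only that arithmetic; it does not assume whether Mascot applies the window to Mr or to m/z.

```python
def ppm_to_da(mass: float, tol_ppm: float) -> float:
    """Half-width, in daltons, of a relative tolerance window centred on mass."""
    return mass * tol_ppm / 1e6


def within_tolerance(observed: float, calculated: float, tol_ppm: float) -> bool:
    """True if the mass error, expressed in ppm of the calculated mass, is within tolerance."""
    error_ppm = (observed - calculated) / calculated * 1e6
    return abs(error_ppm) <= tol_ppm


if __name__ == "__main__":
    for mass in (1000.0, 2000.0):
        print(mass, "Da:", ppm_to_da(mass, 100), "Da at 100 ppm")
```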
80. Screenshot: the Mascot search status page (x-cgi/ms-status.exe) reporting "Fatal error: failed to initialise the memory map", with the advice to check that the Mascot service is running (from the Start menu, choose Programs, Mascot, config, Show Mascot Service Status), to start the service if it is not running (Start Mascot Service), and to try refreshing the screen.
There are several possible causes.
Service not started: Since one of the first things that the Monitor service does is to create the memory-mapped file, this error could indicate that the service has not started. You can tell whether the service has started by choosing Start > Programs > Mascot > config > Show Mascot ms-monitor service status. If the service is not running, check the monitor log and the errorlog.txt file in the logs directory. If there is nothing in those files, it may be necessary to try to run ms-monitor.exe as a command line executable. You should only do this if the Mascot service is not running. To do this, open a command prompt window and change directory to the mascot bin directory. If your installation path was the default, you will need to type cd \Inetpub\mascot\bin. Next, start the m
81. 76190 beta actin Trichosurus vulpecula
Other parameters are possible for ms-gettaxonomy.exe; see the reference section ms-gettaxonomy in Chapter 7.
Common Questions
Why do I sometimes get results for a species I didn't specify?
Sometimes, when specifying (for example) Human species, the results may appear at first sight to be from (for example) a Mouse sample. The most common reason is that, in a non-redundant database, exactly the same sequence has been found in many species. To check this, look at the protein view, where you should see at least one entry for the species you selected.
Chapter 9 Taxonomy
What are the unclassified and other species?
The NCBI cannot always classify every sequence, either because no species information was supplied with the data or because the sequence doesn't fit into any current classification. There are about 1500 such sequences in the NCBI nr database. Other species include plasmids and artificial sequences.
How do I see which sequences Mascot couldn't assign a taxonomy ID to?
In the status screen, click on the Unidentified taxonomy link. This shows sequences where one or more of the species names were not identified by Mascot.
Why do I get the message "Taxonomy xxx ignored. No taxonomy indexes for this database"?
Check the following: In the mascot.dat file, is parameter 14 for the problem database a valid number and
82. 8 83
   Port number: 5001
   Number of processors to use on this node: 2
Use the Browse button to ensure that the UNC path to the node is correct. If the machines are in a Windows domain and the remote drive is not explicitly shared, you can enter C$ for drive C, etc., to use the administrative shares. If the base directory does not exist, create it using the Make New Folder button. The recommended base folder name is MascotNode.
Ensure that the local path to the MascotNode directory matches the UNC path. This must be a local or mapped drive on the node, so that the path can be specified using a drive letter. The dialog will try to guess the local path from the UNC path, but it may get it wrong, so check that this path is correct before pressing OK.
It is not necessary to fill in the Host name and IP Address fields unless the node is a multi-homed system and you need to define which network interface will be used for communication with the Mascot master. The default port number for cluster communication is 5001; if there are conflicts, this can be changed.
The number of processors must be specified. The total number of processors specified for all nodes can be greater than the number of processors in the Mascot licence. The surplus processors will then behave as hot spares, to be swapped into the cluster as required if there is a hardware problem on another node.
NOTE: If the mas
83. Error Messages
A complete listing of Mascot error codes, messages and explanations can be found at the URL mascot/cgi/ms-geterror.exe?ALL.
Screenshot: a Netscape window showing the output of ms-geterror.exe?ALL, including this entry:
   M00027: Sorry, the database (database name) is not currently available for searching.
   Further help: Only databases listed in mascot.dat can be searched. It is possible that there was an error when the database was coming on line. To check this, and to see the status of all databases, look at the status screen (there is a link to it from the home page on the intranet versions). To add a new database, see the Sequence Database Setup chapter.
   Actions: Show message to end user. Terminate search. Message put into errorlog.txt file. Message not put into monitor log file. Message not emailed to the Mascot administrator. Message not put into the search results file.
The same text can also be found in the file errors.html in the root directory of the Mascot CD-ROM.
System Limits
   Number of different modifications in unimod.xml: unlimited
   Number of enzymatic peptides per sequence: user definable (MaxNumPep
84. ARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
bzip2
This program, "bzip2", the associated library "libbzip2", and all documentation, are copyright (C) 1996-2005 Julian R Seward. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this sof
85. <BR>
   60% complete<BR>
   70% complete<BR>
   90% complete<BR>
   271397 sequences and 86500527 residues checked<BR>
   <SCRIPT LANGUAGE="JavaScript">
   <!-- Begin hiding Javascript from old browsers
   if (window.navigator.userAgent.indexOf("MSIE") != -1) {
     window.location.replace("../cgi/master_results.pl?file=../data/20090312/F002642.dat");
   } else if (window.location.replace == null) {
     window.location.assign("../cgi/master_results.pl?file=../data/20090312/F002642.dat");
   } else {
     window.location.replace("../cgi/master_results.pl?file=../data/20090312/F002642.dat");
   }
   // End hiding Javascript from old browsers -->
   </SCRIPT>
   <NOSCRIPT>
   <A HREF="../cgi/master_results.pl?file=../data/20090312/F002642.dat">Click here to see Search Report</A>
   </NOSCRIPT>
   </BODY>
   </HTML>
The executable called nph-mascot1.exe is for Mascot TD (BIG Mascot), where the precursor mass limit of 16 kDa has been removed. It will only be used for searches if enabled in the licence.
Monitor
The primary function of Mascot Monitor (bin/ms-monitor.exe) is to manage the sequence databases. Monitor must be running in order for the search engine to execute. Under Linux it runs as a daemon, and under Windows it runs as a service. Monitor does the following
86. COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
C Clustering Library
The C clustering library. Copyright (C) 2002 Michiel Jan Laurens de Hoon. This library was written at the Laboratory of DNA Information Analysis, Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan. Contact: mdehoon AT gsc.riken.jp. Permission to use, copy, modify, and distribute this software and its documentation with or without modifications and for any purpose and without fee is hereby granted, provided that any copyright notices appear in all copies and that both those copyright notices and this permission notice appear in supporting documentation, and that the names of
87. DefaultPort: Sets the default port number to be used when this parameter is missing from nodelist.txt. The recommended default is 5001.
UseCompleteDatabase: Not used. If specified, must be set to 1.
nodelist.txt
This file is used to define the nodes that belong to the cluster. For a very large cluster, it is advisable to define a few percent of additional nodes as spares. For example, if 51 nodes with 102 processors were available and Mascot was configured to use 2 sub-clusters, each of 50 processors, the node with the 2 spare processors could be used to replace a failed node automatically.
Cluster node definitions
Each line begins with the word Node, followed by a space and then a comma-delimited list of configuration parameters: IP address, port, computer host name, maximum number of node CPUs to be used, operating system, local path to home directory, home directory as seen from master (specify for NT master only).
   Node 10.0.0.1,5001,search01,2,Windows NT,c:\MascotNode,\\search01\c$\MascotNode
   Node 10.0.0.2,5001,search02,2,Windows NT,c:\MascotNode,\\search02\c$\MascotNode
   Node 10.0.0.3,5001,search03,2,Windows NT,c:\MascotNode,\\search03\c$\MascotNode
   Node 10.0.0.4,5001,search04,2,Windows NT,c:\MascotNode,\\search04\c$\MascotNode
   Node 10.0.0.5,5001,search05,2,Windows NT,c:\MascotNode,\\search05\c$\MascotNode
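A minimal Python sketch of reading Node lines from nodelist.txt and counting hot-spare processors (processors configured beyond the licence, as described for the node setup dialog). Two points are assumptions: that a missing port is written as an empty field, in which case DefaultPort applies, and that the optional master-side path may simply be omitted at the end of the line. The sample lines are hypothetical.

```python
from typing import NamedTuple, Optional

DEFAULT_PORT = 5001  # the recommended DefaultPort value


class Node(NamedTuple):
    ip: str
    port: int
    host: str
    cpus: int
    os_name: str
    local_path: str
    master_path: Optional[str]  # home directory as seen from the master


def parse_nodelist(text: str, default_port: int = DEFAULT_PORT) -> list[Node]:
    """Parse 'Node ...' lines into Node records; all other lines are skipped."""
    nodes = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("Node "):
            continue
        fields = [field.strip() for field in line[len("Node "):].split(",")]
        fields += [""] * (7 - len(fields))  # pad a missing trailing master path
        ip, port, host, cpus, os_name, local_path, master_path = fields[:7]
        nodes.append(Node(ip, int(port) if port else default_port, host,
                          int(cpus), os_name, local_path, master_path or None))
    return nodes


def spare_processors(nodes: list[Node], licensed: int) -> int:
    """Processors configured beyond the licence; these act as hot spares."""
    return max(0, sum(node.cpus for node in nodes) - licensed)
```

With five two-processor nodes and an 8-processor licence, spare_processors returns 2, i.e. one node's worth of hot spares.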
88. ENHANCEMENTS, OR MODIFICATIONS.
Linux glibc (section 6b applies)
Appendix G: GNU Lesser General Public License
Version 2.1, February 1999
Copyright (C) 1991, 1999 Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA. Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
[This is the first released version of the Lesser GPL. It also counts as the successor of the GNU Library Public License, version 2, hence the version number 2.1.]
G.0.1 Preamble
The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public Licenses are intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users.
This license, the Lesser General Public License, applies to some specially designated software--typically libraries--of the Free Software Foundation and other authors who decide to use it. You can use it too, but we suggest you first think carefully about whether this license or the ordinary General Public License is the better strategy to use in any particular case, based on the explanations below.
When we speak of free software, we are referring to freedom of use, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and ch
89. Table: peptide matches from a Mascot search report, listing query number, observed m/z, Mr(expt), Mr(calc), score, expect value, rank and peptide sequence, with variable modifications such as Oxidation (M).
90. H3 if b significant and fragment includes RKNQ
   10   b-H2O if b significant and fragment includes STED
   13   y series
   14   y-NH3 if y significant and fragment includes RKNQ
   15   y-H2O if y significant and fragment includes STED
   17   internal yb < 700 Da
   18   internal ya < 700 Da
   minInternalMass 200
   maxInternalMass 1000
Quantitation
Screenshot: the Quantitation Methods page of the Mascot Configuration editor (x-cgi/ms-config.exe), listing each method with its protocol, for example iTRAQ 4plex (reporter), ICAT ABI Cleavable [MD] (precursor), ICPL duplex pre-digest [MD] (precursor), SILAC K+6 R+10 [MD] (precursor), 18O corrected multiplex (multiplex) and 15N Metabolic [MD] (precursor), with Copy and Delete links for each method and a field for a new quantitation method.
A detailed description of quantitation methods, the relevant Configuration Editor pages and the underlying file quantitation.xm
91. Homo sapiens > Homo > Hominidae > Catarrhini > Primates > Eutheria > Theria > Mammalia > Amniota > Tetrapoda > Sarcopterygii > Euteleostomi > Teleostomi > Gnathostomata > Vertebrata > Craniata > Chordata > Deuterostomia > Coelomata > Bilateria > Eumetazoa > Metazoa > Fungi/Metazoa group > eukaryote crown group > Eukaryota > cellular organisms > root
   AAA35732 Homo sapiens human man
   Homo sapiens > Homo > Hominidae > Catarrhini > Primates > Eutheria > Theria > Mammalia > Amniota > Tetrapoda > Sarcopterygii > Euteleostomi > Teleostomi > Gnathostomata > Vertebrata > Craniata > Chordata > Deuterostomia > Coelomata > Bilateria > Eumetazoa > Metazoa > Fungi/Metazoa group > eukaryote crown group > Eukaryota > cellular organisms > root
Description lines: >CCHU cytochrome c human
   Parameters                Returns
   1 database accession      Space-separated list of accession string, tax_ID number and scientific species name. Where a database entry represents multiple accessions, this information is repeated for each accession. Plain formatted.
   2 database accession
   3 database accession
   4 database accession
   5 database tax_ID
   6 database tax_ID
   7 database tax_ID
   8 database species
   9 database accession
Batch mode request format: space-separated pair of accession string and scientific species name. Where
92. ITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. Except as contained in this notice, the name of a copyright holder shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Software without prior written authorization of the copyright holder.
gzip, ht://Dig, cksum, touch, libstdc++
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
Preamble
The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too.
When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to d
93. MATRIX SCIENCE
Mascot Installation and Setup
© 2012 Matrix Science Ltd. All rights reserved.
The information contained in this publication is for reference purposes only and is subject to change at any time. Every effort has been made to supply complete and accurate information. However, Matrix Science Ltd assumes no responsibility and will not be liable for any consequential loss or damages that might result from the use of this manual or from any errors or omissions in the information contained herein. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without the express written permission of Matrix Science Ltd.
Mascot is a trade mark of Matrix Science Ltd. All third party trade marks and service marks referred to in this publication are hereby acknowledged.
Matrix Science Limited, 64 Baker Street, London W1U 7GB, UK
Phone: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
Email: info@matrixscience.com
WWW: http://www.matrixscience.com
April 2012, Revision 2.4.0
End User Licence Agreements
MASCOT PROTEIN IDENTIFICATION SYSTEM END USER LICENCE AGREEMENT
IMPORTANT: PLEASE READ CAREFULLY. This End User Licence Agreement is a legally binding contract between you (either an individual or a single corporate entity) and Matrix Science Limited for the product identified above, which includes computer software, electronic documentation, printed d
94. May need to be increased for very large searches.
   6   Stack space: Not normally an issue for executables or any of the Perl scripts.
   7   Thread stack space: Not normally an issue for executables. The Perl scripts are not threaded.
File size limits
This is normally unlimited, but a limit may have been configured (e.g. in /etc/security/limits.conf). You should manually verify that your system can successfully FTP a file larger than 2 GB, as FTP doesn't necessarily report an error when it fails.
How the errors are reported
If the Mascot executables report a memory error, the error can be found in the errorlog.txt file, including the error code returned by the operating system. For a Perl script running in CGI mode, the web server may just kill the job, and no error will be logged.
Determining what the limits are
Most systems have two sets of limits: the current limits and the hard limits. There is no standard Unix command across all platforms, although limit or limits will work on most systems from the C shell.
   ulimit -a
   cputime         unlimited
   filesize        unlimited
   datasize        1048576 kbytes
   stacksize       65536 kbytes
   coredumpsize    unlimited
   memoryuse       250004 kbytes
   descriptors     200
   vmemory         1048576 kbytes
   threads         1024

   ulimit -aH
   cputime         unlimited
   filesize        unlimited
   datasize        1048576 kbytes
   stacksize       524288 kbytes
   coredumpsize    unlimited
   memoryuse       524288 kbytes
   descriptors
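Where no convenient shell command is available, the same two sets of limits can be read programmatically. This is a minimal Python sketch using the standard resource module (Unix only), which reports the current (soft) and hard value for each resource, much like the two listings above. Note that Python reports size limits in bytes and CPU time in seconds, whereas the shell listings show kbytes; the selection of limits here is this sketch's own.

```python
import resource

LIMITS = {
    "cputime": resource.RLIMIT_CPU,
    "filesize": resource.RLIMIT_FSIZE,
    "datasize": resource.RLIMIT_DATA,
    "stacksize": resource.RLIMIT_STACK,
    "coredumpsize": resource.RLIMIT_CORE,
    "descriptors": resource.RLIMIT_NOFILE,
}


def describe(value: int) -> str:
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_and_hard_limits() -> dict[str, tuple[str, str]]:
    """Return (current, hard) for each limit, comparable to ulimit -a and ulimit -aH."""
    result = {}
    for name, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        result[name] = (describe(soft), describe(hard))
    return result


if __name__ == "__main__":
    for name, (soft, hard) in current_and_hard_limits().items():
        print(f"{name:<14}{soft:>22}{hard:>22}")
```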
95. NECESSARY SERVICING, REPAIR OR CORRECTION.
16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
G.0.2 How to Apply These Terms to Your New Libraries
If you develop a new library, and you want it to be of the greatest possible use to the public, we recommend making it free software that everyone can redistribute and change. You can do so by permitting redistribution under these terms (or, alternatively, under the terms of the ordinary General Public License).
To apply these terms, attach the following notices to the library. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found.
   <one line to give the library's name and an idea of what it does.>
   Copyright (C) <year> <name of author>
   This library is free software; you can redistribute
96. Screenshot: the Mascot Protein View, showing the peptide list (Start, End, Mr, Miss, Sequence) with options to sort peptides by residue number, increasing mass or decreasing mass.
The default behaviour is for this to link to the NCBI taxonomy browser. For non-redundant databases with more than one species source per sequence, there will be a list of the species, each with a link. For the NCBInr database, a separate gi number is shown for each database entry, with a link to Entrez and to the NCBI Taxonomy browser for each entry.
If security and confidentiality protocols make this unacceptable for your installation, change the entry in the Options section of the mascot.dat file from
   TaxBrowserUrl http://www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wgetorg?lvl=0&lin=f&id=#TAXID#
to
   TaxBrowserUrl ../x-cgi/ms-gettaxonomy.exe?4+#DATABASE#+#ACCESSION#
In this case, the link will display the information in the following format:
   Taxonomy for gi|4501885
   gi|4501885 Unknown species
   gi|113270 Homo sapiens human man HUMAN
   103D 10GS 11GS 121P 12CA 12GS 133L Homo sapiens > Homo > Hominidae > Catarrhini > Primates > Eutheria > Theria > Mammalia > Amniota > Tetrapoda > lobe-finned fish and tetrapod clade > bony vertebrates > Gnathostomata > Vertebrata > Craniat
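TaxBrowserUrl is a template whose placeholders are filled in for each link. As a minimal sketch, assuming the placeholders are written as #NAME# (for example #TAXID#, #DATABASE#, #ACCESSION#), the substitution looks like this in Python. URL-escaping each value with quote_plus, so that spaces become plus signs, is this sketch's own choice, not documented Mascot behaviour.

```python
from urllib.parse import quote_plus


def expand_template(template: str, **values: str) -> str:
    """Substitute #NAME# placeholders (assumed syntax) with URL-escaped values."""
    url = template
    for name, value in values.items():
        url = url.replace(f"#{name}#", quote_plus(value))
    return url


if __name__ == "__main__":
    print(expand_template("../x-cgi/ms-gettaxonomy.exe?4+#DATABASE#+#ACCESSION#",
                          DATABASE="NCBInr", ACCESSION="4501885"))
```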
97. SEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. This software consists of voluntary contributions made by many individuals on behalf of the Apache Software Foundation and was originally based on software copyright (c) 1999, International Business Machines, Inc., http://www.ibm.com. For more information on the Apache Software Foundation, please see <http://www.apache.org/>. Curl: COPYRIGHT AND PERMISSION NOTICE. Copyright (c) 1996 - 2003, Daniel Stenberg, <daniel@haxx.se>. All rights reserved. Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABIL
98. SearchControlSaveE 0: Obsolete. SearchLogFile: see ErrorLogFile. SendmailPath: see EmailErrorsEnabled. SelectSwitch 1000: If the number of queries in an MS/MS search is less than or equal to this number, the default report is the Peptide Summary. If it is greater than this number, the default report is the Select Summary. SeparateLockMem 0: Only required for 32-bit versions, if the total amount of memory to be locked is greater than 2 GB (or lower, if some system limit is set). Setting this value to 1 indicates that ms-monitor.exe will run a separate program, ms-lockmem.exe, that will lock the memory blocks. A value greater than 1 specifies the block size in MB. For example, if there is a 1.5 GB .s00 file and this parameter is set to 750, then two instances of ms-lockmem.exe will be run. ShowAllFromErrorTolerant 0. ShowSubSets 0. Standard behaviour for the result report of a manual error tolerant search is to show only those matches that satisfy two criteria: (i) the score must be at least as high as the score of the match for the same query in the original parent search; (ii) the score must equal or exceed the identity threshold for the same query in the original parent search. Setting ShowAllFromErrorTolerant to 1 causes all matches to be displayed. This global default can be overridden on an individual report URL by appending &_showallfromerrortolerant=X, where X is 0 or 1. If this is set to 1 und
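The three rules in the entry above (SelectSwitch, the SeparateLockMem block size, and the default filter for manual error tolerant results) can be written as small predicates. This Python is a sketch under stated assumptions, not Mascot's own code: the function names are hypothetical, sizes are in MB, and the manual's 1.5 GB example is treated as 1500 MB so that a 750 MB block size gives two instances.

```python
import math


def default_report(num_queries: int, select_switch: int = 1000) -> str:
    """Default MS/MS report implied by SelectSwitch (hypothetical helper)."""
    # At or below the threshold the Peptide Summary is used; above it, the Select Summary.
    return "Peptide Summary" if num_queries <= select_switch else "Select Summary"


def lockmem_instances(file_size_mb: float, block_size_mb: int) -> int:
    """Number of ms-lockmem.exe instances when SeparateLockMem is a block size (> 1)."""
    if block_size_mb <= 1:
        raise ValueError("only SeparateLockMem values greater than 1 are block sizes")
    # Assumption: one instance per block, rounding up for a partial final block.
    return math.ceil(file_size_mb / block_size_mb)


def show_error_tolerant_match(score: float, parent_score: float,
                              parent_identity_threshold: float,
                              show_all: bool = False) -> bool:
    """Default display filter for a manual error tolerant search report."""
    if show_all:  # ShowAllFromErrorTolerant set to 1, globally or on the report URL
        return True
    # (i) at least as high as the parent search match for the same query, and
    # (ii) at or above the parent search identity threshold for that query.
    return score >= parent_score and score >= parent_identity_threshold
```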
99. TRegex A Z0 9 ABEV 0 9 End. There is just a single species per sequence, so DescriptionLineSep is set to 0. The SpeciesFiles are from NCBI and SwissProt, and the NodesFiles are taken from NCBI, as before. There is only one database source, so the DefaultRule can be used. This takes everything after the first underscore up to the next space. For example, >104K_THEPA P15711 104 KD MICRONEME RHOPTRY ANTIGEN would find the text THEPA, which it would look up in speclist.txt. NCBI EST: Definition 9, for EST_others, is very similar to that for NCBInr: TAXONOMY FOR dbEST using GI2TAXID; Taxonomy 9; Enabled 1; FromRefFile 0; ErrorLevel 0; DescriptionLineSep 1 (Ctrl-A, hex code 1, for multiple descriptions per entry; 0 to disable it); SpeciesFiles GI2TAXID gi_taxid_nucl.dmp, NCBI names.dmp; NodesFiles NCBI nodes.dmp, NCBI merged.dmp; DefaultRule GI2TAXID CHOP gi 0 9 (the gi number); Identifier NCBI nucleotide FASTA using GI2TAXID; GencodeFiles NCBI gencode.dmp; AccFromSpeciesLine gi 0 9; MitochondrialTranslation 0; End. A different species file, gi_taxid_nucl.dmp, is used for nucleic acid sequences. The file containing genetic code data is specified as an argument to GencodeFiles, while MitochondrialTranslation is set to 0, specifying that all entries should be translated using the genetic code for nuclear
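The DefaultRule described above (take everything after the first underscore up to the next space) can be sketched as a regular expression. This Python helper is hypothetical and follows the rule only as worded. Mascot's own parser, and its handling of title lines without an underscore, may differ.

```python
import re
from typing import Optional

# Everything after the first underscore, up to (not including) the next space.
_AFTER_FIRST_UNDERSCORE = re.compile(r"_([^ ]*)")


def species_code(title_line: str) -> Optional[str]:
    """Return the SwissProt-style species code from a FASTA title line, or None."""
    match = _AFTER_FIRST_UNDERSCORE.search(title_line)
    return match.group(1) if match else None


code = species_code(">104K_THEPA P15711 104 KD MICRONEME RHOPTRY ANTIGEN")
# code is "THEPA"; Mascot would then look this code up in speclist.txt.
```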
100. UsersEnabled is set to 1, search results will be emailed to a user if their web browser does not respond within the number of seconds specified in EmailTimeOutPeriod following the completion of a search. Email messages can be sent in batches, at intervals specified by MonitorEmailCheckFreq (in seconds). MailTempFile is the name of the temporary file used to store email messages until they can be sent. If EmailErrorsEnabled is set to 1, serious error messages will be emailed to ErrMessageEmailTo. MAPI Configuration (Windows only): Set MailTransport to 1. Set EmailPassword to the password, if any, that is required to log onto MAPI. Set EmailProfile to the profile name used by MAPI. This can be found by opening the Windows Control Panel and clicking on Mail. Depending on whether you have an Internet-mail-only or a corporate or workgroup installation of MS Outlook, you will have a list of either account names or profile names to choose from. Sendmail Configuration (Linux only): Set MailTransport to 2. Set EmailFromUser to the name that is required in the From field of the email messages. Set EmailFromTextName to the name of the server that is running Mascot. For example, setting EmailFromUser to www and EmailFromTextName to Mascot Server will result in emails from "www Mascot Server", and the From field of the email will be www@www.your_domain.com. Set SendmailPath to the path of the sendmail program, e
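The two transports need different Options parameters. A pre-flight check could report which of them are still empty. The helper below is a hypothetical sketch that takes the Options section as a plain dictionary of strings. It treats EmailPassword as optional because the text says a password is needed only "if any" is required.

```python
from typing import Dict, List

# Parameters each MailTransport value relies on, per the text above.
REQUIRED_BY_TRANSPORT = {
    1: ["EmailProfile"],                                        # MAPI, Windows only
    2: ["EmailFromUser", "EmailFromTextName", "SendmailPath"],  # sendmail, Linux only
}


def missing_mail_settings(options: Dict[str, str]) -> List[str]:
    """List required parameters that are absent or blank for the chosen transport."""
    transport = int(options.get("MailTransport", "0"))
    if transport not in REQUIRED_BY_TRANSPORT:
        raise ValueError(f"MailTransport {transport} is not covered by this sketch")
    return [name for name in REQUIRED_BY_TRANSPORT[transport]
            if not options.get(name, "").strip()]
```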
101. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and
102. _2.pl, ms-createpip.exe, MSAnatomiser.class, mi_getpeaklist.pl, msms_gif.pl, nph-mascot.exe, ms-searchcontrol.exe. ResultsCache master_results.pl, master_results_2.pl, protein_view.pl, export_dat.pl, export_dat_2.pl, ms-createpip.exe, MSAnatomiser.class, mi_getpeaklist.pl, nph-mascot.exe, ms-searchcontrol.exe: Comma-, space- or tab-delimited string of scripts and applications that will use cache files to speed up access to the results files. To prevent the use of the cache for a particular script, remove it from this list. There are two sets of cache files: one for the results file, independent of any particular report format, controlled by ResfileCache, and one for each combination of summary report format settings, controlled by ResultsCache. See also CacheDirectory. ResultsFileFormatVersion: If present and the argument is 2.1, the result file format will be 2.1 compatible; that is, there will be no XML sections. No other arguments are supported at this time. ResultsFullURL: see NoResultsScript. ResultsFullURL_2: see NoResultsScript. ResultsPerlScript: see NoResultsScript. ResultsPerlScript_2: see NoResultsScript. ReviewColWidths 7,8,8,27,30,100,32,25,6,13,2,4,6,16,7: This sets the widths of the columns in ms-review.exe. SaveEveryLastQueryAsc: see LastQueryAscFile. SaveLastQueryAsc: see LastQueryAscFile. ScoreThresholdForAuto: Deprecated; use SigThreshold. SearchControlLifetime 7200
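ResultsCache (like ResfileCache) accepts a comma-, space- or tab-delimited list. A tool that checks whether a given script uses the cache should therefore split on any run of those separators. A minimal Python sketch follows. The helper names are hypothetical.

```python
import re
from typing import Set


def cached_scripts(value: str) -> Set[str]:
    """Split a ResultsCache/ResfileCache value on commas, spaces or tabs."""
    return {token for token in re.split(r"[,\s]+", value) if token}


def uses_cache(script: str, value: str) -> bool:
    """True if the script is still listed, i.e. it has not been removed."""
    return script in cached_scripts(value)


setting = "master_results.pl, master_results_2.pl\tprotein_view.pl export_dat.pl"
```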
103. [Screenshot: Mascot Security Administration, Add user page (security_admin.pl), with fields for name, password, password expiry, full name, email address, user type (e.g. Standard Mascot user), account enabled, force change at next login, and group membership (Guests, Administrators, PowerUsers, Daemons, Default, MascotIntegraSystem); the help text reads: Enter a user name, password, full name and email address for the new user. Select one or more groups for the user to belong to. Finally, press the Add user button.] The new user must be given a name, password, full name and email address. There is a description of the different types of user earlier in this chapter. You should also select one or more groups that the user should belong to before pressing the Add user button. New groups may be added or edited [Screenshot: Mascot Security Administration Utility]
104. a > Chordata > Deuterostomia > Coelomata > Bilateria > Eumetazoa > Metazoa > Fungi/Metazoa group > eukaryote crown group > Eukaryota > root; gi|3320892 Trichosurus vulpecula (Tricosurus vulpecular, Trichosurus vulpecular, common brush-tailed possum) TRIVU; Trichosurus vulpecula > Trichosurus > Phalangeridae > Diprotodontia > Metatheria > Theria > Mammalia > Amniota > Tetrapoda > lobe-finned fish and tetrapod clade > bony vertebrates > Gnathostomata > Vertebrata > Craniata > Chordata > Deuterostomia > Coelomata > Bilateria > Eumetazoa > Metazoa > Fungi/Metazoa group > eukaryote crown group > Eukaryota > root. Description lines: >gi|4501885|ref|NP_001092.1| ACTB beta actin gi|113270|sp|P02570|ACTB_HUMAN ACTIN CYTOPLASMIC 1 BETA ACTIN gi|71618|pir||ATHUB actin beta human gi|71619|pir||ATMSB actin beta mouse gi|279669|pir||ATCHB actin beta chicken gi|28252|emb|CAA25099| X00351 beta actin Homo sapiens gi|49866|emb|CAA27307| X03672 beta actin (aa 1-375) Mus musculus gi|55575|emb|CAA24528| V01217 beta actin Rattus norvegicus gi|177968 M10277 cytoplasmic beta actin Homo sapiens gi|211237 L08165 beta actin Gallus gallus gi|2116655|dbj|BAA20266| AB004047 beta actin Homo sapiens gi|2182269 U39357 beta actin Ovis aries gi|2661136 AF035774 beta actin Equus caballus gi|3320892 AF0
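In an NCBInr title line, the descriptions shown above are stored on one line, separated by Ctrl-A characters (hex code 1). This is the separator that the DescriptionLineSep 1 setting in the taxonomy definitions refers to. The Python sketch below splits such a line and pulls out each gi number. The helper names are hypothetical, and the sample line is shortened from the example above.

```python
import re
from typing import List

CTRL_A = "\x01"  # description separator in NCBInr title lines
_GI_PREFIX = re.compile(r"gi\|(\d+)")


def split_descriptions(title_line: str) -> List[str]:
    """Split one FASTA title line into its individual descriptions."""
    parts = title_line.lstrip(">").split(CTRL_A)
    return [part.strip() for part in parts if part.strip()]


def gi_numbers(title_line: str) -> List[int]:
    """Return the gi number that starts each description, in order."""
    numbers = []
    for description in split_descriptions(title_line):
        match = _GI_PREFIX.match(description)
        if match:
            numbers.append(int(match.group(1)))
    return numbers


line = (">gi|4501885|ref|NP_001092.1| beta actin" + CTRL_A +
        "gi|113270|sp|P02570|ACTB_HUMAN ACTIN" + CTRL_A +
        "gi|71618|pir||ATHUB actin beta human")
```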
105. a database entry represents multiple accessions, this information is repeated for each accession. Followed by the FASTA title line for the accession supplied as an argument. Pretty formatted. Same as mode 2, plus a list of common species names in parentheses. Same as mode 3, plus complete taxonomy tree. The scientific species name as a string. Pretty formatted. Same as mode 5, plus a list of common species names in parentheses. Same as mode 6, plus complete taxonomy tree. Verbose: tax_ID information, genetic code number. A GET request always means single-entry mode; a POST request automatically means batch mode. A batch-mode request should use UTF-8 encoding and be of multipart/form-data enctype. For example, the body can consist of parts with Content-Disposition: form-data; name="db" (value SwissProt), name="accession" (RL19_YEAST), name="taxID" (1061), name="showtitle" (on), name="showSynonyms" (on), name="showTaxTree" (on) and name="SessionID" (123456). Each part is preceded by a boundary line ending in 41184676334, and a closing boundary line follows the last part. The batch format aggregates both "find taxonomy from accession" and "find taxonomy from id" requests. Maximum number of a
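A batch-mode request is an ordinary multipart/form-data POST. The Python sketch below builds such a body with the field names from the example above, using only the standard library. The server URL is a placeholder, the helpers are hypothetical, and the request is constructed but not sent.

```python
import urllib.request
import uuid
from typing import List, Tuple


def build_batch_body(fields: List[Tuple[str, str]]) -> Tuple[bytes, str]:
    """Encode (name, value) pairs as UTF-8 multipart/form-data."""
    boundary = uuid.uuid4().hex
    lines: List[str] = []
    for name, value in fields:
        lines += [f"--{boundary}",
                  f'Content-Disposition: form-data; name="{name}"',
                  "",
                  value]
    lines += [f"--{boundary}--", ""]  # closing boundary
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def batch_request(url: str, fields: List[Tuple[str, str]]) -> urllib.request.Request:
    """Build (but do not send) a POST request; POST is what selects batch mode."""
    body, content_type = build_batch_body(fields)
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": content_type})


fields = [("db", "SwissProt"), ("accession", "RL19_YEAST"), ("taxID", "1061"),
          ("showtitle", "on"), ("showSynonyms", "on"), ("showTaxTree", "on"),
          ("SessionID", "123456")]
req = batch_request("http://mascot-server.example/mascot/x-cgi/ms-gettaxonomy.exe", fields)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen` against a real server.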
106. accession: accession string; frame: frame number (0 if not supplied in the input, or missing if AA database). Status: The Database Status utility, x-cgi/ms-status.exe, provides an overview of the active and recent searches on all of the configured databases. The top-level display will resemble this: [Screenshot: MASCOT search status page, showing licence information (8 logical, 2 physical Intel processors), the number of searches running, links to the search log, monitor log, error log and error message descriptions, and a status block for each configured database (for example, contaminants and cRAP) listing Family, Pathname, State, Time, searches, Mem mapped, Request to mem map, Request unmap, Mem locked, Number of threads and Current]
107. [Screenshot: search form with the Taxonomy list expanded, showing indented entries such as Archaea (Archaeobacteria), Eukaryota (eucaryotes), Alveolata (alveolates), Plasmodium falciparum (malaria parasite), Other Alveolata, Metazoa (Animals), Caenorhabditis elegans, Drosophila (fruit flies), Chordata (vertebrates and relatives), bony vertebrates, lobe-finned fish and tetrapod clade, Mammalia (mammals), Primates, Homo sapiens (human), Other primates, Rodentia (Rodents), Mus musculus (house mouse) and Rattus] This file can be edited using any text editor. Under Windows, from the Start menu, choose Programs > Mascot > Config > Mascot taxonomy file. The following is an extract from the supplied file: Title:All entries / Include:1 / Exclude:0 / * Title:Archaea (Archaeobacteria) / Include:2157 / Exclude: / * Title:Eukaryota (eucaryotes) / Include:2759 / Exclude: / * Title:Alveolata (alveolates) / Include:33630 / Exclude: / * Title:. . . Primates / Include:9443 / Exclude: / * Title:. . . Homo sapiens (human) / Include:9606 / Exclude: / * Title:. . . Other primates / Include:9443 / Exclude:9606. The first line of each block must
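One way to read an Include/Exclude block is as a filter on an entry's lineage: the entry qualifies if its lineage contains one of the Include TaxIDs and none of the Exclude TaxIDs. That reading is an assumption, but it is consistent with the Other primates block, which includes Primates (9443) but excludes Homo sapiens (9606). The Python helper is hypothetical, and the lineages are abbreviated for illustration.

```python
from typing import Iterable, Set


def passes_taxonomy(lineage: Set[int], include: Iterable[int],
                    exclude: Iterable[int] = ()) -> bool:
    """True if the lineage hits an Include TaxID and no Exclude TaxID."""
    return bool(lineage & set(include)) and not (lineage & set(exclude))


# Abbreviated lineages (TaxIDs): species, Primates, Eukaryota, root.
human = {9606, 9443, 2759, 1}
chimpanzee = {9598, 9443, 2759, 1}

other_primates = dict(include=[9443], exclude=[9606])
all_entries = dict(include=[1], exclude=[0])
```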
108. ah and the Regents of the University of California. All Rights Reserved. Permission is hereby granted, without written agreement and without license or royalty fees, to use, copy, modify, and distribute this software and its documentation for any purpose, provided that (1) the above copyright notice and the following two paragraphs appear in all copies of the source code and (2) redistributions including binaries reproduces these notices in the supporting documentation. Substantial modifications to this software may be copyrighted by their authors and need not follow the licensing terms described here, provided that the new terms are clearly indicated in all files where they apply. IN NO EVENT SHALL THE AUTHOR, THE UNIVERSITY OF CALIFORNIA, THE UNIVERSITY OF UTAH OR DISTRIBUTORS OF THIS SOFTWARE BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE AUTHORS OR ANY OF THE ABOVE PARTIES HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. THE AUTHOR, THE UNIVERSITY OF CALIFORNIA, AND THE UNIVERSITY OF UTAH SPECIFICALLY DISCLAIM ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND THE AUTHORS AND DISTRIBUTORS HAVE NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES
109. ame, maximum number of node CPUs to be used, operating system, local path to home directory, home directory as seen from master (specify for NT master only): Node 10.0.0.1:5001, search01, 2, Linux, /usr/local/mascotnode; Node 10.0.0.2:5001, search02, 2, Linux, /usr/local/mascotnode; Node 10.0.0.3:5001, search03, 2, Linux, /usr/local/mascotnode; Node 10.0.0.4:5001, search04, 2, Linux, /usr/local/mascotnode; Node 10.0.0.5:5001, search05, 2, Linux, /usr/local/mascotnode. 4. Re-start ms-monitor.exe. Note that you must change directory to mascot/bin and have superuser privileges to execute ms-monitor.exe. Note (Linux only): Under Red Hat Linux 8.0, if ms-monitor.exe terminates immediately after launch without any error messages, the problem may relate to a bug in gethostbyname_r. In the Cluster section of mascot.dat, try using the IP address for the master node rather than the hostname as the argument to MasterComputerName. 5. In a web browser, navigate to ms-status.exe and verify that the system starts up correctly. Reference: the Cluster section in mascot.dat: Cluster; # Enable (1) or disable (0) cluster mode; Enabled 1; # MasterComputerName must be the hostname; MasterComputerName mascot-master; # Node defaults; DefaultNodeOS Windows NT; DefaultNodeHomeDir c:\mascotnode; # Following line must be commented out UNLESS this is a; DefaultNodeHomeDirFromMaster <host_name>
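A monitoring or provisioning script might need to read the Node entries back out of mascot.dat. This Python sketch is hypothetical. It assumes each entry is written as Node followed by comma-separated fields: address:port, then the fields listed above (name, maximum CPUs, operating system, local home directory, and optionally the home directory as seen from the master). Check the layout against your own mascot.dat before relying on it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterNode:
    address: str
    port: int
    name: str
    max_cpus: int
    os: str
    home_dir: str
    home_dir_from_master: Optional[str] = None  # NT master only


def parse_node_line(line: str) -> ClusterNode:
    """Parse one assumed-format Node entry from the Cluster section."""
    parts = line.strip().split(None, 1)
    if len(parts) != 2 or parts[0] != "Node":
        raise ValueError(f"not a Node entry: {line!r}")
    fields = [field.strip() for field in parts[1].split(",")]
    if len(fields) not in (5, 6):
        raise ValueError(f"expected 5 or 6 fields, got {len(fields)}")
    address, port = fields[0].rsplit(":", 1)
    return ClusterNode(address=address, port=int(port), name=fields[1],
                       max_cpus=int(fields[2]), os=fields[3], home_dir=fields[4],
                       home_dir_from_master=fields[5] if len(fields) == 6 else None)
```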
110. [Screenshot: server configuration wizard, Server Role page, with Application server (IIS, ASP.NET) selected] Proceed through the wizard, accepting all the defaults, to install IIS. IIS 6 does not serve files with unknown MIME types, and its default list of MIME types does not include XML schema documents. See Microsoft Knowledge Base article Q326965 for the procedure to add .xsd to the IIS 6 list of MIME types: http://support.microsoft.com/default.aspx?scid=kb;en-us;326965. Vista: Mascot will run under all Windows Vista editions except Starter and Home Basic. It is advisable to ensure that the latest service pack has been installed. Check the following URL for current information: http://windows.micro
111. and cleavage after M at the other. When Independent is omitted or given a value of 0, the specificities are combined as if the reagents had been applied simultaneously or serially to a single sample aliquot. The keyword Independent does not take an index. Title:semiTrypsin / Cleavage[0]:KR / Restrict[0]:P / Cterm[0] / SemiSpecific:1 / * If the keyword SemiSpecific appears and is given a value of 1, this means that any given peptide need only conform to the cleavage specificity at one end. The other end can result from non-specific cleavage. When SemiSpecific is omitted or given a value of 0, peptides are required to conform to the cleavage specificity at both ends. The keyword SemiSpecific does not take an index. Instruments: [Screenshot: Mascot Configuration, Instruments page, a grid of fragment ion series rules against the instrument definitions Default, ESI-QUAD-TOF, MALDI-TOF-PSD, ESI-TRAP, ESI-QUAD, ESI-FTICR, MALDI-TOF-TOF, ESI-4SECTOR, FTMS-ECD, ETD-TRAP and MALDI-QUAD-TOF]
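The difference between fully specific and SemiSpecific matching can be sketched for a C-terminal enzyme such as the semiTrypsin definition (cleave after K or R, not before P). This Python is illustrative only. The helper names are hypothetical, only a single specificity is handled, and treating the protein termini as valid ends is an assumption.

```python
def cuts_after(protein: str, i: int, cleavage: str = "KR", restrict: str = "P") -> bool:
    """True if the enzyme cuts between protein[i] and protein[i + 1] (Cterm rule)."""
    if i < 0 or i >= len(protein) - 1:
        return False
    return protein[i] in cleavage and protein[i + 1] not in restrict


def conforms(protein: str, start: int, end: int, semi_specific: bool = False) -> bool:
    """Check the peptide protein[start:end] against the cleavage specificity."""
    # Assumption: the protein N- and C-termini always count as conforming ends.
    n_ok = start == 0 or cuts_after(protein, start - 1)
    c_ok = end == len(protein) or cuts_after(protein, end - 1)
    # SemiSpecific 1: one conforming end is enough; otherwise both are required.
    return (n_ok or c_ok) if semi_specific else (n_ok and c_ok)


protein = "GGKAAARPLLLKEEE"  # hypothetical sequence
# AAARPLLLK (3:12) conforms at both ends; RP is not a site because of Restrict P.
# AAARPLL (3:10) conforms only at its N-terminus, so it needs SemiSpecific 1.
```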
112. ant that, for a period of ninety days (90) from the date of delivery (the "Warranty Period"): 5.1 the medium on which the Software is recorded will be free from defects in materials and workmanship under normal use. If the medium fails to conform to this warranty, you may, as your sole and exclusive remedy, obtain at your option either a replacement free of charge or a full refund if you return the defective medium to us or to your supplier during the Warranty Period; and 5.2 the copy of the Software in this package will materially conform to the documentation that accompanies the Software. If the Software fails to operate in accordance with this warranty, you may, as your sole and exclusive remedy, return all of the Software and the documentation to us or to your supplier during the Warranty Period, specifying the problem, and we will provide you either with a new version of the Software or a full refund, at your option. 6. Disclaimer: We do not warrant that this Software will meet your requirements or that its operation will be uninterrupted or error free. We exclude and hereby expressly disclaim all express and implied warranties or conditions not stated herein, so far as such exclusion is or disclaimer is permitted under the applicable law. THIS AGREEMENT DOES NOT AFFECT YOUR STATUTORY RIGHTS. 7. Liability: 7.1 Our liability to you for any losses shall not exceed the amount you originally paid f
113. arge for this service if you wish); that you receive source code or can get it if you want it; that you can change the software and use pieces of it in new free programs; and that you are informed that you can do these things. To protect your rights, we need to make restrictions that forbid distributors to deny you these rights or to ask you to surrender these rights. These restrictions translate to certain responsibilities for you if you distribute copies of the library or if you modify it. For example, if you distribute copies of the library, whether gratis or for a fee, you must give the recipients all the rights that we gave you. You must make sure that they, too, receive or can get the source code. If you link other code with the library, you must provide complete object files to the recipients, so that they can relink them with the library after making changes to the library and recompiling it. And you must show them these terms so they know their rights. We protect your rights with a two-step method: (1) we copyright the library, and (2) we offer you this license, which gives you legal permission to copy, distribute and/or modify the library. To protect each distributor, we want to make it very clear that there is no warranty for the free library. Also, if the library is modified by someone else and passed on, the recipients should know that what they have is not the original version, so that the original author's reputation will not be affect
114. ary specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Library does not specify a license version number, you may choose any version ever published by the Free Software Foundation. If you wish to incorporate parts of the Library into other free programs whose distribution conditions are incompatible with these, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL
115. ases without disrupting ongoing searches is handled by Mascot Monitor. The new database is compressed and tested by running a standard search. If errors are detected in the new database, the database exchange process is abandoned and searches continue to use the earlier database. Assuming the test is successful, all new searches are performed against the new database, while searches that were in progress against the old database are allowed to continue. Once the final search against the old database is complete, the compressed files are deleted and the FASTA file is moved to an archive directory. If the database being exchanged is memory mapped, the mapping and unmapping are also handled automatically. Status: The Mascot package includes a CGI application that provides a live status display via a web browser. For each database, the Mascot job queue, the executing jobs and the completed jobs are listed. The status lines for completed jobs contain hyperlinks to individual results reports. Review: Review is a CGI application that provides easy access to the flat-file database of search result files. Key search parameters, such as time and date, job number, user name and search type, are displayed in a spreadsheet-like table. Columns can be hidden, sorted and filtered to facilitate locating a specific file or group of files. Each row includes hyperlinks either to generate a Mascot results report or to disp
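The exchange sequence described above behaves like reference counting on database releases. New searches always use the current release, and the old release is archived only when its last search finishes. This Python model is a hypothetical sketch of that bookkeeping, not Mascot Monitor's implementation. The test search is reduced to a boolean flag, and compression and memory mapping are left out. The model also refuses a second exchange while an old release is still in use, which is a simplification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatabaseSlot:
    """Hypothetical model of one database's releases during an exchange."""
    current: str
    running: Dict[str, int] = field(default_factory=dict)  # release -> searches in progress
    retiring: Optional[str] = None
    archived: List[str] = field(default_factory=list)

    def start_search(self) -> str:
        # New searches are always performed against the current release.
        self.running[self.current] = self.running.get(self.current, 0) + 1
        return self.current

    def finish_search(self, release: str) -> None:
        self.running[release] -= 1
        self._archive_if_idle()

    def exchange(self, new_release: str, test_search_ok: bool) -> bool:
        if self.retiring is not None:
            raise RuntimeError("previous release is still in use")  # simplification
        if not test_search_ok:
            return False  # exchange abandoned; keep using the earlier release
        self.retiring, self.current = self.current, new_release
        self._archive_if_idle()
        return True

    def _archive_if_idle(self) -> None:
        # Once the last search on the old release finishes, it is archived.
        if self.retiring is not None and self.running.get(self.retiring, 0) == 0:
            self.running.pop(self.retiring, None)
            self.archived.append(self.retiring)
            self.retiring = None
```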
116. ass, neutral loss mass; q1_p1_et_mods_slave=neutral loss mass, neutral loss mass; q1_p1_primary_nl=neutral loss string; q1_p1_na_diff=original NA sequence, modified NA sequence; q1_p1_tag=tagNum, startPos, endPos, seriesID; q1_p1_drange=startPos, endPos; q1_p1_terms=residue, residue, residue, residue; q1_p1_subst=pos1, ambig1, matched1, posn, ambign, matchedn; q1_p1_comp=quantitation component name; q1_p1_summed_mods=variable modifications string; q1_p2, qn_pm; --gc0p4Jq0M2Yt08jU534c0p. Each line contains the data for a peptide match, followed by data for at least one protein in which the peptide was found. If there are multiple entries in the database containing the matched peptide, there will be a corresponding number of pairs of bracketing residues listed in qn_pm_terms. Otherwise, individual field descriptions are identical to those for the Summary section. If this is a two-pass search (either an automatic decoy database search or an automatic error tolerant search), a second peptides block appears, containing the second set of results. The section name is either et_peptides or decoy_peptides. The syntax of the contents is identical. Proteins: --gc0p4Jq0M2Yt08jU534c0p Content-Type: application/x-Mascot; name="proteins" "accession string"=protein mass,"title text" "accession string"=protein mass,"title text" --gc0p4Jq0M2Yt08jU534c0p This
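A Mascot results file is divided into MIME-style sections, separated by the boundary string gc0p4Jq0M2Yt08jU534c0p. Each section is introduced by a Content-Type: application/x-Mascot; name=... line. A reader can therefore split the file into named sections and then parse the proteins entries, which take the form "accession"=mass,"title". This Python sketch assumes exactly that layout. The sample data is made up and the helper names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

BOUNDARY = "gc0p4Jq0M2Yt08jU534c0p"
_SECTION_NAME = re.compile(r'Content-Type:\s*application/x-Mascot;\s*name="([^"]+)"')
_PROTEIN = re.compile(r'^"(?P<acc>[^"]*)"=(?P<mass>[^,]*),"(?P<title>.*)"$')


def split_sections(text: str) -> Dict[str, List[str]]:
    """Map section name -> non-blank lines of that section."""
    sections: Dict[str, List[str]] = {}
    for chunk in text.split("--" + BOUNDARY):
        lines = chunk.strip("\r\n").splitlines()
        match = _SECTION_NAME.match(lines[0]) if lines else None
        if match:  # skips the preamble and the closing "--" chunk
            sections[match.group(1)] = [ln for ln in lines[1:] if ln.strip()]
    return sections


def parse_proteins(lines: List[str]) -> List[Tuple[str, float, str]]:
    """Parse "accession"=mass,"title" lines from the proteins section."""
    proteins = []
    for line in lines:
        match = _PROTEIN.match(line)
        if match:
            proteins.append((match.group("acc"), float(match.group("mass")),
                             match.group("title")))
    return proteins


sample = (
    "--gc0p4Jq0M2Yt08jU534c0p\n"
    'Content-Type: application/x-Mascot; name="proteins"\n'
    "\n"
    '"P99999"=12345.67,"hypothetical example protein"\n'
    "--gc0p4Jq0M2Yt08jU534c0p--\n"
)
```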
117. [Screenshot: Mascot Configuration, Modifications page 20 of 25 (entries PE to Ph), listing each modification's name, monoisotopic mass, average mass and elemental composition, with Copy, Delete and Print links; for example, Phospho: 79.966331, 79.9799, H O(3) P] For modifications, more detailed help can be found in the Unimod help pages. This is also the place to find details of the file format, which is fully defined by a schema called unimod_1.xsd. Enzymes: [Screenshot: Mascot Configuration, Enzymes page]
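The monoisotopic mass of a modification follows from its composition. The Python sketch below parses the Unimod-style composition notation (an element symbol, optionally followed by a count in parentheses) and sums monoisotopic element masses. The hydrogen and carbon values match the unimod.xml element extract elsewhere in this manual. The other values are standard monoisotopic masses, assumed to agree with your unimod.xml. Building blocks such as Hex, and isotopes such as 2H, are not handled.

```python
import re
from typing import Dict

# Monoisotopic element masses in Da; only these elements are supported.
MONO_MASS: Dict[str, float] = {
    "H": 1.007825035,
    "C": 12.0,
    "N": 14.003074,
    "O": 15.99491463,
    "P": 30.973762,
    "S": 31.97207069,
}

_TERM = re.compile(r"(\w+?)(?:\((-?\d+)\))?")


def mono_mass(composition: str) -> float:
    """Sum monoisotopic masses for a composition such as 'H O(3) P'."""
    total = 0.0
    for token in composition.split():
        match = _TERM.fullmatch(token)
        if not match or match.group(1) not in MONO_MASS:
            raise ValueError(f"unsupported composition term: {token!r}")
        count = int(match.group(2)) if match.group(2) else 1
        total += MONO_MASS[match.group(1)] * count
    return total
```

For Phospho, `mono_mass("H O(3) P")` agrees with the listed 79.966331 to within 1e-6 Da.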
118. ation="http://www.unimod.org/xmlns/schema/unimod_2 unimod_2.xsd"> <umod:elements> <umod:elem avge_mass="1.00794" full_name="Hydrogen" mono_mass="1.007825035" title="H"/> <umod:elem avge_mass="2.014101779" full_name="Deuterium" mono_mass="2.014101779" title="2H"/> <umod:elem avge_mass="6.941" full_name="Lithium" mono_mass="7.016003" title="Li"/> <umod:elem avge_mass="12.0107" full_name="Carbon" mono_mass="12" title="C"/> This section is an extract from unimod.xml, containing data for the elements, amino_acids and any modifications specified in the search form. For more details, and a link to the schema, refer to the help pages at www.unimod.org. Enzyme: --gc0p4Jq0M2Yt08jU534c0p Content-Type: application/x-Mascot; name="enzyme" Title:Trypsin Cleavage:KR Restrict:P Cterm. This section is simply an extract from the enzyme file. Syntax details can be found in Chapter 6. Taxonomy: --gc0p4Jq0M2Yt08jU534c0p Content-Type: application/x-Mascot; name="taxonomy" Title:Homo sapiens (human) Include:9606 Exclude: This section is simply an extract from the taxonomy file. Syntax details can be found in Chapter 9. Header: --gc0p4Jq0M2Yt08jU534c0p Content-Type: application/x-Mascot; name="header" sequences=number of sequences in DB sequences_after_tax=number of sequences after taxonomy filter residu
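Because the unimod.xml extract is namespaced XML, its element masses can be read with Python's xml.etree.ElementTree. Tag names are qualified with the namespace from the schemaLocation above. This is a sketch: the root element name in the sample is an assumption, and only the elem attributes shown in the extract are used.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

UMOD_NS = "http://www.unimod.org/xmlns/schema/unimod_2"

SAMPLE = f"""<umod:unimod xmlns:umod="{UMOD_NS}">
  <umod:elements>
    <umod:elem avge_mass="1.00794" full_name="Hydrogen" mono_mass="1.007825035" title="H"/>
    <umod:elem avge_mass="12.0107" full_name="Carbon" mono_mass="12" title="C"/>
  </umod:elements>
</umod:unimod>"""


def element_masses(xml_text: str) -> Dict[str, Tuple[float, float]]:
    """Map element title -> (monoisotopic mass, average mass)."""
    root = ET.fromstring(xml_text)
    return {
        elem.get("title"): (float(elem.get("mono_mass")), float(elem.get("avge_mass")))
        for elem in root.iter(f"{{{UMOD_NS}}}elem")  # "{namespace}elem"
    }
```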
119. ation File. Please note: as part of the product registration process, the following information will be transmitted to Matrix Science: details of any existing licence, and machine identifiers for node-locking purposes (e.g. MAC address). A product key is required and must be registered online. The licence file will be returned by email and must be saved to the specified location on the Mascot server. If the Mascot server cannot connect to the Internet, a file containing registration information can be saved and copied to a system with Internet access for submission. The registration form allows a second email address to be specified, in case the person installing Mascot is not the end user. Ensure that the end user's email address is entered into the upper part of the form, and that the email address to which the licence file should be sent is entered into the CC email field in the lower part of the form. The licence file must be saved to the config/licdb directory as a file with the extension .lic. Verify System Operation: A copy of the SwissProt database is included in the files copied from the DVD. It is recommended that the operation of Mascot is verified and tested using this database before adding further databases or making configuration changes. Mascot Monitor (ms-monitor.exe) is used to manage the swapping and memory mapping of the sequence databases used by Mascot. For Mascot to operate, ms-m
120. axon; N=Official name, C=Common name, S=Synonym: AAV2 V 010804: N=Adeno-associated virus 2, C=AAV2; ABDS2 B 056673: N=Antarctic bacterium DS2-3R; ABIAL E 045372: N=Abies alba, C=Edeltanne. This file is available at ftp://ftp.ebi.ac.uk/pub/databases/uniprot/knowledgebase/docs/speclist.txt. If you wish to add more entries, a new file should be made with just the new entries. Mascot will load multiple files, as specified below. Most Mascot updates will contain the updated speclist.txt file. Genetic code selection: During a search of a nucleic acid database, Mascot also uses the taxonomy of each entry to choose the correct genetic code for translation. The genetic codes are defined in the NCBI file gencode.dmp, which is included in the archive taxdump.tar mentioned above. The file nodes.dmp is used as a lookup table to obtain a genetic code number from a TaxID. For many species, the genetic code is different for mitochondrial and nuclear proteins. Although Mascot could try to determine whether a database entry is mitochondrial by performing a keyword search of the FASTA description, this is unreliable. In any case, mitochondrial proteins will usually represent only a very small fraction of the entries in any comprehensive database. The most important requirement is to use the correct code for a database that consists specifically of mitochondrial proteins. The solution is to include a flag in each mascot
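The NCBI nodes.dmp file separates fields with "\t|\t". In the standard layout, the genetic code id is the 7th field and the mitochondrial genetic code id is the 9th. The Python sketch below does the TaxID lookup, with the mitochondrial choice made per database, as the MitochondrialTranslation setting in the taxonomy definitions does. The field positions come from NCBI's taxdump readme rather than from this manual, and the sample line is illustrative.

```python
from typing import Dict, Iterable, Tuple


def load_genetic_codes(lines: Iterable[str]) -> Dict[int, Tuple[int, int]]:
    """Map TaxID -> (nuclear genetic code id, mitochondrial genetic code id)."""
    table: Dict[int, Tuple[int, int]] = {}
    for line in lines:
        # Each line ends in "\t|"; fields are separated by "\t|\t".
        fields = [f.strip() for f in line.rstrip("\n").rstrip("|").split("\t|\t")]
        tax_id, gc_id, mgc_id = int(fields[0]), int(fields[6]), int(fields[8])
        table[tax_id] = (gc_id, mgc_id)
    return table


def genetic_code(tax_id: int, table: Dict[int, Tuple[int, int]],
                 mitochondrial: bool = False) -> int:
    """Choose the code number; the mitochondrial flag is set per database."""
    nuclear, mito = table[tax_id]
    return mito if mitochondrial else nuclear


# Illustrative nodes.dmp line for Homo sapiens (TaxID 9606).
SAMPLE = "9606\t|\t9605\t|\tspecies\t|\tHS\t|\t5\t|\t1\t|\t1\t|\t1\t|\t2\t|\t1\t|\t1\t|\t0\t|\t\t|\n"
```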
121. be supplied, and an optional nice value. If a valid new nice value is supplied, this will return the text job_niced. If a nice value is not supplied, the program will return the current nice value: nice xxx. If there is an error, one of the following will be returned: unknown_id, job_not_running, or searchcontrol error nnn, with values of nnn as for status. The nice is implemented by setting a flag in the mascot control memory-mapped file; the nph-mascot.exe task is responsible for resuming itself. Nice values range from -20 to 20. A value of 20 will set the task to a very low priority. The Mascot status screen shows the nice value as a priority, which is simply -1 × the nice value. Microsoft Windows does not allow such fine-grained control of priorities so, for example, 20 and 19 will map to the same priority. ms-searchcontrol.exe set to queued taskID <number> sessionID <string>: if the task is successful, this will return the text queued. If there is an error, one of the following will be returned: unknown_id, already running, already complete, or searchcontrol error nnn, with values of nnn as for status. A batch processing client can make queued jobs visible to the Mascot system by getting a taskID and using this call to set the status to queued. When the search is eventually submitted, nph-mascot.exe will set the status
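The relationship between a nice value and the priority shown on the status screen is simple enough to state as code. This is a minimal sketch of the rule described above (range -20 to 20, priority = -1 × nice); it does not model the coarser priority mapping used on Windows.

```python
NICE_MIN, NICE_MAX = -20, 20

def status_priority(nice: int) -> int:
    """Priority displayed on the Mascot status screen for a given nice value."""
    if not NICE_MIN <= nice <= NICE_MAX:
        raise ValueError(f"nice value {nice} outside {NICE_MIN}..{NICE_MAX}")
    return -nice  # higher nice (lower scheduling priority) shows as a lower number
```

So a job reniced to 20 appears with priority -20, the lowest.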
122. because searches may run slower than normal. If you are trying to search an assembled genome, you might want to consider searching shorter sequences instead, such as a database of the contigs. MaxVarMods 9: the maximum number of variable mods allowed for an MIS search (global default; can be overridden for a group in security). Value is an integer in the range 0 to 32. MinPeaksForHomology 6: for an MS/MS search, a homology threshold will not be reported if the number of peaks in a spectrum is less than this value. MinPepLenInPepSummary 7: in a Peptide Summary report, two proteins are reported as distinct matches if the peptide matches to one protein are not identical to, or a subset of, the peptide matches to the other protein. Since matches to very short peptides are usually random, peptides shorter than MinPepLenInPepSummary are not considered in this comparison. MinPepLenInSearch 7: peptides shorter than MinPepLenInSearch are rejected during the search. Matches to very short peptides are meaningless, because a 2-mer or 3-mer can occur in almost every entry in a database. If such matches are allowed in the peptides section, they can cause serious bloating of the result file. MonitorEmailCheckFreq: see EmailErrorsEnabled. MonitorLogFile: see ErrorLogFile. MonitorPidFile monitor.pid: the name for the file that holds the process ID number for ms-monitor.exe. Default is monitor.pid. Monito
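The MinPepLenInPepSummary comparison can be sketched with Python sets. This assumes each protein's matches can be represented as a collection of peptide sequence strings, and it reads the rule symmetrically: two proteins are distinct only if neither filtered peptide set is identical to, or a subset of, the other. It is an illustration of the rule, not Mascot's implementation.

```python
def distinct_matches(peptides_a, peptides_b, min_len: int = 7) -> bool:
    """True if two proteins would be reported as distinct matches.

    Peptides shorter than min_len (MinPepLenInPepSummary) are ignored,
    because very short peptide matches are usually random.
    """
    a = {p for p in peptides_a if len(p) >= min_len}
    b = {p for p in peptides_b if len(p) >= min_len}
    # '<=' is the subset test; it also covers identical sets.
    return not (a <= b or b <= a)
```

For example, a protein matched only by PEPTIDEK is not distinct from one matched by PEPTIDEK and AAAAAAAK, because its peptide set is a subset of the other.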
123. block contains reference data for the proteins listed in the peptides block. Input data for query n:
    --gc0p4Jq0M2Yt08jU534c0p
    Content-Type: application/x-Mascot; name="queryn"

    title=query title
    index=query index
    seq1=sequence qualifier (e.g. N-ABCDEF)
    seq2, ..., seqn
    comp1=composition qualifier (e.g. 0[P]2[W])
    comp2, ..., compn
    PepTol=peptide tolerance qualifier (e.g. 2.000000 Da)
    IT_MODS=Mod 1,Mod 2
    INSTRUMENT=instrument identifier (e.g. ESI-TRAP)
    RULES=1,2,5,6,8,9,13,14
    INTERNALS=min mass,max mass
    CHARGE=charge state (e.g. 2+)
    RTINSECONDS=a b c d
    SCANS=a b c d
    tag1=sequence tag (e.g. t 889.4 QK S 1104.54)
    tagn
    mass_min=lowest mass
    mass_max=highest mass
    int_min=lowest intensity
    int_max=highest intensity
    num_vals=number of mass values
    num_used1=-1 (obsolete)
    ions1=1344.65:34.3,1365.41:13.2
    ions2=y 1344.65:34.3,1365.41:13.2
    ions3=b 1344.65:34.3,1365.41:13.2
    --gc0p4Jq0M2Yt08jU534c0p
queryn runs from query1 (no leading zeros). ionsn values are sorted in the order in which they were selected for scoring. Most searches will only require a few of these fields. For example, a peptide mass fingerprint would only include the charge field. The index is a 0-based record of the original query order, before sorting by Mr. ions2 and ions3 are only required when fragment ions are specified in a s
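A script that reads or generates this block mainly needs to handle the query names and the `ions1` peak list. The sketch below assumes `ions1` holds comma-separated `mass:intensity` pairs as in the example above. A missing intensity is returned as `None`, which is an assumption about how to treat such input, not documented behaviour.

```python
import re
from typing import Optional

QUERY_NAME = re.compile(r"query([1-9][0-9]*)")  # query1, query2, ... (no leading zeros)

def query_number(name: str) -> Optional[int]:
    """Return n for a section named 'queryn', or None if the name is not valid."""
    m = QUERY_NAME.fullmatch(name)
    return int(m.group(1)) if m else None

def parse_ions(value: str) -> list[tuple[float, Optional[float]]]:
    """Parse an ions1-style value such as '1344.65:34.3,1365.41:13.2'."""
    peaks = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        mass, sep, intensity = item.partition(":")
        peaks.append((float(mass), float(intensity) if sep else None))
    return peaks
```

For example, `parse_ions("1344.65:34.3,1365.41:13.2")` yields two (mass, intensity) pairs, and `query_number("query01")` is rejected because of the leading zero.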
124. c mascotnode
    # Following line must be commented out WHEN this is a MascotNodeScript
    ###ROOT###/bin/load_node.pl
    # Sub-cluster definition. Syntax is SubClusterSet X Y, where X is the
    # sub-cluster number and Y is the maximum number of processors in the sub-cluster
    SubClusterSet 0 10
    # Time-outs, log files
    IPCTimeout 5                   # seconds with no
    IPCLogging 0                   # no logging (0)
    IPCLogfile logs/ipc.log        # relative path
    CheckNodesAliveFreq 30         # seconds between node
    SecsToWaitForNodeAtStartup 20  # seconds to wait for end
Enabled: 1 to enable cluster mode, 0 to enable single server mode. MasterComputerName: enter the host name for the master computer and, optionally, the IP address, separated by a comma. The IP address may need to be specified for a multi-homed master, where it is necessary to define which network card is on the LAN and which is the gateway to the outside world. DefaultNodeOS: if no OS is defined for a particular node, then this OS is assumed. Must be one of: Windows_NT, Linux. Note that these names are case sensitive. DefaultNodeHomeDir: if no specific home directory is specified for a particular node, then this default is used. On a Linux system, this will typically be /usr/local/mascotnode. It is best not to use /usr/local/mascot, as this is the directory mostly used for the master. On a Windows system, this will typically be C:\MascotNode or D:\MascotNode
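To audit a cluster configuration from a script, the sketch below collects the SubClusterSet and MasterComputerName entries from the Cluster section lines. It assumes that `#` starts a comment and that the key and value are separated by whitespace. The host name and IP address in the usage note are hypothetical.

```python
def parse_cluster_lines(lines):
    """Return ({sub-cluster number: max processors}, (master host, master IP or None))."""
    subclusters = {}
    master = None
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        key, rest = parts[0], (parts[1].strip() if len(parts) > 1 else "")
        if key == "SubClusterSet":
            number, max_procs = rest.split()
            subclusters[int(number)] = int(max_procs)
        elif key == "MasterComputerName":
            host, _, ip = rest.partition(",")  # IP address is optional
            master = (host.strip(), ip.strip() or None)
    return subclusters, master
```

With the lines `SubClusterSet 0 10` and `MasterComputerName mymaster,192.168.0.1`, this returns `({0: 10}, ("mymaster", "192.168.0.1"))`.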
125. can be re-started (Programs > Mascot > config > Start Mascot service).
    Percolator 0
    PercolatorFeatures mScore lgDScore mrCalc charge dM dMppm absDM absDMppm isoDM isoDMppm mc varmods totInt intMatchedTot relIntMatchedTot
    PercolatorMinQueries 100
    PercolatorMinSequences 100
    PercolatorUseProteins 0
    PercolatorUseRT 0
    PercolatorExeFlags -i 10 -D 14 -v 0
Set Percolator to 1 if percolated results should be opened by default, 0 otherwise. PercolatorFeatures specifies the list of features used by Percolator; to see the list of available features, run ms-createpip.exe with the help option. Percolator will only be run if the number of queries in the search is at least PercolatorMinQueries and the number of entries in the sequence database is at least PercolatorMinSequences. Percolator will use the assignment of proteins to peptides as a feature if PercolatorUseProteins is set to 1. This can have undesirable results and should be used with great care; this flag is not supported in the current release. Percolator will use the retention times of peptides as a feature if PercolatorUseRT is set to 1. PercolatorExeFlags is used to specify the Percolator command line arguments, with the exception of the file path arguments (-j, -B, -r). If the string includes the argument -D num, this will be removed unless PercolatorUseRT is set to 1. PrecursorCutOut 1 1: the precursor peak can often have very high intensity relative to
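The two run-time rules above can be written out directly: the size thresholds that decide whether Percolator runs at all, and the removal of `-D num` from PercolatorExeFlags when PercolatorUseRT is 0. This is a sketch of the documented behaviour, not Mascot's code. It assumes the flag is spelled exactly `-D` and is followed by a single value.

```python
import shlex

def percolator_will_run(num_queries, num_sequences, min_queries=100, min_sequences=100):
    """True if the search meets PercolatorMinQueries and PercolatorMinSequences."""
    return num_queries >= min_queries and num_sequences >= min_sequences

def effective_percolator_flags(exe_flags: str, use_rt: bool) -> list[str]:
    """Drop '-D <num>' from PercolatorExeFlags unless PercolatorUseRT is 1."""
    tokens = shlex.split(exe_flags)
    if use_rt:
        return tokens
    result, skip_next = [], False
    for tok in tokens:
        if skip_next:
            skip_next = False  # this token was the value after -D
        elif tok == "-D":
            skip_next = True
        else:
            result.append(tok)
    return result
```

With the default flags and PercolatorUseRT 0, `effective_percolator_flags("-i 10 -D 14 -v 0", False)` leaves `["-i", "10", "-v", "0"]`.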
126. cation, the name of the results file can be parsed directly from this string. The output to STDOUT from a successful search will closely resemble the following:
    (null) 200 OK
    Server: (null)
    Content-type: text/html
    Pragma: no-cache

    <HTML>
    <HEAD>
    <TITLE>Mascot searching</TITLE>
    <META HTTP-EQUIV="Expires" CONTENT="0">
    <META HTTP-EQUIV="Pragma" CONTENT="no-cache">
    </HEAD>
    <BODY BGCOLOR="#FFFFFF">
    <!-- comment here -->   (this line is repeated ten times)
    <H1><IMG SRC="images/88x31_logo_white.gif" WIDTH="88" HEIGHT="31" ALIGN="TOP" BORDER="0" NATURALSIZEFLAG="3"> Mascot Search</H1>
    Licensed to: Matrix Science In-house test system<BR>
    Not a real form
    Finished uploading search details<BR>
    <B>IMPORTANT</B>: If you get disconnected or choose not to wait for your search results, DO NOT RESUBMIT THE SEARCH. Your results will be sent by email when the search is complete.<BR>
    Searching<BR>
    10% complete<BR>
    20% complete<BR>
    30% complete<BR>
    50% complete<
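A script that runs a search from the command line can follow progress by scanning this output for the "NN% complete" lines. The sketch below assumes the progress lines look exactly like those in the sample above. It ignores all other content, including the results file name.

```python
import re

PROGRESS = re.compile(r"(\d+)% complete")

def progress_values(output: str) -> list[int]:
    """Extract the 'NN% complete' progress markers from nph-mascot.exe output."""
    return [int(v) for v in PROGRESS.findall(output)]
```

Applied to the sample output, this yields 10, 20, 30 and 50 in order.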
127. cations that expect to find these files. Files in config/dbmanager are configuration files used by Database Manager; for descriptions, see the Database Manager HTML help page. A browser-based Configuration Editor is provided to view and edit these files. These files are all text files, so they can also be edited in any text editor. If you choose to edit the files, exercise care and always make a backup first, because seemingly small errors can render Mascot unusable. Configuration Editor: the local Mascot home page contains a link to the Configuration Editor. The top-level page is a menu. If Mascot security is enabled, there will be an additional menu item for Mascot security administration. [Screenshot: Mascot Configuration menu, with the items Elements (element masses), Amino Acids (amino acid data), Modifications (modification definitions), Symbols (symbols used in chemical formulae), Enzymes (enzyme definitions), Instruments (fragmentation rules), Quantitation (quantitation methods), Configuration Options (global options in mascot.dat) and Database Manager (sequence databases, parse rules and automated downloads).] The first four menu items (Elements, Amino acids, Modifications and Symbols) are interfaces to different aspects of unimod.xml. It should only be necessary to make changes to unimod.xml in exceptional circumstances. An example might be if you wante
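If you do edit a configuration file in a text editor, a timestamped copy taken first makes it easy to roll back. The sketch below is one way to do this. The `.bak` naming scheme is an arbitrary choice and not a Mascot convention.

```python
import shutil
import time
from pathlib import Path

def backup_config(path: str) -> Path:
    """Copy a configuration file to '<name>.<timestamp>.bak' beside it before editing."""
    src = Path(path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"{src.name}.{stamp}.bak")
    shutil.copy2(src, dest)  # copy2 also preserves the file's timestamps
    return dest
```

For example, `backup_config("config/mascot.dat")` creates a file such as `config/mascot.dat.20120311-101500.bak`.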
128. ccessions/taxIDs submitted at once must not exceed 100000, and the total size of the request should be no more than 10 MB. All request parameter names are case-insensitive. Any parameter value can be in quotes. DB: mandatory parameter, which can only appear once. If several databases are searched, then ms-getseq must be called separately for each database. ACCESSION: can appear any number of times. Quotes are mandatory. Can contain a list of accessions delimited by commas, spaces, tabs or new line characters. All ACCESSION fields are merged into one list of accession strings internally. TAXID: can appear any number of times, and contains a list of taxonomy IDs delimited by commas, spaces or new line characters. All such fields are merged into one list internally. SHOWTITLE: can appear only once; if set to TRUE, a description is output for each database entry. SHOWSYNONYMS: can appear only once; if set to TRUE, a list of common names is output for each taxonomy. SHOWTAXTREE: can appear only once; if set to TRUE, the taxonomy tree is output for each taxonomy. SESSIONID: an optional parameter, which can appear at most once. If no session ID is supplied, then ms-gettaxonomy can either process the request when security is disabled, or try to retrieve the ID from cookies. Boolean values can be coded in different ways: true, TRUE, True, on, any number except 0, any string except
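A client building an ms-gettaxonomy request can pre-check the list handling and limits described above. The sketch below merges repeated ACCESSION (or TAXID) fields by splitting on commas and whitespace. Two points are assumptions, since the text does not settle them: that the 100000 limit applies to accessions and taxIDs combined, and that 10 MB means 10 × 1024 × 1024 bytes.

```python
import re

MAX_IDS = 100_000                      # limit stated above (assumed combined count)
MAX_REQUEST_BYTES = 10 * 1024 * 1024   # assumes "10 MB" means 10 MiB

def merge_id_fields(fields: list[str]) -> list[str]:
    """Merge repeated ACCESSION or TAXID fields into one list."""
    merged = []
    for field in fields:
        value = field.strip().strip('"')  # values may be quoted
        merged.extend(tok for tok in re.split(r"[,\s]+", value) if tok)
    return merged

def request_within_limits(accessions, taxids, request_bytes: int) -> bool:
    return len(accessions) + len(taxids) <= MAX_IDS and request_bytes <= MAX_REQUEST_BYTES
```

For example, the two fields `"P12345, Q9XYZ1"` and `"A0A000"` merge into a single list of three accessions.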
129. ce. 5. Optionally, if Mascot security is enabled: sessionID, followed by a space and then the security session identifier. If the keyword seq is supplied, the output from GetSeq has the following format:
    Content-type: text/plain

    *MMSARGDFLNYALSLMRSHNDEHSDVLPRLY...PLYSSKQILKQKLLLAIKTKNFGFV
    >100K_RAT 100 KD PROTEIN (EC 6.3.2.-) - RATTUS NORVEGICUS (RAT)
The keyword all is only applicable if a local full-text database is available and configured in mascot.dat, in which case the returned text has a similar format: the same content type line, blank line, sequence line and title line, followed by the raw full-text entry (in the example, lines such as DOMAIN 827 847 PRO-RICH, BINDING 858 858 UBIQUITIN (BY SIMILARITY) and Keywords: UBIQUITIN CONJUGATION; LIGASE). In all cases, the first line is a content type specifier, followed by a blank line. For seq and all, there is then an asterisk followed by the unformatted sequence in one-letter code. The next line is identical to the FASTA title line, beginning with a right angle bracket. In the case of a full-text report, this is followed by the raw text entry as retrieved from the sequence database full-text file. If the keyword len is supplied, then the length of the sequence is returned as ASCII text. If the database is a
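The layout just described (content type, blank line, `*` plus sequence, FASTA title, optional full text) can be split apart mechanically. This sketch assumes the output has already been captured as a string. It raises an error when the `*` sequence line is missing, which would be the case for the len keyword.

```python
def parse_getseq_output(text: str) -> dict:
    """Split ms-getseq output for the seq or all keyword into its parts."""
    lines = text.splitlines()
    if len(lines) < 3 or lines[1].strip() or not lines[2].startswith("*"):
        raise ValueError("unexpected ms-getseq output")
    title = lines[3] if len(lines) > 3 else ""
    return {
        "content_type": lines[0],
        "sequence": lines[2][1:],  # drop the leading asterisk
        "title": title[1:] if title.startswith(">") else title,
        "full_text": "\n".join(lines[4:]),  # empty for the seq keyword
    }
```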
130. ception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and cond
131. cked. If you plan to use Remote Desktop from the master, you might want to check this at the same time. [Screenshot: Windows Firewall, Exceptions tab, listing programs and services such as File and Printer Sharing, Remote Assistance, Remote Desktop and UPnP Framework.] Choose Add Port and enter the following data (don't press OK yet): Name: MascotNodePort5001; Port number: 5001; Protocol: TCP. Choose Change scope and select the second option, My network (subnet) only. [Screenshot: Change Scope dialog, with the options Any computer (including those on the Internet), My network (subnet) only, and Custom list.]
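After opening the port, you can confirm from the master that a node accepts TCP connections on port 5001. This is a minimal sketch using a plain TCP connect. The node host name in the usage note is hypothetical, and the check does not test the echo services that the master uses to detect live nodes.

```python
import socket

def node_port_open(host: str, port: int = 5001, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out or unknown host
        return False
```

For example, run `node_port_open("mascotnode2")` on the master. A result of False suggests that the firewall exception or the scope setting needs checking.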
132. conds (1 day). To change the timeout value at the cgi node: cscript adsutil.vbs set w3svc/1/root/mascot/cgi/cgitimeout x. To set a new timeout value at the mascot node, or change the existing value: cscript adsutil.vbs set w3svc/1/root/mascot/cgitimeout x, where x is the value in seconds that you want to set. You will then need to re-start IIS from the IIS Management console. IIS 7.x (Vista, Server 2008, Windows 7): in the Windows Start menu, go to Control Panel > Administrative Tools > Internet Information Services (IIS) Manager. On the connections tree, expand Sites and Default Web Site, and select mascot. In the central pane, double-click the CGI properties icon. The CGI time-out will be displayed and can be edited. If you make changes, choose Apply in the Action pane. [Screenshot: IIS Manager, CGI properties for the mascot application, showing Time-out (hh:mm:ss) 1.00:00:00, Use New Console For Each Invocation False, and Impersonate User True.] Impersonate User: specifies whether a CGI process is created in the syst
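The adsutil.vbs commands take the time-out in seconds, while IIS 7 Manager displays it as [days.]hh:mm:ss (one day appears as 1.00:00:00 in the screenshot). The helper below converts between the two so that you can check a value before you enter it. The display format is inferred from that screenshot.

```python
def iis_timeout_string(seconds: int) -> str:
    """Format a CGI time-out in seconds as IIS Manager shows it ([d.]hh:mm:ss)."""
    if seconds < 0:
        raise ValueError("time-out must be non-negative")
    days, rem = divmod(seconds, 86_400)
    hours, rem = divmod(rem, 3_600)
    minutes, secs = divmod(rem, 60)
    hms = f"{hours:02d}:{minutes:02d}:{secs:02d}"
    return f"{days}.{hms}" if days else hms
```

So `x = 86400` in the adsutil.vbs command corresponds to 1.00:00:00 in IIS Manager.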
133. cot is running on a Windows system. 3. AA/NA: AA for an amino acid (protein) database and NA for a nucleic acid (DNA) database. 4. Obsolete: this parameter used to contain the approximate number of entries (sequences) in the database, which was used for progress reports during a search. The value is now just a placeholder. 5. Obsolete: this parameter used to contain a unique identification number. The value is now just a placeholder. 6. Mem map: flag to indicate whether the database file should be memory mapped (1) or not (0). Database files should always be memory mapped. Unlike memory locking, this does not consume physical RAM. 7. Obsolete: this parameter (Blocks) must always be set to 1. 8. Threads: a Mascot search can use multiple threads. If you are running in cluster mode, Threads is ignored. Otherwise, set it to -1 to allow the number of threads to be determined automatically. To specify a fixed number of threads in non-cluster mode, set a value of 1 or more. 9. Mem lock: flag to indicate whether a memory mapped database file should be locked in memory (1) or not (0). This setting is only relevant if column 6 contains a 1. Memory mapped files can be locked in memory, but only if the computer has sufficient RAM. A database locked in memory can never be swapped out to disk, so there is never a lag while its files are read back from disk. Of course, there also needs to be sufficient RAM for t
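The column rules above can be checked mechanically when maintaining database definitions by hand. This sketch parses the first nine columns of one definition line. Several details are assumptions: that columns 1 and 2 are the database name and path, that columns are tab-separated, and the example line itself. Columns 4 and 5 are skipped as obsolete placeholders.

```python
from dataclasses import dataclass

@dataclass
class DatabaseDef:
    name: str
    path: str
    kind: str        # "AA" or "NA"
    mem_map: bool
    threads: int     # -1 = automatic; ignored in cluster mode
    mem_lock: bool

def parse_database_line(line: str) -> DatabaseDef:
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 9:
        raise ValueError("expected at least 9 columns")
    if cols[2] not in ("AA", "NA"):
        raise ValueError(f"column 3 must be AA or NA, got {cols[2]!r}")
    if cols[6] != "1":
        raise ValueError("column 7 (Blocks) must be 1")
    mem_map = cols[5] == "1"
    mem_lock = cols[8] == "1" and mem_map  # mem lock only applies to mapped files
    return DatabaseDef(cols[0], cols[1], cols[2], mem_map, int(cols[7]), mem_lock)

# Hypothetical definition line.
example = "SwissProt\t../sequence/SwissProt/current/SwissProt_*.fasta\tAA\t1234\t1\t1\t1\t-1\t0"
```

Parsing `example` gives a memory-mapped but unlocked protein database with automatic threading.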
134. [Screenshot: Mascot Security, edit group Guests (logged in as Administrator; unique ID 1). Tasks that cannot be performed by group members: SEARCH Allow MS/MS and SQ searches; SEARCH Allow MS/MS no-enzyme searches; SEARCH Allow no-enzyme PMF searches; SEARCH Maximum number of concurrent searches per user; SEARCH Maximum Mascot search job priority. Tasks that can be performed by group members: SEARCH Allow PMF searches; SEARCH Allow all FASTA databases to be searched; GENERAL View config files using ms-status. Help text: Change details for an existing group. Change the users that belong to the group and the tasks that members of the group can perform. No changes are saved until the Save changes button is pressed.] Each group has a unique ID that cannot be changed. Users can be added to or removed from groups, either on this screen or from the edit/add user screens. Mascot security is fine-grained: there is a list of about 20 tasks that members of a group can or cannot perform. The tasks that are not permitted are in the top list. To allow group members to perform one of these tasks, click on the task in the list and then on Add task. This ta
135. [Screenshot: Local Security Settings, Security Options] crypted password to third-party SMB serv: Disabled; Microsoft network server: Amount of idle time required before suspending session: 15 minutes; Microsoft network server: Digitally sign communications (always): Disabled; Microsoft network server: Digitally sign communications (if client agrees): Disabled; Microsoft network server: Disconnect clients when logon hours expire: Enabled; Network access: Allow anonymous SID/Name translation: Disabled; Network access: Do not allow anonymous enumeration of SAM accounts: Enabled; Network access: Do not allow anonymous enumeration of SAM accounts and shares: Disabled; Network access: Do not allow storage of credentials or .NET Passports for network authentication: Disabled; Network access: Let Everyone permissions apply to anonymous users: Disabled; Network access: Named Pipes that can be accessed anonymously: COMNAP,COMNOD; Network access: Remotely accessible registry paths: System\CurrentCon; Network access: Shares that can be accessed anonymously: COMCFG,DFS; Network security: Do not store LAN Manager hash value on next password change: Disabled; Network security: Force logoff when logon hours expire: Disabled; Network security: LAN Manager authentication level: Send LM & NTLM responses; Network security: LDAP client signing requirements: Negotiate signing; Network security: Minimum session security for NTLM SSP based (including secure RPC) clients: No minimum; Network security: Minimum sessio
136. d. It is advisable to ensure that the latest service pack has been installed. Check the following URL for current information: http://www.microsoft.com/windowsxp/downloads/default.mspx. The Microsoft web server for 32-bit editions of Windows XP is IIS 5.1, which is provided as part of the standard distribution. If IIS is not installed, choose Add or Remove Programs in the Control Panel, select Add/Remove Windows Components, and check the box for Internet Information Services. XP Professional x64 Edition uses IIS 6.0, the same as Server 2003. Server 2003: Mascot will run under all editions of Windows Server 2003 except those for the Itanium processor. It is advisable to ensure that the latest service pack has been installed. Check the following URL for current information: http://technet.microsoft.com/en-us/windowsserver/bb405947. The Microsoft web server for Server 2003 is IIS 6.0. You may need to install IIS by configuring the server as an Application Server. When you start your server, you should see a Manage Your Server wizard. If not, go to Administrative Tools and click on Manage Your Server. When the wizard opens, click on Add or Remove a Role, then select Application Server. [Screenshot: Configure Your Server Wizard, Server Role page: You can set up this server to perform one or more specific roles. If you want to add more than one role to this server, you c]
137. d to add a modification that was confidential or experimental. Otherwise, it is better to add a new modification to the public Unimod database (www.unimod.org) and later download an updated configuration file from the Unimod help page. By going this route, you share the new modification with others and benefit in turn from other people's updates. Most of the pages of the Configuration Editor are self-explanatory. Where necessary, help text is displayed when the mouse rolls over a hyperlink. [Screenshot: Configuration Editor, Modifications page, listing each modification's title, monoisotopic mass, average mass and composition, with Copy, Delete and Print links; e.g. PET: 121.035005, 121.2028, H(7) C(7) N O(-1) S; Phe->Tyr: 15.994915, 15.9994, O; Phenylisocyanate: 119.037114, 119.1207, H(5) C(7) N O.]
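The monoisotopic values on the Modifications page follow directly from the composition column. The sketch below recomputes them from a composition string. It assumes standard monoisotopic element masses (close to the values Unimod uses) and handles only H, C, N, O and S.

```python
import re

# Monoisotopic element masses; only the elements needed for these examples.
MONO_MASS = {"H": 1.007825035, "C": 12.0, "N": 14.003074, "O": 15.99491463, "S": 31.9720707}

def mono_delta(composition: str) -> float:
    """Monoisotopic mass delta for a composition such as 'H(5) C(7) N O'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(?:\((-?\d+)\))?", composition):
        total += MONO_MASS[element] * (int(count) if count else 1)  # no count means 1
    return total
```

For example, `round(mono_delta("H(5) C(7) N O"), 6)` gives 119.037114, the Phenylisocyanate value shown in the screenshot.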
138. dat, and to enable someone without prior knowledge of regular expressions to write simple rules for new databases. Only the most basic aspects of BRE notation are touched on. In mascot.dat, the PARSE section contains a number of rules. For each rule, the pattern in double quotes is a BRE, which is used to identify a string so that it can be parsed from the surrounding text. For example: # Report text from NCBI (excluding sequence), used for AA entries: RULE_10 "\(LOCUS .*\)ORIGIN ". The part of the BRE between the backslashed parentheses, \( and \), is the string which we are trying to locate and extract. This rule looks for the word LOCUS followed by a space. It will extract all the text, including the word LOCUS, up to but excluding the word ORIGIN followed by a space. BRE Rules: the rules for performing this match are as follows. The BRE always looks for the longest leftmost matching string. Matching is case sensitive. Newline characters (LF in Unix or CR LF in Windows) are treated like any other character. The sub-expression to be extracted from the surrounding text is defined using backslashed parentheses; the parentheses are ignored for matching purposes. Some characters are special: the period, left bracket and backslash are special except when used in a bracket expression. The asterisk is special except when used in a bracket expression, as the first character of an en
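To test a rule like RULE_10 outside Mascot, you can translate it into a Python regular expression. In the translation, BRE `\(` and `\)` become plain parentheses, and `re.DOTALL` lets `.` match newlines, as the BRE rules above require. Python's `re` is not a POSIX engine. For this simple pattern, however, the greedy `.*` gives the same longest-leftmost match. The sample GenBank-style entry is made up.

```python
import re

# Python equivalent of the BRE  \(LOCUS .*\)ORIGIN
RULE_10 = re.compile(r"(LOCUS .*)ORIGIN ", re.DOTALL)

def extract_report_text(entry: str):
    """Return the text from 'LOCUS ' up to (not including) 'ORIGIN ', or None."""
    m = RULE_10.search(entry)
    return m.group(1) if m else None

sample = "LOCUS       XYZ_1  100 aa\nDEFINITION  test protein.\nORIGIN      \n        1 mkv\n//\n"
```

`extract_report_text(sample)` returns the LOCUS and DEFINITION lines. Because matching is case sensitive, an entry that starts with "locus" would not match.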
139. de>
    <msgt:node level="9">Saccharomycetales</msgt:node>
    <msgt:node level="8">Saccharomycetes</msgt:node>
    <msgt:node level="7">Saccharomycotina</msgt:node>
    <msgt:node level="6">Ascomycota</msgt:node>
    <msgt:node level="5">Dikarya</msgt:node>
    <msgt:node level="4">Fungi</msgt:node>
    <msgt:node level="3">Fungi/Metazoa group</msgt:node>
    <msgt:node level="2">Eukaryota</msgt:node>
    <msgt:node level="1">cellular organisms</msgt:node>
    </msgt:tree>
    </msgt:taxonomy>
    </msgt:accession>
    </msgt:all_accessions>
    </msgt:db_entry>
    <msgt:tax_from_id>
    <msgt:taxonomy>
    <msgt:db>SwissProt</msgt:db>
    <msgt:taxonomy_id>1061</msgt:taxonomy_id>
    <msgt:scientific_name>Rhodobacter capsulatus</msgt:scientific_name>
    <msgt:translation_table_id>11</msgt:translation_table_id>
    </msgt:taxonomy>
    </msgt:tax_from_id>
    </msgt:results>
    </msgt:ms_gettaxonomy_out>
The way information is represented in the XML output will be clearer if a few rules are kept in mind: the msgt:title element will only appear in the output if showTitle is true; the msgt:common_names element will only appear in the output if showSynonyms is true; the msgt:tree element will only appear in the output if showTaxTree is true; order of e
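The level attribute on each msgt:node gives its depth in the taxonomy tree, so you can rebuild a root-first lineage by sorting on it. The sketch below uses ElementTree. The namespace URI is a hypothetical placeholder: use the one declared in the real ms-gettaxonomy output.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; substitute the one declared in the actual output.
NS = {"msgt": "urn:example:msgt"}

def lineage(tree_xml: str) -> list[str]:
    """Return node names from an msgt:tree element, ordered root-first by level."""
    root = ET.fromstring(tree_xml)
    nodes = root.findall(".//msgt:node", NS)
    return [n.text for n in sorted(nodes, key=lambda n: int(n.get("level")))]

sample = """<msgt:tree xmlns:msgt="urn:example:msgt">
  <msgt:node level="3">Fungi/Metazoa group</msgt:node>
  <msgt:node level="2">Eukaryota</msgt:node>
  <msgt:node level="1">cellular organisms</msgt:node>
</msgt:tree>"""
```

`" > ".join(lineage(sample))` gives "cellular organisms > Eukaryota > Fungi/Metazoa group".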
140. display fragment ion masses in reports can be altered by changing this value. IteratePMFIntensities 1: set this option to 0 to prevent selection of PMF values on the basis of their intensity. LabelAll 0: set this option to 1 to make the initial display in Peptide View one in which all peaks that match a calculated mass value are labelled. LastQueryAscFile logs/lastquery.asc; SaveEveryLastQueryAsc 1; SaveLastQueryAsc 0: SaveLastQueryAsc is a flag which controls whether the most recent input file to Mascot (i.e. the MIME-format file containing MS data and search parameters) should be saved to disk (1) or not (0). This can be a useful debugging tool when writing scripts or forms to submit searches to Mascot. If SaveLastQueryAsc is set to 1, the name of the file is determined by LastQueryAscFile. Each new search overwrites this file. NB: LastQueryAscFile is a disk path, not a URL. An additional debugging tool is provided by SaveEveryLastQueryAsc. If set to 1, the Mascot input file will be saved for any search that fails to complete because it generates a fatal error. The name of this file follows the same naming convention as a normal Mascot result file, except for the additional suffix .inp. If a search goes to completion, this file is deleted as soon as the normal output file has been written to disk. LogoImageFile images/88x31_logo_white.gif: this is the URL of the Matrix Science logo
141. e. Default is 1. TargetFDRPercentages 0.1 0.2 0.5 1 2 5: choices available for the FDR drop-down list in the Protein Family Summary report of an auto-decoy search. Each item in the list is a percentage. One item carries a marker symbol, which specifies the default setting of the control (1 in this case). TaxBrowserURL (no default): the URL used in reports to retrieve taxonomy information for a Protein View report. By default, this points to the NCBI. If you don't want to send such queries out to the Internet, the URL can be replaced by a call to the ms-gettaxonomy.exe utility: TaxBrowserUrl x-cgi/ms-gettaxonomy.exe 4 DATABASE ACCESSION. TestDirectory: see ErrorLogFile. UniqueJobStartNumber: see ErrorLogFile. UnixDirPerm 777: specify the Linux permissions for the daily result file directories. For example, 775 makes each directory world-readable but not world-writable. This option provides more fine-grained control than UnixWebUserGroup. UnixWebUserGroup: this entry, if present, will restrict access to the files created by ms-monitor.exe and hence improve system security. The UnixWebUserGroup is the number of the group used by the web server to run CGI programs. With Apache, the group name will generally be nobody, and you will need to ascertain the group number from the group file. For other web servers, check the documentation that comes with the server to find out which user name is used for running CGI pro
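To see what a given UnixDirPerm value means before setting it, you can render the octal value as an `ls`-style permission string. The sketch below uses Python's standard `stat.filemode`. It treats the value as a three-digit octal permission on a directory.

```python
import stat

def describe_dir_perm(octal_text: str) -> str:
    """Render a UnixDirPerm value such as '775' as an ls-style permission string."""
    mode = int(octal_text, 8)
    if not 0 <= mode <= 0o777:
        raise ValueError("expected a three-digit octal permission")
    return stat.filemode(stat.S_IFDIR | mode)
```

`describe_dir_perm("775")` gives `drwxrwxr-x` (others may read and enter but not write), while the default 777 gives `drwxrwxrwx`.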
142. e: db_name is the database family name from mascot.dat (e.g. MSDB); fasta is the fully qualified path to the FASTA file. ms-compress.exe compresses the FASTA file using the rules specified in mascot.dat, and must be run so that its current working directory is mascot/bin. Return value: 0 for success, >0 for failure. MakeSearchLog: the bin/ms-makesearchlog.exe utility is used to rebuild the search log by scanning all the result files located in subdirectories of mascot/data. This can be useful if the original search log has been damaged, or if result files have been pruned after archiving. There are no arguments. LockMem: on 32-bit platforms, the 2 GB address space limit can quickly be reached by having several large databases locked into memory. To work around this limit, the bin/ms-lockmem.exe utility is provided. LockMem is enabled by adding the parameter SeparateLockMem 1 to the Options section of mascot.dat. A value greater than 1 specifies the block size in MB. For example, if there is a 1.5 GB .s00 file and this parameter is set to 750, two instances of ms-lockmem.exe will be run. GetError: the utility cgi/ms-geterror.exe takes an error number as an argument and returns the corresponding text string. For example, C:\Inetpub\MASCOT\cgi> ms-geterror.exe 23 returns: You specified enzyme(s) which is not available. Choose another. Chapter 8: I/O File Formats. MIME Ver
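The block-size example can be checked with simple arithmetic. The sketch below assumes two things that this section does not state: the number of ms-lockmem.exe instances is the file size divided by the block size, rounded up, and 1.5 GB is taken as 1500 MB to match the example.

```python
import math

def lockmem_instances(file_mb: float, separate_lock_mem: int) -> int:
    """Estimate how many ms-lockmem.exe instances lock one compressed database file.

    Assumption: a value of 1 means one instance for the whole file; larger
    values are a block size in MB, and the count is rounded up.
    """
    if separate_lock_mem < 1:
        raise ValueError("SeparateLockMem must be 1 or more")
    if separate_lock_mem == 1:
        return 1
    return math.ceil(file_mb / separate_lock_mem)
```

`lockmem_instances(1500, 750)` gives 2, matching the example above.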
143. e library. The former contains code derived from the library, whereas the latter must be combined with the library in order to run. 0. This License Agreement applies to any software library or other program which contains a notice placed by the copyright holder or other authorized party saying it may be distributed under the terms of this Lesser General Public License (also called "this License"). Each licensee is addressed as "you". A "library" means a collection of software functions and/or data prepared so as to be conveniently linked with application programs (which use some of those functions and data) to form executables. The "Library", below, refers to any such software library or work which has been distributed under these terms. A "work based on the Library" means either the Library or any derivative work under copyright law: that is to say, a work containing the Library or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language. (Hereinafter, translation is included without limitation in the term "modification".) "Source code" for a work means the preferred form of the work for making modifications to it. For a library, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the library. Activities other than copying, d
144. e Perl or web server software. You cannot mix x86 and x64 nodes in a Mascot cluster: all must be 32-bit or all must be 64-bit. The master detects that search nodes are responding by issuing an echo command: to TCP on port 7 under Linux, and ICMP echo under Windows. Hence, these services must not be disabled or blocked by firewalls. Overview of Implementation: each search is distributed to all the cluster nodes, but each node searches just an allocated portion of the sequence database. Search results are returned to the master, which merges them, writes the result file to disk and, optionally, generates HTML reports to be returned to a client web browser. All master-node communication is via TCP/IP. Configuration and program files are distributed and updated automatically from the master node. Mascot administration tools provide web-browser-based system status reports. These are continuously updated and show, at a glance, important parameters such as processor usage and free disk space for each of the nodes. As an option, critical alerts can also be sent to the system administrator by email. In cluster mode, Mascot is intended to run as a dedicated system; trying to run other applications on the cluster simultaneously may have unpredictable effects on search speed. Installation of Mascot: it is only necessary to install or upgrade Mascot on the master system. In fact, no files are copied to the Mascot nodes during installation. The
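The split-and-merge pattern in the overview can be illustrated with a small sketch. It divides database entries into contiguous slices, one per node, and then merges per-node hit lists on the master by keeping the best-scoring hits. This is an illustration of the general pattern only. It is not Mascot's actual allocation or merging algorithm, and the (score, accession) hit format is hypothetical.

```python
import heapq
from itertools import chain

def partition(num_entries: int, num_nodes: int) -> list[tuple[int, int]]:
    """Split entry indices 0..num_entries-1 into [start, end) slices, one per node."""
    base, extra = divmod(num_entries, num_nodes)
    slices, start = [], 0
    for i in range(num_nodes):
        end = start + base + (1 if i < extra else 0)  # spread the remainder
        slices.append((start, end))
        start = end
    return slices

def merge_hits(per_node_hits, top_n: int):
    """Master-side merge: keep the top_n (score, accession) hits across all nodes."""
    return heapq.nlargest(top_n, chain.from_iterable(per_node_hits))
```

For example, `partition(10, 3)` gives `[(0, 4), (4, 7), (7, 10)]`.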
145. ePerl-5.14/bin/perl, /usr/local/bin/perl
Mascot Directory Structure
There are two directory structures to consider. One consists of the real paths to files on disk. The other consists of the virtual directories that define the web server URLs. The virtual directories are mapped to real directories. For example, the server URL http://your_domain/mascot/home.html might be mapped to the disk file /usr/local/mascot/html/home.html. Any virtual directory that contains CGI executable programs (e.g. nph-mascot.exe) or scripts (e.g. master_results.pl) must have script execution enabled. Under normal circumstances, if a directory is mapped to a URL, all of its subdirectories are also accessible as subdirectories of the URL.
Figure 2-1 shows the recommended directory structure for Mascot. The root of this structure can be any convenient path. Some of the directory paths can be changed by using a symbolic link or by modifying the configuration file mascot.dat. For example, it may be desirable to have the sequence or data directories on a separate drive from the rest of the files. Take care with any change that affects a URL-mapped directory or file, because this may require one or more HTML files to be edited to modify links.
In most cases, the contents of the directories can be deduced from their names: bin contains non-CGI executables, cgi contains CGI executables, and cluster contains a sub-directory for platfor
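The following minimal Python sketch makes the virtual-to-real mapping concrete. It resolves a URL path against a table of virtual directories, including subdirectories of a mapped URL. The table and the helper name url_path_to_disk are hypothetical, and real web servers apply their own matching rules. This version picks the longest matching prefix and rejects paths that climb out of the mapped directory with "..".

```python
import posixpath
from typing import Dict, Optional

# Hypothetical virtual-directory table: URL prefix -> real directory on disk.
VIRTUAL_DIRS: Dict[str, str] = {
    "/mascot/": "/usr/local/mascot/html/",
    "/mascot/cgi/": "/usr/local/mascot/cgi/",
    "/mascot/x-cgi/": "/usr/local/mascot/x-cgi/",
}


def url_path_to_disk(url_path: str, table: Dict[str, str] = VIRTUAL_DIRS) -> Optional[str]:
    """Map a URL path to a disk path, or return None if no virtual directory applies."""
    matches = [prefix for prefix in table if url_path.startswith(prefix)]
    if not matches:
        return None
    # The longest prefix is the most specific mapping (/mascot/cgi/ beats /mascot/).
    prefix = max(matches, key=len)
    root = table[prefix]
    disk_path = posixpath.normpath(root + url_path[len(prefix):])
    # Subdirectories are reachable, but ".." must not escape the mapped root.
    if not (disk_path + "/").startswith(root):
        return None
    return disk_path
```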
146. eSep is set to 1, the ASCII character code for CTRL-A. There are two SpeciesFiles, gi_taxid_prot.dmp and names.dmp, and two NodesFiles, nodes.dmp and merged.dmp. All four files must be present and up to date.
The method of finding the species is particularly simple for NCBI databases. The default rule is applied to the FASTA title line to extract the gi number from the accession string. The species file gi_taxid_prot.dmp is a look-up table that returns an NCBI taxonomy ID number. For example, the FASTA title line
>gi|2147497|pir||S73153 hypothetical protein 10 - red alga (Porphyra purpurea) chloroplast
returns the gi number 2147497. Looking up this number in gi_taxid_prot.dmp returns a taxonomy ID of 2787. Looking up this taxonomy ID in names.dmp returns the string
2787 | Porphyra purpurea | | scientific name |
SwissProt
The rules for SwissProt are fairly simple. Block 3 should always be used for SwissProt and Trembl, even if you have a local full text .dat file.
# TAXONOMY FOR SwissProt or Trembl from the fasta file
Taxonomy_3
Identifier          SwissProt FASTA
Enabled             1    # 0 to disable it
FromRefFile         0
DescriptionLineSep  0    # ctrl a (hex code 1) For multiple descriptions per entry
SpeciesFiles        NCBI:names.dmp SWISSPROT:speclist.txt
NodesFiles          NCBI:nodes.dmp NCBI:merged.dmp
DefaultRule         SWISSPROT CHOP > _    # Anything after _ before space
SWISSPRO
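The following Python sketch walks through the three-step NCBI lookup described above: title line to gi number, gi number to taxonomy ID, and taxonomy ID to scientific name. It is illustrative only. It assumes gi_taxid_prot.dmp lines are "gi, whitespace, taxid" and that names.dmp uses the usual "\t|\t" field separators. The function names are hypothetical, and the real files are far too large to load casually into dictionaries like this.

```python
import re
from typing import Dict, Iterable, Optional

# Python equivalent of a rule that captures the digits after ">gi|".
GI_NUMBER = re.compile(r">gi\|(\d+)")


def load_gi_taxid(lines: Iterable[str]) -> Dict[int, int]:
    """Parse gi_taxid_prot.dmp-style lines: '<gi><whitespace><taxid>'."""
    table: Dict[int, int] = {}
    for line in lines:
        if line.strip():
            gi, taxid = line.split()
            table[int(gi)] = int(taxid)
    return table


def load_scientific_names(lines: Iterable[str]) -> Dict[int, str]:
    """Parse names.dmp-style lines and keep only the 'scientific name' rows."""
    names: Dict[int, str] = {}
    for line in lines:
        fields = [f.strip() for f in line.rstrip("\n").rstrip("|").split("|")]
        # fields: tax_id, name_txt, unique name, name class
        if len(fields) >= 4 and fields[3] == "scientific name":
            names[int(fields[0])] = fields[1]
    return names


def species_for_title(title: str, gi_taxid: Dict[int, int],
                      names: Dict[int, str]) -> Optional[str]:
    """Title line -> gi number -> taxonomy ID -> scientific name."""
    match = GI_NUMBER.match(title)
    if match is None:
        return None
    taxid = gi_taxid.get(int(match.group(1)))
    return names.get(taxid) if taxid is not None else None
```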
147. ected from a drop-down list in the report. Each column arrangement consists of a Name and a columns list. Name is the column arrangement name (e.g. Standard), and columns is a comma-separated list of column names as used by Report Builder. The following is the standard list of column names available in every report family:
member, db, acc, score, mass, matches, matches sig, sequences, sequences sig, empai, frame, desc
Frame will not be shown in the report if the search is against a protein database. Quantitation methods add further column names, which are generated from the quantitation ratio names. The easiest way to create a column arrangement is to arrange the columns in Report Builder, then export the arrangement as a string.
ReportNumberChoices 5,10,20,30,40,50
If present, this list defines the choices offered in the Report top drop-down list of the search form.
RequireBoldRed 0
If this flag is set to 1, only protein matches that have one or more bold red peptide matches will be listed in a peptide summary report. These are proteins that include at least one top-ranking peptide match that has not already appeared in the report. This global default can be overridden on an individual report URL by appending &_requireboldred=X, where X is 0 or 1.
ResfileCache master_results.pl,master_results_2.pl,peptide_view.pl,protein_view.pl,export_dat.pl,export_dat
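Appending a per-report override such as &_requireboldred=1 is plain query-string manipulation. Here is a minimal Python sketch. add_report_overrides is a hypothetical helper, and the report URL in the test is a made-up example, not a real Mascot result path.

```python
from urllib.parse import urlencode


def add_report_overrides(report_url: str, **overrides: object) -> str:
    """Append per-report overrides, e.g. requireboldred=1 -> &_requireboldred=1."""
    if not overrides:
        return report_url
    # The override parameter names documented here all start with an underscore.
    query = urlencode({f"_{name}": value for name, value in overrides.items()})
    separator = "&" if "?" in report_url else "?"
    return report_url + separator + query
```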
148. ection. After saving the changes, monitor progress in the Mascot database status page as the new database is brought online.
[Screenshot: Mascot search status page (ms-status.exe) listing the NCBInr database (NCBInr_20120419.fasta) and the SwissProt database (SwissProt_2012_03.fasta), each with Status "In use", Mem mapped YES and Current YES]
If all is well, the status line for
149. ed by problems that might be introduced by others. Finally, software patents pose a constant threat to the existence of any free program. We wish to make sure that a company cannot effectively restrict the users of a free program by obtaining a restrictive license from a patent holder. Therefore, we insist that any patent license obtained for a version of the library must be consistent with the full freedom of use specified in this license. Most GNU software, including some libraries, is covered by the ordinary GNU General Public License. This license, the GNU Lesser General Public License, applies to certain designated libraries, and is quite different from the ordinary General Public License. We use this license for certain libraries in order to permit linking those libraries into non-free programs. When a program is linked with a library, whether statically or using a shared library, the combination of the two is legally speaking a combined work, a derivative of the original library. The ordinary General Public License therefore permits such linking only if the entire combination fits its criteria of freedom. The Lesser General Public License permits more lax criteria for linking other code with the library. We call this license the "Lesser" General Public License because it does Less to protect the user's freedom than the ordinary General Public License. It also provides other free software developers Less of an advantage over competing non-free p
150. ed by the University of California, Berkeley and its contributors.
4. Neither the name of the University nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
COPYRIGHT 8.1 (Berkeley) 3/16/94
PLPA
Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana University Research and Technology Corporation. All rights reserved.
Copyright (c) 2004-2005 The Regents of the University of California. All rights reserved.
Copyright (c) 2007 Cisco Systems, Inc. All rights reserved.
Portions copyright Co
151. ed on one line, separated by commas. RULES contains a list of the rule numbers that define the instrument type in the configuration file fragmentation_rules. The rule numbers are listed explicitly because the contents of the configuration file may have changed since the search was run.
Masses
--gc0p4Jq0M2Yt08jU534c0p
Content-Type: application/x-Mascot; name="masses"

A=71.037110
B=114.534940
C=160.030649
D=115.026940
E=129.042590
F=147.068410
G=57.021460
H=137.058910
I=113.084060
J=0.000000
K=128.094960
L=113.084060
M=131.040480
N=114.042930
O=0.000000
P=97.052760
Q=128.058580
R=156.101110
S=87.032030
T=101.047680
U=150.953630
V=99.068410
W=186.079310
X=111.000000
Y=163.063330
Z=128.550590
Hydrogen=1.007825
Carbon=12.000000
Nitrogen=14.003074
Oxygen=15.994915
Electron=0.000549
C_term=17.002740
N_term=1.007825
delta1=15.994919,Oxidation (M)
NeutralLoss1=0.000000
FixedMod1=57.021469,Carbamidomethyl (C)
FixedModResidues1=C
--gc0p4Jq0M2Yt08jU534c0p
This block contains actual mass values, that is, average or monoisotopic residue masses, including any fixed modifications. The C- and N-terminus groups also include any fixed modifications. FixedMod1, FixedMod2, etc. record the delta mass and name of each fixed modification as comma-separated values. FixedModResidues1 gives the site specificity. If multiple residues are affected, they are listed as a
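Because residue and terminal masses in this block already include fixed modifications, a neutral peptide mass can be computed by plain addition. The following Python sketch reads such a block and does that sum. It assumes the usual convention that Mr is the sum of residue masses plus the N_term and C_term groups. The helper names are hypothetical and not part of Mascot or Mascot Parser.

```python
from typing import Dict, Iterable, List, Tuple


def parse_masses_section(lines: Iterable[str]) -> Dict[str, str]:
    """Collect key=value pairs, skipping MIME boundary and header lines."""
    values: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("--", "Content-Type")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key] = value
    return values


def fixed_modifications(values: Dict[str, str]) -> List[Tuple[float, str, str]]:
    """Return (delta mass, name, residues) for FixedMod1, FixedMod2, ..."""
    mods = []
    i = 1
    while f"FixedMod{i}" in values:
        delta, name = values[f"FixedMod{i}"].split(",", 1)
        mods.append((float(delta), name, values.get(f"FixedModResidues{i}", "")))
        i += 1
    return mods


def peptide_mr(sequence: str, values: Dict[str, str]) -> float:
    """Neutral peptide mass: residue masses plus the N- and C-terminal groups.

    Fixed modifications are already folded into these masses, so nothing extra is added.
    """
    residues = sum(float(values[aa]) for aa in sequence)
    return residues + float(values["N_term"]) + float(values["C_term"])
```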
152. ed separately in a number of files that are then memory mapped. The description lines are not memory mapped; only an index to each description in the original database is. Compared with mapping the original FASTA database, this can reduce memory requirements by more than 30%. The savings for a nucleic acid database are even greater, because the files are compressed with a 2:1 ratio.
For troubleshooting purposes, Monitor can be started from a command or shell prompt with the argument DEBUG. Under Windows, ms-monitor.exe must not be started from the command line if it is already running as a service. Status and error messages from Monitor can be viewed from a web browser using the Mascot Status application, described below.
GetSeq is a utility for retrieving the sequence, title or full text of an entry in a database configured for use by Mascot. The utility can be used to retrieve information for a single entry or in batch mode.
Single entry mode
The executable x-cgi/ms-getseq.exe accepts the following command line parameters:
1. The name of the database, e.g. NCBInr. This argument is required.
2. An accession string, e.g. 100K_RAT. This argument is required.
3. One of five keywords: seq, all, len, title or pI. This argument is required and is explained further below.
4. Nucleic acid databases only: a frame number between 1 and 6 to retrieve a sequence translated into protein, or 0 for the original nucleic acid sequen
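The following Python sketch wraps the single entry mode for use from a script. It follows the argument order given above: database, accession, keyword and an optional frame. getseq_argv and run_getseq are hypothetical helpers. Running the command requires a working Mascot installation, and the sketch makes no assumption about the format of what the executable prints.

```python
import subprocess
from typing import List, Optional

# Keywords accepted as the third argument, per the parameter list above.
KEYWORDS = {"seq", "all", "len", "title", "pI"}


def getseq_argv(exe: str, database: str, accession: str, keyword: str,
                frame: Optional[int] = None) -> List[str]:
    """Build the argument list for single entry mode: database, accession, keyword[, frame]."""
    if keyword not in KEYWORDS:
        raise ValueError(f"keyword must be one of {sorted(KEYWORDS)}")
    argv = [exe, database, accession, keyword]
    if frame is not None:
        # Nucleic acid databases only: 1-6 translates, 0 returns the original sequence.
        if not 0 <= frame <= 6:
            raise ValueError("frame must be between 0 and 6")
        argv.append(str(frame))
    return argv


def run_getseq(exe: str, database: str, accession: str, keyword: str,
               frame: Optional[int] = None) -> str:
    """Run the executable and return whatever it writes to standard output."""
    completed = subprocess.run(getseq_argv(exe, database, accession, keyword, frame),
                               capture_output=True, text=True, check=True)
    return completed.stdout
```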
153. ed the choice of which web site to install under. However, there are a number of issues to be aware of if the Default Web Site is not used.
Perl installation
On installation, ActiveState Perl only sets up mappings for the default web site. To add mappings for another web site, right-click on the new web site, choose Properties and then click on the Home Directory tab. Then click on the Configuration button. If there is no entry in the list for the .pl extension, click on Add and enter the following information:
[Screenshot: Add/Edit Application Extension Mapping dialog, with Executable C:\Perl\bin\Perl.ex..., Extension .pl, Method exclusions PUT,DELETE, and the Script engine and Check that file exists options]
Host name
Mascot will be installed in the correct virtual directory, but the host base name may be wrong, because the installation program has chosen the computer name. If you have set up multiple web sites, then it is probable that you have created a DNS entry that is the same as the Host header name. In this case, replace the computer name with the Host header name. Refer to the IIS documentation for details on setting up multiple web sites using the same IP address and port. Briefly, you will need to add a new web site and then click on the Web Site tab of the properties box. Next, click on the Advanced button and enter a host name.
154. ed to logs/errorlog.txt. This may be the only place to find a fatal error message resulting from a major configuration problem. Examples of typical error messages are shown below. A comprehensive list of all Mascot error messages can be found in the file errors.html in the root directory of the Mascot CD-ROM.
Error M00088, Job 2636, X00123, file upload, Thu Mar 11 10:59:30 2009: Invalid command mass at line 1 of your query. Line is where am I
Error M00034, Job 2638, X00251, modifications, Thu Mar 11 10:59:59 2009: Modification conflict. Both Carbamidomethyl (C) and Carboxymethyl (C) modify the same residue
Error M00133, Job 2639, X00938, www, Thu Mar 11 11:00:21 2009: Peptide mass of 1234 is too small. The minimum mass allowed is 30
Searches Log
Every Mascot search is listed in logs/searches.log. The Mascot Review utility provides a web browser interface to this file, displaying filtered and sorted listings of searches. Mascot Review is described in Chapter 7. Alternatively, the file can be opened in a spreadsheet program. The file consists of 14 columns delimited by tabs. Row 1 contains column titles. An example of a single entry is shown below (tabs indicated by \t):
2633\t185\tNCBInr\tJSC\tJSC@gmail.com\t../data/20090311/F002633.dat\tThu Mar 11 09:10:36 2009\t17\tUser read res\t1\tPMF\tYes\t192.168.42.4
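Because searches.log is tab-delimited with titles in row 1, it can also be filtered with a few lines of Python, as an alternative to a spreadsheet. This is a sketch only. The column titles in the test are hypothetical stand-ins, so read the real ones from row 1 of your own file. Quoting is switched off so that quote characters in search titles are kept literally.

```python
import csv
import io
from typing import Dict, List


def read_searches_log(text: str) -> List[Dict[str, str]]:
    """Parse tab-delimited searches.log text; row 1 supplies the column titles."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def select_rows(rows: List[Dict[str, str]], column: str, value: str) -> List[Dict[str, str]]:
    """Keep rows whose given column equals value (a simple stand-in for Mascot Review filtering)."""
    return [row for row in rows if row.get(column) == value]
```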
155. eference file, this value is ignored and can be set to 0.
14. Taxonomy: index of the taxonomy rule block to be used to parse taxonomy information. If taxonomy information is not available or is not to be used, this value should be set to 0.
PARSE
Do not modify this section if you ever use Database Manager. The PARSE section contains Basic Regular Expressions used to extract strings from various files.
Parse
# For NCBI accession
RULE_6 ">\(gi|[0-9]*\)"
# For NCBI description, everything after the first space
RULE_7 ">[^ ]* \(.*\)"
end
The syntax of a standard Basic Regular Expression (BRE) is described in Appendix A. Rules defined in this section are referred to by their index number in two sections: Databases and WWW. RULE_6, for example, looks for the > at the beginning of the title line. The string to be extracted is enclosed in backslashed parentheses: gi| followed by as many digits as possible. The match stops when a non-digit, such as a pipe symbol or a space, is encountered. If you are not familiar with regular expressions, use the information in Appendix A to understand how the pre-defined rules in mascot.dat work. A mistake in a rule called from the Databases section may prevent Mascot from using the database concerned. Always use Database Manager to configure and test new database definitions before they are brought online.
WWW
Do not modify
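To experiment with a rule outside Mascot, a BRE can be translated into Python's re syntax. In a BRE, \( \) group and | is an ordinary character, whereas Python has the opposite convention. The following minimal translator covers only that common subset and is not part of Mascot. Mascot itself evaluates the BRE directly. The test uses the pattern >\(gi|[0-9]*\), which captures gi| plus as many digits as possible, as described for RULE_6.

```python
import re


def bre_to_python(bre: str) -> str:
    """Translate a small subset of POSIX BRE syntax into Python re syntax.

    Not handled: backslashes inside bracket expressions and other BRE corner cases.
    """
    out = []
    i = 0
    while i < len(bre):
        ch = bre[i]
        if ch == "\\" and i + 1 < len(bre):
            nxt = bre[i + 1]
            # In a BRE, \( \) group and \{ \} count; Python spells these without backslashes.
            out.append(nxt if nxt in "(){}" else "\\" + nxt)
            i += 2
            continue
        # These are ordinary characters in a BRE but metacharacters in Python.
        out.append("\\" + ch if ch in "()|+?{}" else ch)
        i += 1
    return "".join(out)
```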
156. [Screenshot: Windows Firewall list of allowed programs and features, including Windows Firewall Remote Management, Windows Management Instrumentation (WMI), Windows Media Player, Windows Media Player Network Sharing Service, Windows Peer to Peer Collaboration Foundation, Windows Remote Management and Wireless Portable Devices]
If you installed Apache instead of IIS, there may be no entry for HTTP. Choose Windows Firewall with Advanced Security, then Inbound Rules.
[Screenshot: Windows Firewall with Advanced Security, Inbound Rules list, including Apache HTTP Server and BranchCache entries, with the New Rule action]
157. eins jobid="873">
<msgs:protein>
<msgs:accession>RL19_YEAST</msgs:accession>
<msgs:db>SwissProt</msgs:db>
<msgs:prot_title>&gt;sp|P05735|RL19_YEAST 60S ribosomal protein L19 OS=Saccharomyces cerevisiae GN=RPL19A PE=1 SV=5</msgs:prot_title>
<msgs:prot_len>189</msgs:prot_len>
<msgs:prot_pi>11.35</msgs:prot_pi>
<msgs:prot_sequence>MANLRT...ALLKEDA</msgs:prot_sequence>
</msgs:protein>
<msgs:protein>
<msgs:accession>G3P2_YEAST</msgs:accession>
</msgs:protein>
<msgs:protein>
<msgs:accession>TRY1_BOVIN</msgs:accession>
</msgs:protein>
</msgs:all_proteins>
</msgs:ms_getseq_out>
Error messages
All errors have unique codes and are reported in the XML output. They are also written to the Mascot error log, but only the first 10 instances of any particular error number are logged there. The XML output contains a full set of error messages in a structured format that can be processed automatically.
Fatal errors (no database entry is going to be retrieved):
403 Error while reading mascot.dat. Parameters: errstring, the error message as generated by ms parser
463 db parameter is missing
464 accession parameter is missing
440 Invalid session or session ID. Parameters: errstring, the error message as returned by security objects
443 Not allowed to search the da
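The following Python sketch uses the standard ElementTree module to read output shaped like this. It strips namespaces by local name, so it does not depend on the exact namespace URI. The function names are hypothetical, and the sketch keeps only the last err_param of each error for brevity.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple, Union


def _local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to element names."""
    return tag.rsplit("}", 1)[-1]


def parse_getseq_xml(xml_text: Union[str, bytes]) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Return (proteins, errors) from ms-getseq-style XML output."""
    if isinstance(xml_text, str):
        xml_text = xml_text.encode("utf-8")  # the output is declared as UTF-8
    root = ET.fromstring(xml_text)
    proteins: List[Dict[str, str]] = []
    errors: List[Dict[str, str]] = []
    for elem in root.iter():
        name = _local_name(elem.tag)
        # Entities such as &gt; are already decoded in child.text.
        children = {_local_name(child.tag): (child.text or "") for child in elem}
        if name == "protein":
            proteins.append(children)
        elif name == "error":
            errors.append({"code": elem.get("code", ""), **children})
    return proteins, errors
```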
158. em context or in the context of the requesting user.
[Screenshot: IIS Configuration Editor for localhost applicationHost.config, <location path="Default Web Site/mascot">]
If you have configured IIS 7 with multiple web sites and the Mascot server is not installed in the default web site, you will need to browse to the appropriate location. You can also inspect the CGI timeout at other connection nodes, in case a different timeout has been set manually at the cgi node or even at the level of individual files (inadvisable).
Apache
Apache is a very rugged and popular server for Unix platforms. It is a less obvious choice for Windows, since the Mascot installation program will configure Microsoft IIS automatically.
Important: When using Apache, ensure that ForkForUnixApache is set to 1 in the Options section of mascot.dat.
If the URL /mascot/ is mapped to the disk path /mascot/html/, then the URL /mascot/images/ will correspond to the disk path /mascot/html/images/. So it is important that the entries for the cgi and x-cgi directories come before the entry for the html directory. Otherwise, the server will report that it cannot find the cgi and x-cgi paths, because it has assumed from the URL that they are sub-directories of mascot/html. If the web browser connection breaks when submitting a large search or viewing a large result report, add or increase the Timeout directive in the configuration file. Remember to r
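The ordering rule above follows from how the aliases are matched. This Python sketch assumes, as the Apache mod_alias documentation describes, that aliases are tried in configuration order and the first match wins. It reproduces the failure mode with a hypothetical helper and illustrative paths.

```python
from typing import List, Optional, Tuple


def first_match(url_path: str, aliases: List[Tuple[str, str]]) -> Optional[str]:
    """Resolve url_path using the first alias whose URL prefix matches, in list order."""
    for url_prefix, disk_dir in aliases:
        if url_path.startswith(url_prefix):
            return disk_dir + url_path[len(url_prefix):]
    return None
```

With /mascot/cgi/ listed first, a CGI URL reaches the cgi directory. With /mascot/ listed first, the same URL is looked for under html/cgi, which is the "cannot find the cgi path" symptom.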
159. en on an individual report URL by appending &_server_mudpit_switch=X, where X is the ratio between the number of queries and the number of database entries after any taxonomy filter.
NoResultsScript ../cgi/master_results.pl
ProteinFamilySwitch 300
ResultsFullURL URL/cgi/master_results.pl
ResultsFullURL_2 URL/cgi/master_results_2.pl
ResultsPerlScript ../cgi/master_results.pl
ResultsPerlScript_2 ../cgi/master_results_2.pl
These are URLs, not disk paths, for the scripts to be called by the search engine at the completion of a search. A successful search calls ResultsPerlScript if the number of queries is less than ProteinFamilySwitch; otherwise, it calls ResultsPerlScript_2. A search that didn't find any hits calls NoResultsScript. ResultsFullURL and ResultsFullURL_2 are used when a link to the search results is emailed to a user. Since the email will probably be received on another system, the link needs to contain the full URL, including the web server hostname. URL is replaced by the server URL during installation.
NTIUserGroup Users
NTMonitorGroup Administrators
Under Windows, the Mascot service is generally run using the Local System account. It has to create, write and read the memory-mapped files. The CGI scripts, such as nph-mascot.exe, are run by the web server under a different user name, with different permissions from the service. These program
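The following small Python function expresses the script-selection rule described above. It is a sketch only. It assumes Options entries are whitespace-separated name/value lines with # comments. parse_options and results_script are hypothetical helpers, not part of Mascot, and the values in the test are examples.

```python
from typing import Dict, Iterable


def parse_options(lines: Iterable[str]) -> Dict[str, str]:
    """Read 'Name value' lines; text after '#' is treated as a comment (an assumption)."""
    options: Dict[str, str] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        options[parts[0]] = parts[1] if len(parts) > 1 else ""
    return options


def results_script(num_queries: int, found_hits: bool, options: Dict[str, str]) -> str:
    """Choose the results script from NoResultsScript, ResultsPerlScript(_2) and ProteinFamilySwitch."""
    if not found_hits:
        return options["NoResultsScript"]
    if num_queries < int(options["ProteinFamilySwitch"]):
        return options["ResultsPerlScript"]
    return options["ResultsPerlScript_2"]
```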
160. ended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions wi
161. ent. You have read and understand this agreement and agree that it constitutes the complete and exclusive statement of the agreement between us with respect to the subject matter hereof, and supersedes all proposals, representations, understandings and prior agreements, whether oral or written, and all other communications between us relating thereto.
Assignment
This agreement is personal to you (either an individual or a single corporate entity) and you may not assign, transfer, sub-contract or otherwise part with this agreement or any right or obligation under it without our prior written consent.
Law and Disputes
This agreement and all matters arising from it are governed by and construed in accordance with the laws of England and Wales, whose Courts shall have exclusive jurisdiction over all disputes arising in connection with this agreement.
If you have any questions about this agreement, write to us at Matrix Science Ltd, 64 Baker Street, London W1U 7GB, UK, or call us at +44 (0)20 7486 1050, or email us at info@matrixscience.com.
Xerces
The Apache Software License, Version 1.1
Copyright (c) 1999-2001 The Apache Software Foundation. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice
162. equence query as being N-terminal or C-terminal series. The first field in a tagn value is t for a standard sequence tag and e for an error tolerant sequence tag. Some search parameters can be defined in the local scope of a query: CHARGE, COMP, INSTRUMENT, IT_MODS, TOL and TOLU. Any that are used are listed here. If the MGF file contained scan range information in terms of seconds or scans, this is written to RTINSECONDS and/or SCANS.
Index
--gc0p4Jq0M2Yt08jU534c0p
Content-Type: application/x-Mascot; name="index"

parameters=4
masses=78
unimod=116
enzyme=322
taxonomy=329
header=336
summary=351
et_summary=6059
peptides=6473
et_peptides=7143
proteins=7292
query1=7362
query2=7374
...
query81=8322
query82=8334
--gc0p4Jq0M2Yt08jU534c0p
Values in index are the line number offsets of the section Content-Type lines, starting from 0 for the first line of the file.
Taxonomy
Mascot supports the use of a taxonomy filter to limit the database entries to be searched. This is useful because it speeds up the search and can reduce the proteins in the results list to those expected in the sample being analysed. Some databases record taxonomy in a manner that makes it difficult to extract the information reliably. The major problems are:
1. The location of the text containing the species identifier is mostly not defined and can even vary within one database.
2
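Since index values are 0-based line offsets of each section's Content-Type line, a reader can jump straight to a section. Here is a minimal Python sketch. It assumes a section body runs until the next line starting with "--" (the MIME boundary). The helper names and the tiny file in the test are hypothetical.

```python
from typing import Dict, List


def parse_index(section_lines: List[str]) -> Dict[str, int]:
    """Turn index lines such as 'masses=78' into {'masses': 78}."""
    offsets: Dict[str, int] = {}
    for line in section_lines:
        if "=" in line:
            key, value = line.strip().split("=", 1)
            offsets[key] = int(value)
    return offsets


def section_body(file_lines: List[str], offset: int) -> List[str]:
    """Lines of the section whose Content-Type line is file_lines[offset] (0-based)."""
    if not file_lines[offset].startswith("Content-Type"):
        raise ValueError(f"line {offset} is not a Content-Type line")
    body: List[str] = []
    for line in file_lines[offset + 1:]:
        if line.startswith("--"):
            break  # the next MIME boundary ends this section
        body.append(line)
    while body and not body[0].strip():
        body.pop(0)  # drop the blank line that separates header from body
    return body
```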
163. equired. In the case of a nucleic acid database, the limited character set allows Mascot to pack two base codes into each byte of memory. If a taxonomy filter is required, a taxonomy index is built at the same time as the file is compressed.
Naming Conventions and Directory Structure
Although Microsoft Windows permits file and directory names to include spaces, file and directory names to be used by Mascot or to appear in a URL cannot include spaces. If some simple conventions are followed in database naming, Mascot Monitor can update sequence databases automatically, without any disruption to ongoing searches. Monitor first compresses the new database and tests it by running a standard search. If errors are detected in the new database, the database exchange process is abandoned. If the test is successful, all new searches are performed against the new database, while searches that were in progress against the old database are allowed to continue. Once the final search against the old database is complete, the disk file is moved into an archive directory. If the database being exchanged is memory mapped, the mapping and unmapping are also handled automatically. If the database will be updated periodically, a directory structure similar to the one created for SwissProt during installation is recommended. For example:
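The following Python sketch illustrates "two base codes per byte". Each code is packed into 4 bits, so the high and low halves of a byte hold consecutive bases. The 16-symbol alphabet and the nibble order are assumptions chosen for the example. They are not Mascot's actual compressed file format.

```python
from typing import Dict

# Hypothetical 16-symbol alphabet (4 bits each): bases plus IUPAC ambiguity codes.
ALPHABET = "ACGTURYSWKMBDHVN"
CODE: Dict[str, int] = {symbol: i for i, symbol in enumerate(ALPHABET)}


def pack_bases(sequence: str) -> bytes:
    """Pack two 4-bit base codes into each byte (high nibble first)."""
    seq = sequence.upper()
    packed = bytearray()
    for i in range(0, len(seq), 2):
        high = CODE[seq[i]]
        low = CODE[seq[i + 1]] if i + 1 < len(seq) else 0  # pad an odd-length tail
        packed.append((high << 4) | low)
    return bytes(packed)


def unpack_bases(packed: bytes, length: int) -> str:
    """Reverse pack_bases; length removes any padding nibble."""
    symbols = []
    for byte in packed:
        symbols.append(ALPHABET[byte >> 4])
        symbols.append(ALPHABET[byte & 0x0F])
    return "".join(symbols[:length])
```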
164. er each protein match in a peptide summary report, matches to proteins that contain a sub-set of the same peptides will also be listed. This was the default behaviour in version 1.6 and earlier. If this flag is set to 0, which is now the default, the sub-set matches will not be shown. Values between 0 and 1 represent the fraction of the primary hit's protein score that a sub-set hit can lose and still be listed. For example, if ShowSubSets is 0.2 and the primary hit has a protein score of 200, sub-set hits with scores of 160 or more will be listed. If multiple entries contain the full set of peptides, they are all displayed, whatever the setting of this parameter. This global default can be overridden on an individual report URL by appending &_showsubsets=X, where X is 0 or 1.
SigThreshold 0.05
Significance threshold used in result reports (default 0.05). The valid range is 1 to 1E-18. This global default can be overridden on an individual report URL by appending &_sigthreshold=X, where X is the significance threshold.
SiteAnalysisMD10Prob 0.1
Used to calculate relative probabilities of modification assignments in Peptide View. It defines the probability factor that corresponds to a peptide score difference of 10. The default is 0.1, which means a score difference of 10 corresponds to a factor of 10 in probability. Similarly, 0.05 corresponds to a factor of 20.
SortUnassigned sc
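The following Python sketch reproduces the arithmetic of the two examples above: the ShowSubSets cut-off (200 x (1 - 0.2) = 160) and the SiteAnalysisMD10Prob factor (a difference of 10 gives 1/0.1 = 10, or 1/0.05 = 20). The helper names are hypothetical. Extending the factor to score differences other than 10 by log-linear scaling is an assumption of this sketch, not documented Mascot behaviour.

```python
def min_subset_score(primary_score: float, show_subsets: float) -> float:
    """Lowest protein score a sub-set hit may have and still be listed.

    Only meaningful for fractional ShowSubSets values strictly between 0 and 1.
    """
    if not 0.0 < show_subsets < 1.0:
        raise ValueError("expected a fraction strictly between 0 and 1")
    return primary_score * (1.0 - show_subsets)


def probability_factor(score_difference: float, md10_prob: float = 0.1) -> float:
    """Relative probability implied by a peptide score difference.

    A difference of exactly 10 gives 1 / md10_prob, as documented; scaling other
    differences as (1 / md10_prob) ** (difference / 10) is an assumption.
    """
    if not 0.0 < md10_prob < 1.0:
        raise ValueError("SiteAnalysisMD10Prob should lie between 0 and 1")
    return (1.0 / md10_prob) ** (score_difference / 10.0)
```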
165. er node as the user who will own the ms-monitor.exe process (generally root), and generate a version 2 RSA key pair by executing:
ssh-keygen -t rsa
2. When asked for a passphrase, press return to indicate that no passphrase is required. Accept the default location for saving the key files ($HOME/.ssh).
3. The contents of the public key, $HOME/.ssh/id_rsa.pub, must be added to a file called $HOME/.ssh/authorized_keys on each of the search nodes.
4. Test communication by logging in to each search node from the master node using ssh. The first time a connection is made, confirm that the new host should be added to the list of known hosts ($HOME/.ssh/known_hosts).
Installation
Perform a standard installation of Mascot onto the master system according to the procedure in Chapter 2. Verify correct system operation as a single server by performing searches of SwissProt, and familiarise yourself with administrative tools such as ms-review.exe and ms-status.exe (Chapter 7). Any problems need to be resolved before reconfiguring for cluster operation.
Cluster Configuration Procedure
1. Kill ms-monitor.exe.
2. Open mascot/config/mascot.dat in a text editor. Move down to the Cluster section and enter configuration information for the cluster. The parameters are fully described below in the Reference section. In the Databases section, verify that the threads and blocks parameters are set to 1 for all databases
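Step 3 above, adding the master's public key to authorized_keys on a search node, can be made repeatable with a short Python sketch run on each node. The sketch relies on three assumptions. First, the key text has already been copied to the node. Second, add_authorized_key is a hypothetical helper. Third, the 0600 permission reflects common sshd practice (StrictModes), not a documented Mascot requirement.

```python
import os
from pathlib import Path


def add_authorized_key(public_key: str, authorized_keys: Path) -> bool:
    """Append a public key line unless it is already present; return True if appended."""
    key = public_key.strip()
    existing = authorized_keys.read_text() if authorized_keys.exists() else ""
    if key in (line.strip() for line in existing.splitlines()):
        return False  # already authorised, leave the file untouched
    authorized_keys.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    # Make sure the new key starts on its own line.
    prefix = "" if not existing or existing.endswith("\n") else "\n"
    with authorized_keys.open("a") as handle:
        handle.write(prefix + key + "\n")
    # sshd commonly refuses keys files that other users can write to.
    os.chmod(authorized_keys, 0o600)
    return True
```

On a node, this might be called as `add_authorized_key(Path("id_rsa.pub").read_text(), Path.home() / ".ssh" / "authorized_keys")`.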
166. [Screenshot: Mascot peptide summary report, with the Mascot Score Histogram, the Peptide Summary Report format controls, and the first hit, CH60_HUMAN (60 kDa heat shock protein, mitochondrial, Homo sapiens), Score 1365. Caption text: Ions score is -10*Log(P), where P is the probability that the observed match is a random event. Individual ions scores > 41 indicate identity or extensive homology (p<0.05). Protein scores are derived from ions scores as a non-probabilistic basis for ranking protein hits.]
167. es. Boolean values can be coded in different ways:
true: true, TRUE, True, on, any number except 0, or any other non-empty string
false: false, FALSE, False or 0
All missing parameters default to false. A missing frame parameter defaults to 0.
Output format
In response to any POST request, XML output is returned, using UTF-8 encoding. The XML output is schema-validated and schema-versioned. All XML output must be XML-escaped using the following substitutions: > as &gt;, < as &lt;, & as &amp;, ' as &apos;, and " as &quot;. Proteins are returned in the order requested. A <msgs:frame> element will only be output for a nucleic acid (NA) database. The example input file would produce output similar to this (edited for brevity):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<msgs:ms_getseq_out xmlns:msgs="http://www.matrixscience.com/xmlns/schema/msgetseq_1" majorVersion="1" minorVersion="0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.matrixscience.com/xmlns/schema/msgetseq_1 msgetseq_1.xsd">
<msgs:all_errors>
<msgs:error code="461">
<msgs:err_description>Sequence not found</msgs:err_description>
<msgs:err_param name="accession">ERROR_YEAST</msgs:err_param>
</msgs:error>
</msgs:all_errors>
<msgs:all_prot
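The following Python sketch implements the boolean coding and the five XML substitutions listed above. It is one reading of those rules. How ms-getseq treats edge cases such as "0.0" or whitespace is an assumption here. parse_bool and xml_escape are hypothetical helpers. The escaping uses the standard xml.sax.saxutils.escape with two extra entities.

```python
from typing import Optional
from xml.sax.saxutils import escape

FALSE_WORDS = {"false", "FALSE", "False"}


def parse_bool(value: Optional[str]) -> bool:
    """Interpret a request parameter using the boolean rules listed above."""
    if value is None or value == "" or value in FALSE_WORDS:
        return False  # missing, empty or an explicit false word
    try:
        return float(value) != 0.0  # any number except 0 is true
    except ValueError:
        return True  # any other non-empty string, including "on"


def xml_escape(text: str) -> str:
    """Escape &, <, >, apostrophes and double quotes as XML entities."""
    return escape(text, {"'": "&apos;", '"': "&quot;"})
```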
168. es=number of residues in DB
distribution=see below
distribution_decoy=see below
decoy_type=n, type of decoy (1 to 8)
exec_time=search time in seconds
date=timestamp, seconds since Jan 1 1970
time=time in hh:mm:ss
queries=number of queries (>1)
max_hits=maximum number of hits to be listed
version=version information
fastafile=full path to database fasta file
release=filename of actual database used, e.g. Owl_31.fasta
taskid=unique task identifier for searches submitted asynchronously
pmf_num_queries_used=number of mass values selected for PMF match
pmf_queries_used=comma-separated list of selected query numbers
Warn0
Warn1
Warn2
--gc0p4Jq0M2Yt08jU534c0p
The Header section contains general values used in the header paragraph of the master results page. Distribution is a comma-separated list of values that represent a histogram of the complete protein score distribution. The first value is the number of entries with score 0, the second is the number of entries with score 1, and so on, up to the maximum score for the search. Scores are converted to integers by truncation. This distribution is only meaningful for a peptide mass fingerprint search. If intensity values are supplied for a peptide mass fingerprint, Mascot iterates over the experimental peaks to find the set that gives the best score. The number of values selected is reported in pmf_num_queries_used, and the selected queries li
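The following Python sketch builds the distribution value as described: integer bins by truncation, counting from score 0 up to the maximum score. It also reads the comma-separated string back. The helper names are hypothetical, and the sketch assumes non-negative scores.

```python
import math
from typing import Iterable, List


def score_distribution(scores: Iterable[float]) -> List[int]:
    """Histogram of non-negative scores: index i counts the scores that truncate to i."""
    bins = [math.trunc(score) for score in scores]
    if not bins:
        return []
    if min(bins) < 0:
        raise ValueError("scores are expected to be non-negative")
    counts = [0] * (max(bins) + 1)
    for b in bins:
        counts[b] += 1
    return counts


def format_distribution(counts: List[int]) -> str:
    """Render counts as the comma-separated list used for distribution."""
    return ",".join(str(c) for c in counts)


def parse_distribution(value: str) -> List[int]:
    """Read a comma-separated distribution value back into integers."""
    return [int(v) for v in value.split(",") if v.strip()]
```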
169. estart Apache after saving the change. The argument is in seconds:
Timeout 3600
Linux configuration
The following lines illustrate typical mappings and permissions for the Mascot directories:
ScriptAlias /mascot/cgi/htsearch /usr/lib/cgi-bin/htsearch
<Directory /usr/local/mascot/cgi>
    AllowOverride None
    Options None
    Order allow,deny
    Allow from all
</Directory>
ScriptAlias /mascot/cgi /usr/local/mascot/cgi
<Directory /usr/local/mascot/x-cgi>
    AllowOverride None
    Options None
    Order allow,deny
    Allow from all
</Directory>
ScriptAlias /mascot/x-cgi /usr/local/mascot/x-cgi
<Directory /usr/local/mascot/html>
    AllowOverride None
    Options None
    Order allow,deny
    Allow from all
</Directory>
Alias /mascot /usr/local/mascot/html
Windows Installation
If you choose to use Apache under Windows, a good starting point for support information is http://httpd.apache.org/docs/2.4/platform/windows.html.
Important: If IIS is installed, stop the IIS service before installing Apache, Perl and Mascot.
Mascot 2.4 has been tested with Apache 2.2.22 on Windows 7. After Mascot has been installed, from the Windows Start menu choose Programs > Apache HTTP Server 2.2 > Configure Apache Server. Copy the customised Apache configuration settings from the httpd.conf file in the Mascot config directory and paste them at the end of the Apache httpd.conf file. Save the changes. From
170. g. /usr/lib/sendmail.
Set MailTempFile as the name of the file used to store email messages until they can be sent. It must be the path followed by a filename in the form MXXXXXX. This will create temporary files that begin with M followed by a unique number. Typically this parameter will be /var/tmp/MXXXXXX.
Chapter 6 Configuration & Log Files 85
Blat Configuration (Windows only)
Blat is a free, easily installed mail program for Windows. For more information visit http://www.blat.net.
Set MailTransport to 3.
Set the EmailUserFrom parameter to the name that is required in the From field of the email messages. Set EmailFromTextName as the name of the server that is running Mascot. For example, setting EmailUserFrom to www and EmailFromTextName to Mascot Server will result in emails from "www Mascot Server". The From field of the email will be www@www.your_domain.com.
Set sendmailPath as the fully qualified path, including drive letter, for the Blat program.
Set MailTempFile as the name of the file used to store email messages until they can be sent. It must be in the form path\MXXXXXX. This will create a new temp file where the first letter will be an M and the next 6 characters will make up a unique number. Typically this parameter will be c:\temp\MXXXXXX.
ErrorLogFile ../logs/errorlog.txt
GetSeqJobIdFile ../data/getseq.job
InterFileBasePath c:\inetpub\mascot\data (Windows) or /usr/local/mascot/data (Linux)
InterFi
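The MXXXXXX convention resembles the classic mktemp template: the trailing X characters are replaced to form a name that does not already exist. The Python sketch below assumes that behaviour (six random digits after the prefix, with the file created exclusively so two processes cannot claim the same name). create_from_template is a hypothetical helper, not part of Mascot, and Mascot's own choice of digits may differ.

```python
import os
import random
import tempfile


def create_from_template(template: str, attempts: int = 100) -> str:
    """Create a new, empty file from a template ending in 'XXXXXX'.

    The six X characters are replaced by random digits, mirroring the
    'M followed by a unique number' naming described for MailTempFile.
    """
    if not template.endswith("XXXXXX"):
        raise ValueError("template must end with XXXXXX")
    prefix = template[:-6]
    for _ in range(attempts):
        candidate = prefix + f"{random.randrange(10**6):06d}"
        try:
            # O_EXCL makes creation fail if the name is already taken.
            fd = os.open(candidate, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        except FileExistsError:
            continue  # collision: try another number
        os.close(fd)
        return candidate
    raise FileExistsError(f"no unique name found for {template}")


if __name__ == "__main__":
    path = create_from_template(os.path.join(tempfile.gettempdir(), "MXXXXXX"))
    print(path)
    os.remove(path)
```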
171. (Screenshot: Edit Enzyme page for CNBr+Trypsin, showing the General options (Title, Independent, Semispecific), the Components table (Sense, Cleave At, Restrict) and a test digest of a protein sequence listing the Start, End and Peptide of each fragment.)
File format: enzymes
Each cleavage agent is defined by a block of lines. Blocks are delimited from one another by a line containing an asterisk (*). Each line in a block starts with a keyword.
Title:Trypsin
Cleavage:KR
Restrict:P
Cterm
*
Title:Asp-N
Cleavage:DB
Nterm
*
The first line of each block must start with the Title keyword, followed by a text string that is used to identify the cleavage agent in forms and reports. The definition should be short and self-explanatory. It should only include alphanumeric characters and spaces. Internal spaces are significant.
Each block must also include a line starting with the keyword Cleavage, fol
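The block format above is simple enough to parse in a few lines. The Python sketch below reads the two example blocks and performs a fully specific digest. It assumes single-component enzymes, that a block without Cterm or Nterm cuts C-terminally, and that Restrict residues block cleavage when they sit on the other side of the bond (after the site for Cterm, before it for Nterm). Enzyme, parse_enzymes and digest are hypothetical names, not Mascot code, and options such as Independent and Semispecific are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional

# Sample in the format shown above (two blocks, each closed by '*').
ENZYMES = """\
Title:Trypsin
Cleavage:KR
Restrict:P
Cterm
*
Title:Asp-N
Cleavage:DB
Nterm
*
"""


@dataclass
class Enzyme:
    title: str
    cleavage: str = ""
    restrict: str = ""
    cterm: bool = True  # assumption: Cterm unless the block says Nterm


def parse_enzymes(text: str) -> List[Enzyme]:
    """Parse '*'-delimited blocks of 'Keyword:value' lines."""
    enzymes: List[Enzyme] = []
    current: Optional[Enzyme] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "*":  # a line containing an asterisk ends the block
            if current is not None:
                enzymes.append(current)
            current = None
            continue
        key, _, value = line.partition(":")
        if key == "Title":
            current = Enzyme(title=value)
        elif current is None:
            raise ValueError(f"block must start with Title: {line!r}")
        elif key == "Cleavage":
            current.cleavage = value
        elif key == "Restrict":
            current.restrict = value
        elif key in ("Cterm", "Nterm"):
            current.cterm = key == "Cterm"
    if current is not None:  # tolerate a missing final '*'
        enzymes.append(current)
    return enzymes


def digest(seq: str, enz: Enzyme) -> List[str]:
    """Fully specific digest of seq with a single-component enzyme."""
    cuts = [0]
    for i in range(1, len(seq)):
        before, after = seq[i - 1], seq[i]
        # Cterm: cut after a cleavage residue; Nterm: cut before it.
        site, other = (before, after) if enz.cterm else (after, before)
        # Assumption: a Restrict residue on the other side blocks the cut.
        if site in enz.cleavage and other not in enz.restrict:
            cuts.append(i)
    cuts.append(len(seq))
    return [seq[a:b] for a, b in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    trypsin, asp_n = parse_enzymes(ENZYMES)
    print(digest("MSEELSQKPSSAQSLSLREGRNR", trypsin))
    # ['MSEELSQKPSSAQSLSLR', 'EGR', 'NR']
```

With the Trypsin block, the cut after K in QKPSS is suppressed by Restrict:P, which is why the first peptide runs on to the first R.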
172. gistration File
Please Note: As part of the product registration process, the following information will be transmitted to Matrix Science:
• Details of any existing licence
• Machine identifiers for node locking purposes, e.g. MAC address
A product key is required and must be registered online. The licence file will be returned by email and must be saved to the specified location on the Mascot server. If the Mascot server cannot connect to the Internet, a file containing registration information can be saved and copied to a system with Internet access for submission.
The registration form allows a second email address to be specified, in case the person installing Mascot is not the end user. Ensure that the end user email address is entered into the upper part of the form, and the email address to which the licence file should be sent is entered into the CC email field in the lower part of the form.
The licence file must be saved to the config/licdb directory as a file with the extension .lic.
Verify System Operation
A copy of the SwissProt database is included in the files copied from the CD-ROM. It is recommended that the operation of Mascot is verified and tested using this database before adding further databases or making configuration changes.
The Mascot Monitor service is used to manage the swapping and memory mapping of the sequence databases used by Mascot. For Mascot to operate
173. grams. A value of 2 can be used if the same user name is used to run Web server scripts as runs ms-monitor.exe. This is generally only possible under Irix using capabilities. In this case, the files created by ms-monitor.exe will not be world accessible, and chown is not used on the files to change ownership.
Failure to put the correct group name will generally result in one of two error messages:
Failed to open memory mapped file <filename> & Error: access denied
or
Failed to create memory map for <filename> & Error: Access denied
Vmemory 1 (obsolete)
Cron
Do not modify this section if you ever use Database Manager. Database Manager uses the information in this section to schedule database updates.
Cron
CronEnable 1
Logfile ../logs/cron.log
Logging 3
0-59 * 1-31 * * /usr/local/mascot/bin/dbman_process_tasks.pl
end
CronEnable is set to 1 to enable cron functionality, 0 to disable.
Logfile specifies the path to the log for recording cron events.
Logging controls the verbosity:
0: No logging
1: Log successful commands (return code 0)
2: Log unsuccessful commands (return code not 0)
3: Log successful and unsuccessful commands
The remaining lines in this section simulate a crontab file. Each line contains six fields, separated by spaces or tabs. The first five are integer patterns that specify the following: minute (0-59), hour
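The crontab-style lines in the Cron section, and the meaning of the Logging levels, can be sketched in Python. The field ranges after minute (0-59) are the standard crontab ones and are an assumption here, as is the simplification that all five fields must match (classic cron combines day-of-month and day-of-week with OR when both are restricted). Only *, single numbers, a-b ranges and comma lists are handled, and the example command path is hypothetical.

```python
from datetime import datetime
from typing import List, Set, Tuple

# Standard crontab field ranges (assumed): minute, hour, day of month,
# month, day of week (0 = Sunday).
RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand(field: str, lo: int, hi: int) -> Set[int]:
    """Expand '*', 'n', 'a-b' and comma-separated combinations."""
    values: Set[int] = set()
    for part in field.split(","):
        if part == "*":
            values.update(range(lo, hi + 1))
        elif "-" in part:
            a, b = (int(x) for x in part.split("-", 1))
            values.update(range(a, b + 1))
        else:
            values.add(int(part))
    return values


def parse_cron_line(line: str) -> Tuple[List[Set[int]], str]:
    """Split a line into five integer patterns plus the command."""
    fields = line.split(None, 5)  # spaces or tabs; command may contain spaces
    if len(fields) != 6:
        raise ValueError(f"expected six fields: {line!r}")
    patterns = [expand(f, lo, hi) for f, (lo, hi) in zip(fields[:5], RANGES)]
    return patterns, fields[5]


def is_due(patterns: List[Set[int]], when: datetime) -> bool:
    """True if every field matches (simplified: all five must match)."""
    # isoweekday() is 1 (Mon) .. 7 (Sun); % 7 maps Sunday to 0.
    now = [when.minute, when.hour, when.day, when.month, when.isoweekday() % 7]
    return all(v in p for v, p in zip(now, patterns))


def should_log(level: int, return_code: int) -> bool:
    """Apply the Logging levels 0-3 described above."""
    if level == 3:
        return True
    if level == 1:
        return return_code == 0
    if level == 2:
        return return_code != 0
    return False


if __name__ == "__main__":
    patterns, command = parse_cron_line("30 2 * * 0 /path/to/example_task.pl")
    print(command, is_due(patterns, datetime(2012, 4, 22, 2, 30)))  # Sunday
```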
174. (Screenshot: IIS Configuration dialog, "Configure IIS web site settings". Enter the name of an existing IIS web site to use for Mascot (usually Default Web Site is the most appropriate) and the name of the Mascot virtual directory, which is added to the web site to form the full Mascot URL. Accepting the default virtual directory name is recommended.)
For Apache or any other web server, you need to confirm the local web server hostname and port. Do not enter localhost as the web site if you wish to access your Mascot server from other computers on your LAN. If there are DNS problems, so that a hostname is not recognised across the LAN, enter an IP address instead. The default ports are 80 for http and 443 for https. The installer will test that the web server responds using the specified hostname and port number. If you have configured your Apache web server as a secure server (https), check the box for Use SSL/TLS to access this web site.
(Screenshot: Apache Configuration dialog, "Configure Apache web site settings". Enter the host name or IP address to be used for accessing the Apache web server on this computer, optionally followed by a colon and a port number.)
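The host and port rule above (a host name or IP address, optionally followed by a colon and a port, with 80 as the default for http and 443 for https) can be sketched in Python for scripted checks. This assumes a host name or IPv4 address (IPv6 literals are not handled) and the default virtual directory name mascot; parse_web_site and base_url are hypothetical helpers, not part of the installer.

```python
from typing import Tuple


def parse_web_site(entry: str, use_ssl: bool = False) -> Tuple[str, int]:
    """Split 'host' or 'host:port'; default to port 80 (http) or 443 (https)."""
    entry = entry.strip()
    host, sep, port = entry.rpartition(":")
    if not sep:  # no colon, so the whole entry is the host name or IP address
        return entry, 443 if use_ssl else 80
    if not host or not port.isdigit():
        raise ValueError(f"expected host or host:port, got {entry!r}")
    return host, int(port)


def base_url(entry: str, use_ssl: bool = False, virtual_dir: str = "mascot") -> str:
    """Form the browser URL, omitting the port when it is the default."""
    host, port = parse_web_site(entry, use_ssl)
    scheme, default = ("https", 443) if use_ssl else ("http", 80)
    netloc = host if port == default else f"{host}:{port}"
    return f"{scheme}://{netloc}/{virtual_dir}/"


print(base_url("192.168.0.10:8080"))  # http://192.168.0.10:8080/mascot/
```

Using rpartition splits on the last colon only, so a port is always taken from the end of the entry.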
175. gure a file update schedule so that new releases are downloaded automatically. For more information about Database Manager, refer to the Mascot HTML help pages.
If you want to set up a custom database, such as the proteome or genome of a single organism, download and configuration information can also be found in the Mascot HTML help pages. Note that the HTML help pages for your in-house server are only updated when you install a new version of Mascot, so for the latest information go to the help pages on the Matrix Science public web site: http://www.matrixscience.com/help/seq_db_setup.html
This chapter contains reference material, most of which is only important if you choose not to use Database Manager.
The Fasta Format
Mascot can search any Fasta format sequence database, as long as it can parse a unique identifier (accession string) from each entry in a consistent fashion. The accession string can contain any US-ASCII printing characters except comma and double quotes.
The Fasta format is extremely simple. Each entry consists of a one-line title followed by one or more lines containing the contiguous sequence string in 1-letter code. Fasta databases can contain either amino acid sequences or nucleic acid sequences, but not both. Nucleic acid databases are translated on the fly by Mascot in all six reading frames.
The Fasta title line begins with a greater than character (>), followed by
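A minimal Python reader for the Fasta layout described above, enforcing the accession character restriction and the uniqueness requirement. It assumes the accession is the first whitespace-delimited word of the title line; that is only one possible rule (Mascot itself selects a regular expression from the PARSE section of its configuration). All function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Entry = Tuple[str, str, str]  # (accession, title, sequence)


def is_valid_accession(acc: str) -> bool:
    """Printable US-ASCII (0x21-0x7E here) except comma and double quote."""
    return bool(acc) and all(
        0x21 <= ord(c) <= 0x7E and c not in ',"' for c in acc
    )


def _make_entry(title: str, chunks: List[str]) -> Entry:
    # Assumption: the accession is the first word of the title line.
    words = title.split(None, 1)
    acc = words[0] if words else ""
    if not is_valid_accession(acc):
        raise ValueError(f"cannot parse an accession from {title!r}")
    return acc, title, "".join(chunks)


def read_fasta(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one entry per '>' title line plus its sequence lines."""
    title, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if title is not None:
                yield _make_entry(title, chunks)
            title, chunks = line[1:], []
        elif line and title is not None:
            chunks.append(line)
    if title is not None:
        yield _make_entry(title, chunks)


def index_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map accession -> sequence, rejecting duplicate accessions."""
    index: Dict[str, str] = {}
    for acc, _title, seq in read_fasta(lines):
        if acc in index:
            raise ValueError(f"duplicate accession {acc!r}")
        index[acc] = seq
    return index


print(index_fasta(">P1 first\nMKV\nLLA\n>P2 second\nGG\n".splitlines()))
```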
176. h is submitted from a browser and the connection is broken before the search is complete, the search will be killed. The only known workaround is to use a different web server, e.g. Apache.
Mascot requires Perl, together with several Perl library modules. ActivePerl 5.14 is recommended; ActivePerl 5.8, 5.10 and 5.12 are also supported. ActivePerl 5.14.2 build 1402 from ActiveState Corporation is supplied on the Mascot CD. You must install or upgrade Perl after installing the Web server and before installing Mascot.
IMPORTANT: You cannot perform single-step upgrades for ActivePerl. You must uninstall the old version before installing the new one. As a precaution, it is also worth deleting the Perl application directory after the uninstall step.
To install ActivePerl from the CD, in Windows Explorer double-click on the appropriate file:
32-bit: ActivePerl-5.14.2.1402-MSWin32-x86-295342.msi
64-bit: ActivePerl-5.14.2.1402-MSWin32-x64-295342.msi
It is recommended that you accept all the default options for the installation. Full documentation for ActivePerl 5.14 can be found here: http://docs.activestate.com/activeperl/5.14/full_toc.html
Chapter 3 Installation: Microsoft Windows 29
The Mascot installer uses the Windows file association for the .pl extension to locate Perl. If you have more than one version of Perl installed, ensure that the file association is for the correct version. You can examine the current assoc
177. has a suitable version of Microsoft Windows installed. Mascot requires Windows XP or later on Intel or AMD processors.
Virus scanning software or Microsoft Outlook should not be running during the installation.
4. Install Web server software (unless already installed).
5. Install or upgrade Perl (unless a compatible version is already installed).
6. Run setup32.exe (32-bit) or setup64.exe (64-bit) from the Mascot CD.
It is essential that steps 4, 5 and 6 are performed in that order.
System Requirements
Disk Space
A typical installation of the Microsoft Web server requires about 150 MB. A typical installation of ActiveState Perl requires about 120 MB. A full installation of Mascot requires approx. 3.6 GB.
The hard disk must be formatted for NTFS. FAT32 has a file size limit of 4 GB, which would prevent the use of large sequence databases. It is advisable not to use NTFS file compression for the compressed database files, because there are reports that NTFS compression is not fully compatible with memory mapping. NTFS file compression can be used on the FASTA and reference files if you wish.
Memory
To get the best performance from Mascot, the database files need to be memory mapped. It is recommended that you have at least 4 GB of RAM. On a 64-bit system, 12 GB or more will help ensure best performance.
Microsoft Windows versions
XP
Mascot will run under Windows XP Professional. Windows XP Home is not supporte
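To check whether a directory of database files would run into the FAT32 limit, a short Python sketch can list files larger than 4 GiB minus 1 byte, the largest size FAT32 can record. The helper name is hypothetical, and the limit parameter exists only so the sketch can be tried on small files.

```python
import os
import tempfile
from typing import Iterator, Tuple

# FAT32 stores file sizes in 32 bits, so the largest file is 4 GiB minus 1 byte.
FAT32_MAX_FILE_SIZE = 2**32 - 1


def too_big_for_fat32(
    root: str, limit: int = FAT32_MAX_FILE_SIZE
) -> Iterator[Tuple[str, int]]:
    """Yield (path, size) for every file under root larger than limit bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit:
                yield path, size


if __name__ == "__main__":
    # Demo with a tiny limit so that a small file is reported.
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "demo.fasta"), "w") as fh:
            fh.write(">demo\nMKV\n")
        print(list(too_big_for_fat32(root, limit=4)))
```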
178. he Matrix Science Mascot Service and allows it to be stopped and started. It is normally accessed from the Start menu.
Chapter 7 Program Reference 143
Programs > Mascot > config > Show Mascot Service Status
Programs > Mascot > config > Start Mascot Service
Programs > Mascot > config > Stop Mascot Service
These options run the program x-cgi/ms-service.exe with the first parameter set to the service name, MatrixScienceMascotService, and the second parameter set to 0, 1 or 2 respectively. It is also possible to run this program as a CGI script by entering the following URL in the browser:
http://your-host/mascot/x-cgi/ms-service.exe?MatrixScienceMascotService+0
where your-host is replaced by the host name of the Mascot server. This CGI script can be run from any computer on the network. However, it is not usually possible to start and stop the service from another computer using the default access rights.
There is a final option, which will allow removal of the service. This may be required for a manual de-installation and will not normally be required. If this option is used, Mascot will not run again without re-running the installation program. The command to enter is:
ms-service MatrixScienceMascotService remove
Compress
Compress is a utility for compressing FASTA files independently of Mascot Monitor. The executable, bin/ms-compress.exe, is executed from a shell or command prompt:
ms-compress.exe db name fasta
wher
179. he operating system (Windows consumes approximately 60 MB), anything from tens to hundreds of MB for each Mascot search, and space for any other applications which might be running.
If you try to lock databases into RAM when there isn't room, this will not be a major problem. The locking will fail, generate an error message, and Mascot will carry on regardless. A more serious problem is when there is just sufficient RAM to lock the databases, but none left over for searches or other applications. In this case, the whole system will slow down and the hard disk will be observed to be thrashing. Eventually, the system is likely to hang or crash.
10. Local ref file: Flag to indicate whether a local reference file is available (1) or not (0). For certain databases, e.g. SwissProt, it is possible to have a local reference file from which full text information can be taken for a Protein View report.
11. AccessionParseRule: Index of the regular expression in the PARSE section that can be used to parse an accession string from a FASTA file title line.
12. DescriptionParseRule: Index of the regular expression in the PARSE section that can be used to parse a description string from a FASTA file title line.
13. AccessionRefParseRule: Index of the regular expression in the PARSE section that can be used to parse an accession string from a local full text reference file. If there is no local r
180. hen submitting a Mascot search. The HTML form executes a utility from Thermo for generating a peak list from a centroided raw file. For more details, see the Mascot HTML help page http://www.matrixscience.com/help/instruments_xcalibur.html.
EXTRACT
The script supplied with Mascot expects to find the executable in the path C:\Program Files\Thermo\ExtractMSn\ExtractMSn.exe. If the program is in another directory (e.g. on a 64-bit system it might be in Program Files (x86)), or if the executable has a different name, open the file mascot\cgi\lcq_dta_shell.pl in an editor such as Notepad and modify the following line:
my $lcqExe = "C:\\Program Files\\Thermo\\ExtractMSn\\ExtractMSn.exe";
Note that the backslashes used as directory delimiters must be entered in pairs, exactly as shown above.
The script needs to create temporary files, and it uses the directory C:\TEMP. If this does not exist, you should either create it or change the following line to point to a suitable temporary directory:
my $tempDir = "c:\\temp";
To use the lcq_dta form, enter the filename and any other parameters required, and press the Generate DTA Files button. After a few seconds, the Mascot search screen will be displayed. Enter search parameters and proceed as normal.
Multiple web sites
Using IIS on Server versions of Windows, it is possible to create multiple web sites. If multiple web sites exist when Mascot is installed, you will be offer
181. her yes if user selected an enzyme, or no if user selected enzyme type None.
14. User IP address
At the top of each column is a checkbox and a radio button. Select the radio button to sort the display on that column. Uncheck the checkbox to hide that column.
Along the top of the screen are a series of controls. The Sort/filter button updates the display to reflect changes in parameters. If you have multiple log files, a specific file can be displayed by entering its path into the Log File text field. Start can be used to page through a long listing, in blocks of entries specified by the number in the following field. Setting Start to -1 displays the list starting from the last entry in the log file, rather than the first.
Finally, there is a field to specify a path to the data files. The log file only contains a relative path. If the data files have been moved, possibly to an archive directory or CD-ROM, the path to the new location can be specified here, so as to restore the validity of the relative path.
An example of the Status display, filtered to show MS/MS searches of NCBInr, is shown below.
(Screenshot: MASCOT search log display after filters, listing MS/MS searches of NCBInr.)
182. imum number of occurrences. The expression \{m\} matches exactly m occurrences of the preceding BRE, \{m,\} matches at least m occurrences, and \{m,n\} matches any number of occurrences between m and n, inclusive.
For example, in the string abababccccccd, the BRE c\{3\} is matched by characters seven to nine, the BRE \(ab\)\{4\} is not matched at all, and the BRE c\{1,3\}d is matched by characters ten to thirteen.
The behaviour of multiple adjacent duplication symbols produces undefined results.
Expression Anchoring
A BRE can be limited to matching strings that begin or end a line; this is called anchoring. The circumflex and dollar sign special characters will be considered BRE anchors in the following contexts:
A circumflex (^) is an anchor when used as the first character of an entire BRE. The circumflex will anchor the expression to the beginning of a string; only sequences starting at the first character of a string will be matched by the BRE. For example, the BRE ^ab matches ab in the string abcdef, but fails to match in the string cdefab.
A dollar sign ($) is an anchor when used as the last character of an entire BRE. The dollar sign will anchor the expression to the end of the string being matched, not including a final newline character, if present.
A BRE anchored by both ^ and $ matches only an entire string. For example, the BRE ^abcdef$ matches strings consisting only of abcdef.
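The interval and anchoring examples above can be checked with Python's re module. Python uses extended syntax, so the BRE forms c\{3\}, \(ab\)\{4\} and c\{1,3\}d are written c{3}, (ab){4} and c{1,3}d; this translation is ours, not part of the BRE definition. span_1based is a hypothetical helper that reports matches with the same 1-based character numbering as the text.

```python
import re
from typing import Optional, Tuple

TEXT = "abababccccccd"


def span_1based(pattern: str, text: str) -> Optional[Tuple[int, int]]:
    """Return the 1-based (first, last) character positions of the first match."""
    m = re.search(pattern, text)
    return None if m is None else (m.start() + 1, m.end())


print(span_1based(r"c{3}", TEXT))  # (7, 9)
print(span_1based(r"(ab){4}", TEXT))  # None
print(span_1based(r"c{1,3}d", TEXT))  # (10, 13)
# Anchoring: ^ab matches at the start of abcdef but not in cdefab.
print(span_1based(r"^ab", "abcdef"), span_1based(r"^ab", "cdefab"))  # (1, 2) None
print(bool(re.search(r"^abcdef$", "abcdef")), bool(re.search(r"^abcdef$", "xabcdef")))  # True False
```

As in the text, Python's $ also matches just before a final newline, so ^abcdef$ matches "abcdef\n" as well.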
183. iation by opening a command window and entering:
ftype Perl
ActiveState Marketing Requirements
The following statements are included to comply with the ActiveState Redistribution Agreement:
Commercial support for ActivePerl is available through ActiveState at http://www.activestate.com/enterprise-edition. For peer support resources for ActivePerl issues, see http://community.activestate.com/forums/activeperl-support.
The ActiveState Repository has a large collection of modules and extensions in binary packages that are easy to install and use. To view and install these packages, use the Perl Package Manager (PPM), which is included with ActivePerl.
ActivePerl is the up-to-date, quality-assured ActivePerl binary distribution from ActiveState. Current releases and other professional tools for open source language developers are available at http://www.activestate.com.
Mascot Installation
From My Computer or Windows Explorer, select the Mascot CD and double-click on setup32.exe for 32-bit or setup64.exe for 64-bit. Before the installation of Mascot begins, the required Microsoft Visual C++ libraries will be installed. The following window will be displayed:
(Screenshot: Mascot Server Setup, "Welcome to the Mascot Server Setup Wizard". Click Next to continue or Cancel to exit the Setup Wizard.)
184. if a fatal error occurs, edit the registry key HKEY_LOCAL_MACHINE\Software\Microsoft\DrWatson and set the value of VisualNotification to 0. When the Mascot node service starts on a Windows system, it sets a Dr Watson registry entry to ensure that Dr Watson log files are written to the node logs directory.
Registry Settings
Two registry entries are used on each search node to record the root directory of the Mascot file structure and the port number used for communication. For example:
HKEY_LOCAL_MACHINE\SOFTWARE\MatrixScience\Mascot\1.00
MascotNodeFolder C:\mascotnode\bin
MascotNodePort 5001
Very large Mascot clusters
Very large clusters (> 30 nodes) pose certain special problems:
• Even with reliable hardware, node failures can be expected relatively frequently.
• LAN communication can become a bottleneck.
• Need to avoid mixing processors with different speeds, because the slower processors become a bottleneck.
Mascot allows large clusters to be divided into sub-clusters. Each sub-cluster uses identical databases and configuration files, but operates independently of the other sub-clusters. An incoming search can be directed to a specific sub-cluster or to the first available sub-cluster. Should a node go down, only that sub-cluster is affected. Ideally, there will be one or more spare nodes defined; Mascot will reconfigure the sub-cluster using a spare node and re-start. If there are no spare nodes
185. indows Firewall. Choose Windows Firewall, then Advanced Settings. Select Inbound Rules in the left-hand panel and New Rule in the action panel. In the wizard, choose Port > Next > TCP > Specific Local Ports: 5001 > Next > Allow Connection > Next > clear the checkboxes for Domain and Public > Next > enter the name as MascotNodePort5001 > Finish. The new rule will be added to the list of Inbound Rules.
Nodes belonging to a Workgroup
The steps in this section are not required if all the nodes belong to a Windows domain.
For XP and Server 2008, from the Control Panel select Administrative Tools. Choose the Local Security Policy item and double-click on it. Go down the following path: Security Settings > Local Policies > Security Options. In the right-hand panel, select Network access: Sharing and security model for local accounts.
Chapter 11 Cluster Mode 195
(Screenshot: Local Security Settings window, with Security Settings > Local Policies > Security Options selected and the list of security policies shown in the right-hand panel.)
186. ing Mascot on a single multiprocessor server, leave the Enable cluster mode checkbox clear.
The next step is your last opportunity to cancel the installation.
(Screenshot: Mascot Server Setup, "Ready to install Mascot Server". Click Install to begin the installation, Back to review or change any of your installation settings, or Cancel to exit the wizard.)
Copying the program files takes only a few minutes.
(Screenshot: Installing Mascot Server, "Please wait while the Setup Wizard installs Mascot Server". Status: Copying new files.)
Unpacking the SwissProt files takes longer, and a command window will be displayed at this point. Please be patient and don't try to close the command window.
(Screenshot: Completed the Mascot Server Setup Wizard. Click the Finish button to exit the Setup Wizard. You will not be able to perform any searches against a database, e.g. SwissProt, until the status of that database changes to In Use in the Mascot server status page. If the database needs to be compressed by Mascot, this may take some time to complete. A checkbox offers to open the Mascot server status page.)
If you are using Apache, model entries for the Apache configuration file can be found
187. ion, if any, accompanying the Software, provided that the original and each copy is kept in your possession and that your installation and use of the Software does not exceed that allowed by this agreement;
modify the HTML and Perl documents for sole use by yourself in connection with the Software.
3. Restrictions of Use
You may not:
3.1 load the Software into two or more computers at the same time. If you wish to transfer the Software from one computer to another, you must erase the Software from the first system before you install it onto a second system;
3.2 sub-license, assign, rent, lease or transfer the licence or the Software, or make or distribute copies of the Software;
3.3 translate, reverse engineer, decompile, disassemble, modify or create derivative works based on the Software, except as permitted by Law;
3.4 make copies of the Software, except for backup or archival purposes as permitted hereunder;
3.5 use any backup copy of the Software (or allow anyone else to use such copies) for any purpose other than to replace the original copy in the event it is destroyed or becomes defective;
3.6 distribute copies of modified HTML or Perl documents; or
3.7 copy the written materials accompanying the Software, except as provided by this agreement.
4. Title
As licensee, you own only the medium on which the Software is recorded. We shall at all times retain ownership of the Software.
5. Warranty
We warr
188. ions, but note that only Professional and Enterprise support remote desktop. It is advisable to ensure that the latest service pack has been installed. Check the following URL for current information: http://windows.microsoft.com/en-US/windows/downloads/windows-7
The Microsoft web server for Windows 7 is IIS 7.5. By default, this is not installed. To install IIS, from the Control Panel choose Programs and Features > Turn Windows features on or off. Expand the node for Internet Information Services, then follow the configuration notes under the Windows Vista section above.
Web Server
Mascot for Windows is tested with IIS and Apache. The Mascot installation has been fully automated for Microsoft Internet Information Server 5.0 and later. A good starting point for IIS support information is http://www.iis.net.
IMPORTANT: If you are using IIS 7.x (Vista, Server 2008, Windows 7), you must configure it as described in the Windows Vista section above before proceeding with the installation. Otherwise, the Perl and Mascot installations will fail.
If IIS is configured as a secure server (SSL/TLS), you must change it temporarily to non-secure mode (http on port 80). Once the installation is complete, you can change back to secure mode.
If you wish to use Apache as your web server, you will need to perform some manual configuration, as described in Appendix D.
IIS 6.0 and later
Perl
If a searc
189. istribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.
We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations.
Finally, any free program is threatened constantly by software patents. We
190. istribution and modification are not covered by this License; they are outside its scope. The act of running a program using the Library is not restricted, and output from such a program is covered only if its contents constitute a work based on the Library (independent of the use of the Library in a tool for writing it). Whether that is true depends on what the Library does and what the program that uses the Library does.
1. You may copy and distribute verbatim copies of the Library's complete source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and distribute a copy of this License along with the Library.
You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Library or any portion of it, thus forming a work based on the Library, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions:
a) The modified work must itself be a software library.
b) You must cause the files modified to carry prominent notices stating that you changed the files and the date of any change.
c) You must cau
191. itions for copying, distributing or modifying the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License.
End User Licence Agreements xi
7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is int
192. Figure: Mascot Status screen, showing the SwissProt database (SwissProt_2012_03.fasta) at the stage "Creating compressed files, 20% complete".

If an error occurs, use the links to the monitor log and the error log to investigate the cause. If all is well, you will see the following messages displayed on the status line for SwissProt:

Creating compressed files
Running 1st test
First test just run OK
Trying to memory map files
Just enabled memory mapping
In Use

You can begin exploring and using Mascot. However, do not try to run searches or view results reports until the relevant sequence database is In Use.

Windows Firewall

If Windows Firewall is enabled, you may be blocked from accessing the Mascot server from other computers. If so, you need to open up port 80. From the Control Panel, choose Windows Firewall. In the case of Windows XP, on the Exceptions tab, check the box for World Wide Web Services (HTTP).
193. l Setup

1. Choose a suitable location for the database files. The default location for database directories is under the Mascot sequence directory, but database files can be located on any local drive. If you decide to put the files in a different location, you will need to change the path in step 3. Create a directory called NCBInr and, under this, create three directories called incoming, current and old.

2. Unpack the files from the DVD archive.

Linux: Unpack the files using gzip and tar. If the Databases DVD is mounted as /mnt/dvdrom, typical command lines would be:

cd /usr/local/mascot/sequence
gzip -dc /mnt/dvdrom/NCBInr_20120419.fasta.gz > NCBInr/current/NCBInr_20120419.fasta
cd ../taxonomy
gzip -dc /mnt/dvdrom/taxonomy.tar.gz | tar xvf -

Windows: Many people will prefer to use a graphical utility such as WinZip or 7-Zip to unpack these archives. Make sure you use a recent version that can cope with files larger than 4 GB. Extract NCBInr_20120419.fasta.gz into NCBInr\current, and extract the files in taxonomy.tar.gz into the Mascot taxonomy directory. It isn't sufficient just to decompress the archive to taxonomy.tar; make sure you extract the files inside the tar archive.

Figure: 7-Zip File Manager, showing taxonomy.tar inside taxonomy.tar.gz.
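The same two steps (directory layout, then unpacking) can be scripted. The following is a minimal sketch in Python, offered as an alternative to the command lines above rather than as part of Mascot; it uses only the standard gzip, shutil and tarfile modules, and the function names are hypothetical. Note that extract_tar_gz unpacks the files inside the tar archive, which is the step that is easy to miss with graphical tools.

```python
import gzip
import shutil
import tarfile
from pathlib import Path
from typing import List


def make_database_dirs(sequence_dir: Path, db_name: str) -> Path:
    """Create <sequence_dir>/<db_name>/{incoming,current,old} and return <sequence_dir>/<db_name>."""
    db_dir = sequence_dir / db_name
    for sub in ("incoming", "current", "old"):
        (db_dir / sub).mkdir(parents=True, exist_ok=True)
    return db_dir


def gunzip_file(src: Path, dest: Path) -> None:
    """Decompress a .gz file, the equivalent of: gzip -dc src > dest."""
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams in chunks, so multi-GB files are fine


def extract_tar_gz(src: Path, dest_dir: Path) -> List[str]:
    """Extract the files inside a .tar.gz, not merely decompress it to a .tar."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(src, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: refuse absolute or '..' paths.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
        return tar.getnames()
```

With the paths from the Linux example, the calls would be make_database_dirs(Path("/usr/local/mascot/sequence"), "NCBInr"), then gunzip_file on /mnt/dvdrom/NCBInr_20120419.fasta.gz with a destination inside NCBInr/current, then extract_tar_gz on /mnt/dvdrom/taxonomy.tar.gz with /usr/local/mascot/taxonomy as the destination.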
194. lay the file contents as raw text.

Installation: Linux

Release Notes

Mascot 2.4 is compiled for 32-bit and 64-bit Linux. Refer to the release notes for last-minute additions to documentation, and to the Matrix Science web site support page for patches and known issues: http://www.matrixscience.com/mascot_support.html

Cluster Mode

If you have a licence to run Mascot on multiple processors and plan to do so on a networked cluster of machines, please familiarise yourself with the material in Chapter 11, Cluster Mode, before proceeding with the installation.

System Requirements

Web Server

Mascot is compatible with most web servers. Appendix D provides configuration information for Apache. If a web server is being installed for the first time in connection with the installation of Mascot, it is essential to verify that it is serving documents correctly before attempting to install Mascot.

Perl

Mascot requires Perl. Perl 5.14 is recommended; Perl 5.8, 5.10 and 5.12 are also supported. Mascot scripts assume that Perl can be found at /usr/local/bin/perl. If Perl is installed in a different path, just add a symbolic link:

ln -s /actual/location/of/perl /usr/local/bin/perl

If any library modules are missing, this will be identified during the installation procedure. Binary packages of Perl and most Perl modules are available for most Linux distributions. The mechanism for downl
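If the symbolic link is to be created by an installation script rather than by hand, a Python sketch of the same check might look like this. It assumes a Linux host and write permission on /usr/local/bin (normally root); ensure_perl_link is a hypothetical helper, and shutil.which simply locates the first perl on the PATH when no explicit location is given.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def ensure_perl_link(link: Path = Path("/usr/local/bin/perl"),
                     actual: Optional[str] = None) -> Path:
    """Make sure Perl is reachable at `link`, creating a symbolic link if needed.

    Equivalent to: ln -s <actual> <link>, but only when `link` does not exist.
    """
    if link.exists():
        return link  # Perl, or a working link to it, is already in place
    if link.is_symlink():
        raise FileExistsError(f"{link} is a broken symbolic link; remove it first")
    actual = actual or shutil.which("perl")  # default: the first perl on PATH
    if actual is None:
        raise FileNotFoundError("no perl executable found on PATH")
    link.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(actual, link)
    return link
```

Calling it a second time is harmless: once the link exists, the function returns without changing anything.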
195. le

Windows Manual Configuration

The following configuration steps on each search node are performed automatically as part of the Windows installation.

MascotNodeService

Under Windows, ms-mascotnode.exe is configured to run as a service. This should be taken care of automatically. If there are any problems, service creation or deletion requires the Microsoft utility sc.exe, which can be found in the mascot\cluster\Windows_NT directory. The command to create the service is:

sc create MascotNodeService type= own binpath= c:\mascotnode\bin\ms-mascotnode.exe start= auto

You may need to change the path to the executable. Note that the spaces after the equals signs are significant.

To verify that the service has been created successfully, open the Services control panel from the Control Panel and choose MascotNodeService. Select Startup and the following dialog should be displayed:

Figure: Matrix Science Mascot Service Properties (Local Computer) dialog, Log On tab.

Chapter 11 Cluster Mode 211

To delete the service, first stop it, close the Services control panel, then enter:

sc delete MascotNodeService

Dr Watson

To prevent invisible dialog boxes from being displayed
196. executable form under the terms of Sections 1 and 2 above provided that you accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange.

If distribution of object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place satisfies the requirement to distribute the source code, even though third parties are not compelled to copy the source along with the object code.

5. A program that contains no derivative of any portion of the Library, but is designed to work with the Library by being compiled or linked with it, is called a "work that uses the Library". Such a work, in isolation, is not a derivative work of the Library, and therefore falls outside the scope of this License.

However, linking a "work that uses the Library" with the Library creates an executable that is a derivative of the Library (because it contains portions of the Library), rather than a "work that uses the library". The executable is therefore covered by this License. Section 6 states terms for distribution of such executables.

When a "work that uses the Library" uses material from a header file that is part of the Library, the object code for the work may be a derivative work of the Library even though the source code is not. Whether this is true is especia
197. le loading taxonomy nodes
Parameters: messages (more detailed error information)
Failed to register job. Please inspect mascot error log
A POST request is submitted with zero content length
Cannot find boundary string
First line was not a boundary
Corrupted input, possibly a binary file is submitted
Corrupted input or incompatible browser
Invalid accession format for ms-gettaxonomy.exe
Too large POST request
Invalid taxID format for ms-gettaxonomy.exe
Standard input stream error
Parameters: bytesread (number of bytes already read), lengthofdata (total size of input data in the stream)

Chapter 7 Program Reference 133

Non-fatal errors
461 Sequence not found
Parameters: accession (accession string)
470 Cannot find taxonomy id
Parameters: accession (accession string; empty if non-fatal error, can be non-empty only in warning section for accession requests), taxid (taxonomy id)

Warnings that are only reported at the end of the XML document
400 Missing or invalid gencode id. Table 1 is used for translation
Parameters: accession (accession string; empty if non-fatal error, can be non-empty only in warning section for accession requests), taxid (taxonomy id)
470 Cannot find taxonomy id
Parameters: accession (accession string; empty if non-fatal error, can be non-empty only in warning section for accession requests), taxid
198. leRelPath ../data
MascotCmdLine ../cgi/nph-mascot.exe
MascotControlFile ../data/mascot.control
MascotJobIdFile ../data/mascot.job
MascotNodeControlFile ../data/mascotnode.control
MonitorLogFile ../logs/monitor.log
SearchLogFile ../logs/searches.log
TestDirectory ../data/test
UniqueJobStartNumber 001234

These entries determine local paths, not URLs. ErrorLogFile, MascotCmdLine, MonitorLogFile, SearchLogFile and TestDirectory are self-explanatory.

GetSeqJobIdFile contains the next available job number for the ms-getseq.exe utility. These numbers wrap around at 999 and do not appear in the search logs. If this file is deleted, the next job number will be reset to 1 and a new jobId file created automatically.

Mascot output files are written to a path given by:

InterFileBasePath/yyyymmdd/Fnnnnnn.dat

where yyyymmdd is the current ISO date and nnnnnn is a sequential job number with a minimum of 6 digits. The path is split into a base path and a relative path (as seen by the CGI scripts), so that the search engine can pass a file path to, say, master_results.pl as:

InterFileRelPath/yyyymmdd/Fnnnnnn.dat

TestDirectory contains the input files used by Monitor to test new sequence databases.

MascotControlFile contains critical internal parameters. This file must be memory mapped and locked to provide interprocess communication between different Mascot components.

MascotNodeCo
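As a worked example of the Fnnnnnn.dat naming rule above, this Python sketch builds the relative path that would be passed to master_results.pl. The prefix value ../data is an assumption for illustration (the actual value comes from InterFileRelPath in mascot.dat), and result_file_path is a hypothetical helper.

```python
from datetime import date
from pathlib import PurePosixPath


def result_file_path(prefix: str, job_number: int, day: date) -> str:
    """Build <prefix>/yyyymmdd/Fnnnnnn.dat for a search run on `day`.

    The job number is zero-padded to at least 6 digits; longer numbers are kept whole.
    """
    folder = day.strftime("%Y%m%d")    # ISO basic date, e.g. 20120421
    name = f"F{job_number:06d}.dat"    # e.g. F001234.dat
    return str(PurePosixPath(prefix, folder, name))
```

For example, job 1234 run on 21 April 2012 gives ../data/20120421/F001234.dat, while job 1234567 gives F1234567.dat, because 6 digits is a minimum rather than a fixed width.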
199. lements within msgt:tree is essential.
• In msgt:tree, the root element is not listed, but is always assumed.
• The msgt:translation_table_id element may not be available.
• Any of the elements msgt:db_entry, msgt:tax_from_id can be missing or repeated several times, depending on the request.

Error messages

All errors have unique codes and are logged to both the XML output and the Mascot error log, but only the first 10 instances of any particular error number. The XML output contains a full set of error messages in a structured format that can be processed automatically.

Fatal errors (no database entry is going to be retrieved)
403 Error while reading mascot.dat
Parameters: errstring (error message as generated by ms_parser)
463 db parameter is missing
465 POST request to ms-gettaxonomy is empty
440 Invalid session or session ID
Parameters: errstring (error message as returned by security objects)

Remaining fatal error codes: 443, 27, 251, 469, 462, 460, 270, 55, 56, 259, 72, 466, 468, 467, 54
Not allowed to search the database
Parameters: db (database name that was requested)
Database is not available or not active
Parameters: db (database name that was requested)
No taxonomy indexes for this database
Parameters: db (database name that was requested)
Failed to load species file
Parameters: messages (more detailed error message)
One or more errors happened whi
200. les. On the search screen, find out what caused the error by clicking on the Error log link, fix the fault (possibly out of disk space) and then click on retry.

Validation

CGI Operation

To verify that the search engine is functioning correctly when executed as a CGI application, launch a JavaScript-aware web browser and load the Mascot home page, http://your_server/mascot/. Select Mascot from the main menu and then choose the Peptide Mass Fingerprint link near the top of the page. This will load the search form for a peptide mass fingerprint. Enter your name and email address into the fields at the top of the form and type a number, say 1234, into the Query field. Then press the Start Search button.

The search form will be replaced by the search progress screen. This has a few lines of text at the top, ending in the line "Searching". Additional lines will appear showing the percentage of the search that has been completed. Once the search is complete, the Master Results page will appear. Unless you went to the trouble of entering some real mass values, the results will be meaningless.

Monitor Test

When Mascot Monitor is started, it runs a test search against each sequence database. It also runs this same test search against any update to the database as part of the exchange procedure. If the test search fails, an error message will be displayed in the Mascot Stat
201. list.txt. If you edit this file while Mascot is not running, these values can be deleted:
• sub-cluster ID number (0 based)
• node within sub-cluster (0 based)
• status: 0 = unknown status, 1 = attempting to bring into use, 2 = no response to ping, 3 = failed to start service, 4 = in use
• number of CPUs actually being used

File Replication

The configuration files, such as mascot.dat, that are on the Mascot master are automatically replicated to the nodes, so it is only necessary to update a file on the master. The ms-monitor.exe program (run as the Matrix Science Mascot Service under Windows) continually checks whether a file has been updated and distributes new versions to the nodes as required. The dates, times and lengths of the distributed files should be identical on all systems.

The same process is used for updates to executable programs, except that these updates are only made when the ms-monitor.exe service first starts. The Status screen will indicate if any executable files need updating.

Files required on each Mascot node (target file name and directory, relative to the node home directory, followed by any notes):
bin/ms-mascotnode.exe: updated at start-up
bin/nph-mascot.exe
config/enzymes
config/mascot.dat: updated at start-up
config/unimod.xml
config/mascot.license: updated at start-up
config/taxonomy
config/fragmentation_rules
config/quantitation.xml
taxonomy/nodes.dmp
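Because the distributed files should have identical dates, times and lengths on every system, a quick consistency check can compare the master copy with a node copy. This Python sketch is not part of Mascot: it assumes the node directory is reachable from where the script runs (for example, through a network share), compares sizes exactly and modification times to the whole second, and uses hypothetical function names.

```python
from pathlib import Path
from typing import Iterable, List


def replica_matches(master_copy: Path, node_copy: Path) -> bool:
    """True if the node copy has the same length and modification time as the master copy.

    Times are compared to the whole second, on the assumption that a node's
    file system may not preserve sub-second timestamps.
    """
    if not node_copy.exists():
        return False
    m, n = master_copy.stat(), node_copy.stat()
    return m.st_size == n.st_size and int(m.st_mtime) == int(n.st_mtime)


def stale_files(master_root: Path, node_root: Path, relative_paths: Iterable[str]) -> List[str]:
    """Relative paths (e.g. 'config/mascot.dat') whose node copy is missing or differs."""
    return [p for p in relative_paths
            if not replica_matches(master_root / p, node_root / p)]
```

An empty list from stale_files means every listed file on that node matches the master.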
202. will be similar in spirit to the present version, but may differ in detail to address new problems or concerns.

Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation.

10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally.

NO WARRANTY

11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A P
203. string, e.g. STY. If there was a neutral loss, the delta mass is given by the value of FixedModNeutralLoss1.

FixedModn=delta,Name
FixedModResiduesn=residues (A to Z, C_term or N_term)
FixedModNeutralLossn=mass

Fixed modifications cannot have peptide neutral losses or multiple neutral losses, and cannot be protein terminal or residue terminal. In all these cases, fixed modifications are automatically converted into variable ones.

Variable modifications are reported in delta1, delta2, etc. Each entry defines the difference in mass introduced by the modification, together with the name of the modification, separated by a comma. If a variable modification suffers a neutral loss on fragmentation, the delta is specified by a NeutralLossn entry. By definition, this is always a master neutral loss. If there are multiple neutral losses, then two more lines appear:

NeutralLossn_master=mass,mass
NeutralLossn_slave=mass,mass

The first neutral loss, defined by NeutralLossn, has an implicit index number of 1. Any additional neutral losses, defined by NeutralLossn_master followed by NeutralLossn_slave, have implicit index numbers of 2 and up.

If a modification includes a required or optional neutral loss from the precursor, this is recorded as follows:

ReqPepNeutralLossn=mass,mass
PepNeutralLossn=mass,mass

Error tolerant modifications are not listed in the masses section.

Quantitation

gc0p4Jq0M2Yt08jU534c0
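To illustrate the deltan layout described above, here is a minimal Python sketch that splits one entry into its index, mass and modification name. It assumes each entry is written as key=value, with the mass and name separated by the first comma; parse_delta is a hypothetical helper and the example entry is illustrative only.

```python
from typing import Tuple


def parse_delta(line: str) -> Tuple[int, float, str]:
    """Split a masses-section entry such as 'delta1=15.994915,Oxidation (M)'.

    Only the first comma separates mass from name, so any later commas stay in the name.
    """
    key, sep, value = line.strip().partition("=")
    if not sep or not key.startswith("delta"):
        raise ValueError(f"not a deltan entry: {line!r}")
    index = int(key[len("delta"):])        # 'delta12' -> 12
    mass_text, _, name = value.partition(",")
    return index, float(mass_text), name
```

Entries such as NeutralLossn use a different key and are rejected rather than misread.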
204. following files would be created for the database SwissProt_2012_03:

SwissProt_2012_03.a00
SwissProt_2012_03.i00
SwissProt_2012_03.s00
SwissProt_2012_03.stats
SwissProt_2012_03_NoTaxonomyMatch.txt
SwissProt_2012_03.t00

The final two files are only created if taxonomy is specified in the database configuration. Compressed files are in a proprietary format, which is unlikely to be useful for other applications.

3. If a serious error occurs while creating these files, then the conversion to the new database stops, an error is put into the error log and, optionally, the error message is emailed to the administrator. Also, if the status screen is shown, the existence of the error is shown on that screen. Searches on the existing database will continue until the problem is resolved.

4. A test search is performed on the new database. The test uses the appropriate file in the data/test directory. If the test is successful, then a file with the name <database_name><unique hash key>.fasta.testedOk is put into the data/test directory. If the test fails, then an error is put into the error log and, optionally, the error message is emailed to the administrator. Also, if the status screen is shown, the existence of the error is shown on that screen. Searches on the existing database will continue until the problem is resolved.

5. Any new searches submitted by users will now use the new database
205. especially significant if the work can be linked without the Library, or if the work is itself a library. The threshold for this to be true is not precisely defined by law.

If such an object file uses only numerical parameters, data structure layouts and accessors, and small macros and small inline functions (ten lines or less in length), then the use of the object file is unrestricted, regardless of whether it is legally a derivative work. (Executables containing this object code plus portions of the Library will still fall under Section 6.)

Otherwise, if the work is a derivative of the Library, you may distribute the object code for the work under the terms of Section 6. Any executables containing that work also fall under Section 6, whether or not they are linked directly with the Library itself.

6. As an exception to the Sections above, you may also combine or link a "work that uses the Library" with the Library to produce a work containing portions of the Library, and distribute that work under terms of your choice, provided that the terms permit modification of the work for the customer's own use and reverse engineering for debugging such modifications.

You must give prominent notice with each copy of the work that the Library is used in it and that the Library and its use are covered by this License. You must supply a copy of this License. If the work during execution displays copyright notices, you must inc
206. followed by a list of the residues that identify the cleavage site. Optionally, a block can include a line starting with the keyword Restrict, followed by a list of the residues which prevent cleavage if present adjacent to the potential cleavage site. Finally, the block must include either the keyword Cterm or Nterm to define whether cleavage occurs on the C-terminal or N-terminal side of the specified residues.

This syntax can be extended to support multiple cleavage specificities, enabling enzyme mixtures to be modelled, or mixed C-term and N-term cutters. This is achieved by appending zero-based index numbers in square brackets to the keywords Cleavage, Restrict, Cterm and Nterm. For example:

Title:CNBr+Trypsin
Cleavage[0]:M
Cterm[0]
Cleavage[1]:KR
Restrict[1]:P
Cterm[1]
Independent:0

The use of index numbers is optional when only one specificity is defined, but required when there are multiple specificities, as in this example. For a definition with multiple specificities, if the keyword Independent appears and is given a value of 1, this means that the specificities should be treated as if independent digests had been performed on separate sample aliquots and the resulting peptide mixtures combined. Thus, any given peptide will conform to the specificity of one cleavage type only. In the case of CNBr+Trypsin, if Independent was set to 1, you would not find any peptides resulting from cleavage after K or R at one end
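The effect of the Cleavage, Restrict and Cterm/Nterm keywords, and of Independent:1, can be sketched in Python. This is a simplified illustration, not Mascot's digest code: it produces complete digests only (no missed cleavages, no semi-specific peptides), does not model Independent:0, and the function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def digest(sequence: str, cleave: str, restrict: str = "", cterm: bool = True) -> List[str]:
    """Completely cleave `sequence` with one specificity.

    cleave:   residues that identify the cleavage site (the Cleavage line)
    restrict: residues that block cleavage when adjacent to the site (the Restrict line)
    cterm:    True for Cterm (cut after the residue), False for Nterm (cut before it)
    """
    if not sequence:
        return []
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue not in cleave:
            continue
        cut = i + 1 if cterm else i
        if cut <= start or cut >= len(sequence):
            continue  # no cut at the protein termini
        # The residue on the far side of the cut is the one Restrict refers to.
        neighbour = sequence[cut] if cterm else sequence[cut - 1]
        if neighbour in restrict:
            continue
        peptides.append(sequence[start:cut])
        start = cut
    peptides.append(sequence[start:])
    return peptides


def independent_digest(sequence: str, specificities: Iterable[Tuple]) -> Set[str]:
    """Independent:1, pooling the peptides of separate digests, one per specificity."""
    pooled: Set[str] = set()
    for spec in specificities:
        pooled.update(digest(sequence, *spec))
    return pooled
```

For the CNBr+Trypsin example, digest(seq, "M") models Cleavage[0]:M with Cterm[0], and digest(seq, "KR", "P") models Cleavage[1]:KR, Restrict[1]:P and Cterm[1]. independent_digest pools the two lists, so no pooled peptide has an M cut at one end and a K or R cut at the other.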
207. include the copyright notice for the Library among them, as well as a reference directing the user to the copy of this License. Also, you must do one of these things:

a) Accompany the work with the complete corresponding machine-readable source code for the Library including whatever changes were used in the work (which must be distributed under Sections 1 and 2 above); and, if the work is an executable linked with the Library, with the complete machine-readable "work that uses the Library", as object code and/or source code, so that the user can modify the Library and then relink to produce a modified executable containing the modified Library. (It is understood that the user who changes the contents of definitions files in the Library will not necessarily be able to recompile the application to use the modified definitions.)

b) Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (1) uses at run time a copy of the library already present on the user's computer system, rather than copying library functions into the executable, and (2) will operate properly with a modified version of the library, if the user installs one, as long as the modified version is interface-compatible with the version that the work was made with.

c) Accompany the work with a written offer, valid for at least three years, to give the same user the materials specified in Subsection 6a, above, for a charge no more than the co
208. luster Mascot system limits

The following are relevant to very large clusters. Other system limits are listed in Appendix C of this manual.
• Maximum number of processors per machine: 64
• Maximum number of sub-clusters in a cluster: 50
• Maximum number of machines in a sub-cluster: 1024
• Maximum number of processors in a sub-cluster: 65536
• Maximum number of nodes in nodelist.txt: 4096

Directing jobs to a sub-cluster

The SUBCLUSTER search parameter is used to direct jobs to a sub-cluster. This can be added to the web browser search form as a hidden field by editing the Perl script.

To use the next free sub-cluster:

SUBCLUSTER=-1

If all of the sub-clusters have searches running and the search has been submitted from a browser, then the following will be displayed in the browser until a sub-cluster becomes free:

Waiting for sub-cluster to become available

To use a specific sub-cluster, e.g. sub-cluster 2:

SUBCLUSTER=2

The default value is 0, so if this parameter is not specified, a search will go to the first sub-cluster.

Specifying which sub-cluster a particular job goes to usually implies that some third-party job queuing system is being used. For example:
• The job gets submitted to PBS, and PBS decides which sub-cluster to run the search on.
• PBS adds SUBCLUSTER=x to the search parameters.
• PBS creates a task_id using the create_task_id option of ms-searchcontrol.exe.
• PBS submits the search, passing the
209. m specific executables for distribution to the nodes in a cluster
config contains configuration files
data contains Mascot results files. By default, a new sub-directory is created for each day's results files. The name of each sub-directory is that day's date in ISO format (yyyymmdd).
html is the root directory for documents
logs contains search and error logs, etc.

Figure 2-1: Mascot Directory Structure (key: not mapped to a URL; mapped to a URL; mapped and executable)

Chapter 2 Installation Linux 9

sequence contains a sub-directory for each FASTA database. As illustrated, for each database there are 3 sub-directories to organise the FASTA files into new downloads (incoming), active databases (current) and the most recently replaced files (old).
sessions contains security session files
taxonomy contains taxonomy resources
210. symbols and non-alphanumeric characters should be URL encoded using the %nn notation. A reminder to Windows users: do not use backslashes as path delimiters, because these will be interpreted as escape characters.

Most parameters are entered as literal strings, with two exceptions. ACCESSION is a place holder that will be replaced by an actual accession string. FRAME is a place holder that will be replaced by the number of the reading frame used to translate a nucleic acid sequence. Obviously, this last parameter is only used with NA databases.

The syntax for calling ms-getseq.exe is described in Chapter 7. In the examples shown above, the full text report for Trembl is taken from an external URL because the full text file for Trembl is huge (40 GB). The default configuration for SwissProt uses a local full text reference file.

Processors

Mascot licensing is physical CPU (socket) based. For each CPU covered by the licence, Mascot will fully utilise up to 4 logical processors or cores. If the number of processors available is the same as the number licensed, then it is best not to include a PROCESSORS section. You can include one if you wish, but this may have a negative impact on system performance. If the number of processors available is greater than the number licensed, you can use a PROCESSORS section to force specific cores to be used. Logical processor (core) numbers generally star
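Two pieces of arithmetic from this section, sketched in Python: the number of cores a licence lets Mascot fully use (up to 4 logical processors per licensed CPU), and %nn encoding of a string for use in a URL entry. The function names are hypothetical; url_encode relies on urllib.parse.quote from the Python standard library, which leaves the unreserved characters _ . - ~ unencoded.

```python
from urllib.parse import quote


def max_fully_used_cores(licensed_cpus: int, available_cores: int) -> int:
    """Cores Mascot can fully use: at most 4 logical processors per licensed CPU."""
    return min(available_cores, 4 * licensed_cpus)


def url_encode(value: str) -> str:
    """Encode spaces and symbols as %nn (quote leaves _ . - ~ and alphanumerics as-is)."""
    return quote(value, safe="")
```

For example, a 2-CPU licence on a 16-core machine allows 8 cores to be fully used, which is the situation where a PROCESSORS section becomes worth considering; on a 6-core machine the same licence covers all 6.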
211. me these values set the limits on the amount the mass is allowed to increase (MaxEtagMassDelta) or decrease (MinEtagMassDelta) in order to reach the first available cleavage point.

MaxEtVarMods 2
The maximum number of variable mods allowed in the first pass of an automated error tolerant search (global default; can be overridden for a group in security).

MaxNumPeptides
The maximum number of peptides that can be expected from the enzymatic digest of a single entry. The default is MaxSequenceLen/4.

MaxPepNumVarMods 5
The maximum number of variable mods allowed for a PMF.

MaxQueries 10000
The maximum number of MS/MS spectra allowed in a single search. Note that the maximum number of mass values in a PMF is hard coded to 1000.

MaxSearchesPerUser 0
Sets the maximum number of concurrent searches from a single IP address. A value of 0 means no limit (global default; can be overridden for a group in security).

MaxSequenceLen 50000
The maximum length of a database entry in characters (bases for NA or residues for AA). The default is 50,000. The length of the longest sequence in a database can be found in the stats file created by Mascot Monitor when the database is compressed. The larger the value of MaxSequenceLen, the more memory Mascot uses. So, if you need to increase it, make it just a little greater than the length of the longest sequence. On a 32-bit system, try not to exceed 3 million
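The stats file written by Mascot Monitor is the documented place to find the longest sequence. Where a FASTA file has not yet been compressed, a rough equivalent can be computed directly, as in this Python sketch; it assumes a plain-text FASTA file, and the function names are hypothetical. The second function applies the MaxNumPeptides default of MaxSequenceLen/4, assuming integer division.

```python
from typing import Iterable


def longest_sequence_length(fasta_lines: Iterable[str]) -> int:
    """Length (residues or bases) of the longest entry in a FASTA file."""
    longest = current = 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            longest = max(longest, current)  # close the previous entry
            current = 0
        else:
            current += len(line)             # sequence lines may be wrapped
    return max(longest, current)


def default_max_num_peptides(max_sequence_len: int) -> int:
    """Default MaxNumPeptides, taken as MaxSequenceLen/4 rounded down."""
    return max_sequence_len // 4
```

Passing an open file object (for example, one opened on a FASTA file in the current directory) to longest_sequence_length gives a figure that MaxSequenceLen should just exceed; with the default MaxSequenceLen of 50000, the default MaxNumPeptides works out to 12500.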
212. memory.

Each search node requires sufficient free disk space for the Mascot application software and the compressed FASTA databases. The master also requires sufficient disk space for the original FASTA databases and the accumulating search result files. The amount of space required for the results files depends on how heavily the system is used and how often the files are backed up and deleted.

For best performance, it is advisable for the nodes to have local hard disks. If you prefer to use shared storage, then each node must have its own dedicated directory structure.

Mascot nodes may have any number of processors, but the number of cores in each node should be a multiple of 4 to make maximum use of the number of CPUs in the licence.

A search node does not require a keyboard, monitor or mouse. If you are running Windows on the nodes and want to be able to see the individual desktops, you might consider using a KVM switch so that a single keyboard, monitor and mouse can be shared between all the nodes. Alternatively, Windows Remote Desktop or VNC (http://www.realvnc.com) can be used.

Operating System Requirements

For nodes running Windows, it is not necessary to use a Server version of Windows on the search nodes.

For Linux clusters, it must be possible for the master to communicate with the search nodes using ssh or rsh without quoting a password or passphrase. Search nodes do not requir
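The passwordless ssh requirement above can be checked from the master before Mascot is configured. The Python sketch below is an assumption, not a Mascot tool: it runs ssh with the BatchMode=yes option, which makes ssh fail rather than prompt for a password or passphrase, so a zero exit status means the master can reach the node non-interactively. Node names are hypothetical, and rsh is not covered.

```python
import subprocess
from typing import List


def ssh_check_command(node: str) -> List[str]:
    """Command that succeeds only if `node` accepts ssh without a password prompt."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # never prompt; fail instead
        "-o", "ConnectTimeout=10",  # do not hang on an unreachable node
        node,
        "true",                     # remote command that does nothing and exits 0
    ]


def passwordless_ssh_ok(node: str) -> bool:
    """Run the check; False if ssh is missing, times out or exits non-zero."""
    try:
        result = subprocess.run(ssh_check_command(node), stdin=subprocess.DEVNULL,
                                capture_output=True, timeout=30)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Run from the master as, for example, all(passwordless_ssh_ok(n) for n in ["node2", "node3"]), where the names stand for the actual node host names.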
213. modification is a single base change in the primary sequence, the two mass fields will be set to zero and one of the keywords NA_INSERTION, NA_DELETION or NA_SUBSTITUTION will appear in the description field. The additional parameter hn_qm_na_diff is then used to record the before and after nucleic acid sequences.

If the search includes a quantitation method and the search parameter MULTI_SITE_MODS is set to 1, then a single site can carry two modifications. When this occurs, a second modifications string (e.g. h1_pl_summed_mods) is used to record the additional modification(s).

Ion series is a string of 19 digits representing the ion series:
a, place holder, a++, b, place holder, b++, y, place holder, y++, c, c++, x, x++, z, z++, z+H, z+H++, z+2H, z+2H++

A digit is set to 1 if the corresponding series contains more than just random matches, and to 2 if the series contributes to the score.

Multiplicity means the number of peptide mass matches for a query in a protein.

For each sequence tag, four colon-separated values are output:
1. 1-based tag number
2. 1-based residue position where the tag starts
3. 1-based residue position where the tag ends
4. ion series into which the tag was matched: -1 means no matches for the tag; 0 = a series, single charge; 1 = a-NH3 series, single charge; 2 = a series, double charge; 3 = b series, single charge; 4 = b-NH3
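A Python sketch that decodes the 19-digit ion series string. The order of the digits is reconstructed from the list above and should be checked against a real results file before relying on it; decode_ion_series is a hypothetical helper, and place holder positions are ignored.

```python
from typing import Dict

# Order of the 19 digits, as listed above.
ION_SERIES = [
    "a", "place holder", "a++",
    "b", "place holder", "b++",
    "y", "place holder", "y++",
    "c", "c++", "x", "x++",
    "z", "z++", "z+H", "z+H++", "z+2H", "z+2H++",
]


def decode_ion_series(digits: str) -> Dict[str, int]:
    """Map each non-zero digit to its series: 1 = more than random matches, 2 = contributes to the score."""
    if len(digits) != len(ION_SERIES) or not set(digits) <= set("012"):
        raise ValueError(f"expected 19 digits from 0, 1 or 2, got {digits!r}")
    return {
        name: int(flag)
        for name, flag in zip(ION_SERIES, digits)
        if name != "place holder" and flag != "0"
    }
```

Series with a digit of 0 are simply omitted from the result.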
214. mon
Cluster Mode
Security
Basic Regular Expressions
Error Messages
System Limits
Web Server Configuration

Typographical Conventions

• Filenames, pathnames, directory names, folders, programs and commands are printed in italic fixed pitch font. In Unix, these names are case sensitive. Examples: nph-mascot.exe, /usr/local/httpd
• The contents of text files are printed in fixed pitch font on a grey background. Where it is necessary to break a long line, this is indicated by an indent and a continuation symbol. Example: >owl|100K_RAT 100 KD PROTEIN (EC 6.3.2
• Text omitted from a line is indicated by an ellipsis, while missing lines are indicated by 3 vertical periods. Example: >owl|100K_RAT ... NORVEGICUS (RAT)
• Text which should be entered literally is shown in bold fixed pitch font on a grey background. Example: C:\TEMP> ftp ncbi.nlm.nih.gov
• Control characters are shown in angle brackets, apart from <return>, i.e. carriage return/newline. Example: kill PID
215. Figure: service Properties dialog, Log On tab, with an account name and password entered.

Press OK and you will be returned to the Services dialog.

Figure: Services window, listing Mascot Daemon Service and Matrix Science Mascot Service.
216. …can only be locked by root. Before a "Failed to lock memory for file xxx" error is given, Mascot Monitor will try to increase the amount of RSS available by calling setrlimit(RLIMIT_RSS, xxx) with the current value plus the size of the file to be locked. Under Solaris, the RLIMIT_AS value is used instead (a rather confusing use of "AS" by Sun). If the resource limit cannot be increased, then error M00114 "Error calling setrlimit(RLIMIT_RSS, memory requested). Error: detailed error message" will be put into errorlog.txt. If the memory cannot be locked, then the error M00073 "Failed to lock memory for file file name. Error: detailed text" will be put into the errorlog.txt file.

If Mascot Monitor (ms-monitor.exe) is a 32-bit executable, the 3 or 4 GB limit can quickly be reached by having several large databases locked into memory. To work around this limit, a separate ms-lockmem.exe program is provided; this is fork'd and exec'd from ms-monitor.exe when the flag SeparateLockMem 1 is added to the Options section of mascot.dat.

Physical memory
If the amount of memory locked gets close to the amount of physical memory, the system will grind to a halt. The error M00073 "Failed to lock memory for file file name. Error: detailed text" will also probably be put into the errorlog.txt file.

Data segment size
This amount does not include the space used by memory-mapped files…
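The RSS adjustment described above (current soft limit plus the size of the file to be locked) can be sketched with Python's standard resource module. This is a minimal illustration of the arithmetic only, not Mascot Monitor's code; the helper names proposed_rss_limit and raise_rss_limit_for are hypothetical. Note that the Linux setrlimit(2) man page states that RLIMIT_RSS only had an effect in Linux 2.4.x (x < 30), so treat this as an illustration rather than a working memory-locking recipe.

```python
import os
import resource
from typing import Optional


def proposed_rss_limit(current_soft: int, hard: int, file_size: int) -> Optional[int]:
    """Return the new soft limit needed to lock another file_size bytes.

    Returns None when the soft limit is already unlimited. Raises ValueError
    when the request would exceed the hard limit, which an unprivileged
    process cannot raise.
    """
    if current_soft == resource.RLIM_INFINITY:
        return None
    wanted = current_soft + file_size
    if hard != resource.RLIM_INFINITY and wanted > hard:
        raise ValueError(f"need {wanted} bytes but hard limit is {hard}")
    return wanted


def raise_rss_limit_for(path: str, limit: int = resource.RLIMIT_RSS) -> None:
    """Hypothetical helper: grow the given soft limit by the size of path."""
    soft, hard = resource.getrlimit(limit)
    new_soft = proposed_rss_limit(soft, hard, os.path.getsize(path))
    if new_soft is not None:
        resource.setrlimit(limit, (new_soft, hard))
```

Because the text says Solaris uses RLIMIT_AS rather than RLIMIT_RSS, the sketch takes the resource as a parameter, so the same logic could be pointed at resource.RLIMIT_AS.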
217. [Screenshot: Local Security Settings policy list, including the Network security, Recovery console and Shutdown policies.]

If the current setting is Guest only, double-click on the item to change the setting.

[Screenshot: "Network access: Sharing and security model for local accounts" dialog, offering "Classic - local users authenticate as themselves" and "Guest only - local users authenticate as Guest".]

Select "Classic - local users authenticate as themselves" and press OK. Close the Local Security Settings window.

For Vista, Server 2008 and Windows 7, a registry change is required to allow administrator rights when logging in using a local SAM account. This procedure is taken from Microsoft KB article 951016.
1. Click Start, click Run, type regedit, and then press ENTER. If the Start menu does not have a Run option, open a Command Prompt window from the Accessories program folder and use this instead.
2. Locate and then click the following registry subkey:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System
3. If the LocalAccountTokenFilterPolicy registry entry does not exist…
218. …on the same system as Mascot Daemon, Voyager DAT files can be processed. If a copy of ExtractMsn.exe (or similar) is installed on the same system as Mascot Daemon, Thermo Xcalibur RAW files can be imported. A utility called TS2Mascot can be used to import peak lists from an AB SCIEX 4000/5000 series database.

Several Mascot Daemon clients can submit searches to a single Mascot Server, and can even share a common task database. If you have several mass spectrometers, you can choose whether to install separate copies of Daemon on each instrument data system, or to have a single copy of Daemon somewhere on the LAN marshalling searches for all instruments.

User Help
Mascot Daemon includes comprehensive, context-sensitive online help. Press F1 at any time to jump to the relevant topic.

Installation
After Mascot Server has been installed, go to your local home page for links to a help page that describes how to install, upgrade or troubleshoot Mascot Daemon. All the required installation files are hyperlinked from this page.

Cluster Mode

Introduction
Mascot has been designed and implemented to work efficiently on a cluster of computers. A cluster of single- or dual-processor boxes provides a highly cost-effective solution for high-throughput protein identification. Mascot can be run in cluster mode on all supported hardware platforms and operating systems.
219. [Screenshot: Windows Firewall Settings dialog, Exceptions tab, listing programs and ports such as Secure World Wide Web Services (HTTPS) and World Wide Web Services (HTTP), each with a check box to enable the exception.]

For Windows 7, the appearance is slightly different:

[Screenshot: Control Panel, Windows Firewall, Allowed Programs, with Domain, Home/Work (Private) and Public columns for each allowed program and feature.]
220. …GenBank sequences (including ESTs) which represent a unique gene. It is not an attempt to produce a consensus sequence. UniGene can be used to simplify the results of a Mascot search of dbEST.

An index file must be downloaded for each species of interest. For each species, the fully qualified path to the index file is associated with the species name:

UniGene
human     C:/Inetpub/MASCOT/unigene/human/current/Hs.data
mouse     C:/Inetpub/MASCOT/unigene/mouse/current/Mm.data
mosquito  C:/Inetpub/MASCOT/unigene/mosquito/current/Aga.data

To add a UniGene report option to Mascot for a particular sequence database, add a line containing the name of the database followed by a list of the available species names:

EST_human   human
EST_mouse   mouse
EST_others  mosquito
end

Options
The Options section is used for miscellaneous parameters, which are listed here in alphabetical order. If a parameter is shown with argument(s), these are the default(s) that apply if the parameter is missing.

AutoSelectCharge 1
Controls how MS/MS queries are treated when the CHARGE parameter specifies more than one charge state, e.g. 1+, 2+ and 3+. This is usually because no charge information was available for a query, so the search form defaults applied. If set to 0, a query is generated for each charge state, and these queries are searched and reported independently. This is the default setting, because this was the behaviour in earlier versions of Mascot.
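To check a hand-edited UniGene section, a short Python sketch can read the whitespace-separated layout shown above (a name followed by one or more values, terminated by end) and report species that a database line refers to but that have no index file entry. The function names parse_section and undefined_species are hypothetical and not part of Mascot; the sketch assumes it is given the lines after the section header, and that paths contain no spaces.

```python
from typing import Dict, Iterable, List


def parse_section(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each entry name to its values, stopping at the 'end' line.

    Blank lines and lines starting with '#' are skipped.
    """
    entries: Dict[str, List[str]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "end":
            break
        name, *values = line.split()
        entries[name] = values
    return entries


def undefined_species(db_species: Dict[str, List[str]],
                      species_paths: Dict[str, List[str]]) -> List[str]:
    """Species named on a database line but missing from the species entries."""
    referenced = {name for names in db_species.values() for name in names}
    return sorted(referenced - set(species_paths))
```

For example, parse_section(["EST_human human", "end"]) gives {"EST_human": ["human"]}; passing that together with the parsed species lines to undefined_species lists any species that still needs an index file entry.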
221. …nodelist.txt. A full description of these files can be found below in the Reference section. Then start the Mascot service.

Windows Firewall on Search Nodes
Windows XP and later include a software firewall called Windows Firewall. You can avoid the configuration steps in this section by turning off Windows Firewall on the search nodes. If the search nodes are on a separate subnet that can only connect to the master node, having a firewall enabled on a search node is of little use. It is redundant until the master node has been compromised, by which time it is too late.

If the search nodes are not on a separate subnet, or if you simply want to enable Windows Firewall because the operating system keeps nagging you to do so, it is necessary to run through the following steps on each search node. Windows Firewall configuration varies across the different editions of Windows, and also according to whether it was part of the original installation or added in a service pack.

Windows XP and Server 2003
On each search node, log in as a user with local administrator rights. Go to Control Panel and launch Windows Firewall. On the Advanced tab, make sure the network connection to the master is checked.

[Screenshot: Windows Firewall dialog, Advanced tab, Network Connection Settings: "Windows Firewall is enabled for the connections selected below. To add exceptions for an individual connection, select it…"]
222. …unpacked after mascot.tar, so as to overwrite the 64-bit files in the mascot.tar archive:

bzip2 -d mascot_32.tar.bz2
tar xvf mascot_32.tar

Alternatively, you can combine decompression and tar into a single command, for example:

bzip2 -dc /dvdrom/mascot.tar.bz2 | tar xvf -

This will create the directory structure illustrated in Figure 2-1.

Ensure that the ownership of the files matches the user ID that your web server is configured to use. The mascot.tar file has been created using root:root. The required ID when Apache is installed from a Red Hat RPM will be apache:apache, and when installed on Ubuntu or Debian it will be www-data:www-data.

chown -R apache:apache /usr/local/mascot

If this is not acceptable, then the logs, config, sessions and data directories, plus the file logs/errorlog.txt, must be made writeable by the web server process.

Create URL mappings
If this is a clean installation, add the following mappings to your web server configuration, substituting your actual disk path to the new mascot directory:

Disk path                 URL             Executable
/usr/local/mascot/cgi     /mascot/cgi     Yes
/usr/local/mascot/html    /mascot         No
/usr/local/mascot/x-cgi   /mascot/x-cgi   Yes

You may wish to restrict access to the administrative programs by setting a password or IP address restriction on /mascot/x-cgi.

Notes on web server configuration can be found in Appendix D. Example configuration entries for Apache can…
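Where a scripted installation is preferred, the one-pass decompress-and-unpack step can be sketched with Python's standard tarfile module. This is an illustrative equivalent of the bzip2/tar pipeline above, not a Matrix Science tool; extract_tar_bz2 is a hypothetical name. On Python versions that provide extraction filters, the "data" filter rejects members that would land outside the destination and does not restore owner or group from the archive, so the chown step above is still needed.

```python
import tarfile
from pathlib import Path
from typing import List, Union


def extract_tar_bz2(archive: Union[str, Path], dest: Union[str, Path]) -> List[str]:
    """Decompress and unpack a .tar.bz2 archive into dest in a single pass.

    Roughly what `bzip2 -dc archive | tar xvf -` does when run inside dest.
    Returns the member names, similar to the listing printed by tar's v flag.
    """
    with tarfile.open(archive, mode="r:bz2") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse unsafe members and skip owner/group restore.
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    return names
```

For example, extract_tar_bz2("/dvdrom/mascot.tar.bz2", "/usr/local") would unpack the archive under /usr/local, assuming the archive's top-level directory is mascot.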
223. …ntrolFile is a similar additional file used in cluster mode.

MascotJobIdFile contains the next available job number. If this file is deleted, the next job number will be initialised to the value given by UniqueJobStartNumber, and a new jobId file will be created automatically. NB: UniqueJobStartNumber must never be set lower than 1000.

ErrTolMaxAccessions 0
The maximum number of database entries allowed for a manual error tolerant search. The default is 0, meaning no limit.

ExecAfterSearch_N flag=num, flag=num, title=string, command string
Defines a command to be run after a search is complete. N is one or two digits, in the range 1 to 10. The Mascot installer creates the following two entries, which provide Percolator integration:

ExecAfterSearch_1 waitfor=0, logging=0, title=Creating percolator input, bin/ms-createpip.exe -i %resultfilepath -o percolator.pip
ExecAfterSearch_2 waitfor=1, logging=1, title=Percolating, bin/percolator.exe PercolatorExeFlags

The following flags may be specified:

flag        num    description
waitfor     0-10   The command should wait for completion of the command specified by num. A value of 0 means don't wait (equivalent to omitting the flag).
logging     0-3    0: no logging; 1: log successful commands (return code 0); 2: log unsuccessful commands (return code not 0); 3: log successful and unsuccessful commands.
percolator  0-1    0: no dependency on Percolator; 1: command…
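The four documented values of the logging flag behave like a two-bit mask: bit 1 selects successful commands and bit 2 selects unsuccessful ones, so 3 selects both. The Python sketch below encodes that reading of the table; it is an observation about the documented values, not a description of how Mascot implements the flag, and should_log is a hypothetical name.

```python
def should_log(logging_level: int, return_code: int) -> bool:
    """Decide whether an ExecAfterSearch command's result would be logged.

    logging_level follows the documented values:
      0 = no logging, 1 = successes only (return code 0),
      2 = failures only (non-zero return code), 3 = both.
    """
    if logging_level not in (0, 1, 2, 3):
        raise ValueError(f"logging must be 0-3, got {logging_level}")
    outcome_bit = 1 if return_code == 0 else 2
    return bool(logging_level & outcome_bit)
```

With the installer's entries, ExecAfterSearch_1 (logging=0) is never logged, while ExecAfterSearch_2 (logging=1) is logged only when percolator.exe exits with return code 0.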
224. …nucleic acid database, then the length returned will depend on the translation frame number specified. If the keyword title is supplied, the FASTA title line is returned, beginning with a right angle bracket. If the keyword pI is supplied, the calculated isoelectric point is returned.

Batch mode

Request format
A GET request always means single entry mode. A POST request automatically means batch mode. A batch mode request should use UTF-8 encoding and be of multipart/form-data enctype, for example:

--41184676334
Content-Disposition: form-data; name="db"

SwissProt
--41184676334
Content-Disposition: form-data; name="accession"

RL19_YEAST
G3P2_YEAST
ERROR_YEAST
--41184676334
Content-Disposition: form-data; name="accession"

TRY1_BOVIN
--41184676334
Content-Disposition: form-data; name="showpi"

on
--41184676334
Content-Disposition: form-data; name="showtitle"

on
--41184676334
Content-Disposition: form-data; name="showlen"

on
--41184676334
Content-Disposition: form-data; name="showsequence"

on
--41184676334
Content-Disposition: form-data; name="showreference"

off
--41184676334
Content-Disposition: form-data; name="sessionID"

123456
--41184676334--

The maximum number of accession strings submitted at once shouldn't be more than 100,000, and the total size of the request shouldn…
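A batch request in the form shown above can be assembled with the Python standard library alone. In this sketch, build_batch_body is a hypothetical helper; the field names (db, accession, showpi, showtitle, showlen, showsequence, showreference, sessionID) and the 100,000 accession limit come from the text. It sends all accessions in a single accession part, one per line, as in the first accession part of the example; the program's URL is not given in this excerpt, so sending the request is left to the caller.

```python
import uuid
from typing import Dict, List, Optional, Sequence, Tuple

MAX_ACCESSIONS = 100_000  # upper limit stated in the text


def build_batch_body(db: str,
                     accessions: Sequence[str],
                     session_id: str,
                     show: Optional[Dict[str, bool]] = None,
                     boundary: Optional[str] = None) -> Tuple[str, bytes]:
    """Return (Content-Type header value, UTF-8 body) for a batch-mode POST."""
    if len(accessions) > MAX_ACCESSIONS:
        raise ValueError(f"{len(accessions)} accessions exceeds {MAX_ACCESSIONS}")
    boundary = boundary or uuid.uuid4().hex
    fields = [("db", db), ("accession", "\r\n".join(accessions))]
    # Each show* option becomes an "on"/"off" part, as in the example request.
    fields += [(name, "on" if flag else "off") for name, flag in (show or {}).items()]
    fields.append(("sessionID", session_id))

    lines: List[str] = []
    for name, value in fields:
        lines += [f"--{boundary}", f'Content-Disposition: form-data; name="{name}"', "", value]
    lines += [f"--{boundary}--", ""]  # closing delimiter; multipart uses CRLF line ends
    return f"multipart/form-data; boundary={boundary}", "\r\n".join(lines).encode("utf-8")
```

The result can be posted with urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST"), where url is the address of the program on your own Mascot Server.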
225. …loading and installing new modules and updates is distribution-specific. For example, to install the non-core module Bundle::LWP on some common distributions:

Red Hat/CentOS Linux:  yum install perl-libwww-perl
Debian/Ubuntu Linux:   aptitude install libwww-perl
SUSE Linux:            yast -i perl-libwww-perl

If a module has missing dependencies, you will be prompted to install these. The required non-core modules are:

Module            Debian/Ubuntu            Red Hat/CentOS and SuSE
GD                libgd-gd2-noxpm-perl     perl-GD
Bundle::LWP       libwww-perl              perl-libwww-perl
Algorithm::Diff   libalgorithm-diff-perl   perl-Algorithm-Diff
XML::Simple       libxml-simple-perl       perl-XML-Simple

If other applications require a version of Perl not supported by Mascot, or if you have difficulty compiling Perl or one of the required modules, the Mascot DVD includes ActivePerl 5.14 for Linux (32-bit and 64-bit). You can install this for general use, or for use by Mascot only. A typical installation might use the following commands:

cd /tmp
gzip -dc /dvdrom/ActivePerl-5.14.2.1402-x86_64-linux-glibc-2.3.5-295342.tar.gz | tar xvf -
cd ActivePerl-5.14.2.1402-x86_64-linux-glibc-2.3.5-295342
sudo ./install.sh

Follow the installation script instructions, and choose to install into an appropriate location. If ActivePerl was being installed for Mascot use only, we might install into /usr/local/ActivePerl-5.14 and create a symbolic link as follows:

ln -s /usr/local/Activ…
226. …observed peptide mass, in Da
absDMppm: Absolute value of calculated minus observed peptide mass, in ppm
isoDM: Calculated minus observed peptide mass after eliminating possible isotope errors up to 2 Da, in Da
isoDMppm: Calculated minus observed peptide mass after eliminating possible isotope errors up to 2 Da, in ppm
mc: Number of missed cleavages (always 0 if no enzyme)
varmods: Number of modified sites divided by number of modifiable sites
varmodsCount: The number of variable mods used in the peptide. That is, if there are 10 Met and 5 of these are oxidised, this counts as 1. A peptide with Met Ox, phosphoS, deamidation and acetylation would count as 5
modifiable: Total number of modifiable sites
modified: Total number of modified residues and termini
totInt: Log total ion intensity. The 20 most intense peaks in each 100 Da bin are used for all features, and totInt reports this value
intMatchedTot: Log total matched ion intensity
relIntMatchedTot: Total matched ion intensity divided by total ion intensity, as a percentage (no logs involved)
fragDeltaMed: Median value of all matched fragment errors, in Da
fragDeltaIqr: Interquartile range value of all matched fragment errors, in Da
fragDeltaMedPPM: Median value of all matched fragment errors, in ppm
fragDeltaIqrPPM: Interquartile range value of all matched fragment errors, in ppm
fragDeltaPolyFit: 2nd order polynomial fit to m/z vs. delta. The result is R-squared multiplied by the number of points divided by…
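To make a few of these feature definitions concrete, here is a Python sketch of absDMppm, isoDM, fragDeltaMed and fragDeltaIqr. It is not Mascot's implementation, and several details are assumptions: ppm values are taken relative to the calculated mass, "isotope errors up to 2 Da" is modelled as 0, 1 or 2 spacings of 1.003355 Da (the 13C/12C mass difference) with the smallest residual kept, and the interquartile range uses the "inclusive" quartile method of Python's statistics module.

```python
import statistics
from typing import Sequence, Tuple

C13_C12_DELTA = 1.003355  # Da; assumed spacing between isotope peaks


def ppm(delta_da: float, reference_mass: float) -> float:
    """Express a mass difference in parts per million of reference_mass."""
    return delta_da / reference_mass * 1e6


def abs_delta_ppm(calc: float, obs: float) -> float:
    """absDMppm: |calculated - observed|, in ppm of the calculated mass (assumption)."""
    return abs(ppm(calc - obs, calc))


def iso_delta_mass(calc: float, obs: float, max_isotope: int = 2) -> float:
    """isoDM: calculated - observed, allowing the observed peak to be 0..2 isotopes heavy."""
    candidates = [(calc - obs) + n * C13_C12_DELTA for n in range(max_isotope + 1)]
    return min(candidates, key=abs)


def frag_delta_stats(frag_errors: Sequence[float]) -> Tuple[float, float]:
    """fragDeltaMed and fragDeltaIqr: median and interquartile range of fragment errors."""
    median = statistics.median(frag_errors)
    q1, _, q3 = statistics.quantiles(frag_errors, n=4, method="inclusive")
    return median, q3 - q1
```

For example, a precursor selected on its first 13C peak (observed 1001.003355 Da for a calculated 1000.0 Da) gives an isoDM of approximately 0, whereas the plain calculated minus observed difference is about -1.003 Da.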
227. …processor with hyper-threading enabled.

Troubleshooting

Check the Mascot Server Support Page
There may be a fix listed on the Matrix Science web site. From the menu, choose Support, Mascot Server, and scan down to see if your problem is described.

The installation program doesn't recognise Perl
To test whether Perl is correctly installed, open a command window and type perl -v. The version number should be displayed. If this seems to be functioning correctly and the Perl version is 5.8, 5.10, 5.12 or 5.14, restart the computer and then re-run the Mascot installation program. If it still fails, contact Matrix Science technical support (support@matrixscience.com).

If, when you type perl -v, you see the text "The name specified is not recognized as an internal or external command, operable program or batch file", then Perl is not installed or is not on the path. If you have just installed it, try restarting the computer and performing the test again. If that fails, try re-installing Perl, making sure that you choose the option to add it to the path.

The status screen shows an error
If the Mascot Monitor service fails to start, then the following text (or something similar) will be displayed in the status screen:

[Screenshot: web browser displaying the Mascot status page at http://host123…]
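The perl -v test can also be scripted. The Python sketch below runs perl -v, pulls the version out of the banner, and compares it with the versions listed above (5.8, 5.10, 5.12 and 5.14). The function names are hypothetical, and the regular expression assumes the usual banner forms, which contain a token such as v5.14.2 or v5.8.8.

```python
import re
import subprocess
from typing import Optional, Tuple

SUPPORTED_MINOR_VERSIONS = {8, 10, 12, 14}  # Perl 5.8, 5.10, 5.12, 5.14


def parse_perl_version(banner: str) -> Optional[Tuple[int, int, int]]:
    """Extract (5, minor, patch) from `perl -v` output, or None if absent."""
    match = re.search(r"\bv5\.(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return 5, int(match.group(1)), int(match.group(2))


def check_perl() -> str:
    """Run `perl -v` and describe the result in plain words."""
    try:
        result = subprocess.run(["perl", "-v"], capture_output=True, text=True)
    except FileNotFoundError:
        return "Perl is not installed or is not on the path"
    version = parse_perl_version(result.stdout)
    if version is None:
        return "Could not read a Perl version from 'perl -v'"
    if version[1] not in SUPPORTED_MINOR_VERSIONS:
        return "Perl {}.{}.{} is not one of the listed versions".format(*version)
    return "Perl {}.{}.{} found".format(*version)
```

A "not on the path" result corresponds to the "not recognized as an internal or external command" message described above.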
228. …documentation and any subsequent updates and supplements (the Software). By installing or using the Software, you agree to be bound by the terms of this agreement. If you do not agree to the terms of this agreement, we are unwilling to license the Software to you. In this case, do not install or use the Software. Return the Software to Matrix Science Limited or their authorised distributor within 30 days of receipt for a full refund.

1. Licence
Matrix Science Limited owns the copyright in the Software contained within this package and all other copies which you are authorised by this agreement to make. This licence is personal to you (either an individual or a single corporate entity) as the purchaser of a licence to use the Software, and the licence granted herein is for your benefit only. You may not use the Software in any way that permits unlicensed access to the Software. In particular, individuals who are not party to this licence, or the general public, must not be permitted access to the Software through a public network such as the Internet.

2. Permitted Users
As purchaser of a licence to use the Software, you may, subject to the following conditions:
2.1 load the Software onto and use it on a single computer of the type identified on the package which is under your control; and
2.2 copy the Software for backup and archival purposes; and
2.3 make up to two copies of the documentat…
229. …on ionquery4 2167.784350 from 1084.900000 2+ query 4
daemon score4 39.11
daemon Sigscore4 47
daemon Selectpeptides 1

If the job is incomplete or has failed, then an error will be returned: unknown_id, or searchcontrol error nnn, with values of nnn as for status.

ms-searchcontrol.exe xmlresults taskID <number> reporttop FILE|AUTO|<num hits> sessionID <string>
If the job is complete, this will return the results formatted as an XML instance document that conforms to the schema mascot/html/xmlns/schema/DistillerMascotSearch_1/DistillerMascotSearch_1.xsd. If the job is incomplete or has failed, then an error will be returned: unknown_id, or searchcontrol error nnn, with values of nnn as for status.

ms-searchcontrol.exe create-task-id sessionID <string>
On failure, this will return searchcontrol error nnn, with values of nnn as for status. On success, it will return taskID nnn.

ms-searchcontrol.exe mascot-job-number taskID <number> sessionID <string>
This will return either the job number (mascotjobnumber nnnn), or searchcontrol error nnn, with values of nnn as for status.

ms-searchcontrol.exe kill-job taskID <number> sessionID <string>
If the task is successful, this will return the text job killed. If the…
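A client that drives ms-searchcontrol.exe needs to tell its short replies apart. The Python sketch below classifies the reply types named above: searchcontrol error nnn, taskID nnn, mascotjobnumber nnnn, unknown_id and job killed. The exact separator characters in these replies are not recoverable from this text, so the patterns deliberately accept any punctuation, spaces or underscores between the keyword and the number; parse_searchcontrol_reply is a hypothetical name.

```python
import re
from typing import Optional, Tuple, Union

# Order matters: error replies are checked first.
_PATTERNS = [
    ("error", re.compile(r"searchcontrol[\W_]*error[\W_]*(\d+)", re.IGNORECASE)),
    ("job_number", re.compile(r"mascotjobnumber[\W_]*(\d+)", re.IGNORECASE)),
    ("task_id", re.compile(r"taskID[\W_]*(\d+)", re.IGNORECASE)),
]


def parse_searchcontrol_reply(text: str) -> Tuple[str, Optional[Union[int, str]]]:
    """Classify a short ms-searchcontrol.exe reply as (kind, value).

    kind is one of 'error', 'job_number', 'task_id', 'unknown_id',
    'job_killed' or 'other' (in which case value is the raw text).
    """
    stripped = text.strip()
    for kind, pattern in _PATTERNS:
        match = pattern.search(stripped)
        if match:
            return kind, int(match.group(1))
    if "unknown_id" in stripped:
        return "unknown_id", None
    if "job killed" in stripped.lower():
        return "job_killed", None
    return "other", stripped
```

For an error reply, the returned number can then be looked up against the values listed for status.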
230. …on pertaining to NCBInr. Simply enable the predefined definition for NCBInr in Database Manager, and the latest files will be downloaded automatically. If your Mascot Server is not connected to the Internet, download the required files on a PC with Internet access and copy them to your Mascot Server. Download URLs and configuration information for popular databases can be found on the Matrix Science web site at http://www.matrixscience.com/help/seq_db_setup.html.

As a convenience for users who have no access to the Internet, a copy of NCBInr, a comprehensive protein sequence database, is included with Mascot on a separate DVD, together with the required taxonomy files. As with the copy of SwissProt installed with the Mascot program files, these files were current at the time of release, but will become increasingly out of date. The procedure to make use of the NCBInr files on the DVD depends on whether or not you use Database Manager.

Using Database Manager
• Choose Create New.
• Enter NCBInr as the database name, choose Use predefined definition template, and select NCBInr from the list.
• If necessary, modify the location for the sequence database directory, then choose Create.
• Unpack the FASTA file into the specified location, and unpack the taxonomy files into the Mascot taxonomy directory, as described below under Manual Setup, step 2.
• Choose Activate.

Manua…
231. …monitor.exe must be running at all times.

Once the new licence file is in place, follow the hyperlink to Database Status. You should see a display similar to the following:

[Screenshot: Mascot search status page, showing the licence information, the processors available and in use, and a status entry for the SwissProt database (filename SwissProt_2012_03.fasta) with the status "Creating compressed files 63% complete".]

If an error occurs, use the links to the monitor log and the error log to investigate the cause. If all is well, you will see the following messages displayed on the status line for SwissProt:

Creating compressed files
Running 1st test
First test just run OK
Trying to memory map files
Just enabled memory mapping
In Use

You can begin exploring and using Mascot. However, do not…
232. …monitor program:

ms-monitor DEBUG

Any error messages should be displayed on the screen. If possible, correct the faults, and then start the Mascot Service from the Start menu. Note that the Mascot service should never be running at the same time as ms-monitor.exe is being run from the command line.

International Versions Of Windows
If Mascot is installed on a version of Windows that is not in the English language, then when the ms-status screen is displayed, it may show the error "Failed to initialise memory map". To correct this fault, the following procedure is required:

1. You will need to find the names of the groups that your version of Windows uses for Administrators and Users. In German, for example, these names are Administratoren and Benutzer respectively. To see a list of user names, from the Start menu select Programs, Administrative Tools (common), User Manager. The section at the bottom of the screen displays the group names. Make a note of the two names.

From the Start menu, select Programs, Mascot, Config, Stop Mascot Service.

From the Start menu, select Programs, Mascot, Config, Mascot Configuration File.

Scroll down to near the bottom of the file, find the line NTUserGroup Users, and change it, for example for German, to:
NTUserGroup Benutzer

Find the line NTMonitorGroup Administrators and change it, for example for German, to:
NTMonit…
233. …or the Software.

7.2 In no event will we be liable to you for any indirect or consequential damages, even if we have been advised of the possibility of such damages. In particular, we accept no liability for any programs or data made or stored with the Software, nor for the costs of recovering or replacing such program or data.

7.3 Nothing in this clause limits our liability to you in the event of death or personal injury resulting from our negligence.

8. Termination
8.1 The agreement and the licence hereby granted to use the Software automatically terminates if you:
8.1.1 fail to comply with any provisions of this agreement; or
8.1.2 voluntarily return the Software to us.
8.2 In the event of termination in accordance with clause 8.1, you must destroy or delete all copies of the Software from all storage media in your possession.

9. Severability
In the event that any provision of this agreement is declared by any judicial or other competent authority to be void, voidable, illegal or otherwise unenforceable, or indications of the same are received by either you or us from any relevant competent authority, we shall amend that provision in such reasonable manner as achieves the intention of the parties without illegality or, at our discretion, such provision may be severed from this agreement, and the remaining provisions of this agreement shall remain in full force and effect.

10. Entire Agreem…
234. …NTMonitorGroup Administratoren

Save the mascot.dat file.

Delete the files:
c:\inetpub\mascot\sequence\SwissProt\current\SwissProt a0d0o
c:\inetpub\mascot\data\mascot.control
Note that these files may be in a different directory if you did not install Mascot in the default location.

From the Start menu, select Programs, Mascot, Config, Start Mascot Service.

Reload the status page (Programs, Mascot, Search Status). You may need to refresh or reload the page. Wait until the files have been compressed and a test search has been done. Mascot is now ready for use.

The site search facility does not work
The local Mascot web pages are indexed using a product called ht://Dig. A log file is made as the indexes are built during the installation. The log file mascot\htdig\build.log may contain an error message indicating the nature of the problem.

If the web server was not operational during Mascot installation, it will not have been possible to build the keyword index. To build or rebuild it, open a command window and enter the following commands. If Mascot was installed into a different path, you may have to modify the first two lines.

C:
cd \inetpub\mascot\htdig
bin\htdig.exe -v
bin\htmerge.exe -v

Once the commands have completed, keyword search using the control at the top right of the web pages should be operational.

Search status shows a failure to create compressed fi…
235. …scoredown
In a peptide summary report, peptide matches that are not assigned to protein hits are initially sorted by descending score (scoredown). Alternatives for SortUnassigned are ascending query order (queryup) and descending intensity order (intdown). This global default can be overridden on an individual report URL by appending &_sortunassigned=X, where X is scoredown, queryup or intdown.

SplitDataFileSize 10000000
Large searches are divided into chunks, and no single chunk can exceed this number of bytes (default 10 MB). When a search is divided into chunks, protein and peptide match data are no longer written to the summary section of the result file. This means that a Protein summary report cannot be generated.

SplitNumberOfQueries 1000
Large searches are divided into chunks, and no single chunk can exceed this number of queries (default 1000). When a search is divided into chunks, protein and peptide match data are no longer written to the summary section of the result file. This means that a Protein summary report cannot be generated.

StoreModPermutations 1
If set to 0, only the highest scoring permutation of variable modifications for each unique peptide sequence is retained in the list of the top 10 ions scores. If set to 1, different permutations of variable modifications are treated as independent matches, creating the possibility that all 10 top ions scores correspond to the same primary sequenc…
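The combined effect of SplitDataFileSize and SplitNumberOfQueries can be illustrated with a simple greedy grouping: start a new chunk whenever the next query would push the current chunk past either limit. This Python sketch is only an illustration of the two limits, not Mascot's actual splitting algorithm; split_into_chunks is a hypothetical name, and the per-query byte sizes are assumed inputs.

```python
from typing import List, Sequence


def split_into_chunks(query_sizes: Sequence[int],
                      max_bytes: int = 10_000_000,
                      max_queries: int = 1000) -> List[List[int]]:
    """Greedily group query indices so no chunk exceeds either limit.

    query_sizes[i] is an estimated size in bytes for query i. A single
    query larger than max_bytes ends up in a chunk of its own.
    """
    chunks: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for index, size in enumerate(query_sizes):
        too_many = len(current) >= max_queries
        too_big = current_bytes + size > max_bytes
        if current and (too_many or too_big):
            chunks.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += size
    if current:
        chunks.append(current)
    return chunks
```

With the defaults, 2500 small queries would form chunks of 1000, 1000 and 500 queries; any result with more than one chunk is the case in which, per the text, a Protein summary report cannot be generated.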
236. [Screenshot: Network and Sharing Center, showing the network as a Private network, with Network discovery On and File sharing Off.]

Select Administrative Tools and launch Windows Firewall with Advanced Security. Select Inbound Rules in the left-hand panel and New Rule in the Actions panel.

[Screenshot: New Inbound Rule Wizard, Rule Type step, offering Program, Port, Predefined and Custom rule types.]

In the wizard, choose:
Port, Next
TCP, Specific local ports: 5001, Next
Allow Connection, Next
Clear the checkboxes for Domain and Public, Next
Enter the name as MascotNodePort5001, Finish

The new rule will be added…
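After adding the inbound rule, it can be useful to confirm from the master that TCP port 5001 on each search node accepts connections. The Python sketch below only tests TCP reachability with the standard socket module; it does not check that the Mascot node service answers correctly. node_port_open is a hypothetical name, and the host name in the usage note is an example.

```python
import socket


def node_port_open(host: str, port: int = 5001, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name lookup failed
        return False
```

Run on the master, node_port_open("node2") returning False would suggest the firewall rule, or the Mascot service on that node, needs attention.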
237. …not a sequence database directory. Under Windows, remember that the directory separator in Database Manager and in mascot.dat must be a forward slash, not a backslash.

DVD read errors
File checksums and sizes, as reported by cksum, for the files on the Databases DVD:

3537046107  4218730222  NCBInr_20120419.fasta.gz
657867792   149274785   taxonomy.tar.gz

Configuration & Log Files

Configuration Files
Mascot configuration files are located in the mascot/config directory:

unimod.xml defines mass values and modifications, including substitutions.
enzymes defines enzyme cleavage specificity.
fragmentation_rules specifies which fragment ion series correspond to defined instrument types.
mascot.dat contains general configuration information. If you use Database Manager, do not modify the sequence database related sections of mascot.dat, because any changes will be lost when Database Manager is next used.
taxonomy specifies the taxonomy filter choices for the search form, described in Chapter 9.
quantitation.xml defines quantitation methods.
nodelist.txt configures the systems belonging to a Mascot cluster, described in Chapter 11.
user.xml, group.xml, security_options.xml and security_tasks.xml are the configuration files for Mascot security, described in Chapter 12.
mod_file, masses and substitutions are obsolete configuration files that are created on the fly from unimod.xml to support third-party appli…
238. …p
Content-Type: application/x-Mascot; name="quantitation"

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<quantitation majorVersion="1" minorVersion="0" xmlns="http://www.matrixscience.com/xmlns/schema/quantitation_1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.matrixscience.com/xmlns/schema/quantitation_1 quantitation_1.xsd">
<method constrain_search="false" description="15N metabolic labelling" min_num_peptides="2" name="15N Metabolic [MD]" prot_score_type="mudpit" protein_ratio_type="weighted" report_detail="true" require_bold_red="true" show_sub_sets="0.5" sig_threshold_value="0.05">
<component name="light">
<isotope>
⋮
</component>
<component name="heavy">
<isotope>
<old>N</old>
<new>15N</new>
</isotope>
⋮

This section is an extract from quantitation.xml containing the quantitation method specified for the search. For more details, and a link to the schema, refer to the Mascot HTML help pages for quantitation.

Unimod
--gc0p4Jq0M2Yt08jU534c0p
Content-Type: application/x-Mascot; name="unimod"

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<umod:unimod xmlns:umod="http://www.unimod.org/xmlns/schema/unimod_2" majorVersion="2" minorVersion="0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLoc…
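Because the quantitation section is ordinary namespaced XML, it can be read with Python's standard xml.etree.ElementTree once the MIME header lines (everything up to the first blank line after Content-Type) have been removed. The sketch below lists the method name and, for each component, its (old, new) isotope substitutions. summarize_quant_method is a hypothetical name, and SAMPLE is a trimmed, well-formed stand-in for the extract above, not a complete method definition.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

QUANT_NS = {"q": "http://www.matrixscience.com/xmlns/schema/quantitation_1"}


def summarize_quant_method(xml_text: str) -> Dict[str, object]:
    """Return the method name and, per component, its (old, new) isotope pairs."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    method = root.find("q:method", QUANT_NS)
    if method is None:
        raise ValueError("no <method> element in quantitation section")
    components: Dict[str, List[Tuple[str, str]]] = {}
    for component in method.findall("q:component", QUANT_NS):
        pairs = [
            (iso.findtext("q:old", default="", namespaces=QUANT_NS),
             iso.findtext("q:new", default="", namespaces=QUANT_NS))
            for iso in component.findall("q:isotope", QUANT_NS)
        ]
        components[component.get("name", "")] = pairs
    return {"name": method.get("name"), "components": components}


# Trimmed example based on the extract above (isotope details for "light" omitted).
SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<quantitation majorVersion="1" minorVersion="0"
    xmlns="http://www.matrixscience.com/xmlns/schema/quantitation_1">
  <method name="15N Metabolic [MD]" protein_ratio_type="weighted">
    <component name="light"/>
    <component name="heavy">
      <isotope><old>N</old><new>15N</new></isotope>
    </component>
  </method>
</quantitation>"""
```

summarize_quant_method(SAMPLE) reports the method "15N Metabolic [MD]" with a heavy component that substitutes 15N for N.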
239. …happy faces, and the status line will display the following messages:

Creating compressed files
Running 1st test
First test just run OK
Trying to memory map files
Just enabled memory mapping
In Use

Once the database is In Use, you can begin exploring and using Mascot. Clicking on the links in the cluster node table will display more detailed status information for individual nodes.

Linux

Communication
Under Linux, the master node communicates with the search nodes using either ssh (preferred) or rsh. If communication can be established using ssh, then scp is used for file copying. If rsh is used for communication, then rcp is used for file copying.

Whether ssh or rsh is used, it is essential that communication can be established without requiring passwords or passphrases. In the case of ssh, key-based authentication is the preferred mechanism. A less secure alternative for rsh is provided by file-based authentication using .rhosts or hosts.equiv.

A detailed description of the many ways to configure ssh or rsh is outside the scope of this manual. For key-based authentication, read the man pages for ssh, sshd, ssh-keygen, ssh-add and ssh-agent. For file-based authentication, read the man pages for rsh, rshd, rlogin and hosts.equiv.

The minimum procedure to set up key-based authentication for ssh on a clean Linux system, where there are no pre-existing keys, is as follows:
1. Login to the mast…
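Once keys are in place, you can confirm that the master reaches a node without any password or passphrase prompt. The Python sketch below runs ssh with the standard OpenSSH option BatchMode=yes, which makes ssh fail rather than prompt, plus ConnectTimeout so an unreachable node does not hang. The function names are hypothetical, the node name in the usage note is an example, and the check covers ssh only, not rsh.

```python
import subprocess
from typing import List


def ssh_check_command(host: str, timeout_s: int = 5) -> List[str]:
    """Build an ssh command that runs `true` remotely and never prompts."""
    return ["ssh", "-o", "BatchMode=yes", "-o", f"ConnectTimeout={timeout_s}", host, "true"]


def can_reach_without_password(host: str) -> bool:
    """Return True if ssh to host succeeds non-interactively."""
    try:
        result = subprocess.run(ssh_check_command(host), capture_output=True, text=True)
    except FileNotFoundError:  # no ssh client installed
        return False
    return result.returncode == 0
```

Run as the user that Mascot uses on the master, can_reach_without_password("node2") should return True for every search node before cluster mode is started.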
240. propriate commands.
e) See if any files are missing or out of date (see above) and, if necessary, update them. This is done through the TCP/IP socket, so no directory mapping or NFS mounts are required.
Once all the Mascot nodes have been successfully initialised, Mascot Monitor starts as normal.
Licensing
The number of processors that the search is permitted to run on is restricted by the number of Mascot licences. The Mascot master node is not counted, since it merely distributes the search and collates the results. The number of processors used by Mascot will never exceed the number specified in the licence.
Error messages and emails
In the single-server version of Mascot, selected warning messages can optionally be emailed to the system administrator when something critical, such as a database update, fails on the server. The following additional messages, specific to a cluster, can also be emailed:
M00323 One or more cluster nodes has stopped responding.
M00316 Dr Watson log updated, indicating a software crash on one of the cluster nodes.
Who Am I
If the Mascot master is also being used as a node, then when nph-mascot.exe is run it needs to know whether it is running as a node task or as the master task. Since the different mascot.dat files are identical, it determines this from a file, mascot/config/iam.dat, that is created by the Mascot node service when it starts up. Do not copy or replace this fi
241. pyright (c) 2004-2005 The University of Tennessee and The University of Tennessee Research Foundation. All rights reserved.
Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, University of Stuttgart. All rights reserved.
Copyright (c) 2006-2007 Advanced Micro Devices, Inc. All rights reserved.
$COPYRIGHT$
Additional copyrights may follow
$HEADER$
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer listed in this license in the documentation and/or other materials provided with the distribution.
- Neither the name of the copyright holders nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
The copyright holders provide no reassurances that the source code provided does not infringe any patent, copyright, or any other intellectual property rights of third parties. The copyright holders disclaim any liability to any recipient for claims brought against recipient by any third party for infringement of that parties intellectual property rights.
THIS SOFTWARE IS PROVIDED BY THE
242. r by appending &_featuretablelength=X to the protein view URL, where X is the length in bases.
FeatureTableMinScore
By default, only matches with significant scores (p < 0.05) are output. A different score threshold can be specified using the parameter FeatureTableMinScore in the Options section of mascot.dat, or by appending &_featuretableminscore=X to the protein view URL, where X is the score threshold.
ForkForUnixApache 0
If a user presses Stop, or goes to another page in their browser, while a search is running, the intended behaviour is that the search should continue and the user be emailed with their results. However, when running some versions of Apache, the search is terminated by Apache when the connection to the browser is lost. To stop this from happening, set this value to 1. Setting this parameter to 1 with other servers can cause problems, so only use this setting if necessary. When set to 1, nph-mascot.exe ignores PIPE signals and does a fork; the parent exits, and the child then ignores HUP signals.
FormVersion 1.01
Mascot users may save search forms off-line, or submit searches using scripts or private forms. When the search engine is upgraded, there is the possibility that old scripts or forms may contain invalid or obsolete parameters. If a search is submitted to Mascot without a version number, or if the version number is lower than that specified b
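The two URL overrides above can be added programmatically. This is a minimal sketch: the function name protein_view_url and the example base URL (host name, script path and its existing file and hit parameters) are hypothetical; only the _featuretablelength and _featuretableminscore parameter names come from the text.

```python
from urllib.parse import urlencode


def protein_view_url(base_url, min_score=None, table_length=None):
    """Append the feature-table overrides described above to a protein view URL."""
    params = {}
    if table_length is not None:
        params["_featuretablelength"] = table_length
    if min_score is not None:
        params["_featuretableminscore"] = min_score
    if not params:
        return base_url
    # Extend an existing query string with "&", otherwise start one with "?".
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)


if __name__ == "__main__":
    # Hypothetical protein view URL for illustration only.
    base = "http://myserver/mascot/cgi/protein_view.pl?file=F1.dat&hit=1"
    print(protein_view_url(base, min_score=20))
```

Using the URL parameter affects only that one report, whereas FeatureTableMinScore in mascot.dat changes the default for everyone.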
243. [Screenshot: installer Custom Setup dialog, install location C:\inetpub\mascot\]
If IIS is installed and functional, the default selections will be as shown above, with IIS being configured automatically. If you don't have IIS installed, the Apache option will be selected instead. A test for whether Apache, or some other web server, is actually installed comes later.
You can de-select the Swiss-Prot database but, if this is a clean install, you are advised not to do so. It is better to proceed with a full installation, so that correct installation of Mascot can be verified. If you don't want SwissProt to be available, you can easily remove it later.
The default location for the installation is \inetpub\mascot on the drive with the most free space, with the sequence databases in \inetpub\mascot\sequence. You can change one or both of these by selecting the component, then choosing Browse. If there is insufficient disk space on the selected drive(s), the installation will not be able to continue.
The next step depends on whether IIS or Apache was selected as the web server. For IIS, there will be a drop-down list of all the available web sites. In most cases, you should select Default Web Site. If you select a different web site, refer to the notes on multiple web sites later in this chapter.
244. rTestTimeout 1200
A time-out can be applied to the test searches used to validate a new database. If the test search on a new database does not produce a valid result within the number of seconds specified by MonitorTestTimeout, the problem is assumed to be with the new database, and the exchange process is halted.
MoveOldDbToOldDir 1
After a successful database swap, the old FASTA file and old reference file (if any) are moved to the old directory, unless this parameter is present and set to 0. Note that, if set to 0, the old files are not deleted. Some other application must take care of this, or there will be problems the next time Monitor starts up.
Mudpit 1000
Obsolete; see MudpitSwitch.
MudpitSwitch 0.001
Mascot has two ways to calculate protein scores in a Peptide or Select Summary report. Standard scoring is used when the ratio between the number of queries and the number of database entries (after any taxonomy filter) is small. The standard score is the sum of the ion scores, after excluding duplicate matches and applying a small correction. Protein score calculation switches to large search mode when the ratio between the number of queries and the number of database entries (after any taxonomy filter) exceeds the value specified by MudpitSwitch. Only those ion scores that exceed one or both significance thresholds contribute to the score, so that low-scoring random matches have no effect. The global default can also be overridd
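The switching rule reduces to a single ratio comparison. The following sketch restates it; the function name is hypothetical, and it only chooses the mode, it does not compute either score.

```python
def protein_scoring_mode(num_queries, num_entries, mudpit_switch=0.001):
    """Pick the protein scoring mode from the queries/entries ratio.

    num_entries is the number of database entries after any taxonomy filter.
    """
    if num_entries <= 0:
        raise ValueError("num_entries must be positive")
    ratio = num_queries / num_entries
    # Large search mode only when the ratio strictly exceeds MudpitSwitch.
    return "large search" if ratio > mudpit_switch else "standard"


if __name__ == "__main__":
    # With the default 0.001 and 500,000 entries after filtering,
    # more than 500 queries are needed to switch modes.
    print(protein_scoring_mode(500, 500_000))
    print(protein_scoring_mode(501, 500_000))
```

Because the denominator is counted after the taxonomy filter, restricting a search to one species can tip the same data set into large search mode.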
245. rcumflex (^), and specifies a list that matches any character except for the characters in the list after the leading circumflex. For example, [^abc] matches any one character except the characters a, b or c. The circumflex has this special meaning only when it occurs first in the list, immediately following the left bracket.
A range expression represents the inclusive set of characters between two characters in the ASCII character set. The starting and ending characters are separated by a hyphen. For example, [A-Z] will match any single upper case letter, while [0-9A-Za-z] matches any single alphanumeric character.
Matching Multiple Characters
When a BRE matching a single character, or a subexpression, is followed by the special character asterisk (*), together with that asterisk it matches what zero or more consecutive occurrences of the BRE would match. For example, [ab]* and [ab][ab]* are equivalent when matching the string ab. The expression ab*c will match ac, abc, abbbbbbc, etc.
When a BRE matching a single character, or a subexpression, is followed by an interval expression of the format \{m\}, \{m,\} or \{m,n\}, together with that interval expression it matches what repeated consecutive occurrences of the BRE would match. The values of m and n will be decimal integers in the range 0 <= m <= n <= 255, where m specifies the exact or minimum number of occurrences and n specifies the max
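The examples above can be tried out interactively. Python's re module is not a POSIX BRE engine, so this sketch shows each BRE next to a Python equivalent: bracket expressions and * are written the same way, but a BRE interval \{m,n\} becomes {m,n}. It uses re.fullmatch so that each pattern must match the whole test string, which makes the examples easier to read; a BRE search would also accept a match anywhere inside a longer string.

```python
import re

# (BRE as written in the text, equivalent Python re pattern).
EXAMPLES = [
    (r"[^abc]", r"[^abc]"),
    (r"[A-Z]", r"[A-Z]"),
    (r"[0-9A-Za-z]", r"[0-9A-Za-z]"),
    (r"ab*c", r"ab*c"),
    (r"b\{2,3\}", r"b{2,3}"),  # interval syntax differs between BRE and Python
]


def whole_match(pattern: str, text: str) -> bool:
    """True only if the entire text matches the (Python) pattern."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    samples = ["a", "d", "Q", "ac", "abc", "abbbbbbc", "bb", "bbbb"]
    for bre, python_pattern in EXAMPLES:
        hits = [s for s in samples if whole_match(python_pattern, s)]
        print(f"{bre:14} {hits}")
```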
246. rd Mascot login screen will be displayed, but authentication is performed using Mascot Integra. The Mascot Integra server details must be specified in the options section of the security administration utility.
IP address
This user type should only be used for third-party legacy applications that do not support Mascot security. Instead of a user name, enter the static IP address of the computer that will access the Mascot server. Do not enter a password.
Computer name
Same as IP address, but the computer name is used instead. A computer name is more practical where dynamic IP addresses are being used.
Agent string
Should only be used as a last resort, for third-party applications that haven't implemented Mascot security and where the computer name or IP address is not reliable. A case-sensitive substring comparison will be made with the HTTP_USER_AGENT environment variable.
Use built-in web server authentication
See the description of authentication above. Mascot will never prompt these users for a username and password, and hence passwords and password expiry will be ignored. Mascot security session time-outs do not apply. In Microsoft Internet Information Services (IIS), if anonymous access and integrated authentication are both enabled, users will generally be logged in as anonymous until they try to access a file where permission is denied. This almost certainly means that anonymous login must be disabled
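To make the three machine-based user types concrete, here is a minimal sketch of how a request might be mapped to one of them. Only the agent-string rule (a case-sensitive substring test on HTTP_USER_AGENT) comes from the text. Using the standard CGI variables REMOTE_ADDR and REMOTE_HOST for the IP address and computer name, and comparing them exactly, are assumptions; the class, labels and function are hypothetical and do not describe Mascot's internal implementation.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass
class MachineUser:
    kind: str   # hypothetical labels: "ip", "computer" or "agent"
    value: str  # static IP address, computer name, or agent substring
    name: str   # the Mascot user this maps to


def identify(users: Sequence[MachineUser], environ: Mapping[str, str]) -> Optional[str]:
    """Return the first matching user name, or None."""
    for user in users:
        # Assumption: exact comparison against standard CGI variables.
        if user.kind == "ip" and environ.get("REMOTE_ADDR") == user.value:
            return user.name
        if user.kind == "computer" and environ.get("REMOTE_HOST") == user.value:
            return user.name
        # Agent string: case-sensitive substring test against HTTP_USER_AGENT.
        if user.kind == "agent" and user.value in environ.get("HTTP_USER_AGENT", ""):
            return user.name
    return None
```

The case-sensitive substring rule is why an agent string should be as distinctive as possible: a short value could match unrelated browsers.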
247. rdless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Library.
In addition, mere aggregation of another work not based on the Library with the Library (or with a work based on the Library) on a volume of a storage or distribution medium does not bring the other work under the scope of this License.
3. You may opt to apply the terms of the ordinary GNU General Public License instead of this License to a given copy of the Library. To do this, you must alter all the notices that refer to this License, so that they refer to the ordinary GNU General Public License, version 2, instead of to this License. (If a newer version than version 2 of the ordinary GNU General Public License has appeared, then you can specify that version instead if you wish.) Do not make any other change in these notices.
Once this change is made in a given copy, it is irreversible for that copy, so the ordinary GNU General Public License applies to all subsequent copies and derivative works made from that copy.
This option is useful when you wish to copy part of the code of the Library into a program that is not a library.
4. You may copy and distribute the Library (or a portion or derivative of it, under Section 2) in object code or executab
248. re is an error, one of the following will be returned:
unknown_id
job_not_running
searchcontrol error nnn, with values of nnn as for status
The kill is implemented by setting a flag in the mascot.control memory-mapped file. The nph-mascot.exe task is responsible for killing itself.
ms-searchcontrol.exe pause job taskID <number> sessionID <string>
If the task is successful, this will return the text job_paused. If there is an error, one of the following will be returned:
unknown_id
job_not_running
job_already_paused
searchcontrol error nnn, with values of nnn as for status
The pause is implemented by setting a flag in the mascot.control memory-mapped file. The nph-mascot.exe task is responsible for pausing itself.
ms-searchcontrol.exe resume job taskID <number> sessionID <string>
If the task is successful, this will return the text job_resumed. If there is an error, one of the following will be returned:
unknown_id
job_not_running
job_not_paused
searchcontrol error nnn, with values of nnn as for status
The resume is implemented by setting a flag in the mascot.control memory-mapped file. The nph-mascot.exe task is responsible for resuming itself.
ms-searchcontrol.exe nice job taskID <number> nice <integer> sessionID <string>
The task ID needs to
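A script that drives ms-searchcontrol.exe needs to interpret the text it prints. The sketch below classifies the replies listed above for pause and resume. It does not run the program, and it assumes the numeric error arrives as "searchcontrol error" followed by the number, with optional punctuation in between, since the exact separator is not shown here. Reply and parse_reply are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

SUCCESS = {"pause": "job_paused", "resume": "job_resumed"}
KNOWN_ERRORS = {"unknown_id", "job_not_running", "job_already_paused", "job_not_paused"}
# Assumed shape of the numeric error; adjust once you have seen real output.
_ERR_CODE = re.compile(r"searchcontrol[ _]error\D*(\d+)")


@dataclass
class Reply:
    ok: bool
    error: Optional[str] = None
    code: Optional[int] = None


def parse_reply(action: str, text: str) -> Reply:
    """Classify the text returned for a pause or resume request."""
    text = text.strip()
    if text == SUCCESS[action]:
        return Reply(ok=True)
    if text in KNOWN_ERRORS:
        return Reply(ok=False, error=text)
    match = _ERR_CODE.search(text)
    if match:
        # nnn has the same meaning as for the status argument.
        return Reply(ok=False, error="searchcontrol error", code=int(match.group(1)))
    return Reply(ok=False, error="unrecognised reply: " + text)
```

Because pausing and resuming only set a flag, a successful reply means the request was recorded; the search task itself acts on it.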
249. required when running a search.
Species-specific nucleic acid databases
Even a species-specific database, such as EST_human, requires taxonomy to be defined at the database level, so that the correct genetic code can be chosen. For EST_human, the default taxonomy block in mascot.dat is:
# TAXONOMY FOR EST human, with TaxID
Taxonomy_10
Identifier All human, with TaxID 9606
Enabled 1 # 0 to disable it
SpeciesFiles NCBI:names.dmp
NodesFiles NCBI:nodes.dmp,NCBI:merged.dmp
GencodeFiles NCBI:gencode.dmp
MitochondrialTranslation 0
TaxID 9606
End
MitochondrialTranslation is set to 0 (off), and TaxID is set to 9606, specifying that all database entries are Homo sapiens. So genetic code 1 (standard) will be selected for all entries.
HUPO PSI PEFF Format
The HUPO Proteomics Standards Initiative PEFF (FASTA) format is described here: http://www.psidev.info/index.php?q=node/363
# TAXONOMY FOR PEFF
Taxonomy_14
Identifier HUPO PSI PEFF Format
Enabled 1 # 0 to disable it
FromRefFile 0
ErrorLevel 0
SpeciesFiles NCBI:names.dmp
NodesFiles NCBI:nodes.dmp,NCBI:merged.dmp
DefaultRule EXPLICIT CHOP NcbiTaxId 0 9
end
The NCBI taxonomy ID can be parsed directly from the title line.
Mascot Daemon
Overview
Mascot Daemon is a client application that automates the submission of searches to a Mascot server. Functionality includes:
1. Batch mode in
250. residue
h1_q1_subst=pos1,ambig1,matched1,...,posn,ambign,matchedn
h1_q1_summed_mods=variable modifications string
h1_q2=...
h1_qm=...
h2=...
hn_qm=...
--gc0p4Jq0M2Yt08jU534c0p
Where a parameter has multiple values, these are shown on separate lines for clarity. In the actual result file, all values for a parameter are on a single line, and there are no spaces or tabs between values.
Variable modifications is a string of digits: one digit for the N-terminus, one for each residue, and one for the C-terminus. Each digit specifies the modification used to obtain the match: 0 indicates no modification, 1 indicates delta1, 2 indicates delta2, etc., in the masses section. If the number of modifications exceeds 9, the letters A to W are used to represent modifications 10 to 32. X is used to indicate a modification found in error tolerant mode.
The neutral loss string is the same concept as the variable mod string, except that each character represents the index of the primary neutral loss (one of the master NL). Any position that is not modified, or where the mod has no neutral loss, is set to 0. hn_qm_primary_nl will only be output if the string contains at least one non-zero character.
If a new modification is found in an error tolerant search, its position is marked by X and details are recorded in an additional entry, hn_qm_et_mods. If the error tolerant search is of a nucleic acid database and the
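The encoding of the variable modifications string is compact enough to decode with a few lines of code. In this sketch the function name and the output layout are choices made for illustration: residues are reported by 1-based position in the peptide, and an error tolerant site is returned as the letter "X" instead of a number, because its details live in hn_qm_et_mods.

```python
def decode_varmods(mod_string: str):
    """Return (position, modification) pairs for every modified position.

    position is "N-term", "C-term" or a 1-based residue index; modification
    is the delta number from the masses section, or "X" for an error
    tolerant modification.
    """
    if len(mod_string) < 3:
        raise ValueError("expected N-term, at least one residue, and C-term")
    last = len(mod_string) - 1
    result = []
    for i, ch in enumerate(mod_string):
        if ch == "0":
            continue  # unmodified position
        if ch.isdigit():
            mod = int(ch)                    # 1..9
        elif "A" <= ch <= "W":
            mod = ord(ch) - ord("A") + 10    # A=10 ... W=32
        elif ch == "X":
            mod = "X"                        # error tolerant modification
        else:
            raise ValueError(f"unexpected character {ch!r}")
        where = "N-term" if i == 0 else "C-term" if i == last else i
        result.append((where, mod))
    return result
```

For example, "0001000" describes a five-residue peptide with delta1 on residue 3.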
251. rograms. These disadvantages are the reason we use the ordinary General Public License for many libraries. However, the Lesser license provides advantages in certain special circumstances.
For example, on rare occasions, there may be a special need to encourage the widest possible use of a certain library, so that it becomes a de-facto standard. To achieve this, non-free programs must be allowed to use the library. A more frequent case is that a free library does the same job as widely used non-free libraries. In this case, there is little to gain by limiting the free library to free software only, so we use the Lesser General Public License.
In other cases, permission to use a particular library in non-free programs enables a greater number of people to use a large body of free software. For example, permission to use the GNU C Library in non-free programs enables many more people to use the whole GNU operating system, as well as its variant, the GNU/Linux operating system.
Although the Lesser General Public License is Less protective of the users' freedom, it does ensure that the user of a program that is linked with the Library has the freedom and the wherewithal to run that program using a modified version of the Library.
The precise terms and conditions for copying, distribution and modification follow. Pay close attention to the difference between a "work based on the library" and a "work that uses th
252. running. A queued job will return queued when ms-searchcontrol.exe is called with the status argument.
ms-searchcontrol.exe version sessionID <string>
If the task is successful, this will return the version number.
CreatePIP
Usage: ms-createpip.exe [OPTION] -i filename
Options:
-h, --help: display this help page and exit
-f, --features: display list of features defined in mascot.dat and exit
--sessionID <id>: not normally used, because this is run from the command line
-o output_file: default is filename.pip
-q <queries>: override minimum number of queries set in mascot.dat
-s <sequences>: override minimum number of sequences set in mascot.dat
-a <feature>: add a feature to the list specified in mascot.dat
-r <feature>: remove a feature from the list specified in mascot.dat
-C: use cached results
-p <interval>: progress reports of "Process X" every <interval> seconds
--nocache: do not use cached results
--version: display version number and exit
Features:
retentionTime: Retention time in seconds, if available
dM: Calculated minus observed peptide mass, in Da
mScore: Mascot score (always on)
lgDScore: Mascot score minus Mascot score of next best non-isobaric peptide hit
mrCalc: Calculated Mr
charge: Charge
dMppm: Calculated minus observed peptide mass, in ppm
absDM: Absolute value of calculated minus
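The three mass-error features are simple arithmetic on the calculated and observed peptide masses. This sketch assumes that dMppm is taken relative to the calculated mass and that absDM is in Da, like dM; neither detail is stated above, so check against ms-createpip output before relying on it. The function name is hypothetical.

```python
def mass_error_features(mr_calc: float, mr_obs: float) -> dict:
    """dM, absDM and dMppm for one peptide match, from calculated and observed Mr."""
    dm = mr_calc - mr_obs  # calculated minus observed, in Da
    return {
        "dM": dm,
        "absDM": abs(dm),  # assumed to be in Da, like dM
        # Assumption: ppm is taken relative to the calculated mass.
        "dMppm": dm / mr_calc * 1e6,
    }


if __name__ == "__main__":
    # A 0.01 Da error on a 1000 Da peptide is 10 ppm.
    print(mass_error_features(1000.0, 999.99))
```

The ppm form is usually the more useful feature because it stays comparable across peptides of different masses.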
253. ry), the recipient automatically receives a license from the original licensor to copy, distribute, link with or modify the Library subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties with this License.
11. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Library at all. For example, if a patent license would not permit royalty-free redistribution of the Library by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Library.
If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply, and the section as a whole is intended to apply in other circumstances.
It is not the purpose of this section to induce you
254. s a utility for retrieving taxonomy details for an entry in a database configured for use by Mascot. The utility can be used to retrieve information for a single entry or in batch mode.
Single entry mode
The executable x-cgi/ms-gettaxonomy.exe can be called from the command line, or via a URL as a CGI application. When calling it as a CGI application, with arguments appended to the URL, the parameter list must be URL-escaped: spaces replaced by +, and characters other than letters or numbers replaced by %xx, where xx is the ASCII code for the character as a hexadecimal number. When running from a command line, the accession string should be enclosed in single or double quotes. This is essential for accession strings beginning gi|, because the pipe character has special meaning in Linux and Windows.
In the table below, the first argument supplied to ms-gettaxonomy.exe is an integer to specify the mode. The remaining arguments are selected from:
database: Mascot database name, e.g. NCBInr
accession: accession string, e.g. gi|7633482
tax_ID: taxonomy ID number, e.g. 9606
species: name of species, e.g. homo sapiens
[Screenshot: web browser showing the output of x-cgi/ms-gettaxonomy.exe called with the arguments 4 MSDB CCHU: "Taxonomy for CCHU", CCHU Homo sapiens (human, man]
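The escaping rule above is easy to get wrong by hand, particularly for gi| accessions. This is a minimal sketch that applies it literally to the whole space-separated parameter list: spaces become +, letters and digits pass through, and everything else becomes %xx. The server URL and both function names are hypothetical; non-ASCII characters are encoded as UTF-8 bytes, which goes beyond the ASCII rule in the text. Note that, under this rule, a space inside a single argument (such as a two-word species name) also becomes +.

```python
def escape_parameter_list(params: str) -> str:
    """Escape a space-separated CGI parameter list as described above."""
    out = []
    for ch in params:
        if ch == " ":
            out.append("+")
        elif ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            # %xx: the character code as two hexadecimal digits.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
    return "".join(out)


def gettaxonomy_url(script_url: str, mode: int, *args: str) -> str:
    """Build a single-entry ms-gettaxonomy.exe URL: mode first, then the arguments."""
    parameter_list = " ".join([str(mode), *args])
    return script_url + "?" + escape_parameter_list(parameter_list)


if __name__ == "__main__":
    url = "http://myserver/mascot/x-cgi/ms-gettaxonomy.exe"  # hypothetical server
    print(gettaxonomy_url(url, 4, "MSDB", "CCHU"))
```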
255. s also need to be able to read and write to these files. For example, with the Microsoft web server IIS, a new user with the name IUSR_<name_of_pc> is created when the server is installed, and the scripts are run using this user name. The installation program sets these values appropriately. Other web servers may use different user names, with different permissions.
NTIUserGroup is the name of a group that the user name of the process that runs CGI scripts belongs to. NTMonitorGroup is the name of the local Administrators group.
If not using IIS, check the documentation that comes with the server to find out which user name is used for running scripts. Then, from the Start menu, choose Programs > Administrative Tools (Common) > User Manager. Double-click on the user name and press the Groups button to find out which groups this user name belongs to. This is the name to put in mascot.dat for NTIUserGroup.
Failure to put the correct group name will generally result in one of two error messages:
Failed to open memory mapped file <filename> (Error: access denied)
or
Failed to create memory map for <filename> (Error: Access denied)
After changing either of these entries, the Mascot service will need to be stopped (from the Start menu, choose Programs > Mascot > config > Stop Mascot service). All compressed database files must be deleted. Then the Mascot service
256. s normally generated by the web browser. If another application is used to generate an input file, simply ensure that it conforms to the MIME format standard. The Mascot Monitor test searches use captured input files. Hence, an example of a file can be seen by opening mascot/data/test/SwissProt.asc in any text editor.
[Example listing: a series of form-data sections, each introduced by a boundary line ending in 243501029130836 and a Content-Disposition: form-data; name="..." header. Parameter names shown include INTERMEDIATE, FORMVER, SEARCH, PEAK, REPTYPE, ErrTolRepeat, SHOWALLMODS and USERNAME; values shown include "MS/MS Test Search", "SwissProt", "Trypsin/P", "1", "None", "All entries", "Carbamidomethyl (C)", "Oxidation (M)" and "100".]
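If you generate input files from another application, the body can be assembled as a sequence of form-data sections, each introduced by two hyphens and the boundary string. This sketch writes only those sections plus the closing delimiter; it does not produce the top-level Content-Type header a browser would send, and whether Mascot needs one in a stand-alone file is not covered here. The boundary, the helper name, and the choice of CRLF line endings (as used by RFC 2046) are assumptions; compare the result with SwissProt.asc before use.

```python
def build_search_input(fields, boundary, newline="\r\n"):
    """Serialise (name, value) pairs as multipart form-data sections."""
    if any(boundary in value for _, value in fields):
        raise ValueError("boundary string occurs inside a field value")
    lines = []
    for name, value in fields:
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates the header from the data
        lines.append(value)
    lines.append(f"--{boundary}--")  # closing delimiter
    return newline.join(lines) + newline


if __name__ == "__main__":
    # Illustrative field values; real searches need many more parameters.
    print(build_search_input([("FORMVER", "1.01"), ("REPTYPE", "peptide")],
                             "hypotheticalBoundary123", newline="\n"))
```

The boundary only has to be a string that never occurs in the data, which is why long random-looking strings are used.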
257. s to the Apache configuration. Add the following ScriptAlias entry immediately before the ScriptAlias for /mascot/cgi/:
ScriptAlias /mascot/cgi/htsearch /usr/lib/cgi-bin/htsearch
On Red Hat/CentOS, /usr/lib/cgi-bin should be replaced with /var/www/cgi-bin. You may also need to add the following if you get 403 errors, especially if you have Mascot defined in a separate virtual host:
<Directory /usr/lib/cgi-bin>
Order allow,deny
Allow from all
</Directory>
Finally, build an index of the Mascot web site documents:
rundig -v
This may need to be run by the web server user, or by root, depending on how htdig has been installed and configured. Indexing will only take a minute or two. The -v flag causes verbose progress reports to be generated.
Miscellaneous
Hyper-threading (Intel only)
Hyper-threading is a technique used by Intel to improve the performance of multi-threaded programs. Hyper-threading does not double performance, because each pair of hardware threads shares the resources of one physical core, such as the on-chip cache. On some systems, a BIOS setting can be used to enable and disable hyper-threading.
Hyper-threading is detected automatically. Each CPU in the Mascot licence enables up to 4 cores to be used for searches. Hyper-threading is ignored when counting cores, so you may see a 1 CPU licence using 8 threads on a system with a quad-core processor with hyper-threading enabled.
File System
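The core-counting rule can be written as a small calculation. This sketch assumes that the licence caps physical cores at 4 per licensed CPU and that every hardware thread on a licensed core may then be used, which reproduces the 1 CPU / quad-core / 8 thread example above; it is an illustration of the rule, not Mascot's own code, and the function name is hypothetical.

```python
def max_search_threads(licensed_cpus: int, physical_cores: int,
                       threads_per_core: int = 1, cores_per_cpu: int = 4) -> int:
    """Upper bound on search threads under the core-counting rule above."""
    # Only physical cores count against the licence...
    licensed_cores = min(licensed_cpus * cores_per_cpu, physical_cores)
    # ...and each licensed core may run all of its hyper-threads.
    return licensed_cores * threads_per_core


if __name__ == "__main__":
    # 1 CPU licence, quad-core processor, hyper-threading enabled.
    print(max_search_threads(1, 4, threads_per_core=2))
```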
258. [Screenshot: Windows Services console, with services listed as Started]
Highlight the entry for Matrix Science Monitor Service and press Start. If the service fails to start, the cause must be investigated and the problem fixed before proceeding. Monitor progress using the Database Status page on the master: choose Monitor log, and watch for error messages as the program files and database files are copied to the search nodes.
Completion
The installation is now complete. There will be a lot of disk activity while the Mascot service compresses the SwissProt sequence database. Searches on the database cannot be performed until the files have been compressed. You should open up the status screen in a web browser (Start menu > Programs > Mascot > Search Status) and verify the cluster status.
If this is a clean installation or a version update, you will need to follow the links to register a product key, as described in Chapter 2 (Linux installation) or Chapter 3 (Windows installation). Once the licence file has been saved to config/licdb on the master node, you will be able to proceed to Database Status.
[Screenshot: Mascot search status page (ms-status.exe) in Mozilla Firefox]
259. script can be edited.
SubClusterSet X Y
Large clusters can be divided into sub-clusters. X is a unique integer value (0-based) used to identify the sub-cluster. Y is the maximum number of processors in the sub-cluster. A single cluster must have a single entry, with X set to 0.
IPCTimeout
The timeout, in seconds, for inter-process communication.
IPCLogging
0 for no logging of inter-process communication, 1 for minimal logging, 2 for verbose logging.
IPCLogfile
The relative path to the inter-process communication log file.
CheckNodesAliveFreq
The interval, in seconds, between health checks on the nodes.
SecsToWaitForNodeAtStartup
At startup, if a node is not available within this time, the system will continue to start up without that node. If the value is set to 0, the system will wait indefinitely. The default is 60 seconds. This timeout is also used if a node fails while the system is running: the system will wait for this number of seconds before re-initialising ms-monitor.exe. This means that a short-lived interruption in network communication doesn't create a major service interruption.
MascotNodeRebootScript
Path to an optional CGI script to reboot a cluster node. If this parameter is defined, there will be a link at the bottom of each Mascot Cluster Node status page. Clicking on this link will execute the specified CGI script, with the host name of the specified node as an argument.
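The SecsToWaitForNodeAtStartup semantics (a finite wait, with 0 meaning wait forever) correspond to a simple polling loop. This sketch is only an illustration of that rule: is_alive is a hypothetical callback standing in for whatever health check is used, and the one-second poll interval is an assumption, not the interval Mascot uses.

```python
import time
from typing import Callable


def wait_for_node(is_alive: Callable[[], bool], secs_to_wait: float = 60,
                  poll_interval: float = 1.0) -> bool:
    """Poll until the node responds; 0 means wait indefinitely.

    Returns True if the node came up, False if the wait timed out.
    """
    deadline = None if secs_to_wait == 0 else time.monotonic() + secs_to_wait
    while True:
        if is_alive():
            return True
        if deadline is not None and time.monotonic() >= deadline:
            return False  # continue without this node
        time.sleep(poll_interval)
```

time.monotonic is used instead of wall-clock time so that a clock adjustment during start-up cannot shorten or extend the wait.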
260. se.
3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special ex
261. se the whole of the work to be licensed at no charge to all third parties under the terms of this License.
d) If a facility in the modified Library refers to a function or a table of data to be supplied by an application program that uses the facility, other than as an argument passed when the facility is invoked, then you must make a good faith effort to ensure that, in the event an application does not supply such function or table, the facility still operates, and performs whatever part of its purpose remains meaningful.
(For example, a function in a library to compute square roots has a purpose that is entirely well-defined independent of the application. Therefore, Subsection 2d requires that any application-supplied function or table used by this function must be optional: if the application does not supply it, the square root function must still compute square roots.)
These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Library, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Library, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part rega
262. second column is the parent taxonomy ID. Note that the parent of Arabidopsis thaliana (3702) is Arabidopsis (3701).
[Extract from nodes.dmp: rows for taxonomy IDs 3700 (parent 3699, rank family), 3701 (parent 3700, rank genus) and 3702 (parent 3701, rank species, code AT)]
Both files can be obtained from the NCBI FTP site: ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz. For NCBInr, you will also need gi_taxid_prot.dmp.gz. For NCBI EST databases, you will need gi_taxid_nucl.dmp.gz.
You should not modify the names.dmp and nodes.dmp files in the taxonomy directory. If you wish to add more entries, a new file should be made with just the new entries. Mascot will load multiple files, as specified below. Most Mascot updates will contain the updated names.dmp and nodes.dmp files.
PDBeast File
This file contains a list of entries that are derived from the Brookhaven Protein Data Bank (PDB). The file is available at ftp://ftp.ncbi.nih.gov/mmdb/pdbeast/table.
SwissProt File
SwissProt also supplies a file, speclist.txt, that is similar to the NCBI names.dmp file, except that it gives the NCBI taxonomy ID for the SwissProt Code. A regular expression is used to extract the Code and Taxon Node from the file. The regular expression should be defined in any Taxonomy_x section that uses speclist.txt, and is defined as:
SWISSPROTRegex A Z0 9 ABEV 0 9 Code T
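Because each nodes.dmp row names its parent, the full lineage of a species is found by following parent IDs upwards, as in the Arabidopsis example above. This sketch assumes the usual NCBI layout, with fields separated by "\t|\t" and each line ending in "\t|"; the function names are hypothetical, and it reads only the first three fields (taxonomy ID, parent ID, rank).

```python
def parse_nodes_dmp(lines):
    """Map tax_id -> (parent_tax_id, rank) from nodes.dmp-style lines."""
    nodes = {}
    for line in lines:
        # Splitting on "|" and stripping handles the "\t|\t" separators.
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def lineage(tax_id, nodes):
    """Follow parent IDs from tax_id upwards until the root or an unknown ID."""
    path = [tax_id]
    seen = {tax_id}
    while tax_id in nodes:
        parent = nodes[tax_id][0]
        if parent in seen:  # the root node (1) is its own parent
            break
        path.append(parent)
        seen.add(parent)
        tax_id = parent
    return path
```

With the three rows shown above, lineage(3702, nodes) walks 3702, 3701, 3700 and stops at 3699, whose own row is not in the extract.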
263. sion, Content-Type

Both Mascot search input files and results output files are in MIME format. This is a text format, so the files can easily be viewed for inspection or debugging purposes. The MIME format is defined in various Request for Comments documents. The following are the most relevant:
ftp://ftp.isi.edu/in-notes/rfc2045.txt
ftp://ftp.isi.edu/in-notes/rfc2046.txt
ftp://ftp.isi.edu/in-notes/rfc2388.txt
Very briefly, a unique boundary string is used to divide the file into sections, each of which contains data in a format defined by a Content-Type. Each section begins with two hyphens followed by the boundary string. The next line contains the content definition and name, followed by a blank line, then data until the beginning of the next section. For example:

MIME-Version: 1.0 (Generated by Mascot version 1.0)
Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08jU534c0p

--gc0p4Jq0M2Yt08jU534c0p
Content-Type: application/x-Mascot; name="first_name"

first value
--gc0p4Jq0M2Yt08jU534c0p
Content-Type: application/x-Mascot; name="another_name"

another value
--gc0p4Jq0M2Yt08jU534c0p
Content-Type: application/x-Mascot; name="final_name"

final value
--gc0p4Jq0M2Yt08jU534c0p--

Search Input File

The search input file i
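The boundary rules above are enough to split a file into its named sections. The following Python sketch is a hypothetical helper, not part of Mascot or Mascot Parser (which remains the recommended way to read result files); it assumes each section header is a single `Content-Type` line carrying a `name` parameter, as in the example.

```python
import re

BOUNDARY_RE = re.compile(r'boundary="?([^";\s]+)')
NAME_RE = re.compile(r'name="?([^";]+)"?')


def split_mascot_mime(text):
    """Return {section name: body text} for a Mascot-style MIME file."""
    match = BOUNDARY_RE.search(text)
    if match is None:
        raise ValueError("no MIME boundary declared")
    delimiter = "--" + match.group(1)
    sections = {}
    name, body, in_header = None, [], False
    for line in text.splitlines():
        if line.startswith(delimiter):
            if name is not None:
                sections[name] = "\n".join(body)  # close the previous section
            if line.rstrip() == delimiter + "--":
                break  # closing delimiter: no more sections
            name, body, in_header = None, [], True
        elif in_header:
            if line.strip() == "":
                in_header = False  # blank line separates header from data
            elif line.lower().startswith("content-type"):
                found = NAME_RE.search(line)
                name = found.group(1) if found else None
        elif name is not None:
            body.append(line)  # data line of the current section
    return sections


SAMPLE = """MIME-Version: 1.0 (Generated by Mascot version 1.0)
Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08jU534c0p

--gc0p4Jq0M2Yt08jU534c0p
Content-Type: application/x-Mascot; name="first_name"

first value
--gc0p4Jq0M2Yt08jU534c0p
Content-Type: application/x-Mascot; name="final_name"

final value
--gc0p4Jq0M2Yt08jU534c0p--
"""

print(split_mascot_mime(SAMPLE))
# {'first_name': 'first value', 'final_name': 'final value'}
```

Lines before the first boundary (the top-level MIME headers) are skipped, and multi-line section bodies are rejoined with newlines.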
264. sk will then appear in the lower list. Similarly, to remove a task, click on the check box in the lower list and click on the Remove button. To get further information about any task, hold the mouse over the task in the lower window, and further details will appear in the help box.
No changes to a group are saved until the Save changes button is pressed.

Session files
Session files are created in the mascot/sessions directory. Sessions that have expired will be deleted automatically by ms-monitor.

Log file
The log file security.log in the mascot/logs directory contains information about all security changes. For security reasons, the file is not available from any web-based application. The level of logging can be controlled from the security administration utility.

Configuration Files
Security information is saved in four configuration files in the mascot/config directory: security_options.xml, security_tasks.xml, group.xml and user.xml. The schema for these files is mascot_security_1_0.xsd. Use the security administration utility or Mascot Parser rather than editing these files manually.

Automating addition of new users
Mascot Parser users have access to all of the documentation for the lower-level functions to administer Mascot security programmatically. The security administration utility uses some of these functions. To simply add a large number of users, then
265. soft.com/en-US/windows/downloads/windows-vista

[Screenshot: Turn Windows features on or off dialog, with Internet Information Services expanded to show FTP Publishing Service, Web Management Tools (including IIS 6 Management Compatibility) and World Wide Web Services (Application Development Features such as CGI and ISAPI Extensions, Common HTTP Features, Health and Diagnostics, Performance Features, Security)]
266. [Screenshot: list of instrument definitions, each with a Max mass of 700.000 and Delete and Edit links, followed by a New Instrument button]

The INSTRUMENT search parameter is used to select the set of ion series used for scoring MS/MS matches.

File format: fragmentation_rules
Each instrument is defined by a block of lines. Blocks are delimited from one another by a line containing an asterisk (*). The first line of each block must start with the Title keyword, followed by a text string that is used to identify the instrument in forms and reports. The definition should be short and self-explanatory. It should only include alphanumeric characters and hyphens.
The following lines start with an integer, each of which represents an ion series or a rule to be included in the definition. Refer to the file header for a list of available integers. Anything following a hash symbol (#) is treated as a comment.
A block can also specify mass range limits for internal ions. The default range is 0 to 700 Da, and could be changed as in this example:

title:MALDI-QIT-TOF
1 # singly charged
4 # immonium
5 # a series
6 # a-NH3 if a significant and fragment includes RKNQ
7 # a-H2O if a significant and fragment includes STED
8 # b series
9 # b-N
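The block structure described above (asterisk delimiters, a Title line, integer rule lines, `#` comments) can be read with a few lines of Python. This is a minimal sketch: the function name is hypothetical, it assumes the title is written as `title:` followed by the name as in the example, and it ignores any line that is not a title or a bare integer, including internal-ion mass limits.

```python
def parse_fragmentation_rules(text):
    """Return {instrument title: [rule numbers]} from fragmentation_rules text."""
    instruments = {}
    title, rules = None, []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # anything after '#' is a comment
        if not line:
            continue
        if line == "*":  # a line containing an asterisk ends a block
            if title is not None:
                instruments[title] = rules
            title, rules = None, []
        elif line.lower().startswith("title"):
            title = line[len("title"):].lstrip(":").strip()
        elif line.isdigit():
            rules.append(int(line))  # ion series or rule number
        # Any other line (e.g. internal-ion mass limits) is ignored here.
    if title is not None:  # the last block may not be followed by '*'
        instruments[title] = rules
    return instruments


EXAMPLE = """*
title:MALDI-QIT-TOF
1 # singly charged
4 # immonium
5 # a series
8 # b series
*
"""

print(parse_fragmentation_rules(EXAMPLE))
# {'MALDI-QIT-TOF': [1, 4, 5, 8]}
```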
267. st data. This means that a results file contains everything necessary to generate a report, repeat the search at a later date, or act as the self-contained input file to a project database or LIMS.
Mascot Parser provides an object-oriented Application Programmer Interface (API) to Mascot result files and configuration files, making it easy for programs written in C++, Java, Perl or Python to access Mascot results. We strongly recommend that anyone writing software to process Mascot results uses Mascot Parser, just like all of the Mascot result report scripts:
• It makes application development much faster.
• It makes your code simpler and easier to debug.
• You don't have to worry about updating your code every time a new version of Mascot is released.
The Mascot Parser package, which includes object libraries, header files, binary executables, extensive documentation and example code for many functions, is available as a free download. For more information, go to http://www.matrixscience.com/msparser.html.
For reference, the result file contents are divided into logical sections:
1. Search parameters
2. Mass values
3. Quantitation method (if used)
4. Unimod extract
5. Enzyme definition
6. Taxonomy (if a taxonomy filter was used)
7. Misc. header information
8. Summary results for Protein Summary
9. Mixtures (if PMF)
10. Summary of decoy results (if automatic decoy)
11. Summary of error tolerant results (if automatic ET
268. st of performing this distribution.
d) If distribution of the work is made by offering access to copy from a designated place, offer equivalent access to copy the above specified materials from the same place.
e) Verify that the user has already received a copy of these materials or that you have already sent this user a copy.
For an executable, the required form of the "work that uses the Library" must include any data and utility programs needed for reproducing the executable from it. However, as a special exception, the materials to be distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable.
It may happen that this requirement contradicts the license restrictions of other proprietary libraries that do not normally accompany the operating system. Such a contradiction means you cannot use both them and the Library together in an executable that you distribute.
7. You may place library facilities that are a work based on the Library side-by-side in a single library together with other library facilities not covered by this License, and distribute such a combined library, provided that the separate distribution of the work based on the Library and of the other library facilities is otherwise permitted
269. start with the Title keyword, followed by a text string that is used to identify the species in forms and reports. The definition should be short and self-explanatory. To show the tree structure, indentation can be used. Unfortunately, it is not possible to use tabs or multiple spaces for indentation in an HTML form, so a full stop (period) and a space are used to indent the list. Internal spaces are significant, and there should never be two or more spaces together.
This should be followed by a definition line starting with the Include keyword, followed by one or more NCBI taxonomy IDs separated with commas.
This should be followed by a definition line starting with the Exclude keyword, followed by one or more NCBI taxonomy IDs separated with commas. Any sequence with a taxonomy ID that passes the include test may then be rejected by any entry in the exclude list.
Finally, each entry must end with a *.
There are two ways of finding the NCBI taxonomy ID for a given species. The first is to open the file names.dmp in the mascot/taxonomy directory (under Windows, from the Start menu choose Programs > Mascot > Config > NCBI taxonomy names.dmp file) and search for the species name. The ID is the number on the left. For example, the ID for Filicophyta is 3263:
3263   Filicophyta   scientific name
3263   ferns   preferred common name
Alternatively, the NCBI taxonomy browser can be used: http://www.ncbi.nlm.nih.gov/Taxonomy
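The indentation convention and the Include/Exclude order can be illustrated in Python. This sketch assumes, as an illustration only, that an ID listed in Include or Exclude also covers every descendant, so each sequence is tested via its full lineage (its own taxonomy ID plus all ancestor IDs, for example from nodes.dmp parent links). The function names are hypothetical and this is not Mascot's own code.

```python
def indent_level(title):
    """Split a taxonomy Title into (depth, name); each leading '. ' is one level."""
    depth = 0
    while title.startswith(". "):
        title = title[2:]
        depth += 1
    return depth, title


def passes_taxonomy_filter(lineage_ids, include, exclude):
    """Apply one Include/Exclude pair to a sequence.

    lineage_ids: the sequence's taxonomy ID followed by its ancestors' IDs
    (assumption: an Include/Exclude ID also matches all of its descendants).
    """
    ids = set(lineage_ids)
    if not ids & set(include):
        return False  # fails the include test
    return not ids & set(exclude)  # a passing sequence may still be excluded


arabidopsis_thaliana = [3702, 3701, 3700]  # species, genus, family
print(indent_level(". . Arabidopsis thaliana"))                     # (2, 'Arabidopsis thaliana')
print(passes_taxonomy_filter(arabidopsis_thaliana, [3700], []))     # True
print(passes_taxonomy_filter(arabidopsis_thaliana, [3700], [3701])) # False
```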
270. sted in pmf_queries_used.

Summary results

--gc0p4Jq0M2Yt08jU534c0p
Content-Type: application/x-Mascot; name="summary"

qmass1=Mr
qexp1=m/z, charge for query 1
qintensity1=intensity value for query 1 (if available)
qmatch1=total number of peptide mass matches for query 1 in database
qplughole1=threshold score for homologous peptide match (MIS only)
qmass2=, qexp2=, qintensity2=, qmatch2=, qplughole2=
...
qmassn=, qexpn=, qintensityn=, qmatchn=, qplugholen=
num_hits=number of hits in the summary block (< max_hits)
h1=accession string, total protein score, obsolete, intact protein mass
h1_text=title text
h1_frame=frame_number (between 1 and 6, for nucleic acid only)
h1_q1=missed cleavages (-1 indicates no match), peptide Mr, delta, start, end, number of ions matched, peptide string, peaks used from Ions1, variable modifications string, ions score, multiplicity, ion series found, peaks used from Ions2, peaks used from Ions3, total area of matched peaks
h1_q1_et_mods=modification mass, neutral loss mass, modification description
h1_q1_et_mods_master=neutral loss mass, neutral loss mass
h1_q1_et_mods_slave=neutral loss mass, neutral loss mass
h1_q1_primary_nl=neutral loss string
h1_q1_na_diff=original NA sequence, modified NA sequence
h1_q1_tag=tagNum, startPos, endPos, seriesID
h1_q1_drange=startPos, endPos
h1_q1_terms=residue
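Because the per-query entries share a numeric suffix, the summary section can be regrouped by query number. This Python sketch assumes each entry is a `key=value` line and uses a hypothetical helper name; the sample values are made up. Mascot Parser is still the supported way to read these files.

```python
import re
from collections import defaultdict

# Per-query keys listed above: qmassN, qexpN, qintensityN, qmatchN, qplugholeN.
QUERY_KEY = re.compile(r"^q(mass|exp|intensity|match|plughole)(\d+)$")


def group_summary(lines):
    """Split summary-section lines into per-query values and everything else."""
    queries = defaultdict(dict)
    other = {}
    for line in lines:
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        key, value = key.strip(), value.strip()
        match = QUERY_KEY.match(key)
        if match:
            queries[int(match.group(2))][match.group(1)] = value
        else:
            other[key] = value  # e.g. num_hits, h1, h1_q1, ...
    return dict(queries), other


# Made-up values, for illustration only.
queries, other = group_summary(["qmass1=1000.50", "qexp1=501.26,2+", "qmatch1=12", "num_hits=50"])
print(queries)  # {1: {'mass': '1000.50', 'exp': '501.26,2+', 'match': '12'}}
print(other)    # {'num_hits': '50'}
```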
271. t follow these steps:
a. On the Edit menu, point to New, and then click DWORD Value.
b. Type LocalAccountTokenFilterPolicy and then press ENTER.
4. Right-click LocalAccountTokenFilterPolicy, and then click Modify.
5. In the Value data box, type 1, and then click OK.
6. Exit Registry Editor.
Repeat this entire procedure on every search node.

Vista, Server 2008 and Windows 7
On each search node, from Control Panel > Administrative Tools, open the Services dialog and select Remote Registry. Unless it is already set to Automatic, right-click and choose Properties. On the General tab, set Startup type to Automatic, and also start the service. Choose OK.

Starting the Mascot service for the first time
On the master node, from Control Panel > Administrative Tools, open the Services dialog and select Matrix Science Mascot Service. Right-click and choose Properties. Go to the Log On tab and choose This account. Enter the user name and password for a domain account with local Administrator rights on each search node (not the local administrator account on the master). You could use a domain administrator account, but this might be considered risky.
If the nodes do not belong to a domain, all nodes, including the master, must have a user defined with administrator rights and the same user name and password. The service must be set to log in as this user.
Matrix Science Mascot Service Properties (Local Co
272. t at 0, but see your computer documentation.
The ProcessorSet line specifies the complete set of logical processors (cores) to be used. Separate processor values with a comma. The number of values in this list must be less than or equal to four times the number of physical CPUs licensed, or the system will not run.
Following this, the processors to be used for each database are specified. These numbers must be a subset of the numbers in the ProcessorSet, and there must be the same number of values as the number of threads specified earlier in the database section. For example, if you had a 1-CPU licence, the physical processor had 6 cores, and you wanted to avoid using cores 0 and 1, you could specify this as follows:

PROCESSORS
ProcessorSet 2,3,4,5
SwissProt 2,3,4,5
end

The PROCESSORS section must be after the Databases section in mascot.dat, and ProcessorSet must come before the other entries in this section.

Taxonomy
Do not modify this section if you ever use Database Manager. The syntax of the taxonomy blocks is fully described in Chapter 9.

Cluster
The syntax of the cluster block is fully described in Chapter 11.

UniGene
Do not modify this section if you ever use Database Manager. UniGene is an index created by automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters (http://www.ncbi.nlm.nih.gov/UniGene). Each UniGene cluster is a list of the GenBa
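The PROCESSORS rules above (at most four values per licensed CPU, per-database values drawn from ProcessorSet, one value per configured thread) can be checked before editing mascot.dat. This is a minimal Python sketch with hypothetical function names; the thread counts are supplied by the caller, since reading them from the Databases section is not shown.

```python
def parse_processor_list(value):
    """Turn a comma-separated list such as '2,3,4,5' into [2, 3, 4, 5]."""
    return [int(item) for item in value.split(",") if item.strip()]


def validate_processors(processor_set, db_processors, db_threads, licensed_cpus):
    """Check a PROCESSORS section against the rules described above.

    processor_set: values from the ProcessorSet line
    db_processors: {database name: processor values for that database}
    db_threads:    {database name: thread count from the database section}
    Returns a list of problems; an empty list means no rule was broken.
    """
    problems = []
    if len(processor_set) > 4 * licensed_cpus:
        problems.append(
            f"ProcessorSet lists {len(processor_set)} processors; "
            f"the limit for {licensed_cpus} CPU licence(s) is {4 * licensed_cpus}"
        )
    allowed = set(processor_set)
    for db, cores in db_processors.items():
        extra = sorted(set(cores) - allowed)
        if extra:
            problems.append(f"{db}: processors {extra} are not in ProcessorSet")
        threads = db_threads.get(db)
        if threads is not None and len(cores) != threads:
            problems.append(f"{db}: {len(cores)} processors listed but {threads} threads configured")
    return problems


# The example above, assuming SwissProt is configured with 4 threads.
print(validate_processors(parse_processor_list("2,3,4,5"),
                          {"SwissProt": parse_processor_list("2,3,4,5")},
                          {"SwissProt": 4}, licensed_cpus=1))  # []
```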
273. t file, so cannot be recovered by changing this parameter in a repeat search.

ProteinFamilySwitch
See NoResultsScript.

ProteinsInResultsFile 2
Determines the number of protein title lines saved to each results file.
1: As in Mascot 1.7 and earlier, only proteins that appear in the Summary section will appear in the Proteins section.
2: Include proteins with at least one top-ranking peptide match to a peptide of length greater than MinPepLengthInPepSummary.
3: Include all proteins.

proxy_password, proxy_server, proxy_username
These entries support a proxy server between the Mascot server and the outside world. A typical entry might be:
proxy_server http://our-cache:3128
If there is no proxy_server entry, scripts will look for proxy information in the server environment. The proxy_username and proxy_password parameters are only required if the proxy server requires authentication. Remote host authentication should be included directly in the URLs specified in mascot.dat, e.g. http://username:password@hostname

RemoveOldIndexFiles 1
After a successful database swap, the compressed files in the current directory are deleted, unless this parameter is present and set to 0.

ReportBuilderColumnArrangement
Set the column arrangement at the given index. Column arrangements are used by Report Builder (introduced in Mascot 2.4) to provide a default list of columns to show. These can be sel
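For a custom Python script that needs to reach the same remote sites, the proxy behaviour described above (an explicit proxy server with optional credentials, otherwise the environment) maps onto the standard `urllib.request.ProxyHandler`. This is a sketch under that assumption: `proxy_handler` is a hypothetical helper, not something Mascot provides, and the host name is the placeholder from the example entry.

```python
import urllib.request
from urllib.parse import quote


def proxy_handler(proxy_server=None, username=None, password=None):
    """Build a urllib ProxyHandler from proxy_server-style settings.

    With no proxy_server, urllib falls back to proxy settings in the
    environment (http_proxy, https_proxy, ...).
    """
    if not proxy_server:
        return urllib.request.ProxyHandler()  # reads the environment
    if "://" in proxy_server:
        scheme, rest = proxy_server.split("://", 1)
    else:
        scheme, rest = "http", proxy_server
    if username:
        credentials = quote(username, safe="")  # escape characters such as '@'
        if password:
            credentials += ":" + quote(password, safe="")
        rest = f"{credentials}@{rest}"
    proxy_url = f"{scheme}://{rest}"
    return urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})


# Builds an opener that routes requests through the proxy; nothing is sent yet.
opener = urllib.request.build_opener(proxy_handler("http://our-cache:3128"))
```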
274. tabase. Parameters: db (database name that was requested).
462 One or more errors happened while loading taxonomy nodes. Parameters: messages (more detailed error information).
460 Failed to register job. Please inspect mascot error log.
270 A POST request is submitted with zero content length.
55 Cannot find boundary string.
56 First line was not a boundary.
259 Corrupted input: possibly a binary file is submitted.
72 Corrupted input or incompatible browser.
458 Invalid accession format for ms-getseq.exe.
459 Too large POST request.
54 Standard input stream error. Parameters: bytesread (number of bytes already read), lengthofdata (total size of input data in the stream).

Non-fatal Errors
461 Sequence not found. Parameters: accession (accession string), frame (frame number; 0 if not supplied in the input, or missing if AA database).

Warnings that are only reported at the end of the XML document
400 Missing or invalid gencode id; Table 1 is used for translation. Parameters: accession (accession string), frame (frame number; 0 if not supplied in the input, or missing if AA database).
470 Cannot find taxonomy id. Parameters: accession (accession string), frame (frame number; 0 if not supplied in the input, or missing if AA database).
104 Sequence is too long for translation. Parameters
275. taxonomy/usernodes.dmp (not required by most users)
Note that names.dmp is not required on the Mascot nodes.

Start-up of ms-monitor.exe
The following sequence occurs for each node when ms-monitor.exe starts for the first time on the master system (items marked * are for Windows clusters only):
• See if the computer is available by opening a socket to the ping port (port 7).
• If there is an entry StopMascotNodeCmd in the mascot.dat file, then run that command to stop the Mascot node daemon, or
• * See if there is a MascotNodeService installed on the computer; if there is, then stop that service.
• If there is no ms-mascotnode.exe, or if it is out of date on the Mascot node, then copy or update the file from the cluster/<OS> directory on the Mascot master system to the specified directory on the Mascot node.
• * If the service is not installed, then install the service and add a registry entry for the directory to be changed to at start-up.
• Make a logs and a config directory, and copy mascot.dat and mascot.license.
• Start the MascotNodeService on the Mascot node computer. With a Linux-based system, the ms-mascotnode.exe daemon will be started.
• Check that the service or daemon now communicates through TCP/IP sockets. If it fails, then a message indicating which Mascot node it is waiting for is displayed in the ms-status screen.
• Initialise the MascotNodeService or daemon by sending the ap
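The first start-up step, checking that a node answers on the ping port, amounts to trying to open a TCP connection. The following Python sketch does the same check from any machine, which can help when diagnosing a node that ms-monitor reports as unavailable. The helper name is hypothetical; port 7 is the standard echo port, and a firewall or a disabled echo service will make the check fail even if the node is up.

```python
import socket


def node_is_reachable(host, port=7, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


# Example (hypothetical node address):
# node_is_reachable("192.168.10.7")
```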
276. ter is also a search node, and will execute Mascot searches in addition to running Mascot Monitor and the web server, it must be added as a search node using this dialog. Use the Add, Edit and Delete buttons to specify the complete cluster.

[Screenshot: Cluster Setup dialog with columns Node Address, Port, Processors, UNC Node Path and Node Directory, showing one node (192.168.10.7, platypus, port 5001, 2 processors, \\platypus\c$\mascotnode, c:\mascotnode), with Add, Edit and Delete buttons]

Press OK to return to the installation wizard, and file installation will begin. Copying the files and configuring the system may take some time. Once complete, you will be presented with a message requesting that you configure and start the Mascot Monitor service. This has to be done manually. The Monitor service on the master needs to be run under an account that has local Administrator rights on each node, because it needs to write to the registry, and to install, start and stop services, on each node. If you later change the password for this account, remember to change it in the Log On tab of the Matrix Science Mascot Service properties as well.

Very large clusters
Defining a very large cluster using the Add node dialog can be tedious. It is usually faster to define a small cluster, let the installation program run to completion, then edit the configuration files using a text editor. From the Program menu, stop the Mascot service and edit the cluster and sub-cluster configuration details into mascot.dat and
277. the Windows Start menu, choose Programs > Apache HTTP Server 2.2 > Control Apache Server > Restart. You should now be able to view Mascot pages in a web browser and proceed with licence registration. If Windows Firewall is enabled, you will probably have to open up port 80, as described in Chapter 3, before the Mascot server can be accessed from other computers.

Keyword Indexing
The keyword index required for site search will not have been built during Mascot installation, because the web server mappings were not in place. To build the keyword index, open a command window and enter the following commands. If Mascot was installed into a different path, you may have to modify the first two lines.

c:
cd \inetpub\mascot\htdig
bin\htdig.exe -v
bin\htmerge.exe -v

Once the commands have completed, keyword search using the control at the top right of the web pages should be operational.

Using shebang under Windows
The configuration file created by the installer includes this directive:
ScriptInterpreterSource Registry
This enables Windows-style registry file associations. Assuming Perl has been installed correctly, the extension .pl will be associated with perl.exe. Without this directive, Apache uses the shebang line at the top of each Perl script to associate the script with the Perl interpreter. The default shebang line is #!/usr/local/bin/perl. If you want to use this on a Windows system
278. the contributors or copyright holders not be used in advertising or publicity pertaining to distribution of the software without specific prior permission.
THE CONTRIBUTORS AND COPYRIGHT HOLDERS OF THIS SOFTWARE DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

Contents
End User Licence Agreements
Introduction
Installation: Linux
Installation: Microsoft Windows
Validation
Sequence Database Setup
Configuration & Log Files
Program Reference
I/O File Formats
Taxonomy
Mascot Dae
279. the default value is the most appropriate.

[Screenshot: Web Site page of the installation wizard, with a "Use SSL/TLS to access this web site" option]

Below you can modify the name of the Mascot virtual directory in Apache. However, we recommend that you accept the default name. This value is added to the host name given above to form the full Mascot URL, e.g. you might type into your browser: http://EC-VM64/mascot
Virtual Directory: mascot
The virtual directory name can be changed, but remember that users are more likely to guess the correct URL if you stick with mascot. Also, some third-party software may incorrectly assume the directory name is always mascot.

[Screenshot: Cluster Configuration page of the installation wizard]
Cluster Configuration: Choose whether to use Mascot cluster mode.
Please read the cluster mode chapter of the installation manual before using this feature. For a standard installation of Mascot on a single computer, this feature should not be enabled. Otherwise, if you wish to enable cluster mode, then please select the option below and then click the Configure button to specify the nodes that will be in the cluster.
Enable Mascot cluster mode [Configure]
At least one node must be defined in the cluster.

If you have a multi-CPU licence, you can configure Mascot for execution on a networked cluster. If you intend to do this, refer to Chapter 11 for further details before proceeding. If you are install
280. the new database will display the following messages:
Creating compressed files
Running 1st test
First test just run OK
Trying to memory map files
Just enabled memory mapping
In Use

Troubleshooting

Proxy Server
Several databases do not have a local reference file. The default configuration is to retrieve full annotation text as required from the remote web sites. If there is a proxy server between your Mascot server and the Internet, this may fail unless you define your proxy server in the Options section of mascot.dat. The relevant parameters are proxy_server, proxy_username and proxy_password. Unless your proxy server uses authentication, you only need the first of these, and a typical entry in mascot.dat will look like this:
proxy_server http://our-cache:3128

Permissions / Security
Mascot Monitor will need to create the compressed database files in the database current directory, and may need to move old database files to the old directory. Mascot searches, running as CGI processes with very restricted privileges, need to read the files. Make sure Linux permissions or Windows security settings don't prevent this.

Files not where they are supposed to be
When you enable the database, if nothing happens, double-check that the sequence database files are exactly where the Path definition specifies. Note that the taxonomy files are shared and go into the Mascot taxonomy directory, n
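The permission requirements above (Monitor writes into the current directory, searches read the files) can be spot-checked with a short script. This Python sketch uses a hypothetical helper name and only tests the account that runs it, because `os.access` reports on the current user; run it as the account used by Mascot Monitor or by the CGI searches to get a meaningful answer.

```python
import os


def check_database_dir(current_dir):
    """List permission problems in a sequence database 'current' directory."""
    problems = []
    if not os.access(current_dir, os.W_OK | os.X_OK):
        problems.append(f"{current_dir}: not writable (compressed files are created here)")
    for name in sorted(os.listdir(current_dir)):
        path = os.path.join(current_dir, name)
        if os.path.isfile(path) and not os.access(path, os.R_OK):
            problems.append(f"{path}: not readable (searches need to read it)")
    return problems


# Example (hypothetical path):
# print(check_database_dir("/usr/local/mascot/sequence/SwissProt/current"))
```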
281. the username mickey, the command would be:
htpasswd -c /usr/local/mascot/config/passwd mickey
The -c argument tells htpasswd to create a new users file. You will be prompted to enter a password for mickey, and to confirm it by entering it again. Other users can be added to the existing file in the same way, except that the -c argument is not needed. The same command can also be used to modify the password of an existing user.

Specifying the password-protected resources
Having created a password file, the next step is to modify the configuration mapping for the x-cgi directory. Instead of the mapping shown earlier, you would use a directive like this:

<Directory "/usr/local/mascot/x-cgi">
AllowOverride None
Options None
AuthType Basic
AuthName "Restricted"
AuthUserFile /usr/local/mascot/config/passwd
require valid-user
</Directory>
ScriptAlias /mascot/x-cgi/ "/usr/local/mascot/x-cgi/"

You will need to stop and restart Apache, or send a kill -HUP to the parent process, to activate the new configuration. For further information on restricting access to the server, see the Authentication and Access Restrictions section of the Apache FAQ documentation.
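Once `AuthType Basic` protects the x-cgi directory, any client script has to send credentials with each request. Under HTTP Basic authentication that means an `Authorization` header containing `user:password` in Base64. The Python sketch below builds such a header; the server name, script path and password are placeholders, and the request is constructed but not sent.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Value of the Authorization header for HTTP Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


# Hypothetical server and password; the request is built, not sent.
request = urllib.request.Request(
    "http://your-server/mascot/x-cgi/ms-status.exe",
    headers={"Authorization": basic_auth_header("mickey", "secret")},
)
print(request.get_header("Authorization"))  # Basic bWlja2V5OnNlY3JldA==
```

Basic authentication sends the password in an easily reversible form, so it is best combined with HTTPS on untrusted networks.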
282. this service must be running at all times.
Once the new licence file is in place, follow the hyperlink to Database Status. You should see a display similar to the following:

[Screenshot: Mascot Server Licence Information page, showing the licence path, the active product key with its start date and status (OK), the licensed features (Mascot Server core functionality v2.4, 2 CPU units), company, user, distributor (Matrix Science Ltd) and node info]

Follow the link at the top centre to view the status of the SwissProt sequence database.

[Screenshot: Mascot search status page]
283. tides
Length of a sequence (number of residues): user definable (MaxSequenceLen)
Number of seq, comp and ions type qualifiers per query: 20
Maximum number of tags and etags in a search: 100
Number of peptide masses (MS/MS search): unlimited
Number of peptide masses (PMF search): 1,000
Number of enzymes in the enzymes file: 100
Number of protein hits saved in the results file summary section (PMF): 50
Number of peaks per MS/MS spectrum: 10,000
Number of lines with name in MIME format file: 1,000,000
Maximum mass of any peptide in standard Mascot (Daltons): 16,000
Minimum mass of any peptide (Daltons): 100
Maximum mass of an unmodified amino acid residue: 300
Length of any peptide (in residues) in standard Mascot: 254
Length of name (TITLE) for any query when escaped: 30,000
Length of database name: 19
Length of enzyme name: 50
Length of modification name: 50
Simultaneous variable modifications: 9
Number of missed cleavage sites in a peptide: 9
Maximum number of cleavage rules per enzyme: 20
Number of active sequence databases: user definable (MaxDatabases)
Number of threads per search: 1,024
Number of concurrent jobs per database: 100
Number of parse rules: 256
Length of parse rule: 128
Maximum length of an accession string: 200
Maximum number of processors per server: 64
Maximum number of sub-clusters in a cluster: 50
Maximum number of machines in a sub-cluster: 1,024
Maximum number of processors in a sub-cluster
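A few of the fixed limits above are easy to check before submitting a search. The Python sketch below encodes only the limits it names, with hypothetical parameter names, and treats each bound as inclusive, which is an assumption rather than something the table states.

```python
# A few of the fixed limits listed above (standard Mascot).
# Bounds are treated as inclusive; that is an assumption.
MIN_PEPTIDE_MASS_DA = 100
MAX_PEPTIDE_MASS_DA = 16_000
MAX_PEPTIDE_LENGTH = 254
MAX_VARIABLE_MODS = 9
MAX_MISSED_CLEAVAGES = 9
MAX_DATABASE_NAME_LENGTH = 19


def check_limits(peptide_mass_da, peptide_length, variable_mods, missed_cleavages, database_name):
    """Return a list of limit violations (empty if everything is within range)."""
    problems = []
    if not MIN_PEPTIDE_MASS_DA <= peptide_mass_da <= MAX_PEPTIDE_MASS_DA:
        problems.append(f"peptide mass {peptide_mass_da} Da is outside "
                        f"{MIN_PEPTIDE_MASS_DA} to {MAX_PEPTIDE_MASS_DA} Da")
    if peptide_length > MAX_PEPTIDE_LENGTH:
        problems.append(f"peptide length {peptide_length} exceeds {MAX_PEPTIDE_LENGTH} residues")
    if variable_mods > MAX_VARIABLE_MODS:
        problems.append(f"{variable_mods} simultaneous variable modifications (max {MAX_VARIABLE_MODS})")
    if missed_cleavages > MAX_MISSED_CLEAVAGES:
        problems.append(f"{missed_cleavages} missed cleavages (max {MAX_MISSED_CLEAVAGES})")
    if len(database_name) > MAX_DATABASE_NAME_LENGTH:
        problems.append(f"database name '{database_name}' is longer than {MAX_DATABASE_NAME_LENGTH} characters")
    return problems


print(check_limits(1500.0, 12, 2, 1, "SwissProt"))  # []
```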
284. [Screenshot: Advanced Multiple Web Site Configuration dialog, listing the IP address, TCP port and host header name for this web site, and its SSL port]

To memory-lock databases totalling more than 2 GB
For 32-bit editions of Windows, there is a 2 GB limit on the address space for any single process. Mascot Monitor (ms-monitor.exe) can easily reach this limit by trying to lock several large databases into memory. To work around the 2 GB limit, a separate ms-lockmem.exe program is provided; this is fork/exec'd from ms-monitor.exe when the flag SeparateLockMem 1 is added to the Options section of mascot.dat. Further details can be found in Chapter 7.

Hyper-threading (Intel only)
Hyper-threading is a technique used by Intel to improve the performance of multi-threaded programs. Hyper-threading does not double performance, because each pair of logical processors shares the resources of one physical core, such as the on-chip cache. On some systems, a BIOS setting can be used to enable and disable hyper-threading.
Hyper-threading is detected automatically. Each CPU in the Mascot licence enables up to 4 cores to be used for searches. Hyper-threading is ignored when counting cores, so you may see a 1 CPU licence using 8 threads on a system with a quad-core pr
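The arithmetic behind the "1 CPU licence using 8 threads" example can be written out explicitly. This Python sketch is a hypothetical helper that assumes two hardware threads per enabled core when hyper-threading is on; it illustrates the counting rule above and is not Mascot's own calculation.

```python
def search_threads(licensed_cpus, physical_cores, hyperthreading):
    """Estimate how many search threads a licence allows on one machine.

    Each licensed CPU enables up to 4 cores; hyper-threading is ignored when
    counting cores, and each enabled core is assumed to run 2 threads when
    hyper-threading is on.
    """
    enabled_cores = min(physical_cores, 4 * licensed_cpus)
    return enabled_cores * (2 if hyperthreading else 1)


print(search_threads(1, 4, True))   # 8: quad-core with hyper-threading
print(search_threads(1, 6, False))  # 4: only 4 of the 6 cores are licensed
```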
285. tire BRE (after an initial ^, if any); as the first character of a subexpression (after an initial ^, if any).
The circumflex (^) is special when used as an anchor or as the first character of a bracket expression. The dollar sign ($) is special when used as an anchor.

Matching Single Characters
Any character that is not a special character is an ordinary character. An ordinary character, or a special character preceded by a backslash, matches itself. A period (.), used outside a bracket expression, matches any single character, including a newline character. A bracket expression (a list of characters enclosed in square brackets) matches any single character from the enclosed list. The following rules and definitions apply to bracket expressions:
• A bracket expression is either a matching list expression or a non-matching list expression.
• The right bracket (]) loses its special meaning and represents itself in a bracket expression if it occurs first in the list (after an initial circumflex, if any). Otherwise, it terminates the bracket expression.
• The special characters period, asterisk, left bracket and backslash (., *, [ and \) lose their special meaning within a bracket expression.
• A matching list expression matches any one of the characters in the list. The first character in the list must not be the circumflex. For example, [abc] matches any one of the characters a, b or c.
• A non-matching list expression begins with a ci
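The bracket-expression rules above can be tried out with Python's `re` module, which follows the same conventions for these cases. Python is not a POSIX BRE engine, though: its `.` does not match a newline unless `re.DOTALL` is set, and a backslash inside brackets is an escape rather than a literal. The helper name is hypothetical.

```python
import re


def matches_one(pattern, ch, flags=0):
    """True if `pattern` matches exactly the single character `ch`."""
    return re.fullmatch(pattern, ch, flags) is not None


# Matching list: any one of the listed characters.
print(matches_one(r"[abc]", "b"))          # True
print(matches_one(r"[abc]", "d"))          # False
# Non-matching list: a leading circumflex inverts the list.
print(matches_one(r"[^abc]", "d"))         # True
# A right bracket placed first in the list stands for itself.
print(matches_one(r"[]abc]", "]"))         # True
# Period and asterisk lose their special meaning inside brackets.
print(matches_one(r"[.*]", "*"))           # True
print(matches_one(r"[.*]", "x"))           # False
# Unlike the BRE rule above, Python's '.' skips newline unless DOTALL is set.
print(matches_one(r".", "\n"))             # False
print(matches_one(r".", "\n", re.DOTALL))  # True
```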
286. to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice.
This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License.
If the distribution and/or use of the Library is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Library under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License.
The Free Software Foundation may publish revised and/or new versions of the Lesser General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns.
Each version is given a distinguishing version number. If the Libr
287. to run searches as the customer in a service or core lab environment. Some third-party applications require helper scripts to be installed on the Mascot web server. If Mascot security is enabled, be aware that such scripts may create security holes. Enabling security: When Mascot is first installed, the security system is disabled. To enable security, open a command prompt or shell on the Mascot server and change to the mascot/bin directory. Enter the command: perl enable-security.pl. The Mascot service (ms-monitor.exe) must then be stopped and restarted. Disabling security: To disable security, open a command prompt or shell on the Mascot server and change to the mascot/bin directory. Enter the command: perl disable-security.pl. The Mascot service (ms-monitor.exe) must then be stopped and restarted. Authentication: There are two different ways in which users can be authenticated: 1. Mascot authentication: the passwords are stored and maintained by the Mascot security libraries and/or by Mascot Integra. 2. Web server authentication: available with any web server that supports authentication. Refer to your web server documentation for details on how to set up authentication. The type of authentication is set up at the user level, not as a global setting. Even if the server has web authentication switched on, it may be useful to set some users to be authenticated using
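Because the authentication type is a per-user property rather than a global switch, one server can hold users of both kinds. The sketch below models that per-user dispatch. The User class, the "mascot"/"web_server" labels and the plain-text password field are hypothetical illustrations, not the data model of the Mascot security libraries.

```python
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    name: str
    auth_type: str  # hypothetical labels: "mascot" or "web_server"
    password: Optional[str] = None  # illustration only; real systems store hashes


def authenticate(user: User, password: Optional[str] = None,
                 remote_user: Optional[str] = None) -> bool:
    """Check a login attempt using the authentication type set on this user."""
    if user.auth_type == "mascot":
        # Mascot authentication: compare against the password Mascot stores.
        if password is None or user.password is None:
            return False
        return hmac.compare_digest(password.encode(), user.password.encode())
    if user.auth_type == "web_server":
        # Web server authentication: accept the name the web server has verified.
        return remote_user is not None and remote_user == user.name
    raise ValueError(f"unknown authentication type: {user.auth_type!r}")
```

The design point is that the decision is made per user record, so switching on web authentication for the server does not force every account onto it.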
288. to use this option. IIS user names generally include the domain name, e.g. matrix_science\charles. The comparison will be with everything after the last forward or back slash, so in this case you would enter charles as the user name. Groups: Access rights are assigned to groups, not users. Therefore, a user has no effective rights unless they belong to one or more groups. If a user belongs to more than one group, their rights are the combination of the rights of all those groups. There are 5 special built-in groups: Guests: By default, the guest user is the only member of this group, and the guest group can only submit PMF searches against any database. This can easily be changed using the security administration utility. Administrators: The admin user always belongs to this group. Members of the group can perform any administration task but cannot submit searches. PowerUsers: Members of this group can submit all types of searches and perform some administration. They cannot access the security administration utility. Daemons: The daemon user belongs to this group by default. MascotIntegraSystem: The system user is the only member of this group. Using the security administration utility: When the security administration utility is started for the first time, you will need to log in with user name admin and password admin. You are then forced to change the password. The main page lists the current users and gro
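The user-name rule and the way group rights combine can both be sketched in a few lines. The function names and the right names (submit_pmf and so on) are hypothetical; the sketch only mirrors the two rules stated above: compare everything after the last forward or back slash, and give a user the union of the rights of every group they belong to.

```python
import re
from typing import Dict, Iterable, Set


def mascot_user_name(web_user: str) -> str:
    """Strip a domain prefix: keep everything after the last '/' or '\\'."""
    return re.split(r"[\\/]", web_user)[-1]


def effective_rights(groups: Iterable[str],
                     group_rights: Dict[str, Set[str]]) -> Set[str]:
    """Union of the rights of every group; no groups means no rights."""
    rights: Set[str] = set()
    for group in groups:
        rights |= group_rights.get(group, set())
    return rights


# Hypothetical right names, loosely following the built-in groups above.
GROUP_RIGHTS: Dict[str, Set[str]] = {
    "Guests": {"submit_pmf"},
    "PowerUsers": {"submit_pmf", "submit_sq", "submit_mis", "some_admin"},
}
```

For instance, mascot_user_name("matrix_science\\charles") gives charles, and a user in no groups gets an empty set of rights.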
289. try to run searches or view results reports until the relevant sequence database is In Use. Usually you'll want to add ms-monitor.exe to the system boot process so that it is started automatically. An example Linux init script, called mascot, can be found in the Mascot bin directory. For RHEL/CentOS, move this to the /etc/init.d directory with permissions rwxr-xr-x and owner root:root. As root, type: chkconfig --add mascot. Security: Mascot security is disabled on installation. To enable Mascot security, refer to Chapter 12. Keyword Indexing: Users of Mascot may wish to be able to search the help text by keywords or phrases. The web pages are designed to work with an indexing tool called ht://Dig. This is standard in several Linux distributions. If it is not installed, we recommend stable release 3.1.6. Red Hat/CentOS Linux: yum install htdig. Debian/Ubuntu Linux: aptitude install htdig. SUSE Linux: yast -i htdig. A few binary packages are also available at http://www.htdig.org/files/binaries/. Alternatively, if you have a working development system with a C++ compiler, you can download the source code from http://www.htdig.org/. Once installed, you'll need to edit the following values in the ht://Dig configuration file, htdig.conf: start_url: http://your_host/mascot/home.html limit_urls_to: /mascot/ exclude_urls: .pl .exe .gif .jpg .pdf .msi .png It is also necessary to add an alia
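ht://Dig 3.1 documents limit_urls_to and exclude_urls as case-sensitive substring filters: a URL is indexed only if it contains at least one limit_urls_to pattern and none of the exclude_urls patterns. The sketch below applies the htdig.conf values given above to a few URLs. The pattern strings are a reconstruction (punctuation was lost from this copy of the text), and url_is_indexed is a hypothetical helper, not part of ht://Dig.

```python
from typing import Sequence

# Values reconstructed from the htdig.conf settings above (assumed spelling).
LIMIT_URLS_TO = "/mascot/".split()
EXCLUDE_URLS = ".pl .exe .gif .jpg .pdf .msi .png".split()


def url_is_indexed(url: str,
                   limit: Sequence[str] = LIMIT_URLS_TO,
                   exclude: Sequence[str] = EXCLUDE_URLS) -> bool:
    """Case-sensitive substring tests, in the style of ht://Dig 3.1."""
    # At least one limit_urls_to pattern must appear in the URL.
    if not any(pattern in url for pattern in limit):
        return False
    # Any exclude_urls pattern anywhere in the URL rejects it.
    return not any(pattern in url for pattern in exclude)
```

With these values, the Mascot help pages under /mascot/ are crawled, while Perl scripts, executables and images are skipped.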
290. tting. For example: http://your_server/mascot/x-cgi/ms-status.exe?Show=RESULTFILE&DateDir=20031231&ResJob=F006983.dat For security reasons, the following characters are not allowed in the DateDir or ResJob arguments: The argument MS_USERS returns a list of users that can be spoofed by the user whose session ID was supplied. This may be an empty list. Output format is: username, user ID, user type, full name, email address. For example: guest 1 1 Guest user guest@localhost; admin 2 1 Administrator admin@localhost; daemon 4 1 Mascot Daemon daemon@localhost; system 6 2 Mascot Integra system account integra@localhost. MS_STATUSXML returns an XML-formatted document equivalent to the main status page. The schema is xmlns/schema/msstatus_1 (msstatus_1.xsd). Review: Mascot Review (x-cgi/ms-review.exe) provides similar functionality to Status, but takes its input from searches.log. The tabular display can be filtered and sorted to locate specific searches by title, user name, or any one of the following log fields: 1. Mascot job number; 2. Process ID; 3. Sequence database; 4. User name; 5. User email address; 6. Search title; 7. Results file path; 8. Start time and date; 9. Duration in seconds; 10. Completion status; 11. Job priority; 12. Type of search (PMF, SQ or MIS); 13. Enzyme Eit
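The ms-status.exe example above is an ordinary query string, so it can be built with a standard URL encoder rather than assembled by hand. A minimal sketch using Python's urllib.parse; the host name your_server is the placeholder from the example, and status_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def status_url(host: str, **params: str) -> str:
    """Build an ms-status.exe URL; parameters keep their keyword order."""
    return f"http://{host}/mascot/x-cgi/ms-status.exe?{urlencode(params)}"


url = status_url("your_server", Show="RESULTFILE",
                 DateDir="20031231", ResJob="F006983.dat")
```

urlencode percent-encodes reserved characters, but it does not enforce the server-side restrictions on DateDir and ResJob; the server still applies its own checks.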
291. tware in a product, an acknowledgment in the product documentation would be appreciated but is not required. 3. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software. 4. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Julian Seward, Cambridge, UK. jseward@acm.org bzip2/libbzip2 version 1.0.3 of 15 February 2005. SWIG (Simplified Wrapper and Interface Generator): SWIG is distributed under the following terms: This software includes contributions that are Copyright (c) 1998-2002 University of Chicago. All rights reserved. Redistribution and use in
292. (Screenshot: Mascot database status page, with per-database fields including Name, Filename, Pathname, Status, State, Mem mapped, Request to mem map, Request unmap, Mem locked and Number of threads.) By clicking on a database hypertext link, a page is displayed showing the activity on that particular database. (Screenshot: Mascot search status page for the SwissProt database, listing current jobs and completed jobs with Job, PID, Start time, Status, User and Title columns.)
293. (Figure: histogram of Number of Hits against Probability Based Mowse Score.) If the installation cannot proceed, a message box will be displayed. Typical problems include: You do not have Administrator privileges: log out and log in as a user with local Administrator rights. Perl is not installed: install Perl as described above. Unsupported Windows platform: refer to the system requirements at the beginning of this chapter. Any problems must be fixed before the installer will proceed. Pressing Next displays the Mascot End User Licence Agreement. (Screenshot: Mascot Server Setup, End User Licence Agreement page.)
294. ups and has buttons for deleting, adding and editing users and groups. The global security options can also be modified from this page. On all the pages there is a help window that gives details about specific options: just position the mouse over the relevant hyperlink to see the help. (Screenshot: Mascot Security Administration Utility, logged in as Administrator, listing groups such as PowerUsers, Daemons and MascotIntegraSystem, and the global options Security enabled, Verify IP address, Session timeout, Logging level, Default password expiry, Use session cookies, Mascot Integra server, Mascot Integra database and Integra Oracle server.) To add a new user, click on the Add button. (Screenshot: Mascot Security Administration Utility.)
295. us screen and the database will not be available for searching. Error messages from Monitor are logged to errorlog.txt in the mascot/logs directory. Both this file and monitor.log can be viewed using links on the Mascot Status page. The input file which defines a test search can be found in the mascot/data/test directory. The filename is constructed from the name of the database together with the extension .asc; for example, SwissProt.asc. Note: Test files for new databases are generated by modifying do_not_delete.asc. Never delete this file. The output of the test search may change slightly with each new update to a database: sequences may be corrected or descriptions modified. Quite often a new entry appears which is very homologous to one of the matched proteins, so that it appears on the hit list. Using SwissProt 2012_03, the report from running the standard test search is shown on the following pages. Report excerpt: Mascot Search Results. User: Monitor Test DB 0; Search title: MS/MS Test Search; MS data file: test_search.mgf; Database: SwissProt 2012_03 (535248 sequences; 189901164 residues); Timestamp: 14 Apr 2012 at 18:46:03. Protein hits: CH60_HUMAN 60 kDa heat shock protein, mitochondrial OS=Homo sapiens GN=HSPD1 PE=1 SV=2; CH60_DROME 60 kDa heat shock protein, mitochondrial OS=Drosophila melanogaster GN=Hsp60 PE=1 SV=3; CH60_CAREL Chap
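The naming rule for test-search input files is mechanical: the database name plus .asc, in mascot/data/test, with Monitor errors going to mascot/logs/errorlog.txt. A minimal sketch, assuming a Mascot root of /usr/local/mascot (a hypothetical location; substitute your own installation path). The helper names are illustrative.

```python
from pathlib import PurePosixPath


def validation_input_path(mascot_root: str, database: str) -> PurePosixPath:
    """Test-search input file for a database, e.g. data/test/SwissProt.asc."""
    return PurePosixPath(mascot_root, "data", "test", f"{database}.asc")


def monitor_error_log(mascot_root: str) -> PurePosixPath:
    """File to which Monitor logs its error messages."""
    return PurePosixPath(mascot_root, "logs", "errorlog.txt")
```

PurePosixPath only manipulates path strings; it does not check that the files exist, so a missing test file still has to be diagnosed from the status page or the error log.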
296. used at the top of a search progress report. You can customise this by substituting the URL of your own logo. For optimum appearance, the image should be 88 pixels wide and 31 pixels high. MailTempFile: see EmailErrorsEnabled. MailTransport: see EmailErrorsEnabled. MascotCmdLine: see ErrorLogFile. MascotControlFile: see ErrorLogFile. MascotJobIdFile: see ErrorLogFile. MascotMessage: A text string to be displayed ahead of the progress reports when a search is run. MassDecimalPlaces 2: Mascot calculates all masses to an accuracy of 1/65535 Daltons. The number of decimal places used to display peptide mass values in reports can be altered by changing this value. MaxAccessionLen: Obsolete. MaxConcurrentSearches 10: This parameter limits the number of concurrent searches, so as to avoid overloading the Mascot server. Default is 10. MaxDatabases 64: The maximum number of concurrently active sequence databases. Increasing this value uses more RAM, so don't set it unnecessarily high. There is no upper limit to this value. You need to restart the Mascot service after changing this value. MaxDescriptionLen 100: Description text parsed from the FASTA title line will be truncated at this number of characters. Note: There is no need to recompress a database if this parameter is changed. MaxEtagMassDelta 1770, MinEtagMassDelta 130: In an error tolerant tag search with a fully specific enzy
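Several of the Options parameters above are plain integers with documented defaults. The sketch below reads them from the Options section of mascot.dat and applies two of them. It assumes a section layout of a line reading Options, one "name value" pair per line, '#' comments, and a closing line reading end; treat that layout, and the helper names, as assumptions to check against your own file.

```python
from typing import Dict, Union

# Defaults quoted in the parameter descriptions above.
DEFAULTS: Dict[str, int] = {
    "MassDecimalPlaces": 2,
    "MaxConcurrentSearches": 10,
    "MaxDatabases": 64,
    "MaxDescriptionLen": 100,
}

Options = Dict[str, Union[int, str]]


def parse_options(text: str) -> Options:
    """Read the Options section, falling back to DEFAULTS for missing values."""
    options: Options = dict(DEFAULTS)
    inside = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not inside:
            inside = (line == "Options")
            continue
        if line == "end":
            break
        parts = line.split(None, 1)  # name, then the rest of the line
        value = parts[1].strip() if len(parts) > 1 else ""
        options[parts[0]] = int(value) if parts[0] in DEFAULTS else value
    return options


def format_mass(mass: float, options: Options) -> str:
    """Display a peptide mass with MassDecimalPlaces decimals."""
    return f"{mass:.{int(options['MassDecimalPlaces'])}f}"


def truncate_description(description: str, options: Options) -> str:
    """Apply MaxDescriptionLen to a FASTA title-line description."""
    return description[: int(options["MaxDescriptionLen"])]
```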
297. which an arbitrary group of files can be defined for searching, either immediately or at some pre-set time. 2. Real-time monitor mode, in which new files on a pre-defined path are searched as they are created. 3. Data-dependent follow-up tasks: for example, automatically repeating an unsuccessful search at a later date or against a different sequence database. Tasks: The functional unit of Mascot Daemon is a task. A task can be created or modified in the Task Editor. A task is defined by: 1. The data source (e.g. a file list or a file path); 2. How the data are to be searched (an associated set of search parameters); 3. When the searches are to take place; 4. Any follow-up activities, such as conditional repeat searches. Tasks can be in one of four states: running, paused, completed or cancelled. A paused task can be resumed. A paused or completed task can be cancelled or deleted. Data Files: Data files can be any of the peak list formats supported by Mascot. Other types of file, such as binary data, can be specified if an appropriate data import filter is available. Flexibility: A wide range of native file formats can be processed using the Mascot Distiller library (requires an additional licence). If AB SCIEX Analyst is installed on the same system as Mascot Daemon, Analyst WIFF data files can be processed using the mascot.dll script. If AB SCIEX Data Explorer is installed o
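The task life cycle described above is a small state machine. The sketch below encodes the transitions the text states (a paused task can be resumed; a paused or completed task can be cancelled or deleted) plus two that are only implied by the states themselves (a running task can be paused and can complete). The class, action names and functions are hypothetical, not the Mascot Daemon API.

```python
from enum import Enum


class TaskState(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


# (state, action) -> new state. Resume and cancel come from the text;
# pause and finish are inferred from the existence of those states.
TRANSITIONS = {
    (TaskState.RUNNING, "pause"): TaskState.PAUSED,
    (TaskState.RUNNING, "finish"): TaskState.COMPLETED,
    (TaskState.PAUSED, "resume"): TaskState.RUNNING,
    (TaskState.PAUSED, "cancel"): TaskState.CANCELLED,
    (TaskState.COMPLETED, "cancel"): TaskState.CANCELLED,
}

# Only paused or completed tasks may be deleted.
DELETABLE = frozenset({TaskState.PAUSED, TaskState.COMPLETED})


def next_state(state: TaskState, action: str) -> TaskState:
    """Return the state after `action`, or raise if the model does not allow it."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a {state.value} task") from None


def can_delete(state: TaskState) -> bool:
    return state in DELETABLE
```

A table of allowed transitions keeps the rules in one place, so an illegal request such as resuming a cancelled task fails loudly instead of silently changing state.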
298. xample: 192.168.114.201, 192.168.114.201/255.255.255.0. Now press OK in this window and in the previous one to return to the Windows Firewall dialog, which should now look like this: (Screenshot: Windows Firewall dialog, Exceptions tab, with MascotNodePort5001 (TCP) listed alongside File and Printer Sharing, Remote Assistance, Remote Desktop and UPnP Framework.) Press OK. Repeat this entire procedure on every search node. Vista and Server 2008: On each search node, log in as a user with local administrator rights. Go to Control Panel, Network Status and ensure the network connection to the master node is described as Private. If it shows as Public, choose Customise to change it. Under Sharing and Discovery, enable File Sharing. (Screenshot: Control Panel, Network and Sharing Center.)
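In the scope example above, a bare address admits a single host, while an address with a subnet mask admits the whole subnet. Python's ipaddress module can show which machines a scope entry would cover; this is an illustrative check under that reading of the scope syntax, not a Windows Firewall API, and in_scope is a hypothetical helper.

```python
import ipaddress
from typing import Iterable


def in_scope(address: str, scope: Iterable[str]) -> bool:
    """True if `address` falls inside any scope entry (host or address/mask)."""
    ip = ipaddress.ip_address(address)
    for entry in scope:
        # strict=False accepts host bits, so 192.168.114.201/255.255.255.0
        # is read as the subnet 192.168.114.0/24.
        if ip in ipaddress.ip_network(entry.strip(), strict=False):
            return True
    return False
```

Listing only the master node's address is the tighter choice for the MascotNodePort5001 exception; a masked entry opens the port to every machine on that subnet.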
299. (Screenshot: archive listing of the taxonomy tar.bz2 file, 9 objects, including gencode.dmp, gi_taxid_prot.dmp, merged.dmp, names.dmp and nodes.dmp.) 3. Edit mascot.dat. The first time you use Database Manager, database-related configuration information is moved to an XML file, and the database-related sections of mascot.dat are rewritten whenever changes are saved. So either use Database Manager all the time, or edit mascot.dat by hand all the time; you cannot swap between the two. The format of mascot.dat is described in Chapter 6. For NCBInr, configuration information is already present in mascot.dat but commented out to make the database inactive. Once all the files are in place, all you need to do is check the path is correct, then remove the comment character and any leading space at the start of the NCBInr line in the Databases s
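Re-enabling the NCBInr entry is a one-line text edit, and it only applies if you edit mascot.dat by hand rather than through Database Manager. The sketch below performs that edit on the text of mascot.dat. It assumes '#' is the comment character and that the Databases section ends with a line reading end; check both against your own file. The function name and the sample database line in the usage are hypothetical.

```python
def uncomment_database(dat_text: str, name: str) -> str:
    """Remove the comment character and any leading space from the `name`
    line inside the Databases section; all other lines are left unchanged."""
    out = []
    in_databases = False
    for line in dat_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped == "Databases":
            in_databases = True
        elif in_databases and stripped == "end":
            in_databases = False
        elif in_databases and stripped.startswith("#"):
            # Drop the '#' and whitespace after it; keep the rest of the line.
            candidate = line.lstrip()[1:].lstrip(" \t")
            fields = candidate.split(None, 1)
            if fields and fields[0] == name:
                line = candidate
        out.append(line)
    return "".join(out)
```

Restricting the edit to the Databases section avoids accidentally uncommenting a line with the same leading word elsewhere in the file.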
300. y FormVersion, a warning will be included in the results file and in the master results report. GetSeqJobIdFile: see ErrorLogFile. ICATQuantitationMethod ICAT: For backward compatibility, if a search is submitted from an old client with ICAT ON, then the specified quantitation method will be used. IgnoreDupeAccessions EST,others: A comma-separated list of database names. For any database in this list, don't check for duplicate accession numbers when creating the compressed files. A database should only be added to this list if it has a very large number of sequences, which may cause the system to run out of memory when creating the compressed files. IgnoreIonsScoreBelow 0.0: When a report is generated, any ions score lower than this value will be set to zero and ignored. The parameter is a floating point number (default 0.0). Values greater than 0 and less than 1 act as an expect value threshold, and the scores for any peptide matches with higher expect values are set to 0. This global default can be overridden on an individual report URL by appending &_ignoreionsscorebelow=X, where X is the cut-off value. IntensitySigFigs 2: The precision of intensity values written to the result file. InterFileBasePath: see ErrorLogFile. InterFileRelPath: see ErrorLogFile. IonsDecimalPlaces 2: Mascot calculates all masses to an accuracy of 1/65535 Daltons. The number of decimal places used to
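The two modes of IgnoreIonsScoreBelow are easy to misread, so the sketch below encodes the rule as stated: a threshold strictly between 0 and 1 is treated as an expect-value cut-off, and any other value as an ions-score cut-off. It also splits the comma-separated IgnoreDupeAccessions value. The function names and the report URL in the usage are hypothetical; only the _ignoreionsscorebelow parameter name comes from the text above.

```python
from typing import List


def effective_ions_score(score: float, expect: float, threshold: float = 0.0) -> float:
    """Score to report for one peptide match under IgnoreIonsScoreBelow."""
    if 0.0 < threshold < 1.0:
        # Expect-value mode: matches with a higher expect value are zeroed.
        return 0.0 if expect > threshold else score
    # Score mode: ions scores lower than the threshold are zeroed.
    return 0.0 if score < threshold else score


def with_ignore_override(report_url: str, threshold: float) -> str:
    """Append a per-report override of the global default."""
    return f"{report_url}&_ignoreionsscorebelow={threshold}"


def ignore_dupe_accessions(value: str) -> List[str]:
    """Split the comma-separated IgnoreDupeAccessions value into database names."""
    return [name.strip() for name in value.split(",") if name.strip()]
```

With the default of 0.0 the score branch never fires for non-negative scores, which matches the parameter having no effect unless it is set.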
301. y purpose on any computer system, and to alter it and redistribute it, subject to the following restrictions: 1. The author is not responsible for the consequences of use of this software, no matter how awful, even if they arise from flaws in it. 2. The origin of this software must not be misrepresented, either by explicit claim or by omission. Since few users ever read sources, credits must appear in the documentation. 3. Altered versions must be plainly marked as such, and must not be misrepresented as being the original software. Since few users ever read sources, credits must appear in the documentation. 4. This notice may not be removed or altered. Copyright (c) 1994 The Regents of the University of California. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. All advertising materials mentioning features or use of this software must display the following acknowledgement: This product includes software develop
