Biomedical Genomics Workbench

1. [CHAPTER 6 WHOLE EXOME SEQUENCING (WES) 146] Figure 6.69: Specify the parameters for the Fixed Ploidy Variant Detection tool. The parameters that can be set are:
   • Required variant probability: the minimum probability of the variant site required for the variant to be called. Note that it is not the minimum probability of the individual variant. For the Fixed Ploidy Variant Detection tool, if a variant site (and not the variant itself) passes the variant probability threshold, then the variant with the highest probability at that site is reported, even if the probability of that particular variant is less than the threshold. For example, if the required variant probability is set to 0.9, the individual probability of the called variant may be less than 0.9, as long as the probability of the entire variant site is greater than 0.9.
   • Ignore broken pairs: when ticked, reads from broken pairs are ignored. Broken pairs may arise for a number of reasons, one being erroneous mapping of the reads. In general, variants based on broken pair reads are likely to be less reliable, so ignoring them may reduce the number of spurious variants called. However, broken pairs may also arise for biological reasons (e.g., due to structural variants), and if they are ignored some true variants may go undetected. Please note that ignored broken pair reads will not be considered for any non-specific match
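The difference between the site-level threshold and the probability of the individual variant can be illustrated with a small sketch. This is not the Workbench's implementation: the function name `call_site` is hypothetical, and the site probability is simply assumed to be the sum of the candidate variant probabilities at that site.

```python
from typing import Dict, Optional, Tuple


def call_site(
    variant_probs: Dict[str, float],
    required_variant_probability: float = 0.9,
) -> Optional[Tuple[str, float]]:
    """Return (variant, probability) for the most probable variant at a site,
    or None when the site as a whole does not pass the threshold.

    Assumption (illustration only): the probability of the variant site is
    taken to be the sum of the candidate variant probabilities.
    """
    if not variant_probs:
        return None
    site_probability = sum(variant_probs.values())
    # The threshold is applied to the site, not to the individual variant.
    if site_probability < required_variant_probability:
        return None
    best = max(variant_probs, key=variant_probs.get)
    return best, variant_probs[best]


# The site passes (0.55 + 0.40 > 0.9) although no single variant reaches 0.9.
example_site = {"A>G": 0.55, "A>T": 0.40}
print(call_site(example_site))
```

In this sketch the reported variant has a probability of 0.55, below the 0.9 threshold, which mirrors the behavior described in the text.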
2. Figure 7.31: Select the target regions track. [...] one labeled Export Parameters allows specification of the export destination. When selecting an export location, you will export the analysis parameter settings that were specified for this specific experiment.
   9. Click on the button labeled OK to go back to the previous wizard step and choose Save. Note: If you choose to open the results, the results will not be saved automatically. You can always save the results at a later point.
   Output from the Identify Somatic Variants from Tumor Normal Pair (TAS) workflow. Eight different outputs are generated:
   • Read Mapping Normal: The mapped sequencing reads for the normal sample. The reads are shown in different colors depending on their orientation, whether they are single
   [CHAPTER 7 TARGETED AMPLICON SEQUENCING (TAS) 180]
3. Figure 4.9: Select the version of the 1000 Genomes or HapMap database you want to work with, or select the option "custom". [...] suspect that a downloaded reference is corrupt and needs to be re-downloaded, or if you need to clean up space (e.g., locally). Note: Custom reference data sets are specific to the workbench on which they are created and will not appear in other workbenches connected to the same server. At the bottom of the wizard you can find:
   • A Help button that links to the section in the Biomedical Genomics Workbench reference manual that describes the Manage Reference Data button.
   • A Create Custom Set button that allows you to create your own set of reference data from existing data sets. Clicking on this button opens a window (figure 4.10) where you can edit the name of the data set, the organism it represents, the chromosomal extension, and the annotation types used. For each type of reference, a drop-down menu allows you to choose from the different versions available, as well as from a custom database. This is useful when you have your own version of the reference data that you have imported into the workbench and that you would like to use rather than the currently available Reference Data Sets. The custom data sets are saved under the Custom Reference Data Sets tile. Do not forget to click on the button Apply if you
   [CHAPTER 4 GETTING STARTED 45]
4. Figure 3.8: Specify the parameters for the QC for Target Sequencing tool.
   Specify the target region for the InDels and Structural Variants tool (figure 3.9). The targeted region file is a file that specifies which regions have been sequenced when working with whole exome sequencing or targeted amplicon sequencing data. You must provide this file yourself, as it depends on the technology used for sequencing. You can obtain the targeted regions file from the vendor of your targeted sequencing reagents.
   Figure 3.9: Specify the parameters for the InDels and Structural Variants tool.
   Specify the relevant 1000 Genomes populations (figure 3.10). Note: this window will appear in workflows that annotate variants with information from the 1000 Genomes project, unless you have already selected the relevant populations of interest in your reference data management prior to running the workflow. Some wizard windows will be called Add Information from 1000 Genomes Project or Remove Variants Found in the 1000 Genomes Project. Specify the 1000 Genomes population that should be used to add or filter out variants found in the 1000 Genomes project. This can be done using the drop-down list found in this wizard step. Please note that the populations
5. Figure 7.16: Specify which 1000 Genomes population to use for annotation.
   Figure 7.17: Select your target regions track.
   [...] note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4).
   7. Click on the button labeled Next to go to the last wizard step (shown in figure 7.20). Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters; it is not possible to make any changes. Choose to save the results and click on the button labeled Finish.
   Output from the Filter Somatic Variants (TAS) workflow. Two types of output are generated:
   1. Somatic Candidate Variants: Track that holds the variant data. This track is also included
6. Figure 2.11: When zooming in on a selected region, more and more details become visible. In this image the individual genes are visible. To distinguish the individual exons, you would have to zoom in a bit further.
   [...] to a particular SNV in the Genome Browser view, you could click on the row in the variant track table. This is shown in figure 2.15.
   Add tracks to a Genome Browser view: The simplest way to add a track to the Genome Browser view is to locate the file in the Navigation Area, click on the file while holding down the mouse key, and drag it into the Genome Browser view in the View Area. When you drop the file in the Genome Browser view, the track will be added to it (figure 2.16). Note: After a new track has been added to the Genome Browser view, an asterisk appears on the Genome Browser view tab. This indicates that the Genome Browser vi
7. Figure 6.59: Specify the parameters for the QC for Target Sequencing tool.
   • Ignore broken pairs: reads that belong to broken pairs will be ignored.
   Specify the parameters for the QC for Target Sequencing tool for the affected child.
   Specify the parameters for the QC for Target Sequencing tool for the unaffected parent.
   Specify the parameters for the QC for Target Sequencing tool for the affected parent.
   Specify the parameters for the Fixed Ploidy Variant Detection tool for the unaffected parent.
   Specify the parameters for the Fixed Ploidy Variant Detection tool for the affected parent.
   Specify the parameters for the Fixed Ploidy Variant Detection tool for the proband.
   Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters; it is not possible to make any changes. Choose to save the results and click on the button labeled Finish.
   Output from the Identify Causal Inherited Variants in a Family of Four (WES) workflow. Six types of output are generated:
   • Reads Tracks: One for each family member. The reads mapped to the reference sequence.
   • Variants in [...]: One track for each family member. The variants identified in each of the family members. The variant track can be opened in table view to see all information about the variants.
   • Putative Causal Variants in Child: The putative disease-causing variants identified in the child. The variant track can be opened in table vi
8. [Panel: Fixed Ploidy Variant Detection, Configurable Parameters: Required variant probability (50.0), Ignore broken pairs, Minimum coverage, Minimum count, Minimum frequency; Locked Settings]
   Figure 6.80: Specify the parameters for the Fixed Ploidy Variant Detection tool. The parameters that can be set are:
   • Required variant probability: the minimum probability of the variant site required for the variant to be called. Note that it is not the minimum probability of the individual variant. For the Fixed Ploidy Variant Detection tool, if a variant site (and not the variant itself) passes the variant probability threshold, then the variant with the highest probability at that site is reported, even if the probability of that particular variant is less than the threshold. For example, if the required variant probability is set to 0.9, the individual probability of the called variant may be less than 0.9, as long as the probability of the entire variant site is greater than 0.9.
   • Ignore broken pairs: when ticked, reads from broken pairs are ignored. Broken pairs may arise for a number of reasons, one being erroneous mapping of the reads. In general, variants based on broken pair reads are likely to be less reliable, so ignoring them may reduce the number of spurious variants called. However, broken pairs may also arise for biological reasons, e.g., due to structural var
9. Figure 4.22: Select the sequencing raw data that should be prepared for further analysis. At this step you can also choose to prepare several reads in batch mode (tick Batch at the bottom of the wizard, as shown in figure 4.22) and select the folder that holds the data you wish to analyze. When you have selected the sample(s) you want to prepare, click on the button labeled Next.
10. • Minimum count: Only variants that are present in at least this many reads are called.
   • Minimum frequency: Only variants that are present at least at the specified frequency (calculated as count/coverage) are called.
   For more information about the tool, see http clcsupport com biomedicalgenomicsworkben current index php manual Fixed_Ploidy_Variant_Detection html
   14. Specify the parameters for the Fixed Ploidy Variant Detection tool for the affected child.
   15. Specify the parameters for the Fixed Ploidy Variant Detection tool for the father.
   16. Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters; it is not possible to make any changes. Choose to save the results and click on the button labeled Finish.
   Output from the Rare Disease Causing Mutations in a Trio (TAS) workflow. Twelve different types of output are generated:
   • Reads Tracks: One for each family member. The reads mapped to the reference sequence.
   • Variant Tracks: One for each family member. The variants identified in each of the family members. The variant track can be opened in table view to see all information about the variants.
   • De novo variants: Variant track showing de novo variants in the proband. The variant track can be opened in table view to see all information about the variants.
   • Recessive variants: Variant track sh
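The count and frequency thresholds described above can be sketched as a simple filter. This is an illustration only, not the tool's code: the function name and the default threshold values are hypothetical, and the frequency is expressed here as a fraction (count divided by coverage) rather than as a percentage.

```python
def passes_thresholds(
    count: int,
    coverage: int,
    min_coverage: int = 10,      # hypothetical default
    min_count: int = 2,          # hypothetical default
    min_frequency: float = 0.2,  # hypothetical default, as a fraction
) -> bool:
    """Return True if a candidate variant meets all three thresholds."""
    if coverage <= 0 or coverage < min_coverage:
        return False
    # Frequency is calculated as count/coverage, as described in the text.
    frequency = count / coverage
    return count >= min_count and frequency >= min_frequency


# 5 supporting reads out of 20 gives a frequency of 0.25.
print(passes_thresholds(count=5, coverage=20))
```

A variant seen in 3 of 100 reads would pass the count threshold in this sketch but fail on frequency (0.03), which is why the two parameters are set separately.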
11. Figure 6.29: Specify setting for removal of germline variants.
   Figure 6.30: Select target region track.
   In the Preview All Parameters wizard you can only check the settings; if you wish to make changes, you have to use the Previous button from the wizard to edit parameters in the relevant windows. At the bottom of this wizard there are two buttons regarding export functions: one button allows specification of the export format, and the other button (the one labeled Export Parameters) allows specification of the export destination. When selecting an export location, you will export the analysis parameter settings that were specified for this specific experiment.
   9. Click on the button labeled OK to go back to the previous wizard step and choose Save. Note: If you choose to open the results, the results will not be saved automatically
12. 2. As part of the data preparation, the sequences are trimmed. In the wizard shown in figure 4.23 you can specify different trimming parameters and select the adapter trim list that should be used for adapter trimming by clicking on the folder icon.
   [Panel: Prepare Overlapping Raw Data, Trim Sequences, settable parameters: Trim adapter list (Illumina adapter list); Ambiguous trim; Ambiguous limit: 2; Quality trim; Quality limit: 0.05; Use colorspace; Also search on reversed sequence; Remove 5' terminal nucleotides; Number of 5' terminal nucleotides: 1; Maximum number of nucleotides in reads: 1000; Minimum number of nucleotides in reads: 15; Discard short reads; Remove 3' terminal nucleotides; Number of 3' terminal nucleotides: 1; Discard long reads]
   Figure 4.23: Select your adapter trim list. You can use the default trim parameters or adjust them if necessary.
   3. Click on the button labeled Next. This will take you to the next wizard step (figure 4.24).
   Figure 4.24: Check the settings and save your results. At this step you get the chance
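The length limits shown in the trim panel (minimum 15, maximum 1000 nucleotides, with separate options for discarding short and long reads) can be illustrated with a small sketch. This covers only the length-based discard step and is not the Workbench's trimming algorithm; the function name and the choice of which options are enabled by default are assumptions.

```python
from typing import Iterable, List


def filter_reads_by_length(
    reads: Iterable[str],
    min_length: int = 15,         # "Minimum number of nucleotides in reads"
    max_length: int = 1000,       # "Maximum number of nucleotides in reads"
    discard_short: bool = True,   # assumed to be enabled for this sketch
    discard_long: bool = False,   # assumed to be disabled for this sketch
) -> List[str]:
    """Keep reads whose length falls within the enabled limits."""
    kept = []
    for read in reads:
        if discard_short and len(read) < min_length:
            continue
        if discard_long and len(read) > max_length:
            continue
        kept.append(read)
    return kept


trimmed_reads = ["ACGT" * 5, "ACGTACGT", "A" * 1200]
print([len(r) for r in filter_reads_by_length(trimmed_reads)])
```

With the assumed defaults, the 8-nucleotide read is dropped and the 1200-nucleotide read is kept; enabling `discard_long` would remove it as well.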
13. 2. Specify the RNA-seq reads from the normal sample. The panel on the left side of the wizard shows the kind of input that should be provided (figure 8.16). Select by double-clicking on the reads file name, or by clicking once on the file and then clicking on the arrow pointing to the right in the middle of the wizard. Click on the button labeled Next.
   Figure 8.16: Select the RNA-seq reads from the normal sample.
   3. In the next step you will be asked to select the RNA-seq reads from the tumor sample (see figure 8.17).
14. Figure 7.45: Specify the parameters for variant calling.
   Figure 7.46: In this wizard step you can specify the target regions track. Variants found outside these regions will be removed.
   [...] can only check the settings; if you wish to make changes, you have to use the Previous button from the wizard to edit parameters in the relevant windows.
   10. Choose to Save your results and press Finish. Note: If you choose to open the results, the results will not be saved automatically. You can always save the results at a later point.
   Output from the Identify and Annotate Variants (TAS) workflow: The Identify and Annotate Variants (TAS) tool produces several outputs. Please do not delete any of the produced files alone, as some of them are linked to other outputs. Please always delete all of them at
15. CHAPTER 1. WELCOME TO BIOMEDICAL GENOMICS WORKBENCH
   QIAGEN Aarhus, Silkeborgvej 2, Prismet, 8000 Aarhus C, Denmark
   http://www.clcbio.com
   http://www.qiagenbioinformatics.com
   VAT no.: DK 28 30 50 87
   Email: support-clcbio@qiagen.com
   Telephone: +45 70 22 32 44
   If you have questions or comments regarding the program, you can contact us through the support team as described here: http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=Getting_help.html
   Chapter 2. Introduction to user interface, workflows and tracks
   Contents
   2.1 The start screen
   2.1.1 The getting started table
   2.1.2 Import of example data
   2.2 The user interface
   2.3 Workflows: an overview
   2.4 The track format
   2.4.2 The Genome Browser
   This section introduces the general features and functionalities of the Biomedical Genomics Workbench, including the user interface and a general introduction to workflows and tracks. The information in this chapter underpins that of later chapters and is highly recommended for new users of the Workbench. You can find more detailed information in the Biomedical Genomics Workbench reference manual, which can be found online at http ww
16. Figure 7.13: Genome Browser View that allows inspection of the identified variants in the context of the human genome and external databases.
   [...] human reference sequence are first filtered away, then variants outside the targeted region are removed, and lastly variants found in the Common dbSNP, 1000 Genomes Project, and HapMap databases are deleted. Variants in those databases are assumed to not contain relevant somatic variants. Please note that this tool will likely also remove inherited cancer variants that are present at a low percentage in a population. Next, the remaining somatic variants are annotated with gene names, amino acid chang
17. Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters; it is not possible to make any changes. Choose to save the results and click on the button labeled Finish.
   Figure 6.20: Check the selected parameters by pressing Preview All Parameters.
   Output from the Filter Somatic Variants (WES) workflow. Two types of output are generated:
   • Somatic Candidate Variants: Track that h
18. Figure 7.60: Specify the sequencing reads for the appropriate family member.
   3. Select the reads for the affected parent.
   4. Select the targeted region file (figure 7.61). The targeted region file is a file that specifies which regions have been sequenced when working with whole exome sequencing or targeted amplicon sequencing data. You must provide this file yourself, as it depends on the technology used for sequencing. You can obtain the targeted regions file from the vendor of your targeted sequencing reagents.
   5. Select the reads for the affected child.
   6. Specify the HapMap populations that should be used for filtering out variants found in HapMap (figure 7.62). This can be done using the drop-down list found in this wizard step. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4).
   Specify the parameters for the QC for Target Sequencing tool for the affected child (figure 7.63).
19. Figure 6.73: Specify the proband's gender.
   7. Specify the HapMap populations that should be used for filtering out variants found in HapMap for the father (figure 6.74). This can be done using the drop-down list found in this wizard step. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4).
   [Panel: Remove Variants Found in HapMap, HapMap database track, 12 elements selected: HAPMAP_phase_3_MKK, HAPMAP_phase_3_CHD, HAPMAP_phase_3_TSI, HAPMAP_phase_3_CHB, HAPMAP_phase_3_GIH, HAPMAP_phase_3_HCB, HAPMAP_phase_3_LWK, HAPMAP_phase_3_CEU, HAPMAP_phase_3_MEX, HAPMAP_phase_3_YRI, HAPMAP_phase_3_JPT, HAPMAP_phase_3_ASW]
   Figure 6.74: Select the relevant HapMap population(s).
   8. Specify the HapMap populations that should be used for filtering out variants found in HapMap for the mother.
   Specify the HapMap populations that should be used for filtering out variants found in HapMap from the de novo assembly. Specif
20. Figure 6.21: The Genome Browser View showing the annotated somatic variants together with a range of other tracks.
   To see the level of nucleotide conservation (from a multiple alignment with many vertebrates) in the region around each variant, a track with conservation scores is added as well. Mapped sequencing reads as well as other tracks can easily be added to this Genome Browser View. Double-clicking on the annotated variant track in the Genome Browser View opens a table that includes all variants and the added information/annotations (see figure 6.22).
   Adding information from other sources may help you identify interesting candidate variants for further research. For example, common genetic variants (present in the HapMap database) or variants known to play a role in drug response or other clinically relevant phenotypes (present in the ClinVar database) can easily be identified. Further, variants not found in the ClinVar database can be prioritized based on amino acid changes, in case the variant causes changes on the amino acid level.
21. 1. Double-click on the Identify Rare Disease Causing Mutations in a Trio (WGS) tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis.
   2. Select the sequencing reads from the father (figure 5.44). The sequencing reads from the different family members are specified one at a time in the appropriate window. The panel on the left side of the wizard shows the kind of input that should be provided. Select by double-clicking on the reads file name, or click once on the file and then on the arrow pointing to the right in the middle of the wizard.
   Figure 5.44: Specify the sequencing reads for the appropriate family member.
   3. Select the sequencing reads from the mother.
   [CHAPTER 5 WHOLE GENOME SEQUENCING (WGS) 95]
   4. Select the sequencing reads from the affected child.
   5. Specify the parameters for the Fixed Ploidy Variant Detection tool for the mother (figure 5.45). The parameters used by the Fixed Ploidy Variant Detectio
22. Figure 8.10: Specify the target region for the InDels and Structural Variants tool.
   5. Set the parameters for the Low Frequency Variant Detection step for your RNA sample (see figure 8.11). For a description of the different parameters that can be adjusted in the variant detection step, see http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=Low_Frequency_Variant_Detection.html. If you click on Locked Settings, you will be able to see all parameters used for variant detection in the ready-to-use workflow.
   6. If you are working with the workflow from the Human folder, specify here the relevant 1000 Genomes population for your RNA sample from the drop-down list (see figure 8.12). Choose the population that best matches the population your samples are derived from. Under Locked Settings you can see that Automatically join adjacent MNVs and SNVs
   [CHAPTER 8 WHOLE TRANSCRIPTOME SEQUENCING (WTS)]
23. 1. Double-click on the Identify Causal Inherited Variants in a Trio (WGS) tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis.
   2. Select the sequencing reads from the affected parent (figure 5.37). The sequencing reads from the different family members are specified one at a time in the appropriate window. The panel on the left side of the wizard shows the kind of input that should be provided. Select by double-clicking on the reads file name, or click once on the file and then on the arrow pointing to the right in the middle of the wizard.
   Figure 5.37: Specify the sequencing reads for the appropriate family member.
   3. Select the reads for the unaffected parent.
   4. Select the reads for the affected child.
   5. Specify the parameters for the Fixed Ploidy Variant Detection tool for the affected parent (figure 5.38). The parameters used by the Fixed Ploidy Variant Detection tool can be adjusted. We have optimized the parameters to the individual analyse
24. [Panel: Identify and Annotate Variants (WES), Low Frequency Variant Detection, Configurable Parameters: Significance (1.0), Ignore broken pairs, Ignore non-specific matches, Minimum read length (20), Minimum coverage, Minimum count, Minimum frequency, Base quality filter, Read direction filter, Direction frequency, Read position filter, Significance, Relative read direction filter, Significance, Remove pyro-error variants, Pyro filter length, With frequency below; Locked Settings]
   Figure 6.45: Specify the parameters for variant calling.
   6. Click on the button labeled Next, which will take you to the next wizard step (figure 6.46). In this dialog you can specify the target regions track. The variants found outside the targeted region will be removed at this step in the workflow.
   Figure 6.46: In this wizard step you can specify the target regions track. Variants found outside these regions w
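The idea of removing variants outside the targeted regions can be sketched outside the Workbench as follows. This is a minimal illustration, not the workflow's implementation: it assumes the target regions are supplied as BED text (0-based, half-open intervals) and that variants are represented as hypothetical `(chromosome, 1-based position, change)` tuples.

```python
from typing import Dict, List, Tuple

Regions = Dict[str, List[Tuple[int, int]]]
Variant = Tuple[str, int, str]  # (chromosome, 1-based position, change)


def parse_bed(text: str) -> Regions:
    """Parse the first three BED columns into per-chromosome intervals."""
    regions: Regions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def in_target(regions: Regions, chrom: str, position: int) -> bool:
    """Check a 1-based position against 0-based, half-open BED intervals."""
    zero_based = position - 1
    return any(start <= zero_based < end for start, end in regions.get(chrom, []))


def remove_variants_outside_targets(variants: List[Variant], regions: Regions) -> List[Variant]:
    """Keep only variants whose position lies inside a target region."""
    return [v for v in variants if in_target(regions, v[0], v[1])]


bed_text = "chr1\t100\t200\nchr2\t0\t50\n"
variants = [("chr1", 101, "A>G"), ("chr1", 201, "C>T"), ("chr2", 50, "G>A"), ("chr3", 10, "T>C")]
print(remove_variants_outside_targets(variants, parse_bed(bed_text)))
```

A linear scan over the intervals keeps the sketch short; for exome-sized target files, sorted intervals with binary search or an interval tree would be the more usual choice.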
25. Adding information from other sources may help you identify interesting candidate variants for further research. For example, known common genetic variants (present in the HapMap database) or variants known to play a role in drug response or other clinically relevant phenotypes (present in the ClinVar database) can easily be identified. Further, variants not found in the ClinVar database can be prioritized based on amino acid changes, in case the variant causes changes on the amino acid level. A high conservation level between different vertebrates or mammals in the region containing the variant can also be used to give a hint about whether a given variant is found in a region with an important functional role. If you would like to use the conservation scores to identify interesting variants, we recommend that variants with a conservation score of more than 0.9 (PhastCons score) are prioritized over variants with lower conservation scores. It is possible to filter variants based on their annotations. This type of filtering can be facilitated using the table filter found at the top part of the table. If you are performing multiple experiments where you would like to use the exact same filter criteria, you can create a filter that can be saved and reused. To do this, go to: Toolbox | Identify Candidate Variants | Create Filter Criteria. This tool can be used to specify the filter and the Annotate Var
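The recommendation to prioritize variants with a PhastCons score above 0.9 can be sketched outside the Workbench as a simple sort. The snippet below is a minimal illustration, assuming the annotated variant table has been exported and loaded into Python; the `Variant` class and its field names are hypothetical and are not part of the Workbench.

```python
from dataclasses import dataclass
from typing import List, Optional

PHASTCONS_CUTOFF = 0.9  # threshold recommended in the text above


@dataclass
class Variant:
    chromosome: str
    position: int
    # PhastCons conservation score in [0, 1]; None if the variant has no score.
    conservation_score: Optional[float] = None


def prioritize_by_conservation(variants: List[Variant]) -> List[Variant]:
    """Sort variants so that those scoring above the cutoff come first.

    Within each group, higher scores come first. Variants without a score
    are treated as score 0.0 (an assumption made for this sketch).
    """
    def rank(variant: Variant):
        score = variant.conservation_score or 0.0
        return (score <= PHASTCONS_CUTOFF, -score)

    return sorted(variants, key=rank)


# Hypothetical example data.
candidates = [
    Variant("1", 115258748, 0.20),
    Variant("2", 212652720, 0.95),
    Variant("5", 112175675, None),
    Variant("5", 112175770, 0.91),
]
ordered = prioritize_by_conservation(candidates)
```

Sorting rather than discarding keeps the lower scoring variants available for later inspection, which matches the wording "prioritized over" in the text.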
26. [Screenshot: 1000 Genomes wizard step with the populations 1000GENOMES_phase_1_EUR, 1000GENOMES_phase_1_AMR and 1000GENOMES_phase_1_AFR in the drop-down list.] Figure 3.10: Select the relevant 1000 Genomes population(s). available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1). 4. Specify the relevant Hapmap populations (figure 3.11). Note: this window will appear in workflows that annotate variants with information from the Hapmap project, unless you have already selected the relevant populations of interest in your reference data management prior to running the workflow. [Screenshot: Remove Variants Found in HapMap wizard step with 12 HapMap phase 3 population tracks selected.] Figure 3.11: Select the relevant Hapmap population(s). Some wizard windows will be called Add Information from the Hapmap project or Remove Variants found in Hapmap. Specify the Hapmap population that should be used to add or filter out variants found in the Hapmap project
27. Biomedical Genomics Workbench. APPLICATION BASED MANUAL. Manual for Biomedical Genomics Workbench 2.5.1, Windows, Mac OS X and Linux. October 15, 2015. This software is for research purposes only. CLC bio, a QIAGEN Company, Silkeborgvej 2, Prismet, DK-8000 Aarhus C, Denmark. Contents: Introduction. Welcome to Biomedical Genomics Workbench: 1.1 Introduction to Biomedical Genomics Workbench; 1.2 Available documentation; 1.3 The material covered by this manual; 1.4 We welcome your comments and suggestions; 1.5 [illegible]. Introduction to user interface, workflows and tracks: 2.1 [illegible]; 2.2 [illegible]; 2.3 [illegible]; 2.4 [illegible]. Applications: ready to use workflows. Ready to Use Workflows: descriptions and guidelines: 3.1 General Workflow; 3.2 [illegible]; 3.3 Hereditary Disease. Getting started: 4.1 [illegible]; 4.2 [illegible]; 4.3 Import sequencing data no
28. Toolbox | Ready to Use Workflows | Whole Genome Sequencing | Hereditary Disease | Identify Rare Disease Causing Mutations in a Family of Four (WGS). 1. Double click on the Identify Rare Disease Causing Mutations in a Family of Four (WGS) tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis. 2. Select the sequencing reads from the unaffected sibling (figure 5.40). The sequencing reads from the different family members are specified one at a time in the appropriate window. The panel in the left side of the wizard shows the kind of input that should be provided. Select by double clicking on the reads file name, or click once on the file and then on the arrow pointing to the right side in the middle of the wizard. [Screenshot: Select sequencing reads wizard step, with the Navigation Area on the left and the Selected elements list on the right.] Figure 5.40: Specify the sequencing reads for the appropriate family member. 3. Select the sequencing reads from the affected child. 4. Select the sequencing reads from the mother. 5. Select the sequencing reads from the father. 6. Specify the paramet
29. [Screenshot: Genome Browser View with tracks for the locally realigned reads, the annotated variants, Cosmic_v67, ClinVar_20130930, 1000GENOMES_phase_1_EUR and the hg19 conservation scores.] Figure 7.50: Genome Browser View to inspect identified variants in the context of the human genome and external databases. To see the level of nucleotide conservation (from a multiple alignment with many vertebrates) in the region around each variant, a track with conservation scores is added as well. By double clicking on the annotated variant track in the Genome Browser View, a table will be shown that includes all variants and the added information/annotations (see figure 7.51). The added information will help you to identify candidate variants for further research. For example, common genetic variants (present in the HapMap database) or variants known to play a role in drug response or other clinically relevant phenotypes (present in the ClinVar database) can easily be seen. Variants not found in ClinVar can, for example, be prioritized based on amino acid changes (do they cause any changes on the amino acid level?). A high conservation level on the position of the variant between many vertebrates or mammals can also be a hint that this region could hav
30. [Screenshot: Result handling wizard step, with the Save option and the Open log option shown.] Figure 6.32: Check the parameters and save the results. reads or paired reads, and whether they map unambiguously. For the color codes, please see the description of sequence colors in the CLC Genomics Workbench manual that can be found here: http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=View_settings_in_Side_Panel.html. 2. Read Mapping Tumor: The mapped sequencing reads for the tumor sample. The reads are shown in different colors depending on their orientation, whether they are single reads or paired reads, and whether they map unambiguously. For the color codes, please see the description of sequence colors in the CLC Genomics Workbench manual that can be found here: http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=View_settings_in_Side_Panel.html. 3. Target Region Coverage Report Normal: The report consists of a number of tables and graphs that in different ways provide information about the mapped reads from the normal sample. 4. Target Region Coverage Tumor: A track showing the targeted regions. The table view provides information about the targeted regions such as t
31. [Screenshot: start table of the Workbench, with links grouped under Data preparation, Data analysis, Interpretation, and Data analysis and interpretation, and the Getting Started and Explore and Learn areas to the right.] Figure 2.2: The table in the Biomedical Genomics Workbench, visible when no datasets have been opened for viewing, provides links so that you can quickly navigate to relevant sections of the application based manual. To the right hand side of the table, the Getting Started and Explore and Learn areas provide links to more general information resources that you may find useful. Summary stages in data analysis are listed at the left side of the table: Data Preparation, Data Analysis, Interpretation, and Data Analysis and Interpretation. Click on the text in the table to open the relevant section in the application based manual. The recommended way to use the t
32. [Screenshot: variant table with the columns Chromosome, Region, Type, Reference, Allele, Reference allele, Length and Zygosity, shown below the genome browser tracks.] Figure 5.19: The Genome Browser View showing the annotated somatic variants together with a range of other tracks. Adding information from other sources may help you identify interesting candidate variants for further research. For example, common genetic variants (present in the HapMap database) or variants known to play a role in drug response or other clinically relevant phenotypes (present in the ClinVar database) can easily be identified. Further, variants not found in the ClinVar database can be prioritized based on amino acid changes, in case the variant causes changes on the amino acid level. A high conservation level between different vertebrates or mammals in the region containing the variant can also be used to give a hint about whether a given variant is found in a region with an important functional role. If you would like to use the conservation scores to identify interesting variants, we recommend that variants with a conservation score of more than 0.9 (PhastCons score) is pr
33. [Screenshot: Manage Reference Data for Installed Workflows wizard. It shows the free space on the CLC_References and temporary folder locations, the QIAGEN and Custom Reference Data Library, the available Reference Data Sets for hg38 and hg19, and a table of the included reference data with the columns Reference Data, Version, Download Size, On Disk Size and Applied.] Figure 4.3: The Manage Reference Data wizard gives access to th
34. [Screenshot: Genome Browser View showing a read mapping track, the tracks ERR319085 Overview Variants Detected and ERR319085 Variants Detected in Detail, and an open variant table with the columns Chromosome, Region, Type, Reference, Allele, Reference allele, Length, Zygosity, Count, Coverage and Frequency.] Figure 7.14: Genome Browser View with an open
35. [Screenshot: QC for Target Sequencing wizard step with the configurable parameters Track of Target Regions, Minimum coverage (30), Ignore non-specific matches and Ignore broken pairs.] Figure 6.36: Select the track with the targeted regions from your experiment. 4. Click on the button labeled Next, which will take you to the next wizard step (figure 6.37). In this wizard you can specify the parameter for detecting variants. 5. Click on the button labeled Next, which will take you to the next wizard step (figure 6.38). 6. Click on the button labeled Next to go to the last wizard step (figure 6.39). In this wizard you get the chance to check the selected settings by clicking on the button labeled Preview All Parameters. In the Preview All Parameters wizard step you can only check the settings; if you wish to make changes, you have to use the Previous button from the wizard to edit parameters in the relevant windows. At the bottom of this [Screenshot: Identify Variants (WES) wizard, Low Frequency Variant Detection step, with configurable parameters including Required significance, Ignore positions with coverage above, Restrict calling to target regions and Ignore broken pairs.]
36. Select your target region track. [Screenshot: Identify Somatic Variants from Tumor Normal Pair (TAS) wizard, QC for Target Sequencing (tumor) step, with the configurable parameters Track of Target Regions, Minimum coverage (30), Ignore non-specific matches and Ignore broken pairs.] Figure 7.29: Specify setting for removal of germline variants. 8. Click on the button labeled Next and once again select the target region track (the same track as you have already selected in previous wizard steps) (figure 7.30). In the next wizard step you must once again select your target regions track. This time you specify the track to be used for quality control of the targeted sequencing, as this tool reports the performance (enrichment and specificity) of a targeted re-sequencing experiment (figure 7.31). In the next wizard step you can check the selected settings by clicking on the button labeled Preview All Parameters (figure 32). In the Preview All Parameters wizard you can only check the settings; if you wish to make changes, you have to use the Previous button from the wizard to edit parameters in
37. The parameter Detection Frequency will be used in the calculation twice. First, it will report in the result if a variant has been detected (observed frequency > specified frequency) or not (observed frequency < specified frequency). Moreover, it will determine if a variant should be labeled as heterozygous (frequency of another allele identified at a position of a variant in the alignment > specified frequency) or homozygous (frequency of all other alleles identified at a position of a variant in the alignment < specified frequency). Click on the button labeled Next. 6. In the last wizard step (figure 6.12) you can check the selected settings by clicking on the button labeled Preview All Parameters. [Screenshot: Identify Known Variants in One Sample (WES) wizard, Result handling step, with the Preview All Parameters button and the Save and Open log options.] Figure 6.12: Check the settings and save your results. At the bottom of this wizard there are two buttons regarding export functions; one button allows specification of the export format, and the other button (the one labeled Export Parameters) allows specification of the export destination. 7. Click on the button labeled OK to go back to the previous dialog box
38. This will make a wizard appear as shown in figure 4.16. 3. Locate and select the files to import. Note that you can select all sequence files and import them simultaneously. If you take a closer look at the different options in this wizard, you can see that it is possible to choose different import options. We recommend importing data with the standard settings. If you wish to make your own adjustments, you can find further details about the import options in the Biomedical Genomics Workbench reference manual: http://www.clcbio.com/support/downloads/#manuals. 4. Click on the button labeled Next. This will take you to the next wizard step (see figure 4.17). 5. Choose the default settings to save the sequence data and click on the button labeled Next. This will take you to the wizard step shown in figure 4.18. 6. Locate the folder in the Navigation Area that you have created for the purpose. 7. Click on the button labeled Finish. It can take some seconds or even minutes before all data have been imported and saved. 4.4 Prepare sequencing data. The first thing to do after data import is to check the quality of the sequencing reads and perform the necessary trimming. This applies no matter whether you are working with Whole Genome Sequencing, Whole Exome Sequencing, Targeted Amplicon Sequencing or Whole Transcriptome Sequencing. In the toolbox you can choose between the two different ready to use workflows for data preparation that are shown in
39. [Screenshot: Genome Browser View with tracks for Transcript Expression Tumor, Transcript Expression Normal, Differentially Expressed Genes, Read Mapping Tumor, Read Mapping Normal, Clinvar_20131203, Cosmic_v67 and the PhastCons conservation scores (hg19).] Figure 8.24: The Genome Browser View is a collection of a number of tracks. The Genome Browser View makes it easy to compare the different tracks. Each track can be opened individually by double clicking on the track name in the left side of the View Area. 8.5 Identify variants and add expression values. The Identify Variants and Add Expression Values ready to use workflows can be used to identify novel and known mutations in RNA-seq data, automatically map, quantify and annotate the transcriptomes, and compare the mutational patterns in the samples with the expression values of the corresponding transcripts and genes. To run the ready to use wor
40. [Screenshot: Identify Known Mutations from Sample Mappings wizard step with the configurable parameters Variant track (My known variants), Minimum coverage (10) and Detection frequency (20.0).] Figure 5.9: Specify the track with the known variants that should be identified. The parameters that can be set are: • Minimum coverage: The minimum number of reads that covers the position of the variant, which is required to set Sufficient Coverage to YES. • Detection frequency: The minimum allele frequency that is required to annotate a variant as being present in the sample. The same threshold will also be used to determine if a variant is homozygous or heterozygous. In case the most frequent alternative allele at the position of the considered variant has a frequency of less than this value, the zygosity of the considered variant will be reported as being homozygous. The parameter Detection Frequency will be used in the calculation twice. First, it will report in the result if a variant has been detected (observed frequency > specified frequency) or not (observed frequency < specified frequency). Moreover, it will determine if a variant should be labeled as heterozygous (frequency of another allele identified at a position of a variant in the alignment > specified frequency) or homozygous (frequency of all other alleles identified at a position of a variant in the alignment l
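The two uses of the Detection frequency parameter described above can be written out as a short Python sketch. It is an illustration only and not the Workbench implementation: the function name and the returned dictionary are hypothetical, frequencies are taken as percentages, the defaults 10 and 20.0 are the values visible in the wizard, and a frequency exactly equal to the threshold is assumed to count as reaching it (the text only states "minimum" and "less than").

```python
def classify_known_variant(variant_frequency, other_allele_frequencies, coverage,
                           detection_frequency=20.0, minimum_coverage=10):
    """Apply the Minimum coverage and Detection frequency rules to one variant.

    variant_frequency: observed frequency (%) of the known variant allele.
    other_allele_frequencies: frequencies (%) of all other alleles at the position.
    coverage: number of reads covering the position.
    """
    sufficient_coverage = coverage >= minimum_coverage

    # First use of the threshold: is the variant present in the sample?
    detected = variant_frequency >= detection_frequency

    # Second use: heterozygous if another allele also reaches the threshold,
    # homozygous if all other alleles stay below it.
    zygosity = None
    if detected:
        highest_other = max(other_allele_frequencies, default=0.0)
        if highest_other >= detection_frequency:
            zygosity = "Heterozygous"
        else:
            zygosity = "Homozygous"

    return {
        "sufficient_coverage": sufficient_coverage,
        "detected": detected,
        "zygosity": zygosity,
    }


# Hypothetical positions: allele frequencies in percent, coverage in reads.
mixed = classify_known_variant(68.75, [31.25], coverage=16)
pure = classify_known_variant(97.99, [0.30], coverage=5414)
absent = classify_known_variant(0.50, [99.47], coverage=5428)
```

Here `mixed` is reported as heterozygous because the second allele at 31.25% also exceeds the 20% threshold, `pure` as homozygous, and `absent` as not detected.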
41. [Screenshot: Toolbox listing of the ready to use workflows for Targeted Amplicon Sequencing.] Figure 7.1: The eleven workflows available for analyzing targeted amplicon sequencing data. 7.1 General Workflows (TAS). 7.1.1 Annotate Variants (TAS). Using a variant track (e.g. the output from the Identify Variants ready to use workflow), the Annotate Variants (TAS) ready to use workflow runs an internal workflow that adds the following annotations to the variant track: • Gene names: Adds names of genes whenever a variant is found within a known gene. • mRNA: Adds names of mRNA whenever a variant is found within a known transcript. • CDS: Adds names of CDS whenever a variant is found within a coding sequence. • Amino acid changes: Adds information about amino acid changes caused by the variants. • Information from ClinVar: Adds information about the relationships between human variations and their clinical significance. • Information from dbSNP: Adds information from the Single Nucleotide Polymorphism Database, which is a general catalog of genome variation, including SNPs, multinucleotide polymorphisms (MNPs), insertions and deletions (InDels), and short tandem repeats (STRs). • PhastCons Conservation scores: The conservation scores, in this case generated from a multiple alignment with a number of vertebrates, describe the level of nucleotide conservation in the region around each variant. How to run the Annotate Variants (TAS) workflow: 1. Go to
42. | Identify Rare Disease Causing Mutations in a Family of Four (WES). 1. Double click on the Identify Rare Disease Causing Mutations in a Family of Four (WES) tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis. 2. Select the targeted region file (figure 6.65). The targeted region file is a file that specifies which regions have been sequenced when working with whole exome sequencing or targeted amplicon sequencing data. You must provide this file yourself, as it depends on the technology used for sequencing. You can obtain the targeted regions file from the vendor of your targeted sequencing reagents. 3. Select the sequencing reads from the unaffected sibling (figure 6.66). The sequencing reads from the different family members are specified one at a time in the appropriate window. The panel in the left side of the wizard shows the kind of input that should be provided. [Screenshot: Select input for targeted region file wizard step, with the Navigation Area on the left and the Selected elements list on the right.] Figure 6.65: Select the targeted region file you used for sequencing. Select by double clicking on the reads fi
43. [Screenshot: Genome Browser View with tracks for the locally realigned reads, the tumor_01 variants, the InDel variants and the structural variant annotations, and an open track table.] Figure 5.30: This figure shows a Genome Browser View with an open track table. The table allows deeper inspection of the identified variants. Toolbox | Ready to Use Workflows | Whole Genome Sequencing | Hereditary Disease | Filter Causal Variants (WGS-HD). 1. Double click on the Filter Causal Variants (WGS-HD) tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis. 2. Select the variant track you want to use for filtering causal variants (figure 5.31). The panel in the left side of the wizard shows the kind of input that should be provided. Select by double clicking on the variant track name, or click once on the file and then click on the arrow pointing to the right side in the middle of the wizard. 3. Specify which of the 1000 Genomes populations should be used for annotation (figure 5.32). 4. Specify the 1000 Genomes population that should be used for filtering out varian
44. the relevant windows. At the bottom of this wizard there are two buttons regarding export functions; one button allows specification of the export format, and the other button (the [Screenshot: Identify Somatic Variants from Tumor Normal Pair (TAS) wizard, Remove Variants Outside Targeted Regions step, with the Targeted region track set to target_regions.] Figure 7.30: Select target region track. [Screenshot: Identify Somatic Variants from Tumor Normal Pair (TAS) wizard, Remove Germline Variants step, with the configurable parameter Keep variants with control read count below set to 2.]
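The Remove Germline Variants setting shown in the wizard, "Keep variants with control read count below" with a value of 2, can be read as a simple filter on the number of normal (control) sample reads that support each tumor variant. The sketch below illustrates that reading under stated assumptions; the function name and the data layout (variants keyed by chromosome, position, reference and allele) are hypothetical, and it is not the Workbench implementation.

```python
def remove_germline_variants(tumor_variants, control_read_counts, control_count_limit=2):
    """Keep tumor variants supported by fewer than control_count_limit control reads.

    tumor_variants: list of (chromosome, position, reference, allele) tuples.
    control_read_counts: dict mapping the same tuples to the number of reads
        in the normal sample that show the variant allele. Variants missing
        from the dict are assumed to have zero supporting control reads.
    """
    return [
        variant
        for variant in tumor_variants
        if control_read_counts.get(variant, 0) < control_count_limit
    ]


# Hypothetical variants called in the tumor sample.
tumor_calls = [
    ("5", 112175675, "A", "G"),
    ("5", 112175770, "G", "A"),
    ("2", 212652720, "TG", "GT"),
]
# Hypothetical counts of supporting reads in the normal sample.
normal_support = {
    ("5", 112175675, "A", "G"): 1,   # below the limit: kept as somatic candidate
    ("5", 112175770, "G", "A"): 40,  # well supported in normal: likely germline
}
somatic_candidates = remove_germline_variants(tumor_calls, normal_support)
```

With the limit of 2, a single supporting read in the normal sample is tolerated, which leaves some room for sequencing errors in the control.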
45. 0.9, as long as the probability of the entire variant site is greater than 0.9. • Ignore broken pairs: When ticked, reads from broken pairs are ignored. Broken pairs may arise for a number of reasons, one being erroneous mapping of the reads. In general, variants based on broken pair reads are likely to be less reliable, so ignoring them may reduce the number of spurious variants called. However, broken pairs may also arise for biological reasons (e.g. due to structural variants), and if they are ignored some true variants may go undetected. Please note that ignored broken pair reads will not be considered for any non-specific match filters. • Minimum coverage: Only variants in regions covered by at least this many reads are called. • Minimum count: Only variants that are present in at least this many reads are called. • Minimum frequency: Only variants that are present at least at the specified frequency (calculated as count/coverage) are called. For more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=Fixed_Ploidy_Variant_Detection.html. Specify the parameters for the QC for Target Sequencing tool (figure 3.6). When working with targeted data (WES or TAS data), quality checks for the targeted sequencing are included in the workflows. Again, you can choose to use the default settings, or you can choose to adju
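The three threshold parameters above combine into one simple test per candidate variant. The following Python sketch shows that combination, with frequency computed as count/coverage and expressed in percent as in the variant tables. The class, the function names and the threshold values in the example are hypothetical; this is a sketch of the rule as described, not the tool's own code.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    count: int     # number of reads showing the variant allele
    coverage: int  # number of reads covering the position


def frequency_percent(candidate: Candidate) -> float:
    """Frequency calculated as count/coverage, expressed in percent."""
    if candidate.coverage == 0:
        return 0.0
    return 100.0 * candidate.count / candidate.coverage


def passes_thresholds(candidate: Candidate, minimum_coverage: int,
                      minimum_count: int, minimum_frequency: float) -> bool:
    """Return True only if all three minimum requirements are met."""
    return (
        candidate.coverage >= minimum_coverage
        and candidate.count >= minimum_count
        and frequency_percent(candidate) >= minimum_frequency
    )


# Hypothetical thresholds chosen for illustration only.
settings = dict(minimum_coverage=10, minimum_count=2, minimum_frequency=20.0)

well_supported = passes_thresholds(Candidate(count=11, coverage=16), **settings)
low_frequency = passes_thresholds(Candidate(count=27, coverage=5428), **settings)
low_coverage = passes_thresholds(Candidate(count=6, coverage=8), **settings)
```

The second candidate illustrates why all three parameters matter: 27 supporting reads easily pass the minimum count, but at a coverage of 5428 they amount to about 0.5%, far below the minimum frequency.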
46. [Screenshots: Filter Somatic Variants (WGS) wizard steps for 1000 Genomes (1000GENOMES_phase_1_EUR), Remove Variants Found in 1000 Genomes Project, and Remove Variants Found in HapMap (HAPMAP_phase_3_CEU).] Figure 5.16: Specify which HapMap population to use for filtering out known variants. 6. Click on the button labeled Next to go to the last wizard step (shown in figure 5.17). Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters; it is not possible to make any changes. Choose to save the results and click on the button labeled Finish. Output from the Filter Somatic Variants (WGS) workflow: Two types of output are generated: 1. Somatic Candidate Variants: Track that holds the variant data. This track is also included in the Genome Browser View. If you hold down the Ctrl key (Cmd on Mac) while clicking on the table icon in the lower left side of the View Area, you can open the table view in split
47. [Screenshot: Genome Browser View with tracks for the Ensembl v74 gene, CDS and mRNA annotations, the Target Regions Coverage, the Read Mapping, and the Overview Variants Detected and Variants Detected in Detail tracks of the Identify Known Variants in One Sample (WES) workflow, together with an open variant table with the columns Chromosome, Region, Type, Reference, Allele, Reference allele, Length, Zygosity, Count, Coverage, Frequency and Probability.] Figure 6.14: Genome Browser View with an open
48. • Ignore broken pairs: When ticked, reads from broken pairs are ignored. Broken pairs may arise for a number of reasons, one being erroneous mapping of the reads. In general, variants based on broken pair reads are likely to be less reliable, so ignoring them may reduce the number of spurious variants called. However, broken pairs may also arise for biological reasons (e.g. due to structural variants), and if they are ignored some true variants may go undetected. Please note that ignored broken pair reads will not be considered for any non-specific match filters. • Minimum coverage: Only variants in regions covered by at least this many reads are called. • Minimum count: Only variants that are present in at least this many reads are called. • Minimum frequency: Only variants that are present at least at the specified frequency (calculated as count/coverage) are called. For more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=Fixed_Ploidy_Variant_Detection.html. 7. Specify the affected child's gender (figure 5.42). [Screenshot: Trio Analysis wizard step with the configurable parameter Child gender set to Female.] Figure 5.42: Specify the proband's gender. 8. Specify the Hapmap populations that should be used for filtering out variants found in Hapmap for the mother (figure 5.43). This can be done usin
49. Coverage, Nucleotide distributions, GC content, Ambiguous base content, Quality distribution
• 4. Over-representation analyses: Enriched 5mers, Sequence duplication levels, Duplicated sequences
Supplementary QC Report
• 1. Summary
• 2. Per-sequence analysis: Lengths distribution, GC content, Ambiguous base content, Quality distribution
• 3. Per-base analysis: Coverage, Nucleotide distributions, GC content, Ambiguous base content, Quality distribution
• 4. Over-representation analyses: Enriched 5mers, Sequence duplication levels, Duplicated sequences
[Chapter 4: Getting Started, page 58]
[Flowchart: Import data. Run workflow 1 (Preparing Raw Data: Prepare Overlapping Raw Data or Prepare Raw Data). Inspect results (output: QC and trim reports). If the QC and trim reports are OK, use the prepared data as input. Run workflow 2 (Whole Genome Sequencing, Whole Exome Sequencing, Targeted Amplicon Sequencing, or Whole Transcriptome Sequencing, each with workflows such as Annotate Variants, Filter Somatic Variants, and Filter Somatic Variants from Tumor Normal Pair).]
50. Fixed Ploidy Variant detector, if a variant site (and not the variant itself) passes the variant probability threshold, then the variant with the highest probability at that site will be reported, even if the probability of that particular variant is less than the threshold. For example, if the required variant probability is set to 0.9, then the individual probability of the variant called might be less than 0.9, as long as the probability of the entire variant site is greater than 0.9.
• Ignore broken pairs: When ticked, reads from broken pairs are ignored. Broken pairs may arise for a number of reasons, one being erroneous mapping of the reads. In general, variants based on broken-pair reads are likely to be less reliable, so ignoring them may reduce the number of spurious variants called. However, broken pairs may also arise for biological reasons (e.g. due to structural variants), and if they are ignored some true variants may go undetected. Please note that ignored broken-pair reads will not be considered for any non-specific match filters.
• Minimum coverage: Only variants in regions covered by at least this many reads are called.
• Minimum count: Only variants that are present in at least this many reads are called.
• Minimum frequency: Only variants that are present at least at the specified frequency (calculated as count/coverage) are called.
For more information about the t
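The distinction in item 50 between the probability of a variant site and the probability of an individual variant can be made concrete with a short sketch. It assumes the site probability and the per-variant probabilities are already available as inputs; how the Fixed Ploidy Variant Detection tool computes them is not shown here. The function name `call_site` and the example numbers are hypothetical, and treating the threshold as inclusive is an assumption.

```python
from typing import Dict, Optional, Tuple


def call_site(
    site_probability: float,
    variant_probabilities: Dict[str, float],
    required_variant_probability: float = 0.9,
) -> Optional[Tuple[str, float]]:
    """Return (allele, probability) of the most probable variant at a site.

    The threshold is applied to the site as a whole, not to the individual
    variant, so the reported variant may itself have a probability below
    the threshold.
    """
    if not variant_probabilities:
        return None
    if site_probability < required_variant_probability:
        return None  # the site does not pass, nothing is reported
    best = max(variant_probabilities, key=variant_probabilities.get)
    return best, variant_probabilities[best]


if __name__ == "__main__":
    # The site passes (0.95 >= 0.9) although the best variant has only 0.60.
    print(call_site(0.95, {"G": 0.60, "T": 0.35}))  # ('G', 0.6)
    # The site fails, so no variant is reported.
    print(call_site(0.80, {"G": 0.60, "T": 0.20}))  # None
```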
51. [Screenshot: "All parameters for Identify Variants and Add Expression Values". Workflow inputs listed: CDS (Homo_sapiens_ensembl_v74_CDS), Genes (Homo_sapiens_ensembl_v74_Genes), ClinVar (Clinvar_20131203), Conservation scores, Reference sequence (Homo_sapiens_sequence_hg19), mRNA (Homo_sapiens_ensembl_v74_mRNA), Cosmic. Buttons: Export to Excel 2010, Export Parameters.]
Figure 8.30: Preview all parameters. At this step it is not possible to introduce any changes; it is only possible to view the settings.
7. Log: A log of the workflow execution.
8.6 Identify and Annotate Differentially Expressed Genes and Pathways
The Identify and Annotate Differentially Expressed Genes and Pathways workflow compares the gene expression in different groups of samples using an empirical analysis, and performs a gene ontology (GO) enrichment analysis on the differentially expressed genes to identify affected pathways.
To run the ready-to-use workflow:
Toolbox | Ready-to-Use Workflows | Whole Transcriptome Sequencing | Human, Mouse or Rat | Identify and Annotate Differentially Expressed Genes and Pathways
1. Double-click on the Identify and Annotate Differentially Expressed Genes and Pathways ready-to-use workflow to start the analysis. If you are conne
52. [Wizard screenshot: "Select tumor sequencing reads". The Navigation Area shows the folders cancer_research_workbench, whole_exome_sequencing, new_import, whole_genome_sequencing and targeted_amplicon_sequencing; SRR719299_1 (paired) is the selected element.]
Figure 6.23: Select the tumor sample reads.
When you have selected the tumor sample reads, click on the button labeled Next.
2. In the next wizard step (figure 6.24), please specify the normal sample reads.
[Wizard screenshot: "Identify Somatic Variants from Tumor Normal Pair (WES)", step "Select normal sequencing reads"; SRR719300_1 (paired) is the selected element.]
Figure 6.24: Select the normal sample reads.
3. When you have selected the sample(s) you wish to analyze, click on the button labeled Next. This step allows you to restrict the calling of InDels and structural variants to the targeted regions (figur
53. [Wizard screenshot: list of workflow steps ending with Low Frequency Variant Detection, Remove Germline Variants and Result handling.]
Figure 8.22: Check the selected parameters by pressing Preview All Parameters.
[Wizard screenshot: "Identify Candidate Variants and Genes from Tumor Normal Pair", step "Remove Variants Found in HapMap".]
Figure 8.23: Preview all parameters. At this step it is not possible to introduce any changes; it is only possible to view the settings.
3. RNA-Seq Mapping Report Normal and RNA-Seq Mapping Report Tumor: This report contains information about the reads, reference, transcripts, and statistics. This is explained in more detail in the Biomedical Genomics Workbench reference manual in section RNA-Seq report: http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=RNA_Seq_report.html
Read Mapping Normal and Read Mapping Tumor: The mapped RNA-seq reads. The RNA-seq reads are shown in different colors depending on their orientation, whether they are single reads or paired reads, and whether they map unambiguously. For the color codes, please see the description in http www clc
54. [Diagram: applications and their ready-to-use workflows. Prepare Raw Data (Trimming & QC); Whole Exome Sequencing: Annotate Variants (WES), Identify Variants (WES), Filter Somatic Variants (WES), Tumor Normal Pair (WES), Identify and Annotate Variants (WES), Identify Known Variants in One Sample (WES); Prepare Overlapping Raw Data; Targeted Amplicon Sequencing: Annotate Variants (TAS), Identify Variants (TAS), Filter Somatic Variants (TAS), Identify Somatic Variants from Tumor Normal Pair (TAS), Identify and Annotate Variants (TAS), Identify Known Variants in One Sample (TAS).]
Figure 2.7: The available pre-installed ready-to-use workflows for the individual application types.
... download by Biomedical Genomics Workbench can be imported into the Navigation Area using the import option found in the toolbar:
Toolbar | Import | Tracks
To illustrate this, a Genome Browser view is shown in figure 2.8 to figure 2.13. It consists of the following tracks, all tied to the human hg19 reference: genomic sequence, gene, coding sequence (CDS), a read mapping, and variants. In figure 2.8 we have used the zoom tools to zoom all the way in on a SNV that is found in a coding region.
[Chapter 2: Introduction to User Interface, Workflows and Tracks, page 19]
[Genome Browser screenshot "Track List_1". Tracks shown: Homo_sapiens_sequence_hg19, Homo_sapiens_ensembl_v73_Genes (gene annotations), Homo_sapiens_ensembl_v73_CDS.]
55. This can be done using the drop-down list found in this wizard step. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4).
Chapter 4 Getting started
Contents
4.1 Reference data
  4.1.1 The Workbench Reference data location
  4.1.2 Space requirements
  4.1.3 Where reference data is downloaded from
  4.1.4 Download and configure reference data
  4.1.5 Troubleshooting reference data downloads
4.2 Create new folder
4.3 Import sequencing data
  4.3.1 How to import data
4.4 Prepare sequencing data
  4.4.1 Choosing between Prepare Raw Data and Prepare Overlapping Raw Data workflows
  4.4.2 Import adapter trim list
  4.4.3 How to run the Prepare Overlapping Raw Data ready-to-use workflow
  4.4.4 How to run the Prepare Raw Data ready-to-use workflow
  4.4.5 Output from the Prepare Overlapping Raw Data and Prepare Raw Data workflows
  4.4.6 How to check the output reports
4.1 Reference data
The ready
56. [Wizard screenshot: "Identify Somatic Variants from Tumor Normal Pair (TAS)", step "QC for Target Sequencing (normal)". Configurable parameters: Track of Target Regions (target_regions), Minimum coverage (30), Ignore non-specific matches, Ignore broken pairs.]
Figure 7.26: Specify the settings for the variant detection.
5. Click on the button labeled Next, which will take you to the next wizard step (figure 7.27). In this wizard step you can select your target regions track to be used for reporting the performance of the targeted re-sequencing experiment for the tumor sample.
[Wizard screenshot: "Identify Somatic Variants from Tumor Normal Pair (TAS)", step "InDels and Structural Variants (tumor)". Configurable parameter: Restrict calling to target regions (target_regions).]
Figure 7.27: Select your target region track.
6. Click on the button labeled Next to specify the target regions track to be used in the Remove Variants Outside Targeted Regions step (figure 7 2
57. You can always save the results at a later point.
Identify Somatic Variants from Tumor Normal Pair (WES)
Eight different outputs are generated:
1. Read Mapping Normal: The mapped sequencing reads for the normal sample. The reads are shown in different colors depending on their orientation, whether they are single
[Wizard screenshot: "Identify Somatic Variants from Tumor Normal Pair (WES)", step "QC for Target Sequencing (normal)". Configurable parameters: Track of Target Regions (target_regions), Minimum coverage (30), Ignore non-specific matches, Ignore broken pairs.]
Figure 6.31: Select the target regions track.
[Wizard screenshot: "Identify Somatic Variants from Tumor Normal Pair (WES)", step "Result handling", with the Preview All Parameters button.]
58. [Genome Browser View screenshot. Tracks shown: Homo_sapiens_ensembl_v74_mRNA (mRNA annotations), Homo_sapiens_ensembl_v74_CDS (CDS annotations), SRR719299_1 (paired) Target Region, SRR719300_1 (paired) Read Mapping Normal (4,177,231 reads), SRR719299_1 (paired) Read Mapping Tumor (4,345,673 reads), Clinvar_20131203 (variants), PhastCons_conservation_scores_hg19.]
Figure 7.33: The Genome Browser View presents all the different data tracks together and makes it easy to compare different tracks.
7.2.3 Identify Variants (TAS)
The Identify Variants (TAS) tool takes sequencing reads as input and returns identified variants as part of a Genome Browser View. The tool runs an internal workflow, which starts with mapping the sequencing reads to the human reference sequence. It then runs a local realignment to improve the variant detection, which is run afterwards. At the end, variants with an average base quality smaller than 20 are filtered away. In addition, a targeted region report is created to inspect the overall coverage and mapping specificity in the targeted regions. CHAP
59. and reused. To do this:
Toolbox | Identify Candidate Variants | Create Filter Criteria
This tool can be used to specify the filter, and the Annotate Variants workflow should be extended by the Identify Candidate Tool configured with the Filter Criterion. The Biomedical Genomics Workbench reference manual has a chapter that describes this in detail (http://www.clcbio.com/support/downloads/#manuals; see chapter "Workflows" for more information on how pre-installed workflows can be extended and/or edited).
Note: Sometimes the databases (e.g. dbSNP) are updated with a newer version, or maybe you have your own version of the database. In such cases you may wish to change one of the used databases. This can be done with the Data Management function, which is described in section
[Genome Browser View screenshot. Tracks shown: Homo_sapiens_sequence_hg19, Homo_sapiens_ensembl_v73_Genes (gene annotations), Homo_sapiens_ensembl_v73_mRNA (mRNA annotations), Homo_sapiens_ensembl_v73_CDS (CDS annotations), ERR319087 (single) Reads, locally realigned; Annotated Variants; Cosmic_v67 (variants); ClinVar_20130930 (variants); dbsnp_v138 (variants); PhastCons_conservation_scores_hg19 (graph).]
60. different vertebrates or mammals in the region containing the variant can also be used to give a hint about whether a given variant is found in a region with an important functional role. If you would like to use the conservation scores to identify interesting variants, we recommend that variants with a conservation score of more than 0.9 (PhastCons score) are prioritized over variants with lower conservation scores.
It is possible to filter variants based on their annotations. This type of filtering can be facilitated using the table filter found at the top part of the table. If you are performing multiple experiments where you would like to use the exact same filter criteria, you can create a filter that can be saved and reused. To do this:
Toolbox | Identify Candidate Variants | Create Filter Criteria
This tool can be used to specify the filter, and the Annotate Variants workflow should be extended by the Identify Candidate Tool configured with the Filter Criterion. The Biomedical Genomics Workbench reference manual has a chapter that describes this in detail (http://www.clcbio.com/support/downloads/#manuals; see chapter "Workflows" for more information on how pre-installed workflows can be extended and/or edited).
Note: Sometimes the databases (e.g. dbSNP) are updated with a newer version, or maybe you have your own version of the database. In such cases you may wish to change one of the used databases. This can be done with Dat
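The recommendation in item 60 to prioritize variants with a PhastCons conservation score of more than 0.9 can be sketched outside the Workbench, for example on an exported variant table. The dictionary keys (`region`, `conservation`) and the example rows are hypothetical; only the 0.9 cut-off comes from the text. Treating a missing score as 0.0 is an assumption of this sketch.

```python
from typing import Dict, List


def prioritize_by_conservation(
    variants: List[Dict], threshold: float = 0.9
) -> List[Dict]:
    """Order variants so that those scoring above the threshold come first.

    Within each group, variants are sorted by descending conservation score.
    Variants without a score are treated as 0.0 (an assumption).
    """
    def score(v: Dict) -> float:
        value = v.get("conservation")
        return 0.0 if value is None else float(value)

    high = [v for v in variants if score(v) > threshold]
    low = [v for v in variants if score(v) <= threshold]
    high.sort(key=score, reverse=True)
    low.sort(key=score, reverse=True)
    return high + low


if __name__ == "__main__":
    table = [
        {"region": "A", "conservation": 0.42},
        {"region": "B", "conservation": 0.97},
        {"region": "C", "conservation": None},
        {"region": "D", "conservation": 0.99},
    ]
    print([v["region"] for v in prioritize_by_conservation(table)])
    # ['D', 'B', 'A', 'C']
```

Inside the Workbench itself, the same prioritization is done with the table filter or a saved filter criterion, as the text describes.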
61. different ways provide information about the mapped reads.
2. Read Mapping: The mapped sequencing reads. The reads are shown in different colors depending on their orientation, whether they are single reads or paired reads, and whether they map unambiguously. For the color codes, please see the description of sequence colors in the CLC Genomics Workbench manual, which can be found here: http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=View_settings_in_Side_Panel.html
3. Variants Detected in Detail: Annotation track showing the known variants. Like the Overview Variants Detected table, this table provides information about the known variants. Four columns starting with the sample name and followed by "Read Mapping coverage", "Read Mapping detection", "Read Mapping frequency", and "Read Mapping zygosity" provide the overview of whether or not the known variants have been detected in the sequencing reads, as well as detailed information about the Most Frequent Alternative Allele (labeled MFAA).
4. Genome Browser View Identify Known Variants: A collection of tracks presented together. Shows the annotated variants track together with the human reference sequence, genes, transcripts, coding regions, target regions, coverage, the mapped reads, the overview of the detected variants, and the variants detected in detail.
It is a good idea to start looking at the mappin
62. [Wizard screenshot: step "Remove Variants Found in HapMap" with the selectable populations HAPMAP_phase_3_MKK, HAPMAP_phase_3_CHD, HAPMAP_phase_3_TSI, HAPMAP_phase_3_CHB, HAPMAP_phase_3_GIH, HAPMAP_phase_3_HCB, HAPMAP_phase_3_LWK, HAPMAP_phase_3_CEU, HAPMAP_phase_3_MEX, HAPMAP_phase_3_YRI, HAPMAP_phase_3_JPT, HAPMAP_phase_3_ASW.]
Figure 5.39: Select the relevant HapMap population(s).
Specify the parameters for the Fixed Ploidy Variant Detection tool for the unaffected parent.
8. Specify the parameters for the Fixed Ploidy Variant Detection tool for the affected child.
9. Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters; it is not possible to make any changes. Choose to save the results and click on the button labeled Finish.
Output from the Identify Causal Inherited Variants in a Trio (WGS) workflow
Five types of output are generated:
• Reads Tracks: One for each family member. The reads mapped to the reference sequence.
• Variants in (one track for each family member): The variants identified in each of the family members. The variant track can be opened in table view to see all information about the variants.
• Putative Causal Variants in Child: The putative disease-causing variants identified in the child. The variant track can be opened in table view to see all information about the variants.
• Ge
63. [Genome Browser screenshot. Tracks shown: Paired reads_Sample_1 (locally realigned, 4,171,282 reads) and Sample_1 (paired) Reads, locally realigned, Variants.]
Figure 2.8: A Genome Browser view with a genomic sequence track, a gene track, a coding sequence (CDS) track, a read mapping track and a variant track.
A Genome Browser view like the one shown in figure 2.8 allows for a complete overview of reads mapped to a reference and identified variants. You can see how many reads and variants you have, and you can compare them to the complete human genome, genes and coding regions.
How to zoom in a Genome Browser view
One way to zoom in to take a closer look at the reads and variants is to use the zoom tools. These are located in the lower right corner of the view area (see figure 2.9). Click and hold down the mouse button for a second or two on the relevant icon. This can be either an arrow or a magnifying glass. By clicking the magnifying glass icon, three icons will appear. These can be used for zooming in, zooming out, or panning. The different zoom options are described in detail in the Biomedical Genomics Workbench reference manual in the section entitled "Zoom and selection in View Area".
Figure 2.9: Click and hold down the mouse button for a second or two on the magnifying glass icon until additional icons appear. Select the arrow to
64. 8.4 Identify Candidate Variants and Genes from Tumor Normal Pair
8.5 Identify variants and add expression values
8.6 Identify and Annotate Differentially Expressed Genes and Pathways
The technologies originally developed for next-generation DNA sequencing can also be applied to deep sequencing of the transcriptome. This is done through cDNA sequencing and is called RNA sequencing, or simply RNA-seq. One of the key advantages of RNA-seq is that the method is independent of prior knowledge of the corresponding genomic sequences and can therefore be used to identify transcripts from unannotated genes, novel splicing isoforms, and gene-fusion transcripts (Wang et al., 2009; Martin and Wang, 2011). Another strength is that it enables studies of transcriptomic complexities, such as deciphering allele-specific transcription by the use of SNPs present in the transcribed regions (Heap et al., 2010).
RNA-seq-based transcriptomic studies have the potential to increase the overall understanding of the transcriptome. However, gaining access to the hidden information and making a meaningful interpretation of the sequencing data relies heavily on the downstream bioinformatic analysis.
In this chapter we will first discuss the initial steps in the data analysis that lie upstream of the analysis using ready-to-use workflows. Next, we will look at what the individual ready to use workflo
65. [Wizard screenshot: step "Select reads from affected family member" for a family of four, with the Navigation Area entries Family member affected, Affected child, Father affected and Mother unaffected.]
Figure 1: Specify the sequencing reads for the appropriate family member.
3. Select the sequencing reads from the mother.
4. You then need to select the targeted region file (figure 7.72). The targeted region file is a file that specifies which regions have been sequenced when working with whole exome sequencing or targeted amplicon sequencing data. You must provide this file yourself, as it depends on the technology used for sequencing. You can obtain the targeted regions file from the vendor of your targeted sequencing reagents.
[Wizard screenshot: "Select input for targeted region file". The Navigation Area folder targeted_sequencing contains 50293689_Regions_BED, CTFR Cergentis, Ampliseq and agilent_sure_select; 50293689_Regions_BED is the selected element.]
Figure 7.72: Select the targeted region file you used for sequencing.
5. Select the sequencing reads from the affected child.
6. Specify the affected child's gender for the Trio analysis (figure 7.73). Specify the HapMap populations that should be used for filt
66. filters.
• Minimum coverage: Only variants in regions covered by at least this many reads are called.
• Minimum count: Only variants that are present in at least this many reads are called.
• Minimum frequency: Only variants that are present at least at the specified frequency (calculated as count/coverage) are called.
For more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=Fixed_Ploidy_Variant_Detection.html
12. Specify the parameters for the Fixed Ploidy Variant Detection tool for the father.
13. Specify the parameters for the Fixed Ploidy Variant Detection tool for the sibling.
14. Specify the parameters for the Fixed Ploidy Variant Detection tool for the mother.
15. Specify the parameters for the QC for Target Sequencing tool for the father (figure 6.70). When working with targeted data (WES or TAS data), quality checks for the targeted sequencing are included in the workflows. Again, you can choose to use the default settings, or you can choose to adjust the parameters.
[Wizard screenshot: "QC for Target Sequencing (proband)". Configurable parameters: Minimum coverage (30), Ignore non-specific matches, Ignore broken pairs.]
Figure 6.70: Specify the parameters for the QC for Target Sequencing tool.
The parameters that can be set are:
• Minimum coverage: provides the length of each target region
67. ftp.ensembl.org/pub/release-79/gtf/rattus_norvegicus (filename: Rattus_norvegicus.Rnor_5.0.79.gtf.gz)
• dbSNP variants (ENSEMBL): ftp://ftp.ensembl.org/pub/release-79/variation/gvf/rattus_norvegicus (filename: Rattus_norvegicus.gvf.gz)
[Appendix A: Reference Data Overview, page 255]
• PhastCons Conservation Scores (UCSC): http://hgdownload.cse.ucsc.edu/goldenPath/rn5/phastCons13way. Each chromosome has a separate wigFix file. Each needs to be downloaded (22 files) and then combined to make a single wigFix file before importing into the Workbench (filename: phastCons13way.wigFix.gz).
• Rat Gene Ontology (GO) slim file (EBI): http://www.ebi.ac.uk/QuickGO/GMultiTerm. Gene Ontology file in slim format (only high-level GO terms annotated) for the GO categories Molecular Function, Biological Process and Cellular Component, annotated on mouse genes. The file was made using the QuickGO tool from the EBI: http://www.ebi.ac.uk/QuickGO/GMultiTerm
Appendix B Mini dictionary
Term: Description
Application: Type of analysis (Whole Genome Sequencing, Whole Exome Sequencing, Targeted Amplicon Sequencing, RNA-seq).
Automated workflow: A workflow consisting of several tools that have been built together and that only requires few inputs from the user.
Navigation area: The area in the left side of the Biomedical Genomics Workbench that holds the data.
Ready-to-use workflow: Pre-installed automated workflow consisting of several tools that have been built together and o
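The PhastCons entry in item 67 says that the per-chromosome wigFix files must be combined into a single wigFix file before import. A minimal sketch of that combining step follows. It assumes the per-chromosome files have already been downloaded as gzip-compressed files; the function name `combine_wigfix`, the directory, and the file-name pattern in the usage comment are hypothetical.

```python
import gzip
import shutil
from pathlib import Path
from typing import Iterable


def combine_wigfix(parts: Iterable[Path], combined: Path) -> None:
    """Concatenate gzip-compressed wigFix files into one gzip-compressed file.

    Each input is decompressed and streamed into the output in the order
    given, so the result is a single wigFix file containing all chromosomes.
    """
    with gzip.open(combined, "wb") as out:
        for part in parts:
            with gzip.open(part, "rb") as src:
                shutil.copyfileobj(src, out)


# Usage sketch (hypothetical directory and naming pattern):
#   parts = sorted(Path("downloads").glob("chr*.phastCons13way.wigFix.gz"))
#   combine_wigfix(parts, Path("phastCons13way.wigFix.gz"))
```

Each wigFix block names its chromosome in its own header line, so the order in which the files are concatenated should not change the content; sorting by file name is only a convenience.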
68. go undetected. Please note that ignored broken-pair reads will not be considered for any non-specific match filters.
• Minimum coverage: Only variants in regions covered by at least this many reads are called.
• Minimum count: Only variants that are present in at least this many reads are called.
• Minimum frequency: Only variants that are present at least at the specified frequency (calculated as count/coverage) are called.
For more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=Fixed_Ploidy_Variant_Detection.html
Specify the parameters for the QC for Target Sequencing tool for the affected family member (figure 7.59). When working with targeted data (WES or TAS data), quality checks for the targeted sequencing are included in the workflows. Again, you can choose to use the default settings, or you can choose to adjust the parameters.
[Wizard screenshot: "QC for Target Sequencing (proband)". Configurable parameters: Minimum coverage (30), Ignore non-specific matches, Ignore broken pairs.]
Figure 7.59: Specify the parameters for the QC for Target Sequencing tool.
The parameters that can be set are:
• Minimum coverage: provides the length of each target region that has at least this coverage.
• Ignore non-specific matches: reads that are non-specifically mapped will be ignored.
• Ignore
69. identified variants in the context of the human genome and external databases. Finally, a track with conservation scores has been added to be able to see the level of nucleotide conservation (from a multiple alignment with many vertebrates) in the region around each variant.
By double-clicking on one of the annotated variant tracks in the Genome Browser View, a table will be shown that includes all variants and the added information/annotations (see 5.12).
Note: We do not recommend that any of the produced files are deleted individually, as some of them are linked to other outputs. Please always delete all of them at the same time.
[Genome Browser View screenshot. Tracks shown: Homo_sapiens_ensembl_v74_mRNA (mRNA annotations), Homo_sapiens_ensembl_v74_CDS (CDS annotations), and tumor_01 (paired) Read Mapping (4,207,953 reads) zoomed in to nucleotide level.]
70. [Genome Browser View screenshot. Tracks shown: Homo_sapiens_ensembl_v74_CDS (CDS annotations), Homo_sapiens_ensembl_v74_Genes (gene annotations), Clinvar_20131203 (variants), PhastCons_conservation_scores_hg19 (graph), Homo_sapiens_sequence_hg19, Homo_sapiens_ensembl_v74_mRNA.]
Figure 8.15: The genome browser view makes it easy to compare a range of different data.
8.4 Identify Candidate Variants and Genes from Tumor Normal Pair
The Identify Candidate Variants and Genes from Tumor Normal Pair tool identifies somatic variants and differentially expressed genes in a tumor normal pair. One tumor normal pair can be compared at a time. If you would like to compare more than one pair, you must repeat the analysis with the next tumor normal pair.
[Chapter 8: Whole Transcriptome Sequencing (WTS), page 231]
To run the ready-to-use workflow:
Toolbox | Ready-to-Use Workflows | Whole Transcriptome Sequencing | Human, Mouse or Rat | Identify Candidate Variants and Genes from Tumor Normal Pair
1. Double-click on the Identify Candidate Variants and Genes from Tumor Normal Pair tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis.
71. in a track are shown in the table view of the track, as described in the next section.
How to open a table in split view
The table view of a track provides the details of the information that is presented in the track itself. It is often useful to view the table at the same time as the track; this is done by opening the table in a split view. From an individual track open in the Viewing area of the Workbench, this can be done by holding down the Ctrl key and clicking with the mouse on the small table icon at the bottom of the view. From a Genome Browser view open in the Viewing area, the table view of a particular track can be opened in a split view by double-clicking on the track name in the list. This is shown in figure 2.14.
The table and the track are linked, which means that clicking on a particular row in the table brings that position into focus in the Genome Browser view. For example, if you wished to jump
[Genome Browser screenshot "Track List_1". Tracks shown: Homo_sapiens_sequence_hg19, Homo_sapiens_ensembl_v73_Genes (gene annotations), Homo_sapiens_ensembl_v73_CDS (CDS annotations), Paired reads_Sample_1 (locally realigned, 4,171,282 reads), and Sample_1 (paired) Reads, locally realigned, Variants.]
72. input. You may need to provide additional information relevant to your data and analysis to run a given workflow, for example adapter trim lists for trimming sequences or, when performing Targeted Amplicon Sequencing, a description of the sequenced regions.
Irrespective of the type of sequencing data you wish to analyze, only a few steps are necessary before the identified variants are available for your inspection. A schematic representation of the flow that an analysis could take is shown in figure 2.6.
[Flowchart: Import data (sequencing reads). Run workflow 1 (Preparing Raw Data: Prepare Overlapping Raw Data or Prepare Raw Data). Inspect results (output: QC and trim reports). If the QC and trim reports are OK, the prepared data is used as input. Run workflow 2 (Whole Genome Sequencing, Whole Exome Sequencing, Targeted Amplicon Sequencing, or Whole Transcriptome Sequencing, each with workflows such as Annotate Variants and Filter Somatic Variants).]
73. [Screenshot: Genome Browser View with mRNA and CDS annotation tracks, the ERR319087 somatic candidate variants, Cosmic_v67, ClinVar_20130930 and PhastCons conservation score tracks, and the variant table (16 rows) opened in split view] Figure 6.22: The Genome Browser View showing the annotated somatic variants together with a range of other tracks. A high conservation level between different vertebrates or mammals in the region containing the variant can also be used to give a hint about whether a given variant is found in a region with an important functional role. If you would like to use the conservation scores to identify interesting variants, we recommend that variants with a conservation score of more than 0.9 (PhastCons score) are prioritized over variants with lower conservati
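The recommendation above (prioritize variants with a PhastCons score of more than 0.9) can be sketched outside the Workbench as a simple partition of an exported variant table. This is only an illustration: the dictionary keys `name` and `conservation` and the function `prioritize_by_conservation` are hypothetical and do not correspond to Workbench column names or tools.

```python
def prioritize_by_conservation(variants, threshold=0.9):
    """Split variants into (high, low) priority lists.

    `variants` is a list of dicts with a hypothetical 'conservation' key
    holding the PhastCons score (0.0 to 1.0). Variants scoring strictly
    above `threshold` go first; both lists are sorted by descending score.
    """
    by_score = sorted(variants, key=lambda v: v["conservation"], reverse=True)
    high = [v for v in by_score if v["conservation"] > threshold]
    low = [v for v in by_score if v["conservation"] <= threshold]
    return high, low


example = [
    {"name": "var_a", "conservation": 0.35},
    {"name": "var_b", "conservation": 0.98},
    {"name": "var_c", "conservation": 0.90},
]
high, low = prioritize_by_conservation(example)
```

Note that a score of exactly 0.9 lands in the low-priority list here, because the text says "more than 0.9".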
74. more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=QC_Target_Sequencing.html. Click on the button labeled Next and specify the track with the known variants that should be identified in your sample (figure 6.11). [Screenshot of the wizard step "Identify Known Mutations from Sample Mappings": Variant track "My known variants", Minimum coverage 10, Detection frequency 20.0] Figure 6.11: Specify the track with the known variants that should be identified. The parameters that can be set are: • Minimum coverage: The minimum number of reads that covers the position of the variant, which is required to set Sufficient Coverage to YES. • Detection frequency: The minimum allele frequency that is required to annotate a variant as being present in the sample. The same threshold will also be used to determine if a variant is homozygous or heterozygous. In case the most frequent alternative allele at the position of the considered variant has a frequency of less than this value, the zygosity of the considered variant will be reported as being homozygous
75. normal control sample from the same patient. When running the Identify Somatic Variants from Tumor Normal Pair WGS workflow, the reads are mapped and the variants identified. An internal workflow removes germline variants that are found in the mapped reads of the normal control sample, and variants outside the target region are removed as they are likely to be false positives due to non-specific mapping of sequencing reads. Next, the remaining variants are annotated with gene names, amino acid changes, conservation scores, and information from clinically relevant databases like ClinVar (variants with clinically relevant association). Finally, information from dbSNP is added to see which of the detected variants have been observed before and which are completely new. How to run the Identify Somatic Variants from Tumor Normal Pair WGS workflow: To run the Identify Somatic Variants from Tumor Normal Pair WGS tool, go to: Toolbox | Ready-to-Use Workflows | Whole Genome Sequencing | Somatic Cancer | Identify Somatic Variants from Tumor Normal Pair WGS. 1. Go to the toolbox and double-click on the Identify Somatic Variants from Tumor Normal Pair WGS ready-to-use workflow. This will open the wizard shown in figure 5.20, where you can select the tumor sample reads. [Screenshot: wizard step "Select sequencing reads"]
76. of one sample [Screenshot: the start-up window, showing the Toolbox folders and a table of ready-to-use workflows such as Filter somatic variants, Annotate variants, Identify known variants in one sample, and Identify somatic variants from tumor normal pair] Figure 2.1: The Biomedical Genomics Workbench start-up window. Currently Biomedical Genomics Workbench can be used to analyze DNA sequencing data. Analysis of RNA sequencing data is planned for a future release. In this section we take a closer look at the table in the viewing area (figure 2.2). [Table header residue: DNA, RNA, Getting started]
77. of the export format, and the other button, the one labeled Export Parameters, allows specification of the export destination. When selecting an export location, you will export the analysis parameter settings that were specified for this specific experiment. 6. Click on the button labeled OK to go back to the previous wizard step and choose Save. Note: If you choose to open the results, the results will not be saved automatically. You can always save the results at a later point. Output from the Identify Somatic Variants from Tumor Normal Pair WGS workflow: Six different outputs are generated. 1. Read Mapping Tumor: The mapped sequencing reads for the tumor sample. The reads are shown in different colors depending on their orientation, whether they are single reads or paired reads, and whether they map unambiguously. For the color codes please see the description of sequence colors in the CLC Genomics Workbench manual that can be found here: http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=View_settings_in_Side_Panel.html. 2. Read Mapping Normal: The mapped sequencing reads for the normal sample. The reads are shown in different colors depending on their orientation, whether they are single reads or paired reads, and whether they map unambiguously. For the color codes please see the description of sequence colors in the CLC Genomics Workbench manual that can be found here: http www clcs
78. of the variants or right-clicking on the variant. A tooltip will appear with detailed information about the variant. Genome Browser View Annotated Variants: A collection of tracks presented together. Shows the annotated variants track together with the human reference sequence, genes, transcripts, coding regions, and variants detected in dbSNP, ClinVar, 1000 Genomes, and PhastCons conservation scores (see figure 8.5). [CHAPTER 8. WHOLE TRANSCRIPTOME SEQUENCING WTS] [Screenshot: Genome Browser View with Ensembl v74 gene annotation tracks, the annotated ERR319065 variants, dbSNP, ClinVar_20131203, 1000 Genomes and PhastCons conservation score tracks] Figure 8.5: The output from the Annotate Variants ready-to-use workflow is a genome browser view (a track list) containing individual tracks for all added annotations. Note: Please be aware that if you delete the annotated variant track, this track will also disappear from the genome browser view. It is pos
79. on organization of the Toolbox. The first thing to note is the top-level folders and their associated icons (see figure 2.4). [Screenshot: the Toolbox tree, with the Ready-to-Use Workflows folders (Preparing Raw Data, Whole Exome Sequencing, Targeted Amplicon Sequencing, Whole Transcriptome Sequencing) and the Tools folders (Genome Browser, Quality Control, Preparing Raw Data, Resequencing Analysis, Add Information to Variants, Remove Variants, Add Information to Genes, Compare Samples, Identify Candidate Variants, Identify Candidate Genes, Legacy Tools)] Figure 2.4: The top-level folders of the Toolbox are divided into two main categories: the Ready-to-Use Workflows and the Tools. The elements under the folders of the Tools section can be used for manual analysis or used for editing existing workflows and building your own workflows. The toolbox contains two different categories of tools: 1) the Ready-to-Use Workflows, which can be used to run complete analyses, and 2) Tools, containing many individual tools that can be used for analysis by themselves, used to build workflows from, or added to existing workflows to expand their functionality. The names of the folders in the Ready-to-Use Workflows section reflect the type of analysis the workflows in that folder are designed for. See figure 2.5. Manual data analysis, that is, execution of individual analysis steps, can
80. [Screenshot residue: position filter; Significance; Remove pyro-error variants: in homopolymer regions with minimum length 3, with frequency below 0.8; Locked Settings] Figure 8.27: Specify the parameters for transcriptomic variant detection. 5. If you are working with the workflow from the Human folder, specify here the relevant 1000 Genomes population from the drop-down list (see figure 8.28). Choose the population that best matches the population your samples are derived from. Under Locked settings you can see that Automatically join adjacent MNVs and SNVs has been selected. The reason for this is that many databases do not report a succession of SNVs as one MNV, as is the case for the Biomedical Genomics Workbench, and as a consequence it is not possible to directly compare variants called with Biomedical Genomics Workbench with these databases. In order to support filtering against these databases anyway, the option to Automatically join adjacent MNVs and SNVs is enabled. This means that an MNV in the experimental data will get an exact match if a set of SNVs and MNVs in the database can be combined to provide the same allele. Note: This assumes that SNVs and MNVs in the track of known variants represent the same allele, although there is no evidence for this in the track of known variants. Click on the button labeled Next to go to the last wizard step, shown in figure 8.29. Pressing the button Preview All
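To illustrate what joining adjacent SNVs and MNVs means, here is a minimal sketch that merges directly adjacent substitutions into a single MNV. It is not the Workbench's implementation; the tuple layout `(position, reference, allele)` and the function `join_adjacent` are assumptions made for this example, and, like the note above, it assumes all records lie on the same chromosome and the same allele.

```python
def join_adjacent(variants):
    """Merge directly adjacent substitutions into MNVs.

    `variants` is a list of (position, reference, allele) tuples for one
    chromosome, where reference and allele have equal length (SNV or MNV).
    Two records are joined when the second starts at the base immediately
    after the end of the first.
    """
    merged = []
    for pos, ref, alt in sorted(variants):
        if merged:
            last_pos, last_ref, last_alt = merged[-1]
            if last_pos + len(last_ref) == pos:
                # Extend the previous record instead of starting a new one.
                merged[-1] = (last_pos, last_ref + ref, last_alt + alt)
                continue
        merged.append((pos, ref, alt))
    return merged


database_snvs = [(100, "T", "G"), (101, "G", "T"), (105, "A", "C")]
joined = join_adjacent(database_snvs)
```

With this input, the two SNVs at positions 100 and 101 combine into one MNV (TG to GT) that an MNV called in the experimental data could then match exactly, while the SNV at position 105 is left unchanged.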
81. select your target regions track to be used for reporting the performance of the targeted re-sequencing experiment for the tumor sample. [Screenshot of the wizard step "QC for Target Sequencing tumor": Track of Target Regions "target_regions", Minimum coverage 30, Ignore non-specific matches, Ignore broken pairs] Figure 6.27: Select your target region track. 6. Click on the button labeled Next to specify the target regions track to be used in the Remove Variants Outside Targeted Regions step (figure 6.28). The targeted region track should be the same as the track you selected in the previous wizard step. Variants found outside the targeted regions will not be included in the output that is generated with the ready-to-use workflow. Click on the button labeled Next. [Screenshot of the wizard step "Remove Variants Outside Targeted Regions": Targeted region track "target_regions"]
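The idea behind the Remove Variants Outside Targeted Regions step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Workbench's code: the target regions are assumed to come from a BED file (0-based, half-open intervals), variant positions are assumed to be 1-based, and the helper names `parse_bed` and `in_targets` are hypothetical.

```python
def parse_bed(lines):
    """Read BED lines into {chromosome: [(start, end), ...]}.

    BED intervals are 0-based and half-open: [start, end).
    Header lines (track/browser/comments) and blank lines are skipped.
    """
    regions = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def in_targets(chrom, pos, regions):
    """True if the 1-based position `pos` lies inside any target region."""
    return any(start < pos <= end for start, end in regions.get(chrom, []))


bed = ["track name=targets", "chr1\t100\t200", "chr2\t50\t60"]
targets = parse_bed(bed)
variants = [("chr1", 101), ("chr1", 201), ("chr2", 55), ("chr3", 10)]
kept = [v for v in variants if in_targets(v[0], v[1], targets)]
```

A linear scan over the regions is enough for a sketch; for exome-sized target files, sorted intervals with binary search would be the usual choice.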
82. spurious variants called. However, broken pairs may also arise for biological reasons (e.g. due to structural variants), and if they are ignored some true variants may go undetected. Please note that ignored broken pair reads will not be considered for any non-specific match filters. • Minimum coverage: Only variants in regions covered by at least this many reads are called. • Minimum count: Only variants that are present in at least this many reads are called. • Minimum frequency: Only variants that are present at least at the specified frequency (calculated as count/coverage) are called. For more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=Fixed_Ploidy_Variant_Detection.html. 7. Specify a targeted region file to remove variants outside of this region (figure 6.86). [Screenshot: wizard step "Select input for targeted region file"] Figure 6.86: Select the targeted region file you used for sequencing. 8. Specify the 1000 Genomes population that should be used to add information on variants found in the 1000 Genomes project. This can be done using the drop-down list found in this wizard step. Please note that the populations av
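The three thresholds described above (minimum coverage, minimum count, and minimum frequency calculated as count/coverage) combine as in the following sketch. It is only an illustration: the `Candidate` class, the function `passes_filters`, and the default threshold values are hypothetical, and the frequency is expressed here as a fraction, which may differ from the unit used in the wizard.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    position: int
    count: int      # number of reads supporting the variant
    coverage: int   # number of reads covering the position


def passes_filters(c, min_coverage=10, min_count=2, min_frequency=0.2):
    """Return True only if the candidate meets all three thresholds."""
    if c.coverage <= 0 or c.coverage < min_coverage:
        return False
    if c.count < min_count:
        return False
    # Frequency is calculated as count / coverage.
    return c.count / c.coverage >= min_frequency


candidates = [
    Candidate(position=1, count=5, coverage=20),   # frequency 0.25: called
    Candidate(position=2, count=1, coverage=50),   # too few supporting reads
    Candidate(position=3, count=3, coverage=8),    # coverage too low
    Candidate(position=4, count=2, coverage=40),   # frequency 0.05: too rare
]
called = [c.position for c in candidates if passes_filters(c)]
```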
83. that makes it easy to compare information from the individual tracks, for example to compare the identified variants with the read mappings and information from databases. • De novo Mutations Amino Acid Track • Recessive Variants Amino Acid Track. 6.3.6 Identify Variants WES HD. You can use the Identify Variants WES HD ready-to-use workflow to call variants in the mapped and locally realigned reads. The workflow removes false positives and, in case of a targeted experiment, removes variants outside the targeted region. Variant calling is performed with the Fixed Ploidy Variant Detection tool. The Identify Variants WES HD ready-to-use workflow accepts sequencing reads as input. How to run the Identify Variants WES HD workflow: This section recapitulates the steps you need to take to start the workflow, each item corresponding to a different wizard window. For more information on the specific tools used in this workflow, see section 3.3. To run the Identify Variants WES HD workflow, go to: Toolbox | Ready-to-Use Workflows | Whole Exome Sequencing | Hereditary Disease | Identify Variants WES HD. 1. Double-click on the Identify Variants WES HD tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis. 2. Select the sequencing reads you want to analyze (figure 6.77). The panel in the left side of the wiza
84. the Run workflow 1 box in figure 4.19. [CHAPTER 4. GETTING STARTED] [Screenshot of the Illumina import dialog: the files 26N_R1_001.fastq.gz, 26N_R2_001.fastq.gz, 26T_R1_001.fastq.gz and 26T_R2_001.fastq.gz in the Downloads folder; General options: Paired reads, Discard read names, Discard quality scores, Minimum distance 180; Illumina options: Remove failed reads, Quality scores "NCBI/Sanger or Illumina Pipeline 1.8 and later", MiSeq de-multiplexing, Trim reads] Figure 4.16: Locate and select the files to import. Tick Paired reads if you, as in this example, are importing paired reads. 4.4.1 Choosing between Prepare Raw Data and Prepare Overlapping Raw Data workflows. The Preparing Raw Data ready-to-use workflows are universal and can be used for all applications: Whole Genome Sequencing, Exome Sequencing, and Targeted Amplicon Sequencing. But many whole genome sequencing, exome sequencing (using capture technology), and targeted amplicon sequencing strategies produce overlapping reads. Downstream stages of the Biomedical Genomics Workbench (e.g. variant calling) take the frequencies of observed alleles into c
85. the Identify Known Variants in One Sample TAS workflow: 1. Go to the toolbox and double-click on: Toolbox | Ready-to-Use Workflows | Targeted Amplicon Sequencing | General Workflows TAS | Identify Known Variants from One Sample TAS. 2. This will open the wizard step shown in figure 7.8, where you can select the reads of the sample which should be tested for presence or absence of your known variants. [Screenshot of the wizard step "Select sequencing reads", with ERR319085 selected from the Navigation Area] Figure 7.8: Select the sequencing reads from the sample you would like to test for your known variants. If several samples from different folders should be analyzed, the tool has to be run in batch mode. This is done by selecting Batch and specifying the folders that hold the data you wish to analyze. Click on the button labeled Next. 3. Specify the target region for the Indels and Structural Variants tool (figure 7.9). This step is optional and will speed the completion time of the workflow by running the tool only
86. the reads. In general, variants based on broken pair reads are likely to be less reliable, so ignoring them may reduce the number of spurious variants called. However, broken pairs may also arise for biological reasons (e.g. due to structural variants), and if they are ignored some true variants may go undetected. Please note that ignored broken pair reads will not be considered for any non-specific match filters. • Minimum coverage: Only variants in regions covered by at least this many reads are called. • Minimum count: Only variants that are present in at least this many reads are called. • Minimum frequency: Only variants that are present at least at the specified frequency (calculated as count/coverage) are called. For more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=Fixed_Ploidy_Variant_Detection.html. 6. Specify the affected child's gender (figure 5.46). [CHAPTER 5. WHOLE GENOME SEQUENCING WGS] Some workflows take the gender into account. When asked for it, provide the gender of the child (the proband). [Screenshot of the wizard step "Trio Analysis": Child gender "Female"] Figure 5.46: Specify the proband's gender. 7. Specify the Hapmap populations that should be used for filtering out variants found in Hapmap for the father (figure 5.47). This can
87. the same time. A good place to start is to take a look at the mapping report to see whether the coverage is sufficient in the regions of interest (e.g. > 30). Furthermore, please check that at least 90 [Screenshot of the wizard step "Add Information from 1000 Genomes Project": Known variants track "1000GENOMES_phase_1_EUR"] Figure 7.47: Select the relevant population from the 1000 Genomes project. This will add information from the 1000 Genomes project to your variants. [Screenshot of the wizard step "Add Information from HapMap": Known variants track "HAPMAP"] Figure 7.48: Select a population from the HapMap database. This will add information from the HapMap database to your variants.
88. the table, the track view will automatically bring this position into focus. 2. Genome Browser View Filter Somatic Variants: A collection of tracks presented together. Shows the somatic candidate variants together with the human reference sequence, genes, transcripts, coding regions, and variants detected in ClinVar, 1000 Genomes, and the PhastCons conservation scores (see figure 7.21). [Screenshot: Genome Browser View with Ensembl v74 gene, mRNA and CDS annotation tracks, the SRR719299_1 trimmed and locally realigned reads, ClinVar_20131203, Cosmic_v67, 1000GENOMES phase_1 EUR, and PhastCons conservation score tracks] Figure 7.21: The Genome Browser View showing the annotated somatic variants together with a range of other tracks. To see the level of nucleotide conservation (from a multiple alignment with many vertebrates) in the region around each variant, a track with conservation scores is added as well. Mapped sequencing reads as well as other tracks can be easily added to this Genome Browser View. By double-clicking on the ann
89. to identify and annotate variants in one sample. The tool consists of a workflow that is a combination of the Identify Variants and the Annotate Variants workflows. The tool runs an internal workflow, which starts with mapping the sequencing reads to the human reference sequence. It then runs a local realignment to improve the variant detection, which is run afterwards. After the variants have been detected, they are annotated with gene names, amino acid changes, conservation scores, information from clinically relevant variants present in the ClinVar database, and information from common variants present in the common dbSNP, HapMap, and 1000 Genomes databases. Furthermore, a detailed mapping report or a targeted region report (whole exome and targeted amplicon analysis) is created to inspect the overall coverage and mapping specificity. Import your targeted regions: A file with the genomic regions targeted by the amplicon or hybridization kit is available from the vendor of the enrichment kit and sequencing machine. To obtain this file you will have to get in contact with the vendor and ask them to send this target regions file to you. You will get the file in either .bed or .gff format. [Screenshot: Genome Browser View with the Homo_sapiens_ensembl_v73_Genes track]
90. to identify interesting variants, we recommend that variants with a conservation score of more than 0.9 (PhastCons score) are prioritized over variants with lower conservation scores. It is possible to filter variants based on their annotations. This type of filtering can be facilitated using the table filter found at the top part of the table. If you are performing multiple experiments where you would like to use the exact same filter criteria, you can create a filter that can be saved and reused. To do this, go to: Toolbox | Identify Candidate Variants | Create Filter Criteria. This tool can be used to specify the filter, and the Annotate Variants workflow should be extended by the Identify Candidate Tool (configured with the Filter Criterion). The Biomedical Genomics Workbench reference manual has a chapter that describes this in detail (http://www.clcbio.com/support/downloads/#manuals, see chapter "Workflows" for more information on how pre-installed workflows can be extended and/or edited). Note: Sometimes the databases (e.g. dbSNP) are updated with a newer version, or maybe you have your own version of the database. In such cases you may wish to change one of the used databases. This can be done with the Data Management function, which is described in section 4.1.4. 7.1.2 Identify Known Variants in One Sample TAS. The Identify Known Variants in One Sample TAS ready-to-use workflow is a combined data analysis and interpretation read
91. variants specified by the user (e.g. known breast cancer associated variants) for their presence or absence in a sample. Please note that the ready-to-use workflow will not identify new variants. The Identify Known Variants in One Sample WES ready-to-use workflow maps the sequencing reads to a human genome sequence and does a local realignment of the mapped reads to improve the subsequent variant detection. In the next step, only variants specified by the user are identified and annotated in the newly generated read mapping. Import your known variants: To make an import into the Biomedical Genomics Workbench, you should have your variants in GVF format (http://www.sequenceontology.org/resources/gvf.html) or VCF format (http://ga4gh.org/#/fileformats-team). Please use the Tracks import as part of the Import tool in the toolbar to import your file into the Biomedical Genomics Workbench. Import your targeted regions: A file with the genomic regions targeted by the amplicon or hybridization kit will be provided by the vendor. To obtain this file you will have to get in contact with the vendor and ask them to send this target regions file to you. You will get it in either .bed or .gff format. Please use the Tracks import as part of the Import tool in the toolbar to import your file into the Biomedical Genomics Workbench. How to run the Identify Known Variants in One Sample WES workflow: 1. G
92. warning as shown in figure 7.7. This is simply a warning telling you that it may take some time to create the table if you are working with tracks containing large amounts of annotations. Please note that in case none of the variants are present in ClinVar or dbSNP, the corresponding annotation column headers are missing from the result. [Warning dialog: "You are about to display 172,890 annotations in a table view. The workbench might be unresponsive while the new view is created. Press OK to continue or Cancel to use another view."] Figure 7.7: Warning that appears when you work with tracks containing many annotations. Adding information from other sources may help you identify interesting candidate variants for further research. For example, common genetic variants (present in the HapMap database) or variants known to play a role in drug response or other clinically relevant phenotypes (present in the ClinVar database) can easily be identified. Further, variants not found in the ClinVar database can be prioritized based on amino acid changes, in case the variant causes changes on the amino acid level. A high conservation level between different vertebrates or mammals in the region containing the variant can also be used to give a hint about whether a given variant is found in a region with an important functional role. If you would like to use the conservation scores
93. wish to use this set for your workflows. [Screenshot of the "Create Custom Reference Data Set" dialog: Name "Hg38", Organism "homo_sapiens", Chromosomal Extension "Full human genome", Annotation Type "Includes all std annotations", followed by a list of reference data elements with version drop-downs (1000 Genomes Project, Conservation Scores PhastCons, dbSNP Common, Gene Ontology, HapMap, Sequence hg38) and Cancel and Save buttons] Figure 4.10: Select the reference data elements you want to add to your custom reference data set. • A button labeled Close: Click on this to close the wizard. 4.1.5 Troubleshooting reference data downloads. Network connection errors can occur when downloading reference data. If this happens, you can try to resume the download from the Process tile when the network connection has been restored (see figure 4.11). Alternatively, you can simply press Stop to cancel the download process and clean up any temporary data. [Screenshot: Processes tab showing "Download reference data: Downloading dbsnp_v138" with the options Stop, Pause, Show Results, Find Results, Show Log Info, Show Message, Show Errors] Figure 4.11: It is possible to resume the download of data if you have encou
94. workflows are run. This means that you will not be able to execute these workflows on the server (figure 4.7). [Screenshot: Manage Reference Data dialog showing free space on the temporary folder location, size on disk, and the Download, Delete and Apply buttons] Figure 4.7: Check where your reference data is applied by looking at the column Applied in the data set description. For references like the 1000 Genomes Project and HapMap databases, which contain more than one reference data file, the workflow will initially be configured with all the populations being available, and you will be able to specify which reference data to use in the workflow wizard directly. But you can also modify a pre-existing Reference Data Set to contain only the population you want to work with. In the Data Management wizard, select the Reference Data Set you are interested in and click on Create Custom Set. Select the version of the 1000 Genomes or HapMap database you wish to work with (figure 4.8). A pop-up window will open where you can select the population you want to work with. Alternatively, click on the option "custom" in lieu of version and choose the population of your choice from the CLC_References folder (figure 4.9). Three-letter codes are used to specify the population that the different reference data originate from, e.g. ASW: Americans of African Ancestry in SW USA. For the phase 3 Ha
95. [Screenshot: Processes tab listing several "Download reference data" jobs marked Done (100%) and one job downloading HAPMAP_phase_3_ensembl_v] Figure 4.6: Click on the info button to see the legal notice and license information. Once the reference data has been downloaded, the set or element is marked with a check icon. If you have finished downloading the appropriate Reference Data Set, click on the button labeled Apply and the workflows will automatically be configured with all the relevant reference data available. The information in the Applied column in the right panel of the reference data manager describes whether the dataset has been applied to the location specified in the drop-down menu. For example, a Yes in the Applied column when the drop-down menu is set to On Server means that the given data will be used from the server when the affected workflows are run. This will be the case even if you choose to execute the workflow locally, i.e. in the workbench. If the Applied column contains Yes when the drop-down menu is set to Locally, this means the given data will be used from the local reference folder when the affected
96. [Screenshot residue: element selection in the wizard] Figure 8.17: Select the RNA-seq reads from the tumor sample. 4. In the next wizard step (figure 8.18) you can adjust the settings for the Create fold change track tool. This tool calculates, for each transcript or gene, the ratio between the expression values in the normal and the tumor sample. It then becomes possible to filter on fold changes and expression values, which makes it easy to identify differentially expressed transcripts or genes. The parameters that can be adjusted in this wizard step are described in detail in the Biomedical Genomics Workbench user manual. See http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=Create_fold_change_track.html. [Screenshot of the wizard step "Create Fold Change Track": Scale, Fold change cutoff, Ignore features with maximum expression level below 1.0] Figure 8.18: Specify the parameters for variant calling. 5. In the next two windows, specify a target region for the Indels and Structural Variants tool, first for the normal sample and then for the tumor sample (figure 8.19).
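The fold-change filtering described in step 4 above can be illustrated with a short sketch. It is not the Create fold change track tool itself: the direction of the ratio (tumor divided by normal), the symmetric cutoff, the handling of zero expression, and the names `fold_change` and `select_features` are all assumptions made for this example.

```python
import math


def fold_change(normal, tumor):
    """Ratio of tumor to normal expression (direction is an assumption)."""
    if normal == 0:
        # No expression in normal: treat any tumor expression as unbounded.
        return math.inf if tumor > 0 else 1.0
    return tumor / normal


def select_features(expression, cutoff=2.0, min_expression=1.0):
    """Keep features that change by at least `cutoff`-fold in either direction.

    `expression` maps a feature name to (normal, tumor) expression values.
    Features whose maximum expression is below `min_expression` are ignored.
    """
    selected = {}
    for name, (normal, tumor) in expression.items():
        if max(normal, tumor) < min_expression:
            continue
        fc = fold_change(normal, tumor)
        if fc >= cutoff or fc <= 1 / cutoff:
            selected[name] = fc
    return selected


expression = {
    "GENE_A": (2.0, 8.0),   # 4-fold up in tumor
    "GENE_B": (5.0, 5.5),   # nearly unchanged
    "GENE_C": (0.2, 0.9),   # expression too low in both samples
    "GENE_D": (8.0, 2.0),   # 4-fold down in tumor
}
changed = select_features(expression)
```

The gene names and expression values are made up for illustration only.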
97. Figure 4.27: Select the sequencing raw data that you wish to prepare before further analysis. At this step you can also choose whether you wish to prepare several reads in batch mode. There are three ways you can prepare your data: you can run them through the workflow one sample at a time, you can select several samples and prepare them simultaneously, or you can run them in batch mode (recommended if your data are found in separate folders). If you use batch mode, you will get an individual report for every single sample, whereas you will get one combined report for all samples if you do not run in batch mode. To run several samples at once, select multiple samples from the list on the left-hand side and use the small arrow pointing to the right in the middle of the wizard to send them to Selected elements in the right side of the wizard. To run the samples in batch mode, tick Batch at the bottom of the wizard, as shown in figure 4.22, and select the folder that holds the data you wish to analyze. When you have selected the sample(s) you want to prepare, click on the button labeled Next. As part of the data preparation, the sequences are trimmed. In the next wizard (figure 4.28) you can specify different trimming parameters and select the adapter trim list that should be used for adapter tri
98. 027 (whole genome and chromosome specific); CDS ensembl_v80 and ensembl_v74 (whole genome and chromosome specific); mRNA ensembl_v80 and ensembl_v74 (whole genome and chromosome specific); Target Regions qiagen_v2.01_hg38; Target Regions qiagen_v2.01 (whole genome and chromosome specific) and qiagen_v2 (whole genome and chromosome specific); Target Primers qiagen_v2.01_hg38, qiagen_v2.01 (whole genome and chromosome specific), qiagen_v2 (whole genome and chromosome specific)
- For mus musculus: CDS ensembl_v80; Conservation Scores PhastCons mm10; dbSNP ensembl_v80; Gene Ontology 20150630; Genes ensembl_v80; mRNA ensembl_v80; Sequence ensembl_v80
- For rattus norvegicus: CDS ensembl_v79; Conservation Scores PhastCons Rnor_5.0; dbSNP ensembl_v79; Gene Ontology 20150630; Genes ensembl_v79; mRNA ensembl_v79; Sequence ensembl_v79
Data that has not been downloaded yet is represented by a plus icon. Select the set or element you would like to download and click on the Download button. Once the data is downloading, the Download button fades out and you can check the progress of the download in the Processes tab below the toolbox (figure 4.6).
99. [Screenshot: genome browser view with the tracks Homo_sapiens_sequence_hg19, Homo_sapiens_ensembl_v73_Genes, Homo_sapiens_ensembl_v73_mRNA, Homo_sapiens_ensembl_v73_CDS, ERR319087 single reads (locally realigned), annotated variants, Cosmic_v67, ClinVar_20130930, dbsnp_v138 and PhastCons conservation scores for hg19, with the variant table open below.] Figure 7.6: The output from the Annotate Variants ready-to-use workflow is a genome browser view (a track list). The information is also available in table view. Click on the small table icon to open the table view. If you hold down the Ctrl key while clicking on the table icon, you will open a split view showing both the genome browser view and the table view. You may be met with a
100. Contents
7.1.2 Identify Known Variants in One Sample (TAS)
7.2 Somatic Cancer (TAS)
7.2.1 Filter Somatic Variants (TAS)
7.2.2 Identify Somatic Variants from Tumor Normal Pair (TAS)
7.2.3 Identify Variants (TAS)
7.2.4 Identify and Annotate Variants (TAS)
7.3 Hereditary Disease (TAS)
7.3.1 Filter Causal Variants (TAS-HD)
7.3.2 Identify Causal Inherited Variants in Family of Four (TAS)
7.3.3 Identify Causal Inherited Variants in Trio (TAS)
7.3.4 Identify Rare Disease Causing Mutations in Family of Four (TAS)
7.3.5 Identify Rare Disease Causing Mutations in Trio (TAS)
7.3.6 Identify Variants (TAS-HD)
7.3.7 Identify and Annotate Variants (TAS-HD)
Targeted sequencing, also known as targeted resequencing or amplicon sequencing, is a focused approach to genome sequencing with only selected areas of the genome being sequenced. In cancer research and diagnostics, targeted sequencing is usually based on sequencing panels that target a number of known cancer-associated genes. Thirteen ready-to-use workflows are available for analysis of targeted amplicon sequencing data (figure 7.1). The concept of the pre-installed ready to us
101. Figure 6.58: Specify the parameters for the Fixed Ploidy Variant Detection tool. The wizard step lists the configurable parameters Required variant probability, Ignore broken pairs, Minimum coverage, Minimum count and Minimum frequency.
- Required variant probability is the minimum probability value of the variant site required for the variant to be called. Note that it is not the minimum value of the probability of the individual variant. For the Fixed Ploidy Variant detector, if a variant site (and not the variant itself) passes the variant probability threshold, then the variant with the highest probability at that site will be reported, even if the probability of that particular variant might be less than the threshold. For example, if the required variant probability is set to 0.9, then the individual probability of the variant called might be less than 0.9 as long as the probability of the entire variant site is greater than 0.9.
- Ignore broken pairs: When ticked, reads from broken pairs are ignored. Broken pairs may arise for a number of reasons, one being erroneous mapping of the reads. In general, variants based on broken pair reads are likely to be less reliable, so ignoring them may reduce the number of spurious variants called. However, broken pairs may also arise for biological reasons (e.g. due to structural variants), and if they are ignored some true
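The difference between the site probability and the probability of an individual variant can be made concrete with a small Python sketch. It is only an illustration of the rule stated in the excerpt, not the tool's algorithm: the function name is hypothetical, and treating the site probability as the sum of the candidate variant probabilities is an assumption of this sketch.

```python
def call_variant_at_site(variant_probabilities, required_variant_probability):
    """Report the most probable variant if the site as a whole passes.

    variant_probabilities maps each candidate variant at one site to its
    individual probability. Assumption: the site probability is the sum of
    these individual probabilities.
    """
    site_probability = sum(variant_probabilities.values())
    if site_probability < required_variant_probability:
        return None  # The site does not pass, so nothing is called.
    # The site passes: report the variant with the highest probability,
    # even if that variant alone is below the threshold.
    best = max(variant_probabilities, key=variant_probabilities.get)
    return best, variant_probabilities[best], site_probability


# Hypothetical site with two candidate variants.
site = {"A>G": 0.55, "A>T": 0.40}
print(call_variant_at_site(site, required_variant_probability=0.9))
```

With these made-up numbers the site passes a threshold of 0.9 while the reported variant has an individual probability of only 0.55, which is the situation the excerpt warns about.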
102. Figure 7.41: Genome Browser View with an open track table to inspect identified variants more closely in the context of the human genome. Figure 7.42: Please select all sequencing reads from the sample to be analyzed. select the folder that holds the data you wish to analyze. If you have your sequencing data in separate folders, you should choose to run the analysis in batch mode. When you have selected the sample(s) you wish to prepare, click on the button labeled Next. 3. In the next wizard step (figure 7.43) you can select the population from the 1000 Genomes project that you would like to use for annotation. 4. In the next wizard (figure 7.44) you can select the target region track and specify the
103. [Figure residue: a diagram listing the ready-to-use workflows grouped under Analysis of WGS data, Analysis of WES data, Analysis of TAS data and Analysis of WTS data, followed by Data interpretation.] Figure 2.6: A basic example of the flow of steps for a sequencing data analysis. The data is first imported into the Workbench. Then it should be prepared for analysis. Here, a ready-to-use workflow (labeled workflow 1) is used for this. It runs quality control and trimming steps. After inspection of the quality and trimming reports, the trimmed data are used as input for another ready-to-use workflow, called workflow 2 in this figure. This is where the data analysis is carried out. Here, workflow choices associated with variant detection are
104. Figure 7.78: Specify the parameters for the Indels and Structural Variants tool.
- Minimum coverage provides the length of each target region that has at least this coverage.
- Ignore non-specific matches: reads that are non-specifically mapped will be ignored.
- Ignore broken pairs: reads that belong to broken pairs will be ignored.
Figure 7.79: Specify the parameters for the QC for Target Sequencing tool. For more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=QC_Target_Sequencing.html. 5. Specify the parameters for the Fixed Ploidy Variant Detection tool, including a target region file (figure 7.80). The parameters used by the Fixed Ploidy Variant Detection tool can be adjusted. We have optimized the parameters to the individual analyses, but you may want to tweak some of the parameters to fit your particular sequencing data. A good starting point could be to run an analysis with the default settings. The parameters that can be set are:
- Required variant probability is the minimum probability value of the variant site required for the variant to be called. Note that it is not the minimum value of the probability of t
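As a rough illustration of the Minimum coverage parameter of the QC for Target Sequencing tool described in this excerpt, the following Python sketch counts how many positions of one target region reach a given coverage. The function name, the per-position coverage list and the example depths are hypothetical; the default of 30 mirrors the value visible in the wizard screenshot.

```python
def length_with_minimum_coverage(per_base_coverage, minimum_coverage=30):
    """Count the positions of one target region covered at least this deeply."""
    return sum(1 for depth in per_base_coverage if depth >= minimum_coverage)


# Hypothetical read depths for a five-base target region.
region_coverage = [10, 30, 45, 29, 31]
print(length_with_minimum_coverage(region_coverage))
```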
105. 8 The targeted region track should be the same as the track you selected in the previous wizard step. Variants found outside the targeted regions will not be included in the output that is generated with the ready-to-use workflow. Click on the button labeled Next. 7. Click on the button labeled Next to go to the step where you can adjust the settings for removal of germline variants (figure 7.29). [Screenshot: Low Frequency Variant Detection wizard step of the Identify Somatic Variants from Tumor Normal Pair (TAS) workflow, with the configurable parameters Required significance, Ignore positions with coverage above, Restrict calling to target regions, Ignore broken pairs, Ignore non-specific matches, Minimum read length, Minimum coverage, Minimum count, Minimum frequency, Base quality filter, Read direction filter, Read position filter, Relative read direction filter and Remove pyro-error variants.] Figure 7.28
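The effect of removing variants outside the targeted regions, as described in this excerpt, can be sketched in a few lines of Python. This is an illustration only, not the workbench's implementation: the function name, the tuple layouts, the 1-based inclusive coordinates and the example positions are assumptions.

```python
def remove_variants_outside_targets(variants, target_regions):
    """Keep only the variants whose position lies inside a target region.

    variants: iterable of (chromosome, position, reference, allele) tuples.
    target_regions: iterable of (chromosome, start, end) tuples, assumed
    here to be 1-based with both ends inclusive.
    """
    kept = []
    for chromosome, position, reference, allele in variants:
        inside = any(
            region_chromosome == chromosome and start <= position <= end
            for region_chromosome, start, end in target_regions
        )
        if inside:
            kept.append((chromosome, position, reference, allele))
    return kept


# Hypothetical target regions and variants.
targets = [("7", 55249000, 55249200)]
variants = [
    ("7", 55249063, "G", "A"),   # inside the target region
    ("7", 55300000, "C", "T"),   # same chromosome, outside the region
    ("17", 55249063, "G", "A"),  # same position, different chromosome
]
print(remove_variants_outside_targets(variants, targets))
```

A linear scan over the regions is enough for a small panel; for large region sets a sorted index per chromosome would be the more usual choice.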
106. Acid Track: Shows the consequences of the variants at the amino acid level in the context of the original amino acid sequence. A variant introducing a stop mutation is illustrated with a red amino acid.
- Genome Browser View: This is a collection of tracks shown together in a view that makes it easy to compare information from the individual tracks, such as comparing the identified putative causal variants with the read mappings and information from databases.
5.3.3 Identify Causal Inherited Variants in Trio (WGS)
The Identify Causal Inherited Variants in a Trio (WGS) ready-to-use workflow identifies putative disease-causing inherited variants by creating a list of variants present in both affected individuals and subtracting all variants in the unaffected individual. The workflow includes a back-check for all family members. The Identify Causal Inherited Variants in a Trio (WGS) ready-to-use workflow accepts sequencing reads as input.
How to run the Identify Causal Inherited Variants in a Trio (WGS) workflow
This section recapitulates the steps you need to take to start the workflow, each item corresponding to a different wizard window. For more information on the specific tools used in this workflow, see section 3.3. To run the Identify Causal Inherited Variants in a Trio (WGS) workflow, go to: Toolbox | Ready-to-Use Workflows | Whole Genome Sequencing | Hereditary Disease | Identify Causal Inherited Variants in a Trio (WGS)
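The filtering logic stated for the trio workflow (variants present in both affected individuals, minus all variants in the unaffected individual) maps directly onto set operations. The Python sketch below shows only that core idea under simplifying assumptions: variants are compared as exact (chromosome, position, reference, allele) tuples, the function name and example variants are hypothetical, and the back-check and any zygosity handling performed by the real workflow are left out.

```python
def putative_causal_variants(affected_child, affected_parent, unaffected_parent):
    """Variants shared by both affected individuals and absent from the unaffected one."""
    shared = set(affected_child) & set(affected_parent)
    return shared - set(unaffected_parent)


# Hypothetical variant calls: (chromosome, position, reference, allele).
child = {("1", 1000, "A", "G"), ("2", 2000, "C", "T"), ("3", 3000, "G", "A")}
affected_parent = {("1", 1000, "A", "G"), ("2", 2000, "C", "T")}
unaffected_parent = {("2", 2000, "C", "T"), ("3", 3000, "G", "A")}

print(putative_causal_variants(child, affected_parent, unaffected_parent))
```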
107. All Parameters allows you to preview all parameters. At this step you can only view the parameters, and it is not possible to make any changes. Choose to save the results and click on the button labeled Finish.
Output from the Identify Causal Inherited Variants in a Family of Four (WGS) workflow
Five types of output are generated:
- Reads Tracks: One for each family member. The reads mapped to the reference sequence.
- Variants in: One track for each family member. The variants identified in each of the family members. The variant track can be opened in table view to see all information about the variants.
- Putative Causal Variants in Child: The putative disease-causing variants identified in the child. The variant track can be opened in table view to see all information about the variants.
- Gene List with Putative Causal Variants: Gene track with the identified putative causal variants in the child. The gene track can be opened in table view to see the gene names.
- Target Region Coverage Report: One for each family member. The report consists of a number of tables and graphs that in different ways provide information about the mapped reads from each sample.
- Target Region Coverage: One track for each individual. When opened in table format, it is possible to see a range of different information about the targeted regions, such as target region length, read count, and base count.
- An Amino
108. [Screenshot residue: read mapping with aligned reads and a variant tooltip showing Zygosity: Heterozygous; Frequency: 6.49; Count: 29; Coverage: 447; Probability: 1.00; Average quality: 61.38; # start positions: 21; # end positions: 19; BaseQRankSum: 0.0. Hold Shift to display the tooltip without delay.] [Screenshot residue: Track List view of the same read mapping.]
109.
- chr 14 of hg19 (2.3 GB), for use with the Copy Number Variant Detection tutorial
- chr 17 of hg19 (2 GB), for use with the RNA-Seq Analysis of Human Breast Cancer Data tutorial
- chr 21 of hg19 (1 GB), for use with the ChIP Sequencing tutorial
- chr 22 of hg19 (1 GB), for use with the Identification of Somatic Variants in a Matched Tumor Normal Pair tutorial
Each Reference Data Set is made of a compilation of Reference Data Elements. Downloading sets will automatically download the elements the set is made of, but you can also download elements individually under the Reference Data Elements folder. Note that data for hg19 is available for the whole genome as well as for the individual chromosomes 5, 14, 17, 21 and 22.
- For homo sapiens: Sequence hg38; Sequence hg19 (whole genome and chromosome specific); dbSNP 142; dbSNP 138 (whole genome and chromosome specific); dbSNP Common 142; dbSNP Common 138 (whole genome and chromosome specific); Hapmap phase_3_ensembl_v80; Hapmap phase_3 (whole genome and chromosome specific); Genes ensembl_v80, ensembl_v73, ensembl_v74 (whole genome and chromosome specific); Conservation Scores PhastCons hg38; Conservation Scores PhastCons hg19 (whole genome and chromosome specific); ClinVar 20150629 and 20130930 (whole genome and chromosome specific); 20131203 (whole genome and chromosome specific); 1000 Genomes Project phase_3 and phase_1 (whole genome and chromosome specific); Gene Ontology 20150630 and 20131
110. Figure 2.16: The conservation score track has been added to the Genome Browser view by dragging the track from the Navigation Area into the Genome Browser view in the View Area.
Part II. Applications: ready-to-use workflows
Chapter 3. Ready-to-Use Workflows: descriptions and guidelines
Contents
3.1 General Workflow
3.2 Somatic Cancer
3.3 Hereditary Disease
Biomedical Genomics Workbench contains several ready-to-use workflows that support analysis of cancer data, but also analysis of hereditary diseases and other conditions that are best studied using family analysis. The workflows are specific to the type of data used as input: Whole Genome Sequencing (WGS), Whole Exome Sequencing (WES), Targeted Amplicon Sequencing (TAS) and Whole Transcriptome Sequencing (WTS). For each of the first three categories (WGS, WES and TAS), workflows exist that can be used for general identification and annotation of variants irrespective of disease; these workflows are found in a folder called General Analysis. In folders called Somatic Cancer you can find workflows that are specific for cancer research. Finally, you will find a folder under each of the WGS WES and TAS applicat
111. Figure 9.4: The second Create Installer wizard step. The installed workflow will appear in the Workflow folder in the toolbox.
Chapter 10. Using data from other workbenches
Contents
10.1 Open outputs from other workbenches
10.1 Open outputs from other workbenches
Please note that if you also have access to CLC Genomics Workbench, CLC Main Workbench or CLC Sequence Viewer, you may have generated different types of output that you would like to view in the Biomedical Genomics Workbench. All types of output that have been created in CLC Genomics Workbench, CLC Main Workbench or CLC Sequence Viewer can be opened in the Biomedical Genomics Workbench. This means that you can open certain output types that cannot be generated from within the Biomedical Genomics Workbench. In such cases we refer to our other manuals (e.g. the CLC Genomics Workbench manual, which can be found here: http://www.clcbio.com/support/downloads/#manuals) for further information about the output types that are not described in the Biomedical Genomics Workbench manual. Output files from other workbenches can be imported as described in section 4.3.1 using Standard Import.
Part IV. Plugins
Chapter 11. Plugins
The Biomedical Genomics Workbench can be upgraded and customized by installing plugins. This can be done by clicking on the button labeled Plugins in the upper right corner of the Biomedical Genomics Work
112. Fixed_Ploidy_Variant_Detection.html. Specify the parameters for the Fixed Ploidy Variant Detection tool for the father. Specify the parameters for the Fixed Ploidy Variant Detection tool for the mother. Specify the parameters for the Fixed Ploidy Variant Detection tool for the affected child. Specify the parameters for the QC for Target Sequencing tool for the affected child. Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters, and it is not possible to make any changes. Choose to save the results and click on the button labeled Finish.
Output from the Identify Rare Disease Causing Mutations in a Family of Four (TAS) workflow
Twelve different types of output are generated:
- Reads Mapping: One for each family member. The reads mapped to the reference sequence.
- Variant Tracks: One for each family member. The variants identified in each of the family members. The variant track can be opened in table view to see all information about the variants.
- Target Region Coverage: One track for each individual. When opened in table format, it is possible to see a range of different information about the targeted regions, such as target region length, read count, and base count.
- Target Region Coverage Report: One for each family member. The report consists of a number of tables and graphs that in different ways provide infor
113. [Screenshot residue: read mapping view showing rows of aligned read sequences; the figure marks one position with a red arrow.]
114. Gene track with the identified putative compound heterozygous variants in the proband. The gene track can be opened in table view to see the gene names.
- Genome Browser View: This is a collection of tracks shown together in a view that makes it easy to compare information from the individual tracks, such as comparing the identified variants with the read mappings and information from databases.
- De novo Mutations Amino Acid Track
- Recessive Variants Amino Acid Track
5.3.6 Identify Variants (WGS-HD)
You can use the Identify Variants (WGS-HD) ready-to-use workflow to call variants in the mapped and locally realigned reads. The workflow removes false positives and, in case of a targeted experiment, removes variants outside the targeted region. Variant calling is performed with the Fixed Ploidy Variant Detection tool. The Identify Variants (WGS-HD) ready-to-use workflow accepts sequencing reads as input.
How to run the Identify Variants (WGS-HD) workflow
This section recapitulates the steps you need to take to start the workflow, each item corresponding to a different wizard window. For more information on the specific tools used in this workflow, see section 3.3. To run the Identify Variants (WGS-HD) workflow, go to: Toolbox | Ready-to-Use Workflows | Whole Genome Sequencing | Hereditary Disease | Identify Variants (WGS-HD)
1. Double-click on the Identify Variants (WGS-HD)
115. Identify and Annotate Variants (TAS) tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis. Click on the button labeled Next. 2. This will open the wizard shown in figure 7.42, where you can select the sequencing reads from the sample that should be analyzed. If several samples should be analyzed, the tool has to be run in batch mode. This is done by selecting Batch (tick Batch at the bottom of the wizard, as shown in figure 7.42) and [Screenshot residue: Genome Browser view with the tracks Homo_sapiens_ensembl_v73_Genes, Homo_sapiens_ensembl_v73_CDS, tumor_01 paired reads, tumor_01 paired variants, InDel variants and structural variant annotations from the locally realigned reads, with the variant table open below.]
116. If you choose to open the results, the results will not be saved automatically. You can always save the results at a later point. Figure 6.39: Choose to save the results. In this wizard step you get the chance to preview the settings used in the ready-to-use workflow.
Output from the Identify Variants (WES) workflow
The Identify Variants (WES) tool produces six different types of output:
1. Read Mapping: The mapped sequencing reads. The reads are shown in different colors depending on their orientation, whether they are single reads or paired reads, and whether they map unambiguously. For the color codes, please see the description of sequence colors in the CLC Genomics Workbench manual, which can be found here: http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=View_settings_in_Side_Panel.html.
2. Target Regions Coverage: The target regions coverage track shows the coverage of the targeted regions. Detailed information about
117. Management function before starting the workflows (see section 4.1.4). Select the variant track (figure 3.1). The panel in the left side of the wizard shows the kind of input that should be provided. Select by double-clicking on the variant track name, or click once on the file and then click on the arrow pointing to the right in the middle of the wizard. Figure 3.1: Select the variant track from which you would like to filter somatic variants. Specify the sequencing reads for each family member (figure 3.2). The sequencing reads from the different family members are specified one at a time in the appropriate window. The panel in the left side of the wizard shows the kind of input that should be provided. Select by double-clicking on the reads file name, or click once on the file and then on the arrow pointing to the right in the middle of the wizard.
118. Figure 7.58: Specify the parameters for the Fixed Ploidy Variant Detection tool. The parameters that can be set are:
- Required variant probability is the minimum probability value of the variant site required for the variant to be called. Note that it is not the minimum value of the probability of the individual variant. For the Fixed Ploidy Variant detector, if a variant site (and not the variant itself) passes the variant probability threshold, then the variant with the highest probability at that site will be reported, even if the probability of that particular variant might be less than the threshold. For example, if the required variant probability is set to 0.9, then the individual probability of the variant called might be less than 0.9 as long as the probability of the entire variant site is greater than 0.9.
- Ignore broken pairs: When ticked, reads from broken pairs are ignored. Broken pairs may arise for a number of reasons, one being erroneous mapping of the reads. In general, variants based on broken pair reads are likely to be less reliable, so ignoring them may reduce the number of spurious variants called. However, broken pairs may also arise for biological reasons (e.g. due to structural variants), and if they are ignored some true variants may
119. Figure 7.77: Specify the sequencing reads for the appropriate family member. 3. Specify a target region file for the Indels and Structural Variants tool (figure 7.78). The targeted region file is a file that specifies which regions have been sequenced when working with whole exome sequencing or targeted amplicon sequencing data. This file is something that you must provide yourself, as it depends on the technology used for sequencing. You can obtain the targeted regions file from the vendor of your targeted sequencing reagents. 4. Specify the parameters for the QC for Target Sequencing tool, including a target region file (figure 7.79). When working with targeted data (WES or TAS data), quality checks for the targeted sequencing are included in the workflows. Again, you can choose to use the default settings, or you can choose to adjust the parameters. The parameters that can be set are: [Screenshot: InDels and Structural Variants wizard step with the configurable parameter Restrict calling to target regions.] Figure 7
120. Ontology file in slim format (only high-level GO terms annotated) for the GO categories Molecular Function, Biological Process and Cellular Component, annotated on human genes. The file was made using the QuickGO tool from the EBI: http://www.ebi.ac.uk/QuickGO/GMultiTerm
- Target primers and target regions (QIAGEN_v2): https://www.qiagen.com/dk/shop/sample-technologies/dna-sample-technologies/genomic-dna/generead-dnaseq-gene-panels-v2. These primers and regions are defined and provided by QIAGEN (GeneRead DNAseq Targeted Panels V2).
Human (hg38)
- Human reference sequence (ENSEMBL): ftp://ftp.ensembl.org/pub/release-80/fasta/homo_sapiens/dna/. The file Homo_sapiens.GRCh38.dna.toplevel.fa.gz has chromosomal sequences along with several scaffolds. The scaffolds were removed in the workbench.
- Human genes, coding sequences and transcripts (ENSEMBL): ftp://ftp.ensembl.org/pub/release-80/gtf/homo_sapiens/ (filename: Homo_sapiens.GRCh38.80.gtf.gz)
- HapMap variants (ENSEMBL): ftp://ftp.ensembl.org/pub/release-80/variation/gvf/homo_sapiens/. The goal of the International HapMap Project is to develop a haplotype map of the human genome, the HapMap, which will describe the common patterns of human DNA sequence variation (for more information about HapMap, see http://hapmap.ncbi.nlm.nih.gov). Please note that there are 12 different files (tracks) to be downloaded, one file for each population. It is recomme
121. Parameters allows you to preview all parameters. At this step you can only view the parameters; it is not possible to make any changes (see figure 8.30). Choose to save the results and click on the button labeled Finish. Seven different output types are generated: [Wizard screenshot: Identify Variants and Add Expression Values, Add Information from 1000 Genomes Project; the known variants tracks listed include 1000GENOMES_phase_1_AFR and 1000GENOMES_phase_1_AMR.] Figure 8.28: Select the relevant population from the drop-down list. [Wizard screenshot: Identify Variants and Add Expression Values, Result handling.] Figure 8.29: Check the selected parameters by pressing Preview All Parameters. 1. Gene expression: A track showing gene expression annotations. Hold the mouse over the track or right-click on it. A tooltip will appear with information about e.g. gene name and expres
122. 7.3.6 Identify Variants (TAS-HD). You can use the Identify Variants (TAS-HD) ready-to-use workflow to call variants in the mapped and locally realigned reads. The workflow removes false positives and, in case of a targeted experiment, removes variants outside the targeted region. Variant calling is performed with the Fixed Ploidy Variant Detection tool. The Identify Variants (TAS-HD) ready-to-use workflow accepts sequencing reads as input. How to run the Identify Variants (TAS-HD) workflow: This section recapitulates the steps you need to take to start the workflow, each item corresponding to a different wizard window. For more information on the specific tools used in this workflow, see section 3.3. To run the Identify Variants (TAS-HD) workflow, go to: Toolbox | Ready-to-Use Workflows | Targeted Amplicon Sequencing | Hereditary Disease | Identify Variants (TAS-HD). 1. Double-click on the Identify Variants (TAS-HD) tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis. 2. Select the sequencing reads you want to analyze (figure 7.77). The panel on the left side of the wizard shows the kind of input that should be provided. Select by double-clicking on the reads file name, or click once on the file and then on the arrow pointing to the right in the middle of the wizard. [Wizard screenshot: Select sequencing reads.]
123. [Wizard screenshot: variant detection step for RNA. Configurable parameters shown: Required significance, Ignore positions with coverage above, Restrict calling to target regions, Ignore broken pairs, Ignore non-specific matches, Minimum read length, Minimum coverage, Minimum count, Minimum frequency, Base quality filter, Read direction filter, Relative read direction filter, Read position filter, Remove pyro-error variants.] Figure 8.11: Specify the parameters for transcriptomic variant detection. 22 has been selected. The reason for this is that many databases do not report a succession of SNVs as one MNV, as is the case for the Biomedical Genomics Workbench, and as a consequence it is not possible to directly compare variants called with Biomedical Genomics Workbench with these databases. In order to support filtering against these databases anyway, the option to Automatically join adjacent MNVs and SNVs is enabled. This means that an MNV in the experimental data will get an exact match if a set of SNVs and MNVs in the database can be combined to provide the same allele. Note: This assumes that SNVs and MNVs in the track of known va
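The matching rule described in the excerpt above (an MNV matches if adjacent database SNVs and MNVs combine to the same allele) can be illustrated with a minimal Python sketch. This is not the Workbench's implementation: the tuple layout `(position, ref, alt)`, the function names, and the simplification that each position carries at most one database allele are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (1-based position, reference allele, variant allele).
Variant = Tuple[int, str, str]


def join_adjacent(variants: List[Variant]) -> List[Variant]:
    """Merge variants that sit directly next to each other into one MNV."""
    joined: List[Variant] = []
    for pos, ref, alt in sorted(variants):
        if joined:
            last_pos, last_ref, last_alt = joined[-1]
            # Adjacent: this variant starts exactly where the previous one ends.
            if last_pos + len(last_ref) == pos:
                joined[-1] = (last_pos, last_ref + ref, last_alt + alt)
                continue
        joined.append((pos, ref, alt))
    return joined


def has_exact_match(mnv: Variant, database: List[Variant]) -> bool:
    """True if database variants inside the MNV's span combine to the same allele."""
    pos, ref, _ = mnv
    end = pos + len(ref)
    inside = [v for v in database if pos <= v[0] and v[0] + len(v[1]) <= end]
    return mnv in join_adjacent(inside)


if __name__ == "__main__":
    db = [(100, "A", "G"), (101, "C", "T"), (105, "G", "A")]
    print(has_exact_match((100, "AC", "GT"), db))
```

Restricting the database to variants inside the MNV's span before joining keeps a neighbouring database SNV from being glued onto the candidate and spoiling the comparison.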
124. The Identify Variants (WES) tool takes sequencing reads as input and returns identified variants as part of a Genome Browser View. The tool runs an internal workflow, which starts with mapping the sequencing reads to the human reference sequence. It then runs a local realignment to improve the variant detection, which is run afterwards. At the end, variants with an average base quality smaller than 20 are filtered away. In addition, a targeted region report is created to inspect the overall coverage and mapping specificity in the targeted regions. Import your targeted regions: A file with the genomic regions targeted by the amplicon or hybridization kit will be provided by the vendor. To obtain this file, you will have to get in contact with the vendor and ask them to send this target regions file to you. You will get it in either bed or gff format. Please use the Tracks import as part of the Import tool in the toolbar to import your file into the Biomedical Genomics Workbench. [Genome Browser screenshot: reference sequence track with Homo_sapiens_ensembl_v74_Genes (gene annotations), Homo_sapiens_ensembl_v74_mRNA (mRNA annotations) and Homo_sapiens_ensembl_v74_CDS tracks.]
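The final filtering step mentioned above (dropping variants whose average base quality is smaller than 20) can be sketched as follows. Only the threshold of 20 comes from the text; the input shape (a list of Phred-scaled qualities per variant) and the function names are assumptions for illustration, not the tool's actual interface.

```python
from statistics import mean
from typing import Dict, List

MIN_AVERAGE_QUALITY = 20  # threshold stated in the excerpt above


def passes_quality(base_qualities: List[int]) -> bool:
    """Keep a variant only if the mean quality of its supporting bases is at least 20."""
    return mean(base_qualities) >= MIN_AVERAGE_QUALITY


def filter_variants(variants: Dict[str, List[int]]) -> List[str]:
    """variants maps a hypothetical variant label to its supporting base qualities."""
    return [name for name, quals in variants.items() if passes_quality(quals)]


if __name__ == "__main__":
    example = {"variant_a": [30, 12, 25], "variant_b": [10, 25]}
    print(filter_variants(example))
```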
125. Sapiens was approximately (size as estimated in August 2015):

    Database               Size
    1000 Genomes           8 GB
    CDS                    56 MB
    ClinVar                140 MB
    PhastCons              6 GB
    Cytogenic Ideogram     80 KB
    dbSNP                  71 GB
    dbSNP Common           3 GB
    Genes                  6 MB
    Gene Ontology          45 MB
    HapMap                 3 GB
    mRNA                   75 MB
    Sequence               100 MB
    Target Regions         1 MB
    Target Primers         (value missing) MB

    4.1.3 Where reference data is downloaded from. Reference data must be downloaded and configured manually before you can start using the ready-to-use workflows in the Biomedical Genomics Workbench. You only have to do this once. When all necessary reference data have been downloaded and configured, you will be automatically notified whenever updated reference data are available. Data is provided by QIAGEN, and the Workbench is configured to download from QIAGEN by default. The download location can be seen in Edit | Preferences | Advanced, as shown in figure 4.1. Do not change this setting unless your system administrator has decided to mirror this data locally and wishes you to use that mirror.

    4.1.4 Download and configure reference data. The first time you open Biomedical Genomics Workbench, you will be presented with the dialog box shown in figure 4.2, which informs you that data are available for download either to the local or server CLC_References repository. If you check the Never show this dialog again then subsequent
126. Shows the human reference sequence, annotation tracks for genes, coding regions, transcripts, and expression comparison with GO information, and a conservation score track (see figure 8.35). [Genome Browser screenshot: Homo_sapiens_sequence_hg19, gene, CDS and expression comparison annotation tracks, and PhastCons_conservation_scores_hg19.] Figure 8.35: The genome browser view allows comparison of the expression comparison tracks with the reference sequence and different annotation tracks. Part III: Customized data analysis. Chapter 9: How to edit application workflows. Contents: 9.1 Introduction to customized data analysis; 9.2 How to edit preinstalled workflows. 9.1 Introduction to customized data analysis. Biomedical Genomics Workbench offers a range of different tools that can be used for customized data analysis. The vast majority of the tools are workflow enabled, which means that the tools can be connected and used in customized workflows. The Biomedical Genomics Workbench reference manual has a chapter that describes this in detail: http://www.clcbio.com/support/downloads/#manuals (chapter: Workflows). 9.2 How to edit preinstalled workflows. An important feature o
127. [Genome Browser screenshot: Homo_sapiens_sequence_hg19, Homo_sapiens_ensembl_v73_Genes (gene annotations), Homo_sapiens_ensembl_v73_CDS, and coverage, locally realigned reads and variant tracks for sample SRR719300_1.] Figure 7.40: The Genome Browser View allows you to inspect the identified variants in the context of the human genome. Import your targeted regions: A file with the genomic regions targeted by the amplicon or hybridization kit is available from the vendor of the enrichment kit and sequencing machine. To obtain this file, you will have to get in contact with the vendor and ask them to send this target regions file to you. You will get the file in either bed or gff format. To import the file, go to the toolbar: Import | Tracks. How to run the Identify and Annotate Variants (TAS) workflow: To run the Identify and Annotate Variants (TAS) workflow, go to: Toolbox | Ready-to-Use Workflows | Targeted Amplicon Sequencing | Somatic Cancer | Identify and Annotate Variants (TAS). 1. Double-click on the
128. [Screenshot: locally realigned reads for Sample_1 shown at nucleotide level above a variant table view (104,645 rows; columns include Length, Zygosity, Count and Coverage), with a Create Track from Selection button.] Figure 2.14: Double-click on the track name in the left side of the view area to open the table view shown in split view. When opening a track directly from the genome browser view, the table and track are linked. Hence, when selecting a row
129. Import your targeted regions: A file with the genomic regions targeted by the amplicon or hybridization kit will be provided by the vendor. To obtain this file, you will have to get in contact with the vendor and ask them to send this target regions file to you. You will get it in either bed or gff format. Please use the Tracks import as part of the Import tool in the toolbar to import your file into the Biomedical Genomics Workbench. How to run the Identify Variants (TAS) workflow: To run the Identify Variants (TAS) workflow, go to: Toolbox | Ready-to-Use Workflows | Targeted Amplicon Sequencing | Hereditary Disease | Identify Variants (TAS). 1. Select the sequencing reads from the sample that should be analyzed (figure 7.34). [Wizard screenshot: Identify Variants (TAS), Select sequencing reads, with the Navigation Area showing example data folders.] Figure 7.34: Please select all sequencing reads from the sample to be analyzed. Select all sequencing reads from your sample. If several sample
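Before importing a vendor-supplied target regions file, it can be useful to check what it contains. The sketch below reads BED-style lines in plain Python, outside the Workbench. BED coordinates are 0-based and half-open, which the membership test relies on; the function names and the example coordinates are made up for illustration, and GFF files would need a different parser.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open as in BED


def parse_bed(lines: Iterable[str]) -> List[Region]:
    """Read the first three BED columns, skipping blank, comment and header lines."""
    regions: List[Region] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def in_target(regions: List[Region], chrom: str, pos: int) -> bool:
    """True if the 0-based position falls inside any target region."""
    return any(c == chrom and start <= pos < end for c, start, end in regions)


if __name__ == "__main__":
    # Made-up example content, not a real vendor file.
    example = ["track name=targets", "chr7\t1000\t1200\tamplicon_1", "chr1 100 200"]
    targets = parse_bed(example)
    print(targets, in_target(targets, "chr1", 150))
```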
130. [Screenshot: locally realigned reads for Sample_1 shown at nucleotide level above a variant table view (104,645 rows; columns include Length, Zygosity, Count, Coverage, Frequency and Probability).] Figure 2.15: When you click on an entry in the table, this position will automatically be brought into focus. Here a row with information about an MNV, which is a variant consisting of two or more SNVs, was clicked on. This brought the location of that MNV into focus in the graphical view. To jump directly to a detailed view of a position, zoom the graphical view to the desired leve
131. [Genome Browser screenshot for sample ERR319087: annotated variants track with Cosmic_v67, ClinVar_20130930, 1000GENOMES_phase_1_EUR, dbsnp_common_v138 and PhastCons_conservation_scores_hg19 tracks, and a table view with 17 rows.] Figure 7.51: Genome Browser View with an open track table to inspect identified somatic variants more closely in the context of the human genome and external databases. 7.3 Hereditary Disease (TAS). 7.3.1 Filter Causal Variants (TAS-HD). If you are analyzing a list of variants, you can use the Filter Causal Variants (TAS-HD) ready-to-use workflow to remove variants that are outside the target region, as well as common variants present in publicly available databases. The workflow will annotate the remaining variants with gene names, conservation scores, and information from clinically relevant databases. The Filter Causal Variants (TAS-HD) ready-to-use workflow accepts variant track files as input files. How to run the Filter Causal Variants (TAS-HD) workflow: To run the Filter Causal Var
132. [Genome Browser screenshot for sample ERR319087: annotated variants track with Cosmic_v67, ClinVar_20130930, 1000GENOMES_phase_1_EUR, dbsnp_common_v138 and PhastCons_conservation_scores_hg19 tracks, and a table view with 17 rows.] Figure 6.51: Genome Browser View with an open track table to inspect identified somatic variants more closely in the context of the human genome and external databases. The added information will help you to identify candidate variants for further research. For example, common genetic variants (present in the HapMap database) or variants known to play a role in drug response or other clinically relevant phenotypes (present in the ClinVar database) can easily be seen. Variants not found in ClinVar can, for example, be prioritized based on amino acid changes (do they cause any changes on the amino ac
133. The targeted region file is a file that specifies which regions have been sequenced. You must provide this file yourself, as it depends on the technology used for sequencing. You can obtain the targeted regions file from the vendor of your targeted sequencing reagents. [Wizard screenshot: Identify Candidate Variants and Genes from Tumor Normal Pair, InDels and Structural Variants (Normal), Configurable Parameters: Restrict calling to target regions (50293689_Regions_BED).] Figure 8.19: Specify the target region for the Indels and Structural Variants tool. 6. Set the parameters for the Low Frequency Variant Detection step (see figure 8.20). For a description of the different parameters that can be adjusted in the variant detection step, see http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=Low_Frequency_Variant_Detection.html. If you click on Locked Settings, you will be able to see all parameters used for variant detection in the ready-to-use workflow. The next wizard step (figure 8.21) concerns removal of germline variants. You are asked to supply the number of reads in the control data set that should support the variant allele in order to include it as a match. All the variants where at least thi
134. Variants from One Sample (WGS). 2. This will open the wizard step shown in figure 5.8, where you can select the reads of the sample that should be tested for presence or absence of your known variants. [Wizard screenshot: Identify Known Variants in One Sample (WGS), Select sequencing reads.] Figure 5.8: Select the sequencing reads from the sample you would like to test for your known variants. If several samples from different folders should be analyzed, the tool has to be run in batch mode. This is done by selecting Batch and specifying the folders that hold the data you wish to analyze. Click on the button labeled Next. In the next wizard step, select the file containing the known variants you want to identify in the read mapping (figure 5.9). [Wizard screenshot: Identify Known Variants in One Sample (WGS), Identify Known Mutations from Sample Mappings, Configurable Parameters.]
135. View file (see 7.40). The Genome Browser View includes the track of identified variants in context with the human reference sequence, genes, transcripts, coding regions, targeted regions, and mapped sequencing reads. By double-clicking on the variant track in the Genome Browser View, a table will be shown which includes information about all identified variants (see 7.41). If you would like to change the reference sequence used for mapping, as well as the human genes, please use the Data Management function. 7.2.4 Identify and Annotate Variants (TAS). The Identify and Annotate Variants (TAS) tool should be used to identify and annotate variants in one sample. The tool consists of a workflow that is a combination of the Identify Variants and the Annotate Variants workflows. The tool runs an internal workflow, which starts with mapping the sequencing reads to the human reference sequence. It then runs a local realignment to improve the variant detection, which is run afterwards. After the variants have been detected, they are annotated with gene names, amino acid changes, conservation scores, information from clinically relevant variants present in the ClinVar database, and information from common variants present in the common dbSNP, HapMap, and 1000 Genomes databases. Furthermore, a detailed mapping report or a targeted region report (whole exome and targeted amplicon analysis) is created to inspect the overall coverage and mapping specificity.
136. [Genome Browser screenshot: variant track (1,643 variants) and PhastCons_conservation_scores_hg19 track.] Figure 5.25: The Genome Browser View presents all the different data tracks together and makes it easy to compare different tracks. Toolbox | Ready-to-Use Workflows | Whole Genome Sequencing | Somatic Cancer | Identify Variants (WGS). 1. Select the sequencing reads from the sample that should be analyzed (figure 5.26). Select all sequencing reads from your sample. If several samples should be analyzed, the tool has to be run in batch mode. To do this, tick Batch at the bottom of the wizard and select the folder that holds the data you wish to analyze. If you have your sequencing data in separate folders, you should choose to run the analysis in batch mode. When you have selected the sample(s) that you want to prepare, click on the button labeled Next. 2. In the next wizard step (figure 5.27) you can specify the parameters for variant detection. 3. Click on the button labeled Next. This will take you to the next wizard step (figure 5.28). [Wizard screenshot: Identify Variants (WGS), Select sequencing reads, with the Navigation Area showing example data folders.]
137. Figure 5.26: Please select all sequencing reads from the sample to be analyzed. [Wizard screenshot: Identify Variants (WGS), Low Frequency Variant Detection, Configurable Parameters. Values shown: Required significance 1.0; Ignore positions with coverage above 1000; Minimum read length 20; Minimum coverage 10; Minimum count 2; Minimum frequency 5.0; Read direction filter, Direction frequency 5.0; Remove pyro-error variants in homopolymer regions with minimum length 3 and with frequency below 0.8. Further options shown: Restrict calling to target regions, Ignore broken pairs, Ignore non-specific matches, Base quality filter, Read position filter, Relative read direction filter.] Figure 5.27: The next thing to do is to specify the parameters that should be used to detect variants. In this wizard you can check the selected settings by clicking on the button labeled Preview All Parameters. In the Preview All Parameters wizard you can only check the settings; if you wish to make changes, you have to use the Previous button fr
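How the three coverage-related thresholds in the screenshot above interact can be shown with a small Python sketch. The values 10, 2 and 5.0 are the ones visible in the wizard excerpt; treating Minimum frequency as a percentage of coverage, and the function name itself, are assumptions for illustration. The sketch covers only these three thresholds and ignores the significance test and the other filters of the Low Frequency Variant Detection tool.

```python
def passes_thresholds(
    count: int,
    coverage: int,
    min_coverage: int = 10,      # "Minimum coverage" in the screenshot
    min_count: int = 2,          # "Minimum count" in the screenshot
    min_frequency: float = 5.0,  # "Minimum frequency", assumed to be a percentage
) -> bool:
    """Return True if a candidate variant clears all three thresholds."""
    if coverage < min_coverage or count < min_count:
        return False
    frequency = 100.0 * count / coverage
    return frequency >= min_frequency


if __name__ == "__main__":
    # 2 supporting reads out of 40 gives 5%, which just clears the thresholds.
    print(passes_thresholds(count=2, coverage=40))
    # 2 supporting reads out of 100 gives 2%, below the minimum frequency.
    print(passes_thresholds(count=2, coverage=100))
```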
138. a A good starting point could be to run an analysis with the default settings. [Wizard screenshot: Fixed Ploidy Variant Detection, Configurable Parameters: Ignore broken pairs, Minimum coverage, Minimum count, Minimum frequency, Required variant probability.] Figure 6.85: Specify the parameters for the Fixed Ploidy Variant Detection tool. The parameters that can be set are: • Required variant probability is the minimum probability value of the variant site required for the variant to be called. Note that it is not the minimum value of the probability of the individual variant. For the Fixed Ploidy Variant detector, if a variant site (and not the variant itself) passes the variant probability threshold, then the variant with the highest probability at that site will be reported, even if the probability of that particular variant is less than the threshold. For example, if the required variant probability is set to 0.9, then the individual probability of the variant called might be less than 0.9, as long as the probability of the entire variant site is greater than 0.9. • Ignore broken pairs: When ticked, reads from broken pairs are ignored. Broken pairs may arise for a number of reasons, one being erroneous mapping of the reads. In general, variants based on broken pair reads are likely to be less reliable, so ignoring them may reduce the number of
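The distinction between the probability of a variant site and the probability of an individual variant can be made concrete with a toy Python sketch. It assumes, purely for illustration, that the site probability is the sum of the probabilities of mutually exclusive non-reference candidates; the manual excerpt does not say how the Workbench computes it, and the function name and input layout are hypothetical.

```python
from typing import Dict, Optional


def call_site(variant_probs: Dict[str, float],
              required_probability: float = 0.9) -> Optional[str]:
    """Report the most probable variant if the site as a whole passes the threshold.

    variant_probs maps each candidate allele to its probability; the candidates
    are assumed to be mutually exclusive, so the site probability is their sum.
    """
    site_probability = sum(variant_probs.values())
    if site_probability < required_probability:
        return None  # the site itself does not pass, so nothing is called
    # The best individual variant is reported even if it is below the threshold.
    return max(variant_probs, key=variant_probs.get)


if __name__ == "__main__":
    # Site probability is about 0.95, above 0.9, although the best allele has only 0.55.
    print(call_site({"A": 0.55, "T": 0.40}))
    # Site probability is about 0.80, below 0.9, so no variant is called.
    print(call_site({"A": 0.50, "T": 0.30}))
```

The first call shows the case the text warns about: a variant is reported although its own probability is well under the required value.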
139. [Genome Browser screenshot with table view for sample ERR319087 (26 rows; columns include Region, Type, Reference, Length, Zygosity and Exact match).] Figure 6.6: The output from the Annotate Variants ready-to-use workflow is a genome browser view (a track list). The information is also available in table view. Click on the small table icon to open the table view. If you hold down the Ctrl key while clicking on the table icon, you will open a split view showing both the genome browser view and the table view. [Dialog: Warning. You are about to display 172,890 annotations in a table view. The workbench might be unresponsive while the new view is created. Press OK to continue or Cancel to use another view.] Figure 6.7: Warning that appears when you work with tracks containing many annotations. 6.1.2 Identify Known Variants in One Sample (WES). The Identify Known Variants in One Sample (WES) ready-to-use workflow is a combined data analysis and interpretation ready-to-use workflow. It should be used to identify known
140. a Management function, which is described in section 4.1.4. 5.1.2 Identify Known Variants in One Sample (WGS). The Identify Known Variants in One Sample (WGS) ready-to-use workflow is a combined data analysis and interpretation ready-to-use workflow. It should be used to test known variants specified by the user (e.g. known breast cancer associated variants) for their presence or absence in a sample. Please note that the ready-to-use workflow will not identify new variants. The Identify Known Variants in One Sample (WGS) ready-to-use workflow maps the sequencing reads to a human genome sequence and does a local realignment of the mapped reads to improve the subsequent variant detection. In the next step, only variants specified by the user are identified and annotated in the newly generated read mapping. Import your known variants: To import your variants into the Biomedical Genomics Workbench, you should have them in GVF format (http://www.sequenceontology.org/resources/gvf.html) or VCF format (http://ga4gh.org/#/fileformats-team). Please use the Tracks import as part of the Import tool in the toolbar to import your file into the Biomedical Genomics Workbench. How to run the Identify Known Variants in One Sample (WGS) workflow: 1. Go to the toolbox and double-click on: Toolbox | Ready-to-Use Workflows | Whole Genome Sequencing | General Workflows (WGS) | Identify Known
141. 4.4 Prepare sequencing data

    5 Whole genome sequencing (WGS)
      5.1 General Workflows (WGS)
      5.2 Somatic Cancer (WGS)
      5.3 Hereditary Disease (WGS)
    6 Whole exome sequencing (WES)
      6.1 General Workflows (WES)
      6.2 Somatic Cancer (WES)
      6.3 Hereditary Disease (WES)
    7 Targeted amplicon sequencing (TAS)
      7.1 General Workflows (TAS)
      7.2 Somatic Cancer (TAS)
      7.3 Hereditary Disease (TAS)
    8 Whole Transcriptome Sequencing (WTS)
      8.1 Analysis of multiple samples
      8.2 Annotate Variants (WTS)
      8.3 Compare variants in DNA and RNA
      8.4 Identify Candidate Variants and Genes from Tumor Normal Pair
      8.5 Identify variants and add expression values
      8.6 Identify and Annotate Differentially Expressed Genes and Pathways
    III Customized data analysis
    9 How to edit application workflows
      9.1 Introduction to customized data analysis
      9.2 How to edit preinstalled workflows
    10 Using data from other workbenches
      10.1 Open outputs from other workbenches
    IV Plug
142. able is to start at the top and click on one of the Whole genome, Whole exome, or Targeted tabs found under the big DNA label if you are working on DNA-seq data. This selects the relevant application area. Once this is done, clicking on a link within the DNA section of the table will direct you to the section in the application-based manual about that topic (for example, Annotate Variants) that applies to that particular application area (for example, Whole genome analysis). Likewise, if you work on RNA-seq data, you can find relevant manual entries with the links provided under the big RNA label. To the right side of the table is a box with two sections: Getting started and Explore and learn. The Getting started area contains links to the Tutorials (http://qiagenbioinformatics.com/products/biomedical-genomics-workbench), the Full length application based manual (PDF), and the Full length reference manual (PDF) (http://www.clcbio.com/support/downloads/#manuals). The Explore and learn section provides links to different sections of the application-based manual, as well as a link to a web page where you can download example data. Finally, Download example data provides links to two different example data sets. This is described in section 2.1.2. 2.1.2 Import of example data. It might be easier to understand the logic of the program by trying to do simple operations on existing data. Therefore Biomedical Gen
143. ack in the left hand side of the Genome Browser View, a table that includes all variants and the added information (annotations) will open (see figure 6.6). The table and the Genome Browser View are linked: if you click on an entry in the table, this particular position in the genome will automatically be brought into focus in the Genome Browser View. You may be met with a warning as shown in figure 6.7. This is simply a warning telling you that it may take some time to create the table if you are working with tracks containing large amounts of annotations. Please note that if none of the variants are present in ClinVar or dbSNP, the corresponding annotation column headers are missing from the result. Adding information from other sources may help you identify interesting candidate variants for further research. For example, common genetic variants (present in the HapMap database) or variants known to play a role in drug response or other clinically relevant phenotypes (present in the ClinVar database) can easily be identified. Further, variants not found in the ClinVar database can be prioritized based on amino acid changes, in case the variant causes changes on the amino acid level. A high conservation level between different vertebrates or mammals in the region containing the variant can also be used to give a hint about whether a given variant is found in a region with an important functional role. If you would like to use the conservation scores to iden
144. activate the selection tool. This can be used to select user-defined regions. A quick and easy way to zoom in on a particular region is to first use the selection tool, which is activated by clicking on the arrow shown in figure 2.9. You can then select a specific region by clicking on the relevant point in the track and, keeping the mouse button depressed, dragging across the area that you wish to zoom in on. This selects the region. Once selected, you can use the Zoom to selection tool (shown in figure 2.10) to zoom in on the selected region. It is also possible to zoom in using just the mouse: hold down the Alt key while scrolling with the mouse wheel. This zooms in or out on the region that is in focus in the View Area. Figure 2.10: The Zoom to selection tool can be used to zoom in on a selected region. Next to the Zoom to selection icon you can find the Zoom to fit icon, which can be used to zoom all the way out. The Zoom slider on the left side of the Zoom to selection icon can also be used to zoom in and out. When clicking on the Zoom to selection icon, you will zoom in on the region that you have selected, and you will be able to see more and more details as you zoom in. This is shown in figure 2.11 and figure 2.12. In figure 2.12 the presence of SNVs can be seen in the variant track, and an overview of the mapping at that region in the mapping can be fo
145. [Figure residue: overview diagram of the analysis pipeline. Import the sequencing reads, run workflow 1 (Preparing Raw Data: Prepare Overlapping Raw Data or Prepare Raw Data), and inspect the output (QC and trim reports, prepared data). If the QC and trim reports are OK, use the prepared data as input and run workflow 2, chosen among the Whole Genome Sequencing, Whole Exome Sequencing, Targeted Amplicon Sequencing and Whole Transcriptome Sequencing ready-to-use workflows.]
146. ailable from the drop down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4). 9. Specify the HapMap population that should be used to add information on variants found in the HapMap project. This can be done using the drop down list found in this wizard step. Please note that the populations available from the drop down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4). 10. Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters; it is not possible to make any changes. Choose to save the results and click on the button labeled Finish. Output from the Identify and Annotate Variants WES HD workflow: Six types of output are generated: • A Reads Track • A Coverage Report (Read Mapping) • A Per region Statistics Track • A Filtered Variant Track (Annotated variants) • An Amino Acid Track: Shows the consequences of the variants at the amino acid level in the context of the original amino acid sequence. A variant introducing a stop mutation is illustrated with a red amino acid. • A Genome Browser View. Chapter 7 Targeted amplicon sequencing TAS. Contents: 7.1 General Workflows TAS; 7.1.1 Annotate Variants TAS
147. ailable in table view. Click on the small table icon to open the table view. If you hold down the Ctrl key while clicking on the table icon, you will open a split view showing both the genome browser view and the table view. You may be met with a warning as shown in figure 5.7. This is simply a warning telling you that it may take some time to create the table if you are working with tracks containing large amounts of annotations. Please note that if none of the variants are present in ClinVar or dbSNP, the corresponding annotation column headers are missing from the result. [Dialog text: "Warning: You are about to display 172,890 annotations in a table view. The workbench might be unresponsive while the new view is created. Press OK to continue or Cancel to use another view."] Figure 5.7: Warning that appears when you work with tracks containing many annotations. Adding information from other sources may help you identify interesting candidate variants for further research. For example, common genetic variants (present in the HapMap database) or variants known to play a role in drug response or other clinically relevant phenotypes (present in the ClinVar database) can easily be identified. Further, variants not found in the ClinVar database can be prioritized based on amino acid changes, in case the variant causes changes at the amino acid level. CHAPTER 5 WHOLE GENOME SEQUENCING WGS 65 A high conservation level between
148. airs, reads that belong to broken pairs will be ignored. Specify the parameters for the QC for Target Sequencing tool for the unaffected parent. Specify the parameters for the QC for Target Sequencing tool for the affected parent. Specify the parameters for the Fixed Ploidy Variant Detection tool for the unaffected parent (figure 6.64). The parameters used by the Fixed Ploidy Variant Detection tool can be adjusted. We have optimized the parameters to the individual analyses, but you may want to tweak some of the parameters to fit your particular sequencing data. A good starting point could be to run an analysis with the default settings. [Dialog residue: Fixed Ploidy Variant Detection, Configurable Parameters: Required variant probability 50.0, Ignore broken pairs (ticked), Minimum coverage 10, Minimum count, Minimum frequency; Locked Settings.] Figure 6.64: Specify the parameters for the Fixed Ploidy Variant Detection tool. The parameters that can be set are: • Required variant probability is the minimum probability value of the variant site required for the variant to be called. Note that it is not the minimum value of the probability of the individual variant. For the Fixed Ploidy Variant detector, if a variant site (and not the variant itself) passes the variant probability threshold, then the variant with the highest probability at that site will be
149. al pairs. The workflows found in the Somatic Cancer category use the Low Frequency Variant Detection tool for variant calling. The advantages of using this variant caller when analyzing cancer data are that 1) it does not take ploidy into consideration and 2) it is particularly good at picking up low frequency variants, in contrast to the other variant callers. The workflows that are available in this category are: • Filter Somatic Variants: Removes variants outside the target region (only targeted experiments) and common variants present in publicly available databases. Annotates with gene names, conservation scores, and information from clinically relevant databases. • Identify Somatic Variants from Tumor Normal Pair: Removes germline variants by referring to the control sample read mapping, removes variants outside the target region (in case of a targeted experiment), and annotates with gene names, conservation scores, amino acid changes, and information from clinically relevant databases. • Identify Variants: Calls variants in the mapped and locally realigned reads, removes false positives and, in case of a targeted experiment, removes variants outside the targeted region. Variant calling is performed with the Low Frequency Variant Detection tool. 3.3 Hereditary Disease. The third category found under each of the three applications (WGS, WES and TAS) is the Hereditary Disease workflows, which have been developed to support identification of disea
150. ale Figure 3.4: Specify the proband's gender. The parameters used by the Fixed Ploidy Variant Detection tool can be adjusted. We have optimized the parameters to the individual analyses, but you may want to tweak some of the parameters to fit your particular sequencing data. A good starting point could be to run an analysis with the default settings. [Dialog residue: Fixed Ploidy Variant Detection, Configurable Parameters: Required variant probability 50.0, Ignore broken pairs (ticked), Minimum coverage, Minimum count, Minimum frequency; Locked Settings.] Figure 3.5: Specify the parameters for the Fixed Ploidy Variant Detection tool. The parameters that can be set are: • Required variant probability is the minimum probability value of the variant site required for the variant to be called. Note that it is not the minimum value of the probability of the individual variant. For the Fixed Ploidy Variant detector, if a variant site (and not the variant itself) passes the variant probability threshold, then the variant with the highest probability at that site will be reported even if the probability of that particular variant might be less than the threshold. For example, if the required variant probability is set to 0.9, then the individual probability of the variant called might be less than
151. [Figure residue: track list including Homo_sapiens_ensembl_v74_CDS, ERR319085 Variants Annotated, Clinvar_20131203 Variants, Cosmic_v67 Variants, 1000GENOMES_phase_1 tracks (including _AMR and _AFR) and PhastCons_conservation_scores_hg19.] Figure 5.5: The output from the Annotate Variants ready-to-use workflow is a genome browser view (a track list) containing individual tracks for all added annotations. Note: Please be aware that if you delete the annotated variant track, this track will also disappear from the genome browser view. It is possible to add tracks to the Genome Browser View, such as mapped sequencing reads as well as other tracks. This can be done by dragging the track directly from the Navigation Area to the Genome Browser View. If you double click on the name of the annotated variant track in the left hand side of the Genome Browser View, a table that includes all variants and the added information/annotations will open (see figure 5.6). The table and the Genome Browser View are linked: if you click on an entry in the table, this particular position in the ge
152. alysis of your sequencing data (see figure 4.31). CHAPTER 4 GETTING STARTED 59 [Figure residue: pipeline overview. Import data (sequencing reads), run workflow 1 to prepare the raw data, and inspect the results (QC and trim reports, prepared data). If the QC and trim reports are OK, use the prepared data as input and run workflow 2 for analysis of WGS, WES, TAS or WTS data; the diagram lists the ready-to-use workflows available for each application.]
153. ameters for the Fixed Ploidy Variant Detection tool for the affected child. 13. Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters; it is not possible to make any changes. Choose to save the results and click on the button labeled Finish. Output from the Identify Causal Inherited Variants in a Trio WES workflow: Six types of output are generated: • Reads Tracks: One for each family member. The reads mapped to the reference sequence. • Variants in: One track for each family member. The variants identified in each of the family members. The variant track can be opened in table view to see all information about the variants. • Putative Causal Variants in Child: The putative disease causing variants identified in the child. The variant track can be opened in table view to see all information about the variants. • Gene List with Putative Causal Variants: Gene track with the identified putative causal variants in the child. The gene track can be opened in table view to see the gene names. • Target Region Coverage Report: One for each family member. The report consists of a number of tables and graphs that in different ways provide information about the mapped reads from each sample. • Target Region Coverage: One track for each individual. When opened in table format, it is possible to see a range of different information a
154. amily of Four WGS; Identify Rare Disease Causing Mutations in Trio WGS; Identify Variants WGS HD; Whole Exome Sequencing; Targeted Amplicon Sequencing; Whole Transcriptome Sequencing. Figure 5.1: The eleven workflows available for analyzing whole genome sequencing data. 5.1 General Workflows WGS. 5.1.1 Annotate Variants WGS. Using a variant track (e.g. the output from the Identify Variants ready-to-use workflow), the Annotate Variants WGS ready-to-use workflow runs an internal workflow that adds the following annotations to the variant track: • Gene names: Adds names of genes whenever a variant is found within a known gene. • mRNA: Adds names of mRNA whenever a variant is found within a known transcript. • CDS: Adds names of CDS whenever a variant is found within a coding sequence. • Amino acid changes: Adds information about amino acid changes caused by the variants. • Information from ClinVar: Adds information about the relationships between human variations and their clinical significance. • Information from dbSNP: Adds information from the Single Nucleotide Polymorphism Database, which is a general catalog of genome variation, including SNPs, multinucleotide polymorphisms (MNPs), insertions and deletions (InDels), and short tandem repeats (STRs). • PhastCons Conservation scores: The conservation scores, in this case generated from a multiple alignment with a number of vertebrates, describe the lev
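The Gene names annotation described above amounts to an interval lookup: a variant receives a gene name whenever its position falls inside a known gene. A minimal Python sketch of that idea follows. The function name, the tuple layouts and the half-open coordinate convention are assumptions made for illustration; the Workbench's own implementation is not described in this excerpt.

```python
def annotate_gene_names(variants, genes):
    """Attach gene names to variants that fall inside a gene interval.

    variants: list of (chrom, pos) tuples (hypothetical layout).
    genes: list of (chrom, start, end, name) tuples, with start inclusive
           and end exclusive (an assumed convention).
    Returns a list of (chrom, pos, [gene names]) in the input order.
    """
    annotated = []
    for chrom, pos in variants:
        names = [
            name
            for g_chrom, start, end, name in genes
            if g_chrom == chrom and start <= pos < end
        ]
        annotated.append((chrom, pos, names))
    return annotated
```

A variant outside every gene simply gets an empty list, which mirrors the idea that an annotation is only added "whenever a variant is found within a known gene". The mRNA and CDS annotations could be sketched the same way with transcript or coding-sequence intervals.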
155. an analysis with the default settings. [Dialog residue: Fixed Ploidy Variant Detection, Configurable Parameters: Required variant probability 50.0, Ignore broken pairs (ticked), Minimum coverage 10, Minimum count, Minimum frequency; Locked Settings.] Figure 7.85: Specify the parameters for the Fixed Ploidy Variant Detection tool. The parameters that can be set are: • Required variant probability is the minimum probability value of the variant site required for the variant to be called. Note that it is not the minimum value of the probability of the individual variant. For the Fixed Ploidy Variant detector, if a variant site (and not the variant itself) passes the variant probability threshold, then the variant with the highest probability at that site will be reported even if the probability of that particular variant might be less than the threshold. For example, if the required variant probability is set to 0.9, then the individual probability of the variant called might be less than 0.9 as long as the probability of the entire variant site is greater than 0.9. • Ignore broken pairs: When ticked, reads from broken pairs are ignored. Broken pairs may arise for a number of reasons, one being erroneous mapping of the reads. In general, variants based on broken pair reads are likely to be less reliable, so ignoring them may reduce the number of spurious variants called. However, broken pairs may also arise for bio
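The distinction between the site probability and the probability of an individual variant can be made concrete with a small Python sketch. This is an illustration only, not the Workbench's algorithm: the function name, the variant labels, and the assumption that the site probability is the sum of the candidate variants' probabilities are all hypothetical.

```python
def call_site(variant_probs, required_variant_probability=0.9):
    """Report the most probable variant at a site if the site passes the threshold.

    variant_probs: dict mapping a candidate variant label to its individual
                   probability (hypothetical representation).
    The site probability is ASSUMED here to be the sum of those probabilities.
    Returns the label of the most probable variant, or None if the site fails.
    """
    site_probability = sum(variant_probs.values())
    if site_probability <= required_variant_probability:
        return None
    # The site passed, so the best variant is reported even if its own
    # probability is below the threshold.
    return max(variant_probs, key=variant_probs.get)
```

With two candidates at probabilities 0.55 and 0.40, the site passes a 0.9 threshold and the 0.55 variant is reported, although 0.55 alone is well below 0.9.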
156. [Dialog residue: Low Frequency Variant Detection settings. Ignore non-specific matches, Minimum read length, Minimum coverage, Minimum count, Minimum frequency, Base quality filter, Read direction filter (Direction frequency), Read position filter (Significance), Relative read direction filter (Significance), Remove pyro-error variants (In homopolymer regions with minimum length 3, With frequency below); Locked Settings.] Figure 5.22: Specify the settings for the variant detection. In this wizard step you can adjust the settings used for variant detection. For a description of the different parameters that can be adjusted in the variant detection step, we refer to the description of the Low Frequency Variant Detection tool in the Biomedical Genomics Workbench user manual (http://www.clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=Low_Frequency_Variant_Detection.html). If you click on Locked Settings, you will be able to see all parameters used for variant detection in the ready-to-use workflow. 4. Click on the button labeled Next to go to the step where you can adjust the settings for removal of germline variants (figure 5.23). Click on the button labeled Next. Identify Somatic Variants from Tumor Normal Pair WGS Remove Germline Variants 1 Choose w
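The Minimum coverage, Minimum count and Minimum frequency settings listed in the dialog act as simple cut-offs on a candidate variant. The Python sketch below shows one plausible way such cut-offs combine. It is not the Low Frequency Variant Detection tool's implementation, and the default values and the choice to express frequency as a percentage are assumptions for illustration only.

```python
def passes_filters(coverage, count, min_coverage=10, min_count=2, min_frequency=1.0):
    """Return True if a candidate variant survives three basic cut-offs.

    coverage: number of reads covering the position.
    count: number of reads supporting the variant.
    min_frequency: minimum variant frequency in percent (assumed unit).
    All default thresholds are hypothetical.
    """
    if coverage < min_coverage or count < min_count:
        return False
    frequency = 100.0 * count / coverage
    return frequency >= min_frequency
```

For example, 2 supporting reads out of 100 give a frequency of 2 percent and pass, while 5 supporting reads out of 1000 give 0.5 percent and are rejected by the frequency cut-off.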
157. and choose to Save your results. Note: If you choose to open the results, the results will not be saved automatically. You can always save the results at a later point. Output from the Identify Known Variants in One Sample WES: The Identify Known Variants in One Sample WES tool produces five different output types: • Read Mapping: The mapped sequencing reads. The reads are shown in different colors depending on their orientation, whether they are single reads or paired reads, and whether they map unambiguously. For the color codes, please see the description of sequence colors in the CLC Genomics Workbench manual that can be found here: http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=View_settings_in_Side_Panel.html • Target Regions Coverage: A track showing the targeted regions. The table view provides information about the targeted regions, such as target region length, coverage, regions without coverage, and GC content. • Target Regions Coverage Report: The report consists of a number of tables and graphs that in different ways show e.g. the number, length, and coverage of the target regions, and provides information about the read count per GC. • Variants Detected in Detail: Annotation track showing the known variants. Like the Overview Variants Detected table, this table provides information about the known variants. Four columns starting w
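The per-region figures mentioned for the Target Regions Coverage table (length, coverage, positions without coverage, GC content) are straightforward to compute once a region's sequence and per-base read depth are available. The following Python sketch shows the arithmetic under those assumptions; the function name, the inputs and the output keys are hypothetical and do not reflect the Workbench's table columns.

```python
def region_stats(sequence, depths):
    """Summarize one target region.

    sequence: reference bases of the region, e.g. "ACGT".
    depths: per-base read depth, one integer per base of the region.
    Returns a dict with length, mean coverage, number of bases without
    coverage, and GC content in percent.
    """
    if not sequence or len(sequence) != len(depths):
        raise ValueError("sequence and depths must be non-empty and equally long")
    length = len(sequence)
    gc_bases = sum(base in "GCgc" for base in sequence)
    return {
        "length": length,
        "mean_coverage": sum(depths) / length,
        "bases_without_coverage": sum(depth == 0 for depth in depths),
        "gc_percent": 100.0 * gc_bases / length,
    }
```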
158. ants WES [Dialog residue: Select the track of variants.] Figure 6.2: Select the variant track to annotate. 2. Click on the button labeled Next. The only parameter that should be specified by the user is which 1000 Genomes population to use (figure 6.3). This can be done using the drop down list found in this wizard step. Please note that the populations available from the drop down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4). [Dialog residue: Annotate Variants WES, 1000 Genomes.] Figure 6.3: Select the relevant 1000 Genomes population(s). 3. Click on the button labeled Next to go to the last wizard step (figure 6.4). In this wizard step you can check the selected settings by clicking on the button labeled Preview All Parameters. In the Preview All Parameters wizard you can only check the settings, and if you wish to make changes you have to use the Previous button from the wizard to edit parameters in the re
159. arget region length, coverage, regions without coverage, and GC content. 5. Target Region Coverage Report Tumor: The report consists of a number of tables and graphs that in different ways provide information about the mapped reads from the tumor sample. 6. Variants: A variant track holding the identified variants that are found in the targeted regions. The variants can be shown in track format or in table format. When holding the mouse over the detected variants in the Genome Browser view, a tooltip appears with information about the individual variants. You will have to zoom in on the variants to be able to see the detailed tooltip. 7. Annotated Somatic Variants: A variant track holding the identified and annotated somatic variants. The variants can be shown in track format or in table format. When holding the mouse over the detected variants in the Genome Browser view, a tooltip appears with information about the individual variants. You will have to zoom in on the variants to be able to see the detailed tooltip. 8. Genome Browser View Tumor Normal Comparison: A collection of tracks presented together. Shows the annotated variants track together with the human reference sequence, genes, transcripts, coding regions, the mapped reads for both normal and tumor, the annotated somatic variants, information from the ClinVar database, and finally a track showing the conservation score (see figure 6.33). 6.2.3 Identify Variants WE
160. ariants workflow should be extended by the Identify Candidate Tool (configured with the Filter Criterion). The Biomedical Genomics Workbench reference manual has a chapter that describes this in detail (http://www.clcbio.com/support/downloads/#manuals; see chapter Workflows for more information on how pre-installed workflows can be extended and/or edited). Note: Sometimes the databases (e.g. dbSNP) are updated with a newer version, or maybe you have your own version of the database. In such cases you may wish to change one of the used databases. This can be done with the Data Management function, which is described in section 4.1.4. 7.2.2 Identify Somatic Variants from Tumor Normal Pair TAS. The Identify Somatic Variants from Tumor Normal Pair TAS ready-to-use workflow can be used to identify potential somatic variants in a tumor sample when you also have a normal/control sample from the same patient. When running the Identify Somatic Variants from Tumor Normal Pair TAS workflow, the reads are mapped and the variants identified. An internal workflow removes germline variants that are found in the mapped reads of the normal/control sample, and variants outside the target region are removed as they are likely to be false positives due to non-specific mapping of sequencing reads. Next, remaining variants are annotated with gene names, amino acid changes, conservation scores, and information from clinically relevant databases like ClinVar (variants with clini
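The germline-removal step described above can be pictured as a lookup of each tumor variant in the normal sample's read mapping. The Python sketch below assumes that the number of normal reads supporting each variant has already been counted; the function name, the variant key layout and the `max_normal_reads` threshold are hypothetical, and the Workbench's actual criteria are not given in this excerpt.

```python
def remove_germline(tumor_variants, normal_support, max_normal_reads=0):
    """Keep tumor variants that are not supported by the normal sample.

    tumor_variants: list of variant keys, e.g. (chrom, pos, ref, alt).
    normal_support: dict mapping a variant key to the number of reads in the
                    normal sample's mapping that carry the same allele.
    max_normal_reads: highest normal read support still treated as somatic
                      (hypothetical parameter).
    """
    return [
        variant
        for variant in tumor_variants
        if normal_support.get(variant, 0) <= max_normal_reads
    ]
```

Checking the normal read mapping rather than a list of called normal variants means that a germline allele seen in only a few normal reads can still be recognized, which is one reason a workflow might refer to the mapping itself.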
161. ations in a Trio WES identifies de novo and compound heterozygous variants from a Trio. The workflow includes a back-check for all family members. The Identify Rare Disease Causing Mutations in a Trio WES ready-to-use workflow accepts sequencing reads as input. How to run the Identify Rare Disease Causing Mutations in a Trio WES workflow: This section recapitulates the steps you need to take to start the workflow, each item corresponding to a different wizard window. For more information on the specific tools used in this workflow, see section 3.3. To run the Identify Rare Disease Causing Mutations in a Trio WES workflow, go to: Toolbox | Ready-to-Use Workflows | Whole Exome Sequencing | Hereditary Disease | Identify Rare Disease Causing Mutations in a Trio WES. 1. Double click on the Identify Rare Disease Causing Mutations in a Trio WES tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis. 2. Select the sequencing reads from the father (figure 6.71). The sequencing reads from the different family members are specified one at a time in the appropriate window. The panel in the left side of the wizard shows the kind of input that should be provided. Select by double clicking on the reads file name, or click once on the file and then on the arrow pointing to the right side in the middle of the wizard. Select sequencing reads 1 Choose whe
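The two variant classes named above can be illustrated with plain set logic. The Python sketch below uses the textbook definitions: a de novo variant is present in the child and absent from both parents, and a gene is a compound heterozygous candidate when the child carries at least one heterozygous variant inherited only from the father and at least one inherited only from the mother. The function names and data layouts are hypothetical, and the workflow's exact criteria (including its back-check) are not specified in this excerpt.

```python
def de_novo_variants(child, father, mother):
    """Variants seen in the child but in neither parent."""
    return sorted(set(child) - set(father) - set(mother))


def compound_het_genes(child_het_by_gene, father, mother):
    """Genes where the child has one het variant from each parent.

    child_het_by_gene: dict mapping gene name -> set of the child's
                       heterozygous variant ids in that gene.
    father, mother: sets of variant ids carried by each parent.
    """
    genes = []
    for gene, variants in child_het_by_gene.items():
        only_father = {v for v in variants if v in father and v not in mother}
        only_mother = {v for v in variants if v in mother and v not in father}
        if only_father and only_mother:
            genes.append(gene)
    return sorted(genes)
```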
162. ausal Variants in Child: The putative disease causing variants identified in the child. The variant track can be opened in table view to see all information about the variants. • Gene List with Putative Causal Variants: Gene track with the identified putative causal variants in the child. The gene track can be opened in table view to see the gene names. • Target Region Coverage Report: One for each family member. The report consists of a number of tables and graphs that in different ways provide information about the mapped reads from each sample. • Target Region Coverage: One track for each individual. When opened in table format, it is possible to see a range of different information about the targeted regions, such as target region length, read count, and base count. • An Amino Acid Track: Shows the consequences of the variants at the amino acid level in the context of the original amino acid sequence. A variant introducing a stop mutation is illustrated with a red amino acid. • Genome Browser View: This is a collection of tracks shown together in a view that makes it easy to compare information from the individual tracks, for example to compare the identified variants with the read mappings and information from databases. 7.3.3 Identify Causal Inherited Variants in Trio TAS. The Identify Causal Inherited Variants in a Trio TAS ready-to-use workflow identifies putative disease causing inherited variants by creating a list of variants present in both affected i
163. be done using the drop down list found in this wizard step. Please note that the populations available from the drop down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4). [Dialog residue: Remove Variants Found in HapMap, with the selected populations HAPMAP_phase_3_MKK, HAPMAP_phase_3_CHD, HAPMAP_phase_3_TSI, HAPMAP_phase_3_CHB, HAPMAP_phase_3_GIH, HAPMAP_phase_3_HCB, HAPMAP_phase_3_LWK, HAPMAP_phase_3_CEU, HAPMAP_phase_3_MEX, HAPMAP_phase_3_YRI, HAPMAP_phase_3_JPT and HAPMAP_phase_3_ASW.] Figure 5.47: Select the relevant HapMap population(s). Specify the HapMap populations that should be used for filtering out variants found in HapMap for the mother. Specify the HapMap populations that should be used for filtering out variants found in HapMap from the de novo assembly. Set up the parameters for the Fixed Ploidy Variant Detection tool for the affected child. Set up the parameters for the Fixed Ploidy Variant Detection tool for the father. 12. Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters a
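Filtering out variants found in the selected HapMap populations is, conceptually, a set subtraction. The Python sketch below assumes each population track has been loaded as a set of variant keys; the function name and data layout are hypothetical, and only the population names come from the dialog above.

```python
def remove_known_variants(variants, population_tracks, selected_populations):
    """Drop variants that occur in any of the selected population tracks.

    variants: list of variant keys, e.g. (chrom, pos, ref, alt).
    population_tracks: dict mapping a population name, such as
                       "HAPMAP_phase_3_CEU", to a set of variant keys.
    selected_populations: names of the populations chosen for filtering.
    """
    known = set()
    for name in selected_populations:
        known |= population_tracks[name]
    return [variant for variant in variants if variant not in known]
```

Selecting more populations can only remove more variants, so the choice of populations directly controls how aggressively common variants are filtered.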
164. be performed using the tools contained in the Tools section. Full analyses can be run this way, or such tools can be used upstream or downstream of workflow based data analyses. The tools that are relevant for Toolbox tree (figure residue): Ready-to-Use Workflows: Preparing Raw Data (Prepare Overlapping Raw Data, Prepare Raw Data); Whole Genome Sequencing: General Workflows WGS (Annotate Variants WGS, Identify Known Variants in One Sample WGS), Somatic Cancer WGS (Filter Somatic Variants WGS, Identify Somatic Variants from Tumor Normal Pair WGS, Identify Variants WGS), Hereditary Disease WGS (Filter Causal Variants WGS HD, Identify Causal Inherited Variants in Family of Four WGS, Identify Causal Inherited Variants in Trio WGS, Identify Rare Disease Causing Mutations in Family of Four WGS, Identify Rare Disease Causing Mutations in Trio WGS, Identify Variants WGS HD); Whole Exome Sequencing; Targeted Amplicon Sequencing; Whole Transcriptome Sequencing: Human (Annotate Variants WTS, Compare Variants in DNA and RNA, Identify Candidate Variants and Genes from Tumor Normal Pair, Identify Variants and Add Expression Values, Identify and Annotate Differentially Expressed Genes and Pathways), Mouse (Annotate Variants WTS, Compare Variants in DNA and RNA, Identif
165. beled Next to go to the last wizard step (figure 7.39). In this wizard you get the chance to check the selected settings by clicking on the button labeled Preview All Parameters. In the Preview All Parameters wizard step you can only check the settings, and if you wish to make changes you have to use the Previous button from the wizard to edit parameters in the relevant windows. At the bottom of this wizard there are two buttons regarding export functions; one button allows specification [Dialog residue: Identify Variants TAS, Remove Variants Outside Targeted Regions; Targeted region track: target_regions.] Figure 7.38: Select the targeted region track. Variants found outside the targeted region will be removed. [Dialog residue: Identify Variants TAS, Result handling; Workflow parameters: Preview All Parameters; Result handling: Open or Save; Log handling: Open log.] Figure 7 39 Cho
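The effect of the Remove Variants Outside Targeted Regions step can be sketched as a position-in-interval test against the target region track. The Python below is an illustration under assumed data layouts (tuples for variants and regions, half-open intervals); it is not the Workbench tool itself.

```python
def keep_variants_in_targets(variants, targets):
    """Keep only variants whose position lies inside a target region.

    variants: list of (chrom, pos) tuples (hypothetical layout).
    targets: list of (chrom, start, end) tuples with start inclusive and
             end exclusive (assumed convention).
    """
    kept = []
    for chrom, pos in variants:
        if any(t_chrom == chrom and start <= pos < end
               for t_chrom, start, end in targets):
            kept.append((chrom, pos))
    return kept
```

The linear scan over all targets is fine for a small amplicon panel; for large target sets a sorted list with binary search or an interval tree would be the usual refinement.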
166. bench (figure 11.1). Figure 11.1: Click on the button labeled Plugins to download plugins. The plugins that are available for Biomedical Genomics Workbench are: • Advanced Peak Shape Tools Plugin • Batch Rename • Biobase Genome Trax Annotate • Biobase Genome Trax Download • Duplicate Mapped Reads Removal • Shannon Human Splicing Pipeline • Shannon Human Splicing Pipeline Client. You can find a detailed description of how to download and install plugins in the Biomedical Genomics Workbench reference manual, in chapter Introduction to Biomedical Genomics Workbench, section Plugins. Part V Appendix. Appendix A Reference data overview. Human (hg19): • Human reference sequence (ENSEMBL): ftp://ftp.ensembl.org/pub/current_fasta/homo_sapiens/dna Chromosomes 1-22, X, Y and M human reference DNA sequence (GRCh37/HG19). • Human genes, coding sequences, and transcripts (ENSEMBL): ftp://ftp.ensembl.org/pub/current_gtf/homo_sapiens All annotated protein coding genes for human reference sequence (GRCh37/HG19). The annotation was done by ENSEMBL and includes annotations from RefSeq, CCDS as well as ENSEMBL itself. • HapMap variants (ENSEMBL): ftp://ftp.ensembl.org/pub/current_variation/gvf/homo_sapiens The goal of the International HapMap Project is to develop a haplotype map of the human genome, the HapMap, which will describe the common patterns of human DNA sequence variation (for more information about HapMap see
167. bl_v73_CDS [Figure residue: track list with CDS annotations, tumor_01 paired Reads locally realigned, tumor_01 paired Variants, tumor_01 paired Reads locally realigned InDel Variants, and tumor_01 paired Reads locally realigned SV (Complex insertion / Replacement annotations), plus an open track table of 143 rows with the columns Chromosome, Region and Reference.] Figure 6.41: Genome Browser View with an open track table to inspect identified variants more closely in the context of the human genome. To import the file: Go to the toolbar | Import | Tracks. How to run the Identify and Annotate Variants WES workflow: To run the Identify and Annotate Variants WES workflow, go to: Toolbox | Ready-to-Use Workflows | Whole Exome Sequencing | Somatic Cancer | Identify and Annotate Variants WES. 1. Double click on the Identify and Annotate Variants WES tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis. Click on the button labeled Next. 2. You can select the se
168. bout the targeted regions, such as target region length, read count, and base count. • An Amino Acid Track: Shows the consequences of the variants at the amino acid level in the context of the original amino acid sequence. A variant introducing a stop mutation is illustrated with a red amino acid. • Genome Browser View: This is a collection of tracks shown together in a view that makes it easy to compare information from the individual tracks, for example to compare the identified variants with the read mappings and information from databases. 6.3.4 Identify Rare Disease Causing Mutations in Family of Four WES. You can use the Identify Rare Disease Causing Mutations in a Family of Four WES ready-to-use workflow to identify de novo and compound heterozygous variants from an extended family of four, where the fourth individual is not affected. The Identify Rare Disease Causing Mutations in a Family of Four WES ready-to-use workflow accepts sequencing reads as input. How to run the Identify Rare Disease Causing Mutations in a Family of Four WES workflow: This section recapitulates the steps you need to take to start the workflow, each item corresponding to a different wizard window. For more information on the specific tools used in this workflow, see section 3.3. To run the Identify Rare Disease Causing Mutations in a Family of Four WES workflow, go to: Toolbox | Ready-to-Use Workflows | Whole Exome Sequencing | Hereditary Disease
169. broken pairs: reads that belong to broken pairs will be ignored. For more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkben/current/index.php?manual=QC_Target_Sequencing.html
   Specify the parameters for the QC for Target Sequencing tool for the affected child.
   Specify the parameters for the QC for Target Sequencing tool for the unaffected parent.
   Specify the parameters for the QC for Target Sequencing tool for the affected parent.
   Specify the parameters for the Fixed Ploidy Variant Detection tool for the unaffected parent.
   14. Specify the parameters for the Fixed Ploidy Variant Detection tool for the affected parent.
   15. Specify the parameters for the Fixed Ploidy Variant Detection tool for the affected child.
   16. Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters; it is not possible to make any changes. Choose to save the results and click on the button labeled Finish.
   Output from the Identify Causal Inherited Variants in a Family of Four TAS workflow
   Six types of output are generated:
   • Reads Tracks: One for each family member. The reads mapped to the reference sequence.
   • Variant Tracks: One track for each family member. The variants identified in each of the family members. The variant track can be opened in table view to see all information about the variants.
   • Putative C
170. cally relevant association. Finally, information from dbSNP is added to see which of the detected variants have been observed before and which are completely new.
   Import your targeted regions
   A file with the genomic regions targeted by the amplicon or hybridization kit is available from the vendor of the enrichment kit and sequencing machine. To obtain this file, contact the vendor and ask them to send you the target regions file. You will get the file in either BED or GFF format.
   To import the file, go to the toolbar: Import | Tracks
   How to run the Identify Somatic Variants from Tumor Normal Pair TAS workflow
   1. Go to the toolbox and double-click on the Identify Somatic Variants from Tumor Normal Pair TAS ready-to-use workflow. This will open the wizard shown in figure 7.23, where you can select the tumor sample reads.
   Figure 7.23: Select the tumor sample reads.
   When you have selected the tumo
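The excerpt above says the vendor supplies the target regions as a BED or GFF file. As a rough illustration of what such a file holds, the following minimal sketch reads the first three BED columns (chromosome, start, end) and totals the targeted bases. The panel name and coordinates are made-up example values, `parse_bed` is a hypothetical helper, and none of this reflects how the Workbench itself imports tracks.

```python
from typing import List, NamedTuple


class TargetRegion(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)

    @property
    def length(self) -> int:
        return self.end - self.start


def parse_bed(text: str) -> List[TargetRegion]:
    """Parse the first three columns of BED-formatted text."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        regions.append(TargetRegion(fields[0], int(fields[1]), int(fields[2])))
    return regions


# Hypothetical two-region panel, for illustration only.
EXAMPLE_BED = """\
track name=hypothetical_panel
20\t42692412\t42692444
20\t42925642\t42925644
"""

regions = parse_bed(EXAMPLE_BED)
total_target_bases = sum(r.length for r in regions)
print(len(regions), total_target_bases)
```

Checking the number of regions and the total targeted length before import is a cheap way to confirm that the vendor file matches the kit you actually used.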
171. Figure 6.37: Please specify the parameters for variant detection.
   Figure 6.38: Select the targeted region track. Variants found outside the targeted region will be removed.
   wizard there are two buttons regarding export functions: one button allows specification of the export format, and the other button (the one labeled Export Parameters) allows specification of the export destination. When selecting an export location, you will export the analysis parameter settings that were specified for this specific experiment.
   Click on the button labeled OK to go back to the previous wizard step and choose Save.
   Note
172. Figure 6.48: Select a population from the HapMap database. This will add information from the HapMap database to your variants.
   9. In this wizard step (figure 6.49) you get the chance to check the selected settings by clicking on the button labeled Preview All Parameters. In the Preview All Parameters wizard you can only check the settings; if you wish to make changes, you have to use the Previous button from the wizard to edit parameters in the relevant windows.
   10. Choose to Save your results and press Finish.
   Note: If you choose to open the results, the results will not be saved automatically. You can always save the results at a later point.
   Output from the Identify and Annotate Variants WES workflow
   The Identify and Annotate Variants WES tool produces several outputs. Please do not delete any of the produced files alone, as some of them are linked to other outputs. Please always delete all of them at the same time.
173. coverage and read count can be found in the table format, which can be opened by pressing the table icon found in the lower left corner of the View Area.
   3. Target Regions Coverage Report: The report consists of a number of tables and graphs that in different ways provide information about the targeted regions.
   4. Identified Variants: A variant track holding the identified variants. The variants can be shown in track format or in table format. When holding the mouse over the detected variants in the Genome Browser view, a tooltip appears with information about the individual variants. You will have to zoom in on the variants to be able to see the detailed tooltip.
   5. Genome Browser View Identify Variants: A collection of tracks presented together. Shows the annotated variants track together with the human reference sequence, genes, transcripts, coding regions, the mapped reads, the identified variants, and the structural variants (see figure 6.5).
   It is important that you do not delete any of the produced files individually, as some of the outputs are linked to other outputs. If you would like to delete the outputs, please always delete all of them at the same time.
   First, look at the mapping report to see if the coverage is sufficient in regions of interest (e.g. > 30). Furthermore, check that at least 90% of reads are mapped to the human reference sequence. In case of a targeted experiment, please also check t
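The two sanity checks named in the excerpt above (coverage above 30 in regions of interest, at least 90% of reads mapped) can be written down as a small helper. This is only a sketch under stated assumptions: `mapping_qc`, its argument names, and the example numbers are hypothetical, and the values would have to be read off the mapping and coverage reports by hand.

```python
from typing import Dict, List, Tuple


def mapping_qc(
    mapped_reads: int,
    total_reads: int,
    region_coverage: Dict[str, float],
    min_coverage: float = 30.0,
    min_mapped_fraction: float = 0.90,
) -> Tuple[bool, List[str]]:
    """Return (enough reads mapped?, regions whose coverage is not above min_coverage)."""
    mapped_fraction = mapped_reads / total_reads if total_reads else 0.0
    mapped_ok = mapped_fraction >= min_mapped_fraction
    # The manual's example is "> 30", so a region at exactly 30 is flagged here.
    low_regions = sorted(
        name for name, cov in region_coverage.items() if cov <= min_coverage
    )
    return mapped_ok, low_regions


# Made-up example values.
mapped_ok, low_regions = mapping_qc(
    mapped_reads=950_000,
    total_reads=1_000_000,
    region_coverage={"region_a": 45.0, "region_b": 12.5},
)
print(mapped_ok, low_regions)
```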
174. cted to a server, you will first be asked where you would like to run the analysis. Next, you will be asked to select the experiment to analyze (figure 8.31). To select an experiment, double-click on the experiment file name, or click once on the file and then on the arrow pointing to the right side in the middle of the wizard. Click on the button labeled Next.
   2. In the next wizard step, you can specify the parameters to be used for extraction of differentially expressed genes.
   CHAPTER 8 WHOLE TRANSCRIPTOME SEQUENCING WTS 240
   Figure 8.31: Select the experiment to analyze.
   • Type of p-value: This drop-down menu allows you to select between raw and corrected p-values. For a description of these, please see the Transcriptomics chapter, section Corrected p-values, in the CLC Genomics Workbench manual, which can be found here: http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=Corrected_p_values.html Only the types of p-values available for the given statistical analysis will be present in the drop-down menu.
   • Maximum p-value: In this input field you can enter the maximum allowed p-value as a number between 0 and 1. If you do not want any filtering based on p-value, enter 1.
   • M
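To make the Maximum p-value setting described above concrete, here is a minimal sketch of such a filter. The function name, the dictionary layout, and the gene names are assumptions made for illustration; the Workbench's own implementation is not shown in the manual.

```python
from typing import Dict, List


def filter_by_p_value(genes: List[Dict], max_p_value: float = 1.0) -> List[Dict]:
    """Keep genes whose p-value does not exceed max_p_value (a number between 0 and 1)."""
    if not 0.0 <= max_p_value <= 1.0:
        raise ValueError("max_p_value must be between 0 and 1")
    return [g for g in genes if g["p_value"] <= max_p_value]


# Hypothetical genes and p-values.
genes = [
    {"name": "gene_a", "p_value": 0.001},
    {"name": "gene_b", "p_value": 0.20},
    {"name": "gene_c", "p_value": 0.04},
]

significant = filter_by_p_value(genes, max_p_value=0.05)
unfiltered = filter_by_p_value(genes, max_p_value=1.0)
print([g["name"] for g in significant], len(unfiltered))
```

Entering 1 keeps every gene, because no p-value can exceed 1, which is why the manual recommends that value to switch the filter off.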
175. cused on. To expand the depth of the reads track and view more details of the reads in a specific region, place the mouse cursor near the bottom of the left side of the Genome Browser view (where the track names are), hold down the mouse button, and drag downwards. This is illustrated in the lower left side of figure 2.12, where the blue line with the arrow under it, within a red circle, shows where you would place the mouse cursor to expand the depth of the track. In this figure, the four bases in the genomic reference sequence can be discerned via the color coding. The color codes for the bases are: A red, C blue, G yellow, and T green. Particular SNVs can also be discerned at this zoom level. The color of the reads indicates whether a read is part of an intact pair (blue), is a single read or a member of a broken pair mapped in the forward direction relative to the reference (green), or is a single read or a member of a broken pair mapped in the reverse direction relative to the reference (red). Reads that could map equally well to other locations in the reference are colored yellow. Figure 2.13 shows the view after zooming in on one specific SNV. By looking at the other tracks at that point, we can see that this SNV is found in a gene. The tooltip, which comes up when the mouse cursor hovers over the SNV in the variant track, reveals that this is a heterozygous mutation occurring in 29 out of 447 reads. Full details about the variants
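The color scheme described above can be summarized as a small lookup. This is a reading aid only: the function and its flags are hypothetical, and the choice to let the yellow (non-specific) color take precedence over the pair and direction colors is an assumption, since the excerpt does not state the order.

```python
# Base colors in the reference sequence, as listed in the excerpt.
BASE_COLORS = {"A": "red", "C": "blue", "G": "yellow", "T": "green"}


def read_color(in_intact_pair: bool, forward: bool, non_specific: bool = False) -> str:
    """Return the display color of a read under the scheme described above."""
    if non_specific:
        # Assumption: reads that map equally well elsewhere are always yellow.
        return "yellow"
    if in_intact_pair:
        return "blue"
    # Single reads and members of broken pairs are colored by direction.
    return "green" if forward else "red"


print(read_color(in_intact_pair=True, forward=True))
print(read_color(in_intact_pair=False, forward=False))
```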
176. Figure 5.12: Genome Browser View with an open overview variant track with information about whether the variant has been detected or not, the identified zygosity, whether the coverage was sufficient at this position, and the observed allele frequency.
   5.2 Somatic Cancer WGS
   5.2.1 Filter Somatic Variants WGS
   If you are analyzing a list of variants that have been detected in a tumor or blood sample where no control sample is available from the same patient, you can use the Filter Somatic Variants WGS ready-to-use workflow to identify potential somatic variants. The purpose of this ready-to-use workflow is to use publicly available (or your own) databases with common variants in a population to extract potential soma
177. Figure 5.33: Select the relevant HapMap population(s).
   6. Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters; it is not possible to make any changes. Choose to save the results and click on the button labeled Finish.
   CHAPTER 5 WHOLE GENOME SEQUENCING WGS 84
   Output from the Filter Causal Variants WGS HD workflow
   Three types of output are generated:
   • An Amino Acid Track
   • A Genome Browser View
   • A Filtered Variant Track
   5.3.2 Identify Causal Inherited Variants in Family of Four WGS
   As the name of the workflow implies, you can use the Identify Causal Inherited Variants in a Family of Four WGS ready-to-use workflow to identify inherited causal variants in a family of four. The family can consist of a child, a mother, a father, and one additional affected family member, where, in addition to the child (the proband), one of the parents is affected and one additional family member is affected. The fourth family member can be any related and affected family member, such as a sibling, grandparent, uncle, or the like. The Identify Causal Inherited Variants in a Family of Four WGS ready-to-use workflow accepts sequenci
178. d of input that should be provided (figure 5.13). Select by double-clicking on the reads file name, or by clicking once on the file and then clicking on the arrow pointing to the right side in the middle of the wizard.
   Figure 5.13: Select the variant track from which you would like to filter somatic variants.
   Click on the button labeled Next.
   3. In the next step you will be asked to specify which of the 1000 Genomes populations should be used for annotation (figure 5.14). Click on the button labeled Next.
   4. The next wizard step will once again allow you to specify the 1000 Genomes population that should be used, this time for filtering out variants found in the 1000 Genomes project (figure 5.15). Click on the button labeled Next.
   5. The next wizard step (figure 5.16) concerns removal of variants found in the HapMap database. Select the population you would like to use from the drop-down list. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4).
179. dentified variants, which may or may not have been subjected to different kinds of filtering and/or annotation. In this chapter we will discuss what the individual ready-to-use workflows can be used for and go through, step by step, how to run the workflows.
   Figure 6.1: The eleven workflows available for analyzing whole exome sequencing data. The Ready-to-Use Workflows tree in the figure lists, under Whole Exome Sequencing:
   General Workflows WES: Annotate Variants WES; Identify Known Variants in One Sample WES
   Somatic Cancer WES: Filter Somatic Variants WES; Identify Somatic Variants from Tumor Normal Pair WES; Identify Variants WES; Identify and Annotate Variants WES
   Hereditary Disease WES: Filter Causal Variants WES HD; Identify Causal Inherited Variants in Family of Four WES; Identify Causal Inherited Variants in Trio WES; Identify Rare Disease Causing Mutations in Family of Four WES; Identify Rare Disease Causing Mutations in Trio WES; Identify Variants WES HD; Identify and Annotate Variants WES HD
   Note: Often you will have to prepare data with one of the two Preparing Raw Data workflows described in section 4.4 before you proceed to Analysis of sequencing data WES.
   6.1 General Wor
180. Figure 8.34: The results handling step.
   Three different types of output are generated:
   1. Annotated Differentially Expressed Genes: This is an annotation track that gives access to the expression values and other information. This information can be accessed in two different ways:
      • Hold the mouse over, or right-click on, the track. A tooltip will appear with information about e.g. gene name, results of statistical tests, expression values, and GO information.
      • Open the track in table format by clicking on the table icon in the lower left side of the View Area.
   2. Enriched Gene Groups and Pathways: A table showing the results of the GO enrichment analysis. The table includes GO terms, a description of the affected function/pathway, the number of genes in each function/pathway, the number of affected genes within the function/pathway, and p-values.
   3. Genome Browser View Differentially Expressed Genes and Pathways: A collection of tracks presented together
181. Figure 4.28: Select your adapter trim list. You can use the default trim parameters or adjust them if necessary. The settings visible in the figure are: Number of 5' terminal nucleotides (1), Maximum number of nucleotides in reads (1000), Minimum number of nucleotides in reads (15), Discard short reads, Remove 3' terminal nucleotides, Number of 3' terminal nucleotides (1), and Discard long reads.
   Figure 4.29: Check the settings and save your results.
   4.4.5 Output from the Prepare Overlapping Raw Data and Prepare Raw Data workflows
   Different outputs are generated from the Prepare Overlapping Raw Data and Prepare Raw Data workflows.
   Prepare Overlapping Raw Data: Performs quality control and trimming of the sequencing reads and merging of overlapping read pairs, and generates five different outputs:
   1. QC graphic report. The report should be checked by the user.
   2. QC supplementary report. The report should be checked by the user.
   3. Trimming report. The trimmed sequences are automatically used as input in the merging of paired reads step. The report should be checked by the user.
   4. Merged reads output. Use as input, together with the Not merged reads output, in the nex
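The trim settings named in figure 4.28 (terminal nucleotides to remove, plus minimum and maximum read length) can be illustrated with a short sketch. It assumes that terminal nucleotides are removed first and the length limits are applied afterwards; the excerpt does not state the order, and `trim_read` is a hypothetical helper rather than the Workbench's Trim Sequences tool.

```python
from typing import Optional


def trim_read(
    seq: str,
    n_5prime: int = 1,
    n_3prime: int = 1,
    min_length: int = 15,
    max_length: int = 1000,
) -> Optional[str]:
    """Remove terminal nucleotides, then discard reads outside the length limits.

    Returns the trimmed read, or None if the read is discarded.
    """
    trimmed = seq[n_5prime : len(seq) - n_3prime]
    if len(trimmed) < min_length or len(trimmed) > max_length:
        return None
    return trimmed


kept = trim_read("A" + "C" * 20 + "G")   # 22 nt in, 20 nt out
discarded = trim_read("ACGTACGT")        # 6 nt after trimming, below the minimum
print(kept, discarded)
```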
182. ds from the sample to be analyzed. Select all sequencing reads from your sample. If several samples should be analyzed, the tool has to be run in batch mode. This is done by ticking Batch at the bottom of the wizard, as shown in figure 6.42, and selecting the folder that holds the data you wish to analyze. If you have your sequencing data in separate folders, you should choose to run the analysis in batch mode. When you have selected the sample(s) you wish to prepare, click on the button labeled Next.
   2. In this wizard you can restrict calling of InDels and structural variants to the targeted regions by specifying the track with the targeted regions from the experiment (figure 6.35).
   Figure 6.35: Select the track with the targeted regions from your experiment.
   3. In the next wizard step (figure 6.36) you have to specify the track with the targeted regions from the experiment. You can also specify the minimum read coverage which should be present in the targeted regions.
183. due to structural variants, and if they are ignored some true variants may go undetected. Please note that ignored broken pair reads will not be considered for any non-specific match filters.
   • Minimum coverage: Only variants in regions covered by at least this many reads are called.
   • Minimum count: Only variants that are present in at least this many reads are called.
   • Minimum frequency: Only variants that are present at least at the specified frequency (calculated as count/coverage) are called.
   For more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkben/current/index.php?manual=Fixed_Ploidy_Variant_Detection.html
   14. Specify the parameters for the Fixed Ploidy Variant Detection tool for the affected child.
   15. Specify the parameters for the Fixed Ploidy Variant Detection tool for the father.
   16. Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters; it is not possible to make any changes. Choose to save the results and click on the button labeled Finish.
   Output from the Rare Disease Causing Mutations in a Trio WES workflow
   Twelve different types of output are generated:
   • Reads Tracks: One for each family member. The reads mapped to the reference sequence.
   • Variant Tracks: One for each family member. The variants identified in each of the family members. The variant track can be opened in table view to see all info
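The three thresholds described above combine into a simple test per candidate variant. The sketch below is an illustration, not the tool's implementation: the function name and default values are hypothetical, and the frequency is expressed as a fraction (count/coverage), whereas the Workbench dialog may expect a percentage.

```python
def passes_basic_filters(
    count: int,
    coverage: int,
    min_coverage: int = 10,
    min_count: int = 2,
    min_frequency: float = 0.20,
) -> bool:
    """Apply the minimum coverage, minimum count and minimum frequency thresholds."""
    if coverage <= 0 or coverage < min_coverage:
        return False
    if count < min_count:
        return False
    # Frequency is calculated as count / coverage, as stated in the excerpt.
    return count / coverage >= min_frequency


# A variant seen in 29 of 447 reads has a frequency of about 6.5%.
print(passes_basic_filters(29, 447, min_frequency=0.05))
print(passes_basic_filters(29, 447, min_frequency=0.10))
```

Note that a variant in a deeply covered region can pass the count threshold and still fail on frequency, which is why the two are specified separately.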
184. e The workflows use the Fixed Ploidy Variant Detection tool, which is a variant caller designed to call variants in samples with known ploidy from read mapping data. Workflows designed to detect rare variants can pick up both de novo variants and compound heterozygous variants. In addition to the Trio and Family of Four workflows, additional workflows exist that have been designed to pick up variants that are inherited from either the mother or the father. The available workflows in this category are:
   • Filter Causal Variants: Removes variants outside the target region (only targeted experiments) and common variants present in publicly available databases. Annotates with gene names, conservation scores, and information from clinically relevant databases.
   • Identify Causal Inherited Variants in a Family of Four: Identifies putative disease-causing inherited variants by creating a list of variants present in all three affected individuals and subtracting all variants in the unaffected individual. The workflow includes a back-check for all family members.
   • Identify Causal Inherited Variants in a Trio: Identifies putative disease-causing inherited variants by creating a list of variants present in both affected individuals and subtracting all variants in the unaffected individual. The workflow includes a back-check for all family members.
   • Identify Rare Disease Causing Mutations in a Family of Four: Identifies de novo and com
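The inherited-variant logic described above (variants shared by all affected individuals, minus anything seen in the unaffected individual) is essentially set arithmetic. The following sketch shows that idea with made-up variants; the tuple layout and function name are assumptions, and the real workflows also perform a back-check that is not modeled here.

```python
from typing import Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, variant allele).
Variant = Tuple[str, int, str, str]


def causal_inherited_candidates(
    affected: Iterable[Set[Variant]],
    unaffected: Iterable[Set[Variant]],
) -> Set[Variant]:
    """Variants present in every affected individual and in no unaffected one."""
    affected = list(affected)
    if not affected:
        raise ValueError("at least one affected individual is required")
    shared = set.intersection(*affected)
    seen_in_unaffected = set().union(*unaffected)
    return shared - seen_in_unaffected


# Made-up example for a family of four.
child = {("7", 55249063, "G", "A"), ("7", 100, "C", "T"), ("20", 500, "A", "G")}
father = {("7", 55249063, "G", "A"), ("7", 100, "C", "T")}
sibling = {("7", 55249063, "G", "A"), ("7", 100, "C", "T"), ("1", 9, "T", "C")}
mother = {("7", 100, "C", "T")}  # unaffected

candidates = causal_inherited_candidates([child, father, sibling], [mother])
print(candidates)
```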
185. e an important functional role, and variants with a conservation score of more than 0.9 (PhastCons score) should be prioritized higher. Further filtering of the variants based on their annotations can be done using the table filter at the top of the table. If you wish to always apply the same filter criteria, use the Create new Filter Criteria tool to specify the filter, and extend the Identify and Annotate workflow with the Identify Candidate tool (configured with the Filter Criterion). See the reference manual for more information on how preinstalled workflows can be edited. Please note that if none of the variants are present in ClinVar or dbSNP, the corresponding annotation column headers are missing from the result. If you would like to change the databases or the database version used, please use the
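The prioritization rule above (a PhastCons conservation score above 0.9 ranks higher) could be applied to an exported variant table roughly as follows. This is a sketch under assumptions: the `phastcons` field name and the variant records are invented for illustration, and in the Workbench the same effect is obtained with the table filter.

```python
from typing import Dict, List, Tuple


def split_by_conservation(
    variants: List[Dict], threshold: float = 0.9
) -> Tuple[List[Dict], List[Dict]]:
    """Split variants into (high priority, remaining), each sorted by descending score."""
    ranked = sorted(variants, key=lambda v: v["phastcons"], reverse=True)
    high = [v for v in ranked if v["phastcons"] > threshold]
    rest = [v for v in ranked if v["phastcons"] <= threshold]
    return high, rest


# Hypothetical annotated variants.
variants = [
    {"id": "var_1", "phastcons": 0.35},
    {"id": "var_2", "phastcons": 0.99},
    {"id": "var_3", "phastcons": 0.90},
    {"id": "var_4", "phastcons": 0.93},
]

high, rest = split_by_conservation(variants)
print([v["id"] for v in high], [v["id"] for v in rest])
```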
186. e reads file name, or click once on the file and then on the arrow pointing to the right side in the middle of the wizard.
   Figure 7.81: Specify the sequencing reads for the appropriate family member.
   3. Specify which 1000 Genomes population you would like to use (figure 7.82).
   4. Specify a target region file for the Indels and Structural Variants tool (figure 7.83). The targeted region file specifies which regions have been sequenced when working with whole exome sequencing or targeted amplicon sequencing data. You must provide this file yourself, as it depends on the technology used for sequencing. You can obtain the targeted regions file from the vendor of your targeted sequencing reagents.
   Figure 7.82: Select the relevant 1000 Genomes population(s).
187. e 6.25
   Figure 6.25: Specify the target regions track.
   4. Click on the button labeled Next to go to the next wizard step (figure 6.26).
   Figure 6.26: Specify the settings for the variant detection.
   5. Click on the button labeled Next, which will take you to the next wizard step (figure 6.27). In this wizard step you can
188. e Causing Mutations in Trio TAS
   Identify Rare Disease Causing Mutations in Trio WES
   Identify Rare Disease Causing Mutations in Trio WGS
   Identify Somatic Variants from Tumor Normal Pair TAS
   Identify Somatic Variants from Tumor Normal Pair WES
   Identify Somatic Variants from Tumor Normal Pair WGS
   Identify Variants TAS
   Identify Variants TAS HD
   Identify Variants WES
   Identify Variants WES HD
   Identify Variants WGS
   Identify Variants WGS HD
   Identify variants and add expression values
   Import data
   Menu Bar: illustration
   Navigation Area: illustration
   Reference data: Configure; Download
   References
   RNA-seq analysis: Identify variants and add expression values
   RNA-seq: differentially expressed genes and pathways
   RNA-seq: identify candidate variants and differentially expressed genes
   Status Bar: illustration
   Toolbar: illustration
   Toolbox: illustration
   User interface
   View Area: illustration
189. e Fixed Ploidy Variant Detection tool for the affected family member (figure 5.35).
   Figure 5.34: Specify the sequencing reads for the appropriate family member.
   The parameters used by the Fixed Ploidy Variant Detection tool can be adjusted. We have optimized the parameters for the individual analyses, but you may want to tweak some of them to fit your particular sequencing data. A good starting point could be to run an analysis with the default settings.
   Figure 5.35: Specify the parameters for the Fixed Ploidy Variant Detection tool.
   The parameters that can be set are:
   • Required variant probability is the minimum probability value of the variant site required for the variant to be called. Note that it is not the minimum value of the probability of the individual variant. For the
190. Figure 8.6: The output from the Annotate Variants ready-to-use workflow is a genome browser view (a track list). The information is also available in table view. Click on the small table icon to open the table view. If you hold down the Ctrl key while clicking on the table icon, you will open a split view showing both the genome browser view and the table view.
   You may be met with a warning as shown in figure 8.7. This is simply a warning telling you that it may take some time to create the table if you are working with tracks containing large amounts of annotations. Please note that if none of the variants are present in ClinVar or dbSNP, the corresponding annotation column headers are missing from the result.
   Figure 8.7: Warning that appears when you work with tracks containing many annotations.
191. Figure 7.32: Check the parameters and save the results.
   reads or paired reads and whether they map unambiguously. For the color codes, please see the description of sequence colors in the CLC Genomics Workbench manual, which can be found here: http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=View_settings_in_Side_Panel.html
   • Read Mapping Tumor: The mapped sequencing reads for the tumor sample. The reads are shown in different colors depending on their orientation, whether they are single reads or paired reads, and whether they map unambiguously. For the color codes, please see the description of sequence colors in the CLC Genomics Workbench manual, which can be found here: http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=View_settings_in_Side_Panel.html
   • Target Region Coverage Report Normal: The report consists of a number of tables and graphs that in different ways provide information about the mapped reads from the normal sample.
   • Target Region Coverage Tumor: A track showing the targeted regions. The table view provides information about the targeted regions, such as target region length, coverage, regions without coverage, and GC content.
   • Target Region Coverage Report Tumor: The report consists of a number of tables and graphs that in different ways provide information about the mapped reads from the tumor sample.
   • Va
192. e reference data that are required to be able to run the ready-to-use workflows.
   Figure 4.4: Click on the button labeled Data management to open the Manage Reference Data dialog, where you can download and configure the reference data that are necessary to be able to run the ready-to-use workflows.
   Figure 4.5: Reference data can be available locally or on the server.
   When selecting a reference set or an element, the window on the right shows the size of the folder as well as some complementary information about the reference database. For Reference Data Sets, a table recapitulates the elements included in the set with their version number and respective size, as well as a list of the workflows affected by the set.
   Here is the list of the Reference Data Sets and their approximate size:
   Reference Data Sets
   • hg38 (96 GB) with Ensembl v81, dbSNP v142, ClinVar 20150901
   • hg38 (88 GB) with Ensembl v80, dbSNP v142, ClinVar 20150629
   • hg19 (63 GB) with Ensembl v74, dbSNP v138, ClinVar 20131203
   • QIAGEN Gene Reads Panels hg19 (8 MB) with Ensembl v74
   • Mouse (15 GB) with Ensembl v80
   • Rat (5.5 GB) with Ensembl v79
   Tutorial Reference Data Sets
   • chr 5 of hg19 (4.5 GB) for use with the Identification of Variants in a Tumor Sample tutorial
   CHAPTER 4 GETTING START
193. e workflows is that read data are used as input in one end of the workflow, and in the other end of the workflow you get a track-based genome browser view and a table with all the identified variants, which may or may not have been subjected to different kinds of filtering and/or annotation. In this chapter we will discuss what the individual ready-to-use workflows can be used for and go through, step by step, how to run the workflows.
Note: Often you will have to prepare data with one of the two Preparing Raw Data workflows described in section 4.4 before you proceed to Automatic analysis of sequencing data (TAS).
[Figure: Toolbox tree of the Ready-to-Use Workflows, with the Targeted Amplicon Sequencing folder expanded.
General Workflows (TAS): Annotate Variants (TAS); Identify Known Variants in One Sample (TAS).
Somatic Cancer (TAS): Filter Somatic Variants (TAS); Identify Somatic Variants from Tumor Normal Pair (TAS); Identify Variants (TAS); Identify and Annotate Variants (TAS).
Hereditary Disease (TAS): Filter Causal Variants (TAS-HD); Identify Causal Inherited Variants in Family of Four (TAS); Identify Causal Inherited Variants in Trio (TAS); Identify Rare Disease Causing Mutations in Family of Four (TAS); Identify Rare Disease Causing Mutations in Trio (TAS); Identify Variants (TAS-HD).]
194. Figure 6.43: Select the population from the 1000 Genomes project that you would like to use for annotation.
4. In the next wizard (figure 6.44) you can select the target region track and specify the minimum read coverage that should be present in the targeted regions.
[Dialog: Identify and Annotate Variants (WES), QC for Target Sequencing. Configurable parameters: Track of target regions; Minimum coverage (30); Ignore non-specific matches; Ignore broken pairs.]
Figure 6.44: Select the track with targeted regions from your experiment.
5. Click on the button labeled Next, which will take you to the next wizard step (figure 6.45). In this dialog, you have to specify the parameters for the variant detection. For a description of the different parameters that can be adjusted in the variant detection step, we refer to the description of the Low Frequency Variant Detection tool in the Biomedical Genomics Workbench user manual (http://www.clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=Low_Frequency_Variant_Detection.html). If you click on Locked Settings, you will be able to see all parameters used for variant detection in the ready-to-use workflow.
195. ease Causing Mutations in a Family of Four (TAS) tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis.
Select the targeted region file (figure 7.65). The targeted region file specifies which regions have been sequenced when working with whole exome sequencing or targeted amplicon sequencing data. You must provide this file yourself, as it depends on the technology used for sequencing. You can obtain the targeted regions file from the vendor of your targeted sequencing reagents.
Figure 7.65: Select the targeted region file you used for sequencing.
3. Select the sequencing reads from the unaffected sibling (figure 66). The sequencing reads from the different family members are specified one at a time in the appropriate window. The panel in the left side of the wizard shows the kind of input that should be provided. Select by double-clicking on the reads file name, or click once on the file and then on the arrow pointing to the right side in the middle of the wizard. Selec
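To illustrate what a targeted region file contains, the sketch below reads regions from a BED-style file and totals their length. It assumes the file follows the common BED layout (chromosome, 0-based start, exclusive end, separated by whitespace), which the "_Regions_BED" file name in the figure suggests but the text does not state. The function names are hypothetical and the code is independent of the Workbench.

```python
def read_bed_regions(lines):
    """Parse BED-style lines into (chromosome, start, end) tuples.

    Assumes start is 0-based and end is exclusive, as in the BED convention.
    Comment, 'track' and 'browser' header lines are skipped.
    """
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def total_target_length(regions):
    """Sum of region lengths; overlapping regions are counted twice."""
    return sum(end - start for _, start, end in regions)


example = [
    "track name=targets",
    "chr5\t100\t250",
    "chr5\t400\t450",
]
regions = read_bed_regions(example)
print(regions, total_target_length(regions))
```

A quick check like this can reveal an empty or malformed file before it is selected in the wizard.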
196. ect by double-clicking on the reads file name, or click once on the file and then on the arrow pointing to the right side in the middle of the wizard.
Figure 6.81: Specify the sequencing reads for the appropriate family member.
3. Specify which 1000 Genomes population you would like to use (figure 6.82).
Figure 6.82: Select the relevant 1000 Genomes population(s).
4. Specify a target region file for the Indels and Structural Variants tool (figure 6.83). The targeted region file specifies which regions have been sequenced when working with whole exome sequencing or targeted amplicon sequencing data. You must provide this file yourself, as it depends on the technology used for sequencing. You can obtain the targeted regions file from the vendor of your targeted sequencing reagents.
197. ed, but information is also provided for regulatory and non-protein-coding regions. Eleven ready-to-use workflows are available for analysis of whole genome sequencing data (figure 5.1). The concept of the pre-installed ready-to-use workflows is that read data are used as input in one end of the workflow, and in the other end of the workflow you get a track-based genome browser view and a table with all the identified variants, which may or may not have been subjected to different kinds of filtering and/or annotation. In this chapter we will discuss what the individual ready-to-use workflows can be used for and go through, step by step, how to run the workflows.
Note: Often you will have to prepare data with one of the two Preparing Raw Data workflows described in section 4.4 before you proceed to Automatic analysis of sequencing data (WGS).
[Chapter 5: Whole Genome Sequencing (WGS), p. 61]
[Figure: Toolbox tree of the Ready-to-Use Workflows, with the Whole Genome Sequencing folder expanded.
General Workflows (WGS): Annotate Variants (WGS); Identify Known Variants in One Sample (WGS).
Somatic Cancer (WGS): Filter Somatic Variants (WGS); Identify Somatic Variants from Tumor Normal Pair (WGS); Identify Variants (WGS).
Hereditary Disease (WGS): Filter Causal Variants (WGS-HD); Identify Causal Inherited Variants in Family of Four (WGS); Identify Causal Inherited Variants in Trio (WGS); Identify Rare Disease Causing Mutations in F
198. el of nucleotide conservation in the region around each variant.
How to run the Annotate Variants (WGS) workflow
1. Go to the toolbox and select the Annotate Variants (WGS) workflow. In the first wizard step, select the input variant track (figure 5.2).
Figure 5.2: Select the variant track to annotate.
2. Click on the button labeled Next. The only parameter that should be specified by the user is which 1000 Genomes population you use (figure 5.3). This can be done using the drop-down list found in this wizard step. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4).
Figure 5.3: Select the relevant 1000 Genomes population(s).
3. Click on the button labeled Next to go to the last wizard step (figure 5.4).
199. elevant HapMap population(s).
Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters; it is not possible to make any changes. Choose to save the results and click on the button labeled Finish.
Output from the Rare Disease Causing Mutations in a Family of Four (WGS) workflow
Eleven types of output are generated:
   • Read Mapping: One for each family member. The reads mapped to the reference sequence.
   • Filtered Variant Track: One for each family member. The variants identified in each of the family members. The variant track can be opened in table view to see all information about the variants.
   • Read Mapping Report: One for each family member. The report consists of a number of tables and graphs that in different ways provide information about the mapped reads from each sample.
   • De novo variants: Variant track showing de novo variants in the proband. The variant track can be opened in table view to see all information about the variants.
   • Recessive variants: Variant track showing recessive variants in the proband. The variant track can be opened in table view to see all information about the variants.
   • Identified Compound Heterozygous Genes (Proband): Gene track with the identified putative compound heterozygous variants in the proband. The gene track can be opened in table view to see the gene names.
   • Gene List with de novo Variants: Gene track with the identified putative compound
200. Figure 6.9: Specify the targeted region file for the Indels and Structural Variants tool.
When working with targeted data (WES or TAS data), quality checks for the targeted sequencing are included in the workflows. This step is not optional, and you need to specify the targeted regions file adapted to the sequencing technology you used. Choose to use the default settings or to adjust the parameters.
[Dialog: Identify Known Variants in One Sample (WES), QC for Target Sequencing. Configurable parameters: Track of target regions; Minimum coverage (30); Ignore non-specific matches; Ignore broken pairs.]
Figure 6.10: Specify the parameters for the QC for Target Sequencing tool.
The parameters that can be set are:
   • Minimum coverage: provides the length of each target region that has at least this coverage.
   • Ignore non-specific matches: reads that are non-specifically mapped will be ignored.
   • Ignore broken pairs: reads that belong to broken pairs will be ignored.
For
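The Minimum coverage parameter can be pictured with a small calculation: for one target region, count how many positions are covered by at least the chosen number of reads. The Python sketch below is only an illustration of that definition; the function name and the per-base coverage list are hypothetical, and the tool's actual implementation is not described in this text.

```python
def bases_at_min_coverage(per_base_coverage, min_coverage=30):
    """Length of a target region (in bases) covered at least `min_coverage` times.

    `per_base_coverage` is a hypothetical list holding the read depth
    at each position of one target region.
    """
    return sum(1 for depth in per_base_coverage if depth >= min_coverage)


region_depths = [10, 30, 45, 29, 31]
covered = bases_at_min_coverage(region_depths)
print(f"{covered} of {len(region_depths)} bases reach the minimum coverage")
```

With the default of 30 used in the dialog, positions with depth 30, 45 and 31 count, while 10 and 29 do not.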
201. ence sequence, genes, transcripts, coding regions, the mapped reads, the identified variants, and the structural variants (see figure 5.5). Before looking at the identified variants, we recommend that you first take a look at the mapping report to see whether the coverage is sufficient in the regions of interest (e.g. >30). Furthermore, please check that at least 90% of the reads map to the human reference sequence. In case of a targeted experiment, please also check that the majority of reads map to the targeted region.
Next, open the Genome Browser View file (see figure 5.29). The Genome Browser View lists the track of the identified variants in context to the human reference sequence, genes, transcripts, coding regions, and mapped sequencing reads.
Figure 5.29: The Genome Browser View allows easy inspection of the identified s
202. [Screenshot of the Workbench user interface, showing the Toolbox and a Genome Browser view.]
Figure 2.3. At the top you find the Menu Bar, and under that, the Toolbar. The Navigation Area is on the left. Here you can view and organize your data, and from here you can open data to view or select it for launching in applications. Saved data will appear within this area. The Toolbox is available in two locations in the Workbench. One is in a tab of the pane below the Navigation Area. The other is via the menu system. The Toolbox is where Workflows and most tools that play a role in your data analysis are launched fro
203. er the variants have been detected, they are annotated with gene names, amino acid changes, conservation scores, information from clinically relevant variants present in the ClinVar database, and information from common variants present in the common dbSNP, HapMap, and 1000 Genomes databases. Furthermore, a targeted region report is created to inspect the overall coverage and mapping specificity. The difference between Identify and Annotate Variants (TAS-HD) and (WES-HD) is that the "Autodetect paired distances" option has been switched off in the Map Reads to Reference tool for the TAS workflows.
How to run the Identify and Annotate Variants (TAS-HD) workflow
This section summarizes the steps you need to take to start the workflow, each item corresponding to a different wizard window. For more information on the specific tools used in this workflow, see section 3.3.
To run the Identify and Annotate Variants (TAS-HD) workflow, go to: Toolbox | Ready-to-Use Workflows | Targeted Amplicon Sequencing | Hereditary Disease | Identify and Annotate Variants
1. Double-click on the Identify and Annotate Variants (TAS-HD) tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis.
2. Select the sequencing reads you want to analyze (figure 7.81). The panel in the left side of the wizard shows the kind of input that should be provided. Select by double clicking on th
204. ering out variants found in HapMap for the father (figure 4). This can be done using the drop-down list found in this wizard step. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4).
8. Specify the HapMap populations that should be used for filtering out variants found in HapMap for the mother.
9. Specify the HapMap populations that should be used for filtering out variants found in HapMap from the de novo assembly.
Figure 7.73: Specify the proband's gender.
Figure 7.74: Select the relevant HapMap population(s).
10. Specify the parameters for the QC for Target Sequencing tool for the affected child (figure 7.75). When working with targeted data (WES or TAS data), quality checks for the targeted sequencing are included in the workflows. Again, you can choose to use the default settings or you can cho
205. ers for the Fixed Ploidy Variant Detection tool for the unaffected sibling (figure 5.41). The parameters used by the Fixed Ploidy Variant Detection tool can be adjusted. We have optimized the parameters to the individual analyses, but you may want to tweak some of the parameters to fit your particular sequencing data. A good starting point could be to run an analysis with the default settings.
[Dialog: Fixed Ploidy Variant Detection. Configurable parameters: Required variant probability; Ignore broken pairs; Minimum coverage; Minimum count; Minimum frequency.]
Figure 5.41: Specify the parameters for the Fixed Ploidy Variant Detection tool.
The parameters that can be set are:
   • Required variant probability is the minimum probability value of the variant site required for the variant to be called. Note that it is not the minimum value of the probability of the individual variant. For the Fixed Ploidy Variant detector, if a variant site (and not the variant itself) passes the variant probability threshold, then the variant with the highest probability at that site will be reported, even if the probability of that particular variant might be less than the threshold. For example, if the required variant probability is set to 0.9, then the individual probability of the variant called might be less than 0.9, as long as the probability of the entire variant site is greater than 0.9.
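The distinction between the site probability and the probability of an individual variant can be made concrete with a toy example. The Python sketch below is not the tool's algorithm: the function name is hypothetical, the site probability is simply taken as an input (how the tool computes it is not described here), and treating "passes the threshold" as "greater than or equal to" is an assumption.

```python
def call_variant_at_site(site_probability, variant_probabilities,
                         required_probability=0.9):
    """Return (variant, probability) for the most probable variant at a site,
    or None if the site as a whole does not pass the threshold.

    The threshold is applied to the site, not to the individual variant.
    """
    if site_probability < required_probability or not variant_probabilities:
        return None
    best = max(variant_probabilities, key=variant_probabilities.get)
    return best, variant_probabilities[best]


# The site passes (0.95 >= 0.9), so the best variant is reported
# even though its own probability (0.6) is below the threshold.
print(call_variant_at_site(0.95, {"A>G": 0.6, "A>T": 0.35}))
```

The same candidate variants at a site with probability 0.8 would yield no call at all.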
206. es, conservation scores, and information from ClinVar (known variants with medical impact) and dbSNP (all known variants).
How to run the Filter Somatic Variants (TAS) workflow
To run the Filter Somatic Variants (TAS) tool, go to: Toolbox | Ready-to-Use Workflows | Targeted Amplicon Sequencing | Somatic Cancer | Filter Somatic Variants
1. Double-click on the Filter Somatic Variants (TAS) tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis.
2. Next, you will be asked to select the variant track you would like to use for filtering somatic variants. The panel in the left side of the wizard shows the kind of input that should be provided (figure 7.15). Select by double-clicking on the file name, or by clicking once on the file and then clicking on the arrow pointing to the right side in the middle of the wizard. Click on the button labeled Next.
[Figure: Genome Browser view with mRNA annotations, CDS annotations, target regions coverage, and read tracks for sample ERR319085.]
207. es and information from clinically relevant databases. The Filter Causal Variants (WES-HD) ready-to-use workflow accepts variant track files.
How to run the Filter Causal Variants (WES-HD) workflow
To run the Filter Causal Variants (WES-HD) workflow, go to: Toolbox | Ready-to-Use Workflows | Whole Exome Sequencing | Hereditary Disease | Filter Causal Variants (WES-HD)
1. Double-click on the Filter Causal Variants (WES-HD) tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis.
2. Select the variant track you want to use for filtering causal variants (figure 6.52). The panel in the left side of the wizard shows the kind of input that should be provided. Select by double-clicking on the variant track name, or click once on the file and then click on the arrow pointing to the right side in the middle of the wizard.
Figure 6.52: Select the variant track from which you would like to filter somatic variants.
3. Specify which of the 1000 Genomes populations sho
208. [Dialog text: New or missing versions of the reference data used in the cancer tools are available for download to the local CLC_References. Do you want to open Data Management now? Click below to dismiss this dialog for local references until new versions are available. Never show this dialog again.]
Figure 4.2: Notification that new versions of the reference data are available.
On the left-hand side, you can use the drop-down menu to choose where you want to manage the reference data. If you choose "Locally", the Download, Delete, and Apply buttons will work on the local reference data. If you choose "On Server" (only available if you are connected to the server), the buttons will work on the reference data on the server you are connected to (figure 4.5). You can also check how much free space is available for the Reference folder on your local disk or on the server. The drop-down menu also allows you to check which datasets have been downloaded locally or on the server. You can see this in the left panel of the reference data manager. When on the QIAGEN Reference Data Library tile, we can see the list of all available reference data under four headers: Reference Data Sets, Reference Data Elements, Tutorial Reference Data Sets, and Tutorial Reference Data Elements. Two icons indicate whether or not you have already downloaded your data to your Reference folder.
209. es a back-check for all family members. The Identify Rare Disease Causing Mutations in a Trio (TAS) ready-to-use workflow accepts sequencing reads as input.
How to run the Identify Rare Disease Causing Mutations in a Trio (TAS) workflow
This section summarizes the steps you need to take to start the workflow, each item corresponding to a different wizard window. For more information on the specific tools used in this workflow, see section 3.3.
To run the Identify Rare Disease Causing Mutations in a Trio (TAS) workflow, go to: Toolbox | Ready-to-Use Workflows | Targeted Amplicon Sequencing | Hereditary Disease | Identify Rare Disease Causing Mutations in a Trio (TAS)
1. Double-click on the Identify Rare Disease Causing Mutations in a Trio (TAS) tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis.
2. Select the sequencing reads from the father (figure 7.71). The sequencing reads from the different family members are specified one at a time in the appropriate window. The panel in the left side of the wizard shows the kind of input that should be provided. Select by double-clicking on the reads file name, or click once on the file and then on the arrow pointing to the right side in the middle of the wizard.
210. etected in Detail: Annotation track showing the known variants. Like the Overview Variants Detected table, this table provides information about the known variants. Four columns, starting with the sample name and followed by "Read Mapping coverage", "Read Mapping detection", "Read Mapping frequency", and "Read Mapping zygosity", provide the overview of whether or not the known variants have been detected in the sequencing reads, as well as detailed information about the Most Frequent Alternative Allele (labeled MFAA).
5. Genome Browser View Identify Known Variants: A collection of tracks presented together. Shows the annotated variants track together with the human reference sequence, genes, transcripts, coding regions, target regions coverage, the mapped reads, the overview of the detected variants, and the variants detected in detail.
It is a good idea to start by looking at the Target Regions Coverage Report to see whether the coverage is sufficient in the regions of interest (e.g. >30). Please also check that at least 90% of the reads are mapped to the human reference sequence. In case of a targeted experiment, we also recommend that you check that the majority of the reads are mapping to the targeted region.
When you have inspected the target regions coverage report, you can open the Genome Browser View Identify Known Variants file (see 7.13). The Genome Browser View includes an overview track of the known variants and a detailed
211. ew must be saved if you wish to keep the track that has been added.
[Chapter 2: Introduction to User Interface, Workflows and Tracks, p. 22]
[Figure: two Track List views of the same data at different zoom levels, with the reference sequence (hg19), gene and CDS annotations, paired reads, and variant tracks.]
Figure 2.12: Zooming in reveals more details in all tracks.
[Figure: Track List view zoomed in to the nucleotide level.]
212. ew to see all information about the variants.
   • Gene List with Putative Causal Variants: Gene track with the identified putative causal variants in the child. The gene track can be opened in table view to see the gene names.
   • Target Region Coverage Report: One for each family member. The report consists of a number of tables and graphs that in different ways provide information about the mapped reads from each sample.
   • Target Region Coverage: One track for each individual. When opened in table format, it is possible to see a range of different information about the targeted regions, such as target region length, read count, and base count.
   • An Amino Acid Track: Shows the consequences of the variants at the amino acid level in the context of the original amino acid sequence. A variant introducing a stop mutation is illustrated with a red amino acid.
   • Genome Browser View: This is a collection of tracks shown together in a view that makes it easy to compare information from the individual tracks, such as comparing the identified variants with the read mappings and information from databases.
6.3.3 Identify Causal Inherited Variants in Trio (WES)
The Identify Causal Inherited Variants in a Trio (WES) ready-to-use workflow identifies putative disease-causing inherited variants by creating a list of variants present in both affected individuals and subtracting all variants in the unaffected individual. The wor
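The core idea of the trio workflow described above (variants shared by both affected individuals, minus those seen in the unaffected individual) maps onto simple set operations. The Python sketch below only illustrates that logic under simplifying assumptions: variants are represented as hypothetical (chromosome, position, reference, alternative) tuples, zygosity is ignored, and none of the workflow's other filtering or annotation steps are modeled.

```python
def putative_causal_variants(affected_a, affected_b, unaffected):
    """Variants present in both affected individuals but not in the unaffected one."""
    return (set(affected_a) & set(affected_b)) - set(unaffected)


# Hypothetical variants: (chromosome, position, reference allele, alternative allele)
child = {("chr5", 1200, "A", "G"), ("chr5", 3400, "C", "T"), ("chr7", 50, "G", "A")}
affected_parent = {("chr5", 1200, "A", "G"), ("chr5", 3400, "C", "T")}
unaffected_parent = {("chr5", 3400, "C", "T")}

candidates = putative_causal_variants(child, affected_parent, unaffected_parent)
print(candidates)
```

Only the variant shared by the two affected individuals and absent from the unaffected parent remains as a candidate.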
213. f the Biomedical Genomics Workbench is the possibility to add, delete, and replace tools in the preinstalled workflows (the tools found in the Application folder of the toolbox). Moreover, parameter settings can be unlocked or locked with different values. The edited workflow can be installed in the Biomedical Genomics Workbench and Genomics Server, as well as distributed between your collaborators.
When would it be relevant to edit a preinstalled workflow?
Example 1: You have an in-house database with common variants identified in people from your local region. You have imported the database variants as a track and would like to use this database for filtering out common variants instead of using HapMap, 1000 Genomes data, and common dbSNP. Hence, what you would like to do is to modify the "Filter Somatic Variants" workflow and replace the tools "Add Information from HapMap", "Add Information from 1000 Genomes project", and "Add Information from common dbSNP" with "Add Information from External Databases".
[Chapter 9: How to Edit Application Workflows, p. 245]
Example 2: You would like to only see the known cancer-associated variants and non-synonymous variants in the result. You have used the "Create New Filter Criteria" tool to create a new filter criterion and would like to extend the "Identify Somatic Variants from Tumor Normal Pair" workflow to include the "Identify Candidate Variants" tool at the end.
How can I edit a workflow?
Click on Wo
214. ference sequence. Next, it runs a local realignment that is used to improve the subsequent variant detection. Two different variant callers are used: the Low Frequency Variant Detection tool, which calls small insertions, deletions, SNVs, MNVs, and replacements, and the InDel and Structural Variants caller, which calls larger insertions, deletions, translocations, and replacements. At the end of the variant detection, variants that have been detected by the Low Frequency Variant Detection caller with an average base quality smaller than 20 are filtered away. A detailed mapping report is created to inspect the overall coverage and mapping specificity in the targeted regions.
How to run the Identify Variants (WGS) workflow
To run the Identify Variants (WGS) workflow, go to
[Figure: Genome Browser view with gene, mRNA, and CDS annotations, read mappings for the normal and tumor samples, and annotated somatic variants.]
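The final filtering step mentioned above (discarding variants whose average base quality is smaller than 20) amounts to a simple threshold test. The Python sketch below illustrates it on hypothetical records; the dictionary keys and the function name are assumptions made for the example and do not correspond to Workbench column names.

```python
def filter_low_quality(variants, min_average_quality=20):
    """Keep variants whose average base quality is at least the threshold.

    Variants with an average quality smaller than the threshold are
    filtered away.
    """
    return [v for v in variants if v["average_quality"] >= min_average_quality]


# Hypothetical variant records for illustration only.
called = [
    {"position": 1200, "change": "A>G", "average_quality": 34.5},
    {"position": 3400, "change": "C>T", "average_quality": 19.9},
    {"position": 5600, "change": "G>A", "average_quality": 20.0},
]
kept = filter_low_quality(called)
print([v["position"] for v in kept])
```

A variant with an average quality of exactly 20 is kept, since only values smaller than 20 are removed.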
215. fically for cancer research. A core part of the Biomedical Genomics Workbench is the ready-to-use workflows that are bundled with reference data. Workflows have been developed for the following applications:
   • Whole Genome Sequencing
   • Whole Exome Sequencing
   • Targeted Amplicon Sequencing
[Chapter 1: Welcome to Biomedical Genomics Workbench, p. 8]
1.2 Available documentation
The documentation for Biomedical Genomics Workbench can be found here: http://www.clcbio.com/support/downloads/#manuals
Two manuals are available for Biomedical Genomics Workbench:
   • The Biomedical Genomics Workbench application-based manual. This relatively short manual gives a basic introduction to Biomedical Genomics Workbench, which includes a section on how to get started as well as describing how to use the different ready-to-use workflows for analysis of different types of sequencing data.
   • The Biomedical Genomics Workbench reference manual. This comprehensive manual explains the features and functionalities of the Biomedical Genomics Workbench in detail.
If you would like to use a CLC Server, there are two additional manuals that are relevant:
   • The CLC Server administrator manual. This manual is for server administrators and describes how to install and manage CLC Servers.
   • The CLC Server end user manual. This manual is for the users of the CLC Server. In this manual you can find a description of how to use a CLC Server from a CLC Workbench.
1.3 The mate
216. folder of this name did not already exist.
   • The Workbench sets this new location as the place to download reference data to, and the place the ready-to-use workflows should look for reference data.
This action does not:
   • Remove the old CLC_References folder.
   • Remove the contents of the old CLC_References folder, such as previously downloaded data.
If you have previously downloaded data into the CLC_References folder at the old location, you will need to use standard system tools to delete this folder and/or its contents. If you would like to keep the reference data from the old location, you can move it, using standard system tools, into the new CLC_References folder that you just specified. This saves you from needing to download it again.
Note: If you run out of space and realize that the CLC_References folder should be stored somewhere else, you can do this by choosing a new location, then manually moving the already downloaded files to that new location and restarting the Workbench. The downloaded references file will then be updated with all the new references.
4.1.2 Space requirements
The total size of the complete reference data set you can download is approximately 200 GB. The amount of time it will take to download this amount of data depends on your network connection. It can take several hours, or longer on slower connections. For reference, in August 2015 the maximum size of each individual reference data file for Homo
217. Figure 7.4: Check the settings and save your results. In this wizard step you can check the selected settings by clicking on the button labeled Preview All Parameters. In the Preview All Parameters wizard you can only check the settings; if you wish to make changes, you have to use the Previous button from the wizard to edit parameters in the relevant windows. 4. Choose to Save your results and click on the button labeled Finish. Output from the Annotate Variants (TAS) workflow. Two types of output are generated: 1. Annotated Variants: Annotation track showing the variants. Hold the mouse over one of the variants, or right-click on the variant, and a tooltip will appear with detailed information about the variant. 2. Genome Browser View Annotated Variants: A collection of tracks presented together. Shows the annotated variants track together with the human reference sequence, genes, transcripts, coding regions, and variants detected in dbSNP, ClinVar, 1000 Genomes, and PhastCons conservation scores (see figure 7.5). [Screenshot: Genome Browser View with the tracks Homo_sapiens_ensembl_v74_Genes, Homo_sapiens_ensembl_v74_mRNA, Homo_sapiens_ensembl_v74_CDS, ERR319085 Variants Annotated, and dbsnp_v138.]
218. g report to see whether the coverage is sufficient in the regions of interest (e.g. > 30). Please also check that at least 90% of the reads are mapped to the human reference sequence. When this has been done, you can open the Genome Browser View file (see 5.11). The Genome Browser View includes the overview track of known variants and the detailed result track in the context of the human reference sequence, genes, transcripts, coding regions, targeted regions, and mapped sequencing reads. [Screenshot: Genome Browser View with the tracks Homo_sapiens_ensembl_v74_Gene (gene annotations), Homo_sapiens_ensembl_v74_mRNA (mRNA annotations), Homo_sapiens_ensembl_v74_CDS (CDS annotations), tumor_01 paired Read Mapping, tumor_01 paired Overview Variants Detected, and tumor_01 paired Variants Detected in Detail.] Figure 5.11: Genome Browser View that allows inspection of the
219. g the drop-down list found in this wizard step. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4). 9. Specify the HapMap populations that should be used for filtering out variants found in HapMap for the father. 10. Specify the HapMap populations that should be used for filtering out variants found in HapMap from the de novo assembly. 11. Specify the Fixed Ploidy Variant Detection settings that should be used for the affected child. 12. Specify the Fixed Ploidy Variant Detection settings that should be used for the mother. 13. Specify the Fixed Ploidy Variant Detection settings that should be used for the father. 14. [Screenshot: Remove Variants Found in HapMap wizard step with 12 selected HapMap database tracks: HAPMAP_phase_3_MKK, HAPMAP_phase_3_CHD, HAPMAP_phase_3_TSI, HAPMAP_phase_3_CHB, HAPMAP_phase_3_GIH, HAPMAP_phase_3_HCB, HAPMAP_phase_3_LWK, HAPMAP_phase_3_CEU, HAPMAP_phase_3_MEX, HAPMAP_phase_3_YRI, HAPMAP_phase_3_JPT, HAPMAP_phase_3_ASW.] Figure 5.43: Select the r
220. gion file (figure 6.79). When working with targeted data (WES or TAS data), quality checks for the targeted sequencing are included in the workflows. Again, you can choose to use the default settings or adjust the parameters. The parameters that can be set are: • Minimum coverage: provides the length of each target region that has at least this coverage. • Ignore non-specific matches: reads that are non-specifically mapped will be ignored. • Ignore broken pairs: reads that belong to broken pairs will be ignored. For more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkben/current/index.php?manual=QC_Target_Sequencing.html Specify the parameters for the Fixed Ploidy Variant Detection tool (figure 6.80). [Screenshot: QC for Target Sequencing, Configurable Parameters: Track of Target Regions (S0293689_Regions_BED), Minimum coverage (30), Ignore non-specific matches, Ignore broken pairs.] Figure 6.79: Specify the parameters for the QC for Target Sequencing tool. The parameters used by the Fixed Ploidy Variant Detection tool can be adjusted. We have optimized the parameters to the individual analyses, but you may want to tweak some of the parameters to fit your particular sequencing data. A good starting point could be to run an analysis with the default settings
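The Minimum coverage parameter of the QC for Target Sequencing tool can be illustrated with a small Python sketch. This is not the tool's implementation: the function name and the per-base depth lists are hypothetical, and the sketch simply counts, for each target region, how many positions are covered by at least the chosen number of reads.

```python
def covered_length(per_base_coverage, minimum_coverage=30):
    """Count the positions in one target region whose depth reaches the threshold."""
    return sum(1 for depth in per_base_coverage if depth >= minimum_coverage)


if __name__ == "__main__":
    # Hypothetical per-base read depths for two small target regions.
    regions = {
        "region_1": [10, 30, 45, 29, 31],
        "region_2": [5, 8, 12, 20],
    }
    for name, depths in regions.items():
        length = covered_length(depths)
        print(f"{name}: {length} of {len(depths)} positions at >= 30x")
```

With the hypothetical data above, region_1 has three positions at or above the threshold and region_2 has none, so region_2 would stand out as insufficiently covered.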
221. [Screenshot residue: target regions track S0293689_Regions_BED, Locked Settings.] Figure 7.83: Specify the parameters for the Indels and Structural Variants tool. 5. Specify the parameters for the QC for Target Sequencing tool, including a target region file (figure 7.84). [Screenshot: QC for Target Sequencing, Configurable Parameters: Track of Target Regions (S0293689_Regions_BED), Minimum coverage (30), Ignore non-specific matches, Ignore broken pairs.] Figure 7.84: Specify the parameters for the QC for Target Sequencing tool. The parameters that can be set are: • Minimum coverage: provides the length of each target region that has at least this coverage. • Ignore non-specific matches: reads that are non-specifically mapped will be ignored. • Ignore broken pairs: reads that belong to broken pairs will be ignored. For more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkben/current/index.php?manual=QC_Target_Sequencing.html 6. Specify the Fixed Ploidy Variant Detection settings, including a target region file (figure 7.85). The parameters used by the Fixed Ploidy Variant Detection tool can be adjusted. We have optimized the parameters to the individual analyses, but you may want to tweak some of the parameters to fit your particular sequencing data. A good starting point could be to run
222. gure 8.3. This can be done using the drop-down list found in this wizard step. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4). [Screenshot: Annotate Variants (WTS) wizard, 1000 Genomes step, with 1000GENOMES_phase_1_EUR selected.] Figure 8.3: Select the relevant 1000 Genomes population(s). 3. Click on the button labeled Next to go to the last wizard step (figure 8.4). [Screenshot: Annotate Variants (WTS) wizard, Result handling step.] Figure 8.4: Check the settings and save your results. In this wizard step you can check the selected settings by clicking on the button labeled Preview All Parameters. In the Preview All Parameters wizard you can only check the settings; it is not possible to make any changes at this point. 4. Choose to Save your results and click on the button labeled Finish. Two types of output are generated: 1. Annotated Variants: Annotation track showing the variants. Hold the mouse over one
223. [Screenshot residue: track list ending with hase_1 and PhastCons_conservation_scores_hg19.] Figure 5.18: The Genome Browser View showing the annotated somatic variants together with a range of other tracks. The track with the conservation scores allows you to see the level of nucleotide conservation (from a multiple alignment with many vertebrates) in the region around each variant. Mapped sequencing reads, as well as other tracks, can easily be added to the Genome Browser View. If you click on the annotated variant track in the Genome Browser View, a table will be shown that includes all variants and the added information/annotations. This is shown in figure 5.19. [Screenshot: Genome Browser View with the tracks Homo_sapiens_sequence_hg19, Homo_sapiens_ensembl_v73_Genes, Homo_sapiens_ensembl_v73_mRNA, Homo_sapiens_ensembl_v73_CDS, ERR319087 single Reads (locally realigned), Somatic Candidate Variants, Cosmic_v67, ClinVar_20130930, and PhastCons_conservation_scores_hg19, with a table view below.]
224. hase_3_LWK, HAPMAP_phase_3_CEU, HAPMAP_phase_3_MEX, HAPMAP_phase_3_YRI, HAPMAP_phase_3_JPT, HAPMAP_phase_3_ASW. Figure 7.67: Select the relevant HapMap population(s). 8. Specify the affected child's gender (figure 7.68). [Screenshot: Trio Analysis, Configurable Parameters: Child gender (Female).] Figure 7.68: Specify the proband's gender. 9. Specify the HapMap populations that should be used for filtering out variants found in HapMap for the father. 10. Specify the HapMap populations that should be used for filtering out variants found in HapMap for the mother. 11. Specify the parameters for the QC for Target Sequencing tool for the sibling (figure 7.69). When working with targeted data (WES or TAS data), quality checks for the targeted sequencing are included in the workflows. Again, you can choose to use the default settings or adjust the parameters. [Screenshot: QC for Target Sequencing (proband), Configurable Parameters: Minimum coverage (30), Ignore non-specific matches, Ignore broken pairs.] Figure 7.69: Specify the parameters for the QC for Target Sequencing tool. The parameters that can be set are: • Minimum coverage: provides the length of each target regi
225. hat appear in at least 1% of the population or are 100% non-reference (filename: snp142Common.txt.gz). • ClinVar database variants (NCBI): ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/ ClinVar is designed to provide a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence (filename: clinvar_20150629.vcf). • PhastCons Conservation Scores (UCSC): http://hgdownload.cse.ucsc.edu/goldenPath/hg38/phastCons20way/ Conservation track of UCSC from a multiple alignment of 100 species and measurements of evolutionary conservation using the phastCons algorithm from the PHAST package (filename: hg38.phastCons20way.wigFix). • Human Gene Ontology (GO) slim file (EBI): http://www.ebi.ac.uk/QuickGO/GMultiTerm Gene Ontology file in slim format (only high-level GO terms), annotated for the GO categories Molecular Function, Biological Process, and Cellular Component on human genes. The file was made using the QuickGO tool from the EBI (http://www.ebi.ac.uk/QuickGO/GMultiTerm). • Target primers and target regions (QIAGEN v2): https://www.qiagen.com/dk/shop/sample-technologies/dna-sample-technologies/genomic-dna/generead-dnaseq-gene-panels-v2 These primers and regions are defined and provided by QIAGEN (GeneRead DNAseq Targeted Panels V2). Mouse (Mm10): • Mouse reference sequence (ENSEMBL): ftp://ftp.ensembl.org/pub/release-80/fasta/m
226. hat can be set are: • Required variant probability: the minimum probability value of the variant site required for the variant to be called. Note that it is not the minimum value of the probability of the individual variant. For the Fixed Ploidy Variant detector, if a variant site (and not the variant itself) passes the variant probability threshold, then the variant with the highest probability at that site will be reported, even if the probability of that particular variant is less than the threshold. For example, if the required variant probability is set to 0.9, then the individual probability of the variant called might be less than 0.9, as long as the probability of the entire variant site is greater than 0.9. [Screenshot: Fixed Ploidy Variant Detection, Configurable Parameters: Required variant probability (50.0), Ignore broken pairs, Minimum coverage, Minimum count, Minimum frequency.] Figure 6.76: Specify the parameters for the Fixed Ploidy Variant Detection tool. • Ignore broken pairs: When ticked, reads from broken pairs are ignored. Broken pairs may arise for a number of reasons, one being erroneous mapping of the reads. In general, variants based on broken pair reads are likely to be less reliable, so ignoring them may reduce the number of spurious variants called. However, broken pairs may also arise for biological reasons, e.g.
227. hat the majority of reads are mapping to the targeted region. Afterwards, please open the Genome Browser View file (see 6.40). The Genome Browser View includes the track of identified variants in the context of the human reference sequence, genes, transcripts, coding regions, targeted regions, and mapped sequencing reads. [Screenshot: Genome Browser View with the tracks Homo_sapiens_sequence_hg19, Homo_sapiens_ensembl_v73_Genes, Homo_sapiens_ensembl_v73_CDS, and SRR719300_1 paired trimmed paired Reads (locally realigned).] Figure 6.40: The Genome Browser View allows you to inspect the identified variants in the context of the human genome. By double-clicking on the variant track in the Genome Browser View, a table will be shown that includes information about all identified variants (see 6.41). If you would like to change the reference sequence used for mapping, or the human genes, please use the Data Management function. 6.2.4 Identify and Annotate Variants (WES). The Identify and Annotate Variants (WES) tool should be used
228. he individual variant. For the Fixed Ploidy Variant detector, if a variant site (and not the variant itself) passes the variant probability threshold, then the variant with the highest probability at that site will be reported, even if the probability of that particular variant is less than the threshold. For example, if the required variant probability is set to 0.9, then the individual probability of the variant called might be less than 0.9, as long as the probability of the entire variant site is greater than 0.9. [Screenshot: Fixed Ploidy Variant Detection, Configurable Parameters: Required variant probability (50.0), Ignore broken pairs, Minimum coverage, Minimum count, Minimum frequency.] Figure 7.80: Specify the parameters for the Fixed Ploidy Variant Detection tool. • Ignore broken pairs: When ticked, reads from broken pairs are ignored. Broken pairs may arise for a number of reasons, one being erroneous mapping of the reads. In general, variants based on broken pair reads are likely to be less reliable, so ignoring them may reduce the number of spurious variants called. However, broken pairs may also arise for biological reasons, e.g. due to structural variants, and if they are ignored some true variants may go undetected. Please note that ignored broken pair reads will not be considered for any non-specific match filters. • Minimum c
229. he parameters for the Fixed Ploidy Variant Detection tool. The parameters that can be set are: • Required variant probability: the minimum probability value of the variant site required for the variant to be called. Note that it is not the minimum value of the probability of the individual variant. For the Fixed Ploidy Variant detector, if a variant site (and not the variant itself) passes the variant probability threshold, then the variant with the highest probability at that site will be reported, even if the probability of that particular variant is less than the threshold. For example, if the required variant probability is set to 0.9, then the individual probability of the variant called might be less than 0.9, as long as the probability of the entire variant site is greater than 0.9. • Ignore broken pairs: When ticked, reads from broken pairs are ignored. Broken pairs may arise for a number of reasons, one being erroneous mapping of the reads. In general, variants based on broken pair reads are likely to be less reliable, so ignoring them may reduce the number of spurious variants called. However, broken pairs may also arise for biological reasons, e.g. due to structural variants, and if they are ignored some true variants may go undetected. Please note that ignored broken pair reads will not be considered for any non-specific match filters. • Minimum coverage: Only variants in regions covered by at least this many reads are called.
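The difference between the probability of a variant site and the probability of an individual variant can be made concrete with a small Python sketch. It is only an illustration of the rule described above, not the detector's actual model: the function name is hypothetical, and the site probability is assumed here to be the sum of the probabilities of the mutually exclusive candidate variants at that site.

```python
def call_site(variant_probabilities, required_variant_probability=0.9):
    """Apply the site-level threshold, then report the most probable variant.

    `variant_probabilities` maps each candidate variant at one site to its
    individual probability. Summing them to get the site probability is an
    assumption made for this sketch.
    """
    site_probability = sum(variant_probabilities.values())
    if site_probability < required_variant_probability:
        return None  # the site does not pass, so nothing is called
    best_variant = max(variant_probabilities, key=variant_probabilities.get)
    return best_variant, variant_probabilities[best_variant], site_probability


if __name__ == "__main__":
    # Hypothetical site with two candidate variants.
    result = call_site({"A>G": 0.5, "A>T": 0.45})
    print(result)
```

In this hypothetical case the site passes the 0.9 threshold, and A>G is reported although its own probability (0.5) is below the threshold, which is the situation the parameter description warns about.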
230. [Screenshot residue: Identify Somatic Variants from Tumor Normal Pair (WGS) wizard, Remove Germline Variants step, Configurable Parameters: Keep variants with control read count below (2).] Figure 5.23: Specify setting for removal of germline variants. 5. In the next wizard step you can check the selected settings by clicking on the button labeled Preview All Parameters (figure 5.24). [Screenshot: Identify Somatic Variants from Tumor Normal Pair (WGS) wizard, Result handling step.] Figure 5.24: Check the parameters and save the results. In the Preview All Parameters wizard you can only check the settings; if you wish to make changes, you have to use the Previous button from the wizard to edit parameters in the relevant windows. At the bottom of this wizard there are two buttons regarding export functions; one button allows specification
231. heterozygous Variants in the proband. The gene track can be opened in table view to see the gene names. • Gene List with recessive Variants: Gene track with the identified recessive variants in the proband. The gene track can be opened in table view to see the gene names. • Genome Browser View: This is a collection of tracks shown together in a view that makes it easy to compare information from the individual tracks, for example to compare the identified variants with the read mappings and information from databases. • De novo Mutations Amino Acid Track • Recessive Variants Amino Acid Track. 5.3.5 Identify Rare Disease Causing Mutations in Trio (WGS). The Identify Rare Disease Causing Mutations in a Trio (WGS) workflow identifies de novo and compound heterozygous variants from a Trio. The workflow includes a back-check for all family members. The Identify Rare Disease Causing Mutations in a Trio (WGS) ready-to-use workflow accepts sequencing reads as input. How to run the Identify Rare Disease Causing Mutations in a Trio (WGS) workflow. This section recapitulates the steps you need to take to start the workflow, each item corresponding to a different wizard window. For more information on the specific tools used in this workflow, see section 3.3. To run the Identify Rare Disease Causing Mutations in a Trio (WGS) workflow, go to: Toolbox | Ready-to-Use Workflows | Whole Genome Sequencing | Hereditary
232. [Screenshot residue: wizard step list (hoose where to run, Select sequencing reads, 1000 Genomes) with 1000GENOMES_phase_1_EUR selected.] Figure 7.43: Select the population from the 1000 Genomes project that you would like to use for annotation. minimum read coverage that should be present in the targeted regions. [Screenshot: Identify and Annotate Variants (TAS) wizard, QC for Target Sequencing step, Configurable Parameters: Track of Target Regions (target regions), Minimum coverage (30), Ignore non-specific matches, Ignore broken pairs.] Figure 7.44: Select the track with targeted regions from your experiment. 5. Click on the button labeled Next, which will take you to the next wizard step (figure 7.45). In this dialog you have to specify the parameters for the variant detection. For a description of the different parameters that can be adjusted in the variant detection step, we refer to the description of the Low Frequency Variant Detection tool in the Biomedical Genomics Workbench user manual (http://www.clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=Low_Frequency_Variant_Detection.html). If you click on Locked Settings, you will be able to see all parameters used for variant detection in the ready-to-use workflow. 6. Click on
233. http://hapmap.ncbi.nlm.nih.gov/ Please note that there are 12 different files/tracks to be downloaded, one file for each population. It is recommended that you configure your workflows with the file from the population that best matches the ethnicity of the patient from whom the sample was taken. You can find more about the population codes, which are part of the filename, here: http://www.sanger.ac.uk/resources/downloads/human/hapmap3.html • Variants found by the 1000 Genomes Project (ENSEMBL): ftp://ftp.ensembl.org/pub/current_variation/gvf/homo_sapiens/ The 1000 Genomes Project Phase 1 created an integrated map of genetic variations from 1092 human genomes (et al., 2012). Please note that there are 4 different files/tracks to be downloaded, one file for each population. It is recommended that you configure your workflows with the file from the population that best matches the ethnicity of the patient from whom the sample was taken. You can learn more about the population codes that are part of the filename here: http://www.1000genomes.org • dbSNP variants (UCSC): http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/snp138.txt.gz Human variants present in the Single Nucleotide Polymorphism Database (dbSNP), which includes smaller insertions, deletions, replacements, SNPs and MNVs. Please note that most variants in dbSNP are not validated and that anybody can submit data to dbSNP. The c
234. iants (TAS-HD) workflow, go to: Toolbox | Ready-to-Use Workflows | Targeted Amplicon Sequencing | Hereditary Disease | Filter Candidate Variants (TAS-HD). 1. Double-click on the Filter Causal Variants (TAS-HD) tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis. 2. Select the variant track you want to use for filtering causal variants (figure 7.52). The panel on the left side of the wizard shows the kind of input that should be provided. Select by double-clicking on the variant track name, or click once on the file and then click on the arrow pointing to the right in the middle of the wizard. [Screenshot: Filter Causal Variants wizard, Select variant tracks step, with the Navigation Area on the left and the selected elements on the right.] Figure 7.52: Select the variant track from which you would like to filter somatic variants. 3. Specify which of the 1000 Genomes populations should be used for annotation (figure 7.53).
235. iants, and if they are ignored some true variants may go undetected. Please note that ignored broken pair reads will not be considered for any non-specific match filters. • Minimum coverage: Only variants in regions covered by at least this many reads are called. • Minimum count: Only variants that are present in at least this many reads are called. • Minimum frequency: Only variants that are present at least at the specified frequency (calculated as count/coverage) are called. For more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkben/current/index.php?manual=Fixed_Ploidy_Variant_Detection.html 5. Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters; it is not possible to make any changes. Choose to save the results and click on the button labeled Finish. Output from the Identify Variants (WES-HD) workflow. Four types of output are generated: • A Reads Track: Read Mapping • A Filtered Variant Track: Identified variants • A Coverage Report • A Per-region Statistics Track. 6.3 Identify and Annotate Variants (WES-HD). The Identify and Annotate Variants (WES-HD) tool should be used to identify and annotate variants in one sample. The tool consists of a workflow that is a combination of the Identify Variants and the Annotate Variants workflows. The tool runs an internal wor
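The three thresholds described above (Minimum coverage, Minimum count, and Minimum frequency, with frequency calculated as count/coverage) can be sketched as a simple filter. This is a Python illustration only: the class, the function, and the threshold values are hypothetical, and expressing the frequency in percent is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Read support for one candidate variant (hypothetical structure)."""
    name: str
    count: int     # reads that carry the variant
    coverage: int  # all reads that cover the position


def passes_filters(candidate, min_coverage, min_count, min_frequency_percent):
    """Return True if the candidate meets all three thresholds."""
    if candidate.coverage <= 0 or candidate.coverage < min_coverage:
        return False
    if candidate.count < min_count:
        return False
    # Frequency is count/coverage, expressed in percent here (assumption).
    frequency_percent = 100.0 * candidate.count / candidate.coverage
    return frequency_percent >= min_frequency_percent


if __name__ == "__main__":
    # Hypothetical thresholds and candidates, chosen only for illustration.
    candidates = [Candidate("v1", 6, 20), Candidate("v2", 2, 40), Candidate("v3", 3, 5)]
    for c in candidates:
        print(c.name, passes_filters(c, min_coverage=10, min_count=2, min_frequency_percent=20.0))
```

Checking coverage first also avoids dividing by zero for positions without any reads.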
236. [Screenshot residue: Filter Somatic Variants (WES) wizard, 1000 Genomes step, with 1000GENOMES_phase_1_EUR selected.] Figure 6.16: Specify which 1000 Genomes population to use for annotation. [Screenshot: Filter Somatic Variants (WES) wizard, Remove Variants Outside Targeted Regions step, Targeted region track: target regions.] Figure 6.17: Select your target regions track. 5. The next wizard step will once again allow you to specify the 1000 Genomes population that should be used, this time for filtering out variants found in the 1000 Genomes project (figure 6.18). Click on the button labeled Next. 6. The next wizard step (figure 6.19) concerns removal of variants found in the HapMap database. Select the population you would like to use from the drop-down list. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4). 7. Click on the button labeled Next to go to the last wizard step (shown in figure 6.20).
237. iants workflow should be extended by the Identify Candidate Tool configured with the Filter Criterion. The Biomedical Genomics Workbench reference manual has a chapter that describes this in detail (http://www.clcbio.com/support/downloads/#manuals, see chapter Workflows for more information on how pre-installed workflows can be extended and/or edited). Note: Sometimes the databases (e.g. dbSNP) are updated with a newer version, or maybe you have your own version of the database. In such cases you may wish to change one of the databases used. This can be done with the Data Management function, which is described in section 4.1.4. 8.3 Compare variants in DNA and RNA. Integrated analysis of genomic and transcriptomic sequencing data is a powerful tool that can help increase our current understanding of human genomic variants. The Compare variants in DNA and RNA ready-to-use workflow identifies variants in DNA and RNA and studies the relationship between the identified genomic and transcriptomic variants. To run the ready-to-use workflow, go to: Toolbox | Ready-to-Use Workflows | Whole Transcriptome Sequencing | Human, Mouse or Rat | Compare variants in DNA and RNA. 1. Double-click on the Compare variants in DNA and RNA ready-to-use workflow to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis. Click on the button labeled Next. 2. Select
238. [Screenshot residue: workflow editor showing a copy of the Identify Known Variants in One Sample workflow, next to the Workbench Toolbox listing the Ready-to-Use Workflows (Preparing Raw Data, Whole Genome Sequencing, Whole Exome Sequencing, Targeted Amplicon Sequencing, Whole Transcriptome Sequencing) and the Tools folders (Genome Browser, Quality Control, Preparing Raw Data, Resequencing Analysis, Add Information to Variants, Remove Variants, Add Information to Genes, Compare Samples, Identify Candidate Variants, Identify Candidate Genes, Expression Analysis, Helper Tools, Cloning and Restriction Sites, Sanger Sequencing, Epigenomics Analysis, Workflows, Legacy Tools).]
239. id level. A high conservation level at the position of the variant between many vertebrates or mammals can also be a hint that this region could have an important functional role, and variants with a conservation score of more than 0.9 (PhastCons score) should be prioritized higher. Further filtering of the variants based on their annotations can be done using the table filter at the top of the table. If you wish to always apply the same filter criteria, the Create new Filter Criteria tool should be used to specify this filter, and the Identify and Annotate Variants (WES) workflow should be extended by the Identify Candidate Tool configured with the Filter Criterion. See the reference manual for more information on how pre-installed workflows can be edited. Please note that if none of the variants are present in ClinVar or dbSNP, the corresponding annotation column headers are missing from the result. If you would like to change the databases or the database version used, please use the Data Management function. 6.3 Hereditary Disease (WES). 6.3.1 Filter Causal Variants (WES-HD). If you are analyzing a list of variants, you can use the Filter Causal Variants (WES-HD) ready-to-use workflow to remove variants that are outside the target region as well as common variants present in publicly available databases. The workflow will annotate the remaining variants with gene names, conservation scor
240. [Screenshot residue: wizard Result handling step (iew All Parameters; Low Frequency Variant Detection; 1000 Genomes; QC for Target Sequencing; Remove Variants Outside Targeted Regions; Add Information from 1000 Genomes Project; Add Information from HapMap; Result handling).] Figure 6.49: Check the settings and save your results. A good place to start is to take a look at the mapping report to see whether the coverage is sufficient in the regions of interest (e.g. > 30). Furthermore, please check that at least 90% of the reads are mapped to the human reference sequence. In case of a targeted experiment, please also check that the majority of the reads are mapping to the targeted region. Next, open the Genome Browser View file (see figure 6.50). The Genome Browser View includes a track of the identified annotated variants in the context of the human reference sequence, genes, transcripts, coding regions, targeted regions, mapped sequencing reads, clinically relevant variants in the ClinVar database, as well as common variants in the common dbSNP, HapMap, and 1000 Genomes databases. [Screenshot: Genome Browser View with the tracks Homo_sapiens_sequence_hg19, Homo_sapiens_ensembl_v73_Genes, Homo_sapiens_ensembl_v73_mRNA, Homo_sapiens_ensembl_v73_CDS, and ERR319087 s
241. ify the parameters for the QC for Target Sequencing tool.
• Ignore non-specific matches: reads that are non-specifically mapped will be ignored.
• Ignore broken pairs: reads that belong to broken pairs will be ignored.
For more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=QC_Target_Sequencing.html
5. Click on the button labeled Next and specify the track with the known variants that should be identified in your sample (figure 7.11).
Figure 7.11: Specify the track with the known variants that should be identified.
The parameters that can be set are:
• Minimum coverage: The minimum number of reads covering the position of the variant that is required to set Sufficient Coverage to YES.
• Detection frequency: The minimum allele frequency that is required to annotate a variant as being present in the sample. The same threshold will also be used to determine if a variant is homozygous or heterozygous. In case the most frequent alternative alle
242. ify the parameters for the QC for Target Sequencing tool for the affected parent.
10. Specify the parameters for the Fixed Ploidy Variant Detection tool for the unaffected parent (figure 7.64). The parameters used by the Fixed Ploidy Variant Detection tool can be adjusted. We have optimized the parameters to the individual analyses, but you may want to tweak some of the parameters to fit your particular sequencing data. A good starting point could be to run an analysis with the default settings.
Figure 7.64: Specify the parameters for the Fixed Ploidy Variant Detection tool.
The parameters that can be set are:
• Required variant probability: the minimum probability value of the variant site required for the variant to be called. Note that it is not the minimum value of the probability of the individual variant. For the Fixed Ploidy Variant detector, if a variant site (and not the variant itself) passes the variant probability threshold, then the variant with the highest probability at that site will be reported, even if the probability of that particular variant might be less than the threshold. For example, if the required variant probability is set to 0.9, then the individual probability of the variant ca
243. ill be removed.
7. Click on the button labeled Next, which will take you to the next wizard step (figure 6.47). Once again, select the relevant population from the 1000 Genomes project. This will add information from the 1000 Genomes project to your variants.
Figure 6.47: Select the relevant population from the 1000 Genomes project. This will add information from the 1000 Genomes project to your variants.
8. Click on the button labeled Next, which will take you to the next wizard step (figure 6.48). At this step you can select a population from the HapMap database. This will add information from the HapMap database to your variants.
244. in the table by clicking on this row this specific position in the track will be brought into focus.
245. Figure 4.18: Locate the folder in the Navigation Area that you have just created and save your imported reads in the folder.
4.4.2 Import adapter trim list
One important part of the preparation of raw data is adapter trimming. To be able to trim off the adapters, an adapter trim list is required. To obtain this file, you will have to contact the vendor and ask them to send the adapter trim list file to you. When an adapter trim list has been supplied by the vendor of the enrichment kit and sequencing machine, it must be formatted as a .xls, .xlsx, or .csv list and imported into the Biomedical Genomics Workbench. The adapter trim list can be imported by clicking on the button labeled Import in the Toolbar. Select standard import (figure 4.20) and find the adapter trim list you want to import. Select Trim adapter list (.xls, .xlsx, .csv) in the Files of type drop-down list in the Import wizard. Click on the button labeled Next and select where you wish to save the adapter trim list. You can also create your own adapter trim list; see http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=Adapter_trimming.html
246. ing reagents.
Figure 8.26: Specify the target region for the Indels and Structural Variants tool.
4. Set the parameters for the Low Frequency Variant Detection step (see figure 8.27). For a description of the different parameters that can be adjusted in the variant detection step, see http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=Low_Frequency_Variant_Detection.html. If you click on Locked Settings, you will be able to see all parameters used for variant detection in the ready-to-use workflow.
[Wizard screenshot, Low Frequency Variant Detection step. Configurable parameters visible include: Required significance, Ignore positions with coverage above, Restrict calling to target regions, Ignore broken pairs, Ignore non-specific matches, Minimum read length, Minimum coverage, Minimum count, Minimum frequency, Base quality filter, Read direction filter, Direction frequency, Relative read direction filter.]
247. Figure 6.50: Genome Browser View to inspect identified variants in the context of the human genome and external databases.
To see the level of nucleotide conservation (from a multiple alignment with many vertebrates) in the region around each variant, a track with conservation scores is added as well. By double-clicking on the annotated variant track in the Genome Browser View, a table will be shown that includes all variants and the added information and annotations (see figure 6.51).
248. inimum fold change value. You can also specify the minimum allowed fold change value as a number greater than zero. If you do not want any filtering based on fold change, enter 0.
[Wizard screenshot, Extract Differentially Expressed Genes step. Configurable parameters shown: Type of P-value: FDR p-value correction; Maximum p-value: 0.05; Minimum fold change value: 1.5.]
Figure 8.32: Select the parameters for extraction of differentially expressed genes.
3. Click on the button labeled Next to go to the next step, where you can choose the gene ontology type you wish to use.
4. In the next step you can choose to preview the settings and save the results (see figure 8.34).
5. Click on the button labeled Preview All Parameters if you would like to preview the settings. The parameter settings can be viewed but not edited in this view.
6. Press OK, specify where to save the results, and then click on the button labeled Finish to run the analysis.
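The two thresholds described above (maximum p-value and minimum fold change, with 0 switching the fold change filter off) can be illustrated with a minimal Python sketch. This is not the Workbench's implementation: the `GeneResult` type, the function name, the use of the absolute fold change, and the inclusive comparisons are all assumptions made for illustration. The default values are simply the ones visible in the wizard screenshot.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    """Hypothetical record for one gene in an experiment (not a Workbench type)."""
    name: str
    p_value: float      # p-value of the chosen type, e.g. FDR corrected
    fold_change: float  # signed fold change; the sign gives the direction


def extract_differentially_expressed(genes, max_p_value=0.05, min_fold_change=1.5):
    """Keep genes whose p-value is at most max_p_value and whose absolute
    fold change is at least min_fold_change.

    Assumption: a min_fold_change of 0 disables fold change filtering,
    mirroring the "enter 0" instruction in the text.
    """
    if min_fold_change < 0:
        raise ValueError("min_fold_change must be zero or greater")
    selected = []
    for gene in genes:
        if gene.p_value > max_p_value:
            continue  # not significant at the chosen threshold
        if min_fold_change > 0 and abs(gene.fold_change) < min_fold_change:
            continue  # change in expression too small
        selected.append(gene)
    return selected


if __name__ == "__main__":
    demo = [
        GeneResult("A", 0.01, 2.0),
        GeneResult("B", 0.20, 3.0),
        GeneResult("C", 0.03, -1.2),
        GeneResult("D", 0.04, -4.0),
    ]
    print([g.name for g in extract_differentially_expressed(demo)])
```

Using the absolute value treats up- and down-regulated genes symmetrically; whether the Workbench does the same is not stated in this excerpt.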
249. ins 11 Plugins V Appendix A Reference data overview B Mini dictionary Bibliography VI Index
Part I Introduction
Chapter 1 Welcome to Biomedical Genomics Workbench
Contents
1.1 Introduction to Biomedical Genomics Workbench
1.2 Available documentation
1.3 The material covered by this manual
1.4 We welcome your comments and suggestions
1.5 Contact information
Welcome to Biomedical Genomics Workbench 2.5.1, a software package supporting your daily bioinformatics work. High-throughput sequencing is currently revolutionizing both the cancer research and diagnostics areas. Since the introduction of next-generation sequencing (NGS) technologies, the field has quickly moved forward with rapid improvements in sequencing capacity and the time required for data production. As a result, in many studies the sequencing process is no longer the bottleneck. The bottleneck now is the bioinformatic analysis of the data. Biomedical Genomics Workbench has been developed to address the bioinformatic bottleneck by offering automated workflows that cover all steps from the initial data processing and quality assurance through data analyses, annotation, and reporting.
1.1 Introduction to Biomedical Genomics Workbench
Biomedical Genomics Workbench has been developed speci
250. Figure 6.83: Specify the parameters for the Indels and Structural Variants tool.
5. Specify the parameters for the QC for Target Sequencing tool, including a target region file (figure 6.84).
Figure 6.84: Specify the parameters for the QC for Target Sequencing tool.
The parameters that can be set are:
• Minimum coverage: provides the length of each target region that has at least this coverage.
• Ignore non-specific matches: reads that are non-specifically mapped will be ignored.
• Ignore broken pairs: reads that belong to broken pairs will be ignored.
For more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=QC_Target_Sequencing.html
6. Specify the Fixed Ploidy Variant Detection settings, including a target region file (figure 6.85). The parameters used by the Fixed Ploidy Variant Detection tool can be adjusted. We have optimized the parameters to the individual analyses, but you may want to tweak some of the parameters to fit your particular sequencing dat
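As a rough illustration of what the Minimum coverage parameter reports (the length of a target region that has at least the given coverage), the following Python sketch counts qualifying positions in a list of per-base depths. The function names and the list-of-depths input are hypothetical and only stand in for whatever the QC tool computes internally. The threshold of 30 is an illustrative default.

```python
def covered_length(per_base_coverage, minimum_coverage=30):
    """Number of positions in one target region whose read depth is at
    least minimum_coverage (assumption: the threshold is inclusive)."""
    return sum(1 for depth in per_base_coverage if depth >= minimum_coverage)


def covered_fraction(per_base_coverage, minimum_coverage=30):
    """Fraction of the region reaching the threshold; 0.0 for an empty region."""
    if not per_base_coverage:
        return 0.0
    return covered_length(per_base_coverage, minimum_coverage) / len(per_base_coverage)


if __name__ == "__main__":
    region_depths = [10, 30, 45, 29, 31]  # made-up depths for a 5 bp region
    print(covered_length(region_depths), covered_fraction(region_depths))
```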
251. ions that is labeled Hereditary Disease. The workflows found in this folder can be used for studying variants that cause rare diseases or hereditary diseases (HD). The ready-to-use workflows found under each of the first three applications have similar names, with the only difference being that WGS, WES, or TAS has been added after the name. However, some of the workflows have been tailored to the individual applications, with parameter settings that have been adjusted to fit, e.g., the expected differences in coverage between the different application types. We therefore recommend that you use the ready-to-use workflow that is found under the relevant application heading.
3.1 General Workflow
The General workflows are universal workflows in the sense that they can be used independently of the disease that is being studied. Two workflows exist in this category:
• Annotate Variants: Annotates variants with gene names, conservation scores, amino acid changes, and information from clinically relevant databases.
• Identify Known Variants in One Sample: Maps sequencing reads and looks for the presence or absence of user-specified variants in the mapping.
3.2 Somatic Cancer
The Somatic Cancer ready-to-use workflows are workflows that have been tailored to cancer research. In this category it is possible to find, e.g., workflows that can compare variants in matched tumor norm
252. ioritized over variants with lower conservation scores. It is possible to filter variants based on their annotations. This type of filtering can be done using the table filter found at the top of the table. If you are performing multiple experiments where you would like to use the exact same filter criteria, you can create a filter that can be saved and reused. To do this, go to: Toolbox | Identify Candidate Variants | Create Filter Criteria
This tool can be used to specify the filter, and the Annotate Variants workflow should be extended by the Identify Candidate Tool configured with the Filter Criterion. The Biomedical Genomics Workbench reference manual has a chapter that describes this in detail (http://www.clcbio.com/support/downloads/#manuals; see chapter Workflows for more information on how pre-installed workflows can be extended and/or edited).
Note: Sometimes the databases (e.g. dbSNP) are updated with a newer version, or maybe you have your own version of the database. In such cases you may wish to change one of the databases used. This can be done with the Data Management function, which is described in section 4.1.4.
5.2.2 Identify Somatic Variants from Tumor Normal Pair WGS
The Identify Somatic Variants from Tumor Normal Pair WGS ready-to-use workflow can be used to identify potential somatic variants in a tumor sample when you also have a
253. ir reads will not be considered for any non-specific match filters.
• Minimum coverage: Only variants in regions covered by at least this many reads are called.
• Minimum count: Only variants that are present in at least this many reads are called.
• Minimum frequency: Only variants that are present at least at the specified frequency (calculated as count/coverage) are called.
For more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=Fixed_Ploidy_Variant_Detection.html
4. Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters; it is not possible to make any changes. Choose to save the results and click on the button labeled Finish.
Output from the Identify Variants WGS HD workflow
Six types of output are generated:
• A Structural Variants
• A Structural Variants Report
• A Reads Track (Read Mapping)
• A Filtered Variant Track (Identified variants)
• A Read Mapping Report
• A Genome Browser View
Chapter 6 Whole exome sequencing (WES)
Contents
6.1 General Workflows WES
6.1.1 Annotate Variants WES
6.1.2 Identify Known Variants in One Sample WES
6.2 Somatic Cancer WES
6.2.1 Filter Somatic Variants WES
6.2.2 Identify Somatic Variants from Tumor No
254. ith the sample name and followed by Read Mapping coverage, Read Mapping detection, Read Mapping frequency, and Read Mapping zygosity, provides the overview of whether or not the known variants have been detected in the sequencing reads, as well as detailed information about the Most Frequent Alternative Allele (labeled MFAA).
• Genome Browser View Identify Known Variants: A collection of tracks presented together. Shows the annotated variants track together with the human reference sequence, genes, transcripts, coding regions, target regions coverage, the mapped reads, the overview of the detected variants, and the variants detected in detail.
It is a good idea to start by looking at the Target Regions Coverage Report to see whether the coverage is sufficient in the regions of interest (e.g. >30). Please also check that at least 90% of the reads are mapped to the human reference sequence. In case of a targeted experiment, we also recommend that you check that the majority of the reads map to the targeted region. When you have inspected the target regions coverage report, you can open the Genome Browser View Identify Known Variants file (see 6.13). The Genome Browser View includes an overview track of the known variants and a detailed result track, presented in the context of the human reference sequence, genes, transcripts, coding regions, targeted regions, and mapped sequencing reads. Finally, a track with conservation scores ha
255. izard step. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4). Specify the parameters for the Fixed Ploidy Variant Detection tool for the affected family member (figure 7.58).
Figure 7.57: Select the relevant HapMap population(s).
The parameters used by the Fixed Ploidy Variant Detection tool can be adjusted. We have optimized the parameters to the individual analyses, but you may want to tweak some of the parameters to fit your particular sequencing data. A good starting point could be to run an analysis with the default settings.
256. Figure 9.2: Drag and drop the preinstalled workflow in the workflow editor.
You can remove tools and connections, or drag and drop new tools from the toolbox into the workflow editor.
How can I install the edited workflow, and where will it be in the toolbox?
After you have finished editing your workflow, make sure that the validation of the workflow was successful and save your workflow design file. Then click on the button labeled Installation. This will open the wizard in figure 9.3.
Figure 9.3: The Create Installer wizard to be used for workflow installation.
After you have added your details (your name, institution, workflow name, and a description of the workflow), please click on the button labeled Next. This will open the wizard shown in figure 9.4.
257. kflow: Toolbox | Ready-to-Use Workflows | Whole Transcriptome Sequencing | Human, Mouse or Rat | Identify Variants and Add Expression Values
1. Double-click on the Identify Variants and Add Expression Values tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis.
2. Specify the RNA-seq reads to analyze. The reads can be selected by double-clicking on the reads file name, or by clicking once on the file and then clicking on the arrow pointing to the right in the middle of the wizard (figure 8.25).
Figure 8.25: Select the sequencing reads to analyze.
Click on the button labeled Next.
3. Specify a target region for the Indels and Structural Variants tool (figure 8.26). The targeted region file is a file that specifies which regions have been sequenced. You must provide this file yourself, as it depends on the technology used for sequencing. You can obtain the targeted regions file from the vendor of your targeted sequenc
258. kflow, which starts with mapping the sequencing reads to the human reference sequence. It then runs a local realignment to improve the variant detection, which is run afterwards. After the variants have been detected, they are annotated with gene names, amino acid changes, conservation scores, information from clinically relevant variants present in the ClinVar database, and information from common variants present in the common dbSNP, HapMap, and 1000 Genomes databases. Furthermore, a targeted region report is created to inspect the overall coverage and mapping specificity.
How to run the Identify and Annotate Variants WES HD workflow
This section recapitulates the steps you need to take to start the workflow, each item corresponding to a different wizard window. For more information on the specific tools used in this workflow, see section 3.3. To run the Identify and Annotate Variants WES HD workflow, go to: Toolbox | Ready-to-Use Workflows | Whole Exome Sequencing | Hereditary Disease | Identify and Annotate Variants WES HD
1. Double-click on the Identify and Annotate Variants WES HD tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis.
2. Select the sequencing reads you want to analyze (figure 6.81). The panel on the left side of the wizard shows the kind of input that should be provided. Sel
259. kflow includes a back-check for all family members. The Identify Causal Inherited Variants in a Trio WES ready-to-use workflow accepts sequencing reads as input.
How to run the Identify Causal Inherited Variants in a Trio WES workflow
This section recapitulates the steps you need to take to start the workflow, each item corresponding to a different wizard window. For more information on the specific tools used in this workflow, see section 3.3. To run the Identify Causal Inherited Variants in a Trio WES workflow, go to: Toolbox | Ready-to-Use Workflows | Whole Exome Sequencing | Hereditary Disease | Identify Causal Inherited Variants in a Trio WES
1. Double-click on the Identify Causal Inherited Variants in a Trio WES tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis.
2. Select the sequencing reads from the unaffected parent (figure 6.60). The sequencing reads from the different family members are specified one at a time in the appropriate window. The panel on the left side of the wizard shows the kind of input that should be provided. Select by double-clicking on the reads file name, or click once on the file and then on the arrow pointing to the right in the middle of the wizard.
3. Select the sequencing reads for the affected parent.
4. Select the targeted region file (figure 6.61). The targeted region file is a file that
260. kflows WES
6.1.1 Annotate Variants WES
Using a variant track (e.g. the output from the Identify Variants ready-to-use workflow), the Annotate Variants WES ready-to-use workflow runs an internal workflow that adds the following annotations to the variant track:
• Gene names: Adds names of genes whenever a variant is found within a known gene.
• mRNA: Adds names of mRNA whenever a variant is found within a known transcript.
• CDS: Adds names of CDS whenever a variant is found within a coding sequence.
• Amino acid changes: Adds information about amino acid changes caused by the variants.
• Information from ClinVar: Adds information about the relationships between human variations and their clinical significance.
• Information from dbSNP: Adds information from the Single Nucleotide Polymorphism Database, which is a general catalog of genome variation, including SNPs, multinucleotide polymorphisms (MNPs), insertions and deletions (InDels), and short tandem repeats (STRs).
• PhastCons Conservation scores: The conservation scores, in this case generated from a multiple alignment with a number of vertebrates, describe the level of nucleotide conservation in the region around each variant.
How to run the Annotate Variants WES workflow
1. Go to the toolbox and select the Annotate Variants WES workflow. In the first wizard step, select the input variant track (figure 6.2).
261. l first and then click on the row in the table view.
262. le at the position of the considered variant has a frequency of less than this value, the zygosity of the considered variant will be reported as being homozygous. The parameter Detection frequency will be used in the calculation twice. First, it will report in the result if a variant has been detected (observed frequency > specified frequency) or not (observed frequency < specified frequency). Moreover, it will determine if a variant should be labeled as heterozygous (frequency of another allele identified at the position of the variant in the alignment > specified frequency) or homozygous (frequency of all other alleles identified at the position of the variant in the alignment < specified frequency). Click on the button labeled Next.
6. In the last wizard step (figure 7.12) you can check the selected settings by clicking on the button labeled Preview All Parameters.
Figure 7.12: Check the settings and save your results.
At the bottom of this wizard there are two buttons regarding export functions: one button allows specification of the export for
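The two uses of the Detection frequency threshold described above (first to decide whether a known variant is detected, then to decide zygosity) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the tool's actual code: the function name and the allele-count dictionary are hypothetical, frequencies are expressed in percent, "another allele" is taken to include the reference allele, and a frequency exactly equal to the threshold is treated as reaching it (the text only specifies the strict > and < cases).

```python
def classify_known_variant(variant_allele, allele_counts, detection_frequency=20.0):
    """Return (detected, zygosity) for one known variant position.

    allele_counts maps each allele observed at the position to its read
    count (hypothetical input format). detection_frequency is in percent;
    20.0 is an illustrative value, not a documented default.
    """
    coverage = sum(allele_counts.values())
    if coverage == 0:
        return (False, None)  # no reads, so nothing can be detected

    frequencies = {
        allele: 100.0 * count / coverage for allele, count in allele_counts.items()
    }

    # First use of the threshold: is the known variant present at all?
    if frequencies.get(variant_allele, 0.0) < detection_frequency:
        return (False, None)

    # Second use: does any other allele also reach the threshold?
    other_allele_present = any(
        freq >= detection_frequency
        for allele, freq in frequencies.items()
        if allele != variant_allele
    )
    return (True, "heterozygous" if other_allele_present else "homozygous")


if __name__ == "__main__":
    print(classify_known_variant("T", {"C": 50, "T": 50}))
    print(classify_known_variant("T", {"C": 5, "T": 95}))
```

With a single threshold doing both jobs, raising it makes detection stricter and at the same time makes a homozygous label more likely for the variants that are detected.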
263. le name, or click once on the file and then on the arrow pointing to the right in the middle of the wizard.
Figure 6.66: Specify the sequencing reads for the appropriate family member.
4. Select the sequencing reads for the affected child.
5. Select the sequencing reads from the mother.
6. Select the sequencing reads from the father.
7. Specify the affected child's gender (figure 6.67).
Figure 6.67: Specify the proband's gender.
8. Specify the HapMap populations that should be used for filtering out variants found in HapMap for the mother (figure 6.68). This can be done using the drop-down list found in this wizard step. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4).
9. Specify the HapMap populations that should be used for filtering out variants found i
264. le samples. To analyze differential expression in multiple samples, you need to tell the Workbench how the samples are related. This is done by setting up an experiment. The tool for this can be found here: Toolbox | Tools | Transcriptomics Analysis | Set Up Experiment
The output from the tool is an experiment, which essentially is a set of samples that are grouped. When setting up the experiment, you define the relationship between the samples. This makes it possible to do statistical analysis to investigate the differential expression between the groups. The experiment is also used to accumulate calculations like t-tests and clustering, because this information is closely related to the grouping of the samples. How to set up an experiment is described in detail in the Biomedical Genomics Workbench reference manual, under Setting up an experiment in the chapter Transcriptomics Analysis.
8.2 Annotate Variants WTS
Using a variant track (e.g. the output from the Identify Variants and Add Expression Values ready-to-use workflow), the Annotate Variants WTS ready-to-use workflow runs an internal workflow that adds the following annotations to the variant track:
• Gene names: Adds names of genes whenever a variant is found within a known gene.
• mRNA: Adds names of mRNA whenever a variant is found within a known transcript.
• CDS: Adds names of CDS whene
265. levant windows. 4. Choose to Save your results and click on the button labeled Finish. Output from the Annotate Variants (WES) workflow: Two types of output are generated: 1. Annotated Variants: Annotation track showing the variants. Hold the mouse over one of the variants or right-click on the variant. A tooltip will appear with detailed information about the variant. [Chapter 6: Whole Exome Sequencing (WES)] Figure 6.4: Check the settings and save your results. 2. Genome Browser View Annotated Variants: A collection of tracks presented together. Shows the annotated variants track together with the human reference sequence, genes, transcripts, coding regions, and variants detected in dbSNP, ClinVar, 1000 Genomes, and PhastCons conservation scores (see figure 6.5). Note: Please be aware that if you delete the annotated variant track, this track will also disappear from the genome browser view. It is possible to add tracks to the Genome Browser View, such as mapped sequencing reads as well as other tracks. This can be done by dragging the track directly from the Navigation Area to the Genome Browser View. If you double-click on the name of the annotated variant tr
266. lled might be less than 0.9, as long as the probability of the entire variant site is greater than 0.9. • Ignore broken pairs: When ticked, reads from broken pairs are ignored. Broken pairs may arise for a number of reasons, one being erroneous mapping of the reads. In general, variants based on broken pair reads are likely to be less reliable, so ignoring them may reduce the number of spurious variants called. However, broken pairs may also arise for biological reasons (e.g. due to structural variants), and if they are ignored some true variants may go undetected. Please note that ignored broken pair reads will not be considered for any non-specific match filters. • Minimum coverage: Only variants in regions covered by at least this many reads are called. • Minimum count: Only variants that are present in at least this many reads are called. • Minimum frequency: Only variants that are present at least at the specified frequency (calculated as count/coverage) are called. [Chapter 7: Targeted Amplicon Sequencing (TAS)] For more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=Fixed_Ploidy_Variant_Detection.html 11. Specify the parameters for the Fixed Ploidy Variant Detection tool for the affected parent. 12. Specify the parameters for the Fixed Ploidy Variant Detection tool for the proband. 13. Pressing the button Preview All Parameters allows you to preview all paramete
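The three count-based thresholds listed in this entry (Minimum coverage, Minimum count, Minimum frequency) combine as a simple conjunction. The following is a minimal Python sketch of that logic, not the Workbench's implementation. The `ReadSupport` class and the `passes_count_filters` function are hypothetical names, the default thresholds are the values visible in a wizard screenshot elsewhere in this listing, and treating the frequency threshold as a percentage is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ReadSupport:
    """Hypothetical summary of the reads at one candidate variant position."""
    count: int      # reads that carry the variant
    coverage: int   # all reads covering the position


def passes_count_filters(support, min_coverage=10, min_count=2, min_frequency=10.0):
    """Return True if the candidate passes all three thresholds.

    Assumption: min_frequency is a percentage and frequency = count / coverage.
    """
    # Minimum coverage: the region must be covered by at least this many reads.
    if support.coverage <= 0 or support.coverage < min_coverage:
        return False
    # Minimum count: the variant must be present in at least this many reads.
    if support.count < min_count:
        return False
    # Minimum frequency: count / coverage, expressed here as a percentage.
    frequency_percent = 100.0 * support.count / support.coverage
    return frequency_percent >= min_frequency
```

With these thresholds, a variant seen in 3 of 20 reads (15%) passes, while one seen in 2 of 50 reads (4%) is rejected by the frequency filter.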
267. llustrated with a red amino acid. Figure 7.54: Select the relevant HapMap population(s). • A Genome Browser View • A Filtered Variant Track. 7.3.2 Identify Causal Inherited Variants in Family of Four (TAS). As the name of the workflow implies, you can use the Identify Causal Inherited Variants in a Family of Four (TAS) ready-to-use workflow to identify inherited causal variants in a family of four. The family can consist of a child, a mother, a father, and one additional affected family member, where, in addition to the child (the proband), one of the parents is affected. The fourth family member can be any related and affected family member, such as a sibling, grandparent, uncle, or the like. The Identify Causal Inherited Variants in a Family of Four (TAS) ready-to-use workfl
268. location, you will export the analysis parameter settings that were specified for this specific experiment. 4. Click on the button labeled OK to go back to the previous wizard step and choose Save. Note: If you choose to open the results, the results will not be saved automatically. You can always save the results at a later point. 4.4.4 How to run the Prepare Raw Data ready-to-use workflow. If you have sequencing reads without overlapping pairs, you can use the Prepare Raw Data ready-to-use workflow to prepare your sequences before you proceed to data analysis such as variant calling. 1. Go to the toolbox and double-click on the Prepare Raw Data ready-to-use workflow (figure 4.26). Figure 4.26: The ready-to-use workflows are found in the toolbox. This will open the wizard shown in figure 4.27, where you can select the reads that you wish to prepare for further analyses. [Chapter 4: Getting Started]
269. logical reasons (e.g. due to structural variants), and if they are ignored some true variants may go undetected. Please note that ignored broken pair reads will not be considered for any non-specific match filters. • Minimum coverage: Only variants in regions covered by at least this many reads are called. • Minimum count: Only variants that are present in at least this many reads are called. • Minimum frequency: Only variants that are present at least at the specified frequency (calculated as count/coverage) are called. For more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=Fixed_Ploidy_Variant_Detection.html 7. Specify a targeted region file to remove variants outside of this region (figure 7.86). 8. Specify the 1000 Genomes population that should be used to add information on variants found in the 1000 Genomes project. This can be done using the drop-down list found in this wizard step. Please note that the populations available from the drop-down list can Figure 7.86: Select the targeted region file you
271. ly you will only be presented with the dialog box when updated versions of the reference data are available. Click on the button labeled Yes. This will take you to the wizard shown in figure 4.3. This wizard can also be accessed from the upper right corner of the Biomedical Genomics Workbench by clicking on Data Management (figure 4.4). The Manage Reference Data wizard gives access to all the reference data that are used in the ready-to-use workflows and in the tutorials. From the wizard you can download and configure the reference data. In the upper part of the wizard you can find two tiles called QIAGEN Reference Data Library and Custom Reference Data Sets. Figure 4.1: The location where reference data is downloaded from can be seen in the Workbench Preferences. Generally, this should not be altered except in the special case that the data from QIAGEN is being mirrored locally.
272. Figure 7.55: Specify the sequencing reads for the appropriate family member. Select the sequencing reads from the unaffected parent. Select the sequencing reads from the affected parent. Select the targeted region file (figure 7.56). The targeted region file is a file that specifies which regions have been sequenced when working with whole exome sequencing or targeted amplicon sequencing data. You must provide this file yourself, as it depends on the technology used for sequencing. You can obtain the targeted regions file from the vendor of your targeted sequencing reagents. Figure 7.56: Select the targeted region file you used for sequencing. 6. Select the reads for the affected child. Specify the HapMap populations that should be used for filtering out variants found in HapMap (figure 7.57). This can be done using the drop-down list found in this w
273. m When opened, datasets are shown in the View Area, along with a Side Panel that allows you to customize the viewing options and also navigate to specific areas of the data. At the bottom of a data view on the right are the View Tools, which can be used for panning, zooming, and selection of specific regions. At the bottom on the left are icons that allow you to view the data in a different way, for example to look at a table view of the data or view the history of actions taken on that dataset. The Status Bar in the lower right corner indicates the location of a selection you have made, or where the mouse pointer is pointing to within a dataset with coordinates, such as a track or sequence. After a dataset is opened, for example by double-clicking on an item in one of the folders visible in the Navigation Area, the user interface will look similar to that shown in figure 2.3. Each dataset in the View Area will have an associated Side Panel, Status Bar, and a set of View Tools. The Side Panel, Status Bar, and View Area are only visible when data are open for viewing. When no datasets are open, the view is like that in figure 2.1. To learn more about the specific areas and functionalities of the user interface, please refer to the Biomedical Genomics Workbench reference manual, which can be found here: http://www.clcbio.com/support/downloads/#manuals [Chapter 2: Introduction to User Interface, Workflows and Tracks] 2.2.4 The Toolbox. Here we focus
274. Figure 5.49: Specify the parameters for the Fixed Ploidy Variant Detection tool. The parameters that can be set are: • Required variant probability is the minimum probability value of the variant site required for the variant to be called. Note that it is not the minimum value of the probability of the individual variant. [Chapter 5: Whole Genome Sequencing (WGS)] For the Fixed Ploidy Variant detector, if a variant site (and not the variant itself) passes the variant probability threshold, then the variant with the highest probability at that site will be reported, even if the probability of that particular variant might be less than the threshold. For example, if the required variant probability is set to 0.9, then the individual probability of the variant called might be less than 0.9, as long as the probability of the entire variant site is greater than 0.9. • Ignore broken pairs: When ticked, reads from broken pairs are ignored. Broken pairs may arise for a number of reasons, one being erroneous mapping of the reads. In general, variants based on broken pair reads are likely to be less reliable, so ignoring them may reduce the number of spurious variants called. However, broken pairs may also arise for biological reasons (e.g. due to structural variants), and if they are ignored some true variants may go undetected. Please note that ignored broken pa
275. maller variants, larger insertions and deletions, and structural variants in the context of the human genome. By double-clicking on the InDel variant track in the Genome Browser View, a table will be shown that lists all identified larger insertions and deletions (see figure 5.30). In case you would like to change the reference sequence used for read mapping or the human genes, please use the Data Management function (see section 4.1.4). 5.3 Hereditary Disease (WGS). 5.3.1 Filter Causal Variants (WGS-HD). If you are analyzing a list of variants, you can use the Filter Causal Variants (WGS-HD) ready-to-use workflow to remove variants that are outside the target region, as well as common variants present in publicly available databases. The workflow will annotate the remaining variants with gene names, conservation scores, and information from clinically relevant databases. The Filter Causal Variants (WGS-HD) ready-to-use workflow accepts variant track files. How to run the Filter Causal Variants (WGS-HD) workflow: To run the Filter Causal Variants (WGS-HD) workflow, go to
276. mat and the other button (the one labeled Export Parameters) allows specification of the export destination. 7. Click on the button labeled OK to go back to the previous dialog box and choose to Save your results. Note: If you choose to open the results, the results will not be saved automatically. You can always save the results at a later point. Output from the Identify Known Variants in One Sample (TAS): The Identify Known Variants in One Sample (TAS) tool produces five different output types: 1. Read Mapping: The mapped sequencing reads. The reads are shown in different colors depending on their orientation, whether they are single reads or paired reads, and whether they map unambiguously. For the color codes, please see the description of sequence colors in the CLC Genomics Workbench manual, which can be found here: http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=View_settings_in_Side_Panel.html 2. Target Regions Coverage: A track showing the targeted regions. The table view provides information about the targeted regions, such as target region length, coverage, regions without coverage, and GC content. 3. Target Regions Coverage Report: The report consists of a number of tables and graphs that in different ways show e.g. the number, length, and coverage of the target regions, and provides information about the read count per GC. 4. Variants D
277. mation about the mapped reads from each sample. • Identified Compound Heterozygous Genes (Proband): Gene track with the identified putative compound heterozygous variants in the proband. The gene track can be opened in table view to see the gene names. • Gene List with de novo Variants: Gene track with the identified de novo variants in the proband. The gene track can be opened in table view to see the gene names. • Gene List with recessive Variants: Gene track with the identified recessive variants in the proband. The gene track can be opened in table view to see the gene names. • De novo variants: Variant track showing de novo variants in the proband. The variant track can be opened in table view to see all information about the variants. • Recessive variants: Variant track showing recessive variants in the proband. The variant track can be opened in table view to see all information about the variants. • De novo Mutations Amino Acid Track • Recessive Variants Amino Acid Track • Genome Browser View: This is a collection of tracks shown together in a view that makes it easy to compare information from the individual tracks, such as comparing the identified variants with the read mappings and information from databases. 7.3.5 Identify Rare Disease Causing Mutations in Trio (TAS). The Identify Rare Disease Causing Mutations in a Trio (TAS) workflow identifies de novo and compound heterozygous variants from a trio. The workflow includ
278. Figure 5.4: Check the settings and save your results. In this wizard step you can check the selected settings by clicking on the button labeled Preview All Parameters. In the Preview All Parameters wizard you can only check the settings; if you wish to make changes, you have to use the Previous button from the wizard to edit parameters in the relevant windows. 4. Choose to Save your results and click on the button labeled Finish. Output from the Annotate Variants (WGS) workflow: Two types of output are generated: 1. Annotated Variants: Annotation track showing the variants. Hold the mouse over one of the variants or right-click on the variant. A tooltip will appear with detailed information about the variant. 2. Genome Browser View Annotated Variants: A collection of tracks presented together. Shows the annotated variants track together with the human reference sequence, genes, transcripts, coding regions, and variants detected in dbSNP, ClinVar, 1000 Genomes, and PhastCons conservation scores (see figure 5.5).
279. mino acid sequence. A variant introducing a stop mutation is illustrated with a red amino acid. • Genome Browser View: This is a collection of tracks shown together in a view that makes it easy to compare information from the individual tracks, such as comparing the identified variants with the read mappings and information from databases. 7.3.4 Identify Rare Disease Causing Mutations in Family of Four (TAS). You can use the Identify Rare Disease Causing Mutations in a Family of Four (TAS) ready-to-use workflow to identify de novo and compound heterozygous variants from an extended family of four, where the fourth individual is not affected. The Identify Rare Disease Causing Mutations in a Family of Four (TAS) ready-to-use workflow accepts sequencing reads as input. How to run the Identify Rare Disease Causing Mutations in a Family of Four (TAS) workflow: This section recapitulates the steps you need to take to start the workflow, each item corresponding to a different wizard window. For more information on the specific tools used in this workflow, see section 3.3. To run the Identify Rare Disease Causing Mutations in a Family of Four (TAS) workflow, go to: Toolbox | Ready-to-Use Workflows | Targeted Amplicon Sequencing | Hereditary Disease | Identify Rare Disease Causing Mutations in a Family of Four (TAS). Double-click on the Identify Rare Dis
280. mming by clicking on the folder icon. To obtain this file, you will have to get in contact with the vendor and ask them to send this adapter trim list file to you. The adapter trim list is supplied by the vendor of the enrichment kit and sequencing machine. See section 4.4.2 for a description of how to import the adapter trim list. Click on the button labeled Next, which will take you to the next wizard (figure 4.29). If you click on the button labeled Preview All Parameters, you get the chance to check the selected settings. If you wish to make changes, you have to use the Previous button from the wizard to edit parameters in the relevant windows. The settings can be exported with the two buttons found at the bottom of this wizard; one button allows specification of the export format and the other button (the one labeled Export Parameters) allows specification of the export destination. When selecting an export location, you will export the analysis parameter settings that were specified for this specific experiment. 4. Click on the button labeled OK to go back to the previous wizard and choose Save. [Wizard screenshot: Prepare Raw Data, Trim Sequences, configurable parameters: Trim adapter list; Ambiguous trim; Ambiguous limit 2; Quality trim; Quality limit 0.05; Use colorspace; Also search on reversed sequence; Remove 5 terminal nucleoti
281. Figure 4.14: The first thing to do is to import your sequencing data. Below you can find a short guide on how to import data into the Biomedical Genomics Workbench. If you wish to learn more about the import options in the Biomedical Genomics Workbench, you can find a more detailed description in the Biomedical Genomics Workbench reference manual: http://www.clcbio.com/support/downloads/#manuals 4.3.1 How to import data. 1. Use the Import tool in the toolbar (see figure 4.15) to import your sequencing data into the Biomedical Genomics Workbench. Figure 4.15: Click on the tool labeled Import in the toolbar to import data. Select the importer according to the data type you wish to import. 2. Click on one of the import options (e.g. Illumina
282. n HapMap for the father. Figure 6.68: Select the relevant HapMap population(s). Specify the HapMap populations that should be used for filtering out variants found in HapMap from the de novo assembly. Specify the parameters for the Fixed Ploidy Variant Detection tool for the affected child (figure 6.69). The parameters used by the Fixed Ploidy Variant Detection tool can be adjusted. We have optimized the parameters to the individual analyses, but you may want to tweak some of the parameters to fit your particular sequencing data. A good starting point could be to run an analysis with the default settings. [Wizard screenshot: Fixed Ploidy Variant Detection, configurable parameters: Required variant probability 50.0; Ignore broken pairs; Minimum coverage 10; Minimum count 2; Minimum frequency 10.0]
283. n found in the lower left corner of the View Area. • Target Regions Coverage Report: The report consists of a number of tables and graphs that in different ways provide information about the targeted regions. • Identified Variants: A variant track holding the identified variants. The variants can be shown in track format or in table format. When holding the mouse over the detected variants in the Genome Browser view, a tooltip appears with information about the individual variants. You will have to zoom in on the variants to be able to see the detailed tooltip. • Genome Browser View Identify Variants: A collection of tracks presented together. Shows the annotated variants track together with the human reference sequence, genes, transcripts, coding regions, the mapped reads, the identified variants, and the structural variants (see figure 7.5). It is important that you do not delete any of the produced files individually, as some of the outputs are linked to other outputs. If you would like to delete the outputs, please always delete all of them at the same time. Please first have a look at the mapping report to see if the coverage is sufficient in regions of interest (e.g. > 30). Furthermore, please check that at least 90% of reads are mapped to the human reference sequence. In case of a targeted experiment, please also check that the majority of reads are mapping to the targeted region. Afterwards, please open the Genome Browser
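The three checks recommended in this entry (coverage above 30 in regions of interest, at least 90% of reads mapped, and a majority of reads on target) can be written down as a small checklist. The sketch below is only an illustration under stated assumptions: the function name and its arguments are hypothetical, the numbers would have to be read manually from the mapping report, and "majority" is interpreted here as more than half of the mapped reads.

```python
def mapping_qc_issues(coverage_in_regions_of_interest, mapped_reads, total_reads,
                      on_target_reads=None, min_coverage=30, min_mapped_fraction=0.90):
    """Return a list of human-readable QC problems (empty list means all checks pass).

    All inputs are hypothetical summary numbers taken from the mapping report.
    """
    issues = []
    # Check 1: coverage in regions of interest should exceed the threshold (e.g. > 30).
    if coverage_in_regions_of_interest <= min_coverage:
        issues.append(f"coverage is not above {min_coverage}")
    # Check 2: at least 90% of all reads should map to the reference.
    if total_reads <= 0 or mapped_reads / total_reads < min_mapped_fraction:
        issues.append(f"fewer than {min_mapped_fraction:.0%} of reads are mapped")
    # Check 3 (targeted experiments only): the majority of reads should be on target.
    # Assumption: "majority" means more than half of the mapped reads.
    if on_target_reads is not None:
        if mapped_reads <= 0 or on_target_reads / mapped_reads <= 0.5:
            issues.append("the majority of reads do not map to the targeted region")
    return issues
```

Calling `mapping_qc_issues(45, 950, 1000, on_target_reads=700)` returns an empty list, meaning none of the three checks flags a problem.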
284. n tool can be adjusted. We have optimized the parameters to the individual analyses, but you may want to tweak some of the parameters to fit your particular sequencing data. A good starting point could be to run an analysis with the default settings. Figure 5.45: Specify the parameters for the Fixed Ploidy Variant Detection tool. The parameters that can be set are: • Required variant probability is the minimum probability value of the variant site required for the variant to be called. Note that it is not the minimum value of the probability of the individual variant. For the Fixed Ploidy Variant detector, if a variant site (and not the variant itself) passes the variant probability threshold, then the variant with the highest probability at that site will be reported, even if the probability of that particular variant might be less than the threshold. For example, if the required variant probability is set to 0.9, then the individual probability of the variant called might be less than 0.9, as long as the probability of the entire variant site is greater than 0.9. • Ignore broken pairs: When ticked, reads from broken pairs are ignored. Broken pairs may arise for a number of reasons, one being erroneous mapping of
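The distinction made under Required variant probability, between the probability of the variant site and the probability of an individual variant, can be illustrated with a short Python sketch. This is not the Fixed Ploidy Variant Detection algorithm: the function name is hypothetical, the per-variant probabilities are invented inputs, and computing the site probability as the sum of the candidate variant probabilities is an assumption made only for this illustration.

```python
def call_site(variant_probs, required_site_probability=0.9):
    """Report the most probable variant if the site as a whole passes the threshold.

    variant_probs: hypothetical dict mapping each candidate variant at one site
    to its individual probability.
    Assumption: site probability = sum of the candidate variant probabilities.
    """
    site_probability = sum(variant_probs.values())
    if site_probability < required_site_probability:
        return None  # the site does not pass, so nothing is called
    # The site passes: report the variant with the highest individual probability,
    # even when that probability alone is below the threshold.
    return max(variant_probs, key=variant_probs.get)
```

For a site with two candidates at 0.55 and 0.40, the site probability (0.95) passes a 0.9 threshold and the 0.55 candidate is reported, although 0.55 is itself below 0.9.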
285. Figure 8.1: The RNA-Seq ready-to-use workflows: Annotate Variants (WTS), Compare Variants in DNA and RNA, Identify Candidate Variants and Genes from Tumor Normal Pair, Identify Variants and Add Expression Values, and Identify and Annotate Differentially Expressed Genes and Pathways. Note: Often you will have to prepare data with one of the two Preparing Raw Data workflows described in section 4.4 before you proceed to the analysis of the sequencing data (RNA-Seq). Note: Make sure that you have selected the references corresponding to the species you will be working with. To check and potentially change which Reference Data Set is currently in use, click on the Data Management button in the top right corner of the Workbench and click Apply for the appropriate data set (Hg38, Hg19, Mouse, or Rat). If you are given an error message about missing a reference data element when starting a workflow, you can delete and re-download the missing reference element or set. Also note that for workflows annotating variants using databases available for more than one population, you can select the population that best matches the population your samples are derived from. This is done in the wizard for populations from the 1000 Genomes Project, while HapMap populations can be specified with the Data Management function before starting the workflows (see section 4.1.4). 8.1 Analysis of multip
286. nd it is not possible to make any changes. Choose to save the results and click on the button labeled Finish. Output from the Rare Disease Causing Mutations in a Trio (WGS) workflow: Eleven types of output are generated: • Read Mapping (one for each family member): The reads mapped to the reference sequence. • Filtered Variant Tracks (one for each family member): The variants identified in each of the family members. The variant track can be opened in table view to see all information about the variants. • Read Mapping Report (one for each family member): The report consists of a number of tables and graphs that in different ways provide information about the mapped reads from each sample. • De novo variants: Filtered variant track showing de novo variants in the proband. The variant track can be opened in table view to see all information about the variants. • Recessive variants: Filtered variant track showing recessive variants in the proband. The variant track can be opened in table view to see all information about the variants. • Gene List with Putative Causal Variants (Proband): Gene track with the identified putative compound heterozygous variants in the proband. The gene track can be opened in table view to see the gene names. • Gene List with recessive Variants: Gene track with the identified recessive variants in the proband. The gene track can be opened in table view to see the gene names. • Identified Compound Heterozygous Genes Proband
287. nd parent, uncle, or the like. The Identify Causal Inherited Variants in a Family of Four (WES) ready-to-use workflow accepts sequencing reads as input from each of the four family members. How to run the Identify Causal Inherited Variants in a Family of Four (WES) workflow: This section recapitulates the steps you need to take to start the workflow, each item corresponding to a different wizard window. For more information on the specific tools used in this workflow, see section 3.3. To run the Identify Causal Inherited Variants in a Family of Four (WES) workflow, go to: Toolbox | Ready-to-Use Workflows | Whole Exome Sequencing | Hereditary Disease | Identify Causal Inherited Variants in a Family of Four (WES). 1. Double-click on the Identify Causal Inherited Variants in a Family of Four (WES) tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis. 2. Select the sequencing reads from the affected family member (figure 6.55). The sequencing reads from the different family members are specified one at a time in the appropriate window. The panel in the left side of the wizard shows the kind of input that should be provided. Select by double-clicking on the reads file name, or click once on the file and then on the arrow pointing to the right side in the middle of the wizard.
288. nded that you configure your workflows with the file from the population that best matches the ethnicity of the patient from which the sample was taken. You can find more about the population codes, which are part of the filename, here: http://www.sanger.ac.uk/resources/downloads/human/hapmap3.html • Variants found by the 1000 Genomes Project (ENSEMBL; ftp://ftp.ensembl.org/pub/release-80/variation/gvf/homo_sapiens): The 1000 Genomes Project Phase 1 created an integrated map of genetic variations from 1092 human genomes (et al., 2012). Please note that there are 4 different files (tracks) to be downloaded, one file for each population. It is recommended that you configure your workflows with the file from the population that best matches the ethnicity of the patient from which the sample was taken. You can learn more about the population codes that are part of the filename here: http://www.1000genomes.org • dbSNP variants (UCSC; http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database): Human variants present in the Single Nucleotide Polymorphism Database (dbSNP), which includes smaller insertions, deletions, replacements, SNPs, and MNVs. Please note that most variants in dbSNP are not validated and anybody can submit data to dbSNP. The collection of variants includes clinically relevant as well as common variants (filename: snp142.txt.gz). • dbSNP common variants (UCSC; http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database): Uniquely mapped variants t
289. ndividuals and subtracting all variants in the unaffected individual. The workflow includes a back-check for all family members. The Identify Causal Inherited Variants in a Trio (TAS) ready-to-use workflow accepts sequencing reads as input. How to run the Identify Causal Inherited Variants in a Trio (TAS) workflow: This section recapitulates the steps you need to take to start the workflow, each item corresponding to a different wizard window. For more information on the specific tools used in this workflow, see section 3.3. To run the Identify Causal Inherited Variants in a Trio (TAS) workflow, go to: Toolbox | Ready-to-Use Workflows | Targeted Amplicon Sequencing | Hereditary Disease | Identify Causal Inherited Variants in a Trio (TAS). 1. Double-click on the Identify Causal Inherited Variants in a Trio (TAS) tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis. 2. Select the sequencing reads from the unaffected parent (figure 7.60). The sequencing reads from the different family members are specified one at a time in the appropriate window. The panel on the left side of the wizard shows the kind of input that should be provided. Select by double-clicking on the reads file name, or click once on the file and then on the arrow pointing to the right in the middle of the wizard.
290. ne List with Putative Causal Variants: Gene track with the identified putative causal variants in the child. The gene track can be opened in table view to see the gene names. • An Amino Acid Track: Shows the consequences of the variants at the amino acid level in the context of the original amino acid sequence. A variant introducing a stop mutation is illustrated with a red amino acid. • Genome Browser View: This is a collection of tracks shown together in a view that makes it easy to compare information from the individual tracks, for example to compare the identified variants with the read mappings and information from databases. 5.3.4 Identify Rare Disease Causing Mutations in Family of Four (WGS). You can use the Identify Rare Disease Causing Mutations in a Family of Four (WGS) ready-to-use workflow to identify de novo and compound heterozygous variants from an extended family of four, where the fourth individual is not affected. The Identify Rare Disease Causing Mutations in a Family of Four (WGS) ready-to-use workflow accepts sequencing reads as input. How to run the Identify Rare Disease Causing Mutations in a Family of Four (WGS) workflow: This section recapitulates the steps you need to take to start the workflow, each item corresponding to a different wizard window. For more information on the specific tools used in this workflow, see section 3.3. To run the Identify Rare Disease Causing Mutations in a Family of Four (WGS) workflow, go to
291. nfigure reference data, 38; Contact information, 8; Create new folder, 45; Customized data analysis, 244; Download reference data, 38; Edit preinstalled workflows, 244; Example data import, 12; Filter Causal Variants (TAS-HD), 192; Filter Causal Variants (WES-HD), 133; Filter Causal Variants (WGS-HD), 81; Filter Somatic Variants (TAS), 169; Filter Somatic Variants (WES), 110; Filter Somatic Variants (WGS), 69; Identify and annotate differentially expressed genes, 239; Identify and Annotate Variants (TAS), 185; Identify and Annotate Variants (TAS-HD), 214; Identify and Annotate Variants (WES), 126; Identify and Annotate Variants (WES-HD), 155; Identify candidate variants and genes from tumor normal pair, 230; Identify Causal Inherited Variants in Family of Four (TAS), 194; Identify Causal Inherited Variants in Family of Four (WES), 135; Identify Causal Inherited Variants in Family of Four (WGS), 84; Identify Causal Inherited Variants in Trio (TAS), 198; Identify Causal Inherited Variants in Trio (WES), 139; Identify Causal Inherited Variants in Trio (WGS), 87; Identify Known Variants in One Sample (TAS), 165; Identify Known Variants in One Sample (WES), 105; Identify Known Variants in One Sample (WGS), 65; Identify Rare Disease Causing Mutations in Family of Four (TAS), 202; Identify Rare Disease Causing Mutations in Family of Four (WES), 143; Identify Rare Disease Causing Mutations in Family of Four (WGS), 90; Identify Rare Diseas
292. ng reads as input from each of the four family members. How to run the Identify Causal Inherited Variants in a Family of Four (WGS) workflow: This section recapitulates the steps you need to take to start the workflow, each item corresponding to a different wizard window. For more information on the specific tools used in this workflow, see section 3.3. To run the Identify Causal Inherited Variants in a Family of Four (WGS) workflow, go to: Toolbox | Ready-to-Use Workflows | Whole Genome Sequencing | Hereditary Disease | Identify Causal Inherited Variants in a Family of Four (WGS). 1. Double-click on the Identify Causal Inherited Variants in a Family of Four (WGS) tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis. 2. Select the sequencing reads from the affected family member (figure 5.34). The sequencing reads from the different family members are specified one at a time in the appropriate window. The panel on the left side of the wizard shows the kind of input that should be provided. Select by double-clicking on the reads file name, or click once on the file and then on the arrow pointing to the right in the middle of the wizard. 3. Select the sequencing reads from the unaffected parent. 4. Select the sequencing reads from the affected parent. 5. Select the sequencing reads from the affected child. 6. Specify the parameters for th
293. nly requires few inputs from the user. Side Panel: The Side Panel, shown to the right of all views that are opened in Biomedical Genomics Workbench, allows you to change the way the content of a view is displayed. Status Bar: The Status Bar is located at the bottom of all views. The left side of the bar shows whether the computer is making calculations or whether it is idle. The right side of the bar indicates the range of the selection of a sequence. Tool: In the Biomedical Genomics Workbench this term is used about both single tools and ready-to-use workflows. Toolbox: The area in the lower left side of the Biomedical Genomics Workbench that holds the tools. Track: Data is presented in track format (genome browser view) in the Biomedical Genomics Workbench. View Area: The area in the middle of the Biomedical Genomics Workbench. This is where you can visualize your results and work with your data. View Tools: The area in the lower right part of the View Area. Here you can find tools for zooming, panning and selection of data. Bibliography. [et al., 2012] G. P. C., Abecasis, G. R., Auton, A., Brooks, L. D., DePristo, M. A., Durbin, R. M., Handsaker, R. E., Kang, H. M., Marth, G. T., and McVean, G. A. (2012). An integrated map of genetic variation from 1,092 human genomes. Nature, 491(7422):56-65. [Choi et al., 2009] Choi, M., Scholl, U. I., Ji, W., Liu, T., Tikhonova, I. R., Zumbo, P., Nayir, A., Bakkaloglu, A., Özen, S., Sanjad, S., Ne
294. Figure 6.33: The Genome Browser View presents all the different data tracks together and makes it easy to compare different tracks. How to run the Identify Variants (WES) workflow: To run the Identify Variants (WES) workflow, go to: Toolbox | Ready-to-Use Workflows | Whole Exome Sequencing | Somatic Cancer | Identify Variants (WES). 1. Select the sequencing reads from the sample that should be analyzed (figure 6.34). Figure 6.34: Please select all sequencing rea
295. nome will automatically be brought into focus in the Genome Browser View. Figure 5.6: The output from the Annotate Variants ready-to-use workflow is a genome browser view (a track list). The information is also av
296. Figure 7.53: Select the relevant 1000 Genomes population(s). 4. Specify the 1000 Genomes population that should be used for filtering out variants found in the 1000 Genomes project. This can be done using the drop-down list found in this wizard step. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4). 5. Specify the Hapmap populations that should be used for filtering out variants found in Hapmap (figure 7.54). This can be done using the drop-down list found in this wizard step. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4). 6. Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters; it is not possible to make any changes. Choose to save the results and click on the button labeled Finish. Output from the Filter Causal Variants (TAS-HD) workflow: Three types of output are generated: • An Amino Acid Track: Shows the consequences of the variants at the amino acid level in the context of the original amino acid sequence. A variant introducing a stop mutation is i
297. ntered, e.g. network connection errors. 4.2 Create new folder. To get started you need some data to work with. However, before looking into how you can import your data into the Biomedical Genomics Workbench, we will first create a new folder in the Navigation Area that can be used to hold all data that are relevant for the analysis you are about to perform. You can see how to do this in figure 4.12. Figure 4.12: Click on the Create Folder icon or use the New tool in the toolbar to create a new folder. Provide a name that will make it easy to keep track of your data. The folder that you have just created will be placed in the CLC_Data location, as shown in figure 4.13. Figure 4.13: The folder that you have just created will be placed in the CLC_Data location. 4.3 Import sequencing data. We are now ready to start importing the data. The simplified diagram shown in figure 4.14 will be used throughout the rest of the manual to provide an overview as we move step by step through the different stages, from data import to anal
298. ntify Somatic Variants from Tumor Normal Pair (WES), the reads are mapped and the variants identified. An internal workflow removes germline variants that are found in the mapped reads of the normal control sample, and variants outside the target region are removed, as they are likely to be false positives due to non-specific mapping of sequencing reads. Next, the remaining variants are annotated with gene names, amino acid changes, conservation scores and information from clinically relevant databases like ClinVar (variants with clinically relevant association). Finally, information from dbSNP is added to see which of the detected variants have been observed before and which are completely new. Import your targeted regions: A file with the genomic regions targeted by the amplicon or hybridization kit is available from the vendor of the enrichment kit and sequencing machine. To obtain this file, you will have to contact the vendor and ask them to send the target regions file to you. You will get the file in either .bed or .gff format. To import the file, go to: Toolbar | Import | Tracks. How to run the Identify Somatic Variants from Tumor Normal Pair (WES) workflow: 1. Go to the toolbox and double-click on the Identify Somatic Variants from Tumor Normal Pair (WES) ready-to-use workflow. This will open the wizard shown in figure 6.23, where you can select the tumor sample reads.
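The idea of discarding variants that fall outside the targeted regions can be illustrated outside the Workbench. The following is only a minimal sketch, not the Workbench's implementation: it assumes a plain three-column BED file (0-based, half-open intervals) and 1-based variant positions, and the function names `parse_bed` and `in_target` as well as the example coordinates are made up for illustration.

```python
def parse_bed(lines):
    """Parse BED lines into {chromosome: [(start, end), ...]}.

    BED intervals are 0-based and half-open: [start, end).
    Header lines (track, browser, comments) are skipped.
    """
    regions = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def in_target(regions, chrom, pos_1based):
    """Return True if a 1-based position lies inside any target interval."""
    pos0 = pos_1based - 1  # convert to the 0-based BED coordinate system
    return any(start <= pos0 < end for start, end in regions.get(chrom, []))


# Hypothetical target regions and variant positions, for illustration only.
bed_lines = ["chr7\t55248999\t55249200", "chr7\t55259400\t55259600"]
variants = [("chr7", 55249063), ("chr7", 55250000)]

targets = parse_bed(bed_lines)
kept = [v for v in variants if in_target(targets, *v)]
print(kept)
```

A linear scan over the intervals is used here for readability; for exome-sized target files an interval tree or sorted intervals with binary search would scale better.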
299. Figure 7.5: The output from the Annotate Variants ready-to-use workflow is a genome browser view (a track list) containing individual tracks for all added annotations. Note: Please be aware that if you delete the annotated variant track, this track will also disappear from the genome browser view. It is possible to add tracks to the Genome Browser View, such as mapped sequencing reads as well as other tracks. This can be done by dragging the track directly from the Navigation Area to the Genome Browser View. If you double-click on the name of the annotated variant track in the left-hand side of the Genome Browser View, a table that includes all variants and the added information (annotations) will open (see figure 7.6). The table and the Genome Browser View are linked; if you click on an entry in the table, this particular position in the genome will automatically be brought into focus in the Genome Browser View.
300. o to the toolbox and double-click on: Toolbox | Ready-to-Use Workflows | Whole Exome Sequencing | General Workflows (WES) | Identify Known Variants from One Sample (WES). 2. This will open the wizard step shown in figure 6.8, where you can select the reads of the sample that should be tested for presence or absence of your known variants. Figure 6.8: Select the sequencing reads from the sample you would like to test for your known variants. If several samples from different folders should be analyzed, the tool has to be run in batch mode. This is done by selecting Batch and specifying the folders that hold the data you wish to analyse. Click on the button labeled Next. 3. Specify the target region for the Indels and Structural Variants tool (figure 6.9). This step is optional and will shorten the completion time of the workflow by running the tool only on the selected target regions. If you do not have a targeted region file to provide, simply click Next. 4. Specify the parameters for the QC for Target Sequencing tool (figure 6.10).
301. olds the variant data. This track is also included in the Genome Browser View. If you hold down the Ctrl key (Cmd on Mac) while clicking on the table icon in the lower left side of the View Area, you can open the table view in split view. The table and the variant track are linked together, and when you click on a row in the table, the track view will automatically bring this position into focus. • Genome Browser View Filter Somatic Variants: A collection of tracks presented together. Shows the somatic candidate variants together with the human reference sequence, genes, transcripts, coding regions, and variants detected in ClinVar, 1000 Genomes, and the PhastCons conservation scores (see figure 6.21).
302. ollection of variants includes clinically relevant as well as common variants. Please note that the url must be modified according to what you would like to download, e.g. if you are interested in snp141Common.txt.gz, "138" in the url should be replaced with "141Common"; for a full list see http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/ • dbSNP common variants (UCSC): http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/snp138Common.txt.gz Uniquely mapped variants that appear in at least 1% of the population or are 100% non-reference. Please note that the url must be modified according to what you would like to download, e.g. if you are interested in snp141Common.txt.gz, "138" in the url should be replaced with "141"; for a full list see http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/ • ClinVar database variants (NCBI): http://www.ncbi.nlm.nih.gov/clinvar/docs/maintenance_use/ ClinVar is designed to provide a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. • PhastCons Conservation Scores (UCSC): http://hgdownload.soe.ucsc.edu/goldenPath/hg19/phastCons100way/hg19.100way.phastCons/ Conservation track of UCSC from a multiple alignment of 100 species and measurements of evolutionary conservation using the phastCons algorithm from the PHAST package. • Human Gene Ontology (GO) slim file (EBI): http://www.ebi.ac.uk/QuickGO/GMultiTerm Gene
303. Figure 3.2: Specify the sequencing reads for the appropriate family member. Specify the targeted region file (figure 3.3). The targeted region file is a file that specifies which regions have been sequenced when working with whole exome sequencing or targeted amplicon sequencing data. You must provide this file yourself, as it depends on the technology used for sequencing. You can obtain the targeted regions file from the vendor of your targeted sequencing reagents. Specify the affected child's gender for the Trio analysis (figure 3.4). Some workflows contain a Trio Analysis and thus take the gender of the proband into account. Specify the parameters for the Fixed Ploidy Variant Detection tool (figure 3.5).
304. om the wizard to edit parameters in the relevant windows. At the bottom of this wizard there are two buttons regarding export functions: one button allows specification of the export format, and the other button (the one labeled Export Parameters) allows specification of the export destination. When selecting an export location, you will export the analysis parameter settings that were specified for this specific experiment. 4. Click on the button labeled OK to go back to the previous wizard and choose Save. Note: If you choose to open the results, the results will not be saved automatically. You can always save the results at a later point. Figure 5.28: Check the settings and save your results. Output from the Identify Variants (WGS) workflow: The Identify Variants (WGS) tool produces six different types of output: 1. Structural Variants: Variant track showing the structural variants (insertions, deletions, replacements). Hold the mouse over one of the variants, or right-click on the variant, and a tooltip will appear with detailed information abou
305. omics Workbench includes an example data set. If you would like to download the example data, you have three options: 1. You can click Download Example Data in the start-up table that is visible in the Biomedical Genomics Workbench when no datasets have been opened for viewing. This will take you to http://www.clcbio.com/support/downloads/#data where you can choose to download two different example datasets that can be used for the following purposes: • Variant identification in a tumor sample. This dataset is taken from a larger whole exome dataset and includes data from a small fraction of chromosome 5 (Example_data_tumor.zip). • Identification of somatic variants in a tumor sample using the matched normal sample for removal of germline variants. These are matched tumor and normal samples from chromosome 22 from a whole exome dataset (Example_data_tumor_normal.zip). 2. You can also go directly to http://www.clcbio.com/support/downloads/#data and download the example data from there. 3. Finally, you can use these links to get the data: http://download.clcbio.com/testdata/cancer/current/Example_data_tumor.zip or http://download.clcbio.com/testdata/cancer/current/Example_data_tumor_normal.zip When you have downloaded the data from the website, you need to import them into the Biomedical Genomics Workbench. How to import data is described in section 4.3. 2.2 The
306. on the selected target regions. If you do not have a targeted region file to provide, simply click Next. 4. Specify the parameters for the QC for Target Sequencing tool (figure 7.10). When working with targeted data (WES or TAS data), quality checks for the targeted sequencing are included in the workflows. This step is not optional, and you need to specify the targeted regions file adapted to the sequencing technology you used. Choose to use the default settings or to adjust the parameters. The parameters that can be set are: • Minimum coverage: provides the length of each target region that has at least this coverage. Figure 7.9: Specify the targeted region file for the Indels and Structural Variants tool. Figure 7.10: Spec
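To make the Minimum coverage parameter concrete, here is a small sketch of what "the length of each target region that has at least this coverage" means. It is an illustration under stated assumptions, not the QC for Target Sequencing tool itself: the per-base depth list, the region coordinates and the function name `bases_at_min_coverage` are hypothetical, and the value 30 is the default shown in the wizard.

```python
def bases_at_min_coverage(coverage, start, end, min_coverage=30):
    """Count positions in the half-open region [start, end) whose
    read depth is at least min_coverage."""
    return sum(1 for depth in coverage[start:end] if depth >= min_coverage)


# Hypothetical per-base read depth along a short stretch of reference.
coverage = [0, 12, 30, 45, 31, 29, 8]
region_start, region_end = 1, 6  # target region covering 5 bases

covered = bases_at_min_coverage(coverage, region_start, region_end)
fraction = covered / (region_end - region_start)
print(covered, fraction)
```

Reporting the covered length next to the region length, as done with `fraction` here, makes it easy to spot target regions that are only partially covered at the chosen threshold.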
307. on scores. It is possible to filter variants based on their annotations. This type of filtering can be done using the table filter found at the top of the table. If you are performing multiple experiments where you would like to use the exact same filter criteria, you can create a filter that can be saved and reused. To do this, go to: Toolbox | Identify Candidate Variants | Create Filter Criteria. This tool can be used to specify the filter, and the Annotate Variants workflow should be extended by the Identify Candidate Tool configured with the Filter Criterion. The Biomedical Genomics Workbench reference manual has a chapter that describes this in detail: http://www.clcbio.com/support/downloads/#manuals (see chapter Workflows for more information on how pre-installed workflows can be extended and/or edited). Note: Sometimes the databases (e.g. dbSNP) are updated with a newer version, or maybe you have your own version of the database. In such cases you may wish to change one of the used databases. This can be done with the Data Management function, which is described in section 4.1.4. 6.2.2 Identify Somatic Variants from Tumor Normal Pair (WES). The Identify Somatic Variants from Tumor Normal Pair (WES) ready-to-use workflow can be used to identify potential somatic variants in a tumor sample when you also have a normal control sample from the same patient. When running the Ide
308. on that has at least this coverage. • Ignore non-specific matches: reads that are non-specifically mapped will be ignored. • Ignore broken pairs: reads that belong to broken pairs will be ignored. 12. Specify the parameters for the QC for Target Sequencing tool for the father. 13. Specify the parameters for the QC for Target Sequencing tool for the mother. 14. Specify the parameters for the Fixed Ploidy Variant Detection tool for the sibling (figure 7.70). The parameters used by the Fixed Ploidy Variant Detection tool can be adjusted. We have optimized the parameters to the individual analyses, but you may want to tweak some of the parameters to fit your particular sequencing data. A good starting point could be to run an analysis with the default settings. Figure 7.70: Specify the parameters for the Fixed Ploidy Variant Detection tool. The parameters that can be set are: • Required variant probability: is the minimum probability value of the variant site required for the variant to be called. Note that it is not the minimum value of the probability of the individual variant. For the Fixed Ploidy Variant detector, if a
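The distinction between the probability of the variant site and the probability of an individual variant can be sketched with a toy calculation. This is a hedged illustration only: it assumes, for the sake of the example, that the site probability is the summed posterior probability of all non-reference genotypes, which may differ from how the Fixed Ploidy Variant Detection tool computes it. The function name `call_site`, the genotype labels and the probabilities are hypothetical; the threshold 0.9 is the example value used in the parameter description.

```python
def call_site(genotype_probs, reference_genotype, required_variant_probability=0.9):
    """Return (genotype, genotype_probability, site_probability) for the most
    probable non-reference genotype, or None if the site fails the threshold.

    Assumption for illustration: the site probability is the sum of the
    posterior probabilities of all non-reference genotypes.
    """
    candidates = {g: p for g, p in genotype_probs.items() if g != reference_genotype}
    site_probability = sum(candidates.values())
    if not candidates or site_probability < required_variant_probability:
        return None
    best = max(candidates, key=candidates.get)
    return best, candidates[best], site_probability


# Hypothetical posterior probabilities at a site whose reference genotype is A/A.
probs = {"A/A": 0.05, "A/G": 0.55, "G/G": 0.40}
result = call_site(probs, reference_genotype="A/A")
print(result)
```

In this toy case the site passes the 0.9 threshold even though the reported genotype itself has a probability of only 0.55, which mirrors the behaviour described in the parameter text.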
309. onsideration as well as the forward/reverse strand balance. When merging overlapping reads, these two parameters will be affected: (1) the frequency of observed alleles in overlapping regions will be corrected (a variant found both on the forward and the reverse read of the same fragment should only be counted once), and (2) in the merged fragments, the information on forward/reverse strand origin has become meaningless. These effects have to be taken into consideration when filtering variants on these statistics. As the forward/reverse strand balance statistic is used as a variant filter (i.e. the Read direction filter), we recommend using the Prepare Overlapping Raw Data workflow on targeted amplicon sequencing data with an overlapping read sequencing strategy, whereas we recommend the Prepare Raw Data workflow for other sequencing protocols (e.g. whole genome sequencing, whole exome sequencing), even if these make use of overlapping read sequencing. Figure 4.17: You now have the option to choose whether you wish to open or save the imported reads. If you select to open the reads, they will not be saved unless you do it manually at a later point. Select Save and click on the button labeled Next.
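The effect of counting a variant once per fragment instead of once per read can be shown with a short sketch. This is not how the Workbench merges reads; it is a simplified illustration that assumes both mates of a fragment report the same base at the position of interest. The function name `allele_frequency` and the observations are hypothetical.

```python
def allele_frequency(observations, allele, merge_pairs):
    """Frequency of `allele` among observations at one position.

    observations: list of (fragment_id, strand, base) tuples.
    With merge_pairs=True, each fragment contributes a single base
    (simplifying assumption: both mates agree, so the first base is kept).
    """
    if merge_pairs:
        per_fragment = {}
        for fragment_id, _strand, base in observations:
            per_fragment.setdefault(fragment_id, base)
        bases = list(per_fragment.values())
    else:
        bases = [base for _fragment_id, _strand, base in observations]
    if not bases:
        return 0.0
    return bases.count(allele) / len(bases)


# Fragment f1 overlaps the position with both its forward and reverse read.
obs = [("f1", "+", "A"), ("f1", "-", "A"), ("f2", "+", "G"), ("f3", "-", "G")]
print(allele_frequency(obs, "A", merge_pairs=False))
print(allele_frequency(obs, "A", merge_pairs=True))
```

Note that after merging, the strand field no longer carries usable information for fragment f1, which is why a read direction filter is not meaningful on merged fragments.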
310. ool, see http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=Fixed_Ploidy_Variant_Detection.html Specify the Hapmap populations that should be used for filtering out variants found in Hapmap (figure 5.36). This can be done using the drop-down list found in this wizard step. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4). Figure 5.36: Select the relevant Hapmap population(s). 8. Specify the parameters for the Fixed Ploidy Variant Detection tool for the unaffected parent. 9. Specify the parameters for the Fixed Ploidy Variant Detection tool for the affected parent. 10. Specify the parameters for the Fixed Ploidy Variant Detection tool for the affected child. 11. Pressing the button Preview
311. or filtering out variants found in Hapmap (figure 6.57). This can be done using the drop-down list found in this wizard step. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4). Figure 6.57: Select the relevant Hapmap population(s). 8. Specify the parameters for the Fixed Ploidy Variant Detection tool for the affected family member (figure 6.58). The parameters used by the Fixed Ploidy Variant Detection tool can be adjusted. We have optimized the parameters to the individual analyses, but you may want to tweak some of the parameters to fit your particular sequencing data. A good starting point could be to run an analysis with the default settings. The parameters that can be set are:
312. Figure 4.30: Inspect the quality and trimming reports and determine whether you can proceed with the data analysis or if you have to resequence some of the samples. For a detailed description of the QC reports and an indication of how to interpret the different values, see http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=Report_contents.html If you can accept the read quality, you can now proceed to the next step and use the prepared reads output as input in the next ready-to-use workflow. If the quality of your reads is poor and cannot be accepted for further analysis, the best solution is to go back to the start and resequence the sample. You are now ready to perform the actual an
313. ose to adjust the parameters.
[Screenshot: QC for Target Sequencing (proband), configurable parameters: Minimum coverage (30), Ignore non-specific matches, Ignore broken pairs.]
Figure 7.75: Specify the parameters for the QC for Target Sequencing tool.
The parameters that can be set are:
• Minimum coverage: provides the length of each target region that has at least this coverage.
• Ignore non-specific matches: reads that are non-specifically mapped will be ignored.
• Ignore broken pairs: reads that belong to broken pairs will be ignored.
11. Specify the parameters for the QC for Target Sequencing tool for the father.
12. Specify the parameters for the QC for Target Sequencing tool for the mother.
13. Specify the parameters for the Fixed Ploidy Variant Detection tool for the mother (figure 6). The parameters used by the Fixed Ploidy Variant Detection tool can be adjusted. We have optimized the parameters to the individual analyses, but you may want to tweak some of the parameters to fit your particular sequencing data. A good starting point could be to run an analysis with the default settings.
[Screenshot: Fixed Ploidy Variant Detection, configurable parameters: Required variant probability (50.0), Ignore broken pairs, Minimum coverage, Minimum count, Minimum frequency.]
Figure 7.76: Specify t
314. ose to save the results. In this wizard step you get the chance to preview the settings used in the ready-to-use workflow of the export format and the other button, the one labeled Export Parameters, allows specification of the export destination. When selecting an export location, you will export the analysis parameter settings that were specified for this specific experiment.
T Click on the button labeled OK to go back to the previous wizard step and choose Save.
Note: If you choose to open the results, the results will not be saved automatically. You can always save the results at a later point.
Output from the Identify Variants TAS workflow
The Identify Variants TAS tool produces six different types of output:
• Read Mapping: The mapped sequencing reads. The reads are shown in different colors depending on their orientation, whether they are single reads or paired reads, and whether they map unambiguously. For the color codes, please see the description of sequence colors in the CLC Genomics Workbench manual that can be found here: http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=View_settings_in_Side_Panel.html
• Target Regions Coverage: The target regions coverage track shows the coverage of the targeted regions. Detailed information about coverage and read count can be found in the table format, which can be opened by pressing the table ico
315. otated variant track in the Genome Browser View, a table will be shown that includes all variants and the added information/annotations (see figure 7.22). Adding information from other sources may help you identify interesting candidate variants for further research. For example, common genetic variants (present in the HapMap database) or variants known to play a role in drug response or other clinically relevant phenotypes (present in the ClinVar database) can easily be identified. Further, variants not found in the ClinVar database can be prioritized based on amino acid changes, in case the variant causes changes at the amino acid level. A high conservation level between different vertebrates or mammals in the region containing the variant can also be used to give a hint about whether a given variant is found in a region with an important functional role. If you would like to use the conservation scores to identify interesting variants, we recommend that variants with a conservation score of more than 0.9 (PhastCons score) are prioritized over variants with lower conservation scores. It is possible to filter variants based on their annotations. This type of filtering can be done using the table filter found at the top part of the table. If you are performing multiple experiments where you would like to use the exact same filter criteria, you can create a filter that can be saved and reused. To do this: Toolbox | Identify Candidate Variants | C
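The conservation-based prioritization recommended above can be sketched in a few lines of Python. This is only an illustration outside the Workbench: the dictionary keys `id` and `phastcons` are hypothetical names, not fields of an exported variant table, and the 0.9 cut-off is the one quoted in the text.

```python
def prioritize_by_conservation(variants, threshold=0.9):
    """Return variants with a PhastCons score above the threshold first.

    `variants` is a list of dicts with hypothetical keys "id" and "phastcons".
    The original order is preserved inside each of the two groups.
    """
    high = [v for v in variants if v["phastcons"] > threshold]
    low = [v for v in variants if v["phastcons"] <= threshold]
    return high + low


example = [
    {"id": "var1", "phastcons": 0.42},
    {"id": "var2", "phastcons": 0.97},
    {"id": "var3", "phastcons": 0.90},
]
ranked_ids = [v["id"] for v in prioritize_by_conservation(example)]
```

Note that a score of exactly 0.9 is not treated as "more than 0.9", so `var3` stays in the lower-priority group.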
316. overage: Only variants in regions covered by at least this many reads are called.
• Minimum count: Only variants that are present in at least this many reads are called.
• Minimum frequency: Only variants that are present at least at the specified frequency (calculated as count/coverage) are called.
For more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=Fixed_Ploidy_Variant_Detection.html
6. Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters; it is not possible to make any changes. Choose to save the results and click on the button labeled Finish.
Output from the Identify Variants TAS HD workflow
Four types of output are generated:
• A Reads Track: Read Mapping
• A Filtered Variant Track: Identified variants
• A Coverage Report
• A Per-region Statistics Track
7.3.7 Identify and Annotate Variants TAS HD
The Identify and Annotate Variants TAS tool should be used to identify and annotate variants in one sample. The tool consists of a workflow that is a combination of the Identify Variants and the Annotate Variants workflows. The tool runs an internal workflow, which starts with mapping the sequencing reads to the human reference sequence. Then it runs a local realignment to improve the variant detection, which is run afterwards. Aft
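The three thresholds described above (minimum coverage, minimum count, and minimum frequency calculated as count/coverage) can be illustrated with a minimal Python sketch. It is not the tool's implementation; the class and function names are hypothetical, and the frequency is expressed here as a fraction between 0 and 1, which is an assumption about units.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    count: int     # number of reads supporting the variant
    coverage: int  # number of reads covering the position


def passes_filters(candidate, min_coverage, min_count, min_frequency):
    """Apply the three thresholds in turn; frequency = count / coverage."""
    if candidate.coverage <= 0 or candidate.coverage < min_coverage:
        return False
    if candidate.count < min_count:
        return False
    return candidate.count / candidate.coverage >= min_frequency


# Hypothetical thresholds, chosen only for the example.
kept = passes_filters(Candidate(count=5, coverage=20), 10, 2, 0.2)
too_few_reads = passes_filters(Candidate(count=1, coverage=20), 10, 2, 0.01)
low_coverage = passes_filters(Candidate(count=3, coverage=8), 10, 2, 0.2)
```

In the example, the first candidate is kept (5/20 = 0.25), the second fails the count threshold, and the third fails the coverage threshold even though its frequency is high.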
317. overview variant track with information about if the variant has been detected or not, the identified zygosity, if the coverage was sufficient at this position, and the observed allele frequency.
3. In the next step you will be asked to specify which of the 1000 Genomes populations should be used for annotation (figure 7.16). Click on the button labeled Next.
4. In this wizard step you are asked to supply a track containing the targeted regions (figure 7.17). Select the track by clicking on the folder icon in the wizard. Click on the button labeled Next.
5. The next wizard step will once again allow you to specify the 1000 Genomes population that should be used, this time for filtering out variants found in the 1000 Genomes project (figure 7.18). Click on the button labeled Next.
6. The next wizard step (figure 7.19) concerns removal of variants found in the HapMap database. Select the population you would like to use from the drop-down list. Please
[Screenshot: Filter Somatic Variants TAS wizard, step "Select variants identified in tumor".]
318. overview variant track with information about if the variant has been detected or not, the identified zygosity, if the coverage was sufficient at this position, and the observed allele frequency.
Toolbox | Ready-to-Use Workflows | Whole Exome Sequencing | Somatic Cancer | Filter Somatic Variants
2. Double-click on the Filter Somatic Variants tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis.
Next you will be asked to select the variant track you would like to use for filtering somatic variants. The panel in the left side of the wizard shows the kind of input that should be provided (figure 6.15). Select by double-clicking on the reads file name or clicking once on the file and then clicking on the arrow pointing to the right side in the middle of the wizard. Click on the button labeled Next.
In the next step you will be asked to specify which of the 1000 Genomes populations should be used for annotation (figure 6.16). Click on the button labeled Next.
In this wizard step you are asked to supply a track containing the targeted regions (figure 6.17). Select the track by clicking on the folder icon in the wizard. Click on the button labeled Next.
[Screenshot: Filter Somatic Variants WES wizard, step "Select variants identified in tumor".]
319. ow accepts sequencing reads as input from each of the four family members.
How to run the Identify Causal Inherited Variants in a Family of Four TAS workflow
This section recapitulates the steps you need to take to start the workflow, each item corresponding to a different wizard window. For more information on the specific tools used in this workflow, see section 3.3.
To run the Identify Causal Inherited Variants in a Family of Four TAS workflow, go to:
Toolbox | Ready-to-Use Workflows | Targeted Amplicon Sequencing | Hereditary Disease | Identify Causal Inherited Variants in a Family of Four TAS
1. Double-click on the Identify Causal Inherited Variants in a Family of Four TAS tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis.
Select the sequencing reads from the affected family member (figure 7.55). The sequencing reads from the different family members are specified one at a time in the appropriate window. The panel in the left side of the wizard shows the kind of input that should be provided. Select by double-clicking on the reads file name or click once on the file and then on the arrow pointing to the right side in the middle of the wizard.
[Screenshot: wizard step "Select sequencing reads".]
320. owing recessive variants in the proband. The variant track can be opened in table view to see all information about the variants.
• Identified Compound Heterozygous Genes Proband: Gene track with the identified putative compound heterozygous variants in the proband. The gene track can be opened in table view to see the gene names.
• Gene List with de novo Variants: Gene track with the identified de novo variants in the proband. The gene track can be opened in table view to see the gene names.
• Gene List with recessive Variants: Gene track with the identified recessive variants in the proband. The gene track can be opened in table view to see the gene names.
• Target Region Coverage Report: One for each family member. The report consists of a number of tables and graphs that in different ways provide information about the mapped reads from each sample.
• Target Region Coverage: One track for each individual. When opened in table format, it is possible to see a range of different information about the targeted regions, such as target region length, read count, and base count.
• Genome Browser View: This is a collection of tracks shown together in a view that makes it easy to compare information from the individual tracks, for example to compare the identified variants with the read mappings and information from databases.
• De novo Mutations Amino Acid Track
• Recessive Variants Amino Acid Track
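The output names above distinguish de novo and recessive variants in the proband. As a rough illustration of what these categories mean for a trio, the Python sketch below uses textbook definitions; it is not the logic of the workflow itself, the function name is hypothetical, and compound heterozygous detection (which needs gene-level information) is left out.

```python
def classify_trio_variant(child, mother, father):
    """Classify one variant from the number of variant alleles per person.

    Each argument is 0 (absent), 1 (heterozygous) or 2 (homozygous).
    Textbook definitions are assumed:
      de novo:   present in the child, absent from both parents
      recessive: child homozygous, both parents heterozygous carriers
    """
    if child > 0 and mother == 0 and father == 0:
        return "de novo"
    if child == 2 and mother == 1 and father == 1:
        return "recessive"
    return "other"


labels = [
    classify_trio_variant(1, 0, 0),
    classify_trio_variant(2, 1, 1),
    classify_trio_variant(1, 1, 0),
]
```

A variant inherited as a single copy from one parent (the third case) falls into neither category on its own, which is why it is reported as "other" here.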
321. pMap population codes, please see http://www.sanger.ac.uk/resources/downloads/human/hapmap3.html and for the 1000 Genomes Project see http://www.ensembl.org/Help/Faq?id=328
The Delete button allows users to delete locally installed reference data, whereas only administrators are capable of deleting reference data installed on the server. This can be used if you
[Screenshot: Create Custom Reference Data Set dialog (name Hg19, organism homo_sapiens) listing the selectable reference data versions, including 1000 Genomes Project phase_1, ClinVar 20131203, Conservation Scores PhastCons hg19, dbSNP 138, dbSNP Common 138, Gene Ontology 20131027, Genes ensembl_v74 and HapMap phase 3.]
Figure 4.8: Select the version of the 1000 Genomes or HapMap database you want to work with, or select the option custom.
[Screenshot: "Select variant tracks" step showing the CLC_References folder for homo_sapiens.]
322. [Screenshot residue: list of HapMap phase 3 populations (GIH, HCB, LWK, CEU, MEX, YRI, JPT, ASW).]
Figure 6.54: Select the relevant HapMap population(s).
6. Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters; it is not possible to make any changes. Choose to save the results and click on the button labeled Finish.
Output from the Filter Causal Variants WES HD workflow
Three types of output are generated:
• An Amino Acid Track: Shows the consequences of the variants at the amino acid level in the context of the original amino acid sequence. A variant introducing a stop mutation is illustrated with a red amino acid.
• A Genome Browser View
• A Filtered Variant Track
6.3.2 Identify Causal Inherited Variants in Family of Four WES
As the name of the workflow implies, you can use the Identify Causal Inherited Variants in a Family of Four WES ready-to-use workflow to identify inherited causal variants in a family of four. The family relationship can be a child, a mother, a father and one additional affected family member where, in addition to the child (the proband), one of the parents is affected and one additional family member is affected. The fourth family member can be any related and affected family member, such as a sibling, gra
323. pound heterozygous variants from an extended family of four where the fourth individual is not affected.
• Identify Rare Disease Causing Mutations in a Trio: Identifies de novo and compound heterozygous variants from a trio. The workflow includes a back-check for all family members.
• Identify Variants (HD): Calls variants in the mapped and locally realigned reads, removes false positives and, in the case of a targeted experiment, removes variants outside the targeted region. Variant calling is performed with the Fixed Ploidy Variant Detection tool.
Although each workflow designed to analyze hereditary diseases is specific to the data used or the type of analysis, the workflows share several tools and steps. Below you can find a general description of how to run a workflow in the category Hereditary Diseases.
In some workflows, such as the Filter Causal Variants workflows, you will be asked for a variant track as input. Other workflows start with specifying a reads track. This is the case for all workflows that start with Identify Variants in the name.
Note that in the case of workflows annotating variants using databases available for more than one population, you can select the population that best matches the population your samples are derived from. This will be done in the wizard for populations from the 1000 Genomes Project, while HapMap populations are specified with the Data
324. quencing reads from the sample that should be analyzed. If several samples should be analyzed, the tool has to be run in batch mode. This is done by selecting Batch (tick Batch at the bottom of the wizard as shown in figure 6.42) and selecting the folder that holds the data you wish to analyse. If you have your sequencing data in separate folders, you should choose to run the analysis in batch mode.
[Screenshot: Identify and Annotate Variants WES wizard, step "Select sequencing reads".]
Figure 6.42: Please select all sequencing reads from the sample to be analyzed.
When you have selected the sample(s) you wish to prepare, click on the button labeled Next.
3. In the next wizard step (figure 6.43) you can select the population from the 1000 Genomes project that you would like to use for annotation.
[Screenshot: Identify and Annotate Variants WES wizard, 1000 Genomes population drop-down showing 1000GENOMES_phase_1_EUR.]
325. r sample reads, click on the button labeled Next.
2. In the next wizard step (figure 7.24), please specify the normal sample reads.
[Screenshot: Identify Somatic Variants from Tumor Normal Pair TAS wizard, step "Select sequencing reads", with ERR319085 selected as input.]
Figure 7.24: Select the normal sample reads.
3. When you have selected the sample(s) you wish to analyze, click on the button labeled Next. This step allows you to restrict the calling of InDels and structural variants to the targeted regions (figure 7.25).
4. Click on the button labeled Next to go to the next wizard step (figure 7.26).
[Screenshot: InDels and Structural Variants (normal), configurable parameter "Restrict calling to target regions" set to target_regions.]
Figure 7.25: Specify the target regions track.
326. rd shows the kind of input that should be provided. Select by double-clicking on the reads file name or click once on the file and then on the arrow pointing to the right side in the middle of the wizard.
[Screenshot: wizard step "Select reads from affected family member" with the Family of Four example data.]
Figure 6.77: Specify the sequencing reads for the appropriate family member.
3. Specify a target region file for the Indels and Structural Variants tool (figure 6.78). The targeted region file is a file that specifies which regions have been sequenced when working with whole exome sequencing or targeted amplicon sequencing data. You must provide this file yourself, as it depends on the technology used for sequencing. You can obtain the targeted regions file from the vendor of your targeted sequencing reagents.
[Screenshot: InDels and Structural Variants, configurable parameter "Restrict calling to target regions" set to S0293689_Regions_BED.]
Figure 6.78: Specify the parameters for the Indels and Structural Variants tool.
4. Specify the parameters for the QC for Target Sequencing tool, including a Target re
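To make the idea of a targeted region file concrete, here is a small Python sketch that reads region lines and checks whether a position lies inside a target. It assumes the vendor file is in BED format (the example file name above suggests this, but the text does not state it), where the start coordinate is 0-based and the end coordinate is exclusive. The function names are hypothetical and unrelated to the Workbench.

```python
def parse_bed(lines):
    """Parse BED lines into {chromosome: [(start, end), ...]}.

    Only the first three columns are used. Header lines beginning with
    '#', 'track' or 'browser' are skipped.
    """
    regions = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def in_target(regions, chrom, position):
    """True if the 1-based position lies in a target region.

    A BED interval (start, end) covers 1-based positions start+1 .. end.
    """
    return any(start < position <= end for start, end in regions.get(chrom, []))


targets = parse_bed(["track name=example", "chr1\t100\t200", "chr2\t50\t60"])
```

With these example regions, `in_target(targets, "chr1", 101)` and `in_target(targets, "chr1", 200)` are true, while positions 100 and 201 fall outside the first target.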
327. [Screenshot residue: wizard step "Select reads from affected family member" with the Family of Four example data.]
Figure 6.71: Specify the sequencing reads for the appropriate family member.
3. Select the sequencing reads from the mother.
4. Select the targeted region file (figure 6.72). The targeted region file is a file that specifies which regions have been sequenced when working with whole exome sequencing or targeted amplicon sequencing data. You must provide this file yourself, as it depends on the technology used for sequencing. You can obtain the targeted regions file from the vendor of your targeted sequencing reagents.
5. Select the sequencing reads from the affected child.
6. Specify the affected child's gender (figure 6.73).
[Screenshot: "Select input for targeted region file" with S0293689_Regions_BED selected.]
Figure 6.72: Select the targeted region file you used for sequencing.
[Screenshot: Trio Analysis, configurable parameter Child gender: Female.]
328. reate Filter Criteria
[Screenshot: Genome Browser View around chromosome 1 position 115,258,720 to 115,258,780 with the tracks Homo_sapiens_sequence_hg19, Homo_sapiens_ensembl_v73 genes, mRNA and CDS, the ERR319087 locally realigned reads, Somatic Candidate Variants, Cosmic_v67, ClinVar_20130930 and PhastCons conservation scores, shown above the table view of the variants.]
Figure 7.22: The Genome Browser View showing the annotated somatic variants together with a range of other tracks.
This tool can be used to specify the filter and the Annotate V
329. reported even if the probability of that particular variant might be less than the threshold. For example, if the required variant probability is set to 0.9, then the individual probability of the variant called might be less than 0.9 as long as the probability of the entire variant site is greater than 0.9.
• Ignore broken pairs: When ticked, reads from broken pairs are ignored. Broken pairs may arise for a number of reasons, one being erroneous mapping of the reads. In general, variants based on broken pair reads are likely to be less reliable, so ignoring them may reduce the number of spurious variants called. However, broken pairs may also arise for biological reasons (e.g. due to structural variants), and if they are ignored some true variants may go undetected. Please note that ignored broken pair reads will not be considered for any non-specific match filters.
• Minimum coverage: Only variants in regions covered by at least this many reads are called.
• Minimum count: Only variants that are present in at least this many reads are called.
• Minimum frequency: Only variants that are present at least at the specified frequency (calculated as count/coverage) are called.
For more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=Fixed_Ploidy_Variant_Detection.html
11. Specify the parameters for the Fixed Ploidy Variant Detection tool for the affected parent.
12. Specify the par
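The difference between the probability of a variant site and the probability of an individual variant, as described above, can be shown with a toy Python example. The sketch makes a strong simplifying assumption that is not taken from the manual: the site probability is treated as the sum of the probabilities of all non-reference alleles. The function name and the numbers are hypothetical.

```python
def call_site(allele_probabilities, reference, required_variant_probability):
    """Report the most probable variant if the whole site passes the threshold.

    Assumption for illustration only: the probability that the site is
    variant is the sum of the probabilities of all non-reference alleles.
    Returns (allele, allele_probability, site_probability) or None.
    """
    variants = {a: p for a, p in allele_probabilities.items() if a != reference}
    site_probability = sum(variants.values())
    if not variants or site_probability < required_variant_probability:
        return None
    best = max(variants, key=variants.get)
    return best, variants[best], site_probability


# Reference allele A; two competing variant alleles at the same site.
call = call_site({"A": 0.05, "C": 0.50, "T": 0.45}, "A", 0.9)
no_call = call_site({"A": 0.50, "C": 0.30, "T": 0.20}, "A", 0.9)
```

In the first case the site passes the 0.9 threshold, so allele C is reported even though its own probability is only 0.5. In the second case the site as a whole does not pass, so nothing is called.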
330. result track presented in the context of the human reference sequence, genes, transcripts, coding regions, targeted regions and mapped sequencing reads. Finally, a track with conservation scores has been added to be able to see the level of nucleotide conservation (from a multiple alignment with many vertebrates) in the region around each variant. By double-clicking on one of the annotated variant tracks in the Genome Browser View, a table will be shown that includes all variants and the added information/annotations (see 7.14).
Note: We do not recommend that any of the produced files are deleted individually, as some of them are linked to other outputs. Please always delete all of them at the same time.
7.2 Somatic Cancer TAS
7.2.1 Filter Somatic Variants TAS
If you are analyzing a list of variants that have been detected in a tumor or blood sample where no control sample is available from the same patient, you can use the Filter Somatic Variants TAS ready-to-use workflow to identify potential somatic variants. The purpose of this ready-to-use workflow is to use publicly available (or your own) databases with common variants in a population to extract potential somatic variants whenever no control/normal sample from the same patient is available.
The Filter Somatic Variants TAS ready-to-use workflow accepts variant tracks (e.g. the output from the Identify Variants ready-to-use workflow) as input. Variants that are identical to the
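The idea of using a database of common population variants to extract potential somatic variants can be sketched as a simple set lookup. This is a conceptual Python illustration only: matching on chromosome, position, reference allele and alternative allele is an assumption about what counts as the same variant, and the function name and example records are hypothetical.

```python
def remove_common_variants(sample_variants, common_variants):
    """Keep only the sample variants that are absent from the common set.

    Each variant is a tuple (chromosome, position, reference, alternative).
    """
    common = set(common_variants)
    return [v for v in sample_variants if v not in common]


sample = [
    ("1", 115258748, "C", "T"),
    ("1", 115258749, "T", "C"),
    ("2", 212652720, "TG", "GT"),
]
population_db = [("1", 115258749, "T", "C")]
candidates = remove_common_variants(sample, population_db)
```

Only exact matches are removed, so a different alternative allele at a known position would be kept as a candidate.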
331. [Screenshot residue: "Select input for targeted region file" with S0293689_Regions_BED selected, followed by the Remove Variants Found in HapMap step with 12 elements selected.]
Figure 7.62: Select the relevant HapMap population(s).
When working with targeted data (WES or TAS data), quality checks for the targeted sequencing are included in the workflows. Again, you can choose to use the default settings, or you can choose to adjust the parameters.
[Screenshot: QC for Target Sequencing (proband), configurable parameters: Minimum coverage (30), Ignore non-specific matches, Ignore broken pairs.]
Figure 7.63: Specify the parameters for the QC for Target Sequencing tool.
The parameters that can be set are:
• Minimum coverage: provides the length of each target region that has at least this coverage.
• Ignore non-specific matches: reads that are non-specifically mapped will be ignored.
• Ignore broken pairs: reads that belong to broken pairs will be ignored.
8. Specify the parameters for the QC for Target Sequencing tool for the unaffected parent.
9. Spec
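The Minimum coverage parameter is described above as giving the length of each target region that has at least this coverage. A minimal Python sketch of that calculation for one region follows; it assumes the per-base read depths are already available as a list, and the function name is hypothetical. The default of 30 is the value shown in the screenshot.

```python
def length_with_min_coverage(per_base_depth, minimum_coverage=30):
    """Count the positions of a target region covered by at least
    `minimum_coverage` reads."""
    return sum(1 for depth in per_base_depth if depth >= minimum_coverage)


# Hypothetical read depths for a five-base target region.
depths = [10, 30, 45, 29, 31]
covered = length_with_min_coverage(depths)
fraction_covered = covered / len(depths)
```

For the example depths, three of the five positions reach a depth of 30, so 60% of the region meets the threshold.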
332. rial covered by this manual
This user manual provides introductory material on how to work with the software, including the import and initial handling of data, and a guide to the data types and user interface. Its main focus is to provide guidance on how to use the workflows that come with the software. Also included is an appendix with a table listing the available reference data, as well as a small dictionary of terminology used in the Biomedical Genomics Workbench. The dictionary is not exhaustive, but we hope it will serve as a useful reference, especially for new users. For comprehensive descriptions of the features and functionalities of the individual tools, please refer to the Biomedical Genomics Workbench reference manual.
1.4 We welcome your comments and suggestions
We aim to provide user-friendly software for important analyses such as identifying inherited disease traits and identifying somatic mutations that underlie this complex disease. To this end, we continuously develop our bioinformatic platform, expand the collection of research tools and extend our documentation resources. We welcome any comments or suggestions you have. These help us greatly in further developing and improving our software. Comments and suggestions can be submitted directly from within the software using the menu option Help | Contact Support.
1.5 Contact information
The Biomedical Genomics Workbench is developed by
333. riants: A variant track holding the identified variants that are found in the targeted regions. The variants can be shown in track format or in table format. When holding the mouse over the detected variants in the Genome Browser view, a tooltip appears with information about the individual variants. You will have to zoom in on the variants to be able to see the detailed tooltip.
• Annotated Somatic Variants: A variant track holding the identified and annotated somatic variants. The variants can be shown in track format or in table format. When holding the mouse over the detected variants in the Genome Browser view, a tooltip appears with information about the individual variants. You will have to zoom in on the variants to be able to see the detailed tooltip.
• Genome Browser View Tumor Normal Comparison: A collection of tracks presented together. Shows the annotated variants track together with the human reference sequence, genes, transcripts, coding regions, the mapped reads for both normal and tumor, the annotated somatic variants, information from the ClinVar database, and finally a track showing the conservation score (see figure 7.33).
[Screenshot: Genome Browser View with the Homo_sapiens sequence and ensembl_v74 gene annotation tracks.]
334. riants represent the same allele, although there is no evidence for this in the track of known variants.
[Screenshot: Compare Variants in DNA and RNA wizard, step "Add Information from 1000 Genomes Project", configurable parameter "Known variants track" with 1000GENOMES_phase_1_AFR and 1000GENOMES_phase_1_AMR among the choices.]
Figure 8.12: Select the relevant population from the drop-down list.
Repeat the 2 previous steps (or 3 if you are working with the workflow from the human folder) to specify the target region, set the parameters for the Low Frequency Variant Detection, the DNA sample, and potentially the population from the 1000 Genomes Project that best characterizes your DNA sample.
8. Click on the button labeled Next to go to the result handling step (figure 8.13).
[Screenshot: Compare Variants in DNA and RNA wizard, step "Result handling", with the Preview All Parameters button.]
335. rkflows | Create new Workflow in the upper right side corner of the workbench (figure 9.1).
[Screenshot: Workflows menu with the entries New Workflow and Manage Workflows.]
Figure 9.1: Click on Create new Workflow.
Next, drag and drop the preinstalled workflow that you would like to modify from the toolbox to the opened Workflow Editor (figure 9.2). You can now see the underlying workflow. If you right-click on the View Area and click Layout, the layout will be adjusted. You will see that at this point you do not have any input associated with the workflow. Please add an input at the top of the workflow by right-clicking on the first tool in the workflow.
[Screenshot: Workflow Editor showing a copy of the Identify Somatic Variants from Tumor Normal Pair workflow next to the Navigation Area.]
336. rmal Pair WES
6.2.3 Identify Variants WES
6.2.4 Identify and Annotate Variants WES
6.3 Hereditary Disease WES
6.3.1 Filter Causal Variants WES HD
6.3.2 Identify Causal Inherited Variants in Family of Four WES
6.3.3 Identify Causal Inherited Variants in Trio WES
6.3.4 Identify Rare Disease Causing Mutations in Family of Four WES
6.3.5 Identify Rare Disease Causing Mutations in Trio WES
6.3.6 Identify Variants WES HD
6.3.7 Identify and Annotate Variants WES HD
The protein coding part of the human genome accounts for around 1% of the genome and consists of around 180,000 exons covering an area of 30 megabases (Mb) (Ng et al., 2009). By targeting sequencing to only the protein coding parts of the genome, exome sequencing is a cost-efficient way of generating sequencing data that is believed to harbor the vast majority of the disease-causing mutations (Choi et al., 2009).
Thirteen ready-to-use workflows are available for analysis of whole exome sequencing data (figure 6.1). The concept of the pre-installed ready-to-use workflows is that read data are used as input at one end of the workflow, and at the other end of the workflow you get a track-based genome browser view and a table with all the i
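The figures quoted above for the exome (about 180,000 exons covering about 30 Mb, around 1% of the genome) can be checked with a short calculation. The Python sketch below assumes a haploid human genome size of roughly 3,100 Mb, which is not stated in the text.

```python
# Figures quoted in the text.
EXOME_MB = 30
EXON_COUNT = 180_000
# Assumption (not from the text): a haploid human genome of roughly 3,100 Mb.
GENOME_MB = 3_100

exome_fraction = EXOME_MB / GENOME_MB
mean_exon_length_bp = EXOME_MB * 1_000_000 / EXON_COUNT

print(f"exome share of genome: {exome_fraction:.1%}")
print(f"mean exon length: {mean_exon_length_bp:.0f} bp")
```

Under that assumption the exome is about 1.0% of the genome, in line with the text, and the average exon is about 167 bp long.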
337. rmation about the variants.
• De novo variants: Variant track showing de novo variants in the proband. The variant track can be opened in table view to see all information about the variants.
• Recessive variants: Variant track showing recessive variants in the proband. The variant track can be opened in table view to see all information about the variants.
• Identified Compound Heterozygous Genes Proband: Gene track with the identified putative compound heterozygous variants in the proband. The gene track can be opened in table view to see the gene names.
• Gene List with de novo Variants: Gene track with the identified de novo variants in the proband. The gene track can be opened in table view to see the gene names.
• Gene List with recessive Variants: Gene track with the identified recessive variants in the proband. The gene track can be opened in table view to see the gene names.
• Target Region Coverage Report: One for each family member. The report consists of a number of tables and graphs that in different ways provide information about the mapped reads from each sample.
• Target Region Coverage: One track for each individual. When opened in table format, it is possible to see a range of information about the targeted regions, such as target region length, read count, and base count.
• Genome Browser View: This is a collection of tracks shown together in a view
338. rrow pointing to the right side in the middle of the wizard to send them to Selected elements in the right side of the wizard. To run the samples in Batch mode
Figure 4.20: After you have identified the trim list that you want to import, select Trim adapter list (.xls, .xlsx, .csv) in the Files of type drop-down list in the Import wizard.
Figure 4.21: The ready-to-use workflows are found in the toolbox.
339. rs. At this step you can only view the parameters; it is not possible to make any changes. Choose to save the results and click on the button labeled Finish.
Output from the Identify Causal Inherited Variants in a Trio TAS workflow
Six types of output are generated:
• Reads Tracks: One for each family member. The reads mapped to the reference sequence.
• Variants in: One track for each family member. The variants identified in each of the family members. The variant track can be opened in table view to see all information about the variants.
• Putative Causal Variants in Child: The putative disease-causing variants identified in the child. The variant track can be opened in table view to see all information about the variants.
• Gene List with Putative Causal Variants: Gene track with the identified putative causal variants in the child. The gene track can be opened in table view to see the gene names.
• Target Region Coverage Report: One for each family member. The report consists of a number of tables and graphs that in different ways provide information about the mapped reads from each sample.
• Target Region Coverage: One track for each individual. When opened in table format, it is possible to see a range of information about the targeted regions, such as target region length, read count, and base count.
• An Amino Acid Track: Shows the consequences of the variants at the amino acid level in the context of the original a
340. rt.html
3. An RNA Gene Expression: A track showing gene expression annotations. Hold the mouse over the track or right-click on it. If you have zoomed in to nucleotide level, a tooltip will appear with information about e.g. gene name and expression values.
Figure 8.14: Preview all parameters. At this step it is not possible to introduce any changes; it is only possible to view the settings.
4. An RNA Transcript Expression: A track showing transcript expression annotations. Hold the mouse over the track or right-click on it. A tooltip will appear with information about e.g. gene name and expression values.
5. A Filtered Variant Track with All Variants Found in DNA or RNA: This track shows all variants that have been detected in either RNA, DNA or both only the variants that are present in both DNA and RNA. With the table icon the lower left part of the View Area it is pos
341. Figure 6.62: Select the relevant HapMap population(s).
Specify the parameters for the QC for Target Sequencing tool for the affected child (figure 6.63). When working with targeted data (WES or TAS data), quality checks for the targeted sequencing are included in the workflows. Again, you can choose to use the default settings or you can choose to adjust the parameters.
Figure 6.63: Specify the parameters for the QC for Target Sequencing tool.
The parameters that can be set are:
• Minimum coverage: provides the length of each target region that has at least this coverage.
• Ignore non-specific matches: reads that are non-specifically mapped will be ignored.
• Ignore broken p
342. s but you may want to tweak some of the parameters to fit your particular sequencing data. A good starting point could be to run an analysis with the default settings.
Figure 5.38: Specify the parameters for the Fixed Ploidy Variant Detection tool.
The parameters that can be set are:
• Required variant probability: the minimum probability value of the variant site required for the variant to be called. Note that it is not the minimum value of the probability of the individual variant. For the Fixed Ploidy Variant detector, if a variant site (and not the variant itself) passes the variant probability threshold, then the variant with the highest probability at that site will be reported, even if the probability of that particular variant is less than the threshold. For example, if the required variant probability is set to 0.9, then the individual probability of the variant called might be less than 0.9, as long as the probability of the entire variant site is greater than 0.9.
• Ignore broken pairs: When ticked, reads from broken pairs are ignored. Broken pairs may arise for a number of reasons, one being erroneous mapping of the reads. In general, variants based on broken pair reads are likely
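The difference between the site probability and the probability of an individual variant can be sketched in a few lines of Python. This is only an illustration and not the Workbench's implementation: the function name `call_site`, the input dictionary, and in particular the assumption that the site probability is the sum of the candidate variants' probabilities are hypothetical.

```python
def call_site(variant_probs, required_variant_probability=0.9):
    """Illustrative only: report the most probable variant if the site passes.

    variant_probs maps each candidate variant at one site to its individual
    probability. ASSUMPTION: the site probability is the sum of these values.
    """
    site_probability = sum(variant_probs.values())
    if site_probability < required_variant_probability:
        return None  # the site as a whole fails the threshold
    # The site passes, so the single most probable variant is reported,
    # even though its own probability may be below the threshold.
    best = max(variant_probs, key=variant_probs.get)
    return best, variant_probs[best]


# Two candidates at one site: neither reaches 0.9 alone, but the site does.
result = call_site({"G>T": 0.55, "G>A": 0.40})
print(result)
```

In this sketch the variant `G>T` is reported with an individual probability of 0.55, because the threshold is applied to the site and not to the variant.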
343. s been added to be able to see the level of nucleotide conservation from a multiple alignment with many vertebrates in the region around each variant. By double-clicking on one of the annotated variant tracks in the Genome Browser View, a table will be shown that includes all variants and the added information/annotations (see 6.14).
Note: We do not recommend that any of the produced files are deleted individually, as some of them are linked to other outputs. Please always delete all of them at the same time.
Figure 6.13: Genome Browser View that allows inspection of the identified variants in the context of the human genome and external databases.
6.2 Somatic Cancer WES
6.2.1 Filter Somatic Variants WES
If you are analyzing a list of variant
344. s number of control reads show the particular allele will be filtered away in the result track.
8. Click on the button labeled Next to go to the last wizard step, shown in figure 8.22.
Figure 8.20: Specify the parameters for variant calling. (Configurable parameters shown for Low Frequency Variant Detection: Required significance, Restrict calling to target regions, Ignore broken pairs, Ignore non-specific matches, Minimum read length, Minimum coverage, Minimum count, Minimum frequency, Base quality filter, Read direction filter, Direction frequency, Relative read direction filter, Significance, Read position filter, Significance, Remove pyro-error variants.)
345. s should be analyzed, the tool has to be run in batch mode. This is done by ticking Batch at the bottom of the wizard, as shown in figure 7.42, and selecting the folder that holds the data you wish to analyze. If you have your sequencing data in separate folders, you should choose to run the analysis in batch mode. When you have selected the sample(s) you wish to prepare, click on the button labeled Next.
2. In this wizard step you can restrict calling of InDels and structural variants to the targeted regions by specifying the track with the targeted regions from the experiment (figure 7.35).
3. In the next wizard step (figure 7.36) you have to specify the track with the targeted regions from the experiment. You can also specify the minimum read coverage which should be present in the targeted regions.
4. Click on the button labeled Next, which will take you to the next wizard step (figure 7.37). In this wizard step you can specify the parameters for detecting variants.
5. Click on the button labeled Next, which will take you to the next wizard step (figure 7.38).
Figure 7.35: Select the track
346. s that have been detected in a tumor or blood sample where no control sample is available from the same patient, you can use the Filter Somatic Variants WES ready-to-use workflow to identify potential somatic variants. The purpose of this ready-to-use workflow is to use publicly available (or your own) databases with common variants in a population to extract potential somatic variants whenever no control/normal sample from the same patient is available.
The Filter Somatic Variants WES ready-to-use workflow accepts variant tracks (e.g. the output from the Identify Variants ready-to-use workflow) as input. In cases with heterozygous variants, the reference allele is first filtered away, then variants outside the targeted region are removed, and lastly, variants found in the Common dbSNP, 1000 Genomes Project, and HapMap databases are deleted. Variants found in those databases are assumed not to be relevant somatic variants. Please note that this tool will likely also remove inherited cancer variants that are present at a low percentage in a population. Next, the remaining somatic variants are annotated with gene names, amino acid changes, conservation scores, and information from ClinVar (known variants with medical impact) and dbSNP (all known variants).
How to run the Filter Somatic Variants WES workflow
To run the Filter Somatic Variants WES tool, go to
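The order of the filtering steps described for the Filter Somatic Variants WES workflow (reference allele, targeted regions, common-variant databases) can be sketched as follows. This is a hedged illustration, not the workflow's actual code: the `Variant` type, the function name `filter_somatic_candidates`, and the representation of the databases as plain Python sets are all assumptions made for the example.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_somatic_candidates(variants, target_regions, common_databases):
    """Apply the three filtering steps in order (hypothetical data model).

    target_regions: list of (chrom, start, end) with inclusive coordinates.
    common_databases: list of sets of Variant, standing in for tracks such
    as common dbSNP, 1000 Genomes Project, and HapMap.
    """
    kept = []
    for v in variants:
        if v.alt == v.ref:
            continue  # step 1: drop the reference allele
        in_target = any(
            chrom == v.chrom and start <= v.pos <= end
            for chrom, start, end in target_regions
        )
        if not in_target:
            continue  # step 2: outside the targeted regions
        if any(v in db for db in common_databases):
            continue  # step 3: known common variant
        kept.append(v)
    return kept


variants = [
    Variant("chr7", 100, "G", "G"),  # reference allele
    Variant("chr7", 100, "G", "T"),  # remaining candidate
    Variant("chr7", 900, "A", "C"),  # off target
    Variant("chr7", 150, "C", "T"),  # present in a common-variant database
]
targets = [("chr7", 50, 200)]
common = [{Variant("chr7", 150, "C", "T")}]
candidates = filter_somatic_candidates(variants, targets, common)
print(candidates)
```

Only the `G` to `T` variant at position 100 survives all three steps in this toy input.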
347. Figure 2.13: We have now zoomed in on one specific SNV that is found in a coding region. By holding the mouse over the variant, a tooltip will appear that provides further information about the specific variant. In this case we have found a heterozygous SNV. The normal base at this position is G, but in some of the reads you will see a T. Actually, you can only see one T in the reads, but if you look in the stacked reads (those in the color mass where you cannot see each individual read represented), there are four green lines (red box) indicating that there are Ts at this position in more reads. When holding the mouse over an individual SNV, as highlighted in the red circle, a tooltip will appear with information about the SNV. This tooltip informs us that 29 Ts are observed in the 447 reads covering this particular position. When hovering the mouse cursor over a particular base in the reference track, the genomic position for this base is shown as highlighted
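The counts reported in the tooltip (29 Ts in 447 reads) can be turned into an allele frequency with simple arithmetic. The helper below is a minimal sketch; the function name `allele_frequency` is hypothetical and not part of the Workbench.

```python
def allele_frequency(allele_count, coverage):
    """Fraction of the reads covering a position that carry the allele."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return allele_count / coverage


# Values taken from the tooltip described in the text.
frequency = allele_frequency(29, 447)
print(f"{frequency:.1%}")  # prints 6.5%
```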
348. se causing mutations in families. Hereditary diseases can be non-cancer-related diseases, such as inherited heart diseases or familial hypercholesterolemia, or they can be inherited cancers, such as hereditary colorectal cancer or hereditary breast cancer. In addition to the hereditary diseases, family analysis can help researchers identify rare disease-causing mutations that can be:
• a new mutation, also known as a de novo mutation, that is only present in a child and not in any of the parents
• a combination of events that occur in the same gene but at different positions in each of the parents, which is not disease-causing by itself in either of the parents, but becomes disease-causing when both variants are found in a child. This type of variant is known as a compound heterozygous variant.
A range of different workflows exist in this category that have been optimized for different purposes. In the current version of the Biomedical Genomics Workbench we offer workflows tailored to two family sizes: 1) a classical Trio, consisting of a mother, a father, and an affected child (the proband), and 2) a Family of Four, which is mother, father, affected child, and either a sibling (in the workflows that detect rare diseases) or another affected family member (in the workflows that detect inherited diseases) that can be any affected relative, such as a sibling, grandparent, or the lik
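The two definitions above (de novo and compound heterozygous) can be expressed as simple set operations on the variants of a trio. This is a simplified sketch under stated assumptions, not the logic of the Workbench workflows: variants are plain tuples, zygosity is ignored, and the function names and the `gene_of` lookup are hypothetical.

```python
from collections import defaultdict


def de_novo_variants(child, mother, father):
    """Variants present in the child and in neither parent."""
    return child - mother - father


def compound_het_genes(child, mother, father, gene_of):
    """Genes in which the child carries at least one variant inherited only
    from the mother and at least one inherited only from the father."""
    maternal = defaultdict(set)
    paternal = defaultdict(set)
    for variant in child:
        gene = gene_of[variant]
        if variant in mother and variant not in father:
            maternal[gene].add(variant)
        elif variant in father and variant not in mother:
            paternal[gene].add(variant)
    return set(maternal) & set(paternal)


# Toy data: (chromosome, position, reference, alternative)
v1 = ("chr1", 100, "A", "G")
v2 = ("chr1", 250, "C", "T")
v3 = ("chr2", 40, "G", "A")
gene_of = {v1: "GENE_A", v2: "GENE_A", v3: "GENE_B"}

child, mother, father = {v1, v2, v3}, {v1}, {v2}
print(de_novo_variants(child, mother, father))
print(compound_het_genes(child, mother, father, gene_of))
```

Here `v3` is de novo, and `GENE_A` carries one variant from each parent at different positions, so it is flagged as a putative compound heterozygous gene.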
349. Figure 7.49: Check the settings and save your results.
of the reads are mapped to the human reference sequence. In case of a targeted experiment, please also check that the majority of the reads are mapping to the targeted region. Next, open the Genome Browser View file (see figure 7.50). The Genome Browser View includes a track of the identified annotated variants in context to the human reference sequence, genes, transcripts, coding regions, targeted regions, mapped sequencing reads, clinically relevant variants in the ClinVar database, as well as common variants in the common dbSNP, HapMap, and 1000 Genomes databases.
350. shown. Additional analyses can be performed downstream of this if desired. Downstream analysis could involve using another ready-to-use workflow or running individual tools from the Tools section of the Toolbox.
The ready-to-use workflows to run, and how many of them to run, depend on the type of data you have and the analysis you wish to perform. For example, overlapping paired data involves other considerations than single or non-overlapping paired data. Different workflows will be relevant if your aim is to detect variants or annotate variants with information from other databases. Typically you will need to run two or three workflows to complete a full analysis that includes preparation of the raw data. Figure 2 shows some of the ready-to-use workflows that are available for each application.
Irrespective of the application type, the first step involves preparation of the raw data. Which ready-to-use workflow to launch for the data preparation depends on the type of data being analyzed. For example, the Prepare Overlapping Raw Data workflow is designed to handle reads with overlapping pairs, whereas the Prepare Raw Data workflow is for read sets without overlapping pairs. The initial data preparation step involves quality control and trimming of the reads.
2.4 The track format
The Biomedical Genomics Workbench provides a built-in Genome Browser. This vie
351. sible to add tracks to the Genome Browser View, such as mapped sequencing reads, as well as other tracks. This can be done by dragging the track directly from the Navigation Area to the Genome Browser View. If you double-click on the name of the annotated variant track in the left-hand side of the Genome Browser View, a table that includes all variants and the added information/annotations will open (see figure 8.6). The table and the Genome Browser View are linked: if you click on an entry in the table, this particular position in the genome will automatically be brought into focus in the Genome Browser View.
352. sible to switch to table view. The table view provides details about the variants, such as type, zygosity, and information from a range of different databases.
7. A Genome Browser View Variants Found in DNA and RNA: A collection of tracks presented together. Shows the annotated variants track together with the human reference sequence, genes, transcripts, coding regions, and variants detected in ClinVar and dbSNP (see figure 8.15).
The three most important tracks generated are the Variants found in both DNA and RNA track, the All variants found in DNA or RNA track, and the Genome Browser View. The Genome Browser View makes it easy to get an overview in the context of a reference sequence and to compare variant and expression tracks with information from different databases. The two other tracks, Variants found in both DNA and RNA and All variants found in DNA or RNA, provide detailed information about the detected variants when opened in table view.
353. sion values.
2. Transcript expression: A track showing transcript expression annotations. Hold the mouse over the track or right-click on it. A tooltip will appear with information about e.g. gene name and expression values.
3. RNA-Seq Mapping Report: This report contains information about the reads, reference, transcripts, and statistics. This is explained in more detail in the Biomedical Genomics Workbench reference manual in section RNA-Seq report: http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=RNA_Seq_report.html
4. Read Mapping: The mapped RNA-Seq reads. The RNA-Seq reads are shown in different colors depending on their orientation, whether they are single reads or paired reads, and whether they map unambiguously. For the color codes, please see the description at http://www.clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=View_settings_in_Side_Panel.html
5. Annotated Variants with Expression Values: Annotation track showing the variants. Hold the mouse over one of the variants or right-click on the variant. A tooltip will appear with detailed information about the variant.
6. RNA-Seq Genome Browser View: A collection of tracks presented together. Shows the annotated variants track together with the human reference sequence, genes, transcripts, coding regions, and variants detected in ClinVar and dbSNP (see figure 8.15).
354. specifies which regions have been sequenced when working with whole exome sequencing or targeted amplicon sequencing data. You must provide this file yourself, as it depends on the technology used for sequencing. You can obtain the targeted regions file from the vendor of your targeted sequencing reagents.
Figure 6.61: Select the targeted region file you used for sequencing.
5. Select the reads for the affected child.
6. Specify the HapMap populations that should be used for filtering out variants found in HapMap (figure 6.62). This can be done using the drop-down list found in this wizard step. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4).
355. st the parameters.
Figure 3.6: Specify the parameters for the QC for Target Sequencing tool.
The parameters that can be set are:
• Minimum coverage: provides the length of each target region that has at least this coverage.
• Ignore non-specific matches: reads that are non-specifically mapped will be ignored.
• Ignore broken pairs: reads that belong to broken pairs will be ignored.
When asked for it, specify the targeted regions track (figure 3.7). For more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=QC_Target_Sequencing.html
Map Reads to a reference (figure 3.8): For this tool, the Autodetect paired distances setting is switched off in all Targeted Amplicon Sequencing workflows.
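What the Minimum coverage parameter measures (the length of a target region that has at least the given coverage) can be illustrated with a short Python sketch. The function name `bases_at_minimum_coverage` and the list of per-base depths are hypothetical; the default of 30 mirrors the value shown in the parameter dialog.

```python
def bases_at_minimum_coverage(per_base_coverage, minimum_coverage=30):
    """Count the positions in one target region covered by at least
    minimum_coverage reads."""
    return sum(1 for depth in per_base_coverage if depth >= minimum_coverage)


# Read depth at each position of a small, made-up target region.
region_depths = [12, 30, 45, 29, 31, 30]
covered = bases_at_minimum_coverage(region_depths)
print(f"{covered} of {len(region_depths)} bases have coverage >= 30")
```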
356. support.com/biomedicalgenomicsworkbench/current/index.php?manual=View_settings_in_Side_Panel.html
Differentially Expressed Genes file: A track showing the differentially expressed genes. The table view provides information about fold change, difference in expression, the maximum expression observed in either the case or the control, the expression in the case, and the expression in the control.
Variant Calling Report Tumor: Report showing error rates for quality categories, quality of examined sites, and estimated frequencies of actual to called bases for different quality score ranges.
Annotated Somatic Variants with Expression Values: A variant track showing the somatic variants. When mousing over a variant, a tooltip will appear with information about the variant.
8. Amino Acid Track
9. Genome Browser View RNA-Seq Tumor_Normal Comparison: A collection of tracks presented together. Shows the annotated variants track together with the human reference sequence, genes, transcripts, coding regions, and variants detected in ClinVar and dbSNP (see figure 8.24).
357. Figure 4.31: Use the prepared data as input in the relevant ready-to-use workflow, which we here, for the sake of simplicity, call "Workflow 2".
Chapter 5 Whole genome sequencing (WGS)
Contents
5.1 General Workflows WGS
5.1.1 Annotate Variants WGS
5.1.2 Identify Known Variants in One Sample WGS
5.2 Somatic Cancer WGS
5.2.1 Filter Somatic Variants WGS
5.2.2 Identify Somatic Variants from Tumor Normal Pair WGS
5.2.3 Identify Variants WGS
5.3 Hereditary Disease WGS
5.3.1 Filter Causal Variants WGS HD
5.3.2 Identify Causal Inherited Variants in Family of Four WGS
5.3.3 Identify Causal Inherited Variants in Trio WGS
5.3.4 Identify Rare Disease Causing Mutations in Family of Four WGS
5.3.5 Identify Rare Disease Causing Mutations in Trio WGS
5.3.6 Identify Variants WGS HD
The most comprehensive sequencing method is whole genome sequencing, which allows for identification of genetic variations and somatic mutations across the entire human genome. This type of sequencing encompasses both chromosomal and mitochondrial DNA. The advantage of sequencing the entire genome is that not only the protein-coding regions are sequenc
358. t ready-to-use workflow (e.g. Identify Variants WES).
5. Not merged reads output: These should be used as input, together with the Merged reads output, in the next ready-to-use workflow (e.g. Identify Variants WES).
Prepare Raw Data: Performs quality control and trimming of the sequencing reads and generates five different outputs:
1. QC graphic report: The report should be checked by the user.
2. QC supplementary report: The report should be checked by the user.
3. Trimming report: The report should be checked by the user.
4. Trimmed sequences output: Use as input, together with the Trimmed sequences broken pairs output, in the next ready-to-use workflow (e.g. Identify Variants WES).
5. Trimmed sequences broken pairs output: Use as input, together with the Trimmed sequences output, in the next ready-to-use workflow (e.g. Identify Variants WES).
4.4.6 How to check the output reports
Three different reports are generated, and all of these should be inspected in order to determine whether the quality of the sequencing reads and the trimming is acceptable. We are now at the Inspect results step in figure 4.30. The interpretation of the reports is not always completely straightforward, but as you gain experience it becomes easier.
Graphical QC Report
• 1 Summary
• 2 Per sequence analysis: Lengths distribution, GC content, Ambiguous base content, Quality distribution
• 3 Per base analysis
359. t specified frequency. Click on the button labeled Next.
4. In the last wizard step (figure 5.10) you can check the selected settings by clicking on the button labeled Preview All Parameters.
Figure 5.10: Check the settings and save your results.
At the bottom of this wizard there are two buttons regarding export functions: one button allows specification of the export format, and the other button (the one labeled Export Parameters) allows specification of the export destination. When selecting an export location, you will export the analysis parameter settings that were specified for this specific experiment.
5. Click on the button labeled OK to go back to the previous dialog box and choose Save.
Note: If you choose to open the results, the results will not be saved automatically. You can always save the results at a later point.
Output from the Identify Known Variants in One Sample WGS workflow
The Identify Known Variants in One Sample WGS tool produces four different output types:
1. Read Mapping Report: The report consists of a number of tables and graphs that in
360. Figure 7.66: Specify the sequencing reads for the appropriate family member.
4. Select the sequencing reads from the father.
5. Select the sequencing reads from the mother.
6. Select the sequencing reads from the affected child.
7. Specify the HapMap populations that should be used for filtering out variants found in HapMap for the de novo assembly (figure 7.67). This can be done using the drop-down list found in this wizard step. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4).
361. t the variant. The structural variants can also be viewed in table format by switching to the table view. This is done by pressing the table icon found in the lower left corner of the View Area. Structural Variant Report: The report consists of a number of tables and graphs that in different ways provide information about the structural variants. Read Mapping: The mapped sequencing reads. The reads are shown in different colors depending on their orientation, whether they are single reads or paired reads, and whether they map unambiguously. For the color codes, please see the description of sequence colors in the CLC Genomics Workbench manual, which can be found here: http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=View_settings_in_Side_Panel.html. Read Mapping Report: The report consists of a number of tables and graphs that in different ways provide information about the mapped reads. Structural Variants: A variant track holding the identified variants. The variants can be shown in track format or in table format. When holding the mouse over the detected variants in the Genome Browser view, a tooltip appears with information about the individual variants. You will have to zoom in on the variants to be able to see the detailed tooltip. Genome Browser View Identify Variants: A collection of tracks presented together. Shows the annotated variants track together with the human refer
362. Figure 4.19: Two ready-to-use workflows are available for data preparation: Prepare Overlapping Raw Data and Prepare Raw Data. 4.4.3 How to run the Prepare Overlapping Raw Data ready-to-use workflow. If your sequencing reads contain overlapping pairs, you can use the Prepare Overlapping Raw Data ready-to-use workflow to prepare your sequences before you proceed to data analysis such as variant calling. 1. Go to the toolbox and double-click on the Prepare Overlapping Raw Data ready-to-use workflow (figure 4.21). This will open the wizard shown in figure 4.22, where you can select the reads that you wish to prepare for further analyses. There are three ways to prepare your data: you can run the samples through the workflow one at a time, you can select several samples and prepare them simultaneously, or you can run them in batch mode (recommended if your data are found in separate folders). If you use batch mode, you will get an individual report for every sample, whereas you will get one combined report for all samples if you do not run in batch mode. To run several samples at once, select multiple samples from the left-hand side list and use the small a
363. that has at least this coverage. • Ignore non-specific matches: reads that are non-specifically mapped will be ignored. • Ignore broken pairs: reads that belong to broken pairs will be ignored. 16. Specify the parameters for the QC for Target Sequencing tool for the mother. 17. Specify the parameters for the QC for Target Sequencing tool for the sibling. 18. Specify the parameters for the QC for Target Sequencing tool for the affected child. 19. Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters; it is not possible to make any changes. Choose to save the results and click on the button labeled Finish. Output from the Identify Rare Disease Causing Mutations in a Family of Four (WES) workflow: Twelve different types of output are generated: • Reads Mapping: One for each family member. The reads mapped to the reference sequence. • Variant Tracks: One for each family member. The variants identified in each of the family members. The variant track can be opened in table view to see all information about the variants. • Target Region Coverage: One track for each individual. When opened in table format, it shows a range of information about the targeted regions, such as target region length, read count, and base count. • Target Region Coverage Report: One for each family member. The report consists of a n
364. the DNA reads that you would like to analyze (figure 8.8). To select the DNA reads, double-click on the reads file name, or click once on the file and then on the arrow pointing to the right in the middle of the wizard. Click on the button labeled Next. 3. Now select the RNA reads to analyze (see figure 8.9). 4. Specify a target region for the analysis of the RNA sample with the InDels and Structural Variants tool (figure 8.10). The targeted region file specifies which regions have been sequenced. You must provide this file yourself, as it depends on the technology used for sequencing. You can obtain the targeted regions file from the vendor of your targeted sequencing reagents. Figure 8.8: Select the DNA reads to analyze. Wizard screenshots: the Select DNA sequencing reads and Select RNA sequencing reads steps of the Compare Variants in DNA and RNA workflow.
365. the button labeled Next, which will take you to the next wizard step (figure 7.46). In this dialog you can specify the target regions track. The variants found outside the targeted region will be removed at this step in the workflow. Click on the button labeled Next, which will take you to the next wizard step (figure 7.47). Once again, select the relevant population from the 1000 Genomes project. This will add information from the 1000 Genomes project to your variants. 8. Click on the button labeled Next, which will take you to the next wizard step (figure 7.48). At this step you can select a population from the HapMap database. This will add information from the HapMap database to your variants. Wizard screenshot (Identify and Annotate Variants (TAS), Low Frequency Variant Detection, Configurable Parameters): Significance 1.0; Target regions; Ignore broken pairs; Ignore non-specific matches: Reads; Minimum read length 20; Minimum coverage 10; Minimum count 2; Minimum frequency 5.0; Base quality filter; Read direction filter; Direction frequency. 9. In this wizard step (figure 7.49) you get the chance to check the selected settings by clicking on the button labeled Preview All Parameters. In the Preview All Parameters wizard you
366. the toolbox and select the Annotate Variants (TAS) workflow. In the first wizard step, select the input variant track (figure 7.2). Figure 7.2: Select the variant track to annotate. 2. Click on the button labeled Next. The only parameter that should be specified by the user is which 1000 Genomes population to use (figure 7.3). This can be done using the drop-down list found in this wizard step. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4). Figure 7.3: Select the relevant 1000 Genomes population(s). 3. Click on the button labeled Next to go to the last wizard step (figure 7.4). Wizard screenshot: the Result handling step of the Annotate Variants (TAS) workflow, with the Preview All Parameters button and the options to open or save the results and to open the log.
367. Figure 5.17: Check the selected parameters by pressing Preview All Parameters. view. The table and the variant track are linked together, and when you click on a row in the table, the track view will automatically bring this position into focus. 2. Genome Browser View Filter Somatic Variants: A collection of tracks presented together. Shows the somatic candidate variants together with the human reference sequence, genes, transcripts, coding regions, and variants detected in ClinVar, 1000 Genomes, and the PhastCons conservation scores (see figure 5.18). Genome Browser screenshot: tracks include Homo_sapiens_ensembl_v74_Genes, Homo_sapiens_ensembl_v74_mRNA, Homo_sapiens_ensembl_v74_CDS, Somatic Candidate Variants, Clinvar_20131203, Cosmic_v67, and 1000GENOMES.
368. tic variants whenever no control/normal sample from the same patient is available. The Filter Somatic Variants (WGS) ready-to-use workflow accepts variant tracks (e.g. the output from the Identify Variants ready-to-use workflow) as input. Variants that are identical to the human reference sequence are first filtered away, and then variants found in the Common dbSNP, 1000 Genomes Project, and HapMap databases are deleted. Variants in those databases are assumed not to contain relevant somatic variants. Please note that this tool will likely also remove inherited cancer variants that are present at a low percentage in a population. Next, the remaining somatic variants are annotated with gene names, amino acid changes, conservation scores, and information from ClinVar (known variants with medical impact) and dbSNP (all known variants). How to run the Filter Somatic Variants (WGS) workflow: To run the Filter Somatic Variants (WGS) tool, go to: Toolbox | Ready-to-Use Workflows | Whole Genome Sequencing | Somatic Cancer | Filter Somatic Variants. 1. Double-click on the Filter Somatic Variants (WGS) tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis. 2. Next, you will be asked to select the variant track you would like to use for filtering somatic variants. The panel in the left side of the wizard shows the kin
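The filtering order described in snippet 368 (drop calls identical to the reference, then drop anything found in a common-variant database) can be sketched in a few lines of Python. This is only an illustration of the logic, not the Workbench implementation: the `Variant` class, the `filter_somatic_candidates` function, and the example data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record (not a Workbench data type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_somatic_candidates(variants, known_common):
    """Drop reference-identical calls, then drop variants found in known_common."""
    kept = []
    for v in variants:
        if v.alt == v.ref:        # identical to the reference sequence
            continue
        if v in known_common:     # present in a population database
            continue
        kept.append(v)
    return kept


# Made-up example data.
calls = [
    Variant("1", 100, "A", "A"),
    Variant("1", 200, "C", "T"),
    Variant("2", 300, "G", "A"),
]
common = {Variant("1", 200, "C", "T")}  # stands in for dbSNP / 1000 Genomes / HapMap
candidates = filter_somatic_candidates(calls, common)
```

Here `known_common` stands in for the union of the database tracks; as the manual notes, a filter of this kind will also discard inherited variants that happen to be listed in those databases.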
369. Wizard screenshot (Identify Candidate Variants and Genes from Tumor Normal Pair, Remove Germline Variants, Configurable Parameters): Keep variants with control read count below 2. Figure 8.21: Specify the number of reads to use as cutoff for removal of germline variants. Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters; it is not possible to make any changes (see figure 8.23). Choose to save the results and click on the button labeled Finish. Thirteen types of output are generated: 1. Gene Expression Normal and Gene Expression Tumor: A track showing gene expression annotations. Hold the mouse over the track or right-click on it; a tooltip will appear with information about e.g. gene name and gene expression values. 2. Transcript Expression Normal and Transcript Expression Tumor: A track showing transcript expression annotations. Hold the mouse over the track or right-click on it; a tooltip will appear with information about e.g. gene name and transcript expression values. Wizard screenshot: the Result handling step of the Identify Candidate Variants and Genes from Tumor Normal Pair workflow.
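The "Keep variants with control read count below 2" setting in snippet 369 amounts to a simple cutoff on how many reads in the normal (control) sample support each tumor variant. A minimal Python sketch follows, assuming variants are represented as tuples and that the control read counts are already available in a dictionary; the function name and data layout are hypothetical and do not reflect how the Workbench stores tracks.

```python
def remove_germline(tumor_variants, control_read_counts, cutoff=2):
    """Keep tumor variants supported by fewer than `cutoff` reads in the control."""
    return [
        v for v in tumor_variants
        if control_read_counts.get(v, 0) < cutoff   # unseen in control counts as 0
    ]


# Made-up example: (chromosome, position, reference, alternative).
tumor = [("1", 100, "A", "G"), ("1", 250, "C", "T"), ("3", 42, "G", "A")]
control_counts = {("1", 100, "A", "G"): 14, ("1", 250, "C", "T"): 1}
somatic = remove_germline(tumor, control_counts)
```

With the default cutoff of 2, a variant seen in a single control read is still kept, which tolerates an occasional sequencing error in the normal sample.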
370. tify interesting variants, we recommend that variants with a conservation score of more than 0.9 (PhastCons score) are prioritized over variants with lower conservation scores. Figure 6.5: The output from the Annotate Variants ready-to-use workflow is a genome browser view (a track list) containing individual tracks for all added annotations. It is possible to filter variants based on their annotations. This type of filtering can be facilitated using the table filter found at the top part of the table. If you are performing multiple experiments where you would like to use the exact same filter criteria, you can create a filter that can be saved
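The recommendation in snippet 370 (look first at variants with a PhastCons conservation score above 0.9) can be expressed as a stable partition of an exported variant table. The sketch below is one possible way to do this outside the Workbench; the dictionary keys `id` and `phastcons` and the function name are assumptions made for the example.

```python
def prioritize_by_conservation(variants, threshold=0.9):
    """Stable partition: variants scoring above the threshold come first."""
    high = [v for v in variants if v["phastcons"] > threshold]
    rest = [v for v in variants if v["phastcons"] <= threshold]
    return high + rest


# Made-up annotated variants.
annotated = [
    {"id": "var1", "phastcons": 0.20},
    {"id": "var2", "phastcons": 0.97},
    {"id": "var3", "phastcons": 0.91},
]
ranked = prioritize_by_conservation(annotated)
```

The original order is preserved within each group, so any earlier sorting (for example by position) is kept.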
371. Figure 6.28: Select your target region track. 7. Click on the button labeled Next to go to the step where you can adjust the settings for removal of germline variants (figure 6.29). 8. Click on the button labeled Next and once again select the target region track, the same track as you have already selected in previous wizard steps (figure 6.30). In the next wizard step you must once again select your target regions track. This time you specify the track to be used for quality control of the targeted sequencing, as this tool reports the performance (enrichment and specificity) of a targeted re-sequencing experiment (figure 6.31). In the next wizard step you can check the selected settings by clicking on the button labeled Preview All Parameters (figure 6.32). Wizard screenshot (Identify Somatic Variants from Tumor Normal Pair (WES), Remove Germline Variants, Configurable Parameters): Keep variants with control read count below 2.
372. to be less reliable, so ignoring them may reduce the number of spurious variants called. However, broken pairs may also arise for biological reasons (e.g. due to structural variants), and if they are ignored some true variants may go undetected. Please note that ignored broken pair reads will not be considered for any non-specific match filters. • Minimum coverage: Only variants in regions covered by at least this many reads are called. • Minimum count: Only variants that are present in at least this many reads are called. • Minimum frequency: Only variants that are present at least at the specified frequency (calculated as count/coverage) are called. For more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=Fixed_Ploidy_Variant_Detection.html. 6. Specify the HapMap populations that should be used for filtering out variants found in HapMap (figure 5.39). This can be done using the drop-down list found in this wizard step. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4). Wizard screenshot: the Remove Variants Found in HapMap step, with the HapMap database track (12 elements selected).
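The three thresholds listed in snippet 372 combine into a single pass/fail check per candidate variant. The following Python sketch shows that check under two assumptions that are not taken from the manual: the frequency is handled as a fraction between 0 and 1 (the wizard may display it as a percentage), and the function name is hypothetical.

```python
def passes_thresholds(count, coverage, min_coverage, min_count, min_frequency):
    """Apply Minimum coverage, Minimum count and Minimum frequency in turn.

    frequency = count / coverage, expressed as a fraction (assumption).
    """
    if coverage <= 0 or coverage < min_coverage:
        return False                      # region not covered well enough
    if count < min_count:
        return False                      # too few reads support the variant
    return count / coverage >= min_frequency


# Made-up examples with min_coverage=10, min_count=2, min_frequency=0.05.
ok = passes_thresholds(count=3, coverage=20, min_coverage=10, min_count=2, min_frequency=0.05)
too_rare = passes_thresholds(count=2, coverage=100, min_coverage=10, min_count=2, min_frequency=0.05)
```

In the second example the variant meets the coverage and count requirements but its frequency (2/100 = 0.02) is below the 0.05 cutoff, so it would not be called.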
373. to check the selected settings by clicking on the button labeled Preview All Parameters (figure 4.25). In the Preview All Parameters wizard you can only check the settings; if you wish to make changes, you have to use the Previous button from the wizard to edit parameters in the relevant windows. Figure 4.25: In this wizard you can check the parameter settings. It is also possible to export the settings to a file format that can be specified using the Export to drop-down list. Screenshot (All parameters for Prepare Overlapping Raw Data, Trim Sequences, locked parameters): Trim adapter list (Illumina adapter list); Ambiguous trim; Ambiguous limit; Quality trim; Quality limit; Use colorspace; Also search on reversed sequence; Remove 5' terminal nucleotides; Number of 5' terminal nucleotides; Maximum number of nucleotides in reads 1000; Minimum number of nucleotides in reads 15; Discard short reads; Remove 3' terminal nucleotides; Number of 3' terminal nucleotides; Discard long reads. At the bottom of the wizard there are two buttons regarding export functions: one button allows specification of the export format, and the other button (the one labeled Export Parameters) allows specification of the export destination. When selecting an export
374. to use workflows rely on the presence of particular reference datasets. This reference data must be downloaded and configured before these workflows can be used. The tools in the Workbench make it easy to download the necessary data such that the workflows can find and use it. This section covers the download and configuration needed to make available the reference data relevant to the Biomedical Genomics Workbench, including the human, mouse, and rat genomes, annotations, and variants made available by a variety of databases. 4.1.1 The Workbench Reference data location. Reference data must be stored in a folder called CLC_References. When the Biomedical Genomics Workbench is installed, such a folder is created on your file system under your home area. This folder is specified within the Workbench as a reference location. You can specify a different location to download reference data to. This is recommended if you do not have enough space in the area the Workbench designates as the reference data location by default. To change the reference data location from within the Navigation Area: Right-click on the folder CLC_References | Choose Location | Choose Specify Reference Location. The new folder will also be called CLC_References, but will be located where you specify. In more detail, this action results in the following: • A folder called CLC_References is created in the location you specified if a
375. tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis. 2. Select the sequencing reads you want to analyze (figure 5.48). The panel in the left side of the wizard shows the kind of input that should be provided. Select by double-clicking on the reads file name, or click once on the file and then on the arrow pointing to the right in the middle of the wizard. Figure 5.48: Specify the sequencing reads for the appropriate family member. 3. Specify the parameters for the Fixed Ploidy Variant Detection tool, including a target region file (figure 5.49). The parameters used by the Fixed Ploidy Variant Detection tool can be adjusted. We have optimized the parameters to the individual analyses, but you may want to tweak some of the parameters to fit your particular sequencing data. A good starting point could be to run an analysis with the default settings. Wizard screenshot (Fixed Ploidy Variant Detection, Configurable Parameters): Required variant probability 50.0; Ignore broken pairs; Minimum coverage.
376. Figure 7.18: Specify which 1000 Genomes population to use for filtering out known variants. Figure 7.19: Specify which HapMap population to use for filtering out known variants. Figure 7.20: Check the selected parameters by pressing Preview All Parameters. in the Genome Browser View. If you hold down the Ctrl key (Cmd on Mac) while clicking on the table icon in the lower left side of the View Area, you can open the table view in split view. The table and the variant track are linked together, and when you click on a row in
377. ts found in the 1000 Genomes project. This can be done using the drop-down list found in this wizard step. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4). 5. Specify the HapMap populations that should be used for filtering out variants found in HapMap (figure 7.54). This can be done using the drop-down list found in this wizard step. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4). Wizard screenshot: the Select variant tracks step of the Filter Causal Variants (WGS HD) workflow. Figure 5.32: Select the relevant 1000 Genomes population(s). Wizard screenshot: the Remove Variants Found in HapMap step, with the HapMap database track.
378. uld be used for annotation (figure 6.53). Figure 6.53: Select the relevant 1000 Genomes population(s). 4. Specify the 1000 Genomes population that should be used for filtering out variants found in the 1000 Genomes project. This can be done using the drop-down list found in this wizard step. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4). 5. Specify the HapMap populations that should be used for filtering out variants found in HapMap (figure 6.54). This can be done using the drop-down list found in this wizard step. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4). Wizard screenshot: the Remove Variants Found in HapMap step, with the HapMap database track (12 elements selected).
379. umber of tables and graphs that in different ways provide information about the mapped reads from each sample. • Identified Compound Heterozygous Genes Proband: Gene track with the identified putative compound heterozygous variants in the proband. The gene track can be opened in table view to see the gene names. • Gene List with de novo Variants: Gene track with the identified de novo variants in the proband. The gene track can be opened in table view to see the gene names. • Gene List with recessive Variants: Gene track with the identified recessive variants in the proband. The gene track can be opened in table view to see the gene names. • De novo variants: Variant track showing de novo variants in the proband. The variant track can be opened in table view to see all information about the variants. • Recessive variants: Variant track showing recessive variants in the proband. The variant track can be opened in table view to see all information about the variants. • De novo Mutations Amino Acid Track. • Recessive Variants Amino Acid Track. • Genome Browser View: This is a collection of tracks shown together in a view that makes it easy to compare information from the individual tracks, for example to compare the identified variants with the read mappings and information from databases. 6.3.5 Identify Rare Disease Causing Mutations in Trio (WES). The Identify Rare Disease Causing Mut
380. upport.com/clcgenomicsworkbench/current/index.php?manual=View_settings_in_Side_Panel.html 3. Mapping Report Tumor: The report consists of a number of tables and graphs that in different ways provide information about the mapped reads from the tumor sample. 4. Mapping Report Normal: The report consists of a number of tables and graphs that in different ways provide information about the mapped reads from the normal sample. 5. Annotated Somatic Variants: A variant track holding the identified and annotated somatic variants. The variants can be shown in track format or in table format. When holding the mouse over the detected variants in the Genome Browser view, a tooltip appears with information about the individual variants. You will have to zoom in on the variants to be able to see the detailed tooltip. 6. Genome Browser View Tumor Normal Comparison: A collection of tracks presented together. Shows the annotated variants track together with the human reference sequence, genes, transcripts, coding regions, the mapped reads for both normal and tumor, the annotated somatic variants, information from the ClinVar database, and finally a track showing the conservation score (see figure 5.25). 5.2.3 Identify Variants (WGS). The Identify Variants (WGS) tool takes sequencing reads as input and returns identified variants in a Genome Browser View. The tool runs an internal workflow that first maps the sequencing reads to the human re
381. Figure 8.13: Select the relevant population from the drop-down list. Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters; it is not possible to make any changes (see figure 8.14). Choose to save the results and click on the button labeled Finish. 9. Press OK, specify where to save the results, and then click on the button labeled Finish to run the analysis. Nine different outputs are generated: 1. A DNA Read Mapping and an RNA Read Mapping: The mapped DNA or RNA sequencing reads. The sequencing reads are shown in different colors depending on their orientation, whether they are single reads or paired reads, and whether they map unambiguously. For the color codes, please see the description at http://www.clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=View_settings_in_Side_Panel.html. 2. A DNA Mapping Report and an RNA Mapping Report: This report contains information about the reads, reference, transcripts, and statistics. This is explained in more detail in the Biomedical Genomics Workbench reference manual in section RNA-Seq report: http://clcsupport.com/biomedicalgenomicsworkbench/current/index.php?manual=RNA_Seq_repo
382. us_musculus/dna The file Mus_musculus.GRCm38.dna_sm.toplevel.fa.gz has chromosomal sequences along with several scaffolds. The scaffolds were removed in the workbench. • Mouse genes, coding sequences, and transcripts (ENSEMBL): ftp://ftp.ensembl.org/pub/release-80/gtf/mus_musculus (filename: Mus_musculus.GRCm38.80.gtf.gz). • dbSNP variants (ENSEMBL): ftp://ftp.ensembl.org/pub/release-80/variation/gvf/mus_musculus (filename: Mus_musculus.gvf.gz). • PhastCons Conservation Scores (UCSC): http://hgdownload.cse.ucsc.edu/goldenPath/mm10/phastCons60way/mm10.60way.phastCons Each chromosome has a separate wigFix file. Each needs to be downloaded (22 files) and then combined to make a single wigFix file before importing into the workbench (filename: *.phastCons60way.wigFix.gz). • Mouse Gene Ontology (GO slim file, EBI): http://www.ebi.ac.uk/QuickGO/GMultiTerm Gene Ontology file in slim format (only high-level GO terms), annotated on mouse genes for the GO categories Molecular Function, Biological Process, and Cellular Component. The file was made using the QuickGO tool from the EBI (http://www.ebi.ac.uk/QuickGO/GMultiTerm). Rat (Rnor5.0): • Rat reference sequence (ENSEMBL): ftp://ftp.ensembl.org/pub/release-79/fasta/rattus_norvegicus/dna The file Rattus_norvegicus.Rnor_5.0.dna.toplevel.fa.gz has chromosomal sequences along with several scaffolds. The scaffolds were removed in the workbench. • Rat genes, coding sequences, and transcripts (ENSEMBL): ftp
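Snippet 382 says that the per-chromosome PhastCons wigFix files must be combined into a single file before import, but does not say how. One way to do this is sketched below in Python, using only the standard library: each gzip-compressed part is decompressed and appended, in the order given, to one gzip-compressed output file. The function name and the file names used in the usage note are hypothetical, and whether the importer expects a particular chromosome order is not stated in the source.

```python
import gzip
import shutil


def combine_wigfix(part_paths, combined_path):
    """Append the decompressed content of each part to one gzip output file."""
    with gzip.open(combined_path, "wb") as out:
        for part in part_paths:
            with gzip.open(part, "rb") as src:
                shutil.copyfileobj(src, out)   # streams data, no full load in memory
```

A call could look like `combine_wigfix(["chr1.phastCons60way.wigFix.gz", "chr2.phastCons60way.wigFix.gz"], "combined.phastCons60way.wigFix.gz")`, with the list extended to all downloaded chromosome files.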
383. used for sequencing be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4). Specify the HapMap population that should be used to add information on variants found in the HapMap project. This can be done using the drop-down list found in this wizard step. Please note that the populations available from the drop-down list can be specified with the Data Management function found in the top right corner of the Workbench (see section 4.1.4). Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters; it is not possible to make any changes. Choose to save the results and click on the button labeled Finish. Output from the Identify and Annotate Variants (TAS HD) workflow: Six types of output are generated: • A Reads Track (Read Mapping). • A Coverage Report. • A Per-region Statistics Track. • A Filtered Variant Track (Annotated variants). • An Amino Acid Track: Shows the consequences of the variants at the amino acid level in the context of the original amino acid sequence. A variant introducing a stop mutation is illustrated with a red amino acid. • A Genome Browser View. Chapter 8: Whole Transcriptome Sequencing (WTS). Contents: 8.1 Analysis of multiple samples; 8.2 Annotate Variants (WTS); 8.3 Compare variants in DNA and RNA
384. user interface. The Biomedical Genomics Workbench user interface includes the Toolbox, Navigation Area, Menu Bar, Toolbar, Side Panel, View Area, View Tools, and Status Bar (figure 2.3). Screenshot of the Workbench main window (title bar: CLC Cancer Research Workbench 1.5 Beta), with the Menu Bar, Toolbar, Navigation Area, View Area, Side Panel, and Toolbox labeled.
385. ... variant site, and not the variant itself, passes the variant probability threshold, then the variant with the highest probability at that site will be reported, even if the probability of that particular variant might be less than the threshold. For example, if the required variant probability is set to 0.9, then the individual probability of the variant called might be less than 0.9, as long as the probability of the entire variant site is greater than 0.9.
• Ignore broken pairs: When ticked, reads from broken pairs are ignored. Broken pairs may arise for a number of reasons, one being erroneous mapping of the reads. In general, variants based on broken pair reads are likely to be less reliable, so ignoring them may reduce the number of spurious variants called. However, broken pairs may also arise for biological reasons (e.g., due to structural variants), and if they are ignored some true variants may go undetected. Please note that ignored broken pair reads will not be considered for any non-specific match filters.
• Minimum coverage: Only variants in regions covered by at least this many reads are called.
• Minimum count: Only variants that are present in at least this many reads are called.
• Minimum frequency: Only variants that are present at least at the specified frequency (calculated as count/coverage) are called.
For more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkben[...]/current/index.php?manual=[...]
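The site-level rule described in this excerpt can be sketched in a few lines of Python. This is a minimal illustration and not the tool's actual model: the function name `call_site`, the dictionary of per-variant probabilities, and the assumption that the site probability is the sum of the candidate variant probabilities are all hypothetical.

```python
from typing import Dict, Optional


def call_site(variant_probs: Dict[str, float],
              required_probability: float = 0.9) -> Optional[str]:
    """Return the variant to report at one site, or None if the site fails.

    Assumption (not from the manual): the site probability is taken as the
    sum of the probabilities of the candidate variants at that site.
    """
    if not variant_probs:
        return None
    site_probability = sum(variant_probs.values())
    if site_probability < required_probability:
        return None
    # The site passed, so report the most probable variant, even if its own
    # probability is below the threshold.
    return max(variant_probs, key=variant_probs.get)


# Hypothetical site with two candidate variants.
site = {"A>G": 0.55, "A>T": 0.40}
print(call_site(site))
```

In this made-up example the site as a whole passes the 0.9 threshold, so A>G is reported although its own probability is only 0.55.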
386. ... variants may go undetected. Please note that ignored broken pair reads will not be considered for any non-specific match filters.
• Minimum coverage: Only variants in regions covered by at least this many reads are called.
• Minimum count: Only variants that are present in at least this many reads are called.
• Minimum frequency: Only variants that are present at least at the specified frequency (calculated as count/coverage) are called.
For more information about the tool, see http://clcsupport.com/biomedicalgenomicsworkben[...]/current/index.php?manual=Fixed_Ploidy_Variant_Detection.html
9. Specify the parameters for the QC for Target Sequencing tool for the affected family member (figure 6.59). When working with targeted data (WES or TAS data), quality checks for the targeted sequencing are included in the workflows. Again, you can choose to use the default settings, or you can choose to adjust the parameters. The parameters that can be set are:
• Minimum coverage: provides the length of each target region that has at least this coverage.
• Ignore non-specific matches: reads that are non-specifically mapped will be ignored.
[Dialog: QC for Target Sequencing (proband). Configurable Parameters: Minimum coverage (30), Ignore non-specific matches, Ignore broken pairs.]
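The three filters listed in this excerpt (minimum coverage, minimum count, and minimum frequency calculated as count/coverage) combine as a simple conjunction. The sketch below is only an illustration: the function name `passes_filters` and the default thresholds are hypothetical, and the frequency is expressed as a fraction here, which may differ from how the Workbench dialog expects it.

```python
def passes_filters(count: int, coverage: int,
                   min_coverage: int = 10,
                   min_count: int = 2,
                   min_frequency: float = 0.05) -> bool:
    """Apply the coverage, count and frequency filters to one candidate variant.

    count    -- number of reads supporting the variant
    coverage -- number of reads covering the position
    The default thresholds are illustrative assumptions, not tool defaults.
    """
    # Guard against division by zero and apply the coverage filter.
    if coverage <= 0 or coverage < min_coverage:
        return False
    if count < min_count:
        return False
    # Frequency is calculated as count / coverage, as described above.
    return count / coverage >= min_frequency


# Hypothetical candidates: (supporting reads, covering reads).
for count, coverage in [(3, 30), (1, 30), (5, 8), (2, 100)]:
    print(count, coverage, passes_filters(count, coverage))
```

Only the first candidate passes: the second fails the count filter, the third the coverage filter, and the fourth the frequency filter.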
387. ...ver a variant is found within a coding sequence.
• Amino acid changes: Adds information about amino acid changes caused by the variants.
• Information from ClinVar: Adds information about the relationships between human variations and their clinical significance.
• Information from dbSNP: Adds information from the Single Nucleotide Polymorphism Database, which is a general catalog of genome variation, including SNPs, multinucleotide polymorphisms (MNPs), insertions and deletions (InDels), and short tandem repeats (STRs).
• PhastCons Conservation scores: The conservation scores, in this case generated from a multiple alignment with a number of vertebrates, describe the level of nucleotide conservation in the region around each variant.
1. Go to the toolbox and select the Annotate Variants (WTS) workflow. In the first wizard step, select the input variant track (figure 8.2).
Figure 8.2: Select the variant track to annotate.
2. Click on the button labeled Next. If you are using the workflow from the Human folder, you should specify which 1000 Genomes population to use ...
388. Figure 6.55: Specify the sequencing reads for the appropriate family member.
3. Select the sequencing reads from the unaffected parent.
4. Select the sequencing reads from the affected parent.
5. Select the targeted region file (figure 6.56). The targeted region file is a file that specifies which regions have been sequenced when working with whole exome sequencing or targeted amplicon sequencing data. You must provide this file yourself, as it depends on the technology used for sequencing. You can obtain the targeted regions file from the vendor of your targeted sequencing reagents.
Figure 6.56: Select the targeted region file you used for sequencing.
6. Select the sequencing reads from the affected child.
7. Specify the HapMap populations that should be used f...
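Before importing a vendor's targeted region file, it can be useful to check what it contains. The sketch below assumes the file is in BED format (chromosome, 0-based start, exclusive end, whitespace-separated); the function names and the example coordinates are hypothetical and are not part of the Workbench.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]


def parse_bed(lines: Iterable[str]) -> List[Region]:
    """Parse the first three BED columns, skipping header and comment lines."""
    regions: List[Region] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def total_target_length(regions: Iterable[Region]) -> int:
    """Sum the region lengths; BED ends are exclusive, so length = end - start."""
    return sum(end - start for _, start, end in regions)


# Made-up example content standing in for a vendor file.
example = [
    "track name=targets",
    "chr1\t100\t250\tamplicon_1",
    "chr1\t400\t500\tamplicon_2",
]
regions = parse_bed(example)
print(regions)
print(total_target_length(regions))
```

To read a real file, the same function can be given an open file handle instead of the example list.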
389. ...w allows the reference sequence to be displayed together with other data provided in a so-called track format. One of the big advantages of using tracks is that they allow visualization, comparison, and analysis of genome-scale studies with all the information tied to genomic positions. A central coordinate system provided by a reference genome makes it possible to view and compare different datasets together in a Genome Browser view. Of course, each track can be viewed individually if desired.
2.4.1 Track types
Several different track types are available. To make it easier to recognize the different track types in the Navigation Area and in the View Area, each track type is associated with a specific icon:
• Coverage graph
• Read mapping
• Reference genome sequence
• Annotation track
• Genome browser view
• Variants from variant calling
• Expression track
• Differentially expressed genes
2.4.2 The Genome Browser
The Genome Browser view is a collection of tracks. Each track in a Genome Browser view is tied to the same underlying genomic coordinate set, making visualization and comparison of different results and data types simple and intuitive. Annotations and variant information are provided together with the human reference genome via our Data Management. Datasets (e.g., in GFF or VCF format) from resources not provided for ...
390. [...]w.clcbio.com/support/downloads/#manuals
2.1 The start screen
When you start up the Biomedical Genomics Workbench, you should see an image like the one in figure 2.1. The information in the left-hand panes will differ depending on what data you already have available and any plugins you may have installed.
2.1.1 The getting started table
When no data has been opened for viewing, a table is visible in the View Area of the Workbench. This table provides links to sections of the application-based user manual and is thus a simple and fast way to access information about using the Biomedical Genomics Workbench.
[Figure 2.1: screenshot of the Biomedical Genomics Workbench start screen, showing the Navigation Area, the Toolbox with the ready-to-use workflows, and the getting started table.]
391. Figure 5.20: Select the tumor sample reads.
When you have selected the tumor sample reads, click on the button labeled Next.
2. In the next wizard step (figure 5.21), please specify the normal sample reads.
Figure 5.21: Select the normal sample reads.
3. Click on the button labeled Next to go to the next wizard step (figure 5.22).
[Dialog: Identify Somatic Variants from Tumor Normal Pair (WGS), Low Frequency Variant Detection. Configurable parameters shown include Required significance, Ignore positions with coverage above, Restrict calling to target regions, and Ignore broken pairs.]
392. ... with the targeted regions from your experiment.
[Dialog: Identify Variants (TAS), QC for Target Sequencing. Configurable Parameters: Track of Target Regions (target_regions), Minimum coverage (30), Ignore non-specific matches, Ignore broken pairs.]
Figure 7.36: Select the track with the targeted regions from your experiment.
[Dialog: Identify Variants (TAS), Low Frequency Variant Detection. Configurable Parameters: Required significance (1.0), Ignore positions with coverage above (100000000), Restrict calling to target regions (target_regions), Ignore broken pairs, Ignore non-specific matches, Minimum read length, Minimum coverage, Minimum count, Minimum frequency, Base quality filter, Read direction filter (Direction frequency), Read position filter (Significance), Relative read direction filter (Significance), Remove pyro-error variants (in homopolymer regions with minimum length 3, with frequency below 0.8).]
Figure 7.37: Please specify the parameters for variant detection.
6. Click on the button la...
393. ...ws can be used for and go through step by step how to run the workflows.
The Biomedical Genomics Workbench offers a range of different tools for RNA-seq analysis. Currently, 5 different ready-to-use workflows for 3 different species (human, mouse, and rat) are available for analysis of RNA-seq data:
• Annotate Variants (WTS)
• Compare Variants in DNA and RNA
• Identify Candidate Variants and Genes from Tumor Normal Pair
• Identify Variants and Add Expression Values
• Identify and Annotate Differentially Expressed Genes and Pathways
The ready-to-use workflows can be found in the toolbox under Whole Transcriptome Sequencing, as shown in figure 8.1.
[Figure 8.1: the Toolbox, showing the Whole Transcriptome Sequencing workflows in the Human, Mouse, and Rat folders.]
394. [Figure residue: Toolbox listing of the Whole Transcriptome Sequencing ready-to-use workflows for Mouse and Rat.]
Figure 2.5: Each application type has its own set of ready-to-use workflows.
... different types of data analysis will vary depending on the questions being asked of the data. In section 2.3 we will use diagrams and examples to illustrate how different tools and workflows can be used for data analysis. For a detailed description of the individual tools, we refer to the Biomedical Genomics Workbench reference manual (http://www.clcbio.com/support/downloads/#manuals).
2.3 Workflows: an overview
Biomedical Genomics Workbench offers a number of analysis workflows, also referred to here as the pre-installed ready-to-use workflows, which include all the necessary steps for a particular analysis, from the initial quality checking and trimming of the reads to the final reporting of the results (for example, the disease-causing mutations detected in an analysis). The workflows are easy to use and just require the sequence data as ...
395. ...y the parameters for the QC for Target Sequencing tool for the affected child (figure 6.75). When working with targeted data, quality checks for the targeted sequencing are included in the workflows. Again, you can choose to use the default settings, or you can choose to adjust the parameters.
[Dialog: QC for Target Sequencing (proband). Configurable Parameters: Minimum coverage (30), Ignore non-specific matches, Ignore broken pairs.]
Figure 6.75: Specify the parameters for the QC for Target Sequencing tool.
The parameters that can be set are:
• Minimum coverage: provides the length of each target region that has at least this coverage.
• Ignore non-specific matches: reads that are non-specifically mapped will be ignored.
• Ignore broken pairs: reads that belong to broken pairs will be ignored.
Specify the parameters for the QC for Target Sequencing tool for the father.
Specify the parameters for the QC for Target Sequencing tool for the mother.
Specify the parameters for the Fixed Ploidy Variant Detection tool for the mother (figure 6.6). The parameters used by the Fixed Ploidy Variant Detection tool can be adjusted. We have optimized the parameters to the individual analyses, but you may want to tweak some of the parameters to fit your particular sequencing data. A good starting point could be to run an analysis with the default settings. The parameters t...
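The Minimum coverage parameter of the QC tool reports, for each target region, how much of its length reaches the given coverage. A minimal sketch of that calculation follows, assuming the per-base coverage of one region is already available as a list of integers; the function name and the example values are hypothetical, and only the threshold of 30 is taken from the dialog shown in the manual.

```python
from typing import Sequence


def length_at_min_coverage(per_base_coverage: Sequence[int],
                           minimum_coverage: int = 30) -> int:
    """Count the positions in one target region covered by at least
    `minimum_coverage` reads."""
    return sum(1 for depth in per_base_coverage if depth >= minimum_coverage)


# Made-up coverage values for a five-base target region.
region_coverage = [10, 30, 45, 29, 31]
covered = length_at_min_coverage(region_coverage)
print(covered, "of", len(region_coverage), "bases reach the threshold")
```

Dividing the result by the region length gives the fraction of the target that meets the threshold, which is often the easier number to compare between regions.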
396. ...y-to-use workflow. It should be used to check a sample for the presence or absence of known variants specified by the user (e.g., known breast cancer associated variants). Please note that the ready-to-use workflow will not identify new variants.
The Identify Known Variants in One Sample (TAS) ready-to-use workflow maps the sequencing reads to a human genome sequence and does a local realignment of the mapped reads to improve the subsequent variant detection. In the next step, only variants specified by the user are identified and annotated in the newly generated read mapping.
Import your known variants
To make an import into the Biomedical Genomics Workbench, you should have your variants in GVF format (http://www.sequenceontology.org/resources/gvf.html) or VCF format (http://ga4gh.org/#/fileformats-team). Please use the Tracks import as part of the Import tool in the toolbar to import your file into the Biomedical Genomics Workbench.
Import your targeted regions
A file with the genomic regions targeted by the amplicon or hybridization kit will be provided by the vendor. To obtain this file, you will have to contact the vendor and ask them to send you the target regions file. You will get it in either .bed or .gff format. Please use the Tracks import as part of the Import tool in the toolbar to import your file into the Biomedical Genomics Workbench.
How to run ...
397. ...ysis of your sequencing data.
[Figure: workflow overview flowchart. Import data (sequencing reads), then run workflow 1 (Preparing Raw Data: Prepare Overlapping Raw Data or Prepare Raw Data), then inspect the results (output: QC and trim reports, prepared data). If the QC and trim reports are OK, use the prepared data as input and run workflow 2, chosen from the Whole Genome Sequencing, Whole Exome Sequencing, Targeted Amplicon Sequencing, or Whole Transcriptome Sequencing ready-to-use workflows.]
