Metagenomics Breakout Session

1. 9. View file.txt with the text viewer less. What does the file contain? When you are finished, press q to exit less.

    qiime@qiime-VirtualBox:~$ less file.txt
    qiime@qiime-VirtualBox:~$

10. Let's copy this very interesting file to test_1. Use the command cp and provide the file to copy and the destination path. Don't forget to use tab to autocomplete. How do you check whether the command worked correctly?

    qiime@qiime-VirtualBox:~$ cp file.txt Desktop/dir_1/test_1/
    qiime@qiime-VirtualBox:~$

11. Move file.txt from test_1 to test_2 using the mv command. As with cp, it takes the name of the file to be moved and the output path as arguments. Write down the commands. One method of doing this is below.

    qiime@qiime-VirtualBox:~$ cd Desktop/dir_1
    qiime@qiime-VirtualBox:~/Desktop/dir_1$ mv test_1/file.txt test_2
    qiime@qiime-VirtualBox:~/Desktop/dir_1$ cd test_2
    qiime@qiime-VirtualBox:~/Desktop/dir_1/test_2$ ls
    file.txt
    qiime@qiime-VirtualBox:~/Desktop/dir_1/test_2$ cd ..
    qiime@qiime-VirtualBox:~/Desktop/dir_1$ cd test_1
    qiime@qiime-VirtualBox:~/Desktop/dir_1/test_1$ ls
    qiime@qiime-VirtualBox:~/Desktop/dir_1/test_1$

After the mv command, you can check with ls to verify that file.txt is in test_2 but no longer in test_1.

12. Remove file.txt and the directories you created in this tutorial with the rm command. The -r option tells rm to remove recursively, which allows you to remove all files and directories within a directory.
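The copy, move, and remove steps above can be rehearsed safely in a throwaway directory before trying them on the Desktop. The following is a minimal sketch, not part of the original exercise: the sandbox location comes from mktemp, and the variable names (sandbox, after_move_test_1, after_move_test_2) are made up for illustration.

```shell
# Work in a throwaway sandbox so nothing on the real Desktop is touched.
sandbox="$(mktemp -d)"
mkdir -p "$sandbox/dir_1/test_1" "$sandbox/dir_1/test_2"
echo "a very interesting file" > "$sandbox/file.txt"

# Step 10: copy file.txt into test_1, then list test_1 to confirm the copy.
cp "$sandbox/file.txt" "$sandbox/dir_1/test_1/"
ls "$sandbox/dir_1/test_1"

# Step 11: move the copy from test_1 to test_2 and record both listings.
mv "$sandbox/dir_1/test_1/file.txt" "$sandbox/dir_1/test_2/"
after_move_test_1="$(ls "$sandbox/dir_1/test_1")"
after_move_test_2="$(ls "$sandbox/dir_1/test_2")"

# Step 12: remove the file, then remove dir_1 and everything inside it.
rm "$sandbox/dir_1/test_2/file.txt"
rm -r "$sandbox/dir_1"
ls "$sandbox"
```

Note that cp leaves the original file.txt in place, while mv does not leave a copy behind in test_1. That difference is what the ls checks in step 11 are meant to reveal.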
2.
    qiime@qiime-VirtualBox:~/Desktop/dir_1/test_1$ cd ..
    qiime@qiime-VirtualBox:~/Desktop/dir_1$ cd test_2
    qiime@qiime-VirtualBox:~/Desktop/dir_1/test_2$ rm file.txt
    qiime@qiime-VirtualBox:~/Desktop/dir_1/test_2$ cd ..
    qiime@qiime-VirtualBox:~/Desktop/dir_1$ cd ..
    qiime@qiime-VirtualBox:~/Desktop$ rm -r dir_1
    qiime@qiime-VirtualBox:~/Desktop$ ls
    qiime@qiime-VirtualBox:~/Desktop$

It is sometimes challenging to adapt to interacting with the command line rather than a GUI. There are a number of resources online, both in help forums like StackOverflow and in forums for Linux enthusiasts. Check out the GNU documentation too. As you use QIIME, you will get lots of practice moving around the file structure and providing absolute paths. To reduce typing errors, use tab as much as possible. If the shell doesn't understand what you are pointing to, then the QIIME scripts will not either.

For Reference

    Change directory:                   cd
    Go up one directory:                cd ..
    Print working directory:            pwd
    List the contents of a directory:   ls
    Remove a file / a directory:        rm <file_name>
                                        rm -R <dir_name>
    Copy a file / a directory:          cp <file_name> <new_location>
                                        cp -R <file_name> <new_location>
    Move a file:                        mv <file_name> <new_location>
    Rename a file:                      mv <file_name> <new_file_name>
    Concatenate files:                  cat <file_1> <file_2> > <combined_file>
    Write to a file:                    > <file_name>
                                        ls > list.txt    (the contents o
3. 3. Which of the following best describes your previous experience with scientific research?
    a. This is my first research experience
    b. I have had some limited research experience prior to this course (less than one year)
    c. I have had 1-2 years of research experience
    d. I have more than 2 years of research experience
    e. Other

4. Reason for taking Molecular Techniques:
    Couldn't fit any other class in your schedule
    Wanted to learn about and apply cutting-edge molecular technologies
    General interest
    Other

5. Gender
    a. Female
    b. Male
    c. Other
    d. Prefer not to answer

6. Molecular Techniques assessment (circle your response, 1 to 5), one row per item:
    Overall experience
    Laboratory experience
    Bioinformatics experience
    Biostatistical experience
    Quizzes
    Assignments
    Professor
    Handouts
    Discussions

7. Pre/Post assessment. Please assess each of the following in terms of how you felt BEFORE attending Molecular Techniques and how you feel NOW.

7A. Scale: Very unlikely / Somewhat unlikely / Neutral / Somewhat likely / Very likely
    ... sequencing technologies in research: BEFORE
    ... sequencing technologies in research: NOW

Scale: Very low / Somewhat low / Neutral / Somewhat high / Very high
    Knowledge of bioinformatics: BEFORE
    Knowledge of bioinformatics: NOW
    Knowledge of biostatistical approaches (1 2 3 4 5): BEFORE
    Knowledge of biostatistical approaches (1 2 3 4 5): NOW
4. Shell Error Messages

No such file or directory
You are probably looking for a file or directory in the wrong place, or you are passing a script a file that does not exist at the path you provided. Check the paths to make sure the files are really there. Use tab as much as possible to avoid these error messages, since the shell will not autocomplete something it cannot find. Check spelling and capitalization too. The shell is case sensitive, so Shared_Folder and Shared_folder are two different directories. Avoid spaces, and keep names informative but short to limit this type of error.

Permission denied
VirtualBox will not let you write directly to the Shared Folder. You can write to a subdirectory of the Shared Folder, so on your host machine (the laptop or whatever machine VB is installed on), create a subfolder where you are able to write files or directories. If this is not a Shared Folder issue, use chmod to change the permissions so that you can write to or execute the relevant files and directories.

QIIME-specific Error Messages
Usually QIIME-specific error messages return the script you attempted to run plus a list of the options the script can take. Each option specifies which kind of input it takes (filepath or directory) and what sort of data should be included in the input files. If you are still stuck after carefully checking the script input against the documentation, check the QIIME
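The two shell errors described on this slide can be reproduced on purpose in a scratch directory. The sketch below is illustrative only: the names sandbox, Shared_Folder, and hello.sh are hypothetical, and the capitalization check assumes a case-sensitive file system such as the one inside the Linux VM.

```shell
sandbox="$(mktemp -d)"
mkdir "$sandbox/Shared_Folder"

# "No such file or directory": a name that is not there at all.
if ls "$sandbox/no_such_dir" 2>/dev/null; then
    missing_result="found"
else
    missing_result="not found"
fi

# Same error from wrong capitalization (only on a case-sensitive file system).
if ls "$sandbox/Shared_folder" 2>/dev/null; then
    case_result="found"
else
    case_result="not found"
fi

# "Permission denied": a script without the execute bit cannot be run.
script="$sandbox/hello.sh"
printf '#!/bin/sh\necho hello\n' > "$script"
if "$script" 2>/dev/null; then
    first_status=0
else
    first_status=$?
fi

# chmod adds execute permission for the owner; the script then runs.
chmod u+x "$script"
second_output="$("$script")"
echo "before chmod: exit status $first_status; after chmod: $second_output"
```

Running ls -l on the script before and after chmod shows the permission string change from rw- to rwx for the owner, which is a quick way to confirm what chmod did.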
5. -i <distance_matrix.txt> -o <pc_matrix.txt>

The output of this script is a text file with the distance matrix transformed into principal coordinates, or the combinations of variables that most explain the variation in the dataset. The principal coordinates matrix is used to generate 2D and 3D plots, which allow you to determine whether any clustering of the samples is explained by a metadata category.

[Figure 14: spreadsheet view of the principal coordinates matrix, with one row per sample and one column per principal coordinate vector.]

Figure 14. Sample principal coordinates analysis of the unweighted UniFrac distances.

iii. Make the preferences files
This script allows you to specify the coloring of the 2D and 3D plots. In this case, we will use it to get continuous coloring for the 2D plots, which makes it easier to observe gradients in metadata variables like salinity.

    qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/cso_for_hhmi$ make
6. which can be found at http://www.juniata.edu/services/pathfinder/Academic_Honesty/standards.html

Students with Disabilities
If you need special accommodations with respect to disabilities, please contact the academic counselor in Academic Support Services, who will help you with this process. More information can be found at http://www.juniata.edu/services/catalog/section.html?s1=appr&s2=accommodations

Course Withdrawal
After the drop/add period has expired, you may withdraw from this class. Your last chance to do this is the last day of classes. You will need my signature and the signature of your advisor(s).

This module took approximately 5 weeks to teach in the classroom, from nucleic acid extraction through sending out the libraries for sequencing. Below you can find the objectives for each week. All of the course materials can be found in the Environmental Genomics Research folder on your flash drive. Course materials are organized by week. Keep in mind that this was my first year teaching the course, so approach cautiously.

Week 1: rRNA structure and function, and nucleic acid extraction
Outline of Objectives
• Be able to define the structure and function of the rRNA operon
• Utilize the 16S rRNA gene technology to describe microbial community structure in a sample
• Describe the general steps involved in DNA/RNA extraction
• Be able to troubleshoot problems associated with nucleic acid extraction
• Perform DNA/RNA purific
7. 16S rRNA region for high-throughput sequencing is the V4 region (Caporaso et al. 2010). Which regions are most useful for particular applications is described in Soergel et al.

Similarly, the internal transcribed spacer (ITS) regions of the rRNA gene in eukaryotes are used for taxonomic profiling in fungi. The ITS regions are pieces of non-functional RNA situated between structural ribosomal RNAs on a common precursor transcript. Reading in the 5' to 3' direction, this precursor transcript contains the 5' external transcribed sequence (5' ETS), 18S rRNA, ITS1, 5.8S rRNA, ITS2, 28S rRNA, and finally the 3' ETS. During rRNA maturation, the ETS and ITS pieces are excised. The ITS varies greatly between fungal taxa, which makes it useful for determining which fungal taxa are present in a sample. This variability can be explained by the relatively low evolutionary pressure acting on such non-functional sequences. The ITS region is now perhaps the most widely sequenced DNA region in fungi (Peay et al. 2008). For the purposes of this module, we will amplify the ITS1 region as described in McGuire et al.

Module Goals
• Participants will learn the structural and functional importance of the rRNA operon and its utility in studying microbial diversity
• Participants will prepare bacterial and/or fungal sequencing libraries from DNA extracts by carrying out 16S rRNA and/or ITS library preparation using Illumina tag PCR, E-gel electrophore
8. Additions. Click OK to get back to the VB Manager. Make sure your desired VM is highlighted and click Start, or double-click on the VM. The VM may take a few minutes to start up. If there are a few boot options to choose from, choose the first one by hitting Enter. VB will prompt you to allow the mouse and keyboard to be used inside VB. Hit OK to accept all of these prompts. Once the desktop is present, click Devices and then Install Guest Additions at the bottom of the menu.

[Screenshot: the Devices menu of the running QIIME VM, listing CD/DVD Devices, USB Devices, Network Adapters, Shared Folders, Enable Remote Display, and Install Guest Additions.]

Click Run when the warning box pops up, and enter the password qiime when you are asked to authenticate the installation. You can monitor the installation process from the command line. Installation may take a few minutes. When prompted, hit Enter to finish the installation.

Set up the Shared Folder
The Shared Folder allows you to easily share files between the VM and the host system (your computer). Right-click the gray folder in the bottom right-hand corner of the VM window and select Shared Folder (see screenshot below). Click the blue folder with the green plus sign to set up a new shared folder. Choose where you would like to set up the Shared Folder: Desktop or a specific folder in
9. Genomics Research Course address the following objectives:
• Utilize a virtual machine to employ Linux operating systems
• Be able to understand directory structure, files, and processes in Linux
• Understand the overall workflow we will execute in QIIME
• Describe the difference between alpha and beta diversity and how we use these metrics to describe the microbial ecology of various ecosystems
• Understand how demultiplexing and quality filtering is achieved
• Use ordination methods to compare microbial community data from different samples
• Prepare metadata mapping files that are readable by QIIME
• Describe the three ways OTU picking can be performed
• Apply rarefaction analyses to normalize for sequencing depth
• Describe methods of quantifying system change
• Describe sources of measurement error in biodiversity data and propose ways to deal with these biases
• Be able to describe and implement both quantitative and qualitative measures of alpha and beta diversity
• Be able to implement QIIME scripts to measure alpha diversity in our samples and to statistically compare these metrics across samples
• Understand the pros and cons of using phylogenetic divergence as a measure of biodiversity

Time line of module
This module will be spread across the down times between lab activities during this 2.5-day workshop. However, in the environmental genomics research course I taught last semester, we spread it out over weeks 6 through 14. Again this
10. A terminal window will open asking for your username and password. If you are asked about the rsa2 key, click Yes.

All Operating Systems
Once you are logged in, you should see a message similar to the one below and a command prompt.

[Screenshot: terminal window after logging in to the cluster.]

Configuring your .bashrc file
To use the programs that are installed on our cluster, you first need to set your PATH variable, which tells the computer where to look for programs. To configure your .bashrc file with the path to the programs, run the following commands:

    cp ~sickler/.bashrc ~/.bashrc
    source ~/.bashrc

To verify that everything worked, enter

    which nseg

The computer should return

    /share/apps/nseg/nseg

3. IPython Notebook
Several QIIME tutorials make use of IPython Notebook, a tool that combines code with plain text and figures. A notebook is viewed in a web browser and is accessible as a static page to everyone you share it with, even if they do not have IPython Notebook installed themselves. It can be used to build workflows, demonstrate scripts in real time, or keep lab notebooks for labs that are more computationally minded. IPython Notebook is easy to install and fun to play around with. There are a number of sample galleries as well as instructions on how to get the most out of this tool.
    1. IPython Notebook website
    2. IPython viewer
    3. Cool Notebooks
    4. Learn
11. an example of a 3D plot in Emperor (A) and in KiNG (B). The samples are colored by date in both plots. Coloring by sample date seems to delineate the sample clustering by unweighted UniFrac distances fairly well. Samples in red were taken on 10/29, in blue on 10/30, in orange on 10/31, and in green/purple on 11/03. While both plots yield the same information, the rendering quality is much better in Emperor, and it is much easier to change the colors and scale the points than in KiNG. Changing these values in KiNG requires passing a different prefs file with the modifications, which becomes tedious with a large number of points.

9. Statistical Analyses
QIIME offers support for downstream statistical analyses. The documentation and the QIIME forum often contain discussions of the relative utility of each test and how it can be applied to microbial data. Many of the tests are implemented in R and wrapped by QIIME, so the R documentation may also be useful in interpreting the results.

a. OTU Category Significance
This script allows you to determine which OTUs are significantly associated with a metadata category or a range of metadata values. ANOVA and Pearson correlation tend to be the most useful; however, this script contains a number of other tests. Make sure to use the clean OTU table, that is, the OTU table that resulted from filtering low-abundance OTUs and single rarefaction. Since we have mostly continuous variables, the Pearson correlation
12. application. Make them draw a couple of cycles of paired-end Illumina sequencing. I also make them dig through the patents to find relevant information.

References and Suggested Reading

High-throughput sequencing of 16S rRNA
1. Earth Microbiome Project. http://www.earthmicrobiome.org/
2. Caporaso JG et al. 2010. QIIME allows analysis of high-throughput community sequencing data. Nat Methods 7: 335-336.
3. Caporaso JG et al. 2011. Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proc Natl Acad Sci U S A 108 Suppl 1: 4516-4522.
4. Kuczynski J et al. 2011. Using QIIME to analyze 16S rRNA gene sequences from microbial communities. Curr Protoc Bioinformatics Chapter 10: Unit 10.7.
5. Soergel DAW, Dey N, Knight R, Brenner SE. 2012. Selection of primers for optimal taxonomic classification of environmental 16S rRNA gene sequences. ISME Journal 6: 1440-1444.

Sample Design and Replication
6. Lennon JT. 2011. Replication, lies and lesser-known truths regarding experimental design in environmental microbiology. Environmental Microbiology 13: 1383-1386.
7. Prosser JI. 2010. Replicate or lie. Environmental Microbiology 12: 1806-1810.

ITS region papers
8. McGuire KL et al. 2013. Digging the New York City skyline: Soil fungal communities in green roofs and city parks. PLoS ONE 8: e58020.
9. Kõljalg U, Larsson K-H, Abarenkov K, Nilsson RH, Alexander IJ, et al. 2005. UNITE: a database providing
13. be of the same species, or the same operational taxonomic unit (OTU). QIIME supports several strategies for generating an OTU table (a table of each OTU present in the entire dataset and its absolute abundance in each of the samples), which we will explore in the following tutorial.

The dataset included in this tutorial is from a longitudinal study of an alluvial stream microbial community during and after a major storm event. The dynamics of the microbial community of a stream fed primarily by runoff and stormwater are not well understood in the context of major storm events, especially when combined and sanitary sewer overflows (CSO/SSO) feed into the stream. The samples (n = 47) were sequenced on the Illumina MiSeq platform and were demultiplexed on the sequencer (i.e., barcodes were matched to sample IDs and subsequently removed), which requires us to modify the typical QIIME workflow to get around the split libraries step.

QIIME (pronounced "chime") is an open-source bioinformatics platform for microbial community analysis. QIIME runs in Unix/Linux and can be natively installed or used inside a VirtualBox. The QIIME pipeline allows you to input files directly from the sequencing instrument, demultiplex barcoded samples, generate an OTU table, and perform downstream diversity and statistical analyses. The basic workflow starts with filtering the reads to ensure good-quality data and splitting the libraries to match barcoded samples to the app
14. cross-disciplinary approaches.

Assessment
    25%  Quizzes (drop your lowest; pre- and post-quizzes for each topic)
    10%  Assignments
    10%  Poster of a molecular technique
    10%  Presentation of a bioinformatics exercise from Samuelsson
    10%  Final presentation
    35%  Manuscript

Grading will be as follows:

A: Your work must be of the quality and completeness that could be published as part of a scientific manuscript. To earn an A, you must also demonstrate a high level of independence; i.e., when something goes wrong, you need to formulate a plan to figure things out. Of course I will help discuss a proper way to proceed with you, but you need to initiate the troubleshooting plans. You effectively organize your time, carefully plan experiments, and critically assess your work. Additionally, you can communicate your project clearly and professionally, both in writing and orally.

B: Your work is of good quality but is not sufficiently complete to warrant an A.

C: Only the basic requirements listed above are met.

D/F: The student does not meet the course requirements and will most likely be recommended to drop the class unless they can formulate and demonstrate a plan to get themselves back on track.

Academic Integrity
I expect that each of you will put forth your best honest effort in this class. I also expect that you are responsible for upholding the standards of academic honesty and integrity as described by Juniata College. Please familiarize yourself with these policies,
15. files output by the summarize taxa script show the relative abundance of each taxon at different taxonomic levels in each sample.

8. Diversity Analyses
Microbial ecologists are often interested in the microbial diversity of their samples. Alpha diversity refers to the within-sample diversity, while beta diversity describes the differences in diversity between samples. Both alpha and beta diversity can be computed in QIIME, and a number of metrics for each measure of diversity can be implemented.

a. Alpha Diversity
While QIIME offers a workflow script for generating rarefaction plots, previous experience has shown that it often fails and does not progress past the multiple rarefactions. Therefore, it is advisable to do each step of the workflow separately.

Alpha diversity is computed by generating multiple rarefactions of the OTU table at different sampling depths, calculating alpha diversity on each rarefied OTU table, collating the results, and making rarefaction plots. The most computationally intensive step of this workflow is performing the multiple rarefactions. Refer to the QIIME documentation for more information and other options.

i. Generate multiple rarefactions
Run this step in parallel if possible. For serial environments, use multiple_rarefactions.py and omit -O. Use the OTU table with OTUs observed at least 10 times in the dataset. This script will take approximately 20 minutes to complete using the options below (3 cores).
16. gel system to run a gel to check PCR products
• Troubleshoot PCR reactions that didn't work

Weeks 3 and 4: Troubleshooting and PCR product purification
Outline of Objectives
• Continue troubleshooting negative PCR reactions
• Utilize SPRI bead technology and/or gel extraction to clean up pooled PCR products
• Explain the biochemistry behind SPRI bead technology
• Learn how to evaluate the quantity and quality of the prepared libraries
    o Using gel electrophoresis
    o Bioanalyzer
    o Qubit
• Describe how to perform a literature review for the manuscript
• Describe how Carl Woese discovered the third domain of life (consider moving to week 1 or 2)
• Discuss why fecal bacteria indicators do not always correlate with pathogens (project-specific objective)
• Describe limitations of pathogen detection (project-specific objective)
• Evaluate the advantages and disadvantages of shotgun metagenomics

Week 5: Sequencing Technologies
Outline of Objectives
• Watch the following Broad Institute Bootcamp videos: http://www.broadinstitute.org/scientific-community/science/platforms/genome-sequencing/broadillumina-genome-analyzer-boot-camp
• Be able to describe the biochemistry behind the Sanger, 454, and Illumina sequencing technologies
• Analyze the output of the 454 and Illumina sequencing platforms
• Explain the limitations and advantages of these sequencing platforms
• Understand the Illumina sequencing technology in the context of our
17. Scale: Very low / Somewhat low / Neutral / Somewhat high / Very high
    Comfort in executing command-line-based programs: BEFORE
    Comfort in executing command-line-based programs: NOW

Open-Ended Questions
8. What were the strengths of the Molecular Techniques course? What did you find most useful or enjoyable?
9. Which parts of the Molecular Techniques course were the least useful or enjoyable?
10. How likely are you to recommend this course to a friend or colleague? Very unlikely / Somewhat unlikely / Somewhat likely / Very likely
11. Do you have any other comments or suggestions for improving Molecular Techniques?
12. What did you learn in Molecular Techniques?
13. How did this course challenge you?

Time line of module
We will be performing all of the protocols described in this module over the 2.5-day workshop. Note that this module can also be spread out over the first five weeks of a semester. Please see the section on applications in the classroom for a detailed outline of how to do this. Also, on your flash drives and on the Wiki there will be links to all of the materials, week by week, for setting this up as a semester-long class.

Discussion Topics for class
• Experimental design
    o Why is replication important in biological studies?
    o What are some simple statistical techniques that one can use if they don't replicate sampling?
    o What is the difference betwe
18. html
http://qiime.org/scripts/alpha_diversity_metrics.html

We will use the alpha diversity metrics chao1, PD_whole_tree, and Heip's evenness to calculate the richness and evenness of the dataset.

    qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/cso_for_hhmi$ parallel_alpha_diversity.py -i multiple_rarefactions -o alpha_div -t /home/qiime/qiime_software/[...]/gg_13_5_otus/trees/97_otus.tree -m chao1,PD_whole_tree,heip_e -O 3

    parallel_alpha_diversity.py
        -i <input_dir>                    Input the directory containing the multiple rarefactions
        -o <output_dir>
        -t <path_to_reference_tree_file>  Use the latest Greengenes tree. Needed for PD_whole_tree only
        -m <alpha_div_metrics>            Separate multiple metrics with commas; do not include spaces
        -O <num_jobs_to_start>

This script will output tab-delimited files with the results of each alpha diversity metric at each sampling depth. Sometimes the script completes without deleting the temporary folders. If this is the case, the next script will fail. If the alpha diversity results look complete and all iterations are a similar size, try deleting the temporary directories before collating.

iii. Collate alpha diversity results
http://qiime.org/scripts/collate_alpha.html

    qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/cso_for_hhmi$ collate_alpha.py -i alpha_div -o collated_alpha

    collate_alpha.py
        -i <input_dir>                    Input the directory containing the results of
19. most often have to do with navigation within the file structure and file management.

1. Open the terminal.

2. Type pwd into the terminal and press Enter to print the working directory. This will allow you to see the current directory.

    qiime@qiime-VirtualBox:~$ pwd
    /home/qiime
    qiime@qiime-VirtualBox:~$

3. Use ls (list) to list the contents of your home directory. Which files and directories do you see?

    qiime@qiime-VirtualBox:~$ ls

4. Type ls -a to see the hidden files in your home directory. Note that there is a space between ls and the dash. What distinguishes a hidden file from a visible file?

5. Navigate to the Desktop by using cd (change directory). Don't forget to practice using tab to have the shell fill in as much information as possible.

    qiime@qiime-VirtualBox:~$ cd Desktop
    qiime@qiime-VirtualBox:~/Desktop$

6. Create a directory within Desktop using mkdir. Check whether it worked by using ls.

    qiime@qiime-VirtualBox:~$ cd Desktop
    qiime@qiime-VirtualBox:~/Desktop$ mkdir dir_1
    qiime@qiime-VirtualBox:~/Desktop$ ls
    qiime@qiime-VirtualBox:~/Desktop$

7. Make two subdirectories called test_1 and test_2 within dir_1. Write down the commands you used.

8. Return to the home directory using cd .., which tells the shell to go up one directory. Check that you are there by using pwd.

    qiime@qiime-VirtualBox:~/Desktop/dir_1$ cd ..
    qiime@qiime-VirtualBox:~/Desktop$ cd ..
    qiime@qiime-VirtualBox:~$ pwd
    /home/qiime
    qiime@qiime-VirtualBox:~$
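Steps 2 through 8 can also be tried in a scratch directory, which makes the difference between ls and ls -a easy to see. This is a small sketch under stated assumptions: the sandbox path and the file names visible.txt and .hidden.txt are invented for the demonstration and are not part of the workshop files.

```shell
# Start in a scratch directory instead of the real home directory.
sandbox="$(mktemp -d)"
cd "$sandbox"
pwd

# Create one ordinary file and one whose name starts with a dot.
touch visible.txt .hidden.txt
plain_listing="$(ls)"      # ordinary listing
all_listing="$(ls -a)"     # -a also lists dot files, plus . and ..

# Steps 6 and 7: make dir_1 and two subdirectories inside it.
mkdir dir_1
mkdir dir_1/test_1 dir_1/test_2
ls dir_1

# Step 8: go down one level, then back up with "cd ..".
cd dir_1
cd ..
back_at="$(pwd)"
echo "$back_at"
```

Comparing plain_listing with all_listing answers the question in step 4: the only difference between the two files is the leading dot in the name.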
20. pairwise alpha diversity comparisons. Open it in Excel for easier viewing.

[Figure 18: spreadsheet with the columns Group1, Group2, Group1 mean, Group1 std, Group2 mean, Group2 std, t stat, and p-value, one row per pairwise comparison.]

Figure 18. Sample output from a nonparametric t-test comparing alpha diversity values. The p-value column shows which alpha diversity values are statistically different from each other. The t stat column shows where the comparison fits in the constructed distribution.

The two groups in each comparison are listed in columns A and B, followed by the respective means and standard deviations for each group. The t stat explains how the sample fits into the nonparametric distribution, and the p-value indicates whether the comparison is significant. In this case, none of the conductivity values were significantly different from each other.

10. Heatmaps
There is a script to generate heatmaps in QIIME; however, the output is not very good. If we want to generate a heatmap of an OTU table, we use R. Below is a link to the tutorial created by Keiko Sing, a Lamendella Lab alumna and R guru, for going from OTU table to heatmap in R. Before following her instructions, make sure your OTU table is in the
21. parallel_alpha_diversity.py
        -o <output_dir>

The output of this script is one text file per metric containing the alpha diversity results.

iv. Plot rarefaction curves
See the other optional parameters for adjusting image resolution or file type. This script may take a few minutes to complete.
http://qiime.org/scripts/make_rarefaction_plots.html

    qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/cso_for_hhmi$ make_rarefaction_plots.py -i collated_alpha -m mapping_files/mapping_complete.txt -o rarefaction_p

    make_rarefaction_plots.py
        -i <input_dir>       Input the directory containing the results of collate_alpha.py
        -m <mapping_file>
        -o <output_dir>

[Figure 12: rarefaction plot titled "PD_whole_tree: CONDUCTIVITY". x axis: Sequences Per Sample (0 to 1000); y axis: Rarefaction Measure: PD_whole_tree.]

Figure 12. A sample alpha rarefaction plot showing the alpha diversity of the OTU table over a range of sampling depths.

In this rarefaction plot, the sampling depth is plotted on the x axis, while the species richness calculated with the alpha diversity metric PD_whole_tree is on the y axis. Each color represents a different conductivity value. Rarefaction plots can reveal differences in alpha diversity between treatments or timepoints; here, however, the alpha diversity at each conductivity value appears similar even as sampling depth increases. Rarefaction plots can also tell you how well sampled t
22. raw_data$ cd ..
    qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/cso_for_hhmi$ check_id_map.py -m mapping.txt -o checked_mapping
    No errors or warnings were found in mapping file.
    qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/cso_for_hhmi$

    check_id_map.py
        -m <mapping_file.txt>
        -o <output_dir>

The output of this script is a log file, which lists the errors and warnings found in the original mapping file. A corrected mapping file will be produced, with an underscore (_) as the default replacement character. Finally, an HTML version of the mapping file will be created with the problem fields highlighted. We have found it is best to fix the errors yourself and then recheck, to ensure your metadata stays as you intended.

4. Extract compressed files
The raw data is compressed into a tarball to save space. Extract the files using the tar utility. Move to the directory where you would like to create the raw_data directory. In this case, we are putting the raw data into a subdirectory of the Shared Folder called cso_for_hhmi.

    qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/cso_for_hhmi$ tar xzvf raw_data.tar.gz

The tar command takes many options. The options we used here are explained below. See the GNU documentation of tar for more information: http://www.math.utah.edu/docs/info/tar_2.html

    x    extract files (compare c, which is used to create an archive)
    z    use the gzip utility
    v    tar will print the files as they are extracted (verbose)
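To see the tar options from this step in action without the workshop data, a tiny archive can be built and unpacked in a scratch directory. This is only a sketch: the file sample_1.fastq and its contents are placeholders, and c (create) and t (list) are standard tar options used here alongside the x, z, and v options that the step describes.

```shell
sandbox="$(mktemp -d)"
cd "$sandbox"

# Build a tiny stand-in for the raw data (placeholder file and contents).
mkdir raw_data
echo "ACGT" > raw_data/sample_1.fastq

# c = create an archive, z = compress with gzip, f = archive file name.
tar -czf raw_data.tar.gz raw_data
rm -r raw_data

# t = list the archive contents without extracting anything.
listing="$(tar -tzf raw_data.tar.gz)"
echo "$listing"

# x = extract, z = gzip, v = print each file as it is extracted.
tar -xzvf raw_data.tar.gz
ls raw_data
```

Listing with t before extracting is a cheap way to check which directory the archive will create, so the files end up where you expect.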
23. sequences are already demultiplexed.

Description: This must be the last column in the mapping file. The fields must be unique. For simplicity, just copy the sample IDs from the first column into the Description column.

A sample mapping file looks like this. The header for this mapping file starts with a pound (#) character and generally requires a SampleID, BarcodeSequence, LinkerPrimerSequence, and a Description, all tab delimited. The following example mapping file represents the minimum field requirement for the mapping file.

    #SampleID    BarcodeSequence        LinkerPrimerSequence    <metadata_1>    Description
                 fill in the barcode    fill in the primer      etc. values     sample1
                 fill in the barcode    fill in the primer      etc. values     sample2
                 fill in the barcode    fill in the primer      etc. values     sample3
                 fill in the barcode    fill in the primer      etc. values     sample4

Once the mapping file is created, validate the format with this handy script. Validating the mapping file prior to running any analyses will save you a lot of hassle.
http://qiime.org/scripts/check_id_map.html

A sample mapping file with common errors and omissions is included on the flash drive, as well as a correct mapping file. The correct mapping file may be used in downstream analyses if time constraints prohibit troubleshooting the mapping file with errors.

    qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/cso_for_hhmi/
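Before running the validation script, the layout rules described above (a header that starts with #SampleID, tab-delimited fields, Description as the last column, unique Description values) can be spot-checked with standard shell tools. The sketch below is not a substitute for check_id_map.py. The sample IDs, barcode and primer strings, and the Salinity column are made-up placeholders, and the awk checks cover only the rules quoted in this section.

```shell
sandbox="$(mktemp -d)"
mapping="$sandbox/mapping.txt"

# Write a minimal tab-delimited mapping file (all values are placeholders).
printf '#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tSalinity\tDescription\n' > "$mapping"
printf 'S1\tACGTACGTACGT\tGTGCCAGCAGCC\t0.5\tS1\n' >> "$mapping"
printf 'S2\tTGCATGCATGCA\tGTGCCAGCAGCC\t1.2\tS2\n' >> "$mapping"

# Header check: first field is #SampleID and last field is Description.
header_ok="$(awk -F'\t' 'NR == 1 { if ($1 == "#SampleID" && $NF == "Description") print "yes"; else print "no" }' "$mapping")"

# Every row should have the same number of tab-separated fields as the header.
ragged_rows="$(awk -F'\t' 'NR == 1 { n = NF } NF != n { bad++ } END { print bad + 0 }' "$mapping")"

# Description values must be unique: count the values that appear twice.
duplicate_descriptions="$(awk -F'\t' 'NR > 1 { print $NF }' "$mapping" | sort | uniq -d | wc -l | tr -d ' ')"

echo "header ok: $header_ok, ragged rows: $ragged_rows, duplicate descriptions: $duplicate_descriptions"
```

A spreadsheet export that silently converts tabs to spaces is a common cause of a nonzero ragged row count, so this check is worth running right after saving the file.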
24. us who is there in our sample. The rRNA operon contains genes encoding structural and functional portions of the ribosome. This operon contains both highly conserved and highly variable regions: the conserved regions allow the gene to be targeted across many taxa at once, while the variable regions distinguish taxa from one another. Microbiologists have relied upon DNA sequence information for microbial identification, based primarily on the gene encoding the small subunit RNA molecule of the ribosome (16S rRNA gene). Databases of rRNA sequence data can be used to design phylogenetically conserved probes that target both individual and closely related groups of microorganisms without cultivation. Some of the most well-curated databases of 16S rRNA sequences include Greengenes, the Ribosomal Database Project, and ARB-Silva (see references section for links to these databases).

Figure 1. Structure of the rRNA operon in bacteria. Figure from Principles of Biochemistry, 4th Edition, Pearson Prentice Hall, Inc., 2006.

The ribosomal RNA genes encoding 16S, 23S, and 5S rRNAs are typically linked together with tRNA molecules into operons that are coordinately transcribed to produce equimolar quantities of each gene product (Figure 1). Universal primers can be used to amplify regions of the prokaryotic 16S rRNA gene. Approximately genus-level taxonomic resolution can be achieved, depending on which variable region is amplified. Currently most widely used
25. web-based methods for the molecular identification of ectomycorrhizal fungi. New Phytol 166: 1063-1068.
10. Peay KG, Kennedy PG, Bruns TD. 2008. Fungal community ecology: a hybrid beast with a molecular master. BioScience 58: 799-810.

Background on rRNA operon and discovery of archaea
11. Balch WE, Magrum LJ, Fox GE, Wolfe RS, Woese CR. An ancient divergence among the bacteria. J Mol Evol. 1977 Aug 5; 9(4): 305-11.
12. Fox GE, Magrum LJ, Balch WE, Wolfe RS, Woese CR. Classification of methanogenic bacteria by 16S ribosomal RNA characterization. Proc Natl Acad Sci USA. 1977 Oct; 74(10): 4537-4541.
13. Woese CR, Fox GE. Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc Natl Acad Sci USA. 1977 Nov; 74(11): 5088-90.

Databases
UNITE database for fungal identification: http://unite.ut.ee
Ribosomal Database Project: http://rdp.cme.msu.edu
Greengenes Database: http://greengenes.lbl.gov/cgi-bin/nph-index.cgi
ARB-Silva Database: http://www.arb-silva.de

MODULE 2. ANALYSIS OF SEQUENCE DATA

[Cartoon: "We are bioinformaticians, that's what we do." Labels: sample preparation, gene identification, novel genes, discoveries. This module focuses on de-convoluting the stuff.]

Background
The Illumina MiSeq platform allows cost-effective deep sequencing by offering multiplexing and acceptable read lengths with a short turnaround time. The open-source Quantitative Insights into Microbial Ecology (QI
26. wells empty.

Loading the Ladder and the Samples
1. Pipette 1 µl of DNA ladder (yellow) in the ladder well.
2. In each of the 12 sample wells, pipette 1 µl of sample (used wells) or 1 µl of de-ionized water (unused wells).
3. Put the chip horizontally in the adapter and vortex for 1 min at the indicated setting (2400 rpm).
4. Run the chip in the Agilent 2100 Bioanalyzer within 5 min.

Figure 5. Agilent Bioanalyzer Protocol Overview. See the Quick Start guide for more information.

Preparing the Gel-Dye Mix
1. Allow High Sensitivity DNA dye concentrate (blue) and High Sensitivity DNA gel matrix (red) to equilibrate to room temperature for 30 min.
2. Add 15 µl of High Sensitivity DNA dye concentrate (blue) to a High Sensitivity DNA gel matrix vial (red).
3. Vortex solution well and spin down. Transfer to spin filter.
4. Centrifuge at 2240 g ± 20% for 10 min. Protect solution from light. Store at 4 °C.

Loading the Gel-Dye Mix
1. Allow the gel-dye mix to equilibrate at room temperature for 30 min before use.
2. Put a new High Sensitivity DNA chip on the chip priming station.
3. Pipette 9.0 µl of gel-dye mix in the well marked G.
4. Make sure that the plunger is positioned at 1 ml and then close the chip priming station.
5. Press plunger until it is held by the clip.
6. Wait for exactly 60 s, then release clip.
7. Wait for 5 s, then slowly pull back the plunger to the 1 ml position.
8. Open the chip priming
27. with barcoded fastq files, with separate files containing the barcodes. These are then input together, and the script demultiplexes the reads, assigns them a unique identifier, quality filters, and converts the sequences to fasta format for OTU picking. We can still use the QIIME script with a few modifications.
http://qiime.org/scripts/split_libraries_fastq.html

split_libraries_fastq.py
-i <sequence_read_fastq_file.fastq>
   Separate these by commas (no spaces) if there is more than one
-o <output_dir>
-m <mapping_file>
   Separate these by commas (no spaces) if there is more than one
--sample_id <file name when using demultiplexed data>
   Use only when the sequences are demultiplexed (in our case)
--rev_comp
   This is required if you are using reverse reads
-q <minimum_phred_score>
   This is the minimum Phred score a base may have. A Phred score of 20 corresponds to a 99% chance that the base was called correctly.
--barcode_type 'not-barcoded'

C. Other Software
1. Other ways to analyze metagenomes
mothur: http://www.mothur.org
Picante R package: http://picante.r-forge.r-project.org

2. Installing R
R is already installed in the QIIME VDI. This tutorial covers installing R on your Windows machine. This tutorial is by Keiko Sing and can be found, along with an introduction to R, at her blog: http://learningomics.wordpress.com/2013/01/28/i-thought-r-was-a-letter-an-introduction
a. Go to the R web
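The relationship between a Phred score and base-calling accuracy can be worked through in a few lines of Python. A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q = 20 gives 0.01, or a 99% chance the base is correct. The function names below, and the default ASCII offset of 33 for decoding fastq quality characters, are assumptions made for this sketch; check the encoding of your own files.

```python
def phred_to_error_probability(q):
    """Probability that a base with Phred score q was called incorrectly."""
    return 10 ** (-q / 10)


def phred_to_accuracy(q):
    """Probability that a base with Phred score q was called correctly."""
    return 1 - phred_to_error_probability(q)


def decode_quality_string(quality, offset=33):
    """Convert a fastq quality string to a list of Phred scores.

    The offset of 33 is an assumption (Phred+33 encoding).
    """
    return [ord(character) - offset for character in quality]


if __name__ == "__main__":
    for score in (10, 20, 30):
        print(score, round(phred_to_accuracy(score), 4))
```

Raising the threshold from 20 to 30 lowers the expected error rate tenfold, at the cost of discarding more bases.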
28. your file system and rename it to Shared_Folder. This name is case sensitive, and the underscore is important. If it is not exactly as above, VB will not recognize it as your shared folder. Check Auto-mount and Make Permanent before clicking OK. The path to the shared folder should appear underneath Machine Folders. Click OK.

[Screenshot: VirtualBox Shared Folders settings, with Folder Name set to Shared_Folder and the Auto-mount and Make Permanent boxes checked.]

Test the Shared Folder
Click the icon in the upper right-hand corner to shut down the VM. Choose the Send Shutdown Signal option. Once you are returned to the VB Manager, start the VM again. If the once-gray folder is now blue, a shared folder is active. Once the Ubuntu desktop appears, double click on Shared_Folder and the contents of your shared folder should appear.

Note: You cannot write directly to the Shared Folder from the VB. You may create a new directory from the host share
29. 0.05, the grouping of samples by conductivity is statistically significant. The R value indicates that approximately 10% of the variation in distances is explained by this grouping.

c. Compare Alpha Diversity
This script allows you to compare alpha diversity between samples using either a parametric or nonparametric two-sample t-test.
http://qiime.org/scripts/compare_alpha_diversity.html

compare_alpha_diversity.py -i collated_alpha/PD_whole_tree.txt -m mapping_files/mapping_complete.txt -c CONDUCTIVITY -o compare_alpha_pd_cond.txt -n 999 -t nonparametric -p bonferroni -d 1000

compare_alpha_diversity.py
-i <collated_alpha_div_filepath.txt>
   This input is the output from collate_alpha.py
-m <mapping_file>
-c <metadata category>
-o <output_filepath>
-t nonparametric OR parametric
   Optional; default is nonparametric
-n <num_permutations>
   Optional; default is 999 permutations and is used only for nonparametric tests
-p bonferroni OR fdr OR none
   Here you can determine which multiple comparison correction method you want to use. Bonferroni is the most conservative method, while FDR (false discovery rate) is less conservative.
-d <depth of rarefaction to use>
   This is the rarefaction depth for which you want to calculate significance. Typically we use the greatest depth.

The output of this script is a tab-delimited file containing the
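To see why Bonferroni is described as more conservative than FDR, it can help to apply both corrections to the same small set of p-values. The sketch below is a plain-Python illustration and makes two assumptions: that "fdr" refers to the Benjamini-Hochberg procedure, and that adjusted p-values are capped at 1. It is not taken from the QIIME source, whose implementation may differ in detail.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capping at 1."""
    count = len(pvalues)
    return [min(1.0, p * count) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed meaning of 'fdr')."""
    count = len(pvalues)
    # Indices of the p-values, from smallest to largest
    order = sorted(range(count), key=lambda index: pvalues[index])
    adjusted = [0.0] * count
    running_minimum = 1.0
    # Walk from the largest p-value down so each adjusted value
    # never exceeds the adjusted value of a larger p-value
    for rank in range(count, 0, -1):
        index = order[rank - 1]
        running_minimum = min(running_minimum, pvalues[index] * count / rank)
        adjusted[index] = running_minimum
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03]
    print("bonferroni:", bonferroni(raw))
    print("fdr (BH):  ", benjamini_hochberg(raw))
```

For any input, each Benjamini-Hochberg adjusted value is less than or equal to the corresponding Bonferroni value, which is what "less conservative" means in practice.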
30. [Figure 2 diagram: the primer constructs annealed to the target region, with the read sequencing primers, the index sequencing primer, and the amplicon labeled.]

Figure 2. Protocol for barcoded Illumina sequencing. A target gene is identified, which in this case is the V4 region of the 16S rRNA gene. This region is PCR amplified using primer constructs with Illumina adapters, linker and pad sequences, and the forward/reverse primers themselves. The reverse primer construct contains an additional 12 bp barcode sequence. After PCR amplification, the target region is labeled with Illumina adapters and the barcode. The sequencing primers anneal and produce reads, while the index sequencing primer sequences the barcode. This information is used prior to analyzing the reads to demultiplex the sequences. See Caporaso et al. 2011 for more information.

These PCR amplification protocols are based on the Earth Microbiome Project's list of standard protocols: http://www.earthmicrobiome.org/emp-standard-protocols/16s/

Reactions will be performed in duplicate. Record the PCR plate set-up in the chart below or in the appropriate spreadsheet on the flash drive.
31. E forum. We refer to the metadata file as the mapping file. It is easiest to make the mapping file in Excel and save it as a tab-delimited file. See the link below for more information.
http://qiime.org/documentation/file_formats.html#input-files

The essential headers are SampleID, BarcodeSequence, LinkerPrimerSequence, and Description. The order is important, and headers are case sensitive. All of the metadata columns are to be placed between the LinkerPrimerSequence and Description columns.

Keep these formatting requirements in mind:
1. Sample IDs must be unique and may contain only numbers, letters, or periods.
2. Use only alphanumeric characters, underscores (except SampleID), and periods.
3. Do not use these characters in the mapping file
4. Fill in blank cells with NA.
5. Avoid spaces.

Explanation of required headers
SampleID: This column contains the unique sample names. No duplicates are allowed.
BarcodeSequence: This is the unique barcode sequence that corresponds to each unique sample ID. QIIME will match the barcoded reads to a sample ID using the information in this column. This column is required even if the sequences are already demultiplexed (for example, in the case of this tutorial).
LinkerPrimerSequence: This is the linker primer pad sequence that is added to the amplicon during PCR amplification. QIIME will use this information to remove these extraneous bases from the reads. This column is required even if the
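Several of the formatting requirements above are mechanical enough to check with a short script before running the official validator. The following is a minimal sketch, not a replacement for check_id_map.py: the function name check_mapping_lines is hypothetical, it assumes the header line begins with #SampleID, and it covers only header order, unique and well-formed sample IDs, column counts, and blank cells.

```python
import re

REQUIRED_FIRST = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence"]
REQUIRED_LAST = "Description"
# Sample IDs may contain only numbers, letters, or periods
VALID_SAMPLE_ID = re.compile(r"^[A-Za-z0-9.]+$")


def check_mapping_lines(lines):
    """Return a list of problems found in tab-delimited mapping file lines."""
    rows = [line.rstrip("\r\n").split("\t") for line in lines if line.strip()]
    if not rows:
        return ["mapping file is empty"]
    header = rows[0]
    problems = []
    seen = set()
    if header[:3] != REQUIRED_FIRST:
        problems.append("first three headers must be " + ", ".join(REQUIRED_FIRST))
    if header[-1] != REQUIRED_LAST:
        problems.append("last header must be " + REQUIRED_LAST)
    for number, row in enumerate(rows[1:], start=2):
        sample_id = row[0]
        if not VALID_SAMPLE_ID.match(sample_id):
            problems.append("row %d: invalid sample ID %r" % (number, sample_id))
        if sample_id in seen:
            problems.append("row %d: duplicate sample ID %r" % (number, sample_id))
        seen.add(sample_id)
        if len(row) != len(header):
            problems.append("row %d: expected %d columns" % (number, len(header)))
        if any(cell.strip() == "" for cell in row):
            problems.append("row %d: blank cell (fill in with NA)" % number)
    return problems
```

To use it on a real file, something like check_mapping_lines(open("mapping.txt")) would return an empty list when none of these particular problems are present; the file name here is only an example.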
32. E help forum. It is likely that several other people have run into the same issue.

Python Error Messages
Python error messages are the most cryptic error messages. Usually formatting issues or bugs in the scripts cause these sorts of errors. They are general, so other people using Python (a programming language used in scripting) may have encountered them before. Typically they involve Python expecting a string of characters instead of a float (a number with a decimal), or vice versa. Sometimes there are temporary files left in directories, even after scripts have completed, that Python will complain about. In previous versions of QIIME, OTU tables in BIOM format were typically susceptible to Python errors because of Consensus Lineage formatting issues in earlier versions of BIOM. If you run into Python errors, check the QIIME forum or consult other resources like Stack Overflow to troubleshoot the error messages. Occasionally there is a bug in the script, reports of which you can find on the QIIME GitHub site.

Hanging Scripts
When you are dealing with scripts that require a lot of memory, such as open-reference or de novo OTU picking or multiple rarefactions, sometimes the script hangs without completing. From the command line it appears that the script is still running, but the memory usage and the CPU usage (see the System Monitor) are low. There might be a temporary file that did not get deleted, and the script will therefore not proceed any further. It coul
33. GCGGTAA GTGCCAGCMGCCGCGGTAA GTGCCAGCMGCCGCGGTAA PRIMER BARCODE 806 rcbc H09 572 806 rcbc H10 573 806 rcbc F04 543 806 rcbc F05 544 806 rcbc D11 526 806 rcbc Ell 538 806 rcbc E01 528 806 rcbc F01 540

2. Identify the two problems with the above mapping file.

3. Explain in words what the following QIIME Python script will do. Make sure you describe each of the parameters.
add_qiime_labels.py -m mapping_file.txt -i file.fasta -n 1000000 -o labelled_seqs

4. Briefly describe how we quality filtered our sequences in QIIME and why it's important to quality filter sequence data before performing data analysis.

5. Describe how you plan to assess microbial diversity in your study. What hypotheses do you have with respect to potential changes in diversity over the course of this study?

6. Give one advantage and one disadvantage of using an OTU table rarefied to a specific sequencing depth.

7. Describe what the following QIIME script does. Please include a description of the -s and -c parameters. Also describe what the expected output would look like.
otu_category_significance.py -i rarefied_otu_tables -m Map.txt -s correlation -c pH -o correlation.txt

8. Why are divergence-based measures more robust than species-based beta diversity measures?

9. Describe stepwise how you would statistically compare alpha diversity metrics between communities.

Classroom applications
Weeks 6-14 of the Environmental
34. IME pipeline, which runs in Linux environments, handles reads from a variety of sequencing platforms and can be used to process the raw reads, generate an OTU table, perform diversity analyses, and compute statistics. This module aims to introduce the shell and useful commands in Linux and to provide a short overview of the capabilities of QIIME for microbial community analysis.

Module goals
The goal of this module is to become familiar with the Linux file structure and command line, as well as to use QIIME to analyze 16S rRNA and ITS data from sequences. Participants will also learn how to properly utilize multivariate statistics to help answer their biological questions.

Vision and Change core competencies addressed
• Ability to apply the process of science by developing hypotheses and designing bioinformatics/biostatistical workflows relevant to understanding microbial communities in their natural environments
• Ability to use quantitative reasoning by
   o Developing and interpreting alpha and beta diversity graphs
   o Applying statistical methods to understand the relationship between microorganisms and their environment
   o Using models of microbial diversity to explain observed data
   o Using bioinformatics and biostatistical tools for analyzing large sequence data sets

GCAT-SEEK sequencing requirements
See description in Module 1.

Computer program requirements for data analysis
• Microsoft Excel or similar program
• Or
35. Metagenomics Workshop
Led by Regina Lamendella, Juniata College
lamendella@juniata.edu, 814-641-3553

Acknowledgements
I would like to thank Ms. Erin McClure for her help in developing many of these tutorials and for her help in preparing this document.

Table of Contents
16S rRNA GENE AND ITS LIBRARY PREPARATION
A. Background
B. Library Preparation
   1. 16S rRNA gene Illumina tag (itag) PCR
   2. ITS Illumina tag (itag) PCR
C. Check PCR Amplification
   1. Pool replicate samples
   2. E-gel electrophoresis
   3. DNA quantification with the Qubit fluorometer
      a. Introduction
      b. Materials
      c. Protocol
D. Quality Check Libraries
   1. Pool samples
   2. Gel electrophoresis
   3. QIAquick gel purification
   4. Bioanalyzer
      a. Introduction
      b. Agilent high sensitivity DNA assay protocol
      c. Interpreting Bioanalyzer results
E. References and suggested reading
BIOINFORMATICS
A. Background
B. Unix/Linux tutorial
C. Introduction to QIIME
D. Installing the QIIME VirtualBox image
E. QIIME 16S workflow
   1. Conventions
   2. Flowchart
   3. Metadata
   4. Extract compressed files
   5. Split libraries workaround
   6. OTU table picking
   7. Initial analyses
      a. OTU table statistics
      b. Clean OTU table
      c. Summarize taxa
   8. Diversity analyses
      a. Alpha diversity
      b. Beta diversity
   9. Statistical analyses
      a. OTU category significance
      b. Compare categories
      c. Compare alpha diversity
   10. Heatmaps
F. QIIME ITS workflow
   1. Obtain tutorial files
   2. OTU picking
G. Re
36. acle VirtualBox (free, see appendix)
• QIIME 1.7 (free)
• Server or cluster with multiple cores. You will have remote access to both the GCAT-SEEK and HHMI clusters.

Protocols
A. Unix/Linux Tutorial
Linux is an open-source, Unix-like operating system. It allows the user considerable flexibility and control over the computer by command line interaction. Many bioinformatics pipelines are built for Unix/Linux environments; therefore, it is a good idea to become familiar with Linux basics before beginning bioinformatics.

Every desktop computer uses an operating system. The most popular operating systems in use today are Windows, Mac OS, and UNIX. Linux is an operating system very much like Unix, and it has become very popular over the last several years. Operating systems are computer programs. An operating system is the first piece of software that the computer executes when you turn the machine on. The operating system loads itself into memory and begins managing the resources available on the computer. It then provides those resources to other applications that the user wants to execute.

The shell
The shell acts as an interface between the user and the kernel. When a user logs in, the login program checks the username and password and then starts another program called the shell. The shell is a command line interpreter (CLI). It interprets the commands the user types in and arranges for them to be carried out. The commands are themselv
37. ail to map to SampleIDs: 0.000
Percent of sequences with invalid characters: 0.000
Percent of sequences with barcodes detected: 0.000
Percent of sequences with barcodes detected at the beginning of the sequence: 0.000
Percent of sequences with primers detected: 0.000

6. OTU Table Picking
Picking the OTU table is the most computationally intensive step in the entire workflow. Fortunately, there exist a few ways to reduce the computational time needed while retaining most of the informative reads. Closed-reference, de novo, and open-reference OTU picking are three ways OTU picking can be done in QIIME.

We use the Greengenes 16S rRNA database, which is pre-clustered and of high quality (chimera checked). Approximately annual updates are released. It is advisable to use the latest version of the database, especially if you are characterizing non-human-associated environments, since thousands of new OTUs are added with each update.

Closed-reference OTU picking uses the algorithm UCLUST to compare the reads against Greengenes. If a read fails to match a reference sequence (at least 70% identity), it is discarded. Closed-reference OTU picking is comparably fast; however, many reads are discarded. We will use closed-reference OTU picking in this workshop.

De novo OTU picking is much slower; however, no reads are discarded. The reads are clustered with themselves and assigned taxonomies. Chimeric sequences could be present, so it is essential to check
38. alyzer works with RNA as well and is useful for determining the quality of RNA. See the DNA assay protocol and the Agilent website for more information about the applications and troubleshooting guides.
http://www.genomics.agilent.com/en/Bioanalyzer-Instruments-Software/2100-Bioanalyzer/?cid=AG-PT-106&tabId=AG-PR-1001

b. Agilent High Sensitivity DNA Assay Protocol

Preparing the Gel-Dye Mix
1. Allow DNA dye concentrate (blue) and DNA gel matrix (red) to equilibrate to room temperature for 30 min.
2. Vortex DNA dye concentrate (blue) and add 25 µl of the dye to a DNA gel matrix vial (red).
3. Vortex solution well and spin down. Transfer to spin filter.
4. Centrifuge at 2240 g ± 20% for 15 min. Protect solution from light. Store at 4 °C.

Loading the Gel-Dye Mix
1. Allow the gel-dye mix to equilibrate to room temperature for 30 min before use.
2. Put a new DNA chip on the chip priming station.
3. Pipette 9.0 µl of gel-dye mix in the well marked G.
4. Make sure that the plunger is positioned at 1 ml and then close the chip priming station.
5. Press plunger until it is held by the clip.
6. Wait for exactly 60 s, then release clip.
7. Wait for 5 s. Slowly pull back plunger to 1 ml position.
8. Open the chip priming station and pipette 9.0 µl of gel-dye mix in the wells marked G.

Loading the Markers
1. Pipette 5 µl of marker (green) in all 12 sample wells and ladder well. Do not leave any
39. any people use the version of QIIME distributed inside of a VirtualBox. The VirtualBox is a free product from Oracle, the company that maintains Java. VirtualBox allows a segment of your computer to be virtualized and run a different operating system. The QIIME developers package QIIME and all of its dependencies inside a virtual disk image (VDI), which functions as a virtual hard drive with pre-installed components. We will use the VirtualBox here. Detailed instructions for installing the VirtualBox and creating a virtual machine with the QIIME VDI are in the appendix.

Once you have the Linux operating system running, we can explore a few features of the Ubuntu distribution. The graphical user interface (GUI, "gooey") looks like the typical desktop of a PC. The dashboard on the left shows all of the frequently used or running applications. The System Monitor is the task manager of Ubuntu, and you can see how many resources (RAM, CPU, etc.) are being used here. You can navigate using the graphical file structure by clicking on the directories (folders) on the desktop, or by using the terminal. In this example, we've navigated to the Desktop of the qiime user from the root directory. The directories Before you start and Shared Folder can also be seen in the terminal.

[Screenshot: terminal session in which the qiime user runs cd and ls.]
40. as originally written for structural biology research. There are two different scripts for making 3D plots. Generate both of them and see which is more useful for your study. Make sure to make 3D plots for both unweighted and weighted UniFrac principal coordinates matrices.
http://qiime.org/scripts/make_emperor.html
http://qiime.org/scripts/make_3d_plots.html

[Screenshot: make_emperor.py and make_3d_plots.py run on the unweighted principal coordinates file with the complete mapping file.]

make_emperor.py (for Emperor) or make_3d_plots.py (for KiNG)
-i <pc_matrix.txt>
-m <mapping_file>
-o <output_dir>

The 3D plots can be customized by adding vectors to connect samples (--add_vectors), defining an explicit axis (ex. -a pH), and/or by making biplots (-t to input taxa summaries). See the documentation for more information.

Figure 16. Sample 3D principal coordinates analysis of unweighted UniFrac distances. Plot A was generated with Emperor, while Plot B was generated with KiNG. Samples are colored by date.

The outputs of these scripts are HTML files with the 3D plots. Both Emperor (Chrome and Firefox) and KiNG run in web browsers; however, KiNG requires Java, which doesn't always cooperate. Below is
41. ations and describe the underlying biochemistry behind each step
• Define sources of variation in each step of our experimental sampling and design methods to measure this variation
• Understand the difference between biological and technical replication and provide examples of each in the context of our experimental design
• Describe the sources of error in a 16S rRNA gene study
• Design a replication strategy for our 16S rRNA gene study
• Describe some statistical methods that we can use if replication is not feasible

Week 2: Illumina tag PCR
Outline of Objectives
• Describe biases associated with PCR and how they might affect microbial community analysis
• Understand how the Illumina itag PCR works, i.e., be able to draw the forward and reverse constructs and know the function of each portion of the construct. Also be able to draw the first three cycles of PCR
• Evaluate potential strategies for overcoming these different PCR biases
• Define different ways to measure DNA concentration
• Perform itag PCR on our environmental samples
• Discuss the various biases associated with different regions of the 16S rRNA gene
• Explain how the secondary structure of the gene is relevant to its evolution and function
• Introduce various technologies that each group will be making a poster for: DNA/RNA co-extraction, itag PCR, the Qubit & Bioanalyzer (group of three), E-gels, qPCR, Illumina sequencing
• Utilize a bufferless
42. bose)
-f  name of the archive

Move to the raw_data directory and list (ls) the contents to check. If the directory contents look like below, you are ready to proceed with the tutorial.

qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/cso_for_hhmi/raw_data$ ls

Now that you have successfully extracted the archive, let's break for some relevant humor.

[Comic: xkcd, "tar". To disarm the bomb, simply enter a valid tar command on your first try. No Googling. http://xkcd.com/1168]

You will likely generate a number of large files in the process of analyzing metagenomic data. Therefore, you will likely come across tar more often in the future. Unfortunately, it's nearly impossible to keep the flags straight, especially across Unix/Linux distributions and zipping utilities.

5. Split Libraries Workaround
Since we are dealing with pre-demultiplexed data in fastq format, it is necessary to convert the fastq files to fasta and qual files before running any other scripts, as these scripts can only handle the fasta file format. We can use bash scripts to automate these steps to save typing and reduce the possibility of errors. Bash (Bourne-again shell) scripts are small programs written in the terminal, which you can use to automate certain tasks. While bash scripting takes some practice, implementing while loops can save a lot of time.

a. Make a list of the files you want to convert from fastq to fasta/qual format.
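To make the fastq-to-fasta/qual conversion concrete, here is a minimal Python sketch of what such a conversion does to each record. It is an illustration only and is not the QIIME conversion script: the function name is hypothetical, and it assumes four-line fastq records with Phred+33 quality encoding.

```python
def fastq_to_fasta_qual(fastq_lines, offset=33):
    """Convert fastq lines to (fasta_text, qual_text).

    Assumes each record is exactly four lines:
    @header, sequence, +, quality string.
    """
    lines = [line.rstrip("\r\n") for line in fastq_lines]
    lines = [line for line in lines if line]  # drop blank lines
    if len(lines) % 4 != 0:
        raise ValueError("fastq input is not a multiple of four lines")
    fasta_parts = []
    qual_parts = []
    for start in range(0, len(lines), 4):
        header, sequence, plus, quality = lines[start:start + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed fastq record near line %d" % (start + 1))
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ: " + header)
        name = header[1:]
        # Each quality character becomes one integer Phred score
        scores = " ".join(str(ord(ch) - offset) for ch in quality)
        fasta_parts.append(">%s\n%s\n" % (name, sequence))
        qual_parts.append(">%s\n%s\n" % (name, scores))
    return "".join(fasta_parts), "".join(qual_parts)
```

A loop over a list of file names, reading each fastq file and writing the two returned strings to .fasta and .qual files, would play the same role as the bash while loop described above.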
43. classic format, which has a .txt extension instead of a .biom extension. To install R, see the appendix.

convert_biom.py -i <table.biom> -o <table_from_biom.txt> -b

http://learningomics.wordpress.com/2013/02/23/from-otu-table-to-heatma

F. QIIME Fungal ITS Workflow
1. Obtain Tutorial Files
The QIIME documentation contains example files you can use to practice the workflow. The tutorial files are from a study of the fungal community in soils. Use wget to download the files and the ITS reference OTUs.

Download tutorial files and reference OTUs:
wget https://s3.amazonaws.com/s3-qiime_tutorial_files/its-soils-tutorial.tgz
wget https://github.com/downloads/qiime/its-reference-otus/its_12_11_otus.tar.gz

Unzip the files using tar and gunzip:
tar -xzf its-soils-tutorial.tgz
tar -xzf its_12_11_otus.tar.gz
gunzip ./its_12_11_otus/rep_set/97_otus.fasta.gz
gunzip ./its_12_11_otus/taxonomy/97_otu_taxonomy.txt.gz

You can see which
44. d Information: http://www.youtube.com/watch?v=t0akxx8Dwsk

Our prepared libraries for this workshop will be sequenced at the Dana Farber Sequencing Center. They offer full MiSeq runs for $1,000 for educational research purposes.
http://pcpgm.partners.org/research-services/sequencing/illumina

The Broad Institute provides a great set of Illumina sequencing videos, which are really in depth and helpful. Visit http://www.broadinstitute.org/scientific-community/science/platforms/genome-sequencing/broadillumina-genome-analyzer-boot-cam

Instrumentation and Supply requirements for this Module
1. Pipettes and tips. For projects with more than 48 samples, multi-channel pipettes are helpful.
2. Qubit fluorometer (Life Technologies). More information at http://www.invitrogen.com/site/us/en/home/brands/Product-Brand/Qubit.html
   Note: The PicoGreen assay with a spec reader is just as accurate as the Qubit 2.0 fluorometer. Nanodrop or specs that read the 260/280 ratio can be used but are not as accurate, because other substances can absorb at the same wavelength as DNA and skew results.
3. Thermal cycler (pretty much any one will do). At Juniata we use a mix of Bio-Rad and MJ Research cyclers.
4. Electrophoresis unit. Any electrophoresis unit will work fine. We typically use 1-2% gels for all applications. We stain our gels with GelStar gel stain. Ethidium bromide is fine too. Any 1 kb ladder will suffice. For the initial check gel a
45. d also be that the script is performing very inefficiently (swap is high) and will not complete in a reasonable time frame. Some workflow scripts, like alpha_rarefaction.py and any of the OTU picking workflows, are typically susceptible to hanging errors. In these cases, attempt to split up the workflow so that you have better control over the steps. You may need to upgrade RAM, use more resources on a cluster, or check out the Elastic Compute Cloud from Amazon Web Services to get enough RAM to complete the script.

2. Connecting to Juniata's HHMI Cluster
This guide by Alex Sickler is a detailed protocol for connecting remotely to the cluster for both Unix/Linux and Windows users.

Mac or Linux: open a terminal window and enter
ssh username@10.39.6.10

Windows: Download PuTTY from http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html. Launch PuTTY. In the host name box, enter 10.39.6.10. Click Open.

[Screenshot: PuTTY Configuration window, Session category.]
46. d folder (e.g., the desktop), but not from the Ubuntu desktop. This applies to scripts as well: you cannot write or move something to the Shared Folder. If you try to do this, the error below will arise.

qiime@qiime-VirtualBox:~/Desktop/Shared_Folder$ mkdir test
mkdir: cannot create directory 'test': Permission denied

To get around this, create a subfolder in your Shared Folder from your host machine. You may write to this subfolder within the Shared Folder from the command line and by using the Ubuntu GUI. The root password for the VM is qiime.

E. QIIME 16S WORKFLOW
1. Conventions
type text in this font directly into the command line
<use your own name>   omit < >
Comments: do not type into command line

2. Flowchart
Format metadata into a mapping file
Convert fastq files into fasta and qual files
Quality filter reads
Get the reverse complement of the reads
Add QIIME labels
Pick OTU table
Clean OTU table
Diversity analyses
Other statistics

Figure 9. The QIIME workflow as presented in this tutorial.

3. Metadata
Formatting the metadata properly is challenging, but there are detailed instructions and troubleshooting tips in the QIIME documentation and on the QIIM
47. dd_qiime_labels.py
    -m <mapping.txt>    # Don't forget to put in the file path
    -i <fasta_dir>      # Ensure that only the desired fasta files are in this directory
    -c <name>           # This is the header of the column in the mapping file where the file names are located
    -n 1000000          # Each read needs a unique identifier. Starting at 1000000 is recommended
    -o <output_dir>
The output of this script will be one fasta file called combined_seqs.fna containing all of the reads with QIIME labels.
g. Validate the QIIME labels
Before moving on, it's good to check that the labels were added correctly and that QIIME can understand the sample names. If this script fails, you will not be able to do any of the downstream analyses.
http://qiime.org/scripts/validate_demultiplexed_fasta.html
    qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/cso_for_hhmi/labeled_rc_seqs$ validate_demultiplexed_fasta.py -i combined_seqs.fna -o /home/qiime/Desktop/Shared_Folder/cso_for_hhmi/validated_fasta -m /home/qiime/Desktop/Shared_Folder/cso_for_hhmi/mapping_files/mod_mapping.txt

    validate_demultiplexed_fasta.py -i combined_seqs.fna -m <mapping_file.txt> -o <output_dir>
A log file will be written to the output directory. Make sure there are no QIIME-incompatible fasta labels.
    # fasta file combined_seqs.fna validation report
    Percent duplicate labels: 0.000
    Percent QIIME-incompatible fasta labels: 0.000
    Percent of labels that f
48. der/cso_for_hhmi/converted_files/truncated_fasta
    qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/cso_for_hhmi/converted_files/truncated_fasta$ mkdir trunc_fasta_only
    qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/cso_for_hhmi/converted_files/truncated_fasta$ cp *.fasta trunc_fasta_only
    qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/cso_for_hhmi/converted_files/truncated_fasta$ cd trunc_fasta_only
    qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/cso_for_hhmi/converted_files/truncated_fasta/trunc_fasta_only$ ls > trunc_fasta.txt
    qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/cso_for_hhmi/converted_files/truncated_fasta/trunc_fasta_only$ gedit trunc_fasta.txt

    cd <truncated_fasta>
    mkdir <trunc_fasta_only>
    cp *.fasta trunc_fasta_only
    cd trunc_fasta_only
    ls > trunc_fasta.txt
    gedit trunc_fasta.txt
Make an output directory for the reverse complements of the truncated sequence files. Implement the loop. Make sure the working directory contains only the truncated fasta files.
    qiime@qiime-VirtualBox:~$ mkdir Desktop/Shared_Folder/cso_for_hhmi/rc_seqs
    qiime@qiime-VirtualBox:~$ cd Desktop/Shared_Folder/cso_for_hhmi/converted_files/truncated_fasta/trunc_fasta_only
    qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/cso_for_hhmi/converted_files/truncated_fasta/trunc_fasta_only$ while read line
    > do time adjust_seq_orientation.py -i $line -o /home/qiime/Desktop/Shared_Folder/cso_for_hhmi/rc_seqs/$line
    > do
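To see what the reverse-complement step does to each read, here is a minimal sketch that uses awk as a stand-in for adjust_seq_orientation.py. It is not the QIIME implementation: it assumes every sequence sits on a single line, it only complements uppercase A, C, G and T (other characters pass through unchanged), and the file name demo.fasta is hypothetical.

```shell
# Stand-in illustration only; demo.fasta is a hypothetical file.
workdir=$(mktemp -d)
printf '>read_1\nACGTTG\n>read_2\nGGGA\n' > "$workdir/demo.fasta"

# Header lines (starting with >) pass through unchanged. Sequence lines are
# walked from the last base to the first, and each base is complemented by
# looking up its position in "ACGT" and taking the same position in "TGCA".
rc=$(awk '
/^>/ { print; next }
{
    out = ""
    for (i = length($0); i >= 1; i--) {
        c = substr($0, i, 1)
        p = index("ACGT", c)
        if (p > 0) c = substr("TGCA", p, 1)
        out = out c
    }
    print out
}' "$workdir/demo.fasta")

echo "$rc"
```

The while loop in the tutorial applies the real script to every file named in the list; this sketch only shows the per-read transformation.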
49. e beta diversity workflow.
    qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/cso_for_hhmi$ filter_otus_from_otu_table.py -i otu_table.biom -o otu_table_min10.biom -n 10

    filter_otus_from_otu_table.py -i <rarefied_otu_table.biom> -o <output_otu_table.biom> -n 10

    qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/cso_for_hhmi$ single_rarefaction.py -i otu_table_min10.biom -d 1000 -o otu_table_min10_even1000.biom

    single_rarefaction.py -i otu_table.biom -o <output_otu_table.biom> -d 1000
When choosing a depth at which to rarefy, it is important to maximize the number of samples kept while avoiding samples with a very low number of sequences. A good rule of thumb is to exclude samples with fewer than 1000 seqs/sample.
c. Summarize Taxa
The OTU table is summarized at each taxonomic level, and either the relative abundance or the absolute abundance of each taxon in each sample is returned. These taxa summaries are useful to detect broad trends in the dataset and to make biplots (see beta diversity, make_3d_plots.py). There is a workflow script for generating plots; however, we have found it easier to manipulate the data in Excel.
http://qiime.org/scripts/summarize_taxa.html
    qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/cso_for_hhmi$ summarize_taxa.py -i otu_table_even1000_min10otus.biom -o summarized_taxa

    summarize_taxa.py -i <clean_otu_table.biom> -o <output_dir>
The default output of this
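Before committing to a rarefaction depth, it can help to list which samples would survive it, since samples with fewer sequences than the chosen depth are dropped. The sketch below does this with awk on a small table of per-sample counts. The sample IDs and counts are hypothetical, and in practice the counts would come from the OTU table summary.

```shell
depth=1000

# Hypothetical per-sample counts: sample ID, then number of sequences.
counts='S36 2
S6 3
S47 47
S12 12409
S20 184870'

# Samples at or above the depth are kept; the rest would be dropped.
kept=$(printf '%s\n' "$counts" | awk -v d="$depth" '$2 >= d { print $1 }' | tr '\n' ' ')
dropped=$(printf '%s\n' "$counts" | awk -v d="$depth" '$2 < d { print $1 }' | tr '\n' ' ')

echo "kept at depth $depth: $kept"
echo "dropped at depth $depth: $dropped"
```

Rerunning with a different value of depth shows the trade-off between keeping samples and keeping sequences per sample.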
50. eagent intercalates double-stranded DNA (dsDNA) and fluoresces only after intercalation. Other methods of DNA quantification rely on UV-Vis spectroscopy to quantify nucleic acids; however, they are much less specific, as dsDNA, RNA, and proteins absorb at overlapping wavelengths. Since the fluorophore fluoresces only after intercalating dsDNA, the DNA concentrations assayed with the Qubit system are much more accurate than with other methods. See the Qubit 2.0 user manual and the Invitrogen website for more information: http://www.invitrogen.com/site/us/en/home/brands/Product-Brand/fluorometer.html
Figure 3. The fluorescent probe (blue) intercalates the dsDNA, allowing for precise measurement of the dsDNA concentration of a sample. For more information, see the Invitrogen website: http://www.invitrogen.com/site/us/en/home/Products-and-Services/Applications/DNA-RNA-Purification-Analysis.html
b. Materials
    Qubit Fluorometer
    Qubit dsDNA HS Buffer
    Qubit reagent (protect from light)
    Qubit Assay tubes
    Standards 1 and 2 (concentrations of 0 ng and 10 ng, respectively)
    DNA extract or PCR product
c. Protocol
[Manufacturer's diagram residue. Legible labels: ensure all reagents are at room temperature; standards from kit; user samples; Qubit reagent (1 x n µl); final volume is 200 µL; vortex all assay tubes for 2-3 seconds; incubate at room temperature for 2 minutes.]
51. eet dna_quants.xlsx for the volume of each sample to add to the pool.
2. Gel Electrophoresis (prep: 90 mins; loading and runtime: 90 mins)
There may be extra bands in the PCR product caused by amplification of chloroplast DNA, primer dimers, or other undesired amplification. You can separate the desired band from the unwanted bands by traditional gel electrophoresis and gel purification.
1. Tape up the gel tray or use a rubber gasket to seal.
2. Place 4 combs into the gel tray at every 3 notches. Do this before you pour the gel.
3. Make a 2% agarose gel:
    a. Weigh out 8 g agarose.
    b. Make 400 ml 1X TBE buffer (80 ml 5X TBE + 320 ml DI H2O).
    c. Microwave the gel mixture to combine the ingredients. Check every 30-40 sec and watch out for boiling over; wear a full glove to protect your hand and forearm in case it does boil over.
4. Cool the gel solution on a stir plate (NO HEAT) until you can touch the glass.
5. Add 4 µl GelStar (SYBR green based) per 100 ml gel to the gel (16 µl) right before you pour.
6. Pour the gel slowly from one corner. Avoid introducing air bubbles as you pour.
7. Let the gel cool for between 1 and 1.5 hours. Remove the tape and place the gel into the gel rig.
8. Make 2500 ml 0.5X TBE running buffer (big rig). Make sure it's enough to cover the gel.
9. Make up 50 ml of loading buffer (keep in refrigerator):
    a. 15 g sucrose
    b. 0.175 g Orange G
    c. bring up to 50 ml with DI H2O
10. Load 4 µl Orange G into each well of a 96-well plate.
11. Load 4 µl PCR product into each we
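The quantities in the gel recipe follow from simple proportions, which is handy if you need a gel of a different volume. The sketch below recomputes them with shell integer arithmetic. The variable names are ours, the 5X stock concentration comes from the recipe, and the running buffer split is plain dilution arithmetic rather than something the protocol states.

```shell
gel_ml=400        # total gel volume from the recipe
agarose_pct=2     # percent w/v, i.e. grams per 100 ml

# 2% w/v of 400 ml
agarose_g=$(( gel_ml * agarose_pct / 100 ))

# 1X TBE from a 5X stock: one fifth stock, the rest water
tbe5x_ml=$(( gel_ml / 5 ))
water_ml=$(( gel_ml - tbe5x_ml ))

# GelStar at 4 ul per 100 ml of gel
gelstar_ul=$(( gel_ml * 4 / 100 ))

# 0.5X running buffer from the same 5X stock: one tenth stock
run_ml=2500
run_tbe5x_ml=$(( run_ml / 10 ))
run_water_ml=$(( run_ml - run_tbe5x_ml ))

echo "agarose: ${agarose_g} g"
echo "gel buffer: ${tbe5x_ml} ml 5X TBE + ${water_ml} ml DI water"
echo "GelStar: ${gelstar_ul} ul"
echo "running buffer: ${run_tbe5x_ml} ml 5X TBE + ${run_water_ml} ml DI water"
```

Changing gel_ml rescales the agarose, buffer, and stain together; integer arithmetic is enough here because the recipe uses round numbers.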
52. en biological and technical replicates
o Describe how replication enables one to make more inferences about the potential differences in bacterial composition between two samples
o Discuss how the secondary structure of the gene is relevant to its evolution and function
o Discuss how Carl Woese discovered the third domain of life
o Describe the utility of multiplexing samples using this approach
o Discuss biases associated with each section of the modules, especially PCR biases
o Discuss the importance of quality checking DNA to be sequenced
o Discuss Illumina sequencing chemistry
Applications in the classroom
Goals
Environmental Genomics Research Course Syllabus
Meeting Time: TBA (two three-hour sessions)
Students will learn and apply, hands on, novel molecular techniques to study microbial communities in the context of an environmental or health-related problem. Students will learn how to generate microbial DNA libraries for high-throughput sequencing and use appropriate informatics and statistical workflows to analyze the sequence data they generate to answer biologically meaningful research questions.
Required Text: Samuelsson, Tore. Genomics and Bioinformatics: An Introduction to Programming Tools for Life Scientists. Cambridge University Press, 2012.
Course Objectives: Students will design and execute a project where they will investigate the microbial community profiles from samples of environmental or health
53. erent methods of correcting the p-value for multiple comparisons. The name of the OTU is listed under Consensus Lineage.
The OTU category lists the OTU number identifier, which corresponds to the OTU name given in the Consensus Lineage column. The Bonferroni and the FDR corrected p-values refer to the p-value after it is corrected for multiple comparisons. Bonferroni is more conservative than FDR (false discovery rate). The r column gives the Pearson correlation of the metadata category to the OTU. In this case, the genus Leadbetterella (name cut off) is correlated strongly (r = 0.825) with conductivity, as is Clostridium bowmanii (r = 0.795, name cut off). Pseudomonas fragi is strongly negatively correlated with conductivity (r = -0.792, name cut off). The category prob lists the probability that the OTU relative abundance is correlated with the category values across samples. The categories otu_values_y and otu_values_x list the data used to calculate the correlation.
b. Compare Categories
This script implements a number of statistical tests to compare distance matrices and is therefore a good way to tell how significant the clustering observed in the PCoA plot is. Below are the descriptions of selected methods.
adonis: Partitions a distance matrix among sources of variation in order to describe the strength and significance that a categorical or continuous variable has in determining variation of distances. This
54. es programs; when they terminate, the shell gives the user another prompt (qiime@qiime-VirtualBox:~/Desktop$ on our systems).
Filename Completion
By typing part of the name of a command, filename, or directory and pressing the Tab key, the shell will complete the rest of the name automatically. If the shell finds more than one name beginning with the letters you have typed, it will pause, prompting you to type a few more letters before pressing the Tab key again.
History
The shell keeps a list of the commands you have typed in. If you need to repeat a command, use the cursor keys to scroll up and down the list, or type history for a list of previous commands.
Files and Processes
Everything in UNIX is either a file or a process. A process is an executing program identified by a unique process identifier. A file is a collection of data. Files are created by users using text editors, running compilers, etc. Examples of files:
• a document (report, essay, etc.)
• the text of a program written in some high-level programming language
• instructions comprehensible directly to the machine and incomprehensible to a casual user, for example, a collection of binary digits (an executable or binary file)
• a directory, containing information about its contents, which may be a mixture of other directories (subdirectories) and ordinary files
It is not required to have a Linux operating system to use QIIME. While a native installation is possible, m
55. f the directory will be listed in a text file called list.
Superuser: sudo <command> (password is qiime)
B. Thinking about your biological question(s)
Take a moment to think about the questions you are investigating. Are you interested in how members of the microbial community are varying with environmental parameters? With a specific treatment or disease state? Are you interested in the presence/absence, relative abundance, or both? Are you interested in rare members of the microbial community? Do you have any specific hypotheses? Take a moment to write these biological questions down in the space below. This will help you understand what statistical approaches you may want to use.
C. INTRODUCTION TO QIIME
High-throughput sequencing generates hundreds of thousands of reads per sample; therefore, bioinformatics tools are necessary to translate the raw reads into meaningful biological information. Bioinformatics pipelines like QIIME enable microbial community analysis in that they support the processing and analysis of large datasets containing millions of reads. QIIME uses reference databases to assign taxonomic information to a string of nucleotides representing the 16S sequences of most of the microbes in a sample. The definition of species in microbiology is somewhat complicated, as the traditional rules for defining species do not apply to microbes. In general, if two organisms have 97% 16S rRNA sequence identity, they are considered to
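As a concrete illustration of what a percent-identity threshold means, the sketch below compares two short sequences position by position and reports the fraction of matching bases. This is a deliberately naive calculation on two made-up, already aligned, equal-length sequences; it is not how QIIME or its clustering tools compute identity, and the variable names are ours.

```shell
# Two hypothetical, pre-aligned sequences of equal length.
seq_a=ACGTACGTAC
seq_b=ACGTACGTAA
threshold=97

# Count matching positions and express them as a percentage.
identity=$(awk -v a="$seq_a" -v b="$seq_b" 'BEGIN {
    n = length(a)
    same = 0
    for (i = 1; i <= n; i++)
        if (substr(a, i, 1) == substr(b, i, 1)) same++
    printf "%.1f", 100 * same / n
}')

# Compare against the threshold (awk handles the decimal comparison).
verdict=$(awk -v id="$identity" -v t="$threshold" 'BEGIN {
    if (id + 0 >= t + 0) print "at or above threshold"
    else print "below threshold"
}')

echo "identity: ${identity}% (${verdict})"
```

With real 16S reads of roughly 250 bases, 97% identity allows only about seven or eight mismatches.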
56. ferences and suggested reading
APPENDIX
A. Helpful links
B. Additional protocols/scripts
    1. Purification by SPRI beads
    2. Splitting libraries: the traditional method
C. Other software
    1. Other ways to analyze metagenomes
    2. Installing R
    3. Proprietary software for data analysis
D. Computing
    1. Troubleshooting error messages
    2. Connecting to Juniata's HHMI cluster
    3. Python Notebook
    4. Bash scripting
E. References
MODULE 1: PREPARATION OF MICROBIAL SAMPLES FOR HIGH-THROUGHPUT SEQUENCING
[Figure caption: After this module you will be able to show your children how to do this, I promise.]
A. Background
The term metagenomics was originally coined by Jo Handelsman in the late 1990s and is currently defined as the application of modern genomics techniques to the study of microbial communities directly in their natural environments. These culture-independent molecular techniques have allowed microbiologists to tap into the vast microbial diversity of our world. Recently, massively parallel, high-throughput sequencing (HTS) has enabled taxonomic profiling of microbial communities to become cost-effective and informative. Initiatives such as the Earth Microbiome Project, the Hospital Microbiome Project, the Human Microbiome Project, and others are consortia tasked with uncovering the distribution of microorganisms within us and our world. Many efforts have focused on probing regions of the ribosomal RNA operon as a method for telling
57. files are in the ITS soils tutorial by going to the its-soils-tutorial directory and using ls to list the contents.
    qiime@qiime-VirtualBox:~/its$ cd its-soils-tutorial
    qiime@qiime-VirtualBox:~/its/its-soils-tutorial$ ls
    map.txt  params.txt  README.md  seqs.fna

    cd /home/qiime/its-soils-tutorial
    ls
The tutorial includes the sequences in fasta format (seqs.fna), a mapping file (map.txt), a parameters file (params.txt), and a readme file (README.md).
2. OTU Picking
In this tutorial we are doing open-reference OTU picking, which is a compromise between rapid closed-reference OTU picking, which excludes a good chunk of sequences, and slow de novo OTU picking, which retains most reads. The included parameters file is a simple text file that you can supply to save typing out the options as script arguments and to specify options in workflow scripts. Many other scripts can take parameters files too, but we will not use them in this workshop. More information can be found in the QIIME documentation: http://qiime.org/documentation/qiime_parameters_files.html
Use nano to view the parameters file:
    nano params.txt
Here the options for OTU picking and beta diversity are specified. The differences between ITS and 16S analyses are in the assign taxonomy and the beta diversity options. We must change the default behavior of QIIME to use the fungal ITS reference database, not Greengenes, and use the B
58. for conductivity is shown below.
NOTE: This script does not work in QIIME 1.6.0. Use QIIME 1.5.0, 1.6-dev, or 1.7.0.
http://qiime.org/scripts/otu_category_significance.html
    qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/cso_for_hhmi$ otu_category_significance.py -i otu_table_min10_even1000.biom -s correlation -m mapping_files/mapping_complete.txt -c CONDUCTIVITY -o pearson_conductivity.txt

    otu_category_significance.py
        -i <clean_otu_table.biom>
        -o <output_filepath.txt>
        -m <mapping_file.txt>
        -s ANOVA or correlation
        -c <metadata_category>    # ANOVA requires categorical data. Pearson requires continuous data (no NAs, etc.)
The output of this script is a tab-delimited file with the OTU, the p-values, and the correlation coefficients. Open it in Excel for easier manipulation.
    OTU       prob      oticat  Bonferroni  FDR_corr  r        Consensus Lineage
    913174    1.10E-08  0.27    6.29E-06    6.29E-06  0.8254   k__Bacteria; p__Bacteroidetes; c__Cytophagia; o__Cytophagales; f__Cytophagaceae; g__Leadb
    4477999   9.14E-08  0.27    5.21E-05    2.61E-05  0.7951   k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Clostridiaceae; g__Clostridium
    327694    1.09E-07  0.27    6.21E-05    2.07E-05  -0.792   k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pseudomonadales; f__Pseudom
Figure 17. Computing Pearson correlations results in this sample output. The correlation coefficient is given by r, while the Bonferroni and FDR corrected p-values offer two diff
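The two corrected p-value columns can be approximated by hand, which makes it easier to see why Bonferroni is the more conservative of the two. The sketch below takes the three raw p-values from the sample output and applies a Bonferroni correction (p times the number of tests) and a simplified false discovery rate adjustment (p times the number of tests, divided by the rank of the p-value), without the monotonicity step some implementations add. The number of tests, 570, is a hypothetical value chosen so the results land close to the figure; it is not stated anywhere in the tutorial, and the p-values are assumed to be sorted in ascending order.

```shell
n_tests=570   # hypothetical; the tutorial does not state the number of OTUs tested

# Raw p-values from the sample output, already sorted ascending.
pvals='1.10e-08
9.14e-08
1.09e-07'

# For each p-value: rank, Bonferroni-corrected value, simplified FDR value.
corrected=$(printf '%s\n' "$pvals" | awk -v n="$n_tests" '{
    bonf = $1 * n
    if (bonf > 1) bonf = 1
    fdr = $1 * n / NR
    if (fdr > 1) fdr = 1
    printf "%d %.2e %.2e\n", NR, bonf, fdr
}')

echo "rank bonferroni fdr"
echo "$corrected"
```

For the top-ranked OTU the two corrections coincide; further down the list the FDR value is smaller because it is divided by the rank.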
59. for chimeras before proceeding with downstream analyses.
Open-reference OTU picking is a compromise between closed-reference and de novo OTU picking. Initially, the reads are compared to the reference database, and those that fail to match a reference sequence with at least 70% identity are retained. The reads that didn't hit the database are clustered de novo, and taxonomies are assigned. Open-reference OTU picking is slower than closed-reference, but reads that may represent novel OTUs are not discarded. The QIIME developers recommend open-reference OTU picking.
http://qiime.org/scripts/pick_closed_reference_otus.html
    qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/erin_cso$ pick_closed_reference_otus.py -i combined_seqs.fna -a -O 15 -o uclust_closed_otus -r /home/qiime/qiime_software/gg_13_5_otus/rep_set/97_otus.fasta -t /home/qiime/qiime_software/gg_13_5_otus/taxonomy/97_otu_taxonomy.txt

    pick_closed_reference_otus.py
        -i <input_seqs.fna>    # The sequences must be in fasta format
        -o <output_dir>
        -r <rep_set>           # Use the representative set from the latest version of Greengenes. We typically use 97%, as OTUs with 97% identity very roughly correspond to the same species
        -t <taxonomy_map>      # Use the taxonomy from the latest version of Greengenes
        -a                     # run in parallel
        -O <num_jobs_to_start>
7. Initial Analyses
a. OTU Table Statistics
This script summarizes the OTU table information, which allows you to find out basic
60. fter PCR, we use the high-throughput E-gel system by Life Technologies to save time in the classroom. The gels are bufferless, precast, and only take 12 minutes to run. More information on the E-gel system can be found at http://www.invitrogen.com/site/us/en/home/Products-and-Services/Applications/DNA-RNA-Purification-Analysis/Nucleic-Acid-Gel-Electrophoresis/E-Gel-Electrophoresis-System.html
5. PCR reagents: the Qiagen HotStar kit is what we use for the PCR reagents in this module.
6. Primers used in this study were ordered from IDT. These primers were ordered in a 96-well format and were normalized to 3 nanomoles. The approximate cost is 28 cents per base pair, so each plate of primers costs roughly 2,000 USD. Call your regional IDT rep and they will give you a sizeable discount. Luckily, we have tons of bacterial and fungal primers, which we can send aliquots of directly to you if needed. A list of the primer constructs used in this module can be found in the Appendix.
7. Gel visualization apparatus: any type of UV box with a camera adaptor will work.
8. Bioanalyzer 2100 and Expert software. More information on the Bioanalyzer is available at https://www.genomics.agilent.com/article.jsp?pageId=275. If you don't have a Bioanalyzer, any sequencing facility can quality check your libraries for you for a small additional cost (roughly $100-150/chip).
Table 1. L
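The plate price quoted above can be checked with a line of arithmetic. The sketch below multiplies primer length by the per-base cost and by the number of wells. The 68-base primer length is an assumption taken from the reagent table elsewhere in this manual, not from the paragraph above, and the variable names are ours.

```shell
primer_len=68          # bases per primer; assumed from the manual's reagent table
cost_per_base=0.28     # USD per base
primers_per_plate=96

plate_cost=$(awk -v l="$primer_len" -v c="$cost_per_base" -v n="$primers_per_plate" \
    'BEGIN { printf "%.2f", l * c * n }')

echo "estimated cost per plate: $plate_cost USD"
```

The result sits a little under the "roughly 2,000 USD" figure, which is consistent with the text treating it as an approximation.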
61. g
f. Incubate at -80 °C for approximately 15 minutes.
g. Centrifuge at maximum speed at 4 °C for 20 minutes.
h. Remove the supernatant.
i. Wash the pellet with 1 volume of 70% ethanol. Note: 70% ethanol is hygroscopic. Fresh 70% ethanol should be prepared for optimal results.
j. Centrifuge again at maximum speed at 4 °C for 20 minutes.
k. Remove the supernatant.
l. Air dry the pellet for 3-4 minutes. Note: tubes can be placed in a water bath at 37 °C for 5 minutes.
m. Resuspend the pellet in 25 µl TE buffer (25 µl of 2 µg for half plate).
Concentration Option B
The pooled samples will need to be concentrated into 25 µl in order for the sequencing libraries to be created, using Millipore Centrifugal Filter Units (Amicon Ultra 0.5 ml, 30K membrane). Pool samples from a half plate together before concentration.
1. Add 40 µl of TE/water to the filter and spin the filter at 14,000 x g for 5 minutes.
    a. This is a pre-wet step to get better yields.
2. Add the sample to the filter and centrifuge at 14,000 x g at room temperature for 8 minutes.
    a. Time varies depending on the starting volume.
3. The final volume left in the filter unit should be about 25 µl.
4. Flip the filter unit and re-centrifuge at 1000 x g for 2 minutes.
2. Splitting Libraries: The traditional method
Often, MiSeq sequencing platforms demultiplex the data with the files in fastq format, which does not fit well into the QIIME workflow. Typically, splitting the libraries is done
62. he environment is. If the OTU richness increases as more OTUs are being sampled, then you probably should rarefy to a deeper sampling depth. In this case, species richness drops off as sampling depth increases. The HTML output file includes all of the metadata categories and metrics used, so it is easy to look at alpha diversity as calculated by multiple metrics and colored by multiple metadata variables. The script compare_alpha_diversity.py will allow you to determine if differences in alpha diversity values are significant.
b. Beta Diversity
QIIME offers a workflow script for computing beta diversity, or each step can be performed individually. The current workflow script uses KiNG to visualize the 3D PCoA plots; however, splitting up the workflow into individual scripts allows you to use Emperor, a program specifically written by the Knight Lab to visualize PCoA plots. The most computationally intensive part of beta diversity calculations is computing the distance matrix of pairwise dissimilarity values between samples, so be sure to run this script in parallel if possible. There are a number of different beta diversity metrics supported in QIIME; however, we have found the UniFrac distance metrics to be most useful and informative. UniFrac takes phylogenetic relatedness into account in computing beta diversity. Unweighted UniFrac regards only the presence or absence of taxa, whereas weighted UniFrac uses taxon relative abundance as well.
Workflow scri
63. horesis (15-30 mins for loading, 15 mins for runtime)
E-Gels can be used to check the PCR product instead of traditional gel electrophoresis. We will only combine the successfully amplified samples for sequencing. We can also detect the presence of additional bands in the reactions, which may signal amplification of chloroplast DNA or excessive primer dimer bands. If these bands are present, we will need to purify the desired band from a traditional agarose gel. The gels come pre-stained with EtBr or SYBR and are encased in plastic. Buffer is already included and the gels run rapidly (10 min). The E-gel electrophoresis unit and a specific ladder are required.
1. Combine 16 µl diH2O and 4 µl PCR product.
2. Remove the comb from the gel.
3. Load into the E-gel wells. Load 20 µl diH2O into empty wells.
4. Load 10 µl low range ladder.
5. Depending on the gel base, choose the proper program (if available) and begin electrophoresis.
6. Run for the specified amount of time.
7. Remove the E-gel and image it in a UV box. 16S rRNA gene products will be roughly 300-350 bp. ITS amplification products will be roughly 1000-1100 bp.
8. Record successful reactions on the E-gel layout spreadsheet.
3. DNA Quantification with the Qubit fluorometer (sample dependent; about 90 mins for 96 samples)
a. Introduction
The Qubit system was designed to specifically quantify nucleic acids and proteins using small quantities of PCR product. The fluorescent probe (Qubit r
64. ia.org/wiki/Bash_%28Unix_shell%29
E. References
1. Caporaso, J. G. et al. Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proc Natl Acad Sci U S A 108 Suppl 1, 4516-4522 (2011).
2. Gardes, M. & Bruns, T. D. ITS primers with enhanced specificity for basidiomycetes: application to the identification of mycorrhizae and rusts. Mol Ecol 2, 113-118 (1993).
3. McGuire, K. L. et al. Digging the New York City Skyline: Soil Fungal Communities in Green Roofs and City Parks. PLoS One 8, e58020 (2013).
4. Qubit 2.0 User Manual: http://www.invitrogen.com/etc/medialib/en/filelibrary/cell_tissue_analysis/Qubit-all-file-types.Par.0519.File.dat/Qubit-2-Fluorometer-User-Manual.pdf
5. Qubit dsDNA HS Manual: http://probes.invitrogen.com/media/pis/mp32851.pdf
6. Bioanalyzer DNA Assay 1000 Manual: http://www.chem.agilent.com/library/usermanuals/Public/G2938-90014_KitGuideDNA1000Assay_ebook.pdf
7. Bioanalyzer DNA Assay 1000 Quick Protocol: http://www.chem.agilent.com/library/usermanuals/Public/G2938-90015_QuickDNA1000.pdf
65. information about the OTU table to verify that OTU picking occurred as expected.
http://biom-format.org/documentation/summarizing_biom_tables.html
    qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/cso_for_hhmi$ print_biom_table_summary.py -i otu_table.biom
    Num samples: 46
    Num observations: 19495
    Total count: 1529520.0
    Table density (fraction of non-zero values): 0.1772
    Table md5 (unzipped): 1b69e8f8b01cf532e4e8335bdb67d0d5
    Counts/sample summary:
     Min: 2.0
     Max: 184870.0
     Median: 12409.5
     Mean: 33250.4347826
     Std. dev.: 46267.857757
     Sample Metadata Categories: None provided
     Observation Metadata Categories: taxonomy
    Counts/sample detail:
     36: 2.0
     S6: 3.0
     S47: 47.0

    print_biom_table_summary.py
        -i otu_table.biom
        --num_observations    # Optional; returns the number of unique OTUs in each sample
The results of this script are written to the terminal. We can see that there are 46 samples in the OTU table. We have 19,495 OTUs (Num observations) and a total of 1,529,520 reads. In Counts/sample detail, the number of sequences in each sample is listed in ascending order.
b. Clean the OTU Table
If it suits your study, it is a good idea to remove OTUs with fewer than 10 observations. The OTU table should also be rarefied prior to doing any further analysis, as even sequencing depth is required to compare samples. The final OTU table should be used in all downstream analyses except in alpha rarefaction and when using th
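A quick consistency check on the summary is to confirm that the reported mean equals the total count divided by the number of samples. The sketch below does that division with awk, using the two values printed in the summary; the variable names are ours.

```shell
# Values copied from the table summary.
total_count=1529520
num_samples=46

mean=$(awk -v t="$total_count" -v n="$num_samples" 'BEGIN { printf "%.2f", t / n }')

echo "mean reads per sample: $mean"
```

The result agrees with the Mean line of the summary. Note how far the mean sits above the median (12409.5), which reflects a few very deeply sequenced samples.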
66. ing Python with Python Notebook
Quick Installation Guide (QIIME & Python Notebook); Illumina Overview Python Notebook Tutorial; Fungal ITS Python Notebook Tutorial; Integrating QIIME and IPython Notebook.
4. Bash Scripting
Bash scripting can be used to automate repetitive scripts or tasks that require lots of manual file management. While you may be tempted to physically bash the computer, correctly implemented bash scripts can save a lot of time and reduce human error. The typical bash scripts we use with QIIME are loops to automate such tasks as file conversion or file truncation. Several bash scripts are used in this tutorial, and they are based on reading a file line by line and performing some task while there are still lines to be read in the file.
Make a list of the files you want to do something to by using ls and writing the list to a file:
    ls > list.txt
The general architecture of a while loop is below. While there is a line to read, do the script, and finish once all the lines of the list have been read. The time command is there to time each run; it is not required, but it may be helpful for estimating future runtimes and troubleshooting memory-intensive scripts.
    while read line
    do time <copy and paste your script here, using $line as the input>
    done < list.txt
Beginner's Guide to Bash Scripting: http://www.tldp.org/LDP/Bash-Beginners-Guide/html/
Short History and Overview of Bash: https://en.wikiped
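The list-then-loop pattern can be tried safely on throwaway files before pointing it at real data. The sketch below builds two small files in a temporary directory, writes their names to a list, and loops over the list. Here wc -l stands in for a QIIME script, and all file names are hypothetical.

```shell
# Throwaway working directory with two hypothetical input files.
workdir=$(mktemp -d)
printf 'ACGT\nGGCC\n' > "$workdir/sample_a.fasta"
printf 'TTAA\n' > "$workdir/sample_b.fasta"

# Write the list of files to process. Listing *.fasta keeps list.txt itself
# out of the list, which a plain "ls > list.txt" would not.
(cd "$workdir" && ls *.fasta) > "$workdir/list.txt"

# Read the list one line at a time; $line holds the current file name.
# wc -l is only a placeholder for the real per-file command.
while IFS= read -r line
do
    count=$(wc -l < "$workdir/$line")
    echo "$line $((count))"
done < "$workdir/list.txt" > "$workdir/report.txt"

cat "$workdir/report.txt"
```

When the list is made with a plain ls, the list file and any other stray files end up in it, which is why the tutorial opens the list in gedit to remove unwanted entries before looping.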
67. is a nonparametric method and is nearly equivalent to db-RDA (see below), except when distance matrices constructed with semi-metric or non-metric dissimilarities are provided, which may result in negative eigenvalues. adonis is very similar to PERMANOVA, though it is more robust in that it can accept either categorical or continuous variables in the metadata mapping file, while PERMANOVA can only accept categorical variables. See vegan::adonis for more details.
ANOSIM: Tests whether two or more groups of samples are significantly different based on a categorical variable found in the metadata mapping file. You can specify a category in the metadata mapping file to separate samples into groups and then test whether there are significant differences between those groups. For example, you might test whether Control samples are significantly different from Fast samples. Since ANOSIM is nonparametric, significance is determined through permutations. See vegan::anosim for more details.
PERMANOVA: This method is very similar to adonis, except that it only accepts a categorical variable in the metadata mapping file. It uses an ANOVA experimental design and returns a pseudo-F value and a p-value. Since PERMANOVA is nonparametric, significance is determined through permutations.
http://qiime.org/scripts/compare_categories.html
Since we have mostly continuous data, the script will be shown with adonis over the metadata variable conductivity. With th
68. is script, we are determining whether or not there is a statistically significant difference between the unweighted UniFrac distances across a conductivity spectrum.
    qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/cso_for_hhmi$ compare_categories.py --method adonis -i beta_div/dms/unweighted_unifrac_otu_table_min10_even1000.txt -m mapping_files/mapping_complete.txt -c CONDUCTIVITY -o adonis_conductivity -n 999

    compare_categories.py
        --method adonis OR anosim OR permanova
        -i <input_distance_matrix.txt>    # Use the distance matrix computed for beta diversity
        -m <mapping_file.txt>
        -c <metadata_category>
        -o <output_dir>
        -n <num_permutations>             # Optional; the default number of permutations is 999
The output of this script is a text file with the adonis results. The formatting is off in the tab-delimited format, so refer to the compare categories tutorial to see how to break up the lines.
http://qiime.org/tutorials/category_comparison.html
    Call:
    adonis(formula = as.dist(qiime.data$distmat) ~ qiime.data$map[[opts$category]], permutations = opts$num_permutations)

    Terms added sequentially (first to last)

                                     Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)
    qiime.data$map[[opts$category]]   1    0.5654 0.56542  3.2057 0.09954  0.001 ***
    Residuals                        29    5.1151 0.17638         0.90046
    Total                            30    5.6805                 1.00000
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value of 0.001 (see Pr(>F)) indicates that at an alpha of
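Two columns of the adonis table can be recomputed from the others, which helps when reading the output. R2 is the share of the total sum of squares explained by the variable, and F.Model is the ratio of the model mean square to the residual mean square. The sketch below redoes both divisions with awk, using the rounded values printed in the table, so the last digit may differ slightly from what adonis reports. The variable names are ours.

```shell
# Rounded values copied from the adonis table.
ss_model=0.5654
ss_total=5.6805
ms_model=0.56542
ms_resid=0.17638

# R2: fraction of total variation in distances explained by conductivity.
r2=$(awk -v a="$ss_model" -v b="$ss_total" 'BEGIN { printf "%.4f", a / b }')

# Pseudo-F: model mean square over residual mean square.
f_model=$(awk -v a="$ms_model" -v b="$ms_resid" 'BEGIN { printf "%.4f", a / b }')

echo "R2 = $r2"
echo "F.Model = $f_model"
```

An R2 near 0.10 means conductivity accounts for roughly 10% of the variation in the distance matrix, even though the effect is statistically significant.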
69. is used for determining where to truncate the reads to remove regions of poor quality. We can verify this by looking at the text file output. The header of the text file, which is written to the same directory as the quality plot, suggests that the reads in this sample should be truncated to 244 nt. We have 250 bp reads, so losing only 6 bp is very good.
    # Suggested nucleotide truncation position (None if quality score average did not drop below the score minimum threshold): 244
    (more raw data)
d. Truncate the files
After reviewing all of the data, we can pick a position at which to truncate the reads. We want to maximize the quality of the reads, so let's truncate at position 220 to preserve as many quality sequences as possible. We can use a bash script to automate this process, as we have done above. This script takes approximately 25 minutes to complete.
http://qiime.org/scripts/truncate_fasta_qual_files.html
Move to the directory where the converted fasta and qual files are contained. As before, write a list of the contents to a text file. Use gedit to remove any unwanted files. Save and close to return to the command prompt.
    cd converted_files
    ls > conv_files.txt
    gedit conv_files.txt
Implement the loop, which will read in the fna and qual files from the list and truncate each file at nucleotide 220. The output will be written to the directory specified with -o.
    qiime@qiime-VirtualBox:~/Desktop/Shared_Folder/cso_fo
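To make the effect of truncation concrete, the sketch below cuts every sequence in a small FASTA file at a fixed position using awk. It is only a stand-in for truncate_fasta_qual_files.py, which also truncates the matching quality scores. It assumes one sequence per line, uses a truncation position of 4 instead of 220 so the result is easy to read, and the file name demo.fna is hypothetical.

```shell
trunc_pos=4   # the tutorial uses 220; a small value keeps the demo readable

# Hypothetical input file with one sequence per line.
workdir=$(mktemp -d)
printf '>read_1\nACGTACGT\n>read_2\nGGC\n' > "$workdir/demo.fna"

# Header lines pass through; sequence lines keep only the first trunc_pos bases.
# Reads shorter than trunc_pos are left as they are.
truncated=$(awk -v n="$trunc_pos" '
/^>/ { print; next }
{ print substr($0, 1, n) }' "$workdir/demo.fna")

echo "$truncated"
```

Choosing 220 rather than the suggested 244 trades a little read length for a safety margin across all samples, since the suggested position differs from file to file.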
70. ist of reagents used in this module (price 2013)
    Catalog no.   Item                                              Price
    50535         GelStar Gel Stain, 10,000X, 2 x 250 µL            161.00
    202602        HotStar HiFidelity Polymerase Kit (100)           120.00
    28604         MinElute Gel Extraction Kit (50)                  117.00
                  Primers: each primer is 68 bp x 28 cents/base x 96
    G 008 02      2% E-Gel 96 Agarose (Life Technologies)           216.00
    12373031      E-gel low range ladder (Life Technologies)        82.00
    5067-1504     Agilent DNA 1000 Kit (25 chips)                   739.00
Protocols
Some of the protocols have been adapted from the Earth Microbiome Project. For further information, please visit http://www.earthmicrobiome.org
A. Library Preparation
1. 16S rRNA gene Illumina tag (itag) PCR (set-up time: 2 hours; runtime: 3 hours)
Illumina tag PCR amplification accomplishes two steps in one reaction. The desired region(s) of the 16S rRNA gene is amplified, which is typically required to obtain enough DNA for sequencing. By modifying the primer constructs to include the Illumina adapters and a unique barcode, the amplified region of the 16S rRNA gene can be identified in a pooled sample, and the sample is prepared for sequencing with the Illumina MiSeq platform.
[Figure residue: diagram of the target gene, showing both strands flanking the amplicon.]
Amplification primers w
71. ith annealing sites stars GINGCACO RAAT AGAG Rin GT GCCAGOMGCCGCGGTAA M crease tore yoreintnin oie A a E x ie amplicon ATTAGAWACCCBDGTAGTCC ATACAGGTGAGCACCTTGTA e TAATCTWTGGGVHCATCAGG 6 Ga TGACTGATTGCGTGCGATCTAGAGCATACGGCAGAAGACGAAC gt Rev primer Rev Linker Rev Pad RC of RC of strand 3 Forward PCR primer construct barcode lumina Adapter strand 5 Illumina Adapter For Pad For Linker Reverse PCR primer construct 5 7 Forward primer AATGATACGGCGACCACCGAGACGTACGTACGGT GTGCCAGCMGCCGCGGTAA 3 T GAAGGIGAATTIACILGIGAA CACGGTCGKCGGCGCCATT r nee e e E E E eE A PES amplicon TAATCTWIGGGVHCATCAGG TATGTCCACTCGTGGAACAT Amplification products AATGATACGGCGACCACCGAGACG TACGTACGGTGTGCCAGCMGCCGCGGTAA 20 ccc cece eee eee ence amplicon ATTAGAWACCCBDGTAGTCCGGGTACGTACGTAACGCACGCTAGATCTCGTATGCCGTCTTCTGCTTG TAATCTWTGGGVHCATCAGG CCCATGCATGCATTGCGTGCGATCTAGAGCATACGGCAGAAGACGAAC s r anieemeererntemescememacieementerntsmerssememesinrmssinecemesssiermesinrmsmertsmrsrssicrmenitrmsmertsmeseeitemesinrnssmertememssi nmestnrtsmertsmemasinrmeninrtsmesssiermesiermemertemerssiermasinrmsmerremrrasinomesintnrtmrrsemermesinrmasinetsmeswsememasicnmssinttemerssicrmasitrmsmerism csseisemesisrmamessemerssitrmasinitsacrtememaesitrmenintntsmrssememesinrmssinrtsmeiesiermesincmsmertsmrsracions Sequencing primers with annealing sites AATGATACGGCGACCACCGAGACGTACGTACGGT GTGCCAGOMGCCGCGGTAA 2 1 6 eee ee eee teen eee amplicon ATTAGAWACCCBDGT
72. [Screenshot: terminal session using cd and ls to move through /home, /home/qiime, and Desktop.]

Figure 7. Several components of the Ubuntu graphical user interface (GUI). The dash home contains all of the installed programs. The system monitor allows you to check resource use. The terminal is the means by which you interact with the computer through the command line.

Understanding the file structure and knowing how to use some basic Linux commands are essential for using QIIME effectively. Below is a simplified version of the file structure in our distribution of Linux.

Figure 8. A simplified Linux file structure. Directories are in solid boxes, while files are denoted by dashed boxes. The root directory is shown as /. The subdirectories within qiime are qiime_software and Desktop.

Question: What subdirectories are within root?

The file structure is important when we use the command line, since we need to tell the shell where to find certain files or where to write the results, for example. The full path to qiime is /home/qiime, where the first forward slash indicates root.

Question: What is the full path for Shared_Folder?

The shell commands we use
73. l_box.html

[Screenshot: VirtualBox download page listing the VirtualBox 4.2.12 platform packages for Windows, OS X, Linux, and Solaris hosts, and the VirtualBox 4.2.12 Oracle VM VirtualBox Extension Pack.]

Create a new virtual machine (VM). Launch the VB manager and click New in the upper left-hand corner. Give your new VM an informative name, change the operating system to Linux, and set the version to Ubuntu (64 bit). Click Next to continue.

[Screenshot: Create New Virtual Machine dialog, VM Name and OS Type.]

Select the memory al
74. late onto the magnetic plate for 2 minutes to separate the beads from the solution. Note: wait for the solution to clear before proceeding to the next step. At this point there will be a brown ring around the side of the tube. These are the SPRI beads containing the PCR product.
f. Remove the supernatant and discard it. When removing the supernatant, place the pipette tip at the bottom of the well, making sure not to disturb the two bead rings. Note: this step must be performed while the tubes are on the magnetic plate. Do not disturb the ring of separated magnetic beads. If beads are drawn out, leave a few microlitres of supernatant behind.
g. Add 120 µl of 80% ethanol to the 0.2 ml tube on the plate and incubate for 1 minute at room temperature. Pipette off the ethanol and discard it. Repeat for a total of 2 washes. Note: it is important to perform these steps with the tubes on the magnetic plate. Do not disturb the separated magnetic beads. Be sure to remove all of the ethanol from the bottom of the tube, as ethanol is a known PCR inhibitor.
h. Let the excess alcohol evaporate (< 5 minutes). Note: take care not to over-dry the bead ring (an over-dried ring appears cracked), as this will significantly decrease elution efficiency.
i. Remove the tubes from the plate. Add 40 µl of elution buffer (1x TE) and mix by pipetting. This will separate the beads from the PCR product. Note: the liquid level needs to be high enough to contact the magnetic beads. A greater
75. ll of the same 96 well plate. DO NOT MIX.
12. Remove the combs gently from the gel.
13. Load 5 µl of ladder (AlphaQuant 4) into the first and last wells of each row.
14. Load all 8 µl of PCR product and Orange G into the wells. Use a multichannel pipetter. See the accompanying spreadsheet for the loading scheme. You may rinse the tips in tank buffer and change them every few times.
15. Run the gel for about 1.5 to 2 hrs at 60-80 V.

3. QIAquick Gel Purification (2 hours)

Excise the DNA fragments from the agarose gel with a clean, sharp scalpel. Minimize the size of the gel slice by removing extra agarose. Minimize light exposure and manipulation of the gel, as this can denature the DNA. ALWAYS wear safety glasses and keep the cover on the gel when viewing it on the light.
Weigh the gel slice in a colorless tube. Add 3 volumes of Buffer QG to 1 volume of gel. For example, a 100 mg gel slice would require 300 µL of Buffer QG. The maximum amount of gel slice per QIAquick column is 400 mg. For a gel slice > 400 mg, use more than one QIAquick column.
Incubate at 50 °C for 10 minutes or until the gel slice is completely dissolved. You can vortex to help dissolve the gel mixture.
After the gel slice is completely dissolved, check that the color of the mixture is yellow. If it is orange or violet, add 10 µL of 3 M sodium acetate (pH 5) and mix. This should bring the mixture back to yellow.
Add 1 gel volume of isopropanol or 200 pro
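The Buffer QG arithmetic in the gel purification step can be sketched in Python as follows. This is only an illustration of the numbers given in the protocol (3 volumes of buffer per volume of gel, at most 400 mg of gel per column); the function names are hypothetical, and the sketch assumes the usual convention that 1 mg of gel corresponds to about 1 µL.

```python
import math


def buffer_qg_volume_ul(gel_mg, volumes=3):
    """Buffer QG volume in µL, assuming 1 mg of gel is about 1 µL."""
    return gel_mg * volumes


def columns_needed(gel_mg, max_mg_per_column=400):
    """Number of QIAquick columns needed for a gel slice of the given mass."""
    return math.ceil(gel_mg / max_mg_per_column)


# The worked example from the protocol: a 100 mg slice needs 300 µL of Buffer QG.
print(buffer_qg_volume_ul(100))  # 300
# A 650 mg slice exceeds the 400 mg limit, so it must be split over 2 columns.
print(columns_needed(650))  # 2
```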
76. location to the VM. Select at least 3 GB, but no more than indicated by the green region of the slider bar.

Load the QIIME VDI as the hard disk for the VM. Click on the folder (red arrow) to browse. Choose the unzipped VDI that you want to install.

[Screenshot: Create New Virtual Machine dialog, Virtual Hard Disk, with "Use existing hard disk" selected and the QIIME 1.7.0 amd64 VDI (Normal, 200.00 GB) chosen.]

Change the VM settings. Click on the VM in the left panel so that it is highlighted. Click Settings in the upper left-hand corner. Go to the System section and the Processor tab. Enable as many cores as you would like the VM to use. This must be done before QIIME can run in parallel.

[Screenshot: VM Settings window, System section.]

Install Guest
77. [Figure: Qubit working solution is combined with 1-20 µL of standard or sample (180-199 µL of working solution) for a final volume of 200 µL; n = number of standards plus number of samples.]

Figure 4. Manufacturer's diagram of the Qubit protocol. See the Qubit 2.0 manual for the high sensitivity dsDNA assay for more information.

1. Prepare the working buffer:
   Qubit dsDNA HS Buffer: (number of samples + 3) x 199 µl
   Qubit reagent (fluorophore): (number of samples + 3) x 1 µl
   Note: the extra 3 samples allow for 2 standards and for pipetting error.
2. Vortex the working buffer to mix.
3. Label the Qubit Assay tubes with the sample ID.
4. For each sample, add 2 µl of PCR product to 198 µl of working buffer in the appropriate tube.
5. For each of the two standards, add 10 µl of standard to 190 µl of working buffer in the appropriate tube.
6. Vortex each sample for 2-3 seconds to mix.
7. Incubate for 2 minutes at room temperature.
8. On the Qubit fluorometer, hit DNA, then dsDNA High Sensitivity, then YES.
9. When directed, insert standard 1, close the lid, and hit Read.
10. Repeat step 9 for standard 2. This produces your two-point standard calibration.
11. Read each sample by inserting the tube into the fluorometer, closing the lid, and hitting Read Next Sample.
12. Use the spreadsheet dna_quants.xlsx to record the data.

D. Quality Check Libraries

1. Pool 240 ng DNA per sample into one collection tube (one hour). See the column "Volume to pool" in the spreadsh
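The working buffer and pooling arithmetic above can be sketched in Python. This is a rough illustration rather than the contents of the course spreadsheet: the function names are hypothetical, and the pooling calculation assumes that the recorded concentration is the stock concentration of each library in ng/µl.

```python
def qubit_working_solution(n_samples, extra=3):
    """Volumes (µl) of buffer and reagent for n_samples plus `extra` tubes.

    The 3 extra tubes cover the 2 standards and pipetting error.
    """
    n = n_samples + extra
    return {"buffer_ul": 199 * n, "reagent_ul": 1 * n, "total_ul": 200 * n}


def volume_to_pool_ul(conc_ng_per_ul, target_ng=240):
    """Volume of a library needed to contribute target_ng of DNA to the pool."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_ng / conc_ng_per_ul


# 21 samples -> 24 tubes' worth of working solution.
print(qubit_working_solution(21))
# A (made-up) library at 12 ng/µl needs 20 µl to contribute 240 ng.
print(volume_to_pool_ul(12.0))  # 20.0
```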
78. nal due to the pairwise comparisons (e.g., B3 and C2 are identical, since both compare S9 and S8).

         A    B        C         D         E         F         G         H         I         J         K
    1         S9       S8        S3        S2        S1        S5        S4        19        18        39
    2    S9   0        0.417418  0.418494  0.505656  0.436287  0.472299  0.476023  0.393154  0.454499  0.405428
    3    S8   0.4174   0         0.433982  0.443292  0.470943  0.461472  0.468011  0.439557  0.443821  0.510988
    4    S3   0.4185   0.433982  0         0.435458  0.43856   0.463217  0.407056  0.446045  0.51256   0.50475
    5    S2   0.5057   0.443292  0.435458  0         0.444524  0.441994  0.482498  0.497162  0.506902  0.565564
    6    S1   0.4363   0.470943  0.43856   0.444524  0         0.396904  0.482262  0.496     0.465256  0.532358
    7    S5   0.4723   0.461472  0.463217  0.441994  0.396904  0         0.411945  0.443405  0.466943  0.531708
    8    S4   0.476    0.468011  0.407056  0.482498  0.482262  0.411945  0         0.442192  0.531609  0.525095

Figure 13. Sample unweighted Unifrac distance matrix.

ii. Compute principal coordinates matrices

Do this for both the unweighted and weighted distance matrices. http://qiime.org/scripts/principal_coordinates.html

    mkdir beta_div_pc
    principal_coordinates.py -i beta_div_dms/unweighted_unifrac_otu_table_min10_even1000.txt -o beta_div_pc/unweighted_pc.txt
    principal_coordinates.py -i beta_div_dms/weighted_unifrac_otu_table_min10_even1000.txt -o beta_div_pc/weighted_pc.txt

principal_coordinates.py
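The symmetry of the distance matrix can be checked with a few lines of Python. The sketch below is not a QIIME script; it uses the first three samples from Figure 13 (S9, S8, S3) and two hypothetical helper functions to confirm that the diagonal is zero and that each pair of samples has the same distance in both directions.

```python
samples = ["S9", "S8", "S3"]
distances = [
    [0.0, 0.417418, 0.418494],
    [0.417418, 0.0, 0.433982],
    [0.418494, 0.433982, 0.0],
]


def is_valid_distance_matrix(matrix, tol=1e-9):
    """True if the matrix is square, has a zero diagonal, and is symmetric."""
    n = len(matrix)
    for i in range(n):
        if len(matrix[i]) != n or abs(matrix[i][i]) > tol:
            return False
        for j in range(i + 1, n):
            if abs(matrix[i][j] - matrix[j][i]) > tol:
                return False
    return True


def distance(a, b):
    """Look up the distance between two samples by name."""
    return distances[samples.index(a)][samples.index(b)]


print(is_valid_distance_matrix(distances))  # True
print(distance("S9", "S8") == distance("S8", "S9"))  # True
```

Because of this symmetry, only the values above (or below) the diagonal carry independent information.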
79. ne < trunc_fasta.txt

    mkdir rc_seqs
    while read line; do time adjust_seq_orientation.py -i $line -o /home/qiime/Desktop/Shared_Folder/cso_for_hhmi/rc_seqs/$line; done < trunc_fasta.txt

Remove the unneeded list.

    rm trunc_fasta.txt

f. Add QIIME labels

This script allows you to specify the QIIME-compatible label that belongs to each file. Since we can't use split libraries as described in the documentation, we have to add the labels manually. This script requires the fasta files and a modified mapping file with the desired QIIME label and the exact name of the corresponding file. http://qiime.org/scripts/add_qiime_labels.html

Modify the mapping file so that the file name is correct. The FILE_NAME must match the file to be labeled. The SampleID can be anything within the formatting guidelines, but it is often helpful to keep it as simple as possible.

    SampleID   FILE_NAME
               10 29 21 43 1 16 rep1_S65_L001_R2_001_filtered.fasta
               10 29 21 43 1 16 rep2_S66_L001_R2_001_filtered.fasta
               10 29 21 43 2 8 rep1_S41_L001_R2_001_filtered.fasta
               10 29 21 43 2 8 rep2_S42_L001_R2_001_filtered.fasta

Add the QIIME labels. The script requires a directory as input, so we don't have to use a bash script here to add the labels to every sequence.

    add_qiime_labels.py -m mapping_files/mod_mapping.txt -i rc_seqs -c FILE_NAME -n 1000000 -o labeled_rc_seqs a
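Because the FILE_NAME column must match the fasta file names exactly, it can help to check the mapping file before running the labeling script. The following Python sketch shows one way to do that. The function name, the sample IDs, and the file names are all hypothetical and are used only for illustration.

```python
def unmatched_file_names(mapping_rows, available_files):
    """Return FILE_NAME values from the mapping rows that match no available file."""
    available = set(available_files)
    return [row["FILE_NAME"] for row in mapping_rows
            if row["FILE_NAME"] not in available]


# Hypothetical mapping rows and directory listing.
mapping_rows = [
    {"SampleID": "S1", "FILE_NAME": "site1_rep1_filtered.fasta"},
    {"SampleID": "S2", "FILE_NAME": "site1_rep2_filtered.fasta"},
]
files_in_rc_seqs = ["site1_rep1_filtered.fasta", "site1_rep2_filtered.fna"]

# The second entry is reported because the extension differs (.fasta vs .fna).
print(unmatched_file_names(mapping_rows, files_in_rc_seqs))
```

An empty list means every FILE_NAME entry has a matching file.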
80. [Screenshot: R console and a text editor arranged in separate windows.]

If you like what you see in the photo above, click Yes (customized setup). If you prefer to have one window with all the components of the program viewed inside that window, click No (accept defaults) and skip to Step 11.
i. If you said yes to Step 8, click SDI (separate windows). Next, you can specify plain text or HTML help. I would suggest HTML help because it is easier to view than plain text, which appears in the window.
j. If you are at an institution that uses Internet2.dll, select Internet2. If not, or if you are unsure, select Standard.
k. Go ahead and create a program shortcut by clicking Next.
l. Choose if you want to have another icon clutter your desk
81. o not want.

    mkdir qual_files
    cp *.qual qual_files
    cd qual_files
    ls > qual_list.txt
    gedit qual_list.txt

Then implement the loop. This script generates plots of sequence quality and returns the position where the quality falls below the Phred score specified with -s. The default Phred quality score is 25; however, we use 20 in this example. Make an output directory with the utility mkdir.

    mkdir /home/qiime/Desktop/Shared_Folder/cso_for_hhmi/qual_plots
    while read line; do time quality_scores_plot.py -q $line -s 20 -o /home/qiime/Desk
82. of EtOH to the sample and mix.
Place a QIAquick spin column in a 2 mL collection tube.
To bind DNA, apply the sample to the QIAquick column and centrifuge for 1 minute. The maximum reservoir volume of the column is 800 µL. For samples greater than 800 µL, just load and spin again.
Discard the flow-through and place the column back in the same collection tube.
Add 0.5 mL of Buffer QG and centrifuge for 1 min.
To wash, add 0.75 mL of Buffer PE to the QIAquick column and centrifuge for 1 min.
Discard the flow-through and centrifuge for an additional 1 minute at 17,900 x g (13,000 rpm).
Place the QIAquick column into a clean, labeled 1.5 mL microcentrifuge tube.
To elute DNA, add 30 µL of Buffer EB to the center of the QIAquick membrane and centrifuge for 1 minute. Take the flow-through and spin it through the column again. Discard the column. Freeze the products.

4. Bioanalyzer (45 mins to set up the chip, 45 mins for scanning)

a. Introduction

The Bioanalyzer (Agilent) is used to assess the quality of the pooled DNA before it is sent to the sequencing core. The Bioanalyzer uses microfluidics technology to carry out gel electrophoresis on a very small scale. A gel-dye mix is prepared and spread into the wells of the chip during the chip priming step. Marker, the ladder, and the samples are loaded, and the chip is vortexed briefly. During the run, the DNA fragments migrate and are compared to the migration of the ladder, resulting in a precise calculation of DNA fragment size and abundance. The Bioan
83. or more detailed content assessments, see the Week 1 through Week 5 Assessment folders for the Environmental Genomics course on the Wiki or on your flash drive.

List of assessment questions/activities

This survey can be given at the end of the modules. It can also be found in the Post-course survey folder on the Wiki and the flash drive.

Describe why the 16S rRNA gene is a good phylogenetic marker.
Describe the benefits of replication in designing a 16S rRNA gene study.
Design a study to compare microbial community structure in your system of choice, from environmental sample to data analysis.
Draw the design of barcoded Illumina 16S rRNA gene targeted primers (forward and reverse). Label each section of the primer and describe its function in library construction.
Summarize the steps involved in preparing Illumina barcoded 16S rRNA gene libraries as described in the Caporaso et al. paper.
Discuss biases associated with sample preparation, from collection through library preparation, that might distort the observed microbial community structure.
Describe the steps of Illumina sequencing.

Molecular Techniques Post-Course Student Attitudes Survey

1. Professional information (please circle all relevant descriptions): Elementary school teacher; Middle school teacher; High school teacher; College faculty/staff; Student; Graduate student; Writer/Artist/Creator; Other
2. Please indicate your primary academic/disciplinary area below
84. prefs_file.py -m mapping_files/mapping_complete.txt -o prefs_file.txt

    make_prefs_file.py -m <mapping_file.txt> -o <prefs_file.txt>

The output of this script is a text file specifying the coloring for each metadata variable.

iv. Generate 2D plots

If you have a large mapping file, this script might fail because of memory issues. If so, you can choose to color by specific categories or split up the mapping file. See the documentation for more information. http://qiime.org/scripts/make_2d_plots.html

Make 2D plots for both the unweighted and weighted principal coordinates matrices.

    time make_2d_plots.py -i beta_div_pc/unweighted_pc.txt -m mapping_files/mapping_complete.txt -o 2d_out -p prefs_file.txt

    make_2d_plots.py -i <pc_matrix.txt> -m <mapping_file> -o <output_dir> -p <prefs_file.txt>

The output of this script is an HTML file containing the 2D plots. Hover over each point to see the sample ID and the metadata value. An example of a 2D plot based on unweighted Unifrac distances is shown below.

[Plot: CONDUCTIVITY, PCoA PC3 vs PC2. PC3 percent variation explained: 4.44.]

Figure 15. A sample 2D principal coordinates analysis colored continuously by conductivity. Hovering o
85. pt http://qiime.org/scripts/beta_diversity_through_plots.html

    beta_diversity_through_plots.py
        -i otu_table.biom
        -m <mapping_file>
        -o <output_dir>
        -t <path_to_ref_tree>    Use the latest Greengenes reference tree (required for Unifrac)
        -a                       Required to run in parallel
        -O <num_jobs_to_start>
        -e <seqs_per_sample>     Depth of coverage for even sampling

Individual Scripts

i. Compute distance matrices

This script takes about 7 minutes to run on three cores with about 3 GB of RAM. http://qiime.org/scripts/parallel_beta_diversity.html

    time parallel_beta_diversity.py -i otu_table_min10_even1000.biom -t /home/qiime/qiime_software/gg_13_5/gg_13_5_otus/trees/97_otus.tree -O 3 -o beta_div_dms

    parallel_beta_diversity.py
        -i <clean_otu_table.biom>
        -o <output_dir>
        -m <metrics>             Default metrics are unweighted and weighted Unifrac. Additional metrics are given at http://qiime.org/scripts/beta_diversity_metrics.html
        -t <path_to_ref_tree>    Use the latest Greengenes reference tree
        -O <num_jobs_to_start>

The output of this script is two distance matrices: one calculated with the unweighted Unifrac metric and the other with weighted Unifrac. An excerpt of the unweighted distance matrix is included below. Samples are compared pairwise to all other samples. The values are the same across the diago
86. py -f $line -o converted_files -c fastq_to_fastaqual; done < list.txt

Remove the list.

    rm list.txt

c. Generate quality plots

This script requires a lot of memory, which is probably limited on personal computers. Use GCAT-SEEK, which has enough memory for this script to complete in an hour. Alternatively, see the complete output in qual_plots.tar.gz to save time. The output of this script is not required to proceed with the tutorial.

Reads from the sequencing instrument can be of differing quality, and it is important to ensure that the reads you are working with are of good quality to improve the downstream results. Phred scores indicate the probability that a base call is correct. For example, if a base has a Phred score of 20, the chance that this base call is correct is 99%. Phred scores are on a logarithmic scale, so a Phred score of 30 indicates that the probability of a correct base call is 99.9% for that position. QIIME allows you to determine at which nucleotide the quality tends to drop off. Using this information, you can truncate the reads manually or set an appropriate quality cutoff in split_libraries_fastq.py. http://qiime.org/scripts/quality_scores_plot.html

Again, we can automate this process with a bash script. Move to the directory containing the fasta and qual files. Make a new directory to store the qual files and copy them there. Then make a list of the qual files as before and remove the files you d
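The relationship between a Phred score and base-call accuracy described above follows the standard definition Q = -10 log10(P), where P is the probability that the base call is wrong. A short Python sketch of that conversion (the function names are our own, not part of QIIME):

```python
import math


def phred_to_accuracy(q):
    """Probability that a base call with Phred score q is correct."""
    return 1 - 10 ** (-q / 10)


def error_probability_to_phred(p_error):
    """Phred score for a given probability that the base call is wrong."""
    return -10 * math.log10(p_error)


for q in (10, 20, 30):
    print(q, round(phred_to_accuracy(q) * 100, 2))  # 90.0, 99.0, 99.9 (percent)
```

Each increase of 10 in the Phred score reduces the error probability tenfold, which is why the cutoff of 20 used in this tutorial corresponds to 1 expected error per 100 bases.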
87.
    cd raw_data          Move to the directory where your unzipped raw data is located.
    ls > list.txt        Write the output of the list to a text file.
    gedit list.txt       Open the text file with gedit, a Notepad-like text editor.

Ensure that only the files you want to unzip are in this file. You will probably have to delete list.txt and a blank line after the files. Once this is done, save and close gedit. You will return to the command prompt.

b. Convert fastq files to fasta files using a bash script

While there is a line to read, the file named on that line is converted from a fastq file to a fasta file and a qual file, which are written to converted_files. After all files have been converted, the list closes and the command prompt returns. This step may take up to ten minutes. http://qiime.org/scripts/convert_fastaqual_fastq.html

    while read line; do time convert_fastaqual_fastq.py -f $line -o /home/qiime/Desktop/Shared_Folder/cso_for_hhmi/converted_files -c fastq_to_fastaqual; done < list.txt

You can either type everything at once or hit enter after the semicolons.

    while read line; do time convert_fastaqual_fastq
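To show what the conversion step does to each read, here is a minimal Python sketch that splits one four-line FASTQ record into a FASTA entry and a QUAL entry. It is an illustration, not the QIIME implementation: the function name and the example record are hypothetical, and it assumes the Phred+33 quality encoding used by recent Illumina software.

```python
def fastq_record_to_fasta_qual(record_lines, offset=33):
    """Split one 4-line FASTQ record into FASTA and QUAL text.

    Assumes Phred+33 encoding: each quality character's ASCII code minus 33
    is the Phred score for the base at the same position.
    """
    header, bases, _separator, quality = (line.strip() for line in record_lines)
    if not header.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    if len(bases) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    name = header[1:]
    scores = [ord(char) - offset for char in quality]
    fasta = ">%s\n%s" % (name, bases)
    qual = ">%s\n%s" % (name, " ".join(str(score) for score in scores))
    return fasta, qual


# A made-up record: 'I' encodes Phred 40, '5' encodes 20, '!' encodes 0.
fasta_text, qual_text = fastq_record_to_fasta_qual(["@read1", "ACGT", "+", "II5!"])
print(fasta_text)
print(qual_text)
```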
88.
    while read fna; do read qual; time truncate_fasta_qual_files.py -f $fna -q $qual -b 220 -o truncated_fasta; done < conv_files.txt

Remove the unneeded list.

    rm conv_files.txt

e. Generate the reverse complement of the sequences

The reads used in this tutorial are from a paired-end sequencing run. Paired-end sequencing generates reads in both the forward and reverse directions. This is useful when attempting to sequence a long amplicon with short read lengths. Support for paired-end reads is currently limited in QIIME, so typically single-end reads are used to generate the OTU table. We are using read 2 because its quality is much better than that of read 1 in this case. The read 2 sequences are in reverse orientation relative to the Greengenes database; therefore, we must generate the reverse complement of the reads before picking OTUs. As before, we can use a bash script to automate the process. http://qiime.org/scripts/adjust_seq_orientation.html

Move to the directory containing the truncated fasta files. Make a new directory to store the fasta files temporarily. Then copy only the fasta files to this new directory. List the contents and write them to a file. Use gedit to remove files that should not be reversed.

    cd Desktop/Shared_Fol
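For readers who want to see what taking the reverse complement means at the sequence level, here is a small Python sketch. It is independent of QIIME; the function name is our own, and the translation table covers the standard IUPAC ambiguity codes (such as the M in the forward primer GTGCCAGCMGCCGCGGTAA).

```python
# Complement table for DNA, including IUPAC ambiguity codes (upper and lower case).
_COMPLEMENT = str.maketrans(
    "ACGTUMRWSYKVHDBNacgtumrwsykvhdbn",
    "TGCAAKYWSRMBDHVNtgcaakywsrmbdhvn",
)


def reverse_complement(seq):
    """Complement every base, then reverse the order of the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("AACG"))  # CGTT
print(reverse_complement("GTGCCAGCMGCCGCGGTAA"))  # TTACCGCGGCKGCTGGCAC
```

Applying the function twice returns the original sequence, which is a quick sanity check on the table.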
89. ray Curtis distance metric to compute beta diversity, since the phylogenetic trees needed to use Unifrac have not yet been developed for ITS. Use ^X (Control-X) to return to the command line.

Open Reference OTU Picking

We input the sequences file path and the reference ITS OTUs at 97%, define an output directory for the picked OTUs, supply the parameters file discussed above, and suppress aligning the sequences and constructing a phylogenetic tree from them. This step may take a half hour or more.

    pick_open_reference_otus.py -i seqs.fna -r /home/qiime/its_12_11_otus/rep_set/97_otus.fasta -o open_ref_its_otus -p params.txt --suppress_align_and_tree

Once the OTU table is generated, the downstream analyses in QIIME can be used to analyze the data (see the 16S workflow). Any metric that uses a phylogenetic tree (Unifrac for beta diversity and PD whole tree for alpha diversity) cannot be used with fungal ITS data at this time, since the appropriate reference trees are not yet available.

Assessments

In the Environmental Genomics folder on your flash drive and on the Wiki, you will find all of the bioinformatics and biostatistical worksheets, quizzes, activities, discussion questions, papers, and assessments. See weeks 6 through 14 Asses
90. related significance.
For the first half of the semester, students will perform, and be able to describe the biochemistry behind, nucleic acid extraction, quantification, and 16S rRNA gene PCR to generate libraries for sequencing. Example project: temporal dynamics of the microbial community structure of stormwater collected downstream of a combined sewer overflow.
Students will explain the biochemistry behind the most recent high-throughput sequencing technologies and perform a cost-benefit analysis of using different sequencing applications.
Students will apply Unix- and Perl-based bioinformatics tools to perform computational analysis on various types of genomics projects, including the sequence data they generate from their semester-long research project.
• Students will prepare a scientific manuscript in which they synthesize the current literature relevant to their research problem, describe their methodology, and present and discuss their research findings.
• Each student will develop a poster describing the technology behind a molecular technique of their choice.
• Through discussion of current literature in the field, students will develop plans to troubleshoot experimental and bioinformatics problems that they may encounter.
• Exposure to this type of research will also catalyze advanced undergraduate training in the integration of basic biological concepts, cutting-edge modern sequencing technologies, and bioinformatics with the multi and
91. res, 3 GB RAM http://qiime.org/scripts/parallel_multiple_rarefactions.html

This script will yield 100 OTU tables in total. The original OTU table will be subsampled 10 times at a depth of 10 seqs/sample, 10 times at a depth of 120 seqs/sample, and so on until the maximum rarefaction depth (1000 seqs/sample) is reached. The step size is 110, which means that each sampling depth is 110 greater than the previous one until 1000 seqs/sample is reached. This gives 10 sampling depths, and since each depth undergoes 10 iterations, a total of 100 subsampled OTU tables will be generated.

    parallel_multiple_rarefactions.py -i otu_table_min10.biom -o multiple_rarefactions -m 10 -x 1000 -n 10 -s 110 -O 3

    parallel_multiple_rarefactions.py
        -i otu_table.biom
        -o <output_directory>        Put the absolute path of the output directory where you want to store the rarefied OTU tables
        -m <min rarefaction depth>
        -x <max rarefaction depth>
        -s <step size>
        -O <num_jobs_to_start>       Maximize the number of jobs to start, since this step is computationally intensive

The output of this script is a directory containing 10 OTU tables per sampling depth.

ii. Compute alpha diversity

Run this step in parallel if possible. For serial environments, use alpha_diversity.py and omit -O. This step takes about 20 minutes using the metrics below (3 cores, 3 GB RAM). http://qiime.org/scripts/parallel_alpha_diversity
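The count of 100 rarefied OTU tables follows directly from the minimum depth, maximum depth, step size, and number of iterations. A short Python sketch of that arithmetic (the function name is hypothetical, and we assume the maximum depth is included when the steps land on it exactly):

```python
def rarefaction_depths(min_depth, max_depth, step):
    """Sampling depths from min_depth up to and including max_depth."""
    return list(range(min_depth, max_depth + 1, step))


depths = rarefaction_depths(10, 1000, 110)
iterations_per_depth = 10
total_tables = len(depths) * iterations_per_depth

print(depths)        # [10, 120, 230, 340, 450, 560, 670, 780, 890, 1000]
print(total_tables)  # 100
```

Changing the step size changes the number of depths, so the total number of tables should be recomputed whenever -m, -x, -s, or -n is changed.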
92. ropriate metadata. The next step is to generate an OTU table, which is often done by clustering the reads against the Greengenes reference database, a process that greatly speeds computation. After the OTU table is generated and taxonomies are assigned, various downstream analyses can be implemented. QIIME offers support for a number of alpha and beta diversity metrics, data visualization, and multivariate statistics. Furthermore, files generated in QIIME can be used with several other software packages, including Microsoft Excel, PC-ORD, PRIMER-E, and R.

D. Installing the QIIME VirtualBox Image

VirtualBox allows you to install QIIME, which runs in Linux, on a computer running a different operating system. Mac users can install QIIME for Macs, which is similar to a native installation of QIIME on a Linux operating system. This tutorial covers installing VirtualBox (VB) for Windows users. You will need a 64-bit machine and a tool to unzip files. 7-Zip works well and is free and easy to install. http://www.7-zip.org

There is a video tutorial available in which VirtualBox is installed on a Mac. The installation procedure should be similar to the Windows installation. http www youtube com watch v 1 YupkquaME

Download VB and the current QIIME virtual disk image (VDI). Follow the link below to install VirtualBox. To save time, the VDI has already been downloaded to the flash drive. It needs to be unzipped before installing. http://qiime.org/install/virtua
93. script are taxa summaries in both tab-delimited and biom formats. Taxa are summarized by L2 (phylum), L3 (class), L4 (order), L5 (family), and L6 (genus). Taxa can also be summarized at the species level (L7); however, the V4 region of the 16S rRNA gene typically does not provide enough resolution to get species-level taxonomic information. See the documentation for more information. An excerpt from a family-level (L5) taxa summary is shown here. The taxa are in column A, with the relative abundance of each taxon given per sample; the samples are labeled in row 1.

    1  Taxon                                                                                 2         19        5         18        15        12
    2  k__Archaea;p__Crenarchaeota;c__Thaumarchaeota;o__Cenarchaeales;f__Cenarchaeaceae      0         0         0         0.003175  0.001642
    3  k__Archaea;p__Crenarchaeota;c__Thaumarchaeota;o__Cenarchaeales;f__SAGMA-X             0         0         0.001534  0         0.001642
    4  k__Bacteria;p__Acidobacteria;c__Acidobacteriia;o__Acidobacteriales;f__Acidobacteriaceae  0.001499  0.004451  0.001534  0      0         0.0015
    5  k__Bacteria;p__Acidobacteria;c__Sva0725;o__Sva0725;f__                                0         0         0         0.001587  0.001642  0.003
    6  k__Bacteria;p__Acidobacteria;c__iii1-8;o__DS-18;f__                                   0.002999  0.005935  0.001534  0.012698  0.00821   0.006
    7  k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Kineosporiaceae    0      0         0         0.001587  0.001642
    8  k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Microbacteriaceae  0.002999  0.008902  0.003067  0.014286  0.006568  0.001
    9  k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Micrococcaceae     0      0         0         0         0

Figure 11. The text
94. sis, DNA quantification, gel purification, and library quality checking.
• Participants will also learn common issues associated with library preparation and the options for troubleshooting them.
• By the end of this module, participants will have 16S rRNA gene and/or ITS libraries ready for submission for sequencing on the Illumina MiSeq platform.
Vision and Change core competencies addressed in this module:
• Ability to apply the process of science by designing a scientific process to understand microbial communities in their natural environments.
• Ability to apply the process of science by developing problem-solving strategies to troubleshoot issues associated with PCR inhibition and instrumentation.
• Ability to understand the relationship between science and society, as participants will need to contextualize and convey how their project relates to human health and/or the environment.
• Ability to tap into the interdisciplinary nature of science by applying physical and chemical principles of molecules to provide an in-depth understanding of high-throughput sequencing technologies.
GCAT-SEEK sequencing requirements: These libraries will be sequenced on the Illumina MiSeq platform. This technology currently yields read lengths of up to 300 bp. Single-end runs yield 12-15 million reads, while paired-end runs yield 24-30 million reads. More information is available at:
• Video: http://www.youtube.com/watch?v=t0akxx8Dwsk
• Backgroun
95. site and click Download R under Getting Started. b. Choose a place to download R. Choosing a location close to you helps speed things up. c. Choose which R package to download based on your operating system in the first box. If you are a Unix or Mac user, I apologize, but this is where we go our separate ways. d. Click "install R for the first time", then download the file with the biggest font at the top. e. Click Run. Then choose your language. f. Click Next to start the installation, agree to all their legal writings, and select an installation location. g. Select Core Files and then either 32-bit or 64-bit files, depending on your computer system. To check, click Start, right-click Computer, and select Properties. Look at System Type. h. Now you have a choice for Startup Options. I prefer to view the program in multiple separate windows so that I can arrange them on my screen while also having an internet browser or a notepad-type program open.
96. sments questions are also embedded in the Module 2 text, as well as the following questions.
Assessment for Module 2
1. Briefly describe each of the steps involved in analyzing high-throughput 16S rRNA gene sequencing datasets using the QIIME pipeline.
2. Describe what data one can find in an OTU table.
3. Compare and contrast alpha and beta diversity.
4. The plot below shows rarefaction curves for two samples (green, top line; blue, bottom line) for a measured alpha diversity metric (PD whole tree, a phylogenetic distance measure). Describe the purpose of performing rarefaction analyses, and describe what the results below show about these two samples.
[Figure labels: PD_whole_tree, Treatment]
5. The above plot is a Principal Coordinate Analysis based on a beta diversity metric from nine different samples. Describe the utility of these types of plots for microbial ecology research.
1. In the terminal, what would you type to get to the pg1 directory? Assume you start in the home directory.
SampleID  BarcodeSequence  LinkerPrimerSequence
Day1.2143.A1  TCCTTAGAAGGC  GTGCCAGCMGCCGCGGTAA
Day1.2143.A2  GATGGACTTCAA  GTGCCAGCMGCCGCGGTAA
Day1.2143.B1  GGTACCTGCAAT  GTGCCAGCMGCCGCGGTAA
Day1.2143.B2  TCGCCTATAAGG  GTGCCAGCMGCCGCGGTAA
Day1.2143.C1  TCTAGCCTGGCA  GTGCCAGCMGCCGCGGTAA
Day1.2143.C2  GCCGGTACTCTA  GTGCCAGCMGCC
Day2.0925.A1  CGAATGAGTCAT
Day2.0925.A2  CGATATCAGTAG
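The mapping-file excerpt in item 96 pairs each sample with a barcode, and demultiplexing only works if every barcode is unique and well formed. The following is a minimal sketch of such a check, not QIIME's own validator. The function name `check_barcodes` is hypothetical, the period-separated sample IDs are an assumed reading of the excerpt, and the 12-base length follows the 12 bp Golay barcodes mentioned elsewhere in the manual.

```python
from collections import Counter

# Sample IDs and barcodes read from the mapping-file excerpt. The period
# separators in the sample IDs are an assumption about the original file.
MAPPING_ROWS = [
    ("Day1.2143.A1", "TCCTTAGAAGGC"),
    ("Day1.2143.A2", "GATGGACTTCAA"),
    ("Day1.2143.B1", "GGTACCTGCAAT"),
    ("Day1.2143.B2", "TCGCCTATAAGG"),
    ("Day1.2143.C1", "TCTAGCCTGGCA"),
    ("Day1.2143.C2", "GCCGGTACTCTA"),
    ("Day2.0925.A1", "CGAATGAGTCAT"),
    ("Day2.0925.A2", "CGATATCAGTAG"),
]


def check_barcodes(rows, expected_length=12):
    """Return a list of human-readable problems; an empty list means none found.

    Hypothetical helper: checks barcode length, alphabet, and uniqueness only.
    """
    problems = []
    counts = Counter(barcode for _, barcode in rows)
    for sample_id, barcode in rows:
        if len(barcode) != expected_length:
            problems.append(f"{sample_id}: barcode length {len(barcode)}")
        if set(barcode) - set("ACGT"):
            problems.append(f"{sample_id}: non-ACGT character in barcode")
        if counts[barcode] > 1:
            problems.append(f"{sample_id}: barcode {barcode} is not unique")
    return problems


if __name__ == "__main__":
    issues = check_barcodes(MAPPING_ROWS)
    print("no problems found" if not issues else "\n".join(issues))
```

The linker/primer column is left out because it is cut off partway through the excerpt.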
97. station and pipette 9.0 µl of gel-dye mix into the wells marked G. Loading the Marker: 1. Pipette 5 µl of marker (green) into all sample and ladder wells. Do not leave any wells empty. Loading the Ladder and the Samples: 1. Pipette 1 µl of High Sensitivity DNA ladder (yellow) into the well marked with the ladder symbol. 2. In each of the 11 sample wells, pipette 1 µl of sample (used wells) or 1 µl of marker (unused wells). 3. Put the chip horizontally in the adapter and vortex for 1 min at the indicated setting (2400 rpm). 4. Run the chip in the Agilent 2100 Bioanalyzer within 5 min. c. Interpreting Bioanalyzer Results. Figure 6. An example tracing from the Bioanalyzer DNA assay containing high-quality barcoded 16S V4 amplicons. The peaks at 35 and 10,380 bp are the marker peaks (black solid arrows). The peak around 300 bp is the peak of interest and represents the approximately 360 bp V4 barcoded amplicon. Sometimes extraneous peaks are present, like the peak around 500 bp. A small bump in the tracing is seen around 150 bp, which indicates that a small amount of primer is still left in the sample; however, this peak is insignificant compared to the strong peak corresponding to the barcoded amplicon. The gel to the right corresponds to the peaks, with the most intense sample band slightly less than 400 bp. Assessments 1. Content Assessments F
98. stello EK, Fierer N, Peña AG, Goodrich JK, Gordon JI, Huttley GA, Kelley ST, Knights D, Koenig JE, Ley RE, Lozupone CA, McDonald D, Muegge BD, Pirrung M, Reeder J, Sevinsky JR, Turnbaugh PJ, Walters WA, Widmann J, Yatsunenko T, Zaneveld J, Knight R. 2010. Nat Methods 7(5):335-6.
Other useful tools
QIIME forum: Search here for helpful answers to your questions. If you have searched the forum thoroughly and have not found helpful information, then post your question. The Knight group will get back to you within a few hours. http://groups.google.com/group/qiime-forum
Pandaseq to QIIME (integrating paired-end sequence data and getting it into QIIME): http://squamules.blogspot.com/2013/02/pandaseq-to-qiime.html
Mothur is another 16S rRNA gene analysis tool, developed by Pat Schloss: http://www.mothur.org
AXIOME is a new tool that integrates many different tools, including QIIME. Lynch MDJ, Masella AP, Hall MW, Bartram AK, Neufeld JD. 2013. AXIOME: automated exploration of microbial diversity. Gigascience. doi:10.1186/2047-217X-2-3
APPENDIX
A. Helpful Links
QIIME links:
1. Main website: http://qiime.org/index.html
2. Documentation: http://qiime.org/documentation/index.html
3. Scripts: http://qiime.org/scripts/index.html
4. More extensive information, news, and code can be found at QIIME's GitHub site: https://github.com/qiime/qiime
5. Video tutorial for installing the QIIME VirtualBox image: http://qiime.org/install/virtual_box.h
99. te. Finally [PCR reagent table, largely illegible; recoverable entries: 2X; DNA template; 10 µM, 0.5, 0.2 µM; 10 µM, 0.2 µM]
Thermocycling Conditions (16S rRNA samples):
1. 94 °C for 3 min to denature the DNA
2. 94 °C for 45 s
3. 50 °C for 60 s
4. 72 °C for 90 s (steps 2-4 are repeated for 35 cycles)
5. 72 °C for 10 min for final extension
6. 4 °C hold
2. ITS Illumina tag PCR. The ITS1-F and ITS2 primer pair, with appropriate Illumina adapters, pad and linker sequences, and 12 bp Golay barcodes, is used to amplify the ITS-1 region. The PCR conditions are based on the Earth Microbiome Project standard amplification protocols and have been used in recent papers. PCR reactions will be performed in duplicate.
ITS1-F: CTTGGTCATTTAGAGGAAGTAA
ITS2: GCTGCGTTCTTCATCGATGC
Record the PCR plate setup on the chart below or in the appropriate spreadsheet on the flash drive.
[PCR reagent table, partly illegible; recoverable entries: Reverse primer, 10 µM stock, 0.5 µl, 0.2 µM final; PCR-grade H2O, 10.5 µl; Total volume, 25.0 µl]
Thermocycling Conditions (ITS fungal samples):
1. 94 °C for 3 min to denature the DNA
2. 94 °C for 45 s
3. 50 °C for 60 s
4. 72 °C for 90 s (steps 2-4 are repeated for 35 cycles)
5. 72 °C for 10 min for final extension
6. 4 °C hold
C. Check PCR Amplification (1-2 hours)
1. Pooling the DNA (30 min to 1 hour, depending on the number of samples). Combine duplicate reactions into a single pool per sample. After combining, determine which PCR reactions were successful with the E-gel electrophoresis protocol.
2. E Gel Electrop
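The reagent volumes in item 99 follow from the dilution relation C1 × V1 = C2 × V2: a 10 µM primer stock diluted to 0.2 µM in a 25 µl reaction needs 0.5 µl. The following is a minimal sketch of that arithmetic and of scaling a recipe to a plate. The function names and the 10% pipetting overage are assumptions, and the 10.5 µl water and 25.0 µl total volumes are a best reading of the garbled reagent table. Components that are not legible in the excerpt (polymerase mix, template) are left out.

```python
def stock_volume_ul(stock_conc, final_conc, reaction_volume_ul):
    """Solve C1 * V1 = C2 * V2 for V1 (both concentrations in the same unit)."""
    return final_conc * reaction_volume_ul / stock_conc


def scale_recipe(per_reaction_ul, n_reactions, overage=0.10):
    """Scale per-reaction volumes to n reactions plus a pipetting overage.

    Hypothetical helper: the 10% default overage is an assumption, not a
    value taken from the protocol.
    """
    factor = n_reactions * (1 + overage)
    return {name: round(volume * factor, 2)
            for name, volume in per_reaction_ul.items()}


if __name__ == "__main__":
    primer_ul = stock_volume_ul(stock_conc=10.0, final_conc=0.2,
                                reaction_volume_ul=25.0)
    # Only the legible components are listed, so this is not a full recipe.
    recipe = {
        "forward primer (10 uM)": primer_ul,
        "reverse primer (10 uM)": primer_ul,
        "PCR-grade H2O": 10.5,
    }
    for name, volume in scale_recipe(recipe, n_reactions=96).items():
        print(f"{name}: {volume} ul")
```

Template DNA is normally added to each well separately rather than to the shared mix, which is another reason it is not part of the scaled recipe.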
100. tml#virtualbox-help-video
QIIME forum: https://groups.google.com/forum/?fromgroups#!forum/qiime-forum The forum is an excellent resource. Posts range from general questions about how the scripts work to troubleshooting errors. The QIIME developers are very helpful and patient with people who have limited computer science backgrounds.
Greengenes: http://greengenes.lbl.gov/cgi-bin/nph-index.cgi
Biom Format Documentation (OTU tables): http://biom-format.org
Linux for Beginners: http://www.linux.org/tutorial/view/beginners-level-course
Collection of File Management Commands: http://www.tuxfiles.org/linuxhelp/files.html
B. Additional Protocols/Scripts
1. Purification by SPRI Beads. This protocol allows you to remove primers from the PCR product using SPRI beads. It can be used to improve the quality of the PCR product if no gel extraction and purification is to be done.
a. Gently shake the SPRI beads bottle to resuspend any magnetic particles that may have settled.
b. Pipette 50 µl of PCR product into a single 96-well plate.
c. Add 90 µl of SPRI beads to the samples.
d. Mix the reagent and PCR reaction thoroughly by very gentle pipette mixing, and incubate the mixed samples for 10 minutes at room temperature for maximum recovery, making sure to tap gently every 2-3 minutes. Note: this step binds PCR products 150 bp and larger to the magnetic beads. The color of the mixture should appear homogeneous after mixing.
e. Place the p
101. top/Shared_Folder/cso_for_hhmi/qual_plots/$line
> done < qual_list.txt
Move to the directory containing the qual files and run the loop. For each qual file, the position at which the average Phred score falls below 20 is recorded, and a graph of the average Phred scores over the length of the reads is created.
cd /home/qiime/Desktop/Shared_Folder/cso_for_hhmi/converted_files/qual_files
while read line
do time quality_scores_plot.py -q $line -s 20 -o /home/qiime/Desktop/Shared_Folder/cso_for_hhmi/qual_plots/$line
done < qual_list.txt
Remove the list of quality files and the qual_files directory.
rm qual_list.txt
rm -R qual_files
The output will be contained in separate directories for each sample. The PDF file contains a plot of the average quality score at each nucleotide position for all of the reads in each sample. The dotted line indicates the minimum quality score, which we specified as Phred 20. The black line is the average quality score, and the upper and lower red lines indicate the standard deviation. In this case, we have good quality along the length of the read, since the average quality score does not fall below Phred 20 until just before the 250-nucleotide position.
[Figure: Quality Scores Report; x-axis: Nucleotide Position (50-250); y-axis: Quality Score; legend: Quality Score Average, Std Dev, Score Threshold]
Figure 10. A quality scores report shows the average quality score per nucleotide position. This analysis
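The Phred 20 threshold used in item 101 corresponds to a 1% chance that a base call is wrong, because a Phred score Q maps to an error probability of 10^(-Q/10). The following is a minimal sketch of that conversion and of finding the first position where an average score drops below the threshold. The function names are hypothetical, the example scores are made up for illustration, and this is not how quality_scores_plot.py is implemented.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def first_position_below(avg_scores, threshold=20):
    """Return the 1-based position of the first average score below threshold.

    Returns None when every position meets the threshold.
    """
    for position, score in enumerate(avg_scores, start=1):
        if score < threshold:
            return position
    return None


if __name__ == "__main__":
    # Made-up average scores per position, for illustration only.
    example_scores = [38, 37, 35, 30, 24, 21, 19, 17]
    print(phred_to_error_probability(20))
    print(first_position_below(example_scores, threshold=20))
```

A position found this way is a natural place to consider truncating reads before downstream analysis.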
102. top and/or Quick Launch toolbar. I suggest leaving the two options under Registry Entries selected. 3. Proprietary software for data analysis. These tools were originally designed for macroecologists but can be used with microbial ecology data as well. Unlike QIIME and R, these software packages are not free. a. PC-ORD is a software package for multivariate community analysis: http://home.centurytel.net/~mjm/pcordwin.htm b. Primer-E with the PERMANOVA+ add-on is another software package for multivariate community statistics. It is the premier software package used in ecology and includes many analyses that QIIME does not yet implement, including distance-based redundancy analysis (db-RDA). The software comes with an excellent manual that describes each statistical analysis in detail and explains how to use it to analyze your communities: http://www.primer-e.com 4. Computing. a. Troubleshooting QIIME error messages. There are three general types of error messages you will encounter when working with QIIME: shell error messages, QIIME-specific error messages, and Python error messages. Shell error messages typically diminish as you become more comfortable working with the command line. Python error messages are usually the hardest to troubleshoot, since they often signal some sort of formatting issue that may or may not be apparent to the user. Sometimes you will get no error message, but a script may hang without completing
103. ver a sample in the interactive HTML plot shows the sample ID and the metadata value. In this case, samples with low conductivity are located on the right (red), while high-conductivity samples are located on the left (blue) in PC1 vs PC2. For the metadata variable conductivity, the points are colored by conductivity value on a color scale from red (low) to blue (high). The first PCoA plot is of PC1 and PC2, the two axes that explain the most variation in the dataset. PC1 explains 10.82% and PC2 explains 5.17%. The next PCoA plots show PC2 vs PC3 and PC1 vs PC3, respectively. We typically look at the first PCoA plot (PC1 vs PC2) because this ordination of samples, based on their unweighted UniFrac distances, explains the most variation in the dataset. Samples that are closer together have more similar microbial communities than samples that are farther apart. Furthermore, we can see here that there is a gradient of conductivity values in the sample ordination. Samples with low conductivity cluster on the right, with increasing conductivity values toward the left side of the PCoA plot. The orientation of the axes is arbitrary, so the direction of the gradient may flip if this plot is remade; however, the trends and clustering are preserved. v. Generate 3D plots. QIIME 1.7.0 offers support for two types of 3D plot viewers. Emperor was developed by the Knight Lab and is more user-friendly than KiNG, which w
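The remark in item 103 that a gradient may flip direction while trends and clustering are preserved can be checked numerically: mirroring an ordination axis leaves every between-sample distance unchanged. The following is a minimal sketch using made-up PC1/PC2 coordinates. The sample names, coordinates, and function names are all hypothetical.

```python
import math
from itertools import combinations

# Made-up (PC1, PC2) coordinates for three samples, for illustration only.
COORDS = {
    "low_conductivity": (0.30, 0.05),
    "mid_conductivity": (0.10, -0.12),
    "high_conductivity": (-0.25, 0.08),
}


def pairwise_distances(coords):
    """Euclidean distance between every pair of samples in ordination space."""
    return {(a, b): math.dist(coords[a], coords[b])
            for a, b in combinations(sorted(coords), 2)}


def flip_axis(coords, axis=0):
    """Mirror the ordination along one axis (axis 0 is PC1)."""
    return {name: tuple(-value if i == axis else value
                        for i, value in enumerate(point))
            for name, point in coords.items()}


if __name__ == "__main__":
    original = pairwise_distances(COORDS)
    flipped = pairwise_distances(flip_axis(COORDS, axis=0))
    for pair, distance in original.items():
        print(pair, round(distance, 4), round(flipped[pair], 4))
```

The two distance columns match for every pair, which is why only relative positions, and not left versus right, carry meaning in a PCoA plot.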
104. volume of elution buffer can be used, but using a lower volume might require extra mixing and may not fully elute the product. Elution is quite rapid, and the beads do not need to go back into solution for it to occur. I would add 40 µl if the samples were previously concentrated in a final volume of 50 µl, or 50 µl if the PCR replicates were combined in a final volume of 75 µl.
j. Place the tubes back onto the magnetic plate for 1 minute to separate the beads from the solution. Transfer the eluate to a new tube. This contains the purified PCR product.
Pool PCR amplicons in equimolar concentrations
a. Quantify the amplicon pools by electrophoresis or Bioanalyzer (preferably by Bioanalyzer).
b. Pool 50 ng of each sample together in a 1.5 ml tube. At this stage, the volume is unimportant. Note: lower quantities of each sample can be pooled together. However, pooling at least 50 ng of each sample ensures the 2 µg total that sequencing requires.
Concentration, Option A: The pooled samples will need to be concentrated to 25 µl in order for the sequencing libraries to be created.
a. Combine the triplicate PCR products in a thin-walled 0.2 ml tube.
b. Add 1 µl of linear acrylamide to the pooled products. Note: this acts as a carrier for small amounts of DNA.
c. Add 1/10 volume of sodium acetate.
d. Add 0.8-1 volume of isopropanol.
e. Mix well by pipettin
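The pooling step in item 104 comes down to dividing the target mass by each sample's measured concentration. The following is a minimal sketch, assuming concentrations are reported in ng/µl. The function names and example concentrations are hypothetical. Because all amplicons from one primer pair are about the same length, pooling equal masses approximates pooling equimolar amounts.

```python
import math


def pooling_volumes(conc_ng_per_ul, mass_ng=50.0):
    """Return the volume (µl) of each sample that contains mass_ng of DNA."""
    volumes = {}
    for sample, concentration in conc_ng_per_ul.items():
        if concentration <= 0:
            raise ValueError(f"{sample}: concentration must be positive")
        volumes[sample] = mass_ng / concentration
    return volumes


def samples_needed(total_ng=2000.0, mass_ng=50.0):
    """Smallest number of samples whose pooled mass reaches total_ng."""
    return math.ceil(total_ng / mass_ng)


if __name__ == "__main__":
    # Made-up Bioanalyzer concentrations in ng/µl, for illustration only.
    concentrations = {"sample_A": 25.0, "sample_B": 12.5, "sample_C": 40.0}
    for sample, volume in pooling_volumes(concentrations).items():
        print(f"{sample}: {volume:.2f} ul")
    print("samples needed for 2 ug:", samples_needed())
```

At 50 ng per sample, the 2 µg total is reached once 40 samples are pooled; with fewer samples, more mass per sample would be needed to reach the same total.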
105. was my first time teaching the course, so take things with a grain of salt.
Discussion Topics for class
• The utility of Unix/Linux operating systems
• How to develop independence to troubleshoot error messages
• Differences between alpha and beta diversity, and which metrics are appropriate for which biological questions
• How to choose appropriate biostatistics for your biological question
• Limitations of currently available informatics and statistical methods
References and Suggested Reading
Diversity
1. Lozupone CA, Knight R. 2008. Species divergence and the measurement of microbial diversity. FEMS Microbiol Rev 32:557-578.
QIIME Algorithms
2. Caporaso JG, Bittinger K, Bushman FD, DeSantis TZ, Andersen GL, Knight R. 2010. PyNAST: a flexible tool for aligning sequences to a template alignment. Bioinformatics 26:266-267.
3. Edgar RC. 2010. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26:2460-2461.
QIIME Examples
4. Caporaso JG, et al. 2011. Moving pictures of the human microbiome. Genome Biol 12:R50.
5. Kuczynski J, Stombaugh J, Walters WA, Gonzalez A, Caporaso JG, Knight R. 2012. Using QIIME to analyze 16S rRNA gene sequences from microbial communities. Curr Protoc Microbiol Chapter 1:Unit 1E.5. doi:10.1002/9780471729259.mc01e05s27
6. QIIME allows analysis of high-throughput community sequencing data. Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Co
