Home

BeadArray expression analysis using Bioconductor

1. illuminaHumanvi db illuminaHumanv2 db illuminaHumanv3 db BeadArrayUseCases There are also a few packages that are used to illustrate particular use case but are not crucial to the vignette They can be installed in advance or as required gt biocLite c GOstats GenomicRanges Biostrings Introduction to the BeadArray technology Ilumina Inc San Diego CA are manufacturers of the BeadArray microarray technology which can be used in genomic studies to profile transcript expression or methylation status or genotype SNPs The BeadArray platform makes use of 50mer oligonucleotide probes attached to beads that are randomly assembled and replicated many times on each array Different probes are designed to target specific positions transcripts SNPs in the genome A series of decoding hybridizations performed in house by Illumina determines the identity of each bead WG 6 Expression Viewofa BeadChip BeadArray Figure 1 Schematic of a WG 6 BeadChip and BeadArray section This type of BeadChip is made up of 12 sections also referred to as h i strips which are paired to allow A z 6 samples to be processed in par allel per chip Sections are densely Eo packed with beads which are ran i domly assembled on the array sur face Another feature of the tech nology is the high degree of replica randomly tion of beads assigned a particular assembled probe sequ
2. e targets file recommended contains sample information for each array See the file targetsHT12 txt for an example Er FOUOANDOBWNH e metrics file optional one for each BeadChip usually named Metrics txt which contains summary information about intensity the amount of saturation focus and reg istration on the image s from each section A To obtain the tiff and text files from Ilumina s BeadScan software version 3 1 you will need to modify the settings xml1 file used by the software For further details see the Scan ning Settings section of http www compbio group cam ac uk Resources illumina For iScan the steps are similar although there is a separate settings file for each platform expression Infinium genotyping etc Quality assessment using scanner metrics The first view of array quality can be assessed using the metrics calculated by the scanner These include the 95th P95 and 5th P05 quantiles of all pixel intensities on the image A signal to noise ratio SNR can be calculated as the ratio of these two quantities These metrics can be viewed in real time as the arrays themselves are being scanned By tracking these metrics over time one can potentially halt problematic experiments before they even reach the analysis stage Use Case Plot the P95 P05 signal to noise ratio for the HT 12 arrays in this experiment and assess whether any samples appear to be outliers that may need to be removed or down w
3. destfile filenm gt gse lt getGEO filename filenm gt dim gse gt exprs gse 1 5 1 2 gt samples lt as character pData gse title gt sites lt as numeric substr samples 10 10 gt shortlabels lt substr samples 12 13 gt rnasource lt pData gse source_name_chi gt levels rnasource lt c UHRR Brain UHRR75 UHRR25 gt boxplot log2 exprs gse col sites 1 names shortlabels las 2 cex names 0 5 ylab expression log 2 intensity outline FALSE ylim c 3 10 main Before batch correction t Data deposited in the GEO database may be either raw or normalized This preprocess ing information is annotated in the database entry and is accessible by typing pData gse data_processing in this example Boxplots of the data can also be used to see whether normalization has taken place Z This dataset consists of 59 arrays each of which contains 47 293 probes Samples were processed and hybridized in 3 different labs see the sites vector and subject to cubic spline normalization which appears to have been applied separately to the data from each site Microarray data deposited in ArrayExpress the other major public database of high throughput genomics experiments can be imported into R using the ArrayExpress package 13 A If the above code does not work due to a network error the same dataset albeit with fewer replicates may be o
4. gt boxplot AveSignal qual gt rem lt qual No match qual Bad gt gt gt All mappings available in the illuminaHumanv2 db can be listed by the illuminaHumanv2 func tion The default keys for each mapping are Illumina IDs that have the prefix ILMN As the row names in the maqc norm object are numeric ArrayAddressIDs we first have to convert them This is achieved by use of the revmap function which reverses the direction of standard mapping from Illumina ID to ArrayAddressID The mget function can then be used to query the probe quality for our new IDs and return the result as a list which we then convert to a vector The rowMeans is a base function to calculate the mean for each row in a matrix We then make a boxplot of the average signal using probe quality as a factor Z The output of illuminaHumanv2 lists all mappings available in the package and these are divided into two sections Firstly there are mappings that use the RefSeq ID assigned to each probe to map to genomic properties and secondly there are mappings based on the probe sequence itself using the procedure described by Barbosa Morais et al 10 You should see that the expression level of the probes annotated as No match or Bad are generally lower than other categories After applying this filtering step we are left with 27 Ce oe AeA Ne 31 304 probes out of a possible 48 701 regular probes for further analysis m
5. gt boxplot registrationScores plotP95 TRUE The checkRegistration function takes a beadLevelData object as it s primary argument along with the indices of the array sections that should be checked The output from checkRegistration is an object of class beadRegistrationData We have extended the standard boxplot function to accept this class providing an easy way to visualise the registration scores Z The registration scores are generated by looking at the the difference between the within bead type variance for the given bead IDs and a randomised set of IDs If the registration has been sucessful you expect this value to be greater than zero Any section where the me dian registration score is close to zero is of concern There are two reasons why a median value of zero may be seen either then array was misregistered or there are no beads present in the image The plotP95 argument shown above allows these cases to be distinguished Plotting control probes Illumina have designed a number of control probes for each BeadArray platform Two partic ular controls on expression arrays are housekeeping and biotin controls which are expected to be highly expressed in any sample With the poscontPlot function we can plot the intensities of any ArrayAddressIDs that are annotated as belonging to the Housekeeping or Biotin control groups Use Case Generate plots of the housekeeping and biotin controls for the 6th 7th 8th and 12th
6. It contains different modules for analyzing data from dif ferent platforms For further information on the software and how to export summarized data refer to the user s manual In this section we consider how to read in and analyze output from the gene expression module of BeadStudio GenomeStudio The example dataset used in this section consists of an experiment with one Human WG 6 version 2 BeadChip These arrays were hybridized with the control RNA samples used in the MAQC project 3 replicates of UHRR and 3 replicates of Brain Reference RNA The non normalized data for regular and control probes was output by BeadStudio GenomeS tudio The example BeadStudio output used in this section is available in the file AsuragenMAQC_BeadStudioOutput zip which can be downloaded from tt http www switchtoi com datasets ilmn You will need to download and unzip the contents of this file to the current R working directory Inside this zip file you will find several files including summarized non normalized data and a file containing control information We give a more detailed description of each of the particular files we will make use of below e Sample probe profile AsuragenMAQC probe raw txt required text file which contains the non normalized summary values as output by BeadStudio Inside the file is a data matrix with some 48 000 rows In newer versions of the software these data are preceded by several lines of header information Ea
7. expression log 2 intensity outline FALSE ylim c 3 10 main After batch correction plotMDS gse batchcorrect labels shortlabels col sites 1 main MDS plot ids3 lt featureNames gse qual2 lt unlist mget ids3 illuminaHumanviPROBEQUALITY ifnotfound NA table qual2 rem2 lt qual2 No match qual2 Bad is na qual2 gse batchcorrect filt lt gse batchcorrect rem2 dim gse batchcorrect dim gse batchcorrect filt design2 lt model matrix 0 rnasource colnames design2 lt levels rnasource aw2 lt arrayWeights gse batchcorrect filt design2 fit2 lt lmFit gse batchcorrect filt design2 weights aw2 contrasts2 lt makeContrasts UHRR Brain levels design2 contr fit2 lt eBayes contrasts fit fit2 contrasts2 topTable contr fit2 coef 1 volcanoplot contr fit2 main UHRR Brain ids4 lt rownames gse batchcorrect filt chr2 lt mget ids4 illuminaHumanv1iCHR ifnotfound NA chrloc2 lt mget ids4 illuminaHumanviCHRLOC ifnotfound NA refseq2 lt mget ids4 illuminaHumanviREFSEQ ifnotfound NA entrezid2 lt mget ids4 illuminaHumanviENTREZID ifnotfound NA symbols2 lt mget ids4 illuminaHumanviSYMBOL ifnotfound NA genename2 lt mget ids4 illuminaHumanviGENENAME ifnotfound NA anno2 lt data frame I11_ID ids4 Chr as character chr2 Loc as character chrloc2 RefSeq as character refseq2 VVVVVVVVVVV VV VV VV VV VV FV
8. 1D 4f showArrayMask data array i override TRUE ap i gt table getBeadData data what wts array 8 gt table getBeadData data what wts array 12 gt BASHoutput QCc Line 1 calls the BASH function with the array argument specifying that it should be run on arrays 8 and 12 If this argument was not specified the default behaviour is to process each array As mentioned above the output from BASH is a list with several entries The wts entry is a further list where each entry is a vector of weights one value per bead with 0 indicating a that a bead should be masked We then use the setWeights function to assign these weights to our beadLevelData object Line 4 shows how the setWeights function has added an additional column to the specified array in the beadLevelData object Finally we call showArrayMask which creates a plot similar to outlierplot mentioned earlier In addition to displaying the beads classed as outliers showArrayMask shows in red the beads 14 N that are currently masked from further analysis By default the function refuses to create the plot if over 200 000 beads have been masked as this can cause considerable slowdown on older computers so the override argument has been used in this example to force the plot creation Finally we can use the getBeadData function to see how many beads were assigned a weight of 0 i e completely masked on the arrays The numbers can also retri
9. each 3 gt design lt model matrix 0 rna gt colnames design lt levels rna gt aw lt arrayWeights maqc norm filt design gt aw gt fit lt lmFit maqc norm filt design weights aw gt contrasts lt makeContrasts UHRR Brain levels design gt contr fit lt eBayes contrasts fit fit contrasts gt topTable contr fit coef 1 gt par mfrow c 1 2 gt volcanoplot contr fit main UHRR Brain gt smoothScatter contr fit Amean contr fit coef xlab average intensity ylab log ratio gt abline h 0 col 2 lty 2 The code above shows how to set up a design matrix for this experiment to combine the data from the UHRR and Brain Reference replicates to give one value per condition Empirical array quality weights 11 can be used to measure the relative reliability of each array A variance is estimated for each array by the arrayWeights function which measures how well the expression values from each array follow the linear model These variances are converted to relative weights which can then be used in the linear model to down weight observations from less reliable arrays which improves power to detect differential expression We then define a contrast comparing UHRR to Brain Reference and calculate moderated t statistics with empirical Bayes shrinkage of the sample variances For more information about the lmFit contrasts fit and eBayes functions refer to
10. end epos validPos names names allLocs validPos strand strand validPos Again we retrieve all genomic locations using the unlist as list trick The location of each probe is encoded as a string with the chromosome start and end separated by a character We can use the base strsplit function to decompose this string within an lapply to process all sequences at once The RangedData object can be created now although we add a check to make sure that no NA values are passed Use Case Find all Humanv2 probes located on chromsome 8 between bases 28 800 001 and 36 500 000 What gene symbols do they target gt query lt IRanges start 28800001 end 36500000 gt olaps lt findOverlaps Humanv2RD RangedData query space chr8 chr8 gt matchingProbes lt as matrix olaps 1 gt Humanv2RD space Humanv2RD chr8 matchingProbes gt Humanv2RD space Humanv2RD chr8 names matchingProbes gt unlist mget Humanv2RD space Humanv2RD chr8 names matchingProbes illuminaHumanv2SYMBOL This example code uses functions from the Ranges package so users wanting a deeper un derstanding of the commands should consult the documentation for that package and the appropriate help files Essentially we define a query region using the desired chromosome start and end values The findOverlaps function will then find regions on chromosome 8 of the Humanv2RD object that
11. R and the packages used to complete this tutorial are listed below If you have further questions about using any of the Bioconductor packages used in this tutorial please email the Bioconductor mailing list bioconductor stat math ethz ch 1 gt sessionInfo R version 3 2 0 2015 04 16 Platform x86_64 unknown linux gnu 64 bit Running under Ubuntu 14 04 2 LTS locale 1 LC_CTYPE en_US UTF 8 LC_NUMERIC C 3 LC_TIME en_US UTF 8 LC_COLLATE C 5 LC_MONETARY en_US UTF 8 LC_MESSAGES en_US UTF 8 7 LC_PAPER en_US UTF 8 LC_NAME C 9 LC_ADDRESS C LC_TELEPHONE C 11 LC_MEASUREMENT en_US UTF 8 LC_IDENTIFICATION C attached base packages 1 stats4 parallel stats graphics grDevices utils 7 datasets methods base other attached packages 1 limma_3 24 0 GenomicRanges_1 20 1 37 3 5 7 9 11 13 15 17 19 Biostrings_2 36 0 XVector_0 8 0 illuminaHumanv3 db_1 26 0 illuminaHumanvi db_1 26 0 illuminaHumanv2 db_1 26 0 org Hs eg db_3 1 2 RSQLite_1 0 0 DBI_0 3 1 AnnotationDbi_1 30 0 GenomeInfoDb_1 4 0 TRanges_2 2 0 S4Vectors_0 6 0 GEOquery_2 34 0 beadarray_2 18 0 ggplot2_1 0 1 Biobase_2 28 0 BiocGenerics_0 14 0 loaded via a namespace and not attached 1 Repp_0 11 5 zlibbioc_1 14 0 MASS_7 3 40 4 munsell_0 4 2 colorspace_1 2 6 stringr_0 6 2 7 plyr_1 8 1 tools_3 2 0 base64_1 1 10 grid_3 2 0 gtable_0 1 2 digest_0 6 8 13 reshape2_1 4 1 bitops_1 0 6 RCurl_1 95 4 5 16 BeadDat
12. a matrix as an argument so could be used on the expres sion matrix of an ExpressionSetIllumina object extracted using the exprs function Filtering based on probe annotation Filtering non responding probes from further analysis can improve the power to detect differ ential expression One way of achieving this is to remove probes whose probe sequence has undesirable properties Annotation quality information is available from the platform specific annotation packages which are discussed in more detail later The illuminaHumanv2 db annotation package provides access to the reannotation information provided by Barbosa Morais et al 10 In this paper a scoring system was defined to quantify the reliability of each probe based on its 50 base sequence These mappings are based on the probe sequence and not the RefSeq ID as for the standard annotation packages and can give extra criteria for interpreting the results For instance probes with multiple genomic matches or matches to non transcribed genomic locations are likely to be unreliable This information can be used as a basis for filtering promiscuous or un informative probes from further analysis as shown above 26 ANIowkwnr Four basic annotation quality categories Perfect Good Bad and No match are defined and have been shown to correlate with expression level and measures of differential expression We recommend removing probes assigned a Bad or N
13. before the illuminaHumanv2 db package can be used for the arrays in this example dataset Use Case Use the appropriate annotation package to annotate our differential expression analy sis Include the genome position RefSeq ID Entrez Gene ID Gene symbol and Gene name infor mation Write the results from this analysis out to file contr fit genes lt anno topTable contr fit write fit contr fit file maqcresultsv2 txt gt library illuminaHumanv2 db gt illuminaHumanv2 gt ids lt as character contr fit genes ProbelID gt ids2 lt unlist mget ids revmap illuminaHumanv2ARRAYADDRESS ifnotfound NA gt chr lt mget ids2 illuminaHumanv2CHR ifnotfound NA gt cytoband lt mget ids2 illuminaHumanv2MAP ifnotfound NA gt refseq lt mget ids2 illuminaHumanv2REFSEQ ifnotfound NA gt entrezid lt mget ids2 illuminaHumanv2ENTREZID ifnotfound NA gt symbol lt mget ids2 illuminaHumanv2SYMBOL ifnotfound NA gt genename lt mget ids2 illuminaHumanv2GENENAME ifnotfound NA gt anno lt data frame I11_ID ids2 Chr as character chr Cytoband as character cytoband RefSeq as character refseq EntrezID as numeric entrezid Symbol as character symbol Name as character genename gt gt gt Other useful applications of annotation packages We now give a few brief example use cases that we have encountered in our own analyse
14. data and has the default setting of performing a log transformation Use Case Generate summary level data for this dataset on the log and unlogged scale gt datasumm lt summarize BLData data gt grnchannel unlogged lt new illuminaChannel transFun greenChannelTransfornm outlierFun illuminaQOutlierMethod exprFun function x mean x na rm TRUE varFun function x sd x na rm TRUE channelName G gt datasumm unlogged lt summarize BLData data useSampleFac FALSE channelList list grnchannel unlogged Line 1 uses the default settings of summarize to produce summary level data Line 2 shows the full gory details of how to modify how the bead level data are summarized by the creation of an illuminaChannel object We use a transformation function that just returns the Grn intensities Illumina s default outlier function and modified mean and standard deviation functions that remove NA values This new channel definiton is then passed to summarize In this call we also explicitly set the useSampleFac argument to FALSE The useSampleFac parameter should be used in cases where multiple sections for the same biological sample are to be combined which is not applicable in this case Z summarize produces verbose output which firstly gives details on how many sections are to be combined none in this case and how many ArrayAddressIDs are to be summarized Notice that we summarize al
15. data twice amp The se ezprs function has been named to be consistent with existing Bioconductor func tions However it may not always return standard errors as its name suggests The data re turned will depend on the definition of the illuminaChannel class In the example above the line gt se exprs datasumm 1 10 1 2 returns standard deviations which must be divided by the sqrt of the nObservations values to report standard errors mS calculateDetection assumes that the information about the negative controls is found in a particular part of the ExpressionSetIllumina object and takes the form of a vector of characters indicating whether each probe in the data is a control or not This vector can be supplied as the status argument along with an identifier for the negative controls negativeLabel Concluding remarks So far we have looked at the kinds of low level analysis that are possible when you have access to the raw and bead level data Once quality assessment is complete and the probe intensities have been summarized one can continue down the usual analysis path of normalizing between samples and assessing differential expression using limma or other Bioconductor tools We will leave this task for now and revisit it later 22 2 Analysis of summary data from BeadStudio GenomeStudio using limma BeadStudio GenomeStudio is Illumina s proprietary software for analyzing data output by the scanning system BeadScan iScan
16. gt TIFF lt readTIFF system file extdata FullData 4613710052_B_Grn tif package BeadArrayUseCases gt cbind col TIFF which TIFF 0 row TIFF which TIFF 0 gt xcoords lt getBeadData data array 2 what GrnXx gt ycoords lt getBeadData data array 2 what GrnY gt par mfrow c 1 3 gt Ox See S i gt jolene Cie 32 Cueser CUBI IGA CO o1 s values T textCol yellow main expression log 2 intensity 1 gt points xcoords 503155 ycoords 503155 pch 16 col red gt plotTIFF TIFF offset c 1202 1212 c 13576 13586 values T textCol yellow main expression log 2 intensity 1 gt points xcoords 625712 ycoords 625712 pch 16 col red gt plotTIFF TIFF offset c 1613 1623 c 9219 9229 values T textCol yellow main expression log 2 intensity 1 gt points xcoords 767154 ycoords 767154 pch 16 col red Z There are three pixels of value zero in the image that must be an imaging artefact rather than a true measure of intensity for that location note that we add an offset of 1 to avoid taking the log of zero These pixels will bias the estimate of background for all nearby beads and so we will try a more robust estimate of the background Rather than using the mean of the five lowest pixel values we will use the median equivalently the third lowest pixel value This is
17. try to find all BeadArray associated files The function will read any text or bab and TIFF files if instructed to do so and save the names of any locs and sdf files for future reference Here we have used the dir and sectionNames arguments to specify what directory to look in and the names of the sections that should be imported By setting useImages to FALSE we use the intensity values stored in the text files rather than recomputing them from the images The argument illuminaAnnotation is a character string to identify the organism and annotation revision number of the chip being analysed but not the number of samples on the chip Hence the value Humanv3 can be used for both Human WG 6 and HumanHT12 v3 data Setting this value correctly will allow beadarray to identify what bead types are to be used for control purposes and convert the numeric ArrayAddressIDs to a more commonly used format If you are unsure of the correct annotation to use this argument can be left blank at this stage rm If the data to be imported are not in bab format specifying the same arguments to readI1llumina will search for files of type txt instead A Use of the secttonNames argument is recommended when the directory containing bead level data contains files other than those produced by the Illumina scanner Extraneous files in such directories may confuse the readIllumina function amp Storing and reading bead level data requires a considerable amount o
18. which subtracts the intensities of the negative control probes from the intensities of the regular probes should also be avoided Use Case Read the example files using the read ilmn function Work out how many different classes of probes are present on these Human arrays gt library limma gt maqc lt read ilmn files AsuragenMAQC probe raw txt ctrlfiles AsuragenMAQC controls txt probeid ProbeID 4 annotation TargetID other columns c Detection Pval Avg_NBEADS gt dim maqc gt maqc targets gt maqc E 1 5 gt table maqc genes Status The files argument is the only compulsory argument in read ilmn The directory where the probe profile or control files is located can be specified using the path and ctrlpath arguments respectively Z The intensities are stored in an object of class EListRaw defined in the limma package The dimensions of this object should be 50 127 rows and 6 columns in other words there are 50 127 probes and 6 arrays Each row in the object has an associated status which deter mines if the intensities in the row are for a control probe or a regular probe that we might want to use in an analysis The maqc targets data frame indicates the biological samples hybridized to each array Preprocessing quality assessment and filtering The controls available on the Illumina platform can be used in the analysis to improve inference As we have already seen in sect
19. G K Smyth Optimizing the noise versus bias trade off for Illumina Whole Genome Expression BeadChips Nucleic Acids Res 38 22 e204 2010 38 8 10 11 R A Irizarry B Hobbs F Collin Y D Beazer Barclay K J Antonellis U Scherf and T P Speed Exploration normalization and summaries of high density oligonucleotide array probe level data Biostatistics 4 2 249 64 2003 M E Ritchie J Silver A Oshlack M Holmes D Diyagama A Holloway and G K Smyth A comparison of background correction methods for two colour microarrays Bioinformatics 23 20 2700 7 2007 N L Barbosa Morais M J Dunning S A Samarajiwa J F Darot M E Ritchie A G Lynch and S Tavar A re annotation pipeline for Illumina BeadArrays improving the interpretation of gene expression data Nucleic Acids Res 38 3 e17 2010 M E Ritchie D Diyagama J Neilson R van Laar A Dobrovic A Holloway and G K Smyth Empirical array quality weights in the analysis of microarray data BMC Bioinformatics 19 7 261 2006 S Davis and P S Meltzer GEOquery a bridge between the Gene Expression Omnibus GEO and Bioconductor Bioinformatics 23 14 1846 7 2007 A Kauffmann T F Rayner H Parkinson M Kapushesky M Lukk A Brazma and W Huber Importing ArrayExpress datasets into R Bioconductor Bioinformatics 25 16 2092 4 2009 39
20. N BeadArray expression analysis using Bioconductor Mark Dunning Wei Shi Andy Lynch Mike Smith and Matt Ritchie April 18 2015 Introduction to this vignette This vignette describes how to process Illumina BeadArray gene expression data in its var ious formats raw files produced by the scanning software as well as summarized BeadStu dio GenomeStudio output and data deposited in a public microarray database For each file format we introduce a series of use cases in the order in which they might be encountered by an analyst and demonstrate how to perform specific tasks on the example datasets using R code from Bioconductor or base packages The R code is explained immediately afterwards and where appropriate we given an interpretation of the resultant output indicated by the Z symbol and pointers to how the R code might be adapted to other datasets or use cases Where attention must be paid to avoid alarming results we give warnings indicated by the amp symbol It is assumed that the reader has familiarity with basic R data structures and operations such as plotting and reading text files therefore explanations of these functions will not be given The version of R used to compile this vignette is 3 2 0 The Bioconductor packages required can be installed by typing the following commands at the R command prompt gt source http www bioconductor org biocLite R gt biocLite c beadarray limma GEQquery
21. The packages used to annotate Illumina BeadArrays have a very simple convention which is illumina followed by an organism name followed by annotation version number Hence the above code could be executed for a Humanv3 array by simply replacing v2 with v3 library illuminaHumanv3 db illuminaHumanv3 HHH ids2 lt unlist mget ids revmap illuminaHumanv3ARRAYADDRESS ifnotfound NA etc You may notice a few outliers in the Bad category that have consistently high expression Some strategies for probe filtering would retain these probes in the analysis so it is worth considering whether they are of value to an analysis Use Case Investigate any IDs that have high expression despite being classed as Bad queryIDs lt names which qual Bad amp AveSignal gt 12 unlist mget queryIDs illuminaHumanv2REPEATMASK unlist mget queryIDs illuminaHumanv2SECONDMATCHES mget ILMN_1692145 illuminaHumanv2PROBESEQUENCE VvVV Z You should see that probes annotated as Bad hybridize to multiple places in the genome and often contain repetitive sequences making their inclusion in the analysis questionable The probe sequences themselves can be retrieved and subjected to manual BLAT search e g using the UCSC genome browser The above example ILMN_1692145 will return many matches Other data visualisation Although we will concentrate on a differential expression analysis there
22. Vt VY 4FVV N 35 ANow kwnr Symbol as character symbols2 Name as character genename2 EntrezID as numeric entrezid2 gt contr fit2 genes lt anno2 gt write fit contr fit2 file maqcresultsvi1 txt Z The batch correction procedure equalizes the mean expression level from each lab which removes the differences that were evident in the earlier boxplot The MDS plot shows that samples separate by RNA source with the first dimension separating the pure samples UHRR A and Brain Reference B from each other and the second dimension separating the pure samples A and B from the mixture samples 75 UHRR 25 Brain Reference C and 25 UHRR 75 Brain Reference D After filtering out the probes with poor annota tion we are left with 25 797 probes As we saw in the Asuragen MAQC dataset there is a great deal of differential expression between the UHRR and Brain Reference samples Hav ing a greater number of replicate samples 15 each for UHRR and Brain Reference leads to greater statistical significance for this comparison relative to the Asuragen dataset which had just 3 replicates for each RNA source Combining data from different BeadChip versions In this section we illustrate how to combine data from different BeadChip versions from the same species using the GEO MAQC dataset Human version 1 and the Asuragen MAQC dataset Human version 2 The approach taken in this example can be used to
23. aPackR_1 20 0 scales_0 2 4 XML_3 98 1 1 19 illuminaio_0 10 0 proto_0 3 10 Acknowledgements We thank James Hadfield and Michelle Osbourne for generating the HT 12 data used in the first section and the attendees of the various courses we have conducted on this topic for their feedback which has helped improve this document References 1 MAQC Consortium The MicroArray Quality Control MAQC project shows inter and intraplatform reproducibility of gene expression measurements Nat Biotechnol 24 9 1151 61 September 2006 M L Smith and A G Lynch BeadDataPackR A Tool to Facilitate the Sharing of Raw Data from Illumina BeadArray Studies Cancer Inform 9 217 27 2010 J M Cairns M J Dunning M E Ritchie R Russell and A G Lynch BASH a tool for managing BeadArray spatial artefacts Bioinformatics 24 24 2921 2 2008 M Suarez Farinas A Haider and K Wittkowski harshlighting small blemishes on microarrays BMC Bioinformatics 6 1 65 2005 M L Smith M J Dunning S Tavar and A G Lynch Identification and correction of previously unreported spatial phenomena using raw Illumina BeadArray data BMC Bioinformatics 11 208 2010 W Shi C A de Graaf S A Kinkel A H Achtman T Baldwin L Schofield H S Scott D J Hilton and G K Smyth Estimating the proportion of microarray probes expressed in an RNA sample Nucleic Acids Res 38 7 2168 76 2010 W Shi A Oshlack and
24. ain 6 8 I2 4 outlierplot data array i main paste sectionNames data il outliers By default the outlierplot function uses Illumina s three MADs from the median rule but applied to log transformed data It then plots the identified outliers by their location on the surface of the array section Of course the user can specify alternative rules and transformations and the function is additionally able to accept arguments to plot such as main Z In this example the regions of the arrays that appear to be spatial artefacts are also flagged as outliers which would be removed before creating summarized values for each probe as re ported in BeadStudio GenomeStudio Blue points represent outliers which are below aver age while pink points are outliers above the average these colors can be set by the user refer to the outlierplot help page for details The dotted red lines running vertically indicate the segment boundaries with each Humanv3 section made up of 9 segments that are physically separated on the section surface The lo cations of these segments are taken from an sdf file or can be specified manually if the sdf file is not available 13 jas jak FOU ANDOABWNH Excluding beads affected by spatial artefacts It should be apparent that some arrays in this dataset have significant artefacts and although it appears that most beads in these regions are classed as outliers it would b
25. are many other com mon analysis tasks that could be performed once a normalized expression matrix is available Use Case Pick a set of highly variable probes and cluster the samples gt IQR lt apply maqc norm filt E 1 IQR na rm TRUE gt topVar lt order IQR decreasing TRUE 1 500 gt d lt dist t maqc norm filt E topVar gt plot hclust d We calculate the interquartile range IQR of each probe across all samples and then order to find the 500 an arbitrary number most variable These probes are then used to cluster the data in a standard way Use Case Make a heatmap to show the differences between the groups 28 OND RWNHE gt heatmap maqc norm filt E topVar A Not all datasets will show such large differences between biological groups Differential expression analysis The differential expression methods available in the limma package can be used to identify differentially expressed genes The functions ImFit contrasts fit eBayes can be applied to the normalized and filtered data Use Case Fit a linear model to summarize the values from replicate arrays and compare UHRR with Brain Reference by setting up a contrast between these samples Assess array quality using empirical array weights and incorporate these in the final linear model Is there strong evidence of differential expression between these samples gt rna lt factor rep c UHRR Brain
26. arrays from this dataset gt par mfrow c 2 2 S atone Cal aly eS 75 5 19D af poscontPlot data array i main paste sectionNames data il Positive Controls ylim c 4 15 16 616443115 B Positive Co616443081_B Positive Co gt gt rep rep Cc g o o Cc Se Cc h N N oO oy 1 OCOCIOr AOOW OCcOr AlOOwW COOhmOTOMT OM MOROTOMr O0D AAS Steerer AAS Steerer ODODOOOOOO ODODODOOOOO Mowry TOR OO Mor TOR OO Fr OOTFLODOL TF OOSFLOODOOLOD TANOOSFLOLOLOCO TAIQICOSFLOLOLOCO 616443081 _H Positive Co616494005 A Positive Co gt gt i op 02 Cc g DB o Cc T a N N D oo 2 2 OCOCIOr AOOW OMOOr AOOW OhOTOMrT 0COCD OhOTOMTOM AIA wrrr NAA wrrr OODODODOOO0OO OODOODOOO0O Oot Or Oot Or TrOOTLOODOL TrOOTFLOODOL TAN COSLOLOLOC TNACOSFLOLOLOC Provided the annotation of the beadLevelData object has been correctly set the poscontPlot function should be able to identify the revelant probes and intensities This code generates positive controls plots for four arrays in the dataset one good quality array and three that we have noted problems with Z For the good quality array the control probes are highly expressed as expected For the array with poor scanner metrics all the housekeeping beads are lowly expressed but the Bi otin controls are highly expressed indicating successful staining but unsuccessful hybridiza tion For the arrays wi
27. between versions 1 and 2 of the Human gene expression BeadChip we see good concordance between the log ratios from each dataset m A representative probe was selected for each gene on each platform before the two datasets could be combined In this example the probe with the largest average intensity among all the probes belonging to the same gene was selected from each version Use Case Advanced Compare the version 1 and 2 results with those obtained from the Hu man HT 12 version 3 dataset analyzed earlier in this tutorial You will first need to between array normalize the HT 12 data and may find it useful to filter poorly annotated probes and down weight observations for less reliable arrays identified during the quality assessment process Dif ferential expression can be assessed using linear modelling and contrasts as previously shown You will then need to annotate the probes using Entrez Gene IDs How many probes can be matched between all 3 platforms How well does the version 3 data agree with data from earlier BeadChip versions Example code for this final use case has not been provided The reader should be able to complete this exercise independently using knowledge gained from the previous examples given in the vignette There are many other analysis tools available from R or Bioconductor that could be used for downstream analysis on these data For further details see the relevant vignettes or help pages The version of
28. bove code runs the QC pipeline with the default options However many aspects of the expressionQCPipeline function are configurable such as the transformation function applied to the data or the identities of the controls The expressionQCPipeline function will attempt to create graphics and HTML files in the specified directory so it is important that this directory is writable by the user Summarizing the data The summarization procedure takes the beadLevelData object where each probe is replicated a varying number of times on each array and produces a summarized object which is more amenable for making comparisons between arrays For each array section represented in the beadLevelData object all observations are extracted transformed and then grouped together according to their ArrayAddressID Outliers are removed and the mean and standard deviation of the remaining beads are calculated 19 Nok won rk There are many possible choices for the extraction transformation and choice of summary statistics and beadarray allows users to experiment with different options via the definition of an illuminaChannel class For expression data the green intensities will be the quantities to be summarised However the illuminaChannel class is designed to cope with two channel data where the green or red or some combination of the two may be required with minimal changes to the code The summarize function is used to produce summary level
29. btained from an existing Bioconductor package named MAQCsubsetILM gt source http www bioconductor org biocLite R gt biocLite MAQCsubsetILM gt library MAQCsubsetILM 34 ooN DONIANE gt data refA gt data refB gt data refC gt data refD gt gse combine refA refB refC refD gt sites pData gse 2 gt shortlabels substr sampleNames gse 7 8 gt rnasource pData gse 3 gt levels rnasource c UHRR Brain UHRR75 UHRR25 gt boxplot log2 exprs gse col sites 1 names shortlabels las 2 cex names 0 5 ylab expression log 2 intensity outline FALSE ylim c 3 10 main Before batch correction Use Case Remove any differences between labs using the removeBatchEffect function and make a boxplot and MDS plot of the corrected data to assess the effectiveness of this step Next filter out probes with poor annotation and perform a differential expression analysis between the UHRR and Brain Reference samples Annotate the results with the same information obtained in section 2 genome position RefSeq ID Entrez Gene ID Gene symbol and Gene name with an appropriate annotation package Write the results out to file gse batchcorrect lt removeBatchEffect log2 exprs gse batch sites par mfrow c 1 2 oma c 1 0 5 0 2 0 1 boxplot gse batchcorrect col sites 1 names shortlabels las 2 cex names 0 5 ylab
30. ch row is a different probe in the experiment and the columns give different measurements for the gene For each array we record the summarized expression level AVG_Signal standard error of the bead replicates BEAD_STDERR number of beads Avg _NBEADS and a detection p value Detection Pval which estimates the probability of a gene being detected above the background level When exporting this file from BeadStudio the user is able to choose which columns to export e Control probe profile AsuragenMAQC controls txt recommended text file which contains the summarized data for each of the controls on each array which may be useful for diagnostic and calibration purposes Refer to the Illumina documentation for information on what each control measures e targets file optional text file created by the user specifying which sample is hybridized to each array No such file is provided for this dataset however we can extract sample annotation information from the column headings in the sample probe profile Files with normalized intensities those with avg in the name as well as files with one inten sity value per gene files with gene in the name instead of separate intensities for different probes targeting the same transcript are also available in this download We recommend users work with the non normalized probe specific data in their analysis where possible Hlumina s 23 OMDNDATBWNYH bo background correction step
31. combine data from different microarray platforms as well Use Case Use Entrez Gene identifiers to match genes from different BeadChip versions Make a data frame which includes fold changes between UHRR and Brain Reference samples for both datasets for the genes which could be matched between platforms and plot the log fold changes Does expression agree between platforms z lt contr fit is na contr fit genes EntrezID z lt z lorder z genes EntrezID f lt factor z genes EntrezID sel unique lt tapply z Amean f function x x max x sel unique lt unlist sel unique contr fit unique lt z sel unique z lt contr fit2 is na contr fit2 genes EntrezID z lt z lorder z genes EntrezID f lt factor z genes EntrezID sel unique lt tapply z Amean f function x x max x sel unique lt unlist sel unique contr fit2 unique lt z sel unique m lt match contr fit unique genes EntrezID contr fit2 unique genes EntrezID contr fit common lt contr fit unique is na m contr fit2 common lt contr fit2 unique m is na m Jl lfc lt data frame lfc_versioni contr fit2 common coef 1 lfc_version2 contr fit common coef 1 dim lfc options digits 2 gee al sO l VVV V FV VV VV VV VV VV VV VV 36 22 gt plot lfc 1 lfcl 2 xlab version 1 ylab version 2 23 gt abline 0O 1 col 2 Z Even though the probes were redesigned
32. ded co ordinates of the bead centres over the TIFF image we can visually check that the image has been correctly registered however there are other approaches to checking this as will be seen in the next section 11 N aOow kwnr Quality assessment for raw and bead level data Boxplots of intensity values Boxplots are routinely used to assess the dynamic range of each sample and look for unusual signal distributions Use Case Create boxplots of the green channel intensities for all arrays gt boxplot data transFun logGreenChannelTransform col green ylab expression log 2 intensity las 2 outline FALSE main HT 12 MAQC data The boxplot function is a standard function in R that we have extended to work for the beadLevelData class Consequently the standard parameters to boxplot such as changing the title of the plot scale and axis labels are possible some of which are shown in the final five arguments above The help page for the par function provides information on these and other arguments that can be supplied to boxplot The only beadarray specific argument is the second transFun which takes a transformation function of the format shown previously In this case we have selected to use the logs of the green channel which is also the default Z Most arrays have a similar distribution of intensities with a median value of around 5 7 Array 4616443081_B has a lower median and IQR than o
33. dom arrangement of probes on the array surface that is unique for each BeadArray e locs files optional 1 single channel or 2 two color for each section on a BeadChip These are usually named using the convention chipID_section_channel locs The locs file stores the locations of all beads on the array including all those that could not be decoded beads present on the array but not in the text files The locs files are particularly useful for investigating spatial phenomena on the arrays e bab files optional one for each section of a BeadChip These files contain all of the information from the text and locs files stored in a substantially more efficient manner 2 e tiff files optional 1 single channel or 2 two color for each section on a BeadChip These are usually named using the convention chipID_section_channel tif For ex ample 4613710017_B_Grn tif is the Cy3 green image for the sample in position B on BeadChip 4613710017 The Cy5 red files end in _Red tif We refer to these as the raw data as access to these images allows the user to carry out their own image processing and provides access to the background intensities the values stored in the text files have already been background corrected e sdf file optional Illumina s sample descriptor file one per BeadChip e g 4613710017 sdf This file is used to determine the physical properties of a section and which sections to combine for each sample
34. done using the medianBackground function Use Case Calculate a robust measure of background for this array and store it in a new slot GrnRB gt Brob lt medianBackground TIFF cbind xcoords ycoords gt data lt insertBeadData data array 2 what GrnRB Brob The medianBackground function takes two arguments the first of which is the image itself and the second is a two column data frame specifying the bead centre coordinates We then use insertBeadData to add the new values to the existing beadLevelData object Because the presence of extremely low intensity beads is a known issue the authors have provided the medianBackground function within beadarray to perform an alternative local 10 aoow kwnr OAANInow1kWN HE background correction However the same methodology can be used to perform the image processing in any way the user sees fit Use Case Calculate foreground values in the normal way and subtract the median background values to get locally background corrected intensities TIFF2 lt illuminaSharpen TIFF I11F lt illuminaForeground TIFF2 cbind xcoords ycoords data lt insertBeadData data array 2 what GrnF T11F data lt backgroundCorrectSingleSection data array 2 fg GrnF bg GrnRB newName GrnR ae we a NS WA AV We could have chosen a more sophisticated background correction method approximately a hundred beads end up with a negati
35. e already seen The third entry uses one of beadarray s built in transformation functions To view an example of how a transformation function is defined you can enter the name of one of beadarray s pre defined functions without any parentheses or arguments gt logGreenChannelTransform function BLData array x getBeadData BLData array array what Grn return log2 na x lt environment namespace beadarray gt gt In addition to the logGreenChannelTransform function shown above beadarray provides predefined functions for extracting the green intensities on the unlogged scale greenChannelTransform analogous functions for two channel data logRedChannelTransform redChannelTransform and functions for computing the log ratio between channels logRatioTransform Analysis of raw data when the images are available The bead level expression intensity values that Illumina s software provides i e those stored in the txt or bab files are the result of a certain amount of preprocessing and so are not strictly the raw data In most situations these values are sufficient for our use but we may on occasion wish to begin from the image file either to reassure ourselves that there are no concerns or to address a problem that has become manifest It is important then that we understand what preprocessing steps Illumina apply by default These are well documented elsewhere but to summarize a local bac
36. e desirable to mask all beads from these areas from further analysis Our preferred method for doing this is to use BASH 3 which takes local spatial information into account when determining outliers and uses replicates within an array to calculate residuals BASH performs three types of artefact detection in the style of the affymetrix oriented Harsh light 4 package Compact analysis identifies large clusters of outliers where each outlying bead must be an immediate neighbour of another outlier Diffuse analysis finds regions that contain more outliers than would be anticipated by chance and Extended analysis looks for chip wide variation such as a consistent gradient effect The output of BASH is a list containing a variety of data including a list of weights indicat ing the beads that BASH has identified as problematic These weights may be saved to the beadLevelData object for future reference by using the setWeights function The locations of the masked beads can be visualized using the showArrayMask function Use Case Run BASH on the two arrays identified previously to have the most serious spatial artefacts mask the affected beads and visualize the regions that have been excluded How many beads does BASH mask on the two arrays gt BASHoutput lt BASH data array c 8 12 gt data lt setWeights data wts BASHoutput wts array c 8 12 gt head data 8 gt par mfrow c 1 2 gt oie Gi aba e
37. e exprs datasumm 1 10 1 2 parmar C125 i O 2 Ol miron S el 2B boxplot exprs datasumm ylab expression log 2 intensity las 2 outline FALSE boxplot nObservations datasumm ylab number of beads las 2 outline FALSE det lt calculateDetection datasumm head det Detection datasumm lt det VVvVV V FVV VV V The dim function has been extended to report the key dimensions of the data namely the number of probes and samples The expression matrix and associated probe specific variability are returned by lines 2 and 3 However see the note below about the se exprs function The calculateDetection function uses the annotation information stored in the Expression SetIllumina object to identify the negative control and do the detection calculation giving a detection value for each probe on each array Z The dimensions should be reported as 49 576 Features probes and 12 samples and 1 channel The boxplot of the expression matrix should agree with your observations from the bead level data The boxplot showing the distribution of bead numbers used in the calcula tion of the summary values for each array can reveal arrays with significant spatial artefacts 21 arrays 8 and 12 in this case A Pay attention to the scale on the y axis The median level of expression should be some where around 5 to 6 with the lowest values around 4 If the median level is around 2 to 3 you may have logged the
38. e mean and stan dard deviation of all control probes on every array gt quickSummary data array 1 gt qcReport lt makeQCTable data gt head qcReport 1 5 The above code generates a quality control summary for a single array Line 1 then for all arrays in the beadLevelData object using the makeQCTable function Z You should notice that the housekeeping controls are lower for array 7 as we have noted in previous quality assessments Saving control tables to a bead level object The insertSectionData function allows the user to modify the sectionData slot of a beadLevelData object We can use this functionality to store any quality control QC values that we have computed Use Case Store the computed QC values into the bead level data object gt data lt insertSectionData data what BeadLevelQC data qcReport gt names data sectionData The insertSectionData function requires a data frame with the same number of rows as the number of sections in the beadLevelData object The what parameter is used to assign a name to the data in the sectionData slot You could also save the QC table to disk using the write csv function or similar 18 oR WN FR Defining additional control probes beadarray allows flexibility in the way that control reports are generated For instance users are able to define their own control reporters We have observed that certain probes located o
39. eighted in further analyses gt hti2metrics lt read table system file extdata Chips Metrics txt package BeadArrayUseCases sep t header TRUE as is TRUE gt hti2snr lt hti2metrics P95Grn hti2metrics PO5Grn gt labs lt paste hti2metrics 2 ht1i2metrics 3 sep _ gt par mar Cl 5 8 Oc Weil gt plot 1 12 hti2snr pch 19 ylab P95 fe POY Salles SS main Signal to noise ratio for HT12 data axes FALSE frame plot TRUE gt axis 2 gt axis N isil Tabs las 2 The above code uses standard R functions to obtain the P95 and P05 values from the metrics file stored in the package The system file function is a base function that will locate the directory where the BeadarrayUseCases package is installed The plotting commands are just a suggestion of how the data could be presented and could be adapted to individual circum stances Z The SNR for these arrays seems to be over 15 in most cases although there is one excep tion that is below 2 Illumina recommend that this ratio be above 10 and in our experience a value below 2 would be grounds for discarding a sample from further analysis The P95 value for this sample is typical of what we would expect from a blank array with no RNA hybridized Scanner metrics information is equally as useful for sample quality assessment of summa rized data ONnowRWNe mm The P95 and P05 values will f
40. ence from a possible set replicate beads of 48 000 unique sequences on each array A BeadChip comprises of a series of rectangular BeadArray sections on a slide with each section sometimes referred to as a strip containing many thousands of different probes see Figure 1 The naming of a chip is decided by the number of unique hybridizations possible and the version of the probe annotation For example there are six pairs of sections on each Human WG 6 version 2 BeadChip and 12 sections one per sample on a Human HT 12 version 3 BeadChip The high degree of replication makes robust measurements for each probe possible and provides the opportunity to detect and correct for spatial artefacts that occur on the array surface Illumina s gene expression arrays also contain many control probes and one partic ular class of these known as negative controls can be used in the analysis to improve inference Experimental data We make use of three datasets in this vignette summarized in Table 1 to illustrate how data in different formats that have undergone varying degrees of preprocessing can be imported and analyzed using Bioconductor software The first example consists of bead level data from a series of Human HT 12 version 3 Bead Chips hybridized at the Cancer Research UK Cambridge Research Institute Genomics Core facility Bead level refers to the availability of intensity and l
41. eseqs gt GC which is na probeseqs letterFrequency ss letters GC gt hist GC 50 main GC proportion This code requires the Bioconductor Biostrings package which implements efficient string op erations The as list illuminaHumanv2PROBESEQUENCE command is simply a shortcut to return the probe seqeunce for all mapped keys We then convert the sequences into a Biostrings class on which we can use the letterFrequency function to give the desired result Use Case Convert the Humanv2 probe locations into a RangedData object We will illustrate this use case using the GenomicRanges package If you do not have this package you can install it in the usual way gt source http www bioconductor org biocLite R gt biocLite GenomicRanges require GenomicRanges allLocs lt unlist as list illuminaHumanv2GENOMICLOCATION chrs lt unlist lapply allLocs function x strsplit as character x MeO ELSA I CSL YD spos lt as numeric unlist lapply allLocs function x strsplit as character x EDARIA v VVV 31 OANnDWKRWN HE gt epos lt as numeric unlist lapply allLocs function x strsplit as character x Dey ECA es DD dD gt strand lt substr unlist lapply allLocs function x strsplit as character x TTY CCL CAID 5 th 1 gt validPos lt is na spos gt Humanv2RD lt RangedData space chrs validPos ranges IRanges start sposlvalidPos
42. etrieve specific information The first line above uses the operator to access all the data in the sectionData slot which can be quite large and unwieldy The commands below it lines 2 3 are accessor functions for retrieving a specific subset of data from the same slot Line 4 shows that if a beadLevelData object is accessed in the same fashion as a list a data frame containing the bead level data for the specified array is returned To access a specific entry in this data frame we can use a further subset or the data can be accessed using the getBeadData function In addition to the beadLevelData object you need to specify the section array of interest and the column heading you want Extracting transformed data In this example the data stored in the beadLevelData object by readIllumina are extracted directly from the Illumina text files The values in the Grn vector are intensity values inferred from a known location in the scanned image and there are a number of steps involved before these can be translated into quantities that relate to the expression levels The scanner gener ally produces values in the range 0 to 216 1 although the image manipulation and background subtraction steps can lead to values outside this range This is not a convenient scale for visual ization and analysis and it is common to convert intensities onto the approximate range 0 to 16 using a logs transformation possibly after an additional step to adj
43. eved from the QC information that BASH returns Z You should see that the setWeights has added an extra wts column into the beadLevel Data object The positions of the masked beads are indicated in red in the result of showAr rayMask and should agree well with the outlier locations however BASH should have identi fied more beads than straightforward outlier removal may have missed amp Running BASH for this example uses around 2 2 GB of RAM and takes 10 minutes per sample on our computer system If you are running short on time you may wish to skip this exercise The different components of BASH to find compact or diffuse defects plus the extended score analysis can be run separately see the BASHCompact BASHExtended and BASHExtended functions m Try running the BASHExtended function for some of the other arrays in this dataset e g BASHExtended data array 1 You should see extended scores of around 0 1 Removing intensity gradients The Extended score returned by BASH in the previous use case gives an indication of the level of variability across the entire surface of the chip If this value is large it may indicate a significant gradient effect in the intensities The HULK function can be used to smooth out any gradients that are present HULK uses information about neighbouring beads but rather than mask them out as in BASH it adjusts the log intensities by the weighted average of residual values within a local neigh b
44. everal slots The experimentData sectionData and beadData slots can be viewed as a hierarchy with each containing data at a different level Each can be accessed using the operator The experimentData slot holds information that is consistent across the entire dataset Quan tities with one value per array section are stored in the sectionData slot For instance any metrics information read by readIllumina along with section names and the total number of beads will be stored there This is also a convenient place to store any QC information derived during the preprocessing of the data The data extracted from the individual text files are stored in the beadData slot Accessing data in a beadLevelData object Use Case Output the data stored in the sectionData slot and determine the section names and number of bead intensities available from each section Access the intensities x coordinates NOok WN FH and probe IDs for the first 5 beads on the first array section data sectionData sectionNames data numBeads data head data 1 getBeadData data array 1 what Grn 1 5 getBeadData data array 1 what Grnx 1 5 getBeadData data array 1 what ProbeID 1 5 VVVVV VV Using the operator to access the data in particular slots is not particularly convenient or intuitive The functions sectionNames numBeads and getBeadData provide more convenient interfaces to the beadLevelData object to r
45. f disk space and RAM For this example around 1 1 GB of RAM is required to read in and store the data One could process bead level data in smaller batches if memory is limited and then combine at the summary level Selecting the annotation for a dataset If you are unsure of the correct annotation to use and thus left the illuminaAnnotation argument to readI1lumina as the default suggestAnnotation can be employed to identify the platform based on the ArrayAddressIDs that are present in the data The setAnnotation function can then be used to assign this annotation to the dataset Use Case Verify the version number of the dataset that has been read in and set the annotation of the bead level data object accordingly gt suggestAnnotation data verbose TRUE gt annotation data lt Humanv3 Z You should see that the percentage of overlapping IDs is greatest for the Humanv3 plat form t The result of suggestAnnotation only gives guidance about which annotation to use Hence the results may be unpredictable on custom arrays or arrays that are not listed in the suggestAnnotation output The beadLevelData class Once imported bead level data are stored in an object of class beadLevelData This class can handle raw data from both single channel and two color BeadArray platforms gt slotNames data The command above gives us an overview of the structure of the beadLevelData class which is composed of s
46. ion 1 positive controls can be used to identify suspect arrays Negative control probes which measure background signal on each array can be used to assess the proportion of expressed probes that are present in a given sample 6 The propexpr function estimates the proportion of expressed probes by comparing the empirical intensity distribution of the negative control probes with that of the regular probes A mixture model is fitted to the data from each array to infer the intensity distribution of expressed probes and estimate the expressed proportion Use Case Estimate the proportion of probes which are expressed above the level of the negative controls on the MAQC samples using the propexpr function Do you notice a difference between the expressed proportions in the UHRR and Brain Reference RNA samples gt proportion lt propexpr maqc gt proportion gt t test proportion 1 3 proportion 4 6 r Systematic differences exist between different BeadChip versions so these proportions should only be compared within a given platform type 6 This estimator has a variety of 24 are FPOUMOANDOBWNH applications It can be used to distinguish heterogeneous or mixed cell samples from pure samples and to provide a measure of transcriptome size Z The UHRR and Brain Reference samples used in this experiment have a similar propor tion of expressed probes Background correction and normalization A second use for
47. kground value is calculated by taking the mean of the five lowest intensity pixels from a square around the bead A filter is then applied to the image to concentrate the intensities in the centre of beads and foreground values are calculated as a weighted sum of the intensities in a 4 x 4 square around the bead centre It is worth noting that the filter applied to the image a sharpening filter contrasts the value of a pixel with the pixels surrounding it and as such itself could be viewed as a background correction step The final intensity for a bead is then calculated as the foreground value minus the local background value By starting from the image we can adjust any of these steps e g for the background we can adjust the size of the area around the bead or the function applied to it for the foreground we can change the filtering step or adjust the pixel weighting scheme or finally we can use a more sophisticated measure of intensity than foreground minus local background to avoid negative OANowRW NEHE bo values beadarray includes three functions that closely mirror the processing performed by Illumina illuminaBackground illuminaSharpen and illuminaForeground We will illustrate a change to the Illumina process that we recommend if beginning from the image Use Case Identify abnormally low intensity pixels and then plot a section of the image that illus trates the benefit of adjusting the standard analysis
48. l arrays in beadLevelData object even though we may not use the poor quality arrays in the analysis This is just for convenience as each array is summa rized independently so there is no way of the data from a poor quality array to contaminate the data from other arrays and it is simpler to remove outlier arrays from the summarized objects mm For WG 6 arrays which have two sections per biological sample BeadStudio combines the two sections together prior to calculating means and standard deviations It is possible to mimic this behaviour by setting the useSampleFac TRUE argument in summarize This either uses information from the sdf file if present or the value of the sampleFac argument However we recommend summarizing each section separately gt If we wanted to have the un logged and logs intensities in the same summary object we could have used supplied both channels in a list as an argument to summarize datasumm all lt summarize data list grnchannel grnchannel unlogged useSampleFac FALSE t During the summarization process the numeric ArrayAddressIDs used in the beadLevel Data object are converted to the more commonly used Illumina IDs However control probes 20 aro FOOANaATKWNHH retain their original ArrayAddressIDs and any IDs that cannot be converted e g internal controls used by Illumina for which no annotation exists are removed unless removeUnMappedProbes TRUE is specified The ExpressionSe
49. l probes gt boxplot maqc norm E range 0 ylab expression log 2 intensity us las 2 xlab main Regular probes NEQC normalized Z The neqc preprocessed intensities are stored in an EList object in which the control probes have been removed leaving us with 48 701 regular probes In this dataset the intensities are fairly consistent to begin with so calibration normalization and transformation with neqc does not dramatically change the intensities on any array amp Data exported from BeadStudio GenomeStudio may already be normalized however we recommend where possible analyzing the non normalized intensities which can be normal ized in R Recall that there are 6 samples per WG 6 BeadChip Boxplots allow within BeadChip trends such as intensity gradients from top to bottom of the chip to be assessed Differences between BeadChips hybridized at different times may also be expected m The neqc function is also able to accept a matrix as an argument rather than the limma EListRaw class Therefore users who might have read the data using lumi or processed bead 25 level data with beadarray as per section 1 will still be able to use this processing method However the status negctrl and regular arguments will need to be set appropriately The beadarray package includes neqc as an option in its normaliseI1lumina function amp When applying quantile normalization it is assumed that the distributi
50. luctuate over time and are dependant upon the scanner setup Including SNR values for arrays other than those currently being analysed will give a better indication of whether any outlier arrays exist Data import and storage The next step in our analysis is to read the data into R using the readI1lumina function The bead level data you will need for this example are available in the file beadlevelbabfiles zip 133 MB which is located in the extdata BeadLevelBabFiles directory of the BeadArrayUse Cases package These files were produced using the BeadDataPackR package in Bioconductor which provides a more compact representation of bead level data Use Case Read the sample information and bead level data stored in compressed bab format into R library beadarray chipPath lt system file extdata Chips package BeadArrayUseCases list files chipPath sampleSheetFile lt paste chipPath sampleSheet csv Se pee readLines sampleSheetFile data lt readIllumina dir chipPath sampleSheet sampleSheetFile useImages FALSE illuminaAnnotation Humanv3 vv VVVV Usually the directory will be in a known location but for convenience we use the system file function this directory and the sample sheet The section names to be read are then constructed using the contents of the sample sheet The final line executes the reading of the data By default readIllumina will look in the current working directory and
51. mixture proportions were 75 25 and 25 75 hybridized to Human WG 6 version 1 BeadChips We retrieve this experiment from the GEO database using the GEOquery package The goal of each analysis is to find differentially expressed probes between the two distinct RNA samples included in each experiment UHRR and Brain Reference For the differential ex pression analysis we make use of the linear modelling approach available in the limma package Table 1 Summary of the datasets analyzed in the vignette Number of arrays Number of array sections Dataset Type of data Array Generation UHRR Brain UHRR Brain 1 raw bead level HT 12 version 3 6 6 6 6 2 summary BeadStudio WG 6 version 2 3 3 6 6 3 summary GEO WG 6 version 1 15 15 30 30 Note that these data can not be analyzed at the section level raw or bead level data would be required for this 1 Analysis of bead level and raw data using beadarray Raw and bead level data types Reading raw or bead level data into R using the beadarray package requires several files produced by Illumina s scanning software We briefly describe these files below e text files required unless bab files present a text file txt or csv for each section which stores the position identity and intensity of each bead These files are usually named chipID_section txt for arrays from BeadChips e g 4613710017_B txt and are required because of the ran
52. n the Y chromosome are beneficial in discriminating the gender of samples in a population Below we provide an example of how these probes can be incorporated into a QC report This utilizes the fact that internally beadarray uses a very simple matrix to assign ArrayAddressIDs to control types gt cprof lt makeControlProfile Humanv3 gt sexprof lt data frame ArrayAddress c 5270068 1400139 6860102 Tag rep Gender 3 gt cprof lt rbind cprof sexprof gt makeQCTable data controlProfile cprof Line 1 obtains the control information currently in use for the Humanvs platform We then create a new data frame with three rows each representing a probe from chromosome Y and row bind this to the Humanv3 control object The makeQCTable is able to accept the new object as an argument rather than using the default Humanv3 profile Generating QC reports The generation of quality assessment plots for all sections in the beadLevelData object is pro vided by the expressionQCPipeline function Results are generated in a directory of the users choosing This report may be generated at any point of the analysis If the overWrite paramater is set to FALSE then any existing plots in the directory will not be re generated Futhermore QC tables that have been stored in the beadLevelData object already can be used Use Case Generate a QC report for all arrays gt expressionQCPipeline data The a
53. nction is useful for selecting probes that show evidence for differential expression after correcting for multiple testing We also have to define a universe of all possible Entrez IDs in the dataset and Entrez IDs that are significant The GOstats package will then map the Entrez IDs to GO terms for both universe and significant probes and use a hypergeometric test to see if any GO terms appear more often in the significant list than we would expect by chance See the vignette of the GOstats package for more details 33 OANnDwWRWNEH 1 2 3 3 Analysis of public data using GEOquery In this section we show how to retrieve an Illumina MAQC dataset Human WG 6 version 1 from the Gene Expression Omnibus database GEO using the GEOquery package 12 The GEO database is a public repository supporting MIAME compliant data submissions of microarray and sequence based experiments Use Case Read in the MAQC submitted data 1 from the Illumina platform using the getGEO function Extract information about the site each sample was prepared at and the RNA sample hybridized Make a boxplot of the intensities color coded by site to look for systematic differ ences between labs gt library GEOquery gt library limma gt library illuminaHumanvi1 db gt url lt ftp ftp ncbi nih gov pub geo DATA SeriesMatrix GSE5350 gt filenm lt GSE5350 GPL2507_series_matrix txt gz gt download file paste url filenm sep
54. o match quality score after normal ization This approach is similar to the common practice of removing lowly expressed probes but with the additional benefit of discarding probes with a high expression level caused by non specific hybridization The illuminaHumanv2 db package is an example of a Bioconductor annotation package built using infrastructure within the AnnotationDBi package More detailed descriptions of how to access data within annotation packages and how it is stored is given with the AnnotationDbi package Essentially each annotation package comprises a database of mappings between a defined set of microarray identifiers and genomic properties of interest However the user of such packages does not need to know the details of the database scheme as convenient wrapper functions are provided Use Case Retrieve quality information from the Human v2 annotation package and verify that probes annotated as Bad or No match generally have lower signal Exclude such probes from further analysis maqc norm filt lt maqc norm rem dim magc norm dim magce norm filt gt library illuminaHumanv2 db gt illuminaHumanv2 gt ids lt as character rownames maqc norm gt ids2 lt unlist mget ids revmap illuminaHumanv2ARRAYADDRESS ifnotfound NA gt qual lt unlist mget ids2 illuminaHumanv2PROBEQUALITY ifnotfound NA gt table qual gt AveSignal rowMeans maqc norm E
55. ocation information for each bead on each BeadArray in an experiment In this dataset BeadArrays were hybridized with either Universal Human Reference RNA UHRR Stratagene or Brain Reference RNA Am bion as used in the MAQC project 1 Bead level data for all 12 arrays are included in the file beadlevelbabfiles zip which is available as part of the BeadArrayUseCases package These data are in the compressed bab format 2 which can be analysed using the beadarray pack age For one array 4613710052_B we also provide the uncompressed data including the raw TIFF image This allows us to demonstrate the processing options available when pixel level information is at hand The second dataset is the AsuragenMAQC_BeadStudioOutput zip file from Hlumina s website http www switchtoi com datasets ilmn This experiment uses a Human WG 6 version 2 BeadChip and consists of 3 replicates of each of the UHRR and Brain Reference samples These data are available at the summary level as generated by Illumina s BeadStudio software The Bioconductor packages lumi limma and beadarray can all handle data in this format For this dataset we focus on tools available in the limma package The final dataset which is also at the summary level is publicly available from the GEO database accession GSE5350 and was deposited by the MAQC project 1 This experiment included pure UHRR and Brain Reference RNA samples as well as mixtures of these two samples the
56. omparisons between arrays and can prevent minor variations on otherwise perfectly acceptable arrays from being exaggerated By contrast we need to be cautious that the use of zlim may be detrimental if the same limits are not appropriate for each array which could lead to some spatial artefacts being overlooked This is especially the case for our example since these arrays have not been normalized Z White patches on the imageplot are due to beads that could not be or were not decoded for the end user by Illumina As each array is randomly constructed decoding takes place at Illumina before the BeadChips are supplied in order to identify the sequence attached to each bead Beads that could not be decoded are not present in the bead level text files nor do they contribute to Illumina s summary data Hence their intensities are not available for display on the imageplot You should notice obvious spatial artefacts on arrays 8 and 12 4616443081_H and 4616494005_A Plotting the location of outliers Recall that the BeadArray technology includes many replicates typically 20 of each probe type in each sample on an HT 12 array BeadStudio GenomeStudio removes outliers greater than 3 median absolute deviations MADs from the median prior to calculating summary values Use Case Plot the location of outliers on the arrays with the most obvious spatial artefacts and plot their location gt par mfrow c 2 1 gt foe i
57. on in signal should be the same from each array This assumption may be unreasonable in some experiments and should be carefully checked with diagnostic plots Dealing with batch effects Multidimensional scaling MDS assesses sample similarity based on pair wise distances be tween samples This dimension reduction technique uses the top 500 most variable genes between each pair of samples to calculate a matrix of Euclidean distances which are used to generate a 2 dimensional plot Ideally samples should separate based on biological variables RNA source sex treatment etc but often technical effects such as samples processed to gether on the same BeadChip may dominate Principal component analysis PCA is another dimension reduction technique frequently applied to microarray data Use Case Generate a multidimensional scaling MDS plot of the data using the plotMDS func tion Assess whether the samples cluster together by RNA source 1 gt plotMDS maqc norm E Z In this experiment the first dimension separates the UHRR samples from the Brain Ref erence samples The second dimension separates replicate Brain Reference samples indicat ing that these are less consistent than the UHRR samples The scale for dimension 2 is much reduced compared to dimension 1 indicating that the underlying biological differences be tween the two RNA sources explains most of the between sample variation t The plotMDS function accepts
58. ourhood Use Case Run HULK on the first array and replace the original intensities with the gradient ad justed values gt HULKoutput lt HULK data array 1 transFun logGreenChannelTransform gt data lt insertBeadData data array 1 data HULKoutput what GrnHulk Similar to BASH the primary argument to HULK is a beadLevelData object It also takes a transformation function allowing the intensity adjustment to be performed on any data stored within the object 15 N eA WN FR A Typically we would run BASH followed by HULK on a dataset However the order in which one does BASH and HULK could be a topic for research In cases of severe gradients across the array you might get a lot of beads masked at one edge of the array However if HULK were run first these beads might be saved Checking image registration Tools such as BASH and HULK are of no use if the process of generating the image and finding beads as the array is scanned known as registration fails We have previously encountered arrays where the position of the beads within the image was not found correctly 5 resulting in the identity of all beads being scrambled The function checkRegistration can be used to identify chips where such mis registration may have taken place A This functionality is only available in beadarray version 2 3 0 or newer gt registrationScores lt checkRegistration data array c 1 7
59. overlap with the query The indices of the matching probes are given in the matchMatrix slot which can be used to subset the Humanv2 Ranges What next The results of a differential expression analysis are often not the end point of an analysis and there is an increasing desire to relate findings to biological function Although this is beyond the scope of this vignette we will give a brief example of how a Gene Ontology analysis could be performed within Bioconductor There is a wide range of online tools can perform similar analyses of which DAVID and Genetrail seem to be the most popular Use Case Using GOstats find over represented GO terms amongst the probes that show evi dence for differential expression We will illustrate this use case using the GOstats package If you do not have this package you can install it in the usual way gt source http www bioconductor org biocLite R gt biocLite GOstats 32 ra SCO ANDOTKBWNH hgOver hyperGTest params summary hgOver 1 10 gt require GOstats gt universelds lt anno EntrezID gt dTests lt decideTests contr fit gt selectedEntrezIds lt anno EntrezID dTests 1 gt params new GOHyperGParams geneIds selectedEntrezlds universeGeneIds universelIds annotation illuminaHumanv2 ontology BP pvalueCutoff 0 05 conditional FALSE testDirection over gt gt The decideTests fu
60. s Use Case Retrieve the Illumina Humanv2 IDs that are part of the cell cycle according to GO GO 0007049 or KEGG 04110 30 QANE QQ AUUNe gt cellCycleProbesGO lt mget G0 0007049 illuminaHumanv2G02PROBE gt cellCycleProbesKEGG lt mget 04110 illuminaHumanv2PATH2PROBE Use Case Retrieve the Illumina Humanv2 IDs representing the ERBB2 oncogene Which probe seems to be most appropriate for analysis gt queryIDs lt mget ERBB2 revmap illuminaHumanv2SYMBOL gt mget ERBB2 revmap illuminaHumanv2SYMBOL gt mget unlist queryIDs illuminaHumanv2PROBEQUALITY The illuminaHumanv2SYMBOL mapping is defined to map Illumina IDs to probe symbol so if we want to go the opposite way we have to use the revmap function Z There are three probes on the illuminaHumanv 2 platform for ERBB2 All three are classed as Perfect although only one is located at the 3 end of the gene Use Case Calculate the GC content of all Humanv2 probes and plot a histogram We will illustrate this use case using the Biostrings package If you do not have this package you can install it in the usual way gt source http www bioconductor org biocLite R gt biocLite Biostrings gt require Biostrings gt probeseqs lt unlist as list illuminaHumanv2PROBESEQUENCE gt GC vector length length probeseqs gt ss lt BStringSet probeseqs which is na prob
61. tIllumina class Summarized data are stored in an object of type ExpressionSetIllumina which is an extension of the ExpressionSet class developed by the Bioconductor team as a container for data from high throughput assays Objects of this type use a series of slots to store the data For consistency with the definition of other ExpressionSet objects we refer to the expression values as the exprs matrix this stores the probe specific average intensities which can be accessed using exprs and subset in the usual manner The se exprs matrix which stores the probe specific variability can be accessed using se exprs and phenotypic data for the experiment can be accessed using pData To accommodate the unique features of Illumina data we have added an nObservations slot which gives the number of beads that we used to create the summary values for each probe on each array after outlier removal The detection score or detection p value is a standard measure for Illumina expression experiments and can be viewed as an empirical estimate of the p value for the null hypothesis that a particular probe in not expressed for a more complete definition refer to section 2 These can be calculated for summarized data provided that the identity of the negative controls on the array is known using the function calculateDetection Use Case Produce boxplots of the summarized data and calculate detection scores dim datasumm exprs datasumm 1 10 1 2 s
62. th severe artefacts 4616443081_H has a large spread of values for the control probes indicating that much of the array is affected by the artefact On the other hand 4616494005_A shows high expression of the control probes giving hope that this array can be used in further analysis 17 N N Producing control tables With knowledge of which ArrayAddressIDs match different control types we can easily provide summaries of these control types on each array In quickSummary the mean and standard deviation of all control types is calculated for a specified array using intensities of all beads that correspond to the different control types Note that these summaries may not correspond to similar quantities reported in Illumina s BeadStudio GenomeStudio software as the summaries are produced after removing outliers The makeQCTable function extends this functionality to produce a table of summaries for all sections in the beadLevelData object These data can be stored in the sectionData slot for future reference It is also informative to compare the expression level of various control types to the background level of the array This is done by the controlProbeDetection function that returns the percentage of each control type that are significantly expressed above background level For positive controls we would prefer this to be near 100 on a good quality array Use Case Summarize the control intensities for the first array then tabulate th
63. the limma documentation Z The array weights are lowest for the first and second Brain Reference samples which means the observations from these samples will be down weighted slightly in the linear model analysis You will recall that the Brain Reference samples were less consistent than the UHRR samples in the MDS plot the second dimension separated out different replicate Brain Ref erence samples The UHRR and Brain Reference RNA samples are very different and we find many differentially expressed genes between these two conditions in our analysis 29 OANnwRWNEH t The 1mFit function is able to accept a matrix as well as a limma object Hence users with summarised bead level data as created in section 1 can also use this function after extracting the expression matrix using the exprs function The code would look something like fit lt lmFit exprs datasumm design weights aw Annotation Annotating the results of a differential expression analysis The topTable function displays the results of the empirical Bayes analysis alongside the an notation assigned by Illumina to each probe in the linear model fit Often this will not provide sufficient information to infer biological meaning from the results Within Bioconductor an notation packages are available for each of the major Illumina expression array platforms that map the probe sequences designed by Illumina to functional information useful for downstream analysis As
64. the negative controls is in the background correction and normalization of the data 7 The normal exponential convolution model has proven useful in background correction of both Affymetrix 8 and two color data 9 Having a well behaved set of negative controls simplifies the parameter estimation process for background parameters in this model Applying this approach to Illumina gene expression data has been shown to offer improved results in terms of bias variance trade off and reduced false positives 7 The neqc function 7 in limma fits such a convolution model to the intensities from each sample before quantile normalizing and log transforming the data to standardize the signal between samples Use Case Apply the neqc function to calibrate the background level normalize and transform the intensities from each sample Make boxplots of the regular probes and negative control probes before normalization and the regular probes after normalization and assess whether the neqc pro cedure improves the consistency between different samples gt maqc norm lt neqc maqc gt dim maqc norm gt par mfrow c 3 1 gt boxplot log2 maqc E maqc genes Status regular range 0 las 2 xlab ylab expression log 2 intensity main Regular probes gt boxplot log2 maqc E maqc genes Status NEGATIVE range 0 las 2 xlab ylab expression log 2 intensity main Negative contro
65. ther arrays and 4616443081_H has a higher median and IQR Further quality assessment will focus on these arrays and whether they should be excluded from the analysis Visualizing intensities across an array surface The combination of both an intensity and a location for each bead on the array allows us to visualize how the intensities change across the array surface This kind of visualization is not possible when using the summarized output as the summary values are averaged over spatial positions The imageplot function can be used to create false color representations of the array surface Use Case Produce a graphic with imageplots of all arrays in the dataset with each image on the same scale gt par mfrow c 6 2 gt par mar c 1 1 1 1 gt aoe Gi sim sio if imageplot data array i high darkgreen low lightgreen zlim c 4 10 main sectionNames data i 3 The imageplot function has a number of arguments the first two shown above are the object containing the data to be visualized and the index of the desired array The high and low arguments specify the colors representing the extreme values The function automatically interpolates the colors for values in between The use here of zlim ensures that the color 12 oR WN FR range is the same between arrays Any value that falls outside this range will be shown in the same color as these limits This is beneficial for making c
66. ust non positive intensities Although this is simple to do in isolation using R s built in functions it becomes more com plicated within a function leading to a large number of arguments being required in order to specify whether the function should process the green or red channel use foreground or back ground intensities convert to the logs scale etc Even in this situation the user is restricted to the options that are provided by the arguments A more flexible way to obtain transformed per bead data from a beadLevelData object is to define a transformation function that takes as arguments the beadLevelData object and an array index The function then manipulates the data in the desired manner and returns a vector the same length as the number of beads on the array Many of the plotting and quality assessment functions within beadarray take such a function as one of their arguments By using such a system beadarray provides a great deal of flexibility over exactly how the data is analysed N Use Case Extract the green intensities on the logs scale for the first 10 probes from the first array section gt log2 data 1 1 10 Grn gt log2 getBeadData data array 1 what Grn 1 10 gt logGreenChannelTransform data array 1 1 10 The above example shows three different ways of obtaining the log green channel intensity data Lines 1 and 2 use R s log2 function on data extracted using the methods we v
67. ve intensity using a crude subtraction but this suffices to illustrate the point The majority of the beads have an intensity that barely changes and those that do we can attribute to the effect of these problematic pixels Use Case Compare the Illumina intensity with the robust intensity we have just calculated plot the locations of beads whose expressions change substantially and overlay the locations of the implausibly low intensity pixels in red gt oldG lt getBeadData data array 2 Grn gt newG lt getBeadData data array 2 GrnR gt summary oldG newG gt par mfrow c 1 2 gt plot xcoords abs oldG newG gt 50 ycoords abs oldG newG gt 50 pch 16 xlab X ylab Y main entire array gt points col TIFF TIFF lt 400 row TIFF TIFF lt 400 col red pch 16 gt plot xcoords abs oldG newG gt 50 ycoords abs oldG newG gt 50 pch 16 xlim c 1145 1180 ylim c 15500 Ls 15580 xlab X ylab Y main zoomed in gt points col TIFF TIFF lt 400 row TIFF TIFF lt 400 col red pch 16 Of course in practice we would probably save the new intensities in the Grn column of a section in the beadData slot so as not to use more memory than necessary nor to cause confusion later on The image file may also be of value for the purposes of quality control and assessment In particular by plotting the recor

BeadArray expression analysis using Bioconductor

Contents

Download Pdf Manuals

Related Search

Related Contents