
Agilent Feature Extraction 12.0 Reference Guide


Contents

1. Protocol step parameters (Parameter (Protocol Step) | Type | Options | Description):
QCMetrics_Formulation (Calculate Metrics) | integer | 1 = TwoColor, 2 = OneColor, 3 = CGH | The SpikeIn formulation to use for the SpikeIn calculation. Different formulations yield different expected values and different concentration values.
QCMetrics_EnableDyeFlip (Calculate Metrics) | integer | 1 = True, 0 = False | If True (default), the sign of the slope for the SpikeIns plot and its trend is changed when the slope is detected to have the wrong sign. This means the labeling was intentionally flipped and must be flipped back.
QCMetrics_PercentileValueforSignal (Calculate Metrics) | float | | The PercentileIntensitySignal is calculated by the software on the r/g ProcessedSignal and shows the signal at a given percentile over the NonControl features. This parameter is the percentile used for the calculation. By default the value is set to 75, so the software generates the 75th-percentile signal value of the ProcessedSignals for all available channels.
FeatureExtractor_Version | text | | Version of Feature Extractor.
FeatureExtractor_SingleTextFile | integer | Output: 1 = True, 0 = False | True: the three tables (FEParams, Stats and Features) are printed in the same text file. False: each of the three tables is printed in a separate text file.
FeatureExtractor_JPEGDownSampleFactor | float | | Factor by which the image is scaled down and then converted to the JPEG format. Must be at least 2 (1 is no longer allowed).
Feat…
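The PercentileIntensitySignal calculation described for QCMetrics_PercentileValueforSignal can be sketched for one channel as below. This is a minimal illustration, not Feature Extraction's code: the function name is hypothetical, and linear interpolation between ranks is an assumed convention, since the guide does not say how the software interpolates percentiles.

```python
from typing import Sequence


def percentile_intensity_signal(
    processed_signals: Sequence[float],
    is_control: Sequence[bool],
    percentile: float = 75.0,
) -> float:
    """Hypothetical sketch of PercentileIntensitySignal for one channel.

    Keeps only non-control features, then returns the requested percentile of
    their ProcessedSignal, interpolating linearly between ranks (an assumption).
    """
    if not 0.0 <= percentile <= 100.0:
        raise ValueError("percentile must be between 0 and 100")
    values = sorted(s for s, ctrl in zip(processed_signals, is_control) if not ctrl)
    if not values:
        raise ValueError("no non-control features")
    position = (len(values) - 1) * percentile / 100.0
    lower = int(position)
    upper = min(lower + 1, len(values) - 1)
    fraction = position - lower
    return values[lower] + (values[upper] - values[lower]) * fraction
```

For example, `percentile_intensity_signal([10, 20, 30, 40, 50, 1000], [False] * 5 + [True])` returns 40.0: the control feature at 1000 is excluded before the 75th percentile is taken.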
2. CGH_1200_Jun14
This protocol is a CGH protocol for use with the Oligonucleotide Array-Based CGH for Genomic DNA Analysis Enzymatic User Manual (version 6.1 or higher) or the ULS User Manual (version 3.1 or higher).
Table 2 Default settings for CGH_1200_Jun14 protocol (Protocol step | Parameter | Default Setting/Value, v12.0)
Place Grid
Array Format | Automatically Determine. Recognized formats: Single Density (11k, 22k, 25k), Double Density (44k, 95k), 185k, 185k 10 uM, 65 micron feature size (also with 10 micron scans), 30 micron feature size (single pack and multi pack), and Third Party. For any format automatically determined or selected by you, the software uses the default settings. Parameters that apply to specific formats appear only if that format is selected.
Placement Method | Hidden if Array Format is set to Automatically Determine. Allow Some Distortion (all formats).
Enable Background Peak Shifting | Hidden if Array Format is set to Automatically Determine. Set to False for all arrays except 30 micron single pack and multi pack, for which it is set to True.
Table 2 Default settings for CGH_1200_Jun14 protocol (continued)
Use central part of pack for slope and skew calculation | Hidden if Array Format is set to Automatically Determine. Set to False for all arrays except 30 micron single pack…
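The format-dependent Place Grid defaults in Table 2 amount to a small lookup. The sketch below encodes only the Placement Method and Enable Background Peak Shifting rows; the format strings and the function name are hypothetical labels, not identifiers used by Feature Extraction.

```python
from typing import Dict, Optional, Union

# Hypothetical labels for the two 30 micron formats named in Table 2.
THIRTY_MICRON_FORMATS = {"30 micron single pack", "30 micron multi pack"}


def place_grid_defaults(array_format: str) -> Optional[Dict[str, Union[str, bool]]]:
    """Sketch of two Place Grid defaults for CGH_1200_Jun14 (v12.0)."""
    if array_format == "Automatically Determine":
        # The parameters are hidden; the software applies the defaults of the
        # format it detects.
        return None
    return {
        "Placement Method": "Allow Some Distortion",  # all formats
        "Enable Background Peak Shifting": array_format in THIRTY_MICRON_FORMATS,
    }
```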
3. ChIP_QCMT_Jun14 metrics (Metric Name | Within range | Out of range):
IsGoodGrid | > 1 | < 1
g_BGNoise | < 15 | > 15
r_BGNoise | < 15 | > 15
AnyColorPrcntFeatNonU…, g_SignalIntensity, r_SignalIntensity, g_Signal2Noise, r_Signal2Noise, DerivativeLR_Spread (thresholds not shown)
Figure 44 QC Metrics for ChIP_QCMT_Jun14 metric set
GE1_QCMT_Jun14 metrics (Metric Name | Within range | Out of range):
IsGoodGrid | > 1 | < 1
AnyColorPrcntFeatNonU… | < 1 | > 1
gNegCtrlAveNetSig | < 40 | > 40
gNegCtrlAveBGSubSig | -10 to 5 | < -10 or > 5
gNegCtrlSDevBGSubSig | < 10 | > 10
gSpatialDetrendRMSFilte… | < 15 | > 15
gNonCntrlMedCVProcSi… | 0 to 8 | < 0 or > 8
gE1aMedCVProcSignal | 0 to 8 | < 0 or > 8
absGE1E1aSlope | 0.90 to 1.20 | < 0.90 or > 1.20
DetectionLimit | 0.01 to 2 | < 0.01 or > 2
Figure 45 QC Metrics for GE1_QCMT_Jun14 metric set
GE2_QCMT_Jun14 and miRNA_QCMT_Jun14 (Figure 47) metrics (Metric Name | Within range | Out of range):
IsGoodGrid | > 1 | < 1
AnyColorPrcntFeatNonU… | < 1 | > 1
gNegCtrlAveBGSubSig | -20 to 10 | < -20 or > 10
gNegCtrlSDevBGSubSig | < 15 | > 15
rNegCtrlAveBGSubSig | -20 to 4 | < -20 or > 4
rNegCtrlSDevBGSubSig | < 6 | > 6
gNonCntrlMedCVBkSub… | 0 to 18 | < 0 or > 18
rNonCntrlMedCVBkSubS… | 0 to 18 | < 0 or > 18
gE1aMedCVBkSubSignal | 0 to 18 | < 0 or > 18
rE1aMedCVBkSubSignal | 0 to 18 | < 0 or > 18
absE1aObsVsExpCorr | > 0.86 | < 0.86
abs…
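Checking a metric value against a two-level threshold range, as the QCMT metric sets do, can be sketched as follows. The bounds are a subset of the GE1_QCMT_Jun14 entries, reading the extracted "10 to 5" range for gNegCtrlAveBGSubSig as -10 to 5 (the lower bound must lie below the upper bound). The dictionary layout, the function name and the "Within range" label are hypothetical, and treating boundary values as within range is an assumption.

```python
from typing import Dict, Optional, Tuple

Bounds = Tuple[Optional[float], Optional[float]]  # (low, high); None = unbounded

# Hypothetical layout, not Feature Extraction's metric-set file format.
GE1_QCMT_JUN14: Dict[str, Bounds] = {
    "gNegCtrlAveNetSig": (None, 40.0),
    "gNegCtrlAveBGSubSig": (-10.0, 5.0),
    "gNegCtrlSDevBGSubSig": (None, 10.0),
    "gE1aMedCVProcSignal": (0.0, 8.0),
    "absGE1E1aSlope": (0.90, 1.20),
    "DetectionLimit": (0.01, 2.0),
}


def evaluate_metric(name: str, value: float,
                    thresholds: Dict[str, Bounds] = GE1_QCMT_JUN14) -> str:
    """Return 'Within range' or 'Evaluate' for a two-level metric."""
    low, high = thresholds[name]
    if (low is not None and value < low) or (high is not None and value > high):
        return "Evaluate"
    return "Within range"
```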
4. [Figure 69 plot panels: "Fit the BGSubSig 2nd Order Polynomial" and "Zoom In of plot on left".] Figure 69 The effect of multiplicative detrending across array features. A second-order polynomial is fit to the higher signals on the array, resulting in a subtle shape fit. This fit results in the ProcessedSignal having a better fit to the data than the BGSubSignal.
An option also exists in the 2-color gene expression protocols to detrend only on replicate signals. The algorithm normalizes the replicates, fits the surface to the normalized replicates, and then uses the fit to detrend the data.
Because the multiplicative trend can be confused with the additive trend for dim microarrays, data points that lie within a multiple of the standard deviation from the center of the negative control signal population are excluded.
The equations for the statistics and results produced by this calculation are shown in the following table. See Table 32, "Algorithms, Protocol Steps and the results they produce," on page 230 for descriptions of these results.
Table 36 Statistics and Results for Multiplicative Detrending
gMultDetrendRMSFit (Equation 21): $\mathrm{gMultDetrendRMSFit} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{MDS}_i - \overline{\mathrm{MDS}}\right)^2}$, where $\mathrm{MDS}_i$ is the MultDetrendSignal of feature $i$ and $\overline{\mathrm{MDS}}$ is its average over the $N$ features.
gMultDetrendSignal (Equation 22): based on $\mathrm{Fitted}(\log_{10}\mathrm{BGSubSignal}_i)$, the value of the fitted surface for feature $i$.
gProcessedSignal: $\mathrm{gProcessedSignal}_i = \mathrm{BGSubSignal}_i / \mathrm{MultDetrendSignal}_i$
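Multiplicative detrending can be sketched as a quadratic-surface fit in log space. This is a simplified illustration under stated assumptions, not Agilent's algorithm: it fits a second-order polynomial in the feature X/Y positions to log10(BGSubSignal) of all positive features (the guide restricts the fit to the higher signals and excludes points near the negative-control population, which this sketch omits), rescales the fitted trend to a mean of 1 to stand in for MultDetrendSignal, and divides BGSubSignal by it. `mult_detrend` and its helpers are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _terms(x: float, y: float) -> List[float]:
    # Second-order polynomial surface terms: 1, x, y, x^2, xy, y^2
    return [1.0, x, y, x * x, x * y, y * y]


def mult_detrend(xs: Sequence[float], ys: Sequence[float],
                 bg_sub: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (MultDetrendSignal-like trend with mean 1, detrended signals)."""
    # Least-squares fit via normal equations on positive signals only (log10 needs > 0).
    ata = [[0.0] * 6 for _ in range(6)]
    atb = [0.0] * 6
    for x, y, s in zip(xs, ys, bg_sub):
        if s <= 0:
            continue
        t = _terms(x, y)
        z = math.log10(s)
        for i in range(6):
            atb[i] += t[i] * z
            for j in range(6):
                ata[i][j] += t[i] * t[j]
    coef = _solve(ata, atb)
    trend = [10 ** sum(c * t for c, t in zip(coef, _terms(x, y))) for x, y in zip(xs, ys)]
    mean_trend = sum(trend) / len(trend)
    mds = [t / mean_trend for t in trend]  # rescaled so overall intensity is preserved
    processed = [s / m for s, m in zip(bg_sub, mds)]
    return mds, processed
```

With real coordinates in microns, centering and scaling x and y before the fit would keep the normal equations well conditioned.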
5. [Figure 14 panels: Spatial Distribution of Significantly Up-Regulated and Down-Regulated Features (local background inliers); LogRatio versus Log ProcessedSignal, with legend entries Significantly down regulated, Significantly up regulated, Used to normalize, and Not differentially expressed.] Figure 14 Non-Agilent GE2 QC Report p2
QC reports with metric sets added
When metric sets are associated with the protocols, QC reports are generated with an additional set of evaluation metrics. Depending on the microarray type, some QC metric sets come with thresholds (denoted by QCMT) and some without thresholds (denoted by QCM). If thresholds are included in the metric set, the evaluation tables in the QC report show the metrics that are within threshold ranges and those that have exceeded those ranges.
Agilent has determined which of the FE Stats are good metrics to follow the processing of Agilent arrays. Most of the chosen metrics are useful for determining whether there are problems in the various laboratory steps (labeling, hybridization, wash and scan). The new IsGoodGrid metric tracks the automatic grid finding of Feature Extraction. By looking at numerous data run on our arrays using our we…
6. (LogRatio, continued) If SURROGATES are turned off, then: -4 if DyeNormRedSig < 0.0 & DyeNormGreenSig > 0.0; 4 if DyeNormRedSig > 0.0 & DyeNormGreenSig < 0.0; 0 if DyeNormRedSig < 0.0 & DyeNormGreenSig < 0.0.
X_IMAGE_POSITION, Y_IMAGE_POSITION | float | Found coordinates of the feature centroid, in microns.
Table 30 Feature results (Compact) contained in the MAGE-ML FEATURES table, continued (Quant Type | Features Green | Features Red | Description):
Error | LogRatioError | | If SURROGATES are turned off, then 1000 if DyeNormRedSig < 0.0 OR DyeNormGreenSig < 0.0. If SURROGATES are turned on, then LogRatioError is the error of the log ratio calculated according to the error model chosen.
PValue | PValueLogRatio | | Significance level of the LogRatio computed for a feature.
Derived Signal | Green DerivedSignal | Red DerivedSignal | The propagated feature signal per channel, used for computation of the log ratio.
Error | Green ProcessedSigError | Red ProcessedSigError | Standard error of the propagated feature signal per channel.
Measured Signal | Green MeasuredSignal | Red MeasuredSignal | Raw mean signal of the feature in the green/red channel.
SQT | gMedianSignal | rMedianSignal | Raw median signal of the feature in the green/red channel.
SQT | gBGMedianSignal | rBGMedianSignal | Median local background signal local to the corresponding feature, computed per channel.
Error | Green BGPixSDev | Red…
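With surrogates turned off, the LogRatio and LogRatioError rules from Table 30 can be written as two small functions. Assumptions: -4 is used for a negative red / positive green feature (the mirror of the +4 rule), LogRatio otherwise is log10(red/green) of the dye-normalized signals, and a zero signal, which the table does not cover, is rejected. The function names are hypothetical.

```python
import math


def log_ratio_no_surrogates(dye_norm_red: float, dye_norm_green: float) -> float:
    """LogRatio when SURROGATES are off (sketch of the Table 30 rules)."""
    if dye_norm_red < 0.0 and dye_norm_green > 0.0:
        return -4.0
    if dye_norm_red > 0.0 and dye_norm_green < 0.0:
        return 4.0
    if dye_norm_red < 0.0 and dye_norm_green < 0.0:
        return 0.0
    if dye_norm_red == 0.0 or dye_norm_green == 0.0:
        raise ValueError("zero signal: case not specified in the table")
    return math.log10(dye_norm_red / dye_norm_green)


def log_ratio_error_no_surrogates(dye_norm_red: float, dye_norm_green: float,
                                  model_error: float) -> float:
    """LogRatioError when SURROGATES are off: 1000 if either signal is negative."""
    if dye_norm_red < 0.0 or dye_norm_green < 0.0:
        return 1000.0
    return model_error
```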
7. Protocol step parameters, continued (Parameter (Protocol Step) | Type | Options | Description):
OutlierFlagger_FeatCCoeff2 (Flag Outliers) | float | | Feature Green Signal Constant Term Multiplier
OutlierFlagger_BGBCoeff (Flag Outliers) | float | | Background Red Poissonian Noise Term Multiplier
OutlierFlagger_BGCCoeff (Flag Outliers) | float | | Background Red Signal Constant Term Multiplier
OutlierFlagger_BGBCoeff2 (Flag Outliers) | float | | Background Green Poissonian Noise Term Multiplier
OutlierFlagger_BGCCoeff2 (Flag Outliers) | float | | Background Green Signal Constant Term Multiplier
OutlierFlagger_PopnOLOn (Flag Outliers) | integer | 1 = True: population outlier flagging turned on; 0 = False: population outlier flagging turned off |
OutlierFlagger_MinPopulation (Flag Outliers) | integer | | Minimum number of replicates to turn on population outlier flagging.
OutlierFlagger_IQRatio (Flag Outliers) | float | | The boundary conditions for conducting box plot analysis to isolate population outliers.
OutlierFlagger_BackgroundIQRatio (Flag Outliers) | float | | The boundary conditions for conducting box plot analysis to isolate population outliers for the background.
OutlierFlagger_UseQtest (Flag Outliers) | integer | 1 = True, 0 = False | Enables Q-test statistics when the number of replicates for population outliers is greater than 2 and less than the minimum population specified in the outlier section of the protocol.
OutlierFlagger_UsePopnOLInMAGE (Flag Outliers) | integer | 1 = True | Indicates whether to report population outliers as Failed in MA…
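The "Otest" parameter in the extracted text is read here as a Q-test, that is, Dixon's Q test for a single outlier in a small sample; that reading, the 90% confidence level, and the function name are assumptions, since the guide does not state which variant or critical values Feature Extraction uses. The sketch applies the standard test to replicate signals.

```python
from typing import List, Optional

# Standard two-sided Dixon Q critical values at 90% confidence, n = 3..10.
Q90 = {3: 0.941, 4: 0.765, 5: 0.642, 6: 0.560, 7: 0.507, 8: 0.468, 9: 0.437, 10: 0.412}


def dixon_q_outlier(values: List[float]) -> Optional[float]:
    """Return the most extreme replicate if Dixon's Q test flags it, else None."""
    n = len(values)
    if n not in Q90:
        raise ValueError("critical values tabulated for 3 to 10 replicates only")
    data = sorted(values)
    spread = data[-1] - data[0]
    if spread == 0:
        return None
    q_low = (data[1] - data[0]) / spread      # gap of the smallest value
    q_high = (data[-1] - data[-2]) / spread   # gap of the largest value
    if q_high >= q_low:
        return data[-1] if q_high > Q90[n] else None
    return data[0] if q_low > Q90[n] else None
```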
8. … WellAbove Background or not. The feature passes g/rIsPosAndSignif and, additionally, the g/rBGSubSignal is greater than 2.6 times g/rBGSDUsed.
• Set to true for a given feature if it is part of the filtered set used to detrend the background. This feature is considered part of the locally weighted lowest x% of features, as defined by the DetrendLowPassPercentage.
• Value of the smoothed surface calculated by the Spatial detrend algorithm.
• A Boolean used to flag features used for computation of the global BG offset.
• Background used to subtract from the MeanSignal; this variable is also used in the t-test. To display the values used to calculate this variable using different background signals and settings of spatial detrend and global background adjust, see Table 34 on page 254.
SQT: Specialized Quantitation Type
Table for Compact Output Package
This table contains only those columns required by Resolver, GeneSpring, CGH Analytics and ChIP Analytics. In the Compact version of the MAGE-ML file, the entire FEPARAMS section is included. MAGE-ML has a rich mechanism for describing protocols and protocol parameters.
Table 30 Feature results (Compact) contained in the MAGE-ML FEATURES table (Quant Type | Features Green | Features Red | Description):
Ratio | LogRatio | | Base 10 log(REDsignal/GREENsignal) per feature (processed signals are used to calculate the log ratio).
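The IsWellAboveBG rule stated above combines the IsPosAndSignif t-test flag with a fixed multiple of the background standard deviation. A one-function sketch (hypothetical name, one channel):

```python
def is_well_above_bg(is_pos_and_signif: bool, bg_sub_signal: float,
                     bg_sd_used: float, multiplier: float = 2.6) -> bool:
    """g/rIsWellAboveBG: passes IsPosAndSignif and BGSubSignal > 2.6 * BGSDUsed."""
    return is_pos_and_signif and bg_sub_signal > multiplier * bg_sd_used
```

A feature with BGSubSignal 30 and BGSDUsed 10 passes (30 > 26); one with BGSubSignal 25 does not, even if its t-test flag is set.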
9. Figure 50 Histogram of a 30 micron feature array image. The X axis corresponds to the pixel value and the Y axis to the frequency of occurrence (Frequency Red, Frequency Green).
Figure 51 Zoomed-in section of Figure 50. The background peaks are at 32 for the red channel and 50 for the green channel.
Figure 52 Histogram of a 30 micron feature array image after Background Peak Shifting.
Figure 53 Zoomed-in section of Figure 52. Note the peaks at pixel value 0. Also note the dips in the frequency of values near the pixel value of 32 for the red channel and 50 for the green channel.
When the Use central part of pack for slope and skew calculation flag is set to True, the gridding algorithm is modified to use the central region of the pack, instead of the edges of the packs, to obtain the slope, skew and origin of each pack. This enables the algorithm to correctly place the grid for arrays that have edges populated with dim spots.
When the Use the c…
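Background Peak Shifting moves the background histogram peak to pixel value 0, as Figures 51 and 53 show. The sketch below is only an illustration under assumptions not stated in the guide: the background peak is taken as the most frequent pixel value at or below a hypothetical `search_max`, every pixel is shifted down by that amount, and negative results are clamped to 0. It does not attempt to reproduce the dips visible in Figure 53.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def shift_background_peak(pixels: Sequence[int],
                          search_max: int = 200) -> Tuple[List[int], int]:
    """Return (shifted pixels, detected peak value) for one channel."""
    low = [p for p in pixels if p <= search_max]
    if not low:
        return list(pixels), 0
    peak = Counter(low).most_common(1)[0][0]  # assumed background peak
    return [max(p - peak, 0) for p in pixels], peak
```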
10. Figure 57 Example of the radius for the second-closest set of nearest neighbors (24 nearest neighbors), or n = 2.
Step 6. Reject outliers
The calculation to determine the boundaries for rejection of the outlier pixels is defined in the following equations and diagram.
Assumptions for default value of 1.42
The following assumptions lead to the default value of 1.42 for this parameter:
• A normal distribution for pixel intensity, where the y axis corresponds to pixel frequency and the x axis corresponds to pixel intensity.
• A 99% confidence interval that the pixels of interest are contained within the boundaries for rejection.
The Interquartile Range (IQR) is the range of points under a Gaussian distribution contained between the 25th percentile mark (25% of the points are contained under the curve from the zero point to the 25th percentile mark) and the 75th percentile mark. The 50th percentile mark coincides with the median of the curve. The boundary for rejection is the point on the x axis beyond which all pixels are rejected. D is the distance between the mean of the curve and the boundary for rejection.
Calculations of default value
The following calculations are based on the above assumptions:
• If a pixel is located within the 99…
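Under the two stated assumptions (normal pixel intensities, two-sided 99% interval), the multiplier k that places the rejection boundary at Q3 + k x IQR can be computed directly. This is a reconstruction from the assumptions, not the guide's own arithmetic, which is cut off in this excerpt: it lands near 1.41, slightly below the documented default of 1.42, so the guide presumably rounds some intermediate value differently.

```python
from statistics import NormalDist


def iqr_rejection_multiplier(confidence: float = 0.99) -> float:
    """k such that Q3 + k * IQR equals the two-sided confidence boundary of a normal."""
    std_normal = NormalDist()
    d = std_normal.inv_cdf(0.5 + confidence / 2.0)  # D in units of sigma (about 2.576)
    z_q3 = std_normal.inv_cdf(0.75)                 # 75th percentile (about 0.6745 sigma)
    iqr = 2.0 * z_q3                                # about 1.349 sigma
    return (d - z_q3) / iqr
```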
11. A feature is a population outlier if its signal is less than a lower threshold or exceeds an upper threshold, determined using a multiplier (1.42) times the interquartile range (IQR) of the population.
SQT | gIsBGPopnOL | rIsBGPopnOL | 1 indicates the local background is a population outlier in g/r | The same concept as above, but for the local background.
SQT | gBGSubSignal | rBGSubSignal | | Background-subtracted signal: gBGSubSignal = gMeanSignal - gBGUsed. To display the values used to calculate this variable using different background signals and settings of spatial detrend and global background adjust, see Table 34 on page 254.
Table 30 Feature results (Compact) contained in the MAGE-ML FEATURES table, continued (Quant Type | Features Green | Features Red | Options | Description):
SQT | IsManualFlag | | | Boolean flag that describes whether the feature centroid was manually adjusted.
SQT | gIsPosAndSignif | rIsPosAndSignif | 1 indicates the feature is positive and significant above background | Boolean flag established via a 2-sided t-test; indicates whether the mean signal of a feature is greater than the corresponding background (selected by the user) and whether this difference is significant. To display the variables used in the t-test, see Table 34 on page 254.
SQT | gIsWellAboveBG | rIsWellAboveBG | | Boolean flag indicating whether a feature is WellAbove Background or not. Feature passes g/r…
12. … Hidden if Array Format is set to Automatically Determine; 8.0 for all formats except Third Party, for which it is set to 1.5.
Calculation of Spot Statistics (each parameter below is hidden if Array Format is set to Automatically Determine)
Method | Use Cookie (all formats)
Cookie Percentage | 0.650 Single Density 25k; 0.561 Double Density 95k; 0.700 185k, 185k 10 uM, 244k 10 uM, 65 micron feature size; 0.750 30 micron feature size
Exclusion Zone Percentage | 1.200 all formats except 30 micron feature size; 1.300 30 micron feature size
Auto Estimate the Local Radius | True: Single Density, Double Density (25k, 95k); False: 185k, 185k 10 uM, 65 micron feature size, 30 micron feature size, 244k 10 uM
LocalBGRadius | 100 when False for 185k, 185k 10 uM, 65 micron feature size, 244k 10 uM…
Table 4 Default settings for GE1_1200_Jun14 protocol, continued (Protocol step | Parameter | Default Setting/Value, v12.0)
Pixel Outlier Rejection | Method; RejectIQRFeat; RejectIQRBG
Statistical Method for Spot Values from Pixels
Flag Outliers | Compute Population Outliers; Minimum Population; IQRatio; Background I…
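The format-dependent Calculation of Spot Statistics defaults for GE1_1200_Jun14 can be expressed as a lookup. The format keys are hypothetical strings chosen to match the names in Table 4, and the function name is illustrative only.

```python
from typing import Dict, Union

# Cookie Percentage defaults transcribed from Table 4 (GE1_1200_Jun14, v12.0).
COOKIE_PERCENTAGE: Dict[str, float] = {
    "Single Density 25k": 0.650,
    "Double Density 95k": 0.561,
    "185k": 0.700,
    "185k 10 uM": 0.700,
    "244k 10 uM": 0.700,
    "65 micron feature size": 0.700,
    "30 micron feature size": 0.750,
}


def spot_statistics_defaults(array_format: str) -> Dict[str, Union[str, float, bool]]:
    """Sketch of the per-format Calculation of Spot Statistics defaults."""
    thirty_micron = array_format == "30 micron feature size"
    return {
        "Method": "Use Cookie",  # all formats
        "Cookie Percentage": COOKIE_PERCENTAGE[array_format],
        "Exclusion Zone Percentage": 1.300 if thirty_micron else 1.200,
        "Auto Estimate the Local Radius": array_format in ("Single Density 25k",
                                                           "Double Density 95k"),
    }
```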
13. [Figure residue: annotated 2-color Gene Expression QC Report with panels for Features (Positive and Negative Log Ratios), LogRatio versus Log ProcessedSignal, Reproducibility (CV for Replicated Probes), Agilent SpikeIns, and Ratio of Signal to Background, with callouts to Foreground Surface Fit (page 97), Plot of LogRatio vs Log ProcessedSignal (page 101), Reproducibility Statistics and CV Replicated Probes (page 104), Microarray Uniformity (2 color only, page 106), and Sensitivity.]
14. Spot Finding for Four Corners
Outlier Stats
If the QC Report shows a greater-than-expected number of nonuniform or population outliers, check your hybridization and wash steps. Also check the visual results (.shp file) to see whether the spot centroids are off center. If the grid was not placed correctly, a new grid is required.
[Figure 19 table: Feature and Local Background outlier counts (Red, Green) for Non Uniform and Population outliers.] Figure 19 QC Report Outlier Stats
For 1-color reports, the number of outliers is reported for the green channel only.
Spatial Distribution of All Outliers
The QC report shows two plots of all the outliers (both population and nonuniformity outliers) and their positions across the microarray. One plot is for the green channel and the other for the red channel. SNP probes are included. To distinguish the background, population and nonuniform outliers from one another, look at the color coding at the bottom of the two plots. For the 1-color report, only the green plot is shown.
[Figure 20 plot: Spatial Distribution of All Outliers on the Array (105 rows x 215 columns), with legend entries FeatureNonUnif, GeneNonUnif, BG NonUniform, BG Population, FeaturePopulation and Feature NonUniform for the red and green channels.] Figure 20 QC Repo…
15. … We recommend checking the QC Report, the image and the grid before using this data. (Type: …Metrics)
XML error codes (Error code | Error message | Type | Abort):
3000 | The Median percent CV of the replicated probes is very high. We recommend checking the QC Report, the image and the grid before using this data. | Grid Metrics | No
4000 | Algorithm Error: This means that Poly Outlier flagger had a problem. Several possible error messages can be generated here, but they all happen in Outlier Flagging. | Data Processing | Yes
4000 | SpotAnalyzer: Not enough pixels for good pixels statistics. Try adjusting the protocol. Try turning off pixel outlier rejection. | Data Processing | Yes
4000 | Execution error: DyeNorm: No normalization file selected. The selected Protocol requests use of a Dye Norm list during Dye Normalization, but a Dye Norm List was not supplied either by external file or by GridTemplate default. | Data Processing | Yes
Table 41 XML error codes (continued)
4000 | NRC Error: a or b too big, or MAXIT too small in betacf. Note: this error can be generated in Dye Normalization or in Background Subtraction. The error code will be either 4050 or 4012 as a result. | Data Processing | Yes
4000 | Execution error: DyeNorm: Need a 2-color scan to do dye normalization. | Data Processing | Yes
4000 | Execution error: DyeNorm: There are not enough features to perform dye… | Data Processing | Yes
16. … and Third Party.
Table 3 Default settings for ChIP_1200_Jun14 protocol, continued (Protocol step | Parameter | Default Setting/Value, v12.0)
Find Spots: Depending on the format selected by the software or by you, the default settings for this step change. See the following rows for the default values for finding spots.
Iteratively Adjust Corners | Hidden if Array Format is set to Automatically Determine. True for all formats except Third Party; False for Third Party.
Adjustment Threshold | Hidden if Array Format is set to Automatically Determine. 0.300 for all formats except Third Party.
Maximum Number of Iterations | Hidden if Array Format is set to Automatically Determine. 5 for all formats except Third Party.
Found Spot Threshold | Hidden if Array Format is set to Automatically Determine. 0.200 for all formats except Third Party.
Number of Corner Feature Side Dimension | Hidden if Array Format is set to Automatically Determine. 20 for all formats except Third Party.
Spot Format | Automatically Determine. Recognized formats: the same as those listed above, except that 244k 10uM replaces 65 micron feature size (10 micron scans).
Use the Nominal Diameter from the Grid Template | Hidden if Array Format is set to Automatically Determine. True for all formats.
Spot Deviation Limit | Hidden if Array Format is set to Automatically Determine…
17. … calculated by the Spatial detrend algorithm. These points are considered to be in the background for the purposes of spatial detrending and multiplicative detrending. If the Boolean value is true for a given point, the point is used in spatial detrending and not in multiplicative detrending.
• Depends on parameters.
• Diameter of the spot (X axis).
• Diameter of the spot (Y axis).
• MeanSignal minus DarkOffset.
• This signal is the robust average of all the processed green signals for each replicated probe, multiplied by the total number of probe replicates, the EffectiveFeatureSizeFraction, the Nominal Spot Area and the Weight. Used for miRNA analyses.
• This error is the robust average of all the processed green signal errors for each replicated probe, multiplied by the total number of probe replicates, the EffectiveFeatureSizeFraction, the Nominal Spot Area and the Weight. Used for miRNA analyses.
Table 22 Feature results contained in the FULL output text file (FULL FEATURES table, continued; Features Green | Features Red | Types | Options | Description). Columns in this part of the table: gTotalGeneSignal, gTotalGeneError, gIsGeneDetected, gMultDetrendSignal, gProcessedBackground, gProcessedBkngError, IsUsedBGAdjust, gInterpolatedNegCtrlSub, gIsInNegCtrlRange, gIsUsedInMD; rMultDetrendSignal, rProcessedBackground, rProcessedBkngError, rInterpolatedNegCtr…
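A literal reading of the miRNA total-signal description (robust average of the processed signals, multiplied by the number of replicates, the EffectiveFeatureSizeFraction, the Nominal Spot Area and the Weight) is sketched below. The median stands in for Feature Extraction's robust average, which is an assumption because the estimator is not named here, and the function name is hypothetical.

```python
from statistics import median
from typing import Sequence


def total_signal_sketch(processed_signals: Sequence[float], n_replicates: int,
                        effective_feature_size_fraction: float,
                        nominal_spot_area: float, weight: float) -> float:
    """Product form of the Table 22 description (hypothetical helper)."""
    robust_average = median(processed_signals)  # assumed robust estimator
    return (robust_average * n_replicates * effective_feature_size_fraction
            * nominal_spot_area * weight)
```

For replicate signals [10, 12, 11, 100] the median is 11.5, so with 4 replicates, a size fraction of 0.5, a spot area of 2.0 and a weight of 1.0 the sketch returns 46.0; the single high replicate barely moves the result, which is the point of using a robust average.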
18. • Some Feature Extraction parameter names (FE PARAMS table) have been changed to accommodate Rosetta Resolver terminology.
• The MAGE result file includes all information included in the FEATURES table except for annotations, deletion control information and spot size information.
• Feature results (FEATURES table) are associated with quantitation types as defined by the Object Management Group in its Gene Expression Specification paper of February 2003 (V 1). These types are: Measured Signal, Derived Signal, Ratio, Confidence Indicators (error and p-value), and Specialized Quantitation Type (SQT), which includes all other data.
Full and Compact Output Packages
In the Properties sheet for the project, you can select whether the MAGE-ML result file contains all the possible columns and results (Full) or a reduced set of results (Compact).
MAGE-ML files can also be compressed before they are sent via FTP. Compressed MAGE-ML files are smaller, which decreases the transfer time. Use both Compact and Compressed MAGE-ML files for Resolver.
The Compact package contains only those columns required by Resolver, GeneSpring, CGH Analytics and ChIP Analytics. In the Compact version of the MAGE-ML file, the entire FEPARAMS section is included. MAGE-ML has a rich mechanism for describing protocols and protocol parameters.
Tables for Ful…
19. … (Type: float, float)
• Number of non-control features with negative background-subtracted signals.
• Global dye norm factor.
• The root mean square of the average lowess dye norm factor. The lowess dye norm factor for each feature is its DyeNormSignal divided by its BGSubSignal.
• Dimensionless RMS correction metric: a metric that indicates how much correction has been applied based upon the LOWESS curve.
• Unit-weighted RMS correction metric: a metric that indicates how much correction has been applied based upon the LOWESS curve.
• Root mean square (RMS) of the fitted data points obtained from the Loess algorithm. This gives an idea of the curvature of the surface fit.
• Approximate residual from the surface fit.
• Normalized area: the fitted surface area divided by the projected area on the microarray. This also gives an idea of the curvature of the surface gradient.
• Sum of the intensities of the surface area minus the offset. The offset is calculated as the volume under the flat surface parallel to the glass slide passing through the minimum intensity point of the fitted surface. This number (total volume minus offset) is normalized by the area of the microarray and describes the average intensity of the surface gradient.
Table 21 Stats results contained in the text output file (STATS table, continued; Stats Green Channel | Stats Red Channel | Type | Description)
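Two of the STATS descriptions above translate into short calculations. The sketch reads "root mean square of the lowess dye norm factor" as the RMS over features of DyeNormSignal / BGSubSignal, and evaluates the surface-volume metric on a fitted surface sampled on a grid where each cell counts as unit area. Both readings and both function names are assumptions for illustration.

```python
import math
from typing import Sequence


def rms_lowess_dye_norm_factor(dye_norm_signals: Sequence[float],
                               bg_sub_signals: Sequence[float]) -> float:
    """RMS of per-feature factors DyeNormSignal / BGSubSignal (BGSubSignal must be nonzero)."""
    factors = [d / b for d, b in zip(dye_norm_signals, bg_sub_signals)]
    return math.sqrt(sum(f * f for f in factors) / len(factors))


def surface_average_intensity(surface: Sequence[Sequence[float]]) -> float:
    """(Total volume - offset) / area for a surface sampled on a unit grid.

    The volume is the sum of sampled intensities; the offset is the volume under
    a flat plane through the minimum intensity (minimum times number of cells).
    """
    values = [v for row in surface for v in row]
    offset = min(values) * len(values)
    return (sum(values) - offset) / len(values)
```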
20. (Stats, continued; Green | Red | Type | Description)
… | rLocalBGInlierNetAve | float | The average of the net signal of all inlier local backgrounds.
… | rLocalBGInlierAve | float | The average of all inlier local backgrounds.
… | rLocalBGInlierSDev | float | The standard deviation of all inlier local backgrounds.
… | rLocalBGInlierNum | integer | The number of inlier local backgrounds.
… | rGlobalBGInlierAve | float | The average of all inliers used in background estimation for the selected global background subtraction method, or the average of all inlier local backgrounds if the local background subtraction method is selected (after global background adjustment is applied, if selected).
gGlobalBGInlierSDev | rGlobalBGInlierSDev | float | The standard deviation of all inliers used in background estimation for the selected global background subtraction method, or the standard deviation of all inlier local backgrounds if the local background subtraction method is selected.
gGlobalBGInlierNum | rGlobalBGInlierNum | integer | The number of all inliers used in background estimation for the selected global background subtraction method, or the number of all inlier local backgrounds if the local background subtraction method is selected.
gNumFeatureNonUnifOL | rNumFeatureNonUnifOL | integer | The number of features that are flagged as…
gNumPopnOL | rNumPopnOL | integer |
gNumNonUnifBGOL | rNumNonUnifBGOL | integer |
gNumPopnBGOL | rNumPopnBGOL | integer |
gOffsetUsed | rOffsetUsed | float |
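The LocalBGInlier statistics are summary values over the inlier local backgrounds. A minimal sketch for one channel, assuming the "net" signal is the background minus a dark offset (by analogy with "MeanSignal minus DarkOffset" elsewhere in the tables) and using the sample standard deviation; both choices and the function name are assumptions.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def local_bg_inlier_stats(local_bg: Sequence[float], is_inlier: Sequence[bool],
                          dark_offset: float = 0.0) -> Dict[str, float]:
    """Summary statistics of inlier local backgrounds (needs at least 2 inliers)."""
    inliers = [b for b, ok in zip(local_bg, is_inlier) if ok]
    return {
        "LocalBGInlierAve": mean(inliers),
        "LocalBGInlierNetAve": mean([b - dark_offset for b in inliers]),  # assumed net definition
        "LocalBGInlierSDev": stdev(inliers),  # sample SD (assumption)
        "LocalBGInlierNum": len(inliers),
    }
```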
21. … if DyeNormRedSig < 0.0 & DyeNormGreenSig < 0.0.
LogRatioError: If SURROGATES are turned off, then 1000 if DyeNormRedSig < 0.0 OR DyeNormGreenSig < 0.0. If SURROGATES are turned on, then LogRatioError is the error of the log ratio calculated according to the error model chosen.
PValueLogRatio: Significance level of the LogRatio computed for a feature.
… The g/r surrogate value used; No surrogate value used.
Table 22 Feature results contained in the FULL output text file, FULL FEATURES table continued (Features Green | Features Red | Types | Options | Description):
gIsFound | rIsFound | boolean | 1 = IsFound, 0 = IsNotFound | A Boolean used to flag found features. The flag is applied independently in each channel. A feature is considered Found if two conditions are true: (1) the difference between the feature signal and the local background signal is more than 1.5 times the local background noise, and (2) the spot diameter is at least 0.30 times the nominal spot diameter.
gProcessedSignal | rProcessedSignal | float | | The signal left after all the Feature Extraction processing steps have been completed. In the case of one color, ProcessedSignal contains the Multiplicatively Detrended BackgroundSubtractedSignal if the detrending is selected and helps. If the detrending does not help, this column contains the BackgroundSubtractedSignal.
gProcessedSigError | rProcessedSigE…
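The two IsFound conditions can be checked per channel as below. The 1.5 and 0.30 factors come from the table; the function and parameter names are hypothetical.

```python
def is_found(feature_signal: float, local_bg_signal: float, local_bg_noise: float,
             spot_diameter: float, nominal_diameter: float,
             noise_multiplier: float = 1.5, min_diameter_fraction: float = 0.30) -> bool:
    """g/rIsFound for one channel: enough signal above background and a large enough spot."""
    signal_ok = (feature_signal - local_bg_signal) > noise_multiplier * local_bg_noise
    size_ok = spot_diameter >= min_diameter_fraction * nominal_diameter
    return signal_ok and size_ok
```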
22. [Table residue: threshold patterns for QC metric evaluation, written with limits v1 to v4 (for example "v2 to v3" and "v1 to v2 or v3 to v4") and the categories Excellent, Good and Evaluate, including NA special cases when v1 = v2, when v1 = v2 and v3 = v4, and when v3 = v4. Rows 12 to 18 cover 3-level metrics that may be used in FEv10.7 and 3-level metrics that are asymmetric (supported but not normally used in FEv10.7).]
3 Text File Parameters and Results (FEPARAMS table, STATS table, FEATURES table)
Parameters options (FEPARAMS) 129
  FULL FEPARAMS Table 129
  COMPACT FEPARAMS Table 150
  QC FEPARAMS Table 153
  MINIMAL FEPARAMS Table 156
Statistical results (STATS) 159
  STATS Table (ALL text output types) 159
Feature results (FEATURES) 178
  FULL Features Table 178
  COMPACT Features Table 189
  QC Features Table 194
  MINIMA…
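One plausible reading of the symmetric 3-level rows (Excellent for "v2 to v3", Good for "v1 to v2 or v3 to v4", Evaluate otherwise) is sketched below. The boundary handling and the function name are assumptions; the garbled table does not settle whether the limits are inclusive.

```python
def evaluate_three_level(value: float, v1: float, v2: float, v3: float, v4: float) -> str:
    """Classify a symmetric 3-level metric with limits v1 <= v2 <= v3 <= v4."""
    if not v1 <= v2 <= v3 <= v4:
        raise ValueError("expected v1 <= v2 <= v3 <= v4")
    if v2 <= value <= v3:
        return "Excellent"
    if v1 <= value <= v4:
        return "Good"
    return "Evaluate"
```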
23. (Default protocol settings, continued; Protocol step | Parameter | Default Setting/Value, v12.0)
Place Grid
Array Format | Automatically Determine. Recognized formats: Single Density (11k, 22k, 25k), Double Density (44k, 95k), 185k, 185k 10 uM, 65 micron feature size (also with 10 micron scans), 30 micron feature size (single pack and multi pack), and Third Party. For any format automatically determined or selected by you, the software uses the default settings. Parameters that apply to specific formats appear only if that format is selected.
Placement Method | Hidden if Array Format is set to Automatically Determine. Allow Some Distortion (all formats).
Enable Background Peak Shifting | Hidden if Array Format is set to Automatically Determine. Set to False for all arrays except 30 micron single pack and multi pack, for which it is set to True.
Use central part of pack for slope and skew calculation | Hidden if Array Format is set to Automatically Determine. Set to False for all arrays except 30 micron single pack and multi pack, for which it is set to True.
Use the correlation method to obtain origin X of subgrids | Hidden if Array Format is set to Automatically Determine. Set to False for all arrays except 30 micron single pack and multi pack, for which it is set to True.
Optimize Grid Fit
Grid Format | The parameters and values for optimizing the grid differ depending on the format. Automatically Determine. Recognized formats: 65 micron feature size, 30 micron feature size…
24. [Figure 11 panels: Spatial Distribution of All Outliers on the Array (192 rows x 82 columns); report header (Grid, BG Method, Background Detrend, Multiplicative Detrend, Additive Error, Saturation Value); Net Signal Statistics for non-control probes; Negative Control Stats; Histogram of Signals Plot (Log of BGSubSignal).] Figure 11 MicroRNA (miRNA) QC Report p1
[QC Report Results, Page 2 of 2: Foreground Surface Fit (page 97), Spatial Distribution of Median Signals for each Row, Reproducibility (CV for Replicated Probes) and Reproducibility Statistics.]
25. [Figure 5: 1-color Gene Expression QC Report with Spike-ins, p2. Panels: Column Median BGSub Signal; Column Median Proc Signal; Column Median Signal.]
[QC Report Results, page 3 of 3. Panels: Agilent SpikeIns CV of Avg Processed Signal Plot; Agilent SpikeIns (Log10 Median ProcessedSignal vs. Log Relative Concentration); Agilent Spike-In Concentration Response Statistics (Linear Range Statistics); Evaluation Metrics (GE1 QC metric set). Callouts: 14 Reproducibility plot for 1-color gene expression spike-in probes (page 109); 15 Spike-in Linearity Check for 1-color Gene Expression (page 114); 16 QC Metric Set Results (page 122); 17 Table of Values for Concentration Response Plot, 1-color only (page 115).]
26. 8.0 for all formats except Third Party, for which it is set to 1.5.
Table 3 Default settings for ChIP_1200_Jun14 protocol (continued)
Calculation of Spot Statistics
Method: Hidden if Array Format is set to Automatically Determine. Use Cookie (all formats).
Cookie Percentage: Hidden if Array Format is set to Automatically Determine. 0.650 for Single Density 25k; 0.561 for Double Density 95k; 0.700 for 185k, 185k 10 uM, 244k 10 uM, and 65 micron feature size; 0.750 for 30 micron feature size.
Exclusion Zone Percentage: Hidden if Array Format is set to Automatically Determine. 1.200 for all formats except 30 micron feature size; 1.300 for 30 micron feature size.
Auto Estimate the Local Radius: Hidden if Array Format is set to Automatically Determine. True for Single Density and Double Density (25k, 95k); False for 185k, 185k 10 uM, 65 micron feature size, 30 micron feature size, and 244k 10 uM.
LocalBGRadius: Hidden if Array Format is set to Automatically Determine. 100 when Auto Estimate is False for 185k, 185k 10 uM, 65 micron feature size, and 244k 10 uM; 150 when False for 30 micron feature size.
Pixel Outlier Rejection Method: Inter Quartile Range (Automatically Determine and all formats).
27. Default is False (0).
Find Spots, SpotAnalysis_PixelSkewCookiePct (float; 0.00 to 1.00, default 0.70): The percentage of the feature that is used when calculating the pixel skew. The default value of 0.70 means 70% of the radius of the feature.
Find Spots, SpotAnalysis_CentroidDiff (integer; 1 = True, 0 = False): The software computes the per-feature centroid difference between the grid position and the spot center.
Find Spots, SpotAnalysis_NozzleAdjust (integer; 1 = True, 0 = False): The software attempts to adjust a nozzle group to compensate for variations in printing.
Flag Outliers, OutlierFlagger_Version (text): Version of the Outlier Flagger algorithm.
Flag Outliers, OutlierFlagger_NonUnifOLOn (integer): 1 = True, NonUniformity outlier flagging turned on; 0 = False, NonUniformity outlier flagging turned off.
Flag Outliers, OutlierFlagger_FeatATerm (float): Applies to features; specifies the intensity-dependent variance and is set to the square of the CV.
Flag Outliers, OutlierFlagger_FeatBTerm (float): Applies to features; specifies the variance due to Poisson-distributed noise.
Table 17 List of parameters and options contained within the FULL text output file FEPARAMS table (continued)
Flag Outliers, OutlierFlagger_FeatCTerm
Flag Outliers, OutlierFlagger_BGATerm
Flag Outliers, OutlierFlagger_BGBTerm
Flag Outliers, OutlierFlagger_BG
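The A and B terms above describe an expected-variance model for a uniform feature. The following is a minimal sketch of how such terms could combine. It assumes the expected pixel variance is the polynomial A·S² + B·S + C in the feature signal S, with C taken as the OutlierFlagger_FeatCTerm constant (its description is not shown in this excerpt). It also assumes a feature is flagged when its measured pixel variance simply exceeds that expectation. Both the function names and the flagging rule are illustrative and are not Agilent's implementation.

```python
def expected_variance(signal: float, a_term: float, b_term: float, c_term: float) -> float:
    """Expected pixel variance of a uniform feature (assumed polynomial model).

    a_term: intensity-dependent term (square of the CV), as for OutlierFlagger_FeatATerm
    b_term: Poisson-noise term, as for OutlierFlagger_FeatBTerm
    c_term: constant term (assumed meaning of OutlierFlagger_FeatCTerm)
    """
    return a_term * signal ** 2 + b_term * signal + c_term


def is_nonuniform(measured_variance: float, signal: float,
                  a_term: float, b_term: float, c_term: float) -> bool:
    # Hypothetical rule: flag when the measured variance exceeds the expectation.
    return measured_variance > expected_variance(signal, a_term, b_term, c_term)


# Example with made-up term values: CV = 0.1, so the A term is 0.01.
print(expected_variance(1000.0, 0.01, 1.0, 25.0))  # 11025.0
```

The real algorithm applies a statistical threshold rather than a bare comparison. The point here is only how the three terms scale with signal.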
28. See "Step 27. Calculate the p-value and error on log ratio of feature (PValueLogRatio and LogRatioError)" on page 279 of this guide.
Table 39 Summary: Use of surrogates for calculations
Case 1 (R, G): Both channels use DyeNorm signals. P-value and log ratio are calculated as usual.
Case 2 (r, G): r = rSurrogateUsed, G = gDyeNormSignal. P-value and log ratio are calculated as usual, except that if r/G > 1, Feature Extraction automatically sets LogRatio = 0 and PValueLogRatio = 1.
Case 3 (R, g): R = rDyeNormSignal, g = gSurrogateUsed. P-value and log ratio are calculated as usual, except that if R/g < 1, Feature Extraction automatically sets LogRatio = 0 and PValueLogRatio = 1.
Case 4 (r, g): Both channels use surrogates. Feature Extraction automatically sets LogRatio = 0 and PValueLogRatio = 1.
For signals not using surrogates: g(r)ProcessedSignal = g(r)DyeNormSignal, which is then used to calculate the log ratio.
For signals using surrogates: g(r)ProcessedSignal = g(r)SurrogateUsed × g(r)DyeNormFactors.
Data from the FEATURES Table
6 Command Line Feature Extraction
Commands 299
Return Codes 305
Extraction Input 307
Extraction Results 312
The command line version of Feature Extraction software is called FeNoWindows. You can
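The four cases in Table 39 reduce to a small decision rule. The following is a minimal sketch. The function names are hypothetical, and the caller supplies the r/R and g/G values exactly as Table 39 defines them. The usual log ratio and p-value are passed in rather than recomputed, because their calculation is described elsewhere (page 279). The sketch only applies the surrogate overrides and the ProcessedSignal rule.

```python
from typing import Optional, Tuple


def processed_signal(dye_norm_signal: float,
                     surrogate_used: Optional[float] = None,
                     dye_norm_factor: float = 1.0) -> float:
    """ProcessedSignal per Table 39: SurrogateUsed x DyeNormFactor when a
    surrogate is used, otherwise the DyeNormSignal itself."""
    if surrogate_used is not None:
        return surrogate_used * dye_norm_factor
    return dye_norm_signal


def apply_surrogate_rules(r_signal: float, g_signal: float,
                          r_is_surrogate: bool, g_is_surrogate: bool,
                          usual_log_ratio: float,
                          usual_p_value: float) -> Tuple[float, float]:
    """Return (LogRatio, PValueLogRatio) following the four cases of Table 39."""
    override = (0.0, 1.0)  # LogRatio = 0, PValueLogRatio = 1
    if r_is_surrogate and g_is_surrogate:               # Case 4
        return override
    if r_is_surrogate and r_signal / g_signal > 1:      # Case 2, r/G > 1
        return override
    if g_is_surrogate and r_signal / g_signal < 1:      # Case 3, R/g < 1
        return override
    return usual_log_ratio, usual_p_value               # Case 1, or no override
```

The overrides make sense because a surrogate only bounds the true signal. A ratio whose direction depends on that bound is reported as "no change" with p = 1 rather than as a spurious call.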
29. FeNoWindows c addmetricset <metricset_file_path>
metricset_file_path: The path and name of a metric set file.
adddyenormlist: This command adds a dye norm list to the database.
FeNoWindows c adddyenormlist g gridtemplatename <dyenormlist_file_path>
gridtemplatename: The name of the database grid template that the probes in the dye norm list must match.
dyenormlist_file_path: The path and name of the dye norm list.
The dye norm list needs to look like this:
ProbeName1 GeneName1 SystematicName1
ProbeName2 GeneName2 SystematicName2
ProbeName3 GeneName3 SystematicName3
The separator between words must be a tab, and no white space is allowed at the end of the file. When a list is read into the database, it is checked against the specified grid template to make sure that the probes match what is in the grid template. The basename of the file is used to name the dye norm list in the database.
Example: c adddyenormlist g 14850_D_F_20060807 C:\DyeNormlist\MyNormlist.txt
removegrid: This command removes a grid from the database.
FeNoWindows c removegrid <gridname>
gridname: The name of the grid.
removeprotocol: This command removes a protocol from the database.
FeNoWindows c removeprotocol <protocol_name>
protocol_name: The path to the protocol file.
removemetricset: This command removes a metric set from the database.
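Because adddyenormlist rejects lists that break the formatting rules, it can help to check a file before loading it. The following is a minimal sketch that applies only the rules stated above: three tab-separated fields per line, no trailing white space at the end of the file, and every probe present in the grid template. The `grid_probe_names` set is a hypothetical stand-in for the grid template's probe list. Treating a final newline as forbidden trailing white space is an interpretation.

```python
from __future__ import annotations


def validate_dye_norm_list(text: str, grid_probe_names: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the text passed.

    Checks: ProbeName, GeneName, SystematicName separated by tabs;
    no white space at the end of the file; probe names found in the
    (hypothetical) set of grid template probes.
    """
    problems = []
    if text != text.rstrip():
        problems.append("white space at end of file")
    for line_no, line in enumerate(text.rstrip().split("\n"), start=1):
        fields = line.split("\t")
        if len(fields) != 3:
            problems.append(f"line {line_no}: expected 3 tab-separated fields")
            continue
        if fields[0] not in grid_probe_names:
            problems.append(f"line {line_no}: probe {fields[0]!r} not in grid template")
    return problems
```

For example, a space-separated line such as `P1 G1 S1` is reported as having the wrong number of fields, because only tabs count as separators.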
30. Fit 97
Multiplicative Surface Fit 99
Spatial Distribution of Significantly Up-Regulated and Down-Regulated Features (Positive and Negative Log Ratios) 100
Plot of LogRatio vs Log ProcessedSignal 101
Spatial Distribution of Median Signals for each Row and Column 102
Histogram of LogRatio plot 103
Inter Feature Statistics 104
Reproducibility Statistics (CV Replicated Probes) 104
Microarray Uniformity (2-color only) 106
Sensitivity 107
Reproducibility Plots 108
Spike-in Signal Statistics 111
Spike-in Linearity Check for 2-color Gene Expression 113
Spike-in Linearity Check for 1-color Gene Expression 114
QC Report Results in the FEPARAMS and Stats Tables 121
QC Metric Set Results 122
CGH_QCMT_Jun14 122
ChIP_QCMT_Jun14 123
GE1_QCMT_Jun14 123
GE2_QCMT_Jun14 124
miRNA_QCMT_Jun14 124
Metric Evaluation Logic 125
3 Text File Parameters and Results 127
Parameters/options (FEPARAMS) 129
FULL FEPARAMS Table 129
COMPACT FEPARAMS Table 150
QC FEPARAMS Table 153
MINIMAL FEPARAMS Table 156
Statistical results (STATS) 159
STATS Table (ALL text output types) 159
Feature results (FEATURES) 178
FULL Features Table 178
COMPACT Features Table 189
QC Features Table 194
MINIMAL Features Table 200
Other text result file annotations 204
4 MAGE-ML (XML) File Results 205
How Agilent output file formats are used by databases 206
MAGE-ML results 207
Differences between MAGE-ML and tex
31. BGPixNormIQR for BGPixSDev, where NormIQR = 0.7413 × IQR. The program does not make these substitutions for the Feature NonUniformity Outlier algorithm. See the previous page for the definition of the Interquartile Range (IQR).
MeanSignal
$$\text{MeanSignal} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
where n is the number of inlier pixels (that is, NumPix) and X_i is the intensity of pixel i in the feature. The numbers of pixels removed as outliers at the high end and low end of the intensity distribution are shown in four columns of the FEATURES table: NumPixOLLo and NumPixOLHi for both the red and green channels.
Step 8. Calculate the mean signal of the local background (BGMeanSignal)
The intensities of the local background inlier pixels are averaged to give the local background mean signal. The BGNumPix column in the result file lists the number of inlier pixels in the local background radius that remain after rejection of outlier pixels.
$$\text{BGMeanSignal} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
where n is the number of inlier pixels in the local background (that is, BGNumPix) and X_i is the intensity of pixel i in the local background.
Step 9. Determine if the feature is saturated (IsSaturated)
A feature is saturated if 50% of its inlier pixels have intensity values above the saturation threshold.
Flag Outliers
$\sigma_m^2$ is the measured variance of inlier pixels in the feature or background, e.g., PixSDev² or BGPix
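The quantities in these steps are short to compute once the inlier pixels are known. The constant 0.7413 is 1/1.349. For normally distributed data the IQR is about 1.349 σ, so 0.7413 × IQR estimates a standard deviation that resists outliers. The following is a minimal sketch. The quartile convention ("inclusive") and treating the 50% saturation boundary as inclusive are assumptions, because the guide does not state either. The pixel outlier rejection step itself is not reproduced.

```python
import statistics


def norm_iqr(values):
    """Normalized interquartile range: 0.7413 x IQR (a robust SD estimate).

    Uses the 'inclusive' quartile method, an assumed convention.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return 0.7413 * (q3 - q1)


def mean_signal(inlier_pixels):
    """MeanSignal (or BGMeanSignal): average of the inlier pixel intensities."""
    return sum(inlier_pixels) / len(inlier_pixels)


def is_saturated(inlier_pixels, saturation_value):
    """IsSaturated: 50% of inlier pixels above the saturation threshold.

    Counting exactly 50% as saturated is an assumption.
    """
    above = sum(1 for x in inlier_pixels if x > saturation_value)
    return above >= 0.5 * len(inlier_pixels)
```

For pixels `[1, 2, 3, 4, 5]`, the quartiles are 2 and 4, so `norm_iqr` returns 0.7413 × 2 = 1.4826, and `mean_signal` returns 3.0.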
32. Info dialog box on page 222.
You can transfer the original TIFF file or a JPEG file to Rosetta Resolver or a third-party program. The shape file (.shp) created during Feature Extraction cannot be displayed by any program other than Agilent Feature Extraction software.
TIFF file format options
Feature Extraction supports the TIFF file format. All file information for each file is listed in the File Info dialog box. The TIFF file is compliant with the Adobe TIFF version 6.0 file format. The complete specification is available from the following URL: http://partners.adobe.com/asn/developer/PDFS/TN/TIFF6.pdf
There are two sets of custom TIFF tags in the Agilent file format.
Genetic Analysis Technology Consortium (GATC) TIFF Tags
Agilent Technologies is not a member of GATC or otherwise connected to this organization, and makes no internal use of these tags. They are included for the convenience of customers who use software that requires them.
Custom TIFF Tags
Agilent Technologies uses its own custom TIFF tags for storing additional file information.
TIFF Tag 37701: This tag points to a data structure. This data structure is not public, but the information stored in it is available to customers in the MATLAB file format.
TIFF Tag 37702: This tag points to a string containing the file description.
The usual TIFF description tag (tag 270) is used to hold the color name (red or green) for each image. This allows programs
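You can inspect what a scan image stores in tag 270 and in the custom string tag 37702 by walking the first image file directory (IFD) yourself. The following is a minimal standard-library sketch. It assumes a classic (non-BigTIFF) file and reads only ASCII-type (type 2) entries in the first IFD. The function name is hypothetical. A multi-image file (for example, one image per color) would also require following the next-IFD offset, which this sketch does not do.

```python
import struct


def read_ascii_tags(data: bytes, wanted=(270, 37702)) -> dict:
    """Return {tag: str} for ASCII (type 2) entries of the first IFD.

    Handles classic TIFF only (magic number 42), in either byte order.
    """
    order = {b"II": "<", b"MM": ">"}[data[:2]]      # little or big endian
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    result = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i              # each IFD entry is 12 bytes
        entry = data[start:start + 12]
        tag, field_type, n = struct.unpack(order + "HHI", entry[:8])
        if tag not in wanted or field_type != 2:
            continue
        if n <= 4:                                   # short strings are stored inline
            raw = entry[8:8 + n]
        else:                                        # longer strings live at an offset
            (offset,) = struct.unpack(order + "I", entry[8:12])
            raw = data[offset:offset + n]
        result[tag] = raw.rstrip(b"\x00").decode("ascii", errors="replace")
    return result
```

Usage: `read_ascii_tags(open("scan.tif", "rb").read())` returns a dictionary such as `{270: "red", ...}`. The file name is a placeholder. Production code should use a full TIFF library.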
33. [Figure 31: Histogram of LogRatio plot (x axis: Log Ratio).]
Inter Feature Statistics
Spike-in probes are known probes that are hybridized with known quantities of a target spike-in cocktail. They are used to perform a quality check of the microarray experiment.
Some microarray designs have replicated non-control probes; that is, multiple features on the microarray contain the same probe sequence. Many of the Agilent microarray designs also have spike-in probes, which are replicated across the microarray; for example, some microarrays have 10 sequences with 30 replicates each. The QC Report uses these replicated probes to evaluate the reproducibility of both the signals and the log ratios. Metrics such as signal CV and log ratio statistics are calculated if probes are present with a minimum number of replicates. The protocol indicates whether labeled target for these spike-in probes has been added in the hybridization (QCMetrics_UseSpikeIns). The minimum number of replicates (inliers to Sat & NonUnif flagging) is also set in the protocol (QCMetrics_minReplicatePopulation).
This section explains each of the segments of the QC report that cover inter-feature statistics, and how these replicate statistics can help you assess performance.
Reproducibility Statistics (CV Replicated Probes)
Non-control probes
If a non-control probe has a minimum numb
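The "Median CV Signal (inliers)" statistic summarizes reproducibility across replicated probes. The following is a minimal sketch under stated assumptions. Each probe's CV is the sample standard deviation divided by the mean of its inlier signals, expressed as a percent. Only probes with at least `min_replicates` inlier replicates contribute, and the median is taken over those probes. The default of 10 for `min_replicates` is a placeholder for QCMetrics_minReplicatePopulation and is not Agilent's value.

```python
import statistics
from collections import defaultdict


def median_cv_percent(features, min_replicates=10):
    """Median CV (%) of signal across replicated probes, or None if no probe qualifies.

    features: iterable of (probe_name, signal, is_inlier) tuples, where
    is_inlier means the feature passed saturation and non-uniformity flagging.
    min_replicates stands in for QCMetrics_minReplicatePopulation (assumed default).
    """
    signals_by_probe = defaultdict(list)
    for probe, signal, is_inlier in features:
        if is_inlier:
            signals_by_probe[probe].append(signal)
    needed = max(min_replicates, 2)  # a sample SD needs at least two values
    cvs = [100.0 * statistics.stdev(s) / statistics.mean(s)
           for s in signals_by_probe.values() if len(s) >= needed]
    return statistics.median(cvs) if cvs else None
```

Using the median rather than the mean keeps a few badly behaved probes from dominating the summary.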
34. Multiplier Background CV 2 Red Poissonian Noise Term Multiplier Red Background Constant Term Multiplier Green Poissonian Noise Term Multiplier Green Background Constant Term Multiplier 1 20 0 09000 3
Compute Bkgd, Bias, and Error
Background Subtraction Method: No Background Subtraction
Significance for IsPosAndSignif and IsWellAboveBG: Use Error Model for Significance
2-sided t-test of feature vs. background, max p-value: 0.01
WellAboveMulti: 13
Signal Correction
Calculate Surface Fit (required for Spatial Detrend): True
Feature Set for Surface Fit: FeaturesInNegativeControlRange
Perform Filtering for Surface Fit: True
Perform Spatial Detrending: True
Adjust Background Globally: False
Perform Multiplicative Detrending: True
Detrend on Replicates Only: True
Table 5 Default settings for GE2_1200_Jun14 protocol (continued)
Correct Dye Biases, Compute Ratios, and Calculate Metrics parameters: Filter Low signal probes from Fit; Neg Ctrl Threshold; Mult Detrend Factor; Perform Filtering for Fit; Robust Neg Ctrl Stats; Choose universal error or most conservative; MultErrorGreen; MultErrorRed; Auto Estimate Add Error Red; Auto Estimate Add Error Green; Use Surrogates; Use Dye Norm List; Dye Normalization
35. [Figure 15: Partial QC Report Header and Evaluation Metrics with GE2 metric set with thresholds added (default protocol settings). Panels: report header (Multiplicative Detrend, Dye Norm, Linear Dye Norm Factor, Additive Error, Saturation Value); Evaluation Metrics for GE2_QCMT_Jul11 (columns: Metric Name, Value, Excellent, Good, Evaluate).]
QC metric set results
Spatial and Multiplicative Detrending Off
Figure 16 is an example of a QC report header and Evaluation Metrics table generated from a 2-color gene expression extraction to which the GE2 metric set with thresholds was added. In this extraction, spatial and multiplicative detren
36. normalization. All features designated for use in dye normalization are not fit to be used. These features may be controls, outliers, or contain bad probe sequences.
4000 (Data Processing; Abort: No): There appears to be a large shift x,x pixels between the two scans in red/green. Comes up if scans from an XDR pair are not aligned.
4000 (Data Processing; Abort: Yes): Execution Error BGSub: BGSub Error Message.
4000 (Data Processing; Abort: No): Found Feature %d %d %d with 0 pixels used to calculate mean. Dubious Significance.
4006 (Data Processing; Abort: Yes): SpotAnalyzer: The background Radius, either calculated or specified, is either smaller than a single feature or larger than the scan. Check the specified BGRadius or the Col and Row Spot Spacing of the Grid.
4007 (Data Processing; Abort: Yes): SpotAnalyzer: Given the current background Radius, either calculated or specified, the region of interests for computing spot Statistics have no pixels. Please check the Background Radius in the Protocol.
Table 41 XML error codes (continued)
4015 (Data Processing; Abort: Yes): The select Protocol requests use of a Dye Norm list during Dye Normalization, but a Dye Norm List was not supplied either by external file or by GridTemplate default.
5000 (I/O Error; Abort: No): Execution error: Cannot Open file, etc.
5000 (I/O Error; Abort: No): Print Failure
5000 (I/O Error): Execution error: Failed to generate a
37. [Figure: Non-Agilent GE2 QC Report, p1. Panels: Net Signal Statistics (Red, Green); Negative Control Stats (Red, Green); Red and Green Background Corrected Signals, Non Control Inliers (rBGSubSignal vs. gBGSubSignal); Background Subtracted Signal.]
[QC Report, p2. Callouts: 8 Local Background Inliers (page 97); 9 Foreground Surface Fit (page 97); 10 Reproducibility Statistics (CV Replicated Probes) (page 104); 11 Microarray Uniformity (2-color only) (page 106); 12 Spatial Distribution of Significantly Up-Regulated and Down-Regulated Features (Positive and Negative Log Ratios) (page 100); 13 Plot of LogRatio vs Log ProcessedSignal (page 101).]
38. The CGH microarrays have many sequences of negative controls that span the range of sequence variability seen in the biological probes used on the microarrays. This difference in the control grid, especially the multiple sequences used for negative controls, leads to a difference in protocol settings.
Hidden Settings
CAUTION: To create a protocol for a specific type of microarray, you are required to use an Agilent-created protocol or a user-created protocol for the same type of microarray.
Protocol templates provide both visible and hidden settings whose values are specific to the type or format of microarray. Although you can change the visible settings so that any two protocols of different types appear identical, you cannot change the hidden settings that distinguish these protocols from one another.
The Tables of Default Protocol Settings show only the default visible parameter values for the steps of the protocol. You can see the hidden parameters in the FEPARAMS table. See "Parameters/options (FEPARAMS)" on page 129. Many of these hidden parameters are image-processing parameters that are chosen using the Automatically Determine function.
Tables of Default Protocol Settings
CAUTION: These protocol settings may not be optimal for non-Agilent microarrays or for Agilent microarrays processed with non-Agilent procedures. You determine the settings and values that are optimal for your system.
39. Third Party
Place Grid, GridPlacement_enableOriginXCal (integer; 1 = True, 0 = False): Indicates the status of the "Use the correlation method to obtain origin X of subgrids" flag.
Place Grid, GridPlacement_enableUseCentralPack (integer; 1 = True, 0 = False): Indicates the status of the "Use central part of pack for slope and skew calculation" flag.
Place Grid, GridPlacement_placementMode (integer): Mode of grid placement. 0 = Allow the grid to distort; 1 = Place the grid rigidly, allowing only translation and rotation.
Optimize Grid Fit, IterativeSpotFind_CornerAdjust (integer; 0 = False, 1 = True): Indicates whether or not the grid will be adjusted for better fit by looking at corner spots on the microarray.
Optimize Grid Fit, IterativeSpotFind_AdjustThreshold (float): The grid will be adjusted if the absolute average difference between grid and spot positions is greater than this fraction.
Optimize Grid Fit, IterativeSpotFind_MaxIterations (integer): Maximum number of times the spot finder algorithm is run to optimize the grid fit.
Optimize Grid Fit, IterativeSpotFind_FoundSpotThreshold (float): The grid will be adjusted if this fraction or more of the features are considered found by the spot finder algorithm.
Optimize Grid Fit, IterativeSpotFind_NumCornerFeatures (integer): Indicates the
40. and multi pack, for which it is set to True.
Use the correlation method to obtain origin X of subgrids: Hidden if Array Format is set to Automatically Determine. Set to False for all arrays except 30 micron single pack and multi pack, for which it is set to True.
Optimize Grid Fit
Grid Format: Automatically Determine. The parameters and values for optimizing the grid differ depending on the format. Recognized formats: 65 micron feature size, 30 micron feature size, and Third Party.
Iteratively Adjust Corners: Hidden if Array Format is set to Automatically Determine. True for all formats except Third Party; False for Third Party.
Adjustment Threshold: Hidden if Array Format is set to Automatically Determine. 0.300 for all formats except Third Party.
Maximum Number of Iterations: Hidden if Array Format is set to Automatically Determine. 5 for all formats except Third Party.
Found Spot Threshold: Hidden if Array Format is set to Automatically Determine. 0.200 for all formats except Third Party.
Number of Corner Feature Side Dimension: Hidden if Array Format is set to Automatically Determine. 20 for all formats except Third Party.
Table 2 Default settings for CGH_1200_Jun14 protocol (continued)
Depending on the format selected by the software or by you, the default settings for this step change. See the followin
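The Optimize Grid Fit parameters (adjustment threshold, found spot threshold, maximum number of iterations) describe a bounded refinement loop. The following is a minimal sketch of that decision logic using the non-Third-Party defaults listed above (0.300, 0.200, 5). The spot-finder callable is hypothetical. How the two thresholds combine (here, adjust only when both conditions hold) is an assumption drawn from the parameter descriptions, not a description of Agilent's code.

```python
from typing import Callable, Tuple

# Defaults listed above for all formats except Third Party.
ADJUST_THRESHOLD = 0.300      # IterativeSpotFind_AdjustThreshold
FOUND_SPOT_THRESHOLD = 0.200  # IterativeSpotFind_FoundSpotThreshold
MAX_ITERATIONS = 5            # IterativeSpotFind_MaxIterations


def should_adjust(avg_abs_offset: float, found_fraction: float) -> bool:
    """Adjust when the grid is off by more than the threshold fraction and
    enough features were found for the adjustment to be trusted."""
    return (avg_abs_offset > ADJUST_THRESHOLD
            and found_fraction >= FOUND_SPOT_THRESHOLD)


def optimize_grid(run_spot_finder: Callable[[int], Tuple[float, float]]) -> int:
    """Run the (hypothetical) spot finder until no adjustment is needed or
    MAX_ITERATIONS is reached; return the number of runs performed.

    run_spot_finder(i) returns (average absolute grid-vs-spot offset as a
    fraction, fraction of features found) for iteration i.
    """
    runs = 0
    for i in range(MAX_ITERATIONS):
        runs += 1
        offset, found = run_spot_finder(i)
        if not should_adjust(offset, found):
            break
        # A real implementation would move the grid corners here.
    return runs
```

Capping the loop at the maximum iteration count keeps a grid that never converges from stalling the extraction.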
41. and options contained within the QC text output file FEPARAMS table
Protocol_Name (text): Name of protocol used.
Protocol_date (text): Date the protocol was last modified.
Scan_ScannerName (text): Agilent scanner serial number used.
Scan_NumChannels (integer): Number of channels in the scan image.
Scan_date (text): Date the image was scanned.
Scan_MicronsPerPixelX (float): Number of microns per pixel in the X axis of the scan image.
Scan_MicronsPerPixelY (float): Number of microns per pixel in the Y axis of the scan image.
Scan_OriginalGUID (text): The global unique identifier for the scan image.
Scan_NumScanPass (1 or 2): For 5 micron scans, indicates whether the scan mode was a single (1) or double (2) pass scan mode on the Agilent Scanner.
Grid_Name (text): Grid template name or grid file name.
Grid_Date (integer): Date the grid template or grid file was created.
Grid_NumSubGridRows (integer): Number of subgrid rows.
Grid_NumSubGridCols (integer): Number of subgrid columns.
Grid_NumRows (integer): Number of spots per row of each subgrid.
Grid_NumCols (integer): Number of spots per column of each subgrid.
Grid_RowSpacing (float): Space between rows on the grid.
Grid_ColSpacing (float): Space between columns on the grid.
Grid_OffsetX (float): In a dense pack array, the offset in the X direction.
Gri
42. background signal local to the corresponding feature, computed per channel (inlier pixels).
Median local background signal local to the corresponding feature, computed per channel (inlier pixels).
Standard deviation of all inlier pixels per local BG of each feature, computed independently in each channel.
The normalized interquartile range of all of the inlier pixels per local BG of each feature. The range is computed independently in each channel.
Total number of saturated pixels per feature, computed per channel.
Boolean flag indicating if a feature is saturated or not. A feature is saturated if 50% of the pixels in a feature are above the saturation threshold.
Reports if the feature signal value is from the scaled-up low signal image or from the high signal image.
Ratio of the estimated feature covariance in Red-Green space to the product of the feature standard deviations in Red and Green space. The covariance of two variables measures their tendency to vary together (i.e., to co-vary). In this case it is a cumulative quantitation of the tendency of pixels belonging to a particular feature in Red and Green spaces to co-vary.
The same concept as above, but for the background.
Table 22 Feature results contained in the FULL output text file FULL FEATURES table (continued)
Columns: Features Green, Features Red, Type, Options, Description
gIsFeatNonUnifOL, rIsFea
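The covariance ratio described above is the Pearson correlation coefficient between a feature's red and green pixel intensities. The following is a minimal sketch. It assumes the inputs are the feature's inlier pixels listed in the same order for both channels. The function name is illustrative and does not correspond to a Feature Extraction column.

```python
import math


def pixel_correlation(red, green):
    """Covariance of red and green pixel intensities divided by the product
    of their standard deviations (Pearson correlation).

    Population (divide-by-n) moments are used; the n vs. n-1 choice cancels
    in the ratio. Raises ZeroDivisionError if either channel is constant.
    """
    n = len(red)
    mean_r = sum(red) / n
    mean_g = sum(green) / n
    cov = sum((r - mean_r) * (g - mean_g) for r, g in zip(red, green)) / n
    sd_r = math.sqrt(sum((r - mean_r) ** 2 for r in red) / n)
    sd_g = math.sqrt(sum((g - mean_g) ** 2 for g in green) / n)
    return cov / (sd_r * sd_g)
```

A value near 1 means the two channels brighten and dim together across the spot. A low value can indicate that the pixel variation in one channel is not shared by the other.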
43. calculate the statistic is defined as
$$\text{ProbeRatio} = \log_{10}\left(\frac{\text{TotalProbeSignal}_{\text{longerProbe}}}{\text{TotalProbeSignal}_{\text{shorterProbe}}}\right) \qquad (39)$$
The Total Probe Signal is defined in Table 37, "Statistics and Results for the MicroRNA Analysis," on page 284 (see also Table 32, "Algorithms (Protocol Steps) and the results they produce," on page 230).
The other four statistics calculated are summarized as Gene Signals. The Gene Signal is defined as
$$\text{GeneSignal} = \log_{10}\left(\text{TotalGeneSignal}\right) \qquad (40)$$
The Total Gene Signal is defined in Table 37, "Statistics and Results for the MicroRNA Analysis," on page 284 (see also Table 32, "Algorithms (Protocol Steps) and the results they produce," on page 230).
The statistics calculated are listed in Table 38.
Table 38 miRNA Spike-In Statistics
Statistic names (all float): gdmr285GeneSignal, gdmr31aGeneSignal, gdmr6GeneSignal, gdmr3GeneSignal, gdmr6ProbeRatio, gdmr3ProbeRatio
gdmr285GeneSignal: The Gene Signal for the dmr285 miRNA. Note that the leading g means the data is calculated from the green channel.
gdmr31aGeneSignal: The Gene Signal for the dmr31a miRNA. Note that the leading g means the data is calculated from the green channel.
gdmr6GeneSignal: The Gene Signal for the dmr6 miRNA. Note that the leading g means the data is calculated from the green channel.
gdmr3GeneSignal: The Gene Signal for the dmr3 miRNA. Note tha
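Equations 39 and 40 are single log10 transforms. The following is a minimal sketch of both. The optional floor follows the STATS-table description of the miRNA spike-in metrics, which mentions a protocol option ("Do you want minimum signal value as 0.1 value") that raises TotalGeneSignal values below 0.1 to 0.1 before the log. The function names and the `floor_at_0_1` flag are illustrative.

```python
import math


def gene_signal(total_gene_signal: float, floor_at_0_1: bool = False) -> float:
    """GeneSignal = log10(TotalGeneSignal), Equation 40.

    floor_at_0_1 mimics the protocol option described for the STATS table:
    values below 0.1 are set to 0.1 before taking the log. Without it, a
    non-positive input raises ValueError.
    """
    if floor_at_0_1:
        total_gene_signal = max(total_gene_signal, 0.1)
    return math.log10(total_gene_signal)


def probe_ratio(total_probe_signal_longer: float,
                total_probe_signal_shorter: float) -> float:
    """ProbeRatio = log10(longer-probe signal / shorter-probe signal), Equation 39."""
    return math.log10(total_probe_signal_longer / total_probe_signal_shorter)
```

The floor keeps very weak or negative background-corrected signals from producing undefined or extreme log values in the spike-in metrics.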
44. deviation of the spike-ins below the linear concentration range:
$$\text{LowThresholdError} = 2\,\mathrm{SD}\left(\text{LogProcessedSignals}_{A}\right)$$
where the set A is from step a in the table.
Accuracy of linear fit to middle of sigmoidal curve
Agilent calculated the difference between the expected log processed signals at the high and low relative concentrations on the linear curve and the expected log signals for the same concentrations on the sigmoidal curve. For the high end of the linear range, the difference is 15.36. For the low end of the linear range, the difference is 16.75.
QC Report Results in the FEPARAMS and Stats Tables
The FEPARAMS table contains most of the QC header information. The Stats table output contains all the metrics shown on the QC Reports. These QC stats let you make tracking charts of individual metrics that you may want to follow over time. To separate the FEPARAMS and Stats tables from each other and from the FEATURES table, see the Feature Extraction 12.0 User Guide.
See "Parameters/options (FEPARAMS)" on page 129 and "Statistical results (STATS)" on page 159 of this guide for descriptions of the parameters and statistics listed in the tables.
QC Metric Set Results
You can display the QC Metric Set Prope
45. enough to warrant XDR. This can be ignored.
Most likely the background on this array is high. Check the QC report.
Could show an ozone problem if the red ratio is always off. Scanner PMT calibration should be checked, but the effect on the data is arguably small in the two-color case because of dye normalization.
Protocol Error: Run the correct Agilent protocol. This warning will NOT come up when Feature Extraction is properly configured and standard tested protocols are used.
Didn't find enough non-Control replicates to detrend: Doesn't affect data. Use a design with replicated features (at least 75 total replicates; more is better), with at least 5 replicates per feature and at least 5 different probes replicated, OR run detrending using all features, not replicated ones.
1031 Multiplicative Detrending will not be performed. %s Channel did not find enough suitable features to be able to reliably detrend. Resolution: Probably indicates another problem. This array should be looked at.
1032 Multiplicative detrending effect inconclusive: CVs increasing, detrending removed. Resolution: Need at least 5 replicates per feature with at least 5 different probes replicated. If detrending doesn't help the data, then we turn it off. Maybe we fit noise. This warning can be ignored.
1033 B
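The replicate guidance for detrending can be checked against a design's probe list before extraction. The following is a minimal sketch that applies the three numbers quoted above: at least 5 replicates per probe, at least 5 different replicated probes, and at least 75 total replicates. It assumes that only probes meeting the 5-replicate minimum count toward the total of 75. The guide does not say how Feature Extraction counts them, and the function name is hypothetical.

```python
from collections import Counter


def enough_replicates_for_detrending(probe_names,
                                     min_total=75,
                                     min_per_probe=5,
                                     min_probes=5):
    """Check the replicate guidance for multiplicative detrending.

    probe_names: one entry per non-control feature; replicates share a name.
    Only probes with at least min_per_probe replicates are counted (assumption).
    """
    counts = Counter(probe_names)
    qualifying = {p: n for p, n in counts.items() if n >= min_per_probe}
    total_replicates = sum(qualifying.values())
    return len(qualifying) >= min_probes and total_replicates >= min_total
```

For example, 5 probes with 15 replicates each (75 features) pass, while 4 probes with 20 replicates each fail the 5-probe requirement.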
46. equation and their output in the stats table.
Table 16 Spike-In Concentration Response Statistics for 1-color microarrays
(Statistic: Description; Where in calculations; Stats Table output)
Saturation Point: upper limit of detection; max, step b; eQCOneColorLogHighSignal
Low Threshold: lower limit of detection; min, step a; eQCOneColorLogLowSignal
Low Threshold Error: error for lower limit; see equation below table; eQCOneColorLogLowSignalError
Low Signal: lowest quantifiable signal in linear range; lowest signal from linear fit in step h; eQCOneColorLinFitLogLowSignal
High Signal: highest quantifiable signal in linear range; highest signal from linear fit in step h; eQCOneColorLinFitLogHighSignal
Low Relative Concentration: lowest concentration leading to quantifiable signal; x0 - 2.3w in step f; eQCOneColorLinFitLogLowConc
High Relative Concentration: highest concentration leading to quantifiable signal; x0 + 2.2w in step g; eQCOneColorLinFitLogHighConc
Slope: slope of the linear fit on sigmoidal curve; from step h; eQCOneColorLinFitSlope
R^2 Value: correlation coefficient for linear fit; from step h; eQCOneColorLinFitRSQ
SpikeIn Detection Limit: The average plus 1 standard; from step i; eQCOneColorSpikeInDetectionLi
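Step h fits a straight line to log processed signal versus log relative concentration within the linear range. The Slope, R² Value, Low Signal, and High Signal rows all come from that fit. The following is a minimal least-squares sketch. It assumes the points passed in are already restricted to the linear range, so the selection in steps f and g is not reproduced. R² is computed as the squared Pearson correlation, which equals the coefficient of determination for a simple linear fit. The function name and dictionary keys are illustrative.

```python
def linear_fit_stats(log_conc, log_signal):
    """Least-squares line log_signal = slope * log_conc + intercept.

    Returns slope, R^2, and the fitted signals at the lowest and highest
    concentrations (analogous to the Low Signal and High Signal rows).
    """
    n = len(log_conc)
    mx = sum(log_conc) / n
    my = sum(log_signal) / n
    sxx = sum((x - mx) ** 2 for x in log_conc)
    syy = sum((y - my) ** 2 for y in log_signal)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_conc, log_signal))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return {"slope": slope,
            "r_squared": r_squared,
            "low_signal": slope * min(log_conc) + intercept,
            "high_signal": slope * max(log_conc) + intercept}
```

A slope near 1 on log-log axes means the signal is proportional to concentration, which is why slope and R² are tracked together.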
47. exists and is spelled correctly.
Invalid input file: Check that you specified a valid input file name.
Request ignored: If you receive this code when you are adding a protocol or grid template, the object already exists in the database and will not be added. If you receive this code when you are deleting objects, the object was not found in the database.
No license or invalid license: Check the existence, location, and expiration date of your Feature Extraction license.
Initialization failure (MFC failed to initialize): Call tech support.
Initialization failure (COM failed to initialize): Call tech support.
Invalid command line arguments: Check spelling and syntax.
Feature extraction failed: Call tech support.
Table 40 FeNoWindows return codes (continued)
9: Feature Extraction failed to add or remove a protocol. Database could be down. Restart the database by rebooting or by starting the AGTFEDB service from the Control Panel.
10: Feature Extraction failed to add or remove a grid template. Restart the database.
11: The grid template or protocol link failed. Restart the database.
Extraction Input
Project Properties Settings
Note that MAGEOutPkgType and TextOutPkgType are Full. This means all the features are sent to the output file. A compact subset of features is the alternate choice. See Chapter
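When FeNoWindows is called from a script, its return code is the main status signal. The following is a minimal sketch of a wrapper that maps the database-related codes 9 to 11 listed above to their remedies. It builds no FeNoWindows arguments itself: the caller passes the full command line, following the command syntax in this chapter. Other codes are reported without interpretation. The helper names are hypothetical.

```python
import subprocess

# Return codes 9 to 11 as described in Table 40; other codes are not mapped here.
FENOWINDOWS_DB_ERRORS = {
    9: "Failed to add or remove a protocol; restart the database "
       "(reboot or start the AGTFEDB service).",
    10: "Failed to add or remove a grid template; restart the database.",
    11: "The grid template or protocol link failed; restart the database.",
}


def describe_return_code(code: int) -> str:
    return FENOWINDOWS_DB_ERRORS.get(code, f"return code {code}: see Table 40")


def run_fe(args):
    """Run a command line supplied by the caller and return (code, description)."""
    completed = subprocess.run(args)
    return completed.returncode, describe_return_code(completed.returncode)
```

Because codes 9 to 11 all point to the database, a batch script can restart the AGTFEDB service and retry once before giving up.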
48. for the significance calculation, surrogate calculation, and multiplicative detrending steps.
Correct Dye Biases
Because dye bias between the red and green channels is a common phenomenon in a dual-color microarray platform, this algorithm adjusts for the bias by multiplying the background-subtracted signals by the appropriate dye normalization factors. Both linear and non-linear (locally weighted) normalization methods are available. Surrogates are applied after the dye norm fit and before dye normalization takes place. This ensures that only real data contribute to the fit, and that surrogate data is correctly dye normalized for both the Linear and Lowess options.
Because 1-color experiments use only the green channel, they do not use this protocol step. Surrogates exist and can be used for 1-color experiments.
Compute Ratios
This algorithm determines if a feature is differentially expressed by calculating the log ratio of the red over green processed signals. The processed signal is the dye-normalized signal. Because 1-color experiments use only the green channel, they do not use this protocol step.
MicroRNA Analysis
This step is used in the 1-color miRNA analysis after background effects have been accounted for. The algorithms in this step calculate the TotalGeneSignal, the TotalGeneError, the GeneSignal, and the ProbeRatio
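For the Linear option, Correct Dye Biases and Compute Ratios together reduce to a per-channel scale factor followed by a log ratio. The following is a minimal sketch. It assumes base-10 logs and that the dye normalization factors are already known. How Feature Extraction fits those factors (from the dye norm feature set, or as a Lowess curve) is not reproduced. The function names are illustrative.

```python
import math


def dye_normalize(bg_sub_signal: float, dye_norm_factor: float) -> float:
    """Linear dye normalization: background-subtracted signal x factor."""
    return bg_sub_signal * dye_norm_factor


def log_ratio(r_processed: float, g_processed: float) -> float:
    """Log ratio (base 10 assumed) of red over green processed signals."""
    return math.log10(r_processed / g_processed)


r = dye_normalize(400.0, 1.25)   # 500.0
g = dye_normalize(500.0, 1.0)    # 500.0
print(log_ratio(r, g))           # 0.0: equal after normalization
```

Without the red factor of 1.25, the same feature would report a log ratio of about -0.1, which is purely a dye artifact.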
49. i.e., to co-vary. In this case, it is a cumulative quantitation of how strongly the pixels of a particular feature tend to co-vary in the red and green spaces.
The same concept as above, but for the background.
Integer indicating whether a feature is a non-uniformity outlier. A feature is non-uniform if its pixel noise exceeds a threshold established for a uniform feature.
The same concept as above, but for background.

Table 29 Feature results (Full) contained in the MAGE-ML FEATURES table (continued)
Quant Type | Features Green | Features Red | Options | Description
SQT | gIsFeatPopnOL | rIsFeatPopnOL | g(r)IsFeatPopnOL = 1 indicates the feature is a population outlier in g(r) | Boolean flag indicating whether a feature is a population outlier. Probes with replicate features on a microarray are examined using population statistics. A feature is a population outlier if its signal is below a lower threshold or above an upper threshold. The thresholds are set using a multiplier (1.42) times the interquartile range (IQR) of the population.
SQT | gIsBGPopnOL | rIsBGPopnOL | g(r)IsBGPopnOL = 1 indicates the local background is a population outlier in g(r) | The same concept as above, but for background.
SQT | IsManualFlag | | |
SQT | gBGSubSignal | rBGSubSignal | gBGSubSignal = gMeanSignal - gBGUsed | Background-subtracted signal. To display the values used to calculate this vari
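You can illustrate the population-outlier rule with a small IQR test over one probe's replicate signals. This sketch makes two choices the guide does not document:
* It anchors the lower and upper thresholds at the first and third quartiles (Tukey-style fences). The guide only says the thresholds use 1.42 × IQR.
* It uses Python's default quartile method.

Treat both as assumptions, not as documented Feature Extraction behavior.

```python
import statistics


def population_outliers(signals: list[float], multiplier: float = 1.42) -> list[bool]:
    """Flag replicate signals outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: thresholds are anchored at the quartiles; quartiles use
    statistics.quantiles with its default 'exclusive' method.
    """
    q1, _, q3 = statistics.quantiles(signals, n=4)
    iqr = q3 - q1
    lower, upper = q1 - multiplier * iqr, q3 + multiplier * iqr
    return [s < lower or s > upper for s in signals]


replicates = [100.0, 104.0, 98.0, 101.0, 99.0, 103.0, 400.0]
print(population_outliers(replicates))
# [False, False, False, False, False, False, True]
```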
50. in the protocol is true, then values of TotalGeneSignal less than 0.1 are set to 0.1 for the calculation. Otherwise, the original value of TotalGeneSignal is used in the calculation.

Table 21 Stats results contained in the text output file STATS table (continued)
Stats Green Channel | Stats Red Channel | Type | Description
gdmr31aGeneSignal | rdmr31aGeneSignal | float | Metric for miRNA only. This is the log10-transformed value of TotalGeneSignal for the miRNA spike-in gene dmr31a within the subtype mask 8196. If the parameter "Do you want minimum signal value as 0.1 value" in the protocol is true, then values of TotalGeneSignal less than 0.1 are set to 0.1 for the calculation. Otherwise, the original value of TotalGeneSignal is used in the calculation.
gdmr6GeneSignal | rdmr6GeneSignal | float | Metric for miRNA only. This is the log10-transformed value of TotalGeneSignal for the miRNA spike-in gene dmr6 within the subtype mask 8196. If the parameter "Do you want minimum signal value as 0.1 value" in the protocol is true, then values of TotalGeneSignal less than 0.1 are set to 0.1 for the calculation. Otherwise, the original value of TotalGeneSignal is used in the calculation.
gdmr3GeneSignal | rdmr3GeneSignal | float | Metric for miRNA only. This is the log10 transforme
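Each dmr spike-in metric reduces to a base-10 log with an optional floor. A minimal sketch follows. The `use_min_signal` argument stands in for the protocol option "Do you want minimum signal value as 0.1 value". The argument and function names are hypothetical.

```python
import math


def mirna_spikein_metric(total_gene_signal: float, use_min_signal: bool) -> float:
    """log10 of TotalGeneSignal, flooring values below 0.1 at 0.1 when the option is on.

    With the option off, a non-positive TotalGeneSignal makes math.log10 raise ValueError.
    """
    value = max(total_gene_signal, 0.1) if use_min_signal else total_gene_signal
    return math.log10(value)


print(mirna_spikein_metric(0.02, use_min_signal=True))     # floored to 0.1, so log10(0.1) = -1
print(mirna_spikein_metric(1000.0, use_min_signal=False))  # 3.0
```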
51. integer | 1 = Only positive and significant signals; 2 = All positive signals; 3 = All negative and positive signals
Correct Dye Biases | DyeNorm_CorrMethod | integer | 0 = Linear; 1 = Linear & LOWESS (locally weighted linear regression preceded by linear scaling in each dye channel); 2 = LOWESS (locally weighted linear regression) | Method for computing the dye normalization factor that removes dye bias

Table 17 List of parameters and options contained within the FULL text output file FEPARAMS table (continued)
Protocol Step | Parameter | Type | Options | Description
Correct Dye Biases | DyeNorm_LOWESSSmoothFactor | float | | Smoothing parameter: neighborhood size for LOWESS curve fitting
Correct Dye Biases | DyeNorm_LOWESSNumSteps | integer | | Number of iterations in LOWESS
Correct Dye Biases | DyeNorm_RankTolerance | float | | The threshold for picking rank-consistent features between the 2 channels for measuring dye biases
Correct Dye Biases | DyeNorm_VariableRankTolerance | integer | 1 = True; 0 = False | Allows the rank tolerance to vary with signal level, so that a fixed percentage of the data is considered rank consistent
Correct Dye Biases | DyeNorm_MaxRankedSize | integer | | The limit on the number of points used for the dye normalization set. If more points are available, a random subset of this size is chosen.
Correct Dye Biases | DyeNorm_IsBGPopnOLOn | integer | 1 = True: Software excludes any features from
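Here is a sketch of rank-consistent feature selection that shows what DyeNorm_RankTolerance and DyeNorm_MaxRankedSize control. The guide does not spell out the consistency criterion. This version assumes a feature is rank consistent when its red and green ranks differ by no more than the tolerance times the number of features. The random subset uses a fixed seed only so the example is reproducible. All function names are hypothetical.

```python
import random


def ranks(values: list[float]) -> list[int]:
    """0-based rank of each value in ascending order (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def rank_consistent_features(red: list[float], green: list[float], tolerance: float,
                             max_ranked_size: int, seed: int = 0) -> list[int]:
    """Indices of features whose red and green ranks agree within tolerance * N.

    If more features qualify than max_ranked_size, keep a random subset of that
    size, as DyeNorm_MaxRankedSize describes.
    """
    n = len(red)
    r_rank, g_rank = ranks(red), ranks(green)
    selected = [i for i in range(n) if abs(r_rank[i] - g_rank[i]) <= tolerance * n]
    if len(selected) > max_ranked_size:
        selected = sorted(random.Random(seed).sample(selected, max_ranked_size))
    return selected


red = [float(v) for v in range(1, 11)]
green = [10.0] + [float(v) for v in range(2, 10)] + [1.0]  # first and last features swap ranks
print(rank_consistent_features(red, green, tolerance=0.05, max_ranked_size=100))
# [1, 2, 3, 4, 5, 6, 7, 8]
```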
52. IsPosAndSignif, and additionally the g(r)BGSubSignal is greater than 2.6 × g(r)BGSDUsed.
SQT = Specialized Quantitation Type

Helpful hints for transferring Agilent output files (XML output)
Be aware of the following situations when you use MAGE-ML (XML) output with gene expression data analysis software from Rosetta BioSoftware (Rosetta Resolver software).
If there is no barcode
If the original TIFF file has no barcode for any reason, the MAGE-ML output will contain no barcode information, and a warning message appears in the Project Run Summary. Rosetta Resolver loads the data only if a barcode is associated with it. You can add barcode information in the Scan Image Properties dialog box. See the Feature Extraction 12.0 User Guide.
Access control list (ACL)
Rosetta Resolver knows about the access control list (ACL) assigned to the scan and can easily recognize and load any MAGE-ML file. The owner of the data sets the chip and hybe access controls in Rosetta Resolver before importing the profile scan data. For autoimport, the profile is normally placed in the MAGE directory.
XML Control Type output
If a feature is used in dye normalization, its Control_Type is "normalization", even though it can also be a positive or negative control. If a feature is not used in normalization, it is either positive, negative, delet
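The IsWellAboveBG condition quoted above combines an existing flag with a threshold on the background noise. A minimal sketch follows, with a hypothetical function name. It takes IsPosAndSignif as an input rather than recomputing the significance test.

```python
def is_well_above_bg(is_pos_and_signif: bool, bg_sub_signal: float, bg_sd_used: float) -> bool:
    """IsWellAboveBG for one channel: IsPosAndSignif and BGSubSignal > 2.6 * BGSDUsed."""
    return is_pos_and_signif and bg_sub_signal > 2.6 * bg_sd_used


print(is_well_above_bg(True, 300.0, 50.0))   # True: 300 > 130
print(is_well_above_bg(True, 100.0, 50.0))   # False: 100 <= 130
print(is_well_above_bg(False, 300.0, 50.0))  # False: not positive and significant
```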
53. non-uniformity outliers.
The number of features that are flagged as population outliers.
The number of local background regions that are flagged as non-uniformity outliers.
The number of local background regions that are flagged as population outliers.
Software-estimated scanner offset.

Table 21 Stats results contained in the text output file STATS table (continued)
Stats Green Channel | Stats Red Channel | Type
gGlobalFeatInlierAve | rGlobalFeatInlierAve | float
gGlobalFeatInlierSDev | rGlobalFeatInlierSDev | float
gGlobalFeatInlierNum | rGlobalFeatInlierNum | float
AllColorPrcntSat | | float
AnyColorPrcntSat | | float
AnyColorPrcntFeatNonUnifOL | | float
AnyColorPrcntBGNonUnifOL | | float
AnyColorPrcntFeatPopnOL | | float
AnyColorPrcntBGPopnOL | | float
TotalPrcntFeatOL | | float
gBGAdjust | rBGAdjust | float
gNumNegBGSubFeat | rNumNegBGSubFeat | integer
Descriptions, in the same order: Average of all inlier features. Standard deviation of all inlier features. Number of all inlier features. The percentage of features that are saturated in both the green AND red channels. The percentage of features that are saturated in either the green or red channel. The percentage of features that are feature non-uniformity outliers in either channel. The percentage of local backgrounds that are non-uniformity outliers in either channel. The percentage of fea
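AllColorPrcntSat and AnyColorPrcntSat differ only in whether a feature must be saturated in both channels or in just one. The sketch below computes both from per-feature saturation flags. It assumes the percentage is taken over all features passed in, because this excerpt does not say whether control features are excluded. The function name is hypothetical.

```python
def saturation_percentages(g_sat: list[bool], r_sat: list[bool]) -> tuple[float, float]:
    """Return (AllColorPrcntSat, AnyColorPrcntSat) as percentages of the features given."""
    n = len(g_sat)
    both = sum(g and r for g, r in zip(g_sat, r_sat))    # saturated in green AND red
    either = sum(g or r for g, r in zip(g_sat, r_sat))   # saturated in green OR red
    return 100.0 * both / n, 100.0 * either / n


print(saturation_percentages([True, True, False, False], [True, False, True, False]))
# (25.0, 75.0)
```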
54. on page 107.
[Figure 2 panels: Reproducibility plot for 2-color gene expression spike-in probes (CV for Red, CV for Green); Agilent SpikeIns Signal Statistics (Probe Name, Exp, Obs, SD, S/N); Evaluation Metrics for GE2_QCMT_Jul11 (Metric Name, Value, Excellent, Good, Evaluate)]
Figure 2 2-color Gene Expression QC Report with Spike-ins (p2)
15 Reproducibility plot for 2-color gene expression spike-in probes
16 2-color gene expression spike-in signal statistics on page 111
17 Spike-in Linearity Check for 2-color Gene Expression on page 113
18 QC Metric Set Results on page 122
55. only the first two stats as shown in this example.
Message="1 Red and 0 Green saturated features" MessageID="62"/>
<ResultMessages Status="Success" Message="16 Red and 13 Green feature non-uniformity outliers" MessageID="63"/>
<ResultMessages Status="Warning" Message="Multiplicative detrending effect inconclusive (CVs increasing), detrending removed" MessageID="1032"/>
</Result>
<StatsTable>
<Stats Type="float" Name="gDarkOffsetAverage" Value="24"/>
<Stats Type="float" Name="gDarkOffsetMedian" Value="24"/>
</StatsTable>
</Array>
</Arrays>
<ExtractionResult Status="Warning">   (the overall result of the slide)
<ResultMessages Status="Success" Message="Grid Template in use 014077_D_20051222

All result messages in the extraction result entity are slide-level messages. These are the same messages that appear in the batch Run Summary. Each message has a message ID associated with it. If the message is an Error or a Warning, the message ID indicates the type of failure or the module in which the failure occurred. The errors and warnings are summarized in the tables at the end of this chapter.

Error codes from XML file
The bold error codes do not correspond to unique error messages. Instead, they tell you in which module the software had an error.
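Every message carries Status and MessageID attributes, so a standard XML parser can pull the warnings and errors out of a result file. The sketch uses Python's xml.etree.ElementTree on a small, hand-made document modeled on the excerpt above. A real Feature Extraction result file has much more structure, so treat the sample document and the function name as hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed document modeled on the excerpt; real result files contain far more.
SAMPLE = """<ExtractionResult Status="Warning">
  <ResultMessages Status="Success" Message="16 Red and 13 Green feature non-uniformity outliers" MessageID="63"/>
  <ResultMessages Status="Warning" Message="Multiplicative detrending effect inconclusive" MessageID="1032"/>
</ExtractionResult>"""


def problem_messages(xml_text: str) -> list[tuple[int, str, str]]:
    """Return (MessageID, Status, Message) for each ResultMessages element that is not Success."""
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter("ResultMessages"):  # searches the whole tree, at any depth
        status = elem.get("Status")
        if status != "Success":
            found.append((int(elem.get("MessageID")), status, elem.get("Message")))
    return found


print(problem_messages(SAMPLE))
# [(1032, 'Warning', 'Multiplicative detrending effect inconclusive')]
```

You can then use the message ID to look up the warning or error in the tables at the end of the chapter.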
56. or grid file was created.
Grid_NumSubGridRows | integer | Number of subgrid rows
Grid_NumSubGridCols | integer | Number of subgrid columns
Grid_NumRows | integer | Number of spots per row of each subgrid
Grid_NumCols | integer | Number of spots per column of each subgrid
Grid_RowSpacing | float | Space between rows on the grid
Grid_ColSpacing | float | Space between columns on the grid
Grid_OffsetX | float | In a dense-pack array, the offset in the X direction
Grid_OffsetY | float | In a dense-pack array, the offset in the Y direction
Grid_NomSpotWidth | float | Nominal width, in microns, of a spot from the grid
Grid_NomSpotHeight | float | Nominal height, in microns, of a spot from the grid
Grid_GenomicBuild | text | The build of the genome used to create the annotation, if available. Not all designs have this information, and if the genome build is not available, it is not written to the output. All recent and future designs have it.
FeatureExtractor_Barcode | text | Barcode of the Agilent microarray, read from the scan image
FeatureExtractor_Sample | text | Names of hybridized samples (red/green)
FeatureExtractor_ScanFileName | text | Name of the scan file used for Feature Extraction
FeatureExtractor_ArrayName | text | Microarray filename
FeatureExtractor_ScanFileGUID | text | GUID of the scan file
FeatureExtractor_DesignFileName | text | Design or grid file used for Fe
57. rBGPixSDev, rIsSaturated, rIsLowPMTScaledUp, rIsFeatNonUnifOL
ProcessedSigError (float): The universal or propagated error left after all the processing steps of Feature Extraction have been completed. For one-color data, ProcessedSignalError has had the Error Model applied and contains at least the larger of the universal (UEM) error and the propagated error. If multiplicative detrending is performed, ProcessedSignalError contains the error propagated from detrending, obtained by dividing the error by the normalized MultDetrendSignal.
MedianSignal (float): Raw median signal of the feature in the green (red) channel (inlier pixels).
BGMedianSignal (float): Median signal of the local background around the corresponding feature, computed per channel (inlier pixels).
BGPixSDev (float): Standard deviation of all inlier pixels in the local background of each feature, computed independently in each channel.
IsSaturated (boolean; 1 = Saturated, 0 = Not saturated): Flag indicating whether a feature is saturated. A feature is saturated if 50% of its pixels are above the saturation threshold.
IsLowPMTScaledUp (boolean; 1 = Low, 0 = High): Reports whether the feature signal value comes from the scaled-up low-signal image or from the high-signal image.
IsFeatNonUnifOL (boolean; g(r)IsFeatNonUnifOL = 1 indicates the feature is a non-uniformity outlier in g(r)): Flag indicating whether a feature is a non-uniformity outlier. A feature is non-uniform if the pixel noise of the feature exceeds a threshold established for a
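Two rules from the descriptions above can be written out directly: the IsSaturated test and the propagation of ProcessedSignalError through multiplicative detrending. The sketch makes two simplifications:
* It reads "50% of the pixels" as "at least half".
* It ignores pixel outlier rejection.

The function names are hypothetical.

```python
def is_saturated(pixels: list[float], saturation_threshold: float) -> bool:
    """IsSaturated (assumed reading): at least half of the pixels lie above the saturation threshold."""
    above = sum(p > saturation_threshold for p in pixels)
    return 2 * above >= len(pixels)


def detrended_error(processed_signal_error: float, normalized_mult_detrend_signal: float) -> float:
    """Propagate error through multiplicative detrending by dividing by the normalized trend value."""
    return processed_signal_error / normalized_mult_detrend_signal


print(is_saturated([65535.0, 65535.0, 1200.0, 1500.0], 65000.0))  # True: 2 of 4 pixels
print(is_saturated([65535.0, 1200.0, 1500.0], 65000.0))           # False: 1 of 3 pixels
print(detrended_error(10.0, 0.5))                                  # 20.0
```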
58. scanner.
Amount of movement in the autofocus caused by fluctuations in the glass.
Resulting signal when the dark offset value is subtracted.

Table 28 Feature Extraction protocol parameters in MAGE-ML result file: differences between FEPARAMS in the text file and the MAGE-ML file
Text File FEPARAMS | MAGE-ML File FEPARAMS
Ratio_ErrorModel | Error Model
Ratio_AddErrorRed | Red ADDITIVE_ERROR
Ratio_AddErrorGreen | Green ADDITIVE_ERROR
Ratio_MultErrorRed | Red MULTIPLICATIVE_ERROR
Ratio_MultErrorGreen | Green MULTIPLICATIVE_ERROR
For 1-color data, red signals and log ratios are not included in the MAGE-ML output files.

Table 29 Feature results (Full) contained in the MAGE-ML FEATURES table
Quant Type | Features Green | Features Red | Options | Description
SQT | X_IMAGE_POSITION, Y_IMAGE_POSITION | | | Found coordinates of the feature centroid
SQT | SpotExtentX, SpotExtentY | | | Diameter of the spot (X or Y axis)
Ratio | LogRatio | | base 10 | log(REDsignal/GREENsignal) per feature, using the processed signals
Error | LogRatioError | | 1000 |
PValue | PValueLogRatio | | |
SQT | gSurrogateUsed | rSurrogateUsed | Non-zero value; 0 |
If SURROGATES are turned off, then: if DyeNormRedSig < 0.0 & DyeNormGreenSig > 0.0; if DyeNormRedSig > 0.0 & DyeNormGreenSig < 0.0; if DyeNormRedSig < 0.0 & DyeNorm
59. signal features end up at this offset. Appears when Globally adjust background is turned on.
Compute Bkgd, Bias and Error | BGSubtractor_CalculateSurfaceMetricsOn | integer | 1 = True: Surface fit is done and metrics are calculated; 0 = False: Surface fit and metrics are not done |
Compute Bkgd, Bias and Error | BGSubtractor_SpatialDetrendOn | integer | 1 = True: Spatial detrend turned on; 0 = False: Spatial detrend turned off |
Compute Bkgd, Bias and Error | BGSubtractor_DetrendLowPassFilter | integer | 1 = True: Low-pass filter used; 0 = False: Low-pass filter not used |
Compute Bkgd, Bias and Error | BGSubtractor_DetrendLowPassPercentage | integer | | Specifies the percentage of features used to fit the surface, taken from the lowest-intensity probes in each window
Compute Bkgd, Bias and Error | BGSubtractor_DetrendLowPassWindow | integer | | Specifies the size of the square window as a number of rows and columns. The specified percentage of low-intensity features is selected from a window of this size.

Table 17 List of parameters and options contained within the FULL text output file FEPARAMS table (continued)
Protocol Step | Parameter | Type | Options | Description
Compute Bkgd, Bias and Error | BGSubtractor_DetrendLowPassIncrement | integer | | The increment, in number of features, by which the above window is shifted horizontally and vertically on the microarray
Compute Bkgd, Bias and Error | BGSubtractor_NegCtrlSpreadCoeff | float | | The nu
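The three DetrendLowPass parameters describe a sliding-window selection of dim features for the spatial detrend surface fit. The sketch below shows one way to perform that selection on a grid of feature signals. It does not show the surface fit itself.

The following details are assumptions, not documented behavior:
* the grid layout of the feature signals
* the rounding of the percentage (at least one feature is kept per window)
* the edge handling

```python
def low_pass_selection(signal: list[list[float]], window: int, percentage: int,
                       increment: int) -> set[tuple[int, int]]:
    """Collect (row, col) positions of the lowest-intensity features in each window.

    A window x window block is shifted by `increment` features horizontally and
    vertically; in each block, the lowest `percentage` percent of features
    (at least one) is kept. The union of all kept positions is returned.
    """
    n_rows, n_cols = len(signal), len(signal[0])
    selected = set()
    for top in range(0, max(n_rows - window, 0) + 1, increment):
        for left in range(0, max(n_cols - window, 0) + 1, increment):
            cells = [
                (signal[r][c], r, c)
                for r in range(top, min(top + window, n_rows))
                for c in range(left, min(left + window, n_cols))
            ]
            keep = max(1, len(cells) * percentage // 100)
            for _, r, c in sorted(cells)[:keep]:  # dimmest features first
                selected.add((r, c))
    return selected


grid = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
print(sorted(low_pass_selection(grid, window=2, percentage=25, increment=2)))
# [(0, 0), (0, 2), (2, 0), (2, 2)]
```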
60. slide for contamination, and make sure that the scan region does not overlap the barcode or other non-transparent areas of the slide. Check the scan image for anomalies, and then rescan if necessary.
Warning message | Resolution
The AutoFocus was suspended during the low PMT scan for xxx.xx of time, a longer period than the threshold (xxx.xx). | Inspect the surface of the slide for contamination, and make sure that the scan region does not overlap the barcode or other non-transparent areas of the slide. Check the scan image for anomalies, and then rescan if necessary.
There is no barcode (array identifier) in the scan header. | Rescan the array.
MAGE/GEML output is invalid. | Rescan the array. If the scan is correctly named, the MAGE-ML output will be valid again.
Extraction of %s discarded before completion. | This warning can be ignored.
QCMetrics Totals: Found %d of %d Individual Metrics In Range. Overall the Array | You ran the software using a metric set with thresholds and evaluation criteria, and the array was not in range of the given metrics, so it needs to be looked at. Look at the data before any further processing.

Warning code | Warning message | Resolution
1051 | BGSubtract: There are no negative controls on this array. Switching background method to MinFeat. |
1052 | BGSubtract: Failed t
61. square area of features in each corner of the microarray to be used to calculate the average difference.
Find Spots | SpotAnalysis_Version | text | | Version of the spot analysis algorithm
Find Spots | SpotAnalysis_weakthresh | float | | Minimum difference between the average intensities of feature and background after K-means initialization
Find Spots | SpotAnalysis_MinimumNumPixels | integer | | Minimum number of pixels required for the spot analysis

Table 17 List of parameters and options contained within the FULL text output file FEPARAMS table (continued)
Protocol Step | Parameter | Type | Options | Description
Find Spots | SpotAnalysis_RegionOfInterestMultiplier | float | | Multiplier that defines the size of the Region of Interest (ROI) in terms of nominal spot spacing
Find Spots | SpotAnalysis_convergence_factor | float | | Convergence factor of the K-means algorithm
Find Spots | SpotAnalysis_max_em_iter | integer | | Maximum number of iterations of the Bayesian classification
Find Spots | SpotAnalysis_max_reject_ratio | float | | Maximum fraction of pixels to be rejected while the software performs spot finding
Find Spots | SpotAnalysis_kmeans_rad_reject_factor | float | | Factor that defines how much an individual spot size may vary relative to the nominal spot size
Find Spots | SpotAnalysis_kmeans_cen_reject_factor | float | | Factor that defines how far the actual centroid may move relative to its nominal grid position in terms
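SpotAnalysis_weakthresh is defined in terms of the feature and background averages that come out of a K-means initialization. The sketch below illustrates that idea with a plain two-cluster K-means on pixel intensities.

Keep these limits in mind:
* Feature Extraction follows the K-means step with a Bayesian classification, which is not shown here.
* Using a center-movement tolerance as the convergence test is an assumption about what SpotAnalysis_convergence_factor controls.
* The function names are hypothetical.

```python
def two_means(pixels: list[float], convergence: float = 1e-3, max_iter: int = 100) -> tuple[float, float]:
    """1-D k-means with k = 2 on pixel intensities: returns (background_mean, feature_mean).

    Centers start at the minimum and maximum intensity; iteration stops once
    neither center moves by more than `convergence`.
    """
    bg, fg = min(pixels), max(pixels)
    for _ in range(max_iter):
        low = [p for p in pixels if abs(p - bg) <= abs(p - fg)]
        high = [p for p in pixels if abs(p - bg) > abs(p - fg)]
        new_bg = sum(low) / len(low)  # never empty: the minimum pixel stays closest to bg
        new_fg = sum(high) / len(high) if high else fg
        moved = max(abs(new_bg - bg), abs(new_fg - fg))
        bg, fg = new_bg, new_fg
        if moved <= convergence:
            break
    return bg, fg


def passes_weakthresh(pixels: list[float], weakthresh: float) -> bool:
    """True when the feature cluster is brighter than the background cluster by at least weakthresh."""
    bg, fg = two_means(pixels)
    return fg - bg >= weakthresh


spot = [10.0, 12.0, 11.0, 200.0, 210.0, 205.0]
print(two_means(spot))                # (11.0, 205.0)
print(passes_weakthresh(spot, 50.0))  # True
```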
62. that interpret only standard TIFF tags to determine image colors. The Page Name tag (tag 285) also contains the color names.

5 How Algorithms Calculate Results
Overview of Feature Extraction algorithms 224
XDR Extraction Process 234
How each algorithm calculates a result 238
Example calculations for feature 12519 of Agilent Human 22K image 290
This chapter shows how each Feature Extraction algorithm uses its parameters to calculate results. These results are passed on to the next algorithm and finally to third-party data analysis programs.

Overview of Feature Extraction algorithms
The protocol step algorithms work in a similar way for 2-color gene expression, CGH, ChIP, and non-Agilent microarrays. The algorithms and parameter fields are similar, but the parameter values differ depending on the protocol. For 1-color gene expression microarrays, the Feature Extraction process includes only seven protocol steps. For miRNA analysis, the process includes those seven steps plus a MicroRNA Analysis step.
The examples used are primarily for 2-color microarrays. Any differences in algorithms and functions for other microarray experiments are also explained.

Algorithms and functions they perform
For more information on the algorit
63. the dye normalization set if the local backgrounds associated with those features have been flagged as population outliers in either channel. The default recommendation is False. | 0 = False
Compute Ratios | Ratio_Version | text | | Version of the Ratio algorithm
Compute Ratios | Ratio_PegLogRatioValue | float | | Both positive and negative log ratio values are capped to this absolute value.
miRNA Analysis | miRNA_Analysis_OutputGeneView | integer | 1 = True: Output Geneview file; 0 = False: Don't output Geneview file |
miRNA Analysis | miRNA_Analysis_EffectiveFeatSizeOn | integer | 1 = True: Enable analysis by effective feature size; 0 = False: Disable analysis by effective feature size |
miRNA Analysis | miRNA_Analysis_MaxFeatToCompEffectiveFeatSize | integer | | Maximum number of features

Table 17 List of parameters and options contained within the FULL text output file FEPARAMS table (continued)
Protocol Step | Parameter | Type | Options | Description
miRNA Analysis | miRNA_Analysis_MinNumRatiosToCompEffectiveFeatSize | integer | | Minimum number of ratios
miRNA Analysis | miRNA_Analysis_LowSigPctileToCompEffectiveFeatSize | float | | Low signal percentile
miRNA Analysis | miRNA_Analysis_HighSigPctileToCompEffectiveFeatSize | float | | High signal percentile
miRNA Analysis | miRNA_Analysis_HighRatioCutOff | float | | Ratios greater than this value are discarded
miRNA Analysis | miRNA_Analysis_DefEffectiveFeatSize | float | | Frac
miRNA Analysis | m
64. the CookieCutter method for defining features. The local background radius is estimated in the same way for the WholeSpot method.
[Figure 54 zones: feature (cookie), exclusion zone, local background]
Figure 54 Local background in relation to other zones for CookieCutter method

Default radius
The default radius is the radius of the local background for one feature. This radius is known as the SELF radius. Its value is the default value shown in the Find and Measure Spots protocol step when autoestimation is turned off.
The radius can map a circle that appears to overlap other features, but the Feature Extraction program does not use those pixels to calculate the local background signal.
Figure 55 Example of a SELF radius
The default radius, in microns, depends on the scan resolution and the interspot spacing found in the TIFF and the grid template or file, as shown in equation 1:
$\text{Default Local Radius (SELF)} = 0.6 \times \text{Scan\_resolution} \times \max(\text{Interspotspacing}_x, \text{Interspotspacing}_y) \qquad (1)$
For the WholeSpot method, if extraction stops at this step, you may need to enter a larger radius than the protocol default radius.
If the protocol specifies it, the software autoestimates the Default Local Radius. Otherwise, you can enter this radius in the Feature Extraction Protocol Editor.
Minimum radiu
65. the processed signals for each replicated probe (features with the same sequence) measured in the extraction. The same is done for the processed signal error column, by propagating the error.
c Calculates the Nominal Spot Area (S) in square microns:
$S = \pi \times \frac{\text{SpotWidth}}{2} \times \frac{\text{SpotHeight}}{2} \qquad (34)$
d Multiplies each average by the total number of pixels targeted by that probe, that is, the total number of features × S × EffectiveFeatureSizeFraction.
e Further multiplies by a weight, calculated as 1/30,000.
The equations and descriptions for calculating each output or result column are listed in the following table.

Table 37 Statistics and Results for the MicroRNA Analysis (see also Table 32 Algorithms, Protocol Steps and the results they produce on page 230)
Feature or Stat | Equation or Description
gTotalProbeSignal | $\text{gTotalProbeSignal} = \left( \frac{1}{In} \sum_{PR \in \text{inliers}} \text{gProcSignal}_{PR} \right) \times Tot \times E \times S \times W \qquad (35)$
where:
PR = index of probe replicates for the given miRNA
In = number of replicate population inliers
Tot = total number of probe replicates
E = EffectiveFeatureSizeFraction
S = Nominal Spot Area (equation described on the previous page)
W = weight (described on the previous page)
The number of probes used in the calculation depends on whether the protocol option Exclude Non-Detected Probes was turned on or off. For more information, see the Feature Extraction 10.9 Use
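Steps c through e above can be combined into one small calculation. The calculation averages the inlier processed signals and multiplies the result by four factors: the total number of replicates, the effective feature size fraction, the nominal spot area, and the 1/30,000 weight.

Deciding which replicates count as inliers, and whether non-detected probes are excluded, happens upstream and is not modeled here. The function names and the 65-micron example spot are hypothetical.

```python
import math


def nominal_spot_area(spot_width_um: float, spot_height_um: float) -> float:
    """Nominal Spot Area S in square microns: pi * (width / 2) * (height / 2)."""
    return math.pi * (spot_width_um / 2) * (spot_height_um / 2)


def total_probe_signal(inlier_signals: list[float], total_replicates: int,
                       effective_feat_size_fraction: float, spot_area: float,
                       weight: float = 1 / 30_000) -> float:
    """Mean inlier processed signal x Tot x E x S x W."""
    mean_signal = sum(inlier_signals) / len(inlier_signals)
    return mean_signal * total_replicates * effective_feat_size_fraction * spot_area * weight


s = nominal_spot_area(65.0, 65.0)  # hypothetical 65-micron nominal spot
print(total_probe_signal([100.0, 200.0, 300.0], total_replicates=4,
                         effective_feat_size_fraction=0.5, spot_area=s))
```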
66. Figure 17 QC Report Header and Evaluation Metrics with miRNA metric set with thresholds added (default protocol settings)

QC Report Headers
2-color Gene Expression QC Report
The 2-color gene expression QC Report header contains the following Feature Extraction information:
Date: Date and time that the QC Report was generated
Image: Name of the TIFF file that was extracted
Protocol: Name of the protocol used for the extraction
User Name: Name of the user who set up the extraction
Grid: Name of the grid template or grid file used
FE Version: Version of the Feature Extraction software used
Sample (red/green): Names of the Cy5- and Cy3-labeled samples
DyeNorm List: Name of the dye normalization list
No. of Probes in DyeNorm List: Number of probes in the designated dye normalization probe list
BG Method: Type of background subtraction method used
Background Detrend: Whether Spatial Detrend was turned on or off during the extraction
Multiplicative Detrend: Whether Multiplicative Detrend was turned on or off during the extraction
Dye Norm: Type of dye normalization method used
Linear DyeNorm Factor: Global dye normalization factor determined for the linear portion of the correction method
Additive Error: Additive portion of the error estimated in
67. types 221
correct bkgd and signal biases: calculate background-subtracted feature signal 254; calculate significance 269; how background adjustment works 262; how multiplicative detrend algorithm works (1-color only) 272; values for BGSubSignal, BGUsed, and BGSDUsed 254
correct dye biases: calculate normalization factor 276; select normalization features 274
E
example calculations 290
extraction input 307
extraction results: example output file 313; status information 312
F
feature flag info, conversion of 221
features results 178
file format options 222
find and measure spots: calculate mean signal of feature 246; calculate mean signal of local background 247; define features 242; estimate local background radius 242; reject pixel outliers 245; saturated features 247
flag outliers: non-uniformity 248; population 250
G
GEML result file, feature results 210, 216
L
log ratios: from adjusted background-subtracted signals 264; from unadjusted background-subtracted signals 263
M
MAGE-ML format result file 207
MAGE-ML result file: feature results 210, 216; protocol parameters 209; scan protocol parameters 208
multiplicative detrend algorithm, 1-color 272
N
nonuniformity outliers: estimated feature or bkgd variance 248; measured feature or bkgd variance 250
O
outliers: criteria for rejecting 246; interquartile range method 246; standard devia
68. uniform feature.

Table 23 Feature results contained in the COMPACT output text file COMPACT FEATURES table (continued)
Features Green | Features Red | Type | Options | Description
gIsBGNonUnifOL | rIsBGNonUnifOL | boolean | g(r)IsBGNonUnifOL = 1 indicates the local background is a non-uniformity outlier in g(r) | The same concept as above, but for background.
gIsFeatPopnOL | rIsFeatPopnOL | boolean | g(r)IsFeatPopnOL = 1 indicates the feature is a population outlier in g(r) | Boolean flag indicating whether a feature is a population outlier. Probes with replicate features on a microarray are examined using population statistics. A feature is a population outlier if its signal is below a lower threshold or above an upper threshold. The thresholds are set using a multiplier (1.42) times the interquartile range (IQR) of the population.
gIsBGPopnOL | rIsBGPopnOL | boolean | g(r)IsBGPopnOL = 1 indicates the local background is a population outlier in g(r) | The same concept as above, but for background.
IsManualFlag | | boolean | | Flags features for downstream filtering in third-party gene expression software.
gBGSubSignal | rBGSubSignal | float | g(r)BGSubSignal = g(r)MeanSignal - g(r)BGUsed | Background-subtracted signal. Table 34 on page 254 shows the values used to calculate this variable for different background signals and for different settings of spatial detrend and global background adjust.
gIsPosAn
69. variation in the population. Options: 1 = True: algorithm is turned on; 0 = False. After the first pass of the population algorithm has been run on each sequence, this algorithm repeats the population outlier (IQR) algorithm on all features classified as negative controls. Consider using it when you see either of the following:
- hot features that have not been flagged as population outliers
- hot sequences, in which all features of the sequence have higher signals than those in other negative control sequences
Compute Bkgd, Bias and Error | BGSubtractor_RobustNCOutlierFactor | float | | To calculate robust IQR statistics, the algorithm uses upper and lower limits that contain a Multiplier × IQR term. This parameter is the Multiplier.
Compute Bkgd, Bias and Error | BGSubtractor_ErrorModel | integer | 2 = Universal Error Model; 0 = Most Conservative | Choose the universal error or the most conservative error.
Compute Bkgd, Bias and Error | BGSubtractor_MultErrorGreen | float | | Multiplicative error component in the green channel
Compute Bkgd, Bias and Error | BGSubtractor_MultErrorRed | float | | Multiplicative error component in the red channel
Compute Bkgd, Bias and Error | BGSubtractor_AutoEstimateAddErrorGreen | integer | 1 = True: Auto-estimation turned on; 0 = False: Auto-estimation turned off |
Compute Bkgd, Bias and Error | BGSubtractor_AutoEstimateAddErrorRed | integer | 1 = True: Auto-estimation turned on; 0 = False: Auto-estimation turned off |
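The BGSubtractor_ErrorModel option chooses between the universal error model and the "most conservative" error. The sketch below shows one way to express that choice, under two loudly stated assumptions:
* The universal error combines the additive and multiplicative components in quadrature, sqrt(add² + (mult × signal)²). This excerpt names the components but does not give the formula.
* "Most conservative" keeps the larger of the universal error and the propagated error, by analogy with the ProcessedSignalError description.

The function names are hypothetical.

```python
import math


def universal_error(signal: float, add_error: float, mult_error: float) -> float:
    """Assumed UEM form: additive and multiplicative components combined in quadrature."""
    return math.sqrt(add_error ** 2 + (mult_error * signal) ** 2)


def signal_error(signal: float, add_error: float, mult_error: float,
                 propagated_error: float, error_model: int) -> float:
    """Pick the error per BGSubtractor_ErrorModel: 2 = Universal Error Model, 0 = Most Conservative."""
    uem = universal_error(signal, add_error, mult_error)
    if error_model == 2:
        return uem
    if error_model == 0:
        # Assumption: "most conservative" keeps the larger of the UEM and propagated errors
        return max(uem, propagated_error)
    raise ValueError(f"unknown error model code: {error_model}")


print(signal_error(1000.0, 30.0, 0.1, propagated_error=500.0, error_model=0))  # 500.0
```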
70. which is set to True.
GE1 GE2 CGH ChIP miRNA CGH ChIP GE1 GE2 GE1 GE2 CGH ChIP CGH ChIP GE1 GE2 GE1 GE2 CGH ChIP GE1 CGH ChIP GE1 CGH ChIP GE1 GE2 GE2 NonAT CGH ChIP miRNA All except for miRNA miRNA All except for GE2 NonAT GE2 NonAT All except GE1 protocol and GE2 NonAT GE2 NonAT All except GE1 protocol and GE2 NonAT

Table 13 Compute Bkgd, Bias and Error: Default values in common and differences for protocols (continued)
Parameter | Default values | Protocols using default value
 | False |
Additive Error Value Red | 30 | GE2 NonAT
Auto Estimate Add Error Green | True | All except for GE2 NonAT
 | False | GE2 NonAT
Additive Error Value Green | 30 | GE2 NonAT
Use Surrogates | True | All except for miRNA
 | False | miRNA

Correct Dye Biases
These parameters and values differ depending on the microarray type. The GE1 protocol and the miRNA protocol do not correct for dye biases.
Table 14 Correct Dye Biases: Default values in common and differences for protocols
Parameter | Default values | Protocols using default values (NA for GE1 and miRNA protocols)
Use Dye Norm List | Automatically Determine | All
Dye Normalization Probe Selection Method | Use Rank Consistent Probes | All
Rank Tolerance | 0.050 | All
Variable Rank Tolerance | False | All
Omit Background Population O
71. with high signal. This result comes from a two-step calculation. Step 1 calculates, for each probe, the absolute average log ratio of all inlier non-control features that have a minimum number of replicates. Step 2 calculates the average of all the absolute average log ratios from step 1.
The average standard deviation of log ratios of all inlier non-control probe sets with a minimum number of replicates.
The average of signal-to-noise values of the log ratio for all inlier non-control probe sets with a minimum number of replicates.

Table 21 Stats results contained in the text output file STATS table (continued)
Stats: eQCAbsAveLogRatio (float), eQCSDevLogRatio (float), eQCSNRLogRatio (float), AddErrorEstimateGreen (float), AddErrorEstimateRed (float), TotalNumFeatures (integer), NonCtrlNumUpReg (integer), NonCtrlNumDownReg (integer), eQCObsVsExpLRSlope (float), eQCObsVsExpLRIntercept (float)
Descriptions, in the same order:
This result comes from a two-step calculation. Step 1 calculates, for each probe, the absolute average log ratio of all inlier spike-in features that have a minimum number of replicates. Step 2 calculates the average of all the absolute average log ratios from step 1.
Average standard deviation of log ratios of all inlier spike-in probe sets with a minimum number of replicates.
Ave
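The two-step absolute average log ratio can be sketched in a few lines. The sketch reads "absolute average log ratio" as the absolute value of the mean log ratio for each probe. It also assumes outlier removal has already happened and that only probes with enough replicates are kept. The function name and probe names are hypothetical.

```python
def abs_ave_log_ratio(log_ratios_by_probe: dict[str, list[float]], min_replicates: int) -> float:
    """Two-step statistic: |mean log ratio| per probe, then the mean over probes.

    Raises ZeroDivisionError if no probe has at least min_replicates log ratios.
    """
    per_probe = [
        abs(sum(lrs) / len(lrs))  # step 1: absolute average log ratio per probe
        for lrs in log_ratios_by_probe.values()
        if len(lrs) >= min_replicates
    ]
    return sum(per_probe) / len(per_probe)  # step 2: average over probes


example = {"probe_A": [0.2, 0.4], "probe_B": [-0.5, -0.3], "probe_C": [1.0]}
print(abs_ave_log_ratio(example, min_replicates=2))  # about 0.35; probe_C has too few replicates
```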
72. 0
Columns: LogRatioError, PValueLogRatio, gProcessedSignal/rProcessedSignal (types: float, float, float, float)
1000; per-feature log of rProcessedSignal/gProcessedSignal. If SURROGATES are turned off, then: if DyeNormRedSig < 0.0 & DyeNormGreenSig > 0.0; if DyeNormRedSig > 0.0 & DyeNormGreenSig < 0.0; if DyeNormRedSig < 0.0 & DyeNormGreenSig < 0.0.
LogRatioError: error of the log ratio, calculated according to the error model chosen. Options: if SURROGATES are turned off, then if DyeNormRedSig < 0.0 OR DyeNormGreenSig < 0.0; if SURROGATES are turned on, then LogRatioError.
PValueLogRatio: Significance level of the log ratio computed for a feature.
ProcessedSignal: The signal left after all the Feature Extraction processing steps have been completed. For one-color data, ProcessedSignal contains the multiplicatively detrended background-subtracted signal if detrending is selected and helps. If detrending does not help, this column contains the BackgroundSubtractedSignal.

Table 23 Feature results contained in the COMPACT output text file COMPACT FEATURES table (continued)
Features Green | Features Red | Types | Options | Description
gProcessedSigError, gMedianSignal, gBGMedianSignal, gBGPixSDev, gIsSaturated, gIsLowPMTScaledUp, gIsFeatNonUnifOL | rProcessedSigError, rMedianSignal, rBGMedianSignal
73. [Figure 41: 1-color QC Report, Agilent SpikeIns Log Signal vs Log Relative Concentration plot (Processed Sig vs Concentration; x axis: Log Concentration)]
Table of Values for Concentration Response Plot (1-color only)
This table presents the values for the log signal vs log concentration plot shown in Figure 41.
Agilent Spike-In Concentration Response Statistics
• Linear Range Statistics: Low Signal, High Signal, Low Relative Concentration, High Relative Concentration, Slope, R^2 Value
• Signal Detection Limit Statistics: Saturation Point, Low Threshold, Low Threshold Error, Spike-In Detection Limit
Figure 42 1-color QC Report: Agilent Spike-In Concentration Response Statistics
Detection of missing spike-ins
This section describes how Feature Extraction deals with missing spike-ins.
Case 1: The array has a grid template with NO SpikeIns in the design.
• If the standard protocol is run, Feature Extraction gives a warning in the Summary Report that there are no SpikeIn probes.
• If the protocol has SpikeIn Used set to False, the QC metric table in the QC Report will show for values and black font instead of red, green or blue fonts indica
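The Slope and R^2 Value entries in the linear range statistics describe a straight-line fit of log signal against log relative concentration. A minimal ordinary least-squares sketch follows; it assumes the points inside the linear range have already been chosen and that both inputs are already log-transformed. The helper name `linear_fit` is hypothetical, and the section does not say that Feature Extraction uses exactly this fitting procedure.

```python
def linear_fit(log_conc, log_signal):
    """Ordinary least-squares slope, intercept and R^2 of log signal vs log concentration."""
    n = len(log_conc)
    if n != len(log_signal) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(log_conc) / n
    my = sum(log_signal) / n
    sxx = sum((x - mx) ** 2 for x in log_conc)
    syy = sum((y - my) ** 2 for y in log_signal)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_conc, log_signal))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


print(linear_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))  # (2.0, 1.0, 1.0)
```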
74. [Figure 6: 1-color Gene Expression QC Report with Spike-ins (p3): Signal Detection Limit Statistics (Saturation Point 5.81, Low Threshold 0.05, Low Threshold Error 0.22, Spike-In Detection Limit 0.75) and Excellent / Good / Evaluate thresholds for DetectionLimit, gDDN, and DDN]
Streamlined CGH QC Report
1 QC Report Headers on page 87
2 Spot finding of Four Corners on page 90
3 Spatial Distribution of All Outliers on page 91
4 QC reports with metric sets added on page 83
5 Histogram of Signals Plot (1-color GE or CGH) on page 96
6 Outlier Stats on page 91
The streamlined CGH QC report provides QC metrics that are relevant to the CGH application. All log plots use log base 2, not 10.
[Figure: Streamlined CGH QC Report, page 1 of 2. Header example: 2 Color CGH; Date: Wednesday, November 16, 2011 14:44; Sample (red/green); User Name: KM1; FE Version: 11.0.0.6; Image: Hu244K_CGH_251469312458; BG Method: Detrend on NegC; Protocol: CGH_1100_Jul11 (Read Only); Multiplicative Detrend: True; Grid: 014693_D_F_20080627; Dye Norm: Linear; Saturation Value: 65211 (r), 65151 (g); DyeNorm List: NA; No. of Probes in DyeNorm List: NA. Table: Evaluation Metrics for CGH_QCMT_Jul11 (Metric Name, Value, Excellent, Good, Evaluate)]
75. ...16Jan"/>
<Array ID="1"><Sample Name=""/></Array>
<Array ID="2"><Sample Name=""/></Array>
<Array ID="3"><Sample Name=""/></Array>
<Array ID="4"><Sample Name=""/></Array>
<Array ID="5"><Sample Name=""/></Array>
<Array ID="7"><Sample Name=""/></Array>
<Array ID="8"><Sample Name=""/></Array>
</Extraction>
Example of extraction set with grid file
If you are extracting with a grid file, the Extraction entity structure will look like the following:
<Extraction Name="US14702375_251494710059_S01">
<Image Name="C:\GridComparison\US14702375_251494710059_S01.tif"/>
<Grid Name="C:\GridComparison\gridfile\grid.csv" IsGridFile="True"/>
<Protocol Name="miRNA_95_16Jan"/>
<Array ID="1"><Sample Name=""/></Array>
<Array ID="2"><Sample Name=""/></Array>
<Array ID="3"><Sample Name=""/></Array>
<Array ID="4"><Sample Name=""/></Array>
<Array ID="5"><Sample Name=""/></Array>
<Array ID="7"><Sample Name=""/></Array
76. ...3 and Chapter 4 of the Reference Guide for a listing of the FULL and COMPACT sets of features sent to the text and MAGE-ML result files.
The input file for extraction is a Feature Extraction project file (a standard project, not an on-time project) with a file type of XML. An example of a project file (.fep) is shown. To create project files, use the Feature Extraction user interface and the instructions in the Quick Start Guide.
<FeatureExtractionML>
<FEPMLVerInfo VerMaj="2" VerMin="50"/>
<FEProject Operator="Unknown"
  ResultsDirectory=""
  ResultsLocationSameAsImage="True"
  OutputMAGE="False"
  MAGEOutPkgType="Full"
  OutputMAGECompressed="False"
  OutputJPEG="False"
  OutputText="True"
  TextOutPkgType="Full"
  ZipTxtFile="False"
  CropMultipackImage="False"
  OutputVisualResults="True"
  OutputGRID="False"
  OutputArrayQCReport="True"
  FTPSendTiffFile="False"
  FTPMachineDestination=""
  FTPPort="21"
  FTPUserName="resolverftp"
  FTPPassword=""
  FTPProfileDestinationFolder="mage"
  OverWritePreviousResults="False"
  RDAUserName=""   (For Resolver)
  RDACtrlGroups=""   (For Resolver)
  DefaultQCMetricSet=""   (No longer used)
  AfterArrayPostProcessingStep=""
  AfterSlidePostProcessingStep=""
  AfterBatchPostProcessingStep=""
  ExternalDyeNormList=""
  DefaultP
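Because a project file is ordinary XML, its settings can be inspected with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` and assumes, as in the listing above, that `FEProject` is a direct child of the `FeatureExtractionML` root and that flags are stored as the strings "True" and "False". The helper `read_project_settings` and the abbreviated sample document are hypothetical; a real .fep file contains more attributes and elements than shown.

```python
import xml.etree.ElementTree as ET


def read_project_settings(fep_xml):
    """Return FEProject attributes, converting "True"/"False" strings to bool."""
    root = ET.fromstring(fep_xml)
    project = root.find("FEProject")  # direct child of the root element
    if project is None:
        raise ValueError("no FEProject element found")
    settings = {}
    for name, value in project.attrib.items():
        settings[name] = (value == "True") if value in ("True", "False") else value
    return settings


sample = """<FeatureExtractionML>
  <FEPMLVerInfo VerMaj="2" VerMin="50"/>
  <FEProject Operator="Unknown" OutputText="True" TextOutPkgType="Full"
             OutputMAGE="False" FTPPort="21"/>
</FeatureExtractionML>"""

settings = read_project_settings(sample)
print(settings["OutputText"], settings["TextOutPkgType"], settings["FTPPort"])
```

Numeric attributes such as FTPPort stay strings here; converting them is left to the caller because the sketch does not know each attribute's type.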
77. [Figure 4: 1-color Gene Expression QC Report with Spike-ins (p1). Callouts: 3 Outlier Stats on page 91; 4 Spatial Distribution of All Outliers on page 91; 5 Net Signal Statistics on page 93; 6 Histogram of Signals Plot (1-color GE or CGH) on page 96. Panels: Spatial Distribution of All Outliers on the Array (532 rows x 85 columns); Histogram of Signals Plot (x axis: Log of BG SubSignal)]
7 Negative Control Stats on page 94
8 Local Background Inliers on page 97
9 Foreground Surface Fit on page 97
10 Multiplicative Surface Fit on page 99
11 Reproducibility Statistics (CV Replicated Probes) on page 104
12 1-color gene expression spike-in signal statistics on page 112
13 Spatial Distribution of Median Signals for each Row and Col
78. [Figure 3: 2-color Gene Expression QC Report with Spike-ins (p3). Metric thresholds (Excellent / Good / Evaluate) for rE1aMedCVBkSubSignal, absE1aObsVsExpCorr (> 0.86 / < 0.86), absE1aObsVsExpSlope (> 0.85 / < 0.85), gDDN, and rDDN. Plot: Agilent SpikeIns Expected LogRatio vs Observed LogRatio, with Standard Deviation of Log Ratio (Intercept 0.085, Slope 0.954, R^2 0.986)]
1-color Gene Expression QC Report
This module shows you the organization of the 1-color gene expression QC report. See the following figure and the figures on the next pages for links to information on each of the QC Report regions.
[Figure: 1-color Gene Expression QC Report, page 1 of 3. Callouts: 1 QC Report Headers on page 87; 2 Spot finding of Four Corners on page 90. Header fields include Date, Grid, BG Method, Protocol, Background Detrend, User Name, Multiplicative Detrend, FE Version, Additive Error, and Saturation Value; panels include Agilent SpikeIns]
79. Agilent Feature Extraction 12.0 Reference Guide
For Research Use Only. Not for use in diagnostic procedures.
Agilent Technologies
Notices
© Agilent Technologies, Inc. 2015
No part of this manual may be reproduced in any form or by any means (including electronic storage and retrieval or translation into a foreign language) without prior agreement and written consent from Agilent Technologies, Inc. as governed by United States and international copyright laws.
Edition: G4460-90052 Revision A2, August 2015
Printed in USA
Agilent Technologies, Inc.
5301 Stevens Creek Blvd.
Santa Clara, CA 95051
Agilent Recognized Trademarks
Microsoft is a U.S. registered trademark of Microsoft Corporation. Windows NT is a U.S. registered trademark of Microsoft Corporation. Windows and MS Windows are U.S. registered trademarks of Microsoft Corporation.
Patents
Portions of this product may be covered under US patent 6571005, licensed from the Regents of the University of California.
Warranty
The material contained in this document is provided "as is," and is subject to being changed, without notice, in future editions. Further, to the maximum extent permitted by applicable law, Agilent disclaims all warranties, either express or implied, with regard to this manual and any information contained herein, including but not limited to the implied warranties of merchantability and fitness
80. ...B (linear) and C (constant) terms of the polynomial fit for the expected noise, for any type of microarray experiment.
Compute Bkgd Bias and Error
This algorithm applies background subtraction to each feature to yield the background-subtracted intensity. You can also apply a spatial detrend algorithm to estimate and remove noise due to a systematic gradient on the microarray. Another algorithm can correct for any underestimation or overestimation of the background in both the red and green channels of low-intensity signals by applying a global background adjustment value to the background-subtracted signals.
Before estimating the error, the system calculates robust negative control statistics for both CGH and miRNA data. CGH microarrays have a variety of sequences that are used as negative controls. Occasionally, hot features are not flagged as population outliers. In addition, hot sequences may exist; that is, all features of that sequence have higher signals than features in other negative control sequences. These problems can inflate NegC SD, which is used in the calculation of AdditiveError for the CGH error model.
To provide an estimate of the error in the background-subtracted signal calculation, the error model is now calculated after background subtraction. The 1-color error model has been changed to exactly mimic the 2-color error model.
To determine if the featu
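The paragraph above explains why a plain standard deviation of negative controls is fragile: a single hot feature inflates NegC SD. The sketch below shows that effect using a common robust estimator, the median and the interquartile range divided by 1.349 (which approximates the SD for normally distributed data). This estimator is a stand-in chosen for illustration; the section does not describe the robust statistics Feature Extraction actually uses, and `robust_negc_stats` is a hypothetical helper.

```python
import statistics


def robust_negc_stats(negc_signals):
    """Median and IQR-based SD of negative-control signals.

    Hypothetical illustration: median/IQR is a stand-in estimator, not
    Agilent's documented robust NegC algorithm.
    """
    if len(negc_signals) < 2:
        raise ValueError("need at least two negative-control signals")
    center = statistics.median(negc_signals)
    q1, _, q3 = statistics.quantiles(negc_signals, n=4)
    # For normally distributed data, IQR / 1.349 approximates the SD.
    robust_sd = (q3 - q1) / 1.349
    return center, robust_sd


# One "hot" feature (100.0) among otherwise similar negative controls.
signals = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 100.0]
center, robust_sd = robust_negc_stats(signals)
print(center, round(robust_sd, 3), round(statistics.stdev(signals), 3))
```

The robust SD stays close to the spread of the ordinary controls, while `statistics.stdev` is dominated by the single hot feature.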
81. BGPixSDev: Standard deviation of all inlier pixels per local BG of each feature, computed independently in each channel.
SQT | gIsSaturated | rIsSaturated | 1 = Saturated, 0 = Not saturated | Integer indicating if a feature is saturated or not. A feature is saturated if 50% of the pixels in a feature are above the saturation threshold.
Table 30 Feature results (Compact) contained in the MAGE-ML FEATURES table
Columns: Quant Type | Features Green | Features Red | Options | Description
SQT | gIsLowPMTScaledUp | rIsLowPMTScaledUp | 1 = Low, 0 = High | For XDR features, an integer indicating if the low PMT value was used for the calculations or the high value.
SQT | gIsFeatNonUnifOL | rIsFeatNonUnifOL | 1 indicates the feature is a non-uniformity outlier in g/r | Integer indicating if a feature is a NonUniformity Outlier or not. A feature is non-uniform if the pixel noise of the feature exceeds a threshold established for a uniform feature.
SQT | gIsBGNonUnifOL | rIsBGNonUnifOL | 1 indicates the local background is a non-uniformity outlier in g/r | The same concept as above, but for background.
SQT | gIsFeatPopnOL | rIsFeatPopnOL | 1 indicates the feature is a population outlier in g/r | Boolean flag indicating if a feature is a Population Outlier or not. Probes with replicate features on a microarray are examined using population statistics
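The IsSaturated rule above can be expressed as a short function over a feature's pixel intensities. This sketch assumes "above" means strictly greater than the threshold and reads "50% of the pixels" as "at least half"; both are interpretations, not statements from the source. The helper name `is_saturated` and the threshold 65000 are hypothetical.

```python
def is_saturated(pixel_values, saturation_threshold):
    """Return 1 if at least half of a feature's pixels exceed the threshold, else 0."""
    if not pixel_values:
        raise ValueError("feature has no pixels")
    above = sum(1 for v in pixel_values if v > saturation_threshold)
    return 1 if above / len(pixel_values) >= 0.5 else 0


print(is_saturated([100, 200, 65535, 65535], 65000))  # 2 of 4 pixels above -> 1
print(is_saturated([100, 200, 300, 65535], 65000))    # 1 of 4 pixels above -> 0
```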
82. ...CTerm
Flag Outliers: OutlierFlagger_OLAutoComputeABC
Flag Outliers: OutlierFlagger_FeatBCoeff
Flag Outliers: OutlierFlagger_FeatCCoeff
Flag Outliers: OutlierFlagger_FeatBCoeff2
Types: float, float, float, float, integer (1 = True, 0 = False), float, float, float
Descriptions:
• Applies to feature; specifies variance due to background noise of the scanner, slide glass, and other signal-independent sources.
• Applies to background; specifies the intensity-dependent variance and is set to the square of the CV.
• Applies to background; specifies the variance due to the Poisson-distributed noise.
• Applies to background; specifies variance due to background noise of the scanner, slide glass, and other signal-independent sources.
• 1 = AutoCompute outlier flagging turned on; 0 = AutoCompute outlier flagging turned off. For Agilent protocols, when this flag is turned on, the polynomial is calculated automatically. This means that all the above Feature and BG terms for B and C no longer appear in the output; rather, they are calculated automatically and appear in the STATS table. Also, the eight parameters following this row appear.
• Feature Red Poissonian Noise Term Multiplier
• Feature Red Signal Constant Term Multiplier
• Feature Green Poissonian Noise Term Multiplier
Table 17 List of parameters and options contained within the FULL text output file (FEPARAMS table, continued)
83. DetrendSignal, gProcessedSigError, BGSubSignalError, MultDetrendSignal
Correct Dye Biases
Step 21. Determine normalization features
Normalization features are features used to evaluate the dye bias between the red and green channels.
Using All Probes method
Under this method, the initial normalization features are selected based on the following three criteria:
• Features are positive and significant versus the background (e.g., IsPosAndSignif = 1).
• Features are non-control (e.g., ControlType = 0).
• Features are non-outliers (e.g., IsFeatNonUnifOL = 0, IsFeatPopnOL = 0, IsSaturated = 0).
Using List of Normalization Genes method
Under this method, the user selects the normalization features. These features can be housekeeping genes or genes with no differential expression.
Using Rank Consistency Probes method
Under this method, the chosen normalization features simulate housekeeping genes. These features fall within the central tendency of the data, having consistent trends between the red and green channels. They are selected based on the following two criteria:
• Features pass the three criteria described in the all significant, non-control, and non-outlier features method, and
• Features pass the rank consistency filter between the red and green channels.
Rank consistency fil
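The three All Probes criteria translate directly into a filter over per-feature flags. In this sketch each feature is a plain dictionary keyed by the result column names used in this guide (ControlType, gIsPosAndSignif, rIsFeatPopnOL, and so on). Requiring the flags to pass in both the green and red channels is an assumption made here, since the criteria above are written without a channel prefix; `all_probes_normalization_features` and `make_feature` are hypothetical helpers.

```python
def all_probes_normalization_features(features):
    """FeatureNum of features meeting the three 'All Probes' criteria.

    Assumption: the per-channel flags must pass in both channels.
    """
    selected = []
    for f in features:
        channels_ok = all(
            f[c + "IsPosAndSignif"] == 1      # positive and significant vs background
            and f[c + "IsFeatNonUnifOL"] == 0  # not a non-uniformity outlier
            and f[c + "IsFeatPopnOL"] == 0     # not a population outlier
            and f[c + "IsSaturated"] == 0      # not saturated
            for c in ("g", "r")
        )
        if f["ControlType"] == 0 and channels_ok:  # non-control features only
            selected.append(f["FeatureNum"])
    return selected


def make_feature(num, control=0, **overrides):
    """Build a feature dict that passes every criterion unless overridden."""
    f = {"FeatureNum": num, "ControlType": control}
    for c in ("g", "r"):
        f.update({c + "IsPosAndSignif": 1, c + "IsFeatNonUnifOL": 0,
                  c + "IsFeatPopnOL": 0, c + "IsSaturated": 0})
    f.update(overrides)
    return f


features = [make_feature(1), make_feature(2, control=1),
            make_feature(3, rIsSaturated=1), make_feature(4, gIsPosAndSignif=0)]
print(all_probes_normalization_features(features))  # [1]
```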
84. ...E1aObsVsExpSlope: > 0.85 | < 0.85
LogRatioImbalance: -0.26 to 0.26 | -0.75 to -0.26 or 0.26 to 0.75 | < -0.75 or > 0.75
Figure 46 QC Metrics for GE2_QCMT_Jun14 metric set
Columns: Metric Name | Excellent | Good | Evaluate
IsGoodGrid: > 1 | NA
AddErrorEstimateGreen: < 5 | 5 to 12 | > 12
AnyColorPrcntFeatPopn: < 8 | 8 to 15 | > 15
gNonCtrlMedPrcntCVBG: 0 to 10 | 10 to 15 | < 0 or > 15
gTotalSignal75pctile
LabelingSpikeInSignal: > 2.50 | < 2.50
HybSpikeInSignal: > 2.50 | < 2.50
StringencySpikeInRatio
QC Metrics for miRNA_QCMT_Jun14 metric set
Metric Evaluation Logic
For details on how to associate a QC metric set with a protocol, see the Feature Extraction User Guide.
When a QC metric set is associated with a protocol, it is used to evaluate results using up to three defined threshold values for given metrics. Results are then flagged in the QC Report Evaluation Metrics table according to the logic described in the following diagram and tables.
Figure 48 shows the metric evaluation using three threshold levels. The black dots indicate how a result is evaluated if its value is the same as a limit value.
[Figure 48: Three-level QC Metrics evaluation used for Feature Extraction (labels include Good, Upper warning limit, and Lower limit)]
The following tables describe how results are eva
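For a metric where lower values are better, such as AddErrorEstimateGreen (< 5 Excellent, 5 to 12 Good, > 12 Evaluate), the three-level evaluation reduces to two comparisons. This is a minimal sketch: the boundary rule (a value equal to either limit is rated Good) is read from the inclusive "5 to 12" range in the metric table and is an assumption, because Figure 48's exact black-dot convention is not reproduced here. `evaluate_lower_is_better` is a hypothetical helper; the same shape also fits AnyColorPrcntFeatPopn (< 8, 8 to 15, > 15).

```python
def evaluate_lower_is_better(value, excellent_limit, evaluate_limit):
    """Three-level rating for a metric where lower values are better.

    Assumed boundary rule: values equal to either limit are rated Good.
    """
    if value < excellent_limit:
        return "Excellent"
    if value <= evaluate_limit:
        return "Good"
    return "Evaluate"


for v in (3.2, 5.0, 12.0, 14.7):
    print(v, evaluate_lower_is_better(v, excellent_limit=5, evaluate_limit=12))
```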
85. FeNoWindows -c removemetricset <metricset_name>
metricset_name: The path to the metric set file.
This command removes a dye norm list from the database.
FeNoWindows -c removedyenormlist -g gridtemplatename <dyenormlistname>
gridtemplatename: Name of the grid template associated with the dye norm list to be removed.
dyenormlistname: Name of the dye norm list to be removed.
Example: FeNoWindows -c removedyenormlist -g 14850_D_F_20060807 MyNormlist
This command links a protocol to a grid template so that the protocol is automatically assigned if a valid scan barcode exists.
FeNoWindows -c linkprotocoltogrid -p protocol -q linktype <gridname>
linktype: Type of link, either OneColor or TwoColor, that links the protocol to the grid template.
Command example: FeNoWindows -c linkprotocoltogrid -p myOneColorProtocol -q OneColor 012345_D_20050212
exportprotocols
This command exports all the protocols in a given database to the location you specify.
FeNoWindows -c exportprotocols <to_directory>
to_directory: The complete path to the directory where you want to keep the protocols.
exportmetricsets
This command exports all the metric sets in a given database to the location you specify.
FeNoWindows -c exportmetricsets <to_directory>
to_direc
86. ...GEML output; 0 = False
Table 17 List of parameters and options contained within the FULL text output file (FEPARAMS table, continued)
Columns: Protocol Step | Parameters | Type | Options | Description
Compute Bkgd Bias and Error | BGSubtractor_MultiplicativeDetrend | integer | 1 = True (On), 0 = False | Enables multiplicative detrending. 1-color and CGH microarray protocols have this parameter enabled.
Compute Bkgd Bias and Error | BGSubtractor_MultDetrendWinFilter | integer | 0 = No filtering, 1 = Average filtering, 2 = Median filtering |
Compute Bkgd Bias and Error | BGSubtractor_MultDetrendIncrement | integer | | The increment, in number of features, by which the square window is shifted horizontally and vertically on the microarray.
Compute Bkgd Bias and Error | BGSubtractor_MultDetrendWindow | integer | | Specifies the size of the square window by the number of rows and columns. The specified percentage of low-intensity features is selected from this window size.
Compute Bkgd Bias and Error | BGSubtractor_MultDetrendNeighborhoodSize | float | 0 1 | Specifies the fraction of the total number of neighborhood data points that will be weighted for linear regression during surface fitting for each data point.
Compute Bkgd Bias and Error | BGSubtractor_MultHighPassFilter | integer | 1 = True, 0 = False | Enables rejection of probes close to zero signal from the set of features used in the fit.
Compute Bkgd Bias and Error | BGSubtractor_PolynomialMultipli... | integer | T
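The window and increment parameters describe a square window that is shifted across the feature grid, with a percentage of the lowest-intensity features picked from each window position. The sketch below illustrates only that sliding-window selection; how Feature Extraction rounds the percentage, handles windows at the array edges, and then fits the surface is not described here, so those choices (round up, skip partial windows) are assumptions. `low_intensity_features_per_window` is a hypothetical helper.

```python
import math


def low_intensity_features_per_window(grid, window, increment, percent):
    """Map each window origin (top, left) to the (row, col) positions of the
    lowest-intensity `percent` of features inside that square window.

    Assumptions: partial windows at the edges are skipped, and the number
    of features kept is rounded up (at least one).
    """
    rows, cols = len(grid), len(grid[0])
    selections = {}
    for top in range(0, rows - window + 1, increment):
        for left in range(0, cols - window + 1, increment):
            cells = sorted(
                (grid[r][c], (r, c))
                for r in range(top, top + window)
                for c in range(left, left + window)
            )
            keep = max(1, math.ceil(len(cells) * percent / 100))
            selections[(top, left)] = [pos for _, pos in cells[:keep]]
    return selections


grid = [[r * 4 + c + 1 for c in range(4)] for r in range(4)]  # values 1..16
print(low_intensity_features_per_window(grid, window=2, increment=2, percent=25))
```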
87. ...GSub
Failed to automatically estimate additive error. Value num has been used as the Red/Green additive error. | Won't come up using standard protocols. The surface fit needs to be calculated.
1034 | The auto estimate of the additive error used only Negative Control statistics for this array. | Won't come up using standard protocols. The surface fit needs to be calculated.
1036 | The CGH QCReport cannot be generated for one-color data. | Won't come up using standard protocols.
1037 | CGH is not a one-color protocol. No valid formulation exists. Ignoring the protocol's parameter UseSpikeIns. | Won't come up using standard protocols.
1038 | Not enough significant eQC replicates for some probes. Their statistics will be set to zero. | Maybe nothing was spiked in. If SpikeIns were used, then this indicates another problem. This array should be looked at.
1039 | There are no eQC probes on this array. Cannot perform a fit of the data. | The design in use has no spike-ins defined. Can be ignored, or you can create a special protocol just turning off spike-ins.
1040 | The SpikeIns on this array appear suspect. Software is unable to make a fit of the data. Setting the fit statistics to 0. | Either nothing was spiked in or there is another problem with the data.
1041 | The SpikeIns on this array appear suspect. Most of the eQC th | Either nothing was spiked in or
88. ...GreenSig < 0.0. If SURROGATES are turned off, then if DyeNormRedSig < 0.0 OR DyeNormGreenSig < 0.0. If SURROGATES are turned on, then LogRatioError is the error of the log ratio calculated according to the error model chosen.
Significance level of the log ratio computed for a feature.
The g/r surrogate value used | No surrogate value used
Table 29 Feature results (Full) contained in the MAGE-ML FEATURES table
Columns: Quant Type | Features Green | Features Red | Options | Description
SQT | gIsFound | rIsFound | 1 = IsFound, 0 = IsNotFound | A Boolean used to flag found (strong) features. The flag is applied independently in each channel. A feature is considered found if the calculated spot centroid is within the bounds of the spot deviation limit with respect to the corresponding nominal centroid. NOTE: IsFound was previously termed IsStrong.
DerivedSignal | Green DerivedSignal | Red DerivedSignal | | The propagated feature signal per channel used for computation of log ratio.
Error | Green ProcessedSigError | Red ProcessedSigError | | Standard error of propagated feature signal per channel.
SQT | gNumPixOLHi | rNumPixOLHi | | Number of outlier pixels per feature with intensity > upper threshold set via the pixel outlier rejection method.
SQT | gNumPixOLLo | rNumPixOLLo | | The number
SQT | gNumPix | rNumPix | |
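The IsFound definition compares a feature's calculated spot centroid with its nominal centroid. A minimal sketch follows, assuming the deviation limit is a Euclidean distance in pixels; the table does not say whether the limit is radial or per axis, so that is an assumption, and `is_found` is a hypothetical helper applied per channel.

```python
import math


def is_found(centroid, nominal_centroid, deviation_limit):
    """Return 1 if the found spot centroid lies within `deviation_limit`
    (assumed: Euclidean distance in pixels) of its nominal position, else 0."""
    dx = centroid[0] - nominal_centroid[0]
    dy = centroid[1] - nominal_centroid[1]
    return 1 if math.hypot(dx, dy) <= deviation_limit else 0


print(is_found((10.5, 10.5), (10.0, 10.0), 1.0))  # about 0.71 px away -> 1
print(is_found((12.0, 10.0), (10.0, 10.0), 1.0))  # 2 px away -> 0
```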
89. ...L Features Table 200; Other text result file annotations 204
Feature Extraction produces a tab-delimited text file that contains three tables of input parameters and output results. These tables are FEPARAMS, STATS, and FEATURES. Together they list all the possible parameters, statistics, and feature results that can be generated in the text output file.
• FEPARAMS: Contains input parameters and options used to run Feature Extraction.
• STATS: Gives results derived from statistical calculations that apply to all features on the microarray.
• FEATURES: Displays results for each feature in over 90 output columns, such as gene name, log ratio, processed signal, mean signal, or dye-normalized signal.
In the Project Properties sheet, you can choose to generate the FULL set of parameters, statistics, and feature information, or the COMPACT, QC, or MINIMAL set. The COMPACT output package is the default; it contains only those columns that are required by GeneSpring and DNA Analytics software.
The tables on the following pages present the text file summary for all output package types: FULL, COMPACT, QC, and MINIMAL. The parameters, statistical results, and feature results included in any one output file vary depending on the application and protocol used for Feature Extraction.
You also have the option to generate one file with all three tables or three sepa
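Because the output is a tab-delimited file holding three tables, it can be split back into FEPARAMS, STATS, and FEATURES with a small line-oriented parser. The row layout used below is an assumption not described in this section: each table begins with a header row whose first field is the table name, data rows start with DATA, and TYPE rows and "*" separator rows are skipped. `parse_fe_text` and the abbreviated sample are hypothetical; check a real output file before relying on this layout.

```python
def parse_fe_text(text):
    """Split a Feature Extraction text export into {table_name: [row_dict, ...]}.

    Assumed layout: a header row whose first field is FEPARAMS, STATS or
    FEATURES, followed by DATA rows; TYPE rows and '*' separators are ignored.
    Values are kept as strings.
    """
    tables = {}
    current, header = None, None
    for line in text.splitlines():
        fields = line.split("\t")
        marker = fields[0]
        if marker in ("FEPARAMS", "STATS", "FEATURES"):
            current, header = marker, fields[1:]
            tables[current] = []
        elif marker == "DATA" and current is not None:
            tables[current].append(dict(zip(header, fields[1:])))
    return tables


sample = "\n".join([
    "TYPE\ttext\ttext",
    "FEPARAMS\tProtocol_Name\tFeatureExtractor_Version",
    "DATA\tGE1_1200_Jun14\t12.0",
    "*",
    "TYPE\tinteger\tfloat",
    "FEATURES\tFeatureNum\tgProcessedSignal",
    "DATA\t1\t523.4",
    "DATA\t2\t17.9",
])
tables = parse_fe_text(sample)
print(tables["FEPARAMS"][0]["Protocol_Name"], len(tables["FEATURES"]))
```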
90. ...Low Level Runtime Memory Error | Yes
Table 42 XML warning codes
Columns: Warning code | Warning message | Resolution
1024 | The scan resolution is not sufficient for the density of the design. Gridding might be off; intensities might be imprecise. | Rescan the image in 5 micron mode.
1060 | Agilent does not support this configuration; please consult the support matrix in the Feature Extraction users guide for a supported configuration. | See Table 1, "Supported Scans and Array Formats," in the Feature Extraction User Guide.
1125 | The computation of the XDR fit for red/green is based on only num pairs of high PMT/low PMT matching values.
1126 | The computation of the XDR fit for red/green is based on a small range of values (low PMT range xx.xx).
1127 | The computation of the XDR fit for red/green results in a large intercept (xx.xx).
1128 | The computed XDR ratio for red/green is xx.xx vs expected xx.xx from PMT settings. Check scanner calibration.
1029 | Feature Significance will be computed on Pixel Statistics since the Error Model is turned off.
1031 | Multiplicative Detrending will not be performed: red/green Channel did not find enough suitable replicated features to be able to reliably detrend.
Signal ranges of the scan are not high enough to warrant XDR. This can be ignored. Most likely the signal ranges are not high
91. ...No picture of grid corners
5000 | Error accessing scan file | I/O Error | Yes
7000 | User aborted | Abort | Yes
8000 | The scan has no barcode or the grid template you assigned to this extraction set has an AMADID different from the AMADID in its scan's barcode info. FE unable to automate the extraction. The operation completed successfully.
8000 | Metricset %s is not present in database. Please import missing metricset into database. | I/O | Yes
8000 | Unable to start extraction. Unsupported scanner: Model GenePix 4000B 83750 by Axon Instruments V1.00 is not supported. | I/O | Yes
8000 | Unable to start extraction. Unable to open C:\Documents and Settings\avinash_borde\Desktop\P90S35_portrait01_GE2 NonAT_95 Feb 07_feat.csv. The system cannot find the file specified. | I/O | Yes
8000 | Unable to find a default grid template from eArray (some reason) | I/O | Yes
Table 41 XML error codes (continued)
Columns: Error code | Error message | Type | Abort
8000 | Unable to start extraction. Extraction creating error: Grid does not match image size.
8000 | Failed to import design file into database (some reason)
8000 | Unable to find default protocol for extraction (some reason)
8000 | Unable to start extraction | ALL | Yes
10000 | Extraction failed | ALL | Yes
10000 | Extraction completed with errors | ALL | No
20000 | Execution error
92. ON
This command extracts the designated TIFF file using the protocol specified. If the protocol is not present, the default protocol in Feature Extraction is used. The default grid template is used for the extraction. This command creates a temporary project (.fep) file and uses it for extraction. SAF information cannot be provided when executing extraction using this switch.
FeNoWindows -c extract -o <output_file> -i <tiff_file> -p <protocol_name>
output_file: The name of the result XML file. This file looks like a project file with the status added (see the following description).
tiff_file: The absolute path to the TIFF image file.
protocol_name: The name of the protocol to use for extraction.
You must specify the -o option when specifying the output file name, or FeNoWindows will not create the file.
addgrid
This command adds a grid to the local database.
FeNoWindows -c addgrid <design_file_path> <grid_file_path>
design_file_path: The path and name of a design file.
grid_file_path: The path and name of a grid file.
addprotocol
This command adds a protocol to the database.
FeNoWindows -c addprotocol <protocol_file_path>
protocol_file_path: The path and name of a protocol file.
addmetricset
This command adds a metric set to the database
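When the extract command is scripted, building it as an argument list (rather than one shell string) keeps image paths that contain spaces intact. The sketch below only assembles the list in the order given by the syntax above; the dash prefix on the switches is reconstructed from the extracted text and should be checked against your installed version. `build_extract_command` is a hypothetical helper, and the paths and protocol name are placeholders.

```python
def build_extract_command(output_file, tiff_file, protocol_name, exe="FeNoWindows"):
    """Argument list for the 'extract' command (switch prefix assumed to be '-')."""
    return [exe, "-c", "extract",
            "-o", output_file,    # result XML file; omit it and no file is created
            "-i", tiff_file,      # absolute path to the TIFF image
            "-p", protocol_name]  # protocol name; the default is used if absent


cmd = build_extract_command(r"C:\results\out.xml", r"C:\scans\my scan.tif", "GE1_1200_Jun14")
# On a machine with Feature Extraction installed, this list could be passed to
# subprocess.run(cmd, check=True).
print(cmd)
```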
93. True, 5, Use Window Average, False, Most Conservative, 0 1000, 0 1000, True, True, True, Automatically Determine
Probe Selection Method: Use Rank Consistent Probes
Rank Tolerance: 0.050
Variable Rank Tolerance: False
Omit Background Population Outliers: False
Allow Positive and Negative Controls: False
Signal Characteristics: OnlyPositiveAndSignificantSignals
Normalization Correction Method: Linear and Lowess
Max Number Ranked Probes: 8000
Peg Log Ratio Value: 4.00
Spikein Target Used: True
Min Population for Replicate Stats: 5
Table 5 Default settings for GE2_1200_Jun14 protocol (continued)
Columns: Protocol step | Parameter | Default Setting Value v12.0
Grid Test Format: Automatically Determine (Recognized formats: 60 micron and 30 micron feature size, third party)
PValue for Differential Expression: 0.010000
Percentile Value: 75.00
Generate Results | Type of QC Report: Gene Expression
Generate Results | Generate Single Text File: True
Generate Results | JPEG Down Sample Factor: 4
GE2-NonAT_1100_Jul11
Use this protocol for running Feature Extraction on non-Agilent microarrays scanned with the Agilent scanner.
Table 6 Default settings for GE2-NonAT_1100_Jul11 protocol
Columns: Protocol step | Parameter | Default Setting Value v12.0
Place Grid | Array Format | For any format automatically | Automatically Determine
94. ...QRatio; Use Qtest for Small Populations; Report Population Outliers as Failed in MAGEML file; Compute Non Uniform Outliers; Scanner: Agilent scanner (the values for the parameters change depending on the scanner used for the image; see the following for differences); Automatically Compute OL Polynomial Terms; Feature CV^2; Green Poissonian Noise Term Multiplier (150 when False for 30 micron feature size)
Values: Inter Quartile Region (Automatically Determine and All Formats); 1.42 (All Formats); 1.42 (All Formats); Use Mean Standard Deviation (Automatically Determine and All Formats); True; 10; 1.42; 1.42; True; False; True; Automatically Determine; Hidden if Array Format is set to Automatically Determine; True; 0.04000; 20
Table 4 Default settings for GE1_1200_Jun14 protocol (continued)
Columns: Protocol step | Parameter | Default Setting Value v12.0
Parameters: Green Signal Constant Term Multiplier; Background CV^2; Green Poissonian Noise Term Multiplier; Green Background Constant Term Multiplier; Compute Bkgd Bias and Error: Background Subtraction Method; Significance for IsPosAndSignif and IsWellAboveBG (2-sided t-test of feature vs background, max p-value); WellAboveMulti; Signal Correction; Calculate Surface Fit (required for Spatial Detrend); Feature Set for Surface Fit; Perform Filtering for Surface Fit; Perfo
95. Table 5 Default settings for GE2_1200_Jun14 protocol (continued)
Columns: Protocol step | Parameter | Default Setting Value v12.0
Parameters: Pixel Outlier Rejection Method; RejectIQRFeat; RejectIQRBG; Statistical Method for Spot Values from Pixels; Flag Outliers: Compute Population Outliers; Minimum Population; IQRatio; Background IQRatio; Use Qtest for Small Populations; Report Population Outliers as Failed in MAGEML file; Compute Non Uniform Outliers; Scanner: Agilent scanner (the values for the parameters change depending on the scanner used for the image; see the following for differences); Automatically Compute OL Polynomial Terms; Feature CV^2; Red Poissonian Noise Term Multiplier (150 when False for 30 micron feature size)
Values: Inter Quartile Region (Automatically Determine and All Formats); 1.42 (All Formats); 1.42 (All Formats); Use Mean Standard Deviation (Automatically Determine and All Formats); True; 10; 1.42; 1.42; True; False; True; Automatically Determine; Hidden if Array Format is set to Automatically Determine; True; 0.04; 20
Table 5 Default settings for GE2_1200_Jun14 protocol (continued)
Columns: Protocol step | Parameter | Default Setting Value v12.0
Parameters: Red Signal Constant Term Multiplier; Green Poissonian Noise Term Multiplier; Green Signal Constant Term
96. ...SDUsed. For the error model significance test, the SD becomes AddError 2 6.
If the background-subtracted signal is greater than $\mathrm{WellAboveSDMulti} \times SD_{BG}$, and if the feature passes the IsPosAndSignif test, then the feature gets a Boolean flag of 1 under the IsWellAboveBG column in the Feature Extraction result file.
Step 19. Calculate the surrogate value (SurrogateUsed)
The surrogate value is calculated and used as the lowest limit of detection to replace the dye-normalized signal when any of the following situations occur. These tests are done for each channel:
• MeanSignal is less than BGUsed or not significant compared to BGUsed (i.e., IsPosAndSignif = 0).
• BGSubSignal is less than its background standard deviation (i.e., BGSubSignal < BGSDUsed).
The decision to replace a dye-normalized signal with a surrogate value is not made, however, until after probes are selected for correcting the dye bias. The surrogate value is calculated in this step using these criteria. If pixel significance is used to calculate IsPosAndSignif, then
$$\mathrm{SurrogateUsed} = SD_{BG} \qquad (19)$$
where $SD_{BG}$ is the background standard deviation (i.e., BGSDUsed). For the local background method, the standard deviation of the background is at the pixel level of the local background. For global background methods, the standard deviation of the background is at the replicate background population level o
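The IsWellAboveBG rule and the two Step 19 conditions for using a surrogate can be written as small per-channel predicates. This sketch mirrors only the rules stated above and, for the surrogate value itself, only the pixel-significance case of Equation (19); the error-model case and the later dye-bias step are out of scope. The function names are hypothetical.

```python
def is_well_above_bg(bg_sub_signal, sd_bg, well_above_sd_multi, is_pos_and_signif):
    """1 if BGSubSignal > WellAboveSDMulti x SD_BG and IsPosAndSignif passed, else 0."""
    return 1 if (bg_sub_signal > well_above_sd_multi * sd_bg and is_pos_and_signif == 1) else 0


def needs_surrogate(mean_signal, bg_used, is_pos_and_signif, bg_sub_signal, bg_sd_used):
    """True if either Step 19 condition holds for this channel."""
    below_or_not_significant = mean_signal < bg_used or is_pos_and_signif == 0
    below_bg_noise = bg_sub_signal < bg_sd_used
    return below_or_not_significant or below_bg_noise


def surrogate_value_pixel_significance(bg_sd_used):
    """Equation (19): SurrogateUsed = SD_BG when pixel significance is used."""
    return bg_sd_used


print(is_well_above_bg(100.0, 10.0, 3.0, 1))               # 1
print(needs_surrogate(500.0, 60.0, 1, 8.0, 10.0))          # True: BGSubSignal < BGSDUsed
```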
97. SDev²; σ_E² is the estimated variance, using known noise characteristics of the Agilent Microarray Gene Expression system. For more information on the confidence interval, see Numerical Recipes in C, Chapter 15, page 692. Net signal is the mean signal (i.e., MeanSignal or BGMeanSignal, respectively) minus MinSigArray, which is the minimum feature signal or minimum local background signal on the microarray, representing an estimate of the scanner offset.
Step 10 Determine if the feature is a non-uniformity outlier (IsFeatNonUnifOL)
The non-uniformity outlier algorithm flags anomalous features and local backgrounds based on statistical deviations from the Agilent noise model. A feature or background is flagged as a non-uniformity outlier (IsFeatNonUnifOL or IsBGNonUnifOL, respectively) if the measured variance is greater than the product of the estimated variance and the confidence interval multiplier:
$\sigma_M^2 > \sigma_E^2 \times CI$
where CI is the confidence interval calculated from the chi-square distribution. The following equations are calculated for each feature and background, per channel.
Estimated Feature or Background Variance
The Agilent noise model estimates the expected variance by using noise effects from the Agilent Microarray Gene Expression system, which include microarray manufacture, wet lab chemistry, and scanner noise:
$\sigma_E^2 = \sigma_{Labeling,FeatureSynthesis}^2 + \sigma_{Counting}^2 + \sigma_{Noise}^2$ (6)
$\sigma_E^2 = A x^2 + B x + C$ (7)
x is the net sign
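A minimal Python sketch of the Step 10 test follows. It assumes A, B, C from Eq. (7) and the chi-square confidence multiplier CI are already known; how Feature Extraction derives CI (confidence level, degrees of freedom) is not described here, so CI is simply passed in. Function names and coefficient values are hypothetical.

```python
def estimated_variance(net_signal: float, a: float, b: float, c: float) -> float:
    """Agilent noise model, Eq. (7): sigma_E^2 = A*x^2 + B*x + C."""
    return a * net_signal ** 2 + b * net_signal + c


def is_nonunif_outlier(measured_variance: float, mean_signal: float,
                       min_sig_array: float, a: float, b: float, c: float,
                       ci: float) -> int:
    """1 if sigma_M^2 > sigma_E^2 * CI (non-uniformity outlier), else 0."""
    # Net signal x: mean signal minus MinSigArray (the scanner-offset estimate).
    x = mean_signal - min_sig_array
    return int(measured_variance > estimated_variance(x, a, b, c) * ci)


# Illustrative coefficients only; real A, B, C depend on the protocol.
print(estimated_variance(100.0, a=0.25, b=2.0, c=10.0))  # 2710.0
print(is_nonunif_outlier(3000.0, 150.0, 50.0, 0.25, 2.0, 10.0, ci=1.0))  # 1
```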
98. TION notice until the indicated conditions are fully understood and met.
A WARNING notice denotes a hazard. It calls attention to an operating procedure, practice, or the like that, if not correctly performed or adhered to, could result in personal injury or death. Do not proceed beyond a WARNING notice until the indicated conditions are fully understood and met.
In This Guide
This Reference Guide contains tables that list default parameter values and results for Feature Extraction analyses, and explanations of how Feature Extraction uses its algorithms to calculate results.
Protocol Default Settings: This chapter includes tables that list the default parameter values found in the protocols shipped with the software: Agilent 2-color gene expression (GE), 1-color GE, CGH, ChIP, miRNA, and non-Agilent protocols.
QC Report Results: Learn how to read and interpret the QC Reports.
Text File Parameters and Results: This chapter contains a listing of parameters and results within the text file produced after Feature Extraction.
XML MAGE-ML Results: Refer to this chapter to find the results contained in the MAGE-ML files generated after Feature Extraction.
How Algorithms Calculate Results: Learn how Feature Extraction algorithms calculate the results that help you interpret your gene expression (2-color and 1-color), CGH, ChIP, and miRNA exp
99. To extract these arrays, the Feature Extraction program uses a somewhat different flow of the image processing and data analysis algorithms. The Feature Extraction program places the grid on the high-intensity scan only, then finds spots using this grid on each of the two scans. The XDR algorithm decides which features should use the low-intensity scan data, scales these signals appropriately, and does a replacement for each feature and color channel where appropriate. Feature Extraction then proceeds with the rest of the data analysis (outlier detection, background correction, dye normalization, etc.) exactly as it would for a single non-XDR scan. Upon completion, the Feature Extraction program generates results as if they were from a single measurement of the microarray.
The QC report and the stats table indicate that the Feature Extraction program extracted an XDR image pair by stating the new saturation value. This is the saturation value of the low-intensity scan after suitable scaling. For instance, if the high-intensity scan is at 100% and the low-intensity scan is at 10%, the new saturation value will be around 650,000, about 10× greater than that of a normal 100% PMT gain scan. This lets you use data covering a much greater dynamic range in your calculations.
How the XDR algorithm works
How does the XDR algorithm decide how to combine and scale the data from the high-intensity an
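The arithmetic behind the "around 650,000" example can be sketched as follows. This assumes, purely for illustration, that the scaling factor equals the ratio of the two PMT gains; the text only says the low-intensity saturation value is reported "after suitable scaling", and the real factor is determined by the XDR algorithm. The function name and the 16-bit ceiling of 65535 are hypothetical inputs.

```python
def xdr_saturation_value(low_scan_saturation: float,
                         high_pmt_percent: float,
                         low_pmt_percent: float) -> float:
    """Approximate XDR saturation value.

    Hypothetical simplification: the low-intensity scan is scaled up by the
    ratio of PMT gains (high / low).
    """
    if low_pmt_percent <= 0:
        raise ValueError("low PMT gain must be positive")
    scale = high_pmt_percent / low_pmt_percent
    return low_scan_saturation * scale


print(xdr_saturation_value(65535, 100, 10))  # 655350.0
```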
100. V flattens out and is not tightly correlated with signal, is the range where noise is proportional to signal. This is generally the range used to calculate the median CV.
Figure 37 miRNA QC Report: Reproducibility CV for Replicated Probes (CV for Green vs. Log_gMedianProcessedSignal)
Spike-in Signal Statistics
2-color gene expression spike-in signal statistics
These signal statistics and S/N values for spike-ins indicate the accuracy and reproducibility of the signals of the microarray probes. The table shows the expected signal of the spike-in probe, the observed average signal, the SD of the observed signal, and the S/N of the observed signal.
Figure 38 2-color QC Report: Agilent SpikeIns Signal Statistics (columns Probe Name, Exp, Obs, SD; probes E1A_r60_n9, E1A_r60_a107, E1A_r60_a135, E1A_r60_n11, E1A_r60_1, E1A_r60_a20, E1A_r60_3, E1A_r60_a104, E1A_r60_a97, E1A_r60_a22)
1-color gene expression spike-in signal statistics
For each sequence of spike-ins, this table shows the Probe Name, the median Processed Signal, median of LogProcess
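The median CV for replicated probes can be illustrated with the short Python sketch below. It is an assumption-laden sketch, not Feature Extraction's implementation: it takes percent CV as 100 × SD / mean within each group of replicate features and reports the median across groups, and it does not reproduce the restriction to the signal range where the CV flattens out or any inlier filtering. The function name and probe names are hypothetical.

```python
from statistics import mean, median, stdev
from typing import Dict, List


def median_replicate_cv(groups: Dict[str, List[float]]) -> float:
    """Median percent CV (100 * SD / mean) over groups of replicated probes.

    groups maps a probe name to the signals of its replicate features.
    Groups with fewer than two replicates or a non-positive mean are skipped.
    """
    cvs = []
    for signals in groups.values():
        if len(signals) < 2:
            continue  # a sample SD needs at least two values
        m = mean(signals)
        if m <= 0:
            continue  # CV is undefined for non-positive means
        cvs.append(100.0 * stdev(signals) / m)
    if not cvs:
        raise ValueError("no usable replicated probes")
    return median(cvs)


print(median_replicate_cv({"probe_a": [90.0, 110.0],
                           "probe_b": [100.0, 100.0, 100.0],
                           "probe_c": [45.0, 55.0]}))
```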
101. Variable Rank Tolerance; OnlyNegativeControlFeatures; False; True; False; True; False; True; 3; Use Window Average; True; True; Most Conservative; 0.1000; 0.1000; True; True; True; Automatically Determine; Use Rank Consistent Probes; 0.050; False
Table 2 Default settings for CGH_1200_Jun14 protocol (continued), protocol steps Compute Ratios, Calculate Metrics, and Generate Results. Parameter: Default Setting Value (v12.0)
Omit Background Population Outliers: False
Allow Positive and Negative Controls: False
Signal Characteristics: OnlyPositiveAndSignificantSignals
Normalization Correction Method: Linear
Max Number Ranked Probes: 1
Peg Log Ratio Value: 4.00
Spikein Target Used: False
Min Population for Replicate Stats: 3
Grid Test Format: Automatically Determine (Recognized formats: 60 micron and 30 micron feature size, third party)
PValue for Differential Expression: 0.010000
Percentile Value: 75.00
Type of QC Report: Streamlined CGH
Generate Single Text File: True
JPEG Down Sample Factor: 4
ChIP_1200_Jun14
Table 3 Default settings for ChIP_1200_Jun14 protocol
This protocol is a ChIP protocol for use with Agilent Mammalian ChIP-on-Chip and DNA methylation applications.
Protocol step, Parameter, Default Setting Value (v12
102. WARE
Contents
Default Protocol Settings 13
  Default Protocol Settings: an Introduction 14
  Differences between CGH and gene expression microarrays 15
  Hidden Settings 15
  Tables of Default Protocol Settings 16
    CGH_1200_Jun14 16
    ChIP_1200_Jun14 23
    GE1_1200_Jun14 30
    GE2_1200_Jun14 36
    GE2-NonAT_1100_Jul11 43
    miRNA_1200_Jun14 48
  Differences in Protocol Settings Based on Each Step 55
    Place Grid 56
    Optimize Grid Fit 57
    FindSpots 58
    Flag Outliers 59
    Compute Bkgd, Bias, and Error 61
    Correct Dye Biases 64
    Compute Ratios, Calculate Metrics, and Generate Results 65
QC Report Results 67
  QC Reports 68
    2-color Gene Expression QC Report 69
    1-color Gene Expression QC Report 72
    Streamlined CGH QC Report 75
    CGH_ChIP QC Report 77
    MicroRNA (miRNA) QC Report 79
    Non-Agilent GE2 QC Report 81
    QC reports with metric sets added 83
  QC Report Headers 87
    2-color Gene Expression QC Report 87
    1-color Gene Expression QC Report 88
    Streamlined CGH QC Report 88
    CGH_ChIP QC Report 88
    MicroRNA (miRNA) QC Report 89
    Non-Agilent 2-color gene expression QC Report 89
  Feature Statistics 90
    Spot finding of Four Corners 90
    Outlier Stats 91
    Spatial Distribution of All Outliers 91
    Net Signal Statistics 93
    Negative Control Stats 94
    Plot of Background Corrected Signals 95
    Histogram of Signals Plot (1-color GE or CGH) 96
    Local Background Inliers 97
    Foreground Surface
103. _1200_Jun14 protocol (continued). Protocol step, Parameter, Default Setting Value (v12.0): RejectIQRFeat; RejectIQRBG; Statistical Method for Spot Values from Pixels; Flag Outliers; Compute Population Outliers; Minimum Population; IQRatio; Background IQRatio; Use Qtest for Small Populations; Report Population Outliers as Failed in MAGEML file; Compute Non Uniform Outliers; Scanner: Agilent scanner (the values for the parameters change depending on the scanner used for the image; see the following for differences); Automatically Compute OL Polynomial Terms; Feature CV²; Red Poissonian Noise Term Multiplier; Red Signal Constant Term Multiplier; Green Poissonian Noise Term Multiplier; Green Signal Constant Term Multiplier; 1.42, All Formats; 1.42, All Formats; Use Mean Standard Deviation, Automatically Determine and All Formats; True; 8; 1.42; 1.42; True; False; True; Automatically Determine; Hidden if Array Format is set to Automatically Determine; True; 0.04000; 5
Table 3 Default settings for ChIP_1200_Jun14 protocol (continued). Protocol step, Parameter, Default Setting Value (v12.0): Background CV²; Red Poissonian Noise Term Multiplier; Red Background Constant Term Multiplier; Green Poissonian Noise Term Multiplier; Green Background Constant Term Multiplier; Compute Bkgd Bia
104. able using different background signals and settings of spatial detrend and global background adjust, see Table 34 on page 254.
gBGSubSigError, rBGSubSigError (Error): Propagated standard error as computed on the net g/r background-subtracted signal.
BGSubSigCorrelation (SQT): Ratio of the estimated background-subtracted feature signal covariance in RG space to the product of the background-subtracted feature standard deviations in RG space.
Table 29 Feature results (Full) contained in the MAGE-ML FEATURES table (continued). Columns: Quant Type, Features Green, Features Red, Options, Description.
gIsPosAndSignif, rIsPosAndSignif (SQT): Options: 1 indicates the feature is positive and significant above background. Boolean flag established via a 2-sided t-test; indicates if the mean signal of a feature is greater than the corresponding background (selected by user) and if this difference is significant. To display the variables used in the t-test, see Table 34 on page 254.
gPValFeatEqBG, rPValFeatEqBG (SQT): P-value from the t-test of significance between the g/r mean signal and the g/r background.
gIsWellAboveBG, rIsWellAboveBG (SQT): Boolean flag indicating if a feature is
gSpatialDetrendIsInFilteredSet, rSpatialDetrendIsInFilteredSet (Boolean)
gSpatialDetrendSurfaceValue, rSpatialDetrendSurfaceValue (float)
IsUsedBGAdjust (SQT): 1 = Feature used, 0 = Feature not used.
gBGUsed, rBGUsed (SQT)
gBGSubSignal: gMeanSignal − gBGUsed
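The IsPosAndSignif idea (a 2-sided test of the feature mean against its background, flagged only when the feature is above background and significant) can be sketched as below. The exact statistic Feature Extraction uses is listed in Table 34, which is not part of this section, so this sketch assumes a Welch-style standard error built from pixel SDs and pixel counts and a normal approximation to the t distribution, which is only reasonable for large pixel counts. The default `max_p=0.01` mirrors the 0.01 max p-value shown in the protocol tables and is an assumption; the function name is hypothetical.

```python
import math


def is_pos_and_signif(mean_sig, sd_sig, n_sig, mean_bg, sd_bg, n_bg, max_p=0.01):
    """Return (flag, p_value) for a 2-sided test of feature vs background.

    Assumptions (not taken from the guide): Welch-style standard error from the
    pixel SDs and counts, and a normal approximation to the t distribution.
    """
    se = math.sqrt(sd_sig ** 2 / n_sig + sd_bg ** 2 / n_bg)
    if se == 0.0:
        p_value = 1.0 if mean_sig == mean_bg else 0.0
    else:
        t = (mean_sig - mean_bg) / se
        p_value = math.erfc(abs(t) / math.sqrt(2.0))  # two-sided normal tail
    flag = int(mean_sig > mean_bg and p_value < max_p)
    return flag, p_value


flag, p = is_pos_and_signif(500.0, 50.0, 100, 100.0, 20.0, 200)
print(flag, p < 1e-6)  # 1 True
```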
105. ackground Inliers on page 97
9 Foreground Surface Fit on page 97
10 Reproducibility Statistics (CV, Replicated Probes) on page 104
11 Spatial Distribution of Significantly Up-Regulated and Down-Regulated Features (Positive and Negative Log Ratios) on page 100
12 QC reports with metric sets added on page 83
13 Plot of LogRatio vs Log ProcessedSignal on page 101
14 Histogram of LogRatio plot on page 103
[QC report, page 2 of 2: Local Bkg Inliers (Red/Green); Foreground Surface Fit; Reproducibility CV for Replicated Probes; Evaluation Metrics for ChIP_QCMT_Jul11; Spatial Distribution of the Positive and Negative LogRatios]
106. al MeanSignal BGUsed BGUsed BGUsed BGUsed; Global Background method: BGUsed = GlobalBGInlierAve (GBGIA), with the entries GBGIA SDSV, GBGIA, GBGIA SDSV BGAdjust, and GBGIA BGAdjust; BGSDUsed = GlobalBGInlierSDev (GBGISD) in all four entries; BGSubSignal = MeanSignal − BGUsed in all four entries.
For both the red and green channels (2-color, CGH, and non-Agilent microarrays).
† With No background subtraction as the setting, BGMeanSignal is the value for BGUsed only for the t-test; no BGUsed is subtracted from the MeanSignal to produce BGSubSignal.
If the method in the protocol for calculating the spot value from pixel statistics is Median/Normalized InterQuartile Range instead of Mean/Standard Deviation, the program makes these substitutions for the spot value and background subtraction calculations:
• MedianSignal for MeanSignal
• BGMedianSignal for BGMeanSignal
• PixNormIQR for PixSDev
• BGPixNormIQR for BGPixSDev
where NormIQR = 0.7413 × IQR.
If Median is the selection in the protocol, the median is substituted for the mean in the InlierAve and InlierSDev calculations.
Step 13 Perform background spatial detrending to fit a surface
To calculate the spatial shape or surface for each channel, the Feature Extraction program uses one of these background subtraction protocol selections:
• All Feature Types: This selection fits the surfac
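The Median/Normalized InterQuartile Range substitution can be illustrated in Python. The factor 0.7413 is approximately 1/1.349, so for Gaussian pixel noise NormIQR estimates the standard deviation while being less sensitive to stray pixels. This is a minimal sketch: quartiles come from `statistics.quantiles(method="inclusive")`, and whether Feature Extraction interpolates percentiles the same way is an assumption; the function name is hypothetical.

```python
from statistics import median, quantiles

NORM_IQR_FACTOR = 0.7413  # NormIQR = 0.7413 x IQR, as stated in the text


def median_and_norm_iqr(pixels):
    """Return (median, NormIQR) for one feature's or background's inlier pixels.

    Assumption: quartiles use statistics.quantiles(method="inclusive"); the
    exact percentile interpolation used by Feature Extraction may differ.
    """
    q1, _, q3 = quantiles(pixels, n=4, method="inclusive")
    return median(pixels), NORM_IQR_FACTOR * (q3 - q1)


print(median_and_norm_iqr([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (5, 2.9652)
```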
107. al Background; Use Error Model for Significance; Use Pixel Statistics for Significance; 0.01; 13; 2.6; 244; 2000; True; FeaturesInNegativeControlRange; AllFeatureTypes; OnlyNegativeControlFeatures; False; True; True; False; All except for GE2-NonAT; GE2-NonAT; All except GE2-NonAT; GE2-NonAT; All; All except for GE2-NonAT; GE2-NonAT; miRNA only; miRNA only; All; GE1, GE2, miRNA; GE2-NonAT; CGH, ChIP; CGH, ChIP; GE1, GE2, GE2-NonAT, miRNA; All except GE2-NonAT; GE2-NonAT
Table 13 Compute Bkgd, Bias and Error: default values in common and differences for protocols (continued). Parameter, Default values, Protocols using Default Value: Signal Correction: Adjust Background Globally; Signal Correction: Perform Multiplicative Detrending (not applicable for GE2-NonAT); Detrend on Replicates Only; Filter Low Signal Probes from Fit; Neg Ctrl Threshold; Mult Detrend Factor; Perform Filtering for Fit; Use polynomial data fit instead of LOESS; Polynomial Multiplicative Detrend Degree; Robust Neg Ctrl Stats; Choose universal error or most conservative; MultErrorGreen; MultErrorRed; Auto Estimate Add Error Red; False; True; False; False; True; True; 5; Use Window Average; True; False; True; Most Conservative; Use Universal Error Model; 0.1000 0900; 0.1000 0900; True; All except for GE2-NonAT
108. al of feature or background.
A, which sets $\sigma_{Labeling,FeatureSynthesis}^2 = A x^2$ in Eq. (7), is the term that estimates the sources of variance that are proportional to the square of the signal, including microarray manufacturing and wet chemistry effects; the variance follows a Gaussian distribution. This term is intensity-dependent and is the square of the CV (coefficient of variation) estimate of the pixel noise:
$CV = \frac{PixSDev}{MeanSignal - MinSigArray}$ (8)
B, which sets $\sigma_{Counting}^2 = B x$, is the term that estimates the sources of noise whose standard deviation is proportional to the square root of the signal (so the variance grows linearly with the signal), including scanning measurement or counting error; the variance follows a Poisson distribution. This term is dependent on the intensity and the scan resolution of the image.
C, which sets $\sigma_{Noise}^2 = C$, is the term that estimates the sources of variance that are independent of the signal, including electronic noise in the scanner and background-level noise in the glass; the variance is a constant.
The variables A, B, and C have different values for feature and background. For Agilent data produced with the GE2 SSPE_95_Feb07 protocol, these values are determined empirically (the default selection in the protocol) from self-vs-self experiments and from the known noise characteristics of the Agilent Microarray system discussed above. For all other Agilent Feature Extraction protocols
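The sketch below illustrates Eq. (8) and how the three terms of Eq. (7) contribute at a given net signal. It is illustrative only: it estimates A from a single feature's pixel CV, whereas Feature Extraction determines A, B, and C empirically or from the protocol, and any aggregation across features is not shown. Function names, dictionary keys, and the coefficient values are hypothetical.

```python
def cv_squared(pix_sdev: float, mean_signal: float, min_sig_array: float) -> float:
    """Square of the pixel-noise CV from Eq. (8), i.e. a per-feature estimate of A."""
    net = mean_signal - min_sig_array
    if net <= 0:
        raise ValueError("net signal must be positive to form a CV")
    return (pix_sdev / net) ** 2


def variance_components(x: float, a: float, b: float, c: float) -> dict:
    """Split sigma_E^2 = A*x^2 + B*x + C into its three sources."""
    return {
        "labeling_feature_synthesis": a * x ** 2,  # grows with the signal squared
        "counting": b * x,                          # Poisson-like, grows linearly
        "noise": c,                                 # constant (scanner, glass)
    }


a = cv_squared(pix_sdev=30.0, mean_signal=350.0, min_sig_array=50.0)
print(round(a, 4))  # 0.01
print(variance_components(300.0, a=a, b=2.0, c=25.0))
```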
109. all the Feature Extraction processing steps have been completed. In the case of one color, ProcessedSignal contains the multiplicatively detrended background-subtracted signal if detrending is selected and helps. If detrending does not help, this column contains the BackgroundSubtractedSignal.
The universal or propagated error left after all the processing steps of Feature Extraction have been completed. In the case of one color, ProcessedSignalError has had the error model applied and contains at least the larger of the universal (UEM) error or the propagated error. If multiplicative detrending is performed, ProcessedSignalError contains the error propagated from detrending; this is done by dividing the error by the normalized MultDetrendSignal.
Number of outlier pixels per feature with intensity > upper threshold set via the pixel outlier rejection method. The number is computed independently in each channel. These pixels are omitted from all subsequent calculations.
Number of outlier pixels per feature with intensity < lower threshold set via the pixel outlier rejection method. The number is computed independently in each channel. These pixels are omitted from all subsequent calculations.
NOTE: The pixel outlier method is the ONLY step that removes data in Feature Extraction.
Features Green, Features Red, Types, Options, Text File Parameter
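A minimal one-color sketch of ProcessedSignal and ProcessedSignalError follows, under two assumptions the text does not fully spell out: the error is taken as exactly the larger of the universal (UEM) and propagated errors, and, when detrending is applied, the signal is divided by the normalized MultDetrendSignal just as the error is (the text states this only for the error). The "does detrending help" decision is not modeled; the function name is hypothetical.

```python
def one_color_processed(bg_sub_signal, universal_error, propagated_error,
                        norm_mult_detrend_signal=None):
    """Return (ProcessedSignal, ProcessedSignalError) for one feature.

    Assumptions: error = max(UEM error, propagated error); when detrending is
    applied, both signal and error are divided by the normalized
    MultDetrendSignal. Passing None means "no detrending".
    """
    error = max(universal_error, propagated_error)
    if norm_mult_detrend_signal is None:
        return bg_sub_signal, error
    return (bg_sub_signal / norm_mult_detrend_signal,
            error / norm_mult_detrend_signal)


print(one_color_processed(200.0, 5.0, 8.0))       # (200.0, 8.0)
print(one_color_processed(200.0, 5.0, 8.0, 2.0))  # (100.0, 4.0)
```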
110. ameter, Default values, Protocols using Default Value: 5 (CGH, ChIP)
Green Signal Constant Term Multiplier: 1 (All except GE2-NonAT)
Background CV²: 0.09000 (All except GE2-NonAT)
Red Poissonian Noise Term Multiplier: 3 (All except GE1, GE2-NonAT)
Red Signal Constant Term Multiplier: 1 (All except GE1, GE2-NonAT)
Green Poissonian Noise Term Multiplier: 3 (All except GE2-NonAT)
Green Background Constant Term Multiplier: 1 (All except GE2-NonAT)
Automatically Compute OL Polynomial Terms: False (GE2-NonAT)
Feature CV²: 0.11000
Poissonian Noise Term: 320 (R/G combined)
Background Term: 600 (R/G combined)
Background CV²: 0.09000
Poissonian Noise Term: 320 (R/G combined)
Background Term: 600 (R/G combined)
Compute Bkgd Bias and Error
These parameters and values differ depending on the microarray type and the lab protocol.
Table 13 Compute Bkgd Bias and Error: default values in common and differences for protocols. Parameter, Default values, Protocols using Default Value: Background Subtraction Method; Significance (2-sided t-test of feature vs background), max p-value; WellAboveMulti; Background Method by Format; Minimum Feature Threshold for Metrics; Signal Correction: Calculate Surface Fit (required for Spatial Detrend); Feature Set for Surface Fit; Perform Filtering for Surface Fit; Perform Spatial Detrending; No Background Subtraction; Loc
111. anSignal: Median local background signal local to the corresponding feature, computed per channel.
Green BGPixSDev, Red BGPixSDev (Error): Standard deviation of all inlier pixels per local BG of each feature, computed independently in each channel.
gNumSatPix, rNumSatPix (SQT): Total number of saturated pixels per feature, computed per channel.
gIsSaturated, rIsSaturated (SQT): Options: 1 = Saturated, 0 = Not saturated. Integer indicating if a feature is saturated or not. A feature is saturated IF 50% of the pixels in a feature are above the saturation threshold.
Table 29 Feature results (Full) contained in the MAGE-ML FEATURES table (continued). Quant types listed for the following rows: SQT, SQT, float, SQT, SQT.
gIsLowPMTScaledUp, rIsLowPMTScaledUp: Options: 1 = Low, 0 = High. For XDR features, an integer indicating whether the low PMT value or the high value was used for the calculations.
PixCorrelation: Ratio of the estimated feature covariance in Red-Green space to the product of the feature standard deviations in Red-Green space. The covariance of two features measures their tendency to vary together.
BGPixCorrelation
gIsFeatNonUnifOL, rIsFeatNonUnifOL: 1 indicates the feature is a non-uniformity outlier in g/r.
gIsBGNonUnifOL, rIsBGNonUnifOL: 1 indicates the local background is a non-uniformity outlier in g/r.
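The NumSatPix/IsSaturated rule can be sketched as below. The guide says a feature is saturated if 50% of its pixels are above the saturation threshold; treating that as "at least 50%" and using a strict "above" comparison for individual pixels are assumptions made here. The function name and pixel values are hypothetical.

```python
def saturation_flags(pixels, saturation_threshold):
    """Return (NumSatPix, IsSaturated) for one feature in one channel.

    Assumptions: a pixel counts as saturated when strictly above the threshold,
    and the feature is flagged when at least 50% of its pixels are saturated.
    """
    if not pixels:
        raise ValueError("feature has no pixels")
    num_sat = sum(1 for p in pixels if p > saturation_threshold)
    return num_sat, int(num_sat >= 0.5 * len(pixels))


print(saturation_flags([100, 65535, 65535, 200], 65000))  # (2, 1)
print(saturation_flags([100, 65535, 200], 65000))         # (1, 0)
```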
112. as Failed in MAGEML file; Compute Non Uniform Outliers; Automatically Compute OL Polynomial Terms; Feature CV²; Poissonian Noise Term; Background Term; Background CV²; Poissonian Noise Term; Background Term; Compute Bkgd, Bias and Error: Background Subtraction Method; Significance for IsPosAndSignif and IsWellAboveBG (2-sided t-test of feature vs background), max p-value; True; 127 if False; Inter Quartile Region; 1.42; 1.42; Use Mean Standard Deviation; True; 15; 1.42; 1.42; True; False; True; False; 0.11000; 320; 600; 0.09000; 320; 600; Local Background; Use Pixel Statistics for Significance; 0.01
Table 6 Default settings for GE2-NonAT_1100_Jul11 protocol (continued). Protocol step, Parameter, Default Setting Value (v12.0): Correct Dye Biases; WellAboveMulti; Signal Correction: Calculate Surface Fit (required for Spatial Detrend); Feature Set for Surface Fit; Perform Filtering for Surface Fit; Perform Spatial Detrending; Signal Correction: Adjust Background Globally; Adjust Background Globally to; Robust Neg Ctrl Stats; Choose universal error or most conservative; MultErrorGreen; MultErrorRed; Auto Estimate Add Error Red; Additive Error Value Red; Auto Estimate Add Error Green; Additive Error Value Green; Use Surrogates; Use Dye Norm List; Dye Normalization Probe Selection Method; Rank Tolerance; Variable Rank Tole
113. ata, the QC report shows an RMS_Fit of 0.0. If there are no stats for non-control probes, Feature Extraction looks at the spike-in control probes; if the CVs for these become worse, Feature Extraction removes detrending.
If the option Detrend on Replicates Only is chosen and there are not enough replicates for non-control or spike-in control probes, Feature Extraction turns off multiplicative detrending.
Spatial Distribution of Significantly Up-Regulated and Down-Regulated Features (Positive and Negative Log Ratios)
You can display the distribution of the significantly up- and down-regulated features on this plot (up: red; down: green).
Figure 28 QC Report: Spatial Distribution of Up and Down Regulated Features (legend: Up Regulated, red; Down Regulated, green)
For the CGH QC Report, this plot is referred to as Spatial Distribution of the Positive and Negative Log Ratios. If the microarray contains more than 5000 features, the software randomly selects 5000 data points. These points include up-regulated and down-regulated features in the same proportion as they are found on the actual microarray. The threshold that is used to determine significance is set in the protocol QCMe
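The proportional subsampling described above can be sketched as a stratified random sample. The guide does not say whether the 5000 points are drawn from all features or only from the up- and down-regulated ones, nor how counts are rounded; this sketch draws from the up/down sets and rounds the up-regulated share, both of which are assumptions. The function name and feature IDs are hypothetical.

```python
import random


def sample_for_plot(up, down, max_points=5000, seed=None):
    """Subsample up- and down-regulated features for plotting.

    Keeps the up:down proportion of the full array when the total exceeds
    max_points; otherwise returns everything unchanged.
    """
    total = len(up) + len(down)
    if total <= max_points:
        return list(up), list(down)
    rng = random.Random(seed)
    n_up = round(max_points * len(up) / total)  # never exceeds len(up)
    n_down = max_points - n_up                  # never exceeds len(down)
    return rng.sample(list(up), n_up), rng.sample(list(down), n_down)


up_ids = list(range(3000))
down_ids = list(range(3000, 9000))
su, sd = sample_for_plot(up_ids, down_ids, seed=1)
print(len(su), len(sd))  # 1667 3333
```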
114. ated and Down Regulated Features (Positive and Negative Log Ratios) on page 100
8 Plot of Background Corrected Signals on page 95
Figure 8 Streamlined CGH QC Report, p2 (panels: Spatial Distribution of the Positive and Negative LogRatios; Red and Green Background Corrected Signals, Non Control Inliers, rBGSubSignal vs. gBGSubSignal)
CGH_ChIP QC Report
This report lists all of the same information as the 2-color Gene Expression report, but removes the Array Uniformity table and spike-ins, and has a Histogram of LogRatio plot. All log plots use log base 2, not 10.
1 QC Report Headers on page 87
2 Spot finding of Four Corners on page 90
3 Outlier Stats on page 91
4 Spatial Distribution of All Outliers on page 91
5 Net Signal Statistics on page 93
6 Negative Control Stats on page 94
7 Plot of Background Corrected Signals on page 95
[QC report, page 1 of 2: Agilent Technologies 2-Color CGH, protocol ChIP_1100_Jul11]
115. ative control; 15000: SNP; 20000: Not probe (see Ch 4 for definition); 30000: Ignore (see Ch 4 for definition).
ProbeName (text): An Agilent-assigned identifier for the probe synthesized on the microarray.
GeneName (text): An identifier for the gene for which the probe provides expression information. The target sequence identified by the systematic name is normally a representative or consensus sequence for the gene.
SystematicName (text): An identifier for the target sequence that the probe was designed to hybridize with. Where possible, a public database identifier is used (e.g., TAIR locus identifier for Arabidopsis). The systematic name is reported ONLY if the gene name and systematic name are different.
Description (text): Description of gene.
PositionX, PositionY (float): Found coordinates of the feature centroid, in microns.
Table 22 Feature results contained in the FULL output text file FULL FEATURES table (continued). Columns: Features Green, Features Red, Types, Options, Description.
LogRatio (base 10), LogRatioError, PValueLogRatio, gSurrogateUsed, rSurrogateUsed (float, float, float, float): Options: 1000; Non-zero value; 0. Description: per feature, log of rProcessedSignal/gProcessedSignal. If SURROGATES are turned off, then: if DyeNormRedSig < 0.0 & DyeNormGreenSig > 0.0; if DyeNormRedSig > 0.0 & DyeNormGreenSig < 0.0
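The LogRatio column (base 10 of rProcessedSignal/gProcessedSignal) can be computed as in this sketch, together with a conversion to base 2, the base used by the CGH_ChIP QC report plots. The special cases for non-positive signals (surrogates off, DyeNormRedSig/DyeNormGreenSig sign rules) are truncated in the source and are not reproduced; this sketch simply raises an error for them. Function names are hypothetical.

```python
import math


def log_ratio(r_processed: float, g_processed: float) -> float:
    """LogRatio (base 10) = log10(rProcessedSignal / gProcessedSignal)."""
    if r_processed <= 0 or g_processed <= 0:
        # Special-case rules for non-positive signals are not modeled here.
        raise ValueError("non-positive signals need the special-case rules")
    return math.log10(r_processed / g_processed)


def log10_to_log2(value: float) -> float:
    """Convert a base-10 log ratio to base 2."""
    return value / math.log10(2.0)


lr = log_ratio(1000.0, 10.0)
print(lr)                           # 2.0
print(round(log10_to_log2(lr), 4))  # 6.6439
```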
116. ats on page 91
4 Spatial Distribution of All Outliers on page 91
5 Net Signal Statistics on page 93
6 Negative Control Stats on page 94
7 Plot of Background Corrected Signals on page 95
Figure 13 Page 1 of 2, QC Report, Agilent Technologies 2-Color Gene Expression (protocol GE2-NonAT_1100_Jul11; panels: Spot Finding of the Four Corners of the Array, Outlier Stats, Spatial Distribution of All Outliers on the Array, Net Signal Statistics)
117. ats Green Channel, Stats Red Channel, Type, Description:
gDarkOffsetAverage, rDarkOffsetAverage (float): Average dark offset per image per channel, as measured by the scanner.
gDarkOffsetMedian, rDarkOffsetMedian (float): Median dark offset per image per channel, as measured by the scanner.
gDarkOffsetStdDev, rDarkOffsetStdDev (float): Standard deviation of the data points measured by the scanner to determine the dark offset per image per channel.
gDarkOffsetNumPts, rDarkOffsetNumPts (integer): Number of data points measured by the scanner to determine the dark offset per image per channel.
gSaturationValue, rSaturationValue (integer): Signal intensity at which a spot is considered saturated.
gAvgSig2BkgeQC, rAvgSig2BkgeQC (float): The average ratio of net signal to local background for all spike-in probes.
gAvgSig2BkgNegCtrl, rAvgSig2BkgNegCtrl (float): The average ratio of net signal to local background for all negative control probes.
gRatioSig2BkgeQC_NegCtrl, rRatioSig2BkgeQC_NegCtrl (float): The ratio of AvgSig2BkgeQC to AvgSig2BkgNegCtrl.
gNumSatFeat, rNumSatFeat (integer): The number of saturated features on the microarray per channel.
Table 21 Stats results contained in the text output file STATS table (continued):
gLocalBGInlierNetAve, gLocalBGInlierAve, gLocalBGInlierSDev, gLocalBGInlierNum, gGlobalBGInlierAve
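The relationship between the three Sig2Bkg statistics can be sketched as follows. The sketch assumes each probe's ratio is its net signal divided by its local background, averaged arithmetically over the probe set, and it skips probes with zero background; the net-signal definition, the averaging method, and the zero handling are assumptions, and the function names are hypothetical.

```python
from statistics import mean


def avg_sig2bkg(net_signals, local_bgs):
    """Average ratio of net signal to local background over a probe set.

    Assumption: probes with zero local background are skipped.
    """
    ratios = [s / b for s, b in zip(net_signals, local_bgs) if b != 0]
    if not ratios:
        raise ValueError("no probes with non-zero local background")
    return mean(ratios)


def ratio_sig2bkg_eqc_negctrl(eqc_net, eqc_bg, neg_net, neg_bg):
    """RatioSig2BkgeQC_NegCtrl = AvgSig2BkgeQC / AvgSig2BkgNegCtrl."""
    return avg_sig2bkg(eqc_net, eqc_bg) / avg_sig2bkg(neg_net, neg_bg)


# Illustrative values only.
print(ratio_sig2bkg_eqc_negctrl([100.0, 300.0], [50.0, 50.0],
                                [5.0, 15.0], [50.0, 50.0]))
```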
118. ature Extraction
FeatureExtractor_ExtractionTime (text): Time stamp at the beginning of Feature Extraction.
FeatureExtractor_UserName (text): Windows log-in name of the user who ran Feature Extraction.
FeatureExtractor_ComputerName (text): Computer name on which Feature Extraction was run.
FeatureExtractor_Version (text): Version of Feature Extractor.
FeatureExtractor_IsXDRExtraction (integer): Says if the result is from an XDR extraction. 1 = True, 0 = False.
Protocol Step, Parameters, Type, Options, Description:
FeatureExtractor_ColorMode (integer): A flag to indicate output color. 0 = One color (green only), 1 = 2 color.
FeatureExtractor_QCReportType (integer): Type of QC report to generate. 0 = Gene Expression, 1 = CGH_ChIP, 2 = miRNA, 4 = Streamlined CGH.
DyeNorm_NormFilename (text): Name of the dye normalization list file.
DyeNorm_NormNumProbes (integer): Number of probes in the dye normalization list.
Grid_IsGridFile (boolean)
Statistical results (STATS)
This middle section of the text file describes the results from the global, array-wide statistical calculations. The STATS results are reported to 9 decimal places in exponential notation for all results files (FULL, COMPACT, QC, or MINIMAL).
STATS Table (ALL text output types)
Table 21 Stats results contained in the text output file STATS table: St
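Two details from this section lend themselves to a small sketch: decoding the integer codes for FeatureExtractor_ColorMode and FeatureExtractor_QCReportType, and rendering a STATS value with 9 decimal places in exponential notation. The dictionaries restate the option lists above; the exact exponent style written into real Feature Extraction files (for example `e+03` versus `E+003`) is not specified here, so Python's formatting is an assumption. Names are hypothetical.

```python
# Option codes restated from the FEPARAMS table above.
COLOR_MODE = {0: "One color (green only)", 1: "2 color"}
QC_REPORT_TYPE = {0: "Gene Expression", 1: "CGH_ChIP", 2: "miRNA", 4: "Streamlined CGH"}


def format_stat(value: float) -> str:
    """Render a STATS value with 9 decimal places in exponential notation.

    Assumption: Python's 'e' format; the exponent style in real files may differ.
    """
    return f"{value:.9e}"


print(format_stat(1234.5))  # 1.234500000e+03
print(QC_REPORT_TYPE[4])    # Streamlined CGH
```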
119. $Cutoff_{PopOutlier} = 1.42 \times IQR$
where $IQR = I_{75th\,percentile} - I_{25th\,percentile}$ (the intensity at the 75th percentile minus the intensity at the 25th percentile), and 1.42 is the IQR factor. Agilent uses 1.42 as the IQR factor so that the cutoff boundaries encompass 99% of the expected population distribution. The user can change this factor to encompass different boundaries, as discussed in the Feature Extraction 10.9 User Guide.
A feature or background is flagged as a population outlier (IsFeatPopOL or IsBGPopOL, respectively) if the mean signal (MeanSignal or BGMeanSignal) is greater than the upper rejection boundary $RB_{Upper}$ or less than the lower rejection boundary $RB_{Lower}$:
$MeanSignal > RB_{Upper}$ or $MeanSignal < RB_{Lower}$
where $RB_{Upper} = I_{75th\,percentile} + Cutoff_{PopOutlier}$ and $RB_{Lower} = I_{25th\,percentile} - Cutoff_{PopOutlier}$.
Compute Bkgd Bias and Error
Feature Extraction completes several steps to determine the error model for each feature. First, it determines and subtracts the background for each feature on the array. This is followed by detrending the array for systematic error. Finally, an error model accounts for systematic and random errors encountered during the sample preparation, hybridization, and scanning steps.
Step 12 Calculate the feature background-subtracted signal (BGSubSignal)
The feature backgroun
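The population-outlier rule above can be sketched in Python for one population (for example, the replicate features of one probe). This is a minimal sketch: quartiles come from `statistics.quantiles(method="inclusive")`, which may not match Feature Extraction's percentile interpolation, and the Minimum Population and Q-test handling for small populations are not modeled. The function name is hypothetical.

```python
from statistics import quantiles

IQR_FACTOR = 1.42  # Agilent default IQR factor, as stated in the text


def population_outliers(mean_signals, iqr_factor=IQR_FACTOR):
    """Flag (1) members whose mean signal falls outside [RB_Lower, RB_Upper]."""
    q1, _, q3 = quantiles(mean_signals, n=4, method="inclusive")
    cutoff = iqr_factor * (q3 - q1)  # Cutoff_PopOutlier
    rb_upper = q3 + cutoff           # I_75th percentile + cutoff
    rb_lower = q1 - cutoff           # I_25th percentile - cutoff
    return [int(s > rb_upper or s < rb_lower) for s in mean_signals]


print(population_outliers([10, 11, 12, 13, 14, 15, 16, 17, 100]))
# [0, 0, 0, 0, 0, 0, 0, 0, 1]
```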
120. atureExtractor_ColorMode (integer): A flag to indicate output color. 0 = One color (green only), 1 = 2 color.
FeatureExtractor_QCReportType (integer): Type of QC report to generate. 0 = Gene Expression, 1 = CGH_ChIP, 2 = miRNA, 4 = Streamlined CGH.
DyeNorm_NormFilename (text): Name of the dye normalization list file.
DyeNorm_NormNumProbes (integer): Number of probes in the dye normalization list.
Grid_IsGridFile (boolean): Indicates whether the grid is from a grid file.
MINIMAL FEPARAMS Table
Table 20 List of parameters and options contained within the MINIMAL text output file FEPARAMS table. Protocol Step, Parameters, Type, Options, Description:
Protocol_Name (text): Name of protocol used.
Protocol_date (text): Date the protocol was last modified.
Scan_ScannerName (text): Agilent scanner serial number used.
Scan_NumChannels (integer): Number of channels in the scan image.
Scan_date (text): Date the image was scanned.
Scan_MicronsPerPixelX (float): Number of microns per pixel in the X axis of the scan image.
Scan_MicronsPerPixelY (float): Number of microns per pixel in the Y axis of the scan image.
Scan_OriginalGUID (text): The global unique identifier for the scan image.
Scan_NumScanPass (1 or 2): For 5 micron scans, indicates whether the scan mode was a single (1) or double (2) pass scan mode on the Agilent Scanner.
Grid_Name (text): Grid template name or grid file name.
Grid_Date (integer): Date the grid template
121. atureExtractor_ExtractionTime (text): Time stamp at the beginning of the Feature Extraction run for the extraction set.
   FeatureExtractor_UserName (text): Windows login name of the user who ran Feature Extraction.
   Table 17 List of parameters and options contained within the FULL text output file FEPARAMS table (columns: Protocol Step, Parameters, Type, Options, Description)
   FeatureExtractor_ComputerName (text): Name of the computer on which Feature Extraction was run.
   FeatureExtractor_ScanFileGUID (text): GUID of the scan file.
   FeatureExtractor_IsXDRExtraction (integer; 1 = True, 0 = False): Indicates whether or not the extraction was an XDR extraction.
   DyeNorm_NormFilename (text): Name of the dye normalization list file.
   DyeNorm_NormNumProbes (integer): Number of probes in the dye normalization list.
   Grid_IsGridFile (boolean): Indicates whether the grid is from a grid file.
   Scan_NumScanPass (1 or 2): For 5 micron scans, indicates whether the scan mode on the Agilent Scanner was single pass (1) or double pass (2).
   Place Grid, GridPlacement_Version (text): Version of the grid placement algorithm.
   Place Grid, GridPlacement_ArrayFormat (integer): Choices for grid placement based on the format of the image. Choices include Automatically Determine, Single Density 11k 22k, Double Density 44k 95k 185 5 and 10 uM 65 micron 5 and 10 uM 30 micron single pack 30 micron multi pack 244 5 and 10 uM 25k
122. ay features can produce signals that span a broader range of intensity than a single scan can cover. You can therefore use eXtended Dynamic Range (XDR) to cover the full dynamic intensity range of your microarray features and so see the most useful biology. To do this, you set the scanner to scan twice: once at a high PMT setting (the high intensity scan), followed immediately by a scan at a low PMT setting (the low intensity scan). This functionality is enabled in Agilent Scan Control software version 7.0. The TIFF headers of the two scans label them as paired scans of the same microarray.
   XDR Feature Extraction process
   The Feature Extraction program (v9.1 and later) uses this information to extract the low and high PMT images as a pair. In this XDR extraction type, the Feature Extraction program processes the two scans together and produces a single set of outputs that contains data from both scans. Some features contain data from the high intensity scan and some from the low intensity scan. To see which scan a feature's data came from, view the r/gIsLowPMTScaledUp column for each color channel. For signals that are very bright or saturated in the high intensity scan (e.g., a scan at 100% PMT gain), the XDR algorithm substitutes the data from the low intensity scan (e.g., 10% PMT gain) after scaling the intensity appropriately.
123. background signal, the test uses the feature signal and one error. The background signal distribution is assumed to be centered around 0 with one error. The degrees of freedom are large enough to make the function Gaussian. We define the error as one standard deviation (1SD) from the probability of 0 on the Gaussian curve, equal to a p-value of .01 (AdditiveError × 2.6). If the probability is greater than or equal to 1SD (.01), the background-subtracted signal is flagged as positive and significant. If it is less than 1SD (.01), it is flagged as not significant. The value of the surrogate is scaled by the probability returned: the surrogate value for not-significant signals equals AddError × 2.6 × the probability calculated this way. This is done for two reasons: the signals stay continuous, and the surrogate values are never larger than the smallest significant signals.
   Step 18 Determine if the feature background-subtracted signal is well above the background (IsWellAboveBG)
   The feature background-subtracted signal (BGSubSignal) is compared to the noise of its background (local or global):
   $$\mathrm{BGSubSignal} > \mathrm{WellAboveSDMulti} \times SD_{BG}$$
   where WellAboveSDMulti is the well-above SD multiplier (default 5; a feature is well above background if its signal is 5 times the additive error), and $SD_{BG}$ is the background standard deviation, i.e., BG
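Below is a minimal sketch of the IsWellAboveBG comparison. The function and argument names are hypothetical. The multiplier is passed explicitly, because this guide quotes a default of 5 here and 2.6 in the COMPACT FEATURES table description. That description also requires the feature to pass IsPosAndSignif, so the flag is taken as an input.

```python
def is_well_above_bg(bg_sub_signal, sd_bg, well_above_sd_multi, is_pos_and_signif=True):
    """Sketch of the IsWellAboveBG test (hypothetical helper, not FE's code).

    bg_sub_signal       -- the feature's BGSubSignal
    sd_bg               -- SD_BG, standard deviation of the background in use
    well_above_sd_multi -- WellAboveSDMulti (explicit: the guide gives 5 and 2.6)
    is_pos_and_signif   -- IsPosAndSignif result, also required by the COMPACT table
    """
    return bool(is_pos_and_signif) and bg_sub_signal > well_above_sd_multi * sd_bg


print(is_well_above_bg(60.0, 10.0, 5))  # True: 60 > 5 x 10
print(is_well_above_bg(40.0, 10.0, 5))  # False: 40 is not above 50
print(is_well_above_bg(60.0, 10.0, 5, is_pos_and_signif=False))  # False
```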
124. centration spiked in red and green channels.
   gNegCtrlNumInliers, rNegCtrlNumInliers (integer): Number of all inlier negative controls.
   Table 21 Stats results contained in the text output file STATS table (continued; columns: Stats Green Channel, Stats Red Channel, Type, Description)
   gNegCtrlAveNetSig, rNegCtrlAveNetSig (float): Average net signal of all inlier negative controls.
   gNegCtrlSDevNetSig, rNegCtrlSDevNetSig (float): Standard deviation of the net signal of all inlier negative controls.
   gNegCtrlAveBGSubSig, rNegCtrlAveBGSubSig (float): Average background-subtracted signal of all inlier negative controls.
   gNegCtrlSDevBGSubSig, rNegCtrlSDevBGSubSig (float): Standard deviation of the background-subtracted signals of all inlier negative controls.
   gAveNumPixOLLo, rAveNumPixOLLo (integer): The average number of pixels rejected from each feature at the low end of the intensity spectrum.
   gAveNumPixOLHi, rAveNumPixOLHi (integer): The average number of pixels rejected from each feature at the high end of the intensity spectrum.
   gPixCVofHighSignalFeat, rPixCVofHighSignalFeat (float): Average pixel CV for features with high signal.
   gNumHighSignalFeat, rNumHighSignalFeat (integer): The number of features
   NonCtrlAbsAveLogRatio (float), NonCtrlSDevLogRatio (float), NonCtrlSNRLogRatio (float)
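The negative-control statistics above can be illustrated with a short sketch. It computes NumInliers, AveBGSubSig and SDevBGSubSig from per-control background-subtracted signals and outlier flags. The helper name is hypothetical. The sample standard deviation (n − 1) is an assumption, because the guide does not say which estimator is used.

```python
from statistics import mean, stdev


def neg_ctrl_bgsub_stats(bg_sub_signals, is_outlier):
    """Return (NumInliers, AveBGSubSig, SDevBGSubSig) over inlier negative controls.

    Hypothetical helper; the sample SD (n - 1) is an assumed estimator.
    """
    inliers = [s for s, outlier in zip(bg_sub_signals, is_outlier) if not outlier]
    if len(inliers) < 2:
        raise ValueError("need at least two inlier negative controls")
    return len(inliers), mean(inliers), stdev(inliers)


print(neg_ctrl_bgsub_stats([10.0, 12.0, 14.0, 500.0], [False, False, False, True]))
# (3, 12.0, 2.0)
```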
125. classifies the pixels in a region of interest around each grid position as feature pixels or background pixels. The approximate radius of each feature mask is taken as the corresponding spot radius, and the center of mass of the feature mask is taken as the actual spot centroid. In the visual results view (.shp file), every spot that is found is shown with a blue X on the spot and marked as Found. For all spots, the blue cross shows the location of the grid. If the centroid cannot be found because the spot is too weak, or if the distance between the cross (grid position) and the X (centroid) exceeds the range specified by the Spot Deviation Limit, the spot is labeled Not Found.
   Step 4 Define features
   See the Feature Extraction 10.9 User Guide for how the Feature Extraction program defines features with either the CookieCutter method or the WholeSpot method.
   Step 5 Estimate the radius for the local background
   The radius is the distance from the center of the cookie or whole spot to the edge of the outermost region, as shown in Figure 54. The default radius is the value specified in the protocol. You can also enter a minimum radius whose value is less than the default radius, or you can enter a larger radius to capture more pixels in the background. You can use the radius method to estimate global backgrounds as well. The figures in this step represent the local background for
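The Found/Not Found rule can be sketched as follows. The centroid is taken as the center of mass of a binary feature mask. A spot counts as found if a mask exists and its centroid lies within the Spot Deviation Limit of the grid position. Several details are assumptions: representing the mask as a list of pixel coordinates, the equal-area radius estimator, and expressing the limit in the same units as the coordinates. All function names are hypothetical.

```python
import math


def mask_centroid(mask_pixels):
    """Center of mass of a binary feature mask given as (x, y) pixel coordinates."""
    n = len(mask_pixels)
    return (sum(x for x, _ in mask_pixels) / n, sum(y for _, y in mask_pixels) / n)


def approx_radius(mask_pixels):
    """Radius of a circle with the same area as the mask (an assumed estimator)."""
    return math.sqrt(len(mask_pixels) / math.pi)


def is_found(grid_xy, mask_pixels, spot_deviation_limit):
    """Found if a mask exists and its centroid lies within the Spot Deviation Limit
    of the grid position (limit assumed to use the same units as the coordinates)."""
    if not mask_pixels:  # spot too weak: no pixels were classified as feature pixels
        return False
    return math.dist(grid_xy, mask_centroid(mask_pixels)) <= spot_deviation_limit


spot = [(x, y) for x in range(9, 12) for y in range(9, 12)]  # 3 x 3 block around (10, 10)
print(mask_centroid(spot))                # (10.0, 10.0)
print(is_found((10.5, 10.0), spot, 1.0))  # True
print(is_found((13.0, 10.0), spot, 1.0))  # False
```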
126. confidence interval, it is 2.6 standard deviations (SD) away from the mean. Written as equations:
   $$D = 2.6\,SD, \qquad D = \mathrm{Mult\_factor} \times \mathrm{IQR} + k$$
   From the Z table for the cumulative normal frequency distribution, $Z_{0.75} = 0.675$, so
   $$k = 0.675 \times SD, \qquad k = \frac{\mathrm{IQR}}{2}$$
   If you combine the four equations above and solve for Mult_factor, you get Mult_factor = 1.42.
   If you would rather use a 95% confidence interval, Mult_factor = 0.952. This follows because, assuming a normal distribution and infinite degrees of freedom,
   $$D = 1.96\,SD = 0.95185 \times \mathrm{IQR} + k$$
   Figure 58 Important points on Gaussian curve of pixels vs intensity (labels: 25th, 50th and 75th percentiles; IQR/2; 1.42 × IQR; boundary for rejection on each side)
   Step 7 Calculate the mean signal of the feature (MeanSignal)
   The intensities of the inlier pixels of a feature are averaged to give the mean signal of the feature before background subtraction. The NumPix column in the result file lists the number of inlier pixels in the cookie that remain after outlier pixels are rejected.
   If the protocol's method for calculating the spot value from pixel statistics is set to Median/Normalized InterQuartile Range instead of Mean/Standard Deviation, the program makes these substitutions in the spot value and background subtraction calculations: MedianSignal for MeanSignal, BGMedianSignal for BGMeanSignal, PixNormIQR for PixSDev
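The derivation can be checked numerically. Work in units of SD and set k = 0.675 and IQR = 2k. Then D = Mult_factor × IQR + k gives Mult_factor = (D − k) / IQR. The sketch uses the rounded constants from the text (2.6, 1.96, 0.675). It only checks the arithmetic; it is not part of Feature Extraction.

```python
def mult_factor(d_in_sd, z75=0.675):
    """Solve D = Mult_factor * IQR + k for Mult_factor, with everything in units of SD.

    k   = z75 * SD   (distance from the median to the 75th percentile)
    IQR = 2 * k
    """
    k = z75
    iqr = 2 * k
    return (d_in_sd - k) / iqr


print(round(mult_factor(2.6), 4))   # 1.4259, quoted as 1.42 for the 99% interval
print(round(mult_factor(1.96), 5))  # 0.95185 for the 95% interval
```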
127. cron feature size 30 micron feature size All 30 micron feature size All 185k 185k 10uM 65 micron feature size 30 micron feature size All All All All
   Flag outliers
   These parameters and values differ depending on the scanner used for the image, the microarray type and the lab protocol.
   Table 12 Flag Outliers: Default values in common and differences for protocols (columns: Parameter, Default values, Protocols using Default Value)
   Parameters, in table order: Compute Population Outliers; Minimum Population; IQRatio; Background IQRatio; Use Qtest for Small Populations; Report Population Outliers as Failed in MAGEML file; Compute Non Uniform Outliers; Agilent scanner Automatically Compute OL Polynomial Terms; Feature CV 2; Red Poissonian Noise Term Multiplier; Red Signal Constant Term Multiplier; Green Poissonian Noise Term Multiplier.
   Default values, in table order: True, 10, 15, 8, 1.42, 1.42, 5.00, True, False, True, True, 0.04000, 30, 20, 5, 1.20.
   Protocols using the default value, in table order: All; All except GE2 NonAT, ChIP and miRNA; GE2 NonAT, ChIP and miRNA; All; All except miRNA; miRNA; All; All; All; All except GE2 NonAT; All except GE2 NonAT; GE2, miRNA, CGH, ChIP; All except GE2 NonAT; GE1, GE2, miRNA.
   Table 12 Flag Outliers: Default values in common and differences for protocols (continued) Par
128. cted, then the gene is considered detected. Estimates the ratio of the effective feature size to the nominal feature size; it is calculated from the ratio of the whole spot measurement to the cookie measurement. Calculates the ratio of the number of features with anomalous effective feature size fractions to the total number of features. This gives a measure of the percentage of representative spots that are abnormal (e.g., donuts, super-hot spots or hot crescents). Reports whether an effective feature size was estimated or not: the stat value is 0 if Yes and 1 if No. If No, the default effective feature size value is used.
   Support for miRNA Spike-In analysis has been available since v10.7. The miRNA Spike-In genes have a subtype mask of 8196 and consist of the following miRNA probes: dmr285, dmr31a, dmr6 and dmr3. Values for GeneSignal and ProbeRatio are calculated for each of the four probes.
   How the miRNA Spike-In Statistics and Metrics are calculated
   The miRNA Spike-Ins use four miRNAs from the species Drosophila melanogaster, on the assumption that these sequences have no hybridization potential against the real targets on the microarray. The four miRNAs are named dmr6, dmr3, dmr31a and dmr285. The sequences come from the microRNA database miRBase (http://www.mirbase.org). Thes
129. cted signals and log ratios for two-channel data than either no correction or single-channel correction. On a self-self microarray (i.e., the same target labeled in the red and green channels), a plot of red background-subtracted signal versus green should be linear. If the background in one channel has not been estimated correctly relative to the other channel, there will be a bias. On a plot with log-scale axes, this bias shows up as a "hook" at the low end of the signal range (see Figure 65).
   Figure 65 Unadjusted background subtracted signals (rBGSubSignal vs. gBGSubSignal, log-scale axes)
   The background adjustment algorithm first finds the central tendency of the data (the features shown as blue circles in the figures). Using this subset of features, the algorithm then estimates the best adjustment in both the red and green channels to remove the bias. After the background adjustment, the bias is removed and the plot is linear (Figure 66).
   Figure 66 Adjusted background subtracted signals (rBGSubSignal vs. gBGSubSignal, log-scale axes)
   The bias, if uncorrected, yields a log ratio versus signal plot that is not symmet
130. Table 21 Stats results contained in the text output file STATS table (continued; columns: Stats Green Channel, Stats Red Channel, Type, Description)
   GridHasBeenOptimized (boolean; 0 = False, 1 = True): Indicates whether the grid was adjusted for a better fit as a result of the interactive adjust-corners method.
   ExtractionStatus (integer; 0 = in range, 1 = out of range): Reported only if a metric set has been run. It gives the status of the overall array.
   QCMetricResults (String): If ExtractionStatus = 0, the output says ExtractionInRange. If ExtractionStatus = 1, the output says ExtractionEvaluate.
   UpRandomnessRatio (float): Variance measure of whether or not positive log ratios appear to be correlated with position on the array.
   DownRandomnessRatio (float): Variance measure of whether or not negative log ratios appear to be correlated with position on the array.
   UpRandomnessSDRatio (float): StDev measure of whether or not positive log ratios appear to be correlated with position on the array.
   DownRandomnessSDRatio (float): StDev measure of whether or not negative log ratios appear to be correlated with position on the array.
   gdmr285GeneSignal, rdmr285GeneSignal (float): Metrics for miRNA only. This is the log-transformed value of TotalGeneSignal for the miRNA spike-in gene dmr285 within the subtype mask 8196. If the parameter "Do you want minimum signal value as 0.1 value" in protocol
131. ction boundary, both of which are determined by multiplying a factor (1.42) by the interquartile range of the population made up of intra-array feature replicates. See Step 6 Reject outliers on page 245.
   Table 32 Algorithms, Protocol Steps and the results they produce (continued; columns: Protocol Step, Results, Result Definition)
   Compute Bkgd Bias and Error, BGAdjust: An adjustment value added to the initial background-subtracted signal to correct for underestimation or overestimation of the background. This value can be positive or negative. Note: The BGAdjust values are reported per channel in the STATS table of the Feature Extraction text file.
   Compute Bkgd Bias and Error, BGUsed: Final background signal used to subtract the background from the feature mean signal. To view the values used to calculate this variable with different background signals and settings of spatial detrend and global background adjust, see Table 34 on page 254.
   Compute Bkgd Bias and Error, BGSubSignal: Feature signal after subtraction of the background corrections. To view the values used to calculate this variable with different background signals and settings of spatial detrend and global background adjust, see Table 34 on page 254.
   Compute Bkgd Bias and Error, IsPosAndSignif: If significance is based on pixel statistics, a Boolean flag of 1
   Compute Bkgd Bias and Error; Compu
132. d from all subsequent calculations. NOTE: The pixel outlier method is the ONLY step that removes data in Feature Extraction. Total number of pixels used to compute feature statistics, i.e., the total number of inlier pixels per spot (same in both channels). Raw mean signal of the feature from inlier pixels in the green and/or red channel. Raw median signal of the feature from inlier pixels in the green and/or red channel. Standard deviation of all inlier pixels per feature, computed independently in each channel. The normalized interquartile range of all the inlier pixels per feature, computed independently in each channel. Total number of pixels used to compute local BG statistics per spot, i.e., the total number of BG inlier pixels (same in both channels).
   Table 22 Feature results contained in the FULL output text file FULL FEATURES table (continued; columns: Features Green, Features Red, Types, Options, Description)
   gBGMeanSignal, rBGMeanSignal (float): Mean local
   gBGMedianSignal, rBGMedianSignal (float)
   gBGPixSDev, rBGPixSDev (float)
   gBGPixNormIQR, rBGPixNormIQR (float)
   gNumSatPix, rNumSatPix (integer)
   gIsSaturated, rIsSaturated (boolean; 1 = Saturated, 0 = Not saturated)
   gIsLowPMTScaledUp, rIsLowPMTScaledUp (boolean; 1 = Low, 0 = High)
   PixCorrelation (float)
   BGPixCorrelation (float)
133. d adjust, see Table 34 on page 254. A boolean flag that indicates whether a feature is used to measure dye bias. The dye-normalized signal in the indicated channel. The standard error associated with the dye-normalized signal. Dye-normalized red and green pixel correlation. Indicates the error model that you chose for Feature Extraction, or that the software uses if you have chosen the Most Conservative option. A signal-to-noise parameter used to calculate pValue; how it is calculated depends on the error model chosen.
   Table 22 Feature results contained in the FULL output text file FULL FEATURES table (continued; columns: Features Green, Features Red, Types, Options, Description)
   gSpatialDetrendIsInFilteredSet, rSpatialDetrendIsInFilteredSet (boolean; 1 = feature in filtered set, 0 = feature not in filtered set): Set to true for a given feature if it is part of the filtered set used to detrend the background. Such a feature is part of the locally weighted lowest x% of features, as defined by the DetrendLowPassPercentage.
   gSpatialDetrendSurfaceValue, rSpatialDetrendSurfaceValue (float): Value of the smoothed surface
   gIsLowEnoughAddDetrend, rIsLowEnoughAddDetrend (boolean)
   SpotExtentX (float)
   SpotExtentY (float)
   gNetSignal, rNetSignal (float)
   gTotalProbeSignal (float)
   gTotalProbeError (float)
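The selection of the detrending set can be roughly sketched as flagging the lowest x% of features, with x being the DetrendLowPassPercentage. This is a deliberate simplification: it ranks all features globally. The guide describes a locally weighted lowest x%, and that local weighting is not reproduced here. The function name and the rounding rule (`math.ceil`, at least one feature) are hypothetical.

```python
import math


def detrend_filtered_set(signals, low_pass_percentage):
    """Flag the lowest `low_pass_percentage` percent of features (global ranking).

    Simplification: the guide's filtered set is *locally weighted*; this is not.
    """
    n = len(signals)
    k = max(1, math.ceil(n * low_pass_percentage / 100))
    lowest = set(sorted(range(n), key=lambda i: signals[i])[:k])
    return [i in lowest for i in range(n)]


print(detrend_filtered_set([5, 1, 9, 3, 7, 2, 8, 4, 6, 10], 30))
# [False, True, False, True, False, True, False, False, False, False]
```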
134. d low intensity scans. The general theory is that the high intensity scan gives the best results at the low end of the signal range, while the low intensity scan gives better data for bright features because it is less affected by saturation. The Feature Extraction program uses a signal level of 20,000 as the cutoff between the two scans: if the NetSignal of the high intensity scan is greater than 20,000 counts, the data from the low intensity scan are used. The low intensity scan uses a lower PMT gain than the high intensity scan (say, 10% versus 100%). To combine the data, the signals from the low intensity scan must therefore be scaled up to match those from the high intensity scan. To determine the scaling factor, the algorithm uses features with signals in an overlap range where both the high and low intensity scans provide very stable data. This range is NetSignal in the high intensity scan greater than 300 counts and less than 20,000 counts. Using data in this range, the Feature Extraction program generates a linear fit, with a slope and an intercept, that transforms the low intensity mean signals into the range of the high intensity scan. The final scaled signal for the XDR extraction is
   $$\mathrm{MeanSignal}_{\text{low intensity scan}} \times \mathrm{slope} + \mathrm{intercept}$$
   The linear fit constants determined in this step are included in the Stats table. For signals over 20,000 counts in the high intensity scan, therefor
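The XDR combination can be sketched as three steps: fit the low scan to the high scan over the overlap range (300 to 20,000 counts), scale the low-scan values, and substitute them wherever the high scan exceeds 20,000 counts. Several parts are assumptions rather than Feature Extraction's implementation. The fit is ordinary least squares (the guide only says "linear fit"). The sketch also collapses NetSignal and MeanSignal into one value per scan, and all names are hypothetical.

```python
HIGH_CUTOFF = 20_000  # high-scan NetSignal above which low-scan data are used
OVERLAP_MIN = 300     # lower edge of the overlap range used for the fit


def fit_low_to_high(high, low):
    """Ordinary least-squares line high ~ slope * low + intercept (assumed method)."""
    pairs = [(lo, hi) for hi, lo in zip(high, low) if OVERLAP_MIN < hi < HIGH_CUTOFF]
    if len(pairs) < 2:
        raise ValueError("too few features in the overlap range for an XDR fit")
    n = len(pairs)
    mean_lo = sum(lo for lo, _ in pairs) / n
    mean_hi = sum(hi for _, hi in pairs) / n
    sxx = sum((lo - mean_lo) ** 2 for lo, _ in pairs)
    sxy = sum((lo - mean_lo) * (hi - mean_hi) for lo, hi in pairs)
    slope = sxy / sxx
    return slope, mean_hi - slope * mean_lo


def combine_xdr(high, low):
    """Return (combined signals, per-feature 'low scan used' flags, slope, intercept)."""
    slope, intercept = fit_low_to_high(high, low)
    combined, used_low = [], []
    for hi, lo in zip(high, low):
        substitute = hi > HIGH_CUTOFF
        combined.append(lo * slope + intercept if substitute else hi)
        used_low.append(substitute)
    return combined, used_low, slope, intercept


high = [500.0, 1000.0, 2000.0, 10000.0, 30000.0]
low = [50.0, 100.0, 200.0, 1000.0, 3000.0]
signals, flags, slope, intercept = combine_xdr(high, low)
print(slope, intercept, flags)
# 10.0 0.0 [False, False, False, False, True]
```

The `used_low` flags play the same role as the r/gIsLowPMTScaledUp column described earlier in this guide.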
135. d subtracted signal (BGSubSignal) is calculated by subtracting a value called BGUsed from the feature mean signal:
   $$\mathrm{BGSubSignal} = \mathrm{MeanSignal} - \mathrm{BGUsed}$$
   where BGSubSignal and BGUsed depend on the type of background method and the settings for spatial detrend and global background adjust. See the following table.
   Table 34 Values for BGSubSignal, BGUsed and BGSDUsed for different methods and settings
   Background Background Spatial Detrend SpDe ON SpDe OFF Spatial Detrend ON Subtraction Subtraction SpDe OFF Method Variable Global Bkgnd GBA OFF GBA ON Global Bkgnd Adjust ON Adjust GBA OFF No BGUsed BGMeanSignal SpatialDetrend BGAdjust SpatialDetrendSurface background SurfaceValue Value SDSV BGAdjust subtract t 7 r BGSDUsed BGPixSDev BGPixSDev BGPixSDev BGPixSDev BGSubSignal MeanSignal MeanSignal MeanSignal MeanSignal BGUsed BGUsed BGUsed Local BGUsed BGMeanSignal BGMeanSignal BGMeanSignal BGMeanSignal SDSV Background SDSV BGAdjust BGAdjust BGSDUsed BGPixSDev BGPixSDev BGPixSDev BGPixSDev
   Table 34 Values for BGSubSignal, BGUsed and BGSDUsed for different methods and settings (continued)
   Background Background Spatial Detrend SpDe ON SpDe OFF Spatial Detrend ON Subtraction Subtraction SpDe OFF Method Variable Global Bkgnd GBA OFF GBA ON Global Bkgnd Adjust ON Adjust GBA OFF BGSubSignal MeanSignal MeanSignal MeanSign
136. d value of TotalGeneSignal for the miRNA spike-in gene dmr3 within the subtype mask 8196. If the parameter "Do you want minimum signal value as 0.1 value" in the protocol is true, then values of TotalGeneSignal less than 0.1 are set to 0.1 for the calculation. Otherwise, the original value of TotalGeneSignal is used in the calculation.
   Table 21 Stats results contained in the text output file STATS table (continued; columns: Stats Green Channel, Stats Red Channel, Type, Description)
   gdmr6ProbeRatio, rdmr6ProbeRatio (float): Metrics for miRNA only. This is the log-transformed value of the ratio of the TotalGeneSignal value for the longer probe in dmr6 to the TotalGeneSignal value for the shorter probe in dmr6, for the miRNA spike-in gene dmr6 within the subtype mask 8196. The probe length can be read from the probe name itself; for example, dmr_6_17 means the probe length is 17. If the parameter "Do you want minimum signal value as 0.1 value" in the protocol is true, then values of TotalGeneSignal less than 0.1 are set to 0.1 for the calculation. Otherwise, the original value of TotalGeneSignal is used in the calculation.
   gdmr3ProbeRatio, rdmr3ProbeRatio (float): Metrics for miRNA only. This is the log-transformed value of the ratio of the TotalGeneSignal valu
   LogRatioImbalance (float)
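The spike-in metrics described above can be sketched as follows: parse the probe length from the name, apply the optional 0.1 floor, then take the log of the longer-to-shorter signal ratio. The guide does not state the log base here, so base 10 is an assumption. The function names and the probe name "dmr_6_22" are hypothetical illustrations.

```python
import math
import re


def probe_length(probe_name):
    """Probe length is the trailing number in the name, e.g. 'dmr_6_17' -> 17."""
    match = re.search(r"_(\d+)$", probe_name)
    if match is None:
        raise ValueError(f"no length suffix in probe name {probe_name!r}")
    return int(match.group(1))


def floor_signal(value, min_signal_0_1=True):
    """Apply the 'minimum signal value as 0.1' protocol option."""
    return max(value, 0.1) if min_signal_0_1 else value


def gene_signal_metric(total_gene_signal, min_signal_0_1=True):
    """Log of TotalGeneSignal after the optional floor (base 10 assumed)."""
    return math.log10(floor_signal(total_gene_signal, min_signal_0_1))


def probe_ratio(signals_by_probe, min_signal_0_1=True):
    """log10(signal of longest probe / signal of shortest probe); base 10 assumed."""
    longest = max(signals_by_probe, key=probe_length)
    shortest = min(signals_by_probe, key=probe_length)
    return math.log10(floor_signal(signals_by_probe[longest], min_signal_0_1)
                      / floor_signal(signals_by_probe[shortest], min_signal_0_1))


print(probe_ratio({"dmr_6_17": 1000.0, "dmr_6_22": 10000.0}))  # 1.0
print(probe_ratio({"dmr_6_17": 0.01, "dmr_6_22": 1.0}))        # 1.0 (0.01 floored to 0.1)
```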
137. dSignif, rIsPosAndSignif (boolean; 1 indicates the feature is positive and significant above background): g/r IsPosAndSignif is a Boolean flag established via a 2-sided t-test. It indicates whether the mean signal of a feature is greater than the corresponding background (selected by the user) and whether this difference is significant. To display the variables used in the t-test, see Table 34 on page 254.
   Table 23 Feature results contained in the COMPACT output text file COMPACT FEATURES table (continued; columns: Features Green, Features Red, Types, Options, Description)
   gIsWellAboveBG, rIsWellAboveBG (boolean): Boolean flag indicating whether or not a feature is well above background. The feature passes g/r IsPosAndSignif and, in addition, the g/r BGSubSignal is greater than 2.6 × g/r BG_SD. You can change the multiplier (2.6).
   SpotExtentX (float): Diameter of the spot (X axis).
   gBGMeanSignal, rBGMeanSignal (float): Mean local background signal (local to the corresponding feature), computed per channel from inlier pixels.
   gTotalProbeSignal (float): This signal is the robust average of all the processed green signals for each replicated probe, multiplied by the total number of probe replicates, the EffectiveFeatureSizeFraction, the Nominal Spot Area and the Weight, for miRNA analyses.
   gTotalProbeError (float): This error is the robust average of all the processed green signal errors for each replicated probe mult
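The TotalProbeSignal definition is a product of factors and can be sketched directly. The guide does not say which robust average is used, so the median is a stand-in assumption and can be swapped through the `robust_average` argument. The function name is hypothetical.

```python
from statistics import median


def total_probe_signal(replicate_signals, effective_feature_size_fraction,
                       nominal_spot_area, weight, robust_average=median):
    """robust average x number of replicates x EffectiveFeatureSizeFraction
    x Nominal Spot Area x Weight.

    The median is an assumed stand-in for the unspecified robust average.
    """
    n = len(replicate_signals)
    return (robust_average(replicate_signals) * n
            * effective_feature_size_fraction * nominal_spot_area * weight)


print(total_probe_signal([100.0, 110.0, 90.0, 500.0], 0.8, 1.0, 1.0))
```

With the median, the single bright replicate (500) does not pull the average. Here the median of the four replicates is 105, so the result is about 105 × 4 × 0.8 = 336.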
138. d_OffsetY (float): In a dense pack array, the offset in the Y direction.
   Grid_NomSpotWidth (float): Nominal width in microns of a spot from the grid.
   Grid_NomSpotHeight (float): Nominal height in microns of a spot from the grid.
   Grid_GenomicBuild (text): The build of the genome used to create the annotation, if available. Not all designs have this information; if the genome build is not available, it is not reported. All recent and all future designs have it.
   FeatureExtractor_Barcode (text): Barcode of the Agilent microarray, read from the scan image.
   FeatureExtractor_Sample (text): Names of the hybridized samples (red/green).
   FeatureExtractor_ScanFileName (text): Name of the scan file used for Feature Extraction.
   FeatureExtractor_ArrayName (text): Microarray filename.
   FeatureExtractor_ScanFileGUID (text): GUID of the scan file.
   FeatureExtractor_DesignFileName (text): Design or grid file used for Feature Extraction.
   FeatureExtractor_ExtractionTime (text): Time stamp at the beginning of Feature Extraction.
   FeatureExtractor_UserName (text): Windows login name of the user who ran Feature Extraction.
   FeatureExtractor_ComputerName (text): Name of the computer on which Feature Extraction was run.
   FeatureExtractor_Version (text): Version of Feature Extractor.
   FeatureExtractor_IsXDRExtraction (integer; 1 = True, 0 = False): Indicates whether the result is from an XDR extraction.
   Protocol Step Parameters Type Options Description Fe
139. determined or selected by you, the software uses the default Placement Method. Parameters that apply to specific formats appear only if that format is selected. (Default setting: Automatically Determine. Recognized formats: Single Density 11k, 22k, 25k; Double Density 44k, 95k, 185k; 185k 10 uM 65 micron feature size, also with 10 micron scans; 30 micron feature size single pack and multi pack; and Third Party.)
   Placement Method: Hidden if Array Format is set to Automatically Determine.
   Allow Some Distortion
   Enable Background Peak Shifting: Hidden if Array Format is set to Automatically Determine. Set to False for all arrays except 30 micron single pack and multi pack, for which it is set to True.
   Use central part of pack for slope and skew calculation: Hidden if Array Format is set to Automatically Determine. Set to False for all arrays except 30 micron single pack and multi pack, for which it is set to True.
   Use the correlation method to obtain origin X of subgrids: Hidden if Array Format is set to Automatically Determine. Set to False for all arrays except 30 micron single pack and multi pack, for which it is set to True.
   Table 6 Default settings for GE2 NonAT_1100_Jul11 protocol (continued; columns: Protocol step, Parameter, Default Setting Value v12.0)
   Optimize Grid Fit, Grid Format (default setting: Automatically Determine; Recognized for): The parameters and values for optimizing the grid differ depending
140. difference of the log ratios. This metric is used in CGH experiments, where differences in the log ratios are small on average. A smaller standard deviation here indicates less noise in the biological signals.
   MicroRNA (miRNA) QC Report
   This header lists the same information as the 1-color gene expression QC Report header. If the XDR function is turned on, it also lists Saturation Values exceeding 65,500. The dynamic range of intensity across all miRNA spots on a microarray may exceed that of a normal scan, so the miRNA analysis on some microarrays can benefit from turning on the XDR function.
   Non-Agilent 2-color gene expression QC Report
   This header lists the same information as the 2-color gene expression QC report header.
   Feature Statistics
   This section explains each of the feature statistics segments of the QC report and how these statistics can help you assess the performance of your microarray system.
   Spot finding of Four Corners
   By looking at the features in the four corners of the microarray, you can decide whether the spot centroids have been located properly. If their locations are off center in one or more corners, you may have to run the extraction again with a new grid.
   Figure 18 QC Report (panels: Spot Finding of the Four Corners of the Array; Grid Normal)
141. ding were turned off. Note that not all metric values are within the default thresholds.
   Figure 16 QC Report Header and Evaluation M (screenshot: Agilent Technologies 2 Color Gene Expression QC Report header for protocol GE2_1100_Jul11, followed by the Evaluation Metrics for GE2_QCMT_Jul11 table with Metric Name, Value, Excellent, Good and Evaluate columns)
142. does not help, this column will contain the BackgroundSubtractedSignal. The universal or propagated error left after all the processing steps of the Feature Extraction process have been completed. In the case of one color, if multiplicative detrending is performed, ProcessedSignalError contains the error propagated from detrending; this is done by dividing the error by the normalized MultDetrendSignal. Log of the ratio of rProcessedSignal over gProcessedSignal. The log ratio indicates the level of gene expression in the cyanine 5-labeled sample relative to the cyanine 3-labeled sample.
   Table 32 Algorithms, Protocol Steps and the results they produce (continued; columns: Protocol Step, Results, Result Definition)
   Compute Ratios, pValueLogRatio: P value indicating the level of significance of the differential expression of a gene, as measured through the log ratio.
   MicroRNA Analysis, gTotalGeneSignal: The sum of the total probe signals in the green channel per gene.
   MicroRNA Analysis, gTotalGeneError: The square root of the sum of the squares of the TotalProbeError.
   XDR Extraction Process
   What is XDR scanning?
   The Agilent scanner can cover a dynamic intensity range far greater than the range covered by a single scan. Furthermore, Agilent microarr
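The two miRNA gene-level results and the log ratio reduce to short formulas, sketched below. TotalGeneSignal is a plain sum, and TotalGeneError is a root-sum-of-squares, as the table states. The base-10 logarithm for the log ratio is an assumption, since the base is not stated in this excerpt. The function names are hypothetical.

```python
import math


def total_gene_signal(total_probe_signals):
    """TotalGeneSignal: sum of the TotalProbeSignal values of a gene's probes."""
    return sum(total_probe_signals)


def total_gene_error(total_probe_errors):
    """TotalGeneError: square root of the sum of squared TotalProbeError values."""
    return math.sqrt(sum(e * e for e in total_probe_errors))


def log_ratio(r_processed, g_processed):
    """Log of rProcessedSignal / gProcessedSignal (base 10 assumed)."""
    return math.log10(r_processed / g_processed)


print(total_gene_signal([1200.0, 800.0]))  # 2000.0
print(total_gene_error([3.0, 4.0]))        # 5.0
print(log_ratio(1000.0, 100.0))            # 1.0
```

The root-sum-of-squares combines the probe errors in quadrature, which treats them as independent.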
143. Figure 7 Streamlined CGH QC Report p1 (screenshot panels: evaluation metrics table with Metric Name, Value, Excellent, Good and Evaluate columns; Grid Normal; Outlier Numbers with Spatial Distribution; Non Uniform and Population outlier counts for Red, Green and Any; Histogram of Signals Plot Red and Histogram of Signals Plot Green, Number of Probes vs. Log of BG SubSignal)
   Spatial Distribution of Significantly Up Regul
144. e the low intensity scan signals can extend to nearly 1.2 million counts. If the spot centroid in the low intensity scan is too far from the high intensity centroid (more than 2 pixels), the algorithm does not make a substitution.
   Troubleshooting the XDR extraction
   The XDR algorithm reports warnings in the project summary report to indicate an issue with the XDR extraction process:
   "No XDR signal substitution for color red/green." This message appears if there are no features for which the low intensity data are substituted. This can occur on a dim array.
   "Computation of the XDR fit for red/green is based on only X pairs of high PMT/low PMT matching values." This message appears if very few features had data in the overlap range for the fit. In this case, check the data to confirm that the XDR combination is satisfactory.
   "Computation of the XDR fit for red/green results in a large intercept." This message appears if the linear fit between the low and high intensity scans has a very large intercept, which can indicate a poor linear fit. In this case, check the data to confirm that the XDR combination is satisfactory.
   "Computed XDR ratio for red/green is X vs expected Y from PMT settings. Check scanner calibration." This message appears if the ratio of the high/low intensity sca
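The substitution guard described above combines two conditions. The high intensity NetSignal must exceed 20,000 counts, and the low-scan centroid must lie no more than 2 pixels from the high-scan centroid. A minimal sketch, with a hypothetical function name and centroids given as (x, y) pixel coordinates:

```python
import math

HIGH_CUTOFF = 20_000      # high-scan NetSignal threshold from the XDR description
MAX_CENTROID_SHIFT = 2.0  # pixels; a larger shift blocks the substitution


def substitute_low_scan(high_net_signal, high_centroid, low_centroid):
    """True if the scaled low-intensity value should replace the high-intensity one."""
    if high_net_signal <= HIGH_CUTOFF:
        return False
    return math.dist(high_centroid, low_centroid) <= MAX_CENTROID_SHIFT


print(substitute_low_scan(45000, (100.0, 200.0), (101.0, 201.0)))  # True (shift ~1.41 px)
print(substitute_low_scan(45000, (100.0, 200.0), (103.0, 200.0)))  # False (shift 3 px)
print(substitute_low_scan(15000, (100.0, 200.0), (100.0, 200.0)))  # False (below cutoff)
```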
145. Figure 26 QC Report Foreground Surface Fit
Multiplicative Surface Fit
See Step 16, Determine the error in the signal calculation, on page 266 of this guide for more information about these calculations.
This value is the root mean square (RMS) of the surface fit for the data. RMS × 100 is roughly the average deviation from flat on the microarray. A multiplicative trend means that some regions of the microarray are brighter or dimmer than others. This trend multiplies signals; that is, a brighter signal is more affected in absolute signal counts than a dimmer signal. SNP probes are not included in the calculation of multiplicative detrending. This option is turned on in the GE1, GE2, and CGH protocols, turned off in the miRNA protocol, and not available for non-Agilent protocols. If the signal is improved through a multiplicative surface fit, the RMS_Fit value appears as a fraction, as shown in Figure 27.
Figure 27 QC Report Multiplicative Surface Fit
What if multiplicative detrending does not work?
If the median CV for the Processed Signal of the non-control probes is greater than the BGSub Signal median CV after multiplicative detrending, Feature Extraction turns off multiplicative detrending. If multiplicative detrending did not result in better d
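The fallback rule above compares two median CVs over the non-control probes. A minimal sketch, assuming the CVs are computed per group of replicate signals (as in the reproducibility statistics) and that equal medians keep detrending on; both function names are hypothetical.

```python
import statistics


def median_cv(replicate_groups):
    """Median coefficient of variation over groups of replicate signals."""
    cvs = []
    for signals in replicate_groups:
        if len(signals) < 2:
            continue  # a CV needs at least two replicates
        mean = statistics.fmean(signals)
        if mean > 0:
            cvs.append(statistics.stdev(signals) / mean)
    return statistics.median(cvs)


def keep_multiplicative_detrend(bgsub_groups, processed_groups):
    """False when detrending made the replicate CVs worse, i.e. turn it off."""
    return median_cv(processed_groups) <= median_cv(bgsub_groups)
```

Passing the same replicate groups before (BGSubSignal) and after (ProcessedSignal) detrending mirrors the comparison the text describes.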
146. (see Table 34 on page 254)
Table 22 Feature results contained in the FULL output text file (FULL FEATURES table, continued)
Features (Green) | Features (Red) | Type | Options | Description
gBGSubSigError | rBGSubSigError | float | | Propagated standard error as computed on the net g/r background-subtracted signal. For one color, the error model is applied to the background-subtracted signal. This contains the larger of the universal (UEM) error or the propagated error.
BGSubSigCorrelation | | float | | Ratio of the estimated background-subtracted feature signal covariance in RG space to the product of the background-subtracted feature standard deviations in RG space.
gIsPosAndSignif | rIsPosAndSignif | Boolean | 1 indicates the feature is positive and significant above background | Boolean flag, established via a 2-sided t-test, that indicates whether the mean signal of a feature is greater than the corresponding background (selected by user) and whether this difference is significant. To display the variables used in the t-test, see Table 34 on page 254.
gPValFeatEqBG | rPValFeatEqBG | float | | p-value from the t-test of significance between the g/r Mean signal and the g/r background (selected by user).
gNumBGUsed | rNumBGUsed | integer | | Number of local background regions or features used to calculate the bac
gIsWellAboveBG | rIsWellAboveBG | Boolean | |
147. the following equation:
$$Q_i = \frac{\lvert X_i - X_{\text{nearest}} \rvert}{X_{\max} - X_{\min}}$$
where X_i is the intensity of a probe sequence, X_nearest is the intensity of the probe sequence nearest to it in intensity, X_max is the intensity of the most intense probe sequence, and X_min is the intensity of the least intense probe sequence.
Q_i is compared to Q_critical to determine whether the feature is an outlier. Q_critical depends on the number of replicate features N and on the chosen confidence level. Agilent has chosen a 95% confidence level and bases the identification of population outliers on Table 33. See Step 6, Reject outliers, on page 245 for definitions that help you understand the Interquartile Range.
Table 33 Qcritical values at 95% confidence level
Number of replicated features (N) | Qcritical
3 | 0.970
4 | 0.829
5 | 0.710
6 | 0.625
7 | 0.568
8 | 0.526
9 | 0.493
10 | 0.466
IQR Test for replicate features (≥ minimum population number)
The following equations are calculated for each feature and background population per channel. The intensities of all features or background regions in the population are plotted on a distribution curve. The difference in intensities between the 25th and 75th percentiles represents the Interquartile Range (IQR).
Figure 59 Interquartile Range (labels: 25%ile, 50%ile, 75%ile, IQR, 1.42 × IQR, and a Boundary for rejection on each side)
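The two outlier tests above can be sketched as follows. This is an illustration under stated assumptions, not the Feature Extraction code: Q_i is evaluated for every replicate (not only the extremes), percentiles use Python's "inclusive" quantile method, and the rejection boundaries are placed 1.42 × IQR below the 25th and above the 75th percentile, following the IsFeatPopnOL description. Function names are hypothetical.

```python
import statistics

# Table 33: Q critical values at the 95% confidence level.
Q_CRITICAL_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
                 7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}
IQR_MULTIPLIER = 1.42  # multiplier quoted for population outliers


def q_test_outliers(values):
    """Indices whose Q_i = |X_i - X_nearest| / (X_max - X_min) exceeds Q_critical."""
    q_crit = Q_CRITICAL_95[len(values)]  # KeyError outside N = 3..10
    spread = max(values) - min(values)
    if spread == 0:
        return []
    flagged = []
    for i, x in enumerate(values):
        others = values[:i] + values[i + 1:]
        nearest = min(others, key=lambda y: abs(y - x))
        if abs(x - nearest) / spread > q_crit:
            flagged.append(i)
    return flagged


def iqr_outliers(values):
    """Indices outside [Q1 - 1.42*IQR, Q3 + 1.42*IQR] (percentile method assumed)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - IQR_MULTIPLIER * iqr, q3 + IQR_MULTIPLIER * iqr
    return [i for i, x in enumerate(values) if x < low or x > high]


print(q_test_outliers([10.0, 10.2, 10.1, 20.0]))  # [3]
```

For the four-replicate example, the value 20.0 gives Q = 9.8 / 10 = 0.98, which exceeds the tabulated 0.829 for N = 4.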
148. …for the longer probe in dmr3 divided by the TotalGeneSignal value for the shorter probe in dmr3, for the miRNA spike-in gene dmr3 within the subtype mask 8196. The probe length can be determined from the probe name itself; for example, dmr_3_17 means that 17 is the probe length. If the protocol parameter "Do you want minimum signal value as 0.1 value" is true, values of TotalGeneSignal less than 0.1 are set to 0.1 for the calculation; otherwise, the original TotalGeneSignal values are used.
This metric is for CGH only. It calculates the amount of amplifications versus deletions per chromosome to determine whether there is an imbalance that falls outside of normal expectations.
Table 21 Stats results contained in the text output file (STATS table, continued)
Stats Green Channel | Stats Red Channel | Type | Description
Metric_MetricName | | | Optional. Only displayed when a metric set is used. The name of a metric in the metric set; the given value is the one calculated for this metric. You can have more than one metric in a given metric set.
Metric_MetricName_IsInRange | | integer | Optional. Only displayed when a metric set is used. Indicates whether the metric was within any user-defined thresholds found in the metric set for that metric: 1 = in range, 0 = out of range.
Results are reported to 9 decimal places.
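The ratio described above (longer-probe TotalGeneSignal over shorter-probe TotalGeneSignal, with optional flooring at 0.1) can be sketched as below. The input dictionary, the function names, and the probe name `dmr_3_21` are hypothetical; only `dmr_3_17` and the "last field is the length" convention come from the text.

```python
import re


def probe_length(probe_name):
    """Length encoded in the last '_' field, e.g. 'dmr_3_17' -> 17."""
    match = re.search(r"_(\d+)$", probe_name)
    if match is None:
        raise ValueError(f"no length suffix in {probe_name!r}")
    return int(match.group(1))


def long_short_ratio(total_gene_signal, gene_prefix, clamp_to_min=True):
    """TotalGeneSignal(longer probe) / TotalGeneSignal(shorter probe).

    total_gene_signal maps probe name -> TotalGeneSignal (hypothetical input).
    With clamp_to_min, values below 0.1 are raised to 0.1 first, mirroring
    the "minimum signal value as 0.1" protocol option.
    """
    probes = [p for p in total_gene_signal if p.startswith(gene_prefix + "_")]
    longer = max(probes, key=probe_length)
    shorter = min(probes, key=probe_length)

    def value(name):
        v = total_gene_signal[name]
        return max(v, 0.1) if clamp_to_min else v

    return value(longer) / value(shorter)
```

For example, `long_short_ratio({"dmr_3_17": 50.0, "dmr_3_21": 200.0}, "dmr_3")` returns 4.0; the floor only matters when one of the signals drops below 0.1.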
149. …the miRNAs have been placed on the array in multiple locations as replicated probe pairs, with the corresponding names dmr6, dmr3, dmr31a, and dmr285. "Replicated probe pairs" means that two probes have been designed for each of the four miRNAs: a longer probe and a shorter probe. Multiple copies of each probe exist on the array in random locations. The probe length can be determined from the last portion of the probe name; for example, the probe dmr_3_17 has a length of 17.
For these probes to show any legitimate signal in your microarray experiment, the experimental protocol must be modified to include target mixtures of these Spike-Ins (see the miRNA manual for details). The Feature Extraction software assumes that these Spike-Ins have been added and attempts to calculate the statistics and metrics unless that option has been specifically disabled through a Feature Extraction protocol modification. The software calculates six statistics associated with the Spike-Ins and adds them to the STATS table in the tab-delimited text output of Feature Extraction. It then calculates three metrics from those statistics and outputs and grades these metrics on the miRNA QC report.
Statistics
Two of the statistics calculated are summarized as ProbeRatios. The ProbeRatio used to
150. …to a set of very low intensity features evenly distributed on the slide, using moving-window filtering. This algorithm, which was the original algorithm for gene expression microarrays, moves a window over the whole microarray and attempts to choose a fixed number of data points with the lowest intensity inside each window. This option is recommended for arrays without negative controls and is illustrated in Figure 60.
Figure 60 The effect of a moving window on selecting the lowest intensity features as an estimate of background (panel title: No Moving Window)
In these figures, the blue squares represent the low-intensity features found on the array. Without a moving window, the lowest features on the entire array are located, and they may exhibit spatial bias. With the moving window, the lowest features from each region of the microarray are better identified.
• OnlyNegativeControlFeatures: This selection fits the surface to the negative control features distributed on the slide and is recommended for Agilent CGH microarrays. This option works well with well-defined negative controls. Outlier filtering should be enabled with this option to ensure good negative control values. To enable outlier filtering, set NegCtrlSpread Outlier Rejection On to True, which removes artifacts from distorting the co
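The spatial-bias point above is easy to see with a toy intensity grid. In this sketch the window is a set of non-overlapping square tiles and a fixed count of lowest points is taken from each tile; the tile size, step, and function names are assumptions, since the guide does not give the window geometry.

```python
def moving_window_lowest(grid, window, n_lowest):
    """Pick the n_lowest-intensity (row, col) positions inside each window.

    grid is a rectangular list of rows of intensities. Windows are
    non-overlapping window x window tiles: an assumption, since the guide
    does not give the window size or step.
    """
    rows, cols = len(grid), len(grid[0])
    selected = set()
    for r0 in range(0, rows, window):
        for c0 in range(0, cols, window):
            tile = [(grid[r][c], r, c)
                    for r in range(r0, min(r0 + window, rows))
                    for c in range(c0, min(c0 + window, cols))]
            tile.sort()
            selected.update((r, c) for _, r, c in tile[:n_lowest])
    return selected


def global_lowest(grid, n_lowest):
    """For comparison: the n lowest positions over the whole array."""
    cells = sorted((v, r, c) for r, row in enumerate(grid) for c, v in enumerate(row))
    return {(r, c) for _, r, c in cells[:n_lowest]}
```

On a grid whose right half is much dimmer than its left half, `global_lowest` picks points only from the dim half, while `moving_window_lowest` picks one point from every tile, which is the behavior Figure 60 illustrates.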
151. Figure 1 2-color Gene Expression QC Report with Spike-ins (p1)
Panels include: Red and Green Background Corrected Signals (Non Control), plotted against gBGSubSignal (Background Subtracted Signal); Spatial Distribution of All Outliers on the Array (105 rows x 215 columns); Negative Control Stats (see page 94); Spatial Distribution of Significantly Up Regulated and Down Regulated Features.
Negative Control Stats | Red | Green
Average Net Signals | 42.92 | 53.24
StdDev Net Signals | 2.19 | 3.59
Average BG Sub Signal | 3.93 | 3.61
StdDev BG Sub Signal | 2.07 | 2.83
152. …by multiplicative detrending. This average is used to normalize the surface. It is a straight average over all the points in the surface.
Measures the standard deviation of the probe-to-probe difference of the log ratios. This metric is used in CGH experiments, where differences in the log ratios are small on average. A smaller standard deviation indicates less noise in the biological signals.
The probe name of the eQC probe spiked in at the lowest concentration.
The probe name of the eQC probe spiked in at the second-lowest concentration.
Agilent Spike-In Concentration Response Statistic in the 1-color QC Report: log of the low signal for the data.
Agilent Spike-In Concentration Response Statistic in the 1-color QC Report: error in the log of the low signal for the data.
Agilent Spike-In Concentration Response Statistic in the 1-color QC Report: log of the high signal for the data.
Agilent Spike-In Concentration Response Statistic in the 1-color QC Report: log of the low concentration in the linear range of the curve fit.
Agilent Spike-In Concentration Response Statistic in the 1-color QC Report: log of the low signal in the linear range of the curve fit.
Table 21 Stats results contained in the text output file (STATS table, continued)
Stats Green Channel | Stats Red Channel | Type | Description
eQCOneColorLinFitLogHigh | | float | Agilent Spike-In Concentratio
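The "standard deviation of the probe-to-probe difference of the log ratios" can be sketched as below. Ordering probes by genomic position within each chromosome and differencing only neighbors on the same chromosome are assumptions (the warning table notes that chromosome coordinates are required for this metric); no further normalization or robust estimator is applied here, because none is stated in this passage.

```python
import statistics
from collections import defaultdict


def derivative_lr_spread(probes):
    """SD of consecutive log-ratio differences along each chromosome.

    probes: iterable of (chromosome, position, log_ratio) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, log_ratio in probes:
        by_chrom[chrom].append((pos, log_ratio))
    diffs = []
    for items in by_chrom.values():
        items.sort()  # order probes by genomic position
        diffs.extend(b[1] - a[1] for a, b in zip(items, items[1:]))
    return statistics.stdev(diffs)
```

A smoothly varying log-ratio profile gives nearly identical differences and a spread close to 0, while probe-to-probe jitter inflates it, matching the "less noise" reading in the text.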
153. …when OLAutoCompute is turned on.
Applies to background; specifies the variance due to the Poisson-distributed noise; automatically calculated when OLAutoCompute is turned on.
Table 21 Stats results contained in the text output file (STATS table, continued)
Stats Green Channel | Stats Red Channel | Type | Description
gOutlierFlagger_Auto_BgndCTerm | rOutlierFlagger_Auto_BgndCTerm | float | Applies to background; specifies the variance due to background noise of the scanner, slide glass, and other signal-independent sources; automatically calculated when OLAutoCompute is turned on.
OutlierFlagger_FeatChiSq | | float | Confidence interval for the feature.
OutlierFlagger_BgndChiSq | | float | Confidence interval for the background.
gXDRLowPMTSlope | rXDRLowPMTSlope | | The slope that is multiplied by the original low-intensity Mean Signal to get the XDR Mean Signal. Used in the linear equation relating the Mean or Median Signal in the low-intensity scan to the scaled intensity used in the combined XDR output.
gXDRLowPMTIntercept | rXDRLowPMTIntercept | | The intercept that is added to Slope × LowIntensityMeanSignal to get the XDR Mean Signal. Used in the linear equation relating the Mean or Median Signal in the low-intensity scan to the scaled intens
GriddingStatus | | integer |
NumGeneNonUnifOL | | integer |
TotalNumberOfReplicatedGenes | | integer |
154. …ProcessedSignal, CV (SD_ProcessedSignals / Avg_ProcessedSignals), and StdDev of LogProcessedSignals.
Agilent SpikeIns Signal Statistics (columns: Probe Name, Log Relative Conc, Median, CV, StdDev; only the first two are legible)
Probe Name | Log Relative Conc
E1A_r60_3 | 0.30
E1A_r60_a104 | 1.30
E1A_r60_a107 | 2.30
E1A_r60_a135 | 3.30
E1A_r60_a20 | 3.83
E1A_r60_a22 | 4.30
E1A_r60_a97 | 4.82
E1A_r60_n11 | 5.30
E1A_r60_n9 | 5.82
E1A_r60_1 | 6.30
Figure 39 1-color QC Report Agilent SpikeIns Signal Statistics
Spike-in Linearity Check for 2-color Gene Expression
Using the data calculated for the above table, the observed average log ratio is plotted against the expected log ratio for each of the spike-in probes. A linear regression analysis is done using these values, and the metrics are shown beneath the plot. A slope of 1, a y-intercept of 0, and an R² of 1 are the ideal for such a linear regression. A slope < 1 may indicate compression, such as from under-correcting for background. The regression coefficient R² reflects reproducibility. The standard deviation for each data point is shown on the plot by an error bar extending above and below the point.
Plot: Agilent SpikeIns Expected LogRatio Vs Observed LogRatio (x axis: Expected LogRatio; error bars: Standard Deviation of Log Ratio). Y-Intercept: 0.085; Slope: 0.954; R²: 0.986.
Figure 40 QC Report Agilent SpikeIn
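The linearity check above is an ordinary least-squares line of observed against expected log ratio, reported as slope, y-intercept, and R². A minimal sketch with a hypothetical function name; the per-point standard deviations (error bars) are not used in the fit here, which is an assumption, since the text does not say whether the regression is weighted.

```python
def linearity_check(expected, observed):
    """Least-squares fit observed = slope * expected + intercept, plus R^2."""
    n = len(expected)
    mean_x = sum(expected) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in expected)
    syy = sum((y - mean_y) ** 2 for y in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, observed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)  # squared correlation for a simple fit
    return slope, intercept, r_squared
```

Perfectly linear but compressed data (observed = 0.5 × expected) still give R² = 1 with a slope of 0.5, which is why the slope and R² are read separately: one measures compression, the other reproducibility.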
155. rProcessedSignal = rDyeNormSignal if rSurrogateUsed = 0; in this example, 49209.6 = 49209.6.
If a feature fails either or both of the criteria above, SurrogateUsed is a non-zero value and is calculated as follows, depending on the Significance test parameter chosen in the Compute Bkgd Bias and Error protocol step:
rSurrogateUsed = rBGSDUsed, if Use Pixel Statistics for Significance is selected.
rSurrogateUsed is derived from rAddError and rLinearDyeNormFactor, if Use Error Model for Significance is selected.
If a surrogate is used in the red channel (that is, rSurrogateUsed is non-zero), the red processed signal is the surrogate value multiplied by the dye normalization factors:
$$\text{rProcessedSignal} = \text{rSurrogateUsed} \times \text{rLinearDyeNormFactor} \times \text{rLowessDyeNormFactor} \quad \text{if } \text{rSurrogateUsed} \neq 0$$
The log ratio is the log of the red processed signal over the green processed signal:
$$\text{LogRatio} = \log_{10}\frac{\text{rProcessedSignal}}{\text{gProcessedSignal}} = \log_{10}\frac{49209.64}{45834.13} = 0.0308612$$
Note that the log ratio and p-value calculations differ depending on whether a surrogate is used in only one channel, in both channels, or in neither channel. If a feature uses a surrogate in only the red channel (Case 2 of Table 39) and the red surrogate value is not greater than the green processed signal, the p-value and error on the log ratio are calculated as usual, using equations 1 and 2 in Feature
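The processed-signal branch and the log ratio in the worked example above can be reproduced in a few lines. The function names are hypothetical, and the surrogate value is taken as an input rather than computed, because its exact formula depends on the significance option chosen.

```python
import math


def processed_signal(dye_norm_signal, surrogate_used, linear_dnf, lowess_dnf):
    """Dye-normalized signal, or surrogate x dye-norm factors when a surrogate is used."""
    if surrogate_used == 0:
        return dye_norm_signal
    return surrogate_used * linear_dnf * lowess_dnf


def log_ratio(r_processed, g_processed):
    """Base-10 log of red processed signal over green processed signal."""
    return math.log10(r_processed / g_processed)


print(round(log_ratio(49209.64, 45834.13), 7))  # 0.0308612
```

The base-10 logarithm is the one consistent with the guide's worked value; a natural log would give about 0.0711 instead.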
156. …MessageID="29"/>
<ResultMessages Status="Success" Message="Protocol in use CGH_107_Sep09" MessageID="30"/>
</ExtractionResults>
</Extraction>
</FEProjectResults>
</FeatureExtractionML>
Table 41 XML error codes
Error code | Error message | Type | Abort
2002 | Unable to load tiff image content | Memory | Yes
2000 | Insufficient memory | Memory | Yes
3000 | Grid is placed outside the scan | Gridding Failure | Yes
3000 | Found Feature num outside the Scan at xpos, ypos. Ignoring | Gridding Failure | No
3000 | Gridding Error: X location obtained for grid origin is invalid GridPlacement | Grid Metrics | Yes
3000 | Gridding Error: Y location obtained for grid origin is invalid GridPlacement | Grid Metrics | Yes
3000 | The grid may be placed incorrectly. The spot centroids are shifted relative to their nominal grid. | Grid Metrics | No
3000 | There are a large percentage of notfound features along one or more of the array edges. We recommend checking the QC Report, the image and the grid before using this data. | Grid Metrics | No
3000 | There is a large percentage of background non-uniform outliers. We recommend checking the QC Report, the image and the grid before using this data. | Grid Metrics | No
3000 | There are a large number of negative control outliers | Grid Metrics | No
157. …of inliers. A CV (percent coefficient of variation) of the background-corrected signal is calculated for each channel as the SD of the signals divided by the average of the signals. This calculation is done for each replicated probe, and the median of those CVs is reported in the table for each channel. SNP probes are not included.
Reproducibility: CV for Replicated Probes (Median CV Signal, inliers)
Signal | Non Control probes Red | Non Control probes Green | Agilent SpikeIns Red | Agilent SpikeIns Green
BGSubSignal | 15.05 | 13.48 | 10.21 | 10.57
ProcessedSignal | 7.39 | (illegible) | 4.44 | 5.54
Figure 32 QC Report Reproducibility
A lower median CV value indicates better reproducibility of signal across the microarray than a higher value.
Exclusion of dim probes
Feature Extraction calculates the Median CV using only those probes bright enough to be in the range where the noise is more proportional to signal. It excludes from the calculation any sequences for which Average BGSubSignal × Multiplicative error is less than the Additive error (the comparison also involves the Dye Norm Factor; for 1-color data, the Dye Norm Factor is 1).
A probe sequence has a CV calculated if the number of features that pass the filters (NonUniform and the signal filter described above) is greater than the minimum replicate number indicated in the protocol (QCMetrics_minReplicatePopulation). If the number of replicated sequences with enough inlier features is
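The reproducibility statistic above combines three rules: a per-sequence percent CV, a dim-probe exclusion, and a minimum replicate count. The sketch below applies them in that spirit. It assumes the replicate signals have already been filtered for NonUniform outliers and that `add_error` is already on the BGSubSignal scale (so the Dye Norm Factor adjustment, which is 1 for 1-color data, is not reproduced); the function name and input layout are hypothetical.

```python
import statistics


def median_cv_percent(replicates, mult_error, add_error, min_replicate_population):
    """Median percent CV across replicated probe sequences.

    replicates maps a probe sequence to its inlier BGSubSignal values
    (features already filtered for NonUniform outliers: an assumption).
    add_error is assumed to be expressed on the BGSubSignal scale already.
    """
    cvs = []
    for signals in replicates.values():
        if len(signals) <= min_replicate_population:
            continue  # needs more features than the protocol minimum
        avg = statistics.fmean(signals)
        if avg * mult_error < add_error:
            continue  # dim probe: noise not yet proportional to signal
        cvs.append(100.0 * statistics.stdev(signals) / avg)
    return statistics.median(cvs) if cvs else None
```

Returning `None` when no sequence qualifies is a design choice of this sketch; the guide's handling of that case is cut off in this excerpt.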
158. intercept by slope:
$$\text{slope} = \frac{\text{MedianLogProcSig}_{\text{HIGH}} - \text{MedianLogProcSig}_{\text{LOW}}}{\text{LogConc}_{\text{HIGH}} - \text{LogConc}_{\text{LOW}}}$$
$$\text{intercept} = \text{MedianLogProcSig}_{\text{HIGH}} - \text{slope} \times \text{LogConc}_{\text{HIGH}}$$
w is estimated by using the slope calculated above. Taking the derivative of F(x) at x0 gives
$$\left.\frac{dF}{dx}\right|_{x_0} = \frac{(\max - \min)\,w}{4}, \qquad \text{so} \qquad w = \frac{4 \times \text{slope}}{\max - \min}$$
After the estimates are complete, the data is fit and the parameters min, max, x0, and w are optimized with a parameterized curve-fitting routine called Levenberg-Marquardt, a standard technique documented in Numerical Recipes in C on pages 683-688.
After the curve fitting is done, the Low Relative Concentration is calculated as x0 - 2.3/w and the High Relative Concentration as x0 + 2.2/w. All the eQC points falling between x0 - 2.3/w and x0 + 2.2/w are then fit with a line, and the Slope and R-Squared values are reported.
All of the points with a concentration below the Low Concentration are used to calculate the SpikeIn Detection Limit. For each probe, the mean and standard deviation are calculated in linear BGSubSignal space, and then the average plus 1 standard deviation is calculated for each probe. The maximum of these values is converted to log10 space and reported as the SpikeIn Detection Limit.
Relation of curve fit calculations to statistics in table
In summary, Table 16 presents descriptions of the statistics in Figure 42, their definitions within the
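The SpikeIn Detection Limit step above is fully specified, so it can be sketched directly. The fitted Low Relative Concentration is taken as an input rather than re-deriving the Levenberg-Marquardt fit, and each probe is assumed to have at least two replicate signals so that a standard deviation exists; the function name and input layout are hypothetical.

```python
import math
import statistics


def spike_in_detection_limit(probes, low_log_conc):
    """log10 of max(mean + 1 SD) over probes below the low concentration.

    probes: iterable of (log_relative_conc, [linear BGSubSignal replicates]).
    low_log_conc: the fitted Low Relative Concentration (taken as given).
    """
    candidates = []
    for log_conc, signals in probes:
        if log_conc < low_log_conc:
            mean = statistics.fmean(signals)
            sd = statistics.stdev(signals)  # assumes >= 2 replicates per probe
            candidates.append(mean + sd)
    return math.log10(max(candidates))
```

With two probes below the cutoff whose mean + 1 SD are 12 and 22, the reported limit is log10(22), about 1.34; the mean and SD are taken in linear space and only the final maximum is log-transformed, as the text describes.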
159. where X_F is the mean signal of the feature (MeanSignal) and X_B is the background correction used for subtraction (BGUsed; see Table 34 on page 254); n_F and n_B are the numbers of inlier pixels in the feature and in the local background, respectively (for example, NumPix or BGNumPix); and σ_F² and σ_B² are the variances of the inlier pixels for the feature and background, respectively (derived from, for example, PixSDev or BGSDUsed). For n pixel intensities X_i:
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad (17)$$
$$\sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 \qquad (18)$$
and df is the degrees of freedom:
$$df = n_F + n_B - 2$$
After the p-value is calculated from the 2-sided t-test using the incomplete Beta function, it is compared to the user-defined max p-value. If the calculated p-value is less than the user-defined max p-value, the feature signal is considered significantly different from the background signal. If p-value(calculated) < p-value(max) and MeanSignal > BGUsed, the feature gets a Boolean flag of 1 in the IsPosAndSignif column of the Feature Extraction result file.
Significance based on additive error
The Error model significance also uses a Gaussian probability distribution for the calculation and tests whether a signal is greater than 0 with a known additive error. The probability is computed in a similar way to the Pixel Significance calculation, but instead of having a feature signal and a
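The IsPosAndSignif decision above can be sketched with a two-sided t-test whose p-value comes from the regularized incomplete Beta function, p = I_x(df/2, 1/2) with x = df / (df + t²). The t-statistic form t = (X_F - X_B) / sqrt(σ_F²/n_F + σ_B²/n_B) is an assumption, since that equation is not recoverable in this excerpt; df = n_F + n_B - 2 is from the text. The incomplete Beta is the standard continued-fraction evaluation (as in Numerical Recipes), and all function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def two_sided_t_pvalue(t, df):
    return regularized_incomplete_beta(0.5 * df, 0.5, df / (df + t * t))


def is_pos_and_signif(feat_mean, feat_sd, n_feat, bg_mean, bg_sd, n_bg, max_p):
    """Return (flag, p); flag is 1 when p < max_p and the feature exceeds background."""
    se = math.sqrt(feat_sd ** 2 / n_feat + bg_sd ** 2 / n_bg)  # assumed form
    t = (feat_mean - bg_mean) / se
    p = two_sided_t_pvalue(t, n_feat + n_bg - 2)
    return int(p < max_p and feat_mean > bg_mean), p
```

As a sanity check, t = 2.228 with 10 degrees of freedom gives p close to 0.05, the familiar two-sided critical value.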
160. Warning code | Warning message | Resolution
… | Cannot make a linear fit of the data. Setting the fit statistics to 0. | …is another problem with the data; the probes measured are either in the noise or saturated.
1042 | This CGH design has no systematic name defined; cannot calculate derivative of the log ratio SD. | This is a design file problem. The systematic name for CGH arrays needs to have the chromosome coordinates defined to compute the DLRSD metric.
1043 | No Spike-in probes found in this Array. Setting the protocol's parameter UseSpikeIns to false. | The design in use has no spike-ins defined. Can be ignored, or you can create a special protocol just turning off Spike-ins.
1044 | Ratio Warning: Detected a negative or zero propagated variance on the log ratio. Check the log file for more details. | Indicates a data problem. This array should be looked at.
1045 | The AutoFocus was suspended for an extended period of time during the scan (xxx.xx). | Rescan the array. Inspect the surface of the slide for contamination, and make sure that the scan region does not overlap the barcode or other non-transparent areas of the slide. Check the scan image for anomalies, and then rescan.
1046, 1047, 1048, 1050, 1300 | The AutoFocus was suspended during the scan for xxx.xx of time, a longer period than the threshold xxx.xx. | Inspect the surface of the
161. Command Line Feature Extraction
This chapter contains the commands and arguments for integrating Feature Extraction into a completely automated workflow.
Acknowledgments
Apache acknowledgment: Part of this software is based on the Xerces XML parser. Copyright (c) 1999-2000 The Apache Software Foundation. All Rights Reserved. www.apache.org
JPEG acknowledgment: This software is based in part on the work of the Independent JPEG Group. Copyright (c) 1991-1998 Thomas G. Lane. All Rights Reserved.
Loess (Netlib) acknowledgment: Part of this software is based on a Loess/Lowess algorithm and implementation. The authors of Loess/Lowess are Cleveland, Grosse, and Shyu. Copyright (c) 1989, 1992 by AT&T. Permission to use, copy, modify, and distribute this software for any purpose without fee is hereby granted, provided that this entire notice is included in all copies of any software which is or includes a copy or modification of this software and in all copies of the supporting documentation for such software. THIS SOFTWARE IS BEING PROVIDED "AS IS", WITHOUT ANY EXPRESS OR IMPLIED WARRANTY. NEITHER THE AUTHORS NOR AT&T MAKE ANY REPRESENTATION OR WARRANTY OF ANY KIND CONCERNING THE MERCHANTABILITY OF THIS SOFTWARE OR ITS FITNESS FOR ANY PARTICULAR PURPOSE.
Stanford University School of Medicine acknowledgment: Non-Agilent microarray image courtesy of Dr. Roger Wagner, Division of Cardiovascular Medicine, Stanford Uni
162. …every probe i:
$$\text{LogRatio}_i = \log_{10}\frac{\text{ProcessedSignal}_{R,i}}{\text{ProcessedSignal}_{G,i}} \qquad (30)$$
where ProcessedSignal_R and ProcessedSignal_G are the signals after dye normalization and surrogate processing in the red and green channels, respectively.
Step 27. Calculate the p-value and error on the log ratio of each feature (PValueLogRatio and LogRatioError)
PValueLogRatio gives the statistical significance of the log ratio between the red and green channels for each feature (for example, a gene). The p-value is a measure of the confidence, viewed as a probability, that the feature is not differentially expressed. For example, if the p-value is less than 0.01, we can say with a 99% confidence level that the gene is differentially expressed. In other words, there would be a 1% random chance of getting a p-value this low for a gene that is actually not differentially expressed.
$$p\text{-value} = 1 - \operatorname{Erf}(x_{\text{dev}}) = \operatorname{Erfc}(x_{\text{dev}}) \qquad (31)$$
where
$$\operatorname{Erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt \qquad (32)$$
For more details on calculations with the Universal Error Model or with the propagation error model, see the confidential Agilent technical paper on error modeling.
Erf(x) is the error function of x, as given by the equation above. It is twice the integral, from 0 to x, of the Gaussian distribution with mean 0 and variance 1/2. Erfc i
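Equations 31 and 32 can be checked numerically: integrating (2/√π)e^(-t²) with Simpson's rule should agree with Python's built-in `math.erf`, and the p-value is then simply `math.erfc`. The deviation x_dev is taken as an input here because its definition falls outside this excerpt; the function names are hypothetical.

```python
import math


def erf_by_integration(x, steps=1000):
    """Composite Simpson's rule for (2/sqrt(pi)) * integral_0^x exp(-t^2) dt."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = x / steps
    total = 1.0 + math.exp(-x * x)  # f(0) + f(x)
    for k in range(1, steps):
        t = k * h
        total += (4 if k % 2 else 2) * math.exp(-t * t)
    return (2.0 / math.sqrt(math.pi)) * total * h / 3.0


def p_value_log_ratio(x_dev):
    """Equation 31: p = 1 - Erf(x_dev) = Erfc(x_dev)."""
    return math.erfc(x_dev)
```

Using `math.erfc` rather than `1 - math.erf` avoids cancellation error when the p-value is very small.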
163. Features (Green) | Type | Description
gTotalProbeError | float | This error is the robust average of all the processed green signal errors for each replicated probe, multiplied by the total number of probe replicates, the EffectiveFeatureSizeFraction, the Nominal Spot Area, and the Weight. For miRNA analyses.
gTotalGeneSignal | float | This signal is the sum of the total probe signals in the green channel per gene. For miRNA analyses.
gTotalGeneError | float | This error is the square root of the sum of the squares of the TotalProbeError values. For miRNA analyses.
gIsGeneDetected | boolean | Indicates whether the gene was detected on the miRNA microarray.
MINIMAL Features Table
Table 25 Feature results contained in the MINIMAL output text file (MINIMAL FEATURES table)
Features (Green) | Features (Red) | Type | Options | Description
FeatureNum | | integer | | Feature number
Row | | integer | | Feature location row
Col | | integer | | Feature location column
ControlType | | integer | 0 = none (not a control); 1 = Positive control; -1 = Negative control; 20000: see Ch 4 for definition; 30000 = Ignore, see Ch 4 for definition | Feature control type. See XML Control Type output on page 220 for definitions.
ProbeName | | text | | An Agilent-assigned identifier for the probe synthesized on the microarray.
SystematicName | | text | | An identifier for the target sequence that the probe was designed to hybridize w
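The gene-level aggregation in the rows above (sum of probe signals; square root of the sum of squared probe errors) can be sketched as follows. The input layout and the gene names in the usage note are hypothetical; the TotalProbeSignal and TotalProbeError values are taken as already computed.

```python
import math
from collections import defaultdict


def gene_totals(probe_rows):
    """Map gene -> (TotalGeneSignal, TotalGeneError).

    probe_rows: iterable of (gene_name, total_probe_signal, total_probe_error).
    """
    signal = defaultdict(float)
    squared_error = defaultdict(float)
    for gene, probe_signal, probe_error in probe_rows:
        signal[gene] += probe_signal              # sum of total probe signals
        squared_error[gene] += probe_error ** 2   # errors combine in quadrature
    return {g: (signal[g], math.sqrt(squared_error[g])) for g in signal}
```

For example, two probes of one gene with signals 10 and 20 and errors 3 and 4 give a TotalGeneSignal of 30 and a TotalGeneError of 5.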
164. …these numbers, you can see the mean signal distribution for the local background regions (BGMeanSignal) after outliers have been removed. This information can help you detect hybridization and wash artifacts, which can be a component of noise in the low signal range. SNP probes are included.
Local Bkg (inliers), Red: 22105 | 49.77 | 0.93
Figure 25 QC Report Local Background Inliers
Foreground Surface Fit
See Step 13, Perform background spatial detrending to fit a surface, on page 256 of this guide for more information about these calculations.
Spatial Detrend attempts to account for low-signal background that is present on the feature foreground and varies across the microarray. SNP probes are not included.
• A high RMS_Fit number can indicate gradients in the low signal range before detrending.
• RMS_Resid indicates residual noise after detrending.
• AvgFit indicates how much signal is in the foreground. A higher AvgFit number indicates that a larger amount of signal was detected by the detrend algorithm and removed. This value may include the scanner offset unless a background method has been used before detrending. The value may not include higher-frequency background signals, which are best removed by using the Local Background Method before the detrending algorithm.
Foreground Surface Fit (Red)
165. …threshold.
Reports whether the feature signal value comes from the scaled-up low-signal image or from the high-signal image.
The same concept as above, but for background.
Boolean flag indicating whether a feature is a NonUniformity Outlier. A feature is non-uniform if the pixel noise of the feature exceeds a threshold established for a uniform feature.
Features (Green) | Features (Red) | Type | Options | Description
gIsBGNonUnifOL | rIsBGNonUnifOL | boolean | 1 indicates the local background is a non-uniformity outlier in g/r | The same concept as above, but for background.
gIsFeatPopnOL | rIsFeatPopnOL | boolean | 1 indicates the feature is a population outlier in g/r | Boolean flag indicating whether a feature is a Population Outlier. Probes with replicate features on a microarray are examined using population statistics. A feature is a population outlier if its signal is less than a lower threshold or exceeds an upper threshold determined using a multiplier (1.42) times the interquartile range (IQR) of the population.
gIsBGPopnOL | rIsBGPopnOL | boolean | 1 indicates the local background is a population outlier in g/r | The same concept as above, but for background.
IsManualFlag | | boolean | | Flags features for downstream filtering in third-party gene expression software.
gBGSubSignal | rBGSubSignal | float | | g/r BGSubSignal: Backgro
166. …expressed in microns. This represents the thickness of the microarray slide as measured during autofocus homing. With standard Agilent slides, the values range from 900 to 1000. Nominal values for non-Agilent slides are specified between 900 and 1100 for C scanners and between 900 and 1200 for B scanners.
Restriction control probes are a set of probes spanning cut sites that are not variant in samples. If the protocol is followed correctly, these probes should always give 0 signal. The final restriction control value is the minimum of the red-channel and green-channel restriction control values. If restriction control probes are not present in the design, the RestrictionControl value is set to 1.
Direction Dependent Noise (DDN) during scanning: In the single-pass scanning mode available in some Agilent scanner software, the average background signal on an even scan line differs from that on an odd scan line. During postprocessing, the scanner control software finds the DDN, the difference between the two directions averaged over the entire scan, as the even-line average minus the odd-line average. A positive DDN value means the even-line average is greater than the odd-line average, and a negative DDN means the even-line average is less than the odd-line average. The DDN values are written to the image file header. These stat values are not given for images that do not have DDN information.
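Both statistics above reduce to simple arithmetic. In this sketch, DDN is the mean background over even scan lines minus the mean over odd scan lines, and RestrictionControl is the minimum of the two channels, falling back to 1 when the design has no restriction control probes. Counting scan lines from 0 is an assumption (the text does not say whether the first line is "even"), and both function names are hypothetical.

```python
import statistics


def direction_dependent_noise(background_lines):
    """Even-line average minus odd-line average of background signal.

    background_lines: one list of background values per scan line, in scan
    order. Counting lines from 0 is an assumption of this sketch.
    """
    even = [v for i, line in enumerate(background_lines) if i % 2 == 0 for v in line]
    odd = [v for i, line in enumerate(background_lines) if i % 2 == 1 for v in line]
    return statistics.fmean(even) - statistics.fmean(odd)


def restriction_control(red_value, green_value, design_has_restriction_probes=True):
    """Minimum of the two channels; 1 when the design has no such probes."""
    if not design_has_restriction_probes:
        return 1.0
    return min(red_value, green_value)
```

A scan whose even lines average 10 counts and odd lines 8 counts yields a DDN of +2, meaning the even-line average is higher.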
167. …results 159
text file: feature results 178; parameters 127; statistical results 159
text file results 127
TIFF file format options 222
TIFF results 222
U
up and down regulated features, spatial distribution 100
www.agilent.com
In This Book
The Reference Guide presents descriptions of the protocols or methods available for use with Agilent Feature Extraction 12.0, as well as a listing of results and an explanation of how the Feature Extraction algorithms work.
This guide provides:
• a list of the default settings for each protocol shipped or downloaded with the software
• a list of all the parameters and results available after feature extraction
• the equations and a sample calculation for the feature extraction process
© Agilent Technologies, Inc. 2015
Revision A2, August 2015
G4460-90052
Agilent Technologies
168. Step 16. Determine the error in the signal calculation
This step calculates the error on the background-subtracted and detrended signal. For the error calculation, you can select either the Universal Error Model or the model (Universal or propagated) that produces the largest, most conservative estimate of the error.
The Feature Extraction program dynamically computes an approximation of the additive terms in both the red and green channels for the Universal Error Model. The estimate of the dynamic additive error term for each channel (red or green; for 1-color gene expression, the green channel) is based on the following equation:
$$\text{AddError} = \sqrt{\left(m_1\,\sigma_{\text{NegCtrl}}\right)^2 + \left(m_2\,\text{DNF}\cdot\text{RMSFit}\right)^2 + \left(m_3\,\text{DNF}\cdot\text{residual}\right)^2}$$
where m_1 = MultNcAutoEstimate, m_2 = MultRMSAutoEstimate, m_3 = MultResidualRMSAutoEstimate, σ_NegCtrl is the standard deviation of the negative controls, DNF = LinearDyeNormFactor of the corresponding channel, and residual is the residual of the 2D Loess fit. The RMSFit term drops out of the equation for microarrays with fewer than 5000 features.
Because the Additive Error is now calculated in the Compute Background Bias and Error section, the DNF is 1, and the variance of the negative controls is not scaled for the DNF either. This scaling is applied to the AdditiveError after DyeNorm is completed.
For definitions of non-uniform and population outliers, see the Feature Extraction 10.9 User Guide.
metrics with GE2 metric set with thresholds added. Detrending turned off.

QC metric set results: miRNA spike-in analysis
Figure 17 is an example of a QC report header and Evaluation Metrics table generated from a 1-color extraction to which the miRNA metric set with thresholds had been added. In this extraction the default protocol settings were used. Note that not all metric values are within the default thresholds.
For details on how the miRNA spike-in statistics and metrics are calculated, see "MicroRNA Analysis" on page 283.

Figure 17 content (QC Report, Agilent Technologies, miRNA):
Date: Thursday, November 17, 2011 08:35; Grid: 019118_D_F_20110707; Image: US23502302_25191; BG Method: No Background; Protocol: miRNA_1100_Jul11 (Read Only); Background Detrend: On (FeatNCRange, LoPass); User Name: KM1; Multiplicative Detrend: False; FE Version: 11.0.0.6; Additive Error: 3; Sample (red/green); Saturation Value: 65524
Evaluation Metrics for miRNA_QCMT_Jul11: Excellent 3, Good 3, Evaluate 1
Metric name: value (Excellent; Good; Evaluate)
IsGoodGrid: 1.00 (>1; NA; <1)
AddErrorEstimateGreen: 2.56 (<5; 5 to 12; >12)
AnyColorPrcntFeatPopnOL: 5.67 (<8; 8 to 15; >15)
gNonCtrlMedPrcntCVBGSu: 10.88 (0 to 10; 10 to 15; <0 or >15)
gTotalSignal75pctile: 65.48
LabelingSpike-InSignal: 3.74 (thresholds: >2.50; <2.50)
HybSpike-InSignal: 2.31 (thresholds: >2.50; <2.50)
StringencySpike-InRatio: 1.43
gDDN: 7.00 (thresholds: -15 to 15; <-15 or >15)
rDDN: 15
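The Excellent/Good/Evaluate grading in the metrics table can be mimicked with a small lookup of threshold tests. This Python sketch covers only the two metrics whose three-way thresholds are fully legible above; treating the "5 to 12" and "8 to 15" ranges as inclusive is an assumption, and the dictionary and function names are hypothetical.

```python
# Hypothetical threshold table: (is_excellent, is_good) predicates per metric.
# Values outside both ranges are graded "Evaluate".
THRESHOLDS = {
    "AddErrorEstimateGreen": (lambda v: v < 5, lambda v: 5 <= v <= 12),
    "AnyColorPrcntFeatPopnOL": (lambda v: v < 8, lambda v: 8 <= v <= 15),
}


def grade(metric, value):
    """Return "Excellent", "Good", or "Evaluate" for a QC metric value."""
    is_excellent, is_good = THRESHOLDS[metric]
    if is_excellent(value):
        return "Excellent"
    if is_good(value):
        return "Good"
    return "Evaluate"


print(grade("AddErrorEstimateGreen", 2.56))    # Excellent
print(grade("AnyColorPrcntFeatPopnOL", 5.67))  # Excellent
```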
Spatial Distribution of Median Signals for each Row and Column
Higher-frequency noise is shown in these plots so that you can distinguish a low-frequency trend from the high-frequency noise.
The first of these graphs plots the median Processed Signal and median BGSub Signal for each row over all columns of a 1-color GE microarray. The second plots the same signals for each column over all rows of the 1-color GE microarray. The difference between the Processed Signal and the BGSub Signal represents the effect of multiplicative detrending; the Processed Signal should look flatter.
[Plots: Spatial Distribution of Median Signals for each Row (x axis: Row; y axis: Median Signal; series: Median BGSub Signal for Row, Median Proc Signal for Row) and Spatial Distribution of Median Signals for each Column (x axis: Column; y axis: Median Signal; series: Median BGSub Signal for Column, Median Proc Signal for Column)]
Figure 30 1-color QC Report: Median Signal Spatial Distribution

Histogram of LogRatio plot
This plot shows the log ratio distribution, displaying the log ratios versus the number of probes. It is included only in the CGH_ChIP report, which is the default report for the ChIP_<revision>_<date> protocol.
[Plot: Histogram of LogRatio (x axis: log ratio; y axis: number of probes)]
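The row and column plots reduce a 2D signal grid to one median per row (over all columns) and one median per column (over all rows). This NumPy sketch shows that reduction on a hypothetical array laid out as rows by columns; it is an illustration, not the report generator's code.

```python
import numpy as np


def row_and_column_medians(signal):
    """Median signal for each row (over columns) and each column (over rows).

    signal: 2D array indexed as [row, column], e.g. BGSub or Processed Signal.
    """
    signal = np.asarray(signal, dtype=float)
    row_medians = np.median(signal, axis=1)     # one value per row
    column_medians = np.median(signal, axis=0)  # one value per column
    return row_medians, column_medians


rows, cols = row_and_column_medians([[1, 2, 3], [4, 5, 6]])
print(rows)  # [2. 5.]
print(cols)  # [2.5 3.5 4.5]
```

Computing the same medians for the BGSub Signal and the Processed Signal and comparing their spread is one way to see numerically whether the processed curve is flatter.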
of the microarray.
If error model significance is used to calculate IsPosAndSignif, then

$$ SurrogateUsed = AddError \times LinearDyeNormFactor \tag{20} $$

where AddError is the additive error from the error model calculation. If multiplicative detrending is used, the SurrogateUsed is scaled by the MultDetrendSignal for each feature. If a p-value other than the default (0.01) is chosen in the protocol, the SurrogateUsed is adjusted accordingly.

Step 20. Perform multiplicative detrending
Multiplicative detrending is an algorithm designed to compensate for slight linear variations in intensity that can occur if processing is not homogeneous across the slide. Non-homogeneous processing produces, for example, different chemical reaction times between the sides and the center of the slide, which results in a dome effect. With 2-color microarrays these dome effects are the same in each channel and for the most part cancel out during the calculations. However, Agilent has found multiplicative detrending to be useful for all microarrays, and it is turned on in all protocols except the GE2 nonAT_95 protocol.
This algorithm corrects the data by fitting a smoothed surface, via a second-degree polynomial fit, to the higher signals on the microarray after outliers are rejected. This is shown in the following illustration.
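To make the surface-fitting idea concrete, here is a NumPy sketch of a least-squares second-degree polynomial surface fit over feature coordinates. The `fit_mask` argument stands in for "the higher signals after outliers are rejected"; how Feature Extraction selects those features is not reproduced. Dividing each signal by the surface normalized to its mean is an assumption made for illustration, and all names are hypothetical.

```python
import numpy as np


def _design_matrix(x, y):
    """Columns of a full second-degree polynomial in x and y."""
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])


def fit_quadratic_surface(x, y, z):
    """Least-squares fit of z = c0 + c1*x + c2*y + c3*x^2 + c4*x*y + c5*y^2."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    coeffs, *_ = np.linalg.lstsq(_design_matrix(x, y), z, rcond=None)
    return coeffs


def multiplicative_detrend(x, y, signal, fit_mask):
    """Fit the surface on the selected features, then divide every signal by it.

    Assumption: the surface is normalized to its mean before dividing, so the
    overall signal level is preserved.
    """
    x, y, signal = (np.asarray(a, dtype=float) for a in (x, y, signal))
    fit_mask = np.asarray(fit_mask, dtype=bool)
    coeffs = fit_quadratic_surface(x[fit_mask], y[fit_mask], signal[fit_mask])
    surface = _design_matrix(x, y) @ coeffs
    return signal / (surface / surface.mean())
```

A dome-shaped signal that is exactly quadratic is flattened to a constant by this sketch, which is the intended effect of the correction.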
for a particular purpose. Agilent shall not be liable for errors or for incidental or consequential damages in connection with the furnishing, use, or performance of this document or of any information contained herein. Should Agilent and the user have a separate written agreement with warranty terms covering the material in this document that conflict with these terms, the warranty terms in the separate agreement shall control.

Technology Licenses
The hardware and/or software described in this document are furnished under a license and may be used or copied only in accordance with the terms of such license.

Restricted Rights Legend
U.S. Government Restricted Rights. Software and technical data rights granted to the federal government include only those rights customarily provided to end user customers. Agilent provides this customary commercial license in Software and technical data pursuant to FAR 12.211 (Technical Data) and 12.212 (Computer Software) and, for the Department of Defense, DFARS 252.227-7015 (Technical Data - Commercial Items) and DFARS 227.7202-3 (Rights in Commercial Computer Software or Computer Software Documentation).

Safety Notices
CAUTION
A CAUTION notice denotes a hazard. It calls attention to an operating procedure, practice, or the like that, if not correctly performed or adhered to, could result in damage to the product or loss of important data. Do not proceed beyond a CAU
for the analysis.

Calculate Metrics
These algorithms calculate all the QC metrics for the analysis. One of the primary algorithms in this step is the gridding test, whose parameter values are hidden in the protocol. This algorithm produces the grid warnings in the Summary Reports and the Evaluate Grid warning in the QC Report. Agilent has added many more tests to assess whether gridding has been successful. Protocols for Agilent arrays also have associated QC metric sets, and these metrics are calculated at this step. Agilent miRNA protocols also have specialized metrics calculated at this step.

Generate Results
This part of the process generates the output result files, using the parameter values specified in the protocol step and the selections made in the Project Properties window. This step is not discussed in this chapter.

Algorithms and results they produce
Table 32 summarizes the results for each algorithm (protocol step). These result names are used in the equations for the calculations for each algorithm.
Table 32 Algorithms (Protocol Steps) and the results they produce
Find Spots, MeanSignal: Average raw signal of the feature, calculated from the intensities of all inlier pixels that represent the feature after outlier pixel rejection. The number of inlier pixels is shown in the column NumPix.
Find Spots, MedianSignal: Median raw signal
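The MeanSignal, MedianSignal and NumPix definitions in Table 32 all operate on the inlier pixels left after outlier pixel rejection. This Python sketch computes them from a hypothetical per-feature pixel array and a boolean inlier mask; the rejection method itself is not reproduced.

```python
import numpy as np


def feature_signal_stats(pixels, inlier_mask):
    """Mean, median and count of the inlier pixels of one feature.

    pixels:      raw pixel intensities belonging to the feature
    inlier_mask: True for pixels that survived outlier pixel rejection
    """
    pixels = np.asarray(pixels, dtype=float)
    inliers = pixels[np.asarray(inlier_mask, dtype=bool)]
    return {
        "MeanSignal": float(inliers.mean()),
        "MedianSignal": float(np.median(inliers)),
        "NumPix": int(inliers.size),
    }


# The 1000-count pixel is assumed to have been rejected as an outlier.
print(feature_signal_stats([10, 12, 14, 1000], [True, True, True, False]))
```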
for the third term in the additive error equation. This parameter is for single-density 8-pack microarrays, where Feature Extraction may not be able to accurately subtract the background using the spatial detrending method. It provides the minimum number of features needed for the software to use the residual or the RMS to estimate the additive error. It appears only if you are using low-density 8-pack microarrays.
Flag indicating the use of surrogates (options: use of surrogates turned on; use of surrogates turned off).

Table 17 List of parameters and options contained within the FULL text output file (FEPARAMS table)
Columns: Protocol Step, Parameter, Type, Options, Description
Compute Bkgd, Bias and Error, BGSubtractor_Version (text): Version of the BGSubtractor algorithm.
Correct Dye Biases, DyeNorm_Version (text): Version of the DyeNorm algorithm.
Correct Dye Biases, DyeNorm_UseDyeNormList (integer): 0 = Automatically determine; 1 = True; 2 = False.
Correct Dye Biases, DyeNorm_SelectMethod (integer): Method for selecting the features used to measure dye bias. 4 = Use All Probes; 5 = Use List of Normalization Genes; 6 = Use Rank Consistent Probes; 7 = Use Rank Consistent List of Normalization Genes.
Correct Dye Biases, DyeNorm_ArePosNegCtrlsOK (integer): 1 = True: use positive and negative controls for dye normalization; 0 = False: do not use these controls.
Correct Dye Biases, DyeNorm_SignalCharacteristics
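When reading FEPARAMS values back from an output file, the integer codes above can be translated into their documented meanings. This Python sketch decodes the three DyeNorm parameters listed in Table 17; the input is assumed to be a dict of raw string values keyed by parameter name, and the helper names are hypothetical.

```python
# Option codes copied from Table 17; the lookup structure itself is illustrative.
DYENORM_OPTIONS = {
    "DyeNorm_UseDyeNormList": {
        0: "Automatically determine", 1: "True", 2: "False",
    },
    "DyeNorm_SelectMethod": {
        4: "Use All Probes",
        5: "Use List of Normalization Genes",
        6: "Use Rank Consistent Probes",
        7: "Use Rank Consistent List of Normalization Genes",
    },
    "DyeNorm_ArePosNegCtrlsOK": {
        1: "True: use positive and negative controls",
        0: "False: do not use these controls",
    },
}


def describe_dyenorm(params):
    """Map raw FEPARAMS values (strings or ints) to their option text."""
    described = {}
    for name, raw in params.items():
        options = DYENORM_OPTIONS.get(name)
        if options is None:
            continue  # not a coded DyeNorm parameter
        described[name] = options.get(int(raw), f"unknown code {raw}")
    return described


print(describe_dyenorm({"DyeNorm_SelectMethod": "6", "DyeNorm_Version": "x"}))
```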
formation might look like, presented in red after an extraction set is run. Each of these messages is associated with an extraction set that has been run.

<FeatureExtractionML>
  <FEPMLVerInfo VerMaj="2" VerMin="50"/>
  <FEProject Operator="Unknown">
    <Extraction Name="SinglePack">
      <XDRScanID Name=""/>
      <Image Name="C:\Images\SinglePack.tif"/>
      <Grid IsGridFile="False" Name="014077_D_20051222"/>
      <Protocol Name="CGH_107_Sep09_2"/>
      <GridFile Path=""/>
      <FeatFile Path=""/>
      <ShapeFile Path=""/>
      <Arrays>
        <Array ID="251407710012">
          <SampleId Name=""/>
          <JpegFile Path=""/>
          <TextFile Path="C:\Images\SinglePack_CGH_107_Sep09.txt"/>
          <QCReport Path="C:\Images\SinglePack_CGH_107_Sep09.pdf"/>
          <MAGEML Path=""/>
          <Result Status="Warning">   (the overall result of the array)
            <ResultMessages Status="Success">

All result messages in the Result entity are array-level messages. These are the same messages that show up in the batch Run Summary. Each message has a message ID associated with it. If the message is an Error or a Warning, the message ID indicates the type of failure or the module in which the failure occurred. The errors and warnings are summarized in the tables at the end of this chapter.
The entire stats table is output. We included
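To collect the overall array result from such a file programmatically, the standard-library ElementTree parser is enough. The sample below is a hypothetical, simplified and well-formed document that keeps only the Array and Result elements shown above; the exact nesting of a real FeatureExtractionML file is an assumption, so the sketch searches for Array elements anywhere in the tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample; not a complete FeatureExtractionML file.
SAMPLE = """<FeatureExtractionML>
  <FEProject Operator="Unknown">
    <Arrays>
      <Array ID="251407710012">
        <Result Status="Warning"/>
      </Array>
    </Arrays>
  </FEProject>
</FeatureExtractionML>"""


def array_results(xml_text):
    """Return {array ID: overall Result Status} for every Array element."""
    root = ET.fromstring(xml_text)
    results = {}
    for array in root.iter("Array"):
        result = array.find(".//Result")
        results[array.get("ID")] = None if result is None else result.get("Status")
    return results


print(array_results(SAMPLE))  # {'251407710012': 'Warning'}
```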
g rows for the default values for finding spots.
Array Format: Automatically Determine. Recognized formats: Single Density (11k, 22k, 25k), Double Density (44k, 95k), 185k, 185k 10 uM, 244k 10 uM, 65 micron feature size, 30 micron feature size, and Third Party.
Use the Nominal Diameter from the Grid Template: Hidden if Array Format is set to Automatically Determine. True (all formats).
Spot Deviation Limit: Hidden if Array Format is set to Automatically Determine. 8.0 for all formats except third party, for which it is set to 1.5.
Calculation of Spot Statistics Method: Hidden if Array Format is set to Automatically Determine. Use Cookie (all formats).
Cookie Percentage: Hidden if Array Format is set to Automatically Determine. 0.650 (Single Density 25k); 0.561 (Double Density 95k); 0.700 (185k, 185k 10 uM, 244k 10 uM, 65 micron feature size); 0.750 (30 micron feature size).
Exclusion Zone Percentage: Hidden if Array Format is set to Automatically Determine. 1.200 (all formats except 30 micron feature size); 1.300 (30 micron feature size).
Auto Estimate the Local Radius: Hidden if Array Format is set to Automatically Determine. True (Single Density 25k, Double Density 95k)

Table 2 Default settings for CGH_1200_Jun14 protocol (continued)
Columns: Protocol step, Parameter, Default Setting/Value (v12.0)
LocalBGRadius
Pixel Outlier Rejection Me
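The format-dependent Find Spots defaults above amount to a small lookup keyed by array format. This Python sketch encodes the values listed for CGH_1200_Jun14; the format strings and result keys are hypothetical labels chosen for the example, and formats without a listed cookie percentage return None rather than a guessed value.

```python
# Cookie percentages as listed in Table 2; keys are illustrative format labels.
COOKIE_PERCENTAGE = {
    "Single Density 25k": 0.650,
    "Double Density 95k": 0.561,
    "185k": 0.700,
    "185k 10 uM": 0.700,
    "244k 10 uM": 0.700,
    "65 micron feature size": 0.700,
    "30 micron feature size": 0.750,
}


def find_spots_defaults(array_format):
    """Find Spots defaults for one array format (CGH_1200_Jun14 values)."""
    return {
        "UseNominalDiameter": True,
        "SpotDeviationLimit": 1.5 if array_format == "Third Party" else 8.0,
        "SpotStatisticsMethod": "Use Cookie",
        "CookiePercentage": COOKIE_PERCENTAGE.get(array_format),
        "ExclusionZonePercentage": (
            1.300 if array_format == "30 micron feature size" else 1.200
        ),
    }


print(find_spots_defaults("30 micron feature size"))
```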
Columns: Stats Green Channel, Stats Red Channel, Type, Description
gMedPrcntCVProcSignal / rMedPrcntCVProcSignal (float): The median CV for replicated non-control probes, using the processed signal. This value is obtained by calculating the average, SD, and CV of the processed signal of each replicated probe. For non-control replicated probes there must be at least 10 CVs from which to calculate a median; otherwise -1 is reported. The MedPrcntCVProcSignal and the MedPrcntCVBGSubSignal show whether multiplicative detrending is having a positive effect on the data: if multiplicative detrending is helping, MedPrcntCVProcSignal should be smaller than MedPrcntCVBGSubSignal.
geQCMedPrcntCVProcSignal / reQCMedPrcntCVProcSignal (float): The same as MedPrcntCVProcSignal, except that it is calculated using the eQC spike-in replicates rather than the non-control replicates. There must be at least 3 CVs from which to calculate a median.
gOutlierFlagger_Auto_FeatBTerm / rOutlierFlagger_Auto_FeatBTerm (float): Applies to feature; specifies the variance due to Poisson-distributed noise, automatically calculated when OLAutoCompute is turned on.
gOutlierFlagger_Auto_FeatCTerm / rOutlierFlagger_Auto_FeatCTerm (float): Applies to feature; specifies the variance due to background noise of the scanner, slide glass, and other signal-independent sources, automatically calculat
gOutlierFlagger_Auto_BgndBTerm / rOutlierFlagger_Auto_BgndBTerm (float)
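The MedPrcntCV statistics can be sketched as: compute a percent CV for each replicated probe, then take the median, reporting -1 when too few CVs are available (10 for non-control replicates, 3 for eQC spike-in replicates). This Python sketch assumes each replicate group is a list of processed signals and uses the sample standard deviation; both choices are assumptions, as are the names.

```python
import statistics


def med_percent_cv(replicate_groups, min_cvs=10):
    """Median percent CV over replicate groups, or -1.0 if too few CVs.

    replicate_groups: iterable of lists of processed signals, one list per
                      replicated probe.
    min_cvs:          10 for non-control replicates, 3 for eQC replicates.
    """
    cvs = []
    for signals in replicate_groups:
        if len(signals) < 2:
            continue  # a CV needs at least two replicates
        mean = statistics.fmean(signals)
        if mean == 0:
            continue  # CV undefined for a zero mean
        cvs.append(100.0 * statistics.stdev(signals) / mean)
    if len(cvs) < min_cvs:
        return -1.0
    return statistics.median(cvs)


groups = [[9.0, 11.0]] * 10
print(med_percent_cv(groups))      # about 14.14
print(med_percent_cv(groups[:5]))  # -1.0 (fewer than 10 CVs)
```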
channels. The new signals after correction, R' and G', are obtained by transforming the original R and G:
d. If the original log ratio lies exactly on the fit line Mfit, the new log ratio is shifted to zero. If log(R/G) = Mfit, then

$$ \log R' - \log G' = \log \frac{R}{10^{M_{fit}/2}} - \log \left( G \cdot 10^{M_{fit}/2} \right) = \left( \log R - \frac{M_{fit}}{2} \right) - \left( \log G + \frac{M_{fit}}{2} \right) = \log \frac{R}{G} - M_{fit} = 0 $$

e. The LOWESSDyeNormFactor for R is 1/10^(Mfit/2). The LOWESSDyeNormFactor for G is 10^(Mfit/2).

Linear & LOWESS (Linear&LOWESSDyeNormFactor): This curve-fitting algorithm performs a linear scaling normalization of the data individually in each channel before performing a non-linear dye normalization.
Note that the Linear & LOWESS dye normalization factor is not reported in the Feature Extraction output file. Therefore, the only way to know it is to calculate it from the following equation:

$$ \text{Linear\&LOWESSDyeNormFactor} = \frac{\text{DyeNormSignal}}{\text{BGSubSignal} \times \text{LinearDyeNormFactor}} \tag{27} $$

Step 23. Determine if surrogate values must substitute for low-intensity signals
At this point two criteria are used to determine whether surrogate values must take the place of low-intensity signals:
• The feature signal is not positive and significant versus background.
• The signal i
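The derivation in steps d and e can be checked numerically: splitting the fitted log ratio Mfit evenly between the two channels brings a feature lying on the fit line back to a log ratio of zero. The Python sketch below applies the two LOWESS factors and evaluates equation 27; the function names are hypothetical.

```python
import math


def lowess_correct(r, g, m_fit):
    """Apply the LOWESS factors 1/10^(Mfit/2) to R and 10^(Mfit/2) to G."""
    factor = 10 ** (m_fit / 2)
    return r / factor, g * factor


def linear_and_lowess_factor(dye_norm_signal, bg_sub_signal,
                             linear_dye_norm_factor):
    """Equation 27: recover the combined Linear & LOWESS dye norm factor."""
    return dye_norm_signal / (bg_sub_signal * linear_dye_norm_factor)


# A feature exactly on the fit line: log10(400/100) == Mfit.
r_new, g_new = lowess_correct(400.0, 100.0, math.log10(4.0))
print(math.log10(r_new / g_new))                   # approximately 0.0
print(linear_and_lowess_factor(300.0, 100.0, 1.5))  # 2.0
```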
the preliminary spatial fit of the negative controls. It is equivalent to a standard deviation of the NC signals after removal of spatial inhomogeneities. It is used as a preliminary estimate of the noise on the array for selecting near-zero probes in spatial detrending and, conversely, for excluding near-zero probes in multiplicative detrending.
Measure of the number of non-control features whose signals are well above background. Used as a metric for the number of features with significant signal.
16 bit or 20 bit.
The percentage of time during a scan that the autofocus assembly holds its position rather than actively maintaining focus. Typically the value is less than 2; however, the value will be larger if there are obstructions on the microarray that interfere with the laser beams.
The voltages to which the photomultipliers are set. The voltage adjusts the spectral response of the scanner to incoming light from the lasers. In general, the higher the PMTVoltage, the higher the signals will be for fluorescent artifacts that are scanned. Typical values are between 350 and 525 mV but can vary depending on the PMT.

Table 21 Stats results contained in the text output file (STATS table) (continued)
Columns: Stats Green Channel, Stats Red Channel, Type, Description
GlassThickness; RestrictionControl; gDDN / rDDN (types: float, float, integer)
Expr
Compute Bkgd, Bias and Error, ...cativeDetrend (1 = True; 0 = False): the option to use a polynomial surface-fit method for the multiplicative detrending fit rather than LOESS.
Compute Bkgd, Bias and Error, BGSubtractor_NegCtrlThresholdMultDetrendFactor (float): This factor multiplies the negative control spread to determine the threshold signal below which low-intensity features are filtered out of the multiplicative detrending fit set.

Table 17 List of parameters and options contained within the FULL text output file (FEPARAMS table) (continued)
Columns: Protocol Step, Parameter, Type, Options, Description
Compute Bkgd, Bias and Error, BGSubtractor_PolynomialMultiplicativeDetrendDegree (integer, 1 to 5): The degree of the polynomial fit used for multiplicative detrending. The most common choices are 2 (quadratic, or 2nd-order surface) and 4 (4th-order surface).
Compute Bkgd, Bias and Error, BGSubtractor_TestMultDetrendOnCVs (integer; 1 = True, 0 = False): Tests whether the replicate CVs improve (that is, decrease) after multiplicative detrending. If this choice is 1 (True) and the replicate CVs don't improve, Feature Extraction doesn't use multiplicative detrending for that array.
Compute Bkgd, Bias and Error, BGSubtractor_MultDetrendOnReplicates (integer): Specifies to use only
Compute Bkgd, Bias and Error, BGSubtractor_BGSubMethod (integer): 1
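The BGSubtractor_TestMultDetrendOnCVs behavior is a simple guard: keep the detrended signal only if replicate CVs decrease. This Python sketch expresses that decision for a single array, assuming the before/after CVs (for example, median percent CVs of the background-subtracted and processed signals) have already been computed; the names are hypothetical.

```python
def choose_processed_signal(bgsub_signal, detrended_signal,
                            cv_before, cv_after, test_on_cvs=True):
    """Return the signal to use after the TestMultDetrendOnCVs check.

    cv_before: replicate CV of the background-subtracted signal
    cv_after:  replicate CV after multiplicative detrending
    """
    if test_on_cvs and not cv_after < cv_before:
        # CVs did not improve, so multiplicative detrending is not used.
        return bgsub_signal
    return detrended_signal


print(choose_processed_signal(100.0, 95.0, cv_before=12.0, cv_after=10.0))  # 95.0
print(choose_processed_signal(100.0, 95.0, cv_before=12.0, cv_after=13.0))  # 100.0
```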
algorithms for XDR extraction, see "XDR Extraction Process" on page 234.

Place Grid
This algorithm finds the grid that defines the nominal positions of the spots on the microarray.
eXtended Dynamic Range (XDR) extraction: For an XDR extraction, grid placement is done using the high-intensity scan (that is, the scan at the higher PMT voltage). The grid found using the high-intensity scan is used as the starting point for the remaining extraction of both the high- and low-intensity images.
With version 10.x and higher of the software, you no longer have to perform XDR dual scans or extractions to capture the full dynamic range of the data. You can get the same dynamic range by working with the 20-bit TIFF Dynamic Range option, which is meant to replace the XDR option and captures the full dynamic range with better accuracy. Choosing the XDR option may still be useful if you want to compare XDR data from the G2565BA Scanner with XDR data from the G2565CA Scanner.

Optimize Grid Fit
This algorithm improves the grid fit on the entire microarray. Leveraging the Spot Finder algorithm, this protocol step examines the spots in the four corners of the microarray and iteratively adjusts the grid for a better fit. If the grid has been optimized by this protocol step, the STATS table shows the stat GridHasBeenOptimized with b
Figure content (CGH_ChIP QC Report, page 1): header fields BG Method: No Background; Background Detrend: On; Multiplicative Detrend: True; Dye Norm: Linear; Additive Error: 7 (red), 9 (green); Saturation Value: 65211 (red), 65151 (green); DyeNorm List: NA. Panels: Spot Finding of the Four Corners of the Array; Net Signal Statistics (non-control probes); Negative Control Stats; Spatial Distribution of All Outliers on the Array (534 rows x 456 columns); Red and Green Background Corrected Signals (rBGSubSignal vs. gBGSubSignal, non-control features); Background Subtracted Signal.
Figure 9 CGH_ChIP QC Report, p1
Local B
σ²_NegCtrl: Variance of the inlier negative controls, where "inlier negative controls" means the negative controls for the corresponding channel after rejection of saturated, population, and non-uniform outliers.
SpatialDetrendRMSFit: RMS of the points defining the surface fit for that channel. For more details on this term, see Table 35 on page 261.
For Agilent 8 x format oligo microarrays, the auto-estimation algorithm uses only the variance of the inlier negative controls. You can set m1 or m2 in equation 22 equal to zero in the protocol settings.

Multipliers: MultNcAutoEstimate, MultRMSAutoEstimate, MultResidualRMSAutoEstimate
MultNcAutoEstimate: Multiplier (m1) for the first term in the additive error equation (standard deviation of the inlier negative controls). The value depends on the protocol used: GE1, GE2, and miRNA: 0; CGH and ChIP: 1; non-Agilent: 1.
MultRMSAutoEstimate: Multiplier (m2) for the second term in the additive error equation (SpatialDetrendRMSFit). This term is proportional to the amount of sequence variability in the foreground. Agilent uses this term on gene expression arrays because there is a single sequence for all negative controls, so sequence-dependent foreground noise cannot be estimated from the negative controls. For CGH microarrays, the error model sets this term and m3 to zero and uses only m1, because a variety of sequences are used for the negative cont
miRNA Analysis, miRNA_Analysis_MinNoiseMultToCompEffectiveFeatSize (float): Minimum Noise Multiplier.
miRNA Analysis, miRNA_Analysis_IsDetectedMulti (float): Configures the IsProbeDetected multiplier in the miRNA algorithm.
miRNA Analysis, miRNA_Analysis_MinimumTotalGeneSignal (float): Configures the default Total Gene Signal if all probes are not detected. Used if the non-detected probes are excluded from the calculation.
miRNA Analysis, miRNA_Analysis_ExcludeNonDetectedProbes (integer): Changes how the Total Gene Signal is calculated. If a Total Probe Signal is not detected, it is not added to the Total Gene Signal: if a probe associated with an miRNA is not detected because it fails its IsProbeDetected flag and this option is true, the probe does not contribute to the TotalGeneSignal and its error does not propagate to the TotalGeneError. 1 = True: exclude non-detected probes from the analysis. 0 = False: include non-detected probes in the analysis; results will be the same as Feature Extraction v10.5.

Table 17 List of parameters and options contained within the FULL text output file (FEPARAMS table) (continued)
miRNA Analysis, miRNA_Analysis_PropagateTotalGeneSignalError (integer): Use this if and only if all the probes are not detected and the non-detected probes are excluded from the calculation (see the option above). If true, Total Gene Signal Er
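The effect of miRNA_Analysis_ExcludeNonDetectedProbes on the gene-level totals can be sketched using the definitions given elsewhere in this guide: TotalGeneSignal is the sum of the total probe signals, and TotalGeneError is the square root of the sum of squares of the TotalProbeError. In this Python sketch, each probe is a hypothetical (signal, error, is_detected) tuple; when every probe is excluded it falls back to the configured default Total Gene Signal and returns None for the error, because the error handling in that case depends on miRNA_Analysis_PropagateTotalGeneSignalError, which is not modeled here.

```python
import math


def total_gene_signal(probes, exclude_non_detected, minimum_total_gene_signal):
    """Return (TotalGeneSignal, TotalGeneError) for one miRNA gene.

    probes: list of (total_probe_signal, total_probe_error, is_detected)
    """
    used = [(signal, error) for signal, error, detected in probes
            if detected or not exclude_non_detected]
    if not used:
        # All probes excluded: use the configured default Total Gene Signal.
        # The error in this case is not modeled in this sketch.
        return minimum_total_gene_signal, None
    signal = sum(s for s, _ in used)
    error = math.sqrt(sum(e * e for _, e in used))
    return signal, error


probes = [(10.0, 3.0, True), (20.0, 4.0, True), (5.0, 1.0, False)]
print(total_gene_signal(probes, True, 0.5))   # (30.0, 5.0)
print(total_gene_signal(probes, False, 0.5))  # (35.0, 5.0990...)
```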
ics: Calculate Surface Fit required for Spatial Detrend; Feature Set for Surface Fit; Perform Filtering for Surface Fit; Perform Spatial Detrending; Adjust Background Globally
Default settings: 1 20; 0.09000; 3; No Background Subtraction; Use Error Model for Significance; 0.01; 13; 244; 2000; True; FeaturesInNegativeControlRange; True; True; False

Table 7 Default settings for miRNA_1200_Jun14 protocol (continued)
Columns: Protocol step, Parameter, Default Setting/Value (v12.0)
Protocol step: microRNA Analysis
Parameters: Perform Multiplicative Detrending; Robust Neg Ctrl Stats; Choose universal error or most conservative; Use Surrogates; Output GeneView File; MultErrorGreen; MultErrorRed; Auto Estimate Add Error Red; Auto Estimate Add Error Green; Analyze By Effective Feat size; Is Probe Detected Multiplier; Exclude non detected probes; Maximum Number of Features; Minimum Number of Ratios; Low Signal Percentile; Is Gene Detected Multiplier; High Signal Percentile; Minimum Noise Multiplier; Throw away ratios greater than; Default Total Gene Signal if all probes are not detected; Set the Total Gene Signal to the Total Gene Error; Feature Size Fraction by Array Type
Default settings: False; True; Use Universal Error Model; 0.1000; 0.1000; True; True; False; True; True; 10000; 200; 50.00; 3.0; 90.00; 10.00; 1.50; 3.0; True; 0.10; False; Automatically Determine: Low Dens
in exponential notation for all result files.

Feature results (FEATURES)
The bottom section of the text file gives descriptions of the results for each feature. Results are reported to 9 decimal places in exponential notation for all result files.

FULL Features Table
Table 22 Feature results contained in the FULL output text file (FULL FEATURES table)
Columns: Features Green, Features Red, Types, Options, Description
FeatureNum (integer): Feature number
Row (integer): Feature location, row
Col (integer): Feature location, column
Accessions (text): Gene accession numbers
Chr_coord (text): Chromosome coordinates of the feature
SubTypeMask (integer): Numeric code defining the subtype of any control feature
SubTypeName (integer): Name of the subtype of any control feature
Start (integer): The position in the transcript where the probe sequence starts
Sequence (text): The sequence of bases printed on the array
ProbeUID (integer): Unique integer for each unique probe in a design

Table 22 Feature results contained in the FULL output text file (FULL FEATURES table) (continued)
ControlType (integer): Feature control type. See "XML Control Type output" on page 220 for definitions. 0 = Control type none; 1 = Positive control; Neg
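If you want to load the FEATURES table into your own scripts, a small tab-delimited reader is enough. This Python sketch assumes a layout in which each table is introduced by a row starting with TYPE (column types), followed by a header row whose first field is the table name (FEPARAMS, STATS, FEATURES), followed by rows starting with DATA; that layout is an assumption about the text file, not something this section specifies. Values are converted using the TYPE row when it matches the header length.

```python
CONVERTERS = {"integer": int, "float": float, "boolean": lambda v: v == "1"}


def read_features(text):
    """Parse the FEATURES table of a tab-delimited FE text file (assumed layout)."""
    types, header, col_types, rows = None, None, None, []
    for line in text.splitlines():
        fields = line.split("\t")
        tag = fields[0]
        if tag == "TYPE":
            types = fields[1:]
        elif tag == "FEATURES":
            header = fields[1:]
            col_types = types if types and len(types) == len(header) else ["text"] * len(header)
        elif tag == "DATA" and header is not None:
            values = [CONVERTERS.get(t, str)(v) for t, v in zip(col_types, fields[1:])]
            rows.append(dict(zip(header, values)))
    return rows


SAMPLE = ("TYPE\ttext\ninteger\n".replace("\ninteger", "\tinteger") +
          "FEPARAMS\tProtocol_Name\tScan_Size\nDATA\tGE1\t5\n*\n"
          "TYPE\tinteger\tinteger\tinteger\tfloat\n"
          "FEATURES\tFeatureNum\tRow\tCol\tgProcessedSignal\n"
          "DATA\t1\t1\t1\t123.5\nDATA\t2\t1\t2\t99\n")
print(read_features(SAMPLE)[0])
```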
ion mismatch or false.

Table 31 Control Type Definitions (Name: XML)
Probe: false
Positive Control: pos or positive
Negative Control: neg or negative
Not Probe: notprobe

Not Probe: These features are feature-extracted, but Feature Extraction does not use them as input to any calculations; they are not used during outlier analysis or for the dye normalization calculation. However, dye normalization values and ratios are calculated for them, and the results appear in the text and XML output files and in the feature extraction visual results file. One exception is that the background of Not Probe features is used in calculating the local background with the radius method.

Conversion of feature flag information
For the Failed flag, MAGE-ML conversion produces the following settings:
• Bit 8 (green) and bit 12 (red) are set if the feature is saturated in both channels.
• Bit 18 is set if the feature or its deletion control is a non-uniformity outlier in either color, or if the feature is a population outlier in either color and the Report Population Outliers as Failed in MAGE-ML file option is set to True.
• Bit 23 is set if the probe has low specificity (for example, when the deletion control is greater than or equal to the feature).

TIFF Results
See the Feature Extraction 12.0 User Guide for more information on the File
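The flag conversion above packs several conditions into one integer. This Python sketch sets bits 8, 12, 18 and 23 from boolean inputs. Two assumptions are made and should be checked against real MAGE-ML output: bit numbers are treated as zero-based shifts (bit n is 1 << n), and bits 8 and 12 are read as per-channel saturation flags for green and red respectively. All names are hypothetical.

```python
def mage_ml_failed_bits(g_saturated, r_saturated, nonunif_outlier,
                        popn_outlier, report_popn_as_failed, low_specificity):
    """Build the flag integer described in the conversion rules (sketch)."""
    flags = 0
    if g_saturated:
        flags |= 1 << 8   # bit 8: green saturation (assumed per channel)
    if r_saturated:
        flags |= 1 << 12  # bit 12: red saturation (assumed per channel)
    # Bit 18: non-uniformity outlier (feature or its deletion control), or a
    # population outlier when the "report as failed" option is True.
    if nonunif_outlier or (popn_outlier and report_popn_as_failed):
        flags |= 1 << 18
    if low_specificity:
        flags |= 1 << 23  # bit 23: low-specificity probe
    return flags


print(mage_ml_failed_bits(True, True, False, False, False, False))  # 4352
```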
multiplied by the total number of probe replicates, the EffectiveFeatureSizeFraction, the Nominal Spot Area, and the Weight. For miRNA analyses.
gTotalGeneSignal (float): For miRNA analyses. This signal is the sum of the total probe signals in the green channel per gene.
gTotalGeneError (float): For miRNA analyses. This error is the square root of the sum of the squares of the TotalProbeError.
gIsGeneDetected (boolean): For miRNA analyses. Indicates whether the gene was detected on the miRNA microarray.
Results are reported to 9 decimal places in exponential notation for all result files.

QC Features Table
Table 24 Feature results contained in the QC output text file (QC FEATURES table)
Columns: Features Green, Features Red, Types, Options, Description
FeatureNum (integer): Feature number
Row (integer): Feature location, row
Col (integer): Feature location, column
SubTypeMask (integer): Numeric code defining the subtype of any control feature
ControlType (integer): Feature control type. See "XML Control Type output" on page 220 for definitions. Options: Control type none; Positive control; Negative control; SNP; Not probe (see Ch. 4 for definition); Ignore (see Ch. 4 for definition); 15000; 20000; 30000
ProbeName (text): An Agilent-assigned identifier for the probe synthesized on the microarray.
SystematicName (text): An identifier for the target sequence that the probe was designed t
is computed independently in each channel. These pixels are omitted from all subsequent calculations.
Number of outlier pixels per feature with intensity < lower threshold set via the pixel outlier rejection method. The number is computed independently in each channel.
NOTE: The pixel outlier method is the ONLY step that removes data in Feature Extraction.
Total number of pixels used to compute feature statistics (that is, the total number of inlier pixels per spot; same in both channels).

Table 29 Feature results (Full) contained in the MAGE-ML FEATURES table
Columns: Quant Type, Features Green, Features Red, Options, Description
MeasuredSignal, Green Measured Signal / Red Measured Signal: Raw mean signal of the feature in the green/red channel.
SQT, gMedianSignal / rMedianSignal: Raw median signal of the feature in the green/red channel.
SQT, gNetSignal / rNetSignal: MeanSignal minus DarkOffset.
Error, Green PixSDev / Red PixSDev: Standard deviation of all inlier pixels per feature. This is computed independently in each channel.
SQT, gBGNumPix / rBGNumPix: Total number of pixels used to compute local BG statistics per spot (that is, the total number of BG inlier pixels). This number is computed independently in each channel.
MeasuredSignal, Green Background / Red Background: Mean local background signal (local to the corresponding feature), computed per channel.
SQT, gBGMedianSignal / rBGMedi
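The pixel counts and NetSignal defined above are simple arithmetic once the outlier thresholds are known. This Python sketch counts pixels above an upper threshold and below a lower threshold, derives the inlier count, and computes NetSignal as MeanSignal minus DarkOffset. How Feature Extraction sets the thresholds is not reproduced; they are plain inputs here, and the names are hypothetical.

```python
import numpy as np


def pixel_outlier_counts(pixels, lower, upper):
    """Return (count above upper, count below lower, inlier count)."""
    pixels = np.asarray(pixels, dtype=float)
    n_high = int(np.count_nonzero(pixels > upper))
    n_low = int(np.count_nonzero(pixels < lower))
    num_pix = int(pixels.size - n_high - n_low)  # inliers used for statistics
    return n_high, n_low, num_pix


def net_signal(mean_signal, dark_offset):
    """NetSignal = MeanSignal minus DarkOffset."""
    return mean_signal - dark_offset


print(pixel_outlier_counts([1, 5, 6, 7, 100], lower=2, upper=50))  # (1, 1, 3)
print(net_signal(120.0, 20.0))                                    # 100.0
```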
ith. Where possible, a public database identifier is used (for example, the TAIR locus identifier for Arabidopsis). The systematic name is reported ONLY if the gene name and systematic name are different.

Columns: Features Green, Features Red, Types, Options, Description
LogRatio (float; 1000 per feature): Base-10 log of rProcessedSignal/gProcessedSignal. If SURROGATES are turned off, special cases apply if DyeNormRedSig < 0.0 and DyeNormGreenSig > 0.0, if DyeNormRedSig > 0.0 and DyeNormGreenSig < 0.0, and if DyeNormRedSig < 0.0 and DyeNormGreenSig < 0.0.
LogRatioError (float): If SURROGATES are turned off, a special case applies if DyeNormRedSig < 0.0 OR DyeNormGreenSig < 0.0. If SURROGATES are turned on, LogRatioError is the error of the log ratio calculated according to the error model chosen.
PValueLogRatio (float): Significance level of the LogRatio computed for a feature.
gProcessedSignal / rProcessedSignal (float): The signal left after all the Feature Extraction processing steps have been completed. For one-color data, ProcessedSignal contains the multiplicatively detrended background-subtracted signal if detrending is selected and helps. If detrending does not help, this column contains the BackgroundSubtractedSignal.
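For the ordinary case where both processed signals are positive, LogRatio is just the base-10 log of red over green. This Python sketch covers only that case and returns None otherwise, because the special values the guide assigns when a dye-normalized signal is negative are not given in this section; the function name is hypothetical.

```python
import math


def log_ratio(r_processed, g_processed):
    """Base-10 log of rProcessedSignal / gProcessedSignal for positive signals.

    Returns None for non-positive signals; the guide defines special values
    for those cases, which this sketch does not reproduce.
    """
    if r_processed <= 0 or g_processed <= 0:
        return None
    return math.log10(r_processed / g_processed)


print(log_ratio(1000.0, 100.0))  # 1.0
print(log_ratio(-5.0, 100.0))    # None
```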
ity 8 pack OR High Density 8 pack

Table 7 Default settings for miRNA_1200_Jun14 protocol (continued)
Columns: Protocol step, Parameter, Default Setting/Value (v12.0)
Calculate Metrics, Spikein Target Used: True
Calculate Metrics, Min Population for Replicate Stats: 5
Calculate Metrics, Grid Test Format: Automatically Determine. Recognized formats: 60 micron and 30 micron feature size, third party
Calculate Metrics, Minimum percentage of features needed to be found: 1 99 for 30 micron and 65 micron feature size
Calculate Metrics, PValue for Differential Expression: 0.010000
Calculate Metrics, Percentile Value: 75.00
Generate Results, Type of QC Report: miRNA
Generate Results, Generate Single Text File: True
Generate Results, JPEG Down Sample Factor: 4

Differences in Protocol Settings Based on Each Step
Some of the default settings are the same for all the protocols, yet many differ depending on the protocol step. Table 8 shows each protocol step and where you can find information on the default settings for that step.
Table 8 Location of protocol template default settings for each step
Place Grid: page 56
Optimize Grid Fit: page 57
Find Spots: page 58
Flag Outliers: page 59
Compute Bkgd, Bias and Error: page 61
Correct Dye Biases: page 64
Compute Ratios: page 65
Calculate Metrics: page 65
Generate Results: page 65
Feature Ex
intensity features of both the red and green channels separately. This is described graphically in the following figure.
[Plot: gMeanSignal and gSpatialDetrendSurfaceValue across the array]
Figure 64 The effect of a 2-dimensional Loess fit to the green mean signal intensities across the array. You can find more information on the algorithm at http://www.itl.nist.gov/div898/handbook/pmd/section1/pmd144.htm

If N is the number of data points selected for surface fitting after filtering and I_i is the i-th point from the filtered low-intensity data set, the Loess algorithm fits a surface through these data points to obtain an intensity value describing the surface at each input data point. Let O_i denote the fitted output surface value corresponding to the i-th input point I_i. The statistical results of this calculation are described in the table on the next page.

Table 35 Statistical results of spatial detrend algorithm
Columns: Result, Description and Equation
SpatialDetrendRMSFit: This result gives an idea of the extent of the surface fit. It is the root mean square of the fitted data points obtained from the Loess algorithm:

$$ SpatialDetrendRMSFit = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} O_i^2 } $$

SpatialDetrendRMSFilteredMinusFit: This result is the approximate residual from
SpatialDetrendSurfaceArea
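Given the filtered input points I_i and the fitted surface values O_i, the RMSFit equation above is a one-line NumPy computation. The sketch also computes an RMS of I_i minus O_i as a stand-in for SpatialDetrendRMSFilteredMinusFit; that second formula is an assumption based only on the result's name and its description as a residual, and the function name is hypothetical.

```python
import numpy as np


def spatial_detrend_rms(filtered, fitted):
    """Return (RMSFit, RMS residual) for filtered points I and fitted values O.

    RMSFit follows the equation in Table 35. The residual RMS (I - O) is an
    assumed reading of SpatialDetrendRMSFilteredMinusFit.
    """
    filtered = np.asarray(filtered, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    rms_fit = float(np.sqrt(np.mean(fitted ** 2)))
    rms_residual = float(np.sqrt(np.mean((filtered - fitted) ** 2)))
    return rms_fit, rms_residual


print(spatial_detrend_rms([4.0, 4.0], [3.0, 4.0]))  # (3.5355..., 0.7071...)
```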
193. ity used in the combined XDR output.
Indicates that the automatic image processing was flagged as needing evaluation.
Number of genes that do not have any replicate features on the array where both color channels are not feature non-uniform outliers. If multiple probes address the same gene, this value actually states the number of probes that have no non-uniform replicates.
Number of genes that have replicate features on the array.
Table 21 Stats results contained in the text output file STATS table (continued)
Stats (green channel / red channel) and types in this part of the table: gMultDetrendMeanSignalDifference (float), EffectiveFeatureSizeFraction (float), FeatureUniformityAnomalyFraction (float), UsedDefaultEffectiveFeatureSize (integer), gPercentileIntensityProcessedSignal / rPercentileIntensityProcessedSignal (float), gTotalSignal99pctile (float).
gMultDetrendMeanSignalDifference: This is output for miRNA only. If multiplicative detrending is turned on, the mean signal over all replicated non-control features is calculated before detrending and after detrending, and the difference in mean signals is reported here. Because the mean signal should not change, this number should be close to 0. Without multiplicative detrending, this number is always 0.
EffectiveFeatureSizeFraction: Estimates the ratio of the effective feature size to the nominal feature size. It is calculated by looking at the
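As a sketch of how gMultDetrendMeanSignalDifference behaves, the hypothetical helper below compares the mean signal of the replicated non-control features before and after multiplicative detrending. The before-minus-after sign convention is an assumption; the table only says the difference is reported and should be close to 0.

```python
from statistics import mean
from typing import Sequence


def mult_detrend_mean_signal_difference(
    before: Sequence[float], after: Sequence[float], detrending_on: bool = True
) -> float:
    """Mean signal of replicated non-control features before minus after detrending.

    The before-minus-after sign convention is an assumption.
    """
    if not detrending_on:
        return 0.0  # per the table: always 0 without multiplicative detrending
    return mean(before) - mean(after)
```

A value far from 0 would suggest that the detrending surface was not properly normalized to an average of 1.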
194. kground used for background subtraction on this feature.
IsWellAboveBG: Boolean flag indicating whether a feature is well above background. The feature passes g/rIsPosAndSignif and, additionally, the g/rBGSubSignal is greater than 2.6 × g/rBG_SD. You can change the multiplier 2.6.
Table 22 Feature results contained in the FULL output text file FULL FEATURES table (continued)
Columns in this part of the table: gBGUsed / rBGUsed, gBGSDUsed / rBGSDUsed, IsNormalization, gDyeNormSignal / rDyeNormSignal, gDyeNormError / rDyeNormError, DyeNormCorrelation, ErrorModel, xDev.
gBGUsed / rBGUsed (float): Background used to subtract from the MeanSignal, that is, g/rBGSubSignal = g/rMeanSignal - g/rBGUsed. This variable is also used in the t-test. To display the values used to calculate this variable with different background signals and settings of spatial detrend and global background adjust, see Table 34 on page 254.
IsNormalization (boolean): 1 = Feature used, 0 = Feature not used.
ErrorModel: 0 = Propagated model, chosen by you or by the software; 1 = Universal error model, chosen by you or by the software.
gBGSDUsed / rBGSDUsed (float): Standard deviation of the background used in the g/r channel. This variable is also used in the t-test and surrogate algorithms. To display the values used to calculate this variable with different background signals and settings of spatial detrend and global backgroun
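The background subtraction and the IsWellAboveBG test above can be sketched as two small functions. The function names are hypothetical, and the IsPosAndSignif flag is taken as an input because its calculation is not described in this excerpt.

```python
def bg_sub_signal(mean_signal: float, bg_used: float) -> float:
    """g/rBGSubSignal = g/rMeanSignal - g/rBGUsed."""
    return mean_signal - bg_used


def is_well_above_bg(is_pos_and_signif: bool, bg_sub: float, bg_sd: float,
                     multiplier: float = 2.6) -> bool:
    """IsWellAboveBG: passes IsPosAndSignif and BGSubSignal > multiplier * BG_SD."""
    return bool(is_pos_and_signif) and bg_sub > multiplier * bg_sd


print(is_well_above_bg(True, bg_sub_signal(400.0, 100.0), 100.0))
```

The `multiplier` argument stands in for the adjustable 2.6 factor mentioned in the text.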
195. l
[Figure 12 MicroRNA (miRNA) QC Report p2. Panels: Reproducibility plot for miRNA non-control probes; Spatial Distribution of Median Signals for each Row and Column; Evaluation Metrics for miRNA_QCMT_Jul11 (metric name, value, and Excellent/Good/Evaluate ranges). Callouts: 10 Reproducibility plot for miRNA non-control probes on page 110; 11 QC reports with metric sets added on page 83; 12 Spatial Distribution of Median Signals for each Row and Column on page 102.]

Non-Agilent GE2 QC Report
This report lists all of the same information as the 2-color gene expression QC report, but with no spike-ins.
1 QC Report Headers on page 87
2 Spot finding of Four Corners on page 90
3 Outlier St
196. l Output Package
Table 27 Scan protocol parameters in MAGE-ML result file
Image acquisition identifier: Barcode or identifier for microarray
Log information: Warnings and errors during run
Activity date: Time stamp for scanner run
Scanner information: Information such as name, make, model and serial number of scanner
Operator: Person that runs scanner
ScanNumber: Number of the scan associated with the values listed in this table
Red LASER_POWER_VALUE: Value of laser power in red channel
Green LASER_POWER_VALUE: Value of laser power in green channel
Red PMT_GAIN_VALUE: Photomultiplier gain in red channel
Green PMT_GAIN_VALUE: Photomultiplier gain in green channel
Red Saturation_Value: Signal value beyond which signal is saturated in the red channel
Green Saturation_Value: Signal value beyond which signal is saturated in the green channel
Table 27 Scan protocol parameters in MAGE-ML result file (continued)
Parameters in this part of the table also include PercentAutoFocusHold and DarkOffsetSubtracted.
MICRONS_PER_PIXEL_X: Radius of pixel in the x direction
MICRONS_PER_PIXEL_Y: Radius of pixel in the y direction
GlassThickness: Thickness of microarray slide
Red DarkOffsetAverage: Dark offset data per image in red channel as measured by scanner
Green DarkOffsetAverage: Dark offset data per image in green channel as measured by
197. lSub, rIsInNegCtrlRange, rIsUsedInMD
Types: float, float, boolean, float, float, float, boolean, float, boolean, boolean.
Option values: 1 = Feature used, 0 = Feature not used.
Descriptions in this part of the table:
This signal is the sum of the total probe signals in the green channel per gene (miRNA analyses).
This error is the square root of the sum of the squares of the TotalProbeError (miRNA analyses).
Lets you know if the gene was detected on the miRNA microarray.
A surface is fitted through the log of the background-subtracted signal to look for multiplicative gradients. A normalized version of that surface, interpolated at each point of the microarray, is stored in MultDetrendSignal. The surface is normalized by dividing each point by the overall average of the surface. That average is stored in MultDetrendSurfaceAverage as a statistic (1-color only).
Indicates the background signal that was selected to be used (Mean or Median).
Indicates the background error that was selected to be used (PixSD or NormIQR).
A Boolean used to flag features used for computation of the global BG offset.
Value at the polynomial fit of the negative controls.
Set to true for a given feature if its signal intensity is in the negative control range.
Indicates whether this feature was included in the set used to generate the multiplicative detrend surface.
Results are reported to 9 decimal places in exponential notation for all result files.
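The normalization step for MultDetrendSignal can be sketched as below, assuming the fitted surface has already been evaluated (in linear units) at each feature. The function name is hypothetical, and the Loess fit to the log signal that produces the surface is not shown.

```python
from typing import List, Sequence, Tuple


def normalize_mult_detrend_surface(surface: Sequence[float]) -> Tuple[List[float], float]:
    """Divide each surface value by the surface average.

    Returns (normalized values, average); the average corresponds to the
    MultDetrendSurfaceAverage statistic and the normalized values to MultDetrendSignal.
    """
    average = sum(surface) / len(surface)
    return [v / average for v in surface], average
```

Because the normalized surface averages to 1, dividing signals by it reshapes spatial gradients without changing the overall signal level.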
198. less than 10 or less than 10% of the replicated sequences, that is, if there are not enough bright replicated probes, the Median CV field shows up as -1.
Spike-in probes: The same algorithm is used to calculate the Median CV for the spike-in probes. Because there are only ten sequences in total and some are expected to fail the additive error test described above, the minimum number of bright-enough sequences required to calculate the Median CV is 3.

Microarray Uniformity (2-color only)
The QC Report has two metrics that measure the uniformity of replicated log ratios and that indicate the span of log ratios: average S/N and AbsAvgLogRatio. These are calculated from inlier features of replicated non-control and spike-in probes. For example, some microarrays have 100 different non-control probe sequences with 10 replicate features each. For each replicated probe, the average and SD of the log ratios are calculated. The signal-to-noise (S/N) of the log ratio for each probe is calculated as the absolute value of the average of the log ratios divided by the SD of the log ratios. From the population of 100 S/N values in this example, the average S/N is determined and shown in Figure 33. The second metric, AbsAvgLogRatio, indicates the amount of differential expression (up-regulated or down-regulated). As described above, averages of log ratios are calculated for each replicated probe. The absolute of
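The average S/N calculation described above can be sketched as follows. The function name is hypothetical. Two details are assumptions rather than documented behavior: it uses the sample standard deviation, and it skips probes whose S/N is undefined (fewer than two replicates or zero spread).

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def average_signal_to_noise(log_ratios_by_probe: Dict[str, Sequence[float]]) -> float:
    """Average over probes of |mean(log ratios)| / SD(log ratios).

    Uses the sample SD (an assumption) and skips probes with fewer than two
    replicates or zero spread, where S/N is undefined.
    """
    sn_values = []
    for ratios in log_ratios_by_probe.values():
        if len(ratios) < 2:
            continue
        sd = stdev(ratios)
        if sd == 0:
            continue
        sn_values.append(abs(mean(ratios)) / sd)
    return mean(sn_values)
```

If every probe is skipped, `statistics.mean` raises `StatisticsError`, which is a reasonable signal that the metric cannot be computed.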
199. lid only if there is no background subtraction, spatial detrending is on, and there is no global background adjustment. For an explanation of BGUsed with other background settings, see Table 34 on page 254.
rBGSubSignal = rMeanSignal - rBGUsed
13430.2 = 13502.52 - 72.2993

Results from Correct Dye Biases Algorithm
Refer to Data from the STATS Table on page 291 for the rLinearDyeNormFactor value.
FeatureNum 12519: gDyeNormSignal = 45834.1, rDyeNormSignal = 49209.6
rDyeNormSignal = rBGSubSignal × rLinearDyeNormFactor × rLOWESSDyeNormFactor
49209.6 = 13430.2 × 4.14607 × rLOWESSDyeNormFactor

Results from Compute Ratios and Errors Algorithm
FeatureNum 12519: gSurrogateUsed = 0, rSurrogateUsed = 0, gProcessedSignal = 45834.13, rProcessedSignal = 49209.64
FeatureNum 12519: LogRatio = 0.0308611696, LogRatioError = 0.06148592089, PValueLogRatio = 0.6157220099
For the red channel, does feature number 12519 pass the two criteria that are required to calculate an accurate and reproducible log ratio?
• The feature is positive and significant versus background, i.e., IsPosAndSignif = 1
• BGSubSignal is greater than its background standard deviation, i.e., BGSDUsed
For this example calculation, feature number 12519 passed both criteria. Since rSurrogateUsed = 0, the rDyeNormSignal is the same value as the rProcessedSignal. rProcess
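The worked example for feature 12519 can be reproduced with plain arithmetic. The numbers below are copied from the example. The rLOWESSDyeNormFactor is back-solved from the stated product because its value is not given in this excerpt. The log ratio is taken as base 10 of red over green, as defined for the LogRatio column in the FEATURES table.

```python
import math

# Values for FeatureNum 12519 taken from the worked example.
r_mean_signal = 13502.52
r_bg_used = 72.2993
r_bg_sub_signal = r_mean_signal - r_bg_used  # about 13430.2

r_linear_dye_norm_factor = 4.14607
r_dye_norm_signal = 49209.6
# Back-solve rDyeNormSignal = rBGSubSignal x rLinearDyeNormFactor x rLOWESSDyeNormFactor
r_lowess_dye_norm_factor = r_dye_norm_signal / (r_bg_sub_signal * r_linear_dye_norm_factor)

# No surrogate was used, so ProcessedSignal equals DyeNormSignal in each channel.
g_processed_signal = 45834.13
r_processed_signal = 49209.64
log_ratio = math.log10(r_processed_signal / g_processed_signal)

print(f"rBGSubSignal={r_bg_sub_signal:.1f}")
print(f"implied rLOWESSDyeNormFactor={r_lowess_dye_norm_factor:.4f}")
print(f"LogRatio={log_ratio:.7f}")
```

The computed LogRatio matches the reported 0.0308611696 to the precision of the listed signals.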
200. lor application. Threshold_Max = 50.
Test5: What is the difference between feature centers found by the gridding algorithm and those found by the spot-finding algorithm? Stat names: MaxCentroidDiffX, MaxCentroidDiffY. Threshold_Max = 10.
Optional Test6: How many features along the edge of the microarray are flagged as non-uniform outliers in either channel? This test is used only if one of these two metrics is unavailable:
• No replicated features are present to calculate the NonCtrlMedPrcntCVBGSubSig metric, or
• No NegControls are present to calculate the StdDev.
Stat name: MaxNonUnifEdges. Threshold_Max = 10.

MicroRNA Analysis
This step is only used for the feature extraction of microRNA microarray 1-color images. This analysis samples multiple probes with multiple features per probe and reports the measurements and errors as the TotalGeneSignal and TotalGeneSignalError for each of the miRNAs of the 8-pack microarray. These values are reported in both the text file and a new file called the GeneView file.
Several steps are needed to calculate the total gene signal. First you calculate the TotalProbeSignal, and then you sum the TotalProbeSignal over the probes for each gene. To calculate the TotalProbeSignal and the TotalProbeError, this algorithm does the following steps:
a Calculates the EffectiveFeatureSizeFraction.
b Finds the robust average of all
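The gene-level roll-up can be sketched as a group-by over probe rows. TotalGeneSignal is the sum of TotalProbeSignal for the gene. The gene error combines TotalProbeError values as the square root of the sum of squares, as described for the FEATURES columns in this guide. The input row layout and the miRNA names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def total_gene_stats(
    probes: Iterable[Tuple[str, float, float]]
) -> Dict[str, Tuple[float, float]]:
    """Aggregate (gene, TotalProbeSignal, TotalProbeError) rows per gene.

    TotalGeneSignal is the sum of TotalProbeSignal over the gene's probes;
    the gene error is the square root of the sum of squared TotalProbeError values.
    """
    signal_sum: Dict[str, float] = defaultdict(float)
    error_sq_sum: Dict[str, float] = defaultdict(float)
    for gene, probe_signal, probe_error in probes:
        signal_sum[gene] += probe_signal
        error_sq_sum[gene] += probe_error ** 2
    return {gene: (signal_sum[gene], math.sqrt(error_sq_sum[gene])) for gene in signal_sum}


print(total_gene_stats([("miR-1", 100.0, 3.0), ("miR-1", 50.0, 4.0)]))
```

Adding errors in quadrature assumes the probe errors are independent, which is the usual justification for this combination rule.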
201. luated using up to three threshold levels.
Metric Evaluation Logic tables
In the following tables, evaluation metrics are described for 18 cases (IDs). Results are compared to four limit values shown in the Limits used table: upper limit, upper warning limit, lower warning limit and lower limit (v1 through v4). The logic used is described in the center table, which shows that the metric evaluation indication (Excellent, Good, Evaluate) is based on how the result compares to the given limit value(s). Cases covered indicate the type of threshold along with the boundaries that are displayed in the QC Report.
value > Upper limit: Evaluate
value > Upper Warning limit and value < Upper limit: Good
value > Lower Warning limit and value < Upper Warning limit: Excellent
value > Lower limit and value < Lower Warning limit: Good
value < Lower limit: Evaluate
[Figure 49 QC Metrics evaluation tables and cases. Case groups include 2-level metrics used in FE v10.5 and 2-level metrics that may be used in FE v10.7.]
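The five rules above map directly onto a small classifier. This is a hypothetical helper. The extracted rules lost their boundary symbols, so treating a value that sits exactly on a limit as belonging to the inner (better) band is an assumption.

```python
def evaluate_metric(value: float, v1: float, v2: float, v3: float, v4: float) -> str:
    """Three-level QC evaluation against four limits.

    v1 = upper limit, v2 = upper warning limit,
    v3 = lower warning limit, v4 = lower limit (v1 >= v2 >= v3 >= v4).
    Values exactly on a limit fall into the better band (assumed).
    """
    if value > v1 or value < v4:
        return "Evaluate"
    if value > v2 or value < v3:
        return "Good"
    return "Excellent"


print(evaluate_metric(9.0, 10.0, 8.0, 2.0, 0.0))
```

For a metric with only upper or only lower limits, pass `float("inf")` or `float("-inf")` for the unused side.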
202. m Value
[Figure residue: LogRatio versus Log Processed Signal plot (axes: LogRatio (base 2) and Log base 2 ProcessedSignal; legend: Significantly Negative, Used to normalize, Significantly Positive, Not Significantly Different) and Histogram of LogRatio.]
Figure 10 CGH_ChIP QC Report p2

MicroRNA (miRNA) QC Report
This module shows you the organization of the 1-color miRNA QC report. See the following figure and the figures on the next pages for links to information on each of the QC report regions.
Agilent miRNA microarrays are currently in development. Check the Agilent Web site for the latest information.
1 QC Report Headers on page 87
2 Spot finding of Four Corners on page 90
3 Outlier Stats on page 91
4 Spatial Distribution of All Outliers on page 91
5 Net Signal Statistics on page 93
6 Negative Control Stats on page 94
7 Histogram of Signals Plot (1-color GE or CGH) on page 96
[Example report header: QC Report, Agilent Technologies, miRNA; Date: Monday, November 28, 2011 18:45; Protocol: miRNA_1100_Jul11 (Read Only); fields for User Name, FE Version, Sample (red/green), Grid, Background, Non Uniform and Population.]
203. mSignal; LinearDyeNormFactor (Table 17 on page 129); ProcessedSignal; ProcessedSigError; LogRatio
Descriptions in this part of the table:
A surface is fitted through the log of the background-subtracted signal to look for multiplicative gradients. A normalized version of that surface, interpolated at each point of the microarray, is stored in MultDetrendSignal. The surface is normalized by dividing each point by the overall average of the surface. That average is stored in MultDetrendSurfaceAverage as a statistic. If the protocol uses the option to fit to only replicate features, the surface is normalized for the fit, and MultDetrendSurfaceAverage is smaller in this case (a number around 1).
A nonzero surrogate value indicates that the MeanSignal is less than, or not significant versus, the background, or that the BGSubSignal is less than the Error, where the Error is the additive error for all default Agilent protocols.
A dye-normalized signal calculated by multiplying the BGSubSignal by the appropriate DyeNormFactor.
LinearDyeNormFactor: A global constant to normalize the dye bias from all feature background-subtracted signals. LinearDyeNormFactor is calculated such that the geometric mean intensity of the selected normalization features equals 1000.
ProcessedSignal: The signal left after all the Feature Extraction processing steps have been completed. In the case of 1-color, ProcessedSignal contains the multiplicatively detrended, background-subtracted signal if the detrending is selected and helps. If the detrending
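The LinearDyeNormFactor definition (geometric mean of the normalization features scaled to 1000) can be sketched as below. The function name is hypothetical. Dropping non-positive signals before taking logarithms is an assumption, because the geometric mean is undefined for them.

```python
import math
from typing import Sequence


def linear_dye_norm_factor(norm_feature_signals: Sequence[float], target: float = 1000.0) -> float:
    """Scale factor that makes the geometric mean of the normalization
    features' background-subtracted signals equal to `target` (1000)."""
    positive = [s for s in norm_feature_signals if s > 0]  # log needs s > 0 (assumed filter)
    if not positive:
        raise ValueError("no positive normalization signals")
    geometric_mean = math.exp(sum(math.log(s) for s in positive) / len(positive))
    return target / geometric_mean


print(linear_dye_norm_factor([10.0, 1000.0]))
```

The factor is computed separately for each channel. Multiplying each channel's BGSubSignal by its own factor puts both channels on the same geometric-mean scale.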
204. mats: 65 micron feature size, 30 micron feature size and Third Party.
Iteratively Adjust Corners (hidden if Array Format is set to Automatically Determine): True for all formats except Third Party; False for Third Party.
Adjustment Threshold (hidden if Array Format is set to Automatically Determine): 0.300 for all formats except Third Party.
Maximum Number of Iterations (hidden if Array Format is set to Automatically Determine): 5 for all formats except Third Party.
Found Spot Threshold (hidden if Array Format is set to Automatically Determine): 0.200 for all formats except Third Party.
Number of Corner Feature Side Dimension (hidden if Array Format is set to Automatically Determine): 20 for all formats except Third Party.
Find Spots
Spot Format: Third Party
Use the Nominal Diameter from the Grid Template: True
Spot Deviation Limit: 1.50
Calculation of Spot Statistics: Use Cookie Method
Cookie Percentage: 1.000
Exclusion Zone Percentage: 1.200
Table 6 Default settings for GE2-NonAT_1100_Jul11 protocol (continued)
Columns: Protocol step, Parameter, Default Setting Value (v12.0). Parameters listed: Auto Estimate the Local Radius; LocalBGRadius; Pixel Outlier Rejection Method; RejectIQRFeat; RejectIQRBG; Statistical Method for Spot Values from Pixels; Flag Outliers: Compute Population Outliers; Minimum Population; IQRatio; Background IQRatio; Use Qtest for Small Populations; Report Population Outliers
205. mber of multiples of the negative control spread that defines the signal range within which features are considered to be within the negative control range, for the FeaturesInNegativeControlRange background detrend option.
Compute Bkgd Bias and Error, BGSubtractor_NegCtrlSpreadRobust (float): Specifies whether to remove negative control features that are outliers before calculating the negative control spread, for use with FeaturesInNegativeControlRange.
Compute Bkgd Bias and Error, BGSubtractor_AdditiveDetrendFeatureSet (integer): Determines which features are considered for the surface fit set. 0 = All inlier features; 1 = Negative control inliers only; 2 = Features in negative control range.
Compute Bkgd Bias and Error, BGSubtractor_DetrendNeighborhoodSize (float): Specifies the fraction of the total number of data points (the neighborhood) that will be weighted for linear regression during surface fitting for each data point.
Compute Bkgd Bias and Error, BGSubtractor_ErrModelSignificance (integer): Decides whether the error model or pixel statistics are used to determine the Positive and Significance calls and WellAboveBackground. 0 = pixel statistics; 1 = error model.
Table 17 List of parameters and options contained within the FULL text output file FEPARAMS table (continued)
Compute Bkgd Bias and Error, BGSubtractor_RobustNCStats (integer): Specifies if a
206. n
• Add and remove design files (i.e., grid templates)
• Add, remove and export protocols
• Add, remove and export metric set files
• Add, remove and export dye normalization lists
• Get the barcode from an image file
• Get the XDR Scan ID from an image file
• Link a protocol to a design file
• Get the list of all protocols
• Get the list of all metric sets
• Get the list of all design files
• Get license status
• Get license file text
• Set license
Command line syntax
FeNoWindows -c command -o output_file -p protocol -getxdrscanid tif_file -getprotocollist -q <linktype> -b tif_file <input_file>
command can be any of the following: extract, addgrid, addprotocol, adddyenormlist, removegrid, removeprotocol, removedyenormlist, linkprotocoltogrid, exportprotocols, exportdyenormlists. If you do not specify a command, it defaults to extract.
Commands and arguments
extract
CAUTION This command runs Feature Extraction on the input project.
FeNoWindows -c extract -o <output_file> <input_file>
input_file: The name of an XML project file with the extension .fep.
output_file: The name of the result XML file. This file looks like a project file with the status added (see the following description). You must specify the -o option when specifying the output file name, or FeNoWindows will not create the file.
extract CAUTI
207. n Response Conc Statistic in the 1-color QC Report: Log of high concentration in the linear range of the curve fit.
eQCOneColorLinFitLogHighSignal (float): Agilent Spike-In Concentration Response Statistic in the 1-color QC Report. Log of high signal in the linear range of the curve fit.
eQCOneColorLinFitSlope (float): Agilent Spike-In Concentration Response Statistic in the 1-color QC Report. Slope of the linear range of the curve fit.
eQCOneColorLinFitIntercept (float): Agilent Spike-In Concentration Response Statistic in the 1-color QC Report. Intercept of the linear range of the curve fit.
eQCOneColorLinFitRSQ (float): Agilent Spike-In Concentration Response Statistic in the 1-color QC Report. Square of the correlation coefficient of the linear range of the curve fit.
eQCOneColorSpikeDetectionLimit (float): The detection limit as determined by measuring the average plus 1 standard deviation of all spike-in probes below the linear concentration range. This value is the maximum of these
gNonCtrl50PrcntBGSubSig / rNonCtrl50PrcntBGSubSig (float): Background-subtracted signal intensity at the 50th percentile for all non-control probes.
gCtrleQC50PrcntBGSubSig / rCtrleQC50PrcntBGSubSig (float): The median background-subtracted signal for all the embedded QC probes on the microarray.
Table 21 Stats results contained in the text output file STATS table (continued)
Stats Green Channel / Stats Red C
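A minimal sketch of the "average plus 1 standard deviation" rule behind eQCOneColorSpikeDetectionLimit, assuming the signals of the spike-in probes below the linear concentration range are already selected. The function name is hypothetical, the sample standard deviation is an assumption, and the "maximum of these" step mentioned in the table is not modelled because its full description is cut off.

```python
from statistics import mean, stdev
from typing import Sequence


def spike_detection_limit(signals_below_linear_range: Sequence[float]) -> float:
    """Average plus one standard deviation of spike-in signals below the linear range.

    Sample SD is assumed; the "maximum of these" step mentioned in the table is not modelled.
    """
    return mean(signals_below_linear_range) + stdev(signals_below_linear_range)


print(spike_detection_limit([1.0, 2.0, 3.0]))
```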
208. n a variety of file formats, including GAL and MAGE-ML. These files describe the gene probes and their number and spacing on the microarray. Profile result files contain the signal and error information for each of the hybridized gene probes on the microarray. Both pattern files and profile result files contain information that can be formatted in several ways: tab-delimited text format or an XML format (MAGE-ML).
Agilent only supports GEML2 pattern files and MAGE-ML profiles for use with Rosetta Resolver. The pattern name in Rosetta Resolver should match the profile pattern name embedded in the profile data so that the data can be correctly associated. To do this, use the pattern autoimport function in Rosetta Resolver, or correctly specify the pattern name when manually importing the pattern. In most cases, the Agilent pattern name is Agilent-xxxxxx, where xxxxxx is the AMADID number of the microarray.
For transfer of data into GeneSpring, the pattern information can be obtained from within the Feature Extraction tab-delimited profile text file, or it can be downloaded from the GeneSpring Web site.

MAGE-ML results
Differences between MAGE-ML and text result files
The MAGE-ML result file includes most of the same parameters, statistics and results as the FULL text result file, with the following differences:
• Scanner control parameters are included in the file
209. nFileName (text): Name of the scan file used for Feature Extraction.
FeatureExtractor_ArrayName (text): Microarray filename.
FeatureExtractor_ScanFileGUID (text): GUID of the scan file.
FeatureExtractor_DesignFileName (text): Design or grid file used for Feature Extraction.
FeatureExtractor_ExtractionTime (text): Time stamp at the beginning of Feature Extraction.
FeatureExtractor_UserName (text): Windows login name of the user who ran Feature Extraction.
FeatureExtractor_ComputerName (text): Name of the computer on which Feature Extraction was run.
FeatureExtractor_Version (text): Version of Feature Extractor.
FeatureExtractor_IsXDRExtraction (integer): Says whether the result is from an XDR extraction. 1 = True, 0 = False.
Table 18 List of parameters and options contained within the COMPACT text output file FEPARAMS table (continued)
FeatureExtractor_ColorMode (integer): A flag to indicate output color. 0 = One color (green only); 1 = 2-color.
FeatureExtractor_QCReportType (integer): Type of QC report to generate. 0 = Gene Expression; 1 = CGH_ChIP; 2 = miRNA; 4 = Streamlined CGH.
DyeNorm_NormFilename (text): Name of the dye normalization list file.
DyeNorm_NormNumProbes (integer): Number of probes in the dye normalization list.
Grid_IsGridFile (boolean)

QC FEPARAMS Table
Table 19 List of parameters
210. ngle (1) or double pass scan mode on the Agilent Scanner.
Grid_Name (text): Grid template name or grid file name.
Grid_Date (integer): Date the grid template or grid file was created.
Grid_NumSubGridRows (integer): Number of subgrid rows.
Grid_NumSubGridCols (integer): Number of subgrid columns.
Grid_NumRows (integer): Number of spots per row of each subgrid.
Grid_NumCols (integer): Number of spots per column of each subgrid.
Grid_RowSpacing (float): Space between rows on the grid.
Grid_ColSpacing (float): Space between columns on the grid.
Grid_OffsetX (float): In a dense-pack array, the offset in the X direction.
Table 18 List of parameters and options contained within the COMPACT text output file FEPARAMS table (continued)
Grid_OffsetY (float): In a dense-pack array, the offset in the Y direction.
Grid_NomSpotWidth (float): Nominal width, in microns, of a spot from the grid.
Grid_NomSpotHeight (float): Nominal height, in microns, of a spot from the grid.
Grid_GenomicBuild (text): The build of the genome used to create the annotation, if available. If the genome build is not available (not all designs have this information), it is not output. All recent and all future designs have it.
FeatureExtractor_Barcode (text): Barcode of the Agilent microarray, read from the scan image.
FeatureExtractor_Sample (text): Names of hybridized samples (red/green).
FeatureExtractor_Sca
211. ns is different from what is expected from the scanner. For instance, an XDR scan set with 100 and 10 for the PMT gain settings should yield a ratio close to 10. If this ratio is different than expected, the Feature Extraction program may or may not have performed correctly, but you should check the data in this case to confirm that the XDR combination is satisfactory. This message is more likely to appear as the low-intensity PMT gain setting gets closer to 1, because the percentage error in the PMT gain setting increases as the setting moves away from 100.

How each algorithm calculates a result
Place Grid
Step 1. Place a grid to find the nominal spot positions.
After the Feature Extraction program automatically determines the format of the grid, it initiates the next steps. The algorithm reduces the two-dimensional image data of the microarray to two one-dimensional data sets that are further processed to determine the layout of the grid on the microarray. Projection of the two-dimensional microarray image is performed to produce two one-dimensional data sets (projected signals). From the one-dimensional data sets, peaks of the projected signals are filtered to determine which peaks to retain for further processing, based on predetermined peak height and peak width thresholds. Nominal spacing between the features may be estimated based on a statistical determination of a most f
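The projection-and-peaks idea can be illustrated on a synthetic image. This is a sketch under stated assumptions, not Agilent's gridding code. Projection is taken as a plain sum along each axis, peaks are contiguous runs above a height threshold that last at least a minimum width, and nominal spacing is the most frequent rounded gap between neighbouring peaks. The last choice is an assumed reading of the truncated "statistical determination of a most f..." All function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

import numpy as np


def project(image: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Collapse a 2-D image into a row profile and a column profile by summing."""
    return image.sum(axis=1), image.sum(axis=0)


def find_peaks(profile: np.ndarray, min_height: float, min_width: int = 1) -> List[float]:
    """Centres of contiguous runs at or above min_height that last >= min_width samples."""
    peaks: List[float] = []
    start = None
    for i, value in enumerate(profile):
        if value >= min_height and start is None:
            start = i
        elif value < min_height and start is not None:
            if i - start >= min_width:
                peaks.append((start + i - 1) / 2)
            start = None
    if start is not None and len(profile) - start >= min_width:
        peaks.append((start + len(profile) - 1) / 2)
    return peaks


def nominal_spacing(peaks: List[float]) -> int:
    """Most frequent (rounded) distance between neighbouring peaks."""
    gaps = np.rint(np.diff(peaks)).astype(int).tolist()
    return Counter(gaps).most_common(1)[0][0]


# Synthetic 50 x 50 image with a 5 x 5 grid of 3-pixel spots every 10 pixels.
image = np.zeros((50, 50))
for r in range(5, 50, 10):
    for c in range(5, 50, 10):
        image[r - 1:r + 2, c - 1:c + 2] = 100.0

rows, cols = project(image)
row_peaks = find_peaks(rows, min_height=500.0, min_width=2)
print(row_peaks, nominal_spacing(row_peaks))
```

Using the mode of the gaps, rather than the mean, keeps a single missing or spurious peak from distorting the spacing estimate.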
212. ntrol feature set distribution. This is illustrated in the following figure.
Figure 61 The purple surface represents a smoothed fit to all the negative control feature inliers. The residual of the surface fit is the error on background subtraction in the Additive Error Estimation (see Step 16, Determine the error in the signal calculation, on page 266).

FeaturesInNegativeControlRange
This algorithm does two levels of filtering. First, it finds the features in the range of negative controls by fitting the negative controls to a surface and finding non-control features whose signal is within 3 standard deviations of that fit. Then it fits a Lowess curve to this set of features and interpolates from that fit to calculate a background signal for each feature. This method is recommended for Agilent GE1, GE2 and miRNA microarrays.
For high-density microarrays, this algorithm can take a long time to complete its calculations. To speed up the process, you can elect in the protocol to randomly select a small percentage of the total points with which to calculate the fit. To do this, set Perform Filtering for Fit to True, which significantly reduces the time needed for spatial detrending of high-density microarrays.
Figure 62 The purple surface represents the smoothed fit of all features pl
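The first filtering level of FeaturesInNegativeControlRange can be sketched as a simple band test. This assumes the negative-control surface has already been evaluated at each non-control feature, and that "standard deviation" means the spread of the negative-control residuals around their fit. Both are assumptions, and the function name is hypothetical. The second level (the Lowess fit through the selected features) is not shown.

```python
from statistics import pstdev
from typing import List, Sequence


def features_in_neg_ctrl_range(
    signals: Sequence[float],
    fit_at_features: Sequence[float],
    neg_ctrl_residuals: Sequence[float],
    k: float = 3.0,
) -> List[int]:
    """Indices of non-control features whose signal lies within k standard
    deviations of the negative-control surface evaluated at that feature."""
    spread = pstdev(neg_ctrl_residuals)  # spread of controls around their fit (assumed)
    return [
        i
        for i, (signal, fit) in enumerate(zip(signals, fit_at_features))
        if abs(signal - fit) <= k * spread
    ]


print(features_in_neg_ctrl_range([10.0, 12.0, 20.0], [10.0, 10.0, 10.0], [-1.0, 1.0, -1.0, 1.0]))
```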
213. o calculate background statistics.
1053: Feature Significance will be computed on Pixel Statistics since the Error Model is turned off.
1055: FE unable to find attached protocol(s) in database. Occurs if the attached protocol is not found in the database; FE searches the default protocol for extraction.
1056: Unable to get application type from grid template. If the application type is blank in the grid template, FE automatically treats the application type as Expression.
1057: Grid template online update status: reason. Failed to check the grid template for update, or failed to download the grid template during update.
1058: Failed to import new grid template into database: reason. Failed to import the design file during grid template update.

Index
Numerics
1-color detrend algorithm, 272
A
Agilent scanner protocols
  difference between gene expression and CGH protocols, 15
  GE2_11kx2_1005, 14, 55
  GE2_22k_1005, 14, 55
algorithms
  how calculate results, 238
  overview, 224
  results they produce, 230
annotations
  public accession numbers, 204
C
command line syntax, 299
commands
  add grid, 301
  addprotocol, 301
  exportprotocols, 303
  extract, 300, 303
  linkprotocoltogrid, 302
  removegrid, 302
  removeprotocol, 302
commands and arguments, 300
compute ratios and errors
  calculate feature log ratio, 279
  calculate processed signal, 278
  calculate pvalue and log ratio error, 279
  calculate surrogate value, 271
control
214. o hybridize with. Where possible, a public database identifier is used (e.g., TAIR locus identifier for Arabidopsis). Systematic name is reported ONLY if Gene name and Systematic name are different.
Description (text): Description of gene.
PositionX, PositionY (float): Found coordinates of the feature centroid, in microns.
LogRatio (float): Per-feature base 10 log of rProcessedSignal/gProcessedSignal. If SURROGATES are turned off, then:
-4 if DyeNormRedSig < 0.0 and DyeNormGreenSig > 0.0
4 if DyeNormRedSig > 0.0 and DyeNormGreenSig < 0.0
0 if DyeNormRedSig < 0.0 and DyeNormGreenSig < 0.0
LogRatioError (float): If SURROGATES are turned off, then 1000 if DyeNormRedSig < 0.0 OR DyeNormGreenSig < 0.0. If SURROGATES are turned on, then LogRatioError is the error of the log ratio calculated according to the error model chosen.
PValueLogRatio (float): Significance level of the LogRatio computed for a feature.
gProcessedSignal / rProcessedSignal (float), gProcessedSigError / rProcessedSigError (float), gNumPixOLHi / rNumPixOLHi (integer), gNumPixOLLo / rNumPixOLLo (integer): The signal left after
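The surrogates-off fallbacks for LogRatio and LogRatioError can be sketched as below. Two points are assumptions: the minus sign on -4 (symbols were lost in extraction, and a non-positive red signal against a positive green signal implies a very low ratio), and treating exactly 0 as non-positive. The function names are hypothetical.

```python
import math
from typing import Optional


def log_ratio_without_surrogates(red: float, green: float) -> float:
    """Per-feature log10(red / green) of the dye-normalized signals, with the
    fixed fallback values used when surrogates are turned off.

    Assumptions: the -4 sign (lost in extraction) and treating 0 as non-positive.
    """
    if red > 0 and green > 0:
        return math.log10(red / green)
    if red <= 0 and green > 0:
        return -4.0
    if red > 0 and green <= 0:
        return 4.0
    return 0.0  # both channels non-positive


def log_ratio_error_without_surrogates(red: float, green: float) -> Optional[float]:
    """1000 when either channel is non-positive; otherwise None, because the
    real value comes from the chosen error model (not modelled here)."""
    return 1000.0 if red <= 0 or green <= 0 else None


print(log_ratio_without_surrogates(100.0, 10.0), log_ratio_without_surrogates(-5.0, 10.0))
```

The large sentinel error of 1000 makes these clipped ratios statistically insignificant in downstream analysis.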
215. of feature calculated from the intensities of all inlier pixels that represent the feature after outlier pixel rejection. The number of inlier pixels is shown in the column NumPix.
Find Spots, BGMeanSignal: Average raw signal of the local background, calculated from the intensities of all inlier pixels that represent the local background of the feature after outlier pixel rejection. The number of inlier pixels is shown in the column BGNumPix.
Find Spots, BGMedianSignal: Median raw signal of the local background, calculated from the intensities of all inlier pixels that represent the local background of the feature after outlier pixel rejection. The number of inlier pixels is shown in the column BGNumPix.
Find Spots, NetSignal: MeanSignal minus Dark Offset.
Find Spots, IsSaturated: A Boolean flag of 1 indicates that the feature is saturated, that is, at least 50% of the inlier pixels in the feature have intensities above the saturation threshold. You can determine the saturation level of a feature by dividing NumSatPix by NumPix.
Flag Outliers, IsFeatureNonUnifOL: A Boolean flag of 1 indicates that the feature is a non-uniformity outlier: the measured feature pixel variance is greater than the expected feature pixel variance plus the confidence interval.
Flag Outliers, IsFeatPopOL: A Boolean flag of 1 indicates that the feature is a population outlier. This means that the feature MeanSignal is greater than the upper rejection boundary or less than the lower reje
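NetSignal and the saturation flag reduce to one subtraction and one ratio. The sketch below uses hypothetical function names and takes the pixel counts as inputs, because counting pixels above the saturation threshold depends on scanner settings not covered here.

```python
def net_signal(mean_signal: float, dark_offset: float) -> float:
    """NetSignal = MeanSignal - dark offset."""
    return mean_signal - dark_offset


def saturation_level(num_sat_pix: int, num_pix: int) -> float:
    """Fraction of inlier pixels above the saturation threshold (NumSatPix / NumPix)."""
    return num_sat_pix / num_pix


def is_saturated(num_sat_pix: int, num_pix: int) -> bool:
    """IsSaturated: at least 50% of the inlier pixels are saturated."""
    return saturation_level(num_sat_pix, num_pix) >= 0.5


print(net_signal(120.0, 20.0), is_saturated(50, 100))
```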
216. of nominal radius. In the protocol, this parameter is called the Spot Deviation Limit.
Find Spots, SpotAnalysis_kmeans_moi_reject_factor (float): Maximum allowable moment of inertia of the spot.
Find Spots, SpotAnalysis_isspot_factor (float): Factor from the statistics of the found feature and background that indicates if the spot is a spot.
Find Spots, SpotAnalysis_isweakspot_factor (float): Factor from the statistics of the found feature and background that indicates if the spot is a strong one.
Find Spots, SpotAnalysis_BackgroundThreshold (float): Factor by which the individual spot background may vary from the running average of all the background means.
Find Spots, SpotAnalysis_ROIType (integer): Type of Region of Interest.
Table 17 List of parameters and options contained within the FULL text output file FEPARAMS table (continued)
Find Spots, SpotAnalysis_UseNominalDiameterFromGT (integer; 1 = True, 0 = False): If True, the nominal spot diameter from the grid template is used as a starting point for the final spot diameter computation. If False, the nominal diameter is obtained from the grid placement algorithm.
Find Spots, SpotAnalysis_RejectMethod (integer): 0 = Pixel Outlier Rejection turned off; 2 = Standard Deviation based; 3 = Interquartile Range based.
Find Spots, SpotAnalysis_StatBoundFeat (float): Multiplier parameters for feature outlier rejec
on
gNonCtrlNumSatFeat, rNonCtrlNumSatFeat (integer): The number of saturated non-control features.
gNonCtrl99PrcntNetSig, rNonCtrl99PrcntNetSig (float): NetSignal intensity at the 99th percentile for all non-control probes.
gNonCtrl50PrcntNetSig, rNonCtrl50PrcntNetSig (float): NetSignal intensity at the 50th percentile for all non-control probes.
gNonCtrl1PrcntNetSig, rNonCtrl1PrcntNetSig (float): NetSignal intensity at the 1st percentile for all non-control probes.
gNonCtrlMedPrcntCVBGSubSig, rNonCtrlMedPrcntCVBGSubSig (float): The median percent CV of background-subtracted signals for inlier non-control probes.
gCtrleQCNumSatFeat, rCtrleQCNumSatFeat (integer): The number of saturated spike-in features.
gCtrleQC99PrcntNetSig, rCtrleQC99PrcntNetSig (float): NetSignal intensity at the 99th percentile of all spike-in probes.
gCtrleQC50PrcntNetSig, rCtrleQC50PrcntNetSig (float): NetSignal intensity at the 50th percentile of all spike-in probes.
gCtrleQC1PrcntNetSig, rCtrleQC1PrcntNetSig (float): NetSignal intensity at the 1st percentile of all spike-in probes.
geQCMedPrcntCVBGSubSig, reQCMedPrcntCVBGSubSig (float): The median percent CV of background-subtracted signals for inlier spike-in probes.
geQCSig2BkgLow1, reQCSig2BkgLow1 (float): Median ratio of net signal to BGUsed of all inlier features for the spike-in probe with the lowest concentration spiked in the red and green channels.
geQCSig2BkgLow2, reQCSig2BkgLow2 (float): Median ratio of net signal to BGUsed of all inlier features for the spike-in probe with the second lowest con
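The percentile and CV statistics can be sketched as follows. The sketch assumes a linear-interpolation percentile and a per-probe CV of 100 × SD / mean across the replicate background-subtracted signals. This section does not state Feature Extraction's exact percentile convention, and `percentile`, `net_signal_percentiles`, and `median_percent_cv` are hypothetical helpers.

```python
from statistics import mean, median, stdev


def percentile(values, pct):
    """Linear-interpolation percentile (an assumed definition)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of empty data")
    pos = (len(xs) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def net_signal_percentiles(non_control_net_signals):
    """99th, 50th and 1st percentile NetSignal, in the spirit of NonCtrl{99,50,1}PrcntNetSig."""
    return {p: percentile(non_control_net_signals, p) for p in (99, 50, 1)}


def median_percent_cv(bgsub_signals_by_probe):
    """Median over probes of 100 * SD / mean of each probe's replicate BG-subtracted signals."""
    cvs = [
        100.0 * stdev(signals) / mean(signals)
        for signals in bgsub_signals_by_probe.values()
        if len(signals) >= 2 and mean(signals) != 0
    ]
    return median(cvs)
```

A probe with replicates 90 and 110 has a CV of about 14.1%. The median over all probes gives a single reproducibility number per channel.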
on, i.e., features with IsNormalization = 1.
LOWESSDyeNormFactor
The LOWESS dye normalization method assumes that dye bias may be intensity dependent and therefore takes a local approach to dye normalization. The LOWESS dye normalization factor is calculated by fitting a locally weighted linear regression curve to the chosen normalization features. The amount of dye bias is determined from the curve at each feature's intensity, so each feature gets a different LOWESS dye normalization factor per channel. The LOWESS method corrects the log ratio data so that its central tendency after dye normalization lies along zero for all intensity ranges, assuming an equal number of up- and down-regulated features in any given signal range.
The LOWESS DyeNorm factor is derived for each channel by the following procedure:
a. A locally weighted linear regression (LOWESS) curve is fit to the data in a plot of M versus A, where M (y axis) is $M = \log(R/G)$ and A (x axis) is $A = \tfrac{1}{2}\log(R \cdot G)$. R and G represent the red and green background-subtracted signals. This LOWESS curve, fit through the central tendency of the M versus A plot, is defined as Mfit and is a function of A.
b. The dye normalization step transforms the data so that the central tendency Mfit at every A is shifted to zero.
c. After the correction factor is determined for any feature, it is split evenly over the red and green c
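The following minimal sketch of steps a to c rests on several assumptions: base-10 logs, a single LOWESS pass with tricube weights (no robustness iterations), and a hypothetical smoothing fraction `frac`. It is not the Feature Extraction implementation. It only shows how Mfit is removed from M and split evenly between the red and green channels, which leaves A unchanged.

```python
import math


def _local_linear_fit(xs, ys, x0, frac):
    """Tricube-weighted linear regression evaluated at x0 (one LOWESS pass)."""
    n = len(xs)
    k = min(n, max(2, math.ceil(frac * n)))  # number of neighbours in the window
    h = sorted(abs(x - x0) for x in xs)[k - 1]
    h = h if h > 0 else 1e-12
    w = [max(0.0, 1.0 - (abs(x - x0) / h) ** 3) ** 3 for x in xs]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, xs)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, xs))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return ybar + slope * (x0 - xbar)


def lowess_dye_normalize(red, green, frac=0.5):
    """Return (red_norm, green_norm, m_fit); assumes M = log10(R/G), A = 0.5*log10(R*G)."""
    m = [math.log10(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log10(r * g) for r, g in zip(red, green)]
    # Mfit is evaluated at each feature's own A value.
    m_fit = [_local_linear_fit(a, m, a0, frac) for a0 in a]
    # Split the correction evenly: half of Mfit comes off red, half goes onto green.
    red_norm = [r * 10 ** (-mf / 2) for r, mf in zip(red, m_fit)]
    green_norm = [g * 10 ** (mf / 2) for g, mf in zip(green, m_fit)]
    return red_norm, green_norm, m_fit
```

With a constant twofold red bias, Mfit is log10(2) everywhere, and every normalized log ratio becomes zero. Because the two halves of the correction cancel in R × G, the intensity axis A is not moved.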
on Description
dbj: DNA Database of Japan
emb: EMBL
gb: GenBank
gbpri: GenBank primate nucleotide accession number
gi: GenBank Gene Identifier
gp: GenPept protein identification number
mgi: Mouse Genome Informatics
pdb: Brookhaven Protein Data Bank
pir: NBRF PIR
prf: Protein Research Foundation
rafl: RIKEN full-length cDNA
ref: RefSeq
sp: SwissProt
tair: The Arabidopsis Information Resource
ug: UniGene
locuslink: LocusLink ID
wi: Whitehead

4 MAGE-ML XML File Results
How Agilent output file formats are used by databases 206
MAGE-ML results 207
Helpful hints for transferring Agilent output files 220
This chapter provides a listing of MAGE-ML results in the form of tables. Refer to these tables when you want to know the results reported in a particular file. This chapter also contains a section on TIFF files and formats.

How Agilent output file formats are used by databases
Pattern files should be loaded to the database via FTP, if possible, to ensure that the pattern element name attribute is used to name the pattern.
Data analysis programs must match up information about the layout and annotation of the microarray features with the profile result files for each microarray within their databases. Agilent provides this design information for its microarrays i
COMPACT Features Table
Table 23 Feature results contained in the COMPACT output text file (COMPACT FEATURES table)
Columns: Features (Green), Features (Red), Types, Options, Description.
FeatureNum (integer): Feature number.
Row (integer): Feature location row.
Col (integer): Feature location column.
SubTypeMask (integer): Numeric code defining the subtype of any control feature.
ControlType (integer): Feature control type. See XML Control Type output on page 220 for definitions. Options: 0 = control type none; 1 = positive control; -1 = negative control; 15000 = SNP; 20000 = not probe (see Chapter 4 for definition); 30000 = ignore (see Chapter 4 for definition).
ProbeName (text): An Agilent-assigned identifier for the probe synthesized on the microarray.
SystematicName (text): An identifier for the target sequence that the probe was designed to hybridize with. Where possible, a public database identifier is used (e.g., TAIR locus identifier for Arabidopsis). Systematic name is reported ONLY if Gene name and Systematic name are different.
PositionX, PositionY (float): Found coordinates of the feature centroid, in microns.
Table 23 (continued)
LogRatio: base 1
on Zone Percentage, Auto Estimate the Local Radius, LocalBGRadius
Default values (each hidden if Array Format is set to Automatically Determine):
Spot Deviation Limit: 8.0 for all formats except third party, for which it is set to 1.5.
Calculation of Spot Statistics Method: Use Cookie (all formats).
Cookie Percentage: 0.650 Single Density 25k; 0.561 Double Density 95k; 0.700 185k, 185k 10uM, 244k 10uM, 65 micron feature size; 0.750 30 micron feature size.
Exclusion Zone Percentage: 1.200 all formats except 30 micron feature size; 1.300 30 micron feature size.
Auto Estimate the Local Radius: True for Single Density and Double Density (25k, 95k); False for 185k, 185k 10uM, 65 micron feature size, 30 micron feature size, 244k 10uM.
LocalBGRadius: 100 when False, for 185k, 185k 10uM, 65 micron feature size, 244k 10uM.
Table 7 Default settings for miRNA_1200_Jun14 protocol (continued)
Protocol step, Parameter: Default Setting/Value (v12.0)
Pixel Outlier Rejection Method; RejectIQRFeat; RejectIQRBG; Statistical Method for Spot Values from Pixels
Flag Outliers: Compute Population Outliers; Minimum Population; IQRatio; Background IQRatio; Use Qtest for Small Populations; Report Population Outlier
only the A term is empirically determined. For all other Agilent protocols, the default selection in the protocol is to determine the B and C terms automatically. Here is how the Feature Extraction program calculates these terms:
• Saturated features are omitted from the population of negative control probes (NC). This NC set and the local background regions associated with these features are used in the calculations.
• Calculates the net signal.
• Calculates the pixel standard deviation and then squares it to yield the pixel variance.
• From a histogram plot of the number of features (or local backgrounds) versus net signal, finds the net signal value at the 25th percentile.
• From a histogram plot of the number of features (or local backgrounds) versus variance, finds the variance at the 25th percentile.
• Calculates the B term as 25th-percentile NetSignal × B Term Multiplier, and the C term as 25th-percentile Variance × C Term Multiplier.
The multipliers need to be determined for a given scanner. This tuning should use many images from different batches of microarrays, different users, and different processes. Different channels may need their own multipliers.
Measured feature or background variance:
$$\sigma^2 = \frac{1}{n-1}\sum_{i=0}^{n-1}\left(X_i - \bar{X}\right)^2$$
where n is the number of inlier pixels in the feature or background (i.e., NumPix or BGNumPix, respectively), and X is the raw pixel intensity of the feature or background inlier pixels
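Below is a sketch of the B and C term estimate. It assumes the negative-control features are given as (net signal, inlier pixels, saturated flag) tuples and uses a linear-interpolation 25th percentile. For brevity it uses only the feature regions and omits the associated local background regions. `estimate_b_c_terms` and `percentile` are hypothetical names, and the multipliers are scanner-specific values you would supply.

```python
from statistics import variance


def percentile(values, pct):
    """Linear-interpolation percentile (an assumed definition)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def estimate_b_c_terms(negative_controls, b_multiplier, c_multiplier):
    """Return (B term, C term) from (net_signal, inlier_pixels, is_saturated) tuples."""
    # Saturated negative controls are left out of the population.
    usable = [(net, pix) for net, pix, saturated in negative_controls if not saturated]
    net_25 = percentile([net for net, _ in usable], 25)
    # statistics.variance is the sample variance, dividing by n - 1.
    var_25 = percentile([variance(pix) for _, pix in usable], 25)
    return net_25 * b_multiplier, var_25 * c_multiplier
```

In this sketch, the saturated control (net signal 1000 in the test data) has no effect on either percentile.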
oolean of 1, or a Boolean of 0 if the grid has not been optimized.
Find Spots
This algorithm locates the exact size and centroid of each spot on the scanned microarray. Once the spot centroids have been located, the CookieCutter algorithm or WholeSpot algorithm defines the feature for each spot. The software then defines the local background for each spot based on the radius of a circle drawn around the spot. Next, the pixel outlier algorithm identifies outlier pixels in the feature and in the local background for each spot. These pixels are then omitted from further calculations. This is the only point where data is omitted; subsequent outlier analyses flag data but do not remove it.
Inlier pixels within the cookie area represent a feature, while the inlier pixels within the annulus around the feature, after excluding the exclusion zone, represent the local background.
The Feature Extraction program calculates the following values from these inlier pixels: mean, median, standard deviation, normalized IQR, and number of inlier pixels.
XDR extraction
This is the only step that is run twice on an XDR extraction. The spot placement and spot measurements are found separately for the high- and low-intensity scans. Then the XDR algorithm decides, on a feature-by-feature basis, which scan the data should come from (more on this follows). For features that are very bright in the high inten
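The geometry of the cookie, exclusion zone, and local background annulus can be sketched as follows. The sketch assumes three things: the cookie radius is Cookie Percentage × spot radius, the exclusion zone extends to Exclusion Zone Percentage × spot radius, and the outer background radius is given in pixels. These interpretations and the function name `cookie_and_annulus` are assumptions for illustration. Outlier rejection would then run on each pixel set separately.

```python
import math


def cookie_and_annulus(width, height, cx, cy, spot_radius,
                       cookie_pct=0.65, exclusion_pct=1.2, local_bg_radius=9.0):
    """Split pixel coordinates into a feature cookie and a local-background annulus.

    Assumed geometry (illustrative only):
      feature    : distance <= cookie_pct * spot_radius
      background : exclusion_pct * spot_radius < distance <= local_bg_radius
    Pixels in between lie in the exclusion zone and are used for neither.
    """
    cookie_r = cookie_pct * spot_radius
    exclusion_r = exclusion_pct * spot_radius
    feature, background = [], []
    for y in range(height):
        for x in range(width):
            d = math.hypot(x - cx, y - cy)
            if d <= cookie_r:
                feature.append((x, y))
            elif exclusion_r < d <= local_bg_radius:
                background.append((x, y))
    return feature, background
```

With a spot radius of 4 pixels, the cookie covers 21 pixels around the centroid. Pixels between 2.6 and 4.8 pixels away fall in the exclusion zone and do not contribute to either the feature or the background.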
or which it is set to True.
Table 5 Default settings for GE2_1200_Jun14 protocol (continued)
Protocol step, Parameter: Default Setting/Value (v12.0)
Optimize Grid Fit
The parameters and values for optimizing the grid differ depending on the format.
Grid Format: Automatically Determine. Recognized formats: 65 micron feature size, 30 micron feature size, and Third Party.
Iteratively Adjust Corners: True for all formats except Third Party; False for Third Party.
Adjustment Threshold: 0.300 for all formats except Third Party.
Maximum Number of Iterations: 5 for all formats except Third Party.
Found Spot Threshold: 0.200 for all formats except Third Party.
Number of Corner Features Side Dimension: 20 for all formats except Third Party.
Each of the last five parameters is hidden if Array Format is set to Automatically Determine.
Find Spots
Depending on the format selected by the software or by you, the default settings for this step change. See the following rows for the default values for finding spots.
Parameters: Spot Format; Use the Nominal Diameter from the Grid Template.
Spot Format: Automatically Determine. Recognized formats: same as those listed ab
orner Features Side Dimension
Default values and the formats that use them (for Iteratively Adjust Corners, Adjustment Threshold, Maximum Number of Iterations, Found Spot Threshold, and Number of Corner Features Side Dimension, in that order):
• True: 65 micron feature size, 30 micron feature size; False: Third Party
• 0.300: 65 micron feature size, 30 micron feature size; not applicable for Third Party
• 5: 65 micron feature size, 30 micron feature size; not applicable for Third Party
• 0.200: 65 micron feature size, 30 micron feature size; not applicable for Third Party
• 20: 65 micron feature size, 30 micron feature size; not applicable for Third Party
Find spots
The parameters and values differ depending on the microarray format.
Table 11 Find spots: default values in common and differences for spot formats
Parameters: Use the Nominal Diameter from the Grid Template; Spot Deviation Limit; Calculation of Spot Statistics Method; Cookie Percentage; Exclusion Zone Percentage; Auto Estimate the Local Radius; LocalBGRadius; Pixel Outlier Rejection Method; RejectIQRFeat; RejectIQRBG; Statistical Method for Spot Values from Pixels.
Default values: True; 8.0; Use Cookie; 0.650, 0.561, 0.700, 0.750; 1.200, 1.300; True; when False is the default, 100; when False is the default, 150; Inter Quartile Region; 1.42; 1.42; Use Mean; Standard Deviation.
Formats using default value: All; All except third party, where it is set to 1.5; All; SD 25k, TP; DD 95k; 185k, 185k 10uM, 65 mi
orrelation method to obtain origin X of subgrids is set to False, results obtained from the projection data analysis are used to estimate the origin. Selecting this option uses the same calculations as Feature Extraction version 10.7, 10.9, or earlier.
When the flag is set to True, the software performs one extra step of correlation following the projection data analysis to get the origin. This option is particularly useful in cases where pack edges have dim spots and are failing to grid.
Optimize Grid Fit
Step 2. Iteratively adjust grid by examining the corner spots
This algorithm improves the grid fit by leveraging the Spot Finder algorithm. Looking only at the specified square area of features at each corner of the microarray, it performs the iteratively-adjust-corners method up to the maximum number of iterations specified in the protocol. It adjusts the grid only if the following criteria are met:
• The absolute average difference between the grid position and the spot position is within the specified Adjustment Threshold.
• The number of features considered found by the spot finder algorithm is within the specified Found Spot Threshold.
Find Spots
Step 3. Locate the spot centroids
The calculation is based on an iterative Bayesian probability-based pixel classification. A binary feature mask is created that
ove, except that 244k 10uM replaces 65 micron feature size, 10 micron scans.
Hidden if Array Format is set to Automatically Determine. True: All Formats.
Table 5 Default settings for GE2_1200_Jun14 protocol (continued)
Protocol step, Parameter: Default Setting/Value (v12.0). Each of the following is hidden if Array Format is set to Automatically Determine.
Spot Deviation Limit: 8.0 for all formats except third party, for which it is set to 1.5.
Calculation of Spot Statistics Method: Use Cookie (all formats).
Cookie Percentage: 0.650 Single Density 25k; 0.561 Double Density 95k; 0.700 185k, 185k 10uM, 244k 10uM, 65 micron feature size; 0.750 30 micron feature size.
Exclusion Zone Percentage: 1.200 all formats except 30 micron feature size; 1.300 30 micron feature size.
Auto Estimate the Local Radius: True for Single Density and Double Density (25k, 95k); False for 185k, 185k 10uM, 65 micron feature size, 30 micron feature size, 244k 10uM.
LocalBGRadius: 100 when False, for 185k, 185k 10uM, 65 micron feature size, 244k 10uM.
pixels in a feature are above the saturation threshold.
Boolean flag indicating whether a feature is a NonUniformity Outlier. A feature is non-uniform if the pixel noise of the feature exceeds a threshold established for a uniform feature.
gIsFeatPopnOL, rIsFeatPopnOL (boolean): g/rIsFeatPopnOL = 1 indicates that the feature is a population outlier in g/r. Probes with replicate features on a microarray are examined using population statistics. A feature is a population outlier if its signal is less than a lower threshold or exceeds an upper threshold, both determined using a multiplier (1.42) times the interquartile range (IQR) of the population.
gIsWellAboveBG, rIsWellAboveBG (boolean): Boolean flag indicating whether a feature is well above background: the feature passes g/rIsPosAndSignif, and additionally g/rBGSubSignal is greater than 2.6 × g/rBG_SD. You can change the multiplier (2.6).
Other text result file annotations
The following public accession numbers may or may not show up in the Feature Results section of the output text file.
Table 26 Public accession numbers in the output text file
Abbreviati
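The two flag definitions above can be sketched in Python. `is_well_above_bg` follows the stated rule (IsPosAndSignif and BGSubSignal > 2.6 × BG_SD). `population_outlier_flags` assumes the boundaries are Q1 - 1.42·IQR and Q3 + 1.42·IQR. It also uses a hypothetical minimum of four replicates in place of the protocol's Minimum Population setting. Both function names are illustrative.

```python
from statistics import quantiles


def is_well_above_bg(is_pos_and_signif, bg_sub_signal, bg_sd, multiplier=2.6):
    """IsWellAboveBG: passes IsPosAndSignif and BGSubSignal > multiplier * BG_SD."""
    return bool(is_pos_and_signif) and bg_sub_signal > multiplier * bg_sd


def population_outlier_flags(replicate_signals, k=1.42):
    """Flag replicates outside an assumed [Q1 - k*IQR, Q3 + k*IQR] window."""
    if len(replicate_signals) < 4:  # hypothetical minimum population
        return [False] * len(replicate_signals)
    q1, _, q3 = quantiles(replicate_signals, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [s < low or s > high for s in replicate_signals]
```

Unlike pixel outlier rejection, these flags only mark features; consistent with the Find Spots description, no data is removed at this stage.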
ple Images that Agilent provides on the Feature Extraction software installation CD.
Data from the FEPARAMS table
BGSubtractor_BGSubMethod = 7; BGSubtractor_BackgroundCorrectionOn = 0; BGSubtractor_SpatialDetrendOn = 1.
The BGSubMethod of 7 corresponds to the No Background Subtraction method (see Table 17 on page 129 of this guide). Global Background Adjustment is turned off. Spatial Detrending is turned on.
Data from the STATS table
gLinearDyeNormFactor = 15.881; rLinearDyeNormFactor = 4.14607.
LowessDyeNormFactor is not shown in the Feature Extraction result file. This value can be back-calculated using the DyeNormSignal equation on page 245.
Data from the FEATURES table
Results from the Find and Measure Spots algorithm:
FeatureNum 12519: gNumPix = 62, rNumPix = 62, gMeanSignal = 3021.774, rMeanSignal = 13502.52, gPixSDev = 187.8805, rPixSDev = 1102.547
Results from the Correct Bkgd and Signal Biases algorithm:
FeatureNum 12519: gSpatialDetrendSurfaceValue = 81.5464, rSpatialDetrendSurfaceValue = 72.2993
FeatureNum 12519: gBGUsed = 81.5464, rBGUsed = 72.2993, gBGSDUsed = 3.5514, rBGSDUsed = 5.34552, gBGSubSignal = 2940.23, rBGSubSignal = 13430.2
FeatureNum 12519: gIsPosAndSignif = 1, rIsPosAndSignif = 1, gIsWellAboveBG = 1, rIsWellAboveBG = 1
rBGUsed = rSpatialDetrendSurfaceValue: 72.2993 = 72.2993
Note that this equation is va
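The published FEATURES values for feature 12519 are consistent with BGSubSignal = MeanSignal - BGUsed. In this example, BGUsed equals the spatial detrend surface value because no background subtraction method is selected. The short check below reproduces the arithmetic. `bg_sub_signal` and `FEATURE_12519` are hypothetical names that hold the values copied from the worked example.

```python
# Values for FeatureNum 12519, copied from the worked example.
FEATURE_12519 = {
    "gMeanSignal": 3021.774, "rMeanSignal": 13502.52,
    "gBGUsed": 81.5464, "rBGUsed": 72.2993,
    "gBGSDUsed": 3.5514, "rBGSDUsed": 5.34552,
}


def bg_sub_signal(mean_signal, bg_used):
    """BGSubSignal = MeanSignal - BGUsed (BGUsed is the detrend surface value here)."""
    return mean_signal - bg_used


g_bgsub = bg_sub_signal(FEATURE_12519["gMeanSignal"], FEATURE_12519["gBGUsed"])
r_bgsub = bg_sub_signal(FEATURE_12519["rMeanSignal"], FEATURE_12519["rBGUsed"])
print(f"gBGSubSignal = {g_bgsub:.2f}, rBGSubSignal = {r_bgsub:.1f}")
# prints: gBGSubSignal = 2940.23, rBGSubSignal = 13430.2
```

The same numbers also explain gIsWellAboveBG = 1: 2940.23 is far above 2.6 × 3.5514.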
gTotalProbeError: Computed from gProcSignalError (Equation 36).
gTotalGeneSignal: The sum of gTotalProbeSignal over the NumProbesPerGene probes of the gene (Equation 37):
$\mathrm{gTotalGeneSignal} = \sum_{i} \mathrm{gTotalProbeSignal}_i$, where i runs over the NumProbesPerGene probes of the gene.
Table 37 Statistics and Results for the MicroRNA Analysis (continued; see also Table 32, Algorithms/Protocol Steps and the results they produce, on page 230)
Feature or Stat: Equation or Description
gTotalGeneError: Combines the gTotalProbeError values of the NumProbesPerGene probes of the gene (Equation 38).
gGeneSignal: This signal is the log-transformed value of the gTotalGeneSignal value, calculated for each of the four miRNA spike-in genes within the subtype mask 8196.
gProbeRatio: This is the log-transformed value of the ratio of the TotalGeneSignal value for the longer probe divided by the TotalGeneSignal value for the shorter probe. The probe length can be determined from the probe name itself; for example, in dmr_6_17, 17 is the probe length.
Also listed in this part of the table: gEffectiveFeatureSizeFraction, gFeatureUniformityAnaomalyFraction, gUsedDefaultEffectiveFeatureSize.
IsGeneDetected: This flag marks a gene as detected or not detected. It is computed by checking all the probes that make up the gene. A probe is considered detected if its signal is some multiple of its error, where the multiplier is defined in the Feature Extraction protocol (default 3). If one probe of the set of probes comprising the gene is dete
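The gene-level aggregation can be sketched as below. Summing probe signals follows Equation 37. Several other parts are assumptions, because the corresponding parts of the source are garbled or truncated: the root-sum-square combination of probe errors, the rule that a probe is detected when its signal exceeds multiplier × error, the rule that a gene is detected when any probe is detected, and the base-10 log in `probe_ratio`. All function names are hypothetical.

```python
import math


def total_gene_signal(probe_signals):
    """gTotalGeneSignal: sum of gTotalProbeSignal over the gene's probes."""
    return sum(probe_signals)


def total_gene_error(probe_errors):
    """Assumed root-sum-square combination of gTotalProbeError values."""
    return math.sqrt(sum(e * e for e in probe_errors))


def is_gene_detected(probe_signals, probe_errors, multiplier=3.0):
    """Assumed rule: detected if any probe's signal exceeds multiplier * its error."""
    return any(s > multiplier * e for s, e in zip(probe_signals, probe_errors))


def probe_length(probe_name):
    """Probe length from a name such as 'dmr_6_17' (last underscore-separated field)."""
    return int(probe_name.rsplit("_", 1)[-1])


def probe_ratio(long_probe_gene_signal, short_probe_gene_signal):
    """log10 of TotalGeneSignal(longer probe) / TotalGeneSignal(shorter probe); base assumed."""
    return math.log10(long_probe_gene_signal / short_probe_gene_signal)
```

To compute gProbeRatio, you would call `probe_length` on both probe names to decide which gene signal goes in the numerator.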
See the Feature Extraction 12.0 User Guide to learn the purpose of all the parameters and settings and how to modify them.
Agilent protocols are meant for use with Agilent microarrays scanned with an Agilent scanner. They are intended for use with arrays that use Agilent default lab procedures (labeling, hybridization, wash, and scanning methods). The non-Agilent protocol is meant for use with non-Agilent microarrays that are scanned with an Agilent scanner.

1 Default Protocol Settings
Default Protocol Settings: an Introduction 14
Tables of Default Protocol Settings 16
Differences in Protocol Settings Based on Each Step 55
When a protocol is assigned to an extraction set, the software loads a set of protocol parameter values and settings that affect the process and results for Feature Extraction. Parameter values in the protocol depend on the microarray type and your experiment.
The following pages list the default settings for each of the protocol templates shipped or downloaded with the software. Each protocol template represents a different microarray type. You can display these settings and values when you open the Protocol Editor for each of the protocol templates.

Default Protocol Settings: an Introduction
To learn more about changing the default values for the protocols, see the Featu
rage signal-to-noise value of log ratios of all inlier spike-in probe sets with a minimum number of replicates.
The additive error estimated for the microarray in the green channel.
The additive error estimated for the microarray in the red channel.
Total number of features that show up in the output file.
Number of up-regulated non-control probes.
Number of down-regulated non-control probes.
For the 2-color QC report: slope of the linear regression fit of the plot of the expected versus observed average log ratio for each spike-in probe.
For the 2-color QC report: intercept of the linear regression fit of the plot of the expected versus observed average log ratio for each spike-in probe.
Table 21 Stats results contained in the text output file (STATS table, continued)
Stats (green channel, red channel): eQCObsVsExpCorr; NumIsNorm; ROI Width; ROI Height; CentroidDiffX; CentroidDiffY; NumFoundFeat; MaxNonUnifEdges; MaxSpotNotFoundEdges; gMultDetrendRMSFit, rMultDetrendRMSFit.
Types: float; integer; float; float; float; integer; float; float; float.
eQCObsVsExpCorr: For the 2-color QC report, the R² value of the linear regression fit of the plot of the expected versus observed average log ratio for each spike-in probe.
NumIsNorm: Number of features used for normalization.
ROI Width, ROI Height: The width or height in pixels of the region of intere
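The slope, intercept, and R² reported for the 2-color QC report describe a linear regression line. Below is a minimal ordinary least-squares sketch. It assumes that the expected log ratio is the x value and the observed average log ratio is the y value. `linear_fit` is a hypothetical helper, not a Feature Extraction function.

```python
from statistics import mean


def linear_fit(expected, observed):
    """Least-squares fit observed = slope * expected + intercept; returns (slope, intercept, R^2)."""
    mx, my = mean(expected), mean(observed)
    sxx = sum((x - mx) ** 2 for x in expected)
    syy = sum((y - my) ** 2 for y in observed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(expected, observed))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared
```

A slope near 1, an intercept near 0, and an R² near 1 would indicate that the observed spike-in log ratios track their expected values.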
Table 17 List of parameters and options contained within the FULL text output file (FEPARAMS table, continued)
Protocol step for all rows below: Compute Bkgd Bias and Error.
BGSubtractor_AddErrorGreen (float)
BGSubtractor_AddErrorRed (float)
BGSubtractor_MultNcAutoEstimate (float, 0 to 10)
BGSubtractor_MultRMSAutoEstimate (float, 0 to 10)
BGSubtractor_MultResidualsRMSAutoEstimate (float, 0 to 10)
BGSubtractor_AutoEstimateNCOnlyThresh (float)
BGSubtractor_UseSurrogates (integer; 1 = True, 0 = False)
BGSubtractor_AddErrorGreen: This additive error component in the green channel is entered in the protocol when auto estimation is turned off. When auto estimation is turned on, the estimated error value appears in the Stats table as AddErrorEstimateGreen.
BGSubtractor_AddErrorRed: This additive error component in the red channel is entered in the protocol when auto estimation is turned off. When auto estimation is turned on, the estimated error value appears in the Stats table as AddErrorEstimateRed.
BGSubtractor_MultNcAutoEstimate: Multiplier for the first term (standard deviation of the inlier negative controls) in the additive error equation.
BGSubtractor_MultRMSAutoEstimate: Multiplier for the second term (gMultSpatialDetrendRMSFit) in the additive error equation.
BGSubtractor_MultResidualsRMSAutoEstimate: Multiplier
rance
Parameters (continued): Omit Background Population Outliers; Allow Positive and Negative Controls; Signal Characteristics; Normalization Correction Method.
Default values: 2.6; True; AllFeatureTypes; True; False; True; 0; False; Most Conservative; 0.0900; 0.0900; False; 30; False; 30; True; Automatically Determine; Use Rank Consistent Probes; 0.050; False; False; False; OnlyPositiveAndSignificantSignals; Lowess Only.
Table 6 Default settings for GE2-NonAT_1100_Jul11 protocol (continued)
Protocol step, Parameter: Default Setting/Value (v12.0)
Max Number Ranked Probes: 8000
Compute Ratios, Peg Log Ratio Value: 4.00
Calculate Metrics, Spikein Target Used: False
Calculate Metrics, Min Population for Replicate Stats: 5
Calculate Metrics, PValue for Differential Expression: 0.010000
Calculate Metrics, Percentile Value: 75.00
Generate Results, Generate Single Text File: True
Generate Results, JPEG Down Sample Factor: 4

Table 7 Default settings for miRNA_1200_Jun14 protocol
miRNA_1200_Jun14
This protocol is a miRNA protocol for use with the miRNA Microarray System with the miRNA Complete Labeling and Hyb Kit, lab protocol v2.0 or higher (publication number G4170-90011).
Protocol step, Parameter: Default Setting/Value (v12.0)
Place Grid, Array Format: Automatically Determine. Recognized formats: Single
For any format automatically determined or selected by you, the software uses the defaul
ranked values of the distribution of signals. Another indicator of signal range for the microarray is the number of features that are saturated in the scanned image (for example, NumSat).
Figure 21 QC Report: Net Signal Statistics (example, red channel. Agilent SpikeIns: Saturated Features 0; 99% of Sig Distrib 24937; 50% of Sig Distrib 2351; 1% of Sig Distrib 160. Non Control probes: Saturated Features; 99%, 50%, and 1% of Sig Distrib)

Negative Control Stats
The Negative Control Stats table includes the average and standard deviation of the net signals (mean signal minus scanner offset) and of the background-subtracted signals for both the red and green channels in the negative controls. These statistics filter out saturated features, feature non-uniformity outliers, and population outliers, and give a rough estimate of the background noise on the microarray. SNP probes are not included in these statistics.
Figure 22 QC Report: Negative Control Stats (red channel rows: Average Net Signals, StdDev Net Signals, Average BG Sub Signal, StdDev BG Sub Signal, BG Noise)

Plot of Background Corrected Signals
Figure 23 is a plot of the log of the red background-corrected signal versus the log of the green background-corrected signal for non-control inlier features. The linearity or curvat
rate files, with one for each table. To select whether to generate one file or three, see the Feature Extraction 12.0 User Guide. To display the text results file in an easy-to-read format, see the Feature Extraction 12.0 User Guide.
Parameters/options (FEPARAMS)
The topmost section of the result file contains the parameters and option choices that you used to run Feature Extraction.
FULL FEPARAMS Table
Table 17 List of parameters and options contained within the FULL text output file (FEPARAMS table)
Columns: Protocol Step, Parameters, Type, Options, Description.
Protocol_Name (text): Name of the protocol used.
Protocol_date (text): Date the protocol was last modified.
Scan_date (text): Date the image was scanned.
Scan_ScannerName (text): Serial number of the scanner used.
Scan_NumChannels (integer): Number of channels in the scan image.
Scan_MicronsPerPixelX (float): Number of microns per pixel in the X axis of the scan image.
Scan_MicronsPerPixelY (float): Number of microns per pixel in the Y axis of the scan image.
Scan_OriginalGUID (text): The globally unique identifier for the scan image.
Grid_Name (text): Grid template name or grid file name.
Grid_Date (integer): Date the grid template or grid file was created.
Grid_NumSubGridRows (integer): Number of subgrid rows.
Grid_NumSubGridCols (integer): Number of subgrid columns.
Grid_NumRows (integer): Number of spots per row of each subgrid.
Grid_NumCols (integer): N
ratio of the whole spot measurement versus the cookie measurement.
Fraction (Num/TotalNum) of the features examined that had anomalous ratios. This gives a measure of the percentage of representative spots that are strange (e.g., donuts, super-hot spots, hot crescents).
Reports whether or not the default effective feature size was used. If the default was used, the stat is 1; if the effective feature size was estimated, the stat value is 0.
The protocol lets you enter the Percentile Value at which the intensity of the noncontrol signals is recorded. All protocols specify the 75th percentile. This number is the intensity of the noncontrol signals at the 75th percentile. This stat is used to normalize 1-color data.
These are metrics for miRNA only. This is the value of the TotalGeneSignal for all genes at the 99th percentile.
Table 21 Stats results contained in the text output file (STATS table, continued)
Stats (green channel, red channel) and types: gTotalSignal75pctile (float); gNegCtrlSpread, rNegCtrlSpread (float); gNonCtrlNumWellAboveBG, rNonCtrlNumWellAboveBG (integer); ImageDepth (string); AFHold (float); gPMTVolts, rPMTVolts (float).
gTotalSignal75pctile: These are metrics for miRNA only. This is the value of the TotalGeneSignal for all genes at the 75th percentile.
gNegCtrlSpread, rNegCtrlSpread: The root mean square (RMS) of t
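Below is a sketch that records the 75th-percentile signal of the non-control features and uses it to scale 1-color data. Dividing by the percentile value is an assumed normalization chosen for illustration, since this section does not give the exact formula. The percentile uses linear interpolation, and `percentile` and `percentile_normalize` are hypothetical names.

```python
def percentile(values, pct):
    """Linear-interpolation percentile (an assumed definition)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def percentile_normalize(non_control_signals, signals, pct=75.0):
    """Return (reference, scaled signals), dividing each signal by the pct-th percentile."""
    reference = percentile(non_control_signals, pct)
    return reference, [s / reference for s in signals]
```

Scaling each array by its own 75th-percentile value puts arrays of different overall brightness on a comparable scale. This works because the percentile is taken over the non-control features only.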
re Extraction 12.0 User Guide. To learn about the naming of the protocol templates, see the Feature Extraction 12.0 User Guide.
Agilent provides new and updated protocols on the eArray Web site. If you set up an eArray login in Feature Extraction, the software can automatically download and install protocol updates from eArray. See the Feature Extraction 12.0 User Guide for more details.
This chapter presents tables that display the default settings for each protocol. Parameter values depend on:
• microarray type
• lab protocol
• formats
• scanner used
The following table lists the names of the nonremovable protocols and where you can find the tables that list their default values.
Table 1 Location of protocol template default settings
Protocol Template name: Location in chapter
CGH_1200_Jun14: page 16
ChIP_1200_Jun14: page 23
GE1_1200_Jun14: page 30
GE2_1200_Jun14: page 36
GE2-NonAT_1100_Jul11: page 43
miRNA_1200_Jun14: page 48

Differences between CGH and gene expression microarrays
To see the differences in some default settings between protocols, go to GE2_1200_Jun14 on page 36.
CGH microarrays possess a different negative control sequence scheme than the gene expression microarrays. The gene expression microarrays have many replicate negative control features using only one sequence
re intensity is significant compared to the background intensity, two kinds of tests are available: the t-test and the WellAboveBG test. Both of these tests depend upon an estimate of background error. The default protocol for older Agilent protocols still uses pixel statistics of local background regions to estimate background error in the 2-sided t-test. Newer Agilent protocols use an improved estimate of background error: the additive error calculated from the Agilent error model. You can choose between these two background error estimates in the protocol parameter field Significance for IsPosAndSignif and IsWellAboveBG.
The WellAboveSDMulti confidence test is used to determine whether the feature background-subtracted signal is well above its background error.
Surrogates are calculated here and depend on the significance model used. Given the standard t-test, the surrogates are calculated exactly as before. Given the new significance test based upon additive error, the surrogate value is determined by the additive error and the p-value.
The program can also use a multiplicative detrend algorithm, if selected (or if it is the default in the protocol), to provide a surface fit that accounts for the dome effect that can happen when microarrays are processed. Placing the error model calculation step before the significance calculation permits the result of the error model calculation to be used
2 QC Report Results
QC Reports 68
QC Report Headers 87
Feature Statistics 90
Histogram of LogRatio plot 103
QC Report Results in the FEPARAMS and Stats Tables 121
QC Metric Set Results 122
QC reports include statistical results to help you evaluate the reproducibility and reliability of your single-microarray data. This chapter describes each of the five types of QC report (2-color Gene Expression, 1-color Gene Expression, Streamlined CGH, CGH_ChIP, and microRNA (miRNA)) and how each can help you interpret the performance of your microarray system.
Use plots and statistics from the report to:
• Set up your own run charts of statistical values versus time or experiment number, to track the performance of one microarray compared to other microarrays.
• Monitor upstream lab protocols, such as the performance of your hybridization and washing steps.
• Monitor the effect of changing Feature Extraction protocol parameters on the performance of your data analysis.
If you incorporate a set of QC metrics in your extraction, those results appear on the final page of the QC report as an Evaluation Table.

QC Reports
This section contains example QC reports and points out the different sections that appear on the reports. The reports in this section are examples; the actual contents of the reports vary depending on the protocol
241. replicated probes with multiple features, normalized to their replicate average, for the multiplicative detrending set.

Background subtraction options:
• Either minimum feature or minimum local background across the microarray (global method)
• Average of local backgrounds (global method)
• Average of negative controls for background (global method)
• Local background corresponding to each feature (local method)
• Minimum feature across the microarray (global method)
• No background subtraction

Table 17 List of parameters and options contained within the FULL text output file (FEPARAMS table)

Protocol step: Compute Bkgd Bias and Error
BGSubtractor_MaxPVal (float): The pValue at which a feature is determined to be statistically significant above background.
BGSubtractor_WellAboveMulti (float): The number of standard deviations above background at which the feature is flagged as well above background.
BGSubtractor_BackgroundCorrectionOn (integer): 1 = True, globally adjust background turned on; 0 = False, globally adjust background turned off.
BGSubtractor_BgCorrectionOffset: Adjust the signal of all features by an offset constant so that very low
242. requent distance between centers of retained peaks that are adjacent to one another. Coordinates for the features on the microarray relative to the X and Y axes are generated based on the selected peaks and peak spacing. The grid is then adjusted for rotation and skew.

The background peak shift flag helps to improve the gridding. Ideally, all background pixels should have a gray value of zero; in practice, these values are nonzero. When this flag is set to True, the algorithm determines the pixel value of the background pixels from the histogram of the image. All pixels with a nonzero value within the background window are set to zero, which reduces the contribution of background pixels to the two one-dimensional projected signals. This shift in the peak of the background signal leads to better determination of peaks.

The following figures illustrate the result of applying Background Peak Shifting. Figure 50 is a histogram of a typical 30-micron feature array before Background Peak Shifting. Figure 51 depicts the same array after applying Background Peak Shifting. Note that this operation is done internally in the grid placement algorithm; the actual image data remains unchanged. Some variation in the results is expected with and without use of this flag, because the grid positions obtained differ.

[Histogram plot: Frequency, Red and Green channels]
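The gridding steps above can be sketched on a toy image. This is a hypothetical illustration, not Agilent's algorithm: it assumes the histogram peak (most frequent gray level) is the background level, and it assumes a user-chosen `window` half-width around that peak; the guide does not give either detail. The image is copied rather than modified, mirroring the note that the actual image data remains unchanged. The last helper returns the most frequent distance between adjacent retained peak centers.

```python
from collections import Counter
from typing import List, Tuple

Image = List[List[int]]


def background_peak_shift(image: Image, window: int) -> Image:
    """Return a copy in which pixels within `window` gray levels of the
    histogram peak (assumed background level) are set to zero."""
    histogram = Counter(p for row in image for p in row)
    background_level = histogram.most_common(1)[0][0]
    return [[0 if abs(p - background_level) <= window else p for p in row]
            for row in image]


def projections(image: Image) -> Tuple[List[int], List[int]]:
    """The two one-dimensional projected signals: row sums and column sums."""
    row_sums = [sum(row) for row in image]
    col_sums = [sum(col) for col in zip(*image)]
    return row_sums, col_sums


def most_frequent_spacing(peak_centers: List[int]) -> int:
    """Most frequent distance between adjacent retained peak centers."""
    gaps = [b - a for a, b in zip(peak_centers, peak_centers[1:])]
    return Counter(gaps).most_common(1)[0][0]


image = [[5, 5, 5, 5], [5, 90, 5, 90], [5, 5, 5, 5]]
print(projections(background_peak_shift(image, window=1)))  # ([0, 180, 0], [0, 90, 0, 90])
print(most_frequent_spacing([10, 30, 50, 71, 91]))          # 20
```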
243. ric about the log ratio axis (Figure 67), whereas after adjustment the data is more symmetric (Figure 68).

[Plot: LogRatio vs. Avg_grProcessed_Signal]
Figure 67 Log ratios calculated from unadjusted background-subtracted signals

[Plot: LogRatio vs. Avg_grProcessed_Signal]
Figure 68 Log ratios calculated from adjusted background-subtracted signals

How is the Adjust background globally pad used?

If Adjust background globally is selected, you can enter a constant between 0 and 500, called the pad value, which forces the log ratio of red/green towards zero. The value of the pad is expressed in raw counts before dye normalization. The Feature Extraction program assumes that this value applies to the red or green channel with the smallest mean signal, and automatically computes the corresponding raw value in the other channel that would yield a corrected log ratio of zero after dye normalization.

The red and green feature signals are analyzed for rank consistency. If red signal is plotted vs. green signal and the slope of the rank-consistent features is > 1, then the pad value is assigned to the green channel. If the slope is < 1, the value is assigned to the red channel. For instance, if you set Adjus
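A small sketch of the pad-assignment rule, under stated assumptions. The rule (slope > 1 pads green, slope < 1 pads red, the other channel gets the matching raw value) and the worked numbers (pad 50 with slope 1.2 gives 60 for red; pad 50 with slope 0.5 gives 100 for green) come from the guide. The function name is hypothetical, the slope is taken as an input rather than estimated from rank-consistent features, and the slope = 1 branch is an assumption because the guide does not describe that case.

```python
from typing import Tuple


def global_background_pads(pad: float, slope: float) -> Tuple[float, float]:
    """Return (red_pad, green_pad) in raw counts for a user pad value.

    `slope` is the slope of the rank-consistent features when red signal is
    plotted against green signal.
    """
    if not 0 <= pad <= 500:
        raise ValueError("pad must be between 0 and 500")
    if slope <= 0:
        raise ValueError("slope must be positive")
    if slope > 1:
        # Pad goes to green; red gets the raw value that matches after dye normalization.
        return pad * slope, pad
    if slope < 1:
        # Pad goes to red; green gets the matching raw value.
        return pad, pad / slope
    # slope == 1 is not described in the guide; equal pads are an assumption.
    return pad, pad


print(global_background_pads(50, 1.2))  # (60.0, 50)
print(global_background_pads(50, 0.5))  # (50, 100.0)
```

Dividing the red pad by the slope (as dye normalization would) gives back the green pad, which is why the pair yields a log ratio of zero.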
244. rm Spatial Detrending Signal Correction Adjust Background Globally Signal Correction Perform Multiplicative Detrending Detrend on Replicates Only Filter Low signal probes from Fit Neg Ctrl Threshold Mult Detrend Factor Perform Filtering for Fit Use polynomial data fit instead of LOESS Polynomial Multiplicative DetrendDegree 1 0.09000 3 No Background Subtraction Use Error Model for Significance 0.01 13 True FeaturesInNegativeControlRange True True False True True True 5 Use Window Average True

Table 4 Default settings for GE1_1200_Jun14 protocol (continued)

Protocol step; Parameter: Default Setting Value (v12.0)
Robust Neg Ctrl Stats: False
Choose universal error or most conservative: Most Conservative
MultErrorGreen: 0.1000
Auto Estimate Add Error Green: True
Use Surrogates: True
Calculate Metrics
  Spikein Target Used: True
  Min Population for Replicate Stats: 5
  Grid Test Format: Automatically Determine (recognized formats: 60-micron and 30-micron feature size, third party)
  PValue for Differential Expression: 0.010000
  Percentile Value: 75.00
Generate Results
  Type of QC Report: Gene Expression
  Generate Single Text File: True
  JPEG Down Sample Factor: 4

GE2_1200_Jun14

This is a 2-color gene expression protocol for use with the Two-color Microarra
245. rols: GE1, GE2, and miRNA: 0; CGH and ChIP: 0; non-Agilent: 4.

Multiplier for the third term in the equation. The third term is based on the width of the distribution of signals used in the background spatial detrending set after the background surface has been subtracted out. When the background detrending set includes a group of features well distributed across the microarray with a variety of sequences, the width of the distribution of the signals of these features after background subtraction is a very good estimate of the uncertainty of the dim signals, or the additive error. GE1, GE2, and miRNA: 1; CGH and ChIP: 0; non-Agilent: 0.

Step 17 Calculate the significance of feature intensity relative to background (IsPosAndSignif)

The significance of the feature intensity compared to the background intensity (local or global) is calculated using two different significance tests: one uses pixel statistics for both the feature and the background values, and the other uses the additive error from the Error Model calculation for the background value.

Significance based on pixel statistics

This method to determine significance uses the 2-sided Student's t-test, with the mean signal for the feature and the background correction for the background. It is implemented as an incomplete Beta Function approximation.

$$ t = \frac{\bar{x}_F - \bar{x}_B}{\sqrt{\sigma_F^2 / n_F + \sigma_B^2 / n_B}} $$

wh
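The pixel-statistics test above can be sketched with only the standard library. This is a hedged illustration, not Agilent's code: it assumes the Welch-style t statistic shown above and assumes n_F + n_B - 2 degrees of freedom (the guide does not state the degrees of freedom). The two-sided p-value uses the standard identity p = I_{ν/(ν+t²)}(ν/2, 1/2), with the regularized incomplete beta function evaluated by the usual continued fraction (modified Lentz method). The helper names are hypothetical.

```python
import math


def _beta_continued_fraction(a: float, b: float, x: float,
                             max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a: float, b: float, x: float) -> float:
    """I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def two_sided_t_pvalue(t: float, dof: float) -> float:
    """Two-sided Student's t p-value: I_{dof/(dof+t^2)}(dof/2, 1/2)."""
    return regularized_incomplete_beta(dof / 2.0, 0.5, dof / (dof + t * t))


def feature_vs_background_pvalue(mean_f, sd_f, n_f, mean_b, sd_b, n_b):
    """Hypothetical helper: t-test of feature pixels against background pixels."""
    t = (mean_f - mean_b) / math.sqrt(sd_f ** 2 / n_f + sd_b ** 2 / n_b)
    dof = n_f + n_b - 2  # assumption: the guide does not state the degrees of freedom
    return two_sided_t_pvalue(t, dof)


# For 2 degrees of freedom the closed form is 1 - t / sqrt(2 + t^2).
print(round(two_sided_t_pvalue(2.0, 2), 4))  # 0.1835
```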
246. ror is calculated as if all probes were included. Invalidates Default Total Gene Signal. Options: 1 = True, 0 = False.

Protocol step: Calculate Metrics
QCMetrics_UseSpikeIns (integer): 1 = True, use SpikeIns; 0 = False, do not use SpikeIns.
QCMetrics_minReplicatePopulation (integer): Minimum number of replicates necessary to calculate replicate statistics.
QCMetrics_differentialExpressionPValue (float): The pValue to use to look for differentially expressed genes.
QCMetrics_MaxEdgeDefectThreshold (float): Maximum allowable fraction of features along any edge of the microarray that are non-uniform before a grid placement warning is given.
QCMetrics_MaxEdgeNotFoundThreshold (float): Maximum allowable fraction of features along any edge of the microarray that are not found before a grid placement warning is given.
QCMetrics_MaxLocalBGNonUnifThreshold (float): Maximum allowable fraction of the local background regions on the microarray that are flagged as NonUniform before a grid placement warning is given.
QCMetrics_MinNegCtrlSDev (float): Minimum value for the standard deviation for the negative controls.
QCMetrics_MinReproducibility (float): Minimum value for the reproducibility.

Table 17 List of parameters and options contained within the FULL text output file (FEPARAMS table)
247. rotocol" UseGridFileIfAvailable="False" UseProjDefProtocolFirst="False">
  <Extraction Name="US23502418_251407710012_S01">
    <XDRScanID Name=""/>
    <Image Name="C:\Images\US23502418_251407710012_S01.tif"/>
    <Grid Name="014947_D_20051222" IsGridFile="False"/>
    <Protocol Name="CGH_107_Sep09_2"/>
    <Array ID="1"><Sample Name=""/></Array>
    <Array ID="2"><Sample Name=""/></Array>
    <Array ID="3"><Sample Name=""/></Array>
    <Array ID="4"><Sample Name=""/></Array>
    <Array ID="5"><Sample Name=""/></Array>
    <Array ID="7"><Sample Name=""/></Array>
    <Array ID="8"><Sample Name=""/></Array>
  </Extraction>
</FEProject>
</FeatureExtractionML>

Example of XDR extraction set

If you are extracting an XDR pair of images, the Extraction entity structure will look like the following:

<Extraction Name="US45102874_251494710148_S01">
  <XDRScanID Name="01122007125846"/>
  <Image Name="C:\GridComparison\US45102874_251494710148_S01_H.tif"/>
  <ImageXDR2 Name="US45102874_251494710148_S01_L.tif"/>
  <Grid Name="014947_D_20060807" IsGridFile="False"/>
  <Protocol Name="miRNA_95
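If you generate or audit project files in scripts, a small reader can list what each Extraction entity points to. This is a sketch using Python's standard `xml.etree.ElementTree`, not an Agilent tool; it assumes only the element and attribute names visible in the fragments above (Extraction, Image, ImageXDR2, Grid, Protocol, Array with ID), and the sample values in `example` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def _name(parent: ET.Element, tag: str) -> Optional[str]:
    """Return the Name attribute of the first child with this tag, if any."""
    child = parent.find(tag)
    return child.get("Name") if child is not None else None


def summarize_project(project_xml: str) -> List[Dict[str, object]]:
    """List each Extraction entity with its image(s), grid, protocol, and array IDs."""
    root = ET.fromstring(project_xml)
    summary = []
    for extraction in root.iter("Extraction"):
        summary.append({
            "name": extraction.get("Name"),
            "image": _name(extraction, "Image"),
            "xdr_image": _name(extraction, "ImageXDR2"),
            "grid": _name(extraction, "Grid"),
            "protocol": _name(extraction, "Protocol"),
            "array_ids": [a.get("ID") for a in extraction.findall("Array")],
        })
    return summary


# Hypothetical project content for illustration only.
example = """<FeatureExtractionML><FEProject>
  <Extraction Name="ExampleExtraction">
    <Image Name="example.tif"/>
    <Grid Name="example_grid" IsGridFile="False"/>
    <Protocol Name="example_protocol"/>
    <Array ID="1"><Sample Name=""/></Array>
    <Array ID="2"><Sample Name=""/></Array>
  </Extraction>
</FEProject></FeatureExtractionML>"""
print(summarize_project(example)[0]["array_ids"])  # ['1', '2']
```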
248. rror (float): The universal or propagated error left after all the processing steps of Feature Extraction have been completed. In the case of one color, ProcessedSignalError has had the Error Model applied and will contain at least the larger of the universal (UEM) error or the propagated error. If multiplicative detrending is performed, ProcessedSignalError contains the error propagated from detrending. This is done by dividing the error by the normalized MultDetrendSignal.

Table 22 Feature results contained in the FULL output text file (FULL FEATURES table) (continued)

Columns (Features Green / Features Red) and types: gNumPixOLHi / rNumPixOLHi (integer), gNumPixOLLo / rNumPixOLLo (integer), gNumPix / rNumPix (integer), gMeanSignal / rMeanSignal (float), gMedianSignal / rMedianSignal (float), gPixSDev / rPixSDev (float), gPixNormIQR / rPixNormIQR (float), gBGNumPix / rBGNumPix (integer).

NumPixOLHi: Number of outlier pixels per feature with intensity > upper threshold set via the pixel outlier rejection method. The number is computed independently in each channel. These pixels are omitted from all subsequent calculations.
NumPixOLLo: Number of outlier pixels per feature with intensity < lower threshold set via the pixel outlier rejection method. The number is computed independently in each channel. These pixels are omitte
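The one-color ProcessedSignalError description can be read as "take the larger error, then propagate detrending by dividing". The sketch below encodes that reading under explicit assumptions: the helper is hypothetical, the order of operations (max first, then divide) is inferred, and because the guide says the value contains "at least" the larger error, the real output may exceed this simple maximum.

```python
from typing import Optional


def processed_signal_error(uem_error: float, propagated_error: float,
                           normalized_mult_detrend: Optional[float] = None) -> float:
    """Hypothetical one-color ProcessedSignalError sketch."""
    # Keep the larger of the universal (UEM) and propagated error estimates.
    error = max(uem_error, propagated_error)
    if normalized_mult_detrend is not None:
        if normalized_mult_detrend <= 0:
            raise ValueError("normalized MultDetrendSignal must be positive")
        # Propagate multiplicative detrending by dividing by the normalized surface value.
        error /= normalized_mult_detrend
    return error


print(processed_signal_error(3.0, 5.0))       # 5.0
print(processed_signal_error(3.0, 5.0, 2.0))  # 2.5
```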
249. rt Number and Spatial Distribution of Outliers

The number and percentage of features that are feature non-uniformity outliers in either the green or red channel is shown under the plot. The 1-color report shows only the percentage of green feature non-uniformity outliers. The number and percentage of genes that are non-uniformity outliers in either channel is also shown under the plot. If there were replicate features representing one gene and at least one feature was not an outlier, no gene outliers would appear.

Net Signal Statistics

Net signal is the mean signal minus the scanner offset. Net signal is used so that these statistics are independent of the scanner version. Net signal statistics are an indication of the dynamic range of the signal on a microarray for both non-control probes and spike-in probes (not applicable for the CGH QC report). The QC Report uses the range from the 1st percentile to the 99th percentile as an indicator of dynamic range for that microarray. NetSignal is also a column in the FeatureData output.

For example, in Figure 21, for non-control probes the dynamic range of the net signal intensity for the red channel is from 42 to 6803. Half the probes have a net signal intensity greater than the median of 97, and half have a net signal intensity below it. The median, or 50th percentile, represents the middle of the
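To reproduce the dynamic-range figures from your own FeatureData output, the sketch below computes NetSignal as mean signal minus scanner offset and reports its 1st, 50th, and 99th percentiles. The guide does not say which percentile method the QC report uses; linear interpolation between sorted values is an assumption here, as are the helper names and the toy offset of 20.

```python
import math
from typing import Dict, Sequence


def percentile(sorted_values: Sequence[float], pct: float) -> float:
    """Linear-interpolation percentile of an already sorted sequence."""
    k = (len(sorted_values) - 1) * pct / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    if lo == hi:
        return sorted_values[lo]
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)


def net_signal_stats(mean_signals: Sequence[float], scanner_offset: float) -> Dict[int, float]:
    """1st, 50th, and 99th percentiles of NetSignal = MeanSignal - scanner offset."""
    net = sorted(m - scanner_offset for m in mean_signals)
    return {p: percentile(net, p) for p in (1, 50, 99)}


stats = net_signal_stats([20 + i for i in range(101)], scanner_offset=20)
print(stats)  # {1: 1, 50: 50, 99: 99}
```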
250. rties by double-clicking on a QC metric set in the QC Metric Set Browser.

The figures in this section show the metric names and default thresholds for the QC metric set results that appear in the Evaluation Tables for each of the QC metric sets available for Feature Extraction:
• CGH_QCMT_Date
• ChIP_QCMT_Date
• GE1_QCMT_Date
• GE2_QCMT_Date
• miRNA_QCMT_Date

where QCMT means QC Metrics with Thresholds, QCM means QC Metrics without thresholds, and Date is the date that the metric set was released from Agilent. For details on the logic used for evaluating metrics, see Metric Evaluation Logic on page 125.

CGH_QCMT_Jun14 Metrics

Metric Name: Excellent; Good; Evaluate
IsGoodGrid: > 1; NA; < 1
AnyColorPrcntFeatNonUnifOL: < 1; 1 to 5; > 5
DerivativeLR_Spread: < 0.20; 0.20 to 0.30; > 0.30
gRepro: 0 to 0.05; 0.05 to 0.20; < 0 or > 0.20
g_BGNoise: < 5; 5 to 15; > 15
g_Signal2Noise: > 100; 30 to 100; < 30
g_SignalIntensity: > 150; 50 to 150; < 50
rRepro: 0 to 0.05; 0.05 to 0.20; < 0 or > 0.20
r_BGNoise: < 5; 5 to 15; > 15
r_Signal2Noise: > 100; 30 to 100; < 30
r_SignalIntensity: > 150; 50 to 150; < 50
RestrictionControl: 0.80 to 1; (none); < 0.80 or > 1
LogRatioImbalance: -0.26 to 0.26; -0.75 to -0.26 or 0.26 to 0.75; < -0.75 or > 0.75

Figure 43 QC Metrics for CGH_QCMT_Jun14 metric set

SNP probes are not used in calculation of any CGH QC Metric
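A sketch of how the Excellent/Good/Evaluate columns might be applied to a few CGH_QCMT_Jun14 metrics. This is a hypothetical helper, not Feature Extraction's evaluator: the authoritative rules are in Metric Evaluation Logic (page 125), and whether each boundary is inclusive or exclusive is an assumption here. Only a subset of metrics is encoded.

```python
from typing import Callable, Dict, Tuple

Rule = Callable[[float], bool]

# (Excellent, Good) predicates for a subset of CGH_QCMT_Jun14 metrics;
# anything else is "Evaluate". Boundary inclusivity is an assumption.
CGH_QCMT_JUN14_SUBSET: Dict[str, Tuple[Rule, Rule]] = {
    "DerivativeLR_Spread": (lambda v: v < 0.20, lambda v: 0.20 <= v <= 0.30),
    "g_BGNoise": (lambda v: v < 5, lambda v: 5 <= v <= 15),
    "g_Signal2Noise": (lambda v: v > 100, lambda v: 30 <= v <= 100),
    "g_SignalIntensity": (lambda v: v > 150, lambda v: 50 <= v <= 150),
}


def evaluate_metric(name: str, value: float,
                    rules: Dict[str, Tuple[Rule, Rule]] = CGH_QCMT_JUN14_SUBSET) -> str:
    """Return "Excellent", "Good", or "Evaluate" for one metric value."""
    excellent, good = rules[name]
    if excellent(value):
        return "Excellent"
    if good(value):
        return "Good"
    return "Evaluate"


print(evaluate_metric("DerivativeLR_Spread", 0.15))  # Excellent
print(evaluate_metric("g_Signal2Noise", 50))         # Good
print(evaluate_metric("g_BGNoise", 20))              # Evaluate
```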
251. run FeNoWindows from any directory. The Feature Extraction installation includes FeNoWindows along with the necessary grid templates and protocols. The installer places FeNoWindows.exe in the Feature Extraction folder and edits the System Path variable to include the Feature Extraction folder.

When you start FeNoWindows, you cannot return to Feature Extraction until FeNoWindows completes any running tasks and exits, or exits due to an error.

FeNoWindows accepts only one project as input. Also, project files containing more than one extraction, especially 30-micron extractions, run the risk of running out of memory.

FeNoWindows accepts project files from v8.5 and later as input for running Feature Extraction. A Feature Extraction project file is an XML file that specifies an extraction set. You create project files using the Feature Extraction user interface.

FeNoWindows returns result information in XML format; the result looks similar to a project XML file. FeNoWindows appends a result code to the project XML file that indicates the basic status of the run, such as successful completion, unsuccessful attempts, warnings, or errors. For a complete listing of return codes, see Table 40 on page 305.

Commands

FeNoWindows commands are available to perform the following operations:
• Run extractio
252. s Expected Log Ratio vs Observed LogRatio

Spike-in Linearity Check for 1-color Gene Expression

This plot shows the dose-response curve of the spike-ins from the detection limit to the saturation point. The plot is usually sigmoidal, with two asymptotes: one at the scanner saturation point, and one at the level of signal for sequences with no specifically bound target. Some microarrays produce plots missing the top asymptote, especially if extended dynamic range is used. See Figure 41.

At high signal levels, the error bars are small since the scanner reaches saturation at this point. Both the signals and standard deviations are underestimated because the saturated data is not excluded from the calculation.

At low signal levels, the error bars are visible because the signal is dropping into the background noise. The signal level at the top of the error bars of the features with the lowest signal provides a rough estimate of the lower limit of detection. Signals at this level can be slightly overestimated, and the error slightly underestimated, because the signals below zero are excluded from the calculation.

The most reliable Feature Extraction data is found in the signal range where the signal increases linearly with the concentration of the target.

[Plot: Agilent SpikeIns, Log Signal vs Log Relative Concentration]
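The rough lower limit of detection described above ("the top of the error bars of the features with the lowest signal") can be sketched as mean plus one standard deviation of the lowest-concentration spike-in replicates. Treating an error bar as one sample standard deviation is an assumption; the guide does not define the bar width, and the helper name is hypothetical.

```python
import statistics
from typing import Sequence


def rough_detection_limit(lowest_signal_replicates: Sequence[float]) -> float:
    """Top of the error bar (assumed mean + 1 SD) of the lowest-signal spike-in replicates."""
    if len(lowest_signal_replicates) < 2:
        raise ValueError("need at least two replicates for a standard deviation")
    mean = statistics.mean(lowest_signal_replicates)
    sd = statistics.stdev(lowest_signal_replicates)
    return mean + sd


print(rough_detection_limit([10.0, 12.0, 14.0]))  # 14.0
```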
253. s. The minimum radius that you can enter is FLOOR(Default Radius), where FLOOR rounds the calculated value of the default radius down to the next lower integer (e.g., FLOOR(87.6) = 87).

Maximum radius

The software lets you enter a maximum radius for the local background no greater than the distance from the center of the innermost feature to the edge of a circle that approximately surrounds the fourth-closest set of nearest neighbors, or n = 4, as shown in Equation 2. The set of eight nearest neighbors closest to the feature of interest is defined as n = 1, as shown in Equation 3.

Figure 56 Example of the radius for the first closest set of nearest neighbors, or n = 1 (eight nearest neighbors)

The value of the maximum radius also depends on the scan resolution and interspot spacing in the TIFF and grid template or file, as shown in the equation:

Max radius = CEILING Scan_resolution x 4 7 Interspotspacing x Interspotspacing vy (2)

where CEILING rounds the calculated value up to the next higher integer (e.g., CEILING(3.2) = 4).

Any radius

The value of any radius between the minimum and maximum that circumscribes a circle surrounding the nth closest set of nearest neighbors from the central spot can be approximated as:

Radius_n = Scan_resolution x n 6 J Unterspotspacing x Unterspotspacing yy (3)

where n = 1, 2, 3, or 4. Figure 57 shows the set of nearest neighbors where n
254. Columns (Features Green / Features Red), types, options, and descriptions:
• gNumPix / rNumPix (integer): Total number of pixels used to compute feature statistics, i.e., total number of inlier pixels per spot; same in both channels.
• gMeanSignal / rMeanSignal (float): Raw mean signal of feature from inlier pixels in green and/or red channel.
• gMedianSignal / rMedianSignal (float): Raw median signal of feature from inlier pixels in green and/or red channel.
• gPixSDev / rPixSDev (float): Standard deviation of all inlier pixels per feature; this is computed independently in each channel.
• gBGMeanSignal / rBGMeanSignal (float): Mean local background signal (local to corresponding feature) computed per channel (inlier pixels).
• gBGMedianSignal / rBGMedianSignal (float): Median local background signal (local to corresponding feature) computed per channel (inlier pixels).
• gBGPixSDev / rBGPixSDev (float): Standard deviation of all inlier pixels per local BG of each feature, computed independently in each channel.
• gIsLowPMTScaledUp / rIsLowPMTScaledUp (boolean): 1 = Low, 0 = High.
• BGPixCorrelation (float).
• gIsFeatNonUnifOL / rIsFeatNonUnifOL (boolean): 1 indicates the feature is a non-uniformity outlier in g/r.
• gIsSaturated / rIsSaturated (boolean): 1 = Saturated, 0 = Not saturated. Boolean flag indicating if a feature is saturated or not. A feature is saturated IF 50% of the pixels in a feature are above the saturation thr
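The per-channel feature statistics above can be sketched from a list of inlier pixel values. This is an illustration under assumptions, not Agilent's code: outlier pixels are assumed to be removed already, PixSDev is taken as the sample standard deviation, and "IF 50% of the pixels ... are above the saturation threshold" is read as "at least half". The helper name is hypothetical.

```python
import statistics
from typing import Dict, Sequence, Union


def feature_pixel_stats(inlier_pixels: Sequence[float],
                        saturation_threshold: float) -> Dict[str, Union[int, float]]:
    """Per-channel feature statistics from inlier pixels (outliers already removed)."""
    n = len(inlier_pixels)
    if n < 2:
        raise ValueError("need at least two inlier pixels")
    n_above = sum(1 for p in inlier_pixels if p > saturation_threshold)
    return {
        "NumPix": n,
        "MeanSignal": statistics.mean(inlier_pixels),
        "MedianSignal": statistics.median(inlier_pixels),
        "PixSDev": statistics.stdev(inlier_pixels),  # sample SD is an assumption
        # Assumption: "50% of the pixels" is read as at least half of them.
        "IsSaturated": 1 if n_above / n >= 0.5 else 0,
    }


# Two of the four pixels exceed 25, so IsSaturated is 1 under this reading.
print(feature_pixel_stats([10, 20, 30, 40], saturation_threshold=25))
```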
255. s as Failed in MAGEML file

Compute Non Uniform Outliers; Scanner: Agilent scanner. The values for the parameters change depending on the scanner used for the image. See the following for differences.

Automatically Compute OL Polynomial Terms Feature CV 2 Red Poissonian Noise Term Multiplier 150 when False for 30 micron feature size Inter Quartile Region Automatically Determine and All Formats 1.42 All Formats 1.42 All Formats Use Mean Standard Deviation Automatically Determine and All Formats True 8 1.42 5.00 True False True Automatically Determine Hidden if Array Format is set to Automatically Determine True 0.04000 20

Table 7 Default settings for miRNA_1200_Jun14 protocol (continued)

Protocol step; Default Setting Value (v12.0)
Red Signal Constant Term Multiplier Green Poissonian Noise Term Multiplier Green Signal Constant Term Multiplier Background CV 2 Red Poissonian Noise Term Multiplier Red Background Constant Term Multiplier Green Poissonian Noise Term Multiplier Green Background Constant Term Multiplier Compute Bkgd Bias and Error: Background Subtraction Method Significance for IsPosAndSignif and IsWellAboveBG 2-sided t test of feature vs background max p value WellAboveMulti Background Method by Format Min Feature Threshold for Metr
256. s not larger than the background error. Surrogate values were computed during background subtraction and are stored in the SurrogateUsed column.

Step 24 Calculate the dye-normalized signal (DyeNormSignal)

The dye-normalized signal is calculated by multiplying the background-subtracted signal by the dye normalization factor:

$$ \mathrm{DyeNormSignal} = \frac{\mathrm{BGSubSignal}}{\mathrm{MultDetrendSignal}} \times \mathrm{DNF} \tag{28} $$

where DNF = LinearDyeNormFactor when the linear dye normalization method is used, and

$$ \mathrm{DNF} = \mathrm{LinearDyeNormFactor} \times \mathrm{LOWESSDyeNormFactor} \tag{29} $$

when the LOWESS dye normalization method is used.

Compute Ratios

Step 25 Calculate the processed signal (ProcessedSignal)

The processed signal is used in calculating the log ratio. If a surrogate is not used (i.e., SurrogateUsed has a zero value), then the processed signal is the dye-normalized signal. If a surrogate is used (i.e., SurrogateUsed has a non-zero value), then the processed signal is the SurrogateUsed value multiplied by the dye normalization factors:

if SurrogateUsed = 0, then ProcessedSignal = DyeNormSignal
if SurrogateUsed ≠ 0, then ProcessedSignal = SurrogateUsed × DyeNormFactors

where DyeNormFactors = LinearDyeNormFactor × LowessDyeNormFactor if the Linear and Lowess methods are used.

Step 26 Calculate the log ratio of feature (LogRatio)

The log ratio is the measure of differential expression between the red and green channels for ev
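Steps 24 and 25 can be sketched per channel as three small functions. The helper names are hypothetical. Dividing by MultDetrendSignal is inferred from the way the guide propagates detrending into ProcessedSignalError, and using MultDetrendSignal = 1 when multiplicative detrending is off is an assumption.

```python
from typing import Optional


def dye_norm_factor(linear: float, lowess: Optional[float] = None) -> float:
    """DNF: the linear factor alone, or linear x LOWESS when LOWESS normalization is used."""
    return linear if lowess is None else linear * lowess


def dye_norm_signal(bg_sub_signal: float, mult_detrend_signal: float, dnf: float) -> float:
    """DyeNormSignal = BGSubSignal / MultDetrendSignal x DNF (pass 1.0 if detrending is off)."""
    return bg_sub_signal / mult_detrend_signal * dnf


def processed_signal(dye_norm_sig: float, surrogate_used: float, dye_norm_factors: float) -> float:
    """ProcessedSignal: DyeNormSignal, or the dye-normalized surrogate when one was used."""
    if surrogate_used == 0:
        return dye_norm_sig
    return surrogate_used * dye_norm_factors


dnf = dye_norm_factor(2.0, 1.5)  # 3.0
print(processed_signal(dye_norm_signal(100.0, 2.0, dnf), 0, dnf))     # 150.0
print(processed_signal(dye_norm_signal(100.0, 2.0, dnf), 10.0, dnf))  # 30.0
```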
257. s the complementary error function, as defined by the above equation, and xdev is the deviation of LogRatio from 0:

$$ x_{\mathrm{dev}} = \frac{\mathrm{LogRatio}}{\mathrm{LogRatioError}} \tag{33} $$

Equation 22 is analogous to a signal-to-noise metric. If the Universal Error Model is used, then xdev is computed from six sources:
• ProcessedSignals (red and green channels)
• Multiplicative error factors (red and green)
• Additive error factors (red and green)

The terms xdev, multiplicative error, and additive error come from the Universal Error Model as developed by Rosetta Biosoftware. Once xdev is computed, it is plugged back into Equation 2, where LogRatioError is derived.

If the Propagation of Pixel Level Error Model is used, then LogRatioError is computed from the following sources:
• Feature PixSDev (red and green channels)
• Background Noise, whose calculation depends on the chosen BkSubMethod (red and green channels)

Once the LogRatioError is computed, it is plugged back into Equation 21, where xdev is derived.

Calculate Metrics

Although the QC metrics are calculated in this step, only the gridding tests are discussed in this section.

Step 28 Perform a series of gridding tests to make sure that grid placement has been successful

These tests are performed to yield
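A sketch of turning LogRatio and LogRatioError into a p-value with the complementary error function. The defining equation is not reproduced in this section, so this uses the textbook two-sided normal relation P(|Z| >= |x|) = erfc(|x| / sqrt(2)); treat that scaling, and the helper name, as assumptions rather than Feature Extraction's exact formula.

```python
import math


def log_ratio_pvalue(log_ratio: float, log_ratio_error: float) -> float:
    """Two-sided p-value for LogRatio under a normal approximation (assumed form)."""
    if log_ratio_error <= 0:
        raise ValueError("LogRatioError must be positive")
    xdev = log_ratio / log_ratio_error
    # For a standard normal deviate z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(xdev) / math.sqrt(2.0))


print(log_ratio_pvalue(0.0, 0.1))              # 1.0
print(round(log_ratio_pvalue(0.196, 0.1), 3))  # 0.05
```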
258. s and Background Subtraction Method Error Significance for IsPosAndSignif and IsWellAboveBG 2-sided t test of feature vs background max p value WellAboveMulti Signal Correction Calculate Surface Fit required for Spatial Detrend Feature Set for Surface Fit Perform Filtering for Surface Fit Perform Spatial Detrending Signal Correction Adjust Background Globally Signal Correction Perform Multiplicative Detrending Detrend on Replicates Only Filter Low signal probes from Fit Neg Ctrl Threshold Mult Detrend Factor Perform Filtering for Fit Use polynomial data fit instead of LOESS 0.09000 3 No Background Subtraction Use Error Model for Significance 0.01 13 True OnlyNegativeControlFeatures False True False True False True 3 Use Window Average True

Table 3 Default settings for ChIP_1200_Jun14 protocol (continued)

Protocol step; Parameter; Default Setting Value (v12.0)
Correct Dye Biases Compute Ratios Calculate Metrics Polynomial Multiplicative DetrendDegree Robust Neg Ctrl Stats Choose universal error or most conservative MultErrorGreen MultErrorRed Auto Estimate Add Error Red Auto Estimate Add Error Green Use Surrogates Use Dye Norm List Dye Normalization Probe Selection Method Rank Tolerance Variable Rank Tolerance Omit Background Population Outliers Allow Posi
259. settings and QC metric set used.

2-color Gene Expression QC Report

1. QC Report Headers on page 87
2. Spot Finding of the Four Corners on page 90
3. Outlier Stats on page 91
4. Spatial Distribution of All Outliers on page 91
5. Net Signal Statistics on page 93
6. Plot of Background Corrected Signals on page 95

This module shows you the organization of the 2-color gene expression QC report. See the following figure and the figures on the next pages for links to information on the QC Report regions.

[Example 2-color Gene Expression QC Report, page 1 of 3: report header, Spot Finding of the Four Corners of the Array, Net Signal Statistics, Agilent SpikeIns]
260. shold is indicative of a labeling problem. The LabelingSpikeInSignal is calculated from gdmr285GeneSignal and gdmr31aGeneSignal (Equation 41).

The HybSpikeInSignal metric helps determine potential hybridization issues. The spike-in targets used in computing this metric are added to the mix after labeling, just prior to hybridization. If both the HybSpikeInSignal and the LabelingSpikeInSignal are low (e.g., below the threshold), then there may be an issue with the hybridization of this array. If the LabelingSpikeInSignal metric is below the threshold but the HybSpikeInSignal is not, then the efficiency of the labeling reaction may have been compromised. The HybSpikeInSignal metric is calculated from gdmr3GeneSignal and gdmr6GeneSignal (Equation 42).

The StringencySpikeInSignalRatio metric may help evaluate wash stringency. As of Feature Extraction 10.7, there are no thresholds for this metric; this may change with future updates. The StringencySpikeInRatio is calculated as:

StringencySpikeInRatio = gdmr3ProbeRatio (43)

Example calculations for feature 12519 of Agilent Human 22K image

Figure 70 Visual results of feature number 12519 from Shapes file (.shp) of Human_22K_expression microarray image

The 2-color gene expression Human 22K microarray image Human_22K_expression is included in the Exam
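The spike-in interpretation rules above can be written as a small decision function. This is a hypothetical helper: the thresholds are caller-supplied (in practice they come from the QC metric set), and the case where only HybSpikeInSignal is low is not described in the guide, so the sketch simply reports it as uncovered.

```python
def diagnose_spike_ins(labeling_signal: float, hyb_signal: float,
                       labeling_threshold: float, hyb_threshold: float) -> str:
    """Hypothetical helper encoding the miRNA spike-in interpretation rules."""
    labeling_low = labeling_signal < labeling_threshold
    hyb_low = hyb_signal < hyb_threshold
    if labeling_low and hyb_low:
        return "possible hybridization issue"
    if labeling_low:
        return "labeling efficiency may be compromised"
    if hyb_low:
        # The guide does not describe this combination.
        return "HybSpikeInSignal low only (not covered by the guide)"
    return "no spike-in issue indicated"


print(diagnose_spike_ins(10, 10, 50, 50))    # possible hybridization issue
print(diagnose_spike_ins(10, 100, 50, 50))   # labeling efficiency may be compromised
```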
261. sity scan, the XDR algorithm uses the data from the low-intensity scan. This choice is made independently for each color channel. For each feature that uses data from the low-intensity scan, the following columns get replaced (determined separately for red and green channels): NumPixOLHi, NumPixOLLo, NumPix, MeanSignal, MedianSignal, PixSDev, PixNormIQR, NumSatPix, IsSaturated, NetSignal.

These columns include the raw data from the spotfinding and measurement steps: signal levels, pixel noise levels, number of pixels, and whether the pixels and feature are saturated. Once the substitutions have been made to some features in each color channel, the extraction proceeds as if there were only a single combined set of features.

Flag Outliers

Next, the Flag Outliers algorithm flags anomalous features and local backgrounds as non-uniformity outliers and/or population outliers. Population outlier flagging is based on population statistics of replicate features on the microarray. Which of two statistical tests is used to identify population outliers depends on the number of replicate features on the microarray. Non-uniformity outlier flagging is based on statistical deviation from the expected noise in the Agilent microarray-based system (scanner, labeling, hybridization protocols, and microarrays). The algorithm automatically calculates the
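The column substitution described above can be sketched for one feature in one channel. The replaced column names come from the guide; the switching criterion is not stated in this section, so using the high-intensity scan's IsSaturated flag is a hypothetical stand-in, and the helper name and dictionary layout are illustrative only.

```python
from typing import Any, Dict

# Columns replaced by low-intensity-scan values, as listed in the guide.
XDR_REPLACED_COLUMNS = (
    "NumPixOLHi", "NumPixOLLo", "NumPix", "MeanSignal", "MedianSignal",
    "PixSDev", "PixNormIQR", "NumSatPix", "IsSaturated", "NetSignal",
)


def combine_xdr_channel(high: Dict[str, Any], low: Dict[str, Any]) -> Dict[str, Any]:
    """Merge one feature's high- and low-intensity scan results for ONE channel."""
    combined = dict(high)
    # Hypothetical criterion: switch to the low scan when the high scan is saturated.
    if high.get("IsSaturated"):
        for column in XDR_REPLACED_COLUMNS:
            combined[column] = low[column]
    return combined


high = dict.fromkeys(XDR_REPLACED_COLUMNS, 0)
high.update(IsSaturated=1, MeanSignal=65000, ProbeName="P1")
low = dict.fromkeys(XDR_REPLACED_COLUMNS, 0)
low.update(MeanSignal=4200)
print(combine_xdr_channel(high, low)["MeanSignal"])  # 4200
```

Other columns (such as the probe name here) are carried over from the high-intensity scan unchanged.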
262. st ROI about a nominal spot location. The spotfinder determines the found centroid and spot size of the spot within the ROI.
The average absolute difference between nominal centroids and corresponding found centroids in the X direction.
The average absolute difference between nominal centroids and corresponding found centroids in the Y direction.
The number of features that are flagged as found.
Maximum fraction of features that are non-uniform along any edge of the microarray.
Maximum fraction of features that are not found along any edge of the microarray.
Root mean square (RMS) of the fitted data points obtained from the second-degree polynomial equation in Multiplicative Detrending. This gives an idea of the curvature of the surface fit to the hybridization dome in the Agilent hybridization chambers.

Table 21 Stats results contained in the text output file (STATS table) (continued)

Stats (Green Channel / Red Channel) and types: gMultDetrendSurfaceAverage / rMultDetrendSurfaceAverage (float), DerivativeOfLogRatioSD (float), eQCLowSigName1 (text), eQCLowSigName2 (text), eQCOneColorLogLowSignal (float), eQCOneColorLogLowSignalError (float), eQCOneColorLogHighSignal (float), eQCOneColorLinFitLogLowConc (float), eQCOneColorLinFitLogLowSignal (float).

gMultDetrendSurfaceAverage / rMultDetrendSurfaceAverage: The average of the surface calculat
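Two of the Stats descriptions above are simple to compute once nominal and found centroids are paired: the average absolute X and Y differences, and an RMS value. The sketch below is illustrative; the helper names are hypothetical, centroids are assumed to be already matched one-to-one, and `rms` is a generic root mean square rather than Feature Extraction's surface-fit code.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def centroid_offsets(nominal: Sequence[Point], found: Sequence[Point]) -> Tuple[float, float]:
    """Average absolute X and Y differences between nominal and found centroids."""
    if not nominal or len(nominal) != len(found):
        raise ValueError("need equal, non-empty lists of corresponding centroids")
    n = len(nominal)
    dx = sum(abs(f[0] - p[0]) for p, f in zip(nominal, found)) / n
    dy = sum(abs(f[1] - p[1]) for p, f in zip(nominal, found)) / n
    return dx, dy


def rms(values: Sequence[float]) -> float:
    """Root mean square of a sequence, e.g. fitted surface values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


print(centroid_offsets([(0, 0), (10, 10)], [(1, -2), (9, 13)]))  # (1.0, 2.5)
```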
263. stant Term Multiplier: 1
Background CV 2 0.09000
Red Poissonian Noise Term Multiplier: 3
Red Background Constant Term Multiplier: 1
Green Poissonian Noise Term Multiplier: 3
Green Background Constant Term Multiplier: 1
Compute Bkgd Bias and Error
  Background Subtraction Method: No Background Subtraction
  Significance for IsPosAndSignif and IsWellAboveBG: Use Error Model for Significance
  2-sided t test of feature vs background max p value: 0.01
  WellAboveMulti: 13
Signal Correction
  Calculate Surface Fit required for Spatial Detrend: True

Table 2 Default settings for CGH_1200_Jun14 protocol (continued)

Protocol step; Parameter; Default Setting Value (v12.0)
Correct Dye Biases Feature Set for Surface Fit Perform Filtering for Surface Fit Perform Spatial Detrending Signal Correction Adjust Background Globally Signal Correction Perform Multiplicative Detrending Robust Neg Ctrl Stats Detrend on Replicates Only Filter Low signal probes from Fit Neg Ctrl Threshold Mult Detrend Factor Perform Filtering for Fit Use polynomial data fit instead of LOESS Polynomial Multiplicative DetrendDegree Choose universal error or most conservative Use Surrogates Use Dye Norm List MultErrorGreen MultErrorRed Auto Estimate Add Error Red Auto Estimate Add Error Green Dye Normalization Probe Selection Method Rank Tolerance
264. <Array ID="8"><Sample Name=""/></Array>
</Extraction>

Extraction Results

The information contained in the output file specified with the o command depends on the extraction operation performed and the options you specified. For example, the XML file can contain status, time, warning, or error messages, and can indicate the number of outliers. Status information (Success, Error, Warning) is particularly important.

Status information

Success: Feature Extraction had no issues extracting the data.
Warning: Feature Extraction generated the data, which might be usable. Users should check the RTF file for the warning. Feature Extraction probably ran OK. A common warning is "No SpikeIns found on this design."
Error: Output files may or may not have been generated. If output files were generated, users need to look at the image and shape files to make sure they are OK. The grid may not have been placed correctly. Users should not trust the data without visual inspection. FeNoWindows occasionally reports failures that are not true errors. The image, the RTF file, the QC report, and possibly the shapes file need to be examined to see why things failed.

Examples of status information

The following XML file fragments show you examples of what the status in
265. t Density 11k, 22k, 25k; Double Density 44k, 95k, 185k; 185k 10 uM; 65 micron feature size (also with 10 micron scans); 30 micron feature size single pack and multi pack; and Third Party. Parameters that apply only to specific formats appear only if that format is selected.
Placement Method: Hidden if Array Format is set to Automatically Determine. Allow Some Distortion (all formats).
Enable Background Peak Shifting: Hidden if Array Format is set to Automatically Determine. Set to False for all arrays except 30 micron single pack and multi pack, for which it is set to True.
Use central part of pack for slope and skew calculation: Hidden if Array Format is set to Automatically Determine. Set to False for all arrays except 30 micron single pack and multi pack, for which it is set to True.
Use the correlation method to obtain origin X of subgrids: Hidden if Array Format is set to Automatically Determine. Set to False for all arrays except 30 micron single pack and multi pack, for which it is set to True.
Table 7 Default settings for miRNA_1200_Jun14 protocol (continued)
Protocol step / Parameter / Default Setting Value (v12.0)
Optimize Grid Fit: Grid Format. Find Spots: Spot Format. The parameters and values for optimizing the grid differ depending on the format: Iteratively Adjust Corners; Adjustment Threshold; Maximum Number of Itera
266. t background globally to 50 and the slope is 1.2, then a value of 50 is added to the green background-subtracted signal of all features, whereas a value of 50 × 1.2 = 60 is added to the red background-subtracted signal of all features.
Conversely, if you set Adjust background globally to 50 and the slope is 0.5, then a value of 50 is added to the red background-subtracted signal of all features, whereas a value of 50 / 0.5 = 100 is added to the green background-subtracted signal of all features.
Step 15. Calculate robust negative control statistics
This algorithm is used primarily for CGH and miRNA microarrays. It repeats the population outlier algorithm. Instead of working on one sequence at a time, it works on the distribution of all features that are classified as NegC (negative controls). The algorithm calculates robust IQR statistics on features that are not designated as non-uniform outliers, population outliers, or saturated:
UpperLimit = 75th percentile + Multiplier × IQR
LowerLimit = 25th percentile - Multiplier × IQR
The default value for this multiplier is 5. The algorithm then omits features that are outside the UpperLimit and LowerLimit. It calculates the new robust count, average, and SD of these inliers for the net signal and the background-subtracted signal: g(r)NegCtrlNumInliers, g(r)NegCtrlAveNetSig, g(r)NegCtrlSDevNetSig, g(r)NegCtrlAveBGSubSig, g(r)NegCtrlSDevBGSubSig.
Step 16. D
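The two calculations above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not Agilent's implementation:
- The function names are invented.
- `slope` is assumed to be the red-vs-green slope used in the worked examples.
- The quartiles come from `statistics.quantiles(..., method="inclusive")`, which may interpolate percentiles differently from Feature Extraction.
- The sample SD is assumed.

```python
import statistics


def global_background_offsets(adjust_value, slope):
    """Return (green_offset, red_offset) for Adjust Background Globally.

    Assumption: `slope` is the red-vs-green slope from the worked examples.
    For slope >= 1, green gets the protocol value and red gets value * slope;
    for slope < 1, red gets the protocol value and green gets value / slope.
    """
    if slope >= 1:
        return adjust_value, adjust_value * slope
    return adjust_value / slope, adjust_value


def robust_neg_ctrl_stats(signals, multiplier=5.0):
    """Robust (count, average, SD) of negative-control inliers.

    `signals` should already exclude non-uniform outliers, population
    outliers, and saturated features. Quartile interpolation and the use
    of the sample SD are assumptions of this sketch.
    """
    q1, _, q3 = statistics.quantiles(signals, n=4, method="inclusive")
    iqr = q3 - q1
    upper_limit = q3 + multiplier * iqr
    lower_limit = q1 - multiplier * iqr
    inliers = [s for s in signals if lower_limit <= s <= upper_limit]
    sd = statistics.stdev(inliers) if len(inliers) > 1 else 0.0
    return len(inliers), statistics.mean(inliers), sd


print(global_background_offsets(50, 1.2))  # green 50, red 60
print(global_background_offsets(50, 0.5))  # (100.0, 50)
print(robust_neg_ctrl_stats([10, 11, 12, 13, 14, 1000]))  # the 1000 is rejected
```

The wide default multiplier (5) means only grossly aberrant negative controls are dropped before the robust average and SD are computed.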
267. t lab protocols, Agilent has found thresholds that indicate whether the data is in the expected range (Good) or out of the expected range (Evaluate). For some applications (CGH, miRNA), an extra threshold level, Excellent, is provided. More data has been screened for these applications, which allows the metric thresholds to be set to tighter limits that indicate excellent processing. Some applications do not have a full set of thresholds (for example, ChIP), and others have no Excellent thresholds (for example, GE1 and GE2). For these, the user is assured that data in the Good grade is good to use. Excellent thresholds for those applications may be provided in the future.
QC metric set results (default protocol settings)
Figure 15 is an example of part of a QC report (the header and the Evaluation Metrics table). It was generated from a 2-color gene expression extraction to which the GE2 metric set with thresholds had been added. In this extraction the default protocol settings were used. Note that all values for the metrics are within the default threshold ranges.
[Figure 15 (partial): QC Report, Agilent Technologies, 2 Color Gene Expression. Header fields: Date, Image, Protocol, User Name, Grid, FE Version, Sample (red/green), DyeNorm List, No. of Probes in DyeNorm List. Values shown include Thursday, November 17, 2011 08:18; Hu22K_GE2_251209710036; GE2_1100_Jul11; BG Method: No Background; Background Detrend: On (FeatNCRange, LoPass).]
268. t result files 207
Full and Compact Output Packages 207
Tables for Full Output Package 208
Table for Compact Output Package 216
Helpful hints for transferring Agilent output files 220
XML output 220
TIFF Results 222
5 How Algorithms Calculate Results 223
Overview of Feature Extraction algorithms 224
Algorithms and functions they perform 224
Algorithms and results they produce 230
XDR Extraction Process 234
What is XDR scanning? 234
XDR Feature Extraction process 234
How the XDR algorithm works 236
Troubleshooting the XDR extraction 237
How each algorithm calculates a result 238
Place Grid 238
Optimize Grid Fit 241
Find Spots 241
Flag Outliers 248
Compute Bkgd Bias and Error 254
Correct Dye Biases 274
Compute Ratios 278
Calculate Metrics 280
MicroRNA Analysis 283
Example calculations for feature 12519 of Agilent Human 22K image 290
Data from the FEPARAMS table 291
Data from the STATS Table 291
Data from the FEATURES Table 291
6 Command Line Feature Extraction 297
Commands 299
Command line syntax 299
Commands and arguments 300
Return Codes 305
Extraction Input 307
Extraction Results 312
Status information 312
Examples of status information 313
Error codes from XML file 315
Warning codes from XML file 319
Index 325
269. t the leading g means the data is calculated from the green channel.
(float) The Probe Ratio of the 2 dmr6 probes.
(float) The Probe Ratio of the 2 dmr3 probes.
Metrics
Through the miRNA metric set provided with Feature Extraction versions 10.7 and later, the Feature Extraction software calculates three metrics that appear on the miRNA QC report: LabelingSpikeInSignal, HybSpikeInSignal, and StringencySpikeInRatio. Two of the three metrics have thresholds associated with them as defined in the QC metric set. As of Feature Extraction 10.7, the other metric does not. This may change in future updates.
The SpikeIn controls, when used in conjunction with the SpikeIn metrics, can help troubleshoot potential issues with your miRNA microarray experiment. The SpikeIns and associated metrics are for use with the Agilent miRNA experimental protocol only. We have not tested or evaluated any deviations from our standard protocol, and therefore cannot offer support or guidance for issues arising from the use of other protocols.
The LabelingSpikeInSignal metric helps determine whether there might be a problem with the labeling reaction. The Agilent protocol for use with the SpikeIns must be used for the metric to give meaningful values. The metric encompasses two different SpikeIn miRNAs and reports the average signal strength. A value for this metric below the thre
270. tNonUnifOL (boolean). Options: g(r)IsFeatNonUnifOL = 1 indicates Feature is a non-uniformity outlier in g(r). Boolean flag indicating whether a feature is a NonUniformity Outlier. A feature is non-uniform if the pixel noise of the feature exceeds a threshold established for a uniform feature.
gIsBGNonUnifOL, rIsBGNonUnifOL (boolean). Options: g(r)IsBGNonUnifOL = 1 indicates Local background is a non-uniformity outlier in g(r). The same concept as above, but for background.
gIsFeatPopnOL, rIsFeatPopnOL (boolean). Options: g(r)IsFeatPopnOL = 1 indicates Feature is a population outlier in g(r). Boolean flag indicating whether a feature is a Population Outlier. Probes with replicate features on a microarray are examined using population statistics. A feature is a population outlier if its signal is less than a lower threshold or exceeds an upper threshold. The thresholds are determined using a multiplier (1.42) times the interquartile range (IQR) of the population.
gIsBGPopnOL, rIsBGPopnOL (boolean). Options: g(r)IsBGPopnOL = 1 indicates local background is a population outlier in g(r). The same concept as above, but for background.
IsManualFlag (boolean): Boolean to flag features for downstream filtering in third-party gene expression software.
gBGSubSignal, rBGSubSignal (float). Options: g(r)BGSubSignal = g(r)MeanSignal - g(r)BGUsed. Background-subtracted signal. To display the values used to calculate this variable with different background signals and different settings of spatial detrend and global background adjust, see Tabl
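The population-outlier test for replicate features can be sketched as below. This is an illustrative sketch under stated assumptions, not Agilent's code:
- The fences are assumed to be Q1 - 1.42 × IQR and Q3 + 1.42 × IQR, by analogy with the negative-control limits.
- `min_population=10` is taken from the Minimum Population default in the CGH settings table.
- The function name and the `(probe_name, signal)` input layout are hypothetical.

```python
import statistics
from collections import defaultdict


def flag_population_outliers(features, iqr_ratio=1.42, min_population=10):
    """Flag replicate features whose signal lies outside the IQR fences.

    features: list of (probe_name, signal) pairs (hypothetical layout).
    Returns booleans aligned with `features` (True = population outlier).
    Probes with fewer than `min_population` replicates are not evaluated.
    """
    by_probe = defaultdict(list)
    for index, (probe, _) in enumerate(features):
        by_probe[probe].append(index)

    flags = [False] * len(features)
    for indices in by_probe.values():
        if len(indices) < min_population:
            continue  # too few replicates for population statistics
        signals = [features[i][1] for i in indices]
        q1, _, q3 = statistics.quantiles(signals, n=4, method="inclusive")
        iqr = q3 - q1
        lower, upper = q1 - iqr_ratio * iqr, q3 + iqr_ratio * iqr
        for i in indices:
            flags[i] = not (lower <= features[i][1] <= upper)
    return flags


replicates = (
    [("probe_A", 100.0)] * 9
    + [("probe_A", 500.0)]
    + [("probe_B", 1.0), ("probe_B", 1000.0)]
)
flags = flag_population_outliers(replicates)
print(flags)  # only the 500.0 replicate of probe_A is flagged
```

Grouping by probe name matters here: the statistics are computed per replicate population, not over the whole array.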
271. te Bkgd Bias and Error: indicates that the feature MeanSignal is greater than, and significant compared to, the background signal (i.e., BGUsed). If significance is based on the Additive Error of the Error Model, a Boolean flag of 1 means that the feature MeanSignal is greater than, and significant compared to, the Additive Error.
Compute Bkgd Bias and Error, IsWellAboveBG: A Boolean flag of 1 indicates that the feature BGSubSignal is well above background and passes the IsPosAndSignif test.
Compute Bkgd Bias and Error, SpatialDetrendIsInFilteredSet: Set to true for a given feature if it is part of the filtered set used to detrend the background. The feature may be in the set of locally weighted lowest x% of features as defined by the DetrendLowPassPercentage. It may also be a negative control feature, or part of the set of features that are in the negative control range. The feature set is defined by the detrend method selected.
Compute Bkgd Bias and Error, SpatialDetrendSurfaceValue: Value of the smoothed surface at that feature, calculated by the Spatial Detrend algorithm.
Table 32 Algorithms (Protocol Steps) and the results they produce (continued)
Protocol Step / Results / Result Definition
Protocol steps: Compute Bkgd Bias and Error (2 rows), Correct Dye Biases (2 rows), Compute Ratios (3 rows). Results: MultDetrendSignal, SurrogateUsed, DyeNor
272. ter is done by transforming the feature BGSubSignal to feature rank per channel. Next, the feature correlation strength (CS) is calculated per feature:
$$CS = \frac{|\rho_R - \rho_G|}{N} \qquad (25)$$
where ρR and ρG are the ranks of the feature in the red and green channels, respectively, and N is the total number of initial normalization features. If CS < τ, where τ is the threshold percentile, then the feature passes the rank consistency filter between the red and green channels and falls within the central tendency of the data.
Note: τ is a user-defined parameter in the Feature Extraction program.
The LinearDyeNormFactor values for the red and green channels are listed in the STATS table.
Using Rank Consistent List of Normalization Genes
This method uses the rank-consistent normalization genes from the list. These genes follow the criteria described above.
Step 22. Calculate the normalization factor (LinearDyeNormFactor)
The linear dye normalization method assumes that dye bias is not intensity dependent, and therefore takes a global approach to dye normalization. A linear dye normalization factor is computed per channel by setting the geometric mean of signal intensity of the normalization features equal to 1000:
$$\text{LinearDyeNormFactor} = \frac{1000}{10^{\frac{1}{n}\sum_{i=1}^{n}\log_{10} X_i}} \qquad (26)$$
where X is the background-subtracted signal of a feature (i.e., BGSubSignal or MultDetrendSignal) and n is the number of features used for normalizati
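The rank-consistency filter and the linear dye normalization factor can be sketched as below. This is a minimal sketch, and several parts are assumptions:
- It uses CS = |rank_R - rank_G| / N as the correlation strength.
- Ties get simple ordinal ranks.
- The tolerance defaults to 0.05, a value assumed from the Rank Tolerance setting.
- Non-positive signals are dropped before taking log10.
- The function names are hypothetical.

```python
import math


def ranks(values):
    """1-based ordinal ranks (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def rank_consistent_features(red, green, tolerance=0.05):
    """Indices whose correlation strength |rank_R - rank_G| / N is below tolerance."""
    n = len(red)
    rank_r, rank_g = ranks(red), ranks(green)
    return [i for i in range(n) if abs(rank_r[i] - rank_g[i]) / n < tolerance]


def linear_dye_norm_factor(signals):
    """Scale factor that sets the geometric mean of `signals` to 1000 (Eq. 26)."""
    positive = [s for s in signals if s > 0]  # assumption: log10 needs positive values
    mean_log = sum(math.log10(s) for s in positive) / len(positive)
    return 1000.0 / 10 ** mean_log


red = [float(v) for v in range(1, 21)]
green = red[:]
green[0], green[-1] = green[-1], green[0]  # features 0 and 19 become rank-inconsistent
keep = rank_consistent_features(red, green)
print(keep)                                        # indices 1 through 18
print(linear_dye_norm_factor([100.0, 10000.0]))    # geometric mean is 1000, so 1.0
```

In practice the factor would be computed separately for each channel, using only the features that passed the rank-consistency filter.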
273. Grid Format: Automatically Determine. Recognized formats: 65 micron feature size, 30 micron feature size, and Third Party.
Iteratively Adjust Corners: Hidden if Array Format is set to Automatically Determine. True for all formats except Third Party; False for Third Party.
Adjustment Threshold: Hidden if Array Format is set to Automatically Determine. 0.300 for all formats except Third Party.
Maximum Number of Iterations: Hidden if Array Format is set to Automatically Determine. 5 for all formats except Third Party.
Found Spot Threshold: Hidden if Array Format is set to Automatically Determine. 0.200 for all formats except Third Party.
Number of Corner Feature Side Dimension: Hidden if Array Format is set to Automatically Determine. 20 for all formats except Third Party.
Find Spots: Depending on the format selected by the software or by you, the default settings for this step change. See the following rows for the default values for finding spots.
Spot Format: Automatically Determine. Recognized formats: same as those listed above, except that 244k 10uM replaces 65 micron feature size 10 micron scans.
Use the Nominal Diameter from the Grid Template: Hidden if Array Format is set to Automatically Determine. True for all formats.
Table 4 Default settings for GE1_1200_Jun14 protocol (continued)
Protocol step / Parameter / Default Setting Value (v12.0)
Spot Deviation Limit
274. the surface fit. The deviations of the input filtered points from the corresponding output fitted data points are computed. An outlier rejection is performed on the set of deviations using the standard IQR technique (Figure 59 on page 252). Here, the fitted value is the value from the LOESS fit, and the observed value is the BGSubSignal.
This result gives an idea of the curvature of the surface gradient.
Table 35 Statistical results of spatial detrend algorithm (continued)
Result / Description and Equation
SpatialDetrendVolume: The volume is calculated as the sum of the intensities of the surface area minus the offset. The offset is the volume under the flat surface that is parallel to the glass slide and passes through the minimum intensity point of the fitted surface. This number (total volume - offset) is normalized by the area of the microarray.
SpatialDetrendAveFit: This describes the average intensity of the surface gradient:
$$\text{SpatialDetrendAveFit} = \frac{1}{N}\sum_{i=1}^{N} \hat{y}_i$$
where ŷ_i is the fitted surface value at point i and N is the number of points.
Step 14. Adjust the background
This algorithm determines the offset in both the red and green channels. It does so by identifying features that are not differentially expressed and that fall within the central tendency of the data, especially in the lower-intensity domain. These features should not be saturated or be flagged as non-uniform outliers. Using this method yields more accurate and reproducible background subtra
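The three spatial-detrend summary statistics described above can be sketched as follows, under assumptions:
- Each fitted value stands for one unit cell, so the flat-surface offset is min(fit) × N.
- The "standard IQR technique" uses Tukey's 1.5 × IQR fences.
- The residual RMS is taken over the deviations kept after rejection.

None of these details is confirmed by the text, and the function names are hypothetical.

```python
import math
import statistics


def spatial_detrend_ave_fit(fitted):
    """Average of the fitted surface values over all N points."""
    return sum(fitted) / len(fitted)


def spatial_detrend_volume(fitted, array_area):
    """(Total volume - offset) / array area.

    Assumption: each fitted value is one unit cell, so the volume under a
    flat surface through the minimum of the fit is min(fitted) * N.
    """
    offset = min(fitted) * len(fitted)
    return (sum(fitted) - offset) / array_area


def rms_filtered_minus_fit(observed, fitted, iqr_multiplier=1.5):
    """RMS of (observed - fitted) after IQR outlier rejection of the deviations.

    Assumption: Tukey fences at 1.5 * IQR stand in for the "standard IQR technique".
    """
    deviations = [o - f for o, f in zip(observed, fitted)]
    q1, _, q3 = statistics.quantiles(deviations, n=4, method="inclusive")
    iqr = q3 - q1
    kept = [d for d in deviations
            if q1 - iqr_multiplier * iqr <= d <= q3 + iqr_multiplier * iqr]
    return math.sqrt(sum(d * d for d in kept) / len(kept))


fitted = [1.0, 2.0, 3.0, 4.0]
observed = [2.0, 3.0, 4.0, 5.0]
print(spatial_detrend_ave_fit(fitted))           # 2.5
print(spatial_detrend_volume(fitted, 4.0))       # 1.5
print(rms_filtered_minus_fit(observed, fitted))  # 1.0
```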
275. the Universal or Most Conservative error model if AutoEstimateAddError was selected, or the values entered into the protocol if AutoEstimateAddError was not selected. Note that the additive error that appears in the QC report header is the Additive Error value selected in the protocol multiplied by the linear dye norm factor.
Saturation Value: The signal intensity value above which the signal is considered saturated. This value appears only if it exceeds about 65,500. If it appears, this QC report is from an XDR image file.
1-color Gene Expression QC Report
This report lists all of the same header information as the 2-color gene expression report, except for Dye Norm and Linear DyeNorm Factor, which are removed.
Streamlined CGH QC Report
The streamlined CGH QC report contains the same header information as the 2-color gene expression QC report, except for Linear DyeNorm Factor and Additive Error, which are removed. Also, the information from the two fields BG Method and Background Detrend has been collapsed into the one field BG Method.
CGH_ChIP QC Report
All header information that appears in the 2-color gene expression QC report is included in the CGH_ChIP report. This report lists one additional metric, Derivative of Log Ratio Spread, in the header information.
Derivative of Log Ratio Spread: Measures the standard deviation of the probe-to-probe
276. the data is close to linear for higher multiples of w away from x0.
The asymptotes for the max and the min are not necessarily symmetric. The upper asymptote is a function of scanner offset, and the lower asymptote is a function of chemistry and scanner noise. The calculations then follow this order:
b. The Min is estimated by taking all the SpikeIn data and, for each sequence, calculating:
- the BackgroundSubtractedSignalAverage
- the median of the Log of the processed signals
- the StDev of the Log of the processed signals
- the CV of the processed signals
The Median Log Proc Signal, the CV, and the StDev of the Log of the processed signals all appear in the Agilent SpikeIns Signal Statistics table of the QC report. For each sequence, compare the calculated BackgroundSubtractedSignalAverage against the standard deviation of the negative controls (StdDevBgSubSigNegCtrl) using the formula BGSubAverage MultErrorGreen > StdDevBgSubSigNegCtrl. Exclude the Proc Signals that fail this test, and use the median of the Proc Signals for the remaining sequences as the initial guess.
c. Max is estimated as Log(Scanner SaturationValue).
d. x0 is estimated as follows. Start with the y value (max + min)/2, then find the 2 closest Med Log Proc Signals above and below this point. Find the Log concentrations of those points, and then compute a slope and an int
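The x0 initial estimate in step d can be sketched as below. The source text is cut off at "a slope and an int…", so the last step is an assumption: this sketch solves the straight line through the two bracketing points for the target y. The function name and argument layout are also hypothetical.

```python
def estimate_x0(log_concentrations, median_log_signals, y_min, y_max):
    """Initial guess for the sigmoid center x0.

    Takes the target y = (max + min) / 2 and finds the closest median log
    processed signals below and above it. It then fits a line through those
    two (log concentration, signal) points and, as an assumption of this
    sketch, solves the line for the target y.
    """
    target = (y_max + y_min) / 2
    points = sorted(zip(median_log_signals, log_concentrations))
    below = [p for p in points if p[0] <= target]
    above = [p for p in points if p[0] > target]
    if not below or not above:
        raise ValueError("target is not bracketed by the SpikeIn signals")
    y1, x1 = below[-1]  # largest signal at or below the target
    y2, x2 = above[0]   # smallest signal above the target
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return (target - intercept) / slope


print(estimate_x0([0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0], y_min=1.0, y_max=4.0))  # 1.5
```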
277. these averages is determined next. Then the average of these absolute averages is calculated to get a single value for the QC Report. The larger this value, the more differential expression is present.
[Figure 33 QC Report, Array Uniformity: LogRatios. AbsAvgLogRatio: 0.26 (Non-Control), 0.48 (Agilent SpikeIns). Average S/N: 3.86 (Non-Control), 43.07 (Agilent SpikeIns).]
Sensitivity
These values represent the ratio of NetSignal to background (BGUsed ScannerOffset) of the two spike-in probes with the lowest background-subtracted signal. Their purpose is to characterize the sensitivity of detecting a low signal relative to the background.
[Figure 34 QC Report, Sensitivity: Agilent SpikeIns Ratio of Signal to Background for 2 dimmest probes (E1A_r60_n11, E1A_r60_a97).]
Reproducibility Plots
Reproducibility plot for 2-color gene expression spike-in probes
Signal replicate statistics are calculated for spike-in probes if three criteria are met:
- The spike-in probes are present on the microarray.
- The protocol indicates that labeled target to these spike-in probes has been added in the hybridization (QCMetrics_UseSpikeIns is True).
- There is a minimum number of inlier features for calculations (QCMe
278. False for 185k, 185k 10uM, 65 micron feature size, 30 micron feature size, and 244k 10uM. Hidden if Array Format is set to Automatically Determine; 100 when False for 185k, 185k 10uM, 65 micron feature size, and 244k 10 uM; 150 when False for 30 micron feature size.
thod: Inter Quartile Region (Automatically Determine and all formats)
RejectIQRFeat: 1.42 (all formats)
RejectIQRBG: 1.42 (all formats)
Statistical Method for Spot Values from Pixels: Use Mean/Standard Deviation (Automatically Determine and all formats)
Flag Outliers
Compute Population Outliers: True
Minimum Population: 10
IQRatio: 1.42
Background IQRatio: 1.42
Use Qtest for Small Populations: True
Report Population Outliers as Failed in MAGEML file: False
Compute Non Uniform Outliers: True
Scanner: Automatically Determine. The values for the parameters change depending on the scanner used for the image. See the following rows for the differences.
Table 2 Default settings for CGH_1200_Jun14 protocol (continued)
Protocol step / Parameter / Default Setting Value (v12.0)
Agilent scanner:
Automatically Compute OL Polynomial Terms: Hidden if Array Format is set to Automatically Determine. True.
Feature CV²: 0.04000
Red Poissonian Noise Term Multiplier: 5
Red Signal Constant Term Multiplier: 1
Green Poissonian Noise Term Multiplier: 5
Green Signal Con
279. ting no evaluation has been done by Feature Extraction. Specialized SpikeIn plots and tables will be omitted from the report.
Case 2: The array has a Grid Template WITH SpikeIns in the design, but the user adds no SpikeIns to the hyb.
- If the standard protocol is run, the results will either be wrong values or listed as NA.
- If the protocol has SpikeIn Used set to False, then the QC metric table in the QC Report will show for values, in black font instead of red, green, or blue fonts. This indicates that no evaluation has been done by Feature Extraction. Specialized SpikeIn plots and tables will be omitted from the report.
How the curve and statistics are calculated
Curve fit equation
All of the statistics in the table above are calculated using a parameterized sigmoidal curve fit to the data:
$$F(x) = \min + \frac{\max - \min}{1 + e^{-(x - x_0)/w}}$$
The terms are defined as follows:
- min is the level of signal for sequences with no specifically bound target.
- max is the upper limit of detection.
- x0 is the center of the data and is close to the center of the linear range.
- w is the width of the curve on either side of x0.
Curve fit calculations
Before the calculations, the following assumptions are made:
- The Saturation Point is fixed at, or close to, the scanner detection limit. This value is Log(Scanner Saturation Value) = 4.82.
- The linear range of the curve, (x0 - w, x0 + w), does not define the dynamic range of the data, as
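The sigmoidal curve above can be evaluated with the sketch below, which also shows where the fixed upper asymptote of 4.82 comes from. The saturation value 65535 is an assumption: it is the 16-bit maximum, and its log10 rounds to the 4.82 stated in the text. The function name is hypothetical, and the sketch does not attempt the actual curve fit.

```python
import math

SATURATION_VALUE = 65535  # assumption: 16-bit scanner saturation value


def spike_in_curve(x, y_min, y_max, x0, w):
    """Sigmoid F(x) = min + (max - min) / (1 + exp(-(x - x0) / w))."""
    return y_min + (y_max - y_min) / (1.0 + math.exp(-(x - x0) / w))


y_max = math.log10(SATURATION_VALUE)  # fixed upper asymptote, about 4.82
print(round(y_max, 2))                # 4.82
print(spike_in_curve(2.0, 1.0, 5.0, 2.0, 0.5))  # 3.0: the midpoint is reached at x = x0
```

At x0 ± w the curve has covered about 27% and 73% of the way from min to max (1 / (1 + e) and 1 / (1 + e^-1)). This is one way to see why (x0 - w, x0 + w) is called the linear range.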
280. tion method 246
output files: control types 221; how used by databases 206; integrating with Resolver 220; text 127
P
parameter options 129
place grid: find nominal spot positions 238
protocol: find settings 14, 55; hidden settings 15
public accession numbers 204
Q
QC Report: foreground surface fit 97; header 87, 88; local background inliers 97; microarray uniformity 106; net signal statistics 91; outlier number and distribution 91; plot of background corrected signals 95; plot of LogRatio vs Average Log Signal 101; reproducibility plot, spike-ins 108; reproducibility statistics, non-control probes 104; results in FEPARAMS and STATS table 121; sensitivity 107; spike-in log ratio statistics 108; spot finding, four corners 90; up- and down-regulated features 100
QC Report, 1-color only: Histogram of Signals Plot 96; Multiplicative Surface Fit 99; Spatial Distribution of Median Signals 102
QC Report Types: 1-color gene expression 72, 75, 79; 2-color gene expression 69; CGH 77, 81
results: features 178; integrating with Resolver 220; QC Report parameters and stats 121; statistical 159; text file 127; text file output 127
return codes 305
Rosetta Biosoftware, use of XML output with 220
S
signals: background subtracted, adjusted 264; background subtracted, unadjusted 263
statistical results 159
T
tables: FEPARAMS 129; parameters 129; statistical r
281. tion method as selected above.
Find Spots: SpotAnalysis_StatBoundBG (float). Multiplier parameters for the background outlier rejection method as selected above.
Find Spots: SpotAnalysis_SpotStatsMethod (integer). Options: 1 = CookieCutter method, 2 = Whole Spot method. Different algorithms to calculate spot statistics.
Find Spots: SpotAnalysis_CookiePercentage (float). The fraction of the nominal radius used to draw the cookie around the centroid of each spot.
Find Spots: SpotAnalysis_ExclusionZone (float). The outer radius of the exclusion zone, as a percentage based on nominal spot size.
Find Spots: SpotAnalysis_EstimateLocalRadius (integer). Options: 1 = True, 0 = False. The option to calculate the outer radius of the local background based on row and column spacing.
Find Spots: SpotAnalysis_LocalBGRadius (float). The outer radius of the local background, supplied from the protocol if EstimateLocalRadius is not selected.
Table 17 List of parameters and options contained within the FULL text output file (FEPARAMS table) (continued)
Protocol Step / Parameters / Type / Options / Description
Find Spots: SpotAnalysis_SignalMethod (integer). Options: 1 = Mean, 2 = Median. The option for the statistical method for determining signals from features: either mean and standard deviation, or median and normalized IQR.
Find Spots: SpotAnalysis_ComputePixelSkew (integer). Options: 1 = true. The option to set whether the program computes and shows the skew of each feature
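A cookie-cutter spot signal, as described by SpotAnalysis_CookiePercentage and SpotAnalysis_SignalMethod, might look like the sketch below. Several things are assumptions:
- The image is a list of pixel rows.
- The centroid and nominal radius are given in pixel units.
- The percentage is applied as a fraction, as the FEPARAMS description states.

The function names are hypothetical. The sketch does not model the exclusion zone, the local background ring, or the Whole Spot method.

```python
import statistics


def cookie_pixels(image, centroid_x, centroid_y, nominal_radius, cookie_percentage):
    """Pixels within cookie_percentage * nominal_radius of the spot centroid.

    image is a list of rows (image[y][x]); cookie_percentage is a fraction
    such as 0.5. All units are pixels (an assumption of this sketch).
    """
    radius = cookie_percentage * nominal_radius
    selected = []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if (x - centroid_x) ** 2 + (y - centroid_y) ** 2 <= radius ** 2:
                selected.append(value)
    return selected


def spot_signal(pixels, signal_method=1):
    """SignalMethod 1 = mean, 2 = median (per SpotAnalysis_SignalMethod)."""
    if signal_method == 1:
        return statistics.mean(pixels)
    return statistics.median(pixels)


image = [[x + 5 * y for x in range(5)] for y in range(5)]
pixels = cookie_pixels(image, 2, 2, nominal_radius=2.0, cookie_percentage=0.5)
print(sorted(pixels))       # [7, 11, 12, 13, 17]
print(spot_signal(pixels))  # 12
```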
282. Grid Format: Automatically Determine. Recognized formats: 65 micron feature size, 30 micron feature size, and Third Party.
Iteratively Adjust Corners: Hidden if Array Format is set to Automatically Determine. True for all formats except Third Party; False for Third Party.
Adjustment Threshold: Hidden if Array Format is set to Automatically Determine. 0.300 for all formats except Third Party.
Maximum Number of Iterations: Hidden if Array Format is set to Automatically Determine. 5 for all formats except Third Party.
Found Spot Threshold: Hidden if Array Format is set to Automatically Determine. 0.200 for all formats except Third Party.
Number of Corner Feature Side Dimension: Hidden if Array Format is set to Automatically Determine. 20 for all formats except Third Party.
Find Spots: Depending on the format selected by the software or by you, the default settings for this step change. See the following rows for the default values for finding spots.
Spot Format: Automatically Determine. Recognized formats: same as those listed above, except that 244k 10uM replaces 65 micron feature size 10 micron scans.
Use the Nominal Diameter from the Grid Template: Hidden if Array Format is set to Automatically Determine. True for all formats.
Table 7 Default settings for miRNA_1200_Jun14 protocol (continued)
Protocol step / Parameter / Default Setting Value (v12.0)
Spot Deviation Limit; Calculation of Spot Statistics Method; Cookie Percentage; Exclusi
283. Default values for the rows preceding this part of the table: 4; True; Most Conservative; 0.1000; 0.1000; True; True; True; Automatically Determine; Use Rank Consistent Probes; 0.050; False; False; False.
tive and Negative Controls Signal Characteristics: OnlyPositiveAndSignificantSignals
Normalization Correction Method: Linear
Max Number Ranked Probes: 1
Peg Log Ratio Value: 4.00
SpikeIn Target Used: False
Min Population for Replicate Stats: 3
Grid Test Format: Automatically Determine. Recognized formats: 60 micron and 30 micron feature size, third party.
PValue for Differential Expression: 0.010000
Table 3 Default settings for ChIP_1200_Jun14 protocol (continued)
Protocol step / Parameter / Default Setting Value (v12.0)
Percentile Value: 75.00
Generate Results: Type of QC Report: CGH_ChIP; Generate Single Text File: True; JPEG Down Sample Factor: 4
GE1_1200_Jun14
This protocol is a 1-color gene expression protocol for use with the One-Color Microarray-Based Gene Expression Analysis (Quick Amp Labeling) lab protocol v5.7 or higher. The publication number is G4140-90040, or G4140-90041 for Tecan HS Pro Hybridization.
Table 4 Default settings for GE1_1200_Jun14 protocol
Protocol step / Parameter / Default Setting Value (v12.0)
Place Grid: Array Format: Automatically Determine. For any format automatically determined or selected by
284. tory: The complete path to the directory where you want to keep the metric sets.
This command exports all the dye norm lists in a given database to the location you specify:
FeNoWindows c exportdyenormlists <to_directory>
to_directory: The complete path to the directory where you want to keep the dye norm lists.
Example: FeNoWindows c exportdyenormlists C DyeNormList
This command gets the barcode from the TIFF image:
FeNoWindows b tif_file
This command gets the GUID of the corresponding low-PMT scan from the input high-PMT scan, for making XDR project files.
Example: FeNoWindows getxdrscanid high_pmt_tif_file
GetProtocolList
This command gets the list of protocols available from within Feature Extraction.
Example: FeNoWindows getprotocollist
Return Codes
Return codes are integers that represent errors that caused FeNoWindows to fail without generating output. They are listed in Table 40.
Table 40 FeNoWindows return codes
Return code / Description
0: The extraction project completed without errors. The output file contains extraction information for every extraction. This success code does not guarantee the validity of every extraction in the set.
The input parameter was not found. Check that the filename and path are correct or that the database entry
285. Place Grid
The parameters and values differ depending on the selected microarray format.
Table 9 Place Grid: Default values in common and differences for grid formats
Parameter / Default value / Formats using default value
Array Format: Automatically Determine. Formats: Single Density 11k, 22k, 25k; Double Density 44k, 95k, 185k; 185k 10uM; 65 micron feature size; 65 micron feature size 10 micron scans; 30 micron feature size single pack; 30 micron feature size multi pack; Third Party.
Placement Method: Allow some distortion. Formats: All.
Enable background peak shifting: False. Formats: All except 30 micron feature size single pack and 30 micron feature size multi pack.
Use central part of pack for slope and skew calculation: False. Formats: All except 30 micron feature size single pack and 30 micron feature size multi pack.
Use the correlation method to obtain origin X of subgrids: False. Formats: All except 30 micron feature size single pack and 30 micron feature size multi pack.
Table 10 Optimize Grid Fit: Default values in common and differences for grid formats
The parameters and values differ depending on the microarray format.
Parameter: Iteratively Adjust Corners; Adjustment Threshold; Maximum Number of Iterations; Found Spots Threshold; Number of C
286. trics_differentialExpressionPValue). These are the same features shown as up- or down-regulated in Figure 29.
Plot of LogRatio vs Log ProcessedSignal
LogProcessedSignal in the plot is:
$$\text{LogProcessedSignal} = \frac{\log(\text{rProcessedSignal} \times \text{gProcessedSignal})}{2}$$
This plot shows the log ratios of non-control inliers vs. the log of their red and green processed signals. The color coding signifies the degree to which features are significantly differentially expressed:
- red: up-regulated
- green: down-regulated
- light yellow: cannot confidently be said to be differentially expressed

For the CGH QC Report, these are referred to as Positive and Negative log ratios (base 2). The threshold used to determine significance is set in the protocol (QCMetrics_differentialExpressionPValue). Features that were used for normalization are indicated in blue. Significance takes precedence over normalization in the color coding. That is, features that are both significantly differentially expressed and used for normalization are color-coded either red or green. SNP probes are not included.
[Figure 29 QC Report: Plot of Up and Down Regulated Features. X axis: LogProcessedSignal. Legend: Significantly down regulated; Significantly up regulated; Used to normalize; Not differentially expressed.]
Spatial Distribution o
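The plot coordinate and the color-precedence rule above can be sketched as follows. This is an illustrative sketch with several assumptions:
- "Log" is taken as log10.
- A positive LogRatio means up-regulated (red).
- A feature counts as significant when its p-value is below the threshold. The default of 0.01 matches the PValue for Differential Expression setting listed in the ChIP defaults.

Both function names are hypothetical.

```python
import math


def log_processed_signal(r_processed, g_processed):
    """LogProcessedSignal = Log(rProcessedSignal * gProcessedSignal) / 2 (log10 assumed)."""
    return math.log10(r_processed * g_processed) / 2


def feature_color(log_ratio, p_value, used_for_normalization, p_threshold=0.01):
    """Color code for the LogRatio vs LogProcessedSignal plot.

    Significance takes precedence over normalization, so a significant
    feature is red or green even if it was also used to normalize.
    """
    if p_value < p_threshold:
        return "red" if log_ratio > 0 else "green"
    if used_for_normalization:
        return "blue"
    return "light yellow"


print(log_processed_signal(100.0, 10000.0))  # 3.0
print(feature_color(0.8, 0.001, True))       # red
print(feature_color(0.8, 0.2, True))         # blue
```

Dividing the log of the product by 2 places each feature at the log of the geometric mean of its two processed signals.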
287. trics_minReplicatePopulation).
As described above for non-control probes, CVs are calculated for inliers for both red and green background-corrected signals. The CV for each probe is plotted on the next page vs. the average of its background-corrected signal. The median of these CVs is shown directly beneath the plot.
[Figure 35 QC Report: Agilent SpikeIns CV of Average BGSub Signal. X axis: Ave_BGSubSignal. Series: CV for Red, CV for Green. Median CV: 10.21 (Red), 10.57 (Green).]
Reproducibility plot for 1-color gene expression spike-in probes
This graph plots CV vs. the log_gMedianProcessedSignal for the 1-color gene expression microarray experiment. The region where the CV flattens out and is not tightly correlated with signal is the range where noise is proportional to signal. This is generally the range used to calculate the median CV.
[Figure 36 1-color QC Report: Agilent SpikeIns CV of Avg Processed Signal Plot. X axis: Log_gMedianProcessedSignal. Series: CV for Green. Median CV: 6.17.]
Reproducibility plot for miRNA non-control probes
This graph plots CV vs. the log_gMedianProcessedSignal for the 1-color miRNA microarray experiment. The region where the C
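The per-probe CV and the median CV shown beneath the reproducibility plots can be sketched as below. This sketch makes several assumptions:
- It uses the sample SD.
- It reports CV as a percentage.
- `min_replicates=3` stands in for QCMetrics_minReplicatePopulation, whose default is not confirmed here.

The probe names and function names are hypothetical.

```python
import statistics


def replicate_cv(signals):
    """CV (%) of one probe's inlier replicate signals: 100 * SD / mean."""
    return 100.0 * statistics.stdev(signals) / statistics.mean(signals)


def median_cv(replicates_by_probe, min_replicates=3):
    """Median CV over probes that have at least `min_replicates` inliers."""
    cvs = [replicate_cv(s) for s in replicates_by_probe.values()
           if len(s) >= min_replicates]
    return statistics.median(cvs)


spikes = {
    "spike_1": [9.0, 10.0, 11.0],    # CV = 10%
    "spike_2": [18.0, 20.0, 22.0],   # CV = 10%
    "spike_3": [1.0, 2.0],           # too few replicates, skipped
}
print(median_cv(spikes))  # 10.0
```

Using the median rather than the mean keeps a few noisy low-signal probes from dominating the summary value.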
288. tures Green / Features Red / Type / Options / Description
gProcessedSigError, rProcessedSigError (float): The universal or propagated error left after all the processing steps of Feature Extraction have been completed. In the case of one color, ProcessedSignalError has had the Error Model applied and will contain at least the larger of the universal (UEM) error and the propagated error. If multiplicative detrending is performed, ProcessedSignalError contains the error propagated from detrending. This is done by dividing the error by the normalized MultDetrendSignal.
gNumPixOLHi, rNumPixOLHi (integer): Number of outlier pixels per feature with intensity > upper threshold set via the pixel outlier rejection method. The number is computed independently in each channel. These pixels are omitted from all subsequent calculations.
gMedianSignal, rMedianSignal (float): Raw median signal of the feature from inlier pixels in the green and/or red channel.
gPixNormIQR, rPixNormIQR (float): The normalized inter-quartile range of all of the inlier pixels per feature. The range is computed independently in each channel.
gIsSaturated, rIsSaturated (boolean). Options: 1 = Saturated, 0 = Not saturated. Boolean flag indicating whether a feature is saturated. A feature is saturated IF 50% of the
gIsFeatNonUnifOL, rIsFeatNonUnifOL (boolean). Options: g(r)IsFeatNonUnifOL = 1 indicates Feature is a non-uniformity outlier in g(r).
289. tures that are population outliers in either the green or red channel.
The percentage of local backgrounds that are population outliers in either channel.
The percentage of non-control features that are feature non-uniformity outliers in either the green or red channel, or are saturated in both channels.
Background offset constant used to adjust all feature signals. If Adjust Background Globally is set to True, all feature signals are adjusted by this offset. If it is set to the value entered in the protocol, all feature signals are adjusted so that very low-level feature signals equal the protocol value.
Number of background-subtracted features with negative signals.

Table 21 Stats results contained in the text output file STATS table (continued)
Stats Green Channel / Stats Red Channel / Type / Description:
gNonCtrlNumNegFeatBGSubSig, rNonCtrlNumNegFeatBGSubSig
gLinearDyeNormFactor, rLinearDyeNormFactor
gRMSLowessDNF, rRMSLowessDNF
DyeNormDimensionlessRMS
DyeNormUnitWeightedRMS
gSpatialDetrendRMSFit, rSpatialDetrendRMSFit
gSpatialDetrendRMSFilteredMinusFit, rSpatialDetrendRMSFilteredMinusFit
gSpatialDetrendSurfaceArea, rSpatialDetrendSurfaceArea
gSpatialDetrendVolume, rSpatialDetrendVolume
gSpatialDetrendAveFit, rSpatialDetrendAveFit
Types: integer, float, float, float, float, float, float, float
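Two of the simpler statistics above, the count of background-subtracted features with negative signals and a percentage of features flagged in either channel, reduce to a count and a ratio. A minimal Python sketch: the per-feature input lists and function names are hypothetical, and the percentage is taken over all features passed in (restricting to non-control features is left to the caller).

```python
# Hypothetical helpers illustrating the count and percentage stats.


def num_neg_bgsub(bg_sub_signals):
    """Count background-subtracted signals below zero."""
    return sum(1 for s in bg_sub_signals if s < 0)


def percent_flagged_either_channel(green_flags, red_flags):
    """Percentage of features flagged in the green OR the red channel."""
    if len(green_flags) != len(red_flags):
        raise ValueError("channel flag lists must have equal length")
    if not green_flags:
        return 0.0
    flagged = sum(1 for g, r in zip(green_flags, red_flags) if g or r)
    return 100.0 * flagged / len(green_flags)


print(num_neg_bgsub([1.0, -2.0, 0.0, -0.5]))
```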
290. umber of spots per column of each subgrid.

Table 17 List of parameters and options contained within the FULL text output file FEPARAMS table (Protocol Step / Parameters / Type / Options / Description)
Grid_RowSpacing (float): Space between rows on the grid.
Grid_ColSpacing (float): Space between columns on the grid.
Grid_OffsetX (float): In a dense-pack array, the offset in the X direction.
Grid_OffsetY (float): In a dense-pack array, the offset in the Y direction.
Grid_NomSpotWidth (float): Nominal width, in microns, of a spot from the grid.
Grid_NomSpotHeight (float): Nominal height, in microns, of a spot from the grid.
Grid_GenomicBuild (text): The build of the genome used to create the annotation, if available. If the genome build is not available (not all designs have this information), it is not output. All recent and all future designs have it.
FeatureExtractor_Barcode (text): Barcode of the Agilent microarray, read from the scan image.
FeatureExtractor_Sample (text): Names of the hybridized samples (red, green).
FeatureExtractor_ScanFileName (text): Name of the scan file used for Feature Extraction.
FeatureExtractor_ArrayName (text): Microarray filename.
FeatureExtractor_DesignFileName (text): Design or grid file used for Feature Extraction.
FeatureExtractor_PrintingFileName (text): Print file, if available, used for Feature Extraction.
FeatureExtractor_PatternName (text): Agilent pattern file name.
Fe
291. umn on page 102.

QC Report, page 2 of 3 (excerpt, Green channel)
Negative Control Stats: Average Net Signals 17.94; StdDev Net Signals 1.63; Average BG Sub Signal 0.22; StdDev BG Sub Signal 1.30
Local Bkg (inliers): Number 44237; Avg 35.66; SD 1.18
Foreground Surface Fit: RMS_Fit 1.08; RMS_Resid 1.95; Avg_Fit 42.22
Multiplicative Surface Fit: RMS_Fit 0.11
Reproducibility: %CV for Replicated Probes (Median CV, signal inliers; Agilent SpikeIns and Non Control probes): 73.24, 24.85, 5.86, 5.03, 4.02, 3.15, 4.80, 4.98, 6.28; Green BGSubSignal 12.95, ProcessedSignal 5.22; 11.14; Green 11.18, 4.98

Agilent SpikeIns Signal Statistics (Green)
Probe Name      Log Relative Conc   Median Log Sig   StdDev
E1A_r60_3       0.30                0.44             0.19
E1A_r60_a104    1.30                0.87             0.11
E1A_r60_a107    2.30                1.63             0.03
E1A_r60_a135    3.30                2.68             0.02
E1A_r60_a20     3.83                3.06             0.02
E1A_r60_a22     4.30                3.55             0.01
E1A_r60_a97     4.82                4.08             0.02
E1A_r60_n11     5.30                4.67             0.02
E1A_r60_n9      5.82                4.84             0.03
E1A_r60_1       6.30                5.28             0.05

Figure: Spatial Distribution of Median Signals for each Row (Median BGSub Signal for Row and Median Proc Signal for Row vs. Row)
Figure: Spatial Distribution of Median Signals for each Column
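One way to summarize the spike-in rows above is to fit a straight line of median log signal against log relative concentration; on these log-log axes a slope of 1 would mean signal proportional to concentration. The sketch below is an ordinary least-squares fit over the ten E1A values listed in the table. It is illustrative only: the source does not say that Feature Extraction computes its spike-in metrics this way, and the helper name is hypothetical.

```python
# Ordinary least-squares line fit (hypothetical helper, not an Agilent API).


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Log relative concentration and median log signal for the E1A spike-ins above.
log_rel_conc = [0.30, 1.30, 2.30, 3.30, 3.83, 4.30, 4.82, 5.30, 5.82, 6.30]
median_log_sig = [0.44, 0.87, 1.63, 2.68, 3.06, 3.55, 4.08, 4.67, 4.84, 5.28]

slope, intercept = least_squares_line(log_rel_conc, median_log_sig)
print(f"slope={slope:.3f} intercept={intercept:.3f}")
```

With these values the slope comes out at roughly 0.86, so the median log signal rises slightly less than one log unit per log unit of relative concentration across this range.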
292. und subtracted signal. To display the values used to calculate this variable (g/r MeanSignal, g/r BGUsed) using different background signals and settings of spatial detrend and global background adjust, see Table 34 on page 254.
gIsPosAndSignif, rIsPosAndSignif (Boolean; 1 indicates the feature is positive and significant above background): Boolean flag established via a 2-sided t-test. It indicates whether the mean signal of a feature is greater than the corresponding background (selected by the user) and whether this difference is significant. To display the variables used in the t-test, see Table 34 on page 254.

Features Green / Features Red / Types / Options / Description (continued)
gIsWellAboveBG, rIsWellAboveBG (Boolean): Boolean flag indicating whether a feature is well above background: the feature passes g/r IsPosAndSignif and, additionally, g/r BGSubSignal is greater than 2.6 × g/r BG_SD. You can change the multiplier (2.6).
SpotExtentX (float): Diameter of the spot (X axis).
gBGMeanSignal, rBGMeanSignal (float): Mean local background signal (local to the corresponding feature), computed per channel from inlier pixels.
gTotalProbeSignal (float): This signal is the robust average of all the processed green signals for each replicated probe, multiplied by the total number of probe replicates, the EffectiveFeatureSizeFraction, the Nominal Spot Area and the Weight. For miRNA analys
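The IsWellAboveBG rule and the TotalProbeSignal product described above translate directly into code. A minimal Python sketch: the t-test result is taken as an input flag rather than recomputed, the robust average is passed in because the text does not define which robust estimator is used, and all function names are hypothetical.

```python
# Hypothetical helpers; robust_average is supplied by the caller.


def is_well_above_bg(is_pos_and_signif, bg_sub_signal, bg_sd, multiplier=2.6):
    """IsWellAboveBG: passes IsPosAndSignif and BGSubSignal > multiplier * BG_SD."""
    return bool(is_pos_and_signif) and bg_sub_signal > multiplier * bg_sd


def total_probe_signal(robust_average, n_replicates, effective_feature_size_fraction,
                       nominal_spot_area, weight):
    """TotalProbeSignal as the product of the factors listed in the table."""
    return (robust_average * n_replicates * effective_feature_size_fraction
            * nominal_spot_area * weight)


print(is_well_above_bg(True, bg_sub_signal=30.0, bg_sd=10.0))
```

A feature that is significant by the t-test but only 2 BG_SD above background still fails the check with the default multiplier.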
293. ure of this plot can indicate the appropriateness of background method choices. The plot should be linear. The intersection of the red vertical and horizontal lines shows the location of the median signal, and the numbers along the edge of the lines give that location on the plot. The values under the plot indicate the number of non-control features that have a background-corrected signal less than zero. SNP probes are not included.

Figure 23 QC Report: Plot of Background Corrected Signals (Red and Green Background Corrected Signals, Non Control Inliers; x axis: gBGSubSignal, log scale; Features (NonCtrl) with BGSubSignals < 0: 0 Red, 0 Green)

Histogram of Signals Plot (1-color GE or CGH)
The purpose of this histogram is to show the level of signal and the shape of the signal distribution. The histogram is a line plot of the number of points in the intensity bins vs. the log of the processed signal. SNP probes are not included.

Figure 24 1-color QC Report: Histogram of Signals Plot (x axis: Log of BG SubSignal; Features (NonCtrl) with BGSubSignal < 0: 4778 Green)

Local Background Inliers
With th
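The histogram described above bins features by the log of their signal and counts each bin. Signals that are zero or negative have no logarithm, so they are set aside and counted separately (the QC report separately lists the count of features below zero). A minimal Python sketch; the function name and bins_per_decade are hypothetical, since the bin width used in the QC report is not stated here.

```python
import math
from collections import Counter


def log_signal_histogram(signals, bins_per_decade=10):
    """Bin positive signals by log10(signal) (hypothetical helper).

    Returns (histogram, n_non_positive), where histogram is a list of
    (bin_start_log10, count) pairs sorted by bin. Non-positive signals
    cannot be log-transformed and are only counted.
    """
    counts = Counter()
    n_non_positive = 0
    for s in signals:
        if s <= 0:
            n_non_positive += 1
            continue
        # Integer bin index avoids float drift from dividing by a bin width.
        counts[math.floor(math.log10(s) * bins_per_decade)] += 1
    histogram = [(b / bins_per_decade, counts[b]) for b in sorted(counts)]
    return histogram, n_non_positive


hist, n_excluded = log_signal_histogram([1, 10, 12, 100, -5, 0], bins_per_decade=2)
print(hist, n_excluded)
```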
294. ureExtractor_ColorMode (integer): A flag to indicate output color: 0 = One color (green only), 1 = 2 color, 2 = One color (red only).

Table 17 List of parameters and options contained within the FULL text output file FEPARAMS table (continued)
FeatureExtractor_QCReportType (integer): Type of QC report to generate: 0 = Gene Expression, 1 = CGH_ChIP, 2 = miRNA, 4 = Streamlined CGH.
FeatureExtractor_OutputQCReportGraphText (integer): Generate output details on QC report graphs: 1 = True, 0 = False.

COMPACT FEPARAMS Table
Table 18 List of parameters and options contained within the COMPACT text output file FEPARAMS table (Protocol Step / Parameters / Type / Options / Description)
Protocol_Name (text): Name of protocol used.
Protocol_date (text): Date the protocol was last modified.
Scan_ScannerName (text): Agilent scanner serial number used.
Scan_NumChannels (integer): Number of channels in the scan image.
Scan_date (text): Date the image was scanned.
Scan_MicronsPerPixelX (float): Number of microns per pixel in the X axis of the scan image.
Scan_MicronsPerPixelY (float): Number of microns per pixel in the Y axis of the scan image.
Scan_OriginalGUID (text): The global unique identifier for the scan image.
Scan_NumScanPass (1 or 2): For 5 micron scans, indicates whether the scan mode was a si
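The integer codes for FeatureExtractor_ColorMode and FeatureExtractor_QCReportType can be decoded with lookup tables built from the option lists above. A minimal Python sketch, assuming the FEPARAMS values have already been read into a dict of parameter name to value (parsing the file is not shown, and the helper names are hypothetical).

```python
# Code tables copied from the option lists above.
COLOR_MODE = {0: "One color (green only)", 1: "2 color", 2: "One color (red only)"}
QC_REPORT_TYPE = {0: "Gene Expression", 1: "CGH_ChIP", 2: "miRNA", 4: "Streamlined CGH"}


def decode_code(table, raw):
    """Map a raw FEPARAMS value (string or int) to its documented meaning."""
    code = int(raw)
    return table.get(code, f"unknown ({code})")


def describe_output(feparams):
    """Summarize color mode and QC report type from a name -> value dict (hypothetical)."""
    return {
        "color_mode": decode_code(COLOR_MODE, feparams["FeatureExtractor_ColorMode"]),
        "qc_report_type": decode_code(QC_REPORT_TYPE, feparams["FeatureExtractor_QCReportType"]),
    }


print(describe_output({"FeatureExtractor_ColorMode": "0", "FeatureExtractor_QCReportType": "4"}))
```

Unlisted codes (for example 3 for the QC report type) are reported as unknown rather than guessed.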
295. us or minus 3 errors of the negative control fit. The residual of the surface fit is the error on background subtraction in the Additive Error Estimation (see Step 16, "Determine the error in the signal calculation", on page 266).

The FeaturesInNegativeControlRange algorithm has been shown to estimate zero more accurately than the All Feature Types background algorithm. This improvement is visible when the features used in the additive detrend algorithm (colored in blue) are superimposed on the InterpolatedNegCtrlSubSignal distribution: the signals of those features are closer to zero when the FeaturesInNegativeControlRange algorithm is used.

Figure 63 The effects of using all features for detrending (left panel, AllDetrend) compared with using the features in the negative control range (right panel, FeatInNCRange); x axis: InterpolatedNegCtrlSubSignal. Features that had detrending added are shown in blue. The FeaturesInNegativeControlRange algorithm more accurately centers the values around zero.

A 2D Loess algorithm fits the surface on the mean intensities of the filtered low intens
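The passage above begins mid-sentence, but the surviving part describes a band of plus or minus 3 errors around the negative control fit. The sketch below shows one plausible reading: keep the features whose signal lies within ±3 errors of the fitted negative control value at their location, for use in the additive detrend. This is an assumption-labeled illustration, not Agilent's FeaturesInNegativeControlRange implementation; the function name is hypothetical and the 2D Loess surface fit itself is not shown.

```python
# Assumed selection rule for illustration only; not Agilent's code.


def features_in_neg_control_range(signals, neg_ctrl_fit, errors, n_errors=3.0):
    """Indices of features within +/- n_errors * error of the negative control fit.

    signals, neg_ctrl_fit and errors are parallel per-feature lists.
    """
    if not (len(signals) == len(neg_ctrl_fit) == len(errors)):
        raise ValueError("input lists must have equal length")
    return [
        i
        for i, (s, fit, err) in enumerate(zip(signals, neg_ctrl_fit, errors))
        if abs(s - fit) <= n_errors * err
    ]


print(features_in_neg_control_range([10.0, 50.0, 12.0, 4.0], [10.0] * 4, [2.0] * 4))
```

Bright features such as the one at 50.0 fall outside the band and are excluded, which is what keeps the detrend surface anchored near zero.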
296. utliers: False (All)
Allow Positive and Negative Controls: False (All)
Signal Characteristics: OnlyPositiveAndSignificantSignals (All)
Normalization Correction Method: Linear and Lowess (GE2); Linear (CGH, ChIP); Lowess Only (GE2 NonAT)
Max Number Ranked Probes: 1 (All except GE2); 8000 (GE2)

Compute ratios, calculate metrics and generate results
Some of these parameters and values are the same for all the protocols, others vary, and still others do not even use a protocol step.

Table 15 Values in common and differences in protocols (protocol steps: Compute Ratios, Calculate Metrics, Generate Results; default values for v12.0)
Peg Log Ratio Value: 4.00 (not applicable for GE1 and miRNA)
Spikein Target Used: True (GE1, GE2, miRNA); False (CGH, ChIP, GE2 NonAT)
Min Population for Replicate Statistics: 5; 3 for CGH and ChIP
Grid Test Format: Automatically Determine (not applicable for GE2 NonAT)
PValue for Differential Expression: 0.010000 (All)
Percentile Value: 75.00 (All)
Type of QC Report: Gene Expression for GE1 or GE2; Streamlined CGH for CGH; CGH_ChIP for ChIP; miRNA for miRNA
Generate Single Text File: True (All)
JPEG Down Sample Factor: 4 (All)
297. versity School of Medicine.

Ultimate Grid acknowledgment
This software contains material that is Copyright (c) 1994-1999 DUNDAS SOFTWARE LTD. All Rights Reserved.

LibTiff acknowledgement
Part of this software is based upon LibTIFF version 3.8.0. Copyright (c) 1988-1997 Sam Leffler. Copyright (c) 1991-1997 Silicon Graphics, Inc. Permission to use, copy, modify, distribute, and sell this software and its documentation for any purpose is hereby granted without fee, provided that (i) the above copyright notices and this permission notice appear in all copies of the software and related documentation, and (ii) the names of Sam Leffler and Silicon Graphics may not be used in any advertising or publicity relating to the software without the specific, prior written permission of Sam Leffler and Silicon Graphics. THE SOFTWARE IS PROVIDED "AS-IS" AND WITHOUT WARRANTY OF ANY KIND, EXPRESS, IMPLIED OR OTHERWISE, INCLUDING WITHOUT LIMITATION, ANY WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL SAM LEFFLER OR SILICON GRAPHICS BE LIABLE FOR ANY SPECIAL, INCIDENTAL, INDIRECT OR CONSEQUENTIAL DAMAGES OF ANY KIND, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER OR NOT ADVISED OF THE POSSIBILITY OF DAMAGE, AND ON ANY THEORY OF LIABILITY, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFT
298. warnings on the Summary Reports about unsuccessful gridding. They also produce the assessment, shown in the QC Report, of whether the grid needs to be evaluated. In Feature Extraction, new tests have been added and thresholds tuned to decrease the number of false negatives (the Summary Report shows no problems when there are problems) and false positives (the Summary Report shows a problem when there isn't one). The parameters for these tests do not appear in the protocols, but they do appear in the FEParams output.

The following lists the question asked by each test, the metric used to answer it (the stat name that appears in the Statistics table of the result text file), and the threshold used to assess gridding success or failure. If a grid fails any one of these tests, one or more warnings appear in the reports.

How many features are not found along the edge of the microarray? Stat name: MaxSpotNotFoundEdges. Threshold_Max: 0.72
How many local background regions are flagged as non-uniform outliers in either channel? Stat name: AnyColorPrcntBGNonUnifOL. Threshold_Max: 2
How broad is the distribution of NegControl net signals? Stat name: Max(gNegCtrlSDevNetSig, rNegCtrlSDevNetSig). Threshold_Max: 100
What is the median CV of BGSubSignal of the NonControl replicated sequences? Stat names: Max(gNegCtrlMedPrcntCVBGSubSig, rNegCtrlMedPrcntCVBGSubSig), or just the green stat for a 1 co
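The pass/fail logic above, where exceeding any Threshold_Max produces a warning, can be sketched as follows. Assumptions: stats is a dict of Statistics-table names to values, a value strictly greater than Threshold_Max counts as a failure, and the stat names as recovered from the text above match the Statistics table exactly. The fourth test is left out because its threshold is cut off in this excerpt, and the function name is hypothetical.

```python
# Thresholds from the gridding tests listed above (fourth test omitted:
# its threshold is truncated). Strict ">" as failure is an assumption.
GRID_TESTS = [
    ("Features not found along the edge", "MaxSpotNotFoundEdges", 0.72),
    ("Local backgrounds that are non-uniform outliers", "AnyColorPrcntBGNonUnifOL", 2.0),
]
NEG_CTRL_SPREAD_MAX = 100.0


def grid_warnings(stats):
    """Return warning strings for a dict of stat name -> value (hypothetical helper)."""
    warnings = []
    for label, name, threshold in GRID_TESTS:
        value = stats.get(name)
        if value is not None and value > threshold:
            warnings.append(f"{label}: {name}={value} exceeds {threshold}")
    # Worst of the two channels; a one-color result may only have the green stat.
    spread = [stats[k] for k in ("gNegCtrlSDevNetSig", "rNegCtrlSDevNetSig") if k in stats]
    if spread and max(spread) > NEG_CTRL_SPREAD_MAX:
        warnings.append(
            f"NegControl net-signal spread: max(g/r NegCtrlSDevNetSig)={max(spread)} "
            f"exceeds {NEG_CTRL_SPREAD_MAX}"
        )
    return warnings


print(grid_warnings({"MaxSpotNotFoundEdges": 0.1, "gNegCtrlSDevNetSig": 20.0,
                     "rNegCtrlSDevNetSig": 150.0}))
```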
299. where X is the mean raw pixel intensity for the feature or background (i.e., MeanSignal or BGMeanSignal, respectively).

Step 11. Determine if the feature is a population outlier (IsFeatPopOL)
Agilent provides two different statistical algorithms for identifying population outliers; you select the algorithm to use in the protocol. For probe sequences with enough replicate features, Feature Extraction uses the IQR test for population outlier analysis. The minimum number of replicates needed is set by the protocol field Minimum Population, which is set to 10 by default for most Agilent protocols.

If the protocol choice Use Qtest for Small Populations is set to True, the Q test method is used when a probe sequence has fewer than the minimum population number of features. The Q test choice is set to True for Agilent's newer protocols.

Q test for replicate features < minimum population number
The Q test allows population outlier flagging for probe sequences from one less than the minimum population number down to 3. This test is especially useful for NegC probes on CGH microarrays, because flagging features as population outliers is needed to accurately calculate the NegC Avg and SD statistics. It is also useful for miRNA extraction, where flagging features as population outliers is needed to accurately calculate Gene statistics. This algorithm uses th
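The choice between the two population-outlier tests depends only on the replicate count and two protocol settings, so it can be sketched as below, together with generic versions of each test. This is not Agilent's implementation: the IQR check is a Tukey-style fence with a caller-chosen multiplier k, the Q test is the textbook Dixon gap/range ratio, critical Q values are left to the caller because the excerpt does not give them, and all function names are hypothetical.

```python
from statistics import quantiles


def choose_pop_outlier_test(n_replicates, min_population=10, use_qtest_for_small=True):
    """Pick the population-outlier test for one probe sequence.

    Returns "IQR", "Q", or None when neither test applies (the behavior in
    that case is not described in this excerpt).
    """
    if n_replicates >= min_population:
        return "IQR"
    if use_qtest_for_small and n_replicates >= 3:
        return "Q"
    return None


def iqr_outlier_indices(values, k):
    """Tukey-style fences: flag values outside [Q1 - k*IQR, Q3 + k*IQR].

    k is a caller-chosen multiplier; Agilent's exact IQR rule is not given here.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lo or v > hi]


def dixon_q(values):
    """Dixon's Q for the more extreme end value: (gap to neighbor) / range.

    Returns (q, index_of_suspect). Compare q against a critical value for
    len(values) at the chosen confidence level (not given in this excerpt).
    """
    if len(values) < 3:
        raise ValueError("the Q test needs at least 3 values")
    order = sorted(range(len(values)), key=lambda i: values[i])
    low, high = values[order[0]], values[order[-1]]
    spread = high - low
    if spread == 0:
        return 0.0, None
    q_low = (values[order[1]] - low) / spread
    q_high = (high - values[order[-2]]) / spread
    return (q_high, order[-1]) if q_high >= q_low else (q_low, order[0])


print(choose_pop_outlier_test(5), dixon_q([10.0, 10.2, 10.1, 12.0]))
```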
300. y Based Gene Expression Analysis (Quick Amp Labeling) lab protocol v5.7 or higher, publication number G4140-90050, or G4140-90051 for Tecan HS Pro Hybridization.

Table 5 Default settings for GE2_1200_Jun14 protocol (Protocol step / Parameter / Default Setting Value v12.0)
Place Grid
Array Format: Automatically Determine. For any format, automatically determined or selected by you, the software uses the default Placement Method. Parameters that apply to specific formats appear only if that format is selected. Recognized formats: Single Density (11k, 22k, 25k); Double Density (44k, 95k, 185k); 185k 10 uM (65 micron feature size, also with 10 micron scans); 30 micron feature size, single pack and multi pack; and Third Party.
Placement Method: Allow Some Distortion (all formats). Hidden if Array Format is set to Automatically Determine.
Enable Background Peak Shifting: Hidden if Array Format is set to Automatically Determine. Set to False for all arrays except 30 micron single pack and multi pack, for which it is set to True.
Use central part of pack for slope and skew calculation: Hidden if Array Format is set to Automatically Determine. Set to False for all arrays except 30 micron single pack and multi pack, for which it is set to True.
Use the correlation method to obtain origin X of subgrids: Hidden if Array Format is set to Automatically Determine. Set to False for all arrays except 30 micron single pack and multi pack f
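The Place Grid defaults above follow one rule: with Automatically Determine the format-specific parameters are hidden; otherwise the three Boolean options are True only for the 30 micron single pack and multi pack formats. A minimal Python sketch of that rule; the format label strings and the function name are hypothetical, not values read from a protocol file.

```python
AUTO = "Automatically Determine"
# Hypothetical labels for the two 30 micron formats named in the table.
THIRTY_MICRON_FORMATS = {"30 micron single pack", "30 micron multi pack"}


def place_grid_defaults(array_format):
    """Default Place Grid settings for an Array Format (sketch of the table rules).

    Returns an empty dict for Automatically Determine, because those
    parameters are hidden in that case.
    """
    if array_format == AUTO:
        return {}
    is_30_micron = array_format in THIRTY_MICRON_FORMATS
    return {
        "Placement Method": "Allow Some Distortion",
        "Enable Background Peak Shifting": is_30_micron,
        "Use central part of pack for slope and skew calculation": is_30_micron,
        "Use the correlation method to obtain origin X of subgrids": is_30_micron,
    }


print(place_grid_defaults("30 micron multi pack"))
```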
301. you, the software uses the default Placement Method. Parameters that apply to specific formats appear only if that format is selected. Recognized formats: Single Density (11k, 22k, 25k); Double Density (44k, 95k, 185k); 185k 10 uM (65 micron feature size, also with 10 micron scans); 30 micron feature size, single pack and multi pack; and Third Party.
Placement Method: Allow Some Distortion (all formats). Hidden if Array Format is set to Automatically Determine.
Enable Background Peak Shifting: Hidden if Array Format is set to Automatically Determine. Set to False for all arrays except 30 micron single pack and multi pack, for which it is set to True.
Use central part of pack for slope and skew calculation: Hidden if Array Format is set to Automatically Determine. Set to False for all arrays except 30 micron single pack and multi pack, for which it is set to True.
Use the correlation method to obtain origin X of subgrids: Hidden if Array Format is set to Automatically Determine. Set to False for all arrays except 30 micron single pack and multi pack, for which it is set to True.

Table 4 Default settings for GE1_1200_Jun14 protocol (continued) (Protocol step / Parameter / Default Setting Value v12.0)
Optimize Grid Fit: Grid Format
Find Spots: Spot Format
The parameters and values for optimizing the grid differ depending on the format I
