Home

User Manual for - Statistician

1. Statistician Powerful Easy to Use Statistics Add in for Excel User Manual for Statistician Lite Statistician Standard from xlQA www StatisticianAddin com Preface The Statistician Excel 2007 2010 2013 365 addin performs a range of statistical analysis within the Microsoft Excel environment Excel is a powerful and widely used data analysis tool that was not originally designed for statistical analysis Although Excel contains a number of statistical functions see Appendix A it is often difficult to implement in Excel a number of the statistical tests to the level required by many researchers and students Statistician overcomes this shortcoming of Excel Statistician is totally integrated within Excel making the extensive graphical data analysis and presentation features of Microsoft Excel available to the user without having to cut and paste between various software packages Statistician is unique amongst Excel statistical addins because of its extensive use of meaningful input forms that make the use of the software easy and intuitive Because statistical analysis is performed upon random variables Statistician forces the user to define Data Variables which are consequently used for statistical analysis The data observations which comprise a Data Variable are stored in a hidden worksheet and are saved when the user exits Excel This feature gives Statistician the look and feel of a professional statis
2. on the second object Let dmn be the distance between the two objects and let k be the number of measures on each object The full set of distance measures available in Statistician are Distance Measure Definition k Euclidian dmn HCH ni i 1 k Squared Euclidian a xa ni i 1 k Manhattan City Block a4 HI nj i 1 Chebychev Maximum dmn sup Um nil 1sisk dmn 1 fan Correlation 1 where fnn is the sample correlation between m and n dmn 1 tal Correlation 2 where fnn is the sample correlation between m and n dmn JI Tmn where fmn is the sample correlation between m and n Correlation 3 k Cosine eee Amir X Nj mn ek 2 k ai dint MX Liar Ni 1 k p f dnn mi ni Minkowski ZS hs dee i where p 1 default is p 2 117 k Im nj Bray Curtis dmn EE L m nil i 1 M n Canberra dmn Jam ml Lu mc nj i 1 dmn m n S 1 m n where S is the inverse of the covariance Mahalanobis matrix between all objects Note that distances cannot be standardized when using the Mahalanobis distance measure Statistician also offers a number of distance measures that are suitable for working with binary data 0 and 1 Define a b cand das a count of i such that m n 1 b count of i such that m 1 and n 0 c count of i such that m 0
3. 73 Using Statistician To perform a Kolmogorov Smirnov goodness of fit test the user selects a Data Variable from the Distribution of combobox in the Hypothesis Test groupbox The hypothesized distribution of the sample is selected from the Distribution combobox The parameters of the hypothesized distribution are entered in the Distribution Parameters groupbox which is presented to the user after a distribution is selected Distributions which can be tested are as follows Distribution Parameters Type Normal Mean Decimal Standard Deviation Positive Decimal Uniform Lower Bound Decimal Upper Bound Decimal Log Normal Mean Decimal Standard Deviation Positive Decimal Exponential Mean Positive Decimal Weibull Scale Positive Decimal Shape Positive Decimal References David J Sheskin 2007 Handbook of parametric and nonparametric statistical procedures 4th ed Chapman amp Hall CRC 74 Section 8 y Tests not available in Statistician Lite a Multinomial Experiment A multinomial experiment is an extension of a binomial experiment where two or more outcomes are possible in each trial Assume there are n trials of an experiment with k possible outcomes Let pj 1 lt i lt k be the probability of outcome i The null and alternative hypotheses are Ho Py p2 Pe H At least one probabilty is dif ferent from the others Let Or denote the obser
4. Le 1 Bib y Se YF A r t The in sample forecast for the three factor model with multiplicative seasonal indexes is given as F L 1 bt 1 X S _ The m step ahead out of sample forecast beginning after the final observation in the series is calculated as 113 Ft m Lr mbr x S The triple factor Holt Winters requires initial values for the seasonal indices and also for the initial trend value b These can be estimated in a variety of ways Two seasons of data or 2s observations are employed in this task The initial value of the trend at time s is given as Leon Mac 4228 TVs b EE E H Z nn yi Sy d i 1 The initial value for the it seasonal index is the seasonal calculated from the first two seasons of data For additive seasonal indexes it is given as _ Yit Vsti ES EE where A is the arithmetic average of all observations in the first two seasons For multiplicative seasonal indexes the it seasonal index is given as Sj ZZ The initial value of the smoothed series at time s is given as S 1 D i 1 From these initial values of S bs and s the calculation of the series is given from time s 1 Of course some users will wish to employ other initial values for the level trend and seasons This can be easily implemented by manually changing the formulas within Excel itself Users may also wish to select the alpha beta and gamma parameters based upon the criteria
5. PEARSON PERCENTILE PERCENTRANK PERMUT POISSON PROB QUARTILE RANK RSQ SKEW SLOPE SMALL STANDARDIZE STDEV STDEVA STDEVP STDEVPA STEYX TDIST TINV TREND TRIMMEAN TTEST VAR VARA Returns the standard normal cumulative distribution Returns the inverse of the standard normal cumulative distribution Returns the Pearson product moment correlation coefficient Returns the k th percentile of values in a range Returns the percentage rank of a value in a data set Returns the number of permutations for a given number of objects Returns the Poisson distribution Returns the probability that values in a range are between two limits Returns the quartile of a data set Returns the rank of a number in a list of numbers Returns the square of the Pearson product moment correlation coefficient Returns the skewness of a distribution Returns the slope of the linear regression line Returns the k th smallest value in a data set Returns a normalized value Estimates standard deviation based on a sample Estimates standard deviation based on a sample numbers text and logical values including Calculates standard deviation based on the entire population Calculates standard deviation based on the entire population including numbers text and logical values Returns the standard error of the predicted y value for each x in the regression Returns the Student s t distribution Returns the inverse of the Student
6. p degrees of freedom where N is the number of observations and p is the number of parameters in the model Let l b denote the maximum of the log likelihood function for a Poisson model The restricted log likelihood is estimated with only a constant as the independent variable and is denoted by l b The LR statistic tests the joint hypothesis that all of the independent variables except the constant are equal to zero It is given as LR 2 DO un The asymptotic distribution of the LR statistic has a y7_ distribution from which the probability of the LR statistic can be obtained The pseudo R is a likelihood ratio index and is defined as pseudo R 1 The Akaike information criterion corrected Akaike information criterion and Bayesian information criterion are given as 102 2I b 2k ACS 4 n n 21 b 2k Oye C ted AIC orrecte Ski 21 b kin k peaa O Eng n n Using Statistician Count Models To estimate a Poisson regression the user selects a binary dependent variable from the Dependent Variable combobox and then selects the independent variables from the Independent Variables listbox If no independent variables are selected then a constant term must be included in the regression Checking either With Constant or Without Constant in the Constant frame determines if a constant term is estimated in the model The regression can be performed with all observations by clicking the U
7. Also called weighted pair group method using centroid averages UPGMC This method is identical to the Centroid method except that equal weighting is to the clusters to be merged The combinatorial formula is Drg Dex Dsx Se 1 Drs ZRS Ward The cluster to be merged is the one which will produce the least increase in the within cluster sums of squares error The sums of squares error within a cluster is from the centroid of a cluster and the distances between objects in that cluster The combinatorial formula is Drg NRtNK NstnNK NK D TA RK NRHNSHNK SK NRtNstnKk Dee NRHNSHNK RS Cluster analysis often uses standardized data as unstandardized data yields inconsistent results when different scales are employed to form clusters 120 ii K means Clustering K means clustering is simple non hierarchical technique For a specified number of clusters K the clustering algorithm proceeds as follows 1 Arbitrarily assign each object to one of K clusters 2 Calculate the centroid of each cluster 3 For each object in each cluster calculate it s Euclidian distance from itself to the centriod of all clusters If the object is closer to another cluster s centroid then move it into that cluster 4 Recalculate the centroid of each cluster 5 Continue looping through steps 3 and 4 until no further changes are made 121 b Principal Com
8. d then the null hypothesis is rejected if T gt eri If the alternate hypothesis is H 0 0 then the null hypothesis is rejected if T gt is or T lt EH A confidence interval for the population variance at x n 1 s n 1 s confidence level 1 a is given as lt o lt Tanas X a 2n 1 X a 2 n 1 b Two Samples Statistician Variance Test Two Samples Hypothesis Test a 0 05 Null Hypothesis Variance of Select a Data Variable v wi Variance of Select a Data Variable v Alternative Hypothesis Variance of wi Variance of V Output Confidence Interval Significance Significant Digits 2 Y O1 5 10 _ Other wl Format Output Output Results The primary function Two Sample groupbox is to test the hypothesis that the population variances a 07 of two selected Data Variables are equal to greater than or equal to or less than or equal to each other 53 The following sets of null and alternative hypotheses can be tested a Ho o o2 H 0 of two tailed test or Ha gt o2 or H4 02 lt o2 b Ho oi gt of H4 0f lt o2 c Ho o2 lt o2 Hy 0 gt o2 The test statistic is given as s s The F statistic has an F distribution with n 1 degrees of freedom in the numerator and n 1 degrees of freedom in the denominator where n and n are the sample sizes of the first and second Data Variables respectively If the alternate hypothesis is H
9. d Statistical Tables The Statistical Tables form is selected by clicking the Statistical Tables menu item from the Tools button on the Statistician ribbon tab Statistician Statistical Tables Normal Alpha O Student t a OF Chi Squared Binomial Cumulative Poisson Cumulative Mann Whitney One Tail Mann Whitney TwoTails Wilcoxon Rank Sum Wilcoxon Signed Rank Sum Runs Lower Critical Values Runs Upper Critical Values Spearman Rho Studentized Range Durbin Watson O Kendall tau Format Output Output Results Manage Data Close 23 Statistical tables that can be outputted to an Excel spreadsheet are as follows 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Normal Distribution standard normal probabilities in the range 0 z where z is a normal critical value Student t Distribution Critical values from the student t distribution for both one and two tailed tests For a one tailed test significance values are o 0 1 0 05 0 025 0 01 0 005 0 001 and for a two tailed test significance values are a 0 2 0 1 0 05 0 02 0 01 0 002 F distribution Critical values for the F distribution for a given number of degrees of freedom The significance level of the table is selected from the Alpha textbox and can equal a 0 1 0 05 0 01 x Distribution Critical values
10. decision rule and conclusion for each of the five tests Statistician automatically determines if the user has selected a small or large sample based upon a the number of observations in each Data Variable and b the criteria defining a small and large sample outlined in the preceding discussions on each of the tests For each of the four location non parametric tests the following additional information for each selected Data Variable is also outputted Mann Whitney test Rank Sum Average Rank and U statistic Wilcoxon Rank Sum test Rank Sum Wilcoxon Signed Rank Sum test Number of positive negative zero and non zero differences Rank sum of positive T and negative T differences Sign test Number of positive negative and zero and differences and p value 69 g Non Parametric Tests Two or More Samples When comparing two or more samples of ordinal data the Kruskal Wallis and Friedman tests can be employed Assuming we have k populations of ordinal data The null and alternative hypothesis to be tested is Ho The location of the k populations are the same H The location of at least one poulation dif fers from the others h Kruskal Wallis Test The Kruskal Wallis test is employed to compare two or more populations of ordinal data which may have a non normal distribution and are independent Let n n denote the sample size of each of the k samples let n n n All of the o
11. lt 30 The critical value Terit is taken from a table of Wilcoxon signed rank sum test critical values for a given significance level a Statistician has Wilcoxon Signed Rank Sum critical values for one and two tailed tests where a 0 1 0 05 and 0 01 in the small sample case For a two tailed test with alternative hypothesis H L 12 the null hypothesis is rejected if T lt Toto For a one tailed test with alternative hypothesis H4 L4 lt L2 the null hypothesis is rejected if T lt Terita and Tt lt T7 For a one tailed test with alternative hypothesis H L gt Lz the null hypothesis is rejected if T lt Terita and Tt gt T7 The large sample case is defined when n gt 30 The test statistic is given as Z u 205 eee The z test statistic has a standard normal distribution For a two tailed test with alternative hypothesis H Lt the null hypothesis is rejected if z Saz where Z 2 is the critical value and a is the level of significance For a one tailed test with alternative hypothesis H L lt L3 the null hypothesis is rejected if z 2 z and T lt T For a one tailed test with alternative hypothesis H L gt L3 the null hypothesis is rejected if z z and T gt T7 66 f Sign Test The sign test tests if the location of matched pairs are different where the matched pairs may not be independent The test statistic is constructed by taking the difference of the matched pairs di X i2 x2
12. member currency to another by using the euro as an intermediary triangulation Returns data stored in a PivotTable report Returns the register ID of the specified dynamic link library DLL or code resource that has been previously registered Connects with an external data source and runs a query from a worksheet then returns the result as an array without the need for macro programming CUBEKPIMEMBER Returns a key performance indicator KPI name CUBEMEMBER property and measure and displays the name and property in the cell A KPI is a quantifiable measurement such as monthly gross profit or quarterly employee turnover used to monitor an organization s performance Returns a member or tuple in a cube hierarchy Use to validate that the member or tuple exists in the cube CUBEMEMBERPROPERTY Returns the value of a member property in the cube Use to validate that a member name exists within the cube 138 CUBERANKEDMEMBER CUBESET CUBESETCOUNT CUBEVALUE and to return the specified property for this member Returns the nth or ranked member in a set Use to return one or more elements in a set such as the top sales performer or top 10 students Defines a calculated set of members or tuples by sending a set expression to the cube on the server which creates the set and then returns that set to Microsoft Office Excel Returns the number of items in a set Returns an aggregated value from a
13. we Known and equal Za 207 Nae ee Wwe Where s and s2 are the sample variances of each random variable o and of are the known and unequal variances of each random variable o is the known and equal variance of each of the random variables and where w is half of the confidence interval width Using Statistician Test Difference in Population Means Independent Samples The Data Variables to be tested are selected from the Null Hypothesis comboboxes in the Hypothesis Test groupbox The hypothesized difference in the means of the Data Variables are entered in the Hypothesized Difference textbox in the Hypothesis Test groupbox The null hypothesis gt lt and alternative hypothesis lt gt are selected in the corresponding comboboxes in the Hypothesis Test groupbox If the standard deviation of the populations is unknown and unequal or unknown and equal then one of the Unknown and Unequal or Unknown and Equal radiobuttons are selected in the Standard Deviation groupbox In this case the sample standard deviation of the selected Data Variables are calculated and a t test is performed If the standard deviation of the populations is known and unequal or known and equal then one of the Known and Unequal or Known and Equal radiobuttons are selected in the Standard Deviation groupbox The known standard deviations are entered in the 38 textbox s that are enabled upon making one of these selections In
14. where zu and x are the gh observation on the first and second selected Data Variables respectively and d is the difference between the matched pairs 1 lt i lt n If d 0 then the ih matched pair is ignored Let the number of remaining matched pairs with a non zero difference be denoted by n The number of positive n and negative n differences are then recorded If the location of both samples is the same then it is expected that the population proportion of successes p is given by p 0 5 The null hypothesis is given as Ho p 0 5 The alternative hypotheses are given as Hip 0 H p gt 0 Ay p lt 0 The number of positive differences in the sample is modeled with a binomial distribution If n p gt 5 and n p 1 p 5 which implies n gt 10 when p 0 5 then this distribution can be closely approximated with a normal distribution with mean 0 5n _ and standard deviation 0 V0 25n The test statistic is given as z fe p 0 5Vn 67 g Kolmogorov Smirnov Test two sample The Kolmogorov Smirnov test tests if a sample from two random variables are drawn from the same distribution Let F X and F X denote the cumulative probability distribution from which the data from the first and second sample are drawn respectively The null hypothesis for the test is given as Ho Fy X F X The alternative hypotheses are given as Hy F X F X H1 Fy X gt F X Hy F X lt F X T
15. 0 7 lt o then the null hypothesis is rejected if F lt F 1 an 1 n 1 Where a is the level of significance If the alternate hypothesis is H1 a gt of then the null hypothesis is rejected if F gt F an 1 n 1 If the alternate hypothesis is H 0f 03 then the null hypothesis is rejected if F lt Fa_ e 2n 1n 1 or F gt F 2 n 1 n 1 A confidence interval for the ratio of the two population variances of oF S 2 2 Si ER 1 1S given as STi ie lt of o2 lt 37 F a 2m 1m2 1 c Many Samples Two tests are available to test if k samples have equal variances they are the Bartlett test and the Levine test The null and alternative Hypotheses are given as Ho o2 of gt o H At least one pair of variances are unequal i e a o i j 54 Bartlett Test Statistician Variance Test Bartlett Data Variables wl Summary Statistics Significance 1 5 O10 O Other zi Significant Digits H v Select All Variables DeSelect All Variables l g Output Results Manage Data Close The Bartlett test is constructed as follows Let n denote the size of the Oh sample and let n SL be the pooled sample size Let s be the variance of the Oh sample and 1 y n 1 s denote the weighted average of the variance of all samples as s TES The test statistic is given as 8 n k In s2 YE UDints 1 1 1 VEER Zh m m cl The T statistic h
16. 2y 1 The square root of the diagonal elements of the inverse of the Hessian matrix provides maximum likelihood standard errors for the parameter estimates For the Logit model the first partial derivative with respect to fis given as auf ap Am Stee and for the Probit model the first partial derivative with respect to J is given as ant wi ap Dra i 1 The estimated residuals from the models are given as Y F Xb where is a k x 1 vector of residual estimates and F is the functional form for either the Probit or Logit model Let l b denote the maximum of the log likelihood function for a Probit or Logit model The average log likelihood is given as l b n The restricted log likelihood is estimated with only a constant as the independent variable and is denoted by l b The LR statistic tests the joint hypothesis that all of the independent variables except the constant are equal to zero It is given as LR 2 1 5 un The asymptotic distribution of the LR statistic has a y7_ distribution from which the probability of the LR statistic can be obtained The McFadden R is a likelihood ratio index and is defined as McFadden R 1 l b The Akaike Bayesian and Hannan Quinn information criterion are given as 2I b 2k ACS n n 98 BIC any E n n 21 b 2kIn In k ce a HIC respectively Using Statistician Binary Models To esti
17. Ay Up lt D c Ho Up lt D Hi up gt D The sample average difference in matched pairs is given as n a 1 Xp SE e x2 X X2 i 1 where zu and x2 are the i observation on the first and second selected Data Variables respectively The quantities x and x are the means of the first and second selected Data Variables respectively The sample standard deviation of the differences in matched pairs is given as Sp n 1 net 3 Go ze GK i 1 The test statistic under the null hypothesis is given by t _ Xp D Sp Vn and has a t distribution with n 1 degrees of freedom The null hypothesis is rejected if Alternative Hypothesis Rejection Criterion Ay Up D t gt ta 2n 1 H up lt D tX tinai Hi up gt D t gt tan 1 The confidence interval for up is given by 7 Sp Xp taj2n 1 Ja 41 The required sample size to estimate up to within a particular value w is the lowest integer greater than Nin the expression pea 255 N w Using Statistician Test Difference in Matched Pairs Dependent Samples The Data Variables to be tested are selected from the Null Hypothesis comboboxes in the Hypothesis Test groupbox The hypothesized difference in the means of the Data Variables are entered in the Hypothesized Difference textbox in the Hypothesis Test groupbox The null hypothesis gt lt and alternative hypothesis lt gt ar
18. EE 59 H Crame r von Mises Testisi he edd EEN gel E eege assess 60 Section 7 Non Parametric Tests 61 a Runs KEE 61 b Non Parametric Tests Two Sample EE 63 Mann Whitney U Test ioen eare raria a ai ia a an ad E a aa a 64 d Wilcoxon Rank Sum TeStiisiewcccsscsecsiesleadese vance teslealdesedannansesecenss carcasedees dancevennakivaeeleas 65 e Wilcoxon Signed Rank Sum Test 66 Sis Leet gebiet le le cent ed eege EEN ee EAR Heated e dene 67 g Kolmogorov Snurnov Test two sample ufreet Ee iene AN ee Ree 68 g Non Parametric Tests Two or More Samples 70 6 Kruskal Wallis Testi casead ci cep TTT 70 i Priediman EE 71 ele NEE KEE 75 a M ltinomial et 75 B Contingency Tables sisi i reene e E aly E Wey 77 Section O ANOV Aus ele Ae Ne Nd ae Nees cand Eed eee saa raed te Peet 79 a Single factor ANOVA sessed adap Ee EE Abee 80 b Two factor ANOVA without replication cce cece cece ec ec eens cece en eeeeeeeeeeneeneaeaes 83 c Two factor ANOVA with Replication cccccccececc ec eeeececneceeee cece eneeeeeceeeeneenenenes 85 Section 10 Regr ssion Analy EE 89 Section 11 Bi ary M delS TEE 96 Section 12 Count Models 07 cas cee ce NEEN choca AEN REENEN ENNEN EEN SEN ductors enpevens daeeesd de 101 Section LS Time Series eeri are nea i ds SEN ty He ene lee ae ee ea 105 a FOreCastine TEE 105 b Holt Winters smoothing techniques wcicciei sian EENS eure ee dE 112 c Hodric
19. S My b bit for any given value oft An estimate of s is given as amp ys bp byt ii Quadratic Trend The time series is modeled as a quadratic function of the time variable t Using regression analysis the following model is estimated Ye Po Bit Bot amp where fo Band D are fixed parameters and N 0 a2 Let bo b and b be least squares estimates of Bp pand p respectively The smoothed values of y are the fitted values from the regression that is Se Ps b b t b t for any given value of t An estimate of is given as amp ys bo byt bot iii Exponential Trend The time series is modeled as a exponential function of the time variable t Using regression analysis the following model is estimated Inyt Bo Bit amp where Dn and E are fixed parameters and N 0 02 Let bo bh be least squares estimates of f and Du respectively The smoothed values of y is given by Si 9 et for any given value oft An estimate of is given as y e othit 107 iv Autoregessive Trend The time series is modeled as a function of p past values of the dependent variable Using regression analysis the following model is estimated p Yt Qo gt iVe i Et est where q is a fixed parameter and N 0 02 Let gcbe least squares estimates of qai The smoothed values of y is given by S ao Di aye An estimate of SKS Ge p r is giv
20. and Ze are the critical values for a two tailed and one tailed test respectively 44 The confidence interval for an estimate of the population proportion is given as p Zo SC The sample size required to estimate the population proportion to within a particular value w for a given confidence level oe can be calculated in two ways 1 Point estimate If there is reason to believe the true population proportion is reasonably close to the estimated sample proportion then the required sample size is given by the lowest integer greater than 24 201 f w2 2 Conservative To obtain a conservative estimate of the required sample size employing no assumptions about the true population proportion then the required sample size is given by the lowest integer greater than UE w2 45 Using Statistician Test Population Proportion The Data Variable to be tested is selected from the Null Hypothesis combobox in the Hypothesis Test groupbox The hypothesized proportion of successes in the Data Variable is entered in the Hypothesized Proportion textbox in the Hypothesis Test groupbox The null hypothesis gt lt and alternative hypothesis lt gt are selected in the corresponding comboboxes in the Hypothesis Test groupbox All Data Variables tested must be binary variables that is they have one of two distinct values referred to as success of failure These distinct values
21. are outputted The user can also test the hypothesis that any pairs of treatments have significantly different means Three tests are available the Fisher s least significant difference test Tukey Kramer test and the Scheffe The results of these tests are outputted by selecting the Fisher LSD Tukey Kramer or Scheffe radiobuttons respectively in the Pairwise Tests groupbox Statistician then outputs the null and alternative hypothesis the difference in the means of the Data Variables the critical value and the decision to reject or not reject the null hypothesis 82 b Two factor ANOVA without replication When two factors impact upon an experimental observation a two factor ANOVA table can be used to test the effect of both of these factors The treatment effect is usually considered as the first factor and a block or replicate effect is taken as the second factor Two sets of hypothesis are of interest a A test of the hypothesis that the means of all of the treatments are equal b A test of the hypothesis that the means of all of the blocks are equal Let uj be the mean of treatment i where 1 lt i lt a and let u be the mean of block j where 1 lt j lt b The null and alternative hypothesis for both sets of tests can be stated as a Ho wi p3 Ha H not all of the means of the treatments are equal b Ho we u3 up H not all of the means of the blocks are equal Let x j be the sam
22. cube 139
23. defining a small and large sample outlined in the preceding section 62 b Non Parametric Tests Two Sample Non parametric testing techniques involve testing hypothesis about ordinal or ranked data When working with ordinal data the concept of a mean makes little sense and consequently many statisticians prefer to refer to the location of the data as a measure of central tendency In the following tests two samples from two populations are drawn Let L and L denote the location of the first and second populations respectively The following set of null and alternative hypothesis can be tested a Hal L2 H L4 L two tailed test or H L4 gt L or Hi L lt Lp b Ho L1 gt Lp Hi L lt L c Ho Li lt L gt Hi L gt Lz Four location tests are available They are the Mann Whitney U test the Wilcoxon Rank Sum test the Wilcoxon Signed Rank Sum test and the Sign test 63 c Mann Whitney U Test The Mann Whitney U test tests the relative location of two independent samples of ordinal data Let n and n be the sample size of the observations drawn from both populations respectively and letn n n The data from both samples are pooled and then ranked from 1 to n Observations with a tied rank are assigned a rank which is the average of the corresponding raw ranks Define S as the sum of the ranks from the first sample and define S as the sum of the ranks from t
24. e number of ones in each Data Variable e number of zeros in each Data Variable e proportion of successes in each Data Variable 1 x and x are the number of successes in each Data Variable e pooled sample size n n n3 a x x2 e pooled proportion of successes p ae 1 2 Pi N22 e m 1 p n2 1 f2 e test statistic e critical value e p value e decision rule and e conclusion A x and pz 2 where To obtain the sample size for the difference in proportions for a given confidence interval width the Output Sample Size checkbox in the Confidence Interval groupbox is checked The user then enters the desired distance from the true difference in population proportion in the corresponding textbox The number of significant digits after the decimal point in the confidence interval output is selected from the Number of Significant Digits in Output combobox the default is 3 50 Section 5 Variance Tests not available in Statistician Lite The Variance Tests button on the Statistician ribbon tab provides the user with four menu items These are i One Sample ii Two Samples iii Bartlett and iv Levene Each of these menu item selections are described in detail below Note that in the following discussion the notation eon and F qn n denote values of the y and F distribution where a is the probability area in the right tail of the dis
25. eee Format Output Failure Output Results Gen The Test Population Proportion form is activated by selecting One Sample from the Proportion combobox in the Standard Tests groupbox The Test Population Proportion form has two primary functions a To test the hypothesis that the population proportion of successes mt in a selected Data Variable is equal to greater than or equal to or less than or equal to a hypothesized value p b To produce a confidence interval for the proportion of successes in a Data Variable 43 The following set of null and alternative hypotheses can be tested a Ho T p H m p two tailed test or Hi t gt p or Hen b Ho Tt 2 p Hi t lt p c Hon lt p H n gt p A binary variable can take on distinct values which are classified as either success or failure Denote p S as the sample proportion of successes where x is the number of successes in the sample and n is the sample size The sampling distribution of p under the null hypothesis is approximately normal with mean p and standard deviation p 1 p n The test statistic DP pa p n is approximately standard normally distributed The normality assumption is reasonably accurate when np gt 5 and n 1 p 5 The null hypothesis is rejected if Alternative Hypothesis Rejection Criterion Hy p z gt Zaye Hy lt p Z lt Za H n gt p Z gt Zo where Z 2
26. i Runs ii Mann Whitney iii Wilcoxon Rank Sum iv Wilcoxon Signed Rank Sum v Sign vi Kolmogorov Smirnov two sample vii Kruskal Wallis viii Friedman ix Chi square goodness of fit x Kolmogorov Smirnov goodness of fit Each of these menu items are described in detail below a Runs Test The runs test is employed to test if data is serially related The null and alternative hypotheses to be tested are Ho The data is serially independent H The data is serially dependent The test is conducted on a series of binary data that is data that can only take two distinct values Under the null hypothesis it can be expected that there is no particular pattern in consecutive runs of observations in each of the two categories of data in the series Let n and n denote the number of observations in each of the two categories of data in the series The statistic R is the number of runs in the series In the small sample case where n lt 20 and n lt 20 if R is above an upper critical value or below a lower critical value for a given level of significance the null hypothesis is rejected In the large sample case where n gt 20 or n gt 20 the distribution of runs is approximately normal with mean Llp 2m 44 and standard Nz N2 61 deviation op iat haath The test statistic in the large sample case is given as ni n2 ni n2 1 z e R and can be tested wit
27. in each variable and n and n are the number of observations on each variable The pooled proportion of successes is a _ X4 X2 denoted by p SS The test statistic is dependent upon the value of the hypothesized difference in proportions pp If pp 0 the test statistic is given as If pp 0 the test statistic is given as D B2 Dn CEEA ny N2 The confidence interval is given by PGS Py J el Ba 1 P2 Zas2 m n where a is the significance level The z statistic in each case has approximately a standard normal distribution ifn p 2 5 n1 1 f1 2 5 na 2 5 and n 1 2 2 5 The null hypothesis is rejected if Alternative Hypothesis Rejection Criterion Hy T2 Pp z gt ze Hy T3 lt Pp Z lt Zo Hy 12 gt Pp Z gt Zq where Zou and Ze are the critical values for a two tailed and one tailed test respectively 48 The required sample size to construct a confidence interval for the true difference in population proportions can be calculated in two ways 1 Point estimate If we have reason to believe the true difference in population proportions is close to the estimated difference in sample proportions Di 2 then the required sample size is given by the lowest integer greater than Za 2 S C mp f1 E E GA w2 2 Conservative If we wish to obtain a conservative estimate of the required sample size emplo
28. in two key ways i The sample mean and standard deviation are employed in the test rather than employ a hypothesised mean and standard deviation ii Different critical values are used see Abdi and Molin 2007 59 f Cram r von Mises Test The Cram r von Mises test statistic T is given as n 2 EE E Ge Tan 2 al 2n a where F x is the cumulative normal distribution with the mean and standard deviation estimated from the data References Abdi Herv and Molin Paul 2007 Lilliefors Van Soest s test of normality in Neil Salkind Ed 2007 Encyclopedia of Measurement and Statistics Thousand Oaks CA Sage Thode Jr H C 2002 Testing for Normality Marcel Dekker New York Shapiro S and Wilk B 1965 An analysis of variance test for normality complete samples Biometrika 52 3 and 4 p 691 Sheskin David 2007 Handbook of parametric and non parametric statistics 4th edition Chapman and Hall M A Stephens 1986 Tests Based on EDF Statistics In D Agostino R B and Stephens M A Goodness of Fit Techniques New York Marcel Dekker 60 Section 7 Non Parametric Tests not available in Statistician Lite The Non Parametric Tests button on the Statistician ribbon tab provides the user with a number menu items which activate various forms to perform a number of non parametric tests These are
29. of minimizing the MAD RMSE or MAPE This is an optimization problem that is not trivial as the curve to be minimized often contains many local minima and maxima Excel solver is a fast optimization tool that can be employed in this task However it does not always find the global minimum Statistician Optimizer found in the Tools groupbox can also be employed in the optimization task It is slower than Excel Solver but does sometimes find a global minimum that Excel Solver does not find 114 c Hodrick Prescott filter The Hodrick Prescott is employed to separate the cyclical component c from the growth component g in a time series y t 1 T The time series is written as Mu aGte Typically y is the logarithm of a macroeconomic variable such as GDP The growth components are found by minimizing T T 1 Xo 9 A 3 Toa 9t Ge gel t 1 t 2 Typically for annual data A 100 for quarterly data A 1 600 and for monthly data A 14 400 The solution to this equation is given by I 4K y where is the estimated growth component y y Yr I is aT xT identity matrix and Kis a T xT symmetric matrix of the form 1 2 1 0 o 2 5 4 1 0 K 1 4 6 AL 0 o 1 4 6 a 1 4 1 0 115 Section 14 Multivariate Analysis not available in Statistician Lite Multivariate analysis in Statistician consists of two techniques cluster analysis and principle component analys
30. of standard errors that are selected 4 Fitted Values and Residuals The estimated fitted values and residuals from the regression are numbered and reported 5 Mallows C Analysis A Mallows Cp analysis is conducted with output p Cp R adjusted R and a list of the relevant Data variables Output is sorted by the value of Cp Note no more than 9 regressors can be selected in a Mallows Cp analysis 95 Section 11 Binary Models not available in Statistician Lite The Binary Models form is selected by clicking the Binary Model button on the Statistician ribbon tab Statistician Binary Models Dependent Variable Select a Data Variable v Output Logit dl Model Statistics Dependent Variable Data A L 3 Probit Variance Covariance Matrix Success Estimation Information Failure 77 Constant With Constant Fitted Values and Residuals C Without Constant Independent Variables Sample Range Use All Observations User Defined From to 7 wl Format Output Output Results Manage Data Close Technical Discussion The Probit and Logit Logistic models model the probability that a binary dependent variable equals one This probability is given as a function F of the explanatory variables X and a vector of parameters fp The models are specified as P Y 1 X F X B P Y 0 X 1 F X f 96 Binar
31. parameters in the range 0 1 and L b and F are the level trend and forecast series respectively The forecast series is produced by initially by setting F y The default initial value for the trend series is given as bh y y The m step ahead out of sample forecast is given by Fram Sr mbr iii Triple exponential smoothing If the time series contains a trend and seasonal effects then the Holt Winters three factor model is employed The equations are dependent upon whether the seasonal indexes are additive or multiplicative If the seasonal indexes are additive the equations are given as Li ye St s 1 a at bt 1 by P Li Le 1 A P bi 1 Ar ye Leit l t where a P and y are fixed parameters in the range 0 1 and L b F and S are the level trend forecast and seasonal index series respectively Parameter s is the number of seasons and S _ is the seasonal index for the time series at time t The in sample forecast for the three factor model with additive seasonal indexes is given as F L 1 bi 1 St s The m step ahead out of sample forecast beginning after the final observation in the series is calculated as Fram Lr mbr x S where AS is the final in sample calculated seasonal index that corresponds to the forecast Fiim If the seasonal indexes are multiplicative then the equations are given as Yt St s Ly a 1 a Le 1 bt 1 by P Le
32. regression are numbered and reported 104 Section 13 Time Series not available in Statistician Lite Any variable that changes in value over time is referred to as a time series The Time Series groupbox offers the user a number of forms to investigate the behavior of a number of economic and business time series These forms include a Forecasting users can smooth a time series estimate seasonal indexes and produce forecasts b Holt Winters includes the one two and three factor model and the c Hodrick Prescott filter a Forecasting Statistician Time Series Forecasting Data Variable Select a Data Variable v Include Seasonal Indexes Smoothing Method Output Average e Moving Average Methods Contemporaneous Interval Centered Forecast to L 3 Weighted Weighting Data Variable Trend Methods _ Linear Quadratic Seasonal Data Variable Exponential Data Begins at Season _ Autoregressive Lags 1 wl Format Output Output Results Manage Data Close If the Forecasting menu item is selected from the the Time Series button the user is offered a total of nine smoothing methods There is a simple averaging method four moving average methods and four trend methods to smooth the random effects out of a time series Holt Winters methods are implemented on the Holt Winters form The 105 nine methods
33. s t distribution Returns values along a linear trend Returns the mean of the interior of a data set Returns the probability associated with a Student s t test Estimates variance based on a sample Estimates variance based on a sample including numbers text and logical values 126 VARP VARPA WEIBULL ZTEST Calculates variance based on the entire population Calculates variance based on the entire population including numbers text and logical values Returns the Weibull distribution Returns the one tailed probability value of a z test 127 Appendix B Other Excel Functions Math and trigonometry functions ABS ACOS ACOSH ASIN ASINH ATAN ATAN2 ATANH CEILING COMBIN COS COSH DEGREES EVEN EXP FACT FACTDOUBLE FLOOR GCD INT LCM LN LOG LOG10 MDETERM MINVERSE MMULT Returns the absolute value of a number Returns the arccosine of a number Returns the inverse hyperbolic cosine of a number Returns the arcsine of a number Returns the inverse hyperbolic sine of a number Returns the arctangent of a number Returns the arctangent from x and y coordinates Returns the inverse hyperbolic tangent of a number Rounds a number to the nearest integer or to the nearest multiple of significance Returns the number of combinations for a given number of objects Returns the cosine of a number Returns the hyperbolic cosine of a number Converts radians to degrees Rounds a
34. this case a z test is performed When the Output Results button is clicked Statistician outputs the e null hypothesis e alternative hypothesis e sample size of each Data Variable D and nz e standard deviation each Data Variable sample known or pooled e test statistic e critical value e p value e decision rule and e conclusion When the Unknown and Unequal or Unknown and Equal radiobutton is selected in the Standard Deviation groupbox a checkbox titled Output Test for the Equality of Sample Variances is displayed which gives the user the option to additionally test the hypothesis that the variance of both selected Data Variables are equal If this checkbox is checked then Statistician also outputs the null hypothesis alternative hypothesis sample variances test statistic critical value decision rule and conclusion of the F test To obtain the sample size for the difference in means for a given confidence interval width the Output Sample Size checkbox in the Confidence Interval groupbox is checked The user then enters the desired distance from the true population mean in the corresponding textbox The number of significant digits after the decimal point in the confidence interval output is selected from the Number of Significant Digits in Output combobox the default is 2 39 c Test Difference in Matched Pairs Dependent Samples The Test Difference in Population Means Dependent
35. with known standard deviations is given as Z 2 and the critical value for a test calculated with unknown standard deviations is Lou adr Where df is the degrees of freedom of the t distribution For a one tailed test the critical value for a test with known standard deviations is given as Z and the critical value for a test calculated with unknown standard deviations is taf The reported p value is the probability of rejecting the null hypothesis when it is true When the standard deviation of the population is known the null hypothesis is rejected if Alternative Hypothesis Rejection Criterion Hy 44 l2 D z gt ze H W Hz lt D Z lt Zo Hy u uz gt D Z gt Zy When the standard deviation of the population is unknown the null hypothesis is rejected if Alternative Hypothesis Rejection Criterion Hy h H2 D t gt ta 2n 1 Ay My H2 lt D t lt tan 1 Hy My Mz gt D t gt tan 1 36 The following table displays the formulae for the construction of the confidence intervals where a is the level of significance Standard Deviation Confidence Interval Unknown and Unequal SE ge CEET x A2 1 2 a 2 df n Ny Unknown and equal 1 1 X X t x Isi X1 X2 E ta 2 af Pla n Z 2 d 2 where s2 n4 1 s n2 1 s5 nyt n2 2 Known and unequal Bee en 1 2 x1 X2 Za 2 X F ay Known and e
36. 4 7 178 855 2 606 5 259 7 1974 4099 7 926 561 7 291 2 1975 4084 4 6 122 345 4 1976 4311 7 5 266 555 5 371 9 1977 4511 8 5 51 1269 9 639 4 405 1978 4760 6 7 572 1365 5 713 444 2 1979 4912 1 10 017 1473 1 735 4 489 6 1980 4900 9 11 374 1599 1 655 3 576 6 1981 5021 13 776 1754 6 715 6 659 3 1982 4913 3 11 084 1909 5 615 2 732 1 1983 5132 3 2126 673 7 797 8 1984 5505 2 2309 7 871 5 856 1 1985 5717 1 2495 4 863 4 924 6 1986 5912 4 2732 1 857 7 978 5 1987 2831 1 879 3 1018 4 1988 2994 3 902 8 1066 2 then the Data Variables listbox will appear as in figure 3 Figure 3 Statistician Manage Data Variables 1 to 19 Year Import Export EE 1 to 13 TBE 1 to 19 M2 i 1 to 19 E Hie Variable 1 to 19 GPI mes in First Row 1 to 19 FEDEXP Export Data Variables Rename Remove Rename a Data Variable Remove Data Variable s Select All Variables DeSelect All Variables 8 Note the asterisk in the sample range between the square brackets for the Data Variables M2 and GPI which both contain non numeric data blank cells The above process can be repeated any number of times to load data from various parts of a workbook When analysis is performed upon Data Variables Statistician will often automatically adjust the range of data for analysis to include only valid numerical observations For example if the correlation between Data
37. 631 5 0 752 2 5 0 873 1 1 035 0 5 1 159 The p value reported is computed from the modified statistic t Ar 1 2z as given in Stephens 1986 58 c Shapiro Wilk Test The Shapiro Wilk 1965 test statistic W is given as b2 W __ n Us where s YE EI n is the number of observations x is the it observation and x is the mean of these observations The term b is constructed as follows 1 All observations are sorted into ascending order let y denote the i sorted observation 2 kis defined as k n 2 if nis even and k n 2 if nis odd 3 bis then calculated as b X a yn_ i41 yi where the weights a are taken from a table given in Shapiro and Wilk 1965 d Kolmogorov Smirnov Test Assume F X is the empirical distribution from which a sample is derived and F X is the hypothesized normal distribution from which the sample is derived with mean and standard deviation denoted by uo respectively The Kolmogorov Smirnov test statistic D is the maximum distance between the cumulative probability distribution of the empirical data and the hypothesised normal distribution It is defined as D max sup F x Fo xi sup F x Fo xi 1 The null hypothesis is rejected if D is greater than or equal to the Kolmogorov Smirnov critical value e Lilliefors test The Lilliefors test is almost identical to the Kolmogorov Smirnov test but differs
38. 9 Section 4 Standard Tests The Standard Tests button on the Statistician ribbon tab provides the user with five menu items These are i Test Population Mean ii Test Difference in Population Means Independent Samples iii Test Difference in Matched Pairs Dependent Samples iv Test Population Proportion and v Test Difference in Population Proportions Each of these menu item selections are described in detail below a Test Population Mean The Test Population Mean form is activated from the Standard Tests button on the Statistician ribbon tab Statistician Test Population Mean Hypothesis Test a 0 05 Standard Deviation Hypothesized A Null Hypothesis Mean Unknown Mean of Selecta Data Variable wll y Known Alternative Hypothesis Significance Mean of y Om 5 O10 O Other 2 Confidence Interval Output a 95 Confidence Interval for the Mean Output the Sample Size to of the True Obtain a Confidence Population Interval Estimate to Within Mean wl Format Output Significant Digits 2 v Output Results Cse 30 The Test Population Mean form has two primary functions a To test the hypothesis that the population mean u of a selected Data Variable is equal to greater than or equal to or less than or equal to a hypothesized value v b To produce a confidence interval for the mean of a Data Variable The following se
39. Covariance and Correlation iii Autocorrelation Function iv Statistical Tables v Sort and Rank Each of these menu items are described in detail below a Summary Statistics The Summary Statistics form is selected by clicking the Summary Statistics menu item from the Tools button on the Statistician ribbon tab Statistician Summary Statistics Data Variables Extremes Count Sum Minimum Maximum 4 A i i Location Mean v v Median J Mode Higher Moments Select All Variables DeSelect All Variable sl Skewness Std Error Skewness v Excess Kurtosis Std Error Excess Kurtosis Select All Statistics DeSelect All Statistics 13 Spread v Range Interquartile Range wi Standard Deviation Sample Standard Deviation Population 4 Variance Sample Variance Population Sum Of Squares Mean Square Error Root Mean Square Error Mean Absolute Deviation Other Jacque Bera Test Statistic Durban Watson Test Statistic wl Format Output Output Results Close The Summary Statistics form generates various summary or descriptive statistics on the selected Data Variables To generate the statistics the relevant data variables are selected from the Data Variables list box The statistic
40. If Correlogram is selected only the correlation estimates are reported If Table is selected a t statistic is reported under the null hypothesis that the correlation is equal to zero for Pearson and Spearman correlations If Table is selected for the Kendal tau a or tau b correlation coefficient a z statistic is reported under the null hypothesis that the correlation is equal to zero The number of concordant and discordant pairs is reported and in the case of the Kendal tau b correlation coefficient the number of ties is reported 20 c Autocorrelation The autocorrelation form allows users to identify the autocorrelation and partial autocorrelation function for a time series The kt autocovariance for data variable y is defined as yg Cov y z and the kt autocorrelation for data variable y is then defined as px Yk Yo An estimate of go is given as aes nt Y Ot k Y S Al De where y is the unconditional mean of series y and n is the number of observations Each sample autocorrelation estimate is distributed with zero mean and has approximate variance 1 n under the null hypothesis that y is a white noise process Bartlett 1946 proposes that under the null hypothesis that y is a white noise process the variance of a sample autocorrelation is given as 1 Vor St 2r 2rp_4 A p value to test the significance of an autocorrelation estimate is derived from the test statistic give
41. Replication When the user selects the Two Factor Table with replication radiobutton from the ANOVA Table Type groupbox user then selects two or more Data Variables from the Data Variables listbox All treatment data is stored in a Data Variable Each treatment Data Variable is structured as follows The set of observations for the first block replications 1 to d are stored at the top of the Data Variable The next set of observations for the second block is then stored below the first set of block observations This process is continued until all replication data is stored in the treatment Data Variable A schematic representation of the data is as follows Treatmenti Treatment 2 Treatmenta Block 1 Block 1 Block 1 Block 2 Block 2 Block 2 Block b Block b Block b 87 The number of observations sum average and variance of the means of treatments and blocks is outputted A table describing the source of variation as displayed below is then reported pour W of d f shana Mean F Statistic F critical p value Variation Squares Squares Treatments a 1 SST MST fr MST MSE EECH P F gt fr Blocks b 1 SSB MSB fs MSB MSE F a b 1 ab r 1 P F gt fs Interaction a 1 b 1 SSI MSI fr MSI MSE Fia a 1 b 1 ab r 1 P F gt fi Error ab r 1 SSE MSE Total abr 1 SS Total The null and alternate hypothesis for the test on treatments and blocks the test s
42. Returns a number corresponding to an error type Returns information about the current operating environment Returns TRUE if the value is blank Returns TRUE if the value is any error value except N A Returns TRUE if the value is any error value Returns TRUE if the number is even Returns TRUE if the value is a logical value Returns TRUE if the value is the N A error value Returns TRUE if the value is not text Returns TRUE if the value is a number Returns TRUE if the number is odd Returns TRUE if the value is a reference Returns TRUE if the value is text 134 NA TYPE Returns a value converted to a number Returns the error value N A Returns a number indicating the data type of a value Logical functions AND FALSE IF IFERROR Returns TRUE if all of its arguments are TRUE Returns the logical value FALSE Specifies a logical test to perform Returns a value you specify if a formula evaluates to an error otherwise returns the result of the formula NOT OR TRUE Reverses the logic of its argument Returns TRUE if any argument is TRUE Returns the logical value TRUE Lookup and reference functions ADDRESS AREAS CHOOSE COLUMN COLUMNS HLOOKUP HYPERLINK INDEX INDIRECT LOOKUP MATCH OFFSET ROW Returns a reference as text to a single cell in a worksheet Returns the number of areas in a reference Chooses a value from a list of values Returns the column number of a reference Returns the n
43. Samples form is activated from the Standard Tests button on the Statistician ribbon tab Statistician Test Matched Pairs Dependent Samples Hypothesis Test a 0 05 Hypothesized Null Hypothesis foes Average Difference of Select a Data Variable v and Select a Data Variable wll Wun Alternative Hypothesis Average Difference of and ty Confidence Interval Significance Cl Output a 95 Confidence Interval for the Average Difference rz 5 O10 Output the Sample Size to Obtain ofthe Average a a Confidence Interval Estimate of Population the Average Difference to within Diference Significant Digits 2 v v Format Ouiput Output Results Manage Data Close The Test Difference in Population Means Dependent Samples form is activated by selecting Two Sample Matched Pairs from the Mean combobox in the Standard Tests groupbox The Difference in Population Means Dependent Samples form has two primary functions a To test the hypothesis that the average difference in the matched pairs up of two selected Data Variables is equal to greater than or equal to or less than or equal to a hypothesized value D b To produce a confidence interval for the average difference in the matched pairs of two selected Data Variables 40 The following set of null and alternative hypotheses can be tested a Ho Up D H Up D two tailed test or Hy Up gt D or Ay Up lt D b Ho Lp D
44. Variables GDP and GPI in figure 3 is generated then the sample range employed will be 4 to 15 unless overridden by a user defined sample range If the workbook which contains the Data Variables is saved upon exiting Excel the Data Variables will also be saved and available for use the next time the workbook is opened c Selecting Data Variables Data Variables can be selected by left clicking the Data Variable name in the Data Variables listbox Data Variables can also be de selected by again left clicking the Data Variable name All of the Data Variables in the Data Variables listbox can be selected or de selected by clicking the Select All Variables or DeSelect All Variables buttons respectively d Exporting Data Variables To export data from Statistician to a spreadsheet select one or more Data Variables from the Data Variables list box By clicking the Export Data Variables button the user is presented with the Select a Range inputbox which is used select the top left output cell of the output area in a spreadsheet e Removing Data Variables To remove one of more Data Variables from Statistician select the Data Variables in the Data Variables list box that will be removed and then click the Remove Selected Variable s button 9 f Renaming a Data Variable To rename a Data Variable select one Data Variable from the Data Variables list box and then click the Rename a Data Variable button Th
45. a Huesen Z lt Zg Hy u gt v Z gt Zg When the standard deviation of the population is unknown the null hypothesis is rejected if Alternative Hypothesis Rejection Criterion Hiu v lt gt ta 2n 1 Huesen t lt tgn 1 Hun t gt tanci The confidence level is given by 1 a expressed as a percentage where a is the significance level The confidence interval is given as 8 by X Zq 2 vn for a known population standard deviation and gt S x ta 2n 1 vn for an unknown population standard deviation To obtain a sample size required to construct an estimate of the population mean for a given confidence interval width and significance level the required sample size is the lowest integer greater than Za 2Or rw for a known standard deviation and es W for an unknown standard deviation where w is half of the confidence interval width 32 Using Statistician Test Population Mean The Data Variable to be tested is selected from the Null Hypothesis combobox in the Hypothesis Test groupbox The hypothesized mean of the Data Variable is entered in the Hypothesized Mean textbox in the Hypothesis Test groupbox The null hypothesis 2 lt and alternative hypothesis lt gt are selected in the corresponding comboboxes in the Hypothesis Test groupbox If the standard deviation of the population is known then the Known radiobutton is selected in th
46. a Variables are selected from the Data Variables listbox see figure 3 Figure 3 Statistician Manage Data kk Data Variables 1 to 19 Year 4 to 17 GDP 1to 13 TB6 1 to 19 False 1to 19 GPI 1to 19 FEDEXP Select All Variables DeSelect All Variables Data Variables used in a statistical analysis are selected by clicking the name of the Data Variable in the listbox Clicking the Data Variable name again will de select the Data Variable If the Select All Variables button is clicked then all of the Data Variables will be selected Clicking the DeSelect All Variables button will de select all of the Data Variables When a Data Variable is selected its background will turn blue 11 Often the user is required to select a level of significance to perform a statistical test This level of significance is selected from the Significance groupbox Most statistical tests are conducted at the 1 5 or 10 level These significance levels are selected by clicking the corresponding radiobutton If some other level of significance is required then the user clicks the Other radiobutton and then enters the level of significance in the adjacent textbox see figure 4 Figure 4 Significance Om 5 10 _ Other 12 Section 3 Tools The Tools button on the Statistician ribbon tab provides the user with five menu items These are i Summary Statistics ii
47. a dollar price expressed as a decimal number into a dollar price expressed as a fraction Returns the annual duration of a security with periodic interest payments Returns the effective annual interest rate Returns the future value of an investment Returns the future value of an initial principal after applying a series of compound interest rates Returns the interest rate for a fully invested security Returns the interest payment for an investment for a given period Returns the internal rate of return for a series of cash flows Calculates the interest paid during a specific period of an investment Returns the Macauley modified duration for a security with an assumed par value of 100 Returns the internal rate of return where positive and negative cash flows are financed at different rates Returns the annual nominal interest rate Returns the number of periods for an investment Returns the net present value of an investment based on a series of periodic cash flows and a discount rate Returns the price per 100 face value of a security with an odd first period Returns the yield of a security with an odd first period Returns the price per 100 face value of a security with an odd last period Returns the yield of a security with an odd last period Returns the periodic payment for an annuity Returns the payment on the principal for an investment for a given period Returns the price per 100 face value of a security that pa
48. ame sample range number of observations in the regression model estimated parameter estimates parameter estimate standard errors parameter estimate t ratio s parameter estimate p values 99 Output from each of the four options in the OutPut groupbox are as follows 1 Model Statistics count of dependent variable 0 count of dependent variable 1 mean of the dependent variable standard error of the dependent variable residual sums of squares standard error of the regression log likelihood average log likelihood Akaike information criterion Scwharz information criterion Hannan Quinn information criterion McFadden R restricted log likelihood LR statistic e probability of LR statistic 2 Variance Covariance Matrix Maximum likelihood variances and covariances are reported 3 Estimation Information estimation algorithm tolerance iterations required function evaluations required starting values 4 Fitted Values and Residuals The fitted values estimated residuals from the regression are numbered and reported 100 Section 12 Count Models not available in Statistician Lite A count model is employed to estimate the mean number of times an event occurs conditional upon a number of independent variables A commonly used count model is the Poisson regression If a random variable Y has a Poisson distribution then the probability tha
49. and n 1 d count of i such that m n 0 The binary distance measures are defined as follows Distance Measure Definition Jaccard PIRE b c mn a tb c Simple Matching b c dmn k Russel and Rao b ct d dmn p Sokal and Sneath 1 b c dmn mm 2a a b c 2d Sokal and Sneath 2 d b c mn a 2 b c Rogers and Tanimoto 8 2 b c mn a 2 b c d 118 Dice 8 b c mn Jab he Yule Ais 2bc mn ad bc Statistician offers two forms of Cluster analysis a hierarchical technique and the K means technique i Hierarchical or Join Clustering Agglomerative hierarchical clustering performs successive fusions of the data into clusters where each object initially starts out as its own cluster Agglomerative hierarchical clustering techniques differ to the extent that different measures are employed to measure the distance between clusters often referred to as the linkage method In the following discussion the following notation is employed Cy cluster X Dyy the distance between cluster X and Y Ny the number of objects in cluster X X the centriod of cluster X When two clusters Cp and Cs are merged to form a new cluster Cr a combinatorial formula can be employed to calculate the distance between the new merged cluster and all other clusters Cx Linkage methods available in Statistician are as follows Linkage Method D
50. are defined as follows In the following discussion let S be the smoothed observation on the time series observation y 1 Average Each value in the smoothed series is simply the arithmetic average of all observations in the original series 2 Moving average smoothing techniques Initially the user enters a moving average interval in the Interval textbox that is displayed This interval must be a positive integer I 2 i Contemporaneous Moving Average The smoothed observation is the arithmetic average of the current and past LI observations If for example a three point moving average is selected that is J 3 then the smoothed estimate for the current observation is given as A Dir Yt 1 t 2 3 In this example the first two observations for the smoothed series are not defined ii Lagged Moving Average The smoothed observation is the arithmetic average of the past J observations If for example a three point moving average is selected that is J 3 then the smoothed estimate for the current observation is given as S 4 1 Yt 2 Yt 3 3 In this example the first three observations for the smoothed series are not defined iii Centered Moving Average If the interval J is an odd number then the smoothed observation is the average of an equal number of leads and lags of the original observation For example if I 3 then Si Yt 1 Ye 1 3 If the interval is an even number then the smoothed ob
51. as a IA distribution The null hypothesis is rejected if T gt Kei 55 Levene Test Statistician Variance Test Levene Data Variables v Summary Statistics Significance 01 5 10 Other Levene Test Type Mean O Median Trimmed Mean Significant Digits v Select All Variables F DeSelect All Variables sl Format Output Output Results Manage Data Close The Levene test is constructed as follows Let n denote the size of the it sample and let n Y _ n be the pooled sample size The test statistic is given as emt Samia a where z can take one of the following three definitions a Mean Zij H 7il where y is the mean of the it sample b Median Z Yi ml where y is the median of the it sample c Trimmed Mean zij fu yil where 9 is the 10 trimmed mean of the Oh sample z denotes the sample means of zur and Z denotes the overall mean of zij The W statistic has a F k 1 n p distribution The null hypothesis is rejected if F gt Fak 1n k 56 Section 6 Normality Tests not available in Statistician Lite The Normality Tests button on the Statistician ribbon tab provides the user with ability to perform six different normality tests These are the i Jacque Bera test ii Anderson Darling test iii Shapiro Wilk test iv Kolmogorov Smirnov test v Lilliefors test an
52. at will be reported in the Standard Errors groupbox At least one type of standard error OLS White or Newey West must be selected The user can select the type of output from the regression in the OutPut groupbox If no option is selected in the OutPut frame then xlStatistician reports dependent variable name sample range number of observations in the regression estimation method parameter estimates parameter estimate standard errors OLS White and or Newey West parameter estimate t ratio s parameter estimate p values Output from each of the four options in the OutPut groupbox are as follows 1 Regression Statistics mean of the dependent variable standard error of the dependent variable R adjusted R standard error of the regression total sums of squares regression sums of squares residual sums of squares F statistic P value of F statistic Log likelihood Akaike information criterion Scwharz information criterion 2 Variance Inflation Factors The Variance Inflation Factors VIF are reported alongside the table of parameter estimates standard errors t ratios and p values 94 3 Residual Diagnostics skewness standard error of skewness kurtosis standard error of kurtosis Jacque Bera test statistic Durbin Watson test statistic 3 Variance Covariance Matrix This may include OLS White or Newey West variances and covariances depending on the type
53. ble combinations of assumptions for the standard deviation of the two random variables The distribution and test statistic for the difference in the means of the two random variables under these four assumptions is displayed in the following table Standard Deviation Distribution Test Statistic Unknown and Unequal Student t x 1 2 D D a Unknown and equal Student t xi X2 D Bat aa are La Where s2 n1 1 s n2 1 s5 ny n2 2 Known and unequal Normal 1 2 D D g 2 Known and equal Normal xi X2 D SE 35 where s and s are the sample variances of each random variable o and of are the known and unequal variances of each random variable and where is the known and equal variance of each of the random variables When the variances are unknown and unequal the degrees of freedom of the t distribution is given by Sor BF 2 s S s A d f E 2 WIES m 1 m2 D where d f is rounded to the nearest integer When the variances are unknown and equal the degrees of freedom are given by d f n n 2 The variable D yp u is the hypothesized difference between the means where 4 and u are the population means of the two random variables The critical values for a z test are taken from a standard normal distribution and for a t test are taken from a student t distribution For a two tailed test the critical value for a test
54. bservations from each of the k samples are pooled and ranked from 1 to n Observations with a tied rank are assigned a rank which is the average of the corresponding raw ranks The sum of the ranks from each of the k samples are denoted by Th The test statistic H is given as k 2 ye 3 n 1 In n 1 4 nj j 1 Let a large sample size be defined as the case where the sample size from each of the populations is greater than or equal to five ie n gt 25 The H statistic has a x distribution with k 1 degrees of freedom If H gt yp the null hypothesis is rejected where a is the selected significance level 70 i Friedman Test The Friedman is employed when either ordinal or numerical data is generated from a randomized block experiment as is the case with an ANOVA table without replication Assume we have k samples treatments and b blocks within each sample The test statistic is constructed by initially ranking all observations within each block Observations with a tied rank are assigned a rank which is the average of the corresponding raw ranks The sum of all ranks for each treatment is denoted by TP The Friedman test statistic F is defined as k pol 7 3b k 1 bk k tU MN J The large sample case is defined as the case where either k or b is greater than or equal to five The F statistic has a y distribution with k 1 degrees of freedom If H gt X n 1 the null hypothesis is rejected where a i
55. d the vi Cram r von Mises test Statistician Normality Tests Data Variable Significance Select a Data Variable wl O 5 O 10 Mean 0 Tests Other Std Dev 4 C Jacque Bera Anderson Darling Shapiro Wilk C Kolmogorov Smirnov Lilliefors C Cramer von Mises v Format Output Output Results Ge Each of these tests are described in detail below The Null and Alternative hypothesis in a normality test is given as Ho The data is normally distributed H The data is not normally distributed 57 a Jacque Bera Test The Jacque Bera test statistic JB has a Xe distribution It is defined as es Skewness A Excess Kurtosis JEEN 6 24 A3 n n xj X where Skewness SEET Geh S oe n n 1 n Xi 4 3 n 1 and Excess Kurtosis Gennes St S SET The null hypothesis is rejected if the JB test statistic is greater than the Xe critical value b Anderson Darling Test The Anderson Darling test statistic often denoted as A7 is given as n 1 A n 2i 1 In F n 1 Ft All i 1 where F is the normal distribution z is the z score from the i sorted ascending observation and n is the number of observations The null hypothesis is rejected if A is greater than the critical value CV where CV a 1 35 n2 The value of ais dependent on the level of significance required Significance a 10 0
56. e To enhance the readability of the output the user has the option of selecting a Data Variable from the Row Classifications combobox This Data Variable should contain the description of each row of the second nominal variable If no selection is made from the Row Classifications combobox Statistician will simply refer to Row 1 77 Row 2 etc in the output Statistician uses the selected Data Variable names as the description for each of the columns of the contingency table The user also has the option of entering a general title for the rows and columns in the contingency table by entering the corresponding titles in the Row Title and Column Title textboxes If these textboxes are left blank Statistician will refer to Rows and Columns in the output Statistician Contingency Table Column Variables Row Classifications Row Title Column Title Significance Om 5 O 10 O Other Table Output Output Contingency Table Output Expected Values Table Significant Digits 2 v Format Output Output Results Manage Data Close By checking either the Output Contingency Table and or Output Expected Values Table checkboxes Statistician will output the contingency table and or a table of expected values within the contingency table Statistician outputs the null and alternative hypothesis number of column classifications number of row classifications test statistic critical value s
57. e Rename a Data Variable inputbox see figure 4 is then presented to the user Enter the new name for the Data Variable into the textbox and then click OK Figure 4 Rename a Data Variable a Enter the New Name of Data Variable MI 10 Section 2 Controls Common to all Forms Statistician has a number of controls that are common to many forms On the bottom right of most forms users will see two buttons and a checkbox as displayed in figure 1 Figure 1 wl Format Output Output Results Close The bottom button on the form Close closes the current form and returns the user to Excel The button second from the bottom of the form Output Results initiates the statistical analysis and outputs the results to an Excel worksheet after presenting the user with an inputbox which can be used to select the top left output cell with the mouse If the Format Output checkbox is checked Statistician will align cells bold face headings and autofit various columns of the output On the bottom left of all forms is the Manage Data button see figure 2 Manage Data Figure 2 Clicking the Manage Data button opens the Manage Data form which allows users to import export rename and remove Data Variables Clicking Return on the bottom right of the Manage Data form returns the user to the current form Often the user is required to select two or more Data Variables for analysis These Dat
58. e Standard Deviation groupbox and a textbox is displayed where the known standard deviation can be entered In this case a z test is performed If the standard deviation of the population is unknown then the Unknown radiobutton is selected In this case the sample standard deviation of the selected Data Variable is calculated and a t test is performed When the Output Results button is clicked Statistician outputs the e null hypothesis e alternative hypothesis e sample size e standard deviation sample or known e test statistic e critical value e p value e decision rule and e conclusion To obtain a sample size for a given confidence interval width the Output Sample Size checkbox in the Confidence Interval groupbox is checked The user then enters the desired distance from the true population mean in the corresponding textbox The number of significant digits after the decimal point in the confidence interval output is selected from the Number of Significant Digits in Output combobox the default is 2 33 b Test Difference in Population Means Independent Samples The Test Difference in Population Means Independent Samples form is activated from the Standard Tests button on the Statistician ribbon tab Statistician Test Difference In Population Means Independent Samples Hypothesis Test a 0 05 Hypothesized Null Hypothesis Dawns Meanof Selecta Data Variable v Meanof S
59. e selected in the corresponding comboboxes in the Hypothesis Test groupbox When the Output Results button is clicked Statistician outputs the e null hypothesis e alternative hypothesis e sample size e mean of differences e standard deviation of differences e test statistic e critical value e p value e decision rule and e conclusion To obtain a sample size for a given confidence interval width the Output Sample Size checkbox in the Confidence Interval groupbox is checked The user then enters the desired distance from the true population mean in the corresponding textbox The number of significant digits after the decimal point in the confidence interval output is selected from the Number of Significant Digits in Output combobox the default is 2 42 d Test Population Proportion The Test Population Proportion form is activated from the Standard Tests button on the Statistician ribbon tab Statistician Test Population Proportion Hypothesis Test a 0 05 Hypothesized Null Hypothesis Proportion Proportion of Select a Data Variable vi i wb Alternative Hypothesis Proportion of v 05 Confidence Interval Significance a Om 5 10 _ Output a 95 Confidence Interval for the Proportion L 7 Other Output the Sample Size to of the True i Obtain a Confidence Population Interval Estimate to Within Proportion Significant Digits 3 v Symbols Success
60. ed critical value If the alternative hypothesis is H ps lt 0 the null hypothesis is rejected if rs is negative and r is greater than or equal to the one tailed critical value Kendall tau The Kendall correlation coefficient often referred to as the Kendall tau or Kendall tau a coefficient measures the association between the ranks of two ordinal variables and is often denoted by t The Kendall correlation coefficient is given as na 2 nc np n n 1 where n and np denote the number of concordant and discordant pairs respectively and n is the sample size If tied ranks are present in either of the x or y random variables the above methodology must be modified In this case the tie corrected Kendall tau or Kendall tau b coefficient can be employed Let the number of distinct tied ranks for each random variable x and y be denoted by t and t respectively The tie adjusted Kendall tau tau b is then defined as 2 nc Np yin 1 t n n 1 e In the large sample case when n gt 10 the null hypothesis Hy t 0 for both tau a and tau b can be tested reasonably accurately with the statistic 3t n n 1 z J 2 2n 5 The z statistic has a standard normal distribution In the small sample case where n lt 10 the critical value to test the null hypothesis Ho Tt 0 should be taken from a table of Kendall critical values In the Output groupbox users can select either Correlogram or Table
61. eens E a a TET 9 i EEN 10 Section 2 Controls Common to all Forms cece cnec eens ec en eceeeceeeeeeeeaeaes 11 SECON 3 TOONS 205 ani eene ee Sagan bod Rages deene ee Ee 13 a Summary Ee 13 b Covariance and Correlation ccc sairaan cece ee iaa eee Ea Ea aia S 17 c Autocorrelation EE 21 d Statistical Tables sssnicininccniiconene NNN RENE a a a a NENNEN 23 e Sortand EE 26 i Sampling ea a E tas ea el a A a a a oa 27 E r kee 28 h Make Lagged Data Variables EE 29 Section 4 Standard Tests 30 a Test Population Mean TEE 30 b Test Difference in Population Means Independent Samples cccceeeeees 34 c Test Difference in Matched Pairs Dependent Samples 40 d Test Population Proportion iicistsicieicuiiecanseavtviasls penih eateries anaes 43 e Test Difference in Population Proportions ess ng cogs desacegansugasdaceigtasaceag cadavers sepaneal 47 Section 5 Variance Tests eege oc dead asics ENEE ZENNER 51 a One Sample c sin an a AAE a a aaa 52 b Two Sample S entene ani erer E ited tached ee E sae lel eae sind A EE SETE 53 C Many Samples E 54 Section 6 Normality Tests ANER ENNEN ENNEN SRN e E EE EE E EE 57 a Jacgu ue Bera EE 58 b Anderson Darling Test ccc0 lected Zen gebai odina eaa a adia ea a aaae 58 c Shapir Wilk Test 08 i nrn E EEEE E EE O E E E E 59 d Kolmogorov Smirnov LeStiveiescaceecccidese ca decoder deg leaded a VEER Eed AEN 59 e E Uer
62. electa Data Variable v wii Alternative Hypothesis Mean of Meanof wll Confidence Interval Standard Deviation Output a 95 Confidence Interval for Unknown and Unequal Unknown and Equal the Difference in Means _ Known and Unequal _ Known and Equal Output the Sample Size to Obtain of the True Std Dev of 77 Std Dev of 77 a Confidence Interval Estimate of Population the Difference in Means to Within Difference Output F Test for the Equality of Sample Variances Significant Digits 2 v Significance Om 5 10 Format Output O Other NW 2 Output Results Gen The Test Difference in Population Means Independent Samples form has two primary functions a To test the hypothesis that the difference in the means u4 H2 of two selected Data Variables is equal to greater than or equal to or less than or equal to a hypothesized value D b To produce a confidence interval for the difference in the means of two selected Data Variables 34 The following set of null and alternative hypotheses can be tested a Ho 41 H2 D Hy Uy Uz D two tailed test or Ay ly u2 gt D or Ay fy u2 lt D b Ho Hi u2 2 D Ay fy Wz lt D c Ho H Hz SD H My U2 gt D The standard deviation of the two random variables may be known or unknown They may also be assumed to be equal or unequal This leaves four possi
63. en as amp yt ao n liYt i Cyclical effects can be isolated in a time series using the following methodology 1 For each time period calculate the smoothed value S using one of the methods outlined above 2 Calculate the percentage of trend P as P E SCH x 100 3 A visual inspection of the graph of P over time will be centered on 100 in the horizontal direction If P is consistently above or below the 100 for an extended period of time then this may indicate the existence of a cyclical effect Statistician outputs the percentage of trend series when the Percentage of Trend checkbox is checked Seasonal Indexes A time series y is often modeled with four components a A trend component T which is a long term pattern or direction that the time series exhibits b A cyclical component C which is a wavelike pattern around the trend which typically becomes apparent over a number of years c A seasonal component S which is a repetitive pattern which occurs typically in weekly monthly quarterly or annual cycles d A random component R which are irregular and unpredictable patterns not associated with any of the other components Statistician models two common time series models the additive model and the multiplicative model The additive model is defined as 108 Mr T C S Rt The multiplicative model is defined as Vp T XC XS X Re Seasonal inde
64. er a number is greater than a threshold value Converts a hexadecimal number to binary Converts a hexadecimal number to decimal Converts a hexadecimal number to octal Returns the absolute value modulus of a complex number Returns the imaginary coefficient of a complex number Returns the argument theta an angle expressed in radians Returns the complex conjugate of a complex number Returns the cosine of a complex number Returns the quotient of two complex numbers Returns the exponential of a complex number Returns the natural logarithm of a complex number Returns the base 10 logarithm of a complex number Returns the base 2 logarithm of a complex number Returns a complex number raised to an integer power Returns the product of complex numbers Returns the real coefficient of a complex number 137 IMSIN IMSQRT IMSUB IMSUM OCT2BIN OCT2DEC OCT2HEX Returns the sine of a complex number Returns the square root of a complex number Returns the difference between two complex numbers Returns the sum of complex numbers Converts an octal number to binary Converts an octal number to decimal Converts an octal number to hexadecimal Add in and Automation functions CALL EUROCONVERT GETPIVOTDATA REGISTER ID SQL REQUEST Cube functions Calls a procedure in a dynamic link library or code resource Converts a number to euros converts a number from euros to a euro member currency or converts a number from one euro
65. escription The distance between two clusters is determined by the Nearest Neighbour distance of the two closest objects nearest neighbours in Single the different clusters The combinatorial formula is Drg min De Dsg The distance between two clusters are determined by the Furthest Neighbour greatest distance between any two objects in the different Complete clusters be by the furthest neighbors The combinatorial formula is Drg max Dpx Dsg 119 Group Average Also called weighted pair group method using averages WPGMA The distance between two clusters is calculated as the average distance between all pairs of objects in the two different clusters weighted by the number of objects in each cluster The combinatorial formula is Drg aan PRK ns Dex Nrtns SK McQuitty Also called unweighted pair group method using averages UPGMA The distance between two clusters is calculated as the average distance between all pairs of objects in the two different clusters This method is identical to the Group Average method except that equal weighting is to the clusters to be merged The combinatorial formula is Drg 1 1 3 Prk 5 Psk Centroid The distance between two clusters is the squared Euclidian distance between the centroids of the clusters The combinatorial formula is Da NRNS n n z DrK S Dsk Nrtns Nrtns ng ns 2 RS Median
66. for the y Distribution for a given number of degrees of freedom Significance values are a 0 995 0 99 0 975 0 95 0 9 0 1 0 05 0 025 0 01 0 005 Binomial Distribution Table of binomial probabilities for a given probability of success p 0 05 0 1 0 9 0 95 and a given number of trials n 1 10 Cumulative Binomial Distribution Table of cumulative binomial probabilities for a given probability of success p 0 05 0 1 0 9 0 95 and a given number of trials n 1 10 Poisson Distribution Table of Poisson distribution probabilities for a given mean A 0 1 0 2 1 2 10 and a given number of successes x 0 1 20 Cumulative Poisson Distribution Table of cumulative Poisson distribution probabilities for a given mean A 0 1 0 2 1 2 10 and a given number of successes x 0 1 20 Mann Whitney One Tail Critical values for a Mann Whitney one tailed test where n lt 20 and n lt 20 The significance level of the table is selected from the Alpha textbox and can equal a 0 05 0 01 Mann Whitney Two Tails Critical values for a Mann Whitney two tailed test where nu lt 20 and n lt 20 The significance level of the table is selected from the Alpha textbox and can equal a 0 05 0 01 Wilcoxon Rank Sum Upper and lower critical values for the Wilcoxon rank sum test where 4 lt n n2 lt 10 The significance level of the table is selected from the Alpha te
67. h 0 We E X ai i 1 where 0 is the observed frequency for bin i and E is the expected frequency for bin i The expected frequency is calculated by E N Fo Xy z Fo X1i where Zu is the upper limit for bin Luis the lower limit for bin i and N is the sample size The parameters of the theoretical distribution can be either estimated from the sample data or can be assumed to have a specific value If the parameters of the theoretical distribution are estimated from the sample data then the test statistic follows approximately a chi square distribution with k 1 c b degrees of freedom where k is the number of bins c the number of estimated parameters for the distribution and b is the number of empty bins If the parameters are assumed to have a specific value then c 0 Using Statistician To perform a chi square goodness of fit test the user selects a Data Variable from the Distribution of combobox in the Hypothesis Test groupbox The hypothesized distribution of the sample is selected from the Distribution combobox If the parameters of the hypothesized distribution are to be estimated from the sample data then Assigned radiobutton in the Distribution Parameters groupbox is checked To assign specific parameter values to the hypothesized distribution the Assigned radiobutton is checked When the Assigned radiobutton is checked users can enter the hypothesized distribution parameters in the corresponding textboxes in t
68. h a standard two tailed z test R Using Statistician Statistician Runs Test Data Variable Select a Data Variable v Symbols for Categories Significance Category 1 1 oa 5 10 Category 2 Other v Format Output Output Results Manage Data Exit The Non Parametric Tests Runs form is selected by clicking the Runs button in the Non Parametric Tests groupbox on the Statistician ribbon The user selects the Data Variable for analysis from the Data Variable combobox If the data is a small sample defined in the preceding section then only a significance level of 0 05 can be selected Statistician outputs the sample size of the Data Variable test statistic critical value s significance null and alternative hypothesis decision rule and conclusion for the test All Data Variables tested must be binary variables that is they have one of two distinct values referred to as Category 1 or Category 2 These distinct values may be numeric or text The symbols for Category 1 and Category 2 are generated by Statistician after analyzing the selected Data Variable Statistician enters in the symbols for Category 1 and Category 2 in the Category 1 and Category 2 labels in the Symbols for Categories groupbox Clicking the Switch Symbols button switches the symbols for success and failure Statistician automatically determines if the user has selected a small or large sample based upon the criteria
69. he 72 Distribution Parameters groupbox To specify the bins for the test users enter the lower bound upper bound and bin width in the Bins groupbox If the Output Bin Frequencies checkbox is checked then the expected and observed frequencies of observations within each bin is outputted Distributions which can be tested are as follows Distribution Parameters Type Normal Mean Decimal Standard Deviation Positive Decimal Uniform Lower Bound Decimal Upper Bound Decimal Log Normal Mean Decimal Standard Deviation Positive Decimal Exponential Mean Positive Decimal References David J Sheskin 2007 Handbook of parametric and nonparametric statistical procedures 4th ed Chapman amp Hall CRC k Goodness Of Fit Tests Kolmogorov Smirnov The Kolmogorov Smirnov goodness of fit test tests if a Data Variable has a specific distribution The Kolmogorov Smirnov goodness of fit test can only be applied to continuous distributions The null and alternative hypothesis for the Kolmogorov Smirnov goodness of fit test is given as Ho F X Fo X Hy F X Fo X where F X is the empirical distribution from which the sample is derived and F X is the hypothesized distribution from which the sample is derived The Kolmogorov Smirnov test statistic is defined as i 1i D max Lou Ee so The null hypothesis is rejected if D is greater than or equal to the Kolmogorov Smirnov critical value
70. he form of text to a serial number Converts a serial number to a day of the month Calculates the number of days between two dates based on a 360 day year Returns the serial number of the date that is the indicated number of months before or after the start date Returns the serial number of the last day of the month before or after a specified number of months Converts a serial number to an hour Converts a serial number to a minute Converts a serial number to a month 133 NETWORKDAYS Returns the number of whole workdays between two dates NOW Returns the serial number of the current date and time SECOND Converts a serial number to a second TIME Returns the serial number of a particular time TIMEVALUE Converts a time in the form of text to a serial number TODAY Returns the serial number of today s date WEEKDAY Converts a serial number to a day of the week WEEKNUM Converts a serial number to a number representing where the week falls numerically with a year WORKDAY Returns the serial number of the date before or after a specified number of workdays YEAR Converts a serial number to a year YEARFRAC Returns the year fraction representing the number of whole days between start_date and end_date Information functions CELL ERROR TYPE INFO ISBLANK ISERR ISERROR ISEVEN ISLOGICAL ISNA ISNONTEXT ISNUMBER ISODD ISREF ISTEXT Returns information about the formatting location or contents of a cell
71. he second sample Let A S n and Az S n denote the average of the ranks of the samples from the first and second populations respectively Define the U and U statistics as U n nz EE as and U mnm BY s The U test statistic is defined as U min U U2 Let a small sample be defined as the case where n lt 10 and n lt 10 In the small sample case critical values are taken from a table of Mann Whitney critical values Statistician has Mann Whitney critical values in the small sample case for one and two tailed tests where a 0 05 and 0 01 Let U itg and U itq denote the critical value for a one and two tailed test respectively with significance level a If the alternative hypothesis is H L L2 the null hypothesis is rejected if U lt UZ i q For a one tailed test with alternative hypothesis H L lt Lz the null hypothesis is rejected if U lt He and A lt A For a one tailed test with alternative hypothesis H L gt L3 the null hypothesis is rejected if U lt UZ itq and A gt Ap A large sample is the case where n gt 10 or n gt 10 The test statistic is given as Z u mne DC The z test statistic has a standard normal distribution and can be tested with a standard z test If the alternative hypothesis is H L Lz the null hypothesis is rejected if z gt Saz where Z 2 is the critical value For a one tailed test with alternative hypothesis H Lt the null hypothesis is rejected if
72. he test statistic is constructed as follows Assume we have two samples of data with values X 0 1 2 2 4 5 and X 1 2 4 4 6 All distinct values are sorted and placed in a table column 1 The number of values of X4 and X that correspond to the distinct values in column 1 are placed in columns 2 and 3 1 2 3 4 5 6 Value Number Number Cum Prob X4 Cum Prob X Difference in E in X2 0 1 0 1 6 0 167 0 5 0 0 0 167 1 1 1 2 6 0 333 1 5 0 2 0 133 2 2 1 4 6 0 667 2 5 0 4 0 267 4 1 2 5 6 0 833 4 5 0 8 0 033 5 1 o 6 6 1 000 4 5 0 8 0 2 6 O 1 6 6 1 000 5 5 1 0 0 0 The cumulative probability of obtaining a particular value for X and X is placed in columns 4 and 5 The difference in cumulative probabilities is placed in column 6 The Kolmogorov Smirnov test statistic KS is the largest of all of the absolute differences in cumulative probabilities In the above example KS 0 267 Let n and n denote the size of the first and second sample In the large sample case where n gt 10 and nz 10 the critical value for the test can be constructed as follows Define K as K jm 1ft2 x one tailed test 0 1 0 05 0 025 0 01 0 005 x two tailed test 0 2 0 1 0 05 0 02 0 01 Critical Value 1 07K 1 22K 1 36K 1 52K 1 63K In the small sample case where n lt 10 or n lt 10 critical values are taken from a table of Kolmogorov S
73. ies Multiplies the values in a particular field of records that match the criteria in a database Estimates the standard deviation based on a sample of selected database entries Calculates the standard deviation based on the entire population of selected database entries Adds the numbers in the field column of records in the database that match the criteria Estimates variance based on a sample from selected database entries Calculates variance based on the entire population of selected database entries Engineering functions BESSELI Returns the modified Bessel function In x 136 BESSELJ BESSELK BESSELY BIN2DEC BIN2HEX BIN2OCT COMPLEX CONVERT DEC2BIN DEC2HEX DEC20CT DELTA ERF ERFC GESTEP HEX2BIN HEX2DEC HEX20CT IMABS IMAGINARY IMARGUMENT IMCONJUGATE IMCOS IMDIV IMEXP IMLN IMLOG10 IMLOG2 IMPOWER IMPRODUCT IMREAL Returns the Bessel function Jn x Returns the modified Bessel function Kn x Returns the Bessel function Yn x Converts a binary number to decimal Converts a binary number to hexadecimal Converts a binary number to octal Converts real and imaginary coefficients into a complex number Converts a number from one measurement system to another Converts a decimal number to binary Converts a decimal number to hexadecimal Converts a decimal number to octal Tests whether two values are equal Returns the error function Returns the complementary error function Tests wheth
74. ignificance p value decision rule and conclusion of the test 78 Section 9 ANOVA The ANOVA form is selected by clicking the ANOVA button in the Statistician ribbon tab Statistician ANOVA Data Variables ANOVA Table Type Pairwise Tests One Factor None Two Factor Without Replication O Fisher LSD Two Factor With Replication Tukey Kramer Number Of Replications Scheffe Significance Pairwise Alpha Om 5 O10 O1 O Other z gt 5 es O 10 C Output Summary Statistics Significant Digits 2 y Select All Variables wl Format Output DeSelect All Variables Output Results Manage Data Close The analysis of variance form ANOVA allows users to test the difference in the mean of a number of populations on the basis of samples Three forms of ANOVA can be implemented with the ANOVA form These are 1 Single factor ANOVA 2 Two factor ANOVA without replication 3 Two factor ANOVA with interaction ie with replication Each of these three forms of ANOVA analysis is discussed separately 79 a Single factor ANOVA Assume there are a number of populations of interest each of which is comprised of a number of experimental observations Each population is referred to as a treatment Assume there are a treatments and let u4g be the mean of the experimental observations in each of the a treatments Assume there are n experimental obser
75. ignificance level the F statistic critical value p value the decision rule and the conclusion are also reported If the Output Summary Statistics checkbox is checked then summary statistics for each of the treatments and blocks are outputted 88 Section 10 Regression Analysis The Regression form is selected by clicking the Regression button on the Statistician ribbon tab Statistician Regression Analysis Dependent Variable Constant Output Select a Data Variable v With Constant wl Regression Statistics C 3 Without Constant Variance Inflation Factors Independent Variables Residual Diagnostics Standard Errors Variance Covariance Matrix v OLS White 1980 Fitted Values and Residuals Mallows Cp Analysis Newey West 1987 Sample Range Use All Observations User Defined From to wl Format Output Output Results Manage Data Close Technical Discussion The standard linear regression model with k independent variables one of which may be a constant with n observations is given as Yi Bo Pixs Box25 PkXr it i where y is the i observation on the dependent variable xj is the i observation on the j independent variable p is the i coefficient of the i independent variable that is to be estimated and sis the i residual or error or disturbance term The expression in can be written m
76. is a Cluster Analysis Statistician Cluster Analysis Data Variables Objects Measure Names optional Distance Metric Output Select A Data Variable v Euclidian v Objects v Standardize Variables Power 2 C Gun Number Of Clust um usters Linkage Method Distance Matrix Method NearestNeighbour v E Hierachical Clusters K Means Horizonally _ Dendrogram Select All Variables E tO DeSelect All Variables TER Output Results Manage Data Close Cluster analysis refers to a group of techniques that groups objects based upon the characteristics they possess Cluster analysis groups objects so that similar objects with respect to some set of predetermined selection criteria or attributes or measures will be assigned to the same cluster Fundamental to the use of any clustering technique is the computation of a measure of similarity or distance between the objects Before any analysis takes place the measurements for each object are often standardized by subtracting their average and then dividing by the standard deviation The distance between any two objects can be defined in a number of ways Each object in Statistician is defined as a Data Variable and each observation in that Data Variable is a measure on that Data Variable Let m and n be vectors of measures on two objects where m is the oh measure on the first object and n is the it measure 116
77. ive method for calculating the Spearman rank order coefficient when tied ranks are present in the random variables is as follows The number of distinct tied ranks for each random variable x and y is denoted by n and ny respectively For random variable x the number of observations in Oh group of tied observations is denoted by n The quantity T for random variable x is defined as T 3 n yt nyi Define S as S LE For random variable y S is similarly defined The tie corrected Spearman rank correlation coefficient rg is then given as r Sx F Sy _ Xid g res where d is the squared difference between the corresponding ranks of observations in variables X and Y In the large sample case where n gt 10 the test statistic under the null hypothesis that the population correlation ps is equal to zero is given as Ten 2 l1 r2 The test statistic has a student t distribution with n 2 degrees of freedom i In the small sample case where n lt 10 the critical value to test the null hypothesis Ho ps 0 Should be taken from a table of Spearman s critical values If the alternative hypothesis is H4 ps 0 the null hypothesis is rejected if r is greater than or equal to the two tailed critical value for a given level of significance ol If the alternative 19 hypothesis is H p gt 0 the null hypothesis is rejected if r is positive and r is greater than or equal to the one tail
78. ived in step 3 are adjusted so that the average of the additive seasonal indexes is O and the average of the multiplicative indexes is 1 Assume that there are s seasons and seasonal indexes The i adjusted seasonal Sa A index for the additive model is given as SI A SE The ih adjusted seasonal index for the multiplicative model is given as SI xs j 1 j 109 The seasons can be defined in a number of different ways in the Seasons groupbox depending upon the selected radiobutton as follows Radiobutton Seasons Number of Seasons Weekdays All Mon Tue Sat Sun T Weekdays Excluding Sun Mon Tue Fri Sat 6 Weekdays Mon Tue Wed Thu Fri 5 Excluding Sat and Sun Monthly Jan Feb Dec 12 Quarterly Jan Mar Apr Jun Jul Sep Oct 4 Dec Numeric User enters the number of User defined Number of Seasons seasons as an integer in an input box From Data Variable User selects a Data Variable which User defined contains the names of all of the seasons Users can select the season at which the data begins by making a selection in the Data Begins at Season combobox By default the data begins at the first season that is defined The Oh deseasonalized observation for an additive model is given as y SI and for a multiplicative model is given by y SI where SI is the adjusted seasonal index corresponding to time t The tt fitted observation Il fo
79. k Prescott HU EE 115 Section 14 Multivariate Analysis cccc cc cececcec ee ee cece ec ee eee e een eens eeeeeeseeneeeseeeeneeeed 116 fa Greter EE we Need ae eu 116 i Hierarchical or Join Clustering gees eege eg EeEN Ed EENEG yan EE en 119 ii Rome EE ETH 121 b Principal Component Analysis cc ccceccec ec nec eee ec eens cece en eens cee edeeeeeseeeeneeneaeaes 122 Appendix A Built in Excel Statistical Functons ccc eeceee eee eeeeeeeeeeaeeeees 124 Appendix B Other Excel Functions E EE 128 Math and trigonometry functions EEN 128 Text e Te ET 130 Financial tee 131 Date anid time TUNCtHONS crisderi aoaie rara lt Qdesdende tasted a edad bovenae ddd ecdaudediaalvelaneaeds 133 Information Eet e 134 Logical fUn ton EE 135 Lookup EE EE 135 Database function Suiriri nre MER Shade Eeselen Bue caeanaeateseet 136 EIERE Ee 136 Add in and Automation Ee 138 CUDE Feb Oger ege ETE E EE E EA EEE E E E TE 138 5 Section 1 Manage Data Note If you are a first time user of Statistician you can get up to speed quickly and easily by going through the Quick Start guide and data that is downloadable from www xlstatistianaddin com The User manual that you are now using is more of a reference manual than a learning tool The Manage Data form is selected by clicking the Manage Data button in the Tools groupbox on the Statistician ribbon tab Statistician Manage Data Data Variab
80. les a Import Export Import Data Variables Data Variable Names in First Row 4 Export Data Variables Rename Remove Rename a Data Variable Remove Data Variable s Select All Variables DeSelect All Variables Close In Statistician the analysis of data is not conducted upon data stored in a range of cells in a worksheet Rather data is imported from a range within an Excel worksheet and is copied to a hidden worksheet through the Manage Data form This form like all forms in Statistician is activated from the Excel ribbon by clicking the Manage Data button The Manage Data form can also be activated from all other forms in Statistician The Manage Data form allows the user to define a Data Variable All of the analysis of data in Statistician is conducted upon Data Variables 6 a Data Variables A Data Variable in Statistician is comprised of a number of Data Observations stored in a column which are usually numeric but may be text To import Data Variables into Statistician and if Data Variable Names in First Row is checked the data must be stored in columns in a spreadsheet and each column must have a Data Name in the first row A Data Name can be comprised any characters available on the keyboard If Data Variable Names in First Row is not checked then Statistician will assign a Data Name to the data b Importing Data Variables To import Data Variables into Statistician click
81. ls to be varied is selected by clicking the Select a Range of Cells to Vary buttons The maximum number of iterations is written in the Maximum Number of Iterations textbox and the accuracy or tolerance of the optimization is written in the Tolerance textbox Optimizer has some advantages and disadvantages in comparison to Excel Solver The two disadvantages are that Optimizer is slower than Solver and does not do constrained optimization However the speed of optimization is of little consequence in practice unless highly numerically demanding tasks are performed On the positive side Optimizer is simpler to use than Solver and does sometimes arrive at a better solution than Solver because of the inbuilt optimization algorithms Thus Statistician users have the choice of two optimization tools 28 h Make Lagged Data Variables The Make Lagged Data Variables form is selected by clicking the Make Lagged Data Variables menu item from the Tools button on the Statistician ribbon tab Statistician Make Lagged Data Variables Data Variables Number of Lagos 1 Select All Variables DeSelect All Variables Manage Data Close The Make Lagged Data Variables tool will create lags of a predefined Data Variable To create a set of lagged Data Variables the user selects one or more Data Variables and then sets then number of desired lags from the Number of Lags textbox Clicking Make Now creates the lagged Data Variables 2
82. mate a Binary regression model the model to be estimated is selected by selecting either Probit or Logit option in the Model groupbox The user selects a binary dependent variable from the Dependent Variable combobox and then selects one or more independent variables from the Independent Variables listbox The binary dependent variable must have one of two distinct values These distinct values may be numeric or text and represent either a 1 or 0 The symbols for 1 and 0 are generated by Statistician after analyzing the selected Data Variable Statistician enters in the symbols for 1 or 0 in the 1 and O labels in the Dependent Variable Data groupbox Clicking the Switch button switches the symbols for 1 and 0 Checking either With Constant or Without Constant in the Constant frame determines if a constant term is estimated in the model The regression can be performed with all observations by clicking the Use All Observations option in the Sample Range groupbox If the User Defined option is selected in the Sample Range groupbox the regression can be restricted to a subset of all of the observations by entering the starting number and ending number of the range of observations in the From and To textbox that will be displayed The user can select the type of output from the regression in the OutPut groupbox If no option is selected in the OutPut frame then Statistician reports dependent variable n
83. may be numeric or text The symbols for success and failure are generated by Statistician after analyzing the selected Data Variable Statistician enters in the symbols for success and failure in the Success and Failure labels in the Symbol groupbox Clicking the Switch Symbols button switches the symbols for success and failure When the Output Results button is clicked Statistician outputs the e null hypothesis e alternative hypothesis e sample size e number of ones e number of zeros e proportion of successes e np onl p e test statistic e critical value e p value e decision rule and e conclusion If the user checks the Estimate Confidence Interval checkbox a confidence interval for the population proportion is outputted The number of significant digits after the decimal point in the confidence interval output is selected from the Number of Significant Digits in Output combobox the default is 3 To obtain the required sample size for a given confidence interval width the Output Sample Size checkbox in the Confidence Interval groupbox is selected The user enters the desired distance from the true population proportion in the corresponding textbox 46 e Test Difference in Population Proportions The Test Difference in Population Proportions form is activated from the Standard Tests button on the Statistician ribbon tab Statistician Test Difference in Population Proportions H
84. me Series Holt Winters Data Variable Select a Data Variable v Method Factor Value s Single Simple Simple Factor Alpha 0 5 Double Si ith Ti S S S hua Trend Factor Beta 0 5 _ Triple Simple with Trend and Seasonals Forecasts _ Forecast to Seasonal Factor Gamma 0 5 Number of Seasons Season Type wl Format Output Output Results Manage Data Close i Single or Simple exponential smoothing The forecast F of a time series Wrs el is given by Luz ay 1 a F where the damping or smoothing factor is a fixed parameter The damping factor is restricted to the range 0 lt lt 1 The forecast series is produced by initially by setting F y Note that F is undefined Subsequent forecasts are calculated iteratively from the preceding forecast and preceding observation The first out of sample forecast beginning after the final observation is given by Fr41 ay7 1 a Fr Thereafter the level of the series is flat or constant and given as Fron Fry h 1 2 In the single factor Holt Winters model the level of the series is identical to the forecast series ii Double exponential smoothing If the time series contains a trend the Holt Winters two factor model is defined by the equations Le ay 1 a Lt 1 bt 1 112 by P Li Lt 1 1 P bi 1 Fram Le mb where a and P are fixed
85. mirnov critical values 68 If the alternative hypothesis is H F X F X the null hypothesis is rejected if the absolute value of the test statistic is greater than or equal to the critical value If the alternative hypothesis is H F X gt F X the null hypothesis is rejected if the absolute value of the test statistic is greater than or equal to the critical value and the test statistic is positive If the alternative hypothesis is H F X lt F X the null hypothesis is rejected if the absolute value of the test statistic is greater than or equal to the critical value and the test statistic is negative Using Statistician Two Sample Tests The user selects the two Data Variables for analysis from the two Null Hypothesis comboboxes in the Hypothesis Test groupbox The null hypothesis lt and alternative hypothesis lt gt are selected in the corresponding comboboxes in the Hypothesis Test groupbox If the Output Ranks checkbox is checked then Statistician will output the rank of each Data Variable observation for the Mann Whitney U test the Wilcoxon Rank Sum test and the Wilcoxon Signed Rank Sum test For the Kolmogorov Smirnov test the user can output the cumulative probabilities of the selected Data Variables by checking the Output Cumulative Probabilities checkbox Statistician outputs the sample size of each Data Variable test statistic critical value significance null and alternative hypothesis
86. n as 7 Var n N 0 1 Two tests are commonly employed to test for autocorrelation in the time series y Under the null hypothesis that y is a white noise process the Box Pierce 1970 test statistic is given as The Ljung Box 1979 test statistic is given as Tie p Q n n 2 k 1 and is generally regarded as having better finite sample properties than the Box Pierce 1970 statistic Both the Q and Q statistics are distributed as a XG random variable An estimate of the kt partial autocorrelation x is taken from the linear regression Yt Bot PaYt 1 Bk 1Yt k 1 PeMt k Et where E s 0 21 Using Statistician Statistician Autocorrelation Data Variable Number of Lags 10 Q Statistic Box Pierce 1970 Ljung Box 1979 v Format Output Output Results The Autocorrelation form is selected by selecting the Autocorrelation menu item from the Tools button on the Statistician ribbon The user selects the Data Variable for analysis from the Data Variable combobox The user enters the number of lags of the selected Data Variable to be tested in the Number of Lags textbox the default is 10 Either the Box Pierce 1970 or Ljung Box 1979 test statistic can be selected from the Q Statistic groupbox Statistician outputs the autocorrelation partial autocorrelation Q statistic and p value at each lag of the selected Data Variable 22
87. ns in the i treatment and jtt block 1 e x ae Ei et Xi jx is the sample mean of all observations e x The total sum of squares SS Total the treatment sum of squares SST the block sum of squares SSB the interaction sum of squares SSI and the error sum of squares SSE are given as 85 j 1 a b ssi r ER xP x i 1 j 1 a b r SSE Y Y Y Du SCT i 1 j 1k 1 The five of sum of squares are related by SS Total SST SSB SSI SSE The mean square of treatments MST the mean square of blocks MSB the mean square of interaction MSI and the mean square error MSE are defined as SSI EEN SSE E ab r 1 Under the null hypothesis that the mean of all treatments are equal the test statistic is given as fr na where fr is F distributed with a 1 degrees of freedom in the numerator and ab r 1 degrees of freedom in the denominator Under the null where fe is F distributed with b 1 degrees of freedom in the numerator and ab r 1 degrees of freedom in the denominator Under the null hypothesis that the mean of all MS replications are equal the test statistic is given as f where fi is F distributed hypothesis that the mean of all blocks are equal the test statistic is given as fg 86 with a 1 b 1 degrees of freedom in the numerator and ab r 1 degrees of freedom in the denominator Using Statistician Two factor ANOVA with
88. ntal factors treatments and blocks and these two factors have an interaction effect An interaction effect takes place if the response of one factor is dependent on the level of the other factor Three sets of hypothesis are of interest a A test of the hypothesis that the means of all of the treatments are equal b A test of the hypothesis that the means of all of the blocks are equal c A test of the hypothesis that the interaction effect between factors is zero Let uj be the mean of treatment i where 1 lt i lt a and let u be the mean of block j where 1 lt j lt b Let ut j be the interaction effect between treatment i and block j There will be a x b interaction terms It is also assumed that the experiment is replicated r times The null and alternative hypothesis for the sets of tests can be stated as a Ho yi 42 Ha H not all of the means of the treatments are equal b Ho wt u3 Up H not all of the means of the blocks are equal c Ho Hi1 Hi2 Hap 0 H not all of the interaction ef fects are equal to zero The following notation will be used in this section e Xi jx is the observation in the i treatment and j block in the k replication e lt x is the sample mean of all observations in the the i treatment e xP is the sample mean of all observations in the the j block e lt x is the sample mean of all observations in the the k replication i is the sample mean of all observatio
89. number up to the nearest even integer Returns e raised to the power of a given number Returns the factorial of a number Returns the double factorial of a number Rounds a number down toward zero Returns the greatest common divisor Rounds a number down to the nearest integer Returns the least common multiple Returns the natural logarithm of a number Returns the logarithm of a number to a specified base Returns the base 10 logarithm of a number Returns the matrix determinant of an array Returns the matrix inverse of an array Returns the matrix product of two arrays 128 MOD MROUND MULTINOMIAL ODD PI POWER PRODUCT QUOTIENT RADIANS RAND RANDBETWEEN ROMAN ROUND ROUNDDOWN ROUNDUP SERIESSUM SIGN SIN SINH SQRT SQRTPI SUBTOTAL SUM SUMIF SUMIFS SUMPRODUCT SUMSQ SUMX2MY2 SUMX2PY2 Returns the remainder from division Returns a number rounded to the desired multiple Returns the multinomial of a set of numbers Rounds a number up to the nearest odd integer Returns the value of pi Returns the result of a number raised to a power Multiplies its arguments Returns the integer portion of a division Converts degrees to radians Returns a random number between 0 and 1 Returns a random number between the numbers you specify Converts an arabic numeral to roman as text Rounds a number to a specified number of digits Rounds a number down toward zero Rounds a number up away from zero Returns the
90. ore compactly as Y XfB e 89 where Y is a nx 1 vector of observations on the dependent data variable X is a n x k matrix of observations on the independent data variables p is a k x 1 vector of fixed coefficients and e is a n x 1 vector of residuals An estimate of the regression coefficients is given as b X X 1X Y where b is a k x 1 vector of coefficient estimates The estimated residuals from the regression are given as Y Xb where is a k x 1 vector of residual estimates The sum of squares of the residuals is given as SSR amp The total sum of squares of the regression is given as n TSS 01 7 i 1 where yis the mean of the dependent variable y The regression sum of squares is then given by RSS TSS SSR An estimate of the variance of the residuals is given as The OLS standard errors of the coefficient estimates are given by the square root of diagonal elements of the OLS variance covariance matrix defined as Lois s X Xt where Zoe is a k x k matrix The it OLS t statistic of the coefficient estimates is given as bj tstat i Se where b is the oh coefficient estimate and se is the it standard error of the coefficient estimate The OLS p value of the coefficient estimate is taken from a t distribution with n k degrees of freedom and is the probability that the coefficient is equal to zero 90 The White 1980 es
91. ple mean of all observations in the i treatment and D block Let xf be the sample mean of all of the block means in the i treatment and let xP be the sample mean of all the treatment means in the j block Let x SE E i j be the sample mean of all means The total sum of squares SS Total the treatment sum of squares SST the block sum of squares SSB and the error sum of squares SSE are given as b a SS Tota l E H i 8y i 1 Ze a SST b H i x i 1 b SSB a EF x j 1 a b SSE gt gt Gy x xf x i 1 j 1 The four of sum of squares are related by SS Total SST SSB SSE 83 The mean square of treatments MST the mean square of blocks MSB and the mean square error MSE are defined as SST MST 1 MSB SSB b 1 MSE SSE a 1D b 1 Under the null hypothesis that the mean of all treatments are equal the test statistic is given as fr where fr is F distributed with a 1 degrees of freedom in the numerator and a 1 b 1 degrees of freedom in the denominator Under the null hypothesis that the mean of all blocks are equal the test statistic is given as fg where fg is F distributed with b 1 degrees of freedom in the numerator and a 1 b 1 degrees of freedom in the denominator Using Statistician Two Factor Table without replication When the user selects the Two Factor Table without replication
92. ponent Analysis Statistician Principle Component Analysis Data Variables v Standardize Variables Sample Names Optional Select a Data Variable v Output _ Covariance Matrix Correlation Matrix Components Correlogram Variables and Compo Significant Digits 4 y Select All Variables DeSelect All Variables sl Format Output Output Results Manage Data Close Principal Component Analysis PCA is a commonly used as a variable reduction technique that expresses a set of variables in terms of a smaller set of composite variables or principle components These principle components are a linear combination of the original variables Principal Component Analysis is a methodology that estimates those components that contribute most to the variation in the data These components are uncorrelated with each other The first principle component extracts the maximum variance from the variables The second principle component is constructed from the residual correlations and extracts the maximum variance from a linear function of the random variables that is orthogonal to the first principle component Subsequent principle components are similarly formed Principle components are calculated as follows Let X be a n x m matrix where m is the number of measurement types or characteristics factors variables and n is the number of samples recorded on each of the measurement types Le
93. qual 1 1 Z 2 E Zaz X 0 1 2 4a 2 ni Ny where s and s are the sample variances of each random variable o and of are the known and unequal variances of each random variable and where a is the known and equal variance of each of the random variables When the variances are unknown and unequal the degrees of freedom of the t distribution is given by si sf s 2 s2 2 d f E WIES m 1 n2 D where d f is rounded to the nearest integer When the variances are unknown and equal the degrees of freedom are given by d f n n 2 To test the null hypothesis that the variances of the random variables are equal against the alternative hypothesis that the variances are unequal an F statistic is calculated as s s The F statistic has an F distribution with n 1 degrees of freedom in the numerator and n 1 degrees of freedom in the denominator The null hypothesis is rejected if F lt Fy_q 2n n lower critical value or if F gt Fa zn n Upper critical value To obtain a sample size required to construct an estimate of the difference in population means for a given confidence interval width and significance level the required sample size is the lowest integer greater than N in the following table 37 Standard Deviation Sample Size Unknown and Unequal taj a s2 N E Unknown and equal Canary s2 N 2 Known and unequal Zaz Lei 02 N
94. r an additive model is given as fi 5S SI and for a multiplicative model is given by fe S x Sle where SI is the adjusted seasonal index corresponding to time t The Mean Absolute Deviation MAD Mean Absolute Percentage Error MAPE and the Root Mean Square Error RMSE are reported and defined as S 1 MAD ly Sil i 1 S 100 BE MAPE gt Ye Mt Yt 110 S 1 RMSE X Or S i 1 where we is the observation in the original series A is the observation in smoothed series and s is the number of smoothed observations which will be different from the number of observations in the original series if a moving average method is employed If seasonal indexes are not calculated for the original time series then the MAD MAPE and RMSE are reported for the smoothed series otherwise they are reported for the fitted series which is adjusted for seasonal effects teh tth If the user checks the Forecast To checkbox a forecast horizon is then entered into the corresponding textbox This forecast horizon must be an integer that is greater than the number of original observations However no forecasts can be obtained if the time series has been smoothed with a Contemporaneous or Centered moving average method This is because these smoothing methods employ current or future values to construct a smoothed series 111 b Holt Winters smoothing techniques Statistician Ti
95. radiobutton from the ANOVA Table Type groupbox user then selects two or more Data Variables from the Data Variables listbox Each of these Data Variables are the treatments Each observation in a treatment is the mean of a block Hence the first observation of each Data Variable is the mean of the first block the second observation of each Data Variable is the mean of the second block and so on Each treatment can be thought of as a column of data Each block can be considered to be a row of data The number of observations sum average and variance of the means of treatments and blocks is outputted A table describing the source of variation as displayed below is then reported Sour ie oF d f Suet of mean F Statistic F critical p value Variation Squares Squares Treatments oi SST MST fr MST MSE rarer P F gt fr Blocks b 1 SSB MSB fs MSB MSE Fap 1n 1 P F gt fB Error a 1 b 1 SSE MSE Total ab 1 SS Total The null and alternate hypothesis for the test on treatments and blocks the test significance level the F statistic critical value p value the decision rule and the 84 conclusion are also reported If the Output Summary Statistics checkbox is checked then summary statistics for each of the treatments and blocks are outputted c Two factor ANOVA with Replication The two factor ANOVA model is now extended to the case where there are two experime
96. rvations It is defined as ie a n 5 C eh ewness SEN E i Standard Error of Skewness A measure of the significance of the skewness estimate It is defined as F d 8 6n n 1 Std Error of Skewness n 2nt Dnt3 Excess Kurtosis A measure of the magnitude of observations in both tails of the distribution relative to the normal distribution It is defined as n n 1 DEE 3 n 1 Excess Kurtosis n 1 n 2 n 3 eee i n 2 n 3 Standard Error of Excess Kurtosis A measure of the significance of the skewness estimate It is defined as Std E E Kurtosis o rror of Excess Kurtosis n 3 n 2 n 3 n 5 Jacque Bera Test Statistic A test statistic to measure the normality of the distribution of a Data Variable The Jacque Bera statistic has a Xe distribution It is defined as Skewness Excess Kurtosis B Jacque Bera d 6 74 Durban Watson Test Statistic A test for autocorrelation in the Data Variable and is given by Ors EG DW nx i 1 i 16 Critical values for the Durban Watson test statistic are taken from a table of Durban Watson critical values b Covariance and Correlation The Covariance and Correlation form is selected by clicking the Covariance and Correlation button in the Tools groupbox on the Statistician ribbon tab Statistician Covariance and Correlation Data Variables Table Body Sample Range Co
97. rvations with magnitude less than the observation Number Below The observation s percentile is then calculated as Number Above Number Above Number Below x 100 26 f Sampling The Sampling form is selected by clicking the Sampling menu item from the Tools button on the Statistician ribbon tab Statistician Sampling Data Variable Select a Data Variable v Number Of Drawi Sampling Method S Per ae 1 C With Replacement C Without Replacement Number of Samples 1 Format Output Output Results The user selects a Data Variable from which random samples will be taken The type of sampling either with or without replacement is selected in the Sampling Method groupbox The number of observations in each sample is written in the Number Of Drawings Per Sample textbox The number of samples to be taken is written in the Number Of Samples textbox 27 g Optimizer The Optimizer form is selected by clicking the Optimizer menu item from the Tools button on the Statistician ribbon tab Statistician Optimizer Select a Cell to Optimize Select a Range of Cells to Vary Minimize Maximize Maximum Number Of 1000 Iterations Tolerance 1E 7 Optimize Now The Optimizer tool will either minimize or maximize the value of a cell by changing the value in some other cells The cell to be optimized is selected by clicking the Select a Cell to Optimize button and the range of cel
98. s often employed as a criterion to select a subset of p regressors from k potential regressors p lt k in multiple linear regressions The intercept is included in p The Mallows C statistic with p regressors is defined as SSR Cp Re 2p n where SSR is the residual sums of squares for the model with p regressors 6 is the estimated variance of the error term of the full model with k regressors and n is the sample size When an important regressor has been omitted from the regression it is expected that C gt p The favoured model is one where Cp is close to or less than p Using Statistician Regression To perform a regression the user selects a dependent variable from the Dependent Variable combobox and then selects one or more variables from the Independent Variables listbox Checking either With Constant or Without Constant in the Constant groupbox determines if a constant term is estimated in the regression The regression can be performed with all observations by clicking the Use All 93 Observations option in the Sample Range frame If the User Defined option is selected in the Sample Range frame the regression can be restricted to a subset of all of the observations by entering the starting number and ending number of the range of observations in the From and To textbox that will be enabled The user can select the type of standard errors and variance covariance matrix of the parameter estimates th
99. s the F probability distribution Returns the inverse of the F probability distribution Returns the Fisher transformation Returns the inverse of the Fisher transformation Returns a value along a linear trend Returns a frequency distribution as a vertical array Returns the result of an F test Returns the gamma distribution Returns the inverse of the gamma cumulative distribution Returns the natural logarithm of the gamma function T x Returns the geometric mean Returns values along an exponential trend Returns the harmonic mean Returns the hypergeometric distribution Returns the intercept of the linear regression line Returns the kurtosis of a data set Returns the k th largest value in a data set Returns the parameters of a linear trend Returns the parameters of an exponential trend Returns the inverse of the lognormal distribution Returns the cumulative lognormal distribution Returns the maximum value in a list of arguments Returns the maximum value in a list of arguments including numbers text and logical values Returns the median of the given numbers Returns the minimum value in a list of arguments Returns the smallest value in a list of arguments including numbers text and logical values Returns the most common value in a data set Returns the negative binomial distribution Returns the normal cumulative distribution Returns the inverse of the normal cumulative distribution 125 NORMSDIST NORMSINV
100. s the selected significance level In the small sample case critical values are taken from a table of Friedman critical values Using Statistician Two or More Samples The user selects two or more Data Variables for analysis from the Data Variable listbox If the Output Ranks checkbox is checked then Statistician will output the rank of each Data Variable observation Statistician outputs the sample size of each Data Variable test statistic critical value significance p value rank sum null and alternative hypothesis decision rule and conclusion for each of the two tests In the case of the Friedman test each selected Data Variable represents a treatment and each observation within a treatment represents a block The number of observations blocks within each Data Variable treatments must be equal 71 j Goodness Of Fit Tests Chi square The chi square and goodness of fit test tests if a Data Variable has a specific distribution The chi square goodness of fit test can be applied to both discrete and continuous distributions The null and alternative hypothesis for a goodness of fit test is given as Ho F X Fo X Hy F X Fo X where F X is the population distribution from which the sample is derived and F X is the hypothesized theoretical distribution from which the sample is derived With the chi square goodness of fit test the data is divided into k bins and the test statistic is defined as k 2
101. s to be outputted are selected by clicking checking the various checkboxes in the body of the form If the Select All Statistics button is clicked then all of the selected statistics will be outputted Clicking the DeSelect All Statistics button de selects all of the checkboxes Description of the Summary Statistics Assume n observations with x being the i observation on a selected Data Variable Count The number of observations Minimum The smallest observation Maximum The largest observation Sum The sum of all observations Arithmetic Mean The arithmetic mean of all observations It is defined as n i 1 ee x n Median The middle observation The observations are initially sorted in ascending order If there are an odd number of observations the median is the middle observation If there is an even number of observations then the median is the average of the two middle observations Mode The most common observation If no mode exists then N A is reported If more than one mode exists then the first mode is reported Range The difference of the largest and smallest observation Inter Quartile Range A measure of spread based upon the difference of the observations at the first and third quartile The observations are initially sorted in ascending order If there is an even number of observations then the data is divided into two groups a group with the highest numbers and a group with the lo
102. se All Observations option in the Sample Range groupbox If the User Defined option is selected in the Sample Range groupbox the regression can be restricted to a subset of all of the observations by entering the starting number and ending number of the range of observations in the From and To textbox that will be displayed The user can select the type of output from the regression in the OutPut groupbox If no option is selected in the OutPut frame then Statistician reports dependent variable name sample range number of observations in the regression model estimated parameter estimates parameter estimate standard errors parameter estimate t ratio s parameter estimate p values Output from each of the four options in the OutPut groupbox are as follows 1 Model Statistics mean of the dependent variable variance of the dependent variable deviance log likelihood restricted log likelihood Akaike information criterion 103 e corrected Akaike information criterion e Scwharz information criterion e Psuedo R e LR statistic e Chi square statistic 2 Variance Covariance Matrix Maximum likelihood variances and covariances are reported 3 Estimation Information estimation algorithm tolerance iterations required function evaluations required starting values 4 Fitted Values and Residuals The fitted values estimated residuals and estimated Pearson residuals from the
103. servation is a two point moving average of two moving averages where the first moving average has one more lead term than the second moving average For example if J 4 then S Wt 2 Yt 1tYt Yt 1 4 VYt 1 Yt tYt 1 Yt 2 4 2 int 1 2 smoothed observations being undefined at the start and at the end of the smoothed time series The int function removes any values after the decimal point Using a centered moving average involves 106 iv Weighted Moving Average The current value of a smoothed series is the weighted average of the past J observations that is Ay E jw The weights must be in the range 0 lt wj lt 1 and sum to one that is Um 1 The weights are defined by selecting a Data Variable containing the weights from the Select a Weighting Data Variable combobox When using the Weighted Moving Average technique users can readily find the optimal value of the weights that minimize the MAD MAPE or RMSE defined later with the use of Excel solver 3 Trend smoothing techniques i Linear Trend The time series is modeled as a linear function of a time variable t 1 n where n is the number of observations Using regression analysis the following model is estimated Yt Po Bit amp Where Bn and D are fixed parameters and N 0 02 Let b and b be least squares estimates of Dun and p respectively The smoothed values of y are the fitted values from the regression that is
104. sis is rejected if T lt T In the small sample case acceptable values of alpha are 0 1 0 05 and 0 01 The large sample case is defined when n gt 10 or nz gt 10 The test statistic is given as E 7 CM uns Ty the large sample case the test statistic has a 2 12 standard normal distribution and can be tested with a standard z test 65 e Wilcoxon Signed Rank Sum Test The Wilcoxon signed rank sum test tests if the location of a set of ordinal matched pairs are different This test is employed when the matched pairs are not independent The test statistic is constructed by taking the difference of the matched pairs d Xi xz where x and x2 are the i observation on the first and second selected Data Variables respectively and d is the difference between the matched pairs 1 lt i lt n Ifd 0 then the i matched pair is ignored Let the number of remaining matched pairs with a non zero difference be denoted by n The absolute value of the non zero differences are calculated and then ranked in ascending order Observations with a tied rank are assigned a rank which is the average of the corresponding raw ranks Let T be the sum of the ranks where d gt 0 and let T be the sum of the ranks where d lt 0 The test statistic T is defined as T min T T The Wilcoxon signed rank sum test is conducted differently for small and large samples Let a small sample be defined as the case where n
105. stimated residuals and is given by SE A D Where is the mean of the estimated residuals The mean of the estimated residuals is equal to zero if a constant is included in the regression and is usually non zero otherwise The standard error of the skewness estimate is given as e T 6n n 1 se of Skewness EECH DGS Dint Dnt 3 The Excess Kurtosis statistic is a measure of the kurtosis of the estimated residuals and is given by n n 1 amp 3 n D Excess Kurtosis ae Da Daas m n 2 n 3 The standard error of the excess kurtosis estimate is given as p WW 24n n 1 se of Kurtosis n 3 n 2 n 3 n 5 The Jacque Bera test statistic tests for the normality of the residuals and has a E distribution It is given as 92 Ss Skew i Excess Kurtosis EE 24 The Durban Watson test statistic tests for autocorrelation in the residuals and is given by oy Dolan Ee Ea 2S Gi i DW The Variance Inflation Factor VIF statistic in a regression is employed as an indicator of the contribution to multicollinearity of an individual regressor or independent variable The VIF of regressor kis given as VIF 1 R where RZ is the R computed from regressing the kt regressor upon all other regressors Note a regression must have at least two independent variables for the VIF statistic to make any sense Mallows Cp Statistic The Mallows C statistic i
106. sts are as follows Test Critical Value Notes Fisher LSD 1 1 ta 2n a MSE LG Tukey Kramer MSE 1 1 qala n a is the critical qala n a value of the Studentised 2 nm m range distribution Scheffe zca 1 F a q 1n a MSE n 81 If x x2 lt Critical Value then it is concluded that there is no significant difference between the means of the two treatments being tested Using Statistician Single Factor ANOVA After selecting the One Factor radiobutton in the ANOVA Table Type groupbox the user then selects two or more Data Variables in the Data Variables listbox Each of these Data Variables are the observations on a treatment and may be of unequal sample size The number of observations sum average and variance of all of the observations within each treatment group is outputted A table describing the source of variation as displayed below is then reported sour So os d f lee Mean F Statistic F critical p value Variation Squares Squares Treatments oi SST MST f MST MSE Faa 1n 1 PE gt f Error n a SSE MSE Total n 1 SS Total The null and alternate hypothesis the test significance level the F statistic critical value p value the decision rule and the conclusion are also reported If the Output Summary Statistics checkbox is checked then summary statistics for each of the treatments
107. sum of a power series based on the formula Returns the sign of a number Returns the sine of the given angle Returns the hyperbolic sine of a number Returns a positive square root Returns the square root of number pi Returns a subtotal in a list or database Adds its arguments Adds the cells specified by a given criteria Adds the cells in a range that meet multiple criteria Returns the components sum of the products of corresponding array Returns the sum of the squares of the arguments Returns the sum of the difference of squares of corresponding values in two arrays Returns the sum of the sum of squares of corresponding values in two arrays 129 SUMXMY2 Returns the sum of squares of differences of corresponding values in two arrays TAN Returns the tangent of a number TANH Returns the hyperbolic tangent of a number TRUNC Truncates a number to an integer Text functions ASC Changes full width double byte English letters or katakana within a character string to half width single byte characters BAHTTEXT Converts a number to text using the baht currency format CHAR Returns the character specified by the code number CLEAN Removes all nonprintable characters from text CODE Returns a numeric code for the first character in a text string CONCATENATE Joins several text items into one text item DOLLAR Converts a number to text using the dollar currency format EXACT Checks to see if two tex
108. t Y has a specific value y is given as A e72 PY y ei where E Y u and Var Y u Let x denote the it independent variable and let f denote the coefficient of the it independent variable A linear combination of independent variables is given as x B By Bix PnXp Let Y denote the number of events observed from n repetitions for the Oh covariate pattern The expected value of Y in a Poisson regression is given as E Y u nye or equivalently log ui log ni xip The parameters of the Poisson model are estimated with maximum likelihood techniques The log likelihood of the Poisson model is given as N In L X exp xip yixiB input i 1 where N is the number of observations The gradient function is given as OL N ap D ior exviB i 1 The Hessian H of the Poisson log likelihood function is given as n X expri Bien i 1 pati apap The square root of the diagonal elements of the inverse of the Hessian matrix provides maximum likelihood standard errors for the parameter estimates 101 Fitted values from a Poisson regression are given as G xi b Yi nye where b is the maximum likelihood estimate of Gg Residuals from the regression are given as Pearson residuals are given as The deviance for a Poisson model is given as D 2 Yilog r Ai X and D are approximately equal with a chi squared distribution with N
109. t p denote the correlation matrix of X and let denote the covariance matrix of X The principle components of X are the eigenvectors of p or X The variance of the principle components 4 1 lt i lt p are the eigenvalues of matrix p or 122 After calculating the eigenvalues and eigenvectors Statistician reports e The eigenvalue for each component e The percentage of variance for each component e The cumulative percentage of variance for each component e The correlation matrix e The covariance matrix e The eigenvectors loadings for each component e The principle components e Acorrelogram between the principle components and the original variables Note that when Standardize is checked the analysis is performed upon the correlation matrix When Standardize is not checked the analysis is performed upon the covariance matrix 123 Appendix A Built in Excel Statistical Functions AVEDEV AVERAGE AVERAGEA AVERAGEIF AVERAGEIFS BETADIST BETAINV BINOMDIST CHIDIST CHIINV CHITEST CONFIDENCE CORREL COUNT COUNTA COUNTBLANK COUNTIF COUNTIFS COVAR CRITBINOM DEVSQ EXPONDIST Returns the average of the absolute deviations of data points from their mean Returns the average of its arguments Returns the average of its arguments including numbers text and logical values Returns the average arithmetic mean of all the cells in a range that meet a gi
110. t up by placing the observations in a table with r rows and c columns The observations typically represent the frequency of a particular event Rows and columns each represent the two classifications criteria of the two nominal variables Let n r xc denote the number of cells in the contingency table Let Sc 1 lt i lt c denote the sum of the hb column let Aer 1 lt j lt r denote the sum of the jt row and let S denote the sum of all cells If the two nominal variables are independent the expected value of the cell in row i and column j is given by e j SRiSCj io a l e SE Si Denoting the observation in cell in row i and column jas Or the test statistic is given as rT c 2 p y Y Cua ren i j The test statistic has a y distribution with r 1 c 1 degrees of freedom If the test statistic is greater than the critical value then the null hypothesis is rejected Using Statistician Contingency Tables The Non Parametric Tests Contingency Table form is selected by clicking the Contingency Table button in the Chi 2 Tests groupbox on the Statistician ribbon The user selects the Data Variables for analysis from the Data Variables listbox Each selected Data Variable represents the values of the first nominal variable or columns in the contingency table Each observation within a selected Data Variable represents the observed value for each category of the second nominal variable or rows in the contingency tabl
111. t values are identical FIND FINDB Finds one text value within another case sensitive FIXED Formats a number as text with a fixed number of decimals JIS Changes half width single byte English letters or katakana within a character string to full width double byte characters LEFT LEFTB Returns the leftmost characters from a text value LEN LENB Returns the number of characters in a text string LOWER Converts text to lowercase MID MIDB Returns a specific number of characters from a text string starting at the position you specify PHONETIC Extracts the phonetic furigana characters from a text string PROPER Capitalizes the first letter in each word of a text value REPLACE Replaces characters within text REPLACEB REPT Repeats text a given number of times RIGHT RIGHTB Returns the rightmost characters from a text value 130 SEARCH SEARCHB SUBSTITUTE T TEXT TRIM UPPER VALUE Finds one text value within another not case sensitive Substitutes new text for old text in a text string Converts its arguments to text Formats a number and converts it to text Removes spaces from text Converts text to uppercase Converts a text argument to a number Financial functions Returns the accrued interest for a security that pays periodic interest ACCRINT ACCRINTM AMORDEGRC AMORLINC COUPDAYBS COUPDAYS COUPDAYSNC COUPNCD COUPNUM COUPPCD CUMIPMT CUMPRINC DB DDB DISC Re
112. the Manage Data button on the Statistician ribbon and then click the Import Data Variables button on the Manage Data form An inputbox Select a Range will appear see figure 1 Highlight the data range which contains the Data Name and Data Observations and then click the OK button Figure 1 Select a Range EA SAS1 When importing a large set of Data Variables with many Data Observations the user can more easily select the import range with the use of the lt Shift gt and lt End gt keyboard keys Firstly select the top and left most cell of the import range Holding down the lt Shift gt key and then pressing the lt End gt key followed by either lt l gt or lt gt gt will highlight the right most and bottom most cells in a contiguous range of cells Then click the OK button to import the Data Variables The Data Variables listbox will then display the name of each of the Data Variables The range of numbers in the square brackets before each of the Data Variable names in the listbox indicates the range of the lowest and highest numeric Data Observations in each column If any non numeric Data Observation lies within this range then an asterisk will appear in the range between the square brackets 7 For example if a spreadsheet containing the data in figure 2 is imported Figure 2 Year GDP TB6 M2 GPI FEDEXP 1970 6 562 626 4 436 2 198 6 1971 4 511 710 1 216 6 1972 4 466 802 1 240 1973 4123
113. tical package All output from Statistician is sent to a spreadsheet This output is second to none in terms of clarity for the end user Rather than simply reporting a few cryptic numbers which are the result of a statistical test Statistician outputs a meaningful report of a statistical test clearly stating the null and alternative hypotheses alphas test statistics critical values p values the decision rule conclusion and any other relevant factors The engine behind Statistician is Microsoft Visual Studio a fast and powerful development environment that integrates seamlessly into the suite of Microsoft Office products The use Microsoft Visual Studio as the development tool offers the end user processing speeds that Excel VBA could not achieve This guarantees that the software underpinning Statistician will have a very long shelf life Dr Bernard Bollen PhD in Econometrics Monash University March 2012 2 Table of Contents CEMA EE r 2 Section 1 Manage Datas ee eeeg d e dee Ee che Gate i ened lees dee ey an AE Eege 6 ea BR E EE 7 b Importing Data Variables 0 cece ccc cece cece eee e ence ee eee eee ene e ee eens a aa a eaa 7 c Selecting Data Variables hiirer cc cccccccc cece cece TE EEE EAE EEEE REELE 9 d Exporting Data Variables ccccccscescscceceessescseceeseasenceceseesaseeeecsecsesaneasenecsesenees 9 ec Removing Data Variables ienr oaeee e cece eee e ence ee ee ee ee ene ee
114. timate of the parameter variance covariance matrix corrects for heteroscedasticity in the residuals and is given as Swhite Z N A N n k where n a RS Q EI XiXj i 1 and x is the i row of X The Newey West 1987 estimate of the parameter variance covariance matrix corrects for both autocorrelation and heteroscedasticity in the residuals and is given as e n 10 1 1 Z Newey West ap LN SI UI where n V aa A A H 1 qt K E ES Xj_ yei_veiX i v 1 n q a2 ai v 1 i 1 n 2 9 Z When White 1980 or Newey West 1987 standard errors are selected the t ratios and p values of the parameter estimates are adjusted accordingly and q is set equal to largest integer lower than d The regression R and adjusted R R are defined as 5 _ SSR TSS _ n 1 R 1 1 R 2 ora respectively The log likelihood of the regression is given as n SSR ER log 2m log The Akaike and Schwarz information criterion are given as 91 2l 2k AIC n n 2l kxlog k SIC k x log k n respectively The F statistic is given as R k 1 Troes E and has an F distribution with k 1 degrees of freedom in the numerator and n k degrees of freedom in the denominator The probability of F is the probability that all coefficients are equal to zero The Skewness statistic is a measure of the skewness of the e
115. tion is denoted by rs It is calculated as follows The observations in the sample are sorted in ascending order and 1 is assigned to the number with the lowest value 2 is assigned to the number with the next lowest value and so on until n is assigned to the observation with the highest value When no tied ranks are present in the data the Spearman rank correlation coefficient can be calculated as 18 L d s n n 1 where d is the difference between the rank of adjacent observations on random variables x and y that is di R x Ru where R x and R y are the ranks of oberservation x and y respectively When no tied ranks are present in the data the above formulation of the Spearman rank correlation coefficient is equivalent to calculating the Pearson product moment correlation between the ranks of the two random variables If there are tied ranks present in the data the above formulation of the Spearman rank correlation coefficient will inflate the estimate of r One method to correctly calculate the Spearman rank correlation coefficient when tied ranks are present in the random variables is to a rank the observations on both random variables and then b for a set of observations in a random variable with equal value assign a tied rank which is the average of the corresponding raw ranks The Pearson product moment correlation coefficient is then calculated on the tie corrected rankings An alternat
116. tribution 51 a One Sample Statistician Variance Test One Sample Hypothesis Test a 0 05 Hypothesized Null Hypothesis Variance Variance of Select a Data Variable v v Alternative Hypothesis Variance of y Output Confidence Interval Significance O1 5 10 Significant Digits 2 v b e irch C Other Format Output Output Results Close The One Sample menu item has two primary functions a To test the hypothesis that the population variance o of a selected Data Variable is equal to greater than or equal to or less than or equal to a hypothesized value os b To produce a confidence interval for the variance of a Data Variable The following sets of null and alternative hypotheses can be tested a Ho 0 o H 07 of two tailed test or H 0 gt o or H 0 lt of b Ho oi gt of H 0 lt a6 c Ho 0o lt of E gt o 52 41 e2 The test statistic is given as T HE where nis the number of observations in the 0 Data Variable s is the sample variance of the Data Variable and o is the hypothesized variance of the Data Variable The T statistic has a y distribution with n 1 degrees of freedom If the alternate hypothesis is H 0 lt d then the null hypothesis is rejected if T lt Recent where a is the level of significance and n 1 is the degrees of freedom If the alternate hypothesis is H1 o gt
117. ts of null and alternative hypotheses can be tested a Ho u v H u v two tailed test or H u gt vor H u lt v b Ho u 2 v Hy u lt v c Hou lt v Hu gt v The test statistic z for a test with a known standard deviation is given as EE a vn where xis the sample mean wis the hypothesized mean is the known population standard deviation and n is the number of observations in the Data Variable The test statistic t for a test with an unknown population standard deviation is given as E Sdvn where s is the sample standard deviation The critical values for a z test are taken from a standard normal distribution and for a t test are taken from a student t distribution For a two tailed test the critical value for a test with a known standard deviation is given as Z 2 and the critical value for a two tailed test calculated with an unknown standard deviation is ty z 1 where n 1 is the degrees of freedom of the t distribution For a one tailed test the critical value for a test with a known standard deviation is given as Z and the critical value for a one tailed test calculated with an unknown standard deviation is ty _ The reported p value is the probability of rejecting the null hypothesis when it is true 31 When the standard deviation of the population is known the null hypothesis is rejected if Alternative Hypothesis Rejection Criterion Ay wstv z gt Se
118. turns the accrued interest for a security that pays interest at maturity Returns the depreciation for each accounting period by using a depreciation coefficient Returns the depreciation for each accounting period Returns the number of days from the beginning of the coupon period to the settlement date Returns the number of days in the coupon period that contains the settlement date Returns the number of days from the settlement date to the next coupon date Returns the next coupon date after the settlement date Returns the number of coupons payable between the settlement date and maturity date Returns the previous coupon date before the settlement date Returns the cumulative interest paid between two periods Returns the cumulative principal paid on a loan between two periods Returns the depreciation of an asset for a specified period by using the fixed declining balance method Returns the depreciation of an asset for a specified period by using the double declining balance method or some other method that you specify Returns the discount rate for a security 131 DOLLARDE DOLLARFR DURATION EFFECT FV FVSCHEDULE INTRATE IPMT IRR ISPMT MDURATION MIRR NOMINAL NPER NPV ODDFPRICE ODDFYIELD ODDLPRICE ODDLYIELD PMT PPMT PRICE PRICEDISC PRICEMAT Converts a dollar price expressed as a fraction into a dollar price expressed as a decimal number Converts
119. umber of columns in a reference Looks in the top row of an array and returns the value of the indicated cell Creates a shortcut or jump that opens a document stored on a network server an intranet or the Internet Uses an index to choose a value from a reference or array Returns a reference indicated by a text value Looks up values in a vector or array Looks up values in a reference or array Returns a reference offset from a given reference Returns the row number of a reference 135 ROWS RTD TRANSPOSE VLOOKUP Returns the number of rows in a reference Retrieves real time data from a program that supports COM automation Automation A way to work with an application s objects from another application or development tool Formerly called OLE Automation Automation is an industry standard and a feature of the Component Object Model COM Returns the transpose of an array Looks in the first column of an array and moves across the row to return the value of a cell Database functions DAVERAGE DCOUNT DCOUNTA DGET DMAX DMIN DPRODUCT DSTDEV DSTDEVP DSUM DVAR DVARP Returns the average of selected database entries Counts the cells that contain numbers in a database Counts nonblank cells in a database Extracts from a database a single record that matches the specified criteria Returns the maximum value from selected database entries Returns the minimum value from selected database entr
120. values for the Kendall tau correlation coefficient where 4 lt n lt 60 For a one tailed test significance values are a 0 01 0 05 0 025 0 01 0 005 0 001 and for a two tailed test significance values are a 0 2 0 1 0 05 0 02 0 01 0 005 25 e S ort and Rank The Sort and Rank form is selected by clicking the Sort and Rank menu item from the Tools button on the Statistician ribbon tab Statistician Sort and Rank Data Variable Select a Data Variable v Output v v lv v vY Sort Ascending Sort Decending Ranks Average of Tied Ranks wl Format Output Percentiles Output Results The Sort and Rank form allows to user to output the observations in the selected Data Variable in various forms By checking the corresponding check box the user can output 1 The Data Variable sorted in ascending order 2 The Data Variable sorted in descending order 3 The rank of each observation in the Data Variable 4 The tied rank of each observation in the Data Variable where all ranks with equal observations are assigned a tied rank which is the average of the rank of all equal observations 5 The percentile of each observation in the Data Variable The percentile of an observation is calculated by counting the number of observations with magnitude greater than the observation Number Above and counting the number of obse
121. variance Matrix Use All Observations COENA _ User Defined Pearson From Spearman to Kendall tau a Kendall tau b Output Correlogram Table Select All Variables DeSelect All Variables v Format Output Output Results Manage Data Close The Covariance and Correlation form creates a variance covariance matrix or correlation matrix of two or more selected Data Variables 17 The statistics that are reported are defined as follows Covariance The sample covariance between two Data Variables x and y is given as 1 n 1 Cyy KEE i 1 where x and y denote the mean of random variables x and y Pearson Product Moment Correlation Coefficient The correlation between two variables often referred to as the Pearson product moment correlation coefficient is a measure of association and is in the range 1 1 The sample correlation coefficient between two random variables is given as Coy SxSy GI where s and s are the sample standard deviations of random variables x and y respectively The test statistic to determine the significance of ra under the null hypothesis that the population correlation Pxy is equal to zero is given as p TyVn 2 1 r2 This test statistic has a student t distribution with n 2 degrees of freedom Spearman Rank Correlation Coefficient The Spearman rank correlation coefficient of a sample of observations from a popula
122. vations in treatment i The technique of ANOVA involves testing a null and alternative hypothesis of the form Ho My H2 Ha H at least two of the means are not equal The total number of observations in all treatments is given as a n n i 1 The sample mean of treatment i is given as ni 1 Xi Xij t nhl j 1 where x j is the jt observation in treatment i The sample mean of all treatments is given as The sample variance of observations within each treatment is given as a Cs xi 2 r n 1 S The total sum of squares SS Total treatment sum of squares SST and the error sum of squares SSE are given as 80 a ni SS Total gt gt Gu i 1 j 1 a SST GE nj X x i 1 a ni SSE ye 1 s Gs x i 1 i 1 Tel The three sums of squares are related by SS Total SST SSE The mean square for treatments MST and the mean square for error MSE are given as SST MST 1 SSE MSE To test the null hypothesis the test statistic is defined as MST msg This ratio is F distributed with a 1 degrees of freedom in the numerator and n a degrees of freedom in the denominator The user can test the hypothesis that any pairs of treatments have significantly different means Three tests are available the Fisher least significant difference LSD test the Tukey Kramer test and the Scheffe test Critical values for these te
123. ved number of success s for the Th outcome The expected number of successes under the null hypothesis is given as er np The test statistic is given as k 0 e E i 1 f The test statistic has a y distribution with k 1 degrees of freedom If the test statistic is greater than the critical value then the null hypothesis is rejected 75 Using Statistician Multinomial Experiment Statistician Multinomial Experiment Data Variable l v Significance Om mb O10 Other Format Output Output Results Manage Data Close The Non Parametric Tests Multinomial Experiment form is selected by clicking the Multinomial button in the Chi 2 Tests groupbox on the Statistician ribbon The user selects the Data Variable for analysis from the Data Variable combobox Each outcome of the experiment is an observation in the selected Data Variable Statistician outputs the null and alternative hypothesis number of outcomes expected outcome test statistic critical value significance p value decision rule and conclusion of the test 76 b Contingency Tables A contingency table is used to test if evidence exists to infer that two nominal variables are related The method involves classifying data according to two different criteria The null and alternative hypotheses to be tested are Hu The two variables are independent H The two variables are dependent The test is initially se
124. ven criteria Returns the average arithmetic mean of all cells that meet multiple criteria Returns the beta cumulative distribution function Returns the inverse of the cumulative distribution function for a specified beta distribution Returns the individual term binomial distribution probability Returns the one tailed probability of the chi squared distribution Returns the inverse of the one tailed probability of the chi squared distribution Returns the test for independence Returns the confidence interval for a population mean Returns the correlation coefficient between two data sets Counts how many numbers are in the list of arguments Counts how many values are in the list of arguments Counts the number of blank cells within a range Counts the number of cells within a range that meet the given criteria Counts the number of cells within a range that meet multiple criteria Returns covariance the average of the products of paired deviations Returns the smallest value for which the cumulative binomial distribution is less than or equal to a criterion value Returns the sum of squares of deviations Returns the exponential distribution 124 FDIST FINV FISHER FISHERINV FORECAST FREQUENCY FTEST GAMMADIST GAMMAINV GAMMALN GEOMEAN GROWTH HARMEAN HYPGEOMDIST INTERCEPT KURT LARGE LINEST LOGEST LOGINV LOGNORMDIST MAX MAXA MEDIAN MIN MINA MODE NEGBINOMDIST NORMDIST NORMINV Return
125. west numbers If there are an odd number of observations the middle observation is discarded and the two groups of highest and lowest number are then formed The median of the lowest set of numbers is then subtracted from the median of the highest set of numbers to give the inter quartile range 14 Standard Deviation Sample A measure of the spread of the population based upon a sample of observations It is defined as Standard Deviation Population A measure of the spread of the population based upon all population observations It is defined as Variance Sample A measure of the spread of the population based upon a sample of observations It is defined as 1 n EM x i 1 Variance Population A measure of the spread of the population based upon all population observations It is defined as ix ER oui o x X i 1 Sum of Squares The sum of the squares of all observations It is defined as n ss 3 i 1 Mean Square The average of the sum of squares It is defined as SS MS n Root Mean Square The square root of the average of the sum of squares It is defined as RMS VMS 15 Mean Absolute Deviation The average of absolute deviations from the mean It is defined as 1 n MAD SY Ix x n4 i 1 where x is the mean of x Skewness A measure of the magnitude of observations in the tails of the distribution of obse
126. xes are calculated by initially smoothing the data Data can be smoothed by using any method discussed previously Steps are as follows Step 1 Remove the effect of seasonal and random variations This is done by creating a new series with a lagged or centered moving average process on the original series where the length of the moving average interval is set to the number of seasons Alternately a linear quadratic exponential or autoregressive trend model can be estimated from the original series The fitted values from the estimated trend model is taken to be the new smoothed series The new smoothed series Smoothed has only trend and cyclical components Thus for the additive model Smoothed T C and for the multiplicative model Smoothed T X C Step 2 For the additive take the difference of the original series and the smoothed series yt Smoothed S R For the multiplicative take the ratio of the original series and Yt Smoothedt that contains only seasonal and random components Denote the new additive series Yt as At yz Smoothed S Re and the new multiplicative series as M SSES Se X Ry the moving average series S X Ri In both cases we are left with a new series Step 3 In each season calculate the average of A or M The result is the unadjusted seasonal index in each season This averaging removes most but not all of the random effects Step 4 The unadjusted seasonal indexes der
127. xtbox and can equal a 0 1 0 05 0 01 Wilcoxon Signed Rank Sum Critical values for the Wilcoxon signed rank sum test where 5 lt n lt 30 For a one tailed test significance values are o 0 05 0 025 0 01 0 005 and for a two tailed test significance values are a 0 1 0 05 0 02 0 01 Runs Lower Critical Values Lower critical values for the Runs test where 2 lt ny nz lt 20 and the significance value is 0 05 Runs Upper Critical Values Upper critical values for the Runs test where 2 lt ny nz lt 20 and the significance value is 0 05 24 15 16 17 18 Spearman Rho p Critical values for the Spearman rho correlation coefficient where 5 lt n lt 30 For a one tailed test significance values are a 0 05 0 025 0 01 0 005 and for a two tailed test significance values are a 0 1 0 05 0 02 0 01 Studentized Range Critical values for the Studentized range distribution for a given number of degrees of freedom and a given number of treatments k 2 20 The significance level of the table is selected from the Alpha textbox and can equal 0 1 0 05 0 01 Durbin Watson Upper and lower critical for the Durbin Watson test for a given number observations in the regression and for a given number of regressors The significance level of the table is selected from the Alpha textbox and can equal a 0 05 0 01 Kendall tau t Critical
128. y dependent variable models have the dependent variable assigned the value of zero 0 or one 1 Let y be the i binary observation on the dependent variable where 1 lt is lt n let Y be a n x 1 vector of the dependent variable observations let xj be the th observation on the jt independent variable where 1 lt j lt k let X be a nxk matrix of the independent variable observations let x be a 1 x k vector which is the th row of X let f be the ih coefficient of the i independent variable that is to be estimated let 6 be a k x 1 vector of independent variable coefficients and is the it residual or error or disturbance term For the Probit model the function F as specified as X B en Jare o p where is the standard normal distribution function and is the cumulative standard normal distribution function For the Logit model the function F as specified as T ei B F X B T exh The parameters of the Probit and Logit models are estimated by maximising a log likelihood function of the form rw 3 buln F 0 ist tel i 1 The parameter estimates which maximise the log likelihood function are held in a k x 1 vector denoted by b For the Logit model the Hessian matrix H is given as n H Al Ad i 1 eil a Ae where A LF For the Probit model the Hessian matrix H is given as n H ES A Ai xj B xix i l 97 where A wee and q
129. ying no assumptions about the true population difference in proportions then the required sample size is given by the lowest integer greater than 0 52672 w2 where w is half of the required confidence interval width Using Statistician Test Difference in Population Proportions The Data Variables to be tested are selected from the Null Hypothesis combobox in the Hypothesis Test groupbox The hypothesized difference in the proportion of successes in the Data Variables is entered in the Hypothesized Difference textbox in the Hypothesis Test groupbox The null hypothesis 2 lt and alternative hypothesis lt gt are selected in the corresponding comboboxes in the Hypothesis Test groupbox All Data Variables tested must be binary variables that is they have one of two distinct values referred to as success of failure These distinct values may be numeric or text The symbols for success and failure are generated by Statistician after analyzing the selected Data Variables Both Data Variables must use the same symbols for success and failure Statistician enters in the symbols for success and failure in the Success and Failure labels in the Symbol groupbox Clicking the Switch Symbols button switches the symbols for success and failure 49 When the Output Results button is clicked Statistician outputs the e null hypothesis e alternative hypothesis e sample size of each Data Variable Di and nz
130. ypothesis Test a 0 05 Hypothesized Null Hypothesis Difference Proportion of Select a Data Variable v 7 Proportion of Select a Data Variable wll ell Alternative Hypothesis Proportion of Proportion of vijn Confidence Interval Symbols Success Output a 95 Confidence Interval for the Difference in Proportions S Failure Output the Sample Size to Obtain a aithe Te Confidence Interval Estimate of the Ri cen i Significance Difference in Proportions to Within Proportions Om 5 O10 Significant Digits 3 v Other V Format Output Output Results Cse The Test Difference in Population Proportions form has two primary functions a To test the hypothesis that the difference in the proportion of successes in two selected Data Variables m m2 is equal to greater than or equal to or less than or equal to a hypothesized value pp b To produce a confidence interval for the difference in the proportion of successes of two selected Data Variables The following set of null and alternative hypotheses can be tested a Ho T1 T2 Pp H1 11 T2 Pp two tailed test or H1 T1 12 gt Pp OTF H1 T1 T2 lt Pp b Ho T1 T2 2 Pp Ay c Ho H T e Pp T T2 S Pp T1 T2 gt Pp 47 x A x A x Assume the proportion of successes in each variable is denoted by p on and pz 1 2 where x and x are the number of successes
131. ys periodic interest Returns the price per 100 face value of a discounted security Returns the price per 100 face value of a security that pays interest at maturity 132 PV RATE RECEIVED SLN SYD TBILLEQ TBILLPRICE TBILLYIELD VDB XIRR XNPV YIELD YIELDDISC YIELDMAT Returns the present value of an investment Returns the interest rate per period of an annuity Returns the amount received at maturity for a fully invested security Returns the straight line depreciation of an asset for one period Returns the sum of years digits depreciation of an asset for a specified period Returns the bond equivalent yield for a Treasury bill Returns the price per 100 face value for a Treasury bill Returns the yield for a Treasury bill Returns the depreciation of an asset for a specified or partial period by using a declining balance method Returns the internal rate of return for a schedule of cash flows that is not necessarily periodic Returns the net present value for a schedule of cash flows that is not necessarily periodic Returns the yield on a security that pays periodic interest Returns the annual yield for a discounted security for example a Treasury bill Returns the annual yield of a security that pays interest at maturity Date and time functions DATE DATEVALUE DAY DAYS360 EDATE EOMONTH HOUR MINUTE MONTH Returns the serial number of a particular date Converts a date in t
132. z lt Z and A lt A For a one tailed test with alternative hypothesis H L gt L the null hypothesis is rejected if z gt Z and A gt Ap 64 d Wilcoxon Rank Sum Test The Wilcoxon Rank Sum test tests if the location of a sample is significantly different from the location of another sample when both of the samples are independent Let n and nz be the sample size of the observations drawn from both populations and let n n nz Also assign the sample with the least number of observations as the first sample so that n lt n2 The data from both samples are pooled and then ranked from 1 to n Observations with a tied rank are assigned a rank which is the average of the corresponding raw ranks Define T as the sum of the ranks from the first sample and define T as the sum of the ranks from the second sample The Wilcoxon Rank Sum test is conducted differently for small and large samples Let a small sample be defined as the case where n lt 10 and n lt 10 In the small sample case the test statistic is T For a two tailed test if T if is greater than an upper critical value Ty or less than a lower critical value T then the null hypothesis is rejected for a given significance level ol For a one tailed test where the alternative hypothesis is H T gt T2 the null hypothesis is rejected if T gt Ty For a one tailed test where the alternative hypothesis is H T lt T2 the null hypothe

User Manual for - Statistician

Contents

Download Pdf Manuals

Related Search

Related Contents