A Programming Language for Building Likelihood Models Version 2.1

1. The DATA statement is used in conjunction with the DATA function. Within a MODEL statement, you can use the DATA function to evaluate the likelihood one observation at a time. Do not be confused by the fact that there is both a DATA statement and a DATA function; they complement each other. Simply remember that a DATA statement is used as a statement, and there is typically one such statement per program. The DATA function can only be used as part of an expression, typically only within the likelihood expression of a MODEL statement. Model Statement. The MODEL ... RUN ... END statement defines the underlying probability model used by mle, defines the parameters to be found for the model, and defines constraints under which parameters are to be estimated. Only an overview of the MODEL statement is given here. An entire chapter is devoted to the MODEL statement, including some details for specifying likelihoods. The basic structure of the MODEL statement looks like this: MODEL <expression> RUN <run specifications> END. Between MODEL and RUN is a single expression that is the likelihood. Within the likelihood are one or more PARAM ... END functions. These define the parameters whose values will be found so that the likelihood is maximized. One of the most important aspects of learning mle is the design and construction of the expression for the likelihood.
2. A list of <run specifications> is given between the RUN and the END part of the MODEL statement; this provides a way of evaluating the full model as well as a series of nested or reduced models. If all of the parameters defined by PARAM ... END functions are to be found, a simple FULL command is placed between the RUN and its matching END. Reduced models, where one or more parameters are constrained to a constant or to another parameter, are specified as REDUCE followed by a list of one or more reductions. For example, you might constrain a parameter called mean to be zero and only allow the parameter called stdev to be found. Then you would put REDUCE mean 0 between the RUN and the END. Any number of REDUCE commands, along with one FULL, can be used in a single model. The various forms of the model will be evaluated in turn. Intrinsic Procedures. Intrinsic procedures are predefined single-word statements that perform a specific task on a list of zero or more arguments. When called, a procedure executes a series of actions using the arguments. Procedures do not return a value the way a function does. For example, the statement DATAFILE hammes.dat, found in the earlier example, defines and opens the file used by the DATA statement. A list of all procedures, with examples, can be found in a later chapter. Here are some example procedure statements (SEED 9734 and HALT): Seeds the
3. • Multidimensional arrays are supported for all types. Subscripted values are accessed as, for example, z i j k. Arrays are declared as: a REAL 1 TO 5 1 TO 1 0 (declare and initialize matrix a); b INTEGER 4 TO 4 0 TO 1 (declare, but no assignment).
   • A new DERIVATIVE function numerically finds the value of a derivative at a specified point along some function. For example, DERIVATIVE x 2 3 x 2 2 x 4 END, which is the derivative of $3x^2 + 2x + 4$ evaluated at $x = 2$, returns 14.0.
   • The new FINDMIN function finds the value that minimizes a bounded function. An example is FINDMIN x 0 2 PI COS x END, which finds a minimum of the function cosine(x) between 0 and $2\pi$. It returns 3.1415925395570 ($\pi$ is an exact solution). The accuracy of the solution may be specified as a third argument within the parentheses.
   • The new FINDZERO function finds the value of an argument for which the function goes to zero. An example is FINDZERO x 0 PI COS x END, which finds a value of x for which cosine(x) is zero. It returns 1.5707963267949, which is close to the exact solution of $\pi/2$. The accuracy of the solution may be specified.
   • An important syntactical change is that every PARAM function must have a matching END.
   • The default FORM for the PARAM function is NUMBER if no covariates are specified, and LOGLIN if one or more covariates are specified.
   • The covar s
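The three numerical functions described in this snippet can be illustrated outside mle. The following Python sketch is not mle's implementation; the algorithms (central difference, golden-section search, bisection) and all function names are assumptions chosen for illustration. It reproduces the manual's three examples: the derivative of 3x^2 + 2x + 4 at x = 2, a minimum of cos(x) on [0, 2π], and a zero of cos(x) on [0, π].

```python
import math


def derivative(f, x, h=1e-5):
    """Central-difference estimate of f'(x) (assumed method, not mle's)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def find_min(f, lo, hi, tol=1e-8):
    """Golden-section search for a minimum of f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while (b - a) > tol:
        if f(c) < f(d):
            b = d          # minimum lies in [a, d]
        else:
            a = c          # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0


def find_zero(f, lo, hi, tol=1e-12):
    """Bisection; f(lo) and f(hi) must have opposite signs."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must bracket a zero")
    while (hi - lo) > tol:
        mid = (lo + hi) / 2.0
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid       # sign change is in [lo, mid]
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2.0


slope = derivative(lambda x: 3 * x**2 + 2 * x + 4, 2.0)
x_min = find_min(math.cos, 0.0, 2.0 * math.pi)
x_zero = find_zero(math.cos, 0.0, math.pi)
print(slope, x_min, x_zero)
```

Because cos(x) is very flat near its minimum, a search that only compares function values cannot locate the minimizer to full machine precision, which may be why the value quoted in the manual agrees with π to only about seven digits.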
4. Some terminal types allow device-specific options to be included after the name of the terminal. For example, set terminal dumb 80 60 would set the size of the previous plot to 80 characters across by 60 characters high. Information on specific device options is available in the Gnuplot manual. Here is a synopsis of some commonly used terminal devices:
   • set terminal dumb <xsize> <ysize>: for dumb terminals and printers (see the previous example).
   • set terminal epson: for printing bit-mapped graphics to an Epson printer.
   • set terminal gpic: for generating TeX output for use with the gpic/groff package from the Free Software Foundation.
   • set terminal hpljii <resolution>: for printing to a Hewlett-Packard LaserJet II printer. The <resolution> is 75, 100, 150, or 300.
   • set terminal hpdj <resolution>: for printing to a Hewlett-Packard DeskJet printer. The <resolution> is 75, 100, 150, or 300.
   • set terminal latex <font> <size>: for generating TeX output for use with LaTeX and emTeX.
   • set terminal pcl5 <mode> <font> <size>: for printing to a Hewlett-Packard HP-GL/2 printer or plotter.
   • set terminal postscript: for printing to a PostScript printer or device. There are a number of mode, color, and font options for this device.
   • set terminal table: for printing a table of values as an ASCII text file instead of a graph.
   • set terminal windows
5. 6 a9 de ak 6 vil 64 60 EY 61 54 77 81 93 93 51 76 96 77 93 95 54 99
   This example shows how linear regression is treated within the framework of likelihood models. The linear regression model with n covariates specifies that the value of the ith observation is a combination of a y-intercept term, additive covariate-parameter terms ($x_{i1}\beta_1, x_{i2}\beta_2, \ldots, x_{in}\beta_n$), plus an error $e_i$. Furthermore, the error terms are normally distributed with a mean of zero and a standard deviation of $\sigma$. The formal specification is
   $$y_i = \beta_0 + x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{in}\beta_n + e_i, \qquad e_i \sim N(0, \sigma).$$
   Under the likelihood model, the equivalent specification can be given in a very different format:
   $$y_i \sim N(\mu_i, \sigma), \qquad \mu_i = \beta_0 + x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{in}\beta_n.$$
   The difference in the two specifications exemplifies the two different philosophies in the methods. Under regression, the difference between each observation and the line defined by parameters and covariates is treated as error. Under the likelihood model, the observations are normally distributed with a mean that is determined by a series of covariates. The data for this example are fictitious. The third column contains the values of y column is x and xn. The following shows the output from a regression analysis:
   VARIABLE          MEAN          STD DEVIATION   COEF VARIAT
   Indept Variable   76.17647059   16.63293154     0.21834736
   Depent Variable   11.07058824    9.74453467     0.88021833
   De
6. FORM SUMLL: This takes the log of each individual likelihood and sums the loglikelihoods over the data. A likelihood, rather than a loglikelihood, is specified for <expression>. This is the default value if no <formtype> is specified.
   FORM SUM or FORM SUMMATION: Sums loglikelihoods over the data without first taking the log. This is used when a loglikelihood, rather than a likelihood, is specified for <expression>.
   FORM PROD or FORM PRODUCT: Takes the product of likelihoods over the data and does not take the log of the likelihood. This is used when a likelihood, rather than a loglikelihood, is specified for <expression> and some function that takes the log appears outside the DATA function.
   Here are three models that yield the same overall likelihood function but use different forms for the DATA function: MODEL DATA FORM SUMLL PDF NORMAL topen tclose PARAM mu LOW 10 HIGH PARAM sigma LOW 0.0001 HIGH END pdf END data RUN FULL END model MODEL DATA FORM SUM LN PDF NORMAL topen tclose PARAM mu LOW 10 HIGH PARAM sigma LOW 0.0001 HIGH END pdf the default form 100 START 10 START 100 START 10 START END data RUN FULL END model MODEL LN DATA FORM PRODUCT PDF NORMAL topen tclose PARAM mu LOW 10 HIGH PARAM sigma LOW 0.0001 HIGH END pdf END data RUN FULL END model 100 START 10 START In theory these
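The equivalence of the three forms can be checked numerically. The Python sketch below is only an illustration under stated assumptions: data_function is a hypothetical stand-in for the DATA function, the interval data and parameter values are invented, and the per-observation likelihood is the area under a normal distribution between topen and tclose.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def interval_likelihood(row, mu=32.0, sigma=8.0):
    """Area under the normal density between topen and tclose."""
    t_open, t_close = row
    return normal_cdf(t_close, mu, sigma) - normal_cdf(t_open, mu, sigma)


def data_function(expression, rows, form="SUMLL"):
    """Hypothetical stand-in for the DATA function and its FORM option."""
    values = [expression(row) for row in rows]
    if form == "SUMLL":      # log each likelihood, then sum
        return sum(math.log(v) for v in values)
    if form == "SUM":        # expression is already a loglikelihood
        return sum(values)
    if form == "PRODUCT":    # raw product of likelihoods
        return math.prod(values)
    raise ValueError("unknown form: " + form)


rows = [(20.0, 25.0), (30.0, 35.0), (40.0, 45.0)]  # invented intervals

ll_sumll = data_function(interval_likelihood, rows, "SUMLL")
ll_sum = data_function(lambda r: math.log(interval_likelihood(r)), rows, "SUM")
ll_product = math.log(data_function(interval_likelihood, rows, "PRODUCT"))
print(ll_sumll, ll_sum, ll_product)
```

With only three observations the results agree to rounding error. With many observations a raw product of probabilities can underflow to zero in floating point, which is a general numerical reason to prefer summing logs.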
7. MLE TITLE Example DATAFILE ex7.dat OUTFILE ex7.out DATA talpha FIELD 1 Left truncation time topen FIELD 2 time last known alive tclose FIELD 2 time first known dead or ∞ if censored END tomega 5 0 MODEL DATA PDF GOMPERTZ topen tclose talpha tomega PARAM al LOW 0.00001 HIGH 0.5 START 0.01 END PARAM bl LOW 0.01 HIGH 2 START 0.1 END END of the PDF END RUN FULL END of the MODEL END
   Survival analysis: Accelerated failure time model. Frequently one is interested in modeling the effects of covariates on the time to failure. A common model of this type is called the accelerated failure time (AFT) model, in which covariates shift the time to failure to the right or the left. mle has a general mechanism for modeling the effects of covariates on any parameter that is defined, so accelerated failure time models can be easily constructed. In this example, the mean of a normal distribution has two covariates that shift the failure time.
   MLE TITLE Example DATAFILE ex8.dat OUTFILE ex8.out DATA topen Last observation time prior to the event tclose First observation time after the event weight the first covariate age the second covariate END MODEL DATA PDF NORMAL topen tclose PARAM mu LOW 0.00001 HIGH 100 START 25 FORM LOGLIN COVAR weight PARAM b_weight LOW 20 HIGH 20 START 0 END COVAR age PARAM b_age LOW 20 HIGH 20 START 0 END END param mu PARAM s LOW 0 0
8. • Copy the file to a subdirectory, say mle.
   • Uncompress the archive with the command uncompress mle 2 1 11 linux i386 tar Z
   • Extract everything from the archive with the command tar xvf mle 2 1 11 linux i386 tar
   • Make sure you have permission to execute the program. Type chmod u+x mle
   • The directory now contains the executable mle, example programs, etc. At this point you can run programs from within the directory. You can add the directory to your PATH so that you can execute the program from anywhere. Alternatively, you can move the executable program to a directory in your path. For example: mv mle bin
   Windows. Find the current release of the mle setup and installation program. The current release might be called mle_2_1_15 setup exe. Note that there are versions with and without the mle documentation. The versions should be apparent from the file names. Here are the steps for installation:
   • The easiest way to install mle is to open the setup program via a web browser. Windows will, in effect, execute the setup program and install the package. Alternatively, you can download the setup program to any directory and then run the program from a DOS window or using the Start > Run command.
   • The setup program will walk you through a number of steps for installation. If you are not an administrator or power user on the computer, you will want to change the location where the program is installed fr
9. Report with no standard error or confidence intervals. At times it is desirable to print parameter values without standard errors or confidence intervals. This can be done by including the assignment PRINT_PARAMS = TRUE. This will print out an additional report with parameter estimates. Additionally, PRINT_SE and PRINT_CI can be set to FALSE so that neither confidence intervals nor the variance-covariance matrix are computed. When the variable PRINT_SHORT = TRUE, the report format is modified so that all parameter estimates are printed on one line.
   Printing Distributions. The values of the survival function, the probability density function, and the hazard function can be tabulated for each PDF function in the likelihood. To do so, set PRINT_DISTS = TRUE. All distributions that are in the model will be tabulated. The tabulation starts at the value DIST_T_START, ends at the value DIST_T_END, and is tabulated for DIST_T_N equally spaced points. The mean values of data variables (e.g., covariates) are used when computing the distributions. For example, to print the SDF, PDF, and hazard function at 101 points from 0 to 100, use the following code:
   PRINT_DISTS = TRUE     {print out distributions}
   DIST_T_START = 0       {lowest value to print}
   DIST_T_END = 100       {highest value to print}
   DIST_T_N = 101         {number of points to print}
   Other Printing Options. The MIN_SIGNIFICANT variable controls the minimum number of significant digits in
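To make the tabulation concrete, here is a small Python sketch of the same idea: evaluate the survival function, density, and hazard at N equally spaced points between a start and an end value. The function name tabulate and the normal distribution with mean 50 and standard deviation 10 are assumptions for illustration, not output from mle.

```python
from statistics import NormalDist


def tabulate(dist, t_start, t_end, n_points):
    """Return (t, SDF, PDF, hazard) rows at n_points equally spaced times."""
    step = (t_end - t_start) / (n_points - 1)
    rows = []
    for i in range(n_points):
        t = t_start + i * step
        sdf = 1.0 - dist.cdf(t)                 # survival function S(t)
        pdf = dist.pdf(t)                       # density f(t)
        hazard = pdf / sdf if sdf > 0 else float("inf")  # h(t) = f(t) / S(t)
        rows.append((t, sdf, pdf, hazard))
    return rows


# 101 points from 0 to 100, as in the example in the text
table = tabulate(NormalDist(mu=50.0, sigma=10.0), 0.0, 100.0, 101)
for t, sdf, pdf, hazard in table[::25]:
    print(f"{t:6.1f} {sdf:12.8f} {pdf:12.8f} {hazard:12.8f}")
```

Note that 101 points from 0 to 100 gives a spacing of exactly 1, because both end points are included.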
10. Here is an example of a simple model for finding the two parameters of a normal distribution from a series of interval-censored observations. Suppose there are N interval-censored observations. The interval in which each event occurs falls between the times $t_{\mathrm{open}}$ and $t_{\mathrm{close}}$. The goal is to estimate the parameters $\mu$ and $\sigma$ of the normal distribution (we will use mu and sigma as parameter names). The likelihood needed for this problem looks like this:
   $$L = \prod_{i=1}^{N} \left[ S(t_{\mathrm{open}_i} \mid \mu, \sigma) - S(t_{\mathrm{close}_i} \mid \mu, \sigma) \right]$$
   where $S(t)$ is the survival function for a normal distribution. The mle program for this likelihood looks like this: MODEL DATA PDF NORMAL topen tclose PARAM mu LOW 5 HIGH PARAM sigma LOW 0 1 HIGH END pdf END data RUN FULL 0 END
   Everything from the DATA function on line 2 to the END on line 7 is a single expression that defines the likelihood. The DATA function corresponds to the product in the likelihood. It loops through all data and evaluates the expression nested within it for each observation. The expression PDF NORMAL topen tclose END defines the area under a normal distribution in the interval topen to tclose. Finally, the PARAM functions tell mle that mu and sigma are the parameters in the model that are to be changed in pursuit of maximizing the likelihood. Values for the parameters mu and sigma will be tried until those that maximize this likelihood are found. The word FULL between
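The likelihood for interval-censored normal data can be sketched in a few lines of Python. This is an illustration only: the five intervals are invented, and the coarse grid search is a stand-in for the optimizer, not a description of how mle maximizes the likelihood. The loop plays the role of the DATA function, and the grid plays the role of the two parameters being varied.

```python
import math
from statistics import NormalDist


def log_likelihood(intervals, mu, sigma):
    """Sum over observations of log[S(t_open) - S(t_close)]."""
    dist = NormalDist(mu, sigma)
    total = 0.0
    for t_open, t_close in intervals:
        # S(t_open) - S(t_close) is the same as cdf(t_close) - cdf(t_open)
        area = dist.cdf(t_close) - dist.cdf(t_open)
        if area <= 0.0:
            return float("-inf")   # parameters far from the data underflow
        total += math.log(area)
    return total


# Invented interval-censored observations (t_open, t_close)
intervals = [(8, 10), (10, 12), (10, 12), (12, 14), (14, 16)]

# Try candidate parameter values and keep the pair with the largest loglikelihood
grid = [(mu, sigma) for mu in range(5, 21) for sigma in (1, 2, 3, 4, 5)]
best = max(grid, key=lambda p: log_likelihood(intervals, p[0], p[1]))
print(best, log_likelihood(intervals, best[0], best[1]))
```

A real optimizer searches continuously rather than over a grid, but the structure is the same: one expression evaluated per observation, combined over the data, and maximized over the parameters.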
11. In survival analysis, for example, the likelihood for an exact failure time is given by the value of the PDF at the exact point of failure. For a right-censored observation, the likelihood is given by summing up (integrating) all possible PDF values from the last observation time until the maximum possible time. The likelihood for a cross-sectional responder is the integral from zero to the time of first observation. Table 5 lists the likelihoods that result from the four time variables for different conditions. For example, when $t_u = t_e$, or when only one time variable is specified, mle returns the density at t. This is the desired likelihood for an exact failure. Likelihoods for right- and interval-censored observations, with and without left and right truncation, are given in Table 5.
   The Hazard Parameter. For most parametric distributions, like the normal or lognormal distributions, the hazard function does not take on a simple or closed form. For this reason, most studies have modeled the covariates as acting on the failure time for these distributions. Nevertheless, there is no inherent reason why hazards models cannot be constructed using distributions without a closed form for the hazard function. Most of the PDFs included in mle provide a general mechanism for covariates to be modeled as affecting the hazard of failure rather than, or in addition to, affecting intrinsic parameters. Her
12. fout FILE SEED CLOCKSEED OPENWRITE fout kids.dat FOR cid 1 TO nkids DO age QUANTILE NORMAL RAND mu sig END WRITELN FOUT cid age END CLOSE fout END
   A less simple simulation program. Rather than just simulating a data set, this program creates multiple data sets and also analyzes each data set. This simulation program deals with aspects of study design (study length, censoring, and duration between prospective follow-ups) as well as the underlying parametric model. The last segment of the program computes some summary statistics for the repeated estimates of the model parameters.
   MLE This program does 4 things: 1. It creates data sets, each with a single variable (age) and observations of age. The observations are drawn from a normal distribution. 2. It fits a model (Normal) to each data set. 3. It simulates aspects of the study observation: children are initially recruited from ages minrage to maxrage months of age (uniformly distributed); children are visited every obswidth months for studylength months; censorprob of children drop out between mincensor and maxcensor months. 4. It computes the mean and standard deviation of the repeated parameter estimates. OUTFILE DEFAULTOUTNAME seed the random number generator s CLOCKSEED SEED s PRINTLN Clock seeded with s SEs must be computed with the alternative method because we are not using a DAT
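The first program in this snippet draws each age by passing a uniform random number through the quantile function of a normal distribution, which is inverse transform sampling. A minimal Python sketch of that technique follows. The function name simulate_ages, the seed, and the values mu = 30 and sig = 5 are assumptions for illustration, and the sketch returns the rows rather than writing them to a file.

```python
import random
from statistics import NormalDist, fmean, stdev


def simulate_ages(n_kids, mu, sig, seed=9734):
    """Draw n_kids ages by inverse transform sampling from Normal(mu, sig)."""
    rng = random.Random(seed)          # seeded for reproducibility
    dist = NormalDist(mu, sig)
    rows = []
    for cid in range(1, n_kids + 1):
        u = rng.random()               # uniform on [0, 1)
        while u == 0.0:                # inv_cdf needs 0 < u < 1
            u = rng.random()
        age = dist.inv_cdf(u)          # quantile of the normal at probability u
        rows.append((cid, age))
    return rows


rows = simulate_ages(2000, mu=30.0, sig=5.0)
ages = [age for _, age in rows]
print(len(rows), fmean(ages), stdev(ages))
```

The sample mean and standard deviation should land near the values used to generate the data, which is the basic check that a simulation program like this one is working.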
13. including:
   IF <bexpr> THEN <statements> ELSEIF ... ELSE <statements> END
   FOR <v> <expr> TO <expr> DO <statements> END
   BEGIN <statements> END
   WHILE <bexpr> DO <statements> END
   REPEAT <statements> UNTIL <bexpr>
   BREAK exits the current WHILE, REPEAT, or FOR loop, or BEGIN ... END block.
   CONTINUE skips to the next iteration of a WHILE, REPEAT, or FOR loop.
   • A new QUANTILE function returns the value that gives the qth quantile of any of the predefined pdfs. For example, the median (where q = 0.5) can be found for the RANDOMWALK pdf with arguments 2 and 3 as QUANTILE RANDOMWALK 0.5 2 3 END. It returns 7.4595847118228. The function uses algebraic solutions for many pdfs. When no closed-form solution is known, an iterative solution is found.
   • Fundamental physical constants have been updated to the most recent recommended values provided in Mohr and Taylor (1999).
   • Strings can be delimited by either " or ', except that a one-character sequence using ' is a character constant.
   Converting Version 1 Programs to Version 2. Programs written in earlier versions of mle can be converted into later versions without much difficulty. The most important things to change are given below.
   • Change all INFILE mydata.dat statements to DATAFILE mydata.dat procedure calls.
   • Change all OUTFILE results.out statements to OUTFIL
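When a distribution has no closed-form quantile, one common iterative approach is to bisect on the cumulative distribution function. The Python sketch below illustrates that approach under stated assumptions: it is not necessarily the method mle uses, the function name quantile is hypothetical, and an exponential distribution is chosen only because its closed-form median, ln(2)/rate, gives something to check against.

```python
import math


def quantile(cdf, q, lo, hi, tol=1e-10):
    """Find t in [lo, hi] with cdf(t) = q by bisection (cdf must be increasing)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    while (hi - lo) > tol:
        mid = (lo + hi) / 2.0
        if cdf(mid) < q:
            lo = mid       # quantile lies above mid
        else:
            hi = mid       # quantile lies at or below mid
    return (lo + hi) / 2.0


rate = 0.5  # illustrative rate parameter


def exponential_cdf(t):
    """CDF of an exponential distribution with the rate above."""
    return 1.0 - math.exp(-rate * t)


median_iterative = quantile(exponential_cdf, 0.5, 0.0, 100.0)
median_closed_form = math.log(2.0) / rate
print(median_iterative, median_closed_form)
```

Bisection only needs the CDF to be increasing and the search interval to contain the answer, so the same function works for any distribution whose CDF can be evaluated.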
14. What is a Statement? Every program begins with the word MLE and ends with the matching word END. Any text after the final END is ignored. Between the MLE and its matching END comes the body of an mle program as a series of statements. The most basic outline of an mle program looks like this: MLE <Statement 1> <Statement 2> <Statement 3> ... END. A statement is a single complete instruction. When a program is run, each statement is executed in turn. Here are some things statements do:
   • Print messages to the screen (WRITELN statement)
   • Create data sets (DATA statement)
   • Find maximum likelihood estimates (MODEL statement)
   • Define variables (assignment statement)
   • Assign or change the value of a variable (assignment statement)
   • Define a data file (a call to the DATAFILE procedure)
   • Loop through a series of statements (FOR, WHILE, or REPEAT statements)
   • Conditionally execute one series of statements instead of another (IF statement)
   • Create user-defined procedures or functions (PROCEDURE or FUNCTION statements)
   • Call a user-defined procedure (procedure call)
   Each type of statement is briefly discussed below.
   Assignment Statement. The assignment statement serves two purposes. The primary purpose is to assign values to variables. A secondary purpose is to define new variables. A great number of pre-defined variables are available to chan
15. ot left right CLOSE fout END for s END mle
   References
   Abramowitz M and Stegun IA, eds. (1972) Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. 9th printing. New York: Dover.
   Agresti A (1990) Categorical Data Analysis. New York: John Wiley and Sons.
   Ahuja JC and Nash SW (1967) The generalized Gompertz-Verhulst family of distributions. Sankhya, Series A 29:141-56.
   Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory, ed. BN Petrov and F Csaki, pp. 268-281. Budapest: Hungarian Academy of Sciences. Reprinted in Akaike (1992) and Akaike (1998).
   Akaike H (1992) Information theory and an extension of the maximum likelihood principle. In Breakthroughs in Statistics, Volume II, ed. S Kotz and N Johnson, pp. 610-624. New York: Springer-Verlag.
   Akaike H (1998) Selected Papers of Hirotugu Akaike. New York: Springer-Verlag.
   Bernoulli J (1713) Ars Conjectandi. Basel.
   Birnbaum ZW and Saunders SC (1969) A new family of life distributions. Journal of Applied Probability 6:319-27.
   Borel E (1925) Principes et formules classiques du Calcul des Probabilités. Paris.
   Borwein P (1995) An efficient algorithm for the Riemann zeta function. Working paper. http://www.cecm.sfu.ca/~pborwein/PAPERS/P155.pdf http://citeseer.nj
16. (1947) The distribution of the range. Annals of Mathematical Statistics 18:384-412.
   Guttorp P (1995) Stochastic Modeling of Scientific Data. London: Chapman and Hall.
   Hammes LM, Treloar AE (1970) Gestational interval from vital records. American Journal of Public Health 60:1496-505.
   Harris JW, Stocker H (1998) Handbook of Mathematics and Computational Science. New York: Springer-Verlag.
   Hazelrig JB, Turner ME, Blackstone EH (1982) Parametric survival analysis combining longitudinal and cross-sectional censored and interval-censored data with concomitant information. Biometrics 38:1-15.
   Hilborn R and Mangel M (1997) The Ecological Detective: Confronting Models with Data. Monographs in Population Biology 28. Princeton, N.J.: Princeton University Press.
   Holman DJ (1996) Total Fecundability and Fetal Loss in Rural Bangladesh. Doctoral Dissertation, The Pennsylvania State University.
   Holman DJ and Jones RE (1998) Longitudinal analysis of deciduous tooth emergence II: Parametric survival analysis in Bangladeshi, Guatemalan, Japanese and Javanese children. American Journal of Physical Anthropology 105(2):209-30.
   Jorgensen B (1982) Statistical Properties of the Generalized Inverse Gaussian Distribution. Lecture Notes in Statistics No. 9. New York: Springer-Verlag.
   Johnson NL, Kotz S (1969) Discrete Distributions. New York: John Wiley and Sons.
   Johnson NL, Kotz S, Balakrishnan N (1994) Continuous Univariate Distributions, Volume 1
17. CURVE KEY Observed WITH impulses d_idx 1 TO 11 numb_plants numb_quadrants END END plot
   The resulting parameter output is given in annotated form below. The difference in AIC between the two models suggests that the Thomas distribution fits the data much better than the Poisson. The plots in Figure 6 show how much better the Thomas distribution fits compared to the Poisson.
   11 lines read from file armeria.dat
   11 Observations kept and 0 observations dropped
   NAME    numb_plant   numb_quadr   FREQUENCY
   MEAN    5.00000000   9.09090909   9.09090909
   VAR     11.0000000   264.690909   264.690909
   STDEV   3.31662479   16.2693242   16.2693242
   MIN     0.00000000   0.00000000   0.00000000
   MAX     10.0000000   57.0000000   57.0000000
   Model 1 Run 1: Thomas distribution
   LogLikelihood = -158.0639  AIC = 320.12784  Del(LL) = 0.0000016017
   Iterations = 6  Function evaluations = 158  Converged normally
   Results with estimated standard errors (6 evals). Solution with 2 free parameters
   Name   Form   Estimate         Std Error        t                against
   aa            0.581452489433   0.088149263241   6.59622631041    0.0
   bb            1.717416986359   0.258883747023   6.63393127652    0.0
   Model 2 Run 1: Poisson distribution
   LogLikelihood = -225.3173  AIC = 452.63465  Del(LL) = 0.0000000000
   Iterations = 2  Function evaluations = 26  Converged normally
   Results with estimated standard errors (3 evals). Solution with 1 free parameter
   Name   Form   Estimate         Std Error        t                against
   m             1.579996571411   0.069292424625   22.8018658598    0.0
   Figur
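The AIC values in this output can be reproduced from the reported loglikelihoods and the number of free parameters using the standard definition AIC = -2 LL + 2k. The short Python check below assumes the loglikelihoods are negative (the minus signs do not survive in the extracted listing, but the reported AIC values are only consistent with negative loglikelihoods).

```python
def aic(log_likelihood, n_free_params):
    """Akaike information criterion: -2 * loglikelihood + 2 * parameters."""
    return -2.0 * log_likelihood + 2.0 * n_free_params


# Loglikelihoods and parameter counts as reported in the output
aic_thomas = aic(-158.0639, 2)
aic_poisson = aic(-225.3173, 1)

# A positive difference means the Thomas model has the lower (better) AIC
delta_aic = aic_poisson - aic_thomas
print(aic_thomas, aic_poisson, delta_aic)
```

The Thomas distribution uses one more parameter than the Poisson, yet its AIC is lower by more than 130, which is the basis for the statement that it fits these data much better.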
18. Distribution of gestational age
   Parameter file: hammes.mle
   Input data file name: hammes.dat
   Output file name: hammes.out
   3 variables read. 18 lines read from file hammes.dat
   18 Observations kept and 0 observations dropped for each variable
   ROW     topen        tclose       frequency
   MEAN    258.722222   253.555556   144.111111
   VAR     5338.56536   6032.37908   51267.3987
   STDEV   73.0654868   77.6683918   226.423052
   MIN     0.00000000   1.0000000    0.00000000
   MAX     329.000000   329.000000   724.000000
   Model 1 Run 1: Distribution of gestational age
   METHOD = DIRECT
   Maximum Iterations (MAXITER) = 50
   Maximum function evaluations (MAXEVALS) = 100000
   Convergence at EPSILON = 0.0000001000
   LogLikelihood = -6459.238  AIC = 12922.476  Del(LL) = 0.0000000000
   Iterations = 3  Function evaluations = 146  Converged normally
   PDF NORMAL with 2 free parameters
   Name    Form   Estimate         Std Error        t               against
   mean           279.1204377949   0.370066272387   754.244465444   0.0
   stdev          23.02007362180   0.365987430388   62.8985361530   0.0
   Variance-covariance matrix
   0.13694904596   0.0586570132
   0.0586570132    0.13394679920
   Likelihood CI Results: Log Likelihood = -5915.1352 after 4 iterations. Delta(LL) = 0.00000000
   PDF NORMAL with 2 free parameters
   Name    Form   Estimate         Lower CI         Upper CI
   mean           279.7654969512   279.1863052702   280.3447034638
   stdev          13.04605798312   12.64289497881   13.47052893809
   Figure 3. Output generated by the program in Figure 1.
   The mle program is run by typing the line mle hammes.mle at the command line prompt (see Chapter
19. END of the MLE program
   The abridged output is:
   18 lines read from file ex2.dat
   18 Observations kept and 0 observations dropped
   New model: 32 kV Insulating Fluid Example
   LogLike = -81.66833  Iterations = 2  Func evals = 28  Del(LL) = 0.0000000000  Converged normally
   Results with estimated standard errors (8 evals). Solution with 1 free parameter
   Name     Form     Estimate         Std Error        t               against
   lambda   LOGLIN   0.011742333138   0.002142967492   5.47947329296   0.0
   Survival analysis: Interval-censored observations. Interval-censored observations are those collected between two points of time. These observations frequently arise from prospective studies in which periodic observations are collected. The exact times to the event are not known. What is known is $t_u$, the last time before the event occurred, and $t_e$, the time of the first observation after the event occurred. The likelihood for interval-censored events is the area under the pdf between $t_u$ and $t_e$:
   $$L = \prod_{i=1}^{N} \int_{t_{u_i}}^{t_{e_i}} f(t \mid \theta)\,dt = \prod_{i=1}^{N} \left[ S(t_{u_i} \mid \theta) - S(t_{e_i} \mid \theta) \right]$$
   In mle, the area under the pdf (that is, the integral over the interval $t_u$ to $t_e$) is specified for most distributions as the first two times, with the second time greater than the first. For example, PDF NORMAL 11 15 10 6 END returns 0.231, which is the area between 11 and 15 under a normal distribution with $\mu = 10$ and $\sigma = 6$. Here is an mle program that finds parameters of a lognormal distribution from int
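The value 0.231 quoted for the normal example can be checked independently. The Python sketch below computes the area between 11 and 15 under a normal distribution with mean 10 and standard deviation 6 in two ways: as a difference of survival functions, and by numerically integrating the density with the trapezoid rule. The function name trapezoid_area and the step count are illustrative choices.

```python
from statistics import NormalDist

dist = NormalDist(mu=10.0, sigma=6.0)
t_u, t_e = 11.0, 15.0


def survival(t):
    """Survival function S(t) = 1 - CDF(t)."""
    return 1.0 - dist.cdf(t)


def trapezoid_area(f, lo, hi, steps=1000):
    """Trapezoid-rule integral of f from lo to hi."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h


area_from_survival = survival(t_u) - survival(t_e)
area_from_integral = trapezoid_area(dist.pdf, t_u, t_e)
print(round(area_from_survival, 3), round(area_from_integral, 3))
```

Both routes give the same area, which is the equality between the integral form and the survival-function form of the interval-censored likelihood.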
20. If you are learning a programming language for the first time, mle is a good beginner's language. If not, here are some reasons to learn and use mle:
   • It will make it easier to develop and estimate statistical models in mle. This is perhaps the biggest reason to learn mle instead of another language. Learning general-purpose computer programming in mle will simultaneously provide tools for scientific computing, model development, and statistical estimation.
   • It is free for non-commercial use.
   • It is a simple language. It is almost as simple as early versions of Basic, but with some nice programming features like those found in Pascal. So many newer languages are so badly bogged down with widget libraries, object-oriented constructs, and other complexities that it is difficult to do simple data manipulation or calculation.
   • It recognizes many different number formats. This can be helpful when you need to read in, say, Roman numerals, time formats, dates, etc.
   • It comes with many useful numerical and mathematical functions.
   • It comes with many useful statistical functions and predefined probability density functions.
   • It can work with complex numbers.
   • It has built-in help.
   Learning how to progra
   Footnote 10: Okay, this is an exaggeration. After all, hundreds if not thousands of wars have been fought over religion. Fortunately, programming language bigotry does not quite rise to that level of fanaticism.
21. RUN FULL THEN print out simulation stats PRINTLN sig mean sig_mean sig SD sig_sd true sitsd PRINTLN Absolute bias sitsd sig_mean bias 100 sig_mean sitsd PRINTLN t test param <> 0 t sig_mean sig_sd PRINTLN t test param sitsd t sig_mean sitsd sig_sd END then END model END
   An even more complicated simulation program. This program simulates repeated datasets, each containing observations of a bilateral morphological trait. The simulation includes the ability to add, for example, a directional size bias. Noise of development is superimposed on the underlying trait, and different variances in the noise can be specified for each side.
   MLE This program simulates Fluctuating Asymmetry data. It creates 200 simulations with 150 subjects each. SEED CL outdir outfile nsims nsubjec trait_a trait_b da 0 sd_left sd_righ prob_AS OCKSEED sim base 200 ts 5 2 0 0 EE 0 0 fout FILE FUNCTION drawtrait dist INT draws a random value dist selects the dis IF dist 1 THEN pick a random seed directory where output goes base name for output file Number of simulations to do Number of subjects in each simulation Trait mean parameter Trait dispersion parameter this param controls da AS if prob_AS <> 0 a
22. UPATT aa Go to previous line POD Go down one page PeUP Go up one page Ctl ees ee ees Go to next tab Shift_Tab Go to previous tab Ctrl_Home Move window up Ctrl_End Move window down Ctrl_RtArr Skip ahead one word Ctrl_LtArr Skip back one word Insert and delete commands Delete Delete character del Ctrl AAA Delete character backspace Ctrl_J Ctrl_M Break line at current position A Toggle insert overwrite Cil AA Delete line O AN Delete to beginning of line Cl csssshcashecededee Delete to end of line Ctrl N wee Insert new line Ctr R nanan Delete word File commands lt not assigned gt Close file Save if necessary lt not assigned gt Close file without saving lt not assigned gt Save and close file Ctrl OMe Open Save current file if necessary lt not assigned gt Open without saving current file lt not assigned gt Save current and open AIX Gea Rn ea Quit Save if necessary Ctrl Kia Quit without saving 36 mle 2 1 manual Shift FB Save and quit lt not assigned gt Save as lt not assigned gt Save lt not assigned gt Save file Al Fasismi Set whether backup files are made Block commands Shift_F4 Alt_ Mark beginning of block Alt Pus Copy block AMO sl Delete block Ctrl_F4 Alt_ Mark end of block AUTO td Go to block Alt Coiiloici n dobacss Clear block m
23. a23 1 measure left 1 United_States Wisconsin Madison longitude 40 1388333
   Here are some legal names that are of questionable value: 1 legal, if odd, name; 3 ditto; bad name, could be confused with the sin function; confusing name, looks like a subrange of some sort; 2 confusing legal name, the leading oh looks like a zero; 4 confusing legal name, the leading el looks like a 1.
   Here are some examples of improper variable names:
   Bad variable name
   test            TEST is an mle reserved word
   model 2         MODEL is an mle reserved word
   writeln 6       WRITELN is an mle intrinsic procedure
   2days 2         doesn't begin with a letter
   _now 2          doesn't begin with a letter
   sib's_name      embedded punctuation
   school number   embedded space
   Variable Types. Most examples so far have shown assignments using real numbers and integers. There are, in fact, seven different types supported by mle: REAL, INTEGER, COMPLEX, BOOLEAN, STRING, CHAR (character), and FILE. A variable's type refers to the domain of values that the variable can take on. For example, INTEGER variables can take on a limited range of integer values; BOOLEAN variables can only take on the values TRUE and FALSE. Variables can be defined for each of the seven types; expressions always take on one of these types. Here is an explanation of each:
   • Real variables represent the continuous real number line. Many mathemat
24. high-risk (infant mortality) subgroup and low-risk (normal) subgroup. The results suggest that 22% of the deaths were to individuals in the first subgroup. The expected age at death can be found by taking
   $$E(a) = \int_0^\infty S(a \mid \hat{\theta})\,da$$
   where the hat denotes that we are using the parameter estimates. Additionally, the expectation can be taken for each of the subgroups by fixing p = 0 or p = 1. The expectation comes to 7.11 years for the full sample, which is very close to the 7.09 years found by Deevey (1947) using the life table method. For the first subgroup, the expectation of life is 0.77 years, and for the low-risk subgroup, the expectation of life is 8.90 years. A plot of the survival distribution for the most parsimonious model is shown in the following figure.
   The following code shows the final analysis and other statistics computed for this model.
   MLE Analysis of the data from Murie 1944 as reported in Deevey 1947 The raw data consist of 608 Dall mountain sheep skulls collected in the Mt McKinley Park Ages at death were determined from the annual growth rings on the horns INPUT_SKIP 2 TITLE Murie skull data Siler model EPSILON 0.0000001 DATAFILE murie.dat OUTFILE DEFAULTOUTNAME PLOTFILE DEFAULTPLOTNAME MAXITER 500 DATA frequency FIE last_alive FIE first_dead FIE END MODEL DATA PDF MIXMAKEHAM last_alive first_dead PARAM p LOW 0 HIGH 1 START PARAM al LOW
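The relationship between the three expectations reported here can be checked with a short calculation. The expectation of life is the integral of the survival function, and for a two-subgroup mixture it is the proportion-weighted average of the subgroup expectations. The Python sketch below uses the reported values (22%, 0.77 years, 8.90 years). The exponential shape given to each subgroup is purely an assumption to make the integral computable; the fitted model in the text is a different, more flexible distribution, but the weighted-average identity holds whatever the subgroup shapes are.

```python
import math

p = 0.22            # reported proportion in the high-risk subgroup
mean_high = 0.77    # reported expectation of life, high-risk subgroup (years)
mean_low = 8.90     # reported expectation of life, low-risk subgroup (years)


def mixture_survival(a):
    """Survival of a two-component mixture (exponential shapes are assumed)."""
    return p * math.exp(-a / mean_high) + (1.0 - p) * math.exp(-a / mean_low)


def expectation_of_life(survival, upper, steps=100000):
    """Trapezoid-rule integral of the survival function from 0 to upper."""
    h = upper / steps
    total = 0.5 * (survival(0.0) + survival(upper))
    for i in range(1, steps):
        total += survival(i * h)
    return total * h


e_integral = expectation_of_life(mixture_survival, upper=400.0)
e_weighted = p * mean_high + (1.0 - p) * mean_low
print(round(e_integral, 2), round(e_weighted, 2))
```

The weighted average of the two subgroup expectations agrees with the 7.11 years reported for the full sample to two decimal places.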
25. in a batch mode, the b option turns off this monitoring and slightly speeds up execution.
The termination file option t tells mle to watch for a termination file while solving a model. The termination file is given the same name as the program file, but with a trm file extension replacing the mle extension. If the file is found, mle terminates solving the model at the end of the next iteration.
Interactive mode
mle can be run interactively using the i command line option. When run interactively, commands are typed directly at the command line. This option is particularly useful when mle is used as a calculator, which is described in the last section of this manual. Of course, a full program can be written directly from the keyboard using this option.
Calculator Mode
mle can act like a calculator. In this mode, instead of a program filled with assignment statements, data statements, and model statements, a series of expressions is given to mle. The expressions are evaluated and the results are printed. This can be done either interactively, using the i command line option, or by reading in a program file. Calculator mode is assumed when the first keyword of a program is not MLE; mle will then execute all subsequent commands as expressions to be interpreted. Here is an example:
c> mle i
sin pi 3    This is the user defined expression
2.168404E-0019    And this is what was returned
PDF normal 2 3 1
26. mix END data RUN FULL END model
Here again the <expression> begins with the DATA function and ends with a matching END just before the RUN. Within the DATA function the MIX function is immediately called, and the MIX function contains three arguments separated by commas. Each of these three arguments contains an expression. Here we see one parameter, p (a mixing proportion), and two function calls, PDF END. Within each PDF END two parameters are defined. The model contains a total of five parameters. The FULL keyword specifies that all parameters will be estimated.
Runlist
Parameters that are defined with the PARAM END function can be free parameters, and therefore estimated as part of maximizing the likelihood. Alternatively, they can be constrained for the purpose of hypothesis testing or otherwise modifying the model. Parameters may be held constant or fixed to the value of another parameter. These are called fixed parameters, and an estimate will not be found for them when the likelihood is maximized. The <runlist> in mle provides the mechanism to specify a series of one or more models containing different combinations of free and fixed parameters. For example, in the mixture model likelihood above we may have reason to believe that the proportion parameter p ought to be 0.5, perhaps because of the nature of the system being modeled. We could first fit our collection of t values to t
27. return the area from 14 to infinity under a normal pdf with parameters μ = 10 and σ = 6, or about 0.2525. This would correspond to the likelihood of an individual surviving past 14 units of time under the specified model.
For this example we use the data in Table 6 and suppose that there were three additional observations that had not failed by time 220, the end of the experiment. The data will be coded so that the three right censored times are given as negative times (-220). The DATA statement now creates two variables: the first is the absolute value of time to failure and the second is the unmodified time. Thus failed observations have two identical failure times, for example (9.88, 9.88), which defines an exact failure. When two identical times are used in the PDF function, the probability density function at that point is returned. The right censored observations have a positive and a negative failure time (220, -220). When the second failure time is less than the first, the PDF function gives the area under the pdf from 220 to infinity, which is the survival function.
MLE
TITLE 32 kV Insulating Fluid Example
DATAFILE ex2.dat    Input data file name
OUTFILE ex2.out    Name to which results are written
DATA
topen FIELD 1 ABS topen
tclose FIELD 1
END
MODEL
DATA
PDF EXPONENTIAL topen tclose
PARAM lambda LOW 0.00001 HIGH 1 START 0.05 END
END    of the PDF
END
RUN
FULL
END    of the MODEL
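The two ideas in this excerpt, the survival area under a normal pdf and the exact versus right censored contributions to an exponential likelihood, can be checked outside mle. The following Python sketch is an illustration only and is not mle code; the function names and the two-row data set are hypothetical, and the coding convention (second time less than the first means right censored) follows the description above.

```python
import math

def normal_sf(t, mu, sigma):
    """Area under a normal pdf from t to infinity (the survival function)."""
    return 0.5 * math.erfc((t - mu) / (sigma * math.sqrt(2.0)))

def exponential_loglik(lam, observations):
    """Log-likelihood for (topen, tclose) pairs under an exponential model.

    tclose == topen : exact failure, contributes log f(t) = log(lam) - lam * t
    tclose <  topen : right censored, contributes log S(t) = -lam * t
    """
    total = 0.0
    for topen, tclose in observations:
        if tclose < topen:
            total += -lam * topen                    # survival past topen
        else:
            total += math.log(lam) - lam * topen     # density at topen
    return total

# Hypothetical mini data set: one exact failure, one right censored time.
obs = [(9.88, 9.88), (220.0, -220.0)]

print(normal_sf(14, 10, 6))          # compare with the "about 0.2525" quoted above
print(exponential_loglik(0.05, obs))
```

A full analysis would maximize exponential_loglik over lam; here it is only evaluated at the START value 0.05 used in the program above.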
28. taken from optional string expression in the CURVE statement is a continuation of the plot statement. That line, beginning with a comma, tells Gnuplot to re-plot the same data file using lines. The name eg6.001 is the data file containing the plot points. This file is written by mle and read by Gnuplot. Here is the result of running Gnuplot on this plot file:
[Figure: Gnuplot output for this plot file]
Three dimensional Plots
Three dimensional plots follow the same syntax as two dimensional plots, except that both an <x_var> and a <y_var> must be defined in the CURVE statement, along with their ranges. Here is the formal definition for one form:
CURVE KEY <keystring> WITH <withstring> AXES <axesstring> <x_var> <x_min> <x_max> <x_points> BY <y_var> <y_min> <y_max> <y_points> <x_expr> <y_expr> <z_expr> <expr> <strings> END
Note that there is now a variable for both x and y. The specification for each variable is separated by the keyword BY. If the value of <x_points> or <y_points> is missing, it will be taken from the variable PLOTPOINTS, which is initially 100. Alternatively, the INTEGER form of th
29. 0 HIGH 2 START 0
PARAM a3 LOW 0 HIGH START
PARAM b LOW 0 HIGH START
END END
RUN
THEN
e2 INTEGRATE z 0 120 z PDF MIXMAKEHAM z p al 0 a3 b END END
INTEGRATE z 0 120 z PDF MAKEHAM z al a3 b END END
INTEGRATE z 0 120 z PDF MAKEHAM z 0 a3 b END END
PRINTLN Expectation of life MixedMakeham model
PRINTLN Expectation of life Subgroup 1
PRINTLN Expectation of life Subgroup 2
plotoptions set ylabel Probability of success set xlabel Treatment length days
lo 0 hi 12 pts 50
PLOT plotoptions
CURVE x lo hi pts x PDF MIXMAKEHAM x p al 0 a3 b END END    curve
CURVE WITH lines linetype 2 x lo hi pts x PDF MIXMAKEHAM x p al 0 a3 b END 96 SETRANSFORM PDF MIXMAKEHAM x p al 0 a3 b END END    curve upper CI
CURVE WITH lines linetype 2 x lo hi pts x PDF MIXMAKEHAM x p al 0 a3 b END 96 SETRANSFORM PDF MIXMAKEHAM x p al 0 a3 b END END    curve lower CI
plot
Logistic regression
Tanner 1996 gives an example of logistic regression using data from Mendenhall et al 1989. Twenty four patients were given radiotherapy for some number of days to treat a tongue carcinoma. Three years later the treatment is classified as a success (by the absence of the tumor after three years) or a failure (if the disease recurs). The observations are given in the file
MLE INPUT_SKI TITLE DATAFILE
30. 1 for estimates from -∞ to ∞. Sometimes it is useful to impose narrower limits, perhaps to avoid getting hung up at a local maximum or to solve the model more quickly. Be careful, though. Limits that are too narrow may exclude the global maximum; after all, the best parameter estimates for a set of data are presumably unknown. Excessively narrow limits may also cause problems when numerical derivatives for the variance-covariance matrix are computed. Also, likelihood confidence intervals will bump up against, and stop at, the limits you set.
The TEST xxx part of a PARAM function provides a value against which the parameter will be tested in some reports. In a sense, the TEST value is a null hypothesis value h0. The test performed is $t = (\hat{p} - h_0)/SE(\hat{p})$, where $\hat{p}$ is the maximum likelihood parameter estimate and $SE(\hat{p})$ is the standard error of the parameter estimate. The Wald test is provided for convenience only; mle does not make use of the test in any way.
Modeling Covariate Effects
The PARAM function allows covariate effects and their associated parameters to be modeled within the parameter statement. This is done as follows:
PARAM x HIGH <expr> LOW <expr> START <expr> TEST <expr> FORM <formspec>
COVAR <expr> PARAM z HIGH ... END
COVAR <expr> PARAM z HIGH ... etc. END
END    param
With covariates, the <expr> following COVAR is a covariate effect. Typ
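As a worked illustration of the test statistic described above (estimate minus the TEST value, divided by the standard error), here is a minimal Python sketch. It is not mle code; the function name and the numbers are hypothetical and serve only to show the arithmetic.

```python
def wald_statistic(estimate, null_value, std_error):
    """Return (estimate - null_value) / std_error, the statistic described above."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    return (estimate - null_value) / std_error

# Hypothetical values: an estimated mixing proportion tested against 0.5.
t = wald_statistic(estimate=0.22, null_value=0.5, std_error=0.07)
print(t)
```

A large absolute value of the statistic suggests the estimate differs from the TEST value by more than its standard error would explain.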
31. 100h22 30H32 2 0h12 0H12 3 230 16 32 14 32 6 100 22 30 32 2 14 230 16 32 14 32 6 270 10 0 30 18 2 3 4 12 32 166 12 9 19 14 7 12 607 3 12_5 16 3_2 3 0_1 7 16d12m1944y 1D6M1800Y 12m16d1944y 6M1D1800Y 1944y12m16d 1800Y6M1D 14Dec1999 30jun1961 IMAY 1944 Real numbers can be specified in scientific notation so that Conversion Metric other suffix Table 2 Standard exponential format xEy gt x x 10 Roman numerals to integer Converts y from base d from 2 to 36 into integer 24 hour time into hours Hours must be 0 24 12 hour time with AM and PM suffixes into hours Hours must be 0 12 Degree hour minute second format Converted to real angle time Degree minute second format converted to radians Minute second and second format converted to radians Fraction notation Date converted to Julian day Date converted to Julian day Date converted to Julian day Date converted to Julian day 2 1E 23 Result integer real real real integer integer real real real real real real integer integer integer integer d is a strings of one or more positive digits s is a one or two character case sensitive metric or percent suffix see Table 2 v is a string of one or more Roman numeral digits I VXLCDM y is a string of one or more characters mmm is a 3 character English month name E g jan Feb MAR etc The deg
32. 2 end    Compute the area under a normal pdf from 2 to 3, μ = 1, σ = 2
0.1498822726114    resulting area
INTEGRATE z 2 3 PDF NORMAL z 1 2 end end    Expressions can be nested. Integrate from 2 to 3 a normal pdf with μ = 1, σ = 2
0.1498822847945    This should be close to the previous result
gamma 3.8    Evaluates the gamma function
4.6941742051124
summation i 1 10 1 i 2 end    Sum from 1 to 10 of 1/i^2
1.5497677311665
end    Ends and returns to DOS
In version 2 of mle, when using calculator mode interactively, there will always be a delay of one expression before the result is returned. This is because an expression can continue indefinitely. For example, the expression SIN 2 pi followed by a carriage return does not complete the expression, because the next line may be 1 2. A new expression is needed to denote the end of the old expression. Thus typing 1 pi 2 followed by a carriage return will result in two complete expressions, returning 1 and 3.1415926535898. The third expression is not yet complete.
Note that if you begin mle with the options i v and begin typing expressions, the verbose result will show the entire expression in functional form, i.e., as a series of functions. For example:
c> mle i v
sin pi 2 4 1    This is the user defined expression
returns SIN ADD DIVIDE POWER PI
0.320074806512
Chapter 3 Creating data sets
As a first step in parameter estimation, a data set must be read in
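The calculator results quoted above can be cross-checked independently of mle. The Python sketch below is an illustration under the assumption that the normal area, the gamma function, and the summation have their usual mathematical definitions; the helper name normal_cdf is hypothetical. The closed-form area should agree closely with the INTEGRATE result.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal(mu, sigma) variable."""
    return 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2.0)))

# Area under a normal pdf (mu = 1, sigma = 2) between 2 and 3.
area = normal_cdf(3, 1, 2) - normal_cdf(2, 1, 2)

# Gamma function evaluated at 3.8.
gamma_value = math.gamma(3.8)

# Sum of 1 / i^2 for i = 1 .. 10.
series = sum(1.0 / i**2 for i in range(1, 11))

print(area, gamma_value, series)
```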
33. 2 for details. The results written to the output file are shown in Figure 3. The first section of the output provides summary statistics for each of the variables read from the data file. The parameter estimates are given in two ways: once with estimated standard errors, including a t test of the hypothesis that the estimate is zero, and once with likelihood confidence intervals.
A Note About Parameters
The ultimate goal of putting together a likelihood model is to estimate one or more parameters of the model. In mle, the PARAM END function defines parameters to be estimated. This use of the word parameter can be confusing, so let's clear up the issue. In any mathematical language we can refer to a function's arguments as parameters. For example, in the statement a = sin(b), sin is a function with one parameter, b. This manual will avoid the word parameter in this general sense. Instead, the word argument will be used to refer to the arguments of a function. So the sin function has the argument b. As used in this manual, the word parameter refers to an unknown quantity of a probability model whose value is to be estimated. Parameters in this sense are frequently arguments to functions, but not all arguments are parameters. A more accurate definition of a parameter is an unknown quantity whose distribution of values is to be estimated. The standard errors or confidence intervals provide informati
34. 2nd edition. New York: John Wiley and Sons.
Johnson NL, Kotz S, Balakrishnan N. 1995. Continuous Univariate Distributions, Volume 2. 2nd edition. New York: John Wiley and Sons.
Kalbfleisch JD, Prentice RL. 1980. The Statistical Analysis of Failure Time Data. New York: John Wiley & Sons.
King G. 1998. Unifying Political Methodology: The Likelihood Theory of Statistical Inference. Ann Arbor: The University of Michigan Press.
Kirkpatrick S, Gelatt CD, Vecchi MP. 1983. Optimization by simulated annealing. Science 220(4598):671-80.
Kishor ST. 1982. Probability and Statistics with Reliability, Queuing and Computer Science Applications. Englewood Cliffs, NJ: Prentice Hall.
Laplace PS. 1774. Mémoire sur la probabilité des causes par les évènemens. Mém. de Math. et Phys. l'Acad. Roy. des Sci. par div. Savans 6:621-56.
Lee ET. 1992. Statistical Methods for Survival Data Analysis. New York: John Wiley and Sons.
Levy P. 1939. Compositio Mathematica 7:283-339.
Maxwell JC. 1860a. Phil Mag 19:19.
Maxwell JC. 1860b. Phil Mag 20:21, 33.
Metropolis N, Rosenbluth A, Rosenbluth M, Teller A, Teller E. 1953. Equation of state calculations by fast computing machines. Journal of Chem Phys 21:1087-90.
Mohr PJ, Taylor BN. 1999. CODATA recommended values of the fundamental physical constants: 1998. Journal of Physical and Chemical Reference Data 28(6):1-140.
Morgan BJT. 2000. Applied Stochastic Modeling. London: Arnold.
Murie A. 1944. T
35. 2x 3 you know by convention that the exponentiation occurs first, the multiplications take place second, and the addition is third. The built-in operators in mle follow a more or less standard precedence. That is, an expression like 4 2 3 will evaluate 2 3 first and then add 4. The precedence of operators is defined in Table 10. Higher precedence operators will always be evaluated before lower precedence operators.
Footnote 1: The 2X notation is how numbers are specified in other bases (base 2, or binary, in this case). For base 2 numbers only the digits 0 and 1 are permitted on the right hand side of the X. Octal (base 8) numbers can be specified as 8X, where digits from 0 to 7 are permitted on the right hand side of X.
Table 10. Operator precedence
Operator(s)    Precedence Category
    Unary operators
    Exponent operator
div mod and shl shr    Multiplying operators
or xor    Adding operators
or < > < > < >    Relational operators
More on Strings
String constants are values that are enclosed within quotes. Here are a few rules for string constants:
• When you specify a string constant you can use either the ' or the " characters.
• If you open a string constant with ' you must close it with '. If you open the string with " you must close it with ".
Hence the statements
foo = 'My name is '
bar = "Kilroy"
WRITELN(foo, bar)
are legal and produce the output My name is
36. 3.14159265359
And a third example:
mle h besseli
Function BESSELI x1 x2 returns the modified Bessel fcn I, integer order x1, of real x2
The h option provides summaries for a few topics. For example, mle h FUNCTIONS will list all of the intrinsic simple functions, and mle h SYMBOLS lists all variables in the symbol table. Typing mle h functions | more is a useful way to examine all mle intrinsic functions, because the more program will stop the display after each page of output is listed.
The H <name> option is similar to the h option, except that any function, variable, constant, or reserved word that includes <name> as some part of its name is printed. The H option is particularly useful when you cannot recall the exact name of some keyword. Thus mle H integra lists all keywords containing the string integra:
INTEGRATE v expr1 expr2 expr3 END
INTEGRATE v expr1 expr2 expr4 expr3 END
v is the variable of integration
expr1 is evaluated for the lower limit of integration
expr2 is evaluated for the upper limit of integration
expr3 is the integrand, and may reference v
expr4 is an optional convergence criterion
INTEGRATE_METHOD I_TRAP_CLOSED uses closed trapezoidal integration
INTEGRATE_METHOD I_TRAP_OPEN uses open trapezoidal integration
INTEGRATE_METHOD I_SIMPSON uses open simpson integration
INTEGRATE_METHOD I_AQUAD (default) uses adaptive quadrature integration
INTEGRATE_N is the number
37. The Curve Statement
The CURVE END statement does the bulk of the work in creating plots. Each CURVE statement generally creates a single curve or surface. For simplicity, the CURVE statement will be discussed separately for two dimensional and three dimensional plots.
Two dimensional Plots
The idea of the CURVE statement is to generate a series of points for a function. For simple curves, two values must be defined for each point: an x value and its corresponding y value. There are two forms of the CURVE statement for producing two dimensional plots. One form generates a series of REAL x values for use in computing y values. The second form generates an INTEGER series of points. The REAL version looks like this:
CURVE KEY <keystring> WITH <withstring> AXES <axesstring> <x_var> <x_min> <x_max> <x_points> <x_expr> <y_expr> <expr> <strings> END
The KEY, WITH, and AXES keywords will be discussed later. This form of the CURVE statement creates a series of x points. It begins with the point <x_min> and ends with the point <x_max>; <x_points> points will be generated in total. Each point will be assigned to <x_var> in turn. The value of <x_var> will be used at each point to compute <x_expr> and <y_expr>, and perhaps other expressions as well. If the expression for <x_
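The point generation described for the REAL form can be pictured with a short Python sketch. This is not mle code: the function name curve_points is hypothetical, and even spacing between <x_min> and <x_max> is an assumption, since the text above only states the two end points and the total number of points.

```python
import math

def curve_points(x_min, x_max, n_points, y_expr):
    """Return n_points (x, y) pairs from x_min to x_max inclusive.

    Assumption: the x values are evenly spaced. y_expr is any function of x,
    playing the role of the y expression in a CURVE statement.
    """
    if n_points < 2:
        raise ValueError("need at least two points")
    step = (x_max - x_min) / (n_points - 1)
    xs = [x_min + i * step for i in range(n_points)]
    return [(x, y_expr(x)) for x in xs]

# Hypothetical use: 50 points of an exponential density on [0, 12].
points = curve_points(0.0, 12.0, 50, lambda x: 0.2 * math.exp(-0.2 * x))
print(points[0], points[-1])
```

Each (x, y) pair corresponds to one row of the data file that is later read by Gnuplot.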
38. CURVE i 1 TO 12 i PDF GEOMETRIC i 0.2 END END    plot
END
[Figure: Gnuplot output for the geometric pdf curve, i from 1 to 12]
Each CURVE END statement defines a single graph as a series of x and y points. The x and y values, and perhaps some values used for error bars and other things, are written to a data file. These data files, one per CURVE END statement, are read by Gnuplot when creating the graphs.
KEY
There are three optional keywords that can be used in the CURVE END statement. The first is KEY, followed by a string expression. This sets up a title for the plot key.
AXES
The AXES keyword defines the axes against which a curve will be plotted. A single string expression follows AXES. Valid values for this string are x1y1, x2y1, x1y2, and x2y2.
WITH
The WITH keyword defines the style of curve to be plotted, along with any options for that style. A single string expression follows WITH. The string begins with one of the Gnuplot plot styles and is followed by options for that style. mle checks the first word of this string and makes sure there are enough PLOT expressions for the desired graph type. The information is also used to put together the Gnuplot plot or splot command. Valid values for the first word of this string are:
boxerrorbars    4 to 6 CURVE expressions, 2d only
boxes    2 CURVE expressions, 2d only
boxxyerrorbars    4 to 7 CURVE expressions, 2d only
candlesticks    7 CURVE expressions, 2d only
dots    2 (2d) or 3 (3d) CURVE expre
39. Column 1 is the minimum gestational age in a category; column 2 is the maximum gestational age in a category. Together they define a birth weight interval. The 1 in the last row denotes an open birth weight interval, i.e., a weight of 329. Column 3 is the frequency of births in each birth weight interval.
The DATA statement given in Figure 1 specifies how the data file is to be read. The three variables topen, tclose, and frequency that come between DATA and its matching END are read in for each observation (i.e., each line in Figure 2). In fact, each of these variables will be created as an array having twenty elements, each element corresponding to one line in the data file. The variable name frequency is special: mle treats variables named frequency (and freq as well) as a count of repeated observations. The likelihood is adjusted for the number of observations so that the contribution is the same as if multiple identical observations had been read in from the file.
Likelihood Model (Model 1)
The next part of the program is the MODEL statement. The MODEL statement consists of two parts: an expression between MODEL and RUN that defines the likelihood, and a list of one or more specifications between RUN and END, each giving some details of how parameters are to be estimated.
Run Part of the Model Statement
Within the MODEL RUN part of the statement is a single function that defines
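The statement that a frequency variable makes each row count as if identical observations had been read repeatedly can be illustrated with a small Python sketch. It is not mle code; the function names, the exponential interval model, and the three data rows are all hypothetical. The point is only that multiplying a row's log-likelihood by its frequency gives the same total as repeating the row.

```python
import math

def toy_interval_loglik(topen, tclose, rate=0.02):
    """Log probability that an exponential event falls in [topen, tclose).
    The exponential model and the rate are illustrative assumptions."""
    return math.log(math.exp(-rate * topen) - math.exp(-rate * tclose))

def weighted_loglik(rows):
    """Each (topen, tclose, frequency) row contributes frequency * log-likelihood."""
    return sum(freq * toy_interval_loglik(a, b) for a, b, freq in rows)

def expanded_loglik(rows):
    """Same total, computed by literally repeating each row frequency times."""
    total = 0.0
    for a, b, freq in rows:
        for _ in range(freq):
            total += toy_interval_loglik(a, b)
    return total

rows = [(20, 24, 3), (24, 28, 5), (28, 32, 2)]   # hypothetical intervals and counts
print(weighted_loglik(rows), expanded_loglik(rows))
```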
40. Kilroy
The statements
foo = 'My name is "
bar = "Kilroy'
are invalid because the quote types do not match. Some languages do not allow this flexibility. In BASIC, for example, all string constants must be enclosed in the " character. In Pascal, all string constants must be enclosed in the ' character. mle allows either.
Commas in lists of arguments
Commas are always optional in mle. Hence both
WRITELN(foo, bar)
WRITELN(foo bar)
are valid, and they work exactly the same. There are several good reasons to use commas, however. First, they make a statement easier to read. Second, they are helpful when working with negative numbers. Consider the following:
WRITELN(3, -1)
This statement produces the output
3-1
There is no space between the 3 and the -1 because it is not asked for. Now, what if you leave the comma out?
WRITELN(3 -1)
This program produces the output
2
This is because 3 -1 was taken as a mathematical expression, which evaluated to the number 2. So the comma was useful in this context. You could, however, still avoid using the comma. Here are some ways of getting the same result:
WRITELN(3 (-1))    put the -1 inside parentheses
WRITELN(3 NEGATE(1))    creates -1 with the negate function
Now, once you understand all that, you can make sense of statements like
WRITELN('My name is ' first ' ' middle ' ' last)
The call to procedure WRITELN has 6 arguments, some
41. Likewise, the default width and decimal places for complex numbers are controlled by COMPLEXWIDTH and COMPLEXDECIMALS.
Plotting routines have been added for generating GNUPLOT output: PLOT, CURVE, and MULTIPLOT. Also, the MODEL statement has been modified to plot estimated distributions with confidence intervals and likelihood surfaces. See the PLOTTING chapter in the User's manual for details.
The FOR statement has been greatly enhanced. The STEP keyword provides for different step sizes. The looping index variable can be either real or integer. The STEPS keyword specifies the number of steps to loop over between the two limits. Finally, the FOR statement can take a data array or an array variable and loop over each element of the array, of any type. Since a step size of -1 can be used, the DOWNTO statement is no longer supported.
A great number of intrinsic functions have been added: CLOCKSEED, EXEC <cmd> <args>, PLOTFILE, NORMAL x, NORMALCDF x, CHISQ x df, STUDENTT x df, INVSTUDENTT p df, FDIST x df1 df2, INVFDIST p df1 df2, INVBETA p v w, DIREXISTS, FILESIZE, ENVCOUNT, ENVSTRING, ARGCOUNT, ARGSTRING, GETDIR, ZETA, SETRANSFORM <expr>.
• Added some new procedures. Among them: ERASE, EXEC <cmd> <args>, RENAME n1 n2, CHDIR n1, MKDIR n1, RMDIR n1, G
42. OBSERVATIONS WITHOUT A FILE
PRINTING OBSERVATIONS AND STATISTICS
AN EXAMPLE OF CREATING AND READING A DATA FILE
ACCESSING OBSERVATIONS
NUMBER FORMATS
BUILDING LIKELIHOOD MODELS
STRUCTURE OF THE MODEL STATEMENT
A simple example
Another example
Runlist
FULL
REDUCE
WITH
THEN END
Bayesian model averaging
Results
43. RADIOT.DAT and the mle program file is RADIOT.MLE.
P 8    skip comments
Radiotherapy success
radiot.dat    Input data file name
OUTFILE DEFAULTOUTNAME
METHOD CGRADIENT1
EPSILON 1E-10
DATA
days FIELD 1    Days of treatment
success FIELD 2    Success of treatment at 3 years
END
ALT_LOGISTIC TRUE    use exp(xb)/(1 + exp(xb)) instead of 1/(1 + exp(-xb))
MODEL
DATA
PDF BERNOULLITRIAL success
PARAM b_0 LOW -500 HIGH 500 FORM LOGISTIC
COVAR days PARAM b_days LOW -10 HIGH 10 START 0 END
END    param
END    pdf
END    data
RUN
FULL
END    model
END    of the MLE program
In this model the variable days is the covariate of interest and the outcome is the variable success. The logistic regression model specifies the probability of success as $p = \exp(\beta_0 + \beta_1 x) / [1 + \exp(\beta_0 + \beta_1 x)]$, where x is the number of days of treatment and the β coefficients are parameters to be estimated. Note that the variable ALT_LOGISTIC is set to TRUE for this particular form of the logistic model. The likelihood under the logistic model is probability p for each patient for whom therapy is successful and 1 - p for each patient for whom therapy is unsuccessful. Hence each observation is treated as a Bernoulli trial for success, with parameter p modeled as above. The likelihood is
$L(\beta) = \prod_{i=1}^{N} p_i^{s_i} (1 - p_i)^{1 - s_i}$, with $p_i = \exp(\beta_0 + \beta_1 x_i) / [1 + \exp(\beta_0 + \beta_1 x_i)]$,
where $s_i$ is 1 for a success and 0 for a failure, and N is the number of patients. The resulting parameter
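The Bernoulli likelihood with a logistic success probability, as described in this excerpt, can be written out in a few lines of Python. This sketch is an illustration only and is not mle code; the function names are hypothetical, and the six (days, success) pairs are made-up values, not the Mendenhall data.

```python
import math

def logistic_p(b0, b1, x):
    """Success probability exp(b0 + b1*x) / (1 + exp(b0 + b1*x)).
    Written in two branches so that exp() never overflows."""
    z = b0 + b1 * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def bernoulli_loglik(b0, b1, data):
    """Sum of log(p) over successes and log(1 - p) over failures."""
    total = 0.0
    for days, success in data:
        p = logistic_p(b0, b1, days)
        total += math.log(p) if success else math.log(1.0 - p)
    return total

# Hypothetical observations: (days of treatment, success indicator).
patients = [(21, 1), (24, 1), (25, 1), (35, 0), (42, 0), (51, 0)]
print(bernoulli_loglik(0.0, 0.0, patients))
```

With both coefficients at zero every patient has p = 0.5, so the log-likelihood is simply the number of patients times log(0.5); a maximizer would then move the coefficients to increase this value.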
44. Runs mle interactively. Commands are typed directly in from the keyboard. Using interactive mode is helpful for using mle as a probability calculator. Interactive mode is discussed later in this chapter.
The program file is parsed for errors and not run. Sets the internal variable PARSE TRUE.
Specifies a file system path to include while searching for include files (see command INCLUDE).
Batch mode. Turns off keyboard monitoring for interactive debugging while executing models.
Tells mle to watch for a termination file while solving a model and, if it is found, terminates solving the model at the end of the next iteration.
Tells mle to read in values from the start file to initialize start values for a MODEL statement. The start file is automatically created by the Sw option.
Tells mle to write a start file following each iteration during a MODEL statement. The values are read and used as updated start values when the Sr option is used.
A special flag equivalent to Sr Sw t v.
A flag used by the editor emle to interact with mle.
mle supports various number formats (dates, times, Roman numerals, etc.). This command line option takes a list of numbers, parses them, and reports the results.
Prints out a version number string.
Turns on data debugging, where details are printed as each observation is read from the data file and converted into a data set. Sets DEBUG_DATA TRUE.
Echos each
45. Statement
The FOR statement provides a means of looping through statements. The formats are:
FOR <v> <expr> TO <expr> DO <statements> END
FOR <v> <expr> TO <expr> STEP <expr> DO <statements> END
FOR <v> <expr> TO <expr> STEPS <iexpr> DO <statements> END
FOR <v> <array> DO <statements> END
Form 1 is a simple looping statement. The variable <v> must either not be previously defined or, if it already exists, be an integer or real variable. Its value will change as the FOR statement is executed. The first <expr> is evaluated once and defines the starting value of <v>. The second <expr> is evaluated once and defines the last value of <v>. On every iteration through the loop the value of <v> is incremented by 1. Here is an example that prints sine and cosine tables in one degree increments and creates a table of radians for each degree:
r REAL 0 TO 359
FOR x 0 TO 359 DO
r x DTOR x
WRITELN x degrees r x radians SIN SIN r x COS COS r x
END
Form 2 of the FOR statement is like form 1, except that the <expr> after STEP will be used as the increment or decrement value instead of one. The step size can be any real or integer value. If the value is positive then <statements>
46. Variables are like the cells in a spreadsheet program, except that they are not laid out in a visual grid. The first variable created above is called population. The value 6.3 is assigned to this variable. Since 6.3 is a real number (rather than an integer or a string of characters), the variable is created as a real number and initially assigned the value 6.3. The variable greeting is assigned a string of characters, 'Hello Universe'. Consequently, greeting is created as a STRING variable. The single quotation marks are not actually part of the string. Rather, they serve to delimit where the string starts and where it ends. The quote marks can be single quote marks or double quote marks, but they must match: 'Hello" is not a valid way to specify a string. However, you can specify the string People's world as "People's world".
What goes into a variable name? There are several rules that must be followed.
• First, a variable name must begin with a letter. The letter can be upper case or lower case; it does not matter. mle treats uppercase and lowercase as identical for identifier names and keywords.
• After at least one letter, other letters, numbers, a period, or an underscore may be used.
• You should avoid using predefined keywords, function names, and procedure names. Sometimes you will get an error (i.e., using a keyword), and other times you will simply add confusion and disable the original purpose of the
47. WHILE Statement
The WHILE statement loops through statements while some condition is true. The format is:
WHILE <bexpr> DO <statements> END
The Boolean expression <bexpr> is evaluated first. If the value is TRUE, the <statements> are executed once and <bexpr> is evaluated again. The sequence continues until <bexpr> evaluates to FALSE, at which point the loop terminates. Unlike the REPEAT statement, the statements will not be executed even once if the condition initially fails.
IF Statement
The IF statement provides a means of conditionally executing statements. The following types of IF statements are available:
IF <bexpr> THEN <statements> END
This form executes the <statements> only if <bexpr> evaluates to TRUE. An ELSE clause can be added to the statement so that one of two sets of statements will always be executed:
IF <bexpr> THEN <statements> ELSE <statements> END
In addition, one or more ELSEIF clauses can be added to the statement to allow multiple conditions to be tested:
IF <bexpr> THEN <statements>
ELSEIF <bexpr> THEN <statements>
ELSEIF <bexpr> THEN <statements>
ELSE <statements>
END
Here is an example of an IF statement:
IF SYSTEM MS DOS THEN
48. a    Here is a call to the user defined procedure
WRITELN Back from myproc with t t
END
The definition begins with the word PROCEDURE and ends with the corresponding END. The word following PROCEDURE is the name of the procedure, in this case myproc. The name is followed by a list of 0 or more arguments that are formally defined; that is, a name and type must be specified for each argument. In this example, three arguments, a, b, and c, are defined. The argument names, and all of the variables defined within the procedure (like msg), are private to the procedure. Names of preexisting variables, like a, are not affected by and do not affect declarations outside of the procedure.
The procedure definition does not actually do any visible work in a program. The work comes when a procedure is called, as in the line myproc t 4.2 a. Once called, each argument is evaluated and a copy of the result is assigned to the formal argument defined in the heading of the procedure. The statements within the procedure are executed, and control is passed back to the main program. Here are results from the sample program:
Call myproc with t 4
In myproc a is < 10 4.2000000000 c Hello world
Exit myproc with a
Back from myproc with t
A careful examination reveals an interesting behavior in this example: the arguments passed from outside the procedure are not affected by any manipulation within the procedure
49. a T Accept lt Ctrl gt key 61D AA unmapped 0 e Xi es unmapped Cl linedel Delete line Ctrl AAN unmapped Eten tons unmapped Ctrl Nines as unmapped 0 Ctrl AA unmapped 0 mle 2 1 manual Ctrl eae eesti unmapped Ctrl AS unmapped Shift_Tab tabprev eee Go to previous tab Alt Qin blockde l Delete block Alt Won windowmenu Window menu AM Eiras editmenu 0 Edit menu At Rorera unmapped 0 Alt WT va ciksesssssccecess blockwrite Write block to a file Alt Yerri unmapped Alt Ulises unmapped Alt Tir aoee unmapped Alt Orosei blockgoto Go to block AlPuancririis blockcopy Copy block AltA wo ASCH edad de Enter ASCII code Alt Ss searchmenu Search menu AD unmapped AIR filemenu File menu IS A gotoline ooooocccc Goto line Alt cdo helpmenu Help menu Auris unmapped Alt_K oo rulertoggle Toggle ruler display Alt AA unmapped All Lustre unmapped At XK never eer a U E i cios Quit Save if necessary Alt G wait scostenscccctsd clearmarks Clear block marks Alt_V wo blockmove Move block
50. arguments can be any expression, so that time shifts and transformations can be incorporated in this list.

Intrinsic parameter list

The intrinsic parameter list provides specifications for the PDF's intrinsic parameters. The order in which the intrinsic parameters are specified is important: it corresponds to how the PDF is defined within mle. The PDFs chapter lists the order for intrinsic parameters; alternatively, the command line mle -h can be used to determine the proper argument order. Note that any expression can be used for an intrinsic parameter. That is, you do not need to use a PARAM function for the intrinsic parameters, although this is the most common use. Here is an example in which the location parameter is fixed to a constant for a shifted lognormal distribution:

PDF SHIFTLOGNORMAL tooth_eruption_age 9    shift the time back to conception
  PARAM location LOW 1 HIGH 4 START 2.5 END
  PARAM scale LOW 0.0001 HIGH 3 START 0.9 END
END

PDF Time Arguments

Most PDFs can have as few as one and as many as four time arguments specified. They are t_u, the last observation time before an event; t_e, the first observed time after the event; t_α, the left truncation time for the observation or the PDF; and t_ω, the right truncation time for the observation or the PDF. Understanding how these four times act on the PDF statement is critical to creating the desired and proper likelihood.

Footnote 7: These are called time variables in the context of survival analysis.
51.
c:\test> mle -p test.mle      Parses test.mle, reports syntax errors
c:\test> mle                  mle will request the input file name
mle Program file to run: test.mle

The last example shows that if a program file name is not given on the command line, you will be prompted for the program file name. The middle two examples show command line options -v and -p being specified. Command line options are used to change the behavior of mle and are discussed below. If you type an erroneous command line option, or the file is not recognized by mle, the following synopsis is given:

c:\test> mle -z analysis.mle
There is no -z option
Error: Incorrect number of parameters
Usage: mle -v -p -i -dd -de -di -dl -dp -ds -dx mlefile
  -v   Iteration histories and other messages are written to the screen
  -p   Only parses the mle file
  -i   Runs mle interactively
  -dd  Turns on data debugging
  -de  Echoes characters while parsing
  -di  Turns on integration debugging
  -dl  Turns on likelihood debugging
  -dp  Turns on parser debugging
  -ds  Turns on symbol table debugging
  -dx  Turns on debugging during execution
  mlefile is the name of the file with the program
Usage: mle -h name1 name2     help for PDFs, functions, symbols, parameter transforms
  -h matches words exactly; -H searches within words
Usage: mle -pn n1 n2          parses n's and returns values and type

Table 1 gives a list of valid command line options. A useful command lin
52. character in the program file as it is being read. Sets DEBUG_ECHO TRUE.

Turns on debugging for the integration routines so that a report for each integration call is written to the standard output. Sets DEBUG_INT TRUE.

Turns on likelihood debugging so that parameter estimates and an individual likelihood are written to standard output for every likelihood evaluation. Sets DEBUG_LIK TRUE.

Turns on debugging while reading and parsing the program file. Sets DEBUG_PARSE TRUE.

Turns on debugging for the symbol table routines so that information is printed to standard output whenever variables and symbols are created or destroyed. Sets DEBUG_SYM TRUE.

Turns on debugging while running (executing) the program file so that a message is written to the screen just prior to executing each statement. Sets the internal variable DEBUG_EXEC TRUE.

Sets the internal variable DEBUG to the value given on the command line. When the value is greater than zero, debugging messages are printed. The nature and type of messages changes, and the output is used for program development. A value of 0 turns off debugging.

which shows that there are two intrinsic parameters. Note that equations are given for the probability density, survival function, or hazard function. At least one of these is given for other PDFs as well. Here is another example:

mle -h pi
Symbol PI REAL Const Static
53. current file to mle so that the program is run.
Prompts the user for an expression to evaluate via mle.
Inserts a code template at the current location.
Sets options (indent level, case for code, case for comments) for the templates.
Exits the menu.

From the main menu, <Alt> W brings up the Window menu. This menu provides several mle-related special functions. The menu contains these elements:

Backcolor    Switches through the background color for the text
Forecolor    Switches the foreground color of the text
Wordwrap     Toggles word wrap
Setmargins   Sets the left and right margins
Ruler        Toggles a ruler display (Off, Top, Bottom)
reDraw       Redraws the current screen
Quit         Exits the menu

Help menu

From the main menu, <Alt> H brings up the Help menu. This menu provides several types of help information. The menu contains these elements:

Editor_keys  Default settings. Displays the current mapping between editor commands and the keyboard
Key_map      Displays the current mapping of keys to editor commands
Mle_help     Submits the current word (the word the cursor is currently sitting on) to mle with the help option (-h option). Any mle help messages that match the keyword exactly will be displayed
mle_Search   Submits the current word to mle with the help option (-H option). Any mle help messages that match any part of the keyword will be displayed
About        Shows information about the editor
Quit         Exi
54. data RUN FULL END model END program

Here is the abridged output:

New model: 32 kV Insulating Fluid Example
LogLike -70.76273   Iterations 2   Func evals 26   Del(LL) 0.0000000000
Converged normally
Results with estimated standard errors (7 evals)
Solution with 1 free parameter
Name     Form     Estimate         Std Error        t               against
lambda   LOGLIN   0.024294254090   0.004468859626   5.43634307759   0.0

The first part of the output shows the loglikelihood and information about iterations, function evaluations, and convergence. This is followed by a report of parameter estimates and their standard errors.

Table 6. Times to breakdown for an insulating fluid at 32 kV, from Nelson, W. (1982: 105).

0.27   0.40   0.69   0.79   2.75   3.91   9.88   13.95   15.93   27.80   53.24   82.85   89.29   100.58   215.10

Survival analysis: Exact Failure and Right Censored observations

The standard problem in survival analysis is to find parameters of a parametric model when some observations are right censored. Typically we have N_e exact observations and N_c right censored observations; the likelihood is

$$L(\theta \mid t) = \prod_{i=1}^{N_e} f(t_i \mid \theta) \prod_{j=1}^{N_c} S(t_j \mid \theta) \qquad (3)$$

where S(t | θ) is the survival distribution, which is the area under f(t | θ) to the right of t. The area under a right censored observation is specified in the mle PDF function by setting the second time variable to infinity or something less than the first time variable. So the function PDF NORMAL 14 1 10 6 END would
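As a cross-check on the output above, the estimate can be recomputed by hand. The sketch below assumes the model is a one-parameter exponential with hazard lambda (the output lists a single parameter named lambda); under that assumption the maximum likelihood estimate has the closed form n / sum(t), so no iterative search is needed. The data are the 15 breakdown times from Table 6.

```python
import math

# Times to breakdown at 32 kV (Table 6).
times = [0.27, 0.40, 0.69, 0.79, 2.75, 3.91, 9.88, 13.95, 15.93,
         27.80, 53.24, 82.85, 89.29, 100.58, 215.10]


def exponential_mle(obs):
    """Closed-form MLE and loglikelihood for exact exponential observations."""
    n = len(obs)
    total = sum(obs)
    lam = n / total                          # maximizes n*log(lam) - lam*total
    loglik = n * math.log(lam) - lam * total
    return lam, loglik


lam_hat, loglik_hat = exponential_mle(times)
print(f"lambda = {lam_hat:.9f}  loglik = {loglik_hat:.5f}")
```

Both values should land very close to the Estimate and LogLike lines in the abridged output. The standard error is not reproduced here, because it depends on how mle parameterizes the LOGLIN form.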
55. data file The default value is zero so that no lines are skipped Delimiters in the data file Data files consist of a series of text elements separated by one or more delimiters One or more delimiters must appear between each record within a data file The delimiters define the fields within each line in which variables reside By default the characters space tab and comma are treated as delimiters You can redefine the delimiters by changing the variable DELIMITERS before the DATA statement If for example you wanted the colon and semicolon character as the only valid delimiters you would add the line DELIMITERS Creating observations without a file CREATE_OBS SEED 8936 DATA varl var2 var3 END varl BNHONON FS WW y 6679777032 7136215828 8714564727 6521659697 9649275178 6017912164 6953390371 7412253145 7631538913 0772026291 Sometimes it is useful to create observations rather than reading observations from a file For example you can simulate data sets using the random number generator in mle To create variables simply set the variable CREATE_OBS to some positive number prior to the DATA statement That number of observations will be created Here is an example create 10 observations set the random number generator seed QUANTILE WEIBULL RAND 3 2 2 5 END draw variates from a Weibull 3 2 2 5 pdf IRAND 100 200 draw discrete variates f
56. data file consists of seven fields delimited by space characters. Since the space character is one of the default delimiters, we do not need to change the DELIMITERS variable to recognize the space as such. But since we have commas embedded in the text that should not be taken as delimiters, we must redefine DELIMITERS to exclude the comma and include the space (and the tab character, if necessary). The numeric values appear in fields 3, 4, 6, and 7. We do not need to do anything with fields 1, 2, and 5. Let us suppose that we want to convert Time from years into months. Here is the complete mle code to read and process this file, but no analyses are specified:

MLE
  DATAFILE THEDATA.DAT
  PRINT_OBS TRUE          print out each observation
  INPUT_SKIP 1            get rid of the header line
  DELIMITERS me           spaces only, treat commas as text
  DATA
    age FIELD
    amount FIELD DROPIF amount < 0
    rate FIELD            is a legal number suffix in mle
    time FIELD time 12
  END
END

Running mle on this file produces the output to the screen (or standard output), since no OUTFILE procedure was called. Here are the results:

Table 2. Standard metric (SI) suffixes (Taylor 1996) and IEC suffixes for integer and real numbers.

Suffix Name Conversion Suffix Name Conversion Deka d deci x10 a 2 Hecto c centi percent x10 Kilo m milli x107 Mega pu micro x10 Giga nano x10 Tera pico x10 2 Peta femto x105 Exa atto
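The reading steps described above (skip the header line, split on spaces only so that embedded commas stay inside a field, drop observations with a negative amount, and convert time from years to months) can be sketched in Python. The sample records, the field contents, and the function name are hypothetical; only the field positions 3, 4, 6, and 7 come from the text.

```python
import re

# Hypothetical file contents: seven space-delimited fields, commas inside text.
RAW = """name id age amount note rate years
Smith,J A1 34 120.5 x,y 5 2
Jones,K B2 41 -3 z 7 1.5
"""


def read_observations(text, delimiters=" \t", skip=1):
    """Split each line on the given delimiters only and keep the numeric fields."""
    pattern = "[" + re.escape(delimiters) + "]+"
    rows = []
    for line in text.splitlines()[skip:]:          # skip the header line(s)
        fields = [f for f in re.split(pattern, line.strip()) if f]
        if len(fields) < 7:
            continue
        amount = float(fields[3])                  # field 4
        if amount < 0:                             # same idea as DROPIF amount < 0
            continue
        rows.append({
            "age": float(fields[2]),               # field 3
            "amount": amount,
            "rate": float(fields[5]),              # field 6
            "time": float(fields[6]) * 12,         # field 7, years to months
        })
    return rows


rows = read_observations(RAW)
print(rows)
```

Because the comma is not in the delimiter set, "Smith,J" and "x,y" each stay in one field and the numeric columns keep their positions.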
57. each numeric field of the confidence interval and standard error reports. More significant digits are displayed if there is room. If the number of leading zeros becomes too large, that number will be printed in scientific notation (e.g. 1.2343E-56).

The variable PRINT_INFO, when TRUE, directs mle to print basic information about the model, including the method being used, the maximum number of iterations, the maximum number of function evaluations, and the criterion for normal convergence.

The PRINT_FREE_PARAMS variable, when TRUE, directs mle to print a list of all free parameters and the attributes of those parameters.

The variable PRINT_LLIKS controls printing of the individual likelihoods in a model. When set to TRUE, the likelihood and frequency for each observation will be printed to the output file.

Variables created by models

mle creates variables in order to access the results from previous runs, either within or outside of the MODEL statement. Each MODEL statement is numbered, beginning with 1, in the order in which they are found in the program. Furthermore, each run of the model (defined by the FULL or REDUCE statement) is numbered, beginning with 1, for each MODEL. The following variables are created:

<param> <m> <r>
<param> LOW <m> <r>
<param> HIGH <m> <r>
<param> START <m> <r>
<param> U
58. estimates suggest the log odds of recurrence by year 3 with zero days of treatment are 3.819. Paradoxically, the log odds of success decrease by about 0.086 with each extra day of treatment (roughly an 8 percent reduction in the odds per day).

Convergence at EPSILON 1.000E-0010
LogLikelihood -13.89411   AIC 31.788220   Del(LL) 1.367E-0014
Iterations 8   Function evaluations 824
Converged normally
Results with estimated standard errors (10 evals)
Solution with 2 free parameters
Name     Form       Estimate          Std Error        t                against
b_0      LOGISTIC   3.819417361125    1.739572481596   2.19560691005    0.0
b_days              -0.08648243176    0.041100225123   -2.1041838944    0.0

The resulting logistic curve can be plotted with a 95% confidence interval by replacing the RUN FULL part of the model statement with the following code:

RUN FULL THEN     Code for plotting the logistic curve with CIs
  PLOT
    set ylabel Probability of success
    set xlabel Treatment length days
    set yrange 0 1
    CURVE x 20 to 60 x LOGISTIC p x b_days END     curve
    CURVE WITH lines linetype 2 x 20 to 60 x LOGISTIC p x b_days + 1.96 SETRANSFORM LOGISTIC p x b_days END     curve upper CI
    CURVE WITH lines linetype 2 x 20 to 60 x LOGISTIC p x b_days - 1.96 SETRANSFORM LOGISTIC p x b_days END     curve lower CI
  END     plot
END     full then
END     model

[Figure: fitted logistic curve with 95% confidence interval. x-axis: Treatment length (days), 20 to 60; y-axis: Probability of success, 0 to 1.]
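The reported AIC and the shape of the fitted curve can be checked with a few lines of Python. This is a sketch under stated assumptions: the curve is taken to have the usual form logistic(b_0 + b_days * days), the sign of b_days is taken to be negative (consistent with the statement that the log odds decrease with treatment length), and AIC is computed as -2 * loglikelihood + 2 * (number of free parameters).

```python
import math


def logistic(z):
    """Standard logistic function, mapping log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


b_0 = 3.819417361125
b_days = -0.08648243176      # assumed negative; the sign is lost in the listing
loglik = -13.89411
free_params = 2

aic = -2.0 * loglik + 2.0 * free_params

# Predicted probability across the plotted range of treatment lengths.
curve = {days: logistic(b_0 + b_days * days) for days in range(20, 61, 10)}

# Multiplicative change in the odds for one extra day of treatment.
odds_ratio_per_day = math.exp(b_days)

print(f"AIC = {aic:.5f}")
for days, prob in curve.items():
    print(days, round(prob, 3))
print(f"odds ratio per day = {odds_ratio_per_day:.4f}")
```

The AIC computed this way agrees with the value in the output, which also confirms that the printed loglikelihood is negative.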
59. is type COMPLEX You can explicitly define a variable s type when the variable is first referenced in an assignment statement Here are some examples c STRING nine REAL t BOOLEAN ang2 COMPLEX x c would otherwise be CHAR SES nine would otherwise be INTEGER TRUE t is explicitly declared BOOLEAN it is the default ang REAL SIN 2 pi ang is explicitly declared REAL it is the default GAMMA 1 5 force ang2 to COMPLEX 136 Programming tutorial Table 8 Algebraic boolean and logical operators Operator Function Example Equivalent function uniary negation E x uniary positive power function POWER x y multiply function ULTIPLY x y divide function IVIDE x y integer divide function DIV x y integer modulo function ODF x y NDF x y M D I M boolean and logical and function A logical shift left function SHIFTLEFT x y S A S O logical shift right function HIFTRIGHT x y addition DD x y subtraction UBTRACT x y boolean and logical or function RF x y boolean and logical xor function XORF x boolean is equal function ISEQ x boolean not equal function ISNE x boolean less than function x boolean greater than function boolean less than or equal to function boolean greater than or equal to function Statements with numeric boolean and logical
60. later versions of DOS is called EDIT. Alternatives that come as part of Windows are NOTEPAD and WordPad. Even word processing programs like MS Word can be used, although you must be certain to save the programs as text files.

emle

A rudimentary editor is now available with Windows versions of mle. This section of the manual briefly describes the editor and its functions.

The editor can be started from the Start > Program menu. A window pops up that looks like this. Alternatively, the editor can be opened from a DOS command line. To do so, the emle.exe command must be in your path or current directory. The command emle myfile.mle will open the editor and load (or create) the file myfile.mle.

The text being edited is displayed in the black area of the screen (although the color can be changed). The top of the screen shows the current menu. The bottom of the screen shows status information. The first indicator means that the current file has been changed. The line number and column number come next. The Insert or OvrWrt indicates the mode the editor is in. Finally, the filename is given if a file is opened for editing.

Editor commands can be accessed through the keyboard; there is currently no mouse support. Keystrokes work as expected: the arrow keys navigate around the text, the <PgUp> and <PgDn> keys scroll up and down through the text, etc. Additionally menu items which are list
61. must not be assigned to a variable. mle pre-defines many built-in constants and variables, so you should avoid variable names that exist for some other purpose, such as an mle constant (a list of all variables appears in a later chapter). Likewise, mle uses the period as an internal delimiter for some purposes. Conflicts might arise if your variable names contain a period; you are free to use periods, but an underscore might be a better choice.

Field

The word FIELD refers to which column within an input file a variable is found in. In the hammes.dat file used in Chapter 1, four fields (or columns) existed in the input file. The field specifier must be a positive integer constant. A number of other elements can be added to a variable definition as well. These are defined below, but the grammar used for specifying each variable is

<variable name> FIELD x LINE y <expr> DROPIF <expr> KEEPIF <expr>

Line

Sometimes observations take up multiple lines in the data file. An example might be times to first birth for a married couple in which female characteristics appear on the first line and male characteristics occur on the second line. When the LINE keyword is used (e.g. LINE 2), mle keeps track of the maximum number of lines specified this way. Then all observations are assumed to have the maximum number of lines. If observations are each on one line, the statement LINE 1 may be d
62. name of data files created for use by the plot file. Suppose we wish to create a plot called sincos.plt. The statement PLOTFILE sincos.plt will create a plot file by that name. Information will be written to this file that defines the plot. The information comes from six places.

The PLOTFILE procedure writes an initialization string to the plot file. The string is stored in the variable GNUPLOTINIT. For example, in DOS-based operating systems this variable is initially set to

set terminal windows reset set data style lines set autoscale set nokey

These Gnuplot statements specify that the terminal is Windows, plot parameters will be reset, lines will be plotted by default, Gnuplot will figure out a good scale to use, and a graph key will not be generated. You can change this initialization string by assigning a new string to the PLOTINIT variable. Alternatively, you can keep this string as is and add new program lines using the WRITEPLOTLN statement, discussed next.

The WRITEPLOTLN and WRITEPLOT procedures provide a simple way of writing Gnuplot statements directly to the plot file. These statements must be used after the PLOTFILE statement. For example, if you want to add a title to the plot, you can use the statement WRITEPLOTLN set title Sin and Cos functions. You can insert any Gnuplot statement into the plot file this way. The difference between WRITEPLOTLN and WRITEPLOT is that the former adds a newline after writin
63. not strictly continuous. Occasionally difficulties arise with round-off error because of the discrete computer representation of real numbers.

either TRUE or FALSE and decides which of the remaining two expressions will be evaluated and returned. An example of a boolean expression is this: 3 5 4 5, which returns the value FALSE.

• String variables hold a sequence of character constants. A string written as a constant is a sequence of characters enclosed within quotes. The single quote character can be used as well for strings greater than one character (see Character, below, for an explanation). String variables are typically used to assign file names, titles, etc. Some functions take on string or character variables; other functions return strings. For example, the CONCAT(s1, s2) function will add together two string variables and return the result as a longer string.

• Character variables take on the value of a single character. When written as a constant in a program, character constants consist of a single character enclosed within single quotes. Character constants are not typically used within a user's program, but are available if needed. Usually, character constants and variables can be used anywhere string variables are allowed.

• File variables are used to reference files. Most of the time, file variables are transparent and you need not explicitly define or manipulate file variables. This i
64. of iterations (default 100). INTEGRATE_TOL is the convergence criterion (default 1.0E-0006).

INTEGRATE_METHOD   INTEGER
INTEGRATE_N        INTEGER   100
INTEGRATE_TOL      REAL      0.00000100000

Debugging Options

A number of command line options assist in debugging models, data files, program options, numerical methods, and the mle program interpreter itself (see Table 1). The -dx option provides a way of tracing the execution of each statement in turn. The -dl option is useful for examining likelihoods every time a complete likelihood is computed. More advanced debugging options assume some familiarity with the internal workings of parsers and symbol tables, and an advanced understanding of likelihood estimation. The -di option offers help with debugging problems of numerical integration in mle.

The debugging and help options send output to the screen (or standard output device). The standard DOS and Unix redirection symbols > and | can be used to redirect the output to other devices. For example, the command

mle -d 25 test.mle > test.dbg

will create a (possibly large) file called test.dbg. The output file specified within the test.mle program will not be affected.

Other Options: testing number formats

mle supports many formats for numbers. Each number begins with a numeral, but can contain additional symbols to specify different meanings. A full discussion of the number formats is given in the data chapter. You can test the way i
65. or created. This chapter discusses aspects of creating a data set, including:

• How to read a data set into mle
• How to set up a data file
• How to transform variables
• How to drop unwanted observations
• The number formats recognized by mle

Reading data from a file

Data sets are read into mle from an input file. They consist of at least one, and usually many, observations. Each observation is a collection of one or more variables. The mle DATA statement defines how observations are to be read from a file. The DATA statement also has mechanisms for doing transformations to the data as they are being read. In the current implementation of mle, the transformations and other data manipulations provided by the DATA statement are adequate for most tasks, but are not particularly powerful. Other programs (spreadsheets or database managers, for example) can be used for complicated data transformations, and the resulting data set can then be used by mle.

Naming the data file

Data sets are created by a DATA statement. The DATA statement typically works by reading observations from a data file. This file must be named and opened with a call to the DATAFILE procedure. The call to DATAFILE is usually placed near the top of the program, before the DATA statement, as in the example in Chapter 1. The DATA statement begins with the word DATA and is terminated by a matching END. So, if the name of the data file is MYDATA.DAT, you include the stateme
66. sin x 2x from Vr to Vn. Here is an example of how that could be coded:

INTEGRATE x SQRT PI SQRT PI SIN x 2 2 x END

The function evaluates to 1.525. Here it is with comments:

INTEGRATE x       x is the variable of integration
  SQRT PI         This is the lower limit of integration
  SQRT PI         This is the upper limit of integration
                  Close of the argument list
  SIN x 2 2 x     The function to be integrated
END               End of the integrate function

Any of the predefined probability density functions can be used as part of an expression. For example, the area under a normal distribution with μ = 10 and σ = 3 between 8 and 12 could be calculated by PDF NORMAL 8 12 10 3 END.

The DATA function

The DATA ... END function provides a mechanism to feed observations to the likelihood. This function specifies that observations are to be fed to the likelihood one at a time, corresponding to the product over all observations shown in likelihoods, or the sum shown in loglikelihoods. The DATA function loops through all observations that were previously read in by the DATA statement. In other words, the DATA ... END function returns the total loglikelihood (or total likelihood), given a series of observations and an expression for an individual likelihood (or individual loglikelihood). The general form for the DATA function is

DATA <optional_form> <expression> END

where optional_form is one of
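The normal-distribution area mentioned above (mean 10, standard deviation 3, between 8 and 12) can be verified independently. The sketch below is plain Python and does not call mle; it uses the error function from the standard library to evaluate the normal cumulative distribution.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal(mu, sigma) variable."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Area under the normal density with mean 10 and sd 3 between 8 and 12.
area = normal_cdf(12, 10, 3) - normal_cdf(8, 10, 3)
print(round(area, 4))
```

The interval covers two thirds of a standard deviation on either side of the mean, so the area is a little under one half.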
67. table of contents

mle user's manual

Brief table of contents

Table of contents

PRELIMINARIES
  The Program File
  The Output
  Skeleton of an mle Program
AN EXAMPLE
  Program Constants and Variables
  Comments
  Reading Data
  Likelihood Model
  Model...Run Part of the Model Statement
  Run...End Part of the Model Statement
  A Note About Parameters
WRITING MLE PROGRAMS
  Conventions
  Typographic Conventions
  Assignment Statement
  Variable Names
  Variable Types
  Array Variables
  Initialized Array Variables
  Data Statement
  Model Statement
  Intrinsic Procedures
  User defined Procedures
  User defined Functions
  BEGIN END Statement
  FOR Statem
68. the likelihood. In this example, we specify the likelihood

$$L(\mu, \sigma) = \prod_{i=1}^{N} \left[ S(t_{open_i} \mid \mu, \sigma) - S(t_{close_i} \mid \mu, \sigma) \right]^{frequency_i} \qquad (1)$$

where N is the number of age categories (i.e. the number of lines of observations), frequency is the frequency of observations per age category, S() is the survival function for the normal distribution, t_open and t_close are the two times read from the data file into the variables topen and tclose, and μ and σ are the parameters that will be found by maximizing the likelihood.

The first part of the likelihood expression is a DATA ... END function. This function specifies that observations are to be fed to the likelihood one at a time, corresponding to the product shown in the likelihood above. Do not confuse the DATA function found within the MODEL statement with the DATA statement discussed above. The DATA function loops through all observations that were previously read in by the DATA statement. Within the DATA ... END function comes the rest of the likelihood, which is shown to the right of the product sign in likelihood (1).

Within the DATA ... END function is the individual likelihood. As parameter estimates are being found, the individual likelihood is evaluated for each observation and the log of that likelihood is taken. Each individual loglikelihood is multiplied by the frequency of the current observation and added to the total loglikelihood. In short, the DATA ... END function takes a series of o
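The accumulation performed by the DATA function, as described above, can be sketched in Python: for each observation, take the log of the individual likelihood, weight it by the frequency, and add it to a running total. The observations and parameter values below are hypothetical, and the normal survival function is written out with the standard library rather than taken from mle.

```python
import math


def normal_survival(t, mu, sigma):
    """S(t): area under the normal(mu, sigma) density to the right of t."""
    return 0.5 * math.erfc((t - mu) / (sigma * math.sqrt(2.0)))


def total_loglikelihood(observations, mu, sigma):
    """Frequency-weighted sum of individual loglikelihoods."""
    total = 0.0
    for topen, tclose, frequency in observations:
        # Individual likelihood: probability of an event inside the interval.
        individual = (normal_survival(topen, mu, sigma)
                      - normal_survival(tclose, mu, sigma))
        total += frequency * math.log(individual)
    return total


# Hypothetical interval-censored observations: (topen, tclose, frequency).
obs = [(250, 260, 3), (260, 270, 10), (270, 280, 12), (280, 290, 4)]

ll_near = total_loglikelihood(obs, mu=270.0, sigma=10.0)
ll_far = total_loglikelihood(obs, mu=300.0, sigma=10.0)
print(ll_near, ll_far)
```

A maximizer would repeat this evaluation for many candidate values of mu and sigma and keep the pair with the largest total; parameters close to the bulk of the data give the higher loglikelihood.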
69. the main authors are Thomas Williams, Colin Kelley, Russell Lang, Dave Kotz, John Campbell, Gershon Elber, and Alexander Woo.

How to Obtain Gnuplot

mle requires Gnuplot version 3.7 or later. Gnuplot and its documentation can be downloaded from many ftp and web sites. Gnuplot can be downloaded and compiled on your computer system. For some platforms, particularly DOS and Windows, executable packages are commonly available. Here are some ways of obtaining Gnuplot:

• The official ftp distribution site for the Gnuplot source is ftp.dartmouth.edu. The file is called pub/gnuplot/gnuplot3.7.tar.Z.
• Most comp.sources.misc archive sites distribute Gnuplot.
• Executable versions of Gnuplot for MS-DOS and MS-Windows are available from oak.oakland.edu (141.210.10.117) as pub/msdos/plot/gpt37.zip, garbo.uwasa.fi (Europe) (128.214.87.1) as pc/plot/gpt37.zip, and archie.au (Australia) (139.130.4.6) as micros/pc/oak/plot/gpt37.zip. The files are gpt37doc.zip, gpt37exe.zip, gpt37src.zip, and gpt37win.zip.
• OS/2 2.x binaries are at ftp-os2.nmsu.edu (128.123.35.151) in os2/2.x/unix/gnu/gplt37.zip.
• Many other web sources are available. Give the name Gnuplot to any major search engine to find a location near you.
• Most sites that distribute software under the Free Software Foundation GNU Public License also distribute Gnuplot.
• Many Linux distributions contain Gnuplot as a package.

Basics of Gnuplot

Full d
70. the DATA function is specified with a model, a variable called D_IDX is initialized to the value of 1. When D_IDX is 1, any reference to the DATA variables returns the value of the first observation. Thus, the variable age yields the value 42. As each likelihood within the DATA function is computed, the value of D_IDX is incremented up to the last observation.

The total number of observations read by the DATA statement is accessed by the variable N_OBS. This variable is assigned the count of lines of observations read in (assuming one line per observation) and kept (i.e. not dropped). However, this variable is incorrect if a single line represents more than one observation. For example, if the FREQUENCY variable is defined and some observations have frequencies other than one, then N_OBS will no longer represent the correct number of observations. Another variable, TOTAL_OBS, is the sum over all FREQUENCY observations and can be used as a count of the total number of observations.

Internally, variables are stored as special array variables. Whenever a data variable name is specified, the value of D_IDX is used as the index into the array. All observations are easily accessed outside of the DATA, LEVEL, or LEVELDELTA functions by directly manipulating D_IDX. Here is an example that builds on the previous example. The following code, which is placed after the DATA statement, counts and prints the number of observations under and ove
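The distinction between N_OBS (lines kept) and TOTAL_OBS (sum of frequencies) can be illustrated with a short Python sketch. The data values are hypothetical apart from the first age of 42 mentioned in the text, and the cutoff age of 30 is an arbitrary choice for the example; the loop index plays the role that D_IDX plays in mle.

```python
# Hypothetical data set: one line per entry, each with a frequency.
ages = [42, 17, 63, 29]
frequency = [1, 3, 2, 1]

n_obs = len(ages)            # count of lines read and kept (like N_OBS)
total_obs = sum(frequency)   # sum over all frequencies (like TOTAL_OBS)

# Count individuals under and over an arbitrary cutoff, stepping through
# the observations by index the way D_IDX steps through the data arrays.
cutoff = 30
under = 0
over = 0
for idx in range(n_obs):
    if ages[idx] < cutoff:
        under += frequency[idx]
    else:
        over += frequency[idx]

print(n_obs, total_obs, under, over)
```

With frequencies other than one, the line count understates the number of individuals, which is why the frequency-weighted total is the safer count.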
71. times: the time an individual was enrolled for prospective observation (t_α), the last time an individual was observed as alive (t_u), and the first time the individual was known to be dead (t_e). The first time, t_α, defines the left truncation point; t_u and t_e define an interval within which death took place. For right censored observations, t_e is set to infinity or a number greater than the human lifespan. The likelihood is

$$L(\theta \mid t_\alpha, t_u, t_e) = \prod_{i=1}^{N} \frac{S(t_{u_i} \mid \theta) - S(t_{e_i} \mid \theta)}{S(t_{\alpha_i} \mid \theta)} \qquad (6)$$

From this likelihood it can be seen that an individual's probability of death is the area under the pdf between t_u and t_e, divided by the area from t_α to infinity, which renormalizes the pdf for the period of actual observation. An individual likelihood is constructed in mle as

PDF SILER 14 15 6 0.05 0.3 0.0 0.001 0.05 END

which represents a person who died between ages 14 and 15 and was enrolled in the study at age 6.

MLE
  TITLE Example
  DATAFILE ex5.dat
  OUTFILE ex5.out
  DATA
    talpha FIELD 1     Left truncation time
    topen  FIELD 2     time last known alive
    tclose FIELD 2     time first known dead, or oo if censored
  END
  MODEL
    DATA
      PDF SILER topen tclose talpha
        PARAM a1 LOW 0.00001 HIGH 0.5 START 0
        PARAM b1 LOW 0.01 HIGH 2 START 0
        PARAM a2 LOW 0 HIGH 1 START 0
        PARAM a3 LOW 0.0000 HIGH 1 START 0
        PARAM b3 LOW 0.00001 HIGH 1 START 0
      END
    END
  RUN FULL END of END o
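The individual likelihood described above (the area between the last-alive and first-dead times, divided by the area to the right of the left truncation time) can be sketched in Python. This assumes the conventional Siler hazard, a1*exp(-b1*t) + a2 + a3*exp(b3*t), with parameters in the order a1, b1, a2, a3, b3; whether mle uses exactly this parameterization and ordering is an assumption, and the function names are hypothetical.

```python
import math


def siler_survival(t, a1, b1, a2, a3, b3):
    """Survival under the hazard a1*exp(-b1*t) + a2 + a3*exp(b3*t)."""
    cumulative_hazard = ((a1 / b1) * (1.0 - math.exp(-b1 * t))
                         + a2 * t
                         + (a3 / b3) * (math.exp(b3 * t) - 1.0))
    return math.exp(-cumulative_hazard)


def individual_likelihood(t_alpha, t_u, t_e, params):
    """[S(t_u) - S(t_e)] / S(t_alpha); an infinite t_e means right censored."""
    s_alpha = siler_survival(t_alpha, *params)
    s_u = siler_survival(t_u, *params)
    s_e = 0.0 if math.isinf(t_e) else siler_survival(t_e, *params)
    return (s_u - s_e) / s_alpha


params = (0.05, 0.3, 0.0, 0.001, 0.05)

# Enrolled at age 6, died between ages 14 and 15.
died = individual_likelihood(6, 14, 15, params)

# Enrolled at age 6, last seen alive at 14, right censored.
censored = individual_likelihood(6, 14, math.inf, params)

print(died, censored)
```

Dividing by the survival at the enrollment age conditions on the fact that the individual was alive when observation began, which is the renormalization the text refers to.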
72. variables hold a sequence of character constants A string written as a constant is a sequence of characters enclosed within quotes The single quote character can be used as well for strings greater than one character String variables are typically used to assign file names titles etc Character variables take on the value of a single character When written as a constant in a program character constants consist of a single character enclosed within single quotes Character constants are not typically used within a user s program but are available if needed Usually character constants and variables can be used anywhere string variables are allowed File variables are used to reference files Most of the time file variables are transparent and you need not explicitly define or manipulate file variables This is because mle defines and does the bookkeeping for the data file the output file the plot file and the screen or standard output file File variables can be created should you wish to create and manipulate other files Here are some examples largely self explanatory of typical assignment statements large_data subtitle nine five one onealso N_OBS gt 5000 large_data is declared as type BOOLEAN Analysis of INFILE subtitle is declared as type STRING 3 F330 nine is type REAL 2 3 five is type INTEGER SIN 23 2 COS 23 2 one is type real SIN 23 0i 2 COS 23 2 onealso
73. $\mu_i = \mu \exp(x_i\beta)$, where $\mu$ is the estimated intrinsic parameter (mean in this case). Thus, for the ith observation, the $\mu$ parameter of the normal distribution will be constructed as $\mu_i$ = mean × exp(sex × b_sex + weight × b_weight). The second parameter, stdev, has the same two covariates modeled on it, but the parameter names are (and must be) different from the parameters modeled on mean.

For some forms, the parameter itself is transformed. For example, when a parameter is a probability, as it is for the MIX function above, the parameter can be defined as

    PARAM p LOW = -999 HIGH = 999 START = 0 FORM = LOGISTIC END

The logistic transformation permits the parameter p to take on any value from negative infinity to infinity, but the resulting value used by the likelihood will be constrained to the range (0, 1). In other words, mle will estimate a parameter over the range -999 to 999, but before that parameter is used in computation it will undergo a logistic transformation as $p' = 1/[1 + \exp(-p)]$, so that the value of $p'$ will be a probability.

mle currently provides a limited number of specifications for how parameters and covariates are modeled (see the Reference Manual). Even so, this mechanism for modeling covariates on any parameter is extremely general and provides the basis for building unique and highly mechanistic (Box et al. 1978) or etiologic (Wood 1994) models.

The PDF functions

One of the most frequently used functions in the MO
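The logistic transformation described in this excerpt can be illustrated outside of mle. The following is a minimal Python sketch, not mle code; the function names `logistic` and `logit` are chosen here for illustration only.

```python
import math


def logistic(p_raw: float) -> float:
    """Map an unconstrained value onto a probability, p' = 1/(1 + exp(-p))."""
    # Use the algebraically equivalent form for negative inputs so that
    # math.exp() is never called with a large positive argument.
    if p_raw >= 0:
        return 1.0 / (1.0 + math.exp(-p_raw))
    z = math.exp(p_raw)
    return z / (1.0 + z)


def logit(p: float) -> float:
    """Inverse transformation: recover the unconstrained value from a probability."""
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    for raw in (-999.0, -2.0, 0.0, 2.0, 999.0):
        print(raw, logistic(raw))
```

In floating point the extreme bounds -999 and 999 saturate to 0 and 1, which is why a maximizer can search the whole unconstrained range without ever handing the likelihood an invalid probability.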
74. within a procedure, but will not be available outside that procedure.
• Procedures can overwrite the name of intrinsic procedures.

User defined Functions

mle provides capabilities for user-defined functions. A function is a single-word command that takes a list of zero or more arguments, performs some operation, and returns a result. User-defined functions in mle are very similar to Pascal's user-defined functions. They must be understood as two components: the function definition and a call to the function. A user-defined function must be defined prior to being called. By convention, they are usually placed near the beginning of the program. Here is an example of a user-defined function being defined and later used:

    MLE
      FUNCTION int_power(a: REAL; j: INTEGER): REAL  {raises a to integer power j}
        RETURN = 1.0
        WHILE j > 0 DO
          IF ISODD(j) THEN RETURN = RETURN*a END {if}
          a = a*a
          j = j DIV 2
        END {while}
      END {int_power}
      WRITELN int_power SQRT 4 int_power 4 5 2 int_power 10 2 3

The definition begins with the word FUNCTION and ends with the corresponding END. The word following FUNCTION is the name of the function, in this case int_power. The name is followed by a list of 0 or more arguments that are formally defined; that is, a name and type must be specified for each argument. In this example two arguments, a and j, are defined. The argument names and all of the variables defined within the function are pr
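The int_power example uses exponentiation by repeated squaring. The same loop, transcribed into Python as a hedged illustration (this is a translation for readers, not mle syntax), looks like this:

```python
def int_power(a: float, j: int) -> float:
    """Raise a to the non-negative integer power j by repeated squaring."""
    result = 1.0
    while j > 0:
        if j % 2 == 1:      # ISODD(j): this binary digit of j contributes a factor
            result *= a
        a *= a              # square the base for the next binary digit
        j //= 2             # j DIV 2: shift to the next binary digit
    return result


if __name__ == "__main__":
    print(int_power(2.0, 10))
    print(int_power(1.5, 3))
```

The loop runs once per binary digit of j, so the number of multiplications grows with log2(j) rather than with j itself.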
75. you can modify the Brent maximizer. First, the maximum number of iterations in a single dimension can be set with BRENT_ITS; the default value is sufficient for almost every function. The next modification is to change the value of BRENT_MAGIC to some other number. This number defines the interpolation point between two points of a parabola, the so-called golden mean of ancient Greece. With such a heritage, there is little reason to change it. Finally, the value BRENT_ZERO is an arbitrary tiny number used in place of zero for the difference of two equal function evaluations.

Simulated Annealing Method

The simulated annealing method is an exciting and relatively new idea in maximization. It was first proposed by Kirkpatrick et al. (1983) for combinatorial problems. The algorithm was further developed for functions of continuous variables by Corana et al. (1987) and refined by Goffe et al. (1994); both papers lucidly describe how the method works.

As a metal is heated to its melting point, it loses its crystalline organization. Then, as it cools again, the crystalline pattern reemerges. When cooled slowly, a process called annealing, small crystals of metal rearrange themselves and join other crystals with maximum orderliness, or minimum energy. This occurs as random movements of atoms and groups of atoms eventually fall into alignments that minimize gaps. Once these structured alignments arise, they form a la
76. …0001 HIGH = 3 START = 0.2 END
            END {pdf}
            PDF NORMAL(topen, tclose)
              PARAM mu LOW = 10 HIGH = 100 START = 30 END
              PARAM sigma LOW = 0.0001 HIGH = 10 START = 1 END
              HAZARD COVAR z 1
            END {pdf}
          END {integrate}
        END {level}
      END {data}
      RUN
        FULL
      END {model}
    END {mle program}

The LEVEL statement advances through all of the individual-level observations and computes the product of the likelihoods for each individual. The DATA statement only sees observations that begin with a 1, because the LEVEL statement consumes all of the observations that begin with a 2. The LEVEL statement returns a likelihood that is the product of likelihoods taken within each subject; the DATA statement takes those likelihoods (one per subject), takes the natural log of each, and sums them over all subjects.

The LEVELDELTA function

The LEVELDELTA function is very similar to the LEVEL function. LEVELDELTA provides a mechanism by which multilevel or hierarchical models can be constructed. The syntax of the LEVELDELTA function is

    LEVELDELTA <expression> THEN <optional_form> <expression> END

The effect of the LEVELDELTA function is to evaluate <expression> for each observation and, while the expression does not change, form a product of likelihoods out of the observations. The <optional_form> is specified a
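The two-level bookkeeping described here (a product of likelihoods within each subject, then a sum of logs across subjects) can be sketched in Python. This is only an illustration of the grouping logic under simplifying assumptions: the integration over z is left out, and the names `grouped_loglik`, `key`, and `lik` are hypothetical, not part of mle.

```python
import math
from itertools import groupby


def grouped_loglik(rows, key, lik):
    """Sum over groups of log(product of per-row likelihoods).

    rows : observations, ordered so that each subject's rows are adjacent
    key  : function returning the grouping value (the LEVELDELTA expression)
    lik  : function returning the likelihood of one row
    """
    total = 0.0
    # groupby starts a new group whenever the key value changes,
    # which mirrors "while the expression does not change".
    for _, group in groupby(rows, key=key):
        product = 1.0
        for row in group:
            product *= lik(row)          # inner level: product within a subject
        total += math.log(product)       # outer level: sum of logs over subjects
    return total


if __name__ == "__main__":
    # (subject id, likelihood of that observation) pairs, purely made up
    data = [(1, 0.5), (1, 0.5), (2, 0.25)]
    print(grouped_loglik(data, key=lambda r: r[0], lik=lambda r: r[1]))
```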
77. …1 HIGH = 50 START = 3 END
          END {of the PDF}
        END
      RUN
        FULL
      END {of the MODEL}
    END

From this specification of covariates, the $\mu$ intrinsic parameter of the normal distribution will be computed for the ith observation as $\mu_i$ = mu × exp(weight × b_weight + age × b_age).

Survival analysis: Hazards model

An alternative to the accelerated failure time model is the hazards model. Under the hazards model, the effect of covariates is to raise or lower the hazard by some amount. In general, if $h(t)$ is the hazard function, covariates for the ith individual, $x_i\beta$, are modeled on the hazard as $h_i(t) = h(t)\exp(x_i\beta)$. Most of the probability density functions in mle provide a mechanism for modeling the effects of covariates on the hazard. You can find out for any particular pdf by typing, for example, mle -h lognormal. A message will tell you whether or not covariates can be modeled on the hazard.

In this example, the same normal distribution used in the previous example has had the two covariates moved from affecting $\mu$ to affecting the hazard.

    MLE
      TITLE = "Example"
      DATAFILE("ex8.dat")
      OUTFILE("ex8.out")
      DATA
        topen   {Last observation time prior to the event}
        tclose  {First observation time after the event}
        weight  {the first covariate}
        age     {the second covariate}
      END
      MODEL
        DATA
          PDF NORMAL(topen, tclose)
            PARAM mu LOW = 0.00001 HIGH = 100 START = 25 END
            PARAM s LOW = 0.01 HIGH = 50 START = 3 END
            HAZARD COVAR weight PARAM b
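Multiplying the hazard by exp(xβ) multiplies the cumulative hazard by the same factor, so the survival function becomes the baseline survival raised to the power exp(xβ). A minimal Python sketch of an interval-censored likelihood under that relationship follows. It assumes a normal baseline, and the function names are hypothetical; it does not claim to reproduce mle's internal computation.

```python
import math


def normal_survival(t: float, mu: float, sigma: float) -> float:
    """Baseline survival S(t) = 1 - Phi((t - mu)/sigma) for a normal distribution."""
    return 0.5 * math.erfc((t - mu) / (sigma * math.sqrt(2.0)))


def ph_interval_likelihood(t_open, t_close, mu, sigma, covariates, betas):
    """P(event in (t_open, t_close]) when h_i(t) = h(t) * exp(x_i . beta)."""
    relative_hazard = math.exp(sum(x * b for x, b in zip(covariates, betas)))
    # S_i(t) = S(t) ** relative_hazard under proportional hazards
    s_open = normal_survival(t_open, mu, sigma) ** relative_hazard
    s_close = normal_survival(t_close, mu, sigma) ** relative_hazard
    return s_open - s_close


if __name__ == "__main__":
    # weight = 70, age = 30 with made-up coefficients
    print(ph_interval_likelihood(20.0, 30.0, 25.0, 3.0, [70.0, 30.0], [0.01, -0.02]))
```

With all coefficients set to zero the relative hazard is 1 and the result reduces to the ordinary baseline interval probability.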
78. 10510011 4 22599601366 0 0

The results are nearly identical to the regression results presented earlier. All parameters of the likelihood model are given with a standard error.

For a series of data that are complete, as given in this example, there is little advantage to using maximum likelihood for parameter estimation. Maximum likelihood methods are most useful under some simple modifications of the data or model used above. Suppose that, in addition to the above observations, we had several observations that were less than the minimum or greater than the maximum value of y that could be measured by our instrumentation. The maximum likelihood model could accommodate such observations with ease. Another modification might be to change the underlying distribution to something other than a normal. For example, the error term could take on an extreme value distribution or a Laplace distribution. Again, the likelihood framework easily accommodates such modifications.

Case study: Mortality models

Estimation of age-at-death distributions from skeletal indicators is an important task for ecologists and anthropologists alike. This case study discusses some likelihood models to estimate such distributions. The simplest case arises when exact skeletal ages at death are known for a representative sample of N skeletons covering the entire life span. Call f(a) the probability density function that represents the age-at-death distribut
79. ERRORBARS

Additional expressions within CURVE … END define things like error bars. Gnuplot provides two standards for error bars. If only one additional error bar expression exists, that value is taken as a delta value to add to and subtract from the y value. If two error bar expressions exist, the values are taken as the minimum and maximum values, respectively, for the error bars. Here is an example of plotting error bars for a binomial experiment involving 40 observations:

    MLE
      {Plots the probabilities of observing x boys in families of exactly 5 children.}
      {Also plots the standard errors for each outcome, assuming that a sample of fam families are observed.}
      n = 5       {bernoulli trials for families of size 5}
      p = 0.502   {probability of a male child per trial}
      fam = 40
      PLOTFILE(DEFAULTPLOTNAME)
      PLOT
        "set yrange [0:]"
        "set xrange [-0.25:" REAL2STR(n + 0.25, 6, 2) "]"
        CURVE WITH "errorbars" (x, 0 TO n)
          x                          {x axis value}
          PDF BINOMIAL(x) p, n END   {y axis value}
          SQRT(p*(1 - p)/fam)        {errorbar delta}
        END {curve}
      END {plot}
    END {mle}

The Gnuplot file and graph resulting from this program look like this:

    set terminal windows
    reset
    set data style lines
    set autoscale
    set nokey
    set yrange [0:]
    set xrange [-0.25:5.2500]
    plot 'eg5.001' using 1:2:3 notitle with errorbars

[Figure: binomial probabilities for 0 to 5 boys, plotted with error bars.]

Other strings

A series of one or more string expressions can
80. …3 END {model}

The single WITH keyword creates a total of eight models. All of the models include the parameter b_0, and all models will be created from the list b_1, b_2, b_3. The eight models estimated from this single WITH statement contain the following parameters:

    b_0
    b_0 b_1
    b_0 b_2
    b_0 b_3
    b_0 b_1 b_2
    b_0 b_1 b_3
    b_0 b_2 b_3
    b_0 b_1 b_2 b_3

The use of parameters within parentheses in the <withlist> raises the issue of the number of models that will be created. Since each parameter has two states (included and not included), there are $2^K$ models formed, where K is the number of parameters given in parentheses. The practical use of WITH in this way depends on how quickly a single model solves. With eight parameters there are 256 models estimated. At 10 parameters the number is 1024, and 15 parameters yields 32768 models.

Each of the keywords FULL, REDUCE, and WITH can be followed by an optional THEN … END clause, which gives you a way to do something after a particular model (or set of models, for WITH) is solved. For example, you could insert code to transform the parameters from one form into another, plot distributions, or write results to another file. Most legal statements can come between the THEN and END, except DATA … END and MODEL … END statements.

Bayesian model averaging

The WITH keyword can generate many models from a single line of text. Ideally the uncertai
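The way parenthesized parameters expand into 2^K models can be sketched in Python. This only enumerates parameter subsets; the function name `with_models` and its arguments are hypothetical, and nothing here estimates a model.

```python
from itertools import combinations


def with_models(always, optional):
    """List every model: the 'always' parameters plus each subset of 'optional'."""
    models = []
    for size in range(len(optional) + 1):
        for subset in combinations(optional, size):
            models.append(list(always) + list(subset))
    return models


if __name__ == "__main__":
    for model in with_models(["b_0"], ["b_1", "b_2", "b_3"]):
        print(" ".join(model))
    # The count doubles with every additional optional parameter.
    for k in (8, 10, 15):
        print(k, len(with_models([], [f"b_{i}" for i in range(k)])))
```

Because the count doubles with each optional parameter, the time to solve a single model is the practical limit on how many parameters can go inside the parentheses.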
81. 36 18 25 FUNCTION thomas k INTEGER a REAL b REAL REAL returns the pdf for the thomas dist count k and parameters a and b RETURN EXP a SUMMATION j 0 k a 3 FACT 3 EXP 3j b 3 b k 3 FACT k 3 END summation END function thomas DATAFILE armeria dat OUTFILE DEFAULTOUTNAME PLOTFILE DEFAULTPLOTNAME INPUT_SKIP 3 DATA numb_plants FIELD 1 numb_quadrants FIELD 2 FREQUENCY numb_quadrants END TITLE Thomas distribution MODEL PREASSIGN BEGIN a PARAM aa LOW 0 0001 HIGH 20 START 2 0 END b PARAM bb LOW 0 0001 HIGH 40 START 0 5 END DATA thomas ROUND numb_plants a b END END preassign RUN FULL END Plot obs exp of quadrants with k plants under the Thomas distribution PLOT set title Thomas distribution set xrange 0 5 10 5 set key top right CURVE KEY Expected WITH boxes 0 TO 10 i 100 thomas i aa 1 1 bb 1 1 END CURVE KEY Observed WITH impulses d_idx 1 TO 11 numb_plants numb_quadrants END END plot TITLE Poisson distribution MODEL DATA PDF POISSON numb_plants PARAM m LOW 0 001 HIGH 100 START 1 5 END END END RUN FULL END Plot the obs amp exp of quadrants with k plants under the Poisson distribution WRITEPLOTLN pause 1 PLOT set title Poisson distribution set xrange 0 5 10 5 set key top right CURVE KEY Expected WITH boxes 0 TO 10 i 100 PDF POISSON i m 2 1 END END
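The user-defined thomas function in this listing appears to compute the Thomas distribution as a sum over the number of clusters j. The punctuation of the listing did not survive extraction, so the Python sketch below is a reconstruction from the standard form of that distribution, not a verified transcription of the mle code.

```python
import math


def thomas_pmf(k: int, a: float, b: float) -> float:
    """P(K = k) for the Thomas distribution with parameters a and b.

    Clusters arrive as Poisson(a); each cluster holds 1 + Poisson(b) individuals.
    """
    total = 0.0
    for j in range(k + 1):
        poisson_clusters = a ** j / math.factorial(j)
        # Given j clusters, the k - j extra individuals are Poisson(j*b).
        # Python evaluates 0.0 ** 0 as 1.0, which handles the k = 0 case.
        extras = math.exp(-j * b) * (j * b) ** (k - j) / math.factorial(k - j)
        total += poisson_clusters * extras
    return math.exp(-a) * total


if __name__ == "__main__":
    a, b = 2.0, 0.5
    for k in range(6):
        print(k, thomas_pmf(k, a, b))
```

A quick sanity check on any such reconstruction is that the probabilities sum to one and that the mean equals a(1 + b).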
82. • <variable> is the name of the variable being defined. The variable must not already exist. All variables created by the DATA statement are defined to be type real. Integer values will be read in from the data file and converted to real numbers. Text strings can exist within fields of a text file but must not be assigned to a variable.
• FIELD refers to the column within an input file in which a variable is found. In the hammes.dat file, four fields (or columns) existed in the input file. The field specifier must be a positive integer constant.
• LINE provides a way to read observations spread across multiple lines in the data file. When the LINE keyword is used, the maximum number of lines specified (e.g., 2 for LINE 2) is taken as the number of lines for all observations. If observations each take but one line, the statement LINE 1 may be dropped; one line per observation is assumed as a default. The line specifier must be a positive integer constant.
• <expr> defines a data transformation expression. The expression may refer to the variable being read or to any variables defined prior to the current variable. The line newvar FIELD 3 = newvar^2 will read newvar from field three of the data file; the value of newvar is then squared and assigned back to newvar.
• DROPIF provides a mechanism to drop observations. The expression following DROPIF will evaluate to TRUE or FALSE. If TRUE, the observat
83. …5 START = 0.01 END
            PARAM b LOW = -2 HIGH = 0 START = -0.1 END
          END {of the PDF}
        END
      RUN
        FULL
      END {of the MODEL}
    END

Survival analysis: Left and right truncated observations

This example extends the previous one by including both left and right truncation as well as interval-censored observations. We will use a child mortality example again, but now each child is recruited at some age from 0 to 5 years. Their risk will be left truncated at the age of entry. Again, only children who die before age 5 would be included in the analysis, so that all exposures are right truncated. Finally, children are periodically visited, so all observations are interval censored. Again, we will use the Gompertz competing hazards mortality model for this fictitious prospective study of child mortality. The likelihood is

$$L = \prod_{i=1}^{N} \frac{S(t_{oi} \mid \theta) - S(t_{ei} \mid \theta)}{S(t_{\alpha i} \mid \theta) - S(t_{\omega i} \mid \theta)}$$

From this likelihood it can be seen that an individual's probability of death is the area under the pdf between $t_o$ and $t_e$, divided by the area from $t_\alpha$ to $t_\omega$, which renormalizes the pdf for the period of actual observation. An individual likelihood is constructed in mle as

    PDF GOMPERTZ(topen, tclose, talpha, tomega) 0.05, 0.3 END

For example,

    PDF GOMPERTZ(2.1, 2.4, 1.0, 5.0) 0.05, 0.3 END

returns the probability that a child enrolled in the study at age one, and selected for having died by age five, died between the ages of 2.1 and 2.4.
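The truncated, interval-censored likelihood above can be sketched in Python for a single observation. The Gompertz parameterization used here, hazard h(t) = a·exp(b·t), is an assumption made for illustration and may differ from mle's built-in GOMPERTZ, so the printed values should not be read as mle output. The function names are hypothetical.

```python
import math


def gompertz_survival(t: float, a: float, b: float) -> float:
    """S(t) for the assumed hazard h(t) = a*exp(b*t), with b != 0."""
    return math.exp(-(a / b) * (math.exp(b * t) - 1.0))


def truncated_interval_likelihood(t_open, t_close, t_alpha, t_omega, a, b):
    """[S(t_open) - S(t_close)] / [S(t_alpha) - S(t_omega)].

    Pass math.inf for t_omega when there is no right truncation, and
    math.inf for t_close when the observation is right censored.
    """
    numerator = gompertz_survival(t_open, a, b) - gompertz_survival(t_close, a, b)
    denominator = gompertz_survival(t_alpha, a, b) - gompertz_survival(t_omega, a, b)
    return numerator / denominator


if __name__ == "__main__":
    # Enrolled at age 1, selected for death before age 5, died between 2.1 and 2.4
    print(truncated_interval_likelihood(2.1, 2.4, 1.0, 5.0, 0.05, 0.3))
    # Left truncation only: enrolled at 6, died between 14 and 15
    print(truncated_interval_likelihood(14.0, 15.0, 6.0, math.inf, 0.05, 0.3))
```

Setting t_omega to infinity recovers the left-truncation-only likelihood, since the survival function then contributes nothing to the denominator when b is positive.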
84. A statement info_methodl FALSE info_method2 TRUE minrage 0 minimum age of recruitment maxrage 0 maximum age of recruitment censorprob 0 20 probability of dropping out 0 width of the observation interval obswidth 4 studylength 10 max of months to observe over mincensor min number of months to censor at maxcensor max number of months to censor at sitmean 6 mean age at sitting sitsd 1 sd of age at sitting array for observations ageo REAL 1 TO 500 last interval before sitting agec REAL 1 TO 500 first observation after sitting numbobs 500 save the estimates of mu and sig one for each simulation savemu REAL 1 TO 200 savesig REAL 1 TO 200 numbsims 200 Loop through data sets FOR sim 1 TO numbsims DO create a new data set FOR cid TO numbobs DO s_age QUANTILE NORMAL RAND sitmean sitsd END get age at sitting r_age RRAND minrage maxrage age at recruitment now determine how long to observe children o_len IF RAND lt censorprob THEN RRAND mincensor maxcensor ELSE studylength END if function Now figure out open and closing interval IF s_age lt r_age THEN cross section responder ageo cid 0 agec cid r_age 157 Programming tutorial ELSEIF s_age gt r_age o_len THEN right censored ageo cid r_age o_len agec cid 1 ELSE FOR x r_age TO o_len STEP obswidth DO IF s_age gt
85. ATE function (actually there can be more; see the reference manual). The first is x, the variable of integration; within parentheses come the lower and upper limits of integration, followed by the integrand.

One of the strengths of mle is that it contains a large number of predefined probability density functions and functions derived from the PDF. Any of the predefined probability density functions can be used as part of an expression. For example, the following program will give the area between user-specified limits for a normal distribution with user-specified parameters.

    MLE
      a: REAL
      b: REAL
      WRITELN("Returns the area under a Normal distribution")
      WRITE("Lower and upper limits of the area ")
      READLN(a, b)
      WRITE("Mean and Standard deviation ")
      READLN(mu, sig)
      WRITELN(PDF NORMAL(a, b) mu, sig END)
    END

Notice that the PDF function is called within the WRITELN function. This is perfectly valid. The arguments to WRITELN can be any expression, no matter how complicated. Here is an example of what happens when this program is run:

    Returns the area under a Normal distribution
    Lower and upper limits of the area 3 4
    Mean and Standard deviation 10 3
    0.0129347552

Random numbers

Simulation programming often requires drawing numbers from particular probability densities. Random numbers can be generated for nearly all of the densities supported by mle. The QUANTILE functio
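The area computed by this program is the difference of two normal cumulative probabilities. A small Python sketch of the same calculation, using only the standard library (the function names are chosen here for illustration), is:

```python
import math


def normal_cdf(x: float, mu: float, sig: float) -> float:
    """Cumulative probability of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sig * math.sqrt(2.0))))


def normal_area(a: float, b: float, mu: float, sig: float) -> float:
    """Area under the normal pdf between the limits a and b."""
    return normal_cdf(b, mu, sig) - normal_cdf(a, mu, sig)


if __name__ == "__main__":
    # Same inputs as the sample run: limits 3 and 4, mean 10, standard deviation 3
    print(normal_area(3.0, 4.0, 10.0, 3.0))
```

For the inputs of the sample run, the result should come out close to the 0.0129 printed by mle.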
86.
    Key        Command          Action
    Alt_…      blockmenu        Block menu
    …          unmapped
    …          mlemenu          mle menu
    …          helpedit         Displays editor commands
    …          mlerun           Run in mle
    …          filemenu         File menu
    …          blockmenu        Block menu
    …          redraw           Redraw the screen
    …          …                Find text
    …          …                Find and replace
    …          wordwraptoggle   Toggle word wrap
    …          exec             Open up OS window
    …          mletempl         Insert an mle template
    Home       linebegin        Go to beginning of line
    Up         lineprev         Go to previous line
    …          pageup           Go up one page
    …          charprev         Go to previous character
    …          charnext         Go to next character
    …          lineend          Go to end of line
    …          linenext         Go to next line
    …          pagedown         Go down one page
    Insert     inserttoggle     Toggle insert/overwrite
    Delete     chardel          Delete character
    Shift_F1   helpmlesearch    Match and give help on a keyword
    Shift_F2   mleparse         Parse in mle
    Shift_F3   quitsave         Save and quit
    Shift_F4   blockbegin       Mark beginning of block
    Shift_F5   colorforeset
87. But it is with great pleasure I dedicate this manual to my undergraduate advisor, the late Dr. Robert E. Miller. Dr. Miller was an anthropologist, a South Asianist, a futurist, and an ardent advocate of systems thinking. He taught with an enthusiasm that was both infectious and inspiring. I suspect that my career as an anthropologist has been motivated, subconsciously, by the words that ended a number of our philosophical debates: "Darryl, you simply can't quantify love." If Dr. Miller's conjecture is ever disproven, I am sure that likelihood will have played a pivotal role.

Brief table of contents

    PREFACE
    BRIEF TABLE OF CONTENTS
    TABLE OF CONTENTS
    INTRODUCTION TO MLE
    INSTALLING AND RUNNING MLE
    …
    CREATING DATA SETS
    BUILDING LIKELIHOOD MODELS
    …
88. …CI <m> <r>
    <param> LCI <m> <r>
    <param> SE <m> <r>
    LOGLIKELIHOOD <m> <r>
    FREE_PARAMS <m> <r>
    DELTA_LL <m> <r>
    ITERATIONS <m> <r>
    EVALS <m> <r>
    VCV_EVALS <m> <r>
    CI_EVALS <m> <r>
    INVERTFLAG <m> <r>
    CONVERGENCE <m> <r>
    VCV <m> <r>

where <m> is the model number, <r> is the run number for the model, and <param> is the name for a free parameter in the model. Each VCV <m> <r> is an n×n matrix, where n is the number of free parameters, which is available in FREE_PARAMS <m> <r>. The variable INVERTFLAG <m> <r> is a boolean variable that specifies whether or not the variance-covariance matrix was inverted without error. Each CONVERGENCE <m> <r> variable has an integer value that takes on a value given in Table 4.

Table 4. Meaning of the CONVERGENCE variable.

    Value   Meaning
    …       Not done
    …       Stopped after maximum function evaluations
    …       Stopped after maximum number of iterations
    …       Converged normally
    …       Trouble converging in one dimension
    …       Starting value is not within min and max bounds
    …       Starting temperature is not positive
    …       Did not converge

Building MODEL statements

Expressions are used in many ways within mle, so you should become thoroughly acquainted with expres
89. DEL statement is the PDF function. The purpose of the PDF function is to specify the component of a pre-defined probability density or distribution function. Although the name is PDF, the PDF function can return the probability density function, areas under the PDF curve (including the cumulative and survival density functions), and the hazard function. In addition, the PDF function can return areas or densities that are left and right truncated. The structure of the PDF function call is

    PDF <PDF name>(<time variable1>, <time variable2>) <intrinsic parameter 1> <intrinsic parameter 2> <optional HAZARD> END

The name following PDF is the name of the built-in distribution. mle predefines over 60 density functions, including most well-known ones like the normal, lognormal, Weibull, gamma, beta, and exponential distributions. A complete summary of built-in distributions is given in a later chapter.

Time variable list is a list of the time arguments passed to the PDF. Most univariate PDFs can take from one to four time arguments. In fact, these four times describe a single observation in such a way as to incorporate a number of defects in the observation process, including right censoring, left truncation, right truncation, and cross-sectional observations. A description of how the four arguments combine to specify a probability is given in the section that follows. Note that the time
90. …E("results.dat") procedure calls.
• Change all SEED 5352 statements to SEED(5352) procedure calls.
• Eliminate all const blocks that may have been used at the beginning of MODEL statements. Instead, define the constant outside of the MODEL statement. Alternatively, use a PREASSIGN function within the MODEL statement to create temporary variables within that statement.
• Add an END after all PARAM functions.
• Some older versions of mle did not have or allow the DATA … END function within the MODEL statement. In more recent versions, a DATA … END function is almost always required to cycle through all observations in the data set. MODEL statements should usually look like this:

    MODEL
      DATA
        {the rest of the likelihood goes here}
      END {data}
      …
    END {model}

• Some older versions of mle used the keyword FREQ followed by a variable name within a PDF function to denote a frequency variable. These must be deleted. The special variable names FREQ and FREQUENCY should be used in the DATA statement to denote frequencies of observations.
• The method of transforming variables within the DATA statement has changed in version 2. All transformations must be re-coded following the new syntax described earlier in this chapter and in a later chapter. Additionally, the method of dropping or keeping variables within the DATA statement has changed. An example of the old syntax is
91. E DEFAULTOUTNAME DATA subject sex age weight height armcirc skinfold deltahr deltav02 END data subject ID individual s sex O female 1 male individual s age individual s weight individual s height mid upper arm circumference individual s skinfold measurement heart rate adjusted for baseline rate volume of 02 used during exercise adjusted for baseline PAPA Aaa E E E a a a d e OOO 10011400 oO WMWAInDAOBPWNE MODEL PREASSIGN BEGIN sigz PARAM sigmaz LOW 0 001 HIGH 50 START 1 END upperlim 6 sigz lowlim upperlim END DATA INTEGRATE z lowlim upperlim PDF NORMAL z 0 sigz END LEVELDELTA subject THEN PDF NORMAL deltav02 PARAM b0 LOW 200 HIGH 50 START 0 FORM ADD COVAR sex PARAM bsex LOW 10 HIGH 50 START 0 COVAR age PARAM bage LOW 10 HIGH 50 START 0 COVAR weight PARAM bweight LOW 10 HIGH 10 START 0 COVAR height PARAM bheight LOW 10 HIGH 10 START 0 COVAR armcirc PARAM barmcirc LOW 10 HIGH 10 START 0 COVAR skinfold PARAM bskinfold LOW 10 HIGH 10 START 0 COVAR deltahr PARAM bdeltahr LOW 10 HIGH 10 START 0 COVAR z 1 END param b0 PARAM sigma LOW 0 00001 HIGH 50 START 5 EN END pdf normal END leveldelta END integrate END data END preassign RUN WITH sigmaz b0 sigma bsex bage bweight bheight barmcirc bskinfold bdeltahr END model END mle Setting the maximization method mle has four methods for maximizing the likelihood f
92. EL function

The LEVEL function provides a mechanism by which multilevel or hierarchical models can be constructed. The syntax of the LEVEL function is

    LEVEL <boolean expression> THEN <optional_form> <expression> END

The effect of the LEVEL function is to test the <boolean expression> for each observation and, while the condition is true, form the product of likelihoods out of the observations. The <optional_form> provides alternative ways of tallying the likelihoods and is specified as it is for the DATA function, save for one difference: the default form is FORM = PRODUCT.

The best way to understand the effect of the LEVEL command is by an example. Consider the likelihood

$$L = \prod_{i=1}^{N} \int g(z) \prod_{j=1}^{n_i} f(t_{ij} \mid \theta, z)\, dz$$

This is a standard model for which a distribution of clustering or heterogeneity, $g(z)$, is estimated along with the model's other parameters, $\theta$. There are two levels that make up this model. Let us call the outer level, denoted by the outer product, the subject level; that is, we have N individual subjects, and this outer product is taken over all subjects. For each of N subjects there are multiple repeated observations taken. For the ith subject we have $n_i$ repeated observations. The inner level, formed by the innermost product, is the likelihood formed by $n_i$ repeated observations of the ith subject. The rationale for this type of
93.         END
          END
        END {for sig}
      END {plot}
      END {for mu}
    END {multiplot}
    END {mle}

The MULTIPLOT statement makes use of a multiplot routine available in Gnuplot. The Gnuplot statement does not work correctly for all terminal types. In particular, the x-axis labels and plot titles do not always print correctly for the right-most plots. Also, plots with x-axis labels and plot titles are sometimes scaled to an overly small size. mle attempts to scale the multi-plots so that none of the figures overlap and so that the aspect ratio is unchanged. You can affect the scaling size from within mle by changing the variables MPLOTYSCALE and MPLOTXSCALE (both begin as 1.0). These variables control the relative degree of shrinkage or expansion beyond that required to fit a plot in its rectangle.

Working with Gnuplot

What is Gnuplot?

Gnuplot is a function and data plotting program that is designed to work on a large range of computer systems. The program has many graphing capabilities, including the ability to plot directly from files. mle makes use of a relatively small subset of the Gnuplot capabilities to generate graphs. In fact, mle simply writes a Gnuplot program and creates data sets; Gnuplot does the rest. The authors of Gnuplot provide for free distribution of the software, including the source code. Over the years many individuals have contributed to writing the program, but
94. …ETDATE, GETTIME, WRITEPLOTLN, WRITEPLOT, PLOTFILE, PTRANSFORM, FINISHPLOT. Additionally, INC(x) and DEC(x) are defined as both procedures and functions.
• New predefined PDFs: ZIPF, BETABINOMIAL, THOMAS, POLYAEGGENBERGER.
• A restart file option has been added to assist in rerunning programs. The -sw option writes updated parameter START values to the file <name>.<model_number>.<run_number> each iteration. The -sr option on the command line instructs mle to read parameter START values from the file.
• A termination file option has been added. When the -t option is given, the program will periodically check for the file <name>.TRM. If the file exists, the program will terminate.
• The RUN part of the MODEL statement can now take a WITH clause in addition to FULL and REDUCE. A list of parameter names follows the WITH keyword. The model will be run using only those parameters. Other parameters will be set to the TEST value set in the PARAM function. Additionally, one or more parameter names can be enclosed in parentheses following the WITH keyword. All possible models ($2^N$ for N parameters) that include and exclude these parameters will be formed.
• A Bayesian model selection report is now available. Setting AIC_SELECT = TRUE will produce a report based on Akaike's information criterion (AIC). Setting AICC_SELECT = TRUE will prod
95.
    IF <bexpr> THEN
      <statements>
    ELSEIF <bexpr> THEN
      <statements>
    ELSEIF <bexpr> THEN
      <statements>
    ELSE
      <statements>
    END

Notice that any number of statements can come within each section of the IF statement. The ELSEIF and ELSE clauses are always optional. When there is no ELSE clause, the IF statement doesn't necessarily end up executing any of the statements. That is, if all IF and ELSEIF expressions evaluate to FALSE, the IF statement will skip to the end of the statement. Here is another example of using the IF statement:

    IF SYSTEM == "MS-DOS" THEN
      PRINTLN("Run from an MS-DOS system")
      SEP = …
      DATAFILE("C:" + SEP + DIR + SEP + NAME)
    ELSE
      PRINTLN("Run on a unix system")
      SEP = …
      DATAFILE(DIR + SEP + NAME)
    END

FOR statement

The FOR statement provides a means of looping through statements for some fixed number of iterations. mle contains several different types of FOR statements. Three of them are introduced here. The rest are introduced in the section on arrays. Here is an example program that creates a table of sine and cosine values:

    MLE
      FOR x = 0 TO 359 DO
        r = DTOR(x)
        WRITELN(x, " degrees ", r, " radians ", " SIN=", SIN(r), " COS=", COS(r))
      END {for}
    END {mle}

The variable x is called the index variable. Its value will change with each pass through a loop.
96. In this example, x is initially set to zero, and the statements sandwiched between the DO and the END are executed. The value of x is incremented by one and the statements are executed again, and so on, until x is 359. After the last pass through the loop, execution continues after the END. Generically, the simplest form of the FOR statement looks like this:

    FOR <v> = <expr> TO <expr> DO <statements> END

The variable <v> must either not be previously defined or, if it already exists, it must be an INTEGER or a REAL variable. Its value will change as the FOR statement is executed. The first <expr> will be executed once at the beginning of the loop and will define the starting value of v. The second <expr> will also be executed once and will define the last value of v. Here is another example. This program reads an integer and prints it out backwards:

    MLE  {read an integer and print it out backwards}
      i: INTEGER
      WRITE("Type an integer: ")
      READLN(i)
      FOR x = 1 TO LOG10(i) + 1 DO
        tmp = i              {temporarily save i}
        i = i DIV 10         {get rid of last digit}
        WRITE(tmp - i*10)    {compute and print the least significant digit}
      END {for}
      WRITELN                {with no argument, writeln goes to the next line}
    END {mle}

FOR STEP statement

    MLE
      FOR x 9 …
        WRITELN …
      END {for}
    END {mle}

There are several variations on the FOR. The first, the STEP clause, allows the index variable to be incr
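The digit-reversal loop in this excerpt can be traced in Python. This is an illustrative translation, not mle code; it assumes a positive integer, and it counts digits with len(str(i)) instead of LOG10(i) + 1 to sidestep floating-point rounding at exact powers of ten.

```python
def digits_reversed(i: int) -> str:
    """Return the decimal digits of a positive integer in reverse order."""
    num_digits = len(str(i))        # evaluated once, like the FOR loop's upper limit
    out = []
    for _ in range(num_digits):
        tmp = i                     # temporarily save i
        i = i // 10                 # get rid of the last digit
        out.append(str(tmp - i * 10))   # the least significant digit
    return "".join(out)


if __name__ == "__main__":
    print(digits_reversed(1234))
```

As in the mle version, the loop limit is fixed before the first pass, so changing i inside the loop does not change the number of iterations.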
97. …MODEL
    SURVIVAL ANALYSIS: HAZARDS MODEL
    SURVIVAL ANALYSIS: IMMUNE SUBGROUP
    LINEAR REGRESSION IN THE LIKELIHOOD FRAMEWORK
    CASE STUDY: MORTALITY MODELS
    LOGISTIC REGRESSION
    CASE STUDY: EXTENDED POISSON FOR MODELING SPECIES ABUNDANCE
    INTRODUCTION TO PROGRAMMING IN MLE
    ELEMENTS OF MLE PROGRAMMING
        …
        Identifiers, assignment statement and functions
        Statements with numeric, boolean and logical expressions
        Operator precedence
        More on strings
        Commas in lists of arguments
        …
        Mathematical computation
98. MPLEX, VAR root2: COMPLEX)
    tmpc: COMPLEX
    {This procedure takes coefficients a, b, and c and returns the roots as complex roots root1 and root2}
    tmpc = SQRT(b^2 - 4*a*c)     {compute an intermediate result}
    root1 = (-b + tmpc)/(2*a)
    root2 = (-b - tmpc)/(2*a)
  END

Defining the procedure

The procedure definition begins with the word PROCEDURE and ends with a corresponding END. The word following PROCEDURE is the name of the procedure, in this case quadratic. The name is followed by a list, enclosed in parentheses, of formal arguments (five in this case). The argument name and type must be specified for each of the arguments. In this example, three arguments (a, b, and c) are defined to be type REAL and two are defined as type COMPLEX.

The argument names, and for that matter all of the variables defined within the procedure (like tmpc), are private to the procedure. Names of preexisting variables outside of the procedure are not affected by, and do not affect, declarations of variables using the same name inside the procedure. Thus, the following bit of code causes no problems. Outside of the procedure a, b, and c refer to one set of variables, but the names have different meanings within the procedure.

  a: STRING
  b: BOOLEAN
  c: CHAR
  tmpc: CHAR
  PROCEDURE quadratic(a: REAL, b: REAL, c: REAL, VAR root1: COMPLEX, VAR root2: COMPLEX)
    tmpc: COMPLEX
  END

Any reference to the variables a b and c i
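The computation performed by the quadratic procedure can be sketched in Python as follows. This is an illustrative translation, not mle code: Python returns the two roots as a tuple where mle writes them into VAR arguments, and the standard cmath module stands in for mle's COMPLEX type.

```python
import cmath


def quadratic(a: float, b: float, c: float) -> tuple[complex, complex]:
    """Return both roots of a*x^2 + b*x + c = 0 as complex numbers."""
    if a == 0:
        raise ValueError("a must be nonzero for a quadratic equation")
    tmpc = cmath.sqrt(b * b - 4 * a * c)  # intermediate result, possibly imaginary
    root1 = (-b + tmpc) / (2 * a)
    root2 = (-b - tmpc) / (2 * a)
    return root1, root2


r1, r2 = quadratic(1.0, 0.0, 1.0)  # x^2 + 1 = 0 has roots +i and -i
print(r1, r2)
```

The local name tmpc exists only inside the function, which mirrors the private variables described above.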
99. Mangel (1997), Holman and Jones (1998), King (1998), Nelson (1982), Morgan (2000), Pickles (1985), Royall (1999), and Wood et al. (1992). Guttorp (1995) and Morgan (2000) give accessible introductions to stochastic modeling.

Programs written in mle are in many respects similar to those written in SAS, S, SPSS, BMDP, or other statistical programming languages. The language consists of keywords like MODEL, END, DATA, and so on. Like all languages, mle has rules of syntax that must be strictly followed to produce a valid program. The resulting mle program is translated into actions (like parameter estimation) by the mle interpreter. The mle interpreter typically works with three files: the mle program file, the data file, and the output file. The next three sections discuss these files in more detail.

Notice that mle has two distinct meanings in this document. First, it is a programming language for building likelihoods, described herein. Second, it is the name of the computer program that interprets the language and finds maximum likelihood estimates of model parameters.

The Program File

The program file contains a program written in the mle programming language. The first line of this file begins with the word MLE, and the program ends with a matching END. The program, consisting of a set of zero or more statements, falls between the MLE and the END. Most programs will have statements that name the
100. NGTH is the initial step length for all parameters. Empirically, the starting step length value has little effect on the outcome of the maximizer. SA_ALT_ADJUSTMENT uses an alternative formula for adjusting the step length. SA_ADJ_LOWERBOUND defines a null area for which step length is not adjusted. If the proportion of accepted moves is greater than SA_ADJ_LOWERBOUND and is less than 1 - SA_ADJ_LOWERBOUND, the current step length will continue to be used. See Corana et al. (1987) for more details.

Stopping Criteria

There are three ways to terminate finding the solution of a model. The first way is to reduce the change in the log likelihood to below some specified minimum value. You can specify this by setting, for example, EPSILON = 1E-8. When the absolute difference between the log likelihoods of the previous iteration and the current iteration falls below this value, the problem will be considered to have converged normally. The second way of controlling the stopping criteria is by specifying the maximum number of iterations permissible. For example, setting MAXITER = 1000 would stop searching for the maximum after 1,000 iterations, regardless of the change in the likelihood. Note that a single iteration is one pass over all dimensions. The third stopping criterion is by specifying the maximum number of function evaluations permissible. You can specify, for example, MAXEVALS = 10000, which wo
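The interplay of the three stopping criteria can be sketched in Python. This is a minimal illustration under stated assumptions, not the interpreter's actual control loop: the function run_until_stopped and its step callback are hypothetical, and the keyword arguments simply borrow the names EPSILON, MAXITER, and MAXEVALS.

```python
def run_until_stopped(step, loglik0, epsilon=1e-8, maxiter=1000, maxevals=10000):
    """Iterate until one of three stopping criteria is met.

    step(loglik) must return (new_loglik, evaluations_used) for one iteration.
    Returns (loglik, reason, iterations_done).
    """
    loglik = loglik0
    evals = 0
    for iteration in range(1, maxiter + 1):
        new_loglik, used = step(loglik)
        evals += used
        # Criterion 1: change in log likelihood falls below epsilon
        if abs(new_loglik - loglik) < epsilon:
            return new_loglik, "converged", iteration
        loglik = new_loglik
        # Criterion 3: too many function evaluations
        if evals >= maxevals:
            return loglik, "maxevals", iteration
    # Criterion 2: iteration limit reached without convergence
    return loglik, "maxiter", maxiter


def toy_step(ll):
    # Toy iteration (hypothetical): halves the distance to a maximum at 0,
    # using 5 function evaluations per iteration.
    return ll / 2.0, 5


print(run_until_stopped(toy_step, -1.0))
```

Only the first outcome indicates normal convergence; the other two mean the search was cut off and the reported estimates should be treated with caution.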
101. PRINTLN Run from an MS DOS system SEP DATAFILE C ELSE PRINTLN Run on a unix system SEP DATAFILE DIR SEP NAME END SEP DIR SEP NAME

The Break Statement

The BREAK statement works within loop statements (WHILE, REPEAT, and FOR). When a BREAK statement is encountered, the loop is immediately exited. A BREAK statement outside of a loop causes the current scope to be exited. This means that within the main program (outside of a user defined procedure or function), a BREAK acts like a HALT statement. Within a user defined procedure or function, the procedure or function is exited.

The Continue Statement

The CONTINUE statement works within loop statements (WHILE, REPEAT, and FOR). When a CONTINUE statement is encountered, all further statements are skipped until the end of the current loop.

The Exit Statement

The EXIT statement immediately exits the current procedure or function. When an EXIT statement is encountered outside of a procedure or function, the program exits.

Differences Between Version 2.0 and Version 2.1

Version 2.1 offers improved speed, greater memory capacity, and the addition of some significant new capabilities. With one minor exception (FOR loops using DOWNTO), version 2.0 programs should work without change in version 2.1. Here is a list of the most important changes: User de
102. MLE FUNCTION myfunc a REAL b REAL REAL RETURN END The main body of the program starts here FOR x 1 TO 20 DO a myfunc x x 2 WRITELN a END END

The statements within the procedure are executed, the values of root1 and root2 are updated, and control is passed back to the main program. In the main program, the variables r1 and r2 have been updated with the results from root1 and root2.

Nested procedures

New procedure definitions can be defined within existing procedures. In the same way that variables defined inside a procedure are only visible from within that procedure, procedures defined within procedures are only visible from within that procedure. Here is an example of nested procedures:

  MLE
    PROCEDURE printthings(s1: STRING, s2: STRING)
      PROCEDURE indent(VAR s: STRING, n: INTEGER)
        {Indents a string by n spaces}
        FOR i = 1 TO n DO
          s = " " + s
        END {for}
      END {proc indent}
      indent(s1, 6)
      indent(s2, 12)
      WRITELN(s1)
      WRITELN(s2)
    END {proc printthings}

Example programs

This section contains a few examples of programs written in mle.

A simple simulation program

  MLE
    {This program simulates a simple data set. The output is an id and an age at which some developmental landmark is attained, drawn from a normal pdf}
    nkids = 1000    {number of kids to simulate}
    mu = 6          {mean age of reaching the landmark}
    sig = 1         {stddev in reaching the landmark}
103. RUN and END tells mle that all parameters defined in the likelihood (in this case, mu and sigma) are to be manipulated in order to maximize the likelihood. Alternatively, the REDUCE or WITH keywords can be used in place of FULL.

Another example

The expression that defines the likelihood within a MODEL statement can become much more complicated than the first example. Consider the following likelihood:

$$L = \prod_{i=1}^{N} \left[ p \int_{t_{open,i}}^{t_{close,i}} f(t \mid \mu_1, \sigma_1)\,dt + (1 - p) \int_{t_{open,i}}^{t_{close,i}} f(t \mid \mu_2, \sigma_2)\,dt \right]$$

This is the likelihood for a mixture model in which observations are drawn from two distributions (that is, two different sets of parameters for the same distribution) and mixed at some fraction p. This type of model arises when one cannot tell which of the two distributions observations are drawn from. An example might be a collection of people's heights with no information on the sex of each individual. Even without such information, the proportion of each sex can be treated as a latent variable, and sex specific parameters can be estimated along with the proportion. This more complicated likelihood can be coded as follows:

  MODEL {mixture of two normal distributions} DATA MIX PARAM p LOW 0 HIGH 1 START 0.5 END PDF NORMAL topen tclose PARAM mu1 LOW 5 HIGH PARAM sigma1 LOW 0.1 HIGH END PDF PDF NORMAL topen tclose PARAM mu2 LOW 0 HIGH 6 START 2 END PARAM sigma2 LOW 0.01 HIGH 5 START 1.2 END END PDF
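To make the mixture likelihood concrete, here is a minimal Python sketch that evaluates the log likelihood for interval observations (topen, tclose) drawn from a mixture of two normal distributions. It illustrates the arithmetic only and is not how the mle interpreter is implemented; the function names and the three sample intervals are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def interval_prob(topen, tclose, mu, sigma):
    """Probability that a normal variate falls between topen and tclose."""
    return normal_cdf(tclose, mu, sigma) - normal_cdf(topen, mu, sigma)


def mixture_loglik(data, p, mu1, sigma1, mu2, sigma2):
    """Sum of log individual likelihoods for a two-component normal mixture."""
    total = 0.0
    for topen, tclose in data:
        like = (p * interval_prob(topen, tclose, mu1, sigma1)
                + (1.0 - p) * interval_prob(topen, tclose, mu2, sigma2))
        total += math.log(like)
    return total


# Hypothetical interval observations (topen, tclose)
observations = [(1.0, 2.0), (2.5, 3.0), (7.0, 9.0)]
print(mixture_loglik(observations, 0.5, 8.0, 1.5, 2.0, 1.2))
```

A maximizer would repeatedly call a function like mixture_loglik with different values of p, mu1, sigma1, mu2, and sigma2, keeping each within its LOW and HIGH limits.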
104. Set foreground color
  Shift_F6     findnext          Find next occurrence
  Shift_F7     replacenext       Find and replace next occurrence
  Shift_F8     marginset         Set margins
  Shift_F9     writekeymapfile   Writes startup key map file
  Shift_F10    mletmplopts       Change mle template options
  Ctrl_F1      helpmle           Give help on an mle keyword
  Ctrl_F2                        unmapped
  Ctrl_F3                        unmapped
  Ctrl_F4      blockend          Mark end of block
  Ctrl_F5      colorbackset      Set background color
  Ctrl_F6      findopts          Find with options
  Ctrl_F7      replaceopts       Find and replace options
  Ctrl_F8                        unmapped
  Ctrl_F9      readkeymapfile    Reads key map file
  Ctrl_F10                       unmapped
  Alt_F1       helpkeyboard      Displays keys mapped to commands
  Alt_F2       mleexpr           Run an mle expression
  Alt_F3       makebackup        Set whether backup files are made
  Alt_F4       blockmove         Move block
  Alt_F5       debug             Turns debugging on
  Alt_F6                         unmapped
  Alt_F7                         unmapped
  Alt_F8       debugscreen       Shows internal information
  Alt_F9       configsave        Saves configuration information
  Alt_F10                        unmapped
  Ctrl_PrtSc                     unmapped
  Ctrl_LtArr w
105. Specifically, t in the call was not changed by the assignment to a in the procedure. The reason is that a copy of each argument is passed to the procedure. This behavior prevents accidental side effects outside of the procedure resulting from manipulations within procedures. Additionally, this permits recursive calls to a procedure (i.e., a procedure that calls itself).

Sometimes it is helpful to permit the procedure to change the variables back in the main program or calling procedure. It is possible to pass a variable to a procedure so that its value can be manipulated within the procedure. This is done by preceding the variable in the formal argument list of the procedure with the keyword VAR. This mechanism is almost identical to variable arguments in Pascal and Modula. Suppose we rewrite the previous example by adding VAR before the formal declaration of variable a:

  PROCEDURE myproc VAR a INTEGER b REAL c STRING msg ais

Now any changes to variable a within the procedure will be reflected in changes to variable t outside of the procedure.

  Call myproc with In myproc 4 2000000000 c Hello world ais lt 10 Exit myproc with a Back from myproc with t

Here are some other notes about user defined procedures:
• VAR arguments require that variables be passed instead of constants, since the variable may be modified.
• Arrays can only be passed as VAR arguments.
• Procedures can be defined and called
106. Typically, boolean expressions use relational operators (such as >, >=, <, <=, and <>) and boolean operators (NOT, AND, OR, XOR). Functions that return boolean values can be used as well. Multiple KEEPIF and DROPIF statements can be used for a single variable. As mle reads in variables, each condition is tested in sequence until the end of the tests is reached or the observation is deemed dropped; that is, boolean short circuiting will be used to drop variables at the first opportunity. The third example is a test that keeps the observation if last_time is greater than zero; the second test will examine if the value is equal to INFINITY (a built in constant) or less than first_time, and drop the observation if either condition is true. Then, if the variable is to be dropped, the entire observation is dropped. Note that the value of other variables in the current observation may be used in a DROPIF and KEEPIF statement.

Observation frequency

Each observation in a data file, which typically occurs on a single line, is usually a single observation. Sometimes it is convenient to place multiple identical observations on a single line, along with a count of how many observations are represented. The names FREQUENCY or FREQ have a special meaning when defined as variables in a DATA statement. They are taken as the frequency or count for each observation. If both variable names are used, FREQUENCY is ta
107. WRITELN("Hello Universe")END is a valid program. Notice that the END does not require an additional space because ) is punctuation.

Identifiers, assignment statement, and functions

Let's expand on the first program a bit. The second program introduces assignment statements, identifiers, function calls, and comments.

  MLE
    {Writes a greeting card to the universe}
    {Written 29 Mar 2003}
    population = 6.3    {update from http://www.ibiblio.org/lunarbin/worldpop}
    greeting = "Hello Universe"
    {now create a signature that includes everyone}
    signature = "from " + REAL2STR(population, 3, 1) + " billion of us on earth"
    {write the message here}
    WRITELN(greeting)
    WRITELN(signature)
  END

The first thing to notice about this program is that it contains comments. The comments are contained within curly brackets. Comments are ignored and are there to help programmers make sense of the program months or years later. As a programmer, you should develop the discipline to document your program with comments. Try to develop a consistent and descriptive style for formatting your programs, including informative comments sprinkled throughout.

In this program we have created some variables. Variables are named objects that take on a value. In a spreadsheet program there are cells available that can take on values
108. _weight LOW -20 HIGH 20 START 0 COVAR age PARAM b_age LOW -20 HIGH 20 START 0 END {hazard} END {of the PDF} END RUN FULL END {of the MODEL} END

Except for the exponential and the Weibull distributions, accelerated failure time models are not proportional hazards models.

Survival analysis: Immune subgroup

When observing times to events, there may be an unidentifiable subgroup for whom risk of experiencing the event is zero. These make up a so called immune fraction, a sterile subgroup, or a contaminating fraction. It is possible to model some fraction of individuals who are not at risk, so as to statistically identify the subgroup. If complete records are available for all individuals, one could simply remove the sterile individuals from the analysis of the non sterile fraction. When complete records are not available (i.e., we cannot tell a sterile individual from a right censored individual), maximum likelihood methods are easily adapted to include estimation of an unknown fraction of individuals who are not susceptible to failure.

The effect of the sterile subgroup on the survival distribution can be seen in Figure 5. Call s the non susceptible fraction. Then the proportion of individuals who are susceptible at the start of risk is p(0) = 1 - s. Inspection of Figure 5 suggests that the fraction of surviving individuals at time t must be made up of two fractions. One is S(t) weighted by the fractio
109. al maximum than the simplex or conjugate gradient methods. Furthermore, it is easy to constrain the algorithm so that new parameter values never overstep the user defined or mathematically defined limits; that is, it respects the boundaries of our map. Unfortunately, the number of function evaluations grows exponentially with the number of dimensions in the problem. When the number of parameters gets large, the solution is very slow in coming. Furthermore, some functions that have the maximum along a long narrow ridge at a 45° angle to the lines of longitude and latitude require a large number of tiny movements before reaching the maximum.

The direct method is set by METHOD = DIRECT. It uses the HIGH and LOW values to constrain all parameters, as discussed below. The START values define the initial starting parameters. The direct method uses Brent's (1973; see also Press et al. 1989) parabolic interpolation to find the maximum along a single direction, i.e., for a single parameter holding all other parameters constant. The maximizer uses the HIGH value and LOW value to define the extreme bounds of the problem. The START value is the first guess at the maximum. A parabola is then fit through the set of three points, and the maximum is analytically computed. This procedure is repeated with the three points enclosing the maximum until the maximum in that dimension is found to some prespecified tolerance. There are three ways
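The parabolic step at the heart of the direct method can be sketched in Python. This is a simplified illustration, not Brent's full algorithm and not the interpreter's code: the function names are hypothetical, only one parabolic step is taken per parameter, and the LOW, START, and HIGH values serve as the three points.

```python
def parabola_vertex(a, fa, b, fb, c, fc):
    """Abscissa of the extremum of the parabola through (a,fa), (b,fb), (c,fc)."""
    num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    if den == 0:
        return b  # points are collinear; keep the middle point
    return b - 0.5 * num / den


def direct_sweep(f, low, start, high):
    """One pass over all parameters: a single parabolic step per dimension,
    holding the other parameters constant and respecting the bounds."""
    x = list(start)
    for i in range(len(x)):
        def along(v, i=i):
            trial = x[:i] + [v] + x[i + 1:]
            return f(trial)
        a, b, c = low[i], x[i], high[i]
        v = parabola_vertex(a, along(a), b, along(b), c, along(c))
        x[i] = min(max(v, low[i]), high[i])  # never overstep LOW or HIGH
    return x


def surface(p):
    # Toy log likelihood surface (hypothetical) with its maximum at (1, -2)
    return -(p[0] - 1.0) ** 2 - (p[1] + 2.0) ** 2


print(direct_sweep(surface, [-5.0, -5.0], [0.0, 0.0], [5.0, 5.0]))
```

For an exactly quadratic surface a single step per dimension lands on the maximum; for a real likelihood the step would be repeated with a shrinking bracket until the tolerance is met.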
110. ample REAL 1 TO 5 1 TO 2 FOR x 1 TO 5 DO FOR y 1 TO 2 DO WRITE a x END {for y} WRITELN END {for x}

Here are the results of running this example:

  a 1 1 1 1000000000 2000000000
  a 2 1 2 1000000000 2000000000
  a 3 1 3 1000000000 2000000000
  a 4 1 4 1000000000 2000000000
  a 5 1 5 1000000000 2000000000

Data Statement

Most mle programs include a DATA ... END statement. The purpose of a DATA statement is to create a series of observations, which will be used to compute likelihoods. The DATA ... END statement defines the format of the data file, defines variables to be read in, provides a way of transforming variables, and provides a way of selecting and dropping observations. Only an overview of the DATA statement is given here. Details are given in chapter three. Formats for the DATA statement are:

  DATA
    <variable> FIELD x                                             {reads variable from field x}
    <variable> FIELD x LINE y                                      {multiline version}
    <variable> FIELD x LINE y = <expr>                             {reads and transforms}
    <variable> FIELD x LINE y DROPIF <expr> KEEPIF <expr>          {generic form with FIELD}
    <variable> = <expr>                                            {creates from an expression}
    <variable> = <expr> DROPIF <expr> KEEPIF <expr>                {creates and conditionally keeps}
    <variable> FIELD x LINE y = <expr> DROPIF <expr> KEEPIF <expr>
  END

A description of each field follows: 1
111. ana movements and evaluations, we now adjust the maximum step length vector v. The reduction or increase in step length is done according to the proportion of accepted and rejected movements, by an algorithm described in detail below. In short, the maximum step length is reduced or increased so that we can expect to accept about one half of all moves in the next cycle of random steps. Following this adjustment, a new cycle of random steps is initiated, until a total of N_adj of these adjustments have been completed. Thus, after N_rand × N_adj function evaluations, a single iteration completes and a new iteration is begun, until convergence, the maximum number of iterations is reached, or the maximum number of function evaluations is reached.

The simulated annealing method is set by METHOD = ANNEALING. The method uses the HIGH and LOW values to constrain all parameters, as discussed below. The START values define the initial starting parameters. A number of other variables should be set with this method. Since the simulated annealing method uses random numbers, the user must set a random seed by calling the procedure SEED with a positive integer. The starting temperature is set with SA_TEMPERATURE. The default value is 1000.0, which is too high for all but extremely wild functions. It is difficult to know what a good starting temperature is for a function, but values under 100 empirically seem to work for all but the most topographically complicate
112. arks
  Alt_V, Alt_F4    Move block
                   Write block to a file

Page formatting commands
  Ctrl_F5          Set background color
  Shift_F5         Set foreground color
  Shift_F8         Set margins
                   Redraw the screen
                   Toggle ruler display
                   Toggle word wrap

Help commands
                   Displays editor commands
  Alt_F1           Displays keys mapped to commands
  Ctrl_F1          Give help on an mle keyword
  Shift_F1         Match and give help on a keyword
  <not assigned>   Program information
                   Open up OS window
  Shift_F2         Parse in mle
                   Run in mle
  Alt_F2           Run an mle expression
                   Find text
  Shift_F6         Find next occurrence
  Ctrl_F6          Find with options
                   Find and replace
  Shift_F7         Find and replace next occurrence
  Ctrl_F7          Find and replace options
                   Goto line

Other commands
  Ctrl_T, F10      Insert an mle template
  Shift_F10        Change mle template options
                   Change case to end of line
                   Change to lower case to EOL
                   Change to upper case to EOL
                   Enter ASCII code
                   Accept <Ctrl> key
  Shift_F9         Writes startup key map file emle.kbm
  Ctrl_F9          Reads key map file emle.kbm
  Alt_F9           Saves configuration information to the file emle.cfg
  Alt_F8           Shows internal information used for debugging
  Alt_F5           Turns debugging on

Menu commands <
113. at can be converted into the same type (an integer into a real, for example). Variables (and expressions, for that matter) in mle can take on one of the following types: REAL, INTEGER, COMPLEX, BOOLEAN, STRING, CHAR (character), and FILE. A detailed discussion of these types is given in the reference manual. A summary is given here.

A variable's type refers to the domain of values that the variable can take on. For example, INTEGER variables can take on a limited range of integer values. BOOLEAN variables can only take on the values TRUE and FALSE. Variables can be defined for each of the seven types, and expressions always take on one of these types. Here is an explanation of each:
• Real variables represent the continuous real number line. For example, 3.5, 1E-23, 7.0, and 19.999 are all real numbers.
• Integer variables take on whole number values over a machine dependent range of numbers. For most versions of mle this range is -2,147,483,648 to 2,147,483,647.
• Complex variables include a real number part and an imaginary part. Complex numbers are specified by expressions such as 1.2 + 0.4i or 0 + 1i.
• Boolean variables take on one of two states: TRUE or FALSE. No other value is allowed or recognized. Boolean expressions are frequently used to test conditions in the IF ... THEN ... ELSE ... END function or statement.
• String
114. ation from the three points of the triangle at a given step. A rather unsophisticated method alternates between maximizing the function first by longitude, using as many evaluations as needed to find the maximum longitude for a given latitude, and then does the same for latitude. By repeating this many times, a maximum (usually the global maximum) is found. Needless to say, this method can be very slow. Finally, a newer method has been developed that mimics nature's own maximization method. The method can be slow, but seems to be as robust at finding the global maximum as any iterative method.

Conjugate gradient method

The conjugate gradient method searches through parameter space for combinations of parameters where the slope of the likelihood function goes to zero. The computer numerically computes a slope (or gradient) using the equation $m_i = [f(x_i + \Delta x) - f(x_i - \Delta x)] / (2 \Delta x)$ for parameters x and small values $\Delta x$. This procedure uses the slopes m to figure out the next set of x, under the idea that the slope will decrease as the maximum is approached (unless the surface is flat). The conjugate gradient method used in mle was developed by Powell (1964) and Brent (1973), and further developed by Press et al. (1989). For problems of more than two free parameters, the conjugate gradient method is usually much faster than the direct method. Caution must be exercised when using this method. At times a local maximum is latched onto by
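A numerical slope of the kind used by gradient methods can be sketched in Python. This sketch assumes a central difference, which is one common choice; the exact formula used inside mle may differ, and the function names and step size here are hypothetical.

```python
def numerical_gradient(f, x, dx=1e-6):
    """Approximate the slope of f with respect to each parameter in x,
    using a central difference with a small step dx."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += dx
        down[i] -= dx
        grad.append((f(up) - f(down)) / (2.0 * dx))
    return grad


def surface(p):
    # Toy log likelihood surface (hypothetical) with its maximum at (1, 0)
    return -(p[0] - 1.0) ** 2 - 2.0 * p[1] ** 2


print(numerical_gradient(surface, [0.0, 1.0]))  # slopes away from the maximum
print(numerical_gradient(surface, [1.0, 0.0]))  # slopes near zero at the maximum
```

All slopes approaching zero is what signals a maximum, but a flat region or a local maximum produces the same signal, which is why caution is needed with this method.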
115. ative value, so the value is 768. The logical AND, OR, and XOR functions act bit by bit as well. Thus, the binary values 2x101101 AND 2x111000 (which is the same as 45 AND 56) evaluates to 40, or 2x101000. The SHL and SHR operators shift bits to the left and right. So, 2x000111 SHL 3 (i.e., 7 SHL 3) evaluates to 56, or 2x111000. Table 9 defines the logical operators.

Table 9. Definition of logical operators
  NOT   Flips all 0s to 1s and 1s to 0s.
  AND   Returns 1 if both bits are 1. 1 AND 1 -> 1; 0 AND 1 -> 0; 1 AND 0 -> 0; 0 AND 0 -> 0. Example: 2x1010 AND 2x1100 = 8 (2x1000).
  OR    Returns 1 if either bit is a 1. 1 OR 1 -> 1; 0 OR 1 -> 1; 1 OR 0 -> 1; 0 OR 0 -> 0. Example: 2x1010 OR 2x1110 = 14 (2x1110).
  XOR   Exclusive OR function. Returns a 1 if one of the bits is 1 and the other is 0. 1 XOR 1 -> 0; 0 XOR 1 -> 1; 1 XOR 0 -> 1; 0 XOR 0 -> 0. Example: 2x1010 XOR 2x1100 = 6 (2x0110).

You might be wondering how mle decides whether an operator is boolean or logical. The answer is simple: if both operands are boolean types, the operator will be boolean. If both operands are integers, the operator will be logical. If one operand is boolean and one is an integer, an error results. For the expression (x > 4) OR (y < 2), each of the expressions in parentheses will evaluate to TRUE or FALSE, so that the OR will be a boolean operator.

Operator precedence

Mathematicians have developed a series of conventions on operator precedence. When you see the expression 4x2
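The bit-by-bit arithmetic above is easy to check in any language with bitwise operators. The Python sketch below is only a cross-check of the numbers: Python writes binary literals as 0b101101 where mle writes 2x101101, and uses &, |, ^, and << for AND, OR, XOR, and SHL. The dictionary name results is hypothetical.

```python
# Bitwise results corresponding to the logical operator examples
results = {
    "45 AND 56": 0b101101 & 0b111000,       # 2x101101 AND 2x111000
    "7 SHL 3": 0b000111 << 3,               # 2x000111 SHL 3
    "2x1010 AND 2x1100": 0b1010 & 0b1100,
    "2x1010 OR 2x1110": 0b1010 | 0b1110,
    "2x1010 XOR 2x1100": 0b1010 ^ 0b1100,
}

for expression, value in results.items():
    # Show each result in decimal and in binary
    print(f"{expression} = {value} (binary {value:b})")
```

Python separates the boolean operators (and, or) from the bitwise ones by spelling, whereas mle chooses between them from the operand types.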
116. bservations and an expression for an individual likelihood. It computes and returns the total log likelihood. The individual likelihood for this example, specified within the DATA function, consists of a PDF function. A NORMAL distribution is specified with two arguments, topen and tclose. These times denote the time interval within which births occur. Because the arguments (which were read from columns 1 and 2 of the data file) differ from each other, the PDF function returns the area under a normal PDF between topen and tclose. The area corresponds to the probability of observing a birth within that interval. If, instead, we had specified one argument to the PDF function, or if topen was equal to tclose, the PDF function would have returned the probability density at that point, corresponding to exact ages at birth. Within the PDF NORMAL function call are two PARAM functions. These functions define parameters that will be changed in order to maximize the likelihood. Naturally, you can specify limits, starting values, etc. for these parameters.

End Part of the Model Statement

Between the RUN and the END part of a MODEL statement comes a list specifying how to run the model. The full model is run by specifying FULL; all parameters defined in the model will be estimated. Various reduced forms of the model can be run by specifying a REDUCE command. More details on this are given below and in a later chapter.
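The distinction between an interval probability and a point density can be sketched in Python. This is an illustration of the idea only, not the interpreter's implementation: the function names pdf_normal and total_loglik are hypothetical stand-ins for the PDF NORMAL and DATA functions.

```python
import math


def normal_density(t, mu, sigma):
    """Probability density of a normal distribution at the point t."""
    z = (t - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(t, mu, sigma):
    """Cumulative normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))


def pdf_normal(topen, tclose, mu, sigma):
    """Area between topen and tclose, or the density when the two are equal."""
    if topen == tclose:
        return normal_density(topen, mu, sigma)
    return normal_cdf(tclose, mu, sigma) - normal_cdf(topen, mu, sigma)


def total_loglik(observations, mu, sigma):
    """Sum the log of the individual likelihoods, one observation at a time."""
    return sum(math.log(pdf_normal(a, b, mu, sigma)) for a, b in observations)


print(pdf_normal(-1.0, 1.0, 0.0, 1.0))  # interval: area within one sd of the mean
print(pdf_normal(0.0, 0.0, 0.0, 1.0))   # exact time: density at the mean
```

The same likelihood function therefore handles interval-censored and exact observations in one data set, depending only on whether the two times differ.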
117. Chapter 1 Introduction to mle

mle is a simple programming language for building and estimating parameters of likelihood models. The language was originally intended for building and estimating the parameters of survival models, but it has evolved to be general enough to estimate parameters for many other types of likelihood models. Indeed, the language attempts to be a general purpose tool for likelihood estimation.

This chapter provides an overview of mle. The basic concepts of the programming language are introduced and some examples are given. Additional examples of mle programs and program fragments are sprinkled throughout this chapter, the rest of this User's manual, and the Reference manual. The mechanics of running mle from DOS or Unix are given in Chapter 2. Formal descriptions of the mle programming language are saved for later chapters. Another later chapter is devoted to examples of different types of likelihood models.

Preliminaries

This manual gives only a superficial treatment of topics like probability theory, probability models, stochastic modeling, and maximum likelihood estimation. In order to write mle programs, you will need a basic understanding of these topics. Some helpful, generally applied introductions to statistical modeling and maximum likelihood estimation can be found in Burnham and Anderson (1998), Cullen and Frey (1999), Edwards (1972), Hilborn and
118. d it is read in from the input file.

An example of creating and reading a data file

Data files are read as ordinary ASCII text files, which means they can be created with any text editor. Word processors can be used to create files as well, but the results must be saved as an ASCII text file. Nearly all word processors provide an ASCII text option. An example of a typical data file can be seen in Chapter 1, but here we will examine a more complicated data file and write the mle program to read and process the file.

The current version of mle creates variables of type real and attempts to read real numbers from each field for which a variable is defined. Even so, any delimited text can appear in fields that are not assigned to variables. Consider how we would create a DATA statement to read the numeric values for the following file:

  Last     First   MI   Age   Amount
  Smith    James   A    42    12000
  Jones    David   J    38    8000
  Connor   Mary         50    11000

First of all, notice that the first line of the file is a comment. Clearly, we do not want mle to treat this line as an observation, so we can discard the line by setting INPUT_SKIP = 1. From there, the data file has one line per observation, with each variable corresponding to one column, meaning that we will not need to use the LINE specification here. (Some data files place each observation across multiple lines, so that the LINE option in the DATA statement must be used.) This sample
119. d likelihood functions. When a likelihood is to be solved multiple times on similar data sets (like when running on bootstrapped data sets), it is worth exploring a couple of different temperatures and monitoring the progress of the annealing by using the verbose (-v) option. In fact, watching the entire annealing process is useful for developing an understanding of the algorithm. The variable SA_COOLING controls the cooling rate and is 0.85 by default. Too high a value will slow down cooling and may lead to unnecessary evaluations, whereas too low a value may result in simulated quenching. The number of steps of random parameter perturbation is set using SA_STEPS. The number of step length adjustments taken every iteration is controlled by SA_ADJ_CYCLES. Finally, the size of each step adjustment can be controlled by SA_STEPLENGTH_ADJ, but the default value of 2.0 usually works well.

The simulated annealing algorithm uses a different criterion for convergence than do the other solvers. An array of the best likelihoods of size SA_EPS_NUMBER (default is 4) is created and updated every iteration. Convergence is considered achieved when the likelihood for the current iteration differs from all SA_EPS_NUMBER likelihoods by less than the value of EPSILON.

Several other variables can be used for fine tuning of the simulated annealing algorithm, but there is rarely a need to mess with them. SA_STEPLE
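The roles of the cooling rate and the step length adjustment can be sketched in Python. This sketch follows the general form of the adjustment rule in Corana et al. (1987) and is an assumption about the shape of the rule, not the interpreter's actual code; the function names are hypothetical, and the default arguments reuse the values 0.85 and 2.0 quoted above together with an assumed lower bound of 0.4.

```python
def cool(temperature, cooling=0.85):
    """Reduce the temperature by the cooling rate after each iteration."""
    return temperature * cooling


def adjust_step_length(v, accept_ratio, adj=2.0, lower_bound=0.4):
    """Widen the step when most moves are accepted, narrow it when most are
    rejected, and leave it alone inside the null area around one half."""
    upper_bound = 1.0 - lower_bound
    if accept_ratio > upper_bound:
        return v * (1.0 + adj * (accept_ratio - upper_bound) / lower_bound)
    if accept_ratio < lower_bound:
        return v / (1.0 + adj * (lower_bound - accept_ratio) / lower_bound)
    return v


print(cool(1000.0))                  # temperature after one cooling step
print(adjust_step_length(1.0, 0.9))  # many accepted moves: longer steps
print(adjust_step_length(1.0, 0.1))  # many rejected moves: shorter steps
print(adjust_step_length(1.0, 0.5))  # inside the null area: unchanged
```

The adjustment pushes the acceptance ratio back toward one half, so the search neither wastes evaluations on moves that are always rejected nor creeps along with steps that are always accepted.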
120. data file and the output file, a DATA statement describing how to read (and possibly transform) observations from a data file, and specifications of one or more likelihood models along with parameters to find. Parameter estimates are then found by an iterative search that maximizes the likelihood given a set of observations. The resulting parameter estimates are then written to the output file.

The mle program file is created as an ordinary text file using almost any editor. You can create and edit the mle program using Notepad in Windows, the EDIT command in DOS, vi, pico, or Emacs in Unix, or any other editor that will read and write a file as ASCII text. Word processors such as Microsoft MSWord can be used as well, but you must remember to save your work using the "text with line breaks" option.

The Data File

The data file contains lines of observations. The observations are read (and perhaps transformed) when the mle program is run. The observations are then used with the likelihood function specified in the mle program file to find parameter estimates. Data files are standard ASCII text files. Typically, one line in the file represents one observation, although a single observation can span more than one line. Within each observation is a series of fields that are separated by spaces, tabs, commas, or some other user specified delimiter. Numeric fields can be read into mle variables.

The Output File

The output file is where result
121. dence interval. Panel a: uncorrelated parameters, where the one dimensional change in likelihood is identical to the change over both parameters. Panel b: correlated parameters, where the change in likelihood (dotted lines) is less than the change in likelihood over both parameters (dashed lines).

Given the limitations of these confidence intervals, why use them? There are several cases where they are helpful:
• When a single parameter is being estimated.
• In some models where parameters are statistically independent, like while estimating the location and scale parameters of a normal distribution.
• There are circumstances when the variance covariance matrix is singular. For example, this happens when one or more parameters are collinear and don't independently contribute information to a likelihood. Under these circumstances, the confidence intervals are helpful for identifying poorly identified parameters, so that the model can be modified to eliminate collinear parameters.

The confidence intervals are found iteratively in one dimension at a time. For each of the limit pairs, mle first evaluates the likelihood at the extremes, LOW + CI_LIMIT_DELTA and HIGH - CI_LIMIT_DELTA. Convergence occurs when the difference between the likelihood at the parameter estimate and the confidence limit estimate is equal to CI_CHISQ, down to an absolute error of CI_CONVERGE. The maximum number of iterations for each of the limits is CI_MAXITS.
122. e 6. Plots of observed and expected numbers of plant counts under two different distributions. [Figure: two panels, Thomas distribution and Poisson distribution, each showing Expected and Observed counts.] Chapter 7: Programming tutorial. The mle programming language is a general-purpose algebraic programming language. This chapter provides a tutorial and examples of some of the language tools that can be used for many types of programming. Introduction to programming in mle. People get passionate about programming languages the way they get passionate about religion. There are thousands of programming languages that have been written. Why should you use mle? Why indeed, with so many good general-purpose programming languages available in the world? I will not try to make strong arguments that mle is the best general-purpose programming language, and I will not even claim that it is the single best language for any specific purpose. Rather, I will argue that there are some pretty good reasons to use mle. But if you are already a crack Ada, Basic, COBOL, Fortran, Python, SAS, SNOBOL, Java, perl, or COBOL programmer, by all means use the language you know best. If you are an experienced programmer in any conventional programming language, then learning mle will be simple: the syntax is straightforward and punctuation is minimal.
123. e CURVE statement can be used: CURVE KEY <keystring> WITH <withstring> AXES <axesstring> <x_var> <x_min> TO <x_max> BY <y_var> <y_min> TO <y_max> <x_expr> <y_expr> <z_expr> <expr> <string>. Additionally, the REAL and INTEGER forms can be combined: CURVE KEY <keystring> WITH <withstring> AXES <axesstring> <x_var> <x_min> TO <x_max> BY <y_var> <y_min> <y_max> <y_points> <x_expr> <y_expr> <z_expr> <expr> <string>, or CURVE KEY <keystring> WITH <withstring> AXES <axesstring> <x_var> <x_min> <x_max> <x_points> BY <y_var> <y_min> TO <y_max> <x_expr> <y_expr> <z_expr> <expr> <string>. Gnuplot does not support error bars or boxes for three-dimensional plots. Thus, there are three required numeric expressions, <x_expr> <y_expr> <z_expr>, following the <y_var> definition, although additional numeric expressions can be written to the data file for other uses. These three required expressions give the x, y, and z values to be plotted for each combination of x_var and y_var. Here is an example of a simple three-dimensional plot. Suppose we want to plot th
124. e breaks sacred rules from the halls of Computer Science: variables can be automatically declared when first encountered in an assignment statement. The pitfalls of permitting this, in my opinion, are offset by ease of use in a statistical programming language. Formal declarations are intimidating to the occasional programmer (although I insist on writing the mle interpreter in a language that strictly enforces variable declaration). The suite of programming features was completed with the addition of user-defined procedures and functions. Currently, mle is embodied as about 25,000 lines of Pascal. Earlier DOS versions of the mle interpreter were compiled in Borland Pascal version 7. I still use the Borland environment for most development and debugging. The most recent release is compiled using the Free Pascal Compiler (FPC; Van Canneyt 2000), which benchmarks at three to six times faster than the Borland compiler, with slightly smaller code size. FPC has also relaxed the small data set limitation, as data variables and arrays can now be allocated larger than 64 Ki. With FPC, I can now release versions for Linux (ELF binaries), Windows (95, 98, 2000, NT), and, in the off chance that there is demand, OS/2, FreeBSD, Solaris (Intel), Commodore Amiga, and Atari ST. The Unix version of mle has traditionally been Solaris for the Sun Sparc architecture. This version was created by translating Pascal into C using the p2c translator Gi
125. e function SIN x 2 COS y 2 over the range 0 to 2π, with 30 points in each dimension. The mle code to do this is: MLE PLOTFILE DEFAULTPLOTNAME open plot file PLOT set contour base set hidden3d plot a surface plot and a contour plot set view 50 change the perspective a bit CURVE x 0 2 PI 30 BY y 0 2 PI 30 define the ranges x y SIN x 2 COS y 2 the function to plot END curve END plot END mle. The resulting Gnuplot file is: set terminal windows reset set data style lines set autoscale set nokey set contour base set hidden3d set view 50 splot 'eg7.001' using 1:2:3 notitle. The file eg7.001 contains the points generated by mle. Here is the resulting plot. [Figure: surface plot of the function with base contours.] Three-dimensional plots can include multiple curves. For example, to the previous curve we can add to the graph a plane through z 1 and another plane through z y 4: MLE PLOTFILE DEFAULTPLOTNAME open plot file PLOT set nocontour no contours set hidden3d hide lines set view 50 change the perspective a bit CURVE x 0 2 PI 30 BY y 0 2 PI 30 curve x y SIN x 2 COS y 2 END curve CURVE x 0 2 PI 10 BY y 0 2 PI 10 x y 1 END curve CURVE x 0 2 PI 10 BY y 0 2 PI 10 x y y 4 END curve END plot END mle. The resulting Gnuplot file is: set terminal windows reset set data set nocontour se
126. e is an example: PDF NORMAL topen tclose PARAM mean LOW 100 HIGH 400 START 270 TEST 0 FORM LOGLIN END PARAM stdev LOW 0.1 HIGH 100 START 20 END HAZARD COVAR sex PARAM b_sex LOW -2 HIGH 2 START 0 END COVAR weight PARAM b_weight LOW -2 HIGH 2 START 0 END END. The covariates sex and weight are modeled as effects on the hazard of failure. Parameters b_sex and b_weight provide estimates of the effects. The HAZARD statement always provides a proportional hazards specification, modeled directly on the hazard of the PDF. Usually the specification is loglinear, so that the hazard for the ith observation, including the covariate effects, is defined as $h(t_i \mid x_i, \beta) = h(t_i)\exp(x_i\beta)$, where $h(t_i)$ is the baseline hazard for the specified PDF and $x_i\beta$ is a vector of covariates $x_i$ and parameters $\beta$, so that $x_i\beta = x_{i1}\beta_1 + x_{i2}\beta_2 + x_{i3}\beta_3 + \cdots$. Then the survival function becomes $S(t_i \mid x_i, \beta) = S(t_i)^{\exp(x_i\beta)}$ and the probability density function becomes $f(t_i \mid x_i, \beta) = f(t_i)\,S(t_i)^{\exp(x_i\beta) - 1}\exp(x_i\beta)$. This particular hazards model specification is commonly used. By exponentiating the $x_i\beta$ array, the covariate effects will never cause the hazard to go negative (hazards are never negative). A multiplicative form for the proportional hazards specification can also be specified by setting the constant EXP_HAZARD to FALSE (it is TRUE by default). Then the model is $h(t_i \mid x_i, \beta) = h(t_i)\,x_i\beta$, $S(t_i \mid x_i, \beta) = S(t_i)^{x_i\beta}$, and $f(t_i \mid x_i, \beta) = f(t_i)\,S(t_i)^{x_i\beta - 1}\,x_i\beta$. You must ensure that $x_i\beta$ never goes negative. The LEV
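As a numerical check on the loglinear proportional hazards relations, the following Python sketch computes the covariate-adjusted hazard, survival, and density from a baseline distribution and confirms that the density equals hazard times survival. The exponential baseline, the function names `baseline` and `loglinear_ph`, and all numeric values are assumptions chosen for illustration; they are not part of mle.

```python
import math

def baseline(t, rate=0.2):
    """Exponential baseline: returns (pdf, survival, hazard) at time t."""
    surv = math.exp(-rate * t)
    pdf = rate * surv
    return pdf, surv, pdf / surv

def loglinear_ph(t, x, beta, rate=0.2):
    """Loglinear proportional hazards: the baseline hazard is multiplied
    by exp(x . beta), which is always positive."""
    f0, s0, h0 = baseline(t, rate)
    mult = math.exp(sum(xi * bi for xi, bi in zip(x, beta)))
    hazard = h0 * mult
    surv = s0 ** mult
    pdf = f0 * s0 ** (mult - 1.0) * mult
    return pdf, surv, hazard

# Hypothetical covariates (sex, weight) and effect parameters.
pdf, surv, hazard = loglinear_ph(t=3.0, x=[1.0, 62.5], beta=[0.4, -0.01])
print(pdf, surv, hazard, hazard * surv)
```

The last two printed values agree, as they must, since a density is always its hazard times its survival function. Replacing `math.exp(...)` with the raw sum would give the multiplicative form, in which case the sum must stay non-negative.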
127. e option is -p (parse only), which tells mle to parse the program without running it and report any errors in the grammar. The statements within the program are not executed. Another very useful option is the -v (verbose) option, which tells mle to provide periodic status reports while solving a likelihood. Among other things, the status report prints out the likelihood and parameter values at each iteration. Help Options. mle predefines a large number of functions, variables, constants, and reserved words. The -h (help) option provides short summaries of mle language parts, PDFs, and concepts. Typing mle -h yields: mle -h <keyword> to match keywords exactly; mle -H <keyword> to match partial keywords; -h MLE gives a program outline; PROCEDURES lists procedures; PDFS lists PDF types; F lists parameter forms; HAZARD gives an example of a hazard specification; SYMBOLS lists pre-defined variables; NUMBERS lists number formats; FUNCTIONS lists simple functions. Help is available for the following types of functions/expressions: IDENTIFIER FUNCTION ARRAY DATA DATAARRAY DERIVATIVE FINDMIN FINDZERO FUNCTION IF INTEGRATE LEVEL LEVELDELTA PARAM PDF HAZARD PPDF POSTASSIGN PREASSIGN PRODUCT QUANTILE QDF SUMMATION. Help is available for the following statements: ASSIGNMENT BEGIN BREAK CONTINUE CURVE DATA EXIT FOR FUNCTION IF MODEL MULTIPLOT PLOT PROCEDURE REPEAT WHILE. This option is particularly helpful f
128. ed. Variables are read in the same order in which they are defined. This is true even if they are read over several lines. Once a variable is defined, its value can be used in later transformations. Then, when reading in the data file, mle will take the value of that variable for the current observation for use in the later transformation. An example might be: DATA subject_id FIELD DROPIF subject_id 1022 DROPIF subject_id 3308 births FIELD DROPIF births 1 miscarriages FIELD DROPIF miscarriages abortions FIELD 9 DROPIF abortions pregnancies births miscarriages abortions KEEPIF pregnancies > 0 END. This data statement will read subject_id, then births, then miscarriages, and then abortions. These variables will then be added together and assigned to the variable pregnancies. An observation will be dropped if any of births, miscarriages, or abortions is negative one (in this case, the missing code), or if two particular subject_ids are found, or if pregnancies = 0. Creating dummy variables. Dummy variables (sometimes called indicator variables) are variables that take on the values 0 and 1 to denote two different states for an observation. A typical example is a dummy variable for an individual's sex, taking a 0 for females and a 1 for males. Frequently, dummy variables are used to simplify a more complex continuous or ordinal variable. Maternal age, for example, might be measured as a continuous variable, but the characterist
129. ed at the top of the screen are accessed using the <Alt> key along with the highlighted character. Menus. This section shows and describes the menu commands available in emle. File menu. From the main menu, <Alt>F brings up the File menu. The File menu provides a number of commonly used file-related operations. The menu contains these elements. Open (<Alt>o): provides a menu for opening up a file. The arrow keys can be used to move through files and directories. Note that the special file .. is used to change to the previous directory. Save: saves the current file. saveAs: prompts for a new name and then saves the current work as that name. Close: closes the current file. eXit: exits the program. Backups: toggles whether or not back-ups are made while saving files. Dos: escapes to a DOS session. Edit menu. From the main menu, <Alt>E brings up the Edit menu. The Edit menu provides some special editing functions. The menu contains these elements. Del_line: deletes the current line. Flipcase: flips the case of all characters from the cursor to the end of the current line. Lowercase: changes characters to lower case to the end of the current line. Uppercase: changes characters to upper case to the end of the current line. Ctrl_key: after selecting this, a control key can be entered into the text. Quit: quits this menu. Block menu. From the main menu, <Alt>B
130. ed in the assignment part of the functions. For example: PREASSIGN BEGIN This is the statement part r REAL 0 TO 359 FOR i 0 TO 359 DO r i DTOR i END for END begin this is the end of the statement part of the PREASSIGN PDF NORMAL a b c d END This is the function returned by PREASSIGN END preassign
• The conditional expressions in the IF THEN ELSE END and LEVEL functions take a Boolean expression of any complexity, e.g., IF a b AND c 2 2 < 23 OR d > 1 THEN ELSE END.
• The IF THEN ELSE END function has been generalized so that multiple ELSEIF THEN conditions may be added. The following assignment is an example: status IF height < 48 THEN 1 ELSEIF height > 48 and height < 60 THEN 0 ELSE 1 END if
• Types can be optionally defined for variables when they are first encountered. Valid types are INTEGER, REAL, CHAR, STRING, BOOLEAN, and FILE. For example: REAL 23 (x would be integer but is defined to be real); STRING c (would be char but is defined to be string).
• In general, types are handled better. Adding two integer variables together, for example, returns an integer value. The IF THEN ELSE END function can return any type, but the type after the THEN must match the type after the ELSE.
131. ed in the program. A lower and upper index must be specified as integer constants. Multidimensional arrays of all types are supported by mle as well. The format is var type[min1 TO max1, min2 TO max2, ...]. Some examples of declarations are: s STRING[1 TO 5] (defines a one-dimensional array of strings); r REAL[1 TO 10, 1 TO 10] (defines a 10 x 10 matrix); b BOOLEAN[0 TO 1, 0 TO 1, 0 TO 1] (defines a 3-dimensional array). An entire array can be initialized to a single value in an assignment statement. Examples are Arrayed variables are accessed by using brackets for subscripting: r REAL 0 TO 359 FOR i 0 TO 359 DO r i DTOR i writeln Sin i SIN r i END. Files. Text files are widely used in computer programming for statistical analysis and for data files. mle provides tools for creating, reading, writing, and appending to text files. There are four steps to working with files.
• First step: a variable must be declared as type FILE. The variable will be used to refer to a file; it acts as a so-called file handle.
• Next, a file must be opened. You must call one of the procedures OPENREAD, OPENWRITE, or OPENAPPEND. Each of these procedures takes two arguments. The first is the file variable and the second is a string expression that is the name of the file.
• No
132. emented by something other than one. Here is an example that prints the sequence 9, 18, 27, ...: FOR x 9 TO 99 STEP 9 DO WRITELN x END. The initial value of the index variable (here x) is set to the first value (9 in this case), and x is incremented by the step value each iteration so long as x is less than or equal to the final value (99 here). The step value can be negative, providing a countdown statement. FOR STEPS statement. Another variation on the FOR statement includes the STEPS clause. This allows for a fixed number of steps between the first and last values of the loop. For example, here is a program that prints the cumulative area under a standard normal PDF from -1 to 1 in 10 steps: MLE FOR x -1 TO 1 STEPS 10 DO WRITELN x NORMALCDF x END for END mle. Here is the resulting output:
-1.0000000000  0.1586552595
-0.7777777778  0.2183499460
-0.5555555556  0.2892573209
-0.3333333333  0.3694414036
-0.1111111111  0.4557640673
 0.1111111111  0.5442359327
 0.3333333333  0.6305585964
 0.5555555556  0.7107426741
 0.7777777778  0.7816500540
 1.0000000000  0.8413447405
The index variable of a FOR STEPS statement is always type REAL. REPEAT statement. The REPEAT statement provides a means of looping through statements until some condition is met. The REPEAT statement differs from the FOR statement in that there is no index variable and no start variable. Generical
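For readers more familiar with other languages, the FOR STEPS loop above can be approximated in Python as follows. This is a sketch, assuming that STEPS 10 means ten evenly spaced values including both end points (which is what the ten output rows suggest); `steps_loop` is a hypothetical helper name, and Python's `statistics.NormalDist` stands in for NORMALCDF.

```python
from statistics import NormalDist

def steps_loop(first, last, steps):
    """Yield `steps` evenly spaced real values from `first` to `last`, inclusive."""
    width = (last - first) / (steps - 1)
    for k in range(steps):
        yield first + k * width

std_normal = NormalDist(mu=0.0, sigma=1.0)
table = [(x, std_normal.cdf(x)) for x in steps_loop(-1.0, 1.0, 10)]
for x, area in table:
    print(f"{x:14.10f} {area:14.10f}")
```

The printed areas may differ from the mle output in the last few decimal places, since the two programs need not use the same numerical approximation for the normal CDF.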
133. en, I have progressively added language features, functions, probability density functions, and numerical methods to the program. For a spell I was obsessed with collecting probability density functions the way some people collect shoes: many will never be used for serious work, but I can peer into the wardrobe and take great satisfaction in seeing them tidily arranged. During another compulsive period, I decided that mle ought to recognize a plethora of number formats, including dates, times, angular formats, numbers in arbitrary bases, numbers with metric and computation suffixes, and Roman numerals. Eventually, the language was generalized to recognize and work with different variable types, including integer, real, complex, boolean, character, string, and file types. This led to a preoccupation with adding predefined mathematical, boolean, and string functions. Recent additions to mle have included full programming language capabilities. The language was largely modeled after Pascal, with some major differences. First, I jettisoned most punctuation: those pesky semicolons that separate statements and the commas and semicolons that separate lists of arguments. In mle, commas are always optional where they make sense. Sometimes they are helpful for appearance or to separate an argument beginning with a negative sign, so that a, -b is treated as two arguments and not the algebraic expression a - b. In an important way, the mle programming languag
134. ensional array of strings. REAL 1 TO 10 1 TO 10 (defines a 10 x 10 matrix). BOOLEAN 0 TO 1 0 TO 1 0 TO 1 (defines a 3-dimensional BOOLEAN array). Values within an array variable are accessed using brackets to denote subscripts. The following example creates an array of radian angles for integral degree angles and prints out a table of sine values: r REAL 0 TO 359 FOR i 0 TO 359 DO r i DTOR i (assignment to element i of array r) writeln Sin i SIN r i (access the ith element of array r) END. Initialized Array Variables. Arrays can be initialized at the same time they are defined. There are three ways to initialize an array. First, the value of a constant can be assigned to the array. Examples are: STRING 1 TO 5 (defines s and initializes all values to an empty string); REAL 1 TO 10 1 TO 10 0 (defines a 10 x 10 matrix and initializes everything to 0). An array can be used to initialize another array, provided that the arrays are identically defined. That is, they must have the same number of subscripts and the same subscript ranges. Here is an example: a REAL 1 TO 20 FOR x 1 TO 5 DO a2 x x END for b REAL 1 TO 5. Arrays can also be initialized with a list of values, one per element. A special function is defined that is enclosed within brackets, and within the function, brackets are used to nest the values to different levels. Here is an ex
135. ent statements using expressions: PI weight_max height_max 2 e1_count e2_count e3_count e4_count IF linear THEN max_age ELSE SQRT max_age END PDF NORMAL -2 2 1 3 END (gives area from -2 to 2 for N(1, 3)) SIN total 2 COS total 2. Variable Names. You can create new variables for the purpose of holding values. A few rules must be observed for naming variables and other identifiers, such as user-defined procedure and function names. A variable name must begin with a letter. After the initial letter, any combination of letters, numbers, the underscore character (_), and the period character (.) may be embedded in the name. Punctuation other than a period and underscore character is not permitted. Variable names in mle are insensitive to case: the variable GGG is the same as ggg, Ggg, and gGg. Variable names cannot be identical to mle keywords, such as PROCEDURE, DATA, FOR, etc. Also, a variable cannot take on the name of an intrinsic procedure (READLN, SEED, OUTFILE, etc.). Variable names can be the same as an intrinsic function. You are discouraged from doing this; it can become extremely confusing. If you do define a variable with the same name as a function, the function will no longer be available for use by the program. Here are some examples of valid variable names: identical identifier a is the same as A a_long_variable_name 1
136. REPEAT Statement. WHILE Statement. The Break Statement. The Continue Statement. The Exit Statement. DIFFERENCES BETWEEN VERSION 2.0 AND VERSION 2.1. DIFFERENCES BETWEEN VERSION 1 AND VERSION 2: Changes and New Features in Version 2; Converting Version 1 Programs to Version 2. INSTALLING AND RUNNING MLE: File Menu; Mle Menu; Help Menu; Default command mapping; Insert and delete commands; File commands.
137. er. They are: (1) the highest possible value that can be tried for the parameter, (2) the lowest possible value that can be tried for the parameter, (3) an initial value for the parameter, (4) a test value against which the parameter will be tested when standard errors are computed, and (5) a form for the parameter that defines simple algebraic transformations and the mathematical form for incorporating covariates. The following model statement defines all five characteristics for the parameters of a beta distribution: PDF BETA p PARAM PARAM END pd END data RUN FULL END model a LOW 10 START NUMBER END b LOW 10 START NUMBER END f. The two parameters of the beta distribution are limited to the range 0.5 to 10, whereas mathematically they are only restricted to positive values. The TEST 1 specifies that the parameter will be tested against one, instead of the default value of zero, after standard errors for the parameters are found. The start value of one simply gives mle a starting place that falls within the LOW and HIGH values. Use care when setting the HIGH and LOW limits. Most importantly, limits must be constrained to valid ranges for the intrinsic parameters. Thus, for the MIX mixing proportion parameter (the first of the three parameters), HIGH 1 and LOW 0 should be defined, as is appropriate for a probability, unless some FORM like FORM LOGISTIC is used to constrain the resulting parameter to between 0 and
138. erger F and Pólya G (1923) Über die Statistik verketteter Vorgänge. Zeitschrift für Angewandte Mathematik und Mechanik 1:279-289. Elandt-Johnson RC, Johnson NL (1980) Survival Models and Data Analysis. New York: John Wiley and Sons. Evans M, Hastings N, Peacock B (2000) Statistical Distributions. Third edition. New York: John Wiley and Sons. Fisher RA (1921) On the probable error of a coefficient of correlation deduced from a small sample. Metron 1:3-32. Fisher RA, Corbet AS, Williams CB (1943) The relation between the number of species and the number of individuals in a random sample from an animal population. Journal of Animal Ecology 12:42-58. Folks JL and Chhikara RS (1978) The inverse Gaussian distribution and its statistical applications: A review. Journal of the Royal Statistical Society, Series B 40:263-89. Forsythe G, Malcolm MA, Moler CB (1977) Computer Methods for Mathematical Computations. Englewood Cliffs, NJ: Prentice Hall. Gage TB (1989) Bio-mathematical approaches to the study of human variation in mortality. Yearbook of Physical Anthropology 32:185-214. Goffe WL, Ferrier GD, Rogers J (1994) Global optimization of statistical functions with simulated annealing. Journal of Econometrics 60:65-99. Gillespie D (1989) p2c: Pascal to C translator. Gompertz B (1825) On the nature of the function expressive of the law of human mortality. Philosophical Transactions of the Royal Society of London, Series A 115:513-85. Gumbel EJ
139. erval censored data: MLE TITLE Example DATAFILE ex3.dat OUTFILE ex3.out DATA topen FIELD 1 tclose FIELD 2 END MODEL DATA PDF LOGNORMAL topen tclose PARAM a LOW 0.00001 HIGH 9 START 1 END PARAM b LOW 0.00001 HIGH 2 START 0.4 END END of the PDF END RUN FULL END of the MODEL END. Current status analyses. Current status analysis consists of observations that are collected cross-sectionally. The methods most commonly associated with current status analysis are probit and logit analysis. mle makes it easy to do current status analysis with any of the built-in distribution functions. Under a cross-sectional study design, each observation consists of (1) the time of a single observation since the study began, t, and (2) an indicator variable to determine whether or not the individual experienced the event. The result of the indicator variable is that the individual is a responder (r) or a non-responder (n). The likelihood from N observations, made up of $N_r$ responders and $N_n$ non-responders, is $L = \prod_{i=1}^{N_r} \int_{-\infty}^{t_i} f(z)\,dz \; \prod_{j=1}^{N_n} \int_{t_j}^{\infty} f(z)\,dz$. This likelihood can be interpreted as follows. The likelihood for a non-responder is the area under the pdf from the time of observation to infinity. Thus, a non-responder contributes a likelihood that is exactly like a right-censored observation. The likelihood for a responder is the area under the pdf from -∞ (or 0 for pdfs defined to have positive arguments) to
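A current status log-likelihood of this kind can be sketched in Python. The lognormal distribution is chosen only to echo the PDF LOGNORMAL example; the function names `lognormal_cdf` and `current_status_loglik`, the parameter values, and the observations are hypothetical, and nothing here reflects how mle evaluates the likelihood internally.

```python
import math
from statistics import NormalDist

def lognormal_cdf(t, a, b):
    """CDF of a lognormal with log-scale location a and log-scale sd b."""
    return NormalDist(mu=a, sigma=b).cdf(math.log(t))

def current_status_loglik(observations, a, b):
    """observations: list of (time, responded) pairs.
    Responders contribute the area below t; non-responders the area above t."""
    total = 0.0
    for t, responded in observations:
        cdf = lognormal_cdf(t, a, b)
        total += math.log(cdf if responded else 1.0 - cdf)
    return total

# Hypothetical cross-sectional observations: (time observed, experienced event?).
obs = [(1.0, False), (2.0, False), (3.0, True), (4.0, True), (6.0, True)]
loglik = current_status_loglik(obs, a=1.0, b=0.4)
print(loglik)
```

Maximizing this function over `a` and `b` would give the parameter estimates; the sketch only evaluates it at one point.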
140. eter. mle initially uses a Δx and Δy of DX_START and then iteratively finds a Δx that changes the loglikelihood by at least DX_TOOSMALL but no more than DX_TOOBIG. Up to DX_MAXITS such iterations are permitted. The default values are almost always suitable. The one serious limitation of this method is that it does not work well for hierarchical likelihoods. The second estimate of the variance-covariance matrix is computed by estimating the second partial derivative by numeric perturbation. This method does not truly compute an expectation and is sometimes inaccurate (you can compare the two methods by setting both INFO_METHOD1 and INFO_METHOD2 to TRUE). Nevertheless, when hierarchical likelihoods are being computed, this method will produce better estimates. Confidence Interval Report. An approximate confidence region for each parameter can be estimated by mle. The report is printed when PRINT_CI is TRUE. When the variable PRINT_SHORT is TRUE, the report format is modified so that all parameter estimates are printed on one line. The confidence interval is defined as the extent of upper and lower perturbations away from the estimates that change the loglikelihood by a specified amount. For example, approximate 95% confidence intervals are formed when the change in the loglikelihood in each direction is 5.0239. This value corresponds to an expected probability of 0.025 on each tail of the chi sq
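The figure 5.0239 can be checked independently. Assuming it refers to the chi-square distribution with one degree of freedom (the text is cut off before saying so), the following Python sketch recovers a one-degree-of-freedom chi-square quantile from the standard normal quantile function; `chisq1_quantile` is a hypothetical helper name.

```python
from statistics import NormalDist

def chisq1_quantile(p):
    """Quantile of the chi-square distribution with one degree of freedom.
    Uses the identity that the square of a standard normal variate is
    chi-square(1), so P(Z**2 <= q) = 2 * Phi(sqrt(q)) - 1."""
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)
    return z * z

critical = chisq1_quantile(0.975)   # leaves probability 0.025 in the upper tail
print(round(critical, 4))
```

Under that assumption, the result agrees with the value quoted in the text to four decimal places.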
141. eters until the values that maximize the likelihood are found. In other words, free parameters are values that are to be estimated by mle; they are the unknowns in likelihood models. When the MODEL statement is run, mle will estimate the value of that parameter unless the parameter is constrained to some fixed value in the REDUCE part of the model statement. (Footnote 6: The word parameter is used in a very specific way, as defined in Chapter 1. Parameters are the quantities to be estimated in a likelihood model.) In the simplest case, parameters are specified as PARAM <p> HIGH <expr> LOW <expr> START <expr> TEST <expr> FORM <formspec> END, where <p> is the name chosen for the parameter. The keywords HIGH, LOW, START, and TEST specify characteristics for the parameter. LOW and HIGH specify the minimum and maximum values allowed for the parameter. mle will not exceed these values while trying to maximize the likelihood. START tells the maximizer what value to start with. TEST denotes the value against which to test the parameter for significance. By default, TEST is zero. It is used for a Wald test as the parameter is being written to the output file. Additionally, this is the value that the parameter is constrained to when left out by the WITH command. Setting Parameter Information. Five characteristics may be set for each paramet
142. ever. The Continue Statement. Like the BREAK statement, the CONTINUE statement works within loops (WHILE, REPEAT, and FOR). When a CONTINUE statement is encountered, all further statements are skipped until the end of the current loop. The CONTINUE statement is a convenient way to skip over sections of code and force another iteration of the loop. Arrays. An array is a series of contiguous memory locations referenced by a single variable name. Arrays have many important uses in computer programming. They are almost always used with FOR loops or other looping structures. The important idea behind arrays is that an integer value serves as an offset, or index, to the array elements. For example, consider an array called myarray that is defined to be 20 REAL elements long. Each element of the array can be indexed by placing an integer expression within square brackets, e.g., myarray 3 3 2. Suppose we wish to create a table of squared values and, later in the program, print the values out. The following code will accomplish this: MLE myarray REAL 1 TO 20 FOR i 1 TO 20 DO myarray i i 2 END for FOR i 1 TO 20 DO WRITELN i 2 myarray i END for END. In this last example, a one-dimensional array was defined as a REAL and indexed over the range from 1 to 20. Arrays must always be explicitly declared in mle. They must be defined the first time the variable is mention
143. expressions. Algebraic expressions are created using a series of special operators and calls to functions. Operators include algebraic symbols and a series of algebraic keywords for integer operations (DIV, MOD, SHL, SHR). See Table 8. The right-hand side of an assignment statement is an expression. Examples of valid assignment statements with expressions on the right-hand side are: 2 3 HOURS 60 2 12 5 first 10 second SIN 2 PI mask SHL 4 23 DIV 4. Boolean expressions evaluate to either TRUE or FALSE. The operators for creating boolean expressions are >, <, >=, <=, <>, the boolean keywords AND, OR, XOR, and NOT, and some simple functions. These operators are used in the same way as they are in many other programming languages: a <> 42 2 Bos b a <> 12 AND a > 0. The difference between boolean and logical expressions is that boolean expressions work with the values TRUE and FALSE only, whereas logical expressions work with bits on integers. For example, NOT TRUE is equal to FALSE, but NOT 767 is equal to -768. How does this work? The number 767 is represented by the computer as the binary sequence 00000000000000000000001011111111. The logical NOT operator flips all 1s to 0s and 0s to 1s, so that the number becomes 11111111111111111111110100000000. The first (left-most) bit denotes a neg
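The NOT 767 example can be reproduced in Python, which makes the same distinction between a boolean `not` and a bitwise complement `~`. This is an illustration in another language, not mle code; the 32-bit mask is an assumption made to match the 32-digit binary sequences shown in the text, since Python integers are not fixed-width.

```python
value = 767

# Boolean negation works on truth values only.
boolean_result = not True

# Bitwise (logical) negation flips every bit of the integer.
flipped = ~value

# Show both numbers as 32-bit two's-complement bit patterns.
bits = format(value & 0xFFFFFFFF, "032b")
flipped_bits = format(flipped & 0xFFFFFFFF, "032b")

print(boolean_result)
print(bits)
print(flipped_bits)
print(flipped)
```

In two's-complement arithmetic, flipping all bits of n always gives -(n + 1), which is why the result is -768 rather than -767.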
144. f the PDF the MODEL. Survival analysis: Right truncated observations. Right truncation arises in survival analysis when the later risk is determined by the study design. For example, we might have data on child mortality for analysis. Each child was followed from birth to age five, and the only children available in the data set were those who died from birth to five. This type of data collection can lead to unbiased results, provided the children's observations are right truncated at age five. For this example, we will use the Gompertz competing hazards mortality model for a fictitious prospective study of mortality. We will have observations selected for mortality by age five and no right censoring. A single age at death is known. The likelihood for exact times to death with right truncation is $L(\theta \mid t, t_e) = \prod_{i=1}^{N} \frac{f(t_i \mid \theta)}{\int_0^{t_e} f(z \mid \theta)\,dz}$. From this likelihood it can be seen that an individual's probability of death is the pdf at the age of death divided by the area from 0 to $t_e$, which renormalizes the pdf for the period of actual observation. An individual likelihood is constructed in mle as PDF GOMPERTZ 2 1 2 1 6 0 05 0 3 END, which is a death at the age of 2.1. MLE TITLE Example DATAFILE ex6.dat OUTFILE ex6.out DATA tdeath FIELD 1 Left truncation time END talpha 5.0 set a constant for right truncation MODEL DATA PDF GOMPERTZ tdeath tdeath talpha PARAM a LOW 0.00001 HIGH 0
145. fined procedures and functions are now available. BREAK, CONTINUE, and EXIT statements have been added. DOS/Windows versions of mle execute from two to five times faster. Versions are now available for Linux and other operating systems. New versions are not available for Solaris SPARC systems. The 64 Ki limit on user-defined arrays and DATA variables in DOS/Windows versions has been lifted. The dataarray structure for defining arrays (single or multidimensional), <expr> <expr>, has been added for assigning initial values to array variables. Array variables can be assigned to other array variables of identical size. Complex numbers are now supported. Many functions have been extended to return complex numbers. Complex numbers are specified as the expression, for example, 2 7 3 41. The REAL2STR function has been modified to provide for many new formats. Some predefined files are now flushed (i.e., buffered data are written) before the program exits. SYMBOLICINFIN is a new Boolean variable that, when TRUE (the default), writes ∞ and -∞ for infinity. When FALSE, it prints a number. This is useful when writing output to be used by other programs. Also, the value of infinity can be changed by assigning a new value to INFINITY. The default width of real numbers is controlled by the REALWIDTH variable, and the default number of decimal places is controlled by the REALDECIMALS variable.
146. follow the numeric expressions in the CURVE END. These strings will be appended to the Gnuplot plot statement so that plot options or other functions can be plotted. The statements will be written to the plot file. The typical purpose is to re-plot curves in a different style. Suppose we want to plot the normal distribution with μ = 0 and σ = 5 over the range -10 to 10 and also show a 21-point histogram superimposed on the continuous curve. The mle code to do this is: MLE PLOTFILE DEFAULTPLOTNAME open a plot file PLOT set ylabel normal pdf f t set xlabel t CURVE WITH boxes x 10 10 21 x the x value PDF NORMAL x 0 5 END the function to plot with lines END do END plot END mle The plot file, written in the Gnuplot graphics language, looks like this: set terminal windows reset set data style lines set autoscale set nokey set ylabel normal pdf f t set xlabel t plot eg6 001 using 1 2 notitle with boxes Y with lines The first line was written when the PLOTFILE statement was executed. The next line is blank because the PLOTINIT variable, written to the file when PLOT was executed, is empty. The next line came directly from the string argument list for the PLOT statement. The line beginning with plot was generated by the CURVE statement. Notice that the Gnuplot continuation character (a backslash) comes at the end of the line. This means that the next line
147. g whereas the latter does not. The MULTIPLOT <x> <y> END statement can be used to create x by y grids of plots on a single page. The statement writes commands to the plot file and an initialization string taken from the variable MULTIPLOTINIT. The PLOT END statement initiates a single plot, graph, or chart. It will write an initialization string to the plot file taken from the variable PLOTINIT. The CURVE END statement writes a single curve to the current plot. This is the statement that writes the Gnuplot plot and splot statements to the plot file. Each CURVE statement also creates a data set used by the plot file. The name of a plot file should usually end in the file extension .plt because this extension is used by mle and Gnuplot. mle can select a plot file name based on the name of the program file by using the DEFAULTPLOTNAME function. The statement PLOTFILE DEFAULTPLOTNAME will create a plot name that matches the name of the program file, but with the .plt extension replacing the extension .mle. The plot file will accumulate graphics instructions from the mle MULTIPLOT, WRITEPLOTLN, PLOT, and CURVE commands until a new plot file is opened or the mle program terminates. The plot file is then processed through Gnuplot to display or print the plots. The Plot Statement The PLOT END
148. ge or fine tune the behavior of mle and the values of these variables can be changed with assignment statements Assignment statements may be placed anywhere within the body of the mle program that is between the MLE and its matching END Some examples are 3 e wast Normally assignment statements do not occur within the DATA END and MODEL END statements Assignment like statements occur within the DATA statement for transformations Additionally the PREASSIGN and POSTASSIGN functions allow a list of one or more assignment or other statements to be used Finally the PARAM END statement uses assignment like statements like to define start highest and lowest values of parameters 10 MAXITER 100 EPSILON 0 0000 PRINT_OBS TRUE mle 2 1 manual Set the maximum number of iterations 001 Set the criterion for convergence print all observations after transformations The simplest assignment statement is generically defined in this way lt variable name gt lt expression gt The lt variable name gt name can be a preexisting variable e g MAXITER EPSILON or a user defined variable The lt expression gt that follows the equal sign can be a simple constant another variable or a mathematical expression Details of the syntax and functions that can be used to make up expressions are given in a later chapter The following are some examples of assignm
149. grams of any significance contain some type of looping or iteration. mle has the FOR statement, the REPEAT statement, and the WHILE statement for this purpose. The IF statement provides the means of conditionally executing statements. Here is a simple example: MLE age REAL WRITE How old are you READLN age IF age < 0 THEN WRITELN That's not possible ELSEIF age < 4 THEN WRITELN Perhaps you were giving your age in decades ELSEIF age > 115 THEN WRITELN Perhaps you are giving your age in months ELSE WRITELN Live long and prosper END if END The IF statement will execute only one of the WRITELN statements, depending on the range of values entered. The statement works this way. First, it evaluates the expression after the IF. If the expression is TRUE, the first WRITELN will be executed and then flow will jump to the end of the IF statement. That is, all the other parts of the IF statement will be skipped. If the expression after IF is FALSE, the first ELSEIF expression will be evaluated. Again, if it evaluates to TRUE, the statement(s) that follow will be executed and control will jump to the end of the IF statement. As a last resort, when all IF and ELSEIF expressions evaluate to FALSE, the statement between ELSE and END will be executed. Generically this is what the statement looks like
150. gt brings up the Block menu This menu provides editing functions for selecting moving and performing other functions on blocks of text The menu contains these elements markBegin markEnd Goto Copy Delete Move cLear Write Quit Search menu Marks the beginning of a block Marks the end of the block Goes to the currently marked block Copies the current block Deletes the current block Moves the current block Removes the current block Writes the current block to a file Quits this menu From the main menu lt Alt gt s brings up the Search menu This menu provides text searching and replacement functions The menu contains these elements Find Find Next Find Opts Replace Replace neXt Replace oPts Goto_line Quit Mle menu Searches for a string of text Searches for the next occurrence of the text Searches for text after setting the search options Searches and replaces text Searches and replaces text again Searches and replaces text after setting some options Goes to the specified line number Exits the menu From the main menu lt Alt gt M brings up the Mle menu This menu provides some several mle related special functions The menu contains these elements 34 Parse Run Expression template Insert template Options Quit Window menu mle 2 1 manual Submits the current file to mle with the parse option p This in effect checks for syntax errors Submits the
151. gt will not be executed unless the start <expr> is less than or equal to the TO <expr>. Likewise, if the step size is less than zero, then the start <expr> should be greater than or equal to the TO <expr>. Form 3 of the FOR statement performs the loop in a fixed number of steps, defined by the <expr> after STEPS, in equally spaced values from the start <expr> to the TO <expr>. The variable <v> is declared as type REAL, or must be REAL if it is already defined. Here is a simple example that goes from 0 to 1 in 100 steps: FOR x 0 TO 1 STEPS 101 DO END Form 4 of the FOR statement takes an array variable or a dataarray and loops through the array from its lowest bound to its highest bound. The index variable may be any type and must match the type of the array elements. Here is an example using a dataarray: FOR x TRUE FALSE FALSE TRUE TRUE DO END REPEAT Statement The REPEAT statement loops through statements until some condition is met. The format is REPEAT <statements> UNTIL <bexpr>. The <statements> are executed and then the Boolean expression <bexpr> is evaluated. If the result is FALSE, the loop repeats and <statements> are executed again. When <bexpr> evaluates to TRUE, the loop terminates. A REPEAT statement always executes the <statements> at least once.
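The loop semantics described above can be sketched outside of mle. The Python below is only an illustration, not mle code: the helper names steps and repeat_until are hypothetical, and it assumes that STEPS 101 yields 101 equally spaced values including both endpoints (that is, 100 intervals), which is one reading of the example above.

```python
def steps(start, stop, n):
    """Return n equally spaced values from start to stop, endpoints included.

    Assumed analogue of FOR x <start> TO <stop> STEPS <n>.
    """
    if n < 2:
        return [float(start)]
    width = (stop - start) / (n - 1)
    return [start + i * width for i in range(n)]


def repeat_until(body, done):
    """Run body() once, then keep repeating until done() returns True.

    Mirrors REPEAT <statements> UNTIL <bexpr>: the condition is tested
    only after the body has run, so the body always runs at least once.
    Returns the number of passes made through the body.
    """
    passes = 0
    while True:
        body()
        passes += 1
        if done():
            return passes


xs = steps(0, 1, 101)   # 101 values, spaced 0.01 apart
```

Under this reading, asking for 101 values is what produces a spacing of exactly 1/100 between successive values of x.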
152. h so many ways of doing the same thing, you might well ask what is the best way. The answer is that the best way is to write it in the way that is clearest to you, so that you can read the program a year later and be able to make sense of what you were doing. Reading from the keyboard: Reading from the keyboard is sometimes very useful. Here is a program that prompts a user for information from the keyboard. It asks for sample sizes, means, and standard deviations from two studies, computes a pooled standard deviation, and computes a paired t test. MLE This program computes a paired t test Define the variables to read n1 INTEGER n2 INTEGER u1 REAL u2 REAL s1 REAL s2 REAL Read in the sample sizes means and standard deviations WRITELN Paired t test WRITE Sample size 1 READLN n1 WRITE Sample size 2 READLN n2 WRITE Mean 1 READLN u1 WRITE Mean 2 READLN u2 WRITE Stdev 1 READLN s1 WRITE Stdev 2 READLN s2 Compute the values of interest df1 n1 1 df2 n2 1 dfp df1 df2 s_pooled SQRT df1 s1 2 df2 s2 2 dfp t u1 u2 s_pooled SQRT 1 n1 1 n2 STUDENTT t dfp Now write the results to the screen WRITELN Pooled t t df dfp One tailed p p END The prompts for information are written using the WRITE procedure. This means that the cursor does
153. hat define the data file and the output file. The DATA statement comes next, followed by a MODEL statement. Omitted sections of code are specified <like this>. MLE DATAFILE mydatafile dat for example OUTFILE myoutfile out TITLE MAXITER 100 DATA <Data specification> END MODEL <Expression> RUN <Run specification> END END An Example: Figure 1 is an example of an mle program that estimates the parameters of a likelihood. The problem at hand is to estimate the distribution of gestational ages at birth given the observations shown in Figure 2. These observations are counts of gestational ages at birth that were mostly recorded to within one week. We will use survival analysis to estimate the parameters μ and σ for the best fitting normal distribution. This is an example of survival analysis with interval censored observations. In this example, each line in the data file represents multiple observations. The number of observations on each line is given as frequencies within each interval. Distribution of gestational age Data are from Hammes and Treloar 1970 Am J Pub Health 60 1496 1505 MAXITER 50 Maximum number of iterations allowed EPSILON 0 0000001 Criterion for convergence of the model DATAFILE hammes dat Opens the input data file OUTFILE hammes out Opens the output file DATA This is the data statement Data are interval cens
154. he Wolves of Mount McKinley Fauna of the National Parks of the U S Fauna Series No 5 238 pp U S Dept Int National Park Service Washington Nelder JA and Mead R 1965 A simplex method for function minimization Computer Journal 7 308 13 Nelson W 1982 Applied Life Data Analysis New York John Wiley and Sons Pearson K 1895 Phil Trans Roy Soc London Series A 186 343 414 Pearson K 1900 Phil Mag and J Sci 5 Series 50 157 75 Pickles A 1985 An Introduction to Likelihood Analysis Norwich Geobooks Powell MJD 1964 An efficient method for finding the minimum of a function of several variables without calculating derivatives Comp Journal 7 155 62 Press WH Flannery BP Teukolsky SA Vetterling WT 1989 Numerical Recipes in Pascal The Art of Scientific Programming Cambridge Cambridge University Press Raftery AE 1995 Bayesian model selection in social research Sociological Methodology 25 111 195 Rao CR 1973 Linear Statistical Inference and Its Applications New York John Wiley and Sons Ridders CJF 1982 Advances in Engineering Software 4 2 75 6 Royall R 1999 Statistical Evidence A Likelihood Paradigm London Chapman amp Hall CRC 165 References Salvia AA 1985 Reliability applications of the alpha distribution JEEE Transactions on Reliability 34 251 2 SAS Institute 1985 SAS User s Guide Statistics Version 5 edition Cary NC SAS Institute Inc Schr dinger E 1915 Z
155. he model with parameter p free and secondly fit it with p held constant to 0 5 Statistical criteria a likelihood ratio test Akaike s Information Criterion or a Walt test can then be used to determine whether p deviates from the value 0 5 The run list defines which parameters are free and allows the user to test reduced models The run list begins with the word RUN and ends with a matching END Between the RUN and the END comes a list that specifies how the model is to be run Each model can be run with a different combination of free and fixed parameters Generically a runlist looks like this RUN FULL THEN END REDUCE lt reducelist gt THEN END WITH lt withlist gt THEN END END 59 FULL REDUCE largeeffect MODEL DATA mle 2 1 manual When FULL is specified all model parameters defined with the PARAM END function are taken to be free parameters and estimated Only one FULL is usually needed for a model The REDUCE keyword provides a mechanism to constrain some parameters of the model The REDUCE keyword is followed by a list of constraints All parameters of a model will be considered free except those constrained in the lt reducelist gt Parameters may be constrained to other parameters to constants or to variables More than one REDUCE keyword may occur in a single run list The lt reducelist gt is a set of one or more constraints that look like assig
156. however they may represent other measurements length dose height etc 72 mle 2 1 manual Table 5 Likelihoods returned by PDF for one two three and four time variables under different conditions The Log Normal distribution is used as an example Example LNNORMAL LNNORMAL te LNNORMAL te LNNORMAL te LNNORMAL te LNNORMAL ty te ta LNNORMAL ty te ta LNNORMAL tu tes to to LNNORMAL tu tes to to INNORMAL fm te ta INNORMAL ty te tas to PDFs contribute to likelihoods in a number of ways When 1 arg tuzte te 00 u lt te ta lt ty to te tu te ta tu te ta Class Exact failure at t Exact failure at t t Right censored or cross sectional non responder at t Cross sectional responder at te Interval censored over the interval to te Includes as a limiting case cross sectional responder and right censored Left truncated exact failure Left truncated interval censored failure Left and right truncated exact failure Left and right truncated interval censored failure Hazard Right truncated hazard Resulting Likelihood L f t L fuse L f de S t u L f dz Flt L f dz S t S t u 10 _ ft roa 2 Re _ S t S t SU S roa 2 L f t od F t L S t S o _SU S t _ St S t Oe S t S t f dz oe poh S
157. ical functions like GAMMA, BETA, and BESSELI return real values, and so the variable to which these functions are assigned must be type REAL as well. Integer variables and functions can always be assigned to real variables; they are automatically converted to real values on assignment. On the other hand, you must use the ROUND or TRUNC functions to convert a real number into an integer value for assignment to an integer variable. Integer variables take on whole number values over a machine dependent range of numbers. For most versions of mle this range is -2,147,483,648 to 2,147,483,647. Arguments to some functions, like IDIV, require INTEGER type variables. Complex variables include a real number part and an imaginary part. Complex numbers are specified by expressions such as 1.2 + 0.4i. Most mathematical functions are defined for complex types. For example, SQRT(-1 + 0i) returns 0.000 + 1.000i. There is no natural ordering for complex variables, so the comparisons <, <=, >=, and > are undefined. Boolean variables take on one of two states: TRUE or FALSE. No other value is allowed or recognized. Boolean expressions are frequently used to test conditions. For example, the IF THEN ELSE END function evaluates the first expression between the IF and THEN to Be aware, however, that the computer representation for real numbers is
158. ically this is a variable like age, sex, income, etc. The effect of the covariate is multiplied by the value of the PARAM function that follows. The way in which covariates and parameters are modeled is discussed in more detail below. Here is an example of a likelihood hand coded for an exponential PDF for exact failure times. PARAMs, built-in simple functions, and algebraic expressions are all shown in this likelihood: MODEL DATA PARAM lambda LOW 0 HIGH 1 START 0 23 END EXP lambda t END RUN FULL END Notice that lambda is first defined as a parameter and thereafter is used as an ordinary variable. As mle iteratively seeks a solution, new values of lambda will be tried. As the likelihood itself is being computed, the PARAM function will simply return the current estimate of lambda. An alternative way to code this example is to define the parameter first and assign it to another variable: MODEL PREASSIGN lam PARAM lambda LOW 0 HIGH 1 START 0 23 END DATA lam EXP lam t END data END preassign RUN FULL END model The PREASSIGN function is described in another chapter. In the next example, five parameters are defined: two each for the two PDF functions, and one parameter that was added for the first argument to the MIX function call. Typically, parameters are defined for the intrinsic parameters of a PDF function. For example, the normal PDF has two intrinsic parameters, μ and σ. The firs
159. ics of interest are teen mothers mothers from 20 to 33 and mothers over age 35 Two dummy variables can be created from the continuous measure of age The reference age group can be defined as mothers from 20 to 35 One dummy variable is created that takes on the value 1 for mothers under 20 and O otherwise And the second dummy variable takes on a value of 1 for mothers over 35 and a O otherwise Dummy variables are easy to create within the DATA statement Suppose you are measuring the length of some study animal You want to create four dummy variables for the length range short 0 to 30 mm medium 30 to 40 mm long 40 to 50 mm and very long 50 mm gt The xxx yyy notation defines an interval that includes exact number xxx and up to but not including yyy 50 DATA length is_short is_medium is_long is_verylong IF length gt 50 THEN 1 ELSE 0 END mle 2 1 manual FIELD 5 DROPIF length lt 0 IF length lt 30 THEN 1 ELSE 0 IF length gt 30 AND length lt 40 THEN 1 ELSE 0 IF length gt 40 AND length lt 50 THEN 1 ELSE 0 Skipping initial lines in the data file Data files may have initial descriptive lines at the top that must be skipped The INPUT_SKIP variable controls how many lines to skip in a data file For example if the first four lines must be skipped the line INPUT_SKIP 4 should appear before the pata statement It will direct mle to discard the first four lines of the
160. ifferent types of graphics devices. In order to plot or display a graph on a particular device, you must specify a terminal type. Gnuplot can then generate graphics for that specific device. As an example, in previous graphs in this chapter the device was set to Windows, and the graphs were copied and pasted into this document. The Gnuplot statement set terminal windows is in all of these programs. You can set the terminal to another device. One type of device defined by Gnuplot is a dumb terminal, specified by set terminal dumb. You can set the graphics device to a dumb terminal in two ways. First, you can edit the Gnuplot program (i.e., the program that ends in .plt) and add this statement before the plot command and after any other set terminal statement. Alternatively, you can insert the command WRITEPLOTLN set terminal dumb in the mle program after the PLOTFILE statement. The following example shows the result of plotting the previous sine and cosine example with the terminal set to dumb. Even so Gnuplot is not distributed under the same license. In fact, it is a coincidence that GNU appears in Gnuplot and is the name adopted by the Free Software Foundation. See the Gnuplot manual for details. [Figure: the sine and cosine example rendered as character graphics on a dumb terminal]
161. ing the perspective MLE PLOTFILE DEFAULTPLOTNAME PLOT set zrange 0 1 set contour base set nosurface set yrange reverse set view 180 0 CURVE x 3 3 25 BY y 3 3 25 X Y EXP x 2 1 8 x y y 2 la type of bivariate normal END curve END plot END 100 mle 2 1 manual A Helix A helix is defined parametrically with simple functions The following code generates a helix MLE PLOTFILE DEFAULTPLOTNAME WRITEPLOTLN set zrange 1 set view 60 30 0 75 2 set hidden3d PLOT CURVE x 0 2 15 BY y PI 4 PI 40 x COS y X SIN y y 3 END curve END plot END LO nueln 2 i 1S L05003T ps5 aes Geometric Figures Mathematically defined geometric figures can be easily drawn This example shows a number of useful tricks in Gnuplot including turning off the axis borders and graphing multiple plots MLE PLOTFILE DEFAULTPLOTNAME WRITEPLOTLN set zrange 0 set hidden3d set view 70 WRITEPLOTLN set noborder set noxtics set noytics set noztics PLOT CURVE x 0 2 PI 20 BY y 0 4 20 plot a cone SIN x y COS x y y 5 END curve CURVE x 0 2 PI 20 BY y 0 2 PI 20 Now plot a torus around the cone COS x 3 COS y SIN x 3 COS y SIN y 2 5 END curve CURVE x 0 2 PI 20 BY y PI 2 PI 2 20 And place a sphere on top COS x COS y SIN x COS y SIN y 6 END curve END plot END mle 101 mle 2 1 manua
162. ion is dropped. The line newvar FIELD 3 DROPIF newvar <= 0 will drop all observations when the variable in field three is not positive. KEEPIF provides another mechanism to drop observations. The expression following KEEPIF must evaluate to TRUE or FALSE. If FALSE, the observation is dropped (that is, not kept). The line newvar FIELD 3 KEEPIF newvar > 0 will drop all observations for which the variable in field three is not positive. KEEPIF and DROPIF expressions can be far more complex, but must return TRUE or FALSE. Usually data are read from a data file. The DATAFILE procedure defines and opens this file. Here is an example: DATAFILE test dat DATA o_time FIELD 1 o_time 365 25 DROPIF o_time > 1000 c_time FIELD 3 IF c_time 1 THEN c_time ELSE c_time 365 25 END height FIELD 6 DROPIF height < 0 heightsq height 2 missing_data FIELD 4 DROPIF missing_data <> 1 frequency FIELD 5 DROPIF frequency < 0 END The variable names FREQUENCY or FREQ are taken as frequencies for each observation. If both variable names are used, FREQUENCY is taken as the frequency variable. The frequency of each observation is used to compute a proper likelihood, as if multiple lines of identical observations were read. If the FREQUENCY and FREQ keywords are both missing, a frequency of one is assumed for each observation.
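The statement that a frequency acts "as if multiple lines of identical observations were read" can be checked numerically. The following Python sketch is an illustration only and says nothing about how mle is implemented internally. The function names, the choice of a normal PDF, and the rule of dropping rows with a frequency of zero or less are all assumptions made for the example.

```python
import math


def normal_pdf(t, mu, sigma):
    """Density of the normal distribution at t."""
    z = (t - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def weighted_loglik(rows, mu, sigma):
    """Log likelihood where each row is (time, frequency).

    Each observation contributes frequency * log(likelihood).
    Rows with a non-positive frequency are skipped, playing the
    role of a DROPIF on the frequency variable (assumed rule).
    """
    total = 0.0
    for t, freq in rows:
        if freq <= 0:
            continue
        total += freq * math.log(normal_pdf(t, mu, sigma))
    return total


def expanded_loglik(rows, mu, sigma):
    """Same quantity, computed by literally repeating each observation."""
    total = 0.0
    for t, freq in rows:
        for _ in range(max(freq, 0)):
            total += math.log(normal_pdf(t, mu, sigma))
    return total


rows = [(1.0, 3), (2.5, 1), (0.2, 0), (4.0, 2)]
```

The two functions agree up to floating point rounding, which is why a single line with a frequency of 3 can stand in for three identical lines in the data file.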
163. ion with parameters For example it might be the SILER model if individuals span the entire lifespan or it might be the MAKEHAM Gompertz Makeham model if the entire sample consists of adults Under either model the likelihood given a series of skeletal ages is Table 7 Ages at death for 608 Dall mountain sheep Source Deevey 1947 Minimum age Maximum age Number dying in interval 33 88 7 8 7 18 28 29 42 80 2 3 4 5 6 7 8 9 121 10 11 12 Statistical examples js La 1 F 1 10 I if exact ages are known or N L S 10 S 16 i l if ages are known over intervals The data of Murie 1944 as reported in Deevey 1947 will serve as our example The raw data consist of 608 Dall mountain sheep skulls collected in the Mt McKinley Park Table 7 The ages at death were determined from the annual growth rings on the horns Causes of death were not determined but predation by wolves was quite common The data were fit by maximum likelihood to the mixed makeham model The most parsimonious model had all parameters except the Q parameter The following parameter estimates and standard error were found p 0 221 0 018 Qt 1 297 0 211 0 0 00146 0 00032 B3 0 618 0 023 The log likelihood was 1461 350 Probability of success Treatment length days The interpretation of the mixed makeham model is that there are two subgroups a
164. is New York John Wiley amp Sons Wise ME 1966 Tracer dilution curves in cardiology and random walk and lognormal distributions Acta Physiologica Pharmacologica Neerlandica 14 175 204 Wood JW 1989 Fecundity and natural fertility in humans Oxford Reviews of Reproductive Biology 11 61 109 Wood JW 1994 Dynamics of Human Reproduction Biology Biometry Demography Hawthorne NY Aldine de Gruyter Wood JW Holman DJ O Connor KA and Ferrell RE 2001 Models of human mortality In Hoppa R and Vaupel J eds Paleodemography Age Distributions from Skeletal Samples Cambridge Cambridge University Press Wood JW Holman DJ Weiss KM Buchanan AV LeFor B 1992 Hazards models for human biology Yearbook of Physical Anthropology 35 43 87 Wood JW Holman DJ Yasin A Peterson RJ Weinstein M Chang M c 1994 A multistate model of fecundability and sterility Demography 31 403 26 Zipf GK 1949 Human Behavior and the Principle of Least Effort Reading Addison Wesley 166 mle 2 1 manual 167
165. ivate to that function. The function declaration does not actually do any work in a program. The work comes when the function is called, as in the WRITELN line that calls the function. Once called, each argument is evaluated and a copy is assigned to the formal argument defined in the heading of the function. The statements within the function are executed and the result is passed back to the expression. Within a function, the variable RETURN is automatically declared. RETURN can be used as an ordinary variable. When the function exits, the value stored in RETURN is passed back to the calling expression. Here is what this example produces: 4 0000000000 20 250000000 125 00000000 Here are some other notes about user defined functions. Like procedures, VAR arguments can be defined. Arrays can only be passed as VAR arguments to user defined functions. Functions and procedures can be defined and called within a function, but will not be available outside that function. User defined functions can overwrite the name of intrinsic functions. BEGIN END Statement The BEGIN END statement provides a means of providing multiple statements in contexts where only a single statement is allowed. The format is BEGIN <statements> END. The most important use for this statement is with the PREASSIGN END and POSTASSIGN END functions, discussed in a later chapter. FOR
166. ken as the frequency variable. For example, DATAFILE test dat DATA frequency FIELD 1 DROPIF frequency < 0 start_time FIELD 2 last_time FIELD 3 END will take the first field in test dat as the frequency for each observation. The maximizer will automatically use the frequency variable as a count of repeated observations. Transformations of data: A number of simple data transformations can be made within mle. The transformations are done while the data are being read from the input file. Examples of transformations are DATA event_time FIELD 5 direction FIELD 6 winglength FIELD 8 estage END event_time 1900 365 25 DROPIF event_time < 0 COS direction LN winglength 2 25 3 7 winglength 12 76 winglength 2 1 14 Transformations begin with which is followed by an expression. Expressions are discussed in great detail in the reference manual. Basically, expressions in mle are similar or identical to expressions found in other computer languages and spreadsheets. In the first variable declaration of the example, event_time is read in from the input file. That initial value of event_time is then used in the transformation, and a new value of event_time is computed as event_time 1900 365 25. This result is assigned back to event_time. Following that, the DROPIF statement will conditionally decide whether or not the observation is to be dropp
167. keyword e g using a predefined function 134 Types Programming tutorial e An additional point of good programming practice is to create variable names that are meaningful Choose subject_birthdate over something less descriptive like soa Doing so will pay off in the extra time many times over The payoff comes when you look at your program weeks months or years later and are able to quickly understand what the program does On the other hand some abbreviation is warranted particularly if you do so consistently for all variables If you always use subj in place of subject the variable name subj_birthdate might work just as well The variable signature is also assigned a string value In this case the string value is computed as the concatenation of three separate elements first a string constant secondly a string value returned by the REAL2STR function and third a string constant Assignment statements serve two purposes First they create new variables The variables population greeting and signature did not exist until they were defined in the assignment statement When each variable is first used in an assignment statement its type is determined by the type returned from the expression on the right hand side of the assignment statement The other purpose of assignment statements is to assign values to variables as is done here Once a variable is created it can be assigned other values of the same type or values th
168. l Animation Example Multiple PLOT END statements can be used to create animation in mle Alternatively the time dimension can be introduced with the use of a looping statement outside of the PLOT END statement Gnuplot has a pause command that helps control the length of time each plot is displayed Here is an example MLE An animation example PLOTFILE DEFAULTPLOTNAME open plot file WRITEPLOTLN set contour both set hidden3d FOR f 4 TO 9 DO PLOT pause 2 wait two seconds before showing the next plot CURVE x 10 10 30 BY y 10 10 30 Ke yr BESSELT O SORT xX Z y 2 f END curve END plot END for END mle This example produces this sequence of plots 102 mle 2 1 manual Creating Plots from the Model Statement The MODEL statement can create two types of commonly used plots that are related to model estimation The first plot includes three graphs of distributions the survival density function the probability density function and the hazard function Each of these is graphed with error bars The second type of plot is a likelihood surface graph in either one or two variables Before attempting to plot either one of these special plots a plotfile must be defined with the PLOTFILE procedure This opens the plot file and defines the name of the plot data file Additionally the PLOT END statement must surround the MODEL statement Estimated Distributi
169. le and assigns the value to the variable age. After that, the expression age 365 25 270 is evaluated and the result assigned to the variable age. The second example reads the second field and assigns the value to the variable weight. Following that, the expression weight 1000 is evaluated and assigned to the variable weight. Then the expression weight < 0 is evaluated. If TRUE, the observation is dropped. If not, the observation is kept. Observations can now be simulated or otherwise created within mle without reference to a data file. This is done by setting CREATE_OBS to some positive value. The following example will create 100 uniform random observations: CREATE_OBS 100 DATA v1 FIELD 1 RAND END data A number of useless functions that were used with the old data transformations have been eliminated (e.g., ONE, SECOND, ONEIF, RESPONSE, etc.). A number of new functions have been added (e.g., DEFAULTOUTNAME, FISHER, ISODD, STRING2REAL, INT2STR, EOF, EOLN). A fairly complete set of functions is now available to work with calendar dates. A full list of simple functions can be generated by typing mle h functions. The PREASSIGN and POSTASSIGN functions have been generalized so that any single statement is allowed in the statement part of the function. By using a BEGIN END block, more than one statement can be us
170. list of parameters can be specified that will be used to create a series of models. More than one WITH keyword may occur in a single run list. The <withlist> is a list of parameters. Parameters are listed in one of two ways. Parameters listed outside of parentheses are included in every model. Parameters listed within parentheses are included in some models but not others; essentially, all permutations of models are generated from parameters listed in parentheses. MODEL DATA PDF BERNOULLITRIAL success PARAM b_0 LOW 500 HIGH 500 FORM LOGISTIC COVAR x1 PARAM b_1 LOW COVAR x2 PARAM b_2 LOW COVAR x3 PARAM b_3 LOW END Here is an example. Suppose the likelihood of interest specifies a logistic regression model with three covariates: $L = \prod_{i=1}^{N} B\left(t_i \mid \frac{1}{1 + \exp[-(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i})]}\right)$, where $B(t \mid p)$ is a Bernoulli trial with parameter $p$; the function returns $p$ whenever $t$ is 1 (success) and returns $1 - p$ when $t$ is 0 (failure). This likelihood has four parameters. $\beta_0$ defines the baseline probability of success. $\beta_1$ to $\beta_3$ are the effects of covariates $x_1$ to $x_3$ on the baseline probability, respectively. A natural way of estimating this model is to try every permutation of covariates and take the most parsimonious of the models. Here is a likelihood that will do just that: 10 HIGH 10 HIGH 10 HIGH 10 START 10 START 10 START param END pdf END data RUN WITH b_0 b_1 b_2 b_
171. llespie 1989 and then compiling the result with a C compiler. An old version of mle, 2.0.10, is still available for Solaris Sparc, but I have made the agonizing decision to restrict future development to architectures supported by FPC. For the first-time user of mle, I would like to offer this encouragement. Many of the uninitiated find the ideas behind maximum likelihood estimation completely foreign. Yet the principles, once grasped, are utterly straightforward and fundamental. A Zen-like attitude really helps: empty your mind of traditional statistical teachings. Learning the mle language for doing likelihood estimation essentially involves thinking about the likelihood of an observation, and specifying the likelihood for that observation in a way that is useful to the computer. Once you begin thinking in this mindset, the rest is straightforward hard work. Many people have contributed their ideas, insights, criticisms, and bug reports. Others have given me time or space for development, datasets, interesting analytical challenges, or have given their time in reading or testing. I thank Ken Bennett, Adam Connor, Henry Harpending, Dennis Hogan, Robert E. Jones, George Kephart, Goeff Kushnick, Lyle Konigsberg, Arindam Mukherjee, Kathleen O'Connor, Paul Riggs, David Steven, Bethany Usher, Kenneth Weiss, and James Wood. Their encouragement and interest are deeply appreciated. I suspect that few software manuals come complete with dedications
172. <color> <fontname> <size> for displaying in windows. The FINISHPLOT procedure. The procedure FINISHPLOT provides a way to execute Gnuplot from within the mle program itself. The procedure takes a single boolean argument. Here is what the procedure does: • If the argument is TRUE, a pause -1 statement will be written to the plotfile. This will cause the graph to be displayed until you either press a key or click on a dialog box. If the argument is FALSE, the pause statement is not written to the plotfile. • The plotfile is closed. No more curves can be written to this file. • The Gnuplot program is executed with plotfile as its argument. This will cause the plot to be written to whatever terminal is defined. For example, if the command set terminal windows (Windows) or set terminal x11 (Unix) is specified in the plotfile, the graph will be displayed on the screen. Other drivers will cause the plot to be written to the file defined by a Gnuplot set output command. For additional details on how the Gnuplot program is executed, see the description of the FINISHPLOT procedure in the procedure summary chapter. More Examples. Additional examples of graphical programming in mle are given here. Graphing PDFs, SDFs, CDFs, and HFs. Here is an example of plotting all four basic probability functions for the Weibull distribution with three different sets of parameters. This example shows multiple plots in o
173. lue is taken from the PARAM function as the LOW value, and the maximum is taken from the HIGH value. Here is an example of statistical estimation and plotting of distributions and a likelihood surface:
   MLE
     TITLE = "Japanese tooth eruption: lower first incisor"
     DATAFILE("japan.dat")
     OUTFILE(DEFAULTOUTNAME)
     PLOTFILE(DEFAULTPLOTNAME)
     DATA
       lilo FIELD 5 LINE 1   {earliest eruption age for lower central incisor}
       lilc FIELD 6 LINE 1   {latest eruption age}
       sex FIELD 3 LINE 2    {Child's sex}
     END
     PLOT_DISTS = TRUE
     DIST_T_START = 5.0   {Plot the distribution from 5}
     DIST_T_END = 15.0    {to 15 months}
     DIST_T_N = 25        {in 25 points}
     PLOT   {surrounds the model statement}
       MODEL
         DATA
           PDF NORMAL(lilo, lilc)
             PARAM mean LOW = 6 HIGH = 10 START
             PARAM stdev LOW = 1.2 HIGH = 3 START
           END {pdf normal}
         END {data}
       RUN
         FULL
         SURFACE mean stdev   {plots the surface for mean and stdev}
       END {model}
     END {plot}
   END {mle}
The following four plots result. [Figure: four panels titled Survival Function, Probability Density Function, Hazard Function, and Likelihood.] Chapter 6. Statistical examples. This chapter provides a series of examples in creating likelihood models and estimating parameters of the models. The examples are categorized by the type of likelihood problem being done. Some of the example
174. ly the statement looks like this: REPEAT <statements> UNTIL <bexpr>. The <statements> are executed and then the boolean expression <bexpr> is evaluated. If the result is FALSE, the loop repeats and <statements> are executed again. When <bexpr> evaluates to TRUE, the loop terminates. A REPEAT statement always executes <statements> at least once. The next example is a program that converts polar to rectangular coordinates. The REPEAT statement is used to verify that the angle falls in the proper range.
   MLE
     {Program to convert polar coordinates to rectangular coordinates}
     angle : REAL
     radius : REAL
     twopi = 2 * PI
     REPEAT
       WRITE("Angle in radians: ")
       READLN(angle)
       good = (angle > 0) AND (angle < twopi)
       IF NOT good THEN
         WRITELN("Angle must be > 0 and < twopi")
       END
     UNTIL good
     WRITE("Radius: ")
     READLN(radius)
     x = POLARTORECTX(angle, radius)
     y = POLARTORECTY(angle, radius)
     WRITELN("Rectangular coordinates are ", x, " ", y)
   END {mle}
WHILE statement. The WHILE statement provides a means of looping through statements while some condition is met. The format is: WHILE <bexpr> DO <statements> END. The boolean expression <bexpr> is evaluated first. If the value is TRUE, the <statements> are executed once and <bexpr> is evaluated again. The sequence contin
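The repeat-until pattern and the polar conversion can also be sketched in Python (not mle code). Python has no REPEAT statement, so the sketch uses a loop whose body always runs once and exits when the test passes. The function names are hypothetical, candidate angles are drawn from a list rather than the keyboard so the sketch can run unattended, and the strict bounds follow the operators visible in the text.

```python
import math

def first_valid_angle(candidates):
    """Keep taking candidate angles until one lies strictly between 0 and 2*pi."""
    two_pi = 2 * math.pi
    while True:                      # the body always runs at least once, like REPEAT
        angle = next(candidates)
        good = 0 < angle < two_pi
        if good:                     # plays the role of UNTIL good
            return angle
        print("Angle must be > 0 and < 2*pi")

def polar_to_rect(angle, radius):
    """Convert polar coordinates (angle in radians) to rectangular (x, y)."""
    return radius * math.cos(angle), radius * math.sin(angle)

# Two out-of-range values are rejected before pi/2 is accepted.
angle = first_valid_angle(iter([-1.0, 7.0, math.pi / 2]))
x, y = polar_to_rect(angle, 2.0)
print("Rectangular coordinates are", x, y)
```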
175. m in mle will make it easy to move to another programming language. There is no single language that is good at handling all programming problems. All languages have strengths and weaknesses for particular programming tasks. mle is good for doing straightforward manipulation of data and scientific computation, and developing simple simulations. The extensive library of predefined functions is what makes mle useful for these tasks. The language is not suited for building complex interfaces (using the mouse, graphics, menus, etc.) and is not good for low-level development, like writing an operating system. Additionally, mle is an interpreted language. Hence, if speed is an important criterion, then conventionally compiled languages like C or Pascal should be used instead of mle. Elements of mle programming. The first program. The outline of an mle program looks like this: MLE <statement 1> <statement 2> <statement 3> END {mle}. Between the keywords MLE and END comes a series of statements. When the program is run, each statement is executed in turn. Let's put a statement in. Type the following text into an editor, save it, and run it: MLE WRITELN("Hello Universe") END. This program consists of a single WRITELN procedure. WRITELN takes a list of zero or more arguments, writes them to the screen (or to a file in some circumstances), and puts the cursor a
176. m one iteration to the next falls below this value. Comments. Comments can be placed throughout the body of a program by enclosing the text in curly brackets, { and }. Likewise, the curly brackets can be used to effectively remove large sections of code. A second way to comment out all or part of a single line is to put a pound sign (#) at the point where you want the comment to begin; mle ignores all text from the pound sign to the end of the line. Reading Data. The data file, called hammes.dat, is shown in Figure 2. Data files are standard ASCII text files of numbers. The numbers are organized into a series of fields. Each field is usually delimited by white space (tabs or spaces, as used in Figure 2) or commas. You can specify your own list of field delimiters, for example if your data are separated by colons or semicolons. This is done by changing the value of the variable called DELIMITERS (see the DATA chapter for details). The data in Figure 2 are structured as three columns of numbers. The first field is the last observed gestational age prior to birth. The second field is the observed gestational age after a birth was observed. These two times form an interval within which the birth occurred, i.e., the birth occurred at some unknown time within this interval. The third field is the number of births that were observed within the interval. Figure 2. Data file read by the program in Figure 1.
177. mle: A Programming Language for Building Likelihood Models. Version 2.1. Volume 1: User's Manual. Darryl J. Holman. Copyright 1991-2003 Darryl J. Holman, Department of Anthropology, Center for Studies in Demography and Ecology, Center for Statistics and the Social Sciences, The University of Washington, Box 353100, Seattle, WA 98195. djholman@u.washington.edu. The software and manual for mle version 2 is distributed in electronic form, free of charge, for personal and academic use. Permission to use, copy, and distribute this software and documentation is hereby granted for personal and non-commercial academic use, provided that the above copyright notice appears on all copies and that both the copyright notice and this permission notice appear in the supporting documentation. Other uses of this manual or software are prohibited unless the author grants written permission. This software may not be sold or repackaged for sale, in whole or in part, without permission of the author. This software is provided "as is" without warranty. In no event shall the Author be liable for any damages, including but not limited to special, consequential, or other damages. The Author specifically disclaims all other warranties, expressed or implied, including but not limited to the determination of suitability of this product for a s
178. model is that the repeated observations for individuals violate the condition that the likelihoods for each observation are independent. To fix this problem, we can compute an expected likelihood for each individual's observations. The integral computes the expected likelihood for each subject. Here is a concrete example. Say we have data in which levels are denoted by the number 1 or 2, as in: Tom Smith 23 4 LOS 3 9 52 22309 Xe 26 8 1 Steven Jones 9 59 LN 26 8 1 Martin Johnson 0 44 1 ENE DL ag 9 9 1 where the observations beginning with a 2 correspond to the individual at the preceding 1, so that Tom Smith has three observations beginning 23.4, 19.2, and 26.8. If we were to treat all observations within and among individuals as independent, we could simply drop all of the level 1 lines and form a likelihood as the product of all observations. But if we want to treat observations within individuals as correlated (non-independent), then we can integrate over a distribution of common effects, as shown in the likelihood above. Usually we will estimate one or more parameters for the distribution g(z) in addition to θ. If we assume that g(z) and f(t) are normal distributions, the likelihood in mle would be specified as: MLE DATAFILE("example.dat") OUTFILE("example.out") DATA lev FIELD 1 topen FIELD 2 tclose FIELD 3 END MODEL DATA LEVEL lev 2 THEN INTEGRATE z 12 12 PDF NORMAL z 0 PARAM sigmaz LOW 0
179. n example of nested procedures:
   MLE
     PROCEDURE printthings(s1 : STRING, s2 : STRING)
       PROCEDURE indent(VAR s : STRING, n : INTEGER)
         {Indents a string by n spaces}
         FOR i = 1 TO n DO
           s = " " + s
         END {for}
       END {proc indent}
       indent(s1, 6)
       indent(s2, 12)
       WRITELN(s1)
       WRITELN(s2)
     END {proc printthings}
   END
EXIT statement. The EXIT statement causes the immediate exit of the current procedure or function. If EXIT is called from the main program, it has the same effect as a HALT statement: the program is exited. User-defined functions. mle allows users to define their own functions. User-defined functions serve a number of very important purposes: • Functions are used to extend the types of expressions that can be created. • Functions provide a way to collect commonly computed operations into a single place. This addresses the frequent need to have the result computed on different variables or in different parts of a program. • Functions also help modularize programs into smaller, more maintainable components. Functions must be completely defined prior to their first reference in a program, just like procedures. For example, suppose you want to write a function that returns the average of two integers. You would first define a function that takes two integer arguments. The return type of the function must also be defined. The body of the function does the computation and then return
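The averaging function described above has the same three ingredients in most languages: typed arguments, a declared return type, and a body that computes and returns the result. Here is a minimal Python sketch (not mle code); the name average and the choice of a real-valued return type are assumptions made for illustration.

```python
def average(a: int, b: int) -> float:
    """Return the average of two integers as a real number."""
    return (a + b) / 2

# The function is defined before its first use, as the text requires.
print(average(3, 4))
```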
180. n facilitates this. Essentially, the QUANTILE function will accept a value drawn from a uniform distribution and return a value that is randomly drawn from the base density. A uniform variate from zero to one is generated by the RAND function. Before the RAND function can be called, the random number generator must be seeded. This is done by a call to the procedure SEED with a positive integer argument. If you prefer not to choose an initial seed value, the function CLOCKSEED will generate one using the computer's date and time. Here is an example of a program that prints out a number randomly drawn from a Weibull density with user-specified parameters:
     SEED(CLOCKSEED)
     WRITELN("Returns a value drawn from a WEIBULL distribution")
     WRITE("a and b parameters of the WEIBULL distribution: ")
     READLN(a, b)
     WRITELN(QUANTILE WEIBULL(RAND) a, b END)
   END
Flow control. IF statement. Normally, statements are executed one at a time in the order in which they appear. Frequently it is necessary to loop, branch, and otherwise modify the flow of programs. This section introduces statements and techniques that allow you to modify the flow of program statements. First the IF statement is introduced, followed by several looping statements. A loop is a programming concept that allows segments of code to be repeatedly executed. This allows the computer to do what computers do best: perform repetitive tasks. Almost all pro
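The idea of feeding a uniform variate through a quantile function (inverse transform sampling) can be sketched in Python (not mle code). The Weibull parameterization used here, with scale a and shape b so that F(t) = 1 - exp(-(t/a)^b), is an assumption; mle's own WEIBULL parameterization may differ, and weibull_quantile is a hypothetical name.

```python
import math
import random

def weibull_quantile(u, a, b):
    """Inverse CDF of a Weibull with assumed scale a and shape b.

    Solves u = 1 - exp(-(t / a) ** b) for t.
    """
    return a * (-math.log(1.0 - u)) ** (1.0 / b)

random.seed(9734)            # seed the generator before drawing
u = random.random()          # uniform variate on [0, 1)
draw = weibull_quantile(u, 2.0, 1.5)
print(draw)
```

Passing a fixed seed makes the draw reproducible; seeding from the clock, as CLOCKSEED does in mle, gives a different value on each run.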
181. n not sterile 1 s The second fraction is constant at s S U s S t s The overall hazard at time t is simply the hazard of the non susceptible subgroup weighted by the proportion of that group at time t The proportion of susceptible individuals at time will decrease as fecund individuals fail and must depend on survivorship of the non sterile group to time and the initial fraction of sterile individuals s This fraction at time t is l s S 2 Be SO The hazard at time fis l s S t LO l s f s s S SO s 1 s S t h t pOh t and the probability density function is found as Figure 5 The effect of contamination by a sterile subgroup on the survivorship distribution The subgroup makes up fraction s of the initial population at risk The left panel shows survivorship for the uncontaminated group and the right panel shows the same distribution contaminated by the sterile subgroup 117 9 L O s t t MLE TITLE DATAF ILE OUTFILE DATA topen tclose weight age END MODEL DATA MIX fraction S END RUN FULL END of END Statistical examples FO AOSA 0 S Oh O 8 f These forms for the PDF SDF and hazard function provide for reasonably straight forward maximum likelihood estimation of the parameters of the distribution for the susceptible observations as well as s The general form of the likelihood when sterility is included bec
182. n which mle reads numbers by using the -pn option. The command line mle -pn 8x3017 22 16 12k returns: 8x3017 is the integer 1551; 22 16 is the real 0.0064771107796; 12k is the real 12000.000000000. A list of all number formats is given with mle -h numbers. Start file options. The -sr and -sw options work together to read and write temporary results to a file, called a start file, while a MODEL statement is executing. When the -sw option is used, the current parameter estimates are written at each iteration. The -sr option will read the start file and replace the START parameter values with the start file values. The purpose of these options is to preserve intermediate results for models that take a long time to solve. For example, if a program will take weeks or months to solve, using these options can prevent the loss of work in the event the computer crashes. Batch options. Batch refers to running programs in an unattended mode. Typically, batch mode is used when a user or another program starts running a program and then logs out. mle provides a few options that assist in running in a batch mode. The -b option turns off keyboard monitoring for interactive debugging while executing models. Normally, a user can interrupt mle while solving a model, and the interactive debugger can be used. However, this can potentially lead to difficulties because the keyboard must be monitored. While running
183. nction evaluation was inexpensive: we merely had to look at a point to know its value. Now imagine that the map surface is covered by a piece of paper. You can only expose a tiny hole in the map in order to read the color at that point, that is, to evaluate the function at that point. Furthermore, each hole takes a long time to cut, perhaps minutes or hours. Then the question becomes this: how do we find the maximum elevation of the map in the shortest possible time? The map analogy will be used to understand how different computer algorithms find the maximum of a likelihood surface. Many different function maximization methods have been developed, at least since Isaac Newton developed methods out of the calculus. Nevertheless, no single method has emerged as superior for all types of problems. In general, function maximization is easiest to do when information is available for the derivative of the function. A traditional way of finding maximum likelihood parameters for simple functions is to symbolically find the derivatives of the function with respect to each free parameter. Each partial derivative is set to zero. This set of equations is collectively called the likelihood equations. Since the derivatives are defined as the slope of the function, it follows that any place where all the partial derivatives go to zero must be a minimum, a maximum, or a saddle point of the function. If practical, the likelihood equations are solved, that is, the sets of parameter value
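A small worked case may make the likelihood equations concrete. The Python sketch below (not mle code) uses an exponential model with rate r, chosen here only because its single likelihood equation has a closed-form solution: the log-likelihood is n ln(r) - r Σt, its derivative is n/r - Σt, and setting that to zero gives r = n/Σt. The data values are made up for illustration.

```python
import math

def loglik(rate, times):
    """Exponential log-likelihood: n*ln(rate) - rate*sum(times)."""
    return len(times) * math.log(rate) - rate * sum(times)

def score(rate, times):
    """Derivative of the log-likelihood with respect to the rate."""
    return len(times) / rate - sum(times)

times = [1.2, 0.7, 3.1, 2.4, 0.6]        # illustrative data, not from the manual
rate_hat = len(times) / sum(times)        # solution of the likelihood equation

print(rate_hat, score(rate_hat, times))
# Nearby rates give a lower log-likelihood, so the stationary point is a maximum.
print(loglik(rate_hat, times) > loglik(0.9 * rate_hat, times))
print(loglik(rate_hat, times) > loglik(1.1 * rate_hat, times))
```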
184. ndividual likelihoods for each observation, and tables of distributions, Bayesian model averaging reports, etc. This section describes the output options and how to direct the output to a file. Defining the output file. mle defines a special file that is used for the results of DATA and MODEL statements. The OUTFILE procedure is used to define where the results will be sent; otherwise they are sent to the screen. A number of variables control the format of the output. Typically, a program used to estimate a likelihood model contains a line like the following near the top of the program: OUTFILE("analysis.out") {writes to the file analysis.out}. As an alternative to specifying the file name explicitly, the function DEFAULTOUTNAME can be called. This function will use the name of the program to automatically generate an output file name. Suppose you run the command mle myprog.mle. The statement OUTFILE(DEFAULTOUTNAME) will create a file called myprog.out for the output. Standard Error Report. A report with estimated standard errors is printed when PRINT_SE = TRUE. The parameters will be written with an estimate of standard errors. By default, standard errors are written to the output file. Whenever standard errors are reported, a variance-covariance matrix will be estimated. If the matrix is singular, which can happen for a number of reasons, the standard errors are 00. When the variable PRINT_SHORT = TRUE, the rep
185. ne more iteration and then reenters the interactive debugger. Pick a symbol: selects a symbol to display. The value of the symbol is displayed between debugger commands for this and all subsequent calls to the debugger. Change the value of a symbol: if no symbol is selected, the user will be prompted for a symbol to change and then a value to change it to. If a symbol is selected with Pick, then that symbol will be changed. Search for symbols: prompts the user for search text and then searches the symbol table for symbol names that match any part of the search text. The name, types, and value of matching symbols are displayed. Chapter 5. Plots and graphs. The PLOT command is used to create plots and charts in mle. This chapter discusses the command and gives some examples of creating graphs. mle does not directly generate graphs. Instead, it writes graphing programs in the Gnuplot plotting language. The graphs can be printed using one of the many device drivers included in Gnuplot. Additionally, graphs can be imported into a number of text processing systems, like TeX or MS Word, or manipulated in graphics editing programs. Here is a list of the plotting capabilities offered by mle: • Two-dimensional data plots of data points, parametric functions, bar charts, histograms, graphs with error bars for x, y, or both. • Three-dimensional plots, including surfaces and contour plots. • Multiple curves o
186. ne program and how key titles can be added to the plot Also note that the keys are moved around for different sets of plots MLE PLOTFILE DEFAULTPLOTNAME WRITEPLOTLN set xlabel t set autoscale set key minz 0 01 maxz 10 np 100 titles STRING 1 TO 4 Probability Density Survival Cumulative Density Hazard ylab STRING 1 TO 4 t S t F t h t MULTIPLOT 2 2 FOR ty 1 TO 4 DO loop through PDF SDF CDF HF PLOT set title Weibull titles ty Function set ylabel ylab ty FOR v 1 TO 3 DO use three different variances CURVE z minz maxz np KEY Weibull 6 INT2STR z z PDF WEIBULL z IF ty 2 THEN 0 ELSEIF ty 3 THEN oo ELSE z END IF ty 3 THEN z ELSE 0 END 6 v these are the weibull parameters END pdf END curve END for v END plot END for ty END multiplot END mle mle 2 1 manual Here is the result of this program AN a panes es i as a EEES Contour plots Contours can be drawn beneath the surface of a three dimensional plot Here is an example MLE PLOTFILE DEFAULTPLOTNAME WRITEPLOTLN set zrange 0 1 set contour base set hidden3d set view 70 PLOT DO x y 3 3 25 3 3 25 x Y EXP x 2 1 8 x y y 2 la type of bivariate normal END do END plot END A contour plot alone is generated from the previous example by turning off the surface and chang
187. nec.com/9477.html
Box GEP, Hunter WG, Hunter JS (1978) Statistics for Experimenters. New York: John Wiley & Sons.
Bratley P, Fox BL, Schrage LE (1983) A Guide to Simulation. New York: Springer-Verlag.
Brent RP (1973) Algorithms for Minimization without Derivatives. Englewood Cliffs, NJ: Prentice Hall.
Burnham KP, Anderson DR (1998) Model Selection and Inference: A Practical Information-Theoretic Approach. New York: Springer-Verlag.
Chew V (1968) Some alternatives to the normal distribution. The American Statistician 22:22-4.
Chhikara RA, Folks JL (1989) The Inverse Gaussian Distribution. New York: Marcel Dekker.
Christensen R (1984) Data Distributions. Lincoln, MA: Entropy Ltd.
Cox DR, Oakes D (1984) Analysis of Survival Data. London: Chapman and Hall.
Cullen AC, Frey HC (1999) Probabilistic Techniques in Exposure Assessment: A Handbook for Dealing with Variability and Uncertainty in Model and Inputs. New York: Plenum Press.
Daniels HE (1945) Proc Royal Soc London Series A 183:405-35.
Deevey ES Jr (1947) Life tables for natural populations of animals. Quarterly Review of Biology 22:283-314.
Dobson AJ (1990) An Introduction to Generalized Linear Models. Boca Raton: Chapman & Hall/CRC.
Edwards AWF (1972) Likelihood. Cambridge: Cambridge University Press.
Efron B (1982) The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia: Society for Industrial and Applied Mathematics.
Eggenb
188. nguages is recognized by mle.
   Operator(s)                       Precedence   Category
   NOT, unary -                      High         Unary operators
   ^                                              Exponent operator
   *, /, DIV, MOD, AND, SHL, SHR                  Multiplying operators
   +, -, OR, XOR                                  Adding operators
   = or ==, <>, <, >, <=, >=         Low          Relational operators
The expression -23 + 4 * -2 ^ 3 is equivalent to ADD(NEGATE(23), MULTIPLY(4, POWER(NEGATE(2), 3))), which returns -55. Parentheses can be used to override operator precedence. For example, 2 * 5 + 3 * 7 will evaluate each multiplication before the addition. Addition can be forced to occur first with parentheses, as in 2 * (5 + 3) * 7. • The DATA statement has been rewritten to have a more intuitive transformation mechanism. The transformation looks like an assignment statement following the FIELD specification (if any). A list of DROPIF <expr> and KEEPIF <expr> statements can then be specified, replacing the old DROP and KEEP statements. Here are some examples:
   DATA
     age FIELD 1 = age * 365.25 + 270   {convert to days since conception}
     weight FIELD 2 weight 1000 DROPIF weight < 0
     height FIELD 3 KEEPIF height > 0
     bmi height weight 2
   END {data}
The formal specification for each variable is this: <var> FIELD x LINE y = <expr> DROPIF <bexpr> KEEPIF <bexpr>. The first example above reads a value in the first field of the data fi
189. nment statements. Parameters so constrained will not be estimated. Consider the following likelihood: $L = \prod_{i=1}^{N} f(t_i \mid \theta)$. This likelihood estimates the effect of the variable sex on the mean of a distribution. Suppose f(t) is a normal distribution. This likelihood would be written as:
   MODEL
     DATA
       PDF NORMAL(topen, tclose)
         PARAM mean LOW = 5 HIGH = 500 START = 100 FORM LOGLIN
           COVAR sex PARAM b_sex LOW = -5 HIGH = 5 START = 0 END
         END {param}
         PARAM stdev LOW = 0.001 HIGH = 25 START = 10 END
       END {pdf normal}
     END {data}
   RUN
     FULL                          {Runs the model with no constraints}
     REDUCE b_sex = 0              {One constraint}
     REDUCE mean = 100 b_sex = 0   {Constrains 2 parameters}
     REDUCE b_sex = largeeffect    {Fixes sex to another param or variable}
   END
Notice that there are four versions of the model that will be estimated. The first case, FULL, estimates all three parameters: mean, b_sex, and stdev. The second case constrains the parameter b_sex to 0 (no effect), so that only two parameters are estimated. The third case constrains the parameter mean to 100 and b_sex to 0, so that only one parameter is estimated. The fourth REDUCE constrains b_sex to the value of a variable. WITH. The WITH keyword provides a mechanism to include certain parameters in a model. WITH differs from the FULL and REDUCE keywords because a single WITH command can generate more than one model. The WITH keyword is followed by a list of parameters to always include in each model. Additionally, a
190. not assigned gt Main menu Alt EE File menu Alt E A Edit menu Alt_B Fdo Block menu Alt uti Search menu Alt Micra mle menu Alt Wears Window menu AltA nanan Help menu Default keyboard mapping The default keyboard map is described in this section The default keyboard mapping can be changed by saving the current map Shift_F9 by default and editing the resulting file Ctrl An eteetetettss unmapped Ctrl_B linedelBOL Delete to beginning of line Ctrl Conan unmapped Ctrl Donar unmapped 0 CE cuts linedelEOL Delete to end of line Ctrl Fonera flipcase eee Change case to end of line Ctrl Greener unmapped 0 Ctrl_H oo chardelback Delete character backspace Ctrl tabneXxt ccee Go to next tab Ctrl lt s c entero Break line at current position Ctrl AA quitnosave Quit without saving Ctrl Lie eee eveere tolower oore Change to lower case to EOL Cte Mis bot secksbel ed A et eledeledeass Break line at current position Cil Nient lineINS 0 cece eee Insert new line Ctrl O sists Ope iese Open Save current file if necessary C Pansin unmapped 0 MAA unmapped 0 0 CI AAA worddel Delete word Ctrl Swe unmapped Ctrl Tce mletemp Insert an mle template Ctrl Uan LOUPPET eee Change to upper case to EOL CHEV Sissies an
191. not go to the next line when waiting for input from the keyboard. The READLN statements each read a value from the keyboard and expect the line to be terminated by the <Enter> key. In fact, the READLN statement, like WRITELN, can read multiple arguments in one statement. Write a program to see what the behavior is when multiple arguments are given to a READLN statement. Mathematical computation. mle contains many common (and some uncommon) functions for doing mathematical computation. Summation. Summation over a series of numbers is a commonly needed function in scientific programming. For example, the value n^2 can be computed from the series $n^2 = \sum_{i=1}^{|n|} (2i - 1)$. Here is a program that reads an integer from the keyboard and computes the series in this way:
   MLE
     {computes the square of an integer using a series}
     n : INTEGER
     WRITE("Integer to square: ")
     READLN(n)
     n2 = SUMMATION i (1, ABS(n)) 2 * i - 1 END
     WRITELN(n, "^2 is ", n2)
   END
The SUMMATION function takes four arguments. The first argument is an integer variable that is the variable of summation. In this program, i is used as the variable of summation. It is not previously defined, so it will be implicitly defined by the SUMMATION function. The next two arguments are in parentheses. They define the lower and upper limits. Products. MLE {computes factorial function} n : INTEGER WRITE("Find factorial of what integer? ") READLN(n)
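The same series, and the factorial product that the Products example begins, can be checked with a few lines of Python (not mle code). The function names are hypothetical; the sum runs over i = 1 to |n| of (2i - 1), matching the four pieces described for SUMMATION: index variable, lower limit, upper limit, and summand.

```python
def square_by_series(n):
    """Return n**2 as the sum of the first |n| odd numbers."""
    return sum(2 * i - 1 for i in range(1, abs(n) + 1))

def factorial_by_product(n):
    """Return n! as the product 1 * 2 * ... * n (1 when n is 0 or 1)."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

print(square_by_series(-7))       # uses |n|, so a negative input still works
print(factorial_by_product(5))
```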
192. nside the procedure refers to the local variable within the procedure, not the global variables defined at the top. The keyword VAR has a very important effect on the arguments root1 and root2. These arguments, once they are modified in the body of the procedure, will pass the modifications back to the original calling argument. Without the VAR keyword, changing the value of an argument has no effect on the calling arguments. In other words, VAR makes the argument variable or changeable. Calling the procedure. To call the procedure, the code might include something like this:
     a : REAL
     a2 : REAL
     a3 : REAL
     r1 : COMPLEX
     r2 : COMPLEX
     PROCEDURE quadratic(a : REAL, b : REAL, c : REAL, VAR root1 : COMPLEX, VAR root2 : COMPLEX)
       tmpc : COMPLEX
     END
     {The main body of the program starts here}
     quadratic(2, 3, 4, r1, r2)
     a = 4
     a2 = 1.5
     a3 = 1
     quadratic(a, a2, a3, r1, r2)
   END
The statements within the procedure are executed, the values of root1 and root2 are updated, and control is passed back to the main program. In the main program, the variables r1 and r2 have been updated with the results from root1 and root2. Nested procedures. New procedure and function definitions can be defined within existing procedures. In the same way that variables defined inside a procedure are only visible from within that procedure, procedures defined within procedures are only visible from within that procedure. Here is a
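For comparison, here is how the same quadratic solver might look in Python (not mle code). Python has no VAR parameters, so this sketch returns both complex roots instead of writing them back through the arguments; the body of the mle procedure is not shown in the text, so the standard quadratic formula is assumed.

```python
import cmath

def quadratic(a, b, c):
    """Return both roots of a*x**2 + b*x + c = 0 as complex numbers."""
    disc = cmath.sqrt(b * b - 4 * a * c)   # complex square root handles negative discriminants
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)

# Results come back as return values rather than through VAR arguments.
r1, r2 = quadratic(2, 3, 4)
print(r1, r2)
```

Returning a tuple plays the role that VAR plays in mle: the caller receives the updated values without the function touching the caller's variables directly.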
193. nt DATAFILE("MYDATA.DAT") prior to the DATA statement. Full path names are permissible; you might call the DATAFILE procedure as DATAFILE("C:\STATS\MLE\BONES\DATAFILE.DAT"). The DATA statement. The DATA ... END statement reads in the data file. Within the DATA ... END is a sequence of one or more variable names. Here is a simple DATA statement that creates three variables: DATAFILE("test.dat") DATA first_time FIELD 3 missing_data FIELD 4 last_time FIELD 1 END. This example shows three components for defining each variable: the variable name, the keyword FIELD, and a field number. Variable name. Variable names begin with a letter and can then contain any combination of letters, numbers, the underscore, and period characters. A variable name may be up to 255 characters long, and all characters are significant. Examples of valid variable names are LAST_ALIVE, VARIABLE_14, A_REALLY_LONG_VARIABLE_NAME, and A. Variable names are not case sensitive, so the variable abc is the same as ABC and aBc. In the current version of mle, all variables created in the DATA ... END statement are defined to be type real. This is so even if the number format suggests that the variable should be type integer. Integer values read from the data file are simply converted to real number values. Text strings can exist within a text file but
194. nty of estimating multiple models can be taken into account. mle supports Bayesian model selection and Bayesian model averaging. Accessible introductions to these topics can be found in Burnham and Anderson (1998) and Raftery (1995). The following show how to enable Bayesian model selection and the types of model selection that are supported: AIC_SELECT = TRUE {selects via Akaike's information criterion (AIC)} AICC_SELECT = TRUE {selects via sample-size corrected AIC} BIC_SELECT = TRUE {selects via Bayesian information criterion (BIC)}. When any of these three variables is set to TRUE, Bayesian model averaging will be conducted according to the criterion. Bayesian model averaging uses certain assumptions to find relative probabilities that each of the models is the true model (or the best-fitting model). A final set of parameters, estimated according to the best overall model, is computed, and a second set of standard errors is computed that is an average over all models, weighted by the probability of each model. The standard errors contain a component of variability from model selection uncertainty and a component for uncertainty of the parameter estimates. See Burnham and Anderson (1998:325). Results. The output report from an mle MODEL statement consists of a number of smaller reports. Most reports can be enabled or disabled by modifying variables. Some examples are parameter estimate reports, the variance-covariance matrix, a list of the i
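To show what the three criteria measure, the following Python sketch (not mle code) computes them from a maximized log-likelihood, the number of free parameters k, and the sample size n, and then turns a set of scores into relative model weights. The formulas are the standard textbook definitions; whether mle computes them in exactly this form is an assumption, and the log-likelihood values are made up for illustration.

```python
import math

def aic(loglik, k):
    """Akaike's information criterion."""
    return -2.0 * loglik + 2.0 * k

def aicc(loglik, k, n):
    """AIC with the small-sample correction (requires n > k + 1)."""
    return aic(loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)

def bic(loglik, k, n):
    """Bayesian information criterion."""
    return -2.0 * loglik + k * math.log(n)

def model_weights(scores):
    """Relative weights from criterion scores: exp(-delta/2), normalized to sum to 1."""
    best = min(scores)
    rel = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(rel)
    return [r / total for r in rel]

# Three hypothetical models fitted to the same 50 observations.
fits = [(-100.0, 2), (-99.5, 3), (-99.4, 4)]     # (log-likelihood, k)
scores = [aic(ll, k) for ll, k in fits]
print(scores)
print(model_weights(scores))
```

Lower scores are better for all three criteria, so the model with the smallest score receives the largest weight.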
195. nual DROP < 0 ADD 10 MULTIPLY 2 KEEP > 24 SUBTRACT 10 POWER 3 DROP < 1 rresponding new syntax is DROPIF v1 < 0 v2 10 2 KEEPIF v3 > 24 v4 10 3 DROPIF v4 < 1. Chapter 2: Installing and running mle. The mle interpreter is a small, self-contained program that can be run from the command line of the operating system. This chapter describes how to install mle in both the DOS environment and the Windows environment. A brief tutorial is given on how to run mle and how to edit program files using a text editor. Additionally, the editor emle is described. All command line options are described. Installing mle. Under Windows, mle is installed using a built-in installer. This will install the interpreter along with a rudimentary editor that can be used to edit and run mle programs. If you prefer, you can install everything by hand under Windows as well; this is especially helpful if you want to run mle from the DOS command line. The current releases of mle can be found on the web at http://faculty.washington.edu/djholman/mle. For the purposes of this manual, we will assume that the current release is 2.1.16. Unix. Find the current release of mle. For a Linux ELF binary, the current release might be called mle-2.1.15-linux-i386.tar.Z. Experienced Unix users will recognize this as a compressed tar file. Here are the steps for installation
196. o. Note that when the second time is set to zero, it will be less than topen, so mle returns the area from topen to infinity. MLE TITLE = "Example" DATAFILE("ex4.dat") OUTFILE("ex4.out") DATA time FIELD {read in observation time} resp FIELD {1 if responder, 0 if nonresponder} topen IF 1 THEN 0 ELSE time END tclose IF 1 THEN time ELSE 1 END END MODEL DATA PDF LOGNORMAL(topen, tclose) PARAM a LOW = 0.00001 HIGH = 9 START = 1 END PARAM b LOW = 0.00001 HIGH = 2 START = 0.4 END END {of the PDF} END RUN FULL END {of the MODEL} END. Survival analysis: Left-truncated observations. Left truncation arises in survival analysis when some early portion of an individual's period of risk is not observed. For example, in a prospective study of mortality, we might want to follow all living people in some area instead of just following individuals from birth. This type of data collection can lead to unbiased results, provided observations are left truncated at the age at which people are enrolled in the study. The idea is that had someone died prior to being enrolled in the study, they would not have been enrolled; therefore, their risk of mortality before enrollment is known to be zero. For this example we will use the Siler competing hazards mortality model for a fictitious prospective study of mortality. We will have two types of observations: those who died and those who are right censored. For each observation we know three
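The left-truncation idea in fragment 196 can be sketched outside mle. The following Python example is a minimal illustration, assuming the common five-parameter form of the Siler hazard, h(t) = a1·exp(−b1·t) + a2 + a3·exp(b3·t); the parameter values and function names are hypothetical, and this is not mle's own code. Each observation contributes its density or survival probability divided by the survival probability at the age of entry.

```python
import math


def siler_hazard(t, a1, b1, a2, a3, b3):
    """Siler hazard: declining juvenile term + constant term + rising senescent term."""
    return a1 * math.exp(-b1 * t) + a2 + a3 * math.exp(b3 * t)


def siler_survival(t, a1, b1, a2, a3, b3):
    """Survival S(t) = exp(-H(t)), with H the integrated Siler hazard from 0 to t."""
    cumulative = ((a1 / b1) * (1.0 - math.exp(-b1 * t))
                  + a2 * t
                  + (a3 / b3) * (math.exp(b3 * t) - 1.0))
    return math.exp(-cumulative)


def log_contribution(entry_age, exit_age, died, params):
    """Log-likelihood of one observation left truncated at entry_age.

    Died:           log[h(exit) * S(exit) / S(entry)]
    Right censored: log[S(exit) / S(entry)]
    """
    ll = (math.log(siler_survival(exit_age, *params))
          - math.log(siler_survival(entry_age, *params)))
    if died:
        ll += math.log(siler_hazard(exit_age, *params))
    return ll


# Hypothetical parameter values, chosen only so the example runs.
params = (0.1, 1.0, 0.005, 0.0001, 0.09)
censored_from_birth = log_contribution(0.0, 50.0, False, params)
censored_from_age_20 = log_contribution(20.0, 50.0, False, params)
```

Conditioning on survival to the entry age raises the contribution of a late entrant relative to someone followed from birth, which is how the truncation keeps the estimates from being biased.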
197. o. Setting the Output Device. The FINISHPLOT procedure. MORE EXAMPLES. Contour plots. Animation example. CREATING PLOTS FROM THE MODEL STATEMENT. Likelihood Surfaces. SURVIVAL ANALYSIS: EXACT MEASUREMENTS. SURVIVAL ANALYSIS: EXACT FAILURE AND RIGHT CENSORED OBSERVATIONS. SURVIVAL ANALYSIS: INTERVAL CENSORED OBSERVATIONS. CURRENT STATUS ANALYSES. SURVIVAL ANALYSIS: LEFT TRUNCATED OBSERVATIONS. SURVIVAL ANALYSIS: RIGHT TRUNCATED OBSERVATIONS. SURVIVAL ANALYSIS: LEFT AND RIGHT TRUNCATED OBSERVATIONS. SURVIVAL ANALYSIS: ACCELERATED FAILURE TIME
198. ocumentation for Gnuplot is available for free with the program. Here are a few notes on the language. • Gnuplot can be used interactively or in a batch mode. For example, you can read a file created by mle into the Windows version of Gnuplot and then modify the plot interactively. • The Gnuplot language usually takes one statement per line. Multiple statements on one line are formed by separating the commands with a semicolon. Also, a single statement can be spread across multiple lines by using the backslash character (\) as the last character on a line. The pound sign (#) is used as a comment delimiter. • The Gnuplot language is case sensitive. Lower case is used for functions and key words. Also, algebraic operators follow the syntax of C. So != in Gnuplot is equivalent to <> in mle, and % in Gnuplot is equivalent to MOD in mle. Exponentiation in Gnuplot uses the ** operator. • Many options in Gnuplot are set with the set command. Here are some examples: set terminal hpljii; set key on; set title "fun with graphics"; set logscale xy; set size 0.5, 0.5; set xlabel "time (hours)"; set ylabel "density". There are many set options available in Gnuplot. These are usually inserted into the plot file using mle's WRITEPLOTLN statement or in the initial string list in the PLOT statement. Setting the Output Device. Gnuplot is relatively device independent. That is, it can work across a number of computer platforms and write to d
199. of the summation. The fourth argument is the expression of summation. Notice that i appears within the function. Its value will be updated with each iteration of the function. Like summation, taking a product over a series of numbers is a commonly needed function in scientific programming. For example, the factorial function $n! = 1 \times 2 \times \cdots \times (n-1) \times n$ can be computed as $\prod_{i=1}^{n} i$. Here is a program that reads an integer from the keyboard and computes the series in this way: factn PRODUCT i 1 n i END WRITELN n factn END. Like the summation function, the PRODUCT function takes four arguments. Integration. Suppose you want to compute the integral f sing 2x dx. Here is an example of how that can be coded: MLE myvalue INTEGRATE x SQRT PI SQRT PI SIN x 2 2 x END writeln myvalue END. The expression assigns the result (about 1.525) to myvalue. Here is a description of the meaning of each part of the expression. INTEGRATE x: x is the variable of integration. SQRT PI: this is the lower limit of integration. SQRT PI: this is the upper limit of integration, followed by the close of the argument list. SIN x 2 2 x: the function to be integrated. END: end of the INTEGRATE function. Like the SUMMATION and PRODUCT functions, there are four arguments to the INTEGR
200. om the default of C:\Program Files\mle to some other location, like C:\Documents and Settings\<username>\mle. • Once the installation is complete, you can optionally modify your PATH variable so that mle can be run from any directory on the command line. The PATH variable can be changed in most versions of Windows via Start > Settings > Control Panel > System > Advanced > Environment Variables. Editing a program. Writing an mle program requires that you edit the text of the program and then submit it to the mle interpreter. The next step is to view the output of the program. Depending on the results, you will then edit the program again and submit it again. Almost any text editor can be used to edit a program. Additionally, the Windows version of mle comes with a simple text editor that is tailored to editing and running programs. This section first describes some text editors available in DOS and Unix that can be used for editing programs. Then the mle editor is briefly described. Under Unix, there are a number of de facto standard editors that are used for programming. The vi editor, in particular, is available on almost every installation. Other commonly used text editors on Unix systems are Pico and EMACS. Before you can develop mle programs, you will need to know one of these editors. Under DOS or Windows, there are a number of editors available besides the one that comes with mle. A standard editor available in all
201. omes 1 5 j jfa ara 10 s6 19 8 10 slt 1, where δ(x, y) is the Kronecker's delta function, which equals one when x = y and zero when x ≠ y. The following example estimates one such model. The likelihood begins with the MIX function, which produces an average of the second and third arguments, weighted by the first argument, which is a probability. The first PDF is PDF STERILE ... END, which returns one if tclose is infinity or less than topen. Covariates are modeled on both the non-susceptible fraction and the hazard of the susceptible fraction. Example ex dat ex out Last observation time prior to the event First observation time after the event the first covariate the second covariate PARAM s LOW 100 HIGH 100 START 0 FORM LOGLIN define the immune COVAR weight PARAM b_s_weight LOW 20 HIGH 20 START 0 END COVAR sex PARAM b_s_sex LOW 20 HIGH 20 START 0 END END param s PDF STERILE topen tclose END returns 1 for right censored observations PDF LNNORMAL topen tclose PARAM a LOW 0.00001 HIGH 100 START 25 END PARAM b LOW 0.01 HIGH 50 START 3 END HAZARD COVAR weight PARAM b_weight LOW 20 HIGH 20 START 0 COVAR sex PARAM b_sex LOW 20 HIGH 20 START 0 END hazard END of the PDF mix function the MODEL. Linear regression in the likelihood framework. 53 23 19 34 24 65 44 ah 29 58 37 46 50 44 56 Lao 36 28 7 DL eL 240
202. on, <v> to denote an identifier, <rexpr> to denote an expression of type REAL, <iexpr> to denote an INTEGER expression, <bexpr> to denote a BOOLEAN expression, and <sexpr> to denote a string expression. Here is an example: WHILE <bexpr> DO <statements> END. • When syntax diagrams are shown, items shown within [ ] are optional arguments. Note that the brackets are italicized. Un-italicized [ ] are part of the language, used for arrays. For example, WRITELN [([<fexpr>,] <expr> [,] <expr> ...)] shows that the WRITELN statement has an optional set of arguments enclosed within parentheses. The first argument can optionally be a file expression. At least one expression must be enclosed within the parentheses. Commas separating the expressions are optional. • Ellipses (...) are used in two ways. First, they denote that a pattern can be repeated an unlimited number of times. Hence, in the previous point, the ellipses indicate that an unlimited number of expressions can be placed within the WRITELN statement. The second use denotes that parts of a statement or function are not shown. For example, MODEL ... END uses ellipses in this way. • A list of alternatives is separated by the vertical bar (|). For example, the DATA function has a series of optional forms specified this way: DATA FORM SUMLL | SUMMATION | SUM | PRODUCT | PROD <expr> END
203. on about the distribution of possible parameter values. Parameters are sometimes the constants defined within a function. For example, in the equation of a line, y = mx + b, we would call m and b the parameters of the equation and x the argument. This is clearer when we rewrite the equation as f(x | m, b) = mx + b, which is read "f of x given m and b". This function has a single argument, x, and the parameters are m and b. Typically, a series of x values is known and the goal is to find the best values for parameters m and b. By best, of course, we mean the best in some statistical sense. In mle, m and b would be called parameters if and only if they were quantities to be estimated. The one exception to this usage of parameter is for the built-in probability density functions in mle, where we refer to intrinsic parameters. For example, the normal distribution f(t | μ, σ) has two intrinsic parameters, μ and σ. Typically, we wish to estimate these intrinsic parameters. If so, the intrinsic parameters μ and σ are also model parameters. As described later, most probability density functions take four arguments for t. Combinations of these arguments allow you to specify: • The probability density function (1 argument or 2 identical arguments) • The cumulative density function (2 arguments, the 1st argument < minimum range of the PDF) • The survival density function (2 arguments, the 2nd argumen
204. ons. The survival function, probability density function, and hazard function can be plotted from a MODEL statement by setting the variable PLOT_DISTS to TRUE. The mechanism is similar to that used for printing the values using the PRINT_DIST variable. In addition to PLOT_DISTS = TRUE, you must set three other values: DIST_T_START defines the lowest value over which the distribution is plotted, DIST_T_END is the highest value over which the distribution is plotted, and DIST_T_N is the number of points to plot. An example of plotting these distributions is given after the description of likelihood surfaces. Likelihood Surfaces. A likelihood surface can be plotted over one parameter or two parameters of a model. All other parameters are taken at their estimated value. Surface plots are made by adding SURFACE <xparam> or SURFACE <xparam> <yparam> to the end of the RUN or REDUCE list part of the MODEL statement. Here is the format, in which a PLOT statement surrounds the MODEL statement for plotting surfaces: PLOT MODEL <model statement> RUN FULL SURFACE <xparam> {plots a likelihood profile over one parameter} FULL SURFACE <xparam> <yparam> {plots a likelihood profile over two parameters} REDUCE ... SURFACE <xparam> REDUCE ... SURFACE <xparam> <yparam> END {model} END {plot}. An Example. For each parameter being plotted, the minimum plotted va
205. ons. This section discusses procedure writing and variable passing. The next section discusses the related concept of user-defined functions. User-defined procedures serve a number of purposes. • Procedures can be used to extend the language. Essentially, you can write your own statements that take a list of zero or more arguments. • Procedures provide a way to collect commonly defined operations into a single place. This addresses the frequent need to have the same set of operations performed on different variables or in different parts of a program. • Procedures provide a way to modularize programs. That is, programs can be composed of a small set of general operations, each of which is a separate procedure. Each of those, in turn, can call a set of other procedures. This programming style, called top-down programming, can lead to more robust and readable code. Procedures must be completely defined prior to their first reference in a program. For example, suppose you want to write a procedure that returns the roots of a quadratic equation. You would first define the procedure (quadratic, say) that takes 5 arguments: three real coefficients as inputs and two complex numbers, the roots, as the outputs. Your program could then call that procedure repeatedly with different inputs. Here is how the procedure could be written: MLE PROCEDURE quadratic a REAL b REAL c REAL VAR root1 CO
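Because the mle listing in fragment 205 is cut off, the following Python sketch illustrates the same idea of a reusable routine that takes three real coefficients and produces two complex roots. It is an analogue under stated assumptions, not a translation of the mle procedure: Python returns the roots as a tuple instead of passing them back through VAR arguments, and the error handling for a = 0 is an addition for illustration.

```python
import cmath


def quadratic(a, b, c):
    """Return both roots of a*x**2 + b*x + c = 0 as complex numbers."""
    if a == 0:
        raise ValueError("coefficient a must be nonzero for a quadratic")
    root_of_discriminant = cmath.sqrt(b * b - 4.0 * a * c)
    root1 = (-b + root_of_discriminant) / (2.0 * a)
    root2 = (-b - root_of_discriminant) / (2.0 * a)
    return root1, root2


# The routine is defined once and then called repeatedly with different inputs.
real_roots = quadratic(1.0, -3.0, 2.0)     # x^2 - 3x + 2
complex_roots = quadratic(1.0, 0.0, 1.0)   # x^2 + 1
```

Defining the routine before its first call mirrors the rule that mle procedures must be completely defined prior to their first reference.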
206. or providing a short summary of intrinsic parameters for predefined PDFs. For example, typing mle -h weibull yields: WEIBULL Distribution. 4 continuous variables: t open, t close, t left trunc, t right trunc. Exact failure when t open = t close. Range: t (Time), 0 < t < ∞. 2 intrinsic parameters: a (Scale), 0 < a < ∞; b (Shape), 0 < b < ∞. a is the characteristic life (63.2th percentile) in units of a. f(t) = S(t)h(t); S(t) = exp[-(t/a)^b]; h(t) = b t^(b-1) / a^b; mean = a Gamma(1 + 1/b); var = a^2 [Gamma(1 + 2/b) - Gamma(1 + 1/b)^2]; mode = a (1 - 1/b)^(1/b) for b > 1, mode = 0 for b <= 1; median = a log(2)^(1/b). Gamma(x) is the gamma function. Covariate effects may be modeled on the hazard. Table 1. Command line options (option: description). Verbose option: sets VERBOSE to TRUE so that an iteration history and other information is printed to standard output while solving a likelihood model. -h <name>: provides help information about PDFs, functions, variables, constants, reserved words, and parameter transformations. When <name> is replaced by a PDF name, a transformation name, a function, or a predefined variable, a brief help message is given. If <name> is not a known topic, a list of topics is printed. -H <name>: provides help information like -h but matches anything that contains the string <name>. If <name> is not given, all help messages are given
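The Weibull summary in fragment 206 can be checked numerically. Here is a minimal Python sketch (not mle code) of the standard two-parameter Weibull with scale a and shape b; the function names are hypothetical, and the formulas are the usual textbook forms.

```python
import math


def weibull_survival(t, a, b):
    """S(t) = exp(-(t/a)^b)."""
    return math.exp(-((t / a) ** b))


def weibull_hazard(t, a, b):
    """h(t) = b * t^(b-1) / a^b."""
    return b * t ** (b - 1) / a ** b


def weibull_density(t, a, b):
    """f(t) = S(t) * h(t)."""
    return weibull_survival(t, a, b) * weibull_hazard(t, a, b)


def weibull_mean(a, b):
    """mean = a * Gamma(1 + 1/b)."""
    return a * math.gamma(1.0 + 1.0 / b)


def weibull_variance(a, b):
    """var = a^2 * [Gamma(1 + 2/b) - Gamma(1 + 1/b)^2]."""
    return a ** 2 * (math.gamma(1.0 + 2.0 / b) - math.gamma(1.0 + 1.0 / b) ** 2)


def weibull_median(a, b):
    """median = a * (log 2)^(1/b)."""
    return a * math.log(2.0) ** (1.0 / b)


# Fraction failed by the characteristic life t = a, for any shape b.
failed_by_characteristic_life = 1.0 - weibull_survival(10.0, 10.0, 2.5)
```

The last line shows why a is called the characteristic life: the fraction failed by t = a is 1 − exp(−1), about 63.2%, regardless of b.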
207. ordprev: Skip back one word. Ctrl_RtArr wordnext: Skip ahead one word. Ctrl_End windowdown: Move window down. Ctrl_PgDn fileend: Go to end of file. Ctrl_Home windowup: Move window up. Alt_1 through Alt_6: unmapped. Running a program. mle programs are usually run by typing mle, followed by any command line options, followed by the name of the program file, on the DOS or Unix command line. The mle interpreter will then read in and parse the entire program file, and the program statements will be executed. If mle encounters an error in the program, an error message is printed and further execution terminates. Warning messages are printed from mle without terminating the run. The following sections provide more details on how to run mle from the command line. Specifying the Program File and Command Line Options. There are several methods for specifying the program file. Typically, the program file is specified on the command line. Here are some examples of how the mle command is used to run a program file called test.mle: test> mle test.mle Runs mle on the file test.mle. test> mle -v test.mle Runs mle; verbose option is set c
208. ored and are in units of days, as per Table 2 of Hammes and Treloar. topen FIELD 1 {time at opening the interval} tclose FIELD 2 {time at closing the interval} frequency FIELD 3 {Frequency from Menstrual History Program} END {data} MODEL DATA {function to loop through all observations} PDF NORMAL(topen, tclose) {Define the parametric distribution} PARAM mean LOW 400 START 270 END PARAM stdev LOW 5 100 START 20 END END {pdf} END {data} RUN FULL {run the model with both parameters free} END {model} END {mle}. Figure 1. Program to estimate parameters for the distribution of gestational ages at birth. Program Constants and Variables. A number of variables and constants (e.g. MAXITER) are pre-defined in mle. Frequently, you will want to change the value of these variables in order to fine-tune the behavior of the program, change the type of output produced, etc. MAXITER is a pre-defined variable that changes the maximum number of iterations permitted in estimating the model parameters. In this example, the value of MAXITER is changed from the default value of 100 to a maximum of 50. The TITLE variable is also assigned a string, i.e. a series of characters. The TITLE variable is simply written to the output file. The variable EPSILON is assigned a value as well. This variable determines how precisely the parameters are to be found: normal convergence occurs when the change in the log likelihood fro
209. ort format is modified so that all parameter estimates are printed on one line. Variance-covariance Matrix. The estimated variance-covariance matrix is printed by setting PRINT_VCV = TRUE. The number of elements of the matrix printed on a single line is normally 5, but can be changed by modifying the value of VCV_WIDTH. The asymptotic variance-covariances of maximum likelihood estimates are found by inverting the local Fisher's information matrix for the n parameters, $I(\theta) = \left[-E\left(\partial^2 \ln L / \partial\theta_i\,\partial\theta_j\right)\right]$, $i, j = 1, \ldots, n$. The expectations are ideally taken at the true parameter values. In practice we have parameter estimates $\hat{\theta}$, not the true values. Hence numerical estimates of the information matrix, $\hat{I}$, are found by plugging in the parameter estimates $\hat{\theta}$. An estimated variance-covariance matrix is then $\hat{V} = \hat{I}^{-1}$. mle uses two different estimates for the variance-covariance matrix. Either one or both methods may be used by setting INFO_METHOD1 or INFO_METHOD2 to TRUE or FALSE. The default method (INFO_METHOD1 = TRUE) computes the variance-covariance matrix by inverting Nelson's (1983) approximation to the Fisher's information matrix. The (x, y)th element of that matrix is computed from partial derivatives of the log-likelihood with respect to x and y, using the standard perturbation method for approximating the partial derivatives. Appropriate sizes for Δx and Δy are iteratively computed for each param
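To make the "invert the information matrix" step of fragment 209 concrete, here is a minimal Python sketch. It approximates the second partial derivatives of a log-likelihood by central differences with a fixed perturbation h, negates them to get the observed information, and inverts the 2 × 2 result. This is a generic finite-difference approach, not Nelson's approximation and not mle's iterative step-size selection; the data, step size, and function names are all hypothetical.

```python
import math


def normal_loglik(params, data):
    """Log-likelihood of independent normal observations; params = [mu, sigma]."""
    mu, sigma = params
    return sum(-math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
               - 0.5 * ((x - mu) / sigma) ** 2 for x in data)


def observed_information(loglik, theta, h=1e-4):
    """Negative matrix of second partial derivatives, by central differences."""
    n = len(theta)
    info = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                point = list(theta)
                point[i] += di
                point[j] += dj
                return loglik(point)
            second = (shifted(h, h) - shifted(h, -h)
                      - shifted(-h, h) + shifted(-h, -h)) / (4.0 * h * h)
            info[i][j] = -second
    return info


def invert_2x2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


# Toy data, invented for illustration only.
data = [4.1, 5.3, 6.2, 4.8, 5.9, 5.1, 4.6, 5.5]
mu_hat = sum(data) / len(data)
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / len(data))

info = observed_information(lambda p: normal_loglik(p, data), [mu_hat, sigma_hat])
vcv = invert_2x2(info)
std_errors = [math.sqrt(vcv[0][0]), math.sqrt(vcv[1][1])]
```

For the normal model the result can be compared with the known asymptotic variances, σ²/n for the mean and σ²/(2n) for the standard deviation, which is a convenient sanity check on the perturbation size.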
210. pecific purpose, use, or application. The user is responsible for ensuring the accuracy of any results. Sound engineering, scientific, and statistical judgment is the user's responsibility. Suggested citation: Holman, Darryl J. (2003) mle: A Programming Language for Building Likelihood Models. Version 2.1. Vol. 1: User's Manual. http://faculty.washington.edu/djholman/mle. mle List. There is an email list for mle users to receive update and bug notices. To subscribe, send an email message to majordomo@pop.psu.edu with the text "subscribe mle" as the body of the email message. Preface. mle is the culmination of years of tinkering, punctuated by occasional bursts of concentrated activity, that began in 1991. At the time, I was a graduate student in biological anthropology and demography working on several projects that used parametric survival analysis. Some of the parametric models I was working with (a bivariate normal, and a negative exponential distribution with lognormally distributed frailty and an immune fraction) were not available in software I had at hand. Instead, I pieced together a series of numerical routines (some translated from FORTRAN to Pascal) into a special-purpose program for my needs. Ken Weiss suggested that there was a need for a general-purpose program for specifying and solving likelihood models. That suggestion, and encouragement from Jim Wood and Robert Jones, led me to develop mle. Since th
211. pecification part of the PARAM function has been generalized to COVAR <expr> <expr>. A typical specification is PARAM x LOW 0 HIGH 100 START 25 COVAR z PARAM beta_z LOW 5 HIGH 5 START 0 END END. Nevertheless, other expressions are legal. For example: PARAM x LOW 0 HIGH 100 START 25 COVAR z 1 END {param}. • The PARAM options HIGH, LOW, START, and TEST are treated like assignment statements, which are evaluated just prior to maximization. The right-hand side of the assignment can be any valid expression. For example: PARAM a LOW IF y > 3 THEN 0 ELSE 3 HIGH x 2 2x 4 START. • The const part of the MODEL statement is no longer supported. • A number of procedures have been added that can be used wherever a statement is allowed, including: OPENAPPEND (opens a file for appending), OPENREAD (opens a file for reading), OPENWRITE (opens a file for writing), WRITE (writes to standard output), WRITELN (writes a line to the standard output), READ (reads variables from the standard input), READLN (reads one line of variables from the standard input), PRINT (writes to the output file), PRINTLN (writes a line to the output file), CLOSE (closes a file), SEED (seeds the random number generator), DATAFILE (defines the data file), OUTFILE (defines the output file), HALT (halts the program). • A variety of statements have been added that can be used wherever a statement is allowed
212. pent Variable 2: 41.17647059 13.43612339 0.32630585

VAR      COEFFICIENT    STD. ERROR    T STATISTIC
Alpha    66.46540496
B 1      1.29019050     0.34276468    3.76407073
B 2      0.11103647     0.24858973    0.44666675

SOURCE     SUM OF SQUARES    MEAN SQUARE
REGRESS    2325.1795         1162.5897
RESIDUAL   2101.2911         150.0922
TOTAL      4426.4706         276.6544

R SQUARE = 0.5253    STANDARD ERROR OF ESTIMATE = 12.251213

The following shows the mle code for the equivalent likelihood model. Notice that this program is similar to the accelerated failure time model, except that the form for modeling covariates on the mean is additive (FORM ADD).

MLE TITLE = "Test regression" DATAFILE("eg.dat") OUTFILE("eg.out") DATA y FIELD 3 x1 FIELD 1 x2 FIELD 2 END MODEL DATA PDF NORMAL(y) PARAM mu LOW 7 HIGH 500 START 50 FORM ADD COVAR x1 PARAM b1 LOW 10 HIGH 10 START 0 END COVAR x2 PARAM b2 LOW 10 HIGH 10 START 0 END END {param} PARAM sig LOW 0.1 HIGH 200 START 10 END END {pdf} END {data} RUN FULL END END

The following output fragment shows the result from this model.

LogLike = -65.06725   Iterations = 334   Func evals = 25383   Del(LL) = 9.745E-0011
Converged normally
Results with estimated standard errors (27 evals)
Solution with 4 free parameters

Name   Form   Estimate          Std Error         t                against
mu     ADD    66.46589883575    9.596050356992    6.92638078825    0.0
b1            1.290194199465    0.453901547297    2.84245384742    0.0
b2            0.11104975496     0.202022074279    0.5496911927     0.0
sig           11.11779472801    2.6308
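The equivalence between least-squares regression and the normal likelihood model in fragment 212 can be sketched in Python. The example below is a simplified illustration with a single covariate and invented toy data (the manual's data file is not reproduced here); the function names are hypothetical. It fits the line by ordinary least squares, sets sigma to the maximum likelihood value sqrt(RSS/n), and evaluates the normal log-likelihood that an mle MODEL statement of this kind would maximize.

```python
import math


def normal_loglik(alpha, beta, sigma, xs, ys):
    """Sum of log normal densities of y around the additive mean alpha + beta*x."""
    total = 0.0
    for x, y in zip(xs, ys):
        residual = y - (alpha + beta * x)
        total += (-math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
                  - 0.5 * (residual / sigma) ** 2)
    return total


def fit_by_least_squares(xs, ys):
    """Closed-form simple regression; sigma is the ML estimate sqrt(RSS / n)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    rss = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    return alpha, beta, math.sqrt(rss / n)


# Toy data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
alpha, beta, sigma = fit_by_least_squares(xs, ys)
best = normal_loglik(alpha, beta, sigma, xs, ys)
```

The coefficients agree between the two approaches, but the spread estimates differ by design: the likelihood estimate of sigma divides the residual sum of squares by n, while the regression standard error of estimate divides by the residual degrees of freedom. That is consistent with the tables above, where sig is about 11.12 and the standard error of estimate is about 12.25.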
213. points> is missing, the value stored in PLOTPOINTS will be used instead, which is initially 100. Here is an example that draws two curves on a plot: MLE PLOTFILE DEFAULTPLOTNAME PLOT CURVE z 15 100 z PDF NORMAL z 15 100 z PDF WEIBULL z 0 Sip 2 END END 0 Hed 2 END END CURVE z END plot END. [Figure: the two curves drawn by this example.] The second form for the two-dimensional curve statement generates a series of INTEGER x values for use in computing y values. It looks like this: CURVE KEY <keystring> WITH <withstring> AXES <axesstring> <x_var> <x_min> TO <x_max> <x_expr> <y_expr> <expr> <strings> END. This form of the CURVE statement creates a series of INTEGER x points. It begins with <x_var> set to <x_min> and ends with the point <x_max>. The value of <x_var> will be incremented by 1 for each point and will be used to compute <x_expr> and <y_expr>, and perhaps other expressions as well. Here is an example that draws two curves on a plot: MLE PLOTFILE DEFAULTPLOTNAME PLOT set data style boxes set xrange 0 5 12 5 CURVE i 0 TO 10 i PDF BINOMIAL i 0.5 10 END
214. r Theorie der Fall- und Steigversuche an Teilchen mit Brownscher Bewegung. Physikalische Zeitschrift 16:289-95. Shah BK, Dave PH (1963) A note on log-logistic distribution. Journal of the M.S. University of Baroda (Science Number) 12:15-20. Subbotin MT (1923) On the law of frequency of errors. Mathematicheskii Sbornik 31:296-301. Tanner MA (1996) Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions. 3rd edition. New York: Springer-Verlag. Taylor BN (1995) Guide for the Use of the International System of Units (SI). National Institute of Standards and Technology special publication 811, 1995 edition. Washington: US Government Printing Office. Thomas M (1949) A generalization of Poisson's binomial limit for use in ecology. Biometrika 36:18-25. Tuma NB, Hannan MT (1984) Social Dynamics: Models and Methods. New York: Academic Press. Tweedie MCK (1947) Functions of a statistical variate with given means, with special reference to Laplacian distributions. Proceedings of the Cambridge Philosophical Society 43:41-9. Van Canneyt M (2000) Free Pascal Programmers' Manual for FPC version 1.0.2, version 1.8. Vaupel JW (1990) Relative risks: Frailty models of life history data. Theoretical Population Biology 37:220-34. Vaupel JW, Yashin AI (1985) Heterogeneity's ruses: Some surprising effects of selection on population dynamics. American Statistician 39:176-85. Wald A (1947) Sequential Analys
215. r surfaces can be drawn on a single plot. • A simple mechanism to specify a grid of multiple plots on a single page. • Data points and fitted curves. • Up to two x and two y axes on a single two-dimensional graph. • Cartesian or polar coordinates in two dimensions. • Rectangular, spherical, or cylindrical coordinates in three dimensions. • Simple generation of estimated distributions with error bars. • One- and two-dimensional likelihood profiles. Creating Plots. There are four steps used for creating graphs in mle. 1. Define the plot file using the PLOTFILE <name> procedure in a program. 2. Define one or more plots using the PLOT ... END statement in a program. Usually, the statements within PLOT ... END will include one or more CURVE ... END statements that draw the curve on the current plot. 3. Run the mle program. The plot file and its data files will be created as a Gnuplot program. At this point you have the option to edit the plot as a Gnuplot program. 4. Run the Gnuplot program on the plot file to create, display, or print the graph. In some cases, this fourth step can be done from within the mle program using the FINISHPLOT procedure. Defining the Plot File. The first step in creating a graphic is to define a plot file using the PLOTFILE <name> procedure. mle writes a Gnuplot program to the plot file. Gnuplot is discussed in a later section. The name of the plot file also determines the
216. r the age of 40 0 n40 0 1 TO N_OBS DO 40 THEN rthan40 greaterthan40 1 lessthan40 lessthan40 1 END if END for WRITELN lessthan40 < 40 and greaterthan40 > 40. Number formats. The mle language primarily works with numbers. With this in mind, a wide variety of number formats, including some automatic conversions, are supported. The standard formats for real and integer numbers are recognized, so that 3.14159, 12, 14, and 0.001 are read as would be expected. Real numbers must have a digit both before and after the decimal point, so a number written with a bare leading or trailing decimal point is not valid, but 0.0, 3E12, 1e4, and 12345e-67 are valid numbers. Table 3. Standard number formats (format: examples). Roman numerals: 0RXLVI, 0rMXVI, 0rmdclxvi. Numbers with an explicit base: 2x1001 (binary), 8X3270 (octal), 16xA4CC (hex), 32x3vq4h (base 32). Times: 10:42, 14:55:32, 10:40:23.4, 10:42AM, 2:55:32pm, 10:40:23.4am.
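The explicit-base notation listed in Table 3 (a decimal base, the letter x, then the digits) can be illustrated with a short Python sketch. This is an assumption-laden analogue, not mle's parser: the function name is hypothetical, and the digit alphabet for bases above 10 is assumed to be 0-9 followed by a-z, which is what Python's built-in int() uses.

```python
def parse_based_number(text):
    """Parse '<base>x<digits>', e.g. '16xA4CC', into an integer.

    Assumes the base is written in decimal, lies between 2 and 36, and that
    digits beyond 9 use the letters a-z in either case.
    """
    base_text, separator, digits = text.lower().partition("x")
    if not separator or not base_text.isdigit() or not digits:
        raise ValueError("expected the form <base>x<digits>: " + repr(text))
    base = int(base_text)
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    return int(digits, base)


binary_value = parse_based_number("2x1001")
octal_value = parse_based_number("8X3270")
hex_value = parse_based_number("16xA4CC")
```

Splitting at the first x is safe here because the base itself consists only of decimal digits.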
217. ractices can greatly aid in reading, understanding, and debugging a program. Good formatting consists of selecting and consistently using indentation to reflect logical levels and blocks within a program. Comments are indispensable for making a program understandable. Throughout this manual, mle programs use indentation to show, for example, the matching MODEL and END. This manual uses two spaces to indent each natural level. Keywords that are a part of mle are always upper-case letters and user-defined words are lower case (this is not required, since mle is not case sensitive). Finally, each matching END is usually followed by a comment denoting what key word the END matches. This last convention is helpful in complex programs that involve many nested functions. Typographic Conventions. Typographic conventions are used in this manual to distinguish between mle language components and English text. • Keywords in mle are shown in a fixed-pitch font as uppercase words: MODEL ... END, DATA ... END, and DEFAULTOUTNAME. • User-defined variables and identifiers are shown in a fixed-pitch font as lowercase words: y, slope, intercept, x. • Within programs, items placed in < and > and italicized are used to denote omitted or unspecified parts of the code. For example, <statements> denotes a list of program statements that have been omitted. Other commonly used phrases are <expr> to denote an expressi
218. random number generator; stops a program from running further; WRITELN("Final value is ", total) writes text to the screen; DATAFILE("hammes.dat") defines and opens a data file; OUTFILE("hammes.out") defines and opens an output file. User-defined Procedures. mle provides capabilities for user-defined procedures and functions. A procedure is a single-word command that takes a list of zero or more arguments; when called, a procedure executes a series of statements and returns to the place from which it was called. User-defined procedures are something like subroutines in FORTRAN; they are very similar to Pascal's user-defined procedures. User-defined procedures must be understood as two components: the procedure definition and a call to the procedure. A user-defined procedure must be defined prior to being invoked (called). By convention, user-defined procedures and functions are usually placed near the beginning of the program. Here is an example of a user-defined procedure being defined and later called: MLE a STRING Hello world PROCEDURE myproc a INTEGER b REAL c STRING {Define the procedure here} msg ais WRITELN In myproc a a b b IF a < 10 THEN WRITELN msg < 10 a a ROUND b ELSE WRITELN msg > 10 END {if} WRITELN Exit myproc with a a END {procedure} {End of user defined procedure definition} t 4 WRITELN Call myproc with t t myproc t 4 2
219. Block commands
Page formatting commands
Help
Execution commands
Other commands
Menu command
Default keyboard mapping
RUNNING A PROGRAM
SPECIFYING THE PROGRAM FILE AND COMMAND LINE OPTIONS
Debugging options
Other options
Start file options
Batch options
Interactive mode
CALCULATOR MODE
READING DATA FROM A FILE
Naming the data file
The DATA statement
Dropping or keeping observations
Observation frequency
Transformations of data
Skipping initial lines in the data file
Delimiters
CREATING
220. ree character is available on some hardware platforms as ASCII code 248. On many Intel platforms, holding down the <ALT> key and typing 248 on the numeric keypad gives the degree character. The Greek letter micro (µ) is available on some hardware platforms as ASCII code 230. On many Intel platforms, holding down the <ALT> key and typing 230 on the numeric keypad gives this character.

Less common formats include numbers with metric and percent suffixes, numbers interpreted as times, numbers in an angle notation, one format that converts degrees to radians, numbers in bases from 2 to 36, Roman numerals (why, you ask? Why not?), numbers in fraction notation, and several date formats. These formats are supported in data files as well as in numeric constants within an mle program. Table 3 is a comprehensive list of formats recognized by mle, and Table 2 is a list of suffixes permissible on standard integer and real format numbers.

Chapter 4 Building Likelihood Models

The MODEL statement is at the heart of parameter estimation. It specifies the likelihood, defines parameters, and specifies which parameters are to be estimated. A complete understanding of how models are built in mle requires an understanding of the structure of the MODEL statement, an understanding of parameters and how they are specified, an understanding of how expressions are specified and are built into likelihoods, and an
221. rger crystal and are subsequently less likely to fall out of alignment. As the temperature drops and the atoms move around less, large overall changes in structure become less probable. When absolute zero is reached, the structure becomes fixed; at room temperature, solid metals continue to anneal very slowly. Rapid cooling of the metal (called quenching in metallurgy because the metal is thrust into cool water or pickle) does not provide sufficient time for crystals to move about and organize. Thus, numerous vacancies and dislocations exist among many small crystals, and orderliness is minimal. Maximizing the crystalline order, or minimizing vacancies and dislocations, is done by cooling the metal very slowly and providing ample opportunity for the random crystal movements to fortuitously align themselves into more ordered structures.

The simulated annealing method attempts to mimic the physical process of annealing. An initial temperature is set and a cooling rate is specified. New parameters are randomly chosen over a large range of the parameter space. As the temperature cools, smaller and smaller ranges of the parameter space are explored. Additionally, the maximizer will not always travel uphill. At any given temperature, a certain fraction of downhill moves will be taken so that local maxima will not trap the maximizer. The advantage of simulated annealing over other methods is that it is very good at finding the global maximum even in the pre
222. rom a uniform sin pi RAND sine-transformed variates that yields the following data set:

0.9809586099
0.2439682743
0.7307000229
0.8642639946
0.8824737096
0.0966561712
0.3989167160
0.7812333882
0.3651667470
0.4812826931

Printing observations and statistics

Some other variables can be used to fine-tune the DATA statement. The variable PRINT_DATA_STATS, when set to TRUE, prints summary statistics for each variable, including the mean, variance, standard deviation, minimum, and maximum. The default is TRUE, so this report can be suppressed by setting PRINT_DATA_STATS to FALSE. When PRINT_OBS is set to TRUE, each observation is printed to the output file. The report is printed after all transformations have been done. The default value is FALSE, so you must set PRINT_OBS to TRUE to print the observations. The variable PRINT_COUNTS, when set to TRUE, prints out how many lines were read from the input file, how many observations were kept, and how many observations were dropped. The default value is TRUE, so these reports can be suppressed by setting PRINT_COUNTS to FALSE. The PRINT_BASIC variable, when TRUE, directs that the title, parameter file name, input file name, and the count of variables to be read from the input file are printed. The PRINT_FIELDS variable, when TRUE, prints out the name of each variable and the fiel
223. ropped, one line per observation is assumed. The line specifier must be a positive integer constant. The remaining specifications provide ways of transforming variables and dropping or keeping observations. The next several sections discuss transformations and give additional examples of declaring variables in the DATA section.

Dropping or Keeping observations

A series of statements to drop or keep individual observations from the input file can be specified as the last items in a variable declaration within the DATA statement. Here are some examples of this:

DATAFILE test.dat
my_drop_value 100
DATA
  first_time FIELD DROPIF first_time <= 0
  missing_data FIELD DROPIF missing_data <> 1
  last_time FIELD KEEPIF last_time > 0 DROPIF last_time INFINITY OR first_time < last_time
  alt_missing FIELD KEEPIF alt_missing missing_data
END

The DROPIF keyword specifies that a condition will be tested; if the condition is true, then the entire observation is dropped. The first DROPIF statement here specifies that the entire observation is to be dropped if first_time is less than or equal to zero. The KEEPIF keyword is like DROPIF, except that the observation will be kept if the condition is true and dropped otherwise. The grammar is KEEPIF <bexpr> and DROPIF <bexpr>, where <bexpr> is a boolean expression. A boolean expression is one that evaluates to true or false
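The DROPIF and KEEPIF semantics described above can be sketched outside of mle. The following Python fragment is only an illustration of the rule "drop when a DROPIF test is true, drop when a KEEPIF test is false"; the function name `apply_rules`, the dictionary layout of an observation, and the sample values are all assumptions made for this sketch and are not part of mle.

```python
def apply_rules(observations, rules):
    """Return the observations that survive every DROPIF/KEEPIF rule.

    Each rule is a (kind, test) pair, where kind is "DROPIF" or "KEEPIF"
    and test is a function returning True or False for one observation.
    """
    kept = []
    for obs in observations:
        dropped = False
        for kind, test in rules:
            result = test(obs)
            # DROPIF drops on a true condition; KEEPIF drops on a false one.
            if (kind == "DROPIF" and result) or (kind == "KEEPIF" and not result):
                dropped = True
                break
        if not dropped:
            kept.append(obs)
    return kept


# Hypothetical rules and data, loosely modeled on the example above.
rules = [
    ("DROPIF", lambda o: o["first_time"] <= 0),
    ("KEEPIF", lambda o: o["last_time"] > 0),
]
observations = [
    {"first_time": 2.0, "last_time": 5.0},   # passes both tests
    {"first_time": 0.0, "last_time": 5.0},   # dropped by DROPIF
    {"first_time": 1.0, "last_time": -1.0},  # dropped by KEEPIF
]
kept = apply_rules(observations, rules)
```

Here the whole observation is discarded as soon as one rule rejects it, which mirrors the statement that "the entire observation is dropped".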
224. s

Case study: Extended Poisson for modeling species abundance 14

This example shows the use of a user-defined function for programming a pdf that is not built into mle. In fact, the Thomas distribution is available in mle, but we will ignore the built-in implementation for this example. This example also shows some graphics programming in mle.

Thomas (1949) discusses the problem of clustering among a given species of plants in ecological surveys. Ecologists were using the Poisson distribution to describe the number of plants found in randomly selected square quadrats. The Thomas distribution (Thomas 1949; Christensen 1984) models the count of k plants in a quadrat as resulting from one or more clusters of plants, and is given by Lal p ib k a b e A

Data are counts of Armeria maritima plants surveyed in 100 quadrats on Blakeney Marsh: 57 quadrats with 0 plants, 6 with 1 plant, 12 with 2, 5 quadrats each with 3, 4, and 5 plants, 7 quadrats with 6 plants, and 1 quadrat each with 7, 9, and 10 plants. The following mle program fits these data to the Thomas distribution as well as the Poisson distribution, and graphs the distributions of observed versus expected number of plants.

MLE
  Distribution of Armeria maritima in Blakeney Marsh using the Thomas distribution or Double Poisson distribution
  Data are given by M Thomas 1949 A generalization of Poisson s Binomial Limit for use in Ecology Biometrika
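Because the formula above did not survive extraction, the following Python sketch uses the textbook form of the Thomas (double Poisson) distribution, which may be parameterized differently from the manual's version: the number of clusters is Poisson with mean `lam`, and each cluster contributes one plant plus a Poisson number of further plants with mean `phi`. The names `thomas_pmf`, `log_likelihood`, `lam`, and `phi` are assumptions for illustration; the counts are the Blakeney Marsh data quoted in the text.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability of observing k events with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def thomas_pmf(k, lam, phi):
    """Probability of k plants under the assumed Thomas parameterization.

    With j clusters, the quadrat holds j parent plants plus a
    Poisson(j * phi) number of additional plants.
    """
    if k == 0:
        return math.exp(-lam)  # no clusters at all
    return sum(poisson_pmf(j, lam) * poisson_pmf(k - j, j * phi)
               for j in range(1, k + 1))


def log_likelihood(counts, lam, phi):
    """Log-likelihood of {plants per quadrat: number of quadrats}."""
    return sum(n * math.log(thomas_pmf(k, lam, phi))
               for k, n in counts.items())


# Armeria maritima counts from the text: plants per quadrat -> quadrats.
counts = {0: 57, 1: 6, 2: 12, 3: 5, 4: 5, 5: 5, 6: 7, 7: 1, 9: 1, 10: 1}
example_loglik = log_likelihood(counts, lam=0.6, phi=1.6)  # arbitrary trial values
```

A maximizer would vary `lam` and `phi` to make `log_likelihood` as large as possible; the trial values shown are not estimates.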
225. s are analytically found that simultaneously yield zero for each of the partial derivatives. The maximum likelihood estimate for a parameter is found from a particular series of observations by simply applying that equation to the set of observations. Unfortunately, this method is difficult and non-general, and therefore not practical for general-purpose maximization as found in mle. Advances in computer-assisted symbolic mathematics packages like Maple and Mathematica may eventually prove this method feasible for many users, but the need for specialized mathematical knowledge and skills still limits this method. A general method must work for most types of likelihood functions, whether or not analytical derivatives are easy or even possible to find.

Another class of fast maximizers estimates derivatives numerically. These methods are not robust for complex surfaces with many local maxima. From some starting point, they tend to rush up to the top of the nearest local maximum. A given function may have one or many points where the derivatives go to zero, so this method may not find the global maximum.

Numerical derivatives have limitations resulting in part from the inaccuracy of real number representation in computers, so a number of derivative-free methods have been developed. One clever method solves a two-dimensional maximization problem by trying to enclose the maximum within a triangle. The triangle grows and shrinks based only on inform
226. s are usually written. The name of the output file is specified in the mle program file. The program file also specifies what kind of results will be written to the output file and how much detail will be included. You can also specify that mle write partial results and messages to the screen (or standard output, as it is called). This is helpful for monitoring progress while estimation is taking place.

Skeleton of an mle Program

An mle program begins with the word MLE and ends with a matching END. A typical program includes four types of statements between the MLE and END:
• A DATA statement describes the format of the input data file and provides simple data transformations and mechanisms to drop observations.
• A MODEL statement defines the likelihood function along with the parameters to be estimated. A second part of each MODEL statement contains the keyword RUN, which specifies how the model is to be estimated.
• Assignment statements define variables and change the values of variables, including some that affect the behavior of the DATA and MODEL statements.
• Procedure statements like DATAFILE and OUTFILE take a list of arguments and perform some predefined action. DATAFILE, for example, names and opens the file read in by the DATA statement.

The following code fragment shows the skeleton of a typical mle program. The first two statements are procedure calls t
227. s because mle defines and does the bookkeeping for the data file, the output file, the plot file, and the screen (or standard output) file. File variables can be created should you wish to create and manipulate other files.

When a variable is first used in an assignment statement, its type will be determined by the type returned from the expression on the right-hand side. Here are some examples to illustrate the point:

large_data N_OBS > 5000            large_data will be type BOOLEAN
subtitle Analysis DEFAULTOUTNAME   subtitle will be type STRING
nine S80                           nine will be REAL
five 2 3                           five will be INTEGER

You can explicitly define the type for a variable when it is first referenced in an assignment statement. Here are some examples:

c STRING x           c would default to CHAR but is explicitly defined as a STRING variable
nine REAL 3 3        nine would default to INTEGER but will be a REAL variable
t BOOLEAN TRUE       t is explicitly declared as Boolean, although this is the default
ang REAL SIN 2 pi    ang is explicitly declared as real, although this is the default

Array Variables

Multidimensional arrays and matrices of all types are supported by mle. Array variables must be explicitly defined the first time the variable is mentioned in the program. The format is

<var> <type> <min1> TO <max1> <min2> TO <max2>

Some examples of declarations are

STRING 1 TO 5   Defines a one dim
228. s include data files

Survival analysis

Exact measurements 2

This first example not only provides an illustration of a simple mle program, but also shows the notation that will be used throughout this chapter. The problem at hand is finding one or more parameters $\theta$ of some distribution $f(t \mid \theta)$ given a series of observations $t_1, t_2, \ldots, t_N$. The values of $t$ are known exactly. For an individual observation $t_i$, the individual likelihood is $L_i = f(t_i \mid \theta)$, and the overall likelihood for the $N$ observations is

$$L = \prod_{i=1}^{N} f(t_i \mid \theta)$$

Data for this example (Table 6) are a series of 15 observations of times to breakdown for an insulating fluid at 32 kV. The times are arranged as one observation per line in a file named ex1.dat. The underlying distribution is believed to follow a negative exponential probability density function with a single parameter, lambda. The following mle program analyzes these data. Comments are enclosed in curly brackets. Here is the code for this problem:

MLE
  TITLE 32 kV Insulating Fluid Example from Nelson 1982 105
  DATAFILE ex1.dat     Input data file name
  OUTFILE ex1.out      Name to which results are written
  DATA                 data are read from the data file here
    failtime FIELD 1
  END
  MODEL                this specifies the likelihood model
    DATA               this corresponds to the product in the likelihood equation
      PDF EXPONENTIAL failtime
        PARAM lambda LOW 0 00001 HIGH 1 START 0 05 END
      END pdf
    END
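As a rough cross-check of what the program above estimates, the product likelihood for a negative exponential density $f(t \mid \lambda) = \lambda e^{-\lambda t}$ has log-likelihood $N \ln \lambda - \lambda \sum t_i$, which is maximized at $\hat{\lambda} = N / \sum t_i$. The Python sketch below illustrates this under stated assumptions: the failure times are hypothetical placeholders (Table 6 is not reproduced here), and the function names are invented for the illustration.

```python
import math


def exponential_loglik(lam, times):
    """Log of the product likelihood for exactly observed failure times."""
    return len(times) * math.log(lam) - lam * sum(times)


def exponential_mle(times):
    """Closed-form maximum likelihood estimate of lambda."""
    return len(times) / sum(times)


# Hypothetical failure times, NOT the Table 6 data.
times = [0.5, 2.0, 3.5, 8.0, 12.0, 30.0]
lam_hat = exponential_mle(times)
loglik_at_hat = exponential_loglik(lam_hat, times)
```

A numerical maximizer searching between the LOW and HIGH limits of the PARAM function should arrive at approximately the same value as the closed form, which makes this a convenient sanity check for simple models.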
229. s it is for the DATA function, but with one difference: the default form is FORM PRODUCT. The only real difference between the LEVELDELTA and the LEVEL function is how each function decides when to exit the current level. The LEVELDELTA function simply looks for a change in the value of <expression>, whereas LEVEL evaluates a boolean function <bexpr> for each observation and terminates when the expression evaluates to FALSE. In the example given under the LEVEL function, the only change necessary to use the LEVELDELTA function is to replace the LEVEL line with LEVELDELTA lev THEN.

Here is an example program that uses the LEVELDELTA function. The program estimates the change in oxygen consumption (ΔVO2) in individuals undergoing repeated exercise tests, using a variety of predictor variables like the increase in heart rate over the resting state. Since there are repeated measures on individuals, a distribution of individual effects is estimated along with other parameters. The likelihood is N L J 2 10 0 LOs tr B 2 0 dz i l o

where g(z) is the distribution of individual effects with a mean of zero and a variance of σ, and f(v) is the distribution of ΔVO2 values with parameters β, μ, and σ.

MLE
  does a linear regression w repeated measures model
  DATAFILE example.dat
  OUTFIL
230. s the results through the predefined variable RETURN. Here is how the function could be written:

FUNCTION average v1 INTEGER v2 INTEGER REAL
  This function returns the average of two integers
  RETURN v1 v2 2
END

Defining the function

The function definition begins with the word FUNCTION and ends with a corresponding END. The word following FUNCTION is the name of the function, in this case average. The name is followed by a list, enclosed in parentheses, of formal arguments (two in this case). The argument name and type must be specified for each of the arguments. In this example, both are defined to be type INTEGER. The argument names, and for that matter any variables that might be defined within the function, are private to the function (the same is true for procedures). Names of preexisting variables outside of the function are not affected by, and do not affect, declarations of variables using the same name inside the function.

As with procedures, arguments can be preceded by the VAR keyword. This would have the side effect of allowing the function to modify the argument. Without VAR keywords, changing the value of an argument within a function has no effect on the calling arguments. On general principles, it is considered bad programming practice to allow functions to modify arguments.

Calling the function

To call the function, the main program might include something like this:
231. Products
Integration
Probabilities
Random Numbers
Flow control
IF statement
FOR statement
FOR STEP statement
FOR STEPS statement
REPEAT statement
WHILE statement
The Break Statement
The Continue Statement
User defined Procedures
Defining the procedure
Calling the procedure
Nested procedures
User defined functions
Defining the function
Calling the function

mle user's manual: Brief table of contents

A simple simulation program
A less simple simulation program
An even more complicated simulation program
232. Defining the output file
Standard Error Reports
Variance-covariance Matrix
Confidence Interval Report
Report with no standard error or confidence intervals
Printing Distributions
Other Printing Options
Variables created by models
BUILDING MODEL STATEMENTS
The DATA function
The PARAM function
Setting Parameter Information
Modeling Covariate Effects
The PDF functions
PDF Time Arguments
The Hazard Parameter
The LEVELDELTA function
SETTING THE MAXIMIZATION METHOD
233. Conjugate gradient method
Simplex
Direct Method
Simulated Annealing Method
Looping Through Methods
THE INTERACTIVE DEBUGGER
PLOTS AND GRAPHS
CREATING PLOTS
Defining the Plot File
The Plot Statement
The Curve Statement
Two dimensional Plots
Three dimensional Plots
Multiple plots
Working With Gnuplot
What is Gnuplot
How to Obtain Gnuplot
Basics of Gnuplot
234. sence of highly multimodal likelihood surfaces. The user can fine-tune the behavior of the algorithm so that functions with complex topography can be searched more thoroughly for the maximum. Another advantage of simulated annealing is that it does not require computation of derivatives. In fact, simulated annealing can find the maximum of discontinuous functions and those otherwise without first derivatives. Finally, the simulated annealing algorithm is extremely simple and intuitive.

The disadvantages of simulated annealing are that it usually takes from one to several orders of magnitude more function evaluations than do other methods, and that the user must have an understanding of the algorithm to set up initial parameters that lend themselves to efficient estimation. Sometimes it is worth experimenting to find the best combinations of input parameters to the simulated annealing algorithm, so as to minimize the total number of function evaluations.

Simulated annealing begins at some user-defined temperature T and a user-defined rate of cooling r. At the end of one cycle of annealing, the temperature is reduced as T = T × r, and a new cycle of annealing is performed. Typically, the temperature ranges from 1 for simple functions to 100,000 for difficult functions, and it is cooled every cycle by r = 0.85. When the algorithm begins, the starting point is evaluated and becomes the best value so far. Each iteration will then search the likelihood surface in a partiall
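The cooling schedule T = T × r and the idea of occasionally accepting downhill moves can be sketched in a few lines of Python. This is a generic one-dimensional simulated annealing loop written for illustration only; it is not mle's implementation. The Metropolis acceptance rule, the way the search width shrinks with temperature, and every name and default value here are assumptions.

```python
import math
import random


def anneal(f, lo, hi, start, t0=100.0, r=0.85, cycles=60, steps=50, seed=1):
    """Search for the maximum of f on [lo, hi] by simulated annealing."""
    rng = random.Random(seed)
    x, fx = start, f(start)
    best_x, best_f = x, fx          # the starting point is the best so far
    t = t0
    for _ in range(cycles):
        # Assumed rule: the explored range shrinks as the temperature cools.
        width = (hi - lo) * t / t0
        for _ in range(steps):
            cand = min(hi, max(lo, x + rng.uniform(-width, width)))
            fc = f(cand)
            # Uphill moves are always taken; downhill moves are taken with
            # a probability that falls as the temperature drops.
            if fc >= fx or rng.random() < math.exp((fc - fx) / t):
                x, fx = cand, fc
                if fx > best_f:
                    best_x, best_f = x, fx
        t *= r                      # cool: T = T x r
    return best_x, best_f


def bumpy(x):
    """A hypothetical multimodal surface with several local maxima."""
    return math.cos(3.0 * x) - 0.1 * (x - 1.0) ** 2


best_x, best_f = anneal(bumpy, -10.0, 10.0, start=-8.0)
```

With r = 0.85, the temperature after 60 cycles is about 0.85^60 of its starting value, so the final cycles behave almost like a purely uphill local search.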
235. separated by commas, others not. Can you identify each of the six arguments? They are:

My name is   This is a string constant
first        This is a variable defined earlier in the program
mus          This is a one-character string constant
middle       This is another variable
rR           Another one-character string constant
last         This is a third variable

Suppose earlier in the program there were the statements

first Thomas
middle A
last Edison

Then the WRITELN statement above will write 6 different things to the screen. Here is a murkier statement:

WRITELN om 1 mom

If you look carefully, you can deduce that the output is the 5-character sequence 1. This is the same as if you had typed WRITELN. A programmer with a more developed sense of aesthetics would write neither of the above two statements. Rather, s/he would recognize that it is very confusing and write the program this way:

singlequote
space
comma
WRITELN singlequote space comma space singlequote

As an aside, you can use the operator to concatenate strings. So another way of writing the program is

singlequote
space
comma
WRITELN singlequote space comma space singlequote

Better yet, it could be written

singlequote
space
comma
confusingstring singlequote space comma space singlequote
WRITELN confusingstring

Wit
236. sions before attempting to develop mle programs. The likelihood within a MODEL statement is a single, sometimes complicated, expression. Expressions are used to define limits of integration, summations, and products; they can be used to define START, HIGH, LOW, and TEST values for parameters, and many other things. This section briefly discusses expressions and functions, and then provides some details on functions of special interest when building likelihood models. The reference manual should be consulted for summaries of expressions and descriptions of all functions defined in mle.

At the simplest level, an expression in mle can be a numerical constant or a variable name. More complex expressions consist of algebraic operators, etc., and function calls, each with zero or more arguments. Most functions in mle are simple functions with a fixed number of arguments, for example, PERMUTATIONS(x, y), ARCSIN(x), ABS(x), MIX(p, x, y).

A second class of functions is more complex and has a more complicated syntax. These functions begin with a keyword and end with an END. Examples of some of these functions are the PARAM ... END function, the DATA ... END function (not to be confused with the DATA ... END statement described in a previous chapter), the PDF ... END function, the INTEGRATE a b c END function, and the IF THEN ELSE END function. Suppose you want to integrate
237. ssions
errorbars: 3 to 4 CURVE expressions, 2d only
financebars: 7 CURVE expressions, 2d only
fsteps: 2 CURVE expressions, 2d only
histeps: 2 CURVE expressions, 2d only
impulses: 2 (2d) or 3 (3d) CURVE expressions
lines: 2 (2d) or 3 (3d) CURVE expressions
linespoints: 2 (2d) or 3 (3d) CURVE expressions
points: 2 (2d) or 3 (3d) CURVE expressions
steps: 2 CURVE expressions, 2d only
vector: 4 (2d) or 5 (3d) CURVE expressions
xerrorbars: 3 to 4 CURVE expressions, 2d only
xyerrorbars: 4 to 6 CURVE expressions, 2d only
yerrorbars: 3 to 4 CURVE expressions, 2d only

Options can follow each plot style in the WITH string. The options are linetype <number>, linestyle <number>, linewidth <number>, pointtype <number>, and pointsize <number>; the options can be abbreviated lt, ls, lw, pt, and ps, respectively. The Gnuplot manual discusses these options in more detail. Here is an example of a simple plot that makes use of some of the CURVE options:

MLE
  PLOTFILE DEFAULTPLOTNAME
  PLOT set key bottom left set y2tics
    CURVE KEY sin x AXES x1y1 WIT
      x 0 2 PI 100
      x SIN x
    END
    CURVE KEY cos x AXES x pi li linety
      x 0 2 PI 100
      x COS x
    END
    CURVE KEY tan x AXES x T Ti linetype 2
      E 07 2D 100 0
      x TAN x
    END
  END plot
END mle

[Figure: plot of sin x, cos x, and tan x]
238. statement initiates a single graph or chart. The statement does not do the plotting itself; instead, each CURVE ... END statement executed within the PLOT ... END statement will add a single curve to the plot. The format of the statement is

PLOT <string_expr> <statements> END

When a PLOT statement is executed, a few statements may be written to the plot file. Then the <statements> are executed. All CURVE statements executed before the END is reached will result in one curve being added to the current plot.

The optional series of string expressions, enclosed within parentheses, can immediately follow the PLOT statement. These strings will be written to the plot file. The purpose of these strings is to provide additional information to the Gnuplot program, such as titles, ranges, and borders. They are simply written verbatim to the plot file. In fact, plots can be written in the Gnuplot language with these strings. Here is an example:

MLE
  PLOTFILE gploteg.plt
  PLOT plot [0:2*pi] [-5:5] sin(x), cos(x), tan(x)
  END
END

The resulting Gnuplot file is

set terminal windows
reset
set data style lines
set autoscale
set nokey
plot [0:2*pi] [-5:5] sin(x), cos(x), tan(x)

And here is the resulting plot:

The PLOT statement writes the PLOTINIT string to the plot file. You can assign a string to the PLOTINIT variable and it will be written for each PLOT
239. symmetrical dispersion param asymmetrical diserpsion param da 0 0 antisymmetry 0 5 the output file EGER a REAL b REAL REAL from the trait distribution tribution to use RETURN QUANTILE NORMAL RAND a b END ELSEIF dist 2 THEN RETURN QUANTILE LOGNO ELSEIF DIST 3 THEN RETURN QUANTILE EXPON ELSE WRITELN Error dist is HALT END if END drawtrait FUNCTION DRAWNOISE mu REAL draws a random devel RETURN QUANTILE NORMAL END drawnoise PROCEDURE openoutfile i INT dig STRING IF NOT DIREXISTS outdir MKDIR outdir END if IF i lt 10 THEN dig 00 INT2STR i ELSEIF i lt 100 THEN dig 0 INT2STR i ELSE dig INT2STR i END OPENWRITE fout outdir END openoutfile FOR 1 TO nsims DO c openout file s FOR j 1 TO nsubjects DO pick the individua size drawtrait 2 tra create right and 1 IF RAND gt prob_AS THEN RMAL RAND a b END ENTIAL RAND a END invalid sigma REAL REAL opmental noise value RAND mu sigma END EGER THEN outfilebase dig reate nsims files l s baseline trait it_a trait_b eft measures right left ELSE left right END if size size size size drawnoise da drawnoise da drawnoise da drawnoise da sd_right sd_left sd_right sd_left write this observation to the file WRITELN fo END for 3 ut i
240. t hidden3d
set view 50
splot "eg8.001" using 1:2:3 notitle, "eg8.002" using 1:2:3 notitle, "eg8.003" using 1:2:3 notitle

Notice that there were three plot data files created, one for each surface. The resulting graph looks like this:

[Figure: three-dimensional plot of the three surfaces]

Multiple plots

Multiple plots can be placed on a single page with the MULTIPLOT ... END statement. The form of the statement is

MULTIPLOT <xplots> <yplots> <statements> END

The two arguments determine the number of plots that are placed across the page (<xplots>) and vertically down the page (<yplots>). In this way, <xplots> by <yplots> pages of plots are generated. Once a page is filled, a new page is automatically generated. The <statements> are any valid mle statements, including PLOT ... END statements; typically, two or more PLOT statements are executed. The PLOT ... END statements may be executed within a user-defined procedure call. The PLOTFILE procedure must be called before the MULTIPLOT statement. Here is an example. The following program shows a series of Weibull distributions:

MLE
  PLOTFILE DEFAULTPLOTNAME
  nx 3
  ny 2
  MULTIPLOT nx ny
    totp nx ny
    FOR mu 1 TO totp DO
      PLOT
        FOR sig 1 TO 3 DO
          CURVE
            t 0 10 50
            t PDF WEIBULL t mu sig
241. t is > maximum range or the 2nd argument < the 1st argument).
• An area under the probability density function (2 arguments within the range of the PDF).
• The hazard function (3 identical arguments).
• Any of the above with right and left truncation of the distribution. The 3rd and 4th arguments define the left and right truncation points.

Thus, in the syntax of mle, there is a natural delineation between arguments and intrinsic parameters. Consider the following function call:

PDF NORMAL(0, 4, 0, 40) 10, 20 END

This function call has the four time arguments 0, 4, 0, and 40. Together they specify a normal distribution truncated over the range 0 to 40, with the area between 0 and 4 returned. The two intrinsic parameters of the normal are passed as μ = 10 and σ = 20. There are no model parameters in this example, simply because there are no PARAM functions specified.

Writing mle Programs

This section gives additional details needed to write mle programs. The simplest way to create a new mle program is to modify a working program, like that given in Figure 1, to make it do the task at hand.

Style Conventions

mle is a free-format language. That is, a program can be written on a single line or spaced across multiple lines. Indenting, spacing within a line, and spacing across lines are never done for the computer. Rather, the use of indentation is solely for the benefit of human readers. Good programming p
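The quantity described for the truncated call above can be checked by hand: it is the area of the normal density between 0 and 4, divided by the area between the truncation points 0 and 40. The Python sketch below computes that ratio with the standard library only. The function names are invented for this illustration, and the treatment of the second intrinsic parameter as a standard deviation follows the text's "σ = 20".

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal(mu, sigma)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def truncated_area(t_open, t_close, left, right, mu, sigma):
    """Area between t_open and t_close of a normal truncated to [left, right]."""
    area = normal_cdf(t_close, mu, sigma) - normal_cdf(t_open, mu, sigma)
    mass = normal_cdf(right, mu, sigma) - normal_cdf(left, mu, sigma)
    return area / mass


# Time arguments 0, 4, 0, 40 with intrinsic parameters mu = 10, sigma = 20.
area = truncated_area(0.0, 4.0, 0.0, 40.0, 10.0, 20.0)
```

Dividing by the mass between the truncation points is what makes the truncated density integrate to one over the range 0 to 40.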
242. t parameter specified in the parameter list will be treated as μ. The second will be treated as σ. How can you know the proper order for parameters? Generally, location parameters appear first (and are usually denoted a in this manual), scale parameters are second, and shape parameters are third. You can get a quick synopsis of each type of PDF by using the -h option from the command line, e.g., mle -h SHIFTWEIBULL.

Parameters are also used to model effects of covariates on other parameters. Here is an example in which two parameters, used in place of some fixed values of μ and σ for a normal distribution, are defined with two covariate parameters each:

PDF NORMAL(topen, tclose)
  PARAM mean LOW 100 HIGH 400 START 270 TEST 0 FORM LOGLIN
    COVAR sex PARAM b_sex_mu LOW 2 HIGH 2 START 0 END
    COVAR weight PARAM b_weight_mu LOW 2 HIGH 2 START 0 END
  END
  PARAM stdev LOW 0 1 HIGH 100 START 20 FORM LOGLIN
    COVAR sex PARAM b_sex_sig LOW 2 HIGH 2 START
    COVAR weight PARAM b_weight_sig LOW 2 HIGH 2 START
  END
END

In this example, the first parameter of the normal distribution (μ) has two covariates and their corresponding parameters modeled on it. The exact specification of how covariates and their parameters are modeled depends on the FORM of the intrinsic parameter. In the example, FORM LOGLIN specifies that a log-linear specification is to be used. The log-linear specification is y
243. t the start of the next line on the screen. The single argument is the string 'Hello Universe'. The term string refers to a sequence of text characters. The single quote marks on each side serve to define the extent of the string. As it happens, you can also use double quote marks, so that "Hello Universe" does the same thing. You cannot mix the two types of marks for a string.

If all went well when you ran the program, the message Hello Universe was sent to the screen and you have successfully written your first mle program. If not, you have probably gotten an error message. For example, if you left off the second quote mark, this message is returned:

Unclosed ' at end of a line or file. Error found while parsing line 2, column 10 in file egl.mle

mle, like all programming languages, requires you to follow some very strict rules. Here are a few to get you started:
• Arguments to simple functions and procedures are enclosed in a set of parentheses, not square brackets or curly braces.
• Keywords and variables cannot have spaces and most punctuation within them.
• mle is a free-format language. Indentation, spacing, and formatting are ignored, with some exceptions. The previous program could be written on a single line as MLE WRITELN('Hello Universe') END
• A space or valid punctuation mark must separate keywords. The program MLEWRITELN('Hello Universe') END is not valid because MLE and WRITELN are run together. The program MLE
244. the solver and the rest of the parameter space is excluded. Furthermore, some conditions can cause the maximizer to leap to another part of the surface, where a local minimum might be reached. For example, when maximizing a likelihood function that includes numerical integration, the tolerance in the integrator must be several orders of magnitude smaller than that of the solver, or else the error in integration can lead the solver astray. Two forms of the conjugate gradient method are available: METHOD CGRADIENT1 and METHOD CGRADIENT2.

Simplex

The simplex method is a derivative-free maximization method described by Nelder and Mead (1965) and popularized by Press et al. (1989). The method is set with METHOD SIMPLEX.

Direct Method

A simple method for finding a maximum is to consider only one dimension at a time. So, for our map, we would find the highest latitude for a given longitude by examining points along a line of longitude. We could use the method of bisection, or even better ways, to find the maximum along that line of longitude in the fewest number of evaluations (i.e., fewest holes). Once we have settled on a latitude, we can find the longitude of highest elevation along that latitude. We next go back and find a new latitude for the new longitude, and so on. This is known as the direct method (Nelson 1983), and it works well for some functions over a small number of dimensions. In fact, the method is usually more robust at finding a glob
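The one-dimension-at-a-time search described for the direct method can be sketched in Python as follows. This is an illustration of the general idea only, not mle's implementation: it assumes a golden-section line search (one of the "better ways" than bisection) and a function that is unimodal along each axis within the given bounds. The names `golden_section_max`, `direct_maximize`, and `elevation` are hypothetical.

```python
import math


def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal one-dimensional function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0


def direct_maximize(f, start, bounds, sweeps=50, tol=1e-12):
    """Maximize f by searching along one coordinate at a time."""
    point = list(start)
    best = f(point)
    for _ in range(sweeps):
        for i, (lo, hi) in enumerate(bounds):
            def along_axis(x, i=i):
                trial = point[:]
                trial[i] = x
                return f(trial)
            point[i] = golden_section_max(along_axis, lo, hi)
        value = f(point)
        if abs(value - best) < tol:
            break
        best = value
    return point, f(point)


def elevation(p):
    """Hypothetical smooth 'map' with a single peak at (1.6, -2.4)."""
    x, y = p
    return -(x - 1.0) ** 2 - (y + 2.0) ** 2 - 0.5 * x * y


peak, height = direct_maximize(elevation, [0.0, 0.0], [(-10, 10), (-10, 10)])
print(peak, height)
```

Each sweep fixes all coordinates but one, which is why the approach becomes slow when parameters are strongly correlated or numerous.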
245. the time of observation, which is the probability of the event occurring at some unknown time before the time of observation. In mle, the area under the likelihood for a responder is specified as PDF LOGNORMAL(-1, 5) 2, 0.5 END, which returns 0.217, the area between 0 (or anything less than 0) and 5 under a lognormal distribution with μ = 2 and σ = 0.5.

Consider a data set that contains a time of observation and an indicator variable that is 0 if the observation was a non-responder and 1 for a responder. One way of coding this model is to place an IF THEN ELSE END statement to switch between responder and non-responder likelihoods, as appropriate for each observation:

MLE TITLE Example DATAFILE ex4.dat OUTFILE ex4.out DATA t FIELD 1 time of observation resp FIELD 2 1 if responder 0 if nonresponder END MODEL DATA IF resp 1 THEN it is a responder PDF LOGNORMAL 0 t PARAM a LOW 0.00001 HIGH 9 START 1 END PARAM b LOW 0.00001 HIGH 2 START 0.4 END END of the PDF ELSE non responder PDF LOGNORMAL t ∞ a b END END of if then else END data RUN FULL END of the MODEL END

Alternatively, the following mle DATA statement will transform the observation time into a set of two times. For a responder, topen will be set to zero and tclose will take the value of the observed time. For a non-responder, topen will take the value of the observed time and tclose will be set to zer
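The responder and non-responder likelihoods described in this excerpt can be sketched outside mle. The Python below is a minimal illustration, assuming the standard lognormal CDF: a responder contributes the area from 0 to t, and a non-responder contributes the area from t to infinity. The function names and the three example observations are hypothetical.

```python
import math


def lognormal_cdf(t, mu, sigma):
    """Area under a lognormal(mu, sigma) density from 0 to t."""
    if t <= 0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def observation_likelihood(t, responder, mu, sigma):
    """Likelihood of one observation made at time t."""
    area_before_t = lognormal_cdf(t, mu, sigma)
    # Responder: event happened before t. Non-responder: event happens after t.
    return area_before_t if responder else 1.0 - area_before_t


def loglikelihood(observations, mu, sigma):
    """Sum of log likelihoods over (time, responder) pairs."""
    return sum(
        math.log(observation_likelihood(t, responder, mu, sigma))
        for t, responder in observations
    )


# Area between 0 and 5 with mu = 2 and sigma = 0.5, as in the text.
print(round(lognormal_cdf(5, 2, 0.5), 3))

# Hypothetical data: (time of observation, responder flag).
example = [(5.0, True), (3.0, False), (8.0, True)]
print(loglikelihood(example, mu=2.0, sigma=0.5))
```

A maximizer would repeatedly call `loglikelihood` with trial values of the two parameters, which is the job the MODEL statement hands to mle.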
246. three models will yield identical results. The loglikelihood is specified 30 END 1 END The likelihood is specified 30 END 1 END In practice, results may differ slightly because of round-off errors. This will be most noticeable in the last model, because the product of very small numbers will lead to smaller and smaller numbers before the log is taken of the entire likelihood.

There are several reasons for providing these three ways of specifying how the data is used within the likelihood:
• Some likelihoods are much easier to write as a loglikelihood.
• Some likelihoods require things like taking an expectation outside of the individual likelihoods, where the integration is done outside of the DATA function.
• Some multilevel or hierarchical likelihoods require this type of control over the likelihood.

There are two functions that are closely related to the DATA function: the LEVEL function and the LEVELDELTA function. These two functions provide a mechanism by which multilevel or hierarchical models can be constructed.

The PARAM function

mle has a general method for defining all parameters to be used in a likelihood model. The PARAM function defines a parameter and its characteristics. The function should only be used within a MODEL statement. When models are solved, free parameters are estimated by iteratively plugging new values in for those param
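The round-off concern about multiplying many small likelihoods before taking the log can be illustrated with a short Python sketch. It shows the extreme case, where the product underflows to zero in double precision while the sum of logs remains well behaved. The 200 identical likelihood values are made up for the demonstration.

```python
import math

# Hypothetical individual likelihoods, one per observation.
likelihoods = [1e-3] * 200

# Product first, log afterwards: the product shrinks with every term.
product = 1.0
for value in likelihoods:
    product *= value

# Log first, sum afterwards: each term stays comfortably representable.
sum_of_logs = sum(math.log(value) for value in likelihoods)

print(product)      # underflows in double precision
print(sum_of_logs)  # remains finite
```

Milder versions of the same effect show up as small differences in the last few digits of the results, which is the behavior the text warns about.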
247. ts the menu. The editor preserves a number of settings from one editing session to the next: foreground color, background color, insert status, word wrap status, right and left margins, ruler setting, mle indent setting, mle keyword case setting, mle comment case setting, back up setting, search from top flag, and search ignore case flag.

The information for these settings is stored in the file emle.cfg, which resides in the same directory as emle.exe. The configuration file can also save a series of user-defined commands that are executed whenever the editor is started. To add commands to the file, use the Alt_F9 command, which prompts for additional commands before saving the configuration file.

Default command mapping

The default mapping between editor commands and the keyboard is described in this section. Notice that a command can have more than one key assigned to it. The default keyboard mapping can be changed by saving the current map (Shift_F9 by default) and editing the resulting file. The editor will then read the keyboard map by default. The keymap is stored in the file emle.kbm, which resides in the same directory as emle.exe.

Cursor control commands

RAT          Go to next character
CATE         Go to previous character
Ctrl_PgUp    Go to beginning of file
Ctrl_PgDn    Go to end of file
End          Go to end of line
Home         Go to beginning of line
Dn           Go to next line
248. uared distribution with one degree of freedom. Over both directions, the total interval can be considered a 95% confidence interval for the parameter.

The interpretation of the one-dimensional confidence region must be done with caution, as the method assumes that parameters are uncorrelated. Figure 4 shows what happens when parameters are correlated, which is quite common. Panel a shows the contour of the loglikelihood surface when parameter 1 is changed over the p1 axis and parameter 2 is changed over the p2 axis. The bold ellipse represents the desired confidence level, say 95%. The dotted lines show the confidence limits when p1 is perturbed along the axis to each side of the estimate; this occurs where the bold ellipse intersects the p1 axis. Panel b shows what happens when parameters are correlated. Now the dotted lines still show the 95% confidence limits when p1 is perturbed from the estimate and p2 is held constant at its maximum. The dashed lines show the true confidence region, defined as the greatest extent of the 95% confidence ellipse over the space of p1 and p2. It is easy to see that the one-dimensional confidence interval will always underrepresent the true interval when p1 and p2 are correlated.

Figure 4 (panels a and b). The loglikelihood contour over the space of parameters p1 and p2. The bold ellipse represents the target change in likelihood that defines the upper and lower bounds of the confi
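The one-dimensional interval discussed in this excerpt can be sketched in Python. This is an illustration under stated assumptions, not mle's algorithm: the parameter is perturbed in each direction until the loglikelihood falls by half of the 95% critical value of a chi-squared distribution with one degree of freedom (about 3.84), and the loglikelihood is assumed to decrease steadily away from the estimate. The data, the known standard deviation, and the function names are hypothetical.

```python
import math

CHI2_1DF_95 = 3.841458820694124  # 95% critical value, one degree of freedom


def normal_loglik(mean, data, sigma):
    """Loglikelihood of a normal mean with a known standard deviation."""
    const = -0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return sum(const - (x - mean) ** 2 / (2.0 * sigma ** 2) for x in data)


def find_limit(loglik, estimate, direction, step=1.0, tol=1e-10):
    """Perturb one parameter in one direction until the loglikelihood drops."""
    target = loglik(estimate) - CHI2_1DF_95 / 2.0
    distance = step
    while loglik(estimate + direction * distance) > target:
        distance *= 2.0  # widen the bracket until it crosses the target
    lo, hi = 0.0, distance
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if loglik(estimate + direction * mid) > target:
            lo = mid
        else:
            hi = mid
    return estimate + direction * (lo + hi) / 2.0


data = [4.1, 5.3, 6.0, 4.7, 5.9]  # hypothetical observations
sigma = 1.0                        # assumed known
estimate = sum(data) / len(data)   # maximum likelihood estimate of the mean


def loglik(mean):
    return normal_loglik(mean, data, sigma)


lower = find_limit(loglik, estimate, direction=-1.0)
upper = find_limit(loglik, estimate, direction=+1.0)
print(lower, estimate, upper)
```

With a single parameter this reproduces the familiar interval of about 1.96 standard errors on each side. With several correlated parameters, holding the others fixed gives the too-narrow interval that Figure 4 illustrates.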
249. uce a report based on a sample-size-corrected Akaike's information criterion (AICC). Setting BIC_SELECT to TRUE will produce a report based on the Bayesian information criterion (BIC). For each report, the most parsimonious model is selected. Parameters for the selected model are reported with new estimates of standard errors that include model selection uncertainty. The variable IC_SAMPLE_SIZE can be set to the effective sample size for a set of observations used for the AICC and BIC reports.
• The RUN part of the MODEL statement now takes on a THEN <statements> END clause. The statements are executed after each sub-model is solved. Likewise, THEN <statements> END can be used after each FULL, REDUCE, and WITH clause to run statements after the model.

Differences Between Version 1 and Version 2

Changes and New Features in Version 2

There are a number of syntax differences and other changes between mle version 1 and version 2. Here is a summary of the most important changes:
• General algebraic expressions are now recognized. Standard operators include AND, OR, XOR, NOT, MOD, DIV, SHL, SHR, >, <, <>, >=, <=. These operators can be used to build algebraic and Boolean expressions of nearly unlimited complexity. Both and are allowed for specifying Boolean comparisons. The standard operator precedence common to most programming la
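The information criteria named in this excerpt have standard textbook definitions, sketched below in Python. This is not mle's report code, and the model-averaged standard errors the text mentions are not reproduced. The two candidate models, their loglikelihoods, and the sample size are hypothetical numbers for illustration.

```python
import math


def aic(loglik, k):
    """Akaike's information criterion for k free parameters."""
    return -2.0 * loglik + 2.0 * k


def aicc(loglik, k, n):
    """Sample-size-corrected AIC for n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return aic(loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)


def bic(loglik, k, n):
    """Bayesian information criterion for n observations."""
    return -2.0 * loglik + k * math.log(n)


# Hypothetical results: name -> (maximized loglikelihood, free parameters).
models = {"full": (-120.3, 3), "reduced": (-121.0, 2)}
n = 50  # hypothetical effective sample size

best_by_aicc = min(models, key=lambda name: aicc(*models[name], n))
best_by_bic = min(models, key=lambda name: bic(*models[name], n))
print(best_by_aicc, best_by_bic)
```

Lower values are better for both criteria, so the selected model is the one that minimizes the chosen criterion.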
250. ues until <bexpr> evaluates to FALSE. That is, when <bexpr> is FALSE, the loop terminates. The chief difference between a WHILE loop and a REPEAT loop is that the REPEAT loop is always executed at least once. The WHILE loop may be skipped the first time. Here is an example of a small program using a WHILE loop:

Compute factorial n INTEGER WRITE Enter an integer READLN n tmp REAL 1 WHILE n > 1 DO tmp n n WRITELN tmp

The Break Statement

The BREAK statement is a special statement that works with FOR, WHILE, and REPEAT statements. When a BREAK statement is encountered, the loop is immediately exited. A BREAK statement outside of a loop causes the current scope to be exited. This means that within the main program (outside of a user-defined procedure or function), a BREAK acts like a HALT and causes the program to terminate. Within a user-defined procedure or function, the procedure or function is exited back to the place from where it was called.

Here is an example of how the BREAK statement can be used to shorten the section of code given in an earlier example:

REPEAT WRITE Angle in radians READLN angle IF angle > 0 AND angle < twopi THEN BREAK exit the REPEAT loop END WRITELN Angle must be > 0 and < twopi UNTIL 1 0 that is loop for
251. uld stop searching for the maximum likelihood after 10,000 evaluations of the likelihood.

Looping Through Methods

mle provides a mechanism to specify that different methods be used to solve the same likelihood. For example, you can set METHOD1 to DIRECT, MAXITER1 to 10, METHOD2 to CGRADIENT1, and MAXITER2 to 500 to begin the problem with the direct method and then switch to a conjugate gradient solver for the next 500 iterations. The variables METHOD, MAXEVALS, MAXITER, and EPSILON can have a digit appended in this way. When the variable METHOD_LOOP is set to true, mle will loop back to the first method and continue the solver sequence again until one of the methods converges normally.

The Interactive Debugger

mle incorporates an interactive debugger that provides some degree of control while models are being solved. Entries in the symbol table can be viewed and changed so that convergence can be forced early or postponed, output variables can be changed, and the values of various debugging options can be set and reset. The debugger is called by typing <CTRL>C on most systems. The <BREAK> key also works on some systems. After mle gets to some reasonable stopping point, usually the end of an iteration, control will be passed to the user. The debugger responds with:

Exit: immediately exits the program.
Resume: resumes running mle from where it left off.
One step: continue from where it left off for o
252. unction. Each of the methods has strengths and weaknesses for different types of functions. Understanding some of the details of each method is useful for deciding which to use for any given application. The following sections describe each of the maximizers and point out the strengths and weaknesses of each. The behavior of some methods can be modified considerably by the user. The maximization method is selected by setting the variable METHOD. For example, setting METHOD to ANNEALING will use the simulated annealing method. The default method is DIRECT.

The overall goal of function maximization is to find the set of parameters that maximizes a function. A simple analogy is to imagine that you are looking at a topographic map that codes altitude by color. You want to find the longitude and latitude coordinates (the parameters) that will put you at the highest point on the map. By looking over the map, you may be able to quickly ascertain a mountain peak or some other maximum. In order to do this, however, you effectively scanned hundreds of thousands of points on the map until finding those places where the colors suggest the highest altitude. With a little more work, the highest peak is easily resolved.

Visual evaluation of maximum elevation is easy and takes almost no time, because the map shows the elevations evaluated at hundreds of thousands of points and our eyes can quickly scan those points. That is, each fu
253. understanding of the specification for running models. This chapter discusses the MODEL statement. It is assumed that you understand the basics of expressions and data types for the mle language. The reference manual and Chapter 1 provide much of the necessary background on expressions. This chapter covers several aspects of expressions that are primarily used for building typical likelihood models in mle: the PARAM function, the PDF function, the DATA function, and the LEVEL functions.

Structure of the MODEL Statement

The basic structure of the MODEL statement looks like this:

MODEL <expression> RUN THEN END <runlist> END

The single <expression> in the MODEL statement defines the likelihood that is to be maximized. Technical details about writing expressions are given in the reference manual; some details are provided here as well. The optional THEN END clause gives you a way to do something after each model is solved. For example, you could insert code to transform the parameters from one form into another, plot distributions, or write results to another file. Most legal statements can come between the THEN and END, except DATA END and MODEL END statements. The <runlist> is a series of one or more commands that specify which of the parameters are to be changed in maximizing the likelihood. The commands are FULL, REDUCE, or WITH. A simple example
254. w the file can be read from or written to, depending on how it was opened. The READ and READLN procedures can be used to read from a file. The first argument to the procedures must be the file variable. Likewise, the WRITE and WRITELN procedures can be used to write or append to files. Again, the file variable must be the first argument.
• After operations on a file have been completed, the CLOSE procedure ensures the file is properly closed. The CLOSE procedure forces the operating system to flush any buffers and update the directory information for a file.

Here is a simple program that reads in a file and reverses the characters in each line. Notice the use of the EOF function to check for the end of the file and the EXISTS function for checking to see if a file exists:

MLE reads text from a file and reverses the text filename STRING f FILE textline STRING READDELIMITERS read the whole line including spaces REPEAT WRITE File name READLN filename ok EXISTS filename IF NOT ok THEN WRITELN Couldn't find filename END if UNTIL ok OPENREAD f filename WHILE NOT EOF f DO READLN f textline FOR x STRINGLEN textline TO 1 STEP 1 DO WRITE SUBSTRING textline x 1 END for WRITELN END while END mle

User defined procedures

mle allows users to define their own procedures and functi
255. x AND s_age lt x obswidth THEN ageo cid x agec cid x BREAK END if END for END if obswidth end for cid now estimate params from the current simulated data MODEL SUMMATION j 1 numbobs LN PDF NORMAL ageo j agec j PARAM mu LOW 1 HIGH 10 START 3 END PARAM sig LOW 0 01 HIGH 5 START 2 END END END summation RUN FULL THEN save parameter estimates savemu sim mu savesig sim sig END then END model END for sim Now do two models one to tally the mu s and one sig s PRINTLN Finding mean and stdev for mu parameters MODEL SUMMATION j 1 numbsims LN PDF NORMAL savemu 3 PARAM mu_mean LOW 1 HIGH 10 START 3 TEST 6 0 END PARAM mu_sd LOW 0 0001 HIGH 5 START 2 END END pdf ln END summation RUN FULL THEN print out simulation stats PRINTLN mu mean mu_mean y mu SD mu_sd true sitmean PRINTLN Absolute bias sitmean mu_mean bias 100 mu_mean sitmean t test param lt gt 0 t mu_mean mu_sd t test param sitmean t PRINTLN PRINTLN mu_mean sitmean mu_sd END then end model Now collect info for the estimates of sig PRINTLN Finding mean and stdev for sig parameters MODEL SUMMATION j 1 numbsims LN PDF NORMAL savesig 3 PARAM sig_mean LOW 0 00001 HIGH 6 START 3 TEST 1 0 END PARAM sig_sd LOW 0 000001 HIGH 2 START 0 5 END END paf ln END summation
256. x10 Zeta zepto x102 Yotta yocto x10 lt N mu 23 0xg gt gt

3 lines read from file THEDATA.DAT
3 Observations kept and 0 observations dropped
NAME 1 42 2 38 amp Us 43 ITa 031 28s 505 age amount rate time 0000000 2000 0000 18000000 50 4000000 0000000 8000 00000 12000000 37 2000000 0000000 1000 0000 19000000 25 2000000 3333333 0333 3333 16333333 37 6000000 3333333 4333333 33 00143333 158 880000 1010093 2081 66600 03785939 12 6047610 0000000 8000 00000 12000000 25 2000000 0000000 2000 0000 19000000 50 4000000

Accessing observations

lessthan40 greatertha FOR D_IDX IF age gt greate ELSE

Variables created by the DATA statement are treated somewhat differently than other variables. The value of a particular variable changes depending on a counter that keeps track of the current observation. The value of a variable for the current observation is accessed by specifying the variable name. What determines the current observation? Within MODEL statements, the current observation is usually set by the DATA function. Internally, the DATA function loops through all observations and sums the individual likelihood computed for each observation. The LEVEL and LEVELDELTA functions work in similar ways.

Here are more specific details on how the individual observations are accessed. Consider the variables read in the example above. When
257. y random way and always keep track of the best point so far.

A single cycle of annealing (i.e., one iteration) consists of the following. First, a cycle of random movements is started. Nana random steps are taken over one direction at a time. The maximum width of the random step for parameter i is controlled by the step length variable v. For our map example, this would correspond to evaluating Nana randomly picked points along a line of longitude or latitude. Initially, we would use the entire height and width of the map for the maximum step length. As each point is evaluated, we keep track of the overall best maximum. Any time we find a point higher than our current maximum, we move to that point and consider it our new starting point. But if a lower point is found, we might accept that point according to the Metropolis criterion (Metropolis et al. 1953), by which the point is accepted with probability exp(-Δl/T), where Δl is the difference between the current starting point and the downhill point we have just evaluated. In other words, we draw a uniform random number on [0, 1] and accept the move if that number is less than a negative exponential survival function of Δl with parameter 1/T. This criterion means that at high temperatures we will frequently accept downhill moves with large changes in the loglikelihood, but as temperature drops, downhill moves will only occur at small changes in the loglikelihood. After completing the N
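The Metropolis acceptance rule described in this excerpt can be sketched in Python. This is a minimal illustration of the criterion alone, not of mle's full annealing schedule. The names `accept_move` and `acceptance_rate` are hypothetical, and the drop is measured as the current loglikelihood minus the candidate loglikelihood.

```python
import math
import random


def accept_move(current_loglik, candidate_loglik, temperature, rng=random):
    """Decide whether to move to a candidate point.

    Uphill moves are always accepted. A downhill move is accepted with
    probability exp(-delta / temperature), where delta is the drop in
    loglikelihood.
    """
    if candidate_loglik >= current_loglik:
        return True
    delta = current_loglik - candidate_loglik
    return rng.random() < math.exp(-delta / temperature)


def acceptance_rate(delta, temperature, trials=20000, seed=0):
    """Estimate how often a downhill move of size delta is accepted."""
    rng = random.Random(seed)
    accepted = sum(
        accept_move(0.0, -delta, temperature, rng) for _ in range(trials)
    )
    return accepted / trials


# The same downhill step is accepted often when hot and rarely when cold.
print(acceptance_rate(delta=1.0, temperature=100.0))
print(acceptance_rate(delta=1.0, temperature=1.0))
print(acceptance_rate(delta=1.0, temperature=0.1))
```

Lowering the temperature over successive cycles is what gradually turns the random search into an uphill-only search.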
