Home

Graphical Tools for Exploring and Analyzing Data From ARIMA Time

image

Contents

1. Oglar arl 6 For the quantities inside the substitute the observed values if the value has been observed ie if the subscript of Z or a on the right hand side is less than the forecast origin t substitute in the observed value If the quantity has not been observed because it is in the future then substitute in the forecast for this value 0 for a future a and a previously 26 computed l for a future value of Z Box and Jenkins 1969 show that forecast is optimal in the sense that it minimizes the mean squared error of the forecasts A forecast error for predicting Z with lead time I ie forecasting ahead time periods is denoted by e l and defined as the difference between an observed Z and its forecast counterpart Z I e l Zi Zi I This forecast error has variance Varle o2 1 yf ji 7 where the y coefficients are the coefficients in the random shock infinite MA form of the ARIMA model If the random shocks i e the a values are normally distributed and if we have an appropriate ARIMA model then our forecasts and the associated forecast errors are ap proximately normally distributed Then an approximate 95 prediction interval around any forecast will be Zi 1 a 1 968 1 where Seq y Varle U Both ZAO and Sj depend on the unknown model parameters With reasonably large samples however adequate approximate prediction intervals can be computed by substit
2. The variance of the MA q model is vo 02 Var Zi E Z2 1 02 02 3 5 Model autocorrelation function The model based or true autocorrelations Yk Cov Zt Zie E Z Zien k 1 2 cia yo War Z Var Z ee are the theoretical correlations between observations separated by k time periods This autocorrelation function where the correlation between observations separated by k time periods is a constant is defined only for stationary time series The true autocorrelation function depends on the underlying model and the parameters of the model The covari ances and variances needed to compute the ACF can be derived by standard expectation operations The formulas are simple for pure AR and pure MA models The formulas be come more complicated for mixed ARMA models Box and Jenkins 1969 or Wei 1989 for example give details 13 Example 3 4 Autocovariance and Autocorrelation Functions for the MA 2 Model The autocovariance function for an MA 2 model can be obtained from N E Z 2Z141 Ef a 1 201 2 ar Orar 201 1 at 1 E 0 42a7_ 9147 010pB af 1 E az 0102 A1 o4 Then the autocorrelation function is ei 0102 01 Min IA Using similar operations it is easy to show that se y 1 07 02 p2 and that in general for MA 2 pk Yk J0 0 for k gt 2 The calculations are similar for an MA q model with other values of q q Example 3 5 A
3. a eae pe ee en nonstationary examples Meee ee ee eS SS meneren a Se a ee ee ee Se ee The following commands illustrate identification and estimation forecasting runs for simulated nonstationary time series The models shown are not necessarily the best ones The read ts commands do not need to be reexecuted if the directory home wqmeeker stat451 Data has been attached in Splus simnsa d lt read ts file name simnsa dat path name home wqmeeker stat451 data title Simulated Time Series A series units Sales frequency 5 61 start year 21 start month 1 simnsb d lt read ts file name simnsb dat path name home wqmeeker stat451 data title Simulated Time Series B series units Sales frequency 5 start year 21 start month 1 simnsc d lt read ts file name simnsc dat path name home wqmeeker stat451 data title Simulated Time Series C series units Sales frequency 5 start year 21 start month 1 iden simnsa d iden simnsa d d 1 esti simnsa d model model pdq d 1 q 1 esti simnsa d model model pdq d 1 q 2 iden simnsb d iden simnsb d d 1 esti simnsb d model model pdq d 1 q 1 esti simnsb d model model pdq p 1 d 1 q 1 iden simnsc d iden simnsc d d 1 esti simnsc d model model pdq p 1 d 1 esti simnsc d model model pdq p 2 d 1 esti simnsc d model model pdq p 3 d 1 here is a nonseasonal example for which it is not
4. e Function multiple acf sim allows the user to assess the variability i e noise that one will expect to see in sample ACF functions The function allows the user to assess the effect that realization size will have on one s ability to correctly identify an ARMA model e Function multiple pi sim allows the user to assess the variability i e noise that one will expect to see in a sequence of computed prediction intervals The function allows the user to assess the effect of sample size and confidence level on the size and probability of correctness of a simulated sequence of prediction intervals e Function multiple probplot sim allows the user to assess the variability i e noise that one will expect to see in a sequence of probability plots The function allows the user to assess the effect of sample size has on one s ability to discriminate among distributional models 2 3 Creating a data structure S plus object to serve as input to the iden and esti functions Before using the iden and esti functions it is necessary to read data from an ASCII file The read ts function reads data title time and period information and puts all of the information into one S plus object making it easy to use the data analysis functions iden and esti This function can be used in the interactive mode or in the command mode Example 2 1 To read the savings rate data introduced in Example 1 1 using the interac tive mode give
5. peak load data load d lt read ts file name load dat path name home wqmeeker stat451 data title Monthly Peak Load for Iowa Electricity series units Millions of Kilowatt hours frequency 12 start year 1970 start month 1 iden load d iden load d D 1 iden load d D 1 d 1 esti load d model model pdq D 1 p 2 period 12 esti load d model model pdq D 1 p 2 Q 1 period 12 esti load d model airline model ozone d lt read ts file name ozone dat path name home wqmeeker stat451 data title Monthly Average of Hourly Readings of Ozone on Downtown Los Angeles series units pphm frequency 12 start year 1955 start month 1 identification iden ozone d iden ozone d D 1 iden ozone d d 1 D 1 iden ozone d gamma 0 iden ozone d D 1 gamma 0 estimation no transformation esti ozone d model model pdq D 1 q 1 Q 1 period 12 y range c 0 1 9 esti ozone d model model pdq D 1 p 3 Q 1 period 12 y range c 0 1 9 esti ozone d model model pdq D 1 p 1 q 1 Q 1 period 12 y range c 0 1 9 esti ozone d model airline model y range c 0 1 9 log transformation esti ozone d model model pdq D 1 q 1 Q 1 period 12 y range c 0 1 9 gamma 0 66 esti ozone d model model pdg D 1 p 3 Q 1 period 12 y range c 0 1 9 gamma 0 esti ozone d model model pdq D 1 p 1 q 1 Q 1 period 12 y range c 0 1 9 gamma 0 esti ozone d model airline model y range c 0 1 9 ga
6. 0 0 0 0 0 0 0 9 09 nacf 38 wt 1 1B 1 9B712 a_t plot true acfpacf model list ma c 1 0 0 0 0 0 0 0 0 0 0 0 9 09 nacf 38 71 H AR seasonal models 1 5B 1 9B 12 w_t a_t plot true acfpacf model list ar c 5 0 0 0 0 0 0 0 0 0 0 0 9 45 nacf 38 1 1B 1 9B712 w_t a_t plot true acfpacf model list ar c 1 0 0 0 0 0 0 0 0 0 0 0 9 09 nacf 38 H mixed ARMA seasonal models plot true acfpacf model list ar c 0 0 0 0 0 0 0 0 0 0 0 5 ma c 0 0 0 0 0 0 0 0 0 0 0 5 nacf 38 plot true acfpacf model list ar c 0 0 0 0 0 0 0 0 0 0 0 7 ma c 0 0 0 0 0 0 0 0 0 0 0 3 nacf 38 plot true acfpacf model list ar c 0 0 0 0 0 0 0 0 0 0 0 7 ma c 0 0 0 0 0 0 0 0 0 0 0 3 nacf 38 HHHHHHpsi coefficients psi model list ar c 0 0 0 0 0 0 0 0 0 0 0 5 ma c 0 0 0 0 0 0 0 0 0 0 0 5 number psi 38 some likelihood contour examples this function uses an Splus loop and as such takes a failrly large amount of time to run arima contour mjborsa parks d model model pdq D 1 Q 1 q 1 period 12 list 1 seq 99 0 length 15 list 2 seq 5 5 length 15 arima contour airline d airline model 72 References Becker R A Chambers J M and Wilks A R 1988 The New S Language A Pro gramming Environment for Data Analysis and Graphics Wadsworth and Brooks Box G E P and Jenkins G M 1976 Time Series Analysis Forec
7. 1 0 95 2 u S lt 1 o oO o 2 isd 2 0 5 10 15 20 Lag True PACF ma 1 0 95 i Lu z l c 2 L 1 1 rr 14 p n 2 E 2 0 5 10 15 20 Lag Figure 10 True ACF and PACF for MA 2 model with 6 1 02 95 True ACF ma 0 95 ar 0 95 W ae LA Sle aimee S o o 0 5 10 15 20 Lag True PACF ma 0 95 ar 0 95 o True PACF 0 0 1 0 Lag Figure 11 True ACF and PACF for ARMA 1 1 model with 1 95 01 95 17 gt The PACF function will die down sometimes in an oscillating manner e For ARMA p models gt The ACF function will die down sometimes in an oscillating manner after lag max 0 q p gt The PACF function will die down sometimes in an oscillating manner after lag As described in Section 4 estimates of the the ACF and PACF computed from data can be compared with the expected patterns of the true ACF and PACF functions and used to help suggest appropriate models for the data 4 Tentative Identification At the tentative identification stage we use sample autocorrelation function ACF and the sample partial autocorrelation function PACF The sample ACF and sample PACF can be thought of as estimates of similar theoretical or true ACF and PACF functions that are characteristic of some underlying ARMA model Recall from Sections 3 5 and 3 6 that the true ACF and true PACF can be computed as a function of a specifi
8. Z Zi 10 is a first order seasonal difference with period 12 as would be used for monthly data with 12 observations per year Rewriting 10 and using successive resubstitution i e using Wi 12 24 12 Zt 24 gives Z tW Zi 24 Wire Wi Z1 36 Wi 24 Wi 12 Wi and so on This is a kind of seasonal integration In general W 1 B Z is a Dth order seasonal difference with period S where D denotes the number of seasonal differences Example 8 5 Refer to Figure 18 This figure shows plots of the logarithms of the airline with no differencing of the time series and provides strong evidence of nonstationarity Figure 22 21 and 23 shows similar outputs from function iden with differencing schemes d 1 D 0 d 0 D 1 and d 1 D 1 respectively The commands used to make these figures were 32 International Airline Passengers w 1 B 1 log Thousands of Passengers N o z 8 N 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 time ACF PACF gt e Ww 2 LL g 8 g3 wo a 2 e 0 10 20 30 0 10 20 30 Lag Lag Figure 21 Output from function iden for the log of the airline data with differencing scheme d 1 D 0 International Airline Passengers w 1 B 12 1 log Thousands of Passengers N S z e 8 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 time ACF PACF 2 2 W O Le lt g 8 rar
9. e Setting up special commands or changing the way that commands perform e g many users like to have the rm remove command ask about each file one by one when you try to remove one or more files at a time e Change the prompt vincent to something more informative I like to see the machine name on my prompt To make things easier for statistics 451 students there is a special setup script that will help you set up your environment for our class assignments To execute this script do the following vincent add wqmeeker you may see a message here before getting the prompt again vincent stat451 course setup Appending add wqmeeker to end of environment Appending add splus to end of environment Appending add stat to end of environment home joeuser emacs No such file or directory Appending Smode commands to emacs nome joeuser emacs No such file or directory nome joeuser cshrc mine No such file or directory 39 Appending some commands to end of cshrc mine home joeuser cshrc mine No such file or directory One time setup of Splus directory for Statistics 451 You need not enter this command again Always run Splus for Stat 451 from this directory Otherwise you will not have access to Meeker s special functions The set up command described above creates a directory named stat451 and sets up some files to make it easy to run S plus on Vincent You should move or put any data files you w
10. optional at least one of p q P or Q must 49 be given to specify a model USAGE model pdq p 0 d 0 q 0 P 0 D 0 Q 0 period NULL OPTIONAL ARGUMENTS p The order of the nonseasonal AR terms in the model both the first and second order terms in the model are re quested then p 2 but if only first order term is request ed then p 1 The default value is p 0 d The number of nonseasonal differences carried out on the data If a value is not specified then the default value is d 0 q The order of the nonseasonal MA terms in the model both the first and second order terms in the model are re quested then q 2 but if only first order term is request ed then q 1 The default value is q 0 P The order of the seasonal AR terms in the model If both the first and second order terms in the model are request ed then P 2 but if only first order term is requested then P 1 The default value is P 0 D The number of seasonal differences carried out on the data If a value is not specified then the default value is D 0 Q The order of the seasonal MA terms in the model If both the first and second order terms in the model are request ed then Q 2 but if only first order term is requested then Q 1 The default value is Q 0 period The seasonal period VALUE The specified model structure is produced SEE ALSO arima mle esti model like tseries EXAMPLES ari model lt model pdq p 1 ar
11. 65 esti gasry d model model pdg p 4 y range c 45 65 esti gasry d model model pdq p 5 y range c 45 65 ar 4 or ar 5 looks good on the output gasrx model lt list ar opt c T T T T T ar c 1 89741535 1 11863605 0 02289849 0 14876710 null model lt list ar opt c T ar c 00001 null x model will give ccf without prewhitening prewhiten gasrx d gasry d null model now specify the ar 5 model including coefficients for prewhitening ccf estimate prewhiten gasrx d gasry d gasrx model generate a matrix of time series that have leading indicators in them grate lt ts c rep 0 6 gasr xymat 1 rep 0 21 grate xmat lt tsmatrix lag grate 3 lag grate 4 lag grate 5 lag grate 6 now estimate the transferfunction models 68 note that the forecasts do not account for the uncertainty in the future x values esti gasry d model model pdg p 3 xreg in lead matrix gasr xymat 1 lagvec c 3 4 5 6 y range c 45 65 esti gasry d model model pdq p 3 xreg in lead matrix gasr xymat 1 lagvec c 3 4 5 6 7 y range c 45 65 the following commands give iden output for the first half the second half and the middle half of the airline data total realization length is 144 iden data subset airline d end 72 iden data subset airline d first 73 iden data subset airline d first 37 end 108 The data subset function can be used similarly in the esti funct
12. The estimated standard errors are sometimes seriously overstated when applying Bartlett s formula to residual autocorrelations This is especially possible at the very short lags lags 1 2 and perhaps lag 3 Therefore we must be very careful in using the t like ratio espe cially those at the short lags Pankratz 1983 suggests using warning limits of 1 50 to indicate the possibility of a significant spike in the residual ACF Interpretation of sample ACF s is complicated because we are generally interpreting simultaneously a large number of p values As with the interpretation of ACF and PACF functions of data during the identification stage the t like ratios should be used only as guidelines for making decisions during the process of model building 22 6 2 Ljung Box test Another way to assess the residual ACF is to test the residual autocorrelations as a set rather than individually The Ljung Box approximate chi squared statistic provides such a test For k autocorrelations we can assess the reasonableness of the following joint null hypothesis Ho pi a pola px a 0 with the test statistic K Q n n 2 n k ARG k 1 where n is the number of observations used to estimate the model The statistic Q ap proximately follows a chi squared distribution with k p degrees of freedom where p is the number of parameters estimated in the ARIMA model If this statistic is larger than ES kp
13. aT s 8 lt T a 0 1 0 Lag Lag Figure 22 Output from function iden for the log of the airline data with differencing scheme d 0 D 1 33 International Airline Passengers w 1 B 1 1 B4 12 1 log Thousands of Passengers 9 So o Bos Ce 9 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 time ACF PACF o o W z 5 2 2 Ee ee 9 8 Bs Er eel ma TI o o 0 10 20 30 0 10 20 30 Lag Lag Figure 23 Output from function iden for the log of the airline data with differencing scheme d 1 D 1 gt iden airline d gamma 0 d 1 D 0 gt iden airline d gamma 0 d 0 D 1 gt iden airline d gamma 0 d 1 D 1 The results with d 1 D 0 and d 0 D 1 continue to indicate nonstationarity The results with d 1 D 1 appears to be stationary In Section 10 2 we will fit a seasonal ARMA model to the data differenced in this way g 9 Seasonal ARIMA Models and Properties 9 1 Seasonal ARIMA model The multiplicative period S seasonal ARIMA p d q x P D Q s model can be written as B p B 1 B 1 B Z 4 B 44 B az 11 where p B 1 dB 2B orth pB is the pth order nonseasonal autoregressive AR operator 0 B 1 01B 62B 6 B is the qth order nonseasonal moving average MA operator p B 1 sB 25B SpgB 34 is the Qth order period S season
14. an S list contains the time series data structure title and the units EXAMPLES xxx d lt read ts home wqmeeker stat451 data xxx dat data is in the file home wqmeeker stat451 data xxx dat savings rate d lt read ts file name home wqmeeker stat451 data savings_rate dat title US Savings Rate for 1955 1980 series units Percent of Disposable Income frequency 4 start year 1955 start month 1 airline d lt read ts file name home wqmeeker stat451 data airline dat title International Airline Passengers series units Thousands of Passengers frequency 12 start year 1949 start month 1 arma roots Compute the roots of a MA or AR defining polynomial and display graphically DESCRIPTION Given the coefficients of a MA or AR defining polynomial this function will plot the polynomial solve numerically for the roots and plot the roots on a unit circle so that the user can see if they are inside or outside USAGE arma roots coeficients fplot T nplot 400 REQUIRED ARGUMENTS coeficients A vector of coefficinets in order coefficient of 58 B 1 B 2 OPTIONAL ARGUMENTS nplot Number of points used to plot the polynomial function VALUE The roots of the polynomial are returned in a vector but invisibly EXAMPLES arma roots c 1 5 4 arma roots c 0 2 arma roots c 5 9 5 6 E Examples of S plus and PlusTS Commands for Time Se ries and Simulation to Illustrat
15. did the add splus before you went into emacs Now you should have S plus running in one side of the emacs window To get help on an S plus command including iden and esti you can use the emacs S plus help facility Just type C c C v and give the name of the object when prompted if you put the cursor on test for the name that you want that name will be the default Here are some emacs Smode commands that you should find to be useful TAB complete an object name RET send off a command to S plus C c C c Break executing function that is taking too long M p Back up one command so it can be changed and resubmitted M n Next command M gt Put the cursor on S plus prompt at the end of the window C c C d Edit an object 44 Here are some regular emacs commands that you should find to be useful C x C c Exit emacs C x C s Save the emacs buffer C x C w Write the emacs buffer prompts for a new file name C v Scroll down one page M v Scroll up one page C x b Change buffer C x C f Read a file into a buffer C x u undo an unwanted change C s incremental search forward for something C r incremental reverse search for something C Space Set mark C d Delete next character C k Kill ie delete and save in kill ring everything on a line to the right of cursor C w Kill everything between mark and cursor Acts like cut in Windows C y Bring back i e yank the last thing killed and put it where ever cursor is when used wi
16. find the roots of the polynomial 1 1 6B 4B use the command gt arma roots c 1 6 4 re im dist 1 0 5495 0 0 5495 2 4 5495 0 4 5495 gt The roots are displayed graphically in Figure 5 showing that this polynomial has two real roots one of inside and one outside of the unit circle Thus if this were an MA AR polynomial the model would be nonivertible nonstationary g Example 3 2 To find the roots of the polynomial 1 5B 1B use the command 10 Coefficients 1 6 0 4 Polynomial Function versus B Roots and the Unit Circle i o wo N 4 5 amp 7 S o A ak Z El XY wo am y 4 amp N 10 8 6 4 2 0 4 2 0 2 4 B Real Part Figure 5 Plot of polynomial 1 1 6B 4B corresponding roots gt arma roots c 5 1 re im dist 12 5 1 9365 3 1623 2 2 5 1 9365 3 1623 gt The roots are displayed graphically in Figure 6 showing that this polynomial has two imaginary roots both of which are outside of the unit circle Thus if this were an MA AR polynomial the model would be invertible stationary g Example 3 3 To find the roots of the polynomial 1 5B 9B 1B 5B use the command gt arma roots c 5 9 1 5 re im dist 1 0 1275 0 8875 0 8966 2 0 1275 0 8875 0 8966 3 1 8211 0 0000 1 8211 4 1 3662 0 0000 1 3662 gt The roots are displayed graphically in Figure 7 showing that this polynomial has two real roots
17. first versions of functions iden and esti and the documentation for these functions Some material from that document has been incorporated into this document Allaa Kamal Alshawa made helpful comments on an earlier version of this document 38 A Setup for Using S plus on Project Vincent A 1l General The use of S plus requires some basic familiarity with interactive computing how to log into the UNIX system what keys to use on the terminal to delete characters and lines and how to create and remove a directory copy or move a file etc Elementary training is generally needed to learn about how to create edit copy move and delete files Project Vincent workshops are offered regularly by the ISU Computation Center and there is also a self guided on line Vincent tutorial S plus provides an integrated environment for data analysis numerical calculations graphics and related computations The S plus language is the medium through which all user computations and programming take place It provides a way of interacting directly with S plus and also allows users to extend S plus A 2 Setting up your Vincent environment Project Vincent allows many options for customizing your working environment Some of these options are discussed in the Vincent workshops These include e Setting up a default printer to which all of your printing will go e Setting up a default bin where laser printer output from the main laser printer in Durham hall
18. necessarily the best ones The read ts commands do not need to be reexecuted if the directory home wqmeeker stat451 Data has been attached in Splus fa i a a a a a a a a a eh re ee Se ee ee ee eS e airline data SSS eS ee ee he ee ee SSS Se Se Se ee airline d lt read ts file name airline dat path name home wqmeeker stat451 data title International Airline Passengers time units Year 63 series units Thousands of Passengers frequency 12 start year 1949 start month 1 iden airline d iden airline d gamma 0 iden airline d gamma 3 we will follow Box and Jenkins and use the log transformation iden airline d gamma 0 D 1 iden airline d gamma 0 d 1 iden airline d gamma 0 d 1 D 1 simple AR 1 model esti airline d gamma 0 model model pdq p 1 y range c 100 1100 seasonal models esti airline d gamma 0 model model pdq d 1 D 1 q 1 period 12 esti airline d gamma 0 model model pdq d 1 D 1 Q 1 period 12 esti airline d gamma 0 model model pdq d 1 D 1 q 1 Q 1 period 12 y range c 100 1100 The following command fits the additive airline mode also used by Box and Jenkins Because of problem with Splus and constrained parameters Splus provides no covariance matrix esti airline d gamma 0 model add airline model try some aternative transformations esti airline d gamma 0 3 model model pdq d 1 D 1 q 1 Q 1 period 12 y range c 100 1100 esti
19. options button press the left button then move the cursor to printing and press the left button this should bring up the printing options window Click on the far left hand side of the of the command line Ipr h r and hit the backarrow command enough times to clear the command Then type lprsave In its place Then use the left mouse button to click on the following in sequence Apply Save Close This sets up a special command that will save plots in a file Save ps instead of sending them one by one to the laser printer You should have to do this only once for each directory in which you run splus When you have a plot on the motif window that you want to save all you need to do is to click on the graph window and then click on the print button When you are done with your S plus session return to the unix prompt and then request printing of the Save ps file note the capital s in Save ps to get your graphs You can do this as follows 42 gt qQ vincent lpr P du139_1j4 S1 G subsidy_wqmeeker Save ps The S1 note capital S asks for one sided printing better for graphs In some cases you may have defaults set up for one or more of these options so that they have to be specified only if you wish to change an option Check in your environment file You can use the same command to print files other than Save ps just by changing Save ps to the file that you want to print After the Save ps file has
20. ple is used to compute the interval and then one addition al observation is generated and plotted If the additional observation does not fall in the prodiction interval the 54 interval is marked in red USAGE multiple pi sim sample sizes c 30 120 conlevs c 0 8 0 95 mu 100 sigma 10 number simulations 100 scale 2 REQUIRED ARGUMENTS None OPTIONAL ARGUMENTS sample sizes A vector of sample sizes Length 2 is recom mended Default is sample sizes c 10 40 conlevs A vector of confidence levels Length 2 is recommend ed Default is conlevs c 0 8 0 95 mu Mean of the distribution from which samples for the pred iction interval and additional observations are taken De fault is 100 Sigma Standard deviation of the distribution from which sam ples for the prediction interval and additional observa tions are taken Default is 100 number simulations Number of confidence intervals to be sam pled Default is 100 scale Scale factor that determines the scale of the common y axis of the plots The default is 2 5 A larger number would allow more white space around edges A smaller number might allow some of the interval endpoints to be outside of the plots describe any side effects if they exist DETAILS If the lengths of sample sizes and conlevs are both 2 the output is in the form of a 2 x 2 factorial that shows the effect that sample size and confidence level have on the width and probability of c
21. series data and to generate short term fore casts A univariate ARIMA model is an algebraic statement describing how observations on a time series are statistically related to past observations and past residual error terms from the same time series Time series data refer to observations on a variable that occur in a time sequence usually the observations are equally spaced in time and this is required for the Box Jenkins methods described in this document ARIMA models are also useful for forecasting data series that contain seasonal or periodic variation The goal is to find a parsimonious ARIMA model with the smallest number of estimated parameters needed to fit adequately the patterns in the available data As shown in Figure 3 a stochastic process model is used to represent the data generating process of interest Data and knowledge of the process are used to identify a model Data is then used to fit the model to the data by estimating model parameters After the model has been checked it can be used for forecasting or other purposes e g explanation or control 1 3 Strategy for finding an adequate model Figure 4 outlines the general strategy for finding an adequate ARIMA model This strategy involves three steps tentative identification estimation and diagnostic checking Each of these steps will be explained and illustrated in the following sections of this document Tentative Identification Estim
22. simstal3 dat path name home wqmeeker stat451 data title Simulated Time Series 13 series units Pressure frequency 10 start year 1 start month 1 simstai5 d lt read ts file name simstai5 dat path name home wqmeeker stat451 data title Simulated Time Series 15 series units Pressure frequency 10 start year 1 start month 1 iden simsta5 d esti simsta5 d model model pdq p 1 esti simsta5 d model model pdq p 2 iden simsta6 d esti simsta6 d model model pdq p 2 esti simsta6 d model model pdq p 3 esti simsta6 d model model pdq p 1 q 1 iden simsta7 d esti simsta7 d model model pdq q 2 esti simsta7 d model model pdq p 2 iden simsta8 d esti simsta8 d model model pdq p 1 esti simsta8 d model model pdq q 1 iden simstal1 d esti simstal1 d model model pdq q 1 60 esti simstal1 d model model pdq q 2 iden simsta13 d esti simstal3 d model model pdq p 1 esti simsta13 d model model pdq p 2 iden simsta15 d esti simsta15 d model model pdq p 1 esti simstal5 d model model pdq p 2 spot d lt read ts file name spot dat path name home wqmeeker stat451 data title Wolfer Sunspots series units Number of Spots frequency 1 start year 1770 start month 1 iden spot d esti spot d model model pdq p 1 esti spot d model model pdq p 2 esti spot d model model pdq p 3 esti spot d model model pdq p 16
23. specified time series model USAGE multiple acf sim model list ar 0 9 sample size vec c 30 60 120 number simulations 4 true F REQUIRED ARGUMENTS 52 OPTIONAL ARGUMENTS model A list containing two components names ar and ma These components should be vectors containing coefficianes of autoregressive and moving average parts of an ARMA model sample size vec A vector of sample sizes Length 3 is recom mended Default is sample size vec c 30 60 120 number simulations Number of simulations Default is number simulations 4 true Not ready yet REFERENCES Pankrantz A Forecasting with Univariate Box Jenkins Models EXAMPLES pure ar multiple acf sim model list ar 9 multiple acf sim model list ar c 9 5 multiple acf sim model list ar c 1 5 6 multiple acf sim model list ar c 1 5 9 pure ma multiple acf sim model list ma 9 multiple acf sim model list ma c 9 7 arma multiple acf sim model list ar 9 ma 9 multiple acf sim model list ar 9 ma 9 multiple acf sim model list ar 9 ma 9 multiple acf sim model list ar c 9 5 ma c 1 1 multiple ci sim Simulate a large number of confidence intervals and display results graphically DESCRIPTION USAGE This function will allow the user to observe the effect that confidence level and sample size have on the results of repeated sampling and construction of confidence inter vals when sampli
24. stage we use the observed residuals a to test hypotheses about the independence of the random shocks In time series analysis the observed residuals are defined as the difference between Z and the one step ahead forecast for Z see Section 7 In order to check the assumption that the random shocks a are independent we compute a residual ACF from the time series of the observed residuals aj 42 The formula for calculating the ACF of the residuals is n k DE t a Gtk Pr ei 12 ja t 1 where a is the sample mean of the a values Values of p that are importantly different from 0 i e statistically significant indicate a possible deviation from the independence assumption To judge statistical significance we use Bartlett s approximate formula to estimate the standard errors of the residual autocorrelations The formula is 1 2 k 1 Spa 4 25 _ 7 n j l Having found the estimated standard errors of p we can test approximately the null hypothesis Ho pr a 0 for each residual autocorrelation coefficient To do this we use a standard t like ratio re px 0 55 a In large samples if the true pg a 0 then this t like ratio will follow approximately a standard normal distribution Thus with a large realization if Ho is true there is only about a 5 chance of having t outside the range 2 and values outside of this range will indicate non zero autocorrelation
25. the standard deviation of the random shocks Plot of the residual ACF Plots of the residuals versus time and residuals versus fitted values Yv Vv Vv y 7y Plot of the actual and fitted values versus time and also forecasted values versus time with their 95 prediction interval gt Normal probability plot of residuals e Function plot true acfpacf computes and plots the true ACF and PACF functions for a specified model and model parameters e Function arma roots will graphically display the roots of a specified AR or MA defining polynomial It will plot the polynomial and the roots of the polynomial relative to the unit circle This will allow easy assessment of stationarity for AR polynomials and invertibility for MA polynomials e Function model pdq allows the user to specify in a very simple form an ARMA ARIMA or SARIMA model to be used as input to function esti e Function new model like tseries allows the user to specify an ARMA ARIMA or SARIMA model in a form that is more general than that allowed by the simpler function model pdq It allows the inclusion of high order terms without all of the low order terms e Function add text is useful for adding comments or other text to a plot e Function iden sim is designed to give users experience in identifying ARMA models based on simulated data from a randomly simulated model It is also possible to specify a particular model from which data are to be simulated
26. the three stages tentative identification estimation and forecasting of time series analysis The software is based on S plus S plus is a software system that runs under the UNIX and MS DOS operating systems on a variety of hardware configurations but only on Vincent at ISU 2 1 S plus concepts S plus is an integrated computing environment for data manipulation calculation and graphical display Among other things it has e An effective data handling and storage facility e A suite of operators for calculations on arrays and in particular matrices e A large collection of intermediate tools for data analysis e Graphical facilities for data display either at a terminal or on hard copy e A well developed simple and effective programming language which includes condi tionals loops user defined functions and input and output facilities Much of the S plus system itself is written in the S plus language and advanced users can add their own S plus functions by writing in this language This document describes a collection of such functions for time series analysis One does not however have to understand the S plus language in order to do high level tasks like using the functions described in this document 2 2 New S plus functions for time series analysis The following is a brief description of each of the S PlusTS functions written in the Splus language for time series Some of these functions are for data analysis and
27. to generate forecasts for future quarters These data will be used as one of the examples in this document g wo 2 I A A AL VE 3 N MAL H p Io RAAT NN Io LV HH 8 Ui MAA Eee 5 f 7 Y 2 oe re ee E an JN ee a H U f yoy o l U Q u T T T T T i T 1955 1957 1959 1961 1963 1965 1967 1969 Year Figure 1 Savings rate as a percent of disposable personal income for the years 1955 to 1979 Thousands of Passengers 300 L A Ap M AN ta w P N NN V T T T T T T T T 1949 1951 1953 1955 1957 1959 1961 100 Year Figure 2 Number of international airline passengers from 1949 to 1960 Ps Description PG Process ae Realization Generates Data Zi Zos Za As eee lhe Pat ge Statistical Model Inferences l I i i 1 for the Process te te Parameters ie vee 2e Forecasts l 1 Stochastic Process Model Control Wissing en Figure 3 Schematic showing the relationship between a data generating process and the statistical model used to represent the process Example 1 2 Figure 2 shows seasonal monthly time series data on the number of interna tional airline passengers from 1949 to 1960 These data show upward trend and a strong seasonal pattern g 1 2 Basic ideas Box and Jenkins 1969 show how to use Auto Regressive Integrated Moving Average ARIMA models to describe model time
28. 2 model lt model pdq p 2 mal model lt model pdq q 1 ma2 model lt model pdq q 2 imaii model lt model pdq d 1 q 1 armaii model lt model pdq p 1 q 1 airline model lt model pdq period 12 d 1 D 1 q 1 Q 1 50 REFERENCES Meeker W Q 1978 Tseries A User Oriented Computer Program for Time Series Analysis The American Statisti cian Vol 32 No 3 model like tseries DESCRIPTION USAGE This function is used to define an ARIMA model to be used in the arima mle function It uses notation like TSERIES Although the arguments are all listed under optional at least one of p q P or Q must be given to specify a model model like tseries d NULL p NULL q NULL D NULL period NULL P NULL Q NULL start values NULL OPTIONAL ARGUMENTS d period The number of regular differences carried out on the data If a value is not specified then the default value is d 0 The order of the Regular AutoRegressive Terms in the model If both the first and second order terms in the model are requested then p c 1 2 but if only the second order term is requested then p 2 The default value is p 0 The order of the Regular Moving Average Terms in the model If both the first and second order terms in the model are requested then q c 1 2 but if only the second order term is requested then q 2 The default value is q 0 The number of seasonal differences carried out on the data If a value is not specifi
29. 958 1958 1900 voor tme Range Mean Plot ACF 88 0 10 20 zo 5 m Ee PACF En a ES T T T T T T T id as so s2 s4 se se eo ee o 10 20 zo m Lag Figure 18 Plot of the logarithms of the airline data along with a range mean plot and plots of sample ACF and PACF 29 e The relationship between level e g mean and spread e g standard deviation of data about the fitted model e The shape of any trend curve and e The shape of the distribution of residuals We must be careful that a transformation to fix one problem will not lead to other problems The range mean plot only looks at the relationship between variability and level For many time series it is not necessary or even desirable to transform so that the correlation between ranges and means is zero Generally it is good practice to compare final results of an analysis by trying different transformations 8 3 Mean stationarity and differencing ARMA models are most useful for predicting stationary time series These models can how ever be generalized to ARIMA models that have integrated random walk type behavior Thus ARMA models can be generalized to ARIMA models that are useful for describing certain kinds but not all kinds of nonstationary behavior In particular an integrated ARIMA model can be changed to a stationary ARMA model using a differencing opera tion Then we fit an ARMA model to the differenced data Differencing involves calculating successive changes in t
30. Graphical Tools for Exploring and Analyzing Data From ARIMA Time Series Models William Q Meeker Department of Statistics Iowa State University Ames IA 50011 January 13 2001 Abstract S plus is a highly interactive programming environment for data analysis and graphics The S plus also contains a high level language for specifying computations and writing functions This document describes S plus functions that integrate and extend existing S plus functions for graphical display and analysis of time series data These new functions allow the user to identify a model estimate the model parameters and forecast future values of a univariate time series by using ARIMA or Box Jenkins models Use of these functions provides rapid feedback about time series data and fitted models These functions make it easy to fit check and compare ARIMA models Key Words S plus ARIMA model nonstationary time series data identification estima tion and diagnostic checking 1 Introduction 1 1 Time series data Time series data are collected and analyzed in a wide variety of applications and for many different reasons One common application of time series analysis is for short term forecast ing Example 1 1 Figure 1 shows seasonally adjusted quarterly time series data on savings rate as a percent of disposable personal income for the years 1955 to 1979 as reported by the U S department of Commerce An economist might analyze these data in order
31. MA models Section 9 extends the ARIMA models to Seasonal SARIMA models Section 10 shows how to fit SARIMA time series models to seasonal data Each of these sections use examples to illustrate the use of the new S plus functions for time series analysis At the end of this document there are a number of appendices that contain additional useful information In particular Appendix A gives information on how to set up your Vincent account to use the special time series functions if you have been to a Statistics 451 workshop you have seen most this material before Appendix B provides detailed instructions on how to print graphics from S plus on Vincent Again this information was explained in the Splus workshop Appendix C explains the use of the Emacs Smode that allows the use of S plus from within Emacs a very powerful tool Appendix D provides a brief description and detailed documentation on the special S plus functions that have been developed especially for Statistics 451 e Appendix E contains a long list of Splus commands that have been used to analyze a large number of different example real and simulated time series These commands also appear at the end of the Statistics 451 file splusts examples g Students can use the commands in this file to experiment with the different functions 2 Computer Software The S plus functions described in this document were designed to graphically guide the user through
32. airline d gamma 1 model model pdq d 1 D 1 q 1 Q 1 period 12 y range c 100 1100 ohio data ohio d lt read ts file name ohio dat path name home wqmeeker stat451 data title Ohio Electric Power Consumption time units Year series units Billions of Kilowatt hours frequency 12 start year 1954 start month 1 iden ohio d will single differencing help iden ohio d d 1 we will now see that even multiple differencing will not help iden ohio d d 2 iden ohio d d 3 iden ohio d d 4 what we need is a seasonal diffeence iden ohio d D 1 iden ohio d d 1 D 1 now we will try to estimate some models esti ohio d model model pdq D 1 p 2 period 12 esti ohio d model model pdq D 1 p 2 Q 1 period 12 esti ohio d model model pdq D 1 p 2 q 1 Q 1 period 12 esti ohio d model model pdq D 1 p 1 q 1 Q 1 period 12 try the airline model esti ohio d model model pdq period 12 d 1 D 1 q 1 Q 1 now try a log transformation iden ohio d gamma 0 iden ohio d gamma 0 d 1 D 1 try the airline model with logs saving the output structure to recover the residuals and other stuff additional residual analysis ohio model6 out lt esti ohio d gamma 0 model model pdq period 12 d 1 D 1 q 1 Q 1 now can use ordinary Splus commands to look at residuals tsplot ohio model6 out resid investigate the acf spike at 10 show acf ohio model6 out resid 65
33. al autoregressive AR operator Og B 1 OsB O25B OggB is the Pth order period S seasonal moving average MA operator The variable a in equation 11 is again the random shock element assumed to be inde pendent over time and normally distributed with mean 0 and variance g2 The lower case letters p d q indicate the nonseasonal orders and the upper case letters P D Q indicate the seasonal orders of the model The general seasonal model can also be written an unscrambled form as Zt Pi Z1 1 b2Z4 2 Qpr Zt p Oai 030 2 Oj Qt ge At where the values of coefficients like and 6 need to be computed from the expansion including differencing operators of 11 For example if D 1 d 1 q 1 Q 1 and S 12 we have 1 B 1 BEZ 1 01 B 1 O12B ja leading to Zi Diar Zi 12 Zr 13 P1ar 1 O12012 010120413 at where it is understood that 1 dj2 1 f 1 and all other j 0 This model is a nonstationary all 13 roots of the AR defining polynomial for this model are on the unit circle seasonal model 9 2 Seasonal ARIMA Model Properties Because the SARIMA model as well as the model for differences of an SARIMA model the can be written in ARMA form the methods outlined in Section 3 can be used to compute the true properties of stationary SARMA models Example 9 1 If after differ
34. and two imaginary roots The imaginary roots are inside the unit circle Thus if this were an MA AR polynomial the model would be nonivertible nonstationary g 11 Function Figure 6 Plot of polynomial 1 5B 1B corresponding roots Function Figure 7 Plot of polynomial 1 1 2 0 6 0 8 0 4 40 20 60 80 100 Coefficients 0 5 0 1 Polynomial Function versus B Imaginary Part Roots and the Unit Circle a a Real Part Coefficients 0 5 0 9 0 1 0 5 Polynomial Function versus B Roots and the Unit Circle Ee ze Ak i S p Veg I 5 i a P ae Ti ST 4 2 0 2 1 0 1 B Real Part 5B 9B 1B3 12 HB corresponding roots 3 3 Model mean The mean of a stationary AR p model is derived as uz E Z EO iZ e ag at B 0o iB Z4 1 PpEl Zip Elu bo 91 bp E Zt 8 io Because E 6 a _ 0 for any k it is easy to show that this same expression also gives the mean of a stationary ARMA p q model 3 4 Model variance There is no simple general expression for the variance of a stationary ARMA p q model There are however simple formulas for the variance of the AR p model and the MA p model The variance of the AR p model is Ta yo o Va Z EZ _ PiP PpPp where Z Z E Zt and pi pp are autocorrelations see Section 3 5
35. ant to use into this directory You only need to give the stat451 course setup command once A 3 Getting into S plus Every time you login to Vincent and want to run S plus for Statistics 451 you will need to do the following vincent cd stat451 vincent splus session setup vincent Splus S PLUS Copyright c 1988 1993 Statistical Sciences Inc S Copyright AT amp T Version 3 1 Release 1 for DEC RISC ULTRIX 4 x 1993 Working data will be in Data gt Now you can give Splus commands for example gt motif this gets the graphics window which you may want to move to another place on the screen gt iden simsta5 d gt iden simsta6 d simsta5 d and simsta6 d are data sets in the stat451 directory that has been set up for you you can get hard copies of the plots as described below gt q0 this is how you get out of S PLUS vincent Note that you have to give the add wqmeeker before you can use any of the special stat 451 commands and you have to give the add splus before you can use splus 40 Here is some more information about the use of S plus In an interactive session with S plus when you type the expressions S plus does the computations A simple example is gt 3 11 5 2 3 1 41 4 Alternatively you can assign the output of an expression to an object with a name that you specify as in gt my number lt 3 11 5 2 3 gt my number 1 41 4 gt When the value of an ex
36. art year and start month specifying the seasonality are given in a manner that is analogous to monthly data q Example 2 3 To read the airline passenger monthly data introduced in Example 1 2 use the following command airline d lt read ts file name home wqmeeker stat451 data airline dat title International Airline Passengers series units Thousands of Passengers frequency 12 start year 1949 start month 1 time units Year This command creates an S plus object airline d g Example 2 4 To read the AT amp T common stock closing prices for the 52 weeks of 1979 use the command att stock d lt read ts file name home wqmeeker stat451 data att_stock dat title 1979 Weekly Closing Price of ATT Common Stock series units Dollars frequency 1 start year 1 start month 1 time units Week This command creates an S plus object att stock d Note that these data have no natural seasonality Thus the inputs for frequency start year start month are just given the value 1 g 3 ARMA Models and Properties A time series model provides a convenient hopefully simple mathematical probabilistic description of a process of interest In this section we will describe the class of models known as AutoRegressive Moving Average or ARMA models This class of models is par ticularly useful for describing and short term forecasting of stationary time series processes The ARMA class of models also provides a useful starting
37. asting and Control 2nd ed San Francisco Holden Day Brown R G 1962 Smoothing Forecasting and Prediction of Discrete Time Series Prentice Hall New Jersey Chatfield C 1989 The Analysis of Time Series An Introduction London Chapman and Hall Jenkins G M 1979 Practical Experiences with Modeling and Forecasting Time Series Gwilym Jenkins amp Partners Ltd Pankratz A 1983 Forecasting with Univariate Box Jenkins Models Concept and Cases John Wiley and Sons Inc Spector P 1994 An Introduction to S and S plus Belmont CA Duxbury Press Statistical Sciences 1993 S plus User s Manual Statistical Sciences 1993 S plus Reference Manual Venables W 1992 Notes on S A Programming Environment for Data Analysis and Graphics Dept of Statistics The University of Adelaide Wei W S 1989 Time Series Analysis Redwood City CA Addison Wesley 73
38. ation Diagnostic Checking Use the Model Time Series Plot Range Mean Plot ACF and PACF Least Squares or Maximum Likelihood Residual Analysis and Forecasts Forecasting Explanation Control Figure 4 Flow diagram of iterative model building steps 1 4 This Overview document is organized as follows Section 2 briefly outlines the character of the S plus software that we use as a basis for the graphically oriented analyses described in this document Section 3 introduces the basic ARMA model and some of its properties including a model s true autocorrelation ACF and partial autocorrelation functions PACF Section 4 introduces the basic ideas behind tentative model identification showing how to compute and interpret the sample autocorrelation ACF and sample partial autocorrelation functions PACF Section 5 explains the important practical ideas behind the estimation of ARMA model parameters Section 6 describes the use of diagnostic checking methods for detecting possible problems in a fitted model As in regression modeling these methods center on residual analysis Section 7 explains how forecasts and forecast prediction intervals can be generated for ARMA models Section 8 indicated how the ARMA model can be used indirectly to model certain kinds of nonstationary models by using transformations e g taking logs and differ encing of the original time series leading to ARI
39. been printed you should delete it so that you do not print the same graphs again next time Do this by giving the following command rm Save ps C Emacs Smode Emacs is a powerful editor An interface known as a Smode has been written to allow S plus to run from within emacs You do not have to use S plus inside emacs but experienced users find that such use has important advantages over the standard mode of operation Once you learn a few simple commands emacs becomes very easy to use When you type regular text to emacs the text goes into the file that you are editing When you are on a workstation the mouse is the easiest way to move the cursor There are special commands to delete text move text read in files write files etc Commands are sent with control characters using the control key and meta characters using the ALT key or the escape key The ALT key works like the shift or control key e g to get meta C or M c hold down ALT and touch the C key The ESC key works differently To get meta C or M c touch ESC and then touch the C key Emacs has many commands but you only need a few of them to operate effectively This section assumes that you have had some experience with emacs and understand a few of the basic emacs commands Running S plus from with emacs has the following advantages e You can easily edit and resubmit previous commands e You can scroll back in the S plus buffer to see everything t
40. clear if differencing should be used or not savings rate d lt read ts file name savings_rate d path name home wqmeeker stat451 data title US Savings Rate for 1955 1980 series units Percent of Disposable Income frequency 4 start year 1955 start month 1 iden savings rate d esti savings rate d model model pdq p 1 esti savings rate d model model pdq p 1 q 2 62 esti savings rate d model new model like tseries p 1 q 2 esti savings rate d model model pdq p 2 esti savings rate d model model pdq p 3 iden savings rate d d 1 esti savings rate d model model pdq p 1 d 1 q 1 esti savings rate d model model pdq p 3 d 1 q 1 set up a bunch of models so that we do not have to type the long strings for the more complicated ones ari model lt model pdq p 1 ar2 model lt model pdq p 2 mai model lt model pdq q 1 ma2 model lt model pdq q 2 ima11 model lt model pdq d 1 q 1 arma11 model lt model pdq p 1 q 1 airline model lt model pdg period 12 d 1 D 1 q 1 Q 1 add airline model lt new model like tseries period 12 d 1 D 1 q c 1 12 13 sub1 airline model lt model pdq period 12 d 1 D 1 Q 1 sub2 airline model lt model pdq period 12 d 1 D 1 q 1 SSS SS a SS a a a a Se a SS Se Se eS eS ee eS SS ee ee The following commands illustrate identification and estimation forecasting runs for several seasonal time series The models shown are not
41. del to the d 1 D 1 differenced log airline data The commands used to fit this model were gt airline model lt model pdq d 1 D 1 q 1 Q 1 period 12 gt esti airline d gamma 0 model airline model In this case because the model is a little more complicated we have created a separate S plus object to contain the model and this object is specified directly to the esti command All of the diagnostic checks look good in this example suggesting that this model will provide adequate forecasts for future values of the time series as long as there are no major changes in the process g 10 3 Forecasts for SARIMA models After expanding an SARMA model into its unscrambled form see Section 9 1 one can apply directly the forecasting methods given in Section 7 to compute forecasts forecast standard errors and prediction intervals Example 10 3 Figure 26 shows forecasts and prediction intervals for the airline model fitted to the airline passenger data q 37 International Airline Passengers ponent 1 ndif 1 ma 1 Component 2 period 12 nde 1 ma 1 on we log Thousands of Passengers Actual Values Fitted Values Future Values 95 Prediction Interval Index Normal Probability Plot Residuals Figure 26 Second part of the output for the fitting of the SARIMA 0 1 1 x 0 1 1 model to the d 1 D 1 differenced log airline data Acknowledgements Dilek Tali wrote the
42. e Statistical Concepts Se a EEEE stationary examples Ree eee ee ee ee a ee SS a eS eS a a ee The following commands illustrate reading identification and estimation forecasting runs for simulated stationary time series The models shown are not necessarily the best ones The read ts commands do not need to be reexecuted if the directory home wqmeeker stat451 Data has been attached in Splus options echo T pscript simsta5 d lt read ts file name simsta5 dat path name home wqmeeker stat451 data title Simulated Time Series 5 series units Sales frequency 7 start year 1 start month 1 simsta6 d lt read ts file name simsta6 dat path name home wqmeeker stat451 data title Simulated Time Series 6 series units Sales frequency 5 start year 21 start month 1 simsta7 d lt read ts file name simsta7 dat path name home wqmeeker stat451 data title Simulated Time Series 7 59 series units Deviation frequency 5 start year 21 start month 1 simsta8 d lt read ts file name simsta8 dat path name home wqmeeker stat451 data title Simulated Time Series 8 series units decibles frequency 10 start year 1 start month 1 simstali d lt read ts file name simstal1 dat path name home wqmeeker stat451 data title Simulated Time Series 11 series units Deviation frequency 10 start year 1 start month 1 simsta13 d lt read ts file name
43. ed model and its parameters At the beginning of the identification stage the model is unknown so we tentatively select one or more ARMA models by looking at graphs of the sample ACF and PACF computed from the available data Then we choose a tentative model whose associated true ACF and PACF look like the sample ACF and PACF calculated from the data To choose a final model we proceed to the estimation and checking stages returning to the identification stage if the tentative model proves to be inadequate 4 1 Sample autocorrelation function The sample ACF is computed from a time series realization Z1 Z2 Zn and gives the observed correlation between pairs of observations Z Zt separated by various time spans k 1 2 3 Each estimated autocorrelation coefficient p is an estimate of the corresponding parameter pz The formula for calculating the sample ACF is n k DS Zi DZ Z ns Dz t l One can begin to make a judgment about what ARIMA model s might fit the data by examining the patterns in the estimated ACF The iden command automatically plots the specified time series and provides plots of the sample ACF and PACF 18 US Savings Rate for 1955 1980 w Percent of Disposable Income 1960 Range Mean Plot ACF Figure 12 Plot of savings rate data along with a range mean plot and plots of sample ACF and PACF 4 2 Sample partial autocorrelatio
44. ed then the default value is D 0 The seasonal period The order of the Seasonal AutoRegressive Terms in the model If both first and second order terms in the model are requested then P c 1 2 but if only the second order term is requested then P 2 The default value is P 0 The order of the Seasonal Moving Average Terms in the model If both first and second order terms in the model 51 are requested then Q c 1 2 but if only the second order term is requested then Q 2 The default value is Q 0 VALUE The specified model structure is produced SEE ALSO model pdq arima mle esti EXAMPLES arl model lt model like tseries p 1 ar2 model lt model like tseries p c 1 2 mai model lt model like tseries q 1 ma2 model lt model like tseries q c 1 2 armait model lt model like tseries p 1 q 1 airline model lt model like tseries period 12 d 1 D 1 q 1 Q 1 add airline model lt model like tseries period 12 d 1 D 1 q c 1 12 13 REFERENCES Meeker W Q 1978 Tseries A User Oriented Computer Program for Time Series Analysis The American Statisti cian Vol 32 No 3 multiple acf sim Simulate a number of realizations with different sample sizes from a specified time series model and display the sample ACF functions graphically DESCRIPTION This function will allow the user to observe the effect that sample size has on the results of sample ACF func tions from simulated realizations from a
45. el like tseries EXAMPLES esti airline d gamma 0 m 2 model airline model DESCRIPTION This function produces a variety of graphical devices to help to identify a tentative ARIMA model The plots pro 47 duced are plot of the original or transformed data a range mean plot of the transformed data and plots of the ACF and PACF USAGE iden data d seasonal tsp data d time series 3 gamma 1 m 0 d 0 D 0 column 1 lag max 38 period 10 seasonal label F REQUIRED ARGUMENTS data d The output from the read ts function OPTIONAL ARGUMENTS seasonal The number of observations in a period for the time series By default this value is automatically taken from the time series data set gamma The transformation power parameter If no transforma tion is desired then gamma 1 which is the default value If a log transformation is needed then use gamma 0 all other transformations are in the form of z m gamma m The constant that is added to the response variable before the data is transformed The default value is m 0 d The number of regular differences carried out on the data If a value is not specified then the default value is d 0 D The number of seasonal differences carried out on the data If a value is not specified then the default value is D 0 column Specifies the number of observations in the interval for the range mean plot The default is period 10 lag max The maximum number of lags at
46. encing a seasonal model is W Orar1 O12012 O10 1204 13 Qt the true ACF and PACF are easy to derive Plots and tabular values of the true ACF and PACF functions can be obtained using the plot true acfpacf command For example if 0i 3 and O1 te then gt plot true acfpacf model list ma c 3 0 0 0 0 0 0 0 0 0 0 0 7 21 The output from this command is shown in Figure 24 q 10 Analysis of Seasonal Time Series Data Fitting SARIMA models to seasonal data is similar to fitting ARMA and ARIMA models to nonseasonal data except that there are more alternative models from which to choose and thus the process is somewhat more complicated 35 True ACF ma 0 3 0 0 0 0 0 0 0 0 0 0 0 0 7 0 21 u G o L L oO o 2 E 2 0 5 10 15 20 Lag True PACF ma 0 3 0 0 0 0 0 0 0 0 0 0 0 0 7 0 21 2 W O o L o g l 2 2 Lag Figure 24 True ACF and PACF for a multiplicative seasonal MA model with 01 3 and Oy 7 10 1 Identification of seasonal models As described in Section 4 for ARMA models identification of SARIMA models begins by looking at the ACF and PACF functions As illustrated in Section 8 1 it may be necessary to transform e g take logs the realization first Then as described and illustrated in Sections 8 5 and 8 5 it may be necessary to use differencing to make a time series approx imately stationary in the mean At this point
47. er sink When running inside Emacs there is no need to use the sink command One can save the entire current buffer for later editing by using the C x C s Emacs command e To create an S plus data object from data available in one of your directories use the scan for example Al gt data lt matrix scan data file ncol 5 byrow T reads in a matrix with 5 columns e To see a list of S plus objects currently in you Data directory use the command objects To remove objects permanently use the rm function for example gt rm x y data e When S plus looks for an object it searches through a sequence of directories known as the search list Usually the first entry in the search list is the Data subdirectory of your current S plus working directory The names of the directories currently on the search list can be seen by executing the function gt search B Instructions for Printing Graphics with S plus In order to get hard copy of splus graphics from the motif window you can do the follow ing Get into S plus and give the motif command After the motif window comes up put the cursor on options and press the left button Then move the cursor to color scheme and press the the left button This should bring up the color scheme window Then use the left mouse button to click on the following in sequence Cyan to Red Apply Save Close This sets up your color system Then put the cursor on the
48. hat you have done in the session unless you use editing commands to get rid of some of the history you can use the sequence C x h C w to clear the buffer of all history e The emacs buffer in which you are running S plus save all of your output You can then write the buffer to a file using C x C w or C x C s and then edit the results to be suitable for printing or for input into a word processor You do not have to use the sink file command e The emacs S plus help C c C v is easier to use To use Splus within emacs do the following 1 Give the command vincent emacs workshop q amp 43 This will bring up an emacs window with a file containing run s script file that contains a collection of examples without having to type these commands you can use the X windows cut and paste facility to pick up commands from these examples and deliver then directly to S plus 2 With the cursor in the emacs window give the command C x 2 while pressing control press x then release the control and press 2 This will split the window into two buffers 3 Now we will fire up Splus in one of the windows choose the one you want with the cursor Give the command vincent M x S while holding down the ALT key press x once then press S then press Return Emacs will ask you which directory and give you a default Just press Return to get the default it is what you want If it does not work make sure that you
49. he values of a data series 8 4 Regular differencing To difference a data series we define a new variable W which is the change in Z from one time period to the next i e Wi 1 B Zi 4 2 1 A E dd 9 This working series W is called the first difference of Z We can also look at differencing from the other side In particular rewriting 9 and using successive resubstitution i e using Wii Zt 1 Zt 2 gives ZA Zia Wi Zia Wi 1 Wi Zi 3 Wia Wi 1 Wie Zia Wiz Wia Wii Wi and so on This shows why we would say that a model with a 1 B differencing term is integrated If the first differences do not have a constant mean we might try a new W which will be the second differences of Z i e Wi Zi Zi 44 1 Ae Zt 2Zt1 Z2 t 3 4 n 30 1979 Weekly Closing Price of ATT Common Stock we Dollars 52 54 56 58 60 Range Mean Plot ACF Figure 19 Output from function iden for the weekly closing prices for 1979 AT amp T common stock Using the back shift operator as shorthand 1 B is the differencing operator since 1 B Z Z Z1 Then in general W 1 BZ is a dth order regular difference That is d denotes the number of nonseasonal differences For most practical applications d 0 or 1 Sometimes d 2 is used but values of d gt 2 are rarely usef
50. ime causing the coefficients e g bj and 81 in the assumed model change in value If this happens forecasts based on a model fitted to the entire data set are less accurate than they could be One way to check a model for this problem is to fit the chosen model to subsets of the available data e g divide the realization into two parts to see if the coefficients change importantly 23 US Savings Rate for 1955 1980 Model Component 1 ar posable Income Mt 1 ar 1 on w Percent of Residuals vs Time Residuals vs Fitted Values Residual ACF ACF Fitted Values Figure 15 First part of the esti output for the AR 1 model and the savings rate data e The forecasts computed from a model can themselves be useful as a model checking diagnostic Forecasts that seem unreasonable relative to past experience and expert judgment should suggest doubt about the adequacy of the assumed model Example 6 1 Continuing with Example 4 1 for the savings rate data the command gt esti savings rate d model model pdq p 1 does estimation provides output for diagnostic checking and provides a plot of the fitted data and forecasts all using the AR 1 model suggested by Figure 12 The graphical output from this command is shown in Figures 15 and 16 Abbreviated tabular output from this command is ARIMA estimation results Series y trans Model Component 1 ar 1 AICc 217 3166 2 Log Likelih
51. ion If you want to create a new data structure say containing the second half of the realization use airline2 d lt data subset airline d first 73 The following command does a probability plot simulation run it several times what do you notice multiple probplot sim normal data is the default distribution multiple probplot sim distribution exponential multiple probplot sim distribution uniform multiple probplot sim distribution cauchy 69 The following command does a confidence interval simulation Execute the command several times what do you notice You might want to stop and restart Splus after running this function if things slow down for you multiple ci sim The following commands do sample ACF plot simulations run each one several times what do you notice pure ar multiple acf sim model list ar 9 multiple acf sim model list ar 9 multiple acf sim model list ar c 9 5 multiple acf sim model list ar c 1 5 6 multiple acf sim model list ar c 1 5 9 pure ma multiple acf sim model list ma 9 multiple acf sim model list ma c 9 7 arma multiple acf sim model list ar 9 ma 9 multiple acf sim model list ar 9 ma 9 multiple acf sim model list ar 9 ma 9 multiple acf sim model list ar c 9 5 ma c 1 1 the following command will choose a model and parameters simulate a time series and provide identification output realizati
52. lp ver ify and fit an ARIMA model to time series data USAGE esti data d gamma 1 m 0 seasonal tsp data d time series 3 model 46 gof lag 10 lag max 38 number forecasts 24 pred level 0 95 seasonal label F REQUIRED ARGUMENTS data d The output from the read ts function model A list specifying an ARIMA model Same as input to arima mle See functions model pdq and model like tseries OPTIONAL ARGUMENTS gamma The transformation power parameter If no transforma tion is desired then gamma 1 which is the default value If a log transformation is needed then use gamma 0 all other transformations are in the form of ztm gamma m The constant that is added to the response variable before the data is transformed The default value is m 0 seasonal The number of observations in a period for the time series By default this value is automatically taken from the time series data set lag max The maximum number of lags at which to estimate the autocovariance The default value is lag max 38 number forecasts Specifies the number of observations back from the end of the data set that the forecasts are to be gin Default value is number forecasts 24 pred level Specifies the prediction confidence level Default value is pred level 0 95 VALUE This function produces a table containing some statistics and series of plots SEE ALSO arima mle arima diag arima forecast acf my acf plot model pdq mod
53. mma 0 read in the complete ozone data including dummy variables ozone xmat lt matrix scan home wqmeeker stat451 data ozoregr data ncol 4 byrow T get the regression x matrix corresponding the the box tiao intervention model ozo xreg lt cbind ozone xmat 2 my cumsum ozone xmat 3 12 my cumsum ozone xmat 4 12 no transformation esti ozone d model model pdq D 1 q 1 Q 1 period 12 y range c 0 1 9 xreg in ozo xreg log transformation esti ozone d model model pdq D 1 q 1 Q 1 period 12 y range c 0 1 9 xreg in ozo xreg gamma 0 gasr xymat lt matrix scan home wqmeeker stat451 data gasrxy data ncol 2 byrow T gasrx d lt read ts file name gasrx dat path name home wqmeeker stat451 data title Coded Gas Rate input series units Std Cu ft sec frequency 1 start year 1 start month 1 gasry d lt read ts file name gasry dat path name home wqmeeker stat451 data title C02 output series units Percent C02 frequency 1 67 start year 1 start month 1 univariate identification iden gasrx d iden gasry d univariate estimation esti gasrx d model model pdq p 3 y range c 4 5 4 5 esti gasrx d model model pdg p 4 y range c 4 5 4 5 esti gasrx d model model pdq p 5 y range c 4 5 4 5 ar 4 looks good on the input esti gasry d model model pdg p 2 y range c 45 65 esti gasry d model model pdg p 3 y range c 45
54. model building Others provide properties of specified models There are also S PlusTS functions that use simulation to provide experience in model building and interpretation of graphical output Definition of terms instructions for use and examples are given in the following sections Detailed documentation is contained in Appendix D e Function read ts is used to read a time series into S plus and construct a data structure containing information about the time series This data set is then used as input to the iden and esti functions e Function iden provides graphical and tabular information for tentative identifica tion of ARMA ARIMA and SARIMA models For a particular data set specified transformation default is no transformation and amount of differencing default is no differencing the function provides the following gt Plot of the original data or transformed data p gt If there is no differencing range mean plot of the transformed data which is used to check for nonstationary variance gt Plots of the sample ACF and PACF of the possibly transformed and differenced time series e Function esti for a specified data set and model estimates the parameters of the specified ARMA ARIMA or SARIMA model and provides diagnostic checking output and forecasts in graphical form Graphical outputs and statistics given are Parameter estimates with their approximate standard errors and t like ratios Estimated variance and
55. n function The sample PACF bu 2 gives the correlation between ordered pairs Zi Zt sep arated by various time spans k 1 2 3 with the effects of intervening observations Zt 1 Z442 Zt k 1 removed The sample PACF function can be computed as a func tion of the sample ACF The formula for calculating the sample PACF is bu an k 1 Pk So rn Pk j kk s k 2 3 4 k 1 pe 5 Pk 15j pj j l where bkj Pk 1 5 PeePe ik j K 3 4 5 j 1 2 k 1 This method of computing the sample PACF is based on the solution of a set of equations known as the Yule Walker equations that give 1 p2 pk aS a function of 1 d2 bp Each estimated partial autocorrelation coefficient dz is an estimate of the corresponding model based true PACF jp computed as shown in 3 Example 4 1 Figure 12 shows iden output for the savings rate data along with a range mean plot that will be explained later and plots of sample ACF and PACF The command used to generate this output was 19 Wolfer Sunspots w Number of Spots 100 50 Range Mean Plot ACF Figure 13 Plot of sunspot data along with a range mean plot and plots of sample ACF and PACF gt iden savings rate d The sample ACF and PACF plots show that the ACF dies down and that the PACF cuts off after one lag This matches the pattern of an AR 1 model and suggests this m
56. n outlier for all fitted models It useful perhaps even important to at some point to repeat the analysis with the final model adjusting this outlier to see if such an adjustment has an important effect on the final conclusion s As seen in Figure 16 the forecasts always lag the actual values about 1 time period This suggests that the model might be improved The residual ACF function has some large spikes a low lags indicating that there is some evidence that the AR 1 model does not adequately explain the correlational structure This suggests adding another term or terms to the model One might for example try an ARMA 1 1 model next Except for the outlier the normal probability plot indicates that the residuals seem to follow approximately a normal distribution 7 Forecasts for ARMA models An important application in time series analysis is to forecast the future values of the given time series A forecast is characterized by its origin and lead time The origin is the time from which the forecast is made usually the last observation in a realization and the lead time is the number of steps ahead that the series is forecast Thus the forecast of the future observation Zj made from origin t going ahead time periods is denoted by Zil Starting with the unscrambled ARMA model in 2 an intuitively reasonable forecast can be computed as Zil 0o o1 Ze 141 b2 Ze 241 plZt pril 5 Orla 92 ar 241
57. ng is from the same population multiple ci sim sample sizes c 10 40 conlevs c 0 8 0 95 mu 100 sigma 10 53 number simulations 100 scale 2 5 REQUIRED ARGUMENTS None OPTIONAL ARGUMENTS sample sizes A vector of sample sizes Length 2 is recom mended Default is sample sizes c 10 40 conlevs A vector of confidence levels Length 2 is recommend ed Default is conlevs c 0 8 0 95 mu Mean of the distribution from which samples are taken sigma Standard deviation of the distribution from which sam ples are taken number simulations Number of confidence intervals to be sam pled DETAILS If the lengths of sample sizes and conlevs are both 2 the output is a 2 x 2 factorial that shows the effect that sample size and confidence level have on the width and probability correctness of a sequence of confidence inter vals Each time the function is executed a different set of intervals will be generated EXAMPLES multiple ci sim multiple ci sim sample sizes c 10 40 conlevs c 0 95 0 99 multiple ci sim sample sizes c 30 120 conlevs c 0 95 0 99 multiple pi sim Simulate a large number of prediction intervals and display results graphically DESCRIPTION This function will allow the user to observe the effect that confidence level and sample size have on the results of repeated sampling and construction of prediction inter vals when sampling is from the same population The sam
58. odel as a good starting point for modeling the savings rate data g 4 3 Understanding the sample autocorrelation function As explained in Section 4 1 the sample ACF function plot show the correlation between observations separated by k 1 2 time periods For another example Figure 13 gives the iden output from the Wolfer sunspot time series In order to better understand an ACF or to investigate the reasons for particular spikes in a sample ACF it is useful to obtain a visualization of the individual correlations in an ACF Figure 14 was obtained by using the command gt show acf spot d 5 Estimation Computer programs are generally needed to obtain estimates of the parameters of the ARMA model chosen at the identification stage Estimation output can be used to compute quantities that can provide warning signals about the adequacy of our model Also at this stage the results may also indicate how a model could be improved and this leads back to the identification stage 20 Lag 1 Lag 2 Lag 3 Lag 4 Figure 14 Plot of the show acf command showing graphically the correlation between sunspot observations separated by k 1 2 12 years The basic idea behind maximum likelihood ML estimation is to find for a given set of data and specified model the combination of model parameters that will ma
59. on plot range mean plot plots of the sample acf and pacf The user will have an option to ask for another realization of the same size If a realization size is not provided in the initial call the user is prompted to input one 70 iden sim Instead of getting a random model you can also request that the simulation proceed with a specified model as in iden sim model list ar 9 iden sim model list ar 9 iden sim model list ar 9 ma 91 true acf pacf computations The following commands no need to run more than once you would always get the same answer nonseasonal models pure ar plot true acfpacf model list ar 9 plot true acfpacf model list ar 9 plot true acfpacf model list ar c 9 6 plot true acfpacf model list ar c 1 5 6 plot true acfpacf model list ar c 1 5 6 pure ma models plot true acfpacf model list ma 9 plot true acfpacf model list ma c 9 7 arma mixed models plot true acfpacf model list ar 9 ma 9 plot true acfpacf model list ar 9 ma 9 plot true acfpacf model list ar c 9 5 ma c 1 1 HHH HMA seasonal models wt 1 5B 1 9B712 a_t plot true acfpacf model list ma c 5 0 0 0 0 0 0 0 0 0 0 0 9 45 nacf 38 wt 1 1B 1 9B712 a_t plot true acfpacf model list ma c 1 0 0 0 0 0 0 0 0 0 0 0 9 09 nacf 38 wt 1 1B 1 9B712 a_t plot true acfpacf model list ma c 1 0 0 0 0
60. one can use the ACF and PACF to help identify a tentative model for the transformed differenced series Example 10 1 In Example 8 5 it was suggested that the transformed differenced time series W 1 B 1 B log Airline Passengers shown in Figure 23 appeared to be stationary Looking at the importantly large spikes in the sample ACF and sample PACF functions suggests a model Wi 0iat 1 O12012 O19 1244 13 ar 10 2 Estimation and diagnostic checking for SARIMA models As with fitting ARMA and ARIMA models fitting SARIMA for a specified model is straightforward as long as the model is not too elaborate and was suggested by the identifi cation tools having too many or unnecessary parameters in a model can lead to estimation 36 International Airline Passengers ponent 2 period 12 ndifi 1 Residuals vs Time Residuals vs Fitted Values Residual ACF ACF Fitted Values Figure 25 First part of the output for the fitting of the SARIMA O 1 1 x 0 1 1 model to the d 1 D 1 differenced log airline data difficulties The diagnostic checking for SARIMA models is also similar except with sea sonal models it is necessary to look carefully at the ACF function as lags that are multiples of S to look for evidence of model inadequacy Example 10 2 Figures 25 and 25 show the graphical output from the esti command when used to fit the SARIMA O 1 1 x 0 1 1 mo
61. ood 215 3166 S 0 6881749 Parameter Estimation Results MLE se t ratio 95 lower 95 upper ar 1 0 8146081 0 05715026 14 25379 0 7025935 0 9266226 Constant term 6 167464 Standard error 0 06780789 24 US Savings Rate for 1955 1980 Model Component 1 ar an w Percent of Disposable Income Actual Values Fitted Values Future Values 95 Prediction Interval nt of Disposable Income 6 8 Index Normal Probability Plot Residuals Figure 16 Second part of the esti output for the AR 1 model and the savings rate data t ratio 90 95497 Ljung Box Statistics dof Ljung Box p value 5 10 11577 0 07202058 10 83296 0 09367829 11 26102 0 12763162 11 35323 0 18247632 12 48835 0 18715667 Oo ON OD ACF Lag ACF se t ratio 1 0 15321802 0 09853293 1 55499303 0 26128739 0 10081953 2 59163466 0 03345993 0 10719249 0 31214809 0 03554756 0 10729385 0 33131035 0 01055575 0 10740813 0 09827701 oP WN BE oP W N Forecasts Lower Forecast Upper 25 1 2 645728 3 994526 5 343324 2 2 657691 4 397372 6 137053 3 2 769109 4 725533 6 681956 4 2 904987 4 992855 7 080724 5 3 039914 5 210618 7 381323 The output shows several things Overall the model fits pretty well There is an outlying observation It would be interesting to determine the actual cause of the observation One can expect that unless something special is done to accommodate this observation that it will remain a
62. orming This is especially useful when there are some values of Zj that are negative 0 or even close to 0 This is because the log of a nonpositive number or a non integer power of a negative number is not defined Moreover transformation of numbers close to 0 can cause undesired shifts in the data pattern Shifting the entire series upward by adding a constant m can have the effect of equalizing the effect of a transformation across levels 8 2 Using a range mean plot to help choose a transformation We use a range mean plot to help detect nonstationary variance A range mean plot is constructed by dividing the time series into logical subgroups e g years for 12 monthly observations per year and computing the mean and range in each group The ranges are plotted against the means to see if the ranges tend to increase with the means If the variability in a time series increases with level as in percent change the range mean plot will show a strong positive relationship between the sample means and the sample variances for each period in the time series If the relationship is strong enough it may be an indication of the need for some kind of transformation Example 8 1 Returning to savings rate data see Example 4 1 the range mean plot in Figure 12 shows that other than the outlying value there is no strong relationship between the mean and ranges Thus there is no evidence of need for a transformation for these data Example 8 2 Fig
63. orrectness of a sequence of prediction intervals Each time the function is executed different sets of intervals will be generated EXAMPLES multiple pi sim multiple pi sim sample sizes c 10 40 conlevs c 0 95 0 99 multiple pi sim sample sizes c 30 120 conlevs c 0 95 0 99 multiple probplot sim Function to simulate probability plots with different sample 55 sizes DESCRIPTION When executed the function will create an r by c matrix of probability plots where r is the length of sample size vec and c is the number of replications USAGE multiple probplot sim sample size vec c 10 20 60 number simulations 5 REQUIRED ARGUMENTS None OPTIONAL ARGUMENTS sample size vec A vector of sample sizes for the simulation The default is sample size vec c 10 20 60 A length of 3 produces a nice looking matrix with 3 rows of sub plots number simulations The number of replications for each sample size The default is number simulations 5 If the number is much larger than this hte plots may be hard to read SIDE EFFECTS Produces a plot EXAMPLES multiple probplot sim multiple probplot sim sample size vec c 50 500 5000 XX KEYWORDS Goodness of fit Graphics q q plot plot true acfpacf Function to compute and plot the trye acf and pacf functions DESCRIPTION Given a specified ARMA model this function will compute the true ACF and PACF functions and put them on a plot USAGE
64. plot true acfpacf model nacf 24 type b original par T REQUIRED ARGUMENTS 56 model Model as specified for iden and esti OPTIONAL ARGUMENTS nacf Number of lags to be computed and plotted type To allow plotting of just ACF or PACF Default is to plot both original par Unless this is set to F par will be returned to its original state after the function is done VALUE Describe the value returned SEE ALSO tacf tpacf EXAMPLES pure ar plot true acfpacf model list ar 9 plot true acfpacf model list ar 9 plot true acfpacf model list ar c 9 6 plot true acfpacf model list ar c 1 5 6 plot true acfpacf model list ar c 1 5 6 pure ma plot true acfpacf model list ma 9 plot true acfpacf model list ma c 9 7 arma plot true acfpacf model list ar 9 ma 9 plot true acfpacf model list ar 9 ma 9 plot true acfpacf model list ar c 9 5 ma c 1 1 DESCRIPTION This function reads the time series data into S It creates the data set data d which contains the time series data title and units USAGE read ts file name title units frequency start year start month path REQUIRED ARGUMENTS 57 None needed function will prompt for things that it needs To avoid prompting however one can specify the arguments directly This is particularly useful in preparing batch files to analyze time series data VALUE This function returns
65. plus commands plot true acfpacf model list ar 95 plot true acfpacf model list ar c 78 2 plot true acfpacf model list ma c 1 95 plot true acfpacf model list ar 95 ma 95 VVV V The output from these commands is shown in Figures 8 9 10 and 11 respectively 3 8 General behavior of ACF and PACF functions The general behavior of the true ACF and PACF functions is easy to describe e For AR p models gt The ACF function will die down sometimes in an oscillating manner When close to a stationarity boundary the rate at which the ACF decays may be very slow e g and AR 1 model with 999 For this reason if the ACF dies down too slowly it may be an indication that the process generating the data is nonstationary gt The PACF function will cut off i e be equal to 0 after lag p e For MA q models gt The ACF function will cut off i e be equal to 0 after lag q 15 True ACF ar 0 95 2 u 2 o E RAe 2 0 5 10 15 20 Lag True PACF ar 0 95 oe LL 2 a o v o 2 E 2 0 5 10 15 20 Lag Figure 8 True ACF and PACF for AR 1 model with 95 True ACF ar 0 78 0 2 2 O 1 o oO o 2 2 0 5 10 15 20 Lag True PACF ar 0 78 0 2 Q W 2 5 o o 2 Q 0 5 10 15 20 Lag Figure 9 True ACF and PACF for AR 2 model with 1 78 2 2 16 True ACF ma
66. point for developing models for describing and short term forecasting of some nonstationary time series processes 3 1 ARMA models The nonseasonal ARMA p q model can be written as Pp B Zt Oo 0 B a 1 where B is the backshift operator giving BZ Z_1 09 is a constant term p B 1 41B 2B bp BP is the pth order nonseasonal autoregressive AR operator and 6 B 1 01B 62B 6 B is the qth order moving average MA operator The variable a in equation 1 is the random shock term also sometimes called the innovation term and is assumed to be independent over time and normally distributed with mean 0 and variance of The ARMA model can be written in the unscrambled form as Zt bo b1Z 1 P2Zt 2 bpZe p 0104 1 O2a 2 Ogar g tar 2 3 2 Stationarity and Invertibility If p 0 the model is a pure MA model and Z is always stationary When an ARMA model has p gt 0 AR terms Z is stationary if and only if all of the roots of p B all lie outside of the unit circle If q 0 the ARMA model is pure AR and Z is always invertible When an ARMA model has q gt 0 MA terms Z is invertible if and only if all of the roots of 6 B all lie outside of the unit circle The S PlusTS function arma roots can be used to compute and display the roots of an AR or an MA polynomial Example 3 1 To
67. pression is assigned to a name we create an S plus object An S plus object can contain numeric values character strings logical values True or False results from a complete statistical analysis function definition or an entire graph Expressions are combinations of object names and constants with operators and functions A vector is an S plus object matrix array and category are examples of other classes of S plus objects Some examples are gt x lt 1 200 generate a sequence of x values from 1 to 200 gt y lt x 2 rnorm x generate some y values with random error gt cbind x y print x and y values as a matrix gt plot x y simple scatter plot of x and y gt tsplot ship time series plot of ship data There are some other useful things about S plus that one needs to know They are e There is an online help facility in S plus To use this just enter help function name at the gt prompt For example gt help arima mle When running S plus from within Emacs this method does not work Try C c C h instead Right now this only works with Emacs version 18 When running S plus from within Emacs another alternative is to fire up Splus in another window and use it only for help e One way to obtain a hard copy of output from a computation is to use the sink function enter sink filename and from then on the screen output will be directed to a file in your default directory To restore screen output ent
68. th C x C w or C x C k yanking is useful for moving or coping a chunk of text Acts like paste in Windows You can yank the same text to several different places to do a copy C g Quit the current command including search C x 2 Split window horizontally For other commands see the GNU Emacs reference card which is available for free in Durham hall 139 45 D Time Series Analysis Primary S plus Function Documen tation This appendix provides detailed documentation in the form of S plus man pages for a number of special S plus functions that have been written especially for the Statistics 451 course but which we hope will be more generally useful DESCRIPTION This function is used to annotate the plots Pointing at the motif graphics window the left button on the mouse can be used for marking a sequence of points on the plot After you have used the left button to mark some or all of the points where you want text push the middle button to indicate that you have specified all points Then you will be prompted in the Splus command window for the text to put in each place that you have indicated It is best to only to 3 or 4 pieces of text at a time USAGE add text VALUE The text is added to the plot SEE ALSO locator DESCRIPTION This function is a combination of estimation diagnostic checking and forecasting stages of an ARIMA model It pro vides useful graphical devices and statistics to he
69. the 1 a quantile of the chi square distribution with k p degrees of freedom there is strong evidence that the model is inadequate 6 3 Other Diagnostic Checks The following are other useful diagnostic checking methods e The residuals from a fitted model constitute a time series that can be plotted just as the original realization is plotted A plot of the residuals versus time is sometimes helpful in detecting problems with the fitted model and gives a clear indication of outlying observations It is also easier to see a changing variance in the plot of the residuals then in a plot of the original data The residual plot can also be helpful in detecting data errors or unusual events that impact a time series e As in regression modeling a plot of residuals versus fitted values can be useful for identifying departures from the assumed model e A normal probability plot of the residuals can be useful for detecting departures from the assumption that residuals are normally distributed the assumption of normally distributed residuals is important when using normal distribution based prediction intervals e Adding another coefficient to a model may result in useful improvement This diagnos tic check is known as overfitting Generally one should have a reason for expanding a model in a certain direction Otherwise overfitting is arbitrary and tends to violate the principle of parsimony e Sometimes a process will change with t
70. the following S plus commands and then respond to the various prompts gt savings rate d lt read ts path home wqmeeker stat451 splus data Input unix file name savings_rate dat Input data title US Savings Rate for 1955 1980 Input response units Percent of Disposable Income Input frequency 4 Input start year 1955 Input start month 1 The file savings rate dat is in the home wqmeeker stat451 data path along with a number of other stat451 data sets You do not need to specify the path if the data is in your stat451 directory When you want to input your own data put it is an ascii flat file and give it a name like xxxx dat where xxxx is a meaningful name for the time series Then the read ts command will generate a data set savings rate d This dataset provides input for our other S PlusTS functions g Instead of answering all of the questions another alternative is to use the command option to create the input data structure Example 2 2 To read the savings rate quarterly data introduced in Example 1 1 using the command option use the following command savings rate d lt read ts file name home wqmeeker stat451 data savings_rate dat title US Savings Rate for 1955 1980 series units Percent of Disposable Income frequency 4 start year 1955 start month 1 time units Year This command creates an S plus object savings rate d Notice that for quarterly data the inputs frequency st
71. ul For time series that have integrated or random walk type behavior first differencing is usually sufficient to produce a time series with a stationary mean occasionally however second differencing is also used Differencing more than twice virtually never needed We must be very careful not to difference a series more than needed to achieve stationarity Unnecessary differencing creates artificial patterns in a data series and reduces forecast accuracy Example 8 3 Figure 19 shows output from function iden for the weekly closing prices for 1979 AT amp T common stock Both the time series plot and the range mean plot strongly suggests that the mean of the time series is changing with time g Example 8 4 Figure 20 shows output from function iden for the changes or first differ ences in the weekly closing prices for 1979 AT amp T common stock These plots indicate that the changes in weekly closing price have a distribution that could be modeled with a constant mean ARMA model g 31 1979 Weekly Closing Price of ATT Common Stock w 1 B 1 Dollars oO N gt o x 0 10 20 30 40 50 time ACF PACF e e tL LL g 8 T 8 Beten te E AREN EN A E e e 0 10 20 30 0 10 20 30 Lag Lag Figure 20 Output from function iden for the weekly closing prices for 1979 AT amp T common stock 8 5 Seasonal differencing For seasonal models seasonal differencing is often useful For example W 1 BY Z
72. ure 17 shows iden output of monthly time series data giving the number of international airline passengers between 1949 and 1960 The command used to make this figure is gt iden airline d The range and mean of the 12 observations within each year were computed and plotted This plot shows clearly the strong relationship between the ranges and the means Figure 18 shows iden output for the logarithm of the number of passengers The command used to make this figure is gt iden airline d gamma 0 This figure shows that the transformation has broken the strong correlation and made the amount of variability more stable over the realization When interpreting the range mean plot and deciding on an appropriate transformation we have to remember that doing a transformation affects many parts of a time series model for example 28 International Airline Passengers we Thousands of Passengers 8 8 8 z 8 Ei 8 voo 1950 1951 1052 1953 1954 1055 1956 1957 1958 1958 1960 voot am Range Mean Plot ACF bs Leeann a4 sept o vo 20 30 34 if Lag j PACF 2 2 g5 a T T T ii 200 seo 00 o 10 zo zo m Lag Figure 17 Plot of the airline data along with a range mean plot and plots of sample ACF and PACF International Airline Passengers w log Thousands of Passengers os 1948 1950 1951 1982 1988 1954 1985 1956 1967 1
73. uting estimates computed from model fitting into 5 and 7 Example 7 1 Figure 16 shows forecasts for the savings rate data The forecasts start low but increase to an asymptote equal to the estimated mean of the time series The set of prediction intervals gets wider with the width also reaching a limit approximately 1 96 times the estimate of oz 4 8 Modeling Nonstationary Time Series 8 1 Variance stationarity and transformations Some time series data exhibit a degree of variability that changes through time e g Fig ure 2 In some cases especially when variability increases with level such series can be transformed to stabilize the variance before being modeled with the Univariate Box Jenkins ARIMA method A common transformation involves taking the natural logarithms of the original series This is appropriate if the variance of the original series is approximately pro portional to the mean More generally one can use the family of Box Cox transformations Grimi 40 log Zi m y 0 Le 8 where Z is the original untransformed time series and y is primary transformation pa rameter Sometimes the Box Cox power transformation is presented as Zj Zj m 7 27 The more complicated form in 8 has the advantage of being monotone increasing in Zi for any y and lim dem ee am Ee log Z m 70 The quantity m is typically chosen to be 0 but can be used to shift a time series usually upward before transf
74. utocovariance and Autocorrelation Functions for the AR 2 Model The autocovariance function for an AR 2 model can be obtained from y E t Elp doli a l E Z QE Z Z1 E Zpar41 1 Hoy 0 y2 E Z Z142 E Z diZi41 p22 at42 E t 26 Z E Zar42 diy 270 te Then the autocorrelation function is pp Yr yo and po 1 gives the AR 2 ACF pi A 2p1 p2 2 P3 p 21 pk ipk P2Pk 2 The calculations are similar for an AR p model with other values of p In general the first two equations are commonly known the Yule Walker equations g 3 6 Model partial autocorrelation function The model PACF or true PACF gives the model based correlation between observations Zt Zia separated by k time periods k 1 2 3 accounting for the effects of 14 intervening observations Zt 1 Zt 2 Ziek The PACF function can be computed as a function of the model or true ACF function The formula for calculating the true PACF is gu pi kl Pk X Pk 11j Pk j kk b 23 33 3 k 1 i y k 1 j Pj j 1 where Prj Pk 1 j kkk 1 k j k 3 4 j 1 2 k 1 This formula for computing the true PACF is based on the solution of the Yule Walker equations 3 7 Computing and graphing the model true ACF and PACF func tions Plots and tabular values of the true ACF and PACF functions can be obtained using the following S
75. which to estimate the autocovariance The default value is lag max 38 period Specifies the number of observations to be used in a group when making the range mean plot The default is sea sonal period unless the seasonal period is 3 or less in which case the default is 10 seasonal label If true ACF and PACF lags are given in units of seasonal period Default is FALSE VALUE A series of plots are produced ACF and PACF s are print ed SEE ALSO f range mean my acf plot 48 EXAMPLES iden airline d iden airline d gamma 0 m 2 iden airline d gamma 0 d 1 Give users experience in model identification using simulated data DESCRIPTION This function will simulate samples from an ARMA model plot the data generate sample ACF and PACF functions and let user gain experience of tentatively identifying the model USAGE iden sim realization size 100 model NULL lag max 20 REQUIRED ARGUMENTS None OPTIONAL ARGUMENTS realization size Size of the sample realization to be generat ed model User specified model If no model is specified a ran dom model will be generated from an internal list lag max Maximum lag for the ACF and PACF functions EXAMPLES iden sim iden sim model list ar c 9 5 iden sim model list ar c 9 5 ma c 1 1 DESCRIPTION This function is used to define an ARIMA model to be used in the arima mle function Although the arguments are all listed under
76. ximize the probability of the data Combinations of parameters giving high probability are more plausible than those with low probability The ML estimation criteria is the preferred method Assuming a correct model the likelihood function from which ML estimates are derived reflects all useful information about the parameters contained in the data Estimated coefficients are nearly always correlated with one another Estimates of the parameter correlations provide a useful diagnostic If estimates are highly correlated the estimates are heavily dependent on each other and tend to be unstable e g slight changes in data can cause large changes in estimates If one or more of the correlations among parameters is close to 1 it may be an indication that the model contains too many parameters 6 Diagnostic Checking At the diagnostic checking stage we check to see if the tentative model is adequate for its purpose A tentative model that fails one or more diagnostic tests is rejected If the model is rejected we repeat the cycle of identification estimation and diagnostic checking in an attempt to find a better model The most useful diagnostic checks consist of plots of the residuals plots of functions of the residuals and other simple statistics that describe the adequacy of the model 21 6 1 Residual autocorrelation In an ARMA model the random shocks a are assumed to be statistically independent so at the diagnostic checking

Download Pdf Manuals

image

Related Search

Related Contents

recherches simples dans esidoc mode d`emploi  『リアルアクション ボクのムシ』 2007年4月4日より順次発売  Philips DVP5140 DivX Ultra DVD player  07 Crevice Nester Productivity Protocol v1.4  68650 Fahrrad-Computer_content_LB2.indd  Honeywell 53034 Home Security System User Manual    Guide pratique de TEX Live 2015  Feuille de match - Comité de Rugby du Lyonnais  

Copyright © All rights reserved.
Failed to retrieve file