NATIONAL ECONOMICS UNIVERSITY
Faculty of Mathematical Economics
ASSIGNMENT
SUBJECT: ECONOMETRICS II
Topic
Stock Price And Return Volatility Prediction Using ARIMA And ARCH-GARCH Models
Student name: Nguyen Thi Lan Nhi
Student ID: 11219243
Class: Actuary 63
Instructor: Bui Duong Hai
I Introduction
Established in 2000, the Vietnamese stock market has become a very attractive investment channel for investors, from professional investment organizations to individual investors. However, alongside its high profitability, stock investment always carries many potential risks, because investors cannot always accurately predict the future trend of stock prices. Therefore, accurate prediction of stock price fluctuations, as a basis for the strategies of individuals and organizations, is becoming necessary. Over the years, many efforts have led to the development of a variety of quantitative modelling methods to forecast stock prices and volatility, ranging from the combination of autoregressive integrated moving average (ARIMA) and generalized autoregressive conditional heteroscedasticity (GARCH) models to the Gaussian process regression (GPR) model and the artificial neural network (ANN) model.
Within the scope of the Econometrics II course, this study focuses on applying ARIMA and ARCH-GARCH models to predict the closing prices and volatility of IMP, the share of Imexpharm Vietnam Joint Stock Company, based on historical data.
2.1 ARIMA Model
ARIMA, which stands for Autoregressive Integrated Moving Average, is a statistical model used to analyse time series data and to forecast future data points within the series. The model is built on the idea that the current value can be explained by its own past values and the cumulative impact of past disturbances, assuming the time series is stationary (after differencing, if necessary). The ARIMA model consists of three main components, each of which is characterized by a parameter:
The Autoregressive (AR) component, denoted by p, which signifies the number of lagged values used to predict the current value in the time series. This component captures the relationship between the current value in the time series and its previous values.
The Integrated (I) component, denoted by d, which signifies the order of differencing. This component reflects the number of differencing operations required to make the time series stationary.
The Moving Average (MA) component, denoted by q, which represents the order of the moving average. This component captures the relationship between the current value in the time series and the white noise terms, i.e. the past forecast errors.
The ARIMA(p, d, q) model can be generalized by the following expression:
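One standard way to write this expression is sketched below. The symbols are notational assumptions rather than the notation of this report: $y_t$ is the series, $L$ the lag operator, $c$ a constant, $\phi_i$ and $\theta_j$ the AR and MA coefficients, and $\varepsilon_t$ a white noise term.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i L^i\right) (1 - L)^d \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j L^j\right) \varepsilon_t
```

Some textbooks write the MA polynomial with minus signs; the form above uses plus signs.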
2.2 ARCH – GARCH Models
ARCH Model (Autoregressive Conditional Heteroskedasticity)
The ARCH model was introduced by Robert F. Engle in 1982. It assumes that the conditional variance of the error term at each time point is a function of past squared error terms.
The ARCH(p) model is defined as follows:
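A standard statement of the ARCH(p) model is given below. The symbols are chosen here for illustration: $\varepsilon_t$ is the error term, $\sigma_t^2$ its conditional variance, and $z_t$ an i.i.d. shock with zero mean and unit variance.

```latex
\varepsilon_t = \sigma_t z_t, \qquad
\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i \varepsilon_{t-i}^2,
\qquad \alpha_0 > 0, \; \alpha_i \ge 0
```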
GARCH Model (Generalized Autoregressive Conditional Heteroskedasticity):
The GARCH model, introduced by Tim Bollerslev in 1986, is an extension of the ARCH model. It allows for a more flexible specification of the conditional variance by including lagged values of both the conditional variance and the squared past errors.
The GARCH(p, q) model is defined as follows:
In which:
The volatility is stationary if
Thus, the unconditional variance is:
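In commonly used notation, the model, the stationarity condition and the unconditional variance can be written as follows. Here $\varepsilon_t$ is the error term and $\sigma_t^2$ its conditional variance. The labelling is an assumption: p counts the lagged squared errors and q the lagged conditional variances, although the opposite labelling is also found in the literature.

```latex
\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i \varepsilon_{t-i}^2
           + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2,
\qquad \alpha_0 > 0, \; \alpha_i \ge 0, \; \beta_j \ge 0

\sum_{i=1}^{p} \alpha_i + \sum_{j=1}^{q} \beta_j < 1

\operatorname{Var}(\varepsilon_t)
  = \frac{\alpha_0}{1 - \sum_{i=1}^{p} \alpha_i - \sum_{j=1}^{q} \beta_j}
```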
3.1 Data
For this study, I have collected a series of closing share prices of Imexpharm Corporation (HOSE: IMP) from 391 trading sessions between January 1st, 2022 and July 31st, 2023.
Imexpharm Corporation is a leading manufacturer and distributor of pharmaceutical products in Vietnam. The company's products include prescription drugs, over-the-counter drugs, and medical devices. Imexpharm's products are sold in over 50 countries worldwide. Moreover, IMP stock has been listed on HOSE since December 4th, 2006, and it is becoming more popular among Vietnamese investors. The company has a strong track record of growth and profitability. Imexpharm is expected to continue to grow in the coming years, as the Vietnamese pharmaceutical market is expected to grow significantly.
Moreover, the dataset also includes the 1st difference series, a growth rate series and a log return series, which have been derived from the original stock price series using the following formulae, where $P_t$ denotes the closing price at time t:
The 1st difference: $\Delta P_t = P_t - P_{t-1}$
Growth rate: $g_t = (P_t - P_{t-1}) / P_{t-1}$
Log return: $r_t = \ln(P_t / P_{t-1})$
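As a minimal sketch of how the three derived series can be computed from a closing price series, the snippet below uses NumPy. The function name `derive_series` and the prices are made up for illustration and are not the IMP data.

```python
import numpy as np

def derive_series(prices):
    """Return (1st difference, growth rate, log return) of a price series."""
    p = np.asarray(prices, dtype=float)
    diff = p[1:] - p[:-1]               # P_t - P_{t-1}
    growth = diff / p[:-1]              # (P_t - P_{t-1}) / P_{t-1}
    log_ret = np.log(p[1:] / p[:-1])    # ln(P_t / P_{t-1})
    return diff, growth, log_ret

# Hypothetical closing prices, for illustration only
prices = [60000.0, 61200.0, 60600.0, 62000.0]
diff, growth, log_ret = derive_series(prices)

print("1st difference:", diff)
print("growth rate:   ", growth.round(4))
print("log return:    ", log_ret.round(4))
```

Each derived series is one observation shorter than the price series. For small daily changes the growth rate and the log return are close, since $\ln(1 + g_t) \approx g_t$.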
The time series plots for all series in the dataset are provided below:
Figure 1 IMP stock price series
Figure 2 The 1st difference series
Figure 3 Growth rate series
Figure 4 Log return series
(All four plots have Date on the horizontal axis, from 2022-01 to 2023-07.)
3.2 Methodology
Forecasting stock prices using ARIMA model
The Box-Jenkins method is a systematic process for identifying, estimating, and diagnosing autoregressive integrated moving average (ARIMA) time series models. It was developed by George Box and Gwilym Jenkins in the 1970s, and it remains one of the most popular and widely used methods for time series forecasting today.
Figure 5 Box-Jenkins method
In this study, the Box-Jenkins method is applied to 03 series, namely the closing price, growth rate and log return series, to forecast the future prices of IMP stock. The steps involved in the Box-Jenkins method can be summarized as follows:
Box-Jenkins Step 1: Identification
Stationarity Check
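Examine the time series plot to identify any trends or seasonality. A stationary time series is often easier to model.
Conduct Dickey-Fuller (DF) tests with trend, with drift and without drift to check for the stationarity of the series around the time trend, around the long run mean, and around 0, respectively. In each test, the null hypothesis of a unit root, $\delta = 0$, is rejected when the test statistic $\tau = \hat{\delta} / se(\hat{\delta})$ is smaller than the critical value $\tau_\alpha$.
o DF test with trend: $\Delta y_t = \alpha + \beta t + \delta y_{t-1} + u_t$
If $\beta$ is statistically significant and $\tau < \tau_\alpha$, then we can conclude that the series is stationary around the time trend. If $\beta$ is statistically insignificant, then proceed to the DF test with drift.
o DF test with drift: $\Delta y_t = \alpha + \delta y_{t-1} + u_t$
If $\alpha$ is statistically significant and $\tau < \tau_\alpha$, then we can conclude that the series is stationary around the long run mean. If $\alpha$ is statistically insignificant, then proceed to the DF test without drift.
o DF test without drift: $\Delta y_t = \delta y_{t-1} + u_t$
If $\tau < \tau_\alpha$, then we can conclude that the series is stationary around value 0. In all cases, if $\tau > \tau_\alpha$, we conclude that the series is non-stationary.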
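The DF test without drift can be sketched with a plain OLS regression of the differenced series on the lagged level. The code below is an illustration on simulated data, not the procedure or software used in this study. The function name `df_tau_no_drift` is hypothetical, and the 5% critical value of about -1.95 is the approximate large-sample value for the no-constant case.

```python
import numpy as np

def df_tau_no_drift(y):
    """Estimate delta and its tau statistic in  dy_t = delta * y_{t-1} + u_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                          # dy_t = y_t - y_{t-1}
    lag = y[:-1]                             # y_{t-1}
    delta = (lag @ dy) / (lag @ lag)         # OLS slope, no intercept
    resid = dy - delta * lag
    s2 = (resid @ resid) / (len(dy) - 1)     # residual variance (1 parameter)
    se = np.sqrt(s2 / (lag @ lag))           # standard error of delta
    return delta, delta / se

# Approximate large-sample 5% critical value, no-constant case (assumption)
CRIT_5PCT = -1.95

rng = np.random.default_rng(0)
random_walk = np.cumsum(rng.normal(size=400))   # unit root by construction
first_diff = np.diff(random_walk)               # white noise

for name, series in [("random walk", random_walk), ("1st difference", first_diff)]:
    delta, tau = df_tau_no_drift(series)
    verdict = "stationary around 0" if tau < CRIT_5PCT else "non-stationary"
    print(f"{name}: delta = {delta:.4f}, tau = {tau:.2f} -> {verdict}")
```

The versions with drift and with trend add a constant and a time index to the regressors and use different critical values.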
Differencing
If the time series is not stationary, take differences until stationarity is achieved. By doing so, the degree of differencing d for the ARIMA model can be identified.
Autocorrelation and Partial Autocorrelation Analysis
Examine the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots to identify potential autoregressive (AR) and moving average (MA) orders.
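To illustrate how candidate orders are read from the correlograms, the sketch below computes the sample ACF, and the PACF through the Durbin-Levinson recursion, for a simulated AR(1) series. Lags whose values fall outside the approximate 95% band $\pm 1.96/\sqrt{n}$ are flagged. The helper names `acf` and `pacf` and the simulated data are assumptions for illustration only.

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelations r_0, ..., r_nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = x @ x
    return np.array([1.0] + [(x[k:] @ x[:-k]) / denom for k in range(1, nlags + 1)])

def pacf(x, nlags):
    """Sample partial autocorrelations via the Durbin-Levinson recursion (nlags >= 1)."""
    r = acf(x, nlags)
    out = np.ones(nlags + 1)
    out[1] = r[1]
    phi_prev = np.array([r[1]])              # coefficients of the order-1 fit
    for k in range(2, nlags + 1):
        num = r[k] - phi_prev @ r[k - 1:0:-1]
        den = 1.0 - phi_prev @ r[1:k]
        phi_kk = num / den
        phi_prev = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        out[k] = phi_kk
    return out

# Simulated AR(1) series with coefficient 0.6 (illustrative data only)
rng = np.random.default_rng(1)
n = 5000
e = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + e[t]

band = 1.96 / np.sqrt(n)
r, pr = acf(x, 5), pacf(x, 5)
print("ACF :", r.round(3))
print("PACF:", pr.round(3))
print("lags with significant PACF:", [k for k in range(1, 6) if abs(pr[k]) > band])
```

For an AR process the PACF cuts off after lag p while the ACF decays gradually. For an MA process the roles are reversed. This is why the PACF suggests p and the ACF suggests q.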
Box-Jenkins Step 2: Estimation
ARIMA Model Selection
Use the information gathered in the identification step to choose tentative orders for the ARIMA model.
Estimate the parameters of the chosen ARIMA models using both maximum likelihood estimation (MLE) and ordinary least squares (OLS). For each model, record the estimated coefficients and their standard errors, together with the AIC and BIC values, in a table, and compare the models in order to choose the best one for each of the 03 series: the closing price, growth rate and log return of IMP stock.
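The comparison by information criteria can be sketched as follows. To stay self-contained, the example fits only pure AR(p) models by OLS on simulated data and computes Gaussian AIC and BIC values. This is a simplification of the study, whose candidate models also contain MA terms. The function `fit_ar_ols` is a hypothetical helper.

```python
import numpy as np

def fit_ar_ols(x, p, max_p):
    """OLS fit of AR(p) with intercept; the first max_p observations are used
    only as lags, so every candidate order is compared on the same sample."""
    x = np.asarray(x, dtype=float)
    y = x[max_p:]
    cols = [np.ones(len(y))] + [x[max_p - k: len(x) - k] for k in range(1, p + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = len(y), X.shape[1] + 1            # +1 for the error variance
    sigma2 = resid @ resid / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    aic = -2 * loglik + 2 * k
    bic = -2 * loglik + np.log(n) * k
    return beta, aic, bic

# Simulated AR(2) series (illustrative data only)
rng = np.random.default_rng(4)
n = 2000
e = rng.normal(size=n)
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + e[t]

max_p = 4
results = {p: fit_ar_ols(x, p, max_p) for p in range(1, max_p + 1)}
for p, (beta, aic, bic) in results.items():
    print(f"AR({p}): AIC = {aic:.2f}, BIC = {bic:.2f}")
best_p = min(results, key=lambda p: results[p][2])
print("order with the smallest BIC:", best_p)
```

Smaller AIC and BIC values indicate a better trade-off between fit and number of parameters. BIC penalizes additional parameters more heavily than AIC once the sample has more than about 8 observations.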
Box-Jenkins Step 3: Diagnostic Checking
Stationary property by unit circle
Use the inverse roots of the AR and MA characteristic polynomials to check the stationarity of the autoregressive terms and the invertibility of the moving average terms in the model. If all inverse roots lie within the unit circle, conclude that the AR part of the ARIMA model is stationary and the MA part is invertible.
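The unit circle check can be sketched numerically with `numpy.roots`. The sketch assumes the sign convention $1 - \phi_1 z - \dots - \phi_p z^p$ for the AR polynomial and $1 + \theta_1 z + \dots + \theta_q z^q$ for the MA polynomial. The coefficients below are hypothetical and are not the estimates of this study.

```python
import numpy as np

def inverse_ar_roots(phi):
    """Inverse roots of 1 - phi_1 z - ... - phi_p z^p,
    i.e. roots of w^p - phi_1 w^(p-1) - ... - phi_p."""
    return np.roots(np.r_[1.0, -np.asarray(phi, dtype=float)])

def inverse_ma_roots(theta):
    """Inverse roots of 1 + theta_1 z + ... + theta_q z^q,
    i.e. roots of w^q + theta_1 w^(q-1) + ... + theta_q."""
    return np.roots(np.r_[1.0, np.asarray(theta, dtype=float)])

def all_inside_unit_circle(roots):
    return bool(np.all(np.abs(roots) < 1.0))

# Hypothetical coefficients, for illustration only
phi = [0.5, 0.3]
theta = [0.4, -0.2]

print("inverse AR roots:", inverse_ar_roots(phi).round(3),
      "-> stationary:", all_inside_unit_circle(inverse_ar_roots(phi)))
print("inverse MA roots:", inverse_ma_roots(theta).round(3),
      "-> invertible:", all_inside_unit_circle(inverse_ma_roots(theta)))
```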
Residual Analysis
Examine the residuals of the estimated model for patterns or systematic behavior. The residuals should ideally be white noise.
Check the ACF and PACF of the residuals to ensure there are no significant autocorrelations.
Use the Ljung-Box test to check for the absence of autocorrelation in the residuals. If the p-value of the test is higher than 5%, conclude that the residual series is white noise.
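A minimal sketch of the Ljung-Box statistic is given below, using simulated series in place of model residuals. To avoid extra dependencies, the decision compares Q with the tabulated 5% chi-square critical value for 10 degrees of freedom (18.307), which is equivalent to checking whether the p-value exceeds 5%. For residuals of a fitted ARMA(p, q) model the degrees of freedom are usually reduced to h - p - q. The function name `ljung_box_q` is hypothetical.

```python
import numpy as np

def ljung_box_q(resid, h):
    """Ljung-Box statistic Q = n (n + 2) * sum_{k=1..h} r_k^2 / (n - k)."""
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    n = len(e)
    denom = e @ e
    total = 0.0
    for k in range(1, h + 1):
        r_k = (e[k:] @ e[:-k]) / denom       # lag-k sample autocorrelation
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total

# Tabulated 5% critical value of chi-square with 10 degrees of freedom
CHI2_95_DF10 = 18.307

# Simulated data: white noise and a strongly autocorrelated AR(1) series
rng = np.random.default_rng(2)
n = 500
noise = rng.normal(size=n)
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.7 * ar1[t - 1] + noise[t]

for name, series in [("white noise", noise), ("AR(1) series", ar1)]:
    q = ljung_box_q(series, 10)
    decision = "no evidence of autocorrelation" if q < CHI2_95_DF10 else "autocorrelated"
    print(f"{name}: Q(10) = {q:.2f} -> {decision}")
```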
Assessing model
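Once satisfactory models are obtained, use these models to make forecasts for the first 10 observations in August 2023. Forecasts of the growth rate and log return series are converted back to closing prices, with $P_t$ the closing price, $g_t$ the growth rate and $r_t$ the log return:
o For growth rate series: $\hat{P}_t = \hat{P}_{t-1} (1 + \hat{g}_t)$
o For log return series: $\hat{P}_t = \hat{P}_{t-1} \exp(\hat{r}_t)$
Forecasting errors
Calculate and compare the forecasting errors RMSE, MAE and MAPE of all 03 series to select the best model to forecast the closing prices of IMP stock on the other days of August 2023.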
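The three error measures, and the conversion of forecast growth rates or log returns back into prices, can be sketched as below. All numbers and function names are hypothetical and serve only to show the arithmetic. MAPE is reported in percent.

```python
import numpy as np

def prices_from_growth(last_price, growth):
    """Price path implied by growth rates: P_t = P_{t-1} * (1 + g_t)."""
    return last_price * np.cumprod(1.0 + np.asarray(growth, dtype=float))

def prices_from_log_returns(last_price, log_returns):
    """Price path implied by log returns: P_t = P_{t-1} * exp(r_t)."""
    return last_price * np.exp(np.cumsum(np.asarray(log_returns, dtype=float)))

def forecast_errors(actual, forecast):
    """RMSE, MAE and MAPE (in percent) of a forecast against actual values."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    err = a - f
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(100.0 * np.mean(np.abs(err / a))),
    }

# Made-up values, not the IMP data or the study's forecasts
last_price = 60000.0
actual = [60500.0, 60200.0, 61000.0]
forecast_growth = [0.005, -0.002, 0.004]

forecast_prices = prices_from_growth(last_price, forecast_growth)
print("forecast prices:", forecast_prices.round(1))
print(forecast_errors(actual, forecast_prices))
```

Comparing the errors on the price scale, rather than on the scale of each transformed series, keeps the three candidate models comparable.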
Forecasting stock return volatility using ARCH-GARCH model
Step 1: Residual Calculation
Obtain the residuals from the selected ARIMA model by subtracting the predicted values from the observed values.
Calculate the squared residuals from the selected ARIMA model.
Step 2: Test for conditional heteroskedasticity
The heteroscedasticity test aims to discover whether the variance of the data is constant or time-varying. If the variance is homoscedastic, the volatility is calculated using the standard deviation formula. If it is heteroscedastic, the volatility is calculated using the ARCH-GARCH method.
Form an auxiliary regression of the squared residuals on lagged squared residuals, and use the ARCH-LM test to test whether the squared residuals exhibit autoregressive conditional heteroskedasticity.
The null hypothesis is that there is no conditional heteroskedasticity. If the p-value of the test is lower than 5%, conclude that conditional heteroskedasticity is present.
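The auxiliary regression behind the ARCH-LM test can be sketched as follows. The statistic is $LM = n R^2$ from regressing the squared residuals on a constant and q of their own lags. Under the null it is approximately chi-square with q degrees of freedom. The decision below uses the tabulated 5% critical value for 1 degree of freedom (3.841). The data are simulated from an ARCH(1) process with made-up parameters, and `arch_lm` is a hypothetical helper.

```python
import numpy as np

def arch_lm(resid, q):
    """LM = n * R^2 from regressing e_t^2 on a constant and e_{t-1}^2, ..., e_{t-q}^2."""
    e2 = np.asarray(resid, dtype=float) ** 2
    y = e2[q:]
    cols = [np.ones(len(y))] + [e2[q - k: len(e2) - k] for k in range(1, q + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = (y - X @ beta) @ (y - X @ beta)
    sst = (y - y.mean()) @ (y - y.mean())
    return len(y) * (1.0 - ssr / sst)

# Tabulated 5% critical value of chi-square with 1 degree of freedom
CHI2_95_DF1 = 3.841

# Simulated ARCH(1) errors: sigma2_t = 0.2 + 0.5 * eps_{t-1}^2 (made-up parameters)
rng = np.random.default_rng(3)
n = 3000
z = rng.normal(size=n)
eps = np.zeros(n)
for t in range(1, n):
    sigma2_t = 0.2 + 0.5 * eps[t - 1] ** 2
    eps[t] = np.sqrt(sigma2_t) * z[t]

lm = arch_lm(eps, 1)
print(f"ARCH-LM(1) = {lm:.2f}")
print("conditional heteroskedasticity present" if lm > CHI2_95_DF1
      else "no evidence of conditional heteroskedasticity")
```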
Step 3: ARCH-GARCH Modeling
Examine the PACF of the squared residuals to identify potential ARCH orders. Examine the significance of the estimated ARCH coefficients.
Use the identified orders to estimate the GARCH model parameters by maximum likelihood estimation (MLE).
Step 4: Forecasting
Use the ARCH-GARCH model to forecast future volatility.
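How a fitted model produces volatility forecasts can be sketched for the GARCH(1,1) case. In practice the parameters would come from MLE. Here `omega`, `alpha` and `beta` are hypothetical values, and the function names are illustrative. Beyond one step ahead, the unknown future squared error is replaced by its conditional expectation, so the forecast variance moves towards the unconditional variance $\omega / (1 - \alpha - \beta)$.

```python
import numpy as np

def garch11_filter(resid, omega, alpha, beta):
    """Conditional variances sigma2_t = omega + alpha * e_{t-1}^2 + beta * sigma2_{t-1}."""
    e = np.asarray(resid, dtype=float)
    sigma2 = np.empty(len(e))
    sigma2[0] = omega / (1.0 - alpha - beta)     # start at the unconditional variance
    for t in range(1, len(e)):
        sigma2[t] = omega + alpha * e[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

def garch11_forecast(last_resid, last_sigma2, omega, alpha, beta, horizon):
    """Variance forecasts for 1, ..., horizon steps ahead."""
    out = np.empty(horizon)
    out[0] = omega + alpha * last_resid ** 2 + beta * last_sigma2
    for h in range(1, horizon):
        out[h] = omega + (alpha + beta) * out[h - 1]
    return out

# Hypothetical parameters and residuals, for illustration only
omega, alpha, beta = 1e-5, 0.1, 0.8
rng = np.random.default_rng(5)
resid = rng.normal(scale=0.01, size=300)

sigma2 = garch11_filter(resid, omega, alpha, beta)
forecast = garch11_forecast(resid[-1], sigma2[-1], omega, alpha, beta, horizon=10)
print("forecast volatility (std. dev.):", np.sqrt(forecast).round(5))
print("unconditional std. dev.:", round(float(np.sqrt(omega / (1 - alpha - beta))), 5))
```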
4.1 Forecasting stock prices using ARIMA model
4.1.1 Testing for stationarity
The study uses the Dickey-Fuller test to test the stationarity of the stock price series. The result is presented in the table below:
Table 2 DF tests for stock price series
The test statistic is greater than the critical value in all three cases of the DF test: with trend, with drift and without drift. Thus, at the 1% significance level, we can conclude that the stock price series is non-stationary. Therefore, it is necessary to transform this series into a stationary form to obtain better forecasts. This study will use the 1st difference of the original series, the growth rate series and the log return series in ARIMA model applications.
The table below shows the results of the Dickey-Fuller test for the 1st difference, growth rate and log return series.
Table 3 DF tests for the 1st difference series, growth rate series and log return series
1st difference series
Growth rate series
Log return series
Table 4 DF test's coefficient estimation results
                                 With trend     With drift     Without drift
1st difference   Lagged values   -1.22494***    -1.20925***    -1.20894***
Growth rate      Intercept       -0.00293       0.00001
Log return       Lagged values   -1.12500***    -1.11143***    -1.11119***
*,**,***: significant at 10%, 5%, 1%
From Table 3 and Table 4, at the 1% significance level, we can conclude that the 1st difference series, growth rate series and log return series are stationary around value 0.
4.1.2 Autocorrelation and Partial Autocorrelation
Figure 5 ACF and PACF correlograms of the 1st difference series
In the PACF plot, the series has partial correlation at orders 1 and 4, the same orders at which autocorrelation appears in the ACF plot. Therefore, the 1st difference series has the possible values p = 1, 4 for the autoregressive lag order and q = 1, 4 for the moving average order.
Figure 6 ACF and PACF correlograms of growth rate series
In the PACF plot, the series has partial correlation at order 4, the same order at which autocorrelation appears in the ACF plot. Therefore, the growth rate series has the possible values p = 4 for the autoregressive lag order and q = 4 for the moving average order.
Figure 7 ACF and PACF correlograms of log return series
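In the PACF plot, the series has partial correlation at order 4, while in the ACF plot the autocorrelation orders are 1 and 4. Therefore, the log return series has the possible values p = 4 for the autoregressive lag order and q = 1, 4 for the moving average order.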
4.1.3 ARIMA Model Selection
Table 5 04 best ARIMA models for the stock price series
ARIMA    (0,1,4)      (1,1,3)      (2,1,2)      (3,1,2)
Mean     -15.6142     -16.0750     -14.5890     -15.4734
         0.1436***    -0.6453***   -0.7480**    -0.8402***
         -0.0335      0.0465       0.6487***    0.4924**
Information criteria
AIC      6713.66      6716.50      6715.37      6716.40
BIC      6737.46      6740.08      6739.17      6744.16
*,**,***: significant at 10%, 5%, 1%
It can be seen that the mean coefficient is not significant at the 10% level in all four models above. This is because the degree of differencing is d = 1 for the closing price series, and the 1st difference series is stationary around value 0. Comparing the four models, we can first reject ARIMA (1,1,3), as its AR and MA coefficients are not all significant and its AIC and BIC values are among the highest. The remaining models each have an insignificant coefficient, so the information criteria are considered. As the AIC and BIC values of ARIMA (0,1,4) are the smallest, ARIMA (0,1,4) is the most suitable model for the closing price series of IMP.
Table 6 04 best ARIMA models for growth rate series
ARIMA    (0,0,4)      (2,0,2)      (3,0,2)      (4,0,0)
         -            0.7450***    0.7553***    -0.0901*
         -            -0.6509*     -0.5419**    -0.0140
*,**,***: significant at 10%, 5%, 1%
Table 7 04 best ARIMA models for log return series
ARIMA    (0,0,4)      (2,0,2)      (3,0,1)      (3,0,2)
Mean     -0.0003      0.0000       -0.0003      -0.0003
         0.0972**     -0.8538***   -0.6926***   -0.8517***
         -0.0009      0.7184***    -            0.5955***
Information criteria
AIC      -1914.31     -1911.94     -1909.57     -1910.48
BIC      -1890.50     -1888.13     -1885.76     -1882.70
*,**,***: significant at 10%, 5%, 1%
From Table 6 and Table 7, it can be seen that the estimated mean coefficient is approximately zero, as both the growth rate and log return series are stationary around value 0. Comparing the four models in Table 6, we can select ARIMA (2,0,2), as its AR and MA coefficients are all significant and its AIC and BIC values are among the smallest. Therefore, ARIMA (2,0,2) is the most suitable model for the growth rate series of IMP.
For the four models in Table 7, since the AR and MA coefficients are not all significant, the information criteria are considered. As the AIC and BIC values of ARIMA (0,0,4), or MA (4), are the smallest, ARIMA (0,0,4) is the most suitable model for the log return series of IMP.
4.1.4 Diagnostic Checking
ARIMA (0,1,4) for the stock price series
Figure 8 Unit circle of ARIMA (0,1,4) (inverse MA roots)
Figure 9 Residual series of ARIMA (0,1,4)