Week 6–7 Practice: Stationarity, Unit Root, Differencing, AR/MA/ARMA

Practice Exercises (Kaggle Notebook) — No Solutions

Kaggle Notebook Environment

Complete these exercises in a Kaggle notebook. Use the dataset and path conventions below.

Dataset

Adding the dataset on Kaggle

Loading the data in the notebook

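import pandas as pd

# Path follows the Kaggle input convention for the attached dataset
path = "/kaggle/input/hospital-admissions-ts-practice/healthcare_ts.csv"
df = pd.read_csv(path)
df["date"] = pd.to_datetime(df["date"])   # parse the date column
df = df.set_index("date").sort_index()    # datetime index in chronological order
y = df["value"]                           # series used in the exercises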

Required packages

Kaggle notebooks include pandas, numpy, matplotlib, and statsmodels. If needed, install in a cell:

# !pip install statsmodels

Objective: Use this healthcare time series to practice the full pipeline: unit root testing, differencing, order selection from ACF/PACF, AR/MA/ARMA fitting, and evaluation (aligned with slides 4.1 and 4.2). Solutions are available separately.

Exercise 1: Load and explore

Task 1.1

Load the CSV from the Kaggle input path, parse the date column, set it as the datetime index, and sort by date. Use the series y (the value column).

Task 1.2

Plot the series and compute basic statistics (mean, std, min, max). Comment on trend and variability.
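
The statistics in Task 1.2 can be sketched as follows. This is only an illustration on a synthetic stand-in series, not a solution: the name y_demo, the seed, and the 30-day window are assumptions and do not come from the dataset. Rolling mean and standard deviation are one possible way to back up a comment on trend and variability.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for y (assumption): linear trend plus noise on a daily index.
rng = np.random.default_rng(0)
idx = pd.date_range("2022-01-01", periods=200, freq="D")
y_demo = pd.Series(50 + 0.1 * np.arange(200) + rng.normal(0, 2, 200),
                   index=idx, name="value")

# Basic statistics requested in the task.
summary = y_demo.agg(["mean", "std", "min", "max"])
print(summary.round(2))

# Rolling statistics: a drifting rolling mean suggests a trend,
# a drifting rolling std suggests changing variability.
window = 30  # hypothetical window length
rolling = pd.DataFrame({
    "rolling_mean": y_demo.rolling(window).mean(),
    "rolling_std": y_demo.rolling(window).std(),
})
print(rolling.dropna().iloc[[0, -1]].round(2))
```

With the real data, replacing y_demo by y should be enough; y.plot() would give the line plot.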

Exercise 2: Unit root tests

Task 2.1

Run the Augmented Dickey–Fuller (ADF) test on the raw series (e.g. statsmodels.tsa.stattools.adfuller). Report the test statistic, p-value, and critical values.

Task 2.2

Conclude whether the series is stationary or has a unit root. State the null and alternative hypotheses and your decision.

Task 2.3 (optional)

Run the KPSS test, whose null hypothesis is stationarity (the reverse of ADF), and compare the two conclusions (trend stationary vs difference stationary).

Exercise 3: Differencing

Task 3.1

Compute the first difference of the series. Plot the differenced series.
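
A minimal sketch of first differencing, on a synthetic random walk with drift (y_demo is a hypothetical stand-in for y). It also checks that a cumulative sum undoes the difference, which is why forecasts of the differenced series can be mapped back to the original scale.

```python
import numpy as np
import pandas as pd

# Synthetic random walk with drift (assumption), not the course dataset.
rng = np.random.default_rng(2)
idx = pd.date_range("2022-01-01", periods=300, freq="D")
y_demo = pd.Series(100 + np.cumsum(rng.normal(0.2, 1.0, 300)), index=idx)

# First difference: dy_t = y_t - y_{t-1}. The first value is NaN and is dropped.
dy = y_demo.diff().dropna()

# Inverting the difference: start value plus the cumulative sum of differences.
rebuilt = y_demo.iloc[0] + dy.cumsum()

print(len(y_demo), len(dy))
print(np.allclose(rebuilt, y_demo.iloc[1:]))
```

With the real data, dy.plot() on the differenced series should show whether the level now fluctuates around a roughly constant mean.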

Task 3.2

Run the ADF test (and optionally KPSS) on the differenced series. Conclude whether the differenced series is stationary and hence whether d = 1 is appropriate.

Exercise 4: ACF and PACF for model selection

Task 4.1

Plot the ACF and PACF of the differenced series (use a reasonable number of lags, e.g. 20–40). Optionally plot ACF/PACF of the raw series for comparison.

Task 4.2

Using the usual guidelines (PACF cutoff for AR order, ACF cutoff for MA order), suggest candidate orders (p, q) for the differenced series, and hence (p, d, q) with d = 1 for the original series (e.g. ARIMA(1,1,0), ARIMA(0,1,1), or ARIMA(1,1,1)).

Exercise 5: AR/MA/ARMA fit

Task 5.1

Fit 2–3 candidate ARIMA(p, 1, q) models (e.g. (1,1,0), (0,1,1), (1,1,1)) using the full series.

Task 5.2

Report estimated coefficients, AIC, and BIC for each model.

Task 5.3

Choose a preferred model (e.g. by BIC or parsimony) and state the final order.

Exercise 6: Evaluation

Task 6.1 — Residual diagnostics

For the chosen model, plot the ACF of the residuals. Run a Ljung–Box test (e.g. on the first several lags). Comment on whether the residuals behave like white noise.

Task 6.2 — Forecast evaluation

Split the data temporally (e.g. last 10–15% as test). Generate one-step or short-horizon forecasts for the test period. Compute RMSE and MAE. Plot forecasts vs actual values.

Summary

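This practice set covers loading and exploring the series, unit root tests (ADF and optionally KPSS), differencing, ACF/PACF-based order selection, fitting candidate ARIMA(p, 1, q) models, and evaluation through residual diagnostics and forecast accuracy.

Complete the exercises in a Kaggle notebook. Solutions are available separately.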