Week 6–7 Practice: Stationarity, Unit Root, Differencing, AR/MA/ARMA

Practice Exercises (Kaggle Notebook) — No Solutions

Kaggle Notebook Environment

Complete these exercises in a Kaggle notebook. Use the dataset and path conventions below.

Dataset

Kaggle: Hospital Admissions TS Practice
File: healthcare_ts.csv
Description: Daily hospital admissions (simulated), one univariate series with columns date and value.
Length: 730 daily observations (~2 years).

Adding the dataset on Kaggle

Open the dataset page using the link below (or go to your notebook and click Add Data).
Search for hospital-admissions-ts-practice by trongnghia7171, or use the link: https://www.kaggle.com/datasets/trongnghia7171/hospital-admissions-ts-practice/data.
Attach the dataset to your notebook. The file path will be /kaggle/input/hospital-admissions-ts-practice/healthcare_ts.csv.

Loading the data in the notebook

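import pandas as pd
# Path of the attached dataset inside the Kaggle notebook
path = "/kaggle/input/hospital-admissions-ts-practice/healthcare_ts.csv"
df = pd.read_csv(path)
# Parse the dates and use them as a sorted datetime index
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date").sort_index()
# Univariate series used in all exercises
y = df["value"]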

Required packages

Kaggle notebooks include pandas, numpy, matplotlib, and statsmodels. If needed, install in a cell:

# !pip install statsmodels

Objective: Use this healthcare time series to practice the full pipeline from unit root testing through model selection, AR/MA/ARMA fitting, and evaluation (aligned with slides 4.1 and 4.2). Solutions are available separately.

Exercise 1: Load and explore

Task 1.1

Load the CSV from the Kaggle input path, parse date, and set it as the datetime index. Use the series y (column value).

Task 1.2

Plot the series and compute basic statistics (mean, std, min, max). Comment on trend and variability.
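As a rough illustration of this step (not a solution for the practice data), the sketch below computes the four statistics and overlays a rolling mean. It assumes a synthetic random walk as a stand-in for the real series; the 30-day window and the names roll_mean and roll_std are illustrative choices, not part of the exercise.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in (assumption): a random walk with 730 daily values.
# In the Kaggle notebook, use y = df["value"] instead.
rng = np.random.default_rng(0)
idx = pd.date_range("2022-01-01", periods=730, freq="D")
y = pd.Series(100 + np.cumsum(rng.normal(0, 1, 730)), index=idx, name="value")

# Basic statistics requested in Task 1.2.
stats = y.agg(["mean", "std", "min", "max"])
print(stats.round(2))

# Rolling mean and std over 30 days help judge trend and variability by eye.
roll_mean = y.rolling(30).mean()
roll_std = y.rolling(30).std()

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6), sharex=True)
y.plot(ax=ax1, label="value")
roll_mean.plot(ax=ax1, label="30-day rolling mean")
ax1.set_ylabel("admissions")
ax1.legend()
roll_std.plot(ax=ax2, label="30-day rolling std")
ax2.set_xlabel("date")
ax2.legend()
```

A rolling mean that drifts over time hints at a trend; a rolling std that changes level hints at non-constant variance.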

Exercise 2: Unit root tests

Task 2.1

Run the Augmented Dickey–Fuller (ADF) test on the raw series (e.g. statsmodels.tsa.stattools.adfuller). Report the test statistic, p-value, and critical values.

Task 2.2

Conclude whether the series is stationary or has a unit root. State the null and alternative hypotheses and your decision.

Task 2.3 (optional)

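Run the KPSS test and compare its conclusion with the ADF result (trend stationary vs difference stationary). Note that the hypotheses are reversed: the ADF null is a unit root, while the KPSS null is stationarity.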

Exercise 3: Differencing

Task 3.1

Compute the first difference of the series. Plot the differenced series.

Task 3.2

Run the ADF test (and optionally KPSS) on the differenced series. Conclude whether the differenced series is stationary and hence whether d = 1 is appropriate.

Exercise 4: ACF and PACF for model selection

Task 4.1

Plot the ACF and PACF of the differenced series (use a reasonable number of lags, e.g. 20–40). Optionally plot ACF/PACF of the raw series for comparison.

Task 4.2

Using the usual guidelines (the PACF cuts off after lag p for an AR(p) process; the ACF cuts off after lag q for an MA(q) process), suggest candidate orders (p, q) for the differenced series, and hence (p, d, q) with d = 1 for the original series (e.g. ARIMA(1,1,0), ARIMA(0,1,1), or ARIMA(1,1,1)).

Exercise 5: AR/MA/ARMA fit

Task 5.1

Fit 2–3 candidate ARIMA(p, 1, q) models (e.g. (1,1,0), (0,1,1), (1,1,1)) using the full series.

Task 5.2

Report estimated coefficients, AIC, and BIC for each model.

Task 5.3

Choose a preferred model (e.g. by BIC or parsimony) and state the final order.

Exercise 6: Evaluation

Task 6.1 — Residual diagnostics

For the chosen model, plot the ACF of the residuals. Run a Ljung–Box test (e.g. at lags 10 and 20). Comment on whether the residuals behave like white noise; the Ljung–Box null hypothesis is that there is no autocorrelation up to the tested lag.

Task 6.2 — Forecast evaluation

Split the data temporally (e.g. last 10–15% as test) and fit the chosen model on the training portion only. Generate one-step or short-horizon forecasts for the test period. Compute RMSE and MAE. Plot forecasts vs actual values.

Summary

This practice set covers:

Load and explore: Load healthcare time series in Kaggle and inspect trend and variability.
Unit root tests: ADF (and optionally KPSS) on the raw series; conclude non-stationarity.
Differencing: First difference; confirm stationarity with ADF.
ACF/PACF: Model selection for (p, d, q) with d = 1.
AR/MA/ARMA fit: Fit ARIMA(p, 1, q); report coefficients and AIC/BIC; choose preferred model.
Evaluation: Residual diagnostics (ACF, Ljung–Box) and forecast evaluation (RMSE, MAE, plot).

Complete the exercises in a Kaggle notebook. Solutions are available separately.