Practice Exercises (Kaggle Notebook) — No Solutions
Complete these exercises in a Kaggle notebook. Use the dataset and path conventions below.
Dataset file: healthcare_ts.csv
Columns: date and value.
Source: Kaggle dataset by user trongnghia7171, or use the link: https://www.kaggle.com/datasets/trongnghia7171/hospital-admissions-ts-practice/data
Input path in the notebook: /kaggle/input/hospital-admissions-ts-practice/healthcare_ts.csv
import pandas as pd
path = "/kaggle/input/hospital-admissions-ts-practice/healthcare_ts.csv"
df = pd.read_csv(path)
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date").sort_index()
y = df["value"]
Kaggle notebooks include pandas, numpy, matplotlib, and statsmodels. If needed, install in a cell:
# !pip install statsmodels
Load the CSV from the Kaggle input path, parse date, and set it as the datetime index. Use the series y (column value).
Plot the series and compute basic statistics (mean, std, min, max). Comment on trend and variability.
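As a hint for the call pattern only, here is a minimal sketch on a synthetic series. The name y_demo and the simulated random walk are assumptions made for illustration and are not the Kaggle data; in the notebook you would use y instead. The 30-day rolling mean is one optional way to make a trend easier to see.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; remove this line in a notebook
import matplotlib.pyplot as plt

# Synthetic stand-in for the exercise series (hypothetical, NOT the Kaggle data).
rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", periods=200, freq="D")
y_demo = pd.Series(100 + np.cumsum(rng.normal(size=200)), index=idx, name="value")

# Basic statistics requested in the exercise.
stats = y_demo.agg(["mean", "std", "min", "max"])
print(stats)

# Line plot with a rolling mean overlaid to help judge trend and variability.
fig, ax = plt.subplots(figsize=(10, 4))
y_demo.plot(ax=ax, label="series")
y_demo.rolling(30).mean().plot(ax=ax, label="30-day rolling mean")
ax.set_xlabel("date")
ax.set_ylabel("value")
ax.legend()
```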
Run the Augmented Dickey–Fuller (ADF) test on the raw series (e.g. statsmodels.tsa.stattools.adfuller). Report the test statistic, p-value, and critical values.
Conclude whether the series is stationary or has a unit root. State the null and alternative hypotheses and your decision.
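Run the KPSS test and compare its conclusion with the ADF result (trend stationary vs difference stationary). Keep in mind that the KPSS null hypothesis is stationarity, the reverse of the ADF null.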
Compute the first difference of the series. Plot the differenced series.
Run the ADF test (and optionally KPSS) on the differenced series. Conclude whether the differenced series is stationary and hence whether d = 1 is appropriate.
Plot the ACF and PACF of the differenced series (use a reasonable number of lags, e.g. 20–40). Optionally plot ACF/PACF of the raw series for comparison.
Using the usual guidelines (PACF cutoff for AR order, ACF cutoff for MA order), suggest candidate orders (p, q) for the differenced series, and hence (p, d, q) with d = 1 for the original series (e.g. ARIMA(1,1,0), ARIMA(0,1,1), or ARIMA(1,1,1)).
Fit 2–3 candidate ARIMA(p, 1, q) models (e.g. (1,1,0), (0,1,1), (1,1,1)) using the full series.
Report estimated coefficients, AIC, and BIC for each model.
Choose a preferred model (e.g. by BIC or parsimony) and state the final order.
For the chosen model, plot the ACF of the residuals. Run a Ljung–Box test (e.g. on the first several lags). Comment on whether the residuals behave like white noise.
Split the data temporally (e.g. last 10–15% as test). Generate one-step or short-horizon forecasts for the test period. Compute RMSE and MAE. Plot forecasts vs actual values.
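This practice set covers: data loading and exploration, stationarity testing (ADF and KPSS), differencing, order identification with ACF/PACF, ARIMA fitting and model selection, residual diagnostics, and out-of-sample forecast evaluation.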
Complete the exercises in a Kaggle notebook. Solutions are available separately.