Practice Exercises (Kaggle Notebook) — No Solutions
Complete these exercises in a Kaggle notebook. Use the EEG/EMG dataset and path conventions below.
Dataset file: eeg_emg_ts.csv
Columns: date, eeg_alpha_power, emg_rms.
Source: Kaggle dataset by trongnghia7171, or use the link: https://www.kaggle.com/datasets/trongnghia7171/eeg-emg-ts-practice
Path in the notebook: /kaggle/input/eeg-emg-ts-practice/eeg_emg_ts.csv
import pandas as pd
path = "/kaggle/input/eeg-emg-ts-practice/eeg_emg_ts.csv"
df = pd.read_csv(path)
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date").sort_index()
eeg = df["eeg_alpha_power"]
emg = df["emg_rms"]
Kaggle notebooks include pandas, numpy, matplotlib, and statsmodels. If needed, install in a cell:
# !pip install statsmodels
Load the CSV from the Kaggle input path, parse date, and set it as the datetime index. Extract the EEG series (eeg_alpha_power) and the EMG series (emg_rms).
Plot both series (EEG and EMG) and compute basic statistics (mean, std, min, max) for each.
Comment on trend, variability, and physiological interpretation: e.g. alpha power changes over a recording session (drowsiness, arousal); EMG baseline drift and burst-like structure.
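As a starting point for the summary statistics (not a solution), here is a minimal sketch on synthetic stand-in data. The values, the daily frequency, and the name `demo` are assumptions made for illustration; only the column names come from the dataset description.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): NOT the Kaggle file, only the column names match.
rng = np.random.default_rng(0)
idx = pd.date_range("2024-01-01", periods=200, freq="D")
demo = pd.DataFrame(
    {
        "eeg_alpha_power": 10 + 0.01 * np.arange(200) + rng.normal(scale=0.5, size=200),
        "emg_rms": 2 + rng.normal(scale=0.2, size=200),
    },
    index=idx,
)

# One row per statistic, one column per series.
summary = demo.agg(["mean", "std", "min", "max"])
print(summary.round(3))
```

With the real data, the same `agg` call applies to `df[["eeg_alpha_power", "emg_rms"]]`, and `df.plot(subplots=True)` is one way to draw both series on separate axes.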
Run the Augmented Dickey–Fuller (ADF) test on the raw EEG series and on the raw EMG series. Report test statistic, p-value, and critical values for each.
Conclude for each series whether it is stationary or has a unit root. State the null and alternative hypotheses and your decision.
Run the KPSS test for each series and compare conclusions (trend stationary vs difference stationary).
Compute the first difference of the EEG series and of the EMG series. Plot both differenced series.
Run the ADF test (and optionally KPSS) on each differenced series. Conclude whether the differenced series are stationary and hence whether d = 1 is appropriate for both.
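To see why a single difference can be enough, consider a synthetic random walk: differencing it returns the underlying increments exactly. This is only an illustration under that assumption; whether d = 1 suits the EEG and EMG series is what the tests in the exercise decide.

```python
import numpy as np
import pandas as pd

# Synthetic random walk (assumption): cumulative sum of independent normal steps.
rng = np.random.default_rng(1)
steps = rng.normal(size=300)
walk = pd.Series(steps).cumsum()

# First difference: y_t - y_{t-1}. The first value is NaN and is dropped.
diffed = walk.diff().dropna()

print(len(walk), len(diffed))
print(np.allclose(diffed.to_numpy(), steps[1:]))
```

The differenced series is one observation shorter than the original, which matters when aligning plots or test sets.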
Plot the ACF and PACF of the differenced EEG series (use a reasonable number of lags, e.g. 40–50). Using the usual guidelines (PACF cutoff for AR order, ACF cutoff for MA order), suggest candidate orders (p, q) for the differenced series, hence (p, d, q) with d = 1 for the raw EEG (e.g. ARIMA(1,1,0), ARIMA(2,1,0)). Note any periodic peaks (e.g. alpha-rhythm modulation at a given lag).
Plot the ACF and PACF of the differenced EMG series. Suggest ARIMA(p, 1, q) for the raw EMG. Comment on EMG-specific behavior (e.g. short memory, MA-like cutoff typical of burst dynamics).
Fit 2–3 candidate ARIMA(p, 1, q) models for the EEG series (e.g. (1,1,0), (2,1,0), (1,1,1)). Report estimated coefficients, AIC, and BIC. Choose a preferred model (e.g. by BIC).
Fit 2–3 candidate ARIMA(p, 1, q) models for the EMG series (e.g. (0,1,1), (1,1,1), (0,1,2)). Report coefficients, AIC, BIC. Choose a preferred model.
Compare the EEG and EMG preferred orders (e.g. AR vs MA dominance) and interpret in terms of physiological dynamics.
For both preferred models (EEG and EMG), plot the ACF of the residuals and run a Ljung–Box test (e.g. on the first 20 lags). Comment on whether the residuals behave like white noise for each.
For both series, use a temporal train/test split (e.g. hold out the last 10–15% of observations as the test set). Generate one-step or short-horizon forecasts for the test period. Compute RMSE and MAE for EEG and for EMG. Plot forecasts vs actual values for each. Briefly compare EEG vs EMG forecastability.
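This practice set covers: loading and exploring the EEG and EMG series, stationarity testing (ADF, KPSS), differencing, order identification from the ACF and PACF, ARIMA fitting and model selection, residual diagnostics, and forecast evaluation.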
Complete the exercises in a Kaggle notebook. Solutions are available separately.