Lab 4: Survival Analysis with Excel (INSTRUCTOR ANSWER KEY)

Overview

This lab focuses entirely on survival analysis using Lab4-Multivariate-Clinical.csv (~290 patients, multivariate clinical trial data). Students import data, explore covariates, build life tables, plot survival curves, compare treatment arms, and interpret results.

Estimated time: 120 minutes


Part 1: Import & Prepare — Notes

Task 1.1 — Import

Task 1.2 — Quick inspection

Instructor Key:

The dataset has ~290 rows (exact count depends on generator seed; current version = 290 patients).

Grading: Full credit if formulas are correct and numbers are consistent with the data. Exact counts may differ slightly if students imported differently or have header row issues.


Part 2: Exploratory Covariate Analysis — Keys

Task 2.1 — Treatment × Gender

Instructor Key:

Treatment_Arm Female Male Grand Total
Standard ~70–75 ~65–75 ~140–145
Intensive ~65–75 ~70–75 ~140–150
Grand Total ~140 ~150 290

Arms should be roughly balanced. Minor imbalances are expected from randomization.

Task 2.2 — Baseline characteristics

Instructor Key:

Treatment_Arm Avg Age Avg Severity_Score Avg Baseline_CRP
Standard ~59–61 ~50–54 ~11–14
Intensive ~59–61 ~50–54 ~11–14

Both arms should be similar at baseline (randomized trial). If large differences appear, students should note them as potential confounders.

Task 2.3 — Regional CRP

Instructor Key:

Expect small differences across North/Central/South. CRP averages typically range 11–15 mg/L.

Grading the bar chart: - Chart type appropriate (bar or column). - Title present and descriptive. - Field buttons hidden. - One-sentence interpretation.

Task 2.4 — Written interpretation

Instructor Note: Look for: - Statement that arms are roughly balanced (good randomization). - Noting any regional variation in CRP. - No need for formal hypothesis testing at this stage.

Grading Part 2: /15 total — Pivots correct (5), chart clean (5), interpretation reasonable (5).


Part 3: Life Table — Overall Cohort — Keys

Task 3.1–3.4 — Life table construction

Instructor Key (approximate values — verify against current dataset):

Interval Start End N_at_Risk Events Censored Interval_Surv Cumul_Surv
0–12 0 12 290 ~38–42 ~17–20 ~0.86 ~0.86
12–24 12 24 ~230 ~37–42 ~15–20 ~0.82 ~0.71
24–36 24 36 ~175 ~37–42 ~15–20 ~0.77 ~0.55
36–48 36 48 ~120 ~35–40 ~18–22 ~0.70 ~0.38
48–60 48 60 ~65 ~35–40 ~18–22 ~0.45 ~0.17

Values above are illustrative. Run the dataset through the formulas to get exact numbers for your seed.

Instructor Note — Common errors:

  1. First interval lower bound: Students may use ">0" and miss patients at exactly Time_Months = 0 (if any exist). Using ">=0" is safer for the first interval.
  2. Double-counting at boundaries: If a patient has Time_Months = 12.0, which interval does it fall in? Ensure consistency: (Start, End] means ">"&Start, "<="&End.
  3. N_at_Risk not decrementing: The recursion N_next = N_current - Events_current - Censored_current is the most critical formula. If it's wrong, all subsequent rows break.
  4. Cumulative Survival increasing: This is impossible. If it happens, check the Events count (should not be negative) or the Interval_Survival formula.

Task 3.5 — Verification

Instructor Key: - Sum of all Events + Censored = Total patients (290). If not, there's a boundary or formula error. - Cumulative Survival is monotonically non-increasing.

Grading Part 3: /25 total — Correct intervals and COUNTIFS (10), N_at_Risk recursion correct (5), Interval & Cumulative Survival formulas correct (5), verification checks pass (5).


Part 4: Survival Curve — Keys

Task 4.1 — Overall curve

Instructor Key:

Instructor Note: If students use a line chart instead of scatter, the X-axis spacing may be wrong (treating intervals as categories rather than numeric). Deduct partially.

Grading Part 4: /10 total — Correct chart type (3), proper axis formatting (3), title and labels present (2), clean design (2).


Part 5: Stratified Survival — Keys

Task 5.1 — Two stratified life tables

Instructor Key:

The COUNTIFS formulas add one criterion: =COUNTIFS(Tbl_Clinical[Time_Months], ">"&Start, Tbl_Clinical[Time_Months], "<="&End, Tbl_Clinical[Event_Status], 1, Tbl_Clinical[Treatment_Arm], "Standard")

N_at_Risk for Standard arm first interval: =COUNTIF(Tbl_Clinical[Treatment_Arm], "Standard") → ~140–145

N_at_Risk for Intensive arm first interval: =COUNTIF(Tbl_Clinical[Treatment_Arm], "Intensive") → ~140–150

The Intensive arm should show higher cumulative survival at most time points (the data was generated with lower event probability for Intensive: p_event = 0.62 vs 0.72).

Instructor Note — Common errors:

  1. Students forget to add the Treatment_Arm criterion to both Events and Censored formulas.
  2. Students use the overall N_at_Risk instead of arm-specific N_at_Risk.
  3. The two series are plotted on separate charts instead of overlaid.

Task 5.2 — Comparison chart

Instructor Key:

Alternative acceptable approach: Some students may filter the data using Slicers on a copy, then build separate life tables. This is valid as long as the formulas are correct.

Task 5.3 — Interpretation

Instructor Key — expected answers:

  1. Intensive arm has better survival (curve stays higher).
  2. Difference becomes visible around 12–24 months and widens over time.
  3. No formal statistical conclusion — a visual comparison alone cannot determine significance. Log-rank test or Cox regression (Week 11) is needed.

Grading: Full credit requires acknowledging the limitation in point 3.

Grading Part 5: /25 total — Two correct stratified life tables (10), comparison chart with both series (8), interpretation with limitation acknowledged (7).


Part 6: Scatter — Keys

Task 6.1–6.2 — Severity vs. Time scatter

Instructor Key:

Expected pattern: Higher severity patients tend to have slightly shorter times (subtle — the synthetic data includes a 0.85 multiplier for Severity > 65). Censored observations should be spread across the severity range but tend to appear at various time points (they didn't experience the event during follow-up).

Task 6.3 — Interpretation

Instructor Key:

Grading Part 6: /15 total — Two-series scatter correct (8), formatting and labels (3), interpretation (4).


Part 7: Summary & Reflection — Grading

Instructor Note: Look for five elements:

  1. Cohort summary — correct numbers.
  2. Overall survival — references the curve and approximate median.
  3. Treatment comparison — identifies which arm is better visually.
  4. Limitations — explicitly states that visual comparison is not a statistical test; mentions log-rank or Cox for formal inference.
  5. Censoring — explains that censored patients are not "failures" and that ignoring censoring would bias results.

Grading Part 7: /10 total — All 5 elements present and coherent.


Grading Rubric Summary

Component Points
Part 1: Import & data inspection /5
Part 2: Exploratory covariate analysis (pivots + chart + interpretation) /15
Part 3: Overall life table (formulas + verification) /25
Part 4: Overall survival curve /10
Part 5: Stratified survival (two life tables + comparison chart + interpretation) /25
Part 6: Severity vs Time scatter (two-series + interpretation) /15
Part 7: Summary memo /5
Total /100

Dataset generation

Regenerate clinical data with:

python3 generate_lab4_multivariate_data.py

→ produces Lab4-Multivariate-Clinical.csv