This lab focuses entirely on survival analysis using Lab4-Multivariate-Clinical.csv (~290 patients, multivariate clinical trial data). Students import data, explore covariates, build life tables, plot survival curves, compare treatment arms, and interpret results.
Estimated time: 120 minutes
Ctrl+T → name it Tbl_Clinical.Instructor Key:
The dataset has ~290 rows (exact count depends on generator seed; current version = 290 patients).
- Total patients:
=COUNTA(Tbl_Clinical[Patient_ID])→ 290- Events:
=COUNTIF(Tbl_Clinical[Event_Status], 1)→ approximately 192–198 (varies with seed)- Censored:
=COUNTIF(Tbl_Clinical[Event_Status], 0)→ approximately 92–98- Event rate: ~67%
Grading: Full credit if formulas are correct and numbers are consistent with the data. Exact counts may differ slightly if students imported differently or have header row issues.
Instructor Key:
Treatment_Arm Female Male Grand Total Standard ~70–75 ~65–75 ~140–145 Intensive ~65–75 ~70–75 ~140–150 Grand Total ~140 ~150 290 Arms should be roughly balanced. Minor imbalances are expected from randomization.
Instructor Key:
Treatment_Arm Avg Age Avg Severity_Score Avg Baseline_CRP Standard ~59–61 ~50–54 ~11–14 Intensive ~59–61 ~50–54 ~11–14 Both arms should be similar at baseline (randomized trial). If large differences appear, students should note them as potential confounders.
Instructor Key:
Expect small differences across North/Central/South. CRP averages typically range 11–15 mg/L.
Grading the bar chart: - Chart type appropriate (bar or column). - Title present and descriptive. - Field buttons hidden. - One-sentence interpretation.
Instructor Note: Look for: - Statement that arms are roughly balanced (good randomization). - Noting any regional variation in CRP. - No need for formal hypothesis testing at this stage.
Grading Part 2: /15 total — Pivots correct (5), chart clean (5), interpretation reasonable (5).
Instructor Key (approximate values — verify against current dataset):
Interval Start End N_at_Risk Events Censored Interval_Surv Cumul_Surv 0–12 0 12 290 ~38–42 ~17–20 ~0.86 ~0.86 12–24 12 24 ~230 ~37–42 ~15–20 ~0.82 ~0.71 24–36 24 36 ~175 ~37–42 ~15–20 ~0.77 ~0.55 36–48 36 48 ~120 ~35–40 ~18–22 ~0.70 ~0.38 48–60 48 60 ~65 ~35–40 ~18–22 ~0.45 ~0.17 Values above are illustrative. Run the dataset through the formulas to get exact numbers for your seed.
Instructor Note — Common errors:
- First interval lower bound: Students may use
">0"and miss patients at exactly Time_Months = 0 (if any exist). Using">=0"is safer for the first interval.- Double-counting at boundaries: If a patient has Time_Months = 12.0, which interval does it fall in? Ensure consistency:
(Start, End]means">"&Start, "<="&End.- N_at_Risk not decrementing: The recursion
N_next = N_current - Events_current - Censored_currentis the most critical formula. If it's wrong, all subsequent rows break.- Cumulative Survival increasing: This is impossible. If it happens, check the Events count (should not be negative) or the Interval_Survival formula.
Instructor Key: - Sum of all Events + Censored = Total patients (290). If not, there's a boundary or formula error. - Cumulative Survival is monotonically non-increasing.
Grading Part 3: /25 total — Correct intervals and COUNTIFS (10), N_at_Risk recursion correct (5), Interval & Cumulative Survival formulas correct (5), verification checks pass (5).
Instructor Key:
- Chart type: Scatter with Straight Lines and Markers is acceptable. True KM is a step function — mention in class that the scatter approximation is adequate for this course.
- Y-axis must go from 0 to 1.0 (or 0% to 100%).
- The curve should show a steady decline. Median survival (where curve crosses 0.5) is roughly around 30–36 months for this dataset.
Instructor Note: If students use a line chart instead of scatter, the X-axis spacing may be wrong (treating intervals as categories rather than numeric). Deduct partially.
Grading Part 4: /10 total — Correct chart type (3), proper axis formatting (3), title and labels present (2), clean design (2).
Instructor Key:
The COUNTIFS formulas add one criterion:
=COUNTIFS(Tbl_Clinical[Time_Months], ">"&Start, Tbl_Clinical[Time_Months], "<="&End, Tbl_Clinical[Event_Status], 1, Tbl_Clinical[Treatment_Arm], "Standard")N_at_Risk for Standard arm first interval:
=COUNTIF(Tbl_Clinical[Treatment_Arm], "Standard")→ ~140–145N_at_Risk for Intensive arm first interval:
=COUNTIF(Tbl_Clinical[Treatment_Arm], "Intensive")→ ~140–150The Intensive arm should show higher cumulative survival at most time points (the data was generated with lower event probability for Intensive: p_event = 0.62 vs 0.72).
Instructor Note — Common errors:
- Students forget to add the
Treatment_Armcriterion to both Events and Censored formulas.- Students use the overall N_at_Risk instead of arm-specific N_at_Risk.
- The two series are plotted on separate charts instead of overlaid.
Instructor Key:
- Two series on one chart, clearly labeled with legend.
- The Intensive arm curve should be above the Standard arm curve (better survival).
- The separation should be visible from ~12 months onward.
Alternative acceptable approach: Some students may filter the data using Slicers on a copy, then build separate life tables. This is valid as long as the formulas are correct.
Instructor Key — expected answers:
- Intensive arm has better survival (curve stays higher).
- Difference becomes visible around 12–24 months and widens over time.
- No formal statistical conclusion — a visual comparison alone cannot determine significance. Log-rank test or Cox regression (Week 11) is needed.
Grading: Full credit requires acknowledging the limitation in point 3.
Grading Part 5: /25 total — Two correct stratified life tables (10), comparison chart with both series (8), interpretation with limitation acknowledged (7).
Instructor Key:
- X =
Severity_Score, Y =Time_Months, two series byEvent_Status.- Approach 1: Helper column
=IF([@Event_Status]=1, "Event", "Censored"), then manually select ranges for each series.- Approach 2: Sort by
Event_Status, select Event rows as Series 1, Censored rows as Series 2.Expected pattern: Higher severity patients tend to have slightly shorter times (subtle — the synthetic data includes a 0.85 multiplier for Severity > 65). Censored observations should be spread across the severity range but tend to appear at various time points (they didn't experience the event during follow-up).
Instructor Key:
- Mild negative relationship between severity and time (sicker patients may have earlier events).
- Censored observations appear throughout — they are not clustered only at long times, which is realistic (patients may be censored at any time for administrative reasons).
- This scatter helps visualize informative vs. non-informative censoring concepts.
Grading Part 6: /15 total — Two-series scatter correct (8), formatting and labels (3), interpretation (4).
Instructor Note: Look for five elements:
- Cohort summary — correct numbers.
- Overall survival — references the curve and approximate median.
- Treatment comparison — identifies which arm is better visually.
- Limitations — explicitly states that visual comparison is not a statistical test; mentions log-rank or Cox for formal inference.
- Censoring — explains that censored patients are not "failures" and that ignoring censoring would bias results.
Grading Part 7: /10 total — All 5 elements present and coherent.
| Component | Points |
|---|---|
| Part 1: Import & data inspection | /5 |
| Part 2: Exploratory covariate analysis (pivots + chart + interpretation) | /15 |
| Part 3: Overall life table (formulas + verification) | /25 |
| Part 4: Overall survival curve | /10 |
| Part 5: Stratified survival (two life tables + comparison chart + interpretation) | /25 |
| Part 6: Severity vs Time scatter (two-series + interpretation) | /15 |
| Part 7: Summary memo | /5 |
| Total | /100 |
Regenerate clinical data with:
python3 generate_lab4_multivariate_data.py
→ produces Lab4-Multivariate-Clinical.csv