Lab 4: Survival Analysis with Excel

Overview

In this lab you will apply the survival analysis concepts from the Week 7 lecture to a multivariate clinical dataset (Lab4-Multivariate-Clinical.csv, ~290 patients). You will work through the full analytical pipeline: importing and understanding the data, exploring covariates with PivotTables, building life tables, plotting survival curves, comparing treatment groups, and interpreting the results.

Estimated time: 120 minutes
Tools: Microsoft Excel (2016 or newer)


Dataset

File: Lab4-Multivariate-Clinical.csv

Column Type Description
Patient_ID Text Unique patient identifier
Age Number Age at enrollment (years)
Gender Text Female / Male
Treatment_Arm Text Standard / Intensive
Severity_Score Number Disease severity at baseline (0–100; higher = more severe)
Baseline_CRP Number C-Reactive Protein at baseline (mg/L; inflammatory biomarker)
Region_Clinic Text North / Central / South
Time_Months Number Follow-up time in months
Event_Status Number 1 = event occurred; 0 = censored (no event by end of follow-up)

Clinical context: Patients with a chronic condition were randomized to Standard or Intensive treatment. Time_Months records how long each patient was followed. Event_Status = 1 means the outcome of interest was observed; 0 means the patient was still event-free when follow-up ended (right-censored).


Part 1: Import & Prepare (10 min)

Task 1.1 — Import the CSV

  1. Open Excel. Go to Data > Get Data > From Text/CSV (or Data > From Text).
  2. Select Lab4-Multivariate-Clinical.csv. Preview the data to confirm columns.
  3. Click Load to import into a new worksheet.
  4. Select any cell in the imported data. Press Ctrl+T (or Insert > Table) to convert to an Excel Table. Name it Tbl_Clinical (Table Design > Table Name).

Task 1.2 — Quick data inspection

On a new sheet named Summary, answer:

  1. How many patients are in the dataset? =COUNTA(Tbl_Clinical[Patient_ID])
  2. How many experienced the event? =COUNTIF(Tbl_Clinical[Event_Status], 1)
  3. How many were censored? =COUNTIF(Tbl_Clinical[Event_Status], 0)
  4. What is the overall event rate (%)? = Events / Total * 100

Part 2: Exploratory Covariate Analysis (20 min)

Goal: Before building survival curves, understand who is in the study. Covariates can influence outcomes and are essential context.

Task 2.1 — Patient distribution by Treatment and Gender

  1. Insert a PivotTable on a new sheet Exploration.
  2. Rows: Treatment_Arm. Columns: Gender. Values: Count of Patient_ID.
  3. Question: Are the treatment arms roughly balanced by gender?

Task 2.2 — Baseline characteristics by Treatment Arm

  1. Insert another PivotTable (same sheet or new).
  2. Rows: Treatment_Arm.
  3. Values: Average of Age, Average of Severity_Score, Average of Baseline_CRP.
  4. Format the averages to 1 decimal place.
  5. Question: Are the two treatment arms similar at baseline, or is one group older/sicker?

Task 2.3 — Regional CRP variation

  1. PivotTable: Rows = Region_Clinic, Values = Average of Baseline_CRP.
  2. Insert a Bar Chart from this PivotTable (Insert > Bar Chart).
  3. Apply clean formatting: remove gridlines, add a descriptive title, hide field buttons.
  4. Question: Which region has the highest baseline inflammation?

Task 2.4 — Brief written interpretation

Write 3–4 sentences summarizing what you learned from Tasks 2.1–2.3. Are the groups comparable? Any imbalances worth noting?


Part 3: Life Table — Overall Cohort (25 min)

Goal: Build a grouped life table by hand in Excel and calculate cumulative survival.

Task 3.1 — Set up the life table

  1. Create a new sheet named Survival.
  2. Build a table with the following column headers:
Interval Start End N_at_Risk Events Censored Interval_Survival Cumulative_Survival
  1. Define 5 intervals: 0–12, 12–24, 24–36, 36–48, 48–60 months.

Task 3.2 — Count Events and Censored per interval

For each interval, use COUNTIFS to count:

For the first interval (0–12), use ">="&0 or ">=0" for the lower bound.

Task 3.3 — Calculate N at Risk

Task 3.4 — Calculate survival probabilities

Task 3.5 — Verify your table


Part 4: Survival Curve — Overall (10 min)

Task 4.1 — Plot the overall survival curve

  1. Select the End (or midpoint) column as X and Cumulative_Survival as Y.
  2. Insert > Scatter > Scatter with Straight Lines and Markers.
  3. Format:
  4. Y-axis: set maximum to 1.0 (or 100%). Minimum to 0.
  5. Title: "Overall Survival Curve (Life Table Method)".
  6. X-axis label: "Time (Months)".
  7. Y-axis label: "Cumulative Survival Probability".
  8. Remove unnecessary gridlines for a cleaner look.

Part 5: Stratified Survival — Treatment Comparison (30 min)

Goal: Compare survival between Standard and Intensive treatment arms.

Task 5.1 — Build two stratified life tables

On the Survival sheet (or a new sheet Stratified), create two life tables side by side:

Use the same intervals (0–12, 12–24, 24–36, 36–48, 48–60).

For the COUNTIFS formulas, add one more criterion:
Tbl_Clinical[Treatment_Arm], "Standard" (or "Intensive").

Calculate N_at_Risk, Events, Censored, Interval_Survival, and Cumulative_Survival for each arm separately.

Task 5.2 — Plot both curves on one chart

  1. Create a chart with two series:
  2. Series 1: Standard arm Cumulative_Survival (use one color, e.g., blue).
  3. Series 2: Intensive arm Cumulative_Survival (use another color, e.g., orange).
  4. Insert > Scatter > Scatter with Straight Lines and Markers.
  5. Format:
  6. Add a legend identifying each arm.
  7. Title: "Survival by Treatment Arm".
  8. Same axis formatting as Part 4.

Task 5.3 — Interpret the comparison

Write 2–3 sentences answering: 1. Which treatment arm appears to have better survival? 2. At what time point does the difference become most visible? 3. Can we conclude one treatment is statistically better from this visual alone? Why or why not?


Part 6: Multivariate Scatter — Severity vs. Time (15 min)

Goal: Visualize the relationship between disease severity and follow-up time, distinguishing events from censored observations.

Task 6.1 — Prepare data for two-series scatter

  1. On the Tbl_Clinical table, add a helper column: Is_Event = =IF([@Event_Status]=1, "Event", "Censored").
  2. Alternatively, work with two filtered ranges.

Task 6.2 — Create the scatter plot

  1. X-axis: Severity_Score. Y-axis: Time_Months.
  2. Plot two series with different markers/colors:
  3. Events (Event_Status = 1): solid markers.
  4. Censored (Event_Status = 0): open or lighter markers.
  5. Title: "Severity Score vs. Follow-up Time by Event Status".

Task 6.3 — Interpret

Write 2–3 sentences: 1. Is there a visible relationship between severity and time? 2. Where do the censored observations tend to cluster? What does this tell you?


Part 7: Summary & Reflection (10 min)

Write a short memo (half a page to one page) covering:

  1. Cohort summary: How many patients, event rate, key baseline differences between arms.
  2. Overall survival: What does the overall curve show? At what time point does survival drop below 50%?
  3. Treatment comparison: Which arm looks better visually? How large is the gap?
  4. Limitations: Why can't we draw a firm statistical conclusion from life tables alone? (Hint: no formal test like log-rank or Cox regression — that comes in Week 11.)
  5. Censoring awareness: How does censoring affect interpretation?

Deliverables

Submit one workbook StudentName_ID_Lab4.xlsx containing:

Sheet Content
Tbl_Clinical (or Data) Imported dataset as Excel Table
Summary Quick counts (total, events, censored, event rate)
Exploration PivotTables (Tasks 2.1–2.3) + bar chart + written interpretation
Survival Overall life table + overall survival curve
Stratified Two stratified life tables + comparison chart + interpretation
Scatter Severity vs. Time scatter + interpretation

Plus the short memo (in a text box on a sheet, or as a separate Word/PDF upload).

Submit via the LMS.


Academic integrity

Use your own Excel file; course datasets are for learning only.