Patient Survival Analysis Dataset Instructions (GBSG2)

Dataset Overview

This dataset is based on data from the German Breast Cancer Study Group 2 (GBSG2) trial and covers 686 patients with primary node-positive breast cancer. It includes survival times, event indicators, and prognostic factors for survival analysis.

Variables Description

Survival Analysis Variables

  • time: Time in days to recurrence, death, or censoring (integer)
  • status: Event indicator (1 = recurrence or death, 0 = censored)

Patient Characteristics

  • age: Patient age in years (integer, typically 25-80)
  • menostat: Menopausal status (Pre, Post)

Treatment Information

  • horTh: Hormonal therapy (yes, no)

Clinical Variables

  • tsize: Tumor size in mm (integer, typically 5-60)
  • tgrade: Tumor grade (1, 2, 3)
    • 1 = Well differentiated (low grade)
    • 2 = Moderately differentiated (intermediate grade)
    • 3 = Poorly differentiated (high grade)
  • pnodes: Number of positive lymph nodes (integer, 0-20)
  • progrec: Progesterone receptor level in fmol (integer, 0-200)
  • estrec: Estrogen receptor level in fmol (integer, 0-200)

Survival Analysis Notes

  • Event: Death or recurrence, whichever comes first (status = 1)
  • Time: Days from diagnosis/treatment to the event or to censoring
  • Censoring: Patients who had not experienced the event by the end of the observation period, or who were lost to follow-up (status = 0)
  • Censored observations: For these patients, time is a lower bound on the true time to event, not the event time itself
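
To make the role of censoring concrete, here is a minimal sketch of the Kaplan-Meier (product-limit) estimate in plain Python. The function name kaplan_meier and the toy values are assumptions made for illustration. They are not rows from the dataset, and this is not necessarily how the dataset was analyzed originally.

```python
from collections import Counter

def kaplan_meier(times, statuses):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate)."""
    events = Counter(t for t, s in zip(times, statuses) if s == 1)
    exits = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Only events lower the curve; censored patients just leave the risk set.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]
    return curve

# Toy values invented for illustration (status: 1 = event, 0 = censored).
toy_time = [5, 8, 8, 12, 15, 20]
toy_status = [1, 0, 1, 1, 0, 1]
km_curve = kaplan_meier(toy_time, toy_status)
for t, s in km_curve:
    print(t, round(s, 4))
```

Censored patients count toward the at-risk denominator up to their censoring time and are then dropped without lowering the curve. This is why they should not be treated as events or deleted from the data.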

Data Quality

  • No missing values
  • All variables use the types and category labels listed under Variables Description
  • Every patient has both a time and a status value
  • Event indicators are binary (0/1)
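
The checks above can be re-verified after any manual editing of the file. The sketch below assumes the data has been exported to CSV with a header row using the variable names from this document. That layout, the helper name validate_rows, and the two toy rows are assumptions for illustration, not part of the dataset.

```python
import csv
import io

# Column names follow the variable list in this document; the CSV layout is assumed.
REQUIRED = ["time", "status", "age", "menostat", "horTh",
            "tsize", "tgrade", "pnodes", "progrec", "estrec"]
ALLOWED = {
    "status": {"0", "1"},
    "menostat": {"Pre", "Post"},
    "horTh": {"yes", "no"},
    "tgrade": {"1", "2", "3"},
}
INTEGER_COLUMNS = ["time", "age", "tsize", "pnodes", "progrec", "estrec"]

def validate_rows(rows):
    """Return a list of readable problems; an empty list means every check passed."""
    problems = []
    for i, row in enumerate(rows, start=1):
        for col in REQUIRED:
            value = (row.get(col) or "").strip()
            if value == "":
                problems.append(f"row {i}: {col} is missing")
            elif col in ALLOWED and value not in ALLOWED[col]:
                problems.append(f"row {i}: {col} has unexpected value {value!r}")
            elif col in INTEGER_COLUMNS and not value.isdecimal():
                problems.append(f"row {i}: {col} is not a non-negative integer")
    return problems

# Two invented rows for illustration only; the second has an invalid status
# and a blank age, so it should be reported twice.
toy_csv = io.StringIO(
    "time,status,age,menostat,horTh,tsize,tgrade,pnodes,progrec,estrec\n"
    "1200,1,55,Post,yes,25,2,3,40,60\n"
    "900,2,,Pre,no,30,3,5,10,20\n"
)
problems = validate_rows(list(csv.DictReader(toy_csv)))
for p in problems:
    print(p)
```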

Use Cases

This dataset is suitable for:

  • Calculating survival probabilities over time
  • Comparing survival across treatment groups (hormonal therapy)
  • Identifying prognostic factors (age, tumor characteristics)
  • Creating survival curves for different patient groups
  • Analyzing the impact of clinical variables on patient outcomes
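
As one possible way to compare survival across two groups such as horTh = yes and horTh = no, the sketch below implements a two-group log-rank test in plain Python. The function name logrank_test and the toy values are assumptions for illustration. The p-value uses the chi-square distribution with one degree of freedom, which is only an approximation for samples this small.

```python
import math

def logrank_test(times, statuses, groups, group_a):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    observed_a = expected_a = variance = 0.0
    event_times = sorted({t for t, s in zip(times, statuses) if s == 1})
    for t in event_times:
        # Risk set: everyone whose time is at least t.
        n = sum(1 for x in times if x >= t)
        n_a = sum(1 for x, g in zip(times, groups) if x >= t and g == group_a)
        d = sum(1 for x, s in zip(times, statuses) if x == t and s == 1)
        d_a = sum(1 for x, s, g in zip(times, statuses, groups)
                  if x == t and s == 1 and g == group_a)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group_a at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Toy values invented for illustration (not rows from the dataset).
toy_time = [5, 8, 12, 20, 25, 30]
toy_status = [1, 1, 1, 1, 0, 0]
toy_horTh = ["no", "no", "no", "yes", "yes", "yes"]
chi2, p_value = logrank_test(toy_time, toy_status, toy_horTh, "yes")
print(round(chi2, 3), round(p_value, 4))
```

A small p-value suggests the two survival curves differ, but it does not adjust for other prognostic factors such as age or tumor grade. A regression model would be needed for that.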

Excel Analysis Tips

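  • Use PivotTables to count events and censored patients by time period (for example, 365-day intervals), then derive survival probabilities from those counts. Censored patients should leave the at-risk count rather than be counted as events
  • Chart cumulative survival probability against time to visualize survival curves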
  • Compare survival between treatment groups (horTh: yes vs no)
  • Analyze survival by tumor grade, size, or node status
  • Use conditional formulas (for example, IF or COUNTIFS) to segment patients by characteristics such as age band or tumor grade
  • Create comparative visualizations showing survival differences
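
To cross-check a period-based spreadsheet calculation, the sketch below computes an actuarial life table in Python. The function name life_table, the 365-day interval width, and the toy values are assumptions for illustration. The half-interval adjustment for censored patients is the usual actuarial convention, not something prescribed by this dataset.

```python
def life_table(times, statuses, width):
    """Actuarial life table: one row per interval [start, start + width)."""
    if any(t < 0 for t in times):
        raise ValueError("times must be non-negative")
    rows = []
    surv = 1.0
    n_start = len(times)
    start = 0
    while n_start > 0:
        end = start + width
        d = sum(1 for t, s in zip(times, statuses) if start <= t < end and s == 1)
        c = sum(1 for t, s in zip(times, statuses) if start <= t < end and s == 0)
        # Censored patients are treated as at risk for half of the interval.
        effective = n_start - c / 2
        surv *= 1 - d / effective
        rows.append({"start": start, "end": end, "at_risk": n_start,
                     "events": d, "censored": c, "survival": surv})
        n_start -= d + c
        start = end
    return rows

# Toy values invented for illustration (not rows from the dataset).
toy_time = [100, 200, 400, 500, 700, 800]
toy_status = [1, 0, 1, 0, 1, 0]
table = life_table(toy_time, toy_status, width=365)
for row in table:
    print(row)
```

The events and censored columns correspond to what a PivotTable grouped by time period would count. The survival column is the running product of the per-interval survival fractions.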

Medical Context

This dataset reflects a typical survival analysis problem in oncology, where understanding patient prognosis and treatment effectiveness is critical for clinical decision-making.