Lab 3: Medical EDA with Confidence Intervals and p-values
Load and clean medical data, run descriptive analysis, segment with PivotTables, and practice CI/p-value interpretation in Excel.
Important: Excel vs Google Sheets
This lab is designed for Microsoft Excel. You may use formulas in both tools, but menu paths for Power Query, PivotTables, and some statistical options are Excel-specific.
Dataset Overview
Introduction
This lab uses a synthetic Medical EDA dataset (patient-level records) created for Week 5 content:
- Descriptive statistics
- Summary tables
- Data segmentation
- PivotTables
- Introductory confidence intervals (95% CI) and p-values
Dataset Files
- Google Drive (recommended): Lab 2/3 Dataset Folder
- Course website (alternative):
- Medical-EDA.csv
- Medical-EDA.xlsx
- Student package:
student desk/Lab-3-Dataset/(if provided)
Variables
| Column | Type | Description |
|---|---|---|
| PatientID | Text | Unique patient ID |
| VisitDate | Date/Text | Visit date (some rows need type fixes) |
| Age | Number | Age in years |
| Sex | Text | Female/Male |
| Region | Text | North/South/East/West (inconsistent spacing/case in some rows) |
| HospitalUnit | Text | Clinical unit |
| DiagnosisGroup | Text | Hypertension/Diabetes/Respiratory (inconsistent case in some rows) |
| TreatmentGroup | Text | A/B (some lowercase values) |
| BaselineSBP | Number | Baseline systolic blood pressure |
| FollowupSBP | Number | Follow-up systolic blood pressure |
| LDL | Number/Text | LDL cholesterol (mixed numeric/text in some rows) |
| HbA1c | Number | Glycemic marker (some missing values) |
| BMI | Number | Body Mass Index |
| AdherenceScore | Number/Text | Medication adherence score (mixed numeric/text in some rows) |
| LengthOfStay | Number | Days in hospital |
| Readmission30d | Binary (0/1) | 30-day readmission indicator |
| Notes | Text | Optional notes |
Data Quality Notes
- Duplicate records
- Blank rows
- Mixed text/number types (LDL, AdherenceScore)
- Inconsistent category formatting (Region, DiagnosisGroup, TreatmentGroup)
- Missing values (Region, HbA1c)
- Outliers (BMI, LengthOfStay)
Learning Objectives
By the end of this lab, you can:
- Import and clean medical data in Excel
- Produce descriptive statistics and segmented summaries
- Build PivotTables for subgroup analysis
- Compute and interpret 95% CIs
- Compute and interpret p-values for simple comparisons
Quick Reference (Week 5 Skills)
- Import:
Data -> Get Data -> From File -> From Text/CSVorFrom Excel Workbook - Open query again:
Data -> Queries & Connections -> Right-click query -> Edit - PivotTable:
Insert -> PivotTable - Pivot grouping: right-click date/number field -> Group
- Pivot filters: Report Filter and optional Slicer
- Pivot design: Subtotals, Grand Totals, Report Layout
- Pivot calculated field: PivotTable Analyze -> Fields, Items, & Sets -> Calculated Field
- Refresh:
Data -> Refresh All - 95% CI formula:
mean +/- t* * (SD / sqrt(n)) - Two-sample p-value:
=T.TEST(rangeA, rangeB, 2, 3) - Categorical p-value (optional):
=CHISQ.TEST(actual_range, expected_range)
Lab Exercises
Part A: Load and Clean
Exercise A1: Import and Inspect
Question: Import Medical-EDA.csv (or .xlsx) and identify at least 4 quality issues.
Instructions:
1. Import with Transform Data.
2. Inspect data types and scan columns for missing values and inconsistent categories.
3. Record at least 4 issues found.
Exercise A2: Clean Core Columns
Question: Clean data so all key columns are analysis-ready.
Instructions:
1. Remove fully blank rows.
2. Remove duplicates (all columns).
3. Trim text columns: Region, DiagnosisGroup, TreatmentGroup.
4. Replace category variants (e.g., a -> A, b -> B, uppercase/lowercase diagnosis labels -> standard labels).
5. Set data types:
- VisitDate -> Date
- Age, BaselineSBP, FollowupSBP, LDL, HbA1c, BMI, AdherenceScore, LengthOfStay -> Number
- Readmission30d -> Whole Number
Part B: Descriptive Statistics
Exercise B1: Overall Summary
Question: Build a summary table for BaselineSBP, FollowupSBP, LDL, and BMI.
Instructions:
1. On a new sheet, create a table with metrics: Mean, Median, SD, Min, Max, N.
2. Use formulas such as:
- AVERAGE(range)
- MEDIAN(range)
- STDEV.S(range)
- MIN(range), MAX(range)
- COUNT(range)
Exercise B2: Segment by Treatment
Question: Compare FollowupSBP and LengthOfStay between Treatment A and B.
Instructions:
1. Use AVERAGEIFS (or PivotTable) to compute mean by treatment.
2. Report group means side-by-side.
Part C: Summary Tables and Segmentation
Exercise C1: COUNTIFS and AVERAGEIFS
Question: Create a segmented summary by Region and DiagnosisGroup.
Instructions:
1. Build a small matrix of counts: COUNTIFS(Region, r, DiagnosisGroup, d).
2. Build another matrix of means: AVERAGEIFS(FollowupSBP, Region, r, DiagnosisGroup, d).
Exercise C2: Readmission Segmentation
Question: Compare readmission rates by TreatmentGroup and Sex.
Instructions:
1. Compute subgroup readmission rate: AVERAGEIFS(Readmission30d, TreatmentGroup, g, Sex, s).
2. Convert to percent format.
Part D: PivotTables
Exercise D1: PivotTable 1 (Clinical Outcome by Treatment)
Question: Summarize FollowupSBP and LengthOfStay by TreatmentGroup.
Instructions:
1. Insert PivotTable from clean table.
2. Rows: TreatmentGroup
3. Values: Average of FollowupSBP, Average of LengthOfStay
Exercise D2: PivotTable 2 (Readmission by Region and Diagnosis)
Question: Build a segmented PivotTable for readmission.
Instructions:
1. Rows: Region
2. Columns: DiagnosisGroup
3. Values: Average of Readmission30d (format as %)
4. Optional Filter: Sex
Exercise D3: PivotTable 3 (Time Grouping by Month/Quarter)
Question: Group patient visits by month/quarter and summarize average follow-up SBP.
Instructions:
1. Create a new PivotTable from the clean table.
2. Rows: VisitDate
3. Values: Average of FollowupSBP
4. Right-click any date in the PivotTable -> Group -> select Months (and optionally Quarters).
5. Add TreatmentGroup to Filters to compare A vs B after grouping.
Exercise D4: Pivot Layout, Filters, and Slicers
Question: Improve readability and interactivity of a PivotTable.
Instructions:
1. Start from PivotTable 1 or 2.
2. Turn on/off Subtotals and Grand Totals (PivotTable Design tab).
3. Change Report Layout (Compact/Outline/Tabular) and choose the clearest format.
4. Add at least one report filter (e.g., Sex or HospitalUnit).
5. Optional: Insert a Slicer for TreatmentGroup and test interactive filtering.
Exercise D5 (optional): Calculated Field in PivotTable
Question: Add a calculated field for SBP improvement (BaselineSBP - FollowupSBP) in a PivotTable.
Instructions:
1. Open a PivotTable built from the cleaned data.
2. PivotTable Analyze -> Fields, Items, & Sets -> Calculated Field.
3. Name: SBP_Improvement.
4. Formula: =BaselineSBP - FollowupSBP
5. Add TreatmentGroup as Rows and summarize average SBP_Improvement.
Part E: Confidence Intervals and p-values
Exercise E1: 95% CI for Mean FollowupSBP (Overall)
Question: Calculate the 95% CI for mean FollowupSBP.
Instructions:
1. Compute:
- mean = AVERAGE(range)
- sd = STDEV.S(range)
- n = COUNT(range)
2. Compute t critical:
- t_star = T.INV.2T(0.05, n-1)
3. Margin of error:
- ME = t_star * sd / SQRT(n)
4. CI bounds:
- Lower = mean - ME
- Upper = mean + ME
Exercise E2: 95% CI by Treatment Group
Question: Compute separate 95% CIs for Treatment A and B (FollowupSBP).
Instructions:
1. Filter or reference each treatment group separately.
2. Repeat E1 formulas for each group.
3. Compare intervals qualitatively (overlap vs separation).
Exercise E3: p-value for Difference in Means (Treatment A vs B)
Question: Is FollowupSBP different between Treatment A and B?
Instructions:
1. Put group ranges in two columns (A and B).
2. Use:
- =T.TEST(rangeA, rangeB, 2, 3)
- 2 = two-tailed, 3 = unequal variances (Welch)
3. Decision rule:
- If p < 0.05, reject equal-means null hypothesis.
Exercise E4 (optional): p-value for Categorical Association
Question: Is 30-day readmission associated with TreatmentGroup?
Instructions:
1. Build a 2x2 observed table (TreatmentGroup x Readmission30d) using COUNTIFS or PivotTable.
2. Build expected counts table.
3. Use CHISQ.TEST(observed_range, expected_range).
Part F: Short Interpretation
Exercise F1: Mini report paragraph
Question: Write 5-8 sentences summarizing findings.
Instructions:
Include:
1. One descriptive-statistics insight
2. One segmentation/PivotTable insight
3. One CI statement
4. One p-value statement
5. One practical implication for healthcare decision-making
Verification Checklist
- [ ] Cleaned dataset loaded in Excel
- [ ] Descriptive statistics table completed
- [ ] At least 2 segmented summary tables completed
- [ ] At least 3 PivotTables completed (including one grouped by month/quarter)
- [ ] At least one Pivot filter (or slicer) used
- [ ] Optional calculated field tested (
SBP_Improvement) - [ ] 95% CI computed correctly (overall and by treatment)
- [ ] p-value test completed and interpreted correctly
- [ ] Mini report paragraph completed
Common Errors and Fixes
- Numbers treated as text
- Fix with data type conversion in Power Query or
VALUE()in Excel. - Date not recognized
- Check locale and source format; re-parse date column.
- T.TEST returns error
- Ensure both ranges are numeric and contain no text blanks.
- CHISQ.TEST error
- Observed and expected ranges must have same shape and non-zero expected counts.
Conclusion
You practiced the full Week 5 EDA workflow on medical data:
- Load and clean
- Descriptive and segmented summaries
- PivotTables
- Confidence intervals and p-values
These outputs feed directly into Group Progress Report 3 and prepare you for later visualization and statistical-analysis weeks.