Midterm Practical Lab Exam — Instructor Version (With Answer Key)

Course: Data Analysis with Spreadsheet Program Scope: Weeks 1-5 (Excel Basics, Data Import & Cleaning, EDA & PivotTables) Duration: 60 minutes | Total Points: 100


Exam Rules

  • You may use Microsoft Excel or Google Sheets.
  • You may NOT use the internet, AI tools, or communicate with other students.
  • Submit two deliverables at the end of the exam:
  • Your Excel workbook (.xlsx) — saved as StudentID_Midterm.xlsx
  • Your printed Response Form (filled in by hand or typed)

Dataset Overview

You are given a Student Academic Records dataset (StudentRecords.csv or StudentRecords.xlsx) containing student enrollment and academic data. The Excel file has two sheets:

  1. StudentRecords — Main data table (~165 rows, before cleaning)
  2. FacultyInfo — Faculty lookup table (4 rows)

Main Table Columns

Column Type Description
StudentID Text Unique student identifier (e.g., SV-1001)
EnrollDate Date/Text Enrollment date (some rows have inconsistent format)
Faculty Text Faculty name (some inconsistent spacing/case)
Program Text Undergraduate or Graduate
Gender Text Female or Male
Province Text Home province (some missing)
GPA Number Grade point average (0-4.0 scale; outliers exist)
Credits Number Total credits earned
MathScore Number/Text Math exam score (mixed type in some rows)
EnglishScore Number English exam score (some missing)
Scholarship Text Yes/No (some lowercase inconsistencies)
TuitionFee Number Annual tuition in millions VND
GraduationStatus Binary (0/1) 1 = graduated on time

FacultyInfo Lookup Table

FacultyCode FacultyFullName Dean
Business Business Administration Dr. Tran Van A
IT Information Technology Dr. Le Thi B
Economics Economics Dr. Nguyen Van C
Law Law Dr. Pham Thi D

Part A: Import & Clean (20 pts, ~10 min)

Exercise A1: Import and Identify Issues (8 pts)

Question: Import StudentRecords.csv using Power Query and identify data quality issues.

Instructions: 1. Open Excel. Go to Data → Get Data → From Text/CSV. 2. Select StudentRecords.csv and click Transform Data to open Power Query Editor. 3. Inspect the data preview: check column types, scan for blanks, duplicates, and inconsistencies.

Response Form: - Q-A1a: Record the total number of rows in the raw dataset. - Q-A1b: List at least 3 specific data quality issues (column name + issue type).

Answer: - Q-A1a: 165 rows (including header, the data has 165 data rows). - Q-A1b: Typical issues include: 1. Faculty — Inconsistent spacing/capitalization (e.g., " business ", "BUSINESS", "Business") 2. MathScore — Mixed text/number type (some values stored as text with spaces like " 70.3 ") 3. EnrollDate — Inconsistent date format (most are YYYY-MM-DD, some are DD/MM/YYYY) 4. Province — Missing values in ~5 rows 5. EnglishScore — Missing values in ~5 rows 6. GPA — Outlier values above 4.0 (e.g., 4.6, 4.82) 7. Blank rows — 2 fully blank rows 8. Duplicate rows — 3 exact duplicate records 9. Scholarship — Some "yes"/"no" instead of "Yes"/"No"


Exercise A2: Clean the Data (12 pts)

Question: Clean the dataset in Power Query so it is analysis-ready.

Instructions: 1. Remove fully blank rows (select all columns → Remove Rows → Remove Blank Rows). 2. Remove duplicate rows (Home → Remove Rows → Remove Duplicates). 3. Select the Faculty column → Transform → Trim → Capitalize Each Word. 4. Select MathScore → Change Type → Decimal Number (errors become null). 5. Select EnrollDate → Change Type → Date (use locale if prompted). 6. Close & Load to a new worksheet.

Response Form: - Q-A2a: How many rows remain after removing blank rows and duplicates? - Q-A2b: How many distinct Faculty values after standardization? - Q-A2c: How many errors appeared when converting MathScore to Number? - Q-A2d: Screenshot of Applied Steps panel.

Answer: - Q-A2a: 160 rows (165 − 2 blank − 3 duplicates = 160). - Q-A2b: 4 distinct faculties (Business, Economics, IT, Law). Note: "IT" may appear as "It" if using Capitalize Each Word — this is acceptable. - Q-A2c: 0 errors if spaces are trimmed first; the text-as-number values (e.g., " 70.3 ") convert successfully once leading/trailing spaces are removed. If not trimmed, approximately 10 errors. - Q-A2d: Should show steps like: Source, Changed Type, Removed Blank Rows, Removed Duplicates, Trimmed Text (Faculty), Capitalized Each Word (Faculty), Changed Type (MathScore), Changed Type (EnrollDate).


Part B: Formulas & Lookup (20 pts, ~10 min)

Exercise B1: Basic Summary (6 pts)

Question: Calculate basic summary statistics on a new "Analysis" sheet.

Instructions: 1. Create a new sheet named "Analysis". 2. Calculate: - Total number of students: =COUNTA(StudentRecords[StudentID]) (or reference the loaded table range) - Average GPA: =AVERAGE(StudentRecords[GPA]) - Highest TuitionFee: =MAX(StudentRecords[TuitionFee]) - Lowest TuitionFee: =MIN(StudentRecords[TuitionFee])

Response Form: - Q-B1a: Total number of students. - Q-B1b: Average GPA (round to 2 decimals). - Q-B1c: Highest and Lowest TuitionFee.

Answer: - Q-B1a: 160 - Q-B1b: 3.08 - Q-B1c: Highest = 47.5, Lowest = 15.0


Exercise B2: Conditional Logic (7 pts)

Question: Classify students by academic standing using nested IF, then count each category.

Instructions: 1. Add a new column AcademicStanding in the cleaned data (or on the Analysis sheet). 2. Formula: =IF([@GPA]>=3.5, "Excellent", IF([@GPA]>=2.5, "Good", "Needs Improvement")) 3. Copy the formula down for all rows. 4. Use COUNTIF to count each category on the Analysis sheet.

Response Form: - Q-B2a: Write the exact IF formula used. - Q-B2b: Fill in the count for each standing.

Answer: - Q-B2a: =IF([@GPA]>=3.5,"Excellent",IF([@GPA]>=2.5,"Good","Needs Improvement")) (accept equivalent formulations) - Q-B2b:

Academic Standing Count
Excellent ~37
Good ~97
Needs Improvement ~26

(Values may vary slightly depending on how outlier GPAs above 4.0 are handled.)


Exercise B3: VLOOKUP (7 pts)

Question: Use VLOOKUP to add the Dean name from the FacultyInfo lookup sheet.

Instructions: 1. Add a new column Dean in the cleaned data sheet. 2. Formula: =VLOOKUP([@Faculty], FacultyInfo!A:C, 3, FALSE) - Lookup value: Faculty name in the current row - Table array: FacultyInfo sheet columns A to C - Column index: 3 (Dean is the 3rd column) - Exact match: FALSE 3. Copy the formula down for all rows.

Response Form: - Q-B3a: Write the exact VLOOKUP formula used. - Q-B3b: Which Faculty appears most frequently, and who is its Dean? - Q-B3c: What error does VLOOKUP return for a non-existent FacultyCode?

Answer: - Q-B3a: =VLOOKUP([@Faculty], FacultyInfo!A:C, 3, FALSE) (accept variants with explicit ranges) - Q-B3b: Depends on the random distribution. Check COUNTIF for each faculty. The Dean corresponds to the lookup table above. - Q-B3c: #N/A


Part C: Descriptive Statistics & Summary Tables (20 pts, ~12 min)

Exercise C1: Descriptive Statistics Table (10 pts)

Question: Build a descriptive statistics table for GPA and MathScore.

Instructions: 1. On the Analysis sheet, create a table with these metrics for both GPA and MathScore: - Mean: =AVERAGE(range) - Median: =MEDIAN(range) - Std Dev: =STDEV.S(range) - Min: =MIN(range) - Max: =MAX(range) - Count: =COUNT(range)

Response Form: - Q-C1a: Fill in the descriptive statistics table. - Q-C1b: Is the Mean GPA higher or lower than the Median? What does this suggest?

Answer:

Metric GPA MathScore
Mean 3.08 69.00
Median 3.09 70.50
Std Dev 0.56 14.24
Min 1.74 27.4
Max 4.60 99.6
Count 160 160
  • Q-C1b: Mean (3.08) is slightly lower than Median (3.09), suggesting the distribution is approximately symmetric or very slightly left-skewed. The two outlier GPAs above 4.0 pull the mean slightly upward but are not enough to make it exceed the median significantly.

Exercise C2: Segmented Summary (10 pts)

Question: Create segmented summaries using AVERAGEIFS and COUNTIFS.

Instructions: 1. Average GPA by Faculty: - =AVERAGEIFS(GPA_range, Faculty_range, "Business") — repeat for each faculty. 2. Count of scholarship students ("Yes") by Program: - =COUNTIFS(Scholarship_range, "Yes", Program_range, "Undergraduate") - =COUNTIFS(Scholarship_range, "Yes", Program_range, "Graduate")

Response Form: - Q-C2a: Average GPA by Faculty table + which Faculty has the highest average. - Q-C2b: Scholarship count by Program table + the COUNTIFS formula used.

Answer:

Average GPA by Faculty:

Faculty Average GPA
Business 3.10
Economics 2.97
IT 3.03
Law 3.20

Highest average: Law (3.20)

Scholarship count by Program:

Program Scholarship Count
Undergraduate ~5
Graduate ~21

Formula: =COUNTIFS(Scholarship_range, "Yes", Program_range, "Graduate")


Part D: PivotTables (25 pts, ~15 min)

Exercise D1: PivotTable — GPA by Faculty and Program (10 pts)

Question: Create a PivotTable showing average GPA by Faculty and Program.

Instructions: 1. Select the cleaned data range. Go to Insert → PivotTable → place on a new sheet. 2. Drag Faculty to Rows. 3. Drag Program to Columns. 4. Drag GPA to Values → Summarize by Average. 5. Go to PivotTable Design tab → Report Layout → Show in Tabular Form. 6. Enable Grand Totals for both rows and columns.

Response Form: - Q-D1a: Copy the PivotTable results into the provided table. - Q-D1b: Which Faculty+Program combination has the highest average GPA? Is the Graduate-Undergraduate gap consistent across faculties?

Answer:

Faculty Graduate Undergraduate Grand Total
Business 3.41 2.69 3.10
Economics 3.18 2.75 2.97
IT 3.26 2.75 3.03
Law 3.30 3.05 3.20
Grand Total 3.30 2.80 3.08
  • Q-D1b: Business Graduate (3.41) has the highest average GPA. The Graduate-Undergraduate gap varies: it is largest for Business (~0.72) and smallest for Law (~0.25).

Exercise D2: PivotTable — Enrollment by Year (8 pts)

Question: Create a PivotTable showing enrollment count and average tuition by year.

Instructions: 1. Create a new PivotTable from the cleaned data. 2. Drag EnrollDate to Rows → right-click any date → Group → select Years. 3. Drag StudentID to Values → Summarize by Count. 4. Drag TuitionFee to Values → Summarize by Average. 5. Add Gender to Report Filter (or Filters area).

Response Form: - Q-D2a: Number of distinct years and which year had the most enrollments. - Q-D2b: Average TuitionFee for Female students in the most recent year, and for Male students.

Answer: - Q-D2a: 3 years (2022, 2023, 2024). Most enrollments: 2023 (60 students). - Q-D2b: Values depend on the Gender filter. Students should filter the PivotTable by Gender and read the value for 2024.


Exercise D3: Slicer & Observation (7 pts)

Question: Add a Slicer and draw an insight from interactive filtering.

Instructions: 1. Click on PivotTable 1 (from D1). 2. Go to PivotTable Analyze → Insert Slicer → select Scholarship. 3. Click "Yes" in the Slicer to filter to scholarship students only. 4. Observe how the GPA averages change.

Response Form: - Q-D3a: Screenshot of PivotTable 1 with Slicer set to "Yes". - Q-D3b: One specific observation comparing scholarship students' GPA to the overall average.

Answer: - Q-D3a: Screenshot should show the Slicer connected to the PivotTable with "Yes" highlighted. - Q-D3b: Scholarship students have higher average GPA across all faculties (typically above 3.5), which is expected since the scholarship criterion in the dataset is linked to GPA >= 3.5.


Part E: Confidence Interval, p-value & Interpretation (15 pts, ~13 min)

Exercise E1: 95% CI for Mean GPA (6 pts)

Question: Calculate the 95% confidence interval for the overall mean GPA.

Instructions: 1. On the Analysis sheet, compute: - mean = AVERAGE(GPA_range) - sd = STDEV.S(GPA_range) - n = COUNT(GPA_range) - t_star = T.INV.2T(0.05, n-1) - ME = t_star * sd / SQRT(n) - Lower = mean - ME - Upper = mean + ME

Response Form: - Q-E1: Fill in the CI calculation table and write the interpretation sentence.

Answer:

Component Value
Mean 3.08
Std Dev 0.56
N 160
t* 1.9750
Margin of Error 0.087
Lower Bound 2.99
Upper Bound 3.17

Interpretation: "We are 95% confident that the true population mean GPA lies between 2.99 and 3.17."


Exercise E2: Two-Group Comparison (5 pts)

Question: Test whether mean GPA differs between Undergraduate and Graduate students.

Instructions: 1. Identify the GPA values for Undergraduate students (one range) and Graduate students (another range). You can filter, sort, or use helper columns to separate the two groups. 2. Use: =T.TEST(range_undergrad, range_grad, 2, 3) - 2 = two-tailed test - 3 = unequal variances (Welch's t-test) 3. Compare the p-value to 0.05.

Response Form: - Q-E2a: The exact T.TEST formula. - Q-E2b: The p-value result. - Q-E2c: Is p < 0.05? Conclusion statement.

Answer: - Q-E2a: =T.TEST(undergrad_GPA_range, grad_GPA_range, 2, 3) - Q-E2b: p ≈ 0.0000 (very small, essentially zero to 4 decimal places) - Q-E2c: Yes, p < 0.05. Conclusion: "The difference in mean GPA between Undergraduate (2.80) and Graduate (3.30) students is statistically significant at the 5% level."


Exercise E3: Mini Summary (4 pts)

Question: Write 3-4 sentences summarizing your key findings.

Instructions: Your summary must include: 1. One finding from descriptive statistics (Part C) 2. One pattern or insight from PivotTable analysis (Part D) 3. One inferential conclusion referencing your CI or p-value (Part E)

Response Form: - Q-E3: Write your summary paragraph.

Answer: Example: "The average GPA across all 160 students is 3.08 with a standard deviation of 0.56, indicating moderate variation in academic performance. PivotTable analysis reveals that Graduate students consistently outperform Undergraduate students across all four faculties, with the largest gap in Business (3.41 vs 2.69). A two-sample t-test confirms this difference is statistically significant (p < 0.001), and the 95% confidence interval for overall mean GPA [2.99, 3.17] suggests the population average is solidly above 3.0."


Verification Checklist

  • [ ] Cleaned dataset loaded (160 rows, consistent types)
  • [ ] Analysis sheet with basic formulas (B1)
  • [ ] AcademicStanding column with IF formula (B2)
  • [ ] Dean column with VLOOKUP (B3)
  • [ ] Descriptive statistics table completed (C1)
  • [ ] Segmented summary tables completed (C2)
  • [ ] PivotTable 1: GPA by Faculty × Program with Tabular layout (D1)
  • [ ] PivotTable 2: Enrollment by Year with Gender filter (D2)
  • [ ] Slicer for Scholarship on PivotTable 1 (D3)
  • [ ] 95% CI computed correctly (E1)
  • [ ] T.TEST completed and interpreted (E2)
  • [ ] Summary paragraph written (E3)
  • [ ] Response Form completed and submitted