Titanic Dataset Instructions
Dataset Overview
This dataset contains passenger information from the Titanic, including demographics, ticket class, and survival status. It has missing values and a mix of data types, which makes it well suited to practicing data cleaning techniques.
Dataset Files
- Titanic.xlsx: Main dataset file containing passenger records
Variables Description
The dataset includes:
- Passenger Information: Passenger ID, name, age, sex
- Travel Details: Ticket class, ticket number, cabin, embarked port
- Family: Number of siblings/spouses aboard, number of parents/children aboard
- Survival: Survival status (survived or not)
Data Quality Notes
- Contains missing values in age, cabin, and embarked columns
- Various data formats that need standardization
- Inconsistent text formatting
- Together, these issues make the file a good fit for practicing data cleaning and validation
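
The quality issues listed above can also be checked outside Excel. The following is a minimal pandas sketch, not part of the original Excel workflow. It counts missing values per column and normalizes inconsistent text. The column names (Age, Cabin, Embarked, Sex) are assumptions and may not match the headers in Titanic.xlsx. The rows are toy values made up for illustration, not real passenger records, and the helper names missing_report and normalize_text are hypothetical.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT records from Titanic.xlsx.
# Column names are assumed and may differ from the real file's headers.
toy = pd.DataFrame({
    "Age": [22.0, None, 35.0, None],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", None, "S"],
    "Sex": [" male", "Female", "FEMALE ", "male"],
})


def missing_report(df):
    """Return the number of missing values per column, largest first."""
    return df.isna().sum().sort_values(ascending=False)


def normalize_text(series):
    """Trim surrounding whitespace and lower-case a text column."""
    return series.str.strip().str.lower()


# Count the gaps before changing anything.
report = missing_report(toy)

# Collapse variants such as " male" and "FEMALE " into consistent labels.
toy["Sex"] = normalize_text(toy["Sex"])

print(report)
print(sorted(toy["Sex"].unique()))
```

To try this on the real file, the toy table could be replaced with pd.read_excel("Titanic.xlsx"), assuming an Excel reader such as openpyxl is installed and the column names are adjusted to match the sheet.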
Use Cases
This dataset is suitable for:
- Data cleaning and preparation exercises
- Handling missing values
- Standardizing data formats
- Using Power Query for data transformation
- Initial exploratory analysis of survival patterns
Excel Analysis Tips
- Use Power Query to clean and transform data
- Practice handling missing values (removal, imputation)
- Standardize text formats and data types
- Create summary tables of passenger demographics
- Analyze survival rates by passenger class, sex, and age
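
For comparison with the Excel steps above, here is a hedged pandas sketch of two of the tips: imputing missing ages and computing survival rates by group. It assumes columns named Pclass, Sex, Age, and Survived, with Survived coded as 0 or 1, which may not match Titanic.xlsx. The rows are toy values invented for illustration, and impute_median and survival_rate are hypothetical helper names. Median imputation is only one option; removing incomplete rows is the other approach the tips mention.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT records from Titanic.xlsx.
# Column names and the 0/1 coding of Survived are assumptions.
toy = pd.DataFrame({
    "Pclass": [1, 1, 3, 3, 3, 2],
    "Sex": ["female", "male", "male", "female", "male", "female"],
    "Age": [38.0, None, 22.0, None, 30.0, 26.0],
    "Survived": [1, 0, 0, 1, 0, 1],
})


def impute_median(df, column):
    """Return a copy with missing values in `column` set to the column median."""
    out = df.copy()
    out[column] = out[column].fillna(out[column].median())
    return out


def survival_rate(df, by):
    """Mean of the 0/1 Survived column per group, i.e. the share that survived."""
    return df.groupby(by)["Survived"].mean()


# Fill the gaps in Age without modifying the original table.
cleaned = impute_median(toy, "Age")

# Summary tables comparable to a PivotTable averaging Survived by group.
by_class = survival_rate(cleaned, "Pclass")
by_sex = survival_rate(cleaned, "Sex")

print(by_class)
print(by_sex)
```

The groupby mean here plays the same role as a PivotTable that averages the survival column by class or sex.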