Course Overview

📊 Course Information

  • Course Code: FDA.6.1.2.01.V
  • Credits: 3
  • In-Class Hours: 45
  • Self-Study Hours: 90
  • Program: Undergraduate

📚 Prerequisites

  • Mathematics for Economists
  • Probability Theory and Mathematical Statistics (or Statistics in Economics and Business)

🎯 Course Objectives

  • G1: Understand the roles and applications of data science in economics, finance, and marketing, and describe the basic steps in the lifecycle of a data science project.
  • G2: Use Python programming to process tabular data, including writing statements with basic control structures and using NumPy and Pandas.
  • G3: Understand processes for collecting, cleaning, and wrangling data from common sources, using transformations, combinations, and aggregations for analysis.
  • G4: Apply visualization tools and reporting to build appropriate data visualizations aligned with analytical goals, and present and interpret results in reports using Markdown, PDF, or presentation formats.
  • G5: Apply basic machine learning models such as linear regression, decision trees, random forests, gradient boosting, etc., to solve simple predictive tasks in business, and evaluate models with quantitative metrics such as MAE, RMSE, and R².

💻 Required Software

  • Python 3 (recent release)
  • Jupyter Notebook or Google Colab
  • NumPy, Pandas, Matplotlib, Seaborn, scikit-learn

This Week (Week 6)

📌 Week Highlights

  • 📊 Learn to read and write data in CSV, JSON, and Excel formats
  • 🌐 Practice web scraping and data collection from websites
  • 📖 Read Chapter 7 from the course textbook
  • 📓 Download Lecture 4 notebook from course schedule
  • 💻 Complete data input/output exercises
  • 📤 Submit Homework 2 by Oct 16, 22:00

📢 Announcements

Homework 2 Due Soon!

2025-10-10

Homework 2 (Data Input and Storage) is due on Oct 16 at 22:00. Submit your .ipynb file via Google Drive. Practice with CSV, JSON, Excel, web scraping, and APIs.

Lecture 4 Quiz Available

2025-10-10

The quiz on Data Input and Storage is now available. Test your understanding of CSV, JSON, web scraping, RSS feeds, APIs, and SQL databases. 30 questions with hints.

🗓️ Coming Up

  • Week 7: Data Input and Storage Practice - Next week
  • Week 8: Midterm 1 & Data Preprocessing - In 2 weeks

Instructors

Dr. Nguyen Trong Nghia

Lecture

Member of the Business AI Lab (BAI LAB) research group and lecturer at the Department of Data Science and Artificial Intelligence, School of Technology, National Economics University. Holds a PhD in Computer Science from Chonnam National University, Korea (2025), with expertise in AI applications for business.

MSc. Nguyen Thi Minh Trang

Tutorial

Lecturer at the Department of Data Science and Artificial Intelligence, School of Technology, National Economics University. Member of the Lab for Research and Technology Transfer of Data Science and Artificial Intelligence. Holds a Master's degree in Business Analytics from Nottingham Trent University, UK (2023).

MSc. Dam Tien Thanh

Tutorial

Member of the DataOptLab research team, Department of Data Science and Artificial Intelligence, School of Technology, National Economics University. Graduated with Honors from the University of Technology – VNU (2020) and holds a Master's degree from Phenikaa University (2023).

Course Learning Outcomes (CLOs)

By the end of this course, students will be able to:

G1: Understand the roles and applications of data science in economics, finance, and marketing, and describe the basic steps in the lifecycle of a data science project.

CLO1.1 Present definitions and characteristics of data science. Level II
CLO1.2 Explain the role of data science in business domains. Level II
CLO1.3 Describe the main steps in the lifecycle of a data science project. Level III
CLO1.4 Distinguish common data types in business analytics. Level III
CLO1.5 Relate practical application examples of data science in economics and finance. Level III

G2: Use Python programming to process tabular data, including writing statements with basic control structures and using NumPy and Pandas.

CLO2.1 Write Python code using variables, data types, and basic loops. Level II
CLO2.2 Use NumPy to manipulate one- and two-dimensional arrays. Level I
CLO2.3 Operate with Pandas DataFrame: filtering, grouping, joining, computations. Level II
CLO2.4 Run notebooks in Jupyter or Google Colab. Level II
CLO2.5 Read and modify sample Python code for simple data tasks. Level II

G3: Understand processes for collecting, cleaning, and wrangling data from common sources, using transformations, combinations, and aggregations for analysis.

CLO3.1 Read data from CSV, Excel, and simple web pages. Level III
CLO3.2 Identify missing data and apply suitable cleaning techniques. Level II
CLO3.3 Transform data types and create new features for analysis. Level II
CLO3.4 Perform aggregation, grouping, and pivoting. Level II
CLO3.5 Merge and combine multiple tables into one analytical dataset. Level II

G4: Apply visualization tools and reporting to build appropriate data visualizations aligned with analytical goals, and present and interpret results in reports using Markdown, PDF, or presentation formats.

CLO4.1 Plot basic charts such as histogram, scatter plot, bar chart, and boxplot with Matplotlib/Seaborn. Level II
CLO4.2 Choose appropriate chart types for data and presentation objectives. Level II
CLO4.3 Interpret insights from charts and visual analysis. Level II
CLO4.4 Present analysis results as Markdown or PDF reports. Level NA
CLO4.5 Apply data storytelling in presenting results. Level I

G5: Apply basic machine learning models such as linear regression, decision trees, random forests, gradient boosting, etc., to solve simple predictive tasks in business, and evaluate models with quantitative metrics such as MAE, RMSE, and R².

CLO5.1 Apply linear regression to predict continuous variables in business. Level II
CLO5.2 Train simple binary classifiers such as decision trees and random forests. Level II
CLO5.3 Split data into training and test sets. Level II
CLO5.4 Evaluate model performance using MAE, RMSE, R², confusion matrix. Level I
CLO5.5 Interpret predictions and apply them to real contexts. Level II

Course Schedule

Week 1 (Lecture): Introduction to Data Science and Applications in Economics and Business
  • Core concepts in data science
  • Data analysis vs. data science
  • What is modeling?
  • Introduction to Python
  Materials: Chapters 1, 2, 3; 📓 Notebook
  Assessment: In-class discussion

Week 2 (Lecture): Python Programming Language
  • Variables in Python
  • Vectors and sequential data types
  • Conditional statements
  • Loops
  • Functions
  Materials: Chapter 4; 📓 Notebook
  Assessment: In-class discussion; homework

Week 3 (Practice): Python Programming Practice
  • Variables in Python
  • Vectors and sequential data types
  • Conditional statements
  • Loops
  • Functions
  Materials: Basic Python practice; 📓 Notebook
  Assessment: In-class discussion; homework

Week 4 (Lecture): Python Libraries for Data Science
  • Data science libraries overview
  • NumPy arrays
  • Pandas DataFrame
  Materials: Chapters 5, 6; 📓 Notebook
  Assessment: In-class discussion; homework

Week 5 (Practice): Python with NumPy and Pandas
  • Work with NumPy arrays
  • Evaluate NumPy performance vs. native arrays
  • Practice basic Pandas DataFrame operations
  Materials: NumPy practice; Pandas practice; 📓 Notebook
  Assessment: In-class discussion; homework

Week 6 (Lecture): Data Input and Storage
  • Read and write text formats
  • Web data collection
  • Read from Microsoft Excel
  • Interact with Web APIs
  • Interact with databases
  Materials: Chapter 7; 📓 Notebook

Week 7 (Practice): Data Input and Storage Practice
  • Read and write text formats
  • Web scraping/collection
  • Read Excel
  • Interact with Web APIs
  • Interact with databases
  Materials: Practice on data input and storage; 📓 Notebook

Week 8 (Midterm): Midterm 1 & Data Preprocessing Lecture
  • Midterm Exam 1
  • Format data aligned to research goals
  • Handle outliers and missing values
  Materials: 📓 Notebook
  Assessment: Midterm 1

Week 9 (Practice): Data Preprocessing Practice
  • Clean data, transform to desired formats
  • Detect and handle outliers with boxplots, Z-scores
  • Handle missing data by deletion, imputation, etc.
  Materials: Preprocessing practice; 📓 Notebook
  Assessment: In-class discussion; homework

Week 10 (Lecture): Data Transformation and Feature Engineering
  • Reshape data between wide and long formats
  • Encode categorical variables
  • Normalize quantitative variables
  • Create new features from raw attributes
  Materials: Chapter 8; 📓 Notebook
  Assessment: In-class discussion; homework

Week 11 (Practice): Data Transformation Practice
  • Reshape wide/long
  • Encode categorical variables
  • Normalize quantitative variables
  • Create new features
  Materials: Data transformation practice
  Assessment: In-class discussion; homework

Week 12 (Lecture): Data Visualization
  • Basic charts: histogram, scatter, bar
  • Principles for selecting appropriate charts for one or multiple variables
  • Use of color, shape, and size to enhance interpretability
  • Interpret data via visualization
  Materials: Chapter 10
  Assessment: In-class discussion; homework

Week 13 (Practice): Data Visualization Practice
  • Plot with Matplotlib and Seaborn
  • Apply visualization in analysis
  • Create simple dashboards in Python
  Materials: Visualization practice
  Assessment: In-class discussion; homework

Week 14 (Lecture): Modeling with Data (Machine Learning)
  • Core ML concepts
  • Linear models for regression and classification
  • Decision trees
  • Tree ensembles
  • Model evaluation metrics
  Materials: Chapter 11
  Assessment: In-class discussion; homework

Week 15 (Midterm & Practice): Midterm 2 & Machine Learning Modeling Practice
  • Midterm Exam 2
  • Linear models for regression and classification
  • Decision trees
  • Tree ensembles
  Materials: Modeling practice
  Assessment: Midterm 2

Note: Schedule is subject to change. Check course announcements for updates.

Homework Assignments

Complete and submit your homework assignments via the Google Drive links below. All submissions must be in .ipynb (Jupyter Notebook) format.

Homework 1: NumPy and Pandas Practice

📅 Due: 22:00 Oct 09, 2025

Work with NumPy arrays and Pandas DataFrames to perform data manipulation tasks. Practice array operations, DataFrame creation, data selection, and basic statistical analysis.
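
The kinds of operations this assignment names (DataFrame creation, selection, and basic statistics) might look like the following minimal sketch. The `sales` table and its column names are invented for illustration and are not taken from the homework itself.

```python
import pandas as pd

# Hypothetical sales table built from a dictionary of columns.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East"],
    "units": [10, 4, 6, 8],
    "price": [2.5, 3.0, 2.5, 4.0],
})

# Vectorized arithmetic creates a new column without an explicit loop.
sales["revenue"] = sales["units"] * sales["price"]

# Boolean selection keeps only the rows that satisfy a condition.
large = sales[sales["units"] > 5]

# Grouping and aggregation: total revenue per region.
revenue_by_region = sales.groupby("region")["revenue"].sum()

# A column can be handed to NumPy for array-level statistics.
units = sales["units"].to_numpy()
mean_units = units.mean()

print(large)
print(revenue_by_region)
print(mean_units)
```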

📤 Submit on Google Drive

Upload your .ipynb file to the shared folder

Deliverables:

  • Jupyter notebook (.ipynb) with solutions
  • Brief documentation of approach

Grading Criteria:

  • Correctness (60%)
  • Code efficiency (20%)
  • Documentation (20%)

Homework 2: Data Input and Storage

📅 Due: 22:00 Oct 16, 2025

Practice reading and writing data from multiple sources including CSV, JSON, Excel files, web scraping, RSS feeds, Web APIs, and SQL databases.
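
A CSV-to-JSON round trip of the kind practiced here can be sketched with the standard library alone. The sample data and field names are assumptions for this example, and in-memory text buffers stand in for real files; in pandas the corresponding entry points would be `pd.read_csv` and `DataFrame.to_json`.

```python
import csv
import io
import json

# Hypothetical CSV content; io.StringIO stands in for an open file.
csv_text = "product,price,quantity\nTea,2.5,10\nCoffee,4.0,3\n"

# csv.DictReader yields one dict per row; every value arrives as a string,
# so numeric columns are converted explicitly.
rows = list(csv.DictReader(io.StringIO(csv_text)))
for row in rows:
    row["price"] = float(row["price"])
    row["quantity"] = int(row["quantity"])

# Serialize to JSON text and parse it back.
json_text = json.dumps(rows, indent=2)
restored = json.loads(json_text)

# Write the restored records out as CSV again.
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=["product", "price", "quantity"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(restored)
print(out.getvalue())
```

Note that CSV carries no type information, which is why the conversion step is needed after reading, while JSON preserves the distinction between numbers and strings.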

📤 Submit on Google Drive

Upload your .ipynb file to the shared folder

Deliverables:

  • Jupyter notebook (.ipynb) with data collection code
  • Sample datasets collected (CSV/JSON)
  • Documentation of data sources used

Grading Criteria:

  • Correctness (60%)
  • Code quality (20%)
  • Documentation (20%)

Homework 3: Data Collection and Cleaning

📅 Due: Week 9

Collect data from web sources, clean and preprocess it for analysis.
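
One possible cleaning step, sketched under assumptions: missing entries are filled with the median, and outliers are flagged with the boxplot rule (values more than 1.5 × IQR beyond the quartiles). The sample values are invented, and the choice of median imputation is only one of several reasonable options; the assignment does not prescribe a method.

```python
import statistics

# Hypothetical measurements; None marks a missing entry.
values = [12.0, 15.0, None, 14.0, 13.0, 95.0, 16.0, None, 14.5]

# Step 1: impute missing entries with the median of the observed values.
observed = [v for v in values if v is not None]
median = statistics.median(observed)
filled = [median if v is None else v for v in values]

# Step 2: flag outliers with the boxplot (1.5 × IQR) rule.
q1, _, q3 = statistics.quantiles(filled, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in filled if v < low or v > high]
cleaned = [v for v in filled if low <= v <= high]

print("median used for imputation:", median)
print("outliers:", outliers)
print("rows kept:", len(cleaned))
```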

Deliverables:

  • Python script or notebook
  • Cleaned dataset (CSV)
  • Process documentation

Grading Criteria:

  • Data quality (40%)
  • Code quality (30%)
  • Documentation (30%)

Homework 4: Data Visualization Project

📅 Due: Week 13

Create comprehensive visualizations to explore and present insights from a business dataset.

Deliverables:

  • Jupyter notebook with visualizations
  • Written analysis (500–700 words)

Grading Criteria:

  • Visualization quality (40%)
  • Insight generation (40%)
  • Presentation (20%)

Final Project: Machine Learning Business Application

📅 Due: Week 15

Build and evaluate a machine learning model to solve a business prediction problem.

Deliverables:

  • Complete Jupyter notebook
  • Dataset and preprocessing code
  • Final report with model evaluation
  • 5-minute presentation

Grading Criteria:

  • Model performance (30%)
  • Code quality (25%)
  • Report quality (25%)
  • Presentation (20%)

Important: Late submissions will be penalized 1 point per day. Missing submissions receive 0 points. Make sure to name your file properly: StudentID_HW#.ipynb

Practice Quizzes

Test your understanding with these interactive quizzes. Each quiz corresponds to lecture material and helps reinforce key concepts.

Lecture 2 Quiz: Python Basics

Week 2

Test your understanding of Python fundamentals including variables, data types, and control structures.

Topics covered:
  • Variables and data types
  • Conditional statements
  • Loops and iteration

Lecture 3 Quiz: NumPy & Pandas

Week 3

Quiz on Python data science libraries including NumPy arrays and Pandas DataFrames.

Topics covered:
  • NumPy arrays and operations
  • Pandas DataFrames
  • Data manipulation basics

Lecture 4 Quiz: Data Input and Storage with Python

Week 4

30 multiple-choice questions on reading and writing data from CSV, JSON, the web, RSS/XML, Excel, APIs, and SQL databases.

Topics covered:
  • CSV, JSON, Excel file operations
  • Web scraping with read_html()
  • RSS Feed parsing with BeautifulSoup
  • Web API interaction
  • SQL Database operations
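
For the SQL topic in the list above, a minimal sketch using Python's built-in `sqlite3` module is shown here. The table, column names, and rows are hypothetical, and an in-memory database is used so that nothing is written to disk; with pandas, `pd.read_sql_query` could load the same query result into a DataFrame.

```python
import sqlite3

# ":memory:" creates a temporary database that disappears on close.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

# Parameterized inserts ("?" placeholders) avoid building SQL by string pasting.
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("An", 120.0), ("Binh", 80.5), ("An", 45.0)],
)
conn.commit()

# Aggregate in SQL: total amount per customer, sorted by name.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

print(totals)  # [('An', 165.0), ('Binh', 80.5)]
```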

Lecture 5 Quiz: Data Cleaning and Preparation

Week 5

30 multiple-choice questions on handling missing data, duplicate data, data normalization, string processing, and categorical encoding.

Topics covered:
  • Handling missing data
  • Handling duplicate data
  • Data normalization and transformation
  • String processing
  • Categorical encoding
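
Two of the topics above, normalization and categorical encoding, can be illustrated in plain Python. This is a sketch under assumptions: the helper names `min_max_scale` and `one_hot` and the sample data are invented here, and min-max scaling is only one of several normalization schemes. In practice `pandas.get_dummies` or scikit-learn's `MinMaxScaler` would usually do this work.

```python
def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(labels):
    """Return (categories, rows) with one 0/1 indicator per sorted category."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows

# Hypothetical numeric and categorical columns.
incomes = [20.0, 35.0, 50.0]
segments = ["retail", "corporate", "retail"]

scaled = min_max_scale(incomes)
categories, encoded = one_hot(segments)

print(scaled)      # [0.0, 0.5, 1.0]
print(categories)  # ['corporate', 'retail']
print(encoded)     # [[0, 1], [1, 0], [0, 1]]
```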

Lecture 6 Quiz: Arranging and Transforming Data

Week 6

30 multiple-choice questions on MultiIndex, data merging, concatenation, and data reshaping in Pandas.

Topics covered:
  • Hierarchical Indexing (MultiIndex)
  • Data merging with pandas.merge
  • Data concatenation with pandas.concat
  • Data Reshaping (melt, pivot, stack, unstack)
  • Advanced data transformation techniques

Assessment & Grading

Assessment Method | Week | Description | Weight
Attendance/participation | Weeks 1–15 | Full in-class participation; homework evaluation; in-class engagement | 10%
Knowledge Check 1 | Week 8 | Quiz/coding/presentation in class | 20%
Knowledge Check 2 | Week 15 | Quiz/coding/presentation in class | 20%
Final exam | Per university exam schedule | Computer-based multiple-choice exam | 50%

Key Policies

📋 Eligibility

Students must achieve at least 5 points for class participation to be eligible for the final exam (per university regulations).

📅 Attendance

Students are responsible for attending all scheduled sessions. In cases of force majeure, students should self-study provided materials and complete assigned supplementary readings.

📝 Submissions

Failure to submit individual or group assignments as required will result in a score of 0 for that component. Late submissions are penalized by 1 point per day after the official deadline.

Course Resources

📚 Course Textbook

Data Science in Economics and Business (Python Applications)

📊 Slide Deck

Weekly lecture slides and presentation materials

💻 GitHub Repository

Code examples, datasets, and supplementary materials

🔧 Software & Tools

  • Python 3 (recent release)
  • Jupyter Notebook or Google Colab
  • NumPy, Pandas, Matplotlib, Seaborn, scikit-learn