Practical 2: Data Cleaning & Exploratory Data Analysis (EDA)

Objective

Learn to clean real-world datasets and perform comprehensive exploratory data analysis to understand data characteristics.

Duration

3-4 hours

Prerequisites


What You’ll Learn


📊 Dataset

Use the Iris or Titanic dataset (provided)


📋 Key Tasks

1. Load and Explore Data

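import pandas as pd

# Replace 'dataset.csv' with the path to the provided Iris or Titanic file
df = pd.read_csv('dataset.csv')
print(df.head())      # first five rows
df.info()             # dtypes and non-null counts; prints directly, returns None
print(df.describe())  # summary statistics for the numeric columns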
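
To complement the calls above, a per-column summary can put the dtype, missing count, and unique count side by side. The following is a minimal sketch on a made-up toy table; the column names and the summarize helper are assumptions for illustration, not part of the provided dataset or of pandas.

```python
import pandas as pd

# Hypothetical toy data standing in for the provided CSV
toy = pd.DataFrame({
    "age": [22.0, 38.0, None, 35.0, 22.0],
    "fare": [7.25, 71.28, 7.92, 53.10, 7.25],
    "sex": ["male", "female", "female", None, "male"],
})

def summarize(df):
    """Return one row per column: dtype, missing count, unique count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isnull().sum(),
        "n_unique": df.nunique(),  # missing values are not counted
    })

summary = summarize(toy)
n_duplicates = int(toy.duplicated().sum())  # rows identical to an earlier row

print(summary)
print("duplicate rows:", n_duplicates)
```

Columns with many missing values or a surprising dtype (for example, numbers stored as text) are usually the first things to fix in the cleaning steps that follow.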

2. Handle Missing Values

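# Check for missing values
print(df.isnull().sum())

# Fill missing numerical values with each column's mean
num_cols = df.select_dtypes(include='number').columns
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())

# Fill the remaining (categorical) gaps with each column's most frequent value
df = df.fillna(df.mode().iloc[0])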
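
As a worked illustration of mean and mode imputation, the sketch below fills a tiny made-up table and returns a copy so the original stays available for comparison. The impute_simple helper and the column names are hypothetical; whether the mean is the right fill value depends on the data (the median is a common alternative for skewed columns).

```python
import pandas as pd

def impute_simple(df):
    """Return a copy with numeric gaps filled by the column mean
    and all other gaps filled by the column's most frequent value."""
    out = df.copy()
    num_cols = out.select_dtypes(include="number").columns
    cat_cols = out.columns.difference(num_cols)

    out[num_cols] = out[num_cols].fillna(out[num_cols].mean())
    for col in cat_cols:
        modes = out[col].mode()
        if not modes.empty:  # an all-missing column has no mode; leave it as is
            out[col] = out[col].fillna(modes.iloc[0])
    return out

# Hypothetical toy data
toy = pd.DataFrame({
    "age": [20.0, 30.0, None, 40.0],
    "embarked": ["S", "C", "S", None],
})

clean = impute_simple(toy)
print(clean)
print(clean.isnull().sum())
```

Here the missing age becomes 30.0 (the mean of 20, 30, and 40) and the missing embarked value becomes "S" (the most frequent category).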

3. Detect Outliers

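import numpy as np

# The IQR rule only makes sense for numeric columns
num = df.select_dtypes(include=np.number)
Q1 = num.quantile(0.25)
Q3 = num.quantile(0.75)
IQR = Q3 - Q1
# Per column, count values more than 1.5 * IQR below Q1 or above Q3
outliers = ((num < (Q1 - 1.5 * IQR)) | (num > (Q3 + 1.5 * IQR))).sum()
print(outliers)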
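
To see the IQR rule on concrete numbers, the sketch below applies it to a single made-up column. The iqr_outlier_mask helper and the values are assumptions for illustration; the multiplier 1.5 matches the code above.

```python
import pandas as pd

def iqr_outlier_mask(s, k=1.5):
    """Return a boolean mask that is True where a value lies outside
    [Q1 - k * IQR, Q3 + k * IQR]."""
    q1 = s.quantile(0.25)
    q3 = s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

# Hypothetical values with one obviously extreme entry
values = pd.Series([10, 12, 11, 13, 12, 11, 100], name="fare")

mask = iqr_outlier_mask(values)
flagged = values[mask]
print(flagged)
```

For these values Q1 = 11 and Q3 = 12.5, so IQR = 1.5 and the fences are 8.75 and 14.75; only 100 falls outside them. A flagged value is a candidate for inspection, not necessarily an error to delete.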

4. Create Visualizations

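import matplotlib.pyplot as plt

# Histogram of every numeric column
df.hist(bins=20)
plt.tight_layout()  # keep subplot titles and axis labels from overlapping
plt.show()

# Box plot of every numeric column
df.plot(kind='box')
plt.show()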
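
If the plots need to be saved rather than shown on screen, one option is to draw them on a shared figure and write it to a file. This is a minimal sketch with made-up measurements; the column names and the output file name eda_plots.png are assumptions, and the Agg backend line is only needed when no display is available.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render without a display; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3, 5.8, 7.1],
    "sepal_width": [3.5, 3.0, 3.3, 2.7, 3.0],
})

# One row, two panels: a histogram on the left, box plots on the right
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
toy["sepal_length"].plot(kind="hist", bins=5, ax=axes[0],
                         title="sepal_length distribution")
toy.plot(kind="box", ax=axes[1], title="Box plots")

fig.tight_layout()
out_path = Path("eda_plots.png")  # assumed output location
fig.savefig(out_path)
plt.close(fig)
```

Passing ax= to the pandas plotting calls places each plot on a chosen panel, which keeps related views together in one image.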

📊 Learning Outcomes


💾 Deliverables

