Chapter 1: NumPy

NumPy is one of the most widely used Python modules for data analytics. This chapter covers its most useful tools.

Intro to NumPy

NumPy Python

What is NumPy?

NumPy Array vs. Python Lists

4 Important differences

np.tile()

A very useful NumPy function for structuring and restructuring data
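
As a minimal sketch of what np.tile() does (the array values are made up for illustration), it repeats an array a given number of times along each axis:

```python
import numpy as np

row = np.array([1, 2, 3])

# Repeat the 1-D array twice along its only axis.
tiled_1d = np.tile(row, 2)

# Stack 2 copies vertically and 1 copy horizontally, giving a 2 x 3 array.
tiled_2d = np.tile(row, (2, 1))

print(tiled_1d)
print(tiled_2d)
```

The second argument sets the repetition count per axis, so a tuple such as (2, 1) builds a small table out of a single row.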

numpy.random

Generate random numbers in a prescribed structure using NumPy

Example: Creating a random movie
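
A rough sketch of the idea, reading "random movie" as a stack of random image frames (this reading, the frame count, and the frame size are all assumptions, not taken from the lesson):

```python
import numpy as np

# Seeded generator so the run is repeatable.
rng = np.random.default_rng(seed=0)

# Hypothetical "movie": 10 frames of 64 x 64 grayscale pixels in 0..255.
movie = rng.integers(0, 256, size=(10, 64, 64), dtype=np.uint8)

print(movie.shape)
print(movie.dtype)
```

The size argument is what gives the random numbers their prescribed structure: frames first, then rows and columns.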

numpy array .reshape() function

Up your data preprocessing game using the NumPy .reshape() function

Data Source: https://www.kaggle.com/datasets/kapillondhe/american-sign-language
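
A minimal sketch of .reshape() in a preprocessing step, assuming flattened 28 x 28 grayscale images (the size and the data are hypothetical and not taken from the data set above):

```python
import numpy as np

# Hypothetical batch: 4 flattened images, 784 values each.
flat = np.arange(4 * 28 * 28).reshape(4, 784)

# Restore the image grid; -1 lets NumPy infer the batch dimension.
images = flat.reshape(-1, 28, 28)

print(images.shape)
```

Reshaping changes only how the values are indexed, not the values themselves, so the total number of elements must stay the same.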

Chapter 2: Code Optimization

Code Optimization - Iterations

Loop vs Apply Function vs Vectorization & Broadcasting
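
The three approaches can be sketched side by side on a small, made-up DataFrame (column names and values are illustrative only). All three compute the same result; on large data the vectorized form is typically the fastest, though actual timings depend on the workload:

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# 1) Explicit Python loop over rows.
loop_total = [row.price * row.qty for row in df.itertuples(index=False)]

# 2) Row-wise apply: calls the lambda once per row.
apply_total = df.apply(lambda r: r["price"] * r["qty"], axis=1)

# 3) Vectorization: whole-column arithmetic in one expression.
vec_total = df["price"] * df["qty"]

# Broadcasting: the scalar 1.1 is stretched across the whole column.
with_tax = vec_total * 1.1

print(vec_total.tolist())
```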

GitHub Repo

Code Optimization - Data Types

Choose appropriate data types to run faster, smarter analyses
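
A small sketch of the effect, using a made-up DataFrame (the columns and the chosen target types are assumptions for illustration). Low-cardinality strings usually fit well in the category type, and small integers in a narrower integer type:

```python
import pandas as pd

# Made-up data: few distinct strings, small integer values.
df = pd.DataFrame({
    "state": ["CA", "NY", "CA", "TX"] * 1000,
    "count": [1, 2, 3, 4] * 1000,
})

before = df.memory_usage(deep=True).sum()

# Convert to more compact types.
df["state"] = df["state"].astype("category")
df["count"] = df["count"].astype("int8")

after = df.memory_usage(deep=True).sum()

print(before, after)
```

Narrower types only help when the values actually fit; int8, for example, holds values from -128 to 127.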

The datatype: DateTime 

Optimize your code with the right datatype

GitHub Repo

Pandas merge function

Doing Data Integration the best possible way
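
A minimal sketch of combining two tables with merge, using made-up tables (the table and column names are hypothetical). An outer join with indicator=True is one way to see which rows matched:

```python
import pandas as pd

# Made-up tables for illustration.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
orders = pd.DataFrame({"customer_id": [1, 1, 4], "amount": [50, 20, 70]})

# Outer join keeps unmatched rows from both sides; indicator=True adds a
# '_merge' column recording where each row came from.
merged = customers.merge(orders, on="customer_id", how="outer", indicator=True)

print(merged)
```

Checking the '_merge' column before dropping rows helps catch keys that exist in only one of the two sources.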

GitHub Repo

Pandas DataFrame Index

Multi-level index and columns

GitHub Repo

Iterate vs. Map

Why is mapping a function better than iteration (loops) for data processing?
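
A short sketch contrasting the two styles on a made-up Series (the helper to_letter and its thresholds are hypothetical):

```python
import pandas as pd

# Made-up scores for illustration.
grades = pd.Series([55, 72, 91])

def to_letter(score):
    """Hypothetical helper: convert a numeric score to a letter grade."""
    if score >= 90:
        return "A"
    if score >= 70:
        return "C"
    return "F"

# Explicit loop: manual bookkeeping of the result list.
letters_loop = []
for score in grades:
    letters_loop.append(to_letter(score))

# Series.map applies the function to every element and keeps the index.
letters_map = grades.map(to_letter)

print(letters_map.tolist())
```

Both produce the same values; the mapped version is shorter and returns a Series aligned with the original index.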

GitHub Repo

Chapter 3: Data Cleaning

Data Cleaning Example 3

Detecting and Dealing with Errors

GitHub Repo

Data Cleaning Example 4

Detecting, Diagnosing and Dealing with Missing Values

GitHub Repo

Data Cleaning Example 5

Detecting, Diagnosing and Dealing with Missing Values

GitHub Repo

Data Cleaning Example 6

Detecting, Diagnosing and Dealing with Outliers

GitHub Repo

Chapter 4: Data Integration

Example 1

Entity Identification Challenge

GitHub Repo

Example 2

Aggregation Mismatch

GitHub Repo