
Pandas Tutorials for Beginners




Pandas is an open-source data manipulation and analysis library for Python. It provides powerful data structures and tools for efficiently working with structured data. In this tutorial, you'll learn the basics of Pandas and how to use its core functionalities for data manipulation, cleaning, and analysis.


1. Introduction to Pandas


Pandas is built on top of the NumPy library and provides data structures that are essential for data analysis in Python. The two primary data structures in Pandas are Series and DataFrame.


Series: A one-dimensional array-like object that can hold various data types.

DataFrame: A two-dimensional table that can store data of different types in columns.


2. Installation


You can install Pandas using pip, the standard Python package installer:


pip install pandas
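To confirm that the installation worked, a quick check is to import the library and print its version. This is a minimal sketch; the version string you see depends on your environment.

```python
import pandas as pd

# Importing without an error means pandas is installed;
# the version string tells you which release you have.
version = pd.__version__
print(version)
```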

3. Pandas Data Structures


Series


A Series is a one-dimensional labeled array that can hold data of any type (integers, strings, floats, etc.). It's similar to a column in a spreadsheet or a one-dimensional NumPy array.



import pandas as pd

data = [10, 20, 30, 40, 50]
series = pd.Series(data, name="MySeries")
print(series)
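The "labeled" part of the definition is easier to see with explicit labels. The sketch below reuses the same values but assigns string labels (the labels "a" to "e" are made up for illustration), then looks values up by label and by position.

```python
import pandas as pd

# Same values as above, but with explicit string labels instead of 0..4
labeled = pd.Series(
    [10, 20, 30, 40, 50],
    index=["a", "b", "c", "d", "e"],
    name="MySeries",
)

print(labeled["c"])     # look up by label -> 30
print(labeled.iloc[0])  # look up by position -> 10
print(labeled.mean())   # -> 30.0
```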

DataFrame


A DataFrame is a two-dimensional table of data with rows and columns. It's the most commonly used Pandas object and is often used to represent tabular data.


data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 22],
'City': ['New York', 'San Francisco', 'Los Angeles']
}
df = pd.DataFrame(data)
print(df)
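Once a DataFrame exists, a few attributes and methods give a quick overview of its size and contents. The following sketch rebuilds the same table so it can run on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 22],
    "City": ["New York", "San Francisco", "Los Angeles"],
})

print(df.shape)          # (rows, columns) -> (3, 3)
print(list(df.columns))  # -> ['Name', 'Age', 'City']
print(df.head(2))        # first two rows
print(df.dtypes)         # data type of each column
```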

4. Loading Data


Pandas provides various functions to load data from different sources:


CSV Files


csv_data = pd.read_csv('data.csv')
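If you do not have a data.csv file at hand, you can try read_csv on an in-memory string instead. The CSV content below is hypothetical and simply mirrors the example table used earlier.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,San Francisco
Charlie,22,Los Angeles
"""

# read_csv accepts a file path or any file-like object
csv_data = pd.read_csv(io.StringIO(csv_text))
print(csv_data.head())
```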

Excel Files

# Reading .xlsx files requires the openpyxl package (pip install openpyxl)
excel_data = pd.read_excel('data.xlsx', sheet_name='Sheet1')

SQL Databases


import sqlite3

conn = sqlite3.connect('database.db')
sql_query = "SELECT * FROM table_name"
sql_data = pd.read_sql(sql_query, conn)
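To try the SQL route without an existing database file, the sketch below builds a small in-memory SQLite database first. The table name "people" and its columns are assumptions made for this example; the query also passes its filter value through params rather than formatting it into the SQL string.

```python
import sqlite3

import pandas as pd

# In-memory database with a hypothetical "people" table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people (name, age) VALUES (?, ?)",
    [("Alice", 25), ("Bob", 30), ("Charlie", 22)],
)
conn.commit()

# The ? placeholder is filled in from params
sql_data = pd.read_sql(
    "SELECT name, age FROM people WHERE age > ? ORDER BY age",
    conn,
    params=(23,),
)
conn.close()

print(sql_data)
```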

5. Data Manipulation


Selecting and Indexing


Selecting a column


ages = df['Age']

Selecting multiple columns


subset = df[['Name', 'City']]

Selecting a row by index label


row = df.loc[0]
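With the default index, the label 0 and the position 0 happen to coincide, which hides the difference between loc (label-based) and iloc (position-based). The sketch below uses made-up string labels to make the distinction visible.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 22]},
    index=["a", "b", "c"],  # hypothetical string labels
)

by_label = df.loc["b"]     # row whose index label is "b"
by_position = df.iloc[0]   # first row, regardless of its label
cell = df.loc["c", "Age"]  # single value: row "c", column "Age"
first_two = df.iloc[0:2]   # rows at positions 0 and 1

print(by_label["Name"])     # -> Bob
print(by_position["Name"])  # -> Alice
print(cell)                 # -> 22
```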

Filtering Data


Filtering based on a condition


young_people = df[df['Age'] < 30]
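Filters can also combine several conditions. A short sketch of three common variants, using the same example table:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 22],
    "City": ["New York", "San Francisco", "Los Angeles"],
})

# Combine conditions with & (and) or | (or); each condition needs parentheses
young_in_ny = df[(df["Age"] < 30) & (df["City"] == "New York")]

# isin() keeps rows whose value appears in the given list
west_coast = df[df["City"].isin(["San Francisco", "Los Angeles"])]

# query() expresses a filter as a string
under_30 = df.query("Age < 30")

print(young_in_ny["Name"].tolist())  # -> ['Alice']
print(west_coast["Name"].tolist())   # -> ['Bob', 'Charlie']
print(under_30["Name"].tolist())     # -> ['Alice', 'Charlie']
```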

Adding and Removing Columns


Adding a new column


df['Gender'] = ['Female', 'Male', 'Male']

Removing a column


df.drop('Gender', axis=1, inplace=True)
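New columns are often computed from existing ones, and drop can return a new DataFrame instead of modifying the original in place. The column names Age_In_Months and Is_Adult below are invented for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 22],
})

# Derive a column from an existing one
df["Age_In_Months"] = df["Age"] * 12

# assign() returns a new DataFrame with the extra column
df = df.assign(Is_Adult=df["Age"] >= 18)

# Without inplace=True, drop() leaves df untouched and returns a copy
trimmed = df.drop(columns=["Age_In_Months"])

print(list(df.columns))       # -> ['Name', 'Age', 'Age_In_Months', 'Is_Adult']
print(list(trimmed.columns))  # -> ['Name', 'Age', 'Is_Adult']
```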

Handling Missing Data



Checking for missing values


missing_values = df.isnull().sum()

Dropping rows with missing values


df.dropna(inplace=True)

Filling missing values


df['Age'] = df['Age'].fillna(df['Age'].mean())
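The example table used so far has no gaps, so the calls above have nothing to act on. The sketch below uses a separate, hypothetical table with two missing values to show what each step does.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Age and one missing City
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],
    "Age": [25, np.nan, 22, 31],
    "City": ["New York", "San Francisco", None, "Boston"],
})

# Count missing values per column
print(people.isnull().sum())

# dropna() removes every row that has at least one missing value
dropped = people.dropna()

# Fill numeric gaps with the column mean and text gaps with a placeholder
filled = people.copy()
filled["Age"] = filled["Age"].fillna(filled["Age"].mean())  # mean of 25, 22, 31 is 26.0
filled["City"] = filled["City"].fillna("Unknown")

print(dropped["Name"].tolist())  # -> ['Alice', 'Dana']
print(filled)
```

Assigning the result back to the column, rather than calling fillna with inplace=True on a selected column, avoids chained-assignment warnings in recent pandas versions.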

Sorting and Ranking


Sorting by a column


sorted_df = df.sort_values(by='Age')

Ranking

df['Age_Rank'] = df['Age'].rank(ascending=False)
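Sorting can use several columns at once, and rank has to decide what to do with ties. The sketch below adds a hypothetical fourth person with a duplicate age to show both.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],
    "Age": [25, 30, 22, 30],  # Bob and Dana tie
})

# Sort by Age descending, then by Name ascending to break ties
by_age_desc = df.sort_values(by=["Age", "Name"], ascending=[False, True])

# Default method="average": tied values share the mean of their ranks
df["Rank_Avg"] = df["Age"].rank(ascending=False)

# method="min": tied values all get the lowest rank in the group
df["Rank_Min"] = df["Age"].rank(ascending=False, method="min")

print(by_age_desc["Name"].tolist())  # -> ['Bob', 'Dana', 'Alice', 'Charlie']
print(df["Rank_Avg"].tolist())       # -> [3.0, 1.5, 4.0, 1.5]
print(df["Rank_Min"].tolist())       # -> [3.0, 1.0, 4.0, 1.0]
```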


6. Data Analysis


Descriptive Statistics


Summary statistics of the dataframe


summary_stats = df.describe()

Correlation matrix


correlation_matrix = df.corr(numeric_only=True)
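A correlation matrix is only informative with at least two numeric columns. The sketch below adds a hypothetical Salary column (the values are invented and chosen to be exactly proportional to Age) so that both describe and corr have something to report.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 22],
    "Salary": [50000, 60000, 44000],  # hypothetical values, 2000 * Age
})

# describe() summarizes numeric columns: count, mean, std, min, quartiles, max
summary_stats = df.describe()
print(summary_stats.loc[["min", "max"], "Age"])

# numeric_only=True skips text columns such as Name
correlation_matrix = df.corr(numeric_only=True)
print(correlation_matrix)
```

Because Salary is an exact multiple of Age here, the correlation between the two comes out as 1 (up to floating-point rounding).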

Grouping and Aggregation


Grouping by a column and calculating mean


grouped = df.groupby('City')['Age'].mean()
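Grouping is more interesting when a group has more than one row. The sketch below extends the table with two hypothetical people and computes several aggregates per city in one call.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana", "Eve"],
    "Age": [25, 30, 22, 35, 28],
    "City": ["New York", "San Francisco", "Los Angeles",
             "New York", "San Francisco"],
})

# Mean age per city
mean_age = df.groupby("City")["Age"].mean()

# Several aggregates at once: one column per function
summary = df.groupby("City")["Age"].agg(["count", "mean", "max"])

print(mean_age["New York"])  # (25 + 35) / 2 -> 30.0
print(summary)
```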

Pivot Tables


df['Gender'] = ['Female', 'Male', 'Male']  # re-add the 'Gender' column that was dropped earlier
pivot_table = df.pivot_table(index='City', columns='Gender', values='Age', aggfunc='mean')
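To see what the pivot table produces, here is a self-contained sketch with hypothetical data in which one city/gender combination has two rows (so aggfunc has something to average) and one combination has none.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "City": ["New York", "New York", "New York", "Los Angeles"],
    "Gender": ["Female", "Female", "Male", "Male"],
    "Age": [25, 31, 35, 40],
})

# Rows become cities, columns become genders, cells hold the mean age
pivot = df.pivot_table(index="City", columns="Gender", values="Age", aggfunc="mean")
print(pivot)

# Combinations with no data are NaN by default; fill_value replaces them
pivot_filled = df.pivot_table(
    index="City", columns="Gender", values="Age", aggfunc="mean", fill_value=0
)

print(pivot.loc["New York", "Female"])  # (25 + 31) / 2 -> 28.0
```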

7. Data Visualization


Pandas integrates with data visualization libraries such as Matplotlib and Seaborn, which accept DataFrames directly as input.


import matplotlib.pyplot as plt
import seaborn as sns

# Plotting a bar chart
sns.barplot(x='City', y='Age', data=df)
plt.show()
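DataFrames also have a built-in plot method that draws with Matplotlib, so a simple chart does not need Seaborn. The sketch below assumes Matplotlib is installed; it selects the non-interactive "Agg" backend and saves to a file (the filename is arbitrary) so that it also runs on machines without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this to open a window with plt.show()

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 22],
})

# One bar per row: names on the x-axis, ages as bar heights
ax = df.plot(kind="bar", x="Name", y="Age", legend=False)
ax.set_ylabel("Age")
ax.set_title("Age by person")

plt.tight_layout()
plt.savefig("age_by_person.png")  # arbitrary output filename
```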



