MATPLOTLIB
Matplotlib is a powerful plotting library in Python primarily used for data visualization. It enables the
creation of static, animated, and interactive visualizations, offering a wide range of customizable plot
types, such as line plots, scatter plots, histograms, and heatmaps. The library is often paired with pandas
for data handling and NumPy for numerical data, making it a popular choice in data science and machine
learning.
Basic Components in Matplotlib
Before diving into specific plots, here are some basic pyplot functions and common keyword arguments in Matplotlib:
• plt.title('Title'): Adds a title to the plot.
• plt.xlabel('Label'): Labels the x-axis.
• plt.ylabel('Label'): Labels the y-axis.
• plt.show(): Displays the plot.
• plt.figure(figsize=(width, height)): Sets the figure size in inches.
• alpha=value: Sets the transparency of plot elements, where value ranges from 0 (fully transparent) to 1 (opaque).
• color='color_name': Sets the color of plot elements.
• edgecolor='color_name': Sets the edge color of bars in bar plots and histograms.
• marker='symbol': Sets marker style in scatter or line plots.
• linewidth=value: Sets the line width in line plots.
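The sketch below combines several of the functions and arguments listed above in a single line plot. The data values are hypothetical, and plt.show() is left off so the resulting objects can be inspected; add it at the end to display the figure.

```python
import matplotlib.pyplot as plt

# Hypothetical data, used only to exercise the options listed above
x = [1, 2, 3, 4, 5]
y = [2, 4, 3, 5, 7]

fig = plt.figure(figsize=(8, 4))  # figure size in inches: (width, height)
# One line plot that uses color, marker, linewidth and alpha together
line, = plt.plot(x, y, color='teal', marker='o', linewidth=2, alpha=0.8)
plt.title('Basic Components Demo')
plt.xlabel('X value')
plt.ylabel('Y value')
ax = plt.gca()  # the Axes that the pyplot calls above drew on
```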
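Importing Libraries and Loading the Data
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the Titanic dataset (assumes titanic.csv, with columns such as Pclass, Sex, Age, Fare and Survived, is in the working directory)
titanic = pd.read_csv('titanic.csv')
# Apply Seaborn's default theme to all plots (sns.set_theme() is the current name for sns.set())
sns.set_theme()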
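Some columns used in the examples, such as Age, contain missing values, which is why the histogram calls dropna() first. A quick per-column count of missing values is worth running before plotting. The sketch below uses a tiny hypothetical DataFrame with a few of the same column names so it runs without titanic.csv; with the real data, titanic.isna().sum() works the same way.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for titanic.csv: a few rows with some of the same column names
titanic = pd.DataFrame({
    'Survived': [0, 1, 1, 0],
    'Pclass': [3, 1, 3, 2],
    'Sex': ['male', 'female', 'female', 'male'],
    'Age': [22.0, 38.0, np.nan, 35.0],
    'Fare': [7.25, 71.28, 7.93, 13.0],
})

# Number of missing values in each column
missing = titanic.isna().sum()
print(missing)

# Columns holding numbers (the only ones a correlation matrix can use)
numeric_cols = titanic.select_dtypes(include='number').columns.tolist()
print(numeric_cols)
```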
Matplotlib Plot Examples
Here are several common plot types, each with code and an explanation. Some examples use Seaborn, which is built on Matplotlib and draws onto Matplotlib figures.
1. Histogram - Age Distribution
A histogram shows the distribution of a dataset across defined intervals or "bins."
plt.figure(figsize=(10,6)) # Sets the plot size to 10x6 inches.
plt.hist(titanic['Age'].dropna(), bins=20, color='skyblue', edgecolor='black')
# Plots a histogram of 'Age' with 20 bins, sky-blue color, and black edges.
plt.title('Age Distribution of Titanic Passengers') # Adds a title to the plot.
plt.xlabel('Age') # Labels the x-axis as 'Age'.
plt.ylabel('Frequency') # Labels the y-axis as 'Frequency'.
plt.show() # Displays the plot.
Explanation:
• plt.hist(): Creates a histogram.
• bins: Sets the number of equal-width intervals (or, when given a list, the bin edges).
• color: Specifies the color of bars.
• edgecolor: Outlines each bar so neighboring bars are easy to tell apart.
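The bins argument accepts either an integer (that many equal-width bins between the smallest and largest value) or a sequence of bin edges. The histogram call also returns the counts and edges it used, which is handy for checking a plot. The sketch below uses hypothetical ages rather than the Titanic column, and the object-oriented ax.hist(), which takes the same arguments as plt.hist().

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical ages with one missing value, standing in for titanic['Age']
ages = pd.Series([2, 15, 22, 24, 29, 31, 38, 45, 54, 71, np.nan])
clean = ages.dropna()  # drop missing ages, as the histogram example does

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# bins as an integer: 4 equal-width bins from the minimum to the maximum age
counts_int, edges_int, _ = ax1.hist(clean, bins=4, color='skyblue', edgecolor='black')
ax1.set_title('bins=4')

# bins as a sequence: explicit edges, here one bin per decade from 0 to 80
decades = list(range(0, 90, 10))
counts_dec, edges_dec, _ = ax2.hist(clean, bins=decades, color='skyblue', edgecolor='black')
ax2.set_title('bins = decade edges')

print(edges_int)   # 5 edges for 4 bins
print(counts_dec)  # number of ages in each decade
```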
2. Bar Plot - Passengers by Class
Bar plots are useful for comparing categorical data.
plt.figure(figsize=(8,6)) # Sets the figure size to 8x6 inches.
titanic['Pclass'].value_counts().sort_index().plot(kind='bar', color='lightcoral')
# Counts passengers in each class, orders the bars by class (1, 2, 3), and colors them light coral.
plt.title('Number of Passengers in Each Class') # Adds a title.
plt.xlabel('Class') # Labels the x-axis as 'Class'.
plt.ylabel('Count') # Labels the y-axis as 'Count'.
plt.show() # Displays the plot.
Explanation:
• kind='bar': Specifies a bar plot.
• value_counts(): Counts how often each class occurs; sort_index() then orders the bars by class number instead of by frequency.
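The plot above shows how many passengers were in each class. Comparing survival across classes takes one more step: because Survived is coded 0/1, its mean within each class is the fraction of that class who survived. The sketch below uses a small hypothetical DataFrame rather than titanic.csv, and also shows why sort_index() is needed after value_counts().

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows standing in for the Titanic 'Pclass' and 'Survived' columns
df = pd.DataFrame({
    'Pclass':   [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
    'Survived': [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0],
})

# value_counts() orders by frequency; sort_index() reorders by class label
by_frequency = df['Pclass'].value_counts()
counts = by_frequency.sort_index()

# Survived is 0/1, so its mean within each class is that class's survival rate
survival_rate = df.groupby('Pclass')['Survived'].mean()

fig, ax = plt.subplots(figsize=(8, 6))
survival_rate.plot(kind='bar', color='lightcoral', ax=ax)
ax.set_title('Survival Rate by Passenger Class')
ax.set_xlabel('Class')
ax.set_ylabel('Survival rate')
print(counts)
print(survival_rate)
```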
3. Pie Chart - Gender Proportion
Pie charts show the proportion of categories in a dataset.
plt.figure(figsize=(6,6)) # Sets the plot size for a square shape.
titanic['Sex'].value_counts().plot(kind='pie', autopct='%1.1f%%',
startangle=140, colors=['lightblue', 'pink'])
# Creates a pie chart with percentages, start angle, and colors.
plt.title('Gender Distribution') # Adds a title.
plt.ylabel('') # Removes the y-axis label.
plt.show() # Displays the plot.
Explanation:
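• kind='pie': Specifies a pie chart.
• autopct: Formats the percentage label drawn on each wedge; '%1.1f%%' shows one decimal place followed by a percent sign.
• startangle: Rotates the start of the first wedge counterclockwise from the positive x-axis by this many degrees.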
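The autopct format string is applied to each wedge's percentage, so '%1.1f%%' prints one decimal place and a literal percent sign. The sketch below calls Matplotlib's pie() directly on hypothetical counts (not the real Titanic figures) and reads back the labels it produces.

```python
import matplotlib.pyplot as plt

# Hypothetical counts standing in for titanic['Sex'].value_counts()
counts = {'male': 3, 'female': 1}

fig, ax = plt.subplots(figsize=(6, 6))
# With autopct set, pie() returns the wedges, the label texts and the percentage texts
wedges, texts, autotexts = ax.pie(
    list(counts.values()),
    labels=list(counts.keys()),
    autopct='%1.1f%%',             # one decimal place, then a literal '%'
    startangle=140,                # first wedge starts 140 degrees counterclockwise from the x-axis
    colors=['lightblue', 'pink'],  # colors are assigned in the order of the values
)
ax.set_title('Gender Distribution')
print([t.get_text() for t in autotexts])
```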
4. Box Plot - Age Distribution by Class
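Box plots summarize a distribution by its median and quartiles, with whiskers and individually plotted outliers.
plt.figure(figsize=(10,6)) # Sets the figure size.
sns.boxplot(x='Pclass', y='Age', data=titanic, palette='viridis')
# Creates a box plot of 'Age' grouped by 'Pclass' using the 'viridis' color palette.
# Seaborn 0.13+ warns about palette without hue; adding hue='Pclass', legend=False avoids the warning.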
plt.title('Age Distribution by Passenger Class') # Adds a title.
plt.xlabel('Passenger Class') # Labels the x-axis.
plt.ylabel('Age') # Labels the y-axis.
plt.show() # Displays the plot.
Explanation:
• sns.boxplot(): Creates a box plot.
• palette: Sets color palette for distinct groups.
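The box spans the first quartile (Q1) to the third quartile (Q3), with a line at the median. By default Matplotlib, and therefore sns.boxplot(), extends the whiskers to the most extreme points within 1.5 times the interquartile range (IQR = Q3 - Q1) of the box and draws anything beyond as an individual outlier. The sketch below computes those numbers with pandas, using linear interpolation for the quartiles (pandas' default), on hypothetical ages. box_stats is a helper written for this illustration, not a Matplotlib or Seaborn function.

```python
import pandas as pd

def box_stats(values, whis=1.5):
    """Hypothetical helper: quartiles, whisker fences and outliers (1.5 x IQR rule)."""
    s = pd.Series(values).dropna()
    q1, med, q3 = s.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    # Points outside the fences are the ones drawn as separate outlier markers
    outliers = s[(s < low_fence) | (s > high_fence)].tolist()
    return {'q1': q1, 'median': med, 'q3': q3,
            'low_fence': low_fence, 'high_fence': high_fence,
            'outliers': outliers}

# Hypothetical ages for two classes, standing in for titanic[['Pclass', 'Age']]
df = pd.DataFrame({
    'Pclass': [1] * 8 + [3] * 5,
    'Age': [2, 20, 22, 24, 26, 28, 30, 80, 18, 19, 21, 25, 30],
})

# One set of box statistics per passenger class
stats = {cls: box_stats(group) for cls, group in df.groupby('Pclass')['Age']}
print(stats[1])
```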
5. Count Plot - Number of Survivors
Count plots display counts of each category.
plt.figure(figsize=(8,6)) # Sets the figure size.
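sns.countplot(x='Survived', data=titanic, palette='viridis')
# Plots the survival count with the 'viridis' color palette.
# Seaborn 0.13+ warns about palette without hue; adding hue='Survived', legend=False avoids the warning.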
plt.title('Survival Count (0 = Not Survived, 1 = Survived)') # Adds a title.
plt.xlabel('Survived') # Labels the x-axis.
plt.ylabel('Count') # Labels the y-axis.
plt.show() # Displays the plot.
Explanation:
• sns.countplot(): Counts occurrences of each category in a categorical column.
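sns.countplot() does the counting itself, so the heights of its bars match what value_counts() returns for the same column. The sketch below checks this on a small hypothetical column rather than the Titanic data, reading the bar heights back from ax.patches.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical 0/1 survival column standing in for titanic['Survived']
df = pd.DataFrame({'Survived': [0, 0, 0, 1, 1, 0, 1, 0]})

fig, ax = plt.subplots(figsize=(8, 6))
sns.countplot(x='Survived', data=df, ax=ax)

# Bar heights that countplot drew
heights = [p.get_height() for p in ax.patches]
# The same numbers computed directly with pandas, ordered by category
counts = df['Survived'].value_counts().sort_index()
print(heights, counts.tolist())
```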
6. Scatter Plot - Age vs. Fare
Scatter plots show the relationship between two variables.
plt.figure(figsize=(10,6)) # Sets the figure size.
plt.scatter(titanic['Age'], titanic['Fare'], alpha=0.5, color='teal')
# Plots 'Age' vs 'Fare' with 50% transparency and teal color.
plt.title('Age vs. Fare') # Adds a title.
plt.xlabel('Age') # Labels the x-axis.
plt.ylabel('Fare') # Labels the y-axis.
plt.show() # Displays the plot.
Explanation:
• plt.scatter(): Creates a scatter plot.
• alpha: Sets transparency so overlapping points stay visible; denser regions appear darker.
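A scatter point needs both coordinates, so it helps to keep only rows where Age and Fare are both present; the correlation of those same pairs then summarizes the linear trend the scatter plot shows. The sketch below uses hypothetical values, not the real Titanic fares.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical rows standing in for titanic[['Age', 'Fare']]; two rows have a missing value
df = pd.DataFrame({
    'Age':  [22.0, 38.0, np.nan, 35.0, 54.0, 2.0],
    'Fare': [7.25, 71.28, 8.05, 53.10, np.nan, 21.08],
})

# Keep only rows where both coordinates are known
pairs = df.dropna(subset=['Age', 'Fare'])

fig, ax = plt.subplots(figsize=(10, 6))
points = ax.scatter(pairs['Age'], pairs['Fare'], alpha=0.5, color='teal')
ax.set_title('Age vs. Fare')
ax.set_xlabel('Age')
ax.set_ylabel('Fare')

# Pearson correlation of the plotted pairs: one number for the linear trend
r = pairs['Age'].corr(pairs['Fare'])
print(len(pairs), round(r, 3))
```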
7. Heatmap - Correlation Matrix
A heatmap visualizes data values in a matrix format using colors.
plt.figure(figsize=(10,8)) # Sets the figure size.
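sns.heatmap(titanic.corr(numeric_only=True), annot=True, cmap='coolwarm', square=True, linewidths=0.5)
# Correlates the numeric columns only (text columns such as 'Sex' make corr() raise an error in pandas 2.0+),
# then draws square cells with the values written in, the 'coolwarm' color map, and thin gridlines.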
plt.title('Correlation Matrix') # Adds a title.
plt.show() # Displays the plot.
Explanation:
• sns.heatmap(): Creates a heatmap.
• annot=True: Displays correlation values on cells.
• cmap: Defines color gradient for values.
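Two details matter here. DataFrame.corr() works only on numbers: since pandas 2.0 it raises an error on text columns unless numeric_only=True is passed (available from pandas 1.5). And by default the color scale stretches between the smallest and largest value in the matrix, so fixing vmin=-1 and vmax=1 makes a given color mean the same correlation in every heatmap. The sketch below uses a small hypothetical DataFrame, not the real Titanic data.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for titanic.csv, including a text column like 'Sex'
df = pd.DataFrame({
    'Survived': [0, 1, 1, 0, 1, 0],
    'Pclass':   [3, 1, 3, 1, 2, 3],
    'Age':      [22.0, 38.0, 26.0, 35.0, 27.0, 40.0],
    'Fare':     [7.25, 71.28, 7.93, 53.10, 13.00, 8.05],
    'Sex':      ['male', 'female', 'female', 'male', 'female', 'male'],
})

# numeric_only=True skips text columns such as 'Sex'
corr = df.corr(numeric_only=True)

fig, ax = plt.subplots(figsize=(10, 8))
# vmin/vmax pin the color scale to the full correlation range [-1, 1]
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1,
            square=True, linewidths=0.5, ax=ax)
ax.set_title('Correlation Matrix')
print(corr.round(2))
```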