Intro to Data Visualization

Exploring Relationships

Last updated 3 months ago

Importance of Visualizing Relationships

Visualizing relationships between variables is a fundamental aspect of data analysis, as it helps uncover patterns, correlations, and interactions within datasets. Scatter plots are one of the most common tools for visualizing relationships, particularly when both variables are numerical. By plotting one variable on the x-axis and the other on the y-axis, scatter plots reveal how changes in one variable correspond to changes in another. Patterns such as clusters, trends, or outliers become immediately apparent, helping analysts identify correlations or anomalies. Enhancements like color, size, or shape of the markers can be used to introduce additional dimensions, making scatter plots even more informative. For example, in a dataset analyzing sales performance, a scatter plot could show the relationship between advertising spend and revenue, with point size representing customer count and color indicating different regions.

Beyond scatter plots, other visualization techniques can capture relationships effectively, depending on the data type and complexity. Line charts are ideal for showing relationships over time, such as tracking how sales and temperature co-vary across seasons. Heatmaps work well for illustrating relationships in categorical or aggregate data, as they use color intensity to represent the strength of a relationship. Additionally, advanced techniques like pair plots or parallel coordinates charts can visualize relationships among multiple variables simultaneously, revealing intricate interactions. Tools like Plotly further enhance relationship visualizations by adding interactivity, such as hover-over details and zooming capabilities, allowing users to explore data relationships dynamically. By choosing the right type of visualization, analysts can not only better understand relationships but also communicate findings effectively to their audience.

Let's explore the relationships.


1. Scatter Plot

Scatter plots are useful for visualizing the relationship between two numerical variables.

# Example: Scatter plot
# Assumes df is a pandas DataFrame with 'Date' and 'Sales' columns
import plotly.express as px

fig = px.scatter(df, x='Date', y='Sales', title='Sales Scatter Plot')
fig.show()

You can also customize scatter plots by mapping columns to marker color or size. The color and size arguments take column names, so in the example below each point's color and size vary with its Sales value.

# Scatter plot with color and size
fig = px.scatter(df, x='Date', y='Sales', color='Sales', size='Sales', title='Enhanced Scatter Plot')
fig.show()

The px.scatter() function documentation is also available on the Plotly website. If you need highly customized pairwise plots or additional styling, you can combine Plotly's flexibility with Python loops to generate subplots manually.

For any given type of plot or chart, there is usually also a user guide on the Plotly website with helpful examples. For example, here is the Plotly user guide on scatter plots.


2. Marginal Plots

Marginal distribution plots show the relationship between two variables together with the distribution of each variable on its own. To add them, pass the marginal_x and marginal_y parameters to px.scatter.

import plotly.express as px

df = px.data.iris()

fig = px.scatter(df, x="sepal_length", y="sepal_width", marginal_x="histogram", marginal_y="rug")
fig.show()

3. Scatterplot Matrix

The plotly.express module provides px.scatter_matrix, which creates a scatter plot matrix: a grid of scatter plots showing the pairwise relationships among several columns of a dataset.

Example: Using px.scatter_matrix

import pandas as pd
import plotly.express as px

# Sample DataFrame
data = {
    'A': [1, 2, 3, 4],
    'B': [4, 3, 2, 1],
    'C': [2, 4, 1, 3],
    'D': [5, 7, 6, 8]
}
df = pd.DataFrame(data)

# Create a scatter matrix
fig = px.scatter_matrix(df, dimensions=['A', 'B', 'C', 'D'], title="Scatter Matrix Example")
fig.show()

Features of px.scatter_matrix:

  • Dimensions: Select specific columns to include in the scatter matrix.

  • Color: Add a categorical variable to color the points.

  • Customization: Adjust titles, axes, and marker styles.


px.scatter() function documentation
Plotly user guide on scatter plots