Intro to Data Visualization

Aggregating Data

The groupby and agg (short for aggregate) methods in pandas summarize the data in a DataFrame. The groupby method splits the rows into groups based on the values in one or more columns, so that every row in a group shares the same group label. Once the data is grouped, agg computes a summary statistic for each group, such as a sum, mean, count, or custom computation. For example, in a sales dataset, you can group by a "Region" column and sum the "Sales" column to find total sales per region. Because the grouping and the summary are specified separately, the same pattern works for any combination of grouping columns and statistics.
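The sales example above can be sketched as follows. The DataFrame and its values are invented for illustration; only the "Region" and "Sales" column names come from the text.

```python
import pandas as pd

# Hypothetical sales data, invented for illustration only
sales = pd.DataFrame({
    "Region": ["East", "West", "East", "South", "West"],
    "Sales": [100, 250, 150, 300, 50],
})

# Group the rows by region, then sum the Sales column within each group.
# as_index=False keeps "Region" as a regular column instead of the index.
sales_by_region = sales.groupby("Region", as_index=False).agg({"Sales": "sum"})
print(sales_by_region)
```

The result has one row per region, with the groups sorted by label.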

A strength of combining groupby with agg is that you can apply different aggregation functions to different columns in one call by passing a dictionary that maps each column name to a function. For instance, in a dataset with columns "Date," "Sales," and "Profit," you can group by "Date" and compute both the total sales (sum) and the average profit (mean) in a single step. You can also pass custom functions, written as lambda expressions or as ordinary user-defined functions, when the built-in statistics are not enough. This is useful in exploratory data analysis and preprocessing, where summarizing and reshaping data quickly helps uncover patterns and prepares the data for further analysis or visualization.
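As a minimal sketch of the dictionary form and a custom function, assuming a small made-up dataset with the "Date," "Sales," and "Profit" columns mentioned above (the values and the variable names `daily`, `summary`, and `sales_range` are hypothetical):

```python
import pandas as pd

# Hypothetical daily records, invented for illustration only
daily = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
    "Sales": [200, 300, 100, 450],
    "Profit": [20.0, 40.0, 10.0, 60.0],
})

# A different aggregation per column: total sales and average profit per date
summary = daily.groupby("Date").agg({"Sales": "sum", "Profit": "mean"})
print(summary)

# A custom aggregation with a lambda: the spread (max minus min) of sales per date
sales_range = daily.groupby("Date")["Sales"].agg(lambda s: s.max() - s.min())
print(sales_range)
```

Here `summary` is indexed by "Date" with one column per aggregated field, and `sales_range` is a Series with one value per date.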

Let's get started with the tips dataset.

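import plotly.express as px
import pandas as pd

# Load the tips dataset that ships with Plotly Express
df = px.data.tips()

# Sum total_bill within each group of the "sex" column
df_grouped = df.groupby(['sex']).agg({'total_bill': 'sum'})
# Turn the group label back into a regular column so it can be plotted
df_grouped.reset_index(inplace=True)
df_grouped.head()

# Bar chart of the total bill for each group
fig = px.bar(df_grouped, x="sex", y="total_bill")

fig.show()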
