Intro to Data Visualization

Slicing and Subsetting DataFrames

Slicing and subsetting DataFrames in pandas is essential for efficient data analysis: it lets you extract exactly the rows, columns, or subsets of data that your analysis needs. By focusing on the relevant portions of a dataset, analysts can streamline their workflows, reduce memory use and computation, and make targeted observations. For example, slicing can isolate data for a particular time period, product category, or geographic region, enabling a more detailed exploration of trends or patterns.
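As a minimal sketch of isolating a time period, the example below assumes a hypothetical table of daily sales indexed by date (the `sales` DataFrame and its `Units` column are illustrative, not from a real dataset). With a sorted DatetimeIndex, `.loc` accepts partial date strings, so a whole month or range of months can be selected without spelling out exact timestamps.

```python
import pandas as pd

# Hypothetical daily sales data indexed by date (illustrative values only)
dates = pd.date_range("2024-01-01", "2024-06-30", freq="D")
sales = pd.DataFrame({"Units": range(len(dates))}, index=dates)

# Partial-string indexing on a sorted DatetimeIndex selects whole periods.
# Both endpoints are included, so this covers all of February and March.
feb_mar = sales.loc["2024-02":"2024-03"]

print(feb_mar.index.min(), feb_mar.index.max(), len(feb_mar))
```

Because 2024 is a leap year, the slice contains 29 February days plus 31 March days.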

Pandas provides .loc[] for label-based selection and .iloc[] for position-based selection, alongside Boolean indexing, which filters rows using a condition that evaluates to True or False for each row. These features help you clean datasets, prepare data for visualization, and perform group-specific analyses. Mastering slicing and subsetting lets you focus on exactly the data you need, which makes your analyses faster and less error-prone.
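The difference between the three approaches is easiest to see on a DataFrame whose index is not the default 0, 1, 2, … sequence. This sketch uses a small hypothetical table indexed by name, so labels and positions clearly differ.

```python
import pandas as pd

# Hypothetical data with a string index, so labels and positions differ
df = pd.DataFrame(
    {"Age": [24, 31, 28], "City": ["Boston", "Denver", "Austin"]},
    index=["Alice", "Bob", "Carol"],
)

# .loc is label-based: the row labelled "Carol", column "Age"
by_label = df.loc["Carol", "Age"]

# .iloc is position-based: third row (position 2), first column (position 0)
by_position = df.iloc[2, 0]

# Boolean indexing: keep only the rows where the condition is True
older = df[df["Age"] > 25]

print(by_label, by_position)   # 28 28
print(older.index.tolist())    # ['Bob', 'Carol']
```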


Slicing and Indexing in Pandas

Selecting Columns

You can select a single column or multiple columns from a DataFrame:

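import pandas as pd

# Hypothetical sample data used in the examples below
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Carol'], 'Age': [24, 31, 28], 'City': ['Boston', 'Denver', 'Austin']})

# Single column selection
df['Name']

# Multiple columns selection (pass a list of column names)
df[['Name', 'City']]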
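The number of brackets determines what you get back. A single column name returns a one-dimensional Series, while a list of names (even a list with one name) returns a DataFrame. This sketch uses the same hypothetical sample data to check the types.

```python
import pandas as pd

# Hypothetical sample data with the column names used on this page
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol"],
    "Age": [24, 31, 28],
    "City": ["Boston", "Denver", "Austin"],
})

name_col = df["Name"]            # single brackets -> Series
one_col = df[["Name"]]           # list with one name -> DataFrame
subset = df[["Name", "City"]]    # list of names -> DataFrame

print(type(name_col).__name__)   # Series
print(type(one_col).__name__)    # DataFrame
print(subset.shape)              # (3, 2)
```

Many plotting and aggregation functions expect a DataFrame, so the double-bracket form is often the safer choice when preparing data for a chart.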

Selecting Rows by Position or Condition

Pandas allows you to select rows by integer position with .iloc or by logical conditions:

# Select rows by position (the end position is excluded)
df.iloc[0:2]  # Select first two rows

# Logical indexing
df[df['Age'] > 25]  # Select rows where Age is greater than 25
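Logical conditions can be combined. Pandas uses the element-wise operators `&` (and), `|` (or), and `~` (not) rather than Python's `and`/`or`/`not`, and each condition needs its own parentheses because these operators bind more tightly than comparisons. The sketch below also uses `Series.isin()` to test membership in a list; the data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol"],
    "Age": [24, 31, 28],
    "City": ["Boston", "Denver", "Austin"],
})

# Both conditions must be True (note the parentheses around each one)
older_in_denver = df[(df["Age"] > 25) & (df["City"] == "Denver")]

# Rows whose City is any of the listed values
selected = df[df["City"].isin(["Denver", "Austin"])]

# ~ inverts a Boolean mask: rows whose City is NOT in the list
others = df[~df["City"].isin(["Denver", "Austin"])]

print(older_in_denver["Name"].tolist())  # ['Bob']
print(selected["Name"].tolist())         # ['Bob', 'Carol']
print(others["Name"].tolist())           # ['Alice']
```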

Selecting Specific Rows and Columns

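Use .loc (label-based) or .iloc (position-based) to select rows and columns in a single step:

# Select by label
df.loc[0, 'Name']  # Row labelled 0, 'Name' column (the first row when the index is the default 0, 1, 2, ...)

# Select by position
df.iloc[0, 1]  # First row, second column

# Subset rows and columns
df.loc[0:1, ['Name', 'City']]  # Rows labelled 0 through 1 (inclusive), columns 'Name' and 'City'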
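One common source of off-by-one errors is that the two accessors treat slice endpoints differently: `.loc` includes the end label, while `.iloc` follows Python's usual rule and excludes the end position. This sketch uses hypothetical data with the default integer index to show the difference.

```python
import pandas as pd

# Hypothetical sample data with the default index 0, 1, 2
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol"],
    "Age": [24, 31, 28],
    "City": ["Boston", "Denver", "Austin"],
})

loc_rows = df.loc[0:2]     # labels 0 through 2, end label INCLUDED -> 3 rows
iloc_rows = df.iloc[0:2]   # positions 0 and 1, end position EXCLUDED -> 2 rows

# .iloc also accepts a list of column positions: here Name (0) and City (2)
corner = df.iloc[0:2, [0, 2]]

print(len(loc_rows), len(iloc_rows))  # 3 2
print(corner.columns.tolist())        # ['Name', 'City']
```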

Combining Slicing and Plotting

You can combine slicing, logical filtering, and plotting: filter the DataFrame first, then pass the result to a Plotly Express function. This is an example of Boolean indexing, because it keeps only the rows where the condition Values > 15 is True.

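import pandas as pd
import plotly.express as px
# Hypothetical data with 'Category' and 'Values' columns
df = pd.DataFrame({'Category': ['A', 'B', 'C', 'D'], 'Values': [10, 20, 15, 30]})
# Filter rows where Values > 15, then plot only those rows
df_filtered = df[df['Values'] > 15]
fig = px.bar(df_filtered, x='Category', y='Values', title='Filtered Data Bar Chart')
fig.show()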
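It can help to look at the Boolean mask on its own before using it. The comparison produces a Series of True/False values, one per row, and indexing the DataFrame with that Series keeps only the True rows. This sketch assumes the same hypothetical `Category`/`Values` data and leaves out the plotting step.

```python
import pandas as pd

# Hypothetical data with the 'Category' and 'Values' columns
df = pd.DataFrame({"Category": ["A", "B", "C", "D"], "Values": [10, 20, 15, 30]})

# The comparison returns a Boolean Series (the mask)
mask = df["Values"] > 15
print(mask.tolist())                     # [False, True, False, True]

# Indexing with the mask keeps only rows where it is True (15 is not > 15)
df_filtered = df[mask]
print(df_filtered["Category"].tolist())  # ['B', 'D']
```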

Tips for Effective Subsetting

  1. Plan Your Analysis: Determine which parts of the data are relevant to your question or visualization.

  2. Chain Operations: Combine filtering and slicing with other Pandas operations (e.g., groupby) for efficient workflows.

  3. Preview Subsets: Use head() or sample() to inspect subsets before plotting.
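The second and third tips can be sketched together: filter and select columns in one `.loc` call, chain a `groupby` aggregation, then preview the result before plotting. The `sales` table, its `Region`/`Product`/`Units` columns, and the threshold of 4 units are hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical sales records (illustrative values only)
sales = pd.DataFrame({
    "Region": ["East", "West", "East", "West", "East"],
    "Product": ["A", "A", "B", "B", "A"],
    "Units": [5, 12, 8, 20, 3],
})

units_by_region = (
    sales.loc[sales["Units"] > 4, ["Region", "Units"]]  # filter rows, keep two columns
    .groupby("Region")["Units"]                        # group remaining rows by region
    .sum()                                             # total units per region
)

# Preview the aggregated result and a random sample of the raw rows
print(units_by_region.head())
print(sales.sample(2, random_state=0))
```

Setting `random_state` makes `sample()` return the same rows on every run, which keeps previews reproducible.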

Mastering these techniques allows you to focus on the most relevant data, improving the clarity and impact of your visualizations.
