Intro to Data Visualization

Categorical Data

Categorical data describes variables whose values fall into a limited set of distinct categories or groups rather than being measured on a numeric scale. The values are usually labels or names; typical categorical variables include gender, region, and product type. A categorical variable can be nominal (no meaningful order) or ordinal (an inherent order). For example, color is nominal (teal, red, green), while education level (high school, bachelor's, master's) is ordinal. Categorical data is common in surveys, demographics, and classification problems.
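The nominal/ordinal distinction can be expressed directly in pandas. The sketch below is illustrative only: the survey DataFrame and its values are made up, and it assumes the ordering of education levels given in the text above.

```python
import pandas as pd

# Hypothetical survey responses (illustrative values only)
survey = pd.DataFrame({
    "color": ["teal", "red", "green", "red"],
    "education": ["bachelor's", "high school", "master's", "bachelor's"],
})

# Nominal: the categories have no meaningful order
survey["color"] = pd.Categorical(survey["color"])

# Ordinal: declare the order explicitly, from lowest to highest
levels = ["high school", "bachelor's", "master's"]
survey["education"] = pd.Categorical(
    survey["education"], categories=levels, ordered=True
)

# Ordered categories support comparisons based on rank, not alphabet
at_least_bachelors = survey[survey["education"] >= "bachelor's"]
print(at_least_bachelors)
print(survey["education"].min())
```

Declaring the order once means later sorting and filtering follow the declared ranking instead of alphabetical order.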

Analyzing and visualizing categorical data is essential for understanding distributions, identifying dominant groups, and exploring relationships between categories. Tools like bar charts, pie charts, and frequency tables are commonly used to summarize and represent this data effectively.
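As a minimal sketch of a frequency table, the example below counts how often each label occurs. The region values are invented for illustration.

```python
import pandas as pd

# Made-up region labels for illustration
regions = pd.Series(
    ["North", "South", "North", "East", "North", "South"], name="region"
)

# Absolute frequencies, sorted from most to least common
counts = regions.value_counts()

# Relative frequencies (each group's share of all rows)
shares = regions.value_counts(normalize=True)

# Combine both into one frequency table; pandas aligns the rows by label
freq_table = pd.DataFrame({"count": counts, "share": shares.round(2)})
print(freq_table)
```

The counts can then be passed to a bar chart function, for example px.bar from Plotly Express, to show the dominant groups at a glance.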

In Python, categorical data is usually stored as strings in pandas, but pandas also provides a dedicated data type called category that can reduce memory usage and speed up some operations when a column contains many repeated values. Understanding and processing categorical data correctly is vital for drawing meaningful insights and performing accurate analyses.
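One way to see the memory effect is to compare the same column stored as strings and as category. This is a rough sketch with made-up data; the exact byte counts depend on the pandas version and on how strings are stored, so no specific numbers are shown.

```python
import pandas as pd

# 30,000 rows built from only three distinct labels (made-up data)
labels = pd.Series(["A", "B", "C"] * 10_000)
as_category = labels.astype("category")

# deep=True includes the memory used by the string values themselves
string_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"strings:  {string_bytes:,} bytes")
print(f"category: {category_bytes:,} bytes")
```

The saving comes from storing each distinct label once and keeping a small integer code per row, so it is largest when a column has few unique values relative to its length.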


Code Example

How it appears in Pandas:

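import pandas as pd

# A single column of repeated labels
data = {
    'Category': ['A', 'B', 'A', 'C']
}

df = pd.DataFrame(data)
# Convert the string column to the category dtype
df['Category'] = df['Category'].astype('category')
print(df)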

Output:

  Category
0        A
1        B
2        A
3        C
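The printed DataFrame looks the same whether the column holds strings or categories, so it can help to inspect the column itself. The sketch below rebuilds the same df and uses the .cat accessor to show the stored categories and the integer codes behind each row.

```python
import pandas as pd

df = pd.DataFrame({"Category": ["A", "B", "A", "C"]})
df["Category"] = df["Category"].astype("category")

# The dtype confirms the conversion
print(df["Category"].dtype)                 # category

# The distinct labels pandas stores once
print(list(df["Category"].cat.categories))  # ['A', 'B', 'C']

# The integer code stored for each row
print(df["Category"].cat.codes.tolist())    # [0, 1, 0, 2]
```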

Last updated 3 months ago