Intro to Data Visualization

Reviewing Data

When working with a DataFrame, it's important to understand its structure and contents. Pandas provides several methods and attributes to help you review a DataFrame.

Viewing the Top and Bottom Rows

One way to explore a new dataset is to preview the DataFrame. To see its first few or last few rows, use the head() and tail() methods:

# View the first 5 rows
data.head()

# View the last 5 rows
data.tail()

You can specify the number of rows to view, e.g., data.head(10) to see the first 10 rows or data.tail(10) to see the last 10 rows.
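The sketch below shows head() and tail() with an explicit row count. It assumes a small hypothetical DataFrame built in place of a real dataset. It also shows that a negative value passed to head() returns every row except the last few.

```python
import pandas as pd

# Hypothetical example data: 12 rows with an ID and a score
data = pd.DataFrame({
    "id": range(1, 13),
    "score": [88, 92, 75, 64, 99, 81, 70, 85, 90, 77, 68, 95],
})

first_three = data.head(3)               # rows with index 0, 1, 2
last_three = data.tail(3)                # rows with index 9, 10, 11
all_but_last_two = data.head(-2)         # negative n: every row except the last 2

print(first_three)
print(last_three)
print(len(all_but_last_two))             # 10
```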

Getting Basic Information

Use info() to get an overview of the DataFrame:

# Overview of data types, non-null values, and memory usage
data.info()
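info() prints its report rather than returning it. The sketch below, built on a hypothetical DataFrame with one missing value, passes a text buffer through the buf parameter so the report can be stored or inspected as a string.

```python
import io
import pandas as pd

# Hypothetical example data with one missing value in "city"
data = pd.DataFrame({
    "city": ["Austin", "Boston", None, "Denver"],
    "visits": [100, 250, 175, 300],
    "rating": [4.5, 3.8, 4.1, 4.9],
})

# info() writes to standard output by default; buf= redirects it to a buffer
buffer = io.StringIO()
data.info(buf=buffer)
summary = buffer.getvalue()

print(summary)  # the "city" row reports 3 non-null values out of 4 entries
```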

Descriptive Statistics

The describe() method provides summary statistics for numerical columns:

# Summary statistics
data.describe()

This includes the count, mean, standard deviation, minimum, maximum, and the 25th, 50th (median), and 75th percentiles.
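Here is a minimal sketch using a hypothetical two-column DataFrame. It shows that describe() skips text columns by default. Calling describe() on a text column directly returns a different set of statistics: count, unique, top (most frequent value), and freq.

```python
import pandas as pd

# Hypothetical example data: one numeric column and one text column
data = pd.DataFrame({
    "score": [70, 80, 90, 100],
    "grade": ["C", "B", "A", "A"],
})

# By default, describe() summarizes only the numeric columns
numeric_summary = data.describe()
print(numeric_summary)

# On a text column, describe() reports count, unique, top, and freq instead
text_summary = data["grade"].describe()
print(text_summary)
```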

Value Counts

Use value_counts() to count how often each unique value appears in a column. This is especially useful for columns of categorical values:

# Count how many times each unique value appears in the 'Category' column
data['Category'].value_counts()
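value_counts() also accepts parameters that change what it reports. The sketch below uses a hypothetical "Category" column with one missing value. normalize=True returns proportions instead of counts, and dropna=False keeps missing values as their own entry.

```python
import pandas as pd

# Hypothetical example data with a categorical "Category" column
data = pd.DataFrame({
    "Category": ["Books", "Toys", "Books", "Games", "Books", "Toys", None],
})

counts = data["Category"].value_counts()                         # sorted most to least frequent; missing values excluded
shares = data["Category"].value_counts(normalize=True)           # proportions of the non-missing values
counts_with_missing = data["Category"].value_counts(dropna=False)  # missing values counted too

print(counts)
print(shares)
print(counts_with_missing)
```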

Checking for Missing Values

Many real-world datasets contain missing values. To count the missing values in each column:

# Count missing values (NaN or None) in each column
data.isnull().sum()
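isnull() returns a DataFrame of True/False values, so it can feed other aggregations as well as sum(). The sketch below uses a small hypothetical DataFrame. It computes the count of missing values per column, the fraction missing per column, and the rows that contain at least one missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with gaps in two columns
data = pd.DataFrame({
    "name": ["Ana", "Ben", None, "Dee"],
    "age": [34, np.nan, 29, np.nan],
    "city": ["Lima", "Oslo", "Rome", "Kyiv"],
})

missing_per_column = data.isnull().sum()             # True counts as 1, so sum() counts the gaps
missing_share = data.isnull().mean()                 # mean() of True/False gives the fraction missing
rows_with_missing = data[data.isnull().any(axis=1)]  # keep rows with at least one missing value

print(missing_per_column)
print(missing_share)
print(rows_with_missing)
```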

Shape and Columns

To get the dimensions and column names of the DataFrame:

# Number of rows and columns
data.shape

# Column names
data.columns
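Both shape and columns are attributes, not methods, so they are written without parentheses. The minimal sketch below uses a hypothetical three-row DataFrame. It unpacks the (rows, columns) tuple from shape and converts the columns Index into a plain Python list.

```python
import pandas as pd

# Hypothetical example data: 3 rows, 2 columns
data = pd.DataFrame({"product": ["A", "B", "C"], "price": [9.99, 14.5, 3.25]})

n_rows, n_cols = data.shape        # shape is a (rows, columns) tuple
column_names = list(data.columns)  # columns is an Index; list() makes a plain list

print(f"{n_rows} rows x {n_cols} columns")  # 3 rows x 2 columns
print(column_names)                          # ['product', 'price']
```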

Sample Rows

Sometimes, it is useful to take a random sample of the rows in the dataset. Use the sample() method to randomly select rows from the DataFrame:

# Randomly select 5 rows
data.sample(5)
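By default, sample() returns different rows on each call. The sketch below, assuming a hypothetical 100-row DataFrame, shows two optional parameters. random_state fixes the seed so the same rows are returned every time, and frac selects a fraction of the rows instead of a fixed count.

```python
import pandas as pd

# Hypothetical example data: 100 rows
data = pd.DataFrame({"value": range(100)})

# The same random_state yields the same rows on every run
sample_a = data.sample(5, random_state=42)
sample_b = data.sample(5, random_state=42)

# frac=0.1 selects 10% of the rows (without replacement by default)
ten_percent = data.sample(frac=0.1, random_state=0)

print(sample_a)
print(len(ten_percent))  # 10
```

A fixed random_state is useful when a notebook is shared, because everyone sees the same sampled rows.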

Additional Resources



For additional information, see the Pandas Cheatsheet.