Intro to Data Visualization

Introduction to Pandas

Pandas is a powerful tool for data manipulation and analysis, and it integrates seamlessly with various visualization libraries.


Pandas is a Python library for handling data, similar to statistical programming languages such as R. Pandas makes it easy to wrangle data for summaries, visualizations, and other analyses. For additional resources, explore the Cheatsheet for the Pandas library.

Getting Started with Pandas

Before starting, ensure you have Pandas installed. You can install Pandas using pip:

pip install numpy pandas

If you are using Google Colab, pandas and numpy are already installed by default.


Importing Libraries

Start by importing the necessary libraries. The as keyword creates an alias for the library. In the code, np now refers to numpy and pd now refers to pandas.

import numpy as np
import pandas as pd

When using a function from a library, the syntax is as follows: library.function_name(). For example, pd.read_csv() means to use the read_csv() function from the library aliased as pd. We used the line import pandas as pd, so Python knows that pd refers to pandas.
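As a minimal illustration of the library.function_name() pattern, the sketch below calls one function from each library through its alias. The sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# np.mean() is the mean() function from numpy, reached through the alias np
values = np.array([1, 2, 3, 4])
average = np.mean(values)
print(average)  # 2.5

# pd.to_numeric() is the to_numeric() function from pandas, reached through pd
numbers = pd.to_numeric(['10', '20', '30'])
print(numbers.sum())  # 60
```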

Understanding DataFrame and Series

Pandas primarily works with two data structures: DataFrame and Series. Understanding these structures is key to effectively using Pandas.

Series

A Series is a one-dimensional labeled array capable of holding any data type. It is similar to a column in a spreadsheet or a list in Python but with labeled indices. You can create a Series as follows:

import pandas as pd

data = [10, 20, 30, 40]
series = pd.Series(data, index=['A', 'B', 'C', 'D'])
print(series)

Output:

A    10
B    20
C    30
D    40
dtype: int64
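Once a Series has labels, values can be looked up by label or by position. The following is a small sketch that reuses the same values as above; it shows a few common access patterns, not an exhaustive list.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40], index=['A', 'B', 'C', 'D'])

# Look up a value by its label
print(series['B'])        # 20

# Look up a value by its integer position
print(series.iloc[0])     # 10

# Arithmetic applies to every element and keeps the labels
doubled = series * 2
print(doubled['D'])       # 80

# A boolean condition selects the matching entries
print(series[series > 20].index.tolist())  # ['C', 'D']
```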

DataFrame

A DataFrame is a two-dimensional labeled data structure, akin to a table in a database or an Excel spreadsheet. It consists of rows and columns.

You can create a new DataFrame from scratch using Pandas by defining a dictionary and converting it, as follows:

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago

This creates a DataFrame with columns Name, Age, and City. You can now manipulate or visualize this data as needed.

The DataFrame allows for more complex operations, including filtering, grouping, and merging datasets, making it a versatile tool for data analysis.
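To make filtering, grouping, and merging concrete, here is a short sketch that rebuilds the Name, Age, and City DataFrame. The states table is a hypothetical second dataset added only to demonstrate a merge.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

# Filtering: keep only the rows where Age is greater than 26
older = df[df['Age'] > 26]
print(older['Name'].tolist())  # ['Bob', 'Charlie']

# Grouping: average age per city (each city appears once here)
mean_age = df.groupby('City')['Age'].mean()
print(mean_age['Chicago'])  # 35.0

# Merging: join a second, hypothetical table on the shared City column
states = pd.DataFrame({
    'City': ['New York', 'Los Angeles', 'Chicago'],
    'State': ['NY', 'CA', 'IL']
})
merged = df.merge(states, on='City')
print(merged.columns.tolist())  # ['Name', 'Age', 'City', 'State']
```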


Loading Data

Pandas can read data from various sources, such as CSV files, Excel workbooks, SQL databases, and more. For this tutorial, we'll use a CSV file as an example.

Let's use example_data.csv (or example_data.xlsx). This dataset contains randomly generated data with three columns: x, y, and z. To read the file, we need to use the appropriate function. In many instances, you would pull data from a database or an API; however, we will be uploading flat files (static files).

First, identify which type of file you are working with. We will only use CSV (comma-separated values) or Excel files. Use pd.read_csv() for CSV files or pd.read_excel() for Excel files.

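# Use the reader that matches your file type
data = pd.read_csv('example_data.csv')       # CSV file
# data = pd.read_excel('example_data.xlsx')  # Excel file
print(data.head())  # display the first five rows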

This will load your data into a DataFrame and display the first few rows.
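If you do not have a file at hand, the same call can be tried on in-memory text. The sketch below assumes a tiny CSV with made-up x, y, and z values standing in for a real file; io.StringIO wraps the text so that pd.read_csv() can read it like a file.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk; the values are made up for illustration
csv_text = "x,y,z\n1,2.5,a\n2,3.5,b\n3,4.5,c\n"

# read_csv() accepts a file path or a file-like object such as StringIO
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)             # (3, 3)
print(data.columns.tolist())  # ['x', 'y', 'z']
print(data.head(2))           # first two rows only
```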


Import Plotly.Express DataFrame

To load one of the datasets bundled with Plotly Express, call the associated function in px.data. For instance, to load the tips dataset, use the px.data.tips() function as shown in the following code:

import pandas as pd
import plotly.express as px

df = px.data.tips()

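There are several datasets included in the Plotly Express library. To list them, print the public names in px.data (the exact list depends on the installed Plotly version):

for name in dir(px.data):
    if '__' not in name:
        print(name)

Output:
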
absolute_import
carshare
election
gapminder
iris
tips
wind

Conclusion

Using Pandas for data visualization provides a quick and straightforward way to explore your data. For more advanced visualizations, consider integrating Pandas with libraries like Matplotlib or Seaborn. Experiment with different types of plots to gain insights into your data!

The plotly.express library contains a number of DataFrames. For background on these datasets, Plotly maintains a list of available datasets and their origins. These datasets are only recommended for practice in visualizations, not as the basis for decision-making.

For more information, refer to the Pandas documentation and the Matplotlib documentation.
