# Introduction to Pandas

[Pandas](https://pandas.pydata.org/) is a Python library for handling tabular data, similar to the data tools in statistical programming languages such as R. Pandas makes it easy to wrangle data for summaries, visualizations, and other analyses. For additional resources, explore the [cheat sheet](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf) for the Pandas library.

### Getting Started with Pandas

Before starting, ensure you have Pandas and NumPy installed. You can install both using pip:

```bash
pip install numpy pandas
```

If you are using Google Colab, Pandas and NumPy are already installed by default.

***

### Importing Libraries

Start by importing the necessary libraries. The `as` keyword creates an alias for the library. In the code below, `np` refers to NumPy and `pd` refers to Pandas.

```python
import numpy as np
import pandas as pd
```

***

When using a function from a library, the syntax is `library.function_name()`. For example, `pd.read_csv()` means to use the `read_csv()` function from the `pd` library. Because we used the line `import pandas as pd`, Python knows that `pd` refers to Pandas.
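As a small illustration of the `library.function_name()` pattern, the sketch below calls one function through each alias. The input values are made up for demonstration only.

```python
import numpy as np
import pandas as pd

# np.sqrt() is the sqrt() function from NumPy, reached through the alias np
root = np.sqrt(16)

# pd.to_datetime() is the to_datetime() function from Pandas, reached through pd
date = pd.to_datetime('2024-01-15')

print(root)       # 4.0
print(date.year)  # 2024
```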

### Understanding DataFrame and Series

Pandas primarily works with two data structures: **DataFrame** and **Series**. Understanding these structures is key to effectively using Pandas.

#### Series

A Series is a one-dimensional labeled array capable of holding any data type. It is similar to a column in a spreadsheet or a list in Python but with labeled indices. You can create a Series as follows:

```python
import pandas as pd

data = [10, 20, 30, 40]
series = pd.Series(data, index=['A', 'B', 'C', 'D'])
print(series)
```

Output:

```
A    10
B    20
C    30
D    40
dtype: int64
```
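Because a Series carries labels, you can look up values by label as well as by position. The following is a minimal sketch that rebuilds the same Series and shows a few common operations:

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40], index=['A', 'B', 'C', 'D'])

# Look up a value by its label
print(series['B'])        # 20

# Look up a value by its integer position
print(series.iloc[0])     # 10

# Arithmetic applies to every element and keeps the labels
doubled = series * 2
print(doubled['D'])       # 80

# Built-in summaries work on the whole Series
print(series.mean())      # 25.0
```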

#### DataFrame

A DataFrame is a two-dimensional labeled data structure, akin to a table in a database or an Excel spreadsheet. It consists of rows and columns.

You can create a new DataFrame from scratch using Pandas by defining a dictionary and converting it, as follows:

```python
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)
print(df)
```

Output:

```
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
```

This creates a DataFrame with columns `Name`, `Age`, and `City`. You can now manipulate or visualize this data as needed.

The DataFrame allows for more complex operations, including filtering, grouping, and merging datasets, making it a versatile tool for data analysis.
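As a rough sketch of those three operations, the example below filters, groups, and merges the small DataFrame from above. The `states` table is a hypothetical second dataset introduced only to demonstrate a merge.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

# Filtering: keep only the rows where Age is greater than 28
older = df[df['Age'] > 28]
print(older['Name'].tolist())   # ['Bob', 'Charlie']

# Grouping: average age per city (each city appears once here,
# so each group mean equals that person's age)
mean_age = df.groupby('City')['Age'].mean()
print(mean_age['Chicago'])      # 35.0

# Merging: join with a second, hypothetical table on the shared City column
states = pd.DataFrame({
    'City': ['New York', 'Los Angeles', 'Chicago'],
    'State': ['NY', 'CA', 'IL']
})
merged = df.merge(states, on='City')
print(merged.columns.tolist())  # ['Name', 'Age', 'City', 'State']
```

Each operation returns a new object and leaves `df` unchanged, so the steps can be tried in any order.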

***

### Loading Data

Pandas can read data from various sources, such as CSV files, Excel files, SQL databases, and more. For this tutorial, we'll use a CSV file as an example.

Let's use `example_data.csv` (or `example_data.xlsx`). This dataset is randomly generated and has three columns: x, y, and z. In many instances, you would pull data from a database or an API; however, we will be uploading flat files (static files).

First, identify which type of file you are working with. We will only use CSV (comma-separated values) or Excel files. Use `pd.read_csv()` for CSV files or `pd.read_excel()` for Excel files.

```python
dat = pd.read_excel('example_data.xlsx')
```

```python
data = pd.read_csv('example_data.csv')
print(data.head())
```

This will load your data into a DataFrame and display the first five rows, which is the default for `head()`.
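If you do not have a file at hand, the sketch below runs without one: it reads CSV text from an in-memory buffer instead of a path. The CSV content here is invented for illustration and is not the tutorial's dataset.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """x,y,z
1.0,2.5,a
2.0,3.5,b
3.0,4.5,c
"""

# read_csv accepts a file path or any file-like object
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)             # (3, 3)
print(data.columns.tolist())  # ['x', 'y', 'z']
print(data.head(2))           # first two rows only
```

Checking `shape` and the column names right after loading is a quick way to confirm the file was parsed as expected.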

***

### Importing Plotly Express Datasets

The `plotly.express` library includes a number of built-in datasets, each returned as a DataFrame. For background on these datasets, Plotly maintains a [list](https://github.com/plotly/datasets/blob/master/README.md) of available datasets and their origins. These datasets are only recommended for practicing visualizations, not as the basis for decision-making.

To load one of these datasets, call its associated function in `px.data`. For instance, to load the tips dataset, use the `tips()` function as shown in the following code:

```python
import pandas as pd
import plotly.express as px

df = px.data.tips()
```

Several datasets are included in the Plotly Express library. You can list them as follows:

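```python
# Print every public name in px.data (skip dunder attributes)
for name in dir(px.data):
    if '__' not in name:
        print(name)

# Example output (the exact list varies by Plotly version):
# absolute_import
# carshare
# election
# gapminder
# iris
# tips
# wind
```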

***

### Conclusion

Pandas provides a quick and straightforward way to load and explore your data. For visualizations, consider integrating Pandas with libraries like Matplotlib or Seaborn. Experiment with different types of plots to gain insights into your data!

For more information, refer to the [Pandas documentation](https://pandas.pydata.org/) and the [Matplotlib documentation](https://matplotlib.org/).
