Data Types

Main Data Types

There are four main types of data:

1. Categorical Data

Categorical data represents classifications or labels. Pandas has a special data type called category to optimize memory usage and performance.

2. Numeric Data

Numeric data represents numerical values and is used for computations and analysis.

3. Temporal Data

Temporal data represents specific points in time or durations. In Pandas, points in time are typically stored as datetime64 values and durations as timedelta64 values.

4. Geographic Data

Geographic data represents location-related information, such as coordinates or region names. While Pandas does not have a specific data type for geographic data, it can be represented as strings or numerical values.
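
As a minimal sketch of how the four kinds of data described above might sit together in one DataFrame, consider the example below. The column names and values are invented for illustration; latitude and longitude are plain floats because Pandas has no geographic type.

```python
import pandas as pd

# Hypothetical city records; every name and value here is illustrative only.
cities = pd.DataFrame({
    "region": ["North", "South", "North"],                   # categorical label
    "population": [120000, 85000, 43000],                    # numeric
    "founded": ["1850-06-01", "1902-03-15", "1977-11-30"],   # temporal, still text here
    "latitude": [52.52, 48.14, 53.55],                       # geographic, stored as floats
    "longitude": [13.40, 11.58, 9.99],
})

# Opt in to the dedicated categorical and datetime representations.
cities["region"] = cities["region"].astype("category")
cities["founded"] = pd.to_datetime(cities["founded"])

# Subtracting datetimes yields a duration (timedelta) column.
age = pd.Timestamp("2024-01-01") - cities["founded"]

print(cities.dtypes)
print(age.dtype)
```

The text columns only become category and datetime after the explicit conversions; Pandas does not infer either one from strings on its own.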


Pandas Data Types

In Pandas, Series and DataFrame columns primarily use the following data types:

  1. Numeric:

    • int64: For integer numbers.

    • float64: For floating-point numbers.

    • complex128: For complex numbers (less common).

  2. String/Object:

    • object: Typically used for string or mixed data types (strings and numbers). It has traditionally been the default data type for text data in Pandas, although newer versions may use a dedicated string type instead.

  3. Boolean:

    • bool: Represents True and False values. In visualization, Boolean data is viewed as categorical data.

  4. Datetime:

    • datetime64[ns]: For dates and times, with nanosecond precision.

  5. Timedelta:

    • timedelta64[ns]: For differences between datetime values.

  6. Categorical:

    • category: Represents categorical data, which can save memory and improve performance when working with repeated values.

Pandas has no separate data type for geographic data. In a DataFrame it is stored using one of the types above, usually strings or numbers.
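
To make the memory claim for the category type concrete, here is a small sketch that compares the same repeated labels stored as plain text and as category. The data is made up, and the exact byte counts depend on the Pandas version, so none are shown.

```python
import pandas as pd

# Illustrative data: three labels repeated many times.
colors = pd.Series(["red", "green", "blue"] * 10_000)
as_category = colors.astype("category")

# deep=True counts the memory held by the string values themselves.
text_bytes = colors.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print("text:    ", text_bytes, "bytes")
print("category:", category_bytes, "bytes")
print("categories:", list(as_category.cat.categories))
```

The saving comes from storing each distinct label once and keeping only a small integer code per row, so it shrinks as the number of distinct values approaches the number of rows.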

Example of Data Types in a DataFrame

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],       # String/Object
    'Age': [25, 30, 35],                      # int64
    'Height': [5.5, 6.0, 5.8],                # float64
    'IsStudent': [True, False, False],        # bool
    'JoinDate': ['2023-01-01', '2023-02-01', '2023-03-01']  # strings for now; converted to datetime64 below
}

df = pd.DataFrame(data)

# Convert 'JoinDate' to datetime
df['JoinDate'] = pd.to_datetime(df['JoinDate'])

# Display data types
print(df.dtypes)

Output

Name                 object
Age                   int64
Height              float64
IsStudent              bool
JoinDate     datetime64[ns]
dtype: object

These data types allow Pandas to perform optimized operations tailored to the type of data you are working with. If needed, you can use .astype() to convert columns to a specific type.
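
A short sketch of converting columns with .astype() follows. The DataFrame is a made-up variation of the example above in which some values arrive in a less convenient type.

```python
import pandas as pd

# Illustrative input: ages stored as text and a student flag stored as 0/1.
raw = pd.DataFrame({
    "Age": ["25", "30", "35"],
    "Height": [5.5, 6.0, 5.8],
    "IsStudent": [1, 0, 0],
})

# A dict converts several columns in one call; other columns are left alone.
converted = raw.astype({"Age": "int64", "IsStudent": "bool"})

# A single column can also be converted on its own, here to a smaller float type.
converted["Height"] = converted["Height"].astype("float32")

print(converted.dtypes)
```

.astype() returns a new object rather than changing the original, so the result has to be assigned back. It raises an error if a value cannot be converted, for example text that is not a valid number.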

