Reviewing Data
When working with a DataFrame, it's important to understand its structure and contents. pandas provides several methods and attributes to help review a DataFrame.
Viewing the Top and Bottom Rows
One way to explore a new dataset is to preview the DataFrame. To see its first few or last few rows, use the head() and tail() methods. By default, each returns five rows.
You can specify the number of rows to view, e.g., data.head(10) to see the first 10 rows or data.tail(10) to see the last 10 rows.
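The original code example for this step did not survive, so here is a minimal sketch. It assumes a small, made-up DataFrame named data (the column names city and temp_c are hypothetical) and shows the default and explicit row counts.

```python
import pandas as pd

# Hypothetical example data, used only for illustration
data = pd.DataFrame({
    "city": ["Oslo", "Lima", "Pune", "Kyiv", "Bern", "Doha", "Nice"],
    "temp_c": [3.5, 19.0, 27.2, 8.1, 6.4, 31.0, 15.3],
})

first_rows = data.head()   # first 5 rows (the default)
last_three = data.tail(3)  # last 3 rows

print(first_rows)
print(last_three)
```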
Getting Basic Information
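Use info() to get an overview of the DataFrame, including the index range, the column names and data types, the number of non-null values in each column, and memory usage.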
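A minimal sketch, assuming a small hypothetical DataFrame with a few missing entries. Note that info() prints its summary and returns None; passing a text buffer through the buf parameter is one way to capture the summary as a string.

```python
import io

import pandas as pd

# Hypothetical example data with some missing values
data = pd.DataFrame({
    "name": ["Ana", "Ben", None],
    "age": [34, 28, 41],
    "score": [88.5, None, 92.0],
})

# Prints index range, columns, non-null counts, dtypes, and memory usage
data.info()

# info() returns None, so capture the printed summary in a buffer instead
buffer = io.StringIO()
data.info(buf=buffer)
summary = buffer.getvalue()
```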
Descriptive Statistics
The describe() method provides summary statistics for numerical columns. This includes metrics like count, mean, standard deviation, min, max, and percentiles (by default the 25th, 50th, and 75th).
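Here is a minimal sketch on hypothetical data. By default, describe() skips non-numeric columns such as team below; passing include="all" also summarizes them, adding rows like unique, top, and freq.

```python
import pandas as pd

# Hypothetical example data: two numeric columns and one text column
data = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190],
    "weight_kg": [50.0, 60.0, 70.0, 80.0, 90.0],
    "team": ["a", "b", "a", "b", "a"],
})

stats = data.describe()                  # numeric columns only
all_stats = data.describe(include="all") # also summarizes the text column

print(stats)
```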
Value Counts
Use value_counts() to see how often each distinct value appears in a specific column. This is especially useful for columns of categorical values.
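A minimal sketch, assuming a hypothetical color column. value_counts() sorts by frequency in descending order, and normalize=True returns proportions instead of raw counts.

```python
import pandas as pd

# Hypothetical categorical column
data = pd.DataFrame({"color": ["red", "blue", "red", "green", "red", "blue"]})

counts = data["color"].value_counts()                # most frequent value first
shares = data["color"].value_counts(normalize=True)  # fractions that sum to 1

print(counts)
```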
Checking for Missing Values
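Many real-world datasets contain missing values. To identify missing data, use isna() (or its alias isnull()), which marks each missing entry as True. Chaining sum() gives the number of missing values per column.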
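A minimal sketch on hypothetical data containing both NaN and None. pandas treats both as missing, so isna() flags them alike; any(axis=1) selects rows with at least one missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing entries in both columns
data = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": ["x", "y", None, "z"],
})

mask = data.isna()                            # True where a value is missing
missing_per_column = data.isna().sum()        # missing count per column
total_missing = int(data.isna().sum().sum())  # missing count overall
rows_with_missing = data[data.isna().any(axis=1)]

print(missing_per_column)
```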
Shape and Columns
To get the dimensions and column names of the DataFrame, use the shape and columns attributes.
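A minimal sketch with a hypothetical three-column DataFrame. shape and columns are attributes, so they are accessed without parentheses; shape is a (rows, columns) tuple.

```python
import pandas as pd

# Hypothetical example data
data = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Ana", "Ben", "Cy"],
    "score": [88.5, 79.0, 92.0],
})

n_rows, n_cols = data.shape        # tuple of (rows, columns)
column_names = list(data.columns)  # columns is an Index; list() converts it

print(n_rows, n_cols)
print(column_names)
```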
Sample Rows
Sometimes it is useful to take a random sample of the rows in the dataset. Use the sample() method to randomly sample rows from the DataFrame, either a fixed number of rows with n or a fraction of rows with frac.
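A minimal sketch on a hypothetical 100-row DataFrame. Setting random_state makes the sample reproducible, and by default rows are drawn without replacement, so no row appears twice.

```python
import pandas as pd

# Hypothetical example data: 100 rows
data = pd.DataFrame({"value": range(100)})

subset = data.sample(n=5, random_state=42)     # 5 random rows, reproducible
tenth = data.sample(frac=0.1, random_state=0)  # 10% of the rows

print(subset)
```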