Exploring the Data

Let's start with the first use case, Life Expectancy. We are interested in the factors that influence Life Expectancy. Looking at the columns in the dataset, and considering our use case, we can ask questions related to:

  • Economics: What is the correlation between life expectancy and GDP per capita?

  • Time Series: Does life expectancy change (increase/decrease) over time? Are the patterns consistent by region?

  • Regions: Are there geographic / regional patterns in life expectancy?

Let's explore the first question; you can explore the other two on your own.


Q1: Economics

Let's create some visualizations related to Life Expectancy and GDP.

Life expectancy and GDP

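import plotly.express as px

# Load the Gapminder sample dataset that ships with plotly
df = px.data.gapminder()

# All countries and all years, pooled into a single scatter plot
fig = px.scatter(
    data_frame=df,
    x="lifeExp",
    y="gdpPercap",
)
fig.show()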

Output:

Let's add a title, color the points by continent, and look only at the year 2007.

import plotly.express as px

df = px.data.gapminder()

# Keep only 2007 and color each point by continent
fig = px.scatter(
    data_frame=df[df["year"] == 2007],
    x="lifeExp",
    y="gdpPercap",
    color="continent",
    title="Life Expectancy v GDP: 2007",
)
fig.show()

Output:

Time Series

There are several other ways to chart Life Expectancy against GDP per capita, including charts that show how the relationship changes over time.

  • What about a visualization with pooled data, not just 2007? What about different colors by year?

  • What about a visualization of the relationship for each continent?

  • Does the picture change if you add a regression line?

Regions

  • Economics: What is the correlation between life expectancy and GDP per capita?

  • Time Series: Does life expectancy change (increase/decrease) over time? Are the patterns consistent by region?

  • Regions: Are there geographic / regional correlations?
