Average Number of Years Lost For People Who Died of Coronavirus in France

We illustrate the use of the lost_years package by estimating the average number of years by which people's lives are shortened by COVID-19. Using the distribution of ages of people who died from COVID-19 in France, we find that lives are shortened by about 9 years on average, and that this number has not changed over the course of the crisis.
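The headline figure is a death-weighted average: each age class contributes its remaining life expectancy, weighted by its number of deaths. A minimal sketch of that arithmetic in plain Python follows. The function name `average_years_lost` and the numbers are made up for illustration and are not taken from the French data.

```python
def average_years_lost(deaths, remaining_years):
    """Death-weighted mean of remaining life expectancy.

    deaths[i] is the number of deaths in age class i and
    remaining_years[i] is the remaining life expectancy for that class.
    """
    if len(deaths) != len(remaining_years):
        raise ValueError("deaths and remaining_years must have the same length")
    total_deaths = sum(deaths)
    if total_deaths == 0:
        return float("nan")  # no deaths, so the average is undefined
    total_years = sum(n * e for n, e in zip(deaths, remaining_years))
    return total_years / total_deaths


# Hypothetical age classes: 10 deaths with 20 years remaining, 30 deaths with 4.
example = average_years_lost([10, 30], [20.0, 4.0])
print(example)  # (10 * 20 + 30 * 4) / 40 = 8.0
```

The notebook below performs the same calculation with pandas, once per reporting date.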

[1]:

import matplotlib.ticker as ticker
import pandas as pd
import seaborn as sns

from lost_years import lost_years_who

sns.set(style="darkgrid")

Prepare example input in a DataFrame

Country codes are listed at https://www.lifetable.de/cgi-bin/country_codes.php

[2]:

df = pd.read_csv("data/covid-cedc-quot.csv", delimiter=";")
list(df)

[2]:

['reg', 'cl_age90', 'Dc_Elec_Covid_cum', 'jour']

[3]:

df["jour"].head()

[3]:

0    2020-03-06
1    2020-03-07
2    2020-03-08
3    2020-03-09
4    2020-03-10
Name: jour, dtype: object
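The `dtype: object` above means the dates were read as plain strings. The notebook keeps them as strings, but if date arithmetic is needed, something like the following sketch could parse them. It uses a toy stand-in for the column rather than the real file.

```python
import pandas as pd

# Toy stand-in for the `jour` column, which is read as plain strings.
jour = pd.Series(["2020-03-06", "2020-03-07", "2020-03-08"], name="jour")

# Parse ISO dates explicitly; errors="raise" surfaces malformed values early.
parsed = pd.to_datetime(jour, format="%Y-%m-%d", errors="raise")

# With real datetimes, spans and resampling become straightforward.
span_days = (parsed.max() - parsed.min()).days
print(span_days)
```

Note that switching to datetimes changes how the x-axis of the later plots is laid out, so tick settings may need adjusting.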

[4]:

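# Rename the source columns to the names used in the rest of the notebook.
df.columns = ["reg", "age", "n_deaths", "date"]
# Drop age class 0, which appears to be the all-ages total, to avoid double counting.
df = df[df.age != 0]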
df

[4]:

	reg	age	n_deaths	date
73	1	9	0	2020-03-06
74	1	9	0	2020-03-07
75	1	9	0	2020-03-08
76	1	9	0	2020-03-09
77	1	9	0	2020-03-10
...	...	...	...	...
16055	94	90	0	2020-05-13
16056	94	90	0	2020-05-14
16057	94	90	0	2020-05-15
16058	94	90	0	2020-05-16
16059	94	90	0	2020-05-17

14600 rows × 4 columns

[5]:

df2 = df.groupby(["date", "age"]).agg({"n_deaths": "sum"})
df2

[5]:

		n_deaths
date	age
2020-03-06	9	0
	19	0
	29	0
	39	0
	49	0
...	...	...
2020-05-17	59	400
	69	1021
	79	2060
	89	3966
	90	2645

730 rows × 1 columns
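The original column name, `Dc_Elec_Covid_cum`, and the steadily growing counts suggest that `n_deaths` is cumulative. Assuming that is the case, daily new deaths per age class could be recovered by differencing within each class. The sketch below uses invented numbers and a hypothetical `new_deaths` column.

```python
import pandas as pd

# Invented cumulative counts for two age classes over three days.
cumulative = pd.DataFrame(
    {
        "date": ["2020-03-06", "2020-03-07", "2020-03-08"] * 2,
        "age": [79, 79, 79, 89, 89, 89],
        "n_deaths": [0, 2, 5, 1, 1, 4],
    }
)

# Differencing only makes sense in date order within each age class.
cumulative = cumulative.sort_values(["age", "date"])
daily = cumulative.groupby("age")["n_deaths"].diff()

# The first day of each class has no predecessor, so keep its cumulative value.
cumulative["new_deaths"] = daily.fillna(cumulative["n_deaths"]).astype(int)
print(cumulative)
```

The rest of the notebook works with the cumulative counts directly, so each date's average covers all deaths reported up to that date.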

[6]:

df2.unstack().plot(kind="area", stacked=True, figsize=(12, 8))

[6]:

<Axes: xlabel='date'>

../_images/examples_corona_virus_fr_daily_7_1.svg

[7]:

df2.unstack().plot(logy=True, figsize=(12, 8))

[7]:

<Axes: xlabel='date'>

../_images/examples_corona_virus_fr_daily_8_1.svg

[8]:

df2.reset_index(inplace=True)
df2

[8]:

	date	age	n_deaths
0	2020-03-06	9	0
1	2020-03-06	19	0
2	2020-03-06	29	0
3	2020-03-06	39	0
4	2020-03-06	49	0
...	...	...	...
725	2020-05-17	59	400
726	2020-05-17	69	1021
727	2020-05-17	79	2060
728	2020-05-17	89	3966
729	2020-05-17	90	2645

730 rows × 3 columns

[9]:

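# Add the year, country, and sex columns used for the life-table lookup.
df2["year"] = 2020
df2["country"] = "FRA"
# The source data has no sex breakdown, so male life tables are used for all deaths.
df2["sex"] = "M"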

[10]:

df2

[10]:

	date	age	n_deaths	year	country	sex
0	2020-03-06	9	0	2020	FRA	M
1	2020-03-06	19	0	2020	FRA	M
2	2020-03-06	29	0	2020	FRA	M
3	2020-03-06	39	0	2020	FRA	M
4	2020-03-06	49	0	2020	FRA	M
...	...	...	...	...	...	...
725	2020-05-17	59	400	2020	FRA	M
726	2020-05-17	69	1021	2020	FRA	M
727	2020-05-17	79	2060	2020	FRA	M
728	2020-05-17	89	3966	2020	FRA	M
729	2020-05-17	90	2645	2020	FRA	M

730 rows × 6 columns

[11]:

fr_ldf = lost_years_who(df2, {"age": "age", "country": "country", "sex": "sex", "year": "year"})
fr_ldf.head()

[11]:

	date	age	year	country	sex	who_age	who_country	who_sex	who_year	who_life_expectancy
0	2020-03-06	9	2020	FRA	M	1	FRA	MLE	2020	79.064192
1	2020-03-06	19	2020	FRA	M	1	FRA	MLE	2020	79.064192
2	2020-03-06	29	2020	FRA	M	1	FRA	MLE	2020	79.064192
3	2020-03-06	39	2020	FRA	M	1	FRA	MLE	2020	79.064192
4	2020-03-06	49	2020	FRA	M	1	FRA	MLE	2020	79.064192
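Years lost for a death at a given age are usually measured as the remaining life expectancy at that age, looked up in a life table. The sketch below shows one simple way to do such a lookup, picking the highest tabulated age that does not exceed the age at death. It is an illustration only: the table values are invented, and `remaining_life_expectancy` is a hypothetical helper, not the matching logic used inside `lost_years_who`.

```python
from bisect import bisect_right

# Invented (age, remaining years) pairs, sorted by age. These are not WHO data.
TOY_LIFE_TABLE = [(0, 80.0), (60, 23.0), (70, 15.0), (80, 9.0), (90, 4.0)]


def remaining_life_expectancy(age, table=TOY_LIFE_TABLE):
    """Return the remaining years for the highest tabulated age <= age."""
    ages = [a for a, _ in table]
    position = bisect_right(ages, age) - 1
    if position < 0:
        raise ValueError(f"age {age} is below the first tabulated age")
    return table[position][1]


# An age class labelled 79 falls back to the row tabulated for age 70.
print(remaining_life_expectancy(79))
```

Checking which life-table row each input row was matched to (the `who_age` column in the output above) is a quick way to confirm the lookup behaved as intended.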

[12]:

gldf = fr_ldf.groupby(["date", "age", "who_life_expectancy"]).agg({"n_deaths": "sum"})
gldf

[12]:

			n_deaths
date	age	who_life_expectancy
2020-03-06	9	79.064192	0
	19	79.064192	0
	29	79.064192	0
	39	79.064192	0
	49	79.064192	0
...	...	...	...
2020-05-17	59	79.064192	400
	69	79.064192	1021
	79	79.064192	2060
	89	79.064192	3966
	90	79.064192	2645

730 rows × 1 columns

[13]:

gldf.reset_index(inplace=True)
gldf

[13]:

	date	age	who_life_expectancy	n_deaths
0	2020-03-06	9	79.064192	0
1	2020-03-06	19	79.064192	0
2	2020-03-06	29	79.064192	0
3	2020-03-06	39	79.064192	0
4	2020-03-06	49	79.064192	0
...	...	...	...	...
725	2020-05-17	59	79.064192	400
726	2020-05-17	69	79.064192	1021
727	2020-05-17	79	79.064192	2060
728	2020-05-17	89	79.064192	3966
729	2020-05-17	90	79.064192	2645

730 rows × 4 columns

[14]:

gldf["years_lost"] = gldf["who_life_expectancy"] * gldf["n_deaths"]
gldf

[14]:

	date	age	who_life_expectancy	n_deaths	years_lost
0	2020-03-06	9	79.064192	0	0.000000
1	2020-03-06	19	79.064192	0	0.000000
2	2020-03-06	29	79.064192	0	0.000000
3	2020-03-06	39	79.064192	0	0.000000
4	2020-03-06	49	79.064192	0	0.000000
...	...	...	...	...	...
725	2020-05-17	59	79.064192	400	31625.676724
726	2020-05-17	69	79.064192	1021	80724.539838
727	2020-05-17	79	79.064192	2060	162872.235129
728	2020-05-17	89	79.064192	3966	313568.584718
729	2020-05-17	90	79.064192	2645	209124.787337

730 rows × 5 columns

[15]:

gldf2 = gldf.groupby(["date"]).agg({"years_lost": "sum"})
gldf2

[15]:

	years_lost
date
2020-03-06	79.064192
2020-03-07	316.256767
2020-03-08	553.449343
2020-03-09	790.641918
2020-03-10	948.770302
...	...
2020-05-13	795702.026376
2020-05-14	799259.915007
2020-05-15	803687.509749
2020-05-16	807798.847723
2020-05-17	810724.222820

73 rows × 1 columns

[16]:

gldf2 = gldf2.join(fr_ldf.groupby(["date"]).agg({"n_deaths": "sum"}))
gldf2["avg_years_lost"] = gldf2.years_lost / gldf2.n_deaths
gldf2

[16]:

	years_lost	n_deaths	avg_years_lost
date
2020-03-06	79.064192	1	79.064192
2020-03-07	316.256767	4	79.064192
2020-03-08	553.449343	7	79.064192
2020-03-09	790.641918	10	79.064192
2020-03-10	948.770302	12	79.064192
...	...	...	...
2020-05-13	795702.026376	10064	79.064192
2020-05-14	799259.915007	10109	79.064192
2020-05-15	803687.509749	10165	79.064192
2020-05-16	807798.847723	10217	79.064192
2020-05-17	810724.222820	10254	79.064192

73 rows × 3 columns
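The per-date aggregation built up over the last few cells can also be written as a single chain. The following is a sketch on invented data that reuses the notebook's column names; it is not a replacement for the cells above.

```python
import pandas as pd

# Invented rows: two age classes on each of two dates.
toy = pd.DataFrame(
    {
        "date": ["2020-03-06", "2020-03-06", "2020-03-07", "2020-03-07"],
        "n_deaths": [1, 3, 2, 6],
        "who_life_expectancy": [20.0, 4.0, 20.0, 4.0],
    }
)

# Total years lost and total deaths per date, computed in one pass.
summary = (
    toy.assign(years_lost=toy["who_life_expectancy"] * toy["n_deaths"])
    .groupby("date")[["years_lost", "n_deaths"]]
    .sum()
)

# Death-weighted average of remaining life expectancy for each date.
summary["avg_years_lost"] = summary["years_lost"] / summary["n_deaths"]
print(summary)
```

A date with zero deaths would give 0 / 0 here, which pandas reports as NaN rather than raising an error.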

[17]:

g = sns.relplot(x="date", y="avg_years_lost", kind="line", data=gldf2.reset_index())
g.fig.autofmt_xdate()
g.fig.set_figwidth(16)
g.fig.set_figheight(6)
for ax in g.axes.flatten():
    ax.xaxis.set_major_locator(ticker.MultipleLocator(10))

../_images/examples_corona_virus_fr_daily_18_0.svg
