01 write smaller functions to easily combine them
◆
Let’s say we have a function that creates a bar plot of categorical variable counts using a given data frame and a column name.
import pandas as pd
def load_data():
return pd.DataFrame({'x': list('aababc')})
def plot_count_bars(data: pd.DataFrame, column: str):
cnt = data[column].value_counts().rename('counts').reset_index()
return cnt.plot.bar(x='index', y='counts')
plot_count_bars(load_data(), column='x')
import pandas as pd
def load_data():
return pd.DataFrame({'x': list('aababc')})
def plot_count_bars(data: pd.DataFrame, column: str):
cnt = data[column].value_counts().rename('counts').reset_index()
return cnt.plot.bar(x='index', y='counts')
plot_count_bars(load_data(), column='x')
Looks good!
However, what if we later create another function that generates a similar plot — but using dots connected by a line instead of bars?
def plot_count_line(data: pd.DataFrame, column: str):
cnt = data[column].value_counts().rename('counts').reset_index()
return cnt.plot.line(x='index', y='counts', marker='o')
plot_count_line(load_data(), column='x')
def plot_count_line(data: pd.DataFrame, column: str):
cnt = data[column].value_counts().rename('counts').reset_index()
return cnt.plot.line(x='index', y='counts', marker='o')
plot_count_line(load_data(), column='x')
It works as well but see that the counting code is duplicated. It’s better to keep the computational logic separate from plotting.
For example, create a new function called counts
that takes a data frame and generates counts. Then, modify the
plotting functions, so they take the pre-computed values and only perform the rendering.
def counts(data: pd.DataFrame, column: str):
return data[column].value_counts().rename('counts').reset_index()
def plot_count_bars(cnt: pd.DataFrame):
return cnt.plot.bar(x='index', y='counts')
def plot_count_line(cnt: pd.DataFrame):
return cnt.plot.line(x='index', y='counts', marker='o')
cnt = counts(load_data())
plot_count_bars(cnt)
plot_count_line(cnt)
def counts(data: pd.DataFrame, column: str):
return data[column].value_counts().rename('counts').reset_index()
def plot_count_bars(cnt: pd.DataFrame):
return cnt.plot.bar(x='index', y='counts')
def plot_count_line(cnt: pd.DataFrame):
return cnt.plot.line(x='index', y='counts', marker='o')
cnt = counts(load_data())
plot_count_bars(cnt)
plot_count_line(cnt)
The example is intentionally super-simple but this separation becomes much more relevant in complex cases.