01 Write smaller functions that are easy to combine

Let’s say we have a function that takes a data frame and a column name and draws a bar plot of how often each category occurs in that column.

import pandas as pd

def load_data():
    return pd.DataFrame({'x': list('aababc')})

def plot_count_bars(data: pd.DataFrame, column: str):
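    # rename_axis('index') keeps the category column named 'index' in all pandas versions
    cnt = data[column].value_counts().rename_axis('index').rename('counts').reset_index()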
    return cnt.plot.bar(x='index', y='counts')

plot_count_bars(load_data(), column='x')
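It helps to look at the intermediate table that the plotting call works with. The sketch below builds it with the same chain of pandas calls on the same toy data. It includes a rename_axis('index') step so that the category column is called index: since pandas 2.0, value_counts names its index after the original column (x here), so without that step the x='index' argument would no longer match any column.

```python
import pandas as pd

# Same toy data as load_data() above
df = pd.DataFrame({'x': list('aababc')})

# Count each category, call the counts column 'counts', and pin the name of
# the category column to 'index' so the result does not depend on the pandas version
cnt = (
    df['x']
    .value_counts()
    .rename_axis('index')
    .rename('counts')
    .reset_index()
)
print(cnt)
```

Because value_counts sorts by frequency in descending order, the rows come out as a (3), b (2), c (1), and that is also the order of the bars.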

Looks good!

However, what if we later write another function that produces a similar plot, but with dots connected by a line instead of bars?

def plot_count_line(data: pd.DataFrame, column: str):
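    # rename_axis('index') keeps the category column named 'index' in all pandas versions
    cnt = data[column].value_counts().rename_axis('index').rename('counts').reset_index()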
    return cnt.plot.line(x='index', y='counts', marker='o')

plot_count_line(load_data(), column='x')

It works as well, but notice that the counting code is now duplicated in both functions. It’s better to keep the computation separate from the plotting.

For example, create a new function called counts that takes a data frame and a column name and computes the counts. Then modify the plotting functions so that they take the precomputed values and only do the rendering.

def counts(data: pd.DataFrame, column: str):
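    # rename_axis('index') keeps the category column named 'index' in all pandas versions
    return data[column].value_counts().rename_axis('index').rename('counts').reset_index()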

def plot_count_bars(cnt: pd.DataFrame):
    return cnt.plot.bar(x='index', y='counts')

def plot_count_line(cnt: pd.DataFrame):
    return cnt.plot.line(x='index', y='counts', marker='o')

cnt = counts(load_data(), column='x')
plot_count_bars(cnt)
plot_count_line(cnt)
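Because counts no longer draws anything, it can be checked on its own, without any plotting backend involved. Below is a minimal sketch of such a check. The test function name and the use of a plain assert are illustrative choices, not part of the original example, and counts is repeated here (with the rename_axis('index') step that keeps the column names stable across pandas versions) so that the block runs by itself.

```python
import pandas as pd

def counts(data: pd.DataFrame, column: str) -> pd.DataFrame:
    # Same computation as above; rename_axis keeps the category column
    # named 'index' regardless of the pandas version
    return data[column].value_counts().rename_axis('index').rename('counts').reset_index()

def test_counts():
    # Hypothetical test: only the computation is exercised, nothing is rendered
    data = pd.DataFrame({'x': list('aababc')})
    cnt = counts(data, column='x')
    assert list(cnt.columns) == ['index', 'counts']
    assert dict(zip(cnt['index'], cnt['counts'])) == {'a': 3, 'b': 2, 'c': 1}

test_counts()
```

With a test runner such as pytest, the final test_counts() call would be dropped, since pytest collects functions whose names start with test_.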

The example is intentionally very simple, but this separation pays off much more in complex cases, where the computation involves more steps and the same result is rendered in several different ways.
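As an illustration of combining the pieces, suppose we later want to plot only the most frequent categories. With the computation split out, that becomes one more small function placed between counts and the plotting call. The top_n helper below is a hypothetical addition, not part of the original example, and the sketch stops short of plotting so that it runs without matplotlib.

```python
import pandas as pd

def counts(data: pd.DataFrame, column: str) -> pd.DataFrame:
    # Category counts with stable column names: 'index' and 'counts'
    return data[column].value_counts().rename_axis('index').rename('counts').reset_index()

def top_n(cnt: pd.DataFrame, n: int) -> pd.DataFrame:
    # Hypothetical helper: keep only the n most frequent categories
    return cnt.nlargest(n, 'counts')

data = pd.DataFrame({'x': list('aababc')})
top = top_n(counts(data, column='x'), n=2)
# top has the same columns as cnt, so plot_count_bars(top) or
# plot_count_line(top) would work without any changes
print(top)
```

Neither plotting function has to know that the data was filtered; each step can be added, removed, or reordered independently.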