Convert pandas MultiIndex DataFrame to Nested Dictonary

Updated: 3 minute read

Background/Problem

Just as yesterday’s post, I started writing this post about a year ago, but never finished it. Let’s change that now.

So, I had the following problem: I wanted to turn the data in a pandas.DataFrame [1] with a MultiIndex into a list of nested dictionaries. Wait what? Let me clarify with an example:

The following short script, sets up an example DataFrame:

import pandas as pd
import numpy as np

one = 6 * ["Alice"] + 6 * ["Bob"]
two = 2 * (2 * ["foo"] + 2 * ["bar"] + 2 * ["baz"])
three = 6 * ["one", "two"]

tuples = list(zip(*[one, two, three]))

index = pd.MultiIndex.from_tuples(tuples, names=["first", "second", "third"])

np.random.seed(42)
df = pd.DataFrame(
    np.random.randint(low=1, high=11, size=(3, 12)),
    index=["A", "B", "c"],
    columns=index
)

If you print it (print(df)), you get the following output:

first  Alice                     Bob
second   foo     bar     baz     foo     bar     baz
third    one two one two one two one two one two one two
A          7   4   8   5   7  10   3   7   8   5   4   8
B          8   3   6   5   2   8   6   2   5   1  10   6
c          9   1  10   3   7   4   9   3   5   3   7   5

At this point, I needed a way to convert every row (A, B, and C) into a nested dictionary. Let’s say the dictionary for row A is stored in dict_A. Printing it (print(dict_A)) should lead to the following output (I reformatted it a little bit for better readability):

{
    'Alice': {
        'bar': {'one': 8, 'two': 5},
        'baz': {'one': 7, 'two': 10},
        'foo': {'one': 7, 'two': 4}
    },
    'Bob': {
        'bar': {'one': 8, 'two': 5},
        'baz': {'one': 4, 'two': 8},
        'foo': {'one': 3, 'two': 7}
    }
}

If you think to yourself: Why would you want to do that? Well, I had a Jinja [2] template and the data in the DataFrame was supposed to be used in that template. A nested dictionary seemed to be the best data format for passing the data into the Jinja’s render function.

Ok, let’s look at how I accomplished that.

Solution

First I thought that this should be pretty easy to do. Every row in a DataFrame is of type <class 'pandas.core.series.Series'>. You can verify that with:

print(type(df.iloc[0]))

Series objects have a method called to_dict [3] that I was hoping would to the job:

print(df.iloc[0].to_dict())

But the output of that command looks like this (again a little reformatted):

{
    ('Alice', 'foo', 'one'): 7,
    ('Alice', 'foo', 'two'): 4,
    ('Alice', 'bar', 'one'): 8,
    ('Alice', 'bar', 'two'): 5,
    ('Alice', 'baz', 'one'): 7,
    ('Alice', 'baz', 'two'): 10,
    ('Bob', 'foo', 'one'): 3,
    ('Bob', 'foo', 'two'): 7,
    ('Bob', 'bar', 'one'): 8,
    ('Bob', 'bar', 'two'): 5,
    ('Bob', 'baz', 'one'): 4,
    ('Bob', 'baz', 'two'): 8
}

Well, not exactly what I was hoping for.

After some research and some tinkering around, I wrote the following function:

def convert_series_to_dict(series):
    if (series.index.nlevels == 1):
        return series.to_dict()
    else:
        output_dict = {}
        for header in series.index.levels[0]:
            if (header in series.index):
                output_dict[header] = convert_series_to_dict(series.xs(header))
        return output_dict

It’s a recursive function that checks the number of levels in the index of the passed in Series. It that number is one, it just calls the to_dict method on it. Otherwise it traverses through the levels and builds up a nested dictionary. If you call this function on the first row of the DataFrame:

print(convert_series_to_dict(df.iloc[0]))

you get exactly, what I was trying to get (see above in the Background/Problem section).

As a final touch, to create the list of nested dictionaries that I wanted, the following was added to the script:

list_of_dicts = []
for index, row in df.iterrows():
    list_of_dicts.append(convert_series_to_dict(row))

I won’t add the output you would get by printing list_of_dicts. I guess you can imagine what is looks like.

Changelog

2024-10-23:

  • Added missing line in first code listing



Take care,
Andreas


References

  1. NumFOCUS Inc., “pandas.DataFrame.” [Online]. Available at: https://docs.u-boot.org/en/latest/. [Accessed: 07-Feb-2024].
  2. Pallets, “Jina,” 2007. [Online]. Available at: https://jinja.palletsprojects.com/en/3.1.x/. [Accessed: 07-Feb-2024].
  3. NumFOCUS Inc., “pandas.DataFrame.” [Online]. Available at: https://pandas.pydata.org/docs/reference/api/pandas.Series.to_dict.html. [Accessed: 26-Jan-2024].

Updated:

Leave a comment