Computing Thoughts

Bruce Eckel's Programming Blog

Sep 16, 2018 - 4 minute read

JSON Encoding Python Dataclasses

The Hugo static-site generator can work with data files in the form of JSON, yaml or toml. If you place these in the data directory you can access them within Hugo templates (including Hugo shortcodes, which are called directly from a Markdown file) by saying .Site.Data.<filename>, and then use the contents as part of your static-site build.

This feature came in handy for the AtomicKotlin.com site, because the ebook deployment service I’m using (Stepik.org) uses numeric ids for each chapter, and I needed to generate links for the sample chapters. I decided to generate JSON from this list of ids, and thought it would be interesting to see whether Python 3.7’s @dataclass could be used for this.

The @dataclass decorator is only available in Python 3.7 and later. It follows the precedent set by languages like Scala (case classes) and Kotlin (data classes). It creates what is sometimes called a data transfer object, whose job is only to hold data. The problem is that you often need to create proper behavior for such objects, which requires defining methods for equality comparison, display, etc., and these methods are tedious to create and prone to mistakes. A @dataclass generates all these methods for you, providing a succinct syntax for data transfer classes.

I’ll start by creating a Person class. Note that @dataclass is only possible because of the addition of optional static typing to Python 3, as we need to declare types for the fields in the class. Observe how simple it is to create a @dataclass:

# person.py
# Requires Python 3.7
from dataclasses import dataclass
from pprint import pprint

@dataclass
class Person:
    first: str
    last: str
    id: int

def person_list(person_constructor, source: str):
    return [
        person_constructor(n[0], n[1], i)
        for i, n in enumerate(map(str.split,
            source.strip().splitlines()))
    ]

def muppets(person_constructor):
    return person_list(
        person_constructor,
        """
        Kermit Frog
        Fozzie Bear
        Bunsen Honeydew
        Rowlf Dog
        Camilla Chicken
        """)

if __name__ == "__main__":
    pprint(muppets(Person))

person_list() takes a person_constructor argument which it applies to create each instance of a Person object. This allows us to use a different Person constructor later in this post.

person_list() takes a block of text, splits it into lines, then maps str.split() to each line, producing a first and last name. It uses enumerate() to generate id numbers, and returns a list of Person objects for use in testing.

The output shows that the @dataclass decorator automatically generates a __str__() method for Person:

[Person(first='Kermit', last='Frog', id=0),
 Person(first='Fozzie', last='Bear', id=1),
 Person(first='Bunsen', last='Honeydew', id=2),
 Person(first='Rowlf', last='Dog', id=3),
 Person(first='Camilla', last='Chicken', id=4)]

However, there is no support for JSON encoding—that’s something we must do ourselves by inheriting the JSONEncoder class and overriding the default() method:

# person_encoder.py
import json
from person import Person, muppets

class PersonEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Person):
            return obj.__dict__
        # Base class default() raises TypeError:
        return json.JSONEncoder.default(self, obj)

if __name__ == "__main__":
    print(json.dumps(muppets(Person), cls=PersonEncoder, indent=2))

The basic JSONEncoder can only encode dicts, lists and primitive types. If you create your own encoder, you must check for that type inside your default() and produce a result that the basic encoder can handle.

Notice what I’ve done here: all I had to do was produce the internal object dictionary by saying return obj.__dict__. Because the basic encoder knows what to do with a dict, it can produce the keys and values from the __dict__ produced by the @dataclass.

When calling json.dumps(), you must provide the new PersonEncoder as the cls argument. Now we get properly-formatted JSON as our output:

[
  {
    "first": "Kermit",
    "last": "Frog",
    "id": 0
  },
  {
    "first": "Fozzie",
    "last": "Bear",
    "id": 1
  },
  {
    "first": "Bunsen",
    "last": "Honeydew",
    "id": 2
  },
  {
    "first": "Rowlf",
    "last": "Dog",
    "id": 3
  },
  {
    "first": "Camilla",
    "last": "Chicken",
    "id": 4
  }
]

What’s interesting is that I didn’t hand it a Person object, but a list. json.dumps() knew to take the list apart, then apply my PersonEncoder to each element of the list.

To take the example a little further, let’s see how hard it is to apply the same technique to a namedtuple, which has much in common with a @dataclass except that the fields of a namedtuple are immutable. It turns out that PersonEncoder is identical:

# namedtuple_encoder.py
import json
from person import muppets
from collections import namedtuple

Person = namedtuple("Person", "first last id")

class PersonEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Person):
            return obj.__dict__
        return json.JSONEncoder.default(self, obj)

if __name__ == "__main__":
    print(json.dumps(muppets(Person), cls=PersonEncoder, indent=2))

Notice there’s a slight difference in the resulting JSON:

[
  [
    "Kermit",
    "Frog",
    0
  ],
  [
    "Fozzie",
    "Bear",
    1
  ],
  [
    "Bunsen",
    "Honeydew",
    2
  ],
  [
    "Rowlf",
    "Dog",
    3
  ],
  [
    "Camilla",
    "Chicken",
    4
  ]
]

Instead of a list of dicts, the namedtuple produces a list of lists.