Python dataclass

Learn Python dataclass with simple examples, including @dataclass, default values, field(default_factory), frozen dataclasses, ordering, __post_init__, asdict(), and common mistakes.

Reviewed by Deepak Prasad

If you often write small classes that mostly hold data—users, settings rows, API payloads—you have probably duplicated the same __init__, __repr__, and __eq__ boilerplate. Below I start with one runnable @dataclass example, then we build up together: defaults, safe defaults for lists and dicts, your own methods, __post_init__, turning objects into dicts, freezing instances, comparing and ordering, hiding fields from repr or equality, nesting types, a quick inheritance pattern, and how this compares to NamedTuple and SimpleNamespace. When you need every edge case, keep the official dataclasses documentation open beside you—I stay close to what you are most likely to ship in real code.

Tested on: Python 3.13.3; kernel 6.14.0-37-generic.


Simple Python dataclass example

Let's start with the smallest thing that still feels useful: two fields, you construct one object, you print it.

python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    age: int


user = User("Alice", 30)
print(user)

Run that on your machine and you should get something like User(name='Alice', age=30)—a readable repr—without you hand-writing __init__ or __repr__.


What is a dataclass in Python?

Picture a normal Python class, then put @dataclass on top. The decorator runs once, right after the class body has been executed, and reads the type annotations you wrote to find the fields. It uses those names to build __init__, __repr__, and __eq__ for you (and more if you ask for it). That idea landed in Python 3.7 as PEP 557.

So for you the mental model is simple: you describe the shape of the data once at the top of the class; Python fills in the boring methods you used to copy-paste. If you want to remember what life without it felt like, the next section shows the same idea as a plain class.
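To make that mental model concrete, here is a minimal sketch that inspects what the decorator produced. The Size class and its fields are made up for illustration; fields() is the standard helper from the dataclasses module.

```python
from dataclasses import dataclass, fields


@dataclass
class Size:  # hypothetical example class
    width: int
    height: int


# fields() reports the names the decorator collected from the annotations.
field_names = [f.name for f in fields(Size)]
print(field_names)

# The generated __eq__ compares field values, not object identity.
print(Size(3, 4) == Size(3, 4))

# The generated methods were written onto the class itself.
generated = [name for name in ("__init__", "__repr__", "__eq__") if name in vars(Size)]
print(generated)
```

If you remove the decorator and run this again, fields() raises TypeError and the equality check falls back to identity, which is a quick way to see how much the one line on top is doing.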


dataclass vs normal class

Say you need a Book with title, author, and pages. Without dataclasses you usually wire the same three names into __init__, into how the object prints, and into equality—again and again.

python
class BookPlain:
    def __init__(self, title: str, author: str, pages: int):
        self.title = title
        self.author = author
        self.pages = pages

    def __repr__(self):
        return f"BookPlain(title={self.title!r}, author={self.author!r}, pages={self.pages})"

    def __eq__(self, other):
        if not isinstance(other, BookPlain):
            return NotImplemented
        return (self.title, self.author, self.pages) == (other.title, other.author, other.pages)

With @dataclass you write the fields once and stop there:

python
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    author: str
    pages: int

You can still call Book("1984", "George Orwell", 328) and use == between two books. The win for you shows up the moment you add or rename a field: you touch one place instead of three. If you want more background on classes and constructors in general, see the class and constructor guides.


Default values in a dataclass

You already know how default arguments work on functions—field defaults work the same way. Anything without a default must come first, then fields with defaults (Python enforces that ordering so the generated __init__ stays valid).

python
from dataclasses import dataclass


@dataclass
class Profile:
    username: str
    role: str = "member"
    points: int = 0


p = Profile("alex")
print(p)

Try creating Profile("alex") without passing role or points; you should see those filled in for you in the printed object.
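The ordering rule is easiest to believe once you see it fail. This is a small sketch, with a deliberately broken BadProfile class that I made up for the purpose; the exact wording of the error message varies between Python versions, so I only print it.

```python
from dataclasses import dataclass

try:
    @dataclass
    class BadProfile:  # hypothetical class that breaks the ordering rule
        role: str = "member"
        username: str  # no default, but it follows a field that has one
except TypeError as exc:
    # The decorator fails while building __init__, before any instance exists.
    error_message = str(exc)
    print(error_message)
```

The error appears when the class is defined, not when you first construct an object, so you usually catch it the moment the module is imported.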


Use field(default_factory) for lists and dictionaries

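Here is a trap I still see in real code: writing items: list = [] (or items: list[str] = []) as a default. In a plain class, that one list would be shared by every instance. @dataclass refuses it outright: for a list, dict, or set default it raises ValueError as soon as the class is defined. Either way, a literal is not what you want.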

Instead you hand dataclass a callable that builds a fresh empty container each time—usually list or dict—via field(default_factory=...).

python
from dataclasses import dataclass, field


@dataclass
class Cart:
    items: list[str] = field(default_factory=list)


cart1 = Cart()
cart2 = Cart()
cart1.items.append("book")
print(cart1.items)
print(cart2.items)

Run it: cart1 should show ['book'] while cart2 stays []. That is the behavior you expect when each cart owns its own list. The field() documentation spells out default_factory in more detail if you want the fine print.
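For contrast, here is a sketch of what happens without default_factory. PlainCart and BrokenCart are hypothetical names: the first shows the sharing problem in an ordinary class, the second shows that @dataclass rejects a list literal as a default when the class is defined.

```python
from dataclasses import dataclass


# In a plain class, a list written in the class body is shared by every instance.
class PlainCart:
    items = []


first, second = PlainCart(), PlainCart()
first.items.append("book")
print(second.items)  # same list object as first.items, so it is not empty

# @dataclass refuses the equivalent declaration at class-definition time.
try:
    @dataclass
    class BrokenCart:
        items: list[str] = []
except ValueError as exc:
    rejection = str(exc)
    print(rejection)
```

The ValueError message itself points you at default_factory, so the fix is usually a one-line change.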


Add methods to a dataclass

Nothing stops you from adding regular methods next to your fields—it is still a class. I use this when I want a tiny helper (formatting, a computed label) sitting right beside the data it belongs to.

python
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    author: str
    pages: int

    def summary(self) -> str:
        return f"{self.title} by {self.author} ({self.pages} pp.)"


b = Book("1984", "George Orwell", 328)
print(b.summary())

You should see a single readable line. If your class is turning into a big object graph with lots of behavior, ask yourself whether a plain class (or another pattern) reads clearer for your teammates—dataclasses shine when data is the point.


Use __post_init__ after object creation

The __init__ that @dataclass generates assigns every field and then, as its last step, calls __post_init__ if your class defines one. That hook is where I put validation (“if end < start, raise”), normalization, or values derived from other fields.

If a value should not be passed into the constructor at all, mark it with field(init=False) and set it inside __post_init__.

python
from dataclasses import dataclass, field


@dataclass
class Book:
    title: str
    author: str
    pages: int
    weight_kg: float = field(init=False)

    def __post_init__(self) -> None:
        self.weight_kg = round(self.pages * 0.001, 3)


b = Book("1984", "George Orwell", 328)
print(b.weight_kg)

You never pass weight_kg into Book(...); it appears after init. You should see a small float based on pages.
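The validation use case mentioned above can be sketched like this. DateRange, start_day, and end_day are illustrative names I picked; the only thing taken from the dataclasses API is the __post_init__ hook itself.

```python
from dataclasses import dataclass


@dataclass
class DateRange:  # hypothetical example class
    start_day: int
    end_day: int

    def __post_init__(self) -> None:
        # Called at the end of the generated __init__, so both fields are set.
        if self.end_day < self.start_day:
            raise ValueError("end_day must not be before start_day")


print(DateRange(1, 5))

try:
    DateRange(5, 1)
except ValueError as exc:
    print("rejected:", exc)
```

Because the check lives in __post_init__, it only runs at construction. Later assignments such as r.end_day = 0 are not re-validated unless you add that yourself.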


Convert dataclass to dictionary using asdict()

Sometimes you need JSON-friendly data or you are handing values to something that expects a dict. asdict() walks your instance and builds a dictionary, turning nested dataclasses into nested dicts along the way.

python
from dataclasses import asdict, dataclass


@dataclass
class User:
    name: str
    age: int


user = User("Alice", 30)
print(asdict(user))

You should see {'name': 'Alice', 'age': 30}. If you prefer a tuple in field order instead of a mapping, reach for astuple() the same way.

One heads-up (I come back to this in common mistakes): asdict() is not just “give me __dict__”. It recurses into nested dataclasses, lists, tuples, and dicts, and copies every other value with copy.deepcopy(), which can be slow or surprising on large or deeply linked objects.
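Here is a small sketch of that difference, assuming a made-up Playlist class. It compares the result of asdict() with vars(), which returns the instance's own __dict__, and then passes the plain dict to json.dumps.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Playlist:  # hypothetical example class
    name: str
    tracks: list[str] = field(default_factory=list)


playlist = Playlist("focus", ["intro"])
as_dict = asdict(playlist)

# The list inside the result is a copy, so changing it leaves the instance alone.
as_dict["tracks"].append("outro")
print(playlist.tracks)

# vars() exposes the instance's own __dict__, which still shares the list.
shares_list = vars(playlist)["tracks"] is playlist.tracks
print(shares_list)

# The plain dict can be handed straight to json.dumps.
payload = json.dumps(asdict(playlist))
print(payload)
```

If you only need a shallow view and copying is a concern, vars(obj) or a hand-written to_dict method may be a better fit than asdict().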


Frozen dataclass for immutable objects

Pass frozen=True when you want the instance to behave more like a value: after construction, attribute writes raise FrozenInstanceError. I use this for small identifiers, coordinates, or anything you do not want changed by accident.

python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


point = Point(1, 2)
try:
    point.x = 10
except FrozenInstanceError as e:
    print(type(e).__name__, str(e))

You should see FrozenInstanceError printed along with text telling you the field cannot be assigned.
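Two things tend to come up right after freezing a class: using instances as set members or dict keys, and making a changed copy. The sketch below assumes a hypothetical GridPoint class and uses dataclasses.replace() from the standard library.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GridPoint:  # hypothetical example class
    x: int
    y: int


origin = GridPoint(0, 0)

# frozen=True together with the default eq=True makes instances hashable,
# and equal instances collapse to one set member.
visited = {origin, GridPoint(0, 0)}
print(len(visited))

# replace() builds a new instance; the original is left untouched.
moved = replace(origin, x=5)
print(origin, moved)
```

Note that freezing is shallow: a frozen instance holding a list still lets you mutate that list, and such an instance would no longer hash cleanly.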


Order and compare dataclass objects

Flip on order=True when you want <, >, and friends generated for you. Here is the part that trips people up: Python does not guess which field “matters most.” It walks your fields left to right, like comparing two tuples built from those values.

python
from dataclasses import dataclass


@dataclass(order=True)
class Score:
    points: int
    name: str


a = Score(90, "Alice")
b = Score(80, "Bob")
print(a > b)

You get True because 90 beats 80 on the first field. If points were equal, Python would move on to name and compare strings lexicographically.

So if you had title, then author, then pages, a > comparison looks at title first—not pages alone. When you care about sorting by one column, either declare that field earlier or mark other fields with field(compare=False) so they drop out of comparisons. You are in control of the ordering story.
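As a sketch of the compare=False approach, here is a hypothetical RankedBook that sorts by pages only. The page count for the second book is an illustrative value, not a checked figure.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class RankedBook:  # hypothetical example class
    title: str = field(compare=False)
    author: str = field(compare=False)
    pages: int  # the only field left in comparisons


books = [
    RankedBook("Dune", "Frank Herbert", 612),  # illustrative page count
    RankedBook("1984", "George Orwell", 328),
]

shortest_first = sorted(books)
print([b.title for b in shortest_first])

# Side effect: == now ignores title and author as well.
same_length = RankedBook("A", "x", 100) == RankedBook("B", "y", 100)
print(same_length)
```

That side effect on equality is the trade-off. If you want full equality but a one-column sort, leaving the class alone and calling sorted(books, key=lambda b: b.pages) is often the simpler choice.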


Exclude fields from repr or comparison

Maybe you do not want a secret token showing up whenever someone prints your object, or you want equality to ignore a volatile cache field. Per-field field() flags let you tune what the generated methods include.

python
from dataclasses import dataclass, field


@dataclass
class Account:
    username: str
    token: str = field(repr=False, compare=False)


a1 = Account("alex", "secret-a")
a2 = Account("alex", "secret-b")
print(a1)
print(a1 == a2)

When you print a1, you should not see token in the output, and a1 == a2 should be True because only username participates in equality for you here.


Nested dataclasses

You can nest them like any other type: one dataclass field whose type is another dataclass. When you call asdict() on the outer object, nested dataclasses become nested dicts, which is handy when you are building payloads.

python
from dataclasses import asdict, dataclass


@dataclass
class Address:
    city: str
    zip_code: str


@dataclass
class Person:
    name: str
    home: Address


p = Person("Riya", Address("Pune", "411001"))
print(asdict(p))

You should see something like {'name': 'Riya', 'home': {'city': 'Pune', 'zip_code': '411001'}}.


Inheritance with dataclasses

You can subclass the same way you already do with regular classes: the child lists extra annotated fields, and the parent fields come first in the generated constructor. Just watch the usual rule across the whole inheritance chain: non-default fields before defaulted ones. The generated __init__ sets the inherited fields itself rather than calling the parent's __init__, so if a parent defines __post_init__ and the child defines one too, call super().__post_init__() from the child to keep the parent's logic.

python
from dataclasses import dataclass


@dataclass
class Publication:
    title: str


@dataclass
class Magazine(Publication):
    issue: int


m = Magazine("PyMag", 42)
print(m)

You should see both title and issue in the repr. If you stack several levels of inheritance, skim the official notes on init=False on parent classes so you are not surprised by constructor shape.


dataclass vs NamedTuple vs SimpleNamespace

Here is how I choose between them in practice:

Construct | Mutability | Boilerplate | When I reach for it
@dataclass | Mutable unless you set frozen=True | Low for init/repr/eq | App models with fields, defaults, and a few methods
typing.NamedTuple / collections.namedtuple | Immutable, tuple-like | Low | Small fixed records, unpacking, hashable values when you need them
types.SimpleNamespace | Mutable bag of attributes | None | Quick experiments, not a schema I would publish

If your data is really a row of values, a tuple or named tuple variant might still fit. If you live in string keys and dynamic shape, a dictionary might be simpler than a class. Dataclasses sit in the middle: named, typed fields, less ceremony than rolling everything by hand.
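Here is a side-by-side sketch of the three constructs holding the same data. UserData and UserTuple are hypothetical names chosen so they do not clash with the earlier User example.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import NamedTuple


@dataclass
class UserData:
    name: str
    age: int


class UserTuple(NamedTuple):
    name: str
    age: int


d = UserData("Alice", 30)
t = UserTuple("Alice", 30)
n = SimpleNamespace(name="Alice", age=30)

d.age = 31          # dataclass instances are mutable by default
n.nickname = "Al"   # a namespace accepts any new attribute, no schema

try:
    t.age = 31      # named tuples reject attribute assignment
except AttributeError as exc:
    print("NamedTuple:", exc)

name, age = t       # tuple unpacking works for the named tuple
print(d, t, n, sep="\n")
```

All three give you a readable repr, so the deciding questions are usually whether you need mutation, unpacking, or a fixed set of fields.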


Common mistakes with Python dataclasses

Things I watch for when I review code:

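  • Forgetting a type annotation on something you meant as a field: without it, @dataclass leaves that name as a plain class attribute and keeps it out of __init__, __repr__, and __eq__.
  • Using items: list[str] = [] (or any other list, dict, or set literal) as a default: @dataclass raises ValueError when the class is defined, and in a plain class the same line would silently share one list across all instances; use field(default_factory=list) instead.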
  • Expecting Python to reject User(123, "oops") at runtime—annotations help you and tools like mypy, but the interpreter itself does not enforce them unless you add checks (often in __post_init__).
  • Believing order=True magically sorts by the field you care about—comparisons walk fields in declaration order; reorder fields or use compare=False where needed.
  • Growing a dataclass into a huge behavior-heavy hierarchy—sometimes a plain class or composition reads clearer for the next person who opens the file.
  • Assuming instances are immutable—by default you can still assign new values to attributes; reach for frozen=True when you need immutability.
  • Treating asdict() as a cheap shallow view—it recursively expands dataclasses and follows the documented copying rules, which matters for performance and for what still aliases.
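Two of the mistakes above, the missing annotation and the missing runtime type check, are quick to reproduce. This is a sketch with hypothetical Settings and StrictSettings classes; the isinstance check is just one possible way to validate.

```python
from dataclasses import dataclass, fields


@dataclass
class Settings:
    host: str
    port = 8080  # no annotation, so this is a class attribute, not a field


settings_fields = [f.name for f in fields(Settings)]
print(settings_fields)

# Annotations are not enforced: this call succeeds with an int as host.
s = Settings(123)
print(s)


@dataclass
class StrictSettings:
    host: str

    def __post_init__(self) -> None:
        # An explicit check is needed if bad input should be rejected.
        if not isinstance(self.host, str):
            raise TypeError("host must be a str")


try:
    StrictSettings(123)
except TypeError as exc:
    print("rejected:", exc)
```

A static checker such as mypy would flag Settings(123) before the code runs, which is often enough without any runtime check.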

Python dataclass quick reference table

What you want | What you usually write
A small data record | @dataclass + annotated fields
A scalar default | role: str = "guest"
A fresh list or dict per object | items: list[str] = field(default_factory=list)
Logic right after construction | def __post_init__(self): ...
A field not passed into __init__ | field(init=False) and assign in __post_init__
Immutability | @dataclass(frozen=True)
Rich comparisons | @dataclass(order=True)
Hide from repr or skip in == | field(repr=False) / field(compare=False)
Dict for JSON or logging | asdict(obj)
Tuple in field order | astuple(obj)

Summary

By now you have seen how @dataclass lets you declare fields once and lean on Python for __init__, __repr__, and __eq__, with optional ordering and freezing when you need them. You default scalars directly, use default_factory for mutable defaults, hook __post_init__ for validation and derived values, and use asdict / astuple when another layer of your program wants plain data structures. Keep comparisons honest—field order drives order=True—and remember annotations document intent for you and for static checkers, not automatic runtime guards unless you add them yourself.



Frequently Asked Questions

1. What does the @dataclass decorator add automatically?

For you it means less typing: Python generates __init__, __repr__, and __eq__ from the fields you annotate, and you can turn on extras like ordering or a frozen instance using the flags described in the official dataclasses docs.

2. Do Python dataclasses check types at runtime?

Not by default. The annotations tell @dataclass what counts as a field and document intent for you and for tools like mypy, but CPython will not stop you from passing the wrong type unless you add checks yourself (often in __post_init__).

3. When should I use field(default_factory=...) instead of a default value?

Whenever the default would be mutable, like a list or dict: use default_factory so each new instance gets its own container instead of accidentally sharing one list across every object.

4. How does ordering work when I use order=True on a dataclass?

Python compares your instances like tuples built from the fields in the order you declared them—first field, then the next to break ties—not by magically picking “the main” field unless you arrange fields or compare flags that way yourself.