Python Marshmallow Tutorial: Schema, Fields, dump(), and load()

Learn Python Marshmallow with examples for creating schemas, using fields, serializing data with dump(), deserializing and validating data with load(), nested schemas, many=True, DateTime formatting, and common errors.

Reviewed by Deepak Prasad

Marshmallow is a small library for structuring, validating, and serializing data in Python. You declare a Schema with fields, call dump() when sending data outward, and load() when accepting untrusted input. The sections below walk through schemas, common field types, validation, nesting, and the hooks that sit around dump() and load().

Install with pip (see pip requirements file for project pinning). For error handling patterns around ValidationError, see Python try except.

Tested on: Python 3.13.3; marshmallow 3.26.2; kernel 6.14.0-37-generic.


What is Marshmallow in Python?

Marshmallow is a serialization and validation library. You describe the expected shape of data once in a Schema class (field names, types, which keys are required, and optional validators). That description then drives two operations: turning Python data into plain dicts you can send as JSON (dump()), and turning untrusted dicts (for example parsed request bodies) into validated Python data (load()).

It does not replace your database or web framework. It sits at the boundary where structured data enters or leaves your program—so you validate early, fail with clear field-level errors, and keep your internal code working with predictable dicts or objects.

A minimal round trip: define fields, load() inbound data, then dump() it again for a response or log line.

python
from marshmallow import Schema, fields


class BookSchema(Schema):
    title = fields.Str(required=True)
    pages = fields.Int(load_default=0)


payload = {"title": "Python Distilled", "pages": 250}
book = BookSchema().load(payload)
print("after load:", book)
print("after dump:", BookSchema().dump(book))

After load, book is a dict with normalized types (here pages is an int). After dump, you get JSON-serializable values—often strings for dates and decimals, depending on field types. The following sections unpack Schema, individual field types, errors, nesting, and hooks.


Install Marshmallow using pip

text
pip install marshmallow

Pin a major series in production (for example marshmallow>=3.21,<4) so upgrades stay predictable.

Marshmallow is a third-party package, so a sandbox that ships only the standard library cannot run these examples. Run the snippets on your own machine after installing Marshmallow (for example, save a file and run python example.py, or paste into python -i).


Create a Marshmallow schema

A schema subclasses Schema and attaches fields as class attributes. Instantiate the class once and reuse it like a small codec:

python
from marshmallow import Schema, fields


class UserSchema(Schema):
    id = fields.Int(dump_only=True)
    name = fields.Str(required=True)
    email = fields.Email(required=True)
    age = fields.Int(load_default=None)


schema = UserSchema()

This schema expects name and email on input; id is output-only in typical APIs; age may be omitted on load and becomes None.


Common Marshmallow field types

fields.Str, fields.Int, fields.Float, fields.Bool, fields.Email, fields.Url, fields.Date, fields.DateTime, fields.Decimal, fields.List, fields.Dict, and fields.Nested cover most payloads. Use validate helpers such as validate.Length, validate.Range, and validate.OneOf to express rules next to the field.

python
from marshmallow import Schema, fields, validate


class TagSchema(Schema):
    label = fields.Str(validate=validate.Length(min=1, max=40))
    count = fields.Int(validate=validate.Range(min=0))

Serialize data using dump()

dump() walks the object or dict field-by-field and returns a plain dict (order-preserving) ready for json.dumps. It does not run the same validators as load() unless you customize that path—treat dumped data as already trusted.

python
from marshmallow import Schema, fields


class UserSchema(Schema):
    id = fields.Int(dump_only=True)
    name = fields.Str(required=True)
    email = fields.Email(required=True)
    age = fields.Int(load_default=None)


user = {"id": 1, "name": "Ada", "email": "ada@example.com", "age": 36}
print(UserSchema().dump(user))

That includes id, name, email, and age keys suitable for JSON encoding.


Deserialize and validate data using load()

load() validates required fields, runs validators, and coerces types. Valid input returns a plain dict:

python
from marshmallow import Schema, fields


class UserSchema(Schema):
    id = fields.Int(dump_only=True)
    name = fields.Str(required=True)
    email = fields.Email(required=True)
    age = fields.Int(load_default=None)


payload = {"name": "Lin", "email": "lin@example.com"}
print(UserSchema().load(payload))

That returns {'name': 'Lin', 'email': 'lin@example.com', 'age': None} because load_default supplies age when missing.


dump() vs load() in Marshmallow

Think in terms of data direction and trust.

load() answers: “Is this dict allowed, and what Python values should I use?” It runs required-field checks, type coercion (strings to numbers, ISO strings to datetime, and so on), and any validators you attached. It is the right place to gate untrusted input (HTTP bodies, CLI flags, config files) before that data reaches your domain logic. When something is wrong, Marshmallow raises ValidationError and you typically map err.messages to HTTP 400 or similar.

dump() answers: “How do I expose this object or dict to the outside world?” It reads attributes or keys and builds a plain dict suitable for json.dumps or logging. By default it does not re-run the same validation pipeline as load(); you assume the value is already valid for your API contract. Use dump() for responses, audit payloads, and message queues—after your business rules have already accepted the data.

The two calls can intentionally see different fields: dump_only (for example id, created_at) appears on output but is not accepted on input, where marshmallow 3 treats it as an unknown key and, under the default unknown=RAISE policy, raises ValidationError; load_only (for example password) is accepted on input but stripped from serialized output. Constructor options such as only and exclude further narrow which fields participate in a given call without editing the class.


Required fields and validation

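required=True on a field makes load() fail if the key is missing. An explicit None is rejected separately ("Field may not be null.") unless the field sets allow_none=True, which is also the default when load_default=None. Combine with field-level validate for richer rules: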

python
from marshmallow import Schema, fields, validate


class ProfileSchema(Schema):
    username = fields.Str(required=True, validate=validate.Length(min=3))
    bio = fields.Str(load_default="")

Handle ValidationError

Wrap load() in try/except and read err.messages—a dict mirroring field paths. For general exception patterns, see Python try except.

python
from marshmallow import Schema, fields, validate, ValidationError


class LoginSchema(Schema):
    username = fields.Str(required=True, validate=validate.Length(min=3))
    password = fields.Str(required=True, validate=validate.Length(min=8))


try:
    LoginSchema().load({"username": "ab", "password": "short"})
except ValidationError as err:
    print(err.messages)

That prints a dict pointing at failing fields (for example length errors under username and password).


Nested schemas in Marshmallow

fields.Nested embeds another schema for structured sub-objects:

python
from marshmallow import Schema, fields


class AddressSchema(Schema):
    city = fields.Str(required=True)
    country = fields.Str(required=True)


class PersonSchema(Schema):
    name = fields.Str(required=True)
    address = fields.Nested(AddressSchema, required=True)


payload = {
    "name": "Kim",
    "address": {"city": "Seoul", "country": "KR"},
}
print(PersonSchema().load(payload))

Serialize and deserialize lists using many=True

When the JSON body is an array of objects, set many=True on the schema instance. Each element is validated independently; errors include the list index in the error path.

python
from marshmallow import Schema, fields


class UserSchema(Schema):
    id = fields.Int(dump_only=True)
    name = fields.Str(required=True)
    email = fields.Email(required=True)
    age = fields.Int(load_default=None)


rows = [
    {"name": "Ada", "email": "ada@example.com", "age": 36},
    {"name": "Lin", "email": "lin@example.com"},
]
print(UserSchema(many=True).load(rows))

For a nested list inside one object, use fields.Nested(SomeSchema, many=True) on the parent field instead.


Format DateTime fields in Marshmallow

Use the format argument to control string output during dump() and accepted string patterns on load():

python
from datetime import datetime
from marshmallow import Schema, fields


class EventSchema(Schema):
    name = fields.Str()
    starts = fields.DateTime(format="%Y-%m-%d %H:%M:%S")


dt = datetime(2026, 6, 20, 14, 30, 0)
print(EventSchema().dump({"name": "Meet", "starts": dt}))

For timezone-aware values and ISO strings, prefer aware datetime objects and document the format your API expects; see also Python datetime for clock and timezone basics.


Use only, exclude, load_only, and dump_only

Constructor arguments only and exclude temporarily narrow which fields participate:

python
from marshmallow import Schema, fields


class UserSchema(Schema):
    id = fields.Int(dump_only=True)
    name = fields.Str(required=True)
    email = fields.Email(required=True)
    age = fields.Int(load_default=None)


minimal = UserSchema(only=("name", "email")).dump(
    {"id": 1, "name": "Ada", "email": "a@e.com", "age": 40}
)
print(minimal)

Field flags load_only and dump_only split read vs write paths—passwords are the classic load_only case:

python
from datetime import datetime, timezone
from marshmallow import Schema, fields


class AccountSchema(Schema):
    username = fields.Str(required=True)
    password = fields.Str(required=True, load_only=True)
    last_login = fields.DateTime(dump_only=True)


acct = AccountSchema().dump(
    {
        "username": "ada",
        "password": "secret",
        "last_login": datetime(2026, 6, 19, 12, 0, tzinfo=timezone.utc),
    }
)
print(acct)

password is accepted on load() but omitted from dump(); last_login appears on output but is not accepted on input (under the default unknown=RAISE policy, sending it to load() raises ValidationError). The printed dict includes username and last_login only.


Transform data with pre_load, post_load, pre_dump, and post_dump

Method decorators let you normalize or enrich data around validation:

python
from marshmallow import Schema, fields, pre_load, post_load, pre_dump, post_dump


class DemoSchema(Schema):
    name = fields.Str()

    @pre_load
    def strip_name(self, data, **kwargs):
        if isinstance(data, dict) and isinstance(data.get("name"), str):
            data = {**data, "name": data["name"].strip()}
        return data

    @post_load
    def add_meta(self, data, **kwargs):
        data["loaded"] = True
        return data

    @pre_dump
    def pass_through(self, obj, **kwargs):
        return dict(obj)

    @post_dump
    def tag_dump(self, data, **kwargs):
        data["dumped"] = True
        return data


print(DemoSchema().load({"name": "  ada  "}))
print(DemoSchema().dump({"name": "Ada"}))

pre_load/post_load wrap load(); pre_dump/post_dump wrap dump().


Common Marshmallow mistakes

  • Calling load() on a raw JSON string: parse it first (json.loads) so Marshmallow sees dicts and lists, or use Schema.loads(), which parses and loads in one step.
  • Forgetting many=True when the payload is a top-level array.
  • Expecting dump() to validate inbound rules—validation defaults target load(); keep that separation clear.
  • Reusing a single Schema instance concurrently across threads without care—create per request if isolation matters.
  • Being surprised by unknown keys: load() raises on them by default, so set the policy explicitly on Meta:
python
from marshmallow import Schema, fields, EXCLUDE


class Loose(Schema):
    name = fields.Str()

    class Meta:
        unknown = EXCLUDE


print(Loose().load({"name": "Ada", "extra": 1}))

That loads {'name': 'Ada'} and drops extra instead of raising.


Python Marshmallow quick reference table

Task                      Pattern
Serialize                 Schema().dump(obj)
Deserialize + validate    Schema().load(data)
List of objects           Schema(many=True).load([...])
Nested object             fields.Nested(ChildSchema)
Nested list               fields.Nested(ChildSchema, many=True)
Hide secret on output     load_only=True
Read-only API field       dump_only=True
Subset of fields          Schema(only=(...)) / Schema(exclude=(...))
Validation errors         except ValidationError as err: err.messages

Summary

Marshmallow centers on Schema classes and fields: dump() serializes trusted data into JSON-friendly dicts, while load() deserializes and validates inbound dicts, raising ValidationError when rules fail. Required fields, validators, and ValidationError.messages give structured API errors. fields.Nested and many=True model objects inside objects and lists. DateTime formatting keeps string contracts explicit. only, exclude, load_only, and dump_only separate read and write shapes. pre_load, post_load, pre_dump, and post_dump normalize and enrich data around the two core calls. Once these patterns are clear, wire schemas into a Flask web app.



Frequently Asked Questions

1. What is Marshmallow used for in Python?

Marshmallow defines Schema classes that describe the shape of data, validate incoming structures with load(), and serialize Python objects or dicts to plain data with dump(). It is common in APIs and configuration handling and does not depend on any specific web framework.

2. What is the difference between dump() and load() in Marshmallow?

dump() turns Python objects or dicts into serializable dicts (outbound); load() parses and validates untrusted input dicts into Python types and can raise ValidationError when rules fail (inbound).

3. How do I validate nested objects or lists in Marshmallow?

Use fields.Nested for a single sub-object, add many=True on Nested for a list of objects, or set many=True on the parent Schema when the top-level JSON is an array of records.

4. How do I hide a password in API responses but still accept it on input?

Mark the field load_only=True so it participates in load() but is omitted from dump() output, often paired with dump_only fields like id or created_at.

Bashir Alam

Data Analyst and Machine Learning Engineer