Relationships and Joins
This guide covers how Proper applications model the connections between data: declaring relationships, following them in code, joining tables, and avoiding the most common performance traps.
After reading this guide, you will know:
- Why relationships exist, and how a foreign key keeps your data consistent.
- How to declare one-to-many, one-to-one, many-to-many, and self-referencing relationships.
- How to traverse a relationship in both directions.
- How to control what happens when a related record is deleted.
- How to write joins, including with multiple foreign keys to the same table.
- How to avoid the N+1 query problem with
prefetch()and joinedselect(). - How to count and aggregate across relationships.
For patterns most apps don't reach for on day one - composite primary keys, circular foreign keys, polymorphic relationships, and using more than one database - see the Advanced Models guide.
1. Why Relationships
Most applications eventually need to connect things to other things. A user writes articles. An article has tags. A comment replies to another comment. You could try to keep everything in one table, but you'd quickly run into trouble:
# Don't do this
class Article(BaseModel):
title = pw.CharField()
body = pw.TextField()
author_name = pw.CharField()
author_email = pw.CharField()
author_bio = pw.TextField()
The author's name and bio would be repeated in every one of their articles. If they update their bio, you'd have to update every row. And if two articles spell the name slightly differently, you have no way to know they're the same person.
The fix is to split your data across tables and connect them with foreign keys:
class User(BaseModel):
name = pw.CharField()
email = pw.CharField(unique=True)
bio = pw.TextField(default="")
class Article(BaseModel):
title = pw.CharField()
body = pw.TextField()
author = pw.ForeignKeyField(User, backref="articles", on_delete="CASCADE")
Now the author lives in the user table, and each article points at them through a single author_id column. The database guarantees that author_id always refers to a real user - that's the constraint in "foreign key constraint". Your code follows the connection in either direction:
article.author # the user who wrote it
user.articles # all of this user's articles (a SelectQuery)
That's the deal: one declaration on the model, two-way navigation in your code, plus a database that won't let your data drift apart.
What Peewee gives you, and what it leaves explicit
A single ForeignKeyField declaration gets you: the column, the index, the on_delete cascade, the forward attribute (article.author), and the reverse accessor (user.articles).
Anything beyond that - many-to-many, joining across multiple tables, eager loading - is regular query code.
2. Types of Relationships at a Glance
Before diving in, here's a map of the territory:
| Shape | Example | How you declare it |
|---|---|---|
| One-to-many | A user has many articles | ForeignKeyField on the "many" side |
| One-to-one | A user has one profile | ForeignKeyField(unique=True) on the dependent side |
| Many-to-many | Articles have many tags, tags have many articles | A through model with two ForeignKeyFields |
| Self-referencing | A comment is a reply to another comment | ForeignKeyField("self", ...) |
| Polymorphic (no native support) | A reaction belongs to either an Article or a Comment | Discriminator column + nullable FKs (see Advanced Models §3) |
Each gets a chapter below. The shape that comes up most often, by far, is one-to-many - start there.
3. One-to-Many
A one-to-many relationship is when a single record on one side relates to many on the other. A user has many articles; an article has one user. The "many" side gets the foreign key.
3.1 Declaring a Foreign Key
Use pw.ForeignKeyField on the model that does the pointing:
# models/user.py
import peewee as pw
from .base import BaseModel
class User(BaseModel):
name = pw.CharField()
email = pw.CharField(unique=True)
# models/article.py
import peewee as pw
from .base import BaseModel
from .user import User
class Article(BaseModel):
title = pw.CharField()
body = pw.TextField()
author = pw.ForeignKeyField(User, backref="articles", on_delete="CASCADE")
Three things happen automatically when you declare a foreign key:
- A column is added - by default named
<field>_id. Soauthor = pw.ForeignKeyField(User, ...)creates anauthor_idcolumn on thearticletable. - An index is created on that column.
- A reverse accessor is added to the target model, named by the
backrefargument. Above,Usergains a.articlesattribute.
The resulting SQL for the article table:
CREATE TABLE "article" (
"id" INTEGER NOT NULL PRIMARY KEY,
"title" VARCHAR(255) NOT NULL,
"body" TEXT NOT NULL,
"author_id" INTEGER NOT NULL,
FOREIGN KEY ("author_id") REFERENCES "user" ("id") ON DELETE CASCADE
);
CREATE INDEX "article_author_id" ON "article" ("author_id");
3.2 Traversing in Both Directions
With the relationship declared, navigation is just attribute access:
# Forward - single instance
article = Article.get_by_id(1)
article.author # => User instance
article.author.name # => "Alice"
# Reverse - a SelectQuery (chainable, filterable, scope-able)
user = User.get_by_id(1)
user.articles # all of Alice's articles
user.articles.where(Article.title.contains("Python"))
user.articles.order_by(Article.created_at.desc()).limit(5)
user.articles.count()
The key thing to understand: article.author returns a loaded model instance, but user.articles returns a query - nothing has been fetched yet. You can keep chaining filters onto it, and it only runs when you iterate it, count it, or call .first().
Tip
Because the reverse side is a query, your scopes work seamlessly across relationships:
user.articles.published().recent().top(5)
3.3 What Happens on Delete: on_delete
When the parent record is deleted, what should happen to the children? You decide with on_delete:
| Value | Behavior |
|---|---|
"CASCADE" |
Delete the children too |
"SET NULL" |
Set the FK column to NULL (requires null=True on the FK) |
"RESTRICT" |
Refuse to delete the parent if children exist |
"NO ACTION" |
Same as RESTRICT on most databases, but checked at end of transaction |
| (not set) | Database default - usually NO ACTION. Don't rely on it. |
class Article(BaseModel):
# When the user is deleted, delete their articles
author = pw.ForeignKeyField(User, backref="articles", on_delete="CASCADE")
class Comment(BaseModel):
# Keep the comment, but null out the author
author = pw.ForeignKeyField(User, backref="comments",
on_delete="SET NULL", null=True)
article = pw.ForeignKeyField(Article, backref="comments",
on_delete="CASCADE")
Always set on_delete explicitly
The default behavior depends on the database engine, and a quiet NO ACTION somewhere can lead to either unexpected delete failures or - worse - orphaned rows. Pick the behavior you want and write it down.
3.4 Naming the Reverse Accessor
The backref argument gives the reverse query its name on the parent model. Pick names that read like English from the parent's point of view:
class Article(BaseModel):
author = pw.ForeignKeyField(User, backref="articles")
# Reads as: user.articles
If the same model is referenced more than once, each FK needs its own backref:
class Article(BaseModel):
author = pw.ForeignKeyField(User, backref="authored_articles")
editor = pw.ForeignKeyField(User, backref="edited_articles", null=True)
# user.authored_articles - articles where this user is the author
# user.edited_articles - articles where this user is the editor
Tip
If you don't need the reverse accessor at all, set backref="+". Peewee will skip creating it.
3.5 Customizing the Column Name
By default the column is <field>_id. Override it with column_name:
class Article(BaseModel):
author = pw.ForeignKeyField(User, backref="articles", column_name="user_id")
Useful when you're mapping to a legacy schema, but otherwise resist the urge.
3.6 Generating with the CLI
The model generator understands foreign keys through the fk-ModelName shorthand:
proper g model Article title:str body:text author:fk-User,backref:articles,on_delete:CASCADE
This creates models/article.py, registers it in models/__init__.py, and (with --migration) generates the migration file too.
4. One-to-One
A one-to-one relationship is just a one-to-many with the additional rule that the "many" side is at most one.
You enforce that by adding unique=True to the foreign key:
class User(BaseModel):
name = pw.CharField()
email = pw.CharField(unique=True)
class Profile(BaseModel):
user = pw.ForeignKeyField(User, backref="profile",
unique=True, on_delete="CASCADE")
bio = pw.TextField(default="")
avatar_url = pw.CharField(null=True)
twitter = pw.CharField(null=True)
The unique constraint guarantees that any given user_id appears at most once in the profile table - exactly one profile per user.
CREATE UNIQUE INDEX "profile_user_id" ON "profile" ("user_id");
4.1 Traversing a One-to-One
Forward navigation is the same as one-to-many:
profile.user # => User instance
Reverse navigation is where one-to-one trips people up. Even though there's at most one related record, user.profile is still a SelectQuery - Peewee doesn't change its return type based on unique=True. You need to materialize it:
user.profile.get() # raises DoesNotExist if missing
user.profile.first() # returns None if missing - usually what you want
Tip
If you find yourself writing .first() everywhere, add a small helper to your model:
class User(BaseModel):
@property
def profile_or_none(self):
return self.profile.first()
4.2 When Not to Use One-to-One
If the dependent model only has a couple of fields and they're always loaded together with the parent, a one-to-one is overkill - just add the columns to the parent table. Keep one-to-one for genuinely independent concerns: optional records, large blobs you don't always want to load, or data that has its own lifecycle.
5. Many-to-Many
A many-to-many relationship is when records on both sides can relate to many records on the other. Articles have many tags; tags belong to many articles.
Peewee deliberately doesn't have a built-in M:N field by default. You write the join table yourself, because most of the time you eventually want to put extra columns on it (when was the tag added? by whom?), and a hand-written model gives you that for free.
5.1 The Through Model Pattern
Three models - the two ends and the join:
# models/article.py
class Article(BaseModel):
title = pw.CharField()
body = pw.TextField()
# models/tag.py
class Tag(BaseModel):
name = pw.CharField(unique=True)
# models/article_tag.py
class ArticleTag(BaseModel):
article = pw.ForeignKeyField(Article, backref="article_tags",
on_delete="CASCADE")
tag = pw.ForeignKeyField(Tag, backref="tag_articles",
on_delete="CASCADE")
added_at = pw.DateTimeField(default=pw.utcnow)
class Meta:
indexes = (
(("article", "tag"), True), # Each pair is unique
)
The unique-together index is what stops a tag from being attached to the same article twice.
5.2 Querying Both Sides
There's no shortcut method - you write the join:
# All tags for an article
tags = Tag.select().join(ArticleTag).where(ArticleTag.article == article)
# All articles with a given tag
articles = Article.select().join(ArticleTag).where(ArticleTag.tag == tag)
# All articles tagged "python"
articles = (
Article.select()
.join(ArticleTag)
.join(Tag)
.where(Tag.name == "python")
)
The SQL for the third one:
SELECT "t1"."id", "t1"."title", "t1"."body"
FROM "article" AS "t1"
INNER JOIN "articletag" AS "t2" ON ("t2"."article_id" = "t1"."id")
INNER JOIN "tag" AS "t3" ON ("t2"."tag_id" = "t3"."id")
WHERE ("t3"."name" = 'python')
These join patterns deserve their own chapter - see Joins.
5.3 Adding Helper Methods
Repeating join code in every controller gets old fast. Wrap it in a method on the model:
class Article(BaseModel):
title = pw.CharField()
body = pw.TextField()
@property
def tags(self):
return Tag.select().join(ArticleTag).where(ArticleTag.article == self)
def add_tag(self, tag):
ArticleTag.get_or_create(article=self, tag=tag)
def remove_tag(self, tag):
ArticleTag.delete().where(
ArticleTag.article == self,
ArticleTag.tag == tag,
).execute()
Now controllers can write article.tags, article.add_tag(tag), and article.remove_tag(tag) without thinking about the join table at all.
5.4 The ManyToManyField Shortcut
If you really don't want to write a through model - say, the relationship is purely associative and you'll never need extra columns - Peewee's playhouse offers a shortcut:
from playhouse.fields import ManyToManyField
class Article(BaseModel):
title = pw.CharField()
body = pw.TextField()
tags = ManyToManyField(Tag, backref="articles")
# Set up the implicit through table after both classes are defined
ArticleTag = Article.tags.get_through_model()
This generates a hidden through model for you. You get attribute-style access:
article.tags.add(tag)
article.tags.remove(tag)
list(article.tags) # all tags
list(tag.articles) # all articles with this tag
The trade-off
You no longer have a Python class for the through table, so you can't add columns to it later without rewriting. If there's any chance you'll need when the tag was added, who added it, or any other per-link metadata, write the through model by hand.
5.5 Choosing Between the Two
| Use the through model when… | Use ManyToManyField when… |
|---|---|
| You might add per-link columns later | The relationship is pure association |
| You want explicit migration control over the join table | You want shorter code |
| You want to query the through table directly | You only ever traverse the relationship |
When in doubt, write the through model. The extra code is small; the flexibility is large.
6. Self-Referencing Relationships
Sometimes a model needs to point at itself: Threaded comments (a comment can be a reply to another comment), category trees (a category has subcategories), org charts (an employee reports to another employee), etc.
Use the string "self" in place of the target model:
class Comment(BaseModel):
article = pw.ForeignKeyField(
Article, backref="comments", on_delete="CASCADE"
)
author = pw.ForeignKeyField(
User, backref="comments", on_delete="CASCADE"
)
body = pw.TextField()
parent = pw.ForeignKeyField(
"self", backref="replies", null=True, on_delete="CASCADE"
)
A top-level comment has parent_id = NULL. A reply has parent_id set to the parent comment's id.
top_level = Comment.select().where(Comment.parent.is_null())
for reply in top_level[0].replies:
print(reply.body)
6.1 Walking the Tree
The hard part of self-referencing relationships isn't the declaration - it's querying all descendants (or all ancestors) without firing one query per level. SQL has a feature for this called a recursive CTE (Common Table Expression), and Peewee exposes it through the .cte() and .with_cte() methods.
For shallow trees, it's often simpler to walk .replies in Python and accept a few extra queries. Reach for a recursive CTE when the depth is unbounded, or when the tree is large enough that Python-side recursion would issue too many queries.
def descendants(root):
"""All comments below `root`, any depth, in one query."""
seed = (
Comment.select(Comment.id, Comment.parent, Comment.body)
.where(Comment.parent == root.id)
.cte("descendants", recursive=True,
columns=("id", "parent_id", "body"))
)
Child = Comment.alias()
step = (
Child.select(Child.id, Child.parent, Child.body)
.join(seed, on=(Child.parent == seed.c.id))
)
cte = seed.union_all(step)
return cte.select_from(cte.c.id, cte.c.parent_id, cte.c.body)
The result is a query that yields tuples of (id, parent_id, body) for every comment underneath root. The SQL is one WITH RECURSIVE ... statement - no Python recursion, no per-level query.
Tip
Wrap CTE queries in a method on the model. They're verbose to write and easy to misread; callers shouldn't have to look at them twice.
Avoid cycles
Nothing in the database stops you from making parent_id form a cycle (A → B → A). If your domain shouldn't allow cycles, enforce it in your save() method.
7. Joins
Once your data is split across tables, most non-trivial queries need to combine rows from more than one of them. That's a join.
7.1 What a Join Does
A join takes two tables and a condition, and produces a new virtual table with the columns of both, containing only the row pairs where the condition is true. The condition is usually "the FK on one side equals the primary key on the other".
The two join types you'll use 99% of the time:
- INNER JOIN: keep only rows where the condition matches on both sides. Used when you want articles that have an author - articles with no author are dropped.
- LEFT OUTER JOIN: keep every row from the left side, even when there's no match on the right (right-side columns are
NULL). Used when you want every article and its author if it has one.
Other types (RIGHT, FULL OUTER, CROSS) exist but rarely come up in application code.
7.2 The Default Join
Calling .join(Other) on a query joins the two tables on the foreign key between them. The default is INNER JOIN:
articles = (
Article.select()
.join(User)
.where(User.email.endswith("@example.com"))
)
The SQL:
SELECT "t1"."id", "t1"."title", "t1"."body", "t1"."author_id"
FROM "article" AS "t1"
INNER JOIN "user" AS "t2" ON ("t1"."author_id" = "t2"."id")
WHERE ("t2"."email" LIKE '%@example.com')
Peewee figures out the join condition by looking at the foreign keys between the two models. If there's exactly one FK from Article to User, that's what gets used.
7.3 Choosing the Join Type
Pass join_type to switch from inner to outer:
# All users, even those who haven't written any articles
query = (
User.select()
.join(Article, pw.JOIN.LEFT_OUTER)
)
The available constants are pw.JOIN.INNER, pw.JOIN.LEFT_OUTER, pw.JOIN.RIGHT_OUTER, pw.JOIN.FULL, pw.JOIN.FULL_OUTER, and pw.JOIN.CROSS.
7.4 Joining on a Specific Column
When the auto-detected join condition is wrong, or when there's no FK at all, give Peewee an explicit on=:
articles = (
Article.select()
.join(User, on=(Article.author == User.id))
)
on= accepts any expression - it's the same kind of clause you put in .where().
7.5 Multiple Foreign Keys to the Same Model
When two foreign keys on a model point to the same target, Peewee can't guess which one to join on. You have to be explicit:
class Article(BaseModel):
title = pw.CharField()
body = pw.TextField()
author = pw.ForeignKeyField(User, backref="authored_articles")
editor = pw.ForeignKeyField(User, backref="edited_articles", null=True)
# Join on the editor relation, not author
query = (
Article.select()
.join(User, on=(Article.editor == User.id))
.where(User.email == "alice@example.com")
)
To load both joined users onto the article in one query, alias the second one and use attr= on each .join() to name where the row data should land:
EditorAlias = User.alias()
query = (
Article.select(Article, User, EditorAlias)
.join(User, on=(Article.author == User.id), attr="author")
.switch(Article)
.join(EditorAlias, on=(Article.editor == EditorAlias.id), attr="editor")
)
for article in query:
article.author.name # populated, no extra query
article.editor.name # populated, no extra query
The attr= argument tells Peewee which Python attribute on the result row should hold the joined model instance. Without it, joining the same model twice would clobber the first one with the second.
7.6 switch() - Anchoring the Next Join
When you've joined A → B and now want to join A → C (not B → C), Peewee assumes the next .join(...) is from B. Use .switch(A) to go back:
query = (
Article.select(Article, User, Tag)
.join(User) # Article → User
.switch(Article) # back to Article
.join(ArticleTag) # Article → ArticleTag
.join(Tag) # ArticleTag → Tag
.where(Tag.name == "python")
)
Without .switch(), the ArticleTag join would try to attach to User, which has no FK to it.
7.7 Joining on Arbitrary Fields
on= doesn't have to involve a foreign key at all. Sometimes you join on natural keys, denormalized columns, or any expression that produces a boolean:
# Join orders to customers by email, not by an FK
query = (
Order.select(Order, Customer)
.join(Customer, on=(Order.customer_email == Customer.email))
)
This is rare in well-designed schemas, but invaluable when integrating with legacy data.
7.8 Chaining Multiple Joins
Each .join() extends the chain to one more table. The chain reads top-to-bottom:
# Articles → Comments → Users (the comment authors)
query = (
Article.select(Article, Comment, User)
.join(Comment)
.join(User, on=(Comment.author == User.id))
.where(Article.id == 42)
)
For chains longer than two or three joins, give the result some breathing room and consider pulling it into a @scope.
7.9 Loading Joined Columns
Here's the most important nuance about joins in Peewee:
Note
A .join() filters your query, but it doesn't load the joined columns into your result objects unless you ask for them in .select().
# Joined, but author is NOT loaded
query = Article.select().join(User).where(User.created_at > yesterday)
for article in query:
article.author.name # extra query, every iteration
To pull the joined data through, list both models in .select():
# Joined and loaded - one query total
query = Article.select(Article, User).join(User)
for article in query:
article.author.name # already there
This is the cornerstone of avoiding N+1 - see Eager Loading: Avoiding the N+1 Problem.
###0 Subqueries
Sometimes a join is the wrong shape, and a subquery is the right one. A subquery is a SELECT used as a value somewhere else.
7.9.1 IN with a subquery
# Articles whose author is on staff
staff_ids = User.select(User.id).where(User.is_staff == True)
articles = Article.select().where(Article.author.in_(staff_ids))
The SQL:
SELECT "t1"."id", "t1"."title", "t1"."body", "t1"."author_id"
FROM "article" AS "t1"
WHERE ("t1"."author_id" IN (
SELECT "t2"."id" FROM "user" AS "t2"
WHERE ("t2"."is_staff" = true)
))
7.9.2 As a derived value
You can compute a per-row value with a correlated subquery, alias it, and select it alongside the parent:
recent_count = (
Comment.select(pw.fn.COUNT(Comment.id))
.where(Comment.article == Article.id, Comment.created_at > yesterday)
.alias("recent_count")
)
articles = Article.select(Article, recent_count)
for article in articles:
article.recent_count # number of recent comments
For "is there at least one matching row" checks, an IN subquery is usually cleaner; reach for pw.fn.EXISTS(subquery) only when the subquery is independently shaped (e.g. references multiple outer columns).
8. Eager Loading: Avoiding the N+1 Problem
The single most common performance bug in ORM-backed applications looks like this:
# Looks innocent. Isn't.
for article in Article.select():
print(article.title, "by", article.author.name)
This issues one query to load articles, then one more query for each article to load its author. With 50 articles, that's 51 queries. With 5,000, you've got a problem. This is the N+1 problem.
Proper logs every SQL statement in development mode, so you can see this happening in your terminal. When you spot a wall of nearly-identical queries, you've found an N+1.
There are two ways to fix it.
8.1 Strategy A: Join + Select (one query, flat result)
When the relationship is "to one" - the article belongs to a user, the comment belongs to an article - join the table and pull its columns through .select():
articles = Article.select(Article, User).join(User)
for article in articles:
print(article.title, "by", article.author.name) # no extra query
One query. Each row carries all the article and author columns. This is what other ORMs sometimes call select_related or includes.
8.2 Strategy B: prefetch (two queries, joined in Python)
When the relationship is "to many" - has many comments, has many tags - joining produces a Cartesian explosion: one row per child for every parent. Instead, run two queries and let Peewee stitch the results together:
articles = Article.select()
comments = Comment.select()
for article in pw.prefetch(articles, comments):
print(article.title, len(article.comments)) # no extra queries
prefetch() runs the article query, then runs the comment query restricted to those articles' ids, then attaches each comment to its parent in Python. The prefetched children replace the regular backref attribute: before the call, article.comments is a lazy SelectQuery; after prefetch() returns, it's a plain list of fully-loaded Comment instances.
Warning
Because article.comments becomes a list after prefetching, methods like .where(...) or .count() no longer work on it - those belong to the query, not the list. Filter the comment query before passing it into prefetch():
recent = Comment.select().where(Comment.created_at > yesterday)
for article in pw.prefetch(Article.select(), recent):
article.comments # only the recent ones, as a list
You can prefetch arbitrarily deep:
articles = Article.select()
comments = Comment.select()
users = User.select()
# Three queries: articles, comments, comment authors
results = pw.prefetch(articles, comments, users)
8.3 Picking Between Them
A useful rule of thumb:
| Relationship | Use |
|---|---|
| "Belongs to" (the "one" side) | .join().select(A, B) |
| "Has many" (the "many" side) | prefetch(A, B) |
| "Has one" | Either works; .join() is simpler |
| Many-to-many | prefetch(A, ThroughModel, B) |
The reason: a join multiplies rows. Joining articles to their comments returns one row per comment (so a 100-article page with 20 comments each is 2,000 rows). Joining articles to their author returns one row per article. The first wastes bandwidth; the second doesn't.
8.4 Combining: Filter with Join, Load with Prefetch
You can use both in the same query - .join() to filter, prefetch() to load:
# Articles by staff users, with their comments preloaded
staff_articles = (
Article.select()
.join(User)
.where(User.is_staff == True)
)
results = pw.prefetch(staff_articles, Comment.select())
The join narrows the article set; the prefetch attaches the comments. Two queries, no N+1.
Watch the SQL
The fastest way to spot N+1 problems is to keep an eye on the dev server's query log when you're loading a page. If you see the same SELECT ... WHERE id = ? shape repeating, you have one. Fix it with one of the two strategies above.
9. Aggregations Across Relationships
Counting, summing, and averaging across a relationship is a thin wrapper over SQL aggregate functions. The pattern is always: .select() what you want plus the aggregate, .join() if needed, .group_by() the parent.
9.1 Counting Children
# How many articles each user has written
query = (
User.select(User, pw.fn.COUNT(Article.id).alias("article_count"))
.join(Article, pw.JOIN.LEFT_OUTER)
.group_by(User.id)
)
for user in query:
print(user.name, user.article_count)
The LEFT_OUTER join is what makes users with zero articles still show up (with article_count = 0). An inner join would silently drop them.
9.2 Wrapping Aggregates as Scopes
If you reach for a counted query more than once, extract it:
class User(BaseModel):
name = pw.CharField()
email = pw.CharField(unique=True)
@scope
def with_article_count(query):
return (
query.select(User, pw.fn.COUNT(Article.id).alias("article_count"))
.join(Article, pw.JOIN.LEFT_OUTER)
.group_by(User.id)
)
Now controllers read like this:
users = User.select().with_article_count().order_by(pw.SQL("article_count").desc())
9.3 Filtering on Aggregates: having()
WHERE filters rows before aggregation; HAVING filters them after. Use .having() when the condition involves the aggregate value itself:
# Only users who have written at least 5 articles
prolific = (
User.select(User, pw.fn.COUNT(Article.id).alias("article_count"))
.join(Article)
.group_by(User.id)
.having(pw.fn.COUNT(Article.id) >= 5)
)
9.4 Other Aggregates
pw.fn.SUM, pw.fn.AVG, pw.fn.MIN, and pw.fn.MAX all work the same way. Always alias() them so you can read the value off the result row.
10. Tips, Tricks, and Warnings
A grab-bag of things that come up after you've used relationships for a while.
10.1 Always set on_delete explicitly
The default behavior depends on the database engine. Pick the right value (CASCADE, SET NULL, or RESTRICT) and write it down. A relationship without on_delete is a bet that you, or whoever inherits this code, will remember the default. Don't make that bet.
10.2 backref returns a query, not a list
Every time you access user.articles, you get a fresh SelectQuery. Iterating it twice runs the query twice:
for a in user.articles: print(a.title) # query 1
for a in user.articles: print(a.id) # query 2 - same results
If you need to iterate the same set more than once, materialize it first:
articles = list(user.articles)
for a in articles: ...
for a in articles: ...
The exception is after pw.prefetch() - the backref attribute is replaced with a real list, so iteration is free but query methods like .where() won't work on it. See Strategy B: prefetch (two queries, joined in Python).
10.3 Watch for backref collisions
Two models referencing the same parent must use different backref names:
class Article(BaseModel):
author = pw.ForeignKeyField(User, backref="articles") # ok
class Comment(BaseModel):
user = pw.ForeignKeyField(User, backref="articles") # collision
The second declaration silently overrides the first. Pick distinct names - comments, authored_articles, assigned_tickets.
10.4 Adding an FK to a populated table
A NOT NULL foreign key needs a value for every existing row. When you add one to a table that already has data, the migration will fail unless either:
- The column is
null=True, or - You provide a
default.
For more, see the Migrations guide.
10.5 Index your FK + filter combinations
FK columns are indexed automatically, but most real queries filter on more than just the FK:
Article.select().where(Article.author == user, Article.published == True)
A composite index on (author_id, published) will be much faster than the bare FK index. See Models §7.2.
10.6 You cannot join across databases
If Article lives in your main database and PageView lives in analytics, no join in any framework can put them in the same query - the database engines don't talk to each other. Query each side and combine in Python. See Advanced Models §4.4.
10.7 Test relationships against real tables
The test fixture in conftest.py runs every test inside a transaction that rolls back when the test exits. That's fast enough that you should always test relationships with real .create() calls and real queries, not mocks. A mocked backref doesn't behave like a real SelectQuery, and the differences will bite you.
def test_user_articles_returns_only_their_articles():
alice = User.create(name="Alice", email="alice@example.com")
bob = User.create(name="Bob", email="bob@example.com")
Article.create(title="A1", body="...", author=alice)
Article.create(title="A2", body="...", author=alice)
Article.create(title="B1", body="...", author=bob)
assert alice.articles.count() == 2
assert bob.articles.count() == 1
11. Reference Card
A quick lookup for the syntax bits you'll forget the most.
11.1 ForeignKeyField options
| Option | What it does |
|---|---|
backref="..." |
Name of the reverse accessor on the target model |
backref="+" |
Don't create a reverse accessor at all |
on_delete=... |
"CASCADE", "SET NULL", "RESTRICT", "NO ACTION" |
unique=True |
Makes it a one-to-one |
null=True |
The FK column is nullable |
column_name=... |
Override the default <field>_id column name |
field=... |
Reference a non-PK column on the target (rare) |
11.2 pw.JOIN constants
INNER, LEFT_OUTER, RIGHT_OUTER, FULL, FULL_OUTER, CROSS.
11.3 Eager-loading checklist
| Shape | Tool |
|---|---|
| "Belongs to" (one side) | .select(A, B).join(B) |
| "Has many" | pw.prefetch(A, B) |
| Many-to-many | pw.prefetch(A, Through, B) |
| Nested | pw.prefetch(A, B, C, ...) |
| Filter + load | .join() for filter, prefetch() for load |
11.4 Useful imports
import peewee as pw
from playhouse.fields import ManyToManyField