File Storage
This guide covers how to attach files to your peewee models and how Proper stores, validates, and serves them.
After reading this guide, you will know:
- How to install the storage addon and configure one or more services.
- How to attach a single file or many files to a record.
- How to remove an attachment.
- How to link to and serve attachments in development and in production.
- How to validate uploaded files by size and content type.
- How to transform images and generate variants on demand.
- How to preview non-image files like PDFs and videos.
- How to test code that stores files.
- How to add support for storage services Proper doesn't ship with.
The AttachmentField section of the Forms guide covers, in depth, the form field that handles uploads. Use this guide as the canonical reference for the storage subsystem, and jump there for form mechanics.
1. What is File Storage?
File Storage in Proper handles uploading files to a cloud service like Amazon S3 or any S3-compatible API, and to your local disk in development. It provides a single Attachment model so the rest of your application can treat any uploaded file the same way regardless of where its bytes live.
The subsystem has two cooperating pieces:
- Storage services. A service knows how to put bytes somewhere and get them back: Disk writes to a folder on the local filesystem, and S3 talks to Amazon S3 (and any service that speaks the S3 protocol). Each service is named in your config, and one service is active per environment.
- The Attachment model. Every uploaded file gets a row in the attachment table that records its filename, content type, byte size, which service holds the bytes, and a few other flags. Your own peewee models reference attachments through a regular ForeignKeyField.
Because the application code talks to Attachment and never to a service directly, swapping Disk for S3 between development and production is one config change. A user's avatar is stored in storage/ on a developer's laptop, in temp/storage/ during the test run, and in an S3 bucket in production. None of that ripples into model or controller code.
your model -> ForeignKey -> Attachment row -> service -> bytes (disk / S3)
2. Setup
The storage addon is installed on demand; a freshly generated application doesn't carry it by default. From the project root:
$ proper install storage
This writes a few files into your application:
- An Attachment model that lives in your application and inherits its powers from the one in Proper, so you can extend it with extra fields or methods.
- PublicAttachmentController for unsigned public URLs and AttachmentController for signed URLs.
- Drop-in Jx components for file inputs with image previews, paired with Stimulus controllers and CSS.
Run the migration after installing:
$ proper db migrate
libvips
To use image variants, you also need the system libvips library.
On macOS, install the library with brew install vips; on Debian or Ubuntu, run apt install libvips-dev. For other systems, see the libvips installation page. Storing and serving files works without libvips; only attachment.variant(...) requires it.
2.1 Services Configuration
Storage services are declared in config/storage.py as a dictionary. Each entry has a name (chosen by you) and a type plus any options the service needs:
# config/storage.py
import os

STORAGE_SERVICES = {
    "local": {
        "type": "Disk",
        "root": "storage/",
    },
    "test": {
        "type": "Disk",
        "root": "temp/storage",
    },
    "amazon": {
        "type": "S3",
        "bucket": "my-app-uploads",
        "region": "us-east-1",
        "access_key_id": os.getenv("AWS_ACCESS_KEY_ID"),
        "secret_access_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
    },
}

# `env` holds the name of the current environment; it is assumed to be
# provided by the surrounding config setup.
STORAGE = "local"
if env == "prod":
    STORAGE = "amazon"
elif env == "test":
    STORAGE = "test"
STORAGE names the default service. Application code rarely needs to reference a service name directly: every new Attachment picks up the active service unless you pass service_name= explicitly (see Public Access).
Note
To keep tests from touching the development storage folder, the bundled config defines a separate test service that points at temp/storage/.
You can declare more services than you actively use. The lookup is lazy: a service is instantiated only the first time something writes to it.
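The lazy lookup can be pictured as a registry that builds each service the first time it is requested and reuses it afterwards. This is a minimal sketch, not Proper's code: ServiceRegistry is a hypothetical name, and DiskService and S3Service are stand-ins that only record their options.

```python
from typing import Any, Callable, Dict, List


class DiskService:
    """Stand-in for a Disk service; it only remembers its options."""

    def __init__(self, root: str, **options: Any) -> None:
        self.root = root
        self.options = options


class S3Service:
    """Stand-in for an S3 service; a real one would build a boto3 client here."""

    def __init__(self, bucket: str, **options: Any) -> None:
        self.bucket = bucket
        self.options = options


class ServiceRegistry:
    """Hypothetical registry that builds each configured service on first use."""

    def __init__(self, config: Dict[str, dict], factories: Dict[str, Callable[..., Any]]) -> None:
        self._config = config
        self._factories = factories
        self._instances: Dict[str, Any] = {}

    def get(self, name: str) -> Any:
        if name not in self._instances:
            options = dict(self._config[name])  # copy so the config dict is not mutated
            service_type = options.pop("type")
            self._instances[name] = self._factories[service_type](**options)
        return self._instances[name]

    def instantiated(self) -> List[str]:
        return sorted(self._instances)


registry = ServiceRegistry(
    {
        "local": {"type": "Disk", "root": "storage/"},
        "amazon": {"type": "S3", "bucket": "my-app-uploads", "region": "us-east-1"},
    },
    {"Disk": DiskService, "S3": S3Service},
)
```

In this model nothing is built for "amazon" until some code asks for it, so an unused S3 entry costs nothing at startup.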
2.2 Disk Service
The Disk service writes files under a single root folder. Configuration takes one option, root, interpreted relative to the project root (one level above app.root_path):
"local": {
    "type": "Disk",
    "root": "storage/",
}
If the folder doesn't already exist, it is created the first time the service is used.
Files are sharded across two levels of subdirectories using the first four characters of the attachment's UUID. Given an attachment with id 7fa3..., the bytes land at:
storage/
└── 7f/
    └── a3/
        └── avatar.jpg
This keeps any one folder from growing past a few thousand files, which matters once you have a lot of uploads. You don't have to think about the layout - Proper computes the path from the attachment's id - but if you go poking around storage/ to debug, that's the structure.
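To make the layout concrete, here is a small sketch that reproduces the tree above from an attachment id. shard_path is a hypothetical helper, not part of Proper's API, and it mirrors only what the tree shows: two folder levels named after the first four hex digits of the UUID, then the file.

```python
from pathlib import PurePosixPath
from uuid import UUID


def shard_path(root: str, attachment_id: UUID, filename: str) -> PurePosixPath:
    """Hypothetical helper: root/<hex[0:2]>/<hex[2:4]>/<filename>."""
    hex_id = attachment_id.hex  # 32 lowercase hex digits, no dashes
    return PurePosixPath(root) / hex_id[:2] / hex_id[2:4] / filename


att_id = UUID("7fa3c2e0-1b2d-4c5e-8f90-123456789abc")
print(shard_path("storage/", att_id, "avatar.jpg"))  # storage/7f/a3/avatar.jpg
```

With two hex digits per level there are 256 * 256 = 65,536 possible leaf folders, which is what keeps any single folder small.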
Tip
Keep storage/ in your .gitignore (the new-app generator already adds it). Uploaded files are user data, not source, and they don't belong in version control.
2.3 S3 Service
The S3 service uploads to Amazon S3 or any service that speaks the S3 protocol (DigitalOcean Spaces, Cloudflare R2, MinIO, Wasabi, Backblaze B2, ...). Required: bucket. Everything else is optional.
"amazon": {
    "type": "S3",
    "bucket": "my-app-uploads",
    "region": "us-east-1",
    "access_key_id": os.getenv("AWS_ACCESS_KEY_ID"),
    "secret_access_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
}
If you omit the credentials, boto3 falls back to its default credential chain (environment variables, IAM instance profile, shared AWS config files). On a properly configured EC2 instance or container, you don't need to put credentials in your config at all.
For non-AWS services, set endpoint:
"spaces": {
    "type": "S3",
    "bucket": "my-bucket",
    "region": "nyc3",
    "endpoint": "https://nyc3.digitaloceanspaces.com",
    "access_key_id": os.getenv("DO_SPACES_KEY"),
    "secret_access_key": os.getenv("DO_SPACES_SECRET"),
}
The same shape covers Cloudflare R2, MinIO, Wasabi, and similar; check the provider's docs for the right endpoint value.
The S3 service requires boto3. Add it to your dependencies:
$ uv add boto3
2.4 Public Access
Public access is decided at the service level. Each entry in STORAGE_SERVICES can carry a public: True flag; every attachment stored in that service is reachable through a stable, unsigned URL. Attachments in services without the flag (the default) are reachable only through a signed URL with an expiration.
STORAGE_SERVICES = {
    "local": {
        "type": "Disk",
        "root": "storage/",
    },
    # A second service for files that are genuinely world-readable:
    "public": {
        "type": "Disk",
        "root": "storage/public",
        "public": True,
    },
    "amazon": {
        "type": "S3",
        "bucket": "private-uploads",
    },
    "amazon_public": {
        "type": "S3",
        "bucket": "public-assets",
        "public": True,
    },
}
The benefit of routing public files through their own service is operational clarity: the bucket policies, CDN configuration, and access logs all align with the access mode. There's no risk of accidentally flipping a single attachment's public flag and exposing data, because the access mode is a property of where the bytes live.
To put a new attachment in a public service, pass service_name= either when constructing it manually or when declaring the form field:
# Manually:
att = Attachment(upload, service_name="public")
att.save()
# Through the form field:
avatar = f.AttachmentField(Attachment, service_name="public")
The two URL shapes correspond to two bundled controllers:
- PublicAttachmentController serves /storage/public/<uuid> and is registered with skip_authentication = True. Anyone with the URL can fetch the file.
- AttachmentController serves /storage/<token>. The token encodes the attachment id and an expiration; your normal authentication stack runs before the controller does.
We come back to both in Serving Files.
2.5 Baseline Configuration
Three more keys, all populated by the installer, control how files are served and how variants are encoded:
STORAGE_ALLOWED_INLINE = [
"image/*",
"video/*",
"application/pdf",
]
STORAGE_ALLOWED_VARIANTS = [
"image/png",
"image/jpeg",
"image/gif",
]
STORAGE_FALLBACK_FORMAT = "png"
STORAGE_ALLOWED_INLINE is a list of glob patterns (matched with fnmatch) for content types that should be served inline in the browser: an <img> shows the image, a <video> plays the video, and a PDF opens in the browser's viewer. Anything not matched is served with Content-Disposition: attachment, which triggers a download dialog.
STORAGE_ALLOWED_VARIANTS is a list of glob patterns for source content types whose format should be preserved when generating a variant. A source PNG produces a PNG variant; a source JPEG produces a JPEG variant. Add image/webp or image/avif if you also want those source formats preserved.
STORAGE_FALLBACK_FORMAT is the format used for variants whose source content type is not in STORAGE_ALLOWED_VARIANTS. The default is "png" (lossless, supports transparency). Set it to "jpg" or "webp" if smaller files matter more than fidelity. A caller-supplied save={"format": "..."} always overrides both rules - see Format Conversion.
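The three settings combine into two small decisions. The sketch below restates them in plain Python so the precedence is easy to see. disposition and variant_format are hypothetical helpers rather than Proper functions, and lowercasing both sides before matching is an assumption made here (fnmatchcase is used so the result doesn't depend on the operating system's case rules).

```python
from fnmatch import fnmatchcase
from typing import Iterable, Optional

STORAGE_ALLOWED_INLINE = ["image/*", "video/*", "application/pdf"]
STORAGE_ALLOWED_VARIANTS = ["image/png", "image/jpeg", "image/gif"]
STORAGE_FALLBACK_FORMAT = "png"


def matches_any(content_type: str, patterns: Iterable[str]) -> bool:
    ct = content_type.lower()
    return any(fnmatchcase(ct, p.lower()) for p in patterns)


def disposition(content_type: str) -> str:
    """Hypothetical: 'inline' for allowed types, 'attachment' (forces a download) otherwise."""
    return "inline" if matches_any(content_type, STORAGE_ALLOWED_INLINE) else "attachment"


def variant_format(content_type: str, save: Optional[dict] = None) -> str:
    """Hypothetical: choose a variant's output format following the rules above."""
    if save and "format" in save:
        return save["format"]  # a caller-supplied format always wins
    if matches_any(content_type, STORAGE_ALLOWED_VARIANTS):
        return content_type.split("/", 1)[1].lower()  # keep the source format
    return STORAGE_FALLBACK_FORMAT
```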
3. Attaching Files to Records
The recommended way to attach a file is to give your model a ForeignKeyField that points at Attachment. There is no magic attached_as declaration: a regular foreign key shows up in migrations, plays well with normal queries, and lets two records share the same blob if you ever need to copy a record.
3.1 Single Attachment
For "one file per record" relationships - an avatar on a user, a cover on a book, a logo on an organization - declare a nullable foreign key:
# models/user.py
import peewee as pw

from .attachment import Attachment
from .base import BaseModel


class User(BaseModel):
    name = pw.CharField()
    email = pw.CharField(unique=True)
    avatar = pw.ForeignKeyField(Attachment, null=True)
That's the entire wire-up. user.avatar is either None or an Attachment instance. Reading attributes works as you'd expect:
user.avatar.url
user.avatar.filename
user.avatar.byte_size
user.avatar.content_type
To assign an attachment, you build it and save it before pointing the FK at it:
att = Attachment(upload, filename="avatar.jpg")
att.save()
user.avatar = att
user.save()
The order matters because the Attachment.id column has no default=uuid4: the id is generated when you call save(), not when you call Attachment(...). That's intentional. It means that checking attachment.id is None reliably tells you the row is not in the database yet, so you can't accidentally set a foreign key to a UUID that points at no row.
If you want to skip the ordering dance, use AttachmentField on a form (Forms and Attachments) - the field handles the build, save, and assign in one form.save() call.
3.2 Multiple Attachments
For "many files per record" - photos in a gallery, documents on a project, supporting evidence on a claim - use a normal through-model:
class Gallery(BaseModel):
    name = pw.CharField()
    # gallery.photos is provided by the backref on Photo.gallery below


# Photo is declared after Gallery so the ForeignKeyField can reference it.
class Photo(BaseModel):
    gallery = pw.ForeignKeyField(Gallery, backref="photos")
    attachment = pw.ForeignKeyField(Attachment)
    caption = pw.CharField(default="")
    position = pw.IntegerField(default=0)
gallery.photos is now an iterable of Photo rows, each carrying its own attachment, caption, and position. The through-model is also the natural place for any per-attachment data: ordering, alt text, captions, who uploaded it, when, in what context.
If you don't need any extra columns, you can still use a through-model with just two foreign keys. It's a tiny amount of boilerplate that pays off the first time you want to add ordering or labels.
3.3 Attaching File or IO Objects
The Attachment constructor accepts anything with a read() method - an open file, an in-memory BytesIO, a request body. It also accepts the MultipartPart objects produced by the form parser, which is what AttachmentField uses under the hood.
from io import BytesIO

from .attachment import Attachment

# From in-memory bytes:
data = generate_report_pdf(...)
att = Attachment(BytesIO(data), filename="report.pdf", content_type="application/pdf")
att.save()

# From an open file:
with open("/tmp/photo.jpg", "rb") as fp:
    att = Attachment(fp, filename="photo.jpg")
    att.save()

# From a remote URL via httpx:
import httpx

resp = httpx.get("https://example.com/avatar.png")
resp.raise_for_status()  # don't store an error page as the avatar
att = Attachment(BytesIO(resp.content), filename="avatar.png")
att.save()
Things to know about the constructor:
- service_name defaults to whatever STORAGE resolves to at runtime. Pass service_name="amazon" to override it (to write to a different bucket, for example, even though the active default is local).
- filename is parameterized: lowercased, with special characters replaced by dashes and the extension preserved as a separate part. "My Photo!.JPG" becomes "my-photo.jpg" on disk.
- content_type is detected from the filename extension when you don't supply it. The fallback is application/octet-stream.
- byte_size is populated by the service during save(); never pre-fill it.
- id is None until save() runs.
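The filename and content-type rules above can be approximated with the standard library. parameterize_filename and detect_content_type are hypothetical names, and the exact set of characters Proper keeps may differ; the sketch only reproduces the documented example and the octet-stream fallback.

```python
import mimetypes
import re
from pathlib import PurePath
from typing import Optional


def parameterize_filename(name: str) -> str:
    """Hypothetical: lowercase the stem, turn runs of other characters into dashes, keep the extension."""
    path = PurePath(name)
    stem = re.sub(r"[^a-z0-9]+", "-", path.stem.lower()).strip("-")
    return f"{stem}{path.suffix.lower()}"


def detect_content_type(filename: str, given: Optional[str] = None) -> str:
    """Hypothetical: use the supplied type, else guess from the extension, else octet-stream."""
    if given:
        return given
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


print(parameterize_filename("My Photo!.JPG"))  # my-photo.jpg
```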
3.4 Replacing vs Adding to a Collection
For a single foreign key, the natural operation is replace: assign a new attachment, then delete the old one so its bytes don't linger in the service. AttachmentField does this automatically; if you're doing it by hand:
old = user.avatar
user.avatar = new_attachment
user.save()
if old is not None:
    old.purge_later()
purge_later() queues the delete in a background worker - you don't want a slow S3 round-trip blocking your response handler. See Removing Files for the difference between purge() and purge_later().
For a collection (a Photo through-model), you have two choices, and which one you want depends on the user interaction:
- Adding. Create a new Photo row pointing at a new Attachment. The collection grows by one. Use this for "drop a new photo into the gallery" flows.
- Replacing all. Delete every existing Photo row (purging their attachments) and create new ones. Use this only when the form really replaces the whole collection, e.g. a JSON API that PUTs the canonical list.
# Add:
att = Attachment(upload, filename=upload.filename)
att.save()
Photo.create(gallery=gallery, attachment=att, position=len(gallery.photos))

# Replace all:
for photo in list(gallery.photos):  # materialize the query before deleting rows
    attachment = photo.attachment
    photo.delete_instance()  # drop the row that references the attachment first
    attachment.purge_later()
for upload in uploads:
    att = Attachment(upload)
    att.save()
    Photo.create(gallery=gallery, attachment=att)
The "replace all" pattern is rare in HTML form workflows because you usually want partial updates. It's more common in API endpoints that operate on whole collections.
3.5 Forms and Attachments
The form-side mechanics are covered in detail in the AttachmentField section of the Forms guide; this section is a brief orientation.
For any ForeignKeyField(Attachment) column on a model, the corresponding form field is f.AttachmentField:
# models/forms/user_form.py
from proper import forms as f

from ..attachment import Attachment
from ..user import User


class UserForm(f.Form):
    name = f.TextField()
    email = f.EmailField()
    avatar = f.AttachmentField(Attachment, required=False)

    class Meta:
        orm_cls = User
The controller doesn't need to know an upload is involved:
def update(self):
    self.form.save()
    self.response.redirect_to("User.show", self.user)
form.save() reads the multipart submission, builds an Attachment(upload), calls .save() on it (uploading the bytes to the active service), assigns the new attachment to user.avatar, and queues the previous attachment for deletion. All in one call.
The field interprets a structured payload composed of two sub-inputs:
| User action | <field>[file] | <field>[_destroy] | What save() does |
|---|---|---|---|
| Uploaded a new file | populated | (any) | Save new attachment, queue old one for deletion. |
| Clicked "Remove" | empty | "1" | Clear the FK, queue old one for deletion. |
| Left the field alone | empty | "0" or absent | Preserve the existing attachment unchanged. |
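The table reduces to a small decision function. This is an illustrative sketch of the rules as documented, not AttachmentField's implementation; decide and Action are hypothetical names, and file_part stands for whatever the multipart parser produced for <field>[file] (anything falsy counts as empty).

```python
from enum import Enum
from typing import Any, Optional


class Action(Enum):
    REPLACE = "save new attachment, queue old one for deletion"
    CLEAR = "clear the FK, queue old one for deletion"
    KEEP = "preserve the existing attachment"


def decide(file_part: Any, destroy: Optional[str]) -> Action:
    """Hypothetical mapping of the two sub-inputs to what save() does."""
    if file_part:  # a new upload wins regardless of _destroy
        return Action.REPLACE
    if destroy == "1":  # "Remove" clicked, no new file chosen
        return Action.CLEAR
    return Action.KEEP  # "0" or absent
```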
The render helpers file_input() and destroy_input() produce the two HTML inputs, and the image_input.jx component the storage addon ships does the JS work for drag-and-drop, preview, and the "Remove" toggle. See Rendering Forms - Attachment uploads for the HTML side.
For uploads that don't fit the foreign-key pattern - a one-off CSV import, a webhook from an outside service, a parser that reads bytes and discards the file - work directly with Attachment (Attaching File or IO Objects).
3.6 Validating Attached Files
AttachmentField ships with two server-side validators - max_size and accept - and accepts custom validation through the standard form validate_<field> hook. They run during form.validate(), before any upload would otherwise be saved.
Limiting file size. Pass max_size in bytes:
class BookForm(f.Form):
    title = f.TextField()
    cover = f.AttachmentField(
        Attachment,
        max_size=5 * 1024 * 1024,  # 5 MB
    )
A failed check produces errors.FILE_TOO_LARGE with a formatted size in the error args. The default message is "File size should be 5 MB or less".
max_size is a form-level check: the multipart parser still has to receive the bytes before the field can measure them. For a hard ceiling that rejects oversized requests before they're parsed, set the framework-wide MAX_CONTENT_LENGTH in config/main.py. The two work together: the framework limit protects the server from huge uploads, max_size produces friendly per-field errors for files that fit the request limit but exceed your application's policy.
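A rough model of the max_size check, including the rule (described under Skipping validation on bound attachments) that values without a size attribute pass. check_max_size and format_size are hypothetical helpers; the message copies the default text quoted above, and SimpleNamespace stands in for a multipart upload.

```python
from types import SimpleNamespace
from typing import Any, Optional


def format_size(num_bytes: int) -> str:
    """Render a byte count with binary units, e.g. 5 * 1024 * 1024 -> '5 MB'."""
    units = ["bytes", "KB", "MB", "GB", "TB"]
    size = float(num_bytes)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            break
        size /= 1024
    return f"{size:g} {unit}"


def check_max_size(value: Any, max_size: int) -> Optional[str]:
    """Hypothetical: return an error message, or None if the value passes."""
    size = getattr(value, "size", None)
    if size is None:
        return None  # e.g. an already-saved Attachment row: nothing to measure
    if size > max_size:
        return f"File size should be {format_size(max_size)} or less"
    return None


upload = SimpleNamespace(size=7 * 1024 * 1024)  # stand-in for a multipart upload
print(check_max_size(upload, 5 * 1024 * 1024))  # File size should be 5 MB or less
```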
Restricting content types. Pass accept as a list of patterns:
class UserForm(f.Form):
    avatar = f.AttachmentField(
        Attachment,
        accept=["image/*"],
    )
accept uses fnmatch semantics, so image/* covers image/jpeg, image/png, image/webp, and so on. List specific types when you want to be stricter:
attachment = f.AttachmentField(
    Attachment,
    accept=["image/png", "image/jpeg", "image/webp"],
)
A failed check produces errors.INVALID_CONTENT_TYPE with the rejected list in the error args. Comparison is case-insensitive on both sides.
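The accept check amounts to a case-insensitive glob match. In this sketch, check_accept is a hypothetical helper that returns the error key plus the accepted patterns (standing in for errors.INVALID_CONTENT_TYPE and its args) or None; values without a content_type attribute pass, as described for bound attachments.

```python
from fnmatch import fnmatchcase
from types import SimpleNamespace
from typing import Any, List, Optional, Tuple


def check_accept(value: Any, accept: List[str]) -> Optional[Tuple[str, List[str]]]:
    """Hypothetical: None if accepted (or not a fresh upload), else (error_key, patterns)."""
    content_type = getattr(value, "content_type", None)
    if content_type is None:
        return None  # e.g. a bound Attachment row: skip the check
    ct = content_type.lower()
    if any(fnmatchcase(ct, pattern.lower()) for pattern in accept):
        return None
    return ("invalid_content_type", accept)


upload = SimpleNamespace(content_type="IMAGE/PNG")  # stand-in for an upload
print(check_accept(upload, ["image/*"]))  # None
```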
The HTML accept attribute on the file input is a separate thing - it filters the picker dialog client-side but doesn't validate. Always pair accept= on the field with the same accept= on the rendered input; the addon's image_input.jx component does this for you.
Skipping validation on bound attachments. Both max_size and accept are skipped when the form value is an existing Attachment row rather than a fresh upload. The check looks at whether the value has a size (or content_type) attribute; if not, the validator passes.
This matters when you re-render an edit form. The user opens the page, the form binds the existing avatar, the user changes their name (not the avatar), and submits. The form value at that point is the bound Attachment row, not an upload, so the size and content-type rules don't apply. Existing attachments are grandfathered until the user actually replaces them, at which point the new upload is checked against the current rules.
Custom validators. For checks beyond size and content type, use the standard form validate_<field> hook:
class DocumentForm(f.Form):
    file = f.AttachmentField(Attachment, accept=["application/pdf"])

    def validate_file(self):
        upload = self.file.value
        if upload is None or isinstance(upload, Attachment):
            return  # nothing new to check
        filename = getattr(upload, "filename", "") or ""
        if " " in filename:
            self.file.error = "Filename must not contain spaces"
The isinstance(upload, Attachment) guard mirrors the built-in skip - don't re-validate an already-saved attachment. For reusable rules, write your own subclass of AttachmentField.
Customizing error messages. Pass messages={...} with the message key and your replacement template:
cover = f.AttachmentField(
    Attachment,
    max_size=2 * 1024 * 1024,
    accept=["image/jpeg", "image/png"],
    messages={
        "file_too_large": "Cover image must be 2 MB or less",
        "invalid_content_type": "Cover image must be a JPEG or PNG",
        "required": "Please choose a cover image",
    },
)
For application-wide message changes, define a translation in your locale files instead of repeating messages={...} on every field. See the Internationalization guide for the keys and overrides.
4. Removing Files
To remove an attachment, call one of the purge methods. Both delete the file from the service that stores it and remove the database row; they differ in when the work happens.
| Method | Synchronous | Use case |
|---|---|---|
| purge() | yes | Inline cleanup where blocking on the service is fine. |
| purge_later() | no | Replacing an attachment from a request handler. |
attachment.purge() # blocks until the service confirms deletion
attachment.purge_later() # enqueues a Huey task, returns immediately
purge() calls service.purge(), removes any variants of the attachment, and deletes the row. It's the right call from a CLI script or a background job, where you want the work done before the next thing runs.
purge_later() enqueues a Huey task that does the same work in a worker process. The task takes only the attachment's primary key and re-fetches the row before acting, so it's safe even if the row is deleted some other way before the task runs.
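The "pass only the primary key, re-fetch before acting" shape is what makes a late or repeated purge harmless. The sketch below shows that shape without Huey: a dict stands in for the attachment table, FakeService records deletions, and purge_task is a hypothetical name for the worker body (in a real app it would be registered as a Huey task).

```python
from typing import Dict, List


class FakeService:
    """Records deleted keys instead of talking to disk or S3."""

    def __init__(self) -> None:
        self.deleted: List[str] = []

    def purge(self, key: str) -> None:
        self.deleted.append(key)


def purge_task(pk: str, rows: Dict[str, dict], service: FakeService) -> bool:
    """Hypothetical worker body: receives only the primary key and re-reads the row."""
    row = rows.get(pk)
    if row is None:
        return False  # already gone (purged inline or deleted some other way)
    service.purge(row["key"])
    del rows[pk]
    return True
```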
AttachmentField uses purge_later() for the previous attachment after a successful replacement: the new file is uploaded and saved synchronously (you want to know if that fails), but the cleanup of the old file happens in the background (a slow S3 delete shouldn't block the form response).
To remove just the variants of an attachment, leaving the original alone:
attachment.purge_variants() # synchronous
attachment.purge_variants_later() # background
This is occasionally useful after a design change that invalidates dimensions, or in a migration that switches output format.
5. Serving Files
Once an attachment is saved, attachment.url returns a string you can drop into a template, a JSON response, or anywhere else a URL goes. The shape of that URL depends on whether the attachment lives in a public service or a private one:
# Stored in a service with public=True:
attachment.url
# => "/storage/public/550e8400-e29b-41d4-a716-446655440000"
# Stored in a service without the flag (the default):
attachment.url
# => "/storage/aBcDe.fGhIj.kLmNoPqRsT"
Both routes are wired up by the blueprint that proper install storage adds. The first goes through PublicAttachmentController, which is registered with skip_authentication = True. The second goes through AttachmentController, which runs your normal authentication stack and verifies the signed token.
5.1 Public URLs
Attachments in a public service are reachable through a stable URL containing only the attachment's UUID:
<img src="{{ user.avatar.url }}" alt="">
There's no token, no expiration, and no server-side check beyond "does this row exist and is it stored in a public service". Anyone who learns the URL can fetch the file. Use this for files that are genuinely public.
If you want browsers and CDNs to cache the file aggressively, public is the right shape. Combine it with a CDN in front of your application (or a separate static-site hosting setup) when read traffic is high.
5.2 Signed URLs
When the attachment lives in a private service (the default), attachment.url returns a URL containing a signed token - a short string that encodes the attachment's id and an expiration date, signed with your application's secret key. The AttachmentController decodes the token, checks the signature and expiration, looks up the attachment, and serves the file.
By default, tokens expire one year after they're generated. To verify a token with a custom maximum age, look the attachment up explicitly:
from .attachment import Attachment

attachment = Attachment.get_signed(token, max_age=3600)  # one hour
if attachment is None:
    raise NotFound
Attachment.get_signed(token, max_age=...) returns the attachment if the token is valid and within max_age, or None otherwise. Attachment.get_public(pk) is the analogous lookup for public attachments by UUID.
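A rough model of what such a token carries: the attachment id and a timestamp, each URL-safe base64 encoded, joined with dots, and followed by an HMAC-SHA256 signature over both. This is a standard-library sketch under assumptions, not Proper's token format; sign_token and verify_token are hypothetical, and this version stores the issue time and compares it against max_age at verification, which is one way to support a per-call max_age.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional


def _b64encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(payload: str, secret: bytes) -> bytes:
    return hmac.new(secret, payload.encode("ascii"), hashlib.sha256).digest()


def sign_token(attachment_id: str, secret: bytes, issued_at: Optional[int] = None) -> str:
    """Hypothetical: encode id and issue time, then append a signature over both."""
    issued = int(time.time()) if issued_at is None else issued_at
    payload = f"{_b64encode(attachment_id.encode())}.{_b64encode(str(issued).encode())}"
    return f"{payload}.{_b64encode(_signature(payload, secret))}"


def verify_token(token: str, secret: bytes, max_age: int, now: Optional[int] = None) -> Optional[str]:
    """Hypothetical: return the id if the signature matches and the token is fresh, else None."""
    try:
        id_part, ts_part, sig_part = token.split(".")
        expected = _signature(f"{id_part}.{ts_part}", secret)
        if not hmac.compare_digest(expected, _b64decode(sig_part)):
            return None  # tampered with, or signed with another secret
        issued = int(_b64decode(ts_part).decode("ascii"))
        current = int(time.time()) if now is None else now
        if current - issued > max_age:
            return None  # expired
        return _b64decode(id_part).decode("utf-8")
    except ValueError:
        return None  # malformed token
```

In a real application a maintained library (for example itsdangerous's URLSafeTimedSerializer) is a safer choice than hand-rolled signing.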
The bundled AttachmentController.show() calls Attachment.get_signed(token, max_age=None), which falls back to the default one-year window. To tighten or extend that, edit the controller:
# controllers/storage_controller.py
from datetime import timedelta


class AttachmentController(AppController):
    def show(self):
        token = self.params.get("token")
        max_age = int(timedelta(hours=24).total_seconds())
        obj = Attachment.get_signed(token, max_age=max_age)
        if not obj:
            raise NotFound
        obj.send_file()
Signed URLs cap the time window during which a URL is valid, but they don't say anything about who should be allowed to use it. For attachments that are tied to a specific user (a private invoice, a paid download), enforce the access check in the controller as well - see the next section.
5.3 Authenticated Controller Downloads
The signed token proves that a URL was generated by your code; it doesn't prove the current viewer should be allowed to see the file. To gate access on application-level rules - team membership, ownership, payment status - put the check inside AttachmentController:
class AttachmentController(AppController):
    def show(self):
        token = self.params.get("token")
        obj = Attachment.get_signed(token)
        if not obj:
            raise NotFound
        if not self.current_user.can_read(obj):
            raise Forbidden
        obj.send_file()
send_file() takes no arguments. It picks up the active response from the request context, sets the right content type, and chooses inline vs attachment disposition based on STORAGE_ALLOWED_INLINE. The actual byte streaming is delegated to the underlying service: a direct file send for Disk, a get_object plus body write for S3.
For attachments that should always force a download (a "Save As" link for an image), serve them from your own controller and pass as_attachment=True to the service:
class DownloadController(AppController):
    def show(self):
        att = Attachment.get_signed(self.params["token"])
        if not att:
            raise NotFound
        att._service.send_file(att, self.response, as_attachment=True)
The pattern is uncommon - usually the inline-vs-attachment decision is content-type-driven and STORAGE_ALLOWED_INLINE is enough - but the option is there.
6. Downloading Files
Sometimes you need to read an attachment's bytes back into Python: parsing a CSV, hashing a file for a deduplication check, re-uploading to a different service, transcoding through an external tool. download() returns the file as bytes:
data = attachment.download()
# => b'...'
Use it for parsing, hashing, transcoding, or any work that needs the file in memory. The whole file is materialized at once, which is fine for images and small documents but inappropriate for multi-gigabyte uploads. For very large files, write a streaming controller that reads from the underlying service and pipes to the response.
Reading the bytes through download() always goes through the service: on Disk, that's a path.read_bytes(); on S3, a get_object followed by reading the body.
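For small files, hashlib.sha256(attachment.download()).hexdigest() is all a deduplication check needs. When the bytes are available as a stream instead (an open file under a Disk service's root, for instance), hashing in fixed-size chunks keeps memory flat. sha256_of_stream is a hypothetical helper; how you obtain a stream from a particular service is outside what this guide documents.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of_stream(fp: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Hypothetical: hash a binary stream chunk by chunk so memory stays near chunk_size."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fp.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


print(sha256_of_stream(io.BytesIO(b"abc")))
```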
7. Analyzing Files
Proper does not extract metadata (image dimensions, audio bitrate, video duration) from uploaded files out of the box. The reason is mostly pragmatic: the analysis pipeline depends on a small zoo of libraries and tools (libvips for images, ffprobe for video, mutagen for audio), and we'd rather you opt into the ones you actually need.
Every Attachment carries a metadata JSON column that you can populate yourself:
from io import BytesIO

from PIL import Image

att = Attachment(upload, filename="photo.jpg")
att.save()

# Compute and store dimensions:
img = Image.open(BytesIO(att.download()))
exif = img.getexif()
att.metadata = {
    "width": img.width,
    "height": img.height,
    "alt": "Sunset over the harbor",
    # DateTimeOriginal (tag 36867) lives in the Exif sub-IFD (0x8769), not the top-level IFD:
    "captured_at": exif.get_ifd(0x8769).get(36867),
}
att.save()
(This example uses Pillow, which reads from any file-like object. It sketches the shape of analyzer code you'd write yourself; pyvips, ffprobe, and mutagen can likewise work from bytes or a path, and the bytes are one download() away.)
For analysis that should always run, override save() on your Attachment subclass:
class Attachment(app.attachment_for(BaseModel)):
    def save(self, *args, **kwargs):
        result = super().save(*args, **kwargs)
        if self._upload is None and self.content_type.startswith("image/") and not self.metadata:
            # Newly persisted image, no metadata yet - extract dimensions.
            try:
                # _extract_dimensions() is a helper you write yourself (e.g. with Pillow).
                w, h = self._extract_dimensions()
                self.metadata = {"width": w, "height": h}
                super().save()
            except Exception:
                pass  # don't block uploads on analysis failures
        return result
The pattern is: store first, analyze second, swallow analysis failures so a flaky analyzer can't break uploads.
For metadata that you'd want to query against (uploaded-by user id, gallery id, expiration date), add a real column to your Attachment subclass instead of stuffing it into metadata. JSON is fine for ad-hoc, optional data; a real column is better when it's queryable or required.
8. Displaying Images, Videos, and PDFs
attachment.url works wherever a string URL goes. The most common shapes:
{# An image #}
<img src="{{ user.avatar.url }}" alt="{{ user.name }}">

{# A video, served inline #}
<video src="{{ post.clip.url }}" controls></video>

{# A link that downloads with the original filename #}
<a href="{{ document.file.url }}" download>{{ document.file.filename }}</a>

{# A list of photos with thumbnails (variants are covered later) #}
{% for photo in gallery.photos %}
  <a href="{{ photo.attachment.url }}">
    <img src="{{ photo.attachment.variant(resize_to_fill=(200, 200)).url }}" alt="{{ photo.caption }}">
  </a>
{% endfor %}
Whether the file shows inline (the <img> actually renders) or downloads (the browser opens a Save dialog) is decided by STORAGE_ALLOWED_INLINE (Baseline Configuration). The default covers image/*, video/*, and application/pdf - exactly the formats that browsers know how to display in place. For everything else, the browser saves the file.
In a JSON response, treat the URL like any other field:
def show(self):
    self.response.json = {
        "id": self.user.id,
        "name": self.user.name,
        "avatar_url": self.user.avatar.url if self.user.avatar else None,
    }
Modern browsers render PDFs in an embedded viewer when they are served inline. STORAGE_ALLOWED_INLINE includes application/pdf by default, so an <iframe src="{{ doc.file.url }}"> works without extra configuration. If you want PDFs to download instead, drop application/pdf from STORAGE_ALLOWED_INLINE.
9. Transforming Images
A variant is a derived file generated from an original. The classic case is a thumbnail: store one full-size avatar, but render a 200x200 crop in lists, a 64x64 crop in headers, and a blurred hero version on the profile page. Each variant is itself an Attachment row, with parent set to the original and variant_key set to a hash of the operations that produced it.
Variants are:
- On-demand - the variant file is generated the first time you ask for it, not at upload time.
- Cached forever - subsequent calls return the existing row without reprocessing.
- Persisted - both the bytes (in the parent's service, unless overridden) and the row (in the attachment table) are kept.
- Image-only by default - the bundled transform_image handles image/* content types. PDFs and videos require a custom transform method (covered in Previewing Files).
Variants are not free. The first request that triggers a new variant pays for the whole transformation (decode, resize, encode, upload). For predictable response times in production, pre-generate the variants you know you'll need - see Eager Loading Variants.
9.1 Generating a Variant
Call variant(**ops) on any image attachment:
thumb = user.avatar.variant(resize_to_fill=(200, 200))
thumb.url
# => "/storage/aBcDe..."
The first call processes the image and creates a new Attachment row. Subsequent calls with the same operations look up the existing row by hash and return it.
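The lookup-by-hash behaviour can be modelled as a get-or-create keyed on the parent and a digest of the operations. This is a sketch under assumptions: variant_key and VariantCache are hypothetical, the choice of SHA-256 over JSON is made here rather than taken from Proper, and the key keeps the operations in call order because they apply left to right.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Tuple


def variant_key(**ops: Any) -> str:
    """Hypothetical: hash the operations in call order (order changes the result)."""
    canonical = json.dumps(list(ops.items()), separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class VariantCache:
    """Hypothetical get-or-create by (parent id, key); generate runs only on a miss."""

    def __init__(self, generate: Callable[..., bytes]) -> None:
        self._generate = generate
        self._rows: Dict[Tuple[str, str], bytes] = {}
        self.generated = 0

    def variant(self, parent_id: str, **ops: Any) -> bytes:
        lookup = (parent_id, variant_key(**ops))
        if lookup not in self._rows:
            self._rows[lookup] = self._generate(**ops)
            self.generated += 1
        return self._rows[lookup]
```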
Variants inherit the parent's service_name unless you override it, so they land in the same service - and inherit its access mode automatically. A parent in a public service gives a public variant; a parent in a private service gives a signed variant. The variant's id is its own UUID.
You can chain operations in a single call - they're applied left to right:
hero = post.cover_image.variant(
    resize_to_fill=(1600, 600),
    blur=(8.0,),
)
In a template, calling variant() is cheap once the variant exists - it's a single index lookup by hash, not a recompute - so you can put it directly in the markup without caching gymnastics.
9.2 Available Transformations
Pass any combination of these as keyword arguments to variant():
| Operation | Args | What it does |
|---|---|---|
| resize_to_fit | (width, height) | Fit inside the box, preserving aspect ratio. Either dimension can be None. |
| resize | (width, height) | Alias for resize_to_fit. |
| resize_to_fill | (width, height) | Fill the box exactly, cropping the longer side. Center crop by default. |
| resize_and_pad | (width, height) | Fit, then pad with black or transparent to reach the exact box. |
| rotate | (degrees,) | Rotate by an arbitrary angle. Corners are filled with black by default. |
| fliphor | () | Flip horizontally. |
| flipver | () | Flip vertically. |
| grayscale | () or (r, g, b) | Convert to grayscale. Default uses BT.601 luminance weights. |
| sepia | () or (r, g, b) | Apply a sepia tone. Defaults produce a classic warm sepia. |
| blur | (sigma,) | Gaussian blur. Larger sigma means more blur. |
| composite | (overlay,) | Blend an overlay image on top - useful for watermarks. |
Each operation accepts either a positional tuple or a tuple ending with a kwargs dict for advanced options:
attachment.variant(resize_to_fill=(400, 400, {"crop": "attention"})) # smart crop
attachment.variant(rotate=(45, {"background": [255, 255, 255]})) # white corners
attachment.variant(composite=("logo.png", {"gravity": "south-east"})) # bottom-right watermark
The full set of pyvips options is forwarded through; the pyvips documentation is the reference for what each operation supports.
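To make the tuple convention concrete, here is a small sketch of how such a tuple could be split into positional arguments and an options dict before being handed to an image library. split_op_args is a hypothetical helper written for this illustration, not a Proper API.

```python
from typing import Any, Dict, Tuple


def split_op_args(args: tuple) -> Tuple[tuple, Dict[str, Any]]:
    """Split an operation tuple into positional args and an options dict.

    Hypothetical helper illustrating the "(positional..., {options})"
    convention; it is not part of Proper's API.
    """
    if args and isinstance(args[-1], dict):
        # Trailing dict: everything before it is positional.
        return tuple(args[:-1]), dict(args[-1])
    return tuple(args), {}


print(split_op_args((400, 400, {"crop": "attention"})))
# -> ((400, 400), {'crop': 'attention'})
print(split_op_args((45, {"background": [255, 255, 255]})))
# -> ((45,), {'background': [255, 255, 255]})
print(split_op_args(()))  # fliphor / flipver take no arguments
# -> ((), {})
```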
9.3 Format Conversion
Two special keys, load and save, control how the image is read and written:
# Load options - passed to pyvips when opening the file:
attachment.variant(
    resize_to_fit=(800, 600),
    load={"autorot": True},  # respect EXIF orientation (the default)
)
# Save options - passed to pyvips when writing the variant:
attachment.variant(
    resize_to_fit=(800, 600),
    save={"format": ".webp", "Q": 80},
)
When you don't pass save["format"], Proper picks one for you based on the source content type:
- If the source matches a pattern in STORAGE_ALLOWED_VARIANTS (Baseline Configuration), the variant is saved in the source format. A JPEG source produces a JPEG variant, a PNG source produces a PNG variant.
- Otherwise, the variant is saved in STORAGE_FALLBACK_FORMAT (default "png"). This covers source formats like TIFF, BMP, or HEIC that you don't want to expose verbatim.
A caller-supplied save={"format": "..."} overrides both rules. The format string controls both the file extension and the pyvips encoder; per-encoder options (quality, compression level, ...) are forwarded as additional keys in the same save dict.
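The two rules above, plus the caller override, can be sketched as a small pure function. The values of STORAGE_ALLOWED_VARIANTS and the content-type-to-extension table below are example assumptions, not Proper's actual defaults, and resolve_save_format is a hypothetical name; the sketch also normalizes every format to a bare extension without the leading dot.

```python
from fnmatch import fnmatch
from typing import Optional

# Example values only; your Baseline Configuration may differ.
STORAGE_ALLOWED_VARIANTS = ["image/png", "image/gif", "image/jpeg", "image/webp"]
STORAGE_FALLBACK_FORMAT = "png"

# Hypothetical content-type -> extension table used by this sketch.
EXTENSIONS = {
    "image/png": "png",
    "image/gif": "gif",
    "image/jpeg": "jpg",
    "image/webp": "webp",
}


def resolve_save_format(content_type: str, save: Optional[dict] = None) -> str:
    """Return the output format (bare extension) for a variant."""
    save = save or {}
    # A caller-supplied save["format"] overrides both rules.
    if "format" in save:
        return save["format"].lstrip(".")
    # Rule 1: sources matching an allowed pattern keep their own format.
    if any(fnmatch(content_type, pattern) for pattern in STORAGE_ALLOWED_VARIANTS):
        return EXTENSIONS.get(content_type, STORAGE_FALLBACK_FORMAT)
    # Rule 2: everything else (TIFF, BMP, HEIC, ...) uses the fallback.
    return STORAGE_FALLBACK_FORMAT


print(resolve_save_format("image/jpeg"))                       # -> jpg
print(resolve_save_format("image/tiff"))                       # -> png
print(resolve_save_format("image/tiff", {"format": ".webp"}))  # -> webp
```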
9.4 Variant Idempotency
variant() is idempotent: same arguments, same variant. The mechanism is a SHA-256 hash of the ops dict, stored as variant_key on the variant row. When you call variant(), Proper:
- Resolves the save format (per the rules in Format Conversion) and injects it into the ops dict.
- Computes the SHA-256 hash from the resolved ops.
- Looks up Attachment.parent == self AND variant_key == hash.
- Returns the existing row if found, otherwise generates the variant and inserts a new row.
The resolved format is part of the hash, so a JPEG source with resize_to_fill=(200, 200) and a TIFF source with the same kwargs produce different keys (and different variants - the TIFF lands as PNG).
The order of values inside an operation's tuple matters for the hash ((200, 100) is not the same as (100, 200)), but the order of keys within the load and save dicts does not - those are sorted before hashing.
This means you can call variant() freely in templates without worrying about duplicate work:
{% for user in users %}
    <img src="{{ user.avatar.variant(resize_to_fill=(64, 64)).url }}">
{% endfor %}
Each iteration looks up the same variant by hash. One database query per call, no reprocessing.
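A minimal sketch of the lookup-or-generate flow, assuming the save format has already been resolved and injected into the ops. It keeps the top-level operations in call order (they are applied left to right), lets json.dumps(sort_keys=True) sort the keys of every nested dict such as load and save, and uses a plain dict as a stand-in for the attachment table. variant_key, get_or_create_variant, and the exact serialization are illustrative; Proper's own hashing may encode the ops differently.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Tuple


def variant_key(ops: Dict[str, Any]) -> str:
    """SHA-256 over already-resolved ops (sketch, not Proper's exact encoding)."""
    # Keep operations in call order: they are applied left to right.
    pairs = [[name, value] for name, value in ops.items()]
    # sort_keys=True sorts keys inside nested dicts such as load/save, so
    # {"format": "jpg", "Q": 80} and {"Q": 80, "format": "jpg"} hash the same.
    # Tuples become JSON arrays, so their element order still matters.
    payload = json.dumps(pairs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Stand-in for the attachment table: (parent id, variant_key) -> variant bytes.
_variants: Dict[Tuple[str, str], bytes] = {}


def get_or_create_variant(
    parent_id: str,
    ops: Dict[str, Any],
    generate: Callable[[Dict[str, Any]], bytes],
) -> bytes:
    key = (parent_id, variant_key(ops))
    if key not in _variants:  # the first call pays for the transformation
        _variants[key] = generate(ops)
    return _variants[key]  # later calls are just a lookup


generated = []


def fake_generate(ops: Dict[str, Any]) -> bytes:
    generated.append(ops)
    return b"thumbnail bytes"


ops = {"resize_to_fill": (200, 200), "save": {"format": "jpg"}}
get_or_create_variant("parent-1", ops, fake_generate)
get_or_create_variant("parent-1", ops, fake_generate)
print(len(generated))  # -> 1
```

Scoping the lookup by parent id mirrors the Attachment.parent == self condition: two different originals with identical ops still get separate variants.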
9.5 Purging Variants
Call purge_variants() to delete every variant of an attachment, leaving the original in place:
attachment.purge_variants() # synchronous
attachment.purge_variants_later() # queue a Huey task
You'd typically do this after a design change that makes existing variants obsolete (the 200x200 thumbnail is now 240x240 everywhere, so the old 200x200 files are never requested again), or in a migration that switches output format (from JPEG to WebP variants).
purge() (without _variants) deletes the original and all of its variants in one call.
9.6 Eager Loading Variants
In production you usually want to avoid the "first request pays" cost. Two common patterns:
Pre-generate after upload. Right after an attachment is saved, queue the variants you'll need:
@app.queue.task
def generate_avatar_variants(attachment_id):
    att = Attachment.get_or_none(Attachment.id == attachment_id)
    if att is None:
        return  # the attachment was deleted before the task ran
    att.variant(resize_to_fill=(64, 64))
    att.variant(resize_to_fill=(200, 200))
    att.variant(resize_to_fill=(800, 800))


# In the controller action that saves the upload:
def update(self):
    self.form.save()
    if self.user.avatar_id:
        # Calling a Huey task enqueues it; a worker generates the variants.
        generate_avatar_variants(str(self.user.avatar_id))
    self.response.redirect_to("User.show", self.user)
The first request that displays an avatar finds the variant already in the database and serves it without recomputing.
Pre-generate in a migration. When you add a new variant size to your design, walk the existing attachments and generate the new variant once:
# scripts/backfill_variants.py
from myapp.models import User

for user in User.select().where(User.avatar_id.is_null(False)):
    user.avatar.variant(resize_to_fill=(64, 64))
Run it once after deploying the design change.
The variant cache (the row in attachment plus the bytes in the service) survives across requests, processes, and deploys. You pay the generation cost only once per (parent, ops) combination, unless you purge that variant.
10. Previewing Files
Out of the box, variant() only handles images. For everything else - PDFs, videos, audio with cover art, ePubs - subclass your Attachment and extend the SUPPORTED_VARIANT_TYPES mapping with a content-type pattern and a transform method.
The mapping lives on the Attachment class:
SUPPORTED_VARIANT_TYPES = {
    "image/*": "transform_image",
}
Keys are content-type glob patterns (matched with fnmatch); values are method names on the class. To add PDF previews, override the dict and write the method:
# models/attachment.py
import io

from pdf2image import convert_from_bytes

from ..main import app
from .base import BaseModel


class Attachment(app.attachment_for(BaseModel)):
    SUPPORTED_VARIANT_TYPES = {
        "image/*": "transform_image",
        "application/pdf": "transform_pdf",
    }

    def transform_pdf(self, source, page=1, **ops):
        images = convert_from_bytes(source, first_page=page, last_page=page)
        buf = io.BytesIO()
        images[0].save(buf, format="PNG")
        return buf.getvalue()
Now pdf_attachment.variant(page=1) produces a PNG preview of the first page, stored as a normal variant - cached, deduplicated, served through whatever service the parent uses. The same shape works for video frame extraction (FFmpeg via ffmpeg-python), audio waveform thumbnails, anything that turns one file type into another.
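Given the mapping above, resolving a content type to a transform method can be sketched with fnmatch. The assumptions here are that patterns are tried in the dict's insertion order and that the first match wins; transform_method_for is a hypothetical helper, and Proper may raise an error for an unsupported type rather than return None.

```python
from fnmatch import fnmatch
from typing import Dict, Optional

SUPPORTED_VARIANT_TYPES = {
    "image/*": "transform_image",
    "application/pdf": "transform_pdf",
}


def transform_method_for(
    content_type: str, mapping: Dict[str, str] = SUPPORTED_VARIANT_TYPES
) -> Optional[str]:
    """Return the name of the transform method for a content type, if any."""
    for pattern, method_name in mapping.items():  # first match wins (assumed)
        if fnmatch(content_type, pattern):
            return method_name
    return None  # Proper may raise instead of returning None


print(transform_method_for("image/png"))        # -> transform_image
print(transform_method_for("application/pdf"))  # -> transform_pdf
print(transform_method_for("video/mp4"))        # -> None
```

Because the values are method names rather than functions, an attachment can presumably dispatch with getattr(self, method_name)(source, **ops), which is what lets a subclass add methods like transform_pdf.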
The transform method receives the original bytes and the operation kwargs; it returns either bytes, a bytearray, or anything else that Attachment.create_variant() can handle as an upload. It can also delegate to self.transform_image() after extracting an image:
def transform_pdf(self, source, page=1, **ops):
    images = convert_from_bytes(source, first_page=page, last_page=page)
    buf = io.BytesIO()
    images[0].save(buf, format="PNG")
    # Apply image transformations (resize, crop, etc.) to the extracted page:
    return self.transform_image(buf.getvalue(), **ops)
That last form lets you pass resize_to_fill=(400, 400) to a PDF variant call and have it work the way it would for an image.
The transform method receives all of the kwargs you passed to variant(), including the resolved save dict. If your transform produces bytes in a fixed format and ignores save["format"], that's fine - the variant_filename is still derived from the resolved format and the bytes are written as-is. If you want the transform to honor format conversions (PDF page extracted as PNG vs JPG), inspect ops["save"]["format"] and branch accordingly.
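If you do want the transform to honor the resolved format, one approach is to map ops["save"]["format"] to a Pillow format name before saving the extracted page. The PIL_FORMATS table and the output_format helper below are assumptions for illustration; formats missing from the table fall back to PNG, which keeps the bytes valid but may no longer match the variant's file extension.

```python
from typing import Any, Dict

# Hypothetical mapping from a resolved format to a Pillow format name.
PIL_FORMATS = {"png": "PNG", "jpg": "JPEG", "jpeg": "JPEG", "webp": "WEBP"}


def output_format(ops: Dict[str, Any]) -> str:
    """Pick a Pillow format name from ops["save"]["format"], defaulting to PNG."""
    fmt = ops.get("save", {}).get("format", "png")
    # Accept both ".webp" and "webp" spellings.
    return PIL_FORMATS.get(fmt.lstrip(".").lower(), "PNG")


print(output_format({"save": {"format": ".webp", "Q": 80}}))  # -> WEBP
print(output_format({"save": {"format": "jpg"}}))             # -> JPEG
print(output_format({}))                                      # -> PNG
```

Inside transform_pdf you would then replace format="PNG" with format=output_format(ops) when saving the page image.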
For variants of audio or video, you can use the same hook with ffmpeg-python to extract a frame:
class Attachment(app.attachment_for(BaseModel)):
    SUPPORTED_VARIANT_TYPES = {
        "image/*": "transform_image",
        "video/*": "transform_video_frame",
    }

    def transform_video_frame(self, source, at_seconds=1, **ops):
        import ffmpeg

        # ffmpeg-python can read from a pipe; pass `source` via stdin.
        # Containers that need seeking (e.g. MP4 with its index at the end)
        # may fail to decode from a pipe; write to a temp file for those.
        out, _ = (
            ffmpeg
            .input("pipe:0", ss=at_seconds)
            .output("pipe:1", format="image2", vcodec="mjpeg", vframes=1)
            .run(input=source, capture_stdout=True, capture_stderr=True)
        )
        return self.transform_image(out, **ops)
The transformation method shape is deliberately simple. Anything that can read bytes and produce bytes is a valid transform.
11. Testing
The bundled config/storage.py defines a separate test service pointing at temp/storage/. The STORAGE selector switches to it when the environment is "test", so the test suite never writes into your development storage folder.
"test": {
    "type": "Disk",
    "root": "temp/storage",
}
This is the equivalent of giving your tests their own database: a clean, isolated, throwaway service whose contents you can wipe between tests without losing anything that mattered.
11.1 Discarding Files Stored During Tests
A small autouse fixture wipes the folder between tests:
# tests/conftest.py
import shutil

import pytest


@pytest.fixture(autouse=True)
def clean_storage(app):
    test_root = app.root_path.parent / "temp" / "storage"
    yield
    if test_root.exists():
        shutil.rmtree(test_root)
Mark it autouse=True to run on every test (the recommended default) or pull it in explicitly when only some tests upload files. Because the folder is recreated by the Disk service on first use, you don't need to set it up before the test runs - just clean up after.
If you prefer to clean before each test instead, move the cleanup above the yield:
@pytest.fixture(autouse=True)
def clean_storage(app):
    test_root = app.root_path.parent / "temp" / "storage"
    if test_root.exists():
        shutil.rmtree(test_root)
    yield
For attachments built in a fixture (a fixture that loads a known set of users with avatars, for example), generate them once per session and let the fixture set up and tear down its own folder.
11.2 Testing Against Cloud Services
For S3-backed code paths, moto mocks the AWS API at the boto3 layer. Configure it as a fixture rather than baking it into the test storage config:
# tests/conftest.py
import pytest
from moto import mock_aws


@pytest.fixture()
def aws_credentials(monkeypatch):
    # Fake credentials so boto3 never picks up real ones;
    # monkeypatch restores the environment after the test.
    monkeypatch.setenv("AWS_ACCESS_KEY_ID", "testing")
    monkeypatch.setenv("AWS_SECRET_ACCESS_KEY", "testing")
    monkeypatch.setenv("AWS_DEFAULT_REGION", "us-east-1")


@pytest.fixture()
def s3_storage(aws_credentials, app):
    with mock_aws():
        import boto3

        # Assumed to match the bucket configured for the "amazon" service.
        boto3.client("s3").create_bucket(Bucket="test-bucket")
        # Override the active service for the duration of the test:
        original = app.config["STORAGE"]
        app.config["STORAGE"] = "amazon"
        try:
            yield
        finally:
            app.config["STORAGE"] = original
Tests that need the S3 code path opt in by depending on s3_storage; the rest keep using the disk service. moto starts and tears down a per-test in-memory S3 with no network calls.
For end-to-end tests that should hit a real bucket (smoke tests on staging, integration tests that exercise IAM policies), you can run MinIO in a Docker container, or point a test environment at a dedicated bucket and run those tests outside the unit-test loop. The S3 service has no special test mode of its own.
12. Implementing Support for Other Cloud Services
Proper ships with two services: Disk and S3. To add support for, say, Google Cloud Storage, subclass proper.storage.services.Service and implement four methods:
# myapp/storage/gcs.py
import typing as t

from proper.storage.services import Service

if t.TYPE_CHECKING:
    from proper.app import App
    from proper.types import TAttachment, TUpload


class GoogleCloud(Service):
    def __init__(self, app: "App", **config: t.Any) -> None:
        from google.cloud import storage as gcs

        self.bucket = gcs.Client().bucket(config["bucket"])
        super().__init__(app, **config)

    def upload(self, upload: "TUpload", obj: "TAttachment") -> None:
        file = getattr(upload, "file", upload)
        blob = self.bucket.blob(self._key(obj))
        file.seek(0)
        blob.upload_from_file(file, content_type=obj.content_type)
        blob.reload()
        obj.byte_size = blob.size

    def download(self, obj: "TAttachment") -> bytes:
        return self.bucket.blob(self._key(obj)).download_as_bytes()

    def send_file(self, obj: "TAttachment", response, as_attachment: bool = False) -> None:
        data = self.download(obj)
        disposition = "attachment" if as_attachment else "inline"
        response.content_type = obj.content_type
        response.set_content_length(len(data))
        response.headers["content-disposition"] = (
            f'{disposition}; filename="{obj.filename}"'
        )
        response.body = data

    def purge(self, obj: "TAttachment") -> None:
        self.bucket.blob(self._key(obj)).delete()

    def _key(self, obj: "TAttachment") -> str:
        return f"{str(obj.id)[:2]}/{str(obj.id)[2:4]}/{obj.id}/{obj.filename}"
Once the class is imported anywhere in your app (a top-level import in myapp/__init__.py is the simplest place), the type value in the service config matches the class name:
STORAGE_SERVICES = {
    "google": {
        "type": "GoogleCloud",
        "bucket": "my-app-uploads",
    },
}
STORAGE = "google"
Look at proper/storage/services/disk.py and proper/storage/services/s3.py for working references. Both are short - around 70 lines each - and cover all four methods plus their constructors.
The four methods of the Service interface:
| Method | Returns | What it does |
|---|---|---|
| upload | None | Write the upload to the service. Sets obj.byte_size along the way. |
| download | bytes | Read the file out of the service into memory. |
| send_file | None | Stream the file to the active response, with the right disposition. |
| purge | None | Delete the file from the service. Empty parent directories may also be removed. |
The keys you store the file under are entirely up to you - Disk shards by {id[:2]}/{id[2:4]}/{filename}, S3 by {id[:2]}/{id[2:4]}/{id}/{filename}. The shape doesn't have to match either; what matters is that _key() is deterministic so download and purge can find what upload wrote.
For services with native streaming responses (presigned URLs, range requests), send_file is where you'd integrate with that machinery. The signature (obj, response, as_attachment) is the contract; how you fulfill it is up to the service.
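To see the four-method contract in isolation, here is a sketch of a hypothetical in-memory service that keeps bytes in a dict under a deterministic key. It does not subclass proper.storage.services.Service (a real service would, and would call super().__init__(app, **config)), and FakeAttachment and FakeResponse are minimal stand-ins for Proper's attachment row and response object, so the sketch runs on its own.

```python
import io
from dataclasses import dataclass, field
from typing import Any, BinaryIO, Dict


@dataclass
class FakeAttachment:
    """Minimal stand-in for an Attachment row (hypothetical)."""

    id: str
    filename: str
    content_type: str
    byte_size: int = 0


@dataclass
class FakeResponse:
    """Minimal stand-in for Proper's response object (hypothetical)."""

    content_type: str = ""
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


class MemoryService:
    """Hypothetical in-memory service implementing the four-method contract."""

    def __init__(self, **config: Any) -> None:
        self.config = config
        self.blobs: Dict[str, bytes] = {}

    def _key(self, obj: FakeAttachment) -> str:
        # Deterministic: depends only on the id and filename.
        oid = str(obj.id)
        return f"{oid[:2]}/{oid[2:4]}/{oid}/{obj.filename}"

    def upload(self, upload: Any, obj: FakeAttachment) -> None:
        # Accept either an upload wrapper with a .file attribute or a raw file.
        file: BinaryIO = getattr(upload, "file", upload)
        file.seek(0)
        data = file.read()
        self.blobs[self._key(obj)] = data
        obj.byte_size = len(data)

    def download(self, obj: FakeAttachment) -> bytes:
        return self.blobs[self._key(obj)]

    def send_file(
        self, obj: FakeAttachment, response: FakeResponse, as_attachment: bool = False
    ) -> None:
        data = self.download(obj)
        disposition = "attachment" if as_attachment else "inline"
        response.content_type = obj.content_type
        response.headers["content-length"] = str(len(data))
        response.headers["content-disposition"] = f'{disposition}; filename="{obj.filename}"'
        response.body = data

    def purge(self, obj: FakeAttachment) -> None:
        # Missing keys are ignored, so purging twice is harmless.
        self.blobs.pop(self._key(obj), None)


service = MemoryService()
att = FakeAttachment(
    id="3f9a1c2e-7b4d-4e8a-9c1f-2d6b8e0a5f71",
    filename="notes.txt",
    content_type="text/plain",
)
service.upload(io.BytesIO(b"hello"), att)
print(att.byte_size)      # -> 5
print(service._key(att))  # -> 3f/9a/3f9a1c2e-7b4d-4e8a-9c1f-2d6b8e0a5f71/notes.txt
```

Because _key() depends only on the attachment's id and filename, download and purge always find what upload wrote. A class along these lines might also serve as a lightweight test double for code that only needs the Service interface.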
13. What's Next
Storage touches forms, models, async tasks, and the asset pipeline. A few places to go from here:
- AttachmentField in the Forms guide - the full reference for AttachmentField, including the lifecycle table and transactional save semantics.
- Rendering Forms - Attachment uploads - the HTML side: file_input(), destroy_input(), and the image_input.jx component.
- Models and Relationships - the foreign-key patterns that connect your records to attachments.
- Background Tasks - the queue that runs purge_later(), purge_variants_later(), and any eager-loading task you write.
- pyvips documentation - the full image-processing reference behind transform_image.
- boto3 documentation - the underlying client for the S3 service.