# Django ORM relations ## 1. Use `select_related` sparingly Django's `select_related` results in an SQL JOIN. It's very useful when fetching a one-to-one relation: ```python class User(Model): ... class UserProfile(Model): user = OneToOneField(User, related_name="profile") ... User.objects.select_related("profile").get(pk=1) # ✅DO! ``` It often results in massive data over-fetching when fetching a many-to-one relation: ```python class Album(Model): ... class Track(Model): album = Foreignkey(Album, related_name="tracks") ... x = Track.objects.select_related("album").all() # ❌DON'T! ``` If an album has an average of 20 tracks, we're asking the database to fetch and serialize information on each album 20 times on average. With a few JOINs it's easy to end up with result tables that span gigabytes of data and once those no longer fit in memory, the result set starts to be assembled on disk. DevOps don't want you to force the database to write to disk, trust me on this one. The correct course of action is to use `prefetch_related` and let Django fetch the related records in a separate query: ```python x = Track.objects.prefetch_related("album").all() # ✅DO! ``` While intuitively it may seem slower to this using two queries rather that one, it's usually faster when the database is working under stress. The cost of sending an additional query packet to the DB server is negligible for RDS. ## 2. Apply `select_related` and `prefetch_related` as late as possible While it may seem convenient to do all optimizations in methods that filter data (often at model or manager level), it's much easier to figure out if all relations are correctly prefetched (and to confirm that none of the unwanted ones are present!) if the list of prefetches/joined relations lives close to the point where your application uses the data. The harder it is to tell what prefetches/joins you're using, the more prevalent are going to be the *n+1* bugs where a single change in logic results in extra queries being executed for each row of the result set. ```python def active_products(): return Product.objects.filter( active=True ).prefetch_related("category", "stock", "seller") ... # seven seas and seven mountains def list_products_in_category(request, category_id): products = active_products.filter( category_id=category_id ) # ❌DON'T! ``` ```python def active_products(): return Product.objects.filter( active=True ) ... # seven seas and seven mountains def list_products_in_category(request, category_id): products = active_products.filter( category_id=category_id ).prefetch_related("seller") # we don't even need most of them # ✅DO! ``` ## 3. Use `Exists` and `OuterRef` for filtering Filter queries are cheap unless INNER JOINs or DISTINCT results are involved. In those cases the database has to fetch a huge amount of joined rows (in the worst case the cartesian product of all joined tables), sort them and then deduplicate them before returning the results. As mentioned before, this gets very slow if the results are forced to be flushed to disk because of their size (or because your database server is executing multiple queries in parallel—congratulations, your business is popular). ```python authors = Author.objects.filter(book__category__name__ilike="foo") # ❌slow! ``` These queries are much faster if the database knows you're only interested in one table and use the others to narrow down the results. Using an EXISTS subquery for this purpose means the database can stop looking for matches as soon as it finds one. ```python categories = Category.objects.filter( name__ilike="foo" ).values("id") # avoid overfetching in the subquery books = Book.objects.annotate( matching=Exists(categories.filter(category_id=OuterRef("id"))) ).filter( matching=True ).values("id") # avoid overfetching in the subquery authors = Author.objects.annotate( matching=Exists(books.filter(author_id=OuterRef("id"))) ).filter( matching=True ) # ✅FAST! # but also # ❌quite ugly! ``` Unfortunately Django does not allow using `Exist` expressions to filter directly. This means it's not easy to make the code both fast and readable, so it use with care or write helper queryset methods to take care of the boilerplate code.