Examples

Dependency Examples

The following examples should give you a first idea of how to use the depends keyword of the computed decorator in different scenarios.

No Dependencies

The most basic example is a computed field that has no field dependencies at all. It is constructed by omitting the depends argument, e.g.:

class MyComputedModel(ComputedFieldsModel):

    @computed(Field(...))
    def comp(self):
        return some_value_pulled_from_elsewhere

Such a field will only be recalculated by calling save() or save(update_fields=['comp', ...]) on a model instance. It will never be touched by the auto resolver, unless you force the recalculation by directly calling update_dependent(MyComputedModel.objects.all()), which implies update_fields=None and thus updates all local model fields, or by explicitly listing comp in update_fields, as in update_dependent(MyComputedModel.objects.all(), update_fields=['comp']).
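
For illustration, both forced recalculation variants from above as code:

from computedfields.models import update_dependent

# update_fields=None - recalculates all local fields including comp
update_dependent(MyComputedModel.objects.all())

# or narrow the recalculation down to comp alone
update_dependent(MyComputedModel.objects.all(), update_fields=['comp'])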

Local Fields

A more useful computed field would do some calculation based on other local fields of the model:

class MyComputedModel(ComputedFieldsModel):
    fieldA = Field(...)
    fieldB = Field(...)

    @computed(Field(...), depends=[['self', ['fieldA', 'fieldB']]])
    def comp(self):
        return some_calc(self.fieldA, self.fieldB)

This can be achieved in a safe manner by placing a self rule in depends, listing the local concrete fields on the right side, as shown above.

Background on self rule

At first glance it may seem odd that you have to declare dependencies on other local fields of the model. In previous versions this was not needed at all, but that turned out to be a major shortcoming of the old depends syntax, leading to unresolvable ambiguity. The new syntax, and the need to put local fields into a self rule, enables django-computedfields to properly derive the execution order of local computed fields (MRO) and to correctly expand update_fields given to a partial save call.

Local Computed Fields

To depend on another local computed field, simply list it in the self rule as another local concrete field:

class MyComputedModel(ComputedFieldsModel):
    fieldA = Field(...)
    fieldB = Field(...)
    fieldC = Field(...)

    @computed(Field(...), depends=[['self', ['fieldA', 'fieldB']]])
    def comp(self):
        return some_calc(self.fieldA, self.fieldB)

    @computed(Field(...), depends=[['self', ['fieldC', 'comp']]])
    def final(self):
        return some_other_calc(self.fieldC, self.comp)

The auto resolver takes care that the computed fields are calculated in the correct order (MRO). In the example above it makes sure that final gets recalculated after comp, exactly once, and never vice versa. This also works with a partial save like save(update_fields=['fieldA']). Here the resolver will expand update_fields to ['fieldA', 'comp', 'final'].
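
As a short sketch of that expansion (new_value is a placeholder):

instance = MyComputedModel.objects.first()
instance.fieldA = new_value
instance.save(update_fields=['fieldA'])     # expanded to ['fieldA', 'comp', 'final']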

The ability to depend on other local computed fields may lead to update cycles:

class MyComputedModel(ComputedFieldsModel):
    fieldA = Field(...)
    fieldB = Field(...)
    fieldC = Field(...)

    @computed(Field(...), depends=[['self', ['fieldA', 'fieldB', 'final']]])
    def comp(self):
        return some_calc(self.fieldA, self.fieldB, self.final)

    @computed(Field(...), depends=[['self', ['fieldC', 'comp']]])
    def final(self):
        return some_other_calc(self.fieldC, self.comp)

There is no way to create or update such an instance, as comp relies on final, which itself relies on comp. Here the dependency resolver will throw a cycling exception during startup.

Note

Dependencies on other local computed fields must always be cycle-free.

Many-To-Many Fields

Django’s ManyToManyField can be used in the dependency declaration on the left side as a relation:

class Person(ComputedFieldsModel):
    name = models.CharField(max_length=32)

    @computed(models.CharField(max_length=256), depends=[['groups', ['name']]])
    def groupnames(self):
        if not self.pk:
            return ''
        return ','.join(self.groups.all().values_list('name', flat=True))

class Group(models.Model):
    name = models.CharField(max_length=32)
    members = models.ManyToManyField(Person, related_name='groups')

M2M relations are tested to work in both directions with their custom manager methods like add, set, remove and clear. Actions done to instances on both ends should also correctly update computed fields through the m2m field. Still there are some specifics that need to be mentioned here.

In the method above there is a clause skipping the actual logic if the instance has no pk value yet. That clause is needed, since Django will not allow access to an m2m relation manager before the instance has been saved to the database. After the initial save the m2m relation can be accessed, now correctly pulling field values across the m2m relation.
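
To illustrate the lifecycle, a hedged sketch with made-up values:

person = Person(name='Alice')
person.save()                      # groupnames is '' - no m2m access before the initial save
group = Group.objects.create(name='admins')
group.members.add(person)          # the m2m change triggers a recalculation of groupnames
person.refresh_from_db()
print(person.groupnames)           # 'admins'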

M2M fields allow you to declare a custom through model for the join table. To use computed fields on the through model, or to pull fields from it to either side of the m2m relation, you cannot use the m2m field anymore. Instead use the foreign key relations declared on the through model in depends.

Another important issue around m2m fields is the risk of causing rather high update pressure later on. It helps to remember that an n:m relation in fact means that every single instance on the n side potentially updates m instances and vice versa. If you have multiple computed fields with dependency rules spanning an m2m field in either direction, the update penalty will explode, creating a new bottleneck in your project. Although there are some ways to further optimize computed field updates, they are still quite limited for m2m fields. Also see the optimization examples below.

Warning

M2M fields may create high update pressure on computed fields and should be avoided in depends as much as possible.

Forced Update of Computed Fields

The simplest way to force a model to resync all its dependent computed fields is to re-save all model instances:

for inst in desynced_model.objects.all():
    inst.save()

While this is easy to comprehend, it has the major drawback of also resyncing all dependencies for every single save step, touching related models over and over. Thus it will show a bad runtime for complicated dependencies on big tables. A slightly better way is to call update_dependent instead:

from computedfields.models import update_dependent
update_dependent(desynced_model.objects.all())

which will touch dependent models only once with an altered queryset containing all affected records.

If you have more knowledge about the action that caused a partial desync, you can customize the queryset accordingly:

# given: some bulk action happened before like
# desynced_model.objects.filter(fieldA='xy').update(fieldB='z')

# either do
for inst in desynced_model.objects.filter(fieldA='xy'):
    inst.save(update_fields=['fieldB'])
# or
update_dependent(desynced_model.objects.filter(fieldA='xy'), update_fields=['fieldB'])

Here both save and update_dependent take care that all dependent computed fields get updated. Again using update_dependent has the advantage of further reducing the update pressure. Providing update_fields will narrow the update path to computed fields that actually rely on the listed source fields.

A full resync of all computed fields project-wide can be triggered by calling the management command updatedata. This comes in handy if you cannot track down the cause of a desync or do not know which models/fields are actually affected.
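
For a full resync from Python, a minimal sketch using Django's standard call_command (the command can equally be run as a manage.py command from the shell):

from django.core.management import call_command

# full project-wide resync of all computed fields
call_command('updatedata')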

Tip

After bulk actions always call update_dependent with the changeset for any model, to be on the safe side regarding the sync status of computed fields. For models that are not part of any dependency, update_dependent has a very small O(1) footprint and will not hurt performance.

Note that bulk actions altering relations themselves might need a preparation step with preupdate_dependent (see the API docs and the optimization examples below).

Optimization Examples

The way django-computedfields denormalizes data by precalculating fields at insert/update time puts a major burden on these actions. Furthermore it synchronizes data between all database-relevant model instance actions from Python, which can cause a high update load for computed fields under certain circumstances. The following examples try to give some ideas on how to avoid major update bottlenecks and how to apply optimizations.

Prerequisites

Before trying to optimize things with computed fields it might be a good idea to check where you start from. In terms of computed fields there are two major aspects that might lead to poor update performance:

  • method code itself

    For the method code it is as simple as this - complicated code tends to do more things and tends to run longer. Try to keep methods slick; there is no need to wonder about DB query load if the method code itself eats >90% of the runtime (not counting needed ORM lookups). For big update queries you are already on the hours-vs-days track, if not worse. If you cannot get the code any faster, maybe give up on the “realtime” approach computed fields offer by deferring the hard work.

    (Future versions might provide a @computed_async decorator to partially postpone hard work in a more straightforward fashion.)

  • query load

    The following ideas/examples mainly concentrate on query load issues with computed field updates and the question of how to gain back some update performance. For computed field updates the query load plays a rather important role, as any relation noted in dependencies is likely to turn into an n-case update. In theory this expands to O(n^nested_relations); in practice it cuts down earlier due to the finite number of records in the database and aggressive model/field filtering done by the auto resolver. Still there is much room for further optimizations.

    Before applying some of the ideas below, make sure to profile your project. Tools that might come in handy for that (see the CaptureQueriesContext sketch after this list):

    • django.test.utils.CaptureQueriesContext
      Comes with Django itself, easy to use in tests or at the shell to get an idea, what is going on in SQL.
    • django-debug-toolbar
      Nice Django app with lots of profiling goodies like the SQL panel to inspect database interactions and timings.
    • django-extensions
      Another useful Django app with tons of goodies around Django needs. With the ProfileServer it is easy to find bottlenecks in your project.
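
As a minimal hedged sketch of query profiling with CaptureQueriesContext (instance stands for any model instance of your project):

from django.db import connection
from django.test.utils import CaptureQueriesContext

with CaptureQueriesContext(connection) as ctx:
    instance.save()

# number of queries the save triggered, and the raw SQL statements
print(len(ctx.captured_queries))
for query in ctx.captured_queries:
    print(query['sql'])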

Using update_fields

Django’s ORM supports partial model instance updates by providing update_fields to save. This is a great way to lower the update penalty by limiting the DB writes to fields that actually changed. To keep computed fields in sync with partial writes, the resolver will expand update_fields with computed fields that have dependency intersections, for example:

class MyModel(ComputedFieldsModel):
    name = models.CharField(max_length=256)

    @computed(models.CharField(max_length=256), depends=[['self', ['name']]])
    def uppername(self):
        return self.name.upper()

my_model.name = 'abc'
my_model.save(update_fields=['name'])   # expanded to ['name', 'uppername']

This deviation from Django’s default behavior favours data integrity over strict field listing.

M2M relations

M2M relations are the logical continuation of the section above - they always fall under the category of “complicated dependencies”. On relational level m2m fields are in fact n:1:m relations, where the 1 is an entry in the join table linking with foreign keys to the n and m ends.

For computed fields whose dependencies span m2m relations, this means that you should almost always apply a prefetch lookup. Let’s look at the m2m example we used above, but slightly changed:

class Person(ComputedFieldsModel):
    name = models.CharField(max_length=32)

    @computed(models.CharField(max_length=256),
        depends=[['groups', ['name']]],
        prefetch_related=['groups']
    )
    def groupnames(self):
        if not self.pk:
            return ''
        names = []
        for group in self.groups.all():
            names.append(group.name)
        return ','.join(names)

class Group(models.Model):
    name = models.CharField(max_length=32)
    members = models.ManyToManyField(Person, related_name='groups')

Here the groups access gets optimized by prefetching the items, which again helps if we do an n-case update on Person. Since m2m relations are meant as set operations, we have a rather high chance of triggering multiple updates on Person at once. Thus using a prefetch is a good idea here.

With the through model Django offers a way to customize the join table of m2m relations. As noted above, it is also possible to place computed fields on the through model, or to pull data from it to either side of the m2m relation via the fk relations. In terms of optimized computed field updates there is a catch though:

class Person(ComputedFieldsModel):
    name = models.CharField(max_length=32)

    @computed(models.CharField(max_length=256),
        depends=[
            ['memberships', ['joined_at']],
            ['memberships.group', ['name']]         # replaces groups.name dep
        ],
        prefetch_related=['memberships__group']
    )
    def groupjoins(self):
        if not self.pk:
            return ''
        names = []
        for membership in self.memberships.all():   # not using groups anymore
            names.append('{}: joined at {}'.format(
                membership.group.name, membership.joined_at))
        return ','.join(names)

class Group(models.Model):
    name = models.CharField(max_length=32)
    members = models.ManyToManyField(Person, related_name='groups', through='Membership')

class Membership(models.Model):
    person = models.ForeignKey(Person, related_name='memberships', on_delete=models.CASCADE)
    group = models.ForeignKey(Group, related_name='memberships', on_delete=models.CASCADE)
    joined_at = SomeDateField(...)

You should avoid listing the m2m relation and the through relations in depends at the same time, as it will double certain update tasks. Instead rework your m2m dependencies to use the through relations, and place appropriate prefetch lookups for them.

Another catch with m2m relations and their manager set methods is high update pressure in general. This comes from the fact that a set method may alter dependent computed fields on both m2m ends, therefore the resolver has to trigger a full update in both directions. Currently this cannot be avoided, since the m2m_changed signal does not provide enough detail about the affected relation. This is also the reason why the resolver cannot auto-expand dependencies into the through model itself. Thus regarding performance you should be careful with multiple m2m relations on a model, or with computed fields whose dependencies cross m2m relations back and forth.

Tip

Performance tip regarding m2m relations - don’t use them with computed fields.

Avoid making a computed field depend on another computed field that lives behind an m2m relation. It surely will scale badly with any reasonable record count later on, leading to expensive repeated update roundtrips with “coffee break” quality for your business logic.

“One batch to bind ‘em all …”

As anyone working with Django knows, inserting/updating big batches of data can get you into serious runtime trouble with the default model instance approach. In conjunction with computed fields you will hit that ground much earlier, as even the simplest computed field with just one foreign key relation at least doubles the query load, plus the time to run the associated field method. Example:

class SimpleComputed(ComputedFieldsModel):
    fk = models.ForeignKey(OtherModel, ...)

    @computed(Field(...), depends=[['fk', ['some_field']]])
    def comp(self):
        return self.fk.some_field

...
# naive batch import with single model instance creation
for d in data:
    obj = SimpleComputed(**d)
    obj.save()

Here obj.save() will do an additional lookup in OtherModel to get comp calculated before it can save the instance. This gets worse the more computed fields with dependencies the instance has.

To overcome these bottlenecks of the model instance approach, the ORM offers a bunch of bulk actions that regain performance by operating closer to the DB/SQL level.

Warning

Using bulk actions does not update dependent computed fields automatically anymore. You have to trigger the updates yourself by calling update_dependent or update_dependent_multi.
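
As a hedged sketch of how the naive import above could be reworked (assuming a database backend like PostgreSQL, where bulk_create sets the pks on the returned objects):

from computedfields.models import update_dependent

# insert the whole batch with one bulk action - computed fields are NOT synced yet
objs = SimpleComputed.objects.bulk_create(SimpleComputed(**d) for d in data)

# explicitly resync the computed fields of the created records
update_dependent(SimpleComputed.objects.filter(pk__in=[o.pk for o in objs]))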

update_dependent is in fact the “main gateway” of the update resolver; it is also used internally for updates triggered by instance signals. So let’s have a look at how that function can be used and at its catches.

Given that you want to update some_field on several instances of OtherModel from the example above, the single instance approach would look like this:

new_value = ...
for item in OtherModel.objects.filter(some_condition):
    item.some_field = new_value
    item.save()                     # correctly updates related SimpleComputed.comp

which correctly deals with computed field updates through the instance signals. But in the background this is what actually happens:

new_value = ...
for item in OtherModel.objects.filter(some_condition):
    item.some_field = new_value
    item.save()
    # post_save signal:
        update_dependent(item, old)         # full refresh on dependents

Yes, we actually called update_dependent over and over. For the single instance signal hooks there is no other way to guarantee data integrity in between, thus we have to do the full roundtrip for each call (the roundtrip itself is rather cheap in this example, but might be much more expensive with more complicated dependencies).

With a bulk action this can be rewritten much more compactly:

new_value = ...
OtherModel.objects.filter(some_condition).update(some_field=new_value)
# caution: here computed fields are not in sync
...
# explicitly resync them
update_dependent(OtherModel.objects.filter(some_condition), update_fields=['some_field'])

which reduces the workload by far. But note that it also reveals the desync state of the database to Python, therefore it might be a good idea not to do any business-critical actions between the bulk action and the resync. This can be ensured by placing everything inside a transaction:

from django.db import transaction

new_value = ...
with transaction.atomic():
    OtherModel.objects.filter(some_condition).update(some_field=new_value)
    update_dependent(OtherModel.objects.filter(some_condition), update_fields=['some_field'])

Of course there is a catch in using update_dependent directly - bulk actions altering fk relations need another preparation step, if those relations are part of a computed field dependency as a reverse relation:

class Parent(ComputedFieldsModel):
    @computed(models.IntegerField(), depends=[['children', ['parent']]])
    def number_of_children(self):
        return self.children.all().count()

class Child(models.Model):
    parent = models.ForeignKey(Parent, related_name='children', on_delete=models.CASCADE)

...
from computedfields.models import preupdate_dependent, update_dependent

# moving children to a new parent by some bulk action
with transaction.atomic():
    old = preupdate_dependent(Child.objects.filter(some_condition))
    Child.objects.filter(some_condition).update(parent=new_parent)
    update_dependent(Child.objects.filter(some_condition), old=old)

Here preupdate_dependent will collect the Parent instances before the bulk change. We can feed the old relations back to update_dependent with the old keyword, so parents that just lost some children will be updated as well.

But looking at the example code it is not quite obvious when you have to do this, as the fact is hidden behind the related name in depends of some computed field elsewhere. Therefore django-computedfields exposes a map containing the contributing fk relations:

from computedfields.models import get_contributing_fks
fk_map = get_contributing_fks()
fk_map[Child]   # outputs {'parent'}

# or programmatically (done similarly in the pre_save signal hook for instance.save)
old = None
if model in fk_map:
    old = preupdate_dependent(model.objects...)
model.objects.your_bulk_action()
update_dependent(model.objects..., old=old)

Last but not least - it is also possible to do several bulk actions at once and use the preupdate_dependent_multi and update_dependent_multi counterparts. The argument is a list of querysets reflecting the changed records.
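
A hedged sketch of the multi variants, assuming they mirror the old keyword of their single pendants (ModelA, ModelB and the filter conditions are placeholders):

from django.db import transaction
from computedfields.models import preupdate_dependent_multi, update_dependent_multi

querysets = [
    ModelA.objects.filter(some_condition),
    ModelB.objects.filter(other_condition),
]

with transaction.atomic():
    old = preupdate_dependent_multi(querysets)
    # ... run the bulk actions for ModelA and ModelB here ...
    update_dependent_multi(querysets, old=old)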

Note

When using bulk actions and the update_dependent variants yourself, always make sure that the given querysets correctly reflect the changeset made by the bulk action. If in doubt, expand the queryset to a superset to not miss records by accident. Special care is needed for bulk actions that alter fk relations themselves.

A note on raw SQL updates…

Technically it is also possible to resync computed fields with the help of update_dependent after updates done by raw SQL queries. For that, feed a model queryset reflecting the table, optionally filtered by the altered pks, back to update_dependent. To further narrow down the triggered updates, set update_fields to the altered field names (watch out to correctly translate db_column back to the ORM field name).
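
For illustration, a hedged sketch (the table and column names are hypothetical and must match your actual schema):

from django.db import connection
from computedfields.models import update_dependent

# raw SQL update bypassing the ORM - computed fields are desynced afterwards
with connection.cursor() as cursor:
    cursor.execute("UPDATE myapp_othermodel SET some_field = 'z' WHERE some_field = 'y'")

# resync: feed back a queryset reflecting the altered rows,
# with update_fields set to the ORM field name (not db_column)
update_dependent(OtherModel.objects.filter(some_field='z'), update_fields=['some_field'])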

Complicated & Deep nested

or “How to stall the DBMS for sure”

So you really want to declare computed fields with dependencies like:

class X(ComputedFieldsModel):
    a = models.ForeignKey(OtherModel, ...)

    @computed(Field(...),
        depends=[
            ['a', ['a1', 'a2', ...]],
            ['a.b_reverse', ['b1', 'b2', ...]],
            ['a.b_reverse.c', ['c1', 'c2', ...]],
            ['a.b_reverse.c.d_reverse', ['d1', 'd2', ...]],
            [...]
        ],
        prefetch_related=[]     # HELP, what to put here?
    )
    def busy_is_better(self):
        # 1000+ lines of code following here
        ...

To make it short - yes, that is possible as long as things are cycle-free. Should you do it - probably not.

django-computedfields might look like a hammer, but it should not turn all your database needs into nails. Maybe look for some better suited tools crafted for reporting needs.