Upgrading auth.User - the profile approach

This proposal presents a "middle ground" approach to improving and refactoring auth.User, based around a new concept of "profiles". These profiles provide the main customization hook for the user model, but the user model itself stays concrete and cannot be replaced.

I call it a middle ground because it doesn't go as far as refactoring the whole auth app -- a laudable goal, but one that I believe will ultimately take far too long -- but goes a bit further than just fixing the most egregious errors (username length, for example).

The User model

This proposal vastly pares down the User model to the absolute bare minimum and defers all "user field additions" to a new profile system.

The new User model:

class User(models.Model):
    identifier = models.CharField(unique=True)
    password = models.CharField(default="!")

Points of interest:

The identifier field is an arbitrary identifier for the user. For the common case it's a username. However, I've avoided the term username since this field can be used for anything -- including, most notably, an email address for the common case of login-by-email. [A better name, one that doesn't clash with User.id, would be welcome!]
The identifier field is unique and indexed. This is to optimize for the common case: looking up a user by name/email/whatever. This does mean that auth backends that don't have something like a username will still need to figure out and generate something to put in this field.
If possible, identifier should be an unbounded varchar (something supported by most (all?) dbs, but not by Django). If not, we'll make it varchar(512) or something. The idea is to support pretty much anything as a user identifier, leaving it up to each site to decide what's valid.
Password's the same -- if possible, make it unbounded to be as future-proof as possible. If not, we'll make it varchar(512) or something.
Since there's no validation on identifier, the profile system must allow individual profiles to contribute site-specific constraints. See below.
Why have an "identifier" at all? Why not just leave it up to the profiles? Most uses will have a primary "login identifier" -- username, email, URL, etc. -- and making that something that 3rd-party apps can depend on is probably good. Making it indexed means the common case -- look up user by identifier -- is as fast as possible.
Why have a password at all? Because if we don't, users will invent their own password management and storage, and that's a loaded gun pointed at their feet. However, password now defaults to "!", which is the unusable password. Thus, if an auth backend doesn't use passwords, it can ignore the password field; the user object will automatically be marked as one that can't be auth'd by password.
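
A minimal sketch of how the "!" default could behave, written as plain Python rather than a Django model so it runs on its own. The method names (has_usable_password, set_password, check_password) and the toy SHA-256 hashing are assumptions for illustration; a real implementation would use a salted, slow hash.

```python
import hashlib
import hmac

UNUSABLE_PASSWORD = "!"  # the sentinel default from the proposal


class User:
    def __init__(self, identifier, password=UNUSABLE_PASSWORD):
        self.identifier = identifier
        self.password = password

    @staticmethod
    def _encode(raw):
        # Toy hashing for illustration only (assumption, not the real scheme).
        return "sha256$" + hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def has_usable_password(self):
        return self.password != UNUSABLE_PASSWORD

    def set_password(self, raw):
        self.password = self._encode(raw)

    def check_password(self, raw):
        # A user left at the default can never authenticate by password.
        if not self.has_usable_password():
            return False
        return hmac.compare_digest(self.password, self._encode(raw))
```

A backend that never calls set_password leaves the user at "!", so password login is refused without any extra bookkeeping.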

Profiles

OK, so if User gets stripped, all the user data needs to go somewhere... that's where profiles come in. Don't think about AUTH_USER_PROFILE, which is weaksauce; this proposed new profile system is a lot more powerful.

Here's what a profile looks like:

from django.contrib.auth.models import Profile

class MyProfile(Profile):
	first_name = models.CharField()
	last_name = models.CharField()
	homepage = models.URLField()

Looks pretty simple, and it is. It's just syntactic sugar for the following:

class MyProfile(models.Model):
	user = models.OneToOneField(User)
	...

That is, a Profile subclass is just a model with a one-to-one back to user.

HOWEVER, we can do a few other interesting things here:

Multiple profiles

First, User.get_profile() and AUTH_USER_PROFILE go die in a fire. See below for backwards-compatibility concerns.

Thus, it should be obvious that supporting multiple profiles is trivial. In fact, it's basically a requirement since the auth app is going to need to ship with a profile that includes all the legacy fields (permissions, groups, etc), and that clearly can't be the only profile. So multiple profile objects: fully supported.

Auto-creation of profiles

Right now, one problem with the profile pattern is that when users are created you've got to create the associated profile somehow or risk ProfileDoesNotExist errors. People work around this with post_save signals, User monkeypatches, etc.

The new auth system will auto-create each profile when a user is created. If new profiles are added later, those profile objects will be created lazily (when they're accessed for the first time).

This behavior can be disabled:

class MyProfile(Profile):
	...

	class Meta(object):
		auto_create = False
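
To make the eager-versus-lazy distinction concrete, here is a rough sketch in plain Python (no ORM). The registry list, the class-level auto_create attribute standing in for the Meta option, and the lowercased accessor names are all assumptions made for the example, not part of the proposal's API.

```python
class Profile:
    registry = []       # every Profile subclass, in definition order
    auto_create = True  # stand-in for the proposed Meta.auto_create option

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Profile.registry.append(cls)

    def __init__(self, user):
        self.user = user


class User:
    def __init__(self, identifier):
        self.identifier = identifier
        # Eagerly create every auto-create profile known at user creation time.
        self._profiles = {
            cls.__name__.lower(): cls(self)
            for cls in Profile.registry
            if cls.auto_create
        }

    def __getattr__(self, name):
        # Reached only when normal attribute lookup fails.
        profiles = self.__dict__.setdefault("_profiles", {})
        if name in profiles:
            return profiles[name]
        for cls in Profile.registry:
            if cls.__name__.lower() == name and cls.auto_create:
                profiles[name] = cls(self)  # lazy creation on first access
                return profiles[name]
        raise AttributeError(name)


class Early(Profile):
    pass


joe = User("joe")  # joe.early is created right here


class Late(Profile):  # "installed" after joe already exists
    pass


class Manual(Profile):
    auto_create = False  # never created implicitly
```

Here joe.late is built the first time it is touched, while joe.manual raises AttributeError until something creates it explicitly.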

Extra user validation

Profiles may contribute extra validation to the User object. For example, let's say that for my site I want to enforce the constraint that User.identifier is a valid email address (thus making the built-in login forms require emails to log in):

from django.core import validators

class MyProfile(Profile):
	...
	
	def validate_identifier(self):
		# validate_email raises ValidationError for a malformed address.
		validators.validate_email(self.user.identifier)

That is, we get a special callback, validate_identifier, that lets us contribute validation to identifier. This looks a bit like a model validator function, and that's the point. User will pick up this validation function in its own validation, and thus that'll get passed down to forms and errors will be displayed as appropriate.
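
One way User could pick up these callbacks, sketched in plain Python. It assumes hooks signal failure by raising, and it uses a local ValidationError class, a deliberately loose email regex, and a profiles list on the user, all of which are stand-ins invented for the example.

```python
import re


class ValidationError(Exception):
    pass


EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately loose


class User:
    def __init__(self, identifier):
        self.identifier = identifier
        self.profiles = []

    def validate(self):
        # Collect errors from every profile that defines the hook.
        errors = []
        for profile in self.profiles:
            hook = getattr(profile, "validate_identifier", None)
            if hook is None:
                continue
            try:
                hook()
            except ValidationError as exc:
                errors.append(str(exc))
        if errors:
            raise ValidationError("; ".join(errors))


class EmailProfile:
    def __init__(self, user):
        self.user = user
        user.profiles.append(self)

    def validate_identifier(self):
        if not EMAIL_RE.match(self.user.identifier):
            raise ValidationError("identifier must be an email address")
```

Because validate() only looks for the hook by name, profiles without a validate_identifier method cost nothing.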

Profile data access from User

There are two ways of accessing profile data given a user: directly through the one-to-one accessor, and indirectly through a user data bag.

Direct access is simple: since Profile is just syntactic sugar for a one-to-one field, given a profile...

class MyProfile(Profile):
	name = models.CharField()

... you can access it as user.myprofile.name.

The accessor name can be overridden via a Meta option:

class MyProfile(Profile):
	...
	
	class Meta(object):
		related_name = 'myprof'

[Or, if this is deemed too magical, we could require users to manually specify the OneToOneField and provide related_name there.]

This method is explicit and obvious to anyone who understands that a profile is just an object with a one-to-one relation to user.

However, it requires the accessing code to know the name of the profile class providing a piece of data. This starts to fall apart when it comes to reusable apps: I should be able to write an app that has a requirement like "some profile must define a name field for this app to function." Thus, users expose a second interface for profile data: user.data. This is an object that exposes an amalgamated view onto all profile data and allows access to profile data without knowing exactly where it comes from.

For example, let's imagine two profiles:

class One(Profile):
	name = models.CharField()
	age = models.IntegerField()

class Two(Profile):
	name = models.CharField()
	phone = models.CharField()

And some data:

user.one.name = "Joe"
user.one.age = 17
user.two.name = "Joe Smith"
user.two.phone = "555-1212"

Let's play:

>>> user.data["age"]
	17

>>> user.data["phone"]
	"555-1212"

>>> user.data["spam"]
Traceback (most recent call last):
	...
	KeyError: spam

>>> user.data["name"]
	"Joe"

Notice that both profiles are collapsed. This means that if there's an overlapping name, I only get one profile's data back. Which one is arbitrary and undefined. If I want all the fields in the case of an overlap, I can use getlist:

>>> user.data.getlist("name")
	["Joe", "Joe Smith"]

[Possible extension: getdict, returning something like {"one": "Joe", "two": "Joe Smith"}]

Setting data works; however, "in the face of ambiguity, refuse the temptation to guess":

>>> user.data["age"] = 24
>>> user.one.age
	24

>>> user.data["name"] = "Joe"
Traceback (most recent call last):
	...
	KeyError: "name" overlaps on multiple profiles; use
	`user.one.name = ...` or `user.two.name = ...`

For completeness, there needs to be user.data.save() which saves all profiles (or perhaps just modified ones, if we're being clever).
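
The read, getlist, and write rules above can be sketched with a small mapping-like class. This is an illustration only: profiles are modeled as plain dicts keyed by accessor name, the class name ProfileData is made up, and "first profile wins" on reads is just one possible resolution of the "arbitrary and undefined" choice.

```python
class ProfileData:
    def __init__(self, profiles):
        # profiles: mapping of accessor name -> dict of field values
        self._profiles = profiles

    def _owners(self, key):
        # Names of every profile that defines this field.
        return [name for name, fields in self._profiles.items() if key in fields]

    def __getitem__(self, key):
        owners = self._owners(key)
        if not owners:
            raise KeyError(key)
        return self._profiles[owners[0]][key]  # arbitrary pick on overlap

    def getlist(self, key):
        return [self._profiles[name][key] for name in self._owners(key)]

    def __setitem__(self, key, value):
        owners = self._owners(key)
        if not owners:
            raise KeyError(key)
        if len(owners) > 1:
            # Refuse to guess which profile the caller meant.
            raise KeyError(
                "%r overlaps on multiple profiles: %s" % (key, ", ".join(owners))
            )
        self._profiles[owners[0]][key] = value


profiles = {
    "one": {"name": "Joe", "age": 17},
    "two": {"name": "Joe Smith", "phone": "555-1212"},
}
data = ProfileData(profiles)
```

Writes go straight through to the owning profile's storage, so data["age"] = 24 is visible as profiles["one"]["age"] afterwards.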

Performance optimization

One of the main criticisms I anticipate is that this approach introduces a potentially large performance hit. Code like this:

user = User.objects.get(...)
user.prof1.field
user.prof2.field
user.prof3.field

could end up doing 4 queries. This could be even worse if we go with the magic attributes described above: those DB queries would be effectively hidden.

Luckily this is fairly easy to optimize to some extent: allow user queries to pre-join onto all profile fields. That is, instead of SELECT * FROM user do SELECT user.*, prof1.* FROM user JOIN prof1. Since profiles all subclass Profile it's trivial to know which models to do this to.

This may be too much magic -- and may trigger the "JOINs are evil" myth -- so there's a decision point over whether this should be on by default -- i.e. whether User.objects.all() performs these joins -- or whether it should be selected (User.objects.select_related_profiles() or somesuch).
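
As a rough illustration of the pre-join idea, the sketch below uses Python's built-in sqlite3 module to fetch a user and two profile tables in a single query. The table layout (a user_id column on each profile table) and the helper name select_with_profiles are assumptions; a LEFT JOIN is used so that users whose profile rows have not been created yet still come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, identifier TEXT UNIQUE);
    CREATE TABLE prof1 (user_id INTEGER UNIQUE, nickname TEXT);
    CREATE TABLE prof2 (user_id INTEGER UNIQUE, phone TEXT);
    INSERT INTO auth_user VALUES (1, 'joe');
    INSERT INTO prof1 VALUES (1, 'Joey');
""")


def select_with_profiles(conn, identifier, profile_tables):
    # Table names come from code (the set of Profile subclasses),
    # never from user input, so formatting them into the SQL is safe here.
    joins = " ".join(
        "LEFT JOIN {t} ON {t}.user_id = auth_user.id".format(t=t)
        for t in profile_tables
    )
    sql = "SELECT * FROM auth_user {0} WHERE auth_user.identifier = ?".format(joins)
    return conn.execute(sql, (identifier,)).fetchone()


# One round trip instead of three; missing profile rows come back as NULLs.
row = select_with_profiles(conn, "joe", ["prof1", "prof2"])
```

The row contains the user columns followed by each profile's columns, with None where a profile row does not exist.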

Auth backends

Auth backends continue to work almost exactly as they did before. Most notably: they'll still need to return an instance of django.contrib.auth.models.User. This'll probably anger some, but with luck the fact that User is so wide open will make this easier.

However, auth backends can now take profiles into account, which means that things like OpenID backends can have an OpenIDProfile and store the URL field there (or use the URL as the identifier, perhaps).

Forms

Under the new system, if you simply create a model form for user:

class UserForm(ModelForm):
	class Meta:
		model = User

... you'll get a form that only has identifier and password fields.

Since hacking ModelForm to bring in profile fields is a bad idea...

THIS PART TBD.

Backward compatibility

The big one, of course. First, AUTH_USER_PROFILE and User.get_profile() get deprecated and removed according to the normal process.

After that, there are two facets here: an easy one and a hard one. Let's do the easy one first:

The "default profile"

Many, many apps rely on existing user fields (user.is_staff, user.permissions, etc.) -- the admin for one! The fields need to stick around at least for the normal deprecation period, and possibly for longer. Thus, we'll ship with a DefaultProfile that includes all the old removed fields, and we'll include sugar such that user.username, user.is_staff, and all that stuff continues to work.

[We might want to come up with a better name than DefaultProfile. If we plan on deprecating the object, maybe LegacyProfile is more appropriate.]

If we choose to go with the magic profile attributes, these'll continue to work as long as the legacy profile is available. If not, those accessors can issue warnings according to our normal deprecation policy.

At some point, people may want to remove the default profile. Obviously some stuff won't work -- the admin, again -- but we should make it possible for people to disable the default profile if they wish. I see two ways we could do this:

Put the default profile in a different app (django.contrib.defaultprofile). Put that in INSTALLED_APPS by default, and let people disable it. The downside of this is existing projects: they won't have this app in INSTALLED_APPS, and thus people will have to add it, or we'll have to have a warning, or... something else yuck.
Yet Another Setting (USE_DEFAULT_PROFILE).

Model migration

This one's the big one: there has to be a model migration. I'm not tied to the solution below, but there are a couple of rules this process needs to follow:

This migration cannot block on getting schema migration into core. It'd be great if we could leverage the migration tools, but we can't block on that work.
Until the new auth behavior is switched on, Django 1.5 has to be 100% backwards compatible with 1.4. That is, we need something similar to the USE_TZ setting behavior: until you ask for the new features, you get the old behavior. This decouples upgrading Django from upgrading auth, and makes the whole upgrade process much lower-risk. If we don't do this, we're effectively requiring downtime for a schema migration from all our users, and that's not OK.

Given those rules, here's my plan:

Django 1.5 ships with the ability to run in two "modes": legacy user mode, and new user mode. There's no setting to switch modes: the mode is determined by looking at the database: if auth_user has an identifier field, then we're in new mode; otherwise we're in old.
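
A sketch of that detection step, using SQLite because it ships with Python. The function name auth_mode is hypothetical, and PRAGMA table_info is SQLite-specific; other backends would need their own introspection query.

```python
import sqlite3


def auth_mode(conn):
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    columns = {row[1] for row in conn.execute("PRAGMA table_info(auth_user)")}
    return "new" if "identifier" in columns else "legacy"
```

Since the answer depends only on the schema, it can be computed once at startup and cached for the life of the process.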

In old mode, django.contrib.auth.User behaves much as it did before:

The auth_user table looks as it did before -- i.e. user.username and friends are real, concrete fields.
None of the special Profile handling runs (no auto-joins, etc). Profile objects still work 'cause they're just special cases of models, but no magic identifiers, no validation contribution, etc.
user.identifier exists as a proxy to username to ease forward motion, but it's just a property proxy.
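
The property proxy in the last point could be as small as this. The class is a plain-Python stand-in named LegacyUser for illustration, not the actual model.

```python
class LegacyUser:
    def __init__(self, username):
        self.username = username  # the real, concrete field in old mode

    @property
    def identifier(self):
        # Read-through alias so new-style code works against old tables.
        return self.username

    @identifier.setter
    def identifier(self, value):
        self.username = value
```

Code written against user.identifier then runs unchanged in both modes.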

The new mode gets all the new behavior, natch.

Queries

FIXME: User.objects.filter(email=...) has to continue working (with deprecation)

How to upgrade

A single command:

./manage.py upgrade_auth

(or whatever). This means we have to ship with a bunch of REALLY WELL TESTED, hand-rolled SQL for all the supported Django backends and versions. That'll be a pain to write, but see rule #1 above. This'll do something along the lines of:

CREATE TABLE auth_defaultprofile (first_name, last_name, ...);
INSERT INTO auth_defaultprofile (first_name, ...) 
	SELECT first_name, ... FROM auth_user;
ALTER TABLE auth_user DROP COLUMN first_name;
...
ALTER TABLE auth_user RENAME username TO identifier;

This means that the upgrade process will look like this:

Upgrade your app to Django 1.5. Deploy. Note that everything behaves as it has in the past.
Run manage.py upgrade_auth.
Restart the server (ew, sorry.)
Now start using all the new profile stuff.

Note that syncdb will create the new models, so new projects get the new stuff without upgrading.
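
For a feel of what the hand-rolled SQL might look like on one backend, here is a sketch against SQLite via Python's sqlite3 module. It is an assumption-laden toy: only first_name and last_name are moved, the column types are guesses, and because older SQLite versions lack DROP COLUMN it rebuilds auth_user instead of altering it in place.

```python
import sqlite3

UPGRADE_SQL = """
BEGIN;
CREATE TABLE auth_defaultprofile (
    user_id INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name TEXT
);
INSERT INTO auth_defaultprofile (user_id, first_name, last_name)
    SELECT id, first_name, last_name FROM auth_user;
-- Rebuild auth_user with only the two surviving columns.
CREATE TABLE auth_user_new (
    id INTEGER PRIMARY KEY,
    identifier TEXT UNIQUE,
    password TEXT DEFAULT '!'
);
INSERT INTO auth_user_new (id, identifier, password)
    SELECT id, username, password FROM auth_user;
DROP TABLE auth_user;
ALTER TABLE auth_user_new RENAME TO auth_user;
COMMIT;
"""


def upgrade_auth(conn):
    # The whole script runs inside one transaction (BEGIN ... COMMIT above).
    conn.executescript(UPGRADE_SQL)
```

Each supported backend would need its own carefully tested variant of this script.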

Warnings, etc.

Fairly standard, but with a twist:

In Django 1.5, if you haven't yet issued an upgrade_auth, you'll get a deprecation warning when Django starts.
In Django 1.6, this'll be a louder warning.
In Django 1.7, upgrade_auth will still be there, but Django will now refuse to start if the upgrade hasn't run yet.
In Django 1.8, upgrade_auth is gone.
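
The schedule above might translate into a startup check along these lines. The function name, the version tuple argument, and the mapping of "warning" and "louder warning" onto PendingDeprecationWarning and DeprecationWarning are all assumptions for the sake of the sketch.

```python
import warnings


def check_auth_upgrade(django_version, upgraded):
    """django_version is a (major, minor) tuple; upgraded says whether
    upgrade_auth has already been run against this database."""
    if upgraded:
        return
    if django_version >= (1, 7):
        # From 1.7 on, refuse to start on a legacy schema.
        raise RuntimeError("auth tables not upgraded; upgrade_auth is required")
    if django_version >= (1, 6):
        warnings.warn("run manage.py upgrade_auth before upgrading to 1.7",
                      DeprecationWarning)
    else:
        warnings.warn("run manage.py upgrade_auth",
                      PendingDeprecationWarning)
```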

Summary and recommendations

In essence, this plan does the following:

Drop most fields off django.contrib.auth.User.
Introduce profiles as a way of annotating extra fields on User.
Provide an upgrade path.

Within this, there are a handful of decision points:

FIXME: I've made these decisions; move this stuff and justifications below to FAQ.

User.save(): call validate() by default?
Profile validation: does it get access to other profiles?
Profile access from User: magic (user.profile_field) or not?
Auto-join to profiles: always, on by default, or off by default?
The default profile: separate app, or controlled by a setting?

My recommendations:

Validate by default: yes.
Validate other profiles: no.
Magic profile attributes: yes (see the FAQ).
Auto-join: on by default, disable-able by individual profiles or on individual querysets.
Default profile: separate app.

FAQ

This is pretty light right now; once there are some more Q's frequently A'd, I'll address 'em here.

Why not a swappable user model?

I'm convinced that such an idea is ultimately a bad idea: it allows apps to exert action at a distance over other apps. It would allow the idea of a user to completely change without any warning simply by modifying a setting. Django did this in the past -- replaces_module -- and it was a nightmare. I'm strongly against re-introducing it.

Why magic user attributes?

Normally, I'm against that sort of magic. However, I'd like to be able to write apps that say user.name and not have to think about where name comes from. If we have magic attributes, I can simply say that my app depends on some profile contributing name. If we don't, I have to know the profile class that's contributing that field (i.e. user.someprofile.name). In this context, the magic is the lesser of two evils.

What happens if multiple profiles define the same field?

Really, it's PEBCAK: know what you're installing, and don't do that. If you do, an arbitrary one wins (last by INSTALLED_APPS order, probably). You can still access both fields because there's a one-to-one there, so even with two name fields user.prof1.name and user.prof2.name will still work.

[Alternatively, this could cause an error at model validation time, but I can see cases where allowing shadowed names is OK. Especially 'cause you can still get to both fields via the one to one field.]
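
A small sketch of magic attributes with "last one wins" shadowing, in plain Python. Profiles are modeled as SimpleNamespace objects passed in installation order; that representation and the __getattr__ strategy are assumptions for illustration, not a settled design.

```python
from types import SimpleNamespace


class User:
    def __init__(self, identifier, **profiles):
        self.identifier = identifier
        self._profiles = profiles  # keyword order stands in for INSTALLED_APPS order

    def __getattr__(self, name):
        # Reached only when normal attribute lookup fails.
        profiles = self.__dict__.get("_profiles", {})
        if name in profiles:
            return profiles[name]  # explicit accessor, e.g. user.prof1
        # Magic attribute: search profiles, last-installed first.
        for profile in reversed(list(profiles.values())):
            if hasattr(profile, name):
                return getattr(profile, name)
        raise AttributeError(name)


user = User(
    "joe",
    prof1=SimpleNamespace(name="Joe", age=17),
    prof2=SimpleNamespace(name="Joe Smith"),
)
```

Here user.name resolves to prof2's value, while user.prof1.name and user.prof2.name both remain reachable through the explicit accessors.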

jacobian/authuser.md
