Django Patches for Efficient Database Access
In “Do You Know What Your Database Is Doing”, I mentioned that we had fixed several problems in Django that we found by running our query analyzer on Chesspark’s database logs. Here’s a list of the main tickets we’ve filed against Django; the relevant patches and discussion are attached to each ticket.
- Ticket #3460: psycopg2 backend uses wrong isolation level – This was the big one. The database should be used in autocommit mode for single queries, with transactions used only where they are actually needed. The Python DB-API 2.0 requires a connection’s autocommit feature, if it has one, to start out off, so the driver implicitly opens a transaction before the first statement and keeps it open until a commit or rollback. That leads to really inefficient database use when lots of single queries are issued: even a lone SELECT runs inside a transaction the app never asked for. This is pretty much how every web app works, so fixing this will give you quite a database speed boost.
- Ticket #3459: initializing queries called too many times – Django initialized the database session by running SET TIME ZONE and similar statements for every query instead of only once at connection setup. This patch is now in the official tree.
- Ticket #3575: iexact uses wrong sql – Django uses ILIKE instead of LOWER(col) = LOWER('blah'). This works, but ILIKE queries generally can’t use an ordinary index, while the LOWER() form can use an index on the expression LOWER(col).
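- Ticket #4102: saving only changed fields – Django saves every field on a call to modelobj.save(), even if only one changed. This creates nasty race conditions: if two requests load the same row and each changes a different field, the second save writes its stale copy of the other field back and silently undoes the first update.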
- Ticket #3461: passing kwargs to the database wrapper – Django doesn’t pass cursor keyword args through to the database wrapper. You need to do this if you wish to use dict cursors in the psycopg2 backend.
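A minimal sketch of the idea behind ticket #3459, assuming a hypothetical `InitOnceConnection` wrapper (not Django’s actual code): session setup runs only when a new connection is opened, and every later cursor reuses that connection. SQLite’s `PRAGMA foreign_keys = ON` stands in for PostgreSQL’s `SET TIME ZONE` so the example runs with only the standard library.

```python
import sqlite3


class InitOnceConnection:
    """Hypothetical wrapper: runs session setup statements once per
    physical connection instead of before every query."""

    def __init__(self, connect, init_statements):
        self._connect = connect  # zero-argument callable returning a DB-API connection
        self._init_statements = list(init_statements)
        self._conn = None
        self.init_runs = 0  # number of times the setup statements were executed

    def _connection(self):
        # Open the connection lazily and run setup only at that moment.
        if self._conn is None:
            self._conn = self._connect()
            cur = self._conn.cursor()
            for stmt in self._init_statements:
                cur.execute(stmt)
            cur.close()
            self.init_runs += 1
        return self._conn

    def cursor(self):
        # Later calls reuse the already-initialized connection.
        return self._connection().cursor()


# PRAGMA foreign_keys stands in for PostgreSQL's SET TIME ZONE here.
db = InitOnceConnection(lambda: sqlite3.connect(":memory:"),
                        ["PRAGMA foreign_keys = ON"])
for _ in range(3):
    db.cursor().execute("SELECT 1").fetchone()
print(db.init_runs)  # 1
```

Three queries, one round of setup: the cost of session initialization is paid per connection rather than per query.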
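To see why the LOWER() form from ticket #3575 matters, here is a sketch that inspects the query plan for a case-insensitive match against an expression index. It uses SQLite only because it ships with Python; SQLite has no ILIKE, so the sketch shows just the indexable side. The `users` table and `users_name_lower` index are made up for illustration; in PostgreSQL the matching index would be `CREATE INDEX users_name_lower ON users (LOWER(name));`.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # detail is the last column


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
# Expression index on the lowercased column.
conn.execute("CREATE INDEX users_name_lower ON users (lower(name))")

# The left-hand side matches the indexed expression exactly, so the
# planner can search the index instead of scanning the whole table.
plan = query_plan(conn,
                  "SELECT id FROM users WHERE lower(name) = lower(?)",
                  ("Alice",))
print(plan)
```

The printed plan should name `users_name_lower`; its exact wording varies between SQLite versions.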
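The race in ticket #4102 is easiest to see with two in-memory copies of the same row. The sketch below uses a hypothetical `TrackedRow` class (not Django’s `Model`) that remembers the values it loaded and writes back only the columns that changed; `sqlite3` is used only so it runs without a server, and the `players` table is made up.

```python
import sqlite3


class TrackedRow:
    """Hypothetical row object that remembers the values it was loaded
    with, so save() writes only the columns that changed."""

    def __init__(self, conn, table, pk, values):
        self.conn = conn
        self.table = table          # trusted identifier, never user input
        self.pk = pk
        self.values = dict(values)  # current, possibly edited values
        self._saved = dict(values)  # values as last read from / written to the DB

    @classmethod
    def load(cls, conn, table, pk):
        cur = conn.execute(f"SELECT * FROM {table} WHERE id = ?", (pk,))
        names = [col[0] for col in cur.description]
        values = dict(zip(names, cur.fetchone()))
        del values["id"]
        return cls(conn, table, pk, values)

    def changed_fields(self):
        return [name for name, value in self.values.items()
                if self._saved[name] != value]

    def save(self):
        changed = self.changed_fields()
        if not changed:
            return  # nothing to write
        assignments = ", ".join(f"{name} = ?" for name in changed)
        params = [self.values[name] for name in changed] + [self.pk]
        self.conn.execute(
            f"UPDATE {self.table} SET {assignments} WHERE id = ?", params)
        for name in changed:
            self._saved[name] = self.values[name]


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE players (id INTEGER PRIMARY KEY, rating INTEGER, status TEXT)")
conn.execute("INSERT INTO players VALUES (1, 1500, 'offline')")

a = TrackedRow.load(conn, "players", 1)  # e.g. a request updating the rating
b = TrackedRow.load(conn, "players", 1)  # a concurrent request updating status
a.values["rating"] = 1520
b.values["status"] = "online"
a.save()
b.save()
row = conn.execute("SELECT rating, status FROM players WHERE id = 1").fetchone()
print(row)  # (1520, 'online')
```

With a full-row save, `b.save()` would have written the stale rating of 1500 back over `a`’s update. Concurrent writes to the same field are still last-write-wins. Django 1.5 later added an `update_fields` argument to `Model.save()` for writing a subset of fields.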
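Ticket #3461 boils down to a wrapper forwarding its arguments. Below is a sketch with a hypothetical `ConnectionWrapper` around the standard-library `sqlite3` driver, whose `Connection.cursor()` accepts a `factory` argument; with psycopg2 the equivalent call would be `conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)`. The `DictCursor` class here is a sqlite3 stand-in written for the example.

```python
import sqlite3


class DictCursor(sqlite3.Cursor):
    """Cursor whose rows come back as {column_name: value} dicts."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.row_factory = lambda cur, row: {
            col[0]: value for col, value in zip(cur.description, row)
        }


class ConnectionWrapper:
    """Hypothetical backend wrapper around a DB-API connection."""

    def __init__(self, conn):
        self._conn = conn

    def cursor(self, *args, **kwargs):
        # Forward arguments (such as a cursor factory) instead of dropping them.
        return self._conn.cursor(*args, **kwargs)


db = ConnectionWrapper(sqlite3.connect(":memory:"))
plain = db.cursor().execute("SELECT 1 AS id, 'x' AS name").fetchone()
row = db.cursor(factory=DictCursor).execute("SELECT 1 AS id, 'x' AS name").fetchone()
print(plain)  # (1, 'x')
print(row)    # {'id': 1, 'name': 'x'}
```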
Some of these patches are specific to the postgresql_psycopg2 backend, but other backends may have similar problems, since the efficiency issue comes from a convenience built into DB-API 2.0 itself: implicit transactions.
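The DB-API behavior behind ticket #3460 is easy to observe with Python’s built-in `sqlite3` driver, used here as a stand-in for psycopg2: with default settings a single INSERT leaves an implicit transaction open, while `isolation_level=None` puts the connection in autocommit mode. With psycopg2 the rough equivalent is `conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)` from `psycopg2.extensions` (or `conn.autocommit = True` in newer versions). The helper name and table are hypothetical.

```python
import sqlite3


def leaves_transaction_open(isolation_level):
    """Run one standalone INSERT and report whether the driver left an
    implicit transaction open afterwards."""
    conn = sqlite3.connect(":memory:", isolation_level=isolation_level)
    try:
        conn.execute("CREATE TABLE games (id INTEGER PRIMARY KEY, result TEXT)")
        conn.execute("INSERT INTO games (result) VALUES (?)", ("1-0",))
        return conn.in_transaction
    finally:
        conn.close()  # closing without commit discards the open transaction


print(leaves_transaction_open(""))    # True: default DB-API style, implicit BEGIN
print(leaves_transaction_open(None))  # False: autocommit, no implicit BEGIN
```

One difference to keep in mind: `sqlite3` only issues the implicit BEGIN before data-modifying statements, whereas psycopg2 does so before any statement, SELECT included.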