SQLAlchemy 0.4 Documentation

Version: 0.4.7p1 Last Updated: 07/31/08 11:09:56


Overview

The SQLAlchemy SQL Toolkit and Object Relational Mapper is a comprehensive set of tools for working with databases and Python. It has several distinct areas of functionality which can be used individually or combined together. Its major API components, all public-facing, are illustrated below:

           +-----------------------------------------------------------+
           |             Object Relational Mapper (ORM)                |
           |                [tutorial]    [docs]                       |
           +-----------------------------------------------------------+
           +---------+ +------------------------------------+ +--------+
           |         | |       SQL Expression Language      | |        |
           |         | |        [tutorial]  [docs]          | |        |
           |         | +------------------------------------+ |        |
           |         +-----------------------+ +--------------+        |
           |        Dialect/Execution        | |    Schema Management  |
           |              [docs]             | |        [docs]         |
           +---------------------------------+ +-----------------------+
           +----------------------+ +----------------------------------+
           |  Connection Pooling  | |              Types               |
           |        [docs]        | |              [docs]              |
           +----------------------+ +----------------------------------+

Above, the two most significant front-facing portions of SQLAlchemy are the Object Relational Mapper and the SQL Expression Language. These are two separate toolkits, one building off the other. SQL Expressions can be used independently of the ORM. When using the ORM, the SQL Expression language is used to establish object-relational configurations as well as in querying.



Installing SQLAlchemy

Installing SQLAlchemy from scratch is most easily achieved with setuptools (see the setuptools installation instructions). Just run this from the command line:

# easy_install SQLAlchemy

This command will download the latest version of SQLAlchemy from the Python Cheese Shop and install it to your system.

Otherwise, you can install from the distribution using the setup.py script:

# python setup.py install

Installing a Database API

SQLAlchemy is designed to operate with a DB-API implementation built for a particular database, and includes support for the most popular databases.
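
As a rough sketch of what connecting to a few of these looks like (the URLs below are only examples, and each one assumes the corresponding DB-API module, such as sqlite3/pysqlite, psycopg2, or MySQLdb, is installed; full URL formats are covered in the Database Engines chapter):

from sqlalchemy import create_engine

# SQLite, using the built-in sqlite3 / pysqlite driver
sqlite_engine = create_engine('sqlite:///mydatabase.db')

# PostgreSQL, assuming psycopg2 is installed
pg_engine = create_engine('postgres://scott:tiger@localhost/mydatabase')

# MySQL, assuming MySQLdb is installed
mysql_engine = create_engine('mysql://scott:tiger@localhost/mydatabase')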


Checking the Installed SQLAlchemy Version

This documentation covers SQLAlchemy version 0.4. If you're working on a system that already has SQLAlchemy installed, check the version from your Python prompt like this:

>>> import sqlalchemy
>>> sqlalchemy.__version__ 
0.4.0

0.3 to 0.4 Migration

From version 0.3 to version 0.4 of SQLAlchemy, some conventions have changed. Most of these conventions are available in the most recent releases of the 0.3 series starting with version 0.3.9, so that you can make a 0.3 application compatible with 0.4 in most cases.

This section will detail only those things that have changed in a backwards-incompatible manner. For a full overview of everything that's new and changed, see WhatsNewIn04.

ORM Package is now sqlalchemy.orm

All symbols related to the SQLAlchemy Object Relational Mapper, i.e. names like mapper(), relation(), backref(), create_session(), synonym(), eagerload(), etc., are now only in the sqlalchemy.orm package, and not in sqlalchemy. So if you were previously importing everything using an asterisk:

from sqlalchemy import *

You should now import separately from orm:

from sqlalchemy import *
from sqlalchemy.orm import *

Or more commonly, just pull in the names you'll need:

from sqlalchemy import create_engine, MetaData, Table, Column, types
from sqlalchemy.orm import mapper, relation, backref, create_session

BoundMetaData is now MetaData

The BoundMetaData name is removed. Now, you just use MetaData. Additionally, the engine parameter/attribute is now called bind, and connect() is deprecated:

# plain metadata
meta = MetaData()

# metadata bound to an engine
meta = MetaData(engine)

# bind metadata to an engine later
meta.bind = engine

Additionally, DynamicMetaData is now known as ThreadLocalMetaData.


"Magic" Global MetaData removed

There was an old way to specify Table objects using an implicit, global MetaData object. To do this you'd omit the second positional argument, and specify Table('tablename', Column(...)). This no longer exists in 0.4 and the second MetaData positional argument is required, i.e. Table('tablename', meta, Column(...)).
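
For example (a minimal before/after sketch; the table and column names are placeholders):

from sqlalchemy import MetaData, Table, Column, Integer

meta = MetaData()

# 0.3 style, relying on the implicit global MetaData (no longer works in 0.4):
# mytable = Table('mytable', Column('id', Integer, primary_key=True))

# 0.4 style: the MetaData is always passed explicitly
mytable = Table('mytable', meta, Column('id', Integer, primary_key=True))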


Some existing select() methods become generative

The methods correlate(), order_by(), and group_by() on the select() construct now return a new select object, and do not change the original one. Additionally, the generative methods where(), column(), distinct(), and several others have been added:

s = table.select().order_by(table.c.id).where(table.c.x==7)
result = engine.execute(s)

collection_class behavior is changed

If you've been using the collection_class option on mapper(), the requirements for instrumented collections have changed. For an overview, see Alternate Collection Implementations.
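
As a minimal sketch of the 0.4 style (the Parent/Child mapping here is hypothetical and not part of any example in this document):

from sqlalchemy import MetaData, Table, Column, Integer, ForeignKey
from sqlalchemy.orm import mapper, relation

metadata = MetaData()
parents = Table('parents', metadata, Column('id', Integer, primary_key=True))
children = Table('children', metadata,
    Column('id', Integer, primary_key=True),
    Column('parent_id', Integer, ForeignKey('parents.id')))

class Parent(object): pass
class Child(object): pass

# store the children collection in a set instead of the default list;
# 0.4 can instrument the built-in set type directly
mapper(Parent, parents, properties={
    'children': relation(Child, collection_class=set)
})
mapper(Child, children)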


All "engine", "bind_to", "connectable" Keyword Arguments Changed to "bind"

This is for create/drop statements, sessions, SQL constructs, metadatas:

myengine = create_engine('sqlite://')

meta = MetaData(myengine)

meta2 = MetaData()
meta2.bind = myengine

session = create_session(bind=myengine)

statement = select([table], bind=myengine)

meta.create_all(bind=myengine)

All "type" Keyword Arguments Changed to "type_"

This mostly applies to SQL constructs where you pass a type in:

s = select([mytable], mytable.c.x==bindparam('y', type_=DateTime))

func.now(type_=DateTime)

Mapper Extensions must return EXT_CONTINUE to continue execution to the next mapper

If you extend the mapper, the methods in your mapper extension must return EXT_CONTINUE to continue executing additional mappers.
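
A minimal sketch (the AuditExtension name and its print statement are made up for illustration):

from sqlalchemy.orm import MapperExtension, EXT_CONTINUE

class AuditExtension(MapperExtension):
    def before_insert(self, mapper, connection, instance):
        print "about to insert:", instance
        # EXT_CONTINUE lets the next extension, and the mapper's normal
        # behavior, proceed; returning anything else halts the chain.
        return EXT_CONTINUE

The extension is then attached to a mapper via the extension keyword argument, e.g. mapper(User, users_table, extension=AuditExtension()).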


In this tutorial we will cover a basic SQLAlchemy object-relational mapping scenario, where we store and retrieve Python objects from a database representation. The database schema will begin with one table, and will later develop into several. The tutorial is in doctest format, meaning each >>> line represents something you can type at a Python command prompt, and the following text represents the expected return value. The tutorial has no prerequisites.

Version Check

A quick check to verify that we are on at least version 0.4 of SQLAlchemy:

>>> import sqlalchemy
>>> sqlalchemy.__version__ 
0.4.0

Connecting

For this tutorial we will use an in-memory-only SQLite database. This is an easy way to test things without needing to have an actual database defined anywhere. To connect we use create_engine():

>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///:memory:', echo=True)

The echo flag is a shortcut to setting up SQLAlchemy logging, which is accomplished via Python's standard logging module. With it enabled, we'll see all the generated SQL produced. If you are working through this tutorial and want less output generated, set it to False. This tutorial will format the SQL behind a popup window so it doesn't get in our way; just click the "SQL" links to see what's being generated.
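
If you'd rather configure logging directly instead of using echo, a roughly equivalent setup with the standard logging module is:

import logging

logging.basicConfig()
# the 'sqlalchemy.engine' logger controls SQL statement and parameter output
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)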


Define and Create a Table

Next we want to tell SQLAlchemy about our tables. We will start with just a single table called users, which will store records for the end-users using our application (let's assume it's a website). We define our tables all within a catalog called MetaData, using the Table construct, which resembles regular SQL CREATE TABLE syntax:

>>> from sqlalchemy import Table, Column, Integer, String, MetaData, ForeignKey    
>>> metadata = MetaData()
>>> users_table = Table('users', metadata,
...     Column('id', Integer, primary_key=True),
...     Column('name', String(40)),
...     Column('fullname', String(100)),
...     Column('password', String(15))
... )

Everything about how to define Table objects, as well as how to create them automatically from an existing database, is described in Database Meta Data.

Next, to tell the MetaData we'd actually like to create our users_table for real inside the SQLite database, we use create_all(), passing it the engine instance which points to our database. This will check for the presence of a table first before creating, so it's safe to call multiple times:

sql>>> metadata.create_all(engine) 

So now our database is created, our initial schema is present, and our SQLAlchemy application knows all about the tables and columns in the database; this information is to be re-used by the Object Relational Mapper, as we'll see now.


Define a Python Class to be Mapped

So let's create a rudimentary User object to be mapped to the database. For starters, this object will have three attributes: name, fullname and password. It only needs to subclass Python's built-in object class (i.e. it's a new-style class). We will give it a constructor so that it can conveniently be instantiated with its attributes at once, as well as a __repr__ method so that we get a nice string representation of it:

>>> class User(object):
...     def __init__(self, name, fullname, password):
...         self.name = name
...         self.fullname = fullname
...         self.password = password
...
...     def __repr__(self):
...        return "<User('%s','%s', '%s')>" % (self.name, self.fullname, self.password)

Setting up the Mapping

With our users_table and User class, we now want to map the two together. That's where the SQLAlchemy ORM package comes in. We'll use the mapper function to create a mapping between users_table and User:

>>> from sqlalchemy.orm import mapper
>>> mapper(User, users_table) 
<sqlalchemy.orm.mapper.Mapper object at 0x...>

The mapper() function creates a new Mapper object and stores it away for future reference. It also instruments the attributes on our User class, corresponding to the users_table table. The id, name, fullname, and password columns in our users_table are now instrumented upon our User class, meaning it will keep track of all changes to these attributes, and can save and load their values to/from the database. Let's create our first user, 'Ed Jones', and ensure that the object has all three of these attributes:

>>> ed_user = User('ed', 'Ed Jones', 'edspassword')
>>> ed_user.name
'ed'
>>> ed_user.password
'edspassword'
>>> str(ed_user.id)
'None'

What was that last id attribute? That was placed there by the Mapper, to track the value of the id column in the users_table. Since our User doesn't exist in the database, its id is None. When we save the object, it will get populated automatically with its new id.


Too Verbose? There are alternatives

The full set of steps to map a class, which are to define a Table, define a class, and then define a mapper(), are fairly verbose and for simple cases may appear overly disjoint. Most popular object relational products use the so-called "active record" approach, where the table definition and its class mapping are all defined at once. With SQLAlchemy, there are two excellent alternatives to its usual configuration which provide this approach:
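
One such layer, sketched here only to show the style (it uses the declarative extension, sqlalchemy.ext.declarative, available in recent 0.4 releases), rolls the Table, class and mapper() steps into a single class definition:

from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'

    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(40))
    fullname = Column('fullname', String(100))
    password = Column('password', String(15))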

Whichever declarative layer you choose, it's a good idea to be familiar with SQLAlchemy's "base" configurational style in any case. But now that we have our configuration started, we're ready to look at how to build sessions and query the database; this process is the same regardless of configurational style.


Creating a Session

We're now ready to start talking to the database. The ORM's "handle" to the database is the Session. When we first set up the application, at the same level as our create_engine() statement, we define a second object called Session (or whatever you want to call it, create_session, etc.) which is configured by the sessionmaker() function. This function is configurational and need only be called once.

>>> from sqlalchemy.orm import sessionmaker
>>> Session = sessionmaker(bind=engine, autoflush=True, transactional=True)

In the case where your application does not yet have an Engine when you define your module-level objects, just set it up like this:

>>> Session = sessionmaker(autoflush=True, transactional=True)

Later, when you create your engine with create_engine(), connect it to the Session using configure():

>>> Session.configure(bind=engine)  # once engine is available

This Session class will create new Session objects which are bound to our database and have the transactional characteristics we've configured. Whenever you need to have a conversation with the database, you instantiate a Session:

>>> session = Session()

The above Session is associated with our SQLite engine, but it hasn't opened any connections yet. When it's first used, it retrieves a connection from a pool of connections maintained by the engine, and holds onto it until we commit all changes and/or close the session object. Because we configured transactional=True, there's also a transaction in progress (one notable exception to this is MySQL, when you use its default table style of MyISAM). There are options available to modify this behavior, but we'll go with this straightforward version to start.


Saving Objects

So saving our User is as easy as issuing save():

>>> session.save(ed_user)

But you'll notice nothing has happened yet. Well, let's pretend something did, and try to query for our user. This is done using the query() method on Session. We create a new query representing the set of all User objects first. Then we narrow the results by "filtering" down to the user we want; that is, the user whose name attribute is "ed". Finally we call first(), which tells Query, "we'd like the first result in this list".

sql>>> session.query(User).filter_by(name='ed').first() 
<User('ed','Ed Jones', 'edspassword')>

And we get back our new user. If you view the generated SQL, you'll see that the Session issued an INSERT statement before querying. The Session stores whatever you put into it in memory, and at certain points it issues a flush, which issues SQL to the database to store all pending new objects and changes to existing objects. You can manually invoke the flush operation using flush(); however when the Session is configured to autoflush, it's usually not needed.

OK, let's do some more operations. We'll create and save three more users:

>>> session.save(User('wendy', 'Wendy Williams', 'foobar'))
>>> session.save(User('mary', 'Mary Contrary', 'xxg527'))
>>> session.save(User('fred', 'Fred Flinstone', 'blah'))

Also, Ed has already decided his password isn't too secure, so lets change it:

>>> ed_user.password = 'f8s7ccs'

Then we'll permanently store everything that's been changed and added to the database. We do this via commit():

sql>>> session.commit()

commit() flushes the remaining changes to the database and commits the transaction. The connection resources referenced by the session are now returned to the connection pool. Subsequent operations with this session will occur in a new transaction, which will again re-acquire connection resources when first needed.

If we look at Ed's id attribute, which earlier was None, it now has a value:

>>> ed_user.id
1

After each INSERT operation, the Session assigns all newly generated ids and column defaults to the mapped object instance. Column defaults which are database-generated and are not part of the table's primary key will be loaded when you first reference the attribute on the instance.

One crucial thing to note about the Session is that each object instance is cached within the Session, based on its primary key identifier. The reason for this cache is not as much for performance as it is for maintaining an identity map of instances. This map guarantees that whenever you work with a particular User object in a session, you always get the same instance back. As below, reloading Ed gives us the same instance back:

sql>>> ed_user is session.query(User).filter_by(name='ed').one() 
True

The get() method, which queries based on primary key, will not issue any SQL to the database if the given key is already present:

>>> ed_user is session.query(User).get(ed_user.id)
True

Querying

A whirlwind tour through querying.

A Query is created from the Session, relative to a particular class we wish to load.

>>> query = session.query(User)

Once we have a query, we can start loading objects. The Query object, when first created, represents all the instances of its main class. You can iterate through it directly:

sql>>> for user in session.query(User):
...     print user.name
ed
wendy
mary
fred

...and the SQL will be issued at the point where the query is evaluated as a list. If you apply array slices before iterating, LIMIT and OFFSET are applied to the query:

sql>>> for u in session.query(User)[1:3]: 
...    print u
<User('wendy','Wendy Williams', 'foobar')>
<User('mary','Mary Contrary', 'xxg527')>

Narrowing the results down is accomplished either with filter_by(), which uses keyword arguments:

sql>>> for user in session.query(User).filter_by(name='ed', fullname='Ed Jones'):
...    print user
<User('ed','Ed Jones', 'f8s7ccs')>

...or filter(), which uses SQL expression language constructs. These allow you to use regular Python operators with the class-level attributes on your mapped class:

sql>>> for user in session.query(User).filter(User.name=='ed'):
...    print user
<User('ed','Ed Jones', 'f8s7ccs')>

You can also use the Column constructs attached to the users_table object to construct SQL expressions:

sql>>> for user in session.query(User).filter(users_table.c.name=='ed'):
...    print user
<User('ed','Ed Jones', 'f8s7ccs')>

Most common SQL operators are available, such as LIKE:

sql>>> session.query(User).filter(User.name.like('%ed'))[1] 
<User('fred','Fred Flinstone', 'blah')>

Note that above, our array index of 1 applied the appropriate LIMIT/OFFSET and returned a scalar result immediately.

The all(), one(), and first() methods immediately issue SQL without using an iterative context or array index. all() returns a list:

>>> query = session.query(User).filter(User.name.like('%ed'))

sql>>> query.all()
[<User('ed','Ed Jones', 'f8s7ccs')>, <User('fred','Fred Flinstone', 'blah')>]

first() applies a limit of one and returns the first result as a scalar:

sql>>> query.first()
<User('ed','Ed Jones', 'f8s7ccs')>

one() applies a limit of two, and raises an error if anything other than exactly one row is returned:

sql>>> try:  
...     user = query.one() 
... except Exception, e: 
...     print e
Multiple rows returned for one()

All Query methods that don't return a result instead return a new Query object, with modifications applied. Therefore you can call many query methods successively to build up the criterion you want:

sql>>> session.query(User).filter(User.id<2).filter_by(name='ed').\
...     filter(User.fullname=='Ed Jones').all()
[<User('ed','Ed Jones', 'f8s7ccs')>]

If you need to use other conjunctions besides AND, all SQL conjunctions are available explicitly within expressions, such as and_() and or_(), when using filter():

>>> from sqlalchemy import and_, or_

sql>>> session.query(User).filter(
...    and_(User.id<224, or_(User.name=='ed', User.name=='wendy'))
...    ).all()
[<User('ed','Ed Jones', 'f8s7ccs')>, <User('wendy','Wendy Williams', 'foobar')>]

You also have full ability to use literal strings to construct SQL. For a single criterion, use a string with filter():

sql>>> for user in session.query(User).filter("id<224").all():
...     print user.name
ed
wendy
mary
fred

Bind parameters can be specified with string-based SQL, using a colon. To specify the values, use the params() method:

sql>>> session.query(User).filter("id<:value and name=:name").\
...     params(value=224, name='fred').one() 
<User('fred','Fred Flinstone', 'blah')>

Note that when we use constructed SQL expressions, bind parameters are generated for us automatically; we don't need to worry about them.

To use an entirely string-based statement, use from_statement(); just ensure that the columns clause of the statement contains the column names normally used by the mapper (illustrated below using an asterisk):

sql>>> session.query(User).from_statement("SELECT * FROM users where name=:name").params(name='ed').all()
[<User('ed','Ed Jones', 'f8s7ccs')>]

from_statement() can also accommodate full select() constructs. These are described in the SQL Expression Language Tutorial:

>>> from sqlalchemy import select, func

sql>>> session.query(User).from_statement(
...     select(
...            [users_table], 
...            select([func.max(users_table.c.name)]).label('maxuser')==users_table.c.name) 
...    ).all() 
[<User('wendy','Wendy Williams', 'foobar')>]

There's also a way to combine scalar results with objects, using add_column(). This is often used for functions and aggregates. When add_column() (or its cousin add_entity(), described later) is used, tuples are returned:

sql>>> for r in session.query(User).\
...     add_column(select([func.max(users_table.c.name)]).label('maxuser')):
...     print r 
(<User('ed','Ed Jones', 'f8s7ccs')>, u'wendy')
(<User('wendy','Wendy Williams', 'foobar')>, u'wendy')
(<User('mary','Mary Contrary', 'xxg527')>, u'wendy')
(<User('fred','Fred Flinstone', 'blah')>, u'wendy')

Building a One-to-Many Relation

We've spent a lot of time dealing with just one class and one table. Let's now look at how SQLAlchemy deals with two tables that have a relationship to each other. Let's say that the users in our system can also store any number of email addresses associated with their username. This implies a basic one-to-many association from the users_table to a new table which stores email addresses, which we will call addresses. We will also create a relationship between this new table and the users table, using a ForeignKey:

>>> from sqlalchemy import ForeignKey

>>> addresses_table = Table('addresses', metadata, 
...     Column('id', Integer, primary_key=True),
...     Column('email_address', String(100), nullable=False),
...     Column('user_id', Integer, ForeignKey('users.id')))

Another call to create_all() will skip over our users table and build just the new addresses table:

sql>>> metadata.create_all(engine) 

For our ORM setup, we're going to start all over again. We will first close out our Session and clear all Mapper objects:

>>> from sqlalchemy.orm import clear_mappers
>>> session.close()
>>> clear_mappers()

Our User class, still around, reverts to being just a plain old class. Let's create an Address class to represent a user's email address:

>>> class Address(object):
...     def __init__(self, email_address):
...         self.email_address = email_address
...
...     def __repr__(self):
...         return "<Address('%s')>" % self.email_address

Now comes the fun part. We define a mapper for each class, and associate them using a function called relation(). We can define each mapper in any order we want:

>>> from sqlalchemy.orm import relation

>>> mapper(User, users_table, properties={    
...     'addresses':relation(Address, backref='user')
... })
<sqlalchemy.orm.mapper.Mapper object at 0x...>

>>> mapper(Address, addresses_table) 
<sqlalchemy.orm.mapper.Mapper object at 0x...>

Above, the new thing we see is that User has defined a relation named addresses, which will reference a list of Address objects. How does it know it's a list? SQLAlchemy figures it out for you, based on the foreign key relationship between users_table and addresses_table.
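
When the foreign keys are ambiguous or missing, or when you want a scalar attribute instead of a list, these decisions can also be spelled out explicitly. The snippets below are a hedged sketch of the relevant relation() arguments, shown as alternatives rather than as part of this tutorial's configuration:

from sqlalchemy.orm import relation

# spell out the join condition instead of relying on ForeignKey discovery
relation(Address, primaryjoin=users_table.c.id==addresses_table.c.user_id)

# force a scalar (one-to-one style) attribute instead of a list
relation(Address, uselist=False)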


Working with Related Objects and Backreferences

Now when we create a User, it automatically has this collection present:

>>> jack = User('jack', 'Jack Bean', 'gjffdd')
>>> jack.addresses
[]

We are free to add Address objects, and the session will take care of everything for us.

>>> jack.addresses.append(Address(email_address='jack@google.com'))
>>> jack.addresses.append(Address(email_address='j25@yahoo.com'))

Before we save into the Session, let's examine one other thing that's happened here. The addresses collection is present on our User because we added a relation() with that name. But also within the relation() function is the keyword backref. This keyword indicates that we wish to make a bi-directional relationship. What this basically means is that not only did we generate a one-to-many relationship called addresses on the User class, we also generated a many-to-one relationship on the Address class. This relationship is self-updating, without any data being flushed to the database, as we can see on one of Jack's addresses:

>>> jack.addresses[1]
<Address('j25@yahoo.com')>

>>> jack.addresses[1].user
<User('jack','Jack Bean', 'gjffdd')>

Let's save into the session, then close out the session and create a new one...so that we can see how Jack and his email addresses come back to us:

>>> session.save(jack)
sql>>> session.commit()
>>> session = Session()

Querying for Jack, we get just Jack back. No SQL is yet issued for Jack's addresses:

sql>>> jack = session.query(User).filter_by(name='jack').one()
>>> jack
<User('jack','Jack Bean', 'gjffdd')>

Let's look at the addresses collection. Watch the SQL:

sql>>> jack.addresses
[<Address('jack@google.com')>, <Address('j25@yahoo.com')>]

When we accessed the addresses collection, SQL was suddenly issued. This is an example of a lazy loading relation.
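
Lazy loading is the default; the loading behavior could also be changed on the relation itself when setting up the mapper. The mapping below is a sketch of such an alternative configuration (not part of this tutorial's setup); lazy=False makes the addresses collection load eagerly with every query for User:

mapper(User, users_table, properties={
    # load the addresses collection eagerly (via a JOIN) by default
    'addresses': relation(Address, backref='user', lazy=False)
})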

To reduce the number of queries (dramatically, in many cases), we can apply an eager load to the query operation. We clear out the session to ensure that a full reload occurs:

>>> session.clear()

Then apply an option to the query, indicating that we'd like addresses to load "eagerly". SQLAlchemy then constructs a join between the users and addresses tables:

>>> from sqlalchemy.orm import eagerload

sql>>> jack = session.query(User).options(eagerload('addresses')).filter_by(name='jack').one() 
>>> jack
<User('jack','Jack Bean', 'gjffdd')>

>>> jack.addresses
[<Address('jack@google.com')>, <Address('j25@yahoo.com')>]

If you think that query is elaborate, it is! But SQLAlchemy is just getting started. Note that when using eager loading, nothing changes as far as the ultimate results returned. The "loading strategy", as it's called, is designed to be completely transparent in all cases and exists for optimization purposes only. Any query criterion you use to load objects, including ordering, limiting, other joins, etc., should return identical results regardless of the combination of lazily- and eagerly-loaded relationships present.

An eagerload targeting a path across multiple relations can use dot-separated names:

query.options(eagerload('orders'), eagerload('orders.items'), eagerload('orders.items.keywords'))

To roll up the above three individual eagerload() calls into one, use eagerload_all():

query.options(eagerload_all('orders.items.keywords'))

Querying with Joins

Which brings us to the next big topic. What if we want to create joins that do change the results? For that, another Query tornado is coming....

One way to join two tables together is just to compose a SQL expression. Below we make one up using the id and user_id attributes on our mapped classes:

sql>>> session.query(User).filter(User.id==Address.user_id).\
...         filter(Address.email_address=='jack@google.com').all()
[<User('jack','Jack Bean', 'gjffdd')>]

Or we can make a real JOIN construct; below we use the join() function available on Table to create a Join object, then tell the Query to use it as our FROM clause:

sql>>> session.query(User).select_from(users_table.join(addresses_table)).\
...         filter(Address.email_address=='jack@google.com').all()
[<User('jack','Jack Bean', 'gjffdd')>]

Note that the join() construct has no problem figuring out the correct join condition between users_table and addresses_table; the ForeignKey we constructed says it all.

The easiest way to join is automatically, using the join() method on Query. Just give this method the path from A to B, using the name of a mapped relationship directly:

sql>>> session.query(User).join('addresses').\
...     filter(Address.email_address=='jack@google.com').all()
[<User('jack','Jack Bean', 'gjffdd')>]

By "A to B", we mean a single relation name or a path of relations. In our case we only have User->addresses->Address configured, but if we had a setup like A->bars->B->bats->C->widgets->D, a join along all four entities would look like:

session.query(A).join(['bars', 'bats', 'widgets']).filter(...)

Each time join() is called on Query, the joinpoint of the query is moved to be that of the endpoint of the join. As above, when we joined from users_table to addresses_table, all subsequent criterion used by filter_by() are against the addresses table. When you join() again, the joinpoint starts back from the root. We can also backtrack to the beginning explicitly using reset_joinpoint(). This instruction will place the joinpoint back at the root users table, where subsequent filter_by() criterion are again against users:

sql>>> session.query(User).join('addresses').\
...     filter_by(email_address='jack@google.com').\
...     reset_joinpoint().filter_by(name='jack').all()
[<User('jack','Jack Bean', 'gjffdd')>]

In all cases, we can get the User and the matching Address objects back at the same time, by telling the session we want both. This returns the results as a list of tuples:

sql>>> session.query(User).add_entity(Address).join('addresses').\
...     filter(Address.email_address=='jack@google.com').all()
[(<User('jack','Jack Bean', 'gjffdd')>, <Address('jack@google.com')>)]

Another common scenario is the need to join on the same table more than once. For example, if we want to find a User who has two distinct email addresses, both jack@google.com as well as j25@yahoo.com, we need to join to the addresses table twice. SQLAlchemy does provide Alias objects which can accomplish this, but it's far easier to just tell join() to alias for you:

sql>>> session.query(User).\
...     join('addresses', aliased=True).filter(Address.email_address=='jack@google.com').\
...     join('addresses', aliased=True).filter(Address.email_address=='j25@yahoo.com').all()
[<User('jack','Jack Bean', 'gjffdd')>]

The key thing that occurred above is that our SQL criteria were aliased appropriately, corresponding to the alias generated in the most recent join() call.

The next section describes some "higher level" operators, including any() and has(), which make patterns like joining to multiple aliases unnecessary in most cases.

Relation Operators

A summary of all operators usable on relations:

  • Filter on explicit column criterion, combined with a join. Column criterion can make use of all supported SQL operators and expression constructs:

    sql>>> session.query(User).join('addresses').\
    ...    filter(Address.email_address=='jack@google.com').all()
    [<User('jack','Jack Bean', 'gjffdd')>]
    

    Criterion placed in filter() usually correspond to the last join() call; if the join was specified with aliased=True, class-level criterion against the join's target (or targets) will be appropriately aliased as well.

    sql>>> session.query(User).join('addresses', aliased=True).\
    ...    filter(Address.email_address=='jack@google.com').all()
    [<User('jack','Jack Bean', 'gjffdd')>]
    
  • Filter_by on key=value criterion, combined with a join. Same as filter() on column criterion except keyword arguments are used.

    sql>>> session.query(User).join('addresses').\
    ...    filter_by(email_address='jack@google.com').all()
    [<User('jack','Jack Bean', 'gjffdd')>]
    
  • Filter on explicit column criterion using any() (for collections) or has() (for scalar relations). This is a more succinct method than joining, as an EXISTS subquery is generated automatically. any() means, "find all parent items where any child item of its collection meets this criterion":

    sql>>> session.query(User).\
    ...    filter(User.addresses.any(Address.email_address=='jack@google.com')).all()
    [<User('jack','Jack Bean', 'gjffdd')>]
    

    has() means, "find all parent items where the child item meets this criterion":

    sql>>> session.query(Address).\
    ...    filter(Address.user.has(User.name=='jack')).all()
    [<Address('jack@google.com')>, <Address('j25@yahoo.com')>]
    

    Both has() and any() also accept keyword arguments which are interpreted against the child classes' attributes:

    sql>>> session.query(User).\
    ...    filter(User.addresses.any(email_address='jack@google.com')).all()
    [<User('jack','Jack Bean', 'gjffdd')>]
    
  • Filter_by on instance identity criterion. When comparing to a related instance, filter_by() will in most cases not need to reference the child table, since a child instance already contains enough information with which to generate criterion against the parent table. filter_by() uses an equality comparison for all relationship types. For many-to-one and one-to-one, this represents all objects which reference the given child object:

    # locate a user
    sql>>> user = session.query(User).filter(User.name=='jack').one() 
    
    # use the user in a filter_by() expression
    sql>>> session.query(Address).filter_by(user=user).all()
    [<Address('jack@google.com')>, <Address('j25@yahoo.com')>]
    

    For one-to-many and many-to-many, it represents all objects which contain the given child object in the related collection:

    # locate an address
    sql>>> address = session.query(Address).\
    ...    filter(Address.email_address=='jack@google.com').one() 
    
    # use the address in a filter_by expression
    sql>>> session.query(User).filter_by(addresses=address).all()
    [<User('jack','Jack Bean', 'gjffdd')>]
    
  • Select instances with a particular parent. This is the "reverse" operation of filtering by instance identity criterion; the criterion is against a relation pointing to the desired class, instead of one pointing from it. This will utilize the same "optimized" query criterion, usually not requiring any joins:

    sql>>> session.query(Address).with_parent(user, property='addresses').all()
    [<Address('jack@google.com')>, <Address('j25@yahoo.com')>]
    
  • Filter on a many-to-one/one-to-one instance identity criterion. The class-level == operator will act the same as filter_by() for a scalar relation:

    sql>>> session.query(Address).filter(Address.user==user).all()
    [<Address('jack@google.com')>, <Address('j25@yahoo.com')>]
    

    whereas the != operator will generate a negated EXISTS clause:

    sql>>> session.query(Address).filter(Address.user!=user).all()
    []
    

    a comparison to None also generates an IS NULL clause for a many-to-one relation:

    sql>>> session.query(Address).filter(Address.user==None).all()
    []
    
  • Filter on a one-to-many instance identity criterion. The contains() operator returns all parent objects which contain the given object as one of its collection members:

    sql>>> session.query(User).filter(User.addresses.contains(address)).all()
    [<User('jack','Jack Bean', 'gjffdd')>]
    
  • Filter on a multiple one-to-many instance identity criterion. The == operator can be used with a collection-based attribute against a list of items, which will generate multiple EXISTS clauses:

    sql>>> addresses = session.query(Address).filter(Address.user==user).all()
    
    sql>>> session.query(User).filter(User.addresses == addresses).all()
    [<User('jack','Jack Bean', 'gjffdd')>]
    

Deleting

Let's try to delete jack and see how that goes. We'll mark it as deleted in the session, then we'll issue a count query to see that no rows remain:

>>> session.delete(jack)
sql>>> session.query(User).filter_by(name='jack').count() 
0

So far, so good. How about Jack's Address objects?

sql>>> session.query(Address).filter(
...     Address.email_address.in_(['jack@google.com', 'j25@yahoo.com'])
...  ).count() 
2

Uh oh, they're still there! Analyzing the flush SQL, we can see that the user_id column of each address was set to NULL, but the rows weren't deleted. SQLAlchemy doesn't assume that deletes cascade; you have to tell it so.

So let's roll back our work, and start fresh with new mappers that express the relationship the way we want:

sql>>> session.rollback()  # roll back the transaction
>>> session.clear() # clear the session
>>> clear_mappers() # clear mappers

We need to tell the addresses relation on User that we'd like session.delete() operations to cascade down to the child Address objects. Further, we also want Address objects which get detached from their parent User, whether or not the parent is deleted, to be deleted. For these behaviors we use two cascade options, delete and delete-orphan, via the string-based cascade argument to the relation() function:

>>> mapper(User, users_table, properties={    
...     'addresses':relation(Address, backref='user', cascade="all, delete, delete-orphan")
... })
<sqlalchemy.orm.mapper.Mapper object at 0x...>

>>> mapper(Address, addresses_table) 
<sqlalchemy.orm.mapper.Mapper object at 0x...>

Now when we load Jack, removing an address from his addresses collection will result in that Address being deleted:

# load Jack by primary key
sql>>> jack = session.query(User).get(jack.id)    

# remove one Address (lazy load fires off)
sql>>> del jack.addresses[1]  

# only one address remains
sql>>> session.query(Address).filter(
...     Address.email_address.in_(['jack@google.com', 'j25@yahoo.com'])
... ).count() 
1

Deleting Jack will delete both Jack and his remaining Address:

>>> session.delete(jack)

sql>>> session.commit()

sql>>> session.query(User).filter_by(name='jack').count() 
0

sql>>> session.query(Address).filter(
...    Address.email_address.in_(['jack@google.com', 'j25@yahoo.com'])
... ).count() 
0

Building a Many To Many Relation

We're moving into the bonus round here, but let's show off a many-to-many relationship. We'll sneak in some other features too, just to take a tour. We'll make our application a blog application, where users can write BlogPosts, which have Keywords associated with them.

First some new tables:

>>> from sqlalchemy import Text
>>> post_table = Table('posts', metadata, 
...        Column('id', Integer, primary_key=True),
...        Column('user_id', Integer, ForeignKey('users.id')),
...        Column('headline', String(255), nullable=False),
...        Column('body', Text)
...        )

>>> post_keywords = Table('post_keywords', metadata,
...        Column('post_id', Integer, ForeignKey('posts.id')),
...        Column('keyword_id', Integer, ForeignKey('keywords.id')))

>>> keywords_table = Table('keywords', metadata,
...        Column('id', Integer, primary_key=True),
...        Column('keyword', String(50), nullable=False, unique=True))

sql>>> metadata.create_all(engine) 

Then some classes:

>>> class BlogPost(object):
...     def __init__(self, headline, body, author):
...         self.author = author
...         self.headline = headline
...         self.body = body
...     def __repr__(self):
...         return "BlogPost(%r, %r, %r)" % (self.headline, self.body, self.author)

>>> class Keyword(object):
...     def __init__(self, keyword):
...         self.keyword = keyword

And the mappers. BlogPost will reference User via its author attribute:

>>> from sqlalchemy.orm import backref

>>> mapper(Keyword, keywords_table) 
<sqlalchemy.orm.mapper.Mapper object at 0x...>

>>> mapper(BlogPost, post_table, properties={   
...    'author':relation(User, backref=backref('posts', lazy='dynamic')),
...    'keywords':relation(Keyword, secondary=post_keywords)
... }) 
<sqlalchemy.orm.mapper.Mapper object at 0x...>

There are three new things in the above mapper: the backref() function is used in place of a plain string name, so that keyword arguments can be applied to the reverse relation; the lazy='dynamic' setting on that backref makes a user's posts collection a query-enabled "dynamic" relation; and the secondary argument points relation() at the post_keywords association table, making keywords a many-to-many relation.

Usage is not too different from what we've been doing. Let's give Wendy some blog posts:

sql>>> wendy = session.query(User).filter_by(name='wendy').one()
>>> post = BlogPost("Wendy's Blog Post", "This is a test", wendy)
>>> session.save(post)

We're storing keywords uniquely in the database, but we know that we don't have any yet, so we can just create them:

>>> post.keywords.append(Keyword('wendy'))
>>> post.keywords.append(Keyword('firstpost'))

We can now look up all blog posts with the keyword 'firstpost'. We'll use a special collection operator any to locate "blog posts where any of its keywords has the keyword string 'firstpost'":

sql>>> session.query(BlogPost).filter(BlogPost.keywords.any(keyword='firstpost')).all()
[BlogPost("Wendy's Blog Post", 'This is a test', <User('wendy','Wendy Williams', 'foobar')>)]

If we want to look up just Wendy's posts, we can tell the query to narrow down to her as a parent:

sql>>> session.query(BlogPost).with_parent(wendy).\
... filter(BlogPost.keywords.any(keyword='firstpost')).all()
[BlogPost("Wendy's Blog Post", 'This is a test', <User('wendy','Wendy Williams', 'foobar')>)]

Or we can use Wendy's own posts relation, which is a "dynamic" relation, to query straight from there:

sql>>> wendy.posts.filter(BlogPost.keywords.any(keyword='firstpost')).all()
[BlogPost("Wendy's Blog Post", 'This is a test', <User('wendy','Wendy Williams', 'foobar')>)]

Further Reference

Generated Documentation for Query: class Query(object)

ORM Generated Docs: module sqlalchemy.orm

Further information on mapping setups are in Mapper Configuration.

Further information on working with Sessions: Using the Session.


This tutorial will cover SQLAlchemy SQL Expressions, which are Python constructs that represent SQL statements. The tutorial is in doctest format, meaning each >>> line represents something you can type at a Python command prompt, and the following text represents the expected return value. The tutorial has no prerequisites.

Version Check

A quick check to verify that we are on at least version 0.4 of SQLAlchemy:

>>> import sqlalchemy
>>> sqlalchemy.__version__ 
0.4.0

Connecting

For this tutorial we will use an in-memory-only SQLite database. This is an easy way to test things without needing to have an actual database defined anywhere. To connect we use create_engine():

>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///:memory:', echo=True)

The echo flag is a shortcut to setting up SQLAlchemy logging, which is accomplished via Python's standard logging module. With it enabled, we'll see all the generated SQL produced. If you are working through this tutorial and want less output generated, set it to False. This tutorial will format the SQL behind a popup window so it doesn't get in our way; just click the "SQL" links to see what's being generated.


Define and Create Tables

The SQL Expression Language constructs its expressions in most cases against table columns. In SQLAlchemy, a column is most often represented by an object called Column, and in all cases a Column is associated with a Table. A collection of Table objects and their associated child objects is referred to as database metadata. In this tutorial we will explicitly lay out several Table objects, but note that SA can also "import" whole sets of Table objects automatically from an existing database (this process is called table reflection).
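
As a quick sketch of reflection (this assumes an engine connected to a database that already contains a users table; the filename is only an example):

from sqlalchemy import MetaData, Table, create_engine

engine = create_engine('sqlite:///existing.db')
meta = MetaData(engine)

# column definitions are loaded from the database itself
users_reflected = Table('users', meta, autoload=True)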

We define our tables all within a catalog called MetaData, using the Table construct, which resembles regular SQL CREATE TABLE statements. We'll make two tables, one of which represents "users" in an application, and another which represents zero or more "email addresses" for each row in the "users" table:

>>> from sqlalchemy import Table, Column, Integer, String, MetaData, ForeignKey
>>> metadata = MetaData()
>>> users = Table('users', metadata,
...     Column('id', Integer, primary_key=True),
...     Column('name', String(40)),
...     Column('fullname', String(100)),
... )

>>> addresses = Table('addresses', metadata, 
...   Column('id', Integer, primary_key=True),
...   Column('user_id', None, ForeignKey('users.id')),
...   Column('email_address', String(50), nullable=False)
...  )

Everything about how to define Table objects, as well as how to create them automatically from an existing database, is described in Database Meta Data.

Next, to tell the MetaData we'd actually like to create our selection of tables for real inside the SQLite database, we use create_all(), passing it the engine instance which points to our database. This will check for the presence of each table first before creating, so it's safe to call multiple times:

sql>>> metadata.create_all(engine) 

Insert Expressions

The first SQL expression we'll create is the Insert construct, which represents an INSERT statement. This is typically created relative to its target table:

>>> ins = users.insert()

To see a sample of the SQL this construct produces, use the str() function:

>>> str(ins)
'INSERT INTO users (id, name, fullname) VALUES (:id, :name, :fullname)'

Notice above that the INSERT statement names every column in the users table. This can be limited by using the values keyword, which establishes the VALUES clause of the INSERT explicitly:

>>> ins = users.insert(values={'name':'jack', 'fullname':'Jack Jones'})
>>> str(ins)
'INSERT INTO users (name, fullname) VALUES (:name, :fullname)'

Above, while the values keyword limited the VALUES clause to just two columns, the actual data we placed in values didn't get rendered into the string; instead we got named bind parameters. As it turns out, our data is stored within our Insert construct, but it typically only comes out when the statement is actually executed; since the data consists of literal values, SQLAlchemy automatically generates bind parameters for them. We can peek at this data for now by looking at the compiled form of the statement:

>>> ins.compile().params 
{'fullname': 'Jack Jones', 'name': 'jack'}

Executing

The interesting part of an Insert is executing it. In this tutorial, we will generally focus on the most explicit method of executing a SQL construct, and later touch upon some "shortcut" ways to do it. The engine object we created is a repository for database connections capable of issuing SQL to the database. To acquire a connection, we use the connect() method:

>>> conn = engine.connect()
>>> conn 
<sqlalchemy.engine.base.Connection object at 0x...>

The Connection object represents an actively checked out DBAPI connection resource. Let's feed it our Insert object and see what happens:

>>> result = conn.execute(ins)
INSERT INTO users (name, fullname) VALUES (?, ?)
['jack', 'Jack Jones']
COMMIT

So the INSERT statement was now issued to the database, although we got positional "qmark" bind parameters instead of "named" bind parameters in the output. How come? Because when executed, the Connection used the SQLite dialect to help generate the statement; when we use the str() function, the statement isn't aware of this dialect and falls back onto a default which uses named parameters. We can view this manually as follows:

>>> ins.bind = engine
>>> str(ins)
'INSERT INTO users (name, fullname) VALUES (?, ?)'

What about the result variable we got when we called execute()? As the SQLAlchemy Connection object references a DBAPI connection, the result, known as a ResultProxy object, is analogous to the DBAPI cursor object. In the case of an INSERT, we can get important information from it, such as the primary key values which were generated from our statement:

>>> result.last_inserted_ids()
[1]

The value of 1 was automatically generated by SQLite, but only because we did not specify the id column in our Insert statement; otherwise, our explicit value would have been used. In either case, SQLAlchemy always knows how to get at a newly generated primary key value, even though the method of generating them is different across different databases; each database's Dialect knows the specific steps needed to determine the correct value (or values; note that last_inserted_ids() returns a list so that it supports composite primary keys).
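
For instance, with a composite primary key the returned list would contain one entry per primary key column. A hedged sketch (this table is not part of the tutorial's schema):

from sqlalchemy import MetaData, Table, Column, Integer

meta = MetaData()
order_items = Table('order_items', meta,
    Column('order_id', Integer, primary_key=True),
    Column('item_id', Integer, primary_key=True),
    Column('quantity', Integer))

# after executing an insert on this table, result.last_inserted_ids()
# would return a two-element list: [order_id_value, item_id_value]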


Executing Multiple Statements

Our insert example above was intentionally a little drawn out to show some various behaviors of expression language constructs. In the usual case, an Insert statement is compiled against the parameters sent to the execute() method on Connection, so there's no need to use the values keyword with Insert. Let's create a generic Insert statement again and use it in the "normal" way:

>>> ins = users.insert()
>>> conn.execute(ins, id=2, name='wendy', fullname='Wendy Williams') 
INSERT INTO users (id, name, fullname) VALUES (?, ?, ?)
[2, 'wendy', 'Wendy Williams']
COMMIT

<sqlalchemy.engine.base.ResultProxy object at 0x...>

Above, because we specified all three columns in the execute() method, the compiled Insert included all three columns. The Insert statement is compiled at execution time based on the parameters we specified; if we specified fewer parameters, the Insert would have fewer entries in its VALUES clause.

To issue many inserts using DBAPI's executemany() method, we can send in a list of dictionaries each containing a distinct set of parameters to be inserted, as we do here to add some email addresses:

>>> conn.execute(addresses.insert(), [ 
...    {'user_id': 1, 'email_address' : 'jack@yahoo.com'},
...    {'user_id': 1, 'email_address' : 'jack@msn.com'},
...    {'user_id': 2, 'email_address' : 'www@www.org'},
...    {'user_id': 2, 'email_address' : 'wendy@aol.com'},
... ])
INSERT INTO addresses (user_id, email_address) VALUES (?, ?)
[[1, 'jack@yahoo.com'], [1, 'jack@msn.com'], [2, 'www@www.org'], [2, 'wendy@aol.com']]
COMMIT

<sqlalchemy.engine.base.ResultProxy object at 0x...>

Above, we again relied upon SQLite's automatic generation of primary key identifiers for each addresses row.

When executing multiple sets of parameters, each dictionary must have the same set of keys; i.e. you can't have fewer keys in some dictionaries than others. This is because the Insert statement is compiled against the first dictionary in the list, and it's assumed that all subsequent argument dictionaries are compatible with that statement.


Connectionless / Implicit Execution

We're executing our Insert using a Connection. There are two options that allow you to avoid dealing with the connection part. You can execute in the connectionless style, using the engine, which opens and closes a connection for you:

sql>>> result = engine.execute(users.insert(), name='fred', fullname="Fred Flintstone")

and you can save even more steps than that, if you connect the Engine to the MetaData object we created earlier. When this is done, all SQL expressions which involve tables within the MetaData object will be automatically bound to the Engine. In this case, we call it implicit execution:

>>> metadata.bind = engine
sql>>> result = users.insert().execute(name="mary", fullname="Mary Contrary")

When the MetaData is bound, statements will also compile against the engine's dialect. Since a lot of the examples here assume the default dialect, we'll detach the engine from the metadata which we just attached:

>>> metadata.bind = None

Detailed examples of connectionless and implicit execution are available in the "Engines" chapter: Connectionless Execution, Implicit Execution.


Selecting

We began with inserts just so that our test database had some data in it. The more interesting part of the data is selecting it! We'll cover UPDATE and DELETE statements later. The primary construct used to generate SELECT statements is the select() function:

>>> from sqlalchemy.sql import select
>>> s = select([users])
>>> result = conn.execute(s)
SELECT users.id, users.name, users.fullname
FROM users
[]

Above, we issued a basic select() call, placing the users table within the COLUMNS clause of the select, and then executing. SQLAlchemy expanded the users table into the set of each of its columns, and also generated a FROM clause for us. The result returned is again a ResultProxy object, which acts much like a DBAPI cursor, including methods such as fetchone() and fetchall(). The easiest way to get rows from it is to just iterate:

>>> for row in result:
...     print row
(1, u'jack', u'Jack Jones')
(2, u'wendy', u'Wendy Williams')
(3, u'fred', u'Fred Flintstone')
(4, u'mary', u'Mary Contrary')

Above, we see that printing each row produces a simple tuple-like result. We have several options for accessing the data in each row. One very common way is through dictionary access, using the string names of columns:

sql>>> result = conn.execute(s)
>>> row = result.fetchone()
>>> print "name:", row['name'], "; fullname:", row['fullname']
name: jack ; fullname: Jack Jones

Integer indexes work as well:

>>> row = result.fetchone()
>>> print "name:", row[1], "; fullname:", row[2]
name: wendy ; fullname: Wendy Williams

But another way, whose usefulness will become apparent later on, is to use the Column objects directly as keys:

sql>>> for row in conn.execute(s):
...     print "name:", row[users.c.name], "; fullname:", row[users.c.fullname]
name: jack ; fullname: Jack Jones
name: wendy ; fullname: Wendy Williams
name: fred ; fullname: Fred Flintstone
name: mary ; fullname: Mary Contrary

Result sets which have pending rows remaining should be explicitly closed before discarding. While the resources referenced by the ResultProxy will be closed when the object is garbage collected, it's better to make it explicit as some database APIs are very picky about such things:

>>> result.close()

If we'd like to more carefully control the columns which are placed in the COLUMNS clause of the select, we reference individual Column objects from our Table. These are available as named attributes off the c attribute of the Table object:

>>> s = select([users.c.name, users.c.fullname])
sql>>> result = conn.execute(s)
>>> for row in result:  
...     print row
(u'jack', u'Jack Jones')
(u'wendy', u'Wendy Williams')
(u'fred', u'Fred Flintstone')
(u'mary', u'Mary Contrary')

Let's observe something interesting about the FROM clause. Whereas the generated statement contains two distinct sections, a "SELECT columns" part and a "FROM table" part, our select() construct only has a list containing columns. How does this work? Let's try putting two tables into our select() statement:

sql>>> for row in conn.execute(select([users, addresses])):
...     print row
(1, u'jack', u'Jack Jones', 1, 1, u'jack@yahoo.com')
(1, u'jack', u'Jack Jones', 2, 1, u'jack@msn.com')
(1, u'jack', u'Jack Jones', 3, 2, u'www@www.org')
(1, u'jack', u'Jack Jones', 4, 2, u'wendy@aol.com')
(2, u'wendy', u'Wendy Williams', 1, 1, u'jack@yahoo.com')
(2, u'wendy', u'Wendy Williams', 2, 1, u'jack@msn.com')
(2, u'wendy', u'Wendy Williams', 3, 2, u'www@www.org')
(2, u'wendy', u'Wendy Williams', 4, 2, u'wendy@aol.com')
(3, u'fred', u'Fred Flintstone', 1, 1, u'jack@yahoo.com')
(3, u'fred', u'Fred Flintstone', 2, 1, u'jack@msn.com')
(3, u'fred', u'Fred Flintstone', 3, 2, u'www@www.org')
(3, u'fred', u'Fred Flintstone', 4, 2, u'wendy@aol.com')
(4, u'mary', u'Mary Contrary', 1, 1, u'jack@yahoo.com')
(4, u'mary', u'Mary Contrary', 2, 1, u'jack@msn.com')
(4, u'mary', u'Mary Contrary', 3, 2, u'www@www.org')
(4, u'mary', u'Mary Contrary', 4, 2, u'wendy@aol.com')

It placed both tables into the FROM clause. But also, it made a real mess. Those who are familiar with SQL joins know that this is a Cartesian product; each row from the users table is produced against each row from the addresses table. So to put some sanity into this statement, we need a WHERE clause. Which brings us to the second argument of select():

>>> s = select([users, addresses], users.c.id==addresses.c.user_id)
sql>>> for row in conn.execute(s):
...     print row
(1, u'jack', u'Jack Jones', 1, 1, u'jack@yahoo.com')
(1, u'jack', u'Jack Jones', 2, 1, u'jack@msn.com')
(2, u'wendy', u'Wendy Williams', 3, 2, u'www@www.org')
(2, u'wendy', u'Wendy Williams', 4, 2, u'wendy@aol.com')

That looks a lot better: we added an expression to our select() which had the effect of adding WHERE users.id = addresses.user_id to our statement, and our results were narrowed down so that the join of users and addresses rows made sense. But let's take a closer look at that expression. It uses just a Python equality operator between two different Column objects. It should be clear that something is up. Saying 1==1 produces True, and 1==2 produces False, not a WHERE clause. So let's see exactly what that expression is doing:

>>> users.c.id==addresses.c.user_id 
<sqlalchemy.sql.expression._BinaryExpression object at 0x...>

Wow, surprise ! This is neither a True nor a False. Well what is it ?

>>> str(users.c.id==addresses.c.user_id)
'users.id = addresses.user_id'

As you can see, the == operator is producing an object that is very much like the Insert and select() objects we've made so far, thanks to Python's __eq__() operator overloading hook; you call str() on it and it produces SQL. By now, you can see that everything we are working with is ultimately the same type of object. SQLAlchemy refers to the base class of all of these expressions as sqlalchemy.sql.ClauseElement.
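
As a quick check (a small sketch; here we import ClauseElement directly from the sqlalchemy.sql.expression module where it is defined), both our binary expression and the select() construct we built earlier share this base class:

>>> from sqlalchemy.sql.expression import ClauseElement
>>> isinstance(users.c.id==addresses.c.user_id, ClauseElement)
True
>>> isinstance(s, ClauseElement)
True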

back to section top

Operators

Since we've stumbled upon SQLAlchemy's operator paradigm, let's go through some of its capabilities. We've seen how to equate two columns to each other:

>>> print users.c.id==addresses.c.user_id
users.id = addresses.user_id

If we use a literal value (a literal, meaning not a SQLAlchemy clause object), we get a bind parameter:

>>> print users.c.id==7
users.id = :users_id_1

The 7 literal is embedded in the resulting ClauseElement; we can use the same trick we did with the Insert object to see it:

>>> (users.c.id==7).compile().params
{'users_id_1': 7}

Most Python operators, as it turns out, produce a SQL expression here, like equals, not equals, etc.:

>>> print users.c.id != 7
users.id != :users_id_1

>>> # None converts to IS NULL
>>> print users.c.name == None
users.name IS NULL

>>> # reverse works too 
>>> print 'fred' > users.c.name
users.name < :users_name_1

If we add two integer columns together, we get an addition expression:

>>> print users.c.id + addresses.c.id
users.id + addresses.id

Interestingly, the type of the Column is important ! If we use + with two string based columns (recall we put types like Integer and String on our Column objects at the beginning), we get something different:

>>> print users.c.name + users.c.fullname
users.name || users.fullname

Where || is the string concatenation operator used on most databases. But not all of them. MySQL users, fear not:

>>> print (users.c.name + users.c.fullname).compile(bind=create_engine('mysql://'))
concat(users.name, users.fullname)

The above illustrates the SQL that's generated for an Engine that's connected to a MySQL database; the || operator now compiles as MySQL's concat() function.

If you have come across an operator which really isn't available, you can always use the op() method; this generates whatever operator you need:

>>> print users.c.name.op('tiddlywinks')('foo')
users.name tiddlywinks :users_name_1
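
For a more realistic case, suppose we wanted something like PostgreSQL's regular expression match operator ~ (a hypothetical sketch; the operator string is passed through untouched, so it's up to the target database to understand it):

>>> print users.c.name.op('~')('^j')
users.name ~ :users_name_1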
back to section top

Conjunctions

We'd like to show off some of our operators inside of select() constructs. But we need to lump them together a little more, so let's first introduce some conjunctions. Conjunctions are those little words like AND and OR that put things together. We'll also hit upon NOT. AND, OR and NOT can work from the corresponding functions SQLAlchemy provides (notice we also throw in a LIKE):

>>> from sqlalchemy.sql import and_, or_, not_
>>> print and_(users.c.name.like('j%'), users.c.id==addresses.c.user_id, 
...     or_(addresses.c.email_address=='wendy@aol.com', addresses.c.email_address=='jack@yahoo.com'),
...     not_(users.c.id>5))
users.name LIKE :users_name_1 AND users.id = addresses.user_id AND 
(addresses.email_address = :addresses_email_address_1 OR addresses.email_address = :addresses_email_address_2) 
AND users.id <= :users_id_1

And you can also use the re-jiggered bitwise AND, OR and NOT operators, although because of Python operator precedence you have to watch your parentheses:

>>> print users.c.name.like('j%') & (users.c.id==addresses.c.user_id) &  \
...     ((addresses.c.email_address=='wendy@aol.com') | (addresses.c.email_address=='jack@yahoo.com')) \
...     & ~(users.c.id>5) 
users.name LIKE :users_name_1 AND users.id = addresses.user_id AND 
(addresses.email_address = :addresses_email_address_1 OR addresses.email_address = :addresses_email_address_2) 
AND users.id <= :users_id_1

So with all of this vocabulary, let's select all users who have an email address at AOL or MSN, whose name starts with a letter between "m" and "z", and we'll also generate a column containing their full name combined with their email address. We will add two new constructs to this statement, between() and label(). between() produces a BETWEEN clause, and label() is used in a column expression to produce labels using the AS keyword; it's recommended when selecting from expressions that otherwise would not have a name:

>>> s = select([(users.c.fullname + ", " + addresses.c.email_address).label('title')], 
...        and_( 
...            users.c.id==addresses.c.user_id, 
...            users.c.name.between('m', 'z'), 
...           or_(
...              addresses.c.email_address.like('%@aol.com'), 
...              addresses.c.email_address.like('%@msn.com')
...           )
...        )
...    ) 
>>> print conn.execute(s).fetchall() 
SELECT users.fullname || ? || addresses.email_address AS title 
FROM users, addresses 
WHERE users.id = addresses.user_id AND users.name BETWEEN ? AND ? AND 
(addresses.email_address LIKE ? OR addresses.email_address LIKE ?)
[', ', 'm', 'z', '%@aol.com', '%@msn.com']
[(u'Wendy Williams, wendy@aol.com',)]

Once again, SQLAlchemy figured out the FROM clause for our statement. In fact it will determine the FROM clause based on all of its other bits: the columns clause, the WHERE clause, and also some other elements which we haven't covered yet, including ORDER BY, GROUP BY, and HAVING.

back to section top

Using Text

Our last example really became a handful to type. Going from what one understands to be a textual SQL expression into a Python construct which groups components together in a programmatic style can be hard. That's why SQLAlchemy lets you just use strings too. The text() construct represents any textual statement. To use bind parameters with text(), always use the named colon format. Below, for example, we create a text() and execute it, feeding in the bind parameters to the execute() method:

>>> from sqlalchemy.sql import text
>>> s = text("""SELECT users.fullname || ', ' || addresses.email_address AS title 
...            FROM users, addresses 
...            WHERE users.id = addresses.user_id AND users.name BETWEEN :x AND :y AND 
...            (addresses.email_address LIKE :e1 OR addresses.email_address LIKE :e2)
...        """)
sql>>> print conn.execute(s, x='m', y='z', e1='%@aol.com', e2='%@msn.com').fetchall() 
[(u'Wendy Williams, wendy@aol.com',)]

To gain a "hybrid" approach, any of SA's SQL constructs can have text freely intermingled wherever you like - the text() construct can be placed within any other ClauseElement construct, and when used in a non-operator context, a direct string may be placed which converts to text() automatically. Below we combine the usage of text() and strings with our constructed select() object, by using the select() object to structure the statement, and the text()/strings to provide all the content within the structure. For this example, SQLAlchemy is not given any Column or Table objects in any of its expressions, so it cannot generate a FROM clause. So we also give it the from_obj keyword argument, which is a list of ClauseElements (or strings) to be placed within the FROM clause:

>>> s = select([text("users.fullname || ', ' || addresses.email_address AS title")], 
...        and_( 
...            "users.id = addresses.user_id", 
...             "users.name BETWEEN 'm' AND 'z'",
...             "(addresses.email_address LIKE :x OR addresses.email_address LIKE :y)"
...        ),
...         from_obj=['users', 'addresses']
...    )
sql>>> print conn.execute(s, x='%@aol.com', y='%@msn.com').fetchall() 
[(u'Wendy Williams, wendy@aol.com',)]

Going from constructed SQL to text, we lose some capabilities. We lose the capability for SQLAlchemy to compile our expression to a specific target database; above, our expression won't work with MySQL since it has no || construct. It also becomes more tedious to make SQLAlchemy aware of the datatypes in use; for example, if our bind parameters required UTF-8 encoding before going in, or conversion from a Python datetime into a string (as is required with SQLite), we would have to add extra information to our text() construct. Similar issues arise on the result set side, where SQLAlchemy also performs type-specific data conversion in some cases; still more information can be added to text() to work around this. But what we really lose from our statement is the ability to manipulate it, transform it, and analyze it. These features are critical when using the ORM, which makes heavy use of relational transformations. To show off what we mean, we'll first introduce the ALIAS construct and the JOIN construct, just so we have some juicier bits to play with.
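
As a rough sketch of what that extra information can look like (assuming the bindparams and typemap arguments to text(), which associate types with bind parameters and with result columns respectively), we might write:

>>> from sqlalchemy import String
>>> from sqlalchemy.sql import bindparam, text
>>> t = text("SELECT users.fullname FROM users WHERE users.name LIKE :name",
...     bindparams=[bindparam('name', type_=String)],
...     typemap={'fullname': String})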

back to section top

Using Aliases

The alias corresponds to a "renamed" version of a table or arbitrary relation, which occurs anytime you say "SELECT .. FROM sometable AS someothername". The AS creates a new name for the table. Aliases are super important in SQL as they allow you to reference the same table more than once. Scenarios where you need to do this include when you self-join a table to itself, or more commonly when you need to join from a parent table to a child table multiple times. For example, we know that our user jack has two email addresses. How can we locate jack based on the combination of those two addresses? We need to join twice to it. Let's construct two distinct aliases for the addresses table and join:

>>> a1 = addresses.alias('a1')
>>> a2 = addresses.alias('a2')
>>> s = select([users], and_(
...        users.c.id==a1.c.user_id, 
...        users.c.id==a2.c.user_id, 
...        a1.c.email_address=='jack@msn.com', 
...        a2.c.email_address=='jack@yahoo.com'
...   ))
sql>>> print conn.execute(s).fetchall()
[(1, u'jack', u'Jack Jones')]

Easy enough. One thing that we're going for with the SQL Expression Language is the melding of programmatic behavior with SQL generation. Coming up with names like a1 and a2 is messy; we really didn't need to use those names anywhere, it's just the database that needed them. Plus, we might write some code that uses alias objects that came from several different places, and it's difficult to ensure that they all have unique names. So instead, we just let SQLAlchemy make the names for us, using "anonymous" aliases:

>>> a1 = addresses.alias()
>>> a2 = addresses.alias()
>>> s = select([users], and_(
...        users.c.id==a1.c.user_id, 
...        users.c.id==a2.c.user_id, 
...        a1.c.email_address=='jack@msn.com', 
...        a2.c.email_address=='jack@yahoo.com'
...   ))
sql>>> print conn.execute(s).fetchall()
[(1, u'jack', u'Jack Jones')]

One super-huge advantage of anonymous aliases is that not only did we not have to come up with a random name, but we can also be guaranteed that the above SQL string is deterministically generated to be the same every time. This is important for databases such as Oracle which cache compiled "query plans" for their statements, and need to see the same SQL string in order to make use of it.

Aliases can of course be used for anything which you can SELECT from, including SELECT statements themselves. We can self-join the users table back to the select() we've created by making an alias of the entire statement. The correlate(None) directive is to avoid SQLAlchemy's attempt to "correlate" the inner users table with the outer one:

>>> a1 = s.correlate(None).alias()
>>> s = select([users.c.name], users.c.id==a1.c.id)
sql>>> print conn.execute(s).fetchall()
[(u'jack',)]
back to section top

Using Joins

We're halfway along to being able to construct any SELECT expression. The next cornerstone of the SELECT is the JOIN expression. We've already been doing joins in our examples, by just placing two tables in either the columns clause or the where clause of the select() construct. But if we want to make a real "JOIN" or "OUTERJOIN" construct, we use the join() and outerjoin() methods, most commonly accessed from the left table in the join:

>>> print users.join(addresses)
users JOIN addresses ON users.id = addresses.user_id

The alert reader will see more surprises; SQLAlchemy figured out how to JOIN the two tables ! The ON condition of the join, as it's called, was automatically generated based on the ForeignKey object which we placed on the addresses table way at the beginning of this tutorial. Already the join() construct is looking like a much better way to join tables.

Of course you can join on whatever expression you want, such as if we want to join on all users who use the same name in their email address as their username:

>>> print users.join(addresses, addresses.c.email_address.like(users.c.name + '%'))
users JOIN addresses ON addresses.email_address LIKE users.name || :users_name_1

When we create a select() construct, SQLAlchemy looks around at the tables we've mentioned and then places them in the FROM clause of the statement. When we use JOINs, however, we know what FROM clause we want, so here we make use of the from_obj keyword argument:

>>> s = select([users.c.fullname], from_obj=[
...    users.join(addresses, addresses.c.email_address.like(users.c.name + '%'))
...    ])
sql>>> print conn.execute(s).fetchall()
[(u'Jack Jones',), (u'Jack Jones',), (u'Wendy Williams',)]

The outerjoin() function just creates LEFT OUTER JOIN constructs. It's used just like join():

>>> s = select([users.c.fullname], from_obj=[users.outerjoin(addresses)])
>>> print s
SELECT users.fullname 
FROM users LEFT OUTER JOIN addresses ON users.id = addresses.user_id

That's the output outerjoin() produces, unless, of course, you're stuck in a gig using Oracle prior to version 9, and you've set up your engine (which would be using OracleDialect) to use Oracle-specific SQL:

>>> from sqlalchemy.databases.oracle import OracleDialect
>>> print s.compile(dialect=OracleDialect(use_ansi=False))
SELECT users.fullname 
FROM users, addresses 
WHERE users.id = addresses.user_id(+)

If you don't know what that SQL means, don't worry ! The secret tribe of Oracle DBAs don't want their black magic being found out ;).

back to section top

Intro to Generative Selects and Transformations

We've now gained the ability to construct very sophisticated statements. We can use all kinds of operators, table constructs, text, joins, and aliases. The point of all of this, as mentioned earlier, is not that it's an "easier" or "better" way to write SQL than just writing a SQL statement yourself; the point is that it's better for writing programmatically generated SQL which can be morphed and adapted as needed in automated scenarios.

To support this, the select() construct we've been working with supports piecemeal construction, in addition to the "all at once" method we've been doing. Suppose you're writing a search function, which receives criteria and then must construct a select from them. To accomplish this, for each criterion encountered you "generatively" apply new elements to an existing select() construct, one at a time. We start with a basic select() constructed with the shortcut method available on the users table:

>>> query = users.select()
>>> print query
SELECT users.id, users.name, users.fullname 
FROM users

We encounter a search criterion of "name='jack'", so we apply a WHERE criterion stating as much:

>>> query = query.where(users.c.name=='jack')

Next, we learn that they'd like the results in descending order by full name. We apply ORDER BY, using the desc() modifier:

>>> query = query.order_by(users.c.fullname.desc())

We also learn that they'd like only users who have an address at MSN. A quick way to tack this on is by using an EXISTS clause, which we correlate to the users table in the enclosing SELECT:

>>> from sqlalchemy.sql import exists
>>> query = query.where(
...    exists([addresses.c.id], 
...        and_(addresses.c.user_id==users.c.id, addresses.c.email_address.like('%@msn.com'))
...    ).correlate(users))

And finally, the application also wants to see the listing of email addresses all at once; so to save queries, we outerjoin the addresses table (using an outer join so that users with no addresses come back as well; since we're programmatic, we might not have kept track that we used an EXISTS clause against the addresses table too...). Additionally, since the users and addresses tables both have a column named id, let's isolate their names from each other in the COLUMNS clause by using labels:

>>> query = query.column(addresses).select_from(users.outerjoin(addresses)).apply_labels()

Let's bake for .0001 seconds and see what rises:

>>> conn.execute(query).fetchall()
SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, addresses.id AS addresses_id, addresses.user_id AS addresses_user_id, addresses.email_address AS addresses_email_address
FROM users LEFT OUTER JOIN addresses ON users.id = addresses.user_id
WHERE users.name = ? AND (EXISTS (SELECT addresses.id
FROM addresses
WHERE addresses.user_id = users.id AND addresses.email_address LIKE ?)) ORDER BY users.fullname DESC
['jack', '%@msn.com']

[(1, u'jack', u'Jack Jones', 1, 1, u'jack@yahoo.com'), (1, u'jack', u'Jack Jones', 2, 1, u'jack@msn.com')]

So we started small, added one little thing at a time, and at the end we have a huge statement...which actually works. Now let's do one more thing: the searching function wants to add another email_address criterion, but it doesn't want to construct an alias of the addresses table; suppose many parts of the application are written to deal specifically with the addresses table, and to change all those functions to support receiving an arbitrary alias of the addresses table would be cumbersome. We can actually convert the addresses table within the existing statement to be an alias of itself, using replace_selectable():

>>> a1 = addresses.alias()
>>> query = query.replace_selectable(addresses, a1)
>>> print query
SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, addresses_1.id AS addresses_1_id, addresses_1.user_id AS addresses_1_user_id, addresses_1.email_address AS addresses_1_email_address
FROM users LEFT OUTER JOIN addresses AS addresses_1 ON users.id = addresses_1.user_id
WHERE users.name = :users_name_1 AND (EXISTS (SELECT addresses_1.id
FROM addresses AS addresses_1
WHERE addresses_1.user_id = users.id AND addresses_1.email_address LIKE :addresses_email_address_1)) ORDER BY users.fullname DESC

One more thing though: with automatic labeling applied as well as anonymous aliasing, how do we retrieve the columns from the rows for this thing ? The label for the email_address column is now the generated name addresses_1_email_address; and in another statement it might be something different ! This is where accessing result columns by Column object becomes very useful:

 
sql>>> for row in conn.execute(query):
...     print "Name:", row[users.c.name], "; Email Address", row[a1.c.email_address]
Name: jack ; Email Address jack@yahoo.com
Name: jack ; Email Address jack@msn.com

The above example, by its end, got significantly more intense than typical end-user constructed SQL. However, when writing higher-level tools such as ORMs, these techniques become much more significant. SQLAlchemy's ORM relies very heavily on techniques like this.

back to section top

Everything Else

The concepts of creating SQL expressions have been introduced. What's left are more variants of the same themes. So now we'll catalog the rest of the important things we'll need to know.

Bind Parameter Objects

Throughout all these examples, SQLAlchemy is busy creating bind parameters wherever literal expressions occur. You can also specify your own bind parameters with your own names, and use the same statement repeatedly. The database dialect converts to the appropriate named or positional style, as here where it converts to positional for SQLite:

>>> from sqlalchemy.sql import bindparam
>>> s = users.select(users.c.name==bindparam('username'))
sql>>> conn.execute(s, username='wendy').fetchall()
[(2, u'wendy', u'Wendy Williams')]

Another important aspect of bind parameters is that they may be assigned a type. The type of the bind parameter will determine its behavior within expressions and also how the data bound to it is processed before being sent off to the database:

>>> s = users.select(users.c.name.like(bindparam('username', type_=String) + text("'%'")))
sql>>> conn.execute(s, username='wendy').fetchall()
[(2, u'wendy', u'Wendy Williams')]

Bind parameters of the same name can also be used multiple times, where only a single named value is needed in the execute parameters:

>>> s = select([users, addresses], 
...    users.c.name.like(bindparam('name', type_=String) + text("'%'")) | 
...    addresses.c.email_address.like(bindparam('name', type_=String) + text("'@%'")), 
...    from_obj=[users.outerjoin(addresses)])
sql>>> conn.execute(s, name='jack').fetchall()
[(1, u'jack', u'Jack Jones', 1, 1, u'jack@yahoo.com'), (1, u'jack', u'Jack Jones', 2, 1, u'jack@msn.com')]
back to section top

Functions

SQL functions are created using the func keyword, which generates functions using attribute access:

>>> from sqlalchemy.sql import func
>>> print func.now()
now()

>>> print func.concat('x', 'y')
concat(:param_1, :param_2)

Certain functions are marked as "ANSI" functions, which means they don't get parentheses added after them, such as CURRENT_TIMESTAMP:

>>> print func.current_timestamp()
CURRENT_TIMESTAMP

Functions are most typically used in the columns clause of a select statement, and can also be labeled as well as given a type. Labeling a function is recommended so that the result can be targeted in a result row based on a string name, and assigning it a type is required when you need result-set processing to occur, such as for Unicode conversion and date conversions. Below, we use the result function scalar() to just read the first column of the first row and then close the result; the label, even though present, is not important in this case:

>>> print conn.execute(
...     select([func.max(addresses.c.email_address, type_=String).label('maxemail')])
... ).scalar()
SELECT max(addresses.email_address) AS maxemail
FROM addresses
[]

www@www.org

Databases such as Postgres and Oracle support functions that return whole result sets; these can be assembled into selectable units and used in statements. For example, given a database function calculate() which takes the parameters x and y and returns three columns which we'd like to name q, z and r, we can construct it using "lexical" column objects as well as bind parameters:

>>> from sqlalchemy.sql import column
>>> calculate = select([column('q'), column('z'), column('r')], 
...     from_obj=[func.calculate(bindparam('x'), bindparam('y'))])

>>> print select([users], users.c.id > calculate.c.z)
SELECT users.id, users.name, users.fullname 
FROM users, (SELECT q, z, r 
FROM calculate(:x, :y)) 
WHERE users.id > z

If we want to use our calculate statement twice with different bind parameters, the unique_params() function will create copies for us and mark the bind parameters as "unique" so that conflicting names are isolated. Note that we also make two separate aliases of our selectable:

>>> s = select([users], users.c.id.between(
...    calculate.alias('c1').unique_params(x=17, y=45).c.z, 
...    calculate.alias('c2').unique_params(x=5, y=12).c.z))

>>> print s
SELECT users.id, users.name, users.fullname 
FROM users, (SELECT q, z, r 
FROM calculate(:x_1, :y_1)) AS c1, (SELECT q, z, r 
FROM calculate(:x_2, :y_2)) AS c2 
WHERE users.id BETWEEN c1.z AND c2.z

>>> s.compile().params
{'x_2': 5, 'y_2': 12, 'y_1': 45, 'x_1': 17}
back to section top

Unions and Other Set Operations

Unions come in two flavors, UNION and UNION ALL, which are available via module level functions:

>>> from sqlalchemy.sql import union
>>> u = union(
...     addresses.select(addresses.c.email_address=='foo@bar.com'),
...    addresses.select(addresses.c.email_address.like('%@yahoo.com')),
... ).order_by(addresses.c.email_address)

sql>>> print conn.execute(u).fetchall()
[(1, 1, u'jack@yahoo.com')]

Also available, though not supported on all databases, are intersect(), intersect_all(), except_(), and except_all():

>>> from sqlalchemy.sql import except_
>>> u = except_(
...    addresses.select(addresses.c.email_address.like('%@%.com')),
...    addresses.select(addresses.c.email_address.like('%@msn.com'))
... )

sql>>> print conn.execute(u).fetchall()
[(1, 1, u'jack@yahoo.com'), (4, 2, u'wendy@aol.com')]
back to section top

Scalar Selects

To embed a SELECT in a column expression, use as_scalar():

sql>>> print conn.execute(select([
...       users.c.name, 
...       select([func.count(addresses.c.id)], users.c.id==addresses.c.user_id).as_scalar()
...    ])).fetchall()
[(u'jack', 2), (u'wendy', 2), (u'fred', 0), (u'mary', 0)]

Alternatively, applying a label() to a select evaluates it as a scalar as well:

sql>>> print conn.execute(select([
...       users.c.name, 
...       select([func.count(addresses.c.id)], users.c.id==addresses.c.user_id).label('address_count')
...    ])).fetchall()
[(u'jack', 2), (u'wendy', 2), (u'fred', 0), (u'mary', 0)]
back to section top

Correlated Subqueries

Notice that in the "scalar selects" examples, the FROM clause of each embedded select did not contain the users table. This is because SQLAlchemy automatically attempts to correlate embedded FROM objects to those of an enclosing query. To disable this, or to specify explicit FROM clauses to be correlated, use correlate():

>>> s = select([users.c.name], users.c.id==select([users.c.id]).correlate(None))
>>> print s
SELECT users.name 
FROM users 
WHERE users.id = (SELECT users.id 
FROM users)

>>> s = select([users.c.name, addresses.c.email_address], users.c.id==
...        select([users.c.id], users.c.id==addresses.c.user_id).correlate(addresses)
...    )
>>> print s
SELECT users.name, addresses.email_address 
FROM users, addresses 
WHERE users.id = (SELECT users.id 
FROM users 
WHERE users.id = addresses.user_id)
back to section top

Ordering, Grouping, Limiting, Offset...ing...

The select() function can take keyword arguments order_by, group_by (as well as having), limit, and offset. There's also distinct=True. These are all also available as generative functions. order_by() expressions can use the modifiers asc() or desc() to indicate ascending or descending.
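
As a brief sketch of the keyword-argument form (we just construct the statement here without executing it):

>>> s = select([users], distinct=True, order_by=[users.c.name.desc()], limit=2, offset=1)

This is equivalent to the generative form select([users]).distinct().order_by(users.c.name.desc()).limit(2).offset(1).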

>>> s = select([addresses.c.user_id, func.count(addresses.c.id)]).\
...     group_by(addresses.c.user_id).having(func.count(addresses.c.id)>1)
>>> print conn.execute(s).fetchall()
SELECT addresses.user_id, count(addresses.id)
FROM addresses GROUP BY addresses.user_id
HAVING count(addresses.id) > ?
[1]

[(1, 2), (2, 2)]

>>> s = select([addresses.c.email_address, addresses.c.id]).distinct().\
...     order_by(addresses.c.email_address.desc(), addresses.c.id)
>>> conn.execute(s).fetchall()
SELECT DISTINCT addresses.email_address, addresses.id
FROM addresses ORDER BY addresses.email_address DESC, addresses.id
[]

[(u'www@www.org', 3), (u'wendy@aol.com', 4), (u'jack@yahoo.com', 1), (u'jack@msn.com', 2)]

>>> s = select([addresses]).offset(1).limit(1)
>>> print conn.execute(s).fetchall() 
SELECT addresses.id, addresses.user_id, addresses.email_address
FROM addresses
LIMIT 1 OFFSET 1
[]

[(2, 1, u'jack@msn.com')]
back to section top

Updates

Finally, we're back to UPDATE. Updates work a lot like INSERTs, except there is an additional WHERE clause that can be specified.

>>> # change 'jack' to 'ed'
sql>>> conn.execute(users.update(users.c.name=='jack'), name='ed') 
<sqlalchemy.engine.base.ResultProxy object at 0x...>

>>> # use bind parameters
>>> u = users.update(users.c.name==bindparam('oldname'), values={'name':bindparam('newname')})
sql>>> conn.execute(u, oldname='jack', newname='ed') 
<sqlalchemy.engine.base.ResultProxy object at 0x...>

>>> # update a column to an expression
sql>>> conn.execute(users.update(values={users.c.fullname:"Fullname: " + users.c.name})) 
<sqlalchemy.engine.base.ResultProxy object at 0x...>

Correlated Updates

A correlated update lets you update a table using selection from another table, or the same table:

>>> s = select([addresses.c.email_address], addresses.c.user_id==users.c.id).limit(1)
sql>>> conn.execute(users.update(values={users.c.fullname:s})) 
<sqlalchemy.engine.base.ResultProxy object at 0x...>
back to section top

Deletes

Finally, a delete. Easy enough:

sql>>> conn.execute(addresses.delete()) 
<sqlalchemy.engine.base.ResultProxy object at 0x...>

sql>>> conn.execute(users.delete(users.c.name > 'm')) 
<sqlalchemy.engine.base.ResultProxy object at 0x...>
back to section top

Further Reference

The best place to get every possible name you can use in constructed SQL is the Generated Documentation.

Table Metadata Reference: Database Meta Data

Engine/Connection/Execution Reference: Database Engines

SQL Types: The Types System

back to section top

This section references most major configurational patterns involving the mapper() and relation() functions. It assumes you've worked through the Object Relational Tutorial and know how to construct and use rudimentary mappers and relations.

Mapper Configuration

Full API documentation for the ORM:

module sqlalchemy.orm.

Options for the mapper() function:

mapper().

Customizing Column Properties

The default behavior of a mapper is to assemble all the columns in the mapped Table into mapped object attributes. This behavior can be modified in several ways, as well as enhanced by SQL expressions.

To load only a part of the columns referenced by a table as attributes, use the include_properties and exclude_properties arguments:

mapper(User, users_table, include_properties=['user_id', 'user_name'])

mapper(Address, addresses_table, exclude_properties=['street', 'city', 'state', 'zip'])

To change the name of the attribute mapped to a particular column, place the Column object in the properties dictionary with the desired key:

mapper(User, users_table, properties={ 
   'id' : users_table.c.user_id,
   'name' : users_table.c.user_name,
})

To change the names of all attributes using a prefix, use the column_prefix option. This is useful for classes which wish to add their own property accessors:

mapper(User, users_table, column_prefix='_')

The above will place attribute names such as _user_id, _user_name, _password etc. on the mapped User class.

To place multiple columns which are known to be "synonymous" based on foreign key relationship or join condition into the same mapped attribute, put them together using a list, as below where we map to a Join:

# join users and addresses
usersaddresses = sql.join(users_table, addresses_table, \
    users_table.c.user_id == addresses_table.c.user_id)

mapper(User, usersaddresses, 
   properties = {
           'id':[users_table.c.user_id, addresses_table.c.user_id],
      })
back to section top

Deferred Column Loading

This feature allows particular columns of a table to not be loaded by default, instead being loaded later on when first referenced. It is essentially "column-level lazy loading". This feature is useful when one wants to avoid loading a large text or binary field into memory when it's not needed. Individual columns can be lazy loaded by themselves or placed into groups that lazy-load together.

book_excerpts = Table('books', db, 
    Column('book_id', Integer, primary_key=True),
    Column('title', String(200), nullable=False),
    Column('summary', String(2000)),
    Column('excerpt', String),
    Column('photo', Binary)
  )

class Book(object):
    pass

# define a mapper that will load each of 'excerpt' and 'photo' in 
# separate, individual-row SELECT statements when each attribute
# is first referenced on the individual object instance
mapper(Book, book_excerpts, properties = {
  'excerpt' : deferred(book_excerpts.c.excerpt),
  'photo' : deferred(book_excerpts.c.photo)
})

Deferred columns can be placed into groups so that they load together:

book_excerpts = Table('books', db, 
  Column('book_id', Integer, primary_key=True),
  Column('title', String(200), nullable=False),
  Column('summary', String(2000)),
  Column('excerpt', String),
  Column('photo1', Binary),
  Column('photo2', Binary),
  Column('photo3', Binary)
)

class Book(object):
  pass

# define a mapper with a 'photos' deferred group.  when one photo is referenced,
# all three photos will be loaded in one SELECT statement.  The 'excerpt' will 
# be loaded separately when it is first referenced.
mapper(Book, book_excerpts, properties = {
  'excerpt' : deferred(book_excerpts.c.excerpt),
  'photo1' : deferred(book_excerpts.c.photo1, group='photos'),
  'photo2' : deferred(book_excerpts.c.photo2, group='photos'),
  'photo3' : deferred(book_excerpts.c.photo3, group='photos')
})

You can defer or undefer columns at the Query level using the defer and undefer options:

query = session.query(Book)
query.options(defer('summary')).all()
query.options(undefer('excerpt')).all()

And an entire "deferred group", i.e. which uses the group keyword argument to deferred(), can be undeferred using undefer_group(), sending in the group name:

query = session.query(Book)
query.options(undefer_group('photos')).all()
back to section top

SQL Expressions as Mapped Attributes

To add a SQL clause composed of local or external columns as a read-only, mapped column attribute, use the column_property() function. Any scalar-returning ClauseElement may be used, as long as it has a name attribute; usually, you'll want to call label() to give it a specific name:

mapper(User, users_table, properties={
    'fullname' : column_property(
        (users_table.c.firstname + " " + users_table.c.lastname).label('fullname')
    )
})

Correlated subqueries may be used as well:

mapper(User, users_table, properties={
    'address_count' : column_property(
            select(
                [func.count(addresses_table.c.address_id)], 
                addresses_table.c.user_id==users_table.c.user_id
            ).label('address_count')
        )
})
back to section top

Overriding Attribute Behavior with Synonyms

A common request is the ability to create custom class properties that override the behavior of setting/getting an attribute. As of 0.4.2, the synonym() construct provides an easy way to do this in conjunction with a normal Python property construct. Below, we re-map the email column of our mapped table to a custom attribute setter/getter, mapping the actual column to the attribute named _email:

class MyAddress(object):
   def _set_email(self, email):
      self._email = email
   def _get_email(self):
      return self._email
   email = property(_get_email, _set_email)

mapper(MyAddress, addresses_table, properties = {
    'email':synonym('_email', map_column=True)
})

The email attribute is now usable in the same way as any other mapped attribute, including filter expressions, get/set operations, etc.:

address = sess.query(MyAddress).filter(MyAddress.email == 'some address').one()

address.email = 'some other address'
sess.flush()

q = sess.query(MyAddress).filter_by(email='some other address')

If the mapped class does not provide a property, the synonym() construct will create a default getter/setter object automatically.
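
In that case the mapping looks like this (a minimal sketch reusing the same addresses_table; the generated descriptor simply proxies to the mapped _email attribute):

class MyAddress(object):
    pass

mapper(MyAddress, addresses_table, properties={
    'email': synonym('_email', map_column=True)
})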

back to section top

Composite Column Types

Sets of columns can be associated with a single datatype. The ORM treats the group of columns like a single column which accepts and returns objects using the custom datatype you provide. In this example, we'll create a table vertices which stores two pairs of x/y coordinates, and a custom datatype Point which is composed of an x and a y column:

vertices = Table('vertices', metadata, 
    Column('id', Integer, primary_key=True),
    Column('x1', Integer),
    Column('y1', Integer),
    Column('x2', Integer),
    Column('y2', Integer),
    )

The requirements for the custom datatype class are that it have a constructor which accepts positional arguments corresponding to its column format, and also provide a method __composite_values__() which returns the state of the object as a list or tuple, in order of its column-based attributes. It should also supply adequate __eq__() and __ne__() methods which test the equality of two instances:

class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __composite_values__(self):
        return [self.x, self.y]            
    def __eq__(self, other):
        return other.x == self.x and other.y == self.y
    def __ne__(self, other):
        return not self.__eq__(other)

Setting up the mapping uses the composite() function:

class Vertex(object):
    pass

mapper(Vertex, vertices, properties={
    'start':composite(Point, vertices.c.x1, vertices.c.y1),
    'end':composite(Point, vertices.c.x2, vertices.c.y2)
})

We can now use the Vertex instances as well as querying as though the start and end attributes are regular scalar attributes:

sess = Session()
v = Vertex(Point(3, 4), Point(5, 6))
sess.save(v)

v2 = sess.query(Vertex).filter(Vertex.start == Point(3, 4))

The "equals" comparison operation by default produces an AND of all corresponding columns equated to one another. If you'd like to override this, or define the behavior of other SQL operators for your new type, the composite() function accepts an extension object of type sqlalchemy.orm.PropComparator:

from sqlalchemy.orm import PropComparator
from sqlalchemy import sql

class PointComparator(PropComparator):
    def __gt__(self, other):
        """define the 'greater than' operation"""

        return sql.and_(*[a>b for a, b in
                          zip(self.prop.columns,
                              other.__composite_values__())])

mapper(Vertex, vertices, properties={
    'start':composite(Point, vertices.c.x1, vertices.c.y1, comparator=PointComparator),
    'end':composite(Point, vertices.c.x2, vertices.c.y2, comparator=PointComparator)
})
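
With the comparator in place, the new operator is available in query criterion like any other (a brief usage sketch):

# find vertices whose start point is greater than Point(1, 2) on both axes
sess.query(Vertex).filter(Vertex.start > Point(1, 2)).all()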
back to section top

Controlling Ordering

By default, mappers will attempt to ORDER BY the "oid" column of a table, or the first primary key column, when selecting rows. This can be modified in several ways.

The "order_by" parameter can be sent to a mapper, overriding the per-engine ordering if any. A value of None means that the mapper should not use any ordering. A non-None value, which can be a column, an asc or desc clause, or an array of either one, indicates the ORDER BY clause that should be added to all select queries:

# disable all ordering
mapper(User, users_table, order_by=None)

# order by a column
mapper(User, users_table, order_by=users_table.c.user_id)

# order by multiple items
mapper(User, users_table, order_by=[users_table.c.user_id, users_table.c.user_name.desc()])

"order_by" can also be specified with queries, overriding all other per-engine/per-mapper orderings:

# order by a column
l = query.filter(User.user_name=='fred').order_by(User.user_id).all()

# order by multiple criterion
l = query.filter(User.user_name=='fred').order_by([User.user_id, User.user_name.desc()])

The "order_by" property can also be specified on a relation() which will control the ordering of the collection:

mapper(Address, addresses_table)

# order address objects by address id
mapper(User, users_table, properties = {
    'addresses' : relation(Address, order_by=addresses_table.c.address_id)
})

Note that when using eager loaders with relations, the tables used by the eager load's join are anonymously aliased. You can only order by these columns if you specify it at the relation() level. To control ordering at the query level based on a related table, you join() to that relation, then order by it:

session.query(User).join('addresses').order_by(Address.street)
back to section top

Mapping Class Inheritance Hierarchies

SQLAlchemy supports three forms of inheritance: single table inheritance, where several types of classes are stored in one table; concrete table inheritance, where each type of class is stored in its own table; and joined table inheritance, where the parent/child classes are stored in their own tables that are joined together in a select. Whereas support for single and joined table inheritance is strong, concrete table inheritance is a less common scenario with some particular problems, so it is not quite as flexible.

When mappers are configured in an inheritance relationship, SQLAlchemy has the ability to load elements "polymorphically", meaning that a single query can return objects of multiple types.

For the following sections, assume this class relationship:

class Employee(object):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.__class__.__name__ + " " + self.name

class Manager(Employee):
    def __init__(self, name, manager_data):
        self.name = name
        self.manager_data = manager_data
    def __repr__(self):
        return self.__class__.__name__ + " " + self.name + " " +  self.manager_data

class Engineer(Employee):
    def __init__(self, name, engineer_info):
        self.name = name
        self.engineer_info = engineer_info
    def __repr__(self):
        return self.__class__.__name__ + " " + self.name + " " +  self.engineer_info

Joined Table Inheritance

In joined table inheritance, each class along a particular class's list of parents is represented by a unique table. The total set of attributes for a particular instance is represented as a join along all tables in its inheritance path. Here, we first define a table to represent the Employee class. This table will contain a primary key column (or columns), and a column for each attribute that's represented by Employee. In this case it's just name:

employees = Table('employees', metadata, 
   Column('employee_id', Integer, primary_key=True),
   Column('name', String(50)),
   Column('type', String(30), nullable=False)
)

The table also has a column called type. It is strongly advised in both single- and joined-table inheritance scenarios that the root table contain a column whose sole purpose is that of the discriminator; it stores a value which indicates the type of object represented within the row. The column may be of any desired datatype. While there are some "tricks" to work around the requirement that there be a discriminator column, they are more complicated to configure when one wishes to load polymorphically.

Next we define individual tables for each of Engineer and Manager, which each contain columns that represent the attributes unique to the subclass they represent. Each table also must contain a primary key column (or columns), and in most cases a foreign key reference to the parent table. It is standard practice that the same column is used for both of these roles, and that the column is also named the same as that of the parent table. However, this is optional in SQLAlchemy; separate columns may be used for primary key and parent-relation, the column may be named differently than that of the parent, and even a custom join condition can be specified between parent and child tables instead of using a foreign key. In joined table inheritance, the primary key of an instance is always represented by the primary key of the base table only (new in SQLAlchemy 0.4).

engineers = Table('engineers', metadata, 
   Column('employee_id', Integer, ForeignKey('employees.employee_id'), primary_key=True),
   Column('engineer_info', String(50)),
)

managers = Table('managers', metadata, 
   Column('employee_id', Integer, ForeignKey('employees.employee_id'), primary_key=True),
   Column('manager_data', String(50)),
)

We then configure mappers as usual, except we use some additional arguments to indicate the inheritance relationship, the polymorphic discriminator column, and the polymorphic identity of each class; this is the value that will be stored in the polymorphic discriminator column.

mapper(Employee, employees, polymorphic_on=employees.c.type, polymorphic_identity='employee')
mapper(Engineer, engineers, inherits=Employee, polymorphic_identity='engineer')
mapper(Manager, managers, inherits=Employee, polymorphic_identity='manager')

And that's it. Querying against Employee will return a combination of Employee, Engineer and Manager objects.
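
As mentioned a few paragraphs above, the join condition between parent and child tables can also be stated explicitly rather than derived from the ForeignKey, using the inherit_condition argument (a sketch which just restates the condition the foreign key already implies):

mapper(Engineer, engineers, inherits=Employee,
    inherit_condition=engineers.c.employee_id==employees.c.employee_id,
    polymorphic_identity='engineer')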

Polymorphic Querying Strategies

The Query object includes some helper functionality when dealing with joined-table inheritance mappings. These are the with_polymorphic() and of_type() methods, both of which are introduced in version 0.4.4.

The with_polymorphic() method affects the specific subclass tables which the Query selects from. Normally, a query such as this:

session.query(Employee).filter(Employee.name=='ed')

This selects only from the employees table. The criterion we use in filter() and other methods will generate WHERE criterion against this table. What if we wanted to load Employee objects but also wanted to use criterion against Engineer ? We could just query against the Engineer class instead. But if we were using criterion which filters among more than one subclass (subclasses which do not inherit directly from one to the other), we'd like to select from an outer join of all those tables. The with_polymorphic() method can tell Query which joined-table subclasses we want to select for:

session.query(Employee).with_polymorphic(Engineer).filter(Engineer.engineer_info=='some info')

Even without criterion, the with_polymorphic() method has the added advantage that instances are loaded from all of their tables in one result set. For example, to optimize the loading of all Employee objects, with_polymorphic() accepts '*' as a wildcard indicating that all subclass tables should be joined:

session.query(Employee).with_polymorphic('*').all()

with_polymorphic() is an effective query-level alternative to the existing select_table option available on mapper().

Next is a way to join along relation paths while narrowing the criterion to specific subclasses. Suppose the employees table represents a collection of employees which are associated with a Company object. We'll add a company_id column to the employees table and a new table companies:

companies = Table('companies', metadata,
   Column('company_id', Integer, primary_key=True),
   Column('name', String(50))
   )

employees = Table('employees', metadata, 
  Column('employee_id', Integer, primary_key=True),
  Column('name', String(50)),
  Column('type', String(30), nullable=False),
  Column('company_id', Integer, ForeignKey('companies.company_id'))
)

class Company(object):
    pass

mapper(Company, companies, properties={
    'employees':relation(Employee)
})

If we wanted to join from Company to not just Employee but specifically Engineers, using the join() method or any() or has() operators will by default create a join from companies to employees, without including engineers or managers in the mix. If we wish to have criterion which is specifically against the Engineer class, we can tell those methods to join or subquery against the full set of tables representing the subclass using the of_type() operator:

session.query(Company).join(Company.employees.of_type(Engineer)).filter(Engineer.engineer_info=='someinfo')

A longhand notation, introduced in 0.4.3, is also available, which involves spelling out the full target selectable within a 2-tuple:

session.query(Company).join(('employees', employees.join(engineers))).filter(Engineer.engineer_info=='someinfo')

The second notation allows more flexibility, such as joining to any group of subclass tables:

session.query(Company).join(('employees', employees.outerjoin(engineers).outerjoin(managers))).\
    filter(or_(Engineer.engineer_info=='someinfo', Manager.manager_data=='somedata'))

The any() and has() operators also can be used with of_type() when the embedded criterion is in terms of a subclass:

session.query(Company).filter(Company.employees.of_type(Engineer).any(Engineer.engineer_info=='someinfo')).all()

Note that any() and has() are both shorthand for a correlated EXISTS query. Building one by hand looks like this:

session.query(Company).filter(
    exists([1], 
        and_(Engineer.engineer_info=='someinfo', employees.c.company_id==companies.c.company_id), 
        from_obj=employees.join(engineers)
    )
).all()

The EXISTS subquery above selects from the join of employees to engineers, and also specifies criterion which correlates the EXISTS subselect back to the parent companies table.

back to section top

Optimizing Joined Table Loads

When loading fresh from the database, the joined-table setup above will query from the parent table first, then for each row will issue a second query to the child table. For example, a load of five rows with Employee id 3, Manager ids 1 and 5, and Engineer ids 2 and 4 will produce queries along the lines of this example:

session.query(Employee).all()
SELECT employees.employee_id AS employees_employee_id, employees.name AS employees_name, employees.type AS employees_type
FROM employees ORDER BY employees.oid
[]
SELECT managers.employee_id AS managers_employee_id, managers.manager_data AS managers_manager_data
FROM managers
WHERE ? = managers.employee_id
[5]
SELECT engineers.employee_id AS engineers_employee_id, engineers.engineer_info AS engineers_engineer_info
FROM engineers
WHERE ? = engineers.employee_id
[2]
SELECT engineers.employee_id AS engineers_employee_id, engineers.engineer_info AS engineers_engineer_info
FROM engineers
WHERE ? = engineers.employee_id
[4]
SELECT managers.employee_id AS managers_employee_id, managers.manager_data AS managers_manager_data
FROM managers
WHERE ? = managers.employee_id
[1]

The above query works well for a get() operation, since it limits the queries to only the tables directly involved in fetching a single instance. For instances which are already present in the session, the secondary table load is not needed. However, the above loading style is not efficient for loading large groups of objects, as it incurs separate queries for each parent row.

One way to reduce the number of "secondary" loads of child rows is to "defer" them, using polymorphic_fetch='deferred':

mapper(Employee, employees, polymorphic_on=employees.c.type, \
    polymorphic_identity='employee', polymorphic_fetch='deferred')
mapper(Engineer, engineers, inherits=Employee, polymorphic_identity='engineer')
mapper(Manager, managers, inherits=Employee, polymorphic_identity='manager')

The above configuration queries in the same manner as earlier, except the load of each "secondary" table occurs only when attributes referencing those columns are first referenced on the loaded instance. This style of loading is very efficient for cases where large selects of items occur, but a detailed "drill down" of extra inherited properties is less common.

More commonly, an all-at-once load may be achieved by constructing a query which combines all three tables together. The easiest way to do this as of version 0.4.4 is to use the with_polymorphic() query method which will automatically join in the classes desired:

query = session.query(Employee).with_polymorphic([Engineer, Manager])

Which produces a query like the following:

query.all()
SELECT employees.employee_id AS employees_employee_id, engineers.employee_id AS engineers_employee_id, managers.employee_id AS managers_employee_id, employees.name AS employees_name, employees.type AS employees_type, engineers.engineer_info AS engineers_engineer_info, managers.manager_data AS managers_manager_data
FROM employees LEFT OUTER JOIN engineers ON employees.employee_id = engineers.employee_id LEFT OUTER JOIN managers ON employees.employee_id = managers.employee_id ORDER BY employees.oid
[]

with_polymorphic() accepts a single class or mapper, a list of classes/mappers, or the string '*' to indicate all subclasses. It also accepts a second argument selectable which replaces the automatic join creation and instead selects directly from the selectable given. This can allow polymorphic loads from a variety of inheritance schemes including concrete tables, if the appropriate unions are constructed.
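
A rough sketch of the explicit-selectable form (assuming the selectable may be passed positionally as the second argument):

# construct the join ourselves, then hand it to with_polymorphic()
emp_join = employees.outerjoin(engineers).outerjoin(managers)
query = session.query(Employee).with_polymorphic([Engineer, Manager], emp_join)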

Similar behavior as provided by with_polymorphic() can be configured at the mapper level so that any user-defined query is used by default in order to load instances. The select_table argument references an arbitrary selectable which the mapper will use for load operations (it has no impact on save operations). Any selectable can be used for this, such as a UNION of tables. For joined table inheritance, the easiest method is to use OUTER JOIN:

join = employees.outerjoin(engineers).outerjoin(managers)

mapper(Employee, employees, polymorphic_on=employees.c.type, \
    polymorphic_identity='employee', select_table=join)
mapper(Engineer, engineers, inherits=Employee, polymorphic_identity='engineer')
mapper(Manager, managers, inherits=Employee, polymorphic_identity='manager')

The above mapping will produce a query similar to that of with_polymorphic('*') for every query of Employee objects.

When select_table is used, with_polymorphic() still overrides its usage at the query level. For example, if select_table were configured to load from a join of multiple tables, using with_polymorphic(Employee) will limit the list of tables selected from to just the base table (as always, tables which don't get loaded in the first pass will be loaded on an as-needed basis).

back to section top

Single Table Inheritance

Single table inheritance is where the attributes of the base class as well as all subclasses are represented within a single table. A column is present in the table for every attribute mapped to the base class and all subclasses; the columns which correspond to a single subclass are nullable. This configuration looks much like joined-table inheritance except there's only one table. In this case, a type column is required, as there would be no other way to discriminate between classes. The table is specified in the base mapper only; for the inheriting classes, leave their table parameter blank:

employees_table = Table('employees', metadata, 
    Column('employee_id', Integer, primary_key=True),
    Column('name', String(50)),
    Column('manager_data', String(50)),
    Column('engineer_info', String(50)),
    Column('type', String(20), nullable=False)
)

employee_mapper = mapper(Employee, employees_table, \
    polymorphic_on=employees_table.c.type, polymorphic_identity='employee')
manager_mapper = mapper(Manager, inherits=employee_mapper, polymorphic_identity='manager')
engineer_mapper = mapper(Engineer, inherits=employee_mapper, polymorphic_identity='engineer')

Note that the mappers for the derived classes Manager and Engineer omit the specification of their associated table, as it is inherited from the employee_mapper. Omitting the table specification for derived mappers in single-table inheritance is required.

back to section top

Concrete Table Inheritance

This form of inheritance maps each class to a distinct table, as below:

employees_table = Table('employees', metadata, 
    Column('employee_id', Integer, primary_key=True),
    Column('name', String(50)),
)

managers_table = Table('managers', metadata, 
    Column('employee_id', Integer, primary_key=True),
    Column('name', String(50)),
    Column('manager_data', String(50)),
)

engineers_table = Table('engineers', metadata, 
    Column('employee_id', Integer, primary_key=True),
    Column('name', String(50)),
    Column('engineer_info', String(50)),
)

Notice in this case there is no type column. If polymorphic loading is not required, there's no advantage to using inherits here; you just define a separate mapper for each class:

mapper(Employee, employees_table)
mapper(Manager, managers_table)
mapper(Engineer, engineers_table)

To load polymorphically, the select_table argument is currently required. In this case we must construct a UNION of all three tables. SQLAlchemy includes a helper function called polymorphic_union to create these; it will map all the different columns into a structure of selects with the same numbers and names of columns, and also generate a virtual type column for each subselect:

pjoin = polymorphic_union({
    'employee':employees_table,
    'manager':managers_table,
    'engineer':engineers_table
}, 'type', 'pjoin')

employee_mapper = mapper(Employee, employees_table, select_table=pjoin, \
    polymorphic_on=pjoin.c.type, polymorphic_identity='employee')
manager_mapper = mapper(Manager, managers_table, inherits=employee_mapper, \
    concrete=True, polymorphic_identity='manager')
engineer_mapper = mapper(Engineer, engineers_table, inherits=employee_mapper, \
    concrete=True, polymorphic_identity='engineer')

Upon select, the polymorphic union produces a query like this:

session.query(Employee).all()
SELECT pjoin.type AS pjoin_type, pjoin.manager_data AS pjoin_manager_data, pjoin.employee_id AS pjoin_employee_id,
pjoin.name AS pjoin_name, pjoin.engineer_info AS pjoin_engineer_info
FROM (
SELECT employees.employee_id AS employee_id, CAST(NULL AS VARCHAR(50)) AS manager_data, employees.name AS name,
CAST(NULL AS VARCHAR(50)) AS engineer_info, 'employee' AS type
FROM employees
UNION ALL
SELECT managers.employee_id AS employee_id, managers.manager_data AS manager_data, managers.name AS name,
CAST(NULL AS VARCHAR(50)) AS engineer_info, 'manager' AS type
FROM managers
UNION ALL
SELECT engineers.employee_id AS employee_id, CAST(NULL AS VARCHAR(50)) AS manager_data, engineers.name AS name,
engineers.engineer_info AS engineer_info, 'engineer' AS type
FROM engineers
) AS pjoin ORDER BY pjoin.oid
[]

back to section top

Using Relations with Inheritance

Both joined-table and single table inheritance scenarios produce mappings which are usable in relation() functions; that is, it's possible to map a parent object to a child object which is polymorphic. Similarly, inheriting mappers can have relation()s of their own at any level, which propagate to each child class. The only requirement for relations is that there is a table relationship between parent and child. An example is the following modification to the joined table inheritance example, which sets a bi-directional relationship between Employee and Company:

employees_table = Table('employees', metadata, 
    Column('employee_id', Integer, primary_key=True),
    Column('name', String(50)),
    Column('company_id', Integer, ForeignKey('companies.company_id'))
)

companies = Table('companies', metadata, 
   Column('company_id', Integer, primary_key=True),
   Column('name', String(50)))

class Company(object):
    pass

mapper(Company, companies, properties={
   'employees': relation(Employee, backref='company')
})

SQLAlchemy has a lot of experience in this area; the optimized "outer join" approach can be used freely for parent and child relationships, eager loads are fully usable, and query aliasing and other tricks are fully supported as well.

In a concrete inheritance scenario, mapping relation()s is more difficult since the distinct classes do not share a table. In this case, you can establish a relationship from parent to child as long as a join condition can be constructed between them, typically by giving each child table a foreign key to the parent:

companies = Table('companies', metadata, 
   Column('id', Integer, primary_key=True),
   Column('name', String(50)))

employees_table = Table('employees', metadata, 
    Column('employee_id', Integer, primary_key=True),
    Column('name', String(50)),
    Column('company_id', Integer, ForeignKey('companies.id'))
)

managers_table = Table('managers', metadata, 
    Column('employee_id', Integer, primary_key=True),
    Column('name', String(50)),
    Column('manager_data', String(50)),
    Column('company_id', Integer, ForeignKey('companies.id'))
)

engineers_table = Table('engineers', metadata, 
    Column('employee_id', Integer, primary_key=True),
    Column('name', String(50)),
    Column('engineer_info', String(50)),
    Column('company_id', Integer, ForeignKey('companies.id'))
)

# pjoin is a polymorphic_union as in the previous example, regenerated here
# so that it includes the new company_id columns
employee_mapper = mapper(Employee, employees_table, select_table=pjoin, polymorphic_on=pjoin.c.type, polymorphic_identity='employee')
mapper(Manager, managers_table, inherits=employee_mapper, concrete=True, polymorphic_identity='manager')
mapper(Engineer, engineers_table, inherits=employee_mapper, concrete=True, polymorphic_identity='engineer')
mapper(Company, companies, properties={
    'employees':relation(Employee)
})

Let's crank it up and try loading with an eager load:

session.query(Company).options(eagerload('employees')).all()
SELECT anon_1.type AS anon_1_type, anon_1.manager_data AS anon_1_manager_data, anon_1.engineer_info AS anon_1_engineer_info,
anon_1.employee_id AS anon_1_employee_id, anon_1.name AS anon_1_name, anon_1.company_id AS anon_1_company_id,
companies.id AS companies_id, companies.name AS companies_name
FROM companies LEFT OUTER JOIN (SELECT CAST(NULL AS VARCHAR(50)) AS engineer_info, employees.employee_id AS employee_id,
CAST(NULL AS VARCHAR(50)) AS manager_data, employees.name AS name, employees.company_id AS company_id, 'employee' AS type
FROM employees UNION ALL SELECT CAST(NULL AS VARCHAR(50)) AS engineer_info, managers.employee_id AS employee_id,
managers.manager_data AS manager_data, managers.name AS name, managers.company_id AS company_id, 'manager' AS type
FROM managers UNION ALL SELECT engineers.engineer_info AS engineer_info, engineers.employee_id AS employee_id,
CAST(NULL AS VARCHAR(50)) AS manager_data, engineers.name AS name, engineers.company_id AS company_id, 'engineer' AS type
FROM engineers) AS anon_1 ON companies.id = anon_1.company_id ORDER BY companies.oid, anon_1.oid
[]

The big limitation with concrete table inheritance is that relation()s placed on each concrete mapper do not propagate to child mappers. If you want to have the same relation()s set up on all concrete mappers, they must be configured manually on each.

back to section top

Mapping a Class against Multiple Tables

Mappers can be constructed against arbitrary relational units (called Selectables) as well as plain Tables. For example, the join() function from the SQL package creates a neat selectable unit comprised of multiple tables, complete with its own composite primary key, which can be passed to a mapper in place of a table.

# a class
class AddressUser(object):
    pass

# define a Join
j = join(users_table, addresses_table)

# map to it - the identity of an AddressUser object will be 
# based on (user_id, address_id) since those are the primary keys involved
mapper(AddressUser, j, properties={
    'user_id':[users_table.c.user_id, addresses_table.c.user_id]
})

A second example:

# many-to-many join on an association table
j = join(users_table, userkeywords, 
        users_table.c.user_id==userkeywords.c.user_id).join(keywords, 
           userkeywords.c.keyword_id==keywords.c.keyword_id)

# a class 
class KeywordUser(object):
    pass

# map to it - the identity of a KeywordUser object will be
# (user_id, keyword_id) since those are the primary keys involved
mapper(KeywordUser, j, properties={
    'user_id':[users_table.c.user_id, userkeywords.c.user_id],
    'keyword_id':[userkeywords.c.keyword_id, keywords.c.keyword_id]
})

In both examples above, "composite" columns were added as properties to the mappers; these are aggregations of multiple columns into one mapper property, which instructs the mapper to keep both of those columns set at the same value.
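
A minimal usage sketch of the first mapping above (session setup assumed); the composite user_id property is backed by both users.user_id and addresses.user_id, and the mapper keeps the two columns set to the same value:

au = session.query(AddressUser).first()

# one attribute value, backed by users.user_id and addresses.user_id
print au.user_id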

back to section top

Mapping a Class against Arbitrary Selects

Similar to mapping against a join, a plain select() object can be used with a mapper as well. Below, an example select which contains two aggregate functions and a group_by is mapped to a class:

# assumes the 'orders' table has 'customer_id', 'order_id' and 'price' columns
s = select([customers, 
            func.count(orders.c.order_id).label('order_count'), 
            func.max(orders.c.price).label('highest_order')],
            customers.c.customer_id==orders.c.customer_id,
            group_by=[c for c in customers.c]
            ).alias('somealias')
class Customer(object):
    pass

mapper(Customer, s)

Above, the "customers" table is joined against the "orders" table to produce a full row for each customer row, the total count of related rows in the "orders" table, and the highest price in the "orders" table, grouped against the full set of columns in the "customers" table. That query is then mapped against the Customer class. New instances of Customer will contain attributes for each column in the "customers" table as well as an "order_count" and "highest_order" attribute. Updates to the Customer object will only be reflected in the "customers" table and not the "orders" table. This is because the primary key columns of the "orders" table are not represented in this mapper and therefore the table is not affected by save or delete operations.
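
A short usage sketch (the "name" column on customers is an assumption based on the description above):

for customer in session.query(Customer).all():
    print customer.name, customer.order_count, customer.highest_order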

back to section top

Multiple Mappers for One Class

The first mapper created for a certain class is known as that class's "primary mapper." Other mappers can be created as well; these come in two varieties.

  • secondary mapper - this is a mapper that must be constructed with the keyword argument non_primary=True, and represents a load-only mapper. Objects that are loaded with a secondary mapper will have their save operation processed by the primary mapper. It is also invalid to add new relation()s to a non-primary mapper. To use this mapper with the Session, specify it to the query method:

    example:

    # primary mapper
    mapper(User, users_table)
    
    # make a secondary mapper to load User against a join
    othermapper = mapper(User, users_table.join(someothertable), non_primary=True)
    
    # select
    result = session.query(othermapper).select()
    

    The "non primary mapper" is a rarely needed feature of SQLAlchemy; in most cases, the Query object can produce any kind of query that's desired. It's recommended that a straight Query be used in place of a non-primary mapper unless the mapper approach is absolutely needed. Current use cases for the "non primary mapper" are when you want to map the class to a particular select statement or view to which additional query criteria can be added, and for when the particular mapped select statement or view is to be placed in a relation() of a parent mapper.

  • entity name mapper - this is a mapper that is a fully functioning primary mapper for a class, which is distinguished from the regular primary mapper by an entity_name parameter. Instances loaded with this mapper will be totally managed by this new mapper and have no connection to the original one. Most methods on Session include an optional entity_name parameter in order to specify this condition.

    example:

    # primary mapper
    mapper(User, users_table)
    
    # make an entity name mapper that stores User objects in another table
    mapper(User, alternate_users_table, entity_name='alt')
    
    # make two User objects
    user1 = User()
    user2 = User()
    
    # save one in the "users" table
    session.save(user1)
    
    # save the other in the "alternate_users_table"
    session.save(user2, entity_name='alt')
    
    session.flush()
    
    # select from the alternate mapper
    session.query(User, entity_name='alt').select()
    

    Use the "entity name" mapper when different instances of the same class are persisted in completely different tables. The "entity name" approach can also provide limited levels of horizontal partitioning. A more comprehensive approach to horizontal partitioning is provided by the Sharding API.

back to section top

Extending Mapper

Mappers can have functionality augmented or replaced at many points in their execution via the usage of the MapperExtension class. This class is just a series of "hooks" where various functionality takes place. An application can make its own MapperExtension objects, overriding only the methods it needs. Methods that are not overridden return the special value sqlalchemy.orm.EXT_CONTINUE to allow processing to continue to the next MapperExtension or simply proceed normally if there are no more extensions.

API documentation for MapperExtension: class MapperExtension(object)

To use MapperExtension, make your own subclass of it and just send it off to a mapper:

m = mapper(User, users_table, extension=MyExtension())
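
A sketch of what MyExtension might look like; the created_at attribute stamped here is purely an assumption for illustration:

import datetime

from sqlalchemy.orm import MapperExtension, EXT_CONTINUE

class MyExtension(MapperExtension):
    def before_insert(self, mapper, connection, instance):
        # stamp a hypothetical 'created_at' attribute just before the INSERT
        instance.created_at = datetime.datetime.now()
        return EXT_CONTINUE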

Multiple extensions will be chained together and processed in order; they are specified as a list:

m = mapper(User, users_table, extension=[ext1, ext2, ext3])
back to section top

Relation Configuration

The full list of options for the relation() function:

relation()

Basic Relational Patterns

A quick walkthrough of the basic relational patterns.

One To Many

A one to many relationship places a foreign key in the child table referencing the parent. SQLAlchemy creates the relationship as a collection on the parent object containing instances of the child object.

parent_table = Table('parent', metadata,
    Column('id', Integer, primary_key=True))

child_table = Table('child', metadata,
    Column('id', Integer, primary_key=True),
    Column('parent_id', Integer, ForeignKey('parent.id')))

class Parent(object):
    pass

class Child(object):
    pass

mapper(Parent, parent_table, properties={
    'children':relation(Child)
})

mapper(Child, child_table)

To establish a bi-directional relationship in one-to-many, where the "reverse" side is a many to one, specify the backref option:

mapper(Parent, parent_table, properties={
    'children':relation(Child, backref='parent')
})

mapper(Child, child_table)

Child will get a parent attribute with many-to-one semantics.

back to section top

Many To One

Many to one places a foreign key in the parent table referencing the child. The mapping setup is identical to one-to-many, however SQLAlchemy creates the relationship as a scalar attribute on the parent object referencing a single instance of the child object.

parent_table = Table('parent', metadata,
    Column('id', Integer, primary_key=True),
    Column('child_id', Integer, ForeignKey('child.id')))

child_table = Table('child', metadata,
    Column('id', Integer, primary_key=True),
    )

class Parent(object):
    pass

class Child(object):
    pass

mapper(Parent, parent_table, properties={
    'child':relation(Child)
})

mapper(Child, child_table)

Backref behavior is available here as well, where backref="parents" will place a one-to-many collection on the Child class.
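
In code, the backref form described above looks like:

mapper(Parent, parent_table, properties={
    'child':relation(Child, backref='parents')
})

mapper(Child, child_table)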

back to section top

One To One

One To One is essentially a bi-directional relationship with a scalar attribute on both sides. To achieve this, the uselist=False flag indicates the placement of a scalar attribute instead of a collection on the "many" side of the relationship. To convert one-to-many into one-to-one:

mapper(Parent, parent_table, properties={
    'child':relation(Child, uselist=False, backref='parent')
})

Or to turn many-to-one into one-to-one:

mapper(Parent, parent_table, properties={
    'child':relation(Child, backref=backref('parent', uselist=False))
})
back to section top

Many To Many

Many to Many adds an association table between two classes. The association table is indicated by the secondary argument to relation().

left_table = Table('left', metadata,
    Column('id', Integer, primary_key=True))

right_table = Table('right', metadata,
    Column('id', Integer, primary_key=True))

association_table = Table('association', metadata,
    Column('left_id', Integer, ForeignKey('left.id')),
    Column('right_id', Integer, ForeignKey('right.id')),
    )

mapper(Parent, left_table, properties={
    'children':relation(Child, secondary=association_table)
})

mapper(Child, right_table)

For a bi-directional relationship, both sides of the relation contain a collection by default, which can be modified on either side via the uselist flag to be scalar. The backref keyword will automatically use the same secondary argument for the reverse relation:

mapper(Parent, left_table, properties={
    'children':relation(Child, secondary=association_table, backref='parents')
})
back to section top

Association Object

The association object pattern is a variant on many-to-many: it specifically is used when your association table contains additional columns beyond those which are foreign keys to the left and right tables. Instead of using the secondary argument, you map a new class directly to the association table. The left side of the relation references the association object via one-to-many, and the association class references the right side via many-to-one.

left_table = Table('left', metadata,
    Column('id', Integer, primary_key=True))

right_table = Table('right', metadata,
    Column('id', Integer, primary_key=True))

association_table = Table('association', metadata,
    Column('left_id', Integer, ForeignKey('left.id'), primary_key=True),
    Column('right_id', Integer, ForeignKey('right.id'), primary_key=True),
    Column('data', String(50))
    )

mapper(Parent, left_table, properties={
    'children':relation(Association)
})

mapper(Association, association_table, properties={
    'child':relation(Child)
})

mapper(Child, right_table)

The bi-directional version adds backrefs to both relations:

mapper(Parent, left_table, properties={
    'children':relation(Association, backref="parent")
})

mapper(Association, association_table, properties={
    'child':relation(Child, backref="parent_assocs")
})

mapper(Child, right_table)

Working with the association pattern in its direct form requires that child objects are associated with an association instance before being appended to the parent; similarly, access from parent to child goes through the association object:

# create parent, append a child via association
p = Parent()
a = Association()
a.child = Child()
p.children.append(a)

# iterate through child objects via association, including association 
# attributes
for assoc in p.children:
    print assoc.data
    print assoc.child

To enhance the association object pattern such that direct access to the Association object is optional, SQLAlchemy provides the associationproxy.
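
A minimal sketch of such a proxy, using sqlalchemy.ext.associationproxy; the kids attribute name is chosen purely for illustration:

from sqlalchemy.ext.associationproxy import association_proxy

class Parent(object):
    # 'kids' yields the Child of each Association in the 'children'
    # collection, without referencing Association directly
    kids = association_proxy('children', 'child')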

Important Note: it is strongly advised that the secondary table argument not be combined with the Association Object pattern, unless the relation() which contains the secondary argument is marked viewonly=True. Otherwise, SQLAlchemy may persist conflicting data to the underlying association table since it is represented by two conflicting mappings. The Association Proxy pattern should be favored in the case where access to the underlying association data is only sometimes needed.

back to section top

Adjacency List Relationships

The adjacency list pattern is a common relational pattern whereby a table contains a foreign key reference to itself. This is the most common and simple way to represent hierarchical data in flat tables. The other way is the "nested sets" model, sometimes called "modified preorder". Despite what many online articles say about modified preorder, the adjacency list model is probably the most appropriate pattern for the large majority of hierarchical storage needs, for reasons of concurrency and reduced complexity, and because modified preorder has little advantage over an application which can fully load subtrees into the application space.

SQLAlchemy commonly refers to an adjacency list relation as a self-referential mapper. In this example, we'll work with a single table called treenodes to represent a tree structure:

nodes = Table('treenodes', metadata,
    Column('id', Integer, primary_key=True),
    Column('parent_id', Integer, ForeignKey('treenodes.id')),
    Column('data', String(50)),
    )

A graph such as the following:

root --+---> child1
       +---> child2 --+--> subchild1
       |              +--> subchild2
       +---> child3

Would be represented with data such as:

id       parent_id     data
---      -------       ----
1        NULL          root
2        1             child1
3        1             child2
4        3             subchild1
5        3             subchild2
6        1             child3

SQLAlchemy's mapper() configuration for a self-referential one-to-many relationship is exactly like a "normal" one-to-many relationship. When SQLAlchemy encounters the foreign key relation from treenodes to treenodes, it assumes one-to-many unless told otherwise:

# entity class
class Node(object):
    pass

mapper(Node, nodes, properties={
    'children':relation(Node)
})

To create a many-to-one relationship from child to parent, an extra indicator of the "remote side" is added, which contains the Column object or objects indicating the remote side of the relation:

mapper(Node, nodes, properties={
    'parent':relation(Node, remote_side=[nodes.c.id])
})

And the bi-directional version combines both:

mapper(Node, nodes, properties={
    'children':relation(Node, backref=backref('parent', remote_side=[nodes.c.id]))
})
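
A brief persistence sketch using the bi-directional configuration (a configured session is assumed):

root = Node()
root.data = 'root'

child = Node()
child.data = 'child1'

# appending to 'children' also sets child.parent via the backref
root.children.append(child)

# the default save-update cascade saves the child along with the root
session.save(root)
session.flush()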

There are several examples included with SQLAlchemy illustrating self-referential strategies; these include basic_tree.py and optimized_al.py, the latter of which illustrates how to persist and search XML documents in conjunction with ElementTree.

Self-Referential Query Strategies

Querying self-referential structures is done in the same way as any other query in SQLAlchemy; below, we query for any node whose data attribute stores the value child2:

# get all nodes named 'child2'
sess.query(Node).filter(Node.data=='child2')

On the subject of joins, i.e. those described in Querying with Joins, self-referential structures require the usage of aliases so that the same table can be referenced multiple times within the FROM clause of the query. Aliasing can be done either manually using the nodes Table object as a source of aliases:

# get all nodes named 'subchild1' with a parent named 'child2'
nodealias = nodes.alias()
sess.query(Node).filter(Node.data=='subchild1').\
    filter(and_(Node.parent_id==nodealias.c.id, nodealias.c.data=='child2')).all()

or automatically, using join() with aliased=True:

# get all nodes named 'subchild1' with a parent named 'child2'
sess.query(Node).filter(Node.data=='subchild1').\
    join('parent', aliased=True).filter(Node.data=='child2').all()

To add criteria to multiple points along a longer join, use from_joinpoint=True:

# get all nodes named 'subchild1' with a parent named 'child2' and a grandparent 'root'
sess.query(Node).filter(Node.data=='subchild1').\
    join('parent', aliased=True).filter(Node.data=='child2').\
    join('parent', aliased=True, from_joinpoint=True).filter(Node.data=='root').all()
back to section top

Configuring Eager Loading

Eager loading of relations occurs using joins or outerjoins from parent to child table during a normal query operation, such that the parent and its child collection can be populated from a single SQL statement. SQLAlchemy's eager loading uses aliased tables in all cases when joining to related items, so it is compatible with self-referential joining. However, to use eager loading with a self-referential relation, SQLAlchemy needs to be told how many levels deep it should join; otherwise the eager load will not take place. This depth setting is configured via join_depth:

mapper(Node, nodes, properties={
    'children':relation(Node, lazy=False, join_depth=2)
})

session.query(Node).all()
back to section top

Specifying Alternate Join Conditions to relation()

The relation() function uses the foreign key relationship between the parent and child tables to formulate the primary join condition between parent and child; in the case of a many-to-many relationship it also formulates the secondary join condition. If you are working with a Table which has no ForeignKey objects on it (which can be the case when using reflected tables with MySQL), or if the join condition cannot be expressed by a simple foreign key relationship, use the primaryjoin and possibly secondaryjoin conditions to create the appropriate relationship.

In this example we create a relation boston_addresses which will only load the user addresses with a city of "Boston":

class User(object):
    pass
class Address(object):
    pass

mapper(Address, addresses_table)
mapper(User, users_table, properties={
    'boston_addresses' : relation(Address, primaryjoin=
                and_(users_table.c.user_id==addresses_table.c.user_id, 
                addresses_table.c.city=='Boston'))
})

Many to many relationships can be customized by one or both of primaryjoin and secondaryjoin, shown below with just the default many-to-many relationship explicitly set:

class User(object):
    pass
class Keyword(object):
    pass
mapper(Keyword, keywords_table)
mapper(User, users_table, properties={
    'keywords':relation(Keyword, secondary=userkeywords_table,
        primaryjoin=users_table.c.user_id==userkeywords_table.c.user_id,
        secondaryjoin=userkeywords_table.c.keyword_id==keywords_table.c.keyword_id
        )
})

Specifying Foreign Keys

When using primaryjoin and secondaryjoin, SQLAlchemy also needs to be aware of which columns in the relation reference the other. In most cases, a Table construct will have ForeignKey constructs which take care of this; however, in the case of reflected tables on a database that does not report FKs (like MySQL ISAM) or when using join conditions on columns that don't have foreign keys, the relation() needs to be told specifically which columns are "foreign" using the foreign_keys collection:

mapper(Address, addresses_table)
mapper(User, users_table, properties={
    'addresses' : relation(Address, 
         primaryjoin=users_table.c.user_id==addresses_table.c.user_id,
         foreign_keys=[addresses_table.c.user_id])
})
back to section top

Building Query-Enabled Properties

Very ambitious custom join conditions may fail to be directly persistable, and in some cases may not even load correctly. To remove the persistence part of the equation, use the flag viewonly=True on the relation(), which establishes it as a read-only attribute (data written to the collection will be ignored on flush()). However, in extreme cases, consider using a regular Python property in conjunction with Query as follows:

class User(object):
    def _get_addresses(self):
        return object_session(self).query(Address).with_parent(self).filter(...).all()
    addresses = property(_get_addresses)
back to section top

Multiple Relations against the Same Parent/Child

There's no restriction on how many times you can relate from parent to child. SQLAlchemy can usually figure out what you want, particularly if the join conditions are straightforward. Below we add a newyork_addresses attribute to complement the boston_addresses attribute:

mapper(User, users_table, properties={
    'boston_addresses' : relation(Address, primaryjoin=
                and_(users_table.c.user_id==addresses_table.c.user_id, 
                addresses_table.c.city=='Boston')),
    'newyork_addresses' : relation(Address, primaryjoin=
                and_(users_table.c.user_id==addresses_table.c.user_id, 
                addresses_table.c.city=='New York')),
})
back to section top

Alternate Collection Implementations

Mapping a one-to-many or many-to-many relationship results in a collection of values accessible through an attribute on the parent instance. By default, this collection is a list:

mapper(Parent, parent_table, properties={
    'children':relation(Child)
})

parent = Parent()
parent.children.append(Child())
print parent.children[0]

Collections are not limited to lists. Sets, mutable sequences and almost any other Python object that can act as a container can be used in place of the default list.

# use a set
mapper(Parent, parent_table, properties={
    'children':relation(Child, collection_class=set)
})

parent = Parent()
child = Child()
parent.children.add(child)
assert child in parent.children

Custom Collection Implementations

You can use your own types for collections as well. For most cases, simply inherit from list or set and add the custom behavior.

Collections in SQLAlchemy are transparently instrumented. Instrumentation means that normal operations on the collection are tracked and result in changes being written to the database at flush time. Additionally, collection operations can fire events which indicate some secondary operation must take place. Examples of a secondary operation include saving the child item in the parent's Session (i.e. the save-update cascade), as well as synchronizing the state of a bi-directional relationship (i.e. a backref).

The collections package understands the basic interface of lists, sets and dicts and will automatically apply instrumentation to those built-in types and their subclasses. Object-derived types that implement a basic collection interface are detected and instrumented via duck-typing:

class ListLike(object):
    def __init__(self):
        self.data = []
    def append(self, item):
        self.data.append(item)
    def remove(self, item):
        self.data.remove(item)
    def extend(self, items):
        self.data.extend(items)
    def __iter__(self):
        return iter(self.data)
    def foo(self):
        return 'foo'

append, remove, and extend are known list-like methods, and will be instrumented automatically. __iter__ is not a mutator method and won't be instrumented, and foo won't be either.
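
Such a class can then be used directly as the collection, e.g. (a sketch reusing the Parent/Child mapping from the relational patterns above):

mapper(Parent, parent_table, properties={
    'children':relation(Child, collection_class=ListLike)
})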

Duck-typing (i.e. guesswork) isn't rock-solid, of course, so you can be explicit about the interface you are implementing by providing an __emulates__ class attribute:

class SetLike(object):
    __emulates__ = set

    def __init__(self):
        self.data = set()
    def append(self, item):
        self.data.add(item)
    def remove(self, item):
        self.data.remove(item)
    def __iter__(self):
        return iter(self.data)

This class looks list-like because of append, but __emulates__ forces it to be treated as set-like. remove is known to be part of the set interface and will be instrumented.

But this class won't work quite yet: a little glue is needed to adapt it for use by SQLAlchemy. The ORM needs to know which methods to use to append, remove and iterate over members of the collection. When using a type like list or set, the appropriate methods are well-known and used automatically when present. This set-like class does not provide the expected add method, so we must supply an explicit mapping for the ORM via a decorator.

back to section top

Annotating Custom Collections via Decorators

Decorators can be used to tag the individual methods the ORM needs to manage collections. Use them when your class doesn't quite meet the regular interface for its container type, or you simply would like to use a different method to get the job done.

from sqlalchemy.orm.collections import collection

class SetLike(object):
    __emulates__ = set

    def __init__(self):
        self.data = set()

    @collection.appender
    def append(self, item):
        self.data.add(item)

    def remove(self, item):
        self.data.remove(item)

    def __iter__(self):
        return iter(self.data)

And that's all that's needed to complete the example. SQLAlchemy will add instances via the append method. remove and __iter__ are the default methods for sets and will be used for removing and iteration. Default methods can be changed as well:

from sqlalchemy.orm.collections import collection

class MyList(list):
    @collection.remover
    def zark(self, item):
        # do something special...
        list.remove(self, item)

    @collection.iterator
    def hey_use_this_instead_for_iteration(self):
        # return an iterator over the collection's members
        return list.__iter__(self)

There is no requirement to be list-, or set-like at all. Collection classes can be any shape, so long as they have the append, remove and iterate interface marked for SQLAlchemy's use. Append and remove methods will be called with a mapped entity as the single argument, and iterator methods are called with no arguments and must return an iterator.

back to section top

Dictionary-Based Collections

A dict can be used as a collection, but a keying strategy is needed to map entities loaded by the ORM to key, value pairs. The collections package provides several built-in types for dictionary-based collections:

from sqlalchemy.orm.collections import column_mapped_collection, attribute_mapped_collection, mapped_collection

mapper(Item, items_table, properties={
    # key by column
    'notes': relation(Note, collection_class=column_mapped_collection(notes_table.c.keyword)),
    # or named attribute 
    'notes2': relation(Note, collection_class=attribute_mapped_collection('keyword')),
    # or any callable
    'notes3': relation(Note, collection_class=mapped_collection(lambda entity: entity.a + entity.b))
})

# ...
item = Item()
item.notes['color'] = Note('color', 'blue')
print item.notes['color']

These functions each provide a dict subclass with decorated set and remove methods and the keying strategy of your choice.

The collections.MappedCollection class can be used as a base class for your custom types or as a mix-in to quickly add dict collection support to other classes. It uses a keying function to delegate to __setitem__ and __delitem__:

from sqlalchemy.util import OrderedDict
from sqlalchemy.orm.collections import MappedCollection

class NodeMap(OrderedDict, MappedCollection):
    """Holds 'Node' objects, keyed by the 'name' attribute with insert order maintained."""

    def __init__(self, *args, **kw):
        MappedCollection.__init__(self, keyfunc=lambda node: node.name)
        OrderedDict.__init__(self, *args, **kw)

The ORM understands the dict interface just like lists and sets, and will automatically instrument all dict-like methods if you choose to subclass dict or provide dict-like collection behavior in a duck-typed class. You must decorate appender and remover methods, however; there are no compatible methods in the basic dictionary interface for SQLAlchemy to use by default. Iteration will go through itervalues() unless otherwise decorated.
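
For example, a dict subclass needs its appender and remover spelled out explicitly; this sketch keys Note objects by their keyword attribute, as in the mapped_collection examples above:

from sqlalchemy.orm.collections import collection

class NoteDict(dict):
    @collection.appender
    def _append(self, note):
        self[note.keyword] = note

    @collection.remover
    def _remove(self, note):
        del self[note.keyword]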

back to section top

Instrumentation and Custom Types

Many custom types and existing library classes can be used as an entity collection type as-is without further ado. However, it is important to note that the instrumentation process will modify the type, adding decorators around methods automatically.

The decorations are lightweight and no-op outside of relations, but they do add unneeded overhead when triggered elsewhere. When using a library class as a collection, it can be good practice to use the "trivial subclass" trick to restrict the decorations to just your usage in relations. For example:

class MyAwesomeList(some.great.library.AwesomeList):
    pass

# ... relation(..., collection_class=MyAwesomeList)

The ORM uses this approach for built-ins, quietly substituting a trivial subclass when a list, set or dict is used directly.

The collections package provides additional decorators and support for authoring custom types. See the package documentation for more information and discussion of advanced usage and Python 2.3-compatible decoration options.

back to section top

Configuring Loader Strategies: Lazy Loading, Eager Loading

In the Object Relational Tutorial, we introduced the concept of Eager Loading. We used an option in conjunction with the Query object in order to indicate that a relation should be loaded at the same time as the parent, within a single SQL query:

>>> jack = session.query(User).options(eagerload('addresses')).filter_by(name='jack').one()

By default, all relations are lazy loading. The scalar or collection attribute associated with a relation() contains a trigger which fires the first time the attribute is accessed, which issues a SQL call at that point:

>>> jack.addresses
[<Address(u'jack@google.com')>, <Address(u'j25@yahoo.com')>]

The default loader strategy for any relation() is configured by the lazy keyword argument, which defaults to True. Below we set it as False so that the children relation is eager loading:

# eager load 'children' attribute
mapper(Parent, parent_table, properties={
    'children':relation(Child, lazy=False)
})

The loader strategy can be changed from lazy to eager as well as eager to lazy using the eagerload() and lazyload() query options:

# set children to load lazily
session.query(Parent).options(lazyload('children')).all()

# set children to load eagerly
session.query(Parent).options(eagerload('children')).all()

To reference a relation that is deeper than one level, separate the names by periods:

session.query(Parent).options(eagerload('foo.bar.bat')).all()

When using dot-separated names with eagerload(), the option applies only to the actual attribute named, and not its ancestors. For example, suppose a mapping from A to B to C, where the relations, named atob and btoc, are both lazy-loading. A statement like the following:

session.query(A).options(eagerload('atob.btoc')).all()

will load only A objects to start. When the atob attribute on each A is accessed, the returned B objects will eagerly load their C objects.

Therefore, to modify the eager load to load both atob as well as btoc, place eagerloads for both:

session.query(A).options(eagerload('atob'), eagerload('atob.btoc')).all()

or more simply just use eagerload_all():

session.query(A).options(eagerload_all('atob.btoc')).all()

There are two other loader strategies available, dynamic loading and no loading; these are described in Working with Large Collections.

Combining Eager Loads with Statement/Result Set Queries

When full statement or result-set loads are used with Query, SQLAlchemy does not affect the SQL query itself, and therefore has no way of tacking on its own LEFT [OUTER] JOIN conditions that are normally used to eager load relationships. If the query being constructed is created in such a way that it returns rows not just from a parent table (or tables) but also returns rows from child tables, the result-set mapping can be notified as to which additional properties are contained within the result set. This is done using the contains_eager() query option, which specifies the name of the relationship to be eagerly loaded.

# mapping is the users->addresses mapping
mapper(User, users_table, properties={
    'addresses':relation(Address)
})

# define a query on USERS with an outer join to ADDRESSES
statement = users_table.outerjoin(addresses_table).select(use_labels=True)

# construct a Query object which expects the "addresses" results 
query = session.query(User).options(contains_eager('addresses'))

# get results normally
r = query.from_statement(statement)

If the "eager" portion of the statement is "aliased", the alias keyword argument to contains_eager() may be used to indicate it. This is a string alias name or reference to an actual Alias object:

# use an alias of the addresses table
adalias = addresses_table.alias('adalias')

# define a query on USERS with an outer join to adalias
statement = users_table.outerjoin(adalias).select(use_labels=True)

# construct a Query object which expects the "addresses" results 
query = session.query(User).options(contains_eager('addresses', alias=adalias))

# get results normally
r = query.from_statement(statement).all()

In the case that the main table itself is also aliased, the contains_alias() option can be used:

# define an aliased UNION called 'ulist'
statement = users.select(users.c.user_id==7).union(users.select(users.c.user_id>7)).alias('ulist')

# add on an eager load of "addresses"
statement = statement.outerjoin(addresses).select(use_labels=True)

# create query, indicating "ulist" is an alias for the main table, "addresses" property should
# be eager loaded
query = create_session().query(User).options(contains_alias('ulist'), contains_eager('addresses'))

# results
r = query.from_statement(statement)
back to section top

Working with Large Collections

The default behavior of relation() is to fully load the collection of items in, according to the loading strategy of the relation. Additionally, the Session by default only knows how to delete objects which are actually present within the session. When a parent instance is marked for deletion and flushed, the Session loads its full list of child items in so that they may either be deleted as well, or have their foreign key value set to null; this is to avoid constraint violations. For large collections of child items, there are several strategies to bypass full loading of child items, both at load time and at deletion time.

Dynamic Relation Loaders

The most useful by far is the dynamic_loader() relation. This is a variant of relation() which returns a Query object in place of a collection when accessed. filter() criteria may be applied, as well as limits and offsets, either explicitly or via array slices:

mapper(User, users_table, properties={
    'posts':dynamic_loader(Post)
})

jack = session.query(User).get(id)

# filter Jack's blog posts
posts = jack.posts.filter(Post.c.headline=='this is a post')

# apply array slices
posts = jack.posts[5:20]

The dynamic relation supports limited write operations, via the append() and remove() methods. Since the read side of the dynamic relation always queries the database, changes to the underlying collection will not be visible until the data has been flushed:

oldpost = jack.posts.filter(Post.c.headline=='old post').one()
jack.posts.remove(oldpost)

jack.posts.append(Post('new post'))

To place a dynamic relation on a backref, use lazy='dynamic':

mapper(Post, posts_table, properties={
    'user':relation(User, backref=backref('posts', lazy='dynamic'))
})

Note that eager/lazy loading options cannot be used in conjunction with dynamic relations at this time.

back to section top

Setting Noload

The opposite of the dynamic relation is simply "noload", specified using lazy=None:

mapper(MyClass, table, properties={
    'children':relation(MyOtherClass, lazy=None)
})

Above, the children collection is fully writeable, and changes to it will be persisted to the database as well as locally available for reading at the time they are added. However when instances of MyClass are freshly loaded from the database, the children collection stays empty.

back to section top

Using Passive Deletes

Use passive_deletes=True to disable child object loading on a DELETE operation, in conjunction with "ON DELETE (CASCADE|SET NULL)" on your database to automatically cascade deletes to child objects. Note that "ON DELETE" is not supported on SQLite, and requires InnoDB tables when using MySQL:

mytable = Table('mytable', meta,
    Column('id', Integer, primary_key=True),
    )

myothertable = Table('myothertable', meta,
    Column('id', Integer, primary_key=True),
    Column('parent_id', Integer),
    ForeignKeyConstraint(['parent_id'],['mytable.id'], ondelete="CASCADE"),
    )

mapper(MyOtherClass, myothertable)

mapper(MyClass, mytable, properties={
    'children':relation(MyOtherClass, cascade="all, delete-orphan", passive_deletes=True)
})

When passive_deletes is applied, the children relation will not be loaded into memory when an instance of MyClass is marked for deletion. The cascade="all, delete-orphan" will take effect for instances of MyOtherClass which are currently present in the session; however for instances of MyOtherClass which are not loaded, SQLAlchemy assumes that "ON DELETE CASCADE" rules will ensure that those rows are deleted by the database and that no foreign key violation will occur.

back to section top

Mutable Primary Keys / Update Cascades

As of SQLAlchemy 0.4.2, the primary key attributes of an instance can be changed freely, and will be persisted upon flush. When the primary key of an entity changes, related items which reference the primary key must also be updated as well. For databases which enforce referential integrity, it's required to use the database's ON UPDATE CASCADE functionality in order to propagate primary key changes. For those which don't, the passive_updates flag can be set to False, which instructs SQLAlchemy to issue UPDATE statements individually. The passive_updates flag can also be False in conjunction with ON UPDATE CASCADE functionality, although in that case it issues UPDATE statements unnecessarily.

A typical mutable primary key setup might look like:

users = Table('users', metadata,
    Column('username', String(50), primary_key=True),
    Column('fullname', String(100)))

addresses = Table('addresses', metadata,
    Column('email', String(50), primary_key=True),
    Column('username', String(50), ForeignKey('users.username', onupdate="cascade")))

class User(object):
    pass
class Address(object):
    pass

mapper(User, users, properties={
    'addresses':relation(Address, passive_updates=False)
})
mapper(Address, addresses)

passive_updates is set to True by default. Foreign key references to non-primary key columns are supported as well.

back to section top

The Mapper is the entrypoint to the configurational API of the SQLAlchemy object relational mapper. But the primary object one works with when using the ORM is the Session.

What does the Session do ?

In the most general sense, the Session establishes all conversations with the database and represents a "holding zone" for all the mapped instances which you've loaded or created during its lifespan. It implements the Unit of Work pattern, which means it keeps track of all changes which occur, and is capable of flushing those changes to the database as appropriate. Another important facet of the Session is that it's also maintaining unique copies of each instance, where "unique" means "only one object with a particular primary key" - this pattern is called the Identity Map.

Beyond that, the Session implements an interface which lets you move objects in or out of the session in a variety of ways. It provides the entryway to a Query object which is used to query the database for data, it is commonly used to provide transactional boundaries (though this is optional), and it can also serve as a configurational "home base" for one or more Engine objects, allowing various vertical and horizontal partitioning strategies to be achieved.

back to section top

Getting a Session

The Session object exists just as a regular Python object, which can be directly instantiated. However, it takes a fair amount of keyword options, several of which you probably want to set explicitly. It's fairly inconvenient to deal with the "configuration" of a session every time you want to create one. Therefore, SQLAlchemy recommends the usage of a helper function called sessionmaker(), which typically you call only once for the lifespan of an application. This function creates a customized Session subclass for you, with your desired configurational arguments pre-loaded. Then, whenever you need a new Session, you use your custom Session class with no arguments to create the session.

Using a sessionmaker() Configuration

The usage of sessionmaker() is illustrated below:

from sqlalchemy.orm import sessionmaker

# create a configured "Session" class
Session = sessionmaker(autoflush=True, transactional=True)

# create a Session
sess = Session()

# work with sess
sess.save(x)
sess.commit()

# close when finished
sess.close()

Above, the sessionmaker call creates a class for us, which we assign to the name Session. This class is a subclass of the actual sqlalchemy.orm.session.Session class, which will instantiate with the arguments of autoflush=True and transactional=True.

When you write your application, place the call to sessionmaker() somewhere global, and then make your new Session class available to the rest of your application.

back to section top

Binding Session to an Engine or Connection

In our previous example regarding sessionmaker(), nowhere did we specify how our session would connect to our database. When the session is configured in this manner, it will look for a database engine to connect with via the Table objects that it works with - the chapter called Binding MetaData to an Engine or Connection describes how to associate Table objects directly with a source of database connections.

However, it is often more straightforward to explicitly tell the session what database engine (or engines) you'd like it to communicate with. This is particularly handy with multiple-database scenarios where the session can be used as the central point of configuration. To achieve this, the constructor keyword bind is used for a basic single-database configuration:

# create engine
engine = create_engine('postgres://...')

# bind custom Session class to the engine
Session = sessionmaker(bind=engine, autoflush=True, transactional=True)

# work with the session
sess = Session()

One common issue with the above scenario is that an application will often organize its global imports before it ever connects to a database. Since the Session class created by sessionmaker() is meant to be a global application object (note we are saying the session class, not a session instance), we may not have a bind argument available. For this, the Session class returned by sessionmaker() supports post-configuration of all options, through its method configure():

# configure Session class with desired options
Session = sessionmaker(autoflush=True, transactional=True)

# later, we create the engine
engine = create_engine('postgres://...')

# associate it with our custom Session class
Session.configure(bind=engine)

# work with the session
sess = Session()

The Session also has the ability to be bound to multiple engines. These scenarios are described in unitofwork_partitioning.

Binding Session to a Connection

The examples involving bind so far are dealing with the Engine object, which is, like the Session class itself, a global configurational object. The Session can also be bound to an individual database Connection. The reason you might want to do this is if your application controls the boundaries of transactions using distinct Transaction objects (these objects are described in Using Transactions with Connection). You'd have a transactional Connection, and then you'd want to work with an ORM-level Session which participates in that transaction. Since Connection is definitely not a globally-scoped object in all but the most rudimentary command-line applications, you can bind an individual Session() instance to a particular Connection not at class configuration time, but at session instance construction time:

# global application scope.  create Session class, engine
Session = sessionmaker(autoflush=True, transactional=True)

engine = create_engine('postgres://...')

...

# local scope, such as within a controller function

# connect to the database
connection = engine.connect()

# bind an individual Session to the connection
sess = Session(bind=connection)
back to section top

Using create_session()

As an alternative to sessionmaker(), create_session() exists literally as a function which calls the normal Session constructor directly. All arguments are passed through and the new Session object is returned:

session = create_session(bind=myengine)

The create_session() function doesn't add any functionality to the regular Session, it just sets up a default argument set of autoflush=False, transactional=False. But also, by calling create_session() instead of instantiating Session directly, you leave room in your application to change the type of session which the function creates. For example, an application which is calling create_session() in many places, which is typical for a pre-0.4 application, can be changed to use a sessionmaker() by just assigning the return of sessionmaker() to the create_session name:

# change from:
from sqlalchemy.orm import create_session

# to:
create_session = sessionmaker()
back to section top

Using the Session

A typical session conversation starts with creating a new session, or acquiring one from an ongoing context. You save new objects and load existing ones, make changes, mark some as deleted, and then persist your changes to the database. If your session is transactional, you use commit() to persist any remaining changes and to commit the transaction. If not, you call flush() which will flush any remaining data to the database.

Below, we open a new Session using a configured sessionmaker(), make some changes, and commit:

# configured Session class
Session = sessionmaker(autoflush=True, transactional=True)

sess = Session()
d = Data(value=10)
sess.save(d)
d2 = sess.query(Data).filter(Data.value==15).one()
d2.value = 19
sess.commit()

Quickie Intro to Object States

It's helpful to know the states which an instance can have within a session:

  • Transient - an instance that's not in a session, and is not saved to the database; i.e. it has no database identity. The only relationship such an object has to the ORM is that its class has a mapper() associated with it.

  • Pending - when you save() a transient instance, it becomes pending. It still wasn't actually flushed to the database yet, but it will be when the next flush occurs.

  • Persistent - An instance which is present in the session and has a record in the database. You get persistent instances by either flushing so that the pending instances become persistent, or by querying the database for existing instances (or moving persistent instances from other sessions into your local session).

  • Detached - an instance which has a record in the database, but is not in any session. There's nothing wrong with this, and you can use objects normally when they're detached, except they will not be able to issue any SQL in order to load collections or attributes which are not yet loaded, or were marked as "expired".

Knowing these states is important, since the Session tries to be strict about ambiguous operations (such as trying to save the same object to two different sessions at the same time).
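
A sketch of these transitions, assuming a mapped User class and a Session configured as in the examples above:

u = User()          # transient: not in a session, no database identity
u.name = 'ed'

sess = Session()
sess.save(u)        # pending: in the session, not yet flushed

sess.flush()        # persistent: the row now exists in the database

sess.expunge(u)     # detached: has a database identity, but no session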

back to section top

Frequently Asked Questions

  • When do I make a sessionmaker ?

    Just one time, somewhere in your application's global scope. It should be looked upon as part of your application's configuration. If your application has three .py files in a package, you could, for example, place the sessionmaker line in your __init__.py file; from that point on your other modules say "from mypackage import Session". That way, everyone else just uses Session(), and the configuration of that session is controlled by that central point.

    If your application starts up, does imports, but does not know what database it's going to be connecting to, you can bind the Session at the "class" level to the engine later on, using configure().

    In the examples in this section, we will frequently show the sessionmaker being created right above the line where we actually invoke Session(). But that's just for example's sake ! In reality, the sessionmaker would be somewhere at the module level, and your individual Session() calls would be sprinkled all throughout your app, such as in a web application within each controller method.

  • When do I make a Session ?

    You typically invoke Session() when you first need to talk to your database, and want to save some objects or load some existing ones. Then, you work with it, save your changes, and then dispose of it....or at the very least close() it. It's not a "global" kind of object, and should be handled more like a "local variable", as it's generally not safe to use with concurrent threads. Sessions are very inexpensive to make, and don't use any resources whatsoever until they are first used...so create some !

    There is also a pattern whereby you're using a contextual session, this is described later in Contextual/Thread-local Sessions. In this pattern, a helper object is maintaining a Session for you, most commonly one that is local to the current thread (and sometimes also local to an application instance). SQLAlchemy 0.4 has worked this pattern out such that it still looks like you're creating a new session as you need one...so in that case, it's still a guaranteed win to just say Session() whenever you want a session.

  • Is the Session a cache ?

    Yeee...no. It's somewhat used as a cache, in that it implements the identity map pattern, and stores objects keyed to their primary key. However, it doesn't do any kind of query caching. This means, if you say session.query(Foo).filter_by(name='bar'), even if Foo(name='bar') is right there, in the identity map, the session has no idea about that. It has to issue SQL to the database, get the rows back, and then when it sees the primary key in the row, then it can look in the local identity map and see that the object is already there. It's only when you say query.get({some primary key}) that the Session doesn't have to issue a query.

    Additionally, the Session stores object instances using a weak reference by default. This also defeats the purpose of using the Session as a cache, unless the weak_identity_map flag is set to False.

    The Session is not designed to be a global object from which everyone consults as a "registry" of objects. That is the job of a second level cache. A good library for implementing second level caching is Memcached. It is possible to "sort of" use the Session in this manner, if you set it to be non-transactional and it never flushes any SQL, but it's not a terrific solution, since if concurrent threads load the same objects at the same time, you may have multiple copies of the same objects present in collections.

  • How can I get the Session for a certain object ?

    Use the object_session() classmethod available on Session:

    session = Session.object_session(someobject)
    
  • Is the session threadsafe ?

    Nope. It has no thread synchronization of any kind built in, and particularly when you do a flush operation, it definitely is not open to concurrent threads accessing it, because it holds onto a single database connection at that point. If you use a non-transactional session for read operations only, it's still not thread-"safe", but you also won't get any catastrophic failures, since it opens and closes connections on an as-needed basis; it's just that different threads might load the same objects independently of each other, but only one will wind up in the identity map (the other one might still live in a collection somewhere).

    But the bigger point here is, you should not want to use the session with multiple concurrent threads. That would be like having everyone at a restaurant all eat from the same plate. The session is a local "workspace" that you use for a specific set of tasks; you don't want to, or need to, share that session with other threads who are doing some other task. If, on the other hand, there are other threads participating in the same task you are, such as in a desktop graphical application, then you would be sharing the session with those threads, but you also will have implemented a proper locking scheme (or your graphical framework does) so that those threads do not collide.
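As a brief sketch of the configuration pattern described in the first answer above; the module path mypackage/__init__.py and the init_model() function here are hypothetical names used only for illustration:

# mypackage/__init__.py  (hypothetical)
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# one sessionmaker for the whole application; not yet bound to a database
Session = sessionmaker(autoflush=True, transactional=True)

def init_model(db_url):
    # called once at startup, when the database URL is finally known
    engine = create_engine(db_url)
    Session.configure(bind=engine)

Other modules would then simply say "from mypackage import Session" and call Session() whenever a session is needed.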

back to section top

Session Attributes

The session provides a set of attributes and collection-oriented methods which allow you to view the current state of the session.

The identity map is accessed by the identity_map attribute, which provides a dictionary interface. The keys are "identity keys", which are attached to all persistent objects by the attribute _instance_key:

>>> myobject._instance_key 
(<class 'test.tables.User'>, (7,))

>>> myobject._instance_key in session.identity_map
True

>>> session.identity_map.values()
[<__main__.User object at 0x712630>, <__main__.Address object at 0x712a70>]

The identity map is a weak-referencing dictionary by default. This means that objects which are dereferenced on the outside will be removed from the session automatically. Note that objects which are marked as "dirty" will not fall out of scope until after changes on them have been flushed; special logic kicks in at the point of auto-removal which ensures that no pending changes remain on the object, else a temporary strong reference is created to the object.

Some people prefer objects to stay in the session until explicitly removed in all cases; for this, you can specify the flag weak_identity_map=False to the create_session or sessionmaker functions so that the Session will use a regular dictionary.
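For instance, a minimal sketch of a strongly-referencing Session using the flag named above:

Session = sessionmaker(weak_identity_map=False)
sess = Session()

# instances now remain in sess.identity_map until expunged, or until the
# session is cleared or closed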

While the identity_map accessor is currently the actual dictionary used by the Session to store instances, you should not add or remove items from this dictionary. Use the session methods save_or_update() and expunge() to add or remove items.

The Session also supports an iterator interface in order to see all objects in the identity map:

for obj in session:
    print obj

As well as __contains__():

if obj in session:
    print "Object is present"

The session is also keeping track of all newly created (i.e. pending) objects, all objects which have had changes since they were last loaded or saved (i.e. "dirty"), and everything that's been marked as deleted.

# pending objects recently added to the Session
session.new

# persistent objects which currently have changes detected
# (this collection is now created on the fly each time the property is called)
session.dirty

# persistent objects that have been marked as deleted via session.delete(obj)
session.deleted
back to section top

Querying

The query() function takes one or more classes and/or mappers, along with an optional entity_name parameter, and returns a new Query object which will issue mapper queries within the context of this Session. For each mapper that is passed, the Query uses that mapper directly. For each class, the Query will locate the primary mapper for the class using class_mapper().

# query from a class
session.query(User).filter_by(name='ed').all()

# query with multiple classes, returns tuples
session.query(User).add_entity(Address).join('addresses').filter_by(name='ed').all()

# query from a mapper
query = session.query(usermapper)
x = query.get(1)

# query from a class mapped with entity name 'alt_users'
q = session.query(User, entity_name='alt_users')
y = q.options(eagerload('orders')).all()

entity_name is an optional keyword argument sent with a class object, in order to further qualify which primary mapper is to be used; this only applies if there was a Mapper created with that particular class/entity name combination, else an exception is raised. All of the methods on Session which take a class or mapper argument also take the entity_name argument, so that a given class can be properly matched to the desired primary mapper.

All instances retrieved by the returned Query object will be stored as persistent instances within the originating Session.

back to section top

Saving New Instances

save() is called with a single transient instance as an argument, which is then added to the Session and becomes pending. When the session is next flushed, the instance will be saved to the database. If the given instance is not transient, meaning it is either attached to an existing Session or it has a database identity, an exception is raised.

user1 = User(name='user1')
user2 = User(name='user2')
session.save(user1)
session.save(user2)

session.commit()     # write changes to the database

There are also other ways to have objects saved to the session automatically; one is by using cascade rules, and the other is by using a contextual session. Both of these are described later.

back to section top

Updating/Merging Existing Instances

The update() method is used when you have a detached instance, and you want to put it back into a Session. Recall that "detached" means the object has a database identity.

Since update() is a little picky that way, most people use save_or_update(), which checks for an _instance_key attribute, and based on whether it's there or not, calls either save() or update():

# load user1 using session 1
user1 = sess1.query(User).get(5)

# remove it from session 1
sess1.expunge(user1)

# move it into session 2
sess2.save_or_update(user1)

update() is also an operation that can happen automatically using cascade rules, just like save().

merge() on the other hand is a little like update(), except it creates a copy of the given instance in the session, and returns that copy to you; the instance you pass in never goes into the session. merge() is much fancier than update(); it will actually look to see if an object with the same primary key is already present in the session, and if not will load it by primary key. Then, it will merge the attributes of the given object into the one which it just located.

This method is useful for bringing in objects which may have been restored from a serialization, such as those stored in an HTTP session, where the object may be present in the session already:

# deserialize an object
myobj = pickle.loads(mystring)

# "merge" it.  if the session already had this object in the 
# identity map, then you get back the one from the current session.
myobj = session.merge(myobj)

merge() includes an important option called dont_load. When this boolean flag is set to True, the merge of a detached object will not force a get() of that object from the database. Normally, merge() issues a get() for every existing object so that it can load the most recent state of the object, which is then modified according to the state of the given object. With dont_load=True, the get() is skipped and merge() places an exact copy of the given object in the session. This allows objects which were retrieved from a caching system to be copied back into a session without any SQL overhead being added.
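A brief sketch of the dont_load usage described above, assuming the object was restored from some external cache:

# deserialize an object retrieved from a cache
cached = pickle.loads(mystring)

# place an exact copy into the session; no SELECT is issued
local = session.merge(cached, dont_load=True)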

back to section top

Deleting

The delete method places an instance into the Session's list of objects to be marked as deleted:

# mark two objects to be deleted
session.delete(obj1)
session.delete(obj2)

# commit (or flush)
session.commit()

The big gotcha with delete() is that nothing is removed from collections. For example, if a User has a collection of three Addresses, deleting an Address will not remove it from user.addresses:

>>> address = user.addresses[1]
>>> session.delete(address)
>>> session.flush()
>>> address in user.addresses
True

The solution is to use proper cascading:

mapper(User, users_table, properties={
    'addresses':relation(Address, cascade="all, delete")
})
del user.addresses[1]
session.flush()
back to section top

Flushing

This is the main gateway to what the Session does best, which is save everything ! It should be clear by now what a flush looks like:

session.flush()

It also can be called with a list of objects; in this form, the flush operation will be limited only to the objects specified in the list:

# saves only user1 and address2.  all other modified
# objects remain present in the session.
session.flush([user1, address2])

This second form of flush should be used carefully as it will not necessarily locate other dependent objects within the session, whose database representation may have foreign constraint relationships with the objects being operated upon.

There's also a way to have flush() called automatically before each query; this is called "autoflush" and is described below.

Note that when using a Session that has been placed into a transaction, the commit() method will also flush() the Session unconditionally before committing the transaction.

Note that flush does not change the state of any collections or entity relationships in memory; for example, if you set the foreign key attribute b_id on object A to the identifier B.id, the change will be flushed to the database, but A will not have B added to its collection. If you want to manipulate foreign key attributes directly, refresh() or expire() the objects whose state needs to be refreshed subsequent to flushing.
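As a sketch of this behavior, using the hypothetical A/B mapping from the paragraph above (b_id is a foreign key column on A referencing B.id):

a.b_id = b.id       # set the foreign key attribute directly
session.flush()     # the UPDATE is issued, but A's relation to B is unchanged in memory

session.refresh(a)  # re-load a's attributes, so its relation reflects the new b_id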

back to section top

Autoflush

A session can be configured to issue flush() calls before each query. This allows you to immediately have DB access to whatever has been saved to the session. It's recommended to use autoflush with transactional=True, that way an unexpected flush call won't permanently save to the database:

Session = sessionmaker(autoflush=True, transactional=True)
sess = Session()
u1 = User(name='jack')
sess.save(u1)

# reload user1
u2 = sess.query(User).filter_by(name='jack').one()
assert u2 is u1

# commit session, flushes whatever is remaining
sess.commit()

Autoflush is particularly handy when using "dynamic" mapper relations, so that changes to the underlying collection are immediately available via its query interface.
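For example, a minimal sketch of such a relation; the lazy='dynamic' setting and the User/Address mapping here are illustrative assumptions:

mapper(User, users_table, properties={
    'addresses':relation(Address, lazy='dynamic')
})
mapper(Address, addresses_table)

sess = Session()    # autoflush=True, transactional=True as above
jack = sess.query(User).filter_by(name='jack').one()
jack.addresses.append(Address(email_address='jack@example.com'))

# the pending Address is autoflushed before this query, so it is counted
print jack.addresses.count()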

back to section top

Committing

The commit() method on Session is used specifically when the Session is in a transactional state. The two ways that a session may be placed in a transactional state are to create it using the transactional=True option, or to call the begin() method.

commit() serves two purposes; it issues a flush() unconditionally to persist any remaining pending changes, and it issues a commit to all currently managed database connections. In the typical case this is just a single connection. After the commit, connection resources which were allocated by the Session are released. This holds true even for a Session which specifies transactional=True; when such a session is committed, the next transaction is not "begun" until the next database operation occurs.

See the section below on "Managing Transactions" for further detail.

back to section top

Expunge / Clear

Expunge removes an object from the Session, sending persistent instances to the detached state, and pending instances to the transient state:

session.expunge(obj1)

Use expunge when you'd like to remove an object altogether from memory, such as before calling del on it, which will prevent any "ghost" operations from occurring when the session is flushed.

The clear() method is equivalent to expunge()-ing everything from the Session:

session.clear()

However note that the clear() method does not reset any transactional state or connection resources; therefore what you usually want to call instead of clear() is close().

back to section top

Closing

The close() method issues a clear(), and releases any transactional/connection resources. When connections are returned to the connection pool, whatever transactional state exists is rolled back.

When close() is called, the Session is in the same state as when it was first created, and is safe to be used again. close() is especially important when using a contextual session, which remains in memory after usage. By issuing close(), the session will be clean for the next request that makes use of it.

back to section top

Refreshing / Expiring

To assist with the Session's "sticky" behavior of instances which are present, individual objects can have all of their attributes immediately re-loaded from the database, or marked as "expired", which will cause a re-load to occur upon the next access of any of the object's mapped attributes. This includes all relationships, so lazy-loaders will be re-initialized and eager relationships will be repopulated. Any changes marked on the object are discarded:

# immediately re-load attributes on obj1, obj2
session.refresh(obj1)
session.refresh(obj2)

# expire objects obj1, obj2, attributes will be reloaded
# on the next access:
session.expire(obj1)
session.expire(obj2)

refresh() and expire() also support being passed a list of individual attribute names to be refreshed. These names can reference any attribute, whether column-based or relation-based:

# immediately re-load the attributes 'hello', 'world' on obj1, obj2
session.refresh(obj1, ['hello', 'world'])
session.refresh(obj2, ['hello', 'world'])

# expire the attributes 'hello', 'world' on objects obj1, obj2; the attributes
# will be reloaded on the next access:
session.expire(obj1, ['hello', 'world'])
session.expire(obj2, ['hello', 'world'])
back to section top

Cascades

Mappers support the concept of configurable cascade behavior on relation()s. This behavior controls how the Session should treat the instances that have a parent-child relationship with another instance that is operated upon by the Session. Cascade is indicated as a comma-separated list of string keywords, with the possible values all, delete, save-update, refresh-expire, merge, expunge, and delete-orphan.

Cascading is configured by setting the cascade keyword argument on a relation():

mapper(Order, order_table, properties={
    'items' : relation(Item, items_table, cascade="all, delete-orphan"),
    'customer' : relation(User, users_table, user_orders_table, cascade="save-update"),
})

The above mapper specifies two relations, items and customer. The items relationship specifies "all, delete-orphan" as its cascade value, indicating that all save, update, merge, expunge, refresh, delete and expire operations performed on a parent Order instance should also be performed on the child Item instances attached to it (save and update are cascaded using the save_or_update() method, so that the database identity of the instance doesn't matter). The delete-orphan cascade value additionally indicates that if an Item instance is no longer associated with an Order, it should also be deleted. The "all, delete-orphan" cascade argument allows a so-called lifecycle relationship between an Order and an Item object.

The customer relationship specifies only the "save-update" cascade value, indicating that most operations will not be cascaded from a parent Order instance to a child User instance, except when the Order is attached to a particular session via the save(), update(), or save_or_update() method.

Additionally, when a child item is attached to a parent item that specifies the "save-update" cascade value on the relationship, the child is automatically passed to save_or_update() (and the operation is further cascaded to the child item).

Note that cascading doesn't do anything that isn't possible by manually calling Session methods on individual instances within a hierarchy, it merely automates common operations on a group of associated instances.

The default value for cascade on relation()s is save-update, merge.
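As a brief sketch of the delete-orphan behavior described above, using the Order/Item mapping from the example:

order = session.query(Order).get(1)
item = order.items[0]

# remove the Item from its parent Order; it is now an "orphan"
order.items.remove(item)

# at flush time, the orphaned Item row is deleted
session.flush()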

back to section top

Managing Transactions

The Session can manage transactions automatically, including across multiple engines. When the Session is in a transaction, as it receives requests to execute SQL statements, it adds each individual Connection/Engine encountered to its transactional state. At commit time, all unflushed data is flushed, and each individual transaction is committed. If the underlying databases support two-phase semantics, this may be used by the Session as well if two-phase transactions are enabled.

The easiest way to use a Session with transactions is just to declare it as transactional. The session will remain in a transaction at all times:

# transactional session
Session = sessionmaker(transactional=True)
sess = Session()
try:
    item1 = sess.query(Item).get(1)
    item2 = sess.query(Item).get(2)
    item1.foo = 'bar'
    item2.bar = 'foo'

    # commit- will immediately go into a new transaction afterwards
    sess.commit()
except:
    # rollback - will immediately go into a new transaction afterwards.
    sess.rollback()

Things to note above: commit() issues a flush() of remaining pending changes before committing, and after either commit() or rollback(), the transactional session will begin a new transaction with its next database operation.

Alternatively, a transaction can be begun explicitly using begin():

# non transactional session
Session = sessionmaker(transactional=False)
sess = Session()
sess.begin()
try:
    item1 = sess.query(Item).get(1)
    item2 = sess.query(Item).get(2)
    item1.foo = 'bar'
    item2.bar = 'foo'
    sess.commit()
except:
    sess.rollback()
    raise

As with the transactional example, the same rules apply; an explicit rollback() or close() is required when an error occurs, and the commit() call issues a flush() as well.

Session also supports Python 2.5's with statement so that the example above can be written as:

Session = sessionmaker(transactional=False)
sess = Session()
with sess.begin():
    item1 = sess.query(Item).get(1)
    item2 = sess.query(Item).get(2)
    item1.foo = 'bar'
    item2.bar = 'foo'

Subtransactions can be created by calling the begin() method repeatedly. For each transaction you begin() you must always call either commit() or rollback(). Note that this includes the implicit transaction created by the transactional session. When a subtransaction is created, the current transaction of the session is set to that transaction. Committing the subtransaction will return you to the next outer transaction. Rolling it back will also return you to the next outer transaction, but in addition it will roll back database state to the innermost transaction that supports rolling back to. Usually this means the root transaction, unless you use the nested transaction functionality via the begin_nested() method. MySQL and Postgres (and soon Oracle) support "nested" transactions by creating SAVEPOINTs:

Session = sessionmaker(transactional=False)
sess = Session()
sess.begin()
sess.save(u1)
sess.save(u2)
sess.flush()

sess.begin_nested() # establish a savepoint
sess.save(u3)
sess.rollback()  # rolls back u3, keeps u1 and u2

sess.commit() # commits u1 and u2
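
A minimal sketch of plain subtransactions (no savepoints), following the begin()/commit() pairing rules described above:

sess = Session()    # non-transactional, as in the example above
sess.begin()        # outermost transaction
sess.begin()        # subtransaction

sess.save(u1)

sess.commit()       # ends the subtransaction, returning to the outer transaction
sess.commit()       # commits the outermost transaction to the database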

Finally, for MySQL, Postgres, and soon Oracle as well, the session can be instructed to use two-phase commit semantics. This will coordinate the committing of transactions across databases so that the transaction is either committed or rolled back in all databases. You can also prepare() the session for interacting with transactions not managed by SQLAlchemy. To use two-phase transactions, set the flag twophase=True on the session:

engine1 = create_engine('postgres://db1')
engine2 = create_engine('postgres://db2')

Session = sessionmaker(twophase=True, transactional=True)

# bind User operations to engine 1, Account operations to engine 2
Session.configure(binds={User:engine1, Account:engine2})

sess = Session()

# .... work with accounts and users

# commit.  session will issue a flush to all DBs, and a prepare step to all DBs,
# before committing both transactions
sess.commit()

Be aware that when a crash occurs in one of the databases while the transactions are prepared, you have to manually commit or roll back the prepared transactions in your database as appropriate.

back to section top

Embedding SQL Insert/Update Expressions into a Flush

This feature allows the value of a database column to be set to a SQL expression instead of a literal value. It's especially useful for atomic updates, calling stored procedures, etc. All you do is assign an expression to an attribute:

class SomeClass(object):
    pass
mapper(SomeClass, some_table)

someobject = session.query(SomeClass).get(5)

# set 'value' attribute to a SQL expression adding one
someobject.value = some_table.c.value + 1

# issues "UPDATE some_table SET value=value+1"
session.commit()

This works both for INSERT and UPDATE statements. After the flush/commit operation, the value attribute on someobject gets "deferred", so that when you again access it the newly generated value will be loaded from the database. This is the same mechanism at work when database-side column defaults fire off.
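Continuing the example above, the next access of the attribute loads the newly generated value:

# issues a SELECT to fetch the new 'value' from the database
print someobject.value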

back to section top

Using SQL Expressions with Sessions

SQL constructs and string statements can be executed via the Session. You'd want to do this normally when your Session is transactional and you'd like your free-standing SQL statements to participate in the same transaction.

The two ways to do this are to use the connection/execution services of the Session, or to have your Session participate in a regular SQL transaction.

First, a Session that's associated with an Engine or Connection can execute statements immediately (whether or not it's transactional):

Session = sessionmaker(bind=engine, transactional=True)
sess = Session()
result = sess.execute("select * from table where id=:id", {'id':7})
result2 = sess.execute(select([mytable], mytable.c.id==7))

To get at the current connection used by the session, which will be part of the current transaction if one is in progress, use connection():

connection = sess.connection()

A second scenario is that of a Session which is not directly bound to a connectable. This session executes statements relative to a particular Mapper, since the mappers are bound to tables which are in turn bound to connectables via their MetaData (either the session or the mapped tables need to be bound). In this case, the Session can conceivably be associated with multiple databases through different mappers; so it wants you to send along a mapper argument, which can be any mapped class or mapper instance:

# session is *not* bound to an engine or connection
Session = sessionmaker(transactional=True)
sess = Session()

# need to specify mapper or class when executing
result = sess.execute("select * from table where id=:id", {'id':7}, mapper=MyMappedClass)
result2 = sess.execute(select([mytable], mytable.c.id==7), mapper=MyMappedClass)

# need to specify mapper or class when you call connection()
connection = sess.connection(MyMappedClass)

The third scenario is when you are using Connection and Transaction yourself, and want the Session to participate. This is easy, as you just bind the Session to the connection:

# non-transactional session
Session = sessionmaker(transactional=False)

# non-ORM connection + transaction
conn = engine.connect()
trans = conn.begin()

# bind the Session *instance* to the connection
sess = Session(bind=conn)

# ... etc

trans.commit()

It's safe to use a Session which is transactional or autoflushing, as well as to call begin()/commit() on the session too; the outermost Transaction object, the one we declared explicitly, controls the scope of the transaction.

When using the threadlocal engine context, things are that much easier; the Session uses the same connection/transaction as everyone else in the current thread, whether or not you explicitly bind it:

engine = create_engine('postgres://mydb', strategy="threadlocal")
engine.begin()

sess = Session()  # session takes place in the transaction like everyone else

# ... go nuts

engine.commit() # commit the transaction
back to section top

Contextual/Thread-local Sessions

A common need in applications, particularly those built around web frameworks, is the ability to "share" a Session object among disparate parts of an application, without needing to pass the object explicitly to all method and function calls. What you're really looking for is some kind of "global" session object, or at least "global" to all the parts of an application which are tasked with servicing the current request. For this pattern, SQLAlchemy provides the ability to enhance the Session class generated by sessionmaker() to provide auto-contextualizing support. This means that whenever you create a Session instance with its constructor, you get an existing Session object which is bound to some "context". By default, this context is the current thread. This feature is what previously was accomplished using the sessioncontext SQLAlchemy extension.

Creating a Thread-local Context

The scoped_session() function wraps around the sessionmaker() function, and produces an object which behaves the same as the Session subclass returned by sessionmaker():

from sqlalchemy.orm import scoped_session, sessionmaker
Session = scoped_session(sessionmaker(autoflush=True, transactional=True))

However, when you instantiate this Session "class", in reality the object is pulled from a threadlocal variable, or if it doesn't exist yet, it's created using the underlying class generated by sessionmaker():

>>> # call Session() the first time.  the new Session instance is created.
>>> sess = Session()

>>> # later, in the same application thread, someone else calls Session()
>>> sess2 = Session()

>>> # the two Session objects are *the same* object
>>> sess is sess2
True

Since the Session() constructor now returns the same Session object every time within the current thread, the object returned by scoped_session() also implements most of the Session methods and properties at the "class" level, such that you don't even need to instantiate Session():

# create some objects
u1 = User()
u2 = User()

# save to the contextual session, without instantiating
Session.save(u1)
Session.save(u2)

# view the "new" attribute
assert u1 in Session.new

# flush changes (if not using autoflush)
Session.flush()

# commit transaction (if using a transactional session)
Session.commit()

To "dispose" of the Session, there's two general approaches. One is to close out the current session, but to leave it assigned to the current context. This allows the same object to be re-used on another operation. This may be called from a current, instantiated Session:

sess.close()

Or, when using scoped_session(), the close() method may also be called as a classmethod on the Session "class":

Session.close()

When the Session is closed, it remains attached, but clears all of its contents and releases any ongoing transactional resources, including rolling back any remaining transactional state. The Session can then be used again.

The other method is to remove the current session from the current context altogether. This is accomplished using the classmethod remove():

Session.remove()

After remove() is called, the next call to Session() will create a new Session object which then becomes the contextual session.

That, in a nutshell, is all there really is to it. Now for all the extra things one should know.

back to section top

Lifespan of a Contextual Session

A (really, really) common question is when does the contextual session get created, when does it get disposed ? We'll consider a typical lifespan as used in a web application:

Web Server          Web Framework        User-defined Controller Call
--------------      --------------       ------------------------------
web request    -> 
                    call controller ->   # call Session().  this establishes a new,
                                         # contextual Session.
                                         sess = Session()

                                         # load some objects, save some changes
                                         objects = sess.query(MyClass).all()

                                         # some other code calls Session, it's the 
                                         # same contextual session as "sess"
                                         sess2 = Session()
                                         sess2.save(foo)
                                         sess2.commit()

                                         # generate content to be returned
                                         return generate_content()
                    Session.remove() <-
web response   <-

Above, we illustrate a typical organization of duties, where the "Web Framework" layer has some integration built-in to manage the span of ORM sessions. Upon the initial handling of an incoming web request, the framework passes control to a controller. The controller then calls Session() when it wishes to work with the ORM; this method establishes the contextual Session which will remain until it's removed. Disparate parts of the controller code may all call Session() and will get the same session object. Then, when the controller has completed and the response is to be sent to the web server, the framework closes out the current contextual session, above using the remove() method which removes the session from the context altogether.

As an alternative, the "finalization" step can also call Session.close(), which will leave the same session object in place. Which one is better ? For a web framework which runs from a fixed pool of threads, it doesn't matter much. For a framework which runs a variable number of threads, or which creates and disposes of a thread for each request, remove() is better, since it leaves no resources associated with the thread which might not exist.

  • Why close out the session at all ? Why not just leave it going so the next request doesn't have to do as many queries ?

    There are some cases where you may actually want to do this. However, this is a special case where you are dealing with data which does not change very often, or you don't care about the "freshness" of the data. In reality, a single thread of a web server may, on a slow day, sit around for many minutes or even hours without being accessed. When it's next accessed, if data from the previous request still exists in the session, that data may be very stale indeed. So it's generally better to have an empty session at the start of a web request.

back to section top

Associating Classes and Mappers with a Contextual Session

Another luxury we gain, when we've established a Session() that can be globally accessed, is the ability for mapped classes and objects to provide us with session-oriented functionality automatically. When using the scoped_session() function, we access this feature using the mapper attribute on the object in place of the normal sqlalchemy.orm.mapper function:

# "contextual" mapper function
mapper = Session.mapper

# use normally
mapper(User, users_table, properties={
    'addresses':relation(Address)
})
mapper(Address, addresses_table)

When we use the contextual mapper() function, our User and Address now gain a new attribute query, which will create a Query object for us against the contextual session:

wendy = User.query.filter_by(name='wendy').one()

Auto-Save Behavior with Contextual Session's Mapper

By default, when using Session.mapper, new instances are saved into the contextual session automatically upon construction; there is no longer a need to call save():

>>> newuser = User(name='ed')
>>> newuser in Session.new
True

The auto-save functionality can cause problems, namely that any flush() which occurs before a newly constructed object is fully populated will result in that object being INSERTed without all of its attributes completed. As a flush() is more frequent when using sessions with autoflush=True, the auto-save behavior can be disabled, using the save_on_init=False flag:

# "contextual" mapper function
mapper = Session.mapper

# use normally, specify no save on init:
mapper(User, users_table, properties={
    'addresses':relation(Address)
}, save_on_init=False)
mapper(Address, addresses_table, save_on_init=False)

# objects now again require explicit "save"
>>> newuser = User(name='ed')
>>> newuser in Session.new
False

>>> Session.save(newuser)
>>> newuser in Session.new
True

The functionality of Session.mapper is an updated version of what used to be accomplished by the assignmapper() SQLAlchemy extension.

Generated docstrings for scoped_session()

back to section top

Partitioning Strategies

this section is TODO

Vertical Partitioning

Vertical partitioning places different kinds of objects, or different tables, across multiple databases.

engine1 = create_engine('postgres://db1')
engine2 = create_engine('postgres://db2')

Session = sessionmaker(twophase=True, transactional=True)

# bind User operations to engine 1, Account operations to engine 2
Session.configure(binds={User:engine1, Account:engine2})

sess = Session()
back to section top

Horizontal Partitioning

Horizontal partitioning partitions the rows of a single table (or a set of tables) across multiple databases.

See the "sharding" example in attribute_shard.py

back to section top

Extending Session

Extending the session can be achieved through subclassing, as well as through a simple extension class called SessionExtension, which resembles the style of Extending Mapper. See the docstrings for more information on this class' methods.

Basic usage is similar to MapperExtension:

class MySessionExtension(SessionExtension):
    def before_commit(self, session):
        print "before commit!"

Session = sessionmaker(extension=MySessionExtension())

or with create_session():

sess = create_session(extension=MySessionExtension())

The same SessionExtension instance can be used with any number of sessions.

back to section top

Database Engines

The Engine is the starting point for any SQLAlchemy application. It's "home base" for the actual database and its DBAPI, delivered to the SQLAlchemy application through a connection pool and a Dialect, which describes how to talk to a specific kind of database and DBAPI combination.

The general structure is this:

                                     +-----------+                        __________
                                 /---|   Pool    |---\                   (__________)
             +-------------+    /    +-----------+    \     +--------+   |          |
connect() <--|   Engine    |---x                       x----| DBAPI  |---| database |
             +-------------+    \    +-----------+    /     +--------+   |          |
                                 \---|  Dialect  |---/                   |__________|
                                     +-----------+                       (__________)

Where above, a sqlalchemy.engine.Engine references both a sqlalchemy.engine.Dialect and sqlalchemy.pool.Pool, which together interpret the DBAPI's module functions as well as the behavior of the database.

Creating an engine is just a matter of issuing a single call, create_engine():

engine = create_engine('postgres://scott:tiger@localhost:5432/mydatabase')

The above engine invokes the postgres dialect and a connection pool which references localhost:5432.

The engine can be used directly to issue SQL to the database. The most generic way is to use connections, which you get via the connect() method:

connection = engine.connect()
result = connection.execute("select username from users")
for row in result:
    print "username:", row['username']
connection.close()

The connection is an instance of sqlalchemy.engine.Connection, which is a proxy object for an actual DBAPI connection. The returned result is an instance of sqlalchemy.engine.ResultProxy, which acts very much like a DBAPI cursor.

When you say engine.connect(), a new Connection object is created, and a DBAPI connection is retrieved from the connection pool. Later, when you call connection.close(), the DBAPI connection is returned to the pool; nothing is actually "closed" from the perspective of the database.

To execute some SQL more quickly, you can skip the Connection part and just say:

result = engine.execute("select username from users")
for row in result:
    print "username:", row['username']
result.close()

Where above, the execute() method on the Engine does the connect() part for you, and returns the ResultProxy directly. The actual Connection is inside the ResultProxy, waiting for you to finish reading the result. In this case, when you close() the ResultProxy, the underlying Connection is closed, which returns the DBAPI connection to the pool.

To summarize the above two examples, when you use a Connection object, it's known as explicit execution. When you don't see the Connection object, but you still use the execute() method on the Engine, it's called explicit, connectionless execution. A third variant of execution also exists called implicit execution; this will be described later.

The Engine and Connection can do a lot more than what we illustrated above; SQL strings are only its most rudimentary function. Later chapters will describe how "constructed SQL" expressions can be used with engines; in many cases, you don't have to deal with the Engine at all after it's created. The Object Relational Mapper (ORM), an optional feature of SQLAlchemy, also uses the Engine in order to get at connections; that's also a case where you can often create the engine once, and then forget about it.

Supported Databases

Recall that the Dialect is used to describe how to talk to a specific kind of database. Dialects are included with SQLAlchemy for SQLite, Postgres, MySQL, MS-SQL, Firebird, Informix, and Oracle; these can each be seen as a Python module present in the sqlalchemy.databases package. Each dialect requires the appropriate DBAPI drivers to be installed separately.

Downloads for each DBAPI at the time of this writing are as follows:

The SQLAlchemy Wiki contains a page of database notes, describing whatever quirks and behaviors have been observed. It's a good place to check for issues with specific databases. See Database Notes.

back to section top

create_engine() URL Arguments

SQLAlchemy indicates the source of an Engine strictly via RFC-1738 style URLs, combined with optional keyword arguments to specify options for the Engine. The form of the URL is:

driver://username:password@host:port/database

Available drivernames are sqlite, mysql, postgres, oracle, mssql, and firebird. For sqlite, the database name is the filename to connect to, or the special name ":memory:" which indicates an in-memory database. The URL is typically sent as a string to the create_engine() function:

# postgres
pg_db = create_engine('postgres://scott:tiger@localhost:5432/mydatabase')

# sqlite (note the four slashes for an absolute path)
sqlite_db = create_engine('sqlite:////absolute/path/to/database.txt')
sqlite_db = create_engine('sqlite:///relative/path/to/database.txt')
sqlite_db = create_engine('sqlite://')  # in-memory database
sqlite_db = create_engine('sqlite://:memory:')  # the same

# mysql
mysql_db = create_engine('mysql://localhost/foo')

# oracle via TNS name
oracle_db = create_engine('oracle://scott:tiger@dsn')

# oracle will feed host/port/SID into cx_oracle.makedsn
oracle_db = create_engine('oracle://scott:tiger@127.0.0.1:1521/sidname')

The Engine will ask the connection pool for a connection when the connect() or execute() methods are called. The default connection pool, QueuePool, as well as the default connection pool used with SQLite, SingletonThreadPool, will open connections to the database on an as-needed basis. As concurrent statements are executed, QueuePool will grow its pool of connections to a default size of five, and will allow a default "overflow" of ten. Since the Engine is essentially "home base" for the connection pool, it follows that you should keep a single Engine per database established within an application, rather than creating a new one for each connection.
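If those defaults need adjusting, the pool's size and overflow can be passed to create_engine(); a minimal sketch (the specific numbers here are arbitrary):

engine = create_engine('postgres://scott:tiger@localhost:5432/mydatabase',
                       pool_size=10, max_overflow=20)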

Custom DBAPI connect() arguments

Custom arguments used when issuing the connect() call to the underlying DBAPI may be issued in three distinct ways. String-based arguments can be passed directly from the URL string as query arguments:

db = create_engine('postgres://scott:tiger@localhost/test?argument1=foo&argument2=bar')

If SQLAlchemy's database connector is aware of a particular query argument, it may convert its type from string to its proper type.

create_engine also takes an argument connect_args which is an additional dictionary that will be passed to connect(). This can be used when arguments of a type other than string are required, and SQLAlchemy's database connector has no type conversion logic present for that parameter:

db = create_engine('postgres://scott:tiger@localhost/test', connect_args = {'argument1':17, 'argument2':'bar'})

The most customizable connection method of all is to pass a creator argument, which specifies a callable that returns a DBAPI connection:

def connect():
    return psycopg.connect(user='scott', host='localhost')

db = create_engine('postgres://', creator=connect)
back to section top

Database Engine Options

Keyword options can also be specified to create_engine(), following the string URL as follows:

db = create_engine('postgres://...', encoding='latin1', echo=True)

A list of all standard options, as well as several that are used by particular database dialects, is as follows:

back to section top

More On Connections

Recall from the beginning of this section that the Engine provides a connect() method which returns a Connection object. Connection is a proxy object which maintains a reference to a DBAPI connection instance. The close() method on Connection does not actually close the DBAPI connection, but instead returns it to the connection pool referenced by the Engine. Connection will also automatically return its resources to the connection pool when the object is garbage collected, i.e. when its __del__() method is called. When using the standard C implementation of Python, this method is usually called as soon as the object is dereferenced. With other Python implementations such as Jython, this is not guaranteed.

The execute() methods on both Engine and Connection can also receive SQL clause constructs as well:

connection = engine.connect()
result = connection.execute(select([table1], table1.c.col1==5))
for row in result:
    print row['col1'], row['col2']
connection.close()

The above SQL construct is known as a select(). The full range of SQL constructs available are described in SQL Expression Language Tutorial.

Both Connection and Engine fulfill an interface known as Connectable which specifies common functionality between the two objects, namely being able to call connect() to return a Connection object (Connection just returns itself), and being able to call execute() to get a result set. Following this, most SQLAlchemy functions and objects which accept an Engine as a parameter or attribute with which to execute SQL will also accept a Connection. As of SQLAlchemy 0.3.9, this argument is named bind.

Specify Engine or Connection
engine = create_engine('sqlite:///:memory:')

# specify some Table metadata
metadata = MetaData()
table = Table('sometable', metadata, Column('col1', Integer))

# create the table with the Engine
table.create(bind=engine)

# drop the table with a Connection off the Engine
connection = engine.connect()
table.drop(bind=connection)

Connection facts:

back to section top

Using Transactions with Connection

The Connection object provides a begin() method which returns a Transaction object. This object is usually used within a try/except clause so that it is guaranteed to rollback() or commit():

trans = connection.begin()
try:
    r1 = connection.execute(table1.select())
    connection.execute(table1.insert(), col1=7, col2='this is some data')
    trans.commit()
except:
    trans.rollback()
    raise

The Transaction object also handles "nested" behavior by keeping track of the outermost begin/commit pair. In this example, two functions both issue a transaction on a Connection, but only the outermost Transaction object actually takes effect when it is committed.

# method_a starts a transaction and calls method_b
def method_a(connection):
    trans = connection.begin() # open a transaction
    try:
        method_b(connection)
        trans.commit()  # transaction is committed here
    except:
        trans.rollback() # this rolls back the transaction unconditionally
        raise

# method_b also starts a transaction
def method_b(connection):
    trans = connection.begin() # open a transaction - this runs in the context of method_a's transaction
    try:
        connection.execute("insert into mytable values ('bat', 'lala')")
        connection.execute(mytable.insert(), col1='bat', col2='lala')
        trans.commit()  # transaction is not committed yet
    except:
        trans.rollback() # this rolls back the transaction unconditionally
        raise

# open a Connection and call method_a
conn = engine.connect()                
method_a(conn)
conn.close()

Above, method_a is called first, which calls connection.begin(). Then it calls method_b. When method_b calls connection.begin(), it just increments a counter that is decremented when it calls commit(). If either method_a or method_b calls rollback(), the whole transaction is rolled back. The transaction is not committed until method_a calls the commit() method. This "nesting" behavior allows the creation of functions which "guarantee" that a transaction will be used if one was not already available, but will automatically participate in an enclosing transaction if one exists.

Note that SQLAlchemy's Object Relational Mapper also provides a way to control transaction scope at a higher level; this is described in unitofwork_transaction.

Transaction Facts:

Understanding Autocommit

The above transaction example illustrates how to use Transaction so that several executions can take part in the same transaction. What happens when we issue an INSERT, UPDATE or DELETE call without using Transaction? The answer is autocommit. While many DBAPIs implement a flag called autocommit, the current SQLAlchemy behavior is such that it implements its own autocommit. This is achieved by searching the statement for strings like INSERT, UPDATE, DELETE, etc. and then issuing a COMMIT automatically if no transaction is in progress.

conn = engine.connect()
conn.execute("INSERT INTO users VALUES (1, 'john')")  # autocommits
back to section top

Connectionless Execution, Implicit Execution

Recall from the first section we mentioned executing with and without a Connection. Connectionless execution refers to calling the execute() method on an object which is not a Connection, which could be on the Engine itself, or could be a constructed SQL object. When we say "implicit", we mean that we are calling the execute() method on an object which is neither a Connection nor an Engine object; this can only be used with constructed SQL objects which have their own execute() method, and can be "bound" to an Engine. A description of "constructed SQL objects" may be found in SQL Expression Language Tutorial.

A summary of all three methods follows below. First, assume the usage of the following MetaData and Table objects; while we haven't yet introduced these concepts, for now you only need to know that we are representing a database table, and are creating an "executable" SQL construct which issues a statement to the database. These objects are described in Database Meta Data.

meta = MetaData()
users_table = Table('users', meta, 
    Column('id', Integer, primary_key=True), 
    Column('name', String(50))
)

Explicit execution delivers the SQL text or constructed SQL expression to the execute() method of Connection:

engine = create_engine('sqlite:///file.db')
connection = engine.connect()
result = connection.execute(users_table.select())
for row in result:
    # ....
connection.close()

Explicit, connectionless execution delivers the expression to the execute() method of Engine:

engine = create_engine('sqlite:///file.db')
result = engine.execute(users_table.select())
for row in result:
    # ....
result.close()

Implicit execution is also connectionless, and calls the execute() method on the expression itself, utilizing the fact that either an Engine or Connection has been bound to the expression object (binding is discussed further in the next section, Database Meta Data):

engine = create_engine('sqlite:///file.db')
meta.bind = engine
result = users_table.select().execute()
for row in result:
    # ....
result.close()

In both "connectionless" examples, the Connection is created behind the scenes; the ResultProxy returned by the execute() call references the Connection used to issue the SQL statement. When we issue close() on the ResultProxy, or if the result set object falls out of scope and is garbage collected, the underlying Connection is closed for us, resulting in the DBAPI connection being returned to the pool.

Using the Threadlocal Execution Strategy

With connectionless execution, each returned ResultProxy object references its own distinct DBAPI connection object. This means that multiple executions will result in multiple DBAPI connections being used at the same time; the example below illustrates this:

db = create_engine('mysql://localhost/test')

# execute one statement and receive results.  r1 now references a DBAPI connection resource.
r1 = db.execute("select * from table1")

# execute a second statement and receive results.  r2 now references a *second* DBAPI connection resource.
r2 = db.execute("select * from table2")
for row in r1:
    ...
for row in r2:
    ...
# release connection 1
r1.close()

# release connection 2
r2.close()

Where above, we have two result sets in scope at the same time, therefore we have two distinct DBAPI connections, both separately checked out from the connection pool, in scope at the same time.

An option exists to create_engine() called strategy="threadlocal", which changes this behavior. When this option is used, the Engine which is returned by create_engine() is a special subclass of engine called TLEngine. This engine, when it creates the Connection used by a connectionless execution, checks a threadlocal variable for an existing DBAPI connection that was already checked out from the pool, within the current thread. If one exists, it uses that one.

The usage of "threadlocal" modifies the underlying behavior of our example above, as follows:

Threadlocal Strategy
db = create_engine('mysql://localhost/test', strategy='threadlocal')

# execute one statement and receive results.  r1 now references a DBAPI connection resource.
r1 = db.execute("select * from table1")

# execute a second statement and receive results.  r2 now references the *same* resource as r1
r2 = db.execute("select * from table2")

for row in r1:
    ...
for row in r2:
    ...
# close r1.  the connection is still held by r2.
r1.close()

# close r2.  with no more references to the underlying connection resources, they
# are returned to the pool.
r2.close()

Where above, we again have two result sets in scope at the same time, but because they are present in the same thread, there is only one DBAPI connection in use.

While the above distinction may not seem like much, it has several potentially desirable effects. One is that you can in some cases reduce the number of concurrent connections checked out from the connection pool, in the case that a ResultProxy is still open when a second statement is issued. A second advantage is that by limiting a thread to just one checked-out connection, you eliminate the possibility of deadlocks within a single thread. For example, if connection A locks a table and connection B in the same thread attempts to read from that table, connection B would "deadlock" waiting for connection A to release its lock; the threadlocal strategy eliminates this possibility.

A third advantage to the threadlocal strategy is that it allows the Transaction object to be used in combination with connectionless execution. Recall from the section on transactions, that the Transaction is returned by the begin() method on a Connection; all statements which wish to participate in this transaction must be executed by the same Connection, thereby forcing the usage of an explicit connection. However, the TLEngine provides a Transaction that is local to the current thread; using it, one can issue many "connectionless" statements within a thread and they will all automatically partake in the current transaction, as in the example below:

threadlocal connection sharing
# get a TLEngine
engine = create_engine('mysql://localhost/test', strategy='threadlocal')

engine.begin()
try:
    engine.execute("insert into users values (?, ?)", 1, "john")
    users.update(users.c.user_id==5).execute(name='ed')
    engine.commit()
except:
    engine.rollback()

Notice that no Connection needed to be used; the begin() method on TLEngine (which note is not available on the regular Engine) created a Transaction as well as a Connection, and held onto both in a context corresponding to the current thread. Each execute() call made use of the same connection, allowing them all to participate in the same transaction.

Complex application flows can take advantage of the "threadlocal" strategy in order to allow many disparate parts of an application to take place in the same transaction automatically. The example below demonstrates several forms of "connectionless execution" as well as some specialized explicit ones:

threadlocal connection sharing
engine = create_engine('mysql://localhost/test', strategy='threadlocal')

def dosomethingimplicit():
    table1.execute("some sql")
    table1.execute("some other sql")

def dosomethingelse():
    table2.execute("some sql")
    conn = engine.contextual_connect()
    # do stuff with conn
    conn.execute("some other sql")
    conn.close()

def dosomethingtransactional():
    conn = engine.contextual_connect()
    trans = conn.begin()
    # do stuff
    trans.commit()

engine.begin()
try:
    dosomethingimplicit()
    dosomethingelse()
    dosomethingtransactional()
    engine.commit()
except:
    engine.rollback()

In the above example, the program calls three functions dosomethingimplicit(), dosomethingelse() and dosomethingtransactional(). All three functions use either connectionless execution, or a special function contextual_connect() which we will describe in a moment. These two styles of execution both indicate that all executions will use the same connection object. Additionally, the method dosomethingtransactional() begins and commits its own Transaction. But only one transaction is used, too; it's controlled completely by the engine.begin()/engine.commit() calls at the bottom. Recall that Transaction supports "nesting" behavior, whereby transactions begun on a Connection which already has a transaction open, will "nest" into the enclosing transaction. Since the transaction opened in dosomethingtransactional() occurs using the same connection which already has a transaction begun, it "nests" into that transaction and therefore has no effect on the actual transaction scope (unless it calls rollback()).

Some of the functions in the above example make use of a method called engine.contextual_connect(). This method is available on both Engine as well as TLEngine, and returns the Connection that applies to the current connection context. When using the TLEngine, this is just another term for the "thread local connection" that is being used for all connectionless executions. When using just the regular Engine (i.e. the "default" strategy), contextual_connect() is synonymous with connect(). Below we illustrate that two connections opened via contextual_connect() at the same time, both reference the same underlying DBAPI connection:

Contextual Connection
# threadlocal strategy
db = create_engine('mysql://localhost/test', strategy='threadlocal')

conn1 = db.contextual_connect()
conn2 = db.contextual_connect()

>>> conn1.connection is conn2.connection
True
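For contrast, a quick sketch of the same check using the "default" strategy, where contextual_connect() is simply a synonym for connect(); assuming the pool has more than one connection available, each call checks out a distinct DBAPI connection:

# default strategy
db = create_engine('mysql://localhost/test')

conn1 = db.contextual_connect()
conn2 = db.contextual_connect()

>>> conn1.connection is conn2.connection
False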

The basic idea of contextual_connect() is that it's the "connection used by connectionless execution". It's different from the connect() method in that connect() is always used when handling an explicit Connection, which will always reference a distinct DBAPI connection. Using connect() in combination with TLEngine allows one to "circumvent" the current thread-local context, as in this example where a single statement sends data to the database outside of the current transaction:

engine.begin()
engine.execute("insert into users values (?, ?)", 1, "john")
connection = engine.connect()
connection.execute(users.update(users.c.user_id==5), name='ed')
engine.rollback()

In the above example, a thread-local transaction is begun, but is later rolled back. The statement insert into users values (?, ?) is executed without using a connection, therefore uses the thread-local transaction. So its data is rolled back when the transaction is rolled back. However, the users.update() statement is executed using a distinct Connection returned by the engine.connect() method, so it therefore is not part of the threadlocal transaction; it autocommits immediately.

back to section top

Configuring Logging

As of the 0.3 series of SQLAlchemy, Python's standard logging module is used to implement informational and debug log output. This allows SQLAlchemy's logging to integrate in a standard way with other applications and libraries. The echo and echo_pool flags that are present on create_engine(), as well as the echo_uow flag used on Session, all interact with regular loggers.

This section assumes familiarity with the above linked logging module. All logging performed by SQLAlchemy exists underneath the sqlalchemy namespace, as used by logging.getLogger('sqlalchemy'). When logging has been configured (i.e. such as via logging.basicConfig()), the main SA loggers that can be turned on include sqlalchemy.engine (SQL statement and result-set logging), sqlalchemy.pool (connection pool checkouts and checkins), and sqlalchemy.orm (ORM configuration and operation, including sub-loggers such as sqlalchemy.orm.unitofwork).

For example, to log SQL queries as well as unit of work debugging:

import logging

logging.basicConfig()
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)
logging.getLogger('sqlalchemy.orm.unitofwork').setLevel(logging.DEBUG)

By default, the log level is set to logging.ERROR within the entire sqlalchemy namespace so that no log operations occur, even within an application that has logging enabled otherwise.

The echo flags present as keyword arguments to create_engine() and others, as well as the echo property on Engine, when set to True, will first attempt to ensure that logging is enabled. Unfortunately, the logging module provides no way of determining if output has already been configured (note we are referring to whether a logging configuration has been set up, not just that the logging level is set). For this reason, any echo=True flag will result in a call to logging.basicConfig() using sys.stdout as the destination, and will also set up a default format using the level name, timestamp, and logger name. Note that this configuration has the effect of being added on top of any existing logger configurations. Therefore, when using Python logging, ensure all echo flags are set to False at all times, to avoid getting duplicate log lines.
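As a sketch of the recommended approach, configure the logging module yourself at application startup and leave echo at its default of False; the engine URL below is just a placeholder:

import logging
from sqlalchemy import create_engine

# configure logging once, with handlers and format under your control
logging.basicConfig()
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)

# echo stays False so no second handler/format is added by SQLAlchemy
engine = create_engine('sqlite://', echo=False)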

back to section top

Describing Databases with MetaData

The core of SQLAlchemy's query and object mapping operations are supported by database metadata, which is comprised of Python objects that describe tables and other schema-level objects. These objects can be created by explicitly naming the various components and their properties, using the Table, Column, ForeignKey, Index, and Sequence objects imported from sqlalchemy.schema. There is also support for reflection of some entities, which means you only specify the name of the entities and they are recreated from the database automatically.

A collection of metadata entities is stored in an object aptly named MetaData:

from sqlalchemy import *

metadata = MetaData()

To represent a Table, use the Table class:

users = Table('users', metadata, 
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
    Column('email_address', String(60), key='email'),
    Column('password', String(20), nullable = False)
)

user_prefs = Table('user_prefs', metadata, 
    Column('pref_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id"), nullable=False),
    Column('pref_name', String(40), nullable=False),
    Column('pref_value', String(100))
)

The specific datatypes for each Column, such as Integer, String, etc. are described in The Types System, and exist within the module sqlalchemy.types as well as the global sqlalchemy namespace.

Foreign keys are most easily specified by the ForeignKey object within a Column object. For a composite foreign key, i.e. a foreign key whose multiple columns reference a composite primary key, an explicit syntax is provided which allows the correct table CREATE statements to be generated:

# a table with a composite primary key
invoices = Table('invoices', metadata, 
    Column('invoice_id', Integer, primary_key=True),
    Column('ref_num', Integer, primary_key=True),
    Column('description', String(60), nullable=False)
)

# a table with a composite foreign key referencing the parent table
invoice_items = Table('invoice_items', metadata, 
    Column('item_id', Integer, primary_key=True),
    Column('item_name', String(60), nullable=False),
    Column('invoice_id', Integer, nullable=False),
    Column('ref_num', Integer, nullable=False),
    ForeignKeyConstraint(['invoice_id', 'ref_num'], ['invoices.invoice_id', 'invoices.ref_num'])
)

Above, the invoice_items table will have ForeignKey objects automatically added to the invoice_id and ref_num Column objects as a result of the additional ForeignKeyConstraint object.

The MetaData object supports some handy methods, such as getting a list of Tables in the order (or reverse) of their dependency:

>>> for t in metadata.table_iterator(reverse=False):
...    print t.name
users
user_prefs

And Table provides an interface to the table's properties as well as that of its columns:

employees = Table('employees', metadata, 
    Column('employee_id', Integer, primary_key=True),
    Column('employee_name', String(60), nullable=False, key='name'),
    Column('employee_dept', Integer, ForeignKey("departments.department_id"))
)

# access the column "EMPLOYEE_ID":
employees.columns.employee_id

# or just
employees.c.employee_id

# via string
employees.c['employee_id']

# iterate through all columns
for c in employees.c:
    print c

# get the table's primary key columns
for primary_key in employees.primary_key:
    print primary_key

# get the table's foreign key objects:
for fkey in employees.foreign_keys:
    print fkey

# access the table's MetaData:
employees.metadata

# access the table's bound Engine or Connection, if its MetaData is bound:
employees.bind

# access a column's name, type, nullable, primary key, foreign key
employees.c.employee_id.name
employees.c.employee_id.type
employees.c.employee_id.nullable
employees.c.employee_id.primary_key
employees.c.employee_dept.foreign_key

# get the "key" of a column, which defaults to its name, but can 
# be any user-defined string:
employees.c.name.key

# access a column's table:
>>> employees.c.employee_id.table is employees
True

# get the table related by a foreign key
ftable = employees.c.employee_dept.foreign_key.column.table

Binding MetaData to an Engine or Connection

A MetaData object can be associated with an Engine or an individual Connection; this process is called binding. An engine or a connection is often referred to collectively as a connectable. Binding allows the MetaData and the elements which it contains to perform operations against the database directly, using the connection resources to which it's bound. Common operations which are made more convenient through binding include being able to generate SQL constructs which know how to execute themselves, creating Table objects which query the database for their column and constraint information, and issuing CREATE or DROP statements.

To bind MetaData to an Engine, use the bind attribute:

engine = create_engine('sqlite://', **kwargs)

# create MetaData 
meta = MetaData()

# bind to an engine
meta.bind = engine

Once this is done, the MetaData and its contained Table objects can access the database directly:

meta.create_all()  # issue CREATE statements for all tables

# describe a table called 'users', query the database for its columns
users_table = Table('users', meta, autoload=True)

# generate a SELECT statement and execute
result = users_table.select().execute()

Note that the feature of binding engines is completely optional. All of the operations which take advantage of "bound" MetaData also can be given an Engine or Connection explicitly with which to perform the operation. The equivalent "non-bound" of the above would be:

meta.create_all(engine)  # issue CREATE statements for all tables

# describe a table called 'users',  query the database for its columns
users_table = Table('users', meta, autoload=True, autoload_with=engine)

# generate a SELECT statement and execute
result = engine.execute(users_table.select())
back to section top

Reflecting Tables

A Table object can be created without specifying any of its contained attributes, using the argument autoload=True in conjunction with the table's name and possibly its schema (if not the database's "default" schema). (You can also specify a list or set of column names to autoload as the kwarg include_columns, if you only want to load a subset of the columns in the actual database.) This will issue the appropriate queries to the database in order to locate all properties of the table required for SQLAlchemy to use it effectively, including its column names and datatypes, foreign and primary key constraints, and in some cases its default-value generating attributes. To use autoload=True, the table's MetaData object must be bound to an Engine or Connection, or alternatively the autoload_with=<some connectable> argument can be passed. Below we illustrate autoloading a table and then iterating through the names of its columns:

>>> messages = Table('messages', meta, autoload=True)
>>> [c.name for c in messages.columns]
['message_id', 'message_name', 'date']

Note that if a reflected table has a foreign key referencing another table, the related Table object will be automatically created within the MetaData object if it does not exist already. Below, suppose table shopping_cart_items references a table shopping_carts. After reflecting, the shopping carts table is present:

>>> shopping_cart_items = Table('shopping_cart_items', meta, autoload=True)
>>> 'shopping_carts' in meta.tables
True

To get direct access to 'shopping_carts', simply instantiate it via the Table constructor. Table uses a special constructor that returns the already-created Table instance if one with the given name is already present:

shopping_carts = Table('shopping_carts', meta)

Of course, it's a good idea to use autoload=True with the above table regardless; that way, if it hadn't been loaded already, the operation will load it. The autoload operation occurs only if the table hasn't already been loaded; once loaded, new calls to Table with the same name will not re-issue any reflection queries.
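As a brief illustration of that caching behavior, continuing the reflected example above, a second call to Table with the same name simply hands back the Table already collected in the MetaData:

# no new reflection queries are issued; the existing Table object is returned
carts_again = Table('shopping_carts', meta, autoload=True)
assert carts_again is shopping_carts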

Overriding Reflected Columns

Individual columns can be overridden with explicit values when reflecting tables; this is handy for specifying custom datatypes, constraints such as primary keys that may not be configured within the database, etc.

>>> mytable = Table('mytable', meta,
... Column('id', Integer, primary_key=True),   # override reflected 'id' to have primary key
... Column('mydata', Unicode(50)),    # override reflected 'mydata' to be Unicode
... autoload=True)
back to section top

Specifying the Schema Name

Some databases support the concept of multiple schemas. A Table can reference this by specifying the schema keyword argument:

financial_info = Table('financial_info', meta,
    Column('id', Integer, primary_key=True),
    Column('value', String(100), nullable=False),
    schema='remote_banks'
)

Within the MetaData collection, this table will be identified by the combination of financial_info and remote_banks. If another table called financial_info is referenced without the remote_banks schema, it will refer to a different Table. ForeignKey objects can reference columns in this table using the form remote_banks.financial_info.id.
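As a sketch of that ForeignKey form, here is a hypothetical table (not part of the example above) in the default schema which references the schema-qualified financial_info table:

transactions = Table('transactions', meta,
    Column('id', Integer, primary_key=True),
    Column('financial_info_id', Integer, ForeignKey('remote_banks.financial_info.id'))
)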

back to section top

ON UPDATE and ON DELETE

ON UPDATE and ON DELETE clauses for a table's foreign key constraints are specified within the ForeignKeyConstraint object, using the onupdate and ondelete keyword arguments:

foobar = Table('foobar', meta,
    Column('id', Integer, primary_key=True),
    Column('lala', String(40)),
    ForeignKeyConstraint(['lala'],['hoho.lala'], onupdate="CASCADE", ondelete="CASCADE"))

Note that these clauses are not supported on SQLite, and require InnoDB tables when used with MySQL. They may also not be supported on other databases.

back to section top

Other Options

Tables may support database-specific options, such as MySQL's engine option that can specify "MyISAM", "InnoDB", and other backends for the table:

addresses = Table('engine_email_addresses', meta,
    Column('address_id', Integer, primary_key = True),
    Column('remote_user_id', Integer, ForeignKey(users.c.user_id)),
    Column('email_address', String(20)),
    mysql_engine='InnoDB'
)
back to section top

Creating and Dropping Database Tables

Creating and dropping individual tables can be done via the create() and drop() methods of Table; these methods take an optional bind parameter which references an Engine or a Connection. If not supplied, the Engine bound to the MetaData will be used; if the MetaData is not bound either, an error is raised:

meta = MetaData()
meta.bind = 'sqlite:///:memory:'

employees = Table('employees', meta, 
    Column('employee_id', Integer, primary_key=True),
    Column('employee_name', String(60), nullable=False, key='name'),
    Column('employee_dept', Integer, ForeignKey("departments.department_id"))
)
employees.create()

Dropping a table is similar, using the drop() method:

employees.drop(bind=e)

The create() and drop() methods also support an optional keyword argument checkfirst which will issue the database's appropriate pragma statements to check if the table exists before creating or dropping:

employees.create(bind=e, checkfirst=True)
employees.drop(checkfirst=False)

Entire groups of Tables can be created and dropped directly from the MetaData object with create_all() and drop_all(). These methods always check for the existence of each table before creating or dropping. Each method takes an optional bind keyword argument which can reference an Engine or a Connection. If no engine is specified, the underlying bound Engine, if any, is used:

engine = create_engine('sqlite:///:memory:')

metadata = MetaData()

users = Table('users', metadata, 
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
    Column('email_address', String(60), key='email'),
    Column('password', String(20), nullable = False)
)

user_prefs = Table('user_prefs', metadata, 
    Column('pref_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id"), nullable=False),
    Column('pref_name', String(40), nullable=False),
    Column('pref_value', String(100))
)

metadata.create_all(bind=engine)
back to section top

Column Insert/Update Defaults

SQLAlchemy includes several constructs which provide default values during INSERT and UPDATE statements. The defaults may be provided as Python constants, Python functions, or SQL expressions, and the SQL expressions themselves may be "pre-executed", executed inline within the insert/update statement itself, or created as a SQL level "default" placed on the table definition itself. A "default" value by definition is only invoked if no explicit value is passed into the INSERT or UPDATE statement.

Pre-Executed Python Functions

The "default" keyword argument on Column can reference a Python value or callable which is invoked at the time of an insert:

# a function which counts upwards
i = 0
def mydefault():
    global i
    i += 1
    return i

t = Table("mytable", meta, 
    # function-based default
    Column('id', Integer, primary_key=True, default=mydefault),

    # a scalar default
    Column('key', String(10), default="default")
)

Similarly, the "onupdate" keyword does the same thing for update statements:

import datetime

t = Table("mytable", meta, 
    Column('id', Integer, primary_key=True),

    # define 'last_updated' to be populated with datetime.now()
    Column('last_updated', DateTime, onupdate=datetime.datetime.now),
)
back to section top

Pre-executed and Inline SQL Expressions

The "default" and "onupdate" keywords may also be passed SQL expressions, including select statements or direct function calls:

t = Table("mytable", meta, 
    Column('id', Integer, primary_key=True),

    # define 'create_date' to default to now()
    Column('create_date', DateTime, default=func.now()),

    # define 'key' to pull its default from the 'keyvalues' table
    Column('key', String(20), default=keyvalues.select(keyvalues.c.type=='type1', limit=1)),

    # define 'last_modified' to use the current_timestamp SQL function on update
    Column('last_modified', DateTime, onupdate=func.current_timestamp())
    )

The above SQL functions are usually executed "inline" with the INSERT or UPDATE statement being executed. In some cases, the function is "pre-executed" and its result pre-fetched explicitly. This happens under the following circumstances:

  • the column is a primary key column

  • the database dialect does not support a usable cursor.lastrowid accessor (or equivalent); this currently includes Postgres, Oracle, and Firebird.

  • the statement is a single execution, i.e. only supplies one set of parameters and doesn't use "executemany" behavior

  • the inline=True flag is not set on the Insert() or Update() construct.

For a statement execution which is not an executemany, the returned ResultProxy will contain a collection accessible via result.postfetch_cols() which contains a list of all Column objects which had an inline-executed default. Similarly, all parameters which were bound to the statement, including all Python and SQL expressions which were pre-executed, are present in the last_inserted_params() or last_updated_params() collections on ResultProxy. The last_inserted_ids() collection contains a list of primary key values for the row inserted.
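A minimal sketch of inspecting these collections, assuming the "mytable" Table t defined above and a MetaData bound to an engine:

# a single (non-executemany) INSERT; SQL defaults such as 'create_date' may be
# pre-executed or executed inline depending on the rules above
result = t.insert().execute(key='some key')

# primary key values generated for the inserted row
print result.last_inserted_ids()

# all bound parameters, including pre-executed Python and SQL expression defaults
print result.last_inserted_params()

# Column objects whose defaults were executed inline rather than pre-fetched
print result.postfetch_cols()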

back to section top

DDL-Level Defaults

A variant on a SQL expression default is the PassiveDefault, which gets placed in the CREATE TABLE statement during a create() operation:

t = Table('test', meta, 
    Column('mycolumn', DateTime, PassiveDefault(text("sysdate")))
)

A create call for the above table will produce:

CREATE TABLE test (
    mycolumn datetime default sysdate
)

The behavior of PassiveDefault is similar to that of a regular SQL default; if it's placed on a primary key column for a database which doesn't have a way to "postfetch" the ID, and the statement is not "inlined", the SQL expression is pre-executed; otherwise, SQLAlchemy lets the default fire off on the database side normally.

back to section top

Defining Sequences

A table with a sequence looks like:

table = Table("cartitems", meta, 
    Column("cart_id", Integer, Sequence('cart_id_seq'), primary_key=True),
    Column("description", String(40)),
    Column("createdate", DateTime())
)

The Sequence object works a lot like the default keyword on Column, except that it only takes effect on a database which supports sequences. When used with a database that does not support sequences, the Sequence object has no effect; therefore it's safe to place on a table which is used against multiple database backends. The same rules for pre- and inline execution apply.

When the Sequence is associated with a table, CREATE and DROP statements issued for that table will also issue CREATE/DROP for the sequence object as well, thus "bundling" the sequence object with its parent table.

The flag optional=True on Sequence will produce a sequence that is only used on databases which have no "autoincrementing" capability. For example, Postgres supports primary key generation using the SERIAL keyword, whereas Oracle has no such capability. Therefore, a Sequence placed on a primary key column with optional=True will be used with an Oracle backend but not with Postgres.
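A minimal sketch of optional=True, reusing the cartitems table above; on Postgres the integer primary key would simply use SERIAL, while on Oracle the sequence would be used to generate the value:

table = Table("cartitems", meta,
    Column("cart_id", Integer, Sequence('cart_id_seq', optional=True), primary_key=True),
    Column("description", String(40)),
    Column("createdate", DateTime())
)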

A sequence can also be executed standalone, using an Engine or Connection, returning its next value in a database-independent fashion:

seq = Sequence('some_sequence')
nextid = connection.execute(seq)
back to section top

Defining Constraints and Indexes

UNIQUE Constraint

Unique constraints can be created anonymously on a single column using the unique keyword on Column. Explicitly named unique constraints and/or those with multiple columns are created via the UniqueConstraint table-level construct.

meta = MetaData()
mytable = Table('mytable', meta,

    # per-column anonymous unique constraint
    Column('col1', Integer, unique=True),

    Column('col2', Integer),
    Column('col3', Integer),

    # explicit/composite unique constraint.  'name' is optional.
    UniqueConstraint('col2', 'col3', name='uix_1')
    )
back to section top

CHECK Constraint

Check constraints can be named or unnamed and can be created at the Column or Table level, using the CheckConstraint construct. The text of the check constraint is passed directly through to the database, so there is limited "database independent" behavior. Column level check constraints generally should only refer to the column to which they are placed, while table level constraints can refer to any columns in the table.

Note that some databases, such as MySQL and SQLite, do not actively support check constraints.

meta = MetaData()
mytable = Table('mytable', meta,

    # per-column CHECK constraint
    Column('col1', Integer, CheckConstraint('col1>5')),

    Column('col2', Integer),
    Column('col3', Integer),

    # table level CHECK constraint.  'name' is optional.
    CheckConstraint('col2 > col3 + 5', name='check1')
    )
back to section top

Indexes

Indexes can be created anonymously (using an auto-generated name "ix_") for a single column using the inline index keyword on Column, which also modifies the usage of unique to apply the uniqueness to the index itself, instead of adding a separate UNIQUE constraint. For indexes with specific names or which encompass more than one column, use the Index construct, which requires a name.

Note that the Index construct is created externally to the table to which it corresponds, using Column objects and not strings.

meta = MetaData()
mytable = Table('mytable', meta,
    # an indexed column, with index "ix_mytable_col1"
    Column('col1', Integer, index=True),

    # a uniquely indexed column with index "ix_mytable_col2"
    Column('col2', Integer, index=True, unique=True),

    Column('col3', Integer),
    Column('col4', Integer),

    Column('col5', Integer),
    Column('col6', Integer),
    )

# place an index on col3, col4
Index('idx_col34', mytable.c.col3, mytable.c.col4)

# place a unique index on col5, col6
Index('myindex', mytable.c.col5, mytable.c.col6, unique=True)

The Index objects will be created along with the CREATE statements for the table itself. An index can also be created on its own independently of the table:

# create a table
sometable.create()

# define an index
i = Index('someindex', sometable.c.col5)

# create the index; uses the table's bound connectable if the `bind` keyword argument is not specified
i.create()
back to section top

Adapting Tables to Alternate Metadata

A Table object created against a specific MetaData object can be re-created against a new MetaData using the tometadata method:

# create two metadata
meta1 = MetaData('sqlite:///querytest.db')
meta2 = MetaData()

# load 'users' from the sqlite engine
users_table = Table('users', meta1, autoload=True)

# create the same Table object for the plain metadata
users_table_2 = users_table.tometadata(meta2)
back to section top

The Types System

The package sqlalchemy.types defines the datatype identifiers which may be used when defining metadata. This package includes a set of generic types, a set of SQL-specific subclasses of those types, and a small extension system used by specific database connectors to adapt these generic types into database-specific type objects.

Built-in Types

SQLAlchemy comes with a set of standard generic datatypes, which are defined as classes. Types are usually used when defining tables, and can be left as a class or instantiated, for example:

mytable = Table('mytable', metadata,
    Column('myid', Integer, primary_key=True),
    Column('data', String(30)),
    Column('info', Unicode(100)),
    Column('value', Numeric(7,4))
    )

Following is a rundown of the standard types.

String

This type is the base type for all string and character types, such as Unicode, TEXT, CLOB, etc. By default it generates a VARCHAR in DDL. It includes an argument length, which indicates the length in characters of the type, as well as convert_unicode and assert_unicode, which are booleans. length will be used as the length argument when generating DDL. If length is omitted, the String type resolves into the TEXT type.

convert_unicode=True indicates that incoming strings, if they are Python unicode strings, will be encoded into a raw bytestring using the encoding attribute of the dialect (defaults to utf-8). Similarly, raw bytestrings coming back from the database will be decoded into unicode objects on the way back.

assert_unicode is set to None by default. When True, it indicates that incoming bind parameters will be checked to ensure they are in fact unicode objects; if not, an error is raised. A value of 'warn' raises a warning instead. Leaving it at None indicates that the dialect-level convert_unicode setting should take place, whereas setting it to False disables the check unconditionally (this flag is new as of version 0.4.2).

Both convert_unicode and assert_unicode may be set at the engine level as flags to create_engine().
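As a small sketch pulling these flags together (the table and column names here are arbitrary):

docs = Table('docs', metadata,
    Column('id', Integer, primary_key=True),

    # length given: renders VARCHAR(50) in DDL
    Column('name', String(50)),

    # no length given: resolves into the TEXT type
    Column('body', String()),

    # encode/decode Python unicode objects using the dialect's encoding
    Column('title', String(100, convert_unicode=True))
)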

back to section top

Unicode

The Unicode type is shorthand for String with convert_unicode=True and assert_unicode='warn'. When writing a Unicode-aware application, it is strongly recommended that this type is used, and that only Unicode strings are used in the application. By "Unicode string" we mean a string with a u prefix, i.e. u'hello'. Otherwise, particularly when using the ORM, data will be converted to Unicode when it returns from the database, but data which was generated locally will not be, which can create confusion.

back to section top

Text / UnicodeText

These are the "unbounded" versions of String and Unicode. They have no "length" parameter, and generate a column type of TEXT or CLOB.

back to section top

Numeric

Numeric types return decimal.Decimal objects by default. The flag asdecimal=False may be specified which enables the type to pass data straight through. Numeric also takes "precision" and "scale" arguments which are used when CREATE TABLE is issued.

back to section top

Float

Float types return Python floats. Float also takes a "precision" argument which is used when CREATE TABLE is issued.

back to section top

Datetime/Date/Time

Date and time types return objects from the Python datetime module. Most DBAPIs have built-in support for the datetime module, with the noted exception of SQLite. In the case of SQLite, date and time types are stored as strings which are then converted back to datetime objects when rows are returned.

back to section top

Interval

The Interval type deals with datetime.timedelta objects. In Postgres, the native INTERVAL type is used; for others, the value is stored as a date which is relative to the "epoch" (Jan. 1, 1970).

back to section top

Binary

The Binary type generates BLOB or BYTEA when tables are created, and also converts incoming values using the Binary callable provided by each DBAPI.

back to section top

Boolean

Boolean typically uses BOOLEAN or SMALLINT on the CREATE TABLE side, and returns Python True or False.

back to section top

PickleType

PickleType builds upon the Binary type to apply Python's pickle.dumps() to incoming objects, and pickle.loads() on the way out, allowing any pickleable Python object to be stored as a serialized binary field.
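A short sketch of its usage (the table and column names are arbitrary); any pickleable value, such as a dict, may then be assigned to the column:

sessions = Table('sessions', metadata,
    Column('id', Integer, primary_key=True),

    # stores pickled Python objects in a BLOB/BYTEA column
    Column('data', PickleType())
)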

back to section top

SQL-Specific Types

These are subclasses of the generic types and include:

class FLOAT(Numeric)
class TEXT(String)
class DECIMAL(Numeric)
class INT(Integer)
INTEGER = INT
class TIMESTAMP(DateTime)
class DATETIME(DateTime)
class CLOB(String)
class VARCHAR(String)
class CHAR(String)
class BLOB(Binary)
class BOOLEAN(Boolean)

The idea behind the SQL-specific types is that a CREATE TABLE statement would generate the exact type specified.
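For example, a hypothetical table using VARCHAR and CHAR directly, rather than the generic String, guarantees those exact keywords in the generated DDL:

codes = Table('codes', metadata,
    Column('code', VARCHAR(10)),   # always renders VARCHAR(10)
    Column('flag', CHAR(1))        # always renders CHAR(1)
)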

back to section top

Dialect Specific Types

Each dialect has its own set of types, many of which are available only within that dialect. For example, MySQL has a BigInteger type and Postgres has an Inet type. To use these, import them from the module explicitly:

from sqlalchemy.databases.mysql import MSEnum, MSBigInteger

table = Table('foo', meta,
    Column('enumerates', MSEnum('a', 'b', 'c')),
    Column('id', MSBigInteger)
)

Or some postgres types:

from sqlalchemy.databases.postgres import PGInet, PGArray

table = Table('foo', meta,
    Column('ipaddress', PGInet),
    Column('elements', PGArray(str))   # PGArray is available in 0.4, and takes a type argument
    )
back to section top

Creating your Own Types

User-defined types can be created which can augment the bind parameter and result processing capabilities of the built in types. This is usually achieved using the TypeDecorator class, which "decorates" the behavior of any existing type. As of version 0.4.2, the new process_bind_param() and process_result_value() methods should be used:

import sqlalchemy.types as types

class MyType(types.TypeDecorator):
    """a type that decorates Unicode, prefixes values with "PREFIX:" on 
    the way in and strips it off on the way out."""

    impl = types.Unicode

    def process_bind_param(self, value, engine):
        return "PREFIX:" + value

    def process_result_value(self, value, engine):
        return value[7:]

    def copy(self):
        return MyType(self.impl.length)

Note that the "old" way to process bind parameters and result values, the convert_bind_param() and convert_result_value() methods, are still available. The downside of these is that when using a type which already processes data such as the Unicode type, you need to call the superclass version of these methods directly. Using process_bind_param() and process_result_value(), user-defined code can return and receive the desired Python data directly.

As of version 0.4.2, TypeDecorator should generally be used for any user-defined type which redefines the behavior of another type, including other TypeDecorator subclasses such as PickleType, and the new process_...() methods described above should be used.

To build a type object from scratch, which will not have a corresponding database-specific implementation, subclass TypeEngine:

import sqlalchemy.types as types

class MyType(types.TypeEngine):
    def __init__(self, precision=8):
        self.precision = precision

    def get_col_spec(self):
        return "MYTYPE(%s)" % self.precision

    def convert_bind_param(self, value, engine):
        return value

    def convert_result_value(self, value, engine):
        return value

Once you make your type, it's immediately useable:

table = Table('foo', meta,
    Column('id', Integer, primary_key=True),
    Column('data', MyType(16))
    )
back to section top

Connection Pooling

This section describes the connection pool module of SQLAlchemy. The Pool object it provides is normally embedded within an Engine instance. For most cases, explicit access to the pool module is not required. However, the Pool object can be used on its own, without the rest of SA, to manage DBAPI connections; this section describes that usage. Also, this section will describe in more detail how to customize the pooling strategy used by an Engine.

At the base of any database helper library is a system for efficiently acquiring connections to the database. Since the establishment of a database connection is typically a somewhat expensive operation, an application needs a way to get at database connections repeatedly without incurring the full overhead each time. Particularly for server-side web applications, a connection pool is the standard way to maintain a "pool" of database connections which are used over and over again among many requests. Connection pools typically are configured to maintain a certain "size", which represents how many connections can be in use simultaneously before additional connections must be opened.

Establishing a Transparent Connection Pool

Any DBAPI module can be "proxied" through the connection pool using the following technique (note that the usage of 'psycopg2' is just an example; substitute whatever DBAPI module you'd like):

import sqlalchemy.pool as pool
import psycopg2 as psycopg
psycopg = pool.manage(psycopg)

# then connect normally
connection = psycopg.connect(database='test', user='scott', password='tiger')

This produces a sqlalchemy.pool.DBProxy object which supports the same connect() function as the original DBAPI module. Upon connection, a connection proxy object is returned, which delegates its calls to a real DBAPI connection object. This connection object is stored persistently within a connection pool (an instance of sqlalchemy.pool.Pool) that corresponds to the exact connection arguments sent to the connect() function.

The connection proxy supports all of the methods on the original connection object, most of which are proxied via __getattr__(). The close() method will return the connection to the pool, and the cursor() method will return a proxied cursor object. Both the connection proxy and the cursor proxy will also return the underlying connection to the pool after they have both been garbage collected, which is detected via the __del__() method.

Additionally, when connections are returned to the pool, a rollback() is issued on the connection unconditionally. This is to release any locks still held by the connection that may have resulted from normal activity.

By default, the connect() method will return the same connection that is already checked out in the current thread. This allows a particular connection to be used in a given thread without needing to pass it around between functions. To disable this behavior, specify use_threadlocal=False to the manage() function.
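A short sketch of disabling that behavior, continuing the psycopg2 proxy example above:

# each connect() now returns a distinct proxied connection, regardless of
# what is already checked out in the current thread
psycopg = pool.manage(psycopg, use_threadlocal=False)
connection = psycopg.connect(database='test', user='scott', password='tiger')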

back to section top

Connection Pool Configuration

For all types of Pool construction, which includes the "transparent proxy" described in the previous section, using an Engine via create_engine(), or constructing a pool through direct class instantiation, the options are generally the same. Additional options may be available based on the specific subclass of Pool being used.

For a description of all pool classes, see the generated documentation.

Common options apply to all Pool classes; QueuePool accepts additional options such as pool_size and max_overflow, both of which appear in the construction example in the next section.

back to section top

Custom Pool Construction

Besides using the transparent proxy, instances of sqlalchemy.pool.Pool can be created directly. Constructing your own pool involves passing a callable used to create a connection. Through this method, custom connection schemes can be made, such as a connection that automatically executes some initialization commands to start.

Constructing a QueuePool
import sqlalchemy.pool as pool
import psycopg2

def getconn():
    c = psycopg2.connect(user='ed', host='127.0.0.1', dbname='test')
    # execute an initialization function on the connection before returning
    c.cursor().execute("setup_encodings()")
    return c

p = pool.QueuePool(getconn, max_overflow=10, pool_size=5, use_threadlocal=True)

Or with SingletonThreadPool:

Constructing a SingletonThreadPool
import sqlalchemy.pool as pool
import sqlite3

def getconn():
    return sqlite3.connect('myfile.db')

# SQLite connections require the SingletonThreadPool    
p = pool.SingletonThreadPool(getconn)
back to section top

SQLAlchemy has a variety of extensions available which provide extra functionality to SA, either via explicit usage or by augmenting the core behavior. Several of these extensions are designed to work together.

declarative

Author: Mike Bayer

Version: 0.4.4 or greater

declarative intends to be a fully featured replacement for the very old activemapper extension. Its goal is to redefine the organization of class, Table, and mapper() constructs such that they can all be defined "at once" underneath a class declaration. Unlike activemapper, it does not redefine normal SQLAlchemy configurational semantics - regular Column, relation() and other schema or ORM constructs are used in almost all cases.

declarative is a so-called "micro declarative layer"; it does not generate table or column names and requires a configuration almost as verbose as that of straight tables and mappers. As an alternative, the Elixir project is a full community-supported declarative layer for SQLAlchemy, and is recommended for its active-record-like semantics, its convention-based configuration, and plugin capabilities.

SQLAlchemy object-relational configuration involves the usage of Table, mapper(), and class objects to define the three areas of configuration. declarative moves these three types of configuration underneath the individual mapped class. Regular SQLAlchemy schema and ORM constructs are used in most cases:

from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class SomeClass(Base):
    __tablename__ = 'some_table'
    id = Column('id', Integer, primary_key=True)
    name =  Column('name', String(50))

Above, the declarative_base callable produces a new base class from which all mapped classes inherit. When the class definition is completed, a new Table and mapper() have been generated, accessible via the __table__ and __mapper__ attributes on the SomeClass class.

Attributes may be added to the class after its construction, and they will be added to the underlying Table and mapper() definitions as appropriate:

SomeClass.data = Column('data', Unicode)
SomeClass.related = relation(RelatedInfo)

Classes which are mapped explicitly using mapper() can interact freely with declarative classes.

The declarative_base base class contains a MetaData object where newly defined Table objects are collected. This is accessed via the metadata class level accessor, so to create tables we can say:

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

The Engine created above may also be directly associated with the declarative base class using the engine keyword argument, where it will be associated with the underlying MetaData object and allow SQL operations involving that metadata and its tables to make use of that engine automatically:

Base = declarative_base(engine=create_engine('sqlite://'))

Or, as MetaData allows, at any time using the bind attribute:

Base.metadata.bind = create_engine('sqlite://')

The declarative_base can also receive a pre-created MetaData object, which allows a declarative setup to be associated with an already existing traditional collection of Table objects:

mymetadata = MetaData()
Base = declarative_base(metadata=mymetadata)

Relations to other classes are done in the usual way, with the added feature that the class specified to relation() may be a string name. The "class registry" associated with Base is used at mapper compilation time to resolve the name into the actual class object, which is expected to have been defined once the mapper configuration is used:

class User(Base):
    __tablename__ = 'users'

    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50))
    addresses = relation("Address", backref="user")

class Address(Base):
    __tablename__ = 'addresses'

    id = Column('id', Integer, primary_key=True)
    email = Column('email', String(50))
    user_id = Column('user_id', Integer, ForeignKey('users.id'))

Column constructs, since they are just that, are immediately usable, as below where we define a primary join condition on the Address class using them:

class Address(Base):
    __tablename__ = 'addresses'

    id = Column('id', Integer, primary_key=True)
    email = Column('email', String(50))
    user_id = Column('user_id', Integer, ForeignKey('users.id'))
    user = relation(User, primaryjoin=user_id==User.id)

When an explicit join condition or other configuration which depends on multiple classes cannot be defined immediately due to some classes not yet being available, these can be defined after all classes have been created. Attributes which are added to the class after its creation are associated with the Table/mapping in the same way as if they had been defined inline:

User.addresses = relation(Address, primaryjoin=Address.user_id==User.id)

Synonyms are one area where declarative needs to slightly change the usual SQLAlchemy configurational syntax. To define a getter/setter which proxies to an underlying attribute, use synonym with the instruments argument:

class MyClass(Base):
    __tablename__ = 'sometable'

    _attr = Column('attr', String)

    def _get_attr(self):
        return self._attr
    def _set_attr(self, attr):
        self._attr = attr
    attr = synonym('_attr', instruments=property(_get_attr, _set_attr))

The above synonym is then usable as an instance attribute as well as a class-level expression construct:

x = MyClass()
x.attr = "some value"
session.query(MyClass).filter(MyClass.attr == 'some other value').all()

The synonym_for decorator can accomplish the same task:

class MyClass(Base):
    __tablename__ = 'sometable'

    _attr = Column('attr', String)

    @synonym_for('_attr')
    @property
    def attr(self):
        return self._attr

Similarly, comparable_using is a front end for the comparable_property ORM function:

class MyClass(Base):
    __tablename__ = 'sometable'

    name = Column('name', String)

    @comparable_using(MyUpperCaseComparator)
    @property
    def uc_name(self):
        return self.name.upper()

As an alternative to __tablename__, a direct Table construct may be used. The Column objects, which in this case require their names, will be added to the mapping just like a regular mapping to a table:

class MyClass(Base):
    __table__ = Table('my_table', Base.metadata,
        Column('id', Integer, primary_key=True),
        Column('name', String(50))
    )

This is the preferred approach when using reflected tables, as below:

class MyClass(Base):
    __table__ = Table('my_table', Base.metadata, autoload=True)

Mapper arguments are specified using the __mapper_args__ class variable. Note that the column objects declared on the class are immediately usable, as in this joined-table inheritance example:

class Person(Base):
    __tablename__ = 'people'
    id = Column('id', Integer, primary_key=True)
    discriminator = Column('type', String(50))
    __mapper_args__ = {'polymorphic_on':discriminator}

class Engineer(Person):
    __tablename__ = 'engineers'
    __mapper_args__ = {'polymorphic_identity':'engineer'}
    id = Column('id', Integer, ForeignKey('people.id'), primary_key=True)
    primary_language = Column('primary_language', String(50))

For single-table inheritance, the __tablename__ and __table__ class variables are optional on a class when the class inherits from another mapped class.
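A brief sketch of single-table inheritance, reusing the Person class above; Manager defines no table of its own, so its rows are stored in the 'people' table:

class Manager(Person):
    __mapper_args__ = {'polymorphic_identity':'manager'}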

As a convenience feature, the declarative_base() sets a default constructor on classes which takes keyword arguments, and assigns them to the named attributes:

e = Engineer(primary_language='python')

Note that declarative has no integration built in with sessions, and is only intended as an optional syntax for the regular usage of mappers and Table objects. A typical application setup using scoped_session might look like:

engine = create_engine('postgres://scott:tiger@localhost/test')
Session = scoped_session(sessionmaker(transactional=True, autoflush=False, bind=engine))
Base = declarative_base()

Mapped instances then make usage of Session in the usual way.

back to section top

associationproxy

Author: Mike Bayer and Jason Kirtland

Version: 0.3.1 or greater

associationproxy is used to create a simplified, read/write view of a relationship. It can be used to cherry-pick fields from a collection of related objects or to greatly simplify access to associated objects in an association relationship.

Simplifying Relations

Consider this "association object" mapping:

users_table = Table('users', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String(64)),
)

keywords_table = Table('keywords', metadata,
    Column('id', Integer, primary_key=True),
    Column('keyword', String(64))
)

userkeywords_table = Table('userkeywords', metadata,
    Column('user_id', Integer, ForeignKey("users.id"),
           primary_key=True),
    Column('keyword_id', Integer, ForeignKey("keywords.id"),
           primary_key=True)
)

class User(object):
    def __init__(self, name):
        self.name = name

class Keyword(object):
    def __init__(self, keyword):
        self.keyword = keyword

mapper(User, users_table, properties={
    'kw': relation(Keyword, secondary=userkeywords_table)
    })
mapper(Keyword, keywords_table)

Above are three simple tables, modeling users, keywords and a many-to-many relationship between the two. These Keyword objects are little more than a container for a name, and accessing them via the relation is awkward:

user = User('jek')
user.kw.append(Keyword('cheese inspector'))
print user.kw
# [<__main__.Keyword object at 0xb791ea0c>]
print user.kw[0].keyword
# 'cheese inspector'
print [keyword.keyword for keyword in user.kw]
# ['cheese inspector']

With association_proxy you have a "view" of the relation that contains just the .keyword of the related objects. The proxy is a Python property, and unlike the mapper relation, is defined in your class:

from sqlalchemy.ext.associationproxy import association_proxy

class User(object):
    def __init__(self, name):
        self.name = name

    # proxy the 'keyword' attribute from the 'kw' relation
    keywords = association_proxy('kw', 'keyword')

# ...
>>> user.kw
[<__main__.Keyword object at 0xb791ea0c>]
>>> user.keywords
['cheese inspector']
>>> user.keywords.append('snack ninja')
>>> user.keywords
['cheese inspector', 'snack ninja']
>>> user.kw
[<__main__.Keyword object at 0x9272a4c>, <__main__.Keyword object at 0xb7b396ec>]

The proxy is read/write. New associated objects are created on demand when values are added to the proxy, and modifying or removing an entry through the proxy also affects the underlying collection.

  • The association proxy property is backed by a mapper-defined relation, either a collection or scalar.
  • You can access and modify both the proxy and the backing relation. Changes in one are immediate in the other.
  • The proxy acts like the type of the underlying collection. A list gets a list-like proxy, a dict a dict-like proxy, and so on.
  • Multiple proxies for the same relation are fine.
  • Proxies are lazy, and won't trigger a load of the backing relation until they are accessed.
  • The relation is inspected to determine the type of the related objects.
  • To construct new instances, the type is called with the value being assigned, or key and value for dicts.
  • A creator function can be used to create instances instead.

Above, the Keyword.__init__ takes a single argument keyword, which maps conveniently to the value being set through the proxy. A creator function could have been used instead if more flexibility was required.
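For instance, a hypothetical explicit creator for the keywords proxy above would look like the following (equivalent to the default behavior in this case, since Keyword takes a single argument):

def _keyword_creator(value):
    # build a Keyword for a plain string added through the proxy
    return Keyword(value)

class User(object):
    def __init__(self, name):
        self.name = name

    keywords = association_proxy('kw', 'keyword', creator=_keyword_creator)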

Because the proxies are backed by a regular relation collection, all of the usual hooks and patterns for using collections are still in effect. The most convenient behavior is the automatic setting of "parent"-type relationships on assignment. In the example above, nothing special had to be done to associate the Keyword to the User. Simply adding it to the collection is sufficient.

back to section top

Simplifying Association Object Relations

Association proxies are also useful for keeping association objects out of the way during regular use. For example, the userkeywords table might have a bunch of auditing columns that need to get updated when changes are made: columns that are updated but seldom, if ever, accessed in your application. A proxy can provide a very natural access pattern for the relation.

from sqlalchemy.ext.associationproxy import association_proxy

# users_table and keywords_table tables as above, then:

userkeywords_table = Table('userkeywords', metadata,
    Column('user_id', Integer, ForeignKey("users.id"), primary_key=True),
    Column('keyword_id', Integer, ForeignKey("keywords.id"), primary_key=True),
    # add some auditing columns
    Column('updated_at', DateTime, default=datetime.now),
    Column('updated_by', Integer, default=get_current_uid, onupdate=get_current_uid),
)

def _create_uk_by_keyword(keyword):
    """A creator function."""
    return UserKeyword(keyword=keyword)

class User(object):
    def __init__(self, name):
        self.name = name
    keywords = association_proxy('user_keywords', 'keyword', creator=_create_uk_by_keyword)

class Keyword(object):
    def __init__(self, keyword):
        self.keyword = keyword
    def __repr__(self):
        return 'Keyword(%s)' % repr(self.keyword)

class UserKeyword(object):
    def __init__(self, user=None, keyword=None):
        self.user = user
        self.keyword = keyword

mapper(User, users_table, properties={
    'user_keywords': relation(UserKeyword)
})
mapper(Keyword, keywords_table)
mapper(UserKeyword, userkeywords_table, properties={
    'user': relation(User),
    'keyword': relation(Keyword),
})

user = User('log')
kw1  = Keyword('new_from_blammo')

# Adding a Keyword requires creating a UserKeyword association object
user.user_keywords.append(UserKeyword(user, kw1))

# And accessing Keywords requires traversing UserKeywords
print user.user_keywords[0]
# <__main__.UserKeyword object at 0xb79bbbec>

print user.user_keywords[0].keyword
# Keyword('new_from_blammo')

# Lots of work.

# It's much easier to go through the association proxy!
for kw in (Keyword('its_big'), Keyword('its_heavy'), Keyword('its_wood')):
    user.keywords.append(kw)

print user.keywords
# [Keyword('new_from_blammo'), Keyword('its_big'), Keyword('its_heavy'), Keyword('its_wood')]
back to section top

Building Complex Views

stocks = Table("stocks", meta,
   Column('symbol', String(10), primary_key=True),
   Column('description', String(100), nullable=False),
   Column('last_price', Numeric)
)

brokers = Table("brokers", meta,
   Column('id', Integer,primary_key=True),
   Column('name', String(100), nullable=False)
)

holdings = Table("holdings", meta,
  Column('broker_id', Integer, ForeignKey('brokers.id'), primary_key=True),
  Column('symbol', String(10), ForeignKey('stocks.symbol'), primary_key=True),
  Column('shares', Integer)
)

Above are three tables, modeling stocks, their brokers and the number of shares of a stock held by each broker. This situation is quite different from the association example above. shares is a property of the relation, an important one that we need to use all the time.

For this example, it would be very convenient if Broker objects had a dictionary collection that mapped Stock instances to the shares held for each. That's easy.

from sqlalchemy.ext.associationproxy import association_proxy
from sqlalchemy.orm.collections import attribute_mapped_collection

def _create_holding(stock, shares):
    """A creator function, constructs Holdings from Stock and share quantity."""
    return Holding(stock=stock, shares=shares)

class Broker(object):
    def __init__(self, name):
        self.name = name

    holdings = association_proxy('by_stock', 'shares', creator=_create_holding)

class Stock(object):
    def __init__(self, symbol, description=None):
        self.symbol = symbol
        self.description = description
        self.last_price = 0

class Holding(object):
    def __init__(self, broker=None, stock=None, shares=0):
        self.broker = broker
        self.stock = stock
        self.shares = shares

mapper(Stock, stocks)
mapper(Broker, brokers, properties={
    'by_stock': relation(Holding,
        collection_class=attribute_mapped_collection('stock'))
})
mapper(Holding, holdings, properties={
    'stock': relation(Stock),
    'broker': relation(Broker)
})

Above, we've set up the 'by_stock' relation collection to act as a dictionary, using the .stock property of each Holding as a key.

Populating and accessing that dictionary manually is slightly inconvenient because of the complexity of the Holding association object:

stock = Stock('ZZK')
broker = Broker('paj')

broker.by_stock[stock] = Holding(broker, stock, 10)
print broker.by_stock[stock].shares
# 10

The holdings proxy we've added to the Broker class hides the details of the Holding while also giving access to .shares:

for stock in (Stock('JEK'), Stock('STPZ')):
    broker.holdings[stock] = 123

for stock, shares in broker.holdings.items():
    print stock, shares

# lets take a peek at that holdings_table after committing changes to the db
print list(holdings.select().execute())
# [(1, 'ZZK', 10), (1, 'JEK', 123), (1, 'STPZ', 123)]

Further examples can be found in the examples/ directory in the SQLAlchemy distribution.

The association_proxy convenience function is not present in SQLAlchemy versions 0.3.1 through 0.3.7; instead, instantiate the class directly:

from sqlalchemy.ext.associationproxy import AssociationProxy

class Article(object):
   keywords = AssociationProxy('keyword_associations', 'keyword')
back to section top

orderinglist

Author: Jason Kirtland

orderinglist is a helper for mutable ordered relations. It will intercept list operations performed on a relation collection and automatically synchronize changes in list position with an attribute on the related objects. (See advdatamapping_properties_entitycollections for more information on the general pattern.)

Example: Two tables that store slides in a presentation. Each slide has a number of bullet points, displayed in order by the 'position' column on the bullets table. These bullets can be inserted and re-ordered by your end users, and you need to update the 'position' column of all affected rows when changes are made.

slides_table = Table('Slides', metadata,
                     Column('id', Integer, primary_key=True),
                     Column('name', String))

bullets_table = Table('Bullets', metadata,
                      Column('id', Integer, primary_key=True),
                      Column('slide_id', Integer, ForeignKey('Slides.id')),
                      Column('position', Integer),
                      Column('text', String))

class Slide(object):
    pass

class Bullet(object):
    pass

mapper(Slide, slides_table, properties={
    'bullets': relation(Bullet, order_by=[bullets_table.c.position])
})
mapper(Bullet, bullets_table)

The standard relation mapping will produce a list-like attribute on each Slide containing all related Bullets, but coping with changes in ordering is totally your responsibility. If you insert a Bullet into that list, there is no magic: it won't have a position attribute unless you assign one, and you'll need to manually renumber all the subsequent Bullets in the list to accommodate the insert.

An orderinglist can automate this and manage the 'position' attribute on all related bullets for you.

 
from sqlalchemy.ext.orderinglist import ordering_list

mapper(Slide, slides_table, properties={
    'bullets': relation(Bullet,
                        collection_class=ordering_list('position'),
                        order_by=[bullets_table.c.position])
})
mapper(Bullet, bullets_table)

s = Slide()
s.bullets.append(Bullet())
s.bullets.append(Bullet())
s.bullets[1].position
# 1
s.bullets.insert(1, Bullet())
s.bullets[2].position
# 2

Use the ordering_list function to set up the collection_class on relations (as in the mapper example above). This implementation depends on the list starting in the proper order, so be SURE to put an order_by on your relation.

ordering_list takes the name of the related object's ordering attribute as an argument. By default, the zero-based integer index of the object's position in the ordering_list is synchronized with the ordering attribute: index 0 will get position 0, index 1 position 1, etc. To start numbering at 1 or some other integer, provide the count_from argument (e.g. count_from=1).
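
For example, here is a minimal sketch of a one-based scheme, reusing the Slide/Bullet mapping from above:

from sqlalchemy.ext.orderinglist import ordering_list

mapper(Slide, slides_table, properties={
    'bullets': relation(Bullet,
                        # the first Bullet gets position 1, the next 2, and so on
                        collection_class=ordering_list('position', count_from=1),
                        order_by=[bullets_table.c.position])
})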

Ordering values are not limited to incrementing integers. Almost any scheme can be implemented by supplying a custom ordering_func that maps a Python list index to any value you require. See the module documentation for more information, and also check out the unit tests for examples of stepped numbering, alphabetical numbering and Fibonacci numbering.
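
As a rough sketch only, a stepped numbering scheme might look like this; the exact ordering_func call signature (list index and collection) should be confirmed against the module documentation:

from sqlalchemy.ext.orderinglist import ordering_list

def stepped_numbering(index, collection):
    # map list index 0, 1, 2, ... to position values 0, 10, 20, ...
    return index * 10

# used the same way as above, as the relation's collection_class
factory = ordering_list('position', ordering_func=stepped_numbering)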

back to section top

SqlSoup

Author: Jonathan Ellis

SqlSoup creates mapped classes on the fly from tables, which are automatically reflected from the database based on name. It is essentially a nicer version of the "row data gateway" pattern.

>>> from sqlalchemy.ext.sqlsoup import SqlSoup
>>> db = SqlSoup('sqlite:///')

>>> # this assumes a 'users' table is already present in the database
>>> db.users.select(order_by=[db.users.c.name])
[MappedUsers(name='Bhargan Basepair',email='basepair@example.edu',password='basepair',classname=None,admin=1),
 MappedUsers(name='Joe Student',email='student@example.edu',password='student',classname=None,admin=0)]
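
Rows can be inserted in a similarly ad-hoc way; a brief sketch, assuming the 'users' table above and the insert()/flush() methods described in the full SqlSoup documentation:

>>> user = db.users.insert(name='Wendy Williams', email='wendy@example.edu',
...                        password='foobar', classname=None, admin=0)
>>> db.flush()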

Full SqlSoup documentation is on the SQLAlchemy Wiki.

back to section top

Deprecated Extensions

A lot of our extensions are deprecated, but this is a good thing. Why? Because all of them have been refined and focused and rolled into the core of SQLAlchemy. They haven't been removed; they've just graduated into fully integrated features. Below we describe a set of extensions which are present in 0.4 but are deprecated.

SelectResults

Author: Jonas Borgström

NOTE: As of version 0.3.6 of SQLAlchemy, most behavior of SelectResults has been rolled into the base Query object. Explicit usage of SelectResults is therefore no longer needed.

SelectResults gives transformative behavior to the results returned from the select and select_by methods of Query.

from sqlalchemy.ext.selectresults import SelectResults

query = session.query(MyClass)
res = SelectResults(query)

res = res.filter(table.c.column == "something") # adds a WHERE clause (or appends to the existing via "and")
res = res.order_by([table.c.column]) # adds an ORDER BY clause

for x in res[:10]:  # Fetch and print the top ten instances - adds OFFSET 0 LIMIT 10 or equivalent
  print x.column2

# evaluate as a list, which executes the query
x = list(res)

# Count how many instances have column2 > 42
# and column == "something"
print res.filter(table.c.column2 > 42).count()

# select() is a synonym for filter()
session.query(MyClass).select(mytable.c.column=="something").order_by([mytable.c.column])[2:7]

An important facet of SelectResults is that the actual SQL execution does not occur until the object is used in a list or iterator context. This means you can call any number of transformative methods (including filter, order_by, list range expressions, etc.) before any SQL is actually issued.

Configuration of SelectResults may be per-Query, per-Mapper, or per-application:

from sqlalchemy.ext.selectresults import SelectResults, SelectResultsExt

# construct a SelectResults for an individual Query
sel = SelectResults(session.query(MyClass))

# construct a Mapper where the Query.select()/select_by() methods will return a SelectResults:
mapper(MyClass, mytable, extension=SelectResultsExt())

# globally configure all Mappers to return SelectResults, using the "selectresults" mod
import sqlalchemy.mods.selectresults

SelectResults greatly enhances querying and is highly recommended. For example, here's a query constructed using a combination of joins and outerjoins:

mapper(User, users_table, properties={
    'orders':relation(mapper(Order, orders_table, properties={
        'items':relation(mapper(Item, items_table))
    }))
})
session = create_session()
query = SelectResults(session.query(User))

result = query.outerjoin_to('orders').outerjoin_to('items').select(
    or_(Order.c.order_id == None, Item.c.item_id == 2))

For a full listing of methods, see the generated documentation.

back to section top

SessionContext

Author: Daniel Miller

The SessionContext extension is still available in the 0.4 release of SQLAlchemy, but has been deprecated in favor of the scoped_session() function, which provides a class-like object that constructs a Session on demand which references a thread-local scope.
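
A minimal sketch of the scoped_session() pattern (the sessionmaker options shown are illustrative):

from sqlalchemy.orm import scoped_session, sessionmaker

# a class-like callable; calling it returns the Session for the current thread
Session = scoped_session(sessionmaker(autoflush=True, transactional=True))

session = Session()           # thread-local Session
assert Session() is session   # the same Session is returned within this thread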

For docs on SessionContext, see the SQLAlchemy 0.3 documentation.

back to section top

assignmapper

Author: Mike Bayer

The assignmapper extension is still available in the 0.4 release of SQLAlchemy, but has been deprecated in favor of the scoped_session() function, which provides a mapper callable that works similarly to assignmapper.
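
A brief sketch of the mapper callable (MyClass and mytable here are placeholders, as in the SelectResults examples above):

from sqlalchemy.orm import scoped_session, sessionmaker

Session = scoped_session(sessionmaker(autoflush=True, transactional=True))

# Session.mapper works like the classic mapper() function, but new instances
# of MyClass are automatically saved into the contextual session
Session.mapper(MyClass, mytable)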

For docs on assignmapper, see the SQLAlchemy 0.3 documentation.

back to section top

ActiveMapper

Author: Jonathan LaCour

Please note that ActiveMapper has been deprecated in favor of either Elixir, a comprehensive solution to declarative mapping, or declarative, a built-in convenience tool which reorganizes Table and mapper() configuration.

ActiveMapper is a so-called "declarative layer" which allows the construction of a class, a Table, and a Mapper all in one step:

class Person(ActiveMapper):
    class mapping:
        id          = column(Integer, primary_key=True)
        full_name   = column(String)
        first_name  = column(String)
        middle_name = column(String)
        last_name   = column(String)
        birth_date  = column(DateTime)
        ssn         = column(String)
        gender      = column(String)
        home_phone  = column(String)
        cell_phone  = column(String)
        work_phone  = column(String)
        prefs_id    = column(Integer, foreign_key=ForeignKey('preferences.id'))
        addresses   = one_to_many('Address', colname='person_id', backref='person')
        preferences = one_to_one('Preferences', colname='prefs_id', backref='person')

    def __str__(self):
        s =  '%s\n' % self.full_name
        s += '  * birthdate: %s\n' % (self.birth_date or 'not provided')
        s += '  * fave color: %s\n' % (self.preferences.favorite_color or 'Unknown')
        s += '  * personality: %s\n' % (self.preferences.personality_type or 'Unknown')

        for address in self.addresses:
            s += '  * address: %s\n' % address.address_1
            s += '             %s, %s %s\n' % (address.city, address.state, address.postal_code)

        return s

class Preferences(ActiveMapper):
    class mapping:
        __table__        = 'preferences'
        id               = column(Integer, primary_key=True)
        favorite_color   = column(String)
        personality_type = column(String)

class Address(ActiveMapper):
    class mapping:
        id          = column(Integer, primary_key=True)
        type        = column(String)
        address_1   = column(String)
        city        = column(String)
        state       = column(String)
        postal_code = column(String)
        person_id   = column(Integer, foreign_key=ForeignKey('person.id'))

More discussion on ActiveMapper can be found at Jonathan LaCour's Blog as well as the SQLAlchemy Wiki.

back to section top

This is the MIT license: http://www.opensource.org/licenses/mit-license.php

Copyright (c) 2005, 2006, 2007, 2008 Michael Bayer and contributors. SQLAlchemy is a trademark of Michael Bayer.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

back to section top