2.2.4. Developing database, command line, and web-based programs with Python

2.2.4.1. Using databases with SQLAlchemy and SQLite

Databases are useful tools for organizing and quickly searching large datasets. Relational databases are the most common type of databases. Relational databases are based around schemas or structured definitions of the types of your data and the relationships among your data. These structured definitions facilitate fast searching of large datasets. However, these schemas can also make it cumbersome to represent multiple types of data with different structures. To overcome this limitation, several Python package provide support for editing or migrating schemas. Recently, there has been significant progress in the development of No-SQL databases which do not have fixed schemas.

Database engines are the software programs which implement SQL and No-SQL databases. Several popular SQL database engines include MySQL, Oracle, and SQLite. Several popular No-SQL database engines include CouchBase, CouchDB, and MongoDB.

SQL (Structured Query Language) is the language used to describe relational database schemas and how to insert and retrieve data to/from them. Each relational database engine uses it own variant of SQL, but the SQL languages used by MySQL, Oracle, SQLite and most other popular relational database engines are very similar.

Most of the popular database engines have their own Python interfaces. In addition, there are several Python packages such as SQLAlchemy which abstract away many of the details the individual database engines and enable Python developers to use database with little direct interaction with SQL. These packages make it easy to map between Python objects and rows in relational database tables and between attributes of those objects and columns of those tables. Thus, the packages often referred to as object-relational mappers.

This tutorial will teach you how to use SQLAlchemy to build a SQLite database to represent of single-cell organisms and their components (compartments, species, and reactions). First, we will build a schema that can describe cells. Second, we will build the database. Third, we will populate the database with data. Finally, we will query the database.

2.2.4.1.1. Define the Python object model

In order to represent organisms, compartments, species, and reactions, we must create four Python classes. These classes should inherit from sqlalchemy.ext.declarative.declarative_base() so that SQLAlchemy can map them onto SQLite tables and each class should define a __tablename__ class attribute to tell SQLAchemy which table each class should be mapped onto. First, these classes should define their instance attributes and how each instance attribute should be mapped onto a column in the relational database (i.e. their type in the database). In the example below, we have created name and ncbi_id instance attributes. Furthermore, to ensure we only create once Organism instance per organism, we have declared that the ncbi_id attributes must be unique (i.e. no two objects can have the same ncbi_id).:

import sqlalchemy
import sqlalchemy.ext.declarative
import sqlalchemy.orm

Base = sqlalchemy.ext.declarative.declarative_base()
# :obj:`Base`: base model for local sqlite database

class Organism(Base):
    __tablename__ = 'organism'

    ncbi_id = sqlalchemy.Column(sqlalchemy.Integer(), unique=True)
    name = sqlalchemy.Column(sqlalchemy.String(), unique=True)


class Compartment(Base):
    __tablename__ = 'compartment'

    name = sqlalchemy.Column(sqlalchemy.String())


class Specie(Base):
    __tablename__ = 'specie'

    name = sqlalchemy.Column(sqlalchemy.String())


class Reaction(Base):
    __tablename__ = 'reaction'

    name = sqlalchemy.Column(sqlalchemy.String())

2.2.4.1.1.1. Define the relationships between the classes

Once we have defined all of the classes that we need to represent organisms, compartments, species, and reactions, we can define how these classes are related to each other by defining relationship attributes and foreign keys. Foreign keys are columns that represent pointers to rows in other tables. Relationship attributes tell SQLAchemy how to map foreign keys between rows in tables onto references between Python objects. In order to define foreign keys, we must also define primary keys that for each table that the foreign keys can be related to. This can be done by adding _id attributes to each Python class.

There are four possible types of relationships between Python objects/relational table rows

  • One-to-one relationships
  • One-to-many relationships
  • Many-to-one relationships
  • Many-to-many relationships

The first three types of relationships can be defined by adding additional foreign key columns to tables. To define a many-to-many relationship, we must create an additional association table which contains foreign keys to both tables that we would like to relate.

The cascade argument to sqlalchemy.orm.relationship tells SQLAlchemy whether or not related objects should be deleted when their parents are deleted.:

specie_reaction = sqlalchemy.Table(
    'specie_reaction', Base.metadata,
    sqlalchemy.Column('specie__id', sqlalchemy.Integer, sqlalchemy.ForeignKey('specie._id')),
    sqlalchemy.Column('reaction__id', sqlalchemy.Integer, sqlalchemy.ForeignKey('reaction._id')),
)
# :obj:`sqlalchemy.Table`: Specie:Reaction \many-to-many association table


class Organism(Base):
    __tablename__ = 'organism'

    _id = sqlalchemy.Column(sqlalchemy.Integer(), primary_key=True)
    ncbi_id = sqlalchemy.Column(sqlalchemy.Integer(), unique=True)
    name = sqlalchemy.Column(sqlalchemy.String(), unique=True)


class Compartment(Base):
    __tablename__ = 'compartment'

    _id = sqlalchemy.Column(sqlalchemy.Integer(), primary_key=True)
    name = sqlalchemy.Column(sqlalchemy.String())
    organism_id = sqlalchemy.Column(sqlalchemy.Integer(), sqlalchemy.ForeignKey('organism._id'))
    organism = sqlalchemy.orm.relationship('Organism',
                                           foreign_keys=[organism_id],
                                           backref=sqlalchemy.orm.backref('compartments', cascade="all, delete-orphan"))


class Specie(Base):
    __tablename__ = 'specie'

    _id = sqlalchemy.Column(sqlalchemy.Integer(), primary_key=True)
    name = sqlalchemy.Column(sqlalchemy.String())
    compartment_id = sqlalchemy.Column(sqlalchemy.Integer(), sqlalchemy.ForeignKey('compartment._id'))
    compartment = sqlalchemy.orm.relationship('Compartment',
                                              foreign_keys=[compartment_id],
                                              backref=sqlalchemy.orm.backref('species', cascade="all, delete-orphan"))


class Reaction(Base):
    __tablename__ = 'reaction'

    _id = sqlalchemy.Column(sqlalchemy.Integer(), primary_key=True)
    name = sqlalchemy.Column(sqlalchemy.String())
    species = sqlalchemy.orm.relationship('Specie',
                                          secondary=specie_reaction,
                                          backref=sqlalchemy.orm.backref('reactions', cascade="all, delete-orphan", single_parent=True))

2.2.4.1.2. Optimizing the schema

To speed up the querying of the database, we can instruct SQLlite to build index tables, or pre-sorted copies of the primary tables that can be used to quickly find rows using a binary search rather than having to iterate over every row in a table. This can be done by setting the index argument to each column constructor to True:

specie_reaction = sqlalchemy.Table(
    'specie_reaction', Base.metadata,
    sqlalchemy.Column('specie__id', sqlalchemy.Integer, sqlalchemy.ForeignKey('specie._id'), index=True),
    sqlalchemy.Column('reaction__id', sqlalchemy.Integer, sqlalchemy.ForeignKey('reaction._id'), index=True),
)
# :obj:`sqlalchemy.Table`: Specie:Reaction \many-to-many association table


class Organism(Base):
    __tablename__ = 'organism'

    _id = sqlalchemy.Column(sqlalchemy.Integer(), primary_key=True)
    ncbi_id = sqlalchemy.Column(sqlalchemy.Integer(), index=True, unique=True)
    name = sqlalchemy.Column(sqlalchemy.String(), unique=True)


class Compartment(Base):
    __tablename__ = 'compartment'

    _id = sqlalchemy.Column(sqlalchemy.Integer(), primary_key=True)
    name = sqlalchemy.Column(sqlalchemy.String())
    organism_id = sqlalchemy.Column(sqlalchemy.Integer(), sqlalchemy.ForeignKey('organism._id'), index=True)
    organism = sqlalchemy.orm.relationship('Organism',
                                           foreign_keys=[organism_id],
                                           backref=sqlalchemy.orm.backref('compartments', cascade="all, delete-orphan"))


class Specie(Base):
    __tablename__ = 'specie'

    _id = sqlalchemy.Column(sqlalchemy.Integer(), primary_key=True)
    name = sqlalchemy.Column(sqlalchemy.String())
    compartment_id = sqlalchemy.Column(sqlalchemy.Integer(), sqlalchemy.ForeignKey('compartment._id'), index=True)
    compartment = sqlalchemy.orm.relationship('Compartment',
                                              foreign_keys=[compartment_id],
                                              backref=sqlalchemy.orm.backref('species', cascade="all, delete-orphan"))


class Reaction(Base):
    __tablename__ = 'reaction'

    _id = sqlalchemy.Column(sqlalchemy.Integer(), primary_key=True)
    name = sqlalchemy.Column(sqlalchemy.String())
    species = sqlalchemy.orm.relationship('Specie',
                                          secondary=specie_reaction,
                                          backref=sqlalchemy.orm.backref('reactions', cascade="all, delete-orphan", single_parent=True))

2.2.4.1.3. Create the database

Once we have defined the Python data model, we can use SQLAlchemy to generate the database:

DATABASE_FILENAME = 'test.sqlite'
# :obj:`str`: path to store the database

engine = sqlalchemy.create_engine('sqlite:///' + DATABASE_FILENAME)
# :obj:`sqlalchemy.engine.Engine`: database engine

# create the database
Base.metadata.create_all(engine)

We can use the sqlite3 lite command lite utility to inspect the schema of the database that SQLAchemy generated.:

sqlite3 test.sqlite .schema

2.2.4.1.4. Insert records into the database

We can insert records into the database by (1) creating a “session” on the database, (2) instantiating instances of the Organism, Compartment, Specie, and Reaction classes, (3) adding those instances to the session, and (4) “committing” the session. A session is an in-memory copy of the database which can be used to query and make changes to the database. However, the changes are not saved to the database (and therefore not accessible to other sessions) until they are committed.

Note, SQLAlchemy automatically creates constructors for each class which have keyword arguments for each instance attribute.

Note, SQLAlchemy automatically adds add objects to sessions that are linked to other objects that have been explicitly added to the session.:

session = sqlalchemy.orm.sessionmaker(bind=engine)()
# :obj:`sqlalchemy.orm.session.Session`: sqlalchemy session

# create homo sapiens organism with one reaction
organism = Organism(ncbi_id=9606, name='Homo sapiens')
session.add(organism)

compartment = Compartment(name='cytosol')
organism.compartments.append(compartment)

atp = Specie(name='atp')
compartment.species.append(atp)

adp = Specie(name='adp')
compartment.species.append(adp)

pi = Specie(name='pi')
compartment.species.append(pi)

h2o = Specie(name='h2o')
compartment.species.append(h2o)

h = Specie(name='h')
compartment.species.append(h)

reaction = Reaction(name='atp_hydrolysis')
reaction.species = [atp, adp, pi, h2o, h]


# create E. colii organism with one reaction
organism = Organism(ncbi_id=562, name='Escherichia coli')
session.add(organism)

compartment = Compartment(name='cytosol')
organism.compartments.append(compartment)

gtp = Specie(name='gtp')
compartment.species.append(gtp)

gdp = Specie(name='gdp')
compartment.species.append(gdp)

pi = Specie(name='pi')
compartment.species.append(pi)

h2o = Specie(name='h2o')
compartment.species.append(h2o)

h = Specie(name='h')
compartment.species.append(h)

reaction = Reaction(name='gtp_hydrolysis')
reaction.species = [gtp, gdp, pi, h2o, h]


# save the new objects to the database
session.commit()

2.2.4.1.5. Querying the database

The following examples illustrate how to query the database:

# get the number organisms
organisms = session.query(Organism) \
    .count()

# select all of the organisms
organisms = session.query(Organism) \
    .all()

# order the organisms by their names
organisms = session.query(Organism) \
    .order_by(Organism.name) \
    .all()

# order the organisms by their names in descending order
organisms = session.query(Organism) \
    .order_by(Organism.name.desc()) \
    .all()

# select only organism names
organisms = session.query(Organism.name) \
    .all()

# select a subset of the database
homo_sapiens = session.query(Organism) \
    .filter(Organism.ncbi_id=9606) \
    .first()

# using joining to select a subset based on reaction names
homo_sapiens = session.query(Organism) \
    .join(Compartment, Organism.compartments) \
    .join(Specie, Compartment.species) \
    .join(Reaction, Specie.reactions) \
    .filter(Reaction.name='atp_hydrolysis') \
    .first()

# get the number of species per organism
homo_sapiens = session.query(Organism, sqlalchemy.func.count(Organism._id)) \
    .join(Compartment, Organism.compartments) \
    .join(Specie, Compartment.species) \
    .group_by(Organism._id) \
    .all()

2.2.4.1.6. Editing and removing records

The following examples illustrate how to edit and remove records:

# edit the name of Homo sapiens to "H. sapiens"
homo_sapiens = session.query(Organism) \
    .filter(Organism.ncbi_id=9606) \
    .first()
homo_sapiens.name = 'H. sapiens'
session.commit()

# delete H. sapiens
session.query(Organism) \
    .filter(Organism.ncbi_id=9606) \
    .delete()
session.commit()

# delete E. coli
e_coli = session.query(Organism) \
    .filter(Organism.ncbi_id=562) \
    .first()
session.delete(e_coli)
session.commit()

2.2.4.1.7. SQLAlchemy documentation

See the SQLAlchemy documentation for additional information about building and querying databases with SQLAlchemy.

2.2.4.1.8. Additional tutorials

There are several good tutorial on how to use SQLAlchemy and SQLite

2.2.4.1.9. Advanced concepts

  • Migrations

2.2.4.2. Building command line programs with Cement

Cement is a useful package for easily building command line programs.

See the Cement quickstart guide for an introduction to Cement.

2.2.4.3. Building web-based programs with Flask

Flask is a useful package for building web-based programs.

There are several good Flask tutorials