SQL Alchemy

SQLAlchemy is a “Swiss Army knife” for working with databases in Python. Many popular Python frameworks build on top of SQLAlchemy, and it’s an essential tool in your backend engineering or data science toolbox. In this tutorial, we will design the schema for a blog app, define the database models, and populate the database with sample data. Then, we will leverage SQL Alchemy’s ORM to create, read, update, and delete instances from our database. Finally, we will use groupby and aggregate functions to retrieve descriptive statistics from our sample data.

Schema Design The first step to working with a relational database is to design your schema. For our tutorial, we will be working with the following schema:

blog application database schema An author profile (table named ‘profile’) and a user will have a one-to-one relationship. Furthermore, any individual author can contribute to multiple blogs, and each blog can feature multiple authors, establishing a many-to-many relationship. Finally, a blog can have many comments, but a comment can only be associated with one blog.

Project Setup

To get started building this application, first we are going to have to install SqlAlchemy with pip install SQLAlchemy. Then, in our main.py file, paste the following code:

""" main.py """

import os
from pathlib import Path

from sqlalchemy import create_engine, event, Engine
from sqlalchemy.orm import scoped_session, sessionmaker
from models import Model

BASE_DIR = Path('.').absolute()
engine = create_engine(f"sqlite:///{BASE_DIR}/db.sqlite3", echo=False)
session = scoped_session(
    sessionmaker(
        autoflush = False,
        autocommit = False,
        bind=engine
    )
)

@event.listens_for(Engine, "connect")
def set_sqlite_pragma(dbapi_connection, connection_record):
    cursor = dbapi_connection.cursor()
    cursor.execute("PRAGMA foreign_keys=ON")
    cursor.close()

In this code snippet, all you need to know is that the create_engine function will establish a connection to your database. The scoped_session function returns a session object, allowing you to manage database transactions. For this tutorial, we are using an SQLite database; however, you can change the connection string to a PostgreSQL database or any other database of your choice. For a deeper dive into these functions, you can refer to the SQLAlchemy documentation.

SQL Alchemy utilizes a declarative base class to allow Python classes to define SQL tables that already exist or will be created in your database. The declarative base class is defined below:

""" models.py """
from datetime import datetime

from sqlalchemy import Column, Integer, String, Boolean, ForeignKey, DateTime, Date, Float
from sqlalchemy.orm import Relationship
from sqlalchemy.orm import declarative_base
from main import session

Model = declarative_base()
Model.query = session.query_property()

Relationships

One to One relationship

In our schema design, any given user can have one AuthorProfile and an AuthorProfile will only ever link back to one user. Our AuthorProfile and User model are defined below:

""" models.py """

class User(Model):
    __tablename__ = "user"
    id = Column(Integer, primary_key=True, autoincrement=True)

    created_at = Column(DateTime, default=datetime.now())
    last_updated = Column(DateTime, onupdate=datetime.now())
    name = Column(String, nullable=False)
    age = Column(Integer, nullable=True)
    # uselist ensures you get a single value when querying instead of a list, which would be the case for on to many relatiohsips
    author_profile = Relationship("AuthorProfile", 
                                  back_populates="user", 
                                  passive_deletes=True, 
                                  uselist=False)


class AuthorProfile(Model):
    __tablename__ = "author"
    id = Column(Integer, primary_key=True, autoincrement=True)
    # unique forces one to one relationship
    user_id = Column(Integer, ForeignKey("user.id", ondelete='CASCADE'), unique=True, nullable=True)
    display_name = Column(String, nullable=False)
    bio = Column(String, nullable=True)
    
    user = Relationship("User", back_populates="author_profile")

Here, we define a user model with a few fields to provide information about a given user including name, age, and when the user created and last updated their account info. Similarly, we define a display_name and bio for our AuthorProfile class. To accomplish this SQLAlchemy has Integer, String, and DateTime class. The tablename property names the table in your SQL database. Additionally, nullable=False enforces a NOT NULL constraint for a given field in your table.

In our AuthroProfile class, we define our relationship in two ways. First, we provide a foreign key called user_id , which points to the ID in the user table. The ondelete argument specifies how the database will respond if the referencing instance is deleted. ‘CASCADE’ indicates that if a User is deleted the corresponding AuthorProfile will be deleted as well. The second requirement is defining our relationship. We use the Relationship constructor and pass the name of the class we would like to relate to (in our case “User”). The back_populates argument is required in the AuthorProfile’s user field so that if an AuthorProfile instance is changed the referenced User instance will update accordingly. Finally, we need to define our relationship in our User class. Specifying uselist=Falsein our constructor ensures that our model returns a single value instead of a list when returning the results of a query.

One-to-Many Relationships

Now that we have defined our User and AuthorProfile entities, it’s time to define our blogs and comments. A blog can have many comments, but a comment can only be associated with one blog.

""" models.py """
...
class Blog(Model):
    __tablename__ = "blog"
    id = Column(Integer, primary_key=True, autoincrement=True)
    
    created_at = Column(DateTime, default=datetime.now())
    last_updated = Column(DateTime, onupdate=datetime.now())
    title = Column(String, nullable=False)
    tagline = Column(String, nullable=True)
    content = Column(String, nullable=False)
    comments = Relationship("Comment")


class Comment(Model):
    __tablename__ = "comment"
    id = Column(Integer, primary_key=True, autoincrement=True)
    blog_id = Column(Integer, ForeignKey("blog.id", ondelete='CASCADE'), nullable=True)
    text = Column(String, nullable=False)
    blog = Relationship("Blog",
                         passive_deletes=True,
                         back_populates="comments",
                         uselist=False)

This is similar to our one-to-one relationship except we don’t need to specify uselist=True as it is false by default. Similarly, we define the foreign key and use back_populates to ensure synchronous changes during our database transactions.

Many-to-Many Relationships

Our last relationship is a many-to-many relationship. We are allowing Authors to collaborate on the same blog. This means a blog can have many authors and an author can write many blogs. To define a relationship of this nature in SQLAlchemy we can define two additional models:

""" models.py """
...
class AuthorProfile(Model):
    ...
    blogs = Relationship("Blog",
                         secondary="blog_author",
                         passive_deletes=True,
                         back_populates="authors"
                        )

class BlogAuthor(Model):
    __tablename__ = "blog_author"
    id = Column(Integer, primary_key=True, autoincrement=True)
    author_profile_id = Column(Integer, ForeignKey("author.id", ondelete="CASCADE"), nullable=True)
    blog_id = Column(Integer, ForeignKey("blog.id", ondelete="CASCADE"), nullable=True)

class Blog(Model):
    ...
    authors = Relationship("AuthorProfile",
                         secondary="blog_author",
                         passive_deletes=True,
                         back_populates="blogs")

First, we establish the relationship between AuthorProfiles and Blogs using the Relationship constructor, similar to a one-to-one relationship. The key difference is the secondary argument that uses the name of the association table as the value. Then, we define our association table, BlogAuthor, with two foreign key fields, one that points to AuthorProfile and one that points to Blog.

And that's it! We have successfully defined the models for our database schema. Now, all we need to do is open up a python repl and run the following:

>>> from main import engine
>>> from modles import Model
>>> Model.metadata.create_all(engine)

This will initialize a db.qlite3 file with the corresponding tables we just defined. Next, we will populate our database.

Seed the DB

Populating our database is extremely simple. Create a seed.py file, copy the following JSON data to the top of the file, and run the following script:

""" seed.py """
def seed_db():
    data = [(users, User), (authors, AuthorProfile), (blogs, Blog), 
            (blog_authors, BlogAuthor), (comments, Comment)]
    for objects, model in data:
        for obj in objects:
            session.add(model(**obj))
        session.commit()

if __name__ == "__main__":
     seed_db()

Querying

Select and Filter Every SQLAlchemy model has a query property that allows you to query for instances of that model in the database. The query.all() method enables you to select all from the specified table. The filter and filter_by method allows two different ways of specifying conditions to limit your query results.

""" queries.py """

from models import Blog, BlogAuthor, AuthorProfile, User, Comment
from main import engine

def print_blogs(q):
    """ 
    helper function for printing out blogs 
    """
    for blog in q: 
        print(blog.title)
        print(blog.tagline)
        print(blog.created_at)

def select_all_blogs():
    """ 
    select all blogs 
    """

    blogs = Blog.query.all()
    print_blogs(blogs)

def filter_blogs():
    """ 
    filter blogs by title you can use filter or the filter_by method 
    """
    q_1 = Blog.query.filter(Blog.title == "Beginner Django Tutorial")
    q_2 = Blog.query.filter_by(title="Beginner Django Tutorial")
    # filter blog where the title contains the word beginner
    q_3 = Blog.query.filter(Blog.title.contains("Beginner"))
    print_blogs(q_3)

if __name__ == "__main__":
    selelct_all_blogs()
    filter_blogs()

Insert

Creating new instances of your models is as simple as instantiating that model and adding it to your transaction with session.add().

""" queries.py """
...

def print_users(u):
    for user in u:
        print(user.name, user.age)    

def insert_users():
    u = User(name="Clark Kent", age=37)
    u2 = User(name="Clark Kent", age=37)
    session.add(u)
    session.add(u2)
    session.commit()

    users = User.query.filter(User.name.contains("Clark"))
    print_users(users)

if __name__ == "__main__":
    # selelct_all_blogs()
    # filter_blogs()
    insert_users()


>>> Clark Kent 37
>>> Clark Kent 37

Update and Delete

Once you query for the objects you would like to update, it's as simple as updating the properties on that instance and committing those updates with session.commit(). Deleting instances is the same process.

""" queries.py """
...

def update_user():
    """ update user """

    u = User.query.filter_by(name="Clark Kent").first()
    u.name = "Superman"
    session.commit()

if __name__ == "__main__":
    # selelct_all_blogs()
    # filter_blogs()
    # insert_users()
    update_user()
    deleted = User.query.filter_by(name="Clark Kent").delete()
    session.commit() 
    print(f"Successfully deleted {deleted} user(s)")

Joins

Joins are performed using query.join(). The first argument is the Entity you would like to join, and the second is the field in which you defined the relationship. Consider the following example:

def get_user_profiles():
    """ one to one example with user and profiles """
    profiles = AuthorProfile.query.join(User, AuthorProfile.user)
    print(f"DISPLAY_NAME: {profiles.first().display_name}")
    print(f"NAME: {profiles.first().user.name}")

def get_blog_comments():
    """ 
    One to many example: 
    grabbing all comments asssociated with one blog 
    """
    blogs = Blog.query.join(Comment, Blog.comments).all()
    for comment in blogs[1].comments:
        print(comment.text)

Group By and Aggregate Functions

Basic CRUD operations won’t allow us to select the oldest user in our database. We need to use aggregate functions to operate on the field property for all instances. To accomplish this we use the func object.

""" queries.py """

from sqlalchemy import func

def get_avg_age():
    res = User.query.with_entities(func.max(User.age), User.name, User.id).all()
    print(res)
>>> [(55, 'New User', 6)]

What if we wanted to know the number of comments each blog has? To do so, we need to first group the comments by their blog_id, then count the number of blogs for each group.

def get_comment_counts():
    num_comments_by_blog = Comment.query \
        .with_entities(Comment.blog_id, 
                        func.count(Comment.blog_id)) \
        .group_by(Comment.blog_id).all()
    print(num_comments_by_blog)
>>> [(1, 2), (2, 3), (3, 1), (4, 1)]

Notice the .with_entities method returns results as tuples instead of Python objects. Our result shows the blog_id as the first index and the number of comments as the second index.

Conclusion

In this tutorial, we learned the basics of working with SQL Alchemy. You can now confidently design a schema, define your database models, and write queries to fetch your desired results.