published 2023-07-08
Recently "Sketch of a Post-ORM" was on HN. The author elucidates some shortcomings of ORMs (some of which are straw men) and of raw SQL, before sketching the shape of the database schema and query management system that he wants. The comments mostly argue that ORMs are better than the author’s straw men – for example, ORMs do in fact make efficient queries most of the time, and you can always drop into raw SQL for the other 5%. A few of the comments also point to other non-ORM query and schema management systems, including:
As a long-time application developer and relational database (PostgreSQL) user, I feel the pain with both raw SQL and with ORMs. Stripping away the straw men, the pain points can be summarized as follows:
1. Raw SQL is repetitive and tedious for the simple CRUD queries that make up most application work. (Many of us gave up maintaining giant queries.php files and started using ORMs.)
2. Every ORM has its own query DSL that has to be learned, and that knowledge does not transfer between ORMs the way SQL knowledge does.
3. ORMs add layers of leaky abstraction over SQL that make complex queries hard to understand and debug.
For me, the shortcomings of raw SQL primarily have to do with the difficulty of composing more complex queries. I am comfortable writing SQL and don't find it much of a burden to build a set of generic queries that can do CRUD on tables. I also think that inexperienced developers ought to bone up on SQL if they are going to develop applications that query databases: learning and mastering at least one dialect of SQL is a basic skill and requirement for backend developers, because they are going to be using SQL for the rest of their careers. (Despite the attempts to replace SQL, I don't see it going anywhere, though I do think that things like PRQL will become more popular for analytical work.)
With #1, I have had good experiences using ORMs for CRUD applications, so I am mostly concerned about #2 and #3. With #3, I have had some very bad experiences debugging issues with complex ORM queries: the composability of queries in, say, SQLAlchemy leads some development teams to write queries in small pieces scattered over several files, which makes them impossible to understand by reading the code, and the way the query functions are written can make it very difficult to render those queries into SQL to understand them.
These issues are compounded by #2: If you are not an expert in the DSL in question, it can be very difficult to understand what the query is doing. I have used both the Django and SQLAlchemy ORMs, but I’ve never become an absolute expert in any ORM’s DSL. If I’m writing a query and it gets to a certain level of complexity, such as needing a CTE or recursion, I will drop into SQL, which I have spent more time and effort mastering. This has proven wise, because it makes it possible for me to be very productive on a wide variety of projects: No matter what ORM is being used, CRUD queries are easy, and complex queries can be done in SQL.
#3 is also an issue with ORM migration systems: typically, you write the table as a class, and then the migration system generates a language-integrated script that applies changes to the live database. That's a lot of layers of abstraction over SQL DDL, when it would be so much easier just to write SQL `CREATE TABLE` and `ALTER TABLE` statements.
In the end, most of my issue with ORMs is that, although they provide some conveniences for doing CRUD queries, especially those with filtering driven by business logic, they add too many layers of abstraction over SQL, which developers should understand and not have hidden from them.
From my point of view, the ideal solution to the shortcomings of both ORMs and raw SQL will have the following characteristics:
- Given a relation name, fields, and filters, it is straightforward to generate SQL for the common CRUD operations.
- Key-value mappings are handled in a consistent manner: fields and values map naturally to named query parameters, and filters can be modeled as a list of key + operator + value relationships, with the operator defaulting to `=`.
- Joins, CTEs, windows, grouping – all these are often useful, and many ORMs spend a lot of effort modeling them in application code. Instead, it would be a lot simpler to facilitate the more direct use of SQL.
What is the simplest thing that could possibly work? That would be defining queries as a sequence of SQL fragments and “composing” these fragments as a single SQL query. For example:
# Here's a simple Python example
database = ...  # some database connection object
tablename = "items"
fields = ["*"]
params = {"a": 5, "b": "two"}
ops = {"a": "<"}
filters = {key: ops.get(key, "=") for key in params}
query = " ".join([
    "SELECT",
    ", ".join(fields),
    "FROM",
    tablename,
    "WHERE",
    " AND ".join(f"{key} {op} :{key}" for key, op in filters.items()),
])
print(query)  # SELECT * FROM items WHERE a < :a AND b = :b
results = database.fetch_all(query, params)
The above query is a string that is composed from other strings. The library can assist with ergonomics around filters, fields, and parameters, as well as with the simple CRUD cases – this SELECT query is essentially what the CRUD helper would produce. Here are some simple helpers that would make writing this query a breeze in real life:
from collections.abc import Sequence
from typing import Any, Optional

class Q:
    """Namespace for the SQL-composition helpers."""

    @staticmethod
    def fields(data: Sequence[str]) -> str:
        """
        Render a comma-separated list of field names from the given fields,
        e.g., for dynamically specifying SELECT or INSERT field lists.
        """
        return ", ".join(data)

    @staticmethod
    def filter(key: str, *, op: str = "=", val: Optional[Any] = None) -> str:
        """
        Render a filter from the given field key, optional operator, and
        optional value. Without a value, a named parameter (:key) is used.
        """
        return f"{key} {op} {val or ':' + key}"

    @staticmethod
    def select(relation: str, fields=None, filters=None, orderby=None, limit=None) -> str:
        """
        SELECT fields
        FROM relation
        [WHERE filters]
        [ORDER BY orderby]
        [LIMIT limit]
        """
        fields = fields or ["*"]
        query = [
            f"SELECT {Q.fields(fields)}",
            f"FROM {relation}",
        ]
        if filters:
            query.append(f"WHERE {' AND '.join(filters)}")
        if orderby:
            query.append(f"ORDER BY {orderby}")
        if limit:
            query.append(f"LIMIT {limit}")
        return " ".join(query)
# For the SELECT query above, we still have to set the data up...
tablename = "items"
fields = ["*"]
params = {"a": 5, "b": "two"}
ops = {"a": "<"}
filters = [Q.filter(field, op=ops.get(field, "=")) for field in params]
# ... but then, the query is just:
query = Q.select(tablename, fields=fields, filters=filters)
results = database.fetch_all(query, params)
This approach can also be used for arbitrarily complex cases. Many people pooh-pooh that, but are other approaches really better? The key thing that requires care is never to include a parameter value directly in the query string, but to let the database interface bind values to the query via parameters.
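To make that rule concrete, here is a minimal sketch using the standard-library sqlite3 driver (chosen only because it accepts the same :name placeholder style; the table and data are invented for illustration):
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (a INTEGER, b TEXT)")
conn.execute("INSERT INTO items VALUES (1, 'two')")

query = "SELECT * FROM items WHERE a < :a AND b = :b"
params = {"a": 5, "b": "two"}

# Safe: the driver binds the values; they never appear in the query string.
rows = conn.execute(query, params).fetchall()
print(rows)  # [(1, 'two')]

# Unsafe (never do this): f"... WHERE b = '{params['b']}'" invites SQL injection.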
There are other approaches, most of which involve a language-integrated DSL for generating SQL queries. Common Lisp has CLSQL and Postmodern, both of which define a query generator that uses Lisp expressions to generate SQL. These libraries stay very close to SQL, so that the Lisp expression translates directly to SQL. Other language-specific query-generating libraries are less transparent; SQLAlchemy is perhaps the most complex and complete. This approach has problems #2 and #3 above: learning a DSL when learning SQL would be a better use of time and effort, and leaky abstractions.
Another approach would be to use a library like PRQL and to write queries as a sequence of PRQL fragments that are compiled to SQL. This would be very similar to using the features of the language to compose SQL directly, but using a query language that is perhaps more amenable to composition.
For my money, composing SQL with strings and using a few helper functions for things like CRUD queries is the best all-around approach.
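To illustrate what such a CRUD helper might look like, here is a hypothetical INSERT counterpart to Q.select above (a sketch in the same style, not an existing library API):
def insert(relation: str, data: Sequence[str]) -> str:
    """
    Render INSERT INTO relation (field, ...) VALUES (:field, ...), with one
    named parameter per field, bound from a params mapping at execution time.
    """
    placeholders = ", ".join(f":{key}" for key in data)
    return f"INSERT INTO {relation} ({Q.fields(data)}) VALUES ({placeholders})"

# insert("items", ["a", "b"]) == "INSERT INTO items (a, b) VALUES (:a, :b)"
# database.execute(insert("items", ["a", "b"]), {"a": 5, "b": "two"})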
I would much rather define a table or add a column in SQL than in some programming language’s DSL.
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name varchar NOT NULL
);
-- some time later...
ALTER TABLE users ADD COLUMN email varchar;
There is nothing that a schema migration should do that can’t be done by this method. (Someone might argue for the ability to create these definitions dynamically in application code. I have never seen the need for this, but if it arose, it could easily be accommodated using the query composition methods above.)
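If that need did arise, a helper in the same composition style would suffice. A hypothetical sketch:
def create_table(relation: str, columns: dict[str, str]) -> str:
    """Compose CREATE TABLE from a mapping of column name -> type/constraints."""
    cols = ", ".join(f"{name} {ddl}" for name, ddl in columns.items())
    return f"CREATE TABLE {relation} ({cols})"

# create_table("users", {
#     "id": "UUID PRIMARY KEY DEFAULT gen_random_uuid()",
#     "name": "varchar NOT NULL",
# })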
A migration tool should also have a method for loading seed data from CSV, JSON, or YAML.
# A YAML example
users:
- name: connieroberts
id: '8992e675-ac3a-4a1c-8816-293f37e62d0b'
- name: robertthomas
id: '26b9481d-e11d-413f-8389-a153342ea67c'
- name: joyharris
id: '923919f2-30a9-4dbe-abb2-f7d70f4407c0'
- name: andrewjohnston
id: 'ecfe5a2b-0378-4d5f-8cec-93b28e792463'
- name: susancross
id: 'f4ceacdf-ea1a-4096-800d-04da0caadbdf'
# users.csv - using pipe `|` as a delimiter because it is uncommon in field data.
id|name
8992e675-ac3a-4a1c-8816-293f37e62d0b|connieroberts
26b9481d-e11d-413f-8389-a153342ea67c|robertthomas
923919f2-30a9-4dbe-abb2-f7d70f4407c0|joyharris
ecfe5a2b-0378-4d5f-8cec-93b28e792463|andrewjohnston
f4ceacdf-ea1a-4096-800d-04da0caadbdf|susancross
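Loading such a seed file is just the composition approach again: read the rows and feed each one to a parameterized INSERT. A sketch, reusing the hypothetical insert helper from above:
import csv

def load_seed_csv(database, path: str, relation: str) -> None:
    """Insert each row of a pipe-delimited seed file into the given relation."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f, delimiter="|"):
            database.execute(insert(relation, list(row)), row)

# load_seed_csv(database, "users.csv", "users")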
A migration tool should be able to handle not just a single sequence of migrations, but branching and merging migrations. Merging is not an easy problem to solve in terms of schema maintenance, so the migration tool shouldn’t try to solve merging – the developers have to ensure that a merge does not create conflicts. But the migration tool should be able to process merge migrations.
It should also be able to handle situations where a developer switches branches but has migrations in the database from a branch that is no longer present in the file system – the migration system should be able to reverse out of migrations that are in the database but not in the codebase.
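The core of that check is plain set logic. A sketch with hypothetical names:
def migrations_to_reverse(applied: list[str], on_disk: set[str]) -> list[str]:
    """
    Migrations recorded in the database but missing from the codebase,
    newest first, so they can be rolled back before applying the current
    branch's migrations.
    """
    return [name for name in reversed(applied) if name not in on_disk]

# migrations_to_reverse(["0001_init", "0002_other_branch"], {"0001_init"})
# == ["0002_other_branch"]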
The Django migration tool can incorporate migrations defined in applications that are imported into the current project but are not part of its codebase. This is an extremely nice, labor-saving feature that enables the creation of applications that can be imported and used in other applications, including the migrations that make up the data models. I have not seen this feature elsewhere.
A few years ago (May 2019), I set out to solve these issues for myself: I started a Python library called SQLY. One of its central goals is to support the parameter syntax of multiple database interfaces: some drivers use numbered dollar signs ($1 … $N) or question marks (?), while others use named %s-strings (%(name)s) or colon-strings (:name) to indicate parameters that should be substituted. SQLY opts to use named colon-strings and automates generating SQL with the correct dialect's syntax.
As of this writing, SQLY has achieved these goals: the query and migration engines work as advertised, but some of the code throughout the library needs to be cleaned up, and there is no documentation (!) and there are no tests (!!!). Until those things are completed, it is alpha-quality software.
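To illustrate the kind of dialect translation this involves (a sketch of the idea, not SQLY's actual implementation), here is how :name placeholders might be rewritten to PostgreSQL-style numbered parameters:
import re

def to_numbered(query: str, params: dict) -> tuple[str, list]:
    """
    Rewrite :name placeholders as $1..$N and return the rewritten query
    along with the positional argument list that such a driver expects.
    """
    names: list[str] = []

    def sub(match: re.Match) -> str:
        names.append(match.group(1))
        return f"${len(names)}"

    rewritten = re.sub(r":(\w+)", sub, query)
    return rewritten, [params[name] for name in names]

# to_numbered("SELECT * FROM items WHERE a < :a AND b = :b", {"a": 5, "b": "two"})
# == ("SELECT * FROM items WHERE a < $1 AND b = $2", [5, "two"])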