Three Layer Architecture

Following is a description of my current favorite way of structuring applications and modules.

It's usually relevant to anything but the most trivial scripts; with more complex applications I will tend to structure individual modules like this.

Overview

Almost any application is separated into three main abstraction layers:

The storage layer typically contains objects revolving around loading and saving objects from the business layer. This can be in a literal sense, eg. database CRUD, or in a symbolic sense, eg. integration with an external service such as Google Calendar.

The business logic layer is where most of the interesting stuff happens. The main "currency" in this layer is business objects, which aim to accurately represent what is important for this application specifically, while excluding details related to chores such as serialization.

Finally, the presentation layer is focused on passing useful representations of business objects to the user. For example, if this is the backend part of a web application, the presentation layer will translate business objects to JSON as well as implement logic around error catching and HTTP statuses.

A train scenario

To have an example, throughout this guide I will be referring to an imaginary application, which is used to manage trains. (You can also imagine it's a game.)

Storage layer

The "first class citizen" is always a Repository object, which is an object providing an exclusive interface to persistence, whether it's a DB, simple file storage, in-memory storage or an external API.
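To make this more concrete, here is a minimal sketch of what such a repository might look like. The `Train` fields, the method names (`save_train`, `get_train`, `list_trains`) and the dict-backed storage are my assumptions for illustration only; `NoSuchTrainError` is the error name used in the API example later in this guide.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Train:
    # Hypothetical business object; the fields are illustrative only.
    train_id: int
    name: str


class NoSuchTrainError(Exception):
    """Raised when a requested train is not in storage."""


class TrainsRepository:
    """The only place that knows how trains are persisted.

    This sketch keeps everything in a dict; a DB- or API-backed
    variant would expose the same methods.
    """

    def __init__(self) -> None:
        self._rows: dict[int, dict] = {}

    def save_train(self, train: Train) -> None:
        # The storage-specific representation stays inside the repository.
        self._rows[train.train_id] = {"id": train.train_id, "name": train.name}

    def get_train(self, train_id: int) -> Train:
        try:
            row = self._rows[train_id]
        except KeyError:
            raise NoSuchTrainError(f"no train with id {train_id}") from None
        return Train(train_id=row["id"], name=row["name"])

    def list_trains(self) -> Iterator[Train]:
        # Callers only ever see business objects, never raw rows.
        for row in self._rows.values():
            yield Train(train_id=row["id"], name=row["name"])


repo = TrainsRepository()
repo.save_train(Train(train_id=1, name="Night Express"))
print(repo.get_train(1))
```

Callers work with `Train` objects only; how a train is stored (here a plain dict) never leaks out of the repository.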

Interface

Following are guidelines on how the repository object should look and behave:

Granularity

How many repositories should we have? When should we break up a repository? This depends on the size and scope of the application.

For example, modular applications might benefit from having one repository for each module, but also commonly re-used repositories for things like users or orgs.

Note that some frameworks, eg. Django or SQLAlchemy, use the repository pattern as a "hidden" ORM-style layer, ending up with a single repository per class, but this guideline suggests a different approach. Realizing the need for a new business object should not come with the cost of having a new repository for it. It's OK to have a single repository dealing with multiple closely related business classes.

The upper limit is when the repository becomes unwieldy to maintain because of complexity or dependencies, or hard to understand because it does not represent a business-related concern.

Exclusivity

The repository is "hiding" the complexity of IO.

This means that the repository must be the only (or at least the primary) way to access the state of a given object.

For example, applications or modules should never access the trains table directly. If some feature needs to access Train objects, it MUST use the relevant TrainsRepository for this. When implementing a feature, it's OK to extend the repository itself, even at the cost of breaking the DRY principle.

For example: There's already a method for loading trains but it's not sufficient for our new feature, and it is heavily used by another critical feature. If we have enough confidence (read "tests") we might alter that method, but if we're not sure, it should be OK to add a new method.
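The trade-off might be sketched like this. Everything here is hypothetical: the `line` and `in_service` fields, the method names and the in-memory list standing in for real storage are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Train:
    # Hypothetical fields, for illustration only.
    train_id: int
    line: str
    in_service: bool


class TrainsRepository:
    def __init__(self, trains: list[Train]) -> None:
        # An in-memory list stands in for the real storage.
        self._trains = list(trains)

    def load_trains(self) -> list[Train]:
        # Existing method, heavily used by another critical feature.
        # Left untouched because we lack the tests to change it safely.
        return [t for t in self._trains if t.in_service]

    def load_trains_on_line(self, line: str) -> list[Train]:
        # New method added for the new feature. It repeats some of the
        # filtering above (breaking DRY), but all access to trains
        # still goes through the repository.
        return [t for t in self._trains if t.in_service and t.line == line]


repo = TrainsRepository([
    Train(1, "north", True),
    Train(2, "south", True),
    Train(3, "north", False),
])
print([t.train_id for t in repo.load_trains_on_line("north")])
```

Once there are tests covering both callers, the two methods can be merged; until then the duplication is the cheaper risk.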

Given that we maintain this exclusivity, maintenance is easier because concerns related to the same type of objects will be covered under the same repository.

For example, if, for some reason, we must touch the DB schema or add extra logic to deal with reliability issues, all changes we need to make are within the given repository, and any existing tests will still apply.

Re-usability

A feature that is to be integrated with trains should only import TrainsRepository (or a utility function to help instantiate it) and related business data types.

Testability

Because the repository itself depends on, but is not the same as, the underlying IO, we can replace the IO while keeping the repository.

This means we can have integration/unit tests on the repository itself. As long as we can make it cheap and safe to replace the IO layer, we can start aiming for good test coverage.

For example, if we have a TrainsRepository which is instantiated with, and internally uses, an SQLite session, we can have a test which instantiates the repository with an empty in-memory session.

Most IO solutions will make such flexibility easy: we can almost always test against some kind of empty or pre-populated storage.
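A minimal sketch of such a test, assuming the repository takes a plain `sqlite3` connection from the standard library as its `db` argument. The table layout and the `save_train` / `load_trains` method names are made up for the example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Train:
    # Hypothetical business object.
    train_id: int
    name: str


class TrainsRepository:
    def __init__(self, db: sqlite3.Connection) -> None:
        # The connection is injected, so tests can pass in any database.
        self._db = db
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS trains "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def save_train(self, train: Train) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO trains (id, name) VALUES (?, ?)",
            (train.train_id, train.name),
        )
        self._db.commit()

    def load_trains(self) -> list[Train]:
        rows = self._db.execute("SELECT id, name FROM trains ORDER BY id")
        return [Train(train_id=r[0], name=r[1]) for r in rows]


def test_save_and_load() -> None:
    # ":memory:" gives a fresh, empty database that vanishes afterwards.
    repo = TrainsRepository(db=sqlite3.connect(":memory:"))
    assert repo.load_trains() == []
    repo.save_train(Train(train_id=1, name="Night Express"))
    assert repo.load_trains() == [Train(train_id=1, name="Night Express")]


test_save_and_load()
```

The repository code under test is exactly what would run in production; only the connection passed to it differs.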

As a last resort (eg. proprietary remote APIs), we can always create a mock of the IO layer. (Beware of mocks in general, though: they are hard to get right as they tend not to be a realistic representation of the real thing, possibly leading to always-passing tests.)

Business logic layer

The business layer is comprised of

Models

..or classes, or types

Logic

Typically, there is a single class called Session, which follows these guidelines:

Reusability

While the repository object is re-usable already, the Session object is re-usable as well. Depending on the situation, one or another will be a better point to integrate with.

For example, if we only want to generate train statistics, it might be enough to import TrainsRepository, use a loader method to get a generator and start counting.

On the other hand, if we want our feature to perform a train-related operation equivalent to what a user would do, we can use the whole Session object as well.
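The statistics case could look roughly like this. The `operator` field and the `iter_trains` loader are hypothetical names, and the repository is backed by a plain list to keep the sketch self-contained.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Train:
    # Hypothetical fields, for illustration only.
    train_id: int
    operator: str


class TrainsRepository:
    def __init__(self, trains: Iterable[Train]) -> None:
        # A plain list stands in for the real storage.
        self._trains = list(trains)

    def iter_trains(self) -> Iterator[Train]:
        # Loader method that hands trains out one at a time.
        yield from self._trains


def trains_per_operator(repo: TrainsRepository) -> Counter:
    # The feature needs only the repository; no Session is involved.
    return Counter(train.operator for train in repo.iter_trains())


repo = TrainsRepository([Train(1, "A"), Train(2, "B"), Train(3, "A")])
stats = trains_per_operator(repo)
print(stats.most_common())
```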

Testability - logic

Same reasoning applies to Testability.

Because the session object can be initialized with any IO environment, we can have tests like this:

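from datetime import datetime

# Session, TrainsRepository and get_test_db are assumed to be importable
# from the application and its test helpers.

def test_scheduling():
    # Real repository on top of a disposable test database; no mocks.
    session = Session(repo=TrainsRepository(
        db=get_test_db(),
        user_id=0,
    ))
    train = session.get_train(train_id=0)
    schedule = session.schedule_train(
        train=train,
        # datetime() does not parse strings; fromisoformat() does.
        schedule_time=datetime.fromisoformat("2024-04-24T12:00+00:00"),
    )
    assert schedule     # etc.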

Notice that the example uses the real TrainsRepository object instead of a mock. This is preferable in most cases, since mocks can be tricky and if we follow this guide, the repository should be relatively "thin" so there's no reason to deal with the downsides of mocks.

Testability - models

Many models will remain pure dataclasses, so if we use a validation framework such as Pydantic, there won't be anything left to test anyway.

Things like factory methods or properties should be kept trivial so testability is not an issue.
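As an illustration of "trivial", a model along these lines would hardly need tests of its own. I'm using a standard library dataclass here; the `Schedule` fields, the `duration` property and the `starting_at` factory are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Schedule:
    # Hypothetical model; pure data plus trivial helpers.
    train_id: int
    departure: datetime
    arrival: datetime

    @property
    def duration(self) -> timedelta:
        # One subtraction: nothing here that could hide a bug.
        return self.arrival - self.departure

    @classmethod
    def starting_at(cls, train_id: int, departure: datetime,
                    hours: int) -> "Schedule":
        # Factory that only derives one field from another.
        return cls(train_id, departure, departure + timedelta(hours=hours))


departure = datetime(2024, 4, 24, 12, 0, tzinfo=timezone.utc)
schedule = Schedule.starting_at(train_id=0, departure=departure, hours=3)
print(schedule.duration)
```

Anything more involved than this (eg. checking for conflicts with other schedules) belongs in the logic part, where it can be tested through the Session.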

Presentation layer

The focus of this layer is to provide integration with other services or users. For a backend, this usually means exposing some JSON endpoints. For a CLI, this would mean parsing CLI args and running actions.

One application can have multiple interfaces, eg. one HTTP/JSON API, one HTTP/HTML API and also CLI API.
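For instance, a CLI interface on top of the same business layer might be sketched like this, using `argparse` from the standard library. The `Session` below is only a stand-in with a single method, and the `show` sub-command is made up for the example.

```python
import argparse
import json
import sys
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Train:
    # Hypothetical business object.
    train_id: int
    name: str


class NoSuchTrainError(Exception):
    pass


class Session:
    # Stand-in for the business layer, backed by a plain dict.
    def __init__(self, trains: dict[int, Train]) -> None:
        self._trains = trains

    def get_train(self, train_id: int) -> Train:
        try:
            return self._trains[train_id]
        except KeyError:
            raise NoSuchTrainError(f"no train with id {train_id}") from None


def main(argv: list[str], session: Session) -> int:
    parser = argparse.ArgumentParser(prog="trains")
    sub = parser.add_subparsers(dest="command", required=True)
    show = sub.add_parser("show", help="print one train as JSON")
    show.add_argument("train_id", type=int)
    args = parser.parse_args(argv)

    # "show" is the only command in this sketch.
    try:
        train = session.get_train(train_id=args.train_id)
    except NoSuchTrainError as e:
        # Business error becomes a message plus a non-zero exit code,
        # the CLI counterpart of an HTTP 404.
        print(f"error: {e}", file=sys.stderr)
        return 1
    print(json.dumps(asdict(train)))
    return 0


session = Session({1: Train(train_id=1, name="Night Express")})
exit_code = main(["show", "1"], session)
```

The CLI only parses arguments, calls the Session and formats the result, so an HTTP interface can sit next to it without either knowing about the other.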

The presentation layer should be organized into parts such that:

For example, if our Train app was to expose an HTTP/JSON API using FastAPI, an 'api.py' module could look like this:

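from fastapi import Depends, FastAPI, HTTPException, Query

# NoSuchTrainError, Train, TrainsRepository and get_db are used below but
# were never imported; the module paths chosen here are assumptions.
from .business import NoSuchTrainError, Session
from .models import Train, TrainsListing
from .storage import TrainsRepository, get_db


async def get_session(user_id: str = Query(...),
                      ) -> Session:
    # Build a per-request session on top of the user's repository.
    repo = TrainsRepository(
        db=get_db(),
        user_id=user_id,
    )
    return Session(repo=repo)

app = FastAPI(...)  # constructor arguments elided

# Declared before '/trains/{train_id}': routes are matched in order, so
# otherwise 'list' would be parsed as a train_id and rejected.
@app.get('/trains/list',
         response_model=TrainsListing,
         )
async def trains_list(q: str = Query(...),  # q is a query parameter, not a path one
                      session: Session = Depends(get_session),
                      ) -> TrainsListing:
    return session.list_trains(q=q)

@app.get('/trains/{train_id}',
         response_model=Train,
         )
async def get_train(train_id: int,
                    session: Session = Depends(get_session),
                    ) -> Train:
    try:
        return session.get_train(train_id=train_id)
    except NoSuchTrainError as e:
        # Translate the business-level error to an HTTP status.
        raise HTTPException(404, str(e))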

Common guidelines

Across the table:

published by mdpublish, 2024-07-02