Three Layer Architecture
The following describes my current favorite way of structuring applications and modules.
It's usually relevant to anything but the most trivial scripts; with more complex applications I will tend to structure individual modules like this.
Overview
Almost any application is separated into three main abstraction layers:
- storage layer
- business logic layer
- presentation layer.
The storage layer typically contains objects that load and save business-layer objects. This can be literal, eg. database CRUD, or more figurative, eg. integration with an external service such as Google Calendar.
Business logic layer is where most of the interesting stuff happens. The main "currency" in this layer is business objects, which aim to accurately represent what is important for this application specifically, while excluding details related to chores such as serialization.
Finally, the presentation layer is focused on passing useful representations of business objects to the user. For example, if this is a backend part of a web application, the presentation layer will translate business objects to JSON as well as implement logic around error catching and HTTP statuses.
A train scenario
To have an example, throughout this guide I will be referring to an imaginary application, which is used to manage trains. (You can also imagine it's a game.)
Storage layer
The "first class citizen" is always a Repository object, which is an object providing exclusive interface to persistence, whether it's DB, simple file storage, in-memory storage or external API.
Interface
The following guidelines describe how the repository object should look and behave:
It has one or more sets of CRUD-style methods working with business logic objects.
Eg.
    async def load_trains(self) -> AsyncIterable[Train]
could be a method on a repository.
It is instantiated with an IO object.
This can be eg. sqlalchemy session but also a third-party API session or a configuration file (eg. dictionary or an XML object). This IO object already represents concrete storage configuration.
Typically a unit test would pass a different IO object than production instance.
It hides low-level exceptions and only raises exceptions from a specific set.
For example, it might raise NoSuchTrainError or TrainSaveError or just TrainError, but never SQLWhateverError or IOError.
Alternatively, the set of exceptions could be generalized and shared across multiple stores, eg. as SaveError or NotFoundError. Such a decision might be useful for error tracking and translation purposes.
It enforces permissions, where applicable.
It's customary to pass eg. User ID to the repository on initialization and then make sure that any methods can only affect database rows belonging to this user.
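The guidelines above can be sketched roughly as follows. This is only an illustration under assumptions, not a reference implementation: the IO object is a standard-library sqlite3 connection, and the trains table layout, the Train fields and the load_train / save_train methods are made up for the example. Note that sqlite3 is synchronous, so these async methods would block the event loop; a real async application would use an async driver.

```python
import asyncio
import sqlite3
from dataclasses import dataclass
from typing import AsyncIterator


class TrainError(Exception):
    """Base class for everything the repository may raise."""


class NoSuchTrainError(TrainError):
    pass


class TrainSaveError(TrainError):
    pass


@dataclass(frozen=True)
class Train:
    train_id: int
    name: str


class TrainsRepository:
    def __init__(self, db: sqlite3.Connection, user_id: int) -> None:
        self._db = db            # the IO object: already configured storage
        self._user_id = user_id  # every query below is scoped to this user

    async def load_trains(self) -> AsyncIterator[Train]:
        rows = self._db.execute(
            "SELECT id, name FROM trains WHERE owner_id = ? ORDER BY id",
            (self._user_id,),
        )
        for train_id, name in rows:
            yield Train(train_id=train_id, name=name)

    async def load_train(self, train_id: int) -> Train:
        row = self._db.execute(
            "SELECT id, name FROM trains WHERE id = ? AND owner_id = ?",
            (train_id, self._user_id),
        ).fetchone()
        if row is None:
            # Also raised for rows owned by someone else.
            raise NoSuchTrainError(f"no such train: {train_id}")
        return Train(train_id=row[0], name=row[1])

    async def save_train(self, train: Train) -> None:
        try:
            with self._db:  # commit on success, roll back on error
                self._db.execute(
                    "INSERT INTO trains (id, name, owner_id) VALUES (?, ?, ?)",
                    (train.train_id, train.name, self._user_id),
                )
        except sqlite3.Error as e:
            # Low-level errors never leave the repository.
            raise TrainSaveError(str(e)) from e
```

Callers only ever see Train objects and TrainError subclasses; the SQL, the connection and the ownership filter stay inside the repository.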
Granularity
How many repositories should we have? When should we break up a repository? This depends on size and scope of the application.
For example, modular applications might benefit from having one repository for each module, but also commonly re-used repositories for things like users or orgs.
Note that some frameworks, eg. Django or SQLAlchemy, use the repository pattern as a "hidden" ORM-style layer, ending up with a single repository per class, but this guideline suggests a different approach. Realizing the need for a new business object should not come with the cost of having a new repository for it. It's OK to have a single repository dealing with multiple closely related business classes.
The upper limit is when the repository becomes unwieldy to maintain because of complexity or dependencies, or hard to understand because it does not represent a business-related concern.
Exclusivity
Repository is "hiding" complexity of IO.
This means that the repository must be the only (or at least the primary) way to access the state of a given object.
For example, applications or modules should never access the trains table directly. If some feature needs to access Train objects, it MUST use the relevant TrainsRepository for this. When implementing a feature, it's OK to extend the repository itself, even at the cost of breaking the DRY principle.
For example: there's already a method for loading trains, but it's not sufficient for our new feature, and it is heavily used by another critical feature. If we have enough confidence (read "tests") we might alter that method, but if we're not sure, it should be OK to add a new method.
Given that we maintain this exclusivity, maintenance is easier because concerns related to same type of objects will be covered under the same repository.
For example, if, for some reason, we must touch DB schema or add extra logic to deal with reliability issues, all changes we need to make are within the given repository, and any testing will still apply.
Re-usability
A feature that is to be integrated with trains should only import TrainsRepository (or a utility function to help instantiate it) and related business data types.
Testability
Because the repository itself depends on, but is not the same as the underlying IO, we can now replace the IO while keeping the repository.
This means we can have integration/unit tests on the repository itself. As long as we can make it cheap and safe to replace the IO layer, we can start aiming for good test coverage.
For example, if we have a TrainsRepository which instantiates with, and internally uses, an SQLite session, we can have a test which instantiates the repository with an empty in-memory session.
Most IO solutions will make such flexibility easy: we can almost always test against some kind of empty or pre-populated storage.
As a last resort (eg. for proprietary remote APIs), we can always create a mock of the IO layer. (Beware of mocks in general, though: they are hard to get right as they tend not to be a realistic representation of the real thing, possibly leading to always-passing tests.)
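A test setup along these lines might look like the sketch below. It assumes the IO object is a standard-library sqlite3 connection; the schema, the body of the get_test_db helper and the load_train_names method are invented for the illustration, and the repository is deliberately tiny.

```python
import sqlite3
from typing import Iterator, Sequence

SCHEMA = """
CREATE TABLE trains (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    owner_id INTEGER NOT NULL
)
"""


def get_test_db(rows: Sequence[tuple] = ()) -> sqlite3.Connection:
    """Return a fresh in-memory database, optionally pre-populated."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    db.executemany(
        "INSERT INTO trains (id, name, owner_id) VALUES (?, ?, ?)", rows
    )
    db.commit()
    return db


class TrainsRepository:
    """Tiny stand-in for the real repository (hypothetical method set)."""

    def __init__(self, db: sqlite3.Connection, user_id: int) -> None:
        self._db = db
        self._user_id = user_id

    def load_train_names(self) -> Iterator[str]:
        rows = self._db.execute(
            "SELECT name FROM trains WHERE owner_id = ? ORDER BY id",
            (self._user_id,),
        )
        for (name,) in rows:
            yield name


def test_empty_storage() -> None:
    repo = TrainsRepository(db=get_test_db(), user_id=0)
    assert list(repo.load_train_names()) == []


def test_prepopulated_storage() -> None:
    db = get_test_db(rows=[(1, "Express", 0), (2, "Freight", 0), (3, "Other", 7)])
    repo = TrainsRepository(db=db, user_id=0)
    # The row owned by user 7 must not leak into user 0's view.
    assert list(repo.load_train_names()) == ["Express", "Freight"]
```

Each test gets its own database, so tests cannot affect each other and no cleanup is needed.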
Business logic layer
The business layer is comprised of:
- a set of simple data-oriented classes,
- one or more logical functions or classes.
Models
..or classes, or types
Contain minimum amount of data
Just as much as is necessary for achieving the necessary functionality
Contain almost no methods
Almost all of business logic should be in the logical objects or functions, not in the models.
Most models won't really need any methods; they will be basically just structs.
It's OK to have small factory class methods to help instantiate the objects, or properties which help extract information.
Should avoid inheritance between them
Inheritance can get messy pretty quickly. Models should be small and fully understandable from looking at them.
If you need different types of something, consider using enum as a type marker (they're not very pretty in Python) and if-else-ing on the enum value -- that can be often cleaner, especially if the difference in behavior is localized to a given module and the number of types does not need to be extended dynamically.
Must be correct and unambiguous
Values that are traditionally ambiguous such as currency or time values must be in their least ambiguous form.
For example, Python datetimes must be timezone-aware, never naive; any disambiguation must be dealt with elsewhere.
Can be validated properly
Use of eg. pydantic for these models is appropriate and it helps further by adding Swagger documentation.
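To make the model guidelines more concrete, here is a small sketch using only standard-library dataclasses (pydantic would work similarly, with validation declared differently). The TrainKind marker, the field names and the speed values are placeholders invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class TrainKind(Enum):
    PASSENGER = "passenger"
    FREIGHT = "freight"


@dataclass(frozen=True)
class Train:
    train_id: int
    name: str
    kind: TrainKind  # type marker instead of PassengerTrain/FreightTrain subclasses


@dataclass(frozen=True)
class TrainSchedule:
    train_id: int
    departure: datetime

    def __post_init__(self) -> None:
        # Reject naive datetimes: the model must be unambiguous.
        if self.departure.tzinfo is None or self.departure.utcoffset() is None:
            raise ValueError("departure must be timezone-aware")

    @property
    def departure_utc(self) -> datetime:
        # Small read-only helper; no business logic here.
        return self.departure.astimezone(timezone.utc)


def max_speed_kmh(train: Train) -> int:
    # Logic branching on the marker lives outside the model.
    # The numbers are placeholders, not real limits.
    if train.kind is TrainKind.PASSENGER:
        return 160
    return 100
```

The models stay close to plain structs: one validation rule, one property, and the behavior that differs by kind is a plain if-else in the logic code.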
Logic
Typically, there is a single class called Session, which follows these guidelines:
Implements most of the actual logic
On initialization, takes one or more repositories (composition).
There can be specified class methods to help with common initializations.
Its methods generally use one or more repository and respond with another business object.
For example, a method could have a signature like this:
@dataclass
class Session:
    repo: TrainsRepository

    async def schedule_train(self, train: Train, schedule_time: datetime) -> TrainSchedule: ...
where TrainsRepository provides access to the storage layer and Train and TrainSchedule are business models mentioned above.
Any translation to user-facing form such as HTML, plain text or JSON (if we're a web app backend) happens in the Presentation layer discussed in the next section.
Reusability
While the repository object is re-usable already, the Session object is re-usable as well. Depending on the situation, one or another will be a better point to integrate with.
For example, if we only want to generate train statistics, it might be enough to import TrainsRepository, use a loader method to get a generator and start counting.
On the other hand, if we want our feature to perform a train-related operation equivalent to what a user would do, we can use the whole Session object as well.
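The statistics case could look something like the sketch below. The repository here is an in-memory stand-in so the example runs on its own, and the operator field and the trains_per_operator function are hypothetical names chosen for the illustration.

```python
import asyncio
from collections import Counter
from dataclasses import dataclass
from typing import AsyncIterator, Iterable


@dataclass(frozen=True)
class Train:
    train_id: int
    operator: str


class TrainsRepository:
    """In-memory stand-in; a real one would wrap a DB session."""

    def __init__(self, trains: Iterable[Train]) -> None:
        self._trains = list(trains)

    async def load_trains(self) -> AsyncIterator[Train]:
        for train in self._trains:
            yield train


async def trains_per_operator(repo: TrainsRepository) -> Counter:
    # Read-only feature: it needs the repository, not the whole Session.
    counts: Counter = Counter()
    async for train in repo.load_trains():
        counts[train.operator] += 1
    return counts
```

The statistics feature imports only the repository and the Train model, so it does not depend on anything in the Session.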
Testability - logic
Same reasoning applies to Testability.
Because the session object can be initialized with any IO environment, we can have tests like this:
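from datetime import datetime, timezone

async def test_scheduling():  # needs an async test runner, eg. pytest-asyncio
    session = Session(repo=TrainsRepository(
        db=get_test_db(),
        user_id=0,
    ))
    train = session.get_train(train_id=0)
    # schedule_train is declared async above, so it must be awaited
    schedule = await session.schedule_train(
        train=train,
        # timezone-aware, as required for business models
        schedule_time=datetime(2024, 4, 24, 12, 0, tzinfo=timezone.utc),
    )
    assert schedule  # etc.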
Notice that the example uses the real TrainsRepository object instead of a mock. This is preferable in most cases, since mocks can be tricky and if we follow this guide, the repository should be relatively "thin" so there's no reason to deal with the downsides of mocks.
Testability - models
Many models will remain pure dataclasses, so if we use a validation framework such as Pydantic, there won't be anything left to test anyway.
Things like factory methods or properties should be kept trivial so testability is not an issue.
Presentation layer
Focus of this layer is to provide integration with other services or users. For Backend, this usually means exposing some JSON endpoints. For CLI, this would mean parsing CLI args and running actions.
One application can have multiple interfaces, eg. one HTTP/JSON API, one HTTP/HTML API and also CLI API.
The presentation layer should be organized into parts such that:
A part can be a module or an executable, based on the required interface and best practices.
Modules don't depend on each other, although they all depend on the Business Layer.
Code in individual modules should be "thin" -- ie. don't add extra logic here unless it's specific to the translation and/or error handling of the presentation operations.
For example, if our Train app was to expose a HTTP/JSON API using FastAPI, an 'api.py' module could look like this:
from fastapi import Depends, FastAPI, HTTPException, Query
from .business import Session
from .models import Train, TrainsListing
# Assumed location of the storage-layer names used below; adjust to your layout.
from .storage import NoSuchTrainError, TrainsRepository, get_db

async def get_session(
    user_id: str = Query(...),
) -> Session:
    repo = TrainsRepository(
        db=get_db(),
        user_id=user_id,
    )
    return Session(repo=repo)

app = FastAPI(...)

# Declared before '/trains/{train_id}' so that 'list' is not parsed as a train ID.
@app.get('/trains/list',
    response_model=TrainsListing,
)
async def trains_list(
    q: str = Query(...),
    session: Session = Depends(get_session),
) -> TrainsListing:
    return session.list_trains(q=q)

@app.get('/trains/{train_id}',
    response_model=Train,
)
async def get_train(
    train_id: int,
    session: Session = Depends(get_session),
) -> Train:
    try:
        return session.get_train(train_id=train_id)
    except NoSuchTrainError as e:
        # Translate the business-level error to an HTTP status.
        raise HTTPException(404, str(e))
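The same business layer can be exposed through a second, equally thin interface. Below is a rough sketch of a CLI front end using the standard-library argparse; the Session here is a dict-backed stand-in so the example runs on its own, and the main function and its arguments are assumptions, not part of the application described above.

```python
import argparse
import json
import sys
from dataclasses import asdict, dataclass
from typing import Dict, Optional, Sequence


class NoSuchTrainError(Exception):
    pass


@dataclass(frozen=True)
class Train:
    train_id: int
    name: str


class Session:
    """Stand-in for the business layer, backed by a dict instead of a repository."""

    def __init__(self, trains: Dict[int, Train]) -> None:
        self._trains = trains

    def get_train(self, train_id: int) -> Train:
        try:
            return self._trains[train_id]
        except KeyError:
            raise NoSuchTrainError(f"no such train: {train_id}") from None


def main(session: Session, argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(prog="trains")
    parser.add_argument("train_id", type=int)
    args = parser.parse_args(argv)  # argv=None means "use sys.argv"
    try:
        train = session.get_train(train_id=args.train_id)
    except NoSuchTrainError as e:
        # Translate the business error to CLI conventions:
        # message on stderr, non-zero exit code.
        print(str(e), file=sys.stderr)
        return 1
    print(json.dumps(asdict(train)))
    return 0
```

As in the HTTP module, the only logic here is argument parsing, output formatting and error translation; everything else is delegated to the Session.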
Common guidelines
Across the board:
Try to avoid inheritance
Most problems commonly used as examples of inheritance are better served by
- composition for code re-use
- interfaces for polymorphism
Note: in Python, interfaces are implemented by typing.Protocol, not abc.ABC.
Use typing where possible
Eg. in Python, a passing and complete mypy check should be the minimum to get your code merged.
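As an illustration of composition plus typing.Protocol in place of inheritance, consider the sketch below. All names in it (Notifier, ConsoleNotifier, RecordingNotifier, DelayReporter) are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import List, Protocol


class Notifier(Protocol):
    def notify(self, message: str) -> None: ...


class ConsoleNotifier:
    # No base class: having a matching method is enough for the type checker.
    def notify(self, message: str) -> None:
        print(message)


class RecordingNotifier:
    """Collects messages instead of sending them; handy in tests."""

    def __init__(self) -> None:
        self.messages: List[str] = []

    def notify(self, message: str) -> None:
        self.messages.append(message)


@dataclass
class DelayReporter:
    notifier: Notifier  # composition: behavior is passed in, not inherited

    def report(self, train_name: str, minutes: int) -> None:
        self.notifier.notify(f"{train_name} is delayed by {minutes} min")
```

Neither notifier inherits from anything, yet a type checker accepts both wherever a Notifier is expected, and swapping one for the other needs no change in DelayReporter.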