Data, computation and persistence

When thinking about systems these days I often start with three core questions.

What is the data like that the system will manage, use and emit to other systems?

What kind of computations does the system need to make with or based on the data?

How should the data and the state of the computations be persisted?

I started to ask these questions while programming with Clojure, but I continue asking the same questions while programming in other languages and environments.

These questions have helped me think about what the planned system could be and how the answers to the three questions influence each other.

Let me elaborate.

Data

Starting with the data is a no-brainer, but how you think about the data can affect the two other components: how you do computations with the data and how that data is persisted into some storage.

It is tempting to always think about data in terms of a highly normalized relational database model. Still, there are domains and problems where the data needs to be accessed through other, less conventional models.

Hence it is good to keep the conceptual data model separate from the logical and physical data models while you are still figuring out what the system will be like.

Computations

Depending on what kind of computations I want to do with the data, I might want to model the actual data structures differently. Am I looking at discrete records or facts, a larger graph, or a stream of events to make sense of the world with a computation? Which model lets me express the computation and get to its answer?

If all I have are filter, map and reduce, I will look at the problem and the data models needed through that lens. But if I have a larger tool chest, I might approach the problem differently. For example, do I have a problem that could be solved efficiently with an optimization tool, graph engine or logic programming tool?
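To make the point about lenses concrete, here is a minimal sketch in Python. The `shipments` records, their field names and both questions are made up for illustration. The first question fits filter, map and reduce over discrete records. The second one (can goods get from one depot to another?) becomes much simpler once the same records are viewed as a graph.

```python
from collections import defaultdict, deque
from functools import reduce

# Hypothetical sample data: each record is one shipment leg between two depots.
shipments = [
    {"src": "A", "dst": "B", "weight_kg": 120},
    {"src": "B", "dst": "C", "weight_kg": 80},
    {"src": "C", "dst": "D", "weight_kg": 200},
    {"src": "E", "dst": "F", "weight_kg": 50},
]


def total_weight_over(records, min_kg):
    """Record lens: filter, map and reduce over discrete records."""
    heavy = filter(lambda r: r["weight_kg"] >= min_kg, records)
    weights = map(lambda r: r["weight_kg"], heavy)
    return reduce(lambda acc, w: acc + w, weights, 0)


def build_graph(records):
    """Graph lens: reshape the same records into a directed adjacency map."""
    graph = defaultdict(set)
    for r in records:
        graph[r["src"]].add(r["dst"])
    return graph


def reachable(graph, start, goal):
    """Breadth-first search: is there any route from start to goal?"""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


heavy_total = total_weight_over(shipments, 100)
graph = build_graph(shipments)
a_reaches_d = reachable(graph, "A", "D")
a_reaches_f = reachable(graph, "A", "F")
```

The reachability question could be forced through filter, map and reduce as well, but it would likely end up as a hand-rolled graph traversal in disguise. A dedicated graph engine would take this one step further than the small search function here.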

Persistence

How I model the data and what kind of computations I want to do with it then naturally affect the persistence model of the data into some storage system.

Modern relational databases are awesome, and PostgreSQL is often the right answer. But there are situations where you could benefit from something else, such as XTDB, Kafka, Redis, S3 or DynamoDB.

Similarly, selecting the persistence mechanism and style affects what computations or parts of the computation I can and should offload from the application code to the persistence engine or persistence subsystem.

Object storage like S3 or even a filesystem can be a perfect persistence mechanism when you can structure data intelligently and use query mechanisms like S3 Select, or grep and jq, to answer your business questions.
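As a rough sketch of what "structuring data intelligently" on a plain filesystem could look like, the Python example below stores events as JSON Lines files in one directory per day and answers a question by scanning a single day's file. The directory layout, the `events.jsonl` file name, the function names and the sample events are all assumptions made for this illustration; the same idea would map to key prefixes in an object store.

```python
import json
import tempfile
from pathlib import Path


def append_event(root, event):
    """Append one event to <root>/<day>/events.jsonl, using the event's 'day' field."""
    day_dir = Path(root) / event["day"]
    day_dir.mkdir(parents=True, exist_ok=True)
    with open(day_dir / "events.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def query(root, day, predicate):
    """Scan a single day's file and return the events matching the predicate."""
    path = Path(root) / day / "events.jsonl"
    if not path.exists():
        return []
    with open(path, encoding="utf-8") as f:
        return [e for e in map(json.loads, f) if predicate(e)]


# Usage with made-up data, written to a temporary directory.
with tempfile.TemporaryDirectory() as root:
    append_event(root, {"day": "2024-01-01", "order": 1, "amount": 250})
    append_event(root, {"day": "2024-01-01", "order": 2, "amount": 40})
    append_event(root, {"day": "2024-01-02", "order": 3, "amount": 500})

    big_orders = query(root, "2024-01-01", lambda e: e["amount"] > 100)
    missing_day = query(root, "2024-01-03", lambda e: True)
```

Because each line is a standalone JSON document, the same files stay readable by grep and jq, and partitioning by day means a question about one day never has to touch the rest of the data.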

My main goal: countering my biases

There are many other viewpoints and questions that I need to think about and use when designing systems and thinking about what we are trying to solve. Still, these three questions provide me with an excellent safeguard for not jumping to a simple but wrong conclusion too early.

Some programming environments and tools make it easy to solve certain problems in a certain way, and in the end, they can significantly affect how you try to approach a business problem. And no matter how smart you are, you can still fall into a silly trap set by abstractions if you do not think things through.

Ultimately, I don’t want the tail to wag the dog unless I consciously choose to let it.