CQRS stands for Command Query Responsibility Segregation, and the main idea behind it is to separate the reading part of the pipeline from the writing part. With CQRS, a method either writes or reads, but never both; to put it more simply, a method that returns a result should never change that result.
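As a minimal sketch of that rule (the class and method names here are hypothetical, chosen just for illustration), commands change state and return nothing, while queries return state and change nothing:

```typescript
// Hypothetical account service illustrating command/query separation.
class AccountService {
  private balances = new Map<string, number>();

  // Command: performs a write, returns no data.
  deposit(accountId: string, amount: number): void {
    const current = this.balances.get(accountId) ?? 0;
    this.balances.set(accountId, current + amount);
  }

  // Query: returns data, never modifies it — calling it twice
  // in a row must yield the same answer.
  getBalance(accountId: string): number {
    return this.balances.get(accountId) ?? 0;
  }
}
```

Because queries are side-effect free, they can be cached, retried, and served from a separate read store without changing the application's behavior.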

Architecturally, it looks something like this:


What would be the advantage of separating the read and write paths? Well, for starters, in most applications the number of read operations performed is orders of magnitude bigger than the number of write operations. Imagine your typical Facebook-type app. As a user, how often do you browse other people’s profiles, and how often do you actually post on your own page? So obviously, an application like that needs an architecture that scales out easily, meaning it should be easy to host our web application on multiple servers. The stateless nature of the web helps in that respect, but what is really problematic is that we have to track state, and we have to do it for multiple users. So let’s look at a way of doing that.

Obviously, if you are writing and updating data using the same model, it’s pretty hard to do any read-related optimizations, because you will always be transferring the whole object over the wire when you are doing CRUD. There is of course the possibility of doing projections in the GET methods of your API, but this involves doing expensive inner joins on your relational database, then projecting the result of that query into some kind of view model (usually with some kind of mapping). These can be very expensive operations, especially if your database is partitioned across multiple servers.
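One way around this is to keep a separate, denormalized read model shaped exactly for the screen, so the join and mapping happen once at write time rather than on every read. A sketch, with hypothetical entity shapes:

```typescript
// Write-side (normalized) entities — hypothetical shapes for illustration.
interface User { id: string; name: string }
interface Post { id: string; authorId: string; text: string }

// Read-side view model: denormalized, so no join or mapping
// is needed at query time.
interface PostView { postId: string; authorName: string; text: string }

// The projection runs once, when the post is written,
// instead of on every read.
function projectPost(post: Post, author: User): PostView {
  return { postId: post.id, authorName: author.name, text: post.text };
}
```

The trade-off is storage and eventual consistency: the read model duplicates data and lags slightly behind the write model.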

A typical distributed application works like the one in the picture. A user makes a request to a public endpoint, which, based on the load of each server in the network, redirects the request to one of the servers in the list. This way, as the number of users of our application increases, we can simply add additional servers to our server farm. The problem with this setup is keeping track of data.

We can do two things:

  1. Connect all application servers to a single database server.
  2. Give each application server its own database server.

But the problem in case 1 is that the database server quickly becomes a bottleneck, and in case 2 a user might be redirected to server B when writing something, and then be redirected to server A when reading that same data. To solve this, there are two strategies, and they can be used together: affinity and replication.

Replication means that we use approach 2 and pretty much duplicate the data on all servers. Affinity means that we redirect traffic for a specific user to a specific server instance: user “X” always writes to server A, user “Y” to server B, and so on. This means that we also need a more intelligent load balancer in place. An issue with the affinity strategy is: what do you do when a user needs to get data from multiple users, say when rendering the wall in Facebook? Those users might have their data on multiple servers.
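The "more intelligent load balancer" can be as simple as a deterministic hash of the user ID, so the same user always lands on the same server. A minimal sketch (the function name and hashing scheme are illustrative assumptions, not a production design):

```typescript
// Hypothetical affinity function: route each user to a fixed server
// so their reads and writes always land on the same instance.
function affinity(userId: string, servers: string[]): string {
  // Simple deterministic string hash (unsigned 32-bit).
  let hash = 0;
  for (const ch of userId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return servers[hash % servers.length];
}
```

A real load balancer would use consistent hashing instead, so that adding or removing a server only remaps a small fraction of users.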

CQRS to the rescue

An application the size of Facebook probably has hundreds of thousands of servers and sophisticated proprietary caching and data storage. But it is a good example of the problems that can arise when scaling out web applications. We also don’t want to replicate all the data on all database servers, because the volume quickly becomes huge. So we usually use a combination of the techniques: for a very big application we would replicate data across data centers for speed and consistency, and we would also use some form of affinity.

Let’s get back to the Facebook wall example. Imagine that we need to aggregate data from 30 different Facebook friends. Because we used write affinity, their posts might live on different servers, so aggregating the whole wall would be quite time consuming. Jill from the US might post something, my colleagues from the Netherlands might post something else, so it’s very unlikely the data will end up on the same database server. Decoupling reading and writing allows us to have both read and write affinity. The read database in the architectural diagram acts as a view cache. With CQRS, posting something on your wall writes it into the primary (write) database(s), which then sends notification events to everybody interested in that change. So let’s say my user is configured to always have read affinity with server X. That server is notified, via a queue, by the servers that host the write part of the application, and it updates its local cache. Also, not only does the read part have different user affinity than the write part, its models are usually denormalized for speed of access.
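The notification flow described above can be sketched in a few lines. This is a toy in-process version (the class names, event shape, and direct callback in place of a real message queue are all simplifying assumptions): the write side persists a post and publishes an event, and each interested read server applies it to its denormalized wall cache.

```typescript
// Event emitted by the write side after a post is persisted.
type PostCreated = { userId: string; text: string };

class WriteSide {
  private handlers: Array<(e: PostCreated) => void> = [];

  // Read servers register their interest here; in a real system
  // this would be a message queue subscription, not a callback.
  subscribe(handler: (e: PostCreated) => void): void {
    this.handlers.push(handler);
  }

  createPost(userId: string, text: string): void {
    // 1. Persist to the write database (elided in this sketch).
    // 2. Notify every interested subscriber.
    for (const h of this.handlers) h({ userId, text });
  }
}

class ReadCache {
  // Denormalized view: the wall is stored ready to render.
  private walls = new Map<string, string[]>();

  apply(e: PostCreated): void {
    const wall = this.walls.get(e.userId) ?? [];
    wall.push(e.text);
    this.walls.set(e.userId, wall);
  }

  wall(userId: string): string[] {
    return this.walls.get(userId) ?? [];
  }
}
```

Because the cache is updated asynchronously by events, reads are fast and local, at the cost of the wall being eventually consistent rather than instantly up to date.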

Enter Flux

It didn’t occur to me for quite some time, but there are significant similarities between CQRS and Flux: one lives on the server side, the other on the client side. In fact, they can be used together.


The parallel is that in both patterns you never mutate state directly: you dispatch a command (or, in Flux terms, an action), and the store or read model updates itself in response.
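A minimal Flux-style sketch makes the symmetry visible (this is a stripped-down illustration, not the Flux library's actual API: there is no dispatcher or change-event emitter here, just the one-way action-to-store flow):

```typescript
// The only way to change the store is to dispatch an action —
// the client-side analogue of sending a command to the write side.
type Action = { type: "ADD_POST"; text: string };

class PostStore {
  private posts: string[] = [];

  dispatch(action: Action): void {
    if (action.type === "ADD_POST") {
      // State is replaced, never mutated in place.
      this.posts = [...this.posts, action.text];
    }
  }

  // Views only ever read through queries like this one.
  getPosts(): readonly string[] {
    return this.posts;
  }
}
```

Just as in CQRS, reads (`getPosts`) and writes (`dispatch`) go through separate paths, and the store plays the role of the denormalized read model.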