January 19, 2011

The Hexagonal Architecture and the pot of gold

As part of my research, I was trying to see how the high-traffic sites are implemented. A lot of things are going around, a few diagrams are popular, but the main point seems to be being able to distribute your code across as many machines as practical, as cheaply as possible.

That is good and reasonable, but in the end, my question was still ‘how do you write your code?’. On that specific point the spectrum is broader, and you can find everything from massive spaghetti code to massive structures built on industrial ideas, and everything in between.

Of all those approaches, I liked the DDDD approach the most, that is, distributed domain-driven design. I knew DDD already, and the distribution part just made sense.

There are again a few different approaches to it, but being in .NET, NServiceBus is the one I chose to follow, so I spent some time around it, and again, it makes sense when you look at the problems it is trying to solve.

But even though it makes sense, something was missing from the picture, at least the picture from the web site and mailing list. It looks like more than a bunch of libraries (with very good reasons behind them), but still suspiciously close to ‘use this technology and everything will be great’. I know that everyone making that statement probably means it, but I have heard it too many times to believe in ‘technology’ solutions.

And then I found CQRS, and from some video got the magical words ‘hexagonal architecture’. It is a pattern, and as usual with patterns, it is deceptively simple. Instead of being concerned about a long line of code blocks calling one another, make it a single, cohesive core that deals with the outside through adapters. In particular, take the data layer out of your model, and make it an external service.
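To make the shape of the pattern concrete, here is a minimal sketch in Python. It is not taken from the article or from any library: `Order`, `OrderStore`, `OrderService` and `InMemoryOrderStore` are names I made up for illustration. The core states what it needs from the outside as a port, and an adapter on the outside fulfils it.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Order:
    """Domain object (hypothetical): nothing here knows about tables or rows."""
    order_id: str
    total: float


class OrderStore(Protocol):
    """Port: what the core needs from the outside, stated in the core's terms."""

    def save(self, order: Order) -> None: ...

    def load(self, order_id: str) -> Order: ...


class OrderService:
    """Core: business rules only, talking to the outside through the port."""

    def __init__(self, store: OrderStore) -> None:
        self._store = store

    def place_order(self, order_id: str, total: float) -> Order:
        if total <= 0:
            raise ValueError("an order must have a positive total")
        order = Order(order_id, total)
        self._store.save(order)
        return order


class InMemoryOrderStore:
    """Adapter: one possible implementation of the port, kept outside the core."""

    def __init__(self) -> None:
        self._orders = {}

    def save(self, order: Order) -> None:
        self._orders[order.order_id] = order

    def load(self, order_id: str) -> Order:
        return self._orders[order_id]


if __name__ == "__main__":
    service = OrderService(InMemoryOrderStore())
    print(service.place_order("A1", 10.0))
```

Swapping the in-memory adapter for one that talks to a database or a remote service should not require touching `OrderService`, which is the point of the exercise.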

I just loved the idea even before reaching the end of the article, but found it difficult to explain to others. ‘What is new about it, compared with any other data access layer?’, I was asked, and I heard myself babbling as I usually do when too many ideas jump into my head, obvious to me and to nobody else around.

Now, the important thing for me (and I realized it just a couple of hours ago) is that, instead of the situation of the last ten years, with an object model and a relational model living in the same chunk of code (be it a single library or an arbitrary number of them, all related and interdependent), you can have only one model in each application. One for the domain, one for data storage, one for the external API, one for the UI, one for Santa Claus if required. That is brilliant!

Do you have a business operation that you want to model? Do it disregarding anything about data storage or user interfaces. Make your model, make it the best you can, as simple or as complicated as you need, and then request services from others (the main suspect being the database) and provide services to something else (a UI or an external consumer).

The magic is that in your domain model you now really, but really really, don’t care about relational data. It does not matter whether you have SQL or not, whether you use an ORM or call ODBC directly. Data provision will have its own application, and the model in it will be purely relational (or whatever you want). No business logic there: in data storage you don’t care about validation, or cascading activities, or logging, or anything else. Your data storage application will only be concerned with storing and retrieving data. Just data. Genius!
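As a rough illustration of the split, here is a sketch using Python's standard `sqlite3` module. The `customer` table, `SqliteCustomerStorage` and `rename_customer` are all hypothetical names of mine. The storage side only stores and retrieves rows, while the one business rule lives on the domain side, which never sees any SQL.

```python
import sqlite3


class SqliteCustomerStorage:
    """Storage side (hypothetical): rows in, rows out, no business rules."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._db = connection
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS customer "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def store(self, customer_id: int, name: str) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO customer (id, name) VALUES (?, ?)",
            (customer_id, name),
        )
        self._db.commit()

    def retrieve(self, customer_id: int):
        # Returns an (id, name) tuple, or None when there is no such row.
        cursor = self._db.execute(
            "SELECT id, name FROM customer WHERE id = ?", (customer_id,)
        )
        return cursor.fetchone()


def rename_customer(storage, customer_id: int, new_name: str) -> None:
    """Domain side: the rule lives here and knows nothing about SQL."""
    cleaned = new_name.strip()
    if not cleaned:
        raise ValueError("a customer name cannot be empty")
    if storage.retrieve(customer_id) is None:
        raise LookupError("unknown customer")
    storage.store(customer_id, cleaned)


if __name__ == "__main__":
    storage = SqliteCustomerStorage(sqlite3.connect(":memory:"))
    storage.store(1, "Ada")
    rename_customer(storage, 1, "  Ada Lovelace ")
    print(storage.retrieve(1))
```

In the distributed version of the idea the storage class would sit behind a service boundary in its own application; here both sides share one file only to keep the sketch short.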

There is nothing new in there. That should always have been the role of the data layer, but the thing is, I never saw it so clearly separated until now. The theory was right; the implementation always ended up as a mess and a bigger or smaller monolithic application.

The philosophy is even older. I still remember talking with people on FidoNet about the theory of Unix, and thinking ‘what is a theory of an operating system? It is just a big program!’, without realizing that probably a very important concept was the fact that each operation is self-contained. You have twenty small applications and you pipe the results from one to the other. There is no need to solve the big problem (taking months or years of development); just write an application as small as possible, modelling one activity, and off you go. It is a different implementation, but the concept is the same, and I love it.

How does this translate to my daily practice? Well, now I can focus on one thing at a time. I can do the model for something I need, finish it, and forget it. Then I can do the data storage. Or the UI, or some other atomic model. I can keep the external references outside, and prepare a bit of code to emulate the real thing until I have time to do it properly. Basically, I can advance a step at a time, and not be concerned about a landslide sending me back to the starting point. And I even get an extra bonus, because now application distribution is trivial: everything is distributed from the start!
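The ‘bit of code to emulate the real thing’ can be very small. A sketch, with every name in it (`RateProvider`, `FixedRateProvider`, `price_in`) invented for the example: the model asks an external service for a currency rate, and until that service exists a stand-in with fixed numbers takes its place.

```python
from typing import Protocol


class RateProvider(Protocol):
    """What the model expects from the (not yet written) external rate service."""

    def rate(self, currency: str) -> float: ...


class FixedRateProvider:
    """Stand-in that emulates the real service with hard-coded rates."""

    def __init__(self, rates) -> None:
        self._rates = dict(rates)

    def rate(self, currency: str) -> float:
        return self._rates[currency]


def price_in(currency: str, amount_usd: float, provider: RateProvider) -> float:
    """Model code: finished now, unaware of whether the provider is real or fake."""
    return round(amount_usd * provider.rate(currency), 2)


if __name__ == "__main__":
    fake = FixedRateProvider({"EUR": 0.5})  # made-up rate, for illustration only
    print(price_in("EUR", 100.0, fake))
```

When the real provider is ready, it only has to offer the same `rate` method; `price_in` stays as it is.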

NB: I am talking to you again, monkeys; it looks like we struck gold here. Instead of being concerned about how long we will need to work before having something useful, we can make it as small as needed to be done in the available time, and still get something of value. Solid, commercial value, not just learning something new or doing something interesting. Are we going to stay asleep, or can we take the opportunity to eat an elephant a bite at a time? It is soft, it is golden, can we check with code if it is real gold?