Java Мissionary + stuff: architecture

Sunday, July 24, 2016

Microservices mishaps: 1 query, multiple DBs

I want to share my experience with the latest trend in software engineering - microservices.

In particular, I want to explain a case I recently stumbled upon which made us hate the strategy we've got of splitting the big monolithic application to microservices.

The use case is relatively simple - one wants to make a query concerning multiple entities. They have relations between them nevertheless, the data ideally should be owned by separate microservices in an autonomous way.

The example:

Let's say that the product is a ERP-like and naturally we would have entities like company and invoices. Naturally, we'd went for microservice for the company - let's name it company-srv and one for invoices - invoice-srv. Both of these should own the APIs and DB

Now what happens when we want to make a query that is something like:

Get all companies with more than 1000 invoices

or slightly more complex example:

Get all invoices that haven't sent email-report last month for the companies that have this feature enabled.

How do you handle such responsibility - is it in invoice-srv's ownership and it should have a dependency on the other services? How do you keep the performance high with increased row count?

Even if you handle this with additional relations between the 2 DBs between the autonomous services, let's add another more interesting factor to the situations.

The more complex (and quite common) example:

Let's try to intertwine permissions - let's say that the user who is trying to see the results of the above queries, doesn't have permissions for some of the companies? Or even more intriguing - for some of the branches of the companies...

Whose responsibility is it to orchestrate the query? Does it become a multi-query operation? Is there going to be a permissions-srv ? What about performance? Pagination?

...

All these are questions we ask when we pick microservices. Otherwise, it would have been a simple JOIN in the monolithic approach....

Are we going backwards? :)

PS:
more on the topic this cool post: https://www.javacodegeeks.com/2016/07/hardest-part-microservices-data.html

Saturday, August 20, 2011

A diffrent software architecture - LMAX

It was an interesting afternoon to read the article that the famous Martin Fowler had written a month ago.

It's about a different kind of architecture, how it worked and how the software engineers got to that solution of their problems. The project is about processing multiple concurrent orders and the platform had to provide low latency. The team managed to create a system that copes with 6 million transactions per second.
The strange thing is that they do it on a single-threaded application with in-memory database approach (caches).

They have tried other more-conventional approaches like multi-threading and the Actor approach but the performance of the Queue was limiting them, also the need to synchronize threads and interactions between them, context switch at CPU level was drawing performance back.

So they decided to separate input/output and processing into different modules.
Each input event gets unmarshalled and all its required pre-processing (IO operations and credit card verification operations for example) are done before hand and inserted into a the main stream that is processed by the business logic unit.

The team also create special implementations for collection classes so that they use caches better. They processed events not in a Queue approach but with a ring buffer and on it were running several processor modules - one for unmarshalling, one for backup (Journal) and Replicator and finally the Business Logic Processor. They are in a constant race and there are rules that prevent 1 from getting ahead of the other and processing a data field that isn't ready yet. This ring buffer has enormous size (20 million items)

The main idea is that CPU can process fast sequence of operations which are simple (therefore very optimized by the Java compiler), and using in-memory caches (no SQL queries), this could push the performance of a system high.

Also another gain for the team is that they don't have to cope with complex techniques like transactions, inter-thread communications etc...

I think I've just covered the basics of this software architecture approach. If you are interested enough, do visit the original article at -> http://martinfowler.com/articles/lmax.html