rponte/avoid-distributed-transactions.md

Last active October 20, 2025 12:29

THEORY: Distributed Transactions and why you should avoid them (2-Phase Commit, Saga Pattern, TCC, Idempotency, etc.)

Distributed Transactions and why you should avoid them

Many modern technologies do not support them (RabbitMQ, Kafka, etc.);
They are a form of synchronous inter-process communication, which reduces availability;
All participants of the distributed transaction need to be available for a distributed commit, which again reduces availability.
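
To make the availability argument concrete, here is a small worked example. It assumes that participants fail independently and that each one has the same availability; both are simplifying assumptions, and the function name is hypothetical.

```python
def all_available(per_participant: float, participants: int) -> float:
    """Probability that every participant is up at the same moment.

    Assumes independent failures and identical availability per
    participant (a simplification, not a measured figure).
    """
    return per_participant ** participants


# Three participants at 99.9% each can only commit together
# about 99.7% of the time; ten participants fare worse.
three = all_available(0.999, 3)
ten = all_available(0.999, 10)
```

Under these assumptions, every extra participant in a distributed commit multiplies in another factor below 1, so the combined availability can only go down.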

Implementing business transactions that span multiple services is not straightforward. Distributed transactions are best avoided because of the CAP theorem. Moreover, many modern (NoSQL) databases don’t support them. The best solution is to use the Saga Pattern.

[...]

One of the most well-known patterns for distributed transactions is called Saga. The first paper about it was published back in 1987, and it has been a popular solution since then.

There are a couple of different ways to implement a saga transaction, but the two most popular are:

Events/Choreography: there is no central coordination; each service produces events, listens to other services' events, and decides whether an action should be taken;
Command/Orchestration: a coordinator service is responsible for centralizing the saga's decision making and the sequencing of business logic.
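
The orchestration style can be sketched in a few lines. This is a minimal in-process illustration, not a production saga engine: the `SagaStep` and `run_saga` names and the order/payment steps are hypothetical, and a real orchestrator would also persist its progress so it can resume after a crash.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]        # forward operation in one service
    compensation: Callable[[], None]  # undoes the action if a later step fails


def run_saga(steps: List[SagaStep]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action()
            completed.append(step)
        except Exception:
            for done in reversed(completed):
                done.compensation()
            return False
    return True


# Hypothetical order flow where the payment step fails.
log: List[str] = []


def fail_payment() -> None:
    raise RuntimeError("payment declined")


steps = [
    SagaStep("reserve_stock",
             lambda: log.append("reserve_stock"),
             lambda: log.append("release_stock")),
    SagaStep("charge_payment",
             fail_payment,
             lambda: log.append("refund_payment")),
    SagaStep("ship_order",
             lambda: log.append("ship_order"),
             lambda: log.append("cancel_shipment")),
]
ok = run_saga(steps)
```

Here the stock reservation is released because the payment step raised, and the shipping step never runs. Instead of locking all participants for one atomic commit, each step commits locally and is undone by an explicit compensating action.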

Author

rponte commented Dec 25, 2024 (edited)

💡 Top insights (and one point I don't agree with) about Gregor Hohpe's article "Event-driven = Loosely coupled? Not so fast!":

TLDR
This is a fantastic post from Gregor, one for slow reading and deep thinking. It gets to the core of buzzwords such as EDA and coupling, which are used daily without really analysing their direct and second-level consequences.

https://www.linkedin.com/posts/bibryam_i-love-blog-posts-that-make-me-think-gregor-activity-7218910524107878400-ZMy_/

I love blog posts that make me think. Gregor Hohpe hits the nail on the head 100% with this one. Here are the top insights (and one point I don't agree with):

1. Messages vs. Events
Messaging is an interaction style, whereas "Event" describes the semantics of a message.
Commands, Events, and Documents are all Messages.

2. Messages vs. Channels
We must delineate which characteristics of an event-driven system derive from the properties of the channel rather than the message intent.
Message + Channel = Some Decoupling

3. Events vs. Coupling
This is the core of the article: it introduces a unique perspective on coupling for the different interaction styles. We can now explore the interaction styles (RPC, P2P, Pub/Sub) against the different coupling types (change propagation).

4. Now vs. Later
I love the second-level effects analysis too.

Folks pitching EDA as decoupled may be the ones early in the lifecycle (like vendors) who won't have to live with the consequences of their (coupling) decisions. The more you take advantage of your ability to add recipients, the harder it will become to make a change. Thus, taking advantage of one dimension of decoupling (adding recipients) exposes you more to another dimension of coupling (e.g., schema changes).

⭐️ 5. Where I don't fully agree with Gregor

"Location Coupling propagates location changes of a recipient to a sender (or vice versa). Message channels decouple location changes because they are (mostly) logical constructs... RPC can achieve some decoupling through DNS or active load balancers, but it's much more likely that a change affects the sender."

⭐️ 5.1.
IMO, one of the main reasons for using RPC is the decoupling of service consumer and provider, so they can evolve in isolation but also start/stop, scale, move, independently. Otherwise, there’s no reason to split a monolith into services and in-memory calls should be preferred over RPC.

⭐️ 5.2.
Today, no caller has a hard-coded callee address; it’s almost always a logical name:

Kubernetes service name, not Pod IP

Dapr sidecar/Istio proxy, with the logical name of the target service

AWS API Gateway address, rather than Lambda URL

Client-side service discovery that looks up the location, still not affecting the client

IMO, RPC today always involves some kind of proxy

⭐️ 5.3.
In all these cases, the provider can scale based on demand or move without affecting the client. You might say the client needs to know the seed location (K8s service, AWS gateway, sidecar, etc.), but that's true for the location of the messaging channel too. Even if the topic is a logical name, the broker URL is still a fixed location.

Author

rponte commented Dec 31, 2024

How Complex Systems Fail

5. Complex systems run in degraded mode.
A corollary to the preceding point is that complex systems run as broken systems. The system continues to function because it contains so many redundancies and because people can make it function, despite the presence of many flaws. [...]

16. Safety is a characteristic of systems and not of their components
Safety is an emergent property of systems; it does not reside in a person, device or department of an organization or system. Safety cannot be purchased or manufactured; it is not a feature that is separate from the other components of the system.

Author

rponte commented Jan 3, 2025 (edited)

Outbox Pattern - by Unico

The interesting part is they use Protobuf as a content type when sending events to the broker. Still, for some reason that's unclear in the article, they serialize this Protobuf data into JSON format before persisting it in the outbox table. I guess they do so because they use Debezium under the hood.
They also use the CloudEvents (v1.0.2) spec to define the format of the event data.

This is the Protobuf message using the CloudEvent spec:

```protobuf
syntax = "proto3";

import "google/protobuf/timestamp.proto";
import "google/protobuf/any.proto";

message OutboxEvent {
  string specversion = 1;
  string type = 2;
  string source = 3;
  string subject = 4;
  string id = 5;
  google.protobuf.Timestamp time = 6;
  string datacontenttype = 7;
  string dataschema = 8;
  google.protobuf.Any data = 9;
}
```

And this is an example:

```json
{
  "specversion": "1.0",
  "type": "someevent",
  "source": "integration",
  "subject": "1ec07712-79b7-485a-a0e2-0a1c33fd1016",
  "time": "2020-04-30T04:00:00Z",
  "datacontenttype": "application/json",
  "dataschema": "http://<schemapath>",
  "data": {
    "transactionId": "1ec07712-79b7-485a-a0e2-0a1c33fd1016",
    "doc": "123.123.123-00",
    "image_id": "ea02254f-28f4-4b31-99a5-957bb024f78d"
  }
}
```
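
For readers new to the pattern, the core idea of an outbox can be sketched with SQLite. This is not Unico's implementation (which, as guessed above, may rely on Debezium): the table names, the `order.placed` event type, and the polling relay are assumptions made for illustration only.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, doc TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id TEXT PRIMARY KEY, payload TEXT NOT NULL, "
        "published INTEGER NOT NULL DEFAULT 0)"
    )


def place_order(conn: sqlite3.Connection, doc: str) -> str:
    """Write the business row and the event row in ONE local transaction."""
    order_id = str(uuid.uuid4())
    event = {  # CloudEvents-style envelope; the values are hypothetical
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "type": "order.placed",
        "source": "orders-service",
        "subject": order_id,
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": {"orderId": order_id, "doc": doc},
    }
    with conn:  # commits both inserts, or rolls both back on error
        conn.execute("INSERT INTO orders (id, doc) VALUES (?, ?)", (order_id, doc))
        conn.execute(
            "INSERT INTO outbox (id, payload) VALUES (?, ?)",
            (event["id"], json.dumps(event)),
        )
    return order_id


def relay_once(conn: sqlite3.Connection, publish) -> int:
    """Publish pending outbox rows, marking each one after it is sent."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY rowid"
    ).fetchall()
    for event_id, payload in rows:
        publish(json.loads(payload))  # stand-in for a real broker client
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (event_id,))
    return len(rows)


conn = sqlite3.connect(":memory:")
create_schema(conn)
order_id = place_order(conn, "123.123.123-00")
sent = []
first_pass = relay_once(conn, sent.append)
second_pass = relay_once(conn, sent.append)
```

Because the relay marks a row only after publishing it, a crash between the two steps leads to a redelivery rather than a lost event. The result is at-least-once delivery, so consumers should be idempotent.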

Author

rponte commented Jan 16, 2025

Fidelis blog: System Design - Saga Pattern 🇧🇷 - an article (in Portuguese) about the Saga and Outbox patterns, written by Matheus Fidelis.

Author

rponte commented Feb 7, 2025

Do Caos a Consistência: A Ordem das Mensagens em Sistemas Distribuídos ("From Chaos to Consistency: Message Ordering in Distributed Systems", in Portuguese)

Author

rponte commented Mar 26, 2025

River's blog: Building an idempotent email API with River unique jobs

Author

rponte commented May 3, 2025 (edited)

⭐️ (slides) Definition of Insanity: timeouts, retries and idempotency - by Sam Newman

Author

rponte commented May 3, 2025 (edited)

Thread on Twitter (X) by Qian Li:

Durable workflow timeouts

Timeouts are essential for building efficient and resilient systems. They help prevent systems from waiting indefinitely and free up resources while maintaining responsiveness under heavy load.

For example, suppose your server must finish a task within 30 minutes, but some operations are taking much longer to complete. Even if they eventually succeed, the response will still miss the deadline — wasting resources in the process. In such cases, proactively cancelling on timeout is the right choice.
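
A minimal sketch of proactive cancellation on timeout, using only Python's standard `asyncio`. It does not use the DBOS API and it is not durable (nothing survives a process restart); the function names and durations are hypothetical.

```python
import asyncio


async def slow_task(delay_s: float) -> str:
    # Stand-in for a long-running operation.
    await asyncio.sleep(delay_s)
    return "done"


async def run_with_deadline(delay_s: float, timeout_s: float) -> str:
    try:
        # wait_for cancels the inner task once the timeout elapses,
        # so it stops consuming resources instead of finishing late.
        return await asyncio.wait_for(slow_task(delay_s), timeout=timeout_s)
    except asyncio.TimeoutError:
        return "cancelled"


fast = asyncio.run(run_with_deadline(0.01, 1.0))
slow = asyncio.run(run_with_deadline(1.0, 0.05))
```

The first call finishes within its deadline; the second is cancelled after 50 ms rather than being allowed to complete a result nobody is waiting for.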

DBOS docs: Workflow Timeouts

Author

rponte commented Jul 29, 2025 (edited)

Delivery semantics explained from the producer and consumer perspectives in Kafka: Kafka Message Delivery Guarantees

At most once: Messages are delivered zero or one time. If there is a system failure, messages may be lost, and they are not redelivered.
At least once: This means messages are delivered one or more times. If there is a system failure, messages are never lost, but they may be delivered more than once.
Exactly once: This is the preferred behavior in that each message is delivered once and only once. Messages are never lost or read twice even if some part of the system fails.
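
In practice, a common way to approximate exactly-once processing is at-least-once delivery combined with an idempotent consumer. The sketch below is a plain in-memory illustration and does not use the Kafka client API; the class name and message shape are assumptions. A real consumer would store the processed IDs in the same transaction as the side effect.

```python
class IdempotentConsumer:
    """Applies each message at most once, keyed by its unique id."""

    def __init__(self) -> None:
        self.processed_ids = set()
        self.balance = 0

    def handle(self, message: dict) -> bool:
        if message["id"] in self.processed_ids:
            return False  # duplicate delivery: skip the side effect
        self.balance += message["amount"]
        self.processed_ids.add(message["id"])
        return True


# At-least-once delivery: message "m1" arrives twice after a retry.
deliveries = [
    {"id": "m1", "amount": 100},
    {"id": "m1", "amount": 100},
    {"id": "m2", "amount": 50},
]
consumer = IdempotentConsumer()
applied = [consumer.handle(m) for m in deliveries]
```

The duplicate is detected and ignored, so the final balance reflects each message once even though one of them was delivered twice.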

Author

rponte commented Sep 7, 2025 (edited)

The most interesting part of how the Dapr Outbox feature works is step 2: Dapr publishes an internal event BEFORE persisting the state and the marker to the database.

Author

rponte commented Sep 8, 2025

⭐️ Distributed transaction patterns for microservices compared

Author

rponte commented Sep 8, 2025

JavaZone: Ins and Outs of the Outbox Pattern - by Gunnar Morling

Author

rponte commented Sep 30, 2025

Implementing the Outbox Pattern - by Milan Jovanović

Author

rponte commented Oct 16, 2025

Microservices, clearing up the definitions -- by Andras Gerlits

For software to be predictable, we need to make sure that single events from the client's perspective are reflected as such throughout the whole system.

This means that if the client asks for a change, all its facets need to be accepted or rejected by the system as a single package. We can't pick and choose which aspects to do or not to do, unless we have an intuitive way to tell the user what we have failed to achieve, and that fault is translated back to the client. It's easy to see that if we allow for such failures, we must design the corresponding 'translation' to the end user, and that this means understanding our client's competence level with regard to that system. In other words, we can shift some of the responsibility onto the end user, but only the parts we can expect them to manage appropriately, and only by providing them with the right tools.

Author

rponte commented Oct 20, 2025 (edited)

⭐️ Spring Outbox | GitHub repository | LinkedIn Announcement

Spring Outbox is a minimal-configuration Spring Boot library for reliably publishing domain events using the Outbox Pattern.

It works out of the box: you just add the dependency, enable the outbox, and provide an OutboxRecordProcessor bean. The library handles storing, processing, and retrying events automatically, so you can focus on your business logic instead of wiring infrastructure.
