NATS is an excellent, clustered, full-mesh PubSub messaging system, highly performant and a cakewalk to set up. Full mesh means every node (servers and clients) knows about every other node, which is great, but makes it tricky to keep multiple publishers on hot standby for high availability of the publishers (not of the NATS network) while avoiding duplicate publishes.
Here `--no_advertise` comes in handy, if we're willing to sacrifice the automatic meshing and discovery mechanism. This may be acceptable in setups where only a fixed set of NATS servers run in a cluster and their addresses (either IPs or hostnames) are known.
The gnatsd `--no_advertise` flag makes a NATS server not advertise itself automatically to the mesh. For other nodes to discover `--no_advertise` nodes, the `--routes` have to be specified explicitly. If there are N servers, there should be N routes.
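As a rough illustration of what explicit routes look like in practice, the sketch below renders a cluster config with `no_advertise` enabled and one route per known server. The helper name `render_config` and the exact config layout are assumptions made for this example (following the NATS config file format); the ports are the ones used in the diagram at the end of this post.

```python
def render_config(client_port, cluster_port, routes):
    """Render a NATS server config with no_advertise and explicit routes.

    `routes` is a list of "host:port" strings, one per known server.
    An empty list yields a server that is not clustered to anything.
    """
    lines = [
        f"port: {client_port}",
        "cluster {",
        f"  listen: 0.0.0.0:{cluster_port}",
        "  no_advertise: true",
        "  routes = [",
    ]
    # One explicit route per known server, since nothing is discovered.
    lines += [f"    nats-route://{hostport}" for hostport in routes]
    lines += ["  ]", "}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_config(4222, 4248, ["server0:4248", "server1:4248"]))
```

Writing the output to a file and starting the server with `-c` pointing at that file gives a node that only talks to the servers listed.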
`gnatsd -sl reload=<pid>` makes a running NATS server reload configuration from its config file (`-c`) without downtime. This can be used to take out
- A cluster of two NATS servers `server0` and `server1` that have `N` subscribers listening to the subject `test`.
- A live publisher, `publisher0`, that is publishing on the subject `test`.
- A hot standby publisher, `publisher1`, that is also publishing on the subject `test`, but whose messages should only take effect in the cluster if `publisher0` goes down.
- Each publisher gets its own local NATS server (here, `dummy-nats0` and `dummy-nats1` for publishers `publisher0` and `publisher1` respectively).
- The publishers do not publish directly to the upstream cluster, but to their local NATS servers.
- The primary publisher `publisher0`'s dummy NATS server `dummy-nats0` is clustered to the upstream NATS servers (via `routes`).
- The backup publisher `publisher1`'s dummy NATS server `dummy-nats1` is not clustered to the upstream NATS servers (empty `routes`).
- These configurations are specified in local configuration files.
- When `publisher0` goes down or there is a fault (assuming there's a healthcheck mechanism):
  - Remove the upstream NATS `routes` from `dummy-nats0`'s configuration and issue a `gnatsd -sl reload`.
  - Add the upstream NATS `routes` to `dummy-nats1`'s configuration and issue a `gnatsd -sl reload`.
The messages `publisher0` had been publishing will immediately stop reaching the cluster and make way for those of `publisher1`. Even if `publisher0` or `dummy-nats0` comes back up, its messages stay self-contained and are not pushed to the cluster, because `--no_advertise` prevents automatic discovery and cluster formation. This avoids duplicate messages.
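The failover steps above can be sketched as a small script. This is only a minimal sketch under several assumptions: each dummy server has a config file and a pid file reachable from wherever the script runs, the config layout matches the NATS config file format, and the names `render_config`, `set_routes` and `failover` are hypothetical helpers, not part of NATS. In a real setup each half would more likely run on its own publisher host.

```python
import subprocess
from pathlib import Path

# Upstream cluster addresses, taken from the diagram in this post.
UPSTREAM_ROUTES = ["server0:4248", "server1:4248"]


def render_config(routes):
    """Render a dummy server config; an empty list means 'not clustered'."""
    lines = ["port: 4222", "cluster {", "  listen: 0.0.0.0:4248",
             "  no_advertise: true", "  routes = ["]
    lines += [f"    nats-route://{hostport}" for hostport in routes]
    lines += ["  ]", "}"]
    return "\n".join(lines) + "\n"


def reload_command(pid):
    """Build the reload invocation described above: gnatsd -sl reload=<pid>."""
    return ["gnatsd", "-sl", f"reload={pid}"]


def set_routes(conf_path, pid_path, routes, run=subprocess.run):
    """Rewrite one server's config file, then ask that server to reload it."""
    Path(conf_path).write_text(render_config(routes))
    pid = Path(pid_path).read_text().strip()
    run(reload_command(pid), check=True)


def failover(primary, standby, run=subprocess.run):
    """Detach the primary's dummy server, then attach the standby's.

    `primary` and `standby` are (conf_path, pid_path) pairs (hypothetical).
    """
    try:
        set_routes(*primary, routes=[], run=run)
    except (OSError, subprocess.CalledProcessError):
        # The primary's server may already be down; still promote the standby.
        pass
    set_routes(*standby, routes=UPSTREAM_ROUTES, run=run)
```

A healthcheck could call `failover(("dummy-nats0.conf", "dummy-nats0.pid"), ("dummy-nats1.conf", "dummy-nats1.pid"))` once it decides `publisher0` is unhealthy; those file names are placeholders. The primary is detached first so that both publishers are never routed to the cluster at the same time.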
+----------------------------------------------------------------------------+
|                                                                            |
|                             N ... subscribers                              |
|                                                                            |
+----------------------------------------------------------------------------+
                        -/                        -\
                      -/                            -\
                    -/                                -\
                  -/                                    -\
                -/                                        -\
+----------------------------+         +---------------------------------+
|                            |         |                                 |
|        NATS server0        |         |          NATS server1           |
|                            |         |                                 |
|  listen         :4222      |         |  listen         :4222           |
|  cluster-listen :4248      |---------|  cluster-listen :4248           |
|  no-advertise              |         |  no-advertise                   |
|                            |         |                                 |
|                            |         |  routes         server0:4248    |
+-------------|--------------+         +---------------------------------+
              |                                 -----/
              |                           -----/
              |                     -----/
              |               -----/
              |         -----/
              |   -----/
              |--/
+-------------|------------------+          +--------------------------------+
|                                |          |                                |
|        NATS dummy-nats0        |          |        NATS dummy-nats1        |
|                                |          |                                |
|  listen         :4222          |          |  listen         :4222          |
|  cluster-listen :4248          |          |  cluster-listen :4248          |
|  no-advertise                  |          |  no-advertise                  |
|                                |          |                                |
|  routes         server0:4248   |          |  routes         []             |
|                 server1:4248   |          |                                |
+---------------|----------------+          +---------------|----------------+
                |                                           |
                |                                           |
                |                                           |
     +----------|----------+                     +----------|----------+
     |                     |                     |                     |
     |     publisher0      |                     |     publisher1      |
     |    subject test     |                     |    subject test     |
     |                     |                     |                     |
     |                     |                     |                     |
     +---------------------+                     +---------------------+