Core Implementation
The `monstera` package does not force any particular implementation style for cores, stubs and adapters. The `monsterax` package implements my very opinionated style that I developed while working on Everblack services. I prefer to use Protobufs and define functions that take one `Request` object as an argument and return one `Response` object as a result (similar to a gRPC service declaration):
- Marshaling/unmarshaling comes out of the box.
- Types can be easily reused and composed with each other.
- Types can be reused between storage and core API.
- Stubs and adapters are type-safe and straightforward.
Those core protos are internal only and completely separated from the public gRPC API that Everblack services expose. All business logic deals with core protos; they are converted to/from public protos only at the API gateway level.
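For illustration, here is a rough sketch of that style, not taken from monstera itself. The `QueuesCore` and `CreateQueue` names reappear in the rules below; the field names and plain Go structs (standing in for protobuf-generated core protos) are assumptions made for brevity.

```go
package queues

// Hypothetical stand-ins for the protobuf-generated core protos.
type CreateQueueRequest struct {
	Name      string
	QueueId   string // supplied by the caller, see Rule 1
	CreatedAt int64  // unix seconds, supplied by the caller, see Rule 1
}

type CreateQueueResponse struct {
	QueueId string
}

type queue struct {
	Name      string
	CreatedAt int64
}

// QueuesCore is an illustrative application core that owns its own state.
type QueuesCore struct {
	queues map[string]queue // queue id -> queue record
}

func NewQueuesCore() *QueuesCore {
	return &QueuesCore{queues: make(map[string]queue)}
}

// Each core operation takes exactly one Request object and returns one
// Response object, similar to a gRPC method declaration.
func (c *QueuesCore) CreateQueue(req *CreateQueueRequest) *CreateQueueResponse {
	c.queues[req.QueueId] = queue{Name: req.Name, CreatedAt: req.CreatedAt}
	return &CreateQueueResponse{QueueId: req.QueueId}
}
```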
TODO
Rules
There are certain constraints that must be respected by the application core implementation. Apart from that, there is significant freedom in how to implement it and plenty of room for imagination.
Rule 1: Must be deterministic
Application cores are Finite State Machines from Raft's perspective. When a message is committed, Raft applies it to its FSM on each node independently. Also, when a new node joins a Raft group (or a node falls far behind on the committed log), the node is restored from a snapshot and then replays all messages that happened after the snapshot was taken. Therefore, the same sequence of messages must transition all machines to exactly the same state, even days after the messages actually happened.
Practically, that means there should be no `time.Now()`, `math/rand` or anything like that in the application code. The current time should be passed from outside in the message along with the other inputs. Random values such as ids and crypto keys should also be passed as inputs. Some popular libraries also have time-based tools, such as TTLs on keys in embedded databases or cache libraries; those must be avoided. If there is a need for expiration or garbage collection of data inside a core, it should be implemented in a controlled way, with all time-based operations performed using time provided from outside.
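Continuing the hypothetical `QueuesCore` sketch above, the caller, not the core, could capture those inputs like this. `uuid` here refers to github.com/google/uuid; any id generator would do.

```go
import (
	"time"

	"github.com/google/uuid"
)

// Runs on the caller side (e.g. in an adapter or API handler), never inside
// the core. Everything non-deterministic is captured here and travels with
// the message through Raft.
func buildCreateQueueRequest(name string) *CreateQueueRequest {
	return &CreateQueueRequest{
		Name:      name,
		QueueId:   uuid.NewString(),  // random id generated outside the core
		CreatedAt: time.Now().Unix(), // wall-clock time captured outside the core
	}
}
```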
Rule 2: Must have no side effects
An application core must not make requests to external services or emit messages. Its job is to take an input and modify its state based on that input and its current state, and nothing else. There are several reasons for that:
- No error handling from calling an external service (see Rule 3).
- Non-deterministic behavior based on the state of the external service (see Rule 1).
- The same message is applied to all nodes in the group, so side effects would be executed several times.
- Raft can replay messages after recovering from a snapshot, so side effects would also be replayed.
Writing to a local log file or collecting a metric for development purposes is fine; those are not part of the business logic.
There are different techniques for solving common problems:
- Emitting an event when a record is updated. Instead of calling another service (message broker, webhook, etc.) to publish an event, a queue/stream can be added to that core, and events are stored there within the same operation/transaction as the original record mutation that triggered them. Events can then be pulled from that queue.
- Calling another service for related data. For example, the `CreateQueue` operation needs to know the maximum number of queues allowed for that particular user. Instead of `QueuesCore` calling the Accounts service to get the current service limits, the caller of the `CreateQueue` operation should figure out those limits first and provide the number in the input. This is also deterministic (Rule 1): even if service limits change over time, the same message will result in the same state transition after being replayed. (Both techniques are sketched below.)
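A sketch of both techniques on the hypothetical `QueuesCore` from above, assuming the request gains a `MaxQueues` field, the response gains an `Error` field, and the core gains an internal `events` slice (all illustrative, not part of monstera):

```go
// Assumed additions to the earlier sketch:
//   CreateQueueRequest.MaxQueues int32    — account limit resolved by the caller
//   CreateQueueResponse.Error    string   — application-level error (see Rule 3)
//   QueuesCore.events            []string — event queue stored inside the core
func (c *QueuesCore) CreateQueue(req *CreateQueueRequest) *CreateQueueResponse {
	// The limit arrives with the input instead of being fetched from the
	// Accounts service, so the state transition stays deterministic.
	if int32(len(c.queues)) >= req.MaxQueues {
		return &CreateQueueResponse{Error: "queue limit reached"}
	}
	c.queues[req.QueueId] = queue{Name: req.Name, CreatedAt: req.CreatedAt}
	// The event is stored in the core's own queue within the same state
	// mutation; consumers pull from it later instead of the core publishing it.
	c.events = append(c.events, "queue.created:"+req.QueueId)
	return &CreateQueueResponse{QueueId: req.QueueId}
}
```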
Rule 3: Must panic on internal errors
Raft is not designed to handle errors from FSMs. After a message is committed, it must be applied to the FSM. If the FSM fails to apply it, there is no backpressure, no intermediate state, and no other mechanism to handle that error; the Raft node is declared dead.
Depending on the implementation of an application core, there are only a few valid reasons to fail: out of memory, disk out of space, and the like. Application-level errors (i.e. object not found, invalid request, etc.) should be returned from the FSM as part of the result.
Panics are also easier to notice during local testing or load testing. Silent errors can lead to data corruption and are harder to catch.
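A sketch of that distinction, reusing the hypothetical core and assuming it also keeps data in some embedded store; `c.store` and the `DeleteQueue` types are made up for illustration:

```go
type DeleteQueueRequest struct {
	QueueId string
}

type DeleteQueueResponse struct {
	Error string // application-level error, returned as part of the result
}

// c.store is an assumed handle to an embedded store with a Delete method.
func (c *QueuesCore) DeleteQueue(req *DeleteQueueRequest) *DeleteQueueResponse {
	if _, ok := c.queues[req.QueueId]; !ok {
		// Application-level error: returned in the response, the FSM keeps going.
		return &DeleteQueueResponse{Error: "queue not found"}
	}
	if err := c.store.Delete(req.QueueId); err != nil {
		// Internal error (disk full, corruption, ...): the FSM cannot continue
		// safely, so it panics and the node is taken down.
		panic(err)
	}
	delete(c.queues, req.QueueId)
	return &DeleteQueueResponse{}
}
```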
Rule 4: Must allow sequential writes and parallel reads
Raft applies messages to its FSM in a single thread. However, for performance reasons, Monstera performs reads directly on the application core, bypassing Raft. Therefore, writes are always sequential, while reads can happen in parallel with writes or other reads.
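One possible way to satisfy this inside a core is to guard the state with a `sync.RWMutex`. This is a sketch under that assumption (the `GetQueue` types are made up), not necessarily how monsterax does it:

```go
import "sync"

type GetQueueRequest struct {
	QueueId string
}

type GetQueueResponse struct {
	Name      string
	CreatedAt int64
}

// A variation of the hypothetical QueuesCore with an RWMutex guarding state.
type QueuesCore struct {
	mu     sync.RWMutex
	queues map[string]queue
}

// Write path: invoked by Raft, one committed message at a time.
func (c *QueuesCore) CreateQueue(req *CreateQueueRequest) *CreateQueueResponse {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.queues[req.QueueId] = queue{Name: req.Name, CreatedAt: req.CreatedAt}
	return &CreateQueueResponse{QueueId: req.QueueId}
}

// Read path: invoked directly on the core, possibly from many goroutines,
// in parallel with the single writer and with other readers.
func (c *QueuesCore) GetQueue(req *GetQueueRequest) *GetQueueResponse {
	c.mu.RLock()
	defer c.mu.RUnlock()
	q := c.queues[req.QueueId]
	return &GetQueueResponse{Name: q.Name, CreatedAt: q.CreatedAt}
}
```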
Rule 5: Must take snapshots concurrently
Snapshotting is performed in two steps. First, `Snapshot()` is called to capture the current state. Then this snapshot is written to a sink. Snapshot creation happens between FSM writes, but the actual writing to a sink happens in parallel with other FSM writes. This way large snapshots or slow sinks do not interrupt FSM updates. The implementation of `Snapshot()` must be fast: typically it is a copy operation for copy-on-write data structures, or a new transaction in a database with snapshot isolation level or higher.
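For example, if the core state lived in an embedded bbolt database (an assumption for this sketch; any store with snapshot isolation, or a copy-on-write structure, follows the same shape), the fast step opens a read-only transaction and the slow step streams it to the sink:

```go
import (
	"io"

	bolt "go.etcd.io/bbolt"
)

// Assumes the hypothetical QueuesCore keeps its state in c.db (*bolt.DB).
type queuesSnapshot struct {
	tx *bolt.Tx
}

// Fast step: open a read-only transaction. bbolt gives it a consistent view
// of the data, so FSM writes that happen afterwards do not affect it.
func (c *QueuesCore) Snapshot() *queuesSnapshot {
	tx, err := c.db.Begin(false)
	if err != nil {
		panic(err) // internal error, see Rule 3
	}
	return &queuesSnapshot{tx: tx}
}

// Slow step: stream the snapshot to the sink; this can run in parallel with
// new FSM writes because it only reads the transaction's view.
func (s *queuesSnapshot) WriteTo(sink io.Writer) error {
	defer s.tx.Rollback() // release the read transaction when finished
	_, err := s.tx.WriteTo(sink)
	return err
}
```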
Rule 6: Must not change behavior with code changes
When a Monstera cluster is deployed, nodes are restarted one by one to keep the cluster available. That means two versions of the code will be running at the same time, and two different versions of the FSM will apply Raft messages. This must lead to the same state on each node regardless of the code version running there.
To achieve that, the FSM implementation must keep the same behavior even after code changes. If there is a need to change behavior, a flag should first be added to branch between the old and new behaviors, defaulting to the old behavior. The flag state is stored in the FSM itself and is switched by sending a special migration input some time after the deployment has finished. This way the behavior is switched synchronously on all nodes.
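A sketch of that flag technique with made-up names; `useNewNaming` and the migration input are purely illustrative:

```go
// The flag lives in the core's state (and is included in snapshots), so it
// flips at a single point in the log for every node.
type QueuesCore struct {
	queues       map[string]queue
	useNewNaming bool // false by default, i.e. the old behavior
}

type EnableNewNamingRequest struct{}

type EnableNewNamingResponse struct{}

// A special migration input, sent once after all nodes run the new code.
func (c *QueuesCore) EnableNewNaming(req *EnableNewNamingRequest) *EnableNewNamingResponse {
	c.useNewNaming = true
	return &EnableNewNamingResponse{}
}

func (c *QueuesCore) CreateQueue(req *CreateQueueRequest) *CreateQueueResponse {
	name := req.Name
	if c.useNewNaming {
		name = "v2/" + name // new behavior, only reachable after the migration input
	}
	// The old code path keeps working exactly as before the deployment.
	c.queues[req.QueueId] = queue{Name: name, CreatedAt: req.CreatedAt}
	return &CreateQueueResponse{QueueId: req.QueueId}
}
```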