Multi-Version Concurrency Control (MVCC) is a family of techniques that let many transactions run at the same time while still preventing inconsistent updates.
The core idea is this: when you read a record, you read its value and its version. Later, when you try to commit an update, the system checks whether the record is still at that same version. If someone else updated it first, your commit is rejected.
A practical analogy is a Git workflow. You check out a commit, edit a file, and prepare a merge. If the file changed on main since your checkout, Git refuses a clean merge and forces you to reconcile. MVCC does the same thing for key-value state: you built your update on an older version, so it can't be applied as is.
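The version check at the heart of this can be sketched in a few lines of Python. `VersionedStore` and its integer versions are illustrative, not any real database's API:

```python
# Minimal sketch of version-checked commits: a store that rejects
# writes built on a stale version. Versions here are plain integers.
class VersionedStore:
    def __init__(self):
        self.data = {}  # key -> (value, version)

    def read(self, key):
        return self.data.get(key, (None, 0))  # (value, version)

    def commit(self, key, new_value, read_version):
        _, current = self.data.get(key, (None, 0))
        if current != read_version:
            return False  # someone committed in between: reject
        self.data[key] = (new_value, current + 1)
        return True

store = VersionedStore()
store.commit("bal:kwame", 100, 0)             # initial write
_, v = store.read("bal:kwame")
ok_first = store.commit("bal:kwame", 40, v)   # built on version v: accepted
ok_second = store.commit("bal:kwame", 40, v)  # same stale version: rejected
```

The second commit fails precisely because the first one bumped the version it was built on.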
In Hyperledger Fabric, MVCC is best thought of as optimistic, version-based validation tied to its transaction model: endorsers simulate against a snapshot and capture which committed version they depended on, and committing peers later check that those versions haven't changed. If they have, the transaction is deterministically invalidated during validate/commit.
Execute, Order, Validate
Fabric executes chaincode before ordering. Endorsing peers simulate the transaction and produce a read-write set (RWSet), which is then sent to the ordering service and later validated/committed by all peers. The official transaction-flow description is explicit that endorsers “execute … against the current state database to produce … read set, and write set” and that “no updates are made to the ledger at this point.”
That design buys throughput because simulations can run in parallel across peers and clients. It also creates the central risk: two transactions can be simulated against the same snapshot of world state, both appear internally consistent during simulation, and only later collide when the network tries to commit them in a single, total order. Fabric resolves that collision deterministically in the validate phase by applying MVCC checks at commit time.
```mermaid
sequenceDiagram
    participant Client
    participant Endorser as Endorsing Peer
    participant Orderer
    participant Committer as Committing Peer
    Client->>Endorser: Proposal
    Endorser->>Endorser: Simulate against current world state
    Endorser->>Endorser: Build RWSet
    Endorser-->>Client: ProposalResponse
    Client->>Orderer: Submit endorsed transaction
    Orderer->>Orderer: Total order + batch into block
    Orderer-->>Committer: Deliver block
    Committer->>Committer: VSCC
    Committer->>Committer: MVCC
    Committer->>Committer: Apply WriteSet for valid txs, tag invalid txs
```
Kwame transfers tokens
Assume an account-based token chaincode where balances live in world state as keys:
```
bal:kwame -> 100
bal:barma -> 5
```
Kwame submits a transfer of 60 tokens to Barma. The chaincode does the typical read, modify, write:

- Read bal:kwame and bal:barma
- Check bal:kwame >= 60
- Write the new balances
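Sketched in Python, with `get_state`/`put_state` as stand-ins for the chaincode stub API and a plain dict for world state (during real simulation the writes would only be buffered, not applied):

```python
# Illustrative transfer logic; not the Fabric shim API.
world_state = {"bal:kwame": 100, "bal:barma": 5}

def get_state(key):
    return world_state[key]

def put_state(key, value):
    world_state[key] = value

def transfer(src, dst, amount):
    src_bal = get_state(f"bal:{src}")            # read
    dst_bal = get_state(f"bal:{dst}")
    if src_bal < amount:                         # check
        raise ValueError("insufficient balance")
    put_state(f"bal:{src}", src_bal - amount)    # write
    put_state(f"bal:{dst}", dst_bal + amount)

transfer("kwame", "barma", 60)
```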
What gets recorded during simulation
During simulation at an endorser, the read set records the keys read and their committed versions, and the write set records the keys written and their new values.
For Kwame’s transfer, a simplified RWSet could look like this:
```json
{
  "ns": "tokencc",
  "read_set": [
    { "key": "bal:kwame", "version": { "block_num": 120, "tx_num": 3 } },
    { "key": "bal:barma", "version": { "block_num": 119, "tx_num": 9 } }
  ],
  "write_set": [
    { "key": "bal:kwame", "value": "40" },
    { "key": "bal:barma", "value": "65" }
  ]
}
```
Fabric uses a blockchain-height-based versioning scheme. The version of a key is the height of the transaction that last committed the key, represented as a tuple (blockNum, txNum), where txNum is the index of the transaction within the block (starting at 0).
So reading bal:kwame at version (120, 3) means the last committed update to bal:kwame came from transaction #3 in block 120.
Fabric does not support reading your writes inside a single transaction simulation. If you write a key and then read it in the same transaction, the read returns the last committed value, not the value you just wrote.
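A rough simulation of both behaviors, with assumed data shapes rather than Fabric's actual internal structures:

```python
# Sketch of how a simulation could record an RWSet. Reads capture the
# committed version; writes are buffered and never visible to later
# reads in the same simulation, mirroring the no-read-your-writes rule.
class TxSimulator:
    def __init__(self, committed):
        self.committed = committed    # key -> (value, (block_num, tx_num))
        self.read_set = []            # [(key, version)]
        self.write_set = {}           # key -> new value

    def get_state(self, key):
        value, version = self.committed[key]
        self.read_set.append((key, version))
        return value                  # committed value, even if key was written

    def put_state(self, key, value):
        self.write_set[key] = value   # buffered, not applied

committed = {"bal:kwame": ("100", (120, 3)), "bal:barma": ("5", (119, 9))}
sim = TxSimulator(committed)
kwame = int(sim.get_state("bal:kwame"))
barma = int(sim.get_state("bal:barma"))
sim.put_state("bal:kwame", str(kwame - 60))
sim.put_state("bal:barma", str(barma + 60))
reread = sim.get_state("bal:kwame")   # still the committed "100", not "40"
```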
Where MVCC happens
When a committing peer receives a block, it validates each transaction and tags it valid or invalid. Validation checks (1) signatures and the endorsement policy, and (2) that the current world state still matches the read set the endorsing peers signed off on, i.e. that there has been no intermediate update.
It’s useful to describe the commit time pipeline as rigid steps because the failure modes map directly to validation codes.
- Transactions are checked to ensure they satisfy the endorsement policy (and related signature checks); failures produce ENDORSEMENT_POLICY_FAILURE.
- For each key in the transaction's read set, the version captured during simulation is compared to the version currently in world state. Fabric's RWSet semantics specify that validity requires the read-set versions to match the current versions, including the effects of valid preceding transactions in the same block. A mismatch produces MVCC_READ_CONFLICT.
- If the RWSet includes query-info for range queries, validation additionally ensures the range query would return the same results at commit time; otherwise the transaction is invalid and tagged PHANTOM_READ_CONFLICT.
- Invalid transactions remain in the block history but do not update world state; only valid transactions have their write sets applied.
This diagram shows the decision path that produces the common conflict outcomes.
```mermaid
flowchart TD
    A[Start: received block B] --> B{For each tx i in block order}
    B --> C[Extract RWSet]
    C --> D{VSCC / endorsement policy ok?}
    D -- No --> E[Mark invalid: ENDORSEMENT_POLICY_FAILURE]
    D -- Yes --> F{For each read key k}
    F --> G[Lookup V_current in state view]
    G --> H{V_current == V_read?}
    H -- No --> I[Mark invalid: MVCC_READ_CONFLICT]
    H -- Yes --> J{More read keys?}
    J -- Yes --> F
    J -- No --> K{Range query info present?}
    K -- Yes --> L{Range results unchanged?}
    L -- No --> M[Mark invalid: PHANTOM_READ_CONFLICT]
    L -- Yes --> N["Stage writes, commit assigns versions (B, i)"]
    K -- No --> N
    E --> O{More txs?}
    I --> O
    M --> O
    N --> O
    O -- Yes --> B
    O -- No --> P[Commit batch to state DB, append block]
```
MVCC checks are sequential within a block in the sense that the current version must reflect valid preceding transactions in the same block. Even if parts of validation can be parallelized, the correctness condition depends on a block-ordered evolving state view.
Both simulations succeed, one commit fails
Now add contention. Suppose Kwame submits two transfers concurrently:
```
T1: Kwame -> Barma, 60
T2: Kwame -> Diop, 60
```
Both transactions are simulated by endorsers before either is committed. During simulation, both read bal:kwame at the same version and both produce a RWSet that writes bal:kwame = 40 (each believes Kwame had 100).
If the orderer batches them into the same block with T1 first, the committing peer processes T1, finds that the current version of bal:kwame matches the read version (120, 3), marks T1 valid, and applies the write set. The version of bal:kwame becomes (B, 0), where B is the new block number and 0 is T1's index in that block.
When T2 is validated, it still carries read version (120, 3) for bal:kwame, but the current version is now (B, 0). Since (B, 0) != (120, 3), T2 is tagged MVCC_READ_CONFLICT and its write set is discarded.
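The whole scenario can be replayed with a self-contained sketch of the commit-time check (illustrative structures, not Fabric's code). Both transactions read bal:kwame at version (120, 3); the first in block order wins, the second is invalidated:

```python
# Toy commit-time MVCC validation over a block, sequential in block order.
def validate_block(state, block_num, txs):
    results = []
    for tx_num, tx in enumerate(txs):
        stale = any(state[key][1] != version for key, version in tx["reads"])
        if stale:
            results.append("MVCC_READ_CONFLICT")
            continue  # write set discarded
        for key, value in tx["writes"]:
            state[key] = (value, (block_num, tx_num))  # commit assigns (B, i)
        results.append("VALID")
    return results

state = {
    "bal:kwame": ("100", (120, 3)),
    "bal:barma": ("5", (119, 9)),
    "bal:diop": ("0", (118, 1)),
}
t1 = {"reads": [("bal:kwame", (120, 3)), ("bal:barma", (119, 9))],
      "writes": [("bal:kwame", "40"), ("bal:barma", "65")]}
t2 = {"reads": [("bal:kwame", (120, 3)), ("bal:diop", (118, 1))],
      "writes": [("bal:kwame", "40"), ("bal:diop", "60")]}
outcome = validate_block(state, 121, [t1, t2])
```

Because the state view evolves as T1's writes are staged, T2's read of bal:kwame is stale by the time it is checked.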
This is Fabric's optimistic concurrency in action: do parallel speculative execution, then reject stale writers deterministically at commit.
Mitigation patterns
Conflicts are not purely a client problem, in the sense that retries alone do not fix throughput collapse under contention. The lever you control is the shape of your state updates: how many transactions must read the same keys (or ranges) to compute their writes.
Delta and append-only models
If your application is doing read, modify, write on a hot key (bal:kwame), you are forcing all writers for that account into a single version chain, and MVCC will reject stale writers. The scalable alternative is to stop updating a single aggregate key as the primary write path.
One approach is append-only deltas. Instead of writing bal:kwame directly, each transfer writes an immutable entry like delta:kwame:<txid> = -60 and delta:barma:<txid> = +60. These keys are naturally non-conflicting because they are unique per transaction. Aggregation into balances then becomes a read-time fold or, more commonly, an off-chain materialization step. The trade is you reduce MVCC contention on writes and push complexity into aggregation.
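A minimal sketch of the delta pattern, with illustrative key shapes; note the overdraft check now has to happen elsewhere (for example at aggregation time):

```python
# Append-only deltas: each transfer writes unique per-txid keys, so
# concurrent transfers touching the same account do not conflict.
def transfer_deltas(txid, src, dst, amount):
    return {
        f"delta:{src}:{txid}": -amount,
        f"delta:{dst}:{txid}": +amount,
    }

def materialize_balance(state, account, opening=0):
    # Read-time fold over the account's deltas (off-chain in practice).
    prefix = f"delta:{account}:"
    return opening + sum(v for k, v in state.items() if k.startswith(prefix))

state = {}
state.update(transfer_deltas("tx1", "kwame", "barma", 60))  # concurrent transfer 1
state.update(transfer_deltas("tx2", "kwame", "diop", 30))   # concurrent transfer 2
balance = materialize_balance(state, "kwame", opening=100)
```

The two transfers write four distinct keys, so neither read set can go stale because of the other.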
A related approach is to use a UTXO model deliberately: spends consume unique keys, so independent spends don't contend on a single aggregate balance key. The remaining hotspot moves to selection. If selection is implemented with broad range queries, phantom protection can become your dominant invalidation mode; you mitigate this by designing spend selection so that two concurrent spends are unlikely to select from the same key range, or by allocating UTXOs into partitions so range scans don't overlap as often.
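One way to sketch partition-aware UTXO selection; the key shapes and partitioning scheme are assumptions for illustration, not an established convention:

```python
# UTXOs are allocated into partitions so concurrent spenders scan
# disjoint key prefixes instead of one broad range.
import hashlib

NUM_PARTITIONS = 4

def utxo_key(owner, partition, utxo_id):
    return f"utxo:{owner}:{partition}:{utxo_id}"

def partition_for(client_id):
    # Stable hash spreads concurrent clients across partitions.
    return hashlib.sha256(client_id.encode()).digest()[0] % NUM_PARTITIONS

def select_utxos(state, owner, client_id):
    prefix = f"utxo:{owner}:{partition_for(client_id)}:"
    return [k for k in state if k.startswith(prefix)]

# Two UTXOs in each of the four partitions for one owner.
state = {utxo_key("kwame", p, i): 25
         for p in range(NUM_PARTITIONS) for i in range(2)}
picked = select_utxos(state, "kwame", "client-7")
```

Two clients hashed to different partitions scan non-overlapping prefixes, so their range reads cannot invalidate each other.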
Off-chain batching and explicit serialization for hotspots
Some invariants are inherently single-writer per logical entity (e.g., "Kwame must never overspend his latest committed balance" in an account-based model). If you must enforce that invariant on-ledger with read, modify, write, then the system will serialize per-account updates somewhere. If you don't provide that serialization, Fabric provides it indirectly by discarding losers at commit, which wastes endorsement CPU and reduces effective throughput.
A common architecture is an off-chain sequencer per hot key (actor model). All commands affecting bal:kwame go through a single partitioned worker keyed by account. That worker submits transactions in an order that avoids concurrent simulations against the same version, collapsing many small concurrent conflicts into one ordered stream. You can then batch multiple deltas for the same account into a single transaction per block interval, which reduces both MVCC conflicts and per-transaction overhead. The trade is added latency and operational responsibility for idempotency and backpressure in the sequencer.
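A toy version of the per-key sequencer, with a list standing in for actual transaction submission:

```python
# Actor-style sequencer: all commands for an account land on the same
# queue, and each queue has one worker, so per-account order is preserved.
import queue
import threading

class Sequencer:
    def __init__(self, num_workers=4):
        self.queues = [queue.Queue() for _ in range(num_workers)]
        self.submitted = []  # stand-in for "submit endorsed tx to Fabric"
        self.lock = threading.Lock()
        for q in self.queues:
            threading.Thread(target=self._run, args=(q,), daemon=True).start()

    def _run(self, q):
        while True:
            account, cmd = q.get()
            with self.lock:
                self.submitted.append((account, cmd))  # one tx at a time per key
            q.task_done()

    def dispatch(self, account, cmd):
        # Stable routing: one account always maps to one queue.
        self.queues[hash(account) % len(self.queues)].put((account, cmd))

seq = Sequencer()
for i in range(5):
    seq.dispatch("kwame", f"transfer-{i}")
for q in seq.queues:
    q.join()  # wait until all dispatched commands were submitted
```

Because "kwame" always routes to the same single-threaded worker, its five transfers are submitted strictly in dispatch order, never simulated against the same stale version concurrently.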
Reduce read-set footprint where possible
Version checks cover only what you read. If your chaincode reads extra keys just in case (for example, reading bal:barma when you could treat the credit side as an append-only delta), you enlarge the conflict surface. The read-write semantics make it clear that the read set is what drives validity. Minimizing unnecessary reads is a direct way to reduce spurious conflicts, but it usually requires changing the data model (for example, moving from in-place increments to deltas), because correctness often forces reads.
Conclusion
MVCC in Hyperledger Fabric is the consequence of its execute-order-validate architecture: transactions are simulated against a snapshot, ordered later, and finally checked to ensure the versions they read are still current. That design enables parallel simulation, but it also means concurrent transactions that depend on the same state can both look valid during execution and still collide at commit.
In practice, MVCC conflicts are often less a concurrency bug than a data-model problem. Hot keys, broad read sets, and range-based selection all increase the chance that a transaction becomes stale before commit.
Thanks to Jakub Dzikowski and Arne Rutjes for reviewing the earlier draft and for their helpful comments.