Validity-proof-based onchain-offchain hybrid apps

You must've heard about "Zero Knowledge Proofs" in relation to two big blockchain use-case categories: privacy and scaling.

I've written a conceptual introduction to zero knowledge proofs and covered how ZKPs have been used for onchain privacy in my Tornado Cash manual.

This post is a comprehensive walkthrough of how proofs are used for scaling. I've covered this at a high level in my mental model post on rollups; here, we'll dive into the details. In particular, we'll reference StarkEx, produced by StarkWare.

It might help to read the articles linked so far as prerequisites. Nevertheless, this post aims for no jargon and no unwarranted conceptual jumps. I promise (or at least, I try).

"Validity" proofs, not "ZK" proofs

A validity proof is the output of a cryptographic scheme. It is a small piece of data ("proof") that gives you a hard, mathematical guarantee that a given set of computations produces a given result, without the verifying party having to re-execute those computations to derive the result for themselves.

A zero knowledge proof is the output of a cryptographic scheme. It is a small piece of data ("proof") that lets a verifier check that a given piece of original data satisfies a given set of constraints, without the prover having to reveal the original data itself.

Both definitions often pick out the same families of proving algorithms, but they emphasize different properties of those algorithms (and a given scheme need not have both). So, the term you use depends on which property your application relies on.

In the case of privacy applications like Tornado Cash, the term "zero knowledge" proofs is appropriate, since the property key to privacy is not having to reveal the original data you're proving something about. In the case of scaling applications, we use the term "validity" proofs, since the property key to scaling is that verifying a proof is significantly cheaper than replaying the computations the proof is about.

What "scale" are apps looking for?

All blockchains face a throughput-decentralization trade-off.

Targeting high throughput increases the operational cost of running a node, which raises the barrier to entry for node operators and can thus lead to a more centralized network of fewer nodes.
Targeting low throughput keeps operational costs low, which can lead to a more decentralized network of more nodes, but it also makes for a worse platform for hosting high-frequency apps.

Thus, we end up with a hard problem: a single blockchain cannot target high throughput without raising the operational cost per node. This, in turn, creates a problem for a number of app categories.

The most prominent example is trading. Users hold monetary value in the form of tokens (assets) on a blockchain, and that blockchain may have much slower transaction confirmation times than the sub-millisecond trade confirmations of centralized exchanges/orderbooks, since the blockchain trades off performance for decentralization and security guarantees.

The "scale" that blockchain applications want to achieve is essentially the ability to process throughput (e.g. trades per second) on par with their centralized counterparts. The idea is that if performance is nearly equal, a product built on decentralized, permissionless infrastructure is better than one built on permissioned, centralized infrastructure.

Achieving scale via validity proofs

Going forward, this post discusses scaling applications in the context of building an orderbook app, for two main reasons:

This is a popular app category, with a clear motivation for building a decentralized version of the product that's just as fast as its centralized counterpart.
We get to use the StarkEx smart contracts repository, which is excellent reference material for an orderbook app.

The focus of this exercise is to walk through the general architecture of applications that use validity proofs to scale. So, we won't get hung up on StarkEx implementation details that are not hard requirements for how such applications must be architected.

Here's the general strategy that apps employ to achieve this scale through the use of validity proofs.

A beefed-up offchain application server can process transactions/intents at the speeds of centralized orderbooks. This server acts as a fast execution environment for orderbook/app actions, letting users trade their onchain monetary value/assets quickly.
The offchain application server works in tandem with a canonical state smart contract deployed on a "settlement blockchain" (e.g. Ethereum mainnet). This smart contract stores all user assets that are made available on the offchain orderbook application, and it tracks the canonical ledger of user asset balances. The offchain application server updates this ledger periodically, informing the contract about (i) the app actions that took place within the period and (ii) the resultant state of the ledger.
This interaction, in which the offchain application posts state updates to the contract, is codified in the updateState function of the UpdateState smart contract (part of the canonical state smart contract being discussed).

abstract contract UpdateState is
    StarkExStorage,
    StarkExConstants,
    MStarkExForcedActionState,
    VerifyFactChain,
    MAcceptModifications,
    MFreezable,
    MOperator,
    PublicInputOffsets
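{
    // Called by the operator of the offchain application to post a batch's state update.
    // publicInput: the public inputs of the batch's validity proof (e.g. the roots
    //   before and after the batch, plus deposit and withdrawal actions).
    // applicationData: additional data the operator supplies alongside the public input.
    // notFrozen: reverts once the contract has been frozen (escape-path mode).
    // onlyOperator: restricts the caller to the registered operator.
    function updateState(uint256[] calldata publicInput, uint256[] calldata applicationData)
        external
        virtual
        notFrozen
        onlyOperator
    {...}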

}

The ledger tracked by the smart contract, and honored by the offchain application, is made up of three merkle trees: (1) $\text{validium-tree}$, (2) $\text{rollup-tree}$ and (3) $\text{order-tree}$. The smart contract only stores the root of each merkle tree.
Each leaf of the $\text{rollup-tree}$ is essentially a triple (owner, asset, balance), uniquely keyed by its (owner, asset) pair. App actions (e.g. a matched order on the orderbook app) update some leaves of this tree. For example, start with the leaves (Alice, ETH, 10), (Alice, USDC, 0), (Bob, USDC, 10000) and (Bob, ETH, 0). If an order in which Alice sells 1 ETH to Bob for 1000 USDC is matched, the new leaves are (Alice, ETH, 9), (Alice, USDC, 1000), (Bob, USDC, 9000) and (Bob, ETH, 1). This changes the root of the $\text{rollup-tree}$.
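The leaf update in this example can be sketched in a few lines of Python. This is a minimal illustration, not StarkEx's actual tree: it assumes SHA-256 in place of the proof-friendly hash a real system would use, a hypothetical leaf encoding of "owner|asset|balance", and a simple tree that duplicates the last node on odd levels. The helper names (`merkle_root`, `match_order`) are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    # SHA-256 stands in for the proof-friendly hash a real system would use.
    return hashlib.sha256(data).digest()

def leaf_hash(owner: str, asset: str, balance: int) -> bytes:
    # Hypothetical leaf encoding: "owner|asset|balance".
    return h(f"{owner}|{asset}|{balance}".encode())

def merkle_root(ledger: dict) -> bytes:
    """Root over leaves sorted by (owner, asset); an odd level duplicates its last node."""
    level = [leaf_hash(o, a, b) for (o, a), b in sorted(ledger.items())]
    if not level:
        return h(b"")
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def match_order(ledger, seller, buyer, base, quote, base_amt, quote_amt):
    """Settle a matched order: seller pays base_amt of `base`, buyer pays quote_amt of `quote`."""
    if ledger.get((seller, base), 0) < base_amt or ledger.get((buyer, quote), 0) < quote_amt:
        raise ValueError("insufficient balance")
    new = dict(ledger)  # leave the pre-trade ledger untouched
    new[(seller, base)] -= base_amt
    new[(buyer, base)] = new.get((buyer, base), 0) + base_amt
    new[(buyer, quote)] -= quote_amt
    new[(seller, quote)] = new.get((seller, quote), 0) + quote_amt
    return new

before = {("Alice", "ETH"): 10, ("Alice", "USDC"): 0, ("Bob", "USDC"): 10000, ("Bob", "ETH"): 0}
after = match_order(before, "Alice", "Bob", "ETH", "USDC", 1, 1000)
print(merkle_root(before).hex())
print(merkle_root(after).hex())
```

Only four leaves change, but because every leaf feeds into the root, the single 32-byte root the contract stores changes too.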
The application may process thousands of such app actions, resulting in a state transition of the $\text{rollup-tree}$ from $\text{root-before}$ to $\text{root-after}$. The $\text{root-before}$ value is currently stored in the canonical state contract as the current state of the application ledger. The goal of the offchain application now is to supply the contract with $\text{root-after}$, and prove to the contract that $\text{root-after}$ is the result of applying a legal/valid sequence of app actions to $\text{root-before}$.
The offchain application does this with the help of a proof-generation server. This server runs a validity proof circuit that takes $\text{root-before}$, $\text{root-after}$ and the app actions as inputs and outputs a validity proof. The system also deploys a proof-verifier smart contract on the settlement blockchain; this contract is generated from the proof circuit and can verify validity proofs produced by that circuit alone.
The offchain application intends to prove the validity of the $\text{root-before}$ -> $\text{root-after}$ state transition to the canonical state contract by supplying this validity proof and having it checked by the verifier smart contract, so that the canonical state contract can accept $\text{root-after}$ as the new canonical state of the application. The key property of validity proofs is that running the proof through the verification operation codified in the verifier smart contract, verify(public_inputs, proof) -> bool, is vastly cheaper than replaying all app actions on $\text{root-before}$ inside the canonical state smart contract to derive $\text{root-after}$ (which would be prohibitively expensive in gas).
The model followed by the StarkEx system is as follows: the offchain application server ingests user intents for app actions, verifies the intents and executes the actions, thereby updating its locally stored ledger (its local $\text{rollup-tree}$ root). Once it has processed its periodic cap of app actions, it sends the proof-generation server three inputs: the locally tracked $\text{root-before}$ (which, at this moment, is also stored on the canonical contract as the application's state), the locally tracked resultant $\text{root-after}$, and the ordered set of app actions.
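The server-side batching loop described above can be sketched as follows, under loudly simplifying assumptions: every action is a plain transfer, `root_of` hashes the sorted ledger instead of computing a real merkle root, and `batch_cap` is a hypothetical stand-in for the periodic cap. Each intent is checked and executed immediately; once the cap is reached, the server emits the (root-before, root-after, ordered actions) triple it would hand to the proof-generation server. All class and function names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field

def root_of(ledger: dict) -> str:
    # Hypothetical stand-in for the rollup-tree root: a hash over the sorted leaves.
    return hashlib.sha256(repr(sorted(ledger.items())).encode()).hexdigest()

@dataclass
class Transfer:
    sender: str
    recipient: str
    asset: str
    amount: int

@dataclass
class Batch:
    root_before: str   # public input
    root_after: str    # public input
    actions: list      # private input, in execution order

@dataclass
class OffchainServer:
    ledger: dict
    batch_cap: int = 3  # hypothetical "periodic cap" of actions per batch
    pending: list = field(default_factory=list)
    batch_start_root: str = ""

    def __post_init__(self):
        self.batch_start_root = root_of(self.ledger)

    def submit(self, action: Transfer):
        """Verify and execute one intent; return a Batch once the cap is reached."""
        src = (action.sender, action.asset)
        if action.amount <= 0 or self.ledger.get(src, 0) < action.amount:
            raise ValueError("intent rejected")
        dst = (action.recipient, action.asset)
        self.ledger[src] -= action.amount
        self.ledger[dst] = self.ledger.get(dst, 0) + action.amount
        self.pending.append(action)
        if len(self.pending) < self.batch_cap:
            return None
        batch = Batch(self.batch_start_root, root_of(self.ledger), self.pending)
        # The next batch starts from this batch's resulting root.
        self.pending, self.batch_start_root = [], batch.root_after
        return batch
```

Users see their intents executed at server speed; only the periodic `Batch` has to go through the slower prove-and-settle path.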
Here, $\text{root-before}$ and $\text{root-after}$ are sent as 'public inputs'. These inputs are meant to be publicly visible when sent to the canonical state contract (and then, in turn, to the proof-verifier contract), since the canonical state contract needs to perform basic assurance checks, e.g. that the proof attests to an honest state transition applied to the actual $\text{root-before}$ stored in the canonical state contract. Other inputs to the proof-generation server, e.g. the app actions themselves, are 'private inputs'. They do not need to accompany the proof for verification to succeed.
The proof-generation server generates a validity proof from these inputs and posts it to the proof-verifier smart contract. The contract checks the proof's correctness and, if it is valid, records the proof as successfully verified under the unique identifier hash(public_inputs).
The offchain application server now sends $\text{root-before}$, $\text{root-after}$ and the rest of the public inputs to the updateState function of the canonical state smart contract. The contract checks that $\text{root-before}$ is the root it currently stores as the application's canonical state. Then, it queries the verifier smart contract with the identifier hash(public_inputs) and checks whether the associated validity proof has been successfully verified. If so, the contract stores $\text{root-after}$ as the application's new canonical state.
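The verification handshake in the last two steps can be sketched with two toy classes. The proof system itself is mocked: `proof_ok` stands in for the result of verify(public_inputs, proof), and `fact_id` assumes SHA-256 over the public inputs as the identifier hash(public_inputs). The class and method names are hypothetical, not StarkEx's.

```python
import hashlib

def fact_id(public_inputs: tuple) -> str:
    # Hypothetical identifier hash(public_inputs): SHA-256 over a canonical encoding.
    return hashlib.sha256(repr(public_inputs).encode()).hexdigest()

class MockVerifier:
    """Stand-in for the proof-verifier contract; the cryptography is mocked.

    `proof_ok` plays the role of the result of verify(public_inputs, proof).
    """

    def __init__(self):
        self.verified = set()

    def submit_proof(self, public_inputs: tuple, proof_ok: bool) -> None:
        if not proof_ok:
            raise ValueError("proof rejected")
        self.verified.add(fact_id(public_inputs))

    def is_verified(self, fact: str) -> bool:
        return fact in self.verified

class CanonicalState:
    def __init__(self, genesis_root: str, verifier: MockVerifier):
        self.root = genesis_root
        self.verifier = verifier

    def update_state(self, root_before: str, root_after: str) -> None:
        # 1. The transition must start from the root currently stored onchain.
        if root_before != self.root:
            raise ValueError("root_before is not the current canonical root")
        # 2. A proof over exactly these public inputs must already be verified.
        if not self.verifier.is_verified(fact_id((root_before, root_after))):
            raise ValueError("no verified proof for these public inputs")
        # 3. Accept the new state.
        self.root = root_after
```

Splitting verification from the state update means updateState itself only does cheap lookups; the expensive proof check happens once, in the verifier.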
This overview of the interaction between the offchain app and the onchain contracts covers app actions concerned with transfers, among users, of asset balances already available on the application ledger. We now extend our overview to entries and exits: users depositing onchain assets into the application and withdrawing them from it.
Users want to deposit their onchain assets into the application so that they can trade their onchain monetary value in a fast execution environment: trading at CEX speeds while their assets remain self-custodial and live on a secure, permissionless and decentralized blockchain.
Users deposit assets into the application by calling e.g. the depositERC20 function in Deposit.sol (part of the canonical state smart contract). This pulls the user's asset (e.g. USDC) into the canonical state smart contract and records the deposit in storage as a pending deposit. Here, 'pending' means that although the user has deposited onchain assets into the app smart contract, the offchain app server has yet to honor that deposit by updating its locally tracked ledger of asset balances. The offchain application server then sees the deposit and processes it as an app action, updating the relevant $\text{rollup-tree}$ leaf and, thus, the root. This deposit app action goes through the same lifecycle as app actions like placing an order.
The deposit app action is additionally used throughout the proof generation -> verification lifecycle as a public input (like $\text{root-before}$ and $\text{root-after}$). Therefore, in an updateState call, the canonical state contract can check that each provided deposit action has a corresponding pending deposit in storage; the contract then deletes that pending deposit, since the deposit is now reflected in the application ledger.
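A minimal sketch of this pending-deposit bookkeeping, assuming a hypothetical `DepositContract` that keys pending deposits by (user, asset); it does not mirror StarkEx's storage layout or token-transfer logic. The check runs over the whole batch before anything is deleted, so a call either consumes all of its deposits or none of them.

```python
from collections import defaultdict

class DepositContract:
    """Hypothetical pending-deposit bookkeeping (not StarkEx's actual storage layout)."""

    def __init__(self):
        self.held = defaultdict(int)     # asset -> tokens held by the contract
        self.pending = {}                # (user, asset) -> amount not yet in the ledger

    def deposit(self, user: str, asset: str, amount: int) -> None:
        # A real contract would pull the tokens in with an ERC-20 transferFrom here.
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.held[asset] += amount
        self.pending[(user, asset)] = self.pending.get((user, asset), 0) + amount

    def consume_deposits(self, deposits: list) -> None:
        """Run from updateState on the deposit actions listed in the public inputs."""
        totals = defaultdict(int)
        for user, asset, amount in deposits:
            totals[(user, asset)] += amount
        # Check the whole batch first so the call consumes all deposits or none.
        for key, amount in totals.items():
            if self.pending.get(key, 0) < amount:
                raise ValueError(f"no matching pending deposit for {key}")
        for key, amount in totals.items():
            self.pending[key] -= amount
            if self.pending[key] == 0:
                del self.pending[key]
```

Summing per (user, asset) before checking prevents a batch from listing the same pending deposit twice and crediting it twice.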
Deposits are entries into the application; withdrawals are exits. Users initiate a withdrawal request by interacting with the offchain application server. The server processes the withdrawal as an app action that, like a deposit, is treated as a public input (alongside $\text{root-before}$, $\text{root-after}$ and all deposit actions). When the contract processes a withdrawal in an updateState call, it records a pending withdrawal owed to the respective user, which the user can then claim by interacting with the contract directly.
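The withdrawal side mirrors deposits: updateState credits a claimable balance, and the user pulls the funds out in a separate call. This is a hypothetical sketch (`WithdrawalContract`, `credit_withdrawals` and `claim` are illustrative names), with token transfers reduced to bookkeeping on a `held` balance.

```python
class WithdrawalContract:
    """Hypothetical pending-withdrawal bookkeeping for the canonical state contract."""

    def __init__(self, held: dict):
        self.held = dict(held)   # asset -> tokens held by the contract
        self.pending = {}        # (user, asset) -> amount the user can claim

    def credit_withdrawals(self, withdrawals: list) -> None:
        """Run from updateState on the withdrawal actions listed in the public inputs."""
        for user, asset, amount in withdrawals:
            self.pending[(user, asset)] = self.pending.get((user, asset), 0) + amount

    def claim(self, user: str, asset: str) -> int:
        """Called by the user directly; returns the amount sent out of the contract."""
        amount = self.pending.pop((user, asset), 0)
        if amount == 0:
            raise ValueError("nothing to claim")
        self.held[asset] -= amount   # a real contract would transfer the tokens here
        return amount
```

The claim step needs no cooperation from the offchain server once the withdrawal has been settled onchain.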
The main goal of this whole mechanism is to guarantee users that the offchain application server processes their intents for app actions (e.g. placing an order on the orderbook) honestly, with exact accounting of the resulting user balances. The mechanism cannot create this guarantee by itself. First, the proof circuit used by the application must be open source and publicly auditable; only then can there be public assurance that the circuit truly proves honest processing of app actions and exact accounting of user balances. Second, the offchain application server can simply ignore user intents sent to it, effectively locking users out. This is addressed by the canonical state contract, which lets users force actions by interacting with the contract directly rather than with the offchain application server.
The main risk for users is that their onchain assets, once deposited, cannot be withdrawn because the offchain application ignores their withdrawal requests (which also breaks the property of self-custody). The canonical state smart contract, which actually holds the onchain assets, lets users create 'force-withdrawal' requests. Each such request has a deadline. If the offchain application server does not process the withdrawal within that deadline, the withdrawing user can freeze the canonical state smart contract, which effectively freezes the entire application.
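The force-withdrawal timeline can be sketched as a small state machine. `FREEZE_GRACE_PERIOD` is a hypothetical deadline (the real value is a contract constant), timestamps are plain integers passed in by the caller, and requests are keyed by (user, asset) for simplicity; the class and method names are illustrative.

```python
from dataclasses import dataclass, field

FREEZE_GRACE_PERIOD = 7 * 24 * 3600  # hypothetical deadline in seconds

@dataclass
class ForcedWithdrawals:
    frozen: bool = False
    requests: dict = field(default_factory=dict)  # (user, asset) -> request time

    def request(self, user: str, asset: str, now: int) -> None:
        """The user asks the contract (not the server) to force a withdrawal."""
        if self.frozen:
            raise RuntimeError("contract is frozen; use the escape path instead")
        self.requests.setdefault((user, asset), now)

    def served(self, user: str, asset: str) -> None:
        """The operator included this withdrawal in an updateState batch in time."""
        self.requests.pop((user, asset), None)

    def freeze(self, user: str, asset: str, now: int) -> None:
        """The requesting user can freeze the contract once an unserved request expires."""
        requested_at = self.requests.get((user, asset))
        if requested_at is None:
            raise ValueError("no outstanding forced withdrawal")
        if now < requested_at + FREEZE_GRACE_PERIOD:
            raise ValueError("deadline has not passed yet")
        self.frozen = True
```

The operator can always avoid a freeze by serving the request before the deadline, which is exactly the incentive the freeze is meant to create.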
In this frozen state, the canonical state contract enables an 'escape path' for users to withdraw their assets locked in the contract. A user must present a proof that their leaf is included in the $\text{rollup-tree}$ (a standard merkle inclusion proof), upon which the contract creates a pending withdrawal for that user, which the user can then claim to complete the withdrawal.
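The inclusion check in the escape path is a standard merkle proof: the user supplies their leaf plus the sibling hashes along its path, and the contract recomputes the root and compares it with the one it stores. The sketch assumes SHA-256, a hypothetical leaf encoding "owner|asset|balance", and a tree that duplicates the last node on odd levels; a real deployment would use its own hash and tree layout.

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def leaf(owner: str, asset: str, balance: int) -> bytes:
    return h(f"{owner}|{asset}|{balance}".encode())

def build_levels(leaves: list) -> list:
    """All levels of the tree, bottom-up; an odd level duplicates its last node."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:
            cur = cur + [cur[-1]]
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels

def prove(levels: list, index: int) -> list:
    """Sibling hashes from leaf to root, computed offchain by the user."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append(level[sibling] if sibling < len(level) else level[index])
        index //= 2
    return path

def verify(root: bytes, leaf_hash: bytes, index: int, path: list) -> bool:
    """The contract's check: recompute the root from the leaf and its path."""
    node = leaf_hash
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

ledger = [("Alice", "ETH", 9), ("Alice", "USDC", 1000), ("Bob", "ETH", 1),
          ("Bob", "USDC", 9000), ("Carol", "ETH", 3)]
leaves = [leaf(*entry) for entry in ledger]
levels = build_levels(leaves)
root = levels[-1][0]        # the only value the contract stores
path = prove(levels, 4)     # Carol builds this offchain from the reconstructed tree
print(verify(root, leaves[4], 4, path))  # True
```

Note that `prove` needs every level of the tree, which is why the user must first be able to reconstruct the full set of leaves.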
To create this proof of inclusion of their leaf in the $\text{rollup-tree}$, a user needs to be able to reconstruct the whole current state of the tree. However, the canonical state contract only stores the tree's root.
A user can reconstruct the $\text{rollup-tree}$ by replaying all updateState transactions made on the canonical state contract since its genesis. The calldata of each updateState transaction includes the function selector and the function arguments, like any other function call, but it also carries extra data appended after the function-specific calldata, which Solidity simply ignores (since it only decodes the calldata the function's parameters require). This extra data is the "diff" of the $\text{rollup-tree}$ leaves updated by the app actions in that updateState batch. Since this calldata is part of the settlement blockchain's publicly accessible transaction history, users always have the data they need to initiate force-withdrawals and combat censorship by the application.
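Reconstructing the ledger from the posted diffs amounts to replaying them in order. The sketch assumes the diffs have already been decoded from each updateState transaction's trailing calldata into dicts of (owner, asset) -> new balance (the actual encoding is implementation-specific and not shown), and uses a hypothetical `commitment` hash in place of the merkle root the user would compare against the onchain value.

```python
import hashlib

def commitment(ledger: dict) -> str:
    # Hypothetical stand-in for the rollup-tree root: a hash over the sorted leaves.
    return hashlib.sha256(repr(sorted(ledger.items())).encode()).hexdigest()

def rebuild_ledger(diffs_in_order: list) -> dict:
    """Replay each updateState transaction's leaf diff, oldest first.

    Each diff maps (owner, asset) -> new balance for the leaves that batch touched;
    later diffs overwrite earlier values for the same leaf.
    """
    ledger = {}
    for diff in diffs_in_order:
        ledger.update(diff)
    return ledger

# Diffs as a user might decode them from two updateState transactions, oldest first.
diffs = [
    {("Alice", "ETH"): 10, ("Bob", "USDC"): 10000},
    {("Alice", "ETH"): 9, ("Alice", "USDC"): 1000, ("Bob", "USDC"): 9000, ("Bob", "ETH"): 1},
]
ledger = rebuild_ledger(diffs)
print(commitment(ledger))  # compare against the root stored onchain
```

Order matters: replaying the same diffs out of order yields a different ledger, so the user follows the transactions' order on the settlement chain.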
Note that a frozen state is the worst case for the application. So, it is in the application's best interest to always honor withdrawal requests and avoid even the prospect of force-withdrawal requests.

Our discussion so far has largely centered on the $\text{rollup-tree}$. Let's complete the loop by going over the two other trees the canonical state smart contract tracks: the $\text{validium-tree}$ and the $\text{order-tree}$.

Like the $\text{rollup-tree}$, each leaf of the $\text{validium-tree}$ is a triple (owner, asset, balance) keyed by its (owner, asset) pair. The difference between the two trees is where their leaf data is stored so that the trees can be reconstructed at any given moment. The $\text{rollup-tree}$ leaf data is stored onchain, in calldata on the settlement blockchain, and this is expensive. The application bears this expense (since it makes the updateState transactions), but it may pass it on to users in the form of trading fees. A cheaper alternative to storing leaf data in blockchain calldata is to store it in some offchain database. That's exactly how the $\text{validium-tree}$ works: its "diff" of leaf data for every updateState call is posted to an external, centrally controlled (ideally publicly accessible and auditable) data layer.
Once the data is successfully posted to this external data layer, the layer's administrators attest to this on an attestation smart contract (a simple registry mapping a hash of the $\text{validium-tree}$-related public inputs to whether the corresponding leaf-diff data has been posted). The updateState function checks that this attestation exists for the $\text{validium-tree}$ data it receives. This is a known and deliberate design choice involving a centralized component, though it is not strictly necessary for a StarkEx-like system to work.
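The attestation check can be sketched as a registry plus one guard inside updateState. Everything here is hypothetical: a single set of authorized attesters, SHA-256 over the validium-related public inputs as the registry key, and a `check_validium_update` helper standing in for the corresponding logic in updateState.

```python
import hashlib

def da_fact(validium_inputs: tuple) -> str:
    # Hypothetical registry key: SHA-256 over the validium-related public inputs.
    return hashlib.sha256(repr(validium_inputs).encode()).hexdigest()

class AttestationRegistry:
    """Hypothetical registry where the data layer's administrators record posted diffs."""

    def __init__(self, attesters):
        self.attesters = set(attesters)
        self.attested = set()

    def attest(self, signer: str, validium_inputs: tuple) -> None:
        if signer not in self.attesters:
            raise PermissionError("signer is not an authorized attester")
        self.attested.add(da_fact(validium_inputs))

def check_validium_update(registry: AttestationRegistry, validium_inputs: tuple) -> None:
    """The extra guard updateState runs before accepting a validium-tree transition."""
    if da_fact(validium_inputs) not in registry.attested:
        raise ValueError("validium data has not been attested as available")
```

The guard only proves that someone trusted claims the data was posted; whether it stays reachable is exactly the trust assumption the next paragraph describes.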
The purpose of tracking the $\text{validium-tree}$ in addition to the $\text{rollup-tree}$ is to give users a cheaper way to transact on the application (since storing data on an offchain data layer is cheap). Simply put, the canonical smart contract lets a user choose whether their app action is tracked in the validium tree or the rollup tree. The trade-off for users is that the $\text{validium-tree}$ leaf data is only available as long as the external data layer's infrastructure is up and the data is accessible, whereas the $\text{rollup-tree}$ leaf data lives in the settlement blockchain's calldata, with stronger data availability guarantees. As explained previously, this data is crucial to users who need to force a withdrawal or exit the system when it's frozen.
The $\text{order-tree}$ is a merkle tree whose leaves are the orders submitted to the orderbook run by the offchain application server, along with each order's state (partially filled, filled, etc.). Like the $\text{validium-tree}$, this tree is validated against an attestation smart contract, since the goal here too is to make its data available on an inexpensive, publicly accessible/auditable data layer.
Note that the $\text{rollup-tree}$, $\text{validium-tree}$ and $\text{order-tree}$ are all covered by the validity proof that the canonical contract verifies; so, essentially, the canonical contract verifies the correctness of the $\text{root-before}$ -> $\text{root-after}$ state transitions of all three trees.

In conclusion

Blockchains have emerged as great stores of value, thanks to their decentralization, public auditability, etc. However, blockchains are distributed databases replicated across many nodes, and they necessarily face some version of the throughput-decentralization trade-off.

Blockchain applications looking to offer a self-custodial user experience with the throughput and scale of their centralized counterparts can use validity proofs to achieve their desired scale, while securely accessing the liquidity that lives onchain and letting users trade it.

Overall, it is the ability of users to force actions on the offchain application by interacting directly with the smart contract on the settlement blockchain that creates a technically and economically 'hard' connection between the offchain application and the onchain smart contract.