Light Paper for OpenLayer TLS Data Validation
Overview
OpenLayer is committed to validating data transmitted over HTTPS (TLS) sessions, which covers a substantial portion of communications over the internet. This initiative is part of a broader mission to deliver a modular, authentic data layer for diverse use cases across both Web2 and Web3 ecosystems. OpenLayer’s solutions handle data-access scenarios ranging from publicly available information to privately held data that requires user authentication, providing comprehensive coverage that bolsters the integrity and security of data across the internet.
Related Work
In the realm of TLS session validation and notarization, a notable foundation was laid by the DECO protocol, introduced by Zhang et al. (2019) in their pioneering work on leveraging 3-party TLS for privacy-preserving web-session proofs (https://arxiv.org/abs/1909.00938). This protocol became a cornerstone of the domain, influencing subsequent research on secure TLS attestation mechanisms.
Building upon these foundational concepts, recent advancements by PADO Labs introduced an innovative approach in their paper "Lightweight Authentication of Web Data via Garble-Then-Prove" (https://eprint.iacr.org/2023/964). The study proposed a novel methodology by integrating semi-honest multi-party computation (MPC) with zero-knowledge proofs (ZKPs), eliminating the need for the more costly maliciously-secure MPC. This development underscores a shift towards more resource-efficient cryptographic practices in web data authentication.
Further exploration into TLS 1.3 ciphersuites has been conducted by several researchers, as evidenced by a cluster of studies that emerged simultaneously. Among these, DiStefano et al. (2023) presented a significant breakthrough in their paper (https://eprint.iacr.org/2023/1063), highlighting that the architecture maintains its security integrity even when various components, including the ciphersuite, are substituted—provided the underlying MPC protocol remains secure. This research effectively argues that malicious two-party computation (2PC) does not constitute a critical obstacle, paving the way for new architectures in TLS protocols. Complementary studies in the DIDO (https://eprint.iacr.org/2023/1056) and Janus (https://eprint.iacr.org/2023/1377) papers also contribute to the evolving landscape of TLS 1.3 attestations, focusing on specific implementation nuances and security enhancements within their respective frameworks. These collective efforts delineate a progressive trajectory toward refining and securing TLS session authentication through innovative cryptographic techniques.
An alternative approach was introduced by Liu et al. (https://eprint.iacr.org/2024/733) and by Reclaim. They place the Verifier as a proxy between client and server, removing the need for 3-party TLS. However, security then holds only under the assumption that no one, including a potentially malicious prover, can intercept the communication between the Verifier and the Server.
3P TLS and Websession Proof
The TLS protocol secures most web traffic today. The protocol is executed between a client and a server, and is composed of two phases:
Handshake. The client and server jointly decide on a set of encryption and authentication schemes that will be used for the session. Then, using digital certificates, the client and server authenticate each other. Finally, they combine their public keys and freshly generated ephemeral secrets to derive a joint shared secret.
Record phase. Using the shared secret derived above, the client and server can derive symmetric keys for the encryption and authentication schemes they agreed upon during the handshake. These keys are then used to ensure the confidentiality and integrity of the session’s messages.
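The key derivation in the record phase can be sketched with a simplified HKDF. This is not the exact TLS 1.3 key schedule; the labels, salt, and key lengths below are illustrative stand-ins:

```python
import hashlib
import hmac
import os

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    # condense the input keying material into a pseudorandom key
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    # expand the pseudorandom key into as many output bytes as needed
    out, block, counter = b"", b"", 1
    while len(out) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        out += block
        counter += 1
    return out[:length]

shared_secret = os.urandom(32)  # stands in for the handshake's ECDHE output
prk = hkdf_extract(b"\x00" * 32, shared_secret)
client_key = hkdf_expand(prk, b"client traffic key", 16)
server_key = hkdf_expand(prk, b"server traffic key", 16)
```

Because the two directions use different labels, the client and server record keys differ even though both derive from the same shared secret.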
A single prover, acting as a client, cannot attest to the contents of a TLS session. The session’s contents are authenticated under a symmetric key, so two parties can produce valid authentication tags: the server and the untrusted prover. The prover could therefore forge records and falsely attribute them to the server.
To solve this issue, prior work strips knowledge of the symmetric key away from the prover. The TLS client is instead run jointly by the prover and a designated verifier (the notary). Using MPC, the prover and notary jointly compute all the functions prescribed by the handshake and record phases of the TLS protocol.
The transcript of a 3-party TLS session acts as a binding commitment to the session’s data. Later, given the symmetric key, the prover can produce a ZKP to convince a third-party verifier that a given plaintext is indeed encrypted and authenticated as part of the committed session.
Producing a Session Commitment
Producing a session commitment requires the prover and notary to engage in a number of MPC protocols (and more specifically, 2PC). We summarize these below.
Types of MPC:
Handshake: elliptic curve point secret sharing (additive secret sharing), point-shares-to-coordinate-shares (MtA protocol), deriving session secrets (depends on ciphersuite)
Record phase: authenticated encryption. If the TLS session includes multiple queries and responses, intermediate responses must also be decrypted.
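The secret-sharing idea can be illustrated with the simplest case: XOR-based additive sharing of a symmetric key, so that neither party alone learns it. The real protocol shares elliptic curve points and ciphersuite-specific secrets; this byte-level sketch is a deliberate simplification:

```python
import os

def split_additive(secret: bytes):
    # XOR sharing: share_prover XOR share_notary == secret
    share_prover = os.urandom(len(secret))
    share_notary = bytes(a ^ b for a, b in zip(secret, share_prover))
    return share_prover, share_notary

def combine(share_a: bytes, share_b: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(share_a, share_b))

session_key = os.urandom(16)
p_share, n_share = split_additive(session_key)
# each share alone is uniformly random and reveals nothing about the key
assert combine(p_share, n_share) == session_key
```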
As mentioned above, the resulting collection of ciphertexts (the transcript) acts as a binding commitment to the session’s data. However, this commitment’s length is proportional to the length of the session’s queries and responses. We can shorten the session commitment by computing a Merkle root or a Pedersen commitment over the transcript.
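A Merkle root over transcript chunks gives a constant-size commitment; a minimal sketch follows (the chunking and hash choices here are illustrative, not the protocol's actual parameters):

```python
import hashlib

def merkle_root(leaves):
    # hash each chunk, then repeatedly pair-and-hash up to a single root
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

transcript_chunks = [b"GET /balance HTTP/1.1", b"HTTP/1.1 200 OK", b'{"balance": 42}']
root = merkle_root(transcript_chunks)
assert len(root) == 32  # constant-size commitment, regardless of session length
```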
Proving phase
Once the commitment is produced, the notary can reveal their share of the TLS session keys. Having gained access to the full TLS session keys, the prover can now issue succinct (and optionally zero-knowledge) proofs attesting to certain properties of the session data.
In general, the proof will attest that:
the prover knows a (private) pre-image for the session commitment. In the case of an honest prover, this pre-image will be the (encrypted) TLS transcript.
the prover knows a (private) TLS session key and a (private) decrypted string. The ZKP will show that the encrypted transcript can be decrypted using this session key to obtain the decrypted string.
the decrypted string contains a series of queries and responses that certify a claim made by the prover.
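The relation such a proof establishes can be illustrated in the clear. Here an XOR combination of shares and a hash-based stream cipher stand in for the real key-share recombination and the session’s AEAD cipher; both are illustrative simplifications, not the actual TLS primitives:

```python
import hashlib
import os

def toy_keystream(key: bytes, length: int) -> bytes:
    # hash-based keystream standing in for the real record cipher (e.g. AES-GCM)
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    ks = toy_keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

# after the commitment is fixed, the notary reveals its key share
prover_share, notary_share = os.urandom(16), os.urandom(16)
session_key = bytes(a ^ b for a, b in zip(prover_share, notary_share))

plaintext = b"HTTP/1.1 200 OK\r\n\r\nbalance=42"
transcript_ct = toy_encrypt(session_key, plaintext)

# the statement the ZKP attests to, checked here without zero knowledge:
# decrypting the committed ciphertext with the session key yields the claimed string
assert toy_encrypt(session_key, transcript_ct) == plaintext
```

In the actual protocol this check is performed inside a succinct proof, so the key and the plaintext stay private.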
Data Privacy and Optimizations
Selective Disclosure
The 3P TLS protocol enables the prover to selectively hide parts of the web session from the notary, ensuring that sensitive personal and authentication information remains private. This is achieved through interactive ZK (IZK) proofs between the notary and the prover over specific parts of the transcript, based on the commitment. This also helps with parsing: the transcript can be turned into a more usable, structured set of data that can be consumed directly.
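The hiding of selected transcript parts can be illustrated with salted hash commitments per chunk, opening only the non-sensitive ones. The real protocol uses IZK over the committed transcript; this per-chunk scheme is a simplification:

```python
import hashlib
import os

def commit_chunks(chunks):
    # one salted commitment per transcript chunk
    salts = [os.urandom(16) for _ in chunks]
    commits = [hashlib.sha256(salt + chunk).digest()
               for salt, chunk in zip(salts, chunks)]
    return salts, commits

def verify_opening(commitment: bytes, salt: bytes, chunk: bytes) -> bool:
    return hashlib.sha256(salt + chunk).digest() == commitment

chunks = [b"Authorization: Bearer <token>", b'{"kyc_passed": true}']
salts, commits = commit_chunks(chunks)
# reveal only chunk 1; chunk 0 (the credential) stays hidden behind its commitment
assert verify_opening(commits[1], salts[1], chunks[1])
```

The salt prevents a verifier from brute-forcing low-entropy hidden chunks against their commitments.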
Optimistic Execution
To reduce computation, gas costs, and network bandwidth, OpenLayer uses restaked assets for security and employs optimistic execution. Instead of only committing to the encrypted transcript, the notary also commits to its secret share for each session. The prover decrypts and processes the transcript, then reports the extracted data along with commitments to the ciphertext transcript and the secret shares of both the notary and themselves. In case of a dispute, a retroactive proof can be generated against these commitments and checked on-chain.
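As a sketch, the optimistic report and the dispute check might look like the following. Plain SHA-256 commitments and the placeholder byte strings are assumptions for illustration; the on-chain encoding is not specified here:

```python
import hashlib

def commit(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# posted optimistically, without an accompanying proof
report = {
    "extracted_data": "balance=42",
    "ct_commitment": commit(b"<encrypted transcript>"),
    "notary_share_commitment": commit(b"<notary key share>"),
    "prover_share_commitment": commit(b"<prover key share>"),
}

def dispute_check(report, ciphertext, notary_share, prover_share) -> bool:
    # on a dispute, the revealed values are checked against the posted commitments
    return (commit(ciphertext) == report["ct_commitment"]
            and commit(notary_share) == report["notary_share_commitment"]
            and commit(prover_share) == report["prover_share_commitment"])

assert dispute_check(report, b"<encrypted transcript>",
                     b"<notary key share>", b"<prover key share>")
```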
Public Data Optimization
For public data without sensitive information, the decrypted transcript is also committed along with the reported data. This allows anyone to generate a retroactive proof; if the committed ciphertext, secret shares, and decrypted transcript are shown to be inconsistent, the reporter can be penalized via slashing.
Handling Trust Assumptions on Notary Servers
The 3P TLS approach introduces an additional layer of trust in the notary servers. If a notary server colludes with the prover, the pair can attest to arbitrary data that the notary never actually witnessed.
OpenLayer addresses this problem in five aspects.
Decentralized Notary Server Base
To mitigate the risk of provers operating their own notary servers, OpenLayer employs a decentralized notary server base. This decentralization strategy ensures that a random notary server is selected for each task, significantly reducing the probability of collusion.
Each time a prover commits to perform a task, a random notary server is selected from a decentralized pool. The selection process is designed to be unpredictable and secure, minimizing the chances of collusion.
The notary server base is geographically and administratively distributed. OpenLayer also integrates multiple notary providers, so an application or ecosystem can choose the providers it trusts most for its tasks.
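Random selection from the pool can be sketched as a deterministic, publicly checkable draw from shared randomness. The randomness beacon and the pool names below are hypothetical:

```python
import hashlib

def select_notary(task_id: bytes, beacon: bytes, pool: list) -> str:
    # anyone can recompute the draw, so the selection is auditable;
    # unpredictability comes from the beacon being unknown in advance
    digest = hashlib.sha256(task_id + beacon).digest()
    return pool[int.from_bytes(digest, "big") % len(pool)]

pool = ["notary-eu-1", "notary-us-2", "notary-ap-3"]
chosen = select_notary(b"task-0001", b"<beacon output>", pool)
assert chosen in pool
```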
Trusted Execution Environments (TEE)
Trusted Execution Environments (TEEs) provide a secure enclave for executing sensitive operations. Running the notary servers inside TEEs ensures that the notary servers execute the intended binaries and keep internal memory containing secrets private.
A secure enclave is created within the notary server’s hardware, isolating sensitive computations from the rest of the system. This isolation protects against external attacks and unauthorized access.
TEEs provide attestation capabilities, allowing remote verification of the software running within the enclave. This ensures that the notary servers are executing the correct binaries and have not been tampered with.
The internal memory of the TEE containing secrets is protected from inspection, ensuring that even if the rest of the system is compromised, the sensitive data remains secure.
Anti-Bribing Game Theory
For third-party collusion, where a prover bribes a notary server, OpenLayer employs anti-bribing game theory to deter notary servers from colluding with provers. This approach incentivizes honesty by allowing provers to phish notary servers with bribe offers that can later be exposed to penalize collusion. The steps are as follows:
Commit to phishing: Provers can commit to a phishing action, simulating an attempt to bribe a notary server. This commitment is recorded on-chain, ensuring transparency and accountability.
Offer to bribe: The prover offers a bribe to the notary server. If the notary server accepts, the prover can subsequently reveal the bribe transaction.
Exposure and penalty: Once the bribe is exposed, the colluding notary server is penalized. Notary servers therefore always face the threat that any bribe could be a phishing action, and so have no incentive to accept bribes.
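The phishing commitment can be realized with a simple hash-based commit-reveal scheme; this is a minimal sketch, and the actual on-chain encoding is an assumption:

```python
import hashlib
import os

def commit_phishing(notary_id: bytes, bribe_details: bytes):
    # the commitment goes on-chain; the nonce is kept private for the reveal
    nonce = os.urandom(16)
    commitment = hashlib.sha256(notary_id + bribe_details + nonce).digest()
    return commitment, nonce

def reveal_matches(commitment, notary_id, bribe_details, nonce) -> bool:
    return hashlib.sha256(notary_id + bribe_details + nonce).digest() == commitment

offer = b"offer: 10 ETH to sign fake data"
c, n = commit_phishing(b"notary-7", offer)
# later, if the notary accepts, the prover opens the commitment to expose them
assert reveal_matches(c, b"notary-7", offer, n)
```

Committing before the offer proves the bribe was a pre-registered phishing action rather than a genuine collusion attempt.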
Multiple Notary Servers
To further reduce the risk of collusion, multiple notary servers can participate in a single web session. This multi-party approach ensures that compromising the session requires collusion among several notary servers, which is orders of magnitude harder. The trade-off is that the computational cost also grows steeply as the number of notary servers grows, so this method is best suited to use cases that must minimize collusion risk and can sacrifice some efficiency.
Diversification of Operators and User Operated Watchers
Multiple operators execute each task simultaneously. The final result is determined by the median of, or unanimous agreement among, the operators, ensuring robustness against individual errors or small-scale collusion.
At the same time, a broad network of user devices acts as watchers or validators. These watchers independently verify the data, ensuring that any significant discrepancies are quickly identified. Both the operator quorum and the watcher quorum must be reached for reported results to be accepted.
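The aggregation rule can be sketched as follows, using the median over operator reports and a watcher approval quorum (the 2/3 threshold is illustrative, not a protocol parameter):

```python
from statistics import median

def finalize(operator_reports, watcher_votes, quorum=2 / 3):
    """Accept the operator median only if enough watchers approve it."""
    value = median(operator_reports)
    approvals = sum(1 for vote in watcher_votes if vote)
    if approvals / len(watcher_votes) >= quorum:
        return value
    return None  # disputed: fall back to a retroactive on-chain proof

# one outlier operator cannot move the median, and watchers gate acceptance
assert finalize([41, 42, 42, 42, 43], [True, True, True, False]) == 42
assert finalize([41, 42, 43], [True, False, False]) is None
```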
Data Authenticity and Fault Attribution
With the above approaches, an adversary who wants to compromise the OpenLayer system and deliver false data must defeat every preventative measure at once: find a colluding notary (their own, or one willing to bear the phishing risk), break the TEE, succeed in more than half of the operators appointed for the task, and compromise more than half of the appointed watchers. These stacked requirements add security and dishonesty tolerance to the system and keep data authenticity high.
On the other hand, an honest operator or watcher never needs to worry about being slashed. When working with a notary server, they perform a final check that the commitments match what they saw and executed, ensuring that no fraud proof can ever succeed against their reported results. This level of safety allows OpenLayer operators to deposit any asset as security, including ETH, without risking Ethereum’s security through accidental slashes or majority tyranny. In other words, fault attribution is never incorrect.
To summarize, OpenLayer’s approach divides data authenticity and trustworthiness into two parts: the system’s tolerance for dishonest participants and the protection of honest participants against majority tyranny.
OpenLayer minimizes the first risk through various tunable methods; it cannot be eliminated, so trade-offs between risk and cost must be made. The second risk is eliminated entirely, so honest participants are always protected and the OpenLayer AVS never harms or risks the base layer’s security.