As artificial intelligence (AI) appears destined to become central to everyday digital applications and services, anchoring AI models on public blockchains could help “establish a permanent trail of provenance,” according to Michael Heinrich, CEO of 0G Labs. Such a provenance trail, Heinrich said, enables “ex-post or real-time monitoring analysis” to detect manipulation, bias, or use of problematic data during the training of AI models.
Anchoring AI on the Blockchain to Promote Public Trust
In his detailed responses to questions from Bitcoin.com News, Heinrich – a poet and software engineer – argued that anchoring AI models in this way helps maintain their integrity and increase public trust. Furthermore, he suggested that the decentralized nature of public blockchains allows them to “serve as a tamper-proof and censorship-resistant registry for AI systems.”
As for data availability or lack thereof, the 0G Labs CEO said this is a concern for developers and users alike. For developers building on Layer 2 solutions, data availability matters because their respective “applications must rely on secure light client authentication for correctness.” For users, data availability assures them that a “system is working as intended, without having to run full nodes themselves.”
Despite its importance, data availability remains costly, accounting for up to 90% of transaction costs. Heinrich attributes this to Ethereum’s limited data throughput of roughly 83 KB/sec, which makes even small amounts of data prohibitively expensive to publish on-chain.
Below you will find Heinrich’s detailed answers to all questions sent.
Bitcoin.com News (BCN): What is the data availability (DA) problem plaguing the Ethereum ecosystem? Why is it important for developers and users?
Michael Heinrich (MH): The data availability (DA) problem refers to the need for light clients and other off-chain parties to efficiently access and verify the blockchain’s full transaction data and state. This is crucial for scalability solutions such as Layer 2 rollups and sharded chains that execute transactions outside the main Ethereum chain. The blocks of executed transactions in Layer 2 networks must be published and stored somewhere so that light clients can perform verification.
This is important for developers building on these scaling solutions because their applications must rely on secure light client authentication for correctness. It is also important for users who interact with these Layer 2 applications, as they need assurance that the system is working as intended, without having to run full nodes themselves.
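To make the role of light clients concrete, here is a minimal, simplified sketch (illustrative only, not Ethereum’s or 0G’s actual protocol; the chunk size and helper names are assumptions) of how a client could spot-check that published rollup data is really available by verifying a few randomly sampled chunks against a small on-chain Merkle commitment. Real DA schemes add erasure coding so that a handful of samples yields a strong availability guarantee.

```python
# Illustrative data-availability sampling sketch: the rollup publishes only a
# 32-byte Merkle root "on-chain", and a light client spot-checks availability
# by requesting a few random chunks together with Merkle proofs.
import hashlib
import random

CHUNK_SIZE = 256  # bytes per chunk (arbitrary for this example)

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def chunk(blob: bytes) -> list[bytes]:
    return [blob[i:i + CHUNK_SIZE] for i in range(0, len(blob), CHUNK_SIZE)]

def merkle_root(leaves: list[bytes]) -> bytes:
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                     # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves: list[bytes], index: int) -> list[bytes]:
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        proof.append(level[index ^ 1])         # sibling node at this level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(root: bytes, leaf: bytes, index: int, proof: list[bytes]) -> bool:
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

# "Full node" side: keep the blob, publish only the 32-byte root on-chain.
blob = bytes(random.getrandbits(8) for _ in range(10_000))
chunks = chunk(blob)
onchain_root = merkle_root(chunks)

# Light-client side: sample a handful of random chunks and check the proofs.
for i in random.sample(range(len(chunks)), k=5):
    assert verify(onchain_root, chunks[i], i, merkle_proof(chunks, i))
print("sampled chunks verified against the on-chain commitment")
```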
BCN: According to a report from Blockworks Research, DA fees represent up to 90% of transaction costs. Why do existing scalability solutions struggle to provide the performance and cost-effectiveness needed for high-performance decentralized applications (dapps)?
MH: Existing Layer 2 scaling approaches such as Optimistic and ZK Rollups struggle to provide efficient data availability at scale because they must publish entire data blobs (transaction data, state roots, etc.) to the Ethereum mainnet so that lightweight clients can sample and verify them. Publishing this data on Ethereum comes at a very high cost – for example, a single OP block of just 218 KB costs around $140 to publish.
This is because Ethereum’s limited data throughput of around 83 KB/sec means that even small amounts of data are very expensive to publish on-chain. So while rollups achieve scalability by executing transactions off the main chain, the need to publish data on Ethereum for verifiability becomes the bottleneck that limits their overall scalability and cost-effectiveness for high-throughput decentralized applications.
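Taking the figures Heinrich cites at face value, a rough back-of-the-envelope calculation shows why DA dominates rollup costs (only the 218 KB, $140, and 83 KB/sec numbers come from the interview; everything else is derived from them):

```python
# Back-of-the-envelope math using the figures quoted in the interview.
block_size_kb = 218          # size of one example OP rollup block (per Heinrich)
block_cost_usd = 140         # cost to publish that block to Ethereum (per Heinrich)
eth_throughput_kb_s = 83     # Ethereum's approximate DA throughput

cost_per_kb = block_cost_usd / block_size_kb
print(f"~${cost_per_kb:.2f} per KB of published data")             # ~$0.64 per KB

# At ~83 KB/s, the whole chain's DA capacity over one day is roughly:
daily_kb = eth_throughput_kb_s * 86_400
print(f"~{daily_kb / 1_000_000:.1f} GB/day of total DA capacity")  # ~7.2 GB/day
```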
BCN: Your company, 0G Labs, also known as Zerogravity, recently launched its testnet with the aim of bringing artificial intelligence (AI) on-chain, a data load that existing networks cannot handle. Can you tell our readers how the modular nature of 0G helps overcome the limitations of traditional consensus algorithms? What makes modularity the right path for building complex use cases like on-chain gaming, on-chain AI, and high-frequency decentralized finance?
MH: 0G’s main innovation is the modular separation of data into data storage and data publishing lanes. The 0G DA layer sits on top of the 0G storage network, which is optimized for extremely fast data ingestion and retrieval. Large data objects such as block blobs are stored there, and only small commitments and availability proofs flow through to the consensus protocol. This eliminates the need to transmit the entire blobs over the consensus network and therefore avoids the broadcast bottlenecks of other DA approaches.
Furthermore, 0G consensus layers can scale horizontally to prevent any single consensus network from becoming a bottleneck, achieving infinite DA scalability. With an off-the-shelf consensus system, the network could reach speeds of 300-500 MB/s, which is already a few orders of magnitude faster than current DA systems, but still does not meet the data bandwidth requirements of high-end applications such as LLM model training, which sit in the tens of GB/s.
A customized consensus design could achieve such speeds, but what if many participants want to train models simultaneously? That’s why we’ve introduced infinite scalability through data-level sharding, leveraging any number of consensus layers to meet the future demands of high-performance blockchain applications. All consensus networks share the same set of validators with the same staking status, so they maintain the same level of security.
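As a general illustration of the data-level sharding idea (a simplified sketch, not 0G’s actual routing logic; the lane count and function names are invented for the example), blobs can be assigned deterministically to one of several parallel consensus lanes so each lane carries only a fraction of the total traffic:

```python
# Simplified sketch of data-level sharding: each blob is routed to one of N
# parallel consensus lanes by hashing its commitment, so aggregate DA bandwidth
# scales roughly linearly with the number of lanes. Illustration only.
import hashlib

NUM_LANES = 8  # hypothetical number of parallel consensus networks

def commitment(blob: bytes) -> bytes:
    return hashlib.sha256(blob).digest()

def assign_lane(blob: bytes, num_lanes: int = NUM_LANES) -> int:
    # Deterministic assignment: every node derives the same lane for a blob.
    return int.from_bytes(commitment(blob), "big") % num_lanes

blobs = [f"rollup-batch-{i}".encode() for i in range(20)]
lanes: dict[int, int] = {}
for b in blobs:
    lane = assign_lane(b)
    lanes[lane] = lanes.get(lane, 0) + 1
print("blobs per lane:", lanes)  # traffic spreads across the lanes
```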
In summary, this modular architecture enables scaling to handle extremely data-heavy workloads such as on-chain AI model training/inference, on-chain gaming with high state requirements, and high-frequency DeFi applications with minimal overhead. These applications are not possible on monolithic chains today.
BCN: The Ethereum developer community has explored many different ways to address the problem of data availability on the blockchain. Proto-danksharding, or EIP-4844, is seen as a step in that direction. Do you believe these measures will fall short of developer needs? If so, why and where?
MH: Proto-danksharding (EIP-4844) takes an important step toward improving Ethereum’s data availability capabilities by introducing blob storage. The ultimate step will be Danksharding, which divides the Ethereum network into smaller segments, each responsible for a specific group of transactions. This results in a DA speed of more than 1 MB/s. However, this will still not meet the needs of future high-performance applications, as discussed above.
BCN: What is 0G’s “programmable” data availability and what sets it apart from other DAs in terms of scalability, security and transaction costs?
MH: 0G’s DA system can enable the highest scalability of any blockchain, for example at least 50,000x higher data throughput and 100x lower fees than Danksharding on the Ethereum roadmap, without sacrificing security. Because we build the 0G DA system on top of 0G’s decentralized storage system, customers can decide how they want to use their data. So programmability in our context means that customers can program/customize the persistence, location, type and security of data. In effect, 0G will allow customers to dump their entire state into a smart contract and reload it, solving the state bloat problem that plagues many blockchains today.
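As a purely hypothetical illustration of what “programmable” data availability could look like from a client’s perspective (the class and field names below are invented for this sketch and are not 0G’s actual API), a data policy might bundle the persistence, replication, location, and security choices Heinrich describes:

```python
# Hypothetical sketch of a client-side "data policy" for programmable DA.
# Names and fields are invented for this example; they are not 0G's API.
from dataclasses import dataclass

@dataclass
class DataPolicy:
    persistence_days: int      # how long the data must stay retrievable
    replication: int           # number of storage-node replicas
    region_hint: str           # preferred storage locality
    encrypted: bool            # whether the payload is stored encrypted

# A high-frequency DeFi app might choose short-lived, cheap availability...
hot_policy = DataPolicy(persistence_days=7, replication=3,
                        region_hint="any", encrypted=False)

# ...while an AI training pipeline keeps its datasets around much longer.
archive_policy = DataPolicy(persistence_days=3650, replication=8,
                            region_hint="multi-region", encrypted=True)

print(hot_policy, archive_policy, sep="\n")
```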
BCN: As AI becomes an integral part of Web3 applications and our digital lives, it is critical to ensure that the AI models are fair and reliable. Biased AI models trained on manipulated or fake data can wreak havoc. What are your thoughts on the future of AI and the role that the immutable nature of blockchain could play in maintaining the integrity of AI models?
MH: As AI systems become increasingly important for digital applications and services that impact many lives, ensuring their integrity, fairness and auditability is of paramount importance. Biased, manipulated, or compromised AI models can lead to widespread harmful consequences when deployed on a large scale. Imagine a horror scenario where a malicious AI agent trains another model/agent that is deployed directly into a humanoid robot.
The core properties of blockchain, namely immutability, transparency and provable state transitions, can play a crucial role here. By anchoring AI models, their training data, and the full audit trail of the model creation/update process on public blockchains, we can establish a permanent trail of provenance. This enables ex-post or real-time monitoring analysis to detect manipulation, bias, use of problematic data, and other issues that may have compromised the integrity of the models.
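As a rough illustration of the provenance idea (a minimal sketch; the file names and record format are invented, and this is not 0G’s actual mechanism), one could fingerprint a model’s weights and its training-data manifest and then anchor the resulting commitment on a public chain for later auditing:

```python
# Minimal sketch of AI provenance anchoring: fingerprint the model weights and
# the training-data manifest, then record the combined commitment (in practice
# this hash would be written to a public chain). Illustrative only.
import hashlib
import json
import time

def file_digest(path: str) -> str:
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):  # read in 1 MB blocks
            sha.update(block)
    return sha.hexdigest()

def provenance_record(weights_path: str, manifest_path: str) -> dict:
    record = {
        "model_sha256": file_digest(weights_path),
        "training_data_sha256": file_digest(manifest_path),
        "timestamp": int(time.time()),
    }
    # The commitment is what gets anchored on-chain; anyone holding the
    # original files can later recompute it and compare.
    record["commitment"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return record

# Example usage (hypothetical file paths):
# print(provenance_record("model.bin", "training_manifest.json"))
```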
Decentralized blockchain networks, by avoiding single points of failure or control, can serve as a tamper-proof and censorship-resistant registry for AI systems. Their transparency enables public accountability of the AI supply chain in a way that is very difficult with today’s centralized and opaque AI development pipelines. Imagine an AI model that surpasses human intelligence: it claims to have produced some result, but in reality all it did was change database entries on a central server without ever doing the work. In other words, it is much easier to cheat in centralized systems.
And how can we give the model/agent the right incentive mechanisms and place it in an environment where it cannot act maliciously? Blockchain x AI is the answer, so that future societal use cases such as traffic control, manufacturing, and administrative systems can truly be governed by AI for human well-being and prosperity.
What are your thoughts on this interview? Share your views in the comments section below.