Web3 Data Infrastructure Whitepaper

This white paper discusses the enhancement of Web3 infrastructure using TiDB, a distributed SQL database, to support large-scale, high-performance workloads. It outlines the technical requirements, architectural challenges, and proposed changes to improve data service scenarios in the Web3 ecosystem. The document emphasizes the importance of on-chain data services for dApp development and the need for efficient data management and processing in a rapidly evolving decentralized environment.

WHITE PAPER

Enhancing Web3
Infrastructure
with TiDB
How to Support Large-Scale, High-Performance Workloads
Contents

Introduction
Workload Details
Workload Types
Workload Characteristics
Technical Requirements
Architecture
Architecture Analysis
Data Flow Description
Architecture Challenges
Proposed Changes
Unified Data Reading and Writing Entrance
Architecture Comparison
Integration Points
Suitability Assessment
Competitive Analysis
Conclusion

White Paper | Enhancing Web3 Infrastructure with TiDB 2


Introduction

Web3 is revolutionizing the internet landscape by leveraging decentralized technologies and smart
contracts. As this transformation unfolds, on-chain data services are becoming increasingly crucial,
serving as the backbone for numerous Web3 applications.

These services are tasked with extracting valuable insights from blockchain data, avoiding redundancy,
competing for market dominance, and supporting the development of decentralized applications (dApps).
Additionally, professional data services enhance the usability, consistency, and accessibility of on-chain
data, making it easier for developers, businesses, and end-users to harness its full potential.

In this process, on-chain data services play a crucial role in:

• Unlocking Data Value: Blockchain data holds immense commercial value, and data service providers
are dedicated to mining this “gold mine” through technological means to offer valuable services.
• Avoiding Redundancy: Web3 advocates for knowledge sharing. The data services provided by
pioneers can be used and paid for by others (including Web3 developers, Web3 industry-related
companies, and wallet users), preventing repetitive work.
• Competing for Market Advantage: There is competition among data service providers to gain market
advantage. The one who can solve industrial pain points the fastest has a better chance of profit.
• Supporting dApp Development: On-chain data services provide data support for dApps, aiding their
development and innovation, and driving the growth of the Web3 ecosystem.
• Enhancing Data Usability: Professional data service providers can enhance the usability, consistency,
and user-friendliness of on-chain data, making it easier for more users to access and utilize data.

This white paper explores the role TiDB, an open-source, distributed SQL database, plays in enhancing
data service scenarios within the Web3 industry. By examining the specific workload characteristics,
technical requirements, and architectural challenges, this document provides a comprehensive
assessment of TiDB’s suitability for supporting large-scale, high-performance Web3 data services.

Workload Details

In Web3 data service scenarios, there are mainly Web3 wallet services that focus on account balances and
transaction history queries, as well as Web3 infrastructure service providers that concentrate on
on-chain data processing and analysis. Their data systems are primarily oriented towards write
throughput, with read requests mainly targeting recent transaction data queries and real-time aggregation
of address balances. The database includes block tables, transaction tables, and account tables, which
respectively record basic block information, detailed transaction information, and account addresses
and balances.

Blockchain data possesses characteristics such as immutability, chronological order, uniqueness, and
chain relationships. To meet these characteristics and requirements, relational databases need to adopt
appropriate design and optimization strategies. These include inserting data without updates, using unique
and foreign key constraints, and optimizing storage and query performance in order to effectively handle
the insertion and retrieval of large amounts of data.

Workload Types

Web3 data service scenarios fall into two types, distinguished by service target and business
logic:

• Web3 Wallet Service: An account balance and transaction history query service scenario.
• Web3 Infrastructure Provider: An on-chain data scenario.

Workload Characteristics

• Data Volume: Depending on the range of services provided by different vendors, the data volume can
vary significantly. It generally won’t be less than 10 TB, with the largest data volume among current
customers exceeding 100 TB.

• Ethereum (ETH): As of May 2023, a full Ethereum node needs to store about 1.1 TB of data.
• Bitcoin (BTC): As of May 2023, a full Bitcoin node needs to store about 500 GB of data.
• Binance Smart Chain (BSC): As of May 2023, a full Binance Smart Chain node needs to store
about 1.5 TB of data.

• Throughput: The more blockchains integrated into the system, the greater the write throughput. The
throughput per second (DML QPS) for mainstream blockchains can vary significantly.

• Ethereum (ETH): The average TPS for the Ethereum mainnet is about 15-30 TPS.
• Bitcoin (BTC): Bitcoin’s TPS (Only Write) is relatively low, averaging around 5-7 TPS.
• Binance Smart Chain (BSC): Binance Smart Chain has a relatively high TPS, averaging 50-100 TPS.
• Tron: The average TPS for the Tron mainnet is about 1000-2000.

• Data Characteristics: The overall data system mainly focuses on write throughput, while read
requests generally involve querying recent transaction data and real-time aggregation tables of
address balances. It includes the following three types of tables:

• Transaction Table: Stores every transaction that happens on the blockchain, including unique
identifiers (transaction hashes), sender addresses, receiver addresses, transaction amounts,
and timestamps, as illustrated below.

SQL
CREATE TABLE transactions (
  id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  chain varchar(64) COLLATE utf8mb4_general_ci NOT NULL COMMENT 'Main chain',
  txid varchar(255) COLLATE utf8mb4_general_ci NOT NULL COMMENT 'Transaction hash',
  direction varchar(64) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '' COMMENT 'Transaction type',
  address varchar(128) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '' COMMENT 'Transaction from or to address',
  contract varchar(255) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '' COMMENT 'Contract address of the transaction',
  `to` varchar(255) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '' COMMENT 'Recipient address of the transaction',
  token_id text COLLATE utf8mb4_general_ci DEFAULT NULL,
  token_list text COLLATE utf8mb4_general_ci DEFAULT NULL,
  status tinyint(4) NOT NULL DEFAULT '0' COMMENT 'Transaction status',
  height bigint(20) unsigned NOT NULL DEFAULT '0' COMMENT 'Block height',
  amount varchar(255) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '0' COMMENT 'Transaction amount',
  fee varchar(255) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '0' COMMENT 'Transaction fee',
  symbol varchar(512) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '0' COMMENT 'Currency name',
  memo varchar(255) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '',
  nonce bigint(20) unsigned DEFAULT '0' COMMENT 'Transaction nonce',
  update_time datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  create_time datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (txid, chain, address, contract),
  KEY index_1 (chain, address, contract, create_time),
  KEY index_2 (address, create_time),
  KEY index_3 (create_time)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;

• Block Table: Stores information about each block, including the unique block ID (block hash),
the previous block’s hash, block number, miner’s address, timestamp, and number of transactions,
as illustrated below.

SQL
CREATE TABLE blocks (
block_hash VARCHAR(64) PRIMARY KEY,
chain_id VARCHAR(50),
previous_block_hash VARCHAR(64),
block_number BIGINT,
miner_address VARCHAR(64),
timestamp TIMESTAMP,
transaction_count INT,
INDEX (chain_id),
INDEX (previous_block_hash),
INDEX (block_number),
INDEX (miner_address)
);

• Account Table: Records information about accounts in the blockchain network, such as
account address and balance, as illustrated below.

SQL
CREATE TABLE `balances` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `contract` varchar(128) NOT NULL COMMENT 'contract address',
  `address` varchar(128) NOT NULL COMMENT 'address',
  `balance` varchar(128) NOT NULL DEFAULT '0',
  `update_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`address`,`contract`,`id`) /*T![clustered_index] CLUSTERED */,
  KEY `idx_update_time` (`update_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
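
The two read patterns named under Data Characteristics (recent transactions for an address and current address balances) could be expressed against the tables above roughly as follows. This is a minimal sketch, not a query taken from the white paper: the address literal is a hypothetical placeholder, and whether the optimizer actually chooses index_2 depends on table statistics and the database version in use.

```sql
-- Recent activity for one address. The filter and sort order line up with
-- index_2 (address, create_time) on the transactions table.
SELECT txid, chain, direction, amount, fee, symbol, height, create_time
FROM transactions
WHERE address = '0x0000000000000000000000000000000000000001'  -- hypothetical address
ORDER BY create_time DESC
LIMIT 20;

-- Current balance rows for the same address, ordered by contract.
-- The filter is a prefix of the primary key (address, contract, id).
SELECT contract, balance, update_time
FROM balances
WHERE address = '0x0000000000000000000000000000000000000001'  -- hypothetical address
ORDER BY contract;
```

Because amount and balance are stored as strings in these schemas, any arithmetic on them needs an explicit conversion in the application or in SQL; the sketch deliberately avoids summing them.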

The block table and transaction table have the following characteristics:

• Immutability: Once blockchain data is created, it can’t be tampered with. This feature in relational
databases can be shown by only inserting data and not updating it.
• Timestamp Order: Both blocks and transactions have timestamps, and data is arranged in
chronological order.
• Uniqueness: Fields like transaction hash and block hash ensure uniqueness, which can be maintained
in relational databases using unique constraints.

• Large Data Volume: Blockchains typically generate a large amount of data, requiring relational
databases to have good storage and query performance to handle the insertion and retrieval of
large data.
• Chain Relationship: Blockchain data has a chain relationship, where the current block points to the
previous block. This relationship can be represented in relational databases using foreign key
constraints.
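
The immutability, uniqueness, and chain-relationship properties above can be illustrated with the blocks table defined earlier. The statements below are a sketch under stated assumptions: every literal value (hashes, the chain identifier 'chain-a', the miner address) is hypothetical, and the linkage check uses a self-join rather than a declared foreign key, since the blocks table as shown does not define one.

```sql
-- Insert-only, idempotent write: re-delivering the same block is skipped
-- because block_hash is the primary key. Note that INSERT IGNORE also
-- downgrades some other errors to warnings, so warnings should be checked.
INSERT IGNORE INTO blocks
  (block_hash, chain_id, previous_block_hash, block_number,
   miner_address, `timestamp`, transaction_count)
VALUES
  ('hash_of_block_101', 'chain-a', 'hash_of_block_100', 101,
   'miner_address_1', '2023-05-01 00:00:12', 42);

-- Chain-relationship check: list blocks whose previous block is not stored.
-- The genesis block is excluded; the first block ever ingested will also
-- appear if ingestion did not start from genesis.
SELECT cur.block_number, cur.block_hash, cur.previous_block_hash
FROM blocks AS cur
LEFT JOIN blocks AS prev
       ON prev.block_hash = cur.previous_block_hash
      AND prev.chain_id   = cur.chain_id
WHERE cur.chain_id = 'chain-a'
  AND cur.block_number > 0
  AND prev.block_hash IS NULL
ORDER BY cur.block_number;
```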

Technical Requirements

To effectively support Web3 data services, the underlying system must meet several stringent technical
requirements:

System Reliability

• High Availability: Web3 data services need to ensure data is always accessible and not rendered
unavailable due to any single point of failure. This is crucial because dApps typically require
uninterrupted 24/7 service.
• Data Consistency: Ensuring data consistency is vital as on-chain data needs to be accurately
recorded and retrieved. Any inconsistency could lead to serious consequences.
• Automatic Fault Recovery: The system must have automatic fault detection and recovery capabilities
to minimize downtime and ensure business continuity.

System Scalability

• Elastic Scaling: Resources need to be quickly scaled up or down according to the load, to handle
changes in user traffic and data processing demands. For instance, during peak trading periods, quick
scaling is needed to handle increased requests.
• Performance Stability: It is essential to maintain system performance during scaling to ensure
responsive speed and data processing efficiency.

Support for Various Query Scenarios

• Real-Time Query Capability: The system needs to support real-time data querying and analysis to
enable immediate processing and feedback on on-chain data.
• Complex Query Support: It should support complex query operations such as multi-table joins,
aggregate queries, and full-text searches to meet diverse data analysis needs.
• Big Data Processing: It must efficiently handle and analyze large-scale data to provide deep business
insights.
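
As one illustration of the multi-table join and aggregate queries mentioned above, the following sketch compares the transaction count declared in each recent block with the transactions actually stored for it. It assumes that transactions.height corresponds to blocks.block_number and that both tables use the same chain identifier ('chain-a' is hypothetical); neither assumption is stated in the schemas themselves.

```sql
SELECT b.block_number,
       b.miner_address,
       b.transaction_count    AS declared_tx_count,
       COUNT(DISTINCT t.txid) AS stored_tx_count
FROM blocks AS b
LEFT JOIN transactions AS t
       ON t.chain  = 'chain-a'          -- same hypothetical chain as below
      AND t.height = b.block_number
WHERE b.chain_id = 'chain-a'
  AND b.`timestamp` >= NOW() - INTERVAL 1 HOUR
GROUP BY b.block_number, b.miner_address, b.transaction_count
ORDER BY b.block_number DESC;
```

COUNT(DISTINCT t.txid) is used because the primary key of the transactions table includes the address, so one on-chain transaction may occupy several rows. None of the indexes shown covers (chain, height), so at production scale a query like this would likely need a supporting index.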

Architecture

The current architecture for Web3 data services is multifaceted, involving multiple databases,
big data platforms, and real-time data processing tools, as illustrated in Figure 1.

Key components include:

• Blockchain Data Parsing: Real-time parsing of blockchain data and storing it in a
MySQL database.
• MySQL Database: Stores block data
from different blockchains.
• Big Data Platform (BigData): Centrally
stores data synchronized from
multiple MySQL databases and
performs multi-level data cleaning.
• Relational Database (PG): Used to
provide high-frequency query services,
such as user balance and market data
queries.
Figure 1: A typical Web3 data services architecture

Architecture Analysis
This current architecture performs the following functions:

• Blockchain Data Parsing: Real-time programs parse the latest block data and store it in MySQL
databases.
• Data Synchronization: Data from multiple MySQL databases is synchronized to a big data platform for
further processing.
• Multi-Level Data Cleaning: Data undergoes multi-level cleaning to ensure accuracy and consistency.
• Service Database Writing: Cleaned data is written into relational databases to handle high-frequency
queries, such as user balance checks and market trends.

Data Flow Description


Data passing through this current architecture is processed as follows:

1. The latest blockchain data is parsed by programs such as ChainA-insight, ChainB-insight, and
ChainC-insight, and then stored in their respective MySQL databases.
2. Through real-time data streams or ETL tools, the data in the MySQL databases is synced to a big data
platform (BigData).
3. On the BigData platform, the data undergoes multi-level cleaning processes.
4. The cleaned data is written into two relational databases that provide external services (User Balance
and User K-Line).
5. Finally, the relational databases offer high-frequency queries to provide services like address
balances and market trends.

Architecture Challenges

However, this current architecture presents several challenges:

• Complexity and Maintenance Costs: The architecture’s complexity leads to high development and
maintenance costs, requiring a team with specialized skills in big data processing and multi-blockchain
data analysis.
• Data Synchronization Consistency Issues: Different synchronization frequencies and blockchain fork
handling logic can cause data inconsistencies, necessitating robust synchronization mechanisms.
• Data Real-Time Issues: Delays in data transmission and processing can lead to real-time performance
bottlenecks, particularly in emerging blockchains with higher block generation efficiency.
• Data Silos: Initial data silos and the complexity of data synchronization still exist, despite efforts to
centralize data management through big data platforms.

Proposed Changes
In the original architecture, data from
different chains (such as ChainA,
ChainB, or ChainC) was stored in
separate MySQL databases and then
synchronized to a big data platform for
cleaning and processing. In the evolved
architecture, data from these different
sources is directly stored in TiDB.

Figure 2: An evolved Web3 data services architecture with TiDB at its core

This convergence delivers the following benefits:

• Simplifies the data flow process by reducing movement and synchronization between different
storage systems.
• Facilitates unified data management and processing, eliminating the need to maintain multiple
independent databases.
• Leverages the horizontal scalability of distributed databases to better handle data growth.
• The unified data storage layer makes it easier to achieve data consistency and transaction
management.

Unified Data Reading and Writing Entrance

In the evolved architecture above, TiDB becomes the unified entry point for data reading and writing.
All data interactions, whether writing on-chain data (ChainX-insight) or querying through business
systems (User Balance, User K-Line), pass through this single entry point.

The advantages of a unified data read/write entry include:

• Simplified interaction modes for the system, with multiple systems only needing to interact with one
database system.
• Leveraging the query performance benefits of the distributed database to improve data read efficiency.
• Facilitating real-time data writing and querying, reducing data transfer delays.

Architecture Comparison
If we examine the current architecture and the evolved one with TiDB at its core, we can see that the
evolved architecture:

• Efficiently analyzes ChainA, ChainB, and ChainC data while integrating it across nodes.
• Uses horizontal scaling to handle the rapid growth of user data and on-chain data.
• Provides a consistent data view in real-time analysis to enhance user experience.
• Significantly improves system availability and data durability.

Integration Points

The evolved architecture with TiDB is simple: new data generated on the blockchain is written
directly into TiDB in real time. From there, TiDB’s HTAP capabilities and transaction features can be
used to quickly clean the detailed on-chain data into categorized aggregate tables with business value.
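
A minimal sketch of that cleaning step, assuming a TiDB cluster with TiFlash nodes available. The aggregate table address_daily_activity and its columns are hypothetical names introduced for illustration; only the transactions table comes from this document. Whether the optimizer routes the scan to the columnar replica depends on the TiDB version and its settings.

```sql
-- TiDB-specific: request one columnar (TiFlash) replica of the detail table
-- so analytical scans can run alongside transactional writes.
ALTER TABLE transactions SET TIFLASH REPLICA 1;

-- Hypothetical aggregate table: one row per chain, address, and day.
CREATE TABLE address_daily_activity (
  chain         VARCHAR(64)     NOT NULL,
  address       VARCHAR(128)    NOT NULL,
  activity_date DATE            NOT NULL,
  tx_rows       BIGINT UNSIGNED NOT NULL,
  PRIMARY KEY (chain, address, activity_date)
);

-- Rebuild yesterday's aggregates from the detail rows. REPLACE makes the
-- job safe to re-run for the same day.
REPLACE INTO address_daily_activity (chain, address, activity_date, tx_rows)
SELECT chain, address, DATE(create_time), COUNT(*)
FROM transactions
WHERE create_time >= CURRENT_DATE - INTERVAL 1 DAY
  AND create_time <  CURRENT_DATE
GROUP BY chain, address, DATE(create_time);
```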

Suitability Assessment

TiDB offers several advantages as a distributed SQL database for Web3 data services:

• Strong Horizontal Scalability: Distributed databases can linearly scale storage and processing
capabilities by adding nodes, making them suitable for scenarios with rapid blockchain data growth.
• Strong Consistency and ACID Transactions: TiDB offers ACID transactions and strong consistency
similar to traditional relational databases, meeting the requirements for blockchain data immutability
and transaction consistency.
• MySQL Compatibility: TiDB is compatible with MySQL protocols, making it easy to integrate with
existing applications and tools. This reduces migration and adaptation costs.
• High Availability: TiDB automatically maintains multiple data replicas, has automatic failure detection
and recovery capabilities, and provides enterprise-level high availability.
• Real-Time HTAP Capabilities: TiDB supports real-time online transaction processing (OLTP) and
online analytical processing (OLAP), fulfilling the need for real-time data writing and analysis for
blockchain data.
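
To illustrate the ACID point above, a parser could write a block and its transactions in a single transaction so that readers never see a block without its transaction rows. This is a sketch only: all literal values are hypothetical, and on any error the application would issue ROLLBACK instead of COMMIT.

```sql
BEGIN;

INSERT INTO blocks
  (block_hash, chain_id, previous_block_hash, block_number,
   miner_address, `timestamp`, transaction_count)
VALUES
  ('hash_of_block_102', 'chain-a', 'hash_of_block_101', 102,
   'miner_address_1', '2023-05-01 00:00:24', 1);

-- Columns not listed here fall back to the defaults declared in the schema.
INSERT INTO transactions
  (chain, txid, direction, address, `to`, height, amount, fee, symbol)
VALUES
  ('chain-a', 'hash_of_tx_1', 'out', 'address_sender_1', 'address_receiver_1',
   102, '1000000000000000000', '21000', 'TOKEN_A');

COMMIT;
```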

However, there are limitations to consider:

• Data Volume and Performance Bottlenecks: Even though TiDB can improve processing capacity by
expanding nodes, it may still encounter performance bottlenecks and scalability limitations in
ultra-large-scale blockchain data scenarios (like hundreds of TB to PB).
• Maintenance Complexity: Compared to single-node databases, distributed databases are more
complex in terms of deployment, maintenance, and fault diagnosis, requiring specialized personnel
and tools.
• Cost Considerations: As an enterprise-level distributed database, TiDB involves certain costs such
as licensing fees, hardware investment, and personnel training, so the cost-benefit ratio needs to be
carefully considered.

Competitive Analysis

When compared to other database solutions like MySQL RDS, Amazon Aurora, and ClickHouse, TiDB
demonstrates superior scalability, high availability, and efficient resource utilization. However, it is
important to weigh the specific requirements of your Web3 data service scenario against the capabilities
and limitations of each database solution to determine the best fit.

| Criterion | TiDB | MySQL RDS | MySQL Sharding | Amazon Aurora | ClickHouse |
|---|---|---|---|---|---|
| Latency (avg) | 10-50ms | 5ms | 7ms | 5ms | 1s ~ 15s |
| Write Throughput (QPS) | 1 million+ | Around 10,000 | 500,000+ | Around 20,000 | 1 million+ |
| Business Agility | High | Medium | Very low | Medium | High |
| Availability | 99.99%; RTO<30s; downtime affects 1/N traffic | 99.99%; RTO<50s; main downtime affects all traffic | 99.99%; RTO<50s; downtime affects 1/N traffic | 99.99%; RTO<60s; main downtime affects all traffic | 99.99%; RTO<30s; downtime affects 1/N traffic |
| Scalability | Compute/storage can be dynamically horizontally scaled, beyond 1PB | Writing and storage are limited by standalone capabilities, while reading has certain scalability, around 3TB | Horizontally scalable, but with a long cycle and high risk, around 50TB | Writing and storage are limited by standalone capabilities, while reading has certain scalability, around 128TB | Manual expansion based on Merge Tree |
| Management Efficiency | High | Low | Very low | Medium | Low |
| Resource Utilization | High | Low | Low | Medium | High |
| Online DDL Efficiency | High | Low | Very low | Low | Very low |
| Data Recovery Efficiency | Medium | High (based on Cloud Block Storage Snapshot capability) | Based on Cloud Block Storage Snapshot capability | High | High (based on Cloud Block Storage Snapshot capability) |
| Multi Cloud/Hybrid Cloud | Multi Cloud/Hybrid Cloud | Multi Cloud/Hybrid Cloud | Multi Cloud/Hybrid Cloud | Not supported | Multi Cloud/Hybrid Cloud |

Conclusion

In the context of Web3 data services, TiDB Cloud offers significant advantages over building and
maintaining an in-house TiDB cluster. These benefits include lower management costs, faster deployment
and scaling, higher availability guarantees, and more professional technical support.

Given these advantages, it is recommended to explore TiDB Cloud as a scalable, high-performance
solution for supporting the data service needs of Web3 applications. By leveraging TiDB Cloud,
organizations can enhance their ability to manage and analyze blockchain data, improve system reliability
and performance, and reduce the complexity and costs associated with maintaining a distributed
database infrastructure.

You can get started with a free trial of TiDB Cloud right now to enjoy dedicated resources, tailored
performance, and zero operational overhead—all optimized for your Web3 applications.

EVALUATE TiDB FOR YOURSELF

Start Your Free Trial

Contact us for a personalized demo at pingcap.com/demo/
