Bigtable
Bigtable is a highly scalable, distributed, and fully managed NoSQL database provided by Google Cloud, designed to handle massive workloads with high throughput and low latency. It is particularly useful for applications that require real-time analytics, time-series data, large-scale transaction processing, and large datasets like those used in IoT, financial data analysis, and monitoring systems.
Important Features:
Scalability: Bigtable can scale horizontally to manage petabytes of data across thousands of servers.
High Performance: It provides consistently low-latency reads and writes, making it suitable for high-throughput workloads like online services or analytics.
Wide-Column Store: Bigtable is a wide-column database, meaning it organizes data into rows, columns, and timestamps, making it efficient for storing structured and semi-structured data (a minimal client sketch of this data model follows this list).
Integration: It integrates well with other Google Cloud services, like Google Cloud Dataflow, Google Cloud Storage, and Google Cloud Pub/Sub.
Automatic Sharding: Bigtable automatically handles partitioning data across multiple nodes (sharding) to maintain performance even as your dataset grows.
Consistency: Reads and writes within a single cluster are strongly consistent; when data is replicated across clusters, Bigtable is eventually consistent by default, with stronger read guarantees available for certain configurations.
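To make the wide-column model concrete, here is a minimal sketch using the google-cloud-bigtable Python client. The project, instance, table, and column family names are hypothetical placeholders, and the sketch assumes those resources already exist.

```python
# A minimal read/write sketch using the google-cloud-bigtable Python client.
# Assumes a project "my-project", instance "my-instance", and table "metrics"
# with an existing column family "cf1" -- all hypothetical names.
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=False)
instance = client.instance("my-instance")
table = instance.table("metrics")

# Write: each cell is addressed by (row key, column family, column qualifier)
# and carries a timestamp, reflecting the wide-column data model.
row = table.direct_row(b"sensor#42#2024-01-01T00:00:00")
row.set_cell("cf1", b"temperature", b"21.7")
row.set_cell("cf1", b"humidity", b"0.54")
row.commit()

# Read the row back and inspect the newest version of each cell.
result = table.read_row(b"sensor#42#2024-01-01T00:00:00")
for qualifier, cells in result.cells["cf1"].items():
    print(qualifier, cells[0].value, cells[0].timestamp)
```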
Bigtable is commonly used in applications requiring fast access to large amounts of data, such as:
- Personalization engines
- Real-time bidding in advertising
- Fraud detection systems
- IoT data aggregation and processing
- Time-series databases (e.g., for monitoring or logging)
Bigtable Architecture Components
a) Master Server
• The Master Server is responsible for the overall management of the Bigtable system. It performs tasks like:
• Assigning Tablets: Tablets (contiguous ranges of rows) are split and assigned to tablet servers for management.
• Load Balancing: Ensures that tablets are distributed across tablet servers for balanced load and performance (a small assignment sketch follows this list).
• Schema Management: Manages column families and other metadata, like garbage-collection policies and access control.
• Tablet Splitting: When a tablet becomes too large, the Master Server directs its split into smaller tablets and redistributes them.
• Fault Tolerance: Detects and handles failures of tablet servers by reassigning their tablets to healthy servers.
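As a rough illustration of the load-balancing idea (not Google's actual algorithm), the sketch below assigns each tablet, represented as a row-key range, to the currently least-loaded tablet server. All names are made up.

```python
# Illustrative sketch (not Google's implementation): how a master might assign
# tablets to the least-loaded tablet servers to keep load balanced.
from collections import defaultdict

def assign_tablets(tablets, servers):
    """Assign each tablet (a row-key range) to the server with the fewest tablets."""
    assignments = defaultdict(list)
    for tablet in tablets:
        target = min(servers, key=lambda s: len(assignments[s]))
        assignments[target].append(tablet)
    return dict(assignments)

tablets = [("", "m"), ("m", "t"), ("t", "")]   # (start_row_key, end_row_key) ranges
servers = ["ts-1", "ts-2"]
print(assign_tablets(tablets, servers))
# e.g. {'ts-1': [('', 'm'), ('t', '')], 'ts-2': [('m', 't')]}
```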
b) Tablet Servers
• Tablet Servers are responsible for handling the actual data stored in Bigtable.
They manage tablets, which are ranges of rows
from a Bigtable.
• Each tablet server manages multiple tablets.
• Tablets
are sorted by row keys and divided dynamically as the data grows.
• Each tablet contains data stored in SSTables
(Sorted String Tables) along with a commit log
that records changes.
• Responsibilities of Tablet Servers:
• Serving Reads/Writes: Clients communicate with tablet servers directly to read and write
data to tablets.
• Tablet Management: Tablet servers monitor the size of tablets, and when a tablet
grows too large, it’s split into smaller tablets.
• Data Compaction: Periodically, tablet servers merge smaller SSTables into larger
ones, a process known as compaction. This
helps reduce fragmentation and improve read efficiency.
• Logging and Replication: Each tablet server writes updates to a commit log, typically
stored in the underlying distributed file system for durability. This ensures
recovery in case of server failures.
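The logging responsibility can be sketched as follows: every mutation is appended to a commit log before it touches the in-memory state, so that state can be rebuilt by replaying the log after a crash. The file path and record format below are hypothetical.

```python
# Illustrative sketch (not the real implementation): a tablet server appends
# every mutation to a commit log before applying it, so that after a crash the
# in-memory state (memtable) can be rebuilt by replaying the log.
import json

LOG_PATH = "commit.log"  # hypothetical local path standing in for a GFS file

def write(row_key, column, value, memtable):
    # 1. Append to the commit log for durability.
    with open(LOG_PATH, "a") as log:
        log.write(json.dumps({"row": row_key, "col": column, "val": value}) + "\n")
    # 2. Apply to the in-memory memtable.
    memtable[(row_key, column)] = value

def recover():
    # After a failure, replay the commit log to rebuild the memtable.
    memtable = {}
    try:
        with open(LOG_PATH) as log:
            for line in log:
                entry = json.loads(line)
                memtable[(entry["row"], entry["col"])] = entry["val"]
    except FileNotFoundError:
        pass
    return memtable
```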
c) Tablet
• A tablet is a unit of storage and is a range of rows in the Bigtable. Each tablet contains:
• SSTable: Immutable files that store data in a sorted format. Each SSTable contains a subset of the rows and columns for the tablet.
• Memtable: Data that has been recently written but not yet flushed to disk. It is held in memory until it reaches a threshold, after which it is written to an SSTable (see the write-path sketch after this list).
• Commit Log: Used to ensure durability of writes before they are flushed to an SSTable.
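A minimal write-path sketch, with made-up names and a deliberately tiny flush threshold: writes accumulate in a memtable and are flushed to an immutable, sorted SSTable once the memtable grows past a limit.

```python
# Illustrative write path: recent writes sit in an in-memory memtable and, once
# the memtable exceeds a size threshold, are flushed to an immutable, sorted
# SSTable. Names and the threshold are hypothetical.
MEMTABLE_LIMIT = 4  # flush after this many entries (tiny, for illustration)

memtable = {}
sstables = []  # each SSTable is modelled as a sorted list of (key, value) pairs

def put(row_key, value):
    memtable[row_key] = value
    if len(memtable) >= MEMTABLE_LIMIT:
        flush()

def flush():
    global memtable
    sstable = sorted(memtable.items())  # SSTables store rows in sorted key order
    sstables.append(sstable)            # immutable once written
    memtable = {}

for i in range(10):
    put(f"row{i:02d}", f"value-{i}")
print(len(sstables), "SSTables flushed,", len(memtable), "entries still in memory")
```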
d) Distributed Storage Layer (GFS)
• Bigtable uses a distributed storage layer, typically the Google File System (GFS) or a similar system, to store SSTables, commit logs, and other metadata. GFS provides the following key features:
• Replication: Data is replicated across multiple nodes to ensure fault tolerance (a toy sketch follows this list).
• Durability: Writes are durably stored, even in the case of failures.
• Scalability: GFS can handle massive amounts of data across thousands of nodes.
• All Bigtable data (SSTables and commit logs) is stored in GFS, ensuring the system’s durability and high availability.
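The replication idea can be sketched as a toy model in which every chunk of an SSTable or commit log is written to several storage nodes; this is only an illustration, not the GFS protocol.

```python
# Toy replication sketch: each chunk is stored on several nodes so that the
# loss of one node does not lose data. Not the GFS protocol.
REPLICATION_FACTOR = 3

def write_chunk(chunk_id, data, nodes):
    """Store the chunk on REPLICATION_FACTOR distinct nodes (chosen round-robin here)."""
    replicas = [nodes[(chunk_id + i) % len(nodes)] for i in range(REPLICATION_FACTOR)]
    for node in replicas:
        node.setdefault(chunk_id, data)  # each node is modelled as a dict
    return replicas

nodes = [dict() for _ in range(5)]
write_chunk(0, b"sstable-block-0", nodes)
print(sum(1 for n in nodes if 0 in n))  # -> 3 (three replicas of the chunk exist)
```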
e) Chubby: Coordination and Lock Service
• Bigtable uses Chubby, a distributed lock service, for coordination. Chubby is used for:
• Metadata Storage: The master’s location and tablet-assignment metadata are stored in Chubby.
• Locking Mechanism: Ensures that only one Master Server is active at any given time and helps coordinate failover mechanisms (a minimal election sketch follows).
Chubby ensures consistency and is crucial for the master’s role in the architecture.
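Conceptually, keeping a single active master boils down to acquiring an exclusive lock, as in the sketch below. The LockService class is an in-process stand-in for illustration only; it is not Chubby's real API.

```python
# Conceptual sketch of master election with an exclusive lock, in the spirit of
# Chubby. The LockService class is a stand-in, not Chubby's actual interface.
import threading

class LockService:
    """Toy in-process lock service: at most one holder of a named lock at a time."""
    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def try_acquire(self, name, owner):
        with self._guard:
            if name not in self._locks:
                self._locks[name] = owner
                return True
            return False

chubby = LockService()
# Two would-be masters race for the same lock; only one becomes active.
print(chubby.try_acquire("bigtable/master", "master-a"))  # True  -> active master
print(chubby.try_acquire("bigtable/master", "master-b"))  # False -> stays on standby
```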
How Bigtable Works
a) Client Requests
• Clients communicate directly with Tablet Servers to read or write data.
• Initially, a client looks up the location of the tablet containing the desired data (tablet-location metadata is kept in a hierarchy rooted in Chubby) and caches it. After that, the client communicates directly with the appropriate tablet server.
• For reads, the client gets the data from the Memtable (if it is recent) or from the SSTables stored on disk (see the read-path sketch below).
• For writes, the data is first written to the commit log (for durability) and then updated in the Memtable.
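The read path described above can be sketched as follows: a lookup checks the memtable first and then falls back to the SSTables, newest first. The data structures are toy stand-ins.

```python
# Illustrative read path: check the memtable (most recent writes) first, then
# fall back to SSTables from newest to oldest. Toy structures only.
memtable = {"row2": "fresh-value"}
sstables = [
    [("row1", "old-1"), ("row2", "old-2")],   # older SSTable
    [("row3", "newer-3")],                    # newer SSTable
]

def read(row_key):
    if row_key in memtable:               # 1. recent, unflushed writes
        return memtable[row_key]
    for sstable in reversed(sstables):    # 2. newest SSTable first
        for key, value in sstable:        # (real SSTables use index blocks, not scans)
            if key == row_key:
                return value
    return None

print(read("row2"))  # -> "fresh-value" from the memtable, shadowing the SSTable copy
print(read("row1"))  # -> "old-1" from an SSTable on disk
```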
b) Tablet Management
• Tablet Servers manage tablets, which are assigned to them by the Master Server.
• When a tablet grows too large (because of incoming writes), it is split into two smaller tablets. The Master Server coordinates this splitting and reassignment to ensure load balancing (see the split sketch below).
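A simplified picture of a split: the tablet's row range is divided at a middle row key into two smaller tablets, each covering part of the original range. The tablet representation below is made up for illustration.

```python
# Illustrative tablet split: divide an oversized tablet at a middle row key.
def split_tablet(tablet):
    """tablet = {'range': (start, end), 'rows': sorted list of row keys}."""
    rows = tablet["rows"]
    mid_key = rows[len(rows) // 2]
    left  = {"range": (tablet["range"][0], mid_key), "rows": [r for r in rows if r < mid_key]}
    right = {"range": (mid_key, tablet["range"][1]), "rows": [r for r in rows if r >= mid_key]}
    return left, right

tablet = {"range": ("", ""), "rows": ["a", "c", "f", "k", "p", "z"]}
left, right = split_tablet(tablet)
print(left["range"], right["range"])  # -> ('', 'k') ('k', '')
```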
c) Compaction
• Over time, tablets accumulate multiple SSTables as new data is written. Periodically, the tablet server compacts these SSTables, merging them into larger SSTables to optimize space and improve read performance (see the merge sketch below).
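A compaction can be sketched as a merge of several sorted SSTables into one, keeping only the newest value for each row key; the structures below are illustrative only.

```python
# Illustrative compaction: merge several small, sorted SSTables into one larger
# sorted SSTable, keeping only the newest value for each row key.
def compact(sstables):
    """sstables: list of sorted (key, value) lists, ordered oldest to newest."""
    merged = {}
    for sstable in sstables:              # later (newer) SSTables overwrite older values
        for key, value in sstable:
            merged[key] = value
    return sorted(merged.items())         # the result is again a single sorted SSTable

old = [("row1", "v1"), ("row3", "v1")]
new = [("row1", "v2"), ("row2", "v1")]
print(compact([old, new]))
# -> [('row1', 'v2'), ('row2', 'v1'), ('row3', 'v1')]
```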
d) Fault Tolerance
• If a Tablet Server fails, the Master Server detects the failure and reassigns the tablets managed by the failed server to other available tablet servers.
• Since all data (SSTables and commit logs) is stored in the distributed file system (e.g., GFS), tablet recovery is seamless and no data is lost.