Bigtable
Bigtable is a highly scalable, distributed, and fully managed NoSQL database provided by Google Cloud, designed to handle massive workloads with high throughput and low latency. It is particularly useful for applications that require real-time analytics, time-series data, large-scale transaction processing, and large datasets like those used in IoT, financial data analysis, and monitoring systems.
Important Features:
Scalability: Bigtable can scale horizontally to manage petabytes of data across thousands of servers.
High Performance: It provides consistently low-latency reads and writes, making it suitable for high-throughput workloads like online services or analytics.
Wide-Column Store: Bigtable is a wide-column database, meaning it organizes data into rows, columns, and timestamps, making it efficient for storing structured and semi-structured data (a minimal client sketch of this data model follows this list).
Integration: It integrates well with other Google Cloud services, like Google Cloud Dataflow, Google Cloud Storage, and Google Cloud Pub/Sub.
Automatic Sharding: Bigtable automatically handles partitioning data across multiple nodes (sharding) to maintain performance even as your dataset grows.
Consistency: Reads and writes within a single cluster are strongly consistent; when data is replicated across clusters, Bigtable is eventually consistent by default, with stronger read guarantees available for certain configurations.
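To make the wide-column model concrete, here is a minimal sketch using the google-cloud-bigtable Python client. The project, instance, table, and column family names are hypothetical placeholders, and the sketch assumes those resources already exist.

```python
# A minimal read/write sketch using the google-cloud-bigtable Python client.
# Assumes a project "my-project", instance "my-instance", and table "metrics"
# with an existing column family "cf1" -- all hypothetical names.
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=False)
instance = client.instance("my-instance")
table = instance.table("metrics")

# Write: each cell is addressed by (row key, column family, column qualifier)
# and carries a timestamp, reflecting the wide-column data model.
row = table.direct_row(b"sensor#42#2024-01-01T00:00:00")
row.set_cell("cf1", b"temperature", b"21.7")
row.set_cell("cf1", b"humidity", b"0.54")
row.commit()

# Read the row back and inspect the newest version of each cell.
result = table.read_row(b"sensor#42#2024-01-01T00:00:00")
for qualifier, cells in result.cells["cf1"].items():
    print(qualifier, cells[0].value, cells[0].timestamp)
```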
Bigtable is commonly used in applications requiring fast access to large amounts of data, such as:
- Personalization engines
- Real-time bidding in advertising
- Fraud detection systems
- IoT data aggregation and processing
- Time-series databases (e.g., for monitoring or logging)
Bigtable Architecture Components
a) Master Server
• The Master Server is responsible for the overall management of the Bigtable system. It performs tasks like:
• Assigning Tablets: Tablets (contiguous ranges of rows) are split and assigned to tablet servers for management.
• Load Balancing: Ensures that tablets are distributed across tablet servers for balanced load and performance (a small assignment sketch follows this list).
• Schema Management: Manages column families and other metadata, like garbage-collection policies and access control.
• Tablet Splitting: When a tablet becomes too large, the Master Server directs its split into smaller tablets and redistributes them.
• Fault Tolerance: Detects and handles failures of tablet servers by reassigning their tablets to healthy servers.
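As a rough illustration of the load-balancing idea (not Google's actual algorithm), the sketch below assigns each tablet, represented as a row-key range, to the currently least-loaded tablet server. All names are made up.

```python
# Illustrative sketch (not Google's implementation): how a master might assign
# tablets to the least-loaded tablet servers to keep load balanced.
from collections import defaultdict

def assign_tablets(tablets, servers):
    """Assign each tablet (a row-key range) to the server with the fewest tablets."""
    assignments = defaultdict(list)
    for tablet in tablets:
        target = min(servers, key=lambda s: len(assignments[s]))
        assignments[target].append(tablet)
    return dict(assignments)

tablets = [("", "m"), ("m", "t"), ("t", "")]   # (start_row_key, end_row_key) ranges
servers = ["ts-1", "ts-2"]
print(assign_tablets(tablets, servers))
# e.g. {'ts-1': [('', 'm'), ('t', '')], 'ts-2': [('m', 't')]}
```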
b) Tablet Servers
• Tablet Servers are responsible for handling the actual data stored in Bigtable.
They manage tablets, which are ranges of rows
from a Bigtable.
• Each tablet server manages multiple tablets.
• Tablets
are sorted by row keys and divided dynamically as the data grows.
• Each tablet contains data stored in SSTables
(Sorted String Tables) along with a commit log
that records changes.
• Responsibilities of Tablet Servers:
• Serving Reads/Writes: Clients communicate with tablet servers directly to read and write
data to tablets.
• Tablet Management: Tablet servers monitor the size of tablets, and when a tablet
grows too large, it’s split into smaller tablets.
• Data Compaction: Periodically, tablet servers merge smaller SSTables into larger
ones, a process known as compaction. This
helps reduce fragmentation and improve read efficiency.
• Logging and Replication: Each tablet server writes updates to a commit log, typically
stored in the underlying distributed file system for durability. This ensures
recovery in case of server failures.
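The logging responsibility can be sketched as follows: every mutation is appended to a commit log before it touches the in-memory state, so that state can be rebuilt by replaying the log after a crash. The file path and record format below are hypothetical.

```python
# Illustrative sketch (not the real implementation): a tablet server appends
# every mutation to a commit log before applying it, so that after a crash the
# in-memory state (memtable) can be rebuilt by replaying the log.
import json

LOG_PATH = "commit.log"  # hypothetical local path standing in for a GFS file

def write(row_key, column, value, memtable):
    # 1. Append to the commit log for durability.
    with open(LOG_PATH, "a") as log:
        log.write(json.dumps({"row": row_key, "col": column, "val": value}) + "\n")
    # 2. Apply to the in-memory memtable.
    memtable[(row_key, column)] = value

def recover():
    # After a failure, replay the commit log to rebuild the memtable.
    memtable = {}
    try:
        with open(LOG_PATH) as log:
            for line in log:
                entry = json.loads(line)
                memtable[(entry["row"], entry["col"])] = entry["val"]
    except FileNotFoundError:
        pass
    return memtable
```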
c) Tablet
• A tablet is a unit of storage and is a range of rows in the Bigtable. Each tablet contains:
• SSTable: Immutable files that store data in a sorted format. Each SSTable contains a subset of the rows and columns for the tablet.
• Memtable: Data that has been recently written but not yet flushed to disk. It is held in memory until it reaches a threshold, after which it is written to an SSTable (see the write-path sketch after this list).
• Commit Log: Used to ensure durability of writes before they are flushed to an SSTable.
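A minimal write-path sketch, with made-up names and a deliberately tiny flush threshold: writes accumulate in a memtable and are flushed to an immutable, sorted SSTable once the memtable grows past a limit.

```python
# Illustrative write path: recent writes sit in an in-memory memtable and, once
# the memtable exceeds a size threshold, are flushed to an immutable, sorted
# SSTable. Names and the threshold are hypothetical.
MEMTABLE_LIMIT = 4  # flush after this many entries (tiny, for illustration)

memtable = {}
sstables = []  # each SSTable is modelled as a sorted list of (key, value) pairs

def put(row_key, value):
    memtable[row_key] = value
    if len(memtable) >= MEMTABLE_LIMIT:
        flush()

def flush():
    global memtable
    sstable = sorted(memtable.items())  # SSTables store rows in sorted key order
    sstables.append(sstable)            # immutable once written
    memtable = {}

for i in range(10):
    put(f"row{i:02d}", f"value-{i}")
print(len(sstables), "SSTables flushed,", len(memtable), "entries still in memory")
```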
d) Distributed Storage Layer (GFS)
• Bigtable uses a distributed storage layer, typically the Google File System (GFS) or a similar system, to store SSTables, commit logs, and other metadata. GFS provides the following key features:
• Replication: Data is replicated across multiple nodes to ensure fault tolerance (a toy sketch follows this list).
• Durability: Writes are durably stored, even in the case of failures.
• Scalability: GFS can handle massive amounts of data across thousands of nodes.
• All Bigtable data (SSTables and commit logs) is stored in GFS, ensuring the system’s durability and high availability.
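The replication idea can be sketched as a toy model in which every chunk of an SSTable or commit log is written to several storage nodes; this is only an illustration, not the GFS protocol.

```python
# Toy replication sketch: each chunk is stored on several nodes so that the
# loss of one node does not lose data. Not the GFS protocol.
REPLICATION_FACTOR = 3

def write_chunk(chunk_id, data, nodes):
    """Store the chunk on REPLICATION_FACTOR distinct nodes (chosen round-robin here)."""
    replicas = [nodes[(chunk_id + i) % len(nodes)] for i in range(REPLICATION_FACTOR)]
    for node in replicas:
        node.setdefault(chunk_id, data)  # each node is modelled as a dict
    return replicas

nodes = [dict() for _ in range(5)]
write_chunk(0, b"sstable-block-0", nodes)
print(sum(1 for n in nodes if 0 in n))  # -> 3 (three replicas of the chunk exist)
```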
e) Chubby: Coordination and Lock Service
• Bigtable uses Chubby, a distributed lock service, for coordination. Chubby is used for:
• Metadata Storage: The master’s location and tablet-assignment metadata are stored in Chubby.
• Locking Mechanism: Ensures that only one Master Server is active at any given time and helps coordinate failover mechanisms (a minimal election sketch follows).
Chubby ensures consistency and is crucial for the master’s role in the architecture.
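Conceptually, keeping a single active master boils down to acquiring an exclusive lock, as in the sketch below. The LockService class is an in-process stand-in for illustration only; it is not Chubby's real API.

```python
# Conceptual sketch of master election with an exclusive lock, in the spirit of
# Chubby. The LockService class is a stand-in, not Chubby's actual interface.
import threading

class LockService:
    """Toy in-process lock service: at most one holder of a named lock at a time."""
    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def try_acquire(self, name, owner):
        with self._guard:
            if name not in self._locks:
                self._locks[name] = owner
                return True
            return False

chubby = LockService()
# Two would-be masters race for the same lock; only one becomes active.
print(chubby.try_acquire("bigtable/master", "master-a"))  # True  -> active master
print(chubby.try_acquire("bigtable/master", "master-b"))  # False -> stays on standby
```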
How Bigtable Works
a) Client Requests
• Clients communicate directly with Tablet Servers to read or write data.
• Initially, a client looks up the location of the tablet containing the desired data (tablet-location metadata is kept in a hierarchy rooted in Chubby) and caches it. After that, the client communicates directly with the appropriate tablet server.
• For reads, the client gets the data from the Memtable (if it is recent) or from the SSTables stored on disk (see the read-path sketch below).
• For writes, the data is first written to the commit log (for durability) and then updated in the Memtable.
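The read path described above can be sketched as follows: a lookup checks the memtable first and then falls back to the SSTables, newest first. The data structures are toy stand-ins.

```python
# Illustrative read path: check the memtable (most recent writes) first, then
# fall back to SSTables from newest to oldest. Toy structures only.
memtable = {"row2": "fresh-value"}
sstables = [
    [("row1", "old-1"), ("row2", "old-2")],   # older SSTable
    [("row3", "newer-3")],                    # newer SSTable
]

def read(row_key):
    if row_key in memtable:               # 1. recent, unflushed writes
        return memtable[row_key]
    for sstable in reversed(sstables):    # 2. newest SSTable first
        for key, value in sstable:        # (real SSTables use index blocks, not scans)
            if key == row_key:
                return value
    return None

print(read("row2"))  # -> "fresh-value" from the memtable, shadowing the SSTable copy
print(read("row1"))  # -> "old-1" from an SSTable on disk
```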
b) Tablet Management
• Tablet Servers manage tablets, which are assigned to them by the Master Server.
• When a tablet grows too large (because of incoming writes), it is split into two smaller tablets. The Master Server coordinates this splitting and reassignment to ensure load balancing (see the split sketch below).
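A simplified picture of a split: the tablet's row range is divided at a middle row key into two smaller tablets, each covering part of the original range. The tablet representation below is made up for illustration.

```python
# Illustrative tablet split: divide an oversized tablet at a middle row key.
def split_tablet(tablet):
    """tablet = {'range': (start, end), 'rows': sorted list of row keys}."""
    rows = tablet["rows"]
    mid_key = rows[len(rows) // 2]
    left  = {"range": (tablet["range"][0], mid_key), "rows": [r for r in rows if r < mid_key]}
    right = {"range": (mid_key, tablet["range"][1]), "rows": [r for r in rows if r >= mid_key]}
    return left, right

tablet = {"range": ("", ""), "rows": ["a", "c", "f", "k", "p", "z"]}
left, right = split_tablet(tablet)
print(left["range"], right["range"])  # -> ('', 'k') ('k', '')
```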
c) Compaction
• Over time, tablets accumulate multiple SSTables as new data is written. Periodically, the tablet server compacts these SSTables, merging them into larger SSTables to optimize space and improve read performance (see the merge sketch below).
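A compaction can be sketched as a merge of several sorted SSTables into one, keeping only the newest value for each row key; the structures below are illustrative only.

```python
# Illustrative compaction: merge several small, sorted SSTables into one larger
# sorted SSTable, keeping only the newest value for each row key.
def compact(sstables):
    """sstables: list of sorted (key, value) lists, ordered oldest to newest."""
    merged = {}
    for sstable in sstables:              # later (newer) SSTables overwrite older values
        for key, value in sstable:
            merged[key] = value
    return sorted(merged.items())         # the result is again a single sorted SSTable

old = [("row1", "v1"), ("row3", "v1")]
new = [("row1", "v2"), ("row2", "v1")]
print(compact([old, new]))
# -> [('row1', 'v2'), ('row2', 'v1'), ('row3', 'v1')]
```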
d) Fault Tolerance
• If a Tablet Server fails, the Master Server detects the failure and reassigns the tablets managed by the failed server to other available tablet servers.
• Since all data (SSTables and commit logs) is stored in the distributed file system (e.g., GFS), tablet recovery is seamless and no data is lost.