GCP Bigtable: Wide-Column Store for Massive Scale

TL;DR

Google Cloud Bigtable is GCP’s wide-column NoSQL database, designed for massive scale (petabytes) with single-digit millisecond latency. It is a managed version of the Bigtable technology Google uses internally for Search, Maps, and Gmail. Best for: time-series data, IoT, ad tech, and applications that need high write throughput with sequential reads. Not for: small datasets (you pay for at least 1 node), complex queries, or joins. Pricing is unusual: you pay mainly for provisioned nodes by the hour, plus storage per GB, so it can be expensive for small workloads and cheap at scale.


What Is It?

Bigtable is a sparse, distributed, persistent multidimensional sorted map — a wide-column store.

Data Model

Table: sensor-data
├── Row key: {device-id}#{timestamp}
│
└── Column families: metadata, readings
    ├── metadata:location    │ metadata:version
    └── readings:temperature │ readings:humidity

Row: device001#2024-01-15T10:00:00Z
├── metadata:location = "warehouse-a"
├── metadata:version = "v2.1"
├── readings:temperature = 22.5
└── readings:humidity = 45.2
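The "multidimensional sorted map" can be made concrete with a small in-memory sketch. This is a toy model, not the Cloud Bigtable client: `write_cell` and `read_latest` are hypothetical helpers, and real Bigtable cell timestamps are microseconds rather than the small integers used here. The third dimension, the cell timestamp, is what lets one column hold several versions.

```python
from typing import Dict, Optional

# Toy model of Bigtable's logical data model, NOT the real client library:
# row key -> "family:qualifier" -> {timestamp: value}.
# Cells that were never written have no entry at all, which is what "sparse" means.
Table = Dict[str, Dict[str, Dict[int, bytes]]]


def write_cell(table: Table, row_key: str, column: str, ts: int, value: bytes) -> None:
    """Store one cell version; a new timestamp adds a version instead of overwriting."""
    table.setdefault(row_key, {}).setdefault(column, {})[ts] = value


def read_latest(table: Table, row_key: str, column: str) -> Optional[bytes]:
    """Return the newest version of a cell, or None if it was never written."""
    versions = table.get(row_key, {}).get(column)
    if not versions:
        return None
    return versions[max(versions)]


sensor_data: Table = {}
row = "device001#2024-01-15T10:00:00Z"
# Values are raw bytes, as in Bigtable; timestamps are small integers for readability.
write_cell(sensor_data, row, "metadata:location", 1, b"warehouse-a")
write_cell(sensor_data, row, "metadata:version", 1, b"v2.1")
write_cell(sensor_data, row, "readings:temperature", 1, b"22.5")
write_cell(sensor_data, row, "readings:temperature", 2, b"22.7")  # a second version
write_cell(sensor_data, row, "readings:humidity", 1, b"45.2")

print(read_latest(sensor_data, row, "readings:temperature"))  # b'22.7'
print(read_latest(sensor_data, row, "readings:pressure"))     # None
```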

Key Characteristics

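Feature | Description
Row key design | Critical for performance: the row key is the only index
Column families | Named groups of columns declared in the table schema; individual columns are created on write
Sparse | Empty columns take no space
Sorted by row key | Rows are stored in lexicographic row-key order, so sequential range scans are efficient
HBase API | Compatible with the Apache HBase API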
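Because rows are kept in row-key order, fetching one device's rows takes two binary searches plus a contiguous slice, not a full-table scan. A minimal sketch under that assumption, using Python's `bisect` on a sorted list as a stand-in for Bigtable's storage order; `prefix_scan` is a hypothetical helper, and real row keys are compared as bytes rather than Python strings.

```python
import bisect
from typing import List

# Toy stand-in for Bigtable's storage order: row keys kept lexicographically sorted.
row_keys: List[str] = sorted([
    "device002#2024-01-15T10:00:00Z",
    "device001#2024-01-15T10:05:00Z",
    "device001#2024-01-15T10:00:00Z",
    "device003#2024-01-15T09:55:00Z",
    "device001#2024-01-15T10:10:00Z",
])


def prefix_scan(keys: List[str], prefix: str) -> List[str]:
    """Return all keys that start with `prefix`, located by binary search."""
    if not prefix:
        return list(keys)
    start = bisect.bisect_left(keys, prefix)
    # Exclusive upper bound: bump the prefix's last character (e.g. '#' -> '$').
    end_key = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    end = bisect.bisect_left(keys, end_key)
    return keys[start:end]


for key in prefix_scan(row_keys, "device001#"):
    print(key)  # device001's three rows, in time order
```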

Pricing

Node-Based Pricing

Component | Price
Production node | $0.65/hour (~$470/month)
Storage (SSD) | $0.17/GB/month
Storage (HDD) | $0.026/GB/month
Network egress | Standard GCP rates

Prices are list rates for a US region and vary by region.

Minimum: 1 production node = ~$470/month

Cost Comparison (10 TB, high throughput)

Service | Monthly cost (estimate)
Bigtable (3 nodes, SSD) | ~$1,400 (nodes) + ~$1,700 (10 TB storage) = ~$3,100
DynamoDB (on-demand) | ~$2,000-4,000 (depends on request volume)
Cloud Spanner | ~$3,000+
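The Bigtable estimate is simple arithmetic: node-hours plus stored gigabytes. A sketch using the list prices quoted in this section, assuming 730 hours per month (the usual billing convention) and 1 TB treated as 1,000 GB; egress is ignored, and `bigtable_monthly_cost` is a hypothetical helper, not a GCP API.

```python
# Assumed list prices (US region, per the pricing table); check current regional rates.
HOURS_PER_MONTH = 730
NODE_PRICE_PER_HOUR = 0.65
SSD_PRICE_PER_GB_MONTH = 0.17
HDD_PRICE_PER_GB_MONTH = 0.026


def bigtable_monthly_cost(nodes: int, storage_gb: float, ssd: bool = True) -> float:
    """Node-hours plus stored GB; network egress is not modeled."""
    node_cost = nodes * NODE_PRICE_PER_HOUR * HOURS_PER_MONTH
    storage_rate = SSD_PRICE_PER_GB_MONTH if ssd else HDD_PRICE_PER_GB_MONTH
    return node_cost + storage_gb * storage_rate


print(f"1 node, no data:      ${bigtable_monthly_cost(1, 0):,.2f}")
print(f"3 nodes, 10 TB SSD:   ${bigtable_monthly_cost(3, 10_000):,.2f}")
print(f"3 nodes, 10 TB HDD:   ${bigtable_monthly_cost(3, 10_000, ssd=False):,.2f}")
```

With 3 nodes and 10 TB on SSD this gives roughly $1,424 for nodes plus $1,700 for storage, in line with the ~$3,100 estimate. Storing the same data on HDD cuts the storage line to about $260/month, at the cost of higher read latency.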

AWS Alternative: DynamoDB

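Feature | Bigtable | DynamoDB
Data model | Wide-column | Key-value/document
Throughput | Higher write throughput | Balanced reads and writes
Queries | Row-key gets and range scans (HBase-style API) | Partition-key queries with optional sort-key conditions
Secondary indexes | No | Yes (GSIs/LSIs)
Pricing | Per node-hour, plus storage | Per request (on-demand) or provisioned capacity, plus storage
Minimum cost | ~$470/month | Can be $0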

Bigtable advantage: Higher write throughput, well suited to time-series data
DynamoDB advantage: More flexible, with a serverless option

AWS Alternative: Keyspaces

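Amazon Keyspaces (for Apache Cassandra) is a closer match in data model: Cassandra is also a wide-column store, and Keyspaces runs it as a serverless managed service.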


Real-World Use Cases

Use Case 1: IoT Time-Series

Row key: {sensor-id}#{reverse-timestamp}
Column families: readings, metadata

Benefits:
- High write throughput (millions of writes per second on a large enough cluster)
- Efficient time-range scans, with the newest readings first for each sensor
- Automatic sharding by row key
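A reverse timestamp makes the newest reading for each sensor sort first, so "latest N readings" becomes a short scan from the start of that sensor's key range. A minimal sketch, assuming epoch-millisecond timestamps and a 13-digit maximum (`MAX_MILLIS`, an illustrative constant that lasts until roughly the year 2286); `iot_row_key` is hypothetical, since Bigtable does not build row keys for you. Zero-padding matters because row keys compare as bytes, not as numbers.

```python
# Hypothetical row-key builder for {sensor-id}#{reverse-timestamp}.
# Assumption: timestamps are epoch milliseconds that fit in 13 digits (until ~2286).
MAX_MILLIS = 10**13 - 1


def iot_row_key(sensor_id: str, epoch_millis: int) -> str:
    reverse_ts = MAX_MILLIS - epoch_millis
    # Zero-pad to a fixed width so lexicographic order matches numeric order.
    return f"{sensor_id}#{reverse_ts:013d}"


# 10:00, 10:05 and 10:10 UTC on 2024-01-15, in epoch milliseconds.
readings = [1705312800000, 1705313100000, 1705313400000]
keys = sorted(iot_row_key("sensor42", ts) for ts in readings)

# In sorted (storage) order, the 10:10 reading now comes first.
print(keys[0] == iot_row_key("sensor42", 1705313400000))  # True
```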

Use Case 2: Ad Tech

Row key: {user-id}#{campaign-id} (user profiles)
Real-time bidding decisions
Sub-10ms reads at massive scale

Use Case 3: Financial Tick Data

Row key: {symbol}#{nanosecond-timestamp}
Store every market data tick
Efficient replay for backtesting
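Numeric parts of a row key must be fixed-width for the sort order to be chronological, because keys are compared as bytes. The sketch below pads nanosecond timestamps to 19 digits (an assumption that holds until roughly 2286) and replays a half-open time window for one symbol; `tick_key` and `replay` are hypothetical helpers over a sorted list, not Bigtable client calls.

```python
import bisect
from typing import List


def tick_key(symbol: str, ts_nanos: int) -> str:
    """Hypothetical {symbol}#{nanosecond-timestamp} key, zero-padded to 19 digits."""
    return f"{symbol}#{ts_nanos:019d}"


def replay(keys: List[str], symbol: str, start_ns: int, end_ns: int) -> List[str]:
    """Keys for `symbol` with start_ns <= timestamp < end_ns, found by binary search."""
    lo = bisect.bisect_left(keys, tick_key(symbol, start_ns))
    hi = bisect.bisect_left(keys, tick_key(symbol, end_ns))
    return keys[lo:hi]


# Without padding, text order disagrees with numeric order:
print(sorted(["AAPL#999", "AAPL#1000"]))  # ['AAPL#1000', 'AAPL#999']

# Small illustrative timestamps; real epoch nanoseconds already have 19 digits.
ticks = sorted(
    [tick_key("AAPL", t) for t in (2_500, 999, 1_000_000, 1_000)] + [tick_key("MSFT", 1_500)]
)
print(replay(ticks, "AAPL", 999, 2_500))  # the 999 and 1000 ticks, in time order
```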

The Catch

1. High Minimum Cost

About $470/month for a single node, which is overkill for small datasets.

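2. No Secondary Indexes

Rows can be looked up efficiently only by row key or row-key range. Querying by any other column means scanning the table or maintaining a separate table keyed by that column yourself.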
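Without secondary indexes, the usual workaround is a second table whose row key leads with the attribute you want to query. A toy sketch with dicts standing in for two tables; `devices`, `devices_by_location`, `put_device`, and `devices_at` are hypothetical names. The application writes both tables itself, and since Bigtable guarantees atomicity only within a single row, the index can briefly disagree with the main table.

```python
from typing import Dict, List

# Toy stand-ins for two tables.
devices: Dict[str, Dict[str, str]] = {}   # row key: device-id
devices_by_location: Dict[str, str] = {}  # row key: "{location}#{device-id}"


def put_device(device_id: str, location: str) -> None:
    """Write the main row, then maintain the hand-built index table."""
    row = devices.setdefault(device_id, {})
    old_location = row.get("metadata:location")
    row["metadata:location"] = location
    # Separate writes: no cross-row transaction ties these together.
    if old_location is not None:
        devices_by_location.pop(f"{old_location}#{device_id}", None)  # remove stale entry
    devices_by_location[f"{location}#{device_id}"] = device_id


def devices_at(location: str) -> List[str]:
    """Lookup by location; in Bigtable this would be a row-key prefix scan."""
    prefix = f"{location}#"
    return sorted(dev for key, dev in devices_by_location.items() if key.startswith(prefix))


put_device("device001", "warehouse-a")
put_device("device002", "warehouse-b")
put_device("device002", "warehouse-a")  # device002 moves; its old index row is removed
print(devices_at("warehouse-a"))  # ['device001', 'device002']
print(devices_at("warehouse-b"))  # []
```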

3. Complex Row Key Design

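A bad row key design creates hot spots (traffic concentrated on one part of the key space, and therefore one node), which degrades performance. Getting it right requires upfront planning.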
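The hot-spot problem can be shown with a toy model of tablets, the contiguous row-key ranges that Bigtable spreads across nodes. The split points below are made up, and real Bigtable chooses and rebalances them itself; the point is that with a timestamp-first key every concurrent write lands in the same range, while a device-first key spreads the same writes out. `SPLIT_POINTS`, `tablet_for`, and `write_distribution` are hypothetical.

```python
import bisect
from collections import Counter
from typing import Iterable, List

# Made-up tablet boundaries: 5 tablets, each a contiguous row-key range.
SPLIT_POINTS: List[str] = ["dev2", "dev4", "dev6", "dev8"]


def tablet_for(row_key: str) -> int:
    """Index of the tablet whose key range contains row_key."""
    return bisect.bisect_right(SPLIT_POINTS, row_key)


def write_distribution(keys: Iterable[str]) -> Counter:
    """How many writes each tablet receives."""
    return Counter(tablet_for(k) for k in keys)


timestamps = [f"2024-01-15T10:00:{s:02d}Z" for s in range(6)]
device_ids = [f"dev{n}" for n in range(10)]

timestamp_first = [f"{ts}#{d}" for ts in timestamps for d in device_ids]
device_first = [f"{d}#{ts}" for ts in timestamps for d in device_ids]

print(write_distribution(timestamp_first))  # Counter({0: 60}): one tablet takes every write
print(write_distribution(device_first))     # 12 writes on each of the 5 tablets
```

In a real cluster the hot tablet would eventually be split, but with monotonically increasing keys new writes keep landing on whichever tablet holds the newest timestamps.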

4. No Joins, Limited Queries

Joins and aggregations have to be done in application code or in a separate processing system.

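5. Key-Based, HBase-Style API

The API (the Cloud Bigtable client libraries, or the HBase-compatible client for Java) is built around row-key gets, range scans, and server-side filters, so every query has to be designed around the row key.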


Verdict

Grade: A-

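Best for:
- Time-series, IoT, ad tech, and financial tick data
- High write throughput with sequential reads

When to use:
- Large datasets, where the per-node price is spread across a lot of data and traffic
- Access patterns that can be served by row-key lookups and range scans

When not to use:
- Small datasets, where the ~$470/month single-node minimum dominates
- Workloads that need secondary indexes, joins, or complex queries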


Researcher 🔬 — Staff Software Architect