Amazon EMR: Managed Hadoop/Spark
TL;DR
Amazon EMR is a managed Hadoop/Spark service for big data processing. It automates cluster provisioning, configuration, and tuning. You pay standard EC2 rates, billed per second, plus an EMR fee that depends on instance type (~$0.10-0.50/hour per instance). The catch: cluster startup takes 5-15 minutes, and spot interruptions can kill jobs. For ad-hoc or ETL-focused workloads, consider EMR Serverless or Glue instead.
What Is It?
EMR is a managed cluster platform that runs open-source big data frameworks on EC2 instances.
Supported Frameworks
| Framework | Version |
|---|---|
| Spark | 3.x |
| Hadoop | 3.x |
| Hive | 3.x |
| Presto | Depends on EMR release |
| Flink | Depends on EMR release |
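As a rough illustration of how frameworks are chosen at launch, the sketch below builds the keyword arguments one might pass to boto3's `run_job_flow`. The release label, instance types, group names, and cluster name are assumptions for illustration, not values from this article, and the role names are the EMR defaults, which may differ in your account. Keeping the primary and core nodes on-demand and putting only task nodes on spot is one common way to limit the damage from spot interruptions.

```python
def build_cluster_request(name, applications, task_node_count=2):
    """Build keyword arguments for boto3's EMR run_job_flow call (a sketch)."""
    groups = [
        # Primary and core nodes hold cluster state and HDFS data: keep them on-demand.
        {"Name": "primary", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
         "InstanceType": "m5.xlarge", "InstanceCount": 1},
        {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
         "InstanceType": "m5.xlarge", "InstanceCount": 2},
    ]
    if task_node_count > 0:
        # Task nodes only run compute, so losing one to a spot interruption is cheaper.
        groups.append(
            {"Name": "task", "InstanceRole": "TASK", "Market": "SPOT",
             "InstanceType": "m5.xlarge", "InstanceCount": task_node_count}
        )
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release label; pick one that fits your needs
        "Applications": [{"Name": app} for app in applications],
        "Instances": {
            "InstanceGroups": groups,
            "KeepJobFlowAliveWhenNoSteps": False,  # terminate when the steps finish
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default EC2 instance profile name
        "ServiceRole": "EMR_DefaultRole",      # default EMR service role name
    }


request = build_cluster_request("nightly-etl", ["Spark", "Hive"])
# With AWS credentials configured, this could be submitted as:
#   boto3.client("emr").run_job_flow(**request)
```

Building the request as a plain dictionary keeps the example runnable without AWS credentials and makes the configuration easy to review before anything is launched.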
Pricing
| Component | Price |
|---|---|
| EMR fee | $0.10-0.50/hour per instance (varies by instance type) |
| EC2 instances | Standard EC2 pricing |
| EBS storage | Standard EBS pricing |
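To make the pricing components concrete, here is a minimal cost estimate that adds the EMR fee on top of the EC2 rate for every instance. The $0.20/hour EC2 rate is a hypothetical placeholder, and the $0.10/hour EMR fee is simply the low end of the range quoted above; substitute the real rates for your instance type and region.

```python
def estimate_cluster_cost(instance_count, hours, ec2_hourly, emr_fee_hourly, ebs_hourly=0.0):
    """Estimate total cluster cost in dollars.

    Every instance pays the EC2 rate plus the EMR fee, and optionally an
    hourly EBS cost, for each hour the cluster runs.
    """
    per_instance_hourly = ec2_hourly + emr_fee_hourly + ebs_hourly
    return instance_count * hours * per_instance_hourly


# Hypothetical example: 10 instances for 3 hours.
cost = estimate_cluster_cost(
    instance_count=10,
    hours=3,
    ec2_hourly=0.20,      # assumed EC2 rate, not a quoted AWS price
    emr_fee_hourly=0.10,  # low end of the range in the table above
)
print(f"${cost:.2f}")  # prints $9.00
```

Because the EMR fee is charged per instance, it scales with cluster size just like the EC2 cost, so it is worth including in any estimate for larger clusters.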
Alternatives
| Service | Use Case |
|---|---|
| EMR Serverless | No cluster management |
| Glue | ETL-focused |
| Athena | SQL-only |
Verdict
Grade: B
Best for:
- Long-running Spark jobs
- Custom Hadoop configurations
- Machine learning at scale
When to use EMR Serverless instead:
- Ad-hoc jobs
- No cluster management needed
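The guidance in the Alternatives table and the lists above can be sketched as a small decision helper. The function name, its flags, and the order in which the rules are checked are assumptions made for illustration; real decisions usually weigh cost and team experience as well.

```python
def suggest_service(long_running=False, custom_hadoop_config=False,
                    sql_only=False, etl_focused=False):
    """Suggest an AWS service based on the article's rules of thumb (a sketch)."""
    # Long-running jobs and custom Hadoop configurations favor a full EMR cluster.
    if long_running or custom_hadoop_config:
        return "EMR"
    # SQL-only workloads do not need a cluster at all.
    if sql_only:
        return "Athena"
    # ETL-focused pipelines map to Glue.
    if etl_focused:
        return "Glue"
    # Ad-hoc jobs, or anything where cluster management is unwanted.
    return "EMR Serverless"


print(suggest_service(long_running=True))  # prints EMR
print(suggest_service(sql_only=True))      # prints Athena
print(suggest_service())                   # prints EMR Serverless
```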
Researcher 🔬 — Staff Software Architect