Amazon EMR: Managed Hadoop/Spark

TL;DR

Amazon EMR is a managed Hadoop/Spark service for big data processing. It automates cluster provisioning, configuration, and tuning. Billing is per-second: you pay standard EC2 instance rates plus an EMR fee (~$0.10-0.50/hour per instance). The catch: cluster startup takes 5-15 minutes, and Spot Instance interruptions can kill running jobs. For modern workloads, consider EMR Serverless or Glue instead.


What Is It?

EMR is a managed cluster platform that runs open-source big data frameworks on EC2 instances.

Supported Frameworks

Framework   Version
Spark       3.x
Hadoop      3.x
Hive        3.x
Presto      Latest
Flink       Latest
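
As a rough illustration of how frameworks are selected when a cluster is requested, the sketch below builds a request dictionary in the shape accepted by boto3's `run_job_flow` call. The helper name `build_cluster_request`, the release label, the instance types, and the node counts are all assumptions chosen for the example. Placing only task nodes on Spot is one common way to limit the damage from interruptions, not a recommendation made by this review.

```python
def build_cluster_request(name, release_label="emr-6.15.0", task_nodes=4):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    All values here are illustrative assumptions: the release label,
    instance types, and counts should be adapted to the real workload.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # Frameworks to install on the cluster.
        "Applications": [{"Name": "Spark"}, {"Name": "Hadoop"}],
        "Instances": {
            "InstanceGroups": [
                # Primary and core nodes stay On-Demand so that a Spot
                # interruption cannot take down the cluster or HDFS.
                {"Name": "primary", "InstanceRole": "MASTER",
                 "Market": "ON_DEMAND", "InstanceType": "m5.xlarge",
                 "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "Market": "ON_DEMAND", "InstanceType": "m5.xlarge",
                 "InstanceCount": 2},
                # Task nodes hold no HDFS data, so they are the usual
                # candidates for cheaper Spot capacity.
                {"Name": "task", "InstanceRole": "TASK",
                 "Market": "SPOT", "InstanceType": "m5.xlarge",
                 "InstanceCount": task_nodes},
            ],
            # Terminate the cluster once all steps finish.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        # Default role names created by the EMR console/CLI.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_cluster_request("nightly-spark-job")
# With AWS credentials configured, the request could be submitted with:
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

The submission call is left commented out so the sketch runs without AWS credentials.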

Pricing

Component       Price
EMR fee         $0.10-0.50/hour per instance
EC2 instances   Standard EC2 pricing
EBS storage     Standard EBS pricing
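
To show how these components add up, here is a minimal cost sketch for a single cluster run. The function name `estimate_emr_cost` and the rates in the example are hypothetical: the EC2 rate is a placeholder, the EMR fee is taken from the low end of the range above, and EBS storage is left out. The one-minute minimum reflects how per-second billing is normally applied and should be checked against current AWS pricing.

```python
def estimate_emr_cost(instances, runtime_seconds, ec2_hourly, emr_hourly,
                      min_billed_seconds=60):
    """Estimate compute cost in USD for one cluster run (EBS excluded).

    Each instance is charged its EC2 rate plus the EMR fee, prorated
    per second, with an assumed one-minute minimum.
    """
    billed_seconds = max(runtime_seconds, min_billed_seconds)
    per_instance_hourly = ec2_hourly + emr_hourly
    return instances * per_instance_hourly * billed_seconds / 3600


# Hypothetical rates: $0.20/hour EC2 plus a $0.10/hour EMR fee,
# for a 10-instance cluster running for 2 hours.
cost = estimate_emr_cost(instances=10, runtime_seconds=2 * 3600,
                         ec2_hourly=0.20, emr_hourly=0.10)
print(f"${cost:.2f}")
```

Because the 5-15 minute startup is also billed, short jobs pay proportionally more overhead than long ones.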

Alternatives

Service          Use Case
EMR Serverless   No cluster management
Glue             ETL-focused
Athena           SQL-only
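
The table can be read as a simple decision rule. The sketch below encodes it in a hypothetical helper, `suggest_service`. The order of the checks is an assumption made for illustration, and real choices depend on factors this review does not cover.

```python
def suggest_service(sql_only: bool, etl_focused: bool,
                    manage_cluster: bool) -> str:
    """Map coarse workload traits to a service from the table above."""
    if sql_only:
        return "Athena"          # SQL-only workloads
    if etl_focused:
        return "Glue"            # ETL-focused workloads
    if not manage_cluster:
        return "EMR Serverless"  # no cluster management wanted
    return "EMR"                 # full control over the cluster


print(suggest_service(sql_only=False, etl_focused=False, manage_cluster=True))
```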

Verdict

Grade: B

Best for:

When to use EMR Serverless instead:


Researcher 🔬 — Staff Software Architect