EMR vs Databricks costs

2018-12-10 2 min read

    It’s frustrating when vendors introduce their own currency in what seems to be a way to obfuscate pricing. The most recent example is Databricks, which offers a slick Spark hosting solution on top of AWS and Azure. Unfortunately, instead of being explicit about prices, they introduced the Databricks Unit (DBU), which translates into dollars based on the type of usage - ranging from a simple Spark cluster with limited optimizations (Basic Plan) to an interactive one with all sorts of behind-the-scenes performance tweaks (Data Analytics Plan).

    The nice thing is that Databricks is transparent about the number of DBUs per EC2 instance and the price per DBU, so all it took was a bit of data cleanup to dump everything into a spreadsheet and then do the lookups and math to compare EMR and Databricks pricing.

    It turns out that the Databricks Basic plan is comparable to standard EMR - in some cases it’s more expensive and in some cases it’s significantly cheaper. For example, an i2.xlarge costs $0.213/hour in AWS EMR but 1.5 DBUs (equivalent to $0.105/hour) in Databricks. At the same time, an i3.16xlarge costs $0.270/hour in AWS EMR but 16 DBUs (equivalent to $1.120/hour) in Databricks. That’s a huge range: the i2.xlarge is less than half the cost in Databricks, but the i3.16xlarge costs more than 4 times as much in Databricks as in AWS. In general, Databricks is more expensive for the larger instance types and cheaper for the smaller ones, and I’d be curious to understand the reasoning. Also note that this is just using the Basic plan - Databricks has other plans, which are never cheaper than the EMR equivalent.
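The spreadsheet math behind those two examples can be sketched in a few lines of Python. This is only an illustration: the $0.07 per DBU rate is an assumption inferred from the figures above (1.5 DBUs working out to $0.105/hour), and the names `INSTANCES`, `databricks_hourly_cost` and `cost_ratio` are made up for the sketch.

```python
# Assumed Basic plan rate, inferred from the post's figures
# (1.5 DBUs -> $0.105/hour, 16 DBUs -> $1.120/hour).
BASIC_PLAN_USD_PER_DBU = 0.07

# instance type -> (EMR price in $/hour, Databricks DBUs per hour),
# limited to the two instance types quoted in the post.
INSTANCES = {
    "i2.xlarge": (0.213, 1.5),
    "i3.16xlarge": (0.270, 16.0),
}


def databricks_hourly_cost(dbus, usd_per_dbu=BASIC_PLAN_USD_PER_DBU):
    """Convert a DBU count into dollars per hour."""
    return dbus * usd_per_dbu


def cost_ratio(instance_type):
    """Databricks cost divided by EMR cost; below 1.0 means Databricks is cheaper."""
    emr_usd, dbus = INSTANCES[instance_type]
    return databricks_hourly_cost(dbus) / emr_usd


if __name__ == "__main__":
    for name, (emr_usd, dbus) in INSTANCES.items():
        print(
            f"{name}: EMR ${emr_usd:.3f}/h, "
            f"Databricks ${databricks_hourly_cost(dbus):.3f}/h, "
            f"ratio {cost_ratio(name):.2f}x"
        )
```

A ratio below 0.5 for the i2.xlarge and above 4 for the i3.16xlarge matches the "less than half" and "more than 4 times" comparison above; other plans would only need a different `usd_per_dbu` value.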

    I’ve included a screenshot of the analysis below but all the data is also available on a shared Google Spreadsheet.

    EMR vs Databricks costs by instance type