Commit 28326bc3 authored by Carp

intro

parent 404db7ef
......@@ -12,14 +12,18 @@ When running Spark you have a few options:
|---|---|---|---|
| IaaS | You build the Spark cluster yourself on Azure VMs and manage the infrastructure, upgrades, etc. | <ul><li>Has the most flexibility</li><li>If your workload runs 24x7, this is likely the cheapest option in terms of Azure spend</li></ul> |Has the highest Ops overhead|
| PaaS/HDInsight | Within an hour you are running a Spark cluster (in VMs), configured however you like, with the working Hadoop tooling you have come to expect |<ul><li>Cheaper if your workload is not 24x7</li><li>Requires less Ops overhead than IaaS</li></ul> |<ul><li>Since it is PaaS it is slightly more expensive than IaaS</li></ul> |
| Azure Databricks | Pure PaaS Spark offering (some call it Spark-as-a-Service) created by the inventors of Spark. Connects to ADLS for ad-hoc analytics workloads that are better developed with Python or Scala in a notebook experience. | <ul><li>Zero Ops requirement</li><li>Data engineers and data scientists are immediately productive</li><li>Perfect for getting started with Spark quickly or for quick ADLS analytics with a Jupyter-like notebook experience</li></ul> |Not many configuration/extensibility options.|
Databricks is a cloud-based Spark platform that removes much of the Ops burden of running a Spark cluster, so you can focus on your ETL streams and analytics. Originally Databricks (http://databricks.com) was solely an AWS offering. The Azure version recently reached general availability (GA), with native integration with other Azure services such as ADLS and WASB.
The Databricks service provides you with a master, workers, and executors, just like regular Spark, but provisioning them is automated and requires no configuration on your part.
The typical use case is a customer that has an immediate need for Spark-based analytics for a defined period of time. With a PaaS offering you must terminate the service when it is not in use to reduce your costs.
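The cost trade-off between an always-on cluster and a PaaS cluster that is terminated when idle can be sketched with some simple arithmetic. This is only an illustration: the hourly rates and the helper names `monthly_cost` and `break_even_hours` are hypothetical placeholders, not Azure prices or part of any Azure API.

```python
# Hypothetical hourly rates, chosen only to illustrate the reasoning.
# They are NOT real Azure prices.
ALWAYS_ON_RATE = 1.00  # $/hour, self-managed cluster that runs 24x7
ON_DEMAND_RATE = 1.30  # $/hour, PaaS cluster (slightly more expensive per hour)


def monthly_cost(hourly_rate: float, hours_per_day: float, days: int = 30) -> float:
    """Cost of a cluster that is billed only for the hours it is running."""
    return hourly_rate * hours_per_day * days


def break_even_hours(always_on_rate: float, on_demand_rate: float) -> float:
    """Daily usage (in hours) above which the always-on cluster is cheaper."""
    return 24 * always_on_rate / on_demand_rate


if __name__ == "__main__":
    always_on = monthly_cost(ALWAYS_ON_RATE, hours_per_day=24)
    on_demand = monthly_cost(ON_DEMAND_RATE, hours_per_day=8)
    print(f"Always-on cluster:        ${always_on:.2f} per month")
    print(f"PaaS cluster, 8 h/day:    ${on_demand:.2f} per month")
    print(f"Break-even usage per day: "
          f"{break_even_hours(ALWAYS_ON_RATE, ON_DEMAND_RATE):.1f} hours")
```

Under these assumed rates the PaaS cluster is cheaper whenever it runs for less than the break-even number of hours per day, which is why terminating it when idle matters.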
## How does this fit in the Azure ecosystem?
<img src="/assets/intro02.PNG">
## Azure Databricks Workloads and Pricing
......