Commit 404db7ef authored by Carp


parent 6e8634ec
@@ -6,5 +6,6 @@ This repo is meant to be the starting point for demos and presentations around Azure Databricks.
[Presentation Notes and Documentation](notes/) is the "Getting Started" guide/100-level content. This can be converted into a pptx if needed.
[Installation](notes/) covers how to demo the installation of Azure Databricks using the portal.
[Navigating Your Workspace](notes/) gives a tour of the Databricks interface.
[assets folder](assets/) contains the screenshots that can be incorporated into a pptx later.
@@ -3,6 +3,9 @@
Databricks is often described as "managed Spark". So what is Spark?
<img src="/assets/intro01.PNG">
When running Spark you have a few options:
| Deployment Option | Description | Pros | Cons |
| --- | --- | --- | --- |
@@ -56,15 +56,15 @@ Serverless pools are ideal for SQL, Python, and R workloads. They are light-touch
Let's compare the two offerings by configuring *similar* Databricks clusters using both options:
**Screenshot of Serverless Pools options:**
<img src="/assets/install04.PNG">
**Screenshot of Standard Cluster options:**
<img src="/assets/install05.PNG">
**Note the following differences:**
* configuring a standard cluster exposes far more options
* a standard cluster can be set to "auto-terminate" after a period of inactivity. While this sounds like a good idea, you may find your cluster and its data gone if you set the inactivity threshold too low. Be sure to set it high enough for your workload.
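The auto-terminate threshold is set as part of the cluster configuration. As a rough sketch, a Clusters API request body might look like the following (`autotermination_minutes` is the relevant field; the name, node type, and values here are hypothetical placeholders):

```json
{
  "cluster_name": "demo-cluster",
  "spark_version": "<runtime-version>",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 2,
  "autotermination_minutes": 120
}
```

A value of `0` disables auto-termination entirely, in which case the cluster runs (and bills) until it is terminated manually.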
@@ -15,7 +15,12 @@ Your screen may look different based on the cluster setup configuration.
`Edit` allows you to reconfigure the underlying Spark VMs.
`Clone` allows you to create a replica Databricks cluster.
`Restart` will allow you to "reboot the cluster" if it is unresponsive or a Driver job is hung and cannot be restarted. (This is rare).
`Terminate` will deprovision the cluster and stop billing and consumption spend. A terminated cluster can be restarted without data loss for up to 7 days. This is useful for controlling spend when you only need the cluster for, say, weekly processing and don't want to rebuild your analytics and data environment every week. To delete a cluster:
* use the Databricks REST API
* delete the workspace, which deletes ALL of its clusters

After approximately 7 days, any terminated clusters are removed from the workspace automatically and can no longer be restarted.
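As a sketch of the REST API route, the call might look like the following (assuming the Clusters API 2.0 `permanent-delete` endpoint; the workspace URL, token, and cluster ID are placeholders you would substitute):

```python
import json
import urllib.request

# Placeholders: substitute your workspace URL and a personal access token.
WORKSPACE_URL = "https://<your-workspace>.azuredatabricks.net"
TOKEN = "<personal-access-token>"


def permanent_delete_url(workspace_url: str) -> str:
    """Build the Clusters API 2.0 permanent-delete endpoint URL."""
    return workspace_url.rstrip("/") + "/api/2.0/clusters/permanent-delete"


def delete_cluster(cluster_id: str) -> None:
    """Permanently delete a terminated cluster so it no longer appears in the workspace."""
    req = urllib.request.Request(
        permanent_delete_url(WORKSPACE_URL),
        data=json.dumps({"cluster_id": cluster_id}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()  # a 200 response means the cluster was deleted
```

Deleting the workspace removes every cluster in it at once, so the API route is the safer option when other teams share the workspace.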
`Spark UI` lets you see any running jobs. From here you can view the Spark equivalent of an "explain" plan, known as a DAG (directed acyclic graph), which can help you determine when a job will finish, whether it needs more resources, or where the bottleneck is in your code.
<img src="/assets/nav02.PNG">