A new look for Delight, the free, cross-platform monitoring UI for Spark

What is Delight?

Delight is a free, cross-platform monitoring UI for Apache Spark featuring:

  • Spark CPU Usage Metrics, aggregated across all executors
  • Executor Memory Metrics, available for each executor
  • A timeline of your Spark jobs and stages, and of executor addition/removal events
  • Access to the Spark UI (we host the Spark History Server for you)

You can install it on top of any existing Spark infrastructure (EMR, Databricks, open-source Spark-on-Kubernetes, Cloudera/Hortonworks, and others) by attaching an open-source agent to your Spark applications. See the installation instructions on the open-source repository: https://github.com/datamechanics/delight

Diagram showing connections between Spark infrastructure and Delight

Delight consists of an open-source agent attached to your Spark job, and a hosted backend accessible at delight.datamechanics.co

What’s new with Delight?

Delight was originally launched by the Data Mechanics team in April 2021. One year after Data Mechanics’ acquisition by Spot by NetApp, the team is happy to release a new, more intuitive and user-friendly version of the Delight user interface.

When you log in to Delight, the main dashboard features a list of recently completed Spark applications (note: Spark applications only appear a few minutes after their completion). You can identify an application by its name, start date, and duration. The following statistics are available:

  • I/O: Volume of data read and written by Spark
  • Executor CPU Uptime: The total core-hours used by your Spark executors. For example, if your application had 10 executors with 4 cores each, running for 1 hour, then the Executor CPU Uptime amounts to 40 core-hours.
  • Spark Tasks: The sum of the durations of all the Spark tasks that ran in your application.
  • Efficiency Ratio: Calculated as “Spark Tasks Duration / Executor CPU Uptime”, this is a measure of parallelism. An efficiency close to 100% means that your executor cores were busy running Spark tasks almost all the time.
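
To make the last two statistics concrete, here is a minimal Python sketch of the arithmetic. It is only an illustration, not Delight's actual code: the function names are made up for this example, it assumes each executor is described by its core count and the hours it was alive, and the 30 hours of task time is a hypothetical figure.

```python
def executor_cpu_uptime(executors):
    """Total core-hours, given (cores, hours_alive) pairs, one per executor."""
    return sum(cores * hours_alive for cores, hours_alive in executors)


def efficiency_ratio(task_duration_hours, cpu_uptime_core_hours):
    """Spark Tasks Duration / Executor CPU Uptime."""
    if cpu_uptime_core_hours <= 0:
        raise ValueError("Executor CPU Uptime must be positive")
    return task_duration_hours / cpu_uptime_core_hours


# The example from the text: 10 executors, 4 cores each, alive for 1 hour.
executors = [(4, 1.0)] * 10
uptime_core_hours = executor_cpu_uptime(executors)  # 40.0 core-hours

# Hypothetical: the tasks of this application ran for 30 hours in total.
efficiency = efficiency_ratio(30.0, uptime_core_hours)  # 0.75

print(f"Executor CPU Uptime: {uptime_core_hours} core-hours")
print(f"Efficiency Ratio: {efficiency:.0%}")
```

In this made-up case the executor cores sat idle for a quarter of the time they were allocated.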

Delight Applications List

Once you dive into an application’s page, you’ll see a graph showing the Executor Cores usage over time. Under this graph you’ll see a timeline of your Spark jobs and stages. You can use this graph to understand the performance bottleneck of your Spark application, or identify specific Spark jobs and stages with insufficient parallelism.

Delight showing Executor usage over time

The gray area indicates that some executor cores are idle. This could be due to an insufficient number of partitions, or to straggler tasks, which are commonly caused by data skew.
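
A rough back-of-the-envelope model shows how both causes leave cores idle. The Python sketch below is a simplification under stated assumptions, not how Delight computes its graph: it assumes one core per task, ignores scheduling overhead, and uses hypothetical function names and numbers.

```python
import math


def stage_core_utilization(num_tasks, total_cores):
    """Fraction of core-time spent on tasks when all tasks take equally long.

    Tasks run in "waves" of at most total_cores tasks; cores left without
    a task in the last wave stay idle until the stage finishes.
    """
    waves = math.ceil(num_tasks / total_cores)
    return num_tasks / (waves * total_cores)


def skewed_stage_utilization(task_durations, total_cores):
    """Utilization of a single-wave stage whose tasks have unequal durations.

    The stage lasts as long as its slowest (straggler) task.
    """
    if len(task_durations) > total_cores:
        raise ValueError("this simple model only covers a single wave of tasks")
    return sum(task_durations) / (total_cores * max(task_durations))


# Too few partitions: 20 tasks on 40 cores leave half of the cores idle.
print(stage_core_utilization(20, 40))  # 0.5

# 50 tasks on 40 cores: the second wave only keeps 10 cores busy.
print(stage_core_utilization(50, 40))  # 0.625

# Data skew: 39 tasks take 1 minute, one straggler takes 10 minutes.
print(skewed_stage_utilization([1.0] * 39 + [10.0], 40))
```

In the skewed case the model gives a utilization of roughly 12%, because 39 cores wait for a single straggler.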

Delight also collects memory usage over time for each of your Spark executors, recording memory usage from the Java Virtual Machine, Python, and Other processes. You can use this information to tune memory allocation. As a rule of thumb, keep the maximum memory usage under 90% of the executor's total memory capacity, so that you have a bit of breathing room and do not run into OutOfMemory errors.
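
The 90% guideline can be checked with a few lines of Python. This is a sketch with invented numbers: the sample layout (one dict per point in time, in GiB) and the function name are assumptions made for this example, not a Delight export format.

```python
def peak_memory_fraction(samples, capacity_gib):
    """Peak total memory usage as a fraction of the executor's capacity."""
    peak_gib = max(s["jvm"] + s["python"] + s["other"] for s in samples)
    return peak_gib / capacity_gib


# Hypothetical memory samples for one executor, in GiB.
samples = [
    {"jvm": 4.0, "python": 1.0, "other": 0.5},
    {"jvm": 5.5, "python": 1.5, "other": 0.5},  # peak: 7.5 GiB
    {"jvm": 5.0, "python": 1.0, "other": 0.5},
]

fraction = peak_memory_fraction(samples, capacity_gib=8.0)  # 7.5 / 8.0
has_headroom = fraction < 0.90

print(f"Peak usage: {fraction:.1%} of capacity")
print("Enough headroom" if has_headroom else "Consider allocating more memory")
```

Here the peak reaches 93.75% of an 8 GiB executor, so this made-up executor would be a candidate for a larger memory allocation.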

Delight displaying memory usage over time

What are the next steps for Delight?

The goal of Delight is to make it easy for Spark developers to understand the performance bottlenecks of their Spark jobs, so that they benefit from better stability, better performance, and lower costs. Delight displays insightful CPU and memory metrics to help developers troubleshoot common issues such as insufficient parallelism, slow shuffle, slow I/O, memory errors, and more.

The team behind Delight is currently focused on building Ocean for Apache Spark, the fully managed, continuously optimized Spark-on-Kubernetes service. In fact, Ocean Spark customers have access to the Delight visualizations without any configuration.

We’ve had a lot of great feedback from Delight users and are planning to deliver more improvements to Delight in the future, including:

  • Real-time monitoring of live Spark applications. (Today, the Delight application page is only available after the application’s completion.)
  • More login methods for Delight. (Currently, Google SSO is supported.)
  • Automated performance tuning recommendations. (Memory issues, slow shuffle, bad parallelism / data skew, slow I/O, etc.)