
Monitoring Databricks jobs through calls to the REST API | by Georgia Deaconu | Oct, 2022

Monitoring jobs that run in a Databricks production environment requires not only setting up alerts in case of failure but also being able to easily extract statistics about jobs running time, failure rate, most frequent failure cause, and other user-defined KPIs.

Through its UI, the Databricks workspace provides a fairly easy and intuitive way of visualizing the run history of individual jobs. The matrix view, for instance, allows for a quick overview of recent failures and shows a rough comparison of run times between the different runs.

The job runs, matrix view (Image by the Author)

What about computing statistics about failure rates or comparing average run times between different jobs? This is where things become less straightforward.

The job runs tab in the Workflows panel shows the list of all the jobs that have run in the last 60 days in your Databricks workspace. But this list cannot be exported directly from the UI, at least at the time of writing.

Job runs tab in the Workflow panel shows the list of jobs that run in the last 60 days in your workspace (Image by the Author)

Luckily, the same information (and some extra details) can be extracted through calls to the Databricks jobs list API. The data is retrieved in JSON format and can easily be transformed into a DataFrame, from which statistics and comparisons can be derived.

In this post, I will show how to connect to the Databricks REST API from a Jupyter notebook running in your Databricks workspace, extract the desired information, and perform some basic monitoring and analysis.

To connect to the Databricks API you will first need to authenticate, in the same way you are asked to do when connecting through the UI. In my case, I will use a Databricks personal access token generated through a call to the Databricks Token API for authentication, in order to avoid storing connection information in my notebook.

First, we need to configure the call to the Token API by providing the request URL, the request body, and its headers. In the example below, I am using Databricks secrets to extract the Tenant ID and build the API URL for a Databricks workspace hosted by Microsoft Azure. The resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d represents the Azure programmatic ID for Databricks, while the Application ID and Password are also extracted from the Databricks secrets.
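A minimal sketch of this configuration is shown below. The secret scope and key names ("monitoring-kv", "tenant-id", and so on) are placeholders to adapt to your own workspace; dbutils is available out of the box in a Databricks notebook.

```python
# Sketch of the Token API configuration -- secret scope and key names are placeholders
tenant_id = dbutils.secrets.get(scope="monitoring-kv", key="tenant-id")
client_id = dbutils.secrets.get(scope="monitoring-kv", key="application-id")
client_secret = dbutils.secrets.get(scope="monitoring-kv", key="password")

# Azure AD token endpoint for the workspace tenant
token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"

token_headers = {"Content-Type": "application/x-www-form-urlencoded"}
token_body = {
    "grant_type": "client_credentials",
    "client_id": client_id,
    "client_secret": client_secret,
    # Azure programmatic ID for Databricks, as mentioned above
    "resource": "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d",
}
```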

It is good practice to use Databricks secrets to store this type of sensitive information and avoid entering credentials directly into a notebook. Otherwise, all the calls to dbutils.secrets can be replaced with the explicit values in the code above.

After this setup, we can simply call the Token API using Python’s requests library and generate the token.
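A sketch of that call, using the configuration above (the token endpoint returns the token in the access_token field of the JSON response):

```python
import requests

# Call the token endpoint and extract the generated access token
token_response = requests.post(token_url, headers=token_headers, data=token_body)
token_response.raise_for_status()
access_token = token_response.json()["access_token"]
```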

Now that we have our personal access token, we can configure the call to the Databricks Jobs API. We need to provide the URL for the Databricks instance, the targeted API (in this case jobs/runs/list to extract the list of job runs), and the API version (2.1 is currently the most recent). We use the previously generated token as the bearer token in the header for the API call.
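The configuration could look like the following; the workspace URL here is a placeholder to be replaced with your own Databricks instance.

```python
# Placeholder workspace URL -- replace with your own Databricks instance
instance_url = "https://adb-1234567890123456.7.azuredatabricks.net"
runs_list_url = f"{instance_url}/api/2.1/jobs/runs/list"

# Use the previously generated token as the bearer token
api_headers = {"Authorization": f"Bearer {access_token}"}
```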

By default, the returned response is limited to a maximum of 25 runs, starting from the provided offset. I created a loop to extract the full list based on the has_more attribute of the returned response.
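A sketch of such a loop, paging through the results with the limit and offset parameters until has_more becomes false:

```python
# Page through the runs list until the API reports no more results
all_runs = []
offset = 0
while True:
    params = {"limit": 25, "offset": offset}
    response = requests.get(runs_list_url, headers=api_headers, params=params)
    response.raise_for_status()
    payload = response.json()
    runs = payload.get("runs", [])
    all_runs.extend(runs)
    if not payload.get("has_more", False):
        break
    offset += len(runs)
```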

The API call returns the job runs as a list of JSON objects, and I used Pandas' json_normalize to convert this list into a DataFrame. This operation converts the data to the following format:

Job run information retrieved through the API call (Image by the Author)
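The conversion itself is a one-liner with pandas, sketched here on the all_runs list built above:

```python
import pandas as pd

# Flatten the nested JSON run descriptions into a tabular DataFrame;
# nested fields such as state.result_state become dotted column names
runs_df = pd.json_normalize(all_runs)
runs_df.head()
```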

To include task and cluster details in the response you can set the expand_tasks parameter to True in the request params as stated in the API documentation.
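For example, the request parameters in the loop above could become:

```python
# Ask the API to also return task and cluster details for each run
# (passed as the string "true" so it serializes correctly in the query string)
params = {"limit": 25, "offset": offset, "expand_tasks": "true"}
```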

Starting from this information, we can perform some monitoring and analysis. For instance, I used the state.result_state field to compute the percentage of failed runs in the last 60 days:

(Image by the Author)
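A minimal sketch of this computation on the DataFrame built above, assuming the standard FAILED / SUCCESS values in state.result_state:

```python
# Percentage of runs whose final state is FAILED
failed_runs = (runs_df["state.result_state"] == "FAILED").sum()
failure_rate = 100 * failed_runs / len(runs_df)
print(f"{failure_rate:.1f}% of the runs failed over the period")
```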

Many useful statistics can be easily extracted, such as the number of failed jobs each day across all scheduled Databricks jobs. We can have a quick overview of the error messages logged by the clusters for the failed jobs by looking at the column state.state_message.
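Both can be obtained with a couple of pandas operations; a sketch follows, relying on the fact that the API returns start_time as epoch milliseconds.

```python
# Convert the epoch-millisecond start_time into a calendar date
runs_df["start_date"] = pd.to_datetime(runs_df["start_time"], unit="ms").dt.date

# Number of failed runs per day, across all jobs
failed_per_day = (
    runs_df[runs_df["state.result_state"] == "FAILED"]
    .groupby("start_date")
    .size()
)

# Most frequent error messages among the failed runs
top_errors = (
    runs_df.loc[runs_df["state.result_state"] == "FAILED", "state.state_message"]
    .value_counts()
    .head(10)
)
```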

Because we have access to each run’s start and end time we can easily visualize any trend and detect potential problems early on.

Job run time as a function of run date (Image by the Author)
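A plot along these lines could be produced as follows (a sketch using matplotlib, restricted to completed runs since end_time is 0 for runs still in progress):

```python
import matplotlib.pyplot as plt

# Keep only completed runs and compute their duration in minutes
completed = runs_df[runs_df["end_time"] > 0].copy()
completed["duration_min"] = (completed["end_time"] - completed["start_time"]) / 60000

plt.figure(figsize=(10, 4))
plt.scatter(
    pd.to_datetime(completed["start_time"], unit="ms"),
    completed["duration_min"],
    s=10,
)
plt.xlabel("Run start date")
plt.ylabel("Run duration (minutes)")
plt.title("Job run time as a function of run date")
plt.show()
```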

Once we have access to the data in this easy-to-use format, the monitoring KPIs we choose to compute will depend on the type of application. The code computing these KPIs can be stored in a notebook that is scheduled to run regularly and that sends out monitoring reports.

