Use Spark interims to troubleshoot and polish low-code Spark pipelines: Part 2

In Part 1, we learned an easy way to troubleshoot a data pipeline using historical, read-only metadata. Now, I want to dig in and polish my individual Spark DataFrames.

Anya Bida
Assistant Director of R&D
Texas Rangers Baseball Club
June 8, 2023

In Part 1, we learned an easy way to troubleshoot a data pipeline using historical, read-only metadata. Now I want to dig in and polish my individual Spark DataFrames (or RDDs). Here, I have temporarily disabled column pruning so that we can sample the data output from each DataFrame.

Fig 1. Each interim represented in the Spark UI (right) corresponds to the data sample in the Prophecy pipeline (left) with the matching color arrow. 

Let's see how the data pipeline could be improved. Interims show me sample data for each step of my pipeline. Let's iterate.
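To make the idea of an interim concrete, here is a minimal plain-Python sketch of the concept: run each step of a pipeline in order and keep a row count plus a small sample of that step's output. This is only an analogy and not how Prophecy or Spark implements interims. The helper `run_with_interims`, the `orders` rows, and the step names are all hypothetical and invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Step = Tuple[str, Callable[[List[Row]], List[Row]]]


def run_with_interims(
    rows: List[Row], steps: List[Step], sample_size: int = 3
) -> Tuple[List[Row], Dict[str, dict]]:
    """Run each step in order and record a small sample of its output.

    Hypothetical helper: it mimics the idea of an interim (a data sample
    captured between pipeline steps) using plain Python lists of dicts.
    """
    interims: Dict[str, dict] = {}
    current = rows
    for name, transform in steps:
        current = transform(current)
        interims[name] = {
            "row_count": len(current),
            "sample": current[:sample_size],  # first few rows only
        }
    return current, interims


# Hypothetical input rows and steps, invented for this illustration.
orders = [
    {"order_id": 1, "amount": 20.0, "status": "shipped"},
    {"order_id": 2, "amount": 0.0, "status": "cancelled"},
    {"order_id": 3, "amount": 35.5, "status": "shipped"},
]

steps = [
    ("filter_shipped", lambda rs: [r for r in rs if r["status"] == "shipped"]),
    (
        "add_tax",
        lambda rs: [{**r, "amount_with_tax": round(r["amount"] * 1.1, 2)} for r in rs],
    ),
]

result, interims = run_with_interims(orders, steps)

# Inspect what each step produced, one interim per step.
for name, info in interims.items():
    print(name, info["row_count"], info["sample"])
```

Looking at the sample after each step is what lets me spot where a row was dropped or a column went wrong, without rerunning the whole pipeline by hand.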

Now I understand how my individual DataFrames behave, and I'm happy with my pipeline. As usual, I can view my PySpark code changes and push them to my Git repo.

Interim data sampling makes my troubleshooting easier: I can conceptualize the visual flow, compare historical runs (see Part 1 of this blog), and inspect individual DataFrames, all in a low-code interface for Spark. Finally, Spark has a visual IDE!

How can I try Prophecy?

Prophecy is available as a SaaS product where you can add your Databricks credentials and start using it with Databricks. Or you can use an Enterprise Trial with Prophecy's Databricks account for a couple of weeks to kick the tires with examples. We also support installing Prophecy in your network (VPC or on-prem) on Kubernetes. Sign up for your 14-day free trial account here.
