Deep dive into Prophecy for Databricks

Just a few days ago, we announced Prophecy for Databricks. In this blog post, Part 2 of that announcement, we dig into how Prophecy makes data engineering simple for any data practitioner on Databricks.

Maciej Szpakowski
June 23, 2022


Just a few days ago, we announced Prophecy for Databricks. This is a huge milestone for our team, as we've integrated our low-code platform to work seamlessly for Databricks users.

This blog post is Part 2 of that announcement, where we deep-dive into how Prophecy makes data engineering simple for any data practitioner on Databricks. Whether you're a seasoned Apache Spark developer or new to Spark and just learning to code, you can now be productive in a matter of seconds.

Onboarding

Getting started with Prophecy for Databricks is very easy! Prophecy is a Databricks partner, so we're available directly within the Databricks UI via Partner Connect.

When using Partner Connect, you're connecting to Prophecy's public SaaS, which is a great option for smaller teams to kick the tires quickly and easily. However, if you're more security-conscious or interested in the Enterprise version, you can also install Prophecy directly in your private network (VPC) on AWS, Azure, or GCP.

Read more about it here and contact us when you're ready!

Your First Pipeline

PySpark Support

When we first built Prophecy, we focused on generating the highest-quality, most performant code. That's why we chose Scala as the default language, for its type safety and close JVM integration.

Since the initial release, we've learned that many data engineers aren't used to Scala's complex syntax. Python, on the other hand, is one of the most popular languages for data science and is known for its simplicity. The majority of Databricks users work in PySpark, so PySpark support has been our most requested feature.

Today, we're tremendously excited that Prophecy PySpark support is GA. This means you can develop pipelines using gems and get the generated code in Python. You can also write column expressions and embed scripts directly in Python, and you can even write native PySpark UDFs in Prophecy!
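
As a concrete illustration of what native PySpark UDF support enables, here is a minimal sketch of the kind of UDF you could author and use inside a pipeline. The function and column names are our own illustrative examples, not Prophecy-generated code:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

# A plain Python function registered as a native PySpark UDF.
@F.udf(returnType=StringType())
def normalize_country(code):
    # Illustrative cleanup: trim whitespace and upper-case free-form country codes.
    return code.strip().upper() if code is not None else None

df = spark.createDataFrame([("us ",), ("De",), (None,)], ["country"])
df.withColumn("country_code", normalize_country("country")).show()
```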

Delta Integration

Prophecy works really well with Delta. Whether you're simply reading and writing to Delta tables or using more complex constructs like slowly changing dimensions, we've got you covered. Prophecy automatically applies best practices to your tables (such as Z-ordering) and generates the most efficient, Databricks-approved code for you.
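
For reference, the hand-written PySpark equivalent of a simple Delta read/write with Z-ordering looks roughly like the sketch below; the table and column names are placeholders, and Prophecy generates comparable code from gems:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read from an existing Delta table, clean it, and write the result back as Delta.
orders = spark.table("bronze.orders")
cleaned = orders.dropDuplicates(["order_id"]).filter("amount > 0")
cleaned.write.format("delta").mode("overwrite").saveAsTable("silver.orders")

# Z-order on a frequently filtered column, one of the Delta best practices mentioned above.
spark.sql("OPTIMIZE silver.orders ZORDER BY (customer_id)")
```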

To learn more about the Prophecy + Delta integration, read our blog on it here!

The Code & Git

Every project in Prophecy is a fully fledged git repository that can be integrated with your git provider of choice. Every change to gems (the visual elements) generates high-quality code, in either PySpark or Scala, that is committed to a specific branch. Additionally, Prophecy ensures the generated code follows software engineering best practices.

As companies mature and adopt standard software engineering practices, the process involves committing your code changes, resolving any git conflicts, pushing them to release branches, getting them approved, running integration tests, building all the artifacts, and finally deploying them to your clusters.

Phew, that's a lot! And it requires piecing together scripts to make it all work. Thankfully, Prophecy automates most of those steps for you, cutting down on repetitive, error-prone work.
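
To make the testing step concrete, here is a small, hypothetical pytest example of the kind of check a CI job could run against pipeline code; the transformation function is illustrative and not actual Prophecy output:

```python
import pytest
from pyspark.sql import SparkSession

# Illustrative transformation standing in for one generated pipeline step.
def filter_valid_orders(df):
    return df.filter("amount > 0")

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[1]").appName("ci-tests").getOrCreate()

def test_filter_valid_orders_drops_non_positive_amounts(spark):
    df = spark.createDataFrame([(1, 10.0), (2, -5.0), (3, 0.0)], ["order_id", "amount"])
    result = filter_valid_orders(df)
    assert [row.order_id for row in result.collect()] == [1]
```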

Are your DevOps practices more complex than that? No problem - check out our guide on how Prophecy makes the most difficult git setups easy again.

Databricks Workflows

After you've finished developing your pipelines, it's time to schedule them. With Prophecy for Databricks you have two options: schedule your pipelines with Databricks Workflows (recommended) or with Apache Airflow (a more advanced option).

Prophecy makes it really easy to develop schedules. Scheduling a single pipeline is just a click away, and building complex pipeline dependency graphs has never been simpler.
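
Under the hood, a Databricks Workflows schedule is just a job definition. If you were to create one by hand against the standard Databricks Jobs API 2.1, a minimal sketch would look like this (the workspace URL, token, notebook path, cluster id, and cron expression are placeholders):

```python
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
DATABRICKS_TOKEN = "<personal-access-token>"                       # placeholder

# Minimal job: run one notebook task every day at 02:00 UTC.
job_spec = {
    "name": "daily-orders-pipeline",
    "tasks": [
        {
            "task_key": "run_pipeline",
            "notebook_task": {"notebook_path": "/Repos/team/pipelines/orders"},
            "existing_cluster_id": "<cluster-id>",
        }
    ],
    "schedule": {"quartz_cron_expression": "0 0 2 * * ?", "timezone_id": "UTC"},
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```

With Prophecy, the click-to-schedule flow takes care of this kind of boilerplate for you.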

Execution Metrics

What's next? After we've built and deployed our pipeline, we want to make sure it runs as reliably as possible. To that end, Prophecy captures additional metrics and data profiles on every single run, so you can easily observe how your data changes from run to run, or set up automated alerts.
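
To give a sense of what a run-level data profile can contain, here is a minimal PySpark sketch that computes a row count and per-column null counts; it is purely illustrative and not Prophecy's internal implementation:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.table("silver.orders")  # placeholder table name

# Row count plus per-column null counts: the simplest kind of run-level profile.
profile = df.agg(
    F.count(F.lit(1)).alias("row_count"),
    *[F.sum(F.col(c).isNull().cast("int")).alias(f"{c}_null_count") for c in df.columns],
)
profile.show()
```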

Extensibility - Reusable Logic

We've built Prophecy from the ground up to be a usable and extensible platform, and our customers love it! With just a little bit of coding, anyone can extend Prophecy by adding new data sources or building new data transformations as easy-to-use gems. These capabilities empower teams across industries and skill sets.

Today, we've taken it a step further. You can now not only build individual gems, but also package multiple gems together into a reusable subgraph. Subgraphs are sets of gems that perform specific business logic and can be easily reused and versioned by your team.

As an example, let's say you have a specific way of computing monthly recurring revenue (MRR) that your team applies to many different data sources (e.g., Stripe, Salesforce). Normally, whenever MRR needs to be computed, every developer would recreate the same set of transformations (Reformat to clean the data, Aggregate to sum the amounts by month, and OrderBy to sort the months from latest to earliest). Now, one person can define the business logic once, wrap it all up in a subgraph, and easily share it with the rest of the team.
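
In code terms, that Reformat, Aggregate, OrderBy subgraph boils down to something like the following PySpark sketch (column and table names such as invoice_date and amount are assumed for illustration):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

def compute_mrr(invoices):
    """The three steps the MRR subgraph wraps: Reformat, Aggregate, OrderBy."""
    # Reformat: clean the data and derive the month each invoice falls in.
    reformatted = (
        invoices
        .filter(F.col("amount").isNotNull())
        .withColumn("month", F.date_trunc("month", F.col("invoice_date")))
    )
    # Aggregate: sum the amounts by month.
    aggregated = reformatted.groupBy("month").agg(F.sum("amount").alias("mrr"))
    # OrderBy: order the months from the latest.
    return aggregated.orderBy(F.col("month").desc())

# The same subgraph logic can then be reused across sources (e.g. Stripe or Salesforce exports).
mrr = compute_mrr(spark.table("raw.stripe_invoices"))  # placeholder table
mrr.show()
```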

Summary

Prophecy helps your data engineering run quickly and reliably. It's now easier to find people of all skill levels to create pipelines, to build and deploy those pipelines much faster, and to give both data engineers and downstream analytics consumers greater insight into them.

We have put in tremendous work, and our customers, from midsize companies to the Fortune 50, are delighted with it. Try Prophecy for yourself; you'll be glad you did!

How can I try Prophecy?

Prophecy is available as a public SaaS offering. Just add your Databricks credentials and start building from the Databricks UI. We also offer an Enterprise Trial that gives you access to Prophecy's Databricks account for a couple of weeks so you can try it with examples. Lastly, we support installing Prophecy in your own network (VPC or on-prem) on Kubernetes.

Sign up for your free account now!

We're super excited to share our product with you. Get in touch with us - we'd love to understand the challenges you are facing with data engineering!
