Deep dive into Prophecy for Databricks
Just a few days ago, we announced Prophecy for Databricks. In this blog post, Part 2 of that announcement, we dig into how Prophecy makes data engineering simple for any data practitioner on Databricks.
Just a few days ago, we announced Prophecy for Databricks. This is a huge milestone for our team, as we've integrated our low-code platform to work seamlessly for Databricks users.
This blog post is Part 2 of that announcement, where we take a deep dive into how Prophecy makes data engineering simple for any data practitioner on Databricks. Whether you're a seasoned Apache Spark developer or new to Spark and not yet comfortable writing code, you can now be productive in a matter of seconds.
Onboarding
Getting started with Prophecy for Databricks is very easy! Prophecy is a Databricks partner, so we're available directly within the Databricks UI via Partner Connect.
Partner Connect connects you to Prophecy's public SaaS, which is a great option, especially for smaller teams, to kick the tires quickly and easily. If you're more security-conscious or interested in the Enterprise version, you can also install Prophecy directly in your private network (VPC) on AWS, Azure, or GCP.
Read more about it here and contact us when you're ready!
Your First Pipeline
PySpark Support
Initially, when we built Prophecy, we focused on generating high-quality, performant code. We therefore chose Scala as the default language for its type safety and close JVM integration.
Since the initial release, we’ve learned that many data engineers are not comfortable with Scala’s complex syntax. Python, on the other hand, is one of the most popular languages for data science and is known for its simplicity. The majority of Databricks users use PySpark, so Python support has been our most requested feature.
Today, we’re tremendously excited that Prophecy PySpark support is GA. This means you can develop pipelines using gems and get the generated code in Python. You can also write column expressions and embed scripts in your pipelines, all in Python. You can even write native PySpark UDFs in Prophecy!
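For readers new to PySpark, here is a rough idea of what a Python UDF next to a native column expression looks like. This is a hand-written sketch, not code generated by Prophecy: the function names `normalize_country` and `add_country_columns` and the column names are hypothetical.

```python
from typing import Optional


def normalize_country(raw: Optional[str]) -> Optional[str]:
    """Pure Python logic: trim and upper-case a country code, keeping nulls."""
    if raw is None:
        return None
    cleaned = raw.strip().upper()
    return cleaned or None  # treat blank strings as missing


def add_country_columns(df):
    """Apply the UDF and a native column expression to a Spark DataFrame.

    PySpark is imported here, not at module level, so the pure function
    above can be unit-tested without a Spark installation.
    """
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    normalize_country_udf = F.udf(normalize_country, StringType())
    return (
        df.withColumn("country", normalize_country_udf(F.col("country_raw")))
        # Native column expression: evaluated by Spark without calling Python.
        .withColumn("is_domestic", F.col("country") == F.lit("US"))
    )
```

Native column expressions generally run faster than Python UDFs, so a UDF is best kept for logic that Spark's built-in functions cannot express.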
Delta Integration
Prophecy works really well with Delta. Whether you’re simply reading and writing to Delta tables, or you want to use more complex constructs, like slowly-changing dimensions, we’ve got you covered. Prophecy automatically applies all the best practices to your tables (like Z-ordering) and generates the most efficient, Databricks-approved code for you.
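To give a feel for the constructs mentioned above, the following sketch shows a simple upsert (the overwrite-in-place flavor of slowly changing dimensions, often called Type 1) followed by Z-ordering, written by hand against the Delta Lake Python API. It is not Prophecy's generated code, and the helper names, table name, and column names are assumptions made for illustration.

```python
from typing import Sequence


def zorder_statement(table: str, columns: Sequence[str]) -> str:
    """Build an OPTIMIZE ... ZORDER BY statement for a Delta table."""
    if not columns:
        raise ValueError("Z-ordering needs at least one column")
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(columns)})"


def upsert_and_optimize(spark, updates_df, table: str, key: str,
                        zorder_by: Sequence[str]) -> None:
    """Type 1 style upsert into a Delta table, followed by Z-ordering."""
    # Needs the delta-spark package or a Databricks runtime.
    from delta.tables import DeltaTable

    (
        DeltaTable.forName(spark, table).alias("target")
        .merge(updates_df.alias("source"), f"target.{key} = source.{key}")
        .whenMatchedUpdateAll()       # overwrite existing rows in place
        .whenNotMatchedInsertAll()    # insert rows with new keys
        .execute()
    )
    spark.sql(zorder_statement(table, zorder_by))
```

A call such as `upsert_and_optimize(spark, updates_df, "sales.customers", "customer_id", ["customer_id"])` would merge the updates and then co-locate rows by `customer_id`.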
To learn more about the Prophecy + Delta integration, read our blog on it here!
The Code & Git
Every project in Prophecy is a fully fledged git repository that can be integrated with the git provider of your choice. Every change to gems (the visual elements) generates high-quality code, in either PySpark or Scala, that is committed to a specific branch. Prophecy also ensures that the generated code follows software engineering best practices.
As companies mature and apply typical software engineering practices, the release process grows to involve committing your code changes, resolving any git conflicts, pushing them to your release branches, getting them approved, running any integration tests, building all the artifacts, and finally deploying them to your clusters.
Phew, that's a lot! And it usually requires piecing together scripts to make it all work. Thankfully, Prophecy automates most of those steps for you, cutting down on repetitive and error-prone work.
Are your DevOps practices more complex than that? No problem - check out our guide on how Prophecy makes the most difficult git setups easy again.
Databricks Workflows
Once you have finished developing your pipelines, it's time to schedule them. With Prophecy for Databricks you have two options: Databricks Workflows (recommended) or Apache Airflow (a more advanced option).
Prophecy makes it really easy to develop schedules. Scheduling a single pipeline is just a click away, and building complex pipeline dependency graphs has never been simpler.
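For context, a pipeline dependency graph in Databricks Workflows boils down to a job definition whose tasks reference each other through `depends_on`. The sketch below builds such a payload by hand in the shape used by the Databricks Jobs API. The pipeline names are hypothetical, cluster settings are left out, and packaging each pipeline as a Python wheel with a `main` entry point is an assumption, not a description of what Prophecy emits.

```python
import json
from typing import Dict, List


def build_job(name: str, dependencies: Dict[str, List[str]], cron: str) -> dict:
    """Build a Jobs-API-style payload from {pipeline: [upstream pipelines]}."""
    unknown = {up for ups in dependencies.values() for up in ups} - set(dependencies)
    if unknown:
        raise ValueError(f"Unknown upstream pipelines: {sorted(unknown)}")
    tasks = []
    for pipeline, upstream in dependencies.items():
        tasks.append({
            "task_key": pipeline,
            "depends_on": [{"task_key": up} for up in upstream],
            # Assumption: each pipeline is packaged as a Python wheel.
            "python_wheel_task": {"package_name": pipeline, "entry_point": "main"},
        })
    return {
        "name": name,
        "tasks": tasks,
        "schedule": {"quartz_cron_expression": cron, "timezone_id": "UTC"},
    }


job = build_job(
    "daily_revenue",
    {
        "ingest_orders": [],
        "ingest_customers": [],
        "report_mrr": ["ingest_orders", "ingest_customers"],
    },
    cron="0 0 6 * * ?",  # Quartz syntax: every day at 06:00
)
print(json.dumps(job, indent=2))
```

Here `report_mrr` starts only after both ingestion pipelines have succeeded, while the two ingestion tasks are free to run in parallel.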
Execution Metrics
What's next? After we've built and deployed our pipeline, we want it to run as reliably as possible. To that end, Prophecy captures additional metrics and data profiles on every run, so that you can easily observe how your data changes from run to run, or set up automated alerts.
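To make the idea of a per-run data profile concrete, here is a small plain-Python sketch. It does not reflect how Prophecy stores or computes its metrics: the `profile` and `row_count_alert` helpers, the 50% threshold, and the sample rows are all hypothetical.

```python
from typing import Any, Dict, List, Optional


def profile(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Row count plus per-column null counts for one run's output."""
    columns = sorted({col for row in rows for col in row})
    return {
        "row_count": len(rows),
        "null_counts": {c: sum(1 for r in rows if r.get(c) is None) for c in columns},
    }


def row_count_alert(previous: Dict[str, Any], current: Dict[str, Any],
                    max_change: float = 0.5) -> Optional[str]:
    """Return a message when the row count moves by more than max_change (a fraction)."""
    before, after = previous["row_count"], current["row_count"]
    if before == 0:
        return None if after == 0 else f"row count went from 0 to {after}"
    change = abs(after - before) / before
    return f"row count changed by {change:.0%}" if change > max_change else None


yesterday = profile([{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}])
today = profile([{"id": i, "amount": 5.0} for i in range(5)])
print(row_count_alert(yesterday, today))
```

Comparing the profile of each run with the previous one is what turns raw metrics into an alert: a sudden jump in row count or null count usually points to an upstream change.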
Extensibility - Reusable Logic
We've built Prophecy from the ground up to be a usable and extensible platform, and our customers love it! With just a little bit of coding, anyone can extend Prophecy by adding more data sources or building new data transformations as easy-to-use gems. These capabilities empower teams across industries and skill levels.
Today, we've taken it a step further. You can now not only build gems, but also package multiple gems together into a reusable subgraph. A subgraph is a set of gems that performs specific business logic and can be easily reused and versioned by your team.
As an example, let's say you have a specific way of computing monthly recurring revenue (MRR) that your team often applies to different data sources (e.g. Stripe, Salesforce). Normally, every developer who needs to compute MRR would create the same set of transformations: Reformat to clean the data, Aggregate to sum the amounts by month, and OrderBy to sort the months from the latest. Now, one person can define the business logic, wrap the transformations in a subgraph, and easily share it with the rest of the team.
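The three steps in this example can be sketched as small composable functions, with `compute_mrr` playing the role of the reusable subgraph. This is plain Python for illustration only, not Prophecy-generated Spark code, and the field names and sample records are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Any, Dict, Iterable, List, Tuple


def reformat(records: Iterable[Dict[str, Any]], amount_field: str,
             date_field: str) -> List[Tuple[str, float]]:
    """Reformat: keep (YYYY-MM, amount) pairs, dropping rows with no amount."""
    cleaned = []
    for rec in records:
        amount = rec.get(amount_field)
        if amount is None:
            continue
        cleaned.append((rec[date_field].strftime("%Y-%m"), float(amount)))
    return cleaned


def aggregate(pairs: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Aggregate: sum the amounts by month."""
    totals: Dict[str, float] = defaultdict(float)
    for month, amount in pairs:
        totals[month] += amount
    return dict(totals)


def order_by(totals: Dict[str, float]) -> List[Tuple[str, float]]:
    """OrderBy: latest month first."""
    return sorted(totals.items(), key=lambda item: item[0], reverse=True)


def compute_mrr(records: Iterable[Dict[str, Any]], amount_field: str,
                date_field: str) -> List[Tuple[str, float]]:
    """The reusable unit: the same three steps for any source."""
    return order_by(aggregate(reformat(records, amount_field, date_field)))


stripe_charges = [
    {"charged_on": date(2023, 1, 5), "amount_usd": 100},
    {"charged_on": date(2023, 2, 1), "amount_usd": 250},
    {"charged_on": date(2023, 1, 20), "amount_usd": 50},
    {"charged_on": date(2023, 2, 9), "amount_usd": None},
]
print(compute_mrr(stripe_charges, amount_field="amount_usd", date_field="charged_on"))
# [('2023-02', 250.0), ('2023-01', 150.0)]
```

Because the field names are parameters, the same function can be pointed at a Salesforce export with different column names, which is the kind of reuse a subgraph is meant to provide.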
Summary
Prophecy helps your data engineering run quickly and reliably. People of all skill levels can create pipelines, teams can build and deploy them much faster, and both data engineers and downstream analytics consumers get greater insight into them.
We have put tremendous work into Prophecy, and our customers, from midsize companies to the Fortune 50, are delighted with it. Try Prophecy for yourself; you’ll be glad you did!
How can I try Prophecy?
Prophecy is available as a public SaaS offering. Just add your Databricks credentials and start using it from the Databricks UI. We also offer an Enterprise Trial with access to Prophecy's Databricks account for a couple of weeks, so you can try it with examples. Lastly, we support installing Prophecy in your own network (VPC or on-prem) on Kubernetes.
Sign up for your account now:
We're super excited to share our product with you. Get in touch with us - we'd love to understand the challenges you are facing with data engineering!
Ready to give Prophecy a try?
You can create a free account and get full access to all features for 21 days. No credit card needed. Want more of a guided experience? Request a demo and we’ll walk you through how Prophecy can empower your entire data team with low-code ETL today.