Getting started with low-code

Let's create a pipeline using Prophecy's visual, low-code interface for Spark.

Anya Bida
June 8, 2023


Together we’ll build a getting-started pipeline using Prophecy's visual design tool. We’ll read, transform, and write a dataset. Our visual pipeline "ExploreTPCH" will be converted to PySpark code and committed to this repo. This is the first of many Prophecy blogs to use TPC datasets [1]:

“This [TPC-H] benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions.”  - Source: TPC organization

Let’s build this pipeline together...

  • Sign up (30-second video), then create a project and pipeline (1-minute video)
  • Read a TPC-H table from a Snowflake Database
  • Transform the data
  • Write out as a Delta Table
Fig 1. Create a simple pipeline to read data from a Snowflake table,
transform, and write to a Delta Lake table.
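
If you're curious what that first step boils down to in code, here's a minimal PySpark sketch of the Snowflake read, assuming the spark-snowflake connector is on the cluster; the connection values and the SNOWFLAKE_SAMPLE_DATA.TPCH_SF1 sample schema are illustrative placeholders, not output from Prophecy:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ExploreTPCH").getOrCreate()

    # Connection options for the spark-snowflake connector; every value
    # here is a placeholder to swap for your own account details.
    sf_options = {
        "sfURL": "<account>.snowflakecomputing.com",
        "sfUser": "<user>",
        "sfPassword": "<password>",
        "sfDatabase": "SNOWFLAKE_SAMPLE_DATA",  # Snowflake's bundled TPC-H sample data
        "sfSchema": "TPCH_SF1",                 # scale factor 1: ~6M LINEITEM rows
        "sfWarehouse": "<warehouse>",
    }

    lineitem = (
        spark.read.format("snowflake")  # or the fully qualified "net.snowflake.spark.snowflake"
        .options(**sf_options)
        .option("dbtable", "LINEITEM")
        .load()
    )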

We’ll focus on the LINEITEM table from TPC-H. The table contains approximately 6 million records and 16 columns. Each record represents a line-item order for a fictional company. Let’s take a look at the LINEITEM table:

Fig 2. LINEITEM table schema with datatype. "Optional" means "nullable" and is configurable.

The LINEITEM table fits into the TPC-H schema definition as follows:

Fig 3. TPC-H schema obtained from the TPC website:
https://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.17.1.pdf

The LINEITEM table is commonly used to benchmark business queries, so let's try one! Let's do an aggregation with group-by using the visual drag-and-drop interface:

Fig 4. Visually design the aggregate and orderBy transformations. A few business-related
calculations serve as useful benchmarking examples for later exercises.
Here we can see how to set up transformations using the visual designer.
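
For a rough sense of what this transformation amounts to in PySpark, here's a hedged sketch in the spirit of TPC-H Query 1 (the pricing summary report); the exact columns and aggregates configured in Fig 4 may differ:

    from pyspark.sql import functions as F

    # Group-by aggregation followed by an orderBy, mirroring the two
    # gems in Fig 4.
    summary = (
        lineitem.groupBy("L_RETURNFLAG", "L_LINESTATUS")
        .agg(
            F.sum("L_QUANTITY").alias("sum_qty"),
            F.sum("L_EXTENDEDPRICE").alias("sum_base_price"),
            F.avg("L_DISCOUNT").alias("avg_disc"),
            F.count("*").alias("count_order"),
        )
        .orderBy("L_RETURNFLAG", "L_LINESTATUS")
    )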

We’ll order the records and write the result as a Delta table:

Fig 5. After aggregation, groupBy, and orderBy, the dataset is ready to write to a Delta Table "Target."
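
The write step itself is a one-liner in PySpark. The output path below is hypothetical, and Delta Lake support is assumed to be available on the cluster (e.g. on Databricks):

    # Write the ordered result as a Delta table.
    (
        summary.write.format("delta")
        .mode("overwrite")
        .save("/mnt/delta/tpch/lineitem_summary")  # hypothetical path
    )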

In just a few minutes, we’ve created a visual pipeline across data sources. Prophecy generates Python (or Scala) code based on this visual pipeline. Let’s commit the Python code to a GitHub repository (mine is here). See the aggregate function written in Python to call the Spark API:

Fig 6. The visually designed pipeline is converted to PySpark
code and committed to the user's GitHub repo.
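
As a rough illustration of that style (not Prophecy's exact output, which may differ), the generated pipeline tends to read as one plain Python function per visual gem, taking and returning DataFrames:

    from pyspark.sql import DataFrame, SparkSession
    from pyspark.sql import functions as F

    # Illustrative only: a gem-per-function shape keeps the pipeline
    # readable, reviewable, and testable as ordinary Spark code.
    def Aggregate_1(spark: SparkSession, in0: DataFrame) -> DataFrame:
        return in0.groupBy("L_RETURNFLAG", "L_LINESTATUS").agg(
            F.sum("L_QUANTITY").alias("sum_qty")
        )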

That was easy! We completed a standard example query for a TPC-H table. What about exploring and transforming our data? Let's do some data cleaning:
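
Cleaning steps vary by dataset; as one illustrative sketch (an assumption, not a prescribed recipe), we might drop exact duplicates, require the LINEITEM primary-key columns, and tidy up a string field:

    from pyspark.sql import functions as F

    # Illustrative cleaning pass: dedupe, enforce the LINEITEM
    # primary-key columns, and trim a fixed-width string column.
    cleaned = (
        lineitem.dropDuplicates()
        .na.drop(subset=["L_ORDERKEY", "L_LINENUMBER"])
        .withColumn("L_SHIPINSTRUCT", F.trim(F.col("L_SHIPINSTRUCT")))
    )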

We accomplished a lot in a short blog:

  • Read a TPC-H table from a Snowflake Database
  • Transform the data
  • Write out as a Delta Table

This pipeline, committed to GitHub, is ready to be packaged and deployed using SDLC best practices: peer review, unit testing, CI/CD, scheduling, and monitoring. Follow along to get started with your own PySpark pipeline. Overcome the barriers to entry and use this visual tool to build your production-quality benchmarking pipeline. Have a go with low-code tooling and let us know what you think! Ping me or schedule a session with me or my team.
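
To give a flavor of the unit-testing piece, here's a minimal pytest sketch that checks the group-by logic on a tiny in-memory dataset; the local SparkSession stands in for a cluster, and the test rows are made up:

    import pytest
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    @pytest.fixture(scope="session")
    def spark():
        # Local Spark is enough for logic tests; no cluster or Snowflake needed.
        return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()

    def test_one_summary_row_per_flag_status(spark):
        rows = [("A", "F", 10.0), ("A", "F", 5.0), ("N", "O", 1.0)]
        df = spark.createDataFrame(rows, ["L_RETURNFLAG", "L_LINESTATUS", "L_QUANTITY"])
        out = (
            df.groupBy("L_RETURNFLAG", "L_LINESTATUS")
            .agg(F.sum("L_QUANTITY").alias("sum_qty"))
        )
        assert out.count() == 2
        assert {r["sum_qty"] for r in out.collect()} == {15.0, 1.0}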

Ready to give Prophecy a try?

You can create a free account and get full access to all features for 21 days. No credit card needed. Want more of a guided experience? Request a demo and we’ll walk you through how Prophecy can empower your entire data team with low-code ETL today.



