From novice to expert: A blueprint for data analysts to build robust data transformation pipelines without relying on engineering

Learn essential skills and techniques to empower non-technical data practitioners to build ETL pipelines without relying on data engineering.

Mei Long
May 30, 2023

Today’s data landscape is evolving rapidly as businesses grapple with a proliferation of new data sources that generate massive volumes of data for downstream analytics and machine learning. Efficiently extracting, transforming, and loading (ETL) that data to meet analytics and AI requirements is easier said than done. As a result, data engineers have been handed the difficult job of building and maintaining the data pipelines that support the needs of the business.

As more data workloads migrate to the cloud, data engineers are feeling the brunt of the shift in data strategies as organizations focus on modernizing their data architectures to align with their cloud strategies and the vast ecosystem of cloud-based services and tools. However, challenges remain: on-premises ETL products have not evolved at the pace of the market, creating complexities that consume valuable engineering resources and impede data-driven innovation. In this blog, we highlight the shortcomings of today’s legacy ETL tools and offer three guiding principles designed to put you on the path to a faster, easier data engineering experience.

New problems with old solutions

Legacy ETL tools and data infrastructure are highly complex and resource-intensive, so management has historically fallen to those with the necessary technical expertise: namely, data engineers. Today, most organizations lack the data engineering resources to meet the lofty demands of the business. Without enough technical expertise to keep things operating smoothly, data engineers quickly shift from enablers to bottlenecks, slowing both data workflows and innovation.

For the rest of the business, primarily data analysts who rely on data engineering to prepare data for analytics, this operational inefficiency creates a huge problem. Analysts would happily step in and transform data on their own, but they typically lack the technical background required by today’s legacy ETL tools, which keeps them from the autonomy they need to be successful. Furthermore, because of their differing skill sets, analysts also rely on different tools and programming languages, which hampers their ability to collaborate with data engineering on the pipeline requirements for their use cases.

For companies to truly become “data-first” and move beyond both old habits and siloed work, they must overcome the complexities of ETL and enable all data practitioners to be self-sufficient. With that kind of democratization, data analysts can move swiftly to uncover the valuable insights that leadership teams require to make smart decisions, and data engineers can devote their expertise and time to more strategic work that moves the business forward.

A blueprint for data analysts to be more productive with data

According to a 2021 survey conducted by MIT Technology Review and Databricks, enhancing data analytics by 2023 was among the most important data strategy initiatives for 43% of companies. Now that 2023 is here, we’re seeing this priority play out across the business landscape, but far less has been said about how analysts themselves can help move the effort along.

If enhancing data analytics is a priority for your business, you must identify where your shortcomings lie and commit to a new approach that removes barriers to success. Below are our top three steps to empower data analysts and other non-technical data practitioners to be successful without having to rely heavily on data engineering:

Step 1. Modernize ETL with low-code tooling  

Low-code tooling has, understandably, been a dream for many data analysts since its inception. The approach offers a visual, drag-and-drop way to develop, deploy, and manage data pipelines, democratizing data engineering even for users without coding expertise. In theory, enabling more data practitioners, regardless of skill level, to quickly and easily transform data would give them the autonomy needed to accelerate productivity, time to insight, and ML innovation. But today’s so-called low-code solutions have fallen far short.

The main issue is that most low-code solutions don’t provide transparency into the underlying code, which data engineers need to ensure pipelines are reliable and performant. For example, if a data analyst creates a pipeline in their own environment, data engineering has no visibility into whether that pipeline will impact other parts of the data ecosystem when it is deployed to production. And if an organization decides to move off its existing ETL platform, it can’t easily extract the code to migrate pipelines to a new environment without significant expense. Another shortcoming is that low-quality output can lead to a lack of coding standards and consistency, making it nearly impossible for other teams to understand how data is being processed.

Instead, what’s needed is an intuitive UI that empowers analysts to visually build pipelines while auto-generating high-quality code in a transparent manner, and that fosters collaboration between analysts and data engineers so business requirements are met and what’s being developed in dev won’t break production. This satisfies the needs of both data engineering and analysts and gives the organization complete control over its data pipelines.
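To make that concrete, below is a minimal sketch of the kind of transparent, version-controllable PySpark code a visual pipeline builder might generate behind the scenes. The dataset paths, column names, and function names are hypothetical placeholders, not the output of any particular tool.

```python
# Minimal sketch of auto-generated, source-controlled PySpark pipeline code.
# All paths, table names, and columns below are hypothetical placeholders.
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_daily_revenue").getOrCreate()

def load_orders(path: str) -> DataFrame:
    # Source step: read raw orders landed as Parquet.
    return spark.read.parquet(path)

def clean_orders(orders: DataFrame) -> DataFrame:
    # Transform step: drop cancelled orders and derive an order date.
    return (
        orders
        .filter(F.col("status") != "cancelled")
        .withColumn("order_date", F.to_date("order_ts"))
    )

def daily_revenue(orders: DataFrame) -> DataFrame:
    # Aggregate step: total revenue per day.
    return orders.groupBy("order_date").agg(F.sum("amount").alias("revenue"))

if __name__ == "__main__":
    result = daily_revenue(clean_orders(load_orders("s3://example-bucket/raw/orders")))
    result.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_revenue")
```

Because code like this is ordinary PySpark checked into version control, data engineers can review, test, and reuse it like any other codebase rather than treating the pipeline as a black box.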

Step 2. Provide templates to facilitate data operations at scale

While the traditional order of operations has left data engineers and analysts in their respective silos, the two groups can begin to collaborate better within a low-code approach by using templates built by the data engineering team.

By working together in this way, data engineers can help themselves by standardizing common data operations through frameworks and templates for specific use cases. Data analysts can then use these pre-built templates independently to operationalize pipelines at scale.
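As an illustration, here is a minimal sketch, assuming a PySpark environment, of the kind of reusable template a data engineering team might publish. The function, parameters, and column names are illustrative assumptions rather than a prescribed framework; an analyst applies it to their own dataset with a single call.

```python
# Minimal sketch of a reusable template a data engineering team might publish.
# Assumes a PySpark environment; function and column names are illustrative only.
from pyspark.sql import DataFrame
from pyspark.sql import functions as F
from pyspark.sql.window import Window

def standardize_and_dedupe(df: DataFrame, key_cols: list, ts_col: str) -> DataFrame:
    """Trim string columns and keep only the latest record per key."""
    trimmed = df.select([
        F.trim(F.col(name)).alias(name) if dtype == "string" else F.col(name)
        for name, dtype in df.dtypes
    ])
    latest_first = Window.partitionBy(*key_cols).orderBy(F.col(ts_col).desc())
    return (
        trimmed
        .withColumn("_row_num", F.row_number().over(latest_first))
        .filter(F.col("_row_num") == 1)
        .drop("_row_num")
    )

# An analyst reuses the template with their own parameters, e.g.:
# customers_clean = standardize_and_dedupe(raw_customers, ["customer_id"], "updated_at")
```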

Step 3. Ensure product quality with data engineering best practices

Empowering data analysts to build ETL pipelines is critical to future success, but it also means making sure what they build is optimized for their particular workloads and use cases. For this to work, teams need tooling that supports management of the entire data lifecycle, from data access and transformation to job orchestration and pipeline performance monitoring.

Also, establishing processes that are backed by software engineering best practices such as CI/CD, governance, data quality, and more will help build a data-first culture and provide data analysts and any other non-technical data practitioners with the confidence to build and ship trusted data products.
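As a rough illustration of how those best practices translate to pipelines, the sketch below shows a pytest-style unit test and a simple data-quality assertion that could run in CI before a pipeline is promoted. The transformation, values, and names are hypothetical examples, not a required setup.

```python
# Minimal sketch of CI-friendly tests for a pipeline transformation (pytest + PySpark).
# The transformation and expected values are hypothetical examples.
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

@pytest.fixture(scope="module")
def spark():
    # Local Spark session so the tests can run in a CI job without a cluster.
    return SparkSession.builder.master("local[1]").appName("pipeline_tests").getOrCreate()

def daily_revenue(orders):
    # Transformation under test: total revenue per day.
    return orders.groupBy("order_date").agg(F.sum("amount").alias("revenue"))

def test_daily_revenue_sums_per_day(spark):
    orders = spark.createDataFrame(
        [("2023-05-01", 10.0), ("2023-05-01", 5.0), ("2023-05-02", 7.5)],
        ["order_date", "amount"],
    )
    result = {row["order_date"]: row["revenue"] for row in daily_revenue(orders).collect()}
    assert result == {"2023-05-01": 15.0, "2023-05-02": 7.5}

def test_revenue_has_no_nulls(spark):
    # Simple data-quality gate: fail the build if any aggregated revenue is null.
    orders = spark.createDataFrame([("2023-05-01", 10.0)], ["order_date", "amount"])
    assert daily_revenue(orders).filter(F.col("revenue").isNull()).count() == 0
```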

How Prophecy helps: A low-code approach to building high-quality data pipelines

Prophecy gives all data users modern data transformation capabilities to convert raw data into analytics-ready data, natively on any data lake or warehouse.

Prophecy is working to truly democratize ETL with a low-code approach that enables all data practitioners to visually develop data pipelines. The platform converts those visual pipelines into high-quality code on the fly using software engineering best practices. Even better? The auto-generated code is native to each underlying cloud data platform, which makes it portable and easy for existing data engineers to work with if needed.

Prophecy’s framework builder enables data users to extend the product by building new visual components that we call gems. While legacy products are architected as a fixed collection of built-in visual components, Prophecy is architected as a visual component builder: even the built-in components are created with this builder, enabling users to establish high-quality standards for their specific use cases.

With Prophecy, more data users are able to self-serve and build pipelines on par with data engineers’ output. It also lets them focus on their data and business value instead of the technology, knowing they can deliver high-quality data pipelines in cloud-native, 100% open-source formats.

Let the toolset change the mindset

Creating an environment that empowers data analysts to be more autonomous gives your business the competitive edge it needs to innovate at the speed today’s market requires. Prophecy strives to be the foundation of that change, with features that modernize and democratize data transformation for your entire data org.

Ready to give Prophecy a try?

You can create a free account and get full access to all features for 14 days. No credit card needed. Want more of a guided experience? Request a demo and we’ll walk you through how Prophecy can empower your entire data team with low-code ETL today.
