Modern Data’s Stuck

George K. Mathew
Assistant Director of R&D
Texas Rangers Baseball Club
October 18, 2023
October 19, 2023

Data remains inaccessible to those who need it most, even though this has been recognized as a problem for over a decade. Data analysts, data scientists, and executives demand rapid access to data for analytics, AI, and decision-making. Yet they find themselves in a waiting game, dependent on engineering teams to transform the data into a usable format, a job that requires complex coding. A staggering 80% of time is consumed with data discovery, preparation, and protection, time that could be better spent deriving actionable insights or training models [1]. This bottleneck severely constrains organizations’ potential and diverts engineering teams’ focus from higher-value tasks such as optimizing workloads, ensuring platform availability, and setting robust standards for enterprise scale.

Technology has been key to easing the burden on data engineering and empowering line-of-business data users to be productive with data. Legacy ETL and data preparation tools (e.g., Ab Initio, Informatica, and TIBCO) deliver visual data transformation capabilities, ostensibly liberating data engineers and adjacent personas from hand-coding everything.

However, the advent of cloud computing presents challenges with these tools: 

  • Legacy ETL tools are not native to the cloud. They’re not designed to take advantage of the unique benefits of cloud computing platforms, such as scalability, elasticity, and cost-effectiveness, or of cloud-native engines such as Spark and SQL-based solutions.
  • Legacy ETL tools create vendor lock-in. They often use proprietary formats that make it difficult and expensive to switch to new platforms. The proprietary nature of these tools also creates a “walled garden” effect, which limits the ability of an organization to adopt new technologies and innovations. 
  • Legacy ETL tools lack extensibility. They cannot be easily customized to handle the specific needs of an organization—a major limitation for businesses that have complex data processing requirements, which are quickly becoming common. 

The alternative to using legacy ETL tools is to simply go back to coding data transformations. This approach has several shortcomings, which are worth stating explicitly:

First, the limited supply of talent makes it difficult for companies to build a central team of code-oriented data engineers. Even if the central data platform team can hire people with the right skills, multiple lines of business teams are rarely able to do the same.

Second, there is often a lack of standardization—both because different developers have different styles and as a result of churn within these teams. Internal frameworks are sometimes developed to address the need for standardization, but they are a poor long-term solution. These frameworks are homegrown, making them non-standard, hard to manage, and difficult to share with non-coders. 

Finally, if data pipelines are developed as code, a lot of the management that used to be provided by ETL tools to maintain these pipelines is unavailable. This includes, for example, the ability to search, view data lineage for a column to understand how it was computed, and have a summary view of data quality, cost, or team progress. 

Conversely, data teams that forgo code entirely risk losing key benefits, including CI/CD, version control, collaboration, and data tests. Without these, the team cannot maintain and evolve data projects effectively. The solution, then, must address the following requirements:

  • A large number of data users, including those in the lines of business, must be enabled. Therefore, a visual interface is a first-class citizen.
  • Delivering trusted data requires software best practices for collaboration, development, and deployment. Therefore, code is a must-have.
  • The use cases are many and varied. Therefore, there must be a native ability to create and share enterprise-specific standards.

I’ve always been excited that Prophecy solved the ‘false choice’ between visual tools and code. Prophecy delivers a self-service data transformation platform for the enterprise, combining the power of code with the usability of visual ETL tools to deliver a zero-compromise solution. With Prophecy, data analysts can serve themselves and transform data at parity with the best programming data engineers, while data engineers can build and share standards and focus on other higher-value activities.

Customers as diverse as global asset management specialists Waterfall Asset Management and the Texas Rangers baseball team have seen more than 10x increases in their data operations productivity with Prophecy. HealthVerity, the nation's largest healthcare and consumer data ecosystem, has realized a 66% decrease in the total cost of ownership of their data architecture with Prophecy. 

It gives me great pleasure to elaborate on why Insight is leading Prophecy’s Series B alongside new and existing investors. The clear demand signal, coupled with a world-class product, just shines through. We couldn’t be more thrilled to support Raj, Vikas, Maciej and the entire Prophecy team in ‘unsticking’ the Modern Data!
