The Five Dysfunctions of a Data Team

Data teams and analysts are inadvertently siloed into different parts of their organizations. These are the five data dysfunctions that breed this situation.

Lance Walter
Assistant Director of R&D
Texas Rangers Baseball Club
March 6, 2025

Years ago, a friend and coworker turned me on to a popular business book called “The Five Dysfunctions of a Team” by Patrick Lencioni. It rose to popularity in the early 2000s, decomposing and codifying ineffective workplace dynamics. The book tells a fable about a new leader struggling to align her team, observing the five dysfunctions that consistently limit its effectiveness.

Larger organizations have long understood that “data is a team sport.” Given complex enterprise data environments with hundreds of data sources, thousands of unique data consumers, and a growing list of operational, regulatory, and analytical applications that depend on clean, trusted, and timely data, it’s no surprise that it requires a team to succeed. 

That team is often “matrixed” across departments, spanning data engineers, operations professionals, business analysts, and more. They’re all trying to help their company get more value from data, and they’re not near each other on the org chart.

The five dysfunctions

Having spent nearly two decades in customer support, product management, and marketing roles working with data teams, I’ll use the “Five Dysfunctions” concept to talk about some of the patterns I’ve seen as organizations struggle to make better use of their data.

The five dysfunctions of a data team

  1. Absence of reliable data. This is often a function of organizational overload: a large team of analysts and data consumers looking to use dozens or hundreds of operational data sources, and a smaller central data engineering team trying to provide proper access. This leads to long data integration “backlogs” that can stretch to months or quarters, leaving frustrated analysts and other decision-makers working with incomplete or out-of-date information.
  2. Business urgency exceeds data urgency. When the data engineering backlog grows beyond a length that’s acceptable to business users, they may take matters into their own hands. Desperate users try to get by with some combination of “grey-market” data extracts, BI, or data prep tools, or they just dump data into spreadsheets. Ironically, it’s often because those users’ requests aren’t technically complex that they end up waiting in line behind bigger data engineering jobs. Those users just want the information they need; they’ve “accepted” (at least temporarily) the resource constraints, and now they’re going to do whatever they think is necessary to get their data. This expands data access but creates fertile ground for “shadow IT” and information silos. In fact, this governance challenge was the most commonly cited hurdle slowing AI adoption by data platform teams, according to a recent survey of data and analytics leaders by Wakefield Research.
  3. Absence of trust (in data). A breakdown in trust is a predictable consequence of loosely managed data access. Each user accesses, integrates, and interprets information differently. This takes us back to one of the defining challenges of early data warehousing and BI: different users presenting different numbers in the same meeting. Rather than discussing the business situation and using data insights to make a good decision, users argue over competing versions of the truth, undermining their trust in data.
  4. Avoidance of accountability. In the age of pay-as-you-go cloud computing, tool sprawl and other user “workarounds” for these dysfunctions can have significant hard costs. In this model, there’s a tradeoff between self-service data access for analysts and centralized cost visibility and controls. The more users deploy their own cloud-based tools outside the platform team’s central architecture, the more “hidden taxi meters” are running, potentially 24x7, across the organization. This dysfunction makes it extremely difficult for organizations to analyze and manage their cloud spend with an integrated view.
  5. The data “relay race.” This dysfunction often emerges after organizations wake up to the hangover of data silos and shadow IT. To accelerate data delivery, the organization taps users’ subject matter expertise for pipeline development: users describe their needs, and the central team executes the delivery to maintain control over access and costs. It ends up looking like a clunky corporate “relay race.” Analysts and other data consumers use separate tools or documents, such as a spreadsheet, a mocked-up diagram, or even a prototype, to describe what they know about the data sources, reporting requirements, and calculated metrics. Though designed to give a data engineer a “jumpstart” on finding, integrating, and delivering the desired data, this turns into another data dysfunction: it’s inefficient and error-prone, often requiring rework to reconcile evolving upstream requirements with the realities of the downstream data sources. A well-intended effort to let analysts “be part of the solution” ends up as a frustrating back-and-forth that makes getting clean, trusted data take even longer.

Moving forward: people, process, and technology

While it’s exciting to watch the rapid evolution of enterprise infrastructure and tooling, the rule still applies that meaningful, durable change comes from a combination of people, process, and technology. Today’s cloud data platforms, AI tooling, and other advancements can simplify and accelerate many aspects of delivering better data to the business faster, but no technology alone will automatically insulate your organization from these challenges.

This research from Gartner, “How to Assess and Improve Your Data Integration Maturity,” offers great insight into the causes, consequences, and solutions to some of these problems, including organizational and cultural elements.

Ready to give Prophecy a try?

You can create a free account and get full access to all features for 21 days. No credit card needed. Want more of a guided experience? Request a demo and we’ll walk you through how Prophecy can empower your entire data team with low-code ETL today.

