From Bronze to Gold: Navigating Medallion Architecture for Enterprise Success

Learn how Medallion Architecture can transform your enterprise data strategy by navigating its layered approach to data quality, governance, and analytics.

Prophecy Team
March 27, 2025
March 28, 2025

Medallion architecture is a three-tiered system organizing data into bronze (raw), silver (validated), and gold (business-ready) layers. This architecture has gained traction since Databricks popularized the term, but what does choosing it imply for an organization? 

Given modern organizational needs and the changing nature of data, thinking through your architectural choices is worthwhile. It's especially important since making well-governed data available is a top concern for data leaders, who cited data governance as the top challenge holding back GenAI adoption in our recent survey.

In this article, we'll explore what medallion architecture really is, its advantages and limitations, and help you decide if it matches your specific data challenges.

What is medallion architecture?

Medallion architecture is a data design pattern that organizes and processes data in a structured, progressive manner through distinct quality tiers. Introduced by Databricks, this architecture divides data processing into three primary layers (often represented as medals): 

  • Bronze (raw data)
  • Silver (validated and cleansed data)
  • Gold (business-ready data)

Each layer represents a higher level of data quality, reliability, and usability.

While Databricks popularized the term, the concept of layering data through progressive refinement isn't entirely new. Enterprise data architects have been implementing similar tiered approaches for decades, using staging areas, data marts, and warehouse layers to transform raw data into business insights.

The evolution of data management and medallion architecture

To understand medallion architecture's significance, we need to look at how data management has evolved.

While data warehouses excelled at structured analysis, they struggled with unstructured data and newer analytical workloads. Data lakes emerged to address these limitations by storing vast amounts of raw data in its native format, but they often lacked the performance, governance, and reliability of warehouses.

The single-tier nature of many data lakes led to what industry professionals call the "data swamp" problem. Without proper organization, data lakes become unwieldy repositories where finding and using data becomes increasingly difficult as they scale, making overcoming data silos a significant challenge.

Medallion architecture provides a structured approach to implement the lakehouse model, combining the flexibility of data lakes with the reliability and performance of data warehouses, effectively creating a structured data lakehouse.

The layered approach in medallion architecture serves several important functions:

  1. It separates concerns between data ingestion, transformation, and consumption.
  2. It enables progressive data quality improvements.
  3. It provides clear data lineage and traceability.
  4. It balances the need for raw data preservation with business-ready analytics.

This structured approach helps organizations manage complex data environments while maintaining quality and governance, bringing order to potentially chaotic data estates and creating reliable foundations for analytics and machine learning.

The medallion architecture represents an important evolution in how we think about data organization, bringing structure to modern data platforms while maintaining the flexibility needed for today's diverse analytical workloads.

Core principles of medallion architecture

Medallion architecture is guided by foundational principles that work together to create a robust and effective data management framework. These principles ensure that data progresses logically through the system while maintaining quality and accessibility.

Incremental data refinement

At the heart of medallion architecture is the concept of incremental data refinement. Data flows through three distinct layers—Bronze, Silver, and Gold—with each tier representing a higher level of quality and readiness for analysis. 

This step-by-step approach ensures that data is progressively cleansed, validated, and enriched as it moves through the system, making it increasingly valuable for business insights.

Separation of concerns

Each layer in the medallion architecture has a specific role and responsibility, creating a clear separation of concerns. The Bronze layer focuses on raw data ingestion, the Silver layer handles data cleansing and transformation, and the Gold layer manages data aggregation and optimization for business use. 

This modularity simplifies governance and maintenance by allowing specialized teams to focus on their areas of expertise without disrupting the entire data pipeline.

ACID compliance

Medallion architecture adheres to the principles of Atomicity, Consistency, Isolation, and Durability (ACID). These properties ensure data integrity and reliability as information moves through the layers. 

By maintaining ACID compliance, the architecture guarantees that data remains accurate and consistent throughout its lifecycle, which is crucial for business-critical applications and regulatory compliance.

Scalability and flexibility

The architecture is designed to handle large volumes of data from diverse sources, making it inherently scalable and adaptable to changing business needs. As your data requirements grow, the medallion architecture can expand accordingly without compromising performance or data quality. 

This flexibility allows organizations to adapt to evolving data landscapes and incorporate new data sources with minimal disruption.

Data governance and traceability

The layered approach provides clear data lineage, making it easier to track the source and transformations of data throughout the system. This traceability is essential for auditing, compliance, and understanding data provenance, contributing to effective pipeline governance.

Data quality monitoring

Medallion architecture facilitates comprehensive data quality monitoring by applying validation rules, deduplication, and enrichment processes at each layer. This ensures that only high-quality data reaches the final stages, reducing errors and improving reliability. 

The structured approach to quality management helps organizations identify and address issues early in the data pipeline, before they can affect downstream analysis.

Optimization for analytics

The Gold layer is specifically structured for efficient querying, often using star schemas or denormalized tables. This makes it ideal for business intelligence and machine learning applications, where performance and accessibility are critical.

By optimizing data for analytical purposes, medallion architecture enables faster insights and more responsive decision-making processes.

Advantages of the medallion architecture

  • Structured data refinement - The layered approach progressively improves data quality through each tier, ensuring business users access only the most reliable data.
  • Clear separation of responsibilities - Each layer has distinct purposes and owners, making it easier to manage complex data environments and assign specialized teams to appropriate tasks.
  • Improved data governance - The architecture provides clear data lineage tracking from source to consumption, enabling better regulatory compliance, audit capabilities, and streamlined data operations.
  • Preservation of raw data - The Bronze layer maintains original data in its unaltered form, providing a safety net for reprocessing and historical analysis when requirements change.
  • Flexible processing patterns - The architecture accommodates both batch and streaming data processing, allowing organizations to implement real-time analytics alongside historical analysis.
  • Optimized query performance - The Gold layer is specifically designed for analytical efficiency, with pre-aggregations and denormalization that dramatically improve response times for business users.
  • Enhanced data discovery - Well-defined layers with consistent metadata make it easier for users to find and understand available data assets across the organization.
  • Simplified data operations - The standardized approach reduces complexity in maintaining data pipelines and troubleshooting issues when they arise.
  • Scalable implementation - The architecture can grow with your organization's data needs, accommodating increasing volumes and new data sources without fundamental redesign.
  • Reduced development time - Reusable patterns and clear architectural guidelines accelerate the development of new data products and integration of new sources.

The Bronze layer: Raw data foundation

Think of the Bronze layer as the "single source of truth" in your data ecosystem. By capturing and storing raw data in its original form, you maintain complete data lineage and ensure nothing is lost during downstream processing. This approach gives you several key advantages:

  • It provides a historical archive that can be used for reprocessing if errors occur in downstream layers.
  • It ensures auditability by preserving the original state of all data.
  • It allows for schema evolution as source systems change over time.
  • It decouples data ingestion from transformation, making your pipelines more resilient.

The Bronze layer supports two primary ingestion patterns:

  1. Batch Ingestion: Processing data in scheduled intervals, typically for larger volumes of historical data. This approach is ideal for sources like database extracts, file uploads, or API calls that occur periodically.
  2. Streaming Ingestion: Capturing data in real-time as it's generated. This method is perfect for time-sensitive data from sources like IoT devices, clickstreams, or transaction systems where you need to reduce latency.

Delta Lake is an excellent foundation for implementing the Bronze layer due to its ACID transaction support. Here's how you might implement a Bronze layer for raw e-commerce transaction logs:

  1. Create a Delta table with minimal schema enforcement.
  2. Configure auto-loader or streaming readers to ingest data.
  3. Add metadata columns for ingestion time and source.
  4. Enable time travel capabilities for historical queries.
  5. Implement retention policies based on your compliance requirements.
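In Databricks, steps like these are typically a few lines of Auto Loader configuration writing to a Delta table. As a conceptual sketch in plain Python (the function and field names are illustrative, not a Databricks API), the heart of steps 1–3 is simply stamping each raw record with ingestion metadata while leaving the payload untouched:

```python
from datetime import datetime, timezone

def ingest_to_bronze(raw_records, source_name):
    """Stamp each raw record with ingestion metadata; never modify the payload."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    return [
        {
            **record,                     # original payload preserved as-is
            "_ingested_at": ingested_at,  # when the record landed
            "_source": source_name,       # which system it came from
        }
        for record in raw_records
    ]

# Raw e-commerce transaction logs, exactly as the source system emitted them
raw = [{"order_id": "A-1", "amount": "19.99"}, {"order_id": "A-2", "amount": "5.00"}]
bronze = ingest_to_bronze(raw, source_name="ecommerce_logs")
```

Note that the amounts stay as strings: type casting is deliberately deferred to the Silver layer so the Bronze copy remains a faithful record of the source.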

This approach ensures that all raw transaction data is preserved while providing the performance and reliability benefits of Delta Lake's transactional guarantees.

By establishing a solid Bronze layer, you create the foundation for all subsequent data processing in your medallion architecture. This investment in preserving raw data pays dividends when requirements change or when you need to trace the lineage of transformed data back to its source.

The Silver layer: Refined and validated data

In the Silver layer, data from the Bronze layer undergoes thorough cleansing and validation. This process involves removing duplicate records, handling null values, and correcting inconsistencies that could affect analysis. Data refinement processes are enhanced by implementing validation rules to maintain high data quality before advancing to the Gold layer.

For example, when handling customer data, you might validate email formats, standardize phone numbers, or ensure address fields follow a consistent structure. These validation steps are essential for maintaining data integrity throughout the pipeline.

Several transformation patterns are typically applied in the Silver layer:

  • Standardization: Converting values to consistent formats (dates, currencies, units of measurement).
  • Deduplication: Identifying and removing duplicate records using hash functions or business keys.
  • Normalization: Restructuring data to reduce redundancy and improve data integrity.
  • Type Conversion: Ensuring data types are appropriate for analytical processing.
  • Enrichment: Adding metadata or reference data to provide context.

For example, a common challenge in many organizations is handling duplicate customer records that come from multiple systems. In the Silver layer, you might implement a deduplication pipeline that:

  1. Ingests customer data from various sources (CRM, e-commerce platform, support system).
  2. Applies matching algorithms to identify potential duplicates.
  3. Creates a unified customer record with the most accurate and complete information.
  4. Maintains references to source systems for traceability.
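A toy sketch of that pipeline, in plain Python with invented records (production systems would use probabilistic matching rather than an exact email key), might look like this. The first non-empty value wins per field, and every contributing source system is recorded for traceability:

```python
def unify_customers(records, key="email"):
    """Merge duplicate customer records: first non-empty value per field wins."""
    unified = {}
    for rec in records:
        k = rec.get(key)
        merged = unified.setdefault(k, {"_sources": []})
        for field, value in rec.items():
            if field == "_source":
                merged["_sources"].append(value)   # keep traceability to sources
            elif value not in (None, ""):
                merged.setdefault(field, value)    # only fill gaps, never overwrite
    return list(unified.values())

records = [
    {"email": "a@x.com", "name": "Ada", "phone": "", "_source": "crm"},
    {"email": "a@x.com", "name": "Ada L.", "phone": "555-0100", "_source": "ecommerce"},
]
golden = unify_customers(records)  # one unified record instead of two duplicates
```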

This process creates a clean, deduplicated view of customer data that can be used confidently for customer analytics, personalization, and reporting in the Gold layer.

The Silver layer is where you establish and enforce quality rules to ensure data reliability. These quality rules might include:

  • Range validation for numeric fields.
  • Format validation for structured data like dates and identifiers.
  • Completeness checks for required fields.
  • Cross-field validation to ensure logical consistency.
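These rule types can be expressed as a simple table of named checks. The field names and thresholds below are assumptions for illustration, not a standard; real deployments typically use a framework like Delta Live Tables expectations or Great Expectations:

```python
import re

# Illustrative Silver-layer quality rules: range, format, and completeness checks
RULES = {
    "amount":      lambda v: v is not None and 0 <= float(v) <= 100_000,
    "order_dt":    lambda v: bool(re.fullmatch(r"\d{4}-\d{2}-\d{2}", v or "")),
    "customer_id": lambda v: v not in (None, ""),
}

def validate(record):
    """Return the names of the rules a record fails; an empty list means it passes."""
    return [name for name, rule in RULES.items() if not rule(record.get(name))]

good = {"amount": "19.99", "order_dt": "2025-03-27", "customer_id": "C-42"}
bad  = {"amount": "-5", "order_dt": "27/03/2025", "customer_id": ""}
```

Records that fail can be quarantined for review rather than silently dropped, preserving the audit trail back to Bronze.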

Schema standardization is a critical function of the Silver layer. Here, you define consistent schemas that make data more predictable and easier to work with.

Schema standardization includes:

  • Setting appropriate column names and data types.
  • Applying constraints where necessary.
  • Creating a unified view across data from multiple sources.
  • Documenting schema changes for data lineage.
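As a minimal sketch (column names, renames, and types are all invented for the example), a target schema can be expressed as a mapping of column names to converters, with a rename table to unify inconsistent source naming:

```python
from datetime import date

# Target Silver schema: unified column name -> type converter
SILVER_SCHEMA = {
    "order_id": str,
    "amount":   float,
    "order_dt": date.fromisoformat,
}

# Map inconsistent source column names onto the unified schema
RENAMES = {"OrderID": "order_id", "amt": "amount", "dt": "order_dt"}

def standardize(record):
    """Rename source columns, then cast each value to the schema's type."""
    renamed = {RENAMES.get(k, k): v for k, v in record.items()}
    return {col: cast(renamed[col]) for col, cast in SILVER_SCHEMA.items()}

row = standardize({"OrderID": "A-1", "amt": "19.99", "dt": "2025-03-27"})
```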

The Silver layer is where data truly begins to deliver value, striking a balance between the raw preservation of the Bronze layer and the specialized analytics focus of the Gold layer.

The Gold layer: Business-ready data

In the Gold layer, data is no longer just clean—it's enriched, aggregated, and structured to serve specific business needs. This layer is where your data becomes truly actionable, enabling:

  • Advanced analytics and visualizations.
  • Complex business intelligence reporting.
  • Machine learning model training.
  • Executive dashboards and KPIs.

The Gold layer commonly uses dimensional modeling techniques like star and snowflake schemas. These designs optimize analytical query performance by organizing data into:

  • Fact tables: Containing core business metrics and measurements.
  • Dimension tables: Providing context through descriptive attributes.

This structure makes complex queries more intuitive and significantly improves query performance for business users.
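A miniature star schema makes the pattern concrete. The tables below are invented for illustration: a fact table of sales measurements keyed to a product dimension, with an analytical query resolved through the dimension's attributes:

```python
# Fact table: core business measurements, one row per sale
fact_sales = [
    {"product_id": 1, "amount": 19.99},
    {"product_id": 1, "amount": 5.00},
    {"product_id": 2, "amount": 12.50},
]

# Dimension table: descriptive context for each product key
dim_product = {
    1: {"name": "Widget", "category": "Tools"},
    2: {"name": "Gadget", "category": "Toys"},
}

# Analytical query: revenue by category, joining facts to the dimension
revenue_by_category = {}
for row in fact_sales:
    category = dim_product[row["product_id"]]["category"]
    revenue_by_category[category] = revenue_by_category.get(category, 0.0) + row["amount"]
```

In a warehouse this is a star join over SQL tables; the structure is the same, with metrics in the fact table and descriptive attributes pushed out to dimensions.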

Pre-aggregating data is another Gold-layer strategy that reduces query time and computational resources. Aggregations like the following improve dashboard performance and cut compute costs when users frequently need summary data:

  • Daily/weekly/monthly summaries.
  • Department or region-based aggregations.
  • Pre-calculated business metrics and KPIs.
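A daily summary, for example, is computed once at load time so dashboards read a small summary table instead of rescanning every transaction. A minimal sketch with invented data:

```python
from collections import defaultdict

# Transaction detail rows (illustrative)
transactions = [
    {"order_dt": "2025-03-27", "amount": 10.0},
    {"order_dt": "2025-03-27", "amount": 15.0},
    {"order_dt": "2025-03-28", "amount": 7.0},
]

# Pre-aggregated Gold table: one small row per day instead of every transaction
daily_summary = defaultdict(lambda: {"orders": 0, "revenue": 0.0})
for tx in transactions:
    day = daily_summary[tx["order_dt"]]
    day["orders"] += 1
    day["revenue"] += tx["amount"]
```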

While the Silver layer may maintain normalized data structures, the Gold layer often intentionally denormalizes data to optimize query performance. By duplicating some information across tables, you reduce the need for costly joins when running analytical queries.

A prime example of the Gold layer's power is creating a 360-degree customer view by combining data from multiple Silver tables. This comprehensive profile might include:

  • Transaction history across all channels.
  • Customer service interactions.
  • Product preferences and purchase patterns.
  • Demographic and firmographic information.
  • Calculated loyalty metrics and lifetime value.

This unified view enables marketing teams to better understand customer behavior, sales teams to identify cross-selling opportunities, and product teams to design more relevant features.
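Conceptually, building the profile is a join across several Silver tables plus a few calculated metrics. The sketch below uses invented in-memory tables and a made-up `customer_360` helper purely to show the shape of the result:

```python
# Stand-ins for Silver tables (all data invented for illustration)
transactions    = {"C-42": [{"amount": 19.99}, {"amount": 5.00}]}
support_tickets = {"C-42": [{"topic": "returns"}]}
demographics    = {"C-42": {"region": "EMEA"}}

def customer_360(customer_id):
    """Combine Silver tables into one business-ready customer profile."""
    txs = transactions.get(customer_id, [])
    return {
        "customer_id": customer_id,
        "order_count": len(txs),
        "lifetime_value": round(sum(t["amount"] for t in txs), 2),  # calculated metric
        "open_topics": [t["topic"] for t in support_tickets.get(customer_id, [])],
        **demographics.get(customer_id, {}),  # demographic attributes merged in
    }

profile = customer_360("C-42")
```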

The Gold layer is where your data investment truly pays off, transforming raw information into strategic business assets. By incorporating these implementation patterns and optimization strategies, you'll create a foundation for data-driven decision-making that scales with your organization's needs.

Is medallion architecture always the best choice?

The most effective data architects recognize that architectural decisions should be driven by business requirements, technical constraints, and organizational capabilities—not by trending patterns. 

While medallion architecture provides an excellent framework for many scenarios, your specific situation might call for a different approach or a hybrid solution that incorporates elements from multiple architectural patterns.

The medallion architecture's multi-layered approach comes with several potential drawbacks worth considering:

  • High Implementation Costs: Populating the Bronze layer involves extensive data copying and processing, while moving data to Silver and Gold layers incurs additional costs for compute, storage, and network transfers. These expenses can accumulate rapidly, especially when teams build redundant pipelines due to unclear data accessibility.
  • Consumer-Centric Burden: The medallion approach often places the responsibility of data access on consumers. Downstream users must create ETL jobs to pull data, remodel it, and clean it before use, creating tight coupling with source data models and making pipelines fragile when upstream changes occur.
  • Complexity at Scale: As organizations grow, the multi-layered approach can become increasingly complex. Managing dependencies, ensuring consistency, and maintaining performance across layers requires significant effort and expertise that not all teams possess.
  • Limited Self-Service Capabilities: Medallion architecture was designed primarily for centralized data teams, making it less adaptable to self-service data platforms. This tight control can hinder agility and overburden data engineers with requests.

So how can you decide whether medallion architecture is your best choice or if it’s time to move on from it? Evaluate alternatives when:

  • Real-time processing is paramount: If your use case demands immediate data processing and insights, the multi-step refinement process of medallion architecture might introduce unacceptable latency. Stream processing architectures might be more appropriate.
  • Budget constraints are significant: Organizations with limited resources may find the storage and computational requirements of maintaining three distinct data layers prohibitively expensive. Data mesh or simpler data lake designs might be more cost-effective.
  • Self-service is a priority: If empowering business users to access and manipulate data without engineering support is crucial, you might benefit from domain-oriented architectures that emphasize accessibility over the structured progression of medallion architecture.
  • Your team lacks specialized expertise: Successfully implementing and maintaining medallion architecture requires specific skills in data engineering. Teams without this expertise might achieve better results with simpler architectures.

Future trends in medallion architecture

As data ecosystems continue to evolve, medallion architecture is adapting to meet new challenges and opportunities.

Data mesh principles emphasize domain-oriented decentralized data ownership and self-serve infrastructure. These principles can complement medallion frameworks, potentially allowing domain teams to manage their own data products while maintaining quality standards.

Medallion architecture is also increasingly adopted in machine learning operations. Its tiered structure helps teams manage high-quality, consistent training data and supports the components needed for effective machine learning, from feature transformation to model management.

Organizations can use these practices to improve their data architecture, deliver timely insights, and speed up pipeline development.

Metadata is becoming a first-class citizen in modern medallion implementations. Future trends include automated data quality scoring across layers, AI-powered data cataloging and discovery, enhanced data lineage tracking from source to consumption, and observability frameworks that monitor the health of each layer.

The evolution of medallion architecture reflects the maturing data landscape, where quality, governance, scalability, and real-time capabilities must coexist to deliver business value.

Extracting the most from your medallion strategy

Whether or not medallion architecture is the right approach for your company, extracting meaningful insights from data remains a significant challenge for many organizations. Data engineers are frequently overwhelmed with multiple requests from across the business, while the increasing complexity of data pipelines makes it difficult to deliver timely insights to stakeholders.

This is where Prophecy comes in, offering a solution designed to help you maximize the value of your data regardless of your architectural approach:

  • Visual development interface that simplifies pipeline creation and maintenance, allowing engineers to focus on delivering insights rather than debugging complex code.
  • Standardized project templates that implement best practices for each layer of your data architecture, ensuring consistency and quality.
  • Automated data lineage tracking that provides clear visibility into how data flows through your systems, making governance and troubleshooting easier.
  • Enterprise-grade version control that enables teams to collaborate effectively while maintaining the integrity of your data pipelines.
  • Seamless integration with existing tools that works with your current data stack, eliminating the need for disruptive migrations.

If you’re keen to cut the time it takes to disseminate business intelligence in your organization, read The future of data transformation, which details how modern low-code visual designers can help your data engineers win back more of their time.

Ready to give Prophecy a try?

You can create a free account and get full access to all features for 21 days. No credit card needed. Want more of a guided experience? Request a demo and we’ll walk you through how Prophecy can empower your entire data team with low-code ETL today.
