Author:

Kamil Klepusewicz

Software Engineer

Date:

Table of Contents

Views, as virtual query-based interfaces, play a pivotal role in The Medallion Architecture in Databricks by enabling data abstraction, governance, and efficient transformations.

 

In this technical guide, I explore how Views enhance Medallion Architecture, offering detailed explanations and practical SQL code examples for each layer. By leveraging Views effectively, organizations can achieve robust, high-quality data pipelines in Databricks.

 

What Is Medallion Architecture?

 

For a detailed overview of the Medallion Architecture—including how the Bronze, Silver, and Gold layers work—check out What is Medallion Architecture in Databricks and How to Implement It.

 

Why Views Matter in This Architecture

 

Views in Databricks are virtual tables defined by SQL queries, providing dynamic access to data without physical storage. In Medallion Architecture, views are essential for:

 

  • Data Abstraction: Simplifying complex datasets by exposing only relevant fields or aggregated results to end-users.
  • Governance: Enforcing fine-grained access controls through Unity Catalog, ensuring compliance with security policies (Databricks Unity Catalog Governance: 3 Proven Techniques).
  • Efficiency: Reducing storage costs by avoiding redundant data copies during transformations.

 

Views enable flexible data modeling and secure access, making them indispensable for managing data across the architecture’s layers. 

 

For foundational knowledge, refer to Understanding Views in Databricks: Basics and Use Cases.

 

Layer Breakdown

 

Each layer of Medallion Architecture serves a distinct purpose, with views facilitating data access and transformation. Below, we detail the Bronze, Silver, and Gold layers, including their roles, how they differ, and SQL code examples demonstrating view implementation.

 

Bronze Layer: Raw Data Ingestion

The Bronze layer ingests raw data from sources such as cloud storage, Kafka, or APIs, storing it „as-is” with minimal validation. Its purpose is to preserve the original data for auditing and traceability, often using flexible schemas (e.g., strings or VARIANT types) to accommodate changes. Views in this layer parse raw formats like JSON, providing a structured interface for downstream processing.

 

Parsing JSON Data with a View

— Create a Bronze table for raw JSON data

CREATE TABLE bronze_user_events

USING DELTA

LOCATION '/mnt/bronze/user_events’

AS SELECT * FROM json.`/mnt/source/user_events_json`;

 

— Create a view to parse JSON fields

CREATE VIEW bronze_events_view

AS SELECT

  get_json_object(value, '$.user_id’) AS user_id,

  get_json_object(value, '$.event_type’) AS event_type,

  get_json_object(value, '$.timestamp’) AS event_timestamp,

  get_json_object(value, '$.details’) AS details

FROM bronze_user_events;

 

 

This view extracts key fields from JSON data, enabling structured queries without altering the raw storage. The Bronze layer is distinguished by its focus on data capture and minimal processing.

 

Silver Layer: Data Cleansing and Normalization

The Silver layer transforms Bronze data through cleansing, deduplication, and schema enforcement, producing validated datasets for analytics. It handles null values, resolves out-of-order data, and applies quality checks. Views in this layer expose cleaned, normalized data, often integrating multiple sources or applying business rules.

 

Validating and Normalizing Data with a View

— Create a Silver table with cleansed data

CREATE TABLE silver_user_events

USING DELTA

LOCATION '/mnt/silver/user_events’

AS SELECT

  user_id,

  event_type,

  to_timestamp(event_timestamp) AS event_time,

  details

FROM bronze_events_view

WHERE user_id IS NOT NULL

  AND event_type IN (’click’, 'view’, 'purchase’)

  AND event_timestamp IS NOT NULL;

 

— Create a view for normalized data

CREATE VIEW silver_normalized_events

AS SELECT

  user_id,

  event_type,

  date_trunc(’day’, event_time) AS event_date,

  trim(details) AS details

FROM silver_user_events

WHERE length(details) > 0;

 

 

This view normalizes event timestamps to daily granularity and ensures data quality by filtering invalid records. The Silver layer is distinguished by its focus on data quality and structure, preparing data for advanced analytics.

 

Gold Layer: Business-Ready Datasets

The Gold layer aggregates Silver data into optimized datasets tailored for specific business needs, such as reporting or machine learning. It involves dimensional modeling, aggregations, and performance optimizations. Views in this layer provide curated, business-friendly access, often performing complex joins or aggregations.

 

Aggregating Data with a View

— Create a Gold table for daily user activity

CREATE TABLE gold_daily_user_activity

USING DELTA

LOCATION '/mnt/gold/daily_user_activity’

AS SELECT

  event_date,

  user_id,

  count(*) AS total_events,

  sum(CASE WHEN event_type = 'purchase’ THEN 1 ELSE 0 END) AS purchase_count

FROM silver_normalized_events

GROUP BY event_date, user_id;

 

— Create a view for top users by purchase count

CREATE VIEW gold_top_users

AS SELECT

  event_date,

  user_id,

  total_events,

  purchase_count

FROM gold_daily_user_activity

WHERE purchase_count > 0

ORDER BY purchase_count DESC

LIMIT 100;

 

 

This view identifies the top 100 users by purchase count, optimized for business reporting. The Gold layer is distinguished by its focus on business-specific, high-performance datasets.

 

 

Views vs Delta Tables

 

Views and Delta tables serve complementary roles in Medallion Architecture. Views are virtual, defined by SQL queries, and do not store data, making them ideal for data abstraction and governance. Delta tables provide physical storage with ACID transactions, time travel, and schema evolution, ensuring reliability and performance. A detailed comparison follows:

 

Feature Views Delta Tables
Storage Virtual, no physical storage Physical storage in Delta format
Performance Computed on query, may be slower Pre-computed, optimized for speed
Use Cases Data abstraction, access control Data persistence, analytics
Governance Supports Unity Catalog controls Supports Unity Catalog, auditing

 

What’s Next?

 

To deepen your Databricks expertise, explore these technical resources:

 

 

Mastering Medallion Architecture with views enables you to build scalable, governed data pipelines in Databricks. For expert guidance or tailored solutions, contact us at Contact Us.