Skip to content

Group 1 - Intermediate: StreamVault

This page will give you context for your use case.

Head to the Checklist 1 when you're ready to start.

Testing, Sources, Jinja & Macros

Your slice of MediaPulse

You own the StreamVault data - MediaPulse's subscription streaming platform. The raw data exists in the database but nobody has wired it into the dbt project yet. By the end of today you'll have a fully tested, documented staging layer for streaming data, complete with reusable macros.


Learning objectives

By the end of the hackathon you will be able to:

  • Define dbt sources and explain why they matter
  • Configure source freshness checks
  • Write generic tests (not_null, unique, accepted_values, relationships)
  • Build clean staging models that rename, cast, and normalise raw columns
  • Utilize existing useful dbt packages to elevate your way of working with dbt
  • Write and use Jinja macros to eliminate repetition
  • Run dbt test and interpret failures

Key concepts

Sources

Sources tell dbt about raw tables that live outside your project. They give you:

  • A place to document and test raw data before it enters your models
  • source() references that show lineage in the DAG
  • Freshness monitoring via loaded_at_field

Generic tests

Four built-in generic tests cover most quality checks:

Test What it checks
not_null Column has no NULL values
unique Column has no duplicate values
accepted_values Column only contains values from a defined list
relationships Every value in column A exists in column B

Jinja & Macros

dbt models are Jinja templates. Macros let you define reusable SQL snippets. A macro is just a function:

{% macro cents_to_dollars(column_name) %}
    {{ column_name }} / 100.0
{% endmacro %}

Call it in a model:

select
    {{ cents_to_dollars('monthly_fee_cents') }} as monthly_fee

Relevant tables

You'll work exclusively with streaming:

  • streaming.watch_events
  • streaming.subscriptions
  • streaming.content_catalog

See the MediaPulse overview for full column details.


Time guide

Session Target
Day 1 AM (10:00–12:00) Steps 1–6: sources, freshness, first staging model
Day 1 PM (13:30–16:30) Steps 7–13: remaining models, macros, full test suite