Group 4 - Power Users: Production Hardening
Your mission
You're the release engineering team. The other groups are building features; you're making sure the whole MediaPulse project is ready to run in production. That means auditing quality, enforcing standards, hardening tests, and designing a CI/CD pipeline that catches problems before they reach the warehouse.
This is a 2-day open hackathon. There is no prescribed order beyond the checklist steps - prioritise based on what you find.
Learning objectives
By the end of the hackathon you will be able to:
- Run dbt-project-evaluator and interpret its output to identify structural problems
- Use dbt-codegen to auto-generate YAML for undocumented sources and models
- Apply dbt-expectations for statistical and distributional data quality tests
- Define model contracts to enforce column-level schemas at run time
- Reason about test severity (`error` vs `warn`) and when each is appropriate
- Design a CI/CD pipeline using dbt Cloud jobs: slim CI, nightly full-refresh, PR validation
- Articulate the difference between hard requirements (must pass before deploy) and nice-to-haves
Key tools
dbt-project-evaluator
Installs as a dbt package. Runs a suite of models that query your project's metadata and flag structural violations (missing documentation, missing tests, model fan-out, exposure gaps, etc.).
Results land in models prefixed fct_ and rpt_ - query them to see violations.
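If you want violations to fail a CI job rather than only warn, one option is to override the severity of the package's tests in dbt_project.yml. This is a minimal sketch, assuming the package's tests default to warn severity; the environment variable name is an illustrative choice, not something defined by the MediaPulse project.

```yaml
# dbt_project.yml (fragment)
data_tests:
  dbt_project_evaluator:
    # Assumed env var: set it to "error" in the CI environment.
    # When it is unset, the tests fall back to "warn".
    +severity: "{{ env_var('DBT_PROJECT_EVALUATOR_SEVERITY', 'warn') }}"
```

Keeping the default at warn lets you triage findings locally without blocking every run, while CI can be made strict once the backlog is cleared.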
dbt-codegen
Generates boilerplate YAML so you don't have to write it by hand.
```shell
# Generate a source definition
dbt run-operation generate_source --args '{"schema_name": "streaming"}'

# Generate model YAML (columns + descriptions)
dbt run-operation generate_model_yaml --args '{"model_names": ["stg_streaming__watch_events"]}'
```
Paste the output into your YAML files and fill in descriptions.
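As an illustration of the fill-in step, a generated source definition might look roughly like the sketch below once descriptions are added. The table name watch_events and both description strings are assumptions inferred from the staging model name above, not the actual MediaPulse source.

```yaml
version: 2

sources:
  - name: streaming
    # Hypothetical description: replace with what the schema really holds
    description: "Raw events landed from the streaming platform."
    tables:
      # Hypothetical table name, guessed from stg_streaming__watch_events
      - name: watch_events
        description: "One row per playback event."
```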
dbt-expectations
A port of Great Expectations into dbt. Adds dozens of statistical tests beyond the four built-in generics.
```yaml
models:
  - name: fct_ad_impressions
    # Table-level test: row count must stay within a plausible range
    data_tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          arguments:
            min_value: 1000
            max_value: 50000000
    columns:
      - name: impressions_count
        # Column-level test: every value must fall within these bounds
        data_tests:
          - dbt_expectations.expect_column_values_to_be_between:
              arguments:
                min_value: 0
                max_value: 10000000
```
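Severity is set per test through its config block, which is where the error versus warn decision becomes concrete. The sketch below reuses the model and column from the example above; the choice of which test warns and which fails, and the thresholds, are illustrative assumptions rather than MediaPulse policy.

```yaml
models:
  - name: fct_ad_impressions
    columns:
      - name: impressions_count
        data_tests:
          # Hard requirement: any null fails the build
          - not_null:
              config:
                severity: error
          # Softer check: warn on a few outliers, fail only past a threshold
          - dbt_expectations.expect_column_values_to_be_between:
              arguments:
                min_value: 0
                max_value: 10000000
              config:
                severity: error
                warn_if: ">0"      # hypothetical threshold
                error_if: ">100"   # hypothetical threshold
```

With severity set to error, dbt evaluates error_if first and falls back to warn_if, so a small number of failing rows surfaces as a warning instead of blocking the deploy.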
Model contracts
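Contracts enforce that a model's schema matches its YAML definition. When a contract is enforced, dbt checks before materializing the model that the columns returned by its SQL match the names and data types declared in YAML. If a column is missing or has the wrong type, the run fails.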
```yaml
models:
  - name: revenue_by_content
    config:
      contract:
        enforced: true
    columns:
      - name: content_id
        data_type: varchar
        constraints:
          - type: not_null
```
CI/CD in dbt Cloud
The goal is to catch problems as early as possible:
| Job | Trigger | Selector | Purpose |
|---|---|---|---|
| Slim CI | PR opened / updated | state:modified+ | Only rebuild what changed and its downstream |
| Nightly full-refresh | Cron 02:00 | + (all) | Full rebuild, catch schema drift |
| PR validation | PR merge | tag:critical | Run critical-path models + tests before merge |
The slim CI job must defer to a production environment: dbt compares your changed models against that environment's latest manifest (manifest.json), which is how it knows what "state:modified" means.
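The selectors in the table can also be kept in version control as named YAML selectors, so job commands reference a name instead of repeating selection syntax. This is a sketch, assuming a selectors.yml file at the project root; the selector names slim_ci and critical_path are made up for illustration.

```yaml
# selectors.yml
selectors:
  - name: slim_ci            # hypothetical name
    description: "Modified models plus everything downstream of them"
    definition:
      method: state
      value: modified
      children: true         # equivalent to the trailing + in state:modified+

  - name: critical_path      # hypothetical name
    description: "Models tagged as critical"
    definition:
      method: tag
      value: critical
```

A job step would then run something like dbt build --selector slim_ci. The state method still needs a comparison manifest, so the deferral requirement described above applies unchanged.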
Time guide
This is intentionally open-ended. A suggested arc is shown below, but this is your hackathon! If you feel the urge to investigate something in the MediaPulse project, or you want to dive into your own projects, let your instructor know!
| Session | Focus |
|---|---|
| Day 1 AM | Audit with dbt-project-evaluator; triage findings |
| Day 1 PM | Fix highest-priority violations; add dbt-codegen YAML |
| Day 2 AM | dbt-expectations; model contracts; test severity review |
| Day 2 PM | CI/CD design; hard requirements document; prep presentation |
Head to the Checklist when you're ready to start.