Group 4 - Checklist Level 2
dbt Level Up: Hardening for Production
Start on this checklist once you have completed Checklist Level 1.
In this level you will apply the following skills:
- dbt-expectations - statistical guardrails on critical models
- Model contracts - schema enforcement at compile time
- Test severity - `warn` vs `error` decisions
- CI/CD design - slim CI, nightly refresh, production deploy
- Hard requirements - what must pass before any production deploy
Work through the steps in order. Document decisions as you go - you'll present findings at 16:00 on Day 2.
Step 1 - Apply dbt-expectations to critical models
Add statistical tests to the most important mart models. Focus on:
- Row count bounds (catch silent truncations)
- Column value bounds (catch sign errors, unit errors)
- Column completeness (null rate below threshold)
Hint: dbt-expectations on fct_ad_impressions
```yaml
models:
  - name: fct_ad_impressions
    data_tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1000         # fail if the table is suspiciously small
          max_value: 100000000    # fail if it explodes (fan-out bug)
    columns:
      - name: impressions_count
        data_tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 10000000
          - dbt_expectations.expect_column_values_to_not_be_null:
              mostly: 1.0         # 100% non-null required
      - name: click_through_rate
        data_tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0.0
              max_value: 1.0      # CTR can't exceed 100%
```
Hint: dbt-expectations on content_performance
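The column list for `content_performance` isn't shown here, so the names below (`total_impressions`, `avg_ctr`) are illustrative; swap in the model's actual columns. A minimal sketch following the same pattern as `fct_ad_impressions`:

```yaml
models:
  - name: content_performance
    data_tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 100            # illustrative bound; tune to real volumes
    columns:
      - name: total_impressions     # hypothetical column name
        data_tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0          # impressions can never be negative
      - name: avg_ctr               # hypothetical column name
        data_tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0.0
              max_value: 1.0        # a rate is bounded by [0, 1]
```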
Step 2 - Define model contracts on critical marts
Add `contract: {enforced: true}` to `content_performance` and `revenue_by_content`. This means dbt will verify the model's output schema matches the YAML definition at compile time.
Hint: Contract config
```yaml
models:
  - name: revenue_by_content
    config:
      contract:
        enforced: true
    columns:
      - name: content_id
        data_type: varchar
        constraints:
          - type: not_null
      - name: impression_date
        data_type: date
        constraints:
          - type: not_null
      - name: mediapulse_revenue_dollars
        data_type: float
```
If the model produces a column with a different type or name, the run fails with a clear error - this catches schema drift before it reaches consumers.
Warning
Contracts require that all columns in the model are listed in YAML. Missing columns cause a compile error. Use dbt-codegen (Level 1 Step 5) to get the full column list first.
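To get that full column list quickly, dbt-codegen's `generate_model_yaml` macro prints a YAML skeleton for one or more models; this invocation is a sketch (adjust model names to the ones you're contracting):

```shell
# Prints a YAML skeleton with every column of the listed models;
# paste it into the model's YAML file, then fill in/verify data types.
dbt run-operation generate_model_yaml --args '{"model_names": ["content_performance", "revenue_by_content"]}'
```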
Step 3 - Review test severity across the project
Go through all model YAML files and consider which tests should be `warn` vs `error`:

| Severity | Use when |
|---|---|
| `error` | Failure means data is corrupt or a key business invariant is violated |
| `warn` | Failure is unexpected but not immediately harmful; needs investigation |
Hint: Setting severity
```yaml
columns:
  - name: mediapulse_revenue_dollars
    data_tests:
      - dbt_expectations.expect_column_values_to_be_between:
          min_value: 0
          max_value: 1000000
          config:
            severity: warn   # revenue exceeding $1M/row is suspicious but not a hard stop
```
Good candidates for `warn` severity:
- Row count bounds (catch trends, not hard failures)
- Freshness checks beyond a certain threshold
- `accepted_values` tests on categories that might legitimately grow

Hard `error`:
- `not_null` on primary keys
- `unique` on primary keys
- `relationships` tests (broken FK = broken joins)
- Revenue assertions (money must be right)
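Applied to a single model, the split above might look like this; `content_category` and its accepted values are hypothetical, and the primary-key column follows the Step 2 example:

```yaml
columns:
  - name: content_id
    data_tests:
      - not_null    # hard error: a missing primary key breaks joins downstream
      - unique      # hard error: duplicates silently inflate aggregates
  - name: content_category          # hypothetical column
    data_tests:
      - accepted_values:
          values: ['video', 'article', 'podcast']   # illustrative set
          config:
            severity: warn   # new categories may legitimately appear; investigate, don't block
```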
Step 4 - Design the CI/CD pipeline
Design a dbt Cloud job structure for MediaPulse. You need at minimum three jobs:
- Slim CI - triggered on PR open/update; runs only changed models and their downstream
- Nightly full refresh - runs at 02:00 with `--full-refresh` to catch schema drift
- Production deploy - triggered on merge to main; runs `state:modified+` against the production environment
For each job, define:
- Trigger (PR event, cron, API)
- dbt command and selector
- Environment (CI vs prod)
- Whether it uses a deferred environment
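One way to sketch the three jobs' command lists (dbt Cloud supplies the environment and deferral settings; the cron expression and selectors here are starting-point assumptions, not the only valid design):

```shell
# Job 1: Slim CI — trigger: PR open/update; env: CI; defers to prod artifacts
dbt build --select state:modified+

# Job 2: Nightly full refresh — trigger: cron "0 2 * * *"; env: prod
dbt source freshness
dbt build --full-refresh

# Job 3: Production deploy — trigger: merge to main; env: prod
dbt build --select state:modified+
```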
Hint: Slim CI configuration
The slim CI job uses `state:modified+` to only run what changed:

```shell
# In dbt Cloud job commands:
dbt build --select state:modified+ --defer --state ./logs/prod-artifacts
```

The `--defer` flag tells dbt to use production-compiled models for any upstream models that weren't selected. The `--state` flag points to a folder containing the production `manifest.json`.
In dbt Cloud, you set the Deferral environment in the job config and don't need to handle `--state` manually.
Hint: Nightly job
Schedule at 02:00 UTC. Send alerts to a Slack channel on failure. This job should also run `dbt source freshness` to catch upstream data delivery issues.
Step 5 - Define hard requirements vs nice-to-haves
As a group, write a short document (a markdown file in the repo under `docs/production_requirements.md`) that answers:
Hard requirements - must pass before any production deploy:
- All `not_null` + `unique` tests on primary keys pass
- All `relationships` tests pass
- No model contract violations
- `content_performance` and `revenue_by_content` row counts within expected bounds
- Singular revenue assertion tests pass
- dbt-project-evaluator: zero `must_fix` violations remain
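A singular revenue assertion is just a SQL file under `tests/` that selects the rows violating the invariant (the test fails if any rows come back). This sketch assumes the `revenue_by_content` columns from Step 2; the file name is illustrative:

```sql
-- tests/assert_no_negative_revenue.sql (illustrative name)
-- Returns rows carrying negative revenue; any result rows fail the test.
select
    content_id,
    impression_date,
    mediapulse_revenue_dollars
from {{ ref('revenue_by_content') }}
where mediapulse_revenue_dollars < 0
```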
Nice-to-haves - target within next sprint:
- 100% of models have descriptions
- All source columns have tests
- dbt-expectations tests on all fact tables
- `warn`-severity tests for statistical bounds on dimension tables
Hint: Framing for your presentation
The distinction between hard requirements and nice-to-haves is a conversation about risk tolerance. A good way to frame it:
- Hard requirements = failures here mean "someone is making a wrong decision based on this data today"
- Nice-to-haves = failures here mean "we might catch a problem tomorrow instead of today"
Be prepared to justify each item in your list. Not everything needs to be a blocker.
Step 6 - BONUS: Evaluate dbt-project-evaluator coverage gaps
dbt-project-evaluator is configurable - you can disable checks that don't apply to your project or add custom rules. Review the evaluator documentation and:
- Identify any default rules that don't make sense for MediaPulse
- Disable them in `dbt_project.yml` using the evaluator's `vars` config
- Consider whether any project-specific rules are missing (e.g., "all marts must have an exposure defined")
Hint: Disabling a rule
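The evaluator exposes two levers: each check is a model you can switch off with `+enabled: false`, and thresholds are tuned through `vars`. A sketch for `dbt_project.yml` — the specific rule (`fct_undocumented_models`) and var (`documentation_coverage_target`) are examples from the package's docs; verify the exact names against the version you installed:

```yaml
models:
  dbt_project_evaluator:
    marts:
      documentation:
        # Example: skip the undocumented-models check entirely
        fct_undocumented_models:
          +enabled: false

vars:
  dbt_project_evaluator:
    # Example: relax the documentation coverage threshold to 75%
    documentation_coverage_target: 75
```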
Step 7 - Prepare your presentation
At 16:00 Day 2 you have 10–15 minutes to present. Structure:
- What we found - top 5 evaluator violations by risk level
- What we fixed - concrete before/after
- What we added - dbt-expectations tests, contracts, severity review
- CI/CD design - diagram of your three jobs and what each catches
- Hard requirements - your final list with rationale
- What we'd do next - honest backlog
Done?
You've audited, hardened, and documented the MediaPulse platform to production-ready standards. The other groups built features; you built the safety net. Neither is more important - the platform needs both.
Now head to Level 3 to audit test configuration project-wide, and learn dbt unit testing to verify transformation logic in isolation!