Group 3 - Checklist Level 3¶
Advanced Testing¶
Start on this checklist once you have completed Checklist Level 2.
In this level you will apply the following skills:
- Test configuration -
severity,where,store_failures,limit - Package-based tests -
dbt_expectationsfor statistical guardrails on revenue models - Test strategy - deciding what to test, at what severity, and why
Work through the steps in order. Expand a hint only after you've had a genuine attempt - the struggle is where the learning happens!
Step 1 - Install dbt_expectations¶
- Step complete
Add the package to packages.yml:
Then install:
Step 2 - Add severity and where to existing tests¶
- Step complete
Open _ads__models.yml and _revenue__models.yml and apply the following configuration:
- Set
severity: warnon theaccepted_valuestest forcampaign_type- new campaign types may be introduced and shouldn't hard-fail CI. - Add a
wherefilter to thenot_nulltest onspend_dollarsinstg_ads__spendso it only evaluates rows wherespend_dollars > 0(zero-spend rows are valid for paused campaigns). - Configure the
relationshipstest linkingfct_ad_impressions.campaign_id→stg_ads__campaigns. You decide: what severity is appropriate here, and why? Think about what a broken foreign key means for the revenue numbers downstream consumers will see, and what should happen in CI when it breaks. Be ready to defend your choice in the group discussion at the end of this level.
Hint: Test config syntax
A config: block nested under a test changes its behaviour:
- name: campaign_type
data_tests:
- accepted_values:
arguments:
values: ['display', 'video', 'sponsored_content', 'podcast_ad', 'newsletter']
config:
severity: warn
- name: spend_dollars
data_tests:
- not_null:
config:
where: "spend_dollars > 0"
- name: campaign_id
data_tests:
- relationships:
arguments:
to: ref('stg_ads__campaigns')
field: campaign_id
config:
severity: # your call
Step 3 - Enable store_failures on a revenue test¶
- Step complete
Revenue tests that fail are always worth investigating row-by-row. Enable store_failures on your assert_revenue_not_null singular test, and cap the size of the resulting failures table with limit.
Read the dbt docs on the config and figure out exactly where it goes for a singular test (it's a slightly different shape than for a generic test in YAML):
Once configured, run it:
Then query the failures table. Even if the test passes, confirm the failures table exists and is empty - that's proof the test ran correctly.
Heads-up: dbt_expectations is needed from Step 4 onwards
The next steps lean on dbt_expectations. If you skipped Step 1, find the package on the dbt package hub and add it to your packages.yml, then run dbt deps to install it.
Step 4 - Add dbt_expectations tests to fct_ad_impressions¶
- Step complete
This is the highest-volume, most critical fact table. Add the following guardrails in _revenue__models.yml:
- The table should have at least 20 rows (catch a load failure)
impressions_countshould always be greater than 0click_through_rateshould be between 0 and 1 - a CTR above 100% is physically impossible and indicates a calculation errorimpression_dateshould not be null
Hint: CTR bounds
models:
- name: fct_ad_impressions
data_tests:
- dbt_expectations.expect_table_row_count_to_be_between:
arguments:
min_value: 20
columns:
- name: click_through_rate
data_tests:
- dbt_expectations.expect_column_values_to_be_between:
arguments:
min_value: 0.0
max_value: 1.0
config:
severity: error
- name: impressions_count
data_tests:
- dbt_expectations.expect_column_values_to_be_between:
arguments:
min_value: 1
config:
severity: warn
Step 5 - Add dbt_expectations tests to revenue_by_content¶
- Step complete
Add guardrails to the revenue mart:
mediapulse_revenue_dollarsshould always be 0 or greater (revenue can't be negative)allocated_spend_dollarsshould be greater than 0 (zero-allocation rows indicate a join miss)impression_shareshould be between 0 and 1- Add a row count lower bound
Set mediapulse_revenue_dollars >= 0 as severity: error - negative revenue is a hard data error that must block CI.
Hint
models:
- name: revenue_by_content
data_tests:
- dbt_expectations.expect_table_row_count_to_be_between:
arguments:
min_value: 10
columns:
- name: mediapulse_revenue_dollars
data_tests:
- dbt_expectations.expect_column_values_to_be_between:
arguments:
min_value: 0
config:
severity: error
- name: impression_share
data_tests:
- dbt_expectations.expect_column_values_to_be_between:
arguments:
min_value: 0.0
max_value: 1.0
config:
severity: warn
Step 6 - Run the full test suite and review severity split¶
- Step complete
Review and discuss:
- Which tests did you set as
errorvswarn- and what was your reasoning? - The revenue tests are particularly high-stakes. Which ones would you make
errorthat currently aren't? - How would you communicate test failures to non-technical stakeholders? (A failing revenue assertion isn't just a dbt problem.)
As a group, draft a one-paragraph "test contract" for revenue_by_content: what guarantees does this model make to its consumers, and how does the test suite enforce those guarantees?
Step 7 - Stretch your test suite: what else would you cover?¶
- Step complete
You've now applied generic tests, singular tests, severity/where config, store_failures, and a handful of dbt_expectations checks. Step back and ask: what else would you test on these models if this were a real production pipeline?
Brainstorm with your group, then pick one or two extra tests to actually add. Some directions to consider:
- Within
dbt_utils(already installed): tests likeexpression_is_true,equal_rowcount,recency, ormutually_exclusive_rangescover things that purenot_null/uniquewon't. - Anomaly / freshness - is the latest
impression_daterecent enough? Is yesterday's row count within 20% of the 7-day average? - Cross-model invariants - does the sum of
allocated_spend_dollarsreconcile to totalspend_dollarsper day? (You partly tested this in Level 2 Step 3 - tighten the tolerance, or add the inverse check.) - Distribution shape -
expect_column_quantile_values_to_be_between,expect_column_stdev_to_be_betweenfromdbt_expectations.
Other packages worth looking at on the dbt package hub:
- 📦
elementary- anomaly detection on volume, freshness, and column-level statistics out of the box. - 📦
dbt_project_evaluator- tests the project itself (model naming, missing tests, missing documentation) rather than the data. - 📦
dbt_meta_testing- assert that every model has a minimum set of tests/docs.
Pick one or two ideas, add them to your YAML or tests/ folder, and run them. Note what was easy vs awkward to express - test gaps usually live in that awkwardness.
Step 8 - Break things on purpose¶
- Step complete
One-time setup before this step
Editing the seeds only affects your models if your source declaration is reading from your personal seed schema, not the shared one. In models/staging/ads/_ads__sources.yml, set the source's schema to "{{ target.schema }}_ads" if you haven't already - otherwise your changes won't flow through and you'll wonder why nothing fails.
For each scenario: edit the seed, dbt seed, dbt build -s +revenue_by_content, see what fails. Predict the failure first. Revert the seed before moving on.
Negative spend
Change a spend_cents in seeds/_seeds_setup/raw_ads__spend.csv to a negative number.
What fails?
assert_no_negative_spend. not_null on spend_dollars does not fire - the value isn't null, just bad.
Click-through rate above 100%
In seeds/_seeds_setup/raw_ads__impressions.csv, set one clicks value higher than the matching impressions_count.
What fails?
dbt_expectations.expect_column_values_to_be_between on click_through_rate (severity error) - hard fail, build stops.
Unknown campaign type
In seeds/_seeds_setup/raw_ads__campaigns.csv, change one campaign_type to a value not in commission_lookup.csv (e.g. banner).
What fails?
accepted_values_stg_ads__campaigns_campaign_type__display__video__sponsored_content__podcast_ad__newsletter warns - the new value isn't in the configured set (severity warn from L3 Step 2). If you also added the optional relationships test on campaign_type → commission_lookup.campaign_type from L2 Step 8, that would fire too.
Zero impressions
In seeds/_seeds_setup/raw_ads__impressions.csv, set one impressions_count to 0.
What fails?
dbt_expectations.expect_column_values_to_be_between on impressions_count (min 1, severity warn).
Incremental: late-arriving correction
With fct_ad_impressions already built, change a clicks or impressions_count value for an existing (campaign_id, content_id, impression_date) row in seeds/_seeds_setup/raw_ads__impressions.csv. Re-seed, then dbt run --select fct_ad_impressions (no --full-refresh).
What happens?
The corrected row is not picked up - your append strategy + high-watermark filter on impression_date skips it. Now run dbt run --select fct_ad_impressions --full-refresh and confirm the new value lands. This is the trade-off you accepted in L1 Step 10.
Revert after each scenario
Restore the seed and re-run dbt seed before the next one - otherwise you can't tell which test fired in response to what.
Done?
You've moved from writing tests to configuring them - choosing severity, scoping with where, storing failures for debuggability, and adding statistical guardrails around your revenue models. A well-configured test suite is the difference between a pipeline that fails loudly and one that silently serves wrong numbers.