Group 2 - Checklist Level 3¶
Advanced Testing¶
Start on this checklist once you have completed Checklist Level 2.
In this level you will apply the following skills:
- Test configuration -
severity,where,store_failures,limit - Package-based tests -
dbt_expectationsfor statistical guardrails - Test strategy - deciding what to test, at what severity, and why
Work through the steps in order. Expand a hint only after you've had a genuine attempt - the struggle is where the learning happens!
Step 1 - Install dbt_expectations¶
- Step complete
Add the package to packages.yml:
Then install:
Step 2 - Add severity and where to existing tests¶
- Step complete
Open _news__models.yml and _podcasts__models.yml and apply the following configuration:
- Set
severity: warnon theaccepted_valuestest forstatusinstg_news__articles- new statuses may be added legitimately and shouldn't hard-fail CI. - Add a
wherefilter to theuniquetest onarticle_idso it only evaluatespublishedarticles (drafts may intentionally appear multiple times while being edited). - Set
severity: warnon therelationshipstest linkingauthor_id→stg_news__authors- orphaned authors are worth flagging but shouldn't block a deploy.
Hint: Test config syntax
Add a config: block nested inside any test to change its behaviour:
Step 3 - Enable store_failures on a test¶
- Step complete
store_failures: true writes failing rows to a table in your target schema, making it easy to query and understand which rows caused the failure.
Add a unique test to the article_id in the source news.articles - recall that this column is not unique in the source since articles can be republished.
Run the test:
Then find and query the failures table in Snowflake. Does seeing the actual failing rows help you understand the issue faster than a simple pass/fail count?
Hint: store_failures config
Add store_failures inside a config: block on the test:
Hint: Finding the failures table
After running, dbt prints the table name in the run output:
Step 4 - Add dbt_expectations tests to stg_news__articles¶
- Step complete
Add statistical guardrails to the articles model:
- The table should have at least 20 rows (catch a silent seed/source failure)
word_countshould be between 50 and 5000 (catch data entry errors - no article is 2 words or 50,000 words)
Hint: Row count and value bounds
Row count tests go at the model level; value tests go under a column:
- name: model_name
data_tests:
- dbt_expectations.expect_table_row_count_to_be_between:
arguments:
min_value: <n>
columns:
- name: column_name
data_tests:
- dbt_expectations.expect_column_values_to_be_between:
arguments:
min_value: <n>
max_value: <n> # scope the test to rows where null is not acceptable
Step 5 - Add elementary tests to context_performance¶
- Step complete
Check out the elementary package..
Install the package into your project.
Add the anomaly test to the duration column to check whether the number of seconds in an episode falls outside the expected range based on historical averages.
Before running your tests, make sure Elementary's internal tables exist in your schema
You only need to do this once per environment. What do you see in your output?When would this test fail?
Step 6 - Add a singular test¶
Add a singular test to ensure that if there is a category in the content_performance mart there is always a category_group.
Hint: What is a custom singular test??
Singular tests are plain .sql files that return rows when they fail.
They live in the tests/ directory. It should be called assert_<test_functionality>.sql, where you finish the name describing what the test is doing.
When you have your test save run
Does your new test pass?
Extension: convert this test to a generic test
Generic tests live in the tests/generic/ folder and are reusable across any model and column.
Can you refactor your singular test into a generic one that accepts the model and a list of expected values as arguments?
- Create a folder called
generic/in thetests/directory - Move your current test into the new folder
generic/
Once you have written your generic, how would you apply it in a .yml file?
How to apply a generic test in a yaml file?
Given a generic test, you can apply it in a .yml file like this:
- name: model_name
columns:
- name: column_name
data_tests:
- assert_two_columns_are_equal:
other_column_name: other_column
The arguments (model, column_name) are always the model and column the test is defined under. Any additional arguments become named parameters beneath the test name.
Step 7 - Run the full test suite and review severity split¶
- Step complete
Review the output and discuss as a group:
- Which tests block CI (
error) vs flag for investigation (warn)? - Where did you use
whereto scope a test - and was the scoping decision correct? - Which
store_failurestables would you keep permanently vs enable only for debugging?
Done?
You've moved from writing tests to configuring them - choosing severity, scoping with where, storing failures for debuggability, and adding statistical guardrails with dbt_expectations. These are the skills that separate a test suite that blocks CI from one that just generates noise.
Stretch your test suite! What else would you cover?¶
You've now applied generic tests, singular tests, severity/where config, store_failures, and a handful of dbt_expectations checks. Step back and ask: what else would you test on these models if this were a real production pipeline?
Brainstorm with your group, then pick one or two extra tests to actually add. Some directions to consider:
- Within
dbt_utils(already installed): tests likeexpression_is_true,equal_rowcount,recency, ormutually_exclusive_rangescover things that purenot_null/uniquewon't. - Anomaly / freshness - is the latest
impression_daterecent enough? Is yesterday's row count within 20% of the 7-day average? - Cross-model invariants - does the sum of
allocated_spend_dollarsreconcile to totalspend_dollarsper day? (You partly tested this in Level 2 Step 3 - tighten the tolerance, or add the inverse check.) - Distribution shape -
expect_column_quantile_values_to_be_between,expect_column_stdev_to_be_betweenfromdbt_expectations.
Other packages worth looking at on the dbt package hub:
- 📦
elementary- anomaly detection on volume, freshness, and column-level statistics out of the box. - 📦
dbt_project_evaluator- tests the project itself (model naming, missing tests, missing documentation) rather than the data. - 📦
dbt_meta_testing- assert that every model has a minimum set of tests/docs.
Pick one or two ideas, add them to your YAML or tests/ folder, and run them. Note what was easy vs awkward to express - test gaps usually live in that awkwardness.