# Group 1 - Checklist Level 2

## dbt Level Up
Start on this checklist once you have completed Checklist Level 1.
In this level you will apply the following skills:
- Further testing - `accepted_values` and `relationships`
- Jinja & Macros - `clean_string` and `cents_to_dollars`
- Singular tests - custom SQL assertions
Work through the steps in order. Expand a hint only after you've had a genuine attempt - the struggle is where the learning happens!
## Step 1 - Add accepted_values tests

- [ ] Step complete
The raw streaming data has inconsistent casing - look at:

- `status` and `plan_type` in `raw_streaming.subscriptions`
- `device_type` in `raw_streaming.watch_events`

Add an `accepted_values` test to `_streaming__models.yml` for each of those columns. Run the tests and observe what happens.
Expect failures. The goal right now is to understand why they fail - not fix them yet.
Hint: What values exist?
You'll see values like 'Premium' alongside 'premium', and 'Smart TV' alongside 'smart tv'. Your accepted_values list needs to match exactly what comes out of the staging model - so you have two choices: list every casing variant, or normalise the values in the staging model.
It is up to you to decide which is the right option.
Hint: accepted_values YAML structure
YAML lists can be written two ways - both work for `accepted_values`.
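Both styles shown below with illustrative values - your exploration of the data in Step 1 determines the real list:

Option 1 - flow style:

```yaml
data_tests:
  - accepted_values:
      arguments:
        values: ['basic', 'standard', 'premium']
```

Option 2 - block style:

```yaml
data_tests:
  - accepted_values:
      arguments:
        values:
          - basic
          - standard
          - premium
```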
## Step 2 - Add a relationships test

- [ ] Step complete

A `relationships` test checks that every value in one column exists in another model's column - the dbt equivalent of a foreign key check.

Add a `relationships` test to `stg_streaming__watch_events` to assert that every `content_id` in watch events has a matching row in `stg_streaming__content_catalog`.
Hint: relationships test syntax
In `_streaming__models.yml`, under `stg_streaming__watch_events`:

```yaml
columns:
  - name: column_name
    description: Foreign key to column_name in table_name
    data_tests:
      - not_null
      - relationships:
          arguments:
            to: ref('model_name')
            field: primary_key_column_name
```
Run it:
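For example, selecting by the model the test is attached to:

```shell
dbt test --select stg_streaming__watch_events
```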
If any `watch_events.content_id` value doesn't exist in `content_catalog`, the test fails. What would cause that in a real system?
## Step 3 - Write a clean_string macro

- [ ] Step complete

Create `macros/clean_string.sql`. The macro should accept a column name and return an expression that trims whitespace, converts to lowercase, and coalesces nulls to an empty string.
Hint: Macro skeleton
Macros receive expressions, so callers pass the column name as a string:
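A minimal skeleton matching the spec above (trim, lowercase, coalesce nulls to an empty string):

```sql
-- macros/clean_string.sql
{% macro clean_string(column_name) %}
    coalesce(lower(trim({{ column_name }})), '')
{% endmacro %}
```

The `coalesce` sits outermost because `trim` and `lower` both pass nulls through. Called as `{{ clean_string('device_type') }}` in a model.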
## Step 4 - Apply clean_string in your staging models

- [ ] Step complete

Wrap the following columns in `clean_string`:

- `stg_streaming__watch_events.sql`
    - `device_type`
- `stg_streaming__subscriptions.sql`
    - `status`
    - `plan_type`
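In `stg_streaming__subscriptions.sql`, the call sites might look like this - a sketch only, since the surrounding select and source reference depend on how your staging model is written:

```sql
select
    {{ clean_string('status') }} as status,
    {{ clean_string('plan_type') }} as plan_type
from {{ source('raw_streaming', 'subscriptions') }}
```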
```shell
dbt run --select stg_streaming__watch_events stg_streaming__subscriptions
dbt test --select stg_streaming__watch_events stg_streaming__subscriptions
```
Do the accepted_values tests you wrote in Step 1 still pass?
Hint: In context
After applying the macro, re-run your accepted_values tests from Step 1. If they still fail, run `select distinct status from stg_streaming__subscriptions` to see what's coming through.
## Step 5 - Write a cents_to_dollars macro and apply it

- [ ] Step complete

Create `macros/cents_to_dollars.sql`. The macro accepts a column name and returns a division expression.

Then update `stg_streaming__subscriptions.sql` to use it for `monthly_fee_cents`.
Hint: Macro
The functionality you want is this:
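In plain SQL it is just a division - `100.0` rather than `100` so integer division can't truncate the result:

```sql
monthly_fee_cents / 100.0 as monthly_fee_dollars
```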
Create a macro wrapper in your new file and apply it later using:
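One possible shape for the wrapper and its call site:

```sql
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name) %}
    ({{ column_name }} / 100.0)
{% endmacro %}
```

Then, in the model:

```sql
{{ cents_to_dollars('monthly_fee_cents') }} as monthly_fee_dollars
```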
Verify the output looks right by previewing `stg_streaming__subscriptions`.
## Step 6 - Write a singular test

- [ ] Step complete

Singular tests are plain `.sql` files in the `tests/` folder.
When does a test pass in dbt?
A test passes when the query returns zero rows - any rows returned are failures.
Write a test that asserts no watch event has a duration longer than the content's total runtime. A user can't watch more seconds of content than exist in it.
Create `tests/assert_watch_duration_lte_runtime.sql`.
Hint: Test logic
```sql
-- Returns rows where a watch event is longer than the content's runtime.
-- Zero rows = test passes.
select
    -- add select columns here
from -- use the ref macro
left join -- use the ref macro
using -- add the required column
where -- add the condition where a watch event is longer than the content's runtime
```
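One way the finished test could look - the duration and runtime column names here are assumptions, so swap in whatever your models actually expose:

```sql
-- tests/assert_watch_duration_lte_runtime.sql
-- Assumed column names: watch_duration_seconds, runtime_seconds
select
    content_id,
    we.watch_duration_seconds,
    cc.runtime_seconds
from {{ ref('stg_streaming__watch_events') }} as we
left join {{ ref('stg_streaming__content_catalog') }} as cc
    using (content_id)
where we.watch_duration_seconds > cc.runtime_seconds
```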
Run it:
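Singular tests can be selected by file name:

```shell
dbt test --select assert_watch_duration_lte_runtime
```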
Does it pass on the data? If it fails, look at the failing rows - is it a data quality issue or a logic issue in the test?
Hint: Optional: Write a second singular test
Try a second one: assert that no subscription has a `monthly_fee_dollars` of 0 unless the `plan_type` is `trialing`.
Choose the name of your test and run dbt test on that file to check it works.
## Step 7 - Run the full test suite

- [ ] Step complete
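With no `--select` flag, dbt runs every test in the project:

```shell
dbt test
```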
Fix any remaining failures. A test failure is information - read the error, query the failing rows, understand why before changing anything.
## Step 8 - BONUS: dbt build and check lineage

- [ ] Step complete
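The command for this bonus step:

```shell
dbt build
```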
`dbt build` runs models and tests together in dependency order. Then open the DAG in dbt Cloud and confirm all three staging models appear with green source nodes from `raw_streaming`.
Hint: Generating docs
Now click on the Documentation button - you can find this in the top-left of the console, to the right of the name of your branch.
You should see `streaming` → `stg_streaming__*` with source nodes shown in green.
## Done?
You've added relationship integrity checks, normalized messy source data with macros, and written your first custom SQL assertions. These are the building blocks of a production-grade test suite - nicely done.
Now head to Level 3 to configure your tests with severity, where clauses, and statistical guardrails from dbt_expectations!