# Group 2 - Checklist Level 2
## dbt Level Up
Start on this checklist once you have completed Checklist Level 1.
In this level you will apply the following skills:
- Testing - `relationships`, `accepted_values`, `not_null`, gap-filling
- Documentation - YAML for your mart
- Snapshots - SCD Type 2 for slowly-changing article metadata
Work through the steps in order. Expand a hint only after you've had a genuine attempt - the struggle is where the learning happens!
## Step 1 - Review existing tests and identify gaps
- [ ] Step complete
Look at `_news__models.yml` and `_podcasts__models.yml`. Are the tests comprehensive? What's missing?
Make a note of at least two gaps you'd like to fill. You'll add them in Step 2.
Hint: Common gaps to look for
- No `relationships` test linking `stg_news__articles.author_id` → `stg_news__authors.author_id`
- No `not_null` test on `published_at` in episodes
- No `accepted_values` on `status` in articles (`draft`, `published`, `archived`)
- No `unique` test on `episode_id` in episodes
## Step 2 - Fill the test gaps
- [ ] Step complete
Go back to _news__models.yml and _podcasts__models.yml and add the missing tests you identified in Step 1.
Run them, and make sure you understand any failures before fixing them.
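For example, to run only the tests defined on one of the staging models (swap in whichever models you changed):

```shell
dbt test --select stg_news__articles
```
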
Hint: Relationships test example
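A sketch of the missing relationships test from Step 1, nested under the article model's columns in `_news__models.yml` (the layout assumes the standard models/columns structure):

```yaml
models:
  - name: stg_news__articles
    columns:
      - name: author_id
        data_tests: # use `tests:` on dbt versions before 1.8
          - relationships:
              to: ref('stg_news__authors')
              field: author_id
```
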
Hint: accepted_values example
Use a YAML list for the accepted values - you have two options for the list syntax.
Option 1:
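A sketch using the inline (flow) list style - nest this under the article model's `status` column:

```yaml
      - name: status
        data_tests:
          - accepted_values:
              values: ['draft', 'published', 'archived']
```
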
Option 2:
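The same test with a block-style list, which is easier to extend later:

```yaml
      - name: status
        data_tests:
          - accepted_values:
              values:
                - draft
                - published
                - archived
```
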
Run the test first to observe the failure, then decide: update the accepted list, or normalise the values in the staging model?
## Step 3 - Add a YAML file for the mart
- [ ] Step complete
Create `models/marts/content/_content__models.yml`. Document `content_performance` with descriptions and tests.
Include at minimum:
- A `not_null` test on `content_id`
- A `not_null` test on `platform`
- A `not_null` test on `published_at`
- An `accepted_values` test on `platform`
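A sketch of what the file could look like (the accepted `platform` values are assumptions - check them against your data before pinning the list):

```yaml
version: 2

models:
  - name: content_performance
    description: "" # describe the mart here
    columns:
      - name: content_id
        data_tests:
          - not_null
      - name: platform
        data_tests:
          - not_null
          - accepted_values:
              values: ['news', 'podcast'] # assumed values - verify in your data
      - name: published_at
        data_tests:
          - not_null
```
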
Hint: Use the codegen package to generate the model YAML
dbt-codegen generates model YAML so you don't have to write it by hand.
1. Add the package to `packages.yml` by adding the following two lines under `dbt_utils`:
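For example (the version shown is an assumption - check hub.getdbt.com for the latest release):

```yaml
  - package: dbt-labs/codegen
    version: 0.13.1 # assumed version - pin to the latest
```
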
2. Install it: packages should install automatically, but to do so manually you can run the following in the command line.
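The manual install command:

```shell
dbt deps
```
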
3. Open a new (or existing) untitled file in dbt Cloud and paste the following, then click </> Compile:
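The macro call to paste - `generate_model_yaml` comes from the codegen package, and the model name matches the mart from Step 3:

```sql
{{ codegen.generate_model_yaml(model_names=['content_performance']) }}
```
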
Copy the compiled output into your `_content__models.yml` and fill in descriptions and any additional tests.
## Step 4 - Create a snapshot for article metadata
- [ ] Step complete
Create `snapshots/snap_news__articles.yml`. This should track changes to article title, category, and status over time using the timestamp strategy.
You can run this snapshot on the source data using the source() macro.
Hint: Snapshot block
Add the following YAML, replacing the placeholders with your own values:

```yaml
snapshots:
  - name: <string> # the name of your snapshot
    relation: source('my_source', 'my_table') | ref('my_model')
    config:
      database: <string>
      schema: <string>
      alias: <string>
      unique_key: <column_name_or_expression>
      strategy: timestamp | check # choose the most appropriate
      updated_at: <column_name> # only when the timestamp strategy is selected
      check_cols: [<column_name>] | all # only when the check strategy is selected
      dbt_valid_to_current: <string> # default is NULL
      hard_deletes: 'ignore' | 'invalidate' | 'new_record' # default is ignore
```
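A concrete sketch for the articles snapshot (the source name and column names are assumptions - match them to your sources YAML):

```yaml
snapshots:
  - name: snap_news__articles
    relation: source('news', 'articles') # assumed source name
    config:
      unique_key: article_id # assumed primary key column
      strategy: timestamp
      updated_at: updated_at # assumed last-modified column
```
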
Run it:
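To run just this snapshot:

```shell
dbt snapshot --select snap_news__articles
```
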
Check the output table. What columns did dbt add? (`dbt_scd_id`, `dbt_updated_at`, `dbt_valid_from`, `dbt_valid_to`)
## Step 5 - Run the snapshot a second time (simulate a change)
- [ ] Step complete
To see the snapshot in action, update the underlying source table name to point to the updated table:
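For example, if your sources YAML uses an `identifier`, you might point it at the updated table (both table names here are assumptions - use whatever your project provides):

```yaml
sources:
  - name: news
    tables:
      - name: articles
        identifier: articles_updated # hypothetical name of the updated table
```
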
Note: in practice the underlying source data would change on its own, and you would not change any reference in dbt - this edit only simulates that change!
Then run dbt snapshot again and query the snapshot table:
```sql
select * from snapshots.snap_news__articles
where dbt_valid_to is not null
order by dbt_updated_at desc
```
You should see the old row with a `dbt_valid_to` value and a new current row where `dbt_valid_to` is null.
Hint: Reading snapshot output
| Column | Meaning |
|---|---|
| `dbt_valid_from` | When this version of the row became current |
| `dbt_valid_to` | When this version was superseded (NULL = still current) |
| `dbt_scd_id` | Surrogate key for this snapshot row |
## Step 6 - Run `dbt build`
- [ ] Step complete
This builds the full lineage and runs all tests together.
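From the command line:

```shell
dbt build
```
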
Fix any remaining failures. A test failure is information - read the error, query the failing rows, understand why before changing anything.
## Step 7 - BONUS: Snapshot for podcast episodes
- [ ] Step complete
Create a snapshot for `podcasts.episodes` tracking changes to `title` and `duration_seconds`. Why might you want to track duration changes?
You can follow the same steps as above, using the "updated" data for episodes.
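Following the same pattern as the article snapshot, a sketch (the source and column names are assumptions):

```yaml
snapshots:
  - name: snap_podcasts__episodes
    relation: source('podcasts', 'episodes') # assumed source name
    config:
      unique_key: episode_id
      strategy: timestamp
      updated_at: updated_at # assumed last-modified column
```
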
## Done?
You've added relationship integrity checks, filled test gaps, documented your mart, and implemented SCD Type 2 for article metadata. These are the building blocks of a production-grade test suite - nicely done.
Your work directly enables Group 3's revenue attribution - they need clean content data to allocate ad revenue correctly.
Now head to Level 3 to configure your tests with severity, where clauses, and statistical guardrails from dbt_expectations!