..Add
🛠️📦 dbt Links
dbt Additional Resources
- dbt Mesh FAQs
- Awesome dbt, a curated list of dbt resources
- SODA, tool to find, analyze, and solve data issues
- Integrating dbt w/ Datadog
learn dbt
- learn.dbt
- [fundamentals]https://learn.getdbt.com/courses/dbt-fundamentals
- [dbt & snowflake]https://courses.getdbt.com/courses/dbt-cloud-and-snowflake-for-developers
- [refactoring SQL]https://courses.getdbt.com/courses/refactoring-sql-for-modularity
- [advanced deployment w/ dbt Cloud]https://courses.getdbt.com/courses/advanced-deployment
dbt code completion
cheat sheet
git workflow for dbt
Go to dbt project
cd ~/projects/{dbt project name}
Start the pipenv shell
pipenv shell
Update master
git fetch
git pull
Checkout branch
New branch:
git checkout -b {new_branch name}
From PR:
git pull origin pull/{PR Number}/head
Update models locally
dbt run --models staging_{name of model}
Add + to run all referenced by and/or depends on models
+model_c for all upstream models, also runs model_a and model_b
model_a+ for all downstream models, also runs model_b and model_c
+model_b+ for all up & down stream models, also runs model_a and model_c
Make sure SQL syntax is correct
sqlfluff fix -f -n {full path of file}
Update/Create model
dbt run —models {model name}
dbt test —models {model name}
Update schema.yml & View local doc changes
dbt docs generate && dbt docs serve --port 8001
Run GIT commands
git add {files}
git commit -m "message here"
git push --set-upstream origin {branch here}
Either wait for PR review or force merge 🫣 in emergency
!merge
Switch back to master, update and delete branch
git checkout master
git fetch
git merge
git branch -d {local_branch}
More notes
Update Incremental models
update models as needed and run the following locally
dbt run -s {model name} --full-refresh
if updating more than one file and you want to run dbt for all of them
dbt run -s {model name} {another model name} path_to/models/* --full-refresh
Once tested and merged into production (make sure buildkite successfully pushed changes), you will need to trigger a full refresh in dagster
- go to dagster
- open
ad-hoceither by searching or navigating to it. - Go into
Launchpadtab - Under
Presetselect “Run” and underModeselect “Prod”. Dagter will give a template to run ad-hoc models. - Replace
model_namewith necessary models to update,-for each model. - At the same level is
models:&task_tags:addfull-refresh: True - When ready, hit
Launch Runin the bottom right corner
Update Seed files
Update all seed files
dbt seed
Update individual seed
dbt seed --select {seed name}
Update individual seed and rebuild table structure
dbt seed --select {seed name} --full-refresh
Building a snapshot via template
replace { }
-- template for building a snapshot model
select
'{% snapshot {source domain name}_' || lower(TABLE_NAME) || '_snapshot %}' || chr(10) || chr(10) ||
'{{' || chr(10) ||
' config(' || chr(10) ||
' unique_key=''' || 'id' || ''',' || chr(10) ||
' strategy=''timestamp'',' || chr(10) ||
' updated_at=''' || lower(column_name) || '::timestamp_ntz'',' || chr(10) ||
' )' || chr(10) ||
'}}' || chr(10) ||
chr(10) ||
'select *' || chr(10) ||
'from {{ source(''catalog_api'', ''' || lower(TABLE_NAME) || ''') }}' || chr(10) ||
'where ' || '_sdc_batched_at' || ' >= {{ safe_max_timestamp(this, ''' || '_sdc_batched_at' || ''') }}' || chr(10)
from information_schema.columns
where table_schema = '{source domain schema}'
and table_catalog = '{source domain catalog}'
and (column_name ilike '%modif%' or column_name ilike '%update%')
and data_type = 'TIMESTAMP_TZ'
Building a staging model from snapshots
replace { }
-- template for building base staging models from snapshots
select
'{{' || char(10)
|| ' config(' || char(10)
|| ' materialized=''table'',' || char(10)
|| ' unique_key=''id'',' || char(10)
|| ' )' || char(10)
|| '}}' || char(10)
|| char(10)
|| 'with stitch_fixed as (' || char(10)
|| ' {{ stitch_utils.coalesce_fields(source(''{snapshot domain}_snapshots'', ''' || lower(table_name) || ''')) }}' || char(10)
|| ' where dbt_valid_to is null' || char(10)
|| ')' || char(10)
|| 'select' || char(10)
|| ' *' || char(10)
|| 'from stitch_fixed' || char(10),
'staging_' || replace(lower(table_name), '_snapshot', '') || '.sql' as file_name,
'select * from {analyst''s schema}.{source domain}.staging_' || replace(lower(table_name), '_snapshot', ''),
' - name: ' || lower(table_name),
*
from information_schema.tables
where 1=1
and table_schema = '{source domain}'
and table_catalog = '{domain snapshot catalog}'