Here is the DBT basics and terminology in hand written notes I learnt from. I will share the additional information as per my learning.
DBT Basics and Terminology: A Beginner’s Guide
Introduction
DBT (Data Build Tool) is a powerful framework for transforming raw data into meaningful insights in a data warehouse. It allows data engineers and analysts to write modular SQL, manage transformations, and maintain version control, all while ensuring data quality.
In this guide, we’ll cover key DBT terminology and concepts you need to know to get started.
1. DBT Models
-
Definition: SQL files in DBT that define how raw data is transformed.
-
Example: A simple model to calculate total sales per customer:
-
Key point: Models are the building blocks of your DBT project.
2. Sources
-
Definition: Raw tables or views in your data warehouse that DBT references.
-
Usage: Helps track data lineage and ensures DBT knows where data originates.
-
Example:
3. Seeds
-
Definition: CSV files included in your DBT project that can be loaded into your warehouse.
-
Use case: Small lookup tables, static reference data, or configuration tables.
-
Example:
countries.csvto map country codes to names.
4. Snapshots
-
Definition: A way to capture and track changes to source data over time.
-
Example: Tracking changes in customer status:
-
Key point: Snapshots are great for slowly changing dimensions.
5. Tests
-
Definition: Checks to ensure your data meets expectations.
-
Example: Validate that no customer has a NULL
customer_id:
6. References (ref)
-
Definition: Used to reference other DBT models or sources, maintaining dependency order.
-
Example:
-
Benefit: Automatically handles model dependencies and build order.
7. DBT Run & Build
-
dbt run– Executes your models in order. -
dbt build– Runs models, seeds, snapshots, and tests in one command. -
Tip: Use
--selectto run specific models during development.
8. Best Practices
-
Use modular models — small, reusable SQL files.
-
Apply naming conventions for clarity (e.g.,
stg_,int_,fct_). -
Implement tests and snapshots to track data quality and history.
-
Use source.yml to define sources and track lineage.
-
Commit frequently and use version control (GitHub/GitLab).
9. Conclusion
Understanding DBT’s terminology is the first step to building robust data transformation pipelines. By mastering models, sources, seeds, snapshots, and tests, you can ensure your data is accurate, version-controlled, and ready for analytics.
Please feel free to comment in case of any topics missed. This is a live document and will be uploading











Comments
Post a Comment