dbt Basics and Terminology

 Here is the DBT basics and terminology in hand written notes I learnt from. I will share the additional information as per my learning.

DBT Basics and Terminology: A Beginner’s Guide

Introduction

DBT (Data Build Tool) is a powerful framework for transforming raw data into meaningful insights in a data warehouse. It allows data engineers and analysts to write modular SQL, manage transformations, and maintain version control, all while ensuring data quality.

In this guide, we’ll cover key DBT terminology and concepts you need to know to get started.


1. DBT Models

  • Definition: SQL files in DBT that define how raw data is transformed.

  • Example: A simple model to calculate total sales per customer:

-- models/customer_total_sales.sql SELECT customer_id, SUM(order_amount) AS total_sales FROM {{ ref('orders') }} GROUP BY customer_id
  • Key point: Models are the building blocks of your DBT project.


2. Sources

  • Definition: Raw tables or views in your data warehouse that DBT references.

  • Usage: Helps track data lineage and ensures DBT knows where data originates.

  • Example:

version: 2 sources: - name: raw_data tables: - name: orders - name: customers

3. Seeds

  • Definition: CSV files included in your DBT project that can be loaded into your warehouse.

  • Use case: Small lookup tables, static reference data, or configuration tables.

  • Example: countries.csv to map country codes to names.


4. Snapshots

  • Definition: A way to capture and track changes to source data over time.

  • Example: Tracking changes in customer status:

{% snapshot customer_status %} {{ config( target_schema='snapshots', unique_key='customer_id', strategy='timestamp', updated_at='last_updated' ) }} SELECT * FROM {{ source('raw_data', 'customers') }} {% endsnapshot %}
  • Key point: Snapshots are great for slowly changing dimensions.


5. Tests

  • Definition: Checks to ensure your data meets expectations.

  • Example: Validate that no customer has a NULL customer_id:

version: 2 models: - name: customer_total_sales columns: - name: customer_id tests: - not_null

6. References (ref)

  • Definition: Used to reference other DBT models or sources, maintaining dependency order.

  • Example:

SELECT * FROM {{ ref('customer_total_sales') }}
  • Benefit: Automatically handles model dependencies and build order.


7. DBT Run & Build

  • dbt run – Executes your models in order.

  • dbt build – Runs models, seeds, snapshots, and tests in one command.

  • Tip: Use --select to run specific models during development.


8. Best Practices

  1. Use modular models — small, reusable SQL files.

  2. Apply naming conventions for clarity (e.g., stg_, int_, fct_).

  3. Implement tests and snapshots to track data quality and history.

  4. Use source.yml to define sources and track lineage.

  5. Commit frequently and use version control (GitHub/GitLab).


9. Conclusion

Understanding DBT’s terminology is the first step to building robust data transformation pipelines. By mastering models, sources, seeds, snapshots, and tests, you can ensure your data is accurate, version-controlled, and ready for analytics.


Please feel free to comment in case of any topics missed. This is a live document and will be uploading 













Comments