Here is the DBT basics and terminology in hand written notes I learnt from. I will share the additional information as per my learning.

DBT Basics and Terminology: A Beginner’s Guide

Introduction

DBT (Data Build Tool) is a powerful framework for transforming raw data into meaningful insights in a data warehouse. It allows data engineers and analysts to write modular SQL, manage transformations, and maintain version control, all while ensuring data quality.

In this guide, we’ll cover key DBT terminology and concepts you need to know to get started.

1. DBT Models

Definition: SQL files in DBT that define how raw data is transformed.
Example: A simple model to calculate total sales per customer:


-- models/customer_total_sales.sql
SELECT
    customer_id,
    SUM(order_amount) AS total_sales
FROM {{ ref('orders') }}
GROUP BY customer_id

Key point: Models are the building blocks of your DBT project.

2. Sources

Definition: Raw tables or views in your data warehouse that DBT references.
Usage: Helps track data lineage and ensures DBT knows where data originates.
Example:


version: 2
sources:
  - name: raw_data
    tables:
      - name: orders
      - name: customers

3. Seeds

Definition: CSV files included in your DBT project that can be loaded into your warehouse.
Use case: Small lookup tables, static reference data, or configuration tables.
Example: countries.csv to map country codes to names.

4. Snapshots

Definition: A way to capture and track changes to source data over time.
Example: Tracking changes in customer status:


{% snapshot customer_status %}
    {{
      config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='last_updated'
      )
    }}
    SELECT * FROM {{ source('raw_data', 'customers') }}
{% endsnapshot %}

Key point: Snapshots are great for slowly changing dimensions.

5. Tests

Definition: Checks to ensure your data meets expectations.
Example: Validate that no customer has a NULL customer_id:


version: 2
models:
  - name: customer_total_sales
    columns:
      - name: customer_id
        tests:
          - not_null

6. References (`ref`)

Definition: Used to reference other DBT models or sources, maintaining dependency order.
Example:


SELECT * FROM {{ ref('customer_total_sales') }}

Benefit: Automatically handles model dependencies and build order.

7. DBT Run & Build

dbt run – Executes your models in order.
dbt build – Runs models, seeds, snapshots, and tests in one command.
Tip: Use --select to run specific models during development.

8. Best Practices

Use modular models — small, reusable SQL files.
Apply naming conventions for clarity (e.g., stg_, int_, fct_).
Implement tests and snapshots to track data quality and history.
Use source.yml to define sources and track lineage.
Commit frequently and use version control (GitHub/GitLab).

9. Conclusion

Understanding DBT’s terminology is the first step to building robust data transformation pipelines. By mastering models, sources, seeds, snapshots, and tests, you can ensure your data is accurate, version-controlled, and ready for analytics.

Please feel free to comment in case of any topics missed. This is a live document and will be uploading