CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is a Coffee Analytics project, a dbt-powered data platform for tracking and analyzing personal coffee tasting experiences. The project transforms coffee data from Collections (primary source) and Airtable (historical data) into dimensional models for personal reference and analysis. It follows a standard dbt medallion architecture with staging → intermediate → marts layers.

Individual Preferences

  • @~/.claude/coffee-analytics-instructions.md

Essential Commands

dbt Operations

  • dbt build — Compile, run, and test all models
  • dbt build --select model_name — Build a specific model
  • dbt build --select model_name+ — Build a model and all downstream models
  • dbt build --select +model_name — Build a model and all upstream models
  • dbt run — Execute all models
  • dbt test — Run all tests
  • dbt compile — Compile SQL without executing
  • dbt deps — Install dependencies from packages.yml
  • dbt clean — Clean the target directory
  • dbt docs generate — Generate documentation
  • dbt run-operation generate_base_model --args '{"source_name": "collections", "table_name": "coffees"}' — Generate base model for a specific source and table
  • dbt run-operation generate_model_yaml --args '{"model_names": ["coffees"], "include_data_types": true}' — Generate YAML for a specific model with data types

Evidence Reports

  • npm run dev --prefix ./reports — Start Evidence dev server for interactive reports
  • npm run build --prefix ./reports — Build Evidence reports for production

Model Selection Patterns

  • dbt build --select staging.collections — Build all Collections staging models (primary source)
  • dbt build --select staging.airtable — Build all Airtable staging models (historical data)
  • dbt build --select marts — Build all mart tables
  • dbt build --select +coffees — Build coffees model with all dependencies

Architecture Overview

Data Sources

  • Collections: Primary coffee database for current coffee tracking and ratings
  • Airtable: Historical coffee data source, maintained for legacy data and some reference information

Model Layers

Staging (models/staging/)

  • Clean and standardize source data
  • Naming: stg_source__entity.sql (e.g., stg_collections__coffees, stg_airtable__ratings)
  • Materialized as views
  • Handle data type conversions and basic transformations
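
A minimal sketch of a staging model in this style (the raw column names in the renamed CTE are hypothetical):

    with

    source as (
        from {{ source('collections', 'coffees') }}
    ),

    renamed as (
        select id as coffee_id,
               name as coffee_name,
               cast(roast_date as date) as roast_date
          from source
    )

    from renamed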

Intermediate (models/intermediate/)

  • Business logic transformations and joins
  • Naming: int_entity_action.sql (e.g., int_flavor_profiles_unnested)
  • Materialized as views
  • Denormalization and data enrichment
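
A sketch of the unnesting pattern behind a model like int_flavor_profiles_unnested, assuming DuckDB's unnest function over a hypothetical flavor_ids array column:

    with

    coffees as (
        from {{ ref('stg_collections__coffees') }}
    ),

    unnested as (
        select coffee_id,
               unnest(flavor_ids) as flavor_id
          from coffees
    )

    from unnested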

Marts (models/marts/)

  • Final dimensional models for personal coffee analytics
  • Naming: entity-focused tables (e.g., coffees, ratings, flavors)
  • Materialized as tables
  • Only the marts layer is exposed to end users via Evidence reports
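
A sketch of how a mart such as coffees might assemble the layers below it (the ref targets and columns are illustrative):

    with

    coffees as (
        from {{ ref('stg_collections__coffees') }}
    ),

    roasters as (
        from {{ ref('stg_collections__roasters') }}
    ),

    final as (
        select coffee_id,
               coffee_name,
               roaster_name
          from coffees
     left join roasters
         using (roaster_id)
    )

    from final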

Key Domains

  • Coffees: Coffee records with roasters, origins, and characteristics
  • Ratings: Personal coffee ratings and tasting notes
  • Flavors: Flavor profiles and categories
  • Origins: Geographic coffee origins and regions
  • Roasters: Coffee roasting companies

Data Modeling Patterns

Key Identifiers

  • Primary keys use _id suffix (e.g., coffee_id, roaster_id)
  • Surrogate keys for flavor profiles use _key suffix
  • Boolean fields use is_ prefix (e.g., is_available, is_decaf)
  • Cross-references maintained via relationship tests
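
For instance, a flavor-profile surrogate key could be derived with dbt_utils (assuming the package is installed via packages.yml; the input columns are illustrative):

    select {{ dbt_utils.generate_surrogate_key(['coffee_id', 'flavor_id']) }} as flavor_profile_key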

Primary Entities

  • Coffees: Central entity representing individual coffee records
  • Ratings: Individual coffee ratings and evaluations
  • Flavors: Flavor categories and profiles
  • Origins: Geographic origins and regions

Custom Macros (macros/)

  • rating_value.sql — Converts rating text to numeric values (-1, 0, 1)
  • extract_id.sql — Extracts ID from Airtable relationship fields
  • extract_ids.sql — Extracts multiple IDs from Airtable array fields
  • dedupe.sql — Deduplication logic for source data
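
As a minimal sketch, rating_value.sql might look like the following; the actual rating labels in the source data are assumptions:

    {% macro rating_value(column_name) %}
        case {{ column_name }}
            when 'Good' then 1
            when 'Okay' then 0
            when 'Bad' then -1
        end
    {% endmacro %}

It would then be called inline in a model, e.g. {{ rating_value('overall_rating') }} as rating_value.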

Development Standards

Model Documentation

  • All models require YAML documentation with column descriptions
  • Use data_tests: rather than the deprecated tests: key
  • Document grain, relationships, and context
  • Include model descriptions explaining purpose and data sources
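
A schema YAML sketch in this style, which also covers the testing strategy described in the next section (names, descriptions, and accepted values are illustrative):

    version: 2

    models:
      - name: coffees
        description: One row per coffee, built from Collections staging data.
        columns:
          - name: coffee_id
            description: Primary key for the coffee record.
            data_tests:
              - unique
              - not_null
          - name: roaster_id
            description: Foreign key to the roasters model.
            data_tests:
              - relationships:
                  to: ref('roasters')
                  field: roaster_id
          - name: caffeine_content
            description: Caffeine level category.
            data_tests:
              - accepted_values:
                  values: ['caffeinated', 'half-caf', 'decaf']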

Testing Strategy

  • Unique/not null tests on primary keys
  • Relationship tests for foreign keys
  • Accepted values tests for categorical columns
  • Data freshness checks on source tables
  • Custom validation for data quality

SQL Formatting Standards

DuckDB from-first Syntax

  • Use DuckDB's from-first syntax: from table_name instead of select * from table_name
  • This is cleaner and more readable for simple table selections
  • Example:
with

source as (
    from {{ source('collections', 'coffees') }}
),

final as (
    from source
)

from final

Column Selection Alignment

  • Within SELECT clauses, align column names consistently with 12-space indentation from left margin
  • Place aliases on the same line with the as keyword
  • Example:
     select coffee_id,
            coffee_name,
            roaster_id,
            coalesce(country, '[Blend]') as country,
            rating_value

Join and Keyword Alignment

  • All SQL keywords (from, join, left join, where, group by, etc.) are right-aligned
  • Single space between right-aligned keyword and following table/column names
  • Use the using keyword when possible for joins on identically named columns
  • Example:
       from coffees
  left join origins
      using (origin_id)
  left join roasters
      using (roaster_id)
      where is_available = true

CTE Structure

  • Use descriptive CTE names that explain the transformation
  • Start with source CTE for staging models
  • Include final CTE for complex models
  • Separate CTEs with blank lines and clear comments

Field Naming Conventions

  • Identifiers: Use _id suffix (coffee_id, roaster_id)
  • Booleans: Use is_ prefix (is_available, is_decaf, is_favorite)
  • Dates: Use clear suffixes (_at for timestamps, _date for dates)
  • Categorical: Use descriptive names (availability, caffeine_content)
  • Nulls: Handle with coalesce and meaningful defaults

Model Materialization

  • Staging: Views (for real-time source data)
  • Intermediate: Views (for transformation flexibility)
  • Marts: Tables (for performance and end-user access)
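
In dbt_project.yml, these defaults would be set per folder along these lines (the project key coffee_analytics is an assumption):

    models:
      coffee_analytics:
        staging:
          +materialized: view
        intermediate:
          +materialized: view
        marts:
          +materialized: table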

Code Quality Standards

Error Handling

  • Use coalesce() for null handling with meaningful defaults
  • Provide fallback values for missing data (e.g., '[Unknown]', '[Blend]')
  • Validate data types and handle edge cases
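
For instance, with illustrative column names:

    select coffee_id,
           coalesce(country, '[Blend]') as country,
           coalesce(roaster_name, '[Unknown]') as roaster_name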

Performance Considerations

  • Use appropriate materializations for each layer
  • Optimize join patterns and avoid unnecessary complexity
  • Consider incremental models for large datasets
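
Should a mart outgrow full rebuilds, an incremental sketch might look like this (the model, unique key, and timestamp column are hypothetical):

    {{ config(materialized='incremental', unique_key='rating_id') }}

    with

    ratings as (
        from {{ ref('stg_collections__ratings') }}
        {% if is_incremental() %}
        where rated_at > (select max(rated_at) from {{ this }})
        {% endif %}
    )

    from ratings

The is_incremental() guard limits each run to rows newer than what the table already holds.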

Documentation Requirements

  • All columns must have descriptions
  • Model descriptions should explain purpose
  • Include examples for complex transformations
  • Document any assumptions or data quirks

README.md

The combined dbt/Evidence project for Sean's coffee ratings log.

The Full Stack

  1. Data is tracked in Collections, a macOS/iOS database app with Shortcuts support.¹
  2. A shortcut exports the data to CSV, converts column names from Title Case to snake_case, and saves the files to S3 via S3 Files. This shortcut is run on a schedule with automations.
  3. This dbt project transforms the data in MotherDuck.
  4. The Evidence site is deployed on Vercel.
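
For reference, the MotherDuck connection via dbt-duckdb would be configured in a profiles.yml along these lines (the profile and database names are assumptions, and authentication is omitted):

    coffee_analytics:
      target: prod
      outputs:
        prod:
          type: duckdb
          path: "md:coffee_analytics"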

Footnotes

  1. Until recently, this was done in Airtable and synced via Fivetran.