Generate

The generate function in DUQL is used to create new columns or modify existing ones based on expressions, calculations, and conditional logic. It's a powerful tool for data transformation and enrichment within your query pipeline.

Syntax

generate:
  <new_column_name>: <expression>

Parameters

ParameterTypeRequiredDescription

generate

object

Yes

Key-value pairs where keys are new column names and values are expressions

Behavior

  • Creates new columns or modifies existing ones based on the provided expressions.

  • Can reference other columns, use functions, and implement conditional logic.

  • Executes in the order specified within the generate block.

Examples

Basic Column Generation

generate:
  full_name: f"{first_name} {last_name}"

Boolean Flags

generate:
  is_northern: latitude > 0
  should_have_shipped_today: order_time < @08:30

Conditional Logic

generate:
  discount:
    case:
    - total_purchases > 1000: 0.15
    - total_purchases > 500: 0.10
    - total_purchases > 100: 0.05
    - true: 0

Complex Calculations

generate:
  bmi: weight / (height / 100) ^ 2
  health_status:
    case:
    - bmi < 18.5: "Underweight"
    - bmi < 25: "Normal"
    - bmi < 30: "Overweight"
    - true: "Obese"

Using SQL Functions

generate:
  clean_text: 
    sql'REGEXP_REPLACE(LOWER(text), ''[^a-z0-9 ]'', '''')'
  word_count:
    sql'ARRAY_LENGTH(SPLIT(clean_text, '' ''))'

Financial Calculations

generate:
  revenue: price * quantity
  profit_margin: (revenue - cost) / revenue * 100
  performance:
    case:
    - profit_margin > 30: "Excellent"
    - profit_margin > 20: "Good"
    - profit_margin > 10: "Average"
    - true: "Poor"

Geospatial Calculations

generate:
  distance: 
    sql'ST_Distance(ST_Point(long, lat), ST_Point(-74.006, 40.7128))'
  is_nearby: distance < 10

Best Practices

  1. 🏷️ Use clear and descriptive names for generated columns.

  2. 🧮 Break down complex calculations into multiple steps for readability.

  3. 🔍 Validate your generated columns, especially when using complex expressions.

  4. 🚀 Consider performance implications when generating columns with heavy calculations.

  5. 📊 Use case statements for complex conditional logic.

  6. 🔄 Remember that columns generated earlier can be used in subsequent generations within the same generate block.


💡 Tip: The generate function is your Swiss Army knife for data transformation in DUQL. Use it to create new insights, segment your data, and prepare it for deeper analysis. Don't hesitate to combine multiple generate steps in your pipeline for complex transformations!

Last updated