Generate
The generate function in DUQL is used to create new columns or modify existing ones based on expressions, calculations, and conditional logic. It's a powerful tool for data transformation and enrichment within your query pipeline.
Syntax
generate:
<new_column_name>: <expression>Parameters
generate
object
Yes
Key-value pairs where keys are new column names and values are expressions
Behavior
Creates new columns or modifies existing ones based on the provided expressions.
Can reference other columns, use functions, and implement conditional logic.
Executes in the order specified within the
generateblock.
Examples
Basic Column Generation
generate:
full_name: f"{first_name} {last_name}"Boolean Flags
generate:
is_northern: latitude > 0
should_have_shipped_today: order_time < @08:30Conditional Logic
generate:
discount:
case:
- total_purchases > 1000: 0.15
- total_purchases > 500: 0.10
- total_purchases > 100: 0.05
- true: 0Complex Calculations
generate:
bmi: weight / (height / 100) ^ 2
health_status:
case:
- bmi < 18.5: "Underweight"
- bmi < 25: "Normal"
- bmi < 30: "Overweight"
- true: "Obese"Using SQL Functions
generate:
clean_text:
sql'REGEXP_REPLACE(LOWER(text), ''[^a-z0-9 ]'', '''')'
word_count:
sql'ARRAY_LENGTH(SPLIT(clean_text, '' ''))'Financial Calculations
generate:
revenue: price * quantity
profit_margin: (revenue - cost) / revenue * 100
performance:
case:
- profit_margin > 30: "Excellent"
- profit_margin > 20: "Good"
- profit_margin > 10: "Average"
- true: "Poor"Geospatial Calculations
generate:
distance:
sql'ST_Distance(ST_Point(long, lat), ST_Point(-74.006, 40.7128))'
is_nearby: distance < 10Best Practices
🏷️ Use clear and descriptive names for generated columns.
🧮 Break down complex calculations into multiple steps for readability.
🔍 Validate your generated columns, especially when using complex expressions.
🚀 Consider performance implications when generating columns with heavy calculations.
📊 Use
casestatements for complex conditional logic.🔄 Remember that columns generated earlier can be used in subsequent generations within the same
generateblock.
💡 Tip: The
generatefunction is your Swiss Army knife for data transformation in DUQL. Use it to create new insights, segment your data, and prepare it for deeper analysis. Don't hesitate to combine multiplegeneratesteps in your pipeline for complex transformations!
Last updated
Was this helpful?