# Generate

The `generate` function in DUQL is used to create new columns or modify existing ones based on expressions, calculations, and conditional logic. It's a powerful tool for data transformation and enrichment within your query pipeline.

## Syntax

```yaml
generate:
  <new_column_name>: <expression>
```

## Parameters

| Parameter  | Type   | Required | Description                                                                |
| ---------- | ------ | -------- | -------------------------------------------------------------------------- |
| `generate` | object | Yes      | Key-value pairs where keys are new column names and values are expressions |

## Behavior

* Creates new columns or modifies existing ones based on the provided expressions.
* Can reference other columns, use functions, and implement conditional logic.
* Executes in the order specified within the `generate` block.

## Examples

### Basic Column Generation

```yaml
generate:
  full_name: f"{first_name} {last_name}"
```

### Boolean Flags

```yaml
generate:
  is_northern: latitude > 0
  should_have_shipped_today: order_time < @08:30
```

### Conditional Logic

```yaml
generate:
  discount:
    case:
    - total_purchases > 1000: 0.15
    - total_purchases > 500: 0.10
    - total_purchases > 100: 0.05
    - true: 0
```

### Complex Calculations

```yaml
generate:
  bmi: weight / (height / 100) ^ 2
  health_status:
    case:
    - bmi < 18.5: "Underweight"
    - bmi < 25: "Normal"
    - bmi < 30: "Overweight"
    - true: "Obese"
```

### Using SQL Functions

```yaml
generate:
  clean_text: 
    sql'REGEXP_REPLACE(LOWER(text), ''[^a-z0-9 ]'', '''')'
  word_count:
    sql'ARRAY_LENGTH(SPLIT(clean_text, '' ''))'
```

### Financial Calculations

```yaml
generate:
  revenue: price * quantity
  profit_margin: (revenue - cost) / revenue * 100
  performance:
    case:
    - profit_margin > 30: "Excellent"
    - profit_margin > 20: "Good"
    - profit_margin > 10: "Average"
    - true: "Poor"
```

### Geospatial Calculations

```yaml
generate:
  distance: 
    sql'ST_Distance(ST_Point(long, lat), ST_Point(-74.006, 40.7128))'
  is_nearby: distance < 10
```

## Best Practices

1. 🏷️ Use clear and descriptive names for generated columns.
2. 🧮 Break down complex calculations into multiple steps for readability.
3. 🔍 Validate your generated columns, especially when using complex expressions.
4. 🚀 Consider performance implications when generating columns with heavy calculations.
5. 📊 Use `case` statements for complex conditional logic.
6. 🔄 Remember that columns generated earlier can be used in subsequent generations within the same `generate` block.

***

> 💡 **Tip:** The `generate` function is your Swiss Army knife for data transformation in DUQL. Use it to create new insights, segment your data, and prepare it for deeper analysis. Don't hesitate to combine multiple `generate` steps in your pipeline for complex transformations!


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.duql.dev/intermediate/generate.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
