Groups
The group function in DUQL is a powerful tool for aggregating and analyzing data based on specified columns. It supports a wide range of operations, allowing for complex data transformations and analyses within grouped data.
Syntax
group:
by: <grouping_columns>
<additional_steps>Parameters
by
string or array
Yes
Columns to group by. Can be a single column, an array of columns, or 'table.*' for all columns of a table.
additional_steps
object
Yes (at least one)
One or more additional transformation steps to apply to the grouped data.
Additional Steps
The group function can include any of the following steps after grouping:
summarize: Perform aggregations on the grouped data.filter: Apply conditions to filter the grouped results.generate: Create new columns based on grouped data.join: Join the grouped data with another dataset.select: Choose specific columns from the grouped results.sort: Order the grouped results.take: Limit the number of grouped results returned.window: Perform window functions on the grouped data.loop: Perform iterative operations on the grouped data.
Examples
Basic Grouping with Aggregation and Filtering
group:
by: department
steps:
- summarize:
avg_salary: average salary
employee_count: count employee_id
- filter: avg_salary > 50000
- sort: -avg_salary
- take: 5This example groups employees by department, calculates average salary and employee count, filters for departments with high average salaries, sorts by average salary descending, and takes the top 5 results.
Complex Business Logic with Case Statements
group:
by: product_id
steps:
- generate:
stock_status:
case:
- inventory_count == 0: "Out of Stock"
- inventory_count < reorder_point: "Low Stock"
- true: "In Stock"
filter: stock_status != "In Stock"
sort: [stock_status, product_id]This query categorizes products based on their inventory status and filters for those needing attention.
Best Practices
🎯 Choose grouping columns carefully to ensure meaningful aggregations.
📊 Use
summarizeorgenerateto calculate aggregate metrics.🔍 Leverage
filterafter aggregation to focus on important results.📈 Utilize
windowfunctions for advanced analytics within groups.🔢 Use
sortandtaketo prioritize and limit results for better performance.🧮 Combine multiple steps for complex analyses in a single, readable query.
Related Functions
summarize: Often used withingroupto perform aggregations.filter: Can be used before or after grouping to refine the dataset.generate: Useful for creating new columns based on grouped data.window: Enables advanced analytical functions within grouped data.
Limitations and Considerations
Grouping operations can be computationally expensive on large datasets. Use indexing strategies on grouping columns when possible.
Be mindful of the order of operations within the
groupfunction, as it can affect the final results.Some combinations of steps might not be logically valid or may produce unexpected results. Always test your queries thoroughly.
💡 Tip: The
groupfunction is the cornerstone of data analysis in DUQL. By combining it with various additional steps, you can perform complex aggregations, transformations, and analyses all within a single, readable query structure. Experiment with different combinations to unlock deeper insights from your data!
Last updated
Was this helpful?