Steps
The steps
component in DUQL represents a series of data transformation operations applied sequentially to your dataset. It's the heart of DUQL's query structure, providing a flexible and intuitive way to express complex data manipulations as a pipeline of operations.
Syntax
Behavior
Each step in the pipeline takes the output of the previous step as its input.
Steps are executed in the order they are listed.
The pipeline is flexible, allowing you to add, remove, or reorder steps easily.
You can use any DUQL function as a step in the pipeline.
Supported Step Types
The steps
pipeline can include any combination of the following operations:
filter
: Select rows based on conditionsgenerate
: Create new columns or modify existing onesgroup
: Aggregate datajoin
: Combine data from multiple sourcesselect
: Choose or compute columnssort
: Order resultstake
: Limit the number of rowswindow
: Perform window functionsappend
: Combine datasets by adding rowsremove
: Exclude specific rows or subsetsintersect
: Find common rows between datasetsdistinct
: Remove duplicate rowsunion
: Combine datasets, removing duplicatesexcept
: Find rows in one dataset but not in anotherloop
: Perform iterative processing
Flexibility and Power
The steps
pipeline in DUQL offers several advantages:
Modularity: Each step performs a specific operation, making queries easy to understand and maintain.
Reusability: You can easily copy, paste, and modify steps between queries.
Readability: The sequential nature of steps makes the data transformation process clear and logical.
Incremental Development: You can build complex queries step by step, testing and refining as you go.
Easy Debugging: You can comment out or remove steps to isolate issues in your query.
Flexibility: Steps can be easily reordered or modified to change the query logic.
Examples
Basic Pipeline
Complex Data Transformation
Best Practices
🔍 Start with a
filter
step to reduce data volume early in the pipeline.🧮 Use
generate
steps to create new columns needed for analysis.🔗 Place
join
steps early in the pipeline, but after initial filtering.📊 Use
group
steps for aggregations, followed bygenerate
for post-aggregation calculations.🏷️ Use
select
towards the end of the pipeline to choose final output columns.🔢 End with
sort
andtake
steps to order and limit final results.📝 Comment your steps for complex queries to explain the transformation logic.
Real-World Use Case
Here's an example of a DUQL query that uses a complex steps
pipeline to perform a cohort analysis:
This query demonstrates:
Filtering recent data
Generating cohort and time-based columns
Aggregating user actions
Joining with a subquery to get initial user counts
Calculating retention rates
Sorting results
Selecting and renaming final columns
Filtering to show only the first year of data for each cohort
The steps
pipeline allows for this complex analysis to be expressed clearly and logically, making it easy to understand and modify the cohort analysis process.
💡 Tip: Think of the
steps
pipeline as a data assembly line. Each step performs a specific task, gradually shaping your data into the final form you need. Don't hesitate to experiment with the order and composition of steps to achieve your desired result!
Last updated
Was this helpful?