Filters
The filter
function in DUQL is used to select rows from a dataset based on specified conditions. It allows you to narrow down your data to only the records that meet certain criteria.
Syntax
filter: <condition>
Parameters
filter
string
Yes
A boolean expression representing the filter condition
Behavior
Evaluates the specified condition for each row in the dataset.
Retains only the rows where the condition evaluates to true.
Can use complex expressions involving multiple columns, functions, and operators.
Examples
Basic Comparison
filter: age > 18
Date Range
filter: (created_at | in @1776-07-04..@1787-09-17)
Numeric Range
filter: (magnitude | in 50..100)
String Matching
filter: (name.lower | =~ '%smith%') && age >= 21
Date Calculation
filter: order_date > current_date() - interval '30 days'
Using Parameters
filter: id == $1
Array Contains
filter: array_contains tags 'urgent' && status != 'completed'
Complex Conditions
filter: (revenue - costs) / revenue > 0.2 && date.year(order_date) == 2023
Nested Conditions
filter: age >= 18 && (has_parental_consent || country != "US")
Using SQL Functions
filter: sql'regexp_contains(title, ''([a-z0-9]*-){{2,}}'')'
Best Practices
๐ฏ Be as specific as possible with your filter conditions to improve query performance.
๐ Use parentheses to clarify the order of operations in complex conditions.
๐งช Test your filter conditions with sample data to ensure they produce expected results.
๐ Consider using multiple
filter
steps for complex logic, improving readability.๐ Place filters early in your query pipeline to reduce data volume for subsequent operations.
๐ข Use parameters (
$1
,$2
, etc.) for dynamic filtering when appropriate.
Real-World Use Case
Here's an example of a DUQL query that uses multiple filter
steps to analyze high-value orders from loyal customers:
dataset: orders
steps:
- filter: order_date >= @2023-01-01
- join:
dataset: customers
where: orders.customer_id == customers.id
- filter: customers.account_created_at < @2022-01-01 # Loyal customers
- join:
dataset: products
where: orders.product_id == products.id
- generate:
order_total: quantity * price
is_discounted: discount_rate > 0
- filter: order_total > 1000 # High-value orders
- group:
by: [customer_id, customers.name]
steps:
summarize:
total_spent: sum order_total
order_count: count order_id
discounted_orders: sum is_discounted
- filter: order_count >= 3 # Repeat high-value customers
- generate:
average_order_value: total_spent / order_count
discount_rate: discounted_orders / order_count
- filter: discount_rate < 0.5 # Mostly full-price purchases
- sort: -total_spent
- take: 100
into: top_loyal_customers
This query demonstrates:
Filtering orders by date
Joining with customer and product data
Filtering for loyal customers (account created before 2022)
Calculating order totals and identifying discounted orders
Filtering for high-value orders (over $1000)
Aggregating customer data
Filtering for repeat high-value customers (3 or more orders)
Calculating average order value and discount rate
Filtering for customers who mostly pay full price
Sorting and limiting the results
The resulting top_loyal_customers
dataset provides a focused view of the most valuable loyal customers who frequently make large purchases without heavy reliance on discounts.
๐ก Tip: The
filter
function is your primary tool for data selection in DUQL. Use it strategically throughout your query pipeline to focus your analysis on the most relevant data. Combine it withgenerate
andgroup
functions for powerful data insights!
Last updated
Was this helpful?