Query Structure
DUQL is a powerful and intuitive query language designed for data transformation and analysis. It provides a structured way to define complex data operations using a human-readable YAML format.
Syntax
A fully defined DUQL query has the following structure:
settings:
version: <duql_version>
target: <target_database>
declare:
<variable_declarations>
dataset: <main_data_source>
steps:
- <transformation_step_1>
- <transformation_step_2>
# ... more steps as needed
into: <output_destination>
Components
Settings
version
string
No
The version of DUQL being used
target
string
No
The target database or SQL dialect for the query
Example:
settings:
version: '0.0.1'
target: sql.postgres
Declare
The declare
section allows you to define variables, functions, or reusable query components.
Example:
declare:
active_users:
dataset: users
steps:
- filter: last_login > @2023-01-01
calculate_age: birth_date -> datediff(years, birth_date, current_date())
Dataset
The dataset
component specifies the main data source for your query. It can be a simple table name or a more complex definition.
Example:
dataset: sales_transactions
or
dataset:
name: s3://data-bucket/sales/*.parquet
format: parquet
Steps
The steps
section is where you define your data transformation pipeline. Each step represents an operation on your data.
Available steps include:
filter
: Select rows based on conditionsjoin
: Combine data from multiple sourcesselect
: Choose or compute columnsgroup
: Aggregate datasort
: Order resultstake
: Limit the number of rowsgenerate
: Create new columnsAnd more...
Example:
steps:
- filter: date > @2023-01-01
- join:
dataset: products
where: sales_transactions.product_id == products.id
- group:
by: [category, product_name]
steps:
- summarize:
total_sales: sum amount
- sort: -total_sales
- take: 10
Into
The into
component specifies the destination for your query results.
Example:
into: monthly_sales_report
Best Practices
🏗️ Structure your query logically, starting with data sources and progressing through transformations.
🧩 Use the
declare
section to create reusable components and improve query readability.📊 Leverage the power of DUQL's expressive steps to perform complex data operations.
🎯 Be explicit in your joins and filters to ensure data integrity.
🚀 Optimize your query by placing filters early in the pipeline and using appropriate indexes.
📝 Use meaningful names for variables and result sets to enhance query understandability.
🔍 Take advantage of DUQL's ability to work with various data sources and formats.
Real-World Use Case
Here's an example of a DUQL query that analyzes customer purchasing behavior:
settings:
version: '0.0.1'
target: sql.postgres
declare:
recent_customers:
dataset: customers
steps:
- filter: last_purchase > @2023-01-01
dataset: orders
steps:
- join:
dataset: recent_customers
where: orders.customer_id == recent_customers.id
- join:
dataset: products
where: orders.product_id == products.id
- generate:
total_amount: quantity * price
- group:
by: [customer_id, purchase_month, category]
steps:
- summarize:
total_spent: sum total_amount
num_orders: count order_id
- sort: [customer_id, purchase_month, -total_spent]
- generate:
customer_value:
case:
- total_spent > 1000: "High"
- total_spent > 500: "Medium"
- true: "Low"
into: customer_purchase_analysis
This query:
Defines recent customers
Joins order data with customer and product information
Calculates total amount per order
Groups and summarizes purchases by customer, month, and product category
Sorts the results
Categorizes customers based on their spending
The resulting customer_purchase_analysis
dataset provides valuable insights into customer behavior, allowing for targeted marketing strategies and improved customer relationship management.
Last updated
Was this helpful?