Quick Start
DUQL (Dangerously Usable Query Language) is a powerful, YAML-based query language designed for data transformation and analysis. DUQL is a variant of PRQL with additional usability improvements, focusing primary on readability and the creation of UI-based query experiences. This guide will help you understand the basics of DUQL and get you started with writing your own queries.
1. Basic Query Structure
A DUQL query typically consists of the following components:
dataset: your_table_name
steps:
- step1
- step2
# ... more steps as neededdataset: Required. Specifies the main data source for your query.steps: Optional, but typically used. Defines the sequence of operations to perform on your data.
2. Simple Operators and Expressions
DUQL supports various operators for constructing expressions:
Arithmetic:
+,-,*,/,%(modulo),**(exponentiation)Comparison:
==,!=,>,<,>=,<=Logical:
&&(AND),||(OR),!(NOT)String:
+(concatenation),~=(regex match)
Example:
steps:
- filter: age > 18 && (city == "New York" || city == "Los Angeles")3. Basic Functions
DUQL includes many built-in functions for data manipulation:
Aggregate functions (These return a scalar when used within
summarize, or a column when used anywhere else.:minmaxsumaveragestddevallanyconcat_arraycount
String functions:
loweruppertrimlengthcontainsends_withextractltrimrtrimreplacestarts_withextract
Date functions:
to_text
Mathematical functions:
absacosasinatanceilcosdegreesexpfloorlnloglog10pipowradiansroundsinsqrttan
Window functions
lagleadfirstlastrankrank_denserow_number
Example:
steps:
- generate:
full_name: f"{first_name.lower} {last_name.upper}"4. Common Transformation Steps
Here are some frequently used transformation steps:
Filter
Selects rows based on a condition:
- filter: price > 100 && category == "Electronics"Generate
Creates new columns or modifies existing ones:
- generate:
total_price: price * quantity
discount:
case:
- total_price > 1000: 0.1
- total_price > 500: 0.05
- true: 0Group
Aggregates data based on specified columns:
- group:
by: [category, order_year]
steps:
- summarize:
total_sales: sum (price * quantity)
order_count: count order_idSort
Orders the results:
- sort: [-total_sales, category]Take
Limits the number of rows returned:
- take: 105. Putting It All Together
Here's an example that combines multiple steps to analyze sales data:
settings:
version: '0.0.1'
target: sql.postgres
dataset: sales
steps:
- filter: order_date >= @2023-01-01
- generate:
total_amount: price * quantity
order_year: order_date.to_text("%Y-%m-%d") | extract 1 4
- group:
by: [category, order_year]
steps:
- summarize:
total_sales: sum total_amount
order_count: count order_id
- filter: total_sales > 10000
- sort: [-order_year, -total_sales]
- take: 5
into: top_selling_categoriesThis query:
Filters for orders from 2023 onwards
Calculates the total amount for each order and extracts the year
Groups the data by category and year, summarizing total sales and order count
Filters for categories with over $10,000 in sales
Sorts the results by year (descending) and total sales (descending)
Takes the top 5 results
Stores the result in a dataset named
top_selling_categories
Next Steps
As you become more comfortable with DUQL, you can explore more advanced features such as:
Joining multiple datasets
Window functions for complex analytics
Declaring reusable variables and functions
Using subqueries and complex expressions
Remember, DUQL is designed to be intuitive and expressive. Don't hesitate to experiment with different combinations of steps and functions to achieve your data analysis goals!
Last updated
Was this helpful?