Quick Start

DUQL (Dangerously Usable Query Language) is a powerful, YAML-based query language designed for data transformation and analysis. DUQL is a variant of PRQL with additional usability improvements, focusing primary on readability and the creation of UI-based query experiences. This guide will help you understand the basics of DUQL and get you started with writing your own queries.

1. Basic Query Structure

A DUQL query typically consists of the following components:

dataset: your_table_name

steps:
- step1
- step2
# ... more steps as needed
  • dataset: Required. Specifies the main data source for your query.

  • steps: Optional, but typically used. Defines the sequence of operations to perform on your data.

You can also include query Settings & assign the results Into a variable.

2. Simple Operators and Expressions

DUQL supports various operators for constructing expressions:

  • Arithmetic: +, -, *, /, % (modulo), ** (exponentiation)

  • Comparison: ==, !=, >, <, >=, <=

  • Logical: && (AND), || (OR), ! (NOT)

  • String: + (concatenation), ~= (regex match)

Example:

steps:
- filter: age > 18 && (city == "New York" || city == "Los Angeles")

3. Basic Functions

DUQL includes many built-in functions for data manipulation:

  • Aggregate functions (These return a scalar when used within summarize, or a column when used anywhere else.:

    • min

    • max

    • sum

    • average

    • stddev

    • all

    • any

    • concat_array

    • count

  • String functions:

    • lower

    • upper

    • trim

    • length

    • contains

    • ends_with

    • extract

    • ltrim

    • rtrim

    • replace

    • starts_with

    • extract

  • Date functions:

    • to_text

  • Mathematical functions:

    • abs

    • acos

    • asin

    • atan

    • ceil

    • cos

    • degrees

    • exp

    • floor

    • ln

    • log

    • log10

    • pi

    • pow

    • radians

    • round

    • sin

    • sqrt

    • tan

  • Window functions

    • lag

    • lead

    • first

    • last

    • rank

    • rank_dense

    • row_number

Example:

steps:
- generate:
    full_name: f"{first_name.lower} {last_name.upper}"

4. Common Transformation Steps

Here are some frequently used transformation steps:

Filter

Selects rows based on a condition:

- filter: price > 100 && category == "Electronics"

Generate

Creates new columns or modifies existing ones:

- generate:
    total_price: price * quantity
    discount: 
      case:
      - total_price > 1000: 0.1
      - total_price > 500: 0.05
      - true: 0

Group

Aggregates data based on specified columns:

- group:
  by: [category, order_year]
  steps:
  - summarize:
    total_sales: sum (price * quantity)
    order_count: count order_id

Sort

Orders the results:

- sort: [-total_sales, category]

Take

Limits the number of rows returned:

- take: 10

5. Putting It All Together

Here's an example that combines multiple steps to analyze sales data:

settings:
  version: '0.0.1'
  target: sql.postgres

dataset: sales

steps:
- filter: order_date >= @2023-01-01
- generate:
    total_amount: price * quantity
    order_year: order_date.to_text("%Y-%m-%d") | extract 1 4
- group:
    by: [category, order_year]
    steps:
    - summarize:
      total_sales: sum total_amount
      order_count: count order_id
- filter: total_sales > 10000
- sort: [-order_year, -total_sales]
- take: 5

into: top_selling_categories

This query:

  1. Filters for orders from 2023 onwards

  2. Calculates the total amount for each order and extracts the year

  3. Groups the data by category and year, summarizing total sales and order count

  4. Filters for categories with over $10,000 in sales

  5. Sorts the results by year (descending) and total sales (descending)

  6. Takes the top 5 results

  7. Stores the result in a dataset named top_selling_categories

Next Steps

As you become more comfortable with DUQL, you can explore more advanced features such as:

  • Joining multiple datasets

  • Window functions for complex analytics

  • Declaring reusable variables and functions

  • Using subqueries and complex expressions

Remember, DUQL is designed to be intuitive and expressive. Don't hesitate to experiment with different combinations of steps and functions to achieve your data analysis goals!

Last updated