Take

The take function in DUQL is used to limit the number of rows returned or to select specific ranges of rows. It's useful for pagination, sampling, or selecting top/bottom N rows.

Syntax

take: <number_or_range>

Parameters

Behavior

  • When given an integer, it returns that many rows from the beginning of the dataset.

  • When given a range, it returns the specified range of rows.

  • Can be used after sorting to get top/bottom N rows.

Examples

Take First N Rows

take: 10

Take No Rows (Useful for Schema Inspection)

take: 0

Take a Range of Rows

take: '5..10'

Take All Rows from a Specific Point

take: '100..'

Take First N Rows

take: '..50'

Best Practices

  1. 🎯 Use take in combination with sort to get meaningful subsets of data.

  2. 🔢 Consider using take: 0 to inspect the schema of your query result without processing all rows.

  3. 📊 Use range syntax for more flexible row selection.

  4. 🚀 Place take at the end of your query pipeline for best performance.

  5. 🧪 Be cautious with small take values when working with grouped or aggregated data.

Real-World Use Case

Here's an example of a DUQL query that uses take to analyze the top-selling products:

dataset: sales

steps:
- join:
    dataset: products
    where: sales.product_id == products.id
- generate:
    revenue: price * quantity
- group:
    by: [product_id, product_name, category]
    steps:
    - summarize:
          total_revenue: sum revenue
          units_sold: sum quantity
- sort: -total_revenue
- take: 20  # Top 20 products by revenue
- generate:
    average_price: total_revenue / units_sold
    rank:
      sql'ROW_NUMBER() OVER (ORDER BY total_revenue DESC)'

into: top_selling_products

This query demonstrates:

  1. Joining sales data with product information

  2. Calculating revenue

  3. Grouping and summarizing by product

  4. Sorting by total revenue

  5. Taking the top 20 products

  6. Generating additional metrics and ranking

The take: 20 step ensures that we only get the top 20 selling products, making the analysis more focused and manageable.


💡 Tip: Use the take function judiciously to control the size of your query results. It's particularly useful in combination with sorting to get top N or bottom N results quickly!

Last updated