DUQL Docs
  • Getting Started
    • Quick Start
    • Query Structure
      • Settings
      • Declare
      • Steps
      • Into
  • basic
    • Datasets
    • Filters
    • Expressions
    • Select
    • Take
    • Sort
  • Intermediate
    • Groups
    • Summarize
    • Generate
    • Join
  • Advanced
    • Windows
    • Loops
    • Experimental
      • Append
      • Distinct
      • Intersect
      • Remove
      • Except
      • Union
  • Language Comparisons
    • DUQL vs SQL
    • DUQL vs PRQL
Powered by GitBook
On this page
  • Syntax
  • Parameters
  • Behavior
  • Join Types
  • Examples
  • Basic Join
  • Left Join
  • Join with CSV File
  • Join with Subquery and Full Outer Join
  • Complex Join with Aggregation
  • Join with SQL Query
  • Join with External Data Source
  • Best Practices

Was this helpful?

Edit on GitHub
  1. Intermediate

Join

The join function in DUQL is used to combine rows from two or more tables based on a related column between them. It supports different types of joins and allows for complex join conditions.

Syntax

join:
  dataset: <data_source>
  where: <join_condition>
  retain: <join_type>

Parameters

Parameter
Type
Required
Default
Description

dataset

object

Yes

-

The data source to join with (ref: dataset.s.duql.yml)

where

string

Yes

-

The join condition

retain

string

No

"inner"

The type of join to perform

Behavior

  • Combines rows from the current dataset with rows from the specified dataset based on the join condition.

  • The dataset parameter can include its own steps for preprocessing before the join.

  • The where condition supports a == shorthand for equality joins on matching column names.

  • The retain parameter specifies the join type, defaulting to an inner join.

Join Types

Value
Description

"inner"

Returns only the matched rows (default)

"left"

Returns all rows from the left table and matched rows from the right table

"right"

Returns all rows from the right table and matched rows from the left table

"full"

Returns all rows when there is a match in either left or right table

Examples

Basic Join

join:
  dataset: customers
  steps: 
  where: orders.customer_id == customers.id

Left Join

join:
  dataset: products
  where: orders.product_id == products.id
  retain: left

Join with CSV File

join:
  dataset: myorg/employee_data.csv
  type: csv
  where: departments.department_id == employee_data.department_id
  retain: inner

Join with Subquery and Full Outer Join

join:
  dataset: recent_inventory
  steps:
  - filter: last_updated > @2023-01-01
  where: products.product_id == recent_inventory.product_id
  retain: full

Complex Join with Aggregation

join:
  dataset: customer_views
  where: customers.id == customer_views.customer_id

Join with SQL Query

join:
  dataset: sql'SELECT * FROM myexample'
  where: ==id
  retain: full

Join with External Data Source

join:
  dataset: hdfs://cluster/user_profiles/thing.parquet
  type: parquet
  where: users.id == user_profiles.user_id
  retain: left

Best Practices

  1. 🎯 Always specify the join condition explicitly for clarity.

  2. 🤔 Choose the appropriate join type using the retain parameter based on your data requirements.

  3. 🚀 Utilize the steps within the dataset parameter for preprocessing before joining.

  4. 🔍 Use the == shorthand for simple equality joins when column names match.

  5. 📊 Be mindful of data volume when performing joins, especially with large datasets or complex subqueries.

  6. 🏷️ Consider using aliases or clear naming conventions when joining multiple tables to avoid ambiguity.

The join operations allow us to bring together data from multiple sources, creating a rich dataset for analysis that includes sales performance, customer behavior, inventory status, and category-level benchmarks.


💡 Tip: The join function in DUQL is highly flexible, allowing you to combine data from various sources and formats. Leverage its ability to include steps within the dataset parameter to preprocess your data before joining, enabling more sophisticated data integration in your analyses!

PreviousGenerateNextWindows

Last updated 9 months ago

Was this helpful?