Pipelined Relational Query Language, pronounced “Prequel”

PRQL is a modern language for transforming data

— a simple, powerful, pipelined SQL replacement

          from invoices
filter invoice_date >= @1970-01-16
derive {
  transaction_fees = 0.8,
  income = total - transaction_fees
}
filter income > 1
group customer_id (
  aggregate {
    average total,
    sum_income = sum income,
    ct = count total,
  }
)
sort {-sum_income}
take 10
join c=customers (==customer_id)
derive name = f"{c.last_name}, {c.first_name}"
select {
  c.customer_id, name, sum_income
}
derive db_version = s"version()"

        

Why PRQL?

For data engineers

  • PRQL is concise, with abstractions such as variables & functions
  • PRQL is database agnostic, compiling to many dialects of SQL
  • PRQL isn’t limiting — it can contain embedded SQL where necessary
  • PRQL has bindings to most major languages (and more are in progress)
  • PRQL allows for column lineage and type inspection (in progress)

For analysts

  • PRQL is ergonomic for data exploration — for example, commenting out a filter, or a column in a list, maintains a valid query
  • PRQL is simple, and easy to understand, with a small number of powerful concepts
  • PRQL allows for powerful autocomplete, type-checking, and helpful error messages (in progress)

For tools

  • PRQL’s vision is a foundation to build on; we’re open-source and will never have a commercial product
  • PRQL is growing into a single secular standard which tools can target
  • PRQL is easy for machines to read & write

For HackerNews enthusiasts

  • The PRQL compiler is written in Rust
  • We talk about “orthogonal language features” a lot

Showcase

PRQL consists of a curated set of orthogonal transformations, which are combined together to form a pipeline. That makes it easy to compose and extend queries. The language also benefits from modern features, such syntax for dates, ranges and f-strings as well as functions, type checking and better null handling.

Playground Book
from employees
select {id, first_name, age}
sort age
take 10
SELECT
  id,
  first_name,
  age
FROM
  employees
ORDER BY
  age
LIMIT
  10
from track_plays
filter plays > 10_000                # Readable numbers
filter (length | in 60..240)         # Ranges with `..`
filter recorded > @2008-01-01        # Simple date literals
filter released - recorded < 180days # Nice interval literals
sort {-length}                       # Concise order direction
SELECT
  *
FROM
  track_plays
WHERE
  plays > 10000
  AND length BETWEEN 60 AND 240
  AND recorded > DATE '2008-01-01'
  AND released - recorded < INTERVAL 180 DAY
ORDER BY
  length DESC
from employees
# `filter` before aggregations...
filter start_date > @2021-01-01
group country (
  aggregate {max_salary = max salary}
)
# ...and `filter` after aggregations!
filter max_salary > 100_000
SELECT
  country,
  MAX(salary) AS max_salary
FROM
  employees
WHERE
  start_date > DATE '2021-01-01'
GROUP BY
  country
HAVING
  MAX(salary) > 100000
from track_plays
derive {
  finished = started + unfinished,
  fin_share = finished / started,        # Use previous definitions
  fin_ratio = fin_share / (1-fin_share), # BTW, hanging commas are optional!
}
SELECT
  *,
  started + unfinished AS finished,
  (started + unfinished) / started AS fin_share,
  (started + unfinished) / started / (1 - (started + unfinished) / started)
   AS fin_ratio
FROM
  track_plays
from web
# Just like Python
select url = f"https://www.{domain}.{tld}/{page}"
SELECT
  CONCAT('https://www.', domain, '.', tld, '/', page) AS url
FROM
  web
from employees
group employee_id (
  sort month
  window rolling:12 (
    derive {trail_12_m_comp = sum paycheck}
  )
)
SELECT
  *,
  SUM(paycheck) OVER (
    PARTITION BY employee_id
    ORDER BY
      month ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
  ) AS trail_12_m_comp
FROM
  employees
let celsius_to_fahrenheit = temp -> temp * 9/5 + 32

from weather
select temp_f = (celsius_to_fahrenheit temp_c)
SELECT
  temp_c * 9 / 5 + 32 AS temp_f
FROM
  weather
# Most recent employee in each role
# Quite difficult in SQL...
from employees
group role (
  sort join_date
  take 1
)
WITH table_0 AS (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY role
      ORDER BY
        join_date
    ) AS _expr_0
  FROM
    employees
)
SELECT
  *
FROM
  table_0
WHERE
  _expr_0 <= 1
# There's no `version` in PRQL, but s-strings
# let us embed SQL as an escape hatch:
from x
derive db_version = s"version()"
SELECT
  *,
  version() AS db_version
FROM x
from employees
join b=benefits (==employee_id)
join side:left p=positions (p.id==employees.employee_id)
select {employees.employee_id, p.role, b.vision_coverage}
SELECT
  employees.employee_id,
  p.role,
  b.vision_coverage
FROM
  employees
  JOIN benefits AS b ON employees.employee_id = b.employee_id
  LEFT JOIN positions AS p ON p.id = employees.employee_id
from users
filter last_login != null
filter deleted_at == null
derive channel = channel ?? "unknown"
SELECT
  *,
  COALESCE(channel, 'unknown') AS channel
FROM
  users
WHERE
  last_login IS NOT NULL
  AND deleted_at IS NULL
prql target:sql.mssql  # Will generate TOP rather than LIMIT

from employees
sort age
take 10
SELECT
  *
FROM
  employees
ORDER BY
  age OFFSET 0 ROWS
FETCH
  FIRST 10 ROWS ONLY

Principles

Pipelined

A PRQL query is a linear pipeline of transformations

Each line of the query is a transformation of the previous line’s result. This makes it easy to read, and simple to write.

Simple

PRQL serves both sophisticated engineers and analysts without coding experience.

By providing a small number of powerful & orthogonal primitives, queries are simple and composable — there’s only one way of expressing each operation. We can eschew the debt that SQL has built up.

Open

PRQL is open-source, with an open community

PRQL will always be fully open-source and will never have a commercial product. By compiling to SQL, PRQL is compatible with most databases, existing tools, and programming languages that manage SQL. We’re a welcoming community for users, contributors, and other projects.

Extensible

PRQL is designed to be extended, from functions to language bindings

PRQL has abstractions which make it a great platform to build on. Its explicit versioning allows changes without breaking backward-compatibility. And in the cases where PRQL doesn’t yet have an implementation, it allows embedding SQL with s-strings.

Analytical

PRQL’s focus is analytical queries

PRQL was originally designed to serve the growing need of writing analytical queries, emphasizing data transformations, development speed, and readability. We de-emphasize other SQL features such as inserting data or transactions.

Pipelines in action

Integrations

Jupyter/IPython

pyprql contains a Jupyter extension, which executes a PRQL cell against a database. It can also set up an in-memory DuckDB instance, populated with a pandas DataFrame.

ClickHouse

ClickHouse natively supports PRQL with

SET dialect = 'prql'

Visual Studio Code

Extension with syntax highlighting and live SQL compilation.

Prefect

Add PRQL models to your Prefect workflows with a single function.

DuckDB

A DuckDB extension to execute PRQL

PostgreSQL

Write PRQL functions in PostgreSQL

Tools

Playground

Online in-browser playground that compiles PRQL to SQL as you type.

pyprql

Provides Jupyter/IPython cell magic and Pandas accessor.

pip install pyprql

prqlc

A CLI for PRQL compiler, written in Rust.

cargo install prqlc

brew install prqlc

winget install prqlc

Bindings

prqlc-python

Python bindings for prqlc.

prql-js

JavaScript bindings for prqlc.

prqlr

R bindings for prqlc.

prqlc

Compiler implementation, written in Rust. Compile, format & annotate PRQL queries.

Others

Java, C, C++, Elixir, .NET, and PHP have unsupported or nascent bindings.

What people are saying

It starts with FROM, it fixes trailing commas, and it's called PRQL?? If this is a dream, don't wake me up. — Jeremiah Lowin, Founder & CEO, Prefect.

Hamilton Ulmer

@hamiltonulmer
very excited for prql!
2
13:03 · May 6, 2022
Twitter

Armin Ronacher

@mitsuhiko
Oh wow I missed this. Clickhouse now supports PRQL: https://github.com/ClickHouse/ClickHouse/pull/50686
49
13:03 · Jul 25, 2022
Twitter

Swanand.

@_swanand
A few years ago, I started working on a language, called "dsql", short for declarative SQL, and a pun on "the sequel (to SQL)". I kinda chickened out of it then, the amount of study and research I needed was massive. prql here is better than I imagined: github.com/max-sixty/prql
20
13:18 · Jan 25, 2022
Twitter

Rishabh Software

@RishabhSoft
SQL's hold on data retrieval is slipping! 8 new databases are emerging, and some speak entirely new languages for data querying. Know more infoworld.com/article/365490… #SQL #DataQuery #GraphQL #PRQL #WebAssembly
0
16:32 · Apr 13, 2022
Twitter

Burak Emir

@burakemir
I want to give the PRQL a little boost here, "pipeline of transformations" is IMHO a good choice for readable query languages that need to deal with SQL-like aggregations, group by and count and sum all: github.com/max-sixty/prql
2
12:52 · Jan 25, 2022
Twitter