Skip to contents

PRQL (Pipelined Relational Query Language, pronounced “Prequel”) is a modern language for transforming data, can be compiled to SQL.

This package provides a simple function to convert a PRQL query string to a SQL string.

For example, this is a PRQL query.

from mtcars
filter cyl > 6
select {cyl, mpg}
derive {mpg_int = math.round 0 mpg}

And, this is the SQL query that is compiled from the PRQL query.

SELECT
  cyl,
  mpg,
  ROUND(mpg, 0) AS mpg_int
FROM
  mtcars
WHERE
  cyl > 6

To compile a PRQL string, just pass the query string to the prql_compile() function, like this.

library(prqlr)

"
from mtcars
filter cyl > 6
select {cyl, mpg}
derive {mpg_int = math.round 0 mpg}
" |>
  prql_compile() |>
  cat()
#> SELECT
#>   cyl,
#>   mpg,
#>   ROUND(mpg, 0) AS mpg_int
#> FROM
#>   mtcars
#> WHERE
#>   cyl > 6
#> 
#> -- Generated by PRQL compiler version:0.11.2 (https://prql-lang.org)

This output SQL query string can be used with already existing great packages that manipulate data with SQL.

Work with DB

Using it with the DBI package, we can execute PRQL queries against the database.

library(DBI)

# Create an ephemeral in-memory SQLite database
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Create a table inclueds `mtcars` data
dbWriteTable(con, "mtcars", mtcars)

# Execute a PRQL query
"
from mtcars
filter cyl > 6
select {cyl, mpg}
derive {mpg_int = math.round 0 mpg}
take 3
" |>
  prql_compile("sql.sqlite") |>
  dbGetQuery(con, statement = _)
#>   cyl  mpg mpg_int
#> 1   8 18.7      19
#> 2   8 14.3      14
#> 3   8 16.4      16

We can also use the sqldf::sqldf() function to automatically register Data Frames to the database.

"
from mtcars
filter cyl > 6
select {cyl, mpg}
derive {mpg_int = math.round 0 mpg}
take 3
" |>
  prql_compile("sql.sqlite") |>
  sqldf::sqldf()
#> Warning: no DISPLAY variable so Tk is not available
#>   cyl  mpg mpg_int
#> 1   8 18.7      19
#> 2   8 14.3      14
#> 3   8 16.4      16

Since SQLite is used here via RSQLite, the target option of prql_compile() is set to "sql.sqlite".

Available target names can be found with the prql_get_targets() function.

Work with R Data Frames

Using prqlr with the tidyquery package, we can execute PRQL queries against R Data Frames via dplyr.

dplyr is a very popular R package for manipulating Data Frames, and the PRQL syntax is very similar to the dplyr syntax.

Let’s run a query that aggregates a Data Frame flights, contained in the nycflights13 package.

library(tidyquery)
library(nycflights13)

"
from flights
filter (distance | in 200..300)
filter air_time != null
group {origin, dest} (
  aggregate {
    num_flts = count this,
    avg_delay = (average arr_delay | math.round 0)
  }
)
sort {-origin, avg_delay}
take 2
" |>
  prql_compile() |>
  query()
#> # A tibble: 2 × 4
#>   origin dest  num_flts avg_delay
#>   <chr>  <chr>    <int>     <dbl>
#> 1 LGA    BUF        122        -2
#> 2 LGA    PWM        273         2

This query can be written with dplyr’s functions as follows.

library(dplyr, warn.conflicts = FALSE)
library(nycflights13)

flights |>
  filter(
    distance |> between(200, 300),
    !is.na(air_time)
  ) |>
  group_by(origin, dest) |>
  summarise(
    num_flts = n(),
    avg_delay = mean(arr_delay, na.rm = TRUE) |> round(0),
    .groups = "drop"
  ) |>
  arrange(desc(origin), avg_delay) |>
  head(2)
#> # A tibble: 2 × 4
#>   origin dest  num_flts avg_delay
#>   <chr>  <chr>    <int>     <dbl>
#> 1 LGA    BUF        122        -2
#> 2 LGA    PWM        273         2

Note that dplyr queries can be generated by the tidyquery::show_dplyr() function!