PRQL (Pipelined Relational Query Language, pronounced “Prequel”) is a modern language for transforming data, can be compiled to SQL.
This package provides a simple function to convert a PRQL query string to a SQL string.
For example, this is a PRQL query.
And, this is the SQL query that is compiled from the PRQL query.
To compile a PRQL string, just pass the query string to the
prql_compile()
function, like this.
library(prqlr)
"
from mtcars
filter cyl > 6
select {cyl, mpg}
derive {mpg_int = math.round 0 mpg}
" |>
prql_compile() |>
cat()
#> SELECT
#> cyl,
#> mpg,
#> ROUND(mpg, 0) AS mpg_int
#> FROM
#> mtcars
#> WHERE
#> cyl > 6
#>
#> -- Generated by PRQL compiler version:0.13.0 (https://prql-lang.org)
This output SQL query string can be used with already existing great packages that manipulate data with SQL.
Work with DB
Using it with the DBI package, we can execute PRQL queries against the database.
library(DBI)
# Create an ephemeral in-memory SQLite database
con <- dbConnect(RSQLite::SQLite(), ":memory:")
# Create a table inclueds `mtcars` data
dbWriteTable(con, "mtcars", mtcars)
# Execute a PRQL query
"
from mtcars
filter cyl > 6
select {cyl, mpg}
derive {mpg_int = math.round 0 mpg}
take 3
" |>
prql_compile("sql.sqlite") |>
dbGetQuery(con, statement = _)
#> cyl mpg mpg_int
#> 1 8 18.7 19
#> 2 8 14.3 14
#> 3 8 16.4 16
We can also use the sqldf::sqldf()
function to
automatically register Data Frames to the database.
"
from mtcars
filter cyl > 6
select {cyl, mpg}
derive {mpg_int = math.round 0 mpg}
take 3
" |>
prql_compile("sql.sqlite") |>
sqldf::sqldf()
#> Warning: no DISPLAY variable so Tk is not available
#> cyl mpg mpg_int
#> 1 8 18.7 19
#> 2 8 14.3 14
#> 3 8 16.4 16
Since SQLite is used here via RSQLite, the
target
option of prql_compile()
is set to
"sql.sqlite"
.
Available target names can be found with the
prql_get_targets()
function.
Work with R Data Frames
Using prqlr with the tidyquery package, we can execute PRQL queries against R Data Frames via dplyr.
dplyr is a very popular R package for manipulating Data Frames, and the PRQL syntax is very similar to the dplyr syntax.
Let’s run a query that aggregates a Data Frame flights
,
contained in the nycflights13 package.
library(tidyquery)
library(nycflights13)
"
from flights
filter (distance | in 200..300)
filter air_time != null
group {origin, dest} (
aggregate {
num_flts = count this,
avg_delay = (average arr_delay | math.round 0)
}
)
sort {-origin, avg_delay}
take 2
" |>
prql_compile() |>
query()
#> # A tibble: 2 × 4
#> origin dest num_flts avg_delay
#> <chr> <chr> <int> <dbl>
#> 1 LGA BUF 122 -2
#> 2 LGA PWM 273 2
This query can be written with dplyr’s functions as follows.
library(dplyr, warn.conflicts = FALSE)
library(nycflights13)
flights |>
filter(
distance |> between(200, 300),
!is.na(air_time)
) |>
group_by(origin, dest) |>
summarise(
num_flts = n(),
avg_delay = mean(arr_delay, na.rm = TRUE) |> round(0),
.groups = "drop"
) |>
arrange(desc(origin), avg_delay) |>
head(2)
#> # A tibble: 2 × 4
#> origin dest num_flts avg_delay
#> <chr> <chr> <int> <dbl>
#> 1 LGA BUF 122 -2
#> 2 LGA PWM 273 2
Note that dplyr queries can be generated by the
tidyquery::show_dplyr()
function!