Reorganizing Queries with Grouping


Abstract

Language-integrated query has attracted much attention from researchers and engineers. It enables one to write a database query with high-level abstractions, which makes it possible to compose, iterate, and reuse queries. An important issue in language-integrated query is the N+1 query problem, and Cheney et al. proposed a program-transformation approach to solve it for a core language of Microsoft's LINQ.
In our previous work, we extended their language to grouping (GROUP BY in SQL) and aggregate functions, and showed that any term can be transformed to a single SQL query. It still has a problem in that the resulting queries may be unnecessarily large and inefficient.
This paper solves the problem. Our key idea is re-organization of queries with nested control structures. While our previous work decomposes grouping into finer primitives before transformation, the new algorithm fuses nested control structures after transformation, while keeping the absence of nested data structures.
Our algorithm also eliminates correlated subqueries as much as possible, to obtain better performance. We have conducted performance measurements, which shows that our new algorithm reduces the size of generated queries and improves the performance for several examples.

See the paper "Reorganizing Queries with Grouping"(GPCE 2020) for further information.


Example

For example, we consider the products table, which has columns of product ID (pid), name, category, and price, and the orders table, which has columns of dates, client ID (cid), product ID (pid), and quantity (qty). From these tables, we consider a query computes, for each order ID, the sum of quantities ordered on 2020-01-01. We call this query \(Q({\bf simple})\), and in Quelg we can write:

\(Q({\bf simple}) = \mathcal{G}_{({\rm pid}, \alpha)}({\rm for}~(o \leftarrow {\rm table}(”{\rm orders}”)) \\ \phantom{Q({\bf simple}) = \mathcal{G}_{({\rm pid}, \alpha)}(}{\rm where}~(o.{\rm date} = \texttt{"2020-01-01"}) \\ \phantom{Q({\bf simple}) = \mathcal{G}_{({\rm pid}, \alpha)}(}{\rm yield}~o) \\ ~~~~~{\rm where}~~\alpha = \{({\rm qty}, {\rm SUM}, {\rm qty\_sum})\} \)

A query written in Quelg like the above query is represented in SQL without our SQL optimization as:

	  select x.pid as pid, sum(x.qty) as qty_sum
	  from (select o.*
                from orders as o
                where o.date = "2020-01-01") as x
          group by x.pid 

On the other hand, a query like the above SQL query is represented in SQL with our SQL optimization as:

	  select o.pid as pid, sum(o.qty) as qty_sum
	  from orders as o
          where o.date = "2020-01-01"
          group by o.pid 

Thus, we can reduce the query size by using our optimization.
We evaluated eleven queries including \(Q({\bf simple})\) in the paper. You can see eleven examples here.


Credits

The key developers are: