Query Re-Write Phase

The Query Re-Write Phase optimizes the input query plan for execution, either by changing the order of operators without changing the semantics of the query or by adding new operators to the original query plan. When invoked, the phase applies several query re-write rules one after the other. Each rule takes a query plan as input and returns a modified query plan as output. The individual re-write rules are explained below.
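The chaining of rules described above can be sketched as follows. This is only an illustrative model, not the actual implementation: the plan is assumed to be a plain list of operator names, and apply_rewrite_phase, move_filter_first and add_trace_operator are hypothetical names introduced for this example. The two toy rules stand for the two kinds of rewrite mentioned above (reordering operators and adding operators) and do not check query semantics.

```python
from typing import Callable, List

# Assumption: a query plan is modelled as a list of operator names, source first.
QueryPlan = List[str]
RewriteRule = Callable[[QueryPlan], QueryPlan]


def apply_rewrite_phase(plan: QueryPlan, rules: List[RewriteRule]) -> QueryPlan:
    """Apply each rule in order; the output of one rule is the input of the next."""
    for rule in rules:
        plan = rule(plan)
    return plan


def move_filter_first(plan: QueryPlan) -> QueryPlan:
    # Toy reordering rule (hypothetical): put all filters right after the source.
    source, rest = plan[0], plan[1:]
    filters = [op for op in rest if op == "filter"]
    others = [op for op in rest if op != "filter"]
    return [source] + filters + others


def add_trace_operator(plan: QueryPlan) -> QueryPlan:
    # Toy adding rule (hypothetical): insert a "trace" operator before the sink.
    return plan[:-1] + ["trace", plan[-1]]


rewritten = apply_rewrite_phase(
    ["source", "map", "filter", "sink"],
    [move_filter_first, add_trace_operator],
)
print(rewritten)
```

Because every rule has the same plan-in, plan-out shape, rules can be added, removed, or reordered without changing the phase itself.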

Filter Push-Down Rule

This rule pushes each filter operator as close to the source as possible. The idea is that the earlier we filter data, the lower the data transmission costs we incur. Therefore, for each filter operator in a given query plan, we try to push the operator as close to the source operator as possible without changing the query semantics. As shown in the figure below, the filter operator is pushed closer to the car and truck sources, respectively, because the attribute used in the filter predicate is not changed by the upstream operator chain. Note that the filter operator was replicated in the process.

Another example, shown below, is a filter that cannot be pushed below a certain operator. This usually happens when the attribute used in the predicate is changed by an upstream operator, so pushing the filter any further down would change the semantics of the query plan.

Let's take a look at the query for the above query plan:

Q1 = Source("Car").map("speed" = "speed" * 50).map("ticket" = true).filter("speed" > 100).sink(Print());

Since the attribute used in the predicate is “speed” and it is not modified by the second map operator, the filter can be pushed below the second map operator. The resulting query looks as follows:

Q1 = Source("Car").map("speed" = "speed" * 50).filter("speed" > 100).map("ticket" = true).sink(Print());

However, the filter operator can't be pushed below the first map operator: that map operator modifies the “speed” attribute used by the filter predicate, so pushing the filter below it would change the semantics of the query.
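The Q1 example can be reproduced with a small sketch of the push-down logic for a linear operator chain. This is a simplified model under stated assumptions, not the actual rule implementation: the Operator class, its writes and reads fields, and push_down_filters are hypothetical names, each map is assumed to assign exactly one attribute, each filter predicate is assumed to read exactly one attribute, and branching plans (and the filter replication they require) are not covered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Operator:
    kind: str                     # "source", "map", "filter" or "sink"
    label: str                    # textual form of the operator
    writes: Optional[str] = None  # attribute a map assigns to (assumption: one)
    reads: Optional[str] = None   # attribute a filter predicate reads (assumption: one)


def push_down_filters(plan: List[Operator]) -> List[Operator]:
    """Move each filter towards the source until a map writes its attribute."""
    result = list(plan)
    for i in range(len(result)):
        if result[i].kind != "filter":
            continue
        pos = i
        while pos > 0:
            prev = result[pos - 1]
            # Stop at the source, at other non-map operators, and at any map
            # that modifies the attribute used in the filter predicate.
            if prev.kind != "map" or prev.writes == result[pos].reads:
                break
            result[pos - 1], result[pos] = result[pos], prev
            pos -= 1
    return result


q1 = [
    Operator("source", 'Source("Car")'),
    Operator("map", 'map("speed" = "speed" * 50)', writes="speed"),
    Operator("map", 'map("ticket" = true)', writes="ticket"),
    Operator("filter", 'filter("speed" > 100)', reads="speed"),
    Operator("sink", "sink(Print())"),
]
rewritten_q1 = push_down_filters(q1)
print(".".join(op.label for op in rewritten_q1))
```

In this model the filter moves past the map that writes “ticket” and stops directly after the map that writes “speed”, which is the operator order of the rewritten Q1.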

Logical Source Expansion Rule

The Logical Source Expansion Rule expands each logical source operator using the physical source information from the Stream Catalog. The idea of this re-write rule is to add information about the physical source operators by replicating a logical source and its corresponding downstream operators. The expansion rule works as follows:

  1. First, for a logical source in the query plan, we look up the number of physical sources in the Stream Catalog.
  2. Then we identify the logical downstream operator chain until we encounter the first n-ary operator or a sink operator.
  3. Then we replicate this operator chain once for each physical source that the catalog lists for the logical source.
  4. We assign each replicated operator a new operator id and keep the parent-child operator relations the same as in the original query.
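The four steps above can be sketched as follows. This is a minimal illustration under several assumptions, not the actual implementation: the Stream Catalog is modelled as a dictionary from logical source name to physical source names, the parent of an operator is taken to be the next operator towards the sink, the n-ary operator or sink that ends the chain is assumed to be shared rather than copied, and all names (Operator, N_ARY_KINDS, expand_logical_source, car_sensor_1, car_sensor_2) are hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Iterator, List, Optional


@dataclass
class Operator:
    op_id: int
    kind: str                            # "source", "map", "filter", "union", "sink", ...
    name: str
    parent: Optional["Operator"] = None  # next operator towards the sink


N_ARY_KINDS = {"union", "join"}          # assumption: which kinds count as n-ary


def expand_logical_source(source: Operator,
                          catalog: Dict[str, List[str]],
                          ids: Iterator[int]) -> List[Operator]:
    """Return one replicated source (with its own chain) per physical source."""
    physical_sources = catalog[source.name]                  # step 1
    chain = [source]                                         # step 2
    # Assumes a well-formed plan in which every chain ends in a sink or n-ary operator.
    while chain[-1].parent.kind != "sink" and chain[-1].parent.kind not in N_ARY_KINDS:
        chain.append(chain[-1].parent)
    stop = chain[-1].parent                                  # shared, not copied
    replicas = []
    for physical_name in physical_sources:                   # step 3
        parent = stop
        for op in reversed(chain):                           # step 4: new ids, same links
            name = physical_name if op.kind == "source" else op.name
            parent = Operator(next(ids), op.kind, name, parent)
        replicas.append(parent)                              # the copied source operator
    return replicas


ids = count(1)
sink = Operator(next(ids), "sink", "Print")
flt = Operator(next(ids), "filter", "speed > 100", sink)
car = Operator(next(ids), "source", "Car", flt)
catalog = {"Car": ["car_sensor_1", "car_sensor_2"]}

replicas = expand_logical_source(car, catalog, ids)
for src in replicas:
    print(src.op_id, src.name, "->", src.parent.op_id, src.parent.name)
```

In this sketch every physical source gets its own copy of the filter, while both copies keep the original sink as their parent; the original logical chain is left untouched and would still have to be replaced in the plan.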

We show the result of running the Logical Source Expansion Rule below using two different queries, one with an n-ary operator and the other without.