## Query Placement Phase

- **Path-Selection** phase: responsible for preparing a topology sub-graph on which the operators from the query plan are to be placed.
- **Operator-Assignment** phase: responsible for assigning operators to the physical nodes of the topology sub-graph obtained from the previous phase. An operator is assigned to a physical node only if the resources available on the node exceed the resources required for placing the operator.

Once both phases complete successfully, the system-generated Network Sink/Source operators are added to the intermediate physical nodes to enable the transfer of tuple streams from one physical node to another. We will explain the various placement strategies using the following example:

## Bottom-Up Strategy

In the Bottom-Up Strategy, we try to place each operator close to the location where its upstream operator is placed.
We first present how the Path-Selection phase works:

- For each logical source in the query plan, we first find, for each of its physical sources, the sub-graph connecting the physical source node and the sink node. (See below)
- We then compute the occurrence matrix for all nodes of the sub-graphs identified for the physical source nodes of each logical source. (See below)
- We then prune the sub-graphs at nodes with multiple parents using the following rules:
  - Whenever we encounter a node with more than one parent, (a) we list all the nodes from each parent up to the sink node, and (b) we compute the weight of that parent using the formula W = Sum(occurrence counts of the nodes found in (a)) / number of nodes.
  - We select the parent with max(W) and remove the other nodes from the sub-graph.
- We then merge the pruned sub-graphs belonging to each logical source.
- Finally, we merge the sub-graphs of the different logical sources together.
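The pruning rule above can be illustrated with a minimal sketch for a single logical source. It assumes the topology is given as a mapping from each node to its parents (the nodes one hop closer to the sink) and that every path eventually reaches the sink. The function names and the example topology are hypothetical and are not taken from the NebulaStream code base.

```python
from collections import Counter


def nodes_to_sink(parents, start, sink):
    """Collect every node on any path from `start` up to `sink`, inclusive."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node != sink:
            stack.extend(parents.get(node, []))
    return seen


def occurrence_counts(parents, sources, sink):
    """Count in how many per-source sub-graphs each node occurs."""
    counts = Counter()
    for source in sources:
        counts.update(nodes_to_sink(parents, source, sink))
    return counts


def parent_weight(parents, parent, sink, counts):
    """W = sum of occurrence counts from the parent up to the sink / number of nodes."""
    nodes = nodes_to_sink(parents, parent, sink)
    return sum(counts[n] for n in nodes) / len(nodes)


def prune(parents, sources, sink):
    """Keep only the max(W) parent wherever a node has several parents."""
    counts = occurrence_counts(parents, sources, sink)
    pruned = {}
    for node, candidates in parents.items():
        if len(candidates) > 1:
            best = max(candidates, key=lambda p: parent_weight(parents, p, sink, counts))
            pruned[node] = [best]
        else:
            pruned[node] = list(candidates)
    # Drop nodes that are no longer on any source-to-sink path.
    reachable = set()
    for source in sources:
        reachable |= nodes_to_sink(pruned, source, sink)
    return {n: ps for n, ps in pruned.items() if n in reachable}


# Hypothetical topology: s1 can reach the sink via a or b, s2 only via b.
topology = {"s1": ["a", "b"], "s2": ["b"], "a": ["sink"], "b": ["sink"], "sink": []}
pruned = prune(topology, ["s1", "s2"], "sink")
print(pruned)
```

In this example, `s1` keeps `b` as its only parent because W(b) = (2 + 2) / 2 = 2.0 exceeds W(a) = (1 + 2) / 2 = 1.5, so node `a` drops out of the merged sub-graph.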
The next phase is the Operator-Assignment phase. The input query graph has already been processed with the Logical Source Expansion Rule before the Operator-Assignment phase runs. The steps are as follows:

- We start from any source operator in the query plan and find the physical node in the topology sub-graph for its assignment (as shown below).
  (**Note**: source and sink operators are also called pinned operators because they have a predefined location for assignment. If the node does not have the resources available for the pinned operator, the placement is marked as failed.)
- Once the operator is assigned, the following steps are performed for all of its parent operators:
  - If the operator is neither an N-ary nor a Sink operator:
    - The operator is assigned to the same node as its child operator if that node has enough resources available.
    - If this is not the case and no other suitable physical node is found, the placement is marked as failed.
  - If the operator is an N-ary operator:
    - It is checked whether all of its child operators are already assigned to a node.
  - If the operator is a Sink operator:
    - Its physical location is identified from the pinned operator location catalog and the assignment is performed.
    - If the physical node has no resources, the placement is marked as failed.
- Once all operators are assigned, the Network Source and Sink operators are placed appropriately to allow stream data to reach the query's sink operator. (As shown below, the circles in different colors are Network Source and Sink operators transmitting data from different physical source locations.)
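The assignment steps above can be sketched for the simplest case: a chain of unary operators and a single path of physical nodes from the source node to the sink node. This is a simplified illustration under stated assumptions, not the NebulaStream implementation. N-ary operators and the insertion of Network Source/Sink operators are left out, every operator is assumed to need one resource slot, and "enough resources" is interpreted as free slots >= required slots. The function name `place_bottom_up` and all parameters are hypothetical.

```python
def place_bottom_up(chain, path, capacity, cost=1):
    """Place a chain of operators (source first, sink last) along a node path.

    chain:    operator names ordered from the source operator to the sink operator
    path:     node ids ordered from the source node to the sink node
    capacity: dict mapping node id -> free resource slots
    """
    free = dict(capacity)  # work on a copy so the caller's dict is untouched
    placement = {}

    def assign(op, node):
        if free[node] < cost:
            raise RuntimeError(f"placement failed: node {node} has no resources for {op}")
        free[node] -= cost
        placement[op] = node

    # The source operator is pinned to the first node of the path.
    assign(chain[0], path[0])

    # Each unary operator stays with its child if possible, otherwise moves up.
    position = 0
    for op in chain[1:-1]:
        while position < len(path) and free[path[position]] < cost:
            position += 1
        if position == len(path):
            raise RuntimeError(f"placement failed: no node found for {op}")
        assign(op, path[position])

    # The sink operator is pinned to the last node of the path.
    assign(chain[-1], path[-1])
    return placement


placement = place_bottom_up(
    ["source", "filter", "map", "sink"],
    ["n1", "n2", "n3"],
    {"n1": 2, "n2": 1, "n3": 2},
)
print(placement)
```

Here `filter` shares `n1` with the source, `map` moves up to `n2` once `n1` is full, and the sink lands on its pinned node `n3`.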
## Top-Down Strategy

## IFCOP Strategy

Infrastructure-aware Fog-Cloud Operator Placement Strategy (IFCOP) is a cost-based operator placement strategy for NebulaStream. IFCOP aims to minimize the processing-time latency of query execution. To achieve that, IFCOP optimizes the operator placement by iteratively generating candidates and evaluating them with a cost function. After the final iteration, IFCOP returns the candidate with the lowest cost.

## Candidate Representation

IFCOP uses a 2D binary matrix to represent an operator placement candidate. The matrix has a size of number of nodes × number of operators. Following is an example of a candidate:

Each row in the matrix represents the placement decisions for one topology node. By having 1 in a cell (i,j) we decide to place the operator j on the node i. For example, in cell (5,6) we set the decision to 1 to indicate that we place the filter operator in Node 6.

## Candidate Generation

To generate a candidate, IFCOP iterates through the source operators in the query plan and performs placement for one source at a time. Following are the steps to place the operators:

- The placement starts by selecting a source node in the topology to place the current source operator. IFCOP chooses the last source node that does not yet hold a source operator for the current operator.
- After that, IFCOP draws a random binary decision on whether to place the next operator in the query plan on the current topology node. If that is the case, IFCOP places the parent of the current operator on the current topology node. Otherwise, IFCOP moves the topology iterator to the parent of the current topology node.
- In the next topology node, IFCOP repeats the random drawing of the decision on whether to place an operator on the current topology node, and the succeeding steps (steps 2-3).
- Once the topology iterator reaches the sink node, IFCOP checks whether the operator iterator points to the sink operator. If that is not the case, IFCOP places all remaining operators on the sink node. Otherwise, IFCOP places the sink operator on the sink node.
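The generation steps above can be sketched for a single source operator as a random walk along one topology path. The sketch assumes the path from the chosen source node to the sink node and the operator chain from the source operator to the sink operator are already known, and it uses a fair coin for the binary decision. The function name `generate_candidate` and its parameters are hypothetical; selecting the source node itself is not shown.

```python
import random


def generate_candidate(nodes, operators, path, chain, rng):
    """Return a nodes x operators binary matrix for one source operator.

    nodes, operators: all topology nodes and all query operators (matrix axes)
    path:  node ids from the chosen source node up to the sink node
    chain: operators from the source operator up to the sink operator
    rng:   random.Random instance used for the binary decisions
    """
    row = {node: i for i, node in enumerate(nodes)}
    col = {op: j for j, op in enumerate(operators)}
    matrix = [[0] * len(operators) for _ in nodes]

    def place(op, node):
        matrix[row[node]][col[op]] = 1

    node_pos, op_pos = 0, 0
    place(chain[0], path[0])  # step 1: source operator on the source node

    # Steps 2-3: walk until the sink node is reached or only the sink operator is left.
    while node_pos < len(path) - 1 and op_pos < len(chain) - 2:
        if rng.random() < 0.5:
            op_pos += 1
            place(chain[op_pos], path[node_pos])  # place the parent operator here
        else:
            node_pos += 1  # move the topology iterator to the parent node

    # Step 4: everything not yet placed, including the sink operator, goes to the sink node.
    for op in chain[op_pos + 1:]:
        place(op, path[-1])
    return matrix


nodes = ["n1", "n2", "n3"]
operators = ["source", "filter", "map", "sink"]
candidate = generate_candidate(nodes, operators, nodes, operators, random.Random(7))
for node, decisions in zip(nodes, candidate):
    print(node, decisions)
```

Whatever the random draws are, every operator of the chain ends up on exactly one node, the source operator stays on the first node of the path, and the sink operator stays on the sink node.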
IFCOP repeats these steps for all source operators in the query to place all operators in the query plan.

## Cost Function

IFCOP uses a cost function to select the candidate that minimizes processing-time latency. The cost function of IFCOP consists of a network cost and a node over-utilization cost.

- Network Cost

  The network cost defines the relative amount of data transferred over the topology as a result of applying an operator placement. To compute the overall cost, we need to compute the local cost of a node and take into account the child cost of that node.

- Node Over-utilization Cost

  The node over-utilization cost defines the excess number of operators placed on a topology node with regard to its capacity. We compute the node over-utilization cost as follows:

  overutilization cost = min(0, node.capacity - node.load)

- Total Cost

  The total cost is computed as a weighted sum of the network cost and the node over-utilization cost.

  Cost_total = w1 * network cost + w2 * overutilization cost

## Optimization

To select the best operator placement candidate, IFCOP performs a fixed number of iterations. In each iteration, IFCOP generates an operator placement candidate and then checks its cost. If the cost is lower than that of the current best candidate, IFCOP replaces the best candidate with the current candidate. After the final iteration, IFCOP returns the best operator placement candidate found.
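The weighted cost and the optimization loop can be sketched as follows. Several parts are assumptions made for illustration. The over-utilization term is computed as the magnitude of the excess, max(0, load - capacity), so that overloaded nodes raise the cost, whereas the formula in the text is written as min(0, capacity - load). The network cost is not specified in detail here, so the example substitutes a stand-in that counts operator pairs split across nodes. All function names, the weights, and the two fixed candidates are hypothetical.

```python
from collections import Counter


def overutilization_cost(load, capacity):
    """Sum of operators placed beyond each node's capacity (assumed sign convention)."""
    return sum(max(0, load[node] - capacity[node]) for node in capacity)


def total_cost(network_cost, overutil_cost, w1=1.0, w2=1.0):
    """Weighted sum of network cost and node over-utilization cost."""
    return w1 * network_cost + w2 * overutil_cost


def optimize(generate, evaluate, iterations):
    """Generate a fixed number of candidates and keep the cheapest one."""
    best, best_cost = None, float("inf")
    for _ in range(iterations):
        candidate = generate()
        cost = evaluate(candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


# Toy setup: candidates map each operator to a node.
chain = ["src", "filter", "sink"]
capacity = {"n1": 1, "n2": 2}


def evaluate(candidate):
    load = Counter(candidate.values())
    # Stand-in network cost (assumption): adjacent operators on different nodes.
    hops = sum(candidate[a] != candidate[b] for a, b in zip(chain, chain[1:]))
    return total_cost(hops, overutilization_cost(load, capacity))


# Two fixed candidates replace the random generator to keep the example deterministic.
fixed = iter([
    {"src": "n1", "filter": "n1", "sink": "n2"},  # overloads n1 by one operator
    {"src": "n1", "filter": "n2", "sink": "n2"},  # respects both capacities
])
best, best_cost = optimize(lambda: next(fixed), evaluate, iterations=2)
print(best, best_cost)
```

With equal weights, the first candidate costs 2.0 (one hop plus one excess operator on `n1`) and the second costs 1.0 (one hop, no excess), so the loop returns the second candidate.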