This page describes how to run an E2E benchmark. A benchmark runs either in "scale-up" mode, i.e., it runs just on the coordinator, or in "scale-out" mode, i.e., it runs one nesWorker and one nesCoordinator.

General Design

To compile the most efficient version of NES with all optimizations, use the following cmake:


Note: if you want the latency histogram, add the flag -DNES_BENCHMARKS_DETAILED_LATENCY_MEASUREMENT. Use it only in that case, as it slows down the system significantly.

The E2E benchmarks are built around the files in benchmark/src/E2EBenchmarks/:

  • E2ERunner.cpp: the starting executable which is compiled to ./e2e-benchmark-runner
  • E2EBenchmarkConfig.cpp: the base file that covers all benchmarking parameters
  • E2EBase.cpp: the base class that covers all benchmark related configurations and setups
  • Scripts


The runner can be executed directly by specifying the config file you want to run:

./e2e-benchmark-runner --configPath=exploratoryConfigs/changingLocalBufferSize.yaml

Note that on a large NUMA server it is advisable to run it with numactl:

numactl -N 0 -m 0 ./e2e-benchmark-runner --configPath=exploratoryConfigs/changingLocalBufferSize.yaml

Additionally, you can specify parameters via the command line (every parameter that is not specified takes its default value):

./e2e-benchmark-runner --numberOfWorkerThreads=3
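
When several config files need to be run back to back, the invocations above can be scripted. The following is a minimal Python sketch, assuming the runner binary is in the current directory. The helper names build_command and run_all are hypothetical; only the --configPath flag and the "numactl -N 0 -m 0" prefix are taken from the commands shown above.

```python
import subprocess

RUNNER = "./e2e-benchmark-runner"  # binary name as used in the commands above


def build_command(config_path, numa_node=None):
    """Return the argv list for one benchmark run (hypothetical helper)."""
    cmd = [RUNNER, f"--configPath={config_path}"]
    if numa_node is not None:
        # Pin CPU and memory to one NUMA node, as in the numactl example above.
        cmd = ["numactl", "-N", str(numa_node), "-m", str(numa_node)] + cmd
    return cmd


def run_all(config_paths, numa_node=None):
    """Run the configs one after another; stop at the first failing run."""
    for path in config_paths:
        subprocess.run(build_command(path, numa_node), check=True)


if __name__ == "__main__":
    # Only prints the command; call run_all([...]) to actually execute it.
    print(build_command("exploratoryConfigs/changingLocalBufferSize.yaml", numa_node=0))
```

Building the command as a list (instead of one shell string) avoids quoting problems with paths that contain spaces.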

E2ERunner on macOS

On macOS, shared libraries are searched for in a more restricted way than on Linux. For NES to be able to link to itself (libnes.dylib) during query compilation, add the NES build directory to this environment variable via the command line or CLion's run configuration (e2e-benchmark-runner). This is necessary for every outside application that starts NES.



The config file provides the necessary settings for the benchmark run and is structured like this:

# ~~~ Configurations for the NES E2E Benchmark ~~~

# Parameters that are changed per run, given as a comma-separated list
numberOfWorkerThreads: 1 # number of worker threads to use
numberOfSources: 1 # number of sources you want to use

# Engine configuration parameters
numberOfBuffersInGlobalBufferManager: 65536 # number of global buffers; the allocation space of NES is this * bufferSizeInBytes
numberOfBuffersPerPipeline: 1024 # number of exclusive buffers per pipeline before the pipeline reaches out to the global buffer pool
numberOfBuffersInSourceLocalBufferPool: 1024 # number of exclusive buffers per source, NOTE: the source will block if they are exhausted
bufferSizeInBytes: 4096 # size of each buffer in bytes

# Benchmark parameters for the entire run
inputOutputMode: Auto # input mode specifies which source is used, I do not recommend changing it :)
outputFile: changingBufferSize.csv # the file where the output is written
benchmarkName: changingBufferSize # the name of the benchmark in the output file
query: 'Query::from("input").sink(NullOutputSinkDescriptor::create());' # the query you want to submit to the benchmark

# Benchmark-internal parameters
numberOfBuffersToProduce: 5000000 # number of buffers to produce; make sure this is high enough that the benchmark does not stop because of it
scalability: scale-up # mode is either scale-up (only worker) or scale-out (worker and coordinator)
logLevel: LOG_NONE # the log level as specified in logger.hpp
experimentMeasureIntervalInSeconds: 1 # don't change this, the benchmark will gather statistics every second
startupSleepIntervalInSeconds: 3 # time to let the system reach a steady state
numberOfMeasurementsToCollect: 5 # number of measurements to collect; with one measurement per second this is the measured runtime in seconds
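
As a quick sanity check on the buffer settings: the comment on numberOfBuffersInGlobalBufferManager says that the allocated space is the number of global buffers times the buffer size. A small Python sketch of that arithmetic for the values in this file (the function name is made up for illustration, and the pipeline and source pools are not considered here):

```python
def global_buffer_pool_bytes(number_of_buffers, buffer_size_in_bytes):
    """Memory taken by the global buffer pool: number of buffers * buffer size."""
    return number_of_buffers * buffer_size_in_bytes


total = global_buffer_pool_bytes(65536, 4096)
print(total)                  # 268435456 bytes
print(total / (1024 * 1024))  # 256.0 MiB
```

This helps when varying bufferSizeInBytes across runs, since the memory footprint grows linearly with it.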

The above file contains the parameters for one run. However, the framework can also take multiple runs from one file. Parameters in the following sections can be given as comma-separated values like 1,2,3,4 or as ranges like 1-5-1 with start(inclusive)-end(exclusive)-step_size:

  • Parameters that are changed per run: numberOfWorkerThreads, numberOfSources
  • Engine configuration parameters: numberOfBuffersInGlobalBufferManager, numberOfBuffersPerPipeline, numberOfBuffersInSourceLocalBufferPool, bufferSizeInBytes

Note that the framework fills in missing values automatically. For example, if you specify only

numberOfWorkerThreads: 1,2,3
numberOfSources: 1 # will become numberOfSources: 1,1,1

the framework extends all other parameters to also have 3 values: for each of them it takes the last value and repeats it until the list is long enough.
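
The value syntax and the filling rule can be illustrated with a short Python sketch. It mirrors the behaviour described on this page and is not the framework's own code; parse_values and pad_to_same_length are hypothetical names, and the sketch assumes non-negative integer values.

```python
def parse_values(spec):
    """Parse '1,2,3' or a range 'start-end-step' (end exclusive) into a list of ints."""
    spec = str(spec).strip()
    parts = spec.split("-")
    if len(parts) == 3 and "," not in spec:
        start, end, step = (int(p) for p in parts)
        return list(range(start, end, step))
    return [int(v) for v in spec.split(",")]


def pad_to_same_length(params):
    """Extend every value list to the longest length by repeating its last value."""
    longest = max(len(values) for values in params.values())
    return {
        name: values + [values[-1]] * (longest - len(values))
        for name, values in params.items()
    }


print(parse_values("1-5-1"))  # [1, 2, 3, 4]

config = {
    "numberOfWorkerThreads": parse_values("1,2,3"),
    "numberOfSources": parse_values("1"),
}
print(pad_to_same_length(config))
# {'numberOfWorkerThreads': [1, 2, 3], 'numberOfSources': [1, 1, 1]}
```

After padding, position i of every list belongs to run i, so the example above describes three runs.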


The E2EBase class contains all functionality to run the benchmarks; usually, you don't have to adjust it. The functions are:

  • E2EBase() -- configuration via constructor
  • setup() -- set up the entire benchmark
  • setupSources() -- set up all sources
  • runQuery() -- run the current configuration
  • recordStatistics() -- this function gathers the statistics from the engine once per second
  • getResult() -- assemble the final result
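
How the timing parameters from the config file relate to these functions can be pictured as follows. This is a Python sketch of the measurement schedule only, not the C++ implementation: run_measurements and its record and sleep callbacks are hypothetical stand-ins, and the order (startup sleep first, then one statistics sample per interval) is an assumption based on the parameter comments.

```python
import time


def run_measurements(record, startup_sleep_s=3, interval_s=1, measurements=5,
                     sleep=time.sleep):
    """Wait for a steady state, then take one sample per interval."""
    sleep(startup_sleep_s)          # startupSleepIntervalInSeconds
    samples = []
    for _ in range(measurements):   # numberOfMeasurementsToCollect
        sleep(interval_s)           # experimentMeasureIntervalInSeconds
        samples.append(record())    # stands in for recordStatistics()
    return samples


if __name__ == "__main__":
    # Dry run with a fake clock, so nothing actually sleeps.
    waited = []
    samples = run_measurements(record=lambda: sum(waited), sleep=waited.append)
    print(samples)      # [4, 5, 6, 7, 8]: elapsed seconds at each sample
    print(sum(waited))  # 8 seconds in total with the default values
```

With the default values from the config file, one run therefore takes roughly 3 + 5 * 1 = 8 seconds under these assumptions.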


The most important script is which creates the current like shown in To run this you might need some packages, I used anaconda to install them (

Once the results have been created, go to the folder that contains the CSV files and run


The most important options are:

  • withLatencyHistogram = True //set this to True if you measure with -DNES_BENCHMARKS_DETAILED_LATENCY_MEASUREMENT to create the latency histogram
  • folder = "./" //set the folder here if you want to load the CSV files from a folder other than the one that contains the Python file

In essence, the results in are read like this (note that this can be outdated :))

  • The subtitle shows the default config, each column changes one of these values
  • The x-axis shows the value that changes
  • The y-axis shows:
    • First row shows the throughput in tuples per second
    • Second row shows the throughput in MB/s
    • Third row shows the average latency (the sum of all latency values for the entire run divided by their count)
    • Fourth row is optional and shows the latency histogram for a selected subset of the configs (only the worker and source counts change)
  • The columns show the following configs
    • 1: changing buffer size
    • 2: changing the number of buffers in the global buffer pool
    • 3: changing the number of buffers in the local buffer pool (here we give pipeline and sources the same number of buffers)
    • 4: changing the number of sources that produce data (the lines indicate the worker count, see legend Wrk-X)
    • 5: changing the number of workers that consume data (the lines indicate the source count, see legend Src-X)
    • 6: changing worker and src at the same time for a query that does no processing
    • 7: changing worker and src at the same time for a query with low selectivity of 10%
    • 8: changing worker and src at the same time for a query with medium selectivity of 50%
    • 9: changing worker and src at the same time for a query with high selectivity of 90%
  • The legend shows in colors the worker or the src count, depending on what changes
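
The three y-axis metrics can be written down as simple formulas. The Python sketch below uses made-up input numbers and hypothetical function names; it does not reflect the column names of the CSV files or the code of the plotting script, and it assumes 1 MB = 1024 * 1024 bytes.

```python
def throughput_tuples_per_s(processed_tuples, seconds):
    """First row: tuples processed per second."""
    return processed_tuples / seconds


def throughput_mb_per_s(processed_bytes, seconds):
    """Second row: megabytes processed per second (1 MB = 1024 * 1024 bytes here)."""
    return processed_bytes / (1024 * 1024) / seconds


def average_latency(latencies):
    """Third row: sum of all latency values divided by their count."""
    return sum(latencies) / len(latencies)


# Made-up example: a 5 second measurement window.
print(throughput_tuples_per_s(10_000_000, 5))       # 2000000.0
print(throughput_mb_per_s(500 * 1024 * 1024, 5))    # 100.0
print(average_latency([10, 20, 30]))                # 20.0
```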
how_to_run_e2e_benchmarks.txt · Last modified: 2021/11/29 14:42 by