Configure NES Worker

NebulaStream provides two possibilities to configure Worker.

  1. Providing configuration over CLI.
  2. Providing configuration using YAML file.

CLI Configurations

Key Default Description
–localWorkerIp 127.0.0.1 Set worker ip
–coordinatorIp 127.0.0.1 Set the server IP of the NES Coordinator to which the NES Worker should connect
–coordinatorPort 0 Set the RPC server Port of the NES Coordinator to which the NES Worker should connect. Needs to be set and needs to be the same as RPC port of the Coordinator. If it is set to 0, the worker will not start.
–rpcPort 0 Set the RPC server port of the NES Worker. If it is set to 0, the operating system will choose one for us.
–dataPort 0 Set the data port of the NES Worker. If it is set to 0, the operating system will choose one for us.
–numberOfSlots Number of processors installed in the host machine Set the number of computing slots in the NES Worker
–numWorkerThreads 1 Set the number of worker threads
–numberOfBuffersInGlobalBufferManager 1024 Number buffers in global buffer pool
–numberOfBuffersPerPipeline 128 Number buffers in task local buffer pool
–numberOfBuffersInSourceLocalBufferPool 64 Number buffers in source local buffer pool
–bufferSizeInBytes 4096 Size of buffer to pass data through the system
–parentId “-1” Set the parentId of this node
–queryCompilerCompilationStrategy FAST, DEBUG, OPTIMIZE Selects the optimization level for query code compilation. Fast omits all optimizations, Debug prints debug output and omits optimizations, Optimizes applies compiler optimizations i.e. O3
–queryCompilerPipeliningStrategy OPERATOR_FUSION, OPERATOR_AT_A_TIME Selects the pipelining strategy. Op OPERATOR_FUSION fuses operators if possible
–queryCompilerOutputBufferOptimizationLevel ALL, NO, ONLY_INPLACE_OPERATIONS_NO_FALLBACK, REUSE_INPUT_BUFFER_AND_OMIT_OVERFLOW_CHECK_NO_FALLBACK, REUSE_INPUT_BUFFER_NO_FALLBACK, OMIT_OVERFLOW_CHECK_NO_FALLBACK Selects the output buffer optimization strategy
–enableMonitoring true Enable monitoring - collection of metrics

YAML Configuration

A user may create a configuration file for starting the Worker and pass it as input using CLI using –configPath=<path to config file>. A template Worker configurations file looks as follows:

# ~~~ Configurations for the NES Worker ~~~
 
# ~~ IPs and Ports ~~
# Set worker IP
localWorkerIp: 127.0.0.1
 
# Set IP address of NES Coordinator to which the NES Worker should connect
coordinatorIp: 127.0.0.1
 
# Set RPC Server port of NES Coordinator to which the NES Worker should connect
# Needs to be the same as rpcPort in Coordinator
coordinatorPort: 4000
 
# Set RPC server port of NES Worker
rpcPort: 3000
 
# Set the data port of the NES Worker
dataPort: 3001
 
# Set number of computing slots in NES Worker
numberOfSlots: 65535
 
# Set the number of worker threads
numWorkerThreads: 1
 
# Number buffers in global buffer pool
numberOfBuffersInGlobalBufferManager: 1024
 
# Number buffers in task local buffer pool
numberOfBuffersPerWorker: 128
 
# Number buffers in source local buffer pool
numberOfBuffersInSourceLocalBufferPool: 64
 
# Size of buffer to pass data through the system
bufferSizeInBytes: 4096
 
# Set parentId of NES Worker node
parentId: -1
 
# The log level (LOG_NONE, LOG_WARNING, LOG_DEBUG, LOG_INFO, LOG_TRACE)
logLevel: LOG_DEBUG
 
# Comma separated list of where to map the sources
sourcePinList:

# Comma separated list of where to map the worker
workerPinList:

# Enable Numa-Aware execution
numaAwareness: false
 
# Enable monitoring
enableMonitoring: false

queryCompiler:
  # Indicates the optimization strategy for the query compiler [FAST|DEBUG|OPTIMIZE]
  compilationStrategy: OPTIMIZE
 
  # Indicates the pipelining strategy for the query compiler [OPERATOR_FUSION|OPERATOR_AT_A_TIME]
  pipeliningStrategy: OPERATOR_FUSION
 
  # Indicates the OutputBufferAllocationStrategy [ALL|NO|ONLY_INPLACE_OPERATIONS_NO_FALLBACK,
  # |REUSE_INPUT_BUFFER_AND_OMIT_OVERFLOW_CHECK_NO_FALLBACK,|REUSE_INPUT_BUFFER_NO_FALLBACK|OMIT_OVERFLOW_CHECK_NO_FALLBACK]
  outputBufferOptimizationLevel: ALL
 
  # Indicates the windowingStrategy [DEFAULT|THREAD_LOCAL]
  windowingStrategy: DEFAULT
 
# set the location of a node if it is a field node (format: "<lat>, <lng>")
# standard location settings will result in an invalid location and create a node without a location
locationCoordinates: 200.0, 200.0
 
# ~~~ Physical Stream Configurations ~~~
 
# A Physical Stream may be associated with multiple Instances of a logicalStreamName, a physicalStreamName and a SourceType
# If no Physical Stream Config is given, the worker will be created without a physical stream
# Available source Types: DefaultSource, CSVSource, MQTTSource, SenseSource, OPCSource (ToDo: Make source available),
#                         ZMQSource (ToDo: Make source available), KafkaSource (ToDo: Make source available),
#                         BinarySource (ToDo: Make source available))
physicalSources:
  # Set logical stream name where this stream is added to
  - logicalSourceName: default_source_log
    # Set physical stream name
    physicalSourceName: default_source_phy
    # Define source type, also need to specify source configurations for source type below
    type: CSV
    # CSVSource and its needed configuration params
    configuration:
      ###### Define only the following configurations for CSV stream
      # Set file path
      filePath: /tests/test_data/exdra.csv
      # Skip first line of the file
      skipHeader: false
      # Set delimiter, e.g. ',' or '.'
      delimiter: ","
      # Set number of buffers to produce, i.e. how often the read csv file is repeated
      numberOfBuffersToProduce: 1
      # Set number of tuples to produce per buffer
      numberOfTuplesToProducePerBuffer: 1
      # Set sampling frequency of source
      sourceFrequency: 1
 
      ###### Define only the following configurations for DefaultSource stream
      # Set number of buffers to produce, i.e. how often the default data is repeated for this source
      #numberOfBuffersToProduce: 1
      ###### Define only the following configurations for BinarySource stream
      # Set number of buffers to produce, i.e. how often the default data is repeated for this source
      #filePath: 1
      ###### Define only the following configurations for SenseSource stream
      # Set number of buffers to produce, i.e. how often the default data is repeated for this source
      #udfs: 1
      ###### Define only the following configurations for MQTTSource stream
      # Set url to connect to
      #url:
      # Set clientId
      #clientId:
      # Set userName
      #userName:
      # Set topic to listen to
      #topic:
      # set quality of service
      #qos: 2
      # set cleanSession true = clean up session after client loses connection, false = keep data for client after connection loss (persistent session)
      #cleanSession: true
      # set tupleBuffer flush interval in milliseconds
      #flushIntervalMS: -1
      # Set input data format
      #inputFormat: JSON
      ###### Define only the following configurations for KafkaSource stream
      # Set kafka broker string
      #brokers:
      # Set auto commit, boolean value where 1 equals true, and 0 equals false
      #autoCommit: 1
      # Set groupId
      #groupId:
      # Set topic to listen to
      #topic: "test"
      # Set connection time out for source
      #connectionTimeout: 10
      ###### Define only the following configurations for OPCSource stream
      # Set namespaceIndex for node, needed for: OPCSource
      #namespaceIndex:
      # Set node identifier, needed for: OPCSource
      #nodeIdentifier:
      # Set userName, needed for: OPCSource (can be chosen arbitrary), OPCSource
      #userName:
      # Set password, needed for: OPCSource
      #password:
 
worker-configurations.txt · Last modified: 2022/02/26 22:24 by 134.96.191.189
 
Recent changes RSS feed Creative Commons License Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki