Runtime Configuration
Bohrium supports a broad range of front-ends and back-ends.
The default backend is OpenMP. You can change which backend to use by defining the BH_STACK environment variable:
- The CPU backend that makes use of OpenMP:
  BH_STACK=openmp
- The GPU backend that makes use of OpenCL:
  BH_STACK=opencl
- The GPU backend that makes use of CUDA:
  BH_STACK=cuda
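As a rough illustration of the backend selection described above, the following Python sketch builds an environment with BH_STACK set and launches a script in a child process. The helper names `bohrium_env` and `run_with_backend` are hypothetical and not part of Bohrium; the list of stack names is taken from this page and may not match every installation.

```python
import os
import subprocess
import sys

# Stack names listed on this page; an installation may define others.
KNOWN_STACKS = ("openmp", "opencl", "cuda")


def bohrium_env(stack, base=None):
    """Return a copy of the environment with BH_STACK set to `stack`.

    Hypothetical helper for illustration only, not a Bohrium API.
    """
    if stack not in KNOWN_STACKS:
        raise ValueError("unknown stack %r, expected one of %r" % (stack, KNOWN_STACKS))
    env = dict(os.environ if base is None else base)
    env["BH_STACK"] = stack
    return env


def run_with_backend(script, stack):
    """Run `script` in a child Python process with the chosen backend."""
    return subprocess.call([sys.executable, script], env=bohrium_env(stack))


if __name__ == "__main__":
    # Only show the variable that would be set; nothing is launched here.
    print("BH_STACK=" + bohrium_env("opencl", base={})["BH_STACK"])
```

Setting the variable for a child process, rather than inside an already running interpreter, sidesteps the question of when Bohrium reads BH_STACK, which this page does not specify.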
For debug information when running Bohrium, use the following environment variables:
BH_<backend>_PROF=true -- Prints a performance profile at the end of execution.
BH_<backend>_VERBOSE=true -- Prints a lot of information including the source of the JIT compiled kernels. Enables per-kernel profiling when used together with BH_OPENMP_PROF=true.
BH_SYNC_WARN=true -- Show Python warnings in all instances when copying data to Python.
BH_MEM_WARN=true -- Show warnings when memory accesses are problematic.
BH_<backend>_GRAPH=true -- Dump a dependency graph of the instructions sent to the back-ends (.dot file).
BH_<backend>_VOLATILE=true -- Declare temporary variables using `volatile`, which avoids precision differences because of Intel's use of 80-bit floats internally.
In particular, BH_<backend>_PROF=true is very useful for exploring why Bohrium might not perform as expected:
BH_OPENMP_PROF=1 python -m bohrium heat_equation.py --size=4000*4000*100
heat_equation.py - target: bhc, bohrium: True, size: 4000*4000*100, elapsed-time: 6.446084
[OpenMP] Profiling:
Fuse cache hits: 199/203 (98.0296%)
Codegen cache hits 299/304 (98.3553%)
Kernel cache hits 300/304 (98.6842%)
Array contractions: 700/1403 (49.8931%)
Outer-fusion ratio: 13/23 (56.5217%)
Max memory usage: 0 MB
Syncs to NumPy: 99
Total Work: 12800400099 operations
Throughput: 1.9235e+09ops
Work below par-threshold (1000): 0%
Wall clock: 6.65473s
Total Execution: 6.04354s
Pre-fusion: 0.000761211s
Fusion: 0.00411354s
Codegen: 0.00192224s
Compile: 0.285544s
Exec: 4.91214s
Copy2dev: 0s
Copy2host: 0s
Ext-method: 0s
Offload: 0s
Other: 0.839052s
Unaccounted for (wall - total): 0.611198s
This tells us, among other things, that the execution of the compiled JIT kernels (Exec) takes 4.91 seconds, the JIT compilation (Compile) takes 0.29 seconds, and the time spent outside of Bohrium (Unaccounted for) is 0.61 seconds.
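To compare such profiles across runs, it can help to turn the timing lines into numbers. The sketch below assumes the `Label: <seconds>s` layout shown in the sample output above; the function names are hypothetical, and the format may differ between Bohrium versions.

```python
import re

# Timing lines copied from the sample profile above.
PROFILE = """\
Wall clock: 6.65473s
Total Execution: 6.04354s
Pre-fusion: 0.000761211s
Fusion: 0.00411354s
Codegen: 0.00192224s
Compile: 0.285544s
Exec: 4.91214s
Other: 0.839052s
"""

# A label, a colon, a plain or scientific-notation number, then a trailing "s".
TIMING_LINE = re.compile(r"\s*([^:]+):\s*(\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)s\s*$")


def parse_timings(text):
    """Map each 'Label: <seconds>s' line to a float; other lines are skipped."""
    timings = {}
    for line in text.splitlines():
        match = TIMING_LINE.match(line)
        if match:
            timings[match.group(1).strip()] = float(match.group(2))
    return timings


def share_of_total(timings, label, total_label="Total Execution"):
    """Fraction of the total execution time spent in `label`."""
    return timings[label] / timings[total_label]


if __name__ == "__main__":
    timings = parse_timings(PROFILE)
    for label in ("Exec", "Compile", "Other"):
        print("%-8s %5.1f%%" % (label, 100 * share_of_total(timings, label)))
```

Lines that do not end in a number of seconds, such as the cache-hit ratios or the throughput line, are ignored by the pattern.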
OpenCL Configuration
Bohrium sorts all available devices by type (‘gpu’, ‘cpu’, or ‘accelerator’). Set the device number to the device Bohrium should use (0 means first):
BH_OPENCL_DEVICE_NUMBER=0
In order to see all available devices, run:
python -m bohrium_api --info
You can also set these options in the configuration file under the [opencl] section.
Also under the [opencl] section, you can set the OpenCL work group sizes:
# OpenCL work group sizes
work_group_size_1dx = 128
work_group_size_2dx = 32
work_group_size_2dy = 4
work_group_size_3dx = 32
work_group_size_3dy = 2
work_group_size_3dz = 2
Advanced Configuration
In order to configure the runtime setup of Bohrium you must provide a configuration file to Bohrium. The installation of Bohrium installs a default configuration file in /etc/bohrium/config.ini when doing a system-wide installation, ~/.bohrium/config.ini when doing a local installation, and <python library>/bohrium/config.ini when doing a pip installation.
At runtime Bohrium will search through the following prioritized list in order to find the configuration file:
- The environment variable BH_CONFIG
- The config within the Python package bohrium/config.ini (in the same directory as __init__.py)
- The home directory config ~/.bohrium/config.ini
- The system-wide config /usr/local/etc/bohrium/config.ini
- The system-wide config /usr/etc/bohrium/config.ini
- The system-wide config /etc/bohrium/config.ini
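The prioritized search above can be sketched in a few lines of Python. This is only an illustration of the listed order, not Bohrium's actual lookup code: the function names are hypothetical, and the sketch assumes that a BH_CONFIG value pointing at a missing file simply falls through to the next candidate, which this page does not confirm.

```python
import os


def candidate_config_paths(environ=None, package_dir=None):
    """Return the config locations in the priority order listed above.

    `package_dir` stands for the directory holding the bohrium package's
    __init__.py; pass None if it is unknown.
    """
    environ = os.environ if environ is None else environ
    paths = []
    if environ.get("BH_CONFIG"):
        paths.append(environ["BH_CONFIG"])
    if package_dir is not None:
        paths.append(os.path.join(package_dir, "config.ini"))
    paths.append(os.path.join(os.path.expanduser("~"), ".bohrium", "config.ini"))
    paths.extend([
        "/usr/local/etc/bohrium/config.ini",
        "/usr/etc/bohrium/config.ini",
        "/etc/bohrium/config.ini",
    ])
    return paths


def find_config(environ=None, package_dir=None, exists=os.path.isfile):
    """Return the first candidate for which `exists` is true, or None."""
    for path in candidate_config_paths(environ, package_dir):
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    print(find_config())
```

The `exists` parameter is injectable so the order can be checked without touching the file system.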
The default configuration file looks similar to the config below:
#
# Stack configurations, which are a comma separated lists of components.
# NB: 'stacks' is a reserved section name and 'default'
# is used when 'BH_STACK' is unset.
# The bridge is never part of the list
#
[stacks]
default = bcexp, bccon, node, openmp
openmp = bcexp, bccon, node, openmp
opencl = bcexp, bccon, node, opencl, openmp
#
# Managers
#
[node]
impl = /usr/lib/libbh_vem_node.so
timing = false
[proxy]
address = localhost
port = 4200
impl = /usr/lib/libbh_vem_proxy.so
#
# Filters - Helpers / Tools
#
[pprint]
impl = /usr/lib/libbh_filter_pprint.so
#
# Filters - Bytecode transformers
#
[bccon]
impl = /usr/lib/libbh_filter_bccon.so
collect = true
stupidmath = true
muladd = true
reduction = false
find_repeats = false
timing = false
verbose = false
[bcexp]
impl = /usr/lib/libbh_filter_bcexp.so
powk = true
sign = false
repeat = false
reduce1d = 32000
timing = false
verbose = false
[noneremover]
impl = /usr/lib/libbh_filter_noneremover.so
timing = false
verbose = false
#
# Engines
#
[openmp]
impl = /usr/lib/libbh_ve_openmp.so
tmp_bin_dir = /usr/var/bohrium/object
tmp_src_dir = /usr/var/bohrium/source
dump_src = true
verbose = false
prof = false #Profiling statistics
compiler_cmd = "/usr/bin/x86_64-linux-gnu-gcc"
compiler_inc = "-I/usr/share/bohrium/include"
compiler_lib = "-lm -L/usr/lib -lbh"
compiler_flg = "-x c -fPIC -shared -std=gnu99 -O3 -march=native -Werror -fopenmp"
compiler_openmp = true
compiler_openmp_simd = false
[opencl]
impl = /usr/lib/libbh_ve_opencl.so
verbose = false
prof = false #Profiling statistics
# Additional options given to the opencl compiler. See documentation for clBuildProgram
compiler_flg = "-I/usr/share/bohrium/include"
serial_fusion = false # Topological fusion is default
The configuration file consists of two things: components and orchestration of components in stacks.
Components are marked with square brackets. For example, [node], [openmp], and [opencl] are all components available to the runtime system.
The stacks define different default configurations of the runtime environment, and one can switch between them using the environment variable BH_STACK.
The configuration of a component can be overridden with environment variables using the naming convention BH_[COMPONENT]_[OPTION]. Below are a couple of examples controlling the behavior of the CPU vector engine:
BH_OPENMP_PROF=true -- Prints a performance profile at the end of execution.
BH_OPENMP_VERBOSE=true -- Prints a lot of information including the source of the JIT compiled kernels. Enables per-kernel profiling when used together with BH_OPENMP_PROF=true.
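The interplay between the configuration file, BH_STACK, and the BH_[COMPONENT]_[OPTION] overrides can be mimicked with Python's standard `configparser`. The sketch below only models the behavior described on this page using a shortened sample config; the function names are hypothetical and Bohrium's own parser may treat edge cases differently.

```python
import configparser
import os

# A shortened fragment of the default configuration shown above.
SAMPLE = """\
[stacks]
default = bcexp, bccon, node, openmp
opencl = bcexp, bccon, node, opencl, openmp

[openmp]
verbose = false
prof = false #Profiling statistics
"""


def load_config(text):
    """Parse config text, dropping trailing '#' comments after a value."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    return parser


def stack_components(parser, environ=None):
    """List the components of the stack named by BH_STACK, else 'default'."""
    environ = os.environ if environ is None else environ
    name = environ.get("BH_STACK", "default")
    return [part.strip() for part in parser["stacks"][name].split(",")]


def option(parser, component, name, environ=None):
    """Return BH_<COMPONENT>_<NAME> if it is set, else the file's value."""
    environ = os.environ if environ is None else environ
    key = "BH_%s_%s" % (component.upper(), name.upper())
    if key in environ:
        return environ[key]
    return parser[component][name]


if __name__ == "__main__":
    config = load_config(SAMPLE)
    print(stack_components(config))
    print(option(config, "openmp", "prof"))
```

Passing an explicit `environ` dictionary makes it easy to see how an override such as BH_OPENMP_PROF=true takes precedence over `prof = false` in the file.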
Useful environment variables:
BH_SYNC_WARN=true -- Show Python warnings in all instances when copying data to Python.
BH_MEM_WARN=true -- Show warnings when memory accesses are problematic.
BH_UNSUP_WARN=false -- Do not warn when encountering unsupported NumPy operations.
BH_<backend>_GRAPH=true -- Dump a dependency graph of the instructions sent to the back-ends (.dot file).
BH_<backend>_VOLATILE=true -- Declare temporary variables using `volatile`, which avoids precision differences because of Intel's use of 80-bit floats internally.