BEAM | [#] | number of beams in kernel beam search
DEFAULT_FLOAT | [HALF, ...]| specify the default float dtype (FLOAT32, HALF, BFLOAT16, FLOAT64, ...), defaults to FLOAT32
IMAGE | [1-2] | enable 2D-specific optimizations
FLOAT16 | [1] | use float16 for images instead of float32
PTX | [1] | enable the specialized [PTX](https://docs.nvidia.com/cuda/parallel-thread-execution/) assembler for Nvidia GPUs. If not set, the generic CUDA codegen backend is used.
VISIBLE_DEVICES | [list[int]]| restricts the NV/AMD devices that are available. The format is a comma-separated list of identifiers (indexing starts with 0).
JIT | [0-2] | 0=disabled, 1=[jit enabled](quickstart.md#jit) (default), 2=jit enabled, but graphs are disabled
WEBGPU_BACKEND | [WGPUBackendType_Metal, ...] | Force a specific backend for WebGPU (Metal, DirectX, OpenGL, Vulkan...)
CUDA_PATH | str | Use `CUDA_PATH/include` for CUDA headers in the CUDA and NV backends. If not set, tinygrad will use `/usr/local/cuda/include`, `/usr/include` and `/opt/cuda/include`.
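
These variables are read from the process environment, so one way to apply several of them at once is to build an environment for a child process. The following is a minimal sketch; `build_env` is a hypothetical helper (not part of tinygrad) and `train.py` is a placeholder script name.

```python
import os

def build_env(base=None, **overrides):
    """Return a copy of the environment with the given variables set as strings.

    Hypothetical helper for illustration; not part of tinygrad.
    """
    env = dict(os.environ if base is None else base)
    for name, value in overrides.items():
        if isinstance(value, (list, tuple)):
            # VISIBLE_DEVICES expects a comma-separated list of identifiers.
            value = ",".join(str(v) for v in value)
        env[name] = str(value)
    return env

# Values chosen only as examples of the ranges listed in the table.
env = build_env(BEAM=2, JIT=1, DEFAULT_FLOAT="HALF", VISIBLE_DEVICES=[0, 1])
print({k: env[k] for k in ("BEAM", "JIT", "DEFAULT_FLOAT", "VISIBLE_DEVICES")})

# To use it, pass `env` to a child process, for example:
#   subprocess.run([sys.executable, "train.py"], env=env)
```

Setting the variables on a copy keeps the parent process environment unchanged, which makes it easier to compare runs with different settings.
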

## Debug breakdown

Variable | Value | Description
---|---|---
DEBUG | >= 1 | Enables debugging and lists devices being used
DEBUG | >= 2 | Provides performance metrics for operations, including timing, memory usage, and bandwidth for each kernel execution
DEBUG | >= 3 | Outputs buffers used for each kernel (shape, dtype and strides) and the applied optimizations at a kernel level
DEBUG | >= 4 | Outputs the generated kernel code
DEBUG | >= 5 | Displays the intermediate representation of the computation UOps (AST)
DEBUG | >= 6 | Displays the intermediate representation of the computation UOps in a linearized manner, detailing the operation sequence
DEBUG | >= 7 | Outputs the assembly code generated for the target hardware
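
Because each row is a `>=` threshold, the levels are cumulative: a higher `DEBUG` value includes everything from the lower ones. The sketch below only illustrates that reading of the table; `DEBUG_LEVELS` and `debug_outputs` are hypothetical names, not tinygrad code, and the descriptions are shortened from the rows above.

```python
# Thresholds and (shortened) descriptions copied from the table above.
DEBUG_LEVELS = [
    (1, "devices being used"),
    (2, "per-kernel performance metrics"),
    (3, "kernel buffers and applied optimizations"),
    (4, "generated kernel code"),
    (5, "UOps (AST)"),
    (6, "linearized UOps"),
    (7, "target assembly"),
]

def debug_outputs(level: int) -> list[str]:
    """Return every output whose threshold is met by the given DEBUG level."""
    return [desc for threshold, desc in DEBUG_LEVELS if level >= threshold]

# Example: list what a level of 4 would include under this reading.
for line in debug_outputs(4):
    print(line)
```
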