Data formats and tensor semantics

This page explains the public format vocabulary used by InputOptions::format, OutputTensorOptions::format, tensor image metadata, and sample payload tags.

For task-level usage, start with Tensor and Sample. Come here when a graph boundary needs an explicit format contract.

Format tags

FormatTag / FormatSpec names the payload format. In Python, use pyneat.Format or pyneat.FormatTag values for format fields. Do not assign raw strings to Python format fields.

Python exposes the common user-facing format tags. Some lower-level C++ tags, such as BBOX, MLA, ARGMAX, and DETESSDEQUANT, usually appear through tensor semantic metadata, payload tags, or diagnostics rather than as assignable pyneat.Format values.

Common tags:

| Tag | Typical payload | Meaning |
| --- | --- | --- |
| RGB | image | Packed RGB, 8 bits per channel. |
| BGR | image | Packed BGR, 8 bits per channel. OpenCV uses this by default. |
| GRAY8 | image | 8-bit grayscale. |
| NV12 | image/video | Y plane plus interleaved UV plane. Width and height must be even. |
| I420 | image/video | Y, U, and V planes. Width and height must be even. |
| H264 | encoded | H.264 access unit / NAL stream. |
| FP32 | tensor | Float32 tensor payload. |
| INT8 | tensor | Signed 8-bit integer tensor payload. |
| UINT8 | tensor | Unsigned 8-bit integer tensor payload. |
| BF16 | tensor | BF16 (bfloat16) tensor payload. |
| BBOX | detection | Packed bounding-box payload. |
| ByteStream | tensor semantics | Opaque byte stream interpreted by the downstream contract. |
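
The RGB and BGR tags describe the same packed layout with the first and third channels exchanged. As a minimal sketch, assuming the pixels already sit in a tightly packed 3-byte-per-pixel CPU buffer, an app-side swap could look like the following. swap_red_blue is a hypothetical helper written for this example, not part of simaai::neat.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: swap channel 0 and channel 2 of every packed pixel.
// The same call converts BGR -> RGB and RGB -> BGR.
inline void swap_red_blue(std::vector<std::uint8_t>& pixels) {
    // A packed 3-channel buffer must hold a whole number of pixels.
    assert(pixels.size() % 3 == 0 && "packed 3-channel buffer expected");
    for (std::size_t i = 0; i + 2 < pixels.size(); i += 3) {
        std::swap(pixels[i], pixels[i + 2]);
    }
}
```

Declaring the correct tag at the graph boundary is usually preferable to swapping bytes by hand; the helper only illustrates what the two tags mean for the buffer contents.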

Payload families

PayloadType selects the broad family crossing a graph boundary.

| Payload family | Internal/media meaning | Common metadata |
| --- | --- | --- |
| Image | decoded pixels | pixel format, width, height, layout, image semantic metadata |
| Tensor | model or app tensor | dtype, shape, layout, tensor semantic metadata |
| Encoded | encoded media such as H.264 | caps string, codec format, timestamps |
| Auto | infer when possible | use only when tensor/sample metadata is enough |

Text, audio, byte-stream, and opaque-byte payloads use tensor semantics or specialized specs. They are not separate PayloadType enum values in the public API reviewed for this release.

Raw image mapping

| Format | PayloadType | Tensor layout / shape | Notes |
| --- | --- | --- | --- |
| RGB | Image | HWC, [H, W, 3] | Dense packed pixels. |
| BGR | Image | HWC, [H, W, 3] | Use for cv2.imread or OpenCV BGR frames. |
| GRAY8 | Image | HW, [H, W] | Single-channel grayscale. |
| NV12 | Image | HW, [H, W] plus plane metadata | Composite Y + UV planes. |
| I420 | Image | HW, [H, W] plus plane metadata | Composite Y + U + V planes. |

For packed formats, depth is the channel count. For tensor payloads, depth is derived from the selected layout and shape.
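
As a rough sanity check on the mapping above, the expected byte count of one frame follows from its format and dimensions. The sketch below assumes tightly packed rows (no stride padding) and the standard 4:2:0 chroma subsampling of NV12 and I420. RawFormat, has_even_dims, and frame_bytes are hypothetical names local to this example, not library types.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical local enum for this sketch; it is not the library's FormatTag.
enum class RawFormat { RGB, BGR, GRAY8, NV12, I420 };

// NV12 and I420 subsample chroma 2x2, so both dimensions must be even.
inline bool has_even_dims(std::size_t width, std::size_t height) {
    return width % 2 == 0 && height % 2 == 0;
}

// Expected size in bytes of one tightly packed frame (no row padding).
inline std::size_t frame_bytes(RawFormat format, std::size_t width, std::size_t height) {
    switch (format) {
        case RawFormat::RGB:
        case RawFormat::BGR:
            return width * height * 3;  // HWC with depth 3
        case RawFormat::GRAY8:
            return width * height;      // HW with depth 1
        case RawFormat::NV12:
        case RawFormat::I420:
            assert(has_even_dims(width, height) && "4:2:0 formats need even dimensions");
            // Full-resolution Y plane plus two quarter-resolution chroma planes
            // (interleaved as UV for NV12, separate U and V for I420).
            return width * height + 2 * (width / 2) * (height / 2);
    }
    return 0;  // not reached for valid enum values
}
```

Comparing frame_bytes(...) against the size of a buffer you are about to wrap is a cheap way to catch a mismatched format tag before it reaches the graph.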

Read format, layout, and axis semantics together

Do not read one field in isolation:

| Field | What it tells you |
| --- | --- |
| PixelFormat / image format metadata | How to interpret pixel channels, such as RGB, BGR, GRAY8, NV12, or I420. |
| TensorLayout | How tensor dimensions are ordered, such as HWC, CHW, or HW. |
| TensorAxisSemantic | What an axis means when the tensor carries richer semantic metadata. |
| TensorDType | How each element is stored, such as UInt8, INT8, FP32, or BF16. |
| ByteFormat / byte-stream metadata | How opaque bytes should be interpreted by the next stage. |

Bytes are not meaning. Use the metadata fields together before you reinterpret a buffer.
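
To illustrate why layout has to be read alongside shape, the sketch below computes the flat element offset of the same logical element (row y, column x, channel ch) under HWC and CHW ordering. It assumes a dense buffer with no padding; Shape3, offset_hwc, and offset_chw are hypothetical helpers for this example only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical shape helper: height, width, channel count.
struct Shape3 {
    std::size_t h;
    std::size_t w;
    std::size_t c;
};

// HWC: channels of one pixel are adjacent in memory.
inline std::size_t offset_hwc(Shape3 s, std::size_t y, std::size_t x, std::size_t ch) {
    assert(y < s.h && x < s.w && ch < s.c);
    return (y * s.w + x) * s.c + ch;
}

// CHW: each channel is a separate contiguous H x W plane.
inline std::size_t offset_chw(Shape3 s, std::size_t y, std::size_t x, std::size_t ch) {
    assert(y < s.h && x < s.w && ch < s.c);
    return (ch * s.h + y) * s.w + x;
}
```

For a 2 x 3 x 3 tensor, element (y=1, x=2, ch=0) sits at offset 15 in HWC but at offset 5 in CHW, so reading a buffer with the wrong layout silently scrambles channels rather than failing.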

InputOptions format example

simaai::neat::InputOptions input;
input.payload_type = simaai::neat::PayloadType::Image;
input.format = simaai::neat::FormatTag::BGR;
input.width = 640;
input.height = 480;

Set only the fields the boundary needs. If the tensor or sample already carries enough metadata, avoid duplicate guesses.

Advanced image/video output adapter

For normal model output, use nodes.output(...) and pull tensors with pull_tensors(...). Use OutputTensorOptions only when image or video output must be converted, resized, or rate-adjusted into a CPU-friendly UInt8 tensor before the app pulls it.

simaai::neat::OutputTensorOptions output;
output.format = simaai::neat::FormatTag::BGR;
output.target_width = 640;
output.target_height = 480;

graph.add_output_tensor(output);

add_output_tensor(...) accepts only TensorDType::UInt8, which is also the default output dtype. When you need another dtype, add an explicit graph or app-side conversion. Keep the normal nodes.output(...) path for model tensors and for outputs where you want the full Sample envelope.
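
One possible app-side conversion is sketched below. It assumes the pulled UInt8 tensor has already been copied into a std::vector<std::uint8_t> in HWC order (how you obtain that buffer from the tensor is not shown) and that a float32 range of [0, 1] is what the consumer wants. to_float01 is a hypothetical helper, not a library function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: convert a dense UInt8 HWC frame to float32 in [0, 1].
// Layout and channel order are left unchanged; only the element type changes.
inline std::vector<float> to_float01(const std::vector<std::uint8_t>& src,
                                     std::size_t width,
                                     std::size_t height,
                                     std::size_t channels) {
    // Guard against a buffer that does not match the declared shape.
    assert(src.size() == width * height * channels);
    std::vector<float> dst;
    dst.reserve(src.size());
    for (std::uint8_t value : src) {
        dst.push_back(static_cast<float>(value) / 255.0f);
    }
    return dst;
}
```

Other targets, such as mean/std normalization or a CHW reorder, would follow the same pattern with a different per-element expression or index mapping.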

Sample payload tags

Sample::payload_tag is the preferred label for downstream consumers. It supersedes the deprecated Sample::format field.

Use payload_tag, payload_type, media_type, and caps_string together when debugging encoded media or graph boundary negotiation.

Preprocess metadata and ROI breadcrumbs

Detection decode, render, and ROI workflows need preprocessing metadata to map model-space coordinates back to source-frame coordinates.

That metadata can include:

  • target width and height;
  • scaled content width and height;
  • resize or letterbox mode;
  • padding value and geometry;
  • input and output color formats;
  • axis permutation;
  • normalization, quantization, and tessellation flags;
  • ROI windows, source image size, ROI batch size, and per-ROI affine transforms.

If boxes or masks land in the wrong place, check whether preprocessing metadata reached the decode or render stage before changing thresholds. For ROI-list preprocessing details, see Preproc ROI Lists.
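
As an illustration of why that metadata matters, the sketch below maps a model-space box back to source-frame coordinates for a plain letterbox resize (uniform scale, then padding). LetterboxMeta, Box, and to_source are hypothetical types and names for this example; the library's actual metadata fields may differ, and ROI affine transforms are not covered.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical subset of preprocessing metadata for a letterbox resize.
struct LetterboxMeta {
    float scale;       // source -> model scale factor applied before padding
    float pad_x;       // left padding in model pixels
    float pad_y;       // top padding in model pixels
    float src_width;   // source frame width in pixels
    float src_height;  // source frame height in pixels
};

struct Box {
    float x0, y0, x1, y1;
};

// Undo padding, undo scaling, then clamp to the source frame.
inline Box to_source(const Box& model_box, const LetterboxMeta& meta) {
    assert(meta.scale > 0.0f);
    auto unmap = [&](float v, float pad, float limit) {
        float src = (v - pad) / meta.scale;
        return std::min(std::max(src, 0.0f), limit);
    };
    return Box{unmap(model_box.x0, meta.pad_x, meta.src_width),
               unmap(model_box.y0, meta.pad_y, meta.src_height),
               unmap(model_box.x1, meta.pad_x, meta.src_width),
               unmap(model_box.y1, meta.pad_y, meta.src_height)};
}
```

For a 1280 x 720 source letterboxed into 640 x 640 (scale 0.5, 140 pixels of top padding), the model-space box (100, 200, 300, 400) maps to (200, 120, 600, 520) in the source frame. If the padding or scale never reaches the decode stage, boxes shift or stretch by exactly these amounts.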

See also