
Run / Inference

If you are new to SiMa.ai Neat, the shortest path to a prediction is two lines: load a model, then run it.

  1. Load a compiled model archive (.tar.gz) with Model.
  2. Call model.run(inputs, timeout_ms) to run inference synchronously and get output tensors back.

That is the whole workflow for a single model. Reach for a Graph when one model on its own is not enough: chaining stages, decoupling producers and consumers with async push / pull, or controlling queueing.

What the examples assume

Use the snippets on this page as shapes, not as a whole app. A runnable app also needs:

  • a compiled model artifact from the Model Compiler, copied to the machine that runs Neat;
  • Python: import pyneat;
  • C++: #include "neat.h";
  • C++ builds: find_package(SimaNeat REQUIRED CONFIG) and SimaNeat::sima_neat;
  • an image or tensor fixture that matches the model contract.

Replace example paths such as resnet_50_model.tar.gz with the path to your own model artifact. The model's input and output specs are the contract; inspect them before you allocate real input.

Choose the runtime path

Use the smallest surface that matches the work. No ceremony tollbooths.

| If you need to... | Use | Why |
| --- | --- | --- |
| Run one compiled model once | Model.run(...) | Fastest smoke test for artifact, input, and output contract. |
| Compose a model with app nodes | Graph | Names the boundary and makes topology visible. |
| Run one graph request/response | Graph.run(...) | One-shot graph execution without managing a long-lived run. |
| Keep a graph alive for many inputs | graph.build(...), Run | Reuses the runtime and exposes push/pull, close, drain, and measurement. |
| Tune multiple streams or max throughput | RunOptions, try_push(...), MeasureReport | Queue policy and counters tell you what actually happened under load. |

For setup, model archives, and command context, see the Tutorials preflight checklist.

Run a model directly

Load the model and call run(...). It executes synchronously. No Graph, no Run, no runtime loop.

simaai::neat::Model model("resnet_50_model.tar.gz");

cv::Mat frame = /* your OpenCV BGR frame */;
simaai::neat::Tensor input = simaai::neat::Tensor::from_cv_mat(
    frame,
    simaai::neat::ImageSpec::PixelFormat::BGR,
    simaai::neat::TensorMemory::CPU);

simaai::neat::TensorList outputs = model.run(
    simaai::neat::TensorList{input},
    /*timeout_ms=*/1000);

// outputs[0] holds the first result; read its bytes with outputs[0].map_read().

In Python, pass a list or tuple of inputs. model.run([tensor]) means “one model input,” not “add a batch dimension.”

For a complete, runnable version, see Run Your First Model.

Compose a Graph when you need more

A Graph wraps one or more model stages plus your own nodes into a runtime flow you build into a Run. Reach for it when you need to:

  • chain multiple models or pre/post-processing stages;
  • decouple producers and consumers with asynchronous push / pull; or
  • control queueing, overflow, and metrics with RunOptions.

Run a graph once

For request/response execution, use Graph.run(...).

simaai::neat::Model model("resnet_50_model.tar.gz");

simaai::neat::Graph graph("classifier");
graph.add(simaai::neat::nodes::Input("image"));
graph.add(model);
graph.add(simaai::neat::nodes::Output("classes"));

cv::Mat frame = /* your frame (RGB/BGR as configured) */;
auto out = graph.run(std::vector<cv::Mat>{frame});

Build a reusable Run

Use a reusable Run when you want to decouple producers and consumers, control queueing, or overlap I/O and compute. Use push(...) / pull(...) with RunOptions to tune queueing and drop behavior.

simaai::neat::Model model("resnet_50_model.tar.gz");

simaai::neat::Graph graph("classifier");
graph.add(simaai::neat::nodes::Input("image"));
graph.add(model);
graph.add(simaai::neat::nodes::Output("classes"));

cv::Mat frame = /* your frame */;

auto run = graph.build();
run.push("image", std::vector<cv::Mat>{frame});
auto out = run.pull("classes", /*timeout_ms=*/1000);
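
To make the push / pull decoupling concrete without depending on Neat itself, the following is a minimal standard-library sketch of a bounded queue with a non-blocking push and a timed pull. BoundedQueue, its try_push, pull, and dropped members are hypothetical stand-ins written for this illustration; they are not Neat types, and the real queue and drop behavior of a Run is configured through RunOptions.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical stand-in for a bounded run queue. This is NOT a Neat type;
// it only illustrates the producer/consumer contract.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    // Non-blocking push: returns false and counts a drop when the queue is full.
    bool try_push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (items_.size() >= capacity_) {
                ++dropped_;
                return false;
            }
            items_.push_back(std::move(item));
        }
        not_empty_.notify_one();
        return true;
    }

    // Timed pull: returns std::nullopt if nothing arrives within timeout_ms.
    std::optional<T> pull(int timeout_ms) {
        std::unique_lock<std::mutex> lock(mutex_);
        const bool ready = not_empty_.wait_for(
            lock, std::chrono::milliseconds(timeout_ms),
            [this] { return !items_.empty(); });
        if (!ready) {
            return std::nullopt;
        }
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }

    // Number of items rejected because the queue was full.
    std::size_t dropped() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return dropped_;
    }

private:
    const std::size_t capacity_;
    mutable std::mutex mutex_;
    std::condition_variable not_empty_;
    std::deque<T> items_;
    std::size_t dropped_ = 0;
};
```

With a capacity of 2, a third try_push on a full queue returns false and increments the drop counter, while a pull on an empty queue returns std::nullopt once the timeout expires. A producer thread can keep calling try_push while a consumer thread calls pull, which is the decoupling a reusable Run is meant to provide.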

For lifecycle, backpressure, multistream throughput, measurement, and run export, continue to Run a Graph.

C++ and Python return shapes

The APIs line up, but Python requires an explicit list or tuple so that a single input cannot be confused with multiple input ports. A bare Tensor or Sample is rejected on purpose.

| Operation | C++ return | Python return | Notes |
| --- | --- | --- | --- |
| Model::run(TensorList) or Model::run(std::vector<cv::Mat>) | TensorList | model.run([tensor]) returns a tensor list | Use for normal tensor/image input. |
| Model::run(Sample) | Sample | model.run([sample]) returns a Sample | Use when you need sample metadata or bundles. |
| Graph::run(TensorList) or Graph::run(std::vector<cv::Mat>) | TensorList | graph.run([tensor]) returns a tensor list | One-shot graph request/response. |
| Graph::run(Sample) | Sample | graph.run([sample]) returns a Sample | Use for sample-backed graph input. |
| Run::run(TensorList) or Run::run(std::vector<cv::Mat>) | TensorList | run.run([tensor]) returns a tensor list | Reusable request/response on a live Run. |
| Run::run(Sample) | Sample | run.run([sample]) returns a Sample | Use for bundles, stream IDs, frame IDs, or metadata. |
| Run::pull(...) | std::optional<Sample> | Sample or None | None means no sample arrived before timeout or the output is closed. |
| Run::pull_tensors(...) | TensorList | Tensor list | Use when you only want tensor payloads. |
| Run::pull_samples(...) | Sample | Sample | Use when absence is exceptional and you want strict sample output. |
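
The difference between an optional pull and a strict pull in the table above can be sketched with the standard library alone. The helper below turns any call that returns std::optional into a strict one by retrying a fixed number of times and throwing if nothing arrives. pull_or_throw and its pull_once and max_attempts parameters are hypothetical names for this illustration and are not part of Neat; pull_once stands in for whatever timed pull you use.

```cpp
#include <optional>
#include <stdexcept>
#include <utility>

// Hypothetical helper, NOT part of Neat. `pull_once` is any callable that
// returns std::optional<T>; an empty optional means "nothing arrived yet".
template <typename PullFn>
auto pull_or_throw(PullFn&& pull_once, int max_attempts) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (auto result = pull_once()) {
            // A value arrived: hand it back by value.
            return std::move(*result);
        }
        // Empty optional: treat as a timeout and try again.
    }
    // Absence is treated as exceptional once the attempts are exhausted.
    throw std::runtime_error("no result after max_attempts pulls");
}
```

Callers that can tolerate a missing result should keep the optional and branch on it; wrapping it this way only makes sense when a missing result is a genuine error in your application.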

Learn the concepts

  • Model: model archive loading and model-driven graph fragments.
  • Graph: assembly, validation, and run/build entry point.
  • Run a Graph: runtime options, push/pull loops, multistream throughput, and measurement.
  • Node: atomic stages, pre-built groups, and graph boundary nodes.
  • Tensor and Sample: payload vs metadata envelope.
