SPAgent

SPAgent is a foundation agent for the physical & spatial world

Perception, reasoning, and action in the physical and spatial world, powered by an open-ended multimodal ecosystem of tools spanning 2D, 3D, world models, and beyond.


Capabilities

Modular Tool System

Mix and match any combination of expert tools. Add or remove tools at runtime with a single function call.
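A runtime registry for this pattern might look like the following minimal sketch. The `add_tool`/`remove_tool` names, the registry shape, and the stub tool classes are assumptions for illustration, not SPAgent's actual API.

```python
class ToolRegistry:
    """Minimal sketch of a runtime tool registry (names are illustrative)."""

    def __init__(self, tools=None):
        # Map tool name -> tool instance for O(1) add/remove/lookup.
        self._tools = {t.name: t for t in (tools or [])}

    def add_tool(self, tool):
        # Adding a tool at runtime is a single call: register it by name.
        self._tools[tool.name] = tool

    def remove_tool(self, name):
        # Removing is equally cheap; unknown names are ignored.
        self._tools.pop(name, None)

    def available(self):
        return sorted(self._tools)


class DepthTool:
    name = "depth_estimation"


class SegTool:
    name = "segmentation"


registry = ToolRegistry([DepthTool()])
registry.add_tool(SegTool())
registry.remove_tool("depth_estimation")
print(registry.available())  # ['segmentation']
```

Keeping the registry keyed by name is what makes single-call add/remove possible: the agent only ever sees the current snapshot of `available()`.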

Open Tool Integration

Integrate any tool into the ecosystem. Define a schema, plug it in, and the agent will use it automatically.
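Integration could follow a pattern like this sketch: a tool exposes a JSON-style schema the agent can read, plus a `run()` implementation. The `schema`/`run` shape and the `WeatherTool` example are hypothetical, chosen only to illustrate the define-a-schema-and-plug-it-in flow.

```python
class WeatherTool:
    """Hypothetical custom tool: a declarative schema plus a run() method."""

    name = "weather"
    # JSON-style schema the agent reads to decide when and how to call the tool.
    schema = {
        "name": "weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }

    def run(self, city):
        # Stubbed result; a real tool would query a service or model here.
        return {"city": city, "condition": "clear"}


def describe_tools(tools):
    # An agent can consume every registered tool's schema automatically.
    return [t.schema for t in tools]


print(describe_tools([WeatherTool()])[0]["name"])  # weather
```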

Spatial Reasoning

Purpose-built prompts for 3D spatial understanding. Grounded perception in complex physical environments.

RL Training

Built-in reinforcement learning pipeline. Train agents with tool-calling rewards.
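A tool-calling reward can be as simple as the following sketch, which scores an episode on whether the expected tools were invoked; the trajectory format, the weights, and the answer bonus are assumptions, not SPAgent's actual reward design.

```python
def tool_call_reward(trajectory, expected_tools, answer_correct):
    """Score one episode: credit for calling expected tools, plus an answer bonus.

    trajectory: list of tool names the agent actually called (assumed format).
    """
    called = set(trajectory)
    expected = set(expected_tools)
    # Fraction of expected tools that were actually used.
    coverage = len(called & expected) / len(expected) if expected else 1.0
    # Penalize spurious calls to discourage tool spam.
    spurious = len(called - expected)
    reward = 0.5 * coverage - 0.1 * spurious
    if answer_correct:
        reward += 1.0
    return reward


r = tool_call_reward(["depth_estimation", "segmentation"],
                     ["depth_estimation", "segmentation"], True)
print(round(r, 2))  # 1.5
```

A shaped reward like this (coverage term plus spurious-call penalty) gives the policy gradient signal even on episodes where the final answer is wrong.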

Supported tools

An open-ended ecosystem spanning 2D perception, 3D reconstruction, video generation, and beyond.

2D Perception

Depth Anything V2

High-accuracy monocular depth estimation for dense depth maps from a single image.

:20019

SAM 2

Promptable image and video segmentation with fast, precise masks and tracking.

:20020

Grounding DINO

Open-vocabulary object detection driven by natural-language prompts and referring expressions.

:20022

Moondream

A small, fast vision-language model for captioning, visual Q&A, and lightweight visual reasoning.

:20024

YOLO-E / Supervision

Real-time open-vocabulary detection and segmentation with annotation, tracking, and visualization utilities.

local

3D Reconstruction

Pi3 / Pi3X

3D point cloud reconstruction from single or multiple images, with Pi3X adding smoother metric-scale outputs.

:20030 / :20031

VGGT

Feed-forward multi-view 3D reconstruction with camera pose, depth, and geometry prediction in one pass.

:20032

MapAnything

Universal metric 3D reconstruction for dense point clouds, depth, poses, and multi-view geometry.

:20033

Video Generation

Veo

Cinematic text-to-video and image-to-video generation with audio and strong creative control.

API

Sora

Text-to-video and image-to-video generation for realistic, dynamic scenes with strong prompt fidelity.

API

Quick start

from spagent import SPAgent
from spagent.models import GPTModel
from spagent.tools import DepthEstimationTool, SegmentationTool

# Create model and tools
model = GPTModel(model_name="gpt-4o-mini")
tools = [
    DepthEstimationTool(),
    SegmentationTool()
]

# Create agent and solve
agent = SPAgent(model=model, tools=tools)
result = agent.solve_problem(
    "image.jpg",
    "Analyze depth relationships and main objects"
)
print(result['answer'])

Architecture

SPAgent Core

Agent logic, tool registry, prompt system, data collection

Tools

Modular expert implementations with client/server architecture
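The client/server split might look like this sketch: each expert runs as a service on its own port (e.g. :20019 for depth estimation), and the client only has to build a request. The `/infer` route and the payload fields are assumptions for illustration, not SPAgent's wire format.

```python
import base64
import json


def build_tool_request(port, route, image_bytes, params=None):
    """Construct the URL and JSON payload for a hypothetical tool-server call.

    The '/infer' route and payload fields are illustrative only.
    """
    url = f"http://localhost:{port}/{route.lstrip('/')}"
    payload = {
        # Images travel as base64 so the payload is plain JSON.
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "params": params or {},
    }
    return url, json.dumps(payload)


url, body = build_tool_request(20019, "/infer", b"\x89PNG...")
print(url)  # http://localhost:20019/infer
```

Because each tool lives behind its own port, heavy models can run on separate machines while the agent core stays lightweight.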

Models

A unified interface for leading open-source and closed-source models.

Training

Reinforcement learning with supervised fine-tuning

Research

arXiv 2026

Think3D: Thinking with Space for Spatial Reasoning

Zaibin Zhang*, Yuhan Wu*, Lianjie Jia*, Yifan Wang, Zhongbo Zhang, Yijiang Li, Binghao Ran, Fuxi Zhang, Zhuohan Sun, Zhenfei Yin, Lijun Wang, Huchuan Lu.

* Equal contribution

Teaching agents to think in 3D space like humans, through drag-based spatial interaction.

Institutions

Dalian University of Technology
University of California, San Diego
University of Oxford

Get in touch

dlutzzb@gmail.com

Start building with SPAgent

Open-source and ready to use. Deploy expert tools, connect your model, and reason about the physical world.
