Perception, reasoning, and action in the physical and spatial world, powered by an open-ended multimodal ecosystem of tools spanning 2D, 3D, world models, and beyond.
Mix and match any combination of expert tools. Add or remove tools at runtime with a single function call.
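The idea behind adding and removing tools at runtime can be sketched as a small registry that maps tool names to callables. This is a minimal illustration, not SPAgent's API: `ToolRegistry`, `add_tool`, `remove_tool`, and the tool names are hypothetical placeholders.

```python
from typing import Callable, Dict, List


class ToolRegistry:
    """Hypothetical registry mapping tool names to callables (not SPAgent's API)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., object]] = {}

    def add_tool(self, name: str, fn: Callable[..., object]) -> None:
        # Registering under an existing name replaces the previous tool.
        self._tools[name] = fn

    def remove_tool(self, name: str) -> None:
        # pop() with a default makes removal safe even if the tool is absent.
        self._tools.pop(name, None)

    def names(self) -> List[str]:
        return sorted(self._tools)

    def call(self, name: str, **kwargs: object) -> object:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)


# Usage: register two placeholder tools, then drop one at runtime.
registry = ToolRegistry()
registry.add_tool("depth", lambda image: f"depth map for {image}")
registry.add_tool("segment", lambda image: f"masks for {image}")
registry.remove_tool("segment")
print(registry.names())  # ['depth']
```

Keeping the registry as a plain dictionary means the set of tools the agent can see is just whatever is registered at the moment it builds its prompt.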
Integrate any tool into the ecosystem. Define a schema, plug it in, and the agent will use it automatically.
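A tool schema usually pairs a name and description with a JSON Schema for its arguments, which a tool-calling model reads to decide when and how to invoke it. The sketch below assumes OpenAI-style function calling (the format used by models such as `gpt-4o-mini`); `ToolSpec`, `estimate_depth`, and `image_path` are hypothetical names, not SPAgent's schema format.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class ToolSpec:
    """Hypothetical tool definition: name, description, JSON Schema, and a handler."""
    name: str
    description: str
    parameters: Dict[str, Any]
    handler: Callable[..., Any] = field(repr=False)

    def to_function_schema(self) -> Dict[str, Any]:
        # Entry shape for an OpenAI-style "tools" list in a chat request.
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }

    def invoke(self, arguments_json: str) -> Any:
        # The model returns arguments as a JSON string; check required keys first.
        args = json.loads(arguments_json)
        missing = [k for k in self.parameters.get("required", []) if k not in args]
        if missing:
            raise ValueError(f"missing required arguments: {missing}")
        return self.handler(**args)


# Placeholder handler: a real tool would run a depth model here.
depth_tool = ToolSpec(
    name="estimate_depth",
    description="Estimate a dense depth map from a single image.",
    parameters={
        "type": "object",
        "properties": {"image_path": {"type": "string"}},
        "required": ["image_path"],
    },
    handler=lambda image_path: {"depth_map": image_path + ".depth.png"},
)
```

Because the schema is plain data, the same definition can be sent to the model and used to validate the arguments it sends back.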
Purpose-built prompts for 3D spatial understanding, grounding perception in complex physical environments.
Built-in reinforcement learning pipeline. Train agents with tool-calling rewards.
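One common way to shape a tool-calling reward is to combine answer correctness with a bonus for well-formed tool calls. The function below is a hypothetical sketch of that idea, not SPAgent's reward: the exact-match accuracy term, the JSON `{"name": ...}` call format, and `format_weight=0.2` are all illustrative assumptions.

```python
import json
from typing import List, Set


def tool_calling_reward(
    tool_calls: List[str],
    answer: str,
    reference: str,
    known_tools: Set[str],
    format_weight: float = 0.2,
) -> float:
    """Hypothetical reward: accuracy plus a bonus for valid tool calls."""
    # Accuracy term: exact match after normalizing case and surrounding whitespace.
    correct = answer.strip().lower() == reference.strip().lower()
    accuracy = 1.0 if correct else 0.0

    # Format term: fraction of calls that parse as JSON and name a registered tool.
    if not tool_calls:
        valid_fraction = 0.0
    else:
        valid = 0
        for raw in tool_calls:
            try:
                call = json.loads(raw)
            except json.JSONDecodeError:
                continue  # malformed call earns nothing
            if isinstance(call, dict) and call.get("name") in known_tools:
                valid += 1
        valid_fraction = valid / len(tool_calls)

    return accuracy + format_weight * valid_fraction
```

Weighting the format term below the accuracy term keeps the policy from being rewarded mainly for emitting syntactically valid calls that do not help answer the question.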
An open-ended ecosystem spanning 2D perception, 3D reconstruction, video generation, and beyond.
High-accuracy monocular depth estimation for dense depth maps from a single image.
Promptable image and video segmentation with fast, precise masks and tracking.
Open-vocabulary object detection driven by natural-language prompts and referring expressions.
A small, fast vision-language model for captioning, visual Q&A, and lightweight visual reasoning.
Real-time open-vocabulary detection and segmentation with annotation, tracking, and visualization utilities.
3D point cloud reconstruction from single or multiple images, with Pi3X adding smoother metric-scale outputs.
Feed-forward multi-view 3D reconstruction with camera pose, depth, and geometry prediction in one pass.
Universal metric 3D reconstruction for dense point clouds, depth, poses, and multi-view geometry.
Cinematic text-to-video and image-to-video generation with audio and strong creative control.
Text-to-video and image-to-video generation for realistic, dynamic scenes with strong prompt fidelity.
```python
from spagent import SPAgent
from spagent.models import GPTModel
from spagent.tools import DepthEstimationTool, SegmentationTool

# Create the model and the expert tools
model = GPTModel(model_name="gpt-4o-mini")
tools = [
    DepthEstimationTool(),
    SegmentationTool(),
]

# Create the agent and solve a problem on an image
agent = SPAgent(model=model, tools=tools)
result = agent.solve_problem(
    "image.jpg",
    "Analyze depth relationships and main objects",
)
print(result["answer"])
```
Agent logic, tool registry, prompt system, data collection
Modular expert implementations with client/server architecture
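In a client/server split, a heavy expert model runs behind a network endpoint and the agent calls it with a lightweight client. The sketch below uses only the Python standard library; the `/predict` route, the JSON payload shape, and `fake_depth_model` are hypothetical stand-ins, not SPAgent's actual protocol.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict


def fake_depth_model(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder for an expert model that would live on the server (e.g. a GPU host).
    return {"image": payload["image"], "depth_map": "depth.png"}


class ToolHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(fake_depth_model(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: Any) -> None:
        pass  # silence per-request logging


def start_server() -> HTTPServer:
    # Port 0 asks the OS for a free port; the real one is in server.server_address.
    server = HTTPServer(("127.0.0.1", 0), ToolHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_tool(base_url: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    # Thin client: the agent side only needs a URL and a JSON payload.
    req = urllib.request.Request(
        base_url + "/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())
```

Keeping the model behind an endpoint lets the agent process stay light while each expert is deployed, scaled, or swapped independently.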
Supports leading open-source and closed-source models.
Reinforcement learning and supervised fine-tuning
Open-source and ready to use. Deploy expert tools, connect your model, and reason about the physical world.