
Payload Operation using On-Board Vision Language Model

Ms. Meirav Nevo — AI Engineer
ImageSat International (ISI)
Track: Technology — AI/ML in Satellite Data Missions

Schedule

Poster · Wednesday, May 27, 2026 · 10:00 AM · Posters Area – Kiosk 2

Abstract

The rapid maturation of multimodal vision-language models (VLMs) has significantly expanded the range of artificial-intelligence applications feasible at the edge, including in spaceborne systems. In this work, we present one of the first demonstrations of deploying a compact vision-language model directly on a very-high-resolution Earth-observation satellite payload processor [1], enabling autonomous on-orbit scene understanding and near-real-time decision-making. This represents a paradigm shift in satellite operations, moving from ground-centric processing toward intelligent, self-directed spacecraft.

Conventional Earth-observation missions rely on ground-based image processing and interpretation, which introduces latency, strains downlink bandwidth, and limits operational responsiveness. By embedding a VLM on board the satellite, visual reasoning can be performed in situ, allowing the spacecraft to autonomously interpret imagery, generate semantic descriptions, and prioritize data for downlink based on mission-relevant features without continuous ground intervention [2].
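The tasking loop described above — caption a scene with the on-board VLM, score the caption against mission-relevant features, and fill the downlink budget with the highest-priority scenes — can be sketched in a few lines. All names, keywords, and thresholds below are illustrative assumptions; the paper does not disclose ISI's actual prioritization logic, and the VLM captioning step is represented only by its text output.

```python
# Hypothetical sketch of VLM-driven downlink prioritization.
# Keyword weights, the 0.5 relevance threshold, and scene sizes are
# invented for illustration, not taken from the mission design.

MISSION_KEYWORDS = {"ship": 0.9, "wildfire": 1.0, "flooding": 0.95, "cloud": 0.1}

def relevance(caption: str) -> float:
    """Score a VLM-generated caption against mission-relevant keywords."""
    words = caption.lower().split()
    return max((MISSION_KEYWORDS[w] for w in words if w in MISSION_KEYWORDS),
               default=0.0)

def prioritize(scenes: list[tuple[str, str, float]], budget_mb: float) -> list[str]:
    """Select scene IDs for downlink, highest relevance first, within a
    bandwidth budget. Each scene is (scene_id, vlm_caption, size_mb)."""
    ranked = sorted(scenes, key=lambda s: relevance(s[1]), reverse=True)
    selected, used = [], 0.0
    for scene_id, caption, size_mb in ranked:
        if relevance(caption) > 0.5 and used + size_mb <= budget_mb:
            selected.append(scene_id)
            used += size_mb
    return selected

queue = [
    ("img_001", "dense cloud cover over ocean", 120.0),
    ("img_002", "active wildfire front near settlement", 130.0),
    ("img_003", "cargo ship in open water", 110.0),
]
print(prioritize(queue, budget_mb=250.0))  # → ['img_002', 'img_003']
```

The cloud-covered scene is held back on board rather than consuming downlink bandwidth — the operational benefit the abstract attributes to in-situ visual reasoning.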

To validate this concept, we integrated Google’s Gemma-3n, a lightweight multimodal vision-language model, onto an NVIDIA Jetson Orin-based payload processing system within a proprietary, ruggedized payload computer architecture for very-high-resolution satellites. We demonstrate that 200 TOPS of AI compute performance can be achieved under space-relevant power and thermal constraints. We verified sufficient computational margin to run inference on the Gemma-3n model (~3.8B parameters) while keeping total power consumption within the ~15 W envelope of the constrained on-board resources. Through custom quantization and model-level optimizations [3], end-to-end inference latencies below two seconds were achieved on satellite imagery acquired in orbit.

This work substantiates the feasibility of spaceborne edge multimodal AI beyond single-task CNN pipelines, and provides a practical path toward autonomous constellation operations such as rapid response to emergent events, cooperative inter-satellite tasking, and resilient on-orbit intelligence when ground connectivity is constrained.

Keywords: Vision-Language Models, On-Orbit Processing, Edge AI, Autonomous Satellites, NVIDIA Jetson Orin, Multimodal Intelligence

Authors

  • Mr. Oshri Fatkiev — CV Engineer
    ImageSat International (ISI)
  • Mr. Guy Zaidman — CV Engineer
    ImageSat International (ISI)
  • Dr. Doron Shterman — CTO
    ImageSat International (ISI)