SABER: A Stealthy Agentic Black-Box Attack Framework for Vision-Language-Action Models

1University of Maryland, College Park, 2University of Southern California, 3University of Central Florida
SABER teaser figure

SABER applies stealthy, bounded instruction perturbations through a ReAct-style tool-calling agent to induce targeted behavioral degradation in VLA-driven robots, including task failure, action inflation, and constraint violations.

Abstract

Vision-language-action (VLA) models enable robots to follow natural-language instructions grounded in visual observations, but the instruction channel also introduces a critical vulnerability: small textual perturbations can alter downstream robot behavior. Systematic robustness evaluation therefore requires a black-box attacker that can generate minimal yet effective instruction edits across diverse VLA models.

To this end, we present SABER, an agent-centric approach for automatically generating instruction-based adversarial attacks on VLA models under bounded edit budgets. SABER uses a GRPO-trained ReAct attacker that composes character-, token-, and prompt-level tools to produce small, plausible adversarial instruction edits within the budget, inducing targeted behavioral degradation: task failure, unnecessarily long execution, and increased constraint violations.

On the LIBERO benchmark across six state-of-the-art VLA models, SABER reduces task success by 20.6%, increases action-sequence length by 55%, and raises constraint violations by 33%, while requiring 21.1% fewer tool calls and 54.7% fewer character edits than strong GPT-based baselines. These results show that small, plausible instruction edits are sufficient to substantially degrade robot execution, and that an agentic black-box pipeline offers a practical, scalable, and adaptive approach for red-teaming robotic foundation models.

Method Overview

For each LIBERO task, we maintain two contrastive rollouts under a frozen target VLA. A clean baseline rollout is first executed and cached as a reference. For the attack rollout, the instruction is passed to a red-team agent, which uses an LLM backbone to reason over the instruction and the available tools, then performs multi-turn FIND→APPLY edits in a ReAct-style loop. The perturbation toolbox returns edited instructions based on the selected target positions and their local context. The target VLA then executes the perturbed instruction to produce the attack rollout. The reward function compares the clean and attack rollouts, together with the agent's tool-use traces, and computes rewards from task outcome, action inflation, constraint violations, and stealth signals.
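The clean-vs-attack reward comparison can be sketched as follows. This is a minimal illustration, not the paper's implementation: `Rollout`, `attack_reward`, the character budget, and the weighting are all assumed stand-ins for the actual reward terms.

```python
from dataclasses import dataclass

@dataclass
class Rollout:
    success: bool       # task outcome under the frozen target VLA
    num_actions: int    # length of the executed action sequence
    violations: float   # accumulated constraint-violation score

def attack_reward(clean: Rollout, attack: Rollout,
                  chars_edited: int, char_budget: int = 20,
                  objective: str = "task_failure") -> float:
    """Contrast the cached clean rollout with the attack rollout,
    then subtract a stealth penalty on oversized edits (all terms
    and weights here are illustrative assumptions)."""
    if objective == "task_failure":
        # reward flipping a clean success into a failure
        behavior = float(clean.success and not attack.success)
    elif objective == "action_inflation":
        # reward inflating the action sequence relative to baseline
        behavior = max(0.0, attack.num_actions / max(clean.num_actions, 1) - 1.0)
    else:  # constraint_violation
        behavior = max(0.0, attack.violations - clean.violations)
    # stealth signal: only edits beyond the bounded budget are penalized
    stealth = -max(0, chars_edited - char_budget) / char_budget
    return behavior + stealth
```

In this sketch the stealth term is zero while the edit stays within budget, so small plausible edits are never punished; only the behavioral contrast drives the reward.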

SABER pipeline overview

Perturbation Tool Families

SABER models adversarial instruction generation as multi-turn tool use over three complementary perturbation families, each following a two-stage FIND→APPLY protocol:

 Token-Level

Edit words or subwords. FIND returns a tokenized sequence and a brief prompt for selecting the target token and edit type (replace, remove, add, or attribute swap); APPLY performs the edit using token index(es) and replacement text.
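A minimal sketch of the two-stage protocol for this family. Whitespace tokenization and the helper names are assumptions; the real toolbox uses the agent's tokenizer and richer edit types such as attribute swaps.

```python
def find_tokens(instruction: str) -> list[str]:
    """FIND stage: return the tokenized instruction so the agent
    can choose a target index and an edit type."""
    return instruction.split()

def apply_token_edit(instruction: str, index: int, edit: str,
                     replacement: str = "") -> str:
    """APPLY stage: perform the chosen edit at the given token index."""
    tokens = instruction.split()
    if edit == "replace":
        tokens[index] = replacement
    elif edit == "remove":
        del tokens[index]
    elif edit == "add":
        tokens.insert(index, replacement)
    return " ".join(tokens)
```

For example, an attribute swap on "pick up the black bowl" replaces the token at index 3 with "white", yielding a plausible but behavior-altering instruction.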

 Character-Level

Apply typo-style edits within a word (insertion, deletion, substitution, transposition, case flip). Captures subword and OCR-like perturbations, e.g., pick → plck or mug → rnug.
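The five typo-style edits can be sketched as a single helper; the function name and argument layout are assumptions for illustration.

```python
def apply_char_edit(word: str, edit: str, pos: int, ch: str = "") -> str:
    """Typo-style perturbation within a single word (sketch)."""
    if edit == "insert":
        return word[:pos] + ch + word[pos:]
    if edit == "delete":
        return word[:pos] + word[pos + 1:]
    if edit == "substitute":
        # multi-character ch covers OCR-like confusions, e.g. m -> rn
        return word[:pos] + ch + word[pos + 1:]
    if edit == "transpose":
        return word[:pos] + word[pos + 1] + word[pos] + word[pos + 2:]
    if edit == "case_flip":
        return word[:pos] + word[pos].swapcase() + word[pos + 1:]
    return word
```

Both examples from the text are substitutions: "pick" → "plck" swaps one letter, while "mug" → "rnug" replaces "m" with the visually similar "rn".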

 Prompt-Level

Inject clauses or sentences, such as verification wraps, decomposition steps, uncertainty clauses, extra constraints, or objective injections. APPLY inserts the clause under a maximum added-token budget.
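A minimal sketch of budget-checked clause insertion; the helper name and whitespace token counting are assumptions, and the real APPLY stage likely supports more insertion positions.

```python
def apply_prompt_injection(instruction: str, clause: str,
                           position: str = "suffix",
                           max_added_tokens: int = 12) -> str:
    """Insert a clause (verification wrap, extra constraint, ...)
    only if it fits within the maximum added-token budget."""
    if len(clause.split()) > max_added_tokens:
        return instruction  # over budget: leave the instruction unchanged
    if position == "prefix":
        return f"{clause} {instruction}"
    return f"{instruction} {clause}"
```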

Two-Stage Training Procedure

We cold-start by caching clean baseline rollouts from target VLAs and collecting initial attack trajectories with a frozen red-team agent via lightweight random exploration over tool-calling chains. These rollouts form the cold-start dataset for SFT before GRPO training. We then perform agentic RL in interactive scenarios, where the red-team agent attacks target VLAs through tool calling and learns from reward feedback computed by comparing clean and attack rollouts, together with the agent's tool-use traces, under different attack objectives.
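The cold-start exploration over tool-calling chains might look like the following sketch; the tool names and sampling scheme are hypothetical, chosen only to show how random FIND→APPLY chains could seed the SFT dataset.

```python
import random

# hypothetical tool names for the three perturbation families
TOOLS = ["char_find", "char_apply", "token_find", "token_apply",
         "prompt_find", "prompt_apply"]

def random_tool_chain(max_calls: int = 4, seed: int = 0) -> list[str]:
    """Lightweight random exploration: sample a short chain of
    two-stage FIND->APPLY calls; chains whose rollouts degrade the
    target VLA would be kept as cold-start SFT trajectories."""
    rng = random.Random(seed)
    chain = []
    for _ in range(rng.randint(1, max_calls)):
        family = rng.choice(["char", "token", "prompt"])
        chain += [f"{family}_find", f"{family}_apply"]  # two-stage protocol
    return chain
```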

SABER two-stage training

Pretrained Checkpoints

We release the GRPO-trained LoRA adapters for all three attack objectives on HuggingFace. Each is a LoRA adapter (rank 8, ~75 MB) loadable with peft.

| Objective | HuggingFace | Base Model |
|---|---|---|
| task_failure | IntelligenceLab/saber-attack-agent-task-failure | Qwen/Qwen2.5-3B-Instruct |
| action_inflation | IntelligenceLab/saber-attack-agent-action-inflation | Qwen/Qwen2.5-3B-Instruct |
| constraint_violation | IntelligenceLab/saber-attack-agent-constraint-violation | Qwen/Qwen2.5-3B-Instruct |
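Loading one of the released adapters with peft might look like this sketch. The repo IDs and base model come from the table above; the helper names are ours, and `load_attacker` keeps its heavy imports local so the mapping helper stays dependency-free (calling it downloads the ~3B base model).

```python
def adapter_repo(objective: str) -> str:
    """Map an attack objective to its released LoRA adapter repo."""
    repos = {
        "task_failure": "IntelligenceLab/saber-attack-agent-task-failure",
        "action_inflation": "IntelligenceLab/saber-attack-agent-action-inflation",
        "constraint_violation": "IntelligenceLab/saber-attack-agent-constraint-violation",
    }
    return repos[objective]

def load_attacker(objective: str):
    """Attach the rank-8 LoRA adapter to the base model with peft."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel
    base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
    model = PeftModel.from_pretrained(base, adapter_repo(objective))
    return model, tokenizer
```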

Key Highlights

20.6%

Task Success Reduction

55%

Action Sequence Inflation

33%

Constraint Violation Increase

54.7%

Fewer Character Edits

Evaluated on the LIBERO manipulation benchmark across six state-of-the-art VLA models: π0-LIBERO, π0.5, GR00T-N1.5, X-VLA, InternVLA-M1, and DeepThinkVLA. SABER achieves stronger behavior-level attacks at lower cost than GPT-based baselines, requiring 21.1% fewer tool calls and 54.7% fewer character edits.

Experimental Results

Task Failure (ASR)

Attack success rate for task failure, computed as Base TER − Attack TER (%).

Each cell reports Base TER↑ / Atk TER↓ / ASR↑ (%).

| Victim VLA | LIBERO-Spatial | LIBERO-Object | LIBERO-Goal | LIBERO-Long | Overall |
|---|---|---|---|---|---|
| π0-LIBERO | 100.0 / 86.7 / 13.3 | 100.0 / 80.0 / 20.0 | 100.0 / 53.3 / 46.7 | 66.7 / 40.0 / 26.7 | 91.7 / 65.0 / 26.7 |
| π0.5 | 100.0 / 93.3 / 6.7 | 93.3 / 93.3 / 0.0 | 100.0 / 53.3 / 46.7 | 93.3 / 80.0 / 13.3 | 96.7 / 80.0 / 16.7 |
| GR00T-N1.5 | 100.0 / 93.3 / 6.7 | 100.0 / 100.0 / 0.0 | 100.0 / 53.3 / 46.7 | 93.3 / 86.7 / 6.6 | 98.3 / 83.3 / 15.0 |
| X-VLA | 93.3 / 80.0 / 13.3 | 93.3 / 73.3 / 20.0 | 100.0 / 66.7 / 33.3 | 60.0 / 46.7 / 13.3 | 86.7 / 66.7 / 20.0 |
| InternVLA-M1 | 93.3 / 86.7 / 6.6 | 100.0 / 93.3 / 6.7 | 100.0 / 46.7 / 53.3 | 86.7 / 73.3 / 13.4 | 95.0 / 75.0 / 20.0 |
| DeepThinkVLA | 86.7 / 80.0 / 6.7 | 100.0 / 93.3 / 6.7 | 100.0 / 33.3 / 66.7 | 93.3 / 73.3 / 20.0 | 95.0 / 70.0 / 25.0 |
| Average | 95.6 / 86.7 / 8.9 | 97.8 / 88.9 / 8.9 | 100.0 / 51.1 / 48.9 | 82.2 / 66.7 / 15.5 | 93.9 / 73.3 / 20.6 |

Action Inflation (AIR)

Action inflation ratio Δ|a| = |a_attack| / |a_base|.

Each cell reports Base |a| / Atk |a| / AIR↑.

| Victim VLA | LIBERO-Spatial | LIBERO-Object | LIBERO-Goal | LIBERO-Long | Overall |
|---|---|---|---|---|---|
| π0-LIBERO | 119.3 / 220.7 / 1.85 | 139.2 / 233.9 / 1.68 | 101.3 / 230.0 / 2.27 | 363.0 / 457.4 / 1.26 | 180.7 / 285.5 / 1.58 |
| π0.5 | 112.2 / 173.9 / 1.55 | 151.1 / 226.7 / 1.50 | 105.1 / 196.5 / 1.87 | 346.5 / 391.5 / 1.13 | 178.7 / 247.2 / 1.38 |
| GR00T-N1.5 | 133.7 / 514.7 / 3.85 | 129.7 / 220.5 / 1.70 | 98.3 / 378.5 / 3.85 | 343.5 / 346.9 / 1.01 | 176.3 / 365.2 / 2.07 |
| X-VLA | 157.8 / 189.4 / 1.20 | 189.7 / 187.8 / 0.99 | 126.5 / 261.9 / 2.07 | 431.3 / 504.6 / 1.17 | 226.3 / 285.9 / 1.26 |
| InternVLA-M1 | 114.3 / 192.0 / 1.68 | 143.8 / 204.2 / 1.42 | 95.1 / 255.8 / 2.69 | 320.9 / 327.3 / 1.02 | 168.5 / 244.8 / 1.45 |
| DeepThinkVLA | 125.0 / 197.5 / 1.58 | 137.4 / 186.9 / 1.36 | 98.1 / 255.1 / 2.60 | 326.3 / 421.9 / 1.29 | 171.7 / 265.4 / 1.55 |
| Average | 127.0 / 248.0 / 1.95 | 148.5 / 210.0 / 1.44 | 104.1 / 263.0 / 2.56 | 355.2 / 408.3 / 1.15 | 183.7 / 282.3 / 1.55 |

Constraint Violation (CVI)

Constraint violation inflation ΔCV = CV_attack / CV_base.

Each cell reports Base CV / Atk CV / CVI↑.

| Victim VLA | LIBERO-Spatial | LIBERO-Object | LIBERO-Goal | LIBERO-Long | Overall |
|---|---|---|---|---|---|
| π0-LIBERO | 550.7 / 326.2 / 0.59 | 711.5 / 838.3 / 1.18 | 309.6 / 624.5 / 2.02 | 1039.6 / 1269.1 / 1.22 | 652.9 / 764.5 / 1.17 |
| π0.5 | 570.9 / 549.8 / 0.96 | 681.4 / 699.0 / 1.03 | 260.3 / 439.1 / 1.69 | 863.9 / 1336.2 / 1.55 | 594.1 / 756.0 / 1.27 |
| GR00T-N1.5 | 599.9 / 702.7 / 1.17 | 644.1 / 595.7 / 0.92 | 258.9 / 862.3 / 3.33 | 918.8 / 778.1 / 0.85 | 605.4 / 734.7 / 1.21 |
| X-VLA | 838.3 / 1356.3 / 1.62 | 828.3 / 725.7 / 0.88 | 347.1 / 885.0 / 2.55 | 1145.3 / 1171.0 / 1.02 | 789.8 / 1034.5 / 1.31 |
| InternVLA-M1 | 475.9 / 130.3 / 0.27 | 639.3 / 493.0 / 0.77 | 232.9 / 495.3 / 2.13 | 681.9 / 1550.3 / 2.27 | 507.5 / 667.2 / 1.31 |
| DeepThinkVLA | 572.4 / 759.7 / 1.33 | 607.9 / 729.0 / 1.20 | 220.2 / 915.7 / 4.16 | 827.1 / 1509.7 / 1.83 | 556.9 / 978.5 / 1.76 |
| Average | 601.4 / 637.5 / 1.06 | 685.4 / 680.1 / 0.99 | 271.5 / 703.7 / 2.59 | 912.8 / 1269.1 / 1.39 | 617.8 / 822.6 / 1.33 |
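The three behavior-level metrics reduce to simple computations over paired clean/attack statistics; this sketch uses assumed helper names for the definitions given above.

```python
def asr(base_ter: float, atk_ter: float) -> float:
    """Attack success rate: drop in task execution rate, in points."""
    return base_ter - atk_ter

def air(base_len: float, atk_len: float) -> float:
    """Action inflation ratio: attack / clean action-sequence length."""
    return atk_len / base_len

def cvi(base_cv: float, atk_cv: float) -> float:
    """Constraint violation inflation: attack / clean violation score."""
    return atk_cv / base_cv
```

For instance, the overall ASR row follows directly from the averaged rates: asr(93.9, 73.3) gives the reported 20.6 points.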

Comparison and Ablation

Compared with a frozen GPT-5 mini attacker using the same tool-calling interface, SABER achieves consistent gains on objective-aligned metrics while being substantially more efficient and stealthy. RL training not only improves attack effectiveness but also discovers higher-leverage perturbation strategies.

SABER vs. GPT-5 mini

| Method | Tool Calls↓ | Char Edits↓ | ASR↑ | AIR↑ | CVI↑ |
|---|---|---|---|---|---|
| GPT-5 mini | 3.93 | 168.81 | 14.5 | 1.37 | 1.25 |
| SABER | 3.10 | 76.46 | 16.7 | 1.38 | 1.27 |

Cold-Start Ablation

| Training | Tool Calls↓ | Char Edits↓ | Base TER | Atk TER | ASR↑ |
|---|---|---|---|---|---|
| GRPO Only | 2.76 | 11.78 | 96.7 | 88.0 | 8.7 |
| SFT + GRPO | 3.10 | 76.46 | 96.7 | 80.0 | 16.7 |

Tool Usage by Objective

Mean tool calls and character edits per episode across LIBERO suites.

Each cell reports Calls / Edits.

| Objective | Spatial | Object | Goal | Long | Overall |
|---|---|---|---|---|---|
| Task Failure | 2.97 / 13.4 | 3.34 / 13.2 | 2.80 / 10.3 | 2.97 / 15.1 | 3.02 / 13.0 |
| Action Inflation | 2.98 / 126.7 | 3.26 / 114.0 | 3.36 / 130.3 | 3.05 / 117.3 | 3.16 / 122.1 |
| Constraint Violation | 3.34 / 89.3 | 2.34 / 65.0 | 1.67 / 50.0 | 2.34 / 51.7 | 2.42 / 64.0 |

Contributions

  1. Problem formulation: We identify the need for a general-purpose automated attacker for VLA systems and formulate instruction-only black-box attacks as a constrained optimization problem over robot behavioral objectives under bounded edit budgets.
  2. Agentic attack methodology: We propose SABER, where a single GRPO-trained ReAct agent adaptively composes character-, token-, and prompt-level perturbations without gradient access to the target model or model-specific redesign.
  3. Comprehensive evaluation: We evaluate on the LIBERO manipulation benchmark across six state-of-the-art VLA models and three attack objectives, showing average degradation of 20.6% in task success, a 55% increase in action-sequence length, and a 33% increase in constraint violations.
  4. Efficiency over baselines: Compared with GPT-5 mini baselines, SABER achieves stronger behavior-level attacks at lower cost, requiring 21.1% fewer tool calls and 54.7% fewer character edits.

BibTeX

@article{wu2026saber,
  title={SABER: A Stealthy Agentic Black-Box Attack Framework for Vision-Language-Action Models},
  author={Wu, Xiyang and Shi, Guangyao and Wang, Qingzi and Li, Zongxia and Bedi, Amrit Singh and Manocha, Dinesh},
  journal={arXiv preprint arXiv:2603.24935},
  year={2026}
}