We propose a co-evolving multi-agent framework for long-horizon video-game decision-making via unsupervised trajectory decomposition and skill-bank refinement. The framework has two components:
(a) Decision Agent A_D: an LLM-based agent that interacts with the game through primitive actions and skill retrieval. At each step, it summarizes the current state, retrieves relevant skill candidates from the skill bank, updates its intention, selects or switches skills when needed, and executes an action.
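The per-step loop of the decision agent can be sketched as follows. This is a minimal illustration, not the actual implementation: the `Skill`, `SkillBank`, and `decision_step` names are hypothetical, the keyword-overlap retrieval stands in for whatever retriever the framework uses, and `llm` is a placeholder for the LLM call.

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    protocol: str  # step-by-step procedure the skill encodes (hypothetical schema)

@dataclass
class SkillBank:
    skills: list = field(default_factory=list)

    def retrieve(self, summary: str, k: int = 3):
        # Toy retrieval: score skills by word overlap with the state summary.
        scored = [(sum(w in s.protocol for w in summary.split()), s) for s in self.skills]
        scored.sort(key=lambda t: t[0], reverse=True)
        return [s for _, s in scored[:k]]

def decision_step(state, bank, llm):
    """One decision-agent step: summarize, retrieve, update intention, act."""
    summary = llm(f"Summarize state: {state}")
    candidates = bank.retrieve(summary)            # skill retrieval
    intention = llm(f"Update intention given {summary}")
    skill = candidates[0] if candidates else None  # select or switch skill
    action = llm(f"Act with skill {skill.name if skill else 'primitive'}: {intention}")
    return action, skill
```

In this sketch the skill-selection rule is a trivial top-1 choice; the point is only the order of operations named in the text (summarize, retrieve, update intention, select, execute).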
(b) Skill Bank Agent A_S: an LLM-based pipeline that converts unlabeled trajectories into reusable protocol-based skills and learns compact effect contracts for them. It updates the skill bank by proposing new skill candidates, refining low-quality skills, and revising skill protocols over time.
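One simple way to read "unsupervised trajectory decomposition with effect contracts" is to segment a trajectory at observable state changes and record each segment's state delta as its contract. The sketch below assumes this interpretation; the `propose_skills` name, the dict-based state representation, and the change-point segmentation rule are all illustrative assumptions, not the paper's actual pipeline.

```python
def propose_skills(trajectory, llm):
    """Decompose a trajectory (list of {'action', 'state'} dicts) into candidate
    skills, each with a protocol (action list) and an effect contract (state delta).
    Hypothetical sketch: segments are cut wherever the state dict changes."""
    segments, start = [], 0
    for i in range(1, len(trajectory)):
        if trajectory[i]["state"] != trajectory[i - 1]["state"]:
            segments.append(trajectory[start:i + 1])
            start = i
    skills = []
    for seg in segments:
        # Effect contract: net state change produced by the segment.
        delta = {k: seg[-1]["state"].get(k, 0) - seg[0]["state"].get(k, 0)
                 for k in seg[-1]["state"]}
        name = llm(f"Name this behavior: {[s['action'] for s in seg]}")
        skills.append({"name": name,
                       "protocol": [s["action"] for s in seg],
                       "effect": {k: v for k, v in delta.items() if v}})
    return skills
```

Refinement and protocol revision would then amount to re-running this proposal step on new trajectories and merging or replacing low-quality entries, which is not shown here.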
In this loop, the decision agent generates trajectories and the skill bank agent transforms them into structured skills that inform future decisions through skill retrieval and selection. Both agents are updated via GRPO (Group Relative Policy Optimization) using separate LoRA adapters in a unified co-evolution loop.
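The co-evolution loop can be outlined as below. The GRPO advantage computation (group-normalized rewards, no learned critic) follows the standard GRPO formulation; everything else, including the `co_evolution_round` name and the `rollout` / `update_bank` / `grpo_step` interfaces, is a hypothetical scaffold to show how the two agents alternate within one round.

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO: each rollout's reward is normalized
    by the mean and std of its sampling group (no value network)."""
    mu = statistics.mean(rewards)
    sd = statistics.pstdev(rewards) or 1.0  # guard against zero-variance groups
    return [(r - mu) / sd for r in rewards]

def co_evolution_round(decision_agent, skill_agent, env, group_size=4):
    # 1) Decision agent collects a group of trajectories.
    trajs = [decision_agent.rollout(env) for _ in range(group_size)]
    # 2) Skill bank agent distills them into skills and refines the bank.
    skill_agent.update_bank(trajs)
    # 3) Each agent's own LoRA adapter is updated via GRPO; the base LLM
    #    weights stay frozen and shared between the two adapters.
    adv = grpo_advantages([t.reward for t in trajs])
    decision_agent.grpo_step(trajs, adv)
    skill_agent.grpo_step(trajs)
```

Keeping one LoRA adapter per agent lets both roles share a frozen base model while receiving role-specific policy-gradient updates, which is what makes the single co-evolution loop tractable.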