Agents Application

Agent systems, tool use, memory, AI research workflows, and reusable skill ecosystems.

Research category

13 Papers

24 Resource links

2026.05 Latest month

1 paper

Tool Use

2025.12 Tool Use

Thinking with Programming Vision: Towards a Unified View for Thinking with Images

This paper identifies brittleness in current multimodal tool-using reasoning under simple image orientation changes and corruptions, and proposes CodeVision, a code-as-tool framework that lets models invoke arbitrary image operations through generated code. It combines SFT and RL with dense process rewards to improve multi-tool reasoning, execution efficiency, and error recovery on thinking-with-images tasks.

Paper Code

3 papers

AI Research

2026.05 AI Research

AI for Auto-Research: Roadmap & User Guide

This survey analyzes AI-assisted research across creation, writing, validation, and dissemination, showing where automation is reliable and where autonomy still fails on novelty, experiments, and scientific judgment. It provides a lifecycle taxonomy, benchmark suite, tool inventory, design principles, and practitioner playbook for human-governed AI research workflows.

Paper Project Code

2026.05 AI Research

Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs

This paper proposes Crafter, a multi-agent harness for generating publication-style scientific figures across multiple figure types and input conditions, and CraftEditor for converting raster outputs into editable SVGs. It also introduces CraftBench, a human-annotated benchmark for scientific figure generation, and shows gains over standalone generators and agentic baselines.

Paper Code Hugging Face

2026.03 AI Research

AIRA_2: Overcoming Bottlenecks in AI Research Agents

This paper introduces AIRA_2, an AI research agent architecture that addresses limited experiment throughput, noisy validation-based selection, and static single-turn operators. It combines asynchronous multi-GPU workers, Hidden Consistent Evaluation, and interactive ReAct agents to improve long-horizon research task performance.

Paper

3 papers

Agent Skills

2026.05 Agent Skills

SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution

SkillsVote addresses noisy and hard-to-govern agent trajectories by treating Agent Skills as reusable experience artifacts with collection, recommendation, attribution, and evolution controls. It profiles large-scale open-source skill corpora, recommends structured skill context before execution, and admits only evidence-gated successful discoveries to improve frozen agents without model updates.

Paper Project Code

2026.04 Agent Skills

From Context to Skills: Can Language Models Learn from Context Skillfully?

Ctx2Skill addresses context learning for long, dense contexts where manual skill annotation is costly and automated skill construction lacks external feedback. It uses a multi-agent self-play loop with Cross-time Replay to autonomously discover, refine, and select reusable natural-language skills that improve solving rates across language models.

Paper Code Hugging Face

2026.03 Agent Skills

SkillReducer: Optimizing LLM Agent Skills for Token Efficiency

This paper presents SkillReducer, a two-stage optimization framework for LLM agent skills (pre-packaged instruction sets). It compresses skill descriptions by 48% and skill bodies by 39% while improving functional quality by 2.8%, reducing token costs and attention dilution in agent contexts.

Paper

2 papers

Agent Development

2026.03 Agent Development

Nurture-First Agent Development: Building Domain-Expert AI Agents Through Conversational Knowledge Crystallization

This paper proposes Nurture-First Development, a paradigm for growing domain-expert agents through structured conversational interaction rather than fixed code-first or prompt-first construction. It formalizes a Knowledge Crystallization Cycle, Three-Layer Cognitive Architecture, Dual-Workspace Pattern, and Spiral Development Model for continuously converting tacit practitioner knowledge into reusable agent assets.

Paper

2026.01 Agent Development

Controlled Self-Evolution for Algorithmic Code Optimization

This paper proposes EvoControl, a controlled self-evolution framework for algorithmic code optimization that balances correctness with exploration across generate-verify-refine cycles. It uses staged self-evolution, genetic-style population search, and evolutionary memory to improve code quality on challenging algorithmic benchmarks.

Paper Code

4 papers

Memory

2026.03 Memory

Trajectory-Informed Memory Generation for Self-Improving Agent Systems

This paper presents a framework that extracts actionable learnings from LLM agent execution trajectories and retrieves them as contextual memory for future tasks. It combines trajectory intelligence extraction, decision attribution, contextual learning generation, and adaptive memory retrieval to improve AppWorld task completion, especially on complex scenarios.

Paper

2025.10 Memory

Beyond a Million Tokens: Benchmarking and Enhancing Long-Term Memory in LLMs

This paper introduces BEAM, a benchmark of long, coherent conversations and probing questions for evaluating long-term memory in LLMs, and proposes LIGHT, a memory framework with episodic memory, working memory, and a scratchpad. Together, they expose long-context memory limitations and improve performance on long-horizon conversational reasoning tasks.

Paper

2025.08 Memory

MLP Memory: A Retriever-Pretrained Memory for Large Language Models

This paper introduces MLP Memory, a lightweight parametric module that learns to internalize retrieval patterns by pretraining an MLP to imitate a kNN retriever's behavior, bridging the gap between RAG and fine-tuning approaches.

Paper

2025.07 Memory

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent

This paper introduces MemAgent, a multi-conversation RL-based memory agent that processes arbitrarily long documents with linear complexity and without performance degradation during length extrapolation.

Paper Project