2026.05
AI Research
AI for Auto-Research: Roadmap & User Guide
This survey analyzes AI-assisted research across creation, writing, validation, and dissemination, showing where automation is reliable and where autonomy still fails on novelty, experiments, and scientific judgment. It provides a lifecycle taxonomy, benchmark suite, tool inventory, design principles, and practitioner playbook for human-governed AI research workflows.
2026.05
AI Research
Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs
This paper proposes Crafter, a multi-agent harness for generating publication-style scientific figures across multiple figure types and input conditions, and CraftEditor for converting raster outputs into editable SVGs. It also introduces CraftBench, a human-annotated benchmark for scientific figure generation, and shows gains over standalone generators and agentic baselines.
2026.03
AI Research
AIRA_2: Overcoming Bottlenecks in AI Research Agents
This paper introduces AIRA_2, an AI research agent architecture that addresses limited experiment throughput, noisy validation-based selection, and static single-turn operators. It combines asynchronous multi-GPU workers, Hidden Consistent Evaluation, and interactive ReAct agents to improve long-horizon research task performance.
2026.05
Agent Skills
SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution
SkillsVote addresses noisy and hard-to-govern agent trajectories by treating Agent Skills as reusable experience artifacts with collection, recommendation, attribution, and evolution controls. It profiles large-scale open-source skill corpora, recommends structured skill context before execution, and admits only evidence-gated successful discoveries to improve frozen agents without model updates.
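One way to read "evidence-gated" admission is that a newly discovered skill enters the shared library only after it has succeeded often enough across tasks. Because the agent stays frozen, only the library it reads from changes. The sketch below assumes a simple rule: a minimum number of trials plus a minimum success rate. The thresholds, the `CandidateSkill` type, and the `admit` gate are hypothetical illustrations, not SkillsVote's actual criteria.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CandidateSkill:
    name: str
    # Hypothetical evidence record: True = a task succeeded while using this skill.
    outcomes: List[bool] = field(default_factory=list)

def admit(skill: CandidateSkill, min_trials: int = 3, min_success_rate: float = 0.8) -> bool:
    """Assumed gate: admit only with enough trials and a high enough success rate."""
    if len(skill.outcomes) < min_trials:
        return False  # not enough evidence yet
    rate = sum(skill.outcomes) / len(skill.outcomes)
    return rate >= min_success_rate

# The frozen agent is never updated; only this skill library grows.
library: List[str] = []
candidates = [
    CandidateSkill("parse-csv", [True, True, True, True]),
    CandidateSkill("retry-on-timeout", [True, False, True]),
    CandidateSkill("new-untested", [True]),
]
for cand in candidates:
    if admit(cand):
        library.append(cand.name)
print(library)
```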
2026.04
Agent Skills
From Context to Skills: Can Language Models Learn from Context Skillfully?
Ctx2Skill addresses context learning for long, dense contexts where manual skill annotation is costly and automated skill construction lacks external feedback. It uses a multi-agent self-play loop with Cross-time Replay to autonomously discover, refine, and select reusable natural-language skills that improve solving rates across language models.
2026.03
Agent Skills
SkillReducer: Optimizing LLM Agent Skills for Token Efficiency
This paper presents SkillReducer, a two-stage optimization framework that compresses LLM agent skills (pre-packaged instruction sets), shortening skill descriptions by 48% and skill bodies by 39% while improving functional quality by 2.8%. This reduces token costs and attention dilution in agent contexts.
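To make the reported reductions concrete, the sketch below applies them to a single skill. The 200-token description and 1,500-token body are hypothetical sizes chosen for illustration, not figures from the paper, and it assumes the percentages are per-skill reductions in token count.

```python
def reduced_tokens(original: int, reduction: float) -> int:
    """Tokens remaining after a fractional reduction (e.g. 0.48 for 48%)."""
    return round(original * (1.0 - reduction))

# Hypothetical skill sizes (illustrative only, not from the paper).
description_tokens = 200
body_tokens = 1500

new_description = reduced_tokens(description_tokens, 0.48)  # 48% shorter description
new_body = reduced_tokens(body_tokens, 0.39)                # 39% shorter body

saved = (description_tokens + body_tokens) - (new_description + new_body)
print(new_description, new_body, saved)
```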
2026.03
Agent Development
Nurture-First Agent Development: Building Domain-Expert AI Agents Through Conversational Knowledge Crystallization
This paper proposes Nurture-First Development, a paradigm for growing domain-expert agents through structured conversational interaction rather than fixed code-first or prompt-first construction. It formalizes a Knowledge Crystallization Cycle, Three-Layer Cognitive Architecture, Dual-Workspace Pattern, and Spiral Development Model for continuously converting tacit practitioner knowledge into reusable agent assets.
2026.01
Agent Development
Controlled Self-Evolution for Algorithmic Code Optimization
This paper proposes EvoControl, a controlled self-evolution framework for algorithmic code optimization that balances correctness with exploration across generate-verify-refine cycles. It uses staged self-evolution, genetic-style population search, and evolutionary memory to improve code quality on challenging algorithmic benchmarks.
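The core idea of controlled self-evolution is to keep exploring new candidates without ever letting an incorrect one displace a correct one. This can be illustrated with a toy population search. In this sketch the "programs" are plain integers. `verify` and `cost` are hypothetical stand-ins for a test suite and a runtime measurement, and the memory simply records candidates that have already been evaluated. It is a generic generate-verify-refine loop under these assumptions, not EvoControl itself.

```python
import random

def verify(candidate: int) -> bool:
    """Stand-in for running tests: only multiples of 7 count as 'correct'."""
    return candidate % 7 == 0

def cost(candidate: int) -> int:
    """Stand-in for measured runtime: lower is better."""
    return abs(candidate - 50)

def evolve(seed_population, generations=30, population_size=4, seed=0):
    rng = random.Random(seed)
    # Start only from candidates that already pass verification.
    population = [c for c in seed_population if verify(c)]
    memory = set(population)  # evolutionary memory: candidates already evaluated
    for _ in range(generations):
        parent = rng.choice(population)
        # Generate: mutate a correct parent; some mutations break correctness.
        child = parent + rng.choice([-7, -3, 3, 7, 10])
        if child in memory:
            continue  # skip re-evaluating known candidates
        memory.add(child)
        if verify(child):  # verify: incorrect children never enter the population
            population.append(child)
            # Refine: keep only the cheapest correct candidates.
            population = sorted(population, key=cost)[:population_size]
    return min(population, key=cost)

best = evolve([0, 7, 100])
print(best, cost(best))
```

Gating on correctness before ranking by cost is the "controlled" part: exploration can only trade among correct solutions.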
2026.03
Memory
Trajectory-Informed Memory Generation for Self-Improving Agent Systems
This paper presents a framework that extracts actionable learnings from LLM agent execution trajectories and retrieves them as contextual memory for future tasks. It combines trajectory intelligence extraction, decision attribution, contextual learning generation, and adaptive memory retrieval to improve task completion on AppWorld, especially in complex scenarios.
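The retrieval step can be pictured as matching a new task against lessons stored from past trajectories. The sketch below uses Jaccard overlap between task words and hand-written keywords as a simple stand-in for whatever retrieval the paper actually uses. The `Learning` record, the `retrieve` function, and the example lessons are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set

@dataclass
class Learning:
    text: str           # actionable lesson extracted from a past trajectory
    keywords: Set[str]  # terms describing when the lesson applies

def tokenize(text: str) -> Set[str]:
    """Lowercased words with trailing punctuation removed."""
    return {word.strip(".,").lower() for word in text.split()}

def retrieve(task: str, memory: List[Learning], k: int = 2) -> List[str]:
    """Return up to k learnings whose keywords overlap most with the task (Jaccard)."""
    task_terms = tokenize(task)

    def score(item: Learning) -> float:
        union = task_terms | item.keywords
        return len(task_terms & item.keywords) / len(union) if union else 0.0

    ranked = sorted(memory, key=score, reverse=True)
    return [item.text for item in ranked[:k] if score(item) > 0]

memory = [
    Learning("Log in before calling the payments API.", {"payments", "login", "api"}),
    Learning("Paginate when listing contacts.", {"contacts", "list", "paginate"}),
    Learning("Confirm the recipient before sending email.", {"email", "send", "recipient"}),
]
task = "Send an email to every contact in the list"
print(retrieve(task, memory))
```

An embedding-based similarity would replace `score` in a realistic system. The structure (score every stored learning, inject the top-k into the prompt) stays the same.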
2025.10
Memory
Beyond a Million Tokens: Benchmarking and Enhancing Long-Term Memory in LLMs
This paper introduces BEAM, a benchmark of long, coherent conversations and probing questions for evaluating long-term memory in LLMs, and proposes LIGHT, a memory framework with episodic memory, working memory, and a scratchpad. Together, they expose long-context memory limitations and improve performance on long-horizon conversational reasoning tasks.
2025.08
Memory
MLP Memory: A Retriever-Pretrained Memory for Large Language Models
This paper introduces MLP Memory, a lightweight parametric module that learns to internalize retrieval patterns by pretraining an MLP to imitate a kNN retriever's behavior, bridging the gap between RAG and fine-tuning approaches.
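MLP Memory is trained to imitate a kNN retriever. As a hedged sketch, assume a kNN-LM-style retriever in which each datastore entry pairs a context key vector with the token that followed it. The snippet below computes the next-token distribution such a retriever produces and the KL-divergence loss an MLP's output could be trained to minimize against it. The toy datastore, the temperature, and the function names are hypothetical, and the paper's exact objective may differ.

```python
import math
from collections import defaultdict

def knn_distribution(query, datastore, k=2, temperature=1.0):
    """kNN-LM-style next-token distribution: softmax over negative squared
    distances of the k nearest keys, aggregated by their next token."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    neighbors = sorted(datastore, key=lambda entry: sq_dist(query, entry[0]))[:k]
    weights = [math.exp(-sq_dist(query, key) / temperature) for key, _ in neighbors]
    total = sum(weights)
    probs = defaultdict(float)
    for (_, token), w in zip(neighbors, weights):
        probs[token] += w / total
    return dict(probs)

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted); an imitating MLP would be trained to minimize this."""
    return sum(p * math.log(p / max(predicted.get(tok, 0.0), eps))
               for tok, p in target.items() if p > 0)

# Hypothetical datastore of (context key, next token) pairs.
datastore = [((0.0, 0.0), "cat"), ((0.1, 0.0), "cat"), ((5.0, 5.0), "dog")]
target = knn_distribution((0.0, 0.0), datastore, k=2)
print(target)  # both nearest neighbours vote for "cat"
print(kl_divergence(target, {"cat": 0.5, "dog": 0.5}))
```

Once trained, the MLP answers in a single forward pass, so the datastore search is not needed at inference time.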
2025.07
Memory
MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent
This paper introduces MemAgent, a multi-conversation, RL-trained memory agent that tackles the challenge of processing arbitrarily long documents with linear complexity and without performance degradation during length extrapolation.
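Linear complexity follows from reading the document in fixed-size chunks while carrying forward a bounded memory. Because each step's context is at most memory size plus chunk size, total work grows linearly with document length. The sketch below illustrates only that loop. `update_memory` is a hypothetical stand-in for the LLM call that rewrites the memory (here it just keeps the newest tokens), and the sizes are illustrative. In MemAgent the memory update is learned with RL, which this sketch does not model.

```python
from typing import List, Tuple

CHUNK_SIZE = 4   # tokens read per step (illustrative)
MEMORY_SIZE = 3  # maximum tokens kept in memory (illustrative)

def update_memory(memory: List[str], chunk: List[str]) -> List[str]:
    """Hypothetical stand-in for the LLM's memory-overwrite step.
    A real agent decides what to keep; here we keep the newest tokens."""
    return (memory + chunk)[-MEMORY_SIZE:]

def read_document(tokens: List[str]) -> Tuple[List[str], List[int]]:
    """Stream the document chunk by chunk with a bounded memory.
    Returns the final memory and the context length seen at each step."""
    memory: List[str] = []
    context_sizes: List[int] = []
    for start in range(0, len(tokens), CHUNK_SIZE):
        chunk = tokens[start:start + CHUNK_SIZE]
        # Per-step context never exceeds MEMORY_SIZE + CHUNK_SIZE.
        context_sizes.append(len(memory) + len(chunk))
        memory = update_memory(memory, chunk)
    return memory, context_sizes

doc = [f"t{i}" for i in range(10)]
final_memory, sizes = read_document(doc)
print(final_memory, sizes)
```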