Prompt optimization, evaluator prompting, prompt ensembles, and test-time prompt learning.
3 papers
4 resource links
Latest month: 2025.12 (1 paper)
Prompt Optimization
2025.07 Prompt Optimization
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
GEPA introduces a prompt optimizer that uses natural language reflection to learn high-level rules from trial and error, outperforming GRPO by 6% on average with up to 35x fewer rollouts. It also beats MIPROv2 by over 10% and shows promising results as an inference-time search strategy for code optimization.
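To make the idea of reflection-driven prompt search concrete, here is a minimal toy sketch. It is not GEPA's implementation: the greedy accept-if-better rule, the function names (`reflective_search`, `toy_evaluate`, `toy_reflect`), and the rule-matching scorer are all assumptions made for illustration. In a real setup, `evaluate` would run rollouts on a task and `reflect` would ask an LLM to read the textual feedback and propose a revised prompt.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for task feedback: the "task" rewards prompts
# that mention each of these rules.
REQUIRED_RULES = ["cite sources", "show units", "answer in one sentence"]


def toy_evaluate(prompt: str) -> Tuple[float, List[str]]:
    """Return a score in [0, 1] plus textual feedback (the missing rules)."""
    missing = [rule for rule in REQUIRED_RULES if rule not in prompt]
    score = 1.0 - len(missing) / len(REQUIRED_RULES)
    return score, missing


def toy_reflect(prompt: str, feedback: List[str]) -> str:
    """Stand-in for an LLM reflection step: address the first complaint."""
    if not feedback:
        return prompt
    return prompt + " Rule: " + feedback[0] + "."


def reflective_search(
    seed_prompt: str,
    evaluate: Callable[[str], Tuple[float, List[str]]],
    reflect: Callable[[str, List[str]], str],
    rounds: int = 5,
) -> Tuple[str, float]:
    """Greedy loop (an assumed simplification): keep a revised prompt
    only if it scores strictly higher than the current best."""
    best_prompt = seed_prompt
    best_score, feedback = evaluate(best_prompt)
    for _ in range(rounds):
        candidate = reflect(best_prompt, feedback)
        score, candidate_feedback = evaluate(candidate)
        if score > best_score:
            best_prompt, best_score, feedback = candidate, score, candidate_feedback
    return best_prompt, best_score


best_prompt, best_score = reflective_search(
    "Answer the question.", toy_evaluate, toy_reflect
)
```

The point of the sketch is only the shape of the loop: feedback arrives as text rather than as a scalar reward, and each revision is driven by that text.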
Becoming Experienced Judges: Selective Test-Time Learning for Evaluators
This paper introduces Learning While Evaluating (LWE), enabling LLM-as-a-judge systems to improve sequentially at inference time by updating an evolving meta-prompt with self-generated feedback. It further proposes Selective LWE, which updates only on self-inconsistent cases to improve evaluation quality with better cost efficiency.
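The selective update rule can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's procedure: self-inconsistency is assumed here to mean that a pairwise judge changes its verdict when the two responses are swapped, and the sketch re-judges a case after updating the meta-prompt. The names `selective_lwe`, `toy_judge`, and `toy_feedback` are hypothetical, and the judge is a deterministic stub rather than an LLM call.

```python
from typing import Callable, List, Tuple


def toy_judge(meta_prompt: str, first: str, second: str) -> str:
    """Stub judge. Without guidance it always prefers the first response
    (position bias); with guidance it prefers the longer response."""
    if "compare length" in meta_prompt:
        return "A" if len(first) >= len(second) else "B"
    return "A"


def toy_feedback(first: str, second: str) -> str:
    """Stub for self-generated feedback appended to the meta-prompt."""
    return "Note: compare length, not position."


def selective_lwe(
    cases: List[Tuple[str, str]],
    judge: Callable[[str, str, str], str],
    write_feedback: Callable[[str, str], str],
    meta_prompt: str = "",
) -> Tuple[List[str], str, int]:
    """Process cases sequentially; update the meta-prompt only when the
    judge disagrees with itself under a swap of response order."""
    verdicts: List[str] = []
    updates = 0
    for a, b in cases:
        verdict = judge(meta_prompt, a, b)
        swapped = judge(meta_prompt, b, a)
        # Map the swapped verdict back to the original labelling.
        verdict_from_swap = "A" if swapped == "B" else "B"
        if verdict != verdict_from_swap:
            meta_prompt = (meta_prompt + "\n" + write_feedback(a, b)).strip()
            updates += 1
            verdict = judge(meta_prompt, a, b)  # assumed: re-judge after update
        verdicts.append(verdict)
    return verdicts, meta_prompt, updates


cases = [("short", "a much longer answer"), ("detailed reply here", "ok")]
verdicts, final_meta_prompt, updates = selective_lwe(cases, toy_judge, toy_feedback)
```

In this toy run only the first case triggers an update; the second is judged consistently with the already-updated meta-prompt, which is where the cost saving over updating on every case would come from.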
APE improves LLM-as-a-judge reliability by automatically discovering auxiliary evaluation dimensions from failure cases and ensembling them with confidence-aware selection. It boosts agreement with human-aligned benchmarks by using test-time computation more effectively.
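One simple way to picture confidence-aware ensembling is a filtered, confidence-weighted vote over per-dimension verdicts. The sketch below is an assumption-laden illustration, not the method described above: the threshold `min_conf`, the weighting scheme, and the function name `confident_vote` are all hypothetical, and the (verdict, confidence) pairs would in practice come from separate evaluation prompts.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def confident_vote(votes: List[Tuple[str, float]], min_conf: float = 0.6) -> str:
    """Pick a verdict from (verdict, confidence) pairs.

    Votes below min_conf are dropped; if that would drop everything,
    all votes are kept. Remaining votes are summed by confidence.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    kept = [(v, c) for v, c in votes if c >= min_conf] or votes
    totals: Dict[str, float] = defaultdict(float)
    for verdict, conf in kept:
        totals[verdict] += conf
    return max(totals, key=totals.get)


# One confident dimension versus three hesitant ones.
votes = [("A", 0.9), ("B", 0.55), ("B", 0.5), ("B", 0.52)]
filtered_choice = confident_vote(votes)                 # low-confidence votes dropped
unfiltered_choice = confident_vote(votes, min_conf=0.0)  # every vote counts
```

The two calls differ only in the threshold, which shows how selecting on confidence can overturn a plain majority of low-confidence judgments.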