Awesome LLM Research Collections

Notes

Research notes

Bilingual notes on LLM research and engineering, covering paper readings and technical reflections.

5 Notes
3 Topics
Latest date: 2026-06-18

Paper Readings

2026-06-15

CapRL: Stimulating Vision-Language Captioning Capabilities with Reinforcement Learning

CapRL scores a caption by how accurately a vision-free LLM answers multiple-choice questions (MCQs) about the image from the caption alone, turning subjective caption-quality scoring into a verifiable reward for training image-captioning models.

Tags: CapRL, Reinforcement Learning, RLVR, Vision-Language Model, Image Captioning
Topic: REINFORCEMENT-LEARNING, Section: Paper Readings

2026-06-15

From Qwen-VL to Qwen3-VL: Four Generations of Architecture and Training

A technical review of how four Qwen-VL generations evolved across vision-language alignment, dynamic resolution, spatiotemporal position encoding, video modeling, and deep visual fusion.

Tags: VLM, Qwen, Multimodal Large Language Model, Position Encoding, Video Understanding
Topic: MLLMS, Section: Paper Readings

2026-06-18

Entropy Collapse: Policy Entropy Consumption in LLM Reinforcement Learning

A note on entropy collapse in LLM reinforcement learning, covering policy entropy, the difference between SFT and RL, DAPO's Clip-Higher strategy, and covariance regularization.

Tags: Entropy Collapse, Reinforcement Learning, Post-training, DAPO, GRPO
Topic: REINFORCEMENT-LEARNING, Section: Paper Readings

Technical Reflections

2026-05-28

OPD: Capability Integration Interface in Post-training

A technical reflection on how on-policy distillation (OPD) serves as a capability integration interface in post-training, traced through Qwen3, GLM-5, MiMo-V2, and DeepSeek-V4.

Tags: OPD, Reinforcement Learning, Distillation
Topic: OPD, Section: Technical Reflections

2026-06-16

PPO, DPO, and GRPO: Objectives and Training Loops for LLM Alignment

A comparison of PPO, DPO, and GRPO through their objectives, advantage estimators, training loops, engineering tradeoffs, and practical boundaries.

Tags: Post-training, Reinforcement Learning, PPO, DPO, GRPO
Topic: REINFORCEMENT-LEARNING, Section: Technical Reflections