Research notes

Bilingual notes for paper readings and technical reflections around LLM research and engineering.

5 Notes

3 Topics

Latest date: 2026-06-18

Paper Readings

2026-06-15

CapRL: Stimulating Vision-Language Captioning Capabilities with Reinforcement Learning

CapRL scores a caption by how accurately a vision-free LLM answers multiple-choice questions (MCQs) about the image from the caption alone, turning subjective caption-quality scoring into a verifiable reward for training image-captioning models.

CapRL, Reinforcement Learning, RLVR, Vision-Language Model, Image Captioning

REINFORCEMENT-LEARNING · Paper Readings

2026-06-15

From Qwen-VL to Qwen3-VL: Four Generations of Architecture and Training

A technical review of how four Qwen-VL generations evolved across vision-language alignment, dynamic resolution, spatiotemporal position encoding, video modeling, and deep visual fusion.

VLM, Qwen, Multimodal Large Language Model, Position Encoding, Video Understanding

MLLMS · Paper Readings

2026-06-18

Entropy Collapse: Policy Entropy Consumption in LLM Reinforcement Learning

A note on entropy collapse in LLM reinforcement learning, covering policy entropy, the difference between SFT and RL, DAPO's Clip-Higher strategy, and covariance regularization.

Entropy Collapse, Reinforcement Learning, Post-training, DAPO, GRPO

REINFORCEMENT-LEARNING · Paper Readings

Technical Reflections

2026-05-28

OPD: Capability Integration Interface in Post-training

A technical reflection on how on-policy distillation (OPD) becomes a capability integration interface in post-training, traced through Qwen3, GLM-5, MiMo-V2, and DeepSeek-V4.

OPD, Reinforcement Learning, Distillation

OPD · Technical Reflections

2026-06-16

PPO, DPO, and GRPO: Objectives and Training Loops for LLM Alignment

A comparison of PPO, DPO, and GRPO through their objectives, advantage estimators, training loops, engineering tradeoffs, and practical limits.

Post-training, Reinforcement Learning, PPO, DPO, GRPO

REINFORCEMENT-LEARNING · Technical Reflections