Research notes
Bilingual notes on paper readings and technical reflections in LLM research and engineering.
Paper Readings
CapRL: Stimulating Vision-Language Captioning Capabilities with Reinforcement Learning
CapRL scores a caption by how accurately a vision-free LLM answers multiple-choice questions (MCQs) from the caption alone, turning subjective caption-quality scoring into a verifiable reward for training image-captioning models.
From Qwen-VL to Qwen3-VL: Four Generations of Architecture and Training
A technical review of how four Qwen-VL generations evolved across vision-language alignment, dynamic resolution, spatiotemporal position encoding, video modeling, and deep visual fusion.
Entropy Collapse: Policy Entropy Consumption in LLM Reinforcement Learning
A note on entropy collapse in LLM reinforcement learning, covering policy entropy, the difference between SFT and RL, DAPO's Clip-Higher strategy, and covariance regularization.
Technical Reflections
OPD: Capability Integration Interface in Post-training
A technical reflection on how OPD becomes a capability integration interface in post-training, as seen in Qwen3, GLM-5, MiMo-V2, and DeepSeek-V4.
PPO, DPO, and GRPO: Objectives and Training Loops for LLM Alignment
A comparison of PPO, DPO, and GRPO through their objectives, advantage estimators, training loops, engineering tradeoffs, and practical boundaries.