Supervised fine-tuning methods, data recipes, token weighting, and reasoning generalization studies.
3 papers
5 resource links
Latest month: 2026.05
3 papers
SFT Methods
2026.05 SFT Methods
Data Difficulty and the Generalization-Extrapolation Tradeoff in LLM Fine-Tuning
This paper systematically studies difficulty-based data selection for supervised fine-tuning and shows that no single difficulty level is universally optimal. It explains the data-size-dependent optimum through a tradeoff between in-distribution generalization and extrapolation, with the best difficulty shifting toward harder examples as the data budget grows.
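As a rough illustration of difficulty-based selection, the sketch below picks the examples whose difficulty score lies closest to a chosen target under a fixed data budget. The `difficulty` field, the `target` value, and the function name `select_by_difficulty` are all hypothetical; the summary above does not say how the paper scores difficulty or how it selects examples, only that the best difficulty level moves toward harder examples as the budget grows.

```python
def select_by_difficulty(examples, budget, target):
    """Return up to `budget` examples whose difficulty is closest to `target`.

    Assumption: each example is a dict with a numeric "difficulty" score
    in [0, 1], where higher means harder. This scoring is illustrative only.
    """
    # Rank by distance from the target difficulty; sorted() is stable,
    # so ties keep their original order.
    ranked = sorted(examples, key=lambda ex: abs(ex["difficulty"] - target))
    return ranked[:budget]


if __name__ == "__main__":
    pool = [
        {"id": "a", "difficulty": 0.1},
        {"id": "b", "difficulty": 0.5},
        {"id": "c", "difficulty": 0.9},
    ]
    # A larger budget might be paired with a harder target, following the
    # trend described in the summary; the specific values here are made up.
    picked = select_by_difficulty(pool, budget=2, target=0.8)
    print([ex["id"] for ex in picked])
```

In this toy setup, raising `target` as `budget` grows would mimic the shift toward harder data; how to set either value in practice is left open here.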
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability
This paper challenges the common claim that supervised fine-tuning (SFT) only memorizes while RL generalizes. It finds that cross-domain generalization from reasoning SFT with long chain-of-thought supervision depends jointly on optimization dynamics, training data, and base model capability.
ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection
This paper presents ProFit, a supervised fine-tuning method that mitigates single-reference overfitting by using token probability as a proxy for semantic importance and masking low-probability tokens. The approach focuses learning on core logical content and improves reasoning and math performance over standard SFT baselines.
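The masking idea can be sketched as a token-level loss that skips low-probability tokens. This is a minimal illustration, not ProFit's actual implementation: the fixed `threshold`, the function name `masked_sft_loss`, and the assumption that `token_probs` holds the model's probability for each reference token are all hypothetical choices made for the example.

```python
import math


def masked_sft_loss(token_probs, threshold=0.5):
    """Mean negative log-likelihood over reference tokens kept by the mask.

    token_probs: model probability assigned to each reference token
                 (assumed to be precomputed; values in (0, 1]).
    threshold:   hypothetical cutoff; tokens below it are masked out.
    Returns (loss, mask), where mask[i] is True if token i contributes.
    """
    mask = [p >= threshold for p in token_probs]
    kept = [p for p, keep in zip(token_probs, mask) if keep]
    if not kept:
        # Nothing survives the mask, so there is no training signal.
        return 0.0, mask
    loss = -sum(math.log(p) for p in kept) / len(kept)
    return loss, mask


if __name__ == "__main__":
    loss, mask = masked_sft_loss([0.9, 0.1, 0.8], threshold=0.5)
    print(mask, round(loss, 4))
```

Standard SFT would average the loss over all three tokens; here the middle token is excluded, so only the higher-probability tokens drive the update.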