Awesome LLM Research Collections

SFT

Supervised fine-tuning methods, data recipes, token weighting, and reasoning generalization studies.

3 papers
5 resource links
Latest month: 2026.05
SFT Methods

3 papers

2026.05 SFT Methods

Data Difficulty and the Generalization–Extrapolation Tradeoff in LLM Fine-Tuning

This paper systematically studies difficulty-based data selection for supervised fine-tuning and shows that no single difficulty level is universally optimal. It explains the data-size-dependent optimum through a tradeoff between in-distribution generalization and extrapolation, with the best difficulty shifting toward harder examples as the data budget grows.

Paper
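
To make difficulty-based selection concrete, here is a minimal sketch, not the paper's procedure. It assumes each example carries a hypothetical `difficulty` score in [0, 1] (for instance, one minus a reference model's pass rate) and that the caller chooses the `target` difficulty for a given data budget. The function name and field names are illustrative only.

```python
def select_by_difficulty(examples, budget, target):
    """Return the `budget` examples whose difficulty is closest to `target`.

    `examples` is a list of dicts with a hypothetical "difficulty" key.
    How `target` should move with `budget` is left to the caller; this
    sketch does not encode any schedule.
    """
    ranked = sorted(examples, key=lambda ex: abs(ex["difficulty"] - target))
    return ranked[:budget]


pool = [
    {"id": "a", "difficulty": 0.1},
    {"id": "b", "difficulty": 0.5},
    {"id": "c", "difficulty": 0.9},
    {"id": "d", "difficulty": 0.6},
]

# A larger budget paired with a harder target, as one possible setting.
subset = select_by_difficulty(pool, budget=2, target=0.9)
selected_ids = [ex["id"] for ex in subset]
```

Keeping `target` as an explicit argument reflects the summary's point that no single difficulty level is best for every data size.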
2026.04 SFT Methods

Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

This paper challenges the common claim that supervised fine-tuning (SFT) only memorizes while RL generalizes. It finds that cross-domain generalization from reasoning SFT with long chain-of-thought supervision is conditional: it depends jointly on optimization dynamics, training data, and base model capability.

Paper Code Hugging Face
2026.01 SFT Methods

ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection

This paper presents ProFit, a supervised fine-tuning method that mitigates single-reference overfitting by using token probability as a proxy for semantic importance and masking low-probability tokens. The approach focuses learning on core logical content and improves reasoning and math performance over standard SFT baselines.

Paper
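
The masking idea can be illustrated with a small sketch. It is not ProFit's actual implementation: it assumes the per-token probabilities of the reference answer are already available, and it uses a fixed, hypothetical `threshold` to decide which tokens count as low-probability. The function names are illustrative.

```python
import math


def standard_sft_loss(token_probs):
    """Mean negative log-likelihood over all reference tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def masked_sft_loss(token_probs, threshold=0.5):
    """Mean negative log-likelihood over tokens with probability >= threshold.

    Tokens below the (hypothetical) threshold are masked out, so they
    contribute nothing to the loss or to the normalizing count.
    """
    kept = [p for p in token_probs if p >= threshold]
    if not kept:
        return 0.0  # every token masked: no training signal from this sample
    return -sum(math.log(p) for p in kept) / len(kept)


# Assumed probabilities of each reference token under the model.
probs = [0.9, 0.1, 0.8]
full = standard_sft_loss(probs)
masked = masked_sft_loss(probs, threshold=0.5)
```

With these made-up numbers the 0.1 token is dropped, so the masked loss is computed over the two remaining tokens only.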