LLMs

Foundation model reports, inference methods, long-context language modeling, coding, and reasoning systems.


18 Papers

54 Resource links

2026.05 Latest month

15 papers

Foundation Models

2026.05 Foundation Models

The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence

This technical report presents the MiniMax-M2 series, MoE language models with a small active-parameter footprint designed for real-world agentic deployment. It combines agent-driven verifiable data pipelines, the Forge agent-native RL system, and early self-evolution in M2.7 to improve coding, deep-search, office-task, and reasoning performance.

Paper Project Code Hugging Face

2026.04 Foundation Models

The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook

This survey argues that continuous latent space is becoming a native computational substrate for language-based models, addressing the inefficiencies of explicit token-level generation such as redundancy, discretization bottlenecks, and semantic loss. It further organizes the field through mechanism and ability perspectives, and outlines key open challenges for future research.

Paper Project

2026.02 Foundation Models

GLM-5: from Vibe Coding to Agentic Engineering

GLM-5 is a next-generation foundation model targeting long-horizon agentic engineering that reduces training and inference cost while preserving long-context capability. It introduces asynchronous RL infrastructure and agent RL algorithms to improve post-training efficiency and real-world coding performance.

Paper Project Code Hugging Face

2026.02 Foundation Models

Kimi K2.5: Visual Agentic Intelligence

This paper introduces an open-source multimodal agentic model that jointly optimizes text and vision through unified pretraining, SFT, and reinforcement learning. It also proposes Agent Swarm, a parallel orchestration framework for decomposing and executing complex tasks with coordinated agents.

Paper Project Code Hugging Face

2026.01 Foundation Models

MiMo-V2-Flash Technical Report

MiMo-V2-Flash is a 309B-parameter MoE foundation model with 15B active parameters, built for fast reasoning, coding, and agentic workloads through hybrid sliding-window/global attention, 27T-token pretraining, and long-context extension to 256k. It introduces Multi-Teacher On-Policy Distillation for scalable post-training and repurposes multi-token prediction as a draft model for speculative decoding speedups.

Paper Code Hugging Face
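
As a rough illustration of the speculative decoding idea mentioned in the entry above, the sketch below shows one greedy draft-and-verify step. It is not the MiMo-V2-Flash implementation: `speculative_step`, `draft_next`, and `target_next` are hypothetical names, the two "models" are toy functions over integer tokens, and the verification loop stands in for what would normally be a single batched forward pass of the target model.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # maps a context to its greedy next token


def speculative_step(prefix: List[Token], draft_next: NextToken,
                     target_next: NextToken, k: int) -> List[Token]:
    """Run one greedy draft-and-verify step and return the tokens to append."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target model checks each proposed token in order.
    accepted: List[Token] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target_next(ctx)
        if expected != tok:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft token matched, so the target contributes one extra token.
    accepted.append(target_next(ctx))
    return accepted


# Toy models (assumptions for illustration only).
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    # Agrees with the target except right after token 3.
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


print(speculative_step([1], toy_draft, toy_target, k=4))
```

Under this greedy acceptance rule the appended tokens are always the ones the target model would have produced by itself, so the draft only affects how many tokens are obtained per verification step, not what they are.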

2026.01 Foundation Models

Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

This paper introduces conditional memory as a sparsity axis complementary to MoE, instantiated by Engram for constant-time lookup of static knowledge. A scaling law guides the allocation between neural computation and memory, enabling Engram models to improve knowledge, reasoning, code, math, and long-context retrieval at matched parameters and FLOPs.

Paper Code

2025.12 Foundation Models

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

DeepSeek-V3.2 is an open large language model that combines efficient long-context computation with strong reasoning and agent performance. Its key ingredients include DeepSeek Sparse Attention, scalable RL post-training, and a large-scale agentic task synthesis pipeline for improving tool-use generalization and instruction-following robustness.

Paper Project Hugging Face

2025.08 Foundation Models

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

GLM-4.5 is an open-source MoE foundation model with hybrid reasoning modes (thinking and direct response) designed to better support agentic, reasoning, and coding tasks. It combines large-scale pretraining with RL-based post-training and is released in full and compact variants, both reporting strong benchmark performance.

Paper Code Hugging Face

2025.07 Foundation Models

Kimi K2: Open Agentic Intelligence

Kimi K2 is a trillion-parameter MoE language model focused on agentic, reasoning, and coding capabilities with stable large-scale training. The work introduces MuonClip, which uses QK-clip to improve optimization stability and token efficiency during pretraining.

Paper Project Code Hugging Face

2025.05 Foundation Models

Qwen3 Technical Report

This report presents the Qwen3 family spanning dense and MoE models across a wide parameter range, emphasizing stronger multilingual performance and efficiency. It unifies deliberative thinking and fast response modes in one framework and scales post-training to improve reasoning, coding, and agentic behavior.

Paper Project Code Hugging Face

2025.01 Foundation Models

MiniMax-01: Scaling Foundation Models with Lightning Attention

MiniMax-01 introduces a long-context model family built around Lightning Attention and MoE to improve scaling efficiency and practical throughput. It combines optimized parallelization and communication-computation overlap to train large models with stronger long-context performance.

Paper Project Code Hugging Face

2024.12 Foundation Models

DeepSeek-V3 Technical Report

DeepSeek-V3 is a 671B-parameter MoE language model with 37B activated parameters per token, built for efficient inference and cost-effective large-scale training. It extends MLA and DeepSeekMoE with auxiliary-loss-free load balancing and a multi-token prediction objective, achieving strong open-model performance with stable 14.8T-token pretraining and SFT/RL post-training.

Paper Code Hugging Face

2024.09 Foundation Models

Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement

This paper presents Qwen2.5-Math, a family of math-specialized language models that applies self-improvement throughout pre-training, post-training, and inference. The approach strengthens mathematical reasoning and tool-augmented problem solving across multiple model sizes.

Paper Code Hugging Face

2024.07 Foundation Models

Qwen2 Technical Report

This report introduces the Qwen2 series of dense and mixture-of-experts language models, covering base and instruction-tuned variants across a broad parameter range. It emphasizes stronger multilingual, coding, math, and reasoning performance while remaining competitive with proprietary systems.

Paper Code Hugging Face

2024.05 Foundation Models

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

DeepSeek-V2 is a 236B-parameter MoE language model with 21B activated parameters per token and a 128K context length, designed for economical training and efficient inference. It combines Multi-head Latent Attention for KV-cache compression with DeepSeekMoE sparse computation, reducing training cost and KV-cache size while improving throughput and open-model performance.

Paper Code Hugging Face
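
Several entries in this section quote both total and activated parameter counts for MoE models. As a small worked example, the sketch below computes the share of parameters active per token from the figures stated in those entries. The helper name `active_fraction` and the `MODELS` table are illustrative, and the counts are the rounded values from the summaries, so the results are approximate.

```python
def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters activated per token (both in billions)."""
    if total_b <= 0 or not 0 <= active_b <= total_b:
        raise ValueError("expected 0 <= active_b <= total_b and total_b > 0")
    return active_b / total_b


# (total, activated) parameter counts in billions, as quoted in the entries above.
MODELS = {
    "MiMo-V2-Flash": (309, 15),
    "DeepSeek-V3": (671, 37),
    "DeepSeek-V2": (236, 21),
}

for name, (total, active) in MODELS.items():
    share = active_fraction(total, active)
    print(f"{name}: {share:.1%} of parameters active per token")
```

With these figures, each model activates roughly 5 to 9 percent of its parameters for any given token, which is the sense in which MoE models are described as sparse.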

2 papers

Inference

2026.04 Inference

Large Language Models Explore by Latent Distilling

This paper proposes Exploratory Sampling (ESamp), a decoding method that addresses the shallow lexical variation of standard stochastic sampling by encouraging semantic exploration. It trains a lightweight Distiller at test time to predict deep-layer representations from shallow ones, then uses prediction error as a novelty signal to reweight candidate tokens and improve Pass@k efficiency.

Paper Code
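
The entry above measures its gains in Pass@k. For readers unfamiliar with the metric, the sketch below shows the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them pass. This is an assumption about how the metric is typically computed, not a description of this paper's evaluation code, and `pass_at_k` is an illustrative name.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes.

    n: total samples generated for the problem
    c: number of those samples that pass
    k: sample budget being evaluated (1 <= k <= n)
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("expected 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the size-k subsets containing no passing sample;
    # it is 0 when fewer than k failing samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


print(pass_at_k(n=10, c=3, k=1))
print(pass_at_k(n=10, c=3, k=5))
```

A decoding method that produces more semantically distinct candidates can raise this value at a fixed k, because fewer of the k draws are spent on near-duplicates of the same failed attempt.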

2026.03 Inference

Caterpillar of Thoughts: The Optimal Test-Time Algorithm for Large Language Models

This paper presents a theoretical framework for optimal test-time computation in LLMs and proves that the optimal algorithm always generates a caterpillar-tree structure. It introduces CaT, which achieves a higher success rate than Tree-of-Thoughts while generating fewer tokens.

Paper

1 paper

Detection

2026.05 Detection

Base Models Look Human To AI Detectors

This paper finds that commercial AI-text detectors often label base-model outputs as more human than outputs from instruction-tuned counterparts, suggesting they track tuning artifacts and local context rather than invariant machine-text signals. It proposes Humanization by Iterative Paraphrasing (HIP), a detector-agnostic fine-tuning and iterative paraphrasing pipeline that improves semantic preservation while evading detectors.

Paper Code