Commit Graph

4625 Commits

Author SHA1 Message Date
Rabi Mishra
69f8a0ea37 fix(rocm): Use refresh_env_variables() for rocm_aiter_ops in test_moe (#31711)
Signed-off-by: rabi <ramishra@redhat.com>
2026-01-13 19:11:54 +00:00
Wentao Ye
f28125d87b [Perf] Optimize grouped topk kernel, 1.2%~2% E2E Throughput improvement (#32058)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-13 10:58:18 -08:00
Andrew Xia
af54d2e2d0 [responseAPI] support partial message generation (#32100)
Signed-off-by: Andrew Xia <axia@fb.com>
Signed-off-by: Andrew Xia <mitandrewxia@gmail.com>
Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
2026-01-13 10:41:26 -08:00
Matthew Bonanni
2263d44b68 [4/N][Attention] Move MLA common to model_executor (#32060)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-01-13 09:08:45 -08:00
Chauncey
4f02cb2eac [Refactor] [7/N] to simplify the vLLM lora serving architecture (#32251)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-13 15:37:34 +00:00
Cyrus Leung
252c011012 [Refactor] Remove MultiModalProfiler (#32254)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-13 15:10:20 +00:00
Matthew Bonanni
98f60e5acb [6/N][Attention] Move utils to more appropriate locations (#32215)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-13 05:38:52 -08:00
Chauncey
fefce49807 [Refactor] [6/N] to simplify the vLLM openai chat_completion serving architecture (#32240)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-13 13:01:39 +00:00
Cyrus Leung
232214b2ae [Bugfix] Replace PoolingParams.normalize with use_activation (#32243)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-13 10:45:42 +00:00
Andreas Karatzas
df7e12715f [ROCm][CI] Fix engine core client tests for ROCm spawn multiprocessing (#32061)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-13 15:14:30 +08:00
Xingyu Liu
80221e1884 [BugFix]Fix eagle draft_model_config and add tests (#31753)
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
2026-01-12 23:09:36 -08:00
Andreas Karatzas
5e714f7ff4 [ROCm][CI] Fix HuggingFace flash_attention_2 accuracy issue in Isaac vision encoder (#32233)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-12 22:33:59 -08:00
Nick Hill
c6bb5b5603 [BugFix] Fix engine crash caused by chat tools + response_format (#32127)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-13 10:33:14 +08:00
Andrew Xia
a307ac0734 [responsesAPI] add unit test for optional function tool call id (#32036)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2026-01-12 16:14:54 -08:00
Nicolò Lucchesi
f8bd8394e3 [NIXL][Bugfix] Failure logging overhaul + early metadata free on failure (#32031)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-12 20:38:49 +00:00
Or Ozeri
2be765b68a [BugFix] scheduler: Fix ordering preserving of skipped requests (#32173)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-12 18:39:38 +00:00
Ilya Markov
1eb61ab34b [Refactor] EPLB rebalance algo to NumPy (#30697)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
2026-01-12 18:13:23 +00:00
Matthew Bonanni
20228cb851 [3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py (#32054)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-12 09:13:56 -08:00
Cyrus Leung
7c0d3c5152 [Benchmark] Share data between SLA runs (#32184)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-13 01:12:22 +08:00
Nicolò Lucchesi
5b68107411 [Misc][PD] Fix get_attn_backend usage in transfer connectors (#31988)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-12 18:10:05 +01:00
Asaf Joseph Gardin
8fb2c135be [Bugfix] Fix stale SSM state for new Mamba requests scheduled as decode (#32118)
Signed-off-by: Josephasafg <ajgard7@gmail.com>
2026-01-12 17:02:38 +00:00
danielafrimi
3f72639d36 [FIX] Add NO_MUL activation support for modular kernel path (#31528)
Signed-off-by: dafrimi <dafrimi@nvidia.com>
Signed-off-by: <>
Co-authored-by: root <root@gpu-267.slurm-workers-slurm.slurm.svc.cluster.local>
Co-authored-by: root <root@gpu-537.slurm-workers-slurm.slurm.svc.cluster.local>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: root <root@pool0-01777.cm.cluster>
2026-01-12 11:55:49 -05:00
Jaehyun An
6bc9c8473e [MODEL] New model support for kakaocorp/kanana-1.5-v-3b-instruct (#29384)
Signed-off-by: Jaehyun An <steve.ai@kakaocorp.com>
2026-01-12 16:39:02 +00:00
Kyungmin Lee
63ed2409e8 Add K-EXAONE-236B-A23B (#31621)
Signed-off-by: lkm2835 <lkm2835@gmail.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: lgai-exaone <exaonemodels@lgresearch.ai>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-01-12 16:30:50 +00:00
Or Ozeri
9cddbdba6d OffloadingConnector: Add cpu_bytes_to_use configuration (#24498)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-12 15:00:43 +00:00
daniel-salib
d7b2e57097 [Frontend] Fix Flaky MCP Streaming Test (#32153)
Signed-off-by: Daniel Salib <danielsalib@meta.com>
2026-01-12 18:03:32 +08:00
Jiangyun Zhu
3df619ac94 [CI] fix test_concat_and_cache_mla_rope_fused (#32117)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2026-01-11 15:11:11 +00:00
Cyrus Leung
a374532111 [CI/Build] Separate out flaky responses API tests (#32110)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-11 05:01:12 -08:00
Or Ozeri
4c16ba617f [KVConnector] OffloadingConnector: Fix bug in handling of preemptions (#29870)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-11 08:05:36 +00:00
Matt
bde57ab2ed [Hardware][AMD][CI][Bugfix] Fix AMD Quantization test group (#31713)
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
2026-01-10 23:19:46 -08:00
Cyrus Leung
ef96fa3f1f [Benchmark][2/2] Use spline interpolation to tune SLA variables (#32095)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-10 20:27:27 -08:00
Or Ozeri
2a4dbe24ea [BugFix] Wait for compute before offloading KV to CPU (#31341)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-10 22:25:08 +00:00
shyeh25
1c46dea001 Revert "[Kernels][FI] Skip trtllm attention when num_kv_heads=1 (#308… (#31617)
Signed-off-by: shyeh25 <206795756+shyeh25@users.noreply.github.com>
2026-01-10 12:39:59 -08:00
Or Ozeri
028599739d [BugFix] scheduler: Fix resuming of preempted requests after async load (#31583)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-10 12:39:25 -08:00
Jeremy Teboul
07286ec5a6 [Bugfix] Fix integer overflow in Gemma3n audio processing (#31657)
Signed-off-by: Jeremy Teboul <jeremyte@meta.com>
2026-01-10 17:52:53 +08:00
Cyrus Leung
5f2385a4c8 [Benchmark][1/2] Generalize SLA criterion validation from binary flags to margins (#32075)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-10 07:11:03 +00:00
Andreas Karatzas
d83becd503 [ROCm][CI] Fix flaky test_function_calling_with_stream and reduce schema test examples (#32063)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-10 05:02:35 +00:00
Cyrus Leung
583a90e005 [Refactor] Separate sequence and token pooling types (#32026)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-10 04:53:24 +00:00
PatrykSaffer
80fead8bf6 Fuse RoPE and MLA KV-cache write (#25774)
Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
Signed-off-by: PatrykSaffer <patryk.saffer@mistral.ai>
Co-authored-by: Patryk Saffer <patryk.saffer99@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-01-09 19:18:37 -08:00
Akshat Shrivastava
e45946bd91 feature/issac 0.2 (#31550)
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-01-10 03:18:05 +00:00
Lucas Kabela
ea6d067a2a [Misc][LLaMa4] Compile LLaMa Vision Encoder (#30709)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-01-09 22:01:38 -05:00
Ning Xie
abd9224280 resolve pydantic error in startup benchmark (#31348)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2026-01-10 02:41:27 +00:00
Matthew Bonanni
2612ba9285 [1/N][Attention] Restructure attention: move files (#31916)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-09 13:10:24 -08:00
Andrew Xia
1f8b7c536b [responsesAPI] fix incomplete_messages for simple/parsable context (#31836)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2026-01-09 21:00:57 +00:00
Lucas Wilkinson
0a0aa07747 [Quant] Make static quant support all group shapes (#30833)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-09 12:49:27 -08:00
Runkai Tao
a4d5d663e2 Add unpermute-aware fused MoE path and small-batch fallback (#29354)
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-09 12:58:39 -07:00
Jeremy Teboul
657e9c0e18 [Fix] Introduce audio channels spec (#31595)
Signed-off-by: Jeremy Teboul <jeremyte@meta.com>
2026-01-09 19:34:51 +00:00
Andrew Xia
f32c629eb4 [Frontend][gpt-oss] Allow system message to overwrite model identity (#31737)
Signed-off-by: lacora <hyelacora@gmail.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: lacora <hyelacora@gmail.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-01-09 14:03:57 -05:00
Yifan Qiao
cd4a95e3aa [Feat][Core] Support multiple KV cache groups in Hybrid KV Coordinator (#31707)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
2026-01-09 10:53:20 -08:00
inkcherry
4505849b30 [ROCm][PD] add moriio kv connector. (#29304)
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2026-01-09 14:01:57 +00:00