Karan Bansal
|
fad09e8a1f
|
fix(glm47): improve tool call parsing and content normalization (#37386)
Signed-off-by: karanb192 <karan@example.com>
Co-authored-by: karanb192 <karan@example.com>
|
2026-03-18 08:12:21 +00:00 |
|
Or Ozeri
|
fcf0687b27
|
[kv_offload+HMA][0/N]: Support block-level preemption handling (#34805)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-18 08:49:53 +02:00 |
|
liuzhenwei
|
86b7e3c95a
|
[XPU] skip unsupported ut and update test_nixl_connector (#37179)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-18 13:32:59 +08:00 |
|
Andreas Karatzas
|
ce2ef42fd3
|
[CI] Stabilize test_cpu_offloading by waiting for async offload before cache reset (#37335)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-18 05:26:20 +00:00 |
|
Andreas Karatzas
|
8b6325758c
|
[ROCm][CI] Add ROCM_EXTRA_ARGS to audio_in_video test server fixture (#37349)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-18 04:55:40 +00:00 |
|
gxd3
|
a0dd1995c7
|
[Hardware][TPU] Add supports_async_scheduling() method to Executor interface so that it can be extended for Executor implementations. (#36924)
Signed-off-by: Guangxiang Du <gxd@google.com>
|
2026-03-18 12:53:28 +08:00 |
|
Andreas Karatzas
|
58cde5c026
|
[ROCm][CI] Skip trtllm kvfp8 dequant tests on ROCm (#37330)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-18 11:12:26 +08:00 |
|
Yanan Cao
|
ff9fbc9aff
|
[Kernel][Helion] [16/N] Refactor register_kernel API to be more Dynamo-friendly (#36705)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-18 01:23:35 +00:00 |
|
Michael Goin
|
09e4576f65
|
[Kernel] Add non-gated support for NVFP4 CUTLASS MoE (#37320)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-03-17 18:12:04 -04:00 |
|
Yong Hoon Shin
|
de35c06c66
|
Make KV connector metadata build overridable via plugin (#37336)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2026-03-17 21:29:06 +00:00 |
|
Athrael Soju
|
c0745a851a
|
[Model] Add ColQwen3.5 4.5B support (#36887)
Signed-off-by: Athrael Soju <athrael.soju@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-17 21:17:02 +00:00 |
|
Ekagra Ranjan
|
b5ca9c3557
|
[Models] Cohere ASR (#35809)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
|
2026-03-17 21:04:17 +00:00 |
|
Cyrus Leung
|
51f0acda79
|
[Model] Remove unused handle_oov_mm_token (#37321)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-17 19:44:52 +00:00 |
|
Andrey Talman
|
68f783a727
|
[Torch 2.11] Guard torch._C._cpu attribute checks for forward compatibility (#35673)
Signed-off-by: atalman <atalman@fb.com>
|
2026-03-17 18:47:59 +00:00 |
|
Andreas Karatzas
|
4ed51308c8
|
[CI] Fix GPU memory leak when RemoteOpenAIServer fails to start in __init__ (#37230)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-17 09:08:08 -07:00 |
|
Isotr0py
|
a836524d20
|
[Chore] Replace all base64 usages with faster pybase64 package (#37290)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-17 14:44:19 +00:00 |
|
Bhoomit
|
3717a4dd47
|
[Misc][LoRA] Add --lora-target-modules to restrict LoRA to specific modules (#34984)
Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-17 14:36:41 +00:00 |
|
Harry Mellor
|
ecfcdd2ce4
|
Fix Phi3 test that fails with Transformers v5 (#37298)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-17 14:29:24 +00:00 |
|
Sage
|
59192dfd39
|
[Frontend] Complete OpenAI render delegation (#37287)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
|
2026-03-17 13:53:55 +00:00 |
|
Cyrus Leung
|
f340324335
|
[1/2] Move InternVL-based processors (#37260)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-17 21:50:56 +08:00 |
|
Viacheslav
|
293f036e6d
|
Add gigachat 3.1 tool parser + fix gigachat3 tool parser (#36664)
Signed-off-by: Viacheslav Barinov <viacheslav.teh@gmail.com>
|
2026-03-17 12:03:20 +00:00 |
|
Sage
|
00f8e0d211
|
[Frontend] Delegate tokenization serving preprocessing to OpenAIServingRender (#37266)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
|
2026-03-17 11:22:54 +00:00 |
|
Augusto Yao
|
9c7cab5ebb
|
[Feature]: Support for multiple embedding types in a single inference call (#35829)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
|
2026-03-17 17:05:42 +08:00 |
|
Chauncey
|
132bfd45b6
|
[Bugfix][ResponsesAPI] Fix crash when tool_choice=required exceeds max_output_tokens (#37258)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-17 08:54:52 +00:00 |
|
Benjamin Chislett
|
8a680463fa
|
[Bugfix] Fix NemotronH MTP + Chunked Prefill (#35447)
|
2026-03-17 07:07:33 +01:00 |
|
Flora Feng
|
3e3d320c1b
|
[Refactor] Relocate responses API tests (#37241)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-17 05:14:52 +00:00 |
|
Flora Feng
|
384dc7f77b
|
[Refactor] Relocate completion and chat completion tests (#37125)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-17 11:31:23 +08:00 |
|
Flora Feng
|
f04d5226f8
|
[CI] Fix flaky tool_use chat completion tests with deterministic seed (#37027)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-17 03:24:34 +00:00 |
|
Vadim Gimpelson
|
6c1cfbad32
|
Support non-contiguous KV cache in TRTLLM fp8 dequant kernel (#36867)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Co-authored-by: Pavani Majety <pavanimajety@gmail.com>
|
2026-03-16 17:48:42 -07:00 |
|
Harry Huang
|
45f526d652
|
[BugFix] Correct max memory usage for multiple KV-cache groups (#36030)
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
|
2026-03-17 00:38:52 +00:00 |
|
Walter Beller-Morales
|
061980c36a
|
[Feature][Frontend] add support for Cohere Embed v2 API (#37074)
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
|
2026-03-16 19:55:53 -04:00 |
|
Ben Browning
|
7a49742b88
|
[CI/Build] Add common tool call parser test suite (#27599)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
|
2026-03-16 19:46:20 -04:00 |
|
Terry Gao
|
3e6a1e1686
|
[Custom Ops] Add functional + out variant for scaled_fp4_quant (#34389)
Signed-off-by: tianrengao <terrygao87@gmail.com>
|
2026-03-16 18:51:46 -04:00 |
|
Andreas Karatzas
|
4f9b14c21c
|
[CI] Stabilize multinode DP internal LB completion tests (#36356)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-16 15:40:23 -07:00 |
|
EdalatiAli
|
e5b807607c
|
[Quant][Feature] Support online MXFP8 quantization for MoE and dense models (#35448)
Signed-off-by: EdalatiAli <aliedalati@cohere.com>
|
2026-03-16 18:07:39 -04:00 |
|
Krish Gupta
|
c0f011918d
|
[Bugfix] opcheck false mutation error in rms_norm_per_block_quant (#36688) (#36779)
Signed-off-by: Krish Gupta <krishom70@gmail.com>
|
2026-03-16 21:11:33 +00:00 |
|
rasmith
|
2cc26c3a99
|
[CI][BugFix][MORI][AMD] Add transfer_id to kv transfer params for test (#37213)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2026-03-16 13:22:57 -07:00 |
|
Flora Feng
|
dfa8852db2
|
[Refactor] Consolidate GPT-OSS reasoning parser tests (#36915)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Flora Feng <4florafeng@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-16 15:53:07 -04:00 |
|
Nicolò Lucchesi
|
f5c081d432
|
[PD][Nixl] Add support for hybrid SSM-FA models (#36687)
|
2026-03-16 19:58:06 +01:00 |
|
Max de Bayser
|
9f9ecff4cd
|
Add simple granite4 tool parser (#36827)
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2026-03-16 10:49:09 -07:00 |
|
haosdent
|
ca1954d58c
|
[Bugfix] Disable cross-layer KV cache for MLA attention backends (#37090)
Signed-off-by: haosdent <haosdent@gmail.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-16 19:03:10 +02:00 |
|
Raushan Turganbay
|
55e6d3d5c0
|
[Bugfix] Make siglip/clip compatible with transformers v5 (#37200)
Signed-off-by: raushan <raushan@huggingface.co>
|
2026-03-16 16:48:18 +00:00 |
|
Benjamin Bartels
|
0e5a9382af
|
[Bugfix] accept redacted thinking blocks in Anthropic messages (#36992)
Signed-off-by: Benjamin Bartels <benjaminba@tiglab-ubuntu.ilab.local>
Signed-off-by: bbartels <benjamin@bartels.dev>
Co-authored-by: Benjamin Bartels <benjaminba@tiglab-ubuntu.ilab.local>
|
2026-03-16 22:01:57 +08:00 |
|
Fynn Schmitt-Ulms
|
04bf5a35fa
|
[Spec Decode] Update extract_hidden_states to use deferred kv_connector clear (#37013)
|
2026-03-16 14:53:45 +01:00 |
|
Robin Nabel
|
bf9a185395
|
GLM4 tool parser: fix streaming mode (#35208)
Signed-off-by: Robin Nabel <opensource@nabel.co>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2026-03-16 18:48:52 +08:00 |
|
Kunshang Ji
|
747b068136
|
[Hardware] Replace memory related torch.cuda APIs (#37031)
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
|
2026-03-16 10:24:48 +00:00 |
|
haosdent
|
116ed130f4
|
[Bugfix] Fix GDN attention crash with mixed decode/spec-decode batches (#34871)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-03-16 10:30:23 +01:00 |
|
Isotr0py
|
912fbe9555
|
[Bugfix] Fix Qwen2.5-Omni/Qwen3-Omni use_audio_in_video with multi-video inputs (#37147)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-16 08:56:06 +00:00 |
|
Roy Wang
|
821eb80c0d
|
[Performance][Model Loader] Skip non-local expert weights during EP model loading (#37136)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
|
2026-03-16 01:33:36 -07:00 |
|
Andreas Karatzas
|
a2956a0f8e
|
[ROCm][CI] Retrying in case of batch variance effects and reducing flakiness (#36442)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-16 16:08:51 +08:00 |
|