Karan Bansal
|
fad09e8a1f
|
fix(glm47): improve tool call parsing and content normalization (#37386)
Signed-off-by: karanb192 <karan@example.com>
Co-authored-by: karanb192 <karan@example.com>
|
2026-03-18 08:12:21 +00:00 |
|
Jee Jee Li
|
8c31f47c63
|
[LoRA] Make LoRA respect language_model_only (#37375)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-03-18 07:53:34 +00:00 |
|
Li, Jiang
|
261801242f
|
[Bugfix] Avoid OpenMP thread reallocation in CPU torch compile (#37391)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-03-18 07:51:39 +00:00 |
|
Or Ozeri
|
fcf0687b27
|
[kv_offload+HMA][0/N]: Support block-level preemption handling (#34805)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-18 08:49:53 +02:00 |
|
liuzhenwei
|
86b7e3c95a
|
[XPU] skip unsupported ut and update test_nixl_connector (#37179)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-18 13:32:59 +08:00 |
|
Andrew Xia
|
0e95916155
|
[responsesAPI] parser.extract_response_outputs can take in token IDs (#37130)
Signed-off-by: Andrew Xia <axia@meta.com>
|
2026-03-18 05:31:31 +00:00 |
|
Andreas Karatzas
|
ce2ef42fd3
|
[CI] Stabilize test_cpu_offloading by waiting for async offload before cache reset (#37335)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-18 05:26:20 +00:00 |
|
Andreas Karatzas
|
8b6325758c
|
[ROCm][CI] Add ROCM_EXTRA_ARGS to audio_in_video test server fixture (#37349)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-18 04:55:40 +00:00 |
|
gxd3
|
a0dd1995c7
|
[Hardware][TPU] Add supports_async_scheduling() method to Executor interface so that it can be extended for Executor implementations. (#36924)
Signed-off-by: Guangxiang Du <gxd@google.com>
|
2026-03-18 12:53:28 +08:00 |
|
Xin Yang
|
f1740006e4
|
[Perf] Enable dual stream execution of input projection for Qwen3 (#36795)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-03-18 11:13:27 +08:00 |
|
Andreas Karatzas
|
58cde5c026
|
[ROCm][CI] Skip trtllm kvfp8 dequant tests on ROCm (#37330)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-18 11:12:26 +08:00 |
|
Roy Wang
|
761e0aa7a0
|
[Performance] Add --enable-ep-weight-filter CLI option (#37351)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-18 09:36:55 +08:00 |
|
Yanan Cao
|
ff9fbc9aff
|
[Kernel][Helion] [16/N] Refactor register_kernel API to be more Dynamo-friendly (#36705)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-18 01:23:35 +00:00 |
|
Divakar Verma
|
e6c4797704
|
[ROCm][Quantization] add fp8xfp8 attn support for rocm_aiter_unified_attn (#36927)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2026-03-18 08:49:32 +08:00 |
|
Michael Goin
|
09e4576f65
|
[Kernel] Add non-gated support for NVFP4 CUTLASS MoE (#37320)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-03-17 18:12:04 -04:00 |
|
Andreas Karatzas
|
3ed7b1e6e0
|
[ROCm] Validate block_size for explicitly selected attention backends (#36846)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-17 17:04:40 -05:00 |
|
JartX
|
e8f9dbc369
|
[Bugfix][ROCm] Fix worker startup OOM on ROCm by skipping unreliable cudagraph memory profiling (#36720)
Signed-off-by: JartX <sagformas@epdcenter.es>
|
2026-03-17 17:55:34 -04:00 |
|
Yong Hoon Shin
|
de35c06c66
|
Make KV connector metadata build overridable via plugin (#37336)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2026-03-17 21:29:06 +00:00 |
|
Athrael Soju
|
c0745a851a
|
[Model] Add ColQwen3.5 4.5B support (#36887)
Signed-off-by: Athrael Soju <athrael.soju@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-17 21:17:02 +00:00 |
|
Ekagra Ranjan
|
b5ca9c3557
|
[Models] Cohere ASR (#35809)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
|
2026-03-17 21:04:17 +00:00 |
|
Chao-Ju Chen
|
245758992e
|
[Bugfix] Rescale NVFP4 weight scales to fix BF16 dequant underflow (#34577)
Signed-off-by: ricky-chaoju <ricky.chen@infinirc.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-03-17 20:48:42 +00:00 |
|
Dimitrios Bariamis
|
1204cf0a9d
|
[Bugfix] Fix mock.patch resolution failure for standalone_compile.FakeTensorMode on Python <= 3.10 (#37158)
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
|
2026-03-17 20:13:06 +00:00 |
|
Wei Zhao
|
b36adfa349
|
[Perf] Set Flashinfer sparse MLA as default backend for FP8 kv cache (#37252)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
|
2026-03-17 20:09:20 +00:00 |
|
Michael Goin
|
e78821b438
|
[Deprecation] Deprecate --calculate-kv-scales option (#37201)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2026-03-17 19:57:24 +00:00 |
|
Cyrus Leung
|
51f0acda79
|
[Model] Remove unused handle_oov_mm_token (#37321)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-17 19:44:52 +00:00 |
|
Brian Dellabetta
|
fa75204b16
|
bump compressed-tensors version to 0.14.0.1 (#36988)
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
|
2026-03-17 15:36:19 -04:00 |
|
Wentao Ye
|
bdb903bb5f
|
[Bug] Fix FlashInfer MNNVL socket collisions under concurrent vLLM jobs (#36674)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-17 15:19:52 -04:00 |
|
Andrey Talman
|
68f783a727
|
[Torch 2.11] Guard torch._C._cpu attribute checks for forward compatibility (#35673)
Signed-off-by: atalman <atalman@fb.com>
|
2026-03-17 18:47:59 +00:00 |
|
Avinash Singh
|
c5030c439d
|
[CI] Split Distributed Tests (4 GPUs) and Kernel MoE tests (#37100)
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
Signed-off-by: Avinash Singh <107198269+avinashsingh77@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
|
2026-03-17 11:44:55 -07:00 |
|
Michael Goin
|
51b2333be1
|
[Perf] Optimize top-k search in apply_top_k_top_p_triton sampler (#37225)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-03-17 11:35:17 -07:00 |
|
Andreas Karatzas
|
4ed51308c8
|
[CI] Fix GPU memory leak when RemoteOpenAIServer fails to start in __init__ (#37230)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-17 09:08:08 -07:00 |
|
Cyrus Leung
|
c781fbbab3
|
[Bugfix] Standardize custom HF Processor init (#37289)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-17 15:38:55 +00:00 |
|
Richard Zou
|
979ff44cea
|
[BugFix] PyTorch Compilation Tests should error if any test fails (#37300)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-03-17 15:26:38 +00:00 |
|
Benjamin Chislett
|
f63ed7b5ac
|
[Bugfix] Fix DP MTP Dummy Run (#35243)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-03-17 11:16:48 -04:00 |
|
Ning Xie
|
c9e5096256
|
[openapi] remove redundant exception stack trace[4/N] (#37157)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2026-03-17 15:06:25 +00:00 |
|
Anton Vlasjuk
|
2ff0ad9694
|
[UltraVox] Fix output type (#37224)
Signed-off-by: vasqu <antonprogamer@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-17 14:51:17 +00:00 |
|
Isotr0py
|
a836524d20
|
[Chore] Replace all base64 usages with faster pybase64 package (#37290)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-17 14:44:19 +00:00 |
|
Bhoomit
|
3717a4dd47
|
[Misc][LoRA] Add --lora-target-modules to restrict LoRA to specific modules (#34984)
Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-17 14:36:41 +00:00 |
|
Harry Mellor
|
ecfcdd2ce4
|
Fix Phi3 test that fails with Transformers v5 (#37298)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-17 14:29:24 +00:00 |
|
Siew's Capital Jarvis
|
c25dbc2d27
|
[Bugfix] Fix unclean shutdown crash with AllReduce Fusion workspace (#36955)
Signed-off-by: Jarvis <brayden.stanley.0127@gmail.com>
|
2026-03-17 14:22:09 +00:00 |
|
Jonas M. Kübler
|
77d2a5f17b
|
pick up tuned prefill configs for FP8 FA3 (#36265)
Signed-off-by: Jonas M. Kübler <44084297+jmkuebler@users.noreply.github.com>
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
|
2026-03-17 07:00:26 -07:00 |
|
Sage
|
59192dfd39
|
[Frontend] Complete OpenAI render delegation (#37287)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
|
2026-03-17 13:53:55 +00:00 |
|
Umut Polat
|
56cb1baa66
|
[Misc] Use VLLMValidationError in batch, pooling, and tokenize protocol validators (#36256)
Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com>
|
2026-03-17 13:52:30 +00:00 |
|
Cyrus Leung
|
f340324335
|
[1/2] Move InternVL-based processors (#37260)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-17 21:50:56 +08:00 |
|
sfbemerk
|
2660b9289c
|
Bugfix for offloading+prefetch for GLM-4.7-FP8 (#37178)
Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com>
Co-authored-by: Benjamin Merkel <benjamin.merkel@tngtech.com>
|
2026-03-17 21:22:09 +08:00 |
|
Viacheslav
|
293f036e6d
|
Add gigachat 3.1 tool parser + fix gigachat3 tool parser (#36664)
Signed-off-by: Viacheslav Barinov <viacheslav.teh@gmail.com>
|
2026-03-17 12:03:20 +00:00 |
|
youkaichao
|
0fb142a454
|
[perf][connector] optimize build_connector_meta when host buffer transfer is not used (#37165)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2026-03-17 11:59:35 +00:00 |
|
Sage
|
00f8e0d211
|
[Frontend] Delegate tokenization serving preprocessing to OpenAIServingRender (#37266)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
|
2026-03-17 11:22:54 +00:00 |
|
zhao, zhenhui
|
4af9ed21cb
|
[Bugfix](xpu): prevent “selected index k out of range” in TP decode path (#37259)
Signed-off-by: zhenzhao <zhenzhao@habana.ai>
|
2026-03-17 11:14:07 +00:00 |
|
Augusto Yao
|
9c7cab5ebb
|
[Feature]: Support for multiple embedding types in a single inference call (#35829)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
|
2026-03-17 17:05:42 +08:00 |
|