Wentao Ye
|
d88a1df699
|
[Deprecation] Deprecate profiling envs (#33722)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-04 05:58:21 +00:00 |
|
Cyrus Leung
|
90d74ebaa4
|
[Deprecation] Remove _get_data_parser in MM processor (#33757)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-04 05:51:52 +00:00 |
|
Frank Wang
|
45f8fd6f97
|
[Feature] Enable TRITON_ATTN for Batch Invariance (#33688)
Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
|
2026-02-04 13:27:34 +08:00 |
|
Wentao Ye
|
5e1e0a0fbd
|
[Refactor] Remove unused dead code (#33718)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-03 21:25:11 -08:00 |
|
Michael Goin
|
eb5ed20743
|
[Bugfix] Define router_logits_dtype for remaining MoE models (#33737)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-02-04 13:24:14 +08:00 |
|
Huy Do
|
2647163674
|
Save startup benchmark results as a list of values (#33629)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2026-02-03 20:37:51 -08:00 |
|
Shanshan Shen
|
9fb27dd3b3
|
[MM] Align the prefix of MMEncoderAttention with Attention (#33750)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2026-02-04 04:07:30 +00:00 |
|
R3hankhan
|
4dffc5e044
|
[CPU] Split attention dispatch by head_dim alignment (#32161)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
|
2026-02-03 19:37:15 -08:00 |
|
Andrew Xia
|
e1bf04b6c2
|
[1/N] Initial Implementation of Parser for ResponsesAPI (#32712)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2026-02-04 10:59:03 +08:00 |
|
Isotr0py
|
02080179a3
|
[Bugfix] Fix torchrun PP broadcast deadlock with async scheduling (#33701)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-02-04 02:17:37 +00:00 |
|
wang.yuqi
|
1b8fe6f7c4
|
[Frontend][4/n] Make pooling entrypoints request schema consensus | ScoreRequest (#33060)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-02-04 01:48:40 +00:00 |
|
Nick Hill
|
52ee21021a
|
[BugFix][Spec Decoding] Fix negative accepted tokens metric crash (#33729)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-03 23:34:41 +00:00 |
|
Wentao Ye
|
655efb3e69
|
[Dependency] Remove comments of ray in dependency files (#33351)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-03 15:30:47 -08:00 |
|
Matthew Bonanni
|
bd8da29a66
|
[Bugfix] Fix sparse MLA metadata building (#33579)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-02-03 15:29:48 -08:00 |
|
Michael Goin
|
2a99c5a6c8
|
[Bugfix] Disable TRTLLM FP8 MoE if router_logits_dtype==float32 and routing_method!=DeepSeekV3 (#33613)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-02-03 13:26:51 -08:00 |
|
Patrick von Platen
|
3f7662d650
|
[Voxtral Realtime] Change name (#33716)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
|
2026-02-03 13:03:28 -08:00 |
|
Vadim Gimpelson
|
a372f3f40a
|
[MISC] Fix Tensor Parallelism for Quantized Mamba Models with n_groups=1 (#33257)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-02-03 15:10:31 -05:00 |
|
Harry Mellor
|
61e632aea1
|
Turn @config into a dataclass_transform (#31541)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-03 17:40:59 +00:00 |
|
Richard Zou
|
b1bb18de8d
|
[torch.compile] Significantly speed up cold start times (#33641)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-02-03 09:12:11 -08:00 |
|
Lucas Wilkinson
|
2267cb1cfd
|
[Attention][FA3] Update FA3 to include new swizzle optimization (#23465)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-02-03 08:08:47 -08:00 |
|
dtc
|
0d6ccf68fa
|
[P/D] rework mooncake connector and introduce its bootstrap server (#31034)
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
|
2026-02-03 08:08:25 -08:00 |
|
Cyrus Leung
|
18e7cbbb15
|
[Bugfix] Fix startup hang for Granite Speech (#33699)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-03 15:57:56 +00:00 |
|
Patrick von Platen
|
f0d5251715
|
[Voxtral models] Skip warm-up to skip confusing error message in warm-up (#33576)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-03 07:22:34 -08:00 |
|
Shanshan Shen
|
5c4f2dd6ef
|
[MM] Pass prefix parameter to MMEncoderAttention (#33674)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2026-02-03 06:47:41 -08:00 |
|
wang.yuqi
|
f3d8a34671
|
[Bugfix] Do not add extra \n for image-only cases when constructing multimodal text prompts. (#33647)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-02-03 06:43:47 -08:00 |
|
shaharmor98
|
4bc913aeec
|
Feat/add nemotron nano v3 tests (#33345)
|
2026-02-03 08:52:49 -05:00 |
|
Kuntai Du
|
fbb3cf6981
|
[Bugfix][Async][Connector] avoid vllm-side double free during async scheduling + request abort + async KV cache transfer (#33377)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
|
2026-02-03 21:50:15 +08:00 |
|
Krish Gupta
|
2df2b3499d
|
Document NixlConnector backend selection via kv_connector_extra_config (#33552)
Signed-off-by: KrxGu <krishom70@gmail.com>
|
2026-02-03 05:49:59 -08:00 |
|
Harry Mellor
|
2a8d84e66d
|
Fix Gemma3n audio encoder for Transformers v5 (#33673)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-03 05:49:49 -08:00 |
|
zxy
|
a3acfa1071
|
[Models] Intern-S1-Pro (#33636)
Signed-off-by: zxy <zhou0493@e.ntu.edu.sg>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-02-03 05:49:45 -08:00 |
|
Harry Mellor
|
be8168ff88
|
Fix Gemma3 GGUF for Transformers v5 (#33683)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-03 12:36:53 +00:00 |
|
Harry Mellor
|
f6af34626d
|
Fix offline test for Transformers v5 (#33682)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-03 12:07:24 +00:00 |
|
Song Zhixin
|
ceab70c89d
|
[Bugfix] fix qwen3-asr response error (#33644)
Signed-off-by: jesse <szxfml@gmail.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-02-03 03:33:56 -08:00 |
|
Cyrus Leung
|
52683ccbe1
|
[Misc] Update default image format of encode_base64 (#33656)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-03 03:13:16 -08:00 |
|
Michael Goin
|
e346e2d056
|
[Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] TRTLLM per-tensor FP8 MoE (#33620)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-02-03 10:37:15 +00:00 |
|
Cyrus Leung
|
83449a5ff0
|
[Refactor] Clean up pooling serial utils (#33665)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-03 10:29:18 +00:00 |
|
Lucas Hänke de Cansino
|
dad2d6a590
|
[Bugfix][Model] Fix DeepSeek-OCR-2 chat template to include BOS token (#33642)
Signed-off-by: l4b4r4b4b4 <lucas.cansino@mail.de>
|
2026-02-03 00:35:58 -08:00 |
|
Isotr0py
|
32e84fa1ff
|
[CI/Build] Investigate torchrun distributed tests hanging issue (#33650)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-02-03 15:49:17 +08:00 |
|
Richard Zou
|
fd9c83d0e0
|
[torch.compile] Document the workaround to standalone_compile failing (#33571)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-02-03 07:16:55 +00:00 |
|
杨朱 · Kiki
|
b95cc5014d
|
[Misc] Remove deprecated VLLM_ALL2ALL_BACKEND environment variable (#33535)
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-02-03 15:01:59 +08:00 |
|
Nick Hill
|
61397891ce
|
[Minor] Some code simplification in scheduler.py (#33597)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-03 15:00:00 +08:00 |
|
杨朱 · Kiki
|
ef248ff740
|
[Misc] Remove deprecated profiler environment variables (#33536)
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-02-03 14:58:44 +08:00 |
|
Kunshang Ji
|
e10604480b
|
[XPU][1/N] Deprecate ipex and switch to vllm-xpu-kernels for xpu platform (#33379)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-02-02 22:46:10 -08:00 |
|
Chauncey
|
bf001da4bf
|
[Bugfix] Interleaved thinking keeps compatibility with reasoning_content (#33635)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Koushik Dutta <koushd@gmail.com>
|
2026-02-03 06:46:05 +00:00 |
|
杨朱 · Kiki
|
a0a984ac2e
|
[CI/Build] Remove hardcoded America/Los_Angeles timezone from Dockerfiles (#33553)
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-02-02 22:32:39 -08:00 |
|
Shengliang Xu
|
f1cb9b5544
|
Fix quantized Falcon-H1 model loading issues (#32728)
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-02 22:31:27 -08:00 |
|
Daniel Mescheder
|
4c4b6f7a97
|
[Frontend] Add sampling parameters to Responses API (#32609)
Signed-off-by: Daniel Mescheder <dmesch@amazon.com>
Co-authored-by: Daniel Mescheder <dmesch@amazon.com>
|
2026-02-03 13:51:10 +08:00 |
|
Roger Wang
|
10546f925a
|
[Bugfix] Fix mm budget setting for Qwen Omni models (#33634)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2026-02-03 04:56:25 +00:00 |
|
Radu Salavat
|
e69c990c21
|
[Feature][CPU Backend]: Optimize ARM vectorization backend (#30329)
Signed-off-by: Radu Salavat <radu.salavat@arm.com>
|
2026-02-02 20:17:56 -08:00 |
|
Richard Zou
|
5eac9a1b34
|
[torch.compile] Don't do the fast moe cold start optimization if there is speculative decoding (#33624)
Signed-off-by: Richard Zou <zou3519@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-02-03 03:38:49 +00:00 |
|