Kuntai Du
|
86dca07d9b
|
[Hybrid allocator + kv connector] revert connector test changes related to hybrid allocator (#28011)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
|
2025-11-05 10:36:31 +00:00 |
|
Qiu
|
16b37f3119
|
[bugfix] fix wrong dcp_local_seq_lens calc (#27518)
Signed-off-by: Qiu <qiuchunshuo@huawei.com>
|
2025-11-05 17:58:13 +08:00 |
|
Chauncey
|
0976711f3b
|
[Refactor] to simplify and extract the shared logic between chat completion and responses (#27961)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-11-05 15:46:39 +08:00 |
|
Chauncey
|
e261d37c9a
|
[Refactor] Lazy-loaded reasoning_parser (#28092)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-11-05 15:37:02 +08:00 |
|
Alex Brooks
|
b7cbc25416
|
[Model, Core] Support Granite Speech & LoRA for STT (#24455)
|
2025-11-05 08:33:48 +01:00 |
|
Lucas Wilkinson
|
d43ad5a757
|
[BugFix] Fix DCP Assert (AssertionError: DCP not support reorder_batch_threshold > 1 now.) (#28100)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-11-05 14:54:43 +08:00 |
|
Isotr0py
|
0ff05e3770
|
[Bugfix] Fix encoder-only model support for transformers backend (#28021)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-04 22:24:41 -08:00 |
|
wangxiyuan
|
428bc7bf1c
|
[V0 deprecation] Remove VLLM_USE_V1 usage in most modules (#27955)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-11-04 20:51:16 -08:00 |
|
Zhewen Li
|
878fd5a16f
|
[CI/Build] Enable some fixed tests in AMD CI (#28078)
Signed-off-by: zhewenli <zhewenli@meta.com>
|
2025-11-05 03:15:59 +00:00 |
|
Kunshang Ji
|
18b39828d9
|
[XPU] Add gpt-oss model support for Intel GPU (#27786)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-11-05 02:17:23 +00:00 |
|
tou
|
4ea62b77f5
|
[Qwen3-Next] MOE configs for A100-SXM4-80GB TP4 TP8 (#27740)
|
2025-11-05 09:25:09 +08:00 |
|
Vadim Gimpelson
|
d4e547bb7e
|
Revert "[PERF] Decouple projections from GDN custom op" (#28080)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2025-11-04 15:58:23 -08:00 |
|
Aleksandr Malyshev
|
2d977a7a9e
|
[ROCm] gemm_a16w16 upstreaming (#26969)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
|
2025-11-04 16:01:00 -05:00 |
|
Chenheli Hua
|
1fb4217a05
|
[Multimodal] Make MediaConnector extensible. (#27759)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-11-04 18:28:01 +00:00 |
|
nadavkluger
|
611c86ea3c
|
Added disable rule to track files under benchmarks/lib (#28048)
Signed-off-by: Nadav Kluger <nadav.k@fmr.ai>
|
2025-11-04 18:18:43 +00:00 |
|
Pleaplusone
|
dc937175d4
|
[ROCm][Perf] New design on ROCm AITER MHA backend Implementation (#25763)
Signed-off-by: ganyi <ygan@amd.com>
|
2025-11-04 18:05:33 +00:00 |
|
Harry Mellor
|
2f1cc8cef1
|
Remove deprecated --rope-scaling and --rope-theta (#28006)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-04 18:01:56 +00:00 |
|
Nick Hill
|
938a81692e
|
[AsyncScheduling] Don't schedule past request max_tokens (#27922)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-04 17:06:28 +00:00 |
|
Nick Hill
|
c9f66da8fd
|
[PerfFix] Avoid separate thread for MP executor shm spin (#28012)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-04 08:33:55 -08:00 |
|
yt0428
|
05cae69f0f
|
[model] Add support for openPangu_Ultra_MoE (#27521)
Signed-off-by: yuantao <2422264527@qq.com>
Signed-off-by: yt0428 <51468697+yt0428@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-04 08:17:20 -08:00 |
|
Vadim Gimpelson
|
5fd8f02ea9
|
[PERF] Decouple projections from GDN custom op (#27512)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2025-11-04 08:11:41 -08:00 |
|
lyrisz
|
97e3dda84b
|
[Perf] SM100 - add swap AB optimization to CUTLASS FP8 GEMM (#27284)
Signed-off-by: Faqin Zhong <faqin.zhong@gmail.com>
Co-authored-by: Faqin Zhong <zhofaqin@amazon.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-11-04 07:49:25 -08:00 |
|
Nick Hill
|
5a0a6dfd55
|
[BugFix] Fix incorrect preallocated sampled_token_ids tensor size (#28025)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-04 07:38:16 -08:00 |
|
bnellnm
|
938772af03
|
[Kernels] Isolate modular kernel code from FusedMoEMethodBase subclasses. (#27123)
|
2025-11-04 21:59:45 +08:00 |
|
tomeras91
|
e4ee658672
|
[Model] add optimal triton fused moe configs for NemotronH MoE (#27967)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
|
2025-11-04 12:59:43 +00:00 |
|
tomeras91
|
77f8001f53
|
[Model][Bugfix] fix pipeline parallelism support for NemotronH (#27968)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
|
2025-11-04 12:28:36 +00:00 |
|
Zhuohan Li
|
300a265978
|
[Core] Enable StatLogger in LLMEngine (#28020)
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
|
2025-11-04 04:13:35 -08:00 |
|
Jerry Zhang
|
03c4c4aa9d
|
Support using Int4PreshuffledTensor after loading (#26066)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
|
2025-11-04 06:00:57 -05:00 |
|
yugong333
|
2ec401bc39
|
Load tuned fused_moe_lora shrink and expand kernel configs separately (#27435)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-04 18:27:35 +08:00 |
|
Varun Sundar Rabindranath
|
4022a9d279
|
[BugFix][Performance] Restore flashinfer autotuning for all scenarios (#27904)
|
2025-11-04 15:56:21 +08:00 |
|
Zhewen Li
|
53f6e81dfd
|
[CI/Build] Fix OpenAI API correctness on AMD CI (#28022)
Signed-off-by: zhewenli <zhewenli@meta.com>
|
2025-11-04 07:20:50 +00:00 |
|
CSWYF3634076
|
43a6acfb7d
|
[Model] fix ernie45 reasoning_parser (#27973)
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
|
2025-11-04 07:16:46 +00:00 |
|
Mark McLoughlin
|
58279c60b5
|
[KV Connector] Make KVCacheConfig an explicit constructor argument (#27887)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-11-03 23:00:49 -08:00 |
|
Zhewen Li
|
2f84ae1f27
|
[CI/Build] Update LM Eval Version in AMD CI (#27944)
Signed-off-by: zhewenli <zhewenli@meta.com>
|
2025-11-04 06:36:40 +00:00 |
|
xiangze-arm
|
f32cbc9a0c
|
[CPU]Improve dynamic 4bit moe performance (#27240)
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>
|
2025-11-04 06:33:23 +00:00 |
|
Wentao Ye
|
7e4be74104
|
[Bug] Batch invariant: Fix flash attn MLA RuntimeError: scheduler_metadata must have shape (metadata_size) (#27884)
|
2025-11-04 14:05:55 +08:00 |
|
Mark McLoughlin
|
380ba6816d
|
[Metrics] Enable sleep state metric outside of dev mode (#27867)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-11-03 20:35:36 -08:00 |
|
liuzhenwei
|
14a125a06d
|
[NIXL][XPU] Pin NIXL version to 0.7.0 (#27849)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
|
2025-11-04 03:28:35 +00:00 |
|
Chauncey
|
c02fccdbd2
|
[Refactor] Lazy import tool_parser (#27974)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-11-04 10:10:10 +08:00 |
|
li2haipeng
|
6ddae74054
|
[LoRA] Lora shrink swizzle (#27694)
Signed-off-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com>
Signed-off-by: Haipeng Li <li2haipeng@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-04 09:30:20 +08:00 |
|
vllmellm
|
b13a447546
|
[Bugfix][ROCm] Fix ViT rotary embeddings for torch.compile compatibility on ROCm (#27748)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-11-03 17:12:19 -08:00 |
|
QiliangCui
|
7956b0c0bc
|
Remove the tpu docker image nightly build. (#27997)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-11-04 00:35:54 +00:00 |
|
Tyler Michael Smith
|
3758757377
|
[Bugfix] Fix MoE Routing Simulation (#28002)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2025-11-03 22:26:49 +00:00 |
|
Hank_
|
ccd3e55e51
|
[Bugfix][plugin] fla crash on plugin (#27322)
|
2025-11-04 05:27:03 +08:00 |
|
Matthew Bonanni
|
01baefe674
|
Add TP parameter to attention tests (#27683)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-11-03 13:04:40 -08:00 |
|
Ning Xie
|
786030721e
|
[Docs] add runai_streamer_sharded to LoadConfig (#27937)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-11-03 20:35:16 +00:00 |
|
Matthew Bonanni
|
145c00a4d3
|
[Bugfix] change FlashMLA reorder_batch_threshold (#27777)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-11-03 15:17:10 -05:00 |
|
Lucas Kabela
|
55011aef24
|
[Bugfix][Qwen][Multimodal] Move Qwen2_5_vl sdpa to custom op and reenable compile (#27764)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2025-11-03 11:12:15 -08:00 |
|
Sophie du Couédic
|
a4398fbb5e
|
[Feature][Benchmarks] Support inf burstiness (#26941)
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
|
2025-11-03 18:33:17 +00:00 |
|
Aurick Qiao
|
2c19d96777
|
[Spec Decode] Integrate Suffix Decoding from Arctic Inference (#25784)
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
|
2025-11-03 09:23:31 -08:00 |
|