Divakar Verma
|
22b42b5402
|
[CI][ROCm] Install arctic-inference on ROCm tests (#29344)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2025-11-25 02:15:39 +00:00 |
|
gbyu-amd
|
cb7214d8ea
|
[ROCm][MLA] enable fp8 MLA decode on ROCm (#28032)
Signed-off-by: guanbao <gyu@amd.com>
Signed-off-by: Guanbao Yu <gyu@amd.com>
Signed-off-by: gbyu-amd <Guanbao.Yu@amd.com>
Co-authored-by: guanbao <gyu@amd.com>
|
2025-11-25 10:15:02 +08:00 |
|
Pleaplusone
|
77e10c9cab
|
[Perf][Deepseek] optimize gather_and_maybe_dequant_cache kernel's perf for extremely long sequence (#28029)
Signed-off-by: ganyi <ygan@amd.com>
|
2025-11-24 19:05:46 -07:00 |
|
Michael Goin
|
6f1355a1b7
|
[Perf] Disable DeepGEMM MoE by default when TP=8 is used (#29346)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-11-24 19:01:40 -07:00 |
|
Harry Mellor
|
a4ad43ad5a
|
Scheduled removal of ParallelConfig's direct child EPLB fields (#29324)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-25 01:58:58 +00:00 |
|
Nick Hill
|
a178a0b40b
|
[BugFix] Fix duplicate id tool-call race condition (#29355)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-25 01:54:26 +00:00 |
|
Kunshang Ji
|
b8328b49fb
|
[XPU] upgrade torch & ipex 2.9 on XPU platform (#29307)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-11-25 09:34:47 +08:00 |
|
Hanjie Qiu
|
5f9679a43b
|
[Spec Decode] Add support for EAGLE3 heads that do not use_aux_hidden_states (#27688)
Signed-off-by: hjjq <hanjieq@nvidia.com>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
|
2025-11-24 20:13:12 -05:00 |
|
Wentao Ye
|
699bca76c0
|
[UX] Raise error for attn backend of batch invariant (#29348)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-24 17:49:01 -07:00 |
|
Michael Goin
|
c17610e2ba
|
[Bugfix] Only use triton_kernels for MXFP4 on SM90 and SM100 (#29339)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-11-24 18:22:46 -05:00 |
|
Chen Zhang
|
71df2a57ef
|
[Hybrid Allocator] Better layer padding strategy for gpt-oss eagle (#29303)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-11-24 14:28:32 -08:00 |
|
Tyler Michael Smith
|
4dd42db566
|
Remove VLLM_SKIP_WARMUP tip (#29331)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2025-11-24 22:16:05 +00:00 |
|
Nick Hill
|
84371daf75
|
[Tests] Verify gpt_oss package is installed in harmony tests (#29336)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-24 22:04:31 +00:00 |
|
Woosuk Kwon
|
f32c7d6f54
|
[Model Runner V2] Simplify Eagle bookkeeping with num_rejected (#29347)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-24 13:54:59 -08:00 |
|
Yan Ma
|
3cfa63ad99
|
[XPU]fix Kimi-VL-A3B-thinking on xpu (#29309)
Signed-off-by: Yan Ma <yan.ma@intel.com>
|
2025-11-24 21:02:21 +00:00 |
|
Benjamin Bartels
|
4d6afcaddc
|
[CI/Build] Moves to cuda-base runtime image while retaining minimal JIT dependencies (#29270)
Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
|
2025-11-24 11:40:54 -08:00 |
|
Woosuk Kwon
|
97588c4d12
|
[Model Runner V2] Add minor clarification comments for Eagle (#29332)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-24 11:28:56 -08:00 |
|
Chenheli Hua
|
839c6b7b72
|
[Multimodal][Qwen3 Omni] Make Qwen3 Omni work with audio-in-video inputs in V1 engine. (#27721)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-11-24 19:24:37 +00:00 |
|
bnellnm
|
8f066146c3
|
[MoE][Refactor] Make select_experts a non-static method (#29067)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-11-24 13:38:04 -05:00 |
|
Woosuk Kwon
|
cec418b5df
|
[Model Runner V2] Change Numba AoT to JIT (#29328)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-24 09:34:37 -08:00 |
|
Woosuk Kwon
|
cc313cb73d
|
[Model Runner V2] Implement Single-step Eagle 1 (#29300)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-24 09:32:27 -08:00 |
|
Nicolò Lucchesi
|
26a465584a
|
[NIXL] Use config to enable telemetry + NIXL version bump (#29305)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-11-24 17:18:04 +00:00 |
|
Varun Sundar Rabindranath
|
e924bbb4f4
|
[Build/CI][DP/EP] Add QWen/Qwen3-30B-A3B-FP8 + EPLB tests to Nightly H100 and B200 (#29195)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-11-24 16:06:17 +00:00 |
|
Aydin Abiar
|
656516c315
|
[Bugfix] properly handle nested json with llama3 tool parser (#27701)
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Signed-off-by: Aydin Abiar <62435714+Aydin-ab@users.noreply.github.com>
Co-authored-by: Aydin Abiar <aydin@anyscale.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-11-24 15:28:51 +00:00 |
|
vllmellm
|
e48b2e6848
|
[Bugfix] [ROCm] [UX] Reorganize ROCm Backend Selection Logic (#26980)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-11-24 15:24:49 +00:00 |
|
Laith Sakka
|
7a228b5305
|
Add option to use unbacked and backed size obl dynamic shapes for more sound compilation. (#26199)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-11-24 10:12:41 -05:00 |
|
Yuan Tang
|
f716a15372
|
Update KServe guide link in documentation (#29258)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
|
2025-11-24 14:40:05 +00:00 |
|
WeiQing Chen
|
2601f18a82
|
[EPLB] Optimize EPLB for Async Rearrange Experts (#22179)
Signed-off-by: David Chen <530634352@qq.com>
Co-authored-by: SunChenxiang123 <1291824390@qq.com>
|
2025-11-24 09:08:29 -05:00 |
|
R3hankhan
|
4de87866a8
|
[CPU][IBM Z] Fix BF16 support and vectorize math operations for s390x (#28926)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
|
2025-11-24 12:08:09 +00:00 |
|
Didier Durand
|
eca7a8fb59
|
[Doc]: fix typos in various files (#29230)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-24 11:10:48 +00:00 |
|
杰兮
|
8005e606bf
|
[Bugfix][Rocm] Fix shared expert weight loading failure in DeepSeek-MTP (#27563)
Signed-off-by: zhyajie <yajizhan@amd.com>
Co-authored-by: zhyajie <yajizhan@amd.com>
|
2025-11-24 10:16:52 +00:00 |
|
rongfu.leng
|
68dfe28eae
|
[Feature][Benchmark] add --link-vars can filter when serve_param equal bench_param (#28909)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-11-24 02:02:28 -08:00 |
|
Fanli Lin
|
ed40d85929
|
[BugFix] Fix R-VL model loading error (#29299)
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
|
2025-11-23 22:48:45 -08:00 |
|
Roger Wang
|
0ff70821c9
|
[Core] Deprecate xformers (#29262)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-11-24 04:18:55 +00:00 |
|
tongqiu
|
5253f4276f
|
[ROCm] Support for Whisper v1 with Aiter Unified Attention and Aiter Flash Attention (#28376)
Signed-off-by: apinge <Tong.Qiu2@amd.com>
|
2025-11-24 03:26:00 +00:00 |
|
Zero
|
30854783ad
|
[Model] Add OpenCUA-7B support (#29068)
Signed-off-by: lim4349 <rockmanzero@naver.com>
Signed-off-by: Zero <rockmanzero@naver.com>
Co-authored-by: Cloud User <ubuntu@a100-80g-4.novalocal>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-24 10:27:55 +08:00 |
|
Jee Jee Li
|
1073ba68b0
|
[LoRA] Optimize 3D MoE logic (#29222)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-24 10:27:23 +08:00 |
|
Josh Moore
|
c309bb5245
|
[Bugfix] Update Gradio OpenAI Chatbot Webserver example to new Gradio message history format (#29249)
Signed-off-by: joshiemoore <joshiemoore98@gmail.com>
|
2025-11-24 00:47:54 +00:00 |
|
Woosuk Kwon
|
3e1ad40655
|
[Model Runner V2] Add apply_temperature option to gumbel_sample (#29276)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-23 14:13:00 -08:00 |
|
Woosuk Kwon
|
62d54ba46d
|
[Model Runner V2] Optimize CUDA graph capture time (#29275)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-23 11:15:32 -08:00 |
|
Woosuk Kwon
|
b004c00418
|
[Model Runner V2] Support spec decoding [1/N] (#29274)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-23 10:09:06 -08:00 |
|
Woosuk Kwon
|
7f12c82fa6
|
[Model Runner V2] Change bookkeeping logic in preparation for spec decoding (#29194)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-23 09:42:52 -08:00 |
|
Luke
|
6fb0215eee
|
[Bugfix] Use lazy string reference for DeepseekV3Config in config registry (#28958)
Signed-off-by: Luke <yq0536@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-23 11:43:21 +00:00 |
|
Micah Williamson
|
55c21c8836
|
[ROCm][CI] Fix "Cannot re-initialize CUDA in forked subprocess" in test_pynccl.py (#29119)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2025-11-23 13:05:00 +08:00 |
|
rasmith
|
3999442f1c
|
[CI/Build][AMD] Add check for flash_att_varlen_func to test_tree_attention.py (#29252)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-23 04:45:08 +00:00 |
|
rasmith
|
71362ffab4
|
[CI/Build][AMD] Skip test_multi_shared_storage_connector_consistency in test_multi_connector.py due to hipErrorLaunchFailure when calling .cpu() (#29253)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-23 04:42:49 +00:00 |
|
Woosuk Kwon
|
20ee418adc
|
[Model Runner V2] Minor fix for cudagraph_utils (#29256)
|
2025-11-22 20:12:50 -08:00 |
|
Cyrus Leung
|
389aa1b2eb
|
[Doc] Update more docs with respect to V1 (#29188)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-23 10:58:48 +08:00 |
|
Michael Act
|
3ed767ec06
|
docs: fixes distributed executor backend config for multi-node vllm (#29173)
Signed-off-by: Michael Act <michael.a.c.tulenan@gdplabs.id>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-11-23 10:58:28 +08:00 |
|
jiahanc
|
5f96c00c55
|
[Fix] Add SM check to flashinfer MOE backend (#29144)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-11-23 00:39:30 +00:00 |
|