Harry Mellor
|
54aecd9ed5
|
Fix pre-commit (and XPU) on main (#28556)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-12 06:13:41 -08:00 |
|
wangxiyuan
|
10138c92a5
|
[V0 deprecation] Deprecate use_v1 parameter (#28112)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-11-12 14:03:52 +00:00 |
|
Jee Jee Li
|
a9d18b5107
|
[Bugfix] Fix gpt_oss packed_modules_mapping (#28536)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-12 21:02:06 +08:00 |
|
TJian
|
edb59a9470
|
[ROCm] [Bugfix] Fix fused_qknorm_rope_kernel rocm compatibility (#28500)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-11-12 05:01:14 -08:00 |
|
ZhengHongming888
|
c5f10cc139
|
add cpu option for p/d in nixl_connector (#28356)
Signed-off-by: Hongming Zheng <hongming.zheng@intel.com>
|
2025-11-12 11:53:08 +00:00 |
|
ziruiliu
|
d143152308
|
[KVConnector] Enable get_block_ids_with_load_errors() in LMCache connector (#27978)
Signed-off-by: Zirui Liu <ziliu@ddn.com>
Signed-off-by: ziruiliu <ziliu@ddn.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2025-11-12 11:44:58 +01:00 |
|
Chaojun Zhang
|
a4730c1b4f
|
[XPU]Fix crash due to removed VLLM_USE_V1 attribute (#28520)
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
|
2025-11-12 10:20:55 +00:00 |
|
wuyaoxuehun
|
d3ade61e42
|
[Model] fix glm4_moe_mtp load weights with GLM-4.6 checkpoint. (#27597)
Signed-off-by: wuao.scotty <wuao.scotty@bytedance.com>
Co-authored-by: wuao.scotty <wuao.scotty@bytedance.com>
|
2025-11-12 10:14:00 +00:00 |
|
yyzxw
|
1761dea1a8
|
[BugFix]: --enable-lora with model granite-4.0-micro crash (#27733)
Signed-off-by: zxw <1020938856@qq.com>
|
2025-11-12 09:03:56 +00:00 |
|
Huamin Li
|
c748355e0d
|
[CI] Introduce autorun_on_main feature (#27836)
Signed-off-by: Huamin Li <3ericli@gmail.com>
|
2025-11-12 08:51:19 +00:00 |
|
Chenguang Zheng
|
91864b79b3
|
[CI/Build] Fix crash due to removed VLLM_USE_V1 attribute in EPD (#28521)
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-11-11 23:09:33 -08:00 |
|
Lukas Geiger
|
ac0bb2c307
|
[Core] Cache vllm_is_batch_invariant (#28304)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-11-12 05:03:01 +00:00 |
|
ai-jz
|
f31419ed8b
|
[Benchmark] Add retry support to fix workload bias in multi-turn benchmark (#28493)
|
2025-11-12 05:00:45 +00:00 |
|
Fanli Lin
|
b9ce9a3013
|
[BugFix] Add fallback path in apply_rotary_pos_emb_flashattn for non-cuda platforms (#28447)
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
|
2025-11-12 03:13:21 +00:00 |
|
Chenguang Zheng
|
4ccffe561f
|
[Core] Encoder separation for Encode-Prefill-Decode Disaggregation (#25233)
Signed-off-by: n00909098 <nguyen.kha.long@huawei.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: herotai214 <herotai214@gmail.com>
Signed-off-by: Khuong Le <khuong.le.manh@huawei.com>
Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com>
Co-authored-by: n00909098 <nguyen.kha.long@huawei.com>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: herotai214 <herotai214@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Khuong Le <khuong.le.manh@huawei.com>
Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com>
|
2025-11-11 18:58:33 -08:00 |
|
Lukas Geiger
|
cbb799e314
|
[Model][Qwen3VL] Simplify get_mrope_input_positions using numpy (#28302)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-11-12 02:55:10 +00:00 |
|
Andreas Karatzas
|
9f0247cfa4
|
VLLM_USE_TRITON_FLASH_ATTN V0 variable deprecation (#27611)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Andreas Karatzas <Andreas.Karatzas@amd.com>
|
2025-11-11 18:34:36 -08:00 |
|
Li, Jiang
|
7f829be7d3
|
[CPU] Refactor CPU attention backend (#27954)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-11-12 09:43:06 +08:00 |
|
wangxiyuan
|
e1710393c4
|
[[V0 deprecation]]Remove VLLM_USE_V1 env (#28204)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-11-11 18:22:16 -07:00 |
|
Isotr0py
|
3f770f4427
|
[Performance] Cache loaded custom logitsprocs to avoid overheads (#28462)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-11 16:49:29 -08:00 |
|
Yanan Cao
|
48c879369f
|
[Frontend] Change CompilationMode to a proper Enum (#28165)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2025-11-11 19:46:18 -05:00 |
|
Ilya Markov
|
1788aa1efb
|
[BugFix] Graceful handling of torch symm mem errors. (#27671)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-11-11 17:41:54 -07:00 |
|
Adrian Abeyta
|
d23539549a
|
Use FLASHINFER MLA backend when testing fp8_kv_scale_compile (#28491)
Signed-off-by: adabeyta <aabeyta@redhat.com>
|
2025-11-12 00:34:58 +00:00 |
|
Max Hu
|
412e153df5
|
[Feature] Allow configuring FlashInfer workspace size (#28269)
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-11 23:32:20 +00:00 |
|
Michael Goin
|
e5f599d4d1
|
[Bugfix] Disable shared expert overlap if Marlin MoE is used (#28410)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-11-11 23:16:12 +00:00 |
|
Michael Goin
|
28534b92b9
|
Add Zurich vLLM Meetup (#28488)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-11-11 14:53:59 -08:00 |
|
wangxiyuan
|
d4902ba56d
|
[Misc] Cleanup Executor interface (#28441)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-11-11 22:28:07 +00:00 |
|
Kyuyeun Kim
|
df4d3a44a8
|
[TPU] Rename path to tpu platform (#28452)
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
|
2025-11-11 19:16:47 +00:00 |
|
Jee Jee Li
|
9d1c474704
|
[LoRA][1/N]Remove LoRA extra vocab (#28382)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-11 11:06:21 -08:00 |
|
Jie Luo
|
8c32c6e4b4
|
[Misc] fix typo in DCP comment (#28389)
Signed-off-by: Livinfly <luojie3m@gmail.com>
|
2025-11-11 10:59:16 -08:00 |
|
Canlin Guo
|
de120bc94f
|
[V0 deprecation] Clean up num_prefill_tokens logic for V0 (#28203)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
|
2025-11-11 10:57:12 -08:00 |
|
Jialin Ouyang
|
4228be7959
|
[Perf] Use np.ndarray instead of list[list[int]] to reduce GC overhead (#28245)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-11-11 10:28:47 -08:00 |
|
Lukas Geiger
|
76e4dcf225
|
[Misc] Remove unused attention prefix prefill ops functions (#26971)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-11-11 18:26:04 +00:00 |
|
Fanli Lin
|
d5edcb8678
|
[BugFix] Fix Siglip2Attention on XPU (#28448)
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
|
2025-11-11 18:18:02 +00:00 |
|
Xin Yang
|
6c3c0f8235
|
[Kernel] Optimize rms_norm kernel (#27931)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2025-11-11 18:02:23 +00:00 |
|
Matthew Bonanni
|
684f254585
|
Prefer FlashAttention MLA as default over FlashMLA (#27363)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-11-11 17:13:51 +00:00 |
|
Zhewen Li
|
e553424919
|
[CI/Build] Refactor Attention backend for test_prefix_prefill from xformers to SDPA (#28424)
Signed-off-by: zhewenli <zhewenli@meta.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-11-12 01:09:47 +08:00 |
|
xuebwang-amd
|
5a1271d83a
|
[Quantization] fix attention quantization of gpt_oss model (#27334)
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
|
2025-11-11 12:06:00 -05:00 |
|
xuebwang-amd
|
05576df85c
|
[ROCm][Quantization] extend AMD Quark to support mixed-precision quantized model (#24239)
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Co-authored-by: fxmarty-amd <felmarty@amd.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-11 12:05:22 -05:00 |
|
zhrrr
|
68c09efc37
|
[Kernel][Perf] fuse QK Norm and RoPE into one cuda kernel for Qwen Model (#27165)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
|
2025-11-11 12:00:31 -05:00 |
|
Nicolò Lucchesi
|
a7ef3eb0cd
|
[NIXL] Generalize block-first backend layouts (FlashInfer-like) (#28282)
|
2025-11-11 16:57:43 +00:00 |
|
Michael Goin
|
f9a4087182
|
Remove weight_scale.T special case for SM90 Block FP8 CUTLASS kernel (#28431)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-11-11 11:46:04 -05:00 |
|
the-codeboy
|
287bbbeb06
|
[Doc] Fix typo in serving docs (#28474)
Signed-off-by: the-codeboy <71213855+the-codeboy@users.noreply.github.com>
|
2025-11-11 16:45:49 +00:00 |
|
usberkeley
|
3143eb23fc
|
[BugFix] Add test_outputs.py to CI pipeline (#28466)
Signed-off-by: Bradley <bradley.b.pitt@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-11 16:01:30 +00:00 |
|
Fanli Lin
|
b886068056
|
[BugFix] Fix RuntimeError in PixtralHFAttention on CPU/XPU (#28444)
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
|
2025-11-11 15:29:33 +00:00 |
|
Mark McLoughlin
|
a90ad7d838
|
Add @markmc to CODEOWNERS for Observability (#28457)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-11-11 23:03:22 +08:00 |
|
jvlunteren
|
533b018f72
|
[BugFix] Fix Failing Ruff Check (#28469)
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
|
2025-11-11 06:41:43 -08:00 |
|
bnellnm
|
a1448b4b69
|
[Kernels] Split up fused_moe/layer.py, isolate more modular kernel code (#28064)
|
2025-11-11 07:29:02 -07:00 |
|
Maryam Tahhan
|
fa1970201d
|
[Docs] Fix grammar in CPU installation guide (#28461)
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
|
2025-11-11 14:01:11 +00:00 |
|
Ido Segev
|
3380543b20
|
Add request timeout override for multi-turn benchmarks (#28386)
Signed-off-by: Ido Segev <idos@pliops.com>
|
2025-11-11 13:41:18 +00:00 |
|