Michael Goin
|
28534b92b9
|
Add Zurich vLLM Meetup (#28488)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-11-11 14:53:59 -08:00 |
|
wangxiyuan
|
d4902ba56d
|
[Misc] Cleanup Executor interface (#28441)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-11-11 22:28:07 +00:00 |
|
Kyuyeun Kim
|
df4d3a44a8
|
[TPU] Rename path to tpu platform (#28452)
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
|
2025-11-11 19:16:47 +00:00 |
|
Jee Jee Li
|
9d1c474704
|
[LoRA][1/N]Remove LoRA extra vocab (#28382)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-11 11:06:21 -08:00 |
|
Jie Luo
|
8c32c6e4b4
|
[Misc] fix typo in DCP comment (#28389)
Signed-off-by: Livinfly <luojie3m@gmail.com>
|
2025-11-11 10:59:16 -08:00 |
|
Canlin Guo
|
de120bc94f
|
[V0 deprecation] Clean up num_prefill_tokens logic for V0 (#28203)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
|
2025-11-11 10:57:12 -08:00 |
|
Jialin Ouyang
|
4228be7959
|
[Perf] Use np.ndarray instead of list[list[int]] to reduce GC overhead (#28245)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-11-11 10:28:47 -08:00 |
|
Lukas Geiger
|
76e4dcf225
|
[Misc] Remove unused attention prefix prefill ops functions (#26971)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-11-11 18:26:04 +00:00 |
|
Fanli Lin
|
d5edcb8678
|
[BugFix] Fix Siglip2Attention on XPU (#28448)
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
|
2025-11-11 18:18:02 +00:00 |
|
Xin Yang
|
6c3c0f8235
|
[Kernel] Optimize rms_norm kernel (#27931)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2025-11-11 18:02:23 +00:00 |
|
Matthew Bonanni
|
684f254585
|
Prefer FlashAttention MLA as default over FlashMLA (#27363)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-11-11 17:13:51 +00:00 |
|
Zhewen Li
|
e553424919
|
[CI/Build] Refactor Attention backend for test_prefix_prefill from xformers to SDPA (#28424)
Signed-off-by: zhewenli <zhewenli@meta.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-11-12 01:09:47 +08:00 |
|
xuebwang-amd
|
5a1271d83a
|
[Quantization] fix attention quantization of gpt_oss model (#27334)
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
|
2025-11-11 12:06:00 -05:00 |
|
xuebwang-amd
|
05576df85c
|
[ROCm][Quantization] extend AMD Quark to support mixed-precision quantized model (#24239)
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Co-authored-by: fxmarty-amd <felmarty@amd.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-11 12:05:22 -05:00 |
|
zhrrr
|
68c09efc37
|
[Kernel][Perf] fuse QK Norm and RoPE into one cuda kernel for Qwen Model (#27165)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
|
2025-11-11 12:00:31 -05:00 |
|
Nicolò Lucchesi
|
a7ef3eb0cd
|
[NIXL] Generalize block-first backend layouts (FlashInfer-like) (#28282)
|
2025-11-11 16:57:43 +00:00 |
|
Michael Goin
|
f9a4087182
|
Remove weight_scale.T special case for SM90 Block FP8 CUTLASS kernel (#28431)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-11-11 11:46:04 -05:00 |
|
the-codeboy
|
287bbbeb06
|
[Doc] Fix typo in serving docs (#28474)
Signed-off-by: the-codeboy <71213855+the-codeboy@users.noreply.github.com>
|
2025-11-11 16:45:49 +00:00 |
|
usberkeley
|
3143eb23fc
|
[BugFix] Add test_outputs.py to CI pipeline (#28466)
Signed-off-by: Bradley <bradley.b.pitt@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-11 16:01:30 +00:00 |
|
Fanli Lin
|
b886068056
|
[BugFix] Fix RuntimeError in PixtralHFAttention on CPU/XPU (#28444)
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
|
2025-11-11 15:29:33 +00:00 |
|
Mark McLoughlin
|
a90ad7d838
|
Add @markmc to CODEOWNERS for Observability (#28457)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-11-11 23:03:22 +08:00 |
|
jvlunteren
|
533b018f72
|
[BugFix] Fix Failing Ruff Check (#28469)
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
|
2025-11-11 06:41:43 -08:00 |
|
bnellnm
|
a1448b4b69
|
[Kernels] Split up fused_moe/layer.py, isolate more modular kernel code (#28064)
|
2025-11-11 07:29:02 -07:00 |
|
Maryam Tahhan
|
fa1970201d
|
[Docs] Fix grammar in CPU installation guide (#28461)
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
|
2025-11-11 14:01:11 +00:00 |
|
Ido Segev
|
3380543b20
|
Add request timeout override for multi-turn benchmarks (#28386)
Signed-off-by: Ido Segev <idos@pliops.com>
|
2025-11-11 13:41:18 +00:00 |
|
Cyrus Leung
|
afffd3cc8a
|
[Model] Pass mm_features directly into get_mrope_input_positions (#28399)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-11 21:14:48 +08:00 |
|
Chaojun Zhang
|
7dbe6d81d6
|
Fix Fused MoE LoRA Triton kernel bug (#28450)
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
|
2025-11-11 20:46:47 +08:00 |
|
Matthew Bonanni
|
b30dfa03c5
|
[Attention] Refactor CUDA attention backend selection logic (#24794)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-11-11 07:40:44 -05:00 |
|
Michael Goin
|
2e78150d24
|
[CI] Add mergify rules for nvidia label (#28417)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-11-11 04:28:28 -08:00 |
|
Ido Segev
|
d381eb967f
|
Multi turn benchmark progress bar for synthetic conversation generation (#28394)
Signed-off-by: Ido Segev <idos@pliops.com>
|
2025-11-11 11:06:04 +00:00 |
|
Lukas Geiger
|
9973e6e04a
|
[Model][Qwen3VL] Slighly speedup fast_pos_embed_interpolate (#28434)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-11-11 10:35:10 +00:00 |
|
Fanli Lin
|
c7991269dd
|
[BugFix] 'DeepseekV2Config' object has no attribute 'use_mla'` (#28387)
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
|
2025-11-11 08:45:38 +00:00 |
|
Jiangyun Zhu
|
f0359fffa4
|
[Bugfix] fix qwen3-next crash (#28202)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-11-11 08:24:28 +00:00 |
|
Sage Moore
|
798c7bebca
|
[EPLB] Refactor balance_packing to use numpy and optimize GPU-CPU transfers in EPLB (#28369)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-11-11 00:19:51 -08:00 |
|
Roger Wang
|
4fd4b743a2
|
[Bugfix] Fix max image size for PaddleOCR-VL (#28442)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-11-11 08:07:24 +00:00 |
|
David Ben-David
|
cc079763c5
|
[BugFix] Avoid calling KV connector layer APIs when metadata is unset (#28253)
Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
|
2025-11-10 23:39:36 -08:00 |
|
iAmir97
|
a7adbc6c6b
|
[Doc] Sleep mode documentation (#28357)
Signed-off-by: Amir Balwel <amir.balwel@embeddedllm.com>
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Co-authored-by: Amir Balwel <amir.balwel@embeddedllm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-10 22:44:35 -08:00 |
|
Robert Shaw
|
e605e8e323
|
[Bugfix] Fix Stream Sync for Shared Expert Overlap (#28430)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2025-11-11 05:59:08 +00:00 |
|
Zuyi Zhao
|
bca74e32b7
|
[Frontend] Add sagemaker_standards dynamic lora adapter and stateful session management decorators to vLLM OpenAI API server (#27892)
Signed-off-by: Zuyi Zhao <zhaozuy@amazon.com>
Signed-off-by: Shen Teng <sheteng@amazon.com>
Co-authored-by: Shen Teng <sheteng@amazon.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-11-11 04:57:01 +00:00 |
|
Zhuohan Li
|
8d706cca90
|
[Misc] FlattenLogprobs -> FlatLogprobs (#28335)
|
2025-11-11 03:41:23 +00:00 |
|
Xin Yang
|
57201a6a4c
|
Fix rotary embedding benchmark script (#28323)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2025-11-10 21:57:12 -05:00 |
|
Michael Goin
|
f2d9ad0620
|
Only register rocm_aiter_ops if aiter is found (#28428)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-11-11 02:53:24 +00:00 |
|
Wentao Ye
|
de540c0354
|
[Feature] Add env var VLLM_MOE_USE_DEEP_GEMM (#28422)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-11 02:29:48 +00:00 |
|
Lucas Wilkinson
|
39029d5192
|
[CI/Test Fix] Fix CP tests on Blackwell (#28404)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-11 01:36:29 +00:00 |
|
Wentao Ye
|
35d801f13f
|
[Feature] Refactor batch invariant fp8 DeepGEMM (#27606)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-11 00:08:40 +00:00 |
|
Matthew Bonanni
|
0bf29fadf5
|
[Test] Remove old non-varlen FA2 test (#28420)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-11-10 23:57:41 +00:00 |
|
Adrian Abeyta
|
a5a790eea6
|
[Bugfix] Ensure calculated KV scales are applied in attention. (#27232)
Signed-off-by: adabeyta <aabeyta@redhat.com>
|
2025-11-10 23:42:37 +00:00 |
|
Jialin Ouyang
|
b30372cbd0
|
[Perf] Move gc.freeze logic from EngineCoreProc to EngineCore for better coverage (#27896)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-11-10 15:34:18 -08:00 |
|
Ilya Markov
|
d17ecc6b19
|
[PERF] Allreduce fusion. Support torch native matching. Tuning of the thresholds (#24248)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-11-10 18:33:11 -05:00 |
|
Yong Hoon Shin
|
021143561f
|
[ROCm] Add missing gemm_a8w8_blockscale import (#28378)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-11-10 23:13:36 +00:00 |
|