Mengqing Cao
|
f3a683b7c9
|
[Bugfix][Logprobs] Fix logprobs op to support more backend (#21591)
Signed-off-by: MengqingCao <cmq0113@163.com>
|
2025-07-25 05:53:07 -07:00 |
|
Cyrus Leung
|
46d81d6951
|
[V1] Get supported tasks from model runner instead of model config (#21585)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-25 05:36:45 -07:00 |
|
Jee Jee Li
|
5c3f2628d5
|
[Quantization] Enable BNB support for more MoE models (#21370)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-25 03:57:34 -07:00 |
|
Kebe
|
7311f74468
|
[Bugfix] GGUF: fix AttributeError: 'PosixPath' object has no attribute 'startswith' (#21579)
Signed-off-by: Kebe <mail@kebe7jun.com>
|
2025-07-25 03:42:23 -07:00 |
|
Xu Wenqing
|
8ed01e32f7
|
Add H20-3e fused MoE kernel tuning configs for Qwen3-Coder-480B-A35B-Instruct (#21598)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
|
2025-07-25 02:36:55 -07:00 |
|
Nick Hill
|
e38e96a3c0
|
[Tests] Harden DP tests (#21508)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-25 02:27:24 -07:00 |
|
Chengji Yao
|
40d86ee412
|
[TPU][Bugfix] fix OOM issue in CI test (#21550)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-07-24 23:01:53 -07:00 |
|
Yang Chen
|
85d051f026
|
[Misc] Removed undefined cmake variables MOE_PERMUTE_ARCHS (#21262)
Signed-off-by: Yang Chen <yangche@fb.com>
|
2025-07-24 22:54:23 -07:00 |
|
Ignacio Sica
|
5140f54b89
|
[CI/Build] fix cpu_extension for apple silicon (#21195)
Signed-off-by: ignaciosica <mignacio.sica@gmail.com>
|
2025-07-24 22:53:59 -07:00 |
|
Chengji Yao
|
947edd099e
|
[Misc][Tools] make max-model-len a parameter in auto_tune script (#21321)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-07-24 22:46:43 -07:00 |
|
hfan
|
fde60ee775
|
[Model] Fix a check for None but the return value was empty list in Gemma3 MM vision_embeddings (#21479)
Signed-off-by: Hongmin Fan <fanhongmin@google.com>
|
2025-07-25 13:46:06 +08:00 |
|
Jason Gu
|
b38bc652ac
|
[Model] Support tensor parallel for timm ViT in Deepseek_vl2 (#21494)
Signed-off-by: wzqd <1057337859@qq.com>
|
2025-07-24 22:45:16 -07:00 |
|
Ning Xie
|
adaf2c6d4f
|
[Bugfix] fix modelscope snapshot_download serialization (#21536)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-07-24 22:44:38 -07:00 |
|
Li, Jiang
|
42343f1f89
|
[CI] Update CODEOWNERS for CPU and Intel GPU (#21582)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-07-24 21:58:03 -07:00 |
|
Benji Beck
|
965bc71b04
|
Integrate TensorSchema with shape validation for Phi3VImagePixelInputs (#21232)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-07-24 21:43:52 -07:00 |
|
Zhou Fang
|
807a328bb6
|
[Docs] Add requirements/common.txt to run unit tests (#21572)
Signed-off-by: Zhou Fang <fang.github@gmail.com>
|
2025-07-24 20:51:15 -07:00 |
|
QiliangCui
|
e0be2c4d09
|
[TPU][Test] Temporarily suspend this MoE model in test_basic.py. (#21560)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-07-24 20:44:50 -07:00 |
|
Nick Hill
|
9c8b2c2a8a
|
[DP] Support api-server-count > 0 in hybrid DP LB mode (#21510)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-24 20:18:16 -07:00 |
|
Varun Sundar Rabindranath
|
2212cd6cfb
|
[Bugfix] DeepGemm utils : Fix hardcoded type-cast (#21517)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-07-24 20:17:29 -07:00 |
|
Burkhard Ringlein
|
ce3a9b1378
|
[Kernel] adding fused_moe configs for upcoming granite4 (#21332)
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-07-24 20:16:59 -07:00 |
|
Yuxuan Zhang
|
2ce90e5b01
|
Fix GLM-4 PP Missing Layer When using with PP. (#21531)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
|
2025-07-24 20:07:38 -07:00 |
|
Wentao Ye
|
633f6e804b
|
[Bug] Fix DeepGemm Init Error (#21554)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-24 20:07:22 -07:00 |
|
Harry Mellor
|
b57296bb9a
|
[Docs] Fix site_url for RunLLM (#21564)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-24 20:05:58 -07:00 |
|
Cyrus Leung
|
34ddcf9ff4
|
[Frontend] run-batch supports V1 (#21541)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-24 20:05:55 -07:00 |
|
Woosuk Kwon
|
fe56180c7f
|
[MoE] More balanced expert sharding (#21497)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
|
2025-07-24 15:56:08 -07:00 |
|
QiliangCui
|
07d80d7b0e
|
[TPU][TEST] HF_HUB_DISABLE_XET=1 the test 3. (#21539)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-07-24 15:33:04 -07:00 |
|
weiliang
|
2dd72d23d9
|
update flashinfer to v0.2.9rc1 (#21485)
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com>
|
2025-07-24 14:06:11 -07:00 |
|
Simon Mo
|
a6c7fb8cff
|
[Docs] Add Expert Parallelism Initial Documentation (#21373)
Signed-off-by: simon-mo <simon.mo@hey.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-24 12:36:06 -07:00 |
|
Ricardo Decal
|
a7272c23d0
|
[Docs][minor] Fix broken gh-file link in distributed serving docs (#21543)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
|
2025-07-24 10:36:56 -07:00 |
|
Juncheng Gu
|
6066284914
|
[P/D] Support CPU Transfer in NixlConnector (#18293)
Signed-off-by: Juncheng Gu <juncgu@gmail.com>
Signed-off-by: Richard Liu <ricliu@google.com>
Co-authored-by: Richard Liu <39319471+richardsliu@users.noreply.github.com>
Co-authored-by: Richard Liu <ricliu@google.com>
|
2025-07-24 17:58:42 +01:00 |
|
Rui Qiao
|
1e9ea8e69d
|
[P/D] Move FakeNixlWrapper to test dir (#21328)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-07-24 08:53:45 -07:00 |
|
Chaojun Zhang
|
d9f9a3fd96
|
[XPU] Conditionally import CUDA-specific passes to avoid import errors on xpu platform (#21036)
Signed-off-by: chzhang <chaojun.zhang@intel.com>
|
2025-07-24 23:23:36 +08:00 |
|
Shu Wang
|
1b25f1fe75
|
Update flashinfer CUTLASS MoE Kernel (#21408)
Signed-off-by: Shu Wang. <shuw@nvidia.com>
|
2025-07-24 08:13:31 -07:00 |
|
Wentao Ye
|
e8cb0d0495
|
[Bug] Fix Compressed Tensor NVFP4 cutlass_fp4_group_mm illegal memory access (#21465)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-24 08:13:24 -07:00 |
|
Ricardo Decal
|
684174115d
|
[Docs] Rewrite Distributed Inference and Serving guide (#20593)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-24 08:13:05 -07:00 |
|
Sanger Steel
|
cdb79ee63d
|
[Docs] Update Tensorizer usage documentation (#21190)
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
Signed-off-by: William Goldby <willgoldby@gmail.com>
Co-authored-by: William Goldby <willgoldby@gmail.com>
|
2025-07-24 06:56:18 -07:00 |
|
elvischenv
|
5a19a6c670
|
[Fix] Update mamba_ssm to 2.2.5 (#21421)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-07-24 03:25:41 -07:00 |
|
Ming Yang
|
2ded067fd2
|
[Bugfix] Fix CUDA arch flags for MoE permute (#21426)
Signed-off-by: Ming Yang <minos.future@gmail.com>
|
2025-07-24 03:23:59 -07:00 |
|
Harry Mellor
|
13abd0eaf9
|
[Model] Officially support Emu3 with Transformers backend (#21319)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-24 03:22:12 -07:00 |
|
Lucas Wilkinson
|
61b8cea3b4
|
[Attention] Optimize FlashInfer MetadataBuilder Build call (#21137)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-07-24 03:21:46 -07:00 |
|
cjackal
|
526078a96c
|
bump flashinfer to v0.2.8 (#21385)
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
|
2025-07-24 03:20:38 -07:00 |
|
Chauncey
|
6da0078523
|
[Feat] Allow custom naming of vLLM processes (#21445)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-07-24 03:15:23 -07:00 |
|
Rui Qiao
|
73e3949d07
|
[Misc] Improve comment for DPEngineCoreActor._set_cuda_visible_devices() (#21501)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-07-24 03:13:40 -07:00 |
|
Shintarou Okada
|
6eca337ce0
|
Replace --expand-tools-even-if-tool-choice-none with --exclude-tools-when-tool-choice-none for v0.10.0 (#20544)
Signed-off-by: okada <kokuzen@gmail.com>
Signed-off-by: okada shintarou <okada@preferred.jp>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-24 02:56:36 -07:00 |
|
Yuxuan Zhang
|
85bda9e7d0
|
remove GLM-4.5 quantization wrong Code (#21435)
|
2025-07-24 01:52:43 -07:00 |
|
22quinn
|
610852a423
|
[Core] Support model loader plugins (#21067)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-07-24 01:49:44 -07:00 |
|
Nick Hill
|
f0f4de8f26
|
[Misc] Fix duplicate FusedMoEConfig debug messages (#21455)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-24 01:27:30 -07:00 |
|
Zhou Fang
|
fc5f756db4
|
[v1][Core] Clean up usages of SpecializedManager (#21407)
Signed-off-by: Zhou Fang <fang.github@gmail.com>
|
2025-07-24 00:40:11 -07:00 |
|
Chengji Yao
|
e74bfc70e4
|
[TPU][Bugfix] fix moe layer (#21340)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-07-24 00:38:39 -07:00 |
|
Gregory Shtrasberg
|
90eeea8f85
|
[Bugfix][ROCm] Fix for warp_size uses on host (#21205)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-07-24 00:37:19 -07:00 |
|