Doug Smith
|
b9b753e7a7
|
For VLLM_USE_PRECOMPILED, only compiled .so files should be extracted (#21964)
|
2025-07-30 13:04:40 -07:00 |
|
Nick Hill
|
56bd537dde
|
[Misc] Support more collective_rpc return types (#21845)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-30 10:20:20 -07:00 |
|
wenxindongwork
|
8f0d516715
|
[TPU] Support Pathways in vLLM (#21417)
Signed-off-by: wenxindongwork <wenxindong@google.com>
|
2025-07-30 10:02:12 -07:00 |
|
wxsm
|
f4135232b9
|
feat(distributed): add get_required_kvcache_layout class method to kv connector api (#20433)
Signed-off-by: wxsm <wxsms@foxmail.com>
|
2025-07-30 16:41:51 +00:00 |
|
Chenguang Zheng
|
4904e53c32
|
[Bugfix] SharedStorage Connector for V1 PD multimodal (#21611)
Signed-off-by: fake0fan <645327136@qq.com>
Signed-off-by: herotai214 <herotai214@gmail.com>
Co-authored-by: herotai214 <herotai214@gmail.com>
|
2025-07-30 09:18:37 -07:00 |
|
Cyrus Leung
|
004203e953
|
[CI/Build] Fix registry tests (#21934)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-30 09:10:41 -07:00 |
|
633WHU
|
5c765aec65
|
[Bugfix] Fix TypeError in scheduler when comparing mixed request_id types (#21816)
Signed-off-by: chiliu <chiliu@paypal.com>
Co-authored-by: chiliu <chiliu@paypal.com>
|
2025-07-30 08:54:44 -07:00 |
|
Yong Hoon Shin
|
ad510309ee
|
Override attention metadata for fast prefill in some KV sharing setups (#21590)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-07-30 08:54:15 -07:00 |
|
Cyrus Leung
|
366f6b3a4d
|
[Bugfix] Fix multi-api server not working for text models (#21933)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-30 08:42:05 -07:00 |
|
Isotr0py
|
6e599eebe8
|
[Bugfix] Fix OOM tests in initialization test (#21921)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-30 07:35:47 -07:00 |
|
Harry Mellor
|
88edf5994c
|
[Docs] Reduce the size of the built docs (#21920)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-30 07:35:08 -07:00 |
|
Po-Han Huang (NVIDIA)
|
ff08e51940
|
[NVIDIA] Fix Llama4 Scout FP4 functionality issues (#21499)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
|
2025-07-30 07:33:40 -07:00 |
|
Ruixiang Tan
|
8f4a1c9a04
|
[Misc] Improve code readability of KVCacheManager (#21673)
Signed-off-by: tanruixiang <tanruixiang0104@gmail.com>
Signed-off-by: Ruixiang Tan <819464715@qq.com>
Signed-off-by: GitHub <noreply@github.com>
|
2025-07-30 07:20:43 -07:00 |
|
Harry Mellor
|
36ede45989
|
Reduce time wasted in GitHub Actions using concurrency (#21919)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-30 07:18:02 -07:00 |
|
Cyrus Leung
|
0e40b26073
|
[CI/Build] Only run markdownlint in CI (#21892)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-30 07:17:14 -07:00 |
|
Wentao Ye
|
0271c2ff2f
|
[Test] Add Benchmark and Unit Test for per_token_group_quant (#21860)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-30 07:15:02 -07:00 |
|
youkaichao
|
e91d3c9cda
|
[misc] skip p2p check by default (#21904)
|
2025-07-30 22:05:04 +08:00 |
|
Yan Pashkovsky
|
bf668b5bf5
|
[Feature] Support multiple api keys in server (#18548)
Signed-off-by: Yan Pashkovsky <yanp.bugz@gmail.com>
|
2025-07-30 07:03:23 -07:00 |
|
rongfu.leng
|
da3e0bd6e5
|
[Bugfix] we should use metavar is not choices (#21902)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-07-30 06:51:58 -07:00 |
|
Cyrus Leung
|
fcfd1eb9c5
|
[Doc] Remove vLLM prefix and add citation for PagedAttention (#21910)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-30 06:36:34 -07:00 |
|
aladerran
|
d979dd6beb
|
[Feature][EPLB] Add eplb support for Qwen3 (#20815)
Signed-off-by: aladerran <aladerran@gmail.com>
|
2025-07-30 06:27:57 -07:00 |
|
Eric Curtin
|
b876860c62
|
[Hardware][CPU] Build fix for ARM without BF16 (#21848)
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
|
2025-07-30 06:22:00 -07:00 |
|
Patrick von Platen
|
13986365a9
|
Add @patrickvonplaten as maintainer of mistral's related files. (#21928)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
|
2025-07-30 20:42:51 +08:00 |
|
Hongsheng Liu
|
5c8fe389d6
|
[Docs] Fix the example code of streaming chat completions in reasoning (#21825)
Signed-off-by: wangzi <3220100013@zju.edu.cn>
Co-authored-by: wangzi <3220100013@zju.edu.cn>
Co-authored-by: Zi Wang <66560864+BruceW-07@users.noreply.github.com>
|
2025-07-30 12:11:58 +00:00 |
|
Cyrus Leung
|
5bbaf492a6
|
[Doc] Update partial support (#21916)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-30 01:32:39 -07:00 |
|
Peter Pan
|
533db0935d
|
[benchmark] add max-concurrency in result table (#21095)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
|
2025-07-30 01:15:43 -07:00 |
|
Jee Jee Li
|
fc91da5499
|
[Model] Remove DSV2 unused code (#21903)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-30 00:55:03 -07:00 |
|
Varun Vinayak Shenoy
|
547795232d
|
[Tests] Fixing bug inside MultiModalProfiler. (#21842)
Signed-off-by: Varun Shenoy <varun.vinayak.shenoy@oracle.com>
|
2025-07-30 00:44:15 -07:00 |
|
Kebe
|
30ef30ed5a
|
[CI] rollback lint-and-deploy pipeline using amd machine (#21912)
Signed-off-by: Kebe <mail@kebe7jun.com>
|
2025-07-30 00:37:59 -07:00 |
|
Jee Jee Li
|
02f82fe438
|
[Doc] Update Intern-S1 info (#21908)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-29 23:58:57 -07:00 |
|
Cyrus Leung
|
2ca5f82c2a
|
[Misc] Remove redundant config definitions (#21891)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-29 23:54:18 -07:00 |
|
Louie Tsai
|
6f8d261882
|
Update vLLM Benchmark Suite for Xeon based on 0.9.2 release (#21486)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
|
2025-07-30 05:57:03 +00:00 |
|
Ricardo Decal
|
4cd7fe6cea
|
[Docs] Expand introduction to Ray in Multi-node deployment section (#21584)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
|
2025-07-29 22:07:28 -07:00 |
|
Cyrus Leung
|
16f3250527
|
[CI/Build] Fix pre-commit failure in docs (#21897)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-29 21:53:08 -07:00 |
|
Tao He
|
e3bc17ceea
|
Add @sighingnow as maintainer of qwen's related files. (#21895)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
|
2025-07-29 21:30:44 -07:00 |
|
Kunshang Ji
|
05cbbe20c5
|
[XPU] use ZE_AFFINITY_MASK for device select on xpu (#21815)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-07-30 03:56:14 +00:00 |
|
wang.yuqi
|
65f311ce59
|
[Frontend] Add LLM.reward specific to reward models (#21720)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-07-29 20:56:03 -07:00 |
|
Wentao Ye
|
1b0a155534
|
[Perf] Using __nv_fp8_e4m3 instead of c10::e4m3 for per_token_group_quant (#21867)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-29 21:50:46 -06:00 |
|
Cyrus Leung
|
44bc46da60
|
[Bugfix] Actually disable processing cache when API server is scaled out (#21839)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-29 20:36:04 -07:00 |
|
MingzhenHan
|
b7b23da4d2
|
[Bugfix] Fix comment typo of get_num_common_prefix_blocks() (#21827)
Signed-off-by: MingzhenHan <hanmingzhen2002@outlook.com>
|
2025-07-29 20:35:33 -07:00 |
|
Areeb Syed
|
fdde18229e
|
[Bugfix] Fix shape mismatch assertion error when loading Gemma3n model with BitsAndBytes quantization (#21808)
Signed-off-by: sydarb <areebsyed237@gmail.com>
|
2025-07-30 11:35:21 +08:00 |
|
Csrayz
|
b917da442b
|
Expose PyTorch profiler configuration to environment variables (#21803)
Signed-off-by: Csrayz <33659823+Csrayz@users.noreply.github.com>
|
2025-07-29 19:46:31 -07:00 |
|
Michael Goin
|
fb58e3a651
|
[Docs] Update docker.md with HF_TOKEN, new model, and podman fix (#21856)
|
2025-07-29 19:45:41 -07:00 |
|
Chen Zhang
|
76080cff79
|
[DOC] Fix path of v1 related figures (#21868)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-07-29 19:45:18 -07:00 |
|
Harry Mellor
|
ba5c5e5404
|
[Docs] Switch to better markdown linting pre-commit hook (#21851)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-29 19:45:08 -07:00 |
|
Chen Zhang
|
555e7225bc
|
[v1][attention] Support Hybrid Allocator + FlashInfer (#21412)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-07-30 01:45:29 +00:00 |
|
milesial
|
0e36abf993
|
[Bugfix] Correct max tokens for non-contiguous embeds (#21798)
Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Co-authored-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
|
2025-07-30 01:16:25 +00:00 |
|
Simon Mo
|
452b2a3180
|
[ci] mark blackwell test optional for now (#21878)
|
2025-07-29 18:03:27 -07:00 |
|
Simon Mo
|
0d0cc9e150
|
[ci] add b200 test placeholder (#21866)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-07-29 17:11:50 -07:00 |
|
Yong Hoon Shin
|
9266d98048
|
[BugFix] Fix interleaved sliding window not set for Gemma3n (#21863)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-07-29 16:34:19 -07:00 |
|