Zhengxu Chen
|
7349d5268b
|
[ez] Remove a trailing space from compilation/decorators.py (#22028)
|
2025-07-31 09:46:07 -07:00 |
|
Song
|
9484641616
|
[Model] Add step3 vl (#21998)
Signed-off-by: oliveryuan <yuansong@step.ai>
Co-authored-by: oliveryuan <yuansong@step.ai>
|
2025-07-31 23:19:06 +08:00 |
|
amirkl94
|
207b750e19
|
[NVIDIA] Add SM100 Flashinfer MoE per tensor scale fp8 backend (#21458)
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-07-31 06:00:01 -07:00 |
|
Nick Hill
|
5daffe7cf6
|
[BugFix] Fix case where collective_rpc returns None (#22006)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-31 12:51:37 +00:00 |
|
wang.yuqi
|
2836dd73f1
|
[Model][CI] Let more pooling models support v1 (#21747)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-07-31 01:51:15 -07:00 |
|
Daniele
|
d2aab336ad
|
[CI/Build] get rid of unused VLLM_FA_CMAKE_GPU_ARCHES (#21599)
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
|
2025-07-31 15:00:08 +08:00 |
|
Cyrus Leung
|
9532a6d563
|
[Deprecation] Remove deprecated args and methods (#21907)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-30 23:46:38 -07:00 |
|
Ning Xie
|
3e36fcbee6
|
[Bugfix]: fix metadata file copy in test_sharded_state_loader (#21830)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-07-31 06:22:11 +00:00 |
|
Michael Goin
|
055bd3978e
|
[CI Bugfix] Fix CI OOM for test_shared_storage_connector_hashes (#21973)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-31 11:45:29 +08:00 |
|
Jee Jee Li
|
0f7919fca0
|
[Misc] Expand SUPPORTED_HIDDEN_SIZES for DeepEP low-latency kernels (#21818)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-30 20:41:12 -07:00 |
|
Michael Goin
|
61445453df
|
[UX] Rename CUTLASS_MLA_VLLM_V1 to CUTLASS_MLA (#21966)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-30 20:40:34 -07:00 |
|
Sanchit Gandhi
|
ec02e536df
|
[Bugfix] Relax lang pin for voxtral (#21833)
Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-07-30 20:38:52 -07:00 |
|
Michael Goin
|
9cb497bfa3
|
[Example] Add async_llm_streaming.py example for AsyncLLM streaming in python (#21763)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-30 18:39:46 -06:00 |
|
Zebing Lin
|
ca9e2be3ed
|
[Core] Move EngineCoreRequest to Request conversion out of EngineCore (#21627)
Signed-off-by: linzebing <linzebing1995@gmail.com>
|
2025-07-30 15:00:54 -07:00 |
|
Bram
|
601f856d56
|
[Bugfix] Fix None value handling in trace span creation for cancelled requests (#20272)
|
2025-07-30 14:44:02 -07:00 |
|
cascade
|
287f527f54
|
[Feature] Add async tensor parallelism for scaled mm (#20155)
Signed-off-by: cascade812 <cascade812@outlook.com>
|
2025-07-30 17:23:41 -04:00 |
|
Ming Yang
|
f12d9256b3
|
[Misc] Use dracut on CentOS and skip clone if repo exists for EP kernel installation (#21635)
Signed-off-by: Ming Yang <minos.future@gmail.com>
|
2025-07-30 13:15:06 -07:00 |
|
Doug Smith
|
b9b753e7a7
|
For VLLM_USE_PRECOMPILED, only compiled .so files should be extracted (#21964)
|
2025-07-30 13:04:40 -07:00 |
|
Nick Hill
|
56bd537dde
|
[Misc] Support more collective_rpc return types (#21845)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-30 10:20:20 -07:00 |
|
wenxindongwork
|
8f0d516715
|
[TPU] Support Pathways in vLLM (#21417)
Signed-off-by: wenxindongwork <wenxindong@google.com>
|
2025-07-30 10:02:12 -07:00 |
|
wxsm
|
f4135232b9
|
feat(distributed): add get_required_kvcache_layout class method to kv connector api (#20433)
Signed-off-by: wxsm <wxsms@foxmail.com>
|
2025-07-30 16:41:51 +00:00 |
|
Chenguang Zheng
|
4904e53c32
|
[Bugfix] SharedStorage Connector for V1 PD multimodal (#21611)
Signed-off-by: fake0fan <645327136@qq.com>
Signed-off-by: herotai214 <herotai214@gmail.com>
Co-authored-by: herotai214 <herotai214@gmail.com>
|
2025-07-30 09:18:37 -07:00 |
|
Cyrus Leung
|
004203e953
|
[CI/Build] Fix registry tests (#21934)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-30 09:10:41 -07:00 |
|
633WHU
|
5c765aec65
|
[Bugfix] Fix TypeError in scheduler when comparing mixed request_id types (#21816)
Signed-off-by: chiliu <chiliu@paypal.com>
Co-authored-by: chiliu <chiliu@paypal.com>
|
2025-07-30 08:54:44 -07:00 |
|
Yong Hoon Shin
|
ad510309ee
|
Override attention metadata for fast prefill in some KV sharing setups (#21590)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-07-30 08:54:15 -07:00 |
|
Cyrus Leung
|
366f6b3a4d
|
[Bugfix] Fix multi-api server not working for text models (#21933)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-30 08:42:05 -07:00 |
|
Isotr0py
|
6e599eebe8
|
[Bugfix] Fix OOM tests in initialization test (#21921)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-30 07:35:47 -07:00 |
|
Harry Mellor
|
88edf5994c
|
[Docs] Reduce the size of the built docs (#21920)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-30 07:35:08 -07:00 |
|
Po-Han Huang (NVIDIA)
|
ff08e51940
|
[NVIDIA] Fix Llama4 Scout FP4 functionality issues (#21499)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
|
2025-07-30 07:33:40 -07:00 |
|
Ruixiang Tan
|
8f4a1c9a04
|
[Misc] Improve code readability of KVCacheManager (#21673)
Signed-off-by: tanruixiang <tanruixiang0104@gmail.com>
Signed-off-by: Ruixiang Tan <819464715@qq.com>
Signed-off-by: GitHub <noreply@github.com>
|
2025-07-30 07:20:43 -07:00 |
|
Harry Mellor
|
36ede45989
|
Reduce time wasted in GitHub Actions using concurrency (#21919)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-30 07:18:02 -07:00 |
|
Cyrus Leung
|
0e40b26073
|
[CI/Build] Only run markdownlint in CI (#21892)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-30 07:17:14 -07:00 |
|
Wentao Ye
|
0271c2ff2f
|
[Test] Add Benchmark and Unit Test for per_token_group_quant (#21860)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-30 07:15:02 -07:00 |
|
youkaichao
|
e91d3c9cda
|
[misc] skip p2p check by default (#21904)
|
2025-07-30 22:05:04 +08:00 |
|
Yan Pashkovsky
|
bf668b5bf5
|
[Feature] Support multiple api keys in server (#18548)
Signed-off-by: Yan Pashkovsky <yanp.bugz@gmail.com>
|
2025-07-30 07:03:23 -07:00 |
|
rongfu.leng
|
da3e0bd6e5
|
[Bugfix] we should use metavar is not choices (#21902)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-07-30 06:51:58 -07:00 |
|
Cyrus Leung
|
fcfd1eb9c5
|
[Doc] Remove vLLM prefix and add citation for PagedAttention (#21910)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-30 06:36:34 -07:00 |
|
aladerran
|
d979dd6beb
|
[Feature][EPLB] Add eplb support for Qwen3 (#20815)
Signed-off-by: aladerran <aladerran@gmail.com>
|
2025-07-30 06:27:57 -07:00 |
|
Eric Curtin
|
b876860c62
|
[Hardware][CPU] Build fix for ARM without BF16 (#21848)
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
|
2025-07-30 06:22:00 -07:00 |
|
Patrick von Platen
|
13986365a9
|
Add @patrickvonplaten as maintainer of mistral's related files. (#21928)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
|
2025-07-30 20:42:51 +08:00 |
|
Hongsheng Liu
|
5c8fe389d6
|
[Docs] Fix the example code of streaming chat completions in reasoning (#21825)
Signed-off-by: wangzi <3220100013@zju.edu.cn>
Co-authored-by: wangzi <3220100013@zju.edu.cn>
Co-authored-by: Zi Wang <66560864+BruceW-07@users.noreply.github.com>
|
2025-07-30 12:11:58 +00:00 |
|
Cyrus Leung
|
5bbaf492a6
|
[Doc] Update partial support (#21916)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-30 01:32:39 -07:00 |
|
Peter Pan
|
533db0935d
|
[benchmark] add max-concurrency in result table (#21095)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
|
2025-07-30 01:15:43 -07:00 |
|
Jee Jee Li
|
fc91da5499
|
[Model] Remove DSV2 unused code (#21903)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-30 00:55:03 -07:00 |
|
Varun Vinayak Shenoy
|
547795232d
|
[Tests] Fixing bug inside MultiModalProfiler. (#21842)
Signed-off-by: Varun Shenoy <varun.vinayak.shenoy@oracle.com>
|
2025-07-30 00:44:15 -07:00 |
|
Kebe
|
30ef30ed5a
|
[CI] rollback lint-and-deploy pipeline using amd machine (#21912)
Signed-off-by: Kebe <mail@kebe7jun.com>
|
2025-07-30 00:37:59 -07:00 |
|
Jee Jee Li
|
02f82fe438
|
[Doc] Update Intern-S1 info (#21908)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-29 23:58:57 -07:00 |
|
Cyrus Leung
|
2ca5f82c2a
|
[Misc] Remove redundant config definitions (#21891)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-29 23:54:18 -07:00 |
|
Louie Tsai
|
6f8d261882
|
Update vLLM Benchmark Suite for Xeon based on 0.9.2 release (#21486)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
|
2025-07-30 05:57:03 +00:00 |
|
Ricardo Decal
|
4cd7fe6cea
|
[Docs] Expand introduction to Ray in Multi-node deployment section (#21584)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
|
2025-07-29 22:07:28 -07:00 |
|