Cyrus Leung
|
879a06579e
|
[CI/Build] Bump transformers version (#27528)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-31 22:11:07 -07:00 |
|
yugong333
|
29de3cdee4
|
Adding SplitK in fused_moe_lora kernel (#27818)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-01 12:55:46 +08:00 |
|
Yan Ma
|
7e2729b57e
|
[Multimodal][XPU]Enable vision attn backend for xpu platform (#27525)
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yejing Lai <yejing.lai@intel.com>
Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-11-01 04:45:02 +00:00 |
|
Jee Jee Li
|
3a5de7d2d6
|
[Bugfix] Fix KDA output (#27905)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-01 11:54:36 +08:00 |
|
Jee Jee Li
|
bc4486d609
|
[Kernel] Enable FusedMoEModularKernel support bias (#27754)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-01 02:05:12 +00:00 |
|
Nick Hill
|
0cdbe7b744
|
[Core] Async scheduling + structured outputs compatibility (#26866)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-01 00:35:04 +00:00 |
|
Chen Zhang
|
df334868ca
|
[Hybrid] A simpler algorithm to find kernel_block_size (#26476)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-10-31 21:30:28 +00:00 |
|
Bram Wasti
|
0e0a638c3b
|
Batch invariance doc (#27839)
Signed-off-by: Bram Wasti <bwasti@meta.com>
Signed-off-by: Bram Wasti <bwasti@fb.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-31 17:22:19 -04:00 |
|
Matthew Bonanni
|
f29aeb5a25
|
Add FLASHINFER_MLA to test_mla_backends and add B200 CI run (#27663)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-10-31 11:12:19 -07:00 |
|
Vinay R Damodaran
|
5e8862e9e0
|
[Feature] Pydantic validation for scheduler.py and structured_outputs.py (#26519)
Signed-off-by: Vinay Damodaran <vrdn@hey.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-31 18:05:50 +00:00 |
|
Nick Hill
|
9e5bd3076e
|
[Cleanup] Remove no-longer-used SpeculativeConfig.enable_chunked_prefill (#27826)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-31 10:57:45 -07:00 |
|
Shu Wang
|
fc16f1c477
|
Flashinfer_CUTLASS_MOE fuses quantization for TP (#27223)
Signed-off-by: Shu Wang. <shuw@nvidia.com>
|
2025-10-31 17:54:29 +00:00 |
|
ZiTian Zhao
|
bc306fe5e9
|
fix incorrect type annotation in KimiMLP (#27885)
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
|
2025-10-31 17:38:02 +00:00 |
|
Chenguang Zheng
|
103a468bbf
|
[bugfix] Missing cached item in beam search (#27874)
Signed-off-by: fake0fan <645327136@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-10-31 17:34:27 +00:00 |
|
Rob Mulla
|
70bfbd7b16
|
Docs update tpu install instructions (#27824)
Signed-off-by: Rob Mulla <rob.mulla@gmail.com>
Signed-off-by: Rob Mulla <RobMulla@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-31 10:29:55 -07:00 |
|
GuanLuo
|
d6517be3cd
|
[Bugfix] Missing NIXL metadata for handshake initialization if instance spans multi-node (#26338)
Signed-off-by: Guan Luo <gluo@nvidia.com>
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
Signed-off-by: Guan Luo <41310872+GuanLuo@users.noreply.github.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2025-10-31 10:16:00 -07:00 |
|
Isotr0py
|
7e06c40e63
|
[Bugfix] Fix broken MRoPE for GLM-4.1V/GLM-4.5V (#27860)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-10-31 17:04:51 +00:00 |
|
Madeesh Kannan
|
675704ac01
|
[Bugfix] Allow 64-bit integer values for LoRA IDs to avoid overflow/truncation (#27876)
Signed-off-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
|
2025-10-31 16:58:42 +00:00 |
|
Jee Jee Li
|
0384aa7150
|
[CI/Build] Add gpt-oss LoRA test (#27870)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-10-31 22:17:21 +08:00 |
|
Jiangyun Zhu
|
3857eb8725
|
[Perf] Decouple torch op from GDA to leverage torch.compile (#27871)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-10-31 21:35:52 +08:00 |
|
Huamin Li
|
933cdea440
|
[BugFix] Don’t compute reorder threshold when there are no attention groups (#27861)
|
2025-10-31 11:36:18 +00:00 |
|
Isotr0py
|
3933f18a5e
|
[Bugfix] Avoid too small block m/n for FlexAttention kernel option (#27853)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-10-31 19:33:12 +08:00 |
|
toncao
|
e5ef4dfc11
|
[Kimi-Linear] Correct prefixes and add compatibility to AWQ quants (#27834)
Signed-off-by: toncao <cpatonn@gmail.com>
Co-authored-by: toncao <cpatonn@gmail.com>
|
2025-10-31 17:36:37 +08:00 |
|
Akash kaothalkar
|
36960501d3
|
[Hardware][Powerpc] Fix VLLM_CPU_OMP_THREADS_BIND="auto" low CPU utilization for Power (#27734)
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
|
2025-10-31 07:45:26 +00:00 |
|
Seiji Eicher
|
b2e65cb4a7
|
[benchmark] Make request IDs unique across clients by default (#27723)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-10-30 17:40:35 -07:00 |
|
Wentao Ye
|
2bf0bcc1fc
|
[CI Test] Add Scheduled Integration Test (#27765)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-30 17:29:26 -07:00 |
|
Jakub Sochacki
|
697f507a8e
|
[CI/Build][Intel] Enable performance benchmarks for Intel Gaudi 3 (#26919)
Signed-off-by: jakub-sochacki <jakub.sochacki@wp.pl>
|
2025-10-31 07:57:22 +08:00 |
|
Matthew Bonanni
|
d5d2a0fe74
|
[Misc] Make all tool scripts executable (#27831)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-10-30 23:46:02 +00:00 |
|
Nick Hill
|
c9791f1813
|
[BugFix] Fix broken import in initialize_ray_cluster() (#27838)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-30 16:26:13 -07:00 |
|
Paul Zhang
|
e7acb20076
|
[Feature] Batch invariant torch.compile (#27660)
Signed-off-by: PaulZhang12 <paulzhan@fb.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-10-30 13:11:29 -07:00 |
|
Jialin Ouyang
|
4b68c4a55b
|
[Core][Perf] Only invoke save_new_computed_blocks when computed blocks are not empty (#27799)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-10-30 19:47:30 +00:00 |
|
Wentao Ye
|
a8141fa649
|
[Refactor] Remove VLLM_DEEPEP_LOW_LATENCY_ALLOW_NVLINK (#27750)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-30 15:32:39 -04:00 |
|
Sumanth R Hegde
|
4917002523
|
[Fix] Skip record_sleep_state logic in PrometheusStatsLogger if not in dev mode (#27789)
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
|
2025-10-30 19:26:27 +00:00 |
|
cong-meta
|
a2981c4272
|
[EP/DP][API Server] Enable DP-aware routing in OpenAI API requests (#24945)
Co-authored-by: Cong Chen <prowindy@gmail.com>
|
2025-10-30 12:10:16 -07:00 |
|
Jialin Ouyang
|
4574d48bab
|
[Core][Bookkeeping] Update cu_num_accepted_tokens for all req_index (#27629)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-10-30 11:52:36 -07:00 |
|
Tyler Michael Smith
|
ab98f6556f
|
[Bugfix] Fix 2 precommit issues - (mamba_block_size, kv_cache_config) (#27811)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-10-30 11:52:18 -07:00 |
|
Roger Meier
|
2918c1b49c
|
[Model] Use the same fused_moe configs for all H200 devices (#23642)
Signed-off-by: Roger Meier <r.meier@siemens.com>
Tag: v0.11.1rc5
|
2025-10-30 17:36:56 +00:00 |
|
Mengqing Cao
|
1004205795
|
[MTP] Refactor mtp predictor to avoid d2h operation (#27643)
Signed-off-by: MengqingCao <cmq0113@163.com>
|
2025-10-30 17:27:39 +00:00 |
|
Huy Do
|
ba33e8830d
|
Reapply "Install pre-built xformers-0.0.32.post2 built with pt-2.9.0" (#27768)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2025-10-30 10:22:30 -07:00 |
|
Kebe
|
33a0ea5f32
|
[Docs] add Shanghai Meetup - 2025/10 (#27545)
Signed-off-by: Kebe <mail@kebe7jun.com>
Signed-off-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: esmeetu <jasonailu87@gmail.com>
|
2025-10-31 00:33:13 +08:00 |
|
Ilya Markov
|
60f76baa66
|
[Misc] Replace CUDA_VISIBLE_DEVICES in DP with torch.cuda.set_device for device selection on cuda-like devices (#27564)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2025-10-30 11:41:44 -04:00 |
|
Varun Sundar Rabindranath
|
e5e076cad7
|
[BugFix] Stopgap - Flashinfer Autotuner + GPT-OSS + DP/TP (#27762)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-10-30 08:24:31 -07:00 |
|
Li, Jiang
|
eebf00cb0c
|
[Bugfix][CPU] Fix MRoPE dispatch on the CPU backend (#27800)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-10-30 15:12:05 +00:00 |
|
Fan Yin
|
9956aae4ea
|
[Model][Ouro] Support Ouro Model (#27794)
Signed-off-by: yinfan.1024 <yinfan.1024@bytedance.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: yinfan.1024 <yinfan.1024@bytedance.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-10-30 22:34:41 +08:00 |
|
Zhewen Li
|
0fe0140408
|
[KV offload] Enable CPU KV offload on CUDA alike Platforms (#27770)
Signed-off-by: zhewenli <zhewenli@meta.com>
|
2025-10-30 22:10:29 +08:00 |
|
Zhiyuan Li
|
4e68cc9b6a
|
[Model] Introduce Kimi Linear to vLLM (#27809)
Signed-off-by: lizhiyuan <lizhiyuan@moonshot.cn>
Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com>
|
2025-10-30 21:02:27 +08:00 |
|
Huamin Li
|
1994de99ea
|
[CI Failure] Fix test_kv_cache_model_load_and_run (#27717)
Signed-off-by: Huamin Li <3ericli@gmail.com>
|
2025-10-30 12:27:53 +00:00 |
|
wang.yuqi
|
4464723f22
|
[Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. (#25524)
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-10-30 12:13:05 +00:00 |
|
Sairam Pillai
|
74374386e2
|
[Bugfix] Improve GPU validation logging in Ray fallback scenarios (#25775)
Signed-off-by: Sairam Pillai <sairam.pillai61@gmail.com>
|
2025-10-30 11:57:59 +00:00 |
|
Wentao Ye
|
c01f6e525f
|
[CI] Fix mypy for vllm/v1/core and vllm/v1/engine (#27108)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-30 11:32:17 +00:00 |
|