ElizaWszola
|
502640c3f9
|
[Perf] Fix and reapply move apply w8a8 block fp8 linear to class (#25696)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: ElizaWszola <elizaw.9289@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
|
2025-10-02 19:35:13 +00:00 |
|
Chen Zhang
|
3d5f1c8640
|
[Mamba][KVCacheManager] Simplify kv cache manage logic for mamba + MTP (#25119)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-10-02 18:48:31 +00:00 |
|
Ekagra Ranjan
|
1cab2f9cad
|
EAGLE 3: Fix preamble so that measured speedup over Eagle 1 becomes 32% instead of 5% on MTBench (#25916)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
|
2025-10-02 11:29:35 -07:00 |
|
Chen Zhang
|
1e50f1be70
|
[Deepseek v3.2] Support indexer prefill chunking (#25999)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-10-02 10:29:12 -07:00 |
|
Chenheli Hua
|
ad87ba927a
|
[Small] Prevent bypassing media domain restriction via HTTP redirects (#26035)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-10-02 10:27:10 -07:00 |
|
Lucas Wilkinson
|
decf7f794b
|
[BugFix] Fix FI accuracy issue when used for MLA prefill (#26063)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-10-02 17:18:13 +00:00 |
|
Cyrus Leung
|
d00d652998
|
[CI/Build] Replace vllm.entrypoints.openai.api_server entrypoint with vllm serve command (#25967)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-02 10:04:57 -07:00 |
|
Michael Goin
|
3b279a84be
|
[CI] Add Blackwell DeepSeek FP8 FlashInfer MoE tests (#26040)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-02 09:07:19 -07:00 |
|
vllmellm
|
5e4a8223c6
|
[Qwen][ROCm] Flash Attention Rotary Embeddings (#24642)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-10-02 08:26:08 -07:00 |
|
leo-pony
|
e51de388a2
|
[Platform][CI] Added OOT platform interface e2e test that runs on Ascend NPU (#25470)
Signed-off-by: leo-pony <nengjunma@outlook.com>
|
2025-10-02 23:19:22 +08:00 |
|
Cyrus Leung
|
cc253b73d3
|
[Model] Use merge_by_field_config for MM models (D-F) (#26076)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-02 08:17:35 -07:00 |
|
Cyrus Leung
|
7d6fb905d9
|
[Model] Use merge_by_field_config for MM models (A-C) (#26073)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-02 08:17:31 -07:00 |
|
Lucas Wilkinson
|
418d111f8c
|
[FA/Chore] Bump vllm-flash-attention (#25537)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-10-02 11:06:14 -04:00 |
|
Thomas Parnell
|
be8921fbba
|
Change size of single CUDA graph for CI to 4 (#26089)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-10-02 14:14:28 +00:00 |
|
Huy Do
|
d4e7a1152d
|
Update base image to 22.04 (jammy) (#26065)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2025-10-02 05:48:04 -07:00 |
|
pwschuurman
|
be22bb6f3d
|
Run:ai model streamer add GCS package support (#24909)
Signed-off-by: Peter Schuurman <psch@google.com>
|
2025-10-01 20:59:13 -07:00 |
|
Nick Hill
|
169313b9f8
|
[Misc] Make handling of SamplingParams clearer in n>1 case (#26032)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-01 19:31:39 -07:00 |
|
Gregory Shtrasberg
|
0b018d8baf
|
[ROCm][Bugfix] Add missing parameter to ROCm backend (#26029)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-10-01 19:23:14 -07:00 |
|
Jerry Zhang
|
c31246800c
|
Support RL online quantization with torchao (#23014)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
|
2025-10-01 16:39:29 -07:00 |
|
Lucas Wilkinson
|
4134312b35
|
[BugFix] ChunkedLocalAttention is currently not CG compatible (#26034)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-10-01 16:28:00 -07:00 |
|
Wentao Ye
|
da554f932e
|
[Bug] Fix Negative Cuda Memory Usage (#25683)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-01 18:16:26 -04:00 |
|
Hosang
|
aac622e0cd
|
[ROCm][Build] Add support for AMD Ryzen AI MAX / AI 300 Series (#25908)
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>
|
2025-10-01 21:39:49 +00:00 |
|
Lucas Wilkinson
|
1726e93ef1
|
[BugFix][DP/EP] Fix CUTLASS MLA hang under load (#26026)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
|
2025-10-01 12:30:00 -07:00 |
|
Michael Goin
|
ee04c0cd04
|
[CI] Tweaks to GPT-OSS Eval (Blackwell) for stability (#26030)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-01 12:02:17 -07:00 |
|
Huamin Li
|
c36f0aa300
|
Fix test_mamba_ssm_ssd.py due to missing _query_start_loc_to_chunk_indices_offsets (#25995)
Signed-off-by: Huamin Li <3ericli@gmail.com>
|
2025-10-01 18:18:36 +00:00 |
|
Johnny
|
5234dc7451
|
[NVIDIA] Blackwell Family (#24673)
Signed-off-by: Johnny <johnnynuca14@gmail.com>
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Signed-off-by: Johnny <johnnync13@gmail.com>
Signed-off-by: Salvatore Cena <cena@cenas.it>
Co-authored-by: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com>
Co-authored-by: Salvatore Cena <cena@cenas.it>
|
2025-10-01 10:50:54 -07:00 |
|
Kenichi Maehashi
|
3b7c20a6b5
|
[Bugfix] Apply same sampling parameters for both n=1 and n>1 (#26005)
Signed-off-by: Kenichi Maehashi <maehashi@preferred.jp>
|
2025-10-01 14:37:35 +00:00 |
|
Nathan Scott
|
f9e714813a
|
[Benchmark] Finish documented v0.11.0 deprecation of --endpoint-type (#26007)
Signed-off-by: Nathan Scott <nathans@redhat.com>
|
2025-10-01 12:41:57 +00:00 |
|
billishyahao
|
2518230d3e
|
[MISC] Fix misleading batch_size_capture_list when cuda_graph_sizes < 4 (#25829)
Signed-off-by: billishyahao <bill.he@amd.com>
Co-authored-by: Luka Govedic <ProExpertProg@users.noreply.github.com>
|
2025-10-01 08:39:45 -04:00 |
|
Harry Mellor
|
a332b84578
|
[CI] Only capture a single CUDA graph size in CI by default (#25951)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-01 10:03:44 +01:00 |
|
Cyrus Leung
|
1405f0c7ba
|
[Misc] Factor out common _apply_feature_select_strategy (#26003)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-01 01:31:03 -07:00 |
|
Wenlong Wang
|
84d57342b6
|
[BugFix][MM] Fix NoneType error when video is cached in qwen2.5-omni-thinker (#26004)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-10-01 08:03:25 +00:00 |
|
nadathurv
|
57b46d769e
|
[Doc] updating torch.compile doc link (#25989)
Signed-off-by: nadathurv <work.vnadathur@gmail.com>
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Co-authored-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
|
2025-10-01 07:04:56 +00:00 |
|
Lucia Fang
|
f48b6a03ba
|
[Misc] Allow disabling pynccl (#25421)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
|
2025-10-01 06:04:13 +00:00 |
|
Harry Mellor
|
2a69ab4899
|
Update to Transformers v4.56.2 (#24638)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-30 22:07:07 -07:00 |
|
Lucas Wilkinson
|
8d7da92fd7
|
[BugFix] Fix kv-cache-dtype default for DeepseekV3.2 (#25988)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-09-30 21:58:31 -07:00 |
|
Zhewen Li
|
e952eee698
|
[Bugfix] Fix __syncwarp on ROCM (#25996)
|
2025-09-30 21:15:11 -07:00 |
|
Roger Wang
|
66bca9b8bd
|
[MM] Add text-only mode for Qwen3-VL (#26000)
|
2025-09-30 21:13:42 -07:00 |
|
Param
|
99028fda44
|
Fix INT8 quantization error on Blackwell GPUs (SM100+) (#25935)
Signed-off-by: padg9912 <phone.and.desktop@gmail.com>
|
2025-09-30 19:19:53 -07:00 |
|
Wentao Ye
|
1244948885
|
[Log] Optimize Log for FP8MOE (#25709)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-30 19:18:43 -07:00 |
|
Salvatore Cena
|
a73f6491c8
|
Update launch_bounds_utils.h for correct compile on Multiple Cuda Arch - PTXAS out of range Warning (#25843)
Signed-off-by: Salvatore Cena <cena@cenas.it>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-30 19:18:19 -07:00 |
|
Lucia Fang
|
001e50c92c
|
[Model] MTP fallback to eager for DeepSeek v32 (#25982)
Signed-off-by: Lu Fang <fanglu@fb.com>
|
2025-10-01 01:53:22 +00:00 |
|
Lucas Wilkinson
|
96ebcaa3ad
|
[Misc] Make EP kernels install script support uv (#25785)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-09-30 23:38:34 +00:00 |
|
Andrew Xia
|
5db1870bb9
|
[gpt-oss] use vLLM instead of openai types for streaming (#25186)
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2025-09-30 22:47:07 +00:00 |
|
Harry Mellor
|
2ce26b9b5d
|
[Docs] Remove API Reference from search index (#25949)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-30 22:10:02 +00:00 |
|
Harry Mellor
|
a388252ac4
|
Add explicit pooling classes for the Transformers backend (#25322)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-30 23:07:06 +01:00 |
|
David Ben-David
|
9a9f48dff7
|
[V1] [P/D] Add Support for KV Load Failure Recovery (#19330)
Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
|
2025-09-30 14:57:08 -07:00 |
|
Jee Jee Li
|
67f3fb0844
|
[Bench] Add DeepSeekV32 to MoE benchmark (#25962)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-30 14:13:48 -07:00 |
|
cjackal
|
43b752c325
|
[Llama4] [multimodal] Fix misplaced dtype cast of cos_sin_cache in Llama4VisionRotaryEmbedding (#25889)
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
|
2025-09-30 20:35:15 +00:00 |
|
Or Ozeri
|
cfd302db9b
|
OffloadingConnector: Fix GPU block tracking bug (#25856)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-09-30 19:53:04 +00:00 |
|