Nicolò Lucchesi
|
2582683566
|
[PD] Skip tp_size exchange with rank0 (#19413)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-06-25 20:04:39 -07:00 |
|
Michael Goin
|
754b00edb3
|
[Bugfix] Fix Mistral tool-parser regex for nested JSON (#20093)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-26 01:01:17 +00:00 |
|
Michael Goin
|
296ce95d8e
|
[CI] Add SM120 to the Dockerfile (#19794)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-25 16:23:56 -07:00 |
|
Chenyaaang
|
2d7620c3eb
|
[TPU] Add TPU specific var VLLM_TPU_MOST_MODEL_LEN (#19919)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-06-25 15:51:02 -07:00 |
|
Nick Hill
|
55c65ab495
|
[P/D] Avoid stranding blocks in P when aborted in D's waiting queue (#19223)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-25 15:19:44 -07:00 |
|
Chengji Yao
|
2cc2069970
|
[TPU][Bugfix] fix kv cache padding (#20048)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-06-25 21:24:10 +00:00 |
|
zhrrr
|
9f0608fc16
|
[Bugfix] default set cuda_graph_sizes to max_num_seqs for v1 engine (#20062)
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
|
2025-06-25 21:03:17 +00:00 |
|
QiliangCui
|
4e0db57fff
|
Fix the path to the testing script. (#20082)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-06-25 20:48:17 +00:00 |
|
Nick Hill
|
c40692bf9a
|
[Misc] Add parallel state node_count function (#20045)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-25 13:38:53 -07:00 |
|
lkchen
|
4734704b30
|
[PD] let toy proxy handle /chat/completions (#19730)
Signed-off-by: Linkun <github@lkchen.net>
|
2025-06-25 15:17:45 -04:00 |
|
Eldar Kurtić
|
8b8c209e35
|
static_scaled_fp8_quant should not run when scale.numel is not 1 (#20076)
|
2025-06-25 15:08:03 -04:00 |
|
lsz05
|
23a04e0895
|
[Fix] Support cls pooling in ModernBertPooler (#20067)
Signed-off-by: shengzhe.li <shengzhe.li@sbintuitions.co.jp>
|
2025-06-25 15:07:45 -04:00 |
|
Dipika Sikka
|
02c97d9a92
|
[Quantization] Add compressed-tensors emulations support for NVFP4 (#19879)
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Dipika <dipikasikka1@gmail.com>
|
2025-06-25 14:28:19 -04:00 |
|
Nicolò Lucchesi
|
e795d723ed
|
[Frontend] Add /v1/audio/translations OpenAI API endpoint (#19615)
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-06-25 17:54:14 +00:00 |
|
cjackal
|
8359f4c8d8
|
[V1][Speculative Decoding] Fix DeepSeek MTP (#20022)
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
|
2025-06-25 08:41:02 -07:00 |
|
Michael Goin
|
bf5181583f
|
[Doc] Guide for Incremental Compilation Workflow (#19109)
|
2025-06-25 22:06:46 +09:00 |
|
Reid
|
c53fec1fcb
|
[doc] add reference link for Intel XPU (#20064)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-25 12:24:07 +00:00 |
|
Lucas Wilkinson
|
0f9e7354f5
|
[BugFix] Fix full-cuda-graph illegal memory access in FA3 (#20057)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-06-25 08:39:04 +00:00 |
|
Aaron Pham
|
ba7ba35cda
|
[Chore] debloat some initial logs (#19438)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-06-25 06:36:22 +00:00 |
|
bnellnm
|
015fab8c2f
|
[Kernels][Bugfix] Use torch op for all kernels in FusedMoE forward. Add additional testing for cudagraphs. (#19717)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-06-24 23:22:58 -07:00 |
|
Max Wittig
|
f59fc60fb3
|
[Feat][CLI] enforce-include-usage (#19695)
Signed-off-by: Max Wittig <max.wittig@siemens.com>
|
2025-06-25 01:43:04 -04:00 |
|
Wentao Ye
|
879f69bed3
|
[Refactor] Remove duplicate ceil_div (#20023)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-25 05:19:09 +00:00 |
|
David Xia
|
7108934142
|
[Frontend] speed up import time of vllm.config (#18036)
Signed-off-by: David Xia <david@davidxia.com>
|
2025-06-25 00:41:11 -04:00 |
|
h-avsha
|
3443aaf8dd
|
Move to a faster base64 implementation (#19984)
Signed-off-by: h-avsha <avshalom.manevich@hcompany.ai>
|
2025-06-24 20:33:51 -07:00 |
|
Isotr0py
|
2273ec322c
|
Revert "Fix(models/siglip): Add compatibility for Gemma models quantized by llm-compressor" (#20030)
|
2025-06-25 11:23:29 +08:00 |
|
Wentao Ye
|
a6c4b87fbc
|
Revert "[Feature] Integrate new deepgemm (#19820)" (#20049)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-24 19:45:22 -07:00 |
|
Brayden Zhong
|
1afa9948f5
|
[Llama4] Update attn_temperature_tuning (#19997)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-06-24 22:42:53 -04:00 |
|
Eli Uriegas
|
0d06b533a0
|
cmake: Update vllm_flash_attn for vllm_kernels (#20032)
Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
|
2025-06-24 22:44:10 +00:00 |
|
Boyuan Feng
|
c01d1c5aba
|
use .dev for version comparison with pytorch nightly release (#20031)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-06-24 21:52:16 +00:00 |
|
Brayden Zhong
|
ead369845d
|
[Easy] Remove submodule added in #19463 (#20039)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-06-24 13:23:15 -07:00 |
|
Wentao Ye
|
c6e3bba8e6
|
[Feature] Integrate new deepgemm (#19820)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-24 12:51:56 -07:00 |
|
lkchen
|
91f7d9d0b6
|
[P/D] Asynchronously do _nixl_handshake (#19836)
Signed-off-by: Linkun Chen <github@lkchen.net>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-06-24 12:46:10 -07:00 |
|
Nick Hill
|
8619e7158c
|
[BugFix] Fix multi-node offline data parallel (#19937)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-24 12:45:20 -07:00 |
|
d.transposed
|
c635c5f744
|
[Misc][Benchmarking] Add variable request-rate ("ramp-up") to the benchmarking client. (#19423)
Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Co-authored-by: Roger Wang <hey@rogerw.me>
|
2025-06-24 18:41:49 +00:00 |
|
Lucas Wilkinson
|
a045b7e89a
|
[Perf] Improve/Fix-regression for FA3 in High QPS regimes (#19463)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-06-24 13:09:01 -04:00 |
|
amit
|
981eeca41a
|
[Fix][V1] Remove --scheduling-policy oracle (#20010)
Signed-off-by: amit <amit.man@gmail.com>
|
2025-06-24 09:52:15 -07:00 |
|
Reid
|
26d34eb67e
|
refactor example - qwen3_reranker (#19847)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-24 14:03:20 +00:00 |
|
Li, Jiang
|
53da4cd397
|
[Bugfix][CPU] Fix InputBatch for pooling models in the CPU v1 (#20014)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-06-24 13:20:04 +00:00 |
|
Vadim Gimpelson
|
9a3b88328f
|
[PERF] Speedup of MRoPE prepare inputs (#19939)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>
|
2025-06-23 23:01:26 -07:00 |
|
Reid
|
3014c920da
|
add some examples for other benchmark scripts (#19893)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-24 05:57:46 +00:00 |
|
Kay Yan
|
0eed516951
|
[doc] Fix broken link in the installation for CPU (#19980)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
|
2025-06-24 12:04:11 +08:00 |
|
Chenyaaang
|
ee5ad8d2c5
|
[Misc][Tools][Benchmark] Add profile to autotune script (#19711)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-06-24 00:59:41 +00:00 |
|
QiliangCui
|
a738dbb2a1
|
Update test case parameter to have the throughput above 8.0 (#19994)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-06-24 00:18:10 +00:00 |
|
Chenyaaang
|
33d5e29be9
|
[TPU] Fix tpu model runner test (#19995)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-06-23 16:04:28 -07:00 |
|
22quinn
|
4671ac6e2a
|
[Bugfix][Benchmark] Fix Marlin benchmark (#19929)
|
2025-06-24 07:25:12 +09:00 |
|
Jun-Howie
|
dd2ccf8dde
|
Feat Dynamic Quantization for MoE Layers in GPTQ Marlin Backend (#19395)
|
2025-06-24 07:23:28 +09:00 |
|
22quinn
|
a3bc76e4b5
|
[CI/Build] Push latest tag for cpu and neuron docker image (#19897)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-23 14:15:37 -07:00 |
|
cascade
|
e6327c9b3e
|
[Feature] Support sequence parallelism for static fp8 quantization (#19181)
Signed-off-by: cascade812 <cascade812@outlook.com>
|
2025-06-23 16:09:02 -04:00 |
|
lkchen
|
d0132f025d
|
[Misc] Add type alias ReqId and EngineId for better readability (#19880)
Signed-off-by: Linkun Chen <github@lkchen.net>
|
2025-06-23 12:57:57 -07:00 |
|
Isotr0py
|
61f4fc5dc6
|
[Bugfix][v1] Fix step pooler implementation and step pooling usage in v1 (#19956)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-23 18:38:06 +00:00 |
|