Reid
|
e384f2f108
|
[Misc] refactor example - openai_transcription_client (#19851)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-20 08:02:21 +00:00 |
|
Reid
|
089a306f19
|
[Misc] update cuda version (#19526)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-20 07:25:15 +00:00 |
|
kourosh hakhamaneshi
|
5e666f72cd
|
[Bugfix][Ray] Set the cuda context eagerly in the ray worker (#19583)
|
2025-06-19 22:01:16 -07:00 |
|
qli88
|
e3a3e4db46
|
[Bugfix] Enable PP with AITER+V1 (#19822)
Signed-off-by: Qiang Li <qiang.li2@amd.com>
|
2025-06-20 12:43:20 +08:00 |
|
Xerxes
|
e41bf15cd0
|
[Chore]: qwen3-moe-type-hints-mistake (#19860)
Co-authored-by: xinnan.hou <hxn02029096@alibaba-inc.com>
|
2025-06-19 21:43:07 -07:00 |
|
Brayden Zhong
|
5aa4a015ce
|
[Benchmark] Fix Value of type "SampleRequest" is not indexable (#18032)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-06-19 21:28:55 -07:00 |
|
Elaine Zhao
|
b6bad3d186
|
[CI][Neuron] Fail and exit on first error (#19622)
Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-06-20 12:27:51 +08:00 |
|
Isotr0py
|
ee9a1531aa
|
[CI/Build][Bugfix] Fix deadlock on v1 engine test CI (#19872)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-20 09:51:07 +08:00 |
|
Robert Shaw
|
10d82f9ac5
|
[Benchmark][Bugfix] Fix Dataset Length Calculation (#19868)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-06-19 18:30:41 -07:00 |
|
xzbdmw
|
ea10dd9d9e
|
[Frontend] early return chat format resolution when specified (#19735)
|
2025-06-19 18:49:59 +00:00 |
|
Alex Brooks
|
ead2110297
|
[Core][Bugfix] Fix Online MM Beam Search (#19688)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2025-06-19 17:18:07 +00:00 |
|
Li, Jiang
|
01220ce89a
|
[CI][CPU] Improve dummy Triton interfaces and fix the CPU CI (#19838)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-06-19 15:46:09 +00:00 |
|
22quinn
|
6f68c49220
|
[Doc] Update V1 user guide for embedding models (#19842)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-19 09:43:27 +00:00 |
|
Alexei-V-Ivanov-AMD
|
4719460644
|
Fixing Chunked Prefill Test. (#19762)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
|
2025-06-19 01:36:16 -07:00 |
|
NekoMimiUnagi
|
466166dcfd
|
[Frontend] Add optional token-level progress bar to LLM.beam_search (#19301)
Signed-off-by: Ruosen Li <rxl190028@utdallas.edu>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Ubuntu <ubuntu@ip-172-31-71-179.ec2.internal>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-19 03:21:41 -04:00 |
|
Zuxin
|
1d0ae26c85
|
Add xLAM tool parser support (#17148)
|
2025-06-19 14:26:41 +08:00 |
|
Isotr0py
|
6021999573
|
[Minor] Allow redirecting model path for HfRunner in test (#19795)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-18 23:04:10 -07:00 |
|
Ning Xie
|
c7b370c603
|
raise exception for pin_lora (#19809)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-06-18 22:57:35 -07:00 |
|
zsolt-borbely-htec
|
aa20d10a91
|
[Misc] [ROCm] Prevent surplus tensor reshape (#19803)
Signed-off-by: Zsolt Borbely <zsolt.borbely@htecgroup.com>
|
2025-06-19 13:57:16 +08:00 |
|
TJian
|
2de12be428
|
[ROCm] [AITER] [Bugfix] Patch for AITER commit 648764942e552a8bb5fe16026703716a81f05374 (#18990)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-06-18 22:56:31 -07:00 |
|
Yu-Hang "Maxin" Tang
|
83ca9ae47b
|
Mark invariant normalizer in Gemma as non-persistent (#19788)
Signed-off-by: Yu-Hang Tang <Tang.Maxin@gmail.com>
|
2025-06-18 22:56:03 -07:00 |
|
kourosh hakhamaneshi
|
e2148dc5ea
|
[Bugfix] Add check_health to v1 async client. (#19821)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
|
2025-06-18 21:47:01 -07:00 |
|
Lu Fang
|
b1098b4072
|
[Bugfix] Fix the linter (#19826)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-06-18 21:44:41 -07:00 |
|
Maximilien de Bayser
|
799397ee4f
|
Support embedding models in V1 (#16188)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-18 21:36:33 -07:00 |
|
Jee Jee Li
|
4959915089
|
[Quantization] Modify the logic of BNB double quantization (#19742)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-06-19 03:52:09 +00:00 |
|
Lu Fang
|
8d1e89d946
|
[Misc][ROCm] Enforce no unused variable in ROCm C++ files (#19796)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-06-18 20:25:15 -07:00 |
|
Michael Goin
|
36239f79dd
|
Fix FA2 fallback for Blackwell V1 (#19781)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-19 09:53:55 +08:00 |
|
afeldman-nm
|
dfada85eee
|
[Frontend] Expose custom args in OpenAI APIs (#16862)
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-06-18 17:41:11 -07:00 |
|
Richard Zou
|
ed33349738
|
[BugFix] Fix use_cudagraph=False (#19612)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-06-19 08:23:12 +08:00 |
|
Woosuk Kwon
|
d49adea1f9
|
[Multimodal] Use fast processor for Qwen2/2.5-VL (#19789)
|
2025-06-18 15:49:40 -07:00 |
|
Russell Bryant
|
14fdd21d39
|
[Core] More fixes to MultiModalEmbeddings type handling (#19715)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-06-18 22:48:29 +00:00 |
|
QiliangCui
|
04fefe7c9a
|
[TPU] Update torch-xla version to include paged attention tuned block change (#19813)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-06-18 22:41:13 +00:00 |
|
Lukas Geiger
|
3b523e38d9
|
[Core] Do not copy array during hashing (#19484)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-06-18 15:36:55 -07:00 |
|
afeldman-nm
|
16c16301c8
|
Disable "Forbid direct 'import triton'" check for vllm/triton_utils/importing.py in an extensible way (#19783)
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
|
2025-06-18 15:08:00 -07:00 |
|
Nathan Weinberg
|
9206d0ff01
|
docs: fix Slack bulletpoint in README (#19811)
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
|
2025-06-18 20:47:08 +00:00 |
|
Chen Zhang
|
a89209b78d
|
[v1] Support mamba2 (#19327)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-18 20:34:15 +00:00 |
|
Russell Bryant
|
ffacb222cb
|
[Docs] Add Huzaifa Sidhpurwala to vuln mgmt team doc (#19808)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-06-18 20:22:28 +00:00 |
|
Chauncey
|
12575cfa7a
|
[Bugfix] fix RAY_CGRAPH_get_timeout is not set successfully (#19725)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-06-18 10:26:16 -07:00 |
|
Zzz9990
|
8b6e1d639c
|
[Hardware][AMD] integrate aiter chunked prefill into vllm (#18596)
Signed-off-by: fsx950223 <fsx950223@outlook.com>
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: fsx950223 <fsx950223@outlook.com>
Co-authored-by: charlifu <charlifu@amd.com>
|
2025-06-18 08:46:51 -07:00 |
|
Lu Fang
|
735a9de71f
|
[Qwen] Add tagging rule for Qwen related PRs (#19799)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-06-18 14:26:43 +00:00 |
|
wangxiyuan
|
257ab95439
|
[Platform] Allow platform use V1 Engine by default (#19792)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-06-18 13:03:36 +00:00 |
|
Reid
|
cca91a7a10
|
[doc] fix the incorrect label (#19787)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-18 10:30:58 +00:00 |
|
Woosuk Kwon
|
f04d604567
|
[Minor] Zero-initialize attn output buffer (#19784)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-06-18 06:59:27 +00:00 |
|
afeldman-nm
|
19a53b2783
|
[V1] Decouple GPU and TPU InputBatch (#19778)
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
|
2025-06-18 06:38:13 +00:00 |
|
Zhonghua Deng
|
eccdc8318c
|
[V1][P/D] A native implementation of xPyD based on P2P NCCL (#18242)
Signed-off-by: Abatom <abzhonghua@gmail.com>
|
2025-06-18 06:32:36 +00:00 |
|
Russell Bryant
|
5f52a84685
|
[V1] Add API docs for EncoderCacheManager (#19294)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-06-18 13:37:01 +08:00 |
|
lkchen
|
d4629dc43f
|
[Misc] Add __str__ for RequestStatus (#19780)
Signed-off-by: Linkun Chen <github@lkchen.net>
|
2025-06-18 03:03:01 +00:00 |
|
Ning Xie
|
6e9cc73f67
|
[MISC] correct DeviceConfig device field static type analysis (#19699)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-06-17 17:21:50 -07:00 |
|
Ning Xie
|
c53711bd63
|
[MISC] correct copy_blocks src_to_dists param type (#19696)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-06-17 17:21:06 -07:00 |
|
Chenyaaang
|
dac8cc49f4
|
[TPU] Update torch version to include paged attention kernel change (#19706)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-06-17 22:24:49 +00:00 |
|