NekoMimiUnagi
|
466166dcfd
|
[Frontend] Add optional token-level progress bar to LLM.beam_search (#19301)
Signed-off-by: Ruosen Li <rxl190028@utdallas.edu>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Ubuntu <ubuntu@ip-172-31-71-179.ec2.internal>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-19 03:21:41 -04:00 |
|
Zuxin
|
1d0ae26c85
|
Add xLAM tool parser support (#17148)
|
2025-06-19 14:26:41 +08:00 |
|
Isotr0py
|
6021999573
|
[Minor] Allow redirecting model path for HfRunner in test (#19795)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-18 23:04:10 -07:00 |
|
Ning Xie
|
c7b370c603
|
raise exception for pin_lora (#19809)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-06-18 22:57:35 -07:00 |
|
zsolt-borbely-htec
|
aa20d10a91
|
[Misc] [ROCm] Prevent surplus tensor reshape (#19803)
Signed-off-by: Zsolt Borbely <zsolt.borbely@htecgroup.com>
|
2025-06-19 13:57:16 +08:00 |
|
TJian
|
2de12be428
|
[ROCm] [AITER] [Bugfix] Patch for AITER commit 648764942e552a8bb5fe16026703716a81f05374 (#18990)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-06-18 22:56:31 -07:00 |
|
Yu-Hang "Maxin" Tang
|
83ca9ae47b
|
Mark invariant normalizer in Gemma as non-persistent (#19788)
Signed-off-by: Yu-Hang Tang <Tang.Maxin@gmail.com>
|
2025-06-18 22:56:03 -07:00 |
|
kourosh hakhamaneshi
|
e2148dc5ea
|
[Bugfix] Add check_health to v1 async client. (#19821)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
|
2025-06-18 21:47:01 -07:00 |
|
Lu Fang
|
b1098b4072
|
[Bugfix] Fix the linter (#19826)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-06-18 21:44:41 -07:00 |
|
Maximilien de Bayser
|
799397ee4f
|
Support embedding models in V1 (#16188)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-18 21:36:33 -07:00 |
|
Jee Jee Li
|
4959915089
|
[Quantization] Modify the logic of BNB double quantization (#19742)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-06-19 03:52:09 +00:00 |
|
Lu Fang
|
8d1e89d946
|
[Misc][ROCm] Enforce no unused variable in ROCm C++ files (#19796)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-06-18 20:25:15 -07:00 |
|
Michael Goin
|
36239f79dd
|
Fix FA2 fallback for Blackwell V1 (#19781)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-19 09:53:55 +08:00 |
|
afeldman-nm
|
dfada85eee
|
[Frontend] Expose custom args in OpenAI APIs (#16862)
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-06-18 17:41:11 -07:00 |
|
Richard Zou
|
ed33349738
|
[BugFix] Fix use_cudagraph=False (#19612)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-06-19 08:23:12 +08:00 |
|
Woosuk Kwon
|
d49adea1f9
|
[Multimodal] Use fast processor for Qwen2/2.5-VL (#19789)
|
2025-06-18 15:49:40 -07:00 |
|
Russell Bryant
|
14fdd21d39
|
[Core] More fixes to MultiModalEmbeddings type handling (#19715)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-06-18 22:48:29 +00:00 |
|
QiliangCui
|
04fefe7c9a
|
[TPU] Update torch-xla version to include paged attention tuned block change (#19813)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-06-18 22:41:13 +00:00 |
|
Lukas Geiger
|
3b523e38d9
|
[Core] Do not copy array during hashing (#19484)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-06-18 15:36:55 -07:00 |
|
afeldman-nm
|
16c16301c8
|
Disable "Forbid direct 'import triton'" check for vllm/triton_utils/importing.py in an extensible way (#19783)
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
|
2025-06-18 15:08:00 -07:00 |
|
Nathan Weinberg
|
9206d0ff01
|
docs: fix Slack bulletpoint in README (#19811)
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
|
2025-06-18 20:47:08 +00:00 |
|
Chen Zhang
|
a89209b78d
|
[v1] Support mamba2 (#19327)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-18 20:34:15 +00:00 |
|
Russell Bryant
|
ffacb222cb
|
[Docs] Add Huzaifa Sidhpurwala to vuln mgmt team doc (#19808)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-06-18 20:22:28 +00:00 |
|
Chauncey
|
12575cfa7a
|
[Bugfix] fix RAY_CGRAPH_get_timeout is not set successfully (#19725)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-06-18 10:26:16 -07:00 |
|
Zzz9990
|
8b6e1d639c
|
[Hardware][AMD] integrate aiter chunked prefill into vllm (#18596)
Signed-off-by: fsx950223 <fsx950223@outlook.com>
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: fsx950223 <fsx950223@outlook.com>
Co-authored-by: charlifu <charlifu@amd.com>
|
2025-06-18 08:46:51 -07:00 |
|
Lu Fang
|
735a9de71f
|
[Qwen] Add tagging rule for Qwen related PRs (#19799)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-06-18 14:26:43 +00:00 |
|
wangxiyuan
|
257ab95439
|
[Platform] Allow platform use V1 Engine by default (#19792)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-06-18 13:03:36 +00:00 |
|
Reid
|
cca91a7a10
|
[doc] fix the incorrect label (#19787)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-18 10:30:58 +00:00 |
|
Woosuk Kwon
|
f04d604567
|
[Minor] Zero-initialize attn output buffer (#19784)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-06-18 06:59:27 +00:00 |
|
afeldman-nm
|
19a53b2783
|
[V1] Decouple GPU and TPU InputBatch (#19778)
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
|
2025-06-18 06:38:13 +00:00 |
|
Zhonghua Deng
|
eccdc8318c
|
[V1][P/D] An native implementation of xPyD based on P2P NCCL (#18242)
Signed-off-by: Abatom <abzhonghua@gmail.com>
|
2025-06-18 06:32:36 +00:00 |
|
Russell Bryant
|
5f52a84685
|
[V1] Add API docs for EncoderCacheManager (#19294)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-06-18 13:37:01 +08:00 |
|
lkchen
|
d4629dc43f
|
[Misc] Add __str__ for RequestStatus (#19780)
Signed-off-by: Linkun Chen <github@lkchen.net>
|
2025-06-18 03:03:01 +00:00 |
|
Ning Xie
|
6e9cc73f67
|
[MISC] correct DeviceConfig device field static type analysis (#19699)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-06-17 17:21:50 -07:00 |
|
Ning Xie
|
c53711bd63
|
[MISC] correct copy_blocks src_to_dists param type (#19696)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-06-17 17:21:06 -07:00 |
|
Chenyaaang
|
dac8cc49f4
|
[TPU] Update torch version to include paged attention kernel change (#19706)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-06-17 22:24:49 +00:00 |
|
Charlie Fu
|
a44b1c951d
|
[Feature][ROCm] Add full graph capture support for TritonAttentionBackend (#19158)
Signed-off-by: charlifu <charlifu@amd.com>
|
2025-06-17 17:03:06 -04:00 |
|
Michael Goin
|
b447624ee3
|
[Bugfix] Fix faulty triton importing logic when using Ray for DP (#19734)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-17 20:59:29 +00:00 |
|
Jiayi Yao
|
cda92307c1
|
[Misc] Update lmcache connector with the latest connector apis (#19441)
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn>
|
2025-06-17 19:57:54 +00:00 |
|
Michael Goin
|
bf57ccc5c2
|
Remove sm120 arch from sm100 cutlass kernel arch list (#19716)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-17 11:49:39 -07:00 |
|
Wentao Ye
|
ffb2cd6b54
|
[Perf] Optimize moe_align_block_size CUDA kernel (#19572)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-06-17 11:49:26 -07:00 |
|
Isotr0py
|
ca94d7fa00
|
[Bugfix] Update multimodel models mapping to fit new checkpoint after Transformers v4.52 (#19151)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-17 15:58:38 +00:00 |
|
CYJiang
|
5a1c2e15d8
|
[Mis] remove duplicate engine status checks (#19647)
Signed-off-by: googs1025 <googs1025@gmail.com>
|
2025-06-17 08:17:38 -07:00 |
|
Nicolò Lucchesi
|
4c8f64faa7
|
[V1][Kernel] Flashinfer HND KV cache layout (#19280)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-06-17 09:09:22 -04:00 |
|
David Xia
|
93aee29fdb
|
[doc] split "Other AI Accelerators" tabs (#19708)
|
2025-06-17 22:05:29 +09:00 |
|
Reid
|
154d063b9f
|
[doc][mkdocs] Add edit button to documentation (#19637)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-17 11:10:31 +00:00 |
|
jvlunteren
|
ccd7c05089
|
[Kernel] Add Split-KV Support to Unified Triton Attention Kernel (#19152)
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
|
2025-06-17 10:45:07 +00:00 |
|
Huy Do
|
c48c6c4008
|
Add a doc on how to update PyTorch version (#19705)
|
2025-06-17 18:10:37 +08:00 |
|
Isotr0py
|
aed8468642
|
[Doc] Add missing llava family multi-image examples (#19698)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-17 07:05:21 +00:00 |
|
quanliu
|
5c76b9cdaf
|
[Core] add remove_seq_from_computed_blocks_tracker to BlockSpaceManager (#19686)
Signed-off-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
Co-authored-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
|
2025-06-17 04:40:58 +00:00 |
|