Aaron Pham
|
8a4a2efc6f
|
[V1][Core] using cached vocab_size for Structured Outputs (#14630)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-03-13 11:39:28 -07:00 |
|
Cyrus Leung
|
8e9ffd37d6
|
[Misc] Clean up processor tests (#14771)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-13 18:25:37 +00:00 |
|
Woosuk Kwon
|
01b3fd0af7
|
[V1][Minor] Minor enhancements on scheduler (#14732)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-13 08:53:22 -07:00 |
|
Cyrus Leung
|
f53a0586b9
|
[Bugfix] Fix prompt format of GLM4V (#14539)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-13 11:37:17 +00:00 |
|
Isotr0py
|
b1cc4dfef5
|
[VLM] Support loading InternVideo2.5 models as original InternVLChatModel (#14738)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-13 03:10:02 -07:00 |
|
Cyrus Leung
|
382403921f
|
[VLM] Support pan-and-scan for Gemma3 multi-modal processor (#14672)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-03-13 02:23:12 -07:00 |
|
Jee Jee Li
|
a73122de96
|
[Bugfix] fix benchmark moe (#14653)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-13 16:12:42 +08:00 |
|
Jee Jee Li
|
bd44b812cb
|
[CI/Build] Delete ultravox LoRA test (#14730)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-13 07:57:39 +00:00 |
|
Szymon Ożóg
|
55211b01e8
|
[Bugfix] Fix chunked prefill for GGUF (#14666)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
|
2025-03-13 07:19:03 +00:00 |
|
Kyle Sayers
|
5d043c1685
|
[Quant] Bamba SupportsQuant (#14698)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-03-13 04:57:05 +00:00 |
|
Kyle Sayers
|
36d1ccb286
|
[Quant] BartModel SupportsQuant (#14699)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-03-13 04:55:59 +00:00 |
|
Siyuan Liu
|
1bc3b739c4
|
[V1][TPU] Add assertion on multi-step-scheduler (#14707)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
|
2025-03-12 21:37:58 -07:00 |
|
Mathis Felardos
|
1bd32bc8dd
|
[Config][Disaggregated] Add timeout configuration for the torch.store and add KVTransferConfig.kv_connector_extra_config (#14367)
Signed-off-by: Mathis Felardos <mathis@mistral.ai>
|
2025-03-12 20:15:20 -07:00 |
|
TY-AMD
|
128bf75283
|
[BugFix][TritonMLA] Process weights after model loading for GGUF (#14555)
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>
|
2025-03-12 20:14:36 -07:00 |
|
Gregory Shtrasberg
|
a94a699c3f
|
[ROCm][FP8] Fix for adjustments needed only for fnuz (#14689)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-03-12 20:14:04 -07:00 |
|
Richard Liu
|
ab426ec9c0
|
Add ray[data] as tpu dependency (#14691)
Signed-off-by: <ricliu@google.com>
Signed-off-by: Richard Liu <ricliu@google.com>
|
2025-03-12 20:13:48 -07:00 |
|
Joe Runde
|
165290d357
|
[bugfix] fixup warning message for plugged schedulers for v1 (#14700)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-03-12 20:12:13 -07:00 |
|
Kevin H. Luu
|
ce20124671
|
[release] Add force remove for TPU logs (#14697)
|
2025-03-12 22:35:18 +00:00 |
|
Woosuk Kwon
|
53be4a8634
|
[V1] Allow sliding window + prefix caching (#13069)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-12 11:21:19 -07:00 |
|
Nick Hill
|
f5d3acd474
|
[BugFix][V1] Fix parallel sampling finishing/aborts (#14512)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-12 10:29:48 -07:00 |
|
TJian
|
916836bbfb
|
[FEAT] [ROCm] [Embedding] Add encoder-only model support into ROCm Flash Attention to enable embedding models. (#14664)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-03-12 09:31:19 -07:00 |
|
Sage Moore
|
d9f83d6206
|
[ROCm] Enable chunked prefill/paged attention in MLA on ROCm (#14316)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-03-12 15:51:20 +00:00 |
|
ameyanjarlekar
|
4a754fcf15
|
[Bugfix] Missing thumbnail from NVLM-D processor (#14633)
Signed-off-by: ameyanjarlekar <aanjarlekar@nvidia.com>
|
2025-03-12 08:50:49 -07:00 |
|
Woosuk Kwon
|
c0c25e25fa
|
[Model] Add support for Gemma 3 (#14660)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-12 08:36:33 -07:00 |
|
Sage Moore
|
45f3f3f59e
|
[ROCm][Bugfix] Ensure that the moe_wna16_gemm kernel is not built on ROCm platforms. (#14629)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-03-12 08:00:28 -04:00 |
|
Li, Jiang
|
ff47aab056
|
[CPU] Upgrade CPU backend to torch-2.6 (#13381)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-03-12 10:41:13 +00:00 |
|
Pavani Majety
|
debd6bbf09
|
[Kernel] Add ModelOpt FP4 Checkpoint Support (#12520)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2025-03-12 05:13:11 +00:00 |
|
Benjamin Chislett
|
5c538c37b2
|
[V1][Bugfix][Spec Decode] Fix incorrect outputs in V1 speculative decoding due to batch indexing (#14645)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
|
2025-03-11 22:12:41 -07:00 |
|
Szymon Ożóg
|
e22ee1e7a2
|
[Kernel] GGUF MoE kernel (#14613)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
|
2025-03-12 03:33:27 +00:00 |
|
Isotr0py
|
e392d85831
|
[Core] Refactor QKVCrossParallelLinear implementation to support BNB 4-bit quantization (#14545)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-11 20:12:52 -07:00 |
|
Aaron Pham
|
77a318bd01
|
[V1][Core] Support MistralTokenizer for Structured Output (#14625)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-03-12 10:40:09 +08:00 |
|
Farzad Abdolhosseini
|
80e78d02ac
|
[Model] Extend Ultravox to accept audio longer than 30s (#13631)
Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>
|
2025-03-12 10:27:10 +08:00 |
|
Jennifer Zhao
|
4a42b9f5d6
|
[Doc] Update benchmarks README (#14646)
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2025-03-11 19:23:04 -07:00 |
|
Joe Runde
|
47532cd9f4
|
[core][V1] pluggable scheduler (#14466)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-03-12 01:15:15 +00:00 |
|
Randy Chen
|
36e0c8f7da
|
[Feature] Add vllm bench CLI (#13993)
Signed-off-by: Randy Chen <acad.randyjhc@gmail.com>
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-03-12 00:31:48 +00:00 |
|
Kevin H. Luu
|
9f583e360c
|
[release] Add commands to clean up logs on TPU release node (#14642)
|
2025-03-12 00:14:50 +00:00 |
|
Cody Yu
|
b706d898af
|
[Bugfix][V1][PP] Only warmup sampler at last PP rank (#14643)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-03-11 23:40:07 +00:00 |
|
iefgnoix
|
863d315c86
|
[V1][TPU] Pad the block_table.shape[1] so the ragged paged attention can handle correctly (#14597)
|
2025-03-11 19:12:26 -04:00 |
|
Richard Liu
|
d374f04a33
|
Fix run_tpu_test (#14641)
Signed-off-by: <ricliu@google.com>
Signed-off-by: Richard Liu <ricliu@google.com>
|
2025-03-11 21:14:33 +00:00 |
|
Russell Bryant
|
61a01b27a7
|
[V1] Delay all xgrammar usage until needed (#14616)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-11 20:21:33 +00:00 |
|
Yang.Tao
|
53056731fd
|
fix some typos: supported_head_sizes (#14627)
|
2025-03-11 10:38:24 -07:00 |
|
Russell Bryant
|
4cbf286794
|
[V1] Remove cache from StructuredOutputManager (#14622)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-11 10:36:07 -07:00 |
|
Kunshang Ji
|
c6e14a61ab
|
[Hardware][Intel GPU] upgrade IPEX dependency to 2.6.10. (#14564)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-03-11 17:11:47 +00:00 |
|
Lucas Wilkinson
|
07b4b7a37f
|
[BugFix/Build] Fix sparse kernels not getting built on hopper (#14572)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-03-11 17:09:03 +00:00 |
|
Dilip Gowda Bhagavan
|
07964e2f30
|
docs: Add documentation for s390x cpu implementation (#14198)
Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-11 17:02:17 +00:00 |
|
Russell Bryant
|
4bf82d4b90
|
[V1] Add regex structured output support with xgrammar (#14590)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-11 23:03:44 +08:00 |
|
Richard Liu
|
9ab326713f
|
Uninstall dependencies before installing requirements/tpu.txt (#14586)
Signed-off-by: <ricliu@google.com>
Signed-off-by: Richard Liu <ricliu@google.com>
|
2025-03-11 08:01:35 -07:00 |
|
Cyrus Leung
|
af295e9b01
|
[Bugfix] Update --hf-overrides for Alibaba-NLP/gte-Qwen2 (#14609)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-11 07:59:43 -07:00 |
|
Jeff Daily
|
a1c8f3796c
|
dynamic dispatch of fp8 kernels (#14245)
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
|
2025-03-11 10:54:56 -04:00 |
|
Russell Bryant
|
08a1a1121d
|
benchmarks: simplify test jsonschema (#14567)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-11 13:39:30 +00:00 |
|