Yajie Wang
977a16772c
[Bugfix][Kernel]: Fix AllSpark kernel compilation errors and enable for CUDA < 12.0 (#14430)
Signed-off-by: wyj371990 <wyj371990@alibaba-inc.com>
2025-03-14 09:55:14 -07:00

daniel-salib
73deea2fdb
[Frontend] track server_load (#13950)
2025-03-14 09:53:17 -07:00

Mark McLoughlin
9d2b4a70f4
[V1][Metrics] Updated list of deprecated metrics in v0.8 (#14695)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-03-15 00:45:25 +08:00

Russell Bryant
0b0d6421b2
[Frontend] Fix log message to use http vs https (#14774)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-14 09:21:09 -07:00

Russell Bryant
1140991a7b
[V1] Fix vocab size calculation for structured output (#14826)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-14 09:18:38 -07:00

Cyrus Leung
613c5bb945
[Bugfix] Fix Aria test loading (#14823)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-14 09:11:23 -07:00

Guillaume Calmettes
fd8e055ffb
[BugFix]: properly catch templating error when preprocess input (#13976)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2025-03-14 05:58:34 -07:00

Cyrus Leung
ab93f1360f
[VLM] Various cleanup and fixes (#14806)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-14 05:58:19 -07:00

DefTruth
40253bab44
[Bugfix][W8A8] fixed cutlass block fp8 binding (#14796)
2025-03-14 03:32:42 -07:00

Woosuk Kwon
c77620d22d
[V1][Minor] Minor code cleanup for scheduling metrics (#14800)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-14 08:21:28 +00:00

Jee Jee Li
989ecd2007
[Misc] Gemma3ForConditionalGeneration supports LoRA (#14797)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-14 01:07:30 -07:00

WeiCheng
54cc46f3eb
[Bugfix] Fix small typo in the example of Streaming delimiter (#14793)
2025-03-14 08:05:17 +00:00

Cyrus Leung
601bd3268e
[Misc] Clean up type annotation for SupportsMultiModal (#14794)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-14 00:59:56 -07:00

Li Wang
09269b3127
[BugFix]Fix performance serving benchmark when enable profiling (#14737)
Signed-off-by: wangli <wangli858794774@gmail.com>
2025-03-14 07:02:05 +00:00

Thien Tran
27b50f1fe6
[Bugfix][Kernel][CPU] Fix num_tokens in CPU rotary embedding kernel (#14667)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
2025-03-13 23:47:49 -07:00

Lucas Wilkinson
9532c49836
[Attention] MLA get rid of materialization (#14770)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-13 23:39:02 -07:00

Roger Wang
0c2af17c76
[CI] Fix missing example model id in processor test (#14787)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-14 13:52:15 +08:00

Jennifer Zhao
a6e0d096dd
[Feature] Add visionarena offline support for benchmark_throughput (#14654)
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Jennifer Zhao <JenZhao@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2025-03-14 04:07:54 +00:00

Liangfu Chen
d3d4956261
[Neuron] flatten test parameterization for neuron attention kernels (#14712)
2025-03-13 20:46:56 -07:00

Nick Hill
4059adc31b
[Misc][Minor] Simplify SamplingParams.__post_init__() (#14772)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-14 11:44:20 +08:00

Kevin H. Luu
f1f632d9ec
[ci] Reduce number of tests in fastcheck (#14782)
2025-03-13 20:43:45 -07:00

Thien Tran
95d680b862
[Bugfix][IPEX] Add VLLM_CPU_MOE_PREPACK to allow disabling MoE prepack when CPU does not support it (#14681)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
2025-03-13 20:43:18 -07:00

Thomas Parnell
fb4c7f8ef0
[Kernel] [V1] Further optimizations to ROCm (Triton) Backend to better handle GQA. (#14431)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com>
Co-authored-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Co-authored-by: Chih-Chieh Yang <chih.chieh.yang@ibm.com>
2025-03-13 20:42:27 -07:00

Varun Sundar Rabindranath
0b1cfa6180
[Kernel] LoRA - Enable CUDAGraphs for V1 (#14626)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-13 20:42:04 -07:00

Woosuk Kwon
32ef4983cd
[V1] Temporarily disable FlashInfer Rejection Sampler (#14788)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-13 20:40:35 -07:00

Roger Wang
ad19c8a003
[V1] Move OOM check into sampler run (#14728)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2025-03-13 20:40:23 -07:00

Jeff Daily
2a602b055a
forward fix PR 14245, restore build on ROCm 6.2 (#14709)
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
2025-03-13 20:40:15 -07:00

Alexander Matveev
7888e1d0a3
[V1] TPU - Enable prefix caching by default (#14773)
2025-03-13 20:40:05 -07:00

Chen Zhang
60c872d4b6
[Doc] Fix small typo in Transformers fallback (#14791)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-03-13 20:33:12 -07:00

yasu52
3fb17d26c8
[Doc] Fix typo in documentation (#14783)
Signed-off-by: yasu52 <tsuguro4649@gmail.com>
2025-03-13 20:33:09 -07:00

Lucas Wilkinson
d47807ba08
[Attention] Remove slow setattr in MLA (#14769)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-13 21:31:14 +00:00

afeldman-nm
02fcaa3d0a
[V1] Detokenizer: Respect Stop Tokens + not include_stop_str_in_output (#14624)
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
2025-03-13 19:07:34 +00:00

Aaron Pham
8a4a2efc6f
[V1][Core] using cached vocab_size for Structured Outputs (#14630)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-13 11:39:28 -07:00

Cyrus Leung
8e9ffd37d6
[Misc] Clean up processor tests (#14771)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-13 18:25:37 +00:00

Woosuk Kwon
01b3fd0af7
[V1][Minor] Minor enhancements on scheduler (#14732)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-13 08:53:22 -07:00

Cyrus Leung
f53a0586b9
[Bugfix] Fix prompt format of GLM4V (#14539)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-13 11:37:17 +00:00

Isotr0py
b1cc4dfef5
[VLM] Support loading InternVideo2.5 models as original InternVLChatModel (#14738)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-13 03:10:02 -07:00

Cyrus Leung
382403921f
[VLM] Support pan-and-scan for Gemma3 multi-modal processor (#14672)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-13 02:23:12 -07:00

Jee Jee Li
a73122de96
[Bugfix] fix benchmark moe (#14653)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-13 16:12:42 +08:00

Jee Jee Li
bd44b812cb
[CI/Build] Delete ultravox LoRA test (#14730)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-13 07:57:39 +00:00

Szymon Ożóg
55211b01e8
[Bugfix] Fix chunked prefill for GGUF (#14666)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
2025-03-13 07:19:03 +00:00

Kyle Sayers
5d043c1685
[Quant] Bamba SupportsQuant (#14698)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-03-13 04:57:05 +00:00

Kyle Sayers
36d1ccb286
[Quant] BartModel SupportsQuant (#14699)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-03-13 04:55:59 +00:00

Siyuan Liu
1bc3b739c4
[V1][TPU] Add assertion on multi-step-scheduler (#14707)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-03-12 21:37:58 -07:00

Mathis Felardos
1bd32bc8dd
[Config][Disaggregated] Add timeout configuration for the torch.store and add KVTransferConfig.kv_connector_extra_config (#14367)
Signed-off-by: Mathis Felardos <mathis@mistral.ai>
2025-03-12 20:15:20 -07:00

TY-AMD
128bf75283
[BugFix][TritonMLA] Process weights after model loading for GGUF (#14555)
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>
2025-03-12 20:14:36 -07:00

Gregory Shtrasberg
a94a699c3f
[ROCm][FP8] Fix for adjustments needed only for fnuz (#14689)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-03-12 20:14:04 -07:00

Richard Liu
ab426ec9c0
Add ray[data] as tpu dependency (#14691)
Signed-off-by: <ricliu@google.com>
Signed-off-by: Richard Liu <ricliu@google.com>
2025-03-12 20:13:48 -07:00

Joe Runde
165290d357
[bugfix] fixup warning message for plugged schedulers for v1 (#14700)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-03-12 20:12:13 -07:00

Kevin H. Luu
ce20124671
[release] Add force remove for TPU logs (#14697)
2025-03-12 22:35:18 +00:00