Richard Liu
|
d374f04a33
|
Fix run_tpu_test (#14641)
Signed-off-by: <ricliu@google.com>
Signed-off-by: Richard Liu <ricliu@google.com>
|
2025-03-11 21:14:33 +00:00 |
|
Russell Bryant
|
61a01b27a7
|
[V1] Delay all xgrammar usage until needed (#14616)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-11 20:21:33 +00:00 |
|
Yang.Tao
|
53056731fd
|
fix some typos : supported_head_sizes (#14627)
|
2025-03-11 10:38:24 -07:00 |
|
Russell Bryant
|
4cbf286794
|
[V1] Remove cache from StructuredOutputManager (#14622)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-11 10:36:07 -07:00 |
|
Kunshang Ji
|
c6e14a61ab
|
[Hardware][Intel GPU] upgrade IPEX dependency to 2.6.10. (#14564)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-03-11 17:11:47 +00:00 |
|
Lucas Wilkinson
|
07b4b7a37f
|
[BugFix/Build] Fix sparse kernels not getting built on hopper (#14572)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-03-11 17:09:03 +00:00 |
|
Dilip Gowda Bhagavan
|
07964e2f30
|
docs: Add documentation for s390x cpu implementation (#14198)
Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-11 17:02:17 +00:00 |
|
Russell Bryant
|
4bf82d4b90
|
[V1] Add regex structured output support with xgrammar (#14590)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-11 23:03:44 +08:00 |
|
Richard Liu
|
9ab326713f
|
Uninstall dependencies before installing requirements/tpu.txt (#14586)
Signed-off-by: <ricliu@google.com>
Signed-off-by: Richard Liu <ricliu@google.com>
|
2025-03-11 08:01:35 -07:00 |
|
Cyrus Leung
|
af295e9b01
|
[Bugfix] Update --hf-overrides for Alibaba-NLP/gte-Qwen2 (#14609)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-11 07:59:43 -07:00 |
|
Jeff Daily
|
a1c8f3796c
|
dynamic dispatch of fp8 kernels (#14245)
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
|
2025-03-11 10:54:56 -04:00 |
|
Russell Bryant
|
08a1a1121d
|
benchmarks: simplify test jsonschema (#14567)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-11 13:39:30 +00:00 |
|
Isotr0py
|
1477ffc381
|
[VLM] Cleanup siglip legacy code and fix broken paligemma multimodal processor (#14602)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-11 11:27:36 +00:00 |
|
yexin(叶鑫)
|
70b808fe1a
|
[Perf]:Optimize qwen2-vl to reduce cudaMemcpyAsync (#14377)
Signed-off-by: cynthieye <987073381@qq.com>
|
2025-03-11 07:39:56 +00:00 |
|
Isotr0py
|
63d635d179
|
[Misc] Correct deepseek-vl2 chat template (#14558)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-11 04:37:11 +00:00 |
|
Roger Wang
|
1fc973c0b5
|
[V1][Core] Fix memory issue with logits & sampling (#14508)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Varun Sundar Rabindranath <3337719+varun-sundar-rabindranath@users.noreply.github.com>
|
2025-03-11 04:03:41 +00:00 |
|
Concurrensee
|
c982ac5722
|
[Bugfix] Fix FP16 overflow for DeepSeek V2 (#13232)
Signed-off-by: Yida Wu <yida.wu@amd.com>
|
2025-03-10 20:46:59 -07:00 |
|
Cody Yu
|
4290b704ff
|
[V1][PP] Do not block engine core when no requests to schedule (#14585)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-03-10 19:48:24 -07:00 |
|
Liangfu Chen
|
c91b64f749
|
[neuron] add reshape_and_cache (#14391)
|
2025-03-10 18:37:29 -07:00 |
|
gnovack
|
d6123170d5
|
[Neuron] Add Neuron device communicator for vLLM v1 (#14085)
|
2025-03-10 18:37:04 -07:00 |
|
Cody Yu
|
485afdd3cb
|
[MISC][V1] Handle exception of current_platform.get_device_name() in arg_utils (#14379)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-03-10 20:42:11 -04:00 |
|
Jinzhen Lin
|
90e88ab756
|
[Kernel] moe wna16 cuda kernel (#13321)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-03-10 20:12:40 -04:00 |
|
Russell Bryant
|
04421dff8a
|
[V1] Prevent xgrammar from breaking TPU support (#14575)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-10 23:06:19 +00:00 |
|
Russell Bryant
|
432d6dad15
|
Fix typo in benchmark_serving_structured_output.py (#14566)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-10 14:58:58 -07:00 |
|
Varun Sundar Rabindranath
|
5ff0d32580
|
[V1] LoRA - Add triton kernels for V1 (#13096)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-10 17:27:53 -04:00 |
|
Woosuk Kwon
|
0967110e42
|
[Minor] Update the tqdm bar for parallel sampling (#14571)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-10 14:23:48 -07:00 |
|
Simon Mo
|
fb0acb6c72
|
[Perf] Improve MLA on V1 (#14540)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-03-10 12:06:58 -07:00 |
|
Chauncey
|
92b0ce2ac7
|
[Bugfix][v1] fixed llava-hf/llava-1.5-7b-hf is broken on V1 (#14554)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-10 18:24:51 +00:00 |
|
Harry Mellor
|
bc2d4473bf
|
[Docs] Make installation URLs nicer (#14556)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-10 10:43:08 -07:00 |
|
Harry Mellor
|
3b352a2f92
|
Correct capitalisation: VLLM -> vLLM (#14562)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-10 16:36:21 +00:00 |
|
Roger Wang
|
dea985aef0
|
[V1][Bugfix] Fix handling of second_per_grid_ts for Qwen2-VL & Qwen2.5-VL (#14548)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-03-10 16:03:11 +00:00 |
|
Harry Mellor
|
39be30351f
|
Correct capitalisation: Github -> GitHub (#14561)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-10 15:53:33 +00:00 |
|
Cyrus Leung
|
001a9c7b0d
|
[Doc] Update PaliGemma note to a warning (#14565)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-10 15:02:28 +00:00 |
|
Szymon Ożóg
|
89cdaa83e7
|
[Kernel] Add more dtype support for GGUF kernels (#14043)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
|
2025-03-10 07:30:04 -07:00 |
|
Chauncey
|
b0746fae3d
|
[Frontend] support image embeds (#13955)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-03-10 12:36:03 +00:00 |
|
Harry Mellor
|
60a98b2de5
|
[Docs] Mention model_impl arg when explaining Transformers fallback (#14552)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-10 12:13:10 +00:00 |
|
Chauncey
|
460f553a6d
|
[Misc] Add log information for handle_process_request. (#14130)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-03-10 08:40:50 +00:00 |
|
Jennifer Zhao
|
1253b15774
|
[Feature] Consolidate performance benchmark datasets (#14036)
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-03-10 07:23:11 +00:00 |
|
Martin Hoyer
|
dc74613fa2
|
[Bugfix] Wrong requirements path - rocm (#14527)
Signed-off-by: Martin Hoyer <mhoyer@redhat.com>
|
2025-03-10 02:49:46 +00:00 |
|
Yanyi Liu
|
a21076ed3a
|
[Misc] Ensure out-of-tree quantization method recognize by cli args (#14328)
Signed-off-by: liuyanyi <wolfsonliu@163.com>
|
2025-03-09 12:13:31 +00:00 |
|
Chengji Yao
|
212007b168
|
[Hardware][TPU] Fix the recompiling issue in logits processor after warmup (#14510)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-03-09 05:44:39 -04:00 |
|
Isotr0py
|
fb16eea48b
|
[Bugfix] Revert QKVCrossParallelLinear usage in Mllama to keep BNB quantization work (#14498)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-09 04:47:45 +00:00 |
|
Yuchen Yan
|
73ae0b44e9
|
[Bugfix] Fix tqdm progress bar when SamplingParams.n > 1 (#12428)
Signed-off-by: Yuchen Yan <740987012@qq.com>
|
2025-03-08 20:14:53 -08:00 |
|
Jiayi Yao
|
6d7f037748
|
[Feat] Support chunked prefill for LMCache connector (#14505)
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn>
|
2025-03-08 19:30:06 -08:00 |
|
iefgnoix
|
10f7552789
|
[V1][TPU] Remove unnecessary padding for running on TPU. (#14467)
|
2025-03-08 21:56:04 -05:00 |
|
Lucas Wilkinson
|
b0d541947a
|
[Attention] Default to FlashMLA backend for MLA (#14451)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-03-08 18:18:39 -08:00 |
|
Robert Shaw
|
5f0b53c6ea
|
Revert "[V1][Core] Fix memory issue with logits & sampling" (#14504)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-03-08 17:43:37 -08:00 |
|
22quinn
|
eb8b5eb183
|
[V1] Support bad_words in sampler (#13376)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-03-08 14:50:26 -08:00 |
|
Cyrus Leung
|
9513290032
|
[Misc] Upgrade to Python 3.9 typing for additional directories (#14492)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-08 17:35:50 +00:00 |
|
Russell Bryant
|
0d5e73d30e
|
Update CODEOWNERS for structured output (#14496)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-08 17:19:51 +00:00 |
|