22quinn
|
b799f4b9ea
|
[CI/Build] Fix tensorizer test for load_format change (#22583)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-08-10 19:30:00 -07:00 |
|
Benji Beck
|
06da44f0cb
|
Migrate LlavaImageInputs to TensorSchema (#21770)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-08-10 19:29:19 -07:00 |
|
Benji Beck
|
a554991748
|
Migrate LlavaNextVideoPixelInputs to TensorSchema (#21843)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-08-10 19:29:16 -07:00 |
|
Doug Smith
|
d1af8b7be9
|
enable Docker-aware precompiled wheel setup (#22106)
Signed-off-by: dougbtv <dosmith@redhat.com>
|
2025-08-10 16:29:02 -07:00 |
|
Benji Beck
|
68b254d673
|
Fix TensorSchema validation test for symbolic dims (#22366)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-08-10 17:16:44 +00:00 |
|
ZiTian Zhao
|
8c50d62f5a
|
Remove redundant row_indices unsqueeze operation in MiniCPMO (#22528)
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
|
2025-08-10 09:20:00 -07:00 |
|
Benji Beck
|
b4e2916721
|
Migrate LlavaNextImageInputs to TensorSchema (#21774)
Signed-off-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-08-10 09:05:21 -07:00 |
|
Breno Baldas Skuk
|
65a7917be4
|
Fix(benchmarks): allow multiple mm contents in OpenAI Chat Completion Benchmarks (#22534)
Signed-off-by: breno.skuk <breno.skuk@hcompany.ai>
|
2025-08-10 09:03:15 -07:00 |
|
Isotr0py
|
b76753f0b5
|
[Bugfix][Kernel] Support partial rotary embedding for MRoPE triton kernel (#22593)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-10 09:00:36 -07:00 |
|
youkaichao
|
b81fe83b2c
|
[doc] add alibaba cloud as sponsor (#22597)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-08-10 23:13:47 +08:00 |
|
youkaichao
|
0757551c96
|
[doc] add beijing meetup links (#22596)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-08-10 22:51:36 +08:00 |
|
Harry Mellor
|
8290d15d2c
|
Move CacheConfig from config/__init__.py to config/cache.py (#22586)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-10 07:36:40 -07:00 |
|
Isotr0py
|
049c245143
|
[Misc] Replace flaky image urls in pixtral test (#22574)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-08-10 06:18:21 -07:00 |
|
Harry Mellor
|
00976db0c3
|
[Docs] Fix warnings in docs build (#22588)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-10 05:49:51 -07:00 |
|
Cyrus Leung
|
d411df0296
|
[Misc] Further refine type annotations in parallel state (#22499)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-10 05:49:48 -07:00 |
|
22quinn
|
010e0e39ea
|
[Doc] Fix API doc link in side navigation (#22585)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-08-10 01:35:22 -07:00 |
|
Ning Xie
|
326976291b
|
[Misc] code clean duplicate set_current_vllm_config in _set_vllm_config (#22566)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-10 00:08:48 -07:00 |
|
Isotr0py
|
7e8d685775
|
[Minor] Fix pre-commit error on main (#22579)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-08-10 00:08:23 -07:00 |
|
Harry Mellor
|
c49848396d
|
Refactor sliding window configuration to Transformers best practice (#21927)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-09 20:50:48 -07:00 |
|
Chengji Yao
|
2a84fb422f
|
[TPU] kv cache update kernel doesn't need to be padded slices to multiple of num_slices_per_block (#22394)
Signed-off-by: Chengji Yao <chengjiyao@gmail.com>
Co-authored-by: Chengji Yao <chengjiyao@gmail.com>
|
2025-08-09 20:49:04 -07:00 |
|
ZiTian Zhao
|
534c45b962
|
Improve fast_topk function with type hints and documentation (#22530)
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
|
2025-08-09 20:25:42 -07:00 |
|
Le Chen
|
3d7363e61c
|
[Config] add "qwen" as a native eagle3 target supported model (#22333)
Signed-off-by: lechen <lecself@163.com>
Signed-off-by: LeChen <lecself@163.com>
|
2025-08-09 20:21:05 -07:00 |
|
Jee Jee Li
|
0c5254b82a
|
[oss] Init gpt-oss bf16 support (#22508)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-09 20:19:13 -07:00 |
|
Thomas Parnell
|
61f67d8acd
|
[V1] [Hybrid] Enable Full CUDA Graph (decode-only) for Mamba layers (#21401)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-08-09 20:16:11 -07:00 |
|
TJian
|
42172ad18f
|
[FEAT] [Performance] Add triton mrope to replace the torch code path (#22375)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-08-09 11:50:03 -07:00 |
|
Isotr0py
|
fbd8595c5c
|
[Bugfix] Fix basic models tests hanging due to mm processor creation (#22571)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-09 11:42:21 -07:00 |
|
Nicolò Lucchesi
|
5a16fa614c
|
[Model] Gemma3n MM (#20495)
Signed-off-by: ShriKode <shrikode@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: ShriKode <shrikode@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
|
2025-08-09 09:56:25 -07:00 |
|
Harry Mellor
|
2d18256e47
|
Move ParallelConfig from config/__init__.py to config/parallel.py (#22565)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-09 08:33:46 -07:00 |
|
Harry Mellor
|
56186474f6
|
[Docs] Reduce noise in docs and --help from the JSON tip (#22567)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-09 08:31:32 -07:00 |
|
Thomas Parnell
|
1bf5e1f25b
|
[CI] [Hybrid] Speed up hybrid models test by removing large models (#22563)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-08-09 02:04:42 -07:00 |
|
Yuxuan Zhang
|
a6022e6fbc
|
GLM-4.5V with new class name at transformers (#22520)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-09 00:50:21 -07:00 |
|
Thomas Parnell
|
2be07a0db1
|
Update docs for Minimax-Text support (#22562)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-08-09 00:18:18 -07:00 |
|
Jee Jee Li
|
0edc0cd52b
|
[Bugfix] Fix CI moe kernel failure (#22556)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-09 00:03:29 -07:00 |
|
Isotr0py
|
7920e9b1c5
|
[Bugfix] Fix failing GPT-OSS initialization test (#22557)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-09 00:03:26 -07:00 |
|
Charlie Fu
|
b7c0942b65
|
[ROCm][Misc] Rename the context_len to seq_len in ROCm custom paged attention kernel (#22097)
Signed-off-by: charlifu <charlifu@amd.com>
|
2025-08-08 23:15:06 -07:00 |
|
Kyuyeun Kim
|
9a0c5ded5a
|
[TPU] Add support for online w8a8 quantization (#22425)
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
|
2025-08-08 23:12:54 -07:00 |
|
Eldar Kurtić
|
10a02535d4
|
Fix loading of quantized BigCode models (#22463)
Signed-off-by: Eldar Kurtic <eldar@neuralmagic.com>
|
2025-08-08 23:12:12 -07:00 |
|
Cyrus Leung
|
65552b476b
|
[Misc] Use config definitions from Transformers library (#21913)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-08 23:10:51 -07:00 |
|
Or Ozeri
|
7ad7adb67f
|
v1: Pass KVConnectorOutput to scheduler-side (#22157)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-08-08 23:09:51 -07:00 |
|
Thomas Parnell
|
6ade99eafa
|
[V1] [Hybrid] Support Minimax-Text-01 in V1 (#22151)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-08-08 23:08:48 -07:00 |
|
Wentao Ye
|
3157aebb63
|
[Log] Add Warning for Deprecation of DeepGEMM old version (#22194)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-08 23:07:48 -07:00 |
|
Thomas Parnell
|
8a0ffd6285
|
Remove mamba_ssm from vLLM requirements; install inside test container using --no-build-isolation (#22541)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-08-08 23:05:32 -07:00 |
|
Roger Wang
|
23472ff51c
|
[Doc] Add usage of implicit text-only mode (#22561)
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Flora Feng <4florafeng@gmail.com>
|
2025-08-08 23:04:19 -07:00 |
|
Roger Wang
|
08b751ba74
|
Implicit language-model-only mode via limit-mm-per-prompt (#22299)
Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Andy Xie <andy.xning@gmail.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: XIn Li <xinli@nvidia.com>
Signed-off-by: Junhao Li <junhao@ubicloud.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com>
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Signed-off-by: Linkun <github@lkchen.net>
Co-authored-by: Ning Xie <andy.xning@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Andrew Sansom <andrew@protopia.ai>
Co-authored-by: Zhiyu <zhiyuc@nvidia.com>
Co-authored-by: Shu Wang <shuw@nvidia.com>
Co-authored-by: XIn Li <xinli@nvidia.com>
Co-authored-by: Junhao Li <streaver91@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Yuxuan Zhang <2448370773@qq.com>
Co-authored-by: ZiTian Zhao <zitian.zhao@tencentmusic.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Po-Han Huang (NVIDIA) <53919306+nvpohanh@users.noreply.github.com>
Co-authored-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Hong Hanh <hanh.usth@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: lkchen <github@lkchen.net>
|
2025-08-08 22:21:40 -07:00 |
|
Isotr0py
|
429e4e2d42
|
[Bugfix] Fix ModernBert cuda graph capturing in v1 (#21901)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-08-08 22:17:22 -07:00 |
|
Pradyun92
|
35afe1b30b
|
[BugFix] [P/D] Handle lookahead token count edge-case with Eagle Spec Decoding and P/D (#22317)
Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com>
Signed-off-by: Pradyun92 <142861237+Pradyun92@users.noreply.github.com>
Co-authored-by: Pradyun Ramadorai <pradyunr@amazon.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
|
2025-08-08 17:04:15 -07:00 |
|
Kunshang Ji
|
81c57f60a2
|
[XPU] upgrade torch 2.8 on for XPU (#22300)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-08-08 17:03:45 -07:00 |
|
Russell Bryant
|
311d875614
|
Drop flaky test_healthcheck_response_time (#22539)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-08-08 16:56:47 -07:00 |
|
Harry Mellor
|
e3edc0a7a8
|
Extract CompilationConfig from config.py (#22524)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-08 16:34:25 -07:00 |
|
yyweiss
|
baece8c3d2
|
[Frontend] Add unix domain socket support (#18097)
Signed-off-by: <yyweiss@gmail.com>
Signed-off-by: yyw <yyweiss@gmail.com>
|
2025-08-08 16:23:44 -07:00 |
|