Shinichi Hemmi
|
c7ffe93d9c
|
[Model] Support TP/PP/mamba2 kernel for PLaMo2 (#19674)
Signed-off-by: Shinichi Hemmi <shemmi@preferred.jp>
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>
Co-authored-by: Calvin Metzger <metzger@preferred.jp>
Co-authored-by: Sixue Wang <cecilwang@preferred.jp>
|
2025-07-28 05:00:47 +00:00 |
|
Adeline
|
15a72ac478
|
[V1] Exception Handling when Loading KV Cache from Remote Store (#21534)
Signed-off-by: liuyumoye <adeline_ly2023@outlook.com>
Co-authored-by: liuyumoye <adeline_ly2023@outlook.com>
|
2025-07-27 20:34:17 -07:00 |
|
Jee Jee Li
|
04ff4be310
|
[Misc] Add fused_moe configs for Qwen3-Coder-480B-A35B-Instruct-FP8 (#21700)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-27 20:12:18 -07:00 |
|
Yuxuan Zhang
|
93269bb43e
|
Fix GLM tool parser (#21668)
Co-authored-by: Chenhui Zhang <zhang.chenhui@outlook.com>
|
2025-07-28 10:46:38 +08:00 |
|
Joachim Studnia
|
82acf2184d
|
Fix typo for limit-mm-per-prompt in docs (#21697)
Signed-off-by: Joachim Studnia <joachim@mistral.ai>
|
2025-07-27 19:45:37 -07:00 |
|
Cyrus Leung
|
86ae693f20
|
[Deprecation][2/N] Replace --task with --runner and --convert (#21470)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-27 19:42:40 -07:00 |
|
Alexander Matveev
|
8f605ee309
|
[Attention] Make CutlassMLA the default backend for SM100 (blackwell) (#21626)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-07-27 20:13:00 +00:00 |
|
Ning Xie
|
a9b2a1d704
|
[Misc] Refactor vllm config str (#21666)
|
2025-07-27 09:51:44 -07:00 |
|
Caleb_Du
|
57c22e57f9
|
Fix CUDA permute/unpermute for use with DeepGemm Moe (#17934)
Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>
|
2025-07-27 07:08:00 -07:00 |
|
Wentao Ye
|
bda9d0535f
|
[Refactor] Refactor MOE NVFP4 Code Base: ModelOpt + Compressed Tensor (#21631)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-27 05:25:21 -07:00 |
|
Isotr0py
|
3d847a3125
|
[VLM] Add video support for Intern-S1 (#21671)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-07-27 11:49:43 +00:00 |
|
Benji Beck
|
5f8c9a425e
|
Migrate Florence2ImagePixelInputs to TensorSchema (#21663)
Signed-off-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-07-27 02:43:02 -07:00 |
|
Ning Xie
|
1cbf951ba2
|
[Misc] add default value for file pattern arg (#21659)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-07-27 05:14:51 +00:00 |
|
ZiTian.Zhao
|
a8936e5193
|
Refactor: Remove numpy dependency from LoggingStatLogger (#20529)
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
|
2025-07-27 04:06:21 +00:00 |
|
Ye (Charlotte) Qi
|
01a395e9e7
|
[CI/Build][Doc] Clean up more docs that point to old bench scripts (#21667)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-07-27 04:02:12 +00:00 |
|
Huy Do
|
971948b846
|
Handle non-serializable objects in vllm bench (#21665)
|
2025-07-27 03:35:22 +00:00 |
|
Isotr0py
|
eed2f463b2
|
[VLM] Support HF format Phi-4-MM model (#17121)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-26 20:07:57 -07:00 |
|
Benji Beck
|
20950b29fb
|
Migrate ChameleonImagePixelInputs to TensorSchema (#21657)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-07-26 19:34:25 -07:00 |
|
Benji Beck
|
3339cba3ff
|
Migrate FuyuImagePatchInputs to TensorSchema (#21662)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-07-26 19:34:14 -07:00 |
|
Benji Beck
|
0b8caf9095
|
Migrate DeepseekVL2ImageInputs to TensorSchema (#21658)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-07-26 19:34:11 -07:00 |
|
Benji Beck
|
ccf27cc4d4
|
Migrate Blip2ImagePixelInputs and Blip2ImageEmbeddingInputs to TensorSchema (#21656)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-07-27 10:33:52 +08:00 |
|
Jinzhen Lin
|
c657369841
|
support torch.compile for bailing moe (#21664)
|
2025-07-26 23:54:32 +00:00 |
|
Wenchen Lo
|
6c66f28fa5
|
Remove xformers requirement for Mistral-format Pixtral and Mistral3 (#21154)
Signed-off-by: Wenchen Lo <charles761013@gmail.com>
|
2025-07-26 17:20:29 -06:00 |
|
Kaixi Hou
|
de509ae8eb
|
[NVIDIA] Explicitly disable shuffled weights for flashinfer blockscale moe fp8 kernels (#21411)
Signed-off-by: kaixih <kaixih@nvidia.com>
|
2025-07-26 07:10:36 -07:00 |
|
Ye (Charlotte) Qi
|
e7c4f9ee86
|
[CI/Build][Doc] Move existing benchmark scripts in CI/document/example to vllm bench CLI (#21355)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-07-26 07:10:14 -07:00 |
|
Yeju Zhou
|
9094d11c5d
|
[Bugfix][Apple Silicon] fix missing symbols when build from source on Mac with Apple Silicon (#21380)
Signed-off-by: Yeju Zhou <yejuzhou@outlook.com>
|
2025-07-26 07:09:57 -07:00 |
|
Wentao Ye
|
56e544f24b
|
[Refactor] Remove moe_align_block_size_triton (#21335)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-26 07:08:29 -07:00 |
|
WeiQing Chen
|
97d6c30cc9
|
[BugFix] Fix shared storage connector load kv only load attention layer (#21428)
Signed-off-by: David Chen <530634352@qq.com>
|
2025-07-26 07:07:40 -07:00 |
|
Ye (Charlotte) Qi
|
a40a8506df
|
[Misc] Improve memory profiling debug message (#21429)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-07-26 07:07:21 -07:00 |
|
Wentao Ye
|
c215f5c877
|
[Bug] Fix has_flashinfer_moe Import Error when it is not installed (#21634)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-26 07:06:14 -07:00 |
|
Maximilien de Bayser
|
1cd6eaba54
|
Support encoder-only models without KV-Cache (#21270)
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-07-26 21:09:52 +08:00 |
|
Isotr0py
|
f27fdfc3ed
|
[Bugfix] Investigate Qwen2-VL failing test (#21527)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-07-26 06:09:29 -07:00 |
|
Benji Beck
|
de10ff0b7c
|
Migrate AyaVisionImagePixelInputs to TensorSchema for shape validation (#21622)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-07-26 06:08:18 -07:00 |
|
Benji Beck
|
9d197280fa
|
Migrate AriaImagePixelInputs to TensorSchema for shape validation (#21620)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-07-26 06:08:15 -07:00 |
|
Huy Do
|
e98def439c
|
[Take 2] Correctly kill vLLM processes after benchmarks (#21646)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2025-07-26 06:06:05 -07:00 |
|
Reid
|
05c1126f29
|
[Misc] remove unused try-except in pooling config check (#21618)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-26 12:20:03 +00:00 |
|
Lyu Han
|
875af38e01
|
Support Intern-S1 (#21628)
Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-07-26 19:14:04 +08:00 |
|
QiliangCui
|
7728dd77bb
|
[TPU][Test] Divide TPU v1 Test into 2 parts. (#21431)
|
2025-07-26 06:20:30 +00:00 |
|
Alexandre JUAN
|
2f6e6b33fb
|
[Bugfix] Fix isinstance check for tensor types in _load_prompt_embeds to use dtype comparison (#21612)
Signed-off-by: Alexandre Juan <a.juan@netheos.net>
|
2025-07-25 20:11:10 -07:00 |
|
Huy Do
|
a55c95096b
|
Correctly kill vLLM processes after finishing serving benchmarks (#21641)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2025-07-25 19:06:21 -07:00 |
|
WeiQing Chen
|
97349fe2bc
|
[Docs] add offline serving multi-modal video input example Qwen2.5-VL (#21530)
Signed-off-by: David Chen <530634352@qq.com>
|
2025-07-25 18:37:32 -07:00 |
|
Farzad Abdolhosseini
|
62965de5fe
|
[Model] Ultravox: Support Llama 4 and Gemma 3 backends (#17818)
Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>
Signed-off-by: Patrick Li <patrick8289@gmail.com>
Co-authored-by: Patrick Li <patrick8289@gmail.com>
|
2025-07-25 18:12:31 -07:00 |
|
Alex Kogan
|
7ae75fa6d0
|
[Feature] Add support for MoE models in the calibration-free RTN-based quantization (#20766)
Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
|
2025-07-25 18:09:34 -07:00 |
|
Chengji Yao
|
f1b286b2fb
|
[TPU] Update ptxla nightly version to 20250724 (#21555)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-07-25 17:09:00 -07:00 |
|
Rui Qiao
|
c7742d6113
|
[Bugfix] Always set RAY_ADDRESS for Ray actor before spawn (#21540)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-07-25 17:08:30 -07:00 |
|
Rui Qiao
|
cea96a0156
|
[Bugfix] Fix sync_and_slice_intermediate_tensors (#21537)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-07-25 17:07:58 -07:00 |
|
Yong Hoon Shin
|
2eddd437ba
|
Add interleaved RoPE test for Llama4 (Maverick) (#21478)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-07-25 17:07:26 -07:00 |
|
Wentao Ye
|
75d29cf4e1
|
[Perf] Cuda Kernel for Int8 Per Token Group Quant (#21476)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-25 17:07:07 -07:00 |
|
Daniel Han
|
41d3082c41
|
Add Unsloth to RLHF.md (#21636)
|
2025-07-25 17:06:48 -07:00 |
|
QiliangCui
|
7cfea0df39
|
[TPU][Test] Rollback PR-21550. (#21619)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-07-25 13:22:01 -07:00 |
|