biondizzle/vllm - vllm

Author	SHA1	Message	Date
Wentao Ye	37bd8d6e4c	[Bug] DeepGemm: Fix TypeError: per_block_cast_to_fp8() missing 1 required positional argument: 'use_ue8m0' for SM100 (#21187 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-18 23:25:22 -07:00
Lucas Wilkinson	468e2400fe	[BugFix][CPU] Fix `TorchSDPABackendImpl` doesn't have `use_irope` (#21200 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-07-18 23:18:48 -07:00
Varun Sundar Rabindranath	dcc6cfb991	[Kernel][Performance] Tweak MoE Batched silu_mul_fp8_quant_deep_gemm kernel (#21193 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-07-18 23:09:51 -07:00
Woosuk Kwon	dd572c0ab3	[V0 Deprecation] Remove V0 Spec Decode workers (#21152 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-07-18 21:47:50 -07:00
Varun Sundar Rabindranath	9ffe905a41	[Bugfix][Model] Fix LoRA for Mistral-Small-3.1-24B-Instruct-2503 (#21183 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-07-18 21:15:03 -07:00
Lucia Fang	9a9fda1423	[Core] Support Local Chunked Attention for Hybrid KV Cache (#19351 ) Signed-off-by: Lucia Fang <fanglu@fb.com> Signed-off-by: Lu Fang <fanglu@meta.com> Signed-off-by: Lu Fang <fanglu@fb.com> Co-authored-by: Lu Fang <fanglu@meta.com>	2025-07-18 20:48:38 -07:00
Jee Jee Li	466e878f2a	[Quantization] Enable BNB support for more MoE models (#21100 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-18 17:52:02 -07:00
Rui Qiao	217937221b	Elastic Expert Parallel Initial Support (#20775 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-07-18 17:46:09 -07:00
hax0r31337	5782581acf	[Bugfix] Voxtral on Blackwell GPUs (RTX 50 series) (#21077 ) Signed-off-by: hax0r31337 <liulihaocaiqwq@gmail.com>	2025-07-18 18:40:18 -04:00
JialinOuyang-Meta	0f199f197b	[Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue (#21005 ) Signed-off-by: Jialin Ouyang <jialino@meta.com>	2025-07-18 12:34:40 -07:00
Richard Zou	b2eb2b5ad7	[Kernel] Apply torch.Tag.needs_fixed_stride_order only for torch==2.6.0 (#19346 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-07-18 14:10:21 -04:00
Richard Zou	21274ab476	[CI] Update CODEOWNERS for vllm/compilation (#21185 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2025-07-18 06:51:12 -07:00
Thomas Parnell	ed8cbfedf8	Let GraniteMoeAttention use YaRN (#21174 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-07-18 05:52:52 -07:00
Cyrus Leung	45badd05d0	[Core] Set pooling params based on task and model (#21128 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-18 05:41:17 -07:00
ElizaWszola	4adc66f64d	[Bugfix] Allocate less memory in non-batched CUTLASS MoE (#21121 ) Signed-off-by: ElizaWszola <ewszola@redhat.com>	2025-07-18 18:55:52 +08:00
Cyrus Leung	55ad648715	[Doc] Fix typo in model name (#21178 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-18 03:55:10 -07:00
wang.yuqi	5895afd780	[Bugfix] The special_tokens in tokenizer should also be controlled by do_lower_case in encoder_config. (#20750 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-07-18 09:10:47 +00:00
wang.yuqi	ca4eb82bcb	[Model] Re-add the implicit conversion feature for as_seq_cls_model (#21103 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-07-18 07:15:07 +00:00
Roger Wang	ba2dfbb0c2	[Misc] Make MM embedding merge interface explicit in model runner (#21147 ) Signed-off-by: Roger Wang <hey@rogerw.me> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-18 07:13:57 +00:00
Jialin Ouyang	1bf65138f6	[benchmark] Sending request strictly follows the random intervals (#21108 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-07-18 06:22:08 +00:00
Woosuk Kwon	54cf1cae62	[Misc] Do not print async output warning for v1 (#21151 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-07-17 21:57:02 -07:00
shixianc	5780121c95	[Perf] Add swap_ab to SM90 FP8 non-block CUTLASS moe grouped gemm (#20911 ) Signed-off-by: Shixian Cui <shixian@amazon.com> Co-authored-by: Shixian Cui <shixian@amazon.com>	2025-07-18 04:34:43 +00:00
Shu Wang	c7d8724e78	[Core] FlashInfer CUTLASS fused MoE backend (NVFP4) (#20037 ) Signed-off-by: shuw <shuw@nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-17 21:32:45 -07:00
22quinn	b38baabcf9	[Doc] Add inplace weights loading example (#19640 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-07-17 21:12:23 -07:00
Lucas Wilkinson	89cab4d01f	[Attention] Make local attention backend agnostic (#21093 )	2025-07-18 00:10:42 -04:00
Lucia Fang	b9a21e9173	[Docs] Update supported models documentation with missing models (#20844 ) Signed-off-by: Lu Fang <fanglu@fb.com>	2025-07-17 20:12:13 -07:00
Ricardo Decal	c4e3b12524	[Docs] Add minimal demo of Ray Data API usage (#21080 ) Signed-off-by: Ricardo Decal <rdecal@anyscale.com>	2025-07-17 20:09:19 -07:00
elvischenv	8dfb45ca33	[Bugfix] Fix the tensor non-contiguous issue for Flashinfer TRT-LLM backend attention kernel (#21133 )	2025-07-18 00:35:58 +00:00
Wentao Ye	8a8fc94639	[Log] Debugging Log with more Information (#20770 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-18 00:19:46 +00:00
Woosuk Kwon	4de7146351	[V0 deprecation] Remove V0 HPU backend (#21131 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-07-17 16:37:36 -07:00
Eric Curtin	ac9fb732a5	On environments where numa cannot be detected we get 0 (#21115 ) Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-07-17 18:52:17 +00:00
Jee Jee Li	a3a6c695f4	[Misc] Qwen MoE model supports LoRA (#20932 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-17 18:32:52 +00:00
Cyrus Leung	90bd2ab6e3	[Model] Update pooling model interface (#21058 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-17 16:05:40 +00:00
ElizaWszola	9fb2d22032	[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE (#20762 ) Signed-off-by: ElizaWszola <ewszola@redhat.com>	2025-07-17 09:56:44 -04:00
Harry Mellor	2d6a38209b	[Docs] Move code block out of admonition now that it's short (#21118 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-17 06:12:29 -07:00
wangxiyuan	89e3c4e9b4	[Misc] Avoid unnecessary import (#21106 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-17 12:57:41 +00:00
Harry Mellor	fe8a2c544a	[Docs] Improve docstring formatting for `FusedMoEParallelConfig.make` (#21117 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-17 04:13:00 -07:00
kYLe	4ef00b5cac	[VLM] Add Nemotron-Nano-VL-8B-V1 support (#20349 ) Signed-off-by: Kyle Huang <kylhuang@nvidia.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-07-17 03:07:55 -07:00
Asher	5a7fb3ab9e	[Model] Add ToolParser and MoE Config for Hunyuan A13B (#20820 ) Signed-off-by: Asher Zhang <asherszhang@tencent.com>	2025-07-17 09:10:09 +00:00
Varun Sundar Rabindranath	11dfdf21bf	[Kernel] DeepGemm MoE : Integrate triton permute / unpermute kernels (#20903 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-07-17 08:10:37 +00:00
Chauncey	fdc5b43d20	[Bugfix]: Fix final_res_batch list index out of range error (#21055 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-07-17 00:29:09 -07:00
Jee Jee Li	c5b8b5953a	[Misc] Fix PhiMoE expert mapping (#21085 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-17 05:47:49 +00:00
David Ben-David	4fcef49ec4	[V1] [KVConnector] Fix MultiprocExecutor worker output aggregation (#21048 ) Signed-off-by: David Ben-David <davidb@pliops.com> Co-authored-by: David Ben-David <davidb@pliops.com>	2025-07-17 13:29:45 +08:00
Zhonghua Deng	8a4e5c5f3c	[V1][P/D]Enhance Performance and code readability for P2pNcclConnector (#20906 ) Signed-off-by: Abatom <abzhonghua@gmail.com>	2025-07-16 22:13:00 -07:00
Lucas Wilkinson	76b494444f	[Attention] Refactor attention metadata builder interface (#20466 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-07-17 04:44:25 +00:00
Michael Goin	28a6d5423d	[Bugfix] Fix Machete zero point issue for GPTQ models on SM90 (#21066 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-16 19:54:45 -07:00
XiongfeiWei	58760e12b1	[TPU] Start using python 3.12 (#21000 ) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2025-07-16 19:37:44 -07:00
Michael Goin	a50d918225	[Docker] Allow FlashInfer to be built in the ARM CUDA Dockerfile (#21013 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-16 19:37:13 -07:00
Kevin_Xiong	c9ba8104ed	[Bugfix] weight loading use correct tp_group with patch_tensor_parallel_group (#21024 ) Signed-off-by: KevinXiong-C <kevin_xiong1997@outlook.com>	2025-07-16 19:36:36 -07:00
Michael Goin	4e7dfbe7b4	Update PyTorch to `torch==2.7.1` for CUDA (#21011 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-17 02:30:44 +00:00

7811 Commits