biondizzle/vllm - vllm

Author	SHA1	Message	Date
Wentao Ye	541a2ef892	[Perf] Deepgemm fused layout kernel for activations, 4.3% throughput improvement, 10.7% TTFT improvement. (#29546 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-07 20:31:14 +08:00
Jinzhen Lin	879ddb09c3	[Kernel][MoE] optimize `moe_align_block_size` (#29642 ) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-12-07 01:58:47 -08:00
Cyrus Leung	e83b7e379c	Revert "[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 )" (#30199 )	2025-12-07 00:00:22 -08:00
Cyrus Leung	27f4c2fd46	[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 23:15:42 -08:00
Cyrus Leung	671427efbf	[Model] Move `multimodal_cpu_fields` definition to field config (#30181 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 13:40:02 +00:00
Cyrus Leung	c46b932df2	[Chore] Deprecate `SupportsMultiModal.merge_by_field_config` (#30170 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 07:57:28 +00:00
Peter Salas	e858bc4d14	[Model] Add support for transformer-based Ultravox v0.7 projector (#30089 ) Signed-off-by: Peter Salas <peter@fixie.ai>	2025-12-05 20:55:43 -08:00
Dongjie Zou	e3fbb6f152	fix#30092 Kimi-Linear model loading failure with missing indexer_rotary_emb (#30093 ) Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>	2025-12-05 20:55:09 -08:00
yuttian1	c4d62618ca	Fix AWQ MoE marlin check issue in marlin_utils.py for AMD backend (#30102 ) Signed-off-by: yuttian1 <yuttian@amd.com>	2025-12-05 20:54:38 -08:00
rasmith	dc839ad03d	[CI/Build][AMD][Quantization] Fix test_int8_kernel.py by updating int8_utils to use hip.libdevice.round (#30151 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-05 20:52:11 -08:00
Wentao Ye	7b5575fa7d	[Bug] Fix vLLM config is not set error (#29999 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-05 16:42:12 -05:00
Divakar Verma	962d703818	[Bugfix][llama4_eagle] Fix missing 'lm_head' attribute (#29926 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-12-05 19:57:26 +00:00
Matthew Bonanni	66e674cdd5	[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-12-05 09:48:43 -08:00
Yi Liu	0d8a7d8a26	[Compressed Tensors] Add XPU `wNa16` support (#29484 ) Signed-off-by: yiliu30 <yi4.liu@intel.com>	2025-12-05 22:02:09 +08:00
Zhiwei	3628bcaaf2	[ROCm][MXFP4] Infer w4a4 quant method in rocm aiter fused moe (#29775 ) Signed-off-by: ZhiweiYan-96 <zhiwei.yan@amd.com>	2025-12-05 11:01:16 +00:00
amitz-nv	6038b1b04b	[Frontend][Model] Add 'float16' to possible mamba cache dtype values, override mamba SSM cache dtype value for NemotronH (#29978 ) Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>	2025-12-05 00:34:33 -08:00
Alexander Matveev	4470ee2f90	[Perf] Enable separate shared_experts stream only for CUDA (#30085 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-12-05 00:03:17 +00:00
Harry Mellor	e10c84e06a	Access `partial_rotary_factor` from `rope_parameters` (#29966 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-04 18:42:49 +00:00
Jee Jee Li	652ba93da3	[Bugfix] Fix FP8 MoE LoRA (#29890 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-12-04 18:17:49 +00:00
Tao Yun	6dcb07f676	support qwen3-vl handle requests with embeddings (#30037 ) Signed-off-by: taoyun <1069423820@qq.com> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-12-04 17:34:06 +00:00
Cyrus Leung	b286a311c2	[Chore] Deprecate `merge_by_field_config` arg (#30035 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-04 17:21:24 +00:00
Harry Mellor	9998ea5b57	Delete HF version of Phi 4 MM (#30049 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-04 13:44:50 +00:00
wang.yuqi	74c4d80c6c	[Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling (#27145 ) Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-04 13:44:15 +00:00
Cyrus Leung	68eb5c8d97	[Misc] Move functions into `PoolingMetadata` (#30027 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-04 08:21:19 +00:00
TJian	3f1b03739a	[ROCm] [Bugfix] `compute_attn_mask_seqlen` for qwen3 omni (#29974 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-12-04 08:20:24 +00:00
Cyrus Leung	9ae2f60374	[Misc] Various cleanups for MM input processing (#29970 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-04 06:22:20 +00:00
bnellnm	2902c34826	[Kernels] Remove BatchedTritonOrDeepGemmExperts and default fallback to Triton (#29929 ) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-12-03 20:49:00 +00:00
Varun Sundar Rabindranath	19bee6d12d	[Performance][DP/EP] Add silu_mul_per_token_group_quant_fp8_colmajor kernel (#29470 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-12-03 18:04:59 +00:00
HDCharles	b294e28db2	[refactor] CTMoEMethods to use QuantizationArgs (#28871 ) Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-12-03 11:00:56 +00:00
Tsukasa OI	42c1949643	[Bugfix][Quantization] Support BF16 tensors on GGUF (#29948 ) Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>	2025-12-03 10:33:46 +00:00
Isotr0py	a21cd9ed23	[Bugfix] Fix incorrect `image_grid_thw` rank for HunyuanOCR from missing `merge_by_field_config=True` (#29950 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-12-03 10:05:10 +00:00
Julien Denize	5e5646e206	[BUGFIX] llama_4_scaling wrongly passed to DeepseekAttention (#29908 ) Signed-off-by: juliendenize <julien.denize@mistral.ai>	2025-12-02 14:51:20 -08:00
Harry Mellor	6fc5841db1	Fix some more Transformers nightly tests (#29872 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 21:49:44 +00:00
Navanit Dubey	a2b053dc85	feat(model): Add BitsAndBytes quantization support for Qwen3-Omni-MoE (#29896 ) Signed-off-by: navanit-git <navanitdubey@gmail.com>	2025-12-02 19:28:35 +00:00
Matthew Bonanni	1d93f11675	[Attention][CUDAGraph] Remove CG padding from attention backends (#29352 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-12-02 13:48:08 -05:00
Isotr0py	0ec8422171	[Bugfix] Fix incorrect channel order for idefics3 in edge case (#29881 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-02 16:03:52 +00:00
Matthew Bonanni	51c57b51dd	[Bugfix] Fix DeepSeek R1 MTP weight loading (#29545 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>	2025-12-02 15:52:18 +00:00
Cyrus Leung	68ffbca7e4	[Chore] Use `tokenizer.encode` and `tokenizer.decode` directly (#29851 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-02 12:30:40 +00:00
Julien Denize	d8c6210eea	Add Mistral Large 3 and Ministral 3 (#29757 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai> Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Signed-off-by: Mickael Seznec <mickael@mistral.ai> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Mickael Seznec <mickael@mistral.ai>	2025-12-02 10:29:00 +00:00
Harry Mellor	f5b0846ba0	Fix some Transformers nightly tests (#29802 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 07:05:27 +00:00
Cyrus Leung	653591d5e7	[Chore] Move tokenizer initialization methods (#29793 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-02 13:33:37 +08:00
Johnny Yang	f441d36cee	Add missing return in _check_vllm_model_embed_input_ids (#29834 ) Signed-off-by: Johnny Yang <johnnyyang@google.com>	2025-12-01 19:22:50 -08:00
Divakar Verma	4b40924998	[ROCm] Fallback pytorch GELU with tanh approximation to GELU() (#29244 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com> Signed-off-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-02 02:02:22 +00:00
sangbumlikeagod	092bb73b8a	[Frontend] add 'verbose_json' and 'timestamp' feature on Whisper Transcription/Translation (#24209 ) Signed-off-by: sangbumlikeagod <oironese@naver.com> Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>	2025-12-01 18:19:17 +01:00
Fanli Lin	f37e8938d2	[XPU] Fix AWQ skipped layer detection in IPEX quantization (#29774 ) Signed-off-by: Fanli Lin <fanli.lin@intel.com>	2025-12-01 12:00:52 +00:00
Shu Wang	f72a817bdf	[MoE] CuteDSL MoE with Nvfp4 DeepEP dispatch (#27141 ) Signed-off-by: Shu Wang <shuw@nvidia.com> Signed-off-by: Shu Wang. <shuw@nvidia.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-30 16:05:32 -08:00
Xingyu Liu	21c2627934	[Misc]Remove redundant hidden_size property in ModelConfig (#29749 ) Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-30 17:14:23 +00:00
Omer Ullman Argov	39d28108f4	[Feat] Support non-gated activations in NVFP4 modelopt path (#29004 )	2025-11-30 11:02:40 -05:00
Cyrus Leung	64bc09ba27	[Core] Enable `inputs_embeds_size` separate from `hidden_size` (#29741 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-30 17:31:12 +08:00
Isotr0py	47539cfd3e	[Bugfix] Fix mismatched nvfp4 gemm output shape (#29742 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-30 09:15:01 +00:00

3709 Commits