biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Li, Jiang	b5dfb94fa0	[CI/Build][Bugfix] Fix Qwen2.5 tests in CPU CI via fallback silu_and_mul to torch native implementation (#22145 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-08-03 05:34:04 -07:00
Isotr0py	3dddbf1f25	[Misc] Add tensor schema test coverage for multimodal models (#21754 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com>	2025-08-03 00:52:14 -07:00
jiahanc	337eb23bcc	[Fix] Fix llama4 modelopt weight loading error (#22107 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-08-03 00:50:34 -07:00
Yan Ma	73e1b9b1d4	[xpu]support moe models on XPU platform (#21643 ) Signed-off-by: yan <yan.ma@intel.com> Signed-off-by: Yan Ma <yan.ma@intel.com>	2025-08-02 07:49:08 -07:00
Chih-Chieh Yang	b690e34824	[Model] Mamba2 preallocate SSM output tensor to avoid d2d copy overhead (#21075 ) Signed-off-by: Chih-Chieh Yang <7364402+cyang49@users.noreply.github.com> Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>	2025-08-02 01:59:34 -07:00
Yuxuan Zhang	25373b6c6c	for glm-4.1V update (#22000 ) Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-08-02 01:46:57 -07:00
Chih-Chieh Yang	c64861d63c	[Bugfix] Mamba2 remove bugged initial state condition in chunk scan (#22034 ) Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>	2025-08-01 23:55:57 -07:00
vllmellm	d3a6f2120b	[FEAT][ROCm] Enable running Flash Attention as ViT attn backend for Qwen-VL models on ROCm platform. (#22069 ) Signed-off-by: tjtanaavllm <tunjian.tan@amd.com> Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: tjtanaavllm <tunjian.tan@amd.com>	2025-08-01 23:53:18 -07:00
Dipika Sikka	9f9c38c392	[Speculators][Speculative Decoding] Add Qwen Eagle3 Support (#21835 ) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>	2025-08-01 19:43:37 -07:00
Varun Sundar Rabindranath	a65f46be5e	[Misc] DeepGemmExperts : Avoid JIT generation in the hot-path (#21955 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-08-01 19:42:03 -07:00
vllmellm	ee2eb6ecd8	[Model] Qwen2.5 VL SiLU-and-Mul (#22066 ) Signed-off-by: kf <kuanfu.liu@embeddedllm.com> Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: kf <kuanfu.liu@embeddedllm.com>	2025-08-01 19:34:37 -07:00
JartX	3654847db5	feat: Add Support GPTQ Quantization MOE on ROCM vllm serve (#21733 )	2025-08-01 21:12:19 -04:00
Harry Mellor	38c8bce8b6	Enable headless models for pooling in the Transformers backend (#21767 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-01 10:31:29 -07:00
Varun Sundar Rabindranath	ac45c44d98	[Bugfix] [Performance] DeepEPHighThroughput + DeepSeek : Quant before Dispatch (#21837 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-08-01 10:14:38 -07:00
Isotr0py	3f8e952179	[Bugfix] Fix glm4.1v video inference issue (#22067 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-08-01 09:33:30 -07:00
Dipika Sikka	dfbc1f8880	[Speculative Decoding] Add `speculators` config support (#21345 )	2025-08-01 08:25:18 -04:00
Harry Mellor	87c94bc879	Revert "Update sampling_metadata.py (#21937 )" (#22088 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-01 05:24:46 -07:00
Jee Jee Li	28b18cc741	[Quantization] Enable BNB support for InternS1 (#21953 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-01 11:09:54 +00:00
Aviad Rossmann	53d7c39271	Update sampling_metadata.py (#21937 ) Signed-off-by: Aviad Rossmann <aviadr@neureality.ai>	2025-07-31 23:23:18 -07:00
Kyle Sayers	0f46a780d4	[Model] [Quantization] Support quantization for Gemma3n (#21974 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-07-31 22:45:15 -07:00
Cyrus Leung	82de9b9d46	[Misc] Automatically resolve HF processor init kwargs (#22005 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-31 22:44:10 -07:00
Wentao Ye	c3e0e9337e	[Feature] Add Flashinfer MoE Support for Compressed Tensor NVFP4 (#21639 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-31 15:26:11 -07:00
Benjamin Chislett	2dff2e21d9	[Bugfix] Fix MTP weight loading (#21941 )	2025-07-31 16:33:53 -04:00
zhiweiz	9e0726e5bf	[Meta] Official Eagle mm support, first enablement on llama4 (#20788 ) Signed-off-by: morgendave <morgendave@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.me>	2025-07-31 10:35:07 -07:00
Song	9484641616	[Model] Add step3 vl (#21998 ) Signed-off-by: oliveryuan <yuansong@step.ai> Co-authored-by: oliveryuan <yuansong@step.ai>	2025-07-31 23:19:06 +08:00
amirkl94	207b750e19	[NVIDIA] Add SM100 Flashinfer MoE per tensor scale fp8 backend (#21458 ) Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-31 06:00:01 -07:00
wang.yuqi	2836dd73f1	[Model][CI] Let more pooling models support v1 (#21747 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-07-31 01:51:15 -07:00
Jee Jee Li	0f7919fca0	[Misc] Expand SUPPORTED_HIDDEN_SIZES for DeepEP low-latency kernels (#21818 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-30 20:41:12 -07:00
Sanchit Gandhi	ec02e536df	[Bugfix] Relax lang pin for voxtral (#21833 ) Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-30 20:38:52 -07:00
Cyrus Leung	004203e953	[CI/Build] Fix registry tests (#21934 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-30 09:10:41 -07:00
Yong Hoon Shin	ad510309ee	Override attention metadata for fast prefill in some KV sharing setups (#21590 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-07-30 08:54:15 -07:00
Isotr0py	6e599eebe8	[Bugfix] Fix OOM tests in initialization test (#21921 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-07-30 07:35:47 -07:00
Po-Han Huang (NVIDIA)	ff08e51940	[NVIDIA] Fix Llama4 Scout FP4 functionality issues (#21499 ) Signed-off-by: Po-Han Huang <pohanh@nvidia.com>	2025-07-30 07:33:40 -07:00
aladerran	d979dd6beb	[Feature][EPLB] Add eplb support for Qwen3 (#20815 ) Signed-off-by: aladerran <aladerran@gmail.com>	2025-07-30 06:27:57 -07:00
Jee Jee Li	fc91da5499	[Model] Remove DSV2 unused code (#21903 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-30 00:55:03 -07:00
Cyrus Leung	2ca5f82c2a	[Misc] Remove redundant config definitions (#21891 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-29 23:54:18 -07:00
Areeb Syed	fdde18229e	[Bugfix] Fix shape mismatch assertion error when loading Gemma3n model with BitsAndBytes quantization (#21808 ) Signed-off-by: sydarb <areebsyed237@gmail.com>	2025-07-30 11:35:21 +08:00
Yong Hoon Shin	9266d98048	[BugFix] Fix interleaved sliding window not set for Gemma3n (#21863 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-07-29 16:34:19 -07:00
Wenhua Cheng	ad341c5194	[Bugfix]fix mixed bits and visual language model quantization in AutoRound (#21802 ) Signed-off-by: Wenhua Cheng <wenhua.cheng@intel.com>	2025-07-29 07:26:31 -07:00
Jee Jee Li	61a6905ab0	[Model] Refactor JambaForCausalLM (#21394 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-29 18:25:07 +08:00
Reza Barazesh	37efc63b64	[V0 deprecation] Guided decoding (#21347 ) Signed-off-by: Reza Barazesh <rezabarazesh@meta.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-29 03:15:30 -07:00
Isotr0py	a4528f0cac	[Model]: Fused MoE for nomic-embed-text-v2-moe (#18321 ) Signed-off-by: isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-07-29 03:13:27 -07:00
Benji Beck	f1e2c095ec	Migrate InternVLImageInputs and InternVLVideoInputs to TensorSchema (#21684 ) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-07-28 22:09:45 -07:00
Wentao Ye	48b763d6b5	[Refactor] Merge Compressed Tensor FP8 `CompressedTensorsW8A8Fp8MoEMethod` and `CompressedTensorsW8A8Fp8MoECutlassMethod` (#21775 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-28 19:47:21 -06:00
Nikhil Gupta	89ac266b26	[Feat]: Add support for Dynamic Quant 4 bit CPU kleidiai kernels (#17112 ) Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-28 20:55:15 +00:00
rasmith	b361f14e39	[AMD][BugFix] Fix omission of wvSplitK kernel for small batch sizes (1-4) due to torch.compile (#21350 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-07-28 15:38:20 -04:00
Cyrus Leung	e17a4d3bf9	[Bugfix] Fix granite speech shape validation (#21762 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-28 14:19:21 -04:00
Anton Vlasjuk	656c24f1b5	[`Ernie 4.5`] Name Change for Base 0.3B Model (#21735 ) Signed-off-by: vasqu <antonprogamer@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-28 12:22:32 +00:00
Isotr0py	0ae970ed15	[Bugfix] Fix glm4.1v video_grid_thw tensor shape scheme (#21744 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-07-28 04:26:49 -07:00
Jee Jee Li	1b769dccf3	[Bugfix] Fix Ernie4_5_MoeForCausalLM shared experts (#21717 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-28 11:02:25 +00:00

2258 Commits