biondizzle/vllm

Author	SHA1	Message	Date
Milos Puzovic	2176778cd3	[Doc] Add Arm CPUs are on the list of supported targets in vLLM (#26018 ) Signed-off-by: Milos Puzovic <milos.puzovic@arm.com>	2025-11-06 15:30:26 +00:00
Eric Yue	0370679ce9	[Kernel][Model] Tune fused_moe Triton configs for MiniMax-M2 on H100 (#28200 ) Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com>	2025-11-06 07:29:46 -08:00
Harry Mellor	8816e375d3	[Docs] Switch to directory style URLs (#28058 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-06 07:06:33 -08:00
Michael Goin	f32229293e	Disable nm-testing models with issues in CI (#28206 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-06 06:19:07 -08:00
xiangze-arm	c757a15f0f	[CPU]Improve cpu fused moe perf (#27244 ) Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>	2025-11-06 11:04:18 +00:00
Chauncey	59a50afa08	[Frontend] OpenAI Responses API supports Tool/Function calling - non-harmony (#26874 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-11-06 10:40:03 +00:00
courage17340	981cadb35c	[Bugfix][Kernel] fix merge attn states when both prefix and suffix are empty (#28181 ) Signed-off-by: courage17340 <courage17340@163.com>	2025-11-06 17:52:13 +08:00
wangxiyuan	c3ee80a01a	[V0 deprecation]clean up is_v1_supported_oracle (#28116 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-06 16:05:32 +08:00
Aditya Tewari	3755c14532	[CPU] Enable torch profiling (#28130 ) Signed-off-by: Aditya Tewari <aditya.tewari@arm.com>	2025-11-06 07:32:05 +00:00
Seungduk Kim	201dc98acc	Fix hard-coded parameter name in gemma3n.py (#27946 ) Signed-off-by: Seungduk Kim <seungduk.kim@yanolja.com> Signed-off-by: Biswa Panda <biswa.panda@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Biswa Panda <biswa.panda@gmail.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2025-11-05 23:07:36 -08:00
Julien Denize	a404e2c0f1	Patch Mistral Tokenizer (#28146 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai>	2025-11-06 06:43:16 +00:00
Xiaozhu Meng	e31946f86e	[flashinfer] fix FI all2all with FI cutlass moe (#28166 ) Signed-off-by: Xiaozhu <mxz297@gmail.com>	2025-11-06 05:52:16 +00:00
gmagogsfm	bde5039325	[CI] Add compile/test_multimodal_compile.py to CI (#28151 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-06 05:41:47 +00:00
Jacob Zhong	d72299d47b	Make the cv2 dependency optional (#27780 ) Signed-off-by: Jacob <cmpute@qq.com>	2025-11-06 05:08:55 +00:00
Lukas Geiger	80679f108f	[Core][MM] Use non-blocking CPU-GPU copy of multimodal data (#28141 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-06 04:05:12 +00:00
Isotr0py	43ecd0a900	[Chore] Clean up deepseek v2/v3 config copy (#28055 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-06 03:46:30 +00:00
Chauncey	07d614511f	[Misc] Remove the duplicate code (#28111 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-11-05 21:07:47 -05:00
Vadim Gimpelson	f948ab6945	[CI Failure] `nm-testing/Qwen2-0.5B-Instruct-FP8-SkipQKV` was removed from HF. Skip it in tests (#28170 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-11-06 01:22:13 +00:00
Wentao Ye	d71af5f502	[Feature] Enable TP + EP `shared_experts` overlap with router, 3.7% E2E performance improvement (#28164 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-05 17:21:08 -08:00
Wentao Ye	90189c71a9	[Bug] Fix env string `"0"` same to `True` (#28159 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-05 17:04:20 -08:00
Wentao Ye	d79d9f0780	[Bug] Fix cpu disable shared_experts `VLLM_DISABLE_SHARED_EXPERTS_STREAM` (#28157 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-05 17:03:09 -08:00
Vadim Gimpelson	b6a248bdd7	[PERF] Decouple projections from GDN custom op. Attempt 2 (#28083 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-11-05 17:01:12 -08:00
Dayeol Lee	1767658559	[Debugging] Add annotation for easier trace analysis (#22496 )	2025-11-05 16:52:52 -08:00
Kuntai Du	efe73e9b57	[Core][Hybrid allocator + connector 2/n] Unify `remove_skipped_blocks` by `get_last_useful_token` (#25431 ) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>	2025-11-06 00:12:00 +00:00
Zhewen Li	0b8e871e5e	[CI/Build] Fix `test_defaults_with_usage_context` in AMD CI (#27926 ) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-11-05 15:40:24 -08:00
Zhewen Li	5ee93a5956	[CI/Build] Update checking logic in cutlass_group_gemm_supported (#27948 ) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-11-05 15:40:10 -08:00
Snehlata	e15601789b	[Feature]: Add corrupted request metric to V1 metrics system. (#27306 ) Signed-off-by: atalhens <sneh.lata@nutanix.com>	2025-11-05 13:45:29 -08:00
Richard Zou	65ac8d8dc4	[Docs] Add guide to debugging vLLM-torch.compile integration (#28094 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2025-11-05 21:31:46 +00:00
Isotr0py	ffb08379d8	[Chore] Remove Nemotron-Nano-VL config copy (#28126 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-05 20:06:45 +00:00
R3hankhan	e04492449e	[Hardware][IBM Z] Optimize s390x Dockerfile (#28023 ) Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>	2025-11-05 11:25:44 -08:00
Michael Yao	518ec6b722	[Docs] Clean up README_TUNING.md (#28088 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-11-05 19:01:34 +00:00
wang.yuqi	802748bddb	[Bugfix] Fix Qwen3-Reranker-8B load (#28117 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-11-05 18:33:50 +00:00
Paul Zhang	faedbb4d4f	[Feature] Extend batch invariant torch.compile to B200 (#27856 ) Signed-off-by: PaulZhang12 <paulzhan@fb.com>	2025-11-05 10:04:49 -08:00
Samuel Shen	40db194446	[CI]: Add LMCacheConnector Unit Tests (#27852 ) Signed-off-by: Samuel Shen <slshen@uchciago.edu> Co-authored-by: Samuel Shen <slshen@uchciago.edu> Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>	2025-11-05 09:45:57 -08:00
Chen Zhang	c765f0b443	[FlashInfer] Avoid FlashInfer block_size 16 + head_size 256 on blackwell (#27994 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-11-05 09:25:32 -08:00
gmagogsfm	002b07c4b2	[Bugfix] vLLM should check Inductor config for compile cache enablement status (#27637 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-11-05 12:22:44 -05:00
Walter Beller-Morales	752ddeacaa	[Core] add support for reasoning parser plugins (#28075 ) Signed-off-by: walter beller-morales <walter.beller.morales@gmail.com>	2025-11-06 01:15:06 +08:00
Jiangyun Zhu	c18f88c6ca	[Kernel] Fuse computation of g and beta for Gated Delta Net (#28095 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-11-05 09:14:55 -08:00
Jiaju Zhang	6fd0df8132	[misc] add vLLM Beijing Meetup (#28127 ) Signed-off-by: Jiaju Zhang <jjzhang@redhat.com>	2025-11-05 17:12:59 +00:00
Isotr0py	3f5a4b6473	[Bugfix] Validate custom logits processor xargs for online serving (#27560 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-05 16:53:33 +00:00
Pleaplusone	6cae1e5332	[ROCm][MLA] Support block-size > 1 for AITER MLA backend (#27224 ) Signed-off-by: ganyi <ygan@amd.com> Co-authored-by: wuhuikx <hattie.wu@amd.com>	2025-11-05 10:43:02 -05:00
Alexei-V-Ivanov-AMD	80c9275348	Enabling cooperative multi-gpu tests on multi-gpu nodes (#27986 ) Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>	2025-11-05 10:35:49 -05:00
Ilya Markov	e50c454672	[BugFix] Support EP/DP + EPLB with MTP (#25311 ) Signed-off-by: ilmarkov <markovilya197@gmail.com> Signed-off-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-11-05 15:22:17 +00:00
Chen Zhang	5d16d0fa62	[DCP] check return_lse for all layers in dcp (#27929 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-11-05 22:27:25 +08:00
bigmoyan	0606bea2b6	add kimi reasoning parser (#28128 ) Signed-off-by: wangzhengtao <wangzhengtao@msh.team> Co-authored-by: wangzhengtao <wangzhengtao@msh.team>	2025-11-05 21:48:33 +08:00
Frost Mitchell	6e97eccf5d	[XPU] Enable custom routing functions in IPEX for Llama4 (#28004 ) Signed-off-by: frost-intel <frost.mitchell@intel.com>	2025-11-05 13:39:57 +00:00
Boyuan Feng	6ab183813c	[Graph Partition][Cache] Use inductor partition ops config (#27702 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-11-05 13:04:48 +00:00
amirkl94	6b7a81185d	Bugfix: Cutlass FP8 FusedMoE bad scaling factors (#27255 ) Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-05 06:06:06 -05:00
Eric Yue	b57789b62b	Fix excessive logging noise by reducing the log level of the MinimaxM2ToolParser import success message (#27635 ) Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com>	2025-11-05 19:03:51 +08:00
Chauncey	377061d481	[Misc] fix import error for DeepSeekR1ReasoningParser (#28114 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-11-05 19:02:32 +08:00

14386 Commits