biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Fan Yang	a1946570d8	add --insecure arg to the vllm bench to skip TLS (#34026 ) Signed-off-by: Fan Yang <yan9fan@meta.com> Co-authored-by: Fan Yang <yan9fan@meta.com>	2026-02-10 22:23:52 +08:00
Harry Mellor	d0bc520569	Bump `mamba-ssm` version in CI for Transformers v5 compatibility (#34233 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-10 14:46:01 +01:00
Krish Gupta	748625cdaf	[V1][BugFix] Fix EAGLE3 encoder cache miss with disable_chunked_mm_input (#34220 ) Signed-off-by: KrxGu <krishom70@gmail.com>	2026-02-10 13:05:32 +00:00
Harry Mellor	61413973e8	Stop testing for slow tokenizers as they will not exist soon (#34235 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-10 12:08:20 +00:00
Phúc H. Lê Khắc	94de871546	[Misc] allow specify is_mm_prefix_lm in hf_config (#34215 )	2026-02-10 11:16:21 +00:00
tc-mb	e042d7e685	Add flagos in MiniCPM-o (#34126 ) Signed-off-by: tc-mb <caitianchi@modelbest.cn> Signed-off-by: Vincent-Xiao <vincent.xiao.me@gmail.com> Co-authored-by: Vincent-Xiao <vincent.xiao.me@gmail.com>	2026-02-10 02:51:48 -08:00
Roger Wang	ae4e280602	[Bugfix] Fix FI kernel `chunk_gated_delta_rule` output shape for Qwen3.5 (#34219 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-02-10 10:41:24 +00:00
zzaebok	cbea11c9f0	[Docs] Fix format error in KV load failure recovery doc (#34137 ) Signed-off-by: Jaebok Lee <jaebok9541@naver.com>	2026-02-10 02:16:26 -08:00
Cyrus Leung	2c32558a3c	[Bugfix] Fix `--trust-remote-code` conflict (#34218 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-10 00:29:10 -08:00
Zetong Li	5f970120f0	[Bugfix] Fix memory inconsistency in cross-process shared memory (#32022 ) Signed-off-by: Zetong Li <slippersss@126.com>	2026-02-10 08:22:03 +00:00
Cyrus Leung	998e2d91f8	Revert #34208 (#34216 )	2026-02-09 23:59:04 -08:00
Wentao Ye	e1060a71a1	[Perf] Optimize detokenizer python logic (#32975 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2026-02-09 23:54:41 -08:00
Chen Zhang	97fa8f6590	[BugFix] Avoid prefix cache hit in the same schedule step for mamba layers (#29387 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2026-02-10 07:41:16 +00:00
wang.yuqi	dab1de9f38	[Frontend][CI] Consolidate instrumentator entrypoints (#34123 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-02-10 07:30:19 +00:00
Balaxxe	8d48d0a9d9	[Bugfix] Sort hf_weights_files in fastsafetensors_weights_iterator to match #33491 (#34190 ) Signed-off-by: Balaxxe <136368465+jaim12005@users.noreply.github.com>	2026-02-09 23:06:30 -08:00
Andrew Xia	9608844f96	[responsesAPI] fix simpleContext streaming output_messages (#34188 ) Signed-off-by: Andrew Xia <axia@meta.com> Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2026-02-09 22:53:07 -08:00
Cyrus Leung	f69b903b4c	[Bugfix] Add `--trust-remote-code` to dataset bench args (#34208 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-09 22:37:50 -08:00
Lucas Wilkinson	81e217fe6b	[Bugfix] Fix DP Attention Padding in Dummy Run (#34187 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>	2026-02-10 05:29:39 +00:00
Cyrus Leung	ab97bcf662	[CI/Build] Relax `test_mcp_tool_call` (#34204 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-10 05:18:57 +00:00
Cyrus Leung	25e48a3aae	[Doc] Update usage of `--limit-mm-per-prompt` (#34148 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-09 21:12:13 -08:00
Roger Wang	8a5e0e2b2b	[Bugfix][Core] Fix CPU memory leak from Request reference cycle in prefix caching (#34183 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-02-10 13:03:32 +08:00
Andreas Karatzas	4cde2e0159	[ROCm][Bugfix] Resolve Dynamo tracing crash from amdsmi calls in on_gfx* arch detection (#34108 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-09 20:50:20 -08:00
Roger Wang	047a457fa4	[Bugfix] Adopt `ChunkGatedDeltaRule` for Qwen3.5 (#34198 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-02-10 03:47:54 +00:00
Yuwei An	e94ec59733	[LMCache] Token Base IPC API (#34175 ) Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>	2026-02-10 01:18:42 +00:00
Ning Xie	13397841ab	[structured output] validate unsupported json features first (#33233 ) Signed-off-by: Andy Xie <andy.xning@gmail.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2026-02-09 23:49:09 +00:00
Gregory Shtrasberg	c60f8e3b49	[Bugfix][ROCm][GPT-OSS] Use old triton_kernels implementation on ROCm if the new API is not available (#34153 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2026-02-09 17:38:54 -06:00
Michael Goin	5e75a14a66	[Doc] Add DCP support to attention backend doc (#33936 )	2026-02-09 18:33:43 -05:00
Nick Hill	e7e52781ff	[ModelRunner V2][BugFix] Fix `max_query_len` calculation (#34167 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-09 21:47:17 +00:00
Charlie Fu	bb9f97308d	[torch.compile][Fusion] Fix attention fusion pass removing kv_update op. (#33945 ) Signed-off-by: charlifu <charlifu@amd.com>	2026-02-09 16:15:43 -05:00
Hongxia Yang	4d39650961	[ROCm] update triton branch to support gpt-oss models for gfx11xx devices (#34032 ) Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>	2026-02-09 19:36:30 +00:00
Artus Krohn-Grimberghe	8fd31f6245	[Bugfix] Voxtral prompt/audio placeholder alignment (#34140 ) Signed-off-by: Artus KG <artuskg@gmail.com>	2026-02-09 19:30:38 +00:00
Artus Krohn-Grimberghe	eadb4e868b	[Bugfix] Avoid duplicate k-proj weight emission in helper (#34142 ) Signed-off-by: Artus KG <artuskg@gmail.com>	2026-02-09 19:17:44 +00:00
Jiangyun Zhu	285bab4752	[Kernel] use flashinfer for gdn prefill (#32846 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2026-02-09 12:17:25 -05:00
TomerBN-Nvidia	995bbf38f1	[Bugfix] Fix shared expert input for latent MoE in EP+DP (Nemotron-H) (#34087 ) Signed-off-by: Tomer Natan <tbarnatan@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-09 16:44:18 +00:00
Mohammad Miadh Angkad	d4f123cc48	[Kernel] FlashInfer: switch allreduce fusion to unified API (#33985 ) Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>	2026-02-09 15:43:24 +00:00
ZhengHongming888	cb62e86f83	Add NUMA Core binding in nixl_connector for CPU xPyD (#32365 ) Signed-off-by: Hongming Zheng <hongming.zheng@intel.com> Signed-off-by: ZhengHongming888 <hongming.zheng@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-09 15:39:12 +00:00
Luka Govedič	781ddf7868	[CI][torch.compile] Fix incorrect filtering for E2E fusion tests on B200 (#34031 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com>	2026-02-09 10:05:14 -05:00
Roger Wang	64a9c2528b	[UX] Add `--language-model-only` for hybrid models (#34120 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-02-09 14:57:33 +00:00
Lucas Wilkinson	d0d97e2974	[Misc] Fix up attention benchmarks (#33810 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2026-02-09 09:42:03 -05:00
JJJYmmm	9562912cea	[MODEL] Adding Support for Qwen3.5 Models (#34110 ) Signed-off-by: JJJYmmm <1650675829@qq.com> Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: wulipc <wulipc@users.noreply.github.com> Co-authored-by: ywang96 <ywang96@users.noreply.github.com> Co-authored-by: Isotr0py <Isotr0py@users.noreply.github.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-02-09 21:12:58 +08:00
zofia	9bdb06b436	[XPU][6/N] add xpu scaled_mm kernel (#34117 ) Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>	2026-02-09 20:17:35 +08:00
Nikhil Gupta	caad9f1e01	[Fix] [CPU Backend] : Prepack weights for w8a8 oneDNN matmul (#33901 ) Signed-off-by: nikhil-arm <nikhil.gupta2@arm.com>	2026-02-09 18:04:41 +08:00
Ekagra Ranjan	1d5922fade	[ASR] Fix audio benchmark and add RTFx metric (#32300 ) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>	2026-02-09 10:02:37 +00:00
Andreas Karatzas	3025b3cebb	[CI] Remove empty image_size_factors for fuyu, glm4_1v, glm_ocr (#34107 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-09 17:37:04 +08:00
Jee Jee Li	978a37c823	[Model] GLM adaptation (#34124 )	2026-02-09 17:32:52 +08:00
ihb2032	5a5c43511a	fix(cpu): fix mla_decode compilation on x86 without AVX512 (#34052 ) Signed-off-by: ihb2032 <hebome@foxmail.com> Co-authored-by: root <root@LAPTOP-FKNHV411.localdomain>	2026-02-09 08:55:41 +00:00
Nick Hill	d9bede0314	[BugFix] Fix `fastsafetensors` TP all procs using all GPUs (#34070 ) Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-09 15:15:46 +08:00
wang.yuqi	22b64948f6	[Frontend][last/5] Make pooling entrypoints request schema consensus. (#31127 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> v0.16.0rc1	2026-02-09 06:42:38 +00:00
Reagan Lee	7c233dbb36	[Tiny] Rename encoder budget file to more specific name (#34103 ) Signed-off-by: Reagan Lee <“reaganjlee@gmail.com”> Co-authored-by: Reagan Lee <“reaganjlee@gmail.com”>	2026-02-09 03:48:19 +00:00
kourosh hakhamaneshi	a75a5b54c7	[bug-fix] supported_tasks is breaking backward compatibility at init_app_state (#34027 ) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com> Signed-off-by: kourosh hakhamaneshi <31483498+kouroshHakha@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-09 09:46:46 +08:00

13780 Commits