biondizzle/vllm

Author	SHA1	Message	Date
Wentao Ye	eebc58df0c	[Refactor] Remove unused cutlass moe problem size function (#32047 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-18 12:46:59 -08:00
Wentao Ye	16de822c71	[Refactor] Remove unused file `pallas_kv_cache_update.py` (#32433 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-18 12:46:39 -08:00
Deming	5480c6b1fa	[Doc] Correct comment for _jobs dict in OffloadingConnectorWorker (#32556 )	2026-01-18 12:46:00 -08:00
Andrey Khalyavin	ba29ab441e	Use the same memory for workspace13 and fused_output. (#31531 ) Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru>	2026-01-18 19:14:22 +00:00
Robert Shaw	afc3622602	[CI] Move Distributed Tests from H200 -> H100 (#32555 )	2026-01-18 10:25:23 -08:00
bnellnm	327a02d8db	[MoE Refactor] Separate Router into OO Classes (#30623 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-01-18 11:40:49 -05:00
tjp_zju	2f03035a61	"refactor: refactor_repeated_interfaces" (#32486 ) Signed-off-by: tom-zju <tanjianpingzju1990@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-01-18 22:07:01 +08:00
Isotr0py	38bf2ffb21	[Bugfix] Fix GLM-ASR audio encoder RoPE dim (#32540 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-18 19:17:59 +08:00
Li Xie	c826c72a96	[Model] Support Step1 Model (#32511 ) Signed-off-by: xieli <xieli@stepfun.com>	2026-01-18 10:20:46 +00:00
Canlin Guo	fe36bf5e80	[Model] Remove the unnecessary dtype conversion in MiniCPM (#32523 ) Signed-off-by: gcanlin <canlinguosdu@gmail.com>	2026-01-18 08:07:28 +00:00
Woosuk Kwon	963dc0b865	[Model Runner V2] Minor optimization for eagle input processing (#32535 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2026-01-17 21:55:17 -08:00
Isotr0py	8cc26acd8b	[Performance] Improve Triton prefill attention kernel's performance (#32403 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-17 20:19:59 -08:00
Robert Shaw	4a6af8813f	[MoE Refactor] Move Test Impl into Test Dirs (#32129 ) Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2026-01-18 12:16:59 +08:00
Woosuk Kwon	4147910f1e	[Model Runner V2] Move mrope_positions buffer to MRopeState (#32532 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2026-01-17 20:09:48 -08:00
Karan Bansal	3055232ba0	[Feature] Add FIPS 140-3 compliant hash algorithm option for multimodal hashing (#32386 ) Signed-off-by: Karan Bansal <karanb192@gmail.com>	2026-01-18 11:02:01 +08:00
Shengqi Chen	965765aef9	[build] fix cu130 related release pipeline steps and publish as nightly image (#32522 ) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>	2026-01-17 18:36:11 -08:00
Mritunjay Kumar Sharma	9e078d0582	[CI/Build][Docker] Add centralized version manifest for Docker builds (#31492 ) Signed-off-by: Mritunjay Sharma <mritunjay.sharma@chainguard.dev>	2026-01-17 13:45:30 +00:00
Guofang.Tang	2b99f210f5	[Misc] Fix typo: seperator -> separator in flashmla_sparse.py (#32411 ) Signed-off-by: Guofang Tang <tinggofun@gmail.com> Co-authored-by: Guofang Tang <tinggofun@gmail.com>	2026-01-17 12:18:30 +00:00
Kim Hee Su	1646fea672	[Model] Molmo2: Enable quantized weight mapping for vision backbone (#32385 ) Signed-off-by: kimheesu <wlskaka4@gmail.com>	2026-01-17 09:33:05 +00:00
Paul Pak	d3317bbba4	[Models] Lfm2Moe: minor name changes for resolving lora conflicts (#29063 ) Signed-off-by: Paul Pak <paulpak58@gmail.com>	2026-01-16 22:12:55 -08:00
Shengqi Chen	8e61425ee6	[CI] Implement uploading to PyPI and GitHub in the release pipeline, enable release image building for CUDA 13.0 (#31032 )	2026-01-17 04:52:33 +00:00
Matthew Bonanni	2e7c89e708	Revert "[Attention][MLA] Make `FLASHINFER_MLA` the default MLA backen… (#32484 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-17 04:42:39 +00:00
vanshil shah	037a6487af	apply _validate_input to MistralTokenizer token-id chat prompts (#32448 ) Signed-off-by: Vanshil Shah <vanshilshah@gmail.com>	2026-01-17 03:23:45 +00:00
Simon Mo	5a3050a089	[Docs][Governance] Add @robertshaw2-redhat to lead maintainers group (#32498 ) Co-authored-by: Claude <noreply@anthropic.com>	2026-01-16 18:35:49 -08:00
Chenyaaang	484e22bc18	[TPU][Core] Enable Pipeline Parallelism on TPU backend (#28506 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2026-01-16 15:29:20 -08:00
Lucas Wilkinson	ca21288080	[CI] Fix OOM in Hopper Fusion E2E Tests (H100) (#32489 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-16 21:27:16 +00:00
Andrew Xia	4c82b6fac7	[responsesAPI] allow tuning include_stop_str_in_output (#32383 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2026-01-16 21:14:40 +00:00
Xin Yang	a884bc62d6	[LoRA] Update LoRA expand kernel heuristic (#32425 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-16 18:38:07 +00:00
Hashem Hashemi	7a1030431a	Atomics Reduce Counting Optimization for SplitK Skinny GEMMs. (#29843 ) Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>	2026-01-16 11:45:04 -06:00
Wentao Ye	9fd918e510	[CI] Update deepgemm to newer version (#32479 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-17 01:18:05 +08:00
Ilya Markov	c9a533079c	[EPLB][BugFix]Possible deadlock fix (#32418 ) Signed-off-by: ilmarkov <markovilya197@gmail.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2026-01-16 09:11:01 -05:00
rasmith	6ca4f400d8	[CI][AMD] Skip test_permute_cols since the kernel is not used and not built for ROCm (#32444 ) Signed-off-by: Randall Smith <ransmith@amd.com>	2026-01-16 16:22:53 +08:00
Cyrus Leung	180e981d56	[Chore] Replace swish with silu (#32459 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-16 08:22:45 +00:00
Micah Williamson	b84c426a8c	[ROCm][CI] Skip Qwen3-30B-A3B-MXFP4A16 Eval Test On Non-CUDA Platforms (#32460 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-01-16 00:17:44 -08:00
Rabi Mishra	b66b0d6abb	fix(rocm): Enable non-gated MoE (is_act_and_mul=False) support on ROCm (#32244 ) Signed-off-by: rabi <ramishra@redhat.com>	2026-01-16 15:31:10 +08:00
Hongxin Xu	03da3b52ef	[Bugfix] Refactor to support DP parallel in R3 (#32306 ) Signed-off-by: xhx1022 <1737006628@qq.com> Co-authored-by: arlenxu <arlenxu@tencent.com>	2026-01-16 15:13:58 +08:00
Lucas Wilkinson	14ce524249	[CI] Breakup h200 tests (#30499 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-16 06:23:22 +00:00
wang.yuqi	4ae77dfd42	[Frontend][1/n] Make pooling entrypoints request schema consensus | CompletionRequest (#32395 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-01-16 06:17:04 +00:00
XiongfeiWei	73f635a75f	[Bug] Add TPU backend option (#32438 ) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2026-01-16 05:17:12 +00:00
cjackal	35bf5d08e8	[bugfix] Fix online serving crash when text type response_format is received (#26822 ) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com> Signed-off-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com> Co-authored-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com>	2026-01-16 12:23:54 +08:00
Kebe	5de6dd0662	[Bugfix] [DeepSeek-V3.2] fix sparse_attn_indexer padding (#32175 ) Signed-off-by: Kebe <mail@kebe7jun.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-16 03:21:55 +00:00
ltd0924	709502558c	[Model] Add Step3vl 10b (#32329 ) Signed-off-by: luotingdan <luotingdan@stepfun.com> Signed-off-by: ltd0924 <32387785+ltd0924@users.noreply.github.com> Co-authored-by: luotingdan <luotingdan@stepfun.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-01-15 19:04:16 -08:00
Micah Williamson	46f8a982b1	[ROCm][CI] Enable AITER Unified Attention On ROCm For gpt-oss Test (#32431 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-01-16 00:55:57 +00:00
Matthew Bonanni	bcf2333cd6	[CI] Fix LM Eval Large Models (H100) (#32423 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-16 00:52:49 +00:00
Michael Goin	83239ff19a	Add thread_n=64 support to Marlin MoE (#32360 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-15 16:45:44 -08:00
TomerBN-Nvidia	c277fbdf31	[Feat] Support non-gated MoE with Marlin, NVFP4 CUTLASS, FP8, INT8, compressed-tensors (#32257 ) Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Tomer Natan <tbarnatan@ipp1-1429.ipp1a1.colossus.nvidia.com>	2026-01-15 16:15:05 -08:00
Wentao Ye	aca5c51487	[Refactor] Remove unused file (#32422 )	2026-01-15 15:59:38 -07:00
Yongye Zhu	31c29257c8	[MoE Refactor][17/N] Apply Refactor to Bf16 (#31827 ) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-01-15 12:53:40 -08:00
Aleksandr Malyshev	8c11001ba2	[ROCM] DSfp4 mla projection gemms weight dynamic quantization (#32238 ) Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>	2026-01-15 14:13:08 -06:00
Richard Zou	bd292be0c0	[BugFix] Python file source reading can fail on UnicodeDecodeError (#32416 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-01-15 20:01:41 +00:00

13773 Commits