biondizzle/vllm - vllm

Author	SHA1	Message	Date
Didier Durand	63fed55506	[Doc]: fix typos in various files (#28811) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-11-16 14:30:06 +00:00
Anna Shors	8d259fad6c	Fix gpt oss weight loading with EP + bf16 (#28765) Signed-off-by: ashors1 <ashors@nvidia.com>	2025-11-16 13:12:45 +00:00
scottzh8	3bc1175798	[Bugfix] Fix host and port join for ipv6 in bench serve (#28679) Signed-off-by: Scott Zhang <scottzh@fb.com> Co-authored-by: Scott Zhang <scottzh@fb.com>	2025-11-16 10:20:57 +00:00
Dezhan	af02c40970	Fixed gpt-oss _load_weights_other() parameter position bug (#28715) Co-authored-by: Dezhan Tu <dztu@meta.com>	2025-11-16 09:46:29 +00:00
Lucia Fang	b316ac6589	[V1] Support MP Executor for multi node distributed inference (#23691) Signed-off-by: Lu Fang <fanglu@fb.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Signed-off-by: Lucia Fang <fanglu@fb.com> Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-11-16 09:01:21 +00:00
wang.yuqi	a55b64635c	[Model] Allow users to control skip reading cache per request. (#28194) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com>	2025-11-16 00:04:50 -08:00
ai-jz	d231876ce3	[Benchmark] Fix client seed synchronization in multi-turn benchmark (#28512) Signed-off-by: ai-jz <aijz.xplr@gmail.com>	2025-11-16 15:04:32 +08:00
Bram Wasti	f849ee739c	Adding a benchmark for batch invariance (#28161) Signed-off-by: Bram Wasti <bwasti@meta.com> Signed-off-by: Bram Wasti <bwasti@fb.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-16 13:22:17 +08:00
Lucas Wilkinson	be263f7645	[BugFix] Fix `AssertionError: DCP not support reorder_batch_threshold > 1 now.` (#28751) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-15 22:35:06 +00:00
Didier Durand	2bb4435cb7	[Doc]: fix typos in various files (#28567) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-11-15 19:27:50 +00:00
Lukas Geiger	07cadab27a	[Model][Qwen3VL] Cache positional embedding indices (#28475) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-11-15 19:03:09 +00:00
Nick Hill	637f292196	[CI] Fix broken pipeline (#28781) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-15 08:44:14 -08:00
Eldar Kurtić	e439c784fa	Add support for Eagle with separate lm-head and embed_tokens layers (#28549) Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>	2025-11-15 06:12:02 -08:00
hwhaokun	085a525332	[Model] Fix lmhead init bug of bailing_moe (#28777) Signed-off-by: hwhaokun <haokun0405@163.com> Co-authored-by: zhaozx-cn <zhaozx2116@163.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-15 05:44:12 -08:00
Cyrus Leung	89d3679221	[Doc] Fix failing doc build (#28772) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-15 05:33:27 -08:00
tingtinggithub	cb15ee28db	Allow Gemma3 to take image embeddings (#28483) Signed-off-by: tingtinggithub <streamttt@gmail.com>	2025-11-15 04:18:08 -08:00
Angela Yi	f36292dbee	[compile] Enable sequence parallelism matching w/o custom ops enabled (#27126) Signed-off-by: angelayi <yiangela7@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: ProExpertProg <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <luka.govedic@gmail.com>	2025-11-15 11:46:12 +00:00
Vadim Gimpelson	173b356abf	[PERF] Remove TRTLLM Gen attn kernel limitation `max_seq_len <=131072` (#28755) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-11-15 15:43:41 +05:30
Cyrus Leung	638e4196d1	[Misc] Make `SchedulerConfig.max_model_len` init-only (#28733) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-15 01:59:31 -08:00
Zhewen Li	1ec978c209	[Kernel][Moe Configs] llama4 maverick fp8 moe config tp8 on mi325 (#28709) Signed-off-by: Zhewen Li <zhewenli@meta.com>	2025-11-15 01:10:48 -08:00
Jane (Yuan) Xu	74b5267d3a	Use narrow over indexing in `hadacore_transform` to prep for ABI stable (#28756) Signed-off-by: Jane Xu <janeyx@meta.com>	2025-11-15 01:10:15 -08:00
Zhuohan Li	dd6ac1c2bb	[RL] [V1] Remove unused device argument from reset_kv_cache (#28766) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>	2025-11-14 23:59:42 -08:00
Cyrus Leung	98b4d389ed	[Redo] #26368 (#28771) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-14 22:47:41 -08:00
Varun Sundar Rabindranath	6965ef436f	[Performance][DeepGEMM] Estimate expected_m (#28694) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-15 13:52:14 +08:00
Chendi.Xue	c9e665852a	[NIXL] heterogeneous block_size support (#26759) Signed-off-by: Chendi Xue <chendi.xue@intel.com> Signed-off-by: Chendi.Xue <chendi.xue@intel.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>	2025-11-14 21:51:32 -08:00
Mohammad Othman	363aaeef0f	Fix IntermediateTensors initialization and add type hints (#28743) Signed-off-by: Mohammad Othman <Mo@MohammadOthman.com> Co-authored-by: Mohammad Othman <Mo@MohammadOthman.com>	2025-11-15 04:31:36 +00:00
Nick Hill	ac86bff8cb	Revert "[Core] Performance: Use list[np.ndarray] instead of list[list… (#28773)	2025-11-14 20:24:00 -08:00
Michael Goin	edfe498189	[Bugfix] Build hadacore kernels on >SM90 (#28748) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-14 19:51:05 -08:00
Lukas Geiger	f05d474c8a	[Model][Qwen3VL] Use `mm_position` to compute mrope positions (#28730) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-14 19:45:11 -08:00
QiliangCui	9fc81ec765	[TPU] Fix import error in tpu launch (#28758) Signed-off-by: Qiliang Cui <derrhein@gmail.com>	2025-11-15 00:58:32 +00:00
Jialin Ouyang	186352b270	[Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization (#26368) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-14 16:04:04 -08:00
Nick Hill	58e61e56b7	[Test] Rework e2e async scheduling tests (#28744) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-14 16:01:09 -08:00
Gregory Shtrasberg	75f01b9d3c	[ROCm][CI/Build] Upgrade to ROCm 7.1 and AITER main (#28753) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-11-14 15:53:21 -08:00
rasmith	ba041d980b	[Log] Save profiler results to file instead of stdout (#28144) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-14 23:26:39 +00:00
Thomas Parnell	e0c910bb89	[Hybrid] [Kernel] Fix chunk scan kernel when BLOCK_SIZE_DSTATE > 128 (#28295) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-11-14 22:55:42 +00:00
Benjamin Chislett	bf3ffb61e6	[Bugfix] Fix ChunkedLocalAttention CUDA Graph setting (#28739) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-11-14 14:14:46 -08:00
Alexander Matveev	e5c78956c0	[Bugfix] Fix incorrect use of hidden_states for shared_experts due to do_naive_dispatch_combine (#28740) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-11-14 14:13:46 -08:00
Laith Sakka	2e0ad629b0	Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch (#25110) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-11-14 14:11:10 -08:00
Gregory Shtrasberg	5a84b76b86	[ROCm][CI/Build] Change install location of uv (#28741) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-11-14 21:34:18 +00:00
Marcin Ostrowski	0de4f217ab	[Bugfix] TypeError: 'NoneType' object is not callable (#27410) Signed-off-by: Marcin Ostrowski <marcinx.ostrowski@intel.com>	2025-11-14 21:13:53 +00:00
Michael Goin	f08eab2acc	[CI] Fix macos smoke test uv cache issue (#28736) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-14 13:29:55 -07:00
Sage Moore	8977ffb5e6	[ROCm][Bugfix] Fix compilation errors with fused_qknorm_rope_kernel.cu (#28682) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-11-14 11:06:01 -08:00
Andrey Khalyavin	fd4555089a	[BugFix] Fix misprint introduced by modular_kernel refactoring. (#28728) Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru>	2025-11-14 10:58:18 -08:00
GuanH	cec275efce	[Bugfix] resolve Qwen3-VL GPTQModel quantized model loading failure (#28663) Signed-off-by: GuanH <guansdrailib@gmail.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-14 18:44:27 +00:00
Cyrus Leung	e2741f6cbc	[Chore] Rename `SchedulerConfig.chunked_prefill_enabled` (#28735) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 18:39:57 +00:00
Harry Mellor	67187554dd	[Docs] Enable some more markdown lint rules for the docs (#28731) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-14 18:39:19 +00:00
TJian	a425dc256e	[Bugfix] [ROCm] [AITER]: Fix aiter block quant not compatible with torch compile dynamo (#28716) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-11-14 10:30:50 -08:00
Fardin Hoque	964d65deed	LLaMA4 LoRA Adapter Enablement (#28602) Signed-off-by: Fardin Hoque <kfhfar@amazon.com> Co-authored-by: Wei Wei <wwei6@meta.com>	2025-11-14 13:27:56 -05:00
Chen Wang	9261eb3dc1	docs(lora_resolvers): clarify multi-resolver order and storage path requirement (#28153) Signed-off-by: Chen Wang <Chen.Wang1@ibm.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-14 18:08:30 +00:00
czhu-cohere	cdd7025961	[kernel] Improve FP8 PTPC on Hopper for larger shapes (#28692) Signed-off-by: czhu-cohere <conway.zhu@cohere.com>	2025-11-14 09:59:11 -08:00

13302 Commits