biondizzle/vllm - vllm

Author	SHA1	Message	Date
rasmith	0024f39a32	[ROCm][P/D][MORI][BugFix] Add transfer_id for moriio_connector so moriio_connector to restore P/D functionality (#34907 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2026-03-16 10:36:51 +08:00
Andrew Xia	e9163b536e	[responsesAPI][ez] add a unit test for SimpleContext logprobs (#37126 ) Signed-off-by: Andrew Xia <axia@meta.com>	2026-03-15 17:12:26 -07:00
Lalithnarayan C	7acaea634c	In-Tree AMD Zen CPU Backend via zentorch [1/N] (#35970 ) Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Chinmay-Kulkarni-AMD <Chinmay.Kulkarni@amd.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-15 23:35:35 +00:00
Jiangyun Zhu	697e4ff352	[GDN] add a config for gdn kernel selection (#36647 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-03-16 00:40:17 +08:00
Hari	a3e2e250f0	[Feature] Add Azure Blob Storage support for RunAI Model Streamer (#34614 ) Signed-off-by: hasethuraman <hsethuraman@microsoft.com>	2026-03-15 19:38:21 +08:00
Isotr0py	143e4dccdf	[Misc] Add online audio_in_video test (#36775 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-15 00:14:11 -07:00
Isotr0py	6590a3ecda	[Frontend] Remove `torchcodec` from audio dependency (#37061 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-15 05:15:59 +00:00
Russell Bryant	b3debb7e77	[Build] Upgrade xgrammar to get a security fix (#36168 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2026-03-15 03:13:48 +00:00
Nick Hill	458c1a4b2d	[Frontend] Reduce chat template warmup logging levels (#37062 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-14 13:48:59 -07:00
Karan Bansal	821fde2df4	[Bugfix] Fix xgrammar dtype mismatch on macOS CPU inference (#32384 ) Signed-off-by: Karan Bansal <karanb192@gmail.com> Co-authored-by: Inokinoki <inoki@inoki.cc>	2026-03-14 17:29:06 +00:00
arlo	8c29042bb9	[Feature] Add InstantTensor weight loader (#36139 )	2026-03-14 18:05:23 +01:00
Cyrus Leung	5467d137b3	[Frontend] Avoid startup error log for models without chat template (#37040 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-14 09:36:11 -07:00
Santino Ramos	3ed46f374b	[Model Runner V2] Add Support for XD-RoPE (#36817 ) Signed-off-by: Santino Ramos <elsantinoramos@gmail.com>	2026-03-14 09:27:55 -07:00
seanmamasde	84868e4793	[Bugfix][Frontend] Fix audio transcription for MP4, M4A, and WebM formats (#35109 ) Signed-off-by: seanmamasde <seanmamasde@gmail.com>	2026-03-14 08:44:03 -07:00
Isotr0py	a8e8d62dd8	[Misc] Clean up Kimi-audio whisper encoder loading (#36903 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-14 23:37:52 +08:00
Julien Denize	e42b49bd69	Mistral common v10 (#36971 ) Signed-off-by: juliendenize <julien.denize@mistral.ai> Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Co-authored-by: root <root@h200-bar-196-227.slurm-bar-compute.tenant-slurm.svc.cluster.local> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-03-14 07:26:43 -07:00
Sergey Zinchenko	4a718e770d	[Bug] Fix Failure in /v1/chat/completions/render for Multimodal Requests (https://github.com/vllm-project/vllm/issues/35665 ) (#35684 )	2026-03-14 14:10:11 +00:00
Kevin H. Luu	600a039f57	[CI] Shard Multi-Modal Models (Standard) into 4 parallel jobs (#37014 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 08:26:54 +00:00
Harry Mellor	ffa5d74f15	Enable loading of fused expert weights in the Transformers modelling backend (#36997 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-14 07:01:06 +00:00
Kevin H. Luu	74fe80ee95	[CI] Split Distributed Tests (4 GPUs) into 3 parallel jobs (#37015 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 12:21:13 +08:00
Flora Feng	bcfdadb1bc	[Refactor] Relocate chat completion and anthropic tests (#36919 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-14 12:16:16 +08:00
Yanan Cao	236de72e49	[CI] Pin helion version (#37012 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 23:25:29 -04:00
sbeurnier	a116f96930	[V1] Remove pin_memory() in async_copy_to_gpu to fix sporadic stalls (#37006 ) Signed-off-by: Sebastien Beurnier <sbeurnier@together.ai>	2026-03-14 01:37:32 +00:00
Li, Jiang	092ace9e3a	[UX] Improve UX of CPU backend (#36968 ) Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: Li, Jiang <bigpyj64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-14 09:27:29 +08:00
Andrew Xia	f680dc1b39	[responsesAPI] prioritize content over summary in reasoning item input (#36516 ) Signed-off-by: Andrew Xia <axia@meta.com> Signed-off-by: Andrew Xia <mitandrewxia@gmail.com> Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Andrew Xia <axia@fb.com>	2026-03-14 09:20:30 +08:00
Giulio Leone	b41aa264f9	fix: resolve chat template names before kwargs detection (#36937 ) Co-authored-by: giulio-leone <giulio.leone@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-14 00:20:16 +00:00
Dimitrios Bariamis	367cf5cd3e	[Feat][Bugfix] Enable additional dimension for Flashinfer MLA and fix routing dtype (#36931 ) Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>	2026-03-13 16:41:16 -07:00
haosdent	6d53efd2a5	[Bugfix] Fix MLA attention crash with AWQ/GPTQ quantized models (#34695 ) Signed-off-by: haosdent <haosdent@gmail.com>	2026-03-13 23:25:41 +00:00
Benjamin Chislett	8b346309a5	[Refactor] Consolidate SupportsEagle (#36063 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-03-13 23:22:40 +00:00
Nick Hill	54a6db827f	[BugFix] Fix "DP Coordinator receives unexpected..." messages (#37008 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-13 23:18:05 +00:00
Matthew Bonanni	9efc4db965	[Bugfix] Fix DeepSeek-V3.2 tokenizer stripping spaces (#37004 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-13 22:55:36 +00:00
Kevin H. Luu	f1816fb192	[CI] Split V1 e2e + engine (1 GPU) into separate jobs (#36945 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 14:16:02 -07:00
Harry Mellor	0005d2a3c9	Use Transformers v5 `WeightRenaming` for Transformers modeling backend (#31545 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-13 20:49:08 +00:00
Ekagra Ranjan	d0b402974f	[Bugfix][Spec Decode] Avoid double call of Ngram CPU (#36952 ) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>	2026-03-13 20:33:19 +00:00
Divakar Verma	6341d43043	[ROCm][Quantization] add quark w4a8 mxfp4_fp8 for LinearLayer (#35316 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2026-03-13 19:44:24 +00:00
Mark McLoughlin	7afe0faab1	[Frontend][Core] Re-add shutdown timeout - allowing in-flight requests to finish (#36666 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-03-13 12:10:06 -07:00
Harry Mellor	5a3f1eb62f	[Misc] Set default `kv_buffer_device` in a better way (#36862 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-13 19:07:33 +00:00
yugong333	b3ce711b93	Fp8 lora dense kernel (#35242 ) Signed-off-by: Yu Gong <yu3.gong@gmail.com>	2026-03-13 19:05:08 +00:00
Isotr0py	abf61aaa8e	[Bugfix] Fix Qwen2.5-omni/Qwen3-omni mm_processor cache for audio_in_video request (#36800 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-13 18:16:05 +00:00
bigmoyan	4508532fbd	[Bugfix] fix paddleocr crash on some image shape (#36959 ) Signed-off-by: wangzhengtao <wangzhengtao@msh.team> Signed-off-by: bigmoyan <moyan_work@foxmail.com> Co-authored-by: wangzhengtao <wangzhengtao@msh.team> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-13 13:46:55 +00:00
Itay Alroy	d5af196c18	[2/N] Elastic EP Milestone 2: Integrating NIXL-EP (#35627 ) Signed-off-by: Itay Alroy <ialroy@nvidia.com> Co-authored-by: Yongji Wu <wuyongji317@gmail.com> Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>	2026-03-13 09:25:33 -04:00
Chaojun Zhang	82f836d976	[XPU] Support LoRA via torch.compile on XPU platform (#36962 ) Signed-off-by: chzhang <chaojun.zhang@intel.com>	2026-03-13 10:34:59 +00:00
Andreas Karatzas	4fccd30f19	[ROCm][CI] Upgrading orchestrator to handle python pipeline markers and options (#36181 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-13 02:04:22 -07:00
Or Ozeri	cfaf4668f7	[kv_offload+HMA][1/N]: Support multiple KV groups in OffloadingSpec (#36610 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-13 08:04:21 +00:00
Andreas Karatzas	99a57bdf74	[ROCm][CI] Corrected the GPT-OSS test root path (#36711 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-13 15:53:43 +08:00
Sage	a2268617cf	[Frontend] Delegate preprocessing to `OpenAIServingRender` (#36483 ) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>	2026-03-13 00:39:43 -07:00
Rohan Potdar	a4ad9db541	Enable RoPE+KV cache fusion for ROCm AITER FA (non-shuffle layout) (#35786 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-03-13 07:33:22 +00:00
Nick Hill	b373b5102a	[Tests] Shutdown test `RemoteVLLMServer` cleanly (#36950 ) Recent PR #33949 changed the teardown logic of the RemoteVLLMServer test utility class to send SIGTERM to all vllm (sub)processes at once, which breaks the clean/coordinated shutdown logic that assumes only the top-level process will receive a signal (for example when running in a container that's shut down). This caused a bunch of errors and stacktraces in some test logs, even though those tests still pass. We should still attempt a normal shutdown and only kill other procs if they are still running after a few seconds. Example: tests/v1/distributed/test_external_lb_dp.py::test_external_lb_completion_streaming Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-13 07:32:55 +00:00
Thomas Parnell	f296a1966d	[Bugfix] Fix FlashInfer GDN warmup ValueError on SM90 GPUs (#36876 )	2026-03-13 07:09:39 +01:00
Csrayz	bc2c0c86ef	[Frontend] Fix usage incorrectly returned with empty `stream_options` (#36379 ) Signed-off-by: Csrayz <33659823+Csrayz@users.noreply.github.com>	2026-03-13 03:33:04 +00:00

14842 Commits