biondizzle/vllm commits

Author	SHA1	Message	Date
Harry Mellor	aed16879a9	Move `ModelConfig` from `config/__init__.py` to `config/model.py` (#25252) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-19 16:22:33 +00:00
Harry Mellor	cf278ff3b2	Update CODEOWNERS (#25269) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-19 09:12:55 -07:00
Icey	838d7116ba	[Qwen] Remove cuda hard-code in qwen3 next (#25243) Signed-off-by: Icey <1790571317@qq.com>	2025-09-19 12:25:12 +00:00
Cyrus Leung	5089fd749c	[V0 Deprecation] Remove V0 logic from `get_input_embeddings` interface (#25242) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-19 11:10:52 +00:00
Nicolò Lucchesi	a3d087adec	[P/D][Nixl] Introduce `KVTransferMetrics` and aggregation strategy (#22188) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-09-19 11:09:14 +00:00
Harry Mellor	058525b997	Move `PoolerConfig` from `config/__init__.py` to `config/pooler.py` (#25181) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-19 11:02:55 +00:00
Roger Wang	1dfea5f4a9	[Bugfix][Perf] Misc fixes for Qwen3 VL (#25238) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-09-19 10:46:16 +00:00
Isotr0py	cea91a32f2	[Kernel][Performance] Add Triton kernel for Qwen3-VL interleaved MRoPE (#25055) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-19 10:27:49 +00:00
Yan Ma	a684c0124c	[bugfix] fix MHA for models like OpenGVLab/InternVL3_5-38B (#25146) Signed-off-by: Yan Ma <yan.ma@intel.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-19 08:45:06 +00:00
Isotr0py	f2718d2948	[Misc] Cleanup test conftest for deprecated encoder-decoder models (#25231) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-19 07:44:56 +00:00
Li, Jiang	825fdb11ad	[Bugfix][CPU] Add placeholder to avoid import errors when using fused_moe ops on platforms without triton (#25137) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-09-19 07:41:12 +00:00
Li, Jiang	8c1d4acbfe	[CPU] Disable oneDNN linear on non-x86 platforms (#25166) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-09-19 07:27:22 +00:00
Russell Bryant	486c5599e3	[Build] Update Xgrammar to 0.1.24 to get a CVE fix (#25188) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-09-19 14:27:17 +08:00
Chendi.Xue	a6149aa587	[OOT] Support sync_model_loading for OOT (#25126) Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>	2025-09-19 05:41:53 +00:00
Michael Yao	6c8a3c099b	[Docs] Fix griffe warnings in vllm/multimodal (#25216) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-09-18 22:10:44 -07:00
Roger Wang	31a8a2a7bc	[Misc] Clean up MM profiling warnings (#25222) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-09-19 04:46:57 +00:00
Chen Ding	1a0a04dae9	[Perf] Optimize memory peak during EAGLE model loading. (#24585) Signed-off-by: Chen Ding <candy.dc@alibaba-inc.com>	2025-09-19 03:31:16 +00:00
Andrew Xia	6d8246aaff	[gpt-oss] Add ResponseReasoningPartAddedEvent, ResponseReasoningPartDoneEvent for streaming (#24938) Signed-off-by: Andrew Xia <axia@meta.com>	2025-09-18 19:11:59 -07:00
Or Ozeri	9d1c50a5ac	[KV offload][2/N] Introduce LRU-based CPU offloading management (#20075) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-09-19 00:20:51 +00:00
Andrew Sansom	9a4600e4dc	[CORE] Prompt Embeddings Support for v1 Engine (#24278) Signed-off-by: Andrew Sansom <andrew@protopia.ai> Signed-off-by: Andrew Sansom <qthequartermasterman@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-09-19 08:03:09 +08:00
Lucas Wilkinson	9fac6aa30b	[BugFix] Fix DeepGEMM warmup, no m.weight_scale_inv (#25206) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-09-18 14:26:28 -07:00
Or Ozeri	a53ad626d6	[KV offload][1b/N] rename offloading to kv_offload (#25191) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-09-18 20:53:52 +00:00
Woosuk Kwon	1c3dad22ff	[V0 Deprecation] Remove unused async_timeout.py (#25190) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-18 20:35:21 +00:00
Wentao Ye	d2a30a2d93	[Bug] Fix torch Compilation Cache Hit Error (#25093) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-18 12:38:37 -07:00
Wentao Ye	75fb112d80	[Bug] Fix `returned_lse` not Defined issue (#25106) Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-09-18 19:32:24 +00:00
Aziz	38db529f66	[feat]: Create interface for model-specific M-RoPE (#24194) Signed-off-by: AzizCode92 <azizbenothman76@gmail.com> Signed-off-by: Aziz <azizbenothman76@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-09-18 19:18:56 +00:00
Nikhil Gupta	064cac7bb7	[fix]: remove data type hardcoding from gptoss model implementation (#23807) Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com>	2025-09-18 18:15:23 +00:00
Woosuk Kwon	e19bce40a1	[V0 Deprecation] Remove AsyncLLMEngine (#25025) Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-18 11:07:42 -07:00
Or Ozeri	505805b645	[KV offload][1/N] Introduce an offloading component (#19848) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-09-18 10:57:07 -07:00
Rohan Potdar	bbdc0f2366	[ROCm][AITER][Bugfix] Switch AITER to use PIECEWISE_AND_FULL compilation (#25104) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2025-09-18 17:46:47 +00:00
Gregory Shtrasberg	dc34059360	[ROCm][CI/Build] Use ROCm7.0 as the base (#25178) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-09-18 09:36:55 -07:00
qizixi	c4cb0af98a	[spec decode] Fix MTP inference path for MiMo-7B model (#25136) Signed-off-by: zixi-qi <qizixi@meta.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-09-18 09:12:19 -07:00
Harry Mellor	1c3b1634aa	[Misc] Add codeowner for Transformers backend (#25180) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-18 09:01:50 -07:00
Shu Wang	2ea50e977a	Enable Allgather/ReduceScatter backend for NaiveAllToAll (#23964) Signed-off-by: Shu Wang. <shuw@nvidia.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Signed-off-by: Shu Wang <shuw@nvidia.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-18 15:52:58 +00:00
Hyogeun Oh (오효근)	b419937c78	[Docs] Fix warnings in mkdocs build (continued) (#25163) Signed-off-by: Zerohertz <ohg3417@gmail.com>	2025-09-18 08:23:26 -07:00
wang.yuqi	5f696c33b1	[New Model] Support BertForTokenClassification / Named Entity Recognition (NER) task (#24872) Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-18 23:22:01 +08:00
dongbo910220	67244c86f0	feat(api): Return 503 on /health when engine is dead (#24897) Signed-off-by: dongbo910220 <1275604947@qq.com> Co-authored-by: Claude <noreply@anthropic.com>	2025-09-18 14:29:40 +00:00
Vadim Gimpelson	072d7e53e5	[PERF] Add `conv1d` metadata to GDN attn (#25105) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-09-18 14:27:49 +00:00
jvlunteren	01a583fea4	[Kernel] Decouple Tile Size from Block Size in Triton Unified Attention Kernel (#21197) Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>	2025-09-18 14:27:01 +00:00
Nicolò Lucchesi	bc19d75985	[Misc] Add kv-connector label (#25156) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-09-18 13:56:07 +00:00
Michael Goin	fbd6523ac0	Refactor dense FP8 tensor/channel/block utils and add CT FP8 block (#21404)	2025-09-18 08:53:45 -04:00
Shanshan Shen	470484a4f5	[Structured Output][Refactor] Move `apply_grammar_bitmask()` method from `ModelRunner` to structured output utils (#21999) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-09-18 20:44:31 +08:00
Roger Wang	21da73343a	[Misc] Clean up flags in `vllm bench serve` (#25138) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-09-18 12:43:33 +00:00
Asaf Joseph Gardin	66072b36db	[Bugfix][Mamba] - Fix Conv State Kernel FP32 Support (#24883) Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>	2025-09-18 12:21:17 +00:00
Harry Mellor	3ed1ec4af2	Fix `validate-config` pre-commit check (#25157) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-18 12:06:28 +00:00
Harry Mellor	5a33ae9a3f	Fix forward reference warning in documentation (#25150) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-18 11:41:41 +00:00
William Song	c9ff9e6f0c	[Docs] add the parallel sampling usage in LLMEngine and AsyncLLM (#24222)	2025-09-18 04:37:08 -07:00
Kay Yan	eaffe4486c	[Docs] Fix pooling-params doc references in openai_compatible_server.md (#24939)	2025-09-18 04:36:47 -07:00
Harry Mellor	8ed039d527	Move `StructuredOutputsConfig` from `config/__init__.py` to `config/structured_outputs.py` (#25153) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-18 11:24:27 +00:00
Jee Jee Li	37970105fe	[Model] Improve Pooling Model (#25149) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-18 11:04:21 +00:00

9654 Commits