biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
danisereb	084aa19f02	Add support for ModelOpt MXFP8 dense models (#33786 ) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2026-02-08 11:16:48 -08:00
SorenDreano	6e7b1c4b59	[Docs] Improve documentation (#33799 ) Co-authored-by: Soren Dreano <soren@numind.ai> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-02-06 12:57:09 +00:00
chengchengpei	965525667b	Onboard voyage-4-nano (#33720 ) Signed-off-by: Chengcheng Pei <chengchengpei@outlook.com> Signed-off-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com> Co-authored-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-06 06:23:34 +00:00
Ilya Boytsov	439afa4eea	feat: Add ColBERT late interaction model support (#33686 ) Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com> Signed-off-by: Ilya Boytsov <boytsovpanamera@mail.ru> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-02-05 08:05:13 +08:00
Zhengxu Chen	a208439537	[compile] Remove runner type from ignored caching factor list. (#33712 ) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2026-02-04 10:56:45 +00:00
Harry Mellor	61e632aea1	Turn `@config` into a `dataclass_transform` (#31541 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-03 17:40:59 +00:00
Kunshang Ji	e10604480b	[XPU][1/N] Deprecate ipex and switch to vllm-xpu-kernels for xpu platform (#33379 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-02-02 22:46:10 -08:00
Cyrus Leung	92924b2ddd	[Deprecation] Remove deprecated items related to pooling (#33477 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-31 08:44:40 -08:00
Cyrus Leung	f0a1c8453a	[Frontend] Use new Renderer for Completions and Tokenize API (#32863 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-31 04:51:15 -08:00
Harry Mellor	67239c4c42	Fix encoder-decoder model disabling mm processor cache (#33236 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-30 16:30:10 +00:00
Julien Denize	ae5b7aff2b	Improve Mistral format checks. (#33253 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai> Signed-off-by: juliendenize <julien.denize@mistral.ai> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-30 06:23:33 -08:00
Robert Shaw	af9b69f977	[Quantization][Deprecation] Remove Marlin 24 (#32688 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-28 15:54:59 +00:00
Robert Shaw	247d1a32ea	[Quantization][Deprecation] Remove BitBlas (#32683 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-28 11:06:22 +00:00
Cyrus Leung	51931c5c9a	[UX] Deduplicate sampling parameter startup logs (#32953 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-24 17:37:28 +08:00
Cyrus Leung	4753f3bf69	[Model] Use context managers for encoder- and LM-only mode (#32605 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-20 11:43:38 +08:00
sangho.lee	7e6f123810	Add Molmo2 multimodal model support (#30997 ) Signed-off-by: sanghol <sanghol@allenai.org> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-14 15:33:09 +08:00
Yi Liu	50632adc58	Consolidate Intel Quantization Toolkit Integration in vLLM (#31716 ) Signed-off-by: yiliu30 <yi4.liu@intel.com>	2026-01-14 07:11:30 +00:00
cjackal	15b33ff064	[Misc] improve warning/assert messages (#32226 ) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2026-01-13 03:11:23 +00:00
Hongxin Xu	49e6b86c91	[Feature] Support recording expert indices for rollout router replay (#28284 ) Signed-off-by: xhx1022 <1737006628@qq.com> Signed-off-by: Hongxin Xu <70438206+xhx1022@users.noreply.github.com> Signed-off-by: arlenxu <arlenxu@tencent.com> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com> Co-authored-by: arlenxu <arlenxu@tencent.com>	2026-01-12 06:23:04 -08:00
Cyrus Leung	583a90e005	[Refactor] Separate sequence and token pooling types (#32026 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-10 04:53:24 +00:00
Matthew Bonanni	2612ba9285	[1/N][Attention] Restructure attention: move files (#31916 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-09 13:10:24 -08:00
Robert Shaw	5825bbc1f7	[Quantization] Deprecate Long Tail of Schemes (#31688 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-01-08 19:07:45 -05:00
omer-dayan	04a49669d1	RayLLM Bugfix - Preserve obj store URL for multi engine_config creation (#30803 ) Signed-off-by: Omer Dayan <omdayan@nvidia.com> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-08 10:00:25 +00:00
Li, Jiang	8becf146bd	[Quantization][Refactor] Move CPU GPTQ kernel into MP linear (#31801 ) Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: Li, Jiang <bigpyj64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-06 19:10:18 +00:00
wang.yuqi	96860af655	[Model] rename use_pad_token to use_sep_token (#31784 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-01-06 14:16:04 +00:00
Isotr0py	51e38a8e30	[Misc] Enable Paligemma's PrefixLM attention mask computation (#31725 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-06 03:31:49 +08:00
Xingyu Liu	0eee877f67	[Core] Parse vLLM engine required fields from hf_config to model_arch_config (#28454 ) Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com> Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com>	2026-01-02 15:13:15 -08:00
Nick Hill	bd877162eb	[BugFix] Support online dense model DP without overhead (#30739 ) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: njhill <nickhill123@gmail.com>	2026-01-02 23:36:38 +08:00
Nick Hill	3b312fb792	[Minor] Various small code cleanups/simplifications (#31508 ) Signed-off-by: njhill <nickhill123@gmail.com>	2025-12-29 22:42:06 -08:00
Cyrus Leung	7adeb4bfa8	[Bugfix] Fix `max_model_len="auto"` handling (#31260 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-24 19:15:27 +08:00
wang.yuqi	bd89ce16d2	[Model] Introduce verify_and_update_model_config for VerifyAndUpdateConfig. (#31131 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com>	2025-12-24 09:54:57 +00:00
Michael Goin	8ee90c83f8	Add `--max-model-len auto` to auto-fit context to available memory (#29431 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-12-23 21:37:14 -08:00
Harry Mellor	c016c95b45	Use helper function instead of looping through attribute names (#29788 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-23 17:31:56 +00:00
Patrick von Platen	3faa8bee57	adapt voxtral (#31095 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>	2025-12-23 05:31:55 -08:00
Harry Mellor	b10d47e0e0	Add util function for checking nesting of rope parameters (#31146 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-23 11:41:49 +00:00
dengyunyang	8f8f469b1b	[BugFix] skip language model in Encoder (#30242 ) Signed-off-by: dengyunyang <584797741@qq.com>	2025-12-22 05:25:59 -08:00
CedricHuang	19cc9468fd	[Feature]: Support NVIDIA ModelOpt HF FP8 variants FP8_PER_CHANNEL_PER_TOKEN and FP8_PB_WO in vLLM (#30957 )	2025-12-21 22:34:49 -05:00
Zhonghua Deng	969bbc7c61	[Model] Add MiMo-V2-Flash support (#30836 ) Signed-off-by: Abatom <abzhonghua@gmail.com> Signed-off-by: Jumiar <liuanqim10@126.com> Signed-off-by: Zyann7 <zyann7@outlook.com> Co-authored-by: Jumiar <liuanqim10@126.com> Co-authored-by: Zyann7 <zyann7@outlook.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-12-19 17:17:03 +00:00
Harry Mellor	970713d4a4	Remove `SkipValidation` from `ModelConfig` (#30695 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-15 17:34:08 +00:00
wang.yuqi	4429d934de	[Model] Automatic conversion of TokenClassification model (#30666 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-12-15 08:13:00 +00:00
yifant-code	5ccf0efa84	[Bugfix] Improve error messages in ModelConfig validation (#30213 ) Signed-off-by: ytian218 <ytian218@bloomberg.net> Co-authored-by: ytian218 <ytian218@bloomberg.net>	2025-12-14 21:23:37 +08:00
Nicolò Lucchesi	0efd9f867c	[Core] Whisper Enable Encoder Batching (#29421 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-11 21:06:51 +00:00
wang.yuqi	a5f9fb5960	[Deprecation] Deprecation `--convert reward`, use `--convert embed` instead. (#30463 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-12-11 10:18:25 +00:00
Cyrus Leung	7e24e5d4d6	[Deprecation] Remove deprecated task, seed and MM settings (#30397 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-10 19:59:39 -08:00
wang.yuqi	9e77ffca3f	[Model][7/N] Improve all pooling task | Deprecation as_reward_model. Extract hidden states prefer using new multi-vector retrieval API (#26686 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-12-08 08:10:09 +00:00
Isotr0py	b952f4d3c3	[v1] Add PrefixLM support to FlexAttention backend (#27938 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-12-07 15:51:36 +00:00
Cyrus Leung	e83b7e379c	Revert "[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 )" (#30199 )	2025-12-07 00:00:22 -08:00
Cyrus Leung	27f4c2fd46	[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 23:15:42 -08:00
Rohan Potdar	40a046cd82	[Bugfix]: Fix `TokenizerLike` interface (#30009 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2025-12-05 20:56:40 -08:00
Matthew Bonanni	66e674cdd5	[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-12-05 09:48:43 -08:00

127 Commits