biondizzle/vllm - vllm

Author	SHA1	Message	Date
Seungmin Kim	160424a937	[Bugfix] Fix CUDA compatibility path setting for both datacenter and consumer NVIDIA GPUs (#33992 ) Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com> Signed-off-by: Andrew Mello <19512127+88plug@users.noreply.github.com> Co-authored-by: 88plug <19512127+88plug@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-25 18:15:51 -08:00
Joao Gante	709eadbb0b	Doc link typo (#35281 ) Signed-off-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-25 03:00:31 -08:00
Yanwen Lin	80e60a6133	[Doc] Suggest "--managed-python" flag when installing python using uv (#33069 ) Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com>	2026-02-25 08:19:43 +00:00
jonoillar	26e722f906	[DOC][BugFix] Specfiy build dependency installation (#34513 ) Signed-off-by: Jon OILLARBURU <jon.oillarburu@multiversecomputing.com> Co-authored-by: Jon OILLARBURU <jon.oillarburu@multiversecomputing.com>	2026-02-25 08:04:06 +00:00
lichuang	2c619e5e3f	[Docs]Fix documentation formatting in architecture overview (#34679 ) Signed-off-by: codedump <lichuang1982@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-25 08:00:15 +00:00
Simon Mo	8a685be8d9	docs: document committer proposal process in governance (#35225 ) Signed-off-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-25 07:58:48 +00:00
Harry Mellor	f7967577f5	Remove requirement to use `--hf-overrides` for `DeepseekVLV2ForCausalLM` (#35203 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-24 22:00:06 -08:00
Nicolò Lucchesi	f91808ae0d	[MM] Allow audio chunking for offline LLM (#34628 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-23 21:04:28 -08:00
Mark McLoughlin	5cc7c4452e	[Metrics] Add Prometheus counters for Model FLOPs Utilization (MFU) (#30950 ) Export the existing Model FLOPs Utilization (MFU) metrics via Prometheus. `--enable-mfu-metrics` is required for these to be exposed. Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2026-02-23 15:01:07 +00:00
Cyrus Leung	987506bca6	[Refactor] Simplify dummy data generation (#35025 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-22 20:55:27 -08:00
Athrael Soju	970861ac0c	[New Model] Add ColModernVBERT (#34558 ) Signed-off-by: Athrael Soju <athrael.soju@gmail.com> Signed-off-by: athrael-soju <athrael-soju@users.noreply.github.com>	2026-02-22 12:23:41 +08:00
petrpechman	bebfe55b1c	[Doc] Fix example of eagle3 (#34960 ) Signed-off-by: Petr Pechman <petr.pechman@firma.seznam.cz> Co-authored-by: Petr Pechman <petr.pechman@firma.seznam.cz>	2026-02-21 09:57:53 +00:00
Nicolò Lucchesi	ab6f3487a6	[PD] Change kv_load_failure_policy Default from "recompute" to "fail" (#34896 ) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-21 01:34:57 -08:00
BADAOUI Abdennacer	8dc8a99b56	[ROCm] Enable bitsandbytes quantization support on ROCm (#34688 ) Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com>	2026-02-21 00:34:55 -08:00
Kata Coder	5719a4e4e6	[Frontend] Support multimodal inputs for late-interaction scoring (ColQwen3) + NewModel: nvidia/nemotron-colembed (#34574 ) Signed-off-by: craftsangjae <craftsangjae@gmail.com>	2026-02-20 20:01:40 -08:00
Kyle Sayers	64ac1395e8	[Docs] Clean up speculators docs (#34065 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2026-02-18 13:48:11 -08:00
Cyrus Leung	a766b30349	[Renderer] Deprecate code paths for old input processing (#34775 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-18 00:35:04 -08:00
Matthew Bonanni	dc5fa77a4e	[Bugfix][MTP][Sparse MLA] Allow sparse MLA with MTP to run with FULL cudagraphs (#34457 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-02-17 14:01:27 -05:00
Harry Mellor	a21cedf4ff	Bump `lm-eval` version for Transformers v5 compatibility (#33994 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-16 05:24:35 -08:00
Parth Bansal	5653021094	[Doc] Add Mistral-7b-v0.3 model to the batch invariance validated model (#34584 ) Signed-off-by: Parth Bansal <parthbansal127@gmail.com>	2026-02-16 12:09:00 +08:00
Maryam Tahhan	f07a128413	[CPU][ARM] Add ARM BF16 cross-compilation support and improve documen… (#33079 ) Signed-off-by: Maryam Tahhan <mtahhan@redhat.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com>	2026-02-15 06:33:08 -08:00
Isotr0py	19fab44152	[Doc] Update Encoder-Decoder models support doc with Florence-2 (#34581 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-15 04:18:57 -08:00
Kata Coder	d1ea65d0a1	[new model] add COLQwen3 code & Inference (#34398 ) Signed-off-by: craftsangjae <craftsangjae@gmail.com> Signed-off-by: katacoder <craftsangjae@gmail.com>	2026-02-14 12:15:19 +08:00
Ilya Boytsov	071d863e20	Extend ColBERT support to non-standard BERT backbones (#34170 ) Signed-off-by: Ilya Boytsov <ilya.boytsov@aleph-alpha.com>	2026-02-13 09:53:09 +00:00
myselvess	bcf0731aa0	[New Model] support new model ovis2.6 (#34426 ) Signed-off-by: myselvess <23743269+myselvess@users.noreply.github.com>	2026-02-13 00:12:45 -08:00
Matthew Bonanni	f2c47886fd	[Attention] Add FlashInfer Sparse MLA backend (#33451 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2026-02-12 17:21:54 +00:00
Nicolò Lucchesi	334c715e0f	[Docs] Spec decoding docs warning removal (#34439 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-12 09:01:51 -08:00
Louie Tsai	55a1a9563a	Vllm CPU benchmark suite improvement (#34128 ) Signed-off-by: louie-tsai <louie.tsai@intel.com>	2026-02-12 16:04:44 +08:00
Tianqi Ren	786806dd44	[Doc] Update Marlin support matrix for Turing (#34319 ) Signed-off-by: Tianqi Ren <tianqi.r@outlook.com>	2026-02-11 09:03:41 +00:00
Kunshang Ji	cb9574eb85	[XPU][9/N] clean up existing ipex code/doc (#34111 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-02-11 00:27:15 -08:00
AllenDou	21dfb842d7	[model] support FunASR model (#33247 ) Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com> Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>	2026-02-11 07:37:09 +00:00
Cyrus Leung	c9a1923bb4	[Plugin] Simplify IO Processor Plugin interface (#34236 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-10 19:47:39 -08:00
bnellnm	d1481ba783	[MoE Refactor] Introduce MoERunner abstraction and move execution logic from FusedMoE to DefaultMoERunner (#32344 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-02-10 19:51:07 -05:00
Cyrus Leung	25e48a3aae	[Doc] Update usage of `--limit-mm-per-prompt` (#34148 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-09 21:12:13 -08:00
Michael Goin	5e75a14a66	[Doc] Add DCP support to attention backend doc (#33936 )	2026-02-09 18:33:43 -05:00
JJJYmmm	9562912cea	[MODEL] Adding Support for Qwen3.5 Models (#34110 ) Signed-off-by: JJJYmmm <1650675829@qq.com> Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: wulipc <wulipc@users.noreply.github.com> Co-authored-by: ywang96 <ywang96@users.noreply.github.com> Co-authored-by: Isotr0py <Isotr0py@users.noreply.github.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-02-09 21:12:58 +08:00
Ekagra Ranjan	1d5922fade	[ASR] Fix audio benchmark and add RTFx metric (#32300 ) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>	2026-02-09 10:02:37 +00:00
wang.yuqi	22b64948f6	[Frontend][last/5] Make pooling entrypoints request schema consensus. (#31127 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-02-09 06:42:38 +00:00
danisereb	084aa19f02	Add support for ModelOpt MXFP8 dense models (#33786 ) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2026-02-08 11:16:48 -08:00
Jee Jee Li	db4ede9743	[Model] Enable Step3p5ForCausalLM testing (#33755 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2026-02-07 05:25:24 -08:00
果冻虾仁	6f7adc533a	fix description in plugin_system.md (#33999 )	2026-02-06 19:37:02 -08:00
vllmellm	aaa2efbe98	[DOC] [ROCm] Update docker deployment doc (#33971 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-06 10:05:35 -08:00
Michael Goin	c39ee9ee2b	[Docs] Add sections on process architecture and minimum CPU resources (#33940 ) It seems users can be confused about vLLM's performance when running with very small amounts of CPU cores available. We are missing a clear overview of what vLLM's process architecture is, so I added this along with some diagrams in arch_overview.md, and included a section on CPU resource recommendations in optimization.md Signed-off-by: mgoin <mgoin64@gmail.com>	2026-02-06 15:26:43 +00:00
Raushan Turganbay	85ee1d962b	[Bugfix] Fix models and tests for transformers v5 (#33977 ) Signed-off-by: raushan <raushan@huggingface.co> Signed-off-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-06 21:47:41 +08:00
SorenDreano	6e7b1c4b59	[Docs] Improve documentation (#33799 ) Co-authored-by: Soren Dreano <soren@numind.ai> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-02-06 12:57:09 +00:00
chengchengpei	965525667b	Onboard voyage-4-nano (#33720 ) Signed-off-by: Chengcheng Pei <chengchengpei@outlook.com> Signed-off-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com> Co-authored-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-06 06:23:34 +00:00
Simon Mo	5819ca8944	[Docs] Add reo analytics (#33957 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2026-02-05 17:42:22 -08:00
Nicolò Lucchesi	81a90e5277	[Docs] Add bart-plugin to docs (#33905 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-05 12:20:25 +00:00
liranschour	8322d4e47f	Enable Cross layers KV cache layout at NIXL Connector V2 (#33339 ) Signed-off-by: Liran Schour <lirans@il.ibm.com> Signed-off-by: liranschour <liranschour@users.noreply.github.com> Co-authored-by: Or Ozeri <or@ozery.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-02-05 02:17:02 -08:00
rinbaro	007b183d74	[docs] fix unintentional misspellings (#33863 ) Signed-off-by: rinbaro <ilgomishra@gmail.com>	2026-02-04 20:50:59 -08:00

1989 Commits