biondizzle/vllm

Author	SHA1	Message	Date
Huy Do	6c9837a761	Fix cuda_archs_loose_intersection when handling sm_*a (#20207 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-06-29 16:52:34 -07:00
Dipika Sikka	6f2f53a82d	[Quantization] Add compressed-tensors NVFP4 MoE Support (#19990 ) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Dipika <dipikasikka1@gmail.com>	2025-06-29 22:05:40 +00:00
Michael Goin	7b1895e6ce	[CI Fix] Try fixing eagle e2e test OOM by reducing block allocation (#20213 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-29 10:31:37 +08:00
Wentao Ye	4d36693687	[Refactor] Create a function util and cache the results for `has_deepgemm`, `has_deepep`, `has_pplx` (#20187 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-28 22:06:38 +00:00
Stan Wozniak	daec9dea6e	[Bugfix] Correct behavior of GraniteMoeHybrid for TensorParallel execution (#20137 ) Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>	2025-06-28 08:16:41 -07:00
Nicolò Lucchesi	daceac57c7	[Frontend] Generalize `v1/audio/transcriptions` endpoint (#20179 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-06-28 08:15:26 -07:00
Thomas Parnell	8615d9776f	[CI/Build] Add new CI job to validate Hybrid Models for every PR (#20147 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-06-27 23:00:25 -07:00
Jiayi Yan	7b460c25f9	[BugFix] Fix the incorrect func name in the comments. (config.py) (#20185 )	2025-06-27 22:51:16 -07:00
Michael Goin	f719772281	[Bugfix] Properly reject requests with empty list guided_choice (#20195 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-27 22:50:52 -07:00
Wentao Ye	d45417b804	fix ci issue distributed 4 gpu test (#20204 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-27 22:50:00 -07:00
Michael Goin	a29e62ea34	Fix num_token_padding support for static per-tensor scaled_fp8_quant (#20188 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-27 22:48:13 -07:00
Chales Xu	e53be6f00a	[Misc] Add type assertion of request_id for LLMEngine.add_request (#19700 ) Signed-off-by: n2ptr <xuzhanchaomail@163.com>	2025-06-27 22:47:36 -07:00
Michael Goin	c329ceca6d	[CI Fix] Pin tests/models/registry.py MiniMaxText01ForCausalLM to revision due to model changes (#20199 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-28 13:43:06 +08:00
Fabien Dupont	3c545c0c3b	[CI/Build] Allow hermetic builds (#18064 ) Signed-off-by: Fabien Dupont <fdupont@redhat.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: Fabien Dupont <fabiendupont@pm.me> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Elias Levy <eliaslevy@google.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-06-27 09:04:39 -07:00
Tyler Michael Smith	e8c3bd2cd1	[Bugfix] Fix some narrowing conversion warnings (#20141 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-06-27 09:01:28 -07:00
bnellnm	c6c983053d	[Bugfix] Mark 'hidden_states' as mutable in moe_forward registration. (#20152 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-06-27 09:42:22 -06:00
Luka Govedič	aafabaa0d5	[Fix][torch.compile] Enable custom ops by default when Inductor off (#20102 ) Signed-off-by: luka <luka@neuralmagic.com>	2025-06-27 09:00:42 -06:00
Hosang	94a55c7681	[Fix][ROCm] Remove unused variables to fix build error on GFX11/12 (#19891 ) Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>	2025-06-27 07:14:44 -07:00
Ilya Lavrenov	aa0dc77ef5	[Perf] Improved perf for resolve_chat_template_content_format (#20065 ) Signed-off-by: Ilya Lavrenov <ilya.lavrenov@cerebras.net>	2025-06-27 09:16:41 +00:00
Michael Goin	4ab3ac285e	[Bugfix] Fix flaky failure when getting DP ports (#20151 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-27 15:30:53 +08:00
Robert Shaw	d1c956dc0f	Gemma3n (Text-only) (#20134 ) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: Roger Wang <hey@rogerw.me> Co-authored-by: Roger Wang <hey@rogerw.me>	2025-06-27 07:16:26 +00:00
Chendi.Xue	dec197e3e5	Quick Fix by adding conditional import for flash_attn_varlen_func in flash_attn (#20143 ) Signed-off-by: Chendi.Xue <chendi.xue@intel.com>	2025-06-27 05:48:13 +00:00
Yazan Sharaya	6e244ae091	[Perf][Frontend] eliminate api_key and x_request_id headers middleware overhead (#19946 ) Signed-off-by: Yazan-Sharaya <yazan.sharaya.yes@gmail.com>	2025-06-27 00:44:14 -04:00
wang.yuqi	cd4cfee689	[Model][1/N] Automatic conversion of CrossEncoding model (#20012 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-06-26 21:10:04 -07:00
Thomas Parnell	e110930680	[Fix] Fix gemma CI test failing on main (#20124 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-06-26 21:06:59 -07:00
Yang Wang	8b64c895c0	[CI] Sync test dependency with test.in for torch nightly (#19632 ) Signed-off-by: Yang Wang <elainewy@meta.com> Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Concurrensee <yida.wu@amd.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-06-26 20:55:25 -07:00
li haoyang	0740e29b66	[Feature] add quick all reduce (#19744 ) Signed-off-by: ilmarkov <imarkov@redhat.com> Signed-off-by: Haoyang Li <Haoyang.Li@amd.com> Co-authored-by: ilmarkov <imarkov@redhat.com>	2025-06-26 20:54:24 -07:00
Michael Goin	44d2e6af63	[Bugfix] Build moe_data for both sm100 and sm90 (#20086 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-26 20:50:12 -07:00
Ilya Markov	2d7779f888	[Perf] SM100 FP8 GEMM Optimizations after cutlass_profiler (#20071 ) Signed-off-by: ilmarkov <imarkov@redhat.com> Co-authored-by: ilmarkov <imarkov@redhat.com>	2025-06-26 20:50:09 -07:00
Dipika Sikka	a57d57fa72	[Quantization] Bump to use latest `compressed-tensors` (#20033 ) Signed-off-by: Dipika <dipikasikka1@gmail.com> Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>	2025-06-26 20:50:06 -07:00
Michael Goin	71799fd005	[CI Failure] Fix OOM with test_oot_registration_embedding (#20144 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-27 11:21:04 +08:00
Bowen Wang	e9fd658a73	[Feature] Expert Parallelism Load Balancer (EPLB) (#18343 ) Signed-off-by: Bowen Wang <abmfy@icloud.com>	2025-06-26 15:30:21 -07:00
Kyle Yu	07b8fae219	[Doc] correct LoRA capitalization (#20135 ) Signed-off-by: kyolebu <kyu@redhat.com>	2025-06-26 15:22:12 -07:00
Wentao Ye	562308816c	[Refactor] Rename commnication utils (#20091 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-26 22:19:32 +00:00
Chengji Yao	04e1642e32	[TPU] add kv cache update kernel (#19928 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-06-26 10:01:37 -07:00
Kunshang Ji	b69781f107	[Hardware][Intel GPU] Add v1 Intel GPU support with Flash attention backend. (#19560 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-06-26 09:27:18 -07:00
Tyler Michael Smith	0bceac9810	Spam folks if config.py changes (#20131 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-06-26 08:19:46 -07:00
Cyrus Leung	34878a0b48	[Doc] Rename page titles (#20130 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-06-26 08:18:49 -07:00
Cyrus Leung	6393b03986	[Doc] Auto sign-off for VSCode (#20132 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-06-26 08:18:36 -07:00
wang.yuqi	0907d507bf	[Doc] Automatically signed-off by PyCharm (#20120 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-06-26 14:34:17 +00:00
Wentao Ye	c894c5dc1f	[Bug Fix] Fix address/port already in use error for deep_ep test (#20094 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-26 22:33:13 +08:00
Michael Goin	1f5d178e9c	Revert "[Bugfix] default set cuda_graph_sizes to max_num_seqs for v1 engine" (#20128 )	2025-06-26 07:32:22 -07:00
TJian	27c065df50	[Bugfix][V1][ROCm] Fix AITER Flash Attention Backend (Fix API Break and Local Attention Logic: affecting Llama4) (#19904 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-06-26 12:42:31 +00:00
Michael Yao	84c260caeb	[Docs] Improve frameworks/helm.md (#20113 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-06-26 10:41:51 +00:00
Reid	167aca45cb	[Misc] Use collapsible blocks for benchmark examples. (#20017 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-26 03:35:16 -07:00
Li, Jiang	0567c8249f	[CPU] Fix torch version in x86 CPU backend (#19258 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-06-26 03:34:47 -07:00
Wentao Ye	d188913d99	[Refactor] Remove unused library (#20099 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-26 09:16:10 +00:00
Cyrus Leung	1d7c29f5fe	[Doc] Update docs for New Model Implementation (#20115 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-06-26 00:47:06 -07:00
Seiji Eicher	65397e40f5	[Bugfix] Allow `CUDA_VISIBLE_DEVICES=''` in `Platform.device_id_to_physical_device_id` (#18979 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-06-26 00:01:57 -07:00
Ekagra Ranjan	9502c38138	[Benchmark][Bug] Fix multiple bugs in bench and add args to spec_decode offline (#20083 )	2025-06-25 22:06:27 -07:00

9263 Commits