biondizzle/vllm - vllm

Author	SHA1	Message	Date
danisereb	9609b1f18d	Integrate flashinfer mm_mxfp8 in ModelOpt MXFP8 (#35053) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2026-02-24 08:45:13 -07:00
danisereb	a0c7081695	Fix fallback to default tactic (flashinfer autotuner) with trtllm_fp4_block_scale_moe (#35088) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2026-02-24 07:25:44 -08:00
R3hankhan	34ce0ffd1f	[CPU][Perf] Accelerate Attention head for s390x using vector intrinsics (#34434) Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com>	2026-02-24 07:25:39 -08:00
Robin Nabel	0de5333989	Fix GLM4 parser tests (#34905) Signed-off-by: Robin Nabel <opensource@nabel.co> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2026-02-24 22:27:42 +08:00
Eldar Kurtić	a87cc50859	[Attn,KV-cache] Use per-head scales in the attention selector (#34281) Signed-off-by: Your Name <you@example.com> Signed-off-by: Eldar Kurtic <research@neuralmagic.com> Co-authored-by: Eldar Kurtic <research@neuralmagic.com> Co-authored-by: Your Name <you@example.com>	2026-02-24 09:02:43 -05:00
Cyrus Leung	761e63e541	[Frontend] Always pass `supported_tasks` to validation (#35186) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-24 04:16:33 -08:00
Isotr0py	d12d201409	[Bugfix] Fix failing FunASR processor test (#35111) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-24 04:13:45 -08:00
eustlb	b3ad37c5db	[glm-asr] change defaults dummy audio size (#35108) Signed-off-by: Eustache Le Bihan <eulebihan@gmail.com>	2026-02-24 04:13:33 -08:00
Wentao Ye	14561fabfd	[Perf] Optimize pooling model redundant copy, 1.8% throughput improvement (#35127) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-24 04:13:11 -08:00
Zhengxu Chen	c77f3e1207	[compile] Save aot compile artifacts atomically. (#35117) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2026-02-24 04:11:01 -08:00
Dor Huri	012dee9233	[Feature] Add LoRA tower/connector support for Llama 4 Vision (mllama4) (#35147) Signed-off-by: dorhuri123 <dor.huri1@live.biu.ac.il> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-02-24 04:10:32 -08:00
Tugsbayasgalan Manlaibaatar	f1c664545b	Make voxtral compile friendly (#33959) Signed-off-by: Tugsbayasgalan Manlaibaatar <tmanlaibaatar@fb.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-02-24 09:33:35 +01:00
Xin Yang	c870eb9e0f	[LoRA] Update LoRA expand kernel block_n calculation (#32621) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-02-23 23:17:53 -08:00
BadrBasowid	6af03f2394	[Refactor] [1/N] Reorganize kernel abstraction directory (#34055) Signed-off-by: BadrBasowid <badr.basowid@gmail.com> Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-02-24 06:47:22 +00:00
Vlad Tiberiu Mihailescu	1a6cf39dec	[CI/Build] Remove redundant OpenTelemetry pip install from CI configs (#35032) Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com>	2026-02-23 22:24:11 -08:00
Nicolò Lucchesi	f91808ae0d	[MM] Allow audio chunking for offline LLM (#34628) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-23 21:04:28 -08:00
Vadim Gimpelson	33a0d43c71	[BUGFIX][Qwen3.5] Hardcode `mlp.gate` as not quantizable (#35156) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-02-23 19:42:24 -08:00
pschlan-amd	80d93fd6da	gpu_model_runner: Cache is_encoder_decoder from model config (#35099) Signed-off-by: Patrick Schlangen <pschlan@amd.com>	2026-02-23 19:08:34 -08:00
Jia Guo	ec85340531	[Quantization] Support FP8 MoE bias for models like GPT-OSS (#34906) Signed-off-by: jasperjiaguo <jasperg662@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-23 19:07:47 -08:00
Rohan Potdar	2ff4e51152	[ROCm] AITER fused RoPE+KVCache (#33443) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com> Signed-off-by: charlifu <charlifu@amd.com> Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com> Co-authored-by: charlifu <charlifu@amd.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com>	2026-02-23 19:06:00 -08:00
Asaf Gardin	95642441d0	[Mamba1] - Change supports_update_block_table to True (#35054) Signed-off-by: Josephasafg <ajgard7@gmail.com>	2026-02-23 19:05:57 -08:00
Xin Yang	a7c9f7b7ec	[Bugfix] Fix lora_ids in FusedMoE LoRA test (#35135) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-02-23 21:49:25 -05:00
Michael Goin	a4bd661fb3	[Perf] Enable FlashInfer DeepGEMM swapAB on SM90 by default (#34924) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-23 17:34:41 -08:00
Michael Goin	3ef9fd0f98	[Bugfix] Fix DSV3 kernels breaking _C and _moe_C on unsupported arches (#35123) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-02-23 17:11:27 -08:00
Michael Goin	22a97e6613	[Perf] Improve default triton fused moe configs (#34846) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-23 16:01:28 -08:00
Aaron Hao	596ed1f02e	[RL] Validation for pause_mode='keep' (#34992) Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2026-02-23 16:30:56 -05:00
Nicolò Lucchesi	b8d8b7e934	[Misc] Monitor interface changes (#35113) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-23 17:14:51 +00:00
Harry Mellor	28c5e69ba0	Enforce that `model` is the first positional arg when `--served-model-name` is used (#34973) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-23 08:38:05 -08:00
Harry Mellor	864167d376	Fix custom processors that use deleted import for Transformers v5 (#35101) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-23 08:38:00 -08:00
haosdent	a2ba6a5244	[Bugfix] Fix prefix caching for Mamba 'all' mode (Nemotron models) (#34874) Signed-off-by: haosdent <haosdent@gmail.com>	2026-02-23 17:31:51 +01:00
Harry Mellor	c4f38696f7	Use Xet high performance mode for Transformers v5 (#35098) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-23 08:19:30 -08:00
haosdent	a7f341c323	[Bugfix] Fix MRotaryEmbedding missing `truncate` attr with YaRN scaling (#35080) Signed-off-by: haosdent <haosdent@gmail.com>	2026-02-23 16:05:52 +00:00
Robert Shaw	d13ece38d7	[CI] Skip Responses API (#34990) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-02-23 07:46:45 -08:00
Mark McLoughlin	5cc7c4452e	[Metrics] Add Prometheus counters for Model FLOPs Utilization (MFU) (#30950) Export the existing Model FLOPs Utilization (MFU) metrics via Prometheus. `--enable-mfu-metrics` is required for these to be exposed. Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2026-02-23 15:01:07 +00:00
Eldar Kurtić	b95bb6927f	[kv-cache, ct] Use compressed-tensors as a source of ground-truth for quant strategies (#34254) Signed-off-by: Your Name <you@example.com> Co-authored-by: Your Name <you@example.com>	2026-02-23 07:37:55 -07:00
Cyrus Leung	392645454b	[Refactor] Decouple TimingContext from InputProcessingContext (#35083) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-23 14:15:50 +00:00
Eldar Kurtić	1e8438a89a	[Llama4,CI] Bring back Llama-4 bug fixes, and also fix Maverick tests (#35033) Signed-off-by: Eldar Kurtic <you@example.com> Co-authored-by: Eldar Kurtic <you@example.com>	2026-02-23 09:04:34 -05:00
Robert Shaw	8435b2e049	[ModelBash][DSV3] Add TRTLLM DSV3 Router GEMM kernel (6% B1 Speedup) (#34302) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-02-23 14:02:26 +00:00
Yan Ma	b1b5e045df	[XPU] allow TORCH_SDPA/TRITON_ATTN as XPU vit Backend (#35010) Signed-off-by: Yan Ma <yan.ma@intel.com>	2026-02-23 05:06:44 -08:00
Andreas Karatzas	5f68464f92	[ROCm][CI] Fix spec decode profile assertion and logprob test determinism (#35043) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-23 05:05:54 -08:00
Vincent Gimenes	aa08a30fc9	[CLEANING] Remove unused disable_by_batch_size from SpeculativeConfig (#35060) Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>	2026-02-23 05:05:36 -08:00
Wentao Ye	7f40e9e516	[Refactor] Remove dead private func `_fp8_perm` and `_extract_mask_for_item` (#35068) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-23 05:05:20 -08:00
Harry Mellor	103e614b14	Fix pipeline parallel with embed scaling in the Transformers modelling backend (#35094) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-23 05:04:47 -08:00
Neil Schemenauer	54e2f83d0a	[Feature] Lazy import for the "mistral" tokenizer module. (#34651) Signed-off-by: Neil Schemenauer <nas@arctrix.com>	2026-02-23 00:43:01 -08:00
Gabe Goodhart	e631f8e78e	fix: Apply embedding_multiplier to inputs_embeds (#34813) Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-23 00:42:46 -08:00
Martin Hickey	e97c46a92d	[BugFix]: Fix local mypy issues (#34739) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-23 00:40:29 -08:00
Jee Jee Li	7291d1b288	[Bugfix] Fix kernel benchmark (#33752) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2026-02-22 21:18:08 -08:00
Cyrus Leung	987506bca6	[Refactor] Simplify dummy data generation (#35025) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-22 20:55:27 -08:00
Woosuk Kwon	c645e9a214	[Model Runner V2] Remove propose_draft method (#35070) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-02-22 18:27:12 -08:00
Nick Hill	944ffb5968	[Model Runner V2][Minor] Remove redundant `do_spec_decode` field (#35039) Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>	2026-02-22 16:18:04 -08:00

15117 Commits