Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

5d18bf8b32 [Bugfix] Fix Harmony preamble visibility in Responses API (#32114) pushkar 2026-02-25 21:38:16 +05:30
0788ff0a15 [Bugfix] Gracefully disable AllReduceFusionPass on GPUs without multicast support (#35085) haosdent 2026-02-25 23:31:45 +08:00
d72b0be33c [XPU]Fix for Qwen-OMNI crash (#35249) Chendi.Xue 2026-02-25 09:31:07 -06:00
42489e43c2 [Misc][LoRA] Increase max vocab size limit to 258048 in logits processor (#34773) Bhoomit 2026-02-25 07:30:55 -08:00
af5e6afa0a [Bugfix] Fix step3p5 reasoning with interleaved thinking (#34211) Mario Hong 2026-02-25 23:13:01 +08:00
ee59a7c615 [Tests] Add GSM8k check to SpecDec E2E tests (#34772) Benjamin Chislett 2026-02-25 07:51:14 -05:00
709eadbb0b Doc link typo (#35281) Joao Gante 2026-02-25 11:00:31 +00:00
90fc7f9109 Fix custom processors that use deleted behaviour for Transformers v5 (#35107) Harry Mellor 2026-02-25 10:36:21 +00:00
675ec59aa9 [Bugfix][CPU] Fix basic unit tests failing in CPU platforms (#34677) Yanwen Lin 2026-02-25 00:36:15 -08:00
80e60a6133 [Doc] Suggest "--managed-python" flag when installing python using uv (#33069) Yanwen Lin 2026-02-25 00:19:43 -08:00
26e722f906 [DOC][BugFix] Specfiy build dependency installation (#34513) jonoillar 2026-02-25 09:04:06 +01:00
2c619e5e3f [Docs]Fix documentation formatting in architecture overview (#34679) lichuang 2026-02-25 16:00:15 +08:00
8a685be8d9 docs: document committer proposal process in governance (#35225) Simon Mo 2026-02-24 23:58:48 -08:00
2465071510 [Perf] Add opt-in SM100 Oink RMSNorm custom-op path (#31828) Laura Wang 2026-02-24 23:01:53 -08:00
cd43673668 [Perf] Optimize FP8 gemm of sm120. (#34424) wenshuai 2026-02-25 14:25:24 +08:00
35d44b4557 [XPU]Support CUDAGraph on XPU Platform (#34482) Xinyu Chen 2026-02-25 14:22:52 +08:00
8ad54a991b [Platform] Add current_platform.num_compute_units interface (#35042) Kunshang Ji 2026-02-25 14:22:49 +08:00
92510edc32 remove cuda check in top_k_top_p_triton kernel (#35011) Kunshang Ji 2026-02-25 14:22:31 +08:00
a6c137521c [Misc] Add shard_id validation for MergedColumnLinear (#35055) Isotr0py 2026-02-25 14:12:28 +08:00
4572a06afe [Misc] Enable weights loading tracking for quantized models (#35074) Isotr0py 2026-02-25 14:11:03 +08:00
5cc29cfb8b [compile] Improve error message during artifacts load failure. (#35115) Zhengxu Chen 2026-02-25 01:01:09 -05:00
8fae54faff [Linear Attention] fix bug for linear attention + prefix caching + reset_prefix_cache (#35157) Chen Zhang 2026-02-24 22:00:19 -08:00
f7967577f5 Remove requirement to use --hf-overrides for DeepseekVLV2ForCausalLM (#35203) Harry Mellor 2026-02-25 06:00:06 +00:00
af770b8e7b [Bugfix] Fix AttributeError when passing StructuredOutputsParams to CompletionRequest (#35237) pks 2026-02-25 07:00:03 +01:00
2ff3e436ad [Responses][CI] Filter negative token IDs in schema fuzz test to avoid 500 errors (#35231) Andreas Karatzas 2026-02-24 23:52:44 -06:00
c2c4c4611a [FIX] fused moe with lora shared expert dual stream (1.07x otps) (#34933) Jhao-Ting Chen 2026-02-24 20:40:45 -08:00
f38f8c9742 [ROCm]: Enable customop and rope+kvcache fusion for AITER RoPE (#35180) Rohan Potdar 2026-02-24 22:36:40 -06:00
89a77b1084 [ROCm][CI] Pin TorchCodec to v0.10.0 for ROCm compatibility (#34447) (tag: v0.16.0) Andreas Karatzas 2026-02-12 12:47:34 -06:00
d3c1513f5f [ci] Use the right tag for CPU arm64 image (#34915) Kevin H. Luu 2026-02-19 19:59:15 -08:00
5dbfbc967b [CI/Build] Fix gRPC version mismatch (#35013) Cyrus Leung 2026-02-22 03:14:41 +08:00
c86cdcbcd2 Revert "[Release 2.10] Update to Torch 2.10 - final release (#30525)" khluu 2026-02-24 20:28:53 -08:00
3c9496f146 Revert "[Bugfix][ROCm][GPT-OSS] Use old triton_kernels implementation on ROCm if the new API is not available (#34153)" khluu 2026-02-24 20:28:45 -08:00
ec1d30c0f6 [Responses] Decouple SSE event helpers from Harmony context (#35148) Flora Feng 2026-02-24 23:05:25 -05:00
e3b2324ec4 [Frontend] Use init_app_state and FrontendArgs in run_batch (#32967) Pooya Davoodi 2026-02-24 19:40:39 -08:00
dbf0da817a [Core] Cleanup engine pause/sleep logic (#34528) Nick Hill 2026-02-24 19:33:34 -08:00
3bbb2046ff [Bugfix] Fix expert_ids padding values in moe_align_block_size kernel (#35161) Xin Yang 2026-02-24 17:14:24 -08:00
576fe50333 Adding Nemotron fp8 Triton MoE Config (#34674) yugong333 2026-02-24 15:56:38 -08:00
a0e50a4260 Convert wvSplitKQ to 16x16 MFMA in prep for mi4xx. (#34100) Hashem Hashemi 2026-02-24 15:35:21 -08:00
9fa5b25a23 [Bug][DSV3.2] Always prepare metadata for DeepGEMM Sparse Attention (#35075) Benjamin Chislett 2026-02-24 17:55:22 -05:00
ea97750414 [CI] Fix Distributed Tests (#35236) Robert Shaw 2026-02-24 17:31:56 -05:00
067c5d9ad1 [ROCm][CI] Added MI325 mirrors (#34923) Andreas Karatzas 2026-02-24 15:37:15 -06:00
f5972a872f [Model][Spec Decode] Nemotron-H MTP and Mamba Speculative Decoding Support (#33726) Benjamin Chislett 2026-02-24 12:49:56 -05:00
a9e15e040d Add @MatthewBonanni to CODEOWNERS (#35207) Matthew Bonanni 2026-02-24 12:45:10 -05:00
542ca66357 Revert "[CI/Build] Remove redundant OpenTelemetry pip install from CI configs" (#35211) Lucas Wilkinson 2026-02-24 12:26:42 -05:00
fc8456c336 [CI/Build] Fix kernels test location (#35205) Cyrus Leung 2026-02-25 01:20:34 +08:00
9ce8fad2a9 [Perf] Optimize Python Slice for Structured Output using islice instead of [:] (#33593) Wentao Ye 2026-02-24 12:02:36 -05:00
c38b8d5a31 Remove padding_index from models that don't use it for better Transformers v5 compatibility (#35189) Harry Mellor 2026-02-24 16:04:46 +00:00
60da0e1544 [CI] Remove Duplicated Tests (#35199) Robert Shaw 2026-02-24 10:53:30 -05:00
9609b1f18d Integrate flashinfer mm_mxfp8 in ModelOpt MXFP8 (#35053) danisereb 2026-02-24 17:45:13 +02:00
a0c7081695 Fix fallback to default tactic (flashinfer autotuner) with trtllm_fp4_block_scale_moe (#35088) danisereb 2026-02-24 17:25:44 +02:00
34ce0ffd1f [CPU][Perf] Accelerate Attention head for s390x using vector intrinsics (#34434) R3hankhan 2026-02-24 20:55:39 +05:30
0de5333989 Fix GLM4 parser tests (#34905) Robin Nabel 2026-02-24 14:27:42 +00:00
a87cc50859 [Attn,KV-cache] Use per-head scales in the attention selector (#34281) Eldar Kurtić 2026-02-24 15:02:43 +01:00
761e63e541 [Frontend] Always pass supported_tasks to validation (#35186) Cyrus Leung 2026-02-24 20:16:33 +08:00
d12d201409 [Bugfix] Fix failing FunASR processor test (#35111) Isotr0py 2026-02-24 20:13:45 +08:00
b3ad37c5db [glm-asr] change defaults dummy audio size (#35108) eustlb 2026-02-24 13:13:33 +01:00
14561fabfd [Perf] Optimize pooling model redundant copy, 1.8% throughput improvement (#35127) Wentao Ye 2026-02-24 07:13:11 -05:00
c77f3e1207 [compile] Save aot compile artifacts atomically. (#35117) Zhengxu Chen 2026-02-24 07:11:01 -05:00
012dee9233 [Feature] Add LoRA tower/connector support for Llama 4 Vision (mllama4) (#35147) Dor Huri 2026-02-24 14:10:32 +02:00
f1c664545b Make voxtral compile friendly (#33959) Tugsbayasgalan Manlaibaatar 2026-02-24 16:33:35 +08:00
c870eb9e0f [LoRA] Update LoRA expand kernel block_n calculation (#32621) Xin Yang 2026-02-23 23:17:53 -08:00
6af03f2394 [Refactor] [1/N] Reorganize kernel abstraction directory (#34055) BadrBasowid 2026-02-24 14:47:22 +08:00
1a6cf39dec [CI/Build] Remove redundant OpenTelemetry pip install from CI configs (#35032) Vlad Tiberiu Mihailescu 2026-02-24 00:24:11 -06:00
f91808ae0d [MM] Allow audio chunking for offline LLM (#34628) Nicolò Lucchesi 2026-02-24 06:04:28 +01:00
33a0d43c71 [BUGFIX][Qwen3.5] Hardcode mlp.gate as not quantizable (#35156) Vadim Gimpelson 2026-02-24 07:42:24 +04:00
80d93fd6da gpu_model_runner: Cache is_encoder_decoder from model config (#35099) pschlan-amd 2026-02-24 04:08:34 +01:00
ec85340531 [Quantization] Support FP8 MoE bias for models like GPT-OSS (#34906) Jia Guo 2026-02-23 19:07:47 -08:00
2ff4e51152 [ROCm] AITER fused RoPE+KVCache (#33443) Rohan Potdar 2026-02-23 21:06:00 -06:00
95642441d0 [Mamba1] - Change supports_update_block_table to True (#35054) Asaf Gardin 2026-02-24 05:05:57 +02:00
a7c9f7b7ec [Bugfix] Fix lora_ids in FusedMoE LoRA test (#35135) Xin Yang 2026-02-23 18:49:25 -08:00
a4bd661fb3 [Perf] Enable FlashInfer DeepGEMM swapAB on SM90 by default (#34924) Michael Goin 2026-02-23 20:34:41 -05:00
3ef9fd0f98 [Bugfix] Fix DSV3 kernels breaking _C and _moe_C on unsupported arches (#35123) Michael Goin 2026-02-23 20:11:27 -05:00
22a97e6613 [Perf] Improve default triton fused moe configs (#34846) Michael Goin 2026-02-23 19:01:28 -05:00
596ed1f02e [RL] Validation for pause_mode='keep' (#34992) Aaron Hao 2026-02-23 13:30:56 -08:00
b8d8b7e934 [Misc] Monitor interface changes (#35113) Nicolò Lucchesi 2026-02-23 18:14:51 +01:00
28c5e69ba0 Enforce that model is the first positional arg when --served-model-name is used (#34973) Harry Mellor 2026-02-23 16:38:05 +00:00
864167d376 Fix custom processors that use deleted import for Transformers v5 (#35101) Harry Mellor 2026-02-23 16:38:00 +00:00
a2ba6a5244 [Bugfix] Fix prefix caching for Mamba 'all' mode (Nemotron models) (#34874) haosdent 2026-02-24 00:31:51 +08:00
c4f38696f7 Use Xet high performance mode for Transformers v5 (#35098) Harry Mellor 2026-02-23 16:19:30 +00:00
a7f341c323 [Bugfix] Fix MRotaryEmbedding missing truncate attr with YaRN scaling (#35080) haosdent 2026-02-24 00:05:52 +08:00
d13ece38d7 [CI] Skip Responses API (#34990) Robert Shaw 2026-02-23 10:46:45 -05:00
5cc7c4452e [Metrics] Add Prometheus counters for Model FLOPs Utilization (MFU) (#30950) Mark McLoughlin 2026-02-23 15:01:07 +00:00
b95bb6927f [kv-cache, ct] Use compressed-tensors as a source of ground-truth for quant strategies (#34254) Eldar Kurtić 2026-02-23 15:37:55 +01:00
392645454b [Refactor] Decouple TimingContext from InputProcessingContext (#35083) Cyrus Leung 2026-02-23 22:15:50 +08:00
1e8438a89a [Llama4,CI] Bring back Llama-4 bug fixes, and also fix Maverick tests (#35033) Eldar Kurtić 2026-02-23 15:04:34 +01:00
8435b2e049 [ModelBash][DSV3] Add TRTLLM DSV3 Router GEMM kernel (6% B1 Speedup) (#34302) Robert Shaw 2026-02-23 09:02:26 -05:00
b1b5e045df [XPU] allow TORCH_SDPA/TRITON_ATTN as XPU vit Backend (#35010) Yan Ma 2026-02-23 21:06:44 +08:00
5f68464f92 [ROCm][CI] Fix spec decode profile assertion and logprob test determinism (#35043) Andreas Karatzas 2026-02-23 07:05:54 -06:00
aa08a30fc9 [CLEANING] Remove unused disable_by_batch_size from SpeculativeConfig (#35060) Vincent Gimenes 2026-02-23 14:05:36 +01:00
7f40e9e516 [Refactor] Remove dead private func _fp8_perm and _extract_mask_for_item (#35068) Wentao Ye 2026-02-23 08:05:20 -05:00
103e614b14 Fix pipeline parallel with embed scaling in the Transformers modelling backend (#35094) Harry Mellor 2026-02-23 13:04:47 +00:00
54e2f83d0a [Feature] Lazy import for the "mistral" tokenizer module. (#34651) Neil Schemenauer 2026-02-23 00:43:01 -08:00
e631f8e78e fix: Apply embedding_multiplier to inputs_embeds (#34813) Gabe Goodhart 2026-02-23 01:42:46 -07:00
e97c46a92d [BugFix]: Fix local mypy issues (#34739) Martin Hickey 2026-02-23 08:40:29 +00:00
7291d1b288 [Bugfix] Fix kernel benchmark (#33752) Jee Jee Li 2026-02-23 13:18:08 +08:00
987506bca6 [Refactor] Simplify dummy data generation (#35025) Cyrus Leung 2026-02-23 12:55:27 +08:00
c645e9a214 [Model Runner V2] Remove propose_draft method (#35070) Woosuk Kwon 2026-02-22 18:27:12 -08:00
944ffb5968 [Model Runner V2][Minor] Remove redundant do_spec_decode field (#35039) Nick Hill 2026-02-22 16:18:04 -08:00
2bcf71b9c0 [Spec Decode] Reduce TP communication for speculative decoding draft token generation (#34049) qizixi 2026-02-22 14:59:16 -08:00
b7892a3bef [Model] Add NVFP4 quantization support for Step3.5-Flash (#34478) tacos8me 2026-02-22 14:30:46 -05:00
