Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

Select branches

cmm

main

ci/build/22474

submission

v0.1.0

v0.1.1

v0.1.2

v0.1.3

v0.1.4

v0.1.5

v0.1.6

v0.1.7

v0.10.0

v0.10.0rc1

v0.10.0rc2

v0.10.1

v0.10.1.1

v0.10.1rc1

v0.10.2

v0.10.2rc1

v0.10.2rc2

v0.10.2rc3

v0.11.0

v0.11.0rc1

v0.11.0rc2

v0.11.0rc3

v0.11.0rc4

v0.11.0rc5

v0.11.0rc6

v0.11.1

v0.11.1rc0

v0.11.1rc1

v0.11.1rc2

v0.11.1rc3

v0.11.1rc4

v0.11.1rc5

v0.11.1rc6

v0.11.1rc7

v0.11.2

v0.12.0

v0.13.0

v0.13.0rc1

v0.13.0rc2

v0.13.0rc3

v0.13.0rc4

v0.14.0

v0.14.0rc0

v0.14.0rc1

v0.14.0rc2

v0.14.1

v0.15.0

v0.15.0rc0

v0.15.0rc1

v0.15.0rc2

v0.15.0rc3

v0.15.1

v0.15.1rc0

v0.15.1rc1

v0.15.2rc0

v0.16.0

v0.16.0rc0

v0.16.0rc1

v0.16.0rc2

v0.16.0rc3

v0.16.1rc0

v0.17.0

v0.17.0rc0

v0.17.0rc1

v0.17.1

v0.17.1rc0

v0.17.2rc0

v0.18.0

v0.18.0rc0

v0.18.0rc1

v0.18.0rc2

v0.18.1

v0.18.1rc0

v0.18.2rc0

v0.19.0

v0.19.0rc0

v0.19.0rc1

v0.19.1rc0

v0.2.0

v0.2.1

v0.2.1.post1

v0.2.2

v0.2.3

v0.2.4

v0.2.5

v0.2.6

v0.2.7

v0.3.0

v0.3.1

v0.3.2

v0.3.3

v0.4.0

v0.4.0.post1

v0.4.1

v0.4.2

v0.4.3

v0.5.0

v0.5.0.post1

v0.5.1

v0.5.2

v0.5.3

v0.5.3.post1

v0.5.4

v0.5.5

v0.6.0

v0.6.1

v0.6.1.post1

v0.6.1.post2

v0.6.2

v0.6.3

v0.6.3.post1

v0.6.4

v0.6.4.post1

v0.6.5

v0.6.6

v0.6.6.post1

v0.7.0

v0.7.1

v0.7.2

v0.7.3

v0.8.0

v0.8.0rc1

v0.8.0rc2

v0.8.1

v0.8.2

v0.8.3

v0.8.3rc1

v0.8.4

v0.8.5

v0.8.5.post1

v0.9.0

v0.9.0.1

v0.9.1

v0.9.1rc1

v0.9.1rc2

v0.9.2

v0.9.2rc1

v0.9.2rc2

1ecfabe525 glm 4.6 fused tuned inference config for B200 (#32958) navmarri14 2026-02-08 10:55:47 -08:00
4df841fe75 [torch.compile] Add an option to force-enable the MOE cold start optimization (#33735) Richard Zou 2026-02-08 13:42:56 -05:00
a263aa6140 [BugFix] Change support no act and mul for marlin (#34088) TomerBN-Nvidia 2026-02-08 19:18:22 +02:00
179ae7da8f [Revert] Fix performance regression for GLM-4.7-GPTQ decode and MTP acceptance rate (#33771) aabbccddwasd 2026-02-09 00:13:24 +08:00
c4df59ad43 Add embedding input functionality for disabled modalities [remake] (#32493) Reagan Lee 2026-02-08 04:57:16 -08:00
785cf28fff [ROCm] [CI] Reduce Resource of two test groups (#34059) TJian 2026-02-08 15:17:26 +08:00
a96197f564 [Perf] Simplify DeepseekV32 tokenizer, ensure fast detokenization used (#33855) Nick Hill 2026-02-07 23:16:34 -08:00
ab10d79855 [ROCm][Bugfix] fix act_quant_fusion module import error (#34069) Andreas Karatzas 2026-02-07 21:21:12 -06:00
7fcb705b80 [CI/Build] Skip GCS test (#34057) Cyrus Leung 2026-02-08 00:52:38 +08:00
b956cdf818 [Doc] Fix run_batch docs (#34056) Cyrus Leung 2026-02-07 22:18:16 +08:00
ed17f54c8b Perf tuning and expansion of cases covered for wvSplitKrc (#33493) Hashem Hashemi 2026-02-07 05:33:11 -08:00
860981d8d8 Make directory exist ok for ray spinning up multiple replicas on a single instance (#33604) Jiang Wu 2026-02-07 05:30:49 -08:00
52181baaea Update DeepGEMM version pin in Dockerfile to match #32479 (#33935) zifeitong 2026-02-07 05:30:22 -08:00
de3869bb4d move checks out of unified_kv_cache_update custom op (#33943) Rohan Potdar 2026-02-07 07:30:09 -06:00
ce9b3cd3e9 [PluggableLayer][3/N] Apply PluggableLayer to mamba layers. (#33660) whx 2026-02-07 21:26:05 +08:00
db4ede9743 [Model] Enable Step3p5ForCausalLM testing (#33755) Jee Jee Li 2026-02-07 21:25:24 +08:00
2cb2340f7a [Frontend]Add support for transcriptions and translations to run_batch (#33934) Pooya Davoodi 2026-02-07 05:24:57 -08:00
4df44c16ba Enable Eagle3 speculative decoding for Mistral3ForConditionalGeneration to support eagle3 (#33939) TundeAtSN 2026-02-07 08:24:52 -05:00
81fe69cae5 [torch.compile] Stop compiling identical artifacts (#34003) Richard Zou 2026-02-07 08:24:48 -05:00
dd6a6e1190 [Kernel] Add KernelConfig flag to enable/disable FlashInfer autotune (#34006) Mohammad Miadh Angkad 2026-02-07 21:24:44 +08:00
edb359cce4 [Renderer] Define render_cmpl and render_chat (#34039) Cyrus Leung 2026-02-07 21:24:40 +08:00
6ed5eda300 [CI][Build] Pin grpcio-tools==1.78.0 (#34048) wang.yuqi 2026-02-07 21:24:35 +08:00
11a4c9d30d [Misc] Simplify get_max_tokens (#34036) Cyrus Leung 2026-02-07 16:59:49 +08:00
15a0b9e570 Fix spelling errors (#33978) lukec 2026-02-07 15:58:50 +08:00
c490d8cc73 [ROCm][CI] Pinning lm-eval version to resolve multi-modal small eval bug (#34038) Andreas Karatzas 2026-02-07 00:21:08 -06:00
48312e579a [Misc] Make PlaceholderRange.get_num_embeds a method (#34035) Cyrus Leung 2026-02-07 13:30:17 +08:00
bc32444b23 [Kernel] Add enable_sm120_or_later for SM121 (DGX Spark) CUTLASS support (#33517) Vel 2026-02-06 20:28:01 -08:00
18e8545297 [Revert] Add util handle_deprecated back (#33998) Wentao Ye 2026-02-06 23:14:45 -05:00
6f7adc533a fix description in plugin_system.md (#33999) 果冻虾仁 2026-02-07 11:37:02 +08:00
40218a82ba [ModelRunner V2] Revert token rank comparison difference for now (#34017) Nick Hill 2026-02-06 19:11:05 -08:00
1c3b22058f [Misc] Add backward-compatible import aliases for renamed translations module (#34015) kourosh hakhamaneshi 2026-02-06 19:01:41 -08:00
3920cafdd6 [Bugfix] Fix _fused_moe_lora_expand signature mismatch (#33821) Xin Yang 2026-02-06 18:45:59 -08:00
ec28784fdc [CI][AMD]Bugfix] Check that model_config is not None in enable_norm_pad_fusion (#34007) rasmith 2026-02-06 20:43:25 -06:00
55aeec04f5 [Bugfix] Fix Whisper tokenization (#34011) Nicolò Lucchesi 2026-02-07 03:42:52 +01:00
906077181b [Bugfix] Fix QK Norm+RoPE fusion pattern matching on B200+FP8 (#33967) Ikenna 2026-02-06 21:27:33 -05:00
89a385d79f [Feat][RL] Pause and Resume with keep requests for single engine (#32351) Aaron Hao 2026-02-06 16:08:58 -08:00
4a2d00eafd [bugfix] [ROCm] Fix premature CUDA initialization in platform detection (#33941) kourosh hakhamaneshi 2026-02-06 14:17:55 -08:00
207c3a0c20 Fix RoutingMethodType logic (#33919) Dimitrios Bariamis 2026-02-06 23:03:34 +01:00
ae2e93f89b [Fix] Fix logprobs=0 handling for /inference/v1/generate endpoint (#34010) Sumanth R Hegde 2026-02-06 12:33:40 -08:00
9e9acce577 [Bugfix] Fix no attribute error of SharedFusedMoE (DeepSeek-V3.1 as test model) (#33993) xuebwang-amd 2026-02-07 03:11:32 +08:00
fe5438200b [Rocm][Bugfix] Fix dtype not same for gemm_a4w4 op (#33734) Charlie Fu 2026-02-06 13:09:59 -06:00
77c09e1130 [Refactor] Remove align block size logic in moe_permute (#33449) Wentao Ye 2026-02-06 13:57:06 -05:00
16786da735 [Model Runner V2] support apply penalty for spec decode (#33251) zhrrr 2026-02-07 02:56:48 +08:00
aaa2efbe98 [DOC] [ROCm] Update docker deployment doc (#33971) vllmellm 2026-02-07 02:05:35 +08:00
aca5967416 [KV Connector] Add missing method overrides to MultiConnector (#33292) Seiji Eicher 2026-02-06 09:58:21 -08:00
67a746e87f [Log] Optimize duplicate startup log (#33944) Wentao Ye 2026-02-06 12:49:56 -05:00
7bec435130 [Bugfix] Fix the issue where tool calling does not work when using fast detokenization with dsv32 (#33964) Chauncey 2026-02-07 01:23:44 +08:00
5c52644b10 [Docs] Update link to Benchmark CLI documentation (#33254) Eldar Kurtić 2026-02-06 17:00:59 +01:00
2ce9fe4ad0 [XPU][5/N] add wna16 xpu kernel (#33973) zofia 2026-02-06 23:59:53 +08:00
cd8b405bd0 [Refactor] Consolidate sequence normalization and enc-dec parsing (#33928) Cyrus Leung 2026-02-06 23:43:47 +08:00
4707f7ebb4 [Model] Support MiniCPM-o 4.5 (#33431) tc-mb 2026-02-06 23:29:10 +08:00
c39ee9ee2b [Docs] Add sections on process architecture and minimum CPU resources (#33940) Michael Goin 2026-02-06 10:26:43 -05:00
350ca72c04 [ROCm][AITER] Fix AITER import regression for explicit backend selection (#33749) Andreas Karatzas 2026-02-06 09:08:16 -06:00
1fb0495a72 [FIX] guidance: use max(vocab_size, len(tokenizer)) for n_vocab (#33509) FredericOdermatt 2026-02-06 15:23:03 +01:00
85ee1d962b [Bugfix] Fix models and tests for transformers v5 (#33977) Raushan Turganbay 2026-02-06 14:47:41 +01:00
51a7bda625 Update WeightTransferConfig to be more standard like the others (#33989) Harry Mellor 2026-02-06 13:15:00 +00:00
6e7b1c4b59 [Docs] Improve documentation (#33799) SorenDreano 2026-02-06 13:57:09 +01:00
2991dd3d22 [Bugfix][Model] Support LoRA on Qwen3 Output Embedding (#29816) Kurt Shuster 2026-02-06 04:25:31 -08:00
ac32e66cf9 [torch.compile] Reorganize vllm/compilation and tests/compile (0/N for vLLM IR) (#33731) Luka Govedič 2026-02-06 07:19:49 -05:00
f79d9dce16 [CPU][BugFix] Fix loading of w8a8int models with bias (#33582) Fadi Arafeh 2026-02-06 11:59:20 +00:00
ba5cbbf107 Bump HF Hub client to get bug fix (#33984) Harry Mellor 2026-02-06 11:25:33 +00:00
233b26ab35 [PaddleOCR-VL] Add BC for transformers 5.0 config (#33976) zhang-prog 2026-02-06 18:33:49 +08:00
791a94bed0 Consolidate and fix forbidden import pre-commit checks (#33982) Harry Mellor 2026-02-06 09:47:41 +00:00
e969a169ef support view_from_cpu_tensor on XPU (#33868) Xinyu Chen 2026-02-06 16:34:20 +08:00
6d8d34be6d Fix main pre-commit (#33975) Harry Mellor 2026-02-06 08:08:05 +00:00
1363e3d6d5 [cpu][performance] CPU Paged Attention NEON BFMMLA BF16 Implementation (#32263) Gassan Salama 2026-02-06 07:01:48 +00:00
965525667b Onboard voyage-4-nano (#33720) chengchengpei 2026-02-05 22:23:34 -08:00
6550815c3a [XPU]Replace pip in docker.xpu with uv pip (#31112) sihao_li 2026-02-06 14:02:33 +08:00
7439e4f41b [XPU][4/N] add mxfp4 moe model support (#33679) Kunshang Ji 2026-02-06 13:03:59 +08:00
ac04dd374f [CPU] Add BF16 Kernel type for s390x (#33788) R3hankhan 2026-02-06 10:27:02 +05:30
035a6cb09a [Misc] Update code for encoder-decoder models (#33900) Cyrus Leung 2026-02-06 11:38:39 +08:00
a32cb49b60 feat(frontend): early-fail tokenization guard for user requests (#31366) Mingliang Li 2026-02-06 11:38:02 +08:00
20d7454c9b fix(ROCm): Make flash_attn import optional in MLA attention (#33511) Rabi Mishra 2026-02-06 07:52:53 +05:30
5819ca8944 [Docs] Add reo analytics (#33957) Simon Mo 2026-02-05 17:42:22 -08:00
79028d4388 [Perf] Disable clean_logits in deepgemm fp8_mqa_logits kernel (#33568) Xin Yang 2026-02-05 17:34:00 -08:00
325ab6b0a8 [Feature] OTEL tracing during loading (#31162) emricksini-h 2026-02-06 01:59:28 +01:00
91a07ff618 [Bugfix] Fix DeepSeek v3.2 tokenizer outputting None issue (#33832) Wei Zhao 2026-02-05 18:50:49 -05:00
d5c4800112 Adds padding and perf improvements to wvSplitK_fp8 (#33527) Hashem Hashemi 2026-02-05 14:16:02 -08:00
42d5d705f9 [Minor] Sort safetensors files to ensure deterministic loading order (#33491) Lumosis 2026-02-05 14:05:09 -08:00
116880a5a0 [Bugfix] Make MM batching more robust (#33817) Cyrus Leung 2026-02-06 04:40:58 +08:00
4145e50d85 [Bugfix] Fix DSV3.2 NVFP4 (#33932) Matthew Bonanni 2026-02-05 14:22:19 -05:00
20f5d185a6 [Misc] Rename translations to speech_to_text for OAI serving component (#33904) Nicolò Lucchesi 2026-02-05 20:16:52 +01:00
1887acca9e Fix tokenizer test for renamed attr on Transformers v5 (#33902) Harry Mellor 2026-02-05 19:16:20 +00:00
92e7562a99 [Bugfix] Suppress non-TTY color output on the process name part of the log (#29714) Tsukasa OI 2026-02-06 03:47:09 +09:00
87d0d17ab5 [Models] Consolidate Deepseek-OCR2 processor (#33909) Isotr0py 2026-02-06 02:29:20 +08:00
a57c8228ff [Moe Refactor] Make Inplace Flag for FusedMoEModularKernel part of the constructor (#33375) bnellnm 2026-02-05 13:07:18 -05:00
1ee95841bd [Bugfix] Fix swapped engine_ids in NIXL Llama 4 local attention path (#33795) zackyoray 2026-02-05 19:51:58 +02:00
7d8c6804e2 [Misc] Add debug logs (#33931) Nicolò Lucchesi 2026-02-05 18:42:40 +01:00
af3162d3aa [Spec Decode] Unified Parallel Drafting (#32887) Benjamin Chislett 2026-02-05 12:37:18 -05:00
5b2a9422f0 [BugFix] Fix LoRA Fp8 (#33879) danisereb 2026-02-05 19:25:55 +02:00
c1858b7ec8 [Feat][RL][1/2] Native Weight Syncing API: NCCL (#31943) Aaron Hao 2026-02-05 09:13:23 -08:00
82914d2ae8 [Bugfix] Fix step3p5 parser when using mtp (#33690) Mario Hong 2026-02-06 00:04:04 +08:00
81a90e5277 [Docs] Add bart-plugin to docs (#33905) Nicolò Lucchesi 2026-02-05 13:20:25 +01:00
1c3a221d3b [Bugfix] Fix corner case of sparse embedding (#33886) wang.yuqi 2026-02-05 18:51:22 +08:00
7bd42e609d [Refactor] Clean up input preprocessing (#33687) Cyrus Leung 2026-02-05 18:43:42 +08:00
a2522839d8 [Bugfix] Fix Kimi-K2.5 NVFP4 checkpoints weight loading (#33876) Isotr0py 2026-02-05 18:29:54 +08:00
59a5cb387a [perf] Integrate flashinfer concat_mla_k (#31171) jiahanc 2026-02-05 18:23:11 +08:00
8322d4e47f Enable Cross layers KV cache layout at NIXL Connector V2 (#33339) liranschour 2026-02-05 12:17:02 +02:00
3e472e81f9 [ROCm][Bugfix][CI] Fix hybrid models and their tests (Mamba/Jamba/Bamba) (#32710) Andreas Karatzas 2026-02-05 04:01:23 -06:00
038914b7c8 [Refactor] Move task outside of PoolingParams.verify (#33796) Cyrus Leung 2026-02-05 17:33:11 +08:00