Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

Branches and tags

cmm

main

ci/build/22474

submission

v0.1.0

v0.1.1

v0.1.2

v0.1.3

v0.1.4

v0.1.5

v0.1.6

v0.1.7

v0.10.0

v0.10.0rc1

v0.10.0rc2

v0.10.1

v0.10.1.1

v0.10.1rc1

v0.10.2

v0.10.2rc1

v0.10.2rc2

v0.10.2rc3

v0.11.0

v0.11.0rc1

v0.11.0rc2

v0.11.0rc3

v0.11.0rc4

v0.11.0rc5

v0.11.0rc6

v0.11.1

v0.11.1rc0

v0.11.1rc1

v0.11.1rc2

v0.11.1rc3

v0.11.1rc4

v0.11.1rc5

v0.11.1rc6

v0.11.1rc7

v0.11.2

v0.12.0

v0.13.0

v0.13.0rc1

v0.13.0rc2

v0.13.0rc3

v0.13.0rc4

v0.14.0

v0.14.0rc0

v0.14.0rc1

v0.14.0rc2

v0.14.1

v0.15.0

v0.15.0rc0

v0.15.0rc1

v0.15.0rc2

v0.15.0rc3

v0.15.1

v0.15.1rc0

v0.15.1rc1

v0.15.2rc0

v0.16.0

v0.16.0rc0

v0.16.0rc1

v0.16.0rc2

v0.16.0rc3

v0.16.1rc0

v0.17.0

v0.17.0rc0

v0.17.0rc1

v0.17.1

v0.17.1rc0

v0.17.2rc0

v0.18.0

v0.18.0rc0

v0.18.0rc1

v0.18.0rc2

v0.18.1

v0.18.1rc0

v0.18.2rc0

v0.19.0

v0.19.0rc0

v0.19.0rc1

v0.19.1rc0

v0.2.0

v0.2.1

v0.2.1.post1

v0.2.2

v0.2.3

v0.2.4

v0.2.5

v0.2.6

v0.2.7

v0.3.0

v0.3.1

v0.3.2

v0.3.3

v0.4.0

v0.4.0.post1

v0.4.1

v0.4.2

v0.4.3

v0.5.0

v0.5.0.post1

v0.5.1

v0.5.2

v0.5.3

v0.5.3.post1

v0.5.4

v0.5.5

v0.6.0

v0.6.1

v0.6.1.post1

v0.6.1.post2

v0.6.2

v0.6.3

v0.6.3.post1

v0.6.4

v0.6.4.post1

v0.6.5

v0.6.6

v0.6.6.post1

v0.7.0

v0.7.1

v0.7.2

v0.7.3

v0.8.0

v0.8.0rc1

v0.8.0rc2

v0.8.1

v0.8.2

v0.8.3

v0.8.3rc1

v0.8.4

v0.8.5

v0.8.5.post1

v0.9.0

v0.9.0.1

v0.9.1

v0.9.1rc1

v0.9.1rc2

v0.9.2

v0.9.2rc1

v0.9.2rc2

aa84e43ccb [Pixtral] Enable Pixtral language model support Eagle3 (#37182) Rémi Delacourt 2026-03-20 16:50:15 +01:00
5e806bcf54 [Bugfix] Fix ConchLinearKernel channelwise quantization (group_size=-1) (#37329) Matthias Gehre 2026-03-20 16:32:21 +01:00
56a62c310c [Bugfix] Reject channelwise quantization (group_size <= 0) in ExllamaLinearKernel (#37331) Matthias Gehre 2026-03-20 16:31:57 +01:00
1779c09898 [ROCm] Enable wvSplitK skinny GEMM kernel for RDNA4/gfx1x decode (#34709) L.B.R. 2026-03-20 15:11:23 +00:00
44eea10f68 [ROCm][Quantization] make quark ocp mx dtype parser robust for weight-only quantization (#36232) xuebwang-amd 2026-03-20 23:10:03 +08:00
8b6c6b9505 [Model] Add LFM2-ColBERT-350M support (#37528) Ilya Boytsov 2026-03-20 15:57:57 +01:00
9f6d9dd371 Fix attribute error in isaac_patch_hf_runner (#37685) Harry Mellor 2026-03-20 14:49:40 +00:00
dd20ee4e3e [UX] Enable torch_profiler_with_stack (#37571) Jee Jee Li 2026-03-20 19:17:26 +08:00
0523449c9c [Misc] Use logger.info_once for auto tool choice log message (#37661) Chauncey 2026-03-20 18:40:36 +08:00
b4c1aef21c [Refactor] Relocate tests from tests/v1/entrypoints/ to tests/entrypoints/ (#37500) Flora Feng 2026-03-20 05:50:34 -04:00
6050b93bed [Refactor] Move serve entrypoint tests under tests/entrypoints/serve/ (#37595) Flora Feng 2026-03-20 05:10:47 -04:00
5a4a179591 [ROCm][CI] Fix granite_speech test for gfx90a by selecting compatible attention backend (#37611) Andreas Karatzas 2026-03-20 04:07:26 -05:00
37cd9fc107 [ROCm][CI] Remove deepep DBO tests on gfx90a (#37614) Andreas Karatzas 2026-03-20 04:07:07 -05:00
9cfd4ebb5e [ROCm][CI] Update GSM8K eval config to use fp8-and-mixed models list (#37619) Andreas Karatzas 2026-03-20 04:06:53 -05:00
ed359c497a [Model] Deprecate the score task (this will not affect users). (#37537) wang.yuqi 2026-03-20 16:07:56 +08:00
dcee9be95a [Model Runner V2] Fix draft logits not populated during cudagraph replay (#37639) Giancarlo Delfin 2026-03-20 00:43:47 -07:00
bd8c4c0752 [CI] Removing deprecated rlhf examples reference (#37585) Andreas Karatzas 2026-03-20 02:20:33 -05:00
0140eafb15 [Bug] Fix FlashInfer allreduce fusion workspace uninitialized error (#37461) Wei Zhao 2026-03-20 03:09:21 -04:00
bdf6a0a57b [XPU] bump vllm-xpu-kernels to v0.1.4 (#37641) Kunshang Ji 2026-03-20 15:04:38 +08:00
0674d1fee7 [PluggableLayer][MM] Add PluggableLayer for CustomQwen2Decoder (#37293) Wangbei25 2026-03-20 14:24:07 +08:00
30108fc8b0 [Model] Refactor Step3-VL processor to HF style (#37579) Cyrus Leung 2026-03-20 14:05:08 +08:00
e2d1c8b5e8 [Refactor] Relocate entrypoint tests to match serving code structure (#37593) Flora Feng 2026-03-20 01:31:23 -04:00
6951fcd44f [XPU] Automatically detect target platform as XPU in build. (#37634) Huanxing 2026-03-20 13:30:15 +08:00
39474513f6 [Model Runner V2] fix draft attention metadata generation (#37364) Giancarlo Delfin 2026-03-19 21:05:15 -07:00
638a872d77 fix(xpu): Re-compute compile ranges after platform-specific config updates (#37523) Yuxiang Liang 2026-03-20 11:52:35 +08:00
9040151fe1 [V0 Deprecation] Deprecate --disable-frontend-multiprocessing (#37612) Flora Feng 2026-03-19 23:31:43 -04:00
8fbe3f303f [Bugfix][LoRA] Fix Qwen35 LoRA (#36976) Jee Jee Li 2026-03-20 11:09:32 +08:00
ea2c148fa7 [compile][graph_partition]Add tensor size handling (#36038) Xiao 2026-03-19 19:55:25 -07:00
47b7af0d87 [Feat] Enable CompressedTensorW4A8Int for XPU (#37207) Tianmu Li 2026-03-19 19:34:28 -07:00
269bf46d99 fix: disambiguate multimodal prefix cache keys (#36708) tianshu-Michael-yu 2026-03-19 19:33:20 -07:00
e5a77a5015 [CI] Update mergify tool-calling label paths (#37478) Flora Feng 2026-03-19 22:22:23 -04:00
ca1ac1a4b4 Fix DP coordinator ZMQ port TOCTOU (#37452) Itay Alroy 2026-03-20 02:58:31 +02:00
4ca3fa6bb4 [ROCm][Bugfix] fix cache block size mismatch for aiter unified attention (#37606) Divakar Verma 2026-03-19 20:00:08 -04:00
be12afd284 [Bugfix] Fix Deepseekv32 tool parser when stream interval > 1 (#36056) Flora Feng 2026-03-19 19:51:25 -04:00
df3c0291a3 [Bug] Fix EmbedIOprocessor "classify" <-> "embed" (#37573) Wentao Ye 2026-03-19 19:40:10 -04:00
2be1a0f74b [Refactor] Remove dead code in pooling model (#37572) Wentao Ye 2026-03-19 19:39:43 -04:00
4120a05ff1 Fix AttributeError in Qwen3.5 GDN layers with quantized models (#37448) Jim Smith 2026-03-19 19:21:14 -04:00
98ff042917 [CI][BugFix][AMD] Don't set VLLM_ROCM_USE_AITER anymore in test_rocm_aiter_topk since it's not necessary (#36996) rasmith 2026-03-19 18:12:45 -05:00
bcf2be9612 [cherry-pick][Bugfix] Disable monolithic TRTLLM MoE for Renormalize routing (#37591) #37605 (tag: v0.18.0) khluu 2026-03-19 15:06:38 -07:00
b55156eae9 [Performance] Enable Triton autotuning disk cache by default (#37188) Artem Perevedentsev 2026-03-19 23:36:28 +02:00
112944fab9 test Qwen/Qwen3-4B-Instruct-2507 for unbacked (#36064) Laith Sakka 2026-03-19 14:28:45 -07:00
91be5f9be3 [MoE Refactor] Rename "naive" all2all backend (#36294) bnellnm 2026-03-19 15:50:34 -04:00
4ee847e400 Comment fix for async rl example (#35244) Aaron Hao 2026-03-19 12:46:07 -07:00
040a505ff5 [ROCm][CI] Cleaning and restructuring amd-ci legacy pipeline (#34839) Andreas Karatzas 2026-03-19 14:30:58 -05:00
9279c59a0e [MoE Refactor] DefaultMoERunner simplification (#33049) bnellnm 2026-03-19 15:07:44 -04:00
7454096199 [Log] Log once in local node by default (#37568) Wentao Ye 2026-03-19 15:04:59 -04:00
fb8b5e05fc [CI] Add retry with 4x backoff to HTTP fetches for transient failures (#37218) Andreas Karatzas 2026-03-19 14:00:20 -05:00
e5d96dc8fc Fix SpeculatorsConfig now that PreTrainedConfig is a dataclass in Transformers (#37574) Harry Mellor 2026-03-19 18:04:40 +00:00
daa05bf340 [Bugfix] Fix AttributeError when serving MXFP8 models with DeepGEMM installed (#37358) EdalatiAli 2026-03-19 13:58:33 -04:00
7769b58307 [torch.compile][BE][Multimodal] Remove requirement to set_model_tag to avoid cache conflict (#37345) Lucas Kabela 2026-03-19 10:26:12 -07:00
2f9f946b22 [P/D] AnthropicMessages add kv_transfer_params for PD disaggregation (#37535) Chauncey 2026-03-20 00:41:20 +08:00
2890aecce5 [CPU][UX] Do not crash when tcmalloc/libiomp are not ldpreloaded (#37561) Fadi Arafeh 2026-03-19 16:35:45 +00:00
34f093b417 [CI] Gate pre-commit on ready label or number of contributions (#37544) Harry Mellor 2026-03-19 16:21:57 +00:00
4dce8321a9 Run MacOS smoke test on daily cron job instead of every commit (#37567) Harry Mellor 2026-03-19 16:19:50 +00:00
657855ab41 [Misc] Cleanup more configs and processors (#37560) Cyrus Leung 2026-03-19 23:45:23 +08:00
e27b8ba3d1 [Bug] Fix fp8 trtllm MoE modular kernel supported routing methods (#37346) Wei Zhao 2026-03-19 11:43:06 -04:00
40b8363b45 [MRV2] Use fp32 for draft logits (#37526) Woosuk Kwon 2026-03-19 08:41:21 -07:00
8b10e4fb31 [1/n] Migrate permute_cols to libtorch stable ABI (#31509) mikaylagawarecki 2026-03-19 11:27:26 -04:00
104605cbf2 Remove deprecated reasoning_content message field(part-2) (#37480) Ifta khairul Alam Adil 2026-03-19 16:20:08 +01:00
96266f119b [LoRA] Minor improvements to LoRA log (#37557) Jee Jee Li 2026-03-19 23:18:06 +08:00
7c0cf3bcd0 Cap the number of API servers to 1 when using Elastic EP. (#37466) Sage Moore 2026-03-19 07:42:57 -07:00
572b432913 Stop bench CLI from recursively casting all configs to dict (#37559) Harry Mellor 2026-03-19 14:04:03 +00:00
9515c20868 [Misc] Clean up processing logic (#37541) Cyrus Leung 2026-03-19 21:30:20 +08:00
c63ca2b2e6 [Bugfix] Add Kimi-K2.5 reasoning/tool parser aliases and tool_call_id support (#37438) DorBernsohn 2026-03-19 15:08:00 +02:00
a32eaf5bb2 [CI] Merge cleanup_pr_body.yml and reminder_comment.yml (#37552) Harry Mellor 2026-03-19 12:55:07 +00:00
e390742c59 Fix KV Offloading + MLA AssertionError by using num_kv_heads=1 in cpu… (#37536) XueLiang Yang 2026-03-19 20:05:07 +08:00
7a6ebcbfcf [Model] Remove unnecessary get_language_model (#37545) Cyrus Leung 2026-03-19 20:00:36 +08:00
c7bc12c20f [CI/Build] Split out MM pooling tests (#37542) Cyrus Leung 2026-03-19 19:36:11 +08:00
f9e2a38386 [Docs] Reorganize pooling docs. (#35592) wang.yuqi 2026-03-19 19:25:47 +08:00
4426447bba Don't log exc_info when vLLM tries to download a file that doesn't exist (#37458) Harry Mellor 2026-03-19 10:38:29 +00:00
3322e26420 [Bugfix] Avoid more OpenMP thread reallocation in CPU torch compile (#37538) Li, Jiang 2026-03-19 18:24:39 +08:00
765e461065 [Bugfix] Fix Nemotron Parse loading (#37407) Cyrus Leung 2026-03-19 17:55:29 +08:00
6a9cceb219 [Bugfix][ROCm] Fix MoRI + AITER FP8 dispatch compatibility for defer_input_quant (#37418) Duyi-Wang 2026-03-19 17:49:27 +08:00
199f914183 fix(cpu): add null check for aligned_alloc in ScratchPadManager (#37369) yassha 2026-03-19 10:45:06 +01:00
ca21483bf9 [MISC] fix pin_memory=torch.cuda.is_available(), use is_pin_memory_available (#37415) Kunshang Ji 2026-03-19 17:23:24 +08:00
da70c87e81 [CI] Fix wrong path test file, missing rlhf_async_new_apis.py (#37532) TJian 2026-03-19 17:21:55 +08:00
0b6d52629f Support temporal compression for Nemotron-3-VL videos (#36808) Collin McCarthy 2026-03-19 01:02:19 -07:00
d3cc379567 [Perf] Fix slow hasattr in CUDAGraphWrapper.__getattr__ (#37425) Ziming Huang 2026-03-19 15:43:48 +08:00
354cd580d5 fix(anthropic): remove non-standard 'data: [DONE]' from Anthropic streaming (#37510) cdpath 2026-03-19 15:23:35 +08:00
d49f273144 [SSM/Mamba] Follow-up: N-1 prefill for P/D disaggregation (#37310) zhanqiuhu 2026-03-19 03:22:00 -04:00
b21d384304 [Refactor] Relocate endpoint tests to mirror serving code directory structure (#37504) Flora Feng 2026-03-19 03:19:36 -04:00
e3126cd107 [ROCm] issue management - request information for bug issues on ROCm (#37009) Hongxia Yang 2026-03-18 23:51:29 -04:00
e37ff5b5c8 [Perf] Optimize token_embed for pooling models, 1.0% token throughput improvement (#37347) Wentao Ye 2026-03-18 22:27:51 -04:00
6accb21f2a [bug] Fix deadlock with pause resume and collective_rpc (#37024) Aaron Hao 2026-03-18 18:49:02 -07:00
89138b21cc [Bugfix] Zero-init MLA attention output buffers to prevent NaN from CUDA graph padding (#37442) (tag: v0.18.0rc2) Elvir Crnčević 2026-03-19 01:28:37 +01:00
6edd43de3c [Bugfix][ROCm] Fix worker startup OOM on ROCm by skipping unreliable cudagraph memory profiling (#36720) JartX 2026-03-17 22:55:34 +01:00
053f3b6309 [Model Runner V2] Spec decode rejection sampler logprobs support (#37237) Giancarlo Delfin 2026-03-18 18:36:27 -07:00
5f82706a21 [BUG] Exclude SKIP_TENSORS from get_layer_size() + new weight sync example for dpep (#37334) Aaron Hao 2026-03-18 17:45:10 -07:00
c32a58cc2a [EPLB] Simplify EPLB rearrange by only returning one map (#36267) Sage Moore 2026-03-18 17:34:00 -07:00
ef2c4f778d [Bugfix] Zero-init MLA attention output buffers to prevent NaN from CUDA graph padding (#37442) Elvir Crnčević 2026-03-19 01:28:37 +01:00
9dade5da3a [XPU]Unify xpu test dependencies in dockerfile.xpu (#36477) sihao_li 2026-03-19 08:12:07 +08:00
828f862acb [Bugfix] Expand quantization method support in perf metrics (#37231) Thillai Chithambaram 2026-03-18 19:54:19 -04:00
577df69b26 [Bugfix] Fix KV scales inconsistency in fp8 MLA & FlashInfer kv_cache_dtype "auto" leading to gibberish (#37054) Andy Lo 2026-03-18 23:07:29 +00:00
04244fd0e1 [Model Runner V2] Spec decode rejection sampler greedy support (#37238) Giancarlo Delfin 2026-03-18 15:59:03 -07:00
9482b0b085 [Bugfix] Remove assertion for NVFP4 scale dynamic range (#37465) Michael Goin 2026-03-18 23:37:49 +01:00
5bc1da147f [LoRA][BugFix] Fix skipped LoRA adapters for Mistral3 (#36928) Woosuk Kwon 2026-03-18 15:34:19 -07:00
0091017188 fix(worker): optimize swap_states to copy only active token prefixes (#34733) Philip Ottesen 2026-03-18 22:59:27 +01:00
0d81a1fe61 [V0 Deprecation] Deprecate virtual engine (#37195) Wentao Ye 2026-03-18 17:30:14 -04:00
6ae4c8d6fc chunk parakeet into 30s clips to prevent OOMs on long audios (#36671) Netanel Haber 2026-03-18 23:22:24 +02:00
a913b612d8 [Bugfix] Fix ROCm crash in qwen3_next multi-stream events (#36795) (#37427) JartX 2026-03-18 21:06:31 +01:00