Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm


Commit Graph



e09546cf05 [Frontend] Exploit tokenizers "new stream" in FastIncrementalDetokenizer (#34217) Nick Hill 2026-02-11 02:03:24 -08:00
786806dd44 [Doc] Update Marlin support matrix for Turing (#34319) Tianqi Ren 2026-02-11 17:03:41 +08:00
79504027ef [Misc] Bump fastsafetensors version for latest fixes (#34273) Nick Hill 2026-02-11 00:30:09 -08:00
addac0e653 [torch.compile] Enable AR+rms fusion by default available for -O2 (#34299) Luka Govedič 2026-02-11 03:30:00 -05:00
675a22ed66 [Chore] Move BaseRenderer to base.py (#34308) Cyrus Leung 2026-02-11 16:29:51 +08:00
cb9574eb85 [XPU][9/N] clean up existing ipex code/doc (#34111) Kunshang Ji 2026-02-11 16:27:15 +08:00
21dfb842d7 [model] support FunASR model (#33247) AllenDou 2026-02-11 15:37:09 +08:00
d1b837f0ae [CPU] Enable FP16 (Half dtype) support for s390x (#34116) R3hankhan 2026-02-11 12:11:42 +05:30
0b20469c62 [Bugfix] Fix weight naming in Qwen3.5 (#34313) Roger Wang 2026-02-10 21:37:14 -08:00
d7982daff5 [Bugfix] Fix fused MoE IMA (sans chunking) by using int64 for strides (#34279) Tyler Michael Smith 2026-02-11 00:15:52 -05:00
9b17c57460 [ModelBash][DSR1 NVFp4] Removed Bf16 Bias Cast (#34298) Robert Shaw 2026-02-11 00:00:00 -05:00
1b3540e6c6 Threshold fix wvSplitk for occasional CI fails (#34013) Hashem Hashemi 2026-02-10 19:59:14 -08:00
7a048ee65f [Bugfix] Fix benchmark_moe.py inplace assertion with torch >= 2.9 (#34149) Matthias Gehre 2026-02-11 04:58:56 +01:00
c9a1923bb4 [Plugin] Simplify IO Processor Plugin interface (#34236) Cyrus Leung 2026-02-11 11:47:39 +08:00
b482f71e9f [XPU][7/N] enable xpu fp8 moe (#34202) zofia 2026-02-11 11:33:59 +08:00
1485396abb [Kernel] Apply 256bit LDG/STG To Activation Kernels (#33022) Дзержи́нский 2026-02-11 11:31:51 +08:00
5ee5c86eeb [Bugfix][DeepSeek-V3.2] fix fp8 kvcache type cast (#33884) Kebe 2026-02-11 12:31:36 +09:00
b5dcb372e4 [Misc] Clean up validation logic in input processor (#34144) Cyrus Leung 2026-02-11 11:29:29 +08:00
066c6da6a0 [WideEP] Fix nvfp4 DeepEP High Throughput All2All backend (#33738) Tyler Michael Smith 2026-02-10 22:15:43 -05:00
e30cedd44b [torch.compile] Stop doing unnecessary FakeTensorProp in PiecewiseCompileInterpreter (#34093) Richard Zou 2026-02-10 22:15:40 -05:00
3bcd494ef4 [Redo] Add --trust-remote-code to dataset bench args (#34251) Cyrus Leung 2026-02-11 11:10:12 +08:00
0e725a7d22 [Bugfix] Fix Worker.load_model context-manager composition for sleep mode (#34021) tianshu-Michael-yu 2026-02-10 19:07:51 -08:00
ba0511fd80 [Misc] Add run one batch script that supports profiling (#32968) Lucas Wilkinson 2026-02-10 19:29:49 -07:00
4a1550d22d [ROCm][CI] Fix test_sequence_parallel.py location in AMD CI pipeline (#34280) Micah Williamson 2026-02-10 19:08:11 -06:00
d1481ba783 [MoE Refactor] Introduce MoERunner abstraction and move execution logic from FusedMoE to DefaultMoERunner (#32344) bnellnm 2026-02-10 19:51:07 -05:00
dc6de33c3d [CI] Add pip caching to cleanup_pr_body workflow (#32979) 7. Sun 2026-02-11 08:45:28 +08:00
c4b9e6778f [Misc] Add pre-commit hook to catch boolean ops in with-statements (#34271) Tyler Michael Smith 2026-02-10 18:13:20 -05:00
341eed3d30 [torch.compile] Disable recursive pre_grad_passes (#34092) Richard Zou 2026-02-10 18:02:31 -05:00
6f2f59f2b3 [Misc][Spec Decode] support different load config for draft model (#34022) Zhengkai Zhang 2026-02-10 14:52:43 -08:00
bb2fc8b5e7 [BugFix] Fix async EPLB hang with DeepEP LL all2all backend (#32860) Ilya Markov 2026-02-10 23:34:47 +01:00
67132945bb [Perf] Move eplb rebalance algo to async thread (#30888) Ilya Markov 2026-02-10 23:19:10 +01:00
f0ca0671c7 [Feature] Warn about unrecognized environment variables (#33581) Gregory Shtrasberg 2026-02-10 15:45:38 -06:00
578977bb5e [SM100] Resubmit FMHA FP8 prefill for MLA (#31195) Pavani Majety 2026-02-10 13:18:43 -08:00
9615575afc [Bugfix] Fix mamba cache dtype for Qwen3.5 (#34200) Roger Wang 2026-02-10 13:12:31 -08:00
4293c00b84 [Benchmarks] Fix attention benchmark smoke test (#34269) Matthew Bonanni 2026-02-10 16:04:07 -05:00
506ad7d7c1 [Bugfix] Fix weights offloading for sleep mode (#32947) J Seppänen 2026-02-10 22:38:17 +02:00
fdd6f2ad58 Convert online APIs to use Renderer (#34084) Reagan Lee 2026-02-10 11:44:31 -08:00
33bcd3dc3b [Misc] Introduce ec_both role EC (encoder cache) connector (#34182) Qi Wang 2026-02-10 10:55:35 -08:00
1f5febb4b8 [UX nit] Fix non-default api_server_count message (#34152) Michael Goin 2026-02-10 13:35:58 -05:00
ae871ca923 Minor cleanup for Voxtral (#34247) Andy Lo 2026-02-10 18:18:30 +00:00
a2443de5fa [Model Runner V2] Use pinned memory for write_contents (#34222) Woosuk Kwon 2026-02-10 08:55:22 -08:00
f84a2a8f31 [Docs] Speed up build environment set-up (#34240) Harry Mellor 2026-02-10 17:34:43 +01:00
000214c4bb [BUGFIX] Fix accuracy bugs in Qwen3-Next MTP (#34077) Vadim Gimpelson 2026-02-10 19:57:11 +04:00
c5a66d1697 [Core][BugFix] Fix PP KV cache sharding memory validation (#33698) junuxyz 2026-02-11 00:46:24 +09:00
afdce12c89 [Perf][Kernel] Add faster topKperRow decode kernel for DeepSeek-V3.2 sparse attention (#33680) Roberto L. Castro 2026-02-10 16:29:52 +01:00
82e11973cc [compile] Enable AOT compile with 2.10 in trunk. (#34155) Zhengxu Chen 2026-02-10 10:24:42 -05:00
b129136c7a [ROCm][Quantization] GPT_OSS in amd-quark format model loading and emulations (#29008) xuebwang-amd 2026-02-10 23:08:05 +08:00
599e4335a4 Support benchmarking of Geospatial models (#33922) mgazz 2026-02-10 15:04:16 +00:00
a1946570d8 add --insecure arg to the vllm bench to skip TLS (#34026) Fan Yang 2026-02-10 06:23:52 -08:00
d0bc520569 Bump mamba-ssm version in CI for Transformers v5 compatibility (#34233) Harry Mellor 2026-02-10 14:46:01 +01:00
748625cdaf [V1][BugFix] Fix EAGLE3 encoder cache miss with disable_chunked_mm_input (#34220) Krish Gupta 2026-02-10 18:35:32 +05:30
61413973e8 Stop testing for slow tokenizers as they will not exist soon (#34235) Harry Mellor 2026-02-10 13:08:20 +01:00
94de871546 [Misc] allow specify is_mm_prefix_lm in hf_config (#34215) Phúc H. Lê Khắc 2026-02-10 18:16:21 +07:00
e042d7e685 Add flagos in MiniCPM-o (#34126) tc-mb 2026-02-10 18:51:48 +08:00
ae4e280602 [Bugfix] Fix FI kernelchunk_gated_delta_rule output shape for Qwen3.5 (#34219) Roger Wang 2026-02-10 02:41:24 -08:00
cbea11c9f0 [Docs] Fix format error in KV load failure recovery doc (#34137) zzaebok 2026-02-10 18:16:26 +08:00
2c32558a3c [Bugfix] Fix --trust-remote-code conflict (#34218) Cyrus Leung 2026-02-10 16:29:10 +08:00
5f970120f0 [Bugfix] Fix memory inconsistency in cross-process shared memory (#32022) Zetong Li 2026-02-10 16:22:03 +08:00
998e2d91f8 Revert #34208 (#34216) Cyrus Leung 2026-02-10 15:59:04 +08:00
e1060a71a1 [Perf] Optimize detokenizer python logic (#32975) Wentao Ye 2026-02-10 02:54:41 -05:00
97fa8f6590 [BugFix] Avoid prefix cache hit in the same schedule step for mamba layers (#29387) Chen Zhang 2026-02-09 23:41:16 -08:00
dab1de9f38 [Frontend][CI] Consolidate instrumentator entrypoints (#34123) wang.yuqi 2026-02-10 15:30:19 +08:00
8d48d0a9d9 [Bugfix] Sort hf_weights_files in fastsafetensors_weights_iterator to match #33491 (#34190) Balaxxe 2026-02-10 00:06:30 -07:00
9608844f96 [responsesAPI] fix simpleContext streaming output_messages (#34188) Andrew Xia 2026-02-09 22:53:07 -08:00
f69b903b4c [Bugfix] Add --trust-remote-code to dataset bench args (#34208) Cyrus Leung 2026-02-10 14:37:50 +08:00
81e217fe6b [Bugfix] Fix DP Attention Padding in Dummy Run (#34187) Lucas Wilkinson 2026-02-09 21:29:39 -08:00
ab97bcf662 [CI/Build] Relax test_mcp_tool_call (#34204) Cyrus Leung 2026-02-10 13:18:57 +08:00
25e48a3aae [Doc] Update usage of --limit-mm-per-prompt (#34148) Cyrus Leung 2026-02-10 13:12:13 +08:00
8a5e0e2b2b [Bugfix][Core] Fix CPU memory leak from Request reference cycle in prefix caching (#34183) Roger Wang 2026-02-09 21:03:32 -08:00
4cde2e0159 [ROCm][Bugfix] Resolve Dynamo tracing crash from amdsmi calls in on_gfx* arch detection (#34108) Andreas Karatzas 2026-02-09 22:50:20 -06:00
047a457fa4 [Bugfix] Adopt ChunkGatedDeltaRule for Qwen3.5 (#34198) Roger Wang 2026-02-09 19:47:54 -08:00
e94ec59733 [LMCache] Token Base IPC API (#34175) Yuwei An 2026-02-09 17:18:42 -08:00
13397841ab [structured output] validate unsupported json features first (#33233) Ning Xie 2026-02-10 07:49:09 +08:00
c60f8e3b49 [Bugfix][ROCm][GPT-OSS] Use old triton_kernels implementation on ROCm if the new API is not available (#34153) Gregory Shtrasberg 2026-02-09 17:38:54 -06:00
5e75a14a66 [Doc] Add DCP support to attention backend doc (#33936) Michael Goin 2026-02-09 18:33:43 -05:00
e7e52781ff [ModelRunner V2][BugFix] Fix max_query_len calculation (#34167) Nick Hill 2026-02-09 13:47:17 -08:00
bb9f97308d [torch.compile][Fusion] Fix attention fusion pass removing kv_udpate op. (#33945) Charlie Fu 2026-02-09 15:15:43 -06:00
4d39650961 [ROCm] update triton branch to support gpt-oss models for gfx11xx devices (#34032) Hongxia Yang 2026-02-09 14:36:30 -05:00
8fd31f6245 [Bugfix] Voxtral prompt/audio placeholder alignment (#34140) Artus Krohn-Grimberghe 2026-02-09 20:30:38 +01:00
eadb4e868b [Bugfix] Avoid duplicate k-proj weight emission in helper (#34142) Artus Krohn-Grimberghe 2026-02-09 20:17:44 +01:00
285bab4752 [Kernel] use flashinfer for gdn prefill (#32846) Jiangyun Zhu 2026-02-10 01:17:25 +08:00
995bbf38f1 [Bugfix] Fix shared expert input for latent MoE in EP+DP (Nemotron-H) (#34087) TomerBN-Nvidia 2026-02-09 18:44:18 +02:00
d4f123cc48 [Kernel] FlashInfer: switch allreduce fusion to unified API (#33985) Mohammad Miadh Angkad 2026-02-09 23:43:24 +08:00
cb62e86f83 Add NUMA Core binding in nixl_connector for CPU xPyD (#32365) ZhengHongming888 2026-02-09 07:39:12 -08:00
781ddf7868 [CI][torch.compile] Fix incorrect filtering for E2E fusion tests on B200 (#34031) Luka Govedič 2026-02-09 10:05:14 -05:00
64a9c2528b [UX] Add --language-model-only for hybrid models (#34120) Roger Wang 2026-02-09 06:57:33 -08:00
d0d97e2974 [Misc] Fix up attention benchmarks (#33810) Lucas Wilkinson 2026-02-09 06:42:03 -08:00
9562912cea [MODEL] Adding Support for Qwen3.5 Models (#34110) JJJYmmm 2026-02-09 21:12:58 +08:00
9bdb06b436 [XPU][6/N] add xpu scaled_mm kernel (#34117) zofia 2026-02-09 20:17:35 +08:00
caad9f1e01 [Fix] [CPU Backend] : Prepack weights for w8a8 oneDNN matmul (#33901) Nikhil Gupta 2026-02-09 10:04:41 +00:00
1d5922fade [ASR] Fix audio benchmark and add RTFx metric (#32300) Ekagra Ranjan 2026-02-09 05:02:37 -05:00
3025b3cebb [CI] Remove empty image_size_factors for fuyu, glm4_1v, glm_ocr (#34107) Andreas Karatzas 2026-02-09 03:37:04 -06:00
978a37c823 [Model] GLM adaptation (#34124) Jee Jee Li 2026-02-09 17:32:52 +08:00
5a5c43511a fix(cpu): fix mla_decode compilation on x86 without AVX512 (#34052) ihb2032 2026-02-09 16:55:41 +08:00
d9bede0314 [BugFix] Fix fastsafetensors TP all procs using all GPUs (#34070) Nick Hill 2026-02-08 23:15:46 -08:00
22b64948f6 [Frontend][last/5] Make pooling entrypoints request schema consensus. (#31127) (tag: v0.16.0rc1) wang.yuqi 2026-02-09 14:42:38 +08:00
7c233dbb36 [Tiny] Rename encoder budget file to more specific name (#34103) Reagan Lee 2026-02-08 19:48:19 -08:00
a75a5b54c7 [bug-fix] supported_tasks is breaking backward compatibility at init_app_state (#34027) kourosh hakhamaneshi 2026-02-08 17:46:46 -08:00
f97ca67176 [Release 2.10] Update to Torch 2.10 - final release (#30525) Andrey Talman 2026-02-08 16:51:09 -05:00
084aa19f02 Add support for ModelOpt MXFP8 dense models (#33786) danisereb 2026-02-08 21:16:48 +02:00
