Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

ddeec11ba9 [Bugfix][P/D] Reduce num_threads used by nixl ucx backend (#27196) David Whyte-Gray 2025-10-21 13:41:52 -04:00
86ed77022d [Feature] Batch Invariant for R1 TP 8 on Blackwell (#27229) Wentao Ye 2025-10-21 13:25:55 -04:00
aa1356ec53 [ROCm] Update Triton, Torch, and AITER branches for ROCm base Dockerfile (#27206) Micah Williamson 2025-10-21 11:01:23 -05:00
ecc3c0940a Add @pavanimajety to .github/codeowners for Flashinfer, ModelOpt related code (#27213) Pavani Majety 2025-10-21 07:59:53 -07:00
ba09652de2 [ROCM] Enable CompressedTensorsWNA16 (#27187) JartX 2025-10-21 16:43:23 +02:00
bd66b8529b [CI] Install pre-release version of apache-tvm-ffi for flashinfer (#27262) Harry Mellor 2025-10-21 15:23:56 +01:00
6c728f7771 [Chore] Separate out NCCL utilities from vllm.utils (#27197) dongbo910220 2025-10-21 21:18:23 +08:00
80e9452984 [Deepseek v3.2] Optimize top_k_per_row (#26763) Daniel Cámpora 2025-10-21 10:30:07 +02:00
c3a2c6ac5f [MM][Core] Decouple ViT backend from LM backend (#27061) Roger Wang 2025-10-21 00:30:10 -07:00
72f431e709 [Nixl] Minor refactor to handshake related metadata (#26410) Nicolò Lucchesi 2025-10-21 09:07:47 +02:00
be4445072c [Fix][Spec Decode] Fix llama4 draft loading with different quantization (#27136) Zebing Lin 2025-10-21 02:19:00 -04:00
f381cf2302 [Bugfix] Fix broken MTP weight loading for FP8 KV Scales (#27227) Benjamin Chislett 2025-10-21 01:51:44 -04:00
5ff5d94e77 [Bugfix] Fix gpt-oss w4a8 DP/EP on B200 (#26729) Varun Sundar Rabindranath 2025-10-21 01:51:14 -04:00
f95da13c3d [ModelOpt] Load w13/w2_input_scale for all experts, nvfp4 (#26135) Shu Wang 2025-10-21 00:50:31 -05:00
aef368aa08 [BugFix] GPT-OSS Attention DP + MoE TP weight loading issue (#24032) Po-Han Huang (NVIDIA) 2025-10-21 12:03:47 +08:00
5f6cbf60d6 [Feature][Kernel]FusedMoE LoRA (#21229) Chen Wu 2025-10-21 11:01:37 +08:00
3ada34f9cb [Frontend] Enforce tokenize=False when applying chat template (#27205) Russell Bryant 2025-10-20 22:57:34 -04:00
0eb8f2b880 create is_in_the_same_node on cpu (#26832) Lunwen He 2025-10-20 19:04:14 -07:00
163965d183 [cpu] Dispatch un-quantized linear to oneDNN/ACL by default for AArch64 (#27183) Fadi Arafeh 2025-10-21 03:02:58 +01:00
a03cf9bc70 [V0 Deprecation] Remove V0 metrics code (#27215) Nick Hill 2025-10-20 19:02:10 -07:00
352c0c8a28 [Quantization] Automatically infer AWQ modules_to_not_convert field (#26909) Isotr0py 2025-10-21 09:49:28 +08:00
bfe0b4bd2a [ez] add uv lock to gitignore (#27212) Andrew Xia 2025-10-20 17:37:44 -07:00
58fbbcb2f5 [ROCm] enable some tests in entrypoints test groups on AMD (#26725) Concurrensee 2025-10-20 19:37:16 -05:00
87778d5f00 [Feature][Quantization] auto_round support for mixed bits quantization (#23812) Heng Guo 2025-10-21 06:23:30 +08:00
f9e7ad5400 [Bugfix][CI] Fix Distributed Tests (4 GPUs) async_sched+ray test (#27195) Nicolò Lucchesi 2025-10-20 18:34:54 +02:00
4d0f266113 [Kernel][Model] Tune fused_moe Triton configs for Qwen3-30B A3/A3B on H100 (FP8/BF16) (#26268) shivampr 2025-10-20 07:48:01 -07:00
e93ff6c8b9 Nemotron Nano V2 VL + EVS Video Support (#27107) Eugene Khvedchenya 2025-10-20 17:19:11 +03:00
1c691f4a71 AArch64 CPU Docker pipeline (#26931) ioana ghiban 2025-10-20 13:09:40 +02:00
9fce7bee74 [Kernel] Accelerate solve_tril with TMA (#26746) Jiangyun Zhu 2025-10-20 13:39:02 +08:00
b63f2143f8 [LoRA] LoRA cuda graph specialization (#25914) Andy Lo 2025-10-20 05:21:09 +01:00
f32bf7582e [Model][VLM] Support Bee-8B Model (#27012) Yi Zhang 2025-10-20 10:31:26 +08:00
8a81d776ce Fix typo in ValueError message: use kv_role instead of kv_disagg_role (#27166) Yongtao Huang 2025-10-20 03:47:19 +08:00
f6fdacd82c [Bugfix] Fix error with penalties when speculative decoding and structural output are enabled (#26586) Sergei Skvortsov 2025-10-19 20:24:46 +01:00
d31f7844f8 [Misc] Move utils to avoid conflicts with stdlib, and move tests (#27169) Cyrus Leung 2025-10-19 20:20:55 +08:00
7a6c8c3fa1 [Chore] Separate out vllm.utils.network_utils (#27164) iAmir97 2025-10-19 17:06:32 +07:00
221bf72577 output type conversion fix (#27159) Jianyu Huang 2025-10-19 01:10:07 -07:00
b3aba04e5a [Benchmark] Convenience script for multiple parameter combinations (#27085) Cyrus Leung 2025-10-19 14:57:01 +08:00
8a297115e2 [Chore] Separate out hashing utilities from vllm.utils (#27151) dongbo910220 2025-10-19 11:09:38 +08:00
191eed0bb9 [BugFix] Fix lazy imports involving outlines_core (#27158) 22quinn 2025-10-18 19:35:32 -07:00
fb860670da [Minor] Remove unused env variable (#27161) Woosuk Kwon 2025-10-18 18:48:35 -07:00
83e760c57d [V1][Metrics][Plugin] Add plugin support for custom StatLoggerBase implementations (#22456) Tova Movshovitz 2025-10-19 01:12:46 +03:00
c2bba69065 [BugFix] Disable fp8 kv-cache by default for DeepSeek V3.2 (#27121) Lucas Wilkinson 2025-10-18 18:05:23 -04:00
e133d6d218 [BugFix] fix graph partition signature (#27139) Boyuan Feng 2025-10-18 14:34:36 -07:00
a1946c9f61 [Chore] Separate out profiling utilities from vllm.utils (#27150) dongbo910220 2025-10-19 03:12:01 +08:00
9f020f4f31 [BugFix] Fix failing gemma-3-1b-it test: test_lm_eval_accuracy_v1_engine[google/gemma-3-1b-it] (#27111) Lucas Wilkinson 2025-10-18 14:44:39 -04:00
3b45075206 [Minor] Add some clarifying comments to recent changes (#27130) Nick Hill 2025-10-18 09:52:45 -07:00
168e578efc Fix incorrect string formatting in barrier timeout exceptions (#27149) Yongtao Huang 2025-10-19 00:51:57 +08:00
6ac5e06f7c [Chore] Clean up pytorch helper functions in vllm.utils (#26908) Isotr0py 2025-10-19 00:48:22 +08:00
5c2acb270a [Models][QwenVL] Remove unnecessary .contiguous() calls (#27106) Lukas Geiger 2025-10-18 16:05:05 +02:00
b26b70bec4 [Misc] Refactor get_kv_cache_spec into AttentionLayerBase (#26587) Nicolò Lucchesi 2025-10-18 15:51:21 +02:00
ab4be40fc5 [fix][cpu] fix prefill attention in CPU attention backend (#27035) Fadi Arafeh 2025-10-18 14:30:21 +01:00
245e4f2c01 [Feature] Batch Invariant: Support DeepGEMM and Blackwell (#27127) Wentao Ye 2025-10-18 09:28:05 -04:00
1d165d6d85 [Chore] Separate out vllm.utils.mem_utils (#27143) iAmir97 2025-10-18 17:06:59 +07:00
83004020fd [Test] Add test for /health endpoint on engine failure (#26074) dongbo910220 2025-10-18 17:59:05 +08:00
12e21701e7 [DOC][FEATURES][CPU]update cpu feature for v1 (#27135) Chendi.Xue 2025-10-18 03:10:45 -05:00
30a33b92ee [Misc] Rev DeepEP (#27122) Varun Sundar Rabindranath 2025-10-18 02:54:29 -04:00
7c572544e4 [GPT-OSS] Structure_Tag support for gpt-oss tool-call in cot (#25515) Hanchenli 2025-10-17 21:55:54 -07:00
c312320764 [CI/Build] tests(v1): feed Triton attention the (num_blocks, 2, …) KV cache layout in backend-correctness tests (#26663) Huamin Li 2025-10-17 21:11:26 -07:00
c981f0ea78 [Perf] Add H100 fused MoE config (#25398) ZiTian Zhao 2025-10-18 10:21:27 +08:00
6367bde739 [BugFix][Core] Fix error when enable async-scheduling in multi-node env (#25887) Lehua Ding 2025-10-18 06:16:18 +08:00
f50cc221ea [Test] Make test_failure more stable for batch invariance (#27054) Wentao Ye 2025-10-17 16:59:08 -04:00
acedc74b1a [V1][Spec Decode] Fix greedy temperature detection after sampler refactor (#27077) Pradyun92 2025-10-17 16:27:47 -04:00
d29483b58a [Minor] Remove unnecessary error message (#27115) Zhuohan Li 2025-10-17 13:02:12 -07:00
950cf9e58e [Bugfix] Use PIECEWISE cudagraphs on Blackwell if max_model_len > 131072 (#27114) Michael Goin 2025-10-17 15:47:18 -04:00
3125d79950 [Chore] Remove unused PolyNorm layer (#27110) Isotr0py 2025-10-18 03:03:43 +08:00
e33ee23ee3 [Bugfix] [AITER] [ROCm] Fix Quark MoE Quant Config and AITER Fused MoE quant type logic (#27029) vllmellm 2025-10-18 02:51:10 +08:00
b10c64c834 [ROCm][Bugfix][Model] Fix illegal memory access when running qwen3_moe models with rms_norm (Qwen3-235B-A22B, Qwen3-30B-A3B, etc.) (#26192) rasmith 2025-10-17 13:17:18 -05:00
0925b28a8e [ROCM] MoE fp4 CK kernel (#26545) Aleksandr Malyshev 2025-10-17 11:06:33 -07:00
99722d5f0e [CI] Remove forbidden slash (#27112) Nicolò Lucchesi 2025-10-17 18:38:00 +02:00
4c91a28e30 [bugfix] Qwen3-VL fix video incorrect timestamp calculations while do_sample_frames=True (#27104) 燃 2025-10-18 00:26:33 +08:00
b038d9c40c [Data-parallel] Allow DP>1 for world_size > num_gpus on node (8) (#26367) Patrick von Platen 2025-10-17 17:24:42 +02:00
2ba60ec7fe [CI] Nixl integration tests (#27010) Nicolò Lucchesi 2025-10-17 16:13:31 +02:00
bd7157a071 [torch.compile] Enable attention and allreduce fusion without custom ops enabled (#24604) Luka Govedič 2025-10-17 10:10:23 -04:00
be429d0cfd Fix incorrect docstring for stop_profile() method (#27101) Yongtao Huang 2025-10-17 21:30:23 +08:00
c253745eb8 [Harware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI350 and MI355 (#25586) Reima Karhila (AMD) 2025-10-17 14:56:12 +03:00
daec4d2624 [Model]Improve Qwen3VLMoeForConditionalGeneration packed_modules_mapping (#27096) Jee Jee Li 2025-10-17 19:47:00 +08:00
6c9fdbf725 [Docs] Replace rst style double-backtick with md single-backtick (#27091) Harry Mellor 2025-10-17 10:47:34 +01:00
483ea64611 [Docs] Replace all explicit anchors with real links (#27087) Harry Mellor 2025-10-17 10:22:06 +01:00
e20eba753b [VLM][Refactor] Remove useless func get_input_positions in MRotaryEmbedding (#27088) Mengqing Cao 2025-10-17 17:00:30 +08:00
bbc1b29665 Update troubleshooting.md and remind VLLM_TRACE_FUNCTION usage (#27069) cong-meta 2025-10-17 01:53:06 -07:00
acb1bfa601 [CI] fix docs build failed (#27082) Chauncey 2025-10-17 15:53:40 +08:00
75c7ad9918 [Kernel][Performance] Fuse float cast and renormalize to topk softmax kernel (#26717) zhrrr 2025-10-17 15:30:35 +08:00
5550ff9c25 [CI/Build] Update compressed tensor test path to fix CPU CI (#27068) Li, Jiang 2025-10-17 13:34:56 +08:00
3aeb19a39e [Model] Add support for LightOnOCR (#26916) Said Taghadouini 2025-10-17 07:05:24 +02:00
8c017b3490 [Model] Always use Transformers backend for PaliGemma and Gemma3-MM (#26715) Cyrus Leung 2025-10-17 13:03:35 +08:00
9c2c2287a0 [CI/Build] Update Llama4 eval yaml (#27070) Zhewen Li 2025-10-16 21:59:47 -07:00
fec2b341ad [Kernel] Lazy import FlashInfer (#26977) Jee Jee Li 2025-10-17 12:48:18 +08:00
87bc0c492f [Bugfix] Fix ReplicatedLinearWithLoRA (#27065) Jee Jee Li 2025-10-17 12:43:16 +08:00
fe3b9372ad [Core] Change execute_model_with_error_logging() to be a ctx manager (#27060) Nick Hill 2025-10-16 20:45:32 -07:00
bde9e2272a [Bugfix][Qwen] fixes the weights dtype in qwen3_next: it is actually a bfloat16 (#27030) Tao He 2025-10-17 11:37:52 +08:00
08405609cc disable graph partition in custom op (#26952) Boyuan Feng 2025-10-16 20:08:47 -07:00
ab81379ea6 [Perf] Exploit out-of-band buffers in shm_broadcast (#26961) Nick Hill 2025-10-16 20:08:03 -07:00
4ffd6e8942 [Docs] Reduce custom syntax used in docs (#27009) Harry Mellor 2025-10-17 04:05:34 +01:00
965c5f4914 vllm bench serve shows num of failed requests (#26478) Tomas Ruiz 2025-10-17 04:55:09 +02:00
4d055ef465 Remove unused imports (#26972) Lukas Geiger 2025-10-17 03:51:17 +01:00
17c540a993 [torch.compile] fix simple inductor graph partition test (#27050) Boyuan Feng 2025-10-16 18:09:36 -07:00
4d4d6bad19 [Chore] Separate out vllm.utils.importlib (#27022) Cyrus Leung 2025-10-17 08:48:59 +08:00
11ae016bd7 [torch.compile] Passing only necessary compilation config to inductor pass config (#27041) Lucia Fang 2025-10-16 17:01:52 -07:00
41d3071918 [NVIDIA] [Perf] Update to leverage flashinfer trtllm FP4 MOE throughput kernel (#26714) jiahanc 2025-10-16 16:20:25 -07:00
fb5e10d3fb Refactor Transformers backend to use mixins (#26906) Harry Mellor 2025-10-16 22:50:39 +01:00
