Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

5406ebf5c9 [CI] Pooling models mteb test uses enforce_eager (#22878) wang.yuqi 2025-08-15 16:16:15 +08:00
b2c06509e5 [P/D]Provide bucket algorithm rate limiter for proxy_server (#22643) frankie 2025-08-15 15:01:48 +08:00
b2f6c247a9 Revert "[ROCm][AITER] Support AITER Rope ops in RotaryEmbedding Module." (#22956) TJian 2025-08-14 23:39:19 -07:00
3d232dbd19 [Mamba] - refactor: Renamed mamba_attn to mamba2_attn (#22818) Asaf Joseph Gardin 2025-08-15 09:38:05 +03:00
5c3fbfe46b [Feature] Full Cuda Graph Support for Cutlass MLA and 6% E2E Throughput Improvement (#22763) Wentao Ye 2025-08-15 02:27:30 -04:00
b4cef5e6c7 refactor: Change scaling factors calculation for flashinfer FusedMoE (#22812) amirkl94 2025-08-15 09:19:31 +03:00
0fe85087a9 [CI Perf] Prune tests in tests/kernels/attention/ (#22936) Michael Goin 2025-08-14 23:34:53 -04:00
d2b0e97ea6 [CI Perf] Prune tests in tests/kernels/moe/ (#22939) Michael Goin 2025-08-14 23:33:42 -04:00
590bddbfc5 [CI Perf] Prune tests in tests/kernels/quantization/ (#22942) Michael Goin 2025-08-14 23:25:34 -04:00
ae05a6d83d [BugFix] Fix port lookup in internal DP LB tests (#22252) Nick Hill 2025-08-14 20:17:11 -07:00
0933f9d518 [BugFix][KVConn] Fix use of get_required_kvcache_layout (#22734) Nick Hill 2025-08-14 18:39:43 -07:00
f1f0d2fab8 Revert "[Kernel] Add cuda kernel for gpt_oss activation" (#22948) Simon Mo 2025-08-14 17:38:10 -07:00
81f4b96481 [Kernel] Add cuda kernel for gpt_oss activation (#22538) Jee Jee Li 2025-08-15 08:21:29 +08:00
39cd09dc86 [Bugfix] use flash attn on sm90 (#22933) Yongye Zhu 2025-08-14 19:37:22 -04:00
919234fe17 [BugFix] Fix initial DP request load imbalance (#22910) Nick Hill 2025-08-14 15:20:28 -07:00
ebcce2cd36 [Core] Return final response for aborted requests from AsyncLLM.generate (#22283) Nick Hill 2025-08-14 14:49:02 -07:00
4121de512e [Quantization]: Support compressed-tensors mixed-precision model loading (#22468) Dipika Sikka 2025-08-14 17:32:09 -04:00
279a5f31b3 [Kernel] Add nvfp4 gemm flashinfer backends (#22346) nvjullin 2025-08-15 04:03:55 +08:00
b8ff05361a [CI] Temporarily disable flaky test (#22930) Lucas Wilkinson 2025-08-14 15:59:16 -04:00
637093ae26 docs: update fastsafetensors usage instructions (#22891) Nir 2025-08-14 22:56:54 +03:00
33c63e9547 [Kernel] [Quantization] Add MXFP4 and bias support for marlin kernel (#22428) Jinzhen Lin 2025-08-15 02:23:22 +08:00
ab9f2cfd19 [CI] [Hybrid] Bump min transformers version for Bamba and Jamba (#22908) Thomas Parnell 2025-08-14 20:01:16 +02:00
52c905a3d4 Merge branch 'vllm-project:main' into wye-refactor-quant-folder Wentao Ye 2025-08-14 11:12:23 -04:00
dbe298046c [Bugfix] Fix parsing of --disable-mm-preprocessor-cache (#22909) Cyrus Leung 2025-08-14 23:09:44 +08:00
625ccd1c4d [Bugfix] Replace custom Encoding class with BatchEncoding in MistralTokenizer (#22786) Jiangyun Zhu 2025-08-14 23:09:27 +08:00
92ff41abea [Model] Modify the gate implementation of glm4_moe (#22832) Jee Jee Li 2025-08-14 20:28:50 +08:00
829b9a62d0 [Perf] Dont create unnecessary pooling params (#22876) Lucas Wilkinson 2025-08-14 08:28:09 -04:00
540d54ca8d [CI] Re-enable transcriptions test_long_audio_request (#22890) Nicolò Lucchesi 2025-08-14 13:34:34 +02:00
0783f13960 [Doc] fix dead link (#22898) Daniele 2025-08-14 13:06:13 +02:00
7655dc3e45 [Bugfix] Add reset prefix cache for online serving (#22726) iAmir97 2025-08-14 18:04:18 +07:00
f4efda821d Remove Phi 4 Flash configuration workaround (#22723) Harry Mellor 2025-08-14 12:03:49 +01:00
eb08487b18 [BugFix] Threadsafe close async zmq sockets (#22877) Nick Hill 2025-08-14 03:44:29 -07:00
7c3a0741c6 [Bugfix] Fix PixtralHFImagePixelInputs dynamic shape check (#22827) Isotr0py 2025-08-14 17:35:43 +08:00
00e3f9da46 vLLM Benchmark suite improvement (#22119) Louie Tsai 2025-08-14 00:12:17 -07:00
a353bd083d [CI] remove flaky v0 test (#22864) Robert Shaw 2025-08-14 00:41:51 -04:00
1d20c34717 [CI] Fix tests/distributed/test_ca_buffer_sharing.py (#22849) Ilya Markov 2025-08-14 05:09:30 +02:00
b6af24fba7 [CI][Entrypoints]: add filter to generation to filter out invalid tool calls (#22826) Will Eaton 2025-08-13 23:09:07 -04:00
0ca2393b47 [CI/Build] Increase pooling tolerance to pass CI (#22844) Cyrus Leung 2025-08-14 06:52:48 +08:00
31a500c86f [Core] [N-gram SD Optimization][1/n] Propose tokens with a single KMP (#22437) Jialin Ouyang 2025-08-13 14:44:06 -07:00
4e8614e88b Move checklist in PR template (#22852) Luka Govedič 2025-08-13 17:38:35 -04:00
c6cd5ca3d3 [ROCm][Bugfix] Fix compilation error in topk softmax fused kernel (#22819) kliuae 2025-08-14 04:45:03 +08:00
df0e0f023e [CI/Build] Skip gpt_big model test because of broken HF model (#22848) Isotr0py 2025-08-14 04:36:28 +08:00
b4b78d6317 [CI/Build] Fix param mismatch in test_eagle_correctness (#22847) Cyrus Leung 2025-08-14 01:55:25 +08:00
12817a8ac7 [CI] Fix tests/v1/e2e/test_kv_sharing_fast_prefill.py import on test (#22815) Nicolò Lucchesi 2025-08-13 19:35:50 +02:00
c9232d41f4 [CI/Build] Update VLM common tests (#22841) Cyrus Leung 2025-08-14 01:03:05 +08:00
9bd9294f0e [Bugfix] Fix MiniCPMV Image input inference failed (#22813) HWH 2025-08-14 00:41:41 +08:00
e1b37e06b7 Merge branch 'vllm-project:main' into wye-refactor-quant-folder Wentao Ye 2025-08-13 10:53:20 -04:00
da2705198f [Misc] clear and separate error messages for input too long and input + max-tokens too long (#22803) Roger Wang 2025-08-13 07:22:56 -07:00
19b927e52d [Core] Use individual MM items in P0/P1 cache and model runner (#22570) Cyrus Leung 2025-08-13 22:18:07 +08:00
20d65aa755 [Frontend] Multithreaded async multimodal load_bytes (#22710) milesial 2025-08-13 06:09:26 -07:00
b159c0a67a Fix GGUF loader for Qwen3 MoE. (#22785) Gh0u1L5 2025-08-13 21:08:23 +08:00
6772bb0f7d Remove unnecessary CUDA sync of qwen image and video preprocess (#22792) Yuanyuan Chen 2025-08-13 21:07:28 +08:00
fceafaf582 [Bugfix][mamba] Fix type annotation of Mamba2Metadata (#22787) Chen Zhang 2025-08-13 06:07:09 -07:00
6b794c756c [Nixl][CI] Fix tests (#22806) Nicolò Lucchesi 2025-08-13 15:03:53 +02:00
98deac3879 [FEATURE] support custom vllm tuned config path for fused moe triton kernels (#22791) Chi Zhang 2025-08-13 20:27:25 +08:00
653124bd46 [Frontend] Add chunked processing to handle long inputs in embedding models (#22280) Kdump 2025-08-13 19:14:24 +08:00
0b1bdac6af [Platform] Custom ops support for FusedMoe (#22509) wangxiyuan 2025-08-13 19:12:00 +08:00
d94e3026de [V1] Add tree drafting tests for eagle spec decoding (#22705) Giancarlo Delfin 2025-08-13 04:11:28 -07:00
3f52738dce [Doc] Add max_lora_rank configuration guide (#22782) 633WHU 2025-08-13 19:10:07 +08:00
a01e0018b5 [Bugfix] Fix Nemotron VL image processing (#22739) Duc-Viet Hoang 2025-08-13 17:11:36 +07:00
9e7e5baaa8 [Model] Add missing prefix to glm4_1v (#22716) Yuxuan Zhang 2025-08-13 16:23:33 +08:00
d16aa3dae4 [Model] Add option to run Step3VisionEncoder in DP (#22697) zzh142857 2025-08-13 03:09:13 -04:00
6807af8f46 [gpt-oss] upgrade gpt-oss to v0.0.3 and add version check (#22768) Chen Zhang 2025-08-12 21:37:26 -07:00
4c558cf62e [Perf] Support topk softmax fused kernel for broader num_experts (#22211) shixianc 2025-08-12 21:34:47 -07:00
77a6bf07ae [Bug] Fix Unexpected Keyword Argument 'w1_bias' (#22757) Wentao Ye 2025-08-13 00:31:47 -04:00
4082338a25 Remove unneeded ROCm platform import when using CUDA (#22765) Michael Goin 2025-08-13 00:26:38 -04:00
c6b928798e Force TRTLLM attention for gpt-oss on SM100 (#22678) Michael Goin 2025-08-13 00:22:16 -04:00
b1361c7273 [Bugfix] Fix default enable for CUTLASS MLA on SM100 (#22738) Michael Goin 2025-08-13 00:22:05 -04:00
4f0f844b16 Fix cuda illegal mem access with Llama4 TP8 + rms_norm custom op (#22701) Po-Han Huang (NVIDIA) 2025-08-13 12:21:50 +08:00
c5830381af [V0 Deprecation] Remove args for multi-step scheduling (#22779) Woosuk Kwon 2025-08-12 20:38:18 -07:00
d31f97cf57 [Misc] Remove tests/multi_step/__init__.py (#22778) Woosuk Kwon 2025-08-12 20:21:18 -07:00
71683ca6f6 [V0 Deprecation] Remove multi-step scheduling (#22138) Woosuk Kwon 2025-08-12 20:18:39 -07:00
e18859298d Add hardware plugins to installation doc (#22732) Michael Goin 2025-08-12 20:14:46 -04:00
fde0b611a3 [Model] Decouple glm4v (#22751) Jee Jee Li 2025-08-13 08:13:17 +08:00
d0a6301588 Fix Transformers backend tensor parallel for multimodal models (#22673) Harry Mellor 2025-08-13 01:12:30 +01:00
45c3936e94 [Docs] Hide the navigation and toc sidebars on home page (#22749) Harry Mellor 2025-08-13 01:12:26 +01:00
ba81acbdc1 [Bugfix] Bump DeepGEMM Version to Fix SMXX Layout Issues (#22606) Frank Wang 2025-08-12 15:43:06 -07:00
53c730286c [Misc] parametrize 'dtype' in test_flash_mla (#22641) RUTHLESS-BOT 2025-08-13 04:31:48 +08:00
6534d2fc97 Fix torch version check for SM100 mxfp4 (#22535) zifeitong 2025-08-12 12:54:42 -07:00
422f22e012 [CI][Nixl] Check kv cache layout during handshake (#22745) Nicolò Lucchesi 2025-08-12 21:53:52 +02:00
6bd8ebf026 [Kernel][AMD] Avoid D2H copy and cumsum kernel (#22683) Xiaozhu Meng 2025-08-12 12:53:36 -07:00
66d491c494 Merge branch 'vllm-project:main' into wye-refactor-quant-folder Wentao Ye 2025-08-12 15:18:34 -04:00
dab4f9f764 [Chore] Update CODEOWNERS to include @yewentao256 for CUDA kernels, attention backends, quantization, and related tests (#22741) Wentao Ye 2025-08-12 12:50:31 -04:00
c42fe0b63a Add more test scenario for tensor schema (#22733) TeeKen Lau 2025-08-13 02:34:41 +10:00
5a4b4b3729 Add: SupportsEagle3 interface for explicit EAGLE3 support (#22642) Rahul Tuli 2025-08-12 21:54:52 +05:30
e5d3d63c42 [Benchmark] Fix terminal colors in benchmark_serving_multi_turn (python 3.12) (#22730) Daniel Serebrenik 2025-08-12 17:41:37 +03:00
3d9d40efde [Bugfix][CI] Fix test_remote_decode_lifecycle.py::test_short_prompt_lifecycle (#22727) Nicolò Lucchesi 2025-08-12 16:30:17 +02:00
67c153b88a Fix Llama4 FlashInfer FP4 MoE issues (#22511) Po-Han Huang (NVIDIA) 2025-08-12 20:50:59 +08:00
f7ad6a1eb3 [CI Failure] fix tests/entrypoints/openai/test_skip_tokenizer.py (#22708) wang.yuqi 2025-08-12 20:42:58 +08:00
80bb1e8afe Officially support SmolLM3 using the Transformers backend (#22665) Harry Mellor 2025-08-12 13:38:48 +01:00
d030b01548 [BugFix][Nixl][PD] Fix heterogenous TP (#22663) Nicolò Lucchesi 2025-08-12 14:37:30 +02:00
767e63b860 [Docs] Improve docs navigation (#22720) Harry Mellor 2025-08-12 12:25:55 +01:00
007dd90859 [gpt-oss] Enable gpt-oss on ampere (#22714) Yongye Zhu 2025-08-12 06:21:44 -04:00
b8a9d0e429 [Misc] remove GH discussions link (#22722) Jee Jee Li 2025-08-12 18:15:33 +08:00
50f2aae1b4 [LMCache][Example] Align the PYTHONHASHSEED for prefillers and decoders for KV chunks hashing (#21161) zejunchen-zejun 2025-08-12 17:05:14 +08:00
46ae7f6666 [Bugfix] Mamba2 SSD varlen bug fix initstates decay, improve test, assert chunk pwr 2 (#21783) RishiAstra 2025-08-12 05:04:37 -04:00
1ece7f30ba Fix: AWQ Marlin get_quant_method does not recognize "modules_to_not_convert" (#21888) Jun-Howie 2025-08-12 17:03:53 +08:00
bc8372efc3 [Bugfix] Fix erroneous randomly generated cases in bad word testing (#22170) phantomlei 2025-08-12 17:03:22 +08:00
8d17fa633e [V0] Correct CUDA Graph capture for encoder-decoder models (#22630) Sugar-zsg 2025-08-12 17:01:08 +08:00
9f909b8996 [New Model] Support Command-A-Vision (#22660) dongluw 2025-08-12 04:39:54 -04:00
