Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

6d0cf239c6 [CI/Build] Add Transformers nightly tests in CI (#20924) Isotr0py 2025-07-15 00:33:17 +08:00
3fc964433a [Misc] Clean up Aimv2 config registration in Ovis config (#20921) Isotr0py 2025-07-14 23:36:43 +08:00
0caf61c08a [CI] Update codeowner for compilation code (#20929) Lu Fang 2025-07-14 08:33:19 -07:00
667624659b [CI] cc folks on changes to vllm/compilation (#20925) Richard Zou 2025-07-14 10:52:17 -04:00
38efa28278 [Model] Add Ling implementation (#20680) ant-yy 2025-07-14 22:10:32 +08:00
e8cc53af5e [Misc] Log the reason for falling back to FlexAttention (#20699) Cyrus Leung 2025-07-14 19:16:51 +08:00
a4851cfe68 [Bugfix]: Fix messy code when using logprobs (#20910) Chauncey 2025-07-14 19:06:45 +08:00
9887e8ec50 [Misc] Remove unused function (#20909) Reid 2025-07-14 18:48:55 +08:00
f326ab9c88 [Bugfix] Bump up mistral_common to support v13 tokenizer (#20905) 22quinn 2025-07-14 03:45:03 -07:00
dcf2a5e208 [CI/Build] Fix OOM issue in Jina-VL test (#20907) Cyrus Leung 2025-07-14 18:32:35 +08:00
1e9438e0b0 [MISC] Move bind_kv_cache to worker module (#20900) wangxiyuan 2025-07-14 17:40:00 +08:00
697ef765ee [Refactor][V1] Move outlines utils for V1 imports (#20878) Aaron Pham 2025-07-14 03:58:35 -04:00
a99b9f7dee [Quantization] add BNB for MixtralForCausalLM (#20893) Jee Jee Li 2025-07-14 15:34:34 +08:00
c488b928a7 [ROCm] [Bugfix] [Critical]: Fix mamba compilation bug (#20883) TJian 2025-07-14 00:23:28 -07:00
2c7fa47161 Fix: Add missing EOFError handling in CLI complete command (#20896) Reid 2025-07-14 15:09:57 +08:00
88fc8a97e3 Removing redundant python version check (#20888) Daniel song 2025-07-14 02:15:05 -04:00
66f6fbd393 [Prefix Cache] Add reproducible prefix-cache block hashing using SHA-256 + CBOR (64bit) (#20511) Maroon Ayoub 2025-07-14 05:45:31 +03:00
8632e831ba [Core] Add update_config RPC method (#20095) 22quinn 2025-07-13 17:49:18 -07:00
4bbfc36b16 [V1] Hybrid allocator without prefix caching (#20661) nopperl 2025-07-14 01:55:14 +09:00
80d38b8ac8 [V1] [ROCm] [AITER] Upgrade AITER to commit 916bf3c and bugfix APIs (#20880) TJian 2025-07-13 08:19:32 -07:00
211b6a6113 [Bugfix] fix define of RerankDocument (#20877) Liuchenlong 2025-07-13 22:32:40 +08:00
247102f07f [Bugfix] Fix: add patch_rope_scaling after hf override (#20857) Wang Siyuan 2025-07-13 15:13:25 +08:00
bd4c1e6fdb Support for LlamaForSequenceClassification (#20807) Minkyu Kim 2025-07-13 16:09:34 +09:00
99b4f080d8 Renable google/gemma-3-1b-it accuracy test. (#20866) QiliangCui 2025-07-12 21:48:56 -07:00
020f58abcd [Core] Support multiple tasks per model (#20771) Nicolò Lucchesi 2025-07-13 04:40:11 +02:00
c1acd6d7d4 [Refactor] Change the way of import triton (#20774) Wentao Ye 2025-07-12 22:39:55 -04:00
3b3b778d4a [Bugfix] Fix a couple PPLX+CUTLASS MoE bugs (#20825) ElizaWszola 2025-07-13 04:39:14 +02:00
42d440c22b [Perf] Use Triton instead of Torch for DeepGEMM Per Token Group Quant (#20841) Wentao Ye 2025-07-12 22:38:45 -04:00
f45a332886 [Sched] Enhance the logic to remove stopped requests from queues (#20739) Woosuk Kwon 2025-07-12 15:33:13 -07:00
6e2c176e1f [Bugfix] Restrict Machete to only run on Hopper (#20830) Michael Goin 2025-07-13 02:34:40 +09:00
a86754a12b [docs] convert supported configs to table (#20858) Reid 2025-07-12 21:54:50 +08:00
c2a2f19aba [Bugfix] Fix Tensor Parallelism Padding Consistency in Granite Models (#20843) Alex Brooks 2025-07-12 07:11:30 -06:00
2c11a738b3 [Model] New model support for microsoft/Phi-4-mini-flash-reasoning (#20702) Congcong Chen 2025-07-12 06:02:10 -07:00
b639327ad9 Revert "Use NVCC --compress-mode to reduce binary size by 30% #20694" (#20853) Michael Goin 2025-07-12 15:07:35 +09:00
4afe687a82 Enable ModelOpt Llama4 fp8 checkpoint deployment (#20419) Zhiyu 2025-07-11 23:07:16 -07:00
5de8d9f111 Remove extra tensor on CPU (#20693) Maximilien de Bayser 2025-07-12 03:06:34 -03:00
c1c8ca57ff [cold start time] add envs.VLLM_COMPILE_DEPYF to guard decompile (#20790) Boyuan Feng 2025-07-11 23:06:13 -07:00
a3a5a47e48 [Bugfix] Fix torch.compile x LoRA for PyTorch 2.8 (#20823) Richard Zou 2025-07-12 02:06:04 -04:00
fb25e95688 [Docs] Update basic.md (#20846) Lucia Fang 2025-07-12 14:05:32 +08:00
0d4891cd03 [Bug] Fix DeepGemm for EP low latency case (#20833) Wentao Ye 2025-07-12 02:05:12 -04:00
f56d2996ca [Misc] Respect no_use_tqdm_on_load flag while capturing CUDA graph (#20834) lkchen 2025-07-11 23:04:45 -07:00
147afb448b [Bugfix] Replace unavailable video url in multimodal test (#20854) Isotr0py 2025-07-12 13:25:39 +08:00
3c7d942da8 [Frontend] Abstract prompt and SpeechToTextConfig for transcriptions models (#20637) Nicolò Lucchesi 2025-07-12 06:33:26 +02:00
890323dc1b [Bugfix] : Fix typo - logger.warn_once -> logger.warning_once (#20852) Varun Sundar Rabindranath 2025-07-12 07:56:24 +04:00
01cae37713 [CI/Build] Ensure compatability with Transformers v4.53 (#20541) Isotr0py 2025-07-12 11:53:07 +08:00
11c0198615 [Bugfix] Fix tensor parallel issue in Qwen3 reranker weight loading (#20682) yurhett 2025-07-12 11:52:43 +08:00
b1235c3e10 [Bugfix] Lazy import fused_experts in BitsAndBytesMoEMethod to avoid break not-cuda-alike devices (#20822) Li, Jiang 2025-07-12 11:52:05 +08:00
44d02f54db [Misc] Restrict deep_gemm's log output (#20827) Jee Jee Li 2025-07-12 11:50:42 +08:00
a8593237c0 Add pynccl all-gatherv and reducescatterv (#20154) Trevor Morris 2025-07-11 18:59:23 -07:00
fc0f41d10a Integration SM100 FlashInfer fused allreduce RMSNorm (#20691) Ilya Markov 2025-07-12 03:58:15 +02:00
7b828e30d5 [CI Bug] Fix Async Engine, Inputs, Utils, Worker Test: 'State' object has no attribute 'enable_server_load_tracking' (#20845) Wentao Ye 2025-07-11 21:57:24 -04:00
5f0af36af5 Update kimi-k2 tool calling docs, enable unit tests (#20821) bigmoyan 2025-07-12 04:16:14 +08:00
0d21b2664c [Bugfix] Fix OOM in language generation test (#20814) Isotr0py 2025-07-12 02:21:52 +08:00
9907fc4494 [Docs] Data Parallel deployment documentation (#20768) Nick Hill 2025-07-11 17:42:10 +01:00
d47661f0cd [Kernel] Basic tuned configs for NVFP4 CUTLASS dense GEMM (#20646) Michael Goin 2025-07-12 01:05:33 +09:00
53fa457391 [Misc] Add unit tests for MoE ModularKernel combinations + Profiling utility (#20449) Varun Sundar Rabindranath 2025-07-11 10:51:46 -04:00
6fb162447b [doc] fix ordered list issue (#20819) Reid 2025-07-11 21:49:46 +08:00
66177189c5 [Bugfix] Add missing field to TritonLanguagePlaceholder (#20812) Li, Jiang 2025-07-11 20:25:11 +08:00
b4f0b5f9aa Temporarily suspend google/gemma-3-1b-it. (#20722) QiliangCui 2025-07-11 04:21:26 -07:00
cbd14ed561 [Bugfix] Refactor /invocations to be task-agnostic (#20764) Cyrus Leung 2025-07-11 18:20:54 +08:00
7bd4c37ae7 [Core] Add Flashinfer TRTLLM Backend for Flashinfer decode path (SM100). (#19825) Pavani Majety 2025-07-11 02:23:23 -07:00
8020e98c9f [Quantization][1/N] MoE support BNB-Inflight Quantization (#20061) Jee Jee Li 2025-07-11 16:01:13 +08:00
762be26a8e [Bugfix] Upgrade depyf to 0.19 and streamline custom pass logging (#20777) Luka Govedič 2025-07-11 03:15:22 -04:00
6a9e6b2abf [doc] fold long code block (#20795) Reid 2025-07-11 14:16:41 +08:00
5d09152ff1 [V1] Enable Mamba2 layers other than MambaMixer2 in the v1 engine (#20660) nopperl 2025-07-11 14:53:31 +09:00
31d5c1797f [Perf][fp8] Use CustomOp abstraction for fp8 quant for better perf (#19830) Luka Govedič 2025-07-11 00:56:28 -04:00
35514b682a [XPU] XCCL support enabled in torch 2.8.0.dev nightly builds (#20705) Ratnam Parikh 2025-07-10 20:39:52 -07:00
e2de455c34 [Feature] Integrate SM100 DeepGEMM support (#20087) Wentao Ye 2025-07-10 23:18:05 -04:00
5b032352cc [Attention] MLA - Flashinfer Ragged Prefill (#20034) Alexander Matveev 2025-07-10 23:17:47 -04:00
922f316441 [Model] Support HF format of minimax (#20211) Michael Goin 2025-07-11 11:55:21 +09:00
5923ab9524 [fix]: disable cutlass block scaled group gemm for EP (#20781) Duncan Moss 2025-07-10 19:39:18 -07:00
0cf893cae1 Add kimi-k2 tool parser (#20789) bigmoyan 2025-07-11 10:36:23 +08:00
cf75cd2098 [CI Bugfix] Specify same TORCH_CUDA_ARCH_LIST for flashinfer aot and install (#20772) Michael Goin 2025-07-11 10:16:01 +09:00
b854321ffe [Docs] Lazy import gguf (#20785) Simon Mo 2025-07-10 16:06:37 -07:00
5b6fe23d05 [Bugfix][Benchmark] Make sure the output length > 0 when testing prefill workload. (#20786) Kuntai Du 2025-07-10 14:52:46 -07:00
f0c98cae27 [Misc] MoE ModularKernel : Introduce TopKWeightAndReduce (#20648) Varun Sundar Rabindranath 2025-07-10 17:40:38 -04:00
574ad60db9 [KVConnector] Always call connector clear_metadata() at end of step (#20756) Nick Hill 2025-07-10 22:37:27 +01:00
fdadb6f43a [Bugfix] Fused MoE Modular Kernel chunking loop (#20392) Varun Sundar Rabindranath 2025-07-10 16:31:10 -04:00
41060c6e08 [Core] Add Support for Default Modality Specific LoRAs [generate / chat completions] (#19126) Alex Brooks 2025-07-10 14:09:37 -06:00
3de2ed767f [Bugfix] Remove assertion of expert_map being None (#20714) Ming Yang 2025-07-10 12:55:22 -07:00
299252ea82 [CI] Fix pre commit issue (#20782) Wentao Ye 2025-07-10 15:48:13 -04:00
d6902ce79f [V0][V1][Core] Add outlines integration for V1, and update V0 integration. (#15975) Nathan Hoos 2025-07-10 14:30:26 -05:00
5e53c89a74 [Bugfix] [CI] Fix Tensorizer LoRA test (#20760) Sanger Steel 2025-07-10 15:07:06 -04:00
c66e38ea4c [Test] Remove docker build from test. (#20542) QiliangCui 2025-07-10 11:21:58 -07:00
251595368f Fix DeepSeek-R1-0528 chat template (#20717) sfbemerk 2025-07-10 19:47:36 +02:00
4bed167768 [Model][VLM] Support JinaVL Reranker (#20260) shineran96 2025-07-11 01:43:43 +08:00
b140416abf [Model] Add reason parser for Hunyuan A13B Model. (#20625) Asher 2025-07-11 00:33:26 +08:00
5b8366b61a [ROCm][Regression] Remove tensor creation that harms performance on ROCm (#20741) Gregory Shtrasberg 2025-07-10 12:22:23 -04:00
c7753a9809 [Hardware][CPU] Vllm int8 quantization enablement for ARM CPU (#14129) nishith-fujitsu 2025-07-10 21:29:04 +05:30
4b9a9435bb Update Dockerfile FlashInfer to v0.2.8rc1 (#20718) Michael Goin 2025-07-11 00:09:02 +09:00
3482fd7e4e [Doc] Add engine args back in to the docs (#20674) Harry Mellor 2025-07-10 16:02:40 +01:00
77f77a951e [Misc] Clean up mark to fork process in BNB tests (#20692) Isotr0py 2025-07-10 21:59:40 +08:00
1a4f35e2ea Normalize lm-eval command between baseline and correctness test (#18560) Michael Goin 2025-07-10 22:27:32 +09:00
be1e128dfb [CI Bugfix] Skip failing Tensorizer+LoRA test (#20724) Michael Goin 2025-07-10 21:15:03 +09:00
65393ee064 [doc] fix ordered list (#20749) Reid 2025-07-10 18:13:52 +08:00
dc221ad72d [Bugfix][Build][Non-CUDA] Only referencing CMAKE_CUDA_COMPILER_VERSION on CUDA where it is defined (#20738) Gregory Shtrasberg 2025-07-10 05:58:11 -04:00
7571a4a7e5 [CI/Build] Fix Basic Models Test (#20728) Jee Jee Li 2025-07-10 17:57:19 +08:00
f67d986dd1 [Misc] loose new-model tagger conditions (#20747) Isotr0py 2025-07-10 17:54:47 +08:00
cc876d0f29 [KVConnector] Aggregate finished requests on the scheduler (#19555) Or Ozeri 2025-07-10 11:22:18 +03:00
fdfd409f8f [TPU][Core]Make load weight exceed hbm error more instructive for customers (#20644) Chenyaaang 2025-07-10 00:01:17 -07:00
