Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

98b4d389ed [Redo] #26368 (#28771) Cyrus Leung 2025-11-15 14:47:41 +08:00
6965ef436f [Performance][DeepGEMM] Estimate expected_m (#28694) Varun Sundar Rabindranath 2025-11-15 00:52:14 -05:00
c9e665852a [NIXL] heterogeneous block_size support (#26759) Chendi.Xue 2025-11-14 23:51:32 -06:00
363aaeef0f Fix IntermediateTensors initialization and add type hints (#28743) Mohammad Othman 2025-11-15 06:31:36 +02:00
ac86bff8cb Revert "[Core] Performance: Use list[np.ndarray] instead of list[list… (#28773) Nick Hill 2025-11-14 20:24:00 -08:00
edfe498189 [Bugfix] Build hadacore kernels on >SM90 (#28748) Michael Goin 2025-11-14 22:51:05 -05:00
f05d474c8a [Model][Qwen3VL] Use mm_position to compute mrope positions (#28730) Lukas Geiger 2025-11-15 03:45:11 +00:00
9fc81ec765 [TPU] Fix import error in tpu launch (#28758) QiliangCui 2025-11-14 16:58:32 -08:00
186352b270 [Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization (#26368) Jialin Ouyang 2025-11-14 16:04:04 -08:00
58e61e56b7 [Test] Rework e2e async scheduling tests (#28744) Nick Hill 2025-11-14 16:01:09 -08:00
75f01b9d3c [ROCm][CI/Build] Upgrade to ROCm 7.1 and AITER main (#28753) Gregory Shtrasberg 2025-11-14 18:53:21 -05:00
ba041d980b [Log] Save profiler results to file instead of stdout (#28144) rasmith 2025-11-14 17:26:39 -06:00
e0c910bb89 [Hybrid] [Kernel] Fix chunk scan kernel when BLOCK_SIZE_DSTATE > 128 (#28295) Thomas Parnell 2025-11-14 23:55:42 +01:00
bf3ffb61e6 [Bugfix] Fix ChunkedLocalAttention CUDA Graph setting (#28739) Benjamin Chislett 2025-11-14 17:14:46 -05:00
e5c78956c0 [Bugfix] Fix incorrect use of hidden_states for shared_experts due to do_naive_dispatch_combine (#28740) Alexander Matveev 2025-11-14 17:13:46 -05:00
2e0ad629b0 Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch (#25110) Laith Sakka 2025-11-14 14:11:10 -08:00
5a84b76b86 [ROCm][CI/Build] Change install location of uv (#28741) Gregory Shtrasberg 2025-11-14 16:34:18 -05:00
0de4f217ab [Bugfix] TypeError: 'NoneType' object is not callable (#27410) Marcin Ostrowski 2025-11-14 22:13:53 +01:00
f08eab2acc [CI] Fix macos smoke test uv cache issue (#28736) Michael Goin 2025-11-14 15:29:55 -05:00
8977ffb5e6 [ROCm][Bugfix] Fix compilation errors with fused_qknorm_rope_kernel.cu (#28682) Sage Moore 2025-11-14 11:06:01 -08:00
fd4555089a [BugFix] Fix misprint introduced by modular_kernel refactoring. (#28728) Andrey Khalyavin 2025-11-14 21:58:18 +03:00
cec275efce [Bugfix] resolve Qwen3-VL GPTQModel quantized model loading failure (#28663) GuanH 2025-11-15 02:44:27 +08:00
e2741f6cbc [Chore] Rename SchedulerConfig.chunked_prefill_enabled (#28735) Cyrus Leung 2025-11-15 02:39:57 +08:00
67187554dd [Docs] Enable some more markdown lint rules for the docs (#28731) Harry Mellor 2025-11-14 18:39:19 +00:00
a425dc256e [Bugfix] [ROCm] [AITER]: Fix aiter block quant not compatible with torch compile dynamo (#28716) TJian 2025-11-14 10:30:50 -08:00
964d65deed LLaMA4 LoRA Adapter Enablement (#28602) Fardin Hoque 2025-11-14 10:27:56 -08:00
9261eb3dc1 docs(lora_resolvers): clarify multi-resolver order and storage path requirement (#28153) Chen Wang 2025-11-14 13:08:30 -05:00
cdd7025961 [kernel] Improve FP8 PTPC on Hopper for larger shapes (#28692) czhu-cohere 2025-11-14 12:59:11 -05:00
085424808e Remove audio optional dependency for mistral-common (#28722) Julien Denize 2025-11-14 18:54:38 +01:00
a17e36f223 Fix typo in comment: existance -> existence (#28737) Mohammad Othman 2025-11-14 19:35:45 +02:00
8cc40f8992 [Attention] Bump FA for removed method (#28429) Matthew Bonanni 2025-11-14 12:13:37 -05:00
6f1e7f7226 [DisaggEverything] Tokens in<>out /generate endpoint (#24261) Nicolò Lucchesi 2025-11-14 17:58:01 +01:00
d54a18a47e [CI][CPU] Smoke test for Apple Silicon using GHA MacOS runner (#28688) Michael Goin 2025-11-14 11:37:18 -05:00
5f3cd7f7f2 [Docs] Update the name of Transformers backend -> Transformers modeling backend (#28725) Harry Mellor 2025-11-14 16:34:14 +00:00
c934caee88 [Fix] improve aspect ratio in dummy image generation and add common VLM tests for PaddleOCR-VL (#28711) dongbo910220 2025-11-15 00:07:20 +08:00
3f8a874065 [Kernels] Enable FlashInfer FP8 Blockscale on SM90 (for TEP DSR1) (#27134) Duncan Moss 2025-11-14 08:02:44 -08:00
511a6b611d [Config] Clean up SchedulerConfig initialization (#28665) Cyrus Leung 2025-11-14 22:41:02 +08:00
96b23b8e3b [Bugfix][Nixl] Fix kernel physical<>logical block_size issue (#28677) Nicolò Lucchesi 2025-11-14 15:40:05 +01:00
433c0f8675 [Model] Fix bailing_moe accuracy problem (#28277) zhaozx-cn 2025-11-14 21:33:02 +08:00
8d3748d3c7 [Doc] Fix macOS installation dependency resolution issue (#26721) Fasal Shah 2025-11-14 18:13:56 +05:30
db56a59970 [BugFix] Fix FA3 IMA with FULL_AND_PIECEWISE and cascade attention (default) (#28702) Lucas Wilkinson 2025-11-14 07:19:22 -05:00
9324e10275 Fix KV sharing fast prefill with cudagraph enabled (#28537) Yong Hoon Shin 2025-11-14 01:53:42 -10:00
4516d44b7f [DCP] Support Decode Context Parallel (DCP) for GQA with Flashinfer (#25438) Jingchun Gao 2025-11-14 19:24:10 +08:00
41b92f7d38 [Model][MM] Extract conv layer as CustomOp (#28455) Shanshan Shen 2025-11-14 19:16:13 +08:00
360bd8762f [Frontend] Added chat-style multimodal support to /classify. (#27516) Srreyansh Sethi 2025-11-14 03:03:55 -08:00
ecf8230d4d [Metrics] Log number of preempted requests (#28522) lyn610 2025-11-14 17:47:45 +08:00
8cfbe89b93 [Misc] fix comment in test_envs (#28529) Xing Liu 2025-11-14 01:32:46 -08:00
fd75d3e8c0 [Minor] avoid register new custom and just import silly_attn (#28578) Boyuan Feng 2025-11-14 01:32:31 -08:00
c9a3a02149 Add output token counting to gsm8k eval (#28594) Michael Goin 2025-11-14 04:32:03 -05:00
bc3e43069a [BugFix] Fix multi-modal async scheduling race condition (#28706) Nick Hill 2025-11-14 01:11:13 -08:00
c36bcfe6b3 [Bugfix] fix dots.ocr pp support (#28705) Jiangyun Zhu 2025-11-14 17:01:26 +08:00
529cea343d use default CCL_ZE_IPC_EXCHANGE (#28700) Yan Ma 2025-11-14 16:55:29 +08:00
93103575ce [BugFix][CI/Build][ROCM] Fix import error and apply assert in appropriate case in test_struct_output_generate (#28311) rasmith 2025-11-14 00:41:29 -06:00
15ae8e0784 [Bugfix][CI/Test][Spec Decode] Fix illegal memory access in offline_inference/spec_decode.py (Issue 27619) (#28432) rasmith 2025-11-14 00:34:01 -06:00
0b25498990 [Misc] add ignore mapper for quark quantization (#28275) haoyangli-amd 2025-11-14 13:56:35 +08:00
0aecd9138f [Misc] Update xformers to 0.33.0.post1 (#28678) Roger Wang 2025-11-13 21:52:53 -08:00
da14ae0fad [XPU][CI]disable lm cache uts (#28696) Kunshang Ji 2025-11-14 11:15:50 +08:00
01bea115c4 [Misc] Remove warn_for_unimplemented_methods (#28613) Cyrus Leung 2025-11-14 11:10:10 +08:00
b39a5026eb [ci][amd] fix basic models extra init test (#28676) Bradley D 2025-11-13 18:44:36 -08:00
622e6106a9 [CPU][Bugfix] Fix Apple Silicon M1 compilation failure (#28681) Michael Goin 2025-11-13 20:49:55 -05:00
2aa75c752b [ROCm] Bump up the version of amd-smi to 6.4.3 (#28680) Sage Moore 2025-11-13 17:24:28 -08:00
4d5943bda6 [quantization][config] enable override existing quant_config (#28510) Hank_ 2025-11-14 09:24:10 +08:00
f2b8e1c551 Mirrored test group definitions for AMD (2025-11-11) (#28573) Alexei-V-Ivanov-AMD 2025-11-13 18:16:34 -06:00
6e25b1cddf [KV Connector] Test async mode in scheduler tests (#28550) Mark McLoughlin 2025-11-13 23:30:59 +00:00
e64011f29a [CI] Bug: Fix ci entrypoint pooling (#28684) Wentao Ye 2025-11-13 17:19:35 -05:00
1b622deba7 [Misc] Update CODEOWNERS for simon-mo and comaniac (#28675) Simon Mo 2025-11-13 13:01:43 -08:00
faed7bf07e [Bugfix] [CPU] bump torch to 2.9.0 for Darwin to fix segmentation fault (#27791) Kebe 2025-11-14 05:48:08 +09:00
262d263f6c [Bugfix] Eliminate tuple inputs to submodules in graph partitioning (#28533) Yanan Cao 2025-11-13 12:09:05 -08:00
968060c15a [bugfix] correct local_chunk_len for DCP in reorg_kvcache with long context (#28526) Qiu 2025-11-14 03:29:22 +08:00
5d6ce2b960 [Perf] Support stream interval for reducing host overhead (#27869) elvischenv 2025-11-14 02:21:25 +08:00
f9f3b596f3 [Attention][Bugfix] Fix FA sink support (#28660) Matthew Bonanni 2025-11-13 12:20:01 -06:00
119c4927b3 [Bugfix] Fix validate model input for decoder models (#27099) Yannick Schnider 2025-11-13 19:18:47 +01:00
fe1cd7704d [Performance][B200] silu_mul_quant: pack scales in int32 (#28358) Varun Sundar Rabindranath 2025-11-13 13:16:55 -05:00
fdfd5075aa [TPU] patch TPU wheel build script to resolve metadata issue (#27279) Johnny Yang 2025-11-13 09:36:54 -08:00
327c0a9a23 [BugFix] Ensure EngineArgs.create_engine_config is idempotent (#28515) Nick Hill 2025-11-13 09:14:08 -08:00
06c4873d95 Rewrite C++ meta funcs to Python (#28595) Jane (Yuan) Xu 2025-11-13 11:52:50 -05:00
d3387750f1 [Misc] Turn off encoder torch compile by default (#28634) Roger Wang 2025-11-13 08:38:08 -08:00
b230286fbc Fix get_num_experts when config sets it explicitly to None (#28652) Harry Mellor 2025-11-13 16:02:42 +00:00
3035d1a166 [BugFix] DeepSeek-OCR: apply NoRepeatNGramLogitsProcessor to greedy path (#28617) Yuanping Song 2025-11-13 10:24:35 -05:00
07a606aa7e [CI Failure] Fix backend selection for encoder-only models (#28534) Huamin Li 2025-11-13 07:11:27 -08:00
a7791eac9d [CI/Build] Install uv for AMD MI300: Language Models Tests (Hybrid) %N (#28142) amdfaa 2025-11-13 09:34:55 -05:00
8da2f28f53 [ROCm][BugFix]Fix get_cu_count in rocm_aiter_fa.py (#28618) Pleaplusone 2025-11-13 22:18:20 +08:00
86d15bfd8d [Hardware][PowerPC] Fix fp16 compilation error for Power in cpu attention backend and bump oneDNN version (#28535) Akash kaothalkar 2025-11-13 19:02:21 +05:30
c9fe6abe7c [Bugfix] Fix FPS value type for Qwen2.5-Omni video processing (#28630) Fanli Lin 2025-11-13 21:06:06 +08:00
c47b6c85ac [XPU] add sym params to IPEXConfig (#28611) zofia 2025-11-13 19:35:04 +08:00
c428e8d80b Fix io processor pooling #28273 (#28484) baonudesifeizhai 2025-11-13 06:34:14 -05:00
5e973209aa [BugFix] Fix type error when assign a trition kernel tensor to a torch.nn.Parameter (#28603) Zijing Liu 2025-11-13 03:30:04 -08:00
e63fd44560 Fix: Correctly filter special tokens in benchmark_prefix_caching (#28615) Di Wu 2025-11-13 18:57:44 +08:00
11ac9ddd03 Support all interleaved layer types (#28485) Yong Hoon Shin 2025-11-12 22:57:20 -10:00
5c9ad138d5 [Frontend] supports interleaved thinking (#28531) Chauncey 2025-11-13 16:14:13 +08:00
fa183e9271 [Bugfix] fix kimi-linear crash (#28445) Jiangyun Zhu 2025-11-13 15:59:58 +08:00
4ab34f6ef1 Add NUMA node validation for CPU thread binding (#28555) usberkeley 2025-11-13 15:03:52 +08:00
c33b87e777 Use official xformers-0.0.33 built for PT 2.9 (#28600) Huy Do 2025-11-12 22:48:53 -08:00
4504e8029b [Bugfix] Prevent crash on empty grammar string (#28210) tjandy98 2025-11-13 14:42:29 +08:00
ca00b1bfc6 [ROCm][BugFix] Remove the usage of device_info from aiter (#28383) Pleaplusone 2025-11-13 13:43:42 +08:00
d44fbbab0e [build][cmake]: Bundle static ACL and torch libgomp for CPU extension builds (#28059) Radu Salavat 2025-11-12 21:43:08 -08:00
7e082bc14e Support DeepEP for Kimi-k2-thinking through enabling gemm selection for compressed-tensor marlin wna16 (#28574) Lucia Fang 2025-11-12 21:40:45 -08:00
dbbe0c756a [XPU] Support Triton path for LoRA operations on XPU (#28511) Fanli Lin 2025-11-13 13:31:42 +08:00
7dca0c90cb [BugFix][ROCm] Fix get_cu_count missing variable error (#28608) Pleaplusone 2025-11-13 13:18:56 +08:00
1a0b157a2e [Frontend][responsesAPI][1/n] convert responses API tool input to chat completions tool format (#28231) Andrew Xia 2025-11-12 20:47:22 -08:00
