Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

17af6aa0da [Document] Add ms-swift library to rlhf.md (#27469) jinghanhu 2025-10-25 04:31:50 +08:00
fc168c33f3 [CI/Build] Fix test_torch_utils in AMD CI (#27317) Zhewen Li 2025-10-24 12:26:00 -07:00
acc78aeb88 [Bugfix] Fix interns1-vit qk norm code path (#27480) Isotr0py 2025-10-25 01:43:45 +08:00
0f67d4d962 [Attention] Add MLA prefill backend: trtllm_ragged_attention_deepseek (#26397) Ming Yang 2025-10-24 10:24:08 -07:00
7e1d697b56 [Bugfix] Fix MultiConnector stats reconstruction across process boundaries (#27366) kourosh hakhamaneshi 2025-10-24 10:08:05 -07:00
699d62e6cf [NIXL][BUGFIX] delay done_recving queue cleanup to bottom of get_finished (#27297) Chendi.Xue 2025-10-24 12:01:41 -05:00
cd390b609d [compile] Turn standalone_compile back on (#27460) Richard Zou 2025-10-24 09:30:27 -07:00
2080b05099 [cpu][fix] Fix onednn_mm crash on consecutive matmuls with same M,K,N and different dtype (#27472) Fadi Arafeh 2025-10-24 16:57:48 +01:00
6454afec90 [Doc] Fix minor issues in docs/design/metrics.md (#27436) Lifans 2025-10-24 05:40:54 -07:00
41a62564a7 Fix test named tool use (#27458) Chauncey 2025-10-24 20:27:45 +08:00
284cc92275 [MISC] cudagraph_capture_sizes related improvements (#26016) fhl2000 2025-10-24 20:11:05 +08:00
435be10db9 Fix AArch64 CPU Docker pipeline (#27331) ioana ghiban 2025-10-24 14:11:01 +02:00
b7030d962b [Benchmark] Enable benchmark to run with encoding_format="bytes" (#27467) Cyrus Leung 2025-10-24 19:16:50 +08:00
3567816932 [Refactor] move tool parsing logic from protocol.py to the tool parser (#27383) Chauncey 2025-10-24 17:53:23 +08:00
e0ef8a2920 [BugFix] Fix torchrun DP with LLM class (#27395) 22quinn 2025-10-24 01:11:37 -07:00
42efe609ba [MM][Bugfix] Replace PatchEmbed's conv3d to linear layer (#27418) Isotr0py 2025-10-24 15:32:47 +08:00
88d3141ec6 [Docs] remove v1 column for embedding models (#27446) Yu Jiaqi 2025-10-24 14:55:03 +08:00
09a6a49eaf [Misc] Avoid "PyTorch non-writable tensors" warning in RayPPCommunicator (#27443) Rui Qiao 2025-10-23 23:53:09 -07:00
074475541a [Bugfix] Fix Pydantic union resolution for ResponseFunctionToolCall in Responses API (#26706) strinczer 2025-10-24 06:53:42 +01:00
d4c574c39f [Chore] remove structural tags logging lines (#27451) Aaron Pham 2025-10-24 01:35:45 -04:00
c528b9006a Fix EventPublisherFactory logic for disabled KV cache events (#27419) usberkeley 2025-10-24 13:00:01 +08:00
85fee74b33 [Bugfix][CI] Move resolving cudagraph_mode before initializing attn_metadata_builder (#27427) fhl2000 2025-10-24 11:31:14 +08:00
8dbe0c527f [Misc] Add TPU usage report when using tpu_inference. (#27423) hfan 2025-10-23 23:29:37 -04:00
5cc6bddb6e [Kernel] Add GPTQv2 format support for low-bit or asymmetric quantization, by adapting gptq_gemm (#26092) Xiangyu Li 2025-10-24 11:26:13 +08:00
1f9460c4c1 Fix pooling adapters for Transformers backend (#27338) Harry Mellor 2025-10-24 04:23:55 +01:00
70022ffc00 Granite 4.0 quark quantization support (#26944) xiao-llm 2025-10-23 22:14:03 -04:00
f417746ad7 [Hardware][POWERPC] Disable oneDNN path in vllm/model_executor/layers/utils.py for Powerpc (#27422) Akash kaothalkar 2025-10-24 02:51:36 +05:30
0552cfb195 [Model] Siglip Embedding Support (#27324) Yu Jiaqi 2025-10-24 04:19:48 +08:00
51dd14ac2b [Bugfix][DP] Fix creating too many DP Placement Groups (#26880) Kebe 2025-10-24 05:16:51 +09:00
dbfbf9f324 [Attention] Fix FlashMLA metadata builder arguments for q_len > 1 (#27368) Matthew Bonanni 2025-10-23 15:58:15 -04:00
ca76486a16 [Chore] Separate out vllm.utils.platform_utils.py (#27374) Jonathan Chen 2025-10-23 15:08:06 -04:00
a9f55dc588 [Misc] Add triton_kernels dependency (#27370) Varun Sundar Rabindranath 2025-10-23 15:04:14 -04:00
81d5bb765a [Bugfix] Fix AWQ marlin layer skipping (#27416) Isotr0py 2025-10-24 02:30:28 +08:00
0825197bee [Bugfix][ROCm][DeepSeek] Fix for forward_hip in rope for DeepSeek (#27373) Gregory Shtrasberg 2025-10-23 13:43:53 -04:00
9ef3d5b875 [Bugfix] Fix dp_chunking enablement logic in FusedMoE layer (#27220) Alexander Matveev 2025-10-23 12:03:14 -04:00
295c7f0267 Mirroring the test definitions (2025-10-22) (#27362) Alexei-V-Ivanov-AMD 2025-10-23 11:02:26 -05:00
3fa2c12185 [Frontend][4/N] Improve all pooling task | Add plugin pooling task (#26973) wang.yuqi 2025-10-23 22:46:18 +08:00
fe2016de2d [CI/Build] Remove unnecessary flags from test registry (#27353) Cyrus Leung 2025-10-23 22:42:40 +08:00
237cf6d32a [Misc] Remove use of CUDA_VISIBLE_DEVICES for device selection (fix DP slow startup time &c) (#26709) Ilya Markov 2025-10-23 14:58:39 +02:00
faee3ccdc2 [Feature] Pydantic validation for speculative.py (#27156) Navya Srivastava 2025-10-23 05:19:33 -07:00
570c3e1cd4 [Bugfix] Honor --mm_encoder_attn_backend when used (#27124) Bradley D 2025-10-23 05:09:52 -07:00
3a4255c7c4 Run mypy on the lowest supported Python version instead of system Python (#27048) Harry Mellor 2025-10-23 13:07:44 +01:00
61089465a6 [Model] Add MoE support for NemotronH (#25863) tomeras91 2025-10-23 13:27:23 +03:00
88afa11010 [Metrics] [KVConnector] Add connector prefix cache hit rate stats (#26245) Tova Movshovitz 2025-10-23 13:21:08 +03:00
d00ce29d89 [CI] Reorganize entrypoints tests (#27403) Chauncey 2025-10-23 18:10:06 +08:00
3b7bdf983b add SLA information into comparison graph for vLLM Benchmark Suite (#25525) Louie Tsai 2025-10-23 01:04:59 -07:00
50b788a17a [CI/Build] Fix AMD CI: test_cpu_gpu.py (#27388) Zhewen Li 2025-10-23 00:55:00 -07:00
fc059c7061 [Bugfix] Fix args settings for guided decoding args (#27375) Lucia Fang 2025-10-23 00:34:06 -07:00
bfb240cc49 [CI/Build] Fix Prithvi plugin test (#27393) Cyrus Leung 2025-10-23 15:30:44 +08:00
e255d92990 [Chore] Remove duplicate has_ functions in vllm.utils (#27372) Jonathan Chen 2025-10-23 02:11:59 -04:00
3729ed00ba [Model] Add num_cached_tokens for PoolingRequestOutput (#27378) wang.yuqi 2025-10-23 14:03:42 +08:00
6644796bf4 [V1][spec decode] return logprobs for spec decoding (#26060) Giancarlo Delfin 2025-10-22 22:59:59 -07:00
ff93cc8c84 [CORE] Support Prefix Caching with Prompt Embeds (#27219) Andrew Sansom 2025-10-23 00:18:07 -05:00
243ed7d32e [Bugfix][Core] running queue index leakage exception (#26754) PiteXChen 2025-10-23 12:40:12 +08:00
7e0941055f [Bugfix] Fix incorrect kv cache metrics in grafana.json (#27133) fangpings 2025-10-22 20:58:36 -07:00
6738e4a093 [Bugfix] Fix SLA tuner initialization (#27355) Cyrus Leung 2025-10-23 11:43:04 +08:00
2566dca2a9 [Bugfix] Fix deepseek-ocr multi-image inference and add merge_by_field_config=True with tensor schema support (#27361) Isotr0py 2025-10-23 08:15:38 +08:00
b4fda58a2d [MLA] Bump FlashMLA (#27354) Matthew Bonanni 2025-10-22 18:48:37 -04:00
a0003b56b0 [Chore] Separate out system utilities from vllm.utils (#27201) dongbo910220 2025-10-23 04:25:25 +08:00
5beacce2ea [BugFix] bugfix for Flash Attention MLA with full cuda graph IMA following pr-25490 (#27128) Daisy-Ma-coder 2025-10-22 12:36:39 -07:00
8669c69afa [Feature] publisher default set zmq in kv_event config (#26915) rongfu.leng 2025-10-23 03:19:33 +08:00
1651003c35 [Prefix Cache] Use LoRA name for consistent KV-cache block hashing (#27211) Sage 2025-10-22 21:13:03 +03:00
1cb8c6c5fe [Doc] Fix numbering sequence in prefix caching (#27357) William Song 2025-10-23 02:35:47 +09:00
e05a6754a8 [Model] Revert PR #26715: Restore custom PaliGemma and Gemma3-MM impl… (#27309) Luciano Martins 2025-10-22 14:05:34 -03:00
084a9dae80 [Bugfix] Disable FlexAttention direct block mask building for encoder-only models (#27344) Isotr0py 2025-10-23 00:39:08 +08:00
c9461e05a4 Support Anthropic API /v1/messages Endpoint (#22627) v0.11.1rc2 RED 2025-10-23 00:13:18 +08:00
4dfdb821c8 [P/D] Dynamic kv_output_aggregator collect size (#26734) Nicolò Lucchesi 2025-10-22 18:07:58 +02:00
58fab50d82 [Frontend] Require flag for loading text and image embeds (#27204) Russell Bryant 2025-10-22 11:52:02 -04:00
db6f28d898 [Bugfix] Fix HF format InternVL large variants video processing (#27330) Isotr0py 2025-10-22 23:39:23 +08:00
14e2f1231e [Bugfix] Make get_mrope_input_positions instance methods (#27342) Cyrus Leung 2025-10-22 23:38:34 +08:00
7c4767f1eb [NIXL] use Host buffer to support TP_ratio > 1 for XPU (#27140) Chendi.Xue 2025-10-22 10:28:13 -05:00
9771e0b432 [Bugfix] Add missing 'is_internal_router' attribute to FusedMoEWithLoRA (#27351) Jee Jee Li 2025-10-22 23:19:12 +08:00
980de31ca0 [bugfix] remove unused parameters to reduce unnecessary vram usage (#26789) Reinforce-II 2025-10-22 23:16:09 +08:00
1c160841ea [Bug] Fix DeepSeek-V2.5-1210-FP8 issue (#27267) Wentao Ye 2025-10-22 11:00:10 -04:00
4ca13a8667 [NIXL] Terminate handshake listener thread in shutdown (#26404) Mark McLoughlin 2025-10-22 15:59:53 +01:00
675aa2ec64 [Model] Upstream Deepseek-OCR model (#27247) Isotr0py 2025-10-22 22:59:15 +08:00
3ae082c373 [Chore] Separate out optional dependency checks from vllm.utils (#27207) dongbo910220 2025-10-22 22:44:21 +08:00
49c00fe304 Mirroring changes in test-pipeline.yaml into test-amd.yaml (#27242) Alexei-V-Ivanov-AMD 2025-10-22 08:59:45 -05:00
141d3b9fc5 [docs] Update v1 metrics design doc (#27332) Mark McLoughlin 2025-10-22 14:29:15 +01:00
abf3db40ef [Core] Handle MoE LoRA edge cases (#27335) Jee Jee Li 2025-10-22 21:14:33 +08:00
8e4ca4d14e Bugfix - pass 'max_num_tokens_padded' into 'moe_lora_align_block_size' (#27311) gnovack 2025-10-22 05:23:57 -07:00
1a0f4defb7 [Log] Add Warning for LLM(data_parallel_size=k) single-process DP Usage (#27282) Wentao Ye 2025-10-22 08:12:21 -04:00
843af7f7fc [Bugfix][CPU] Disable dual stream execution for experts on CPU (#27320) Li, Jiang 2025-10-22 19:02:27 +08:00
1f633b8632 [Frontend][3/N] Improve all pooling task | Support binary embedding response (#27066) wang.yuqi 2025-10-22 18:38:57 +08:00
a4c29e6e82 fixed reasoning streaming with tool_choice="required" (#24108) ExtReMLapin 2025-10-22 11:42:55 +02:00
8f18feb191 Remove last level references not removed in #26355 (#27260) Harry Mellor 2025-10-22 10:18:17 +01:00
ed540d6d4c Update release pipeline for PyTorch 2.9.0 (#27303) Huy Do 2025-10-22 02:18:01 -07:00
f6027b2855 [1/N][Platform] Cleanup useless function (#26982) wangxiyuan 2025-10-22 17:04:57 +08:00
ab3e80042e [torch.compile] Enable silu_mul_fp8_quant fusion without custom ops enabled (#27146) Jiangyun Zhu 2025-10-22 12:22:39 +08:00
ceacedc1f9 [Benchmark] Add plot utility for parameter sweep (#27168) Cyrus Leung 2025-10-22 11:30:03 +08:00
bfa59be8f1 [CI] Nixl integration tests DP-EP (#27199) Nicolò Lucchesi 2025-10-22 05:17:48 +02:00
265ecb05fb [DOC] [ROCm] Add ROCm quickstart guide (#26505) vllmellm 2025-10-22 11:10:48 +08:00
09a7e6f617 [Deepseek v3.2] Remove extra logics in indexer (#26465) Lain 2025-10-21 16:34:03 -07:00
6c2eef5a5d [P/D] KVConnector for decode benchmarking (#25986) Tyler Michael Smith 2025-10-21 19:30:47 -04:00
19748806f0 [Bugfix] skip cuda graph for drafter when running with eager (#26821) Benjamin Chislett 2025-10-21 18:39:09 -04:00
4a8a567e16 Updated xgrammar backend to not deny supported string formats (#27253) ExtReMLapin 2025-10-22 00:25:23 +02:00
344a0017c0 [Performance] Dual stream execution of "shared_experts" and "selected_experts" inside FusedMoE (#26440) Alexander Matveev 2025-10-21 17:38:29 -04:00
becb7de40b Update PyTorch to 2.9.0+cu129 (#24994) Huy Do 2025-10-21 14:20:18 -07:00
250fb1b8ea [Bugfix] fixes the decoding metadata of dense mla's fp8 kvcache. (#27144) Tao He 2025-10-22 02:27:03 +08:00
647214f3d5 [V0 Deprecation] Remove V0 executors (#27142) Nick Hill 2025-10-21 11:09:37 -07:00
