Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

2dec7c1a5d [Bugfix][CUDA] fixes CUDA FP8 kv cache dtype supported (#21420) elvischenv 2025-07-23 11:34:50 +08:00
08d2bd78da [BUGFIX] deepseek-v2-lite failed due to fused_qkv_a_proj name update (#21414) Chendi.Xue 2025-07-22 22:33:57 -05:00
4f76a05f4f [BugFix] Update python to python3 calls for image; fix prefix & input calculations. (#21391) ericehanley 2025-07-22 22:33:00 -05:00
f154bb9ff0 Simplify weight loading in Transformers backend (#21382) Harry Mellor 2025-07-23 04:29:43 +01:00
3ec7170ff1 [Bugfix][ROCm][Build] Fix build regression on ROCm (#21393) Gregory Shtrasberg 2025-07-22 23:27:41 -04:00
c401c64b4c [CI/Build] Fix model executor tests (#21387) Cyrus Leung 2025-07-23 11:25:37 +08:00
b77c7d327f [BugFix] Fix ray import error mem cleanup bug (#21381) Joe Runde 2025-07-22 17:19:55 -06:00
35bc8bd5fb [Misc] Copy HF_TOKEN env var to Ray workers (#21406) Rui Qiao 2025-07-22 16:18:42 -07:00
4594fc3b28 [Model] Add Qwen3CoderToolParser (#21396) Yiheng Xu 2025-07-22 15:05:57 -07:00
ae268b6326 Fix Flashinfer Allreduce+Norm enable disable calculation based on fi_allreduce_fusion_max_token_num (#21325) Xin Li 2025-07-22 15:42:31 -04:00
35366ae57c [CI/Build] Fix test failure due to updated model repo (#21375) Cyrus Leung 2025-07-22 23:39:35 +08:00
2226d5bd85 [Bugfix] Decode Tokenized IDs to Strings for hf_processor in llm.chat() with model_impl=transformers (#21353) Aritra Roy Gosthipaty 2025-07-22 20:57:28 +05:30
44554a0068 Add tokenization_kwargs to encode for embedding model truncation (#21033) Wang Yijun 2025-07-22 23:24:00 +08:00
226b452a20 Revert "[Refactor] Fix Compile Warning #1444-D (#21208)" (#21384) Wentao Ye 2025-07-22 11:22:10 -04:00
f38ee34a0a [feat] Enable mm caching for transformers backend (#21358) Raushan Turganbay 2025-07-22 17:18:46 +02:00
b194557a6c Adds parallel model weight loading for runai_streamer (#21330) Benjamin Bartels 2025-07-22 16:15:53 +01:00
774d0c014b [Perf] Cuda Kernel for Per Token Group Quant (#21083) Wentao Ye 2025-07-22 10:27:15 -04:00
2c8db17cfd [feat]: add SM100 support for cutlass FP8 groupGEMM (#20447) Duncan Moss 2025-07-22 07:27:12 -07:00
4fb56914c5 [perf] Add fused MLA QKV + strided layernorm (#21116) Mickaël Seznec 2025-07-22 16:07:44 +02:00
0df4d9b06b [Misc] unify variable for LLM instance v2 (#21356) Ning Xie 2025-07-22 21:32:36 +08:00
ed25054577 [Core] Introduce popleft_n and append_n in FreeKVCacheBlockQueue to further optimize block_pool (#21222) Jialin Ouyang 2025-07-22 06:17:47 -07:00
10904e6d75 [benchmark] Port benchmark request sent optimization to benchmark_serving (#21209) Jialin Ouyang 2025-07-22 05:28:00 -07:00
a32237665d [Core] Optimize update checks in LogitsProcessor (#21245) Jialin Ouyang 2025-07-22 05:27:18 -07:00
bc8a8ce5ec [Misc] Remove deprecated args in v0.10 (#21349) Kebe 2025-07-22 20:26:39 +08:00
32142b3c62 [Bugfix] Fix eviction cached blocked logic (#21357) Simon Mo 2025-07-22 01:18:40 -07:00
82b8027be6 Add arcee model (#21296) Raghav Ravishankar 2025-07-22 13:27:43 +05:30
3779eb8c81 [Feature][eplb] add verify ep or tp or dp (#21102) rongfu.leng 2025-07-22 14:41:14 +08:00
9e23ad9655 Update fp4 quantize API (#21327) Shu Wang 2025-07-22 01:40:21 -05:00
e69a92a1ce [Bug] DeepGemm: Fix Cuda Init Error (#21312) Wentao Ye 2025-07-22 02:36:18 -04:00
8425f785ad [Misc] DeepEPHighThroughtput - Enable Inductor pass (#21311) Varun Sundar Rabindranath 2025-07-22 12:05:45 +05:30
c17231e827 Fix kv_cache_dtype handling for out-of-tree HPU plugin (#21302) Konrad Zawora 2025-07-22 08:35:14 +02:00
6e5b5ca580 [Refactor] Fix Compile Warning #1444-D (#21208) Wentao Ye 2025-07-22 02:33:51 -04:00
488d8a986a [V1] [Hybrid] Add new test to verify that hybrid views into KVCacheTensor are compatible (#21300) Thomas Parnell 2025-07-22 08:31:18 +02:00
af376ca19d [Core] Minimize number of dict lookup in _maybe_evict_cached_block (#21281) Jialin Ouyang 2025-07-21 22:37:34 -07:00
e7b2042681 Revert "[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE (#20762) (#21334) Ming Yang 2025-07-21 21:49:01 -07:00
90f1e55421 [Intel GPU] Ray Compiled Graph avoid NCCL for Intel GPU (#21338) Ratnam Parikh 2025-07-21 21:48:27 -07:00
5e70dcd6e6 [Doc] Fix CPU doc format (#21316) Li, Jiang 2025-07-22 12:47:49 +08:00
25d585ab7b [XPU] Enable external_launcher to serve as an executor via torchrun (#21021) Chaojun Zhang 2025-07-22 12:47:35 +08:00
8d0a01a5f2 [v1][sampler] Inplace logprobs comparison to get the token rank (#21283) Lu Fang 2025-07-21 13:47:47 -07:00
0ec82edda5 [perf] Speed up align sum kernels (#21079) Himanshu Jaju 2025-07-21 19:19:23 +01:00
005ae9be6c Fix bad lm-eval fork (#21318) Michael Goin 2025-07-21 13:47:51 -04:00
29d1ffc5b4 [DP] Fix Prometheus Logging (#21257) Robert Shaw 2025-07-21 12:11:35 -04:00
304dce7ec0 [Attention] Clean up iRoPE in V1 (#21188) Lucas Wilkinson 2025-07-21 12:10:30 -04:00
6ece16c4fe [Misc] Add dummy maverick test (#21199) Ming Yang 2025-07-21 09:08:09 -07:00
a0e827e07c [BugFix] make utils.current_stream thread-safety (#21252) (#21253) simpx 2025-07-22 00:07:36 +08:00
a15a50fc17 [CPU] Enable shared-memory based pipeline parallel for CPU backend (#21289) Li, Jiang 2025-07-22 00:07:08 +08:00
6dda13c86b [Misc] Add sliding window to flashinfer test (#21282) Woosuk Kwon 2025-07-21 08:37:49 -07:00
6b46c4b653 Add Nvidia ModelOpt config adaptation (#19815) Zhiyu 2025-07-21 07:02:58 -07:00
d97841078b [Misc] unify variable for LLM instance (#20996) Ning Xie 2025-07-21 19:18:33 +08:00
e6b90a2805 [Docs] Make tables more space efficient in supported_models.md (#21291) Harry Mellor 2025-07-21 10:25:02 +01:00
be54a951a3 [Docs] Fix hardcoded links in docs (#21287) Harry Mellor 2025-07-21 10:23:57 +01:00
042af0c8d3 [Model][1/N] Support multiple poolers at model level (#21227) Cyrus Leung 2025-07-21 17:22:21 +08:00
378d33c392 [Bugfix] Fix missing placeholder in logger debug (#21280) Cyrus Leung 2025-07-21 13:50:06 +08:00
940af1f03a Add the instruction to run e2e validation manually before release (#21023) Huy Do 2025-07-20 22:29:18 -07:00
92615d7fe8 [Docs] Add RFC Meeting to Issue Template (#21279) Simon Mo 2025-07-20 21:58:07 -07:00
8188196a1c [CI] Cleanup modelscope version constraint in Dockerfile (#21243) Kay Yan 2025-07-21 11:13:02 +08:00
7ba34b1241 [bugfix] fix syntax warning caused by backslash (#21251) Jiayi Yan 2025-07-21 01:12:10 +08:00
9499e26e2a [Model] Support VLMs with transformers backend (#20543) Raushan Turganbay 2025-07-20 15:25:50 +02:00
51ba839555 [Model] use AutoWeightsLoader for bart (#18299) Calvin Chen 2025-07-20 16:15:50 +08:00
d1fb65bde3 Enable v1 metrics tests (#20953) (tag: v0.10.0rc1) Seiji Eicher 2025-07-19 20:22:02 -07:00
3a1d8940ae [TPU] support fp8 kv cache quantization (#19292) Chengji Yao 2025-07-19 20:01:00 -07:00
2b504eb770 [Docs] [V1] Update docs to remove enforce_eager limitation for hybrid models. (#21233) Thomas Parnell 2025-07-20 01:09:58 +02:00
10eb24cc91 GLM-4 Update (#20736) Yuxuan Zhang 2025-07-20 06:40:31 +08:00
2e8cbb58f3 [BugFix] Fix full cuda graph slot_mapping (#21228) fhl2000 2025-07-20 05:13:18 +08:00
752c6ade2e [V0 Deprecation] Deprecate BlockSparse Attention & Phi3-Small (#21217) Woosuk Kwon 2025-07-19 13:53:17 -07:00
881e3cbe3b [V1] [Hybrid] Enable piecewise CUDA Graph for mamba layers (#21194) Thomas Parnell 2025-07-19 21:27:21 +02:00
9f414a12ad [BugFix] Make PD work with Ray (#21072) kourosh hakhamaneshi 2025-07-19 08:46:50 -07:00
6a971ed692 [Docs] Update the link to the 'Prometheus/Grafana' example (#21225) Jiayi Yan 2025-07-19 21:58:07 +08:00
da6579bf41 [CI/CD][bugfix]fix: error argument to loads has incompatible type (#21223) Sungjae Lee 2025-07-19 21:16:48 +09:00
c81259d33a Fix/remove some broken model executor tests (#21224) Rabi Mishra 2025-07-19 17:45:07 +05:30
e3a0e43d7f [bugfix] Fix auto thread-binding when world_size > 1 in CPU backend and refactor code (#21032) Li, Jiang 2025-07-19 20:13:55 +08:00
b3d82108e7 [Bugfix][Frontend] Fix openai CLI arg middleware (#21220) 22quinn 2025-07-19 02:40:38 -07:00
6d0734c562 [NVIDIA] Add SM100 Flashinfer MoE blockscale fp8 backend for low latency (#20645) Kaixi Hou 2025-07-19 02:33:01 -07:00
7d94577138 Add torch golden impl for moe_align_block_size kernel test (#20653) shixianc 2025-07-19 02:32:36 -07:00
59f935300c [BugFix] Fix potential cuda-graph IMA (#21196) Lucas Wilkinson 2025-07-19 05:18:47 -04:00
18e519ec86 [Bugfix] Fix ndarray video color from VideoAsset (#21064) Isotr0py 2025-07-19 17:17:16 +08:00
1eaff27815 [V0 deprecation] Remove long context LoRA (#21169) Jee Jee Li 2025-07-19 17:15:41 +08:00
cf8cc32674 Fix a couple of Voxtral tests (#21218) Huy Do 2025-07-19 02:13:41 -07:00
3a2cb2649d [Misc][Tools][Benchmark] Add readme file for auto_tune script (#20779) Chenyaaang 2025-07-19 02:06:59 -07:00
3e04107d97 [Model] EXAONE 4.0 model support (#21060) 김종곤 2025-07-19 15:25:44 +09:00
37bd8d6e4c [Bug] DeepGemm: Fix TypeError: per_block_cast_to_fp8() missing 1 required positional argument: 'use_ue8m0' for SM100 (#21187) Wentao Ye 2025-07-19 02:25:22 -04:00
468e2400fe [BugFix][CPU] Fix TorchSDPABackendImpl doesn't have use_irope (#21200) Lucas Wilkinson 2025-07-19 02:18:48 -04:00
dcc6cfb991 [Kernel][Performance] Tweak MoE Batched silu_mul_fp8_quant_deep_gemm kernel (#21193) Varun Sundar Rabindranath 2025-07-19 11:39:51 +05:30
dd572c0ab3 [V0 Deprecation] Remove V0 Spec Decode workers (#21152) Woosuk Kwon 2025-07-18 21:47:50 -07:00
9ffe905a41 [Bugfix][Model] Fix LoRA for Mistral-Small-3.1-24B-Instruct-2503 (#21183) Varun Sundar Rabindranath 2025-07-19 09:45:03 +05:30
9a9fda1423 [Core] Support Local Chunked Attention for Hybrid KV Cache (#19351) Lucia Fang 2025-07-19 11:48:38 +08:00
466e878f2a [Quantization] Enable BNB support for more MoE models (#21100) Jee Jee Li 2025-07-19 08:52:02 +08:00
217937221b Elastic Expert Parallel Initial Support (#20775) Rui Qiao 2025-07-18 17:46:09 -07:00
5782581acf [Bugfix] Voxtral on Blackwell GPUs (RTX 50 series) (#21077) hax0r31337 2025-07-19 00:40:18 +02:00
0f199f197b [Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue (#21005) JialinOuyang-Meta 2025-07-18 12:34:40 -07:00
b2eb2b5ad7 [Kernel] Apply torch.Tag.needs_fixed_stride_order only for torch==2.6.0 (#19346) Richard Zou 2025-07-18 14:10:21 -04:00
21274ab476 [CI] Update CODEOWNERS for vllm/compilation (#21185) Richard Zou 2025-07-18 09:51:12 -04:00
ed8cbfedf8 Let GraniteMoeAttention use YaRN (#21174) Thomas Parnell 2025-07-18 14:52:52 +02:00
45badd05d0 [Core] Set pooling params based on task and model (#21128) Cyrus Leung 2025-07-18 20:41:17 +08:00
4adc66f64d [Bugfix] Allocate less memory in non-batched CUTLASS MoE (#21121) ElizaWszola 2025-07-18 12:55:52 +02:00
55ad648715 [Doc] Fix typo in model name (#21178) Cyrus Leung 2025-07-18 18:55:10 +08:00
5895afd780 [Bugfix] The special_tokens in tokenizer should also be controlled by do_lower_case in encoder_config. (#20750) wang.yuqi 2025-07-18 17:10:47 +08:00
ca4eb82bcb [Model] Re-add the implicit conversion feature for as_seq_cls_model (#21103) wang.yuqi 2025-07-18 15:15:07 +08:00
ba2dfbb0c2 [Misc] Make MM embedding merge interface explicit in model runner (#21147) Roger Wang 2025-07-18 00:13:57 -07:00
1bf65138f6 [benchmark] Sending request strictly follows the random intervals (#21108) Jialin Ouyang 2025-07-17 23:22:08 -07:00
