Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

879f69bed3 [Refactor] Remove duplicate ceil_div (#20023) Wentao Ye 2025-06-25 01:19:09 -04:00
7108934142 [Frontend] speed up import time of vllm.config (#18036) David Xia 2025-06-25 00:41:11 -04:00
3443aaf8dd Move to a faster base64 implementation (#19984) h-avsha 2025-06-25 06:33:51 +03:00
2273ec322c Revert "Fix(models/siglip): Add compatibility for Gemma models quantized by llm-compressor" (#20030) Isotr0py 2025-06-25 11:23:29 +08:00
a6c4b87fbc Revert "[Feature] Integrate new deepgemm (#19820)" (#20049) Wentao Ye 2025-06-24 22:45:22 -04:00
1afa9948f5 [Llama4] Update attn_temperature_tuning (#19997) Brayden Zhong 2025-06-24 22:42:53 -04:00
0d06b533a0 cmake: Update vllm_flash_attn for vllm_kernels (#20032) Eli Uriegas 2025-06-24 15:44:10 -07:00
c01d1c5aba use .dev for version comparison with pytorch nightly release (#20031) Boyuan Feng 2025-06-24 14:52:16 -07:00
ead369845d [Easy] Remove submodule added in #19463 (#20039) Brayden Zhong 2025-06-24 16:23:15 -04:00
c6e3bba8e6 [Feature] Integrate new deepgemm (#19820) Wentao Ye 2025-06-24 15:51:56 -04:00
91f7d9d0b6 [P/D] Asynchronously do _nixl_handshake (#19836) lkchen 2025-06-24 12:46:10 -07:00
8619e7158c [BugFix] Fix multi-node offline data parallel (#19937) Nick Hill 2025-06-24 12:45:20 -07:00
c635c5f744 [Misc][Benchmarking] Add variable request-rate ("ramp-up") to the benchmarking client. (#19423) d.transposed 2025-06-24 20:41:49 +02:00
a045b7e89a [Perf] Improve/Fix-regression for FA3 in High QPS regimes (#19463) Lucas Wilkinson 2025-06-24 13:09:01 -04:00
981eeca41a [Fix][V1] Remove --scheduling-policy oracle (#20010) amit 2025-06-24 19:52:15 +03:00
26d34eb67e refactor example - qwen3_reranker (#19847) Reid 2025-06-24 22:03:20 +08:00
53da4cd397 [Bugfix][CPU] Fix InputBatch for pooling models in the CPU v1 (#20014) Li, Jiang 2025-06-24 21:20:04 +08:00
9a3b88328f [PERF] Speedup of MRoPE prepare inputs (#19939) Vadim Gimpelson 2025-06-24 10:01:26 +04:00
3014c920da add some examples for other benchmark scripts (#19893) Reid 2025-06-24 13:57:46 +08:00
0eed516951 [doc] Fix broken link in the installation for CPU (#19980) Kay Yan 2025-06-24 12:04:11 +08:00
ee5ad8d2c5 [Misc][Tools][Benchmark] Add profile to autotune script (#19711) Chenyaaang 2025-06-23 17:59:41 -07:00
a738dbb2a1 Update test case parameter to have the throughput above 8.0 (#19994) QiliangCui 2025-06-23 17:18:10 -07:00
33d5e29be9 [TPU] Fix tpu model runner test (#19995) Chenyaaang 2025-06-23 16:04:28 -07:00
4671ac6e2a [Bugfix][Benchmark] Fix Marlin benchmark (#19929) 22quinn 2025-06-23 15:25:12 -07:00
dd2ccf8dde Feat Dynamic Quantization for MoE Layers in GPTQ Marlin Backend (#19395) Jun-Howie 2025-06-24 06:23:28 +08:00
a3bc76e4b5 [CI/Build] Push latest tag for cpu and neuron docker image (#19897) 22quinn 2025-06-23 14:15:37 -07:00
e6327c9b3e [Feature] Support sequence parallelism for static fp8 quantization (#19181) cascade 2025-06-23 13:09:02 -07:00
d0132f025d [Misc] Add type alias ReqId and EngineId for better readability (#19880) lkchen 2025-06-23 12:57:57 -07:00
61f4fc5dc6 [Bugfix][v1] Fix step pooler implementation and step pooling usage in v1 (#19956) Isotr0py 2025-06-24 02:38:06 +08:00
68aaeb3749 [EP+DP] Optimize the little operations in the DeepGEMM + DeepEP low latency case (#19885) Tyler Michael Smith 2025-06-23 14:07:47 -04:00
c3649e4fee [Docs] Fix syntax highlighting of shell commands (#19870) Lukas Geiger 2025-06-23 18:59:09 +01:00
53243e5c42 [doc] improve readability for long commands (#19920) Reid 2025-06-23 22:27:07 +08:00
a6e6604d32 [Bugfix] Fix CI bitsandbytes failure (#19969) Jee Jee Li 2025-06-23 21:30:55 +08:00
b82e0f82cb [doc] use MkDocs collapsible blocks - supplement (#19973) Reid 2025-06-23 18:54:16 +08:00
5111642a6f [Doc] Update V1 status for decoder-only embedding models (#19952) Isotr0py 2025-06-23 17:31:06 +08:00
1bcd15edc7 [BugFix][P/D] Fix for cases where _recving_transfers can be cleaned up when *all* transfer done (#19874) lkchen 2025-06-22 22:41:53 -07:00
2ebff5b77c [P/D][NixlConnector] Support tp_size > num_kv_heads deployments (#19691) Nicolò Lucchesi 2025-06-23 07:41:50 +02:00
f17aec0d63 [doc] Fold long code blocks to improve readability (#19926) Reid 2025-06-23 13:24:23 +08:00
493c275352 Fix(models/siglip): Add compatibility for Gemma models quantized by llm-compressor (#19643) Vensen 2025-06-23 11:40:28 +08:00
f39ab2d4bd [Misc] Configurable timeout for execute_model RPC calls via env var (#19544) jinqinn 2025-06-23 11:36:26 +08:00
4a0f7888a3 [Core] feat: Implement Priority Scheduling in V1 Engine (#19057) amit 2025-06-23 06:18:08 +03:00
c4cf260677 [Perf][CLI] Improve overall startup time (#19941) Aaron Pham 2025-06-22 19:11:22 -04:00
33d51f599e [BugFix] Add an env to disable moe chunking to work around compile incompatibility (#19642) Ye (Charlotte) Qi 2025-06-22 15:17:49 -07:00
e91386cde1 [Chore] dedup logs (#19955) Aaron Pham 2025-06-22 15:43:07 -04:00
2c11a29f0b [Misc] Simplify vllm bench cli subcommand implementation (#19948) Ye (Charlotte) Qi 2025-06-22 09:34:48 -07:00
c76a506bd6 [Misc] Update model-specific PR tagging (#19949) Roger Wang 2025-06-22 05:16:08 -07:00
ec0db6f51c [doc] use snippets for contact us (#19944) Reid 2025-06-22 18:26:13 +08:00
c305a2109d [CI/Build] Auto tag perf benchmarks related PRs (#19943) 22quinn 2025-06-22 01:46:21 -07:00
202c5df935 [Benchmark] fix request loss if "ping" is returned (#19535) Wang, Yi 2025-06-22 15:21:04 +08:00
2bb246b8f7 [MISC] add cpu_kvcache_space_bytes to CacheConfig (#19812) Ning Xie 2025-06-22 13:39:09 +08:00
4c409cabc2 [Misc] add vllm_config in __init__ (#19866) Ning Xie 2025-06-22 11:10:46 +08:00
3b1e4c6a23 [Docs] Add GPT2ForSequenceClassification to supported models in docs (#19932) Adrian 2025-06-21 22:57:19 +02:00
2c5302fadd [Multimodal] Optimize Qwen2/2.5-VL startup time (#19756) Woosuk Kwon 2025-06-21 13:01:07 -07:00
caa680fd2e [doc] add contact us in community (#19922) Reid 2025-06-22 01:29:06 +08:00
c3bf9bad11 [New model support]Support Tarsier2 (#19887) 汪志鹏 2025-06-21 12:01:51 +08:00
6f170f11dd [Bugfix] Fix bnb 8bit model weights loading (#19917) Isotr0py 2025-06-21 11:29:09 +08:00
8ca81bb069 Fix: Check the type of params to be a Sequence not list. (#19910) Rabin Adhikari 2025-06-21 01:03:17 +02:00
e773a9e1c2 [Misc] Clean up useless code (#19889) wangxiyuan 2025-06-21 05:09:09 +08:00
71baf85ae1 [Kernel] mark TorchSDPABackend swap_blocks NotImplementedError (#19749) Ning Xie 2025-06-21 02:18:11 +08:00
79f2f1c2a1 [CPU][CI] Fallback sliding window to v0 and fix CPU pooling model tests (#19901) Li, Jiang 2025-06-20 23:30:36 +08:00
2e3e3c86dc Export NaNs in logits to scheduler_stats if output is corrupted (#18777) Vlad Tiberiu Mihailescu 2025-06-20 07:47:16 -07:00
7e8977fcd4 [custom_op][vllm-plugin] update custom_op class to use op_registry (#19164) Chendi.Xue 2025-06-20 09:44:56 -05:00
f1e840e842 [Model] GPT2ForSequenceClassification model (#19663) Adrian 2025-06-20 14:07:41 +02:00
7771d1de88 [Fix] import regex instead of re (#19875) Thomas Parnell 2025-06-20 13:16:48 +02:00
71d1219545 [Kernel] correct cpu worker function parameter type (#19745) Ning Xie 2025-06-20 18:50:13 +08:00
e384f2f108 [Misc] refactor example - openai_transcription_client (#19851) Reid 2025-06-20 16:02:21 +08:00
089a306f19 [Misc] update cuda version (#19526) Reid 2025-06-20 15:25:15 +08:00
5e666f72cd [Bugfix][Ray] Set the cuda context eagerly in the ray worker (#19583) kourosh hakhamaneshi 2025-06-19 22:01:16 -07:00
e3a3e4db46 [Bugfix] Enable PP with AITER+V1 (#19822) qli88 2025-06-19 23:43:20 -05:00
e41bf15cd0 [Chore]: qwen3-moe-type-hints-mistake (#19860) Xerxes 2025-06-20 12:43:07 +08:00
5aa4a015ce [Benchmark] Fix Value of type "SampleRequest" is not indexable (#18032) Brayden Zhong 2025-06-20 00:28:55 -04:00
b6bad3d186 [CI][Neuron] Fail and exit on first error (#19622) Elaine Zhao 2025-06-19 21:27:51 -07:00
ee9a1531aa [CI/Build][Bugfix] Fix deadlock on v1 engine test CI (#19872) Isotr0py 2025-06-20 09:51:07 +08:00
10d82f9ac5 [Benchmark][Bugfix] Fix Dataset Length Calculation (#19868) Robert Shaw 2025-06-19 21:30:41 -04:00
ea10dd9d9e [Frontend] early return chat format resolution when specified (#19735) xzbdmw 2025-06-20 02:49:59 +08:00
ead2110297 [Core][Bugfix] Fix Online MM Beam Search (#19688) Alex Brooks 2025-06-19 11:18:07 -06:00
01220ce89a [CI][CPU] Improve dummy Triton interfaces and fix the CPU CI (#19838) Li, Jiang 2025-06-19 23:46:09 +08:00
6f68c49220 [Doc] Update V1 user guide for embedding models (#19842) 22quinn 2025-06-19 02:43:27 -07:00
4719460644 Fixing Chunked Prefill Test. (#19762) Alexei-V-Ivanov-AMD 2025-06-19 03:36:16 -05:00
466166dcfd [Frontend] Add optional token-level progress bar to LLM.beam_search (#19301) NekoMimiUnagi 2025-06-19 02:21:41 -05:00
1d0ae26c85 Add xLAM tool parser support (#17148) Zuxin 2025-06-18 23:26:41 -07:00
6021999573 [Minor] Allow redirecting model path for HfRunner in test (#19795) Isotr0py 2025-06-19 14:04:10 +08:00
c7b370c603 raise exception for pin_lora (#19809) Ning Xie 2025-06-19 13:57:35 +08:00
aa20d10a91 [Misc] [ROCm] Prevent surplus tensor reshape (#19803) zsolt-borbely-htec 2025-06-19 07:57:16 +02:00
2de12be428 [ROCm] [AITER] [Bugfix] Patch for AITER commit 648764942e552a8bb5fe16026703716a81f05374 (#18990) TJian 2025-06-18 22:56:31 -07:00
83ca9ae47b Mark invariant normalizer in Gemma as non-persistent (#19788) Yu-Hang "Maxin" Tang 2025-06-18 22:56:03 -07:00
e2148dc5ea [Bugfix] Add check_health to v1 async client. (#19821) kourosh hakhamaneshi 2025-06-18 21:47:01 -07:00
b1098b4072 [Bugfix] Fix the linter (#19826) Lu Fang 2025-06-19 12:44:41 +08:00
799397ee4f Support embedding models in V1 (#16188) Maximilien de Bayser 2025-06-19 01:36:33 -03:00
4959915089 [Quantization] Modify the logic of BNB double quantization (#19742) Jee Jee Li 2025-06-19 11:52:09 +08:00
8d1e89d946 [Misc][ROCm] Enforce no unused variable in ROCm C++ files (#19796) Lu Fang 2025-06-19 11:25:15 +08:00
36239f79dd Fix FA2 fallback for Blackwell V1 (#19781) Michael Goin 2025-06-19 10:53:55 +09:00
dfada85eee [Frontend] Expose custom args in OpenAI APIs (#16862) afeldman-nm 2025-06-18 20:41:11 -04:00
ed33349738 [BugFix] Fix use_cudagraph=False (#19612) Richard Zou 2025-06-18 20:23:12 -04:00
d49adea1f9 [Multimodal] Use fast processor for Qwen2/2.5-VL (#19789) Woosuk Kwon 2025-06-18 15:49:40 -07:00
14fdd21d39 [Core] More fixes to MultiModalEmbeddings type handling (#19715) Russell Bryant 2025-06-18 18:48:29 -04:00
04fefe7c9a [TPU] Update torch-xla version to include paged attention tuned block change (#19813) QiliangCui 2025-06-18 15:41:13 -07:00
3b523e38d9 [Core] Do not copy array during hashing (#19484) Lukas Geiger 2025-06-18 23:36:55 +01:00
16c16301c8 Disable "Forbid direct 'import triton'" check for vllm/triton_utils/importing.py in an extensible way (#19783) afeldman-nm 2025-06-18 18:08:00 -04:00
9206d0ff01 docs: fix Slack bulletpoint in README (#19811) Nathan Weinberg 2025-06-18 16:47:08 -04:00
