Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

e54ee3ea33 [Core] Deduplicate generate/encode logic in AsyncLLM (#31510) Nick Hill 2025-12-29 18:42:45 -08:00
358bfd315c fix: update kimi k2 tool parser logic (#31207) wangln19 2025-12-30 10:01:58 +08:00
39512aba72 [Prefix Cache] Include lora_name in BlockStored event for deterministic KV-cache reconstruction (#27577) Sage 2025-12-30 02:17:16 +02:00
0f35429a0c [CI]Test Group 'NixlConnector PD accuracy tests' is fixed (#31460) qli88 2025-12-29 17:48:56 -06:00
d63b969675 [CI/ROCm] Fixing "V1 Test attention (H100)" test group. (#31187) Alexei-V-Ivanov-AMD 2025-12-29 15:53:59 -06:00
56f516254c [Bugfix][ROCm] Fix Static Quant Issue (#31502) Robert Shaw 2025-12-29 16:27:55 -05:00
9152a30d8f [MoE Refactor][12/N] Marlin Fp8 MoE Pure Function (#31499) Robert Shaw 2025-12-29 16:27:00 -05:00
c2ff33cc8c [Core] Enable async scheduling by default (#27614) Nick Hill 2025-12-29 12:20:55 -08:00
b12cb38398 implements register kv caches in lmcache connector (#31397) chunxiaozheng 2025-12-30 03:13:42 +08:00
5bc664110f Optimize QKNorm for MiniMax-M2/M2.1 (#31493) Roger Young 2025-12-30 00:30:18 +08:00
b3a2bdf1ac [Feature] Add offline FastAPI documentation support for air-gapped environments (#30184) RickyChen / 陳昭儒 2025-12-30 00:22:39 +08:00
e37e7349e6 Replace nn.ConvNd with vLLM's ConvNdLayer for Transformers modeling backend (#31498) Harry Mellor 2025-12-29 16:20:01 +00:00
b5d2d71d26 Migrate doc to website: Hardware Plugins (1/N) (#31496) Roy Wang 2025-12-29 23:55:20 +08:00
decc244767 [Docs] Use relative md links instead of absolute html links for cross referencing (#31494) Harry Mellor 2025-12-29 13:33:44 +00:00
9c884faa95 [Bugfix] Preserve tool call id/type/name in streaming finish chunk (#31438) amittell 2025-12-29 08:10:52 -05:00
48d5ca4e8b [CI] fix test_chat_truncation_content_not_null test (#31488) Chauncey 2025-12-29 20:47:08 +08:00
bf73a3e4d7 [Bugfix][Frontend] Fix Jina reranker multimodal input compatibility (#31445) twj 2025-12-29 17:13:18 +08:00
3ecfdc3776 [ROCm][GPTQ][Bugfix] Fix GPTQ GEMM kernel output zeroing race condition (#30719) Andreas Karatzas 2025-12-29 03:13:14 -06:00
45c1ca1ca1 [ROCm][CI] Skip DeepGemm-dependent test on ROCm platform (#31462) Andreas Karatzas 2025-12-29 01:31:10 -06:00
17347daaa2 [CI/Build][CPU] Update CPU CI test cases (#31466) Li, Jiang 2025-12-29 14:17:52 +08:00
b9793e6a8c Add Fused MoE Triton kernels for GLM-4.5-Air, GLM-4.5v, GLM-4.6v on 2x RTX Pro 6000 (#31407) Mamy Ratsimbazafy 2025-12-28 17:38:33 +01:00
0b6b701050 [Model] Add tuned triton fused_moe configs for Qwen3Moe on B200 (#31448) Jzz1943 2025-12-29 00:38:07 +08:00
094fcce250 [BugFix] Re-fix async multimodal cpu tensor race condition (#31373) Nick Hill 2025-12-28 03:05:08 -08:00
573dd0e6f0 [ROCm] Migrate xgrammar to upstream release (#31327) Andreas Karatzas 2025-12-28 02:08:29 -06:00
f70368867e [ROCm][CI] Add TorchCodec source build for transcription tests (#31323) Andreas Karatzas 2025-12-28 02:06:05 -06:00
96142f2094 [ROCm][CI] Added perceptron lib in requirements for isaac multi-modal test (#31441) Andreas Karatzas 2025-12-27 22:15:14 -06:00
62def07d67 [BugFix] register quant scale tensors as buffer (#31395) Boyuan Feng 2025-12-27 19:20:02 -08:00
b326598e97 add tip for VLLM_USE_PRECOMPILED arg to reduce docker build time (#31385) yitingdc 2025-12-28 11:19:47 +08:00
727c41f3fd [MoE Refactor][10/N] Cleanup Fp8 Process Weights After Loading (#31169) Robert Shaw 2025-12-27 15:22:48 -05:00
2f12cd32c0 [BugFix] Fix cache issue in compilation_config (#31376) Boyuan Feng 2025-12-27 06:30:39 -08:00
40a8756224 [Chore]: Remove HF format Phi4-MM examples (#31405) Isotr0py 2025-12-27 21:42:02 +08:00
3d024985ab [CI/Build] Ignore max transformers version for more common tests (#31401) Isotr0py 2025-12-27 21:06:26 +08:00
8711b21676 Fix/get raw stream patch #30905 (#30912) baonudesifeizhai 2025-12-26 23:08:47 -05:00
52bf066516 [Core][Hybrid allocator + connector] Support hybrid allocator + kv cache connector (#30166) Yifan Qiao 2025-12-26 18:25:46 -08:00
5326c89803 [XPU][CI]skip test_preprocess_error_handling due to fork/spawn issue (#31381) Kunshang Ji 2025-12-27 05:40:44 +08:00
87f1b8ca2c CustomOp: Unify aiter impl into GroupedTopk (#31221) Xinyu Chen 2025-12-27 01:44:29 +08:00
887e900b77 [Docs] Add profiler user docs for http request (#31370) rongfu.leng 2025-12-26 23:48:15 +08:00
48e744976c [Mistral common] Ensure all functions are imported from the top & only use public methods (#31138) Patrick von Platen 2025-12-26 13:48:24 +01:00
ce1eafd1a5 [Core] Initialize LoRA support for tower and connector in multi-modal models (#26674) Jee Jee Li 2025-12-26 20:48:20 +08:00
0b544e6476 [Docs] Fix some snippets (#31378) Harry Mellor 2025-12-26 12:47:41 +00:00
c3666f56fd [Misc] Fix Qwen2-MoE shared_expert_gate (#31339) Jee Jee Li 2025-12-26 13:10:39 +08:00
c79dbfa9ad [CI] Fix flaky vision beam search test with flexible semantic validation (#31324) Andreas Karatzas 2025-12-25 22:39:32 -06:00
9ee05cbe7f Support LoRA and GPTQModel for PLaMo 2/3 (#31322) Shinichi Hemmi 2025-12-26 12:41:33 +09:00
3b8f31b362 [benchmark] use model card root instead of id (#31329) Ning Xie 2025-12-26 10:55:56 +08:00
2cd94259c8 [CI/Build] Ignore max transformers version skipping for initialization tests (#30619) Isotr0py 2025-12-26 10:50:32 +08:00
b7165d53c6 Feature/isaac 0.1 (#28367) oscardev256 2025-12-25 21:49:11 -05:00
81786c8774 [BugFix] Fix async scheduling + reasoning with struct output (#31332) Nick Hill 2025-12-25 15:01:02 -08:00
f1531d9f2a [Hybrid] Mamba2 prefix cache blocks freeing for running requests (#28047) Stan Wozniak 2025-12-25 21:54:06 +01:00
2d6001f491 [Model][Ernie4.5-VL] Support video metadata for timestamp rendering (#31274) SongHe 2025-12-25 22:07:15 +08:00
030fc44914 use the same stream for cuda graph catpure and replay for NCCL (#29207) Amir Samani 2025-12-25 03:10:03 -08:00
2532f437ee [Doc] Add troubleshooting for Triton PTX error about undefined gpu-name (#31338) Isotr0py 2025-12-25 18:26:34 +08:00
f15185fbdb [Benchmark Suite] improve cpu Benchmark Suite tests and comparison report for 0.12.0 (#30994) Louie Tsai 2025-12-25 00:51:45 -08:00
ba25a65992 [Frontend] add FunctionGemma tool parser support (#31218) Mark Gatere 2025-12-25 10:29:25 +03:00
42826bbccd [Doc] Add tool call parser documentation for GPT-OSS models (#31212) Amith KK 2025-12-25 10:59:10 +05:30
254f6b9867 [Bugfix] Fix eagle dp tests on A100 (#31241) Richard Zou 2025-12-24 19:05:04 -05:00
bc5ef333e0 [Perf] Add skip_clone to SamplingParams for internal request handling (#31041) Michael Goin 2025-12-24 17:35:57 -05:00
09dc7c690c [Chore][1/2] Drop v0.14 deprecations (#31285) Cyrus Leung 2025-12-25 01:54:01 +08:00
506eb0f454 [Bugfix] Remove dead block_quant_to_tensor_quant function (#31294) ゆり 2025-12-25 01:22:48 +08:00
5d93089686 [cli] complete vllm cli help message (#31226) Ning Xie 2025-12-24 23:45:47 +08:00
66c9887440 [Bugfix][Hardware][AMD] Fix FP8 dtype in silu_mul quantization (#31179) Kevin McKay 2025-12-24 09:37:11 -06:00
1ff67df182 [CI] Reorganization pooling_mteb_test (#31265) wang.yuqi 2025-12-24 23:36:20 +08:00
7cd288a4b3 [PERF] Add interleaved memory allocation to NUMA module (#30800) skaraban3807 2025-12-24 19:17:49 +05:30
d201807339 [Chore] Bump lm-eval version (#31264) Cyrus Leung 2025-12-24 21:39:13 +08:00
aa3868ecfe [Chore] Remove unused noqas (#31263) Cyrus Leung 2025-12-24 21:38:46 +08:00
7adeb4bfa8 [Bugfix] Fix max_model_len="auto" handling (#31260) Cyrus Leung 2025-12-24 19:15:27 +08:00
bd89ce16d2 [Model] Introduce verify_and_update_model_config for VerifyAndUpdateConfig. (#31131) wang.yuqi 2025-12-24 17:54:57 +08:00
b41aeb3468 [Bugfix][ROCm] Fix load issue on deepseek quark quantization when shared expert enabled (#31261) Pleaplusone 2025-12-24 16:47:44 +08:00
ddfac7034e [CI/Build] Ignore data_parallel_size_local (#30281) Ryan Rock 2025-12-24 01:40:54 -06:00
6559d96796 [ROCm][CI] Set TORCH_NCCL_BLOCKING_WAIT Distributed Tests On ROCm (#31259) Micah Williamson 2025-12-24 01:19:07 -06:00
1c74150bca [ROCm][CI] Fix "Distributed Tests (H200)" Test (#31227) kliuae 2025-12-24 14:56:30 +08:00
0247a91e00 [ROCm][CI] Fix entrypoints tests and Python-only installation test on ROCm (#28979) Andreas Karatzas 2025-12-24 00:42:30 -06:00
8ee90c83f8 Add --max-model-len auto to auto-fit context to available memory (#29431) Michael Goin 2025-12-24 00:37:14 -05:00
d7e05ac743 [docker] Fix downloading sccache on aarch64 platform (#30070) Nick Cao 2025-12-23 21:36:33 -08:00
471ddb99a0 [XPU] Remove distributed_executor_backend check (#30760) sihao_li 2025-12-24 13:34:33 +08:00
bb24592d13 [Qwen3-Omni] fixed _get_feat_extract_output_lengths function (#31007) Xiong Wang 2025-12-24 13:33:54 +08:00
369f47aa0f [DeepSeek v3.2] Remove unnecessary syncwarps (#31047) Matthew Bonanni 2025-12-24 00:33:30 -05:00
dabff12ed3 [Bugfix][ROCm][Dynamo][DS 3.1][FP8] fix unsupported hasattr call when Dynamo tracing for ROCm device (#31149) zejunchen-zejun 2025-12-24 13:32:19 +08:00
3bb9561928 Revert "[bench] Support common prefix len config (for decode-only bench)" (#31240) Ming Yang 2025-12-23 21:17:23 -08:00
3ce791ac77 [ROCm][CI] Set VLLM_FLOAT32_MATMUL_PRECISION="tf32" For terratorch Tests In AMD CI (#31242) Micah Williamson 2025-12-23 21:21:50 -06:00
e42894f5b5 [ROCm][CI][Bugfix] Fix Siglip2 rotary embedding dispatch and InternVL video test tolerance (#31235) Andreas Karatzas 2025-12-23 20:56:58 -06:00
76e6a95192 [Bug] Fix Number of dimensions of tensors must match. for Deepseek V3.2 (#31160) Wentao Ye 2025-12-23 21:41:09 -05:00
8b59753cdb [P/D] Mooncake connector support more protocols (#30133) Chao Lei 2025-12-24 10:24:07 +08:00
538e830caa [KVEvent] User request.block_hash for parent block_hash (#30544) Chen Zhang 2025-12-23 18:23:43 -08:00
4ed11105d7 [Misc] Remove unused custom ops copy_blocks and copy_blocks_mla (#30967) rongfu.leng 2025-12-24 10:22:35 +08:00
dd424571c8 [Bugfix] Enable dynamic_dims for different embeds shape (#31223) Cyrus Leung 2025-12-24 10:15:47 +08:00
ca6a95ba25 [Chore] Simplify logic of _execute_mm_encoder (#31222) Cyrus Leung 2025-12-24 10:15:16 +08:00
bc0a5a0c08 [CI] Add Qwen3-Next-FP8 to Blackwell model tests (#31049) Vadim Gimpelson 2025-12-24 05:21:50 +04:00
bfa2c0bbb9 [ROCm][Bugfix] Fix RuntimeError in MMEncoderAttention by replacing .view() with .reshape() (#31203) Andreas Karatzas 2025-12-23 15:48:01 -06:00
f790068600 [Core] Add a random suffix to frontend-provided request IDs (#27987) Mark McLoughlin 2025-12-23 21:05:39 +00:00
34916ae37f [Mamba] - Consolidate Mambas Attention Logic (#28133) Asaf Joseph Gardin 2025-12-23 22:57:00 +02:00
0736f901e7 docs: Add llm-d integration to the website (#31234) Yuan Tang 2025-12-23 15:27:22 -05:00
c016c95b45 Use helper function instead of looping through attribute names (#29788) Harry Mellor 2025-12-23 17:31:56 +00:00
1339878e13 Only patch original_max_position_embeddings for Transformers v4 (#31214) Harry Mellor 2025-12-23 16:46:32 +00:00
b94f80ffb8 [FIX] FP4 quantization kernel padding initialization bug (#31097) danielafrimi 2025-12-23 18:45:18 +02:00
38c361f99d Fix edge case Mistral tool parser (#30724) Joachim Studnia 2025-12-23 15:19:58 +01:00
bb62dda2c3 [Misc] Introduce encode_*_url utility function (#31208) Cyrus Leung 2025-12-23 21:45:21 +08:00
3faa8bee57 adapt voxtral (#31095) Patrick von Platen 2025-12-23 14:31:55 +01:00
b10d47e0e0 Add util function for checking nesting of rope parameters (#31146) Harry Mellor 2025-12-23 11:41:49 +00:00
769f27e701 [OpenAI] Add parameter metadata to validation errors (#30134) R3hankhan 2025-12-23 17:00:12 +05:30
23daef548d [Frontend] Support using chat template as custom score template for reranking models (#30550) Jakub Zakrzewski 2025-12-23 12:19:16 +01:00
