Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

7c1f760024 [Kernel][TPU][ragged-paged-attn] vLLM code change for PR#8896 (#15659) yarongmu-google 2025-03-28 21:13:15 -07:00
da461f3cbf [TPU][V1][Bugfix] Fix w8a8 recompiilation with GSM8K (#15714) Nicolò Lucchesi 2025-03-29 05:13:06 +01:00
5b800f0932 [Bugfix] set VLLM_WORKER_MULTIPROC_METHOD=spawn for vllm.entrypoionts.openai.api_server (#15700) Jinzhen Lin 2025-03-29 12:12:26 +08:00
8427f70493 Use numba 0.61 for python 3.10+ to support numpy>=2 (#15692) cyyever 2025-03-29 12:11:51 +08:00
7a7992085b [CI] Speed up V1 structured output tests (#15718) Russell Bryant 2025-03-29 00:10:45 -04:00
1286211f57 [Bugfix] LoRA V1: add and fix entrypoints tests (#15715) Varun Sundar Rabindranath 2025-03-28 21:10:41 -07:00
6d531ad7b8 [Misc][V1] Misc code streamlining (#15723) Nick Hill 2025-03-28 20:59:47 -07:00
762b424a52 [Docs] Document v0 engine support in reasoning outputs (#15739) Ce Gao 2025-03-29 11:46:57 +08:00
de1cb38769 [Model] Support Skywork-R1V (#15397) pengyuange 2025-03-29 11:39:21 +08:00
c802f5430d [ROCm][AMD][Build] Update AMD supported arch list (#15632) Gregory Shtrasberg 2025-03-28 23:39:18 -04:00
cff8991a50 [Docs][V1] Optimize diagrams in prefix caching design (#15716) simpx 2025-03-29 11:33:58 +08:00
f3f8d8fff4 implement prometheus fast-api-instrumentor for http service metrics (#15657) daniel-salib 2025-03-28 17:12:02 -07:00
26df46ee59 [Misc] cli auto show default value (#15582) Reid 2025-03-29 06:23:00 +08:00
c3f687ac22 [V1] TPU - Fix the chunked prompt bug (#15713) Alexander Matveev 2025-03-28 16:19:04 -04:00
04437e313d [Bugfix] [torch.compile] Add Dynamo metrics context during compilation (#15639) Luka Govedič 2025-03-28 16:01:09 -04:00
038bededba [TPU] [Perf] Improve Memory Usage Estimation (#15671) Robert Shaw 2025-03-28 10:37:52 -07:00
d03308be0c [Misc] Remove stale func in KVTransferConfig (#14746) shangmingc 2025-03-29 01:33:32 +08:00
c6bc0034d0 [Misc] Remove unused utils and clean up imports (#15708) Cyrus Leung 2025-03-29 00:41:16 +08:00
70e132244a [Minor] Remove TGI launching script (#15646) Woosuk Kwon 2025-03-28 09:30:08 -07:00
47e9038d23 Fix cpu offload testing for gptq/awq/ct (#15648) Michael Goin 2025-03-28 10:29:32 -06:00
432cf22a6a [Bugfix] Fix regex compile display format (#15368) Kebe 2025-03-28 23:58:44 +08:00
2914006fe0 [doc] add missing imports (#15699) Reid 2025-03-28 23:56:48 +08:00
7329ff5468 [V1] Support disable_any_whtespace for guidance backend (#15584) Russell Bryant 2025-03-28 11:46:45 -04:00
541d1df486 [Bugfix] embed_is_patch for Idefics3 (#15696) Cyrus Leung 2025-03-28 23:27:52 +08:00
3b00ff9138 [Bugfix][v1] xgrammar structured output supports Enum. (#15594) Chauncey 2025-03-28 21:14:53 +08:00
91276c5721 [Model] Adding torch compile annotations to chatglm (#15624) Jee Jee Li 2025-03-28 21:14:09 +08:00
0b4167526d [Docs] Add "Generation quality changed" section to troubleshooting (#15701) Harry Mellor 2025-03-28 13:03:21 +00:00
fd5fd26902 [Frontend] update priority for --api-key and VLLM_API_KEY (#15588) Reid 2025-03-28 19:40:12 +08:00
3bbaacbe15 [Bugfix][Frontend] Eliminate regex based check in reasoning full generator (#14821) Ce Gao 2025-03-28 19:20:35 +08:00
a10314c6b3 [Misc] Fix test_sleep to use query parameters (#14373) Lize Cai 2025-03-28 19:00:14 +09:00
70f2c2a709 [Bugfix] Fix 'InductorAdaptor object has no attribute 'cache_dir' (#15674) Jee Jee Li 2025-03-28 17:10:40 +08:00
280d074103 [CPU][CI] Improve CPU Dockerfile (#15690) Li, Jiang 2025-03-28 16:36:31 +08:00
32b14baf8a [Refactor][Frontend] Keep all logic about reasoning into one class (#14428) Ce Gao 2025-03-28 15:23:30 +08:00
2d9045fce8 [TPU][CI] Fix TPUModelRunner Test (#15667) Robert Shaw 2025-03-28 03:01:26 -04:00
355f66348c [V1] Remove legacy input registry (#15673) Cyrus Leung 2025-03-28 14:34:34 +08:00
8693e47e6a [Bugfix] Fix mm_hashes forgetting to be passed (#15668) Cyrus Leung 2025-03-28 13:51:05 +08:00
cec8c7d7f8 Refactor error handling for multiple exceptions in preprocessing (#15650) Jason (Siyu) Zhu 2025-03-27 20:27:20 -07:00
4d0ec37267 [Quantization][FP8] Adding support for fp8 gemm layer input in fp8 (#14578) Gregory Shtrasberg 2025-03-27 22:58:16 -04:00
e7f720ea56 [Misc]add coding benchmark for speculative decoding (#15303) Chen Xia 2025-03-27 19:47:05 -07:00
4ae17bf1e2 Revert "Use Cache Hinting for fused_moe kernel (#15511)" (#15645) Wes 2025-03-27 20:45:55 -06:00
8a49eea74b [CI][TPU] Temporarily Disable Quant Test on TPU (#15649) Robert Shaw 2025-03-27 22:45:05 -04:00
b4245a48df [Doc] Fix dead links in Job Board (#15637) wwl2755 2025-03-27 21:43:40 -05:00
4e0f6076be [Bugfix] Fix failure to launch in Tensor Parallel TP mode on macOS. (#14948) Kebe 2025-03-28 10:13:41 +08:00
726efc6a32 [Quantization][V1] BitsAndBytes support V1 (#15611) Jee Jee Li 2025-03-28 10:12:47 +08:00
bd45912b99 [TPU] Lazy Import (#15656) Robert Shaw 2025-03-27 21:57:01 -04:00
15dac210f0 [V1] AsyncLLM data parallel (#13923) Nick Hill 2025-03-27 16:14:41 -07:00
112b3e5b3b [CI] Update rules for applying tpu label. (#15634) Russell Bryant 2025-03-27 18:15:26 -04:00
32d669275b Correct PowerPC to modern IBM Power (#15635) cnorman 2025-03-27 17:04:32 -05:00
4098b72210 [Bugfix][TPU][V1] Fix recompilation (#15553) Nicolò Lucchesi 2025-03-27 20:15:06 +01:00
46450b8d33 Use absolute placement for Ask AI button (#15628) Harry Mellor 2025-03-27 18:52:18 +00:00
13ac9cab21 [Misc] Avoid direct access of global mm_registry in compute_encoder_budget (#15621) Cyrus Leung 2025-03-28 01:52:00 +08:00
66aa4c0bf4 [Feature] Add middleware to log API Server responses (#15593) Yuan Tang 2025-03-27 13:49:38 -04:00
247181536f [Misc] Replace is_encoder_decoder_inputs with split_enc_dec_inputs (#15620) Cyrus Leung 2025-03-28 01:36:32 +08:00
07bf813fb5 [Doc] Link to onboarding tasks (#15629) Cyrus Leung 2025-03-28 00:30:53 +08:00
8958217ad5 [Bugfix] Fix use_cascade_attention handling for Alibi-based models on vllm/v1 (#15211) Hiroaki Sugiyama 2025-03-27 23:29:29 +09:00
ac5bc615b0 [Model] MiniCPM-V/O supports V1 (#15487) Cyrus Leung 2025-03-27 21:07:29 +08:00
8063dfc61a [Doc] update --system for transformers installation in docker doc (#15616) Reid 2025-03-27 20:38:46 +08:00
6278bc829e Fix incorrect filenames in vllm_compile_cache.py (#15494) Richard Zou 2025-03-27 06:33:41 -04:00
3f532cb6a6 [Misc] Use model_redirect to redirect the model name to a local folder. (#14116) wang.yuqi 2025-03-27 17:21:23 +08:00
e6c9053f9e [Misc] Clean up scatter_patch_features (#15559) Cyrus Leung 2025-03-27 15:45:00 +08:00
43ed4143c4 [Quantization] Fp8 Channelwise Dynamic Per Token GroupedGEMM (#15587) Robert Shaw 2025-03-27 02:47:25 -04:00
f4c98b4d4c [Misc] Consolidate LRUCache implementations (#15481) Bella kira 2025-03-27 14:43:43 +08:00
e1e0fd7543 [TPU] Avoid Triton Import (#15589) Robert Shaw 2025-03-27 02:43:02 -04:00
df8d3d1287 [Misc] Restrict ray version dependency and update PP feature warning in V1 (#15556) Rui Qiao 2025-03-26 23:21:07 -07:00
619d3de8bd [TPU] [V1] fix cases when max_num_reqs is set smaller than MIN_NUM_SEQS (#15583) Chengji Yao 2025-03-26 22:46:26 -07:00
ecff8309a3 [ROCm] Env variable to trigger custom PA (#15557) Gregory Shtrasberg 2025-03-27 01:46:12 -04:00
dcf2a590f5 Allow torchao quantization in SiglipMLP (#15575) Jerry Zhang 2025-03-26 22:45:51 -07:00
54aa619459 [V1] Refactor num_computed_tokens logic (#15307) Cody Yu 2025-03-26 21:54:36 -07:00
fb22be5817 [moe][quant] add weight name case for offset (#15515) Mengqing Cao 2025-03-27 12:50:29 +08:00
7f301dd8ef [Doc] Update V1 user guide for fp8 kv cache support (#15585) Wei Zeng 2025-03-26 19:39:03 -07:00
8095341a01 [misc] LoRA: Remove unused long context test data (#15558) Varun Sundar Rabindranath 2025-03-26 19:04:51 -07:00
69db16a46a add platform check back (#15578) Chenyaaang 2025-03-26 18:50:27 -07:00
ce78f9af4e Add automatic tpu label to mergify.yml (#15560) Michael Goin 2025-03-26 19:39:58 -06:00
9239bf718e [Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972) ElizaWszola 2025-03-27 01:54:44 +01:00
7a6d45bc8a Support FIPS enabled machines with MD5 hashing (#15299) Matthew Vine 2025-03-26 20:19:46 -04:00
e74ff409e0 [TPU] support disabling xla compilation cache (#15567) Chengji Yao 2025-03-26 17:09:28 -07:00
7a888271f5 Use Cache Hinting for fused_moe kernel (#15511) Wes 2025-03-26 17:21:34 -06:00
9d119a86ae [V1] TPU CI - Fix test_compilation.py (#15570) Alexander Matveev 2025-03-26 17:51:54 -04:00
b2e85e26f4 [V1] TPU - Revert to exponential padding by default (#15565) Alexander Matveev 2025-03-26 17:35:05 -04:00
dd8a29da99 Applying some fixes for K8s agents in CI (#15493) Alexei-V-Ivanov-AMD 2025-03-26 15:35:11 -05:00
27df5199d9 Support SHA256 as hash function in prefix caching (#15297) marko 2025-03-26 19:11:28 +01:00
35fad35a48 [V1][Sampler] Faster top-k only implementation (#15478) Nick Hill 2025-03-26 10:56:47 -07:00
733e7c9e95 [Refactor] Remove unnecessary backend parameter in structured output interface (#15317) Aaron Pham 2025-03-26 13:51:56 -04:00
0af4d764d6 Fix weight loading for some models in Transformers backend (#15544) Harry Mellor 2025-03-26 17:17:53 +00:00
e64afa455c multi-node offline DP+EP example (#15484) youkaichao 2025-03-26 23:54:24 +08:00
1711b929b6 [Model] Add Reasoning Parser for Granite Models (#14202) Alex Brooks 2025-03-26 08:28:07 -06:00
c091c0a588 Improve validation of TP in Transformers backend (#15540) Harry Mellor 2025-03-26 14:26:48 +00:00
1aa162e030 Apply torchfix (#15532) cyyever 2025-03-26 20:09:06 +08:00
cf5c8f1686 Separate base model from TransformersModel (#15467) Harry Mellor 2025-03-26 10:13:38 +00:00
4ec2cee000 [Misc] improve example script output (#15528) Reid 2025-03-26 18:12:47 +08:00
99f536f830 [Misc] Enhance warning information to user-defined chat template (#15408) wwl2755 2025-03-26 04:21:15 -05:00
5ebf66748b [FEAT][ROCm] Integrate Fused MoE Kernels from AITER (#14967) vllmellm 2025-03-26 16:30:30 +08:00
781d056280 [Feature] Enhance EAGLE Architecture with Proper RMS Norms (#14990) Bryan Lu 2025-03-26 01:24:07 -07:00
5aefd6ac31 Fix raw_request extraction in load_aware_call decorator (#15382) daniel-salib 2025-03-25 22:29:54 -07:00
6c663dfd5e [misc] LoRA - Skip LoRA kernels when not required (#15152) Varun Sundar Rabindranath 2025-03-25 20:33:45 -07:00
33437bc6e7 [BugFix] Fix nightly MLA failure (FA2 + MLA chunked prefill, i.e. V1, producing bad results) (#15492) Lucas Wilkinson 2025-03-25 23:33:22 -04:00
23114d3364 [Misc] Warn about v0 in benchmark_paged_attn.py (#15495) Tyler Michael Smith 2025-03-25 23:31:04 -04:00
997c8811d6 [Model] Support multi-image for Molmo (#15438) Cyrus Leung 2025-03-26 11:26:33 +08:00
e42389f9d7 Transformers backend already supports V1 (#15463) Harry Mellor 2025-03-26 03:26:16 +00:00
ff38f0a32c [CI/Build] LoRA: Delete long context tests (#15503) Varun Sundar Rabindranath 2025-03-25 17:18:34 -07:00
