Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

a5cfbab3c8 [Core] LoRA: V1 Scheduler optimization (#15422) Varun Sundar Rabindranath 2025-03-25 15:50:09 -07:00
ac3cd6e83c [core] add bucket padding to tpu_model_runner (#14995) Chenyaaang 2025-03-25 14:27:22 -07:00
082ab86f5f [V1] Support long_prefill_token_threshold in v1 scheduler (#15419) Lu Fang 2025-03-25 14:22:26 -07:00
6aa196c8dc [V1][Minor] Use SchedulerInterface type for engine scheduler field (#15499) Nick Hill 2025-03-25 14:21:36 -07:00
a0dd7dcd49 [TPU][V1] Fix Sampler recompilation (#15309) Nicolò Lucchesi 2025-03-25 21:43:54 +01:00
e977c11111 Add workaround for shared field_names in pydantic model class (#13925) Maximilien de Bayser 2025-03-25 17:31:08 -03:00
5f063a80bd [bugfix] add supports_v1 platform interface (#15417) Joe Runde 2025-03-25 15:00:32 -04:00
5d8e1c9279 [Bugfix] Support triton==3.3.0+git95326d9f for RTX 5090 (Unsloth + vLLM compatibility) (#15471) Antonio Gómez 2025-03-25 18:59:25 +01:00
0a049c7d86 [CI/Build] Add tests for the V1 tpu_model_runner. (#14843) yarongmu-google 2025-03-25 09:27:16 -07:00
d0cfec7ab9 [bugfix] fix inductor cache on max_position_embeddings (#15436) youkaichao 2025-03-25 22:05:39 +08:00
a608160027 [Kernel] Fix conflicting macro names for gguf kernels (#15456) Szymon Ożóg 2025-03-25 14:50:49 +01:00
3f04a7fbf2 [Doc] Update V1 user guide for multi-modality (#15460) Cyrus Leung 2025-03-25 19:01:58 +08:00
5994430b84 [Misc] Remove redundant num_embeds (#15443) Cyrus Leung 2025-03-25 18:27:57 +08:00
a9e879b316 [Misc] Clean up MiniCPM-V/O code (#15337) Cyrus Leung 2025-03-25 18:22:52 +08:00
3e2f37a69a Dockerfile.ppc64le changes to move to UBI (#15402) Md. Shafi Hussain 2025-03-25 15:45:14 +05:30
4f044b1d67 [Kernel][CPU] CPU MLA (#14744) Thien Tran 2025-03-25 17:34:59 +08:00
4157f563b4 [Hardware][TPU][Bugfix] Fix v1 mp profiler (#15409) Siyuan Liu 2025-03-25 01:43:00 -07:00
051da7efe3 Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 (#15160) Lu Fang 2025-03-25 00:36:45 -07:00
25f560a62c [V1][Spec Decode] Update target_logits in place for rejection sampling (#15427) (tag: v0.8.2) Woosuk Kwon 2025-03-24 21:04:41 -07:00
a09ad90a72 [V1] guidance backend for structured output + auto fallback mode (#14779) Russell Bryant 2025-03-25 00:02:33 -04:00
10b34e36b9 [Bugfix] Fixed the issue of not being able to input video and image simultaneously (#15387) Chauncey 2025-03-25 11:48:08 +08:00
b5269db959 Revert "Fix non-contiguous input passed to Marlin kernel (#15319)" (#15398) Tyler Michael Smith 2025-03-24 23:43:51 -04:00
6db94571d7 [Misc] Remove LoRA log (#15388) Jee Jee Li 2025-03-25 11:43:48 +08:00
97cfa65df7 Add pipeline parallel support to TransformersModel (#12832) Harry Mellor 2025-03-25 02:41:45 +00:00
911c8eb000 [Minor][Spec Decode] Remove compiled_softmax (#15416) Woosuk Kwon 2025-03-24 19:09:04 -07:00
ebcebeeb6b [V1][Spec Decode] Enable spec decode for top-p & top-k sampling (#15063) Woosuk Kwon 2025-03-24 17:16:46 -07:00
f533b5837f [ROCm][Kernel] MoE weights padding (#14454) Gregory Shtrasberg 2025-03-24 19:45:30 -04:00
8279201ce6 [Build] Cython compilation support fix (#14296) Gregory Shtrasberg 2025-03-24 19:37:54 -04:00
23fdab00a8 [Hardware][TPU] Skip failed compilation test (#15421) Siyuan Liu 2025-03-24 16:28:57 -07:00
623e2ed29f [BugFix][V1] Quick fix for min_tokens with multiple EOS (#15407) Nick Hill 2025-03-24 15:58:59 -07:00
9d72daf4ce [V1][Perf] Simpler request output queues (#15156) Nick Hill 2025-03-24 15:44:08 -07:00
6dd55af6c9 [Doc] Update docs on handling OOM (#15357) Cyrus Leung 2025-03-25 05:29:34 +08:00
3eb08ed9b1 [DOC] Add Kubernetes deployment guide with CPUs (#14865) Yuan Tang 2025-03-24 13:48:43 -04:00
5eeadc2642 [Hardware][Gaudi][Feature] Enable Dynamic MoE for Mixtral (#12303) liuzhenwei 2025-03-25 00:48:40 +08:00
3aee6573dc [V1] Aggregate chunked prompt logprobs in model runner (#14875) Nick Hill 2025-03-24 09:27:57 -07:00
9cc645141d [MISC] Refine no available block debug msg (#15076) Yi Liu 2025-03-25 00:01:10 +08:00
0893567db9 [V1][Minor] fix comments (#15392) Chen1022 2025-03-24 23:45:32 +08:00
8abe69b499 [Core] Don't force uppercase for VLLM_LOGGING_LEVEL (#15306) Russell Bryant 2025-03-24 11:27:30 -04:00
761702fd19 [Core] Integrate fastsafetensors loader for loading model weights (#10647) Manish Sethi 2025-03-24 11:08:02 -04:00
9606d572ed [distributed] fix dp group (#15355) youkaichao 2025-03-24 22:54:27 +08:00
cbcdf2c609 [Bugfix] Fix chat template loading (#15143) Cyrus Leung 2025-03-24 21:50:09 +08:00
038de04d7b Fix zmq IPv6 URL format error (#15341) Russell Bryant 2025-03-24 09:30:41 -04:00
6b3cc75be0 [Kernel] allow non-contiguous input for marlin kernel (#14658) Jinzhen Lin 2025-03-24 21:21:33 +08:00
7ffcccfa5c Revert "[CI/Build] Use uv python for docker rather than ppa:deadsnakess/ppa (#13569)" (#15377) Simon Mo 2025-03-24 05:53:10 -07:00
cc8accfd53 [Misc] Update guided decoding logs to debug (#15310) sfbemerk 2025-03-24 12:25:20 +01:00
948ab03e7e [Bugfix][V1] Avoid importing PreTrainedModel (#15366) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2025-03-24 12:33:12 +02:00
5797fb97e9 [Misc] Remove ignore_reinit_error for ray.init() (#15373) Rui Qiao 2025-03-24 00:41:53 -07:00
3892e58ad7 [Misc] Upgrade BNB version (#15183) Jee Jee Li 2025-03-24 13:51:42 +08:00
d20e261199 Fix non-contiguous input passed to Marlin kernel (#15319) Qubitium-ModelCloud 2025-03-24 11:09:44 +08:00
f622dbcf39 [Fix] [torch.compile] Improve UUID system for custom passes (#15249) Luka Govedič 2025-03-23 21:54:07 -04:00
dccf535f8e [V1] Enable V1 Fp8 cache for FA3 in the oracle (#15191) Lucas Wilkinson 2025-03-23 18:07:04 -04:00
9c5c81b0da [Misc][Doc] Add note regarding loading generation_config by default (#15281) Roger Wang 2025-03-23 14:00:55 -07:00
d6cd59f122 [Frontend] Support tool calling and reasoning parser (#14511) Robin 2025-03-24 05:00:07 +08:00
bc8ed3c4ba [V1][Spec Decode] Use better defaults for N-gram (#15358) Woosuk Kwon 2025-03-23 10:52:30 -07:00
b9bd76ca14 [V1][Spec Decode] Respect prompt_lookup_max (#15348) Woosuk Kwon 2025-03-23 10:41:44 -07:00
6ebaf9ac71 [Bugfix] consider related env vars for torch.compiled cache hash (#14953) DefTruth 2025-03-23 23:53:09 +08:00
f90d34b498 [Misc] Add tuned R1 w8a8 and MoE configs for NVIDIA L20 (#15322) DefTruth 2025-03-23 16:10:10 +08:00
f68cce8e64 [ci/build] fix broken tests in LLM.collective_rpc (#15350) youkaichao 2025-03-23 14:49:48 +08:00
09b6a95551 [ci/build] update torch nightly version for GH200 (#15135) youkaichao 2025-03-23 14:04:13 +08:00
50c9636d87 [V1][Usage] Refactor speculative decoding configuration and tests (#14434) shangmingc 2025-03-23 13:28:10 +08:00
0661cfef7a Fix v1 supported oracle for worker-cls and worker-extension-cls (#15324) hijkzzz 2025-03-23 10:23:35 +08:00
a827aa815d [doc] Add back previous news (#15331) Chen Zhang 2025-03-23 08:38:33 +08:00
b877031d80 Remove openvino support in favor of external plugin (#15339) Russell Bryant 2025-03-22 17:06:39 -04:00
dd861b992f [BugFix][Typing] Fix Imprecise Type Annotations (#15208) Wang Ran (汪然) 2025-03-23 00:05:03 +08:00
eb63ea1e18 [V1] Add disable-any-whitespace option support for xgrammar (#15316) Russell Bryant 2025-03-22 11:56:17 -04:00
2f4bd358f1 [Model] Support Tele-FLM Model (#15023) Naitong Yu 2025-03-22 17:04:44 +08:00
8a8b30eac1 [Bugfix] LoRA V0 - Fix case where max_num_seqs is between cudagraph capture sizes (#15308) Varun Sundar Rabindranath 2025-03-22 05:03:32 -04:00
2fa0e1396b [Bugfix] Fix torch.compile raise FileNotFoundError (#15278) Jee Jee Li 2025-03-22 13:49:34 +08:00
1c2bec0f82 [Doc] add load_format items in docs (#14804) wwl2755 2025-03-22 00:36:43 -05:00
ec870fba9a [FEAT] [ROCm]: Add AITER RMS Norm (Layer Norm) Feature (#14959) TJian 2025-03-22 13:36:14 +08:00
df1430265c [Bugfix][V0] Multi-sequence logprobs streaming edge case (#15259) Andy Lo 2025-03-22 05:35:37 +00:00
4c69e228b3 [Misc] Increase RayDistributedExecutor RAY_CGRAPH_get_timeout (#15301) Rui Qiao 2025-03-21 22:25:43 -07:00
790b79750b [Build/CI] Fix env var typo (#15305) Russell Bryant 2025-03-21 18:28:46 -04:00
cfbb8c930f [TPU][V1] MHA Pallas backend (#15288) Nicolò Lucchesi 2025-03-21 16:50:39 +01:00
baec0d4de9 Revert "[Feature] specify model in config.yaml (#14855)" (#15293) Cyrus Leung 2025-03-21 23:30:23 +08:00
c21b99b912 [Bugfix][VLM] fix llava processor (#15285) Mengqing Cao 2025-03-21 20:14:36 +08:00
93a00d7dde [v1] Refactor KVCacheConfig (#14079) Chen Zhang 2025-03-21 19:56:27 +08:00
61e8c18350 [Misc] Add cProfile helpers (#15074) Russell Bryant 2025-03-21 07:56:09 -04:00
8afcd0f633 [Bugfix] Fix broken kernel test due to missing rename for v1 Triton backend (#15282) Isotr0py 2025-03-21 19:42:06 +08:00
91ca929dc7 [V1] Fix wrong import path of get_flash_attn_version (#15280) Lehua Ding 2025-03-21 18:54:11 +08:00
84e00adc8a [Bugfix] Fix incorrect resolving order for transformers fallback (#15279) Isotr0py 2025-03-21 18:54:08 +08:00
47c7126213 [Misc] Add attention mask pre-computation optimization back to Qwen2.5-VL (#15273) Isotr0py 2025-03-21 18:32:33 +08:00
a989ca2bf6 [Bugfix] Add int8 torch dtype for KVCache (#15260) Shanshan Shen 2025-03-21 16:58:28 +08:00
0fa3970deb [Feature] specify model in config.yaml (#14855) Wei Zeng 2025-03-21 00:26:03 -07:00
da6ea29f7a [V1] Avoid redundant input processing in n>1 case (#14985) Nick Hill 2025-03-20 22:24:10 -07:00
7297941b38 [Doc] Update LWS docs (#15163) Edwin Hernandez 2025-03-20 21:18:47 -07:00
f8a08cb90d [V1] Enable Triton(ROCm) Attention backend for Nvidia GPUs (#14071) Isotr0py 2025-03-21 11:14:19 +08:00
b15fd2be2a [Hardware][TPU] Add check for no additional graph compilation during runtime (#14710) Siyuan Liu 2025-03-20 20:05:28 -07:00
e588ac237c Add an example for reproducibility (#15262) Woosuk Kwon 2025-03-20 19:55:47 -07:00
5df2da5b97 [Misc] Better RayExecutor and multiprocessing compatibility (#14705) Cody Yu 2025-03-20 19:27:46 -07:00
11b986b3fb [Docs] Trim the latest news in README (#15261) Woosuk Kwon 2025-03-20 19:24:21 -07:00
296f927f24 [Model] RE: Mamba2 Prefill Performance Tweaks: Fixing Flurry of Unnecessary Memory Copies (#14857) Chih-Chieh Yang 2025-03-20 19:21:08 -07:00
0032903a5b [Bugfix] detect alibi and revert to FA2 (#15231) Travis Johnson 2025-03-20 20:20:16 -06:00
47195057e9 [V1][TPU] Speed up top-k on TPU by using torch.topk (#15242) Hyesoo Yang 2025-03-20 19:19:40 -07:00
6edbfa924d Mention extra_body as a way top pass vLLM only parameters using the OpenAI client (#15240) Harry Mellor 2025-03-21 02:18:36 +00:00
1e508343e1 [Bugfix] Fix incorrect qwen2.5-vl attention mask pre-computation (#15200) Isotr0py 2025-03-21 10:18:04 +08:00
2e0b4cfde0 [ROCM] Upgrade torch to 2.6 (#15244) Sage Moore 2025-03-20 19:17:33 -07:00
10f55fe6c5 [Misc] Clean up the BitsAndBytes arguments (#15140) Jee Jee Li 2025-03-21 10:17:12 +08:00
d3ccbd6350 Fix CUDA kernel index data type in vllm/csrc/quantization/fused_kernels/layernorm_utils.cuh +10 (#15159) Lu Fang 2025-03-20 19:01:11 -07:00
0cfe7d386d [CI/Build] LoRA : make add_lora_test safer (#15181) Varun Sundar Rabindranath 2025-03-20 21:28:53 -04:00
