Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

b6a3a9f76d [Core] Fix abrupt request abort (#18485) Nicolò Lucchesi 2025-06-07 01:27:59 +02:00
ca27f0f9c1 [Bugfix][Core] Update cancellation logic in generate() to handle Generator exits (#19225) Adolfo Victoria 2025-06-06 13:17:54 -07:00
aad30bd306 [BugFix] Fix MultiConnector test after HMA changes (#19291) Nick Hill 2025-06-06 13:16:24 -07:00
94ecee6282 Fixed ppc build when it runs on non-RHEL based linux distros (#18422) Nishidha 2025-06-07 00:24:26 +05:30
8267f9916f improve logits bias (#19041) Yu Guo 2025-06-06 04:59:25 -07:00
7353492a47 [Core] Raise when non-multi-instance DP clients target a DP rank (#19227) jmswen 2025-06-06 04:03:01 -07:00
7661e92ef8 [Model] Optimize nemotron_h implementation (#19249) Jee Jee Li 2025-06-06 18:05:14 +08:00
f168b85725 Unit Test for run_dp_sharded_vision_model (#19103) Siqi Yan 2025-06-06 01:24:02 -07:00
da511d54d8 Fix CompilationConfig repr (#19091) Richard Zou 2025-06-06 04:23:35 -04:00
65c69444b1 [Docs] Improve V1 KVConnector interface documentation (#19172) Nick Hill 2025-06-06 01:22:45 -07:00
94870359cd [Quantization] Bump compressed-tensors version; update NVFP4A16 test model (#19224) Dipika Sikka 2025-06-06 04:21:54 -04:00
0d49483ea9 [TPU] fix kv cache dtype in model runner (#19244) Chengji Yao 2025-06-06 01:20:16 -07:00
90b78ec5f9 [v1][P/D] Fix a edge case in kv cache schedule (#19182) Jinghui Zhang 2025-06-05 23:32:55 -07:00
91a2ef98ea [Chore] update CODEOWNERS (#19247) Aaron Pham 2025-06-06 02:09:43 -04:00
3da2313d78 Support allowed_token_ids in ChatCompletionRequest (#19143) Xu Song 2025-06-06 13:06:48 +08:00
b61dc5f972 [TPU] update torch_xla pin (#19231) Chengji Yao 2025-06-05 21:27:38 -07:00
f8a1a2d108 [v1] Hybrid Memory Allocator (#17996) Chen Zhang 2025-06-06 11:47:09 +08:00
3465b87ef8 [Bugfix] Fix EAGLE vocab embedding construction for Llama 70B (#19033) Benjamin Chislett 2025-06-05 22:10:08 -04:00
c8134bea15 Fix AOPerModuleConfig name changes (#18869) Jerry Zhang 2025-06-05 21:51:32 -04:00
cb6d572e85 [Model] NemotronH support (#18863) Luis Vega 2025-06-05 14:29:28 -07:00
87360308b7 [V1] Use FlashInfer by default on Blackwell GPUs (#19118) Michael Goin 2025-06-05 15:40:39 -04:00
aa49f14832 [Quantization] Skip Fp4 Test for compressed-tensors (#19217) Dipika Sikka 2025-06-05 14:21:53 -04:00
9ef9173cfa [P/D][NixlConnector] Enable FlashInfer backend (#19090) Nicolò Lucchesi 2025-06-05 19:10:15 +02:00
85e2b7bb13 [MISC][Bugfix] Use less CPU when message queue has been empty for some time (#16226) Povilas Kanapickas 2025-06-05 19:53:08 +03:00
61059bee40 [Hardware][NVIDIA] FP4 MoE kernel optimization (#19110) Chiyue Wei 2025-06-05 09:48:26 -07:00
ec89524f50 Add H20-3e fused MoE kernel tuning configs for DeepSeek-R1/V3 (#19205) Xu Wenqing 2025-06-06 00:38:54 +08:00
f20f9f063b [mistral_common] Add v11 tokenizer (#19193) Patrick von Platen 2025-06-05 17:27:41 +02:00
9bc8bb07cf [Bugfix] properly catch PIL-related errors for vision models when incorrect data urls are provided (#19202) Guillaume Calmettes 2025-06-05 14:59:28 +02:00
1aeb925f34 [Frontend] improve vllm run-batch --help display (#19187) Reid 2025-06-05 19:16:25 +08:00
188a4590d8 [Misc] Do not override NCCL_CUMEM_ENABLE if set explicitly (#19105) 22quinn 2025-06-05 04:14:32 -07:00
18093084be [Misc] Remove unnecessary fallback to prefill-decode attention (#19138) vllmellm 2025-06-05 16:08:26 +08:00
da40380214 [Build] Annotate wheel and container path for release workflow (#19162) Simon Mo 2025-06-04 23:24:56 -07:00
8fc57501d3 [Bugfix]: Fix the incompatibility issue with stream when Thinking is disabled (#19135) Chauncey 2025-06-05 14:24:24 +08:00
af7fc84fd2 [BugFix][Minor] Fix full cuda graph bug when max_num_seqs < 512 (#19171) Woosuk Kwon 2025-06-04 22:41:25 -07:00
0678b52251 Handle non-serializable objects when dumping benchmark results (#19114) Huy Do 2025-06-04 22:40:04 -07:00
25b918eee6 [Torch Nightly]add missing dependency (#18770) Yang Wang 2025-06-04 21:56:12 -07:00
a408820f2f [Bugfix] Fix port handling in make_zmq_path (#19117) Michael Goin 2025-06-04 23:00:59 -04:00
c56ed8bb0e [Bugfix][Nixl] Fix full prefix cache hit bug (#18632) Robert Shaw 2025-06-04 22:07:32 -04:00
78dcf56cb3 [doc] small fix (#19167) Reid 2025-06-05 09:13:50 +08:00
b2fac67130 [P/D] Heterogeneous TP (#18833) Nicolò Lucchesi 2025-06-05 01:25:34 +02:00
23027e2daf [Misc] refactor: simplify EngineCoreClient.make_async_mp_client in AsyncLLM (#18817) CYJiang 2025-06-05 06:37:25 +08:00
c3fd4d669a [Kernel] Integrate batched/masked deepgemm kernel (#19111) Varun Sundar Rabindranath 2025-06-04 17:59:18 -04:00
ef3f98b59f [Bugfix] fix v1 cpu worker fails on macOS (#19121) Kebe 2025-06-05 04:17:38 +08:00
7ee2590478 [TPU] Update dynamo dump file name in compilation test (#19108) Siyuan Liu 2025-06-04 13:13:43 -07:00
53a5a0ce30 [Perf] Tunings for SM100 FP8 CUTLASS kernel (#18778) Michael Goin 2025-06-04 13:46:28 -04:00
d459fae0a2 [Bugfix][EP+DP] Fix internode check (#19112) Tyler Michael Smith 2025-06-04 11:39:23 -04:00
c8dcc15921 Allow AsyncLLMEngine.generate to target a specific DP rank (#19102) jmswen 2025-06-04 08:26:47 -07:00
8f4ffbd373 [Doc] Update V1 Guide for embedding models (#19141) Cyrus Leung 2025-06-04 22:57:55 +08:00
5f2cd251d2 Sm100 blockwise fp8 swap ab (#18564) Lain 2025-06-04 07:48:45 -07:00
02658c2dfe Add DeepSeek-R1-0528 function call chat template (#18874) Xu Wenqing 2025-06-04 21:24:18 +08:00
01dc9a76db [CI/Build][Bugfix] Ensure compatibility with transformers 4.52 (#18678) Cyrus Leung 2025-06-04 19:49:20 +08:00
35cf32df30 Improve the output precision of embedding models (#19092) wang.yuqi 2025-06-04 19:48:57 +08:00
8711bc5e68 [Misc] Add packages for benchmark as extra dependency (#19089) Isotr0py 2025-06-04 19:18:48 +08:00
2669a0d7b5 Fix ValueError: Missing value for tag key(s): model_name,engine. (#19113) Seiji Eicher 2025-06-04 02:10:45 -07:00
8e972d9c44 [TPU] Skip hanging tests (#19115) Siyuan Liu 2025-06-04 01:43:00 -07:00
3336c8cfbe Fix #19130 (#19132) 汪志鹏 2025-06-04 16:42:06 +08:00
b124e1085b [Bugfix] Fix FA3 full cuda graph correctness (#19106) Woosuk Kwon 2025-06-03 23:10:15 -07:00
41aa578428 [NVIDIA] Add Cutlass MLA backend (#17625) Kaixi Hou 2025-06-04 12:40:26 +08:00
8d646c2e53 [Cleanup][v1]:remote guided-decoding-backend for example (#19059) Calvin Chen 2025-06-04 12:23:26 +08:00
5d6d1adf15 [KERNEL] Sampler. CUDA kernel for applying repetition penalty (#18437) Vadim Gimpelson 2025-06-04 08:13:01 +04:00
1409ef9134 [Core] Cast multimodal input in hf processor (#18862) Lukas Geiger 2025-06-04 04:24:56 +01:00
4555143ea7 [CPU] V1 support for the CPU backend (#16441) Li, Jiang 2025-06-04 09:43:01 +08:00
52dceb172d [Docs] Add developer doc about CI failures (#18782) Russell Bryant 2025-06-03 21:09:13 -04:00
abd7df2fca [Misc] Fix path and python alias errors in disagg_prefill exmaples (#18919) Jiaxin Shan 2025-06-03 17:15:18 -07:00
b712be98c7 feat: add data parallel rank to KVEventBatch (#18925) Yan Ru Pei 2025-06-03 17:14:20 -07:00
a8da78eac9 [Bugfix] Max concurrency estimation and check_enough_kv_cache_memory for models with sliding window layers (#19029) Chen Zhang 2025-06-04 08:14:06 +08:00
5d96533e22 [Bugfix][P/D] Fix Prefix Cache Bug (#18411) Nicolò Lucchesi 2025-06-04 01:53:16 +02:00
4de790fcad [Bugfix]: Fix the incompatibility issue with tool_choice 'required' when Thinking is enabled (#19075) Chauncey 2025-06-04 07:27:24 +08:00
b5fd9506c1 [Bugfix] get_num_blocks_to_allocate with null_block (#19031) Chen Zhang 2025-06-04 06:30:55 +08:00
135cf55cd1 [V1][Spec Decode][Ngram] 1.35x gain -> 1.95x gain on InstructCoder with prompt fix (#18971) Ekagra Ranjan 2025-06-03 18:26:33 -04:00
6cac54f4d1 [v1] Re-init input batch for multiple kv cache groups (#18654) Chen Zhang 2025-06-04 05:41:36 +08:00
6865fe0074 Fix interaction between Optional and Annotated in CLI typing (#19093) Harry Mellor 2025-06-03 22:07:19 +01:00
e31446b6c8 [Perf] Tune scaled_fp8_quant by increasing vectorization (#18844) Michael Goin 2025-06-03 16:48:25 -04:00
bdf13965ab [V1] Support cross-layer KV sharing (#18212) Yong Hoon Shin 2025-06-03 13:33:07 -07:00
fa98d77773 [Kernel] DeepEP dispatch-combine kernel integration (#18434) Varun Sundar Rabindranath 2025-06-03 15:30:02 -04:00
01eee40536 [doc] update docker version (#19074) Reid 2025-06-04 03:08:21 +08:00
19bdaf32b1 [Doc] Readme standardization (#18695) SorenDreano 2025-06-03 20:50:55 +02:00
02f0c7b220 [Misc] Add SPDX-FileCopyrightText (#19100) Simon Mo 2025-06-03 11:20:17 -07:00
d054da1992 [Misc] fix: add miss best_of param validation (#18555) CYJiang 2025-06-04 02:02:07 +08:00
4b7817c119 [Misc] Add missing _Backend enums (#19081) Nicolò Lucchesi 2025-06-03 18:15:16 +02:00
d00dd65cd4 [Doc] Improve the Pull Request template with key components (#19086) Lu Fang 2025-06-03 23:44:34 +08:00
d81edded69 [Bugfix] disable processor cache (#19068) Raushan Turganbay 2025-06-03 17:06:04 +02:00
476844d44c Fix underscores in dict keys passed via CLI (#19030) Harry Mellor 2025-06-03 15:39:24 +01:00
4e68ae5e59 [CI/Build] Remove V0 LoRA test (#19066) Jee Jee Li 2025-06-03 22:30:18 +08:00
4e88723f32 [doc] clarify windows support (#19088) youkaichao 2025-06-03 21:42:17 +08:00
118ff92111 [Doc] Update V1 user guide for embedding and enc-dec models (#19060) Cyrus Leung 2025-06-03 17:29:41 +08:00
ec2dcd80bc [Misc] Update WeightsMapper for qwen2-vl/qwen2.5-vl (#19054) Isotr0py 2025-06-03 17:08:20 +08:00
42243fbda0 [Doc] Add InternVL LoRA support (#19055) Jee Jee Li 2025-06-03 17:08:03 +08:00
6d18ed2a2e Update docker docs with ARM CUDA cross-compile (#19037) Michael Goin 2025-06-03 04:21:53 -04:00
f32fcd9444 [v1][KVCacheManager] Rename BlockHashType to BlockHash (#19015) Chen Zhang 2025-06-03 16:01:48 +08:00
d32aa2e670 [Bugfix] Use cmake 3.26.1 instead of 3.26 to avoid build failure (#19019) Lu Fang 2025-06-03 15:16:17 +08:00
cc977286e7 Reduce logs in CLI scripts and plugin loader (#18970) Michael Goin 2025-06-03 02:00:45 -04:00
17430e3653 [bugfix] small fix logic issue (#18999) Reid 2025-06-03 13:35:12 +08:00
1282bd812e Add tarsier model support (#18985) 汪志鹏 2025-06-03 13:13:13 +08:00
bdce64f236 [V1] Support DP with Ray (#18779) Rui Qiao 2025-06-02 21:15:13 -07:00
9e6f61e8c3 [ROCm][Build] Clean up the ROCm build (#19040) Gregory Shtrasberg 2025-06-02 23:47:47 -04:00
8655f47f37 [CPU][CI] Re-enable the CPU CI tests (#19046) Li, Jiang 2025-06-03 11:46:47 +08:00
4ce42f9204 Adding "LoRA Test %N" to AMD production tests (#18929) Concurrensee 2025-06-02 22:46:44 -05:00
8a57872b2a [Bugfix][EP+DP] Use pplx-kernel internode instead of intranode (#19034) Tyler Michael Smith 2025-06-02 23:36:51 -04:00
5bc1ad6cee [Doc] Remove duplicate TOCs during MkDocs migration (#19021) Hyogeun Oh (오효근) 2025-06-03 11:49:48 +09:00
