Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

8d6cf89526 [V1] [Spec Decode] Support random sampling for spec decode (#13933) v0.8.0rc1 Lily Liu 2025-03-16 22:00:20 -07:00
583a9778e0 [Benchmark] Do not save detailed info to json by default (#14879) Simon Mo 2025-03-16 21:48:11 -07:00
a73e183e36 [Misc] Replace os environ to monkeypatch in test suite (#14516) Sibi 2025-03-17 11:35:57 +08:00
1e799b7ec1 [BugFix] Fix MLA + V1 + TP==1 causing reinitialization of cuda context (#14910) Lucas Wilkinson 2025-03-16 23:35:37 -04:00
7f6c5ee06c [V1][Minor] Add __repr__ to ConstantList (#14907) Woosuk Kwon 2025-03-16 20:20:15 -07:00
faa0275730 [V1] Optimize the overhead of rewinding (#14905) Woosuk Kwon 2025-03-16 20:19:30 -07:00
8a5a9b70d7 [CI/Build] Update defaults for test reproducibility (#14893) Cyrus Leung 2025-03-17 10:38:15 +08:00
bb3aeddfaf [CI] Nightly Tests (#14898) Robert Shaw 2025-03-16 22:06:43 -04:00
aecc780dba [V1] Enable Entrypoints Tests (#14903) Robert Shaw 2025-03-16 20:56:16 -04:00
90df7f23aa [Doc] Add guidance for using ccache with pip install -e . in doc (#14901) Vadim Gimpelson 2025-03-17 03:10:04 +04:00
b9b5bdfc7d [Misc] Catching Ray Compiled Graph PP test failures for V1 (#14847) Rui Qiao 2025-03-16 15:46:42 -07:00
31060b2757 [V1][BugFix] Detect interleaved sliding window attention (#14896) Woosuk Kwon 2025-03-16 14:53:53 -07:00
fc1f67715d [BugFix][V1] Fix overhead related to bad_words sampling when not in use (#14894) Nick Hill 2025-03-16 14:53:34 -07:00
f6137adbcb Revert "[Bugfix] Limit profiling run sequence length by max_model_len (#14785) (#14892) Cyrus Leung 2025-03-17 00:13:46 +08:00
e53b1350f2 [Bugfix] Explicitly disable Phi-4-multimodal in V1 (#14889) Cyrus Leung 2025-03-17 00:05:40 +08:00
d30aa7e9e6 [Bugfix] Limit profiling run sequence length by max_model_len (#14785) Kyle Sayers 2025-03-16 10:44:19 -04:00
d1ad2a57af [V1] [Spec Decode] Fix ngram tests (#14878) Lily Liu 2025-03-16 00:29:22 -07:00
b82662d952 [BugFix] Fix torch distributed stateless PG backend init (#14870) Nick Hill 2025-03-15 20:26:19 -07:00
71c1e07107 [Kernel] Add more tuned configs (#14877) Simon Mo 2025-03-15 20:25:03 -07:00
b30c75dda4 [V1] Remove V0 fallback for mistral-tokenizer (#14873) Roger Wang 2025-03-15 20:21:11 -07:00
def232e122 [VLM] Clean up Phi-4-MM ViT implementation (#14812) Isotr0py 2025-03-16 09:53:52 +08:00
3453b964a3 [Misc][Doc] Minor benchmark README update (#14874) Roger Wang 2025-03-15 18:46:17 -07:00
61c6a5a796 [VLM] Merged multi-modal processor for Pixtral (#12211) Rémi Delacourt 2025-03-15 14:28:27 +01:00
74bc397b0a [Core] Expose API endpoint /is_sleeping (#14312) Jun Duan 2025-03-15 09:28:14 -04:00
f58aea002c [CI][Intel GPU] refine intel GPU ci docker build (#14860) Kunshang Ji 2025-03-15 04:58:53 -07:00
3556a41434 [VLM] Limit multimodal input cache by memory (#14805) Cyrus Leung 2025-03-15 17:52:05 +08:00
9ed6ee92d6 [Bugfix] EAGLE output norm bug (#14464) Bryan Lu 2025-03-14 23:50:33 -07:00
ee3778d5fc [Build/CI] Upgrade jinja2 to get 3 moderate CVE fixes (#14839) Russell Bryant 2025-03-15 01:38:19 -04:00
aaacf17324 [Doc] V1 user guide (#13991) Jennifer Zhao 2025-03-14 22:17:59 -07:00
4c7629cae9 [V1][Structured Output] calculate vocab_size eagerly (#14851) Aaron Pham 2025-03-15 01:09:51 -04:00
e0fdfa1608 [CI/Build] Delete LoRA bias test (#14849) Jee Jee Li 2025-03-15 13:09:25 +08:00
5952d8ab61 [Attention] Get rid of mla cache alignment (#14842) Lucas Wilkinson 2025-03-15 01:08:25 -04:00
a2ae496589 [CPU] Support FP8 KV cache (#14741) Li, Jiang 2025-03-15 13:07:36 +08:00
877e352262 [Docs] Add new East Coast vLLM Meetup slides to README and meetups.md (#14852) Simon Mo 2025-03-14 22:06:38 -07:00
d4d93db2c5 [V1] V1 Enablement Oracle (#13726) Robert Shaw 2025-03-15 01:02:20 -04:00
8c0d15d5c5 [Misc][Easy] Annotate unused vars in the csrc files (#14798) Lu Fang 2025-03-14 21:40:09 -07:00
97ac781c62 [Misc] Remove misleading message in gemma2 and gemma3 (#14850) Isotr0py 2025-03-15 12:35:12 +08:00
776dcec8fe Disable outlines cache by default (#14837) Russell Bryant 2025-03-14 23:57:55 -04:00
ccf02fcbae Revert "[Model] Mamba2 Prefill Performance Tweaks: Fixing Flurry of U… (#14848) Tyler Michael Smith 2025-03-14 23:45:42 -04:00
acaea3bb07 [Bugfix][V1] Fix flashinfer sampling (#14815) DefTruth 2025-03-15 11:42:38 +08:00
9f37422779 [Neuron][CI] update docker run command (#14829) Liangfu Chen 2025-03-14 18:51:35 -07:00
dd344e0342 [Bugfix] Fix torch_xla in V0 which can't handle None seed introduced … (#14844) yarongmu-google 2025-03-14 17:41:15 -07:00
54a8804455 [Doc] More neutral K8s deployment guide (#14084) Yuan Tang 2025-03-14 19:12:36 -04:00
bbd94a19fc [Build/CI] Upgrade aiohttp to incldue CVE fix (#14840) Russell Bryant 2025-03-14 19:11:28 -04:00
233ffce1eb [Build/CI] Move ninja to common deps (#14835) Russell Bryant 2025-03-14 17:25:28 -04:00
40677783aa [CI] Add TPU v1 test (#14834) Richard Liu 2025-03-14 14:13:30 -07:00
14f301b541 Update to torch==2.6.0 (#12721) Michael Goin 2025-03-14 16:58:30 -04:00
46f98893dd [V1] Fix model parameterization for structured output tests (#14833) Russell Bryant 2025-03-14 16:55:18 -04:00
fe66b34728 [Model] Mamba2 Prefill Performance Tweaks: Fixing Flurry of Unnecessary Memory Copies (#14778) Chih-Chieh Yang 2025-03-14 16:36:18 -04:00
270a5da495 Re-enable the AMD Entrypoints Test (#14711) Alexei-V-Ivanov-AMD 2025-03-14 14:18:13 -05:00
7097b4cc1c [release] Remove log cleanup commands from TPU job (#14838) Kevin H. Luu 2025-03-14 11:59:52 -07:00
977a16772c [Bugfix][Kernel]: Fix AllSpark kernel compilation errors and enable for CUDA < 12.0 (#14430) Yajie Wang 2025-03-15 00:55:14 +08:00
73deea2fdb [Frontend] track server_load (#13950) daniel-salib 2025-03-14 09:53:17 -07:00
9d2b4a70f4 [V1][Metrics] Updated list of deprecated metrics in v0.8 (#14695) Mark McLoughlin 2025-03-14 16:45:25 +00:00
0b0d6421b2 [Frontend] Fix log message to use http vs https (#14774) Russell Bryant 2025-03-14 12:21:09 -04:00
1140991a7b [V1] Fix vocab size calculation for structured output (#14826) Russell Bryant 2025-03-14 12:18:38 -04:00
613c5bb945 [Bugfix] Fix Aria test loading (#14823) Cyrus Leung 2025-03-15 00:11:23 +08:00
fd8e055ffb [BugFix]: properly catch templating error when preprocess input (#13976) Guillaume Calmettes 2025-03-14 08:58:34 -04:00
ab93f1360f [VLM] Various cleanup and fixes (#14806) Cyrus Leung 2025-03-14 20:58:19 +08:00
40253bab44 [Bugfix][W8A8] fixed cutlass block fp8 binding (#14796) DefTruth 2025-03-14 18:32:42 +08:00
c77620d22d [V1][Minor] Minor code cleanup for scheduling metrics (#14800) Woosuk Kwon 2025-03-14 01:21:28 -07:00
989ecd2007 [Misc] Gemma3ForConditionalGeneration supports LoRA (#14797) Jee Jee Li 2025-03-14 16:07:30 +08:00
54cc46f3eb [Bugfix] Fix small typo in the example of Streaming delimiter (#14793) WeiCheng 2025-03-14 16:05:17 +08:00
601bd3268e [Misc] Clean up type annotation for SupportsMultiModal (#14794) Cyrus Leung 2025-03-14 15:59:56 +08:00
09269b3127 [BugFix]Fix performance serving benchmark when enable profiling (#14737) Li Wang 2025-03-14 15:02:05 +08:00
27b50f1fe6 [Bugfix][Kernel][CPU] Fix num_tokens in CPU rotary embedding kernel (#14667) Thien Tran 2025-03-14 14:47:49 +08:00
9532c49836 [Attention] MLA get rid of materialization (#14770) Lucas Wilkinson 2025-03-14 02:39:02 -04:00
0c2af17c76 [CI] Fix missing example model id in processor test (#14787) Roger Wang 2025-03-13 22:52:15 -07:00
a6e0d096dd [Feature] Add visionarena offline support for benchmark_throughput (#14654) Jennifer Zhao 2025-03-13 21:07:54 -07:00
d3d4956261 [Neuron] flatten test parameterization for neuron attention kernels (#14712) Liangfu Chen 2025-03-13 20:46:56 -07:00
4059adc31b [Misc][Minor] Simplify SamplingParams.__post_init__() (#14772) Nick Hill 2025-03-13 23:44:20 -04:00
f1f632d9ec [ci] Reduce number of tests in fastcheck (#14782) Kevin H. Luu 2025-03-13 20:43:45 -07:00
95d680b862 [Bugfix][IPEX] Add VLLM_CPU_MOE_PREPACK to allow disabling MoE prepack when CPU does not support it (#14681) Thien Tran 2025-03-14 11:43:18 +08:00
fb4c7f8ef0 [Kernel] [V1] Further optimizations to ROCm (Triton) Backend to better handle GQA. (#14431) Thomas Parnell 2025-03-14 04:42:27 +01:00
0b1cfa6180 [Kernel] LoRA - Enable CUDAGraphs for V1 (#14626) Varun Sundar Rabindranath 2025-03-13 23:42:04 -04:00
32ef4983cd [V1] Temporarily disable FlashInfer Rejection Sampler (#14788) Woosuk Kwon 2025-03-13 20:40:35 -07:00
ad19c8a003 [V1] Move OOM check into sampler run (#14728) Roger Wang 2025-03-13 20:40:23 -07:00
2a602b055a forward fix PR 14245, restore build on ROCm 6.2 (#14709) Jeff Daily 2025-03-13 20:40:15 -07:00
7888e1d0a3 [V1] TPU - Enable prefix caching by default (#14773) Alexander Matveev 2025-03-13 23:40:05 -04:00
60c872d4b6 [Doc] Fix small typo in Transformers fallback (#14791) Chen Zhang 2025-03-14 11:33:12 +08:00
3fb17d26c8 [Doc] Fix typo in documentation (#14783) yasu52 2025-03-13 20:33:09 -07:00
d47807ba08 [Attention] Remove slow setattr in MLA (#14769) Lucas Wilkinson 2025-03-13 17:31:14 -04:00
02fcaa3d0a [V1] Detokenizer: Respect Stop Tokens + not include_stop_str_in_output (#14624) afeldman-nm 2025-03-13 15:07:34 -04:00
8a4a2efc6f [V1][Core] using cached vocab_size for Structured Outputs (#14630) Aaron Pham 2025-03-13 14:39:28 -04:00
8e9ffd37d6 [Misc] Clean up processor tests (#14771) Cyrus Leung 2025-03-14 02:25:37 +08:00
01b3fd0af7 [V1][Minor] Minor enhancements on scheduler (#14732) Woosuk Kwon 2025-03-13 08:53:22 -07:00
f53a0586b9 [Bugfix] Fix prompt format of GLM4V (#14539) Cyrus Leung 2025-03-13 19:37:17 +08:00
b1cc4dfef5 [VLM] Support loading InternVideo2.5 models as original InternVLChatModel (#14738) Isotr0py 2025-03-13 18:10:02 +08:00
382403921f [VLM] Support pan-and-scan for Gemma3 multi-modal processor (#14672) Cyrus Leung 2025-03-13 17:23:12 +08:00
a73122de96 [Bugfix] fix benchmark moe (#14653) Jee Jee Li 2025-03-13 16:12:42 +08:00
bd44b812cb [CI/Build] Delete ultravox LoRA test (#14730) Jee Jee Li 2025-03-13 15:57:39 +08:00
55211b01e8 [Bugfix] Fix chunked prefill for GGUF (#14666) Szymon Ożóg 2025-03-13 08:19:03 +01:00
5d043c1685 [Quant] Bamba SupportsQuant (#14698) Kyle Sayers 2025-03-13 00:57:05 -04:00
36d1ccb286 [Quant] BartModel SupportsQuant (#14699) Kyle Sayers 2025-03-13 00:55:59 -04:00
1bc3b739c4 [V1][TPU] Add assertion on multi-step-scheduler (#14707) Siyuan Liu 2025-03-12 21:37:58 -07:00
1bd32bc8dd [Config][Disaggregated] Add timeout configuration for the torch.store and add KVTransferConfig.kv_connector_extra_config (#14367) Mathis Felardos 2025-03-13 04:15:20 +01:00
128bf75283 [BugFix][TritonMLA] Process weights after model loading for GGUF (#14555) TY-AMD 2025-03-13 11:14:36 +08:00
a94a699c3f [ROCm][FP8] Fix for adjustments needed only for fnuz (#14689) Gregory Shtrasberg 2025-03-12 23:14:04 -04:00
ab426ec9c0 Add ray[data] as tpu dependency (#14691) Richard Liu 2025-03-12 20:13:48 -07:00
165290d357 [bugfix] fixup warning message for plugged schedulers for v1 (#14700) Joe Runde 2025-03-12 21:12:13 -06:00
