Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

d419aa5dc4 [V1] Enable TPU V1 backend by default (#17673) Michael Goin 2025-05-06 09:49:49 -04:00
f9bc5a0693 [Bugfix] Fix triton import with local TritonPlaceholder (#17446) Mengqing Cao 2025-05-06 17:53:09 +08:00
05e1f96419 Fix dockerfilegraph pre-commit hook (#17698) Harry Mellor 2025-05-06 09:56:48 +01:00
6eae34533a [Misc] Fix ScalarType float4 naming (#17690) Lucas Wilkinson 2025-05-06 04:07:15 -04:00
63ced7b43f [Doc] Update notes for H2O-VL and Gemma3 (#17219) Cyrus Leung 2025-05-06 15:51:02 +08:00
dc47ba32f8 [Bugfix] Fixed prompt length for random dataset (#17408) Mikhail Podvitskii 2025-05-06 09:00:08 +02:00
edbf2d609e [easy] Fix logspam on PiecewiseBackend errors (#17138) Richard Zou 2025-05-06 02:46:11 -04:00
999328be0d [Model] Add GraniteMoeHybrid 4.0 model (#17497) Stan Wozniak 2025-05-06 06:00:31 +02:00
98834fefaa Update nm to rht in doc links + refine fp8 doc (#17678) Michael Goin 2025-05-05 20:41:14 -04:00
90bd2ae172 [Bugfix] LoRA - Retire unused maxnreg LoRA kernel argument (#17677) Varun Sundar Rabindranath 2025-05-06 06:04:29 +05:30
5941e0b7ea [TPU][V1] Add support for top-logprobs (#17072) Nicolò Lucchesi 2025-05-05 23:20:15 +02:00
9765940824 [TPU] Enable gemma3-27b with TP>1 on multi-chips. (#17335) XiongfeiWei 2025-05-05 14:19:58 -07:00
5ea5c514da [BugFix] Increase timeout for startup failure test (#17642) Nick Hill 2025-05-05 13:53:19 -07:00
d3efde8176 [Benchmarks] Remove invalid option under V1 engine (#17651) Russell Bryant 2025-05-05 16:30:22 -04:00
aea302be6c Use git-path commit in hook (#17616) Thomas J. Fan 2025-05-05 13:55:32 -04:00
cc05b90d86 [Doc] Fix broken cuda installation doc rendering (#17654) Isotr0py 2025-05-06 01:52:40 +08:00
1d0c9d6b2d [Kernel] some optimizations for dense marlin and moe marlin (#16850) Jinzhen Lin 2025-05-06 00:39:30 +08:00
f62cad6431 [Build/CI] Upgrade CUTLASS to 3.9.2 (#17641) Tyler Michael Smith 2025-05-04 22:23:17 -04:00
5394ad7387 [Bugfix] fix KeyError on top logprobs are special tokens (#17637) Chauncey 2025-05-05 10:22:35 +08:00
68e1ee0072 [Bugfix][Easy] Fix whitespace in shm_broadcast.py logging (#17635) Tyler Michael Smith 2025-05-04 22:20:19 -04:00
2858830c39 [Bugfix] Prioritize dtype in root config before checking text config (#17629) Cyrus Leung 2025-05-04 20:43:05 +08:00
d6484ef3c3 Add full API docs and improve the UX of navigating them (#17485) Harry Mellor 2025-05-04 03:42:43 +01:00
46fae69cf0 [Misc] V0 fallback for --enable-prompt-embeds (#17615) Cyrus Leung 2025-05-04 06:59:24 +08:00
f66f1e0fa3 [Bugfix] Fix broken Qwen2.5-omni tests (#17613) Isotr0py 2025-05-04 01:08:14 +08:00
887d7af882 [Core] Gate prompt_embeds behind a feature flag (#17607) Cyrus Leung 2025-05-04 00:19:20 +08:00
a92842454c [Bugfix][ROCm] Using device_type because on ROCm the API is still torch.cuda (#17601) Gregory Shtrasberg 2025-05-03 01:25:47 -04:00
c8386fa61d [Build/CI] Upgrade CUTLASS to 3.9.1 (#17602) Tyler Michael Smith 2025-05-03 01:25:14 -04:00
87baebebd8 [Frontend][TPU] Add TPU default max-num-batched-tokens based on device name (#17508) Chenyaaang 2025-05-02 21:42:44 -07:00
e3d0a1d190 [Quantizaton] [AMD] Add support for running DeepSeek int8 w8a8 MoE on ROCm (#17558) rasmith 2025-05-02 23:41:10 -05:00
d47b605eca Update test requirements to CUDA 12.8 (#17576) 22quinn 2025-05-02 21:40:15 -07:00
22c6f6397f [Neuron][Build] Require setuptools >= 77.0.3 for PEP 639 (#17603) Liangfu Chen 2025-05-02 19:41:59 -07:00
3ec97e2cc5 [release] Add command to clean up Docker containers/images in TPU release machine (#17606) Kevin H. Luu 2025-05-02 18:54:34 -07:00
9b103a1d76 fix typo in logging (#17605) Eric Hartford 2025-05-02 21:04:40 -04:00
b90b0852e9 [easy] Print number of needed GPUs in skip message (#17594) Richard Zou 2025-05-02 18:27:43 -04:00
9352cdb56d [Hardware][AMD] Improve OAM device ID + llama4 Maverick MOE tuning (#16263) Xiaodong Wang 2025-05-02 12:44:19 -07:00
182f40ea8b Add NVIDIA TensorRT Model Optimizer in vLLM documentation (#17561) Zhiyu 2025-05-02 11:36:46 -07:00
3e887d2e0c permute/unpermute kernel for moe optimization (#14568) Caleb_Du 2025-05-03 02:31:55 +08:00
3015d5634e [BugFix][Attention] Fix sliding window attention in V1 giving incorrect results (#17574) (tag: v0.8.5.post1) Lucas Wilkinson 2025-05-02 14:01:38 -04:00
edb5286ea5 [BugFix] Fix Memory Leak (#17567) Robert Shaw 2025-05-02 04:07:03 -04:00
0f87d8f7b2 [BugFix][Attention] Fix sliding window attention in V1 giving incorrect results (#17574) Lucas Wilkinson 2025-05-02 14:01:38 -04:00
4c33d67321 [Bugfix] fix tmp_out and exp_sums dimensions (#17438) Hui Liu 2025-05-02 09:44:07 -07:00
cb234955df [Misc] Clean up input processing (#17582) Cyrus Leung 2025-05-02 23:11:53 +08:00
3a500cd0b6 [doc] miss result (#17589) Reid 2025-05-02 22:04:49 +08:00
868c546da4 Support W8A8 INT8 MoE for compressed-tensors (#16745) Michael Goin 2025-05-02 08:03:32 -06:00
99404f53c7 [Security] Fix image hash collision (#17378) Cyrus Leung 2025-05-02 20:36:39 +08:00
785d75a03b Automatically tell users that dict args must be valid JSON in CLI (#17577) Harry Mellor 2025-05-02 13:24:55 +01:00
6d1479ca4b [doc] add the print result (#17584) Reid 2025-05-02 20:24:45 +08:00
b8b0859b5c add more pytorch related tests for torch nightly (#17422) Yang Wang 2025-05-02 03:29:59 -07:00
d7543862bd [Misc] Rename assets for testing (#17575) Cyrus Leung 2025-05-02 18:29:25 +08:00
c777df79f7 [BugFix] Fix Memory Leak (#17567) Robert Shaw 2025-05-02 04:07:03 -04:00
cc2a77d7f1 [Core] [Bugfix] Add Input Embeddings (#15428) Andrew Sansom 2025-05-02 03:06:39 -05:00
9e2de9b9e9 [Bugifx] Remove TritonPlaceholder from sys.modules (#17317) Isotr0py 2025-05-02 15:45:01 +08:00
109e15a335 Add pt_load_map_location to allow loading to cuda (#16869) Jerry Zhang 2025-05-01 23:23:42 -07:00
f192ca90e6 Fix PixtralHF missing spatial_merge_size (#17571) Michael Goin 2025-05-01 23:14:09 -06:00
f89d0e11bf [Misc] Continue refactoring model tests (#17573) Cyrus Leung 2025-05-02 13:06:08 +08:00
b4003d11fc Check if bitblas is installed during support check (#17572) Michael Goin 2025-05-01 22:32:54 -06:00
292fc59d61 [CI] Actually run tests/kv_transfer/test_disagg.py in CI (#17555) Michael Goin 2025-05-01 22:05:04 -06:00
afcb3f8863 [Attention] MLA move o_proj q_proj into cuda-graph region (#17484) Lucas Wilkinson 2025-05-01 23:16:26 -04:00
afb12e4294 [Doc] note that not all unit tests pass on CPU platforms (#17554) David Xia 2025-05-01 22:57:21 -04:00
24aebae177 [Bugfix] Disable gptq_bitblas for <SM80 to fix GPTQ on V100/T4 (#17541) Michael Goin 2025-05-01 18:59:35 -06:00
39c0813a7f [V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE3 (#17504) qizixi 2025-05-01 16:19:30 -07:00
9b70e2b4c1 [Misc][Tools][Benchmark] Publish script to auto tune server parameters (#17207) Chenyaaang 2025-05-01 12:53:03 -07:00
173daac19d [Bug]change the position of cuda_graph_sizes in dataclasses (#17548) Chen Xia 2025-05-01 11:52:37 -07:00
04f2cfc894 Remove duplicate code from dbrx.py (#17550) sstamenk 2025-05-01 20:51:58 +02:00
811a6c0972 [ROCM] Add gfx950 to the custom attention archs (#16034) Juan Villamizar 2025-05-01 13:18:28 -05:00
9b1769dd9a [Bugfix] Fix lint error (#17547) Cyrus Leung 2025-05-02 02:12:19 +08:00
61c299f81f [Misc]add configurable cuda graph size (#17201) Chen Xia 2025-05-01 11:04:50 -07:00
4acfa3354a [ROCm] update installation guide to include build aiter from source instructions (#17542) Hongxia Yang 2025-05-01 14:01:28 -04:00
88c8304104 [Model] Refactor Ovis2 to support original tokenizer (#17537) Isotr0py 2025-05-02 02:00:53 +08:00
6768ff4a22 Move the last arguments in arg_utils.py to be in their final groups (#17531) Harry Mellor 2025-05-01 18:31:44 +01:00
f2e7af9b86 [CI/Build] Remove awscli dependency (#17532) Cyrus Leung 2025-05-02 00:20:54 +08:00
7423cf0a9b [Misc] refactor example - cpu_offload_lmcache (#17460) Reid 2025-05-01 23:05:24 +08:00
460a2b1100 [torch.compile] Add torch inductor pass for fusing silu_and_mul with subsequent scaled_fp8_quant operations (#10867) Sage Moore 2025-05-01 07:59:28 -07:00
28566d73b3 [ROCm] remove unsupported archs from rocm triton flash-attention supported list (#17536) Hongxia Yang 2025-05-01 10:54:25 -04:00
98060b001d [Feature][Frontend]: Deprecate --enable-reasoning (#17452) Chauncey 2025-05-01 21:46:16 +08:00
f5a3c655b2 [FEAT] [ROCm]: Add Qwen/Qwen3-235B-A22B-FP8 TP4 triton fused moe config (#17535) TJian 2025-05-01 21:37:17 +08:00
7169f87ad0 [doc] add streamlit integration (#17522) Reid 2025-05-01 21:34:02 +08:00
b74d888c63 Fix more broken speculative decode tests (#17450) Huy Do 2025-05-01 06:05:58 -07:00
2007d4d54f [FEAT] [ROCm]: Add Qwen/Qwen3-30B-A3B-FP8 fused moe config for MI300X (#17530) TJian 2025-05-01 21:03:13 +08:00
48e925fab5 [Misc] Clean up test docstrings and names (#17521) Cyrus Leung 2025-05-01 20:19:32 +08:00
1903c0b8a3 [Frontend] Show progress bar for adding requests (#17525) Cyrus Leung 2025-05-01 20:15:32 +08:00
86a1f67a3b [Bugfix][Benchmarks] Allow benchmark of deepspeed-mii backend to select a model (#17285) Teruaki Ishizaki 2025-05-01 20:54:51 +09:00
a257d9bccc Improve configs - ObservabilityConfig (#17453) Harry Mellor 2025-05-01 11:52:05 +01:00
015069b017 [Misc] Optimize the Qwen3_ReasoningParser extract_reasoning_content (#17515) Chauncey 2025-05-01 18:29:01 +08:00
fbefc8a78d [Core] Enable IPv6 with vllm.utils.make_zmq_socket() (#16506) Russell Bryant 2025-05-01 05:38:18 -04:00
26bc4bbcd8 Avoid overwriting vllm_compile_cache.py (#17418) Keyun Tong 2025-05-01 00:30:57 -07:00
3c3d767201 [BugFix] Fix mla cpu - missing 3 required positional arguments (#17494) Lucas Wilkinson 2025-05-01 02:36:52 -04:00
13cf6b6236 [BugFix] fix speculative decoding memory leak when speculation is disabled (#15506) Noah Yoshida 2025-04-30 23:28:17 -07:00
90d0a54c4d [ROCm] Effort to reduce the number of environment variables in command line (#17229) Hongxia Yang 2025-05-01 02:27:06 -04:00
7a0a146c54 [Build] Require setuptools >= 77.0.3 for PEP 639 (#17389) Russell Bryant 2025-05-01 02:25:36 -04:00
7ab643e425 FIxing the AMD test failures caused by PR#16457 (#17511) Alexei-V-Ivanov-AMD 2025-05-01 01:23:07 -05:00
afb4429b4f [CI/Build] Reorganize models tests (#17459) Cyrus Leung 2025-05-01 14:03:08 +08:00
aa4502e7f3 [CI][Bugfix] Fix failing V1 Test due to missing 'cache_salt' arg (#17500) Michael Goin 2025-04-30 22:03:30 -06:00
17b4d85f63 [CI][TPU] Skip structured outputs+spec decode tests on TPU (#17510) Michael Goin 2025-04-30 21:36:20 -06:00
1144a8efe7 [Bugfix] Temporarily disable gptq_bitblas on ROCm (#17411) NaLan ZeYu 2025-05-01 10:51:45 +08:00
08fb5587b4 [Bugfix][ROCm] Fix import error on ROCm (#17495) Gregory Shtrasberg 2025-04-30 22:51:42 -04:00
dbc18e7816 [CI][TPU] Skip Multimodal test (#17488) Siyuan Liu 2025-04-30 19:51:39 -07:00
02bd654846 [Misc] Rename Audios -> Audio in Qwen2audio Processing (#17507) Alex Brooks 2025-04-30 20:51:36 -06:00
200bbf92e8 Bump Compressed Tensors version to 0.9.4 (#17478) Rahul Tuli 2025-04-30 17:24:45 -05:00
81ecf425f0 [v1][Spec Decode] Make sliding window compatible with eagle prefix caching (#17398) Chen Zhang 2025-05-01 02:25:53 +08:00