biondizzle/vllm

Commit Graph

9112b443a0 [Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011) Siyuan Liu 2025-06-02 17:06:20 -07:00
c57d577e8d add an absolute path for run.sh (#18258) Calvin Chen 2025-06-03 03:38:23 +08:00
ca2f6b9c30 [Bugfix][Model] Attempt to fix eagle in V0. (#18978) Gregory Shtrasberg 2025-06-02 11:15:53 -04:00
20133cfee2 [Frontend] enable custom logging for the uvicorn server (OpenAI API server) (#18403) Frαnçois 2025-06-02 17:04:23 +02:00
ebb1ec9318 [Model] enable data parallel for Llama4 vision encoder (#18368) jennyyyyzhen 2025-06-02 04:22:54 -07:00
5b168b6d7a [doc] add pytest tips (#19010) Reid 2025-06-02 19:07:26 +08:00
9760fd8f6a [Core] Support inplace model weights loading (#18745) 22quinn 2025-06-02 02:38:50 -07:00
b9f61e1387 [Bugfix][Nixl] Fix DP Metadata Handshake (#19008) Robert Shaw 2025-06-01 23:30:41 -04:00
d6fd3a33b8 [Misc] reuse num_tokens_across_dp of get_dp_padding to avoid unnecessary dp all reduce in set_forward_context (#18935) zhrrr 2025-06-02 03:41:18 +08:00
432ec9926e [doc] wrong output (#19000) Reid 2025-06-01 19:26:14 +08:00
2b102d51ad [BugFix] Fix incorrect metrics shutdown error log message (#18992) Nick Hill 2025-05-31 20:42:23 -07:00
aa54a7bf7b [BugFix] fix data parallel construct ipv6 url addres (#18991) rongfu.leng 2025-06-01 11:42:10 +08:00
2ad6194a02 Let max_num_batched_tokens use human_readable_int for large numbers (#18968) Michael Goin 2025-05-31 23:41:29 -04:00
c594cbf565 [doc] small fix - mkdocs (#18996) Reid 2025-06-01 11:23:43 +08:00
a35ca765a5 [LoRA] Support dynamically initialize packed_modules_mapping for VLM with arbitrary components (#18987) Isotr0py 2025-06-01 11:06:57 +08:00
6aa8f9a4e7 [Core] Rework dtype resolution (#18751) Cyrus Leung 2025-06-01 11:04:23 +08:00
1bc86a3da1 [Bugfix] Fix EAGLE3 broken logits (#18909) Benjamin Chislett 2025-05-31 22:58:07 -04:00
bbfa0c61d1 [Misc][Benchmark] Add support for CustomDataset (#18511) Ekagra Ranjan 2025-05-31 15:07:38 -04:00
20079c6e36 [Misc] add return token strs for tokenize (#18941) Reid 2025-06-01 02:00:11 +08:00
9a1b9b99d7 [BugFix] Fix multi-node offline data-parallel (#18981) Nick Hill 2025-05-31 08:34:52 -07:00
8bf507d766 [P/D] NixlConnector use cache device index for memory registration (#18969) ptarasiewiczNV 2025-05-31 17:19:18 +02:00
306d60401d [ROCm][Kernel] Add gfx950 support for skinny gemms (#18010) Charlie Fu 2025-05-31 09:40:05 -05:00
f2c3f66d59 [Bugfix] Fix for issue 17396 (#18773) Fred Reiss 2025-05-31 04:58:17 -07:00
0f5e0d567e [FEAT][ROCm] Add AITER grouped topk for DeepSeekV2 (#18825) vllmellm 2025-05-31 18:39:31 +08:00
c55d804672 [BugFix] Pydantic part 2 (#18911) Luka Govedič 2025-05-31 06:39:28 -04:00
749f5bdd38 [doc] fix the list rendering issue - security.md (#18982) Reid 2025-05-31 18:39:21 +08:00
2a50ef5760 [Neuron] Add Multi-Modal model support for Neuron (#18921) Satyajith Chilappagari 2025-05-31 03:39:11 -07:00
b8b904795d fix security issue of logging llm output (#18980) Lucia Fang 2025-05-31 03:38:56 -07:00
ba5111f237 [Bugfix]: Fix the incompatibility issue with Structured Outputs when Thinking is disabled (#18879) Chauncey 2025-05-31 17:20:54 +08:00
1e123529d7 [Misc] Fix estimated max model len msg (#18966) Yong Hoon Shin 2025-05-31 01:43:44 -07:00
dff80b0e42 [Frontend] Add rerank support to run_batch endpoint (#16278) Pooya Davoodi 2025-05-31 00:40:01 -07:00
7782464a17 create util function for batched arange (#18937) Yu Guo 2025-05-30 22:50:38 -07:00
0f71e24034 [Docs] Correct multiprocessing design doc (#18964) Lukas Geiger 2025-05-31 02:30:15 +01:00
1dab4d5718 Tool parser regex timeout handling (#18960) Will Eaton 2025-05-30 17:02:54 -04:00
7f21e8052b [Misc] add group_size is -1 in awq quantization (#18910) rongfu.leng 2025-05-31 01:34:22 +08:00
5a8641638a [VLM] Add PP support and fix GPTQ inference for Ovis models (#18958) Isotr0py 2025-05-31 01:11:44 +08:00
f49239cb45 Benchmark script for fp8 vs bf16 gemm (#17126) Michael Goin 2025-05-30 12:56:11 -04:00
5fbbfe9a4c [BugFix] FA2 MLA Accuracy Issue (#18807) v0.9.0.1 Lucas Wilkinson 2025-05-28 04:59:39 -04:00
2dbe8c0774 [Perf] API-server scaleout with many-to-many server-engine comms (#17546) Nick Hill 2025-05-30 08:17:00 -07:00
84ec470fca Improve "failed to get the hash of the compiled graph" error (#18956) Richard Zou 2025-05-30 11:00:54 -04:00
b29ca5c4d5 [Docs] Update SECURITY.md with link to our security guide (#18961) Russell Bryant 2025-05-30 10:37:27 -04:00
ec6833c5e9 [doc] show the count for fork and watch (#18950) Reid 2025-05-30 21:45:59 +08:00
e1fadf1197 [Feature] minicpm eagle support (#18943) Shawn Huang 2025-05-30 21:45:56 +08:00
43ff405b90 [CI/Build] remove regex from build dependencies (#18945) Daniele 2025-05-30 13:02:50 +02:00
fba02e3bd1 [Bugfix][TPU] Fix tpu model runner testcase failure (#18810) Carol Zheng 2025-05-30 10:04:03 +00:00
4577fc9abb [Misc]Fix typo (#18947) Always-Naive 2025-05-30 17:21:35 +08:00
5f1d0c8118 [Bugfix][Failing Test] Fix test_vllm_port.py (#18618) Rabi Mishra 2025-05-30 14:43:47 +05:30
c3bb9f2331 [Model] Use in-place adds in SigLIP (#18922) Lukas Geiger 2025-05-30 10:12:59 +01:00
8f8900cee9 [doc] add mkdocs doc (#18930) Reid 2025-05-30 15:58:44 +08:00
6acb7a6285 [Misc]Fix benchmarks/README.md for speculative decoding (#18897) Rabi Mishra 2025-05-30 13:28:04 +05:30
4f4a6b844a [Deprecation] Remove mean pooling default for Qwen2EmbeddingModel (#18913) Cyrus Leung 2025-05-30 14:53:37 +08:00
4d0a1541be [Bugfix] Remove NVFP4 scales assertions to fix load_format=dummy (#18861) Michael Goin 2025-05-30 01:37:36 -04:00
77b6e74fe2 [ROCm] Remove unnecessary assertion of max_model_len in ROCM_AITER_MLA attention backend. (#18938) vllmellm 2025-05-30 13:33:17 +08:00
5acf828d99 [docs] fix: fix markdown syntax (#18927) H 2025-05-29 22:20:48 -07:00
3987e2ae96 [Model] Use AutoWeightsLoader for mamba2 (#18918) iLeGend 2025-05-30 12:50:10 +08:00
77164dad5e [Bugfix] Consistent ascii handling in tool parsers (#18883) Chauncey 2025-05-30 12:44:43 +08:00
3de3eadf5b improve the robustness of parsing vlms config in AutoRound (#18894) Wenhua Cheng 2025-05-30 10:24:47 +08:00
3132290a14 [TPU][CI/CD] Clean up docker for TPU tests. (#18926) Carol Zheng 2025-05-30 02:24:19 +00:00
1aa2f81b43 [Misc] Update type annotation for rotary embedding base (#18914) Cyrus Leung 2025-05-30 10:17:01 +08:00
d54af615d5 [Bugfix] Fix PP default fallback behavior for V1 (#18915) Michael Goin 2025-05-29 22:13:17 -04:00
a1cc9f33a3 [TPU] remove transpose ops in moe kernel (#18923) Chengji Yao 2025-05-29 16:00:11 -07:00
a521ef06e5 Use standalone_compile by default in torch >= 2.8.0 (#18846) Richard Zou 2025-05-29 18:41:58 -04:00
64eaf5fe05 [P/D] NixlConnector DP fixes (#18903) Will Eaton 2025-05-29 14:08:40 -04:00
d1d61f3351 [BugFix] Make DP work with connector-delayed new requests (#18559) Nick Hill 2025-05-29 11:04:18 -07:00
32ce3cf7c9 [V1] Allocate kv_cache with stride order for V1 (#18775) Nicolò Lucchesi 2025-05-29 19:54:16 +02:00
d58f9c7f7a [Misc] Remove duplicate init for self.vllm_config (#18896) CYJiang 2025-05-30 01:26:07 +08:00
c29034037d [Deprecation] Disallow pos-args other than model when initializing LLM (#18802) Cyrus Leung 2025-05-30 00:36:58 +08:00
1b7cfd5a36 [ROCm][V0][Attention] Revert to the previous FA triton kernel (#18226) Gregory Shtrasberg 2025-05-29 12:13:18 -04:00
da4b69d0b4 [Attention][V1] Toggle for v1 attention backend (#18275) Gregory Shtrasberg 2025-05-29 10:48:24 -04:00
c9479b2920 [Bugfix] Fix the failing gte embedding test (#18720) Isotr0py 2025-05-29 22:39:25 +08:00
6f2909405e [Doc] Fix codeblocks formatting in LoRA adapters documentation (#18907) Hyogeun Oh (오효근) 2025-05-29 23:38:55 +09:00
b169d5f7b6 [Misc][Tools][Benchmark] Add benchmark_serving supports for llama.cpp. (#18692) Duyi-Wang 2025-05-29 20:02:08 +08:00
f8977c233f Fix an error in dummy weight loading for quantization models (#18855) Chenyaaang 2025-05-29 03:07:20 -07:00
f274581f44 [BugFix] Update pydantic to fix error on python 3.10 (#18852) Luka Govedič 2025-05-29 06:05:46 -04:00
0b1447f890 [Bugfix] Ensure tensors are contiguous during serialisation (#18860) Lukas Geiger 2025-05-29 11:05:20 +01:00
24d0ef8970 [Misc] Replace TODO in serving transcription (#18895) Nicolò Lucchesi 2025-05-29 11:58:14 +02:00
7fcfd954ff [Bugfix] Fix misleading information in the documentation (#18845) Jee Jee Li 2025-05-29 17:54:14 +08:00
e740d07f07 [doc] add CLI doc (#18871) Reid 2025-05-29 17:51:36 +08:00
a652e71dd0 [Doc] Remove redundant spaces from compatibility_matrix.md (#18891) Michael Yao 2025-05-29 17:51:20 +08:00
34d6c447c4 [LoRA] Add LoRA support for InternVL (#18842) Jee Jee Li 2025-05-29 16:46:24 +08:00
972eddf7c9 [Neuron] Add multi-LoRA support for Neuron. (#18284) Satyajith Chilappagari 2025-05-29 01:41:22 -07:00
fd7bb88d72 Fixes a dead link in nightly benchmark readme (#18856) Brent Salisbury 2025-05-29 00:41:39 -04:00
3c49dbdd03 Skip device and quant Pydantic validation to make plugin device work (#18843) Yikun Jiang 2025-05-29 11:12:30 +08:00
1661a9c28f [Doc][Neuron] Update documentation for Neuron (#18868) aws-elaineyz 2025-05-28 19:44:01 -07:00
8e882ffdc0 [Bugfix][TPU] fix moe custom kernel import (#18853) Chengji Yao 2025-05-28 19:34:19 -07:00
26b4fa45be Add ability to use CUDAGraphs with use_inductor=False (#17345) Richard Zou 2025-05-28 22:16:52 -04:00
515b413ebf Prevent the cross-encoder logic from being applied to classification tasks (#18838) Maximilien de Bayser 2025-05-28 23:16:17 -03:00
269d901734 [Bugfix][ROCm] fix the power of 2 exception from triton_unified_attention.py when running llama4 models and unit test fix (#18100) Hongxia Yang 2025-05-28 19:21:46 -04:00
7951d78738 [Core] Enable CUDA graphs for DP + All2All kernels (#18724) Varun Sundar Rabindranath 2025-05-28 18:55:30 -04:00
6dbe5b5c93 Remove checks for None for fields which should never be None (#17985) Harry Mellor 2025-05-28 22:32:19 +01:00
643622ba46 [Hardware][TPU][V1] Multi-LoRA Optimisations for the V1 TPU backend (#15655) Akshat Tripathi 2025-05-28 20:59:09 +01:00
a09c7ca9f2 [Chore][Spec Decode] Update check NoneType instead of assigning variables (#18836) Aaron Pham 2025-05-28 14:57:19 -04:00
0e98964e94 [V1][Metrics] Remove metrics that were deprecated in 0.8 (#18837) Mark McLoughlin 2025-05-28 19:54:12 +01:00
c68b5c63eb [Misc] fix olmoe model layer can't laod in tp gt 1 (#18828) rongfu.leng 2025-05-29 01:36:21 +08:00
fced756923 [Chore] update ty configuration (#18839) Aaron Pham 2025-05-28 11:59:11 -04:00
321331b8ae [Core] Add Lora Support to Beam Search (#18346) Alex Brooks 2025-05-28 09:58:24 -06:00
6e4cea1cc5 decrement server_load on listen for disconnect (#18784) daniel-salib 2025-05-28 07:15:12 -07:00
435fa95444 [Frontend] add run batch to CLI (#18804) Reid 2025-05-28 22:08:57 +08:00
4c2b38ce9e Enable Pydantic mypy checks and convert configs to Pydantic dataclasses (#17599) Harry Mellor 2025-05-28 13:46:04 +01:00
d781930f90 [Platform][Dist] Make torch distributed process group extendable (#18763) Mengqing Cao 2025-05-28 18:52:34 +08:00
