Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

54cf1cae62 [Misc] Do not print async output warning for v1 (#21151) Woosuk Kwon 2025-07-17 21:57:02 -07:00
5780121c95 [Perf] Add swap_ab to SM90 FP8 non-block CUTLASS moe grouped gemm (#20911) shixianc 2025-07-17 21:34:43 -07:00
c7d8724e78 [Core] FlashInfer CUTLASS fused MoE backend (NVFP4) (#20037) Shu Wang 2025-07-17 23:32:45 -05:00
b38baabcf9 [Doc] Add inplace weights loading example (#19640) 22quinn 2025-07-17 21:12:23 -07:00
89cab4d01f [Attention] Make local attention backend agnostic (#21093) Lucas Wilkinson 2025-07-18 00:10:42 -04:00
b9a21e9173 [Docs] Update supported models documentation with missing models (#20844) Lucia Fang 2025-07-18 11:12:13 +08:00
c4e3b12524 [Docs] Add minimal demo of Ray Data API usage (#21080) Ricardo Decal 2025-07-17 20:09:19 -07:00
8dfb45ca33 [Bugfix] Fix the tensor non-contiguous issue for Flashinfer TRT-LLM backend attention kernel (#21133) elvischenv 2025-07-18 08:35:58 +08:00
8a8fc94639 [Log] Debugging Log with more Information (#20770) Wentao Ye 2025-07-17 20:19:46 -04:00
4de7146351 [V0 deprecation] Remove V0 HPU backend (#21131) Woosuk Kwon 2025-07-17 16:37:36 -07:00
ac9fb732a5 On environments where numa cannot be detected we get 0 (#21115) Eric Curtin 2025-07-17 19:52:17 +01:00
a3a6c695f4 [Misc] Qwen MoE model supports LoRA (#20932) Jee Jee Li 2025-07-18 02:32:52 +08:00
90bd2ab6e3 [Model] Update pooling model interface (#21058) Cyrus Leung 2025-07-18 00:05:40 +08:00
9fb2d22032 [Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE (#20762) ElizaWszola 2025-07-17 15:56:44 +02:00
2d6a38209b [Docs] Move code block out of admonition now that it's short (#21118) Harry Mellor 2025-07-17 14:12:29 +01:00
89e3c4e9b4 [Misc] Avoid unnecessary import (#21106) wangxiyuan 2025-07-17 20:57:41 +08:00
fe8a2c544a [Docs] Improve docstring formatting for FusedMoEParallelConfig.make (#21117) Harry Mellor 2025-07-17 12:13:00 +01:00
4ef00b5cac [VLM] Add Nemotron-Nano-VL-8B-V1 support (#20349) kYLe 2025-07-17 05:07:55 -05:00
5a7fb3ab9e [Model] Add ToolParser and MoE Config for Hunyuan A13B (#20820) Asher 2025-07-17 17:10:09 +08:00
11dfdf21bf [Kernel] DeepGemm MoE : Integrate triton permute / unpermute kernels (#20903) Varun Sundar Rabindranath 2025-07-17 13:40:37 +05:30
fdc5b43d20 [Bugfix]: Fix final_res_batch list index out of range error (#21055) Chauncey 2025-07-17 15:29:09 +08:00
c5b8b5953a [Misc] Fix PhiMoE expert mapping (#21085) Jee Jee Li 2025-07-17 13:47:49 +08:00
4fcef49ec4 [V1] [KVConnector] Fix MultiprocExecutor worker output aggregation (#21048) David Ben-David 2025-07-17 08:29:45 +03:00
8a4e5c5f3c [V1][P/D]Enhance Performance and code readability for P2pNcclConnector (#20906) Zhonghua Deng 2025-07-17 13:13:00 +08:00
76b494444f [Attention] Refactor attention metadata builder interface (#20466) Lucas Wilkinson 2025-07-17 00:44:25 -04:00
28a6d5423d [Bugfix] Fix Machete zero point issue for GPTQ models on SM90 (#21066) Michael Goin 2025-07-16 22:54:45 -04:00
58760e12b1 [TPU] Start using python 3.12 (#21000) XiongfeiWei 2025-07-16 19:37:44 -07:00
a50d918225 [Docker] Allow FlashInfer to be built in the ARM CUDA Dockerfile (#21013) Michael Goin 2025-07-16 22:37:13 -04:00
c9ba8104ed [Bugfix] weight loading use correct tp_group with patch_tensor_parallel_group (#21024) Kevin_Xiong 2025-07-17 10:36:36 +08:00
4e7dfbe7b4 Update PyTorch to torch==2.7.1 for CUDA (#21011) Michael Goin 2025-07-16 22:30:44 -04:00
72ad273582 Remove torch_xla.tpu.version() from pallas.py. (#21065) QiliangCui 2025-07-16 17:25:26 -07:00
01513a334a Support FP8 Quantization and Inference Run on Intel Gaudi (HPU) using INC (Intel Neural Compressor) (#12010) Nir David 2025-07-16 22:33:41 +03:00
ac2bf41e53 [Model] Remove model sampler (#21059) Cyrus Leung 2025-07-17 03:03:37 +08:00
a931b4cdcf Remove Qwen Omni workaround that's no longer necessary (#21057) Harry Mellor 2025-07-16 17:25:23 +01:00
a0f8a79646 [fix] fix qwen image_embeds input (#21049) Avshalom Manevich 2025-07-16 17:17:20 +02:00
18bdcf4113 feat - add a new endpoint get_tokenizer_info to provide tokenizer/chat-template information (#20575) Mac Misiura 2025-07-16 14:52:14 +01:00
1c3198b6c4 [Model] Consolidate pooler implementations (#20927) Cyrus Leung 2025-07-16 21:39:13 +08:00
260127ea54 [Docs] Add intro and fix 1-2-3 list in frameworks/open-webui.md (#19199) Michael Yao 2025-07-16 21:11:38 +08:00
d0dc4cfca4 Fix inadvertently silenced PP tests for mp, add DeepSeek V2/V3 model family to PP tests (#20831) Seiji Eicher 2025-07-16 00:14:49 -07:00
d31a647124 [BugFix] Fix import error on non-blackwell machines (#21020) Lucas Wilkinson 2025-07-16 01:27:29 -04:00
85431bd9ad [TPU] fix kv_cache_update kernel block size choosing logic (#21007) Chengji Yao 2025-07-15 21:39:48 -07:00
c11013db8b [Meta] Llama4 EAGLE Support (#20591) zhiweiz 2025-07-15 21:14:15 -07:00
1eb2b9c102 [CI] update typos config for CI pre-commit and fix some spells (#20919) Peter Pan 2025-07-16 12:12:40 +08:00
6ebf313790 Avoid direct comparison of floating point numbers (#21002) Maximilien de Bayser 2025-07-16 01:12:14 -03:00
cfbcb9ed87 [Voxtral] Add more tests (#21010) Patrick von Platen 2025-07-16 06:11:49 +02:00
76ddeff293 [Doc] Remove duplicate docstring (#21012) Wentao Ye 2025-07-15 23:09:13 -04:00
f46098335b [Bugfix] Fix Mistral3 support on SM100/SM120 (#20998) Michael Goin 2025-07-15 23:08:41 -04:00
e9534c7202 [CI][HPU] update for v0 deprecate by switching to VLLM_TARGET_DEVICE=empty (#21006) Chendi.Xue 2025-07-15 22:07:05 -05:00
7976446015 Add Dockerfile argument for VLLM_USE_PRECOMPILED environment (#20943) Doug Smith 2025-07-15 22:53:57 -04:00
fcb9f879c1 [Bugfix] Correct per_act_token in CompressedTensorsW8A8Fp8MoECutlassM… (#20937) Ming Yang 2025-07-15 19:53:42 -07:00
3ed94f9d0a [Docs] Enhance Anyscale documentation, add quickstart links for vLLM (#21018) Ricardo Decal 2025-07-15 22:46:56 -04:00
fa839565f2 [Misc] Refactor: Improve argument handling for conda command (#20481) Reid 2025-07-16 10:43:19 +08:00
75a99b98bf [Chore] Remove outdated transformers check (#20989) Brayden Zhong 2025-07-15 22:42:40 -04:00
b5c3b68359 [Misc] bump xgrammar version to v0.1.21 (#20992) Chauncey 2025-07-16 10:42:16 +08:00
6cbc4d4bea [Model] Add ModelConfig class for GraniteMoeHybrid to override default max_seq_len_to_capture (#20923) Thomas Parnell 2025-07-16 04:19:10 +02:00
153c6f1e61 [Frontend] Remove print left in FrontendArgs.add_cli_args (#21004) Michael Goin 2025-07-15 22:18:41 -04:00
34cda778a0 [Frontend] OpenAI Responses API supports input image (#20975) Chauncey 2025-07-16 08:59:36 +08:00
30800b01c2 [Nvidia] Integrate SM100 cudnn prefill API to MLA prefill (#20411) Elfie Guo 2025-07-15 17:56:45 -07:00
10be209493 [Bug Fix] get_distributed_init_method should get the ip from get_ip i… (#20889) Chen LI 2025-07-15 14:23:52 -07:00
19c863068b [Frontend] Support cache_salt in /v1/completions and /v1/responses (#20981) Marko Rosenmueller 2025-07-15 23:01:04 +02:00
f29fd8a7f8 [BugFix] fix 3 issues: (1) using metadata for causal-conv1d, (2) indexing overflow in v1 vLLM, and (3) init_states in v0 (#20838) Tuan, Hoang-Trong 2025-07-15 16:08:26 -04:00
ed10f3cea1 [ROCm] warpSize is being made non constexpr in ROCm 7.0 (#20330) Gregory Shtrasberg 2025-07-15 14:01:44 -04:00
b637e9dcb8 Add full serve CLI reference back to docs (#20978) Harry Mellor 2025-07-15 18:42:30 +01:00
1e36c8687e [Deprecation] Remove nullable_kvs (#20969) Harry Mellor 2025-07-15 18:21:50 +01:00
5bac61362b Configure Gemini (#20971) Harry Mellor 2025-07-15 17:37:05 +01:00
313ae8c16a [Deprecation] Remove everything scheduled for removal in v0.10.0 (#20979) Harry Mellor 2025-07-15 16:57:53 +01:00
c847e34b39 [CI/Build] Fix wrong path in Transformers Nightly Models Test (#20994) Cyrus Leung 2025-07-15 23:53:16 +08:00
e7e3e6d263 Voxtral (#20970) Patrick von Platen 2025-07-15 16:35:30 +02:00
4ffd963fa0 [v1][core] Support for attention free models (#20811) Christian Pinto 2025-07-15 15:20:01 +01:00
56fe4bedd6 [Deprecation] Remove TokenizerPoolConfig (#20968) Harry Mellor 2025-07-15 15:00:50 +01:00
d91278181d [doc] Add more details for Ray-based DP (#20948) Rui Qiao 2025-07-15 05:37:12 -07:00
20149d84d9 [MISC] Add init files for python package (#20908) Li Wang 2025-07-15 20:16:33 +08:00
3534c39a20 [V1] [Hybrid] Refactor mamba state shape calculation; enable V1 via cli (#20840) Thomas Parnell 2025-07-15 13:04:35 +02:00
c586b55667 [TPU] Optimize kv cache update kernel (#20415) Yifei Teng 2025-07-15 03:56:43 -07:00
33d560001e [Docs] Improve documentation for ray cluster launcher helper script (#20602) Ricardo Decal 2025-07-15 06:55:45 -04:00
f148c44c6a [frontend] Refactor CLI Args for a better modular integration (#20206) kourosh hakhamaneshi 2025-07-15 02:23:42 -07:00
235bfd5dfe [Docs] Improve documentation for RLHF example (#20598) Ricardo Decal 2025-07-15 04:54:10 -04:00
68d28e37b0 [frontend] Add --help=page option for paginated help output (#20961) Reid 2025-07-15 15:42:00 +08:00
37a7d5d74a [Misc] Refactor AllReduceFusionPass. Remove parameter (#20918) Ilya Markov 2025-07-15 08:57:40 +02:00
d4d309409f Implement Async Scheduling (#19970) Woosuk Kwon 2025-07-14 23:01:46 -07:00
85bd6599e4 [Model] Add AutoWeightsLoader support for BERT, RoBERTa (#20534) Jennifer He 2025-07-15 01:34:24 -04:00
91b3d190ae [cold start] replace VLLM_COMPILE_DEPYF with debug_dump_dir (#20940) Boyuan Feng 2025-07-14 22:02:17 -07:00
fc017915f5 [Doc] Clearer mistral3 and pixtral model support description (#20926) Isotr0py 2025-07-15 12:56:53 +08:00
9ad0a4588b [Bugfix] Switch bailout logic for kv-cache-dtype with SM100 Flashinfer (#20934) Pavani Majety 2025-07-14 20:27:50 -07:00
016b8d1b7f Enabled BnB NF4 inference on Gaudi (#20172) Ruheena Suhani Shaik 2025-07-15 08:56:08 +05:30
80305c1b24 [CI] Fix flaky test_streaming_response test (#20913) Nicolò Lucchesi 2025-07-15 05:15:15 +02:00
37e2ecace2 feat: add image zoom to improve image viewing experience (#20763) Reid 2025-07-15 11:14:23 +08:00
054c8657e3 [Docs] Add Kuberay to deployment integrations (#20592) Ricardo Decal 2025-07-14 23:13:55 -04:00
d4170fad39 Use w8a8 quantized matmul Pallas kernel (#19170) XiongfeiWei 2025-07-14 20:06:33 -07:00
946aadb4a0 [CI/Build] Split Entrypoints Test into LLM and API Server (#20945) Michael Goin 2025-07-15 11:44:18 +09:00
bcdfb2a330 [Bugfix] Fix incorrect dispatch for CutlassBlockScaledGroupedGemm and DeepGEMM (#20933) Michael Goin 2025-07-15 10:42:17 +09:00
ba8c300018 [BugFix] VLLM_DISABLE_COMPILE_CACHE=1 should disable all reads and writes from the cache (#20942) Richard Zou 2025-07-14 21:26:18 -04:00
8cdc371217 SM100 Cutlass MLA decode with unrestricted num_heads (< 128) for DeepSeek TP (#20769) Alexander Matveev 2025-07-14 21:06:38 -04:00
61e20828da Fall back if flashinfer comm module not found (#20936) Yong Hoon Shin 2025-07-14 16:11:18 -07:00
55e1c66da5 [Docs] remove outdated performance benchmark (#20935) Kuntai Du 2025-07-14 15:14:17 -07:00
86f3ac21ce Fix overflow indexing in causal_conv1d kernel (#20938) Thomas Parnell 2025-07-14 23:43:07 +02:00
149f2435a5 [Misc] Relax translations tests (#20856) Nicolò Lucchesi 2025-07-14 22:08:36 +02:00
c0569dbc82 [Misc] ModularKernel : Perform WeightAndReduce inside TritonExperts & DeepGemmExperts (#20725) Varun Sundar Rabindranath 2025-07-15 01:17:16 +05:30
8bb43b9c9e Add benchmark dataset for mlperf llama tasks (#20338) Michael Goin 2025-07-15 04:10:07 +09:00
559756214b Change default model to Qwen3-0.6B (#20335) Tyler Michael Smith 2025-07-14 12:54:52 -04:00
