Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

da31f6ad3d Revert precompile wheel changes (#22055) Simon Mo 2025-08-01 01:26:24 -07:00
98df153abf [Frontend] Align tool_choice="required" behavior with OpenAI when tools is empty (#21052) Sungyoon Jeong 2025-08-01 16:54:17 +09:00
e0f63e4a35 [Core] Avoid repeated len(block_token_ids) check in hash_request_tokens (#21781) Zebing Lin 2025-08-01 03:23:29 -04:00
b4e081cb15 [Bugfix] Disable multi-modal preprocessor cache for DP (#21896) Cyrus Leung 2025-08-01 15:03:56 +08:00
79731a79f0 [Doc] Fix a syntax error of example code in structured_outputs.md (#22045) Hongsheng Liu 2025-08-01 15:01:22 +08:00
53d7c39271 Update sampling_metadata.py (#21937) Aviad Rossmann 2025-08-01 09:23:18 +03:00
61dcc280fa [Doc] Add Voxtral to Supported Models page (#22059) Cyrus Leung 2025-08-01 14:10:56 +08:00
0f46a780d4 [Model] [Quantization] Support quantization for Gemma3n (#21974) Kyle Sayers 2025-08-01 01:45:15 -04:00
e1a7fe4af5 [BugFix] fix: aot passes kvcache dtype information (#19750) Mickaël Seznec 2025-08-01 07:45:02 +02:00
82de9b9d46 [Misc] Automatically resolve HF processor init kwargs (#22005) Cyrus Leung 2025-08-01 13:44:10 +08:00
ad57f23f6a [Bugfix] Fix: Fix multi loras with tp >=2 and LRU cache (#20873) Charent 2025-08-01 10:48:13 +08:00
3700642013 [Refactor] Remove Duplicate per_block_cast_to_fp8, Remove Dependencies of DeepGEMM (#21787) Wentao Ye 2025-07-31 21:13:27 -04:00
0bd409cf01 Move flashinfer-python to optional extra vllm[flashinfer] (#21959) Michael Goin 2025-07-31 21:02:11 -04:00
e360316ab9 Add DeepGEMM to Dockerfile in vllm-base image (#21533) Matthew Bonanni 2025-07-31 21:01:55 -04:00
c3e0e9337e [Feature] Add Flashinfer MoE Support for Compressed Tensor NVFP4 (#21639) Wentao Ye 2025-07-31 18:26:11 -04:00
6e672daf62 Add FlashInfer allreduce RMSNorm Quant fusion (#21069) Ilya Markov 2025-07-31 22:58:38 +02:00
2dff2e21d9 [Bugfix] Fix MTP weight loading (#21941) Benjamin Chislett 2025-07-31 16:33:53 -04:00
71470bc4af [Misc] Add unit tests for chunked local attention (#21692) Yong Hoon Shin 2025-07-31 11:39:16 -07:00
9e0726e5bf [Meta] Official Eagle mm support, first enablement on llama4 (#20788) zhiweiz 2025-07-31 10:35:07 -07:00
53c21e492e Update torch_xla pin to 20250730 (#21956) XiongfeiWei 2025-07-31 10:26:43 -07:00
0780bb5783 Removing amdproduction Tests (#22027) Alexei-V-Ivanov-AMD 2025-07-31 11:53:27 -05:00
58bb902186 fix(setup): improve precompiled wheel setup for Docker builds (#22025) Doug Smith 2025-07-31 12:52:48 -04:00
7349d5268b [ez] Remove a trailing space from compilation/decorators.py (#22028) Zhengxu Chen 2025-07-31 12:46:07 -04:00
9484641616 [Model] Add step3 vl (#21998) Song 2025-07-31 23:19:06 +08:00
207b750e19 [NVIDIA] Add SM100 Flashinfer MoE per tensor scale fp8 backend (#21458) amirkl94 2025-07-31 16:00:01 +03:00
5daffe7cf6 [BugFix] Fix case where collective_rpc returns None (#22006) Nick Hill 2025-07-31 13:51:37 +01:00
2836dd73f1 [Model][CI] Let more pooling models support v1 (#21747) wang.yuqi 2025-07-31 16:51:15 +08:00
d2aab336ad [CI/Build] get rid of unused VLLM_FA_CMAKE_GPU_ARCHES (#21599) Daniele 2025-07-31 09:00:08 +02:00
9532a6d563 [Deprecation] Remove deprecated args and methods (#21907) Cyrus Leung 2025-07-31 14:46:38 +08:00
3e36fcbee6 [Bugfix]: fix metadata file copy in test_sharded_state_loader (#21830) Ning Xie 2025-07-31 14:22:11 +08:00
055bd3978e [CI Bugfix] Fix CI OOM for test_shared_storage_connector_hashes (#21973) Michael Goin 2025-07-30 23:45:29 -04:00
0f7919fca0 [Misc] Expand SUPPORTED_HIDDEN_SIZES for DeepEP low-latency kernels (#21818) Jee Jee Li 2025-07-31 11:41:12 +08:00
61445453df [UX] Rename CUTLASS_MLA_VLLM_V1 to CUTLASS_MLA (#21966) Michael Goin 2025-07-30 23:40:34 -04:00
ec02e536df [Bugfix] Relax lang pin for voxtral (#21833) Sanchit Gandhi 2025-07-31 04:38:52 +01:00
9cb497bfa3 [Example] Add async_llm_streaming.py example for AsyncLLM streaming in python (#21763) Michael Goin 2025-07-30 20:39:46 -04:00
ca9e2be3ed [Core] Move EngineCoreRequest to Request conversion out of EngineCore (#21627) Zebing Lin 2025-07-30 18:00:54 -04:00
601f856d56 [Bugfix] Fix None value handling in trace span creation for cancelled requests (#20272) Bram 2025-07-30 14:44:02 -07:00
287f527f54 [Feature] Add async tensor parallelism for scaled mm (#20155) cascade 2025-07-30 14:23:41 -07:00
f12d9256b3 [Misc] Use dracut on CentOS and skip clone if repo exists for EP kernel installation (#21635) Ming Yang 2025-07-30 13:15:06 -07:00
b9b753e7a7 For VLLM_USE_PRECOMPILED, only compiled .so files should be extracted (#21964) Doug Smith 2025-07-30 16:04:40 -04:00
56bd537dde [Misc] Support more collective_rpc return types (#21845) Nick Hill 2025-07-30 18:20:20 +01:00
8f0d516715 [TPU] Support Pathways in vLLM (#21417) wenxindongwork 2025-07-30 10:02:12 -07:00
f4135232b9 feat(distributed): add get_required_kvcache_layout class method to kv connector api (#20433) wxsm 2025-07-31 00:41:51 +08:00
4904e53c32 [Bugfix] SharedStorage Connector for V1 PD multimodal (#21611) Chenguang Zheng 2025-07-31 00:18:37 +08:00
004203e953 [CI/Build] Fix registry tests (#21934) Cyrus Leung 2025-07-31 00:10:41 +08:00
5c765aec65 [Bugfix] Fix TypeError in scheduler when comparing mixed request_id types (#21816) 633WHU 2025-07-30 23:54:44 +08:00
ad510309ee Override attention metadata for fast prefill in some KV sharing setups (#21590) Yong Hoon Shin 2025-07-30 08:54:15 -07:00
366f6b3a4d [Bugfix] Fix multi-api server not working for text models (#21933) Cyrus Leung 2025-07-30 23:42:05 +08:00
6e599eebe8 [Bugfix] Fix OOM tests in initialization test (#21921) Isotr0py 2025-07-30 22:35:47 +08:00
88edf5994c [Docs] Reduce the size of the built docs (#21920) Harry Mellor 2025-07-30 15:35:08 +01:00
ff08e51940 [NVIDIA] Fix Llama4 Scout FP4 functionality issues (#21499) Po-Han Huang (NVIDIA) 2025-07-30 22:33:40 +08:00
8f4a1c9a04 [Misc] Improve code readability of KVCacheManager (#21673) Ruixiang Tan 2025-07-30 22:20:43 +08:00
36ede45989 Reduce time wasted in GitHub Actions using concurrency (#21919) Harry Mellor 2025-07-30 15:18:02 +01:00
0e40b26073 [CI/Build] Only run markdownlint in CI (#21892) Cyrus Leung 2025-07-30 22:17:14 +08:00
0271c2ff2f [Test] Add Benchmark and Unit Test for per_token_group_quant (#21860) Wentao Ye 2025-07-30 10:15:02 -04:00
e91d3c9cda [misc] skip p2p check by default (#21904) youkaichao 2025-07-30 22:05:04 +08:00
bf668b5bf5 [Feature] Support multiple api keys in server (#18548) Yan Pashkovsky 2025-07-30 15:03:23 +01:00
da3e0bd6e5 [Bugfix] we should use metavar is not choices (#21902) rongfu.leng 2025-07-30 21:51:58 +08:00
fcfd1eb9c5 [Doc] Remove vLLM prefix and add citation for PagedAttention (#21910) Cyrus Leung 2025-07-30 21:36:34 +08:00
d979dd6beb [Feature][EPLB] Add eplb support for Qwen3 (#20815) aladerran 2025-07-30 21:27:57 +08:00
b876860c62 [Hardware][CPU] Build fix for ARM without BF16 (#21848) Eric Curtin 2025-07-30 14:22:00 +01:00
13986365a9 Add @patrickvonplaten as maintainer of mistral's related files. (#21928) Patrick von Platen 2025-07-30 14:42:51 +02:00
5c8fe389d6 [Docs] Fix the example code of streaming chat completions in reasoning (#21825) Hongsheng Liu 2025-07-30 20:11:58 +08:00
5bbaf492a6 [Doc] Update partial support (#21916) Cyrus Leung 2025-07-30 16:32:39 +08:00
533db0935d [benchmark] add max-concurrency in result table (#21095) Peter Pan 2025-07-30 16:15:43 +08:00
fc91da5499 [Model] Remove DSV2 unused code (#21903) Jee Jee Li 2025-07-30 15:55:03 +08:00
547795232d [Tests] Fixing bug inside MultiModalProfiler. (#21842) Varun Vinayak Shenoy 2025-07-30 00:44:15 -07:00
30ef30ed5a [CI] rollback lint-and-deploy pipeline using amd machine (#21912) Kebe 2025-07-30 15:37:59 +08:00
02f82fe438 [Doc] Update Intern-S1 info (#21908) Jee Jee Li 2025-07-30 14:58:57 +08:00
2ca5f82c2a [Misc] Remove redundant config definitions (#21891) Cyrus Leung 2025-07-30 14:54:18 +08:00
6f8d261882 Update vLLM Benchmark Suite for Xeon based on 0.9.2 release (#21486) Louie Tsai 2025-07-29 22:57:03 -07:00
4cd7fe6cea [Docs] Expand introduction to Ray in Multi-node deployment section (#21584) Ricardo Decal 2025-07-29 22:07:28 -07:00
16f3250527 [CI/Build] Fix pre-commit failure in docs (#21897) Cyrus Leung 2025-07-30 12:53:08 +08:00
e3bc17ceea Add @sighingnow as maintainer of qwen's related files. (#21895) Tao He 2025-07-30 12:30:44 +08:00
05cbbe20c5 [XPU] use ZE_AFFINITY_MASK for device select on xpu (#21815) Kunshang Ji 2025-07-30 11:56:14 +08:00
65f311ce59 [Frontend] Add LLM.reward specific to reward models (#21720) wang.yuqi 2025-07-30 11:56:03 +08:00
1b0a155534 [Perf] Using __nv_fp8_e4m3 instead of c10::e4m3 for per_token_group_quant (#21867) Wentao Ye 2025-07-29 23:50:46 -04:00
44bc46da60 [Bugfix] Actually disable processing cache when API server is scaled out (#21839) Cyrus Leung 2025-07-30 11:36:04 +08:00
b7b23da4d2 [Bugfix] Fix comment typo of get_num_common_prefix_blocks() (#21827) MingzhenHan 2025-07-30 11:35:33 +08:00
fdde18229e [Bugfix] Fix shape mismatch assertion error when loading Gemma3n model with BitsAndBytes quantization (#21808) Areeb Syed 2025-07-30 09:05:21 +05:30
b917da442b Expose PyTorch profiler configuration to environment variables (#21803) Csrayz 2025-07-30 10:46:31 +08:00
fb58e3a651 [Docs] Update docker.md with HF_TOKEN, new model, and podman fix (#21856) Michael Goin 2025-07-29 22:45:41 -04:00
76080cff79 [DOC] Fix path of v1 related figures (#21868) Chen Zhang 2025-07-29 19:45:18 -07:00
ba5c5e5404 [Docs] Switch to better markdown linting pre-commit hook (#21851) Harry Mellor 2025-07-30 03:45:08 +01:00
555e7225bc [v1][attention] Support Hybrid Allocator + FlashInfer (#21412) Chen Zhang 2025-07-29 18:45:29 -07:00
0e36abf993 [Bugfix] Correct max tokens for non-contiguous embeds (#21798) milesial 2025-07-29 18:16:25 -07:00
452b2a3180 [ci] mark blackwell test optional for now (#21878) Simon Mo 2025-07-29 18:03:27 -07:00
0d0cc9e150 [ci] add b200 test placeholder (#21866) Simon Mo 2025-07-29 17:11:50 -07:00
9266d98048 [BugFix] Fix interleaved sliding window not set for Gemma3n (#21863) Yong Hoon Shin 2025-07-29 16:34:19 -07:00
176bbce1db Revert "[AMD][CI/Build] Fix the AMD issue caused by inappropriate of symbol exposure (#21647)" (#21850) Gregory Shtrasberg 2025-07-29 17:56:29 -04:00
a1873db23d docker: docker-aware precompiled wheel support (#21127) Doug Smith 2025-07-29 17:45:19 -04:00
a33ea28b1b Add flashinfer_python to CUDA wheel requirements (#21389) Michael Goin 2025-07-29 15:51:58 -04:00
7b49cb1c6b [Doc] update Contributing page's testing section (#18272) David Xia 2025-07-29 13:32:46 -04:00
f03e9cf2bb [Doc] Add FusedMoE Modular Kernel Documentation (#21623) Varun Sundar Rabindranath 2025-07-29 23:02:30 +05:30
37f86d9048 [Docs] use uv in GPU installation docs (#20277) David Xia 2025-07-29 13:32:06 -04:00
58b11b24a6 [Bugfix] Fix workspace buffer None issue for Flashinfer TRTLLM Backend (#21525) elvischenv 2025-07-29 22:34:00 +08:00
ad341c5194 [Bugfix]fix mixed bits and visual language model quantization in AutoRound (#21802) Wenhua Cheng 2025-07-29 22:26:31 +08:00
759b87ef3e [TPU] Add an optimization doc on TPU (#21155) Brittany 2025-07-29 07:23:19 -07:00
f693b067a2 [Docs] Merge design docs for a V1 only future (#21832) Harry Mellor 2025-07-29 15:22:50 +01:00
04e38500ee [Bugfix] VLLM_V1 supports passing other compilation levels (#19340) Richard Zou 2025-07-29 09:35:58 -04:00
