biondizzle/vllm

Commit Graph

b78772c433 [Frontend] supports deepseekv32 chat template (#29837) Chauncey 2025-12-03 20:53:44 +08:00
f5d3d93c40 [docker] Build CUDA kernels in separate Docker stage for faster rebuilds (#29452) Amr Mahdi 2025-12-03 03:41:53 -08:00
78f4bb0ba8 [DOC] Add Arm to list of compute resouces providers (#29894) Fadi Arafeh 2025-12-03 11:36:58 +00:00
b294e28db2 [refactor] CTMoEMethods to use QuantizationArgs (#28871) HDCharles 2025-12-03 06:00:56 -05:00
787b84a9fc [Bugfix] Follow-up fix on MediaWithBytes (#29951) Roger Wang 2025-12-03 02:42:49 -08:00
42c1949643 [Bugfix][Quantization] Support BF16 tensors on GGUF (#29948) Tsukasa OI 2025-12-03 19:33:46 +09:00
cc4e296ea6 [CI/Build] Avoid duplicate empty inputs test for common multimodal generation tests (#29907) Isotr0py 2025-12-03 18:27:36 +08:00
a21cd9ed23 [Bugfix] Fix incorrect image_grid_thw rank for HunyuanOCR from missing merge_by_field_config=True (#29950) Isotr0py 2025-12-03 18:05:10 +08:00
7fe9c1a223 [CI] Add Async Eplb nightly CI tests (#29385) WeiQing Chen 2025-12-03 17:51:08 +08:00
3f42b05fbc [Refactor] [1/N] to simplify the vLLM serving architecture (#28040) Chauncey 2025-12-03 17:26:39 +08:00
69520bc695 Add logging for cudagraph related info (#29825) Yong Hoon Shin 2025-12-02 23:01:48 -10:00
3a7751485b [responsesAPI] support input output messages for non harmony models (#29549) Andrew Xia 2025-12-02 23:59:23 -08:00
bbfb55c29e [Misc] Allow fetch_* utils to access local files by default (#29932) Cyrus Leung 2025-12-03 15:49:34 +08:00
0bec63fa31 [BugFix] fix imgs_pos in hunyuan_vl (#29879) JackieWu 2025-12-03 14:20:37 +08:00
c719c40540 [Bugfix] Defunctionalize TRTLLM AR+Norm op for avoiding extra clone kernel before it (#29631) elvischenv 2025-12-03 13:15:50 +08:00
b08025a83b [Docs] Discuss api key limitations in security guide (#29922) Russell Bryant 2025-12-02 23:57:28 -05:00
4fd9d6a85c [Core] Rename PassConfig flags as per RFC #27995 (#29646) v0.12.0 Arpit Khandelwal 2025-12-02 22:38:55 -05:00
d7284a2604 [Core] Rename PassConfig flags as per RFC #27995 (#29646) Arpit Khandelwal 2025-12-02 22:38:55 -05:00
506ed87e87 [ROCm][CI][Bugfix] Disable Flash/MemEfficient SDP on ROCm to avoid HF Transformers accuracy issues (#29909) Andreas Karatzas 2025-12-02 20:36:49 -06:00
4dd7978374 [Bugfix] Fix regression on pooling models from PR#29621 (#29921) Roger Wang 2025-12-02 18:33:45 -08:00
a1d627e40f [BugFix] Fix assert in build_for_cudagraph_capture (#29893) Lucas Wilkinson 2025-12-02 19:56:54 -05:00
5cdd664509 [BugFix] Fix assert in build_for_cudagraph_capture (#29893) Lucas Wilkinson 2025-12-02 19:56:54 -05:00
5f67361fd1 Reverting re-direction to amd_mi355_X. (#29914) Alexei-V-Ivanov-AMD 2025-12-02 18:40:02 -06:00
2f055ec1c1 [Bugfix] Fix incorrect channel order for idefics3 in edge case (#29881) Isotr0py 2025-12-03 00:03:52 +08:00
5d91d2b292 [Doc] Add allocate_slots parameter docs (#29777) maang-h 2025-12-03 07:23:09 +08:00
6a6108511f [BUGFIX] Fix regex pattern for Mistral Tool Call (#29918) Julien Denize 2025-12-02 23:51:58 +01:00
9057fc2f1b [BUGFIX] llama_4_scaling wrongly passed to DeepseekAttention (#29908) Julien Denize 2025-12-02 23:51:20 +01:00
a05b580540 [Bugfix] fix --scheduling-policy=priority & n>1 crashes engine (#29764) Chauncey 2025-12-03 06:42:28 +08:00
b6ae5aeca6 [Bugfix][EPLB] Prevent user-provided EPLB config from being overwritten with defaults (#29911) Sage Moore 2025-12-02 14:20:22 -08:00
5c7c09af8f [Perf] Avoid pageable HtoD transfer in MinTokensLogitsProcessor (#29826) jthomson04 2025-12-02 13:25:52 -08:00
c014de1ec7 [ROCm][CI] Fix test_cudagraph_mode.py Failure For AMD CI (#29808) Micah Williamson 2025-12-02 16:54:36 -06:00
1b1e35aaf9 [BUGFIX] Fix regex pattern for Mistral Tool Call (#29918) Julien Denize 2025-12-02 23:51:58 +01:00
5e5646e206 [BUGFIX] llama_4_scaling wrongly passed to DeepseekAttention (#29908) Julien Denize 2025-12-02 23:51:20 +01:00
0a9caca9f5 [Bugfix] fix --scheduling-policy=priority & n>1 crashes engine (#29764) Chauncey 2025-12-03 06:42:28 +08:00
e6f114ac25 [Bugfix][EPLB] Prevent user-provided EPLB config from being overwritten with defaults (#29911) Sage Moore 2025-12-02 14:20:22 -08:00
6fc5841db1 Fix some more Transformers nightly tests (#29872) Harry Mellor 2025-12-02 21:49:44 +00:00
3ff5b53bc2 Bump actions/setup-python from 6.0.0 to 6.1.0 (#29768) dependabot[bot] 2025-12-02 21:29:32 +00:00
1528e079e2 [Perf] Avoid pageable HtoD transfer in MinTokensLogitsProcessor (#29826) jthomson04 2025-12-02 13:25:52 -08:00
afb1e5b380 [CI][ROCm][tests/v1/e2e] Fix multiprocessing launch for the test (#29123) Divakar Verma 2025-12-02 14:46:10 -06:00
1c593e117d Fix boolean nested params, add dict format support, and enhance plotting for vllm bench sweep (#29025) Copilot 2025-12-02 20:40:56 +00:00
7f718169d1 [CI/Build] Fixes missing runtime dependencies (#29822) Benjamin Bartels 2025-12-02 18:21:49 +00:00
339e84ce86 [Bugfix] Fix DeepSeek R1 MTP weight loading (#29545) Matthew Bonanni 2025-12-02 10:52:18 -05:00
34a8559be7 [Chore] Use tokenizer.encode and tokenizer.decode directly (#29851) Cyrus Leung 2025-12-02 20:30:40 +08:00
85fb2e3120 Remove default values from InitVars so that they're not stored (#29859) Harry Mellor 2025-12-02 12:16:37 +00:00
a2b053dc85 feat(model): Add BitsAndBytes quantization support for Qwen3-Omni-MoE (#29896) Navanit Dubey 2025-12-03 00:58:35 +05:30
1d93f11675 [Attention][CUDAGraph] Remove CG padding from attention backends (#29352) Matthew Bonanni 2025-12-02 13:48:08 -05:00
2d613de9ae [CI/Build] Fixes missing runtime dependencies (#29822) Benjamin Bartels 2025-12-02 18:21:49 +00:00
c77b9929a0 Update AMD-CI testing mirror (as of 2025-12-02) (#29898) Alexei-V-Ivanov-AMD 2025-12-02 11:52:54 -06:00
63b1da76ba [Chore]: Reorganize gguf utils funtions under transformers_utils (#29891) Isotr0py 2025-12-03 01:33:23 +08:00
52cb349fc0 [responsesAPI][3] ResponsesParser to set up non harmony MCP (#29413) Andrew Xia 2025-12-02 08:24:45 -08:00
0ec8422171 [Bugfix] Fix incorrect channel order for idefics3 in edge case (#29881) Isotr0py 2025-12-03 00:03:52 +08:00
2eb4fe9129 [examples] Resettle pooling examples. (#29365) wang.yuqi 2025-12-02 23:54:28 +08:00
51c57b51dd [Bugfix] Fix DeepSeek R1 MTP weight loading (#29545) Matthew Bonanni 2025-12-02 10:52:18 -05:00
60c3d413af [Multimodal][Core] Optimize multimodal preprocessing cache by hashing image bytes instead of pixel values (#29621) ImaGoodFella 2025-12-02 14:49:02 +01:00
68ffbca7e4 [Chore] Use tokenizer.encode and tokenizer.decode directly (#29851) Cyrus Leung 2025-12-02 20:30:40 +08:00
951445a52d Remove default values from InitVars so that they're not stored (#29859) Harry Mellor 2025-12-02 12:16:37 +00:00
d8c6210eea Add Mistral Large 3 and Ministral 3 (#29757) Julien Denize 2025-12-02 11:29:00 +01:00
8bbcf8b6e7 [vLLM Benchmark Suite] Add default parameters section and update CPU benchmark cases (#29381) Louie Tsai 2025-12-02 01:00:23 -08:00
70fb77b4dc [BugFix] add max-num-batched-token to scheduler hash (#29829) Boyuan Feng 2025-12-02 00:55:02 -08:00
48d15a32aa [CI] Fix Bad_words test for tokenizer encode/decode asymmetry (#28193) 杰兮 2025-12-02 16:02:12 +08:00
3b221cb661 [BugFix] respect VLLM_LOGGING_LEVEL in logger (#29761) Boyuan Feng 2025-12-01 23:49:16 -08:00
0037b5746a [Core] Eliminate redundant is_encoder_decoder lookups (20-40us/step) (#29800) Wushi Dong 2025-12-01 23:08:07 -08:00
f5b0846ba0 Fix some Transformers nightly tests (#29802) Harry Mellor 2025-12-02 07:05:27 +00:00
13ea39bc09 [CPU]Parallelize over tokens in int4 moe (#29600) Zhang Xiangze 2025-12-02 14:21:39 +08:00
4b612664fd [CI] Renovation of nightly wheel build & generation (take 2) (#29838) Shengqi Chen 2025-12-02 14:17:10 +08:00
653591d5e7 [Chore] Move tokenizer initialization methods (#29793) Cyrus Leung 2025-12-02 13:33:37 +08:00
e2fbfc955e [CI][AMD] spec_decode:eagle skip FLASH_ATTN for deepseek on ROCm (#29827) Divakar Verma 2025-12-01 23:27:46 -06:00
a690fb5bd6 [CI][ROCm] Fix test_correctness_sliding_window (#29243) Divakar Verma 2025-12-01 22:53:27 -06:00
81fe3f82af [BugFix] Fix index error in ngram_proposer (#29779) usberkeley 2025-12-02 12:48:11 +08:00
53bf71b0f0 [Misc] Update conftest for entrypoints/sagemaker test folder (#29799) Zuyi Zhao 2025-12-01 19:56:39 -08:00
f441d36cee Add missing return in _check_vllm_model_embed_input_ids (#29834) Johnny Yang 2025-12-01 19:22:50 -08:00
22274b2184 [Misc] Add ReplicaId to Ray metrics (#24267) Seiji Eicher 2025-12-01 19:21:44 -08:00
fc95521ba5 [Misc] Throw error on unintended access to scheduler_config.max_model_len (#29771) Wei Wei 2025-12-01 18:58:44 -08:00
d0cd728907 [Core] Support reseting all running requests' KV while calling reset_prefix_cache (#28827) Zhuohan Li 2025-12-01 18:25:05 -08:00
fa8804ad9c [responsesAPI][4] fix responseOutputItem Kimi K2 thinking bug (#29555) Andrew Xia 2025-12-01 18:11:35 -08:00
4b40924998 [ROCm] Fallback pytorch GELU with tanh approximation to GELU() (#29244) Divakar Verma 2025-12-01 20:02:22 -06:00
c0dfc89485 SM120 / NVFP4: add device guard and runtime SM dispatch to cutlass_scaled_fp4_mm (#29711) Hendrik Holtmann 2025-12-02 02:24:18 +01:00
44822d7ff2 [BugFix] Preserve spec decoding uniform decode when scheduling (#29759) Nick Hill 2025-12-01 17:15:52 -08:00
342c4f1472 Updated CI mirror 2025-11-25 (#29434) Alexei-V-Ivanov-AMD 2025-12-01 17:44:33 -06:00
1336a1ea24 Revert #29787 and #29690 (#29815) Kevin H. Luu 2025-12-01 13:42:03 -08:00
eaf81485ed [Ascend]: Fixed the issue where OOT Platform vllm-ascend could not enable SP in Eager mode (#28935) Nengjun Ma 2025-12-02 04:02:18 +08:00
38caf7fa1a Update FAQ on interleaving sliding windows support (#29796) Finbarr Timbers 2025-12-01 12:15:19 -07:00
cabc77cc86 [Core][Observability] Add KV cache residency metrics (#27793) shivampr 2025-12-01 10:27:53 -08:00
ec7035c9d4 [ci] Make distributed 8 gpus test optional (#29801) Kevin H. Luu 2025-12-01 10:22:05 -08:00
fc6acc88ca [Bugfix] Missing cached item in the MultiModalReceiverCache (#28525) knlnguyen1802 2025-12-02 02:18:07 +08:00
d0985c5feb [Hardware][AMD] Remove ROCm skip conditions for transformers backend tests (#29782) BADAOUI Abdennacer 2025-12-01 19:03:13 +01:00
092bb73b8a [Frontend] add 'verbose_json' and 'timestamp' feature on Whisper Transcription/Translation (#24209) sangbumlikeagod 2025-12-02 02:19:17 +09:00
5d43f7372e [Doc] Update description disable_any_whitespace (#29784) FredericOdermatt 2025-12-01 17:48:33 +01:00
37593deb02 [CI] fix url-encoding behavior in nightly metadata generation (#29787) Shengqi Chen 2025-12-01 23:17:20 +08:00
f5516039c5 [Doc] fix heading levels (#29783) Liu Jinyi 2025-12-01 22:49:22 +08:00
36db0a35e4 [CI] Renovation of nightly wheel build & generation (#29690) Shengqi Chen 2025-12-01 21:25:39 +08:00
5cfa967efa [Bugfix] TypeError: 'NoneType' object is not callable (#29414) Marcin Ostrowski 2025-12-01 14:16:44 +01:00
b95db244ee [v1] Add real sliding window calculation to FlexAttention direct BlockMask building (#26015) Isotr0py 2025-12-01 21:12:51 +08:00
ad9d656bfa [multimodal][test] Reduce memory utilization for test_siglip to avoid OOM (#29504) Zhengxu Chen 2025-12-01 07:41:48 -05:00
f37e8938d2 [XPU] Fix AWQ skipped layer detection in IPEX quantization (#29774) Fanli Lin 2025-12-01 20:00:52 +08:00
f0a28bf661 [Misc] Unify tokenizer registration (#29767) Cyrus Leung 2025-12-01 19:34:58 +08:00
86e178f7c4 [crashfix] Eagle + multimodal can crash on mm cache miss (#29750) Mickaël Seznec 2025-12-01 10:29:33 +01:00
014ece97c7 [Frontend] Add tool filtering support to ToolServer (#29224) daniel-salib 2025-12-01 03:03:57 -05:00
62de4f4257 [Frontend] Resettle pooling entrypoints (#29634) wang.yuqi 2025-12-01 15:30:43 +08:00
83805a6078 [CI] Skip paddleocr_vl for transformer 4.57.3 (#29758) Huamin Li 2025-11-30 20:38:06 -08:00
