Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm


Commit Graph

Branches and tags: cmm, main, ci/build/22474, submission, v0.1.0, v0.1.1, v0.1.2, v0.1.3, v0.1.4, v0.1.5, v0.1.6, v0.1.7, v0.10.0, v0.10.0rc1, v0.10.0rc2, v0.10.1, v0.10.1.1, v0.10.1rc1, v0.10.2, v0.10.2rc1, v0.10.2rc2, v0.10.2rc3, v0.11.0, v0.11.0rc1, v0.11.0rc2, v0.11.0rc3, v0.11.0rc4, v0.11.0rc5, v0.11.0rc6, v0.11.1, v0.11.1rc0, v0.11.1rc1, v0.11.1rc2, v0.11.1rc3, v0.11.1rc4, v0.11.1rc5, v0.11.1rc6, v0.11.1rc7, v0.11.2, v0.12.0, v0.13.0, v0.13.0rc1, v0.13.0rc2, v0.13.0rc3, v0.13.0rc4, v0.14.0, v0.14.0rc0, v0.14.0rc1, v0.14.0rc2, v0.14.1, v0.15.0, v0.15.0rc0, v0.15.0rc1, v0.15.0rc2, v0.15.0rc3, v0.15.1, v0.15.1rc0, v0.15.1rc1, v0.15.2rc0, v0.16.0, v0.16.0rc0, v0.16.0rc1, v0.16.0rc2, v0.16.0rc3, v0.16.1rc0, v0.17.0, v0.17.0rc0, v0.17.0rc1, v0.17.1, v0.17.1rc0, v0.17.2rc0, v0.18.0, v0.18.0rc0, v0.18.0rc1, v0.18.0rc2, v0.18.1, v0.18.1rc0, v0.18.2rc0, v0.19.0, v0.19.0rc0, v0.19.0rc1, v0.19.1rc0, v0.2.0, v0.2.1, v0.2.1.post1, v0.2.2, v0.2.3, v0.2.4, v0.2.5, v0.2.6, v0.2.7, v0.3.0, v0.3.1, v0.3.2, v0.3.3, v0.4.0, v0.4.0.post1, v0.4.1, v0.4.2, v0.4.3, v0.5.0, v0.5.0.post1, v0.5.1, v0.5.2, v0.5.3, v0.5.3.post1, v0.5.4, v0.5.5, v0.6.0, v0.6.1, v0.6.1.post1, v0.6.1.post2, v0.6.2, v0.6.3, v0.6.3.post1, v0.6.4, v0.6.4.post1, v0.6.5, v0.6.6, v0.6.6.post1, v0.7.0, v0.7.1, v0.7.2, v0.7.3, v0.8.0, v0.8.0rc1, v0.8.0rc2, v0.8.1, v0.8.2, v0.8.3, v0.8.3rc1, v0.8.4, v0.8.5, v0.8.5.post1, v0.9.0, v0.9.0.1, v0.9.1, v0.9.1rc1, v0.9.1rc2, v0.9.2, v0.9.2rc1, v0.9.2rc2

5ce2d10e4a Fix models which use layer_type_validation for Transformers v5 (#37398) Harry Mellor 2026-03-18 18:41:51 +00:00
738d0a281f [Bugfix] Fix incorrect use of merge_size in Qwen3-VL video timestamp calculation (#37439) Chengyu Fang 2026-03-19 02:36:34 +08:00
70b81c4f3d [bugfix][async scheduling] fix extra cuda context in device 0 with EP/DP (#37449) youkaichao 2026-03-19 02:32:30 +08:00
7476d148db [Model] Remove unnecessary processor definition for Nemotron Parse (#37456) Cyrus Leung 2026-03-19 02:25:13 +08:00
f3732bd931 [Misc] Clean up model registry (#37457) Cyrus Leung 2026-03-19 02:24:44 +08:00
0ef7f79054 [Perf] Add tuned triton moe config for Qwen3.5 H200, 9.9% E2E throughput improvement (#37340) Wentao Ye 2026-03-18 14:18:34 -04:00
16c971dbc7 [CI] Fix PaddleOCR-VL HF test failure due to create_causal_mask API rename (#37328) Andreas Karatzas 2026-03-18 04:44:12 -05:00
5dd8df0701 [kv_offload+HMA][2/N]: Support multiple KV groups in GPULoadStoreSpec (#36642) Or Ozeri 2026-03-18 19:26:40 +02:00
39bfb57b7c Add API docs link if the CLI arg is a config class (#37432) Harry Mellor 2026-03-18 17:19:35 +00:00
c9d838fc33 Adding deterministic lora benchmarking to vLLM Bench (#36057) RonaldBXu 2026-03-18 09:02:03 -07:00
b1169d7be8 [Kernel] Add gpt-oss Router GEMM kernel (#37205) Xin Yang 2026-03-18 08:15:56 -07:00
17808394bc standardize load_weights using AutoWeightsLoader for kimi_linear and minimax_text_01 (#37371) XLiu-2000 2026-03-18 23:05:37 +08:00
296839a1b0 [Perf] Eliminate padding and slicing op for GPT-OSS with Flashinfer MXFP4 MXFP8 MoE (#30647) elvischenv 2026-03-18 23:01:26 +08:00
c373b5c00d [Log] Reduce duplicate log (#37313) Wentao Ye 2026-03-18 10:57:44 -04:00
de1a86b7de elastic_ep: Fix stateless group port races (#36330) Itay Alroy 2026-03-18 16:36:18 +02:00
99267c23ca [2/3] Refactor InternVL-based processors (#37324) Cyrus Leung 2026-03-18 22:22:19 +08:00
525f2eeb0b [kv_offload+HMA][6/N]: Split offloading_connector.py (#37405) Or Ozeri 2026-03-18 15:42:46 +02:00
918b7890a1 [Bugfix] Fix base64 JPEG video frames returning empty metadata (#37301) Yufeng He 2026-03-18 21:40:03 +08:00
98b09ddc27 [NIXL][Bugfix] metrics & testing minor bug (#36051) Andy Lo 2026-03-18 13:39:14 +00:00
cef1f302d2 [Model] Enable LoRA support for tower and connector in H2OVL (#31696) Shwetha Poojary 2026-03-18 18:56:47 +05:30
17c47fb869 [Bugfix] Fix EP weight filter breaking EPLB and NVFP4 accuracy (#37322) Elvir Crnčević 2026-03-18 11:30:29 +01:00
b322b197f1 [Build] Bump python openai version (#32316) Chauncey 2026-03-18 18:20:10 +08:00
eaf7c9b976 [CI] Fix PaddleOCR-VL HF test failure due to create_causal_mask API rename (#37328) Andreas Karatzas 2026-03-18 04:44:12 -05:00
47a1f11bff [docs] Add docs for new RL flows (#36188) Aaron Hao 2026-03-18 02:04:26 -07:00
262ddd0d81 [cherry-pick][Bugfix] Fix EP weight filter breaking EPLB and NVFP4 accuracy #37322 v0.18.0rc1 khluu 2026-03-18 01:48:32 -07:00
e60c1674b3 [Bugfix] Avoid OpenMP thread reallocation in CPU torch compile (#37391) Li, Jiang 2026-03-18 15:51:39 +08:00
faa80947f5 [Performance] Add --enable-ep-weight-filter CLI option (#37351) Roy Wang 2026-03-18 09:36:55 +08:00
eeabf740bb [Custom Ops] Add functional + out variant for scaled_fp4_quant (#34389) Terry Gao 2026-03-16 15:51:46 -07:00
cdcffafef8 Fix eplb nvfp4 experts hook (#37217) Elvir Crnčević 2026-03-16 23:03:54 +01:00
fad09e8a1f fix(glm47): improve tool call parsing and content normalization (#37386) Karan Bansal 2026-03-18 13:42:21 +05:30
8c31f47c63 [LoRA] Make LoRA respect language_model_only (#37375) Jee Jee Li 2026-03-18 15:53:34 +08:00
261801242f [Bugfix] Avoid OpenMP thread reallocation in CPU torch compile (#37391) Li, Jiang 2026-03-18 15:51:39 +08:00
fcf0687b27 [kv_offload+HMA][0/N]: Support block-level preemption handling (#34805) Or Ozeri 2026-03-18 08:49:53 +02:00
86b7e3c95a [XPU] skip unsupported ut and update test_nixl_connector (#37179) liuzhenwei 2026-03-18 13:32:59 +08:00
0e95916155 [responsesAPI] parser.extract_response_outputs can take in token IDs (#37130) Andrew Xia 2026-03-17 22:31:31 -07:00
ce2ef42fd3 [CI] Stabilize test_cpu_offloading by waiting for async offload before cache reset (#37335) Andreas Karatzas 2026-03-18 00:26:20 -05:00
8b6325758c [ROCm][CI] Add ROCM_EXTRA_ARGS to audio_in_video test server fixture (#37349) Andreas Karatzas 2026-03-17 23:55:40 -05:00
a0dd1995c7 [Hardware][TPU] Add supports_async_scheduling() method to Executor interface so that it can be extended for Executor implementations. (#36924) gxd3 2026-03-17 21:53:28 -07:00
f1740006e4 [Perf] Enable dual stream execution of input projection for Qwen3 (#36795) Xin Yang 2026-03-17 20:13:27 -07:00
58cde5c026 [ROCm][CI] Skip trtllm kvfp8 dequant tests on ROCm (#37330) Andreas Karatzas 2026-03-17 22:12:26 -05:00
761e0aa7a0 [Performance] Add --enable-ep-weight-filter CLI option (#37351) Roy Wang 2026-03-18 09:36:55 +08:00
ff9fbc9aff [Kernel][Helion] [16/N] Refactor register_kernel API to be more Dynamo-friendly (#36705) Yanan Cao 2026-03-17 18:23:35 -07:00
e6c4797704 [ROCm][Quantization] add fp8xfp8 attn support for rocm_aiter_unified_attn (#36927) Divakar Verma 2026-03-17 20:49:32 -04:00
09e4576f65 [Kernel] Add non-gated support for NVFP4 CUTLASS MoE (#37320) Michael Goin 2026-03-17 23:12:04 +01:00
3ed7b1e6e0 [ROCm] Validate block_size for explicitly selected attention backends (#36846) Andreas Karatzas 2026-03-17 17:04:40 -05:00
e8f9dbc369 [Bugfix][ROCm] Fix worker startup OOM on ROCm by skipping unreliable cudagraph memory profiling (#36720) JartX 2026-03-17 22:55:34 +01:00
de35c06c66 Make KV connector metadata build overridable via plugin (#37336) Yong Hoon Shin 2026-03-17 14:29:06 -07:00
c0745a851a [Model] Add ColQwen3.5 4.5B support (#36887) Athrael Soju 2026-03-17 21:17:02 +00:00
b5ca9c3557 [Models] Cohere ASR (#35809) Ekagra Ranjan 2026-03-17 17:04:17 -04:00
245758992e [Bugfix] Rescale NVFP4 weight scales to fix BF16 dequant underflow (#34577) Chao-Ju Chen 2026-03-18 04:48:42 +08:00
1204cf0a9d [Bugfix] Fix mock.patch resolution failure for standalone_compile.FakeTensorMode on Python <= 3.10 (#37158) Dimitrios Bariamis 2026-03-17 21:13:06 +01:00
b36adfa349 [Perf] Set Flashinfer sparse MLA as default backend for FP8 kv cache (#37252) Wei Zhao 2026-03-17 16:09:20 -04:00
e78821b438 [Deprecation] Deprecate --calculate-kv-scales option (#37201) Michael Goin 2026-03-17 20:57:24 +01:00
51f0acda79 [Model] Remove unused handle_oov_mm_token (#37321) Cyrus Leung 2026-03-18 03:44:52 +08:00
fa75204b16 bump compressed-tensors version to 0.14.0.1 (#36988) Brian Dellabetta 2026-03-17 15:36:19 -04:00
bdb903bb5f [Bug] Fix FlashInfer MNNVL socket collisions under concurrent vLLM jobs (#36674) Wentao Ye 2026-03-17 15:19:52 -04:00
68f783a727 [Torch 2.11] Guard torch._C._cpu attribute checks for forward compatibility (#35673) Andrey Talman 2026-03-17 14:47:59 -04:00
c5030c439d [CI] Split Distributed Tests (4 GPUs) and Kernel MoE tests (#37100) Avinash Singh 2026-03-18 00:14:55 +05:30
51b2333be1 [Perf] Optimize top-k search in apply_top_k_top_p_triton sampler (#37225) Michael Goin 2026-03-17 19:35:17 +01:00
4ed51308c8 [CI] Fix GPU memory leak when RemoteOpenAIServer fails to start in __init__ (#37230) Andreas Karatzas 2026-03-17 11:08:08 -05:00
c781fbbab3 [Bugfix] Standardize custom HF Processor init (#37289) Cyrus Leung 2026-03-17 23:38:55 +08:00
979ff44cea [BugFix] PyTorch Compilation Tests should error if any test fails (#37300) Richard Zou 2026-03-17 11:26:38 -04:00
f63ed7b5ac [Bugfix] Fix DP MTP Dummy Run (#35243) Benjamin Chislett 2026-03-17 11:16:48 -04:00
c9e5096256 [openapi] remove redundant exception stack trace[4/N] (#37157) Ning Xie 2026-03-17 23:06:25 +08:00
2ff0ad9694 [UltraVox] Fix output type (#37224) Anton Vlasjuk 2026-03-17 15:51:17 +01:00
a836524d20 [Chore] Replace all base64 usages with faster pybase64 package (#37290) Isotr0py 2026-03-17 22:44:19 +08:00
3717a4dd47 [Misc][LoRA] Add --lora-target-modules to restrict LoRA to specific modules (#34984) Bhoomit 2026-03-17 07:36:41 -07:00
ecfcdd2ce4 Fix Phi3 test that fails with Transformers v5 (#37298) Harry Mellor 2026-03-17 14:29:24 +00:00
c25dbc2d27 [Bugfix] Fix unclean shutdown crash with AllReduce Fusion workspace (#36955) Siew's Capital Jarvis 2026-03-17 22:22:09 +08:00
77d2a5f17b pick up tuned prefill configs for FP8 FA3 (#36265) Jonas M. Kübler 2026-03-17 15:00:26 +01:00
59192dfd39 [Frontend] Complete OpenAI render delegation (#37287) Sage 2026-03-17 15:53:55 +02:00
56cb1baa66 [Misc] Use VLLMValidationError in batch, pooling, and tokenize protocol validators (#36256) Umut Polat 2026-03-17 16:52:30 +03:00
f340324335 [1/2] Move InternVL-based processors (#37260) Cyrus Leung 2026-03-17 21:50:56 +08:00
2660b9289c Bugfix for offloading+prefetch for GLM-4.7-FP8 (#37178) sfbemerk 2026-03-17 14:22:09 +01:00
293f036e6d Add gigachat 3.1 tool parser + fix gigachat3 tool parser (#36664) Viacheslav 2026-03-17 15:03:20 +03:00
0fb142a454 [perf][connector] optimize build_connector_meta when host buffer transfer is not used (#37165) youkaichao 2026-03-17 19:59:35 +08:00
00f8e0d211 [Frontend] Delegate tokenization serving preprocessing to OpenAIServingRender (#37266) Sage 2026-03-17 13:22:54 +02:00
4af9ed21cb [Bugfix](xpu): prevent “selected index k out of range” in TP decode path (#37259) zhao, zhenhui 2026-03-17 19:14:07 +08:00
9c7cab5ebb [Feature]: Support for multiple embedding types in a single inference call (#35829) Augusto Yao 2026-03-17 17:05:42 +08:00
132bfd45b6 [Bugfix][ResponsesAPI] Fix crash when tool_choice=required exceeds max_output_tokens (#37258) Chauncey 2026-03-17 16:54:52 +08:00
24b4272a8c Fix infinite recursive search issue in quark.py (#32779) xiao-llm 2026-03-17 03:19:15 -04:00
8a680463fa [Bugfix] Fix NemotronH MTP + Chunked Prefill (#35447) Benjamin Chislett 2026-03-17 02:07:33 -04:00
20b14095a4 [Bugfix] Fix loading Music Flamingo (#35535) Nick Cao 2026-03-17 01:24:40 -04:00
17c1bdf371 [Bugfix] dtype mismatch in ngram gpu propose (#37246) PatchyTIS 2026-03-17 13:19:55 +08:00
3e3d320c1b [Refactor] Relocate responses API tests (#37241) Flora Feng 2026-03-17 01:14:52 -04:00
4d22667c32 [Feature][Frontend] add support for Cohere Embed v2 API (#37074) Walter Beller-Morales 2026-03-16 19:55:53 -04:00
1fe3932c8b [ROCm] Fix AttributeError for torch.compiler.skip_all_guards_unsafe on older PyTorch (#37219) Andreas Karatzas 2026-03-16 22:34:49 -05:00
54a62a79f7 [ROCm] Fix AttributeError for torch.compiler.skip_all_guards_unsafe on older PyTorch (#37219) v0.17.2rc0 Andreas Karatzas 2026-03-16 22:34:49 -05:00
384dc7f77b [Refactor] Relocate completion and chat completion tests (#37125) Flora Feng 2026-03-16 23:31:23 -04:00
f04d5226f8 [CI] Fix flaky tool_use chat completion tests with deterministic seed (#37027) Flora Feng 2026-03-16 23:24:34 -04:00
0a0a1a198b Add ability to replace oot ops when using lora (#37181) Kyuyeun Kim 2026-03-16 18:04:15 -07:00
6c1cfbad32 Support non-contiguous KV cache in TRTLLM fp8 dequant kernel (#36867) Vadim Gimpelson 2026-03-17 04:48:42 +04:00
45f526d652 [BugFix] Correct max memory usage for multiple KV-cache groups (#36030) Harry Huang 2026-03-17 08:38:52 +08:00
5db91f0aaf Fix some Mistral parser issues (#37209) Julien Denize 2026-03-17 01:08:56 +01:00
061980c36a [Feature][Frontend] add support for Cohere Embed v2 API (#37074) Walter Beller-Morales 2026-03-16 19:55:53 -04:00
7a49742b88 [CI/Build] Add common tool call parser test suite (#27599) Ben Browning 2026-03-16 19:46:20 -04:00
3e6a1e1686 [Custom Ops] Add functional + out variant for scaled_fp4_quant (#34389) Terry Gao 2026-03-16 15:51:46 -07:00
7961486a9b Fix EagleMistralLarge3Model initialization (#37232) Julien Denize 2026-03-16 23:41:00 +01:00
4f9b14c21c [CI] Stabilize multinode DP internal LB completion tests (#36356) Andreas Karatzas 2026-03-16 17:40:23 -05:00
31a458c091 [Doc] Clarify schema enforcement behavior for tool_choice modes (#37064) Yuchen Fama 2026-03-16 18:27:42 -04:00
