Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

981cadb35c [Bugfix][Kernel] fix merge attn states when both prefix and suffix are empty (#28181) courage17340 2025-11-06 17:52:13 +08:00
c3ee80a01a [V0 deprecation]clean up is_v1_supported_oracle (#28116) wangxiyuan 2025-11-06 16:05:32 +08:00
3755c14532 [CPU] Enable torch profiling (#28130) Aditya Tewari 2025-11-06 07:32:05 +00:00
201dc98acc Fix hard-coded parameter name in gemma3n.py (#27946) Seungduk Kim 2025-11-06 16:07:36 +09:00
a404e2c0f1 Patch Mistral Tokenizer (#28146) Julien Denize 2025-11-06 07:43:16 +01:00
e31946f86e [flashinfer] fix FI all2all with FI cutlass moe (#28166) Xiaozhu Meng 2025-11-05 21:52:16 -08:00
bde5039325 [CI] Add compile/test_multimodal_compile.py to CI (#28151) gmagogsfm 2025-11-05 21:41:47 -08:00
d72299d47b Make the cv2 dependency optional (#27780) Jacob Zhong 2025-11-06 13:08:55 +08:00
80679f108f [Core][MM] Use non-blocking CPU-GPU copy of multimodal data (#28141) Lukas Geiger 2025-11-06 04:05:12 +00:00
43ecd0a900 [Chore] Clean up deepseek v2/v3 config copy (#28055) Isotr0py 2025-11-06 11:46:30 +08:00
07d614511f [Misc] Remove the duplicate code (#28111) Chauncey 2025-11-06 10:07:47 +08:00
f948ab6945 [CI Failure] nm-testing/Qwen2-0.5B-Instruct-FP8-SkipQKV was removed from HF. Skip it in tests (#28170) Vadim Gimpelson 2025-11-06 05:22:13 +04:00
d71af5f502 [Feature] Enable TP + EP shared_experts overlap with router, 3.7% E2E performance improvement (#28164) Wentao Ye 2025-11-05 20:21:08 -05:00
90189c71a9 [Bug] Fix env string "0" same to True (#28159) Wentao Ye 2025-11-05 20:04:20 -05:00
d79d9f0780 [Bug] Fix cpu disable shared_experts VLLM_DISABLE_SHARED_EXPERTS_STREAM (#28157) Wentao Ye 2025-11-05 20:03:09 -05:00
b6a248bdd7 [PERF] Decouple projections from GDN custom op. Attempt 2 (#28083) Vadim Gimpelson 2025-11-06 05:01:12 +04:00
1767658559 [Debugging] Add annotation for easier trace analysis (#22496) Dayeol Lee 2025-11-05 16:52:52 -08:00
efe73e9b57 [Core][Hybrid allocator + connector 2/n] Unify remove_skipped_blocks by get_last_useful_token (#25431) Kuntai Du 2025-11-05 16:12:00 -08:00
0b8e871e5e [CI/Build] Fix test_defaults_with_usage_context in AMD CI (#27926) Zhewen Li 2025-11-05 15:40:24 -08:00
5ee93a5956 [CI/Build] Update checking logic in cutlass_group_gemm_supported (#27948) Zhewen Li 2025-11-05 15:40:10 -08:00
e15601789b [Feature]: Add corrupted request metric to V1 metrics system. (#27306) Snehlata 2025-11-06 03:15:29 +05:30
65ac8d8dc4 [Docs] Add guide to debugging vLLM-torch.compile integration (#28094) Richard Zou 2025-11-05 16:31:46 -05:00
ffb08379d8 [Chore] Remove Nemotron-Nano-VL config copy (#28126) Isotr0py 2025-11-06 04:06:45 +08:00
e04492449e [Hardware][IBM Z] Optimize s390x Dockerfile (#28023) R3hankhan 2025-11-06 00:55:44 +05:30
518ec6b722 [Docs] Clean up README_TUNING.md (#28088) Michael Yao 2025-11-06 03:01:34 +08:00
802748bddb [Bugfix] Fix Qwen3-Reranker-8B load (#28117) wang.yuqi 2025-11-06 02:33:50 +08:00
faedbb4d4f [Feature] Extend batch invariant torch.compile to B200 (#27856) Paul Zhang 2025-11-05 13:04:49 -05:00
40db194446 [CI]: Add LMCacheConnector Unit Tests (#27852) Samuel Shen 2025-11-05 09:45:57 -08:00
c765f0b443 [FlashInfer] Avoid FlashInfer block_size 16 + head_size 256 on blackwell (#27994) Chen Zhang 2025-11-05 09:25:32 -08:00
002b07c4b2 [Bugfix] vLLM should check Inductor config for compile cache enablement status (#27637) gmagogsfm 2025-11-05 09:22:44 -08:00
752ddeacaa [Core] add support for reasoning parser plugins (#28075) Walter Beller-Morales 2025-11-05 12:15:06 -05:00
c18f88c6ca [Kernel] Fuse computation of g and beta for Gated Delta Net (#28095) Jiangyun Zhu 2025-11-06 01:14:55 +08:00
6fd0df8132 [misc] add vLLM Beijing Meetup (#28127) Jiaju Zhang 2025-11-06 01:12:59 +08:00
3f5a4b6473 [Bugfix] Validate custom logits processor xargs for online serving (#27560) Isotr0py 2025-11-06 00:53:33 +08:00
6cae1e5332 [ROCm][MLA] Support block-size > 1 for AITER MLA backend (#27224) Pleaplusone 2025-11-05 23:43:02 +08:00
80c9275348 Enabling cooperative multi-gpu tests on multi-gpu nodes (#27986) Alexei-V-Ivanov-AMD 2025-11-05 09:35:49 -06:00
e50c454672 [BugFix] Support EP/DP + EPLB with MTP (#25311) Ilya Markov 2025-11-05 16:22:17 +01:00
5d16d0fa62 [DCP] check return_lse for all layers in dcp (#27929) Chen Zhang 2025-11-05 06:27:25 -08:00
0606bea2b6 add kimi reasoning parser (#28128) bigmoyan 2025-11-05 21:48:33 +08:00
6e97eccf5d [XPU] Enable custom routing functions in IPEX for Llama4 (#28004) Frost Mitchell 2025-11-05 08:39:57 -05:00
6ab183813c [Graph Partition][Cache] Use inductor partition ops config (#27702) Boyuan Feng 2025-11-05 05:04:48 -08:00
6b7a81185d Bugfix: Cutlass FP8 FusedMoE bad scaling factors (#27255) amirkl94 2025-11-05 13:06:06 +02:00
b57789b62b Fix excessive logging noise by reducing the log level of the MinimaxM2ToolParser import success message (#27635) Eric Yue 2025-11-05 19:03:51 +08:00
377061d481 [Misc] fix import error for DeepSeekR1ReasoningParser (#28114) Chauncey 2025-11-05 19:02:32 +08:00
86dca07d9b [Hybrid allocator + kv connector] revert connector test changes related to hybrid allocator (#28011) Kuntai Du 2025-11-05 02:36:31 -08:00
16b37f3119 [bugfix] fix wrong dcp_local_seq_lens calc (#27518) Qiu 2025-11-05 17:58:13 +08:00
0976711f3b [Refactor] to simplify and extract the shared logic between chat completion and responses (#27961) Chauncey 2025-11-05 15:46:39 +08:00
e261d37c9a [Refactor] Lazy-loaded reasoning_parser (#28092) Chauncey 2025-11-05 15:37:02 +08:00
b7cbc25416 [Model, Core] Support Granite Speech & LoRA for STT (#24455) Alex Brooks 2025-11-05 00:33:48 -07:00
d43ad5a757 [BugFix] Fix DCP Assert (AssertionError: DCP not support reorder_batch_threshold > 1 now.) (#28100) Lucas Wilkinson 2025-11-05 01:54:43 -05:00
0ff05e3770 [Bugfix] Fix encoder-only model support for transformers backend (#28021) Isotr0py 2025-11-05 14:24:41 +08:00
428bc7bf1c [V0 deprecation] Remove VLLM_USE_V1 usage in most modules (#27955) wangxiyuan 2025-11-05 12:51:16 +08:00
878fd5a16f [CI/Build] Enable some fixed tests in AMD CI (#28078) Zhewen Li 2025-11-04 19:15:59 -08:00
18b39828d9 [XPU] Add gpt-oss model support for Intel GPU (#27786) Kunshang Ji 2025-11-05 10:17:23 +08:00
4ea62b77f5 [Qwen3-Next] MOE configs for A100-SXM4-80GB TP4 TP8 (#27740) tou 2025-11-05 09:25:09 +08:00
d4e547bb7e Revert "[PERF] Decouple projections from GDN custom op" (#28080) Vadim Gimpelson 2025-11-05 03:58:23 +04:00
2d977a7a9e [ROCm] gemm_a16w16 upstreaming (#26969) Aleksandr Malyshev 2025-11-04 13:01:00 -08:00
1fb4217a05 [Multimodal] Make MediaConnector extensible. (#27759) Chenheli Hua 2025-11-04 10:28:01 -08:00
611c86ea3c Added disable rule to track files under benchmarks/lib (#28048) nadavkluger 2025-11-04 20:18:43 +02:00
dc937175d4 [ROCm][Perf] New design on ROCm AITER MHA backend Implementation (#25763) Pleaplusone 2025-11-05 02:05:33 +08:00
2f1cc8cef1 Remove deprecated --rope-scaling and --rope-theta (#28006) Harry Mellor 2025-11-04 10:01:56 -08:00
938a81692e [AsyncScheduling] Don't schedule past request max_tokens (#27922) Nick Hill 2025-11-04 09:06:28 -08:00
c9f66da8fd [PerfFix] Avoid separate thread for MP executor shm spin (#28012) Nick Hill 2025-11-04 08:33:55 -08:00
05cae69f0f [model] Add support for openPangu_Ultra_MoE (#27521) yt0428 2025-11-05 00:17:20 +08:00
5fd8f02ea9 [PERF] Decouple projections from GDN custom op (#27512) Vadim Gimpelson 2025-11-04 20:11:41 +04:00
97e3dda84b [Perf] SM100 - add swap AB optimization to CUTLASS FP8 GEMM (#27284) lyrisz 2025-11-04 07:49:25 -08:00
5a0a6dfd55 [BugFix] Fix incorrect preallocated sampled_token_ids tensor size (#28025) Nick Hill 2025-11-04 07:38:16 -08:00
938772af03 [Kernels] Isolate modular kernel code from FusedMoEMethodBase subclasses. (#27123) bnellnm 2025-11-04 08:59:45 -05:00
e4ee658672 [Model] add optimal triton fused moe configs for NemotronH MoE (#27967) tomeras91 2025-11-04 14:59:43 +02:00
77f8001f53 [Model][Bugfix] fix pipeline parallelism support for NemotronH (#27968) tomeras91 2025-11-04 14:28:36 +02:00
300a265978 [Core] Enable StatLogger in LLMEngine (#28020) Zhuohan Li 2025-11-04 04:13:35 -08:00
03c4c4aa9d Support using Int4PreshuffledTensor after loading (#26066) Jerry Zhang 2025-11-04 03:00:57 -08:00
2ec401bc39 Load tuned fused_moe_lora shrink and expand kernel configs separately (#27435) yugong333 2025-11-04 02:27:35 -08:00
4022a9d279 [BugFix][Performance] Restore flashinfer autotuning for all scenarios (#27904) Varun Sundar Rabindranath 2025-11-04 02:56:21 -05:00
53f6e81dfd [CI/Build] Fix OpenAI API correctness on AMD CI (#28022) Zhewen Li 2025-11-03 23:20:50 -08:00
43a6acfb7d [Model] fix ernie45 reasoning_parser (#27973) CSWYF3634076 2025-11-04 15:16:46 +08:00
58279c60b5 [KV Connector] Make KVCacheConfig an explicit constructor argument (#27887) Mark McLoughlin 2025-11-04 07:00:49 +00:00
2f84ae1f27 [CI/Build] Update LM Eval Version in AMD CI (#27944) Zhewen Li 2025-11-03 22:36:40 -08:00
f32cbc9a0c [CPU]Improve dynamic 4bit moe performance (#27240) xiangze-arm 2025-11-04 14:33:23 +08:00
7e4be74104 [Bug] Batch invariant: Fix flash attn MLA RuntimeError: scheduler_metadata must have shape (metadata_size) (#27884) Wentao Ye 2025-11-04 01:05:55 -05:00
380ba6816d [Metrics] Enable sleep state metric outside of dev mode (#27867) Mark McLoughlin 2025-11-04 04:35:36 +00:00
14a125a06d [NIXL][XPU] Pin NIXL version to 0.7.0 (#27849) liuzhenwei 2025-11-04 11:28:35 +08:00
c02fccdbd2 [Refactor] Lazy import tool_parser (#27974) Chauncey 2025-11-04 10:10:10 +08:00
6ddae74054 [LoRA] Lora shrink swizzle (#27694) li2haipeng 2025-11-03 17:30:20 -08:00
b13a447546 [Bugfix][ROCm] Fix ViT rotary embeddings for torch.compile compatibility on ROCm (#27748) vllmellm 2025-11-04 09:12:19 +08:00
7956b0c0bc Remove the tpu docker image nightly build. (#27997) QiliangCui 2025-11-03 16:35:54 -08:00
3758757377 [Bugfix] Fix MoE Routing Simulation (#28002) Tyler Michael Smith 2025-11-03 17:26:49 -05:00
ccd3e55e51 [Bugfix][plugin] fla crash on plugin (#27322) Hank_ 2025-11-04 05:27:03 +08:00
01baefe674 Add TP parameter to attention tests (#27683) Matthew Bonanni 2025-11-03 16:04:40 -05:00
786030721e [Docs] add runai_streamer_sharded to LoadConfig (#27937) Ning Xie 2025-11-04 04:35:16 +08:00
145c00a4d3 [Bugfix] change FlashMLA reorder_batch_threshold (#27777) Matthew Bonanni 2025-11-03 15:17:10 -05:00
55011aef24 [Bugfix][Qwen][Multimodal] Move Qwen2_5_vl sdpa to custom op and reenable compile (#27764) Lucas Kabela 2025-11-03 11:12:15 -08:00
a4398fbb5e [Feature][Benchmarks] Support inf burstiness (#26941) Sophie du Couédic 2025-11-03 19:33:17 +01:00
2c19d96777 [Spec Decode] Integrate Suffix Decoding from Arctic Inference (#25784) Aurick Qiao 2025-11-03 09:23:31 -08:00
4bc400f47e [CI/Testing] Add basic single node dual batch overlap test (#27235) Lucas Wilkinson 2025-11-04 02:00:46 +09:00
cac4c10ef0 [BUG] Make 'binary' default option for saving torch compile artifacts when using standalone_compile (#27616) ahao-anyscale 2025-11-03 08:13:51 -08:00
f7d2946e99 [Bugfix] Skip gs:// model paths for speculator detection (#27846) pwschuurman 2025-11-03 06:31:03 -08:00
294c805f1d Early exit for MoE LoRA kernels (#27131) gnovack 2025-11-03 04:22:17 -08:00
40b69e33e7 [Model] Add PaddleOCR-VL Model Support (#27758) zhang-prog 2025-11-03 19:04:22 +08:00
32257297dd [CI/Build] Remove the flaky gpt-oss lora test (#27966) Jee Jee Li 2025-11-03 16:50:06 +08:00
