Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

a3a51d20e7 [Benchmark] Improvements to attention benchmark script (#37115) Wei Zhao 2026-03-16 18:22:40 -04:00
e5b807607c [Quant][Feature] Support online MXFP8 quantization for MoE and dense models (#35448) EdalatiAli 2026-03-16 18:07:39 -04:00
fd4d96302a Fix eplb nvfp4 experts hook (#37217) Elvir Crnčević 2026-03-16 23:03:54 +01:00
c0f011918d [Bugfix] opcheck false mutation error in rms_norm_per_block_quant (#36688) (#36779) Krish Gupta 2026-03-17 02:41:33 +05:30
e6ae4b1be1 [compile] Enable mega aot artifact for torch 2.12+. (#37198) Zhengxu Chen 2026-03-16 17:05:51 -04:00
2dccb38f73 [Bugfix][MultiConnector] Fix MultiConnector for SupportsHMA sub-connectors (#36549) v0.18.0rc0 zhanqiuhu 2026-03-16 16:51:04 -04:00
d157216093 [BUGFIX][Mamba] Use uint64 for address in KVBlockZeroer (#37197) Kunshang Ji 2026-03-17 04:39:56 +08:00
93f3c8e531 [Misc] Add float16 to CacheDType (#37199) Matthew Bonanni 2026-03-16 16:24:48 -04:00
2cc26c3a99 [CI][BugFix][MORI][AMD] Add transfer_id to kv transfer params for test (#37213) rasmith 2026-03-16 15:22:57 -05:00
dfa8852db2 [Refactor] Consolidate GPT-OSS reasoning parser tests (#36915) Flora Feng 2026-03-16 15:53:07 -04:00
714c6e0eab [torch.compile][BE] Modify cudagraph callable to check for is_forward_context_set (#36288) Lucas Kabela 2026-03-16 12:42:34 -07:00
0fefd00e6c [Bugfix] Fix render server crash for quantized models on CPU-only hosts (#37215) Sage 2026-03-16 20:59:01 +02:00
f5c081d432 [PD][Nixl] Add support for hybrid SSM-FA models (#36687) Nicolò Lucchesi 2026-03-16 19:58:06 +01:00
c88ea8338b [MTP][Sparse MLA] Take advantage of native MTP support in indexer when possible (#36982) Matthew Bonanni 2026-03-16 13:51:21 -04:00
9f9ecff4cd Add simple granite4 tool parser (#36827) Max de Bayser 2026-03-16 14:49:09 -03:00
ca1954d58c [Bugfix] Disable cross-layer KV cache for MLA attention backends (#37090) haosdent 2026-03-17 01:03:10 +08:00
55e6d3d5c0 [Bugfix] Make siglip/clip compatible with transformers v5 (#37200) Raushan Turganbay 2026-03-16 17:48:18 +01:00
6682c231fa [Bugfix] Add error handling for FINISHED_ERROR in OpenAIServing (#37148) Chauncey 2026-03-17 00:27:47 +08:00
5ae685c1c8 [Bugfix] Relax TRTLLM KV cache contiguity assertion for cross-layer layout (#34158) Itay Etelis 2026-03-16 17:20:51 +02:00
ce8cf9161d [Compile] Fix compile warning st256_cs in cuda_vec_utils.cuh (#36693) Wentao Ye 2026-03-16 11:12:15 -04:00
18be11fd59 [BUGFIX]fix CUDA OOM ERROR : invalid argument at cumem_allocator.cpp:119 (#35594) xjx 2026-03-16 23:10:42 +08:00
8d8855fdae [Bugfix] Add safety check and fallback for null scaling factor (#36106) Yuanheng Zhao 2026-03-16 22:27:29 +08:00
e855d380fa [Compile] Fix compile warning in moe_permute (#36529) Wentao Ye 2026-03-16 10:16:14 -04:00
0e5a9382af [Bugfix] accept redacted thinking blocks in Anthropic messages (#36992) Benjamin Bartels 2026-03-16 14:01:57 +00:00
04bf5a35fa [Spec Decode] Update extract_hidden_states to use deferred kv_connector clear (#37013) Fynn Schmitt-Ulms 2026-03-16 09:53:45 -04:00
43a73f853b Remove unused EVS functions in qwen3_vl.py (#37183) Tianyu Guo 2026-03-16 21:09:09 +08:00
ffbc2e5bdb Patch Mistral config (#37104) Julien Denize 2026-03-16 13:22:18 +01:00
f9e6db3034 [Models][Qwen3 ViT] Keep max_seqlen on CPU to prevent D2H sync (#37139) Lukas Geiger 2026-03-16 12:11:59 +00:00
d61d2b08e9 [Build] Fix API rate limit exceeded when using VLLM_USE_PRECOMPILED=1 (#36229) elvischenv 2026-03-16 20:09:27 +08:00
f5e59ee7a6 [Performance] Add prefetch for checkpoints to OS page cache (#36012) Artem Perevedentsev 2026-03-16 13:32:02 +02:00
9b005edc48 [Docs] Make the link to hardware plugins clearer (#37174) Harry Mellor 2026-03-16 11:12:58 +00:00
bf9a185395 GLM4 tool parser: fix streaming mode (#35208) Robin Nabel 2026-03-16 11:48:52 +01:00
ad041c79db Fix text only inputs for MRoPE models with the Transformers modelling backend (#37055) Harry Mellor 2026-03-16 10:31:16 +00:00
747b068136 [Hardware] Replace memory related torch.cuda APIs (#37031) Kunshang Ji 2026-03-16 18:24:48 +08:00
122f75d939 Fix pipeline parallel with multimodal models with the Transformers modelling backend (#37057) Harry Mellor 2026-03-16 10:20:37 +00:00
d8f8a7aad2 [Misc] Sync pre-commit to 4.5.1 in workflows and docs (#36675) SoluMilken 2026-03-16 18:03:21 +08:00
0115e957d4 [Frontend][Misc] Remove unused log in /is_sleeping (#37093) Roy Wang 2026-03-16 17:46:28 +08:00
116ed130f4 [Bugfix] Fix GDN attention crash with mixed decode/spec-decode batches (#34871) haosdent 2026-03-16 17:30:23 +08:00
8374387bd8 [FlashInfer] Revert block_size 16 + head_size 256 workaround on Blackwell (#36987) Vadim Gimpelson 2026-03-16 13:04:29 +04:00
912fbe9555 [Bugfix] Fix Qwen2.5-Omni/Qwen3-Omni use_audio_in_video with multi-video inputs (#37147) Isotr0py 2026-03-16 16:56:06 +08:00
52131f88d9 use skip_all_guards_unsafe to drop global_state and torch_function_mode_stack guards instead of previous hacks (#36204) Laith Sakka 2026-03-16 01:52:31 -07:00
821eb80c0d [Performance][Model Loader] Skip non-local expert weights during EP model loading (#37136) Roy Wang 2026-03-16 16:33:36 +08:00
a2956a0f8e [ROCm][CI] Retrying in case of batch variance effects and reducing flakiness (#36442) Andreas Karatzas 2026-03-16 03:08:51 -05:00
911355e216 [ROCm] Fix KV copy methods and auto-select attention backend for ROCm (#36845) Andreas Karatzas 2026-03-16 03:07:27 -05:00
8d3f8f485e [Bugfix] fix Qwen3.5 tool calling bug (#36774) Chauncey 2026-03-16 15:38:42 +08:00
96efb91480 [Model Runner V2] Fix processed logits in sample() (#37144) Woosuk Kwon 2026-03-16 00:35:49 -07:00
2754231ba3 [Kernel] Add FlashInfer MoE A2A Kernel (#36022) leo-cf-tian 2026-03-16 02:45:32 -04:00
2390d44209 [Model] Add HyperCLOVAX-SEED-Think-14B language model support (#37107) bigshanedogg 2026-03-16 15:40:05 +09:00
7362b4450a [Bugfix] Avoid LD_PRELOAD check on MacOS (#37145) Li, Jiang 2026-03-16 14:31:44 +08:00
57a314d155 [CI][Bugfix] Fix 500 errors from priority overflow and TemplateError subclasses in schema fuzz tests (#37127) Andreas Karatzas 2026-03-16 00:27:21 -05:00
d4c57863f7 [ROCm][CI] Fix engine teardown and text normalization to stabilize voxtral test (#37138) Andreas Karatzas 2026-03-15 23:49:31 -05:00
68e1b711f1 [XPU] Add deepseek_scaling_rope fused kernel (#36612) Wang, Yiting 2026-03-16 12:35:08 +08:00
0024f39a32 [ROCm][P/D][MORI][BugFix] Add transfer_id for moriio_connector so moriio_connector to restore P/D functionality (#34907) rasmith 2026-03-15 21:36:51 -05:00
e9163b536e [responsesAPI][ez] add a unit test for SimpleContext logprobs (#37126) Andrew Xia 2026-03-15 17:12:26 -07:00
7acaea634c In-Tree AMD Zen CPU Backend via zentorch [1/N] (#35970) Lalithnarayan C 2026-03-16 05:05:35 +05:30
697e4ff352 [GDN] add a config for gdn kernel selection (#36647) Jiangyun Zhu 2026-03-16 00:40:17 +08:00
a3e2e250f0 [Feature] Add Azure Blob Storage support for RunAI Model Streamer (#34614) Hari 2026-03-15 17:08:21 +05:30
143e4dccdf [Misc] Add online audio_in_video test (#36775) Isotr0py 2026-03-15 15:14:11 +08:00
6590a3ecda [Frontend] Remove torchcodec from audio dependency (#37061) Isotr0py 2026-03-15 13:15:59 +08:00
b3debb7e77 [Build] Upgrade xgrammar to get a security fix (#36168) Russell Bryant 2026-03-14 23:13:48 -04:00
458c1a4b2d [Frontend] Reduce chat template warmup logging levels (#37062) Nick Hill 2026-03-14 13:48:59 -07:00
821fde2df4 [Bugfix] Fix xgrammar dtype mismatch on macOS CPU inference (#32384) Karan Bansal 2026-03-14 22:59:06 +05:30
8c29042bb9 [Feature] Add InstantTensor weight loader (#36139) arlo 2026-03-15 01:05:23 +08:00
5467d137b3 [Frontend] Avoid startup error log for models without chat template (#37040) Cyrus Leung 2026-03-15 00:36:11 +08:00
3ed46f374b [Model Runner V2] Add Support for XD-RoPE (#36817) Santino Ramos 2026-03-14 09:27:55 -07:00
84868e4793 [Bugfix][Frontend] Fix audio transcription for MP4, M4A, and WebM formats (#35109) seanmamasde 2026-03-14 23:44:03 +08:00
a8e8d62dd8 [Misc] Clean up Kimi-audio whisper encoder loading (#36903) Isotr0py 2026-03-14 23:37:52 +08:00
e42b49bd69 Mistral common v10 (#36971) Julien Denize 2026-03-14 15:26:43 +01:00
4a718e770d [Bug] Fix Failure in /v1/chat/completions/render for Multimodal Requests (https://github.com/vllm-project/vllm/issues/35665) (#35684) Sergey Zinchenko 2026-03-14 17:10:11 +03:00
600a039f57 [CI] Shard Multi-Modal Models (Standard) into 4 parallel jobs (#37014) Kevin H. Luu 2026-03-14 01:26:54 -07:00
ffa5d74f15 Enable loading of fused expert weights in the Transformers modelling backend (#36997) Harry Mellor 2026-03-14 07:01:06 +00:00
74fe80ee95 [CI] Split Distributed Tests (4 GPUs) into 3 parallel jobs (#37015) Kevin H. Luu 2026-03-13 21:21:13 -07:00
bcfdadb1bc [Refactor] Relocate chat completion and anthropic tests (#36919) Flora Feng 2026-03-14 00:16:16 -04:00
236de72e49 [CI] Pin helion version (#37012) Yanan Cao 2026-03-13 20:25:29 -07:00
a116f96930 [V1] Remove pin_memory() in async_copy_to_gpu to fix sporadic stalls (#37006) sbeurnier 2026-03-14 02:37:32 +01:00
092ace9e3a [UX] Improve UX of CPU backend (#36968) Li, Jiang 2026-03-14 09:27:29 +08:00
f680dc1b39 [responsesAPI] prioritize content over summary in reasoning item input (#36516) Andrew Xia 2026-03-13 18:20:30 -07:00
b41aa264f9 fix: resolve chat template names before kwargs detection (#36937) Giulio Leone 2026-03-14 01:20:16 +01:00
367cf5cd3e [Feat][Bugfix] Enable additional dimension for Flashinfer MLA and fix routing dtype (#36931) Dimitrios Bariamis 2026-03-14 00:41:16 +01:00
6d53efd2a5 [Bugfix] Fix MLA attention crash with AWQ/GPTQ quantized models (#34695) haosdent 2026-03-14 07:25:41 +08:00
8b346309a5 [Refactor] Consolidate SupportsEagle (#36063) Benjamin Chislett 2026-03-13 19:22:40 -04:00
54a6db827f [BugFix] Fix "DP Coordinator receives unexpected..." messages (#37008) Nick Hill 2026-03-13 16:18:05 -07:00
9efc4db965 [Bugfix] Fix DeepSeek-V3.2 tokenizer stripping spaces (#37004) Matthew Bonanni 2026-03-13 18:55:36 -04:00
f1816fb192 [CI] Split V1 e2e + engine (1 GPU) into separate jobs (#36945) Kevin H. Luu 2026-03-13 14:16:02 -07:00
0005d2a3c9 Use Transformers v5 WeightRenaming for Transformers modeling backend (#31545) Harry Mellor 2026-03-13 20:49:08 +00:00
d0b402974f [Bugfix][Spec Decode] Avoid double call of Ngram CPU (#36952) Ekagra Ranjan 2026-03-13 16:33:19 -04:00
6341d43043 [ROCm][Quantization] add quark w4a8 mxfp4_fp8 for LinearLayer (#35316) Divakar Verma 2026-03-13 15:44:24 -04:00
7afe0faab1 [Frontend][Core] Re-add shutdown timeout - allowing in-flight requests to finish (#36666) Mark McLoughlin 2026-03-13 19:10:06 +00:00
5a3f1eb62f [Misc] Set default kv_buffer_device in a better way (#36862) Harry Mellor 2026-03-13 19:07:33 +00:00
b3ce711b93 Fp8 lora dense kernel (#35242) yugong333 2026-03-13 12:05:08 -07:00
abf61aaa8e [Bugfix] Fix Qwen2.5-omni/Qwen3-omni mm_processor cache for audio_in_video request (#36800) Isotr0py 2026-03-14 02:16:05 +08:00
4508532fbd [Bugfix] fix paddleocr crash on some image shape (#36959) bigmoyan 2026-03-13 21:46:55 +08:00
d5af196c18 [2/N] Elastic EP Milestone 2: Integrating NIXL-EP (#35627) Itay Alroy 2026-03-13 15:25:33 +02:00
82f836d976 [XPU] Support LoRA via torch.compile on XPU platform (#36962) Chaojun Zhang 2026-03-13 18:34:59 +08:00
4fccd30f19 [ROCm][CI] Upgrading orchestrator to handle python pipeline markers and options (#36181) Andreas Karatzas 2026-03-13 04:04:22 -05:00
cfaf4668f7 [kv_offload+HMA][1/N]: Support multiple KV groups in OffloadingSpec (#36610) Or Ozeri 2026-03-13 10:04:21 +02:00
99a57bdf74 [ROCm][CI] Corrected the GPT-OSS test root path (#36711) Andreas Karatzas 2026-03-13 02:53:43 -05:00
a2268617cf [Frontend] Delegate preprocessing to OpenAIServingRender (#36483) Sage 2026-03-13 09:39:43 +02:00
a4ad9db541 Enable RoPE+KV cache fusion for ROCm AITER FA (non-shuffle layout) (#35786) Rohan Potdar 2026-03-13 02:33:22 -05:00
b373b5102a [Tests] Shutdown test RemoteVLLMServer cleanly (#36950) Nick Hill 2026-03-13 00:32:55 -07:00
