Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

b30372cbd0 [Perf] Move gc.freeze logic from EngineCoreProc to EngineCore for better coverage (#27896) Jialin Ouyang 2025-11-10 15:34:18 -08:00
d17ecc6b19 [PERF] Allreduce fusion. Support torch native matching. Tuning of the thresholds (#24248) Ilya Markov 2025-11-11 00:33:11 +01:00
021143561f [ROCm] Add missing gemm_a8w8_blockscale import (#28378) Yong Hoon Shin 2025-11-10 13:13:36 -10:00
30700b1cd7 [CI] Fix Plugin Tests Tests (#28413) (tag: v0.11.1rc6) Robert Shaw 2025-11-10 17:36:11 -05:00
4b94ed8f92 [Frontend][2/n] remove empty content from _parse_tool_calls_from_content (#28331) Andrew Xia 2025-11-10 14:07:49 -08:00
6dec9f6109 [BugFix] Fix DeepGEMM over-allocating workspace (#28254) Lucas Wilkinson 2025-11-10 17:01:17 -05:00
bf6a3d0ff5 [Misc] Add more scoping for improved trace (#28329) Wei Wei 2025-11-10 13:03:21 -08:00
40d33264c6 [Bugfix][EPLB] Disabled shared expert overlap when EPLB is enabled (#28377) Sage Moore 2025-11-10 12:39:19 -08:00
9c84ca8293 [FA/Chore] Bump FA version for FP8 two-level accumulation (#27889) Jonas M. Kübler 2025-11-10 21:06:04 +01:00
6d54336ae5 [Bugfix] Fix llguidance backend, rollback when EOS was encountered (#25905) Rémi Delacourt 2025-11-10 20:53:32 +01:00
34553b9d27 [Performance] Support FP8 flashinfer TRTLLM MOE on Qwen3 and Qwen-3next (#27492) jiahanc 2025-11-10 09:34:57 -08:00
b039bfda8f [Bugfix] Fix persistent_masked_m_silu_mul_quant tests (#28366) Varun Sundar Rabindranath 2025-11-10 12:21:52 -05:00
d0e186c16f [V0 Deprecation] Remove unused context_len and seq_len from M-RoPE (#28395) Cyrus Leung 2025-11-11 00:30:06 +08:00
f080a83511 [RFC][ROCm][AITER] Keep all AITER kernels in _aiter_ops class like _custom_ops and _ipex_ops (#24490) vllmellm 2025-11-10 17:20:53 +01:00
40e2eeeb92 [Kernel] Optimization of the mm_k operator. (#28280) caozuoba 2025-11-11 00:03:46 +08:00
b06b9470ca [Rocm][fused_moe][fp4] view weight to torch.float4_e2m1fn_x2 when running aiter fused moe for fp4 model (#27474) zejunchen-zejun 2025-11-10 23:38:56 +08:00
4673e465ff Add @tjtanaa to codeowner for ROCm and multi-modal (#28360) TJian 2025-11-10 05:39:17 -08:00
912744d066 [Fix] optimize visual token mask with caching and multi-token support (#28374) Ferrebo 2025-11-10 21:23:49 +08:00
15be507c86 [bugfix] fix siglip batch text output error (#28365) Yu Jiaqi 2025-11-10 21:21:15 +08:00
6f7de33bed [Metrics] Refactor LoRA state tracking (#26801) Mark McLoughlin 2025-11-10 08:34:36 +00:00
a98cc35c34 Restore PlaMo2 unit test as pfnet/plamo-2-1b now supports transformers >=4.56 (#28019) Shinichi Hemmi 2025-11-10 15:50:02 +09:00
e8697faf03 [V0 deprecation] Remove no longer used get_metadata_cls (#28370) Lucas Wilkinson 2025-11-10 01:32:09 -05:00
03fa4d3fb3 [Hardware][AMD][Model] Add Triton MoE tuning support and optimized configs for Qwen3 omni for MI308X (#28373) Xiake Sun 2025-11-10 12:53:40 +08:00
6b2b9fd934 [CI] lora/test_mixtral.py : Add additional expected outputs due to flakiness (#28322) Varun Sundar Rabindranath 2025-11-09 21:45:29 -05:00
c5f685b3ae [ROCm][Platform] Add RX7900XTX device id in _ROCM_DEVICE_ID_NAME_MAP (#28279) JartX 2025-11-10 00:09:36 +01:00
c4768dcf47 [Kernel] Fix fused_gdn_gating (#28343) Jiangyun Zhu 2025-11-10 05:26:35 +08:00
a65a934ebe [CI/Build] Temporary fix to LM Eval Small Models (#28324) Zhewen Li 2025-11-09 13:08:38 -08:00
4a8d6bd168 Fix cu_num_generated_tokens slicing logic in LogprobsLists.slice() method (#28214) usberkeley 2025-11-10 03:11:46 +08:00
636efd10a5 [Core] Separate out attention metadata building logic from prepare inputs (#26764) Lucas Wilkinson 2025-11-09 13:51:43 -05:00
289eb6c537 [Core] Simplify async KV output aggregation (#28327) Nick Hill 2025-11-09 09:44:13 -08:00
19d91ece4b [CI] Fix flaky test_eagle_correctness test (#28364) Nicolò Lucchesi 2025-11-09 17:04:59 +01:00
7ae5a5fb11 [Misc] Add some comments in qwen3-next (#28267) Jiangyun Zhu 2025-11-09 15:59:24 +08:00
de2b78305f [ROCm] Add env to enable/disable aiter triton gemm (#28321) Yong Hoon Shin 2025-11-08 20:27:00 -10:00
e5e9067e61 [Misc] fix typo and add detailed log (#28178) Ning Xie 2025-11-09 13:33:46 +08:00
3a7d580343 fix: close issue 28338 by fixed python version (#28339) yihong 2025-11-09 13:07:26 +08:00
05f8d69077 [chore] Move some wikimedia images to S3 (#28351) Kevin H. Luu 2025-11-08 17:58:26 -08:00
404d7a9d14 [Performance][gpt-oss] Revert gpt-oss max cudagraph size to 1024 (#28345) Mohammad Miadh Angkad 2025-11-09 06:50:10 +08:00
171133f929 [Bugfix] Fix test fused quant layernorm tests (#27865) ElizaWszola 2025-11-08 23:31:33 +01:00
32787d0644 Remove setuptools upper bound constraint (<80) (#28337) Cole Murray 2025-11-08 14:30:18 -08:00
975676d174 [Feat] Drop-in Torch CUDA Profiler (#27841) Benjamin Chislett 2025-11-08 17:07:37 -05:00
77d702a22b Enhance run_cluster.sh for multi-NIC support (#28328) Ev Lacey 2025-11-08 14:04:16 -08:00
2108a571d7 [DCP] Support dcp kv_cache interleave size > 1 (#26696) zhangsicheng5 2025-11-09 03:45:27 +08:00
47604137a2 [Bugfix] Spec decode + structured output + spec model max len edge case (#28298) Andy Lo 2025-11-08 19:44:25 +00:00
26990d25dc [Bugfix] Update device name for H200 detection (#28349) Robert Shaw 2025-11-08 14:01:11 -05:00
d9ab1ad9d1 reasoning_content -> reasoning (#27752) Harry Mellor 2025-11-08 04:15:08 -08:00
608bb14462 [Attention] Remove max cudagraph size limit of 992 (#27840) 22quinn 2025-11-07 22:33:27 -08:00
4a36681f85 [flashinfer][fix] do not check nvcc availability when using pre-downloaded cubins (#27990) Xiaozhu Meng 2025-11-07 22:25:21 -08:00
d15afc1fd0 Refactor CPU/GPU extension targets for CMake build (#28026) Abolfazl Shahbazi 2025-11-07 22:17:35 -08:00
934a9c3b79 [Model] Consolidate Deepseek-MoE implementation with DeepSeek-v2 (#28101) Isotr0py 2025-11-08 13:01:27 +08:00
70af44fd10 [bugfix] support eagle with lora cudagraph specialization (#28318) gnovack 2025-11-07 19:25:45 -08:00
781f5ebf52 Bump arctic-inference requirement (#28174) Aurick Qiao 2025-11-07 18:31:18 -08:00
0852527647 [Perf][DeepSeek] Add sigmoid+bias fusion to fused_grouped_topk from TRTLLM (#28124) Michael Goin 2025-11-08 10:20:55 +08:00
61d25dc44b Update gpu.rocm.inc.md to add support for AMD Ryzen AI MAX / AI 300 Series (gfx1151, gfx1150) (#28308) Hamid Mukhtar 2025-11-07 21:09:21 -05:00
d0c7792004 [Bugfix][LoRA][Spec Decode] Support LoRA with speculative decoding (#21068) Xiaohong (Sean) Chen 2025-11-07 20:58:22 -05:00
b158df2813 remove resolve_op_overloads and use splitting_ops directly (#28081) Boyuan Feng 2025-11-07 17:13:13 -08:00
1aaecda078 [XPU] Enable Expert parallel for MoE models (#28263) Kunshang Ji 2025-11-08 08:33:11 +08:00
811df41ee9 Update Flashinfer from v0.4.1 to v0.5.2 (#27952) Harry Mellor 2025-11-07 16:24:42 -08:00
67a2da890e [PerfFix] Avoid separate thread for MP executor shm spin (take 2) (#28319) Nick Hill 2025-11-07 14:11:03 -08:00
da786e339e [Core] Rework handling of async scheduling config (#28250) Nick Hill 2025-11-07 12:01:23 -08:00
18903216f5 [Bugfix] Fix and add tests for GptOss reasoning parser (#28000) Benjamin Chislett 2025-11-07 14:28:04 -05:00
d0ceb38ae8 [Build] Fix release pipeline failing annotation (#28272) Simon Mo 2025-11-07 10:06:45 -08:00
155ad56d7b [doc] add guide about the provided PTX was compiled with an unsupported toolchain (#28305) youkaichao 2025-11-08 00:26:34 +08:00
5fb4137c99 [README] Add Arm CPUs to the list of supported targets (#28290) Fadi Arafeh 2025-11-07 15:41:47 +00:00
68a72a5cc1 Revert "[PerfFix] Avoid separate thread for MP executor shm spin (#28012)" (#28289) Nicolò Lucchesi 2025-11-07 16:07:01 +01:00
0f872b7977 [Log] update shm wait time msg (#28255) Boyuan Feng 2025-11-07 06:43:30 -08:00
4b1ff13221 [Feature] Default ignore_eos True for random dataset (#28227) Wentao Ye 2025-11-07 07:35:33 -05:00
e0d6b4a867 [CLI] add --max-tokens to vllm complete (#28109) Iceber Gu 2025-11-07 20:21:40 +08:00
72b1c2ae2c [Bugfix] Use latency MOE backend as default for Flashinfer and other misc fixes (#27439) Pavani Majety 2025-11-07 04:18:39 -08:00
e0919f331d [Core][MM] Add mechanism to configure multimodal fields which should stay on CPU (#28168) Lukas Geiger 2025-11-07 12:14:29 +00:00
8e19d470af [fix] Revert "fixing mm placeholder replacement issue with gemma3" (#28285) Kevin H. Luu 2025-11-07 04:09:09 -08:00
1958bda9b4 [Misc][Model][Refactor] Pass the prefix into Linear layers (#28259) Mengqing Cao 2025-11-07 19:38:38 +08:00
7bdb42b2f2 [CPU]Avoid repeated random sample compile (#28260) Zhang Xiangze 2025-11-07 19:03:57 +08:00
315068eb4a [FixBug]Aeala/ShareGPT_Vicuna_unfiltered marked as multimodal benchmark (#28265) 汪志鹏 2025-11-07 17:35:22 +08:00
ccd98b59c1 [Perf] Introduce FlattenLogprobs to store logprobs results to reduce GC overhead (#28171) Jialin Ouyang 2025-11-07 00:27:12 -08:00
21b82f4ea2 [Kernel] LoRA triton kernels support PDL (#27402) Jee Jee Li 2025-11-07 16:05:48 +08:00
a736e5ff77 [CI] Reduce Blackwell Fusion test runtime by filtering tests and only run all tests in nightly (#28074) Copilot 2025-11-07 15:58:16 +08:00
9da9208b20 [Bug] Fix missing token_ids for reasoning parser models in chat completions #28246 (#28256) baonudesifeizhai 2025-11-07 02:31:58 -05:00
11fd69dd54 [amd][gptoss] Perf gain because of block alignment (#28024) smit kadvani 2025-11-06 21:27:42 -08:00
c0a4b95d64 Fix issues from #28242 (#28257) Harry Mellor 2025-11-06 20:23:17 -08:00
a47d94f18c Add runai model streamer e2e test for GCS (#28079) Alexis MacAskill 2025-11-06 19:07:54 -08:00
e70fbc599b [CI/Build] Loosen STT LoRA Translate Check (Flaky Test) (#28247) Alex Brooks 2025-11-06 19:51:27 -07:00
4bf56c79cc [Multimodal][torch.compile] Add compilation config field for turning off ViT/MM compile (#28242) Lucas Kabela 2025-11-06 16:16:03 -08:00
59b453eaa2 Speed up mm processor kwargs per request by spliting dynamic and static kwargs (#26483) Junhong Liu 2025-11-07 07:51:28 +08:00
827e4237bc Fix failing test for CRadio (#27738) Eugene Khvedchenya 2025-11-07 01:32:25 +02:00
ca6f755d24 [BugFix] Fix FusedMoELoRA + ModularKernel Integration (#28237) Varun Sundar Rabindranath 2025-11-06 17:53:30 -05:00
ca90f50304 [Test] Add non-MoE DP test coverage (#28235) Matthew Bonanni 2025-11-06 15:59:57 -05:00
da855b42d2 [Doc]: Make extraInit containers fully configurable in helm chart (#27497) Fang Han 2025-11-06 12:27:16 -08:00
449de9001a [ROCm] triton fp8 kernel (#27058) Aleksandr Malyshev 2025-11-06 11:46:44 -08:00
d4aa65c998 [Chore] eliminate duplicated and unconditional object serialization in anthropic messages api (#27792) Vico Chu 2025-11-07 03:09:19 +08:00
7a8375f8a0 Add llama 4 scaling support (#28145) Julien Denize 2025-11-06 19:55:17 +01:00
5e0c1fe69c [Structured outputs] Upgrade llguidance to 1.3.0 (#28039) Andy Lo 2025-11-06 18:24:47 +00:00
4507a6dae4 CODEOWNERS: Add myself as reviewer on security docs (#28216) Russell Bryant 2025-11-06 12:39:42 -05:00
d1dd5f53e4 [Frontend] Fix logging format when enable response logging (#28049) Roy Wang 2025-11-07 00:25:39 +08:00
e52e4da971 [HARDWARE][CPU] Add Option for Disabling Binding to Specific CPU Cores (#27953) StanHatko 2025-11-06 10:47:11 -05:00
2176778cd3 [Doc] Add Arm CPUs are on the list of supported targets in vLLM (#26018) Milos Puzovic 2025-11-06 15:30:26 +00:00
0370679ce9 [Kernel][Model] Tune fused_moe Triton configs for MiniMax-M2 on H100 (#28200) Eric Yue 2025-11-06 23:29:46 +08:00
8816e375d3 [Docs] Switch to directory style URLs (#28058) Harry Mellor 2025-11-06 07:06:33 -08:00
f32229293e Disable nm-testing models with issues in CI (#28206) Michael Goin 2025-11-06 22:19:07 +08:00
c757a15f0f [CPU]Improve cpu fused moe perf (#27244) xiangze-arm 2025-11-06 19:04:18 +08:00
59a50afa08 [Frontend] OpenAI Responses API supports Tool/Function calling - non-harmony (#26874) Chauncey 2025-11-06 18:40:03 +08:00
