Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

0730414999 [Core] Add audio_embeds support to chat completions (#29059) jeremyteboul 2025-11-20 19:39:47 -08:00
a982f5b5ea [kernel][perf] support uncontiguous input for rms_norm kernel (#28103) zhrrr 2025-11-21 11:39:09 +08:00
0e741c12e3 [Bugfix] Fix Plamo3 rope handling (#29092) Cyrus Leung 2025-11-21 11:38:35 +08:00
56669c1f29 [CI] Fix mypy for vllm/v1/worker (#29037) Wentao Ye 2025-11-20 22:36:07 -05:00
3f5f36da3f [ROCm] Fix for import when building with upstream triton for gfx1100 for gpt-oss serving (#29127) Hongxia Yang 2025-11-20 22:30:07 -05:00
e1eefa4c40 [Bug] Fix torch warning of tf32 usage (#29112) Wentao Ye 2025-11-20 20:54:59 -05:00
ed6ae1e36a [AITER] [ROCm] Fix crash when loading llama4 model with old aiter version installed, fallback to forward_native implementation (#29124) Xiao Li 2025-11-20 17:54:35 -08:00
9875be6431 [LoRA][2/2]Remove LoRA extra vocab (#28545) Jee Jee Li 2025-11-21 09:46:43 +08:00
df44df0143 [Feature] Shared Experts Overlap with FI deepgemm swap kernel, 2.2% throughput improvement and 3.6% TTFT improvement (#28879) Wentao Ye 2025-11-20 20:41:49 -05:00
87cbbdff63 Update model references for OLMo3 (#29099) Michael Goin 2025-11-20 20:16:52 -05:00
986ab5db63 [CI Bugfix] Fix Kernels DeepGEMM Test (H100) (#29106) Michael Goin 2025-11-20 19:42:33 -05:00
dd39f91edb [Doc] cleanup TPU documentation and remove outdated examples (#29048) Rob Mulla 2025-11-20 19:05:59 -05:00
c7a29d2c8d [CI/Build] Remove skip global cleanup in test_struct_output_generate.py (#29022) rasmith 2025-11-20 15:44:37 -06:00
8237ab8a2b [CI/Build] Skip lm-format-enforcer tests in test_struct_output_generate.py for now (#29021) rasmith 2025-11-20 15:35:14 -06:00
3fd74189db Fixes bench (#29058) Driss Guessous 2025-11-20 13:21:54 -08:00
5e5a7eb16f [CI/Build] Make test_attention_selector.py run tests on correct platform (#29064) rasmith 2025-11-20 14:45:56 -06:00
3d84ef9054 [CI/Build][AMD] Skip if flash_attn_varlen_func not available in test_aiter_flash_attn.py (#29043) rasmith 2025-11-20 14:39:49 -06:00
4d01b64284 [Bugfix] - Add Trace Headers to Beam Search Path (#29100) Software Developer 2025-11-20 21:00:33 +01:00
114b0e2500 [chore] Update annotate release scripts (#29077) Kevin H. Luu 2025-11-20 10:22:40 -08:00
647464719b [KVConnector][Core] Support cross-layer KV blocks (#27743) Or Ozeri 2025-11-20 20:09:59 +02:00
e5bfcb6a88 [BugFix][PD]: make example proxy usable with P2pNcclConnector (#26628) Pan Li 2025-11-21 01:38:31 +08:00
22924383e1 Updating the mirror of test-amd.yaml as of 2025-11-18 (#29016) Alexei-V-Ivanov-AMD 2025-11-20 11:07:06 -06:00
56f45eddaf [Frontend] Optimize beam search loop by sorting and then splicing (#19347) rookie 2025-11-21 01:02:30 +08:00
82b05b15e6 [BugFix] [FEAT] Enable fastsafetensors for ROCm platform (#28225) TJian 2025-11-20 23:34:11 +07:00
a2e9ebe9e2 [BugFix] Fix flash_attn import in siglip2navit.py (#29082) Fanli Lin 2025-11-20 20:14:29 +08:00
93c8672ceb [Bugfix] Fix spec decode memory regression after #28549 (#28819) Zhewen Li 2025-11-20 03:05:50 -08:00
371b1d4c61 [RL] Add Pause and Resume Generation for Asynchronous RL Training (#28037) Samit 2025-11-20 19:01:03 +08:00
c9e093116c [MODEL] Implement plamo3 (#28834) Shinichi Hemmi 2025-11-20 20:00:19 +09:00
c0c2dd1e0b [BugFix] kv_offloading: Fix bug in loading of partial cpu blocks (#28951) Or Ozeri 2025-11-20 12:55:10 +02:00
06c20c9904 [ROCm] Add AMD GPU support on Deepseek v3.2 and SparseMLA (#26670) Pleaplusone 2025-11-20 18:54:01 +08:00
6eb745d9bd Add truncate arg to yarn to match openai implementation of gpt-oss (#28244) Anna Shors 2025-11-20 02:53:50 -08:00
66483a9d00 [Chore] Update xgrammar version from 0.1.25 to 0.1.27 (#28221) cjackal 2025-11-20 19:53:09 +09:00
edfe867208 [Misc] don't cache CUTLASS_REVISION var in CMakeLists.txt (#28518) Jinzhen Lin 2025-11-20 18:52:53 +08:00
dc45efc8ef [BugFix] Fix Llama4 Pipeline Parallelism Assert Error (#28577) Dezhan 2025-11-20 02:52:36 -08:00
fb8851f254 [Bugfix][cache_kernels]: Fix OOB in cache_kernels.cu (#28760) Vensen 2025-11-20 18:52:02 +08:00
a903d59ffa cleanup at::Tag::needs_fixed_stride_order (#28974) Boyuan Feng 2025-11-20 02:51:36 -08:00
322cb02872 [CI/Build][AMD] Fix import errors in tests/kernels/attention (#29032) rasmith 2025-11-20 03:48:09 -06:00
2c52c7fd9a [Bug] Fix torch dynamo warning Dynamo detected a call to a functools.lru_cache (#29038) Wentao Ye 2025-11-20 03:52:23 -05:00
1e1c06789e [ci][amd] fix EPLB execution test (#28742) Bradley D 2025-11-19 23:53:38 -08:00
7218f83992 [ROCm][BugFix] Fix shared expert loading error when disable VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS (#28633) Pleaplusone 2025-11-20 15:50:23 +08:00
20e4497be2 [V0 Deprecation] Remove num_lookahead_slots (#29000) Cyrus Leung 2025-11-20 14:39:10 +08:00
1c7bcc55b8 [Frontend] Allow parsed tool arguments (#28820) Quentin Gallouédec 2025-11-19 23:20:12 -07:00
a9705a290a [Model][QwenVL] Replace torch.repeat_interleave with faster np.repeat (#28964) Lukas Geiger 2025-11-20 06:04:23 +00:00
64192d5624 [Bugfix] Revert custom attention mask for gemma3-mm (#28995) Isotr0py 2025-11-20 13:23:22 +08:00
fe25772aa9 [Bugfix] Handle broken frames in video loading (#29001) Canlin Guo 2025-11-20 12:38:12 +08:00
0cca9b4d13 [Bugfix] Fix precision loss in LoRA-wrapped RowParallelLinear by fusing bias into GEMM (#28972) prashanth058 2025-11-19 19:50:37 -08:00
a8c536829c Consolidate Nvidia ModelOpt quant config handling for all quantization methods (#28076) Shengliang Xu 2025-11-19 19:39:36 -08:00
fcbcba6c70 [Feat] Iteration-level profiling for Torch and CUDA profiler (#28987) Benjamin Chislett 2025-11-19 22:17:48 -05:00
3168285fca [cpu][ci] Add initial set of tests for Arm CPUs (#28657) Fadi Arafeh 2025-11-20 02:37:09 +00:00
3fb0d90999 [AMD] Use Decoupled Kernel Block Size to Support AITER MLA block_size=1 (#27715) Qiang Zhang 2025-11-20 10:11:52 +08:00
05c2dee7e9 [DeepSeek + LMCache Multiprocess] handle MLA for deepseek model + LMCache Multiprocess connector (#29039) Kuntai Du 2025-11-20 09:40:49 +08:00
1d642872a2 [torchao] fix safetensors for sharding (#28169) liangel-02 2025-11-19 19:39:45 -05:00
9ccef8e333 [Misc] Colorize logs (#29017) Nick Hill 2025-11-19 16:26:04 -08:00
537cc635c7 [GC Debugger] Simply and improve GC Debugger Utils (#29029) Jialin Ouyang 2025-11-19 16:10:22 -08:00
5031cd5d55 [Refactor] Optimize select_experts (#28069) Wentao Ye 2025-11-19 18:53:15 -05:00
3aaa94ac99 [Performance] Reduce DeepGEMM N dim restriction from 128 to 64 multiplier (#28687) Alexander Matveev 2025-11-19 18:47:13 -05:00
8e38e99829 [Feature] EPLB on Qwen3VLMoe and CompressedTensorsWNA16MoEMethod (#28849) JartX 2025-11-20 00:30:08 +01:00
0075bfffd4 [CI] Fix precommit rope_theta issue (#29040) Wentao Ye 2025-11-19 17:22:43 -05:00
275de34170 [BugFix] Fix false assertion with spec-decode=[2,4,..] and TP>2 (#29036) (tag: v0.11.2) Lucas Wilkinson 2025-11-19 16:43:54 -05:00
fa3ffb4365 [BugFix] Ray with multiple nodes (#28873) Julien Denize 2025-11-19 22:20:58 +01:00
6d5974369c [BugFix] Fix async-scheduling + FlashAttn MLA (#28990) Lucas Wilkinson 2025-11-19 10:04:07 -05:00
0ce9990d2c [NVIDIA] Guard SM100 CUTLASS MoE macro to SM100 builds v2 (#28938) Johnny 2025-11-19 01:44:27 +01:00
cb0a7b4bea [Bugfix] Move flashinfer kernel check into ``__init__` function of `FusedMoE`` (#29018) Max Hu 2025-11-19 16:54:15 -05:00
8f4f77a727 [BugFix] Fix false assertion with spec-decode=[2,4,..] and TP>2 (#29036) Lucas Wilkinson 2025-11-19 16:43:54 -05:00
22e44ad589 [ROCm][CI] Fix Weight Loading With Multiple GPU Tests on ROCm (#28984) Micah Williamson 2025-11-19 15:31:33 -06:00
88f5b19f0b [DeepSeek] Fix DeepSeek V3.2 Rope Embedding (#28968) Yongye Zhu 2025-11-19 16:30:04 -05:00
613abb50d5 [MoE] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked (#25990) Shu Wang 2025-11-19 15:29:06 -06:00
cdeec2e606 [BugFix] Ray with multiple nodes (#28873) Julien Denize 2025-11-19 22:20:58 +01:00
1607e664f0 [Bug] Fix Batch Invariant MLA test (#28967) Wentao Ye 2025-11-19 16:18:32 -05:00
68d7231991 [CI/Build] Fix test_prefix_prefill for AMD (#28905) Ryan Rock 2025-11-19 15:04:36 -06:00
2fd893b4ce [Feature] Prefill Context Parallel (PCP) basic support (#28718) Qiu 2025-11-20 04:52:44 +08:00
02f5903b84 Eagle: MM Cuda Graphs with MRope (#28896) Izzy Putterman 2025-11-19 12:01:05 -08:00
ac10fd3c69 Upstreaming aiter triton attention backend as a new backend (#28701) Aleksandr Malyshev 2025-11-19 11:59:30 -08:00
9d2d561257 [Bugfix] Fix precision corruption when shared_experts_stream=None (#28942) 杰兮 2025-11-20 03:30:57 +08:00
fe69f331f8 [Kernels] Improve H200 Fused MoE Config (#28992) Robert Shaw 2025-11-19 14:23:54 -05:00
3319a493fc [Core] Reuse created spec tokens lists to mitigate GC cost (#28917) Jialin Ouyang 2025-11-19 11:20:22 -08:00
61728cd1df Re-enable FlashInfer for Llama4 on Blackwell in e2e fusion tests (#28966) Copilot 2025-11-19 13:32:19 -05:00
0c80efd94f GLM-V video segmentation solution adjustment (#28941) Yuxuan Zhang 2025-11-20 01:32:55 +08:00
a8b70304d6 Update rope_scaling to rope_parameters in preparation for Transformers v5 (#28542) Harry Mellor 2025-11-19 18:06:36 +01:00
d44e9df7d4 [Model][Mamba] Add selector for mamba attention backend and make it pluggable for other device (#26487) Shanshan Shen 2025-11-20 00:24:55 +08:00
48fc8b1e59 [BugFix] Fix async-scheduling + FlashAttn MLA (#28990) Lucas Wilkinson 2025-11-19 10:04:07 -05:00
1ffe934c8a [torch.compile] caching of config fields should be opt-out by default (#26468) vnadathur 2025-11-19 06:13:54 -08:00
2c8b9182b5 [CI] Reorganize compile tests so new tests are automatically included in CI (#28625) Yanan Cao 2025-11-19 06:13:50 -08:00
4f5299f717 Relax Transformers modeling backend MoE experts check (#28952) Harry Mellor 2025-11-19 14:50:30 +01:00
09540cd918 [Doc]: fix typos in various files (#29010) Didier Durand 2025-11-19 13:56:21 +01:00
da2f6800e0 [Feat][Perf] Enable deepep-low-latency with round-robin expert placement. (#28449) Chen Bruce 2025-11-19 20:46:24 +08:00
ba558c029a [config] Expose get_total_num_hidden_layers() in ModelConfig (#28961) Tova Movshovitz 2025-11-19 13:37:11 +02:00
97cfa99d59 [Docs] Take env var definition out of folded admonition (#29005) Harry Mellor 2025-11-19 12:32:04 +01:00
bbc6c2f1e5 [CI/Build] Fix broken build on Apple M1 (#28999) j20120307 2025-11-19 03:07:22 -08:00
8151609583 refactor(cpu_types_scalar.hpp): Unify scalar loop implementations using unroll_loop (#28847) ihb2032 2025-11-19 19:05:44 +08:00
fdf93486d6 [Docs] Clean up moe_kernel_features.md (#28530) Michael Yao 2025-11-19 18:35:29 +08:00
d69062c67a add support for --fully-sharded-loras in fused_moe (#28761) gnovack 2025-11-19 00:32:00 -08:00
ae4821a108 Add CPU support model (#28697) Louie Tsai 2025-11-18 23:47:57 -08:00
7ed27f3cb5 [Doc]: fix typos in various files (#28945) Didier Durand 2025-11-19 07:52:30 +01:00
a4511e38db Speed up macOS smoke test (#28954) Michael Goin 2025-11-19 01:46:32 -05:00
71d0ae1c54 [Misc] Update embedding/cross encoder tests to use mteb v2 (#27329) Roman Solomatin 2025-11-19 09:28:40 +03:00
3d4e7d34be [Model][QwenVL] Simplify cos/sin rotary embedding indexing (#28962) Lukas Geiger 2025-11-19 05:43:01 +00:00
6a25ea5f0e [Docs] Update oneshot imports (#28188) Uranus 2025-11-19 13:30:08 +08:00
73ff872db0 [Bugfix] Fix typo in Qwen3 Next model executor (#28960) Gleb Kurchanov 2025-11-19 08:21:02 +03:00
468a8d72ba [Bugfix] Fix FusedMoEModularKernel for triton backend (#28913) Xin Yang 2025-11-18 21:05:22 -08:00
