Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

1c46dea001 Revert "[Kernels][FI] Skip trtllm attention when num_kv_heads=1 (#308… (#31617) shyeh25 2026-01-11 04:39:59 +08:00
028599739d [BugFix] scheduler: Fix resuming of preempted requests after async load (#31583) Or Ozeri 2026-01-10 22:39:25 +02:00
d1fd802fa3 fused_moe_kernel - cast accumulator after applying router weights (#32002) gnovack 2026-01-10 12:36:45 -08:00
543c23be78 [LoRA][Perf] Improve FusedMoE LoRA performance for small rank (#32019) Xin Yang 2026-01-10 11:04:18 -08:00
b8bf5c45bb [Kernel] Optimize Sliding Window Attention in 3D Triton Kernel (#31984) jvlunteren 2026-01-10 19:13:44 +01:00
e6c6f2c79d [Quant] Support MXFP4 W4A16 for compressed-tensors dense models (#31926) Michael Goin 2026-01-10 09:44:35 -05:00
07286ec5a6 [Bugfix] Fix integer overflow in Gemma3n audio processing (#31657) Jeremy Teboul 2026-01-10 01:52:53 -08:00
14fc7a68c7 [Bugfix] fix offline chat output prompt (#32076) Ning Xie 2026-01-10 15:50:57 +08:00
5f2385a4c8 [Benchmark][1/2] Generalize SLA criterion validation from binary flags to margins (#32075) Cyrus Leung 2026-01-10 15:11:03 +08:00
a01a1c0d69 [Bugfix] fix encoder cache leak of waiting requests in scheduler to solve stuck in CPU scheduling (#31857) Frelam 2026-01-10 14:27:58 +08:00
da6709c9fe [Misc] Delay deprecation of CommonAttentionMetadata properties (#32074) Lucas Wilkinson 2026-01-10 00:06:44 -05:00
d83becd503 [ROCm][CI] Fix flaky test_function_calling_with_stream and reduce schema test examples (#32063) Andreas Karatzas 2026-01-09 23:02:35 -06:00
0c9614876e Update modelopt KV cache quantization resolution to new scheme (#31895) roikoren755 2026-01-10 06:54:13 +02:00
583a90e005 [Refactor] Separate sequence and token pooling types (#32026) Cyrus Leung 2026-01-10 12:53:24 +08:00
52d428295d [Core] Refactor ColumnParallelLinear: remove unused parameter and optimize forward (#31939) maang 2026-01-10 12:19:49 +08:00
c60578de0a [Bugfix][Hardware][AMD] Use dynamic WARP_SIZE in sampler vectorized_process (#31295) Kevin McKay 2026-01-09 21:57:38 -06:00
80fead8bf6 Fuse RoPE and MLA KV-cache write (#25774) PatrykSaffer 2026-01-10 04:18:37 +01:00
e45946bd91 feature/issac 0.2 (#31550) Akshat Shrivastava 2026-01-09 19:18:05 -08:00
ea6d067a2a [Misc][LLaMa4] Compile LLaMa Vision Encoder (#30709) Lucas Kabela 2026-01-09 19:01:38 -08:00
abd9224280 resolve pydantic error in startup benchmark (#31348) Ning Xie 2026-01-10 10:41:27 +08:00
4dc0d606b7 [Bugfix] Narrow broad exceptions in compilation backends (#31616) Kevin McKay 2026-01-09 20:39:22 -06:00
ac0675ff6b [CI] Allow Deprecated Quantization For LM Eval Tests (#32065) Micah Williamson 2026-01-09 20:10:47 -06:00
e18464a57d [Perf] Optimize async scheduling placeholder using empty (#32056) Wentao Ye 2026-01-09 19:46:11 -05:00
1963245ed1 [Core] Use weights_only=True with torch.load (#32045) Russell Bryant 2026-01-09 19:28:57 -05:00
0308901975 [2/N][Attention] Fix pre-commit errors (#32052) Matthew Bonanni 2026-01-09 19:27:15 -05:00
aaf4b70aae [Misc][BE] Type coverage for vllm/compilation [2/3] (#31744) Lucas Kabela 2026-01-09 15:30:38 -08:00
3adffd5b90 [Misc] Enable async scheduling by default with spec decoding (#31998) Nick Hill 2026-01-09 15:09:34 -08:00
97ba96fbe9 [perf][async] support non cpu sync get logprob tensors for spec (#31336) zhrrr 2026-01-10 05:24:51 +08:00
94578127a4 [NIXL] refine decoder side post process for heterogeneous BlockSize and kv_layout (#30275) Chendi.Xue 2026-01-09 15:22:19 -06:00
2612ba9285 [1/N][Attention] Restructure attention: move files (#31916) Matthew Bonanni 2026-01-09 16:10:24 -05:00
1f8b7c536b [responsesAPI] fix incomplete_messages for simple/parsable context (#31836) Andrew Xia 2026-01-09 16:00:57 -05:00
0a0aa07747 [Quant] Make static quant support all group shapes (#30833) Lucas Wilkinson 2026-01-09 15:49:27 -05:00
f9e2a75a1e [fix] add cutedsl to global sf (#32001) jiahanc 2026-01-09 12:03:02 -08:00
a4d5d663e2 Add unpermute-aware fused MoE path and small-batch fallback (#29354) Runkai Tao 2026-01-09 14:58:39 -05:00
657e9c0e18 [Fix] Introduce audio channels spec (#31595) Jeremy Teboul 2026-01-09 11:34:51 -08:00
308feab33f [Perf] Optimize cutlass moe problem size calculation, 5.3% E2E Throughput improvement, 2.2% TTFT improvement (#31830) Wentao Ye 2026-01-09 14:13:43 -05:00
28ae32a5d3 [Refactor] Remove numpy split in async scheduling (#32034) Wentao Ye 2026-01-09 14:09:02 -05:00
f32c629eb4 [Frontend][gpt-oss] Allow system message to overwrite model identity (#31737) Andrew Xia 2026-01-09 14:03:57 -05:00
cd4a95e3aa [Feat][Core] Support multiple KV cache groups in Hybrid KV Coordinator (#31707) Yifan Qiao 2026-01-09 10:53:20 -08:00
d5ec6c056f [UX] Add vLLM model inspection view (#29450) Michael Goin 2026-01-09 12:12:35 -05:00
08d954f036 [Doc] Add developer guide for CustomOp (#30886) Shanshan Shen 2026-01-10 00:21:11 +08:00
ac9f9330e6 Rename --exclude-log-deltas to --enable-log-deltas (#32020) Kevin Šuc 2026-01-09 16:30:40 +01:00
2d0c5b630e [Doc] Remove hardcoded Whisper in example openai translation client (#32027) Isotr0py 2026-01-09 22:44:52 +08:00
34cd32fe30 [Perf][Kernel] Fused SiLU+Mul+Quant kernel for NVFP4 cutlass_moe (#31832) Michael Goin 2026-01-09 09:40:33 -05:00
8e27663b6a [CPU] Add head sizes 80 and 112 with vec16 fallback (#31968) R3hankhan 2026-01-09 19:44:46 +05:30
7cdf7e2fe0 [Model] Remove redundant None check in DeepSeekOCR image input processing (#32016) maang 2026-01-09 22:12:44 +08:00
bbf80ede43 Fix type error (#31999) Adolfo Victoria 2026-01-09 06:03:32 -08:00
4505849b30 [ROCm][PD] add moriio kv connector. (#29304) inkcherry 2026-01-09 22:01:57 +08:00
db07433ce5 [Misc] Skip hashing kwargs if value is None (#32025) Roger Wang 2026-01-09 05:20:59 -08:00
e02706d2d2 [ROCm][CI][V1] Fix nixl_connector test failure and achieve CUDA parity in test_async_scheduling (#32000) Andreas Karatzas 2026-01-09 06:48:32 -06:00
b474782ad7 [Feature][Benchmarks] Custom dataset: read output length from dataset (#31881) Sophie du Couédic 2026-01-09 13:40:59 +01:00
55212c1404 fix: remove duplicate engine_id check in nixl_connector (#31948) Bofeng Xue 2026-01-09 20:13:17 +08:00
e7b68f4d6c [Bugfix] Fix Triton FusedMoE LoRA (#30585) Xin Yang 2026-01-09 03:46:59 -08:00
1a19e9cd87 [Bugfix][ROCm]Fix Qwen3-Next-80B-A3B-Thinking inference and optimize non-standard block size (544) support under rocm_atten (#31380) vllmellm 2026-01-09 12:28:02 +01:00
c8ed39b9dd [Model] Reorganize pooling layers (#31973) Cyrus Leung 2026-01-09 19:02:14 +08:00
020732800c [Bugfix] Fix OpenAPI schema test failures (#31921) Andreas Karatzas 2026-01-09 04:56:20 -06:00
dc77cb7129 [Bugfix] Fix Var Length Batched Padding in Granite Speech (#31906) Alex Brooks 2026-01-09 03:28:43 -07:00
bde38c11df fix lora moe sharding when rank < max_lora_rank (#31994) gnovack 2026-01-08 22:43:25 -08:00
707b240d7e [Bugfix] Fix FusedMoE LoRA w2_output_size (#31949) Xin Yang 2026-01-08 21:54:05 -08:00
29ce48221c [Cleanup] Remove obsolete spec decoding compatibility logic (#32003) Nick Hill 2026-01-08 21:44:18 -08:00
7a05d2dc65 [CI] [ROCm] Fix tests/entrypoints/test_grpc_server.py on ROCm (#31970) TJian 2026-01-09 12:54:20 +08:00
a1648c4045 [ROCm][CI] Fix test_token_classification.py::test_bert_models (#31993) Divakar Verma 2026-01-08 22:04:33 -06:00
e2d49ec2a4 [Bugfix] missing tokens occur in harmony streaming (#30437) RioS 2026-01-09 12:59:34 +09:00
8413868dab [Bugfix] Fix typo in FusedMoE LoRA reshape comment (#31992) Xin Yang 2026-01-08 18:46:05 -08:00
8ff4a99566 [Async][Feat] support apply penalty or bad_words for async + spec (#30495) zhrrr 2026-01-09 10:31:50 +08:00
a4ec0c5595 [Frontend] Add MCP tool streaming support to Responses API (#31761) daniel-salib 2026-01-08 17:19:34 -08:00
0fa8dd24d2 [Bugfix] Fix Typo from NVFP4 Refactor (#31977) Robert Shaw 2026-01-08 19:18:50 -05:00
6ebe34d6fa [Feature] Add iteration level logging and enhance nvtx marker (#31193) Max Hu 2026-01-08 19:13:39 -05:00
11cec296dd [BugFix] Add spec-decode-incompatible request param validation (#31982) Nick Hill 2026-01-08 16:08:21 -08:00
5825bbc1f7 [Quantization] Deprecate Long Tail of Schemes (#31688) Robert Shaw 2026-01-08 19:07:45 -05:00
d62cfe546d [MoE Refactoring][Bugfix]Wrap WNA16 Triton kernel into mk and change compressed tensor kernel selection (#31752) Yongye Zhu 2026-01-08 16:01:30 -08:00
6cdf015c3c [Misc] Fix Current vLLM config is not set. warnings, assert to avoid issues in the future (#31747) Lucas Wilkinson 2026-01-08 18:20:49 -05:00
5d3b6097ad [Compressed-Tensors] Simplify NVFP4 Conditions, enable marlin support for NVFP4A16 MoEs (#30881) Dipika Sikka 2026-01-08 17:45:17 -05:00
e74698c27a [Misc][Refactor] Add FusedMoERouter object (#30519) bnellnm 2026-01-08 15:52:55 -05:00
aa125ecf0e [Frontend] Improve error message (#31987) Cyrus Leung 2026-01-09 04:07:03 +08:00
f16bfbe5bc [Documentation][torch.compile] Add documentation for torch.compile + multimodal encoders (#31627) Lucas Kabela 2026-01-08 11:33:24 -08:00
87e07a6b46 Revert "feat(moe): Add is_act_and_mul=False support for Triton MoE kernels" (#31978) Michael Goin 2026-01-08 14:31:53 -05:00
7508243249 [Model Runner V2] Simplify BlockTables with UVA (#31965) Woosuk Kwon 2026-01-08 10:24:26 -08:00
83e1c76dbe [CI][ROCm] Fix NIXL tests on ROCm (#31728) Nicolò Lucchesi 2026-01-08 18:34:43 +01:00
a563866b48 Fix ijson build for Power. (#31702) Nishidha Panpaliya 2026-01-08 22:42:33 +05:30
a3d909ad2b [Misc] Tidy up some spec decode logic in GPUModelRunner (#31591) Nick Hill 2026-01-08 09:10:07 -08:00
49568d5cf9 [Doc] Improve MM models LoRA notes (#31979) Jee Jee Li 2026-01-09 00:55:22 +08:00
b8112c1d85 [Bugfix] Fix vllm serve failure with Nemotron Nano V3 FP8 (#31960) danisereb 2026-01-08 18:08:37 +02:00
eaba8ece77 [Bugfix]: Fix Step3ReasoningParser missing is_reasoning_end_streaming (#31969) Chauncey 2026-01-08 23:28:13 +08:00
fe86be66c5 [Model] Support IQuestCoder model (#31575) yxing-bj 2026-01-08 22:42:57 +08:00
1da3a5441a [Docs]: update claude code url (#31971) Chauncey 2026-01-08 22:04:55 +08:00
72c068b8e0 [CI] [Bugfix] Fix unbounded variable in run-multi-node-test.sh (#31967) TJian 2026-01-08 21:42:01 +08:00
7645bc524b [OpenAI] Fix tool_choice=required streaming when output has trailing extra data (#31610) Mary 2026-01-08 13:01:42 +00:00
1123a87892 [Model] Enable LoRA support for Pixtral (#31724) Ce Zhao 2026-01-08 08:00:57 -05:00
03fd76c570 [Model] Add LFM2-VL model support (#31758) tianshu-Michael-yu 2026-01-08 05:00:27 -08:00
59d260f5e4 [Model] Add Grok-2 (#31847) Bijaya Dangol 2026-01-08 13:59:48 +01:00
18d4e481d0 [Voxtral] Fix speech transcription api (#31388) Patrick von Platen 2026-01-08 12:34:19 +02:00
2972a05473 [MM Encoder]: Make MMEncoderAttention's scale takes effect properly (#31950) Isotr0py 2026-01-08 18:33:48 +08:00
5576227bc1 [Model] Standardize common vision encoders (#31947) Cyrus Leung 2026-01-08 18:33:16 +08:00
d1b6fe007f [Chore] Further cleanup pooler (#31951) Cyrus Leung 2026-01-08 18:16:21 +08:00
04a49669d1 RayLLM Bugfix - Preserve obj store URL for multi engine_config creation (#30803) omer-dayan 2026-01-08 12:00:25 +02:00
96fcd3c267 [Misc] Support qwen3-next lora (#31719) BingjiaWang 2026-01-08 17:27:50 +08:00
1f214290d6 fix(compile): apply partition wrapper when loading AOT cached functions (#31536) DevByteAI 2026-01-08 11:27:26 +02:00
8cbdc7eb94 [CI/Build] Enable test_kv_cache_events_dp for AMD (#31834) Ryan Rock 2026-01-08 03:00:24 -06:00
b634e619bb Decouple page_size_bytes calculation in AttentionSpec for TPU/RPA Compatibility. (#31635) Lumosis 2026-01-08 01:00:07 -08:00
