Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

3ecabd06ee Fix tpu-inference platform path (#29554) Johnny Yang 2025-11-26 23:25:21 -08:00
c069086b9c [Bugfix] Fix getting device for MoE LoRA (#29475) Jee Jee Li 2025-11-27 15:16:07 +08:00
11ea5ec1ff [Model Runner V2] Refactor CudaGraphManager (#29583) Woosuk Kwon 2025-11-26 21:37:59 -08:00
ecb1952378 [cpu][fix] Fix Arm CI tests (#29552) Fadi Arafeh 2025-11-27 05:09:41 +00:00
da8e1a1bf9 [DOC] Add vLLM Bangkok Meetup info (#29561) TJian 2025-11-27 12:42:50 +08:00
ee80aee1ca [Model Runner V2] Minor cleanup for build_attn_metadata (#29576) Woosuk Kwon 2025-11-26 20:10:12 -08:00
0aeb698b77 [Model Runner V2] Minor code cleanup (#29570) Woosuk Kwon 2025-11-26 19:47:17 -08:00
9bb33c8919 add xpu supported model and model id for cpu (#29380) Louie Tsai 2025-11-26 19:30:50 -08:00
a67dec7cba [Bugfix] fix IMA issue in certain cases of the moe marlin kernel (#28619) Jinzhen Lin 2025-11-27 11:02:21 +08:00
77740191de [Attention][Async] Eliminate seq_lens_cpu in FlashAttention metadata building with DCP > 1 (#29449) Matthew Bonanni 2025-11-26 21:48:43 -05:00
df01eda4dc [Bugfix] Make compressed-tensors MoEs respect ignored layers (#28878) HDCharles 2025-11-26 21:35:13 -05:00
ba1fcd84a7 [TPU] add tpu_inference (#27277) Johnny Yang 2025-11-26 14:46:36 -08:00
56539cddac [Core] Refactor padding logic and pad for CUDA graphs before attention metadata building (#28579) Lucas Wilkinson 2025-11-26 14:07:13 -05:00
430dd4d9eb [Attention] Remove imports from vllm/attention/__init__.py (#29342) Matthew Bonanni 2025-11-26 12:53:15 -05:00
c4c0354eec [CI/Build] allow user modify pplx and deepep ref by ENV or command line (#29131) Alec 2025-11-26 09:41:16 -08:00
e603129505 [refactor] CTConfig methods to static/class methods (#28870) HDCharles 2025-11-26 12:21:58 -05:00
0b0aa874e8 [Perf] Optimize batch invariant BMM, 18.1% Throughput improvement, 10.7% TTFT improvement (#29345) Wentao Ye 2025-11-26 11:38:52 -05:00
70d5953f82 Revert "[Bugfix] Fix GPT-OSS AR+NORM fusion (#28841)" (#29483) Huamin Li 2025-11-26 06:27:26 -08:00
3650a74ed8 Optimize the wording of the document and unify the terminology and th… (#29491) yxt 2025-11-26 21:16:12 +08:00
bb706d6048 Fix TeleChatForCausalLM not register issue (#29473) Yejing Lai 2025-11-26 21:15:00 +08:00
e30859dff3 [Bugfix] Fix handling of image embeds in models (#29480) Cyrus Leung 2025-11-26 21:00:15 +08:00
452a7c9f7c [Misc] Allow LM only loading for Pixtral (#29451) Roger Wang 2025-11-26 05:00:00 -08:00
d9d342d214 [Performance][MLA][ROCm] Remove redundant D2D copy in deepseek (#27457) Pleaplusone 2025-11-26 12:45:28 +08:00
53d7f1f601 [Kernel] Use pre-allocated output buffer for triton kernel fused_experts (#29219) Xin Yang 2025-11-25 18:21:00 -08:00
c5ee430328 Bump actions/checkout from 4 to 6 (#29293) dependabot[bot] 2025-11-26 01:57:08 +00:00
8d6a89dffd [UX] Suppress gloo log spam (#29250) Michael Goin 2025-11-25 20:19:35 -05:00
56531b79cc [Misc] Add backup hash algorithm for FIPS constrained environments (#28795) George D. Torres 2025-11-25 18:50:22 -06:00
12866af748 dummy run corner case (#29433) Xieyang Xu 2025-11-25 16:20:35 -08:00
d8819c88eb fix assertion for single world use case (uni) (#29429) Lucia Fang 2025-11-25 16:14:23 -08:00
de75b0bb70 [BugFix] Fix initialization of draft model. (#29319) Andrey Khalyavin 2025-11-26 02:45:58 +03:00
7df0289782 Change warning logs to debug for unimplemented MXFP4 Linear/Attention (#29441) Michael Goin 2025-11-25 17:52:31 -05:00
0abc79482a [caching] Add enable_prompt_embeds and cpu_offload_gb to compile hashes. (#29435) Zhengxu Chen 2025-11-25 16:46:41 -05:00
4e57c6587f [Core] Support logprobs with spec decode + async scheduling (#29223) Nick Hill 2025-11-25 12:55:24 -08:00
e7d776273d [Compile] Refactor. Move PostGradPassManager out of Compilation config (#29340) Ilya Markov 2025-11-25 20:58:56 +01:00
c32a18cbe7 Attempt to fix GPU OOM in a spec-decoding test (#29419) Eldar Kurtić 2025-11-25 20:23:36 +01:00
b07555d26f [responsesAPI][2] parse ResponseFunctionToolCallOutputItem (#29383) Andrew Xia 2025-11-25 10:27:26 -08:00
0353d2e162 Fix RoPE related failures in Transformers nightly tests (#29333) Harry Mellor 2025-11-25 16:23:45 +00:00
a1f2676879 Scheduled removal of override_pooler_config and disable_log_requests (#29402) Harry Mellor 2025-11-25 16:08:57 +00:00
48ddb02b79 [Hybrid Allocator] Support KV cache groups with different block_size (#29143) Yifan Qiao 2025-11-25 07:30:57 -08:00
e502098643 [Kernel] Add NVFP4 MoE CUTLASS support for SM120 (#29242) Michael Goin 2025-11-25 09:59:07 -05:00
dbc3d9991a [UX] Put CUDA attention backend selection log into one line (#29337) Michael Goin 2025-11-25 09:46:18 -05:00
794029f012 [Feature]: Improve GGUF loading from HuggingFace user experience like repo_id:quant_type (#29137) Injae Ryou 2025-11-25 23:28:53 +09:00
0231ce836a Revert back to torch.equal over torch.allclose from #28819 (#29086) Eldar Kurtić 2025-11-25 15:23:38 +01:00
516c3f7847 [Bugfix] Fix logic for choosing default prefix caching setting (#29393) Thomas Parnell 2025-11-25 15:05:10 +01:00
51fc9e017a Scheduled removal of CompilationConfig.use_inductor (#29323) Harry Mellor 2025-11-25 12:55:42 +00:00
bf0c75cd4f Make Transformers Nightly tests soft-fail and enable all tests (#29401) Harry Mellor 2025-11-25 12:41:15 +00:00
c2c661af9b [Bugfix] Fix overallocation in MM profiling (#29386) Roger Wang 2025-11-25 04:38:36 -08:00
798e87db5c [Core] Generalize Encoder-Decoder seq_lens computation to avoid Whisper hardcoded logic (#29268) Nicolò Lucchesi 2025-11-25 12:32:11 +01:00
de6889946b [Misc] Suppress log outputs when constructing the default vllm config. (#29291) wang.yuqi 2025-11-25 19:00:44 +08:00
7a80b01889 [CI] Resettle pooling entrypoints tests. (#29370) wang.yuqi 2025-11-25 18:39:10 +08:00
e1dd706cd1 [Frontend] Respect Chat Completion parallel_tool_calls param (#26233) Ben Browning 2025-11-25 04:56:15 -05:00
a685b47c57 [responsesAPI] refactor construct_input_messages (#29359) Andrew Xia 2025-11-25 01:47:10 -08:00
32c40b95e0 [BugFix] bad_words filtering ineffective when n > 1 (#29313) Avishek Goswami 2025-11-25 15:06:34 +05:30
db2906108a [Misc] Streamline unique id generation (#29375) Nick Hill 2025-11-25 00:30:11 -08:00
67fc16cd8c [Bugfix] If chunked_prefill is disabled, end the scheduling early. (#28911) wang.yuqi 2025-11-25 16:06:09 +08:00
6330f9477d [Bugfix] Fix GPT-OSS AR+NORM fusion (#28841) elvischenv 2025-11-25 15:59:40 +08:00
ef1f7030f0 [ROCm][CI] Fix test_cudagraph_mode failure in AMD CI (#29367) Micah Williamson 2025-11-25 01:55:09 -06:00
12c007e288 EAGLE Support DP>1 (#26086) Rémi Delacourt 2025-11-25 08:32:21 +01:00
f242cfcdd5 [Perf] use cpu all reduce to avoid sync when async_scheduling & dp > 1 (#29311) zhrrr 2025-11-25 15:31:07 +08:00
888152bf87 Allow oot custom compiler extension via CompilerInterface (#28623) Icey 2025-11-25 15:25:15 +08:00
fe3a4f5b34 [CI/Build] Pin torchgeo dependency for AMD (#29353) Ryan Rock 2025-11-25 01:14:59 -06:00
98caeadd54 [fix][cpu] Use a SwigluOAI impl which supports interleaved gate-up wei (#29273) Fadi Arafeh 2025-11-25 07:11:11 +00:00
64deead719 [Bugfix] [ROCm] [UX]: revert Flex attention backend (#29371) vllmellm 2025-11-25 14:56:06 +08:00
7992324f23 [BugFix] Use unique ids for different transcription prompts (#29372) Nick Hill 2025-11-24 22:55:16 -08:00
40a6f53f6c Display warning only when ROCm version is less than Pytorch required version (#29200) Inoki 2025-11-25 07:40:06 +01:00
ce58fdc1c3 Fix PoolingParams.skip_reading_prefix_cache type (#29364) kflu 2025-11-24 22:39:29 -08:00
a21256c463 Add TP CLI argument to multimodal inference examples (#29301) Fanli Lin 2025-11-25 14:03:20 +08:00
316c8492bf Scheduled removal of guided_* config fields (#29326) Harry Mellor 2025-11-25 05:24:05 +00:00
2d9ee28cab [CI/Test Fix] Fix CP tests on Blackwell (#29338) Lucas Wilkinson 2025-11-24 23:55:57 -05:00
81db702ed2 [Attention] add _cudagraph_support for linear attention (#28934) Jiangyun Zhu 2025-11-25 12:25:20 +08:00
92effb07a4 [Model] Add HunyuanOCR support (#29327) Isotr0py 2025-11-25 11:28:51 +08:00
87185c88d5 [Bugfix] Make deprecated --task embedding consistent with `--runner… (#29312) Maryam Tahhan 2025-11-25 03:19:52 +00:00
9cf4edae6e [Metrics] Scheduled removal of deprecated metrics (#29330) Mark McLoughlin 2025-11-25 03:15:13 +00:00
7012d8b45e [Docker] Optimize Dockerfile: consolidate apt-get and reduce image size by ~200MB (#29060) 汪志鹏 2025-11-25 10:54:00 +08:00
22b42b5402 [CI][ROCm] Install arctic-inference on ROCm tests (#29344) Divakar Verma 2025-11-24 20:15:39 -06:00
cb7214d8ea [ROCm][MLA] enable fp8 MLA decode on ROCm (#28032) gbyu-amd 2025-11-25 10:15:02 +08:00
77e10c9cab [Perf][Deepseek] optimize gather_and_maybe_dequant_cache kernel's perf for extremely long sequence (#28029) Pleaplusone 2025-11-25 10:05:46 +08:00
6f1355a1b7 [Perf] Disable DeepGEMM MoE by default when TP=8 is used (#29346) Michael Goin 2025-11-24 21:01:40 -05:00
a4ad43ad5a Scheduled removal of ParallelConfig's direct child EPLB fields (#29324) Harry Mellor 2025-11-25 01:58:58 +00:00
a178a0b40b [BugFix] Fix duplicate id tool-call race condition (#29355) Nick Hill 2025-11-24 17:54:26 -08:00
b8328b49fb [XPU] upgrade torch & ipex 2.9 on XPU platform (#29307) Kunshang Ji 2025-11-25 09:34:47 +08:00
5f9679a43b [Spec Decode] Add support for EAGLE3 heads that do not use_aux_hidden_states (#27688) Hanjie Qiu 2025-11-24 20:13:12 -05:00
699bca76c0 [UX] Raise error for attn backend of batch invariant (#29348) Wentao Ye 2025-11-24 19:49:01 -05:00
c17610e2ba [Bugfix] Only use triton_kernels for MXFP4 on SM90 and SM100 (#29339) Michael Goin 2025-11-24 18:22:46 -05:00
71df2a57ef [Hybrid Allocator] Better layer padding strategy for gpt-oss eagle (#29303) Chen Zhang 2025-11-24 14:28:32 -08:00
4dd42db566 Remove VLLM_SKIP_WARMUP tip (#29331) Tyler Michael Smith 2025-11-24 17:16:05 -05:00
84371daf75 [Tests] Verify gpt_oss package is installed in harmony tests (#29336) Nick Hill 2025-11-24 14:04:31 -08:00
f32c7d6f54 [Model Runner V2] Simplify Eagle bookkeeping with num_rejected (#29347) Woosuk Kwon 2025-11-24 13:54:59 -08:00
3cfa63ad99 [XPU]fix Kimi-VL-A3B-thinking on xpu (#29309) Yan Ma 2025-11-25 05:02:21 +08:00
4d6afcaddc [CI/Build] Moves to cuda-base runtime image while retaining minimal JIT dependencies (#29270) Benjamin Bartels 2025-11-24 19:40:54 +00:00
97588c4d12 [Model Runner V2] Add minor clarification comments for Eagle (#29332) Woosuk Kwon 2025-11-24 11:28:56 -08:00
839c6b7b72 [Multimodal][Qwen3 Omni] Make Qwen3 Omni work with audio-in-video inputs in V1 engine. (#27721) Chenheli Hua 2025-11-24 11:24:37 -08:00
8f066146c3 [MoE][Refactor] Make select_experts a non-static method (#29067) bnellnm 2025-11-24 13:38:04 -05:00
cec418b5df [Model Runner V2] Change Numba AoT to JIT (#29328) Woosuk Kwon 2025-11-24 09:34:37 -08:00
cc313cb73d [Model Runner V2] Implement Single-step Eagle 1 (#29300) Woosuk Kwon 2025-11-24 09:32:27 -08:00
26a465584a [NIXL] Use config to enable telemetry + NIXL version bump (#29305) Nicolò Lucchesi 2025-11-24 18:18:04 +01:00
e924bbb4f4 [Build/CI][DP/EP] Add QWen/Qwen3-30B-A3B-FP8 + EPLB tests to Nightly H100 and B200 (#29195) Varun Sundar Rabindranath 2025-11-24 11:06:17 -05:00
656516c315 [Bugfix] properly handle nested json with llama3 tool parser (#27701) Aydin Abiar 2025-11-24 07:28:51 -08:00
e48b2e6848 [Bugfix] [ROCm] [UX] Reorganize ROCm Backend Selection Logic (#26980) vllmellm 2025-11-24 22:24:49 +07:00
7a228b5305 Add option to use unbacked, and backed size obl dynamic shapes for more sounds compilation. (#26199) Laith Sakka 2025-11-24 07:12:41 -08:00
