Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

49d9653852 [ROCm][CI] fix get_valid_backends (#32787) Divakar Verma 2026-01-21 22:27:47 -06:00
a1d82466ea [Docs] Remove outdated async_scheduling limitation with speculative decoding (#32775) Ifta khairul Alam Adil 2026-01-22 05:19:25 +01:00
24a163ed77 Cleanup some huggingface_hub-related stuff (#32788) Lucain 2026-01-22 04:38:17 +01:00
378385b90c [EC Connector] Optimize remote cache check in scheduler (#32585) knlnguyen1802 2026-01-22 11:30:59 +08:00
c5487e2b96 [Bugfix] Fix potential EAGLE spec decode segfault during graph capture (#32818) Matt 2026-01-21 21:11:55 -06:00
6437ff1fb9 [Deprecation] Remove deprecated environment variables (#32812) Wentao Ye 2026-01-21 21:25:16 -05:00
5e00b561cd [Model Runner V2] Do not error on attention backends (#32820) Woosuk Kwon 2026-01-21 17:02:48 -08:00
408195ec59 [Model Runner V2] Refactor Prompt Logprobs (#32811) Woosuk Kwon 2026-01-21 15:12:20 -08:00
63227accf5 [Kernel] Add topk_sigmoid kernel (#31246) Xin Yang 2026-01-21 14:49:51 -08:00
e675dda67b [Misc] Add Helion version check to collect_env (#32797) Yanan Cao 2026-01-21 13:54:46 -08:00
24dc30f7ff [ModelRunner V2] Don't pin reused flashinfer tensors (#32799) Nick Hill 2026-01-21 13:17:43 -08:00
180fba653e [ROCm] fix import for on_gfx9 (#32783) Divakar Verma 2026-01-21 12:41:11 -06:00
f999539869 Add missing import of fused_topk to benchmark_moe (#32784) danisereb 2026-01-21 20:30:10 +02:00
e1da249c93 [Model Runner V2] Minor refactor for compute_slot_mappings (#32794) Woosuk Kwon 2026-01-21 10:24:35 -08:00
9b693d023c [Misc] Omit "disable NCCL for DP sync" startup log when not applicable (#32707) Nick Hill 2026-01-21 09:03:39 -08:00
808d6fd7b9 Bump Flashinfer to v0.6.1 (#30993) elvischenv 2026-01-22 00:49:50 +08:00
1861ae8aae [PluggableLayer][1/N] Define PluggableLayer (Fix ci) (#32744) whx 2026-01-22 00:38:04 +08:00
4e31b7f228 [Quantization][Deprecation] Remove RTN (#32697) Robert Shaw 2026-01-21 11:34:42 -05:00
6c20e89c02 [ROCm][Deepseekv3.2] Refactor Sparse Indexer as CustomOp (#29287) Pleaplusone 2026-01-21 23:16:30 +08:00
85f55c943c [Quantization][Deprecation] Deprecate HQQ (#32681) Robert Shaw 2026-01-21 09:32:40 -05:00
cea3c754c4 [Quantization][Deprecation] Remove DeepSpeedFp8 (#32679) Robert Shaw 2026-01-21 09:32:12 -05:00
42135d6898 [MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414) Robert Shaw 2026-01-21 08:22:33 -05:00
e14467be43 [bugfix] Aria model (#32727) Divakar Verma 2026-01-21 07:11:31 -06:00
7727ce35c2 [Model] Add Eagle2.5-8B Vision-Language Model support (#32456) Kim Hee Su 2026-01-21 18:39:53 +09:00
6bb2bc71e2 [Bugfix] Force using spawn multiprocess method when it's the WSL platform (#32749) Yanwen Lin 2026-01-21 01:35:55 -08:00
c80f92c14d [Documentation] Fix typo in docs/design/torch_compile_multimodal.md (#32741) Lucas Kabela 2026-01-20 23:54:20 -08:00
f23fb5a7c1 [Bugfix] Support HF sharded weights for Mistral3/Pixtral models (#32673) RickyChen / 陳昭儒 2026-01-21 15:27:30 +08:00
360aa93f8f [Docs] Fix GitHub handle in governance process (#32582) Paco Xu 2026-01-21 15:07:50 +08:00
27ca95b3c9 [Bugfix] Fix Nemotron-Nano-v2-vlm static resolution (#32682) Netanel Haber 2026-01-21 08:28:21 +02:00
b4f64e5b02 Update FlashMLA (#32491) Lucas Wilkinson 2026-01-20 22:03:37 -07:00
7ab80a8e37 Added qwen3 vision language moe support for speculative decoding (#32048) shanjiaz 2026-01-20 22:24:05 -05:00
0900cedb3f Enable Eagle3 speculative decoding for Pixtral (LlavaForConditionalGeneration) (#32542) gopalsarda 2026-01-20 19:18:05 -08:00
6f067b1fb7 [Cleanup] Remove unused KVConnectorModelRunnerMixin methods (#32077) Nick Hill 2026-01-20 19:16:37 -08:00
27b81e010d [Bugfix] Fix Granite Vision / Don't use Siglip Pooling Head Nested Models by Default (#32299) Alex Brooks 2026-01-20 20:11:52 -07:00
7013e9ac8f OffloadingConnector: Prevent redundant loads (#29087) Or Ozeri 2026-01-21 03:15:42 +02:00
c78ee240b3 Revert "[PluggableLayer][1/N] Define PluggableLayer" (#32725) Robert Shaw 2026-01-20 19:21:06 -05:00
d2389c1262 fp8 online quant: split out Fp8OnlineLinearMethod (#32189) Vasiliy Kuznetsov 2026-01-20 18:13:22 -05:00
22375f8d13 [ROCm][CI] Remove DS async eplb accuracy test from AMD CI (#32717) Micah Williamson 2026-01-20 15:40:48 -06:00
9b67338b78 [Bugfix] Suppress log on non-ROCm platform (#32703) TJian 2026-01-21 05:38:20 +08:00
2261340806 [Misc] Remove pad_for_cudagraphs from config (#30143) Lucas Wilkinson 2026-01-20 13:05:48 -07:00
86c69dc54c [Bugfix] Fix byte fallback handling when using outlines (#31391) Shinichi Hemmi 2026-01-21 04:48:08 +09:00
7c5dedc247 [AOT compilation] support torch.compile inductor artifacts in VllmCompiledFunction (#25205) dolpm 2026-01-20 11:45:59 -08:00
193069d129 [5/N] Initialize MM components in context managers (Q-Z) (#32695) Cyrus Leung 2026-01-21 03:10:23 +08:00
f0feb1cf81 Test: added acceptance length tests (#32030) Rahul Tuli 2026-01-21 00:25:15 +05:30
09194b90a5 [Doc] Update docs for MM model development with context usage (#32691) Cyrus Leung 2026-01-21 02:37:35 +08:00
9ab4388cd3 [Model Runner V2] Support FLASHINFER_MLA backend (#32709) Woosuk Kwon 2026-01-20 10:26:17 -08:00
04a9e064db [Bugfix] fix the ima issue of qwen-vit (#32687) JJJYmmm 2026-01-21 01:21:25 +08:00
c025263ddd [Doc] [ROCm] Update ROCm getting started doc (#32580) TJian 2026-01-21 01:20:08 +08:00
6c97b9b9b6 [Perf] Only clone when needed for moe_permute (#32273) Wentao Ye 2026-01-20 11:34:39 -05:00
4ca62a0dbd [PluggableLayer][1/N] Define PluggableLayer (#32331) whx 2026-01-21 00:19:21 +08:00
7901109ea5 [Bugfix] Fix Off-by-one error in _num_tokens_to_min_blocks calculation (#32603) linhaifeng 2026-01-21 00:13:39 +08:00
13f6630a9e [XPU]Support AgRsAll2AllManager on XPU device (#32654) YiSheng5 2026-01-20 22:27:24 +08:00
fda3f03eb2 [4/N] Initialize MM components in context managers (M-P) (#32663) Cyrus Leung 2026-01-20 22:06:32 +08:00
bb9172030e [Metrics] Complete removal of deprecated vllm:time_per_output_token_seconds metric (#32661) 杨朱 · Kiki 2026-01-20 20:28:41 +08:00
c4e5bdf61b [Bugfix] Fix the fp8_mqa_logits dim mismatch (#32652) Chauncey 2026-01-20 18:48:07 +08:00
7f1bcd18ff [3/N] Initialize MM components in context managers (I-L) (#32650) Cyrus Leung 2026-01-20 18:21:56 +08:00
8be263c3fb [Core] Cleanup shm based object store on engine shutdown (#32429) Walter Beller-Morales 2026-01-20 03:53:37 -05:00
e1a34c3a5d [2/N] Initialize MM components in context managers (E-H) (#32641) Cyrus Leung 2026-01-20 16:12:56 +08:00
148117ea2e [Refactor] Make FP8 Linear Ops use kernel abstraction (#27814) vllmellm 2026-01-20 14:48:20 +08:00
e9c83cdc51 [Model Runner V2] Skip kernel launch for penalties & logit_bias (#32634) Woosuk Kwon 2026-01-19 22:20:19 -08:00
b75e85dede [1/N] Initialize MM components in context managers (A-D) (#32632) Cyrus Leung 2026-01-20 14:12:42 +08:00
4753f3bf69 [Model] Use context managers for encoder- and LM-only mode (#32605) Cyrus Leung 2026-01-20 11:43:38 +08:00
6c01ffb897 [Model Runner V2] Decouple temperature from penalties (#32629) Woosuk Kwon 2026-01-19 19:13:24 -08:00
7b7cdce968 [Model Runner V2] Refactor get_cudagraph_and_dp_padding (#32625) Woosuk Kwon 2026-01-19 18:25:02 -08:00
12dab78f49 [Feat] allow inplace loading lora (#31326) Jackmin801 2026-01-19 18:15:20 -08:00
05dc4bfab6 [Model Runner V2] Initialized communication buffer for DP (#32624) Woosuk Kwon 2026-01-19 17:27:06 -08:00
1a1fc3bbc0 [Attention][MLA] Make FLASHINFER_MLA the default MLA backend on Blackwell, and TRTLLM the default prefill (#32615) Matthew Bonanni 2026-01-19 18:41:34 -05:00
43fada5360 [Model Runner V2] Refactor dummy_run (#32533) Woosuk Kwon 2026-01-19 14:50:59 -08:00
4a5299c93f feat: spec decode with draft models (#24322) Tomas Ruiz 2026-01-19 15:05:46 -06:00
73f2a81c75 docs: prefix caching seems quite outdated (#28784) lon 2026-01-19 16:49:52 -03:00
7350331718 [BugFix] Fix TRT-LLM NVFP4 DP/EP (#32349) jiahanc 2026-01-19 11:32:24 -08:00
9d1e611f0e [CI] Add Helion as an optional dependency (#32482) Yanan Cao 2026-01-19 11:09:56 -08:00
0727cc9ecf [BUGFIX] Fix test_mla_backends.py. Scale MLA projection weights to prevent numerical instability (#32529) Vadim Gimpelson 2026-01-19 22:49:29 +04:00
a0490be8f1 [CI][amd] Revert NIXL connector change to avoid crash (#32570) qli88 2026-01-19 12:39:16 -06:00
cd3ac5b797 support dynamic resolution image encoding for Nemotron Nano VL (#32121) Netanel Haber 2026-01-19 20:15:58 +02:00
2636d76257 [Misc] Remove unused ModelKeys (#32608) Jee Jee Li 2026-01-20 01:34:59 +08:00
aa7f37ccfa Add support for LoRA adapters in Nemotron-H models (#30802) danisereb 2026-01-19 16:30:44 +02:00
c88860d759 [Frontend] Score entrypoint support data_1 & data_2 and queries & documents as inputs (#32577) wang.yuqi 2026-01-19 22:07:46 +08:00
758df5afe7 [NIXL][Metrics] Track nixl_num_kv_expired_reqs metric in Prometheus (#32340) Nicolò Lucchesi 2026-01-19 13:28:27 +01:00
cdd03d25d3 [CI/Build] Fix dependency conflict between model-hosting-container-standards and starlette (#32560) Daniel Mescheder 2026-01-19 12:27:08 +01:00
74c583bc50 [Core] Whisper support torch.compile (#30385) Nicolò Lucchesi 2026-01-19 11:02:31 +01:00
c0a350ca73 [ROCm][CI] Add ROCm attention backend support for EAGLE DP tests (#32363) Andreas Karatzas 2026-01-19 03:57:54 -06:00
71832ba71e [GLM-4.7] GLM Model support for GLM-Lite (#31386) Yuxuan Zhang 2026-01-19 17:18:38 +08:00
11bbf86f6a [CI][Hardware][AMD] Fix test_rotary_embedding_mla_cache_fused (#32408) Matt 2026-01-19 02:25:47 -06:00
3c8740aacb [Frontend] Add render endpoints for prompt preprocessing (#32473) Hyunkyun Moon 2026-01-19 13:21:46 +09:00
7518a3dc65 [CI/Build] Use Common Event Map Fixture in Harmony / MCP Server Tests (#32531) Alex Brooks 2026-01-18 21:05:51 -07:00
976af2f314 [BugFix] Fix embed_input_ids argument error of QwenVLForConditionalGeneration (#32462) honglyua 2026-01-19 11:06:02 +08:00
9a1f16da1e [Model Runner V2] Refactor update_states (#32562) Woosuk Kwon 2026-01-18 17:32:42 -08:00
bb1848cd62 [Model Runner V2] Support VLM (#32546) Woosuk Kwon 2026-01-18 16:58:51 -08:00
6101a26dc9 [BUGFIX] Fix degenerate strides in TRTLLM query tensors for FlashInfer backend. Fixes issue #32353 (#32417) Vadim Gimpelson 2026-01-19 04:57:32 +04:00
f5d1740030 [Bugfix] Add OOT backend option (#32471) Iryna Boiko 2026-01-18 23:20:39 +01:00
eebc58df0c [Refactor] Remove unused cutlass moe problem size function (#32047) Wentao Ye 2026-01-18 15:46:59 -05:00
16de822c71 [Refactor] Remove unused file pallas_kv_cache_update.py (#32433) Wentao Ye 2026-01-18 15:46:39 -05:00
5480c6b1fa [Doc] Correct comment for _jobs dict in OffloadingConnectorWorker (#32556) Deming 2026-01-19 04:46:00 +08:00
ba29ab441e Use the same memory for workspace13 and fused_output. (#31531) Andrey Khalyavin 2026-01-18 22:14:22 +03:00
afc3622602 [CI] Move Distributed Tests from H200 -> H100 (#32555) Robert Shaw 2026-01-18 13:25:23 -05:00
327a02d8db [MoE Refactor] Separate Router into OO Classes (#30623) bnellnm 2026-01-18 11:40:49 -05:00
2f03035a61 "refactor: refactor_repeated_interfaces" (#32486) tjp_zju 2026-01-18 22:07:01 +08:00
38bf2ffb21 [Bugfix] Fix GLM-ASR audio encoder RoPE dim (#32540) Isotr0py 2026-01-18 19:17:59 +08:00
c826c72a96 [Model] Support Step1 Model (#32511) Li Xie 2026-01-18 18:20:46 +08:00
