Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

cd775bdbe0 [Tests] Replace flaky sleep with polling in test_background_cancel (#32986) 7. Sun 2026-01-24 16:39:07 +00:00
da5e7b12be [MLA] Fuse cat and qaunt for fp8 kv-cache (#32950) Lucas Wilkinson 2026-01-24 09:03:02 -07:00
719ac592ed Update CPU doc according to feedback (#32963) Louie Tsai 2026-01-24 08:02:44 -08:00
1209b784f2 [Bugfix]: resolve torch.compile cache conflict between mm_encoder_tp_modes (#32842) Hiroken. 2026-01-24 22:45:14 +08:00
5fa0f6efa9 [EncoderCacheManager] Remove unnecessary copy (#32800) Lukas Geiger 2026-01-24 14:28:57 +00:00
bc0d291bfe feat: Complete LoRA support for MiniMaxM2 Fixes #32736 (#32763) david guan 2026-01-24 20:48:46 +08:00
9ad7f89f55 [Models]: Make Multimodal config implicit in ViT implementation (#31972) Isotr0py 2026-01-24 20:34:26 +08:00
6450b536a6 [Bugfix] Fix E2E latency calculation and add warmup support in mm_processor benchmark (#32646) Hiroken. 2026-01-24 18:31:41 +08:00
0f19427db5 [Perf] Cache exc.errors() result in validation exception handler (#32984) 7. Sun 2026-01-24 10:01:35 +00:00
51931c5c9a [UX] Deduplicate sampling parameter startup logs (#32953) Cyrus Leung 2026-01-24 17:37:28 +08:00
06b557ecd9 feat(benchmark): add encoder forward pass benchmarking to mm-processor (#31655) Reagan Lee 2026-01-24 00:24:44 -08:00
81c2a889ce [Doc] Ignore typo check on doc (#32999) Roger Wang 2026-01-23 23:52:22 -08:00
8edaf38570 [Models] Add SharedFusedMoE support to Qwen3MoE (#32082) Isotr0py 2026-01-24 15:36:31 +08:00
5c86a89805 [docs] Update governance process links (#32995) Roy Wang 2026-01-24 15:32:44 +08:00
0ccecf8833 [Tests] Standardize RNG seed utility across test files (#32982) 7. Sun 2026-01-24 06:47:14 +00:00
0b9a735e11 [Tests] Clarify pytest skip reasons with actionable context (#32981) 7. Sun 2026-01-24 06:38:50 +00:00
14d03b8ddb [Perf] Cache xpu_get_mem_info() result to avoid duplicate calls (#32983) 7. Sun 2026-01-24 04:56:23 +00:00
d0cbac5827 [Dev UX] Add auto-detection for VLLM_PRECOMPILED_WHEEL_VARIANT during install (#32948) Michael Goin 2026-01-23 22:15:17 -05:00
c0d820457a Auth_token added in documentation as it is required (#32988) ruizcrp 2026-01-24 04:03:05 +01:00
97ef11dd34 [ROCm][ViT] Enable Flash Attention Triton backend on RDNA3/RDNA4 (#32944) monajafi-amd 2026-01-23 19:03:07 -07:00
ecc3dd66cc [Bugfix] Fix FusedMoE LoRA kernel offs_token out of bound value (#32279) Xin Yang 2026-01-23 17:41:35 -08:00
7e1f10d562 [Core][Bugfix] allow graceful worker termination (#32965) Joe Runde 2026-01-23 18:28:45 -07:00
a28b94e6ef [Performance] Split FlashAttn attention and cache update (#25954) ElizaWszola 2026-01-24 02:28:06 +01:00
0118cdcc02 [fix] add VLLM_OBJECT_STORAGE_SHM_BUFFER_NAME to compile factors (#32912) dolpm 2026-01-23 14:53:10 -08:00
d7de043d55 [CI] fix version comparsion and exclusion patterns in upload-release-wheels.sh (#32971) (tag: v0.14.1) Shengqi Chen 2026-01-23 14:21:49 -08:00
136c499f6e [CI] fix version comparsion and exclusion patterns in upload-release-wheels.sh (#32971) Shengqi Chen 2026-01-23 14:21:49 -08:00
ebd0a17e0e [Bugfix] Fix missing is_layer_skipped check for FusedMoE in AWQConfig (#32935) joninco 2026-01-23 17:19:56 -05:00
37c9859fab [Refactor] Clean up unused variables & func (#32692) Wentao Ye 2026-01-23 17:04:25 -05:00
4561f13985 [Refactor] Rename gptq_marlin to marlin to match MoE (#32952) Michael Goin 2026-01-23 16:48:12 -05:00
6cc6d92be5 [CI][AMD][BugFix] Update wvSplitK (and other skinny_gemm wrappers) to ensure tensors passed will be made contiguous for the kernel (#32831) rasmith 2026-01-23 15:35:48 -06:00
dfab5f3764 [Bug] Fix benchmark script moe_permute_unpermute (#32949) Wentao Ye 2026-01-23 16:18:56 -05:00
586a57ad7e fix: Add glm4_moe_lite to MLA detection (#32614) Markus / Mark 2026-01-23 21:38:57 +01:00
3a41459501 [cudagraphs] Refactor cudagraph capture loop (#32946) Lucas Wilkinson 2026-01-23 13:22:20 -07:00
8518b30447 [Model Runner V2] Add KV Connector support (#32742) Nick Hill 2026-01-23 10:49:17 -08:00
2d6b537157 [Bugfix][CI] Fix pre-commit (#32956) Matthew Bonanni 2026-01-23 13:26:56 -05:00
68b0a6c1ba [CI][torch nightlies] Use main Dockerfile with flags for nightly torch tests (#30443) Orion Reblitz-Richardson 2026-01-23 08:22:56 -10:00
5206e5e28c [V1][Hybrid] Mamba Prefix Caching with align mode (#30877) Harry Huang 2026-01-24 01:56:48 +08:00
fec9da0af4 [Model] Enable LoRA support for internvl2 (#32397) Matteo Fari 2026-01-23 18:39:01 +01:00
bbbd696af9 [torch.compile][CI] Add back attn fusion on hopper/ada (#32940) Luka Govedič 2026-01-23 11:49:20 -05:00
9b77bb790d [Frontend] add logprob, compression_rate to 'verbose_json' features (#31059) sangbumlikeagod 2026-01-24 01:35:13 +09:00
305e53ade8 [Hardware][AMD][CI][Bugfix] Fix Kernels Attention Cache test (#32904) Matt 2026-01-23 10:24:26 -06:00
1cb4341fbc [ROCm][PD] Remove unused moriio connector proxy code (#32939) Mark McLoughlin 2026-01-23 15:59:04 +00:00
1fb648bf10 [Bugfix] Fix FP8 MoE EP Weight Loading for ModelOpt Llama4 (#32886) baonudesifeizhai 2026-01-23 10:31:48 -05:00
7e22309755 [Misc] Postpone torch_profiler deprecation (#32867) Nicolò Lucchesi 2026-01-23 15:39:48 +01:00
90c2007932 [Bugfix] Disable tma_aligned_scales in test_fusions_e2e (#32916) Xin Yang 2026-01-23 06:34:30 -08:00
d95d650762 [Bugfix] Fix getting vision features in Transformer Multimodal backend (#32933) Raushan Turganbay 2026-01-23 14:34:48 +01:00
13d8746c54 [Feature]: Remove DtoH Copy for lfm2_vl On Default Stream (#32815) tianshu-Michael-yu 2026-01-23 05:20:30 -08:00
10e94c84f6 [CPU][Feat] Update PyTorch to v2.10 for CPU Backend (#32869) Fadi Arafeh 2026-01-23 13:13:06 +00:00
243e78c20f [Benchmark][Bugfix] Fix race condtion when starting server for sweep benchmark (#32927) Isotr0py 2026-01-23 20:11:18 +08:00
aac0b817fa [CPU Backend][BugFix] Fix failing CPU MoE test (#32876) Fadi Arafeh 2026-01-23 12:06:51 +00:00
05f3d714db [Frontend][3/n] Make pooling entrypoints request schema consensus | EmbedRequest & ClassifyRequest (#32905) wang.yuqi 2026-01-23 20:03:44 +08:00
3f3f89529d [Voxtral] Add new streaming arch (#32861) Patrick von Platen 2026-01-23 12:41:52 +01:00
4dc11b06d3 [Bugfix] Fix Whisper/encoder-decoder GPU memory leak (#32789) Nicolò Lucchesi 2026-01-22 11:50:37 +01:00
2bd95d803a [Misc] Bump opencv-python dependecy version to 4.13 (#32668) Isotr0py 2026-01-22 23:51:15 +08:00
f46d576c54 [Misc] Replace urllib's urlparse with urllib3's parse_url (#32746) Isotr0py 2026-01-22 16:37:15 +08:00
5da4c7d789 [CI/Build][CPU] Fix failed pooling tests and macos smoke test (#32907) Li, Jiang 2026-01-23 18:48:20 +08:00
160c6fa387 [Misc] Add get_name to missing AttentionBackends (#32698) Nicolò Lucchesi 2026-01-23 11:35:44 +01:00
a8eb1182f1 [CI][Models] Add VLM Support for Sequence Classification Conversion (#32885) Andreas Karatzas 2026-01-23 02:22:51 -06:00
fa6e599a61 [Bugfix] Fix _CPU_MOE_ACT AssertionError when vLLM config not set (#32777) Karan Bansal 2026-01-23 13:52:37 +05:30
7ef5873752 [CI] Fix mypy for vllm/v1/structured_output (#32722) Wentao Ye 2026-01-22 22:55:51 -05:00
5e4e0e51f4 [torch.compile] Compile CustomOp.forward_native for SiluAndMul and QuantFP8 to avoid raw torch ops inside opaque custom ops (#32806) Luka Govedič 2026-01-22 22:52:26 -05:00
f61c9da711 [BugFix] deepseek_v32_encoding: Replace asserts with proper exceptions (#32884) Rishabh Saini 2026-01-22 22:44:11 -05:00
7fe255889e [Misc] Log vLLM logo when starting server (#32796) Nick Hill 2026-01-22 19:15:12 -08:00
dc917cceb8 [MoE Refactor] Move select_experts from FusedMoEQuantMethod -> FusedMoE (#31996) bnellnm 2026-01-22 18:21:35 -05:00
fc56f4a071 [BugFix] Fix invalid flashinfer_fused_moe_blockscale_fp8 op registration (#32855) Fadi Arafeh 2026-01-22 22:27:40 +00:00
d08b356ee0 [Perf] Create TMA-aligned input scale tensor for DeepGemm on Hopper (#32619) Xin Yang 2026-01-22 12:47:04 -08:00
f744810184 [Refactor] Remove unused tpu files (#32610) Wentao Ye 2026-01-22 15:35:18 -05:00
44f08af3a7 Add llmcompressor fp8 kv-cache quant (per-tensor and per-attn_head) (#30141) Eldar Kurtić 2026-01-22 21:29:57 +01:00
955b43a5a5 [Bugfix][Attention] Explicitly report support for kv_cache_dtype bfloat16 (#32795) Matthew Bonanni 2026-01-22 14:05:18 -05:00
744ef30484 [CPU Backend] [Perf] Accelerate tensor-parallel/data-parallel inference across NUMA domains on Arm (#32792) Fadi Arafeh 2026-01-22 18:55:23 +00:00
300622e609 [CI][Attention] Add more CI dependencies for attention tests (#32487) Matthew Bonanni 2026-01-22 13:44:56 -05:00
69d09fdd6c [Feature] Add --ssl-ciphers CLI argument for TLS cipher control (#30937) RickyChen / 陳昭儒 2026-01-23 01:53:24 +08:00
3a63be0faa Support custom URI schemes and trace handlers for profiler (#32393) David Ramon Prados 2026-01-22 12:45:40 -05:00
803e3f3f68 [UX] Default api_server_count to dp_size if not specified (#32525) Tyler Michael Smith 2026-01-22 12:35:35 -05:00
70917b1c55 [MISC] Add .cursor to .gitignore (#32868) Vadim Gimpelson 2026-01-22 21:27:13 +04:00
c517d8c934 [Hardware][AMD][CI][Bugfix] Fix regressions from deprecated env vars (#32837) Matt 2026-01-22 10:59:15 -06:00
fc37187a51 [Bugfix] ModelScope is supported when downloading LORA models. (#32844) Xu Jinyang 2026-01-23 00:33:21 +08:00
ff365eea94 Support bge-m3 sparse embeddings and colbert embeddings (#14526) Maximilien de Bayser 2026-01-22 12:52:57 -03:00
444e2e7e1f [Misc] Bump opencv-python dependecy version to 4.13 (#32668) Isotr0py 2026-01-22 23:51:15 +08:00
bc14663e6a [Cleanup] Move scheduler get_routed_experts logic to separate method (#32706) Nick Hill 2026-01-22 07:46:00 -08:00
654a71fc3c [torch.compile] Improve Cold Start for MoEs (#32805) Richard Zou 2026-01-22 10:44:40 -05:00
15e302dfce [Misc][BE] Turn on strict type coverage for vllm/compilation (#31756) Lucas Kabela 2026-01-22 07:12:26 -08:00
d117a4d1a9 [Frontend] Introduce Renderer for processing chat messages (using ModelConfig) (#30200) Cyrus Leung 2026-01-22 20:44:22 +08:00
421012b63a OffloadingConnector: Support kernel_block_size != block_size (#30692) Or Ozeri 2026-01-22 14:30:04 +02:00
841d53aaa8 [Frontend] add prompt_cache_key for openresponses (#32824) Chauncey 2026-01-22 19:34:14 +08:00
1752262e96 [CI] refactor release pipeline config into groups (#32833) Shengqi Chen 2026-01-22 03:27:21 -08:00
ea6102b85d [Bugfix] Fix Whisper/encoder-decoder GPU memory leak (#32789) Nicolò Lucchesi 2026-01-22 11:50:37 +01:00
328cbb2773 [Frontend][2/n] Make pooling entrypoints request schema consensus | ChatRequest (#32574) wang.yuqi 2026-01-22 18:32:44 +08:00
64e3d67ac0 Enable Cross layers KV cache layout at NIXL Connector (#30207) liranschour 2026-01-22 12:12:58 +02:00
098b2d66fe [Benchmark] Don't default to temperature==0 in vllm bench serve (#32723) Nick Hill 2026-01-22 02:03:15 -08:00
8ebf271bb6 [Misc] Replace urllib's urlparse with urllib3's parse_url (#32746) Isotr0py 2026-01-22 16:37:15 +08:00
49a1262267 [AMD][ROCm] MoRI EP: a high-performance all2all backend (#28664) Alex Sun 2026-01-22 16:33:18 +08:00
2b8a38b6d6 [Model] Extend collect_children and no_init_weights contexts (#32757) Cyrus Leung 2026-01-22 16:20:27 +08:00
1bf1a34b19 [bench] add start_times field to vllm bench serve json result (#32667) Kebe 2026-01-22 16:10:14 +09:00
a810299838 [ROCm][CI][Docs] Add comment explaining TRITON_ATTN fallback for ROCm (#32835) Andreas Karatzas 2026-01-22 00:11:09 -06:00
eb1629da24 [ROCm][CI] Fix AITER test flakiness by using explicit attention backend (#32346) Andreas Karatzas 2026-01-21 23:55:25 -06:00
019e2c3b7c [ROCm][CI] Lower Acceptance Len Threshold For test_draft_model_quantization (#32731) Micah Williamson 2026-01-21 23:47:33 -06:00
f5fdec8ce2 Upgrade transformers-4.57.5 (#32287) Huy Do 2026-01-21 21:19:19 -08:00
1579c9b5fd [Llama.py -> mistral.py] Extract mistral-only relevant code into separate file (#32780) Patrick von Platen 2026-01-22 06:14:57 +01:00
889722f3bf [FlashMLA] Update FlashMLA to expose new arguments (#32810) Lucas Wilkinson 2026-01-21 22:02:39 -07:00