Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

59f3b93636 [DOC] update v1_guide with INTEL HW (#22679) Chendi.Xue 2025-08-12 03:22:49 -05:00
78077d5417 Move SchedulerConfig from config/__init__.py to config/scheduler.py (#22626) Harry Mellor 2025-08-12 08:23:49 +01:00
6d729c43fb [Bugfix] Fix ModernBert load & Enable sliding window attention for bidirectional attention. (#22637) wang.yuqi 2025-08-12 15:23:17 +08:00
2f4657952b [doc] Update x86 CPU-inference installation doc to reflect optionality of AVX512f (#22707) Sooraj S 2025-08-12 12:51:08 +05:30
3a7e3bbdd2 [Doc] Added unmentioned required option "method" in the usage of EAGLE-3 based models (#21737) Hongsheng Liu 2025-08-12 15:14:51 +08:00
4fbd8bb597 Fix passing SpeculativeConfig from the CLI (#22652) Harry Mellor 2025-08-12 06:13:32 +01:00
ad344ef552 [gpt-oss] Small bug fixes for frontend (#22512) Chen Zhang 2025-08-11 22:04:38 -07:00
bbaf9e9cb1 [gpt-oss] Fix mxfp4 support (#22700) Chen Zhang 2025-08-11 21:22:26 -07:00
4678503476 Migrate MiniCPMVImageInputs to TensorSchema (#21939) Benji Beck 2025-08-11 20:43:37 -07:00
93d0652433 [CI] Increase timeout for test_completion_with_image_embeds (#22670) Michael Goin 2025-08-11 23:31:36 -04:00
ea1292ad3e [CI Failure] Use float32 for tests/entrypoints/openai/test_audio.py (#22686) Michael Goin 2025-08-11 23:20:42 -04:00
dc5e4a653c Upgrade FlashInfer to v0.2.11 (#22613) Po-Han Huang (NVIDIA) 2025-08-12 10:58:41 +08:00
839ab00349 Re-enable Xet on TPU tests now that hf_xet has been updated (#22666) Harry Mellor 2025-08-12 03:54:40 +01:00
9b94d6ec8f Enable 4bit bnb prequant MOE (#21548) Andy Chen 2025-08-11 19:02:14 -07:00
1891a265d3 [gpt-oss] Add test for response API + harmony (but skipped) (#22554) Chen Zhang 2025-08-11 17:47:24 -07:00
95a935fc48 [gpt-oss] Support streaming in response API (#22431) Chen Zhang 2025-08-11 17:46:59 -07:00
458e74eb90 Support more parallel styles in Transformers backend TP (#22651) Harry Mellor 2025-08-11 18:42:48 +01:00
65abe111a3 [CI] Skip Tree Attn Test in test_max_len.py to unblock CI (#22664) TJian 2025-08-11 10:36:05 -07:00
807d21b80d [BugFix] [Spec Decode] Remove LlamaForCausalLMEagle3 to fix CI (#22611) 22quinn 2025-08-11 10:31:36 -07:00
c90fb03df5 [CI/Build] Skip Mllama HF runner tests with Transformers v4.55.0 (#22659) Isotr0py 2025-08-12 01:00:58 +08:00
84cf78acee [Model] Pooling models default to using chunked prefill & prefix caching if supported. (#20930) wang.yuqi 2025-08-12 00:41:37 +08:00
16fb668b61 fix: NIXL connector transfers partial block to pass full multi-modal context (#21074) GuanLuo 2025-08-11 09:40:55 -07:00
f7dcce7a4a [Feature] Add VLLM_USE_DEEP_GEMM_E8M0 Env to Control E8M0 Scale (#21968) Wentao Ye 2025-08-11 12:39:08 -04:00
8e13d9fe6d [Misc] Further clean up some redundant config definitions (#22649) Isotr0py 2025-08-12 00:22:25 +08:00
3fa5b25845 Document aarch64 CPU support works (#22646) Eric Curtin 2025-08-11 15:22:45 +01:00
14a5d903ab [Model] NemotronH Support (#22349) danielafrimi 2025-08-11 14:09:24 +03:00
951b038298 [Misc] Move jsontree to utils (#22622) Cyrus Leung 2025-08-11 18:49:32 +08:00
ebf7605b0d [Misc] Move tensor schema tests (#22612) Cyrus Leung 2025-08-11 15:15:27 +08:00
bc1d02ac85 [Docs] Add comprehensive CLI reference for all large vllm subcommands (#22601) Harry Mellor 2025-08-11 08:13:33 +01:00
1e55dfa7e5 [BUGFIX] KeyError 'layers.14.mlp.gate.g_idx' for Qwen3-MoE with GPTQ on ROCm (#22017) JartX 2025-08-11 09:13:30 +02:00
384a052971 [Misc] benchmark_moe supports expert parallel (#22251) Jee Jee Li 2025-08-11 15:13:27 +08:00
39052dbca8 Support token_type_ids in V1 with less code changes (#21985) Maximilien de Bayser 2025-08-11 02:54:59 -03:00
9c97a1c349 [ROCm][AITER] Support AITER Rope ops in RotaryEmbedding Module. (#22521) vllmellm 2025-08-11 13:52:34 +08:00
f919d4cb8f [BugFix] Fix logits repetition penalty cuda check (#22592) Eugene Cheah 2025-08-10 22:52:31 -07:00
afa5b7ca0b [Misc][gpt-oss] guard import when triton kernel when not up to date (#22584) Zhewen Li 2025-08-10 21:29:35 -07:00
1b99028069 [Misc][gpt-oss] Add rules to label gpt-oss related PRs (#22600) Lifans 2025-08-10 19:49:51 -07:00
5898b135ab [BugFix] Fix KVConnectorOutput TPU breakage (#22598) Nick Hill 2025-08-10 19:33:48 -07:00
b799f4b9ea [CI/Build] Fix tensorizer test for load_format change (#22583) 22quinn 2025-08-10 19:30:00 -07:00
06da44f0cb Migrate LlavaImageInputs to TensorSchema (#21770) Benji Beck 2025-08-10 19:29:19 -07:00
a554991748 Migrate LlavaNextVideoPixelInputs to TensorSchema (#21843) Benji Beck 2025-08-10 19:29:16 -07:00
d1af8b7be9 enable Docker-aware precompiled wheel setup (#22106) Doug Smith 2025-08-10 19:29:02 -04:00
68b254d673 Fix TensorSchema validation test for symbolic dims (#22366) Benji Beck 2025-08-10 10:16:44 -07:00
8c50d62f5a Remove redundant row_indices unsqueeze operation in MiniCPMO (#22528) ZiTian Zhao 2025-08-11 00:20:00 +08:00
b4e2916721 Migrate LlavaNextImageInputs to TensorSchema (#21774) Benji Beck 2025-08-10 09:05:21 -07:00
65a7917be4 Fix(benchmarks): allow multiple mm contents in OpenAI Chat Completion Benchmarks (#22534) Breno Baldas Skuk 2025-08-10 18:03:15 +02:00
b76753f0b5 [Bugfix][Kernel] Support partial rotary embedding for MRoPE triton kernel (#22593) Isotr0py 2025-08-11 00:00:36 +08:00
b81fe83b2c [doc] add alibaba cloud as sponsor (#22597) youkaichao 2025-08-10 23:13:47 +08:00
0757551c96 [doc] add beijing meetup links (#22596) youkaichao 2025-08-10 22:51:36 +08:00
8290d15d2c Move CacheConfig from config/__init__.py to config/cache.py (#22586) Harry Mellor 2025-08-10 15:36:40 +01:00
049c245143 [Misc] Replace flaky image urls in pixtral test (#22574) Isotr0py 2025-08-10 21:18:21 +08:00
00976db0c3 [Docs] Fix warnings in docs build (#22588) Harry Mellor 2025-08-10 13:49:51 +01:00
d411df0296 [Misc] Further refine type annotations in parallel state (#22499) Cyrus Leung 2025-08-10 20:49:48 +08:00
010e0e39ea [Doc] Fix API doc link in side navigation (#22585) 22quinn 2025-08-10 01:35:22 -07:00
326976291b [Misc] code clean duplicate set_current_vllm_config in _set_vllm_config (#22566) Ning Xie 2025-08-10 15:08:48 +08:00
7e8d685775 [Minor] Fix pre-commit error on main (#22579) Isotr0py 2025-08-10 15:08:23 +08:00
c49848396d Refactor sliding window configuration to Transformers best practice (#21927) Harry Mellor 2025-08-10 04:50:48 +01:00
2a84fb422f [TPU] kv cache update kernel doesn't need to be padded slices to multiple of num_slices_per_block (#22394) Chengji Yao 2025-08-09 20:49:04 -07:00
534c45b962 Improve fast_topk function with type hints and documentation (#22530) ZiTian Zhao 2025-08-10 11:25:42 +08:00
3d7363e61c [Config] add "qwen" as a native eagle3 target supported model (#22333) Le Chen 2025-08-10 11:21:05 +08:00
0c5254b82a [oss] Init gpt-oss bf16 support (#22508) Jee Jee Li 2025-08-10 11:19:13 +08:00
61f67d8acd [V1] [Hybrid] Enable Full CUDA Graph (decode-only) for Mamba layers (#21401) Thomas Parnell 2025-08-10 05:16:11 +02:00
42172ad18f [FEAT] [Performance] Add triton mrope to replace the torch code path (#22375) TJian 2025-08-09 11:50:03 -07:00
fbd8595c5c [Bugfix] Fix basic models tests hanging due to mm processor creation (#22571) Isotr0py 2025-08-10 02:42:21 +08:00
5a16fa614c [Model] Gemma3n MM (#20495) Nicolò Lucchesi 2025-08-09 18:56:25 +02:00
2d18256e47 Move ParallelConfig from config/__init__.py to config/parallel.py (#22565) Harry Mellor 2025-08-09 16:33:46 +01:00
56186474f6 [Docs] Reduce noise in docs and --help from the JSON tip (#22567) Harry Mellor 2025-08-09 16:31:32 +01:00
1bf5e1f25b [CI] [Hybrid] Speed up hybrid models test by removing large models (#22563) Thomas Parnell 2025-08-09 11:04:42 +02:00
a6022e6fbc GLM-4.5V with new class name at transformers (#22520) Yuxuan Zhang 2025-08-09 15:50:21 +08:00
2be07a0db1 Update docs for Minimax-Text support (#22562) Thomas Parnell 2025-08-09 09:18:18 +02:00
0edc0cd52b [Bugfix] Fix CI moe kernel failure (#22556) Jee Jee Li 2025-08-09 15:03:29 +08:00
7920e9b1c5 [Bugfix] Fix failing GPT-OSS initialization test (#22557) Isotr0py 2025-08-09 15:03:26 +08:00
b7c0942b65 [ROCm][Misc] Rename the context_len to seq_len in ROCm custom paged attention kernel (#22097) Charlie Fu 2025-08-09 01:15:06 -05:00
9a0c5ded5a [TPU] Add support for online w8a8 quantization (#22425) Kyuyeun Kim 2025-08-08 23:12:54 -07:00
10a02535d4 Fix loading of quantized BigCode models (#22463) Eldar Kurtić 2025-08-09 08:12:12 +02:00
65552b476b [Misc] Use config definitions from Transformers library (#21913) Cyrus Leung 2025-08-09 14:10:51 +08:00
7ad7adb67f v1: Pass KVConnectorOutput to scheduler-side (#22157) Or Ozeri 2025-08-09 09:09:51 +03:00
6ade99eafa [V1] [Hybrid] Support Minimax-Text-01 in V1 (#22151) Thomas Parnell 2025-08-09 08:08:48 +02:00
3157aebb63 [Log] Add Warning for Deprecation of DeepGEMM old version (#22194) Wentao Ye 2025-08-09 02:07:48 -04:00
8a0ffd6285 Remove mamba_ssm from vLLM requirements; install inside test container using --no-build-isolation (#22541) Thomas Parnell 2025-08-09 08:05:32 +02:00
23472ff51c [Doc] Add usage of implicit text-only mode (#22561) Roger Wang 2025-08-08 23:04:19 -07:00
08b751ba74 Implicit language-model-only mode via limit-mm-per-prompt (#22299) Roger Wang 2025-08-08 22:21:40 -07:00
429e4e2d42 [Bugfix] Fix ModernBert cuda graph capturing in v1 (#21901) Isotr0py 2025-08-09 13:17:22 +08:00
35afe1b30b [BugFix] [P/D] Handle lookahead token count edge-case with Eagle Spec Decoding and P/D (#22317) Pradyun92 2025-08-08 20:04:15 -04:00
81c57f60a2 [XPU] upgrade torch 2.8 on for XPU (#22300) Kunshang Ji 2025-08-09 08:03:45 +08:00
311d875614 Drop flaky test_healthcheck_response_time (#22539) Russell Bryant 2025-08-08 19:56:47 -04:00
e3edc0a7a8 Extract CompilationConfig from config.py (#22524) Harry Mellor 2025-08-09 00:34:25 +01:00
baece8c3d2 [Frontend] Add unix domain socket support (#18097) yyweiss 2025-08-09 02:23:44 +03:00
2fcf6b27b6 [Docs] fix broken links in metrics.md (#22315) Guy Stone 2025-08-08 19:22:35 -04:00
41b9655751 Skip Qwen 1 in CI because remote code is no longer compatible with Transformers (#22536) Harry Mellor 2025-08-09 00:20:58 +01:00
bd875d2eb7 [Bugfix] Update FA commit hash (#22546) Thomas Parnell 2025-08-09 01:10:25 +02:00
f703b923f3 [Misc] DeepGEMM : Avoid JIT generation in the hot-path (#22215) Varun Sundar Rabindranath 2025-08-08 19:09:59 -04:00
cd9b9de1fb [BugFix] Fix IMA FlashMLA full cuda-graph and DP + Update FlashMLA (#21691) Lucas Wilkinson 2025-08-08 19:09:42 -04:00
fe6d8257a1 [gpt-oss] Support tool call and implement MCP tool server (#22427) Chen Zhang 2025-08-08 15:06:37 -07:00
e290594072 [Docs] Rename “Distributed inference and serving” to “Parallelism & Scaling” (#22466) Ricardo Decal 2025-08-08 12:26:21 -07:00
f756a682d9 [gpt-oss] guard import when triton kernel is not installed (#22529) Yongye Zhu 2025-08-08 11:18:33 -07:00
f0964e29cb [Benchmark] Add benchmark tool for multi turn conversations (#20267) Daniel Serebrenik 2025-08-08 20:28:50 +03:00
e789cad6b8 [gpt-oss] triton kernel mxfp4 (#22421) Yongye Zhu 2025-08-08 08:24:07 -07:00
e5ebeeba53 Remove exception for Python 3.8 typing from linter (#22506) Harry Mellor 2025-08-08 11:06:46 +01:00
7be7f3824a [Docs] Improve API docs (+small tweaks) (#22459) Harry Mellor 2025-08-08 11:02:51 +01:00
ccdae737a0 [BugFix] Don't cancel asyncio tasks directly from destructors (#22476) Nick Hill 2025-08-08 01:13:18 -07:00
