Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

73e2e0118f [Quantization] Improve AWQ logic (#19431) Jee Jee Li 2025-06-12 19:02:11 +08:00
c9280e6346 [Bugfix] Respect num-gpu-blocks-override in v1 (#19503) jmswen 2025-06-12 04:00:23 -07:00
af09b3f0a0 [Bugfix][V1] Allow manual FlashAttention for Blackwell (#19492) Michael Goin 2025-06-12 06:40:24 -04:00
4f6c42fa0a [Security] Prevent new imports of (cloud)pickle (#18018) Russell Bryant 2025-06-12 06:30:17 -04:00
dff680001d Fix typo (#19525) niu_he 2025-06-12 17:24:45 +08:00
2e090bd5df [AMD][Kernel][BugFix] fix test_rocm_compressed_tensors_w8a8 for rocm (#19509) rasmith 2025-06-12 02:14:24 -05:00
1b0b065eb5 [BugFix] Handle missing sep_token for Qwen3-Reranker in Score API (#19522) wonjun Jang 2025-06-12 16:00:47 +09:00
d5bdf899e4 [BugFix] Work-around incremental detokenization edge case error (#19449) Nick Hill 2025-06-11 23:43:20 -07:00
7e3e74c97c [Frontend] Improve error message in tool_choice validation (#19239) 22quinn 2025-06-11 23:13:00 -06:00
3f6341bf7f Add Triton Fused MoE kernel config for E=16 on B200 (#19518) Brayden Zhong 2025-06-12 00:31:51 -04:00
e5d35d62f5 [BugFix] Force registration of w8a8_block_fp8_matmul_deepgemm via lazy import (#19514) Varun Sundar Rabindranath 2025-06-12 00:28:12 -04:00
2f1c19b245 [CI] change spell checker from codespell to typos (#18711) Ning Xie 2025-06-12 10:57:10 +08:00
42f52cc95b [CI/Build] Fix torch nightly CI dependencies (#19505) Richard Zou 2025-06-11 17:40:42 -04:00
97a9465bbc [UX] Add Feedback During CUDAGraph Capture (#19501) Robert Shaw 2025-06-11 14:09:05 -07:00
c7ea0b56cd [AMD] [Quantization] Add override flag for attention dtype instead of using kv_cache_dtype trigger (#17331) rasmith 2025-06-11 14:53:28 -05:00
29fa5cac1c [Kernels] Add activation chunking logic to FusedMoEModularKernel (#19168) bnellnm 2025-06-11 12:53:10 -04:00
b2d9be6f7d [Docs] Remove WIP features in V1 guide (#19498) Woosuk Kwon 2025-06-11 09:15:03 -07:00
04a55612dd [Misc] Fix misleading ROCm warning (#19486) Jee Jee Li 2025-06-12 00:12:10 +08:00
89b0f84e17 [doc] fix "Other AI accelerators" getting started page (#19457) David Xia 2025-06-11 12:11:17 -04:00
497a91e9f7 [CI] Update FlashInfer to 0.2.6.post1 (#19297) Michael Goin 2025-06-11 10:57:28 -04:00
943ffa5703 [Bugfix] Update the example code, make it work with the latest lmcache (#19453) runzhen 2025-06-11 05:42:20 -07:00
5c8d34a42c Support no privileged mode on CPU for docker and kubernetes deployments (#19241) Louie Tsai 2025-06-11 04:11:47 -07:00
3c8694eabe Fix some typo (#19475) Ximingwang-09 2025-06-11 18:36:04 +08:00
7484e1fce2 Add cache to cuda get_device_capability (#19436) Michael Goin 2025-06-11 05:37:05 -04:00
a2142f0196 Support non-string values in JSON keys from CLI (#19471) Cyrus Leung 2025-06-11 17:34:04 +08:00
871d6b7c74 [Misc] Reduce warning message introduced in env_override (#19476) Lu Fang 2025-06-11 17:29:54 +08:00
29a38f0352 [Doc] Support "important" and "announcement" admonitions (#19479) Cyrus Leung 2025-06-11 16:39:58 +08:00
a5115f4ff5 [Doc] Fix quantization link titles (#19478) Cyrus Leung 2025-06-11 16:27:22 +08:00
68b4a26149 [Doc] Update V1 User Guide for Hardware and Models (#19474) Cyrus Leung 2025-06-11 15:49:06 +08:00
b8e809a057 [Kernel] Support deep_gemm for linear methods (#19085) artetaout 2025-06-11 15:14:45 +08:00
5039ec2336 [ROCm] Add rules to automatically label ROCm related PRs (#19405) Lu Fang 2025-06-11 15:09:18 +08:00
7c644ab6d5 Fix Typo in Documentation and Function Name (#19442) leopardracer 2025-06-11 08:44:11 +03:00
2d40665fe8 Add fused MOE config for Qwen3 30B A3B on B200 (#19455) Junhao Li 2025-06-11 01:43:46 -04:00
96ada386b7 [Misc] Remove unused MultiModalHasher.hash_prompt_mm_data (#19422) Lukas Geiger 2025-06-11 07:18:57 +02:00
1e473b3010 [CI] Disable failing GGUF model test (#19454) Michael Goin 2025-06-11 01:12:38 -04:00
2b1e2111b0 Fix test_max_model_len in tests/entrypoints/llm/test_generate.py (#19451) Lu Fang 2025-06-11 12:54:59 +08:00
a45b979d9f [BugFix] Fix docker build cpu-dev image error (#19394) niu_he 2025-06-11 11:56:40 +08:00
3952731e8f [New Model]: Support Qwen3 Embedding & Reranker (#19260) wang.yuqi 2025-06-11 11:07:30 +08:00
77f0d465d0 [BugFix] Allow use_cudagraph to work with dynamic VLLM_USE_V1 (#19390) Richard Zou 2025-06-10 19:54:41 -04:00
22c3c0aa4a Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B-FP8 (#19401) Xu Wenqing 2025-06-11 07:23:57 +08:00
33f8dba7c6 [Model] use AutoWeightsLoader for commandr (#19399) py-andy-c 2025-06-10 15:42:21 -07:00
5241ca50d6 [ROCm][V1] Adding ROCm to the list of plaforms using V1 by default (#19440) Gregory Shtrasberg 2025-06-10 18:06:15 -04:00
da9b523ce1 [Docs] Note that alternative structured output backends are supported (#19426) Russell Bryant 2025-06-10 12:20:00 -04:00
b6553be1bc [Misc] Slight improvement of the BNB (#19418) (tags: v0.9.1rc2, v0.9.1) Jee Jee Li 2025-06-10 21:51:49 +08:00
64a9af5afa Simplify ep kernels installation (#19412) youkaichao 2025-06-10 20:06:08 +08:00
e4248849ec [BugFix][CPU] Fix CPU CI by ignore collecting test_pixtral (#19411) Li, Jiang 2025-06-10 20:02:40 +08:00
467bef18a3 [BugFix][FlashInfer] Fix attention backend interface mismatch with unexpected keyword use_irope (#19134) Rachel Guo 2025-06-10 01:48:51 -07:00
5f1ac1e1d1 Revert "[v1] Add fp32 support to v1 engine through flex attn" (#19404) Isotr0py 2025-06-10 16:30:20 +08:00
9368cc90b2 Automatically bind CPU OMP Threads of a rank to CPU ids of a NUMA node. (#17930) Louie Tsai 2025-06-09 23:22:05 -07:00
32b3946bb4 Add clear documentation around the impact of debugging flag (#19369) Anna Pendleton 2025-06-09 23:16:09 -07:00
6b1391ca7e [Misc] refactor neuron_multimodal and profiling (#19397) Reid 2025-06-10 14:12:42 +08:00
a3f66e75d1 Add security warning to bug report template (#19365) Russell Bryant 2025-06-10 02:06:36 -04:00
319cb1e351 [Core] Batch multi modal input using pinned memory (#19169) Lukas Geiger 2025-06-10 07:44:59 +02:00
1efef71645 [Bugfix] Fix modelscope token passed in (#19389) Li Wang 2025-06-10 13:39:37 +08:00
646d62f636 [Core] Use tuple for kv cache group block ids (#19175) Nick Hill 2025-06-09 22:01:17 -07:00
6cd4ae8acd [Frontend] Add tqdm_leave_pbar to control progress bar visibility (#19357) Reid 2025-06-10 12:55:09 +08:00
c016047ed7 Fix docs/mkdocs/hooks/remove_announcement.py (#19382) Harry Mellor 2025-06-10 05:36:54 +01:00
9af6d22e4c Use xla flag to improve the quantized model performance (#19303) XiongfeiWei 2025-06-09 18:28:45 -07:00
4589b94032 [Bugfix] Fix benchmark_moe.py (#19016) Tianyu Guo 2025-06-10 09:04:36 +08:00
cc867be19c [V1] Reuse V0's memory_profiling util for gpu worker memory profiling (#19312) Ye (Charlotte) Qi 2025-06-09 17:40:01 -07:00
3a7cd627a8 [Misc] Fix a config typo in disable_hybrid_kv_cache_manager configuration (#19383) (tag: v0.9.1rc1) Siyuan Liu 2025-06-09 16:41:51 -07:00
8058c91108 [HOT-FIX] Add kv_sharing_target_layer_name argument to cutlass_mla backend (#19374) Pavani Majety 2025-06-09 16:00:07 -07:00
7d44c469fe [TPU]Fix KV cache sharing tests (#19371) Siyuan Liu 2025-06-09 15:38:15 -07:00
31f58be96a [Frontend] Make TIMEOUT_KEEP_ALIVE configurable through env var (#18472) liusiqian-tal 2025-06-10 05:41:21 +08:00
ebb2f383b8 [Quantization] Bump compressed-tensors version (#19295) Kyle Sayers 2025-06-09 17:33:15 -04:00
c1c7dbbeeb [Bugfix][Core] Prevent token lengths exceeding max_model_len in V0 (#19348) 22quinn 2025-06-09 08:01:29 -07:00
5cf2daea9a [Misc] Fixes and Optimizations for DeepEP + DeepGEMM combination. (#19298) Varun Sundar Rabindranath 2025-06-09 10:50:39 -04:00
b8089195b4 [v1] Add fp32 support to v1 engine through flex attn (#19319) Isotr0py 2025-06-09 22:10:44 +08:00
770e5dcdb8 [full_graph] Fix query_start_loc padding (#19321) Yinghai Lu 2025-06-09 06:32:56 -07:00
c57c9415b1 [Docs] Fix a bullet list in usage/security.md (#19358) Michael Yao 2025-06-09 21:28:51 +08:00
01810f9236 [CI] Introduce rules for llama auto-label (#19323) Lu Fang 2025-06-09 20:05:42 +08:00
59abbd84f9 [Fix] Allow kernel compilation for CUDA capability 8.7 (#19328) Conroy Cheers 2025-06-09 19:57:23 +10:00
95a6568b5c [CI/Build] Fix LoRA test (#19350) Jee Jee Li 2025-06-09 17:52:10 +08:00
0eca5eacd0 [Doc] Fix description in the Automatic Prefix Caching design doc (#19333) Se7en 2025-06-09 17:30:02 +08:00
12e5829221 [doc] improve ci doc (#19307) Reid 2025-06-09 15:26:12 +08:00
3a4d417707 [Misc] Cleanup compilation tests (#19343) Richard Zou 2025-06-09 03:05:44 -04:00
8335667c22 [Frontend] Remove unreachable code from llm.py (#19288) Kseniya Parkhamchuk 2025-06-08 21:22:10 -05:00
e1c4380d4c [Misc] Add documentation update reminder to PR template (#19289) Isotr0py 2025-06-09 10:20:53 +08:00
e31ae3de36 [Deprecation] Remove inputs arg fallback in Engine classes (#18799) Cyrus Leung 2025-06-09 10:19:56 +08:00
2ffb9b6e07 [Bugfix] model_max_length should consider max_model_len in tokenizer_config (#19201) wang.yuqi 2025-06-08 22:17:53 +08:00
cda10fa3e2 [Multi Modal] Add an env var for message queue max chunk bytes (#19242) jennyyyyzhen 2025-06-08 06:39:12 -07:00
c123bc33f9 [Quantization] Add compressed-tensors NVFP4 support (#18312) Dipika Sikka 2025-06-08 06:05:55 -07:00
b9a1791e2c [Hardware][POWER] Add IBM POWER11 Support to CPU Extension Detection (#19082) Akash kaothalkar 2025-06-08 14:47:14 +05:30
989dcee981 Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B (#19315) Xu Wenqing 2025-06-08 16:07:02 +08:00
3d64d366e0 [Misc] Change tests/compile to use VLLM_V1 by default (#19302) Richard Zou 2025-06-08 04:06:48 -04:00
eaa2e51088 [Bugfix] Re-enable use_cudagraph in vLLM v1 (#19299) Richard Zou 2025-06-07 20:56:12 -04:00
d77f7fb871 [Bugfix]: Fix TypeError: 'float' object cannot be interpreted as an integer (#19283) Chauncey 2025-06-08 08:16:31 +08:00
2d8476e465 [BugFix][V1] Fix memory profiling bug (#18974) Luka Govedič 2025-06-07 13:34:51 -04:00
88be823d57 [AMD] Update compatible packaging version (#19309) pramenku 2025-06-07 18:25:09 +05:30
4e4f63ad45 [Nit][Benchmark]Fix example in benchmark_serving_structured_output.py (#19311) Lifans 2025-06-07 03:25:38 -07:00
d2f0e7e615 [CI/Build] Improve Llama GGUF test robustness (#19287) Isotr0py 2025-06-07 17:23:28 +08:00
122cdca5f6 [Misc] refactor context extension (#19246) Reid 2025-06-07 13:13:21 +08:00
cf02f9b283 Add FlexAttention to V1 (#16078) Driss Guessous 2025-06-07 00:58:55 -04:00
c4296b1a27 [CI][PowerPC] Use a more appropriate way to select testcase in tests/models/language/pooling/test_embedding.py (#19253) Aaruni Aggarwal 2025-06-07 09:22:52 +05:30
66c508b137 [TPU][Test] Add script to run benchmark on TPU for buildkite (#19039) QiliangCui 2025-06-06 20:10:24 -07:00
84166fee97 [Kernel] Integrate CUTLASS MoE kernel with PPLX (#18762) ElizaWszola 2025-06-07 03:26:11 +02:00
6e0cd10f72 [Easy][Test] Simplify test_function_tool_use with multiple parametrizes (#19269) Lu Fang 2025-06-07 09:19:09 +08:00
e010688f50 [Build][ROCm] Update Dockerfile.rocm (#19296) Alexei-V-Ivanov-AMD 2025-06-06 18:35:16 -05:00
441b65d8c7 [Misc][Tools][Benchmark] Fix and improve auto tune script (#19163) Chenyaaang 2025-06-06 16:31:19 -07:00
46ecc57973 [BugFix] Fix tpu_model_runner block_id concatenation (#19228) Nick Hill 2025-06-06 16:28:17 -07:00
