Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

a69693e38f Migrate Qwen inputs to TensorSchema (#23473) Benji Beck 2025-08-27 19:43:26 -07:00
5da4f5d857 [Bugfix] Fix for V1 priority scheduling crashes at preemption (#23713) Hanchenli 2025-08-27 17:44:52 -07:00
321938e9ac [Feature] Add VLLM_DISABLE_PAD_FOR_CUDAGRAPH to Avoid Hang Issue (#23595) Wentao Ye 2025-08-27 17:52:24 -04:00
f9ca2b40a0 [Bugfix] Fix Marlin NVFP4 for modelopt (#23659) Michael Goin 2025-08-27 17:48:16 -04:00
afe23a2990 use abosolute path yewentao256 2025-08-27 21:44:27 +00:00
e92676ef4e update for fp8 yewentao256 2025-08-27 21:36:03 +00:00
082cc07ef8 DP/EP Support for gpt-oss with deepep-ht comm kernel on SM100 (#23608) Yongye Zhu 2025-08-27 17:33:21 -04:00
57f2f26a05 update directory for cutlass w8a8 yewentao256 2025-08-27 21:05:41 +00:00
853c371fc3 [V1][Mamba] - Enable V1 by default for Mamba Models (#23650) Asaf Joseph Gardin 2025-08-27 23:53:30 +03:00
c643e63f98 Merge branch 'main' into wye-refactor-quant-folder yewentao256 2025-08-27 20:29:14 +00:00
8bf6266a17 [Multimodal] Generate mm_hash based on request metadata when caching is turned off (#23690) Roger Wang 2025-08-27 13:24:31 -07:00
0585a9e73c Disable torch.compile for dynamic rope models in Transformers backend (#23738) Harry Mellor 2025-08-27 20:03:05 +01:00
3c0ef769ba ci: Add arm64 docker build to release pipeline (#23210) Eli Uriegas 2025-08-27 10:41:48 -07:00
4e4d017b6f [Docs] Fix warnings in mkdocs build (continued) (#23743) Hyogeun Oh (오효근) 2025-08-28 02:17:29 +09:00
dd58932280 [V1] [Hybrid] Enable compile and piecewise CUDA graph for MiniMax-Text models (#22589) Thomas Parnell 2025-08-27 19:05:16 +02:00
52883ed084 [Model] Merge SupportsMultiModalWithRawInput with SupportsMultiModal (#23749) Cyrus Leung 2025-08-28 01:01:50 +08:00
4f35be10a9 [BugFix] Fix topk_softmax assert (#19764) Luka Govedič 2025-08-27 12:47:28 -04:00
2b61d2e22f [Docs] Remove in-tree Gaudi install instructions (#23628) Harry Mellor 2025-08-27 17:22:21 +01:00
3ce8285d6d [LogitsProcs] Deduplicate built-in LP implementation logic (#23362) Nick Hill 2025-08-27 08:11:33 -07:00
83f555f637 [Doc]: upgrade version of crate-ci tool for improved typo detection (#23755) Didier Durand 2025-08-27 16:59:34 +02:00
841490434a [Model] Enable native HF format InternVL support (#23742) Isotr0py 2025-08-27 22:45:17 +08:00
3af47c3cc6 [Feature] Add Hopper DeepGEMM E8M0 for DeepSeekV3.1 scale_fmt (#23666) Wentao Ye 2025-08-27 10:09:08 -04:00
513c1fe255 Only run get_attr_docs if generating help text (#23723) Harry Mellor 2025-08-27 14:55:12 +01:00
fe8d7b6f03 [Model] Interface to enable batch-level DP support (#23733) Cyrus Leung 2025-08-27 21:41:22 +08:00
16dc4052b0 Fix pre-commit on main (#23747) Harry Mellor 2025-08-27 14:39:48 +01:00
8dd2baa597 Add vLLM Korea Meetup in the README.md and meetups.md (#23746) rebel-hongseok 2025-08-27 22:25:49 +09:00
5eeef1b908 [Model] Explicit default_pooling_type interface (#23736) Cyrus Leung 2025-08-27 21:24:09 +08:00
704432af3c [V1] [Hybrid] Disable prefix caching by default for hybrid or mamba-based models (#23716) Thomas Parnell 2025-08-27 14:51:54 +02:00
a403d0fa41 [Misc] Remove unnecessary _send_reconfig_message() in core_client.py (#23127) Nick Hill 2025-08-27 05:50:47 -07:00
8c13820f0b [Bugfix] Fix task field initialization when PYTHONOPTIMIZE is enabled (#23718) cndoit18 2025-08-27 20:42:20 +08:00
9d30de4469 [model] Support MiniCPM-V 4.5 (#23586) tc-mb 2025-08-27 20:38:00 +08:00
1f7a9c95e4 [Docs] Fix a 1-2-3 list and style issues in tpu.md (#23729) Michael Yao 2025-08-27 20:37:52 +08:00
8f0d7eaea8 [XPU] Fix OOM issue for data parallel with Ray backend (#22500) Fanli Lin 2025-08-27 19:57:38 +08:00
e03940762b [CI/Build] Reduce LoRA layer test cases (#23721) Jee Jee Li 2025-08-27 18:59:35 +08:00
11eddf02f0 [FlashInfer] Cache hyper params in metadata builder (#23732) Woosuk Kwon 2025-08-27 03:45:04 -07:00
04ff1e43fb [Misc] Move CpuGpuBuffer to vllm/v1/utils.py (#23728) Woosuk Kwon 2025-08-27 03:25:00 -07:00
6578e87365 Optimize input preparation for FlashInfer [2/N] (#23174) Woosuk Kwon 2025-08-27 02:52:45 -07:00
5bd9f84158 [Docs] Fix an admonition important (#23726) Michael Yao 2025-08-27 17:50:09 +08:00
91e382c935 [CI/Build] Remove redundant register in model init tests (#23715) Cyrus Leung 2025-08-27 16:11:15 +08:00
6446677839 [XPU]fix cuda event used in XPU model runner (#23708) Kunshang Ji 2025-08-27 15:27:14 +08:00
69244e67e6 [Core] Use key-only cache for BaseMultiModalProcessor (#23018) Cyrus Leung 2025-08-27 14:19:13 +08:00
8dbf6ed7be [Bugfix] fix when config.yaml config value is list parse error (#23528) rongfu.leng 2025-08-27 13:54:39 +08:00
9de25c294b [CI/Build] Remove redundant LoRA model tests (#23706) Jee Jee Li 2025-08-27 13:51:50 +08:00
fce10dbed5 [XPU] Add xpu torch.compile support (#22609) Kunshang Ji 2025-08-27 13:33:27 +08:00
d272415e57 [Quantization] Expand compressed-tensors MoE matching logic to support NFP4 + FP8 MoEs (#22674) Dipika Sikka 2025-08-27 01:00:21 -04:00
142ac08030 [Frontend] Optimize beam search performance by limiting concurrency (#23599) Chen Zhang 2025-08-26 21:59:14 -07:00
3210264421 [Frontend] Add --log-error-stack to print stack trace for error response (#22960) Chen Zhang 2025-08-26 21:58:59 -07:00
644d57d531 [Model] Add Ernie4.5 VL Model Support (#22514) CSWYF3634076 2025-08-27 12:02:55 +08:00
c905684cfe [Core] Asynchronous h2d in merge_multimodal_embeddings via pinned memory. (#23686) Chenheli Hua 2025-08-26 20:05:34 -07:00
786835807b [Bugfix]: Qwen3 Coder Tool Parser (#23099) Yiheng Xu 2025-08-27 10:58:32 +08:00
fecbb7c782 [Bugfix][gpt-oss] passing the cache config in gpt-oss (#23613) Wei 2025-08-26 19:54:23 -07:00
6dab89b8ec [Docs] Fix math rendering in docs (#23676) Harry Mellor 2025-08-27 02:47:08 +01:00
de02b07db4 [Bugfix] Lazy import gpt_oss_triton_kernels_moe for mxfp4 (#23678) Michael Goin 2025-08-26 21:34:57 -04:00
eb1995167e [gpt-oss] Enable unit test for response API harmony integration (#23533) Chen Zhang 2025-08-26 18:23:26 -07:00
2c2b140ae8 [quantization] use channel scales for w4a8 + misc fixes (#23570) czhu-cohere 2025-08-26 21:23:23 -04:00
c7c80af084 fix pynccl reduce_scatter (#23648) yzds 2025-08-27 09:21:11 +08:00
6891205b16 [Feature][Responses API] Support MCP tool in background mode (#23494) wuhang 2025-08-27 09:06:58 +08:00
b1625dbe9c feat: add triton fused moe config for GLM-4.5-Air-FP8 on B200 (#23695) zixuanzhang226 2025-08-26 18:06:10 -07:00
585e0bde36 [Bugfix] UnboundLocalError when GptOss reasoning specified (#23054) Federico 2025-08-27 02:29:52 +02:00
714872f1a9 [Compile] Fix Cmake Warning (#23689) Wentao Ye 2025-08-26 19:48:32 -04:00
5f1af97f86 [V1] [Hybrid] Enable Full CUDA graph by default for hybrid models in V1 (#22594) Thomas Parnell 2025-08-27 01:28:55 +02:00
c3b0fd1ee6 [V1][P/D]P2pNcclConnector supports flashinfer (#23536) Zhonghua Deng 2025-08-27 06:56:16 +08:00
6421b66bf4 [Docs] Move quant supported hardware table to README (#23663) Harry Mellor 2025-08-26 23:26:46 +01:00
2f13319f47 Enhance the pre-notification policy (#23532) Huzaifa Sidhpurwala 2025-08-27 00:41:36 +04:00
d696f86e7b [doc] Hybrid KV Cache Manager design doc (#22688) Chen Zhang 2025-08-26 13:19:05 -07:00
9816b81f5f [Model] Enable video support for InternVL3.5 models (#23658) Isotr0py 2025-08-27 03:46:52 +08:00
c37c0af990 [Misc] Fix comments in tests/kernels/quantization (#23675) Jiangyun Zhu 2025-08-27 03:31:20 +08:00
9715f7bb0f [Bugfix] Fix incorrect original shape in hashing (#23672) Cyrus Leung 2025-08-27 03:01:25 +08:00
98aa16ff41 [v1] Add cross-attention KV cache support for encoder-decoder models (#23664) Russell Bryant 2025-08-26 14:49:06 -04:00
227e231b55 [Docs] [V1] [Hybrid] Update docs to remove FlashInfer constraint for hybrid models (#23665) Thomas Parnell 2025-08-26 20:33:16 +02:00
730d0ac8b9 [Docs] Fix warnings in mkdocs build (#23649) Hyogeun Oh (오효근) 2025-08-27 03:19:23 +09:00
9b0187003e [Bugfix] Fix cuda event usage with CPU model runner (#23643) Li, Jiang 2025-08-27 01:10:42 +08:00
44ac25eae2 [CI] [Doc]: Add GH Action for auto labeling issues with rocm tag (#20988) vllmellm 2025-08-27 00:20:13 +08:00
7ea22e42d5 [Misc] Add override for allreduce fusion thresholds (#23639) nvjullin 2025-08-26 23:53:04 +08:00
9d4183dd2e [model] support qwen2audio embedding input (#23625) Yuekai Zhang 2025-08-26 23:48:08 +08:00
513298f1b4 [Bugfix] fix bf16 multimodal model hash (#23623) Yuekai Zhang 2025-08-26 23:47:50 +08:00
379f828fba [Docs] Reduce requirements for docs build (#23651) Harry Mellor 2025-08-26 16:43:28 +01:00
1fdc732419 [ROCm] Starting to add AMD code reviewers for ROCm components (#23496) Hongxia Yang 2025-08-26 10:32:37 -04:00
f58675bfb3 [CPU] add cpu fused moe pytorch native implementation (#23146) TianyuLi0 2025-08-26 22:09:17 +08:00
7c04779afa [Doc]: fix various spelling issues in multiple files (#23636) Didier Durand 2025-08-26 16:05:29 +02:00
f66673a39d [Kernel] Added flashinfer fp8 per-tensor gemms (#22895) nvjullin 2025-08-26 21:54:04 +08:00
b78bed1bc5 [Hardware][Mac] Fix the installation fail for Apple Silicon (CPU) (#23565) En Ouyang 2025-08-26 21:04:25 +08:00
164b2273c8 [Docs] Fix broken links to docs/api/summary.md (#23637) Harry Mellor 2025-08-26 14:00:18 +01:00
2b4fc9bd9b Support FlashAttention Backend for Hybrid SSM Models (#23299) Chen Zhang 2025-08-26 05:41:52 -07:00
ebd5a77bb5 feat: add usage to TranscriptionResponse (text and json response_format) (#23576) Guillaume Calmettes 2025-08-26 14:26:26 +02:00
384dd1b0a8 [Bugfix] Add missing enable_log_outputs parameter to init_app_state function (#23634) Matúš Námešný 2025-08-26 14:13:15 +02:00
fdeb3dac13 [Model] fix DeepSeek e_score_correction_bias dtype to fp32 (#23640) Jee Jee Li 2025-08-26 20:09:47 +08:00
d52358c1e0 [Perf] Remove duplicated NVFP4 blockscales to save memory (#23379) Michael Goin 2025-08-26 07:16:33 -04:00
6ace2f72b0 Fix writing benchmark results with tuple keys (#23633) Huy Do 2025-08-26 04:16:09 -07:00
b00e69f8ca Fix nits from #20059 (#23548) Harry Mellor 2025-08-26 11:27:20 +01:00
50fede6634 [V1] Enable V1 for compute capability < 8.0 + FP32 (#23614) Cyrus Leung 2025-08-26 18:00:18 +08:00
b5d34af328 [Bugfix] Fix scheduling when repeated images in one request (#23544) Roger Wang 2025-08-26 02:46:28 -07:00
9b5f64238f [Bugfix] Fix Qwen25VL packed_modules_mapping (#23604) Jee Jee Li 2025-08-26 16:09:14 +08:00
ff77764f86 Fix CLI parameter documentation inconsistency in pooling_models.md (#23630) Raghavan 2025-08-26 13:35:37 +05:30
bfc1edc9f5 [Docs] Fix titles for multi-file examples that are rendered in the docs (#23573) Harry Mellor 2025-08-26 08:16:44 +01:00
3ecbb14b81 [Benchmarks] add benchmark for embedding models (#23000) Jiangyun Zhu 2025-08-26 14:57:08 +08:00
7d67a9d9f9 [mypy] Fix incorrect type hint for EAGLE3 support (#23617) Cyrus Leung 2025-08-26 14:50:17 +08:00
959783fb99 [fix] fix seed-oss-parser (#23560) Bin Jia 2025-08-26 14:16:36 +08:00
ce0e9dbd43 [CI/Build] Fix typo in #23561 (#23616) Cyrus Leung 2025-08-26 14:13:03 +08:00
b395b3b0a3 [Disagg][Perf] Use CUDA event sync instead of blocking tolist to avoid unintentional copy ops blocking across different CUDA streams, improving disagg TTIT/TTFT (#22760) Zijing Liu 2025-08-25 21:06:00 -07:00
