Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

ffbcc9e757 [BugFix] Fix VllmConfig() construction on all platforms (#20695) Nick Hill 2025-07-10 08:00:20 +01:00
59389c927b [BugFix][CPU] Fix CPU worker dependency on cumem_allocator (#20696) Nick Hill 2025-07-10 07:24:20 +01:00
8f2720def9 [Frontend] Support Tool Calling with both tool_choice='required' and $defs. (#20629) Chauncey 2025-07-10 13:56:35 +08:00
ad6c2e1a0b Correct PPMissingLayer handling in Deepseek-V2-Lite PP deployment (#20665) Seiji Eicher 2025-07-09 20:34:40 -07:00
49e8c7ea25 Use NVCC --compress-mode to reduce binary size by 30% (#20694) Michael Goin 2025-07-10 10:26:48 +09:00
805d62ca88 [Misc] DP : Add ExpertTokensMetadata (#20332) Varun Sundar Rabindranath 2025-07-09 20:33:14 -04:00
b7d9e9416f [CI/Build] Fix FlashInfer double build in Dockerfile (#20651) Michael Goin 2025-07-10 08:41:56 +09:00
7c12a765aa [Misc] Simplify the prefix caching logic on draft tokens (#20701) Woosuk Kwon 2025-07-09 14:48:35 -07:00
cd587c93ef [BugFix]: Properly set engine_id when using multi connector (#19487) Yiming 2025-07-10 04:32:44 +08:00
332d4cb17b [Feature][Quantization] MXFP4 support for MOE models (#17888) fxmarty-amd 2025-07-09 22:19:02 +02:00
bf03ff3575 [Kernel] Add Conch backend for mixed-precision linear layer (#19818) Jacob Manning 2025-07-09 16:17:55 -04:00
47043eb678 [Kernel] Triton implementation of causal-conv1d for Mamba-based models (#18218) Tuan, Hoang-Trong 2025-07-09 15:53:55 -04:00
31b96d1c64 Support Llama 4 for cutlass_moe_fp4 (#20453) Michael Goin 2025-07-10 04:53:38 +09:00
e59ba9e142 [CI/Build] Enlarge tolerance for a CPU multi-modal test (#20684) Li, Jiang 2025-07-10 01:48:52 +08:00
403b481573 Remove heading form installation inc.md file (#20697) Harry Mellor 2025-07-09 18:42:51 +01:00
138709f8d1 [Doc] Update CPU doc (#20676) Li, Jiang 2025-07-10 01:28:30 +08:00
0bbac1c1b4 [Bench] Add NVFP4 GEMM benchmark script (#20578) Michael Goin 2025-07-10 02:23:48 +09:00
a3e4e85ece [XPU][CI] enhance xpu test support (#20652) Liangliang Ma 2025-07-10 00:53:09 +08:00
eb58f5953d [TPU][Bugfix] fix test_pallas (#20666) Chengji Yao 2025-07-09 09:32:48 -07:00
4ac9c33f78 [Bugfix] Fix handling of Tensorizer arguments for LoadConfig (#20643) Sanger Steel 2025-07-09 11:36:37 -04:00
efe73d0575 [doc] update doc format (#20673) Reid 2025-07-09 23:08:19 +08:00
853487bc1b [Docs] Improve docs for RLHF co-location example (#20599) Ricardo Decal 2025-07-09 08:06:43 -07:00
9ff2af6d2b [Benchmark] Parameterization of streaming loading of multimodal datasets (#20528) Li Wang 2025-07-09 21:35:16 +08:00
70ca5484f5 [Doc] Update notes (#20668) Cyrus Leung 2025-07-09 18:46:36 +08:00
5358cce5ff [V1] [Doc] Update V1 docs for Mamba models (#20499) Thomas Parnell 2025-07-09 10:02:41 +02:00
2155e95ef1 [Bugfix] Fix the issue where reasoning_content is None when Thinkng is enabled and tool_choice is set to 'required'. (#20662) Chauncey 2025-07-09 15:39:58 +08:00
f95570a52d [Docs] fix minimax tool_calling docs error (#20667) qscqesze 2025-07-09 15:37:07 +08:00
b6e7e3d58f [Intel GPU] support ray as distributed executor backend for XPU. (#20659) Kunshang Ji 2025-07-09 15:36:58 +08:00
e760fcef22 [XPU] Use spawn with XPU multiprocessing (#20649) Dmitry Rogozhkin 2025-07-09 00:34:28 -07:00
6bbf1795b7 [Misc] Fix the size of batched_dummy_mm_inputs in profile_run (#20434) B-201 2025-07-09 11:15:44 +08:00
9e0ef888f0 Fix bullets in incremental_build.md (#20642) Michael Goin 2025-07-09 12:03:41 +09:00
97abeb1daa [feat] enable SM100 CUTLASS block scaled group gemm for smaller batch sizes (#20640) Duncan Moss 2025-07-08 20:03:35 -07:00
34dad19e7b [Bugfix] set default set cuda_graph_sizes to min(self.max_num_seqs * 2, 512) (#20628) zhrrr 2025-07-09 11:02:51 +08:00
6db31e7a27 [Hardware][PPC64LE] Enable V1 for ppc64le and ARM (#20554) Akash kaothalkar 2025-07-09 08:30:41 +05:30
977180c912 [Docs] Improve documentation for multi-node service helper script (#20600) Ricardo Decal 2025-07-08 19:44:26 -07:00
c40784c794 [BugFix][Intel GPU] Use refactored API for dist_backend in V1 worker (#20596) Ratnam Parikh 2025-07-08 19:44:23 -07:00
baed180aa0 [tech debt] Revisit lora request model checker (#20636) kourosh hakhamaneshi 2025-07-08 18:42:41 -07:00
0b407479ef [misc]refactor Platform.set_device method (#20262) Kunshang Ji 2025-07-09 09:39:47 +08:00
5eaf570050 Replace multiply_add with homogeneous_multiply_add to Address Clang Template Parameter Issue (#20142) Wenxin Cheng 2025-07-08 17:30:18 -07:00
d8ee5a2ca4 [TPU][Bugfix] disable phi-3 test (#20632) QiliangCui 2025-07-08 16:14:26 -07:00
b9fca83256 [Bugfix] Fix GLM-4.1-V video prompt update (#20635) Isotr0py 2025-07-09 07:13:58 +08:00
32dffc2772 [Core] Rename get_max_tokens_per_item for backward compatibility (#20630) Cyrus Leung 2025-07-09 07:11:30 +08:00
c438183e99 [Bugfix] Fix topk_ids indices_type for CUTLASS w8a8 FP8 MoE (#20166) Ming Yang 2025-07-08 16:10:57 -07:00
baba0389f7 [CI] Increase the threshold of the MTEB RERANK tests (#20615) wang.yuqi 2025-07-08 23:10:11 +08:00
c6c22f16d3 Revert invalid spellchecker fix on deepseek_vl2 (#20618) viravera 2025-07-08 08:07:14 -07:00
dd382e0fe3 [Model] Implement missing get_language_model for Keye-VL (#20631) Cyrus Leung 2025-07-08 22:47:46 +08:00
849590a2a7 Update torch/xla pin to 20250703 (#20589) XiongfeiWei 2025-07-08 07:44:02 -07:00
a4c23314c0 [xpu]feat: support multi-lora on xpu (#20616) Yan Ma 2025-07-08 22:07:10 +08:00
b942c094e3 Stop using title frontmatter and fix doc that can only be reached by search (#20623) Harry Mellor 2025-07-08 11:27:40 +01:00
b4bab81660 Remove unnecessary explicit title anchors and use relative links instead (#20620) Harry Mellor 2025-07-08 10:49:13 +01:00
b91cb3fa5c [Docs] Improve documentation for Deepseek R1 on Ray Serve LLM (#20601) Ricardo Decal 2025-07-08 02:09:06 -07:00
71d1d75b7a [PD][Nixl] Remote consumer READ timeout for clearing request blocks (#20139) Nicolò Lucchesi 2025-07-08 09:56:40 +02:00
72d14d0eed [Frontend] [Core] Integrate Tensorizer in to S3 loading machinery, allow passing arbitrary arguments during save/load (#19619) Sanger Steel 2025-07-08 01:47:43 -04:00
e34d130c16 [TPU] Temporary fix vmem oom for long model len by reducing page size (#20278) Chenyaaang 2025-07-07 22:16:16 -07:00
7721ef1786 [CI/Build][CPU] Fix CPU CI and remove all CPU V0 files (#20560) Li, Jiang 2025-07-08 13:13:44 +08:00
8369b7c2a9 [Misc] improve error msg (#20604) Reid 2025-07-08 12:45:18 +08:00
3eb4ad53f3 [Docs] Add Anyscale to frameworks (#20590) Ricardo Decal 2025-07-07 20:09:13 -07:00
90a2769f20 [Docs] Add Ray Serve LLM section to openai compatible server guide (#20595) Ricardo Decal 2025-07-07 20:08:05 -07:00
e60d422f19 [Docs] Improve docstring for ray data llm example (#20597) Ricardo Decal 2025-07-07 20:06:26 -07:00
0d914c81a2 [Docs] Rewrite offline inference guide (#20594) Ricardo Decal 2025-07-07 20:06:02 -07:00
6e428cdd7a [Doc] Syntax highlight request responses as JSON instead of bash (#20582) Harry Mellor 2025-07-08 04:02:45 +01:00
93b9d9f499 [Bugfix]: Fix messy code when using logprobs (#19209) Chauncey 2025-07-08 11:02:15 +08:00
af107d5a0e Make distinct code and console admonitions so readers are less likely to miss them (#20585) Harry Mellor 2025-07-08 03:55:28 +01:00
31c5d0a1b7 [Optimize] Don't send token ids when kv connector is not used (#20586) Woosuk Kwon 2025-07-07 19:04:54 -07:00
afb7cff1b9 [Bugfix] Fix Maverick correctness by filling zero to cache space in cutlass_moe (#20167) Ming Yang 2025-07-07 18:07:22 -07:00
d2e841a10a [Misc] Improve logging for dynamic shape cache compilation (#20573) Kyle Yu 2025-07-07 20:48:09 -04:00
14601f5fba [Config] Refactor mistral configs (#20570) Patrick von Platen 2025-07-08 00:25:10 +02:00
042d131f39 Fix links in multi-modal model contributing page (#18615) Harry Mellor 2025-07-07 22:13:52 +01:00
8e807cdfa4 [Misc] feat output content in stream response (#19608) rongfu.leng 2025-07-08 04:45:10 +08:00
e601efcb10 [Misc] Add fully interleaved support for multimodal 'string' content format (#14047) Anton 2025-07-07 22:43:08 +03:00
22dd9c2730 [Kernel] Optimize Prefill Attention in Unified Triton Attention Kernel (#20308) jvlunteren 2025-07-07 21:08:12 +02:00
a6d795d593 [DP] Copy environment variables to Ray DPEngineCoreActors (#20344) Rui Qiao 2025-07-07 10:14:22 -07:00
a37d75bbec [Front-end] microbatch tokenization (#19334) ztang2370 2025-07-08 00:54:10 +08:00
edd270bc78 [Bugfix] Prevent IndexError for cached requests when pipeline parallelism is disabled (#20486) Peter Pan 2025-07-08 00:41:15 +08:00
110df74332 [Model][Last/4] Automatic conversion of CrossEncoding model (#19675) wang.yuqi 2025-07-07 22:46:04 +08:00
1ad69e8375 [Doc] Fix some MkDocs snippets used in the installation docs (#20572) Harry Mellor 2025-07-07 15:44:34 +01:00
b8a498c9b2 [Doc] Add outline for content tabs (#20571) Harry Mellor 2025-07-07 15:43:26 +01:00
923147b5e8 [Doc] Fix internal links so they don't always point to latest (#20563) Harry Mellor 2025-07-07 12:15:50 +01:00
45877ef740 [Doc] Use gh-pr and gh-issue everywhere we can in the docs (#20564) Harry Mellor 2025-07-07 11:54:22 +01:00
6e4bef1bea [Doc] Remove extra whitespace from CI failures doc (#20565) Harry Mellor 2025-07-07 11:35:47 +01:00
4ff79a136e [Misc] Set the minimum openai version (#20539) Jee Jee Li 2025-07-07 17:15:26 +08:00
448acad31e [Misc] remove unused jinaai_serving_reranking (#18878) Abirdcfly 2025-07-07 17:14:12 +08:00
eb0b2d2f08 [Docs] Clean up tables in supported_models.md (#20552) Michael Yao 2025-07-07 16:46:31 +08:00
3112271f6e [XPU] log clean up for XPU platform (#20553) Yan Ma 2025-07-07 16:38:22 +08:00
1fd471e957 Add docstrings to url_schemes.py to improve readability (#20545) Michael Yao 2025-07-07 16:31:49 +08:00
2c5ebec064 [XPU][CI] add v1/core test in xpu hardware ci (#20537) Liangliang Ma 2025-07-07 16:16:40 +08:00
2e610deb72 [CI/Build] Enable phi2 lora test (#20540) Jee Jee Li 2025-07-07 13:10:41 +08:00
6e2c19ce22 [Refactor]Abstract Platform Interface for Distributed Backend and Add xccl Support for Intel XPU (#19410) Yang Yang 2025-07-07 12:32:32 +08:00
47db8c2c15 [Misc] add a tip for pre-commit (#20536) Reid 2025-07-07 10:42:06 +08:00
462b269280 Implement OpenAI Responses API [1/N] (#20504) Woosuk Kwon 2025-07-06 18:32:13 -07:00
a5dd03c1eb Revert "[V0 deprecation] Remove V0 CPU/XPU/TPU backends (#20412)" (tags: v0.9.2rc2, v0.9.2) simon-mo 2025-07-06 14:02:36 -07:00
c18b3b8e8b [Bugfix] Add use_cross_encoder flag to use correct activation in ClassifierPooler (#20527) Cyrus Leung 2025-07-07 05:01:48 +08:00
9528e3a05e [BugFix][Spec Decode] Fix spec token ids in model runner (#20530) Woosuk Kwon 2025-07-06 12:44:52 -07:00
9fb52e523a [V1] Support any head size for FlexAttention backend (#20467) Cyrus Leung 2025-07-07 00:54:36 +08:00
e202dd2736 [V0 deprecation] Remove V0 CPU/XPU/TPU backends (#20412) Woosuk Kwon 2025-07-06 08:48:13 -07:00
43813e6361 [Misc] call the pre-defined func (#20518) Reid 2025-07-06 18:25:29 +08:00
cede942b87 [Benchmark] Add support for multiple batch size benchmark through CLI in benchmark_moe.py (#20516) Brayden Zhong 2025-07-06 05:20:11 -04:00
fe1e924811 [Frontend] Support image object in llm.chat (#19635) Flora Feng 2025-07-05 23:47:13 -07:00
4548c03c50 [TPU][Bugfix] fix the MoE OOM issue (#20339) Chengji Yao 2025-07-05 21:19:09 -07:00
40b86aa05e [BugFix] Fix: ImportError when building on hopper systems (#20513) Lucas Wilkinson 2025-07-06 00:17:30 -04:00
