Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

27c6c2f98c [Bugfix] Fix MoE LoRA bin/pt loading (#31161) Jee Jee Li 2025-12-23 19:09:15 +08:00
73cfb7a722 Correct position of docstring of class attributes (#31209) Weida Hong 2025-12-23 18:08:58 +08:00
f32cfd7d97 [ROCm][FEAT] Support AITER RMSNorm quantization fusion pass (#26575) vllmellm 2025-12-23 11:07:54 +01:00
6b16fff01b [Bugfix] Fix Jais2ForCausalLM (#31198) Jee Jee Li 2025-12-23 15:44:01 +08:00
f1c2c20136 [XPU] decrease IGC_ForceOCLSIMDWidth for speculative decoding triton-xpu kernel compilation (#30538) Yan Ma 2025-12-23 13:22:15 +08:00
8cef137689 [Chore] Update more locations to use attention_config.backend (#31153) Cyrus Leung 2025-12-23 11:19:50 +08:00
a37328fc5c [Feature] Batch invariant: Lora (#30097) quanliu 2025-12-23 10:32:47 +08:00
3e10262356 Revert "[SM100] Enable fp8 compute for prefill MLA (#30746)" (#31197) Pavani Majety 2025-12-22 18:15:33 -08:00
612d5ffdab [ci] Fix Pytorch compilation test oom in 2.10 (#31194) Angela Yi 2025-12-22 17:56:47 -08:00
78e5e62bbf [AMD][CI] fix v1/engine test_preprocess_error_handling (#31192) Divakar Verma 2025-12-22 19:28:19 -06:00
b57b967386 [MoE Refactor][7/N] AITER MK (#31102) Robert Shaw 2025-12-22 18:42:58 -05:00
6d518ffbaa [CI Failure] Disable mosaicml/mpt-7b and databricks/dbrx-instruct tests (#31182) Michael Goin 2025-12-22 18:40:35 -05:00
85aff45e24 [Perf] Remove blocking copy in GDN Attention (#31167) Benjamin Chislett 2025-12-22 17:25:22 -05:00
5312a7284e [Bug] Fix 'CutlassMLAImpl' object has no attribute '_workspace_buffer' (#31173) Wentao Ye 2025-12-22 17:24:27 -05:00
de71747655 [SpecDecode] Simplified alternative padded-speculation acceptance rate fix (#29845) Lucas Wilkinson 2025-12-22 16:06:10 -05:00
9586354053 [Doc] Add vllm-metal to hardware plugin documentation (#31174) Michael Goin 2025-12-22 15:06:29 -05:00
b10f41c894 [SM100] Enable fp8 compute for prefill MLA (#30746) Pavani Majety 2025-12-22 11:15:57 -08:00
7b926e8901 [MoE Refactor][9/N] Use modular kernel for unquantized Triton MoE (#31052) Yongye Zhu 2025-12-22 09:34:19 -08:00
ab3a85fd68 [ROCm][CI/Build] Fix triton version to one that has triton_kernels required for gpt-oss to run (#31159) Gregory Shtrasberg 2025-12-22 11:19:27 -06:00
8dd0db687b [UX] improve profiler error message (#31125) Boyuan Feng 2025-12-22 08:45:59 -08:00
022f3cea53 [ROCm] [Critical]: Remove unused variable (#31156) TJian 2025-12-23 01:28:22 +09:00
a5bc77c253 [AMD][CI] Add "V1 Test e2e + engine" to mi325_8 Agent Pool (#31040) Micah Williamson 2025-12-22 09:41:56 -06:00
b1c3f96ae3 [CI][Bugfix] Fix entrypoints/openai/test_audio.py (#31151) Nicolò Lucchesi 2025-12-22 16:21:40 +01:00
8f8f469b1b [BugFix] skip language model in Encoder (#30242) dengyunyang 2025-12-22 21:25:59 +08:00
2cf91c2ea4 [CI] add polling for precompiled wheel in python_only_compile.sh, fix index generation for releases (#30781) Shengqi Chen 2025-12-22 21:24:21 +08:00
bd6d5a7475 [gpt-oss] Fix harmony parser in streaming responses (#30205) AlonKejzman 2025-12-22 14:56:06 +02:00
256a33ecb4 [Model] Fix bagel failed to run (#31132) Li Wang 2025-12-22 18:15:54 +08:00
c02a2705f9 Update MiniMax-M2 ToolCall and add MiniMax-M2.1 in Docs (#31083) Roger Young 2025-12-22 13:28:40 +08:00
cf8eed7bef [Bugfix][ROCm] Fix typo: is_linear_fp8_enaled -> is_linear_fp8_enabled (#31109) Kevin McKay 2025-12-21 23:14:58 -06:00
44ae85f725 [Misc] Fix typo: 'occured' -> 'occurred' (#31120) Kevin McKay 2025-12-21 23:14:27 -06:00
14c3e6ade3 [Misc] Fix spelling typos in model comments (#31117) Kevin McKay 2025-12-21 23:14:14 -06:00
42b42824ae [Misc] Fix grammar errors in comments and messages (#31115) Kevin McKay 2025-12-21 23:14:02 -06:00
ec58c10ce1 [Misc] Fix quantization-related typos (#31116) Kevin McKay 2025-12-21 23:13:48 -06:00
8c084de59d [Misc] Fix spelling typos in comments (#31114) Kevin McKay 2025-12-21 23:13:14 -06:00
19cc9468fd [Feature]: Support NVIDIA ModelOpt HF FP8 variants FP8_PER_CHANNEL_PER_TOKEN and FP8_PB_WO in vLLM (#30957) CedricHuang 2025-12-22 11:34:49 +08:00
097978a15d [Kernel] Enable fused_qknorm_rope_kernel supports partial rope (#30821) Jee Jee Li 2025-12-22 10:39:22 +08:00
7e065eba59 [CI] Fix "2 Node Tests (4 GPUs in total)" (#31090) Lucas Wilkinson 2025-12-21 21:32:40 -05:00
9d701e90d8 [Doc] Clarify FP8 KV cache computation workflow (#31071) Steve Westerhouse 2025-12-21 18:41:37 -06:00
06d490282f [NVFP4][Perf] Tune NVFP4 input quant kernel for small batch size (#30897) Michael Goin 2025-12-21 12:41:57 -05:00
b471092d3a [MoE Refactor][4/N] Marlin Fp8 Mk (#31036) Robert Shaw 2025-12-21 12:37:42 -05:00
93cabc417c ci: add nvidia-smi warmup before Prime-RL integration test (#31093) Ameen Patel 2025-12-21 07:43:01 -08:00
bb80f69bc9 add aarnphm and chaunceyjiang to the new tool_parser directory (#31088) Chauncey 2025-12-21 11:24:34 +08:00
3e92b2b7ac [BugFix]fix gpt-oss v1/completions response bug (#30608) 汪志鹏 2025-12-21 10:39:31 +08:00
7c73ceb581 [Quantization] add marlin w4a8/w8a8 check (#31061) Jinzhen Lin 2025-12-21 05:58:11 +08:00
ae0770fa6b [CI] Fix H200 Distributed test (#31054) Lucas Wilkinson 2025-12-20 16:48:49 -05:00
ee52d9901d [Quantization] support logical_widths for fp8 marlin (#30962) Jinzhen Lin 2025-12-21 04:02:57 +08:00
54c8924384 [MoE Refactor][5/N] Isolate zero expert to LongCatFlash (#28891) baonudesifeizhai 2025-12-20 13:22:04 -05:00
560ae9638c [XPU] enable fp8 online streaming quantization (#30944) Yan Ma 2025-12-20 21:45:27 +08:00
1501a4070e [Bugfix] Read truncate_prompt_tokens from pooling_params in AsyncLLM.encode() (#31013) Jeffrey Wang 2025-12-20 02:29:31 -08:00
ff2168bca3 [CI] FIx fixture 'siglip_attention_config' not found (#31053) Lucas Wilkinson 2025-12-19 22:46:15 -05:00
0be149524c [ROCm][CI/Build] Update ROCm dockerfiles (#30991) Gregory Shtrasberg 2025-12-19 21:19:12 -06:00
d52c5096d7 [Bugfix] fix the alias bug of AttentionBackendEnum when register CUSTOM attention backend to vllm (#30869) zejunchen-zejun 2025-12-20 09:03:35 +08:00
8a7a414374 GLM-4.7 Tool Parser and Doc Update (#30876) Yuxuan Zhang 2025-12-20 08:09:58 +08:00
95befecc18 [MoE Refactor][2/N] Use Modular Kernels for Fp8 (#30825) Robert Shaw 2025-12-19 18:36:38 -05:00
4cf9429897 [Bug] Fix error 'Dynamo failed to run FX node with fake tensors for Deepseek V3.2 (#31046) Wentao Ye 2025-12-19 18:31:31 -05:00
83a317f650 [MoE Refactor][3/N] Deprecate cutlass block quant fp8 (b200) (#30990) Robert Shaw 2025-12-19 16:09:54 -05:00
5f6477d1d0 [BugFix] Fix TypeError: unhashable type: 'dict' when serving deepseek32 (#30924) Lucas Wilkinson 2025-12-19 16:07:54 -05:00
3bd8335bd0 [Refactor] Refactor for DeepGemmQuantScaleFMT using cache (#30898) Wentao Ye 2025-12-19 15:50:39 -05:00
1ab5213531 Make engine core client handshake timeout configurable (#27444) Seiji Eicher 2025-12-19 12:38:30 -08:00
969bbc7c61 [Model] Add MiMo-V2-Flash support (#30836) Zhonghua Deng 2025-12-20 01:17:03 +08:00
268a972c62 Update Pytorch version update docs (#30982) Andrey Talman 2025-12-19 11:08:53 -05:00
5fbfa8d9ef [Quantization] fix marlin w8a8 check (#30961) Jinzhen Lin 2025-12-19 23:33:22 +08:00
23a1946e3b [CustomOp][Refactor] Extract common methods for ApplyRotaryEmb CustomOp (#31021) Shanshan Shen 2025-12-19 22:16:09 +08:00
b5545d9d5c [Bugfix] [Kernel] Triton attention kernels: mask out V blocks that fall outside sliding window (#30887) Thomas Parnell 2025-12-19 14:39:54 +01:00
bd2b52fc2d [CPU][Bugfix] Fix ppc64le CPU build (#30871) Nishidha Panpaliya 2025-12-19 17:56:35 +05:30
420ba2dbb6 Enable aarch64 CPU performance benchmarks (#26494) Li, Jiang 2025-12-19 20:16:18 +08:00
455949675d [Frontend][Bug] allow tool calls in analysis channel (#28139) Marko Rosenmueller 2025-12-19 11:47:44 +01:00
086b96339f [Bugfix] Add validation for tool requests when tool_parser is unavailable (#30613) lif 2025-12-19 18:23:28 +08:00
9187de9fac [Quantization] enable compressed-tensors marlin support for turing (2) (#31008) Jinzhen Lin 2025-12-19 16:56:35 +08:00
ac1c934276 [Bugfix] Fix incorrect tiles creation for mm prefix triton attention (#30974) Isotr0py 2025-12-19 16:00:33 +08:00
4924ac582c Add hidden dimension validation for multimodal embedding inputs (#30968) Wenqi Glantz 2025-12-19 02:59:36 -05:00
096b25c9ed [Doc][CPU] Fix index link for CPU regular release wheels (#31015) Li, Jiang 2025-12-19 15:29:52 +08:00
de08b8f61b [Quantization] enable compressed-tensors marlin support for turing (#31000) Jinzhen Lin 2025-12-19 12:29:48 +08:00
2ac85a4544 [BugFix] Fix logprobs with spec decode and modified logits (#30846) Nick Hill 2025-12-18 19:58:28 -08:00
7b43db210c [ROCm][CI][Bugfix] Multi-Modal Model Support Fixes and Attention Backend Improvements (#30270) Andreas Karatzas 2025-12-18 20:17:27 -06:00
6a09612b2e [Bugfix] Fix tool_choice="none" being ignored by GPT-OSS/harmony models (#30867) (tag: v0.14.0rc0) PlatinumGod 2025-12-19 09:34:27 +08:00
45c0526ac9 [BugFix] Handle errors when preprocessing added requests (#30895) Nick Hill 2025-12-18 17:29:11 -08:00
d6b3d39b6d [Cleanup] Refactor FlashInferMetadataBuilder (#29128) Benjamin Chislett 2025-12-18 17:45:30 -05:00
6ca74bc11a [NIXL][BUG FIX] Fix both failing issue and accuracy issue with nixl + host_buffer on CUDA (#30419) Chendi.Xue 2025-12-18 16:10:02 -06:00
72506c9834 Check for truthy rope_parameters not the existence of it (#30983) (tag: v0.13.0) Harry Mellor 2025-12-18 21:59:10 +00:00
b2eb84de77 [Bugfix] Remove tile_size=64 for mm_prefix triton attention (#30973) Isotr0py 2025-12-19 03:42:32 +08:00
ac43367ced adds jais 2 support (#30188) sarathc-cerebras 2025-12-18 21:16:58 +05:30
30fe765e9f [Fix][FlexAttention] return max logical block index to handle reused blocks (#30915) Yifan Qiao 2025-12-17 22:42:21 -08:00
19c583398a Check for truthy rope_parameters not the existence of it (#30983) Harry Mellor 2025-12-18 21:59:10 +00:00
b0b77c4655 [BugFix] Fix spec decode + structured outputs + preemption edge case (#30916) Nick Hill 2025-12-18 12:59:55 -08:00
634a14bd7d Strengthen input validation and tests for 'parse_raw_prompts’. (#30652) Kayvan Mivehnejad 2025-12-18 14:51:58 -05:00
24b65eff0d [BugFix] Spec decode with VLLM_ENABLE_V1_MULTIPROCESSING=0 (#30319) Chen Zhang 2025-12-18 11:47:56 -08:00
41b6f9200f Remove all2all backend envvar (#30363) Elizabeth Thomas 2025-12-18 13:46:28 -06:00
97000a2be7 [Bug] Fix compressed tensor not using deepgemm (#30820) Wentao Ye 2025-12-18 14:45:55 -05:00
d2dc5dfc6e [Bugfix] Remove tile_size=64 for mm_prefix triton attention (#30973) Isotr0py 2025-12-19 03:42:32 +08:00
b8c477c115 tuned fused configs for B300 (#30629) navmarri14 2025-12-18 11:41:59 -08:00
53ad423f26 [Perf] enable flashinfer rotary_embedding custom ops in DeepSeek rotary (#30729) jiahanc 2025-12-18 11:31:18 -08:00
889f8bb250 [BugFix]Reclaim resources to prevent memory leaks when use LMCacheMPConnector (#30745) wz1qqx 2025-12-19 03:09:51 +08:00
058926d48c [XPU] allow custom workers (e.g. vllm-omni workers) to be used on XPU (#30935) Fanli Lin 2025-12-19 02:16:36 +08:00
700a5ad6c6 [MM Encoder]: Migrate legacy ViT MultiHeadAttention to new MMEncoderAttention interface (#30684) Isotr0py 2025-12-19 02:04:19 +08:00
62be3670cb [BugFix] Add sleep to fix tight loop and release GIL (#29476) Alec 2025-12-18 09:52:55 -08:00
500f26e6d3 [Bugfix] fix DP-aware routing in OpenAI API requests (#29002) inkcherry 2025-12-19 01:50:42 +08:00
686cbaac64 [Cleanup] Remove unused ModelRunner V1 InputBatch.num_tokens field (#30218) Nick Hill 2025-12-18 09:17:00 -08:00
f4ee2c3d90 fix fp8 online quantization streaming with tp > 1 (#30900) Vasiliy Kuznetsov 2025-12-18 11:45:15 -05:00
9a5e96523b [LoRA] Set default MXFP4 LoRA backend to Marlin (#30598) Xin Yang 2025-12-18 08:42:22 -08:00
