Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

01a08739e0 [misc] split engine_model into json file for nsys profile tool (#23117) Grace Ho 2025-08-19 00:44:53 -07:00
fda9537c5e [Model] Support Pipeline Parallelism for moonshotai/Kimi-VL-A3B-Thinking-2506 (#23114) Jiangyun Zhu 2025-08-19 14:24:31 +08:00
90bbe0a5ad [Log] Warning Once for Cutlass MLA (#23137) Wentao Ye 2025-08-19 02:24:16 -04:00
e75f342261 Migrate InternVLImagePixelInputs (in nemotron_vl.py) to TensorSchema (#22023) Benji Beck 2025-08-18 22:48:26 -07:00
78dba404ad [Hardware][IBM Z]Enable v1 for s390x and s390x dockerfile fixes (#22725) Nikhil Suryawanshi 2025-08-19 10:10:37 +05:30
e9d6a3db69 [TPU] make ptxla not imported when using tpu_commons (#23081) Chengji Yao 2025-08-18 20:46:42 -07:00
a4454e9401 chore: disable enable_cpp_symbolic_shape_guards (#23048) Xiao 2025-08-18 20:08:05 -07:00
14006840ea [V0 Deprecation] Remove V0 FlashInfer attention backend (#22776) Woosuk Kwon 2025-08-18 19:54:16 -07:00
6603288736 [CI][V0 Deprecation] Removed V0 Only Chunked Prefill and Prefix Caching Tests (#22871) Robert Shaw 2025-08-18 20:39:01 -04:00
95e3095136 [Misc] Add @tdoublep as a maintainer of hybrid model and Triton-attention related code (#23122) Thomas Parnell 2025-08-19 02:31:38 +02:00
c9b38be8aa [Spec Decode] Make propose_draft_token_ids non-blocking for lower TTFT (#23041) Woosuk Kwon 2025-08-18 17:20:38 -07:00
0dd3f4f5ab [Misc] Minor refactoring for prepare_inputs (#23116) Woosuk Kwon 2025-08-18 16:58:05 -07:00
498259ccce Install tpu_info==0.4.0 to fix core dump for TPU (#23135) Xiang Xu 2025-08-18 16:23:33 -07:00
aab549870d Use Blackwell FlashInfer MXFP4 MoE by default if available (#23008) v0.10.1 Michael Goin 2025-08-18 18:25:49 -04:00
ba6928cf13 fix: OpenAI SDK compat (ResponseTextConfig) (#23126) Breno Baldas Skuk 2025-08-19 00:22:59 +02:00
befedf86a8 [CI Bugfix] Pin openai<1.100 to unblock CI (#23118) Michael Goin 2025-08-18 15:14:01 -04:00
6d25e3fd6e Use Blackwell FlashInfer MXFP4 MoE by default if available (#23008) Michael Goin 2025-08-18 18:25:49 -04:00
ac6eb49de3 fix: OpenAI SDK compat (ResponseTextConfig) (#23126) Breno Baldas Skuk 2025-08-19 00:22:59 +02:00
bf756321c7 [CI Bugfix] Pin openai<1.100 to unblock CI (#23118) Michael Goin 2025-08-18 15:14:01 -04:00
0e3bb543f0 [Bugfix] Support compile for Transformers multimodal (#23095) Raushan Turganbay 2025-08-18 15:35:48 +02:00
569aefd134 chore: remove unnecessary patch_padding_side for the chatglm model (#23090) 杨朱 · Kiki 2025-08-18 20:32:13 +08:00
d3f71f1224 [Refactor] Get prompt updates earlier (#23097) Cyrus Leung 2025-08-18 20:31:53 +08:00
5a30bd10d8 [Bugfix] fix IntermediateTensors equal method (#23027) Ning Xie 2025-08-18 17:58:11 +08:00
27e8d1ea3e [Refactor] Define MultiModalKwargsItems separate from MultiModalKwargs (#23053) Cyrus Leung 2025-08-18 17:52:00 +08:00
5c79b0d648 [XPU][CI]add xpu env vars in CI scripts (#22946) Kunshang Ji 2025-08-18 17:47:03 +08:00
5f5664b3e4 [XPU] Fix compile size for xpu (#23069) Kunshang Ji 2025-08-18 15:04:08 +08:00
89657a557c [Misc] Fix backward compatibility from #23030 (#23070) Roger Wang 2025-08-17 23:33:29 -07:00
08d5f7113a [Misc] refactor function name (#23029) Ning Xie 2025-08-18 13:16:21 +08:00
b2fd0b81e0 [Bugfix][CI] Machete kernels: deterministic ordering for more cache hits (#23055) Andy Lo 2025-08-18 07:10:26 +02:00
9f1c642254 [Bugfix] fix Qwen2.5-Omni processor output mapping (#23058) double7 2025-08-18 13:09:11 +08:00
7be3a59d8e [Misc] enhance static type hint (#23059) Ning Xie 2025-08-18 13:09:08 +08:00
8ea0c2753a [Misc] Minor code cleanup for _get_prompt_logprobs_dict (#23064) Woosuk Kwon 2025-08-17 18:16:03 -07:00
0fc8fa751a fix: gptq marlin weight loading failure (#23066) v0.10.1rc1 Simon Mo 2025-08-17 15:56:07 -07:00
21e39436c8 [XPU] fix xpu to set cudagraph batch sizes (#23044) Calvin Chen 2025-08-18 05:45:42 +08:00
6d243efeda [Misc] Convert use_structured_output property into constant (#23060) Woosuk Kwon 2025-08-17 12:41:38 -07:00
c55bc1db26 [Misc] Remove dead return (#23061) Woosuk Kwon 2025-08-17 10:36:46 -07:00
292084e72a [BugFix] Fix for IMA in FA3 varlen combine (#22967) Lucas Wilkinson 2025-08-17 11:52:04 -04:00
16bff144be [Misc] fix typo in the multimodal doc (#23051) Kevinzz 2025-08-17 16:56:20 +08:00
fe0411fc6f [Bugfix] should use stack instead of concat (#22972) 947132885 2025-08-17 16:46:36 +08:00
4d4061b6e7 [Kernel] Add cuda kernel for gpt_oss activation (#22951) Jee Jee Li 2025-08-17 13:03:24 +08:00
87f48623a5 [Misc] method name typo fix (#23042) Ning Xie 2025-08-17 12:49:14 +08:00
5c32143b9d [Refactor] Defer tensor data construction in MultiModalKwargs (#23030) Cyrus Leung 2025-08-17 12:05:50 +08:00
94096a47c9 [UX] Separate marlin moe config logic from triton moe (#23006) Michael Goin 2025-08-16 22:16:42 -04:00
a258ad8bcc [Bugfix] fix qwen3 moe fp8 accuracy issue (#23031) Jinzhen Lin 2025-08-17 08:41:23 +08:00
bf7f470b22 [V1] Logits processors extensibility (#19912) afeldman-nm 2025-08-16 15:59:17 -04:00
4fc722eca4 [Kernel/Quant] Remove AQLM (#22943) Michael Goin 2025-08-16 15:38:21 -04:00
3253ae765e [Flaky CI] Increase timeout tolerance for test_mp_crash_detection+test_default_mm_lora_chat_completions (#23028) Michael Goin 2025-08-16 14:33:08 -04:00
000cceca8c [Bugfix gpt-oss] Fix float32 convert for flashinfer sink support (#23016) Michael Goin 2025-08-16 14:16:00 -04:00
68373d3126 [Frontend] Added support for HermesToolParser for models without special tokens (#16890) Woonggi Min 2025-08-17 02:38:42 +09:00
52ce1420e9 Fix handling of max_num_batched_tokens for pooling tasks (#23004) Maximilien de Bayser 2025-08-16 14:36:30 -03:00
829bbd7882 [New Model]mBART model (#22883) 汪志鹏 2025-08-16 20:16:58 +08:00
4dff91c93d [Refactor] Allow optional MultiModalKwargsItem in IPC (#23022) Cyrus Leung 2025-08-16 19:30:49 +08:00
de9cb61763 Add docs for PrefixRepetitionDataset + enable usage with vllm bench throughput (#23012) Seiji Eicher 2025-08-16 03:21:20 -07:00
2dbccce8a6 [CI][Bugfix] Skip Ovis2 generation test because of broken remote code (#22954) Isotr0py 2025-08-16 17:44:19 +08:00
933f45334a [Core] Make cudagraph check cuda platform only (#23005) Chengji Yao 2025-08-16 00:46:00 -07:00
cc826a202b [Multimodal] Update Tensor schema test to cover arbitrary shape mm inputs (#22867) Isotr0py 2025-08-16 15:44:50 +08:00
6d3da472bc [Misc] Add --save-dir option to benchmark_moe (#23020) Jee Jee Li 2025-08-16 15:26:10 +08:00
78863f8c5c [BugFix] Add support for loading prompt embeds tensors serialized on unavailable devices and sparse tensors (#22962) Andrew Sansom 2025-08-16 01:25:10 -05:00
5157827cfc [Build] Env var to disable sccache (#22968) Lucas Wilkinson 2025-08-16 01:36:27 -04:00
7caec10e7b [XPU]avoid circular import during XPU init (#23017) Kunshang Ji 2025-08-16 13:16:34 +08:00
1f83e7d849 [misc] nsys profile output kernel classifier and visualizer (#22971) Grace Ho 2025-08-15 19:52:51 -07:00
e4e37ded56 [V1] support min_tokens for detokener (#22014) Calvin Chen 2025-08-16 10:28:10 +08:00
f6b5040590 [Frontend] Avoid list copies in serving_chat.py (#22947) Nick Hill 2025-08-15 19:06:30 -07:00
fbd88728b3 [Bugfix] Fix DeepSeek MTP (#22934) Benjamin Chislett 2025-08-15 21:25:06 -04:00
070da660c1 [Kernel] Simplify get_kv_cache_layout and cache use_trtllm_attention env-dependent bit (#22735) Nicolò Lucchesi 2025-08-16 02:14:08 +02:00
ad0297d113 [Misc] Support passing multiple request ids at once to AsyncLLM.abort() (#22944) Nick Hill 2025-08-15 17:00:36 -07:00
236b864e4f [BugFix] Make run_once thread-safe (#22978) Yichen Yan 2025-08-16 07:56:17 +08:00
3e2f7985a2 Support multiple attention groups for KV sharing (#22672) Yong Hoon Shin 2025-08-15 16:54:10 -07:00
c280066f9d [v1] Move block_hashes from KVCacheManager to Request.block_hashes (#19728) Or Ozeri 2025-08-16 02:52:52 +03:00
b9dc9d2607 [BugFix] Handle case where async utility call is cancelled (#22996) Nick Hill 2025-08-15 16:38:42 -07:00
1fc375dc05 [Structured Outputs] [Bug] Fix misalignment in apply_grammar_bitmask causing unintended masking and NaN logits (#22963) rishitdholakia13 2025-08-15 17:25:05 -06:00
76144adf76 ci: Add CUDA + arm64 release builds (#21201) Eli Uriegas 2025-08-15 16:16:23 -07:00
f5d412bafb [BugFix] Fix regression caused by mamba state dtype PR (#22998) Thomas Parnell 2025-08-16 00:55:26 +02:00
177e55e3bd [Attention] FA3 Attention Sinks Perf Boost (#22478) Lucas Wilkinson 2025-08-15 17:41:07 -04:00
1723ef1aae minor: zero workspace buffer init for flashinfer trtllm-gen attn (#22603) eigen 2025-08-15 17:38:10 -04:00
00d6cba0cf Add PrefixRepetitionRandomDataset to vllm bench serve datasets (#20638) Seiji Eicher 2025-08-15 14:09:23 -07:00
7f89ed248f [Fix] enable swap_ab for pplx problem size computation (#22991) shixianc 2025-08-15 14:02:12 -07:00
8a87cd27d9 [CI] Speed up Whisper tests by reusing server (#22859) Michael Goin 2025-08-15 16:56:31 -04:00
a344a1a7da Use regex in convert-results-json-to-markdown.py (#22989) Michael Goin 2025-08-15 16:54:20 -04:00
79899b63f6 [Bugfix] Added more env vars to hash (#22449) nvjullin 2025-08-16 04:08:37 +08:00
6e670778cd [Core] direct indexing on self.block_table_np in compute_slot_mapping (#22940) Zebing Lin 2025-08-15 15:12:12 -04:00
df5afa82e5 [Log] Debug Once for Randomizing dummy data for DP Rank (#22860) Wentao Ye 2025-08-15 14:51:50 -04:00
6cd69f51bf [Model] Granite-4 support loading quantized checkpoint (#22925) Chih-Chieh Yang 2025-08-15 14:47:56 -04:00
8ad7285ea2 [Kernels] Clean up FusedMoeMethodBase and modular kernel setup. Remove extra arguments from modular kernel methods. (#22035) bnellnm 2025-08-15 14:46:00 -04:00
48b01fd4d4 [Structured Output] Make the output of structured output example more complete (#22481) Shanshan Shen 2025-08-16 02:29:25 +08:00
993d3d122b [Benchmarks] Include image data when ShareGPT4V dataset is used. (#22955) Chenheli Hua 2025-08-15 11:23:06 -07:00
68af77e51c [FIXBUG] Correctly Apply Grammar Bitmask in Mixed Batches (#22896) JartX 2025-08-15 19:42:49 +02:00
6b04039a72 [BugFix] Skip the Q component for QKVParallelLinear in the case of QKVCrossParallelLinear since its width is 0 (#22369) sstamenk 2025-08-15 19:17:31 +02:00
7e2fb3c507 Merge branch 'main' into wye-refactor-quant-folder Wentao Ye 2025-08-15 11:24:28 -04:00
1c859a1387 [V0 Deprecation] Remove advance_step (#22969) Woosuk Kwon 2025-08-15 08:22:31 -07:00
74f441f4b5 [Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer (#20059) fhl2000 2025-08-15 22:01:39 +08:00
a0632a3e03 [Frontend] Expose do_log_stats interval to env (#22905) Csrayz 2025-08-15 21:00:20 +08:00
e8b40c7fa2 [CI] Remove duplicated docs build from buildkite (#22924) Harry Mellor 2025-08-15 13:58:06 +01:00
48f4636927 [Misc] Ignore ep_kernels_workspace (#22807) Jee Jee Li 2025-08-15 20:58:03 +08:00
75531a6c13 [V1] [Hybrid] Support using float32 for state in Hybrid Models (Mamba2, Mamba1, Minimax) (#22928) Thomas Parnell 2025-08-15 14:57:06 +02:00
22341b996e Improve multimodal hasher performance for re-used Image prompts (#22825) Staszek Paśko 2025-08-15 14:32:56 +02:00
49252cf59e [MM] Allow skipping memory profiling for multimodal models. (#22950) Roger Wang 2025-08-15 04:41:38 -07:00
3e6dd40016 [Bugfix] fix cuda 12.6 and 11.8 build (#22952) Jinzhen Lin 2025-08-15 18:10:22 +08:00
aa300c438d [Bugfix] Unquote file uri before reading image (#22912) Sayandip Dutta 2025-08-15 14:58:00 +05:30
fe91ce9591 [V1] - Split Prefill and Decode for Mamba1 models (#22653) amirai21 2025-08-15 11:59:52 +03:00
