biondizzle/vllm

Commit Graph

39a22dcaac [Misc] Minor code simplification for spec decode (#24053) Woosuk Kwon 2025-09-01 08:54:01 -07:00
41c80698b3 Document multi-proc method selection for profiling (#23802) Julien Debache 2025-09-01 15:28:26 +02:00
7c8271cd1e [Model]: support KeyeVL-1_5-8B (#23838) Kwai-Keye 2025-09-01 18:50:27 +08:00
3e330fcb21 [Doc]: Fix CPU install docs: force torch-backend=cpu to avoid GPU torchvision errors (#24033) Kay Yan 2025-09-01 18:34:52 +08:00
d46934b229 [Frontend] Gemma3n audio transcriptions/translations endpoint (#23735) Nicolò Lucchesi 2025-09-01 12:07:46 +02:00
107284959a [Doc]: fix typos in Python comments (#24026) Didier Durand 2025-09-01 11:38:20 +02:00
dc1a53186d [Kernel] Update DeepGEMM to latest commit (#23915) Jee Jee Li 2025-09-01 17:38:04 +08:00
55602bb2e6 [Frontend] Update the warning log when using VLLM_ALLOW_LONG_MAX_MODEL_LEN (#20904) wang.yuqi 2025-09-01 16:50:25 +08:00
d7fbc6ddac [Misc] Enable V1 FP16 inference on pre-Ampere GPUs (#24022) Isotr0py 2025-09-01 16:12:22 +08:00
5438967fbc [Misc] add hash_function doc string (#24014) Ning Xie 2025-09-01 14:11:20 +08:00
422e793fa6 [Bugfix] Add support for <tool_call> format in streaming mode for XLAM Tool Parser (#22769) Code Jesus 2025-08-31 23:07:54 -07:00
1cb39dbcdd [Misc] IO Processor plugins for pooling models (#22820) Christian Pinto 2025-09-01 07:07:12 +01:00
437c3ce026 Migrate Phi4 inputs to TensorSchema (#23471) Benji Beck 2025-08-31 23:05:59 -07:00
499b074bfd [Misc] refactor code by import as for torch._inductor.config (#23677) Ning Xie 2025-09-01 14:05:42 +08:00
ff0e59d83a [CI/Build] Improve Tensor Schema tests speed by avoid engine core initialization (#23357) Isotr0py 2025-09-01 13:52:20 +08:00
b55713683c [Misc] Move fast prefill logic to separate method (#24013) Woosuk Kwon 2025-08-31 22:40:38 -07:00
acc1a6e10a Fix the bug related to loading GPTP INT3 weights. (#23328) Jun-Howie 2025-09-01 13:39:57 +08:00
8c742a66d1 [Misc] Avoid redundant copy for encoder-only models (#24012) Woosuk Kwon 2025-08-31 21:02:43 -07:00
183a70967a [BUGFIX] GPTQ quantization compatibility for Qwen3 MOE models (AutoGPTQ and AutoRound-GPTQ) (#23994) JartX 2025-09-01 05:33:40 +02:00
14b4326b94 v1: Support KV events from connectors (#19737) Or Ozeri 2025-09-01 04:13:21 +03:00
752d2e1c36 [Minor] Fix some random typos in comments (#24009) Nick Hill 2025-08-31 16:42:17 -07:00
81eea3d348 vllm fix check on max vocab size (#22471) Xiaodong Wang 2025-08-31 05:57:05 -07:00
9701352e4b [Doc]: fix typos in Python comments (#24001) Didier Durand 2025-08-31 10:21:59 +02:00
749be00a98 [Core][Multimodal] Allow passing multi_modal_uuids as multimodal identifiers. (#23394) Roger Wang 2025-08-30 18:01:22 -07:00
5b8077b8ac Fix wrong truncate_prompt_tokens type hint (#22761) Gabriel Marinho 2025-08-30 17:39:38 -03:00
038e9be4eb [LoRA] Much faster startup when LoRA is enabled (#23777) Andy Lo 2025-08-30 16:37:39 +01:00
68a349114f [Misc] enhance type hint for rearrange return value (#23519) Ning Xie 2025-08-30 21:43:33 +08:00
e80bca309e [Refactor] refactor freezing_value/cuda_event initialize outside try finally (#23758) Ning Xie 2025-08-30 21:42:25 +08:00
fb4983e112 [Misc] add reorder_batch AttentionMetadataBuilder (#23798) Ning Xie 2025-08-30 21:41:45 +08:00
379ea2823a Add LoRA support for DeepSeek models (V2, V3, R1-0528) (#23971) sadegh.shokatian 2025-08-30 06:40:02 -07:00
3a6acad431 [Model] Enable encoder DP for MiniCPM-V (#23948) Jiangyun Zhu 2025-08-30 21:31:26 +08:00
5490d633ce [UT] fix unify_kv_cache_configs when kv cache config needs sort (#23843) Ning Xie 2025-08-30 19:22:14 +08:00
628d00cd7b [Bugfix] Fix test_lora_resolvers.py (#23984) Jee Jee Li 2025-08-30 19:16:11 +08:00
4071c76cf3 [V1] [Hybrid] Move MiniMaxLinearAttention into layers/mamba (#23831) Thomas Parnell 2025-08-30 09:16:15 +02:00
f1bddbd852 [Core] Cleanup TPU model runner for MM (#23894) Cyrus Leung 2025-08-30 15:14:58 +08:00
9748c5198b [CI] Fix broken compile tests due to unsupported SiluMul+Nvfp4Quant fusion (#23973) Yong Hoon Shin 2025-08-30 00:14:43 -07:00
ee52a32705 [CI] Move testing image from remote URL to S3 (#23980) Roger Wang 2025-08-29 21:41:25 -07:00
8fb85b7bb6 Add routed_scaling_factor to MoE grouped topk (#23123) Xin Yang 2025-08-29 21:36:48 -07:00
5b31cb1781 [Bugfix] Fix --config arg expansion called from api_server.py (#23944) dubejf 2025-08-30 00:36:39 -04:00
d660c98c1b [CI] Fix unavailable image remote URL (#23966) Roger Wang 2025-08-29 15:40:04 -07:00
5674a40366 [Misc] Make download_weights_from_hf more reliable (#23863) Harry Mellor 2025-08-29 20:37:24 +01:00
8c3e199998 Revert gemma3n fast prefill changes (#23897) Yong Hoon Shin 2025-08-29 12:16:57 -07:00
1c26b42296 [Docs] [V1] [Hybrid] Add new documentation re: contributing mamba-based models (#23824) Thomas Parnell 2025-08-29 20:47:58 +02:00
b7adf94c4a Tuned H100/H200 triton fp8 block configs for fused_qkv_a_proj (#23939) Michael Goin 2025-08-29 13:28:35 -04:00
4d7fe40fc0 [RL][BugFix] Fix missing tokenizer error for token-in-token-out (#23904) 22quinn 2025-08-29 10:09:55 -07:00
0dc9532065 [BUGFIX ] fix undefined silu_and_mul_nvfp4_quant (#23929) yzds 2025-08-30 00:36:39 +08:00
72a69132dc [CI] Add aiter to matching list of issue auto labeller for rocm tag (#23942) vllmellm 2025-08-29 23:29:21 +08:00
d90d8eb674 [BugFix] Async scheduling and PP compatibility with DP (#23770) Nick Hill 2025-08-29 08:17:27 -07:00
0a2f4c0793 [Models] Use in-place adds in Idefics2Vision (#23932) Lukas Geiger 2025-08-29 15:42:57 +01:00
1cf3753b90 [MODEL] Apertus and XIELU (#23068) EduardDurech 2025-08-29 14:29:18 +02:00
4f7cde7272 Adds json_count_leaves utility function (#23899) Adit Chawdhary 2025-08-29 17:58:13 +05:30
67c14906aa Update PyTorch to 2.8.0 (#20358) Huy Do 2025-08-29 03:57:35 -07:00
69f46359dd [Multimodal] Consolidate mm inputs into MultiModalFeatureSpec (#23779) Flora Feng 2025-08-29 03:36:57 -07:00
d9e00dbd1f [Performance] V1 Classify Models E2E Performance Optimization (#23541) wang.yuqi 2025-08-29 18:12:32 +08:00
ad39106b16 [CPU] Enable data parallel for CPU backend (#23903) Li, Jiang 2025-08-29 17:19:58 +08:00
2554b27baa [V0 Deprecation] Remove pooling model support in V0 (#23434) Maximilien de Bayser 2025-08-29 04:04:02 -03:00
934bebf192 Better errors for Transformers backend missing features (#23759) Harry Mellor 2025-08-29 08:01:40 +01:00
885ca6d31d [Misc] Fix warnings for mistral model (#23552) Jiangyun Zhu 2025-08-29 14:58:48 +08:00
2d0afcc9dc [mrope][Qwen2-VL] Fix edge case where getting index of image/video token can potentially throw in default vl mrope implementation. (#23895) Chenheli Hua 2025-08-28 23:29:13 -07:00
b4f9e9631c [CI/Build] Clean up LoRA test (#23890) Jee Jee Li 2025-08-29 14:28:35 +08:00
05d839c19e Fix(async): Add support for truncate_prompt_tokens in AsyncLLM (#23800) Raghavan 2025-08-29 11:25:06 +05:30
6597d7a456 [Platform] import activation_quant_fusion for CUDA only (#23882) wangxiyuan 2025-08-29 13:54:16 +08:00
5264015d74 [BugFix][AMD][Deepseek] fix a dtype mismatch error for deepseek running on AMD (#23864) Jinghui Zhang 2025-08-28 22:54:12 -07:00
98ac0cb32d [Bugfix] Use ReplicatedLinear for SequenceClassification head (#23836) Isotr0py 2025-08-29 12:41:20 +08:00
c8b3b299c9 [tests] Improve speed and reliability of test_transcription_api_correctness (#23854) Russell Bryant 2025-08-29 00:25:33 -04:00
006477e60b [ROCm][Fix] Fix rocm build caused by #23791 (#23847) Charlie Fu 2025-08-28 21:52:27 -05:00
de533ab2a1 [Models] Improve iteration over layers (#19497) Lukas Geiger 2025-08-29 02:26:34 +01:00
235c9db8a7 [XPU] support data parallel for MoE models on XPU (#22887) Chaojun Zhang 2025-08-29 09:23:04 +08:00
b668055a11 [V0 Deprecation] Remove V0 Samplers test (#23862) Woosuk Kwon 2025-08-28 18:05:52 -07:00
d3d2aad5a2 [Log] Use Debug Once for DeepGEMM E8M0 When not Enabled (#23858) Wentao Ye 2025-08-28 18:18:10 -04:00
cb293f6a79 [V1] Enable prefill optimization for Gemma3n (#22628) Yong Hoon Shin 2025-08-28 14:54:30 -07:00
7ffbf27239 [BugFix][FlashInfer] Fix potential race condition for paged_kv_indptr_cpu (#23737) Woosuk Kwon 2025-08-28 14:22:46 -07:00
27e88cee74 chore: build release image by default (#23852) Simon Mo 2025-08-28 13:17:15 -07:00
16a45b3a28 [NVIDIA] Support SiluMul + NVFP4 quant fusion (#23671) elvischenv 2025-08-29 03:36:50 +08:00
57d4ede520 [bugfix] [spec-decoding] fix data race in sample_recovered_tokens_kernel (vLLM v1) (#23829) Jingkai He 2025-08-29 03:05:20 +08:00
04d1dd7f4a [ROCm][Aiter] Add triton fp8 bmm kernel for mla (#23264) Divakar Verma 2025-08-28 13:18:08 -05:00
f32a5bc505 Migrate Llama4ImagePatchInputs to TensorSchema (#22021) Benji Beck 2025-08-28 10:29:37 -07:00
8805ad9fa9 Add scale_config.yml file for Meta autoscalers for GH Actions (#23840) Jean Schmidt 2025-08-28 18:31:20 +02:00
0583578f42 [ci] breaks down V1 Test into 3 groups of approx 30 minutes runtime (#23757) Jean Schmidt 2025-08-28 17:59:19 +02:00
db74d60490 [Bugfix] Add fake mode around passes (#23349) Angela Yi 2025-08-28 08:25:56 -07:00
95089607fa [Model][gpt-oss] Support DP+EP for GPT-OSS with FlashInfer trtllm-gen MoE (#23819) Po-Han Huang (NVIDIA) 2025-08-28 21:56:20 +08:00
1f096f9b95 [CI] Fix linting error on main (#23835) Thomas Parnell 2025-08-28 15:52:01 +02:00
66548f6603 [Bugfix] Fix benchmark_moe.py for blockwise fp8. (#23823) YUQI.CHENG 2025-08-28 21:44:09 +08:00
d3da2eea54 [Doc]: fix typos in Python scripts (#23828) Didier Durand 2025-08-28 14:37:38 +02:00
bfab219648 [Model] [gpt-oss] fix gpt-oss pp support (#23815) Jiangyun Zhu 2025-08-28 20:36:55 +08:00
a3432f18fd [BugFix][Spec Decode] Use float64 for uniform_probs (#23803) Woosuk Kwon 2025-08-28 05:26:45 -07:00
67cee40da0 [CI/Build][Bugfix] Fix Qwen VL tests on CPU (#23818) Li, Jiang 2025-08-28 19:57:05 +08:00
d99c3a4f7b [Doc]: fix typos in .md files (including those of #23751) (#23825) Didier Durand 2025-08-28 13:38:19 +02:00
3462c1c522 [FIXBUG] Add return_success parameter to moe_wna16_weight_loader function (#22797) JartX 2025-08-28 11:03:22 +02:00
c5d004aaaf [Model] Add PP support and VLM backbone compatability for GPT-OSS (#23680) Isotr0py 2025-08-28 16:03:28 +08:00
11a7fafaa8 [New Model]: Support GteNewModelForSequenceClassification (#23524) wang.yuqi 2025-08-28 15:36:42 +08:00
186aced5ff [Kernel] cuda kernels for upcoming decode context parallel feature (#23791) yzds 2025-08-28 15:29:11 +08:00
daa1273b14 [Bugfix] when set offline model running error (#23711) rongfu.leng 2025-08-28 15:27:45 +08:00
c07a73317d [CI] enable idefics3 and fuyu-8b test in multimodal test (#23790) Jiangyun Zhu 2025-08-28 14:51:24 +08:00
22feac8e95 [Transform] [Quantization] Add transforms to compressed tensors (#22486) Kyle Sayers 2025-08-28 02:43:48 -04:00
c8851a4723 Add deprecation warning for lora_extra_vocab_size (#23635) Jinheng 2025-08-28 13:34:29 +08:00
f48a9af892 [CI] make all multi-gpu weight loading tests run nightly (#23792) Alex 2025-08-27 23:27:36 -05:00
a11adafdca Gracefully handle edge cases in harmony utils (#23155) Jan Kessler 2025-08-28 05:14:00 +02:00
a781e84ec2 [Perf] Tune configs for triton block fp8 gemm H100/H200 (#23748) Michael Goin 2025-08-27 23:12:53 -04:00
1b7b161a09 [Feature] models: pass layer prefix to replace_linear_class for per-layer quantization routing. Addresses #23239 (#23556) Shrey Gupta 2025-08-28 08:42:44 +05:30
