Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

08d81f1014 [Bugfix] Fix deepep tests (#20288) Varun Sundar Rabindranath 2025-07-01 03:29:08 -04:00
6cc1e7d96d [CPU] Update custom ops for the CPU backend (#20255) Li, Jiang 2025-07-01 15:25:03 +08:00
9909726d2a Enable ZP Support for Machete (#20268) czhu-cohere 2025-07-01 00:12:20 -07:00
22e9d42040 [Misc] add xgrammar for arm64 (#18359) Prashant Gupta 2025-07-01 00:02:20 -07:00
86debab54c Fix numel() downcast in vllm/csrc/moe/moe_align_sum_kernels.cu +2 (#17082) Richard Barnes 2025-07-01 00:48:10 -06:00
be250bbc67 [V1] Only print cudagraph tqdm on rank 0 with is_global_first_rank (#19516) Michael Goin 2025-07-01 15:02:09 +09:00
27949354fa [Feature] A calibration-free RTN-based quantization for accurate and accelerated INT4/INT8 inference (#18768) Alex Kogan 2025-07-01 01:44:38 -04:00
bd5038af07 [Doc] add config and troubleshooting guide for NCCL & GPUDirect RDMA (#15897) Ernest Wong 2025-06-30 21:44:39 -07:00
a2f14dc8f9 [CI][Intel Gaudi][vllm-Plugin]Add CI for hpu-plugin-v1-test (#20196) Chendi.Xue 2025-06-30 23:17:07 -05:00
92ee7baaf9 [Example] add one-click runnable example for P2P NCCL XpYd (#20246) Kuntai Du 2025-06-30 21:03:55 -07:00
7151f92241 [Misc] Fix spec decode example (#20296) Woosuk Kwon 2025-06-30 21:01:48 -07:00
e28533a16f [Bugfix] Fix include prompt in stream response when echo=true (#15233) fyuan1316 2025-07-01 09:30:14 +08:00
6d42ce8315 [CLI] Improve CLI arg parsing for -O/--compilation-config (#20156) Luka Govedič 2025-06-30 21:03:13 -04:00
ded1fb635b [Bugfix][V1][P/D]Fix the issue of occasional garbled output for P2pNcclConnector (#20263) Zhonghua Deng 2025-07-01 07:45:14 +08:00
97d9524fe9 [Refactor] Remove useless pdb comment (#20266) Wentao Ye 2025-06-30 14:15:24 -04:00
d8cf819a9a [Core] [Bugfix] [Multimodal] Fix multimodal profiling and generation for SFT/PTQed models (#20058) Kyle Sayers 2025-06-30 13:26:49 -04:00
551ef1631a [Unit Test] Add unit test for deep gemm (#20090) Wentao Ye 2025-06-30 12:26:42 -04:00
2863befce3 [Optimization] Use Shared CachedRequestData Instance Across All Requests (#20232) Woosuk Kwon 2025-06-30 09:07:50 -07:00
2965c99c86 [Spec Decode] Clean up spec decode example (#20240) Woosuk Kwon 2025-06-30 08:28:13 -07:00
2062c0723d [Spec Decode] Refactor spec decoding into a separate function (#20238) Woosuk Kwon 2025-06-30 08:13:50 -07:00
1c50e100a9 [Bugfix] fix quark ptpc (#20251) li haoyang 2025-06-30 21:24:50 +08:00
3ee56e26be [Docs] Fix 1-2-3 list in v1/prefix_caching.md (#20243) Michael Yao 2025-06-30 19:20:51 +08:00
8fe7fc8634 [Quantization] Improve BitsAndBytesModelLoader (#20242) Jee Jee Li 2025-06-30 18:22:09 +08:00
e936e401de [Bugfix] Fix processor initialization in transformers 4.53.0 (#20244) Isotr0py 2025-06-30 18:16:16 +08:00
f5dfa07531 [Bugfix] Skip loading extra parameters for modelopt Qwen3 MoE model (#19598) noiji 2025-06-30 18:21:56 +09:00
022c58b80f [doc] Add Slack and Forum to the top navigation (#20208) Reid 2025-06-30 15:53:45 +08:00
19108ef311 [Misc] Fix import (#20233) Woosuk Kwon 2025-06-29 20:34:54 -07:00
5a52f389dd [BUGFIX][DEEPSEEK][MODEL_LOAD] fix w13, w2 weight not initialized assert (#20202) Chendi.Xue 2025-06-29 21:46:19 -05:00
65b1cbb138 [Model] support dots1 (#18254) redmoe-moutain 2025-06-30 10:34:36 +08:00
6c9837a761 Fix cuda_archs_loose_intersection when handling sm_*a (#20207) Huy Do 2025-06-29 16:52:34 -07:00
6f2f53a82d [Quantization] Add compressed-tensors NVFP4 MoE Support (#19990) Dipika Sikka 2025-06-30 00:05:40 +02:00
7b1895e6ce [CI Fix] Try fixing eagle e2e test OOM by reducing block allocation (#20213) Michael Goin 2025-06-29 11:31:37 +09:00
4d36693687 [Refactor] Create a function util and cache the results for has_deepgemm, has_deepep, has_pplx (#20187) Wentao Ye 2025-06-28 18:06:38 -04:00
daec9dea6e [Bugfix] Correct behavior of GraniteMoeHybrid for TensorParallel execution (#20137) Stan Wozniak 2025-06-28 17:16:41 +02:00
daceac57c7 [Frontend] Generalize v1/audio/transcriptions endpoint (#20179) Nicolò Lucchesi 2025-06-28 17:15:26 +02:00
8615d9776f [CI/Build] Add new CI job to validate Hybrid Models for every PR (#20147) Thomas Parnell 2025-06-28 08:00:25 +02:00
7b460c25f9 [BugFix] Fix the incorrect func name in the comments. (config.py) (#20185) Jiayi Yan 2025-06-28 13:51:16 +08:00
f719772281 [Bugfix] Properly reject requests with empty list guided_choice (#20195) Michael Goin 2025-06-28 14:50:52 +09:00
d45417b804 fix ci issue distributed 4 gpu test (#20204) Wentao Ye 2025-06-28 01:50:00 -04:00
a29e62ea34 Fix num_token_padding support for static per-tensor scaled_fp8_quant (#20188) Michael Goin 2025-06-28 14:48:13 +09:00
e53be6f00a [Misc] Add type assertion of request_id for LLMEngine.add_request (#19700) Chales Xu 2025-06-28 13:47:36 +08:00
c329ceca6d [CI Fix] Pin tests/models/registry.py MiniMaxText01ForCausalLM to revision due to model changes (#20199) Michael Goin 2025-06-28 14:43:06 +09:00
3c545c0c3b [CI/Build] Allow hermetic builds (#18064) Fabien Dupont 2025-06-27 18:04:39 +02:00
e8c3bd2cd1 [Bugfix] Fix some narrowing conversion warnings (#20141) Tyler Michael Smith 2025-06-27 12:01:28 -04:00
c6c983053d [Bugfix] Mark 'hidden_states' as mutable in moe_forward registration. (#20152) bnellnm 2025-06-27 11:42:22 -04:00
aafabaa0d5 [Fix][torch.compile] Enable custom ops by default when Inductor off (#20102) Luka Govedič 2025-06-27 11:00:42 -04:00
94a55c7681 [Fix][ROCm] Remove unused variables to fix build error on GFX11/12 (#19891) Hosang 2025-06-27 10:14:44 -04:00
aa0dc77ef5 [Perf] Improved perf for resolve_chat_template_content_format (#20065) Ilya Lavrenov 2025-06-27 13:16:41 +04:00
4ab3ac285e [Bugfix] Fix flaky failure when getting DP ports (#20151) Michael Goin 2025-06-27 16:30:53 +09:00
d1c956dc0f Gemma3n (Text-only) (#20134) Robert Shaw 2025-06-27 03:16:26 -04:00
dec197e3e5 Quick Fix by adding conditional import for flash_attn_varlen_func in flash_attn (#20143) Chendi.Xue 2025-06-27 00:48:13 -05:00
6e244ae091 [Perf][Frontend] eliminate api_key and x_request_id headers middleware overhead (#19946) Yazan Sharaya 2025-06-27 07:44:14 +03:00
cd4cfee689 [Model][1/N] Automatic conversion of CrossEncoding model (#20012) wang.yuqi 2025-06-27 12:10:04 +08:00
e110930680 [Fix] Fix gemma CI test failing on main (#20124) Thomas Parnell 2025-06-27 06:06:59 +02:00
8b64c895c0 [CI] Sync test dependency with test.in for torch nightly (#19632) Yang Wang 2025-06-26 20:55:25 -07:00
0740e29b66 [Feature] add quick all reduce (#19744) li haoyang 2025-06-27 11:54:24 +08:00
44d2e6af63 [Bugfix] Build moe_data for both sm100 and sm90 (#20086) Michael Goin 2025-06-27 12:50:12 +09:00
2d7779f888 [Perf] SM100 FP8 GEMM Optimizations after cutlass_profiler (#20071) Ilya Markov 2025-06-27 05:50:09 +02:00
a57d57fa72 [Quantization] Bump to use latest compressed-tensors (#20033) Dipika Sikka 2025-06-26 23:50:06 -04:00
71799fd005 [CI Failure] Fix OOM with test_oot_registration_embedding (#20144) Michael Goin 2025-06-27 12:21:04 +09:00
e9fd658a73 [Feature] Expert Parallelism Load Balancer (EPLB) (#18343) Bowen Wang 2025-06-26 15:30:21 -07:00
07b8fae219 [Doc] correct LoRA capitalization (#20135) Kyle Yu 2025-06-26 18:22:12 -04:00
562308816c [Refactor] Rename commnication utils (#20091) Wentao Ye 2025-06-26 18:19:32 -04:00
04e1642e32 [TPU] add kv cache update kernel (#19928) Chengji Yao 2025-06-26 10:01:37 -07:00
b69781f107 [Hardware][Intel GPU] Add v1 Intel GPU support with Flash attention backend. (#19560) Kunshang Ji 2025-06-27 00:27:18 +08:00
0bceac9810 Spam folks if config.py changes (#20131) Tyler Michael Smith 2025-06-26 11:19:46 -04:00
34878a0b48 [Doc] Rename page titles (#20130) Cyrus Leung 2025-06-26 23:18:49 +08:00
6393b03986 [Doc] Auto sign-off for VSCode (#20132) Cyrus Leung 2025-06-26 23:18:36 +08:00
0907d507bf [Doc] Automatically signed-off by PyCharm (#20120) wang.yuqi 2025-06-26 22:34:17 +08:00
c894c5dc1f [Bug Fix] Fix address/port already in use error for deep_ep test (#20094) Wentao Ye 2025-06-26 10:33:13 -04:00
1f5d178e9c Revert "[Bugfix] default set cuda_graph_sizes to max_num_seqs for v1 engine" (#20128) Michael Goin 2025-06-26 23:32:22 +09:00
27c065df50 [Bugfix][V1][ROCm] Fix AITER Flash Attention Backend (Fix API Break and Local Attention Logic: affecting Llama4) (#19904) TJian 2025-06-26 05:42:31 -07:00
84c260caeb [Docs] Improve frameworks/helm.md (#20113) Michael Yao 2025-06-26 18:41:51 +08:00
167aca45cb [Misc] Use collapsible blocks for benchmark examples. (#20017) Reid 2025-06-26 18:35:16 +08:00
0567c8249f [CPU] Fix torch version in x86 CPU backend (#19258) Li, Jiang 2025-06-26 18:34:47 +08:00
d188913d99 [Refactor] Remove unused library (#20099) Wentao Ye 2025-06-26 05:16:10 -04:00
1d7c29f5fe [Doc] Update docs for New Model Implementation (#20115) Cyrus Leung 2025-06-26 15:47:06 +08:00
65397e40f5 [Bugfix] Allow CUDA_VISIBLE_DEVICES='' in Platform.device_id_to_physical_device_id (#18979) Seiji Eicher 2025-06-26 00:01:57 -07:00
9502c38138 [Benchmark][Bug] Fix multiple bugs in bench and add args to spec_decode offline (#20083) Ekagra Ranjan 2025-06-26 01:06:27 -04:00
2582683566 [PD] Skip tp_size exchange with rank0 (#19413) Nicolò Lucchesi 2025-06-26 05:04:39 +02:00
754b00edb3 [Bugfix] Fix Mistral tool-parser regex for nested JSON (#20093) Michael Goin 2025-06-26 10:01:17 +09:00
296ce95d8e [CI] Add SM120 to the Dockerfile (#19794) Michael Goin 2025-06-26 08:23:56 +09:00
2d7620c3eb [TPU] Add TPU specific var VLLM_TPU_MOST_MODEL_LEN (#19919) Chenyaaang 2025-06-25 15:51:02 -07:00
55c65ab495 [P/D] Avoid stranding blocks in P when aborted in D's waiting queue (#19223) Nick Hill 2025-06-25 15:19:44 -07:00
2cc2069970 [TPU][Bugfix] fix kv cache padding (#20048) Chengji Yao 2025-06-25 14:24:10 -07:00
9f0608fc16 [Bugfix] default set cuda_graph_sizes to max_num_seqs for v1 engine (#20062) zhrrr 2025-06-26 05:03:17 +08:00
4e0db57fff Fix the path to the testing script. (#20082) QiliangCui 2025-06-25 13:48:17 -07:00
c40692bf9a [Misc] Add parallel state node_count function (#20045) Nick Hill 2025-06-25 13:38:53 -07:00
4734704b30 [PD] let toy proxy handle /chat/completions (#19730) lkchen 2025-06-25 12:17:45 -07:00
8b8c209e35 static_scaled_fp8_quant should not run when scale.numel is not 1 (#20076) Eldar Kurtić 2025-06-25 21:08:03 +02:00
23a04e0895 [Fix] Support cls pooling in ModernBertPooler (#20067) lsz05 2025-06-26 04:07:45 +09:00
02c97d9a92 [Quantization] Add compressed-tensors emulations support for NVFP4 (#19879) Dipika Sikka 2025-06-25 14:28:19 -04:00
e795d723ed [Frontend] Add /v1/audio/translations OpenAI API endpoint (#19615) Nicolò Lucchesi 2025-06-25 19:54:14 +02:00
8359f4c8d8 [V1][Speculative Decoding] Fix DeepSeek MTP (#20022) cjackal 2025-06-26 00:41:02 +09:00
bf5181583f [Doc] Guide for Incremental Compilation Workflow (#19109) Michael Goin 2025-06-25 22:06:46 +09:00
c53fec1fcb [doc] add reference link for Intel XPU (#20064) Reid 2025-06-25 20:24:07 +08:00
0f9e7354f5 [BugFix] Fix full-cuda-graph illegal memory access in FA3 (#20057) Lucas Wilkinson 2025-06-25 04:39:04 -04:00
ba7ba35cda [Chore] debloat some initial logs (#19438) Aaron Pham 2025-06-25 02:36:22 -04:00
015fab8c2f [Kernels][Bugfix] Use torch op for all kernels in FusedMoE forward. Add additional testing for cudagraphs. (#19717) bnellnm 2025-06-25 02:22:58 -04:00
f59fc60fb3 [Feat][CLI] enforce-include-usage (#19695) Max Wittig 2025-06-25 07:43:04 +02:00
