Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

bca55b556f [Bugfix] fix adding bias twice in ipex GPTQ quantization (#18363) Random Fly 2025-05-20 15:54:33 +08:00
d981396778 [release] Change dockerhub username for TPU release (#18389) Kevin H. Luu 2025-05-19 23:49:23 -07:00
9609327fa4 [Core] [Bugfix]: tensor parallel with prompt embeds (#18171) Nan Qin 2025-05-19 22:21:27 -05:00
f07a673eb2 [Misc] Allow AutoWeightsLoader to skip loading weights with specific substr in name (#18358) Isotr0py 2025-05-20 11:20:12 +08:00
d565e0976f [neuron] fix authorization issue (#18364) Liangfu Chen 2025-05-19 16:30:32 -07:00
258bf621d5 fix CUDA_check redefinition in #17918 (#18287) Lucia Fang 2025-05-19 13:42:35 -07:00
dc1440cf9f Neuron up mistral (#18222) Satyajith Chilappagari 2025-05-19 09:54:47 -07:00
8171221834 [Misc] Fix typo (#18330) Gong Shufan 2025-05-20 00:51:01 +08:00
7937c2fd52 Add files via uploadAdd fused MoE kernel tuning configs (fp8_w8a8) for DeepSeek V3/R1 on a single-node 8x NVIDIA H20 96GB setup (#18337) sunyicode0012 2025-05-20 00:49:57 +08:00
e2ee1e8e9e [Feature]Add support for models quantized with AutoRound (#17850) Wenhua Cheng 2025-05-20 00:38:53 +08:00
20d8ce81eb [Frontend] add --quick option for vllm chat/complete (#18297) Reid 2025-05-20 00:36:13 +08:00
84ab4feb7e [Doc] Fix typo (#18355) Elad Segal 2025-05-19 19:05:16 +03:00
6781af5608 [Quantization] Pool model support bitsandbytes (#18087) Jee Jee Li 2025-05-20 00:03:43 +08:00
1b15df2546 [BugFix] Fix handling of num_computed_tokens with connector (#18232) Nick Hill 2025-05-19 09:03:25 -07:00
43b5f61dce [Doc] Move input-related docs to Features (#18353) Cyrus Leung 2025-05-19 23:08:39 +08:00
c5bb0ebdc6 [Doc] Fix prompt embedding examples (#18350) Li Wang 2025-05-19 21:48:16 +08:00
d637b96099 [BugFix] [Vul] Add missing usedforsecurity=False in MD5 hashing to enable FIPS (#18319) Shaoyu Yang 2025-05-19 16:31:23 +08:00
275c5daeb0 fix: Add type specifications for CLI arguments in tensorizer options (#18314) CYJiang 2025-05-19 14:42:17 +08:00
47fda6d089 [Build] Supports CUDA 12.6 and 11.8 after Blackwell Update (#18316) Simon Mo 2025-05-18 23:19:33 -07:00
27d0952600 [Misc] extract parser.parse_args() (#18323) Reid 2025-05-19 12:06:26 +08:00
221cfc2fea Feature/vllm/input embedding completion api (#17590) Nan Qin 2025-05-18 22:18:05 -05:00
9da1095daf [Spec Decode][V0] Fix spec decode correctness test in V0 eagle/medusa (#18175) wwl2755 2025-05-18 21:49:46 -05:00
d1211f8794 [Doc] Add doc to explain the usage of Qwen3 thinking (#18291) Robin 2025-05-19 07:04:07 +08:00
b6a6e7a529 [Misc] add litellm integration (#18320) Reid 2025-05-18 23:32:30 +08:00
4fb349f66a Fix copy-paste error in phi4mm image processing (#18315) Lifu Huang 2025-05-18 07:00:12 -07:00
908733aca7 [Model] Use sigmoid for single-label classification (#18313) 22quinn 2025-05-18 07:00:09 -07:00
1a8f68bb90 [doc] update reasoning doc (#18306) Reid 2025-05-18 21:59:14 +08:00
9ab2c02ff8 Support sequence parallelism combined with pipeline parallelism (#18243) cascade 2025-05-17 15:47:25 -07:00
66e63e86ec [MISC] fix typo (#18305) Ning Xie 2025-05-18 01:52:09 +08:00
9214e60631 [Model] use AutoWeightsLoader for solar (#18113) rongfu.leng 2025-05-17 15:24:17 +08:00
f880d42582 Fixed build on ppc64le due to openssl conflicts (#18262) Nishidha 2025-05-17 12:53:46 +05:30
dcfe95234c Update Dockerfile to build for Blackwell (#18095) Michael Goin 2025-05-17 03:23:25 -04:00
48ac2bed5b [Hardware][TPU] Optionally import for TPU backend (#18269) Siyuan Liu 2025-05-17 00:23:12 -07:00
3e0d435027 [P/D][V1] Support dynamic loading of external KV connector implementations (#18142) David Ben-David 2025-05-17 09:40:39 +03:00
4ee4826ede [BugFix] Correct max_model_len derivation from config.json for Mistral format (#17937) 汪志鹏 2025-05-16 21:20:13 -07:00
60017dc841 [Misc] reformat the collect-env output (#18285) Reid 2025-05-17 10:46:18 +08:00
55f1a468d9 Move cli args docs to its own page (#18228) (#18264) Trevor Royer 2025-05-16 19:43:45 -07:00
fd195b194e [V1][P/D] Local attention optimization for NIXL (#18170) Michael Goin 2025-05-16 21:16:33 -04:00
fabe89bbc4 [Spec Decode] Don't fall back to V0 when spec decoding is enabled (#18265) Woosuk Kwon 2025-05-16 16:10:27 -07:00
e73b7dfd69 [Bugfix] fix an illegal memory access was encountered of marlin kernel + act_order (#18245) Jinzhen Lin 2025-05-17 07:02:44 +08:00
7fdfa01530 [Sampler] Adapt to FlashInfer 0.2.3 sampler API (#15777) Bowen Wang 2025-05-16 15:14:03 -07:00
aef94c6d07 [CI] Assign reviewer to mergify with changes to Tensorizer files (#18278) Sanger Steel 2025-05-16 15:04:14 -04:00
0ceaebf87b [BugFix] Fix ordering of KVConnector finished send/rcv sets (#18211) Nick Hill 2025-05-16 09:20:54 -07:00
1db4f47f81 [BugFix] Fix multi async save in MultiConnector (#18246) Nick Hill 2025-05-16 08:13:47 -07:00
d3d91b6f71 [Misc][MacOS] fix bfloat16 error (#18249) Reid 2025-05-16 23:05:59 +08:00
87d871470d [Model] Use autoweightloader for dbrx (#18251) learner0810 2025-05-16 22:54:13 +08:00
a5f8c111c2 [Fix] Fix typo in resolve_hf_chat_template (#18259) fxmarty-amd 2025-05-16 16:52:41 +02:00
e23564cb70 use ceil_div in cutlass block scaling shape check (#17918) Lain 2025-05-16 03:02:58 -07:00
390ec88905 [Misc] Consolidate Audio tests into multimodal common generation tests (#18214) Isotr0py 2025-05-16 17:18:08 +08:00
541817670c [Misc] Add Ray Prometheus logger to V1 (#17925) Seiji Eicher 2025-05-16 03:02:42 -05:00
67da5720d4 [PERF] Speed up Qwen2.5-VL model by speed up rotary position embedding (#17973) Vadim Gimpelson 2025-05-16 10:31:02 +04:00
5c04bb8b86 [doc] fix multimodal example script (#18089) David Xia 2025-05-16 02:05:34 -04:00
3d2779c29a [Feature] Support Pipeline Parallism in torchrun SPMD offline inference for V1 (#17827) Lucia Fang 2025-05-15 22:28:27 -07:00
6b31c84aff Throw better error for when running into k8s service discovery issue (#18209) Will Eaton 2025-05-16 00:07:28 -04:00
b18201fe06 Allow users to pass arbitrary JSON keys from CLI (#18208) Harry Mellor 2025-05-16 05:05:34 +01:00
f4937a51c1 [Model] vLLM v1 supports Medusa (#17956) Sky Lee 2025-05-16 12:05:31 +08:00
ee659e3b60 [Bugfix][ROCm] Use chunked_prefill_paged_decode as fallback for V1 attention on ROCm (#18093) kliuae 2025-05-16 10:30:17 +08:00
4e1c6a0264 [Bugfix] fix rotary embedding test for _get_padded_tensor_shape (#18229) Lucas Wilkinson 2025-05-15 21:32:45 -04:00
c7852a6d9b [Build] Allow shipping PTX on a per-file basis (#18155) Lucas Wilkinson 2025-05-15 19:41:55 -04:00
8795eb9975 [Bugfix] Fix test_eagle test (#18223) Lucia Fang 2025-05-15 15:59:42 -07:00
0b34593017 Adding "AMD: Tensorizer Test" to amdproduction. (#18216) Alexei-V-Ivanov-AMD 2025-05-15 13:01:25 -05:00
e3f3aee6f4 [Misc] Avoid cuda graph log when sizes still match (#18202) Nicolò Lucchesi 2025-05-15 18:59:38 +02:00
92540529c0 [Bugfix] [ROCm]: Remove assertion logic when using AITER fused moe in unquantizedMethod to reenable LLama4 BF16 (#18205) TJian 2025-05-16 00:53:18 +08:00
fadb8d5c2d [Bugfix]Change the exception thrown by call_hf_processor from RuntimeError to ValueError (#18181) Zhonghua Deng 2025-05-16 00:01:47 +08:00
2aa5470ac5 [Frontend] Fix chat template content format detection (#18190) Sebastian Schoennenbeck 2025-05-15 18:00:21 +02:00
51ff154639 Improve examples rendering in docs and GitHub (#18203) Harry Mellor 2025-05-15 16:57:49 +01:00
566ec04c3d Adding "Basic Models Test" and "Multi-Modal Models Test (Extended) 3" in AMD Pipeline (#18106) Alexei-V-Ivanov-AMD 2025-05-15 10:49:23 -05:00
01c22335ba [Kernel] [V1] Fix performance regression for triton unified attention (#18161) Thomas Parnell 2025-05-15 15:39:00 +02:00
451da4bcbd add tools into TokenizeChatRequest (#18187) hustxiayang 2025-05-15 07:01:49 -04:00
07ad27121f Update deprecated type hinting in model_loader (#18130) Harry Mellor 2025-05-15 12:00:21 +01:00
a9944aabfa fix: typos (#18151) omahs 2025-05-15 11:16:15 +02:00
a8f5aec20a [V1] Update zmq socket creation in nixl connector (#18148) Russell Bryant 2025-05-15 02:17:57 -04:00
de71fec81b [CI] don't skip fixed test_kv_cache_events() (#18183) David Xia 2025-05-15 02:17:16 -04:00
70f8b96724 [Bugfix] Fix FusedMoEPrepareAndFinalize for cuda-disalike backends (#18178) Mengqing Cao 2025-05-15 14:16:31 +08:00
dd2a94596a [Model] Allow the use of sliding window in Qwen2 (#17772) inkcherry 2025-05-15 13:29:38 +08:00
420caf7557 [UT] Add ut for none hash (#17892) Ning Xie 2025-05-15 13:28:11 +08:00
4f07a64075 Support custom implementations of VideoLoader backends. (#18091) Chenheli Hua 2025-05-14 22:26:49 -07:00
e6b8e65d2d [Bugfix] Fix fp8 tests for triton_unified_attention for Triton 3.3 (#18013) Thomas Parnell 2025-05-15 07:26:34 +02:00
26d0419309 Update deprecated type hinting in models (#18132) Harry Mellor 2025-05-15 06:06:50 +01:00
83f74c698f [Fix][ROCm] Enforce eager for all encoder-decoder models on ROCm (#18154) Luka Govedič 2025-05-15 01:04:43 -04:00
2dff093574 [Misc] add lobe-chat support (#18177) Reid 2025-05-15 13:02:23 +08:00
afe3236e90 [Chore] astral's ty (#18116) Aaron Pham 2025-05-15 01:00:43 -04:00
65334ef3b9 [V1][Metrics] Remove unused code (#18158) Mark McLoughlin 2025-05-15 04:13:17 +01:00
e60f550b38 [v1] Support multiple KV cache groups in GPU model runner (#17945) Chen Zhang 2025-05-15 09:54:54 +08:00
f25e0d1125 [Bugfix]: make most of test_openai_schema.py pass (#17664) David Xia 2025-05-14 20:04:35 -04:00
09f106a91e Upload vllm index for the rc builds (#18173) Andrey Talman 2025-05-14 16:35:56 -07:00
2142035b51 [V1] Support multiple kv connectors (#17564) Michael Goin 2025-05-14 19:28:02 -04:00
78aa341d12 [CI] Fix race condition in test_kv_cache_events test (#18169) Russell Bryant 2025-05-14 19:27:48 -04:00
7974736740 Add support for loading torchao models with AOPerModuleConfig (#17826) Jerry Zhang 2025-05-14 16:24:59 -07:00
2fc9075b82 [V1] Structured Outputs + Thinking compatibility (#16577) Aaron Pham 2025-05-14 18:45:24 -04:00
d93c976a0d [Kernel] Have rotary embeddings support tensors (#18046) Lucas Wilkinson 2025-05-14 18:43:55 -04:00
749f792553 [Frontend] decrease import time of vllm.multimodal (#18031) David Xia 2025-05-14 18:43:32 -04:00
856865008e [CI] Disable Failing Tests (#18165) Robert Shaw 2025-05-14 16:49:56 -04:00
f9c069c85e Modularize fused experts and integrate PPLX kernels (#15956) bnellnm 2025-05-14 16:11:54 -04:00
418d2f8bfb [V1][Spec Decode] Share input embedding of target model with EAGLE draft model to free ~1GB for llama 3 model (#17326) Ekagra Ranjan 2025-05-14 15:31:46 -04:00
964472b966 [Doc] Update prefix cache metrics to counting tokens (#18138) Chen Zhang 2025-05-14 23:23:30 +08:00
59dd311cf5 [KVConnector] Keep KVTransferParams as a dict (#18033) Nick Hill 2025-05-14 08:05:57 -07:00
d066e52013 [Bugfix] Fix chat utils tests (#18139) Cyrus Leung 2025-05-14 20:38:21 +08:00
c8ea982d9b Update deprecated type hinting in platform, plugins, triton_utils, vllm_flash_attn (#18129) Harry Mellor 2025-05-14 13:28:16 +01:00
dc372b9c8a Update deprecated type hinting in vllm/device_allocator and vllm/distributed (#18126) Harry Mellor 2025-05-14 12:07:57 +01:00
