Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

44607e07d3 Check if selected backend is None in get_attn_backend_cls() (#12975) Yuan Tang 2025-02-09 22:45:07 -05:00
67c4637ccf [V1] Use msgpack for core request serialization (#12918) Nick Hill 2025-02-09 19:35:56 -08:00
aa0ca5ebb7 [core][rlhf] add colocate example for RLHF (#12984) youkaichao 2025-02-10 10:28:59 +08:00
59fff4a01a [core] improve error handling when wake up from sleep mode (#12981) youkaichao 2025-02-10 09:38:57 +08:00
29f1d47e73 [MISC] Always import version library first in the vllm package (#12979) Lu Fang 2025-02-09 02:56:40 -08:00
cf797aa856 [core] port pynvml into vllm codebase (#12963) youkaichao 2025-02-09 15:00:00 +08:00
24700c346b [V1] Cache uses_mrope in GPUModelRunner (#12969) Woosuk Kwon 2025-02-08 15:32:32 -08:00
d366ccc4e3 [RFC] [Mistral] FP8 format (#10130) Patrick von Platen 2025-02-08 22:12:53 +01:00
870c37481e [V1][Minor] Remove outdated comment (#12968) Woosuk Kwon 2025-02-08 12:48:30 -08:00
86222a3dab [VLM] Merged multi-modal processor for GLM4V (#12449) Jee Jee Li 2025-02-09 04:32:16 +08:00
fe743b798d [bugfix] fix early import of flash attention (#12959) youkaichao 2025-02-09 00:06:56 +08:00
913df14da3 [Bugfix] Remove unused seq_group_metadata_list from ModelInputForGPU (#12935) shangmingc 2025-02-08 22:46:19 +08:00
8a69e0e20e [CI/Build] Auto-fix Markdown files (#12941) Cyrus Leung 2025-02-08 20:25:15 +08:00
4c8dd12ef3 [Misc] Add qwen2.5-vl BNB support (#12944) Isotr0py 2025-02-08 20:24:47 +08:00
256a2d29dc [Doc] Correct HF repository for TeleChat2 models (#12949) Jun Duan 2025-02-08 04:42:15 -05:00
c45d398e6f [CI] Resolve transformers-neuronx version conflict (#12925) Liangfu Chen 2025-02-08 01:41:35 -08:00
011e612d92 [Misc] Log time consumption on weight downloading (#12926) Jun Duan 2025-02-08 04:16:42 -05:00
7e1837676a [misc] Add LoRA to benchmark_serving (#12898) Varun Sundar Rabindranath 2025-02-08 14:45:44 +05:30
2880e21e3d [Hardware][Intel-Gaudi] Enable long-contexts + LoRA support for Intel Gaudi (#12812) Sanju C Sudhakaran 2025-02-08 14:45:30 +05:30
407b5537db [Build] Make pypi install work on CPU platform (#12874) wangxiyuan 2025-02-08 17:15:15 +08:00
4ea48fb35c [V1][Minor] Move cascade attn logic outside _prepare_inputs (#12943) Woosuk Kwon 2025-02-08 00:39:09 -08:00
e31498bdcb [Misc] Add offline test for disaggregated prefill (#12418) Shaoting 2025-02-08 02:38:20 -06:00
91dd8f7aa6 [bugfix] respect distributed_executor_backend in world_size=1 (#12934) youkaichao 2025-02-08 16:17:08 +08:00
d01f66b039 [Bugfix] Fix multi-round chat error when mistral tokenizer is used (#12859) zifeitong 2025-02-07 23:04:34 -08:00
cc01223f3b [Misc] Fix typo in the example file (#12896) Ke Zhao 2025-02-08 14:56:43 +08:00
306923da82 [Bugfix] Fix Qwen2_5_VLForConditionalGeneration packed_modules_mapping (#12905) Jee Jee Li 2025-02-08 13:02:53 +08:00
3243158336 [V1] Move KV block hashes from Request to KVCacheManager (#12922) Woosuk Kwon 2025-02-07 19:14:10 -08:00
b21f0f9d17 [V1][Minor] Remove outdated comment (#12928) Woosuk Kwon 2025-02-07 19:07:37 -08:00
45cbc4991d [Bugfix] Fix disagg hang caused by the prefill and decode communication issues (#12723) Lu Fang 2025-02-07 16:39:50 -08:00
932c6b7461 [V1] LM Eval With Streaming Integration Tests (#11590) Robert Shaw 2025-02-07 18:07:03 -05:00
eaa92d4437 [ROCm] [Feature] [Doc] [Dockerfile] [BugFix] Support Per-Token-Activation Per-Channel-Weight FP8 Quantization Inferencing (#12501) TJian 2025-02-08 00:13:43 +08:00
0630d4537a [V1] Logprobs and prompt logprobs support (#9880) afeldman-nm 2025-02-07 10:26:20 -05:00
538fab93cd PR #12718 (#12718) Amit Garg 2025-02-07 06:22:37 -08:00
ce26b16268 [Misc] Remove unnecessary detokenization in multimodal processing (#12868) Cyrus Leung 2025-02-07 22:21:17 +08:00
1918aa1b80 [MISC][EASY] Break check file names into entry and args in the pre-commit hooks (#12880) Lu Fang 2025-02-07 05:04:39 -08:00
6e1fc61f0f Prevent unecessary requests to huggingface hub (#12837) Maximilien de Bayser 2025-02-07 02:37:41 -03:00
aa375dca9f [Bugfix] Missing quant_config in deepseek embedding layer (#12836) Szymon Ożóg 2025-02-07 06:35:09 +01:00
433c4a4923 Make vllm compatible with verl (#12824) ZSL98 2025-02-07 11:54:20 +08:00
ef533d25fb [Bugfix] FA2 illegal memory access (#12848) Lucas Wilkinson 2025-02-06 22:54:07 -05:00
b260782357 [misc] Revert # 12833 (#12857) Kevin H. Luu 2025-02-06 16:29:12 -08:00
741429a4cd [MISC] Check space in the file names in the pre commit checks (#12804) Lu Fang 2025-02-06 15:36:21 -08:00
aff404571b Add Bamba Model (#10909) Yu Chin Fabian Lim 2025-02-07 07:22:42 +08:00
467a96a541 [V1] LoRA Support (#10957) Varun Sundar Rabindranath 2025-02-06 23:02:51 +05:30
8108ac841d [Bugfix] Fix unsupported FA version check for Turing GPU (#12828) Isotr0py 2025-02-07 01:18:22 +08:00
afe74f7a96 [Doc] double quote cmake package in build.inc.md (#12840) Jitse Klomp 2025-02-06 18:17:55 +01:00
09b95e36ab [torch.compile] PyTorch 2.6 and nightly compatibility (#12393) youkaichao 2025-02-07 01:09:07 +08:00
85ac82d228 [Kernel] Make rotary_embedding ops more flexible with input shape (#12777) Isotr0py 2025-02-07 00:46:13 +08:00
1e57b1ee63 [Misc] Remove unnecessary decode call (#12833) Cyrus Leung 2025-02-07 00:45:44 +08:00
e152f29502 [misc] Reduce number of config file requests to HuggingFace (#12797) Kevin H. Luu 2025-02-06 06:59:18 -08:00
c786e757fa [Attention] Use FA3 for MLA on Hopper (#12807) Lucas Wilkinson 2025-02-06 06:43:12 -05:00
cefd56ee35 [Docs] Add Google Cloud Slides (#12814) Simon Mo 2025-02-06 01:02:38 -08:00
7ca9934fe7 [Misc] Update w2 scale loading for GPTQMarlinMoE (#12757) Dipika Sikka 2025-02-06 04:02:14 -05:00
0408efc6d0 [Misc] Improve error message for incorrect pynvml (#12809) (tag: v0.7.2) youkaichao 2025-02-06 15:23:50 +08:00
449d1bce02 [Misc] Remove duplicated DeepSeek V2/V3 model definition (#12793) Michael Goin 2025-02-06 02:16:20 -05:00
1a6fcad4c9 Improve TransformersModel UX (#12785) Harry Mellor 2025-02-06 06:24:57 +00:00
56534cd577 [Bugfix] Fix the test_ultravox.py's license (#12806) Lu Fang 2025-02-05 21:25:54 -08:00
d88506dda4 [Model] LoRA Support for Ultravox model (#11253) Sumit Vij 2025-02-05 19:54:13 -08:00
9cdea30b4f [Misc][Easy] Remove the space from the file name Lu Fang 2025-02-05 19:23:35 -08:00
76abd0c881 [Bugfix] Better FP8 supported defaults Lucas Wilkinson 2025-02-05 22:22:19 -05:00
5b19b93082 [ROCm][Kernel] Using the correct warp_size value Gregory Shtrasberg 2025-02-05 22:15:08 -05:00
75404d041b [VLM] Update compatibility with transformers 4.49 Cyrus Leung 2025-02-06 11:09:45 +08:00
bf3b79efb8 [VLM] Qwen2.5-VL Roger Wang 2025-02-05 13:31:38 -08:00
9a5b1554b4 [Docs] Drop duplicate [source] links Russell Bryant 2025-02-05 16:30:50 -05:00
a4ce74c14a [VLM] Use shared field to pass token ids to model Cyrus Leung 2025-02-06 05:30:46 +08:00
3b2005e1db Add: Support for Sparse24Bitmask Compressed Models Rahul Tuli 2025-02-05 15:30:43 -06:00
af8486de49 [Hardware][Intel-Gaudi] Enable FusedSDPA support for Intel Gaudi (HPU) Sanju C Sudhakaran 2025-02-06 02:59:45 +05:30
4c3aac51e1 Merging PR #12536 Chen Zhang 2025-02-06 05:24:26 +08:00
bc1bdecebf [core][distributed] exact ray placement control (#12732) youkaichao 2025-02-06 02:03:19 +08:00
022bcc701a [Bugfix] Fix 'ModuleNotFoundError: No module named 'intel_extension_for_pytorch'' for --tensor-parallel-size more than 1 (#12546) Akash kaothalkar 2025-02-05 12:41:02 +05:30
c53dc466b1 [Doc] Remove performance warning for auto_awq.md (#12743) Michael Goin 2025-02-05 01:43:11 -05:00
3d09e592a8 [V1][Misc] Shorten FinishReason enum and use constant strings (#12760) Nick Hill 2025-02-04 22:43:02 -08:00
fcf2e3d7fc [Bugfix] Fix OpenVINO model runner (#12750) Harry Mellor 2025-02-05 06:42:46 +00:00
58b218d7ae [Doc] Update PR Reminder with link to Developer Slack (#12748) Michael Goin 2025-02-05 01:42:09 -05:00
7ff7a638b6 [Model][Quant] Fix GLM, Fix fused module mappings for quantization (#12634) Kyle Sayers 2025-02-05 00:32:06 -05:00
686006a220 [Misc] Bump the compressed-tensors version (#12736) Dipika Sikka 2025-02-04 23:44:48 -05:00
98fd089fc9 [VLM] Add MLA with pure RoPE support for deepseek-vl2 models (#12729) Isotr0py 2025-02-05 12:44:26 +08:00
249824c3bf Refactor Linear handling in TransformersModel (#12727) Harry Mellor 2025-02-05 04:31:12 +00:00
64862d106e [ROCM][AMD][TRITON] Halving warps number for fw_prefill to reduce spilling (#12713) Aleksandr Malyshev 2025-02-04 19:58:22 -08:00
b3a0d01e45 [Core] add and implement VLLM_LOGITS_PROCESSOR_THREADS (#12368) Aviv Keshet 2025-02-04 18:46:26 -08:00
75e94309e8 [Perf] Mem align KV caches for CUDA devices (MLA perf improvement) (#12676) Lucas Wilkinson 2025-02-04 21:22:24 -05:00
233df6f5c4 [V1][Metrics] Add request_success_total counter, labelled with finish reason (#12579) Mark McLoughlin 2025-02-05 00:46:54 +00:00
18016a5e62 [Bugfix] Fix CI failures for InternVL and Mantis models (#12728) Cyrus Leung 2025-02-04 23:54:23 +08:00
649550f27e [Build] update requirements of no-device for plugin usage (#12630) Sophie du Couédic 2025-02-04 14:19:12 +01:00
62467a834a Avoid unnecessary multi-modal input data copy when len(batch) == 1 (#12722) Kero Liang 2025-02-04 21:03:19 +08:00
6469038b14 [Bugfix] Fix loading of fine-tuned models based on Phi-3-Small (#12689) Michael Greenbaum 2025-02-04 14:58:48 +02:00
815079de8e [VLM] merged multimodal processor and V1 support for idefics3 (#12660) Isotr0py 2025-02-04 20:00:51 +08:00
18a88fcccc [V1] Remove scheduling constraint on partial requests (#12674) Woosuk Kwon 2025-02-04 02:43:58 -08:00
d1ca7df84d [VLM] Merged multi-modal processor for InternVL-based models (#12553) Cyrus Leung 2025-02-04 16:44:52 +08:00
96b23621c1 [Misc] Add BNB quantization for Whisper (#12381) Jee Jee Li 2025-02-04 16:27:36 +08:00
c36ac98d01 [AMD][ROCm] Enable DeepSeek model on ROCm (#12662) Hongxia Yang 2025-02-04 03:24:11 -05:00
4896d0c2dd [Quant] Fix use_mla TypeError and support loading pure-sparsity Compressed Tensors configs (#12711) Kyle Sayers 2025-02-04 02:27:11 -05:00
bb392af434 [Doc] Replace ibm-fms with ibm-ai-platform (#12709) Thomas Parnell 2025-02-04 02:05:04 -05:00
5d98d56089 Support Pixtral-Large HF by using llava multimodal_projector_bias config (#12710) Michael Goin 2025-02-03 22:55:46 -05:00
73b35cca7f [Core] Improve hash collision avoidance in prefix caching (#12621) Russell Bryant 2025-02-03 19:28:20 -05:00
5095e96606 [V1] Revert uncache_blocks and support recaching full blocks (#12415) Cody Yu 2025-02-03 15:04:53 -08:00
cf58b9c4ca [MISC] Remove model input dumping when exception (#12582) Cody Yu 2025-02-03 13:34:16 -08:00
4797dad3ec [Model] Add Deepseek V3 fp8_w8a8 configs for B200 (#12707) kushanam 2025-02-03 13:30:39 -08:00
6dd5e52823 Squelch MLA warning for Compressed-Tensors Models (#12704) Kyle Sayers 2025-02-03 16:29:56 -05:00
c11de33dad [Bugfix][Kernel] Fix per-token/per-channel quantization for Hopper scaled mm (#12696) Tyler Michael Smith 2025-02-03 16:04:59 -05:00
33e0602e59 [Misc] Fix improper placement of SPDX header in scripts (#12694) Russell Bryant 2025-02-03 14:16:59 -05:00
