biondizzle/vllm

Commit Graph

3ea7b94523 Move linting to pre-commit (#11975) Harry Mellor 2025-01-20 06:58:01 +00:00
51ef828f10 [torch.compile] fix sym_tensor_indices (#12191) youkaichao 2025-01-20 11:37:50 +08:00
df450aa567 [Bugfix] Fix num_heads value for simple connector when tp enabled (#12074) shangmingc 2025-01-20 10:56:43 +08:00
bbe5f9de7d [Model] Support for fairseq2 Llama (#11442) Martin Gleize 2025-01-19 19:40:40 +01:00
81763c58a0 [V1] Add V1 support of Qwen2-VL (#12128) Roger Wang 2025-01-19 03:52:13 -08:00
edaae198e7 [Misc] Add BNB support to GLM4-V model (#12184) Isotr0py 2025-01-19 19:49:22 +08:00
936db119ed benchmark_serving support --served-model-name param (#12109) gujing 2025-01-19 17:59:56 +08:00
e66faf4809 [torch.compile] store inductor compiled Python file (#12182) youkaichao 2025-01-19 16:27:26 +08:00
630eb5b5ce [Bugfix] Fix multi-modal processors for transformers 4.48 (#12187) Cyrus Leung 2025-01-19 11:16:34 +08:00
4e94951bb1 [BUGFIX] Move scores to float32 in case of running xgrammar on cpu (#12152) Michal Adamczyk 2025-01-19 04:12:05 +01:00
7a8a48d51e [V1] Collect env var for usage stats (#12115) Simon Mo 2025-01-18 19:07:15 -08:00
32eb0da808 [Misc] Support register quantization method out-of-tree (#11969) yancong 2025-01-19 08:13:16 +08:00
6d0e3d3724 [core] clean up executor class hierarchy between v1 and v0 (#12171) youkaichao 2025-01-18 14:35:15 +08:00
02798ecabe [Model] Port deepseek-vl2 processor, remove dependency (#12169) Isotr0py 2025-01-18 13:59:39 +08:00
813f249f02 [Docs] Fix broken link in SECURITY.md (#12175) Russell Bryant 2025-01-17 23:35:21 -05:00
da02cb4b27 [core] further polish memory profiling (#12126) youkaichao 2025-01-18 12:25:08 +08:00
c09503ddd6 [AMD][CI/Build][Bugfix] use pytorch stale wheel (#12172) Hongxia Yang 2025-01-17 22:15:53 -05:00
2b83503227 [misc] fix cross-node TP (#12166) youkaichao 2025-01-18 10:53:27 +08:00
7b98a65ae6 [torch.compile] disable logging when cache is disabled (#12043) youkaichao 2025-01-18 04:29:31 +08:00
b5b57e301e [AMD][FP8] Using MI300 FP8 format on ROCm for block_quant (#12134) Gregory Shtrasberg 2025-01-17 12:12:26 -05:00
54cacf008f [Bugfix] Mistral tokenizer encode accept list of str (#12149) Kunshang Ji 2025-01-18 00:47:53 +08:00
58fd57ff1d [Bugfix] Fix score api for missing max_model_len validation (#12119) Wallas Henrique 2025-01-17 13:24:22 -03:00
87a0c076af [core] allow callable in collective_rpc (#12151) youkaichao 2025-01-17 20:47:01 +08:00
d4e6194570 [CI/Build][CPU][Bugfix] Fix CPU CI (#12150) Li, Jiang 2025-01-17 19:39:52 +08:00
07934cc237 [Misc][LoRA] Improve the readability of LoRA error messages (#12102) Jee Jee Li 2025-01-17 19:32:28 +08:00
69d765f5a5 [V1] Move more control of kv cache initialization from model_executor to EngineCore (#11960) Chen Zhang 2025-01-17 15:39:35 +08:00
8027a72461 [ROCm][MoE] moe tuning support for rocm (#12049) Divakar Verma 2025-01-17 00:49:16 -06:00
d75ab55f10 [Misc] Add deepseek_vl2 chat template (#12143) Isotr0py 2025-01-17 14:34:48 +08:00
d1adb9b403 [BugFix] add more is not None check in VllmConfig.__post_init__ (#12138) Chen Zhang 2025-01-17 13:33:22 +08:00
b8bfa46a18 [Bugfix] Fix issues in CPU build Dockerfile (#12135) Yuan Tang 2025-01-16 23:54:01 -05:00
1475847a14 [Doc] Add instructions on using Podman when SELinux is active (#12136) Yuan Tang 2025-01-16 23:45:36 -05:00
fead53ba78 [CI]add genai-perf benchmark in nightly benchmark (#10704) Kunshang Ji 2025-01-17 12:15:09 +08:00
ebc73f2828 [Bugfix] Fix a path bug in disaggregated prefill example script. (#12121) Kuntai Du 2025-01-17 11:12:41 +08:00
d06e824006 [Bugfix] Set enforce_eager automatically for mllama (#12127) Chen Zhang 2025-01-17 04:30:08 +08:00
62b06ba23d [Model] Add support for deepseek-vl2-tiny model (#12068) Isotr0py 2025-01-17 01:14:48 +08:00
5fd24ec02e [misc] Add LoRA kernel micro benchmarks (#11579) Varun Sundar Rabindranath 2025-01-16 21:21:40 +05:30
874f7c292a [Bugfix] Fix max image feature size for Llava-one-vision (#12104) Roger Wang 2025-01-16 06:54:06 -08:00
92e793d91a [core] LLM.collective_rpc interface and RLHF example (#12084) youkaichao 2025-01-16 20:19:52 +08:00
bf53e0c70b Support torchrun and SPMD-style offline inference (#12071) youkaichao 2025-01-16 19:58:53 +08:00
dd7c9ad870 [Bugfix] Remove hardcoded head_size=256 for Deepseek v2 and v3 (#12067) Isotr0py 2025-01-16 18:11:54 +08:00
9aa1519f08 Various cosmetic/comment fixes (#12089) Michael Goin 2025-01-16 04:59:06 -05:00
f8ef146f03 [Doc] Add documentation for specifying model architecture (#12105) Cyrus Leung 2025-01-16 15:53:43 +08:00
fa0050db08 [Core] Default to using per_token quantization for fp8 when cutlass is supported. (#8651) Elfie Guo 2025-01-15 20:31:27 -08:00
cd9d06fb8d Allow hip sources to be directly included when compiling for rocm. (#12087) tvirolai-amd 2025-01-15 23:46:03 +02:00
ebd8c669ef [Bugfix] Fix _get_lora_device for HQQ marlin (#12090) Varun Sundar Rabindranath 2025-01-16 01:29:42 +05:30
70755e819e [V1][Core] Autotune encoder cache budget (#11895) Roger Wang 2025-01-15 11:29:00 -08:00
edce722eaa [Bugfix] use right truncation for non-generative tasks (#12050) Joe Runde 2025-01-15 09:31:01 -07:00
57e729e874 [Doc]: Update OpenAI-Compatible Server documents (#12082) maang-h 2025-01-16 00:07:45 +08:00
de0526f668 [Misc][Quark] Upstream Quark format to VLLM (#10765) kewang-xlnx 2025-01-16 00:05:15 +08:00
5ecf3e0aaf Misc: allow to use proxy in HTTPConnection (#12042) Yuan 2025-01-15 21:16:40 +08:00
97eb97b5a4 [Model]: Support internlm3 (#12037) RunningLeon 2025-01-15 19:35:17 +08:00
3adf0ffda8 [Platform] Do not raise error if _Backend is not found (#12023) wangxiyuan 2025-01-15 18:14:15 +08:00
ad388d25a8 Type-fix: make execute_model output type optional (#12020) Keyun Tong 2025-01-15 01:44:56 -08:00
cbe94391eb Fix: cases with empty sparsity config (#12057) Rahul Tuli 2025-01-15 04:41:24 -05:00
994fc655b7 [V1][Prefix Cache] Move the logic of num_computed_tokens into KVCacheManager (#12003) Chen Zhang 2025-01-15 15:55:30 +08:00
3f9b7ab9f5 [Doc] Update examples to remove SparseAutoModelForCausalLM (#12062) Kyle Sayers 2025-01-15 01:36:01 -05:00
ad34c0df0f [core] platform agnostic executor via collective_rpc (#11256) youkaichao 2025-01-15 13:45:21 +08:00
f218f9c24d [core] Turn off GPU communication overlap for Ray executor (#12051) Rui Qiao 2025-01-14 21:19:55 -08:00
0794e7446e [Misc] Add multipstep chunked-prefill support for FlashInfer (#10467) Elfie Guo 2025-01-14 20:47:49 -08:00
b7ee940a82 [V1][BugFix] Fix edge case in VLM scheduling (#12065) Woosuk Kwon 2025-01-14 20:21:28 -08:00
9ddac56311 [Platform] move current_memory_usage() into platform (#11369) Shanshan Shen 2025-01-15 11:38:25 +08:00
1a51b9f872 [HPU][Bugfix] Don't use /dev/accel/accel0 for HPU autodetection in setup.py (#12046) Konrad Zawora 2025-01-15 03:59:18 +01:00
42f5e7c52a [Kernel] Support MulAndSilu (#11624) Jee Jee Li 2025-01-15 10:29:53 +08:00
a3a3ee4e6f [Misc] Merge bitsandbytes_stacked_params_mapping and packed_modules_mapping (#11924) Jee Jee Li 2025-01-15 07:49:49 +08:00
87054a57ab [Doc]: Update the Json Example of the Engine Arguments document (#12045) maang-h 2025-01-15 01:03:04 +08:00
c9d6ff530b Explain where the engine args go when using Docker (#12041) Harry Mellor 2025-01-14 16:05:50 +00:00
a2d2acb4c8 [Bugfix][Kernel] Give unique name to BlockSparseFlashAttention (#12040) Chen Zhang 2025-01-14 23:45:05 +08:00
2e0e017610 [Platform] Add output for Attention Backend (#11981) wangxiyuan 2025-01-14 21:27:04 +08:00
1f18adb245 [Kernel] Revert the API change of Attention.forward (#12038) Chen Zhang 2025-01-14 20:59:32 +08:00
bb354e6b2d [Bugfix] Fix various bugs in multi-modal processor (#12031) Cyrus Leung 2025-01-14 20:16:11 +08:00
ff39141a49 [HPU][misc] add comments for explanation (#12034) youkaichao 2025-01-14 19:24:06 +08:00
8a1f938e6f [Doc] Update Quantization Hardware Support Documentation (#12025) TJian 2025-01-14 12:37:52 +08:00
078da31903 [HPU][Bugfix] set_forward_context and CI test execution (#12014) Konrad Zawora 2025-01-14 04:04:18 +01:00
1a401252b5 [Docs] Add Sky Computing Lab to project intro (#12019) Woosuk Kwon 2025-01-13 17:24:36 -08:00
f35ec461fc [Bugfix] Fix deepseekv3 gate bias error (#12002) Steve Luo 2025-01-14 04:43:51 +08:00
289b5191d5 [Doc] Fix build from source and installation link in README.md (#12013) Yikun Jiang 2025-01-14 01:23:59 +08:00
c6db21313c bugfix: Fix signature mismatch in benchmark's get_tokenizer function (#11982) elijah 2025-01-13 23:22:07 +08:00
a7d59688fb [Platform] Move get_punica_wrapper() function to Platform (#11516) Shanshan Shen 2025-01-13 21:12:10 +08:00
458e63a2c6 [platform] add device_control env var (#12009) youkaichao 2025-01-13 20:59:09 +08:00
e8c23ff989 [Doc] Organise installation documentation into categories and tabs (#11935) Harry Mellor 2025-01-13 12:27:36 +00:00
cd8249903f [Doc][V1] Update model implementation guide for V1 support (#11998) Roger Wang 2025-01-13 03:58:54 -08:00
0f8cafe2d1 [Kernel] unified_attention for Attention.forward (#11967) Chen Zhang 2025-01-13 19:28:53 +08:00
5340a30d01 Fix Max Token ID for Qwen-VL-Chat (#11980) Alex Brooks 2025-01-13 01:37:48 -07:00
89ce62a316 [platform] add ray_device_key (#11948) youkaichao 2025-01-13 16:20:52 +08:00
c3f05b09a0 [Misc]Minor Changes about Worker (#11555) Chenguang Li 2025-01-13 15:47:05 +08:00
cf6bbcb493 [Misc] Fix Deepseek V2 fp8 kv-scale remapping (#11947) Concurrensee 2025-01-13 01:05:06 -06:00
80ea3af1a0 [CI][Spec Decode] fix: broken test for EAGLE model (#11972) Sungjae Lee 2025-01-13 15:50:35 +09:00
9dd02d85ca [Bug] Fix usage of .transpose() and .view() consecutively. (#11979) Siyuan Li 2025-01-13 14:24:10 +08:00
f7b3ba82c3 [MISC] fix typo in kv transfer send recv test (#11983) Yangcheng Li 2025-01-13 13:07:48 +08:00
619ae268c3 [V1] [2/n] Logging and Metrics - OutputProcessor Abstraction (#11973) Robert Shaw 2025-01-12 23:54:10 -05:00
d14e98d924 [Model] Support GGUF models newly added in transformers 4.46.0 (#9685) Isotr0py 2025-01-13 08:13:44 +08:00
9597a095f2 [V1][Core][1/n] Logging and Metrics (#11962) Robert Shaw 2025-01-12 16:02:02 -05:00
263a870ee1 [Hardware][TPU] workaround fix for MoE on TPU (#11764) Avshalom Manevich 2025-01-12 17:53:51 +02:00
8bddb73512 [Hardware][CPU] Multi-LoRA implementation for the CPU backend (#11100) Akshat Tripathi 2025-01-12 13:01:52 +00:00
f967e51f38 [Model] Initialize support for Deepseek-VL2 models (#11578) Isotr0py 2025-01-12 16:17:24 +08:00
43f3d9e699 [CI/Build] Add markdown linter (#11857) Rafael Vasquez 2025-01-12 03:17:13 -05:00
b25cfab9a0 [V1] Avoid sending text prompt to core engine (#11963) Roger Wang 2025-01-11 22:36:38 -08:00
4b657d3292 [Model] Add cogagent model support vLLM (#11742) sixgod 2025-01-12 03:05:56 +08:00
d697dc01b4 [Bugfix] Fix RobertaModel loading (#11940) Nicolò Lucchesi 2025-01-11 15:05:09 +01:00
a991f7d508 [Doc] Basic guide for writing unit tests for new models (#11951) Cyrus Leung 2025-01-11 21:27:24 +08:00
