Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

772a66732d [platforms] restore xpu check for parallel config (#10479) youkaichao 2024-11-20 09:13:28 -08:00
63f1fde277 [Hardware][CPU] Support chunked-prefill and prefix-caching on CPU (#10355) Li, Jiang 2024-11-20 18:57:39 +08:00
d5b28447e0 [Platforms] Refactor xpu code (#10468) Mengqing Cao 2024-11-20 14:52:13 +08:00
09dbf9ff16 [Bugfix] Handle conflicts between modern and legacy fields (#10471) Cyrus Leung 2024-11-20 14:45:08 +08:00
343041c4c4 [model] Reduce medusa weight (#10454) Sky Lee 2024-11-20 14:05:55 +08:00
ed701ca963 [ci/build] Combine nightly and optional (#10465) Kevin H. Luu 2024-11-19 19:36:03 -10:00
7629a9c6e5 [CI/Build] Support compilation with local cutlass path (#10423) (#10424) wchen61 2024-11-20 13:35:50 +08:00
709c9f1f25 [CI/Build] Add sphinx/rst linter for docs (#10366) Rafael Vasquez 2024-11-20 00:35:31 -05:00
b4be5a8adb [Bugfix] Enforce no chunked prefill for embedding models (#10470) Cyrus Leung 2024-11-20 13:12:51 +08:00
ad44437ba3 [Bugfix] Fix Mamba model initialization and MLP Speculator weights loading (#10456) Isotr0py 2024-11-20 13:04:05 +08:00
9e05252b46 [Misc] Add __setitem__ for LazyDict (#10469) Yanyi Liu 2024-11-20 12:44:57 +08:00
d200972e7f [Bugfix] Marlin 2:4 temp fix for large M dim (>256) (#10464) Lucas Wilkinson 2024-11-19 22:40:33 -05:00
d5b68aba2f [CI/Build] Update Dockerfile.rocm (#10434) Alexei-V-Ivanov-AMD 2024-11-19 19:19:59 -06:00
a324d3a1a7 Change granite chat template to keep json list formatting for tool calls (#10452) Maximilien de Bayser 2024-11-19 22:16:54 -03:00
b00b33d77e [Model][Quantization] HQQ support through Marlin kernel expansion (#9766) ElizaWszola 2024-11-19 22:31:12 +01:00
efa9084628 [Core] Avoid metrics log noise when idle (#8868) Russell Bryant 2024-11-19 16:05:25 -05:00
803f37eaaa [6/N] torch.compile rollout to users (#10437) youkaichao 2024-11-19 10:09:03 -08:00
fd9f124971 [Doc] fix link for page that was renamed (#10455) Russell Bryant 2024-11-19 12:48:30 -05:00
1ea291a417 Fix: Build error seen on Power Architecture (#10421) Manjul Mohan 2024-11-19 23:04:57 +05:30
11fd7ea639 [Pixtral-Large] Pixtral actually has no bias in vision-lang adapter (#10449) Patrick von Platen 2024-11-19 18:33:06 +01:00
f028dff33d [BugFix] Fix hermes tool parser output error stream arguments in some cases (#10395) (#10398) COSMOPlat 2024-11-19 21:42:50 +08:00
b4614656b8 [CI][CPU] adding numa node number as container name suffix (#10441) Yuan 2024-11-19 21:16:43 +08:00
25f9c78961 [misc][plugin] improve plugin loading (#10443) youkaichao 2024-11-19 02:43:21 -08:00
5390d6664f [Doc] Add the start of an arch overview page (#10368) Russell Bryant 2024-11-19 04:52:11 -05:00
382b6a4852 [Misc] Avoid misleading warning messages (#10438) Jee Jee Li 2024-11-19 16:54:58 +08:00
272e31c0bd [Bugfix] Guard for negative counter metrics to prevent crash (#10430) Travis Johnson 2024-11-18 21:57:10 -07:00
74f8c2cf5f Add openai.beta.chat.completions.parse example to structured_outputs.rst (#10433) Michael Goin 2024-11-18 23:37:46 -05:00
8c1fb50705 [Platform][Refactor] Extract func get_default_attn_backend to Platform (#10358) Mengqing Cao 2024-11-19 11:22:26 +08:00
7eb719df13 [Bugfix]Fix Phi-3 BNB online quantization (#10417) Jee Jee Li 2024-11-19 11:21:42 +08:00
284203f171 [ci/build] Have dependabot ignore all patch update (#10436) Kevin H. Luu 2024-11-18 15:04:25 -10:00
90a6c759ca [misc] partial prefix & random input generation benchmark (#9929) Ricky Xu 2024-11-18 15:39:14 -08:00
2298e69b5f [ci][bugfix] fix kernel tests (#10431) youkaichao 2024-11-18 15:29:37 -08:00
a03ea40792 [3/N][torch.compile] consolidate custom op logging (#10399) youkaichao 2024-11-18 15:14:59 -08:00
96d999fbe8 [Kernel] Initial Machete W4A8 support + Refactors (#9855) Lucas Wilkinson 2024-11-18 14:59:29 -05:00
c2170a5b39 [Kernel] Explicitly specify other value in tl.load calls (#9014) Angus Wang 2024-11-18 11:39:40 -08:00
6b2d25efc7 [Hardware][XPU] AWQ/GPTQ support for xpu backend (#10107) Yan Ma 2024-11-19 02:18:05 +08:00
281cc4b3cd [Model][Bugfix] Support TP for PixtralHF ViT (#10405) Michael Goin 2024-11-18 13:04:14 -05:00
4f686d139f Fix open_collective value in FUNDING.yml (#10426) Andrew Nesbitt 2024-11-18 17:52:42 +00:00
31894a2155 [Doc] Add documentation for Structured Outputs (#9943) ismael-dm 2024-11-18 18:52:12 +01:00
7851b45196 [5/N][torch.compile] torch.jit.script --> torch.compile (#10406) youkaichao 2024-11-18 07:20:06 -08:00
4186be8111 [Doc] Update doc for LoRA support in GLM-4V (#10425) B-201 2024-11-18 23:08:30 +08:00
e7ebb662d7 [Model] Remove transformers attention porting in VITs (#10414) Isotr0py 2024-11-18 21:45:21 +08:00
5be4e52b65 [Model][LoRA]LoRA support added for glm-4v (#10418) B-201 2024-11-18 20:57:10 +08:00
01aae1cc68 [Model] Remove redundant softmax when using PoolingType.STEP (#10415) Maybewuss 2024-11-18 18:05:36 +08:00
c7dec926f6 [VLM] Report multi_modal_placeholders in output (#10407) lkchen 2024-11-18 00:06:16 -08:00
51bb12d17b [4/N][torch.compile] clean up set_torch_compile_backend (#10401) youkaichao 2024-11-17 23:57:20 -08:00
47826cacf0 [Bugfix] Ignore ray reinit error when current platform is ROCm or XPU (#10375) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2024-11-18 05:29:26 +02:00
c4e464333e [Misc] Add uninitialized params tracking for AutoWeightsLoader (#10327) Isotr0py 2024-11-18 09:07:46 +08:00
d1557e66d3 [Misc] Enhance offline_inference to support user-configurable paramet… (#10392) wchen61 2024-11-17 19:32:40 +08:00
80d85c5d7b [Bugfix] Fix mrope_position_delta in non-last prefill chunk (#10403) 电脑星人 2024-11-17 16:50:24 +08:00
76aab90ab6 [Hardware] [HPU]add mark_step for hpu (#10239) Kunshang Ji 2024-11-17 16:44:44 +08:00
8d74b5aee9 [platforms] refactor cpu code (#10402) youkaichao 2024-11-16 23:14:23 -08:00
cf349c4a97 [Bugfix][CPU] Fix CPU embedding runner with tensor parallel (#10394) Isotr0py 2024-11-17 15:12:04 +08:00
905d0f0af4 [CI/Build] Fix IDC hpu [Device not found] issue (#10384) Chendi.Xue 2024-11-17 00:58:22 -06:00
643ecf7b11 [V1] Refactor model executable interface for all text-only language models (#10374) Roger Wang 2024-11-16 21:18:46 -08:00
4fd9375028 [2/N][torch.compile] make compilation cfg part of vllm cfg (#10383) youkaichao 2024-11-16 18:02:14 -08:00
661a34fd4f [V1] Add code owners for V1 (#10397) Woosuk Kwon 2024-11-16 10:45:26 -08:00
361c29e174 [Bugfix] Fix M-RoPE position calculation when chunked prefill is enabled (#10388) 电脑星人 2024-11-17 02:10:00 +08:00
b98d89efd4 [Misc] Medusa supports custom bias (#10361) Sky Lee 2024-11-17 00:33:01 +08:00
8b6725b0cf [Misc] Update benchmark to support image_url file or http (#10287) Jaehyun An 2024-11-16 19:15:40 +09:00
1d75472626 [BugFix] [Kernel] Fix GPU SEGV occuring in fused_moe kernel (#10385) rasmith 2024-11-16 03:55:05 -06:00
2f427c2d16 [misc][plugin] improve log messages (#10386) youkaichao 2024-11-16 01:23:20 -08:00
755b85359b [doc] add doc for the plugin system (#10372) youkaichao 2024-11-15 21:46:27 -08:00
32e46e000f [Frontend] Automatic detection of chat content format from AST (#9919) Cyrus Leung 2024-11-16 13:35:40 +08:00
4f168f69a3 [Docs] Misc updates to TPU installation instructions (#10165) Michael Green 2024-11-15 21:26:17 +00:00
3e8d14d8a1 [Doc] Move PR template content to docs (#10159) Russell Bryant 2024-11-15 16:20:20 -05:00
a067f85e08 [Frontend] Add --version flag to CLI (#10369) Russell Bryant 2024-11-15 16:13:53 -05:00
c76ac49d26 [Docs] Add Nebius as sponsors (#10371) Simon Mo 2024-11-15 12:47:40 -08:00
a6221a144a [Misc] bump mistral common version (#10367) (tag: v0.6.4.post1) Simon Mo 2024-11-15 09:48:07 -08:00
79ee45b428 [Misc] Bump up test_fused_moe tolerance (#10364) ElizaWszola 2024-11-15 17:31:18 +01:00
691a3ec047 [Bugfix] Ensure special tokens are properly filtered out for guided structured output with MistralTokenizer (#10363) Guillaume Calmettes 2024-11-15 15:50:40 +01:00
3a763ba0c3 [core][misc] keep compatibility for old-style classes (#10356) youkaichao 2024-11-15 05:55:51 -08:00
f2056f726d [Misc] Fix some help info of arg_utils to improve readability (#10362) shangmingc 2024-11-15 20:40:30 +08:00
1d65ec7eeb [Bugfix] Fix fully sharded LoRA bug (#10352) Jee Jee Li 2024-11-15 18:34:58 +08:00
26908554b2 [Doc] Remove float32 choice from --lora-dtype (#10348) Xin Yang 2024-11-15 02:22:57 -08:00
b311efd0bd [Misc] Fix import error in tensorizer tests and cleanup some code (#10349) Cyrus Leung 2024-11-15 17:34:17 +08:00
3d158cdc8d Add default value to avoid Falcon crash (#5363) (#10347) wchen61 2024-11-15 16:52:20 +08:00
02dbf30e9a [Build] skip renaming files for release wheels pipeline (#9671) (tag: v0.6.4) Simon Mo 2024-11-14 23:31:52 -08:00
2ac6d0e75b [Misc] Consolidate pooler config overrides (#10351) Cyrus Leung 2024-11-15 14:59:00 +08:00
2ec8827288 [Bugfix] Qwen-vl output is inconsistent in speculative decoding (#10350) Sky Lee 2024-11-15 13:40:10 +08:00
b40cf6402e [Model] Support Qwen2 embeddings and use tags to select model tests (#10184) Cyrus Leung 2024-11-15 12:23:09 +08:00
2885ba0e24 [Misc] Change RedundantReshapesPass and FusionPass logging from info to debug (#10308) Tyler Michael Smith 2024-11-14 21:44:26 -05:00
bf2ddc6610 [bugfix] Fix static asymmetric quantization case (#10334) Luka Govedič 2024-11-14 20:35:11 -05:00
972112d82f [Bugfix] Fix unable to load some models (#10312) Cyrus Leung 2024-11-15 08:55:54 +08:00
11cd1ae6ad [Tool parsing] Improve / correct mistral tool parsing (#10333) Patrick von Platen 2024-11-15 01:42:49 +01:00
554af9228d [Bugfix] use AF_INET6 for OpenAI Compatible Server with ipv6 (#9583) Zijin Xiao 2024-11-15 08:38:53 +08:00
b2e0ad3b59 [Perf] Reduce peak memory usage of llama (#10339) Murali Andoorveedu 2024-11-14 16:38:20 -08:00
4a18fd14ba Support Roberta embedding models (#9387) Maximilien de Bayser 2024-11-14 18:23:29 -03:00
1dbae0329c [Docs] Publish meetup slides (#10331) Woosuk Kwon 2024-11-14 08:19:38 -08:00
675d603400 [CI/Build] Make shellcheck happy (#10285) Cyrus Leung 2024-11-14 17:47:53 +08:00
03025c023f [CI/Build] Fix CPU CI online inference timeout (#10314) Isotr0py 2024-11-14 16:45:32 +08:00
29f3ef26a3 [ci][distributed] disable hanging tests (#10317) youkaichao 2024-11-14 00:23:39 -08:00
294bf467ba [Model] Add BNB quantization support for Idefics3 (#10310) B-201 2024-11-14 14:31:44 +08:00
52b48c1ead [BugFix]: properly deserialize tool_calls iterator before processing by mistral-common when MistralTokenizer is used (#9951) Guillaume Calmettes 2024-11-14 05:48:16 +01:00
f67ce05d0b [Frontend] Pythonic tool parser (#9859) Mike Depinet 2024-11-13 20:14:34 -08:00
e0853b6508 [Misc] format.sh: Simplify tool_version_check (#10305) Russell Bryant 2024-11-13 22:12:35 -05:00
504ac53d18 [misc] error early for old-style class (#10304) youkaichao 2024-11-13 18:55:39 -08:00
15bb8330aa [Bugfix] Fix tensor parallel for qwen2 classification model (#10297) Isotr0py 2024-11-14 10:54:59 +08:00
ac49b59d8b [Bugfix] bitsandbytes models fail to run pipeline parallel (#10200) HoangCongDuc 2024-11-14 00:56:39 +08:00
0b8bb86bf1 [1/N] Initial prototype for multi-modal processor (#10044) Cyrus Leung 2024-11-13 20:39:03 +08:00
