biondizzle/vllm

Commit Graph

1b49148e47 [Installation] Allow lower versions of FastAPI to maintain Ray 2.9 compatibility (#8764) Cyrus Leung 2024-09-27 07:54:09 +08:00
4b377d6feb [BugFix] Fix test breakages from transformers 4.45 upgrade (#8829) Nick Hill 2024-09-27 00:46:43 +01:00
71d21c73ab [Bugfix] Fixup advance_step.cu warning (#8815) Tyler Michael Smith 2024-09-26 19:23:45 -04:00
ee2da3e9ef fix validation: Only set tool_choice auto if at least one tool is provided (#8568) Chirag Jain 2024-09-27 04:53:17 +05:30
e2f6f26e86 [Bugfix] Fix print_warning_once's line info (#8867) Tyler Michael Smith 2024-09-26 19:18:26 -04:00
b28d2104de [Misc] Change dummy profiling and BOS fallback warns to log once (#8820) Michael Goin 2024-09-26 19:18:14 -04:00
93d364da34 [Bugfix] Include encoder prompts len to non-stream api usage response (#8861) Pernekhan Utemuratov 2024-09-26 15:47:00 -07:00
d9cfbc891e [ci] Soft fail Entrypoints, Samplers, LoRA, Decoder-only VLM (#8872) Kevin H. Luu 2024-09-26 15:02:16 -07:00
70de39f6b4 [misc][installation] build from source without compilation (#8818) youkaichao 2024-09-26 13:19:04 -07:00
68988d4e0d [CI/Build] Fix missing ci dependencies (#8834) fyuan1316 2024-09-27 02:04:39 +08:00
520db4dbc1 [Docs] Add README to the build docker image (#8825) Michael Goin 2024-09-26 14:02:52 -04:00
f70bccac75 [Build/CI] Upgrade to gcc 10 in the base build Docker image (#8814) Tyler Michael Smith 2024-09-26 13:07:18 -04:00
4bb98f2190 [Misc] Update config loading for Qwen2-VL and remove Granite (#8837) Roger Wang 2024-09-26 07:45:30 -07:00
7193774b1f [Misc] Support quantization of MllamaForCausalLM (#8822) (tag: v0.6.2) Michael Goin 2024-09-25 17:46:22 -04:00
e2c6e0a829 [Doc] Update doc for Transformers 4.45 (#8817) Roger Wang 2024-09-25 13:29:48 -07:00
770ec6024f [Model] Add support for the multi-modal Llama 3.2 model (#8811) Chen Zhang 2024-09-25 13:29:32 -07:00
4f1ba0844b Revert "rename PromptInputs and inputs with backward compatibility (#8760) (#8810) Simon Mo 2024-09-25 10:36:26 -07:00
873edda6cf [Misc] Support FP8 MoE for compressed-tensors (#8588) Michael Goin 2024-09-25 12:43:36 -04:00
64840dfae4 [Frontend] MQLLMEngine supports profiling. (#8761) 科英 2024-09-26 00:37:41 +08:00
28e1299e60 rename PromptInputs and inputs with backward compatibility (#8760) Cyrus Leung 2024-09-26 00:36:47 +08:00
0c4d2ad5e6 [VLM][Bugfix] internvl with num_scheduler_steps > 1 (#8614) DefTruth 2024-09-26 00:35:53 +08:00
c6f2485c82 [[Misc]] Add extra deps for openai server image (#8792) Jee Jee Li 2024-09-26 00:35:23 +08:00
300da09177 [Kernel] Fullgraph and opcheck tests (#8479) bnellnm 2024-09-25 10:35:52 -04:00
1c046447a6 [CI/Build][Bugfix][Doc][ROCm] CI fix and doc update after ROCm 6.2 upgrade (#8777) Hongxia Yang 2024-09-25 10:26:37 -04:00
8fae5ed7f6 [Misc] Fix minor typo in scheduler (#8765) Woo-Yeon Lee 2024-09-25 16:53:03 +09:00
3368c3ab36 [Bugfix] Ray 2.9.x doesn't expose available_resources_per_node (#8767) David Newman 2024-09-25 17:52:26 +10:00
1ac3de09cd [Frontend] OpenAI server: propagate usage accounting to FastAPI middleware layer (#8672) Adam Tilghman 2024-09-25 00:49:26 -07:00
3e073e66f1 [Bugfix] load fc bias from config for eagle (#8790) sohamparikh 2024-09-25 02:16:30 -04:00
c23953675f [Hardware][CPU] Enable mrope and support Qwen2-VL on CPU backend (#8770) Isotr0py 2024-09-25 14:16:11 +08:00
e3dd0692fa [BugFix] Propagate 'trust_remote_code' setting in internvl and minicpmv (#8250) zifeitong 2024-09-24 22:53:43 -07:00
fc3afc20df Fix tests in test_chunked_prefill_scheduler which fail with BlockManager V2 (#8752) sroy745 2024-09-24 21:26:36 -07:00
b4522474a3 [Bugfix][Kernel] Implement acquire/release polyfill for Pascal (#8776) sasha0552 2024-09-25 04:26:33 +00:00
ee777d9c30 Fix test_schedule_swapped_simple in test_scheduler.py (#8780) sroy745 2024-09-24 21:26:18 -07:00
6e0c9d6bd0 [Bugfix] Use heartbeats instead of health checks (#8583) Joe Runde 2024-09-24 21:37:38 -06:00
6da1ab6b41 [Core] Adding Priority Scheduling (#5958) Archit Patke 2024-09-24 21:50:50 -05:00
01b6f9e1f0 [Core][Bugfix] Support prompt_logprobs returned with speculative decoding (#8047) Travis Johnson 2024-09-24 18:29:56 -06:00
13f9f7a3d0 [[Misc]Upgrade bitsandbytes to the latest version 0.44.0 (#8768) Jee Jee Li 2024-09-25 08:08:55 +08:00
1e7d5c01f5 [misc] soft drop beam search (#8763) youkaichao 2024-09-24 15:48:39 -07:00
2467b642dd [CI/Build] fix setuptools-scm usage (#8771) Daniele 2024-09-24 21:38:12 +02:00
72fc97a0f1 [Bugfix] Fix torch dynamo fixes caused by replace_parameters (#8748) Lucas Wilkinson 2024-09-24 14:33:21 -04:00
2529d09b5a [Frontend] Batch inference for llm.chat() API (#8648) Andy 2024-09-24 12:44:11 -04:00
a928ded995 [Kernel] Split Marlin MoE kernels into multiple files (#8661) ElizaWszola 2024-09-24 18:31:42 +02:00
cc4325b66a [Bugfix] Fix potentially unsafe custom allreduce synchronization (#8558) Hanzhi Zhou 2024-09-24 01:08:14 -07:00
8ff7ced996 [Model] Expose Phi3v num_crops as a mm_processor_kwarg (#8658) Alex Brooks 2024-09-24 01:36:46 -06:00
3f06bae907 [Core][Model] Support loading weights by ID within models (#7931) Peter Salas 2024-09-24 00:14:15 -07:00
b8747e8a7c [MISC] Skip dumping inputs when unpicklable (#8744) Cody Yu 2024-09-23 23:10:03 -07:00
3185fb0cca Revert "[Core] Rename PromptInputs to PromptType, and inputs to prompt" (#8750) Simon Mo 2024-09-23 22:45:20 -07:00
0250dd68c5 re-implement beam search on top of vllm core (#8726) youkaichao 2024-09-23 22:08:12 -07:00
88577ac928 Fix tests in test_scheduler.py that fail with BlockManager V2 (#8728) sroy745 2024-09-23 21:43:13 -07:00
530821d00c [Hardware][AMD] ROCm6.2 upgrade (#8674) Hongxia Yang 2024-09-23 21:52:39 -04:00
1a2aef3e59 Add output streaming support to multi-step + async while ensuring RequestOutput obj reuse (#8335) Alexander Matveev 2024-09-23 18:38:04 -04:00
5f7bb58427 Fix typical acceptance sampler with correct recovered token ids (#8562) jiqing-feng 2024-09-24 03:32:27 +08:00
b05f5c9238 [Core] Allow IPv6 in VLLM_HOST_IP with zmq (#8575) Russell Bryant 2024-09-23 15:15:41 -04:00
9b0e3ec970 [Kernel][LoRA] Add assertion for punica sgmv kernels (#7585) Jee Jee Li 2024-09-24 02:57:42 +08:00
86e9c8df29 [Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701) Lucas Wilkinson 2024-09-23 13:46:26 -04:00
ee5f34b1c2 [CI/Build] use setuptools-scm to set __version__ (#4738) Daniele 2024-09-23 18:44:26 +02:00
f2bd246c17 [VLM] Fix paligemma, fuyu and persimmon with transformers 4.45 : use config.text_config.vocab_size (#8707) Jani Monoses 2024-09-23 17:43:09 +03:00
a79e522984 [Model] Support pp for qwen2-vl (#8696) Yanyi Liu 2024-09-23 21:46:59 +08:00
3e83c12b5c [Bugfix][CPU] fix missing input intermediate_tensors in the cpu_model_runner (#8733) Li, Jiang 2024-09-23 21:15:16 +08:00
e551ca1555 [Hardware][CPU] Refactor CPU model runner (#8729) Isotr0py 2024-09-23 20:12:20 +08:00
9b8c8ba119 [Core][Frontend] Support Passing Multimodal Processor Kwargs (#8657) Alex Brooks 2024-09-23 01:44:48 -06:00
d23679eb99 [Bugfix] fix docker build for xpu (#8652) Yan Ma 2024-09-23 13:54:18 +08:00
57a0702e63 [Bugfix] Fix CPU CMake build (#8723) Luka Govedič 2024-09-22 23:40:46 -04:00
3dda7c2250 [Bugfix] Avoid some bogus messages RE CUTLASS's revision when building (#8702) Tyler Michael Smith 2024-09-22 22:24:59 -04:00
92ba7e7477 [misc] upgrade mistral-common (#8715) youkaichao 2024-09-22 15:41:59 -07:00
d4a2ac8302 [build] enable existing pytorch (for GH200, aarch64, nightly) (#8713) youkaichao 2024-09-22 12:47:54 -07:00
c6bd70d772 [SpecDec][Misc] Cleanup, remove bonus token logic. (#8701) Lily Liu 2024-09-22 12:34:14 -07:00
5b59532760 [Model][VLM] Add LLaVA-Onevision model support (#8486) litianjian 2024-09-23 01:51:44 +08:00
ca2b628b3c [MISC] rename CudaMemoryProfiler to DeviceMemoryProfiler (#8703) Huazhong Ji 2024-09-23 01:44:09 +08:00
8ca5051b9a [Misc] Use NamedTuple in Multi-image example (#8705) Alex Brooks 2024-09-22 06:56:20 -06:00
06ed2815e2 [Model] Refactor BLIP/BLIP-2 to support composite model loading (#8407) Cyrus Leung 2024-09-22 20:24:21 +08:00
0e40ac9b7b [ci][build] fix vllm-flash-attn (#8699) youkaichao 2024-09-21 23:24:58 -07:00
13d88d4137 [Bugfix] Refactor composite weight loading logic (#8656) Isotr0py 2024-09-22 12:33:27 +08:00
d66ac62854 [Kernel][Bugfix] Delete some more useless code in marlin_moe_ops.cu (#8643) Tyler Michael Smith 2024-09-21 19:45:02 -04:00
9dc7c6c7f3 [dbrx] refactor dbrx experts to extend FusedMoe class (#8518) Divakar Verma 2024-09-21 16:09:39 -05:00
ec4aaad812 [Kernel][Triton][AMD] Remove tl.atomic_add from awq_gemm_kernel, 2-5x speedup MI300, minor improvement for MI250 (#8646) rasmith 2024-09-21 04:20:54 -05:00
4dfdf43196 [Doc] Fix typo in AMD installation guide (#8689) Andy Dai 2024-09-21 00:24:12 -07:00
5e85f4f82a [VLM] Use SequenceData.from_token_counts to create dummy data (#8687) Cyrus Leung 2024-09-21 14:28:56 +08:00
71c60491f2 [Kernel] Build flash-attn from source (#8245) Luka Govedič 2024-09-21 02:27:10 -04:00
0faab90eb0 [beam search] add output for manually checking the correctness (#8684) youkaichao 2024-09-20 19:55:33 -07:00
0455c46ed4 [Core] Factor out common code in SequenceData and Sequence (#8675) Cyrus Leung 2024-09-21 10:30:39 +08:00
d4bf085ad0 [MISC] add support custom_op check (#8557) Kunshang Ji 2024-09-21 10:03:55 +08:00
0057894ef7 [Core] Rename PromptInputs and inputs(#8673) Cyrus Leung 2024-09-21 10:00:54 +08:00
0f961b3ce9 [Bugfix] Fix incorrect llava next feature size calculation (#8496) zyddnys 2024-09-20 18:48:32 -04:00
7f9c8902e3 [Hardware][AWS] update neuron to 2.20 (#8676) omrishiv 2024-09-20 15:19:44 -07:00
7c8566aa4f [Doc] neuron documentation update (#8671) omrishiv 2024-09-20 15:04:37 -07:00
b4e4eda92e [Bugfix][Core] Fix tekken edge case for mistral tokenizer (#8640) Patrick von Platen 2024-09-20 23:33:03 +02:00
2874bac618 [Bugfix] Config got an unexpected keyword argument 'engine' (#8556) Pastel！ 2024-09-21 05:00:45 +08:00
035fa895ec [Misc] Show AMD GPU topology in collect_env.py (#8649) Cyrus Leung 2024-09-21 04:52:19 +08:00
b28298f2f4 [Bugfix] Validate SamplingParam n is an int (#8548) saumya-saran 2024-09-20 12:46:02 -07:00
2940afa04e [CI/Build] Removing entrypoints/openai/test_embedding.py test from ROCm build (#8670) Alexey Kondratiev(AMD) 2024-09-20 13:27:44 -04:00
3b63de9353 [Model] Add OLMoE (#7922) Niklas Muennighoff 2024-09-20 09:31:41 -07:00
260d40b5ea [Core] Support Lora lineage and base model metadata management (#6315) Jiaxin Shan 2024-09-19 23:20:56 -07:00
9e5ec35b1f [bugfix] [AMD] add multi-step advance_step to ROCmFlashAttentionMetadata (#8474) William Lin 2024-09-19 20:49:54 -07:00
18ae428a0d [Bugfix] Fix Phi3.5 mini and MoE LoRA inference (#8571) Amit Garg 2024-09-19 17:54:02 -07:00
de6f90a13d [Misc] guard against change in cuda library name (#8609) bnellnm 2024-09-19 18:36:30 -04:00
6cb748e190 [CI/Build] Re-enabling Entrypoints tests on ROCm, excluding ones that fail (#8551) Alexey Kondratiev(AMD) 2024-09-19 16:06:32 -04:00
9e99407e3c Create SECURITY.md (#8642) Simon Mo 2024-09-19 12:16:28 -07:00
ea4647b7d7 [Doc] Add documentation for GGUF quantization (#8618) Isotr0py 2024-09-20 03:15:55 +08:00
e42c634acb [Core] simplify logits resort in _apply_top_k_top_p (#8619) 盏一 2024-09-20 02:28:25 +08:00
