Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

a62bc0109c [Misc] Add Gamma-Distribution Request Generation Support for Serving Benchmark. (#10105) Atlas 2024-11-07 19:20:30 +08:00
999df95b4e [Bugfix] Make image processor respect mm_processor_kwargs for Qwen2-VL (#10112) Jiahao Li 2024-11-07 18:50:44 +08:00
a6f332d0d9 [Hardware][CPU][bugfix] Fix half dtype support on AVX2-only target (#10108) Li, Jiang 2024-11-07 18:42:50 +08:00
0dfba97b42 [Frontend] Fix multiple values for keyword argument error (#10075) (#10076) Lei Yang 2024-11-07 17:07:19 +08:00
aa9078fa03 Adds method to read the pooling types from model's files (#9506) Flávia Béo 2024-11-07 05:42:40 -03:00
e036e527a0 [CI/Build] Improve mypy + python version matrix (#10041) Russell Bryant 2024-11-07 02:54:16 -05:00
6192e9b8fe [Core][Distributed] Refactor ipc buffer init in CustomAllreduce (#10030) Hanzhi Zhou 2024-11-06 23:50:47 -08:00
d7263a1bb8 Doc: Improve benchmark documentation (#9927) Rafael Vasquez 2024-11-07 02:50:35 -05:00
104d729656 [CI/Build] re-add codespell to CI (#10083) Russell Bryant 2024-11-07 01:54:46 -05:00
db7db4aab9 [Misc] Consolidate ModelConfig code related to HF config (#10104) Cyrus Leung 2024-11-07 14:00:21 +08:00
1fa020c539 [V1][BugFix] Fix Generator construction in greedy + seed case (#10097) Nick Hill 2024-11-07 05:06:57 +00:00
e7b84c394d [doc] add back Python 3.8 ABI (#10100) youkaichao 2024-11-06 21:06:41 -08:00
a4b3e0c1e9 [Hardware][CPU] Update torch 2.5 (#9911) Li, Jiang 2024-11-07 12:43:08 +08:00
29862b884b [Frontend] Adjust try/except blocks in API impl (#10056) Nick Hill 2024-11-07 04:07:51 +00:00
d3859f1891 [Misc][XPU] Upgrade to Pytorch 2.5 for xpu backend (#9823) Yan Ma 2024-11-07 09:29:03 +08:00
4ab3256644 [Bugfix] Fix FP8 torch._scaled_mm fallback for torch>2.5 with CUDA<12.4 (#10095) Michael Goin 2024-11-06 19:54:13 -05:00
719c1ca468 [core][distributed] add stateless_init_process_group (#10072) youkaichao 2024-11-06 16:42:09 -08:00
74f2f8a0f1 [CI/Build] Always run the ruff workflow (#10092) Russell Bryant 2024-11-06 17:25:23 -05:00
d58268c56a [V1] Make v1 more testable (#9888) Joe Runde 2024-11-06 12:57:35 -07:00
87bd7e0515 [CI/Build] change conflict PR comment from mergify (#10080) Russell Bryant 2024-11-06 13:15:42 -05:00
098f94de42 [CI/Build] Drop Python 3.8 support (#10038) Russell Bryant 2024-11-06 09:31:01 -05:00
399c798608 Remove ScaledActivation for AWQ (#10057) Michael Goin 2024-11-06 09:27:06 -05:00
406d4cc480 [Model][LoRA]LoRA support added for Qwen2VLForConditionalGeneration (#10022) Eric 2024-11-06 22:13:15 +08:00
a5bba7d234 [Model] Add Idefics3 support (#9767) Jee Jee Li 2024-11-06 19:41:17 +08:00
2003cc3513 [Model][LoRA]LoRA support added for LlamaEmbeddingModel (#10071) Jee Jee Li 2024-11-06 17:49:19 +08:00
6a585a23d2 [Hotfix] Fix ruff errors (#10073) Woosuk Kwon 2024-11-06 01:24:28 -08:00
a02a50e6e5 [Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend (#6143) Konrad Zawora 2024-11-06 10:09:10 +01:00
a5fda50a10 [CI/Build] Fix large_gpu_mark reason (#10070) Isotr0py 2024-11-06 16:50:37 +08:00
21063c11c7 [CI/Build] drop support for Python 3.8 EOL (#8464) Aaron Pham 2024-11-06 02:11:55 -05:00
4be3a45158 [distributed] add function to create ipc buffers directly (#10064) youkaichao 2024-11-05 22:35:03 -08:00
4089985552 [V1] Integrate Piecewise CUDA graphs (#10058) Woosuk Kwon 2024-11-05 22:16:04 -08:00
9d59b75593 [Bugfix] Remove CustomChatCompletionContentPartParam multimodal input type (#10054) zifeitong 2024-11-05 21:13:09 -08:00
ea928f608c [Bugfix] Gpt-j-6B patch kv_scale to k_scale path (#10063) arakowsk-amd 2024-11-05 21:10:40 -08:00
2bcbae704c [Bugfix] Fix edge-case crash when using chat with the Mistral Tekken Tokenizer (#10051) Travis Johnson 2024-11-05 21:28:29 -07:00
ffc0f2b47a [Model][OpenVINO] Fix regressions from #8346 (#10045) Peter Salas 2024-11-05 20:19:15 -08:00
82bfc38d07 [Misc] Sort the list of embedding models (#10037) Cyrus Leung 2024-11-06 12:05:05 +08:00
c4cacbaa7f [v1] reduce graph capture time for piecewise cudagraph (#10059) youkaichao 2024-11-05 18:19:50 -08:00
0c63c34f72 [Bugfix][SpecDecode] kv corruption with bonus tokens in spec decode (#9730) Sungjae Lee 2024-11-06 10:45:45 +09:00
966e31697b [Bugfix] Fix pickle of input when async output processing is on (#9931) Wallas Henrique 2024-11-05 21:39:26 -03:00
43300bd98a [Bugfix] Properly propagate trust_remote_code settings (#10047) zifeitong 2024-11-05 16:34:40 -08:00
ca9844b340 [bugfix] fix weak ref in piecewise cudagraph and tractable test (#10048) youkaichao 2024-11-05 14:49:20 -08:00
235366fe2e [CI] Prune back the number of tests in tests/kernels/* (#9932) Michael Goin 2024-11-05 16:02:32 -05:00
02462465ea [CI] Prune tests/models/decoder_only/language/* tests (#9940) Michael Goin 2024-11-05 16:02:23 -05:00
b9c64c0ca7 [Misc] Modify BNB parameter name (#9997) Jee Jee Li 2024-11-06 03:40:08 +08:00
d2e80332a7 [Feature] Update benchmark_throughput.py to support image input (#9851) lkchen 2024-11-05 11:30:02 -08:00
a53046b16f [Model] Support quantization of PixtralHFTransformer for PixtralHF (#9921) Michael Goin 2024-11-05 13:42:20 -05:00
731aec5be7 [CI/Build] Limit github CI jobs based on files changed (#9928) Russell Bryant 2024-11-05 13:30:42 -05:00
09d3550372 [Misc] Add logging for CUDA memory (#10027) Chenghao (Alan) Yang 2024-11-05 11:50:50 -06:00
cd34029e91 Refactor TPU requirements file and pin build dependencies (#10010) Richard Liu 2024-11-05 08:48:44 -08:00
5952d81139 [Frontend] Fix tcp port reservation for api server (#10012) Russell Bryant 2024-11-05 10:50:57 -05:00
93dee88f6b [Misc] vllm CLI flags should be ordered for better user readability (#10017) Chauncey 2024-11-05 18:59:56 +08:00
7a83b1aec0 [BugFix] Lazy import ray (#10021) Gene Der Su 2024-11-05 02:04:10 -08:00
ad23318928 [Bugfix] Fixup Mamba (#10004) Tyler Michael Smith 2024-11-04 22:46:38 -05:00
bbc3619dc8 [Core] Make encoder-decoder inputs a nested structure to be more composable (#9604) Cyrus Leung 2024-11-05 10:07:31 +08:00
04bbf38e05 [Core] Use os.sched_yield in ShmRingBuffer instead of time.sleep (#9994) Tyler Michael Smith 2024-11-04 20:08:21 -05:00
8f0a9ca890 [Bugfix] Respect modules_to_not_convert within awq_marlin (#9895) Michael Goin 2024-11-04 18:57:44 -05:00
2094062b4e [4.5/N] bugfix for quant config in speculative decode (#10007) youkaichao 2024-11-04 15:11:59 -08:00
d93478b399 [Bugfix] Upgrade to pytorch 2.5.1 (#10001) bnellnm 2024-11-04 18:11:28 -05:00
ac04a97a9f [Frontend] Add max_tokens prometheus metric (#9881) tomeras91 2024-11-05 00:53:24 +02:00
9a5664d4a4 [Misc] Refactor benchmark_throughput.py (#9779) lkchen 2024-11-04 14:32:16 -08:00
04cef2c6ab [Bugfix] Fix MQLLMEngine hanging (#9973) Robert Shaw 2024-11-04 16:01:43 -05:00
6e056bcf04 [Doc] Update VLM doc about loading from local files (#9999) Roger Wang 2024-11-04 11:47:11 -08:00
5208dc7a20 [Bugfix][CI/Build][Hardware][AMD] Shard ID parameters in AMD tests running parallel jobs (#9279) hissu-hyvarinen 2024-11-04 21:37:46 +02:00
1c45f4c385 [CI] Basic Integration Test For TPU (#9968) Robert Shaw 2024-11-04 14:34:26 -05:00
603a661ae8 [Model] factoring out MambaMixer out of Jamba (#8993) Mor Zusman 2024-11-04 20:00:00 +02:00
fb2716d641 [Misc]Reduce BNB static variable (#9987) Jee Jee Li 2024-11-05 01:04:40 +08:00
8d72bb20fa [4/N] make quant config first-class citizen (#9978) youkaichao 2024-11-04 08:51:31 -08:00
ac6b8f19b9 [Frontend] Multi-Modality Support for Loading Local Image Files (#9915) Chauncey 2024-11-04 23:34:57 +08:00
ccb5376a9a [Bugfix][OpenVINO] Fix circular reference #9939 (#9974) Mengqing Cao 2024-11-04 18:14:13 +08:00
ea4adeddc1 [Bugfix] Fix E2EL mean and median stats (#9984) Tran Quang Dai 2024-11-04 16:37:58 +07:00
4dbcbbeb09 [Misc] Compute query_start_loc/seq_start_loc on CPU (#9447) Yang Zheng 2024-11-04 16:54:37 +08:00
b67feb1274 [Bugfix]Using the correct type hints (#9885) Gregory Shtrasberg 2024-11-04 01:19:51 -05:00
c49f0407ba [Bugfix] Fix MiniCPMV and Mllama BNB bug (#9917) Jee Jee Li 2024-11-04 11:36:41 +08:00
91c9ebbb1b [V1] Fix Configs (#9971) Robert Shaw 2024-11-03 19:24:40 -05:00
54597724f4 [Model] Add support for H2OVL-Mississippi models (#9747) shanshan wang 2024-11-03 18:15:36 -06:00
1f1b6d6eda [V1] Support per-request seed (#9945) Nick Hill 2024-11-03 17:14:17 +00:00
3bb4befea7 [bugfix] fix tsts (#9959) youkaichao 2024-11-02 15:54:05 -07:00
ae5279a163 [torch.compile] Adding torch compile to vision-language models (#9946) Yongzao 2024-11-03 03:56:05 +08:00
1b73ab2a1f [CI/Build] Quoting around > (#9956) Nikita Furin 2024-11-02 22:50:28 +03:00
cea808f325 [3/N] model runner pass the whole config to model (#9958) youkaichao 2024-11-02 12:08:49 -07:00
74b529ceee [bugfix] fix chatglm dummy_data_for_glmv (#9955) youkaichao 2024-11-02 08:03:33 -07:00
d6459b4516 [V1] Fix EngineArgs refactor on V1 (#9954) Robert Shaw 2024-11-02 10:44:38 -04:00
e893795443 [2/N] executor pass the complete config to worker/modelrunner (#9938) youkaichao 2024-11-02 07:35:05 -07:00
1d4cfe2be1 [Doc] Updated tpu-installation.rst with more details (#9926) Michael Green 2024-11-02 14:06:45 +00:00
eed92f12fc [Docs] Update Granite 3.0 models in supported models table (#9930) Nick Hill 2024-11-02 09:02:18 +00:00
af7380d83b [torch.compile] fix cpu broken code (#9947) youkaichao 2024-11-01 23:35:47 -07:00
a78dd3303e [Encoder Decoder] Add flash_attn kernel support for encoder-decoder models (#9559) sroy745 2024-11-01 23:22:49 -07:00
d522034c85 [ci/build] Have dependabot ignore pinned dependencies (#9935) Kevin H. Luu 2024-11-01 13:56:13 -10:00
6c0b7f548d [Core][VLM] Add precise multi-modal placeholder tracking (#8346) Peter Salas 2024-11-01 16:21:10 -07:00
d151fde834 [ci/build] Bump the patch-update group with 10 updates (#9897) dependabot[bot] 2024-11-01 23:04:42 +00:00
27cd36e6e2 [Bugfix] PicklingError on RayTaskError (#9934) Gene Der Su 2024-11-01 15:08:23 -07:00
18bd7587b7 [1/N] pass the complete config from engine to executor (#9933) youkaichao 2024-11-01 13:51:57 -07:00
598b6d7b07 [Bugfix/Core] Flashinfer k_scale and v_scale (#9861) Pavani Majety 2024-11-01 12:15:05 -07:00
aff1fd8188 [torch.compile] use interpreter with stable api from pytorch (#9889) youkaichao 2024-11-01 11:50:37 -07:00
4581d2cc02 [Core] Refactor: Clean up unused argument in Scheduler._preempt (#9696) André Jonasson 2024-11-01 19:41:38 +01:00
1dd4cb2935 [Bugfix] Fix edge cases for MistralTokenizer (#9625) Travis Johnson 2024-11-01 11:33:15 -06:00
ba0d892074 [Frontend] Use a proper chat template for VLM2Vec (#9912) Cyrus Leung 2024-11-01 22:09:07 +08:00
30a2e80742 [CI/Build] Add Model Tests for PixtralHF (#9813) Michael Goin 2024-11-01 09:55:29 -04:00
06386a64dd [Frontend] Chat-based Embeddings API (#9759) Cyrus Leung 2024-11-01 16:13:35 +08:00
d3aa2a8b2f [Doc] Update multi-input support (#9906) Cyrus Leung 2024-11-01 15:34:49 +08:00
