Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

01b6113659 [TPU] optimize the all-reduce performance (#15903) Chengji Yao 2025-04-02 17:25:14 -07:00
1b84eff03a [V1][TPU] TPU-optimized top-p implementation (avoids scattering). (#15736) Hyesoo Yang 2025-04-02 17:18:08 -07:00
55acf86bf8 Fix huggingface-cli[hf-xet] -> huggingface-cli[hf_xet] (#15969) Harry Mellor 2025-04-03 00:37:30 +01:00
f021b97993 [V1] Support Mistral3 in V1 (#15950) Michael Goin 2025-04-02 16:36:24 -06:00
1cab43c2d2 [misc] instruct pytorch to use nvml-based cuda check (#15951) youkaichao 2025-04-03 01:02:58 +08:00
8bd651b318 Restricted cmake to be less than version 4 as 4.x breaks the build of… (#15859) Nishidha 2025-04-02 21:49:39 +05:30
58e234a754 [Misc] V1 LoRA support CPU offload (#15843) Jee Jee Li 2025-04-02 23:04:43 +08:00
e86c414d6a [Model] use AutoWeightsLoader in model load_weights (#15770) rongfu.leng 2025-04-02 22:47:31 +08:00
550b2801ad [CPU][Bugfix] Using custom allreduce for CPU backend (#15934) Li, Jiang 2025-04-02 22:46:47 +08:00
cefb9e5a28 [Frontend] Implement Tool Calling with tool_choice='required' (#13483) Matthias Matt 2025-04-02 16:45:45 +02:00
98d7367b61 [Metrics] Hide deprecated metrics (#15458) Mark McLoughlin 2025-04-02 15:37:19 +01:00
594a8b9030 [Bugfix] Fix the issue where the model name is empty string, causing no response with the model name. (#15938) Chauncey 2025-04-02 21:33:52 +08:00
44f990515b [CI] Remove duplicate entrypoints-test (#15940) Kay Yan 2025-04-02 17:44:01 +08:00
252937806c [Bugfix][Benchmarks] Ensure async_request_deepspeed_mii uses the OpenAI choices key (#15926) Brayden Zhong 2025-04-02 05:19:35 -04:00
51826d51fa Add minimum version for huggingface_hub to enable Xet downloads (#15873) Harry Mellor 2025-04-02 10:03:36 +01:00
14e53ed11f [V1] Fix json_object support with xgrammar (#15488) Russell Bryant 2025-04-02 05:00:08 -04:00
ddb94c2605 [core] Add tags parameter to wake_up() (#15500) Eric Tang 2025-04-02 01:59:27 -07:00
90969fb39a [Kernel] Add more dtype support for GGUF dequantization (#15879) LukasBluebaum 2025-04-02 10:58:48 +02:00
101f1481f9 [Build/CI] Update lm-eval to 0.4.8 (#15912) Chris Thi 2025-04-02 04:47:57 -04:00
2edc87b161 [Bugfix] Fix cache block size calculation for CPU MLA (#15848) Thien Tran 2025-04-02 16:45:02 +08:00
4203926f10 [CI/Build] Further clean up LoRA tests (#15920) Jee Jee Li 2025-04-02 16:39:09 +08:00
cdb57015a7 [Misc] Replace print with logger (#15923) Chauncey 2025-04-02 16:37:38 +08:00
aa557e6422 [Benchmark]Fix error message (#15866) Li Wang 2025-04-02 16:32:24 +08:00
0e00d40e4f [V1][Bugfix] Fix typo in MoE TPU checking (#15927) Roger Wang 2025-04-01 23:46:42 -07:00
c920e01242 [Doc] Update rocm.inc.md (#15917) chun 2025-04-02 15:38:26 +09:00
274d8e8818 [V1][Minor] Enhance SpecDecoding Metrics Log in V1 (#15902) Woosuk Kwon 2025-04-01 23:38:02 -07:00
2039c6305b [Bugfix] Fix imports for MoE on CPU (#15841) Thien Tran 2025-04-02 11:33:55 +08:00
6efb195a6e [V1] Fix: make sure k_index is int64 for apply_top_k_only (#15907) Brayden Zhong 2025-04-01 22:06:44 -04:00
24b7fb455a [Spec Decode] Fix input triton kernel for eagle (#15909) Ekagra Ranjan 2025-04-01 21:15:14 -04:00
58f5a59769 [Docs] Add Intel as Sponsor (#15913) Simon Mo 2025-04-01 17:16:55 -07:00
db9dfcfa6a [Docs] Add Ollama meetup slides (#15905) Simon Mo 2025-04-01 13:58:59 -07:00
9ef98d527e [Model][MiniMaxText01] Support MiniMaxText01 model inference (#13454) Gerald 2025-04-02 04:23:55 +08:00
93491aefc7 [BugFix] make sure socket close (#15875) yihong 2025-04-02 04:10:24 +08:00
7acd539cd7 [Docs] update usage stats language (#15898) Simon Mo 2025-04-01 12:54:13 -07:00
e75a6301bd [V1][Spec Decode] Implement Eagle Proposer [1/N] (#15729) Woosuk Kwon 2025-04-01 12:33:16 -07:00
a79cc68b3a [V1][Metrics] Initial speculative decoding metrics (#15151) Mark McLoughlin 2025-04-01 18:45:04 +01:00
7e3f7a4ee7 [CI] Disable flaky structure decoding test temporarily. (#15892) Roger Wang 2025-04-01 10:42:34 -07:00
9ec8257914 [Model] Add module name prefixes to gemma3 (#15889) cloud11665 2025-04-02 02:13:40 +09:00
38327cf454 [Model] Aya Vision (#15441) Jennifer Zhao 2025-04-01 09:30:43 -07:00
dfa82e2a3d [CI/Build] Clean up LoRA tests (#15867) Jee Jee Li 2025-04-02 00:28:50 +08:00
e59ca942f5 Add option to use DeepGemm contiguous grouped gemm kernel for fused MoE operations. (#13932) bnellnm 2025-04-01 12:07:43 -04:00
a57a3044aa [ROCm][Build][Bugfix] Bring the base dockerfile in sync with the ROCm fork (#15820) Gregory Shtrasberg 2025-04-01 11:56:39 -04:00
4e5a0f6ae2 [Misc] Allow using OpenCV as video IO fallback (#15055) Isotr0py 2025-04-01 23:55:13 +08:00
b63bd14999 Reinstate format.sh and make pre-commit installation simpler (#15890) Harry Mellor 2025-04-01 16:41:30 +01:00
2041c0e360 [Doc] Quark quantization documentation (#15861) chaow-amd 2025-04-01 23:32:45 +08:00
085cbc4f9f [New Model]: jinaai/jina-reranker-v2-base-multilingual (#15876) wang.yuqi 2025-04-01 23:32:26 +08:00
2b93162fb0 Remove format.sh as it's been unsupported >70 days (#15884) Harry Mellor 2025-04-01 15:27:46 +01:00
2e45bd29fe [Misc] remove unused script (#15746) Reid 2025-04-01 21:58:05 +08:00
51d7c6a2b2 [Model] Support Mistral3 in the HF Transformers format (#15505) Michael Goin 2025-04-01 07:10:05 -06:00
f3aca1ee30 setup correct nvcc version with CUDA_HOME (#15725) Yang Chen 2025-04-01 06:09:40 -07:00
8dd41d6bcc [Misc] Use envs.VLLM_USE_RAY_COMPILED_DAG_CHANNEL_TYPE (#15831) Rui Qiao 2025-04-01 06:07:53 -07:00
0a298ea418 [Bugfix] Fix no video/image profiling edge case for MultiModalDataParser (#15828) Isotr0py 2025-04-01 18:17:11 +08:00
d330558bab [Docs] Fix small error in link text (#15868) Harry Mellor 2025-04-01 11:05:14 +01:00
656fd72976 [Misc] Fix speculative config repr string (#15860) shangmingc 2025-04-01 17:26:22 +08:00
79455cf421 [Misc] Enable V1 LoRA by default (#15320) Varun Sundar Rabindranath 2025-04-01 04:53:56 -04:00
30d6a015e0 [Feature] specify model in config.yaml (#15798) Wei Zeng 2025-04-01 01:20:06 -07:00
8af5a5c4e5 fix: can not use uv run collect_env close #13888 (#15792) yihong 2025-04-01 15:45:49 +08:00
3a5f0afcd2 [V1] Implement sliding window attention in kv_cache_manager (#14097) Chen Zhang 2025-04-01 15:33:17 +08:00
c7e63aa4d8 [ROCm] Use device name in the warning (#15838) Gregory Shtrasberg 2025-04-01 03:10:48 -04:00
4a9ce1784c [sleep mode] clear pytorch cache after sleep (#15248) Lionel Villard 2025-04-01 01:58:58 -04:00
7e4e709b43 [V1] TPU - Fix fused MOE (#15834) Alexander Matveev 2025-04-01 01:58:07 -04:00
63d8eabed0 [Bugfix]: Fix is_embedding_layer condition in VocabParallelEmbedding (#15824) Alexey Kiryushin 2025-04-01 05:57:59 +00:00
e830b01383 [Bugfix] Fix extra comma (#15851) Percy 2025-04-01 00:57:28 -05:00
ff6473980d [Bugfix][Model] fix mllama multi-image (#14883) Yan Ma 2025-04-01 13:53:37 +08:00
a164aea35d [Frontend] Add Phi-4-mini function calling support (#14886) Kinfey 2025-04-01 13:50:05 +08:00
a76f547e11 Rename fallback model and refactor supported models section (#15829) Harry Mellor 2025-04-01 06:49:41 +01:00
b7b7676d67 [Distributed] Add custom allreduce support for ROCM (#14125) Ilya Markov 2025-04-01 07:49:12 +02:00
e6e3c55ef2 Move dockerfiles into their own directory (#14549) Harry Mellor 2025-03-31 21:47:32 +01:00
f98a4920f9 [V1][Core] Remove unused speculative config from scheduler (#15818) Mark McLoughlin 2025-03-31 20:15:21 +01:00
d4bfc23ef0 Fix Transformers backend compatibility check (#15290) Harry Mellor 2025-03-31 18:27:07 +01:00
9a2160fa55 [V1] TPU CI - Add basic perf regression test (#15414) Alexander Matveev 2025-03-31 13:25:20 -04:00
2de4118243 fix: change GB to GiB in logging close #14979 (#15807) yihong 2025-04-01 01:00:50 +08:00
239b7befdd [V1][Spec Decode] Remove deprecated spec decode config params (#15466) shangmingc 2025-04-01 00:19:35 +08:00
09e974d483 [Bugfix] Check dimensions of multimodal embeddings in V1 (#15816) Cyrus Leung 2025-04-01 00:01:35 +08:00
e5ef4fa99a Upgrade transformers to v4.50.3 (#13905) Harry Mellor 2025-03-31 16:59:37 +01:00
037bcd942c [Bugfix] Fix missing return value in load_weights method of adapters.py (#15542) Mrm 2025-03-31 21:56:42 +08:00
c2e7507ad4 [Bugfix] Fix Crashing When Loading Modules With Batchnorm Stats (#15813) Alex Brooks 2025-03-31 07:23:53 -06:00
3aa2b6a637 [Model] Update support for NemotronNAS models (#15008) Naveassaf 2025-03-31 15:35:14 +03:00
555aa21905 [V1] Fully Transparent Implementation of CPU Offloading (#15354) youkaichao 2025-03-31 20:22:34 +08:00
e7ae3bf3d6 fix: better install requirement for install in setup.py (#15796) yihong 2025-03-31 20:13:32 +08:00
b932c048ac Recommend developing with Python 3.12 in developer guide (#15811) Harry Mellor 2025-03-31 12:54:49 +01:00
e85829450d [Feature][ROCm]Enable fusion pass for torch.compile on ROCm (#15050) Charlie Fu 2025-03-31 06:42:18 -05:00
effc5d24fa [Benchmark] Update Vision Arena Dataset and HuggingFaceDataset Setup (#15748) Jennifer Zhao 2025-03-31 00:38:58 -07:00
18ed3132d2 [Misc] update the comments (#15780) Chengyang LIU 2025-03-30 19:39:56 -07:00
9b459eca88 [V1][Scheduler] Avoid calling _try_schedule_encoder_inputs for every request (#15778) Woosuk Kwon 2025-03-30 14:10:42 -07:00
70fedd0f79 fix: Comments to English for better dev experience (#15768) yihong 2025-03-31 01:47:57 +08:00
bb103b29bf [Bugfix] Added embed_is_patch mask for fuyu model (#15731) kYLe 2025-03-30 05:45:08 -05:00
248e76c4df fix: lint fix a ruff checkout syntax error (#15767) yihong 2025-03-30 18:36:02 +08:00
803d5c35f3 [V1] Override mm_counts for dummy data creation (#15703) Cyrus Leung 2025-03-30 18:20:42 +08:00
7fd8c0f85c fix test_phi3v (#15321) pansicheng 2025-03-30 17:01:34 +08:00
44c3a5abc3 [doc] update conda to usage link in installation (#15761) Reid 2025-03-30 16:12:13 +08:00
6909a76201 [Bugfix] Fix Mistral guided generation using xgrammar (#15704) Julien Denize 2025-03-30 05:20:19 +02:00
045533716b [CI] xgrammar structured output supports Enum. (#15757) Chauncey 2025-03-30 11:20:02 +08:00
3c0ff914ac [Bugfix] Fix Mllama interleaved images input support (#15564) Isotr0py 2025-03-30 02:11:15 +08:00
2bc4be4e32 [V1][Minor] Simplify rejection sampler's parse_output (#15741) Woosuk Kwon 2025-03-29 09:25:17 -07:00
c67abd614f [V1] Support interleaved modality items (#15605) Roger Wang 2025-03-29 06:30:09 -07:00
6fa7cd3dbc [Feature][Disaggregated] Support XpYd disaggregated prefill with MooncakeStore (#12957) shangmingc 2025-03-29 19:01:46 +08:00
94744ba41a [V1] [Feature] Collective RPC (#15444) wwl2755 2025-03-29 05:39:14 -05:00
4965ec42d2 [FEAT] [ROCm] Add AITER int8 scaled gemm kernel (#15433) TJian 2025-03-29 18:33:56 +08:00
73aa7041bf [doc] update doc (#15740) Reid 2025-03-29 12:27:22 +08:00
