Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

e1fe7591f2 [Misc]Code Cleanup (#13859) Chenguang Li 2025-02-26 10:44:30 +08:00
5629f26df7 [V1][Spec Decode] Change Spec Decode Rejection Sampling API (#13729) Lily Liu 2025-02-25 18:14:48 -08:00
9ba28043b5 [misc] Show driver IP info when Ray fails to allocate driver worker (#13858) Rui Qiao 2025-02-25 17:53:43 -08:00
24679788ed DeepSeek V2/V3/R1 only place lm_head on last pp rank (#13833) Harry Mellor 2025-02-26 01:24:57 +00:00
07c4353057 [Model] Support Grok1 (#13795) Michael Goin 2025-02-25 20:07:12 -05:00
34e3494e70 Fix failing MyGemma2Embedding test (#13820) Harry Mellor 2025-02-25 20:33:03 +00:00
f75aa72732 [Neuron] Add custom_ops for neuron backend (#13246) Liangfu Chen 2025-02-25 11:47:49 -08:00
340e39e387 Fix string parsing error (#13825) Chen1022 2025-02-26 00:20:29 +08:00
f4133ce4e5 [Bugfix] Revert inspection code in #13743 (#13832) Cyrus Leung 2025-02-26 00:18:50 +08:00
6522d55b6f Fix /v1/audio/transcriptions Bad Request Error (#13811) Wen Sun 2025-02-25 22:03:33 +08:00
6ff518626c [Bugfix] Fix deepseek-vl2 inference with more than 2 images (#13818) Isotr0py 2025-02-25 22:03:02 +08:00
fa82074167 [Bugfix] Flush TunableOp results before worker processes are destroyed. (#13623) Nichols A. Romero 2025-02-25 05:08:20 -06:00
75e9d49796 [Bugfix] Initialize attention bias on the same device as Query/Key/Value (#13468) Junlin Zhou 2025-02-25 18:13:09 +08:00
32c3b6bfd1 [Misc]Clarify Error Handling for Non-existent Model Paths and HF Repo IDs (#13724) Chen1022 2025-02-25 18:12:19 +08:00
37b6cb4985 [CI/Build] Fix V1 LoRA failure (#13767) Jee Jee Li 2025-02-25 18:01:15 +08:00
aabeb2688f [ROCm][Quantization][Kernel] Using HIP FP8 header (#12593) Gregory Shtrasberg 2025-02-25 03:39:59 -05:00
2f42a4888c [Feature] Support KV cache offloading and disagg prefill with LMCache connector. (#12953) Jiayi Yao 2025-02-25 02:38:42 -06:00
3173c3b34e [misc] Clean up ray compiled graph type hints (#13731) Rui Qiao 2025-02-25 00:37:08 -08:00
2d87d7d1ac [Bugfix] Modify modelscope api usage in transformer_utils (#13807) Shanshan Shen 2025-02-25 16:36:07 +08:00
aab392774b [Core] xgrammar: Expand list of unsupported jsonschema keywords (#13783) Russell Bryant 2025-02-25 03:21:25 -05:00
6724e79164 [Misc] Check that the model can be inspected upon registration (#13743) Cyrus Leung 2025-02-25 16:18:19 +08:00
03f48b3db6 [Core] LoRA V1 - Add add/pin/list/remove_lora functions (#13705) Varun Sundar Rabindranath 2025-02-25 13:48:02 +05:30
4d251ad00e Fix CompressedTensorsWNA16MoE with grouped scales (#13769) Michael Goin 2025-02-25 03:17:14 -05:00
18e505930d [Bugfix] Support MLA for CompressedTensorsWNA16 (#13725) Michael Goin 2025-02-25 01:10:31 -05:00
4a8cfc7551 [Bugfix] Fix deepseek-v2 error: "missing 1 required positional argument: 'residual'" (#13802) Lucas Wilkinson 2025-02-24 23:33:59 -05:00
bc32bc73aa [V1][Metrics] Implement vllm:lora_requests_info metric (#13504) Mark McLoughlin 2025-02-25 04:01:33 +00:00
ab1091d5f2 [Misc][Attention][Quantization] init property earlier (#13733) wangxiyuan 2025-02-25 11:19:30 +08:00
1e15aaef56 [Bugfix][Quantization] Fix FP8 + EP (#13784) Tyler Michael Smith 2025-02-24 21:54:17 -05:00
51010a1807 [Misc] set single whitespace between log sentences (#13771) cjackal 2025-02-25 11:26:12 +09:00
7196a3b1db [Doc] arg_utils.py: fixed a typo (#13785) Eli Boyarski 2025-02-25 04:23:04 +02:00
cdc1fa12eb Remove unused kwargs from model definitions (#13555) Harry Mellor 2025-02-25 01:13:52 +00:00
f61528d46d [Misc][Chore] Clean Up AsyncOutputProcessing Logs (#13780) Robert Shaw 2025-02-24 19:39:07 -05:00
1f0ae3ed0a [Misc] Clean Up EngineArgs.create_engine_config (#13734) Robert Shaw 2025-02-24 13:52:21 -05:00
db986c19ea Fix precommit fail in fused_moe intermediate_cache2 chunking (#13772) Michael Goin 2025-02-24 12:25:47 -05:00
227578480d Revert "[V1][Core] Fix memory issue with logits & sampling" (#13775) Roger Wang 2025-02-24 09:16:05 -08:00
befc402d34 [V1] V1 engine implements parallel sampling (AsyncLLM and LLMEngine) (#10980) afeldman-nm 2025-02-24 11:29:41 -05:00
444b0f0f62 [Misc][Docs] Raise error when flashinfer is not installed and VLLM_ATTENTION_BACKEND is set (#12513) Nicolò Lucchesi 2025-02-24 16:43:21 +01:00
ccc00515fd [BugFix] Illegal memory access for MoE On H20 (#13693) Zhonghua Deng 2025-02-24 23:37:32 +08:00
781096e385 Expert Parallelism (EP) Support for DeepSeek V2 (#12583) Jongseok Park 2025-02-24 07:33:20 -08:00
7940d8a6a7 [CI/Build] add python-json-logger to requirements-common (#12842) Roger Meier 2025-02-24 15:10:33 +01:00
c0e3ecd6d2 [Bugfix] fix(logging): add missing opening square bracket (#13011) Roger Meier 2025-02-24 15:10:25 +01:00
23eca9cf68 [model][refactor] remove cuda hard code in models and layers (#13658) Mengqing Cao 2025-02-24 22:10:14 +08:00
437b76ff59 [V1][Core] Fix memory issue with logits & sampling (#13721) Roger Wang 2025-02-24 06:10:06 -08:00
f90a375593 [ci] Add logic to change model to S3 path only when S3 CI env var is on (#13727) Kevin H. Luu 2025-02-23 22:32:11 -08:00
e7ef74e26e Fix some issues with benchmark data output (#13641) Huy Do 2025-02-23 18:23:18 -08:00
cbae7af552 [V1][BugFix] Fix engine core client shutdown hangs (#13298) Nick Hill 2025-02-23 13:07:43 -08:00
eb24dc4a45 [v1] torchrun compatibility (#13642) youkaichao 2025-02-23 22:47:24 +08:00
9bebc9512f [Misc] Deprecate --dataset from benchmark_serving.py (#13708) Roger Wang 2025-02-23 05:32:20 -08:00
5a2ba16f5c [Core][Distributed] Use IPC (domain socket) ZMQ socket for local comms (#13688) Nick Hill 2025-02-23 02:54:29 -08:00
ba5106e519 [LMM] Implement merged multimodal processor for whisper (#13278) Isotr0py 2025-02-23 17:46:03 +08:00
d5ca2110f1 [Quant] BaiChuan SupportsQuant (#13710) Kyle Sayers 2025-02-22 22:21:15 -05:00
2c5e637b57 [ci] Use env var to control whether to use S3 bucket in CI (#13634) Kevin H. Luu 2025-02-22 19:19:45 -08:00
322d2a27d6 [BugFix] Minor: logger import in attention backend (#13706) Andy Lo 2025-02-23 00:51:13 +00:00
82e0d601fc [CI/Build] Fix pre-commit errors from #13571 (#13709) Roger Wang 2025-02-22 16:50:38 -08:00
78ac0f591d [CI/Build] fix uv caching in Dockerfile (#13611) Daniele 2025-02-22 17:25:20 +01:00
b56155e7f3 [XPU]fix setuptools version for xpu (#13548) Yan Ma 2025-02-23 00:05:35 +08:00
382f66fb08 [Bugfix] Fix boolean conversion for OpenVINO env variable (#13615) Helena Kloosterman 2025-02-22 17:04:12 +01:00
8354f6640c [Doc] Dockerfile instructions for optional dependencies and dev transformers (#13699) Cyrus Leung 2025-02-22 22:04:31 +08:00
c904fdddf6 [ROCm] Apply FP8 weights padding to values not divisible by 512 bytes on ROCm (#13231) Gregory Shtrasberg 2025-02-22 08:54:38 -05:00
558db8083c [V1][Kernel] Refactor the prefix_prefill kernel so that the caller no longer has to pass in the context lengths (#13095) Sage Moore 2025-02-22 05:25:41 -08:00
e109e598c7 [NVIDIA] Support nvfp4 cutlass gemm (#13571) Kaixi Hou 2025-02-22 05:24:05 -08:00
8db1b9d0a1 Support SSL Key Rotation in HTTP Server (#13495) Keyun Tong 2025-02-22 05:17:44 -08:00
2382ad29d1 [ci] fix linter (#13701) youkaichao 2025-02-22 20:28:59 +08:00
3e472d882a [core] set up data parallel communication (#13591) youkaichao 2025-02-22 19:28:59 +08:00
7f6bae561c [CI/Build] Fix pre-commit errors (#13696) Cyrus Leung 2025-02-22 16:31:26 +08:00
105b8ce4c0 [Misc] Reduce LoRA-related static variable (#13166) Jee Jee Li 2025-02-22 16:21:30 +08:00
2cb8c1540e [Metrics] Add --show-hidden-metrics-for-version CLI arg (#13295) Mark McLoughlin 2025-02-22 08:20:45 +00:00
1cd981da4f [V1][Metrics] Support vllm:cache_config_info (#13299) Mark McLoughlin 2025-02-22 08:20:00 +00:00
fca20841c2 Correction to TP logic for Mamba Mixer 2 when Num Groups not divisible by TP Size (#13660) Yu Chin Fabian Lim 2025-02-22 16:19:10 +08:00
da31b5333e [Bugfix] V1 Memory Profiling: V0 Sampler Integration without Rejection Sampler (#13594) Jennifer Zhao 2025-02-22 00:08:29 -08:00
bb78fb318e [v1] Support allowed_token_ids in v1 Sampler (#13210) Lu Fang 2025-02-21 22:13:05 -08:00
8aca27fa11 [Bugfix] Fix benchmark script bug: inaccurate stats for vllm backend when max_model_len < input_len + output_len (#13691) Robin 2025-02-22 14:10:38 +08:00
95c617e04b [Misc] Bump compressed-tensors (#13619) Dipika Sikka 2025-02-22 01:09:04 -05:00
9a1f1da5d1 [Bugfix][Model] OLMo 2: split qkv correctly for GQA and MQA (#13687) Shane A 2025-02-21 22:07:45 -08:00
68d630a0c7 [ROCM] fix native attention function call (#13650) Gordon Wong 2025-02-22 14:07:04 +08:00
68d535ef44 [Misc] Capture and log the time of loading weights (#13666) Jun Duan 2025-02-22 01:06:34 -05:00
c6ed93860f [Bugfix][API Server] Fix invalid usage of 'ge' and 'le' in port valid… (#13672) Robin 2025-02-22 14:05:28 +08:00
0ffdf8ce0c [HTTP Server] Make model param optional in request (#13568) Keyun Tong 2025-02-21 21:55:50 -08:00
8c0dd3d4df docs: Add a note on full CI run in contributing guide (#13646) Yuan Tang 2025-02-22 00:53:59 -05:00
ada7c780d5 [Misc] Fix yapf linting tools etc not running on pre-commit (#13695) Isotr0py 2025-02-22 13:10:43 +08:00
288cc6c234 [Attention] MLA with chunked prefill (#12639) Lucas Wilkinson 2025-02-21 18:30:12 -05:00
900edbfa48 fix typo of grafana dashboard, with correct datasource (#13668) John Zheng 2025-02-22 02:21:05 +08:00
b2c3fc5d65 [Bugfix][CPU] Fix cpu all-reduce using native pytorch implementation (#13586) Isotr0py 2025-02-21 14:24:17 +08:00
839b27c6cc [Kernel]Add streamK for block-quantized CUTLASS kernels (#12978) leoneo 2025-02-21 14:14:24 +08:00
34ad27fe83 [ci] Fix metrics test model path (#13635) Kevin H. Luu 2025-02-20 22:12:10 -08:00
1c3c975766 [FEATURE] Enables /score endpoint for embedding models (#12846) Gabriel Marinho 2025-02-21 03:09:47 -03:00
1cdc88614a Missing comment explaining VDR variable in GGUF kernels (#13290) Szymon Ożóg 2025-02-21 07:06:54 +01:00
31aa045c11 [V1][Sampler] Avoid an operation during temperature application (#13587) Nick Hill 2025-02-20 22:05:56 -08:00
a30c093502 [Bugfix] Add mm_processor_kwargs to chat-related protocols (#13644) Roger Wang 2025-02-20 22:04:33 -08:00
c7b07a95a6 Use pre-commit to update requirements-test.txt (#13617) Harry Mellor 2025-02-21 06:03:27 +00:00
27a09dc52c [NVIDIA] Fix an issue to use current stream for the nvfp4 quant (#13632) Kaixi Hou 2025-02-20 22:01:48 -08:00
981f3c831e [Misc] Adding script to setup ray for multi-node vllm deployments (#12913) Edwin Hernandez 2025-02-20 21:16:40 -08:00
44c33f01f3 Add llmaz as another integration (#13643) Kante Yin 2025-02-21 11:52:40 +08:00
33170081f1 [Neuron][Kernel] Vectorize KV cache load in FlashPagedAttention to maximize DMA bandwidth (#13245) Lingfan Yu 2025-02-20 17:45:45 -08:00
71face8540 [Bugfix] Fix max_num_batched_tokens for MLA (#13620) Michael Goin 2025-02-20 20:45:20 -05:00
bfbc0b32c6 [Frontend] Add backend-specific options for guided decoding (#13505) Joe Runde 2025-02-20 13:07:58 -07:00
6a417b8600 fix neuron performance issue (#13589) ajayvohra2005 2025-02-20 13:59:36 -05:00
d3ea50113c [V1][Minor] Print KV cache size in token counts (#13596) Woosuk Kwon 2025-02-20 09:24:31 -08:00
34aad515c8 Update pre-commit's isort version to remove warnings (#13614) Harry Mellor 2025-02-20 16:00:14 +00:00
ed6e9075d3 [Bugfix] Fix deepseekv3 grouped topk error (#13474) v0.7.3 chenxiaobing 2025-02-20 22:47:01 +08:00
