Keyun Tong
|
8db1b9d0a1
|
Support SSL Key Rotation in HTTP Server (#13495)
|
2025-02-22 05:17:44 -08:00 |
|
youkaichao
|
2382ad29d1
|
[ci] fix linter (#13701)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-02-22 20:28:59 +08:00 |
|
youkaichao
|
3e472d882a
|
[core] set up data parallel communication (#13591)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-02-22 19:28:59 +08:00 |
|
Cyrus Leung
|
7f6bae561c
|
[CI/Build] Fix pre-commit errors (#13696)
|
2025-02-22 00:31:26 -08:00 |
|
Jee Jee Li
|
105b8ce4c0
|
[Misc] Reduce LoRA-related static variable (#13166)
|
2025-02-22 00:21:30 -08:00 |
|
Mark McLoughlin
|
2cb8c1540e
|
[Metrics] Add --show-hidden-metrics-for-version CLI arg (#13295)
|
2025-02-22 00:20:45 -08:00 |
|
Mark McLoughlin
|
1cd981da4f
|
[V1][Metrics] Support vllm:cache_config_info (#13299)
|
2025-02-22 00:20:00 -08:00 |
|
Yu Chin Fabian Lim
|
fca20841c2
|
Correction to TP logic for Mamba Mixer 2 when Num Groups not divisible by TP Size (#13660)
|
2025-02-22 00:19:10 -08:00 |
|
Jennifer Zhao
|
da31b5333e
|
[Bugfix] V1 Memory Profiling: V0 Sampler Integration without Rejection Sampler (#13594)
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-02-22 00:08:29 -08:00 |
|
Lu Fang
|
bb78fb318e
|
[v1] Support allowed_token_ids in v1 Sampler (#13210)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-02-22 14:13:05 +08:00 |
|
Robin
|
8aca27fa11
|
[Bugfix] Fix benchmark script bug: inaccurate stats for vllm backend when max_model_len < input_len + output_len (#13691)
Signed-off-by: WangErXiao <863579016@qq.com>
|
2025-02-22 14:10:38 +08:00 |
|
Dipika Sikka
|
95c617e04b
|
[Misc] Bump compressed-tensors (#13619)
|
2025-02-21 22:09:04 -08:00 |
|
Shane A
|
9a1f1da5d1
|
[Bugfix][Model] OLMo 2: split qkv correctly for GQA and MQA (#13687)
|
2025-02-21 22:07:45 -08:00 |
|
Gordon Wong
|
68d630a0c7
|
[ROCM] fix native attention function call (#13650)
|
2025-02-21 22:07:04 -08:00 |
|
Jun Duan
|
68d535ef44
|
[Misc] Capture and log the time of loading weights (#13666)
|
2025-02-21 22:06:34 -08:00 |
|
Robin
|
c6ed93860f
|
[Bugfix][API Server] Fix invalid usage of 'ge' and 'le' in port valid… (#13672)
|
2025-02-21 22:05:28 -08:00 |
|
Keyun Tong
|
0ffdf8ce0c
|
[HTTP Server] Make model param optional in request (#13568)
|
2025-02-21 21:55:50 -08:00 |
|
Yuan Tang
|
8c0dd3d4df
|
docs: Add a note on full CI run in contributing guide (#13646)
|
2025-02-21 21:53:59 -08:00 |
|
Isotr0py
|
ada7c780d5
|
[Misc] Fix yapf linting tools etc not running on pre-commit (#13695)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-02-22 13:10:43 +08:00 |
|
Lucas Wilkinson
|
288cc6c234
|
[Attention] MLA with chunked prefill (#12639)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Patrick Horn <patrick.horn@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-02-21 15:30:12 -08:00 |
|
John Zheng
|
900edbfa48
|
fix typo of grafana dashboard, with correct datasource (#13668)
Signed-off-by: John Zheng <john.zheng@hp.com>
|
2025-02-21 18:21:05 +00:00 |
|
Isotr0py
|
b2c3fc5d65
|
[Bugfix][CPU] Fix cpu all-reduce using native pytorch implementation (#13586)
|
2025-02-20 22:24:17 -08:00 |
|
leoneo
|
839b27c6cc
|
[Kernel]Add streamK for block-quantized CUTLASS kernels (#12978)
|
2025-02-20 22:14:24 -08:00 |
|
Kevin H. Luu
|
34ad27fe83
|
[ci] Fix metrics test model path (#13635)
|
2025-02-20 22:12:10 -08:00 |
|
Gabriel Marinho
|
1c3c975766
|
[FEATURE] Enables /score endpoint for embedding models (#12846)
|
2025-02-20 22:09:47 -08:00 |
|
Szymon Ożóg
|
1cdc88614a
|
Missing comment explaining VDR variable in GGUF kernels (#13290)
|
2025-02-20 22:06:54 -08:00 |
|
Nick Hill
|
31aa045c11
|
[V1][Sampler] Avoid an operation during temperature application (#13587)
|
2025-02-20 22:05:56 -08:00 |
|
Roger Wang
|
a30c093502
|
[Bugfix] Add mm_processor_kwargs to chat-related protocols (#13644)
|
2025-02-20 22:04:33 -08:00 |
|
Harry Mellor
|
c7b07a95a6
|
Use pre-commit to update requirements-test.txt (#13617)
|
2025-02-20 22:03:27 -08:00 |
|
Kaixi Hou
|
27a09dc52c
|
[NVIDIA] Fix an issue to use current stream for the nvfp4 quant (#13632)
|
2025-02-20 22:01:48 -08:00 |
|
Edwin Hernandez
|
981f3c831e
|
[Misc] Adding script to setup ray for multi-node vllm deployments (#12913)
|
2025-02-20 21:16:40 -08:00 |
|
Kante Yin
|
44c33f01f3
|
Add llmaz as another integration (#13643)
Signed-off-by: kerthcet <kerthcet@gmail.com>
|
2025-02-21 03:52:40 +00:00 |
|
Lingfan Yu
|
33170081f1
|
[Neuron][Kernel] Vectorize KV cache load in FlashPagedAttention to maximize DMA bandwidth (#13245)
Signed-off-by: Lingfan Yu <lingfany@amazon.com>
|
2025-02-20 17:45:45 -08:00 |
|
Michael Goin
|
71face8540
|
[Bugfix] Fix max_num_batched_tokens for MLA (#13620)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-02-20 17:45:20 -08:00 |
|
Joe Runde
|
bfbc0b32c6
|
[Frontend] Add backend-specific options for guided decoding (#13505)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-02-20 15:07:58 -05:00 |
|
ajayvohra2005
|
6a417b8600
|
fix neuron performance issue (#13589)
|
2025-02-20 10:59:36 -08:00 |
|
Woosuk Kwon
|
d3ea50113c
|
[V1][Minor] Print KV cache size in token counts (#13596)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-02-20 09:24:31 -08:00 |
|
Harry Mellor
|
34aad515c8
|
Update pre-commit's isort version to remove warnings (#13614)
|
2025-02-20 08:00:14 -08:00 |
|
chenxiaobing
|
ed6e9075d3
|
[Bugfix] Fix deepseekv3 grouped topk error (#13474)
Signed-off-by: Chen-XiaoBing <chenxb002@whu.edu.cn>
Tag: v0.7.3
|
2025-02-20 06:47:01 -08:00 |
|
Harry Mellor
|
992e5c3d34
|
Merge similar examples in offline_inference into single basic example (#12737)
|
2025-02-20 04:53:51 -08:00 |
|
Varun Sundar Rabindranath
|
b69692a2d8
|
[Kernel] LoRA - Refactor sgmv kernels (#13110)
|
2025-02-20 07:28:06 -05:00 |
|
Kevin H. Luu
|
a64a84433d
|
[2/n][ci] S3: Use full model path (#13564)
Signed-off-by: <>
|
2025-02-20 01:20:15 -08:00 |
|
Kevin H. Luu
|
aa1e62d0db
|
[ci] Fix spec decode test (#13600)
|
2025-02-20 16:56:00 +08:00 |
|
Michael Goin
|
497bc83124
|
[CI/Build] Use uv in the Dockerfile (#13566)
|
2025-02-19 23:05:44 -08:00 |
|
Yuan Tang
|
3738e6fa80
|
[API Server] Add port number range validation (#13506)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
|
2025-02-20 15:05:13 +08:00 |
|
Gregory Shtrasberg
|
0023cd2b9d
|
[ROCm] MI300A compile targets deprecation (#13560)
|
2025-02-19 23:05:00 -08:00 |
|
燃
|
041e294716
|
[Misc] add mm_processor_kwargs to extra_body for Qwen2.5-VL (#13533)
|
2025-02-19 23:04:30 -08:00 |
|
Alex Brooks
|
9621667874
|
[Misc] Warn if the vLLM version can't be retrieved (#13501)
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
|
2025-02-20 06:24:48 +00:00 |
|
Simon Mo
|
8c755c3b6d
|
[bugfix] spec decode worker get tp group only when initialized (#13578)
|
2025-02-20 04:46:28 +00:00 |
|
youkaichao
|
ba81163997
|
[core] add sleep and wake up endpoint and v1 support (#12987)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: cennn <2523403608@qq.com>
Co-authored-by: cennn <2523403608@qq.com>
|
2025-02-20 12:41:17 +08:00 |
|