Jared O'Connell
|
31282401b6
|
[BugFix] Fix Python 3.9 Support (#23306)
Signed-off-by: Jared O'Connell <46976761+jaredoconnell@users.noreply.github.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-08-20 23:23:56 -07:00 |
|
Cyrus Leung
|
0c31e28e95
|
[Bugfix] Fix extra whitespace in strings caused by newline (#23272)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-20 22:03:00 -07:00 |
|
22quinn
|
f571ff8eb6
|
[Sampler] Support returning final logprobs (#22387)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-20 21:28:32 -07:00 |
|
Michael Goin
|
f64ee61d9e
|
[CI] Block the cu126 wheel build while broken (#23285)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-21 04:21:05 +00:00 |
|
QiliangCui
|
8993073dc1
|
[CI] Delete images older than 24h. (#23291)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-08-20 21:15:20 -07:00 |
|
杨奇(yann qi)
|
655a09f653
|
[Model][VLM] Support R-4B Model (#23246)
Signed-off-by: yannqi <yannqi@qq.com>
Signed-off-by: 杨奇(yann qi) <51905299+yannqi@users.noreply.github.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: yannqiyang <yannqiyang@tencent.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-08-21 04:08:52 +00:00 |
|
Wentao Ye
|
f94bf9b924
|
[Compile] Fix Compile Warning SM100 Cutlass MLA (#23287)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-21 03:09:39 +00:00 |
|
Asaf Joseph Gardin
|
3663870c72
|
[V1][Mamba1] - Full CUDA and Piecewise CUDA Graphs Support (#23035)
Signed-off-by: asafg <asafg@ai21.com>
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
Co-authored-by: asafg <asafg@ai21.com>
|
2025-08-20 20:08:51 -07:00 |
|
Cyrus Leung
|
2461d9e562
|
[CI/Build] Split out mm processor tests (#23260)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-20 20:05:20 -07:00 |
|
Li, Jiang
|
7be5d113d8
|
[CPU] Refactor CPU W8A8 scaled_mm (#23071)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-08-21 09:34:24 +08:00 |
|
Woosuk Kwon
|
b029de9902
|
[Optimization] Make new_block_ids None if empty (#23262)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
|
2025-08-20 18:25:56 -07:00 |
|
Michael Goin
|
bbea1cefdd
|
[CI Bugfix] Fix CI by fully removing --enable-prompt-adapter (#23284)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-20 17:18:12 -07:00 |
|
Russell Bryant
|
f5aa307d77
|
Remove duplicate entry in vllm.attention.__all__ (#23296)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-08-20 17:14:59 -07:00 |
|
22quinn
|
4b795020ed
|
[EP] Add logging for experts map (#22685)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-08-20 23:46:06 +00:00 |
|
shixianc
|
c86af22f31
|
[Fix] remove is_marlin param in benchmark_moe (#23286)
|
2025-08-20 22:04:21 +00:00 |
|
Matthew Bonanni
|
10cc12ba66
|
Feature/mla tests (#23195)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-08-20 21:46:47 +00:00 |
|
Matthew Bonanni
|
a4fbb32fab
|
Remove chunked_prefill_enabled flag in V1 MLA (#23183)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
|
2025-08-20 21:43:17 +00:00 |
|
youkaichao
|
1b125004be
|
[misc] fix multiple arch wheels for the nightly index (#23110)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-08-20 14:15:34 -07:00 |
|
rongfu.leng
|
4fbda0b20c
|
[Feature] use --eplb_config to set eplb param (#20562)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: rongfu.leng <lenronfu@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-20 14:07:28 -07:00 |
|
Russell Bryant
|
4e51fa8cba
|
Do not use eval() to convert unknown types (#23266)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-08-20 13:28:30 -07:00 |
|
Saurabh Misra
|
bf7c99dfc4
|
[Perf] Speed up function _convert_tokens_to_string_with_added_encoders by 13.7x (#20413)
Signed-off-by: Saurabh Misra <misra.saurabh1@gmail.com>
Signed-off-by: Aseem Saxena <aseem.bits@gmail.com>
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
Co-authored-by: Aseem Saxena <aseem.bits@gmail.com>
|
2025-08-20 13:17:11 -07:00 |
|
Chen Zhang
|
b95697d731
|
[Frontend] improve error logging of chat completion (#22957)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-08-20 13:03:37 -07:00 |
|
bigmoyan
|
582bbe6bd7
|
[Fix] correct tool_id for kimi-k2 when use tool_choice=required (#21259)
Co-authored-by: wangzhengtao <wangzhengtao@msh.team>
|
2025-08-20 12:59:54 -07:00 |
|
Michael Goin
|
0cdbf5e61c
|
[Kernel/Quant] Remove the original marlin format and qqq (#23204)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-20 15:13:36 -04:00 |
|
dongluw
|
ebe56a0064
|
Small fix for Command-A-Vision (#23268)
Signed-off-by: donglu <donglu@cohere.com>
|
2025-08-20 18:15:18 +00:00 |
|
Russell Bryant
|
f77a0802b7
|
Limit HTTP header count and size (#23267)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
|
2025-08-20 17:57:37 +00:00 |
|
Benji Beck
|
c4477f55e5
|
Migrate Mistral3ImagePixelInputs to TensorSchema (#21945)
Signed-off-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-20 17:37:29 +00:00 |
|
Yong Hoon Shin
|
dfd2382039
|
[torch.compile] Support conditional torch.compile per module (#22269)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-08-20 16:52:59 +00:00 |
|
JartX
|
3b11b26b50
|
[FIXBUG] Allow disabling rocm_aiter_fa backend for ROCm GPUs not compatible with AITER (#22795)
Signed-off-by: JartX <sagformas@epdcenter.es>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-08-20 09:08:29 -07:00 |
|
Woosuk Kwon
|
d6d13bd49e
|
[Misc] Add max_seq_len to CommonAttentionMetadata (#23216)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-20 09:05:29 -07:00 |
|
Cyrus Leung
|
5efd6905bc
|
[CLI][Doc] Formalize --mm-encoder-tp-mode (#23190)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-20 23:42:28 +08:00 |
|
shixianc
|
b17109beea
|
[Kernel] CUTLASS MoE FP8: Integrate cuda moe permute/unpermute (#23045)
Signed-off-by: Shixian Cui <shixian@amazon.com>
|
2025-08-20 10:35:26 -04:00 |
|
Cyrus Leung
|
4449235843
|
[Bugfix] Ensure correctness of HCXVision processing (#23254)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-20 14:19:30 +00:00 |
|
rongfu.leng
|
38217877aa
|
[Fix] fix offline env use local mode path (#22526)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-08-20 13:34:49 +00:00 |
|
Jee Jee Li
|
c6d80a7a96
|
[Model] Improve olmo and olmo2 (#23228)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-20 12:47:05 +00:00 |
|
xyxinyang
|
7cd17e22d7
|
[Model][V1] Support Ernie MTP (#22169)
Signed-off-by: zhouchong <zhouchong03@baidu.com>
Co-authored-by: zhouchong <zhouchong03@baidu.com>
|
2025-08-20 20:41:55 +08:00 |
|
Michael Goin
|
50df09fe13
|
Update to flashinfer-python==0.2.12 and disable AOT compile for non-release image (#23129)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-20 08:05:54 -04:00 |
|
Cyrus Leung
|
68fcd3fa73
|
[Bugfix] Ensure correctness of Cohere2Vision processing (#23245)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-20 11:09:18 +00:00 |
|
Xin Yang
|
83e69a09d6
|
[Model] Support deepseek with eagle (#21086)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2025-08-20 19:01:31 +08:00 |
|
Shiming Zhang
|
3aa8c10038
|
Fix missing quotes (#23242)
Signed-off-by: Shiming Zhang <wzshiming@hotmail.com>
|
2025-08-20 10:46:59 +00:00 |
|
Calvin Chen
|
103f1ec8d3
|
[Model] use autoWeightsLoader for gptoss (#22446)
Signed-off-by: calvin chen <wen.chen@dynamia.ai>
|
2025-08-20 10:16:27 +00:00 |
|
who who who
|
d983769c41
|
fix cuda graph (#22721)
Signed-off-by: fsx950223 <fsx950223@outlook.com>
|
2025-08-20 06:24:37 +00:00 |
|
Nick Hill
|
8fd920924c
|
[BugFix] Fix stuck stats/metrics after requests are aborted (#22995)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-08-20 13:50:29 +08:00 |
|
Cyrus Leung
|
de7b67a023
|
[CI/Build] Sync multimodal tests (#23181)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-20 05:06:42 +00:00 |
|
Zhewen Li
|
f729023272
|
[CI/Build] Also check DP in benchmarks throughput script (#23038)
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-08-20 04:09:27 +00:00 |
|
길재은
|
1a3079a15e
|
chore: support pytorch format in lora (#22790)
Signed-off-by: jaeeun.kil <rha3122@naver.com>
Signed-off-by: 길재은 <rha3122@naver.com>
|
2025-08-20 04:02:50 +00:00 |
|
Louie Tsai
|
941f56858a
|
Fix a performance comparison issue in Benchmark Suite (#23047)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Li, Jiang <bigpyj64@gmail.com>
|
2025-08-20 03:14:32 +00:00 |
|
Zebing Lin
|
a634733f67
|
[Attention] Optimize make_local_attention_virtual_batches for Flash Attention (#23185)
Signed-off-by: linzebing <linzebing1995@gmail.com>
|
2025-08-20 02:57:47 +00:00 |
|
Cyrus Leung
|
64ab3c7253
|
[Doc] Update V1 status of various pooling models (#23189)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-20 10:33:41 +08:00 |
|
Chenheli Hua
|
e58c5a9768
|
[Core] Add torch profiler CPU traces for AsyncLLM. (#21794)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-08-20 02:32:47 +00:00 |
|