youkaichao
361a7463d3
fix m2 test (#27536)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-10-27 01:04:36 +08:00

Roger Young
720af6ab79
[Model][MiniMax-M2] Support MiniMax-M2 Model (#27535)
Signed-off-by: xuebi <xuebi@minimaxi.com>
Co-authored-by: xuebi <xuebi@minimaxi.com>
2025-10-27 00:59:11 +08:00

Cyrus Leung
55cba4a05c
[CI/Build] Update causal-conv1d installation (#27529)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-26 22:14:22 +08:00

Cyrus Leung
c7abff2990
Revert "[CI/Build] Use CPU for mm processing test on CI (#27522)" (#27531)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-26 04:44:27 -07:00

Yeshwanth N
71b1c8b667
[Chore]:Extract math and argparse utilities to separate modules (#27188)
Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com>
Signed-off-by: Yeshwanth N <yeshsurya@gmail.com>
Signed-off-by: yeshsurya <yeshsurya@gmail.com>
2025-10-26 04:03:32 -07:00

Cyrus Leung
8fb7b2fab9
[Doc] Fix links to GH projects (#27530)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-26 17:55:51 +08:00

Cyrus Leung
be7b55a83d
[Doc] Remove Molmo warning (#27527)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-26 16:22:52 +08:00

Lucia Fang
315b860abe
[bugfix]fix empty prompts for async-engine mode in benchmark throughput (#27494)
Signed-off-by: Lucia Fang <fanglu@fb.com>
2025-10-26 08:16:35 +00:00

rongfu.leng
87c41c26ad
[Bugfix] Fix processor initialization for model from modelscope instead of HF (#27461)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-26 07:44:31 +00:00

JartX
65d2cf9511
[BUGFIX][ROCM] ViT FlashAttention on ROCm (no GFX9) and contiguous on qwen3vl ROCm TORCH_SDPA (#27190)
Signed-off-by: JartX <sagformas@epdcenter.es>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-10-26 15:08:52 +08:00

Isotr0py
d63cd9ff10
[CI/Build] Use CPU for mm processing test on CI (#27522)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-26 13:09:18 +08:00

Cyrus Leung
66a168a197
[CI/Build] Refactor processing tests (#27470)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-25 16:14:30 +00:00

Matthew Bonanni
a99564ac5b
[Attention] Add missing kv cache scale setup (#27490)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-10-25 00:12:49 -07:00

Cyrus Leung
4c5f632165
[Misc] Simplify max tokens in multimodal registry (#27500)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-24 23:56:01 -07:00

Kuntai Du
b853540388
[Core][Hybrid allocator + kv connector 1/n] Enable hybrid allocator + KV cache connector (#25712)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
2025-10-24 23:34:18 -07:00

Zhuohan Li
56ed7609a9
Revert "[Misc] Remove use of CUDA_VISIBLE_DEVICES for device selectio… (#27502)
2025-10-25 05:31:43 +00:00

Jiangyun Zhu
29c9cb8007
[CI] Add tests for cudagraph (#27391)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-10-25 02:37:33 +00:00

Yihua Cheng
83f478bb19
[KVConnector] Migrate the LMCache integration code to be vLLM native (#25542)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
v0.11.1rc3
2025-10-25 00:23:53 +00:00

Varun Sundar Rabindranath
269c4db0a4
[Misc][DP] Guard mxfp4 implementation selection (#27484)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-10-24 23:29:24 +00:00

Wentao Ye
52efc34ebf
[Log] Optimize Startup Log (#26740)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-24 19:27:04 -04:00

Pengchao Wang
d95d0f4b98
[Distributed] Basic set of configuration for large EP deployment on GB200 (#27328)
Signed-off-by: Pengchao Wang <wpc@fb.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
2025-10-24 14:16:44 -07:00

Lehua Ding
0402428200
[Perf][Async Scheduling] Remove CPU->GPU sync in dummy_run (#27455)
Signed-off-by: Lehua Ding <lehuading@tencent.com>
2025-10-24 20:45:36 +00:00

jinghanhu
17af6aa0da
[Document] Add ms-swift library to rlhf.md (#27469)
2025-10-24 20:31:50 +00:00

Zhewen Li
fc168c33f3
[CI/Build] Fix test_torch_utils in AMD CI (#27317)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-10-24 12:26:00 -07:00

Isotr0py
acc78aeb88
[Bugfix] Fix interns1-vit qk norm code path (#27480)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-24 17:43:45 +00:00

Ming Yang
0f67d4d962
[Attention] Add MLA prefill backend: trtllm_ragged_attention_deepseek (#26397)
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-10-24 10:24:08 -07:00

kourosh hakhamaneshi
7e1d697b56
[Bugfix] Fix MultiConnector stats reconstruction across process boundaries (#27366)
Signed-off-by: Kourosh Hakhamaneshi <Kourosh@anyscale.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2025-10-24 17:08:05 +00:00

Chendi.Xue
699d62e6cf
[NIXL][BUGFIX] delay done_recving queue cleanup to bottom of get_finished (#27297)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
2025-10-24 17:01:41 +00:00

Richard Zou
cd390b609d
[compile] Turn standalone_compile back on (#27460)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-10-24 16:30:27 +00:00

Fadi Arafeh
2080b05099
[cpu][fix] Fix onednn_mm crash on consecutive matmuls with same M,K,N and different dtype (#27472)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-10-24 15:57:48 +00:00

Lifans
6454afec90
[Doc] Fix minor issues in docs/design/metrics.md (#27436)
Signed-off-by: Lifan Shen <lifans@meta.com>
2025-10-24 05:40:54 -07:00

Chauncey
41a62564a7
Fix test named tool use (#27458)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-24 20:27:45 +08:00

fhl2000
284cc92275
[MISC] cudagraph_capture_sizes related improvements (#26016)
Signed-off-by: fhl <2410591650@qq.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-24 05:11:05 -07:00

ioana ghiban
435be10db9
Fix AArch64 CPU Docker pipeline (#27331)
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>
2025-10-24 05:11:01 -07:00

Cyrus Leung
b7030d962b
[Benchmark] Enable benchmark to run with encoding_format="bytes" (#27467)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-24 11:16:50 +00:00

Chauncey
3567816932
[Refactor] move tool parsing logic from protocol.py to the tool parser (#27383)
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
2025-10-24 09:53:23 +00:00

22quinn
e0ef8a2920
[BugFix] Fix torchrun DP with LLM class (#27395)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-10-24 08:11:37 +00:00

Isotr0py
42efe609ba
[MM][Bugfix] Replace PatchEmbed's conv3d to linear layer (#27418)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-10-24 07:32:47 +00:00

Yu Jiaqi
88d3141ec6
[Docs] remove v1 column for embedding models (#27446)
Signed-off-by: piood <2477084691@qq.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-23 23:55:03 -07:00

Rui Qiao
09a6a49eaf
[Misc] Avoid "PyTorch non-writable tensors" warning in RayPPCommunicator (#27443)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-10-24 14:53:09 +08:00

strinczer
074475541a
[Bugfix] Fix Pydantic union resolution for ResponseFunctionToolCall in Responses API (#26706)
Signed-off-by: Shai Trinczer <strinczer@icloud.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-10-23 22:53:42 -07:00

Aaron Pham
d4c574c39f
[Chore] remove structural tags logging lines (#27451)
2025-10-24 05:35:45 +00:00

usberkeley
c528b9006a
Fix EventPublisherFactory logic for disabled KV cache events (#27419)
Signed-off-by: Bradley <bradley.b.pitt@gmail.com>
2025-10-24 05:00:01 +00:00

fhl2000
85fee74b33
[Bugfix][CI] Move resolving cudagraph_mode before initializing attn_metadata_builder (#27427)
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
2025-10-23 20:31:14 -07:00

hfan
8dbe0c527f
[Misc] Add TPU usage report when using tpu_inference. (#27423)
Signed-off-by: Hongmin Fan <fanhongmin@google.com>
2025-10-23 20:29:37 -07:00

Xiangyu Li
5cc6bddb6e
[Kernel] Add GPTQv2 format support for low-bit or asymmetric quantization, by adapting gptq_gemm (#26092)
2025-10-23 23:26:13 -04:00

Harry Mellor
1f9460c4c1
Fix pooling adapters for Transformers backend (#27338)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-23 20:23:55 -07:00

xiao-llm
70022ffc00
Granite 4.0 quark quantization support (#26944)
Signed-off-by: Xiao YU <Xiao.YU@xilinx.com>
Signed-off-by: Xiao Yu <xiao.yu.dc@outlook.com>
Co-authored-by: Xiao YU <Xiao.YU@xilinx.com>
2025-10-24 02:14:03 +00:00

Akash kaothalkar
f417746ad7
[Hardware][POWERPC] Disable oneDNN path in vllm/model_executor/layers/utils.py for Powerpc (#27422)
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
2025-10-23 21:21:36 +00:00

Yu Jiaqi
0552cfb195
[Model] Siglip Embedding Support (#27324)
Signed-off-by: piood <2477084691@qq.com>
2025-10-23 20:19:48 +00:00