Nick Hill
|
32f4e4db00
|
[Cleanup] Remove deprecated fields from CachedRequestData class (#31734)
Signed-off-by: njhill <nickhill123@gmail.com>
|
2026-01-05 21:07:14 +00:00 |
|
amitz-nv
|
ee21291825
|
[Model] Nemotron Parse 1.1 Support (#30864)
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-01-05 13:00:14 -08:00 |
|
Qidong Su
|
af1b07b0c5
|
[docker] install cuda13 version of lmcache and nixl (#30913)
Signed-off-by: Qidong Su <soodoshll@gmail.com>
|
2026-01-05 12:50:39 -08:00 |
|
gnovack
|
c77a993cc2
|
pin lora_b moe weights on cpu (#31317)
Signed-off-by: gnovack <gnovack@amazon.com>
|
2026-01-05 12:15:40 -08:00 |
|
Roberto L. Castro
|
fdcc5176be
|
[BugFix] Fix architecture flags to prevent issues on SM103 (#31150)
Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com>
|
2026-01-05 20:11:35 +00:00 |
|
Wang Kunpeng
|
5708297e4e
|
[Misc][Model][Refactor] Pass the prefix into Linear layers (#31669)
Signed-off-by: Wang Kunpeng <1289706727@qq.com>
|
2026-01-05 20:03:18 +00:00 |
|
baonudesifeizhai
|
02dbb933cb
|
Fix GLM-4.6v flash tool calling in transformers 5.x (#31622)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
|
2026-01-05 11:32:43 -08:00 |
|
Isotr0py
|
51e38a8e30
|
[Misc] Enable Paligemma's PrefixLM attention mask computation (#31725)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-06 03:31:49 +08:00 |
|
Or Ozeri
|
d8e38d4939
|
Triton Attention: Support cross-layers blocks (#30687)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-01-05 19:29:16 +00:00 |
|
kzwrime
|
21156ff199
|
[Bugfix] Add missing extra_tensors arg to DeviceCommunicatorBase.disp… (#31644)
Signed-off-by: kunzh <zhikun.wu@outlook.com>
|
2026-01-06 01:26:09 +08:00 |
|
RickyChen / 陳昭儒
|
c455b771fd
|
[Bugfix][CPU] Fix RotaryEmbedding fallback causing gibberish with --enforce-eager (#31643)
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>
|
2026-01-06 01:25:38 +08:00 |
|
Michael Goin
|
eefa713a66
|
[CI Failure] Disable B200 tests while runner is broken (#31732)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-05 08:50:51 -08:00 |
|
Kevin Šuc
|
79ed460dd5
|
[Frontend] [Doc] Exclude log deltas feature (#30322)
Signed-off-by: Catacomba <kevinsuc16@gmail.com>
Signed-off-by: Kevin Šuc <kevinsuc16@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-01-05 16:34:35 +00:00 |
|
Isotr0py
|
6aa5b18e1d
|
[v1] Add encoder-only/cross attention support to Triton Attention backend (#31406)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-06 00:00:23 +08:00 |
|
wang.yuqi
|
911d38ed99
|
[Model] Let more models to support the score template. (#31335)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-01-05 11:54:26 +00:00 |
|
zzzzwwjj
|
caaa482aca
|
[platform] Support additional forward context for OOT (#31674)
Signed-off-by: zzzzwwjj <1183291235@qq.com>
Signed-off-by: zzzzwwjj <34335947+zzzzwwjj@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-01-05 10:25:13 +00:00 |
|
Yihua Cheng
|
b471aad41f
|
[KVconnector][LMCache] remove the import of legacy LMCache code (#31704)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
|
2026-01-05 10:11:01 +00:00 |
|
Jee Jee Li
|
d5503ca7f9
|
[LoRA] LoRA PDL improvement (#31660)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-01-05 08:28:46 +00:00 |
|
Qiping Pan
|
a2ad15c070
|
[Model] Enable LoRA support for BLIP2 (#31620)
Signed-off-by: Qiping Pan <panqiping@outlook.com>
|
2026-01-05 08:02:24 +00:00 |
|
Tres
|
3133c192a3
|
[ROCM] Reorder arguments and rename parameters for rope_cached_thd_positions_2c_fwd_inplace (#29993)
Signed-off-by: Tres Popp <tres.popp@amd.com>
|
2026-01-05 15:37:57 +08:00 |
|
wang.yuqi
|
76fd458aa7
|
[CI] Bump sentence-transformer from 3.2.1 to 5.2.0 (#31664)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-01-04 21:45:01 -08:00 |
|
cjackal
|
e2701cc525
|
[Frontend] [Bugfix] respect server-level default chat template kwargs in reasoning parser (#31581)
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2026-01-05 05:42:47 +00:00 |
|
Tyler Michael Smith
|
fe8a9fbd2e
|
[Bugfix] Fix EPLB state logging error (#31455)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2026-01-05 04:06:28 +00:00 |
|
Ning Xie
|
98b8b3abaa
|
[log] enable max_log_len trim only when needed (#31482)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2026-01-05 03:55:43 +00:00 |
|
CHENYUE
|
346e56455a
|
Add chat prefix completion feature to DeepSeek v3.2 (#31147)
|
2026-01-05 11:20:25 +08:00 |
|
wang.yuqi
|
8be6432bda
|
[CI Failure] Fix NomicBert max_model_len validation (#31662)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-01-05 11:06:52 +08:00 |
|
Nick Hill
|
43e3f8e4a9
|
[Misc] Various code simplifications (#31666)
Signed-off-by: njhill <nickhill123@gmail.com>
|
2026-01-04 18:35:56 -08:00 |
|
wangxiyuan
|
bb4337b34c
|
[Platform] Deprecate seed_everything (#31659)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2026-01-04 18:34:04 -08:00 |
|
Isotr0py
|
367856de14
|
[CI/Build] Revive skipped reward models e2e test (#31665)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-05 02:33:46 +00:00 |
|
Nick Hill
|
da436f868a
|
[Minor] Small pooler output processing optimization (#31667)
Signed-off-by: njhill <nickhill123@gmail.com>
|
2026-01-04 18:33:12 -08:00 |
|
Jee Jee Li
|
f099cd557a
|
[Bugfix] Fix AttributeError: 'Stream' object has no attribute 'dp_size' (#31663)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-01-05 02:31:31 +00:00 |
|
Andreas Karatzas
|
f2b6dfd237
|
[ROCm][CI] Fix language generation test accuracy by disabling HF flash_sdp and mem_efficient_sdp (#31597)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-05 02:17:05 +00:00 |
|
Andreas Karatzas
|
89f1f25310
|
[CI] Skip Phi-MoE test due to old API util (#31632)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-05 08:52:07 +08:00 |
|
Nick Hill
|
b53b89fdb3
|
[BugFix] Async scheduling: handle model forward errors more cleanly (#31611)
Signed-off-by: njhill <nickhill123@gmail.com>
|
2026-01-04 11:04:37 -08:00 |
|
Ning Xie
|
6522721d17
|
[misc] Sort uvicorn log level description according to verbosity (#31137)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2026-01-04 18:45:37 +00:00 |
|
Yuxuan Zhang
|
0d4044edd8
|
fix no think of GLM-4.5 / GLM-4.7 (#31449)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
|
2026-01-04 11:43:00 +08:00 |
|
Reagan Lee
|
41ab179738
|
[Docs] Fix argparse include path for mm-processor benchmark (#31654)
Signed-off-by: Reagan <reaganjlee@gmail.com>
|
2026-01-04 03:31:29 +00:00 |
|
Robert Shaw
|
268b1c55ad
|
[MoE Refactor][13/N] Convert FI to Use PFNoEP (#31533)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-01-03 12:26:36 -08:00 |
|
Andreas Karatzas
|
4f9ce35afe
|
[CI][Bugfix] Fix token counting in chunked prefill compl test (#31630)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-03 14:28:49 +08:00 |
|
jeremyteboul
|
97a01308e9
|
Improve HF qwen3_omni: preserve audio_sample_rate in kwargs restructuring (#29255)
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>
|
2026-01-03 04:31:09 +00:00 |
|
Xingyu Liu
|
0eee877f67
|
[Core] Parse vLLM engine required fields from hf_config to model_arch_config (#28454)
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com>
|
2026-01-02 15:13:15 -08:00 |
|
Alfred
|
a0e9ee83c7
|
[Benchmark] Fix OOM during MoE kernel tuning for large models (#31604)
Signed-off-by: Alfred <massif0601@gmail.com>
|
2026-01-02 22:24:51 +00:00 |
|
Yongye Zhu
|
a3f2f40947
|
[MoE Refactor] Explicit construct mk for flashinfer bf16 kernel (#31504)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-01-02 13:54:50 -08:00 |
|
Yongye Zhu
|
5a468ff7c7
|
[MoE Refactor] Split invoke_fused_moe_kernel (#31050)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-01-02 13:47:15 -08:00 |
|
Andreas Karatzas
|
6ef770df7c
|
[MoE] Fix output_shape calculation in Attention layer to handle 3D query inputs (#31596)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-02 15:46:23 +00:00 |
|
Nick Hill
|
bd877162eb
|
[BugFix] Support online dense model DP without overhead (#30739)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: njhill <nickhill123@gmail.com>
|
2026-01-02 23:36:38 +08:00 |
|
Xinyu Chen
|
08f425bad1
|
CustomOp: test forward dispatch for grouped_topk (#31530)
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
|
2026-01-02 10:04:01 -05:00 |
|
labAxiaoming
|
a01f2faedf
|
Add multimodal input method in the documentation (#31601)
Signed-off-by: xiaoming <1259730330@qq.com>
|
2026-01-02 12:43:30 +00:00 |
|
Kyuyeun Kim
|
cc410e8644
|
[Bugfix] Fix weight_loader v1 block scale (#31103)
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
|
2026-01-02 13:14:10 +08:00 |
|
Kevin McKay
|
825c2dc133
|
[Bugfix][Hardware][AMD] Fix last_page_len calculation in AITER MLA decode (#31282)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
|
2026-01-01 21:14:00 -08:00 |
|