Yuxiang Liang
|
638a872d77
|
fix(xpu): Re-compute compile ranges after platform-specific config updates (#37523)
Signed-off-by: Yuxiang Liang <yuxiang.liang@intel.com>
Signed-off-by: Yuxiang Liang <yuliang@habana.ai>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-20 03:52:35 +00:00 |
|
Flora Feng
|
9040151fe1
|
[V0 Deprecation] Deprecate --disable-frontend-multiprocessing (#37612)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-20 11:31:43 +08:00 |
|
Jee Jee Li
|
8fbe3f303f
|
[Bugfix][LoRA] Fix Qwen35 LoRA (#36976)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-03-20 11:09:32 +08:00 |
|
Xiao
|
ea2c148fa7
|
[compile][graph_partition] Add tensor size handling (#36038)
Signed-off-by: Xiao Fu <xiaofu@meta.com>
|
2026-03-19 19:55:25 -07:00 |
|
Tianmu Li
|
47b7af0d87
|
[Feat] Enable CompressedTensorW4A8Int for XPU (#37207)
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
|
2026-03-20 02:34:28 +00:00 |
|
tianshu-Michael-yu
|
269bf46d99
|
fix: disambiguate multimodal prefix cache keys (#36708)
Signed-off-by: tianshu.yu <tianshuyu.formal@gmail.com>
|
2026-03-20 10:33:20 +08:00 |
|
Flora Feng
|
e5a77a5015
|
[CI] Update mergify tool-calling label paths (#37478)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-20 02:22:23 +00:00 |
|
Itay Alroy
|
ca1ac1a4b4
|
Fix DP coordinator ZMQ port TOCTOU (#37452)
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
|
2026-03-20 00:58:31 +00:00 |
|
Divakar Verma
|
4ca3fa6bb4
|
[ROCm][Bugfix] fix cache block size mismatch for aiter unified attention (#37606)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2026-03-20 00:00:08 +00:00 |
|
Flora Feng
|
be12afd284
|
[Bugfix] Fix Deepseekv32 tool parser when stream interval > 1 (#36056)
|
2026-03-19 19:51:25 -04:00 |
|
Wentao Ye
|
df3c0291a3
|
[Bug] Fix EmbedIOprocessor "classify" <-> "embed" (#37573)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-20 07:40:10 +08:00 |
|
Wentao Ye
|
2be1a0f74b
|
[Refactor] Remove dead code in pooling model (#37572)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-20 07:39:43 +08:00 |
|
Jim Smith
|
4120a05ff1
|
Fix AttributeError in Qwen3.5 GDN layers with quantized models (#37448)
Signed-off-by: Jim Smith <jim@joshua8.ai>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Xin Yang <105740670+xyang16@users.noreply.github.com>
|
2026-03-19 19:21:14 -04:00 |
|
rasmith
|
98ff042917
|
[CI][BugFix][AMD] Don't set VLLM_ROCM_USE_AITER anymore in test_rocm_aiter_topk since it's not necessary (#36996)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2026-03-20 07:12:45 +08:00 |
|
Artem Perevedentsev
|
b55156eae9
|
[Performance] Enable Triton autotuning disk cache by default (#37188)
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
|
2026-03-19 17:36:28 -04:00 |
|
Laith Sakka
|
112944fab9
|
test Qwen/Qwen3-4B-Instruct-2507 for unbacked (#36064)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2026-03-19 17:28:45 -04:00 |
|
bnellnm
|
91be5f9be3
|
[MoE Refactor] Rename "naive" all2all backend (#36294)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2026-03-19 15:50:34 -04:00 |
|
Aaron Hao
|
4ee847e400
|
Comment fix for async rl example (#35244)
Signed-off-by: hao-aaron <ahao@anyscale.com>
|
2026-03-19 19:46:07 +00:00 |
|
Andreas Karatzas
|
040a505ff5
|
[ROCm][CI] Cleaning and restructuring amd-ci legacy pipeline (#34839)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-19 14:30:58 -05:00 |
|
bnellnm
|
9279c59a0e
|
[MoE Refactor] DefaultMoERunner simplification (#33049)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2026-03-19 15:07:44 -04:00 |
|
Wentao Ye
|
7454096199
|
[Log] Log once in local node by default (#37568)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-19 12:04:59 -07:00 |
|
Andreas Karatzas
|
fb8b5e05fc
|
[CI] Add retry with 4x backoff to HTTP fetches for transient failures (#37218)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-19 19:00:20 +00:00 |
|
Harry Mellor
|
e5d96dc8fc
|
Fix SpeculatorsConfig now that PreTrainedConfig is a dataclass in Transformers (#37574)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-19 18:04:40 +00:00 |
|
EdalatiAli
|
daa05bf340
|
[Bugfix] Fix AttributeError when serving MXFP8 models with DeepGEMM installed (#37358)
Signed-off-by: EdalatiAli <aliedalati@cohere.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-19 17:58:33 +00:00 |
|
Lucas Kabela
|
7769b58307
|
[torch.compile][BE][Multimodal] Remove requirement to set_model_tag to avoid cache conflict (#37345)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-03-19 17:26:12 +00:00 |
|
Chauncey
|
2f9f946b22
|
[P/D] AnthropicMessages add kv_transfer_params for PD disaggregation (#37535)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-19 16:41:20 +00:00 |
|
Fadi Arafeh
|
2890aecce5
|
[CPU][UX] Do not crash when tcmalloc/libiomp are not ldpreloaded (#37561)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2026-03-19 16:35:45 +00:00 |
|
Harry Mellor
|
34f093b417
|
[CI] Gate pre-commit on ready label or number of contributions (#37544)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-19 16:21:57 +00:00 |
|
Harry Mellor
|
4dce8321a9
|
Run MacOS smoke test on daily cron job instead of every commit (#37567)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-19 16:19:50 +00:00 |
|
Cyrus Leung
|
657855ab41
|
[Misc] Cleanup more configs and processors (#37560)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-19 15:45:23 +00:00 |
|
Wei Zhao
|
e27b8ba3d1
|
[Bug] Fix fp8 trtllm MoE modular kernel supported routing methods (#37346)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
|
2026-03-19 11:43:06 -04:00 |
|
Woosuk Kwon
|
40b8363b45
|
[MRV2] Use fp32 for draft logits (#37526)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-19 08:41:21 -07:00 |
|
mikaylagawarecki
|
8b10e4fb31
|
[1/n] Migrate permute_cols to libtorch stable ABI (#31509)
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
|
2026-03-19 11:27:26 -04:00 |
|
Ifta khairul Alam Adil
|
104605cbf2
|
Remove deprecated reasoning_content message field (part-2) (#37480)
Signed-off-by: JartX <sagformas@epdcenter.es>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Philip Ottesen <phiott256@gmail.com>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: Andy Lo <andy@mistral.ai>
Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com>
Signed-off-by: sihao.li <sihao.li@intel.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: JartX <sagformas@epdcenter.es>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Philip Ottesen <phiott256@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Giancarlo Delfin <32987265+TheEpicDolphin@users.noreply.github.com>
Co-authored-by: Andy Lo <andy@mistral.ai>
Co-authored-by: Thillai Chithambaram <79466435+thillai-c@users.noreply.github.com>
Co-authored-by: sihao_li <165983188+1643661061leo@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-19 15:20:08 +00:00 |
|
Jee Jee Li
|
96266f119b
|
[LoRA] Minor improvements to LoRA log (#37557)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-03-19 15:18:06 +00:00 |
|
Sage Moore
|
7c0cf3bcd0
|
Cap the number of API servers to 1 when using Elastic EP. (#37466)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2026-03-19 10:42:57 -04:00 |
|
Harry Mellor
|
572b432913
|
Stop bench CLI from recursively casting all configs to dict (#37559)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-19 14:04:03 +00:00 |
|
Cyrus Leung
|
9515c20868
|
[Misc] Clean up processing logic (#37541)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-19 13:30:20 +00:00 |
|
DorBernsohn
|
c63ca2b2e6
|
[Bugfix] Add Kimi-K2.5 reasoning/tool parser aliases and tool_call_id support (#37438)
Signed-off-by: DorBernsohn <dor.bernsohn@gmail.com>
|
2026-03-19 21:08:00 +08:00 |
|
Harry Mellor
|
a32eaf5bb2
|
[CI] Merge cleanup_pr_body.yml and reminder_comment.yml (#37552)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-19 12:55:07 +00:00 |
|
XueLiang Yang
|
e390742c59
|
Fix KV Offloading + MLA AssertionError by using num_kv_heads=1 in cpu… (#37536)
Signed-off-by: xueliangyang-oeuler <yxl546827391@gmail.com>
Co-authored-by: xueliangyang-oeuler <yxl546827391@gmail.com>
|
2026-03-19 12:05:07 +00:00 |
|
Cyrus Leung
|
7a6ebcbfcf
|
[Model] Remove unnecessary get_language_model (#37545)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-19 20:00:36 +08:00 |
|
Cyrus Leung
|
c7bc12c20f
|
[CI/Build] Split out MM pooling tests (#37542)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-19 11:36:11 +00:00 |
|
wang.yuqi
|
f9e2a38386
|
[Docs] Reorganize pooling docs. (#35592)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-19 11:25:47 +00:00 |
|
Harry Mellor
|
4426447bba
|
Don't log exc_info when vLLM tries to download a file that doesn't exist (#37458)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-19 10:38:29 +00:00 |
|
Li, Jiang
|
3322e26420
|
[Bugfix] Avoid more OpenMP thread reallocation in CPU torch compile (#37538)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-03-19 10:24:39 +00:00 |
|
Cyrus Leung
|
765e461065
|
[Bugfix] Fix Nemotron Parse loading (#37407)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-19 09:55:29 +00:00 |
|
Duyi-Wang
|
6a9cceb219
|
[Bugfix][ROCm] Fix MoRI + AITER FP8 dispatch compatibility for defer_input_quant (#37418)
Signed-off-by: Duyi-Wang <duyi.wang@amd.com>
|
2026-03-19 09:49:27 +00:00 |
|
yassha
|
199f914183
|
fix(cpu): add null check for aligned_alloc in ScratchPadManager (#37369)
Signed-off-by: yassha <50112520+yassha@users.noreply.github.com>
|
2026-03-19 17:45:06 +08:00 |
|
Kunshang Ji
|
ca21483bf9
|
[MISC] fix pin_memory=torch.cuda.is_available(), use is_pin_memory_available (#37415)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-19 09:23:24 +00:00 |
|