Harry Mellor
|
e5d96dc8fc
|
Fix SpeculatorsConfig now that PreTrainedConfig is a dataclass in Transformers (#37574)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-19 18:04:40 +00:00 |
|
EdalatiAli
|
daa05bf340
|
[Bugfix] Fix AttributeError when serving MXFP8 models with DeepGEMM installed (#37358)
Signed-off-by: EdalatiAli <aliedalati@cohere.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-19 17:58:33 +00:00 |
|
Lucas Kabela
|
7769b58307
|
[torch.compile][BE][Multimodal] Remove requirement to set_model_tag to avoid cache conflict (#37345)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-03-19 17:26:12 +00:00 |
|
Chauncey
|
2f9f946b22
|
[P/D] AnthropicMessages add kv_transfer_params for PD disaggregation (#37535)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-19 16:41:20 +00:00 |
|
Fadi Arafeh
|
2890aecce5
|
[CPU][UX] Do not crash when tcmalloc/libiomp are not ldpreloaded (#37561)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2026-03-19 16:35:45 +00:00 |
|
Harry Mellor
|
34f093b417
|
[CI] Gate pre-commit on ready label or number of contributions (#37544)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-19 16:21:57 +00:00 |
|
Harry Mellor
|
4dce8321a9
|
Run MacOS smoke test on daily cron job instead of every commit (#37567)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-19 16:19:50 +00:00 |
|
Cyrus Leung
|
657855ab41
|
[Misc] Cleanup more configs and processors (#37560)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-19 15:45:23 +00:00 |
|
Wei Zhao
|
e27b8ba3d1
|
[Bug] Fix fp8 trtllm MoE modular kernel supported routing methods (#37346)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
|
2026-03-19 11:43:06 -04:00 |
|
Woosuk Kwon
|
40b8363b45
|
[MRV2] Use fp32 for draft logits (#37526)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-19 08:41:21 -07:00 |
|
mikaylagawarecki
|
8b10e4fb31
|
[1/n] Migrate permute_cols to libtorch stable ABI (#31509)
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
|
2026-03-19 11:27:26 -04:00 |
|
Ifta khairul Alam Adil
|
104605cbf2
|
Remove deprecated reasoning_content message field(part-2) (#37480)
Signed-off-by: JartX <sagformas@epdcenter.es>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Philip Ottesen <phiott256@gmail.com>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: Andy Lo <andy@mistral.ai>
Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com>
Signed-off-by: sihao.li <sihao.li@intel.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: JartX <sagformas@epdcenter.es>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Philip Ottesen <phiott256@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Giancarlo Delfin <32987265+TheEpicDolphin@users.noreply.github.com>
Co-authored-by: Andy Lo <andy@mistral.ai>
Co-authored-by: Thillai Chithambaram <79466435+thillai-c@users.noreply.github.com>
Co-authored-by: sihao_li <165983188+1643661061leo@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-19 15:20:08 +00:00 |
|
Jee Jee Li
|
96266f119b
|
[LoRA] Minor improvements to LoRA log (#37557)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-03-19 15:18:06 +00:00 |
|
Sage Moore
|
7c0cf3bcd0
|
Cap the number of API servers to 1 when using Elastic EP. (#37466)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2026-03-19 10:42:57 -04:00 |
|
Harry Mellor
|
572b432913
|
Stop bench CLI from recursively casting all configs to dict (#37559)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-19 14:04:03 +00:00 |
|
Cyrus Leung
|
9515c20868
|
[Misc] Clean up processing logic (#37541)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-19 13:30:20 +00:00 |
|
DorBernsohn
|
c63ca2b2e6
|
[Bugfix] Add Kimi-K2.5 reasoning/tool parser aliases and tool_call_id support (#37438)
Signed-off-by: DorBernsohn <dor.bernsohn@gmail.com>
|
2026-03-19 21:08:00 +08:00 |
|
Harry Mellor
|
a32eaf5bb2
|
[CI] Merge cleanup_pr_body.yml and reminder_comment.yml (#37552)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-19 12:55:07 +00:00 |
|
XueLiang Yang
|
e390742c59
|
Fix KV Offloading + MLA AssertionError by using num_kv_heads=1 in cpu… (#37536)
Signed-off-by: xueliangyang-oeuler <yxl546827391@gmail.com>
Co-authored-by: xueliangyang-oeuler <yxl546827391@gmail.com>
|
2026-03-19 12:05:07 +00:00 |
|
Cyrus Leung
|
7a6ebcbfcf
|
[Model] Remove unnecessary get_language_model (#37545)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-19 20:00:36 +08:00 |
|
Cyrus Leung
|
c7bc12c20f
|
[CI/Build] Split out MM pooling tests (#37542)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-19 11:36:11 +00:00 |
|
wang.yuqi
|
f9e2a38386
|
[Docs] Reorganize pooling docs. (#35592)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-19 11:25:47 +00:00 |
|
Harry Mellor
|
4426447bba
|
Don't log exc_info when vLLM tries to download a file that doesn't exist (#37458)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-19 10:38:29 +00:00 |
|
Li, Jiang
|
3322e26420
|
[Bugfix] Avoid more OpenMP thread reallocation in CPU torch compile (#37538)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-03-19 10:24:39 +00:00 |
|
Cyrus Leung
|
765e461065
|
[Bugfix] Fix Nemotron Parse loading (#37407)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-19 09:55:29 +00:00 |
|
Duyi-Wang
|
6a9cceb219
|
[Bugfix][ROCm] Fix MoRI + AITER FP8 dispatch compatibility for defer_input_quant (#37418)
Signed-off-by: Duyi-Wang <duyi.wang@amd.com>
|
2026-03-19 09:49:27 +00:00 |
|
yassha
|
199f914183
|
fix(cpu): add null check for aligned_alloc in ScratchPadManager (#37369)
Signed-off-by: yassha <50112520+yassha@users.noreply.github.com>
|
2026-03-19 17:45:06 +08:00 |
|
Kunshang Ji
|
ca21483bf9
|
[MISC] fix pin_memory=torch.cuda.is_available(), use is_pin_memory_available (#37415)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-19 09:23:24 +00:00 |
|
TJian
|
da70c87e81
|
[CI] Fix wrong path test file, missing rlhf_async_new_apis.py (#37532)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-03-19 02:21:55 -07:00 |
|
Collin McCarthy
|
0b6d52629f
|
Support temporal compression for Nemotron-3-VL videos (#36808)
Signed-off-by: Collin McCarthy <cmccarthy@nvidia.com>
|
2026-03-19 08:02:19 +00:00 |
|
Ziming Huang
|
d3cc379567
|
[Perf] Fix slow hasattr in CUDAGraphWrapper.__getattr__ (#37425)
Signed-off-by: 智鸣 <hzm414167@alibaba-inc.com>
|
2026-03-19 15:43:48 +08:00 |
|
cdpath
|
354cd580d5
|
fix(anthropic): remove non-standard 'data: [DONE]' from Anthropic streaming (#37510)
Signed-off-by: cdpath <cdpath@outlook.com>
|
2026-03-19 07:23:35 +00:00 |
|
zhanqiuhu
|
d49f273144
|
[SSM/Mamba] Follow-up: N-1 prefill for P/D disaggregation (#37310)
|
2026-03-19 08:22:00 +01:00 |
|
Flora Feng
|
b21d384304
|
[Refactor] Relocate endpoint tests to mirror serving code directory structure (#37504)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-19 07:19:36 +00:00 |
|
Hongxia Yang
|
e3126cd107
|
[ROCm] issue management - request information for bug issues on ROCm (#37009)
Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com>
|
2026-03-19 03:51:29 +00:00 |
|
Wentao Ye
|
e37ff5b5c8
|
[Perf] Optimize token_embed for pooling models, 1.0% token throughput improvement (#37347)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-19 10:27:51 +08:00 |
|
Aaron Hao
|
6accb21f2a
|
[bug] Fix deadlock with pause resume and collective_rpc (#37024)
Signed-off-by: hao-aaron <ahao@anyscale.com>
|
2026-03-19 01:49:02 +00:00 |
|
Giancarlo Delfin
|
053f3b6309
|
[Model Runner V2] Spec decode rejection sampler logprobs support (#37237)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
|
2026-03-19 01:36:27 +00:00 |
|
Aaron Hao
|
5f82706a21
|
[BUG] Exclude SKIP_TENSORS from get_layer_size() + new weight sync example for dpep (#37334)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2026-03-19 00:45:10 +00:00 |
|
Sage Moore
|
c32a58cc2a
|
[EPLB] Simplify EPLB rearrange by only returning one map (#36267)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2026-03-18 20:34:00 -04:00 |
|
Elvir Crnčević
|
ef2c4f778d
|
[Bugfix] Zero-init MLA attention output buffers to prevent NaN from CUDA graph padding (#37442)
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-19 00:28:37 +00:00 |
|
sihao_li
|
9dade5da3a
|
[XPU] Unify xpu test dependencies in dockerfile.xpu (#36477)
Signed-off-by: sihao.li <sihao.li@intel.com>
|
2026-03-19 08:12:07 +08:00 |
|
Thillai Chithambaram
|
828f862acb
|
[Bugfix] Expand quantization method support in perf metrics (#37231)
Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com>
|
2026-03-18 23:54:19 +00:00 |
|
Andy Lo
|
577df69b26
|
[Bugfix] Fix KV scales inconsistency in fp8 MLA & FlashInfer kv_cache_dtype "auto" leading to gibberish (#37054)
Signed-off-by: Andy Lo <andy@mistral.ai>
|
2026-03-18 23:07:29 +00:00 |
|
Giancarlo Delfin
|
04244fd0e1
|
[Model Runner V2] Spec decode rejection sampler greedy support (#37238)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
|
2026-03-18 15:59:03 -07:00 |
|
Michael Goin
|
9482b0b085
|
[Bugfix] Remove assertion for NVFP4 scale dynamic range (#37465)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2026-03-18 15:37:49 -07:00 |
|
Woosuk Kwon
|
5bc1da147f
|
[LoRA][BugFix] Fix skipped LoRA adapters for Mistral3 (#36928)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-18 22:34:19 +00:00 |
|
Philip Ottesen
|
0091017188
|
fix(worker): optimize swap_states to copy only active token prefixes (#34733)
Signed-off-by: Philip Ottesen <phiott256@gmail.com>
|
2026-03-18 14:59:27 -07:00 |
|
Wentao Ye
|
0d81a1fe61
|
[V0 Deprecation] Deprecate virtual engine (#37195)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-18 14:30:14 -07:00 |
|
Netanel Haber
|
6ae4c8d6fc
|
chunk parakeet into 30s clips to prevent OOMs on long audios (#36671)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2026-03-18 14:22:24 -07:00 |
|