Hongxia Yang
|
e3126cd107
|
[ROCm] issue management - request information for bug issues on ROCm (#37009)
Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com>
|
2026-03-19 03:51:29 +00:00 |
|
Wentao Ye
|
e37ff5b5c8
|
[Perf] Optimize token_embed for pooling models, 1.0% token throughput improvement (#37347)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-19 10:27:51 +08:00 |
|
Aaron Hao
|
6accb21f2a
|
[bug] Fix deadlock with pause resume and collective_rpc (#37024)
Signed-off-by: hao-aaron <ahao@anyscale.com>
|
2026-03-19 01:49:02 +00:00 |
|
Giancarlo Delfin
|
053f3b6309
|
[Model Runner V2] Spec decode rejection sampler logprobs support (#37237)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
|
2026-03-19 01:36:27 +00:00 |
|
Aaron Hao
|
5f82706a21
|
[BUG] Exclude SKIP_TENSORS from get_layer_size() + new weight sync example for dpep (#37334)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2026-03-19 00:45:10 +00:00 |
|
Sage Moore
|
c32a58cc2a
|
[EPLB] Simplify EPLB rearrange by only returning one map (#36267)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2026-03-18 20:34:00 -04:00 |
|
Elvir Crnčević
|
ef2c4f778d
|
[Bugfix] Zero-init MLA attention output buffers to prevent NaN from CUDA graph padding (#37442)
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-19 00:28:37 +00:00 |
|
sihao_li
|
9dade5da3a
|
[XPU] Unify xpu test dependencies in dockerfile.xpu (#36477)
Signed-off-by: sihao.li <sihao.li@intel.com>
|
2026-03-19 08:12:07 +08:00 |
|
Thillai Chithambaram
|
828f862acb
|
[Bugfix] Expand quantization method support in perf metrics (#37231)
Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com>
|
2026-03-18 23:54:19 +00:00 |
|
Andy Lo
|
577df69b26
|
[Bugfix] Fix KV scales inconsistency in fp8 MLA & FlashInfer kv_cache_dtype "auto" leading to gibberish (#37054)
Signed-off-by: Andy Lo <andy@mistral.ai>
|
2026-03-18 23:07:29 +00:00 |
|
Giancarlo Delfin
|
04244fd0e1
|
[Model Runner V2] Spec decode rejection sampler greedy support (#37238)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
|
2026-03-18 15:59:03 -07:00 |
|
Michael Goin
|
9482b0b085
|
[Bugfix] Remove assertion for NVFP4 scale dynamic range (#37465)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2026-03-18 15:37:49 -07:00 |
|
Woosuk Kwon
|
5bc1da147f
|
[LoRA][BugFix] Fix skipped LoRA adapters for Mistral3 (#36928)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-18 22:34:19 +00:00 |
|
Philip Ottesen
|
0091017188
|
fix(worker): optimize swap_states to copy only active token prefixes (#34733)
Signed-off-by: Philip Ottesen <phiott256@gmail.com>
|
2026-03-18 14:59:27 -07:00 |
|
Wentao Ye
|
0d81a1fe61
|
[V0 Deprecation] Deprecate virtual engine (#37195)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-18 14:30:14 -07:00 |
|
Netanel Haber
|
6ae4c8d6fc
|
chunk parakeet into 30s clips to prevent OOMs on long audios (#36671)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2026-03-18 14:22:24 -07:00 |
|
JartX
|
a913b612d8
|
[Bugfix] Fix ROCm crash in qwen3_next multi-stream events (#36795) (#37427)
Signed-off-by: JartX <sagformas@epdcenter.es>
|
2026-03-18 16:06:31 -04:00 |
|
Harry Mellor
|
5ce2d10e4a
|
Fix models which use layer_type_validation for Transformers v5 (#37398)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-18 18:41:51 +00:00 |
|
Chengyu Fang
|
738d0a281f
|
[Bugfix] Fix incorrect use of merge_size in Qwen3-VL video timestamp calculation (#37439)
Signed-off-by: chengyufang <cnyvfang@outlook.com>
|
2026-03-18 11:36:34 -07:00 |
|
youkaichao
|
70b81c4f3d
|
[bugfix][async scheduling] fix extra cuda context in device 0 with EP/DP (#37449)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2026-03-18 18:32:30 +00:00 |
|
Cyrus Leung
|
7476d148db
|
[Model] Remove unnecessary processor definition for Nemotron Parse (#37456)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-18 18:25:13 +00:00 |
|
Cyrus Leung
|
f3732bd931
|
[Misc] Clean up model registry (#37457)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-18 18:24:44 +00:00 |
|
Wentao Ye
|
0ef7f79054
|
[Perf] Add tuned triton moe config for Qwen3.5 H200, 9.9% E2E throughput improvement (#37340)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-18 14:18:34 -04:00 |
|
Or Ozeri
|
5dd8df0701
|
[kv_offload+HMA][2/N]: Support multiple KV groups in GPULoadStoreSpec (#36642)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-18 19:26:40 +02:00 |
|
Harry Mellor
|
39bfb57b7c
|
Add API docs link if the CLI arg is a config class (#37432)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-18 17:19:35 +00:00 |
|
RonaldBXu
|
c9d838fc33
|
Adding deterministic lora benchmarking to vLLM Bench (#36057)
Signed-off-by: Ubuntu <ubuntu@ip-172-31-43-201.ap-northeast-1.compute.internal>
Signed-off-by: Ronald Xu <ronaldxu@amazon.com>
|
2026-03-18 16:02:03 +00:00 |
|
Xin Yang
|
b1169d7be8
|
[Kernel] Add gpt-oss Router GEMM kernel (#37205)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-03-18 08:15:56 -07:00 |
|
XLiu-2000
|
17808394bc
|
standardize load_weights using AutoWeightsLoader for kimi_linear and minimax_text_01 (#37371)
Signed-off-by: XuLiu <xuliu40@gmail.com>
Co-authored-by: XuLiu <xuliu40@gmail.com>
|
2026-03-18 15:05:37 +00:00 |
|
elvischenv
|
296839a1b0
|
[Perf] Eliminate padding and slicing op for GPT-OSS with Flashinfer MXFP4 MXFP8 MoE (#30647)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2026-03-18 15:01:26 +00:00 |
|
Wentao Ye
|
c373b5c00d
|
[Log] Reduce duplicate log (#37313)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-18 10:57:44 -04:00 |
|
Itay Alroy
|
de1a86b7de
|
elastic_ep: Fix stateless group port races (#36330)
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
|
2026-03-18 14:36:18 +00:00 |
|
Cyrus Leung
|
99267c23ca
|
[2/3] Refactor InternVL-based processors (#37324)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-18 22:22:19 +08:00 |
|
Or Ozeri
|
525f2eeb0b
|
[kv_offload+HMA][6/N]: Split offloading_connector.py (#37405)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-18 14:42:46 +01:00 |
|
Yufeng He
|
918b7890a1
|
[Bugfix] Fix base64 JPEG video frames returning empty metadata (#37301)
Signed-off-by: Yufeng He <40085740+universeplayer@users.noreply.github.com>
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Yufeng He <40085740+universeplayer@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-18 13:40:03 +00:00 |
|
Andy Lo
|
98b09ddc27
|
[NIXL][Bugfix] metrics & testing minor bug (#36051)
Signed-off-by: Andy Lo <andy@mistral.ai>
|
2026-03-18 14:39:14 +01:00 |
|
Shwetha Poojary
|
cef1f302d2
|
[Model] Enable LoRA support for tower and connector in H2OVL (#31696)
Signed-off-by: shwetha-s-poojary <shwetha.s-poojary@ibm.com>
|
2026-03-18 13:26:47 +00:00 |
|
Elvir Crnčević
|
17c47fb869
|
[Bugfix] Fix EP weight filter breaking EPLB and NVFP4 accuracy (#37322)
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
|
2026-03-18 18:30:29 +08:00 |
|
Chauncey
|
b322b197f1
|
[Build] Bump python openai version (#32316)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-18 18:20:10 +08:00 |
|
Andreas Karatzas
|
eaf7c9b976
|
[CI] Fix PaddleOCR-VL HF test failure due to create_causal_mask API rename (#37328)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-18 09:44:12 +00:00 |
|
Aaron Hao
|
47a1f11bff
|
[docs] Add docs for new RL flows (#36188)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-18 09:04:26 +00:00 |
|
Karan Bansal
|
fad09e8a1f
|
fix(glm47): improve tool call parsing and content normalization (#37386)
Signed-off-by: karanb192 <karan@example.com>
Co-authored-by: karanb192 <karan@example.com>
|
2026-03-18 08:12:21 +00:00 |
|
Jee Jee Li
|
8c31f47c63
|
[LoRA] Make LoRA respect language_model_only (#37375)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-03-18 07:53:34 +00:00 |
|
Li, Jiang
|
261801242f
|
[Bugfix] Avoid OpenMP thread reallocation in CPU torch compile (#37391)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-03-18 07:51:39 +00:00 |
|
Or Ozeri
|
fcf0687b27
|
[kv_offload+HMA][0/N]: Support block-level preemption handling (#34805)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-18 08:49:53 +02:00 |
|
liuzhenwei
|
86b7e3c95a
|
[XPU] skip unsupported ut and update test_nixl_connector (#37179)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-18 13:32:59 +08:00 |
|
Andrew Xia
|
0e95916155
|
[responsesAPI] parser.extract_response_outputs can take in token IDs (#37130)
Signed-off-by: Andrew Xia <axia@meta.com>
|
2026-03-18 05:31:31 +00:00 |
|
Andreas Karatzas
|
ce2ef42fd3
|
[CI] Stabilize test_cpu_offloading by waiting for async offload before cache reset (#37335)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-18 05:26:20 +00:00 |
|
Andreas Karatzas
|
8b6325758c
|
[ROCm][CI] Add ROCM_EXTRA_ARGS to audio_in_video test server fixture (#37349)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-18 04:55:40 +00:00 |
|
gxd3
|
a0dd1995c7
|
[Hardware][TPU] Add supports_async_scheduling() method to Executor interface so that it can be extended for Executor implementations. (#36924)
Signed-off-by: Guangxiang Du <gxd@google.com>
|
2026-03-18 12:53:28 +08:00 |
|
Xin Yang
|
f1740006e4
|
[Perf] Enable dual stream execution of input projection for Qwen3 (#36795)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-03-18 11:13:27 +08:00 |
|