14979 Commits

Author SHA1 Message Date
Harry Mellor
39bfb57b7c Add API docs link if the CLI arg is a config class (#37432)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-18 17:19:35 +00:00
RonaldBXu
c9d838fc33 Adding deterministic lora benchmarking to vLLM Bench (#36057)
Signed-off-by: Ubuntu <ubuntu@ip-172-31-43-201.ap-northeast-1.compute.internal>
Signed-off-by: Ronald Xu <ronaldxu@amazon.com>
2026-03-18 16:02:03 +00:00
Xin Yang
b1169d7be8 [Kernel] Add gpt-oss Router GEMM kernel (#37205)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-03-18 08:15:56 -07:00
XLiu-2000
17808394bc standardize load_weights using AutoWeightsLoader for kimi_linear and minimax_text_01 (#37371)
Signed-off-by: XuLiu <xuliu40@gmail.com>
Co-authored-by: XuLiu <xuliu40@gmail.com>
2026-03-18 15:05:37 +00:00
elvischenv
296839a1b0 [Perf] Eliminate padding and slicing op for GPT-OSS with Flashinfer MXFP4 MXFP8 MoE (#30647)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2026-03-18 15:01:26 +00:00
Wentao Ye
c373b5c00d [Log] Reduce duplicate log (#37313)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-03-18 10:57:44 -04:00
Itay Alroy
de1a86b7de elastic_ep: Fix stateless group port races (#36330)
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
2026-03-18 14:36:18 +00:00
Cyrus Leung
99267c23ca [2/3] Refactor InternVL-based processors (#37324)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-18 22:22:19 +08:00
Or Ozeri
525f2eeb0b [kv_offload+HMA][6/N]: Split offloading_connector.py (#37405)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-03-18 14:42:46 +01:00
Yufeng He
918b7890a1 [Bugfix] Fix base64 JPEG video frames returning empty metadata (#37301)
Signed-off-by: Yufeng He <40085740+universeplayer@users.noreply.github.com>
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Yufeng He <40085740+universeplayer@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-18 13:40:03 +00:00
Andy Lo
98b09ddc27 [NIXL][Bugfix] metrics & testing minor bug (#36051)
Signed-off-by: Andy Lo <andy@mistral.ai>
2026-03-18 14:39:14 +01:00
Shwetha Poojary
cef1f302d2 [Model] Enable LoRA support for tower and connector in H2OVL (#31696)
Signed-off-by: shwetha-s-poojary <shwetha.s-poojary@ibm.com>
2026-03-18 13:26:47 +00:00
Elvir Crnčević
17c47fb869 [Bugfix] Fix EP weight filter breaking EPLB and NVFP4 accuracy (#37322)
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
2026-03-18 18:30:29 +08:00
Chauncey
b322b197f1 [Build] Bump python openai version (#32316)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-03-18 18:20:10 +08:00
Andreas Karatzas
eaf7c9b976 [CI] Fix PaddleOCR-VL HF test failure due to create_causal_mask API rename (#37328)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-18 09:44:12 +00:00
Aaron Hao
47a1f11bff [docs] Add docs for new RL flows (#36188)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-18 09:04:26 +00:00
Karan Bansal
fad09e8a1f fix(glm47): improve tool call parsing and content normalization (#37386)
Signed-off-by: karanb192 <karan@example.com>
Co-authored-by: karanb192 <karan@example.com>
2026-03-18 08:12:21 +00:00
Jee Jee Li
8c31f47c63 [LoRA] Make LoRA respect language_model_only (#37375)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-03-18 07:53:34 +00:00
Li, Jiang
261801242f [Bugfix] Avoid OpenMP thread reallocation in CPU torch compile (#37391)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2026-03-18 07:51:39 +00:00
Or Ozeri
fcf0687b27 [kv_offload+HMA][0/N]: Support block-level preemption handling (#34805)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2026-03-18 08:49:53 +02:00
liuzhenwei
86b7e3c95a [XPU] skip unsupported ut and update test_nixl_connector (#37179)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-18 13:32:59 +08:00
Andrew Xia
0e95916155 [responsesAPI] parser.extract_response_outputs can take in token IDs (#37130)
Signed-off-by: Andrew Xia <axia@meta.com>
2026-03-18 05:31:31 +00:00
Andreas Karatzas
ce2ef42fd3 [CI] Stabilize test_cpu_offloading by waiting for async offload before cache reset (#37335)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-18 05:26:20 +00:00
Andreas Karatzas
8b6325758c [ROCm][CI] Add ROCM_EXTRA_ARGS to audio_in_video test server fixture (#37349)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-18 04:55:40 +00:00
gxd3
a0dd1995c7 [Hardware][TPU] Add supports_async_scheduling() method to Executor interface so that it can be extended for Executor implementations. (#36924)
Signed-off-by: Guangxiang Du <gxd@google.com>
2026-03-18 12:53:28 +08:00
Xin Yang
f1740006e4 [Perf] Enable dual stream execution of input projection for Qwen3 (#36795)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-03-18 11:13:27 +08:00
Andreas Karatzas
58cde5c026 [ROCm][CI] Skip trtllm kvfp8 dequant tests on ROCm (#37330)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-18 11:12:26 +08:00
Roy Wang
761e0aa7a0 [Performance] Add --enable-ep-weight-filter CLI option (#37351)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 09:36:55 +08:00
Yanan Cao
ff9fbc9aff [Kernel][Helion] [16/N] Refactor register_kernel API to be more Dynamo-friendly (#36705)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 01:23:35 +00:00
Divakar Verma
e6c4797704 [ROCm][Quantization] add fp8xfp8 attn support for rocm_aiter_unified_attn (#36927)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2026-03-18 08:49:32 +08:00
Michael Goin
09e4576f65 [Kernel] Add non-gated support for NVFP4 CUTLASS MoE (#37320)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-03-17 18:12:04 -04:00
Andreas Karatzas
3ed7b1e6e0 [ROCm] Validate block_size for explicitly selected attention backends (#36846)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-17 17:04:40 -05:00
JartX
e8f9dbc369 [Bugfix][ROCm] Fix worker startup OOM on ROCm by skipping unreliable cudagraph memory profiling (#36720)
Signed-off-by: JartX <sagformas@epdcenter.es>
2026-03-17 17:55:34 -04:00
Yong Hoon Shin
de35c06c66 Make KV connector metadata build overridable via plugin (#37336)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2026-03-17 21:29:06 +00:00
Athrael Soju
c0745a851a [Model] Add ColQwen3.5 4.5B support (#36887)
Signed-off-by: Athrael Soju <athrael.soju@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-03-17 21:17:02 +00:00
Ekagra Ranjan
b5ca9c3557 [Models] Cohere ASR (#35809)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
2026-03-17 21:04:17 +00:00
Chao-Ju Chen
245758992e [Bugfix] Rescale NVFP4 weight scales to fix BF16 dequant underflow (#34577)
Signed-off-by: ricky-chaoju <ricky.chen@infinirc.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-03-17 20:48:42 +00:00
Dimitrios Bariamis
1204cf0a9d [Bugfix] Fix mock.patch resolution failure for standalone_compile.FakeTensorMode on Python <= 3.10 (#37158)
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
2026-03-17 20:13:06 +00:00
Wei Zhao
b36adfa349 [Perf] Set Flashinfer sparse MLA as default backend for FP8 kv cache (#37252)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
2026-03-17 20:09:20 +00:00
Michael Goin
e78821b438 [Deprecation] Deprecate --calculate-kv-scales option (#37201)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2026-03-17 19:57:24 +00:00
Cyrus Leung
51f0acda79 [Model] Remove unused handle_oov_mm_token (#37321)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-17 19:44:52 +00:00
Brian Dellabetta
fa75204b16 bump compressed-tensors version to 0.14.0.1 (#36988)
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
2026-03-17 15:36:19 -04:00
Wentao Ye
bdb903bb5f [Bug] Fix FlashInfer MNNVL socket collisions under concurrent vLLM jobs (#36674)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-03-17 15:19:52 -04:00
Andrey Talman
68f783a727 [Torch 2.11] Guard torch._C._cpu attribute checks for forward compatibility (#35673)
Signed-off-by: atalman <atalman@fb.com>
2026-03-17 18:47:59 +00:00
Avinash Singh
c5030c439d [CI] Split Distributed Tests (4 GPUs) and Kernel MoE tests (#37100)
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
Signed-off-by: Avinash Singh <107198269+avinashsingh77@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
2026-03-17 11:44:55 -07:00
Michael Goin
51b2333be1 [Perf] Optimize top-k search in apply_top_k_top_p_triton sampler (#37225)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-03-17 11:35:17 -07:00
Andreas Karatzas
4ed51308c8 [CI] Fix GPU memory leak when RemoteOpenAIServer fails to start in __init__ (#37230)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-17 09:08:08 -07:00
Cyrus Leung
c781fbbab3 [Bugfix] Standardize custom HF Processor init (#37289)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-17 15:38:55 +00:00
Richard Zou
979ff44cea [BugFix] PyTorch Compilation Tests should error if any test fails (#37300)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-03-17 15:26:38 +00:00
Benjamin Chislett
f63ed7b5ac [Bugfix] Fix DP MTP Dummy Run (#35243)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2026-03-17 11:16:48 -04:00