Varun Sundar Rabindranath
|
fdadb6f43a
|
[Bugfix] Fused MoE Modular Kernel chunking loop (#20392)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-07-10 20:31:10 +00:00 |
|
Alex Brooks
|
41060c6e08
|
[Core] Add Support for Default Modality Specific LoRAs [generate / chat completions] (#19126)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2025-07-10 21:09:37 +01:00 |
|
Ming Yang
|
3de2ed767f
|
[Bugfix] Remove assertion of expert_map being None (#20714)
Signed-off-by: Ming Yang <yming@meta.com>
Signed-off-by: Ming Yang <minos.future@gmail.com>
|
2025-07-10 19:55:22 +00:00 |
|
Wentao Ye
|
299252ea82
|
[CI] Fix pre commit issue (#20782)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-10 12:48:13 -07:00 |
|
Nathan Hoos
|
d6902ce79f
|
[V0][V1][Core] Add outlines integration for V1, and update V0 integration. (#15975)
Signed-off-by: Nathan Hoos <thwackyy.y@gmail.com>
|
2025-07-10 15:30:26 -04:00 |
|
Sanger Steel
|
5e53c89a74
|
[Bugfix] [CI] Fix Tensorizer LoRA test (#20760)
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
|
2025-07-10 19:07:06 +00:00 |
|
QiliangCui
|
c66e38ea4c
|
[Test] Remove docker build from test. (#20542)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-07-10 11:21:58 -07:00 |
|
sfbemerk
|
251595368f
|
Fix DeepSeek-R1-0528 chat template (#20717)
Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com>
Co-authored-by: Benjamin Merkel <benjamin.merkel@tngtech.com>
|
2025-07-10 17:47:36 +00:00 |
|
shineran96
|
4bed167768
|
[Model][VLM] Support JinaVL Reranker (#20260)
Signed-off-by: shineran96 <shinewang96@gmail.com>
|
2025-07-10 10:43:43 -07:00 |
|
Asher
|
b140416abf
|
[Model] Add reason parser for Hunyuan A13B Model. (#20625)
Signed-off-by: Asher Zhang <asherszhang@tencent.com>
|
2025-07-10 16:33:26 +00:00 |
|
Gregory Shtrasberg
|
5b8366b61a
|
[ROCm][Regression] Remove tensor creation that harms performance on ROCm (#20741)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-07-10 09:22:23 -07:00 |
|
nishith-fujitsu
|
c7753a9809
|
[Hardware][CPU] Vllm int8 quantization enablement for ARM CPU (#14129)
Signed-off-by: nishith-fujitsu <nishith.jaiswal@fujitsu.com>
|
2025-07-10 15:59:04 +00:00 |
|
Michael Goin
|
4b9a9435bb
|
Update Dockerfile FlashInfer to v0.2.8rc1 (#20718)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-10 08:09:02 -07:00 |
|
Harry Mellor
|
3482fd7e4e
|
[Doc] Add engine args back in to the docs (#20674)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-10 08:02:40 -07:00 |
|
Isotr0py
|
77f77a951e
|
[Misc] Clean up mark to fork process in BNB tests (#20692)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-10 13:59:40 +00:00 |
|
Michael Goin
|
1a4f35e2ea
|
Normalize lm-eval command between baseline and correctness test (#18560)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-10 13:27:32 +00:00 |
|
Michael Goin
|
be1e128dfb
|
[CI Bugfix] Skip failing Tensorizer+LoRA test (#20724)
|
2025-07-10 21:15:03 +09:00 |
|
Reid
|
65393ee064
|
[doc] fix ordered list (#20749)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-10 03:13:52 -07:00 |
|
Gregory Shtrasberg
|
dc221ad72d
|
[Bugfix][Build][Non-CUDA] Only referencing CMAKE_CUDA_COMPILER_VERSION on CUDA where it is defined (#20738)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-07-10 02:58:11 -07:00 |
|
Jee Jee Li
|
7571a4a7e5
|
[CI/Build] Fix Basic Models Test (#20728)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-10 09:57:19 +00:00 |
|
Isotr0py
|
f67d986dd1
|
[Misc] loose new-model tagger conditions (#20747)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-10 02:54:47 -07:00 |
|
Or Ozeri
|
cc876d0f29
|
[KVConnector] Aggregate finished requests on the scheduler (#19555)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-07-10 09:22:18 +01:00 |
|
Chenyaaang
|
fdfd409f8f
|
[TPU][Core]Make load weight exceed hbm error more instructive for customers (#20644)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-07-10 07:01:17 +00:00 |
|
Nick Hill
|
ffbcc9e757
|
[BugFix] Fix VllmConfig() construction on all platforms (#20695)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-10 07:00:20 +00:00 |
|
Nick Hill
|
59389c927b
|
[BugFix][CPU] Fix CPU worker dependency on cumem_allocator (#20696)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-10 14:24:20 +08:00 |
|
Chauncey
|
8f2720def9
|
[Frontend] Support Tool Calling with both tool_choice='required' and $defs. (#20629)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-07-10 13:56:35 +08:00 |
|
Seiji Eicher
|
ad6c2e1a0b
|
Correct PPMissingLayer handling in Deepseek-V2-Lite PP deployment (#20665)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-07-09 20:34:40 -07:00 |
|
Michael Goin
|
49e8c7ea25
|
Use NVCC --compress-mode to reduce binary size by 30% (#20694)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-09 18:26:48 -07:00 |
|
Varun Sundar Rabindranath
|
805d62ca88
|
[Misc] DP : Add ExpertTokensMetadata (#20332)
Signed-off-by: Varun <vsundarr@redhat.com>
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun <vsundarr@redhat.com>
|
2025-07-10 00:33:14 +00:00 |
|
Michael Goin
|
b7d9e9416f
|
[CI/Build] Fix FlashInfer double build in Dockerfile (#20651)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-09 17:41:56 -06:00 |
|
Woosuk Kwon
|
7c12a765aa
|
[Misc] Simplify the prefix caching logic on draft tokens (#20701)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-09 14:48:35 -07:00 |
|
Yiming
|
cd587c93ef
|
[BugFix]: Properly set engine_id when using multi connector (#19487)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: leiyiming <leiyiming@kingsoft.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-07-09 20:32:44 +00:00 |
|
fxmarty-amd
|
332d4cb17b
|
[Feature][Quantization] MXFP4 support for MOE models (#17888)
Signed-off-by: Felix Marty <felmarty@amd.com>
Signed-off-by: Bowen Bao <bowenbao@amd.com>
Signed-off-by: Felix Marty <Felix.Marty@amd.com>
Co-authored-by: Bowen Bao <bowenbao@amd.com>
|
2025-07-09 13:19:02 -07:00 |
|
Jacob Manning
|
bf03ff3575
|
[Kernel] Add Conch backend for mixed-precision linear layer (#19818)
Signed-off-by: Jacob Manning <jmanning+oss@stackav.com>
|
2025-07-09 13:17:55 -07:00 |
|
Tuan, Hoang-Trong
|
47043eb678
|
[Kernel] Triton implementation of causal-conv1d for Mamba-based models (#18218)
Signed-off-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>
Co-authored-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-07-09 12:53:55 -07:00 |
|
Michael Goin
|
31b96d1c64
|
Support Llama 4 for cutlass_moe_fp4 (#20453)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-09 15:53:38 -04:00 |
|
Li, Jiang
|
e59ba9e142
|
[CI/Build] Enlarge tolerance for a CPU multi-modal test (#20684)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-07-09 17:48:52 +00:00 |
|
Harry Mellor
|
403b481573
|
Remove heading from installation inc.md file (#20697)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-09 10:42:51 -07:00 |
|
Li, Jiang
|
138709f8d1
|
[Doc] Update CPU doc (#20676)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-09 10:28:30 -07:00 |
|
Michael Goin
|
0bbac1c1b4
|
[Bench] Add NVFP4 GEMM benchmark script (#20578)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-09 13:23:48 -04:00 |
|
Liangliang Ma
|
a3e4e85ece
|
[XPU][CI] enhance xpu test support (#20652)
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
Co-authored-by: zhenwei-intel <zhenweiliu@habana.ai>
|
2025-07-09 16:53:09 +00:00 |
|
Chengji Yao
|
eb58f5953d
|
[TPU][Bugfix] fix test_pallas (#20666)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-07-09 09:32:48 -07:00 |
|
Sanger Steel
|
4ac9c33f78
|
[Bugfix] Fix handling of Tensorizer arguments for LoadConfig (#20643)
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
|
2025-07-09 15:36:37 +00:00 |
|
Reid
|
efe73d0575
|
[doc] update doc format (#20673)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-09 08:08:19 -07:00 |
|
Ricardo Decal
|
853487bc1b
|
[Docs] Improve docs for RLHF co-location example (#20599)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-09 08:06:43 -07:00 |
|
Li Wang
|
9ff2af6d2b
|
[Benchmark] Parameterization of streaming loading of multimodal datasets (#20528)
Signed-off-by: wangli <wangli858794774@gmail.com>
|
2025-07-09 13:35:16 +00:00 |
|
Cyrus Leung
|
70ca5484f5
|
[Doc] Update notes (#20668)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-09 03:46:36 -07:00 |
|
Thomas Parnell
|
5358cce5ff
|
[V1] [Doc] Update V1 docs for Mamba models (#20499)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-07-09 01:02:41 -07:00 |
|
Chauncey
|
2155e95ef1
|
[Bugfix] Fix the issue where reasoning_content is None when Thinking is enabled and tool_choice is set to 'required'. (#20662)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-07-09 07:39:58 +00:00 |
|
qscqesze
|
f95570a52d
|
[Docs] fix minimax tool_calling docs error (#20667)
Signed-off-by: qingjun <qingjun@minimaxi.com>
|
2025-07-09 00:37:07 -07:00 |
|