Michael Goin
|
a408820f2f
|
[Bugfix] Fix port handling in make_zmq_path (#19117)
|
2025-06-04 21:00:59 -06:00 |
|
Robert Shaw
|
c56ed8bb0e
|
[Bugfix][Nixl] Fix full prefix cache hit bug (#18632)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-06-05 02:07:32 +00:00 |
|
Reid
|
78dcf56cb3
|
[doc] small fix (#19167)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-05 09:13:50 +08:00 |
|
Nicolò Lucchesi
|
b2fac67130
|
[P/D] Heterogeneous TP (#18833)
Signed-off-by: nicklucche <nlucches@redhat.com>
|
2025-06-04 23:25:34 +00:00 |
|
CYJiang
|
23027e2daf
|
[Misc] refactor: simplify EngineCoreClient.make_async_mp_client in AsyncLLM (#18817)
Signed-off-by: googs1025 <googs1025@gmail.com>
|
2025-06-04 15:37:25 -07:00 |
|
Varun Sundar Rabindranath
|
c3fd4d669a
|
[Kernel] Integrate batched/masked deepgemm kernel (#19111)
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun <vsundarr@redhat.com>
|
2025-06-04 21:59:18 +00:00 |
|
Kebe
|
ef3f98b59f
|
[Bugfix] fix v1 cpu worker fails on macOS (#19121)
|
2025-06-04 20:17:38 +00:00 |
|
Siyuan Liu
|
7ee2590478
|
[TPU] Update dynamo dump file name in compilation test (#19108)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
|
2025-06-04 16:13:43 -04:00 |
|
Michael Goin
|
53a5a0ce30
|
[Perf] Tunings for SM100 FP8 CUTLASS kernel (#18778)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-04 10:46:28 -07:00 |
|
Tyler Michael Smith
|
d459fae0a2
|
[Bugfix][EP+DP] Fix internode check (#19112)
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
|
2025-06-04 23:39:23 +08:00 |
|
jmswen
|
c8dcc15921
|
Allow AsyncLLMEngine.generate to target a specific DP rank (#19102)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
|
2025-06-04 08:26:47 -07:00 |
|
Cyrus Leung
|
8f4ffbd373
|
[Doc] Update V1 Guide for embedding models (#19141)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-06-04 22:57:55 +08:00 |
|
Lain
|
5f2cd251d2
|
Sm100 blockwise fp8 swap ab (#18564)
|
2025-06-04 07:48:45 -07:00 |
|
Xu Wenqing
|
02658c2dfe
|
Add DeepSeek-R1-0528 function call chat template (#18874)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
|
2025-06-04 13:24:18 +00:00 |
|
Cyrus Leung
|
01dc9a76db
|
[CI/Build][Bugfix] Ensure compatibility with transformers 4.52 (#18678)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-06-04 04:49:20 -07:00 |
|
wang.yuqi
|
35cf32df30
|
Improve the output precision of embedding models (#19092)
|
2025-06-04 11:48:57 +00:00 |
|
Isotr0py
|
8711bc5e68
|
[Misc] Add packages for benchmark as extra dependency (#19089)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-04 04:18:48 -07:00 |
|
Seiji Eicher
|
2669a0d7b5
|
Fix ValueError: Missing value for tag key(s): model_name,engine. (#19113)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-06-04 17:10:45 +08:00 |
|
Siyuan Liu
|
8e972d9c44
|
[TPU] Skip hanging tests (#19115)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
|
2025-06-04 01:43:00 -07:00 |
|
汪志鹏
|
3336c8cfbe
|
Fix #19130 (#19132)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
2025-06-04 01:42:06 -07:00 |
|
Woosuk Kwon
|
b124e1085b
|
[Bugfix] Fix FA3 full cuda graph correctness (#19106)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-06-03 23:10:15 -07:00 |
|
Kaixi Hou
|
41aa578428
|
[NVIDIA] Add Cutlass MLA backend (#17625)
|
2025-06-03 21:40:26 -07:00 |
|
Calvin Chen
|
8d646c2e53
|
[Cleanup][v1]:remote guided-decoding-backend for example (#19059)
Signed-off-by: calvin chen <120380290@qq.com>
|
2025-06-04 04:23:26 +00:00 |
|
Vadim Gimpelson
|
5d6d1adf15
|
[KERNEL] Sampler. CUDA kernel for applying repetition penalty (#18437)
|
2025-06-03 21:13:01 -07:00 |
|
Lukas Geiger
|
1409ef9134
|
[Core] Cast multimodal input in hf processor (#18862)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-06-03 20:24:56 -07:00 |
|
Li, Jiang
|
4555143ea7
|
[CPU] V1 support for the CPU backend (#16441)
|
2025-06-03 18:43:01 -07:00 |
|
Russell Bryant
|
52dceb172d
|
[Docs] Add developer doc about CI failures (#18782)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-06-04 01:09:13 +00:00 |
|
Jiaxin Shan
|
abd7df2fca
|
[Misc] Fix path and python alias errors in disagg_prefill examples (#18919)
|
2025-06-03 17:15:18 -07:00 |
|
Yan Ru Pei
|
b712be98c7
|
feat: add data parallel rank to KVEventBatch (#18925)
|
2025-06-03 17:14:20 -07:00 |
|
Chen Zhang
|
a8da78eac9
|
[Bugfix] Max concurrency estimation and check_enough_kv_cache_memory for models with sliding window layers (#19029)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-04 00:14:06 +00:00 |
|
Nicolò Lucchesi
|
5d96533e22
|
[Bugfix][P/D] Fix Prefix Cache Bug (#18411)
Signed-off-by: nicklucche <nlucches@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-06-03 23:53:16 +00:00 |
|
Chauncey
|
4de790fcad
|
[Bugfix]: Fix the incompatibility issue with tool_choice 'required' when Thinking is enabled (#19075)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-06-03 23:27:24 +00:00 |
|
Chen Zhang
|
b5fd9506c1
|
[Bugfix] get_num_blocks_to_allocate with null_block (#19031)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-03 15:30:55 -07:00 |
|
Ekagra Ranjan
|
135cf55cd1
|
[V1][Spec Decode][Ngram] 1.35x gain -> 1.95x gain on InstructCoder with prompt fix (#18971)
|
2025-06-03 15:26:33 -07:00 |
|
Chen Zhang
|
6cac54f4d1
|
[v1] Re-init input batch for multiple kv cache groups (#18654)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-03 21:41:36 +00:00 |
|
Harry Mellor
|
6865fe0074
|
Fix interaction between Optional and Annotated in CLI typing (#19093)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Yikun Jiang <yikun@apache.org>
|
2025-06-03 21:07:19 +00:00 |
|
Michael Goin
|
e31446b6c8
|
[Perf] Tune scaled_fp8_quant by increasing vectorization (#18844)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-03 13:48:25 -07:00 |
|
Yong Hoon Shin
|
bdf13965ab
|
[V1] Support cross-layer KV sharing (#18212)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-06-03 20:33:07 +00:00 |
|
Varun Sundar Rabindranath
|
fa98d77773
|
[Kernel] DeepEP dispatch-combine kernel integration (#18434)
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-06-03 12:30:02 -07:00 |
|
Reid
|
01eee40536
|
[doc] update docker version (#19074)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-03 19:08:21 +00:00 |
|
SorenDreano
|
19bdaf32b1
|
[Doc] Readme standardization (#18695)
Co-authored-by: Soren Dreano <soren@numind.ai>
|
2025-06-03 11:50:55 -07:00 |
|
Simon Mo
|
02f0c7b220
|
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-06-03 11:20:17 -07:00 |
|
CYJiang
|
d054da1992
|
[Misc] fix: add missing best_of param validation (#18555)
Signed-off-by: googs1025 <googs1025@gmail.com>
|
2025-06-03 11:02:07 -07:00 |
|
Nicolò Lucchesi
|
4b7817c119
|
[Misc] Add missing _Backend enums (#19081)
Signed-off-by: nicklucche <nlucches@redhat.com>
|
2025-06-03 16:15:16 +00:00 |
|
Lu Fang
|
d00dd65cd4
|
[Doc] Improve the Pull Request template with key components (#19086)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-06-03 23:44:34 +08:00 |
|
Raushan Turganbay
|
d81edded69
|
[Bugfix] disable processor cache (#19068)
Signed-off-by: raushan <raushan@huggingface.co>
|
2025-06-03 15:06:04 +00:00 |
|
Harry Mellor
|
476844d44c
|
Fix underscores in dict keys passed via CLI (#19030)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-06-03 14:39:24 +00:00 |
|
Jee Jee Li
|
4e68ae5e59
|
[CI/Build] Remove V0 LoRA test (#19066)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-06-03 14:30:18 +00:00 |
|
youkaichao
|
4e88723f32
|
[doc] clarify windows support (#19088)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-06-03 21:42:17 +08:00 |
|
Cyrus Leung
|
118ff92111
|
[Doc] Update V1 user guide for embedding and enc-dec models (#19060)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-06-03 02:29:41 -07:00 |
|