Nicolò Lucchesi
|
f226a3f0c1
|
[CI][NIXL] Change default block_size for tests (#28927)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-11-18 09:22:30 -08:00 |
|
Nick Hill
|
5bdd155277
|
[CI] Fix async scheduling + spec decoding test flake (#28902)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-18 05:26:32 +00:00 |
|
Wentao Ye
|
a289cc1dde
|
[Test] Batch Invariant: Rename and organize tests (#27421)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-17 18:09:47 -05:00 |
|
Ronald
|
d8874c61a5
|
[Core] Async Scheduling X Spec Decoding Compatibility (#24799)
Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
|
2025-11-17 12:16:20 -08:00 |
|
Nick Hill
|
80b6080ddc
|
[BugFix] Fix async scheduling + chunked prefill + preemption (#28787)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-17 06:46:46 +08:00 |
|
Eldar Kurtić
|
e439c784fa
|
Add support for Eagle with separate lm-head and embed_tokens layers (#28549)
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
|
2025-11-15 06:12:02 -08:00 |
|
Cyrus Leung
|
98b4d389ed
|
[Redo] #26368 (#28771)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-11-14 22:47:41 -08:00 |
|
Chendi.Xue
|
c9e665852a
|
[NIXL] heterogeneous block_size support (#26759)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
|
2025-11-14 21:51:32 -08:00 |
|
Nick Hill
|
ac86bff8cb
|
Revert "[Core] Performance: Use list[np.ndarray] instead of list[list… (#28773)
|
2025-11-14 20:24:00 -08:00 |
|
Jialin Ouyang
|
186352b270
|
[Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization (#26368)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-11-14 16:04:04 -08:00 |
|
Nick Hill
|
58e61e56b7
|
[Test] Rework e2e async scheduling tests (#28744)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-14 16:01:09 -08:00 |
|
Laith Sakka
|
2e0ad629b0
|
Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch (#25110)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-11-14 14:11:10 -08:00 |
|
Marcin Ostrowski
|
0de4f217ab
|
[Bugfix] TypeError: 'NoneType' object is not callable (#27410)
Signed-off-by: Marcin Ostrowski <marcinx.ostrowski@intel.com>
|
2025-11-14 21:13:53 +00:00 |
|
Cyrus Leung
|
e2741f6cbc
|
[Chore] Rename SchedulerConfig.chunked_prefill_enabled (#28735)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-14 18:39:57 +00:00 |
|
Cyrus Leung
|
511a6b611d
|
[Config] Clean up SchedulerConfig initialization (#28665)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-14 22:41:02 +08:00 |
|
Nicolò Lucchesi
|
96b23b8e3b
|
[Bugfix][Nixl] Fix kernel physical<>logical block_size issue (#28677)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-11-14 22:40:05 +08:00 |
|
Yong Hoon Shin
|
9324e10275
|
Fix KV sharing fast prefill with cudagraph enabled (#28537)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-14 11:53:42 +00:00 |
|
rasmith
|
93103575ce
|
[BugFix][CI/Build][ROCM] Fix import error and apply assert in appropriate case in test_struct_output_generate (#28311)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-13 22:41:29 -08:00 |
|
Mark McLoughlin
|
6e25b1cddf
|
[KV Connector] Test async mode in scheduler tests (#28550)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-11-13 18:30:59 -05:00 |
|
elvischenv
|
5d6ce2b960
|
[Perf] Support stream interval for reducing host overhead (#27869)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-11-13 13:21:25 -05:00 |
|
Yannick Schnider
|
119c4927b3
|
[Bugfix] Fix validate model input for decoder models (#27099)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-11-13 10:18:47 -08:00 |
|
tjandy98
|
4504e8029b
|
[Bugfix] Prevent crash on empty grammar string (#28210)
Signed-off-by: tjandy98 <3953059+tjandy98@users.noreply.github.com>
|
2025-11-13 06:42:29 +00:00 |
|
Andrew Xia
|
1a0b157a2e
|
[Frontend][responsesAPI][1/n] convert responses API tool input to chat completions tool format (#28231)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-11-13 04:47:22 +00:00 |
|
Jialin Ouyang
|
a1d3866dda
|
[n-gen] DO NOT repeatedly return finished child requests (#28591)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-11-13 03:36:07 +00:00 |
|
Andy Lo
|
58ce8d12b7
|
[BugFix] Priority scheduling and spec tokens preemption (#28558)
Signed-off-by: Andy Lo <andy@mistral.ai>
|
2025-11-12 20:29:21 +00:00 |
|
alberto
|
bac904565f
|
Implement ARC KV cache eviction policy for CPU offloader (#27039)
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: alberto <aperdomo@redhat.com>
Co-authored-by: Or Ozeri <or@ozery.com>
|
2025-11-12 09:51:39 -08:00 |
|
Chenguang Zheng
|
4ccffe561f
|
[Core] Encoder separation for Encode-Prefill-Decode Disaggregation (#25233)
Signed-off-by: n00909098 <nguyen.kha.long@huawei.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: herotai214 <herotai214@gmail.com>
Signed-off-by: Khuong Le <khuong.le.manh@huawei.com>
Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com>
Co-authored-by: n00909098 <nguyen.kha.long@huawei.com>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: herotai214 <herotai214@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Khuong Le <khuong.le.manh@huawei.com>
Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com>
|
2025-11-11 18:58:33 -08:00 |
|
Jialin Ouyang
|
4228be7959
|
[Perf] Use np.ndarray instead of list[list[int]] to reduce GC overhead (#28245)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-11-11 10:28:47 -08:00 |
|
Nicolò Lucchesi
|
a7ef3eb0cd
|
[NIXL] Generalize block-first backend layouts (FlashInfer-like) (#28282)
|
2025-11-11 16:57:43 +00:00 |
|
Matthew Bonanni
|
b30dfa03c5
|
[Attention] Refactor CUDA attention backend selection logic (#24794)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-11-11 07:40:44 -05:00 |
|
Rémi Delacourt
|
6d54336ae5
|
[Bugfix] Fix llguidance backend, rollback when EOS was encountered (#25905)
Signed-off-by: Rémi Delacourt <remi@mistral.ai>
Signed-off-by: remi <remi@mistral.ai>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-11-10 14:53:32 -05:00 |
|
Mark McLoughlin
|
6f7de33bed
|
[Metrics] Refactor LoRA state tracking (#26801)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-11-10 16:34:36 +08:00 |
|
usberkeley
|
4a8d6bd168
|
Fix cu_num_generated_tokens slicing logic in LogprobsLists.slice() method (#28214)
Signed-off-by: Bradley <bradley.b.pitt@gmail.com>
|
2025-11-09 19:11:46 +00:00 |
|
Nick Hill
|
289eb6c537
|
[Core] Simplify async KV output aggregation (#28327)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-09 09:44:13 -08:00 |
|
Nicolò Lucchesi
|
19d91ece4b
|
[CI] Fix flaky test_eagle_correctness test (#28364)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-11-09 16:04:59 +00:00 |
|
zhangsicheng5
|
2108a571d7
|
[DCP] Support dcp kv_cache interleave size > 1 (#26696)
Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com>
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: Qiu <qiuchunshuo@huawei.com>
Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>
|
2025-11-09 04:45:27 +09:00 |
|
Andy Lo
|
47604137a2
|
[Bugfix] Spec decode + structured output + spec model max len edge case (#28298)
Signed-off-by: Andy Lo <andy@mistral.ai>
|
2025-11-08 19:44:25 +00:00 |
|
Harry Mellor
|
d9ab1ad9d1
|
reasoning_content -> reasoning (#27752)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-08 12:15:08 +00:00 |
|
Xiaohong (Sean) Chen
|
d0c7792004
|
[Bugfix][LoRA][Spec Decode] Support LoRA with speculative decoding (#21068)
Signed-off-by: Sean Chen <xiaohong_chen1991@hotmail.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Danielle Robinson <dcmaddix@gmail.com>
Co-authored-by: Haipeng Li <li2haipeng@gmail.com>
Co-authored-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com>
|
2025-11-08 01:58:22 +00:00 |
|
Nick Hill
|
67a2da890e
|
[PerfFix] Avoid separate thread for MP executor shm spin (take 2) (#28319)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-07 22:11:03 +00:00 |
|
Nick Hill
|
da786e339e
|
[Core] Rework handling of async scheduling config (#28250)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-07 20:01:23 +00:00 |
|
Nicolò Lucchesi
|
68a72a5cc1
|
Revert "[PerfFix] Avoid separate thread for MP executor shm spin (#28012)" (#28289)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-11-07 15:07:01 +00:00 |
|
Matthew Bonanni
|
ca90f50304
|
[Test] Add non-MoE DP test coverage (#28235)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-11-06 20:59:57 +00:00 |
|
Chauncey
|
59a50afa08
|
[Frontend] OpenAI Responses API supports Tool/Function calling - non-harmony (#26874)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-11-06 10:40:03 +00:00 |
|
wangxiyuan
|
c3ee80a01a
|
[V0 deprecation]clean up is_v1_supported_oracle (#28116)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-11-06 16:05:32 +08:00 |
|
Zhewen Li
|
0b8e871e5e
|
[CI/Build] Fix test_defaults_with_usage_context in AMD CI (#27926)
Signed-off-by: zhewenli <zhewenli@meta.com>
|
2025-11-05 15:40:24 -08:00 |
|
Snehlata
|
e15601789b
|
[Feature]: Add corrupted request metric to V1 metrics system. (#27306)
Signed-off-by: atalhens <sneh.lata@nutanix.com>
|
2025-11-05 13:45:29 -08:00 |
|
Paul Zhang
|
faedbb4d4f
|
[Feature] Extend batch invariant torch.compile to B200 (#27856)
Signed-off-by: PaulZhang12 <paulzhan@fb.com>
|
2025-11-05 10:04:49 -08:00 |
|
Samuel Shen
|
40db194446
|
[CI]: Add LMCacheConnector Unit Tests (#27852)
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Co-authored-by: Samuel Shen <slshen@uchciago.edu>
Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>
|
2025-11-05 09:45:57 -08:00 |
|
Isotr0py
|
3f5a4b6473
|
[Bugfix] Validate custom logits processor xargs for online serving (#27560)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-05 16:53:33 +00:00 |
|