Morrison Turnansky
e3fdb627d9
[FrontEnd] UNREVERT CompilationConfig overhaul (#20283): deprecate use_inductor in favor of backend, simplify custom_ops (#26502)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
2025-10-13 22:47:16 +00:00

Fardin Hoque
577c72a227
[CI Perf]Prune Tests in kernel/mamba (#26538)
Signed-off-by: Fardin Hoque <kfhfar@amazon.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-10-13 18:22:31 -04:00

wang.yuqi
d2a7938582
[Frontend][1/N] Improve all pooling task | Support FP16 Embedding Base64 (Still uses fp32 by default). (#26414)
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-13 19:06:43 +00:00

haoyangli-amd
134f70b3ed
[Bugfix][Rocm] fix qr error when different inp shape (#25892)
Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-10-13 10:04:21 -07:00

Will Eaton
53c9a7cee2
[P/D] [NixlConnector] kv load recovery integration (#26171)
Signed-off-by: Will Eaton <weaton@redhat.com>
2025-10-13 08:48:04 -07:00

Bram Wasti
3263799056
[unrevert] Add batch invariant kernel override for FlashInfer backend [2/n] (#26373)
Signed-off-by: Bram Wasti <bwasti@meta.com>
Signed-off-by: Bram Wasti <bwasti@fb.com>
2025-10-13 10:24:53 -04:00

Isotr0py
8e67b2557a
[Bugfix] Fix out of bound index issue for Jina-embedding-v3 RoPE with cuda graph (#26687)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-13 03:21:48 -07:00

Jialin Ouyang
4073c82c4e
[ResponseAPI] Simplify input/output message serialization (#26620)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-10-13 09:59:15 +00:00

wang.yuqi
767c3ab869
[Model][0/N] Improve all pooling task | clean up (#25817)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-10-13 16:44:50 +08:00

CSWYF3634076
782505ed8e
[Model] Add reasoning_parser and tool_parser for Ernie45 thinking (#25027)
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
2025-10-13 15:55:20 +08:00

gjgjos
18ed7746ea
[Feature] Add support for naver/splade-v3 (BERT-based sparse embedding model) (#26339)
Signed-off-by: gjgjos <gjgjos@naver.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-12 17:00:52 +00:00

Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00

Chendi.Xue
9bb38130cb
[Bugfix] Fix GPU_ID issue in test script (#26442)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
2025-10-12 11:39:05 +00:00

Isotr0py
045b396d09
[Bugfix][CI/Build] Fix failing Mteb CI (#26638)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-12 02:42:42 -07:00

Vadim Gimpelson
82e64c7a20
[PERF] [Qwen3-next] Speed up gated RMSNorm (#26207)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-12 08:27:50 +00:00

Angela Yi
a25f2adee9
[compile] Add patched_fused_scaled_matmul_reduce_scatter (#26604)
Signed-off-by: angelayi <yiangela7@gmail.com>
2025-10-11 05:44:43 -07:00

Nick Hill
5bc26c438d
[BugFix] Make penalties and bad_words work with async scheduling (#26467)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-10 23:27:04 +00:00

Zhengxu Chen
eef921f45e
AOT Compilation for torch.compile (Bundled) (#24274)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2025-10-10 19:02:11 -04:00

Nick Hill
949cb0170d
[BugFix] Fix async scheduling + request preemption (#26385)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-10 20:29:57 +00:00

Harry Mellor
7c12763b24
Fix some typing issues found by mypy==1.18.2 (#26596)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-10 18:21:25 +00:00

Xiong Wang
19a9b169bf
Add Qwen3-Omni moe thinker (#25550)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Xiong Wang <feizi.wx@alibaba-inc.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-10 17:00:56 +00:00

Roberto L. Castro
96ad65b7fe
[Transform] [Quantization] Add QuTLASS support to vLLM (#24440)
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Signed-off-by: Andrei Panferov <andrei@panferov.org>
Co-authored-by: Andrei Panferov <andrei@panferov.org>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-10-10 09:43:40 -07:00

Shane A
8d2b8c0ff2
[Model] Add FlexOlmo model implementation (#24923)
Signed-off-by: Shane A <shanea@allenai.org>
2025-10-10 09:43:15 -07:00

Chauncey
910abdbd08
[Bugfix] fixed top_logprobs: -1 does not appear to work as intended (#26470)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-11 00:41:17 +08:00

baonudesifeizhai
cddce79fda
[torch.compile] Make inductor partition rules respect splitting_ops #25691 (#25845)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-10 16:35:28 +00:00

Mark McLoughlin
e519281920
[Metrics] Add test for multi-modal cache stats logging (#26588)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-10-10 16:00:50 +00:00

Elvir Crnčević
7b03584de8
Silu v2 (#25074)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: elvircrn <elvircrn@gmail.com>
Signed-off-by: Elvir Crnčević <elvircrn@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
2025-10-10 15:19:53 +00:00

Daniel Cámpora
0e67102d93
Added test_top_k_per_row to test-pipeline.yaml. (#26569)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-10-10 10:48:33 -04:00

Chauncey
1e6848a65d
[CI] fix test_run_batch.py::test_completions - AssertionError (#26578)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-10 22:16:28 +08:00

Andy Lo
67661375fa
[BugFix] Fix noop elimination edge case (#26394)
Signed-off-by: Andy Lo <andy@mistral.ai>
2025-10-10 13:33:04 +00:00

Mark McLoughlin
784c231151
[NIXL] Ignore abort on already-finished request (#25067)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-10-10 12:21:56 +02:00

Chen Zhang
606b00e80f
[bugfix][DCP] fix block_size of hash in DCP prefix caching (#26296)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-10-10 03:02:49 -07:00

Chauncey
720d3cd0f0
[CI] fix ruff format (#26579)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-10 03:02:12 -07:00

Ashwin Phadke
ab196edefb
Remove LoRA bias support (#25807)
Signed-off-by: Ashwin Phadke <ashwinphadke12@rediffmail.com>
Signed-off-by: Ashwin Phadke <23502062+ashwin-phadke@users.noreply.github.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-10 09:50:33 +00:00

Luis Tomas Bolivar
3ee202ea1e
[GPT-OSS] Add support for arrays at tool message content (#25593)
Signed-off-by: Luis Tomas Bolivar <ltomasbo@redhat.com>
2025-10-10 09:00:45 +00:00

Cyrus Leung
ad430a67ca
[Metrics] Log multi-modal cache stats and fix reset (#26285)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-10 01:45:55 -07:00

Boyuan Feng
b545a0b207
fix test_simple_inductor_graph_partition (#26522)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-10-10 06:39:19 +00:00

Ben Browning
da4455609d
[Chore]: One pythonic tool parser test uses the wrong parser (#26515)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-10-10 04:03:55 +00:00

Julien Denize
c6187f55f7
Refactor MistralTokenizer (#26358)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
2025-10-09 22:48:58 +00:00

elvischenv
44f633dba1
[Flashinfer][gpt-oss] Support FP8-qkv Flashinfer TRTLLM Sinks Attention (#25674)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-10-09 16:13:39 -04:00

Jiangyun Zhu
5728da11ea
Revert #26113 "[Frontend] CompilationConfig overhaul (#20283): deprecate use_inductor in favor of backend, simplify custom_ops" (#26472)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-10-09 05:43:55 -07:00

Wenzheng Bi
ec10fd0abc
[Bugfix] Move current_platform import to avoid python import cache. (#16601)
Signed-off-by: iwzbi <wzbi@zju.edu.cn>
2025-10-09 10:46:19 +00:00

Cyrus Leung
4bdf7ac593
[Bugfix] Fix SHM cache initialization (#26427)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-09 02:48:04 -07:00

Cyrus Leung
dc7976dd9f
[Misc] Upgrade more code to Python 3.10 (#26463)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-09 10:43:53 +01:00

Jerry Zhang
a83ff278d6
[torchao] Add support for ModuleFqnToConfig using regex (#26001)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
2025-10-09 08:32:32 +00:00

Rahul Tuli
cf4cd6c24f
Add: Support for multiple hidden layers in Eagle3 (#26164)
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
2025-10-09 07:30:50 +00:00

elvischenv
5e49c3e777
Bump Flashinfer to v0.4.0 (#26326)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-10-08 23:58:44 -07:00

Cyrus Leung
0f29dca988
[CI/Build] Fix model nightly tests (#26466)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-08 23:44:16 -07:00

Zhiyuan Li
d24cf322e1
[Hybrid]: Decouple Kernel Block Size from KV Page Size (#24486)
Signed-off-by: lizhiyuan <uniartisan2017@gmail.com>
Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com>
2025-10-08 23:43:39 -07:00

Qier Li
d17f0fbf30
[Core][KVConnector] Propagate all tokens on resumed preemptions (#24926)
Signed-off-by: Qier Li <kevin44036@gmail.com>
Co-authored-by: Qier Li <qier@fb.com>
2025-10-09 14:43:31 +08:00