Aaron Pham
|
e67597545b
|
[CI][Fix] deterministic seed for flaky CI runs on structured outputs (#24380)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-09-07 11:10:40 +08:00 |
|
Woosuk Kwon
|
4172235ab7
|
[V0 deprecation] Deprecate V0 Neuron backend (#21159)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-06 16:15:18 -07:00 |
|
elvischenv
|
e68dc2f014
|
[Bugfix] Fix unstable silu_mul+nvfp4 quant fusion test (#24370)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-09-06 20:39:34 +00:00 |
|
Ye (Charlotte) Qi
|
a3645ed94d
|
[Frontend][Responses API] Support reporting tool output tokens and fix reasoning token count (#24285)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-09-06 13:27:15 -07:00 |
|
Aaron Pham
|
fb691ee4e7
|
[Fix] [gpt-oss] fix non-tool calling path for chat completion (#24324)
|
2025-09-06 19:10:32 +00:00 |
|
Jee Jee Li
|
7555d6b34a
|
[Bugfix] Fix test_mixtral_moe (#24371)
|
2025-09-06 09:32:03 -07:00 |
|
Roger Wang
|
b121ca22ad
|
[CI] Disable flaky structured output test from CI (#24366)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-09-06 13:31:56 +00:00 |
|
wang.yuqi
|
6d6c6b05d3
|
[New Model]: google/embeddinggemma-300m (#24318)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-09-05 22:58:36 -07:00 |
|
yzds
|
ac201a0eaf
|
[Feature] Support Decode Context Parallel (DCP) for MLA (#23734)
Signed-off-by: hongchao <hongchao@msh.team>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: hongchao <hongchao@msh.team>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-09-06 13:24:05 +08:00 |
|
Didier Durand
|
35bf193864
|
[Doc]: fix typos in Python comments (#24294)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-05 19:41:12 -07:00 |
|
elvischenv
|
eedb2a2a10
|
[Bugfix] Fix silu_mul+quant fusion test (#24341)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-09-05 20:13:42 +00:00 |
|
Aaron Pham
|
c29fb540ff
|
[gpt-oss] tool parser supports for /chat/completions [1/n] (#22386)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-09-04 20:39:12 -07:00 |
|
Zhuohan Li
|
886ccbe5ba
|
[CI/Build] Reduce the number of redundant cases to test for LoRA (#24276)
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
|
2025-09-04 21:58:44 +00:00 |
|
elvischenv
|
adc3ddb430
|
[Bugfix][Misc] Fix silu_and_mul_nvfp4_quant issue and extract common utils for nvfp4 kernel source files (#23727)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-04 14:25:45 -07:00 |
|
Seiji Eicher
|
60b755cbcb
|
[Misc] Have AsyncLLM custom_stat_loggers extend default logger list (#20952)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-09-04 14:25:30 -07:00 |
|
Didier Durand
|
83609ca91d
|
[Doc]: fix typos in Python comments (#24173)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-04 08:52:17 -07:00 |
|
nvjullin
|
37241077d5
|
[Misc] Removed force_fp8_e4m3fnuz from FP8LinearOp (#23725)
Signed-off-by: Julien Lin <jullin@nvidia.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-04 09:25:40 -04:00 |
|
Kebe
|
8f423e5f43
|
[Feature][Response API] Add streaming support for non-harmony (#23741)
Signed-off-by: Kebe <mail@kebe7jun.com>
|
2025-09-04 17:49:06 +08:00 |
|
Lucas Wilkinson
|
402759d472
|
[Attention] FlashAttn MLA (#14258)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-09-04 02:47:59 -07:00 |
|
mgazz
|
51d5e9be7d
|
[Core][Model] Terratorch backend integration (#23513)
Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-09-04 00:22:41 -07:00 |
|
bingchen-mi
|
e7fc70016f
|
[Model] Add MiDashengLM model support (#23652)
Signed-off-by: chenbing8 <chenbing8@xiaomi.com>
Signed-off-by: bingchen-mi <chenbing8@xiaomi.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-04 00:08:09 -07:00 |
|
Li, Jiang
|
57b1ce94f7
|
[CPU] Refactor CPU unquantized linear (#24150)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-09-04 14:28:45 +08:00 |
|
Flora Feng
|
712b273f65
|
[Refactor] Introduce basic Renderer for completion-style request (#24010)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2025-09-04 05:21:12 +00:00 |
|
wuhang
|
a38f8bd54c
|
[Feature][Responses API]Support MCP tools with streaming mode + background mode (#23927)
Signed-off-by: wuhang <wuhang6@huawei.com>
|
2025-09-04 04:05:10 +00:00 |
|
Peter Pan
|
b5ee1e3261
|
Remove deprecated PyNcclConnector (#24151)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
|
2025-09-03 22:49:16 +00:00 |
|
Matthew Bonanni
|
a742322092
|
[Attention] Blackwell FP8 MLA support with CUTLASS_MLA backend (#23289)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-09-03 14:05:24 -04:00 |
|
bnellnm
|
e9b92dcd89
|
[Kernels] Overlap shared experts with send/recv (#23273)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-09-03 12:35:18 -04:00 |
|
nopperl
|
fa4311d85f
|
[V1] v1 engine + full CUDA graph support for PLaMo2 (#23998)
Signed-off-by: Hemmi Shinichi <shemmi@preferred.jp>
Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>
Co-authored-by: Hemmi Shinichi <shemmi@preferred.jp>
Co-authored-by: Thomas Parnell <tom.parnell@gmail.com>
|
2025-09-03 08:24:02 -07:00 |
|
wang.yuqi
|
51383bd472
|
[CI] Accelerate mteb test by setting SentenceTransformers mteb score to a constant (#24088)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-09-03 17:23:56 +08:00 |
|
Isotr0py
|
9c99e4871f
|
[Misc] Clean up deadcode for legacy processing pipeline (#24153)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-03 08:34:29 +00:00 |
|
dsinghvi
|
70549c1245
|
[CI/Build] Serve images used by multimodal tests through local HTTP Server (#23907)
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-09-03 16:13:11 +08:00 |
|
Didier Durand
|
d7e1e59972
|
[Doc]: fix typos in Python comments (#24093)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-09-02 21:05:45 -07:00 |
|
co63oc
|
1bd007f234
|
fix some typos (#24071)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
|
2025-09-02 20:44:50 -07:00 |
|
afeldman-nm
|
136d853e65
|
[V1] Wrapper which plumbs request-level logits processors into vLLM batch-level logits processing (#23656)
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
|
2025-09-03 02:52:51 +00:00 |
|
Thomas Parnell
|
d328f7894f
|
[CI] Enable all hf transformers baselines in test_hybrid (#23936)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-09-02 20:15:06 +00:00 |
|
Mark McLoughlin
|
2417798471
|
[Metrics] Deprecate TPOT in favor of ITL (#24110)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-09-02 18:10:10 +00:00 |
|
Chenheli Hua
|
f399182e8c
|
Run ruff format on a few files. (#24075)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-09-02 17:55:32 +00:00 |
|
Michael Goin
|
e66ed3e675
|
[CI Failure] Skip failing nvfp4 silu test (#23959)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-02 13:18:15 -04:00 |
|
Aziz
|
ce30dca5c4
|
[CI]: reduce HTTP calls inside entrypoints openai tests (#23646)
Signed-off-by: AzizCode92 <azizbenothman76@gmail.com>
Signed-off-by: Aziz <azizbenothman76@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-02 10:49:32 +00:00 |
|
Didier Durand
|
fad73be1a5
|
[Doc]: fix typos in Python comments (#24077)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-09-02 02:38:55 -07:00 |
|
Asaf Joseph Gardin
|
2b41cbbf03
|
[V1][Mamba1] - FP32 SSM Kernel Support (#23506)
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
|
2025-09-01 20:53:00 -07:00 |
|
WeiQing Chen
|
a0e0efd6bd
|
[Model] Support DP for ViT on Kimi-VL-A3B-Thinking-2506 (#23817)
Signed-off-by: Junhong <liujunhong11@huawei.com>
Signed-off-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Junhong <liujunhong11@huawei.com>
Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-09-01 16:56:56 +00:00 |
|
Christian Pinto
|
cf91a89dd2
|
[docs][misc] IOProcessor plugins fixes (#24046)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
|
2025-09-01 09:17:41 -07:00 |
|
Kwai-Keye
|
7c8271cd1e
|
[Model]: support KeyeVL-1_5-8B (#23838)
Signed-off-by: wangruitao <wangruitao@kuaishou.com>
Co-authored-by: wangruitao <wangruitao@kuaishou.com>
|
2025-09-01 03:50:27 -07:00 |
|
Nicolò Lucchesi
|
d46934b229
|
[Frontend] Gemma3n audio transcriptions/translations endpoint (#23735)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-09-01 18:07:46 +08:00 |
|
Code Jesus
|
422e793fa6
|
[Bugfix] Add support for <tool_call> format in streaming mode for XLAM Tool Parser (#22769)
Signed-off-by: Devon Peroutky <devon@kindo.ai>
|
2025-09-01 14:07:54 +08:00 |
|
Christian Pinto
|
1cb39dbcdd
|
[Misc] IO Processor plugins for pooling models (#22820)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
|
2025-08-31 23:07:12 -07:00 |
|
Isotr0py
|
ff0e59d83a
|
[CI/Build] Improve Tensor Schema tests speed by avoid engine core initialization (#23357)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-31 22:52:20 -07:00 |
|
Roger Wang
|
749be00a98
|
[Core][Multimodal] Allow passing multi_modal_uuids as multimodal identifiers. (#23394)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-08-30 18:01:22 -07:00 |
|
Ning Xie
|
5490d633ce
|
[UT] fix unify_kv_cache_configs when kv cache config needs sort (#23843)
|
2025-08-30 11:22:14 +00:00 |
|