Stanislav Kirillov
|
50dbd6c9e6
|
[bugfix] Fix critical bug when reporting for all paths where handler.create_error_response is used (#34516)
Signed-off-by: Stanislav Kirillov <stas@nebius.com>
Co-authored-by: Stanislav Kirillov <stas@nebius.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-14 23:24:25 -08:00 |
|
Cyrus Leung
|
fb455ed547
|
[V0 Deprecation] Remove code related to per-request logits processors (#34400)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-12 20:44:28 +08:00 |
|
Cyrus Leung
|
b96f7314b4
|
[Refactor] Pass Renderer to Input Processor (#34329)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-11 19:38:11 -08:00 |
|
Cyrus Leung
|
11a4c9d30d
|
[Misc] Simplify get_max_tokens (#34036)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-07 00:59:49 -08:00 |
|
Cyrus Leung
|
cd8b405bd0
|
[Refactor] Consolidate sequence normalization and enc-dec parsing (#33928)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-06 15:43:47 +00:00 |
|
Chauncey
|
6abb0454ad
|
[Perf] Optimize the performance of structured output + reasoning (#33557)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-02-05 15:45:29 +08:00 |
|
Chauncey
|
f67ee8b859
|
[Perf] Optimize chat completion streaming performance (#33782)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-02-04 12:30:36 +00:00 |
|
Andrew Xia
|
e1bf04b6c2
|
[1/N] Initial Implementation of Parser for ResponsesAPI (#32712)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2026-02-04 10:59:03 +08:00 |
|
Cyrus Leung
|
a502831d36
|
[Chore] Remove redundant input parsing methods (#33542)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-02 10:50:47 +00:00 |
|
Cyrus Leung
|
f0a1c8453a
|
[Frontend] Use new Renderer for Completions and Tokenize API (#32863)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-31 04:51:15 -08:00 |
|
Harry Mellor
|
c5113f60f2
|
Remove deprecated reasoning_content message field (#33402)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-30 11:48:15 +00:00 |
|
wangln19
|
39037d258e
|
Fix tool call indexing double-counting (#33141)
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
|
2026-01-29 05:57:09 +00:00 |
|
Cyrus Leung
|
e0b005d9cf
|
[Frontend] Cleanup serving engine (#33103)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-26 20:47:26 -08:00 |
|
wangln19
|
2d7053438a
|
fix: preserve native tool call ID in multi-turn tool calling (#32768)
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2026-01-27 10:22:35 +08:00 |
|
Cyrus Leung
|
51931c5c9a
|
[UX] Deduplicate sampling parameter startup logs (#32953)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-24 17:37:28 +08:00 |
|
Cyrus Leung
|
d117a4d1a9
|
[Frontend] Introduce Renderer for processing chat messages (using ModelConfig) (#30200)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-22 12:44:22 +00:00 |
|
Hyunkyun Moon
|
3c8740aacb
|
[Frontend] Add render endpoints for prompt preprocessing (#32473)
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Signed-off-by: Hyunkyun Moon <mhg5303@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-19 12:21:46 +08:00 |
|
cjackal
|
35bf5d08e8
|
[bugfix] Fix online serving crash when text type response_format is received (#26822)
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
Signed-off-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com>
Co-authored-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com>
|
2026-01-16 12:23:54 +08:00 |
|
Chauncey
|
4c1c501a7e
|
[Refactor] [10/N] to simplify the vLLM openai completion serving architecture (#32369)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-15 07:41:34 +00:00 |
|
Aleksandr Samarin
|
d084e9fca7
|
[MODEL] Fix handling of multiple channels for gpt-oss with speculative decoding (#26291)
Signed-off-by: Aleksandr Samarin <astrlrd@nebius.com>
Signed-off-by: southfreebird <yvorott@gmail.com>
Co-authored-by: southfreebird <yvorott@gmail.com>
|
2026-01-14 13:20:52 -05:00 |
|
Cyrus Leung
|
3f28174c6a
|
[Frontend] Standardize use of create_error_response (#32319)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-14 11:22:26 +00:00 |
|
Chauncey
|
fefce49807
|
[Refactor] [6/N] to simplify the vLLM openai chat_completion serving architecture (#32240)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-13 13:01:39 +00:00 |
|