Commit Graph

748 Commits

Author SHA1 Message Date
Martin Vit
95a395dbec [Bugfix] Fix Anthropic API base64 image handling in Messages endpoint (#35557)
Signed-off-by: Martin Vit <martin@voipmonitor.org>
2026-02-28 20:57:08 +00:00
Wentao Ye
e113a30113 [Deprecation] Deprecate code in 0.17 as scheduled (#35441)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-28 17:32:37 +00:00
Andreas Karatzas
1e69c04887 [ROCm][CI] Parametrize vision score tests across attention backends with per-backend tolerances (#35571)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-28 08:59:26 +00:00
Chauncey
06254d4cbb [CI] add trainer_send_weights for MockWeightTransferEngine (#35589)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-02-28 06:47:43 +00:00
Umut Polat
1d5ab5d603 [Bugfix] Move chat completion response_format validation to Pydantic model_validator (#35510)
Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com>
2026-02-27 21:26:19 -08:00
Umut Polat
b66a74649e [Bugfix] Replace assert with ValueError for response_format validation in completions endpoint (#35456)
Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com>
2026-02-27 08:01:06 +00:00
daniel-salib
d43048ce05 [Bugfix] Emit reasoning_part events in simple streaming path for Resp… (#35184)
Signed-off-by: Daniel Salib <danielsalib@meta.com>
2026-02-27 09:49:06 +08:00
Wentao Ye
99c7892c5b [Perf] Optimize maxsim scores computation for pooling models, 13.9% E2E throughput improvement (#35330)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-02-26 17:14:54 +00:00
Krish Gupta
3827c8c55a [Test] Add tests for n parameter in chat completions API (#35283)
Signed-off-by: KrxGu <krishom70@gmail.com>
2026-02-26 09:14:07 +00:00
Flora Feng
186ea22efe [Misc][Harmony] Move Responses API only harmony utils to responses/harmony.py (#35339)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-02-26 14:35:16 +08:00
pushkar
5d18bf8b32 [Bugfix] Fix Harmony preamble visibility in Responses API (#32114)
Signed-off-by: Pushkar Patel <git@thepushkarp.com>
Signed-off-by: pupa <pupa@users.noreply.github.com>
2026-02-25 08:08:16 -08:00
Andreas Karatzas
2ff3e436ad [Responses][CI] Filter negative token IDs in schema fuzz test to avoid 500 errors (#35231)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-25 05:52:44 +00:00
Flora Feng
ec1d30c0f6 [Responses] Decouple SSE event helpers from Harmony context (#35148)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-02-24 20:05:25 -08:00
Pooya Davoodi
e3b2324ec4 [Frontend] Use init_app_state and FrontendArgs in run_batch (#32967)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-24 19:40:39 -08:00
Harry Mellor
28c5e69ba0 Enforce that model is the first positional arg when --served-model-name is used (#34973)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-23 08:38:05 -08:00
Robert Shaw
d13ece38d7 [CI] Skip Responses API (#34990)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-02-23 07:46:45 -08:00
Andreas Karatzas
dd8c3a7fb2 [ROCm][CI] Fix realtime test timeouts caused by aiter JIT compilation delays (#35052)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-22 10:07:18 +00:00
Andreas Karatzas
a8a47c17b6 [ROCm][CI] Fix flaky embedding chat test by using tolerance-based comparison (#35050)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-22 09:03:44 +00:00
Roman
98b0205c3c [Frontend] Add automatic language detection for Whisper transcription (#34342)
Signed-off-by: space_check <roman.vuskov@rwth-aachen.de>
Signed-off-by: Roman <45857014+spacecheck@users.noreply.github.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2026-02-21 04:49:41 -08:00
Andreas Karatzas
991d6bff38 [CI][MCP][Harmony] Heavy refactoring Harmony & MCP response tests and stabilizing with deterministic test infrastructure (#33949)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-20 20:03:32 -08:00
Micah Williamson
f5432e35a3 [ROCm][CI] Loosen RemoteOpenAIServer Startup Timeout (#34922)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-02-20 05:37:49 +00:00
Varun Chawla
676f82ae81 Add validation to reject non-text content in system messages (#34072)
Signed-off-by: Varun Chawla <varun_6april@hotmail.com>
2026-02-19 21:30:33 -08:00
Tal Nir
f75b61a9e9 [Voxtral Realtime] Fix engine crash on empty multimodal embeddings (#34862)
Signed-off-by: Tal Nir <tal@nervexneurotech.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 23:21:47 -08:00
Jaeyeon Kim(김재연)
9681068cf9 [Frontend] Fix reasoning_tokens for text-based parsers in Responses API (#33513)
Signed-off-by: Jaeyeon Kim <anencore94@gmail.com>
2026-02-18 23:16:41 -08:00
Aaron Hao
e99ba957ec [BUG] Fixing Weight Sync unit test (#34841)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
2026-02-18 17:20:10 -05:00
Jongseok Park
c656ba3b4d [Kernel] Triton-based Top-k and Top-p sampler kernels (#33538)
Signed-off-by: js_park <cakeng@naver.com>
Signed-off-by: Jongseok Park <37990712+cakeng@users.noreply.github.com>
Signed-off-by: Sunga Kim <sunga.kim@berkeley.edu>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Sunga Kim <sunga.kim@berkeley.edu>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-02-17 23:14:30 +00:00
Flora Feng
1e4a084c8e [CI] Fix flaky test_parsable_context (#34717)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-02-17 18:42:52 +00:00
Cyrus Leung
574fe75245 [Renderer] Move InputPreprocessor into Renderer (2/2) (#34560)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-17 05:29:01 -08:00
Nicolò Lucchesi
6cc403e67d [Bugfix][CI] Fix flaky entrypoints/openai/test_response_api_with_harmony.py::test_function_calling[openai/gpt-oss-20b] (#34624)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-02-16 16:11:07 +00:00
Almog Tavor
72d5951d02 [Bugfix] Treat generation_config max_tokens as default not ceiling (#34063)
Signed-off-by: almogtavor <almogtavor@gmail.com>
2026-02-16 07:58:24 -08:00
Andreas Karatzas
1e828573b4 [CI][Metrics] Stabilize tests with polling and subprocess guards (#34566)
test_abort_metrics_reset is flaky due to hardware-dependent
fixed sleeps: replace fixed sleeps with polling.

test_metrics_exist_run_batch passes even when the engine crashes
on startup (false positive): add subprocess lifecycle guards.

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-16 10:52:02 +00:00
Andreas Karatzas
974d829b05 [CI][Frontend] Return 422 instead of 500 for invalid Anthropic tool_choice (#34590)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-15 20:06:48 -08:00
Cyrus Leung
73391a1baa [Renderer] Move InputPreprocessor into Renderer (1/2) (#34510)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-02-14 10:14:21 -08:00
Ben Browning
fd267bc7b7 [Bugfix]: Fix structured output in multi-turn gpt-oss (#34454)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-13 11:12:48 -08:00
Ilya Boytsov
071d863e20 Extend ColBERT support to non-standard BERT backbones (#34170)
Signed-off-by: Ilya Boytsov <ilya.boytsov@aleph-alpha.com>
2026-02-13 09:53:09 +00:00
Cyrus Leung
2f308214c0 [Refactor] Pass full VllmConfig to Renderer (#34485)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-12 22:48:38 -08:00
Andreas Karatzas
6afa587d31 [ROCm][CI] Fix serving tokens test failures (#34047)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-13 11:27:53 +08:00
Cyrus Leung
fc22cae4ac [CI/Build] Update video URLs for testing (#34446)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-12 18:15:36 -08:00
Alec S
be7370daf3 [Frontend] Enable generic structured_outputs for responses API (#33709)
Signed-off-by: Alec Solder <alecs@fb.com>
Co-authored-by: Alec Solder <alecs@fb.com>
2026-02-12 16:15:48 -08:00
Patrick von Platen
1100a97621 [Voxstral Realtime] Enable tests (#33803)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
2026-02-12 09:43:24 -08:00
Cyrus Leung
fb455ed547 [V0 Deprecation] Remove code related to per-request logits processors (#34400)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-12 20:44:28 +08:00
Cyrus Leung
b96f7314b4 [Refactor] Pass Renderer to Input Processor (#34329)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-11 19:38:11 -08:00
Junseo Park
5458eb835d [Bugfix] send None sentinel on final commit so server properly sends transcription.done (#33963)
Signed-off-by: pjs102793 <pjs102793@naver.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-02-11 21:01:53 +00:00
Adam Binford
1b8756562e Responses harmony system message structured (#34268)
Signed-off-by: Adam Binford <adamq43@gmail.com>
2026-02-11 05:14:28 -08:00
wang.yuqi
dab1de9f38 [Frontend][CI] Consolidate instrumentator entrypoints (#34123)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-02-10 07:30:19 +00:00
Andrew Xia
9608844f96 [responsesAPI] fix simpleContext streaming output_messages (#34188)
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2026-02-09 22:53:07 -08:00
Cyrus Leung
ab97bcf662 [CI/Build] Relax test_mcp_tool_call (#34204)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-10 05:18:57 +00:00
wang.yuqi
22b64948f6 [Frontend][last/5] Make pooling entrypoints request schema consensus. (#31127)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-02-09 06:42:38 +00:00
Reagan Lee
c4df59ad43 Add embedding input functionality for disabled modalities [remake] (#32493)
Signed-off-by: Reagan Lee <“reaganjlee@gmail.com”>
Signed-off-by: Reagan Lee <reaganjlee@gmail.com>
Signed-off-by: Reagan Lee <96998476+reaganjlee@users.noreply.github.com>
Co-authored-by: Reagan Lee <“reaganjlee@gmail.com”>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-08 04:57:16 -08:00
Pooya Davoodi
2cb2340f7a [Frontend]Add support for transcriptions and translations to run_batch (#33934)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-02-07 05:24:57 -08:00