Andrew Xia
|
52cb349fc0
|
[responsesAPI][3] ResponsesParser to set up non harmony MCP (#29413)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2025-12-02 11:24:45 -05:00 |
|
ImaGoodFella
|
60c3d413af
|
[Multimodal][Core] Optimize multimodal preprocessing cache by hashing image bytes instead of pixel values (#29621)
Signed-off-by: Rahul Steiger <rasteiger@ethz.ch>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-02 21:49:02 +08:00 |
|
Cyrus Leung
|
653591d5e7
|
[Chore] Move tokenizer initialization methods (#29793)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-02 13:33:37 +08:00 |
|
Zuyi Zhao
|
53bf71b0f0
|
[Misc] Update conftest for entrypoints/sagemaker test folder (#29799)
Signed-off-by: Zuyi Zhao <zhaozuy@amazon.com>
|
2025-12-01 18:56:39 -09:00 |
|
Andrew Xia
|
fa8804ad9c
|
[responsesAPI][4] fix responseOutputItem Kimi K2 thinking bug (#29555)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2025-12-02 02:11:35 +00:00 |
|
sangbumlikeagod
|
092bb73b8a
|
[Frontend] add 'verbose_json' and 'timestamp' feature on Whisper Transcription/Translation (#24209)
Signed-off-by: sangbumlikeagod <oironese@naver.com>
Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>
|
2025-12-01 18:19:17 +01:00 |
|
Cyrus Leung
|
f0a28bf661
|
[Misc] Unify tokenizer registration (#29767)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-01 11:34:58 +00:00 |
|
daniel-salib
|
014ece97c7
|
[Frontend] Add tool filtering support to ToolServer (#29224)
Signed-off-by: Daniel Salib <danielsalib@meta.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-12-01 08:03:57 +00:00 |
|
wang.yuqi
|
62de4f4257
|
[Frontend] Resettle pooling entrypoints (#29634)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2025-12-01 15:30:43 +08:00 |
|
Jee Jee Li
|
b9d0504a36
|
[Bugfix] Revert test_tokenization.py (#29729)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-29 16:35:15 +00:00 |
|
Cyrus Leung
|
34a984274e
|
[Misc] Refactor tokenizer interface (#29693)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-29 04:02:21 -08:00 |
|
Jee Jee Li
|
39e63dec7c
|
[LoRA] Cleanup LoRA unused code (#29611)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-28 22:52:58 -08:00 |
|
Tsukasa OI
|
762a4a6ca9
|
[Frontend] Perform offline path replacement to tokenizer (#29706)
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
|
2025-11-28 18:32:08 -08:00 |
|
Cyrus Leung
|
b2c50eda50
|
[Bugfix] Fix wrong mock attribute (#29704)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-29 10:30:41 +08:00 |
|
Cyrus Leung
|
8d9338fae4
|
[Chore] Rename Processor to InputProcessor (#29682)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-28 09:35:41 -08:00 |
|
Nicolò Lucchesi
|
e5a621b724
|
[CI] Add batched audios Whisper test (#29308)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-11-27 19:31:52 +00:00 |
|
Andrew Xia
|
b07555d26f
|
[responsesAPI][2] parse ResponseFunctionToolCallOutputItem (#29383)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2025-11-25 10:27:26 -08:00 |
|
wang.yuqi
|
7a80b01889
|
[CI] Resettle pooling entrypoints tests. (#29370)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2025-11-25 10:39:10 +00:00 |
|
Mark McLoughlin
|
9cf4edae6e
|
[Metrics] Scheduled removal of deprecated metrics (#29330)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-11-25 11:15:13 +08:00 |
|
Nick Hill
|
84371daf75
|
[Tests] Verify gpt_oss package is installed in harmony tests (#29336)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-24 22:04:31 +00:00 |
|
Aydin Abiar
|
656516c315
|
[Bugfix] properly handle nested json with llama3 tool parser (#27701)
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Signed-off-by: Aydin Abiar <62435714+Aydin-ab@users.noreply.github.com>
Co-authored-by: Aydin Abiar <aydin@anyscale.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-11-24 15:28:51 +00:00 |
|
Andrew Xia
|
742e9ff6b3
|
[responsesAPI] parse reasoning item input (#28248)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-22 15:42:11 +08:00 |
|
rasmith
|
6f403501a0
|
[CI/Build][AMD] Enable Entrypoints Integration Test (Pooling) to run without error on ROCm (#29212)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-22 02:13:18 +00:00 |
|
jeremyteboul
|
0730414999
|
[Core] Add audio_embeds support to chat completions (#29059)
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>
|
2025-11-21 11:39:47 +08:00 |
|
Kevin H. Luu
|
c64c0b78de
|
[chore] Move the rest of wikimedia url to S3 (#28921)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-18 09:44:18 -08:00 |
|
Isotr0py
|
896e41ae04
|
[CI/Build] Replace wikipedia url with local server ones (#28908)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-18 08:10:55 +00:00 |
|
Cyrus Leung
|
bf9e1e8767
|
[Bugfix] Fix wrong CLI defaults for dynamic SchedulerConfig fields (#28872)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-17 20:30:29 -08:00 |
|
Nicolò Lucchesi
|
6f1e7f7226
|
[DisaggEverything] Tokens in<>out /generate endpoint (#24261)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-14 09:58:01 -07:00 |
|
Srreyansh Sethi
|
360bd8762f
|
[Frontend] Added chat-style multimodal support to /classify. (#27516)
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Signed-off-by: vnadathur <glvikramn@gmail.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: vnadathur <glvikramn@gmail.com>
Co-authored-by: wang.yuqi <noooop@126.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2025-11-14 11:03:55 +00:00 |
|
Andrew Xia
|
1a0b157a2e
|
[Frontend][responsesAPI][1/n] convert responses API tool input to chat completions tool format (#28231)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-11-13 04:47:22 +00:00 |
|
Andrew Xia
|
7c38ed0f1c
|
[Frontend] split append tool output (#28333)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2025-11-13 04:03:23 +00:00 |
|
wangxiyuan
|
e1710393c4
|
[[V0 deprecation]]Remove VLLM_USE_V1 env (#28204)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-11-11 18:22:16 -07:00 |
|
Zuyi Zhao
|
bca74e32b7
|
[Frontend] Add sagemaker_standards dynamic lora adapter and stateful session management decorators to vLLM OpenAI API server (#27892)
Signed-off-by: Zuyi Zhao <zhaozuy@amazon.com>
Signed-off-by: Shen Teng <sheteng@amazon.com>
Co-authored-by: Shen Teng <sheteng@amazon.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-11-11 04:57:01 +00:00 |
|
Harry Mellor
|
d9ab1ad9d1
|
reasoning_content -> reasoning (#27752)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-08 12:15:08 +00:00 |
|
Alex Brooks
|
e70fbc599b
|
[CI/Build] Loosen STT LoRA Translate Check (Flaky Test) (#28247)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex Brooks <alex.brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-11-07 02:51:27 +00:00 |
|
Isotr0py
|
3f5a4b6473
|
[Bugfix] Validate custom logits processor xargs for online serving (#27560)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-05 16:53:33 +00:00 |
|
Alex Brooks
|
b7cbc25416
|
[Model, Core] Support Granite Speech & LoRA for STT (#24455)
|
2025-11-05 08:33:48 +01:00 |
|
Misha Efimov
|
ba464e6ae2
|
Add ORCA endpoint load metrics support (#24905)
Signed-off-by: Misha Efimov <mef@google.com>
|
2025-11-03 08:21:31 +00:00 |
|
Vensen
|
0ce743f4e1
|
Fix(llm): Abort orphaned requests when llm.chat() batch fails Fixes #26081 (#27420)
Signed-off-by: vensenmu <vensenmu@gmail.com>
|
2025-11-02 16:24:01 +00:00 |
|
Ben Browning
|
758ea2e980
|
[CI/Build] Fix flaky test_transcription_validation.py::test_basic_audio_gemma (#27924)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
|
2025-11-02 03:45:02 +00:00 |
|
Benjamin Bartels
|
1e88fb751b
|
Adds anthropic /v1/messages endpoint to openai api_server (#27882)
Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
|
2025-11-01 12:45:42 -07:00 |
|
cong-meta
|
a2981c4272
|
[EP/DP][API Server] Enable DP-aware routing in OpenAI API requests (#24945)
Co-authored-by: Cong Chen <prowindy@gmail.com>
|
2025-10-30 12:10:16 -07:00 |
|
wang.yuqi
|
4464723f22
|
[Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. (#25524)
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-10-30 12:13:05 +00:00 |
|
Braulio Dumba
|
1da3309ace
|
[Core] Exposing engine sleep & wake_up state as prometheus metrics (#24176)
Signed-off-by: Braulio Dumba <Braulio.Dumba@ibm.com>
|
2025-10-29 09:32:01 -07:00 |
|
Alec S
|
ab2eb27b74
|
[Frontend] [gpt-oss] Mcp type bug (#27689)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: Alec Solder <alecs@fb.com>
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-10-29 10:01:32 +00:00 |
|
Junpu Fan
|
b186149e8e
|
[Bugfix][Frontend] validate arg priority in frontend LLM class before add request (#27596)
Signed-off-by: Junpu Fan <junpufan@gmail.com>
|
2025-10-28 14:02:43 +00:00 |
|
wangln19
|
446912d1cb
|
fix: allow HuggingFace standard chat template params via **kwargs (#27622)
Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-10-28 21:12:34 +08:00 |
|
Matthew Bonanni
|
44b5ce956d
|
[Bugfix] In LongRoPE, decide short vs long based on max_model_len (#27431)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-10-28 12:00:56 +00:00 |
|
Andrew Xia
|
53a56e658b
|
[gpt-oss][2/N] Support input_messages in responsesRequest (#26962)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2025-10-27 23:15:49 +00:00 |
|
Ben Browning
|
3b96f85c36
|
[Chore]: Stream tokens vs characters in tool call parser tests (#26513)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
|
2025-10-27 23:06:25 +08:00 |
|