Didier Durand
22cf679aad
[Doc]: fix various typos in multiple files (#23179)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-08-22 10:38:46 -07:00

Guillaume Calmettes
0ba1b54ac6
[gpt-oss] add input/output usage in responses api when harmony context is leveraged (#22667)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2025-08-22 08:32:24 +00:00

Bin Jia
5964069367
[New Model] Add Seed-Oss model (#23241)
Signed-off-by: jiabin.00 <jiabin.00@bytedance.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-22 04:58:10 +00:00

Cyrus Leung
8896eb72eb
[Deprecation] Remove prompt_token_ids arg fallback in LLM.generate and LLM.embed (#18800)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-22 10:56:57 +08:00

Kebe
5368f76855
[Feature][Responses API] Support logprobs(non-stream) (#23319)
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-08-21 23:09:16 +00:00

Chen Zhang
8a19303173
[BugFix][gpt-oss] Fix Chat Completion with Multiple Output Message (#23318)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-21 10:31:11 -07:00

Russell Bryant
4e51fa8cba
Do not use eval() to convert unknown types (#23266)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-08-20 13:28:30 -07:00

Chen Zhang
b95697d731
[Frontend] improve error logging of chat completion (#22957)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-20 13:03:37 -07:00

bigmoyan
582bbe6bd7
[Fix] correct tool_id for kimi-k2 when use tool_choice=required (#21259)
Co-authored-by: wangzhengtao <wangzhengtao@msh.team>
2025-08-20 12:59:54 -07:00

Russell Bryant
f77a0802b7
Limit HTTP header count and size (#23267)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
2025-08-20 17:57:37 +00:00

Marko Rosenmueller
80141bbf2f
fix: use cache_salt for gpt-oss (#23186)
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
2025-08-19 18:12:25 +00:00

22quinn
f7cf5b512e
[Frontend] Add /collective_rpc API endpoint (#23075)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-08-19 17:29:32 +00:00

Yuge Zhang
24f4d1a224
Add return_token_ids parameter to OpenAI API endpoints (#22587)
Signed-off-by: Yuge Zhang <scottyugochang@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2025-08-19 09:48:31 -07:00

Breno Baldas Skuk
ac6eb49de3
fix: OpenAI SDK compat (ResponseTextConfig) (#23126)
Signed-off-by: breno.skuk <breno.skuk@hcompany.ai>
Signed-off-by: Breno Baldas Skuk <breno.skuk@hcompany.ai>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-18 15:22:59 -07:00

afeldman-nm
bf7f470b22
[V1] Logits processors extensibility (#19912)
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeld2012@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-16 12:59:17 -07:00

Woonggi Min
68373d3126
[Frontend] Added support for HermesToolParser for models without special tokens (#16890)
Signed-off-by: minpeter <kali2005611@gmail.com>
2025-08-16 17:38:42 +00:00

Andrew Sansom
78863f8c5c
[BugFix] Add support for loading prompt embeds tensors serialized on unavailable devices and sparse tensors (#22962)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
2025-08-16 06:25:10 +00:00

Nick Hill
f6b5040590
[Frontend] Avoid list copies in serving_chat.py (#22947)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-08-16 02:06:30 +00:00

Csrayz
a0632a3e03
[Frontend] Expose do_log_stats interval to env (#22905)
Signed-off-by: Csrayz <jover@cmbchina.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-15 13:00:20 +00:00

Roger Wang
da2705198f
[Misc] clear and separate error messages for input too long and input + max-tokens too long (#22803)
Signed-off-by: Roger Wang <hey@rogerw.me>
2025-08-13 07:22:56 -07:00

Kdump
653124bd46
[Frontend] Add chunked processing to handle long inputs in embedding models (#22280)
Signed-off-by: x22x22 <wadeking@qq.com>
Signed-off-by: Kdump <rootshellexp@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-13 04:14:24 -07:00

Chen Zhang
6807af8f46
[gpt-oss] upgrade gpt-oss to v0.0.3 and add version check (#22768)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-12 21:37:26 -07:00

Chen Zhang
ad344ef552
[gpt-oss] Small bug fixes for frontend (#22512)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-11 22:04:38 -07:00

Chen Zhang
95a935fc48
[gpt-oss] Support streaming in response API (#22431)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-11 17:46:59 -07:00

wang.yuqi
84cf78acee
[Model] Pooling models default to using chunked prefill & prefix caching if supported. (#20930)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-11 09:41:37 -07:00

Harry Mellor
bc1d02ac85
[Docs] Add comprehensive CLI reference for all large vllm subcommands (#22601)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-11 00:13:33 -07:00

Maximilien de Bayser
39052dbca8
Support token_type_ids in V1 with less code changes (#21985)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-08-10 22:54:59 -07:00

yyweiss
baece8c3d2
[Frontend] Add unix domain socket support (#18097)
Signed-off-by: <yyweiss@gmail.com>
Signed-off-by: yyw <yyweiss@gmail.com>
2025-08-08 16:23:44 -07:00

Chen Zhang
fe6d8257a1
[gpt-oss] Support tool call and implement MCP tool server (#22427)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-08 15:06:37 -07:00

Andrew Sansom
e2c8f1edec
[PERF] Use pybase64 to more quickly decode prompt embeddings (#22469)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
2025-08-07 19:15:32 -07:00

Cyrus Leung
139d155781
[Frontend] Use engine argument to control MM cache size (#22441)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-07 09:47:10 -07:00

Woosuk Kwon
399d2a10e2
Fix pre-commit error in main (#22462)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-07 08:54:39 -07:00

Chen Zhang
4815b00f54
[gpt-oss] Generate ResponseOutputItem from Harmony Message (#22410)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-07 08:33:25 -07:00

Chen Zhang
4da8bf20d0
[Tool] Fix auto tool call (#22434)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-07 07:03:38 -07:00

Cyrus Leung
766bc8162c
[Core] Store only the keys for multi-modal data in P0 (#22198)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-07 01:45:04 -07:00

Adrián García García
8e8e0b6af1
feat: Add --enable-log-outputs flag for logging model generations (#20707)
Signed-off-by: Adrian Garcia <adrian.garcia@inceptionai.ai>
2025-08-06 23:10:13 -07:00

Moritz Sanft
370661856b
[Frontend] Update OpenAI error response to upstream format (#22099)
Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>
2025-08-06 23:06:00 -07:00

Chen Zhang
f6278b6243
[gpt-oss] Convert user input to harmony format (#22402)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-06 20:56:02 -07:00

Lionel Villard
ad6c655dde
preload heavy modules when mp method is forkserver (#22214)
Signed-off-by: Lionel Villard <villard@us.ibm.com>
2025-08-06 20:33:24 -07:00

qscqesze
5e9455ae8f
[Bugfix]: Fix the streaming output for function calls in the minimax (#22015)
Signed-off-by: QscQ <qscqesze@gmail.com>
Signed-off-by: qingjun <qingjun@minimaxi.com>
2025-08-06 20:30:27 -07:00

Chen Zhang
19c9365aa4
[gpt-oss] add demo tool server (#22393)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-06 17:47:14 -07:00

Woosuk Kwon
ec7cb19224
[gpt-oss] Add loop for built-in tool call (#22374)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
2025-08-06 10:32:21 -07:00

Woosuk Kwon
9edd1db02b
[Minor] Fix type (#22347)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-06 02:22:03 -07:00

Woosuk Kwon
f263a4b53f
[gpt-oss] Support chat completion api (#22342)
2025-08-06 01:57:39 -07:00

Woosuk Kwon
178d03fbd6
[gpt-oss] Add Tool/ConversationContext classes and harmony_utils (#22340)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
2025-08-06 01:08:49 -07:00

wang.yuqi
586f286789
[Model] Pooling model activation supports per request control by PoolingParams (#20538)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-05 00:37:00 -07:00

tlipoca9
8a6e108e76
fix: kimi_k2 return empty tool call list (#22149)
Signed-off-by: tlipoca9 <tlipoca9@gmail.com>
2025-08-04 19:15:31 -07:00

Woosuk Kwon
9af654cc38
[Responses API] Ignore store=True and process the request by default (#22185)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-04 05:12:48 -07:00

Woosuk Kwon
6d98843b31
[Responses API] Disable response store by default (#22137)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-03 04:04:21 -07:00

Cyrus Leung
f5d0f4784f
[Frontend] Improve error message for too many mm items (#22114)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-02 02:20:38 -07:00