biondizzle/vllm

Author	SHA1	Message	Date
Patrick von Platen	10152d2194	[Realtime API] Adds minimal realtime API based on websockets (#33187 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-01-30 18:41:29 +08:00
wang.yuqi	7cbbca9aaa	[Frontend] Cleanup api server (#33158 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com>	2026-01-27 15:18:10 +00:00
wang.yuqi	76139d0801	[Frontend] Frontend will only attach supported tasks corresponding entrypoints. (#33139 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-01-27 12:15:43 +00:00
Jared Wen	6ee7f18f33	[Logging] add `--disable-access-log-for-endpoints` CLI option (#30011 ) Add a new CLI option --disable-access-log-for-endpoints to suppress uvicorn access logs for specified endpoints (e.g., /health, /metrics, /ping). This addresses the need to reduce log noise in production environments where health check endpoints are frequently polled by load balancers or monitoring systems, generating excessive log entries that obscure meaningful request logs. Fixes #29982 Signed-off-by: JaredforReal <w13431838023@gmail.com>	2026-01-26 21:49:03 +00:00
7. Sun	0f19427db5	[Perf] Cache exc.errors() result in validation exception handler (#32984 ) Signed-off-by: 7. Sun <jhao.sun@gmail.com>	2026-01-24 02:01:35 -08:00
Nick Hill	7fe255889e	[Misc] Log vLLM logo when starting server (#32796 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-23 11:15:12 +08:00
RickyChen / 陳昭儒	69d09fdd6c	[Feature] Add --ssl-ciphers CLI argument for TLS cipher control (#30937 ) Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>	2026-01-22 09:53:24 -08:00
Cyrus Leung	d117a4d1a9	[Frontend] Introduce Renderer for processing chat messages (using `ModelConfig`) (#30200 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-22 12:44:22 +00:00
wang.yuqi	4ae77dfd42	[Frontend][1/n] Make pooling entrypoints request schema consensus | CompletionRequest (#32395 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-01-16 06:17:04 +00:00
Chauncey	707b44cc28	[Refactor] [11/N] to simplify the mcp architecture (#32396 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-15 18:49:31 +08:00
Chauncey	4c1c501a7e	[Refactor] [10/N] to simplify the vLLM openai completion serving architecture (#32369 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-15 07:41:34 +00:00
Chauncey	00e6402d56	[Frontend] track responsesAPI server_load (#32323 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-14 12:00:37 +00:00
Cyrus Leung	3f28174c6a	[Frontend] Standardize use of `create_error_response` (#32319 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-14 11:22:26 +00:00
Chauncey	769d0629e1	[Refactor] [9/N] to simplify the vLLM openai translations serving architecture (#32313 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-14 10:20:58 +00:00
Chauncey	9312a6c03a	[Refactor] [8/N] to simplify the vLLM openai responsesapi_serving architecture (#32260 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-14 07:26:24 +00:00
Chauncey	fefce49807	[Refactor] [6/N] to simplify the vLLM openai chat_completion serving architecture (#32240 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-13 13:01:39 +00:00
Kevin Šuc	ac9f9330e6	Rename --exclude-log-deltas to --enable-log-deltas (#32020 ) Signed-off-by: Catacomba <kevinsuc16@gmail.com>	2026-01-09 15:30:40 +00:00
Cyrus Leung	aa125ecf0e	[Frontend] Improve error message (#31987 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-08 20:07:03 +00:00
R3hankhan	1ab055efe6	[OpenAI] Extend VLLMValidationError to additional validation parameters (#31870 ) Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>	2026-01-07 14:45:49 +00:00
Kevin Šuc	79ed460dd5	[Frontend] [Doc] Exclude log deltas feature (#30322 ) Signed-off-by: Catacomba <kevinsuc16@gmail.com> Signed-off-by: Kevin Šuc <kevinsuc16@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-01-05 16:34:35 +00:00
Hojin Yang	dc837bc23e	feat(frontend): add --default-chat-template-kwargs CLI argument (#31343 ) Signed-off-by: effortprogrammer <yhjhoward7@gmail.com>	2025-12-30 03:38:47 +00:00
RickyChen / 陳昭儒	b3a2bdf1ac	[Feature] Add offline FastAPI documentation support for air-gapped environments (#30184 ) Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com> Signed-off-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-29 16:22:39 +00:00
R3hankhan	769f27e701	[OpenAI] Add parameter metadata to validation errors (#30134 ) Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>	2025-12-23 11:30:12 +00:00
Jakub Zakrzewski	23daef548d	[Frontend] Support using chat template as custom score template for reranking models (#30550 ) Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com> Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-12-23 11:19:16 +00:00
Nathan Price	05a83dc6ee	feat(api): Eager chat template warmup to eliminate first-request latency (#30700 ) Signed-off-by: Nathan Price <nathan@abridge.com>	2025-12-18 00:01:29 +00:00
Chauncey	9ad5b21710	[Refactor] [4/N] Move VLLM_SERVER_DEV endpoints into the serve directory (#30749 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-12-17 02:27:30 -08:00
Chauncey	2a1776b7ac	[Refactor] [2/N] Move tool parsers into the vLLM main directory (#30675 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-12-15 12:54:52 +00:00
Cyrus Leung	e83b7e379c	Revert "[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 )" (#30199 )	2025-12-07 00:00:22 -08:00
Cyrus Leung	27f4c2fd46	[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 23:15:42 -08:00
Tova Movshovitz	adb315060c	[KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache (#27170 ) Signed-off-by: tovam <tovam@pliops.com> Signed-off-by: Tova Movshovitz <tovam@pliops.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-05 18:33:26 +00:00
Chauncey	3f42b05fbc	[Refactor] [1/N] to simplify the vLLM serving architecture (#28040 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-12-03 01:26:39 -08:00
Zhuohan Li	d0cd728907	[Core] Support reseting all running requests' KV while calling `reset_prefix_cache` (#28827 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-02 02:25:05 +00:00
sangbumlikeagod	092bb73b8a	[Frontend] add 'verbose_json' and 'timestamp' feature on Whisper Transcription/Translation (#24209 ) Signed-off-by: sangbumlikeagod <oironese@naver.com> Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>	2025-12-01 18:19:17 +01:00
wang.yuqi	62de4f4257	[Frontend] Resettle pooling entrypoints (#29634 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-12-01 15:30:43 +08:00
HappyAmazonian	f8151b66fa	Revert "Supress verbose logs from model_hosting_container_standards (… (#29335 ) Signed-off-by: Shen Teng <sheteng@amazon.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-28 05:29:05 -08:00
Samit	371b1d4c61	[RL] Add Pause and Resume Generation for Asynchronous RL Training (#28037 ) Signed-off-by: SamitHuang <285365963@qq.com> Signed-off-by: Samit <285365963@qq.com> Signed-off-by: samithuang <285365963@qq.com> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-11-20 03:01:03 -08:00
Michael Goin	67745d189f	Supress verbose logs from model_hosting_container_standards (#28949 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-18 12:29:06 -08:00
Zhuohan Li	dd6ac1c2bb	[RL] [V1] Remove unused device argument from reset_kv_cache (#28766 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>	2025-11-14 23:59:42 -08:00
Nicolò Lucchesi	6f1e7f7226	[DisaggEverything] Tokens in<>out `/generate` endpoint (#24261 ) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-14 09:58:01 -07:00
Srreyansh Sethi	360bd8762f	[Frontend] Added chat-style multimodal support to /classify. (#27516 ) Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com> Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com> Signed-off-by: vnadathur <glvikramn@gmail.com> Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: vnadathur <glvikramn@gmail.com> Co-authored-by: wang.yuqi <noooop@126.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-11-14 11:03:55 +00:00
Zuyi Zhao	bca74e32b7	[Frontend] Add sagemaker_standards dynamic lora adapter and stateful session management decorators to vLLM OpenAI API server (#27892 ) Signed-off-by: Zuyi Zhao <zhaozuy@amazon.com> Signed-off-by: Shen Teng <sheteng@amazon.com> Co-authored-by: Shen Teng <sheteng@amazon.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-11-11 04:57:01 +00:00
Jialin Ouyang	b30372cbd0	[Perf] Move gc.freeze logic from EngineCoreProc to EngineCore for better coverage (#27896 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-10 15:34:18 -08:00
Benjamin Chislett	975676d174	[Feat] Drop-in Torch CUDA Profiler (#27841 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-11-08 14:07:37 -08:00
Vico Chu	d4aa65c998	[Chore] eliminate duplicated and unconditional object serialization in anthropic messages api (#27792 ) Signed-off-by: Vico Chu <vico24826@gmail.com>	2025-11-06 19:09:19 +00:00
Roy Wang	d1dd5f53e4	[Frontend] Fix logging format when enable response logging (#28049 ) Signed-off-by: esmeetu <jasonailu87@gmail.com>	2025-11-06 16:25:39 +00:00
Walter Beller-Morales	752ddeacaa	[Core] add support for reasoning parser plugins (#28075 ) Signed-off-by: walter beller-morales <walter.beller.morales@gmail.com>	2025-11-06 01:15:06 +08:00
Chauncey	e261d37c9a	[Refactor] Lazy-loaded reasoning_parser (#28092 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-11-05 15:37:02 +08:00
wangxiyuan	428bc7bf1c	[V0 deprecation] Remove VLLM_USE_V1 usage in most modules (#27955 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-04 20:51:16 -08:00
Chauncey	c02fccdbd2	[Refactor] Lazy import tool_parser (#27974 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-11-04 10:10:10 +08:00
Misha Efimov	ba464e6ae2	Add ORCA endpoint load metrics support (#24905 ) Signed-off-by: Misha Efimov <mef@google.com>	2025-11-03 08:21:31 +00:00

323 Commits