biondizzle/vllm - vllm

Author	SHA1	Message	Date
Aaron Pham	77a318bd01	[V1][Core] Support MistralTokenizer for Structured Output (#14625 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-03-12 10:40:09 +08:00
Farzad Abdolhosseini	80e78d02ac	[Model] Extend Ultravox to accept audio longer than 30s (#13631 ) Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>	2025-03-12 10:27:10 +08:00
Joe Runde	47532cd9f4	[core][V1] pluggable scheduler (#14466 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-03-12 01:15:15 +00:00
Russell Bryant	4bf82d4b90	[V1] Add regex structured output support with xgrammar (#14590 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-11 23:03:44 +08:00
Cyrus Leung	af295e9b01	[Bugfix] Update `--hf-overrides` for `Alibaba-NLP/gte-Qwen2` (#14609 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-11 07:59:43 -07:00
Jeff Daily	a1c8f3796c	dynamic dispatch of fp8 kernels (#14245 ) Signed-off-by: Jeff Daily <jeff.daily@amd.com>	2025-03-11 10:54:56 -04:00
Roger Wang	1fc973c0b5	[V1][Core] Fix memory issue with logits & sampling (#14508 ) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Varun Sundar Rabindranath <3337719+varun-sundar-rabindranath@users.noreply.github.com>	2025-03-11 04:03:41 +00:00
Liangfu Chen	c91b64f749	[neuron] add reshape_and_cache (#14391 )	2025-03-10 18:37:29 -07:00
gnovack	d6123170d5	[Neuron] Add Neuron device communicator for vLLM v1 (#14085 )	2025-03-10 18:37:04 -07:00
Varun Sundar Rabindranath	5ff0d32580	[V1] LoRA - Add triton kernels for V1 (#13096 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-10 17:27:53 -04:00
Harry Mellor	3b352a2f92	Correct capitalisation: `VLLM` -> `vLLM` (#14562 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-10 16:36:21 +00:00
Szymon Ożóg	89cdaa83e7	[Kernel] Add more dtype support for GGUF kernels (#14043 ) Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com> Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>	2025-03-10 07:30:04 -07:00
Robert Shaw	5f0b53c6ea	Revert "[V1][Core] Fix memory issue with logits & sampling" (#14504 ) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-03-08 17:43:37 -08:00
22quinn	eb8b5eb183	[V1] Support bad_words in sampler (#13376 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-03-08 14:50:26 -08:00
Isotr0py	609ef61fea	[Bugfix] Fix profiling OOM and decouple encoder multimodal profiling (#14361 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-03-08 16:52:34 +00:00
Roger Wang	8d5aa466fb	[V1][Core] Fix memory issue with logits & sampling (#13776 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-03-08 06:11:04 -08:00
Alexander Matveev	cb8bdfade2	[V1] TPU - Add tensor parallel support via Ray (#13618 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-03-08 08:19:38 -05:00
Cyrus Leung	33f227e16b	[CI/Build] Use a fixed seed to avoid flaky tests (#14480 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-08 11:30:09 +00:00
Harry Mellor	47512b3200	Default to `generation_config` from model (#12622 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-08 14:46:15 +08:00
afeldman-nm	ef64044079	[V1] Prompt logprobs + APC compatibility; prompt logprobs reqs cannot fill APC (#13949 )	2025-03-08 01:48:12 +00:00
Nick Hill	8ed5421aaa	[V1] Eagerly remove finished requests from the batch (#14388 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-07 10:56:00 -08:00
Jinzhen Lin	d0feea31c7	[Kernel] optimize performance of gptq marlin kernel when n is small (#14138 ) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>	2025-03-07 11:53:38 -05:00
Aaron Pham	80e9afb5bc	[V1][Core] Support for Structured Outputs (#12388 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-03-07 07:19:11 -08:00
மனோஜ்குமார் பழனிச்சாமி	cc10281498	[Misc] Set default value of seed to None (#14274 ) Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>	2025-03-07 10:40:01 +00:00
Jee Jee Li	12c29a881f	[Bugfix] Further clean up LoRA test (#14422 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-07 10:30:55 +00:00
Ilya Lavrenov	8ca7a71df7	OpenVINO: added CPU-like conditions (#14338 ) Signed-off-by: Ilya Lavrenov <ilya.lavrenov@intel.com>	2025-03-06 22:24:49 -08:00
Jee Jee Li	ddd1ef66ec	[Bugfix] Fix JambaForCausalLM LoRA (#14370 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-06 22:05:47 -08:00
Luka Govedič	e1744502c2	[FP8] Refactor apply_fp8_linear and apply_fp8_linear_generic into an object (#14390 ) Signed-off-by: luka <luka@neuralmagic.com>	2025-03-07 05:20:16 +00:00
Himanshu Jaju	cd579352bf	[V1] Do not detokenize if sampling param detokenize is False (#14224 ) Signed-off-by: Himanshu Jaju <hj@mistral.ai> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-03-06 10:40:24 -08:00
Harry Mellor	bf0560bda9	Reinstate `best_of` for V0 (#14356 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-06 08:34:22 -08:00
Thomas Parnell	6bd1dd9d26	[Kernel] [V1] Improved performance for V1 Triton (ROCm) backend (#14152 )	2025-03-06 07:39:16 -08:00
Nicolò Lucchesi	fa82b93853	[Frontend][Docs] Transcription API streaming (#13301 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-03-06 10:39:35 +00:00
kYLe	1769928079	[Model] Update Paligemma multimodal processing with PromptUpdate (#14015 ) Signed-off-by: Kyle Huang <kylhuang@nvidia.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-03-06 08:31:38 +00:00
Nicolò Lucchesi	5ee10e990d	[Bugfix][CI] ALiBi test case in xformers multi_query_kv_attention (#11301 )	2025-03-05 20:00:53 -08:00
Varun Sundar Rabindranath	3dbd2d813a	[V1] LoRA - Enable more V1 tests (#14315 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-06 11:55:42 +08:00
Lucas Wilkinson	f6bb18fd9a	[BugFix] MLA + V1, illegal memory access and accuracy issues (#14253 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-03-05 17:10:13 -08:00
Lu Fang	53ea6ad830	[V1][Easy] Add empty allowed_token_ids in the v1 sampler test (#14308 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-03-05 21:41:18 +00:00
Vincent	a4f1ee35d6	Deprecate `best_of` Sampling Parameter in anticipation for vLLM V1 (#13997 ) Signed-off-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com> Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-05 20:22:43 +00:00
Robert Shaw	257e200a25	[V1][Frontend] Add Testing For V1 Runtime Parameters (#14159 ) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2025-03-05 14:18:55 +00:00
Benjamin Chislett	32985bed7c	[Frontend] Allow return_tokens_as_token_ids to be passed as a request param (#14066 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-03-05 06:30:40 +00:00
Michael Goin	dae9ec464c	Temporarily disable test_awq_gemm_opcheck (#14251 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-03-05 06:10:35 +00:00
Tyler Michael Smith	72c62eae5f	[V1] EP/TP MoE + DP Attention (#13931 )	2025-03-04 21:27:26 -08:00
Congcong Chen	0a995d5434	[Model] New model support for Phi-4-multimodal-instruct (#14119 )	2025-03-04 20:57:01 -08:00
Nick Hill	5db6b2c961	[V1][BugFix] Fix remaining sync engine client shutdown errors/hangs (#13869 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-04 15:06:47 +00:00
Travis Johnson	c060b71408	[Model] Add support for GraniteMoeShared models (#13313 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-03-04 08:04:52 +08:00
Mark McLoughlin	ae122b1cbd	[WIP][[V1][Metrics] Implement max_num_generation_tokens, request_params_n, and request_params_max_tokens metrics (#14055 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-03-03 19:04:45 +00:00
TJian	848a6438ae	[ROCm] Faster Custom Paged Attention kernels (#12348 )	2025-03-03 09:24:45 -08:00
Cody Yu	f35f8e2242	[Build] Make sure local main branch is synced when VLLM_USE_PRECOMPILED=1 (#13921 ) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2025-03-03 16:43:14 +08:00
Harry Mellor	cf069aa8aa	Update deprecated Python 3.8 typing (#13971 )	2025-03-02 17:34:51 -08:00
Ce Gao	bf33700ecd	[v0][structured output] Support reasoning output (#12955 ) Signed-off-by: Ce Gao <cegao@tensorchord.ai>	2025-03-02 14:49:42 -05:00

4252 Commits