biondizzle/vllm - vllm

Author	SHA1	Message	Date
Cyrus Leung	43faa0461a	[Bugfix] Fix hybrid model tests (#17182 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-25 15:14:37 -07:00
Daniel Li	48cb2109b6	[V1] Move usage stats to worker and start logging TPU hardware (#16211 )	2025-04-25 14:06:01 -06:00
Russell Bryant	a5450f11c9	[Security] Use safe serialization and fix zmq setup for mooncake pipe (#17192 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-04-25 16:53:23 +00:00
Cyrus Leung	9d98ab5ec6	[Misc] Inline Molmo requirements (#17190 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-25 16:41:44 +00:00
Reid	df5c879527	[doc] update wrong hf model links (#17184 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-25 16:40:54 +00:00
Harry Mellor	423e9f1cbe	Use Transformers helper `get_text_config()` instead of checking for `text_config` (#17105 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-25 08:47:35 -07:00
Harry Mellor	0bd7f8fca5	Bump Transformers to 4.51.3 (#17116 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-25 08:34:34 -07:00
Jasmond L	d5615af9ae	[Bugfix] Fix Mistral ChatCompletionRequest Body Exception (#16769 ) Signed-off-by: Jasmond Loh <Jasmond.Loh@hotmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-04-25 07:26:30 -07:00
Cyrus Leung	19dcc02a72	[Bugfix] Fix mistral model tests (#17181 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-25 06:03:34 -07:00
Alex Brooks	7feae92c1f	[Doc] Move todo out of beam search docstring (#17183 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-04-25 04:44:58 -07:00
Michael Yao	f851b84266	[Doc] Add two links to disagg_prefill.md (#17168 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-04-25 10:23:57 +00:00
Lu Fang	fc966e9cc6	Only turn on FastIncrementalDetokenizer when tokenizers >= 0.21.1 (#17158 )	2025-04-25 17:10:32 +08:00
Michael Yao	ef19e67d2c	[Doc] Add headings to improve gptqmodel.md (#17164 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-04-25 01:13:13 -07:00
rasmith	a41351f363	[Quantization][FP8] Add support for FP8 models with input_scale for output projection and QK quantization (#15734 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com>	2025-04-25 00:45:02 -07:00
Sangyeon Cho	6aae216b4e	[Bugfix] remove fallback in guided_json (int range, patterns) (#16725 ) Signed-off-by: csy1204 <josang1204@gmail.com> Co-authored-by: 조상연[플레이스 AI] <sang-yeon.cho@navercorp.com>	2025-04-25 06:54:43 +00:00
yexin(叶鑫)	b22980a1dc	[Perf]Optimize rotary_emb implementation to use Triton operator for improved inference performance (#16457 ) Signed-off-by: cynthieye <yexin93@qq.com> Co-authored-by: MagnetoWang <magnetowang@outlook.com>	2025-04-25 14:52:28 +08:00
Lucas Wilkinson	881f735827	[Misc] Benchmark Serving Script Support Appending Results (#17028 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-04-24 22:53:55 -07:00
Mengqing Cao	2f54045508	[Bugfix][Misc] Use TritonPlaceholderModule to defensively import triton (#15099 ) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-04-24 22:51:02 -07:00
Lifu Huang	5aa6efb9a5	[Misc] Clean up redundant code in uniproc_executor.py (#16762 ) Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>	2025-04-24 22:49:30 -07:00
Harry Mellor	6ca0234478	Move missed `SchedulerConfig` args into scheduler config group in `EngineArgs` (#17131 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 22:48:53 -07:00
Michael Goin	649818995f	[Docs] Fix True->true in supported_models.md (#17141 )	2025-04-25 04:20:04 +00:00
Varun Sundar Rabindranath	7a0a9da72b	[Doc] V1 : Update LoRA status (#17133 ) Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com> Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com>	2025-04-24 20:17:22 -07:00
Zaida Zhou	69bff9bc89	fix float16 support for kimi-vl (#17156 ) Co-authored-by: zhouzaida <zhouzaida@msh.team>	2025-04-24 20:16:32 -07:00
Lucas Wilkinson	41ca7eb491	[Attention] FA3 decode perf improvement - single mma warp group support for head dim 128 (#16864 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-04-24 20:12:21 -07:00
vllmellm	eef364723c	[FEAT] [ROCm]: AITER Fused MOE V1 Support (#16752 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-04-25 11:06:50 +08:00
jglaser	0d6e187e88	Use custom address for listening socket (#15988 ) Signed-off-by: Jens Glaser <glaserj@ornl.gov>	2025-04-25 01:57:16 +00:00
Michael Goin	9420a1fc30	Better error message for missing mistral params.json (#17132 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-24 23:43:08 +00:00
Rui Qiao	583e900996	[Misc] Add example to run DeepSeek with Ray Serve LLM (#17134 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-04-24 22:25:21 +00:00
Maximilien de Bayser	05e1fbfc52	Add chat template for Llama 4 models (#16428 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2025-04-24 20:19:36 +00:00
Yinghai Lu	fe92176321	Add collective_rpc to llm engine (#16999 ) Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai>	2025-04-24 20:16:52 +00:00
Russell Bryant	6d0df0ebeb	[Docs] Generate correct github links for decorated functions (#17125 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-24 10:39:43 -07:00
Harry Mellor	0fa939e2d1	Improve configs - `LoRAConfig` + `PromptAdapterConfig` (#16980 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 10:29:34 -07:00
Harry Mellor	0422ce109f	Add `:markdownhelp:` to `EngineArgs` docs so markdown docstrings render properly (#17124 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 10:28:45 -07:00
Eyshika Agarwal	47bdee409c	Molmo Requirements (#17026 ) Signed-off-by: Eyshika Agarwal <eyshikaengineer@gmail.com> Signed-off-by: eyshika <eyshikaengineer@gmail.com>	2025-04-24 10:08:37 -07:00
Atilla	49f189439d	existing torch installation pip command fix for docs (#17059 )	2025-04-24 10:07:21 -07:00
Aaruni Aggarwal	5adf6f6b7f	Updating builkite job for IBM Power (#17111 ) Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com>	2025-04-24 10:06:17 -07:00
Russell Bryant	4115f19958	[CI] Add automation for the `tool-calling` github label (#17118 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-24 09:22:00 -07:00
Mark McLoughlin	340d7b1b21	[V1][Spec Decoding] Add num_drafts and num_accepted_tokens_per_position metrics (#16665 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-04-24 08:57:40 -07:00
Reid	1bcbcbf574	[Misc] refactor example series - structured outputs (#17040 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-24 07:49:48 -07:00
Michael Goin	82e43b2d7e	Add missing rocm_skinny_gemms kernel test to CI (#17060 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-24 07:49:37 -07:00
wang.yuqi	67309a1cb5	[Frontend] Using matryoshka_dimensions control the allowed output dimensions. (#16970 )	2025-04-24 07:06:28 -07:00
Shanshan Shen	b724afe343	[V1][Structured Output] Clear xgrammar compiler object when engine core shut down to avoid nanobind leaked warning (#16954 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-04-24 06:15:03 -07:00
Harry Mellor	21f4f1c9a4	Improve static type checking in `LoRAModelRunnerMixin` (#17104 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 06:14:47 -07:00
Isotr0py	b0c1f6202d	[Misc] Remove OLMo2 config copy (#17066 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-24 06:14:32 -07:00
Rui Qiao	c0dfd97519	[V1][PP] Optimization: continue scheduling prefill chunks (#17080 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-04-24 05:27:08 -07:00
Harry Mellor	a9138e85b1	Fix OOT registration test (#17099 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 04:44:12 -07:00
Harry Mellor	0a05ed57e6	Simplify `TokenizerGroup` (#16790 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 04:43:56 -07:00
Michael Goin	14288d1332	Disable enforce_eager for V1 TPU sampler and structured output tests (#17016 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-24 02:50:09 -07:00
Woosuk Kwon	b411418ff0	[Chore] Remove Sampler from Model Code (#17084 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-24 02:49:33 -07:00
omer-dayan	2bc0f72ae5	Add docs for runai_streamer_sharded (#17093 ) Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-04-24 01:03:21 -07:00

7056 Commits