biondizzle/vllm

Author	SHA1	Message	Date
Nicolò Lucchesi	fa3bba2a53	[TPU][V1] Enable Top-P (#16843 ) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-04-22 00:46:07 +00:00
Michael Goin	986537f1c3	[V1] V1 FlashInfer Attention (#16684 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Aurick Qiao <qiao@aurick.net>	2025-04-22 00:38:41 +00:00
Nicolò Lucchesi	210207525e	[TPU][V1] Capture multimodal encoder during model compilation (#15051 ) Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Siyuan Liu <lsiyuan@google.com>	2025-04-21 18:36:59 -06:00
Michael Goin	71eda0bb76	Update Qwen1.5-MoE-W4A16-compressed-tensors.yaml (#16946 )	2025-04-21 18:35:32 -06:00
Chengji Yao	471fe65630	[TPU][V1] Implicitly adjust page size when there's SMEM OOM (#16871 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-21 15:43:13 -06:00
Woosuk Kwon	3a0fba5cf4	[V1][Spec Decode] Handle draft tokens beyond max_model_len (#16087 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-21 12:38:50 -07:00
Chanh Nguyen	299ebb62b2	[Core] Speed up decode by remove synchronizing operation in sampler (#16436 ) Signed-off-by: Chanh Nguyen <cnguyen@linkedin.com> Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>	2025-04-21 18:18:22 +00:00
David Xia	f728ab8e35	[Doc] mention how to install in CPU editable mode (#16923 ) Signed-off-by: David Xia <david@davidxia.com>	2025-04-21 17:45:51 +00:00
David Xia	63e26fff78	[doc] install required python3-dev apt package (#16888 ) Signed-off-by: David Xia <david@davidxia.com>	2025-04-21 16:15:18 +00:00
Yan Ma	fe3462c774	[XPU][Bugfix] minor fix for XPU (#15591 ) Signed-off-by: yan ma <yan.ma@intel.com>	2025-04-22 00:02:57 +08:00
Kartik Ramesh	3b34fd5273	Raise error for data-parallel with benchmark_throughput (#16737 ) Signed-off-by: Kartik Ramesh <kartikx2000@gmail.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-04-21 23:51:43 +08:00
Isotr0py	55d6d3fdb8	[Bugfix] Fix GLM rotary_dim issue and support v1 (#16912 ) Signed-off-by: isotr0py <2037008807@qq.com>	2025-04-21 14:26:34 +00:00
Shanshan Shen	7272bfae77	[Misc] Refactor platform to get device specific stream and event (#14411 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-04-21 21:25:49 +08:00
wangxiyuan	d9ac9e3dc5	[Misc] fix collect_env version parse (#15267 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-04-21 20:29:40 +08:00
Han Zhang	d41faaf9df	Restore buffers when wake up from level 2 sleep (#16564 ) (#16889 ) Signed-off-by: Han <zh950713@gmail.com>	2025-04-21 20:18:28 +08:00
Alex Brooks	b34f33438a	[Doc] Split dummy_processor_inputs() in Multimodal Docs (#16915 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-04-21 11:10:01 +00:00
Yang Fan	26c0406555	[Bugfix] Fix distributed bug in Qwen2.5-VL & Qwen2.5-Omni (#16907 )	2025-04-21 10:25:21 +00:00
Woosuk Kwon	4c41278b77	[CI/CD][V1] Add spec decode tests to CI (#16900 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-20 22:37:16 -07:00
qizixi	bb3605db85	[Bugfix] Fix v1/spec_decode/test_ngram.py (#16895 ) Signed-off-by: qizixi <qizixi@meta.com>	2025-04-20 20:54:29 -07:00
Richard Zou	fe742aef5a	[easy] Pass compile_fx only the config patches (#16845 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-04-20 12:25:19 +08:00
Harry Mellor	4b07d36891	Improve configs - `CacheConfig` (#16835 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-20 12:25:04 +08:00
Staszek Paśko	87aaadef73	Serialize tensors using int8 views (#16866 ) Signed-off-by: Staszek Pasko <staszek@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-19 10:28:34 -07:00
Richard Zou	682e0b6d2f	Log how much time loading a compiled artifact takes (#16848 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-04-19 16:50:46 +00:00
Reid	d6195a748b	[doc] update hyperlink (#16877 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-19 16:40:38 +00:00
Cyrus Leung	205d84aaa9	[VLM] Clean up models (#16873 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-19 12:13:06 +00:00
Roger Wang	5124f5bf51	[Model] Qwen2.5-Omni Cleanup (#16872 )	2025-04-19 09:37:02 +00:00
Isotr0py	83f3c3bd91	[Model] Refactor Phi-4-multimodal to use merged processor and support V1 (#15477 ) Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-19 02:26:11 -07:00
vie-serendipity	d9737ca1c6	[V1][Misc] stop update prefix cache stats when logs_stats is disabled (#16460 ) Signed-off-by: vie-serendipity <2733147505@qq.com>	2025-04-19 02:25:19 -07:00
Nicolò Lucchesi	9d4ca19d50	[Misc] Benchmarks for audio models (#16505 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-19 02:24:14 -07:00
Nicolò Lucchesi	2ef0dc53b8	[Frontend] Add sampling params to `v1/audio/transcriptions` endpoint (#16591 ) Signed-off-by: Jannis Schönleber <joennlae@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Jannis Schönleber <joennlae@gmail.com>	2025-04-19 07:03:54 +00:00
Divakar Verma	1d4680fad2	[rocm][MI300] llama4 maverick fp8 moe config tp8 (#16847 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-04-19 06:21:43 +00:00
Yang Fan	2c1bd848a6	[Model][VLM] Add Qwen2.5-Omni model support (thinker only) (#15130 ) Signed-off-by: fyabc <suyang.fy@alibaba-inc.com> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: Xiong Wang <wangxiongts@163.com>	2025-04-18 23:14:36 -07:00
omrishiv	5c9121203c	[release] Publish neuron docker image (#16733 ) Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>	2025-04-18 17:11:25 -07:00
Justin Ho	490b1698a5	[Doc] Updated Llama section in tool calling docs to have llama 3.2 config info (#16857 ) Signed-off-by: jmho <jaylenho734@gmail.com>	2025-04-18 23:28:53 +00:00
Reid	5a5e29de88	[Misc] refactor examples series - Chat Completion Client With Tools (#16829 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-18 23:24:42 +00:00
wang.yuqi	3d3ab3689f	[New Model]: Snowflake Arctic Embed (Family) (#16649 )	2025-04-18 08:11:57 -07:00
Harry Mellor	686623c5e7	Fix `nullable_kvs` fallback (#16837 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-18 05:58:39 -07:00
Cyrus Leung	aadb656562	[Misc] Clean up Kimi-VL (#16833 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-18 05:15:09 -07:00
Jonghyun Choe	87e067de41	[Model] use AutoWeightsLoader for BigCode, GPT-J (#16823 ) Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com>	2025-04-18 10:42:41 +00:00
Michael Yao	26507f8973	[Docs] Fix a link and grammar issue in production-stack.md (#16809 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-04-18 06:42:58 +00:00
Nathan Weinberg	9c1d5b456d	[Doc] add podman setup instructions for official image (#16796 ) Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-04-18 06:10:49 +00:00
Lucia Fang	e31045f95c	[Bugfix] fix pp for llama4 (#16746 ) Signed-off-by: Lu Fang <fanglu@fb.com>	2025-04-18 13:51:30 +08:00
Luka Govedič	aaec845f8e	[ROCm] [Attention] Cleanup ROCm output passing (#16431 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com>	2025-04-18 05:46:45 +00:00
rongfu.leng	7bdfd29a35	[Misc] add collect_env to cli and docker image (#16759 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-04-17 22:13:35 -07:00
Harry Mellor	e78587a64c	Improve-mm-and-pooler-and-decoding-configs (#16789 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-17 22:13:32 -07:00
Lucas Wilkinson	7eb4255628	[BugFix] Accuracy fix for llama4 int4 - improperly casted scales (#16801 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-04-17 22:13:29 -07:00
Michael Goin	6a0f547561	Add hardware print to TPU V1 test (#16792 )	2025-04-17 22:13:26 -07:00
Shanshan Shen	30ed81b7ca	[V1][Structured Output] Minor modification to `_validate_structured_output()` (#16748 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-04-18 13:12:54 +08:00
Chauncey	7a4a5de729	[Misc] Update outdated note: LMCache now supports chunked prefill (#16697 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-04-18 05:12:42 +00:00
Cyrus Leung	c16fb5dae8	[Doc] Improve help examples for `--compilation-config` (#16729 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-17 21:22:34 -07:00

5941 Commits