biondizzle/vllm - vllm

Author	SHA1	Message	Date
Jeremy Arnold	58abe35455	[Benchmarks] Make detokenization optional in benchmark scripts (#11697) Signed-off-by: Jeremy Arnold <Jeremy.Arnold@amd.com>	2025-03-07 08:09:00 -08:00
York-RDWang	f7ebad2307	[Doc] Update prefix_caching.md to match the example image (#14420)	2025-03-07 15:29:00 +00:00
Aaron Pham	80e9afb5bc	[V1][Core] Support for Structured Outputs (#12388) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-03-07 07:19:11 -08:00
iefgnoix	1e3598edeb	Use the optimized block sizes after tuning the kernel. (#14329)	2025-03-07 13:25:13 +00:00
Harry Mellor	f7a6bd0fa1	Fix missing `kv_caches` and `attn_metadata` in `OpenVINOCausalLM` (#14271) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-07 12:30:42 +00:00
Aleksandr Malyshev	0ca3b8e01c	[BUGFIX] Skip tokenization support for throughput benchmark (#12712) Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu> Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>	2025-03-07 02:51:47 -08:00
மனோஜ்குமார் பழனிச்சாமி	cc10281498	[Misc] Set default value of seed to None (#14274) Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>	2025-03-07 10:40:01 +00:00
Cyrus Leung	05fb6718f0	[Bugfix] Clean up multi-modal processors (#14417) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-07 10:33:38 +00:00
Jee Jee Li	12c29a881f	[Bugfix] Further clean up LoRA test (#14422) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-07 10:30:55 +00:00
Peng Li	70da0c0748	correct wrong markdown syntax (#14414) Signed-off-by: vincent-pli <justdoit.pli@gmail.com>	2025-03-07 08:01:18 +00:00
Cyrus Leung	c1588a2c94	[GH] Auto-apply multi-modality label to relevant PRs (#14402) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-07 15:26:32 +08:00
Ilya Lavrenov	8ca7a71df7	OpenVINO: added CPU-like conditions (#14338) Signed-off-by: Ilya Lavrenov <ilya.lavrenov@intel.com>	2025-03-06 22:24:49 -08:00
Isotr0py	63137cd922	[Build] Add nightly wheel fallback when latest commit wheel unavailable (#14358) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-03-06 22:10:57 -08:00
Jee Jee Li	ddd1ef66ec	[Bugfix] Fix JambaForCausalLM LoRA (#14370) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-06 22:05:47 -08:00
Lucas Wilkinson	e5e03c2c1b	[BugFix] Illegal Memory Access in the blockwise cutlass fp8 GEMMs (#14396)	2025-03-06 21:56:06 -08:00
Luka Govedič	e1744502c2	[FP8] Refactor apply_fp8_linear and apply_fp8_linear_generic into an object (#14390) Signed-off-by: luka <luka@neuralmagic.com>	2025-03-07 05:20:16 +00:00
Lucas Wilkinson	dae6896977	[Perf] Reduce MLA CPU overheads in V1 (#14384) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-03-06 19:59:14 -08:00
Brayden Zhong	c34eeec58d	[Bugfix] Correctly call `cudaProfilerStop` in benchmarks script (#14183) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-03-07 00:42:49 +00:00
Daniel Li	ad60bbb2b2	[Doc] Fix a typo (#14385)	2025-03-06 16:31:52 -08:00
Chengji Yao	0578e5a462	[Hardware][TPU]Enable ragged paged attention kernel and resolve recompilation issue (#14310) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-03-06 23:31:05 +00:00
Michael Goin	04222984f8	[Docs] Add nsight guide to profiling docs (#14298) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-03-06 14:19:58 -08:00
Michael Goin	6832707e90	[V1][Bugfix] Standardize quantized kv cache rejection for attention backends (#14221) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-03-06 14:18:29 -08:00
Michael Goin	6b2ef5cd17	[Bug] Fix Attention when ignored in by quant_method (#14313) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-03-06 14:18:06 -08:00
Tyler Michael Smith	958adce478	[Bugfix] Fix use_direct_call condition in FusedMoE layer for (#14382) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-03-06 14:17:21 -08:00
Tyler Michael Smith	99b0915d3b	[Kernel] Add needs_fixed_stride_order tag to most GEMMs (#14306) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-03-06 14:17:09 -08:00
Thomas Parnell	8ca2b21c98	[CI] Disable spawn when running V1 Test (#14345) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-03-06 21:52:46 +00:00
Michael Goin	d9292786e1	[CI/Build] Use uv python for docker rather than ppa:deadsnakes/ppa (#13569) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-03-06 16:08:36 -05:00
Tyler Michael Smith	cc2f9b32c8	[Distributed] Add enable_expert_parallel arg (#14305) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-03-06 18:54:45 +00:00
Himanshu Jaju	cd579352bf	[V1] Do not detokenize if sampling param detokenize is False (#14224) Signed-off-by: Himanshu Jaju <hj@mistral.ai> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-03-06 10:40:24 -08:00
Ying Zhong	9f1710f1ac	Fix mla prefill context performance (#13897) Signed-off-by: ZhongYingMatrix <zhongyingmatrix@gmail.com>	2025-03-06 09:35:49 -08:00
Thomas Parnell	e642ec962c	Add authors to license header. (#14371) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Burkhard Ringlein <ngl@zurich.ibm.com> Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com>	2025-03-06 08:43:09 -08:00
Dilip Gowda Bhagavan	ada19210a3	Adding cpu inference with VXE ISA for s390x architecture (#12613) Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com> Signed-off-by: Rishika Kedia <rishika.kedia@in.ibm.com> Co-authored-by: Rishika Kedia <rishika.kedia@in.ibm.com>	2025-03-06 08:40:53 -08:00
Harry Mellor	bf0560bda9	Reinstate `best_of` for V0 (#14356) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-06 08:34:22 -08:00
youkaichao	151b08e0fe	[RLHF] use worker_extension_cls for compatibility with V0 and V1 (#14185) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-03-07 00:32:46 +08:00
Jitse Klomp	81b2f4a45f	[Doc] Fix date typo in README.md (#14366) Signed-off-by: Jitse Klomp <jitse.klomp@conclusionxforce.nl>	2025-03-06 08:29:57 -08:00
Cyrus Leung	82551ad616	[Core] Don't use cache during multi-modal profiling (#14336)	2025-03-06 08:03:31 -08:00
courage17340	caac5c2e59	[Bugfix][Core] fix abort_seq_group and memory leak when n>1 (#14326) Signed-off-by: courage17340 <courage17340@163.com>	2025-03-06 23:59:32 +08:00
Thomas Parnell	6bd1dd9d26	[Kernel] [V1] Improved performance for V1 Triton (ROCm) backend (#14152)	2025-03-06 07:39:16 -08:00
Irina Yuryeva	4f27044aab	[Doc] Correct beam_search using in generative_models.md (#14363)	2025-03-06 15:37:10 +00:00
Yanyi Liu	0ddc991f5c	[Doc] Update reasoning with stream example to use OpenAI library (#14077) Signed-off-by: liuyanyi <wolfsonliu@163.com>	2025-03-06 13:20:37 +00:00
Nicolò Lucchesi	fa82b93853	[Frontend][Docs] Transcription API streaming (#13301) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-03-06 10:39:35 +00:00
Nicolò Lucchesi	69ff99fdcd	[Core] Optimizing cross-attention `QKVParallelLinear` computation (#12325) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: NickLucche <nick@nlucches-4xa100.c.openshift-330514.internal> Co-authored-by: NickLucche <nick@nlucches-4xa100.c.openshift-330514.internal>	2025-03-06 09:37:26 +00:00
lkchen	5d802522a7	[V1][VLM][Pixtral-HF] Support Pixtral-HF on V1 (#14275) Signed-off-by: Linkun Chen <github@lkchen.net>	2025-03-06 08:58:41 +00:00
kYLe	1769928079	[Model] Update Paligemma multimodal processing with PromptUpdate (#14015) Signed-off-by: Kyle Huang <kylhuang@nvidia.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-03-06 08:31:38 +00:00
Pavani Majety	ed6ea06577	[Hardware] Update the flash attn tag to support Blackwell (#14244)	2025-03-05 22:01:37 -08:00
Nicolò Lucchesi	5ee10e990d	[Bugfix][CI] ALiBi test case in xformers multi_query_kv_attention (#11301)	2025-03-05 20:00:53 -08:00
Varun Sundar Rabindranath	3dbd2d813a	[V1] LoRA - Enable more V1 tests (#14315) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-06 11:55:42 +08:00
Ce Gao	f5f7f00cd9	[Bugfix][Structured Output] Support outlines engine with reasoning outputs for DeepSeek R1 (#14114)	2025-03-06 03:49:20 +00:00
Rui Qiao	abcc61e0af	[misc] Mention `ray list nodes` command to troubleshoot ray issues (#14318) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-03-06 02:00:36 +00:00
Lucas Wilkinson	f6bb18fd9a	[BugFix] MLA + V1, illegal memory access and accuracy issues (#14253) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-03-05 17:10:13 -08:00

7056 Commits