biondizzle/vllm - vllm

Author	SHA1	Message	Date
Bryan Lu	781d056280	[Feature] Enhance EAGLE Architecture with Proper RMS Norms (#14990) Signed-off-by: Bryan Lu <yuzhelu@amazon.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-03-26 08:24:07 +00:00
daniel-salib	5aefd6ac31	Fix raw_request extraction in load_aware_call decorator (#15382) Signed-off-by: Daniel Salib <danielsalib@meta.com>	2025-03-25 22:29:54 -07:00
Varun Sundar Rabindranath	6c663dfd5e	[misc] LoRA - Skip LoRA kernels when not required (#15152) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-26 11:33:45 +08:00
Lucas Wilkinson	33437bc6e7	[BugFix] Fix nightly MLA failure (FA2 + MLA chunked prefill, i.e. V1, producing bad results) (#15492) Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com>	2025-03-25 20:33:22 -07:00
Tyler Michael Smith	23114d3364	[Misc] Warn about v0 in benchmark_paged_attn.py (#15495) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-03-25 20:31:04 -07:00
Cyrus Leung	997c8811d6	[Model] Support multi-image for Molmo (#15438) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-26 11:26:33 +08:00
Harry Mellor	e42389f9d7	Transformers backend already supports V1 (#15463) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-25 20:26:16 -07:00
Varun Sundar Rabindranath	ff38f0a32c	[CI/Build] LoRA: Delete long context tests (#15503) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-25 17:18:34 -07:00
Varun Sundar Rabindranath	a5cfbab3c8	[Core] LoRA: V1 Scheduler optimization (#15422) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-25 22:50:09 +00:00
Chenyaaang	ac3cd6e83c	[core] add bucket padding to tpu_model_runner (#14995) Signed-off-by: Chenyaaang <llccyy1212@gmail.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-03-25 17:27:22 -04:00
Lu Fang	082ab86f5f	[V1] Support long_prefill_token_threshold in v1 scheduler (#15419) Signed-off-by: Lu Fang <lufang@fb.com>	2025-03-25 14:22:26 -07:00
Nick Hill	6aa196c8dc	[V1][Minor] Use `SchedulerInterface` type for engine scheduler field (#15499) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-25 14:21:36 -07:00
Nicolò Lucchesi	a0dd7dcd49	[TPU][V1] Fix Sampler recompilation (#15309) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-03-25 16:43:54 -04:00
Maximilien de Bayser	e977c11111	Add workaround for shared field_names in pydantic model class (#13925) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2025-03-25 20:31:08 +00:00
Joe Runde	5f063a80bd	[bugfix] add supports_v1 platform interface (#15417) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-03-25 15:00:32 -04:00
Antonio Gómez	5d8e1c9279	[Bugfix] Support triton==3.3.0+git95326d9f for RTX 5090 (Unsloth + vLLM compatibility) (#15471) Co-authored-by: ServerAI <ai@exc-mad-ai.com>	2025-03-25 17:59:25 +00:00
yarongmu-google	0a049c7d86	[CI/Build] Add tests for the V1 tpu_model_runner. (#14843) Signed-off-by: Yarong Mu <ymu@google.com>	2025-03-25 12:27:16 -04:00
youkaichao	d0cfec7ab9	[bugfix] fix inductor cache on max_position_embeddings (#15436) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-03-25 07:05:39 -07:00
Szymon Ożóg	a608160027	[Kernel] Fix conflicting macro names for gguf kernels (#15456) Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>	2025-03-25 13:50:49 +00:00
Cyrus Leung	3f04a7fbf2	[Doc] Update V1 user guide for multi-modality (#15460) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-25 11:01:58 +00:00
Cyrus Leung	5994430b84	[Misc] Remove redundant `num_embeds` (#15443) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-25 18:27:57 +08:00
Cyrus Leung	a9e879b316	[Misc] Clean up MiniCPM-V/O code (#15337) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-25 10:22:52 +00:00
Md. Shafi Hussain	3e2f37a69a	Dockerfile.ppc64le changes to move to UBI (#15402) Signed-off-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com>	2025-03-25 10:15:14 +00:00
Thien Tran	4f044b1d67	[Kernel][CPU] CPU MLA (#14744) Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>	2025-03-25 09:34:59 +00:00
Siyuan Liu	4157f563b4	[Hardware][TPU][Bugfix] Fix v1 mp profiler (#15409) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-03-25 01:43:00 -07:00
Lu Fang	051da7efe3	Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 (#15160) Signed-off-by: Lu Fang <lufang@fb.com> Co-authored-by: Richard Barnes <rbarnes@meta.com>	2025-03-25 15:36:45 +08:00
Woosuk Kwon	25f560a62c	[V1][Spec Decode] Update target_logits in place for rejection sampling (#15427) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> v0.8.2	2025-03-24 21:04:41 -07:00
Russell Bryant	a09ad90a72	[V1] guidance backend for structured output + `auto` fallback mode (#14779) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Loc Huynh <jc1da.3011@gmail.com> Co-authored-by: Michal Moskal <michal@moskal.me>	2025-03-24 21:02:33 -07:00
Chauncey	10b34e36b9	[Bugfix] Fixed the issue of not being able to input video and image simultaneously (#15387) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-03-25 03:48:08 +00:00
Tyler Michael Smith	b5269db959	Revert "Fix non-contiguous input passed to Marlin kernel (#15319)" (#15398)	2025-03-24 20:43:51 -07:00
Jee Jee Li	6db94571d7	[Misc] Remove LoRA log (#15388) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-24 20:43:48 -07:00
Harry Mellor	97cfa65df7	Add pipeline parallel support to `TransformersModel` (#12832) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-03-25 10:41:45 +08:00
Woosuk Kwon	911c8eb000	[Minor][Spec Decode] Remove compiled_softmax (#15416) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-24 19:09:04 -07:00
Woosuk Kwon	ebcebeeb6b	[V1][Spec Decode] Enable spec decode for top-p & top-k sampling (#15063) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-24 17:16:46 -07:00
Gregory Shtrasberg	f533b5837f	[ROCm][Kernel] MoE weights padding (#14454) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Signed-off-by: charlifu <charlifu@amd.com> Co-authored-by: charlifu <charlifu@amd.com>	2025-03-24 23:45:30 +00:00
Gregory Shtrasberg	8279201ce6	[Build] Cython compilation support fix (#14296) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-03-24 23:37:54 +00:00
Siyuan Liu	23fdab00a8	[Hardware][TPU] Skip failed compilation test (#15421) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-03-24 23:28:57 +00:00
Nick Hill	623e2ed29f	[BugFix][V1] Quick fix for min_tokens with multiple EOS (#15407) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-24 15:58:59 -07:00
Nick Hill	9d72daf4ce	[V1][Perf] Simpler request output queues (#15156) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-03-24 22:44:08 +00:00
Cyrus Leung	6dd55af6c9	[Doc] Update docs on handling OOM (#15357) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-03-24 14:29:34 -07:00
Yuan Tang	3eb08ed9b1	[DOC] Add Kubernetes deployment guide with CPUs (#14865)	2025-03-24 10:48:43 -07:00
liuzhenwei	5eeadc2642	[Hardware][Gaudi][Feature] Enable Dynamic MoE for Mixtral (#12303) Signed-off-by: zhenwei <zhenweiliu@habana.ai>	2025-03-24 09:48:40 -07:00
Nick Hill	3aee6573dc	[V1] Aggregate chunked prompt logprobs in model runner (#14875) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-24 12:27:57 -04:00
Yi Liu	9cc645141d	[MISC] Refine no available block debug msg (#15076) Signed-off-by: Yi Liu <yiliu4@habana.ai> Signed-off-by: yiliu30 <yi4.liu@intel.com> Co-authored-by: Yi Liu <yiliu4@habana.ai>	2025-03-25 00:01:10 +08:00
Chen1022	0893567db9	[V1][Minor] fix comments (#15392) Signed-off-by: chenjincong <chenjincong@baidu.com> Signed-off-by: Chen-0210 <chenjincong11@gmail.com> Co-authored-by: chenjincong <chenjincong@baidu.com>	2025-03-24 08:45:32 -07:00
Russell Bryant	8abe69b499	[Core] Don't force uppercase for VLLM_LOGGING_LEVEL (#15306) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-24 08:27:30 -07:00
Manish Sethi	761702fd19	[Core] Integrate `fastsafetensors` loader for loading model weights (#10647) Signed-off-by: Manish Sethi <Manish.sethi1@ibm.com>	2025-03-24 08:08:02 -07:00
youkaichao	9606d572ed	[distributed] fix dp group (#15355) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-03-24 14:54:27 +00:00
Cyrus Leung	cbcdf2c609	[Bugfix] Fix chat template loading (#15143) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-03-24 13:50:09 +00:00
Russell Bryant	038de04d7b	Fix zmq IPv6 URL format error (#15341) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-24 09:30:41 -04:00

7056 Commits