33d5e29be9 | 2025-06-23 16:04:28 -07:00 | Chenyaaang
    [TPU] Fix tpu model runner test (#19995)
    Signed-off-by: Chenyaaang <chenyangli@google.com>

e6327c9b3e | 2025-06-23 16:09:02 -04:00 | cascade
    [Feature] Support sequence parallelism for static fp8 quantization (#19181)
    Signed-off-by: cascade812 <cascade812@outlook.com>

61f4fc5dc6 | 2025-06-23 18:38:06 +00:00 | Isotr0py
    [Bugfix][v1] Fix step pooler implementation and step pooling usage in v1 (#19956)
    Signed-off-by: Isotr0py <2037008807@qq.com>

68aaeb3749 | 2025-06-23 11:07:47 -07:00 | Tyler Michael Smith
    [EP+DP] Optimize the little operations in the DeepGEMM + DeepEP low latency case (#19885)
    Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
    Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
    Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>

a6e6604d32 | 2025-06-23 21:30:55 +08:00 | Jee Jee Li
    [Bugfix] Fix CI bitsandbytes failure (#19969)
    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

1bcd15edc7 | 2025-06-22 22:41:53 -07:00 | lkchen
    [BugFix][P/D] Fix for cases where _recving_transfers can be cleaned up when *all* transfer done (#19874)
    Signed-off-by: Linkun Chen <github@lkchen.net>

4a0f7888a3 | 2025-06-22 20:18:08 -07:00 | amit
    [Core] feat: Implement Priority Scheduling in V1 Engine (#19057)
    Signed-off-by: amit <amit.man@gmail.com>
    Co-authored-by: Roger Wang <Rogerw0108@gmail.com>

c3bf9bad11 | 2025-06-21 04:01:51 +00:00 | 汪志鹏
    [New model support]Support Tarsier2 (#19887)
    Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

79f2f1c2a1 | 2025-06-20 15:30:36 +00:00 | Li, Jiang
    [CPU][CI] Fallback sliding window to v0 and fix CPU pooling model tests (#19901)
    Signed-off-by: jiang1.li <jiang1.li@intel.com>

2e3e3c86dc | 2025-06-20 22:47:16 +08:00 | Vlad Tiberiu Mihailescu
    Export NaNs in logits to scheduler_stats if output is corrupted (#18777)
    Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com>

7e8977fcd4 | 2025-06-20 07:44:56 -07:00 | Chendi.Xue
    [custom_op][vllm-plugin] update custom_op class to use op_registry (#19164)
    Signed-off-by: Chendi.Xue <chendi.xue@intel.com>

f1e840e842 | 2025-06-20 12:07:41 +00:00 | Adrian
    [Model] GPT2ForSequenceClassification model (#19663)
    Signed-off-by: nie3e <adrcwiek@gmail.com>
    Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

5e666f72cd | 2025-06-19 22:01:16 -07:00 | kourosh hakhamaneshi
    [Bugfix][Ray] Set the cuda context eagerly in the ray worker (#19583)

ee9a1531aa | 2025-06-20 09:51:07 +08:00 | Isotr0py
    [CI/Build][Bugfix] Fix deadlock on v1 engine test CI (#19872)
    Signed-off-by: Isotr0py <2037008807@qq.com>

ead2110297 | 2025-06-19 17:18:07 +00:00 | Alex Brooks
    [Core][Bugfix] Fix Online MM Beam Search (#19688)
    Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

4719460644 | 2025-06-19 01:36:16 -07:00 | Alexei-V-Ivanov-AMD
    Fixing Chunked Prefill Test. (#19762)
    Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

1d0ae26c85 | 2025-06-19 14:26:41 +08:00 | Zuxin
    Add xLAM tool parser support (#17148)
6021999573 | 2025-06-18 23:04:10 -07:00 | Isotr0py
    [Minor] Allow redirecting model path for HfRunner in test (#19795)
    Signed-off-by: Isotr0py <2037008807@qq.com>

83ca9ae47b | 2025-06-18 22:56:03 -07:00 | Yu-Hang "Maxin" Tang
    Mark invariant normalizer in Gemma as non-persistent (#19788)
    Signed-off-by: Yu-Hang Tang <Tang.Maxin@gmail.com>

e2148dc5ea | 2025-06-18 21:47:01 -07:00 | kourosh hakhamaneshi
    [Bugfix] Add check_health to v1 async client. (#19821)
    Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

799397ee4f | 2025-06-18 21:36:33 -07:00 | Maximilien de Bayser
    Support embedding models in V1 (#16188)
    Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
    Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
    Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
    Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>

ed33349738 | 2025-06-19 08:23:12 +08:00 | Richard Zou
    [BugFix] Fix use_cudagraph=False (#19612)
    Signed-off-by: Richard Zou <zou3519@gmail.com>

3b523e38d9 | 2025-06-18 15:36:55 -07:00 | Lukas Geiger
    [Core] Do not copy array during hashing (#19484)
    Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>

a89209b78d | 2025-06-18 20:34:15 +00:00 | Chen Zhang
    [v1] Support mamba2 (#19327)
    Signed-off-by: Chen Zhang <zhangch99@outlook.com>

d4629dc43f | 2025-06-18 03:03:01 +00:00 | lkchen
    [Misc] Add __str__ for RequestStatus (#19780)
    Signed-off-by: Linkun Chen <github@lkchen.net>

a44b1c951d | 2025-06-17 17:03:06 -04:00 | Charlie Fu
    [Feature][ROCm] Add full graph capture support for TritonAttentionBackend (#19158)
    Signed-off-by: charlifu <charlifu@amd.com>

ffb2cd6b54 | 2025-06-17 11:49:26 -07:00 | Wentao Ye
    [Perf] Optimize moe_align_block_size CUDA kernel (#19572)
    Signed-off-by: yewentao256 <zhyanwentao@126.com>
    Co-authored-by: mgoin <mgoin64@gmail.com>

ca94d7fa00 | 2025-06-17 15:58:38 +00:00 | Isotr0py
    [Bugfix] Update multimodel models mapping to fit new checkpoint after Transformers v4.52 (#19151)
    Signed-off-by: Isotr0py <2037008807@qq.com>

ddfed314f9 | 2025-06-17 04:01:50 +00:00 | Driss Guessous
    Fixes IMA for TP w/ flex-attention (#19712)
    Signed-off-by: drisspg <drisspguessous@gmail.com>

ede5c4ebdf | 2025-06-17 11:34:00 +08:00 | nguyenhoangthuan99
    [Frontend] add chunking audio for > 30s audio (#19597)
    Signed-off-by: nguyenhoangthuan99 <thuanhppro12@gmail.com>

6bc7b57315 | 2025-06-16 17:33:51 -04:00 | Dipika Sikka
    [Quantization] Remove FP4 emulation; Fall-back to marlin for device < 100 (#19563)

387bdf0ab9 | 2025-06-16 09:47:14 -07:00 | qscqesze
    [Model] Add support for MiniMaxM1ForCausalLM (shares architecture with MiniMaxText01ForCausalLM) (#19677)
    Signed-off-by: QscQ <qscqesze@gmail.com>

1173804dca | 2025-06-16 11:21:37 +00:00 | Isotr0py
    [Bugfix] Fix TP inference for Flex attention backend (#19657)
    Signed-off-by: Isotr0py <2037008807@qq.com>
f40f763f12 | 2025-06-16 01:36:43 -07:00 | wang.yuqi
    [CI] Add mteb testing for rerank models (#19344)

a77aea59fd | 2025-06-16 06:40:53 +00:00 | Chengji Yao
    [TPU] support attention head dim smaller than 128 (#19620)
    Signed-off-by: Chengji Yao <chengjiyao@google.com>
    Co-authored-by: mgoin <mgoin64@gmail.com>

b692e9cd07 | 2025-06-16 06:30:29 +00:00 | Ye (Charlotte) Qi
    [Misc] Fix skipped max-model-len validation when deriving max model length from tokenizer config (#19660)
    Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>

92183b41f3 | 2025-06-15 21:56:37 -07:00 | quanliu
    [Bugfix][Core] Prefix caching causes incorrect outputs due to outdated ComputedBlocksTracker (#18957)
    Signed-off-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
    Co-authored-by: 刘全 <quan.liu2@dbappsecurity.com.cn>

0b73736a0d | 2025-06-15 13:43:48 +08:00 | 22quinn
    [Kernel] Raise verbose error and consolidate num_heads/num_kv_heads divisibility check (#19339)
    Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>

ee1531bc38 | 2025-06-14 21:15:41 -07:00 | Lu Fang
    [Bugfix][2/n] Fix speculative decoding CI - Fix test_ngram_e2e_greedy_correctness (#19644)

2db9044ab6 | 2025-06-14 15:13:08 +00:00 | Isotr0py
    [Bugfix] Fix auto dtype casting for BatchFeature (#19316)
    Signed-off-by: Isotr0py <2037008807@qq.com>
    Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

06be858828 | 2025-06-13 20:57:32 -07:00 | Lu Fang
    [Bugfix] Fix the speculative decoding test by setting the target dtype (#19633)

d65668b4e8 | 2025-06-13 17:08:51 -07:00 | Concurrensee
    Adding "AMD: Multi-step Tests" to amdproduction. (#19508)
    Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
    Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
    Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

3597b06a4f | 2025-06-13 18:12:26 +00:00 | Luka Govedič
    [CUDA] Enable full cudagraph for FlashMLA (#18581)
    Signed-off-by: luka <luka@neuralmagic.com>

017ef648e9 | 2025-06-12 10:30:56 -07:00 | Ekagra Ranjan
    [Spec Decode][Benchmark] Generalize spec decode offline benchmark to more methods and datasets (#18847)

f98548b9da | 2025-06-12 08:31:04 -07:00 | Luka Govedič
    [torch.compile][ROCm] Fuse quantization onto attention using a torch.compile pass (#16756)
    Signed-off-by: Luka Govedič <lgovedic@redhat.com>
    Co-authored-by: Sage Moore <sage@neuralmagic.com>

96846bb360 | 2025-06-12 22:22:53 +08:00 | mobicham
    Fix TorchAOConfig skip layers (#19265)
    Signed-off-by: mobicham <hicham@mobiuslabs.com>

b6efafd9e4 | 2025-06-12 06:51:41 -07:00 | Wentao Ye
    [Perf] Vectorize static / dynamic INT8 quant kernels (#19233)
    Signed-off-by: yewentao256 <zhyanwentao@126.com>

c9280e6346 | 2025-06-12 11:00:23 +00:00 | jmswen
    [Bugfix] Respect num-gpu-blocks-override in v1 (#19503)
    Signed-off-by: Jon Swenson <jmswen@gmail.com>

d5bdf899e4 | 2025-06-12 06:43:20 +00:00 | Nick Hill
    [BugFix] Work-around incremental detokenization edge case error (#19449)
    Signed-off-by: Nick Hill <nhill@redhat.com>

2f1c19b245 | 2025-06-11 19:57:10 -07:00 | Ning Xie
    [CI] change spell checker from codespell to typos (#18711)
    Signed-off-by: Andy Xie <andy.xning@gmail.com>