biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Jerry Zhang	2048c4e379	[torchao] Support quantization configs using module swap (#21982) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-09-10 23:53:24 -07:00
Hanjie Qiu	dcb28a332b	[Kernel] Flashinfer MLA (trtllm-gen) decode kernel integration (#21078) Signed-off-by: hjjq <hanjieq@nvidia.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-10 15:31:10 -07:00
Russell Bryant	37e8182bfe	[v1] Add Whisper model support (encoder-decoder) (#21088) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: NickLucche <nlucches@redhat.com>	2025-09-10 13:53:35 -07:00
wang.yuqi	bd98842c8a	[CI] Add PPL test for generation models (#24485) Signed-off-by: wang.yuqi <noooop@126.com>	2025-09-10 06:16:39 -07:00
Ye (Charlotte) Qi	492196ed0e	[CI/Build] split true unit tests to Entrypoints Unit Tests (#24418) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-09-10 06:16:07 -07:00
Jiangyun Zhu	b8a93076d3	[CI] execute all piecewise compilation tests together (#24502) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-09-09 11:05:25 -07:00
Sahithi Chigurupati	6910b56da2	[CI] Add nightly multiarch manifests to dockerhub (#24102) Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com> Signed-off-by: Simon Mo <simon.mo@hey.com> Signed-off-by: simon-mo <simon.mo@hey.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-09-09 01:18:09 +00:00
Woosuk Kwon	4172235ab7	[V0 deprecation] Deprecate V0 Neuron backend (#21159) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-06 16:15:18 -07:00
yzds	ac201a0eaf	[Feature] Support Decode Context Parallel (DCP) for MLA (#23734) Signed-off-by: hongchao <hongchao@msh.team> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: hongchao <hongchao@msh.team> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-09-06 13:24:05 +08:00
Rafael Vasquez	c954c6629c	[CI] Add timeouts to tests (#24260) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-09-05 17:26:22 -07:00
Louie Tsai	006e7a34ae	Adding int4 and int8 models for CPU benchmarking (#23709) Signed-off-by: Tsai, Louie <louie.tsai@intel.com>	2025-09-05 20:08:50 +08:00
elvischenv	adc3ddb430	[Bugfix][Misc] Fix silu_and_mul_nvfp4_quant issue and extract common utils for nvfp4 kernel source files (#23727) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-04 14:25:45 -07:00
Kunshang Ji	16ded21eeb	[XPU] support Triton Attention backend on Intel GPU (#24149) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-09-04 20:41:08 +08:00
Lucas Wilkinson	402759d472	[Attention] FlashAttn MLA (#14258) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2025-09-04 02:47:59 -07:00
Didier Durand	02d411fdb2	[Doc]: fix typos in Python comments (#24115) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-02 21:14:07 -07:00
youkaichao	42dc59dbac	Update release pipeline post PyTorch 2.8.0 update (#24073) Signed-off-by: Huy Do <huydhn@gmail.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Huy Do <huydhn@gmail.com>	2025-09-03 10:09:19 +08:00
Matthew Bonanni	2fd1a40a54	[CI/Build] Disable SiluMul NVFP4 quant fusion tests (#24121) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-09-02 16:50:28 -07:00
Christian Pinto	1cb39dbcdd	[Misc] IO Processor plugins for pooling models (#22820) Signed-off-by: Christian Pinto <christian.pinto@ibm.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Max de Bayser <mbayser@br.ibm.com>	2025-08-31 23:07:12 -07:00
Isotr0py	ff0e59d83a	[CI/Build] Improve Tensor Schema tests speed by avoid engine core initialization (#23357) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-08-31 22:52:20 -07:00
Huy Do	67c14906aa	Update PyTorch to 2.8.0 (#20358) Signed-off-by: Huy Do <huydhn@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-08-29 18:57:35 +08:00
Li, Jiang	ad39106b16	[CPU] Enable data parallel for CPU backend (#23903) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-08-29 02:19:58 -07:00
Jee Jee Li	b4f9e9631c	[CI/Build] Clean up LoRA test (#23890) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-28 23:28:35 -07:00
Simon Mo	27e88cee74	chore: build release image by default (#23852) Signed-off-by: Codex <codex@openai.com>	2025-08-28 13:17:15 -07:00
elvischenv	16a45b3a28	[NVIDIA] Support SiluMul + NVFP4 quant fusion (#23671) Signed-off-by: jindih <jindih@nvidia.com> Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Co-authored-by: jindih <jindih@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Luka Govedic <lgovedic@redhat.com>	2025-08-28 19:36:50 +00:00
Jean Schmidt	0583578f42	[ci] breaks down V1 Test into 3 groups of approx 30 minutes runtime (#23757) Signed-off-by: Jean Schmidt <contato@jschmidt.me>	2025-08-28 08:59:19 -07:00
Li, Jiang	67cee40da0	[CI/Build][Bugfix] Fix Qwen VL tests on CPU (#23818) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-08-28 11:57:05 +00:00
Alex	f48a9af892	[CI] make all multi-gpu weight loading tests run nightly (#23792) Signed-off-by: Alex Yun <alexyun04@gmail.com>	2025-08-27 21:27:36 -07:00
Eli Uriegas	3c0ef769ba	ci: Add arm64 docker build to release pipeline (#23210) Signed-off-by: Eli Uriegas <eliuriegas@meta.com> Signed-off-by: Eli Uriegas <1700823+seemethere@users.noreply.github.com>	2025-08-27 10:41:48 -07:00
Kunshang Ji	fce10dbed5	[XPU] Add xpu torch.compile support (#22609) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-08-27 05:33:27 +00:00
Didier Durand	7c04779afa	[Doc]: fix various spelling issues in multiple files (#23636) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-08-26 14:05:29 +00:00
nvjullin	f66673a39d	[Kernel] Added flashinfer fp8 per-tensor gemms (#22895) Signed-off-by: Julien Lin <jullin@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-08-26 06:54:04 -07:00
Michael Goin	906e461ed6	[CI Fix] Pin deepep and pplx tags in tools/ep_kernels/, gate multigpu tests (#23568) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-25 18:29:00 -07:00
Pate Motter	c34c82b7fe	[TPU][Bugfix] Fixes prompt_token_ids error in tpu tests. (#23574) Signed-off-by: Pate Motter <patemotter@google.com>	2025-08-25 14:29:16 -07:00
Didier Durand	47455c424f	[Doc: ]fix various typos in multiple files (#23487) Signed-off-by: Didier Durand <durand.didier@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-25 00:04:04 +00:00
Zhewen Li	0483fabc74	[CI/Build] add EP dependencies to docker (#21976) Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-08-22 13:34:40 -07:00
Naman Lalit	ebe14621e3	[Bug fix] Dynamically setting the backend variable for genai_perf_tests in the run-nightly-benchmark script (#23375) Signed-off-by: Naman Lalit <nl2688@nyu.edu>	2025-08-22 15:12:28 +00:00
Cyrus Leung	8896eb72eb	[Deprecation] Remove `prompt_token_ids` arg fallback in `LLM.generate` and `LLM.embed` (#18800) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-22 10:56:57 +08:00
22quinn	480bdf5a7b	[Core] Support custom executor qualname (#23314) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-08-22 09:40:54 +08:00
Lain	f8ce022948	add tg-mxfp4-moe-test (#22540) Signed-off-by: siyuanf <siyuanf@nvidia.com> Signed-off-by: Siyuan Fu <siyuanf@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-08-21 17:05:47 +00:00
youkaichao	e0b056e443	[ci/build] Fix abi tag for aarch64 (#23329) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-08-21 23:32:55 +08:00
Michael Goin	f64ee61d9e	[CI] Block the cu126 wheel build while broken (#23285) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-21 04:21:05 +00:00
QiliangCui	8993073dc1	[CI] Delete images older than 24h. (#23291) Signed-off-by: Qiliang Cui <derrhein@gmail.com>	2025-08-20 21:15:20 -07:00
Cyrus Leung	2461d9e562	[CI/Build] Split out mm processor tests (#23260) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-20 20:05:20 -07:00
Li, Jiang	7be5d113d8	[CPU] Refactor CPU W8A8 scaled_mm (#23071) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-08-21 09:34:24 +08:00
youkaichao	1b125004be	[misc] fix multiple arch wheels for the nightly index (#23110) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-08-20 14:15:34 -07:00
Michael Goin	0cdbf5e61c	[Kernel/Quant] Remove the original marlin format and qqq (#23204) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-20 15:13:36 -04:00
Yong Hoon Shin	dfd2382039	[torch.compile] Support conditional torch.compile per module (#22269) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-08-20 16:52:59 +00:00
Michael Goin	50df09fe13	Update to flashinfer-python==0.2.12 and disable AOT compile for non-release image (#23129) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-20 08:05:54 -04:00
Louie Tsai	941f56858a	Fix a performance comparison issue in Benchmark Suite (#23047) Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: Louie Tsai <louie.tsai@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Li, Jiang <bigpyj64@gmail.com>	2025-08-20 03:14:32 +00:00
Michael Goin	0f4f0191d8	[CI/Build] Replace lm-eval gsm8k tests with faster implementation (#23002) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-19 15:07:30 -07:00

713 Commits