biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Michael Goin	4fc722eca4	[Kernel/Quant] Remove AQLM (#22943 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-08-16 19:38:21 +00:00
Eli Uriegas	76144adf76	ci: Add CUDA + arm64 release builds (#21201 ) Signed-off-by: Eli Uriegas <eliuriegas@meta.com>	2025-08-15 23:16:23 +00:00
Michael Goin	a344a1a7da	Use regex in convert-results-json-to-markdown.py (#22989 ) Signed-off-by: Michael Goin <mgoin64@gmail.com>	2025-08-15 20:54:20 +00:00
bnellnm	8ad7285ea2	[Kernels] Clean up FusedMoeMethodBase and modular kernel setup. Remove extra arguments from modular kernel methods. (#22035 ) Signed-off-by: Bill Nell <bnell@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-08-15 14:46:00 -04:00
Harry Mellor	e8b40c7fa2	[CI] Remove duplicated docs build from buildkite (#22924 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-15 05:58:06 -07:00
nvjullin	279a5f31b3	[Kernel] Add nvfp4 gemm flashinfer backends (#22346 ) Signed-off-by: Julien Lin <jullin@nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-08-14 16:03:55 -04:00
Louie Tsai	00e3f9da46	vLLM Benchmark suite improvement (#22119 ) Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: Louie Tsai <louie.tsai@intel.com> Co-authored-by: Li, Jiang <bigpyj64@gmail.com>	2025-08-14 07:12:17 +00:00
Woosuk Kwon	71683ca6f6	[V0 Deprecation] Remove multi-step scheduling (#22138 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-08-12 20:18:39 -07:00
Harry Mellor	839ab00349	Re-enable Xet on TPU tests now that `hf_xet` has been updated (#22666 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-11 19:54:40 -07:00
Cyrus Leung	ebf7605b0d	[Misc] Move tensor schema tests (#22612 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-11 00:15:27 -07:00
22quinn	b799f4b9ea	[CI/Build] Fix tensorizer test for load_format change (#22583 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-08-10 19:30:00 -07:00
Kyuyeun Kim	9a0c5ded5a	[TPU] Add support for online w8a8 quantization (#22425 ) Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>	2025-08-08 23:12:54 -07:00
Thomas Parnell	8a0ffd6285	Remove mamba_ssm from vLLM requirements; install inside test container using `--no-build-isolation` (#22541 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-08-08 23:05:32 -07:00
Andrew Chan	35171b1172	[Doc] update docs for nightly benchmarks (#12022 ) Signed-off-by: Andrew Chan <andrewkchan.akc@gmail.com>	2025-08-07 00:29:45 -07:00
Siyuan Liu	4b29d2784b	[CI][TPU] Fix docker clean up (#22271 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-08-05 23:54:56 +00:00
elvischenv	83156c7b89	[NVIDIA] Support Flashinfer TRT-LLM Prefill Attention Kernel (#22095 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-08-05 02:45:34 -07:00
lkchen	f4f4e7ef27	[V0 deprecation][P/D] Deprecate v0 `KVConnectorBase` code (1/2) (#21785 ) Signed-off-by: Linkun Chen <github@lkchen.net>	2025-08-04 19:11:33 -07:00
Isotr0py	3dddbf1f25	[Misc] Add tensor schema test coverage for multimodal models (#21754 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com>	2025-08-03 00:52:14 -07:00
Roger Wang	067c34a155	docs: remove deprecated disable-log-requests flag (#22113 ) Signed-off-by: Roger Wang <hey@rogerw.me>	2025-08-02 00:19:48 -07:00
Michael Goin	88faa466d7	[CI] Initial tests for SM100 Blackwell runner (#21877 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-01 16:18:38 -07:00
Harry Mellor	2d7b09b998	Deprecate `--disable-log-requests` and replace with `--enable-log-requests` (#21739 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-01 17:16:37 +01:00
Charent	ad57f23f6a	[Bugfix] Fix: Fix multi loras with tp >=2 and LRU cache (#20873 ) Signed-off-by: charent <19562666+charent@users.noreply.github.com>	2025-07-31 19:48:13 -07:00
Ilya Markov	6e672daf62	Add FlashInfer allreduce RMSNorm Quant fusion (#21069 ) Signed-off-by: ilmarkov <imarkov@redhat.com> Signed-off-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: ilmarkov <imarkov@redhat.com>	2025-07-31 13:58:38 -07:00
Alexei-V-Ivanov-AMD	0780bb5783	Removing amdproduction Tests (#22027 ) Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>	2025-07-31 09:53:27 -07:00
Daniele	d2aab336ad	[CI/Build] get rid of unused VLLM_FA_CMAKE_GPU_ARCHES (#21599 ) Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>	2025-07-31 15:00:08 +08:00
Louie Tsai	6f8d261882	Update vLLM Benchmark Suite for Xeon based on 0.9.2 release (#21486 ) Signed-off-by: Tsai, Louie <louie.tsai@intel.com>	2025-07-30 05:57:03 +00:00
Harry Mellor	ba5c5e5404	[Docs] Switch to better markdown linting pre-commit hook (#21851 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-29 19:45:08 -07:00
Simon Mo	452b2a3180	[ci] mark blackwell test optional for now (#21878 )	2025-07-29 18:03:27 -07:00
Simon Mo	0d0cc9e150	[ci] add b200 test placeholder (#21866 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-07-29 17:11:50 -07:00
Reza Barazesh	37efc63b64	[V0 deprecation] Guided decoding (#21347 ) Signed-off-by: Reza Barazesh <rezabarazesh@meta.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-29 03:15:30 -07:00
Michael Goin	afa2607596	[CI] Parallelize Kernels MoE Test (#21764 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-28 18:56:24 -07:00
Li, Jiang	65e8466c37	[Bugfix] Fix environment variable setting in CPU Dockerfile (#21730 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-07-28 11:02:39 +00:00
Ye (Charlotte) Qi	01a395e9e7	[CI/Build][Doc] Clean up more docs that point to old bench scripts (#21667 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-07-27 04:02:12 +00:00
Ye (Charlotte) Qi	e7c4f9ee86	[CI/Build][Doc] Move existing benchmark scripts in CI/document/example to vllm bench CLI (#21355 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-07-26 07:10:14 -07:00
Huy Do	e98def439c	[Take 2] Correctly kill vLLM processes after benchmarks (#21646 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-07-26 06:06:05 -07:00
QiliangCui	7728dd77bb	[TPU][Test] Divide TPU v1 Test into 2 parts. (#21431 )	2025-07-26 06:20:30 +00:00
Huy Do	a55c95096b	Correctly kill vLLM processes after finishing serving benchmarks (#21641 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-07-25 19:06:21 -07:00
QiliangCui	07d80d7b0e	[TPU][TEST] HF_HUB_DISABLE_XET=1 the test 3. (#21539 ) Signed-off-by: Qiliang Cui <derrhein@gmail.com>	2025-07-24 15:33:04 -07:00
Robert Shaw	d5b981f8b1	[DP] Internal Load Balancing Per Node [`one-pod-per-node`] (#21238 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-07-23 20:57:32 -07:00
Liangliang Ma	13e4ee1dc3	[XPU][UT] increase intel xpu CI test scope (#21492 ) Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>	2025-07-23 20:24:04 -07:00
Ming Yang	772ce5af97	[Misc] Add dummy maverick test to CI (#21324 ) Signed-off-by: Ming Yang <minos.future@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-07-23 20:22:42 -07:00
QiliangCui	14bf19e39f	[TPU][TEST] Fix the downloading issue in TPU v1 test 11. (#21418 ) Signed-off-by: Qiliang Cui <derrhein@gmail.com>	2025-07-23 11:29:36 -07:00
Nick Hill	316b1bf706	[Tests] Add tests for headless internal DP LB (#21450 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-07-23 07:49:25 -07:00
Alexei-V-Ivanov-AMD	107111a859	Changing "amdproduction" allocation. (#21409 ) Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>	2025-07-22 20:48:31 -07:00
Cyrus Leung	c401c64b4c	[CI/Build] Fix model executor tests (#21387 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-22 20:25:37 -07:00
Michael Goin	005ae9be6c	Fix bad lm-eval fork (#21318 )	2025-07-21 10:47:51 -07:00
Li, Jiang	a15a50fc17	[CPU] Enable shared-memory based pipeline parallel for CPU backend (#21289 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-07-21 09:07:08 -07:00
Seiji Eicher	d1fb65bde3	Enable v1 metrics tests (#20953 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-07-20 03:22:02 +00:00
Woosuk Kwon	752c6ade2e	[V0 Deprecation] Deprecate BlockSparse Attention & Phi3-Small (#21217 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-07-19 13:53:17 -07:00
Li, Jiang	e3a0e43d7f	[bugfix] Fix auto thread-binding when world_size > 1 in CPU backend and refactor code (#21032 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-07-19 05:13:55 -07:00

707 Commits