Woosuk Kwon
|
dd572c0ab3
|
[V0 Deprecation] Remove V0 Spec Decode workers (#21152)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-18 21:47:50 -07:00 |
|
XiongfeiWei
|
58760e12b1
|
[TPU] Start using python 3.12 (#21000)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
|
2025-07-16 19:37:44 -07:00 |
|
Chendi.Xue
|
e9534c7202
|
[CI][HPU] update for v0 deprecate by switching to VLLM_TARGET_DEVICE=empty (#21006)
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
|
2025-07-15 20:07:05 -07:00 |
|
Cyrus Leung
|
c847e34b39
|
[CI/Build] Fix wrong path in Transformers Nightly Models Test (#20994)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-15 08:53:16 -07:00 |
|
Michael Goin
|
946aadb4a0
|
[CI/Build] Split Entrypoints Test into LLM and API Server (#20945)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-15 02:44:18 +00:00 |
|
Isotr0py
|
6d0cf239c6
|
[CI/Build] Add Transformers nightly tests in CI (#20924)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-14 16:33:17 +00:00 |
|
QiliangCui
|
c66e38ea4c
|
[Test] Remove docker build from test. (#20542)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-07-10 11:21:58 -07:00 |
|
shineran96
|
4bed167768
|
[Model][VLM] Support JinaVL Reranker (#20260)
Signed-off-by: shineran96 <shinewang96@gmail.com>
|
2025-07-10 10:43:43 -07:00 |
|
Michael Goin
|
1a4f35e2ea
|
Normalize lm-eval command between baseline and correctness test (#18560)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-10 13:27:32 +00:00 |
|
Kunshang Ji
|
b6e7e3d58f
|
[Intel GPU] support ray as distributed executor backend for XPU. (#20659)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-07-09 00:36:58 -07:00 |
|
Li, Jiang
|
7721ef1786
|
[CI/Build][CPU] Fix CPU CI and remove all CPU V0 files (#20560)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-07-07 22:13:44 -07:00 |
|
Liangliang Ma
|
2c5ebec064
|
[XPU][CI] add v1/core test in xpu hardware ci (#20537)
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
|
2025-07-07 01:16:40 -07:00 |
|
Cyrus Leung
|
9fb52e523a
|
[V1] Support any head size for FlexAttention backend (#20467)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-06 09:54:36 -07:00 |
|
Woosuk Kwon
|
e202dd2736
|
[V0 deprecation] Remove V0 CPU/XPU/TPU backends (#20412)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2025-07-06 08:48:13 -07:00 |
|
Peter Pan
|
5561681d04
|
[CI] add kvcache-connector dependency definition and add into CI build (#18193)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
|
2025-07-04 06:49:18 -07:00 |
|
Alexei-V-Ivanov-AMD
|
536fd33003
|
[CI] Trimming some failing test groups from AMDPRODUCTION. (#20390)
|
2025-07-03 08:21:31 -07:00 |
|
Li, Jiang
|
7f0367109e
|
[CI/Build][CPU] Enable cross compilation in CPU release pipeline (#20423)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-07-03 05:26:12 -07:00 |
|
QiliangCui
|
4ff61ababa
|
[TPU] Add a case to cover RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8 (#20385)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-07-03 06:46:41 +00:00 |
|
Louie Tsai
|
9965c47d0d
|
Enable CPU nightly performance benchmark and its Markdown report (#18444)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
|
2025-07-02 17:50:25 -07:00 |
|
Nick Hill
|
657f2f301a
|
[DP] Support external DP Load Balancer mode (#19790)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-02 10:21:52 -07:00 |
|
Li, Jiang
|
6cc1e7d96d
|
[CPU] Update custom ops for the CPU backend (#20255)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-07-01 07:25:03 +00:00 |
|
Chendi.Xue
|
a2f14dc8f9
|
[CI][Intel Gaudi][vllm-Plugin]Add CI for hpu-plugin-v1-test (#20196)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2025-07-01 04:17:07 +00:00 |
|
Thomas Parnell
|
8615d9776f
|
[CI/Build] Add new CI job to validate Hybrid Models for every PR (#20147)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-06-27 23:00:25 -07:00 |
|
Yang Wang
|
8b64c895c0
|
[CI] Sync test dependency with test.in for torch nightly (#19632)
Signed-off-by: Yang Wang <elainewy@meta.com>
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Concurrensee <yida.wu@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-06-26 20:55:25 -07:00 |
|
Bowen Wang
|
e9fd658a73
|
[Feature] Expert Parallelism Load Balancer (EPLB) (#18343)
Signed-off-by: Bowen Wang <abmfy@icloud.com>
|
2025-06-26 15:30:21 -07:00 |
|
Chengji Yao
|
04e1642e32
|
[TPU] add kv cache update kernel (#19928)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-06-26 10:01:37 -07:00 |
|
Kunshang Ji
|
b69781f107
|
[Hardware][Intel GPU] Add v1 Intel GPU support with Flash attention backend. (#19560)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-06-26 09:27:18 -07:00 |
|
QiliangCui
|
4e0db57fff
|
Fix the path to the testing script. (#20082)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-06-25 20:48:17 +00:00 |
|
Nick Hill
|
c40692bf9a
|
[Misc] Add parallel state node_count function (#20045)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-25 13:38:53 -07:00 |
|
Nick Hill
|
8619e7158c
|
[BugFix] Fix multi-node offline data parallel (#19937)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-24 12:45:20 -07:00 |
|
QiliangCui
|
a738dbb2a1
|
Update test case parameter to have the throughput above 8.0 (#19994)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-06-24 00:18:10 +00:00 |
|
22quinn
|
a3bc76e4b5
|
[CI/Build] Push latest tag for cpu and neuron docker image (#19897)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-23 14:15:37 -07:00 |
|
Lukas Geiger
|
c3649e4fee
|
[Docs] Fix syntax highlighting of shell commands (#19870)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-06-23 17:59:09 +00:00 |
|
kourosh hakhamaneshi
|
5e666f72cd
|
[Bugfix][Ray] Set the cuda context eagerly in the ray worker (#19583)
|
2025-06-19 22:01:16 -07:00 |
|
Elaine Zhao
|
b6bad3d186
|
[CI][Neuron] Fail and exit on first error (#19622)
Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-06-20 12:27:51 +08:00 |
|
Alexei-V-Ivanov-AMD
|
4719460644
|
Fixing Chunked Prefill Test. (#19762)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
|
2025-06-19 01:36:16 -07:00 |
|
Concurrensee
|
d65668b4e8
|
Adding "AMD: Multi-step Tests" to amdproduction. (#19508)
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-06-13 17:08:51 -07:00 |
|
Li, Jiang
|
6458721108
|
[CPU] Refine default config for the CPU backend (#19539)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-06-13 13:27:39 +08:00 |
|
kourosh hakhamaneshi
|
e6aab5de29
|
Revert "[Build/CI] Add tracing deps to vllm container image (#15224)" (#19378)
|
2025-06-12 17:26:40 -07:00 |
|
Luka Govedič
|
f98548b9da
|
[torch.compile][ROCm] Fuse quantization onto attention using a torch.compile pass (#16756)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-12 08:31:04 -07:00 |
|
Li, Jiang
|
e4248849ec
|
[BugFix][CPU] Fix CPU CI by ignore collecting test_pixtral (#19411)
Signed-off-by: jiang.li <jiang1.li@intel.com>
|
2025-06-10 12:02:40 +00:00 |
|
Reid
|
12e5829221
|
[doc] improve ci doc (#19307)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-09 07:26:12 +00:00 |
|
Aaruni Aggarwal
|
c4296b1a27
|
[CI][PowerPC] Use a more appropriate way to select testcase in tests/models/language/pooling/test_embedding.py (#19253)
Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com>
|
2025-06-07 11:52:52 +08:00 |
|
QiliangCui
|
66c508b137
|
[TPU][Test] Add script to run benchmark on TPU for buildkite (#19039)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-06-06 20:10:24 -07:00 |
|
Nishidha
|
94ecee6282
|
Fixed ppc build when it runs on non-RHEL based linux distros (#18422)
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
Signed-off-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com>
Signed-off-by: npanpaliya <nishidha.panpaliya@partner.ibm.com>
Co-authored-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com>
|
2025-06-06 11:54:26 -07:00 |
|
Jerry Zhang
|
c8134bea15
|
Fix AOPerModuleConfig name changes (#18869)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
|
2025-06-05 18:51:32 -07:00 |
|
Simon Mo
|
da40380214
|
[Build] Annotate wheel and container path for release workflow (#19162)
Signed-off-by: simon-mo <simon.mo@hey.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-06-04 23:24:56 -07:00 |
|
Siyuan Liu
|
7ee2590478
|
[TPU] Update dynamo dump file name in compilation test (#19108)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
|
2025-06-04 16:13:43 -04:00 |
|
Siyuan Liu
|
8e972d9c44
|
[TPU] Skip hanging tests (#19115)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
|
2025-06-04 01:43:00 -07:00 |
|
Woosuk Kwon
|
b124e1085b
|
[Bugfix] Fix FA3 full cuda graph correctness (#19106)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-06-03 23:10:15 -07:00 |
|