elvischenv
|
83156c7b89
|
[NVIDIA] Support Flashinfer TRT-LLM Prefill Attention Kernel (#22095)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-08-05 02:45:34 -07:00 |
|
lkchen
|
f4f4e7ef27
|
[V0 deprecation][P/D] Deprecate v0 KVConnectorBase code (1/2) (#21785)
Signed-off-by: Linkun Chen <github@lkchen.net>
|
2025-08-04 19:11:33 -07:00 |
|
Isotr0py
|
3dddbf1f25
|
[Misc] Add tensor schema test coverage for multimodal models (#21754)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-08-03 00:52:14 -07:00 |
|
Michael Goin
|
88faa466d7
|
[CI] Initial tests for SM100 Blackwell runner (#21877)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-01 16:18:38 -07:00 |
|
Charent
|
ad57f23f6a
|
[Bugfix] Fix: Fix multi loras with tp >=2 and LRU cache (#20873)
Signed-off-by: charent <19562666+charent@users.noreply.github.com>
|
2025-07-31 19:48:13 -07:00 |
|
Ilya Markov
|
6e672daf62
|
Add FlashInfer allreduce RMSNorm Quant fusion (#21069)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
|
2025-07-31 13:58:38 -07:00 |
|
Alexei-V-Ivanov-AMD
|
0780bb5783
|
Removing amdproduction Tests (#22027)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
|
2025-07-31 09:53:27 -07:00 |
|
Simon Mo
|
452b2a3180
|
[ci] mark blackwell test optional for now (#21878)
|
2025-07-29 18:03:27 -07:00 |
|
Simon Mo
|
0d0cc9e150
|
[ci] add b200 test placeholder (#21866)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-07-29 17:11:50 -07:00 |
|
Reza Barazesh
|
37efc63b64
|
[V0 deprecation] Guided decoding (#21347)
Signed-off-by: Reza Barazesh <rezabarazesh@meta.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-29 03:15:30 -07:00 |
|
Michael Goin
|
afa2607596
|
[CI] Parallelize Kernels MoE Test (#21764)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-28 18:56:24 -07:00 |
|
Robert Shaw
|
d5b981f8b1
|
[DP] Internal Load Balancing Per Node [one-pod-per-node] (#21238)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-07-23 20:57:32 -07:00 |
|
Ming Yang
|
772ce5af97
|
[Misc] Add dummy maverick test to CI (#21324)
Signed-off-by: Ming Yang <minos.future@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-07-23 20:22:42 -07:00 |
|
Nick Hill
|
316b1bf706
|
[Tests] Add tests for headless internal DP LB (#21450)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-23 07:49:25 -07:00 |
|
Alexei-V-Ivanov-AMD
|
107111a859
|
Changing "amdproduction" allocation. (#21409)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
|
2025-07-22 20:48:31 -07:00 |
|
Cyrus Leung
|
c401c64b4c
|
[CI/Build] Fix model executor tests (#21387)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-22 20:25:37 -07:00 |
|
Michael Goin
|
005ae9be6c
|
Fix bad lm-eval fork (#21318)
|
2025-07-21 10:47:51 -07:00 |
|
Seiji Eicher
|
d1fb65bde3
|
Enable v1 metrics tests (#20953)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-07-20 03:22:02 +00:00 |
|
Woosuk Kwon
|
dd572c0ab3
|
[V0 Deprecation] Remove V0 Spec Decode workers (#21152)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-18 21:47:50 -07:00 |
|
Cyrus Leung
|
c847e34b39
|
[CI/Build] Fix wrong path in Transformers Nightly Models Test (#20994)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-15 08:53:16 -07:00 |
|
Michael Goin
|
946aadb4a0
|
[CI/Build] Split Entrypoints Test into LLM and API Server (#20945)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-15 02:44:18 +00:00 |
|
Isotr0py
|
6d0cf239c6
|
[CI/Build] Add Transformers nightly tests in CI (#20924)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-14 16:33:17 +00:00 |
|
shineran96
|
4bed167768
|
[Model][VLM] Support JinaVL Reranker (#20260)
Signed-off-by: shineran96 <shinewang96@gmail.com>
|
2025-07-10 10:43:43 -07:00 |
|
Alexei-V-Ivanov-AMD
|
536fd33003
|
[CI] Trimming some failing test groups from AMDPRODUCTION. (#20390)
|
2025-07-03 08:21:31 -07:00 |
|
Nick Hill
|
657f2f301a
|
[DP] Support external DP Load Balancer mode (#19790)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-02 10:21:52 -07:00 |
|
Thomas Parnell
|
8615d9776f
|
[CI/Build] Add new CI job to validate Hybrid Models for every PR (#20147)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-06-27 23:00:25 -07:00 |
|
Yang Wang
|
8b64c895c0
|
[CI] Sync test dependency with test.in for torch nightly (#19632)
Signed-off-by: Yang Wang <elainewy@meta.com>
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Concurrensee <yida.wu@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-06-26 20:55:25 -07:00 |
|
Bowen Wang
|
e9fd658a73
|
[Feature] Expert Parallelism Load Balancer (EPLB) (#18343)
Signed-off-by: Bowen Wang <abmfy@icloud.com>
|
2025-06-26 15:30:21 -07:00 |
|
Nick Hill
|
c40692bf9a
|
[Misc] Add parallel state node_count function (#20045)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-25 13:38:53 -07:00 |
|
Nick Hill
|
8619e7158c
|
[BugFix] Fix multi-node offline data parallel (#19937)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-24 12:45:20 -07:00 |
|
kourosh hakhamaneshi
|
5e666f72cd
|
[Bugfix][Ray] Set the cuda context eagerly in the ray worker (#19583)
|
2025-06-19 22:01:16 -07:00 |
|
Alexei-V-Ivanov-AMD
|
4719460644
|
Fixing Chunked Prefill Test. (#19762)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
|
2025-06-19 01:36:16 -07:00 |
|
Concurrensee
|
d65668b4e8
|
Adding "AMD: Multi-step Tests" to amdproduction. (#19508)
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-06-13 17:08:51 -07:00 |
|
kourosh hakhamaneshi
|
e6aab5de29
|
Revert "[Build/CI] Add tracing deps to vllm container image (#15224)" (#19378)
|
2025-06-12 17:26:40 -07:00 |
|
Luka Govedič
|
f98548b9da
|
[torch.compile][ROCm] Fuse quantization onto attention using a torch.compile pass (#16756)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-12 08:31:04 -07:00 |
|
Jerry Zhang
|
c8134bea15
|
Fix AOPerModuleConfig name changes (#18869)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
|
2025-06-05 18:51:32 -07:00 |
|
Woosuk Kwon
|
b124e1085b
|
[Bugfix] Fix FA3 full cuda graph correctness (#19106)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-06-03 23:10:15 -07:00 |
|
Yan Ru Pei
|
b712be98c7
|
feat: add data parallel rank to KVEventBatch (#18925)
|
2025-06-03 17:14:20 -07:00 |
|
Concurrensee
|
4ce42f9204
|
Adding "LoRA Test %N" to AMD production tests (#18929)
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
|
2025-06-02 20:46:44 -07:00 |
|
Nick Hill
|
2dbe8c0774
|
[Perf] API-server scaleout with many-to-many server-engine comms (#17546)
|
2025-05-30 08:17:00 -07:00 |
|
Rabi Mishra
|
5f1d0c8118
|
[Bugfix][Failing Test] Fix test_vllm_port.py (#18618)
Signed-off-by: rabi <ramishra@redhat.com>
|
2025-05-30 17:13:47 +08:00 |
|
Rabi Mishra
|
b78f844a67
|
[Bugfix][FailingTest]Fix test_model_load_with_params.py (#18758)
Signed-off-by: rabi <ramishra@redhat.com>
|
2025-05-28 05:42:54 +00:00 |
|
Mark McLoughlin
|
06a0338015
|
[V1][Metrics] Add API for accessing in-memory Prometheus metrics (#17010)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-05-27 09:37:06 +00:00 |
|
Cyrus Leung
|
82e2339b06
|
[Doc] Move examples and further reorganize user guide (#18666)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-26 07:38:04 -07:00 |
|
Isotr0py
|
0877750029
|
[CI/Build] Split pooling and generation extended language models tests in CI (#18705)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-26 04:00:08 -07:00 |
|
Michael Goin
|
0ddf88e16e
|
[CI] Enable test_initialization to run on V1 (#16736)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-23 15:09:44 -07:00 |
|
Cyrus Leung
|
6dd51c7ef1
|
[CI/Build] Fix V1 flag being set in entrypoints tests (#18598)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-23 05:51:53 -07:00 |
|
Harry Mellor
|
a1fe24d961
|
Migrate docs from Sphinx to MkDocs (#18145)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-23 02:09:53 -07:00 |
|
cascade
|
71ea614d4a
|
[Feature]Add async tensor parallelism using compilation pass (#17882)
Signed-off-by: cascade812 <cascade812@outlook.com>
|
2025-05-23 01:03:34 -07:00 |
|
Sanger Steel
|
c32e249a23
|
[Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization (#17926)
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
|
2025-05-22 18:44:18 -07:00 |
|