wwl2755
|
99f536f830
|
[Misc] Enhance warning information to user-defined chat template (#15408)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-03-26 02:21:15 -07:00 |
|
vllmellm
|
5ebf66748b
|
[FEAT][ROCm] Integrate Fused MoE Kernels from AITER (#14967)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-03-26 16:30:30 +08:00 |
|
Cyrus Leung
|
997c8811d6
|
[Model] Support multi-image for Molmo (#15438)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-26 11:26:33 +08:00 |
|
Harry Mellor
|
e42389f9d7
|
Transformers backend already supports V1 (#15463)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-25 20:26:16 -07:00 |
|
Varun Sundar Rabindranath
|
ff38f0a32c
|
[CI/Build] LoRA: Delete long context tests (#15503)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-25 17:18:34 -07:00 |
|
Chenyaaang
|
ac3cd6e83c
|
[core] add bucket padding to tpu_model_runner (#14995)
Signed-off-by: Chenyaaang <llccyy1212@gmail.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-03-25 17:27:22 -04:00 |
|
Lu Fang
|
082ab86f5f
|
[V1] Support long_prefill_token_threshold in v1 scheduler (#15419)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-03-25 14:22:26 -07:00 |
|
yarongmu-google
|
0a049c7d86
|
[CI/Build] Add tests for the V1 tpu_model_runner. (#14843)
Signed-off-by: Yarong Mu <ymu@google.com>
|
2025-03-25 12:27:16 -04:00 |
|
Cyrus Leung
|
a9e879b316
|
[Misc] Clean up MiniCPM-V/O code (#15337)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-25 10:22:52 +00:00 |
|
Thien Tran
|
4f044b1d67
|
[Kernel][CPU] CPU MLA (#14744)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
|
2025-03-25 09:34:59 +00:00 |
|
Russell Bryant
|
a09ad90a72
|
[V1] guidance backend for structured output + auto fallback mode (#14779)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <jc1da.3011@gmail.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
|
2025-03-24 21:02:33 -07:00 |
|
Harry Mellor
|
97cfa65df7
|
Add pipeline parallel support to TransformersModel (#12832)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-03-25 10:41:45 +08:00 |
|
Woosuk Kwon
|
ebcebeeb6b
|
[V1][Spec Decode] Enable spec decode for top-p & top-k sampling (#15063)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-24 17:16:46 -07:00 |
|
Gregory Shtrasberg
|
f533b5837f
|
[ROCm][Kernel] MoE weights padding (#14454)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: charlifu <charlifu@amd.com>
|
2025-03-24 23:45:30 +00:00 |
|
Gregory Shtrasberg
|
8279201ce6
|
[Build] Cython compilation support fix (#14296)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-03-24 23:37:54 +00:00 |
|
Siyuan Liu
|
23fdab00a8
|
[Hardware][TPU] Skip failed compilation test (#15421)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
|
2025-03-24 23:28:57 +00:00 |
|
Nick Hill
|
9d72daf4ce
|
[V1][Perf] Simpler request output queues (#15156)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-03-24 22:44:08 +00:00 |
|
Manish Sethi
|
761702fd19
|
[Core] Integrate fastsafetensors loader for loading model weights (#10647)
Signed-off-by: Manish Sethi <Manish.sethi1@ibm.com>
|
2025-03-24 08:08:02 -07:00 |
|
Cyrus Leung
|
cbcdf2c609
|
[Bugfix] Fix chat template loading (#15143)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-03-24 13:50:09 +00:00 |
|
Jinzhen Lin
|
6b3cc75be0
|
[Kernel] allow non-contiguous input for marlin kernel (#14658)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
|
2025-03-24 09:21:33 -04:00 |
|
Luka Govedič
|
f622dbcf39
|
[Fix] [torch.compile] Improve UUID system for custom passes (#15249)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-03-24 01:54:07 +00:00 |
|
Robin
|
d6cd59f122
|
[Frontend] Support tool calling and reasoning parser (#14511)
Signed-off-by: WangErXiao <863579016@qq.com>
|
2025-03-23 14:00:07 -07:00 |
|
Woosuk Kwon
|
b9bd76ca14
|
[V1][Spec Decode] Respect prompt_lookup_max (#15348)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-23 10:41:44 -07:00 |
|
youkaichao
|
f68cce8e64
|
[ci/build] fix broken tests in LLM.collective_rpc (#15350)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-03-23 14:49:48 +08:00 |
|
shangmingc
|
50c9636d87
|
[V1][Usage] Refactor speculative decoding configuration and tests (#14434)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-03-22 19:28:10 -10:00 |
|
Russell Bryant
|
b877031d80
|
Remove openvino support in favor of external plugin (#15339)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-22 14:06:39 -07:00 |
|
Russell Bryant
|
eb63ea1e18
|
[V1] Add disable-any-whitespace option support for xgrammar (#15316)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-22 15:56:17 +00:00 |
|
Naitong Yu
|
2f4bd358f1
|
[Model] Support Tele-FLM Model (#15023)
Signed-off-by: Naitong Yu <ntyu@baai.ac.cn>
Signed-off-by: jiangxin <horizon94@outlook.com>
Co-authored-by: Jason Fang <jasonfang3900@gmail.com>
Co-authored-by: jiangxin <horizon94@outlook.com>
|
2025-03-22 02:04:44 -07:00 |
|
Varun Sundar Rabindranath
|
8a8b30eac1
|
[Bugfix] LoRA V0 - Fix case where max_num_seqs is between cudagraph capture sizes (#15308)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-22 02:03:32 -07:00 |
|
TJian
|
ec870fba9a
|
[FEAT] [ROCm]: Add AITER RMS Norm (Layer Norm) Feature (#14959)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-03-21 22:36:14 -07:00 |
|
Nicolò Lucchesi
|
cfbb8c930f
|
[TPU][V1] MHA Pallas backend (#15288)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-03-21 08:50:39 -07:00 |
|
Cyrus Leung
|
baec0d4de9
|
Revert "[Feature] specify model in config.yaml (#14855)" (#15293)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-21 08:30:23 -07:00 |
|
Chen Zhang
|
93a00d7dde
|
[v1] Refactor KVCacheConfig (#14079)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-03-21 04:56:27 -07:00 |
|
Isotr0py
|
8afcd0f633
|
[Bugfix] Fix broken kernel test due to missing rename for v1 Triton backend (#15282)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-21 11:42:06 +00:00 |
|
Wei Zeng
|
0fa3970deb
|
[Feature] specify model in config.yaml (#14855)
Signed-off-by: weizeng <weizeng@roblox.com>
|
2025-03-21 00:26:03 -07:00 |
|
Nick Hill
|
da6ea29f7a
|
[V1] Avoid redundant input processing in n>1 case (#14985)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-20 22:24:10 -07:00 |
|
Hyesoo Yang
|
47195057e9
|
[V1][TPU] Speed up top-k on TPU by using torch.topk (#15242)
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
|
2025-03-20 19:19:40 -07:00 |
|
Isotr0py
|
1e508343e1
|
[Bugfix] Fix incorrect qwen2.5-vl attention mask pre-computation (#15200)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-20 19:18:04 -07:00 |
|
Varun Sundar Rabindranath
|
0cfe7d386d
|
[CI/Build] LoRA : make add_lora_test safer (#15181)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-21 09:28:53 +08:00 |
|
Woosuk Kwon
|
0c6f5023c3
|
[V1] Scheduler Refactoring [1/N] - Add Scheduler Interface (#15250)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-03-20 17:50:43 -07:00 |
|
Jason
|
d8e82bc06d
|
[Bugfix] fix V1 Engine crash while handling requests with duplicate request id (#15043)
Signed-off-by: Jiahui Sun <jhsun2020@gmail.com>
|
2025-03-20 10:01:02 -07:00 |
|
Chi Zhang
|
086b56824c
|
[ci] feat: make the test_torchrun_example run with tp=2, external_dp=2 (#15172)
Signed-off-by: Chi Zhang <zhangchi.usc1992@bytedance.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-03-21 00:30:04 +08:00 |
|
Matt Ritter
|
a8652f4f0f
|
Enable CUDA graph support for llama 3.2 vision (#14917)
Signed-off-by: Matt Ritter <100659061+mritterfigma@users.noreply.github.com>
|
2025-03-19 23:29:16 -07:00 |
|
Mickaël Seznec
|
a597a57595
|
[Attention] Flash Attention 3 - fp8 (#14570)
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
|
2025-03-20 01:14:20 -04:00 |
|
Russell Bryant
|
1f16b7fe74
|
[Core][V0] Add guidance backend for structured output (#14589)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <lohuynh@microsoft.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-03-19 21:33:51 -07:00 |
|
Nicolò Lucchesi
|
d8c6d7d6b5
|
[V1][TPU] Support V1 Sampler for ragged attention (#14227)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-03-19 21:00:39 -07:00 |
|
Jovan Sardinha
|
70e500cad9
|
Fix broken tests (#14713)
Signed-off-by: JovanSardinha <jovan.sardinha@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-03-20 02:06:49 +00:00 |
|
Murali Andoorveedu
|
61c7a1b856
|
[V1] Minor V1 async engine test refactor (#15075)
Create Release / Create Release (push) Has been cancelled
Signed-off-by: andoorve <murali.andoorveedu@mail.utoronto.ca>
Co-authored-by: andoorve <murali.andoorveedu@mail.utoronto.ca>
|
2025-03-19 10:37:17 -07:00 |
|
Cyrus Leung
|
3d446433ec
|
[Bugfix] Fix size calculation of processing cache (#15114)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-19 05:53:19 -07:00 |
|
Cyrus Leung
|
61f412187d
|
[Bugfix] Re-enable Gemma3 for V1 (#14980)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-18 23:58:22 -07:00 |
|