Caleb_Du
|
57c22e57f9
|
Fix CUDA permute/unpermute for use with DeepGemm Moe (#17934)
Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>
|
2025-07-27 07:08:00 -07:00 |
|
Wentao Ye
|
bda9d0535f
|
[Refactor] Refactor MOE NVFP4 Code Base: ModelOpt + Compressed Tensor (#21631)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-27 05:25:21 -07:00 |
|
Isotr0py
|
3d847a3125
|
[VLM] Add video support for Intern-S1 (#21671)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-07-27 11:49:43 +00:00 |
|
Isotr0py
|
eed2f463b2
|
[VLM] Support HF format Phi-4-MM model (#17121)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-26 20:07:57 -07:00 |
|
Benji Beck
|
3339cba3ff
|
Migrate FuyuImagePatchInputs to TensorSchema (#21662)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-07-26 19:34:14 -07:00 |
|
Maximilien de Bayser
|
1cd6eaba54
|
Support encoder-only models without KV-Cache (#21270)
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-07-26 21:09:52 +08:00 |
|
Isotr0py
|
f27fdfc3ed
|
[Bugfix] Investigate Qwen2-VL failing test (#21527)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-07-26 06:09:29 -07:00 |
|
Lyu Han
|
875af38e01
|
Support Intern-S1 (#21628)
Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-07-26 19:14:04 +08:00 |
|
Farzad Abdolhosseini
|
62965de5fe
|
[Model] Ultravox: Support Llama 4 and Gemma 3 backends (#17818)
Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>
Signed-off-by: Patrick Li <patrick8289@gmail.com>
Co-authored-by: Patrick Li <patrick8289@gmail.com>
|
2025-07-25 18:12:31 -07:00 |
|
Alex Kogan
|
7ae75fa6d0
|
[Feature] Add support for MoE models in the calibration-free RTN-based quantization (#20766)
Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
|
2025-07-25 18:09:34 -07:00 |
|
Yong Hoon Shin
|
2eddd437ba
|
Add interleaved RoPE test for Llama4 (Maverick) (#21478)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-07-25 17:07:26 -07:00 |
|
QiliangCui
|
7cfea0df39
|
[TPU][Test] Rollback PR-21550. (#21619)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-07-25 13:22:01 -07:00 |
|
mgazz
|
e189b50f53
|
Add support for Prithvi in Online serving mode (#21518)
Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-07-25 07:01:27 -07:00 |
|
who who who
|
b3caeb82e7
|
[ROCm][AITER] Enable fp8 kv cache on rocm aiter backend. (#20295)
Signed-off-by: fsx950223 <fsx950223@outlook.com>
Signed-off-by: amd-ruitang3 <Rui.Tang2@amd.com>
Co-authored-by: amd-ruitang3 <Rui.Tang2@amd.com>
|
2025-07-25 06:50:21 -07:00 |
|
bigshanedogg
|
29c6fbe58c
|
[MODEL] New model support for naver-hyperclovax/HyperCLOVAX-SEED-Vision-Instruct-3B (#20931)
Signed-off-by: bigshanedogg <bigshane319@gmail.com>
|
2025-07-25 06:05:42 -07:00 |
|
Nick Hill
|
e38e96a3c0
|
[Tests] Harden DP tests (#21508)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-25 02:27:24 -07:00 |
|
Chengji Yao
|
40d86ee412
|
[TPU][Bugfix] fix OOM issue in CI test (#21550)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-07-24 23:01:53 -07:00 |
|
Benji Beck
|
965bc71b04
|
Integrate TensorSchema with shape validation for Phi3VImagePixelInputs (#21232)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-07-24 21:43:52 -07:00 |
|
QiliangCui
|
e0be2c4d09
|
[TPU][Test] Temporarily suspend this MoE model in test_basic.py. (#21560)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-07-24 20:44:50 -07:00 |
|
Nick Hill
|
9c8b2c2a8a
|
[DP] Support api-server-count > 0 in hybrid DP LB mode (#21510)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-24 20:18:16 -07:00 |
|
Cyrus Leung
|
34ddcf9ff4
|
[Frontend] run-batch supports V1 (#21541)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-24 20:05:55 -07:00 |
|
QiliangCui
|
07d80d7b0e
|
[TPU][TEST] HF_HUB_DISABLE_XET=1 the test 3. (#21539)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-07-24 15:33:04 -07:00 |
|
Juncheng Gu
|
6066284914
|
[P/D] Support CPU Transfer in NixlConnector (#18293)
Signed-off-by: Juncheng Gu <juncgu@gmail.com>
Signed-off-by: Richard Liu <ricliu@google.com>
Co-authored-by: Richard Liu <39319471+richardsliu@users.noreply.github.com>
Co-authored-by: Richard Liu <ricliu@google.com>
|
2025-07-24 17:58:42 +01:00 |
|
Rui Qiao
|
1e9ea8e69d
|
[P/D] Move FakeNixlWrapper to test dir (#21328)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-07-24 08:53:45 -07:00 |
|
Ming Yang
|
2ded067fd2
|
[Bugfix] Fix CUDA arch flags for MoE permute (#21426)
Signed-off-by: Ming Yang <minos.future@gmail.com>
|
2025-07-24 03:23:59 -07:00 |
|
Harry Mellor
|
13abd0eaf9
|
[Model] Officially support Emu3 with Transformers backend (#21319)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-24 03:22:12 -07:00 |
|
Lucas Wilkinson
|
61b8cea3b4
|
[Attention] Optimize FlashInfer MetadataBuilder Build call (#21137)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-07-24 03:21:46 -07:00 |
|
22quinn
|
610852a423
|
[Core] Support model loader plugins (#21067)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-07-24 01:49:44 -07:00 |
|
Zhou Fang
|
fc5f756db4
|
[v1][Core] Clean up usages of SpecializedManager (#21407)
Signed-off-by: Zhou Fang <fang.github@gmail.com>
|
2025-07-24 00:40:11 -07:00 |
|
Chengji Yao
|
e74bfc70e4
|
[TPU][Bugfix] fix moe layer (#21340)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-07-24 00:38:39 -07:00 |
|
Julien Denize
|
6d8d0a24c0
|
Add think chunk (#21333)
Create Release / Create Release (push) Has been cancelled
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
|
2025-07-23 21:51:32 -07:00 |
|
Robert Shaw
|
d5b981f8b1
|
[DP] Internal Load Balancing Per Node [one-pod-per-node] (#21238)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-07-23 20:57:32 -07:00 |
|
Liangliang Ma
|
13e4ee1dc3
|
[XPU][UT] increase intel xpu CI test scope (#21492)
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
|
2025-07-23 20:24:04 -07:00 |
|
Ming Yang
|
772ce5af97
|
[Misc] Add dummy maverick test to CI (#21324)
Signed-off-by: Ming Yang <minos.future@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-07-23 20:22:42 -07:00 |
|
Michael Goin
|
82ec66f514
|
[V0 Deprecation] Remove Prompt Adapters (#20588)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-23 16:36:48 -07:00 |
|
22quinn
|
5c9b807b34
|
[Core] Add reload_weights RPC method (#20096)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-07-23 14:24:52 -07:00 |
|
Yong Hoon Shin
|
4ac7713e32
|
Add test case for compiling multiple graphs (#21044)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-07-23 11:00:47 -07:00 |
|
Christian Pinto
|
8560a5b258
|
[Core][Model] PrithviMAE Enablement on vLLM v1 engine (#20577)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
|
2025-07-23 11:00:23 -07:00 |
|
Nick Hill
|
316b1bf706
|
[Tests] Add tests for headless internal DP LB (#21450)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-23 07:49:25 -07:00 |
|
Asher
|
2671334d45
|
[Model] add Hunyuan V1 Dense Model support. (#21368)
Signed-off-by: Asher Zhang <asherszhang@tencent.com>
|
2025-07-23 03:54:08 -07:00 |
|
Yang Chen
|
6929f8b437
|
[Misc] fixed nvfp4_moe test failures due to invalid kwargs (#21246)
Signed-off-by: Yang Chen <yangche@fb.com>
|
2025-07-23 01:41:43 -07:00 |
|
Yu Chin Fabian Lim
|
32ec9e2f2a
|
Mamba V2 Test not Asserting Failures. (#21379)
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
|
2025-07-23 01:40:27 -07:00 |
|
Lu Fang
|
accac82928
|
[Sampler] Introduce logprobs mode for logging (#21398)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-07-23 01:39:25 -07:00 |
|
Jialin Ouyang
|
a1f3610fc6
|
[Core] Add basic unit test for maybe_evict_cached_block (#21400)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-07-23 00:02:02 -07:00 |
|
Isotr0py
|
4ecedd1806
|
[Bugfix] Fix nightly transformers CI failure (#21427)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-23 00:01:01 -07:00 |
|
Harry Mellor
|
f154bb9ff0
|
Simplify weight loading in Transformers backend (#21382)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-22 20:29:43 -07:00 |
|
Cyrus Leung
|
c401c64b4c
|
[CI/Build] Fix model executor tests (#21387)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-22 20:25:37 -07:00 |
|
Yiheng Xu
|
4594fc3b28
|
[Model] Add Qwen3CoderToolParser (#21396)
Signed-off-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: simon-mo <xmo@berkeley.edu>
|
2025-07-22 15:05:57 -07:00 |
|
Cyrus Leung
|
35366ae57c
|
[CI/Build] Fix test failure due to updated model repo (#21375)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-22 08:39:35 -07:00 |
|
Aritra Roy Gosthipaty
|
2226d5bd85
|
[Bugfix] Decode Tokenized IDs to Strings for hf_processor in llm.chat() with model_impl=transformers (#21353)
Signed-off-by: ariG23498 <aritra.born2fly@gmail.com>
|
2025-07-22 08:27:28 -07:00 |
|