Zhuohan Li
|
8c435c9bce
|
[Core] Enable command line logging for LLMEngine (#25610)
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
|
2025-09-25 15:31:17 -07:00 |
|
Ekagra Ranjan
|
e71b8e210d
|
[Spec Decode] Add Batch Parallel Ngram. Upto 8x lower overhead. (#24986)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-09-25 15:22:03 -07:00 |
|
Cyrus Leung
|
89fa54e6f7
|
[Optimization] Use a cheaper cache key in get_model_architecture (#25682)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-25 17:54:20 -04:00 |
|
Cyrus Leung
|
3d54bdcb73
|
[Optimization] Streamline InputPreprocessor (#25702)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-25 21:06:49 +00:00 |
|
Cyrus Leung
|
6b0fcbbf43
|
[Misc] Simplify test_argsort_mm_positions (#25690)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-25 18:23:01 +00:00 |
|
Jee Jee Li
|
0fa673af4c
|
[V0 deprecation] Clean up LoRA (#25686)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-25 18:12:33 +00:00 |
|
Matthew Bonanni
|
3468f17ebe
|
[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
|
2025-09-25 17:37:50 +00:00 |
|
Isotr0py
|
71b25b0d48
|
[V0 deprecation] Clean up V0 fallback in compilation config (#25675)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-25 17:29:51 +00:00 |
|
Cyrus Leung
|
0ea80c87d9
|
[Model] Define merge_by_field_config MM interface (#25676)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-25 17:13:07 +00:00 |
|
Tao Hui
|
b8d9e4a326
|
[Model] Add optional parameter to reasoning parser constructor (#25554)
Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: Tao Hui <taohui3@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-26 01:12:50 +08:00 |
|
Lucas Wilkinson
|
13cc7f5370
|
[BugFix] Fix DBO hang (#25625)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-09-25 17:04:48 +00:00 |
|
Michael Goin
|
916bd9204d
|
Revert "[Bug] Dynamo Unsupported due to BasevLLMParameter.torch_function calling disabled super()" (#25681)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-25 09:45:06 -07:00 |
|
AlonKejzman
|
e04a1b6b21
|
[BUGFIX] Fix crash in Eagle Speculative Decoding models when exceedin… (#24662)
Signed-off-by: AlonKejzman <alonkeizman@gmail.com>
|
2025-09-25 15:40:14 +00:00 |
|
Tyler Michael Smith
|
2e5df88c92
|
[Logging] Remove TORCH_NCCL_AVOID_RECORD_STREAMS to squash a warning (#25532)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-09-25 15:16:06 +00:00 |
|
Nicolò Lucchesi
|
0754ac4c49
|
[Misc] Remove cruft file in repo (#25678)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-09-25 08:05:12 -07:00 |
|
Isotr0py
|
03858e6d1c
|
[Bugfix] Fix InternS1 video processing after Transformers v4.56 (#25644)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-25 14:46:04 +00:00 |
|
Russell Bryant
|
532a6cfccb
|
[ux] Switch a warning to debug about a pytorch fallback (#23750)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-09-25 14:38:16 +00:00 |
|
Li, Jiang
|
eb32335e35
|
[CPU] update torch 2.8 and fix missing fields in TorchSDPAMetadata (#25652)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-09-25 13:29:11 +00:00 |
|
Jonas M. Kübler
|
69a8c8e99a
|
[torch.compile] Make Query Quantization Fusable (#24914)
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
|
2025-09-25 09:25:12 -04:00 |
|
youkaichao
|
6c340da4df
|
[misc] log info messages by default for hanging / busy / idle (#25627)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-09-25 21:14:57 +08:00 |
|
Cyrus Leung
|
2f17117606
|
[mypy] Fix wrong type annotations related to tuple (#25660)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-25 13:00:45 +00:00 |
|
chenlang
|
1e9a77e037
|
[Hardware][RISC-V] Add riscv64 support for vLLM with scalar (#22112)
Signed-off-by: chenlang <chen.lang5@zte.com.cn>
Co-authored-by: chenlang <10346245@zte.com.cn>
|
2025-09-25 20:46:11 +08:00 |
|
Kunshang Ji
|
d2af67441d
|
[XPU][Triton]add xpu config in triton_reshape_and_cache_flash (#25643)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-09-25 12:38:11 +00:00 |
|
Cyrus Leung
|
0bcc3a160d
|
[CI/Build] Fix flaky entrypoints test (#25663)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-25 12:19:40 +00:00 |
|
Harry Mellor
|
70fbdb26e9
|
Add backward compatibility for guided_... API (#25615)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-09-25 19:45:25 +08:00 |
|
wang.yuqi
|
7f570f1caa
|
[V0 deprecation] Remove unreachable model_config.supported_tasks (#25642)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-09-25 11:26:31 +00:00 |
|
yyzxw
|
eaeca3cd7f
|
[Bugfix] Parse SpeculativeConfig Error (#25142)
Signed-off-by: zxw <1020938856@qq.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-25 11:09:39 +00:00 |
|
Cyrus Leung
|
12c1287d64
|
[mypy] Further improve MM type annotations (#25654)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-25 10:57:36 +00:00 |
|
Isotr0py
|
17b4c6685c
|
[Bugfix] Fix Qwen3-VL max_num_video_tokens calculation for video profiling (#25648)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-25 18:36:01 +08:00 |
|
Agata Dobrzyniewicz
|
3c2b2ccece
|
[Bugfix] Add triton.language.tensor placeholder (#25649)
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
|
2025-09-25 10:31:14 +00:00 |
|
Roger Wang
|
7be9ffcd9f
|
[Misc] Fix Qwen3-VL video_grid_thw typing (#25646)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-09-25 10:16:45 +00:00 |
|
Fadi Arafeh
|
393de22d2e
|
[fix] Update torch version in cpu-build.txt for AArch64/ppc64le and Darwin (#25579)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2025-09-25 09:39:18 +00:00 |
|
Tyler Michael Smith
|
1260180c67
|
Revert "[Performance] Move apply_w8a8_block_fp8_linear to an op class… (#25607)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2025-09-25 08:05:21 +00:00 |
|
Nicole LiHui 🥜
|
af4ee63e0e
|
typo: remove duplicate is (#25641)
Signed-off-by: nicole-lihui <nicole.li@daocloud.io>
|
2025-09-25 00:46:22 -07:00 |
|
Jacob Kahn
|
bc092ea873
|
Map CwmForCausalLM to llama and LlamaForCausalLM (#25611)
Signed-off-by: Jacob Kahn <jacobkahn1@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-09-25 07:37:03 +00:00 |
|
Cyrus Leung
|
755ed7b05b
|
[Misc] Simplify PoolerOutput and move to v1/outputs (#25629)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-25 06:47:03 +00:00 |
|
courage17340
|
a676e668ee
|
[Bugfix] fix apply_temperature to avoid nan in probs (#24734)
Signed-off-by: courage17340 <courage17340@163.com>
|
2025-09-25 05:32:21 +00:00 |
|
Nicole LiHui 🥜
|
c85be1f6dd
|
optimize: eliminate duplicate split_enc_dec_inputs calls (#25573)
Signed-off-by: nicole-lihui <nicole.li@daocloud.io>
|
2025-09-25 05:03:25 +00:00 |
|
XuruiYang
|
845adb3ec6
|
[Model] Add LongCat-Flash (#23991)
Signed-off-by: yangxurui <yangxurui@meituan.com>
Co-authored-by: yangxurui <yangxurui@meituan.com>
|
2025-09-24 21:53:40 -07:00 |
|
Saman A. Pour
|
90b139cfff
|
Enable Fbgemm NVFP4 on Dense models (#25609)
Signed-off-by: Saman Keon <samanamp@outlook.com>
|
2025-09-24 21:12:53 -07:00 |
|
Wentao Ye
|
4492e3a554
|
[Bug] Dynamo Unsupported due to BasevLLMParameter.torch_function calling disabled super() (#25613)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-24 18:52:52 -07:00 |
|
Wei Wei
|
05c19485a5
|
[Kernel] Support DCP for Triton backend (#25132)
Signed-off-by: Wei Wei <wwei6@meta.com>
|
2025-09-24 18:09:34 -07:00 |
|
Jee Jee Li
|
52d0cb8458
|
[Model] Improve DotsOCRForCausalLM (#25466)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-25 07:58:08 +08:00 |
|
Shiyan Deng
|
5c1e496a75
|
[MISC] replace c10::optional with std::optional (#25602)
Signed-off-by: Shiyan Deng <dsy842974287@meta.com>
|
2025-09-24 16:56:21 -07:00 |
|
Harry Mellor
|
e7f27ea648
|
Improve --help for enhanced user experience (#24903)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-24 23:08:18 +00:00 |
|
Wentao Ye
|
1f29141258
|
[Refactor] Use DeepGEMM Col Major TMA Aligned Tensor (#25517)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-24 18:52:36 -04:00 |
|
Duncan Moss
|
6160ba4151
|
feat: BF16 FlashInfer Fused Cutlass MOE for Hopper and Blackwell Expert Parallel (#25503)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
|
2025-09-24 18:50:04 -04:00 |
|
Tyler Michael Smith
|
fea8006062
|
[Logging] Improve log for when DeepEP HT disables CUDA Graphs (#25531)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-09-24 22:43:06 +00:00 |
|
Woosuk Kwon
|
e6750d0b18
|
[V0 Deprecation] Remove unused classes in attention (#25541)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
|
2025-09-24 13:24:40 -07:00 |
|
Harry Mellor
|
8c853050e7
|
[Docs] Enable fail_on_warning for the docs build in CI (#25580)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-24 19:30:33 +00:00 |
|