Liu Jinyi
|
f5516039c5
|
[Doc] fix heading levels (#29783)
Signed-off-by: KKKZOZ <kkkzoz@qq.com>
|
2025-12-01 14:49:22 +00:00 |
|
Shengqi Chen
|
36db0a35e4
|
[CI] Renovation of nightly wheel build & generation (#29690)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
|
2025-12-01 21:25:39 +08:00 |
|
Marcin Ostrowski
|
5cfa967efa
|
[Bugfix] TypeError: 'NoneType' object is not callable (#29414)
Signed-off-by: Marcin Ostrowski <marcinx.ostrowski@intel.com>
|
2025-12-01 13:16:44 +00:00 |
|
Isotr0py
|
b95db244ee
|
[v1] Add real sliding window calculation to FlexAttention direct BlockMask building (#26015)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Co-authored-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
|
2025-12-01 13:12:51 +00:00 |
|
Zhengxu Chen
|
ad9d656bfa
|
[multimodal][test] Reduce memory utilization for test_siglip to avoid OOM (#29504)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-01 20:41:48 +08:00 |
|
Fanli Lin
|
f37e8938d2
|
[XPU] Fix AWQ skipped layer detection in IPEX quantization (#29774)
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
|
2025-12-01 12:00:52 +00:00 |
|
Cyrus Leung
|
f0a28bf661
|
[Misc] Unify tokenizer registration (#29767)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-01 11:34:58 +00:00 |
|
Mickaël Seznec
|
86e178f7c4
|
[crashfix] Eagle + multimodal can crash on mm cache miss (#29750)
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-12-01 17:29:33 +08:00 |
|
daniel-salib
|
014ece97c7
|
[Frontend] Add tool filtering support to ToolServer (#29224)
Signed-off-by: Daniel Salib <danielsalib@meta.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-12-01 08:03:57 +00:00 |
|
wang.yuqi
|
62de4f4257
|
[Frontend] Resettle pooling entrypoints (#29634)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2025-12-01 15:30:43 +08:00 |
|
Huamin Li
|
83805a6078
|
[CI] Skip paddleocr_vl for transformer 4.57.3 (#29758)
Signed-off-by: Huamin Li <3ericli@gmail.com>
|
2025-12-01 04:38:06 +00:00 |
|
Yifei Zhang
|
1ab8fc8197
|
Make PyTorch profiler gzip and CUDA time dump configurable (#29568)
Signed-off-by: Yifei Zhang <yifei.zhang1992@outlook.com>
|
2025-12-01 04:30:46 +00:00 |
|
Shu Wang
|
f72a817bdf
|
[MoE] CuteDSL MoE with Nvfp4 DeepEP dispatch (#27141)
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-11-30 16:05:32 -08:00 |
|
Woosuk Kwon
|
ec38a7368d
|
[Model Runner V2] Use packed mask for prompt bin counts (#29756)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-30 14:15:42 -08:00 |
|
Xingyu Liu
|
21c2627934
|
[Misc]Remove redundant hidden_size property in ModelConfig (#29749)
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-30 17:14:23 +00:00 |
|
Omer Ullman Argov
|
39d28108f4
|
[Feat] Support non-gated activations in NVFP4 modelopt path (#29004)
|
2025-11-30 11:02:40 -05:00 |
|
Harry Mellor
|
cd719de5cb
|
Fix RoPE failures in Transformers nightly (#29700)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-30 14:29:32 +00:00 |
|
Pleaplusone
|
8c363ed666
|
[ROCm][Attention] Sliding window support for AiterFlashAttentionBackend (#29234)
Signed-off-by: ganyi <ygan@amd.com>
|
2025-11-30 11:31:50 +00:00 |
|
Cyrus Leung
|
64bc09ba27
|
[Core] Enable inputs_embeds_size separate from hidden_size (#29741)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-30 17:31:12 +08:00 |
|
Isotr0py
|
47539cfd3e
|
[Bugfix] Fix mismatched nvfp4 gemm output shape (#29742)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-30 09:15:01 +00:00 |
|
Cyrus Leung
|
2afcec4dec
|
[Misc] Update TokenizerLike interface and move get_cached_tokenizer (#29730)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-30 14:59:47 +08:00 |
|
朝
|
9381b5cde0
|
[Doc]: Fix typo in fused_moe layer (#29731)
Signed-off-by: BowTen <bowten@qq.com>
|
2025-11-29 22:29:13 -08:00 |
|
Vensen
|
66b5840287
|
[Bugfix][sleepmode][fp8 kv cache]: Fix FP8 KV cache + sleep(level=2) gibberish output (#28783)
Signed-off-by: vensen <vensenmu@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2025-11-30 14:24:25 +08:00 |
|
Huamin Li
|
82c795d6f2
|
Fix AttributeError about _use_fi_prefill (#29734)
Signed-off-by: Huamin Li <3ericli@gmail.com>
|
2025-11-30 06:04:55 +00:00 |
|
Isotr0py
|
e1464c3a08
|
[Quantization] Enable compressed-tensors AWQ for Turing GPU (#29732)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-30 06:04:28 +00:00 |
|
Xin Yang
|
a491b0911b
|
[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#29708)
Signed-off-by: Xin Yang <xyangx@amazon.com>
Signed-off-by: Xin Yang <105740670+xyang16@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-30 10:37:25 +08:00 |
|
Jee Jee Li
|
b9d0504a36
|
[Bugfix] Revert test_tokenization.py (#29729)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-29 16:35:15 +00:00 |
|
Jinzhen Lin
|
1656ad3704
|
[Kernel][Quantization] add w4a8 support for marlin kernel (#24722)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
|
2025-11-29 07:19:33 -08:00 |
|
Cyrus Leung
|
fa59fe417f
|
[Chore] Move detokenizer_utils to vllm/tokenizers (#29727)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-29 06:25:17 -08:00 |
|
Cyrus Leung
|
fe3398fab2
|
[Chore] Enable passing tokenizer=None into MM processor (#29724)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-29 06:25:10 -08:00 |
|
Chukwuma Nwaugha
|
ad7f714d62
|
hfrunner.classify should return list[list[float]] not list[str] (#29671)
Signed-off-by: Chukwuma Nwaugha <nwaughac@gmail.com>
|
2025-11-29 13:57:00 +00:00 |
|
dublc
|
f4341f45d3
|
[Doc]: fix code block rendering (#29728)
Signed-off-by: dublc <jdublc0x@gmail.com>
|
2025-11-29 13:46:48 +00:00 |
|
Cyrus Leung
|
34a984274e
|
[Misc] Refactor tokenizer interface (#29693)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-29 04:02:21 -08:00 |
|
Woosuk Kwon
|
f223ed4181
|
[Model Runner V2] Fuse penalties and temperature into single kernel (#29720)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-29 02:29:16 -08:00 |
|
Didier Durand
|
04a797cd0e
|
[Doc]: fixing typos in various files. (#29717)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-11-29 01:15:39 -08:00 |
|
Woosuk Kwon
|
6afc0ffaf6
|
[Model Runner V2] Add sample/ directory and reorganize files (#29719)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-29 00:41:01 -08:00 |
|
Jee Jee Li
|
39e63dec7c
|
[LoRA] Cleanup LoRA unused code (#29611)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-28 22:52:58 -08:00 |
|
Woosuk Kwon
|
4a80ad0a25
|
[Model Runner V2] Don't use UVA buffer for prefill_len (#29713)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-28 20:27:16 -08:00 |
|
Angela Yi
|
4b17ce6815
|
Add gpu memory wait before test_async_tp (#28893)
Signed-off-by: angelayi <yiangela7@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-11-28 20:19:05 -08:00 |
|
Lucas Wilkinson
|
e23f665d83
|
[BugFix] Fix DBO failing with TypeError: 'NoneType' object is not iterable (#29698)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-11-28 20:19:01 -08:00 |
|
Woosuk Kwon
|
ca1b1e7296
|
[Model Runner V2] Refactor prefill token preparation (#29712)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-28 19:49:17 -08:00 |
|
Tsukasa OI
|
762a4a6ca9
|
[Frontend] Perform offline path replacement to tokenizer (#29706)
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
|
2025-11-28 18:32:08 -08:00 |
|
Cyrus Leung
|
b2c50eda50
|
[Bugfix] Fix wrong mock attribute (#29704)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-29 10:30:41 +08:00 |
|
Woosuk Kwon
|
1dcafb3dea
|
[Model Runner V2] Support penalties using bin counts (#29703)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-28 17:53:17 -08:00 |
|
Andreas Karatzas
|
ea3370b428
|
[ROCm][Bugfix] Patch for the Multi-Modal Processor Test group (#29702)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-11-29 01:31:44 +00:00 |
|
Mert Unsal
|
c625d7b1c6
|
[Bugfix] Fix O(n²) multimodal string prompt processing (#29667)
Signed-off-by: mertunsall <mertunsal1905@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-11-28 16:10:39 -08:00 |
|
Zhengxu Chen
|
6173682b6e
|
[compile] Include enable_sleep_mode into caching factors. (#29696)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2025-11-29 07:58:38 +08:00 |
|
Augusto Yao
|
9726e64530
|
bugfix: correct attn output with base 2 or e (#28840)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
|
2025-11-29 07:52:12 +08:00 |
|
Huamin Li
|
3fd1fb0b60
|
Revert "[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#28971)" (#29697)
Signed-off-by: Huamin Li <3ericli@gmail.com>
|
2025-11-28 15:26:52 -08:00 |
|
Jiangyun Zhu
|
a51f4186f2
|
[Bugfix] fix dots.llm1.inst (#29687)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-11-28 15:25:26 -08:00 |
|