Omer Ullman Argov
|
39d28108f4
|
[Feat] Support non-gated activations in NVFP4 modelopt path (#29004)
|
2025-11-30 11:02:40 -05:00 |
|
Harry Mellor
|
cd719de5cb
|
Fix RoPE failures in Transformers nightly (#29700)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-30 14:29:32 +00:00 |
|
Pleaplusone
|
8c363ed666
|
[ROCm][Attention] Sliding window support for AiterFlashAttentionBackend (#29234)
Signed-off-by: ganyi <ygan@amd.com>
|
2025-11-30 11:31:50 +00:00 |
|
Cyrus Leung
|
64bc09ba27
|
[Core] Enable inputs_embeds_size separate from hidden_size (#29741)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-30 17:31:12 +08:00 |
|
Isotr0py
|
47539cfd3e
|
[Bugfix] Fix mismatched nvfp4 gemm output shape (#29742)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-30 09:15:01 +00:00 |
|
Cyrus Leung
|
2afcec4dec
|
[Misc] Update TokenizerLike interface and move get_cached_tokenizer (#29730)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-30 14:59:47 +08:00 |
|
朝
|
9381b5cde0
|
[Doc]: Fix typo in fused_moe layer (#29731)
Signed-off-by: BowTen <bowten@qq.com>
|
2025-11-29 22:29:13 -08:00 |
|
Vensen
|
66b5840287
|
[Bugfix][sleepmode][fp8 kv cache]: Fix FP8 KV cache + sleep(level=2) gibberish output (#28783)
Signed-off-by: vensen <vensenmu@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2025-11-30 14:24:25 +08:00 |
|
Huamin Li
|
82c795d6f2
|
Fix AttributeError about _use_fi_prefill (#29734)
Signed-off-by: Huamin Li <3ericli@gmail.com>
|
2025-11-30 06:04:55 +00:00 |
|
Isotr0py
|
e1464c3a08
|
[Quantization] Enable compressed-tensors AWQ for Turing GPU (#29732)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-30 06:04:28 +00:00 |
|
Xin Yang
|
a491b0911b
|
[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#29708)
Signed-off-by: Xin Yang <xyangx@amazon.com>
Signed-off-by: Xin Yang <105740670+xyang16@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-30 10:37:25 +08:00 |
|
Jee Jee Li
|
b9d0504a36
|
[Bugfix] Revert test_tokenization.py (#29729)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-29 16:35:15 +00:00 |
|
Jinzhen Lin
|
1656ad3704
|
[Kernel][Quantization] add w4a8 support for marlin kernel (#24722)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
|
2025-11-29 07:19:33 -08:00 |
|
Cyrus Leung
|
fa59fe417f
|
[Chore] Move detokenizer_utils to vllm/tokenizers (#29727)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-29 06:25:17 -08:00 |
|
Cyrus Leung
|
fe3398fab2
|
[Chore] Enable passing tokenizer=None into MM processor (#29724)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-29 06:25:10 -08:00 |
|
Chukwuma Nwaugha
|
ad7f714d62
|
hfrunner.classify should return list[list[float]] not list[str] (#29671)
Signed-off-by: Chukwuma Nwaugha <nwaughac@gmail.com>
|
2025-11-29 13:57:00 +00:00 |
|
dublc
|
f4341f45d3
|
[Doc]: fix code block rendering (#29728)
Signed-off-by: dublc <jdublc0x@gmail.com>
|
2025-11-29 13:46:48 +00:00 |
|
Cyrus Leung
|
34a984274e
|
[Misc] Refactor tokenizer interface (#29693)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-29 04:02:21 -08:00 |
|
Woosuk Kwon
|
f223ed4181
|
[Model Runner V2] Fuse penalties and temperature into single kernel (#29720)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-29 02:29:16 -08:00 |
|
Didier Durand
|
04a797cd0e
|
[Doc]: fixing typos in various files. (#29717)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-11-29 01:15:39 -08:00 |
|
Woosuk Kwon
|
6afc0ffaf6
|
[Model Runner V2] Add sample/ directory and reorganize files (#29719)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-29 00:41:01 -08:00 |
|
Jee Jee Li
|
39e63dec7c
|
[LoRA] Cleanup LoRA unused code (#29611)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-28 22:52:58 -08:00 |
|
Woosuk Kwon
|
4a80ad0a25
|
[Model Runner V2] Don't use UVA buffer for prefill_len (#29713)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-28 20:27:16 -08:00 |
|
Angela Yi
|
4b17ce6815
|
Add gpu memory wait before test_async_tp (#28893)
Signed-off-by: angelayi <yiangela7@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-11-28 20:19:05 -08:00 |
|
Lucas Wilkinson
|
e23f665d83
|
[BugFix] Fix DBO failing with TypeError: 'NoneType' object is not iterable (#29698)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-11-28 20:19:01 -08:00 |
|
Woosuk Kwon
|
ca1b1e7296
|
[Model Runner V2] Refactor prefill token preparation (#29712)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-28 19:49:17 -08:00 |
|
Tsukasa OI
|
762a4a6ca9
|
[Frontend] Perform offline path replacement to tokenizer (#29706)
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
|
2025-11-28 18:32:08 -08:00 |
|
Cyrus Leung
|
b2c50eda50
|
[Bugfix] Fix wrong mock attribute (#29704)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-29 10:30:41 +08:00 |
|
Woosuk Kwon
|
1dcafb3dea
|
[Model Runner V2] Support penalties using bin counts (#29703)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-28 17:53:17 -08:00 |
|
Andreas Karatzas
|
ea3370b428
|
[ROCm][Bugfix] Patch for the Multi-Modal Processor Test group (#29702)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-11-29 01:31:44 +00:00 |
|
Mert Unsal
|
c625d7b1c6
|
[Bugfix] Fix O(n²) multimodal string prompt processing (#29667)
Signed-off-by: mertunsall <mertunsal1905@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-11-28 16:10:39 -08:00 |
|
Zhengxu Chen
|
6173682b6e
|
[compile] Include enable_sleep_mode into caching factors. (#29696)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2025-11-29 07:58:38 +08:00 |
|
Augusto Yao
|
9726e64530
|
bugfix: correct attn output with base 2 or e (#28840)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
|
2025-11-29 07:52:12 +08:00 |
|
Huamin Li
|
3fd1fb0b60
|
Revert "[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#28971)" (#29697)
Signed-off-by: Huamin Li <3ericli@gmail.com>
|
2025-11-28 15:26:52 -08:00 |
|
Jiangyun Zhu
|
a51f4186f2
|
[Bugfix] fix dots.llm1.inst (#29687)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-11-28 15:25:26 -08:00 |
|
Cyrus Leung
|
7675ba30de
|
[Misc] Remove redundant ClassRegistry (#29681)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-28 15:24:47 -08:00 |
|
Ralf Gommers
|
7c1ed45848
|
[CI/Build]: make it possible to build with a free-threaded interpreter (#29241)
Signed-off-by: Ralf Gommers <ralf.gommers@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-28 15:21:46 -08:00 |
|
Benjamin Chislett
|
1986de1375
|
[Perf] Optimize EAGLE prepare_inputs_padded with triton kernels (#28597)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
|
2025-11-28 22:25:05 +00:00 |
|
Yanan Cao
|
3461e7efd8
|
[Frontend] Remap -O to -cc commandline flag (#29557)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
|
2025-11-28 21:51:12 +00:00 |
|
Harry Mellor
|
fecae12cd7
|
Remove all_special_tokens_extended from tokenizer code (#29686)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-28 20:26:51 +00:00 |
|
Cyrus Leung
|
8d9338fae4
|
[Chore] Rename Processor to InputProcessor (#29682)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-28 09:35:41 -08:00 |
|
Isotr0py
|
d40c854009
|
[CI/Build] Rework CPU multimodal processor test (#29684)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-28 17:10:29 +00:00 |
|
Harry Mellor
|
4332955602
|
[Docs] Add CLI reference doc for vllm bench sweep plot_pareto (#29689)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-28 08:10:08 -09:00 |
|
Isotr0py
|
f946a8d743
|
[Chore]: Reorganize model repo operating functions in transformers_utils (#29680)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-28 08:46:51 -08:00 |
|
Isotr0py
|
6f9d81d03b
|
[V0 deprecation] Clean up legacy paged attention helper functions (#28043)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-28 16:44:33 +00:00 |
|
Didier Durand
|
fae6943068
|
[Doc]: fixing typos in multiple files. (#29685)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-11-28 08:41:41 -08:00 |
|
果冻虾仁
|
3bcbb30cbf
|
add add_truncate_prompt_tokens in repr for PoolingParams (#29683)
|
2025-11-28 08:41:05 -08:00 |
|
Cyrus Leung
|
9e6bcda3ac
|
[mypy] Enable type checking for more directories (#29674)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-28 08:39:27 -08:00 |
|
Harry Mellor
|
9eec282cb5
|
Guard FlashInfer sampler using the same check as FlashInfer attention backend (#29415)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-28 08:34:48 -08:00 |
|
Cyrus Leung
|
0808eb813b
|
[Misc] Remove yapf directives (#29675)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-28 15:07:23 +00:00 |
|