Gregory Shtrasberg
176bbce1db
Revert "[AMD][CI/Build] Fix the AMD issue caused by inappropriate of symbol exposure (#21647)" (#21850)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-07-29 21:56:29 +00:00

Doug Smith
a1873db23d
docker: docker-aware precompiled wheel support (#21127)
Signed-off-by: dougbtv <dosmith@redhat.com>
2025-07-29 14:45:19 -07:00

Michael Goin
a33ea28b1b
Add flashinfer_python to CUDA wheel requirements (#21389)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-29 12:51:58 -07:00

David Xia
7b49cb1c6b
[Doc] update Contributing page's testing section (#18272)
Signed-off-by: David Xia <david@davidxia.com>
2025-07-29 10:32:46 -07:00

Varun Sundar Rabindranath
f03e9cf2bb
[Doc] Add FusedMoE Modular Kernel Documentation (#21623)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-07-29 10:32:30 -07:00

David Xia
37f86d9048
[Docs] use uv in GPU installation docs (#20277)
Signed-off-by: David Xia <david@davidxia.com>
2025-07-29 10:32:06 -07:00

elvischenv
58b11b24a6
[Bugfix] Fix workspace buffer None issue for Flashinfer TRTLLM Backend (#21525)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-07-29 10:34:00 -04:00

Wenhua Cheng
ad341c5194
[Bugfix]fix mixed bits and visual language model quantization in AutoRound (#21802)
Signed-off-by: Wenhua Cheng <wenhua.cheng@intel.com>
2025-07-29 07:26:31 -07:00

Brittany
759b87ef3e
[TPU] Add an optimization doc on TPU (#21155)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-29 07:23:19 -07:00

Harry Mellor
f693b067a2
[Docs] Merge design docs for a V1 only future (#21832)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-29 07:22:50 -07:00

Richard Zou
04e38500ee
[Bugfix] VLLM_V1 supports passing other compilation levels (#19340)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-07-29 09:35:58 -04:00

Cyrus Leung
ab714131e4
[Doc] Update compatibility matrix for pooling and multimodal models (#21831)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-29 06:29:51 -07:00

Chen Zhang
755fa8b657
[KVCache] Make KVCacheSpec hashable (#21791)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-07-29 19:58:29 +08:00

Kay Yan
2470419119
[Docs] Fix the outdated URL for installing from vLLM binaries (#21523)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-29 04:56:27 -07:00

Jee Jee Li
61a6905ab0
[Model] Refactor JambaForCausalLM (#21394)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-29 18:25:07 +08:00

Reza Barazesh
37efc63b64
[V0 deprecation] Guided decoding (#21347)
Signed-off-by: Reza Barazesh <rezabarazesh@meta.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-29 03:15:30 -07:00

Isotr0py
a4528f0cac
[Model]: Fused MoE for nomic-embed-text-v2-moe (#18321)
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-29 03:13:27 -07:00

Cyrus Leung
a2480251ec
[Doc] Link to RFC for pooling optimizations (#21806)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-28 23:53:18 -07:00

Nick Hill
7234fe2685
[Misc] Rework process titles (#21780)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-29 05:14:47 +00:00

Benji Beck
f1e2c095ec
Migrate InternVLImageInputs and InternVLVideoInputs to TensorSchema (#21684)
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-07-28 22:09:45 -07:00

Gregory Shtrasberg
12a223ef9b
[AMD][CI/Build][Bugfix] Guarding CUDA specific functions by ifndef ROCM (#21766)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-07-29 03:35:37 +00:00

Calvin Chen
e18f085103
skip fusedmoe layer for start_load_kv (#21378)
Signed-off-by: calvin chen <wen.chen@dynamia.ai>
2025-07-28 18:59:44 -07:00

Michael Goin
afa2607596
[CI] Parallelize Kernels MoE Test (#21764)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-28 18:56:24 -07:00

Wentao Ye
48b763d6b5
[Refactor] Merge Compressed Tensor FP8 CompressedTensorsW8A8Fp8MoEMethod and CompressedTensorsW8A8Fp8MoECutlassMethod (#21775)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-28 19:47:21 -06:00

Michael Goin
947e982ede
[Docs] Minimize spacing for supported_hardware.md table (#21779)
2025-07-28 18:46:39 -07:00

lyrisz
c6c9122d50
[Kernel] SM90 CUTLASS FP8 GEMM: add support for swap AB + kernel tuning (#20396)
Signed-off-by: Faqin Zhong <faqin.zhong@gmail.com>
Co-authored-by: Duncan Moss <djm.moss@gmail.com>
2025-07-28 23:13:58 +00:00

Lucas Wilkinson
8aa1485fcf
[Perf] Disable chunked local attention by default with llama4 (#21761)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-07-28 18:49:04 -04:00

Nikhil Gupta
89ac266b26
[Feat]: Add support for Dynamic Quant 4 bit CPU kleidiai kernels (#17112)
Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-28 20:55:15 +00:00

Clayton Coleman
c6f36cfa26
[Bugfix] DeepGEMM is not enabled on B200 due to _lazy_init() (#21472)
Signed-off-by: Clayton Coleman <smarterclayton@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-28 20:51:22 +00:00

Kuntai Du
b18b417fbf
Revert "[V1] Exception Handling when Loading KV Cache from Remote Store" (#21778)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
2025-07-28 20:15:18 +00:00

Lu Fang
9ba1c88a93
[AMD][CI/Build] Fix the AMD issue caused by inappropriate of symbol exposure (#21647)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-07-28 20:11:16 +00:00

Wentao Ye
e0e58f9729
[Bug] Enforce contiguous input for dynamic_scaled_fp8_quant and static_scaled_fp8_quant (#21773)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-28 19:55:48 +00:00

rasmith
b361f14e39
[AMD][BugFix] Fix omission of wvSplitK kernel for small batch sizes (1-4) due to torch.compile (#21350)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2025-07-28 15:38:20 -04:00

weiliang
01c753ed98
update flashinfer to v0.2.9rc2 (#21701)
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com>
2025-07-28 19:31:47 +00:00

Harry Mellor
94b71ae106
Use metavar to list the choices for a CLI arg when custom values are also accepted (#21760)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-28 19:31:10 +00:00

Nick Hill
7d44c691b0
[P/D] Log warnings related to prefill KV expiry (#21753)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-28 18:40:53 +00:00

Cyrus Leung
e17a4d3bf9
[Bugfix] Fix granite speech shape validation (#21762)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-28 14:19:21 -04:00

Chaojun Zhang
ec261b0291
[XPU] IPEX-optimized Punica Wrapper on XPU (#21703)
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-28 16:43:37 +00:00

Cyrus Leung
04fe61aa3d
[CI/Build] Fix plugin tests (#21758)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-28 15:08:05 +00:00

Michard Hugo
25708d317a
[Bugfix] Mistral crashes on tool with no description (#21167)
Signed-off-by: HugoMichard <hugo@harfanglab.fr>
2025-07-28 08:03:35 -07:00

Cyrus Leung
0e18a5d058
[Misc] Reduce logs for model resolution (#21765)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-28 07:59:56 -07:00

Michael Goin
34a20c49b3
[Logs] Change flashinfer sampler logs to once (#21759)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-28 06:59:51 -07:00

Isotr0py
31084b3b1f
[Bugfix][CI/Build] Update peft version in test requirement (#21729)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-28 06:17:43 -07:00

wuhang
bccc43c033
[Bugfix]check health for engine core process exiting unexpectedly (#21728)
Signed-off-by: wuhang <wuhang6@huawei.com>
2025-07-28 06:17:31 -07:00

Harry Mellor
1395dd9c28
[Docs] Add revision date to rendered docs (#21752)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-28 06:12:46 -07:00

Keyang Ru
9ace2eaf35
[Bugfix] Improve JSON extraction in LlamaToolParser (#19024)
Signed-off-by: keru <keyang.ru@oracle.com>
Co-authored-by: keru <keyang.ru@oracle.com>
2025-07-28 12:36:58 +00:00

Anton Vlasjuk
656c24f1b5
[Ernie 4.5] Name Change for Base 0.3B Model (#21735)
Signed-off-by: vasqu <antonprogamer@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-28 12:22:32 +00:00

Chauncey
63fe3a700f
[PD] let p2p nccl toy proxy handle /chat/completions (#21734)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-28 11:45:50 +00:00

Isotr0py
0ae970ed15
[Bugfix] Fix glm4.1v video_grid_thw tensor shape scheme (#21744)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-28 04:26:49 -07:00

Li, Jiang
65e8466c37
[Bugfix] Fix environment variable setting in CPU Dockerfile (#21730)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-28 11:02:39 +00:00