Patrick von Platen
|
b038d9c40c
|
[Data-parallel] Allow DP>1 for world_size > num_gpus on node (8) (#26367)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-10-17 08:24:42 -07:00 |
|
Nicolò Lucchesi
|
2ba60ec7fe
|
[CI] Nixl integration tests (#27010)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-10-17 07:13:31 -07:00 |
|
Luka Govedič
|
bd7157a071
|
[torch.compile] Enable attention and allreduce fusion without custom ops enabled (#24604)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-17 08:10:23 -06:00 |
|
Yongtao Huang
|
be429d0cfd
|
Fix incorrect docstring for stop_profile() method (#27101)
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com>
|
2025-10-17 06:30:23 -07:00 |
|
Reima Karhila (AMD)
|
c253745eb8
|
[Harware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI350 and MI355 (#25586)
Signed-off-by: Reima Karhila <reima.karhila@amd.com>
Signed-off-by: xaguilar <Xavier.AguilarFruto@amd.com>
Co-authored-by: xaguilar <Xavier.AguilarFruto@amd.com>
|
2025-10-17 04:56:12 -07:00 |
|
Jee Jee Li
|
daec4d2624
|
[Model]Improve Qwen3VLMoeForConditionalGeneration packed_modules_mapping (#27096)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-10-17 04:47:00 -07:00 |
|
Harry Mellor
|
6c9fdbf725
|
[Docs] Replace rst style double-backtick with md single-backtick (#27091)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-17 02:47:34 -07:00 |
|
Harry Mellor
|
483ea64611
|
[Docs] Replace all explicit anchors with real links (#27087)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-17 02:22:06 -07:00 |
|
Mengqing Cao
|
e20eba753b
|
[VLM][Refactor] Remove useless func get_input_positions in MRotaryEmbedding (#27088)
Signed-off-by: MengqingCao <cmq0113@163.com>
|
2025-10-17 02:00:30 -07:00 |
|
cong-meta
|
bbc1b29665
|
Update troubleshooting.md and remind VLLM_TRACE_FUNCTION usage (#27069)
Signed-off-by: cong-meta <prowindy@hotmail.com>
|
2025-10-17 01:53:06 -07:00 |
|
Chauncey
|
acb1bfa601
|
[CI] fix docs build failed (#27082)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-10-17 07:53:40 +00:00 |
|
zhrrr
|
75c7ad9918
|
[Kernel][Performance] Fuse float cast and renormalize to topk softmax kernel (#26717)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
|
2025-10-17 07:30:35 +00:00 |
|
Li, Jiang
|
5550ff9c25
|
[CI/Build] Update compressed tensor test path to fix CPU CI (#27068)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-10-16 22:34:56 -07:00 |
|
Said Taghadouini
|
3aeb19a39e
|
[Model] Add support for LightOnOCR (#26916)
Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com>
Signed-off-by: Said Taghadouini <84044788+staghado@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-10-17 05:05:24 +00:00 |
|
Cyrus Leung
|
8c017b3490
|
[Model] Always use Transformers backend for PaliGemma and Gemma3-MM (#26715)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-17 05:03:35 +00:00 |
|
Zhewen Li
|
9c2c2287a0
|
[CI/Build] Update Llama4 eval yaml (#27070)
Signed-off-by: zhewenli <zhewenli@meta.com>
|
2025-10-17 04:59:47 +00:00 |
|
Jee Jee Li
|
fec2b341ad
|
[Kernel] Lazy import FlashInfer (#26977)
|
2025-10-17 04:48:18 +00:00 |
|
Jee Jee Li
|
87bc0c492f
|
[Bugfix] Fix ReplicatedLinearWithLoRA (#27065)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-10-17 04:43:16 +00:00 |
|
Nick Hill
|
fe3b9372ad
|
[Core] Change execute_model_with_error_logging() to be a ctx manager (#27060)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-17 11:45:32 +08:00 |
|
Tao He
|
bde9e2272a
|
[Bugfix][Qwen] fixes the weights dtype in qwen3_next: it is actually a bfloat16 (#27030)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
|
2025-10-17 03:37:52 +00:00 |
|
Boyuan Feng
|
08405609cc
|
disable graph partition in custom op (#26952)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
Signed-off-by: Boyuan Feng <fby.1994@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-17 11:08:47 +08:00 |
|
Nick Hill
|
ab81379ea6
|
[Perf] Exploit out-of-band buffers in shm_broadcast (#26961)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-16 20:08:03 -07:00 |
|
Harry Mellor
|
4ffd6e8942
|
[Docs] Reduce custom syntax used in docs (#27009)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-16 20:05:34 -07:00 |
|
Tomas Ruiz
|
965c5f4914
|
vllm bench serve shows num of failed requests (#26478)
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>
|
2025-10-16 19:55:09 -07:00 |
|
Lukas Geiger
|
4d055ef465
|
Remove unused imports (#26972)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-10-16 19:51:17 -07:00 |
|
Boyuan Feng
|
17c540a993
|
[torch.compile] fix simple inductor graph partition test (#27050)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-10-16 21:09:36 -04:00 |
|
Cyrus Leung
|
4d4d6bad19
|
[Chore] Separate out vllm.utils.importlib (#27022)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-17 00:48:59 +00:00 |
|
Lucia Fang
|
11ae016bd7
|
[torch.compile] Passing only necessary compilation config to inductor pass config (#27041)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
|
2025-10-17 00:01:52 +00:00 |
|
jiahanc
|
41d3071918
|
[NVIDIA] [Perf] Update to leverage flashinfer trtllm FP4 MOE throughput kernel (#26714)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-10-16 16:20:25 -07:00 |
|
Harry Mellor
|
fb5e10d3fb
|
Refactor Transformers backend to use mixins (#26906)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-16 21:50:39 +00:00 |
|
Bram Wasti
|
b2f78cbad4
|
[small][batch invariance] Rename the env and internal flags to simplify usage (#26855)
Signed-off-by: Bram Wasti <bwasti@meta.com>
|
2025-10-16 21:40:25 +00:00 |
|
Wentao Ye
|
23583ee28c
|
[Bug] Add Assertion for random-input-len / random-output-len (#26834)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-16 21:36:39 +00:00 |
|
Michael Goin
|
01c977e96d
|
[CI] Prune Quantization Tests and skip compilation (#27038)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-16 17:26:35 -04:00 |
|
Wentao Ye
|
b3dda72c23
|
[Feature] Migrate DeepGEMM API from get_m_alignment_for_contiguous_layout to get_mk_alignment_for_contiguous_layout (#26935)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-16 16:46:48 -04:00 |
|
Varun Sundar Rabindranath
|
fb0571b077
|
[GPTOSS][DP/EP][Marlin] Enable GPTOSS Batched DP/EP using Marlin kernels (#25997)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-10-16 12:53:11 -07:00 |
|
Wentao Ye
|
2ed8b6b3d0
|
[Bug] Fix batch invariant test has to is (#27032)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-16 19:45:14 +00:00 |
|
kimbochen
|
013abde6ef
|
Adding Warmup to Benchmark Serving (#26943)
Signed-off-by: Kimbo Chen <chentenghung@gmail.com>
|
2025-10-16 12:44:32 -07:00 |
|
Kyle Sayers
|
a5464dcf92
|
[Compressed Tensors] Always clone output for compile robustness (#26849)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-10-16 19:29:59 +00:00 |
|
Mandy Li
|
ac3ed5a815
|
Support block size of 256 used by Intel HPU (#26883)
Signed-off-by: mandy-li <mandy.j.li@intel.com>
|
2025-10-16 15:10:57 -04:00 |
|
Andrew Xia
|
e6ba2000ae
|
[gpt-oss][1/N] EZ: refactor serving_responses for modularity (#26948)
Signed-off-by: Andrew Xia <axia@meta.com>
|
2025-10-16 18:44:06 +00:00 |
|
Harry Mellor
|
aa255ff55a
|
Support set in the CLI generation (#27031)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-16 18:07:18 +00:00 |
|
ZiTian Zhao
|
7bb736d00e
|
Fix Qwen2.5 VL image grid docstring (#27033)
Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
|
2025-10-16 09:57:36 -07:00 |
|
Jee Jee Li
|
9f4e30904b
|
[Model] Fix Qwen3VL mm mapping (#27027)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-10-16 09:45:59 -07:00 |
|
rongfu.leng
|
5afd3276df
|
[Feature] Add process_weights_after_loading to AttentionImpl (#26870)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-10-16 08:02:30 -07:00 |
|
Tahsin Tunan
|
43721bc67f
|
[CI] Replace large models with tiny alternatives in tests (#24057)
Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-16 15:51:27 +01:00 |
|
Kay Yan
|
02d709a6f1
|
[docs] standardize Hugging Face env var to HF_TOKEN (deprecates HUGGING_FACE_HUB_TOKEN) (#27020)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
|
2025-10-16 15:31:02 +01:00 |
|
Mark McLoughlin
|
4a510ab487
|
[NIXL] Improve request_finished() debug logs (#25665)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-10-16 15:55:17 +02:00 |
|
Matthew Bonanni
|
314fa8abbf
|
[Attention] Tune CUTLASS MLA num_splits (#26846)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-10-16 06:36:09 -07:00 |
|
Cyrus Leung
|
334535b6fb
|
[Benchmark] Show E2EL by default for pooling models (#27014)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-16 12:47:09 +00:00 |
|
bogdanm
|
dcbb3f1871
|
[Bugfix] Correct LayerNorm epsilon parameter in modernbert.py (#27008)
Signed-off-by: bogdanm <152898065+bogdan01m@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-16 12:27:44 +00:00 |
|