Isotr0py
|
f71b00a19e
|
[Bugfix] Fix broken vision language example (#14292)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-05 15:57:10 +00:00 |
|
DaividFrank
|
8f808cf86e
|
prefix_caching.md: Fixed typo (#14293)
Signed-off-by: Daivid Savernin-Frenk <daivid.frank@TurboNext.ai>
|
2025-03-05 15:43:13 +00:00 |
|
Jee Jee Li
|
7bab4bb048
|
[Misc] Add Qwen2MoeForCausalLM moe tuning support (#14276)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-05 23:11:29 +08:00 |
|
Isotr0py
|
e17e4488bd
|
[LoRA] Remove linear hack outside transformers backend (#14177)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-05 15:06:28 +00:00 |
|
Robert Shaw
|
257e200a25
|
[V1][Frontend] Add Testing For V1 Runtime Parameters (#14159)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
|
2025-03-05 14:18:55 +00:00 |
|
Zhe Zhang
|
47d4a7e004
|
Small update for external_launcher backend docs (#14288)
|
2025-03-05 21:30:00 +08:00 |
|
Cyrus Leung
|
7f89a594dd
|
[Doc] [3/N] Refer code examples for common cases in dev multimodal processor (#14278)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-05 12:29:50 +00:00 |
|
Iacopo Poli
|
961644e6a8
|
[Doc] Update nginx guide: remove privileged from vllm container run and add target GPU ID (#14217)
Signed-off-by: Iacopo Poli <iacopo@lighton.ai>
|
2025-03-05 11:44:10 +00:00 |
|
Lu Fang
|
8d6cd32b7b
|
[Bugfix][V1] Fix allowed_token_ids for v1 Sampler (#14169)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-03-05 08:49:44 +00:00 |
|
Roger Wang
|
ec79b67c77
|
[Misc][V1] Avoid using envs.VLLM_USE_V1 in mm processing (#14256)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-03-05 07:37:16 +00:00 |
|
Benjamin Chislett
|
32985bed7c
|
[Frontend] Allow return_tokens_as_token_ids to be passed as a request param (#14066)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
|
2025-03-05 06:30:40 +00:00 |
|
Michael Goin
|
dae9ec464c
|
Temporarily disable test_awq_gemm_opcheck (#14251)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-03-05 06:10:35 +00:00 |
|
youkaichao
|
6eaf93020d
|
[platforms] improve rocm debugging info (#14257)
|
2025-03-04 21:32:18 -08:00 |
|
Tyler Michael Smith
|
72c62eae5f
|
[V1] EP/TP MoE + DP Attention (#13931)
|
2025-03-04 21:27:26 -08:00 |
|
Congcong Chen
|
0a995d5434
|
[Model] New model support for Phi-4-multimodal-instruct (#14119)
|
2025-03-04 20:57:01 -08:00 |
|
Cody Yu
|
ade3f7d988
|
[V1][Bugfix] Do not reset prefix caching metrics (#14235)
|
2025-03-05 04:39:13 +00:00 |
|
rainkert
|
0df25101d6
|
[Bugfix] Fix gptq_marlin for deepseek-v3 (#13750)
Signed-off-by: dangshunya <dangshunya@baichuan-inc.com>
Co-authored-by: dangshunya <dangshunya@baichuan-inc.com>
|
2025-03-05 12:25:53 +08:00 |
|
Michael Goin
|
e123aafdf0
|
Disable GPTQ AllSpark kernels for CUDA Compiler < 12.0 (#14157)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-03-05 12:25:24 +08:00 |
|
Nishidha
|
5b143d33be
|
Moved numba from common requirements to cuda/rocm specific requirements (#14199)
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
|
2025-03-05 12:25:00 +08:00 |
|
youkaichao
|
eb59b5a6cb
|
[misc] announce china meetup (#14248)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-03-05 10:33:50 +08:00 |
|
Michael Goin
|
fbfc3ee37e
|
[V1][TPU] TPU multimodal model support for ragged attention (#14158)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2025-03-04 19:58:48 -05:00 |
|
Sage Moore
|
3e1d223626
|
[ROCm] Disable a few more kernel tests that are broken on ROCm (#14145)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-03-04 23:37:55 +00:00 |
|
Tyler Michael Smith
|
4f5b059f14
|
Clean up unused padding_idx variables across many model definitions (#13240)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-03-04 21:27:00 +00:00 |
|
Kuntai Du
|
288ca110f6
|
[Security] Serialize using safetensors instead of pickle in Mooncake Pipe (#14228)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
|
2025-03-04 21:10:32 +00:00 |
|
Mark McLoughlin
|
c2bd2196fc
|
[v1][Metrics] Add design doc (#12745)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-03-04 20:36:55 +00:00 |
|
Michael Goin
|
550c7ba3dc
|
[Docs] Update Dockerfile dependency image (#14215)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-03-04 20:22:11 +00:00 |
|
Harry Mellor
|
e5b2f1601a
|
[Frontend] Do prompt_logprobs clamping for chat as well as completions (#14225)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-04 20:13:06 +00:00 |
|
Harry Mellor
|
9badee53de
|
Fix performance when --generation-config is not None (#14223)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-04 20:59:22 +01:00 |
|
Siyuan Liu
|
beebf4742a
|
[TPU][Profiler] Support start_profile/stop_profile in TPU worker (#13988)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-03-04 14:40:06 -05:00 |
|
kushanam
|
f89978ad7c
|
add cutlass support for blackwell fp8 gemm (#13798)
|
2025-03-04 07:55:07 -08:00 |
|
lkchen
|
b3cf368d79
|
[V1][Molmo] Fix get_multimodal_embeddings() in molmo.py (#14161)
|
2025-03-04 15:43:59 +00:00 |
|
Mark McLoughlin
|
c8525f06fc
|
[V0][Metrics] Deprecate some questionable request time metrics (#14135)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-03-04 15:11:33 +00:00 |
|
Nick Hill
|
5db6b2c961
|
[V1][BugFix] Fix remaining sync engine client shutdown errors/hangs (#13869)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-04 15:06:47 +00:00 |
|
Michael Goin
|
6247bae6c6
|
[Bugfix] Restrict MacOS CPU detection (#14210)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-03-04 22:25:27 +08:00 |
|
youkaichao
|
3610fb4930
|
[doc] add "Failed to infer device type" to faq (#14200)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-03-04 20:47:06 +08:00 |
|
youkaichao
|
71c4b40562
|
[sleep mode] error out with expandable_segments (#14189)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-03-04 18:54:19 +08:00 |
|
youkaichao
|
ac65bc92df
|
[platform] add debug logging during inferring the device type (#14195)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-03-04 18:39:16 +08:00 |
|
Michael Goin
|
f78c0be80a
|
Fix benchmark_moe.py tuning for CUDA devices (#14164)
|
2025-03-03 21:11:03 -08:00 |
|
Zhanwen Chen
|
66233af7b6
|
Use math.prod instead of np.prod for trivial ops (#14142)
|
2025-03-03 21:09:22 -08:00 |
|
Rui Qiao
|
bf13d40972
|
[core] Pass all driver env vars to ray workers unless excluded (#14099)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-03-04 11:44:17 +08:00 |
|
Cody Yu
|
989f4f430c
|
[Misc] Remove lru_cache in NvmlCudaPlatform (#14156)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-03-04 11:09:34 +08:00 |
|
Divakar Verma
|
bb5b640359
|
[core] moe fp8 block quant tuning support (#14068)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2025-03-04 01:30:23 +00:00 |
|
Travis Johnson
|
c060b71408
|
[Model] Add support for GraniteMoeShared models (#13313)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-03-04 08:04:52 +08:00 |
|
iefgnoix
|
79e4937c65
|
[v1] Add comments to the new ragged paged attention Pallas kernel (#14155)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-03-03 23:00:55 +00:00 |
|
Qubitium-ModelCloud
|
cd1d3c3df8
|
[Docs] Add GPTQModel (#14056)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-03-03 21:59:09 +00:00 |
|
Michael Goin
|
19d98e0c7d
|
[Kernel] Optimize moe intermediate_cache usage (#13625)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-03-03 16:29:53 -05:00 |
|
Michael Goin
|
2b04c209ee
|
[Bugfix] Allow shared_experts skip quantization for DeepSeekV2/V3 (#14100)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-03-03 14:20:24 -07:00 |
|
Mark McLoughlin
|
ae122b1cbd
|
[WIP][V1][Metrics] Implement max_num_generation_tokens, request_params_n, and request_params_max_tokens metrics (#14055)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-03-03 19:04:45 +00:00 |
|
Nick Hill
|
872db2be0e
|
[V1] Simplify stats logging (#14082)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-03 10:34:14 -08:00 |
|
Mark McLoughlin
|
2dfdfed8a0
|
[V0][Metrics] Deprecate some KV/prefix cache metrics (#14136)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-03-03 18:25:46 +00:00 |
|