kourosh hakhamaneshi
|
a75a5b54c7
|
[bug-fix] supported_tasks is breaking backward compatibility at init_app_state (#34027)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: kourosh hakhamaneshi <31483498+kouroshHakha@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-02-09 09:46:46 +08:00 |
|
Andrey Talman
|
f97ca67176
|
[Release 2.10] Update to Torch 2.10 - final release (#30525)
|
2026-02-08 13:51:09 -08:00 |
|
danisereb
|
084aa19f02
|
Add support for ModelOpt MXFP8 dense models (#33786)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
|
2026-02-08 11:16:48 -08:00 |
|
navmarri14
|
1ecfabe525
|
glm 4.6 fused tuned inference config for B200 (#32958)
|
2026-02-08 18:55:47 +00:00 |
|
Richard Zou
|
4df841fe75
|
[torch.compile] Add an option to force-enable the MOE cold start optimization (#33735)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-02-08 18:42:56 +00:00 |
|
TomerBN-Nvidia
|
a263aa6140
|
[BugFix] Change support no act and mul for marlin (#34088)
Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com>
Co-authored-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com>
|
2026-02-08 17:18:22 +00:00 |
|
aabbccddwasd
|
179ae7da8f
|
[Revert] Fix performance regression for GLM-4.7-GPTQ decode and MTP acceptance rate (#33771)
Signed-off-by: aabbccddwasd <aabbccddwasd@qq.com>
|
2026-02-08 08:13:24 -08:00 |
|
Reagan Lee
|
c4df59ad43
|
Add embedding input functionality for disabled modalities [remake] (#32493)
Signed-off-by: Reagan Lee <“reaganjlee@gmail.com”>
Signed-off-by: Reagan Lee <reaganjlee@gmail.com>
Signed-off-by: Reagan Lee <96998476+reaganjlee@users.noreply.github.com>
Co-authored-by: Reagan Lee <“reaganjlee@gmail.com”>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-08 04:57:16 -08:00 |
|
TJian
|
785cf28fff
|
[ROCm] [CI] Reduce Resource of two test groups (#34059)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-02-08 15:17:26 +08:00 |
|
Nick Hill
|
a96197f564
|
[Perf] Simplify DeepseekV32 tokenizer, ensure fast detokenization used (#33855)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-08 07:16:34 +00:00 |
|
Andreas Karatzas
|
ab10d79855
|
[ROCm][Bugfix] fix act_quant_fusion module import error (#34069)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-07 19:21:12 -08:00 |
|
Cyrus Leung
|
7fcb705b80
|
[CI/Build] Skip GCS test (#34057)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-07 08:52:38 -08:00 |
|
Cyrus Leung
|
b956cdf818
|
[Doc] Fix run_batch docs (#34056)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-07 06:18:16 -08:00 |
|
Hashem Hashemi
|
ed17f54c8b
|
Perf tuning and expansion of cases covered for wvSplitKrc (#33493)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
|
2026-02-07 05:33:11 -08:00 |
|
Jiang Wu
|
860981d8d8
|
Make directory exist ok for ray spinning up multiple replicas on a single instance (#33604)
Signed-off-by: Jiang Wu <jwu@cclgroup.com>
|
2026-02-07 05:30:49 -08:00 |
|
zifeitong
|
52181baaea
|
Update DeepGEMM version pin in Dockerfile to match #32479 (#33935)
Signed-off-by: Zifei Tong <zifeitong@gmail.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-02-07 05:30:22 -08:00 |
|
Rohan Potdar
|
de3869bb4d
|
move checks out of unified_kv_cache_update custom op (#33943)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
2026-02-07 05:30:09 -08:00 |
|
whx
|
ce9b3cd3e9
|
[PluggableLayer][3/N] Apply PluggableLayer to mamba layers. (#33660)
Signed-off-by: whx-sjtu <2952154980@qq.com>
|
2026-02-07 05:26:05 -08:00 |
|
Jee Jee Li
|
db4ede9743
|
[Model] Enable Step3p5ForCausalLM testing (#33755)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-02-07 05:25:24 -08:00 |
|
Pooya Davoodi
|
2cb2340f7a
|
[Frontend] Add support for transcriptions and translations to run_batch (#33934)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-02-07 05:24:57 -08:00 |
|
TundeAtSN
|
4df44c16ba
|
Enable Eagle3 speculative decoding for Mistral3ForConditionalGeneration to support eagle3 (#33939)
Signed-off-by: Akintunde Oladipo <akintunde.oladipo@servicenow.com>
Signed-off-by: TundeAtSN <akintunde.oladipo@servicenow.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-07 05:24:52 -08:00 |
|
Richard Zou
|
81fe69cae5
|
[torch.compile] Stop compiling identical artifacts (#34003)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-02-07 05:24:48 -08:00 |
|
Mohammad Miadh Angkad
|
dd6a6e1190
|
[Kernel] Add KernelConfig flag to enable/disable FlashInfer autotune (#34006)
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-02-07 05:24:44 -08:00 |
|
Cyrus Leung
|
edb359cce4
|
[Renderer] Define render_cmpl and render_chat (#34039)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-07 05:24:40 -08:00 |
|
wang.yuqi
|
6ed5eda300
|
[CI][Build] Pin grpcio-tools==1.78.0 (#34048)
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-02-07 05:24:35 -08:00 |
|
Cyrus Leung
|
11a4c9d30d
|
[Misc] Simplify get_max_tokens (#34036)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-07 00:59:49 -08:00 |
|
lukec
|
15a0b9e570
|
Fix spelling errors (#33978)
|
2026-02-06 23:58:50 -08:00 |
|
Andreas Karatzas
|
c490d8cc73
|
[ROCm][CI] Pinning lm-eval version to resolve multi-modal small eval bug (#34038)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-06 22:21:08 -08:00 |
|
Cyrus Leung
|
48312e579a
|
[Misc] Make PlaceholderRange.get_num_embeds a method (#34035)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-07 05:30:17 +00:00 |
|
Vel
|
bc32444b23
|
[Kernel] Add enable_sm120_or_later for SM121 (DGX Spark) CUTLASS support (#33517)
Signed-off-by: code4me2 <velvetmoon222999@gmail.com>
|
2026-02-06 20:28:01 -08:00 |
|
Wentao Ye
|
18e8545297
|
[Revert] Add util handle_deprecated back (#33998)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-07 04:14:45 +00:00 |
|
果冻虾仁
|
6f7adc533a
|
fix description in plugin_system.md (#33999)
|
2026-02-06 19:37:02 -08:00 |
|
Nick Hill
|
40218a82ba
|
[ModelRunner V2] Revert token rank comparison difference for now (#34017)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-07 11:11:05 +08:00 |
|
kourosh hakhamaneshi
|
1c3b22058f
|
[Misc] Add backward-compatible import aliases for renamed translations module (#34015)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-02-07 11:01:41 +08:00 |
|
Xin Yang
|
3920cafdd6
|
[Bugfix] Fix _fused_moe_lora_expand signature mismatch (#33821)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-02-07 10:45:59 +08:00 |
|
rasmith
|
ec28784fdc
|
[CI][AMD][Bugfix] Check that model_config is not None in enable_norm_pad_fusion (#34007)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2026-02-07 02:43:25 +00:00 |
|
Nicolò Lucchesi
|
55aeec04f5
|
[Bugfix] Fix Whisper tokenization (#34011)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-02-07 10:42:52 +08:00 |
|
Ikenna
|
906077181b
|
[Bugfix] Fix QK Norm+RoPE fusion pattern matching on B200+FP8 (#33967)
Signed-off-by: Ikenna <ikennachifo@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-02-07 02:27:33 +00:00 |
|
Aaron Hao
|
89a385d79f
|
[Feat][RL] Pause and Resume with keep requests for single engine (#32351)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-02-07 00:08:58 +00:00 |
|
kourosh hakhamaneshi
|
4a2d00eafd
|
[bugfix] [ROCm] Fix premature CUDA initialization in platform detection (#33941)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
|
2026-02-06 16:17:55 -06:00 |
|
Dimitrios Bariamis
|
207c3a0c20
|
Fix RoutingMethodType logic (#33919)
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2026-02-06 14:03:34 -08:00 |
|
Sumanth R Hegde
|
ae2e93f89b
|
[Fix] Fix logprobs=0 handling for /inference/v1/generate endpoint (#34010)
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
|
2026-02-06 20:33:40 +00:00 |
|
xuebwang-amd
|
9e9acce577
|
[Bugfix] Fix no attribute error of SharedFusedMoE (DeepSeek-V3.1 as test model) (#33993)
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
|
2026-02-06 19:11:32 +00:00 |
|
Charlie Fu
|
fe5438200b
|
[Rocm][Bugfix] Fix dtype not same for gemm_a4w4 op (#33734)
Signed-off-by: charlifu <charlifu@amd.com>
|
2026-02-06 19:09:59 +00:00 |
|
Wentao Ye
|
77c09e1130
|
[Refactor] Remove align block size logic in moe_permute (#33449)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-06 10:57:06 -08:00 |
|
zhrrr
|
16786da735
|
[Model Runner V2] support apply penalty for spec decode (#33251)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
|
2026-02-06 10:56:48 -08:00 |
|
vllmellm
|
aaa2efbe98
|
[DOC] [ROCm] Update docker deployment doc (#33971)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-06 10:05:35 -08:00 |
|
Seiji Eicher
|
aca5967416
|
[KV Connector] Add missing method overrides to MultiConnector (#33292)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2026-02-06 12:58:21 -05:00 |
|
Wentao Ye
|
67a746e87f
|
[Log] Optimize duplicate startup log (#33944)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-06 17:49:56 +00:00 |
|
Chauncey
|
7bec435130
|
[Bugfix] Fix the issue where tool calling does not work when using fast detokenization with dsv32 (#33964)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-02-06 09:23:44 -08:00 |
|