Chen Zhang
|
97fa8f6590
|
[BugFix] Avoid prefix cache hit in the same schedule step for mamba layers (#29387)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2026-02-10 07:41:16 +00:00 |
|
wang.yuqi
|
dab1de9f38
|
[Frontend][CI] Consolidate instrumentator entrypoints (#34123)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-02-10 07:30:19 +00:00 |
|
Andrew Xia
|
9608844f96
|
[responsesAPI] fix simpleContext streaming output_messages (#34188)
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2026-02-09 22:53:07 -08:00 |
|
Cyrus Leung
|
ab97bcf662
|
[CI/Build] Relax test_mcp_tool_call (#34204)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-10 05:18:57 +00:00 |
|
Roger Wang
|
8a5e0e2b2b
|
[Bugfix][Core] Fix CPU memory leak from Request reference cycle in prefix caching (#34183)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2026-02-10 13:03:32 +08:00 |
|
Charlie Fu
|
bb9f97308d
|
[torch.compile][Fusion] Fix attention fusion pass removing kv_udpate op. (#33945)
Signed-off-by: charlifu <charlifu@amd.com>
|
2026-02-09 16:15:43 -05:00 |
|
Mohammad Miadh Angkad
|
d4f123cc48
|
[Kernel] FlashInfer: switch allreduce fusion to unified API (#33985)
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
|
2026-02-09 15:43:24 +00:00 |
|
JJJYmmm
|
9562912cea
|
[MODEL] Adding Support for Qwen3.5 Models (#34110)
Signed-off-by: JJJYmmm <1650675829@qq.com>
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: wulipc <wulipc@users.noreply.github.com>
Co-authored-by: ywang96 <ywang96@users.noreply.github.com>
Co-authored-by: Isotr0py <Isotr0py@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-02-09 21:12:58 +08:00 |
|
Andreas Karatzas
|
3025b3cebb
|
[CI] Remove empty image_size_factors for fuyu, glm4_1v, glm_ocr (#34107)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-09 17:37:04 +08:00 |
|
Jee Jee Li
|
978a37c823
|
[Model] GLM adaptation (#34124)
|
2026-02-09 17:32:52 +08:00 |
|
wang.yuqi
|
22b64948f6
|
[Frontend][last/5] Make pooling entrypoints request schema consensus. (#31127)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-02-09 06:42:38 +00:00 |
|
Andrey Talman
|
f97ca67176
|
[Release 2.10] Update to Torch 2.10 - final release (#30525)
|
2026-02-08 13:51:09 -08:00 |
|
Reagan Lee
|
c4df59ad43
|
Add embedding input functionality for disabled modalities [remake] (#32493)
Signed-off-by: Reagan Lee <“reaganjlee@gmail.com”>
Signed-off-by: Reagan Lee <reaganjlee@gmail.com>
Signed-off-by: Reagan Lee <96998476+reaganjlee@users.noreply.github.com>
Co-authored-by: Reagan Lee <“reaganjlee@gmail.com”>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-08 04:57:16 -08:00 |
|
Nick Hill
|
a96197f564
|
[Perf] Simplify DeepseekV32 tokenizer, ensure fast detokenization used (#33855)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-08 07:16:34 +00:00 |
|
Cyrus Leung
|
7fcb705b80
|
[CI/Build] Skip GCS test (#34057)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-07 08:52:38 -08:00 |
|
Hashem Hashemi
|
ed17f54c8b
|
Perf tuning and expansion of cases covered for wvSplitKrc (#33493)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
|
2026-02-07 05:33:11 -08:00 |
|
Jee Jee Li
|
db4ede9743
|
[Model] Enable Step3p5ForCausalLM testing (#33755)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-02-07 05:25:24 -08:00 |
|
Pooya Davoodi
|
2cb2340f7a
|
[Frontend]Add support for transcriptions and translations to run_batch (#33934)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-02-07 05:24:57 -08:00 |
|
Richard Zou
|
81fe69cae5
|
[torch.compile] Stop compiling identical artifacts (#34003)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-02-07 05:24:48 -08:00 |
|
Cyrus Leung
|
edb359cce4
|
[Renderer] Define render_cmpl and render_chat (#34039)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-07 05:24:40 -08:00 |
|
lukec
|
15a0b9e570
|
Fix spelling errors (#33978)
|
2026-02-06 23:58:50 -08:00 |
|
Cyrus Leung
|
48312e579a
|
[Misc] Make PlaceholderRange.get_num_embeds a method (#34035)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-07 05:30:17 +00:00 |
|
Ikenna
|
906077181b
|
[Bugfix] Fix QK Norm+RoPE fusion pattern matching on B200+FP8 (#33967)
Signed-off-by: Ikenna <ikennachifo@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-02-07 02:27:33 +00:00 |
|
Aaron Hao
|
89a385d79f
|
[Feat][RL] Pause and Resume with keep requests for single engine (#32351)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-02-07 00:08:58 +00:00 |
|
kourosh hakhamaneshi
|
4a2d00eafd
|
[bugfix] [ROCm] Fix premature CUDA initialization in platform detection (#33941)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
|
2026-02-06 16:17:55 -06:00 |
|
Sumanth R Hegde
|
ae2e93f89b
|
[Fix] Fix logprobs=0 handling for /inference/v1/generate endpoint (#34010)
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
|
2026-02-06 20:33:40 +00:00 |
|
Wentao Ye
|
77c09e1130
|
[Refactor] Remove align block size logic in moe_permute (#33449)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-06 10:57:06 -08:00 |
|
Seiji Eicher
|
aca5967416
|
[KV Connector] Add missing method overrides to MultiConnector (#33292)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2026-02-06 12:58:21 -05:00 |
|
Cyrus Leung
|
cd8b405bd0
|
[Refactor] Consolidate sequence normalization and enc-dec parsing (#33928)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-06 15:43:47 +00:00 |
|
Andreas Karatzas
|
350ca72c04
|
[ROCm][AITER] Fix AITER import regression for explicit backend selection (#33749)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-06 15:08:16 +00:00 |
|
Raushan Turganbay
|
85ee1d962b
|
[Bugfix] Fix models and tests for transformers v5 (#33977)
Signed-off-by: raushan <raushan@huggingface.co>
Signed-off-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-06 21:47:41 +08:00 |
|
Kurt Shuster
|
2991dd3d22
|
[Bugfix][Model] Support LoRA on Qwen3 Output Embedding (#29816)
Signed-off-by: kurt <kurt@thinkingmachines.ai>
|
2026-02-06 20:25:31 +08:00 |
|
Luka Govedič
|
ac32e66cf9
|
[torch.compile] Reorganize vllm/compilation and tests/compile (0/N for vLLM IR) (#33731)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: ProExpertProg <luka.govedic@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-02-06 04:19:49 -08:00 |
|
Xinyu Chen
|
e969a169ef
|
support view_from_cpu_tensor on XPU (#33868)
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
|
2026-02-06 08:34:20 +00:00 |
|
chengchengpei
|
965525667b
|
Onboard voyage-4-nano (#33720)
Signed-off-by: Chengcheng Pei <chengchengpei@outlook.com>
Signed-off-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com>
Co-authored-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-06 06:23:34 +00:00 |
|
Mingliang Li
|
a32cb49b60
|
feat(frontend): early-fail tokenization guard for user requests (#31366)
Signed-off-by: limingliang <limingliang@stepfun.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: limingliang <limingliang@stepfun.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-05 19:38:02 -08:00 |
|
Xin Yang
|
79028d4388
|
[Perf] Disable clean_logits in deepgemm fp8_mqa_logits kernel (#33568)
|
2026-02-05 20:34:00 -05:00 |
|
emricksini-h
|
325ab6b0a8
|
[Feature] OTEL tracing during loading (#31162)
|
2026-02-05 16:59:28 -08:00 |
|
Hashem Hashemi
|
d5c4800112
|
Adds padding and perf improvements to wvSplitK_fp8 (#33527)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
|
2026-02-05 22:16:02 +00:00 |
|
Cyrus Leung
|
116880a5a0
|
[Bugfix] Make MM batching more robust (#33817)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-05 20:40:58 +00:00 |
|
Harry Mellor
|
1887acca9e
|
Fix tokenizer test for renamed attr on Transformers v5 (#33902)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-05 19:16:20 +00:00 |
|
bnellnm
|
a57c8228ff
|
[Moe Refactor] Make Inplace Flag for FusedMoEModularKernel part of the constructor (#33375)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-02-05 18:07:18 +00:00 |
|
Benjamin Chislett
|
af3162d3aa
|
[Spec Decode] Unified Parallel Drafting (#32887)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-02-05 12:37:18 -05:00 |
|
Aaron Hao
|
c1858b7ec8
|
[Feat][RL][1/2] Native Weight Syncing API: NCCL (#31943)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Co-authored-by: SumanthRH <sumanthrh99@gmail.com>
|
2026-02-05 12:13:23 -05:00 |
|
Mario Hong
|
82914d2ae8
|
[Bugfix] Fix step3p5 parser when using mtp (#33690)
Signed-off-by: mariohong <mariohong128@gmail.com>
|
2026-02-05 16:04:04 +00:00 |
|
wang.yuqi
|
1c3a221d3b
|
[Bugfix] Fix corner case of sparse embedding (#33886)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-02-05 02:51:22 -08:00 |
|
liranschour
|
8322d4e47f
|
Enable Cross layers KV cache layout at NIXL Connector V2 (#33339)
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: liranschour <liranschour@users.noreply.github.com>
Co-authored-by: Or Ozeri <or@ozery.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-02-05 02:17:02 -08:00 |
|
Andreas Karatzas
|
3e472e81f9
|
[ROCm][Bugfix][CI] Fix hybrid models and their tests (Mamba/Jamba/Bamba) (#32710)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
|
2026-02-05 10:01:23 +00:00 |
|
Cyrus Leung
|
038914b7c8
|
[Refactor] Move task outside of PoolingParams.verify (#33796)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-02-05 09:33:11 +00:00 |
|
Mark McLoughlin
|
2abd97592f
|
[KV Connector][Metrics] Do not count local prefix cache hits in connector queries (#30522)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2026-02-05 09:57:27 +02:00 |
|