2abd97592f
[KV Connector][Metrics] Do not count local prefix cache hits in connector queries (#30522)
Mark McLoughlin
2026-02-05 07:57:27 +00:00
6abb0454ad
[Perf] Optimize the performance of structured output + reasoning (#33557)
Chauncey
2026-02-05 15:45:29 +08:00
db6f71d4c9
[CI/Build] Fix CPU CI test case title (#33870)
Li, Jiang
2026-02-05 15:07:14 +08:00
fd03538bf9
[CPU][BugFix] Allow w8a8 oneDNN quantized matmul to support 3D inputs (#33727)
Fadi Arafeh
2026-02-05 06:26:09 +00:00
1f70313e59
[Bugfix] Fix ScoreMultiModalParam multi-document scoring returning single result (#33837)
Andreas Karatzas
2026-02-05 00:17:00 -06:00
07daee132b
[CI/Build] Parallelize CPU CI tests (#33778)
Li, Jiang
2026-02-05 13:53:48 +08:00
9595afda18
[2/N] move responses/serving _make_response_output_items logic to parser (#33281)
Andrew Xia
2026-02-05 00:46:15 -05:00
c1395f72cd
[CI][AMD][BugFix] Ensure VLLM_ROCM_USE_AITER is set so test_rocm_aiter_topk.py can run correctly (#33840)
rasmith
2026-02-04 23:05:48 -06:00
add9f1fbd9
[Minor] Include StreamingInput in inputs package (#33856)
Nick Hill
2026-02-04 20:38:20 -08:00
e3bf79ffa0
Revert "[Attention][FA3] Update FA3 to include new swizzle optimization" (#33841)
Luka Govedič
2026-02-04 22:54:27 -05:00
fb1270f1f8
[CI][Bugfix]: return McpCall for built-in MCP tools in non-streaming mode (#32762)
Andreas Karatzas
2026-02-04 21:14:06 -06:00
72bb24e2db
[release] Minor fixes to release annotation (#33849)
Kevin H. Luu
2026-02-04 18:07:35 -08:00
a7be77beef
[Bugfix] fix DeepSeek R1 with CUTLASS MLA Broken on B200 (#33637)
Chauncey
2026-02-05 09:28:36 +08:00
bbe0574d8e
[Bugfix] Disable TRTLLM attention when KV transfer is enabled (#33192)
v0.15.2rc0
zhanqiuhu
2026-02-04 19:49:18 -05:00
4d9513537d
[CI][torch.compile] Reduce e2e fusion test time (#33293)
Luka Govedič
2026-02-04 19:09:03 -05:00
439afa4eea
feat: Add ColBERT late interaction model support (#33686)
Ilya Boytsov
2026-02-05 01:05:13 +01:00
fa4e0fb028
[Core] Don't schedule spec tokens with prefill chunks (#33652)
Nick Hill
2026-02-04 15:40:22 -08:00
ce498a6d61
Change the type signature of MixtureOfExperts.expert_weights to MutableSequence[Sequence[Tensor]] (#33573)
Sage Moore
2026-02-04 14:02:46 -08:00
9f14c9224d
Revert "[torch.compile] Significantly speed up cold start times" (#33820)
Richard Zou
2026-02-04 13:59:59 -08:00
535de06cb1
[Model] Add transcription support for Qwen3-Omni (#29828)
Muhammad Hashmi
2026-02-04 13:17:47 -08:00
4292c90a2a
[Bugfix] Support RotaryEmbedding CustomOp for gpt-oss (#33800)
Simon Danielsson
2026-02-04 21:17:41 +01:00
6e98f6d8b6
Implement zero-copy GQA for multimodal and CPU (#33732)
Taeksang Kim
2026-02-05 05:11:39 +09:00
2f6d17cb2f
[rocm][ray] Fix: Unify Ray device visibility handling across CUDA and ROCm (#33308)
kourosh hakhamaneshi
2026-02-04 10:09:14 -08:00
83449a5ff0
[Refactor] Clean up pooling serial utils (#33665)
Cyrus Leung
2026-02-03 18:29:18 +08:00
e4bf6ed90d
[torch.compile] Don't do the fast moe cold start optimization if there is speculative decoding (#33624)
Richard Zou
2026-02-02 19:38:49 -08:00
dad2d6a590
[Bugfix][Model] Fix DeepSeek-OCR-2 chat template to include BOS token (#33642)
Lucas Hänke de Cansino
2026-02-03 09:35:58 +01:00
611b18757e
[torch.compile] Speed up MOE handling in forward_context (#33184)
Richard Zou
2026-01-27 18:17:54 -05:00
9cd2cce17d
[torch.compile] Don't do the fast moe cold start optimization if there is speculative decoding (#33624)
v0.15.1rc0
Richard Zou
2026-02-02 19:38:49 -08:00