Fix routed experts capture for hybrid models (Mamba + Attention) (#35744)

Signed-off-by: arlenxu <arlenxu@tencent.com>
Signed-off-by: xhx1022 <1737006628@qq.com>
Co-authored-by: arlenxu <arlenxu@tencent.com>
Author:    Hongxin Xu
Date:      2026-03-11 23:53:10 +08:00
Committed: GitHub
Parent:    a3ea760ea5
Commit:    bea02cdf93

4 changed files with 442 additions and 16 deletions


@@ -552,6 +552,9 @@ class Worker(WorkerBase):
         else:
             self.model_runner.initialize_kv_cache(kv_cache_config)
+        if self.model_config.enable_return_routed_experts:
+            self.model_runner.init_routed_experts_capturer()
         # Build KV-zero metadata outside the CuMem pool so the bookkeeping
         # GPU tensors (seg_addrs, block-id buffers) use the standard PyTorch
         # allocator and are not discarded during sleep/wake cycles.
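
For context on what the new init call has to handle, here is a minimal sketch of a routed-experts capturer for a hybrid model. All names in it (RoutedExpertsCapturer, is_moe_layer, last_topk_ids, max_num_tokens) are hypothetical, not the actual vLLM API; the sketch only illustrates the two constraints the diff hints at: in a Mamba + Attention model only the MoE layers carry a router, so capture must be indexed by the MoE layers actually present rather than by decoder-layer position, and the bookkeeping tensors must come from the standard PyTorch allocator (outside any CuMem pool) so they survive sleep/wake cycles.

import torch


class RoutedExpertsCapturer:
    """Sketch: record per-token routed-expert IDs, hybrid-model aware."""

    def __init__(self, model: torch.nn.Module, max_num_tokens: int, top_k: int):
        # Collect only the layers that actually route tokens to experts;
        # Mamba/SSM layers have no router and are skipped entirely.
        self.moe_layers = [
            m for m in model.modules() if getattr(m, "is_moe_layer", False)
        ]
        # Allocated here with the default PyTorch CUDA allocator, i.e.
        # outside any custom CuMem pool, so the buffer is not discarded
        # when pooled allocations are released during sleep/wake.
        self.captured_ids = torch.empty(
            (len(self.moe_layers), max_num_tokens, top_k),
            dtype=torch.int32,
            device="cuda",
        )
        self._hooks = []

    def start(self) -> None:
        # One forward hook per MoE layer: copy the router's top-k expert
        # indices for this step into the preallocated slot for that layer.
        for idx, layer in enumerate(self.moe_layers):
            def hook(module, args, output, idx=idx):
                topk_ids = module.last_topk_ids  # assumed router attribute
                self.captured_ids[idx, : topk_ids.shape[0]].copy_(topk_ids)

            self._hooks.append(layer.register_forward_hook(hook))

    def stop(self) -> None:
        for h in self._hooks:
            h.remove()
        self._hooks.clear()

Indexing by the filtered moe_layers list is the crux of the hybrid-model fix: a capturer that assumes one router per decoder layer would either crash on Mamba layers or misalign its capture slots.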