Commit Graph

  • 98b4d389ed [Redo] #26368 (#28771) Cyrus Leung 2025-11-15 14:47:41 +08:00
  • 6965ef436f [Performance][DeepGEMM] Estimate expected_m (#28694) Varun Sundar Rabindranath 2025-11-15 00:52:14 -05:00
  • c9e665852a [NIXL] heterogeneous block_size support (#26759) Chendi.Xue 2025-11-14 23:51:32 -06:00
  • 363aaeef0f Fix IntermediateTensors initialization and add type hints (#28743) Mohammad Othman 2025-11-15 06:31:36 +02:00
  • ac86bff8cb Revert "[Core] Performance: Use list[np.ndarray] instead of list[list… (#28773) Nick Hill 2025-11-14 20:24:00 -08:00
  • edfe498189 [Bugfix] Build hadacore kernels on >SM90 (#28748) Michael Goin 2025-11-14 22:51:05 -05:00
  • f05d474c8a [Model][Qwen3VL] Use mm_position to compute mrope positions (#28730) Lukas Geiger 2025-11-15 03:45:11 +00:00
  • 9fc81ec765 [TPU] Fix import error in tpu launch (#28758) QiliangCui 2025-11-14 16:58:32 -08:00
  • 186352b270 [Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization (#26368) Jialin Ouyang 2025-11-14 16:04:04 -08:00
  • 58e61e56b7 [Test] Rework e2e async scheduling tests (#28744) Nick Hill 2025-11-14 16:01:09 -08:00
  • 75f01b9d3c [ROCm][CI/Build] Upgrade to ROCm 7.1 and AITER main (#28753) Gregory Shtrasberg 2025-11-14 18:53:21 -05:00
  • ba041d980b [Log] Save profiler results to file instead of stdout (#28144) rasmith 2025-11-14 17:26:39 -06:00
  • e0c910bb89 [Hybrid] [Kernel] Fix chunk scan kernel when BLOCK_SIZE_DSTATE > 128 (#28295) Thomas Parnell 2025-11-14 23:55:42 +01:00
  • bf3ffb61e6 [Bugfix] Fix ChunkedLocalAttention CUDA Graph setting (#28739) Benjamin Chislett 2025-11-14 17:14:46 -05:00
  • e5c78956c0 [Bugfix] Fix incorrect use of hidden_states for shared_experts due to do_naive_dispatch_combine (#28740) Alexander Matveev 2025-11-14 17:13:46 -05:00
  • 2e0ad629b0 Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch (#25110) Laith Sakka 2025-11-14 14:11:10 -08:00
  • 5a84b76b86 [ROCm][CI/Build] Change install location of uv (#28741) Gregory Shtrasberg 2025-11-14 16:34:18 -05:00
  • 0de4f217ab [Bugfix] TypeError: 'NoneType' object is not callable (#27410) Marcin Ostrowski 2025-11-14 22:13:53 +01:00
  • f08eab2acc [CI] Fix macos smoke test uv cache issue (#28736) Michael Goin 2025-11-14 15:29:55 -05:00
  • 8977ffb5e6 [ROCm][Bugfix] Fix compilation errors with fused_qknorm_rope_kernel.cu (#28682) Sage Moore 2025-11-14 11:06:01 -08:00
  • fd4555089a [BugFix] Fix misprint introduced by modular_kernel refactoring. (#28728) Andrey Khalyavin 2025-11-14 21:58:18 +03:00
  • cec275efce [Bugfix] resolve Qwen3-VL GPTQModel quantized model loading failure (#28663) GuanH 2025-11-15 02:44:27 +08:00
  • e2741f6cbc [Chore] Rename SchedulerConfig.chunked_prefill_enabled (#28735) Cyrus Leung 2025-11-15 02:39:57 +08:00
  • 67187554dd [Docs] Enable some more markdown lint rules for the docs (#28731) Harry Mellor 2025-11-14 18:39:19 +00:00
  • a425dc256e [Bugfix] [ROCm] [AITER]: Fix aiter block quant not compatible with torch compile dynamo (#28716) TJian 2025-11-14 10:30:50 -08:00
  • 964d65deed LLaMA4 LoRA Adapter Enablement (#28602) Fardin Hoque 2025-11-14 10:27:56 -08:00
  • 9261eb3dc1 docs(lora_resolvers): clarify multi-resolver order and storage path requirement (#28153) Chen Wang 2025-11-14 13:08:30 -05:00
  • cdd7025961 [kernel] Improve FP8 PTPC on Hopper for larger shapes (#28692) czhu-cohere 2025-11-14 12:59:11 -05:00
  • 085424808e Remove audio optional dependency for mistral-common (#28722) Julien Denize 2025-11-14 18:54:38 +01:00
  • a17e36f223 Fix typo in comment: existance -> existence (#28737) Mohammad Othman 2025-11-14 19:35:45 +02:00
  • 8cc40f8992 [Attention] Bump FA for removed method (#28429) Matthew Bonanni 2025-11-14 12:13:37 -05:00
  • 6f1e7f7226 [DisaggEverything] Tokens in<>out /generate endpoint (#24261) Nicolò Lucchesi 2025-11-14 17:58:01 +01:00
  • d54a18a47e [CI][CPU] Smoke test for Apple Silicon using GHA MacOS runner (#28688) Michael Goin 2025-11-14 11:37:18 -05:00
  • 5f3cd7f7f2 [Docs] Update the name of Transformers backend -> Transformers modeling backend (#28725) Harry Mellor 2025-11-14 16:34:14 +00:00
  • c934caee88 [Fix] improve aspect ratio in dummy image generation and add common VLM tests for PaddleOCR-VL (#28711) dongbo910220 2025-11-15 00:07:20 +08:00
  • 3f8a874065 [Kernels] Enable FlashInfer FP8 Blockscale on SM90 (for TEP DSR1) (#27134) Duncan Moss 2025-11-14 08:02:44 -08:00
  • 511a6b611d [Config] Clean up SchedulerConfig initialization (#28665) Cyrus Leung 2025-11-14 22:41:02 +08:00
  • 96b23b8e3b [Bugfix][Nixl] Fix kernel physical<>logical block_size issue (#28677) Nicolò Lucchesi 2025-11-14 15:40:05 +01:00
  • 433c0f8675 [Model] Fix bailing_moe accuracy problem (#28277) zhaozx-cn 2025-11-14 21:33:02 +08:00
  • 8d3748d3c7 [Doc] Fix macOS installation dependency resolution issue (#26721) Fasal Shah 2025-11-14 18:13:56 +05:30
  • db56a59970 [BugFix] Fix FA3 IMA with FULL_AND_PIECEWISE and cascade attention (default) (#28702) Lucas Wilkinson 2025-11-14 07:19:22 -05:00
  • 9324e10275 Fix KV sharing fast prefill with cudagraph enabled (#28537) Yong Hoon Shin 2025-11-14 01:53:42 -10:00
  • 4516d44b7f [DCP] Support Decode Context Parallel (DCP) for GQA with Flashinfer (#25438) Jingchun Gao 2025-11-14 19:24:10 +08:00
  • 41b92f7d38 [Model][MM] Extract conv layer as CustomOp (#28455) Shanshan Shen 2025-11-14 19:16:13 +08:00
  • 360bd8762f [Frontend] Added chat-style multimodal support to /classify. (#27516) Srreyansh Sethi 2025-11-14 03:03:55 -08:00
  • ecf8230d4d [Metrics] Log number of preempted requests (#28522) lyn610 2025-11-14 17:47:45 +08:00
  • 8cfbe89b93 [Misc] fix comment in test_envs (#28529) Xing Liu 2025-11-14 01:32:46 -08:00
  • fd75d3e8c0 [Minor] avoid register new custom and just import silly_attn (#28578) Boyuan Feng 2025-11-14 01:32:31 -08:00
  • c9a3a02149 Add output token counting to gsm8k eval (#28594) Michael Goin 2025-11-14 04:32:03 -05:00
  • bc3e43069a [BugFix] Fix multi-modal async scheduling race condition (#28706) Nick Hill 2025-11-14 01:11:13 -08:00
  • c36bcfe6b3 [Bugfix] fix dots.ocr pp support (#28705) Jiangyun Zhu 2025-11-14 17:01:26 +08:00
  • 529cea343d use default CCL_ZE_IPC_EXCHANGE (#28700) Yan Ma 2025-11-14 16:55:29 +08:00
  • 93103575ce [BugFix][CI/Build][ROCM] Fix import error and apply assert in appropriate case in test_struct_output_generate (#28311) rasmith 2025-11-14 00:41:29 -06:00
  • 15ae8e0784 [Bugfix][CI/Test][Spec Decode] Fix illegal memory access in offline_inference/spec_decode.py (Issue 27619) (#28432) rasmith 2025-11-14 00:34:01 -06:00
  • 0b25498990 [Misc] add ignore mapper for quark quantization (#28275) haoyangli-amd 2025-11-14 13:56:35 +08:00
  • 0aecd9138f [Misc] Update xformers to 0.33.0.post1 (#28678) Roger Wang 2025-11-13 21:52:53 -08:00
  • da14ae0fad [XPU][CI]disable lm cache uts (#28696) Kunshang Ji 2025-11-14 11:15:50 +08:00
  • 01bea115c4 [Misc] Remove warn_for_unimplemented_methods (#28613) Cyrus Leung 2025-11-14 11:10:10 +08:00
  • b39a5026eb [ci][amd] fix basic models extra init test (#28676) Bradley D 2025-11-13 18:44:36 -08:00
  • 622e6106a9 [CPU][Bugfix] Fix Apple Silicon M1 compilation failure (#28681) Michael Goin 2025-11-13 20:49:55 -05:00
  • 2aa75c752b [ROCm] Bump up the version of amd-smi to 6.4.3 (#28680) Sage Moore 2025-11-13 17:24:28 -08:00
  • 4d5943bda6 [quantization][config] enable override existing quant_config (#28510) Hank_ 2025-11-14 09:24:10 +08:00
  • f2b8e1c551 Mirrored test group definitions for AMD (2025-11-11) (#28573) Alexei-V-Ivanov-AMD 2025-11-13 18:16:34 -06:00
  • 6e25b1cddf [KV Connector] Test async mode in scheduler tests (#28550) Mark McLoughlin 2025-11-13 23:30:59 +00:00
  • e64011f29a [CI] Bug: Fix ci entrypoint pooling (#28684) Wentao Ye 2025-11-13 17:19:35 -05:00
  • 1b622deba7 [Misc] Update CODEOWNERS for simon-mo and comaniac (#28675) Simon Mo 2025-11-13 13:01:43 -08:00
  • faed7bf07e [Bugfix] [CPU] bump torch to 2.9.0 for Darwin to fix segmentation fault (#27791) Kebe 2025-11-14 05:48:08 +09:00
  • 262d263f6c [Bugfix] Eliminate tuple inputs to submodules in graph partitioning (#28533) Yanan Cao 2025-11-13 12:09:05 -08:00
  • 968060c15a [bugfix] correct local_chunk_len for DCP in reorg_kvcache with long context (#28526) Qiu 2025-11-14 03:29:22 +08:00
  • 5d6ce2b960 [Perf] Support stream interval for reducing host overhead (#27869) elvischenv 2025-11-14 02:21:25 +08:00
  • f9f3b596f3 [Attention][Bugfix] Fix FA sink support (#28660) Matthew Bonanni 2025-11-13 12:20:01 -06:00
  • 119c4927b3 [Bugfix] Fix validate model input for decoder models (#27099) Yannick Schnider 2025-11-13 19:18:47 +01:00
  • fe1cd7704d [Performance][B200] silu_mul_quant: pack scales in int32 (#28358) Varun Sundar Rabindranath 2025-11-13 13:16:55 -05:00
  • fdfd5075aa [TPU] patch TPU wheel build script to resolve metadata issue (#27279) Johnny Yang 2025-11-13 09:36:54 -08:00
  • 327c0a9a23 [BugFix] Ensure EngineArgs.create_engine_config is idempotent (#28515) Nick Hill 2025-11-13 09:14:08 -08:00
  • 06c4873d95 Rewrite C++ meta funcs to Python (#28595) Jane (Yuan) Xu 2025-11-13 11:52:50 -05:00
  • d3387750f1 [Misc] Turn off encoder torch compile by default (#28634) Roger Wang 2025-11-13 08:38:08 -08:00
  • b230286fbc Fix get_num_experts when config sets it explicitly to None (#28652) Harry Mellor 2025-11-13 16:02:42 +00:00
  • 3035d1a166 [BugFix] DeepSeek-OCR: apply NoRepeatNGramLogitsProcessor to greedy path (#28617) Yuanping Song 2025-11-13 10:24:35 -05:00
  • 07a606aa7e [CI Failure] Fix backend selection for encoder-only models (#28534) Huamin Li 2025-11-13 07:11:27 -08:00
  • a7791eac9d [CI/Build] Install uv for AMD MI300: Language Models Tests (Hybrid) %N (#28142) amdfaa 2025-11-13 09:34:55 -05:00
  • 8da2f28f53 [ROCm][BugFix]Fix get_cu_count in rocm_aiter_fa.py (#28618) Pleaplusone 2025-11-13 22:18:20 +08:00
  • 86d15bfd8d [Hardware][PowerPC] Fix fp16 compilation error for Power in cpu attention backend and bump oneDNN version (#28535) Akash kaothalkar 2025-11-13 19:02:21 +05:30
  • c9fe6abe7c [Bugfix] Fix FPS value type for Qwen2.5-Omni video processing (#28630) Fanli Lin 2025-11-13 21:06:06 +08:00
  • c47b6c85ac [XPU] add sym params to IPEXConfig (#28611) zofia 2025-11-13 19:35:04 +08:00
  • c428e8d80b Fix io processor pooling #28273 (#28484) baonudesifeizhai 2025-11-13 06:34:14 -05:00
  • 5e973209aa [BugFix] Fix type error when assign a trition kernel tensor to a torch.nn.Parameter (#28603) Zijing Liu 2025-11-13 03:30:04 -08:00
  • e63fd44560 Fix: Correctly filter special tokens in benchmark_prefix_caching (#28615) Di Wu 2025-11-13 18:57:44 +08:00
  • 11ac9ddd03 Support all interleaved layer types (#28485) Yong Hoon Shin 2025-11-12 22:57:20 -10:00
  • 5c9ad138d5 [Frontend] supports interleaved thinking (#28531) Chauncey 2025-11-13 16:14:13 +08:00
  • fa183e9271 [Bugfix] fix kimi-linear crash (#28445) Jiangyun Zhu 2025-11-13 15:59:58 +08:00
  • 4ab34f6ef1 Add NUMA node validation for CPU thread binding (#28555) usberkeley 2025-11-13 15:03:52 +08:00
  • c33b87e777 Use official xformers-0.0.33 built for PT 2.9 (#28600) Huy Do 2025-11-12 22:48:53 -08:00
  • 4504e8029b [Bugfix] Prevent crash on empty grammar string (#28210) tjandy98 2025-11-13 14:42:29 +08:00
  • ca00b1bfc6 [ROCm][BugFix] Remove the usage of device_info from aiter (#28383) Pleaplusone 2025-11-13 13:43:42 +08:00
  • d44fbbab0e [build][cmake]: Bundle static ACL and torch libgomp for CPU extension builds (#28059) Radu Salavat 2025-11-12 21:43:08 -08:00
  • 7e082bc14e Support DeepEP for Kimi-k2-thinking through enabling gemm selection for compressed-tensor marlin wna16 (#28574) Lucia Fang 2025-11-12 21:40:45 -08:00
  • dbbe0c756a [XPU] Support Triton path for LoRA operations on XPU (#28511) Fanli Lin 2025-11-13 13:31:42 +08:00
  • 7dca0c90cb [BugFix][ROCm] Fix get_cu_count missing variable error (#28608) Pleaplusone 2025-11-13 13:18:56 +08:00
  • 1a0b157a2e [Frontend][responsesAPI][1/n] convert responses API tool input to chat completions tool format (#28231) Andrew Xia 2025-11-12 20:47:22 -08:00