Commit Graph

  • 1ab8fc8197 Make PyTorch profiler gzip and CUDA time dump configurable (#29568) Yifei Zhang 2025-12-01 12:30:46 +08:00
  • f72a817bdf [MoE] CuteDSL MoE with Nvfp4 DeepEP dispatch (#27141) Shu Wang 2025-11-30 18:05:32 -06:00
  • ec38a7368d [Model Runner V2] Use packed mask for prompt bin counts (#29756) Woosuk Kwon 2025-11-30 14:15:42 -08:00
  • 21c2627934 [Misc]Remove redundant hidden_size property in ModelConfig (#29749) Xingyu Liu 2025-12-01 01:14:23 +08:00
  • 39d28108f4 [Feat] Support non-gated activations in NVFP4 modelopt path (#29004) Omer Ullman Argov 2025-11-30 18:02:40 +02:00
  • cd719de5cb Fix RoPE failures in Transformers nightly (#29700) Harry Mellor 2025-11-30 14:29:32 +00:00
  • 8c363ed666 [ROCm][Attention] Sliding window support for AiterFlashAttentionBackend (#29234) Pleaplusone 2025-11-30 19:31:50 +08:00
  • 64bc09ba27 [Core] Enable inputs_embeds_size separate from hidden_size (#29741) Cyrus Leung 2025-11-30 17:31:12 +08:00
  • 47539cfd3e [Bugfix] Fix mismatched nvfp4 gemm output shape (#29742) Isotr0py 2025-11-30 17:15:01 +08:00
  • 2afcec4dec [Misc] Update TokenizerLike interface and move get_cached_tokenizer (#29730) Cyrus Leung 2025-11-30 14:59:47 +08:00
  • 9381b5cde0 [Doc]: Fix typo in fused_moe layer (#29731) 2025-11-30 14:29:13 +08:00
  • 66b5840287 [Bugfix][sleepmode][fp8 kv cache]: Fix FP8 KV cache + sleep(level=2) gibberish output (#28783) Vensen 2025-11-30 14:24:25 +08:00
  • 82c795d6f2 Fix AttributeError about _use_fi_prefill (#29734) Huamin Li 2025-11-29 22:04:55 -08:00
  • e1464c3a08 [Quantization] Enable compressed-tensors AWQ for Turing GPU (#29732) Isotr0py 2025-11-30 14:04:28 +08:00
  • a491b0911b [LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#29708) Xin Yang 2025-11-29 18:37:25 -08:00
  • b9d0504a36 [Bugfix] Revert test_tokenization.py (#29729) Jee Jee Li 2025-11-30 00:35:15 +08:00
  • 1656ad3704 [Kernel][Quantization] add w4a8 support for marlin kernel (#24722) Jinzhen Lin 2025-11-29 23:19:33 +08:00
  • fa59fe417f [Chore] Move detokenizer_utils to vllm/tokenizers (#29727) Cyrus Leung 2025-11-29 22:25:17 +08:00
  • fe3398fab2 [Chore] Enable passing tokenizer=None into MM processor (#29724) Cyrus Leung 2025-11-29 22:25:10 +08:00
  • ad7f714d62 hfrunner.classify should return list[list[float]] not list[str] (#29671) Chukwuma Nwaugha 2025-11-29 13:57:00 +00:00
  • f4341f45d3 [Doc]: fix code block rendering (#29728) dublc 2025-11-29 21:46:48 +08:00
  • 34a984274e [Misc] Refactor tokenizer interface (#29693) Cyrus Leung 2025-11-29 20:02:21 +08:00
  • f223ed4181 [Model Runner V2] Fuse penalties and temperature into single kernel (#29720) Woosuk Kwon 2025-11-29 02:29:16 -08:00
  • 04a797cd0e [Doc]: fixing typos in various files. (#29717) Didier Durand 2025-11-29 10:15:39 +01:00
  • 6afc0ffaf6 [Model Runner V2] Add sample/ directory and reorganize files (#29719) Woosuk Kwon 2025-11-29 00:41:01 -08:00
  • 39e63dec7c [LoRA] Cleanup LoRA unused code (#29611) Jee Jee Li 2025-11-29 14:52:58 +08:00
  • 4a80ad0a25 [Model Runner V2] Don't use UVA buffer for prefill_len (#29713) Woosuk Kwon 2025-11-28 20:27:16 -08:00
  • 4b17ce6815 Add gpu memory wait before test_async_tp (#28893) Angela Yi 2025-11-28 20:19:05 -08:00
  • e23f665d83 [BugFix] Fix DBO failing with TypeError: 'NoneType' object is not iterable (#29698) Lucas Wilkinson 2025-11-28 23:19:01 -05:00
  • ca1b1e7296 [Model Runner V2] Refactor prefill token preparation (#29712) Woosuk Kwon 2025-11-28 19:49:17 -08:00
  • 762a4a6ca9 [Frontend] Perform offline path replacement to tokenizer (#29706) Tsukasa OI 2025-11-29 11:32:08 +09:00
  • b2c50eda50 [Bugfix] Fix wrong mock attribute (#29704) Cyrus Leung 2025-11-29 10:30:41 +08:00
  • 1dcafb3dea [Model Runner V2] Support penalties using bin counts (#29703) Woosuk Kwon 2025-11-28 17:53:17 -08:00
  • ea3370b428 [ROCm][Bugfix] Patch for the Multi-Modal Processor Test group (#29702) Andreas Karatzas 2025-11-28 19:31:44 -06:00
  • c625d7b1c6 [Bugfix] Fix O(n²) multimodal string prompt processing (#29667) Mert Unsal 2025-11-28 16:10:39 -08:00
  • 6173682b6e [compile] Include enable_sleep_mode into caching factors. (#29696) Zhengxu Chen 2025-11-28 18:58:38 -05:00
  • 9726e64530 bugfix: correct attn output with base 2 or e (#28840) Augusto Yao 2025-11-29 07:52:12 +08:00
  • 3fd1fb0b60 Revert "[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#28971)" (#29697) Huamin Li 2025-11-28 15:26:52 -08:00
  • a51f4186f2 [Bugfix] fix dots.llm1.inst (#29687) Jiangyun Zhu 2025-11-29 07:25:26 +08:00
  • 7675ba30de [Misc] Remove redundant ClassRegistry (#29681) Cyrus Leung 2025-11-29 07:24:47 +08:00
  • 7c1ed45848 [CI/Build]: make it possible to build with a free-threaded interpreter (#29241) Ralf Gommers 2025-11-29 00:21:46 +01:00
  • 1986de1375 [Perf] Optimize EAGLE prepare_inputs_padded with triton kernels (#28597) Benjamin Chislett 2025-11-28 17:25:05 -05:00
  • 3461e7efd8 [Frontend] Remap -O to -cc commandline flag (#29557) Yanan Cao 2025-11-28 13:51:12 -08:00
  • fecae12cd7 Remove all_special_tokens_extended from tokenizer code (#29686) Harry Mellor 2025-11-28 20:26:51 +00:00
  • 8d9338fae4 [Chore] Rename Processor to InputProcessor (#29682) Cyrus Leung 2025-11-29 01:35:41 +08:00
  • d40c854009 [CI/Build] Rework CPU multimodal processor test (#29684) Isotr0py 2025-11-29 01:10:29 +08:00
  • 4332955602 [Docs] Add CLI reference doc for vllm bench sweep plot_pareto (#29689) Harry Mellor 2025-11-28 17:10:08 +00:00
  • f946a8d743 [Chore]: Reorganize model repo operating functions in transformers_utils (#29680) Isotr0py 2025-11-29 00:46:51 +08:00
  • 6f9d81d03b [V0 deprecation] Clean up legacy paged attention helper functions (#28043) Isotr0py 2025-11-29 00:44:33 +08:00
  • fae6943068 [Doc]: fixing typos in multiple files. (#29685) Didier Durand 2025-11-28 17:41:41 +01:00
  • 3bcbb30cbf add add_truncate_prompt_tokens in repr for PoolingParams (#29683) 果冻虾仁 2025-11-29 00:41:05 +08:00
  • 9e6bcda3ac [mypy] Enable type checking for more directories (#29674) Cyrus Leung 2025-11-29 00:39:27 +08:00
  • 9eec282cb5 Guard FlashInfer sampler using the same check as FlashInfer attention backend (#29415) Harry Mellor 2025-11-28 16:34:48 +00:00
  • 0808eb813b [Misc] Remove yapf directives (#29675) Cyrus Leung 2025-11-28 23:07:23 +08:00
  • 460d8bbf2d Remove upstream fa checks (#29471) Mingyuan Ma 2025-11-28 05:52:42 -08:00
  • e2f56c309d [CPU] Update torch 2.9.1 for CPU backend (#29664) Li, Jiang 2025-11-28 21:37:54 +08:00
  • f8151b66fa Revert "Supress verbose logs from model_hosting_container_standards (… (#29335) HappyAmazonian 2025-11-28 05:29:05 -08:00
  • 1168768a2d [Optimization] Early return for _apply_matches and _iter_placeholders (#29668) Cyrus Leung 2025-11-28 21:26:47 +08:00
  • 8e7a891602 [BugFix] Fix spec decoding max_tokens scheduling perf issue (#29542) Nick Hill 2025-11-28 04:52:23 -08:00
  • 953d9c820b [mypy] Pass type checking for vllm/utils and vllm/v1/pool (#29666) Cyrus Leung 2025-11-28 20:40:47 +08:00
  • 33b06a6f24 [Misc] Remove redundant attention var constants (#29650) Cyrus Leung 2025-11-28 20:35:19 +08:00
  • 5c2b5cb422 [Docs] Add SPLADE and Ultravox models to supported models documentation (#29659) Wilson Wu 2025-11-28 18:29:28 +08:00
  • 3cb32e5d6e [Rocm] Set VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS default is disabled (#28985) 杰兮 2025-11-28 18:08:42 +08:00
  • ccbdf51bd5 [Doc] Reorganize benchmark docs (#29658) Cyrus Leung 2025-11-28 17:19:25 +08:00
  • 5f5521bd5d Fix parameter order in GPT-OSS weight loading function for non-MXFP4 weights (#29506) Filipp Fisin 2025-11-28 09:45:10 +01:00
  • b2c1d294fa [BUGFIX] MistralTokenizer._call__ adds an invalid EOS token (#29607) Julien Denize 2025-11-28 09:44:47 +01:00
  • cc0f2a0e19 [Doc] Improve abnormal information string (#29655) maang-h 2025-11-28 16:12:20 +08:00
  • 480598958e [Feature][Bench] Add pareto visualization (#29477) rongfu.leng 2025-11-28 15:53:20 +08:00
  • b34e8775a3 Revert "[CPU]Update CPU PyTorch to 2.9.0 (#29589)" (#29647) Cyrus Leung 2025-11-28 14:43:18 +08:00
  • f4b76056ee Improve enable chunked_prefill & prefix_caching logic. (#26623) wang.yuqi 2025-11-28 14:05:48 +08:00
  • 37b15e97e8 [Multimodal][Speculative Decoding]Eagle3 mm support, enablement on qwen3vl (#29594) EanWang211123 2025-11-28 14:05:45 +08:00
  • c7ba1f6bc7 [BugFix] Fix ValueError in NewRequestData repr methods (#29392) maang-h 2025-11-28 13:42:30 +08:00
  • 18523b87f6 [Docs] Update supported models for Olmo 3 in tool calling documentation (#29411) Wilson Wu 2025-11-28 10:53:55 +08:00
  • 745a3bae1a [LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#28971) Xin Yang 2025-11-27 18:48:28 -08:00
  • 35657bcd7a [CPU]Update CPU PyTorch to 2.9.0 (#29589) scydas 2025-11-28 09:34:33 +08:00
  • be493e0b3c [BugFix] Fix new nightly failures (#29578) Lucas Wilkinson 2025-11-27 16:45:38 -05:00
  • ae0ce1be27 [Model Runner V2][BugFix] Keep reference to GPU tensors in AsyncOutput (#29623) Woosuk Kwon 2025-11-27 12:38:53 -08:00
  • a5345bf49d [BugFix] Fix plan API Mismatch when using latest FlashInfer (#29426) Andrii Skliar 2025-11-27 20:34:59 +01:00
  • e5a621b724 [CI] Add batched audios Whisper test (#29308) Nicolò Lucchesi 2025-11-27 20:31:52 +01:00
  • 38658ec6f3 [Bugfix][MM encoder] Fix ViT attention backend resolving for Turing GPU (#29614) Isotr0py 2025-11-28 03:17:37 +08:00
  • a24ea5414b [Deprecation] Advance deprecation status (#29617) Cyrus Leung 2025-11-28 03:04:58 +08:00
  • ea228b4491 [Misc] Remove unused code from protocol.py (#29616) Cyrus Leung 2025-11-28 02:39:59 +08:00
  • d45269b378 add skip_reading_prefix_cache in repr for PoolingParams (#29620) 果冻虾仁 2025-11-28 01:21:00 +08:00
  • ee9841daa9 [Bugfix] Fix doc build on main (#29619) Cyrus Leung 2025-11-28 01:08:08 +08:00
  • 0840abdd24 [BugFix] Optional tokenizer argument when loading GGUF models (#29582) Injae Ryou 2025-11-28 01:53:10 +09:00
  • e1f262337b Update Transformers pin in CI to 4.57.3 (#29418) Harry Mellor 2025-11-27 16:42:14 +00:00
  • fc1d8be3dc [Attention] Update attention imports (#29540) Matthew Bonanni 2025-11-27 11:19:09 -05:00
  • cd007a53b4 [bugfix] avoid NIXL_ERR_REMOTE_DISCONNECT in nixl_connector when Prefill dies (#28120) Mathis Felardos 2025-11-27 16:32:38 +01:00
  • 66d3d5422c [Doc]: fixing typos in diverse files (#29492) Didier Durand 2025-11-27 16:15:50 +01:00
  • bab438ff3e [CI/Build] Skip ray tests on ROCm (#29556) Ryan Rock 2025-11-27 09:01:37 -06:00
  • 882851dc81 [CI/Build][Bugfix] Fix auto label issues for CPU (#29610) Li, Jiang 2025-11-27 22:51:26 +08:00
  • 2f5f9acd55 [LoRA] Continue optimizing MoE LoRA weight loading (#29322) Jee Jee Li 2025-11-27 21:56:28 +08:00
  • cf348c8d27 [Bugfix] Fix HunyuanVL XD-RoPE (#29593) Roger Wang 2025-11-27 04:36:24 -08:00
  • a5abd1d384 [CI] Auto label CPU related issues (#29602) Li, Jiang 2025-11-27 19:33:19 +08:00
  • e6d4f3c254 [Bugfix] Fix pre-commit (#29601) Cyrus Leung 2025-11-27 18:23:06 +08:00
  • 51906c8c55 [Docs] Improve priority parameter documentation (#29572) maang-h 2025-11-27 18:09:24 +08:00
  • 0838b52e2e [Frontend][torch.compile] CompilationConfig Overhaul (#20283): Set up -O infrastructure (#26847) Morrison Turnansky 2025-11-27 04:55:58 -05:00
  • 00d3310d2d [Bugfix] Update Ultravox compatibility (#29588) Cyrus Leung 2025-11-27 17:36:18 +08:00
  • da3222f371 [Model Runner V2] Implement multi-step Eagle with CUDA graph (#29559) Woosuk Kwon 2025-11-27 00:09:41 -08:00
  • 43c5792592 [ROCm][CI] Fix test_cpu_offloading for ROCm (#29548) Micah Williamson 2025-11-27 01:54:44 -06:00