Commit Graph

  • 66a2209645 [Hardware] Replace torch.cuda.synchronize() api with torch.accelerator.synchronize (#36085) Kunshang Ji 2026-03-05 18:36:39 +08:00
  • 0bfa229bf1 [Release] Include source distribution (sdist) in PyPI uploads (#35136) Doug Smith 2026-03-05 04:43:50 -05:00
  • 7493c51c55 [Docs] add Dynamo/aibrix integration and kubeai/aks link (#32767) Paco Xu 2026-03-05 17:39:50 +08:00
  • ac773bbe80 [Docs] Update docs to include mm processor + encoder benchmarks (#34083) Reagan Lee 2026-03-05 01:38:25 -08:00
  • 48e376a007 qwen3coder tool parser fix anyOf double encoded parameters (#36032) Christian Munley 2026-03-05 01:06:57 -08:00
  • 21eb2c3372 [Chore] Correct MTP models test registry ordering (#36115) Isotr0py 2026-03-05 16:55:04 +08:00
  • e2b31243c0 [Docs] Update CacheConfig block_size docstring to remove inaccurate limit when using CUDA (#35632) Seiji Eicher 2026-03-04 22:24:08 -08:00
  • c3598d02fa [Misc] Remove deprecated items that are due for removal (#36006) Martin Hickey 2026-03-05 06:14:50 +00:00
  • 57c629e9c1 [Bugfix] Fix block_size for hybrid model MTP (#36036) Benjamin Chislett 2026-03-05 01:10:54 -05:00
  • d106bf39f5 [Doc] Add Parallel Draft Models (#35973) zihaoanllm 2026-03-05 13:44:07 +08:00
  • b0651021e5 [Kernel] [Helion] [11/N] Retune configs for silu_mul_fp8 (#36062) Yanan Cao 2026-03-04 21:25:59 -08:00
  • f600d5192e [Bugfix] Fix score layer quantization for sequence classification models - Qwen3 (VL) Reranker (#35849) Hanjun Cho 2026-03-05 13:57:20 +09:00
  • 8e7820131e [Perf] Use dummy M for weight prepacking on x86 (#35890) Tianmu Li 2026-03-04 20:56:49 -08:00
  • 0a12cea25f Order config.py in Lexicographical order (#35866) Andrii Skliar 2026-03-05 05:56:47 +01:00
  • dd6dbd93f8 [compile] Fix extra cache save on warm start. (#35921) Zhengxu Chen 2026-03-04 23:56:30 -05:00
  • 26366009c5 [CI] Don't leave docs preview comment on closed PRs (#36087) Harry Mellor 2026-03-05 04:51:46 +00:00
  • 16c472abe7 [Core] Move ray-specific WorkerWrapperBase methods to RayWorkerWrapper (#35328) Nick Hill 2026-03-04 20:11:59 -08:00
  • 3b23d57c96 [Model] Add LoRA support for Whisper models (#29856) daje0601 2026-03-05 11:38:25 +09:00
  • 2f4226fe52 [CI] Fix pre-commit mypy issue in main (#36049) Wentao Ye 2026-03-04 21:13:12 -05:00
  • 792cbd64ca Add platform method to enable custom collective ops registration (#34760) nkm-meta 2026-03-04 16:50:32 -08:00
  • 2ed4722e26 [compile] Reduce log spam from compile. (#36044) Zhengxu Chen 2026-03-04 19:48:36 -05:00
  • a3299c3d1d [Model Runner V2] Misc code simplification (#35941) Nick Hill 2026-03-04 15:26:35 -08:00
  • 6c21a0c2d7 [ROCm][CI] Added MI325 mirrors (stage C) (#35239) Andreas Karatzas 2026-03-04 16:48:46 -06:00
  • 562339abc3 [Misc] Support OOT linear method registering (#35981) Shanshan Shen 2026-03-05 06:25:56 +08:00
  • d7adcadb9b [Bugfix] Fix passing of activation_type to trtllm fused MoE NVFP4 and FP8 (#36017) amitz-nv 2026-03-05 00:23:51 +02:00
  • f678c3f61a [RL] [Weight Sync] Guard IPC update-info pickle deserialization behind insecure serialization flag (#35928) Simon Mo 2026-03-04 14:05:32 -08:00
  • be0a3f7570 [Bugfix] Fix race in non-blocking num_accepted_tokens GPU->CPU copy (#36013) Thomas Parnell 2026-03-04 22:52:44 +01:00
  • 17dc9c7fc9 [CI] Bump mypy version (#34950) Harry Mellor 2026-03-04 20:55:11 +00:00
  • 7eca859110 Add PyTorch profiler schedule support with warmup/active iterations (#35240) fenypatel99 2026-03-04 12:53:38 -08:00
  • 636ee223ac [Docs] Document security risks of GPT-OSS Python tool (#35139) Russell Bryant 2026-03-04 15:27:31 -05:00
  • b7d59ffce2 [UX] Remove NoOpOffloader log (#35678) Robert Shaw 2026-03-04 15:13:40 -05:00
  • 5569f5218d [torch.compile] Stop lazily compiling (#35472) Richard Zou 2026-03-04 15:13:17 -05:00
  • 138d891d7f [Docs] Clarify structured outputs configuration for Qwen3 reasoning mode (#32441) Davina Zaman 2026-03-04 11:44:39 -08:00
  • d7166e74c1 [CI] Add Blackwell AsyncTP correctness test (#35871) Stefano Castagnetta 2026-03-04 20:41:21 +01:00
  • 417fd28fb1 [Model Runner V2] Fix pooling (#36019) Nick Hill 2026-03-04 10:53:17 -08:00
  • 7faba503c4 [Kernel][Mamba] Optimize Mamba2 SSD prefill Triton kernels (#35397) tomeras91 2026-03-04 20:47:17 +02:00
  • bc6be89d16 [Frontend] Add vllm launch command for GPU-less preprocessing serving (#34551) Hyunkyun Moon 2026-03-05 03:41:52 +09:00
  • 32224f568a docs: update CPU Docker images to reference Docker Hub instead of AWS ECR (#34882) Maxime Grenu 2026-03-04 19:31:35 +01:00
  • f3dc292e9f docs: add version requirement note for --profiler-config flag (#32454) Abhishek Mathukiya 2026-03-04 13:13:54 -05:00
  • 138c5fa186 [Docs] Add RunPod GPU deployment guide for vLLM (#34531) Chen 2026-03-04 12:11:34 -06:00
  • 2f2c1d73a7 [Docs] Upgrade dynamic LoRA warning to admonition block (#35218) Russell Bryant 2026-03-04 13:01:42 -05:00
  • fb3e78ab09 [Feature][CI]: compare func & no_func outputs in test_functionalization.py (#35481) Bhuminjay Soni 2026-03-04 23:31:16 +05:30
  • fd3bfe74c9 [Docs] Update design/multiprocessing.md (#30677) Michael Yao 2026-03-05 01:58:59 +08:00
  • bfdb512f11 fix minicpmo4.5: fix attn_mask in vit attn && fix resampler pos_emb i… (#34127) tc-mb 2026-03-05 01:46:17 +08:00
  • d25c1ec3c9 docs(cpu): Clarify pre-built wheels requirement for CPU Python-only build (#35090) Sage 2026-03-04 19:45:35 +02:00
  • 7cc6058ac6 [Doc] Add MTP docs and update speculative decoding guidance (#35197) Xing Liu 2026-03-05 01:23:34 +08:00
  • 28028dff2f fix(docs): use static rdzv backend in multi-node troubleshooting script (#34784) Manrique Vargas 2026-03-04 12:15:35 -05:00
  • 3417ba5648 docs: add README for logits_processor examples (#35933) Dr Alex Mitre 2026-03-04 11:09:19 -06:00
  • 58cfe0dc44 Fix phi4-mm and remove cuda binding (#35964) Yan Ma 2026-03-05 01:08:05 +08:00
  • e86221deb6 [Doc] Fix GPU Worker count in Process Count Summary (#36000) simone-dotolo 2026-03-04 18:03:14 +01:00
  • 289fc48ab7 Use MMEncoderAttention (=use FlashAttention) instead of torch.sdpa in radio.py (#35653) Netanel Haber 2026-03-04 18:43:13 +02:00
  • 2f2212e6cc Split generic IO Processor plugins tests from Terratorch specific ones (#35756) Christian Pinto 2026-03-04 16:01:03 +00:00
  • 18e01a0a10 [Misc] Add --attention-backend auto option (#35738) Nicolò Lucchesi 2026-03-04 16:12:27 +01:00
  • 6cb901093f [Core] Add All-to-All communication backend for DCP (#34883) sungsoo ha 2026-03-04 07:01:57 -08:00
  • ead7bde1ab [Bugfix] Make kaldi_native_fbank optional (#35996) Cyrus Leung 2026-03-04 22:47:32 +08:00
  • 6aa6ad8992 [BugFix] Fix implicit and incorrect assumption on ECConnector is_producer (#34783) Qi Wang 2026-03-04 06:01:30 -08:00
  • c8c3935b70 [Bugfix][Model] Fix FP8 k_scale/v_scale not loaded for Qwen3-MoE (#35656) Raghavan 2026-03-04 18:45:38 +05:30
  • bb6888b8b1 [Bugfix][CPUOffloadingManager] Prevent eviction of already-stored blocks in LRU/ARC prepare_store() (#35846) Ronen Schaffer 2026-03-04 14:25:33 +02:00
  • 1aaec59d79 [MISC] fixed tool_parser mypy errors (#35640) Taneem Ibrahim 2026-03-04 06:23:12 -06:00
  • 1659b2e058 [Feature] Add basic metrics for /realtime endpoint (#35500) pougetat 2026-03-04 03:56:32 -08:00
  • d6e04f4c43 [Bugfix] Cap FULL decode cudagraph sizes for Mamba/hybrid models (#34094) (#34571) haosdent 2026-03-04 18:56:22 +08:00
  • a8f66cbde8 [XPU] bump vllm-xpu-kernels to v0.1.3 (#35984) Kunshang Ji 2026-03-04 18:23:31 +08:00
  • 16d2ad1d38 [Hardware] Replace torch.cuda.empty_cache with torch.accelerator.empty_cache (#30681) Kunshang Ji 2026-03-04 17:49:47 +08:00
  • 5dc3538736 [ROCm][Bugfix] Fall back from CK MXFP4 MoE when GEMM dimensions are unsupported (#35893) Chuan (Richard) Li 2026-03-04 00:30:54 -08:00
  • 36bf213181 [Bugfix] Add missing dynamic_arg_dims for Qwen3-ASR torch.compile (#35869) Nathan Price 2026-03-04 02:29:01 -06:00
  • 6f0dd93801 [Core] Remove busy loop from idle buffer readers (#28053) Joe Runde 2026-03-04 00:44:20 -07:00
  • 5d199ac8f2 Support Audio Extraction from MP4 Video for Nemotron Nano VL (#35539) Andrii Skliar 2026-03-04 08:20:33 +01:00
  • 9e0f44bec4 [cohere][fix][spec-decode]: fix crash when allowed_token_ids is set without penalties (#35654) Komal Kumar Teru 2026-03-04 12:50:15 +05:30
  • 097eb544e9 [Bugfix] Improve engine ready timeout error message (#35616) (tag: v0.17.0rc0) lailoo 2026-03-04 13:54:32 +08:00
  • 7cdba98edf [BugFix] Support tool_choice=none in the Anthropic API (#35835) ShiJie Zhong 2026-03-04 13:24:46 +08:00
  • 3c85cd9d74 [Rocm][CI] Fix ROCm LM Eval Large Models (8 Card) (#35913) Charlie Fu 2026-03-03 22:50:13 -06:00
  • edba15045a [Bugfix] Guard mm_token_type_ids kwarg in get_mrope_input_positions (#35711) Andreas Karatzas 2026-03-03 22:12:51 -06:00
  • e379396167 [Refactor] Clean up processor kwargs extraction (#35872) Cyrus Leung 2026-03-04 11:53:53 +08:00
  • 6e9f21e8a2 [Chore] Remove debug code in model implementation (#35883) Isotr0py 2026-03-04 11:50:58 +08:00
  • c1d963403c [model] support FireRedASR2 (#35727) AllenDou 2026-03-04 11:41:30 +08:00
  • 77e6dcbbfa [PluggableLayer][MM] Add PluggableLayer for RelPosAttention (#33753) Shanshan Shen 2026-03-04 11:41:27 +08:00
  • 70c73df69e [Bugfix] Fix EVS implementation for Qwen3 VL (#33607) William Zhang 2026-03-03 18:18:11 -08:00
  • 9a9d442464 Enable bnb for multiple indices weight (#35838) xjx 2026-03-04 09:46:47 +08:00
  • f7da9cdffc [ROCm][CI] Support async weight transfer example with platform-aware determinism (#35710) Andreas Karatzas 2026-03-03 19:44:14 -06:00
  • f22ff2958c [Bugfix] Fix coord_socket assertion in DPEngineCoreProc for offline DP mode (#35916) Jaewon 2026-03-03 16:10:11 -08:00
  • d15c3b90fc [Core] Move save_tensorized_model logic to Worker (#35825) Nick Hill 2026-03-03 15:31:59 -08:00
  • 97286a20ed [Model Runner V2] support dp & ep for spec decoding (#35294) zhrrr 2026-03-04 07:19:45 +08:00
  • 12b38c0f45 [CI/Build] Allow mounting AWS credentials for sccache S3 auth (#35912) Amr Mahdi 2026-03-03 14:30:47 -08:00
  • 467886a0c4 [Model Runner V2] Fix inputs_embeds=None bug for MM models (#35917) Woosuk Kwon 2026-03-03 13:47:45 -08:00
  • a9b8b13e5c [Bugfix] Fix misnamed parameter in compressed_tensors_moe.py (#35813) bnellnm 2026-03-03 16:29:57 -05:00
  • e7213003cb [ROCm][CI] Fix TP size issue for test_gpt_oss (#35887) Micah Williamson 2026-03-03 14:57:34 -06:00
  • 3a8eef5869 [ROCm][Bugfix]: Disable AITER Triton ROPE by default (#35601) Rohan Potdar 2026-03-03 13:43:56 -06:00
  • 97995f6376 [MoE Refactor] Create MK for TRTLLM Kernels (#32564) Robert Shaw 2026-03-03 13:39:50 -05:00
  • 881a6b011b [CI] Temporarily Disable Llama4 MoE Refactor Test (#35870) Robert Shaw 2026-03-03 13:36:15 -05:00
  • 8e1fd5baf0 [CI] Bump num_speculative_tokens to 3 in nightly DeepSeek tests (#35882) Matthew Bonanni 2026-03-03 12:26:44 -05:00
  • ae88468bcc fix: Ensure invalid audio files return 400 error (#34715) JasonCohere 2026-03-03 16:47:39 +00:00
  • e05cb3b93e TRTLLM gen-full attn Test Coverage (#34986) ojhaanshika 2026-03-03 08:35:34 -08:00
  • 28ef9ba399 [BugFix] Add support for MTP num_speculative_tokens > 1 with sparse MLA (#34552) Lucas Wilkinson 2026-03-03 10:21:57 -05:00
  • fb7fdc49c4 [ROCm] [CI] Add new fusion test cases that are relevant to vLLM IR Ops (#34307) TJian 2026-03-03 22:24:21 +08:00
  • ea463978bb [Frontend][1/n] Improve pooling entrypoints | classify. (#35604) wang.yuqi 2026-03-03 22:05:36 +08:00
  • 440f0e7dc6 [Bugfix] Avoid src/dst as None in irecv/isend_tensor_dict (#35754) Li, Jiang 2026-03-03 21:56:08 +08:00
  • fd4a90f337 [CI] And PPL test for Qwen3.5. (#35853) wang.yuqi 2026-03-03 21:15:51 +08:00
  • ad9d09e2b8 [Perf] [Hybrid] Copy num_accepted_tokens in non-blocking way when not using prefix caching (#35442) Thomas Parnell 2026-03-03 13:15:43 +01:00
  • 4beebfd146 [CI/Build][Intel] Add new performance benchmarks for Intel Gaudi 3 (#31025) Szymon Reginis 2026-03-03 12:48:24 +01:00
  • b8401cde0e add regression test (#35834) hallerite 2026-03-02 23:32:15 -08:00