Commit Graph

  • b6a3a9f76d [Core] Fix abrupt request abort (#18485) Nicolò Lucchesi 2025-06-07 01:27:59 +02:00
  • ca27f0f9c1 [Bugfix][Core] Update cancellation logic in generate() to handle Generator exits (#19225) Adolfo Victoria 2025-06-06 13:17:54 -07:00
  • aad30bd306 [BugFix] Fix MultiConnector test after HMA changes (#19291) Nick Hill 2025-06-06 13:16:24 -07:00
  • 94ecee6282 Fixed ppc build when it runs on non-RHEL based linux distros (#18422) Nishidha 2025-06-07 00:24:26 +05:30
  • 8267f9916f improve logits bias (#19041) Yu Guo 2025-06-06 04:59:25 -07:00
  • 7353492a47 [Core] Raise when non-multi-instance DP clients target a DP rank (#19227) jmswen 2025-06-06 04:03:01 -07:00
  • 7661e92ef8 [Model] Optimize nemotron_h implementation (#19249) Jee Jee Li 2025-06-06 18:05:14 +08:00
  • f168b85725 Unit Test for run_dp_sharded_vision_model (#19103) Siqi Yan 2025-06-06 01:24:02 -07:00
  • da511d54d8 Fix CompilationConfig repr (#19091) Richard Zou 2025-06-06 04:23:35 -04:00
  • 65c69444b1 [Docs] Improve V1 KVConnector interface documentation (#19172) Nick Hill 2025-06-06 01:22:45 -07:00
  • 94870359cd [Quantization] Bump compressed-tensors version; update NVFP4A16 test model (#19224) Dipika Sikka 2025-06-06 04:21:54 -04:00
  • 0d49483ea9 [TPU] fix kv cache dtype in model runner (#19244) Chengji Yao 2025-06-06 01:20:16 -07:00
  • 90b78ec5f9 [v1][P/D] Fix a edge case in kv cache schedule (#19182) Jinghui Zhang 2025-06-05 23:32:55 -07:00
  • 91a2ef98ea [Chore] update CODEOWNERS (#19247) Aaron Pham 2025-06-06 02:09:43 -04:00
  • 3da2313d78 Support allowed_token_ids in ChatCompletionRequest (#19143) Xu Song 2025-06-06 13:06:48 +08:00
  • b61dc5f972 [TPU] update torch_xla pin (#19231) Chengji Yao 2025-06-05 21:27:38 -07:00
  • f8a1a2d108 [v1] Hybrid Memory Allocator (#17996) Chen Zhang 2025-06-06 11:47:09 +08:00
  • 3465b87ef8 [Bugfix] Fix EAGLE vocab embedding construction for Llama 70B (#19033) Benjamin Chislett 2025-06-05 22:10:08 -04:00
  • c8134bea15 Fix AOPerModuleConfig name changes (#18869) Jerry Zhang 2025-06-05 21:51:32 -04:00
  • cb6d572e85 [Model] NemotronH support (#18863) Luis Vega 2025-06-05 14:29:28 -07:00
  • 87360308b7 [V1] Use FlashInfer by default on Blackwell GPUs (#19118) Michael Goin 2025-06-05 15:40:39 -04:00
  • aa49f14832 [Quantization] Skip Fp4 Test for compressed-tensors (#19217) Dipika Sikka 2025-06-05 14:21:53 -04:00
  • 9ef9173cfa [P/D][NixlConnector] Enable FlashInfer backend (#19090) Nicolò Lucchesi 2025-06-05 19:10:15 +02:00
  • 85e2b7bb13 [MISC][Bugfix] Use less CPU when message queue has been empty for some time (#16226) Povilas Kanapickas 2025-06-05 19:53:08 +03:00
  • 61059bee40 [Hardware][NVIDIA] FP4 MoE kernel optimization (#19110) Chiyue Wei 2025-06-05 09:48:26 -07:00
  • ec89524f50 Add H20-3e fused MoE kernel tuning configs for DeepSeek-R1/V3 (#19205) Xu Wenqing 2025-06-06 00:38:54 +08:00
  • f20f9f063b [mistral_common] Add v11 tokenizer (#19193) Patrick von Platen 2025-06-05 17:27:41 +02:00
  • 9bc8bb07cf [Bugfix] properly catch PIL-related errors for vision models when incorrect data urls are provided (#19202) Guillaume Calmettes 2025-06-05 14:59:28 +02:00
  • 1aeb925f34 [Frontend] improve vllm run-batch --help display (#19187) Reid 2025-06-05 19:16:25 +08:00
  • 188a4590d8 [Misc] Do not override NCCL_CUMEM_ENABLE if set explicitly (#19105) 22quinn 2025-06-05 04:14:32 -07:00
  • 18093084be [Misc] Remove unnecessary fallback to prefill-decode attention (#19138) vllmellm 2025-06-05 16:08:26 +08:00
  • da40380214 [Build] Annotate wheel and container path for release workflow (#19162) Simon Mo 2025-06-04 23:24:56 -07:00
  • 8fc57501d3 [Bugfix]: Fix the incompatibility issue with stream when Thinking is disabled (#19135) Chauncey 2025-06-05 14:24:24 +08:00
  • af7fc84fd2 [BugFix][Minor] Fix full cuda graph bug when max_num_seqs < 512 (#19171) Woosuk Kwon 2025-06-04 22:41:25 -07:00
  • 0678b52251 Handle non-serializable objects when dumping benchmark results (#19114) Huy Do 2025-06-04 22:40:04 -07:00
  • 25b918eee6 [Torch Nightly]add missing dependency (#18770) Yang Wang 2025-06-04 21:56:12 -07:00
  • a408820f2f [Bugfix] Fix port handling in make_zmq_path (#19117) Michael Goin 2025-06-04 23:00:59 -04:00
  • c56ed8bb0e [Bugfix][Nixl] Fix full prefix cache hit bug (#18632) Robert Shaw 2025-06-04 22:07:32 -04:00
  • 78dcf56cb3 [doc] small fix (#19167) Reid 2025-06-05 09:13:50 +08:00
  • b2fac67130 [P/D] Heterogeneous TP (#18833) Nicolò Lucchesi 2025-06-05 01:25:34 +02:00
  • 23027e2daf [Misc] refactor: simplify EngineCoreClient.make_async_mp_client in AsyncLLM (#18817) CYJiang 2025-06-05 06:37:25 +08:00
  • c3fd4d669a [Kernel] Integrate batched/masked deepgemm kernel (#19111) Varun Sundar Rabindranath 2025-06-04 17:59:18 -04:00
  • ef3f98b59f [Bugfix] fix v1 cpu worker fails on macOS (#19121) Kebe 2025-06-05 04:17:38 +08:00
  • 7ee2590478 [TPU] Update dynamo dump file name in compilation test (#19108) Siyuan Liu 2025-06-04 13:13:43 -07:00
  • 53a5a0ce30 [Perf] Tunings for SM100 FP8 CUTLASS kernel (#18778) Michael Goin 2025-06-04 13:46:28 -04:00
  • d459fae0a2 [Bugfix][EP+DP] Fix internode check (#19112) Tyler Michael Smith 2025-06-04 11:39:23 -04:00
  • c8dcc15921 Allow AsyncLLMEngine.generate to target a specific DP rank (#19102) jmswen 2025-06-04 08:26:47 -07:00
  • 8f4ffbd373 [Doc] Update V1 Guide for embedding models (#19141) Cyrus Leung 2025-06-04 22:57:55 +08:00
  • 5f2cd251d2 Sm100 blockwise fp8 swap ab (#18564) Lain 2025-06-04 07:48:45 -07:00
  • 02658c2dfe Add DeepSeek-R1-0528 function call chat template (#18874) Xu Wenqing 2025-06-04 21:24:18 +08:00
  • 01dc9a76db [CI/Build][Bugfix] Ensure compatibility with transformers 4.52 (#18678) Cyrus Leung 2025-06-04 19:49:20 +08:00
  • 35cf32df30 Improve the output precision of embedding models (#19092) wang.yuqi 2025-06-04 19:48:57 +08:00
  • 8711bc5e68 [Misc] Add packages for benchmark as extra dependency (#19089) Isotr0py 2025-06-04 19:18:48 +08:00
  • 2669a0d7b5 Fix ValueError: Missing value for tag key(s): model_name,engine. (#19113) Seiji Eicher 2025-06-04 02:10:45 -07:00
  • 8e972d9c44 [TPU] Skip hanging tests (#19115) Siyuan Liu 2025-06-04 01:43:00 -07:00
  • 3336c8cfbe Fix #19130 (#19132) 汪志鹏 2025-06-04 16:42:06 +08:00
  • b124e1085b [Bugfix] Fix FA3 full cuda graph correctness (#19106) Woosuk Kwon 2025-06-03 23:10:15 -07:00
  • 41aa578428 [NVIDIA] Add Cutlass MLA backend (#17625) Kaixi Hou 2025-06-04 12:40:26 +08:00
  • 8d646c2e53 [Cleanup][v1]:remote guided-decoding-backend for example (#19059) Calvin Chen 2025-06-04 12:23:26 +08:00
  • 5d6d1adf15 [KERNEL] Sampler. CUDA kernel for applying repetition penalty (#18437) Vadim Gimpelson 2025-06-04 08:13:01 +04:00
  • 1409ef9134 [Core] Cast multimodal input in hf processor (#18862) Lukas Geiger 2025-06-04 04:24:56 +01:00
  • 4555143ea7 [CPU] V1 support for the CPU backend (#16441) Li, Jiang 2025-06-04 09:43:01 +08:00
  • 52dceb172d [Docs] Add developer doc about CI failures (#18782) Russell Bryant 2025-06-03 21:09:13 -04:00
  • abd7df2fca [Misc] Fix path and python alias errors in disagg_prefill exmaples (#18919) Jiaxin Shan 2025-06-03 17:15:18 -07:00
  • b712be98c7 feat: add data parallel rank to KVEventBatch (#18925) Yan Ru Pei 2025-06-03 17:14:20 -07:00
  • a8da78eac9 [Bugfix] Max concurrency estimation and check_enough_kv_cache_memory for models with sliding window layers (#19029) Chen Zhang 2025-06-04 08:14:06 +08:00
  • 5d96533e22 [Bugfix][P/D] Fix Prefix Cache Bug (#18411) Nicolò Lucchesi 2025-06-04 01:53:16 +02:00
  • 4de790fcad [Bugfix]: Fix the incompatibility issue with tool_choice 'required' when Thinking is enabled (#19075) Chauncey 2025-06-04 07:27:24 +08:00
  • b5fd9506c1 [Bugfix] get_num_blocks_to_allocate with null_block (#19031) Chen Zhang 2025-06-04 06:30:55 +08:00
  • 135cf55cd1 [V1][Spec Decode][Ngram] 1.35x gain -> 1.95x gain on InstructCoder with prompt fix (#18971) Ekagra Ranjan 2025-06-03 18:26:33 -04:00
  • 6cac54f4d1 [v1] Re-init input batch for multiple kv cache groups (#18654) Chen Zhang 2025-06-04 05:41:36 +08:00
  • 6865fe0074 Fix interaction between Optional and Annotated in CLI typing (#19093) Harry Mellor 2025-06-03 22:07:19 +01:00
  • e31446b6c8 [Perf] Tune scaled_fp8_quant by increasing vectorization (#18844) Michael Goin 2025-06-03 16:48:25 -04:00
  • bdf13965ab [V1] Support cross-layer KV sharing (#18212) Yong Hoon Shin 2025-06-03 13:33:07 -07:00
  • fa98d77773 [Kernel] DeepEP dispatch-combine kernel integration (#18434) Varun Sundar Rabindranath 2025-06-03 15:30:02 -04:00
  • 01eee40536 [doc] update docker version (#19074) Reid 2025-06-04 03:08:21 +08:00
  • 19bdaf32b1 [Doc] Readme standardization (#18695) SorenDreano 2025-06-03 20:50:55 +02:00
  • 02f0c7b220 [Misc] Add SPDX-FileCopyrightText (#19100) Simon Mo 2025-06-03 11:20:17 -07:00
  • d054da1992 [Misc] fix: add miss best_of param validation (#18555) CYJiang 2025-06-04 02:02:07 +08:00
  • 4b7817c119 [Misc] Add missing _Backend enums (#19081) Nicolò Lucchesi 2025-06-03 18:15:16 +02:00
  • d00dd65cd4 [Doc] Improve the Pull Request template with key components (#19086) Lu Fang 2025-06-03 23:44:34 +08:00
  • d81edded69 [Bugfix] disable processor cache (#19068) Raushan Turganbay 2025-06-03 17:06:04 +02:00
  • 476844d44c Fix underscores in dict keys passed via CLI (#19030) Harry Mellor 2025-06-03 15:39:24 +01:00
  • 4e68ae5e59 [CI/Build] Remove V0 LoRA test (#19066) Jee Jee Li 2025-06-03 22:30:18 +08:00
  • 4e88723f32 [doc] clarify windows support (#19088) youkaichao 2025-06-03 21:42:17 +08:00
  • 118ff92111 [Doc] Update V1 user guide for embedding and enc-dec models (#19060) Cyrus Leung 2025-06-03 17:29:41 +08:00
  • ec2dcd80bc [Misc] Update WeightsMapper for qwen2-vl/qwen2.5-vl (#19054) Isotr0py 2025-06-03 17:08:20 +08:00
  • 42243fbda0 [Doc] Add InternVL LoRA support (#19055) Jee Jee Li 2025-06-03 17:08:03 +08:00
  • 6d18ed2a2e Update docker docs with ARM CUDA cross-compile (#19037) Michael Goin 2025-06-03 04:21:53 -04:00
  • f32fcd9444 [v1][KVCacheManager] Rename BlockHashType to BlockHash (#19015) Chen Zhang 2025-06-03 16:01:48 +08:00
  • d32aa2e670 [Bugfix] Use cmake 3.26.1 instead of 3.26 to avoid build failure (#19019) Lu Fang 2025-06-03 15:16:17 +08:00
  • cc977286e7 Reduce logs in CLI scripts and plugin loader (#18970) Michael Goin 2025-06-03 02:00:45 -04:00
  • 17430e3653 [bugfix] small fix logic issue (#18999) Reid 2025-06-03 13:35:12 +08:00
  • 1282bd812e Add tarsier model support (#18985) 汪志鹏 2025-06-03 13:13:13 +08:00
  • bdce64f236 [V1] Support DP with Ray (#18779) Rui Qiao 2025-06-02 21:15:13 -07:00
  • 9e6f61e8c3 [ROCm][Build] Clean up the ROCm build (#19040) Gregory Shtrasberg 2025-06-02 23:47:47 -04:00
  • 8655f47f37 [CPU][CI] Re-enable the CPU CI tests (#19046) Li, Jiang 2025-06-03 11:46:47 +08:00
  • 4ce42f9204 Adding "LoRA Test %N" to AMD production tests (#18929) Concurrensee 2025-06-02 22:46:44 -05:00
  • 8a57872b2a [Bugfix][EP+DP] Use pplx-kernel internode instead of intranode (#19034) Tyler Michael Smith 2025-06-02 23:36:51 -04:00
  • 5bc1ad6cee [Doc] Remove duplicate TOCs during MkDocs migration (#19021) Hyogeun Oh (오효근) 2025-06-03 11:49:48 +09:00