Commit Graph

  • ce20124671 [release] Add force remove for TPU logs (#14697) Kevin H. Luu 2025-03-12 15:35:18 -07:00
  • 53be4a8634 [V1] Allow sliding window + prefix caching (#13069) Woosuk Kwon 2025-03-12 11:21:19 -07:00
  • f5d3acd474 [BugFix][V1] Fix parallel sampling finishing/aborts (#14512) Nick Hill 2025-03-12 13:29:48 -04:00
  • 916836bbfb [FEAT] [ROCm] [Embedding] Add encoder-only model support into ROCm Flash Attention to enable embedding models. (#14664) TJian 2025-03-13 00:31:19 +08:00
  • d9f83d6206 [ROCm] Enable chunked prefill/paged attention in MLA on ROCm (#14316) Sage Moore 2025-03-12 08:51:20 -07:00
  • 4a754fcf15 [Bugfix] Missing thumbnail from NVLM-D processor (#14633) ameyanjarlekar 2025-03-12 08:50:49 -07:00
  • c0c25e25fa [Model] Add support for Gemma 3 (#14660) Woosuk Kwon 2025-03-12 08:36:33 -07:00
  • 45f3f3f59e [ROCm][Bugfix] Ensure that the moe_wna16_gemm kernel is not built on ROCm platforms. (#14629) Sage Moore 2025-03-12 05:00:28 -07:00
  • ff47aab056 [CPU] Upgrade CPU backend to torch-2.6 (#13381) Li, Jiang 2025-03-12 18:41:13 +08:00
  • debd6bbf09 [Kernel] Add ModelOpt FP4 Checkpoint Support (#12520) Pavani Majety 2025-03-11 22:13:11 -07:00
  • 5c538c37b2 [V1][Bugfix][Spec Decode] Fix incorrect outputs in V1 speculative decoding due to batch indexing (#14645) Benjamin Chislett 2025-03-12 01:12:41 -04:00
  • e22ee1e7a2 [Kernel] GGUF MoE kernel (#14613) Szymon Ożóg 2025-03-12 04:33:27 +01:00
  • e392d85831 [Core] Refactor QKVCrossParallelLinear implementation to support BNB 4-bit quantization (#14545) Isotr0py 2025-03-12 11:12:52 +08:00
  • 77a318bd01 [V1][Core] Support MistralTokenizer for Structured Output (#14625) Aaron Pham 2025-03-11 22:40:09 -04:00
  • 80e78d02ac [Model] Extend Ultravox to accept audio longer than 30s (#13631) Farzad Abdolhosseini 2025-03-11 19:27:10 -07:00
  • 4a42b9f5d6 [Doc] Update benchmarks README (#14646) Jennifer Zhao 2025-03-11 19:23:04 -07:00
  • 47532cd9f4 [core][V1] pluggable scheduler (#14466) Joe Runde 2025-03-11 19:15:15 -06:00
  • 36e0c8f7da [Feature] Add vllm bench CLI (#13993) Randy Chen 2025-03-11 17:31:48 -07:00
  • 9f583e360c [release] Add commands to clean up logs on TPU release node (#14642) Kevin H. Luu 2025-03-11 17:14:50 -07:00
  • b706d898af [Bugfix][V1][PP] Only warmup sampler at last PP rank (#14643) Cody Yu 2025-03-11 16:40:07 -07:00
  • 863d315c86 [V1][TPU] Pad the block_table.shape[1] so the ragged paged attention can handle correctly (#14597) iefgnoix 2025-03-11 16:12:26 -07:00
  • d374f04a33 Fix run_tpu_test (#14641) Richard Liu 2025-03-11 14:14:33 -07:00
  • 61a01b27a7 [V1] Delay all xgrammar usage until needed (#14616) Russell Bryant 2025-03-11 16:21:33 -04:00
  • 53056731fd fix some typos : supported_head_sizes (#14627) Yang.Tao 2025-03-12 01:38:24 +08:00
  • 4cbf286794 [V1] Remove cache from StructuredOutputManager (#14622) Russell Bryant 2025-03-11 13:36:07 -04:00
  • c6e14a61ab [Hardware][Intel GPU] upgrade IPEX dependency to 2.6.10. (#14564) Kunshang Ji 2025-03-11 10:11:47 -07:00
  • 07b4b7a37f [BugFix/Build] Fix sparse kernels not getting built on hopper (#14572) Lucas Wilkinson 2025-03-11 13:09:03 -04:00
  • 07964e2f30 docs: Add documentation for s390x cpu implementation (#14198) Dilip Gowda Bhagavan 2025-03-11 22:32:17 +05:30
  • 4bf82d4b90 [V1] Add regex structured output support with xgrammar (#14590) Russell Bryant 2025-03-11 11:03:44 -04:00
  • 9ab326713f Uninstall dependencies before installing requirements/tpu.txt (#14586) Richard Liu 2025-03-11 08:01:35 -07:00
  • af295e9b01 [Bugfix] Update --hf-overrides for Alibaba-NLP/gte-Qwen2 (#14609) Cyrus Leung 2025-03-11 22:59:43 +08:00
  • a1c8f3796c dynamic distpatch of fp8 kernels (#14245) Jeff Daily 2025-03-11 07:54:56 -07:00
  • 08a1a1121d benchmarks: simplify test jsonschema (#14567) Russell Bryant 2025-03-11 09:39:30 -04:00
  • 1477ffc381 [VLM] Cleanup siglip legacy code and fix broken paligemma multimodal processor (#14602) Isotr0py 2025-03-11 19:27:36 +08:00
  • 70b808fe1a [Perf]:Optimize qwen2-vl to reduce cudaMemcpyAsync (#14377) yexin(叶鑫) 2025-03-11 15:39:56 +08:00
  • 63d635d179 [Misc] Correct deepseek-vl2 chat template (#14558) Isotr0py 2025-03-11 12:37:11 +08:00
  • 1fc973c0b5 [V1][Core] Fix memory issue with logits & sampling (#14508) Roger Wang 2025-03-10 21:03:41 -07:00
  • c982ac5722 [Bugfix] Fix FP16 overflow for DeepSeek V2 (#13232) Concurrensee 2025-03-10 22:46:59 -05:00
  • 4290b704ff [V1][PP] Do not block engine core when no requests to schedule (#14585) Cody Yu 2025-03-10 19:48:24 -07:00
  • c91b64f749 [neuron] add reshape_and_cache (#14391) Liangfu Chen 2025-03-10 18:37:29 -07:00
  • d6123170d5 [Neuron] Add Neuron device communicator for vLLM v1 (#14085) gnovack 2025-03-10 18:37:04 -07:00
  • 485afdd3cb [MISC][V1] Handle exception of current_platform.get_device_name() in arg_utils (#14379) Cody Yu 2025-03-10 17:42:11 -07:00
  • 90e88ab756 [Kernel] moe wna16 cuda kernel (#13321) Jinzhen Lin 2025-03-11 08:12:40 +08:00
  • 04421dff8a [V1] Prevent xgrammar from breaking TPU support (#14575) Russell Bryant 2025-03-10 19:06:19 -04:00
  • 432d6dad15 Fix typo in benchmark_serving_structured_output.py (#14566) Russell Bryant 2025-03-10 17:58:58 -04:00
  • 5ff0d32580 [V1] LoRA - Add triton kernels for V1 (#13096) Varun Sundar Rabindranath 2025-03-10 17:27:53 -04:00
  • 0967110e42 [Minor] Update the tqdm bar for parallel sampling (#14571) Woosuk Kwon 2025-03-10 14:23:48 -07:00
  • fb0acb6c72 [Perf] Improve MLA on V1 (#14540) Simon Mo 2025-03-10 12:06:58 -07:00
  • 92b0ce2ac7 [Bugfix][v1] fixed llava-hf/llava-1.5-7b-hf is broken on V1 (#14554) Chauncey 2025-03-11 02:24:51 +08:00
  • bc2d4473bf [Docs] Make installation URLs nicer (#14556) Harry Mellor 2025-03-10 18:43:08 +01:00
  • 3b352a2f92 Correct capitalisation: VLLM -> vLLM (#14562) Harry Mellor 2025-03-10 17:36:21 +01:00
  • dea985aef0 [V1][Bugfix] Fix handing of second_per_grid_ts for Qwen2-VL & Qwen2.5-VL (#14548) Roger Wang 2025-03-10 09:03:11 -07:00
  • 39be30351f Correct capitalisation: Github -> GitHub (#14561) Harry Mellor 2025-03-10 16:53:33 +01:00
  • 001a9c7b0d [Doc] Update PaliGemma note to a warning (#14565) Cyrus Leung 2025-03-10 23:02:28 +08:00
  • 89cdaa83e7 [Kernel] Add more dtype support for GGUF kernels (#14043) Szymon Ożóg 2025-03-10 15:30:04 +01:00
  • b0746fae3d [Frontend] support image embeds (#13955) Chauncey 2025-03-10 20:36:03 +08:00
  • 60a98b2de5 [Docs] Mention model_impl arg when explaining Transformers fallback (#14552) Harry Mellor 2025-03-10 13:13:10 +01:00
  • 460f553a6d [Misc] Add log information for handle_process_request. (#14130) Chauncey 2025-03-10 16:40:50 +08:00
  • 1253b15774 [Feature] Consolidate performance benchmark datasets (#14036) Jennifer Zhao 2025-03-10 00:23:11 -07:00
  • dc74613fa2 [Bugfix] Wrong requirements path - rocm (#14527) Martin Hoyer 2025-03-10 03:49:46 +01:00
  • a21076ed3a [Misc] Ensure out-of-tree quantization method recognize by cli args (#14328) Yanyi Liu 2025-03-09 20:13:31 +08:00
  • 212007b168 [Hardware][TPU] Fix the recompiling issue in logits processor after warmup (#14510) Chengji Yao 2025-03-09 01:44:39 -08:00
  • fb16eea48b [Bugfix] Revert QKVCrossParallelLinear usage in Mllama to keep BNB quantization work (#14498) Isotr0py 2025-03-09 12:47:45 +08:00
  • 73ae0b44e9 [Bugfix] Fix tqdm progress bar when SamplingParams.n > 1 (#12428) Yuchen Yan 2025-03-09 12:14:53 +08:00
  • 6d7f037748 [Feat] Support chunked prefill for LMCache connector (#14505) Jiayi Yao 2025-03-08 21:30:06 -06:00
  • 10f7552789 [V1][TPU] Remove unnecessary padding for running on TPU. (#14467) iefgnoix 2025-03-08 18:56:04 -08:00
  • b0d541947a [Attention] Default to FlashMLA backend for MLA (#14451) Lucas Wilkinson 2025-03-08 21:18:39 -05:00
  • 5f0b53c6ea Revert "[V1][Core] Fix memory issue with logits & sampling" (#14504) Robert Shaw 2025-03-08 20:43:37 -05:00
  • eb8b5eb183 [V1] Support bad_words in sampler (#13376) 22quinn 2025-03-08 14:50:26 -08:00
  • 9513290032 [Misc] Upgrade to Python 3.9 typing for additional directories (#14492) Cyrus Leung 2025-03-09 01:35:50 +08:00
  • 0d5e73d30e Update CODEOWNERS for structured output (#14496) Russell Bryant 2025-03-08 12:19:51 -05:00
  • 609ef61fea [Bugfix] Fix profiling OOM and decouple encoder multimodal profiling (#14361) Isotr0py 2025-03-09 00:52:34 +08:00
  • db84f5eb3b [Bugfix] DeepSeek Accuracy (#14476) Lucas Wilkinson 2025-03-08 11:47:03 -05:00
  • 206e2577fa Move requirements into their own directory (#12547) Harry Mellor 2025-03-08 17:44:35 +01:00
  • e02883c400 [Misc] Don't run ruff at all on 3rd party libs (#14493) Cyrus Leung 2025-03-08 23:16:40 +08:00
  • 9085aabd62 [benchmarks] Add option to use unique jsonschema for each request (#14457) Russell Bryant 2025-03-08 09:36:39 -05:00
  • 8d5aa466fb [V1][Core] Fix memory issue with logits & sampling (#13776) Roger Wang 2025-03-08 06:11:04 -08:00
  • 0b7f06b447 [Misc] add use_tqdm_on_load to reduce logs (#14407) Aaron Pham 2025-03-08 08:57:46 -05:00
  • 03fe18ae0f [VLM] Add TP support for Phi-4-MM (#14453) Isotr0py 2025-03-08 21:57:14 +08:00
  • cb8bdfade2 [V1] TPU - Add tensor parallel support via Ray (#13618) Alexander Matveev 2025-03-08 08:19:38 -05:00
  • 33f227e16b [CI/Build] Use a fixed seed to avoid flaky tests (#14480) Cyrus Leung 2025-03-08 19:30:09 +08:00
  • cfd0ae8234 Add RLHF document (#14482) Harry Mellor 2025-03-08 10:51:39 +01:00
  • 7caff01a7b [Build/BugFix] Fix hopper 12.8 build (#14354) Lucas Wilkinson 2025-03-08 03:11:56 -05:00
  • be0b399d74 Add training doc signposting to TRL (#14439) Harry Mellor 2025-03-08 08:35:07 +01:00
  • b8b0ccbd2d [Bugfix] Make the deviceprofiler include LoRA memory. (#14469) Jee Jee Li 2025-03-08 15:12:22 +08:00
  • c908a07f57 [Doc] Added QwQ-32B to the supported models list in the reasoning out… (#14479) Robin 2025-03-08 15:07:32 +08:00
  • 7b6fd6e486 [Doc]add doc for Qwen models tool calling (#14478) Robin 2025-03-08 14:58:46 +08:00
  • 47512b3200 Default to generation_config from model (#12622) Harry Mellor 2025-03-08 07:46:15 +01:00
  • 3b9c6c6947 [CI/Build] refactor: set timezone of container to UTC (#12888) Roger Meier 2025-03-08 07:42:01 +01:00
  • 4aae667668 [core] add extra_args to SamplingParams (#13300) Aviv Keshet 2025-03-07 22:41:18 -08:00
  • 9f3bc0f58c [MISC][V1] Register process killing handler only in the main thread (#14380) Cody Yu 2025-03-07 22:40:06 -08:00
  • 980385f8c1 [Bugfix][Disaggregated] Add a check in send_kv_caches_and_hidden_states and fix the reshape of the KVCache (#14369) Mathis Felardos 2025-03-08 07:39:31 +01:00
  • ca7a2d5f28 Revert "[Perf] Reduce MLA CPU overheads in V1 (#14384)" (#14471) Tyler Michael Smith 2025-03-08 01:18:53 -05:00
  • 333681408f [Bugfix][V1] Handle MLA in kv_cache_interface (#14462) Tyler Michael Smith 2025-03-08 01:18:25 -05:00
  • ef64044079 [V1] Prompt logprobs + APC compatibility; prompt logprobs reqs cannot fill APC (#13949) afeldman-nm 2025-03-07 20:48:12 -05:00
  • 66e16a038e [Bugfix] Fix torch_xla which can't handle None seed introduced in #14274 (#14459) yarongmu-google 2025-03-07 15:17:04 -08:00
  • e1f0835ae0 [V1][Metrics] Fix traceback with preemptions+LoRA (#14220) Mark McLoughlin 2025-03-07 20:36:16 +00:00
  • 8ed5421aaa [V1] Eagerly remove finished requests from the batch (#14388) Nick Hill 2025-03-07 10:56:00 -08:00
  • c6359e8ca6 [v1] torch.compile integration explanation (#14437) youkaichao 2025-03-08 01:55:50 +08:00
  • 952a074980 [Misc] Add Phi4-MM example (#14343) Jee Jee Li 2025-03-08 01:28:52 +08:00