Commit Graph

  • 3496274663 [Misc] Convert VLLM_TORCH_PROFILER_DIR path to absolute (#23191) Ning Xie 2025-08-22 03:49:09 +08:00
  • 8a19303173 [BugFix][gpt-oss] Fix Chat Completion with Multiple Output Message (#23318) Chen Zhang 2025-08-21 10:31:11 -07:00
  • 603fbbbce0 [Misc] Misc code cleanup/simplification (#23304) Nick Hill 2025-08-21 10:22:55 -07:00
  • 10f535c086 [Bugfix] Fix port conflict by obtaining a list of open ports upfront (#21894) Ming Yang 2025-08-21 10:22:18 -07:00
  • 48bfb0c9b7 [Bug] Fix R1 Accuracy 0 Bug (#23294) Wentao Ye 2025-08-21 13:11:28 -04:00
  • f8ce022948 add tg-mxfp4-moe-test (#22540) Lain 2025-08-21 10:05:47 -07:00
  • 0278f1ac3a Fix nvfp4 swizzling (#23140) Yi Liu 2025-08-22 00:54:50 +08:00
  • a482e4e769 Migrate MolmoImageInputs to TensorSchema (#22022) Benji Beck 2025-08-21 09:54:08 -07:00
  • e0b056e443 [ci/build] Fix abi tag for aarch64 (#23329) youkaichao 2025-08-21 23:32:55 +08:00
  • 79f05e4436 [Multimodal] Always enable hashing mm data (#23308) Roger Wang 2025-08-21 07:23:28 -07:00
  • f8daddcc4c [Bugfix] set system_message in phi4mini chat template (#23309) jerryzhuang 2025-08-22 00:22:39 +10:00
  • c8e33c72c6 [V1] Remove unnecessary check for main thread (#23298) Robert Shaw 2025-08-21 10:08:35 -04:00
  • d70a16625d [Performance] V1 Pooling Models E2E Performance Optimization (#23162) wang.yuqi 2025-08-21 21:26:09 +08:00
  • 5cc54f7c5b [Doc] Fix batch-level DP example (#23325) Cyrus Leung 2025-08-21 21:16:38 +08:00
  • 0c6e40bbaa [Refactor] Simplify code for MM budget (#23310) Cyrus Leung 2025-08-21 16:00:16 +08:00
  • 2e2000f352 [Model] Add LFM2 architecture (#22845) Paul Pak 2025-08-21 01:35:07 -06:00
  • 31282401b6 [BugFix] Fix Python 3.9 Support (#23306) Jared O'Connell 2025-08-21 02:23:56 -04:00
  • 0c31e28e95 [Bugfix] Fix extra whitespace in strings caused by newline (#23272) Cyrus Leung 2025-08-21 13:03:00 +08:00
  • f571ff8eb6 [Sampler] Support returning final logprobs (#22387) 22quinn 2025-08-20 21:28:32 -07:00
  • f64ee61d9e [CI] Block the cu126 wheel build while broken (#23285) Michael Goin 2025-08-21 00:21:05 -04:00
  • 8993073dc1 [CI] Delete images older than 24h. (#23291) QiliangCui 2025-08-21 04:15:20 +00:00
  • 655a09f653 [Model][VLM] Support R-4B Model (#23246) 杨奇(yann qi) 2025-08-21 12:08:52 +08:00
  • f94bf9b924 [Compile] Fix Compile Warning SM100 Cutlass MLA (#23287) Wentao Ye 2025-08-20 23:09:39 -04:00
  • 3663870c72 [V1][Mamba1] - Full CUDA and Piecewise CUDA Graphs Support (#23035) Asaf Joseph Gardin 2025-08-21 06:08:51 +03:00
  • 2461d9e562 [CI/Build] Split out mm processor tests (#23260) Cyrus Leung 2025-08-21 11:05:20 +08:00
  • 7be5d113d8 [CPU] Refactor CPU W8A8 scaled_mm (#23071) Li, Jiang 2025-08-21 09:34:24 +08:00
  • b029de9902 [Optimization] Make new_block_ids None if empty (#23262) Woosuk Kwon 2025-08-20 18:25:56 -07:00
  • bbea1cefdd [CI Bugfix] Fix CI by fully removing --enable-prompt-adapter (#23284) Michael Goin 2025-08-20 20:18:12 -04:00
  • f5aa307d77 Remove duplicate entry in vllm.attention.__all__ (#23296) Russell Bryant 2025-08-20 20:14:59 -04:00
  • 4b795020ed [EP] Add logging for experts map (#22685) 22quinn 2025-08-20 16:46:06 -07:00
  • c86af22f31 [Fix] remove is_marlin param in benchmark_moe (#23286) shixianc 2025-08-20 15:04:21 -07:00
  • 10cc12ba66 Feature/mla tests (#23195) Matthew Bonanni 2025-08-20 17:46:47 -04:00
  • a4fbb32fab Remove chunked_prefill_enabled flag in V1 MLA (#23183) Matthew Bonanni 2025-08-20 17:43:17 -04:00
  • 1b125004be [misc] fix multiple arch wheels for the nightly index (#23110) youkaichao 2025-08-21 05:15:34 +08:00
  • 4fbda0b20c [Feature] use --eplb_config to set eplb param (#20562) rongfu.leng 2025-08-21 05:07:28 +08:00
  • 1da94e673c Do not use eval() to convert unknown types (#23266) (tag: v0.10.1.1) Russell Bryant 2025-08-20 16:28:30 -04:00
  • d8b736f913 Limit HTTP header count and size (#23267) Russell Bryant 2025-08-20 13:57:37 -04:00
  • 3a8708f60a [BugFix] fix CUTLASS MLA full cudagraph (#23200) Lucas Wilkinson 2025-08-19 18:17:08 -04:00
  • 4e51fa8cba Do not use eval() to convert unknown types (#23266) Russell Bryant 2025-08-20 16:28:30 -04:00
  • bf7c99dfc4 [Perf] Speed up function _convert_tokens_to_string_with_added_encoders by 13.7x (#20413) Saurabh Misra 2025-08-20 13:17:11 -07:00
  • b95697d731 [Frontend] improve error logging of chat completion (#22957) Chen Zhang 2025-08-20 13:03:37 -07:00
  • 582bbe6bd7 [Fix] correct tool_id for kimi-k2 when use tool_choice=required (#21259) bigmoyan 2025-08-21 03:59:54 +08:00
  • 0cdbf5e61c [Kernel/Quant] Remove the original marlin format and qqq (#23204) Michael Goin 2025-08-20 15:13:36 -04:00
  • ebe56a0064 Small fix for Command-A-Vision (#23268) dongluw 2025-08-20 14:15:18 -04:00
  • f77a0802b7 Limit HTTP header count and size (#23267) Russell Bryant 2025-08-20 13:57:37 -04:00
  • c4477f55e5 Migrate Mistral3ImagePixelInputs to TensorSchema (#21945) Benji Beck 2025-08-20 10:37:29 -07:00
  • dfd2382039 [torch.compile] Support conditional torch.compile per module (#22269) Yong Hoon Shin 2025-08-20 09:52:59 -07:00
  • 3b11b26b50 [FIXBUG ] Allow disabling rocm_aiter_fa backend for ROCm GPUs not compatible with AITER (#22795) JartX 2025-08-20 18:08:29 +02:00
  • d6d13bd49e [Misc] Add max_seq_len to CommonAttentionMetadata (#23216) Woosuk Kwon 2025-08-20 09:05:29 -07:00
  • 5efd6905bc [CLI][Doc] Formalize --mm-encoder-tp-mode (#23190) Cyrus Leung 2025-08-20 23:42:28 +08:00
  • b17109beea [Kernel] CUTLASS MoE FP8: Integrate cuda moe permute/unpermute (#23045) shixianc 2025-08-20 07:35:26 -07:00
  • 4449235843 [Bugfix] Ensure correctness of HCXVision processing (#23254) Cyrus Leung 2025-08-20 22:19:30 +08:00
  • 38217877aa [Fix] fix offline env use local mode path (#22526) rongfu.leng 2025-08-20 21:34:49 +08:00
  • c6d80a7a96 [Model] Improve olmo and olmo2 (#23228) Jee Jee Li 2025-08-20 20:47:05 +08:00
  • 7cd17e22d7 [Model][V1] Support Ernie MTP (#22169) xyxinyang 2025-08-20 20:41:55 +08:00
  • 50df09fe13 Update to flashinfer-python==0.2.12 and disable AOT compile for non-release image (#23129) Michael Goin 2025-08-20 08:05:54 -04:00
  • 68fcd3fa73 [Bugfix] Ensure correctness of Cohere2Vision processing (#23245) Cyrus Leung 2025-08-20 19:09:18 +08:00
  • 83e69a09d6 [Model] Support deepseek with eagle (#21086) Xin Yang 2025-08-20 04:01:31 -07:00
  • 3aa8c10038 Fix missing quotes (#23242) Shiming Zhang 2025-08-20 18:46:59 +08:00
  • 103f1ec8d3 [Model] use autoWeightsLoader for gptoss (#22446) Calvin Chen 2025-08-20 18:16:27 +08:00
  • d983769c41 fix cuda graph (#22721) who who who 2025-08-20 14:24:37 +08:00
  • 8fd920924c [BugFix] Fix stuck stats/metrics after requests are aborted (#22995) Nick Hill 2025-08-19 22:50:29 -07:00
  • de7b67a023 [CI/Build] Sync multimodal tests (#23181) Cyrus Leung 2025-08-20 13:06:42 +08:00
  • f729023272 [CI/Build] Also check DP in benchmarks throughput script (#23038) Zhewen Li 2025-08-19 21:09:27 -07:00
  • 1a3079a15e chore: support pytorch format in lora (#22790) 길재은 2025-08-20 13:02:50 +09:00
  • 941f56858a Fix a performance comparison issue in Benchmark Suite (#23047) Louie Tsai 2025-08-19 20:14:32 -07:00
  • a634733f67 [Attention] Optimize make_local_attention_virtual_batches for Flash Attention (#23185) Zebing Lin 2025-08-19 22:57:47 -04:00
  • 64ab3c7253 [Doc] Update V1 status of various pooling models (#23189) Cyrus Leung 2025-08-20 10:33:41 +08:00
  • e58c5a9768 [Core] Add torch profiler CPU traces for AsyncLLM. (#21794) Chenheli Hua 2025-08-19 19:32:47 -07:00
  • d46d417b58 [CI Perf] Only test bfloat16 for tests/compile/test_fusion_all_reduce.py (#23132) Michael Goin 2025-08-19 22:18:52 -04:00
  • 0167efe20d [Core] Optimize scheduler request removal for single completions (#21917) 633WHU 2025-08-20 09:25:59 +08:00
  • c32e6ad1f6 [Quantization] Bump Compressed Tensors Version (#23202) Kyle Sayers 2025-08-19 20:39:28 -04:00
  • 1630cc8d0f [Benchmarks] Add video inputs to ShareGPTDataset. (#23199) Chenheli Hua 2025-08-19 16:42:31 -07:00
  • 14e2b0730b [BugFix] fix CUTLASS MLA full cudagraph (#23200) Lucas Wilkinson 2025-08-19 18:17:08 -04:00
  • 0f4f0191d8 [CI/Build] Replace lm-eval gsm8k tests with faster implementation (#23002) Michael Goin 2025-08-19 18:07:30 -04:00
  • a38b8af4c3 [NVIDIA] Add SM100 Flashinfer Cutlass MoE fp8 backend (#22357) amirkl94 2025-08-20 01:01:53 +03:00
  • 21dce80ea9 [CI/Build] Add support for Python 3.13 (#13164) Michael Goin 2025-08-19 16:49:34 -04:00
  • e61bac87ee [Misc] Minor refactoring for FlashInfer backend (#23147) Woosuk Kwon 2025-08-19 13:11:51 -07:00
  • 80141bbf2f fix: use cache_salt for gpt-oss (#23186) Marko Rosenmueller 2025-08-19 20:12:25 +02:00
  • b94faf9d50 [Bugfix] Fix accuracy issue when using flashinfer cutlass moe, TP=1 and modelopt. (#23125) bnellnm 2025-08-19 14:00:51 -04:00
  • 5b5f350d67 [Misc] Enable yapf for FlashInfer backend (#23193) Woosuk Kwon 2025-08-19 10:33:47 -07:00
  • f7cf5b512e [Frontend] Add /collective_rpc API endpoint (#23075) 22quinn 2025-08-19 10:29:32 -07:00
  • 03d4235fd2 [Misc] Fix the benchmark's README and improve the error messages for the benchmark's argument checks (#22654) Ruixiang Tan 2025-08-20 01:18:51 +08:00
  • d6a1a20973 [CI/Build] Update transformers to v4.55.2 (#23093) Isotr0py 2025-08-20 01:06:17 +08:00
  • a70d0bd0a3 Migrate LlavaOnevisionMultiInputs to TensorSchema (#21844) Benji Beck 2025-08-19 10:02:02 -07:00
  • 24f4d1a224 Add return_token_ids parameter to OpenAI API endpoints (#22587) Yuge Zhang 2025-08-20 00:48:31 +08:00
  • 4f510bc2a1 [Model] Removes redundant all-reduce operation in Qwen3MoeSparseMoeBlock (#23169) yiz-liu 2025-08-20 00:18:41 +08:00
  • 1298c67795 [FEAT] [Performance] Enable DP for ViT in Qwen2.5VL (#22742) TJian 2025-08-19 08:25:57 -07:00
  • 4d9c61993a [Bugfix] Fix benchmark_moe.py (#23177) Jee Jee Li 2025-08-19 21:39:40 +08:00
  • b87cb97a53 [Model] support new model ovis2.5 (#23084) myselvess 2025-08-19 21:12:59 +08:00
  • f856c33ce9 [Model] Add multi_label_classification support (#23173) wang.yuqi 2025-08-19 20:54:30 +08:00
  • 03752dba8f [NVIDIA] Support Flashinfer TRTLLM FP8-q/kv/out Attention Kernel (#21716) elvischenv 2025-08-19 20:22:15 +08:00
  • 40f26734b9 [Misc] Fix seq_lens for graph capture (#23175) Woosuk Kwon 2025-08-19 03:58:16 -07:00
  • 2c3f557f08 [Doc] use power of 2 (#23172) Tialo 2025-08-19 13:16:23 +03:00
  • 21bcc8263f [Misc] Avoid accessing req_ids inside a loop (#23159) Woosuk Kwon 2025-08-19 02:39:38 -07:00
  • 5bfe0dea7a [bug fix] Fix llama4 spec decoding (#22691) qizixi 2025-08-19 17:53:24 +09:00
  • 31fd3265c8 [Bugfix] Fix broken Minimax-01-VL model (#22116) Isotr0py 2025-08-19 16:49:29 +08:00
  • 31436e8b4f [Misc] Add request_id into benchmark_serve.py (#23065) hustxiayang 2025-08-19 04:32:18 -04:00
  • 4efd43e9b4 Fix GLM-4.5V-FP8 numerical issue (#22949) qizixi 2025-08-19 16:56:31 +09:00
  • 3c8a787247 [Benchmark] Add flag --served-model-name to benchmark_serving_multi_turn (#22889) Daniel Serebrenik 2025-08-19 10:48:07 +03:00