Commit Graph

  • 42d9a2c4c7 doc: fix bug report Github template formatting (#17486) David Xia 2025-04-30 13:03:20 -04:00
  • 2ac74d098e [doc] add install tips (#17373) Reid 2025-05-01 01:02:41 +08:00
  • 584f5fb4c6 [Bugfix][ROCm] Restrict ray version due to a breaking release (#17480) Gregory Shtrasberg 2025-04-30 12:59:06 -04:00
  • d586ddc691 [BugFix] Fix authorization of openai_transcription_client.py (#17321) zh Wang 2025-05-01 00:51:05 +08:00
  • 0b7e701dd4 [Docs] Update optimization.md doc (#17482) Michael Goin 2025-04-30 10:34:02 -06:00
  • 947f2f5375 [V1] Allow turning off pickle fallback in vllm.v1.serial_utils (#17427) Russell Bryant 2025-04-30 12:10:54 -04:00
  • 739e03b344 [Bugfix] Fixed mistral tokenizer path when pointing to file (#17457) Pete Savage 2025-04-30 16:08:37 +01:00
  • da4e7687b5 [Fix] Support passing args to logger (#17425) Aaron Pham 2025-04-30 11:06:58 -04:00
  • 39317cf42b [Docs] Add command for running mypy tests from CI (#17475) Russell Bryant 2025-04-30 11:06:09 -04:00
  • 2990cee95b [Feature] The Qwen3 reasoning parser supports guided decoding (#17466) Chauncey 2025-04-30 22:48:21 +08:00
  • 0be6d05b5e [V1][Metrics] add support for kv event publishing (#16750) Alec 2025-04-30 16:44:45 +02:00
  • 77073c77bc [Core] Prevent side-channel attacks via cache salting (#17045) Marko Rosenmueller 2025-04-30 14:27:21 +02:00
  • a7d5b016bd [TPU][V1][CI] Update regression test baseline for v6 CI (#17064) Nicolò Lucchesi 2025-04-30 13:03:22 +02:00
  • d803786731 [V1][Bugfix]: vllm v1 verison metric num_gpu_blocks is None (#15755) rongfu.leng 2025-04-30 18:20:39 +08:00
  • 1534d389af [Misc] Remove deprecated files (#17447) Chauncey 2025-04-30 16:52:19 +08:00
  • ece5a8b0b6 Make the _apply_rotary_emb compatible with dynamo (#17435) Lu Fang 2025-04-30 00:52:48 -07:00
  • 54072f315f [MODEL ADDITION] Ovis2 Model Addition (#15826) Marco 2025-04-30 09:33:29 +02:00
  • be633fba0f [Bugfix] Fix AttributeError: 'State' object has no attribute 'engine_client' (#17434) Chauncey 2025-04-30 15:11:04 +08:00
  • ed6cfb90c8 [Hardware][Intel GPU] Upgrade to torch 2.7 (#17444) Kunshang Ji 2025-04-30 15:03:58 +08:00
  • 6ed9f6047e [Intel GPU] [CI]Fix XPU ci, setuptools >=80.0 have build issue (#17298) Kunshang Ji 2025-04-30 13:54:10 +08:00
  • a44c4f1d2f Support LoRA for Mistral3 (#17428) Michael Goin 2025-04-29 22:10:30 -06:00
  • 88fcf00dda Fix some speculative decode tests with tl.dot (#17371) Huy Do 2025-04-29 19:41:02 -07:00
  • d1f569b1b9 Fix call to logger.info_once (#17416) Harry Mellor 2025-04-30 03:39:18 +01:00
  • 13698db634 Improve configs - ModelConfig (#17130) Harry Mellor 2025-04-30 03:38:22 +01:00
  • 2c4f59afc3 Update PyTorch to 2.7.0 (#16859) Huy Do 2025-04-29 19:08:04 -07:00
  • 1c2bc7ead0 Truncation control for embedding models (#14776) Gabriel Marinho 2025-04-29 22:24:57 -03:00
  • 4055130a85 [release] Always git fetch all to get latest tag on TPU release (#17322) Kevin H. Luu 2025-04-29 17:52:11 -07:00
  • 34120f5acd [V1][Feature] Enable Speculative Decoding with Structured Outputs (#14702) Benjamin Chislett 2025-04-29 17:02:10 -07:00
  • 7489ec0bab Remove Bamba 9B from CI (#17407) Harry Mellor 2025-04-29 22:10:31 +01:00
  • 70788bdbdc [V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE (#17211) Bryan Lu 2025-04-29 14:10:00 -07:00
  • c9c1b59e59 Fix: Python package installation for opentelmetry (#17049) Dilip Gowda Bhagavan 2025-04-30 01:50:24 +05:30
  • 0350809f3a Remove Falcon3 2x7B from CI (#17404) Harry Mellor 2025-04-29 20:52:25 +01:00
  • a6977dbd15 Simplify (and fix) passing of guided decoding backend options (#17008) Harry Mellor 2025-04-29 20:02:23 +01:00
  • 2fa2a50bf9 [Bugfix] Fix Minicpm-O-int4 GPTQ model inference (#17397) Isotr0py 2025-04-30 02:21:42 +08:00
  • 08e15defa9 [CI/Build] Add retry mechanism for add-apt-repository (#17107) Reid 2025-04-30 01:40:52 +08:00
  • b37685afbb [CI] Uses Python 3.11 for TPU (#17359) Aaron Pham 2025-04-29 13:39:16 -04:00
  • 792595b59d [TPU][V1][CI] Replace python3 setup.py develop with standard pip install --e on TPU (#17374) Nicolò Lucchesi 2025-04-29 19:36:48 +02:00
  • 0c1c788312 [Doc][Typo] Fixing label in new model requests link in overview.md (#17400) casinca 2025-04-29 19:29:48 +02:00
  • 56d64fbe30 [Docs] Propose a deprecation policy for the project (#17063) Russell Bryant 2025-04-29 13:29:44 -04:00
  • 608968b7c5 Enabling multi-group kernel tests. (#17115) Alexei-V-Ivanov-AMD 2025-04-29 12:27:27 -05:00
  • 06ffc7e1d3 [Misc][ROCm] Exclude cutlass_mla_decode for ROCm build (#17289) TY-AMD 2025-04-30 01:26:42 +08:00
  • d3cf61b89b fix gemma3 results all zero (#17364) Qiming Zhang 2025-04-29 09:40:25 -07:00
  • a39203f99e [Bugfix] add qwen3 reasoning-parser fix content is None when disable … (#17369) mofanke 2025-04-30 00:32:40 +08:00
  • 24e6ad3f16 [V1] Remove num_input_tokens from attn_metadata (#17193) Chen Zhang 2025-04-30 00:28:41 +08:00
  • 2ef5d106bb Improve literal dataclass field conversion to argparse argument (#17391) Harry Mellor 2025-04-29 17:25:08 +01:00
  • 0ed27ef66c Fix: Spelling of inference (#17387) a2q1p 2025-04-30 00:23:39 +08:00
  • 900edfa8d4 Transformers backend tweaks (#17365) Harry Mellor 2025-04-29 17:08:03 +01:00
  • 88ad9ec6b2 [Frontend] Support chat_template_kwargs in LLM.chat (#17356) Cyrus Leung 2025-04-29 22:03:35 +08:00
  • 40896bdf3f pre-commit autoupdate (#17380) Harry Mellor 2025-04-29 14:46:55 +01:00
  • 00ee37efa2 [Bugfix] Clean up MiniMax-VL and fix processing (#17354) Cyrus Leung 2025-04-29 20:42:16 +08:00
  • 890f104cdf [Doc] Fix QWen3MOE info (#17381) Jee Jee Li 2025-04-29 20:38:32 +08:00
  • 4a5e13149a Update docs requirements (#17379) Harry Mellor 2025-04-29 12:35:47 +01:00
  • 97cc8729f0 [Model] Ignore rotary embed load for Cohere model (#17319) Ekagra Ranjan 2025-04-29 03:30:40 -04:00
  • 4464109219 [Build][Bugfix] Restrict setuptools version to <80 (#17320) Gregory Shtrasberg 2025-04-29 03:17:23 -04:00
  • 193e78e35d [Fix] Documentation spacing in compilation config help text (#17342) Hyogeun Oh (오효근) 2025-04-29 16:16:17 +09:00
  • bdb2cddafc [Misc]Use a platform independent interface to obtain the device attributes (#17100) ponix-j 2025-04-29 14:59:13 +08:00
  • ebb3930d28 [Misc] Move config fields to MultiModalConfig (#17343) Cyrus Leung 2025-04-29 14:37:21 +08:00
  • cde384cd92 [Model] support MiniMax-VL-01 model (#16328) qscqesze 2025-04-29 12:05:50 +08:00
  • 96e06e3cb7 [Misc] Add a Jinja template to support Mistral3 function calling (#17195) Chauncey 2025-04-29 10:53:44 +08:00
  • 17eb306fcc [Bugfix] Add contiguous call inside rope kernel wrapper (#17091) Zhengyuan Su (苏政渊) 2025-04-29 10:24:07 +08:00
  • 165cb56329 Ignore '<string>' filepath (#17330) Richard Zou 2025-04-28 22:23:29 -04:00
  • d6da8a8ff2 [Bugfix] Fix numel() downcast in fused_layernorm_dynamic_per_token_quant.cu (#17316) Richard Barnes 2025-04-28 19:23:18 -07:00
  • b4ac4fa04d [model] make llama4 compatible with pure dense layers (#17315) Lucia Fang 2025-04-28 19:22:22 -07:00
  • e136000595 [V1][Spec Decode] Make Eagle model arch config driven (#17323) Ekagra Ranjan 2025-04-28 22:22:02 -04:00
  • 86d9fc29cb implement Structural Tag with Guidance backend (#17333) Michał Moskal 2025-04-28 19:21:32 -07:00
  • 506475de5f [Optim] Compute multimodal hash only once per item (#17314) Cyrus Leung 2025-04-29 09:40:35 +08:00
  • cfe4532093 [Benchmark] Add single turn MTBench to Serving Bench (#17202) Ekagra Ranjan 2025-04-28 19:46:15 -04:00
  • ba41cc90e8 [Model] Add tuned triton fused_moe configs for Qwen3Moe (#17328) (tag: v0.8.5) Michael Goin 2025-04-28 16:20:24 -06:00
  • 8fc88d63f1 [Model] Add tuned triton fused_moe configs for Qwen3Moe (#17328) Michael Goin 2025-04-28 16:20:24 -06:00
  • 6e74fd4945 Support loading transformers models with named parameters (#16868) Alex Wu 2025-04-28 15:15:58 -07:00
  • dcbac4cb4b [Model] Qwen3 Dense FP8 Compat Fixes (#17318) Simon Mo 2025-04-28 14:12:01 -07:00
  • ed2462030f [Bugfix] Fix moe weight losing all extra attrs after process_weights_after_loading. (#16854) Charlie Fu 2025-04-28 16:05:07 -05:00
  • cc5befbced [BugFix] Fix cascade attention - RuntimeError: scheduler_metadata must have shape (metadata_size) (#17283) Lucas Wilkinson 2025-04-28 16:55:50 -04:00
  • 2c89cd96a8 [Chore] cleanup license indicators in light of SPDX (#17259) Aaron Pham 2025-04-28 15:43:52 -04:00
  • a0304dc504 [Security] Don't bind tcp zmq socket to all interfaces (#17197) Russell Bryant 2025-04-28 13:08:20 -04:00
  • c7941cca18 Explicitly explain quant method override ordering and ensure all overrides are ordered (#17256) Harry Mellor 2025-04-28 17:55:31 +01:00
  • b6dd32aa07 Make name of compressed-tensors quant method consistent across vLLM (#17255) Harry Mellor 2025-04-28 17:28:13 +01:00
  • f94886946e Improve conversion from dataclass configs to argparse arguments (#17303) Harry Mellor 2025-04-28 17:22:12 +01:00
  • 72dfe4c74f [Docs] Add a security guide (#17230) Russell Bryant 2025-04-28 11:12:17 -04:00
  • 8b464d9660 [Misc] Clean up Qwen2.5-Omni code (#17301) Cyrus Leung 2025-04-28 21:20:45 +08:00
  • 889ebb2638 [Misc] Minor typo/grammar in platforms/interface.py (#17307) Nicolò Lucchesi 2025-04-28 14:45:42 +02:00
  • 3ad986c28b [doc] update wrong model id (#17287) Reid 2025-04-28 19:20:51 +08:00
  • 344e193b7d [Bugfix] Add missing get_language_model to new MLLMs (#17300) Cyrus Leung 2025-04-28 19:09:57 +08:00
  • fb1c933ade Add missing class docstring for PromptAdapterConfig (#17302) Harry Mellor 2025-04-28 12:06:59 +01:00
  • 72c5b97231 Update tpu_worker.py 's typo (#17288) idouba 2025-04-28 19:01:15 +08:00
  • fa93cd9f60 [Model] Add Granite Speech Support (#16246) Alex Brooks 2025-04-28 04:05:00 -06:00
  • aec9674dbe [Core] Remove legacy input mapper/processor from V0 (#15686) Cyrus Leung 2025-04-28 15:38:48 +08:00
  • 7fcc4223dc [Minor][Models] Pass partial_rotary_factor parameter to rope (#17266) Wanrui Dai 2025-04-28 12:28:59 +08:00
  • 8262a3e23b [Misc] Validate stop_token_ids contents (#17268) Nick Hill 2025-04-27 20:54:05 -07:00
  • f211331c48 [Doc] small fix (#17277) Reid 2025-04-28 11:53:35 +08:00
  • 9053d0b134 [Doc] Fix wrong github link in LMCache examples (#17274) Kuntai Du 2025-04-27 20:09:11 -07:00
  • cb3f2d8d10 [Bugfix] Fix Mistral3 spatial merge error (#17270) Michael Goin 2025-04-27 20:40:05 -06:00
  • c12df53b60 [Bugfix] Fix cutlass dispatch for fp8/int8 to properly invoke M<=16 c… (#16751) TherLF 2025-04-28 10:38:42 +08:00
  • d1aeea7553 [Bugfix] Fix missing ARG in Dockerfile for arm64 platforms (#17261) Lennart K. M. Schulz 2025-04-28 04:38:14 +02:00
  • d8bccde686 [BugFix] Fix vllm_flash_attn install issues (#17267) Lucas Wilkinson 2025-04-27 20:27:56 -04:00
  • 20e489eaa1 [V1][Spec Decode] Make eagle compatible with prefix caching. (#17137) Lily Liu 2025-04-27 09:29:43 -07:00
  • 4213475ec7 [Metrics] Fix minor inconsistencies in bucket progression (#17262) Cyrus Leung 2025-04-28 00:19:39 +08:00
  • d92879baf6 [doc] Add feature status legend (#17257) Reid 2025-04-27 23:17:02 +08:00
  • 690fe019f0 [Feature] support sequence parallelism using compilation pass (#16155) cascade 2025-04-27 06:29:35 -07:00
  • ed7a29d9f8 [NVIDIA] Support Cutlass MLA for Blackwell GPUs (#16032) Kaixi Hou 2025-04-27 06:29:21 -07:00