Commit Graph

  • 9b5b39b650 Update deprecated type hinting in vllm/lora (#18128) Harry Mellor 2025-05-14 11:57:59 +01:00
  • 9ccc6ded42 [doc] add missing import (#18133) Reid 2025-05-14 18:57:34 +08:00
  • d62a076e84 [Model] GritLM supports other attention backends (#18109) Cyrus Leung 2025-05-14 18:33:19 +08:00
  • 259127f8b8 [Bugfix] Fix LoRA test (#18123) Jee Jee Li 2025-05-14 18:25:47 +08:00
  • 612c2edb4f [FEAT] [ROCm]: Add AITER CK 2 Stages MoE support (#17110) TJian 2025-05-14 18:03:11 +08:00
  • 38fe728d60 [Bugfix] Fix QKVCrossParallelLinear::sync_weight_attrs for PyTorch compile (#17844) Andrzej Kotłowski 2025-05-14 11:39:51 +02:00
  • 82e7f9bb03 [Misc] replace does not exist model (#18119) rongfu.leng 2025-05-14 17:13:47 +08:00
  • 63dc3426e0 [Model] Add packed_modules_mapping for Qwen3-MOE (#18118) Jee Jee Li 2025-05-14 17:13:19 +08:00
  • 8f5dc41481 [Bugfix] Fix entrypoints audio test failure (#18111) Cyrus Leung 2025-05-14 17:08:07 +08:00
  • 63ad622233 [New Model]: support GTE NewModel (#17986) wang.yuqi 2025-05-14 16:31:31 +08:00
  • e7ef61c1f0 [Bugfix][Example] make lmcache v0 work. (#18051) majianpeng 2025-05-14 14:43:44 +08:00
  • d4154c35a2 [Bugfix] fix moe marlin topk_weight loading (#18080) Jinzhen Lin 2025-05-14 14:31:57 +08:00
  • 6685890d11 [Fix] Move "model_config" as keyword args in chat_utils.py (#18098) lkchen 2025-05-13 23:27:26 -07:00
  • 33011318c2 Fix broken example: examples/offline_inference/profiling at scheduler_config (#18117) Ecthlion_zyy 2025-05-14 14:19:14 +08:00
  • 4f8b373225 [BugFix][AMD] Compatible patch for AITER lib after 04/20 (#17912) qli88 2025-05-14 01:05:20 -05:00
  • 7b2f28deba [AMD][torch.compile] Enable silu+fp8_quant fusion for rocm (#18082) Charlie Fu 2025-05-14 00:13:56 -05:00
  • 2d912fb66f [FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 (#17955) vllmellm 2025-05-14 13:03:47 +08:00
  • 12e6c0b41c [Bugfix][V1] Fix FlashInfer V1 backend using the wrong VllmConfig (#18086) Michael Goin 2025-05-13 23:36:17 -04:00
  • 9a2a6357de [Bugfix] Fix FP8 Marlin MoE and enable for compressed-tensors models (#18026) Michael Goin 2025-05-13 22:48:33 -04:00
  • 6266c57bae [core][distributed] add ep group and all2all interface (#18077) youkaichao 2025-05-14 10:46:49 +08:00
  • 754b699cbe [Bug]: Fix S3 model/tokenizer path resolution (#18083) Jon Gill 2025-05-13 19:34:17 -07:00
  • 6e27c6d86b [Misc] Remove unused numpy tensor (#18084) Roger Wang 2025-05-13 19:33:40 -07:00
  • d5af47a149 [P/D] Add some more debug logs to NixlConnector (#18102) Nick Hill 2025-05-13 19:33:03 -07:00
  • 65f0f74b66 [Hardware/NVIDIA/Modelopt] Fix modelopt forward method for v1 torch.compile (#18101) Pavani Majety 2025-05-13 19:33:00 -07:00
  • 176a95c670 [Fix] Support CUDAGraph capture for encoder-decoder on ROCm (#18104) Luka Govedič 2025-05-13 22:31:42 -04:00
  • f2ae883b67 [v1][KVCacheManager] pass num_new_computed_tokens to kv cache manager (#18001) Chen Zhang 2025-05-14 10:09:39 +08:00
  • 40de1ef455 [FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature (#14968) vllmellm 2025-05-14 10:08:20 +08:00
  • 0189a65a2e [Docs] Expand security doc with firewall info (#18081) Russell Bryant 2025-05-13 15:36:00 -04:00
  • 55aa7af994 [V1] DP scale-out (2/N): Decouple engine process management and comms (#15977) Nick Hill 2025-05-13 10:48:21 -07:00
  • 0b217da646 Update deprecated type hinting in vllm/adapter_commons (#18073) Harry Mellor 2025-05-13 16:32:51 +01:00
  • 19324d660c Update deprecated type hinting in vllm/compilation (#18072) Harry Mellor 2025-05-13 16:32:48 +01:00
  • fc407a1425 Give auto-merge label workflow permission to add labels to issues (#18078) Harry Mellor 2025-05-13 15:53:13 +01:00
  • 009d9e7590 Convert benchmarks to ruff format (#18068) Harry Mellor 2025-05-13 14:43:29 +01:00
  • b922c2ebd2 [Bugfix] Fix entrypoints metrics tests (#18063) Cyrus Leung 2025-05-13 21:42:43 +08:00
  • 00b14e0f16 [CI] set token permissions for pre-commit CI job (#17729) Russell Bryant 2025-05-13 09:38:30 -04:00
  • 54e467e6f8 [CI] Add token permissions for add-ready-label CI job (#17730) Russell Bryant 2025-05-13 09:38:13 -04:00
  • 79a1d25bbd [CI] Add workflow permissions for helm CI job (#17727) Russell Bryant 2025-05-13 08:49:07 -04:00
  • 9944011b30 [CI] Set token permissions for reminder comment CI job (#17728) Russell Bryant 2025-05-13 08:46:58 -04:00
  • 8c946cecca Update deprecated type hinting in vllm/transformers_utils (#18058) Harry Mellor 2025-05-13 12:34:37 +01:00
  • ff334ca1cd Update deprecated type hinting in vllm/profiler (#18057) Harry Mellor 2025-05-13 12:34:34 +01:00
  • 6223dd8114 Update deprecated type hinting in model_executor/layers (#18056) Harry Mellor 2025-05-13 12:17:23 +01:00
  • 906f0598fc [doc] add download/list/delete HF model CLI usage (#17940) Reid 2025-05-13 19:15:51 +08:00
  • cb528d0585 [Fix] check to make sure processor has chat templates (#18047) Aaron Pham 2025-05-13 06:04:10 -04:00
  • 98fcba1575 Convert .buildkite to ruff format (#17656) Harry Mellor 2025-05-13 10:28:31 +01:00
  • 23b3134eb5 [Benchmarks] Refactor run_structured_output_benchmarks.sh (#17722) Russell Bryant 2025-05-13 04:47:29 -04:00
  • ea6ae8cb45 [Bugfix] Fix marlin moe fallback logic for llama4 (#18042) Michael Goin 2025-05-13 03:53:28 -04:00
  • 2ff297dce9 [BugFix] Set default random seed to 0 for V1 (#17929) Woosuk Kwon 2025-05-13 00:52:19 -07:00
  • 8dd0671bac [Bugfix][V1] Only get input embeddings w/ multi-modal models if first PP (#17916) Jin Huang 2025-05-13 03:10:07 -04:00
  • f0d610a8ae [v1][KVCacheManager] Avoid full cache hit by controlling max_length (#17999) Chen Zhang 2025-05-13 14:50:38 +08:00
  • e57e4d6e9e Fix Broken macro for cutlass moe (#18049) Driss Guessous 2025-05-12 23:31:06 -07:00
  • ee5be834e7 [BugFix] Fix 4-GPU RLHF tests (#18007) Nick Hill 2025-05-12 23:03:55 -07:00
  • 48545728d8 cleanup invalid prints (#18050) Calvin Chen 2025-05-13 14:01:57 +08:00
  • dc1a821768 [Feature][V1] Support tool_choice: required when using Xgrammar as the StructuredOutputBackend. (#17845) Chauncey 2025-05-13 14:01:31 +08:00
  • 61e0a506a3 [Bugfix] Avoid repeatedly creating dummy data during engine startup (#17935) Cyrus Leung 2025-05-13 13:40:19 +08:00
  • 1df491c522 [Bugfix] Fixes for new marlin moe usage (#18017) Michael Goin 2025-05-12 23:50:04 -04:00
  • d8487ef557 [ROCm]: Fix build from source failure with gcc14 and ROCm 6.3 (#13779) Arjun Kathuria 2025-05-13 09:06:33 +05:30
  • c06af9a959 [Misc] Slight spelling modification (#18039) Jee Jee Li 2025-05-13 11:36:27 +08:00
  • 60f7624334 Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844) Tao He 2025-05-13 10:52:47 +08:00
  • f6518b2b48 [ROCm] Skip tests for quantizations incompatible with ROCm (#17905) hissu-hyvarinen 2025-05-13 03:39:28 +03:00
  • d67085c2c8 Remove noisy warnings from SchedulerConfig (#17995) Harry Mellor 2025-05-13 01:33:45 +01:00
  • 307939f299 Use NVFP4 Marlin for CompressedTensorsW4A16Fp4 (#18000) Michael Goin 2025-05-12 20:07:34 -04:00
  • 9d7ea9dbbf Update some more deprecated type hinting (#17998) Harry Mellor 2025-05-13 00:49:33 +01:00
  • acee8f48aa [Model] Support MiMo-7B inference with MTP (#17433) bwshen-mi 2025-05-13 07:25:33 +08:00
  • f065de4e88 Fix FBGEMM integration (#18002) Michael Goin 2025-05-12 19:02:07 -04:00
  • dc9905368d [V1][Spec Decode] Eagle unit tests (#17350) wwl2755 2025-05-12 16:01:17 -07:00
  • ebab1ac37c [CI] Make JSON output tests less likely to fail (#17859) Russell Bryant 2025-05-12 18:31:54 -04:00
  • 2b0db9b0e2 Enable standard language model for torch nightly (#18004) Yang Wang 2025-05-12 14:00:04 -07:00
  • 195adb47c0 [Chore] Remove unused method (#18024) Robert Shaw 2025-05-12 16:59:47 -04:00
  • 302f3aca7e [v1][KVCacheManager] Change prefix caching metric from counting blocks to counting tokens (#18003) Chen Zhang 2025-05-13 04:46:12 +08:00
  • e9c730c9bd Enabling "Weight Loading Multiple GPU Test - Large Models" (#18020) Alexei-V-Ivanov-AMD 2025-05-12 15:05:33 -05:00
  • 289199feb6 [Core] Use platform-agnostic device control for DP engine core (#17245) Jade Zheng 2025-05-13 03:09:16 +08:00
  • b9fd0d7a69 [CI/Build] Fix TPU V1 Test mixed use of & and && across tests (#17968) Carol Zheng 2025-05-12 12:06:59 -07:00
  • 72a3f6b898 Construct KVTransferConfig properly from Python instead of using JSON blobs without CLI (#17994) Harry Mellor 2025-05-12 19:25:33 +01:00
  • 98ea35601c [Lora][Frontend]Add default local directory LoRA resolver plugin. (#16855) Jonathan Berkhahn 2025-05-12 10:39:10 -07:00
  • d19110204c [P/D] NIXL Integration (#17751) Robert Shaw 2025-05-12 12:46:16 -04:00
  • 05a4324f8e Initialize the delta tool call fields explicitly (#17340) Maximilien de Bayser 2025-05-12 10:28:58 -03:00
  • 7ea6cb28b2 [Misc] Improve modelscope import error (#17983) Jee Jee Li 2025-05-12 18:46:45 +08:00
  • 9fbf2bfbd5 Correcting testcases in buildkite job for IBM Power (#17675) Aaruni Aggarwal 2025-05-12 13:41:55 +05:30
  • 3a5ea75129 [Feature] Support DeepSeekV3 Function Call (#17784) Xu Wenqing 2025-05-12 15:45:21 +08:00
  • 891b9d33de [Fix] Benchmark "EngineClient" has no attribute "model_config" (#17976) Brayden Zhong 2025-05-12 01:55:53 -04:00
  • 430783018c [Bugfix][TPU] Use np array when updating cache slot_mapping (#17971) Siyuan Liu 2025-05-11 21:58:33 -07:00
  • 19a3c78d1f [Bugfix] Fix pydantic.errors.PydanticUserError (#17962) Li Wang 2025-05-12 12:58:23 +08:00
  • ada50aa295 [bugfix] fix the wrong parser (#17958) Reid 2025-05-12 12:58:02 +08:00
  • 08bf784078 [Bugfix] validate grammar and throw 400 error instead of crashing the engine when xgrammar validation fails (#17623) Cheng Kuan Yong Jason 2025-05-12 09:06:10 +08:00
  • d45fe333fb [misc] add instructions on how to install nvshmem/pplx/deepep (#17964) youkaichao 2025-05-12 09:02:39 +08:00
  • 021c16c7ca [Model] Broadcast Ovis2 implementation to fit Ovis1.6 (#17861) Isotr0py 2025-05-12 08:56:30 +08:00
  • 7de18d541b [BUG] [ROCm] [MLA] Fix variable name bug due to change in variable name in PR #17483 (#17961) TJian 2025-05-12 00:14:30 +08:00
  • a810b5b088 [BugFix] [ROCm]: Bugfix and handle addition case of input for rocm_aiter_rms_norm (#17857) TJian 2025-05-11 19:17:11 +08:00
  • 009b3d5382 [Misc] not show --model in vllm serve --help (#16691) Reid 2025-05-11 16:47:58 +08:00
  • e4b8713380 [New Model]: nomic-embed-text-v2-moe (#17785) wang.yuqi 2025-05-11 15:59:43 +08:00
  • 06c0922a69 [FP8][ROCm][Attention] Enable FP8 KV cache on ROCm for V1 (#17870) Gregory Shtrasberg 2025-05-11 03:58:45 -04:00
  • cd3edfc908 [Misc] Add compressed-tensors NVFP4A16 emulation support (#17914) Dipika Sikka 2025-05-11 03:58:38 -04:00
  • 9cea90eab4 [Frontend] Add /classify endpoint (#17032) Frieda Huang 2025-05-11 03:57:07 -04:00
  • d1110f5b5a [doc] update lora doc (#17936) Reid 2025-05-11 15:56:21 +08:00
  • 8132365b74 [Bugfix]: v1 engine - consider lora adapters in allowed_token_ids (#17855) Ben Browning 2025-05-11 03:53:58 -04:00
  • eea22a56ab fix amd triton mla path (#17871) Shiyan Deng 2025-05-11 00:53:31 -07:00
  • 9112155283 [Perf] Use small max_num_batched_tokens for A100 (#17885) Kuntai Du 2025-05-11 00:53:23 -07:00
  • 90d0a74b60 [Bugfix] Add revision to transformers.Auto*.from_pretrained processors (#17948) xinli-centml 2025-05-11 03:52:44 -04:00
  • d74e5f37bc [Kernel] fp4 marlin kernel (#17687) Jinzhen Lin 2025-05-11 10:58:49 +08:00
  • ca66a1674c [v1] Rename specialized_manager.py to single_type_kv_cache_manager.py (#17946) Chen Zhang 2025-05-11 07:14:12 +08:00