Commit Graph

  • 9112b443a0 [Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011) Siyuan Liu 2025-06-02 17:06:20 -07:00
  • c57d577e8d add an absolute path for run.sh (#18258) Calvin Chen 2025-06-03 03:38:23 +08:00
  • ca2f6b9c30 [Bugfix][Model] Attempt to fix eagle in V0. (#18978) Gregory Shtrasberg 2025-06-02 11:15:53 -04:00
  • 20133cfee2 [Frontend] enable custom logging for the uvicorn server (OpenAI API server) (#18403) Frαnçois 2025-06-02 17:04:23 +02:00
  • ebb1ec9318 [Model] enable data parallel for Llama4 vision encoder (#18368) jennyyyyzhen 2025-06-02 04:22:54 -07:00
  • 5b168b6d7a [doc] add pytest tips (#19010) Reid 2025-06-02 19:07:26 +08:00
  • 9760fd8f6a [Core] Support inplace model weights loading (#18745) 22quinn 2025-06-02 02:38:50 -07:00
  • b9f61e1387 [Bugfix][Nixl] Fix DP Metadata Handshake (#19008) Robert Shaw 2025-06-01 23:30:41 -04:00
  • d6fd3a33b8 [Misc] reuse num_tokens_across_dp of get_dp_padding to avoid unnecessary dp all reduce in set_forward_context (#18935) zhrrr 2025-06-02 03:41:18 +08:00
  • 432ec9926e [doc] wrong output (#19000) Reid 2025-06-01 19:26:14 +08:00
  • 2b102d51ad [BugFix] Fix incorrect metrics shutdown error log message (#18992) Nick Hill 2025-05-31 20:42:23 -07:00
  • aa54a7bf7b [BugFix] fix data parallel construct ipv6 url addres (#18991) rongfu.leng 2025-06-01 11:42:10 +08:00
  • 2ad6194a02 Let max_num_batched_tokens use human_readable_int for large numbers (#18968) Michael Goin 2025-05-31 23:41:29 -04:00
  • c594cbf565 [doc] small fix - mkdocs (#18996) Reid 2025-06-01 11:23:43 +08:00
  • a35ca765a5 [LoRA] Support dynamically initialize packed_modules_mapping for VLM with arbitrary components (#18987) Isotr0py 2025-06-01 11:06:57 +08:00
  • 6aa8f9a4e7 [Core] Rework dtype resolution (#18751) Cyrus Leung 2025-06-01 11:04:23 +08:00
  • 1bc86a3da1 [Bugfix] Fix EAGLE3 broken logits (#18909) Benjamin Chislett 2025-05-31 22:58:07 -04:00
  • bbfa0c61d1 [Misc][Benchmark] Add support for CustomDataset (#18511) Ekagra Ranjan 2025-05-31 15:07:38 -04:00
  • 20079c6e36 [Misc] add return token strs for tokenize (#18941) Reid 2025-06-01 02:00:11 +08:00
  • 9a1b9b99d7 [BugFix] Fix multi-node offline data-parallel (#18981) Nick Hill 2025-05-31 08:34:52 -07:00
  • 8bf507d766 [P/D] NixlConnector use cache device index for memory registration (#18969) ptarasiewiczNV 2025-05-31 17:19:18 +02:00
  • 306d60401d [ROCm][Kernel] Add gfx950 support for skinny gemms (#18010) Charlie Fu 2025-05-31 09:40:05 -05:00
  • f2c3f66d59 [Bugfix] Fix for issue 17396 (#18773) Fred Reiss 2025-05-31 04:58:17 -07:00
  • 0f5e0d567e [FEAT][ROCm] Add AITER grouped topk for DeepSeekV2 (#18825) vllmellm 2025-05-31 18:39:31 +08:00
  • c55d804672 [BugFix] Pydantic part 2 (#18911) Luka Govedič 2025-05-31 06:39:28 -04:00
  • 749f5bdd38 [doc] fix the list rendering issue - security.md (#18982) Reid 2025-05-31 18:39:21 +08:00
  • 2a50ef5760 [Neuron] Add Multi-Modal model support for Neuron (#18921) Satyajith Chilappagari 2025-05-31 03:39:11 -07:00
  • b8b904795d fix security issue of logging llm output (#18980) Lucia Fang 2025-05-31 03:38:56 -07:00
  • ba5111f237 [Bugfix]: Fix the incompatibility issue with Structured Outputs when Thinking is disabled (#18879) Chauncey 2025-05-31 17:20:54 +08:00
  • 1e123529d7 [Misc] Fix estimated max model len msg (#18966) Yong Hoon Shin 2025-05-31 01:43:44 -07:00
  • dff80b0e42 [Frontend] Add rerank support to run_batch endpoint (#16278) Pooya Davoodi 2025-05-31 00:40:01 -07:00
  • 7782464a17 create util function for batched arange (#18937) Yu Guo 2025-05-30 22:50:38 -07:00
  • 0f71e24034 [Docs] Correct multiprocessing design doc (#18964) Lukas Geiger 2025-05-31 02:30:15 +01:00
  • 1dab4d5718 Tool parser regex timeout handling (#18960) Will Eaton 2025-05-30 17:02:54 -04:00
  • 7f21e8052b [Misc] add group_size is -1 in awq quantization (#18910) rongfu.leng 2025-05-31 01:34:22 +08:00
  • 5a8641638a [VLM] Add PP support and fix GPTQ inference for Ovis models (#18958) Isotr0py 2025-05-31 01:11:44 +08:00
  • f49239cb45 Benchmark script for fp8 vs bf16 gemm (#17126) Michael Goin 2025-05-30 12:56:11 -04:00
  • 5fbbfe9a4c [BugFix] FA2 MLA Accuracy Issue (#18807) (tag: v0.9.0.1) Lucas Wilkinson 2025-05-28 04:59:39 -04:00
  • 2dbe8c0774 [Perf] API-server scaleout with many-to-many server-engine comms (#17546) Nick Hill 2025-05-30 08:17:00 -07:00
  • 84ec470fca Improve "failed to get the hash of the compiled graph" error (#18956) Richard Zou 2025-05-30 11:00:54 -04:00
  • b29ca5c4d5 [Docs] Update SECURITY.md with link to our security guide (#18961) Russell Bryant 2025-05-30 10:37:27 -04:00
  • ec6833c5e9 [doc] show the count for fork and watch (#18950) Reid 2025-05-30 21:45:59 +08:00
  • e1fadf1197 [Feature] minicpm eagle support (#18943) Shawn Huang 2025-05-30 21:45:56 +08:00
  • 43ff405b90 [CI/Build] remove regex from build dependencies (#18945) Daniele 2025-05-30 13:02:50 +02:00
  • fba02e3bd1 [Bugfix][TPU] Fix tpu model runner testcase failure (#18810) Carol Zheng 2025-05-30 10:04:03 +00:00
  • 4577fc9abb [Misc]Fix typo (#18947) Always-Naive 2025-05-30 17:21:35 +08:00
  • 5f1d0c8118 [Bugfix][Failing Test] Fix test_vllm_port.py (#18618) Rabi Mishra 2025-05-30 14:43:47 +05:30
  • c3bb9f2331 [Model] Use in-place adds in SigLIP (#18922) Lukas Geiger 2025-05-30 10:12:59 +01:00
  • 8f8900cee9 [doc] add mkdocs doc (#18930) Reid 2025-05-30 15:58:44 +08:00
  • 6acb7a6285 [Misc]Fix benchmarks/README.md for speculative decoding (#18897) Rabi Mishra 2025-05-30 13:28:04 +05:30
  • 4f4a6b844a [Deprecation] Remove mean pooling default for Qwen2EmbeddingModel (#18913) Cyrus Leung 2025-05-30 14:53:37 +08:00
  • 4d0a1541be [Bugfix] Remove NVFP4 scales assertions to fix load_format=dummy (#18861) Michael Goin 2025-05-30 01:37:36 -04:00
  • 77b6e74fe2 [ROCm] Remove unnecessary assertion of max_model_len in ROCM_AITER_MLA attention backend. (#18938) vllmellm 2025-05-30 13:33:17 +08:00
  • 5acf828d99 [docs] fix: fix markdown syntax (#18927) H 2025-05-29 22:20:48 -07:00
  • 3987e2ae96 [Model] Use AutoWeightsLoader for mamba2 (#18918) iLeGend 2025-05-30 12:50:10 +08:00
  • 77164dad5e [Bugfix] Consistent ascii handling in tool parsers (#18883) Chauncey 2025-05-30 12:44:43 +08:00
  • 3de3eadf5b improve the robustness of parsing vlms config in AutoRound (#18894) Wenhua Cheng 2025-05-30 10:24:47 +08:00
  • 3132290a14 [TPU][CI/CD] Clean up docker for TPU tests. (#18926) Carol Zheng 2025-05-30 02:24:19 +00:00
  • 1aa2f81b43 [Misc] Update type annotation for rotary embedding base (#18914) Cyrus Leung 2025-05-30 10:17:01 +08:00
  • d54af615d5 [Bugfix] Fix PP default fallback behavior for V1 (#18915) Michael Goin 2025-05-29 22:13:17 -04:00
  • a1cc9f33a3 [TPU] remove transpose ops in moe kernel (#18923) Chengji Yao 2025-05-29 16:00:11 -07:00
  • a521ef06e5 Use standalone_compile by default in torch >= 2.8.0 (#18846) Richard Zou 2025-05-29 18:41:58 -04:00
  • 64eaf5fe05 [P/D] NixlConnector DP fixes (#18903) Will Eaton 2025-05-29 14:08:40 -04:00
  • d1d61f3351 [BugFix] Make DP work with connector-delayed new requests (#18559) Nick Hill 2025-05-29 11:04:18 -07:00
  • 32ce3cf7c9 [V1] Allocate kv_cache with stride order for V1 (#18775) Nicolò Lucchesi 2025-05-29 19:54:16 +02:00
  • d58f9c7f7a [Misc] Remove duplicate init for self.vllm_config (#18896) CYJiang 2025-05-30 01:26:07 +08:00
  • c29034037d [Deprecation] Disallow pos-args other than model when initializing LLM (#18802) Cyrus Leung 2025-05-30 00:36:58 +08:00
  • 1b7cfd5a36 [ROCm][V0][Attention] Revert to the previous FA triton kernel (#18226) Gregory Shtrasberg 2025-05-29 12:13:18 -04:00
  • da4b69d0b4 [Attention][V1] Toggle for v1 attention backend (#18275) Gregory Shtrasberg 2025-05-29 10:48:24 -04:00
  • c9479b2920 [Bugfix] Fix the failing gte embedding test (#18720) Isotr0py 2025-05-29 22:39:25 +08:00
  • 6f2909405e [Doc] Fix codeblocks formatting in LoRA adapters documentation (#18907) Hyogeun Oh (오효근) 2025-05-29 23:38:55 +09:00
  • b169d5f7b6 [Misc][Tools][Benchmark] Add benchmark_serving supports for llama.cpp. (#18692) Duyi-Wang 2025-05-29 20:02:08 +08:00
  • f8977c233f Fix an error in dummy weight loading for quantization models (#18855) Chenyaaang 2025-05-29 03:07:20 -07:00
  • f274581f44 [BugFix] Update pydantic to fix error on python 3.10 (#18852) Luka Govedič 2025-05-29 06:05:46 -04:00
  • 0b1447f890 [Bugfix] Ensure tensors are contiguous during serialisation (#18860) Lukas Geiger 2025-05-29 11:05:20 +01:00
  • 24d0ef8970 [Misc] Replace TODO in serving transcription (#18895) Nicolò Lucchesi 2025-05-29 11:58:14 +02:00
  • 7fcfd954ff [Bugfix] Fix misleading information in the documentation (#18845) Jee Jee Li 2025-05-29 17:54:14 +08:00
  • e740d07f07 [doc] add CLI doc (#18871) Reid 2025-05-29 17:51:36 +08:00
  • a652e71dd0 [Doc] Remove redundant spaces from compatibility_matrix.md (#18891) Michael Yao 2025-05-29 17:51:20 +08:00
  • 34d6c447c4 [LoRA] Add LoRA support for InternVL (#18842) Jee Jee Li 2025-05-29 16:46:24 +08:00
  • 972eddf7c9 [Neuron] Add multi-LoRA support for Neuron. (#18284) Satyajith Chilappagari 2025-05-29 01:41:22 -07:00
  • fd7bb88d72 Fixes a dead link in nightly benchmark readme (#18856) Brent Salisbury 2025-05-29 00:41:39 -04:00
  • 3c49dbdd03 Skip device and quant Pydantic validation to make plugin device work (#18843) Yikun Jiang 2025-05-29 11:12:30 +08:00
  • 1661a9c28f [Doc][Neuron] Update documentation for Neuron (#18868) aws-elaineyz 2025-05-28 19:44:01 -07:00
  • 8e882ffdc0 [Bugfix][TPU] fix moe custom kernel import (#18853) Chengji Yao 2025-05-28 19:34:19 -07:00
  • 26b4fa45be Add ability to use CUDAGraphs with use_inductor=False (#17345) Richard Zou 2025-05-28 22:16:52 -04:00
  • 515b413ebf Prevent the cross-encoder logic from being applied to classification tasks (#18838) Maximilien de Bayser 2025-05-28 23:16:17 -03:00
  • 269d901734 [Bugfix][ROCm] fix the power of 2 exception from triton_unified_attention.py when running llama4 models and unit test fix (#18100) Hongxia Yang 2025-05-28 19:21:46 -04:00
  • 7951d78738 [Core] Enable CUDA graphs for DP + All2All kernels (#18724) Varun Sundar Rabindranath 2025-05-28 18:55:30 -04:00
  • 6dbe5b5c93 Remove checks for None for fields which should never be None (#17985) Harry Mellor 2025-05-28 22:32:19 +01:00
  • 643622ba46 [Hardware][TPU][V1] Multi-LoRA Optimisations for the V1 TPU backend (#15655) Akshat Tripathi 2025-05-28 20:59:09 +01:00
  • a09c7ca9f2 [Chore][Spec Decode] Update check NoneType instead of assigning variables (#18836) Aaron Pham 2025-05-28 14:57:19 -04:00
  • 0e98964e94 [V1][Metrics] Remove metrics that were deprecated in 0.8 (#18837) Mark McLoughlin 2025-05-28 19:54:12 +01:00
  • c68b5c63eb [Misc] fix olmoe model layer can't laod in tp gt 1 (#18828) rongfu.leng 2025-05-29 01:36:21 +08:00
  • fced756923 [Chore] update ty configuration (#18839) Aaron Pham 2025-05-28 11:59:11 -04:00
  • 321331b8ae [Core] Add Lora Support to Beam Search (#18346) Alex Brooks 2025-05-28 09:58:24 -06:00
  • 6e4cea1cc5 decrement server_load on listen for disconnect (#18784) daniel-salib 2025-05-28 07:15:12 -07:00
  • 435fa95444 [Frontend] add run batch to CLI (#18804) Reid 2025-05-28 22:08:57 +08:00
  • 4c2b38ce9e Enable Pydantic mypy checks and convert configs to Pydantic dataclasses (#17599) Harry Mellor 2025-05-28 13:46:04 +01:00
  • d781930f90 [Platform][Dist] Make torch distributed process group extendable (#18763) Mengqing Cao 2025-05-28 18:52:34 +08:00