Commit Graph

  • b78772c433 [Frontend] supports deepseekv32 chat template (#29837) Chauncey 2025-12-03 20:53:44 +08:00
  • f5d3d93c40 [docker] Build CUDA kernels in separate Docker stage for faster rebuilds (#29452) Amr Mahdi 2025-12-03 03:41:53 -08:00
  • 78f4bb0ba8 [DOC] Add Arm to list of compute resources providers (#29894) Fadi Arafeh 2025-12-03 11:36:58 +00:00
  • b294e28db2 [refactor] CTMoEMethods to use QuantizationArgs (#28871) HDCharles 2025-12-03 06:00:56 -05:00
  • 787b84a9fc [Bugfix] Follow-up fix on MediaWithBytes (#29951) Roger Wang 2025-12-03 02:42:49 -08:00
  • 42c1949643 [Bugfix][Quantization] Support BF16 tensors on GGUF (#29948) Tsukasa OI 2025-12-03 19:33:46 +09:00
  • cc4e296ea6 [CI/Build] Avoid duplicate empty inputs test for common multimodal generation tests (#29907) Isotr0py 2025-12-03 18:27:36 +08:00
  • a21cd9ed23 [Bugfix] Fix incorrect image_grid_thw rank for HunyuanOCR from missing merge_by_field_config=True (#29950) Isotr0py 2025-12-03 18:05:10 +08:00
  • 7fe9c1a223 [CI] Add Async Eplb nightly CI tests (#29385) WeiQing Chen 2025-12-03 17:51:08 +08:00
  • 3f42b05fbc [Refactor] [1/N] to simplify the vLLM serving architecture (#28040) Chauncey 2025-12-03 17:26:39 +08:00
  • 69520bc695 Add logging for cudagraph related info (#29825) Yong Hoon Shin 2025-12-02 23:01:48 -10:00
  • 3a7751485b [responsesAPI] support input output messages for non harmony models (#29549) Andrew Xia 2025-12-02 23:59:23 -08:00
  • bbfb55c29e [Misc] Allow fetch_* utils to access local files by default (#29932) Cyrus Leung 2025-12-03 15:49:34 +08:00
  • 0bec63fa31 [BugFix] fix imgs_pos in hunyuan_vl (#29879) JackieWu 2025-12-03 14:20:37 +08:00
  • c719c40540 [Bugfix] Defunctionalize TRTLLM AR+Norm op for avoiding extra clone kernel before it (#29631) elvischenv 2025-12-03 13:15:50 +08:00
  • b08025a83b [Docs] Discuss api key limitations in security guide (#29922) Russell Bryant 2025-12-02 23:57:28 -05:00
  • 4fd9d6a85c [Core] Rename PassConfig flags as per RFC #27995 (#29646) v0.12.0 Arpit Khandelwal 2025-12-02 22:38:55 -05:00
  • d7284a2604 [Core] Rename PassConfig flags as per RFC #27995 (#29646) Arpit Khandelwal 2025-12-02 22:38:55 -05:00
  • 506ed87e87 [ROCm][CI][Bugfix] Disable Flash/MemEfficient SDP on ROCm to avoid HF Transformers accuracy issues (#29909) Andreas Karatzas 2025-12-02 20:36:49 -06:00
  • 4dd7978374 [Bugfix] Fix regression on pooling models from PR#29621 (#29921) Roger Wang 2025-12-02 18:33:45 -08:00
  • a1d627e40f [BugFix] Fix assert in build_for_cudagraph_capture (#29893) Lucas Wilkinson 2025-12-02 19:56:54 -05:00
  • 5cdd664509 [BugFix] Fix assert in build_for_cudagraph_capture (#29893) Lucas Wilkinson 2025-12-02 19:56:54 -05:00
  • 5f67361fd1 Reverting re-direction to amd_mi355_X. (#29914) Alexei-V-Ivanov-AMD 2025-12-02 18:40:02 -06:00
  • 2f055ec1c1 [Bugfix] Fix incorrect channel order for idefics3 in edge case (#29881) Isotr0py 2025-12-03 00:03:52 +08:00
  • 5d91d2b292 [Doc] Add allocate_slots parameter docs (#29777) maang-h 2025-12-03 07:23:09 +08:00
  • 6a6108511f [BUGFIX] Fix regex pattern for Mistral Tool Call (#29918) Julien Denize 2025-12-02 23:51:58 +01:00
  • 9057fc2f1b [BUGFIX] llama_4_scaling wrongly passed to DeepseekAttention (#29908) Julien Denize 2025-12-02 23:51:20 +01:00
  • a05b580540 [Bugfix] fix --scheduling-policy=priority & n>1 crashes engine (#29764) Chauncey 2025-12-03 06:42:28 +08:00
  • b6ae5aeca6 [Bugfix][EPLB] Prevent user-provided EPLB config from being overwritten with defaults (#29911) Sage Moore 2025-12-02 14:20:22 -08:00
  • 5c7c09af8f [Perf] Avoid pageable HtoD transfer in MinTokensLogitsProcessor (#29826) jthomson04 2025-12-02 13:25:52 -08:00
  • c014de1ec7 [ROCm][CI] Fix test_cudagraph_mode.py Failure For AMD CI (#29808) Micah Williamson 2025-12-02 16:54:36 -06:00
  • 1b1e35aaf9 [BUGFIX] Fix regex pattern for Mistral Tool Call (#29918) Julien Denize 2025-12-02 23:51:58 +01:00
  • 5e5646e206 [BUGFIX] llama_4_scaling wrongly passed to DeepseekAttention (#29908) Julien Denize 2025-12-02 23:51:20 +01:00
  • 0a9caca9f5 [Bugfix] fix --scheduling-policy=priority & n>1 crashes engine (#29764) Chauncey 2025-12-03 06:42:28 +08:00
  • e6f114ac25 [Bugfix][EPLB] Prevent user-provided EPLB config from being overwritten with defaults (#29911) Sage Moore 2025-12-02 14:20:22 -08:00
  • 6fc5841db1 Fix some more Transformers nightly tests (#29872) Harry Mellor 2025-12-02 21:49:44 +00:00
  • 3ff5b53bc2 Bump actions/setup-python from 6.0.0 to 6.1.0 (#29768) dependabot[bot] 2025-12-02 21:29:32 +00:00
  • 1528e079e2 [Perf] Avoid pageable HtoD transfer in MinTokensLogitsProcessor (#29826) jthomson04 2025-12-02 13:25:52 -08:00
  • afb1e5b380 [CI][ROCm][tests/v1/e2e] Fix multiprocessing launch for the test (#29123) Divakar Verma 2025-12-02 14:46:10 -06:00
  • 1c593e117d Fix boolean nested params, add dict format support, and enhance plotting for vllm bench sweep (#29025) Copilot 2025-12-02 20:40:56 +00:00
  • 7f718169d1 [CI/Build] Fixes missing runtime dependencies (#29822) Benjamin Bartels 2025-12-02 18:21:49 +00:00
  • 339e84ce86 [Bugfix] Fix DeepSeek R1 MTP weight loading (#29545) Matthew Bonanni 2025-12-02 10:52:18 -05:00
  • 34a8559be7 [Chore] Use tokenizer.encode and tokenizer.decode directly (#29851) Cyrus Leung 2025-12-02 20:30:40 +08:00
  • 85fb2e3120 Remove default values from InitVars so that they're not stored (#29859) Harry Mellor 2025-12-02 12:16:37 +00:00
  • a2b053dc85 feat(model): Add BitsAndBytes quantization support for Qwen3-Omni-MoE (#29896) Navanit Dubey 2025-12-03 00:58:35 +05:30
  • 1d93f11675 [Attention][CUDAGraph] Remove CG padding from attention backends (#29352) Matthew Bonanni 2025-12-02 13:48:08 -05:00
  • 2d613de9ae [CI/Build] Fixes missing runtime dependencies (#29822) Benjamin Bartels 2025-12-02 18:21:49 +00:00
  • c77b9929a0 Update AMD-CI testing mirror (as of 2025-12-02) (#29898) Alexei-V-Ivanov-AMD 2025-12-02 11:52:54 -06:00
  • 63b1da76ba [Chore]: Reorganize gguf utils functions under transformers_utils (#29891) Isotr0py 2025-12-03 01:33:23 +08:00
  • 52cb349fc0 [responsesAPI][3] ResponsesParser to set up non harmony MCP (#29413) Andrew Xia 2025-12-02 08:24:45 -08:00
  • 0ec8422171 [Bugfix] Fix incorrect channel order for idefics3 in edge case (#29881) Isotr0py 2025-12-03 00:03:52 +08:00
  • 2eb4fe9129 [examples] Resettle pooling examples. (#29365) wang.yuqi 2025-12-02 23:54:28 +08:00
  • 51c57b51dd [Bugfix] Fix DeepSeek R1 MTP weight loading (#29545) Matthew Bonanni 2025-12-02 10:52:18 -05:00
  • 60c3d413af [Multimodal][Core] Optimize multimodal preprocessing cache by hashing image bytes instead of pixel values (#29621) ImaGoodFella 2025-12-02 14:49:02 +01:00
  • 68ffbca7e4 [Chore] Use tokenizer.encode and tokenizer.decode directly (#29851) Cyrus Leung 2025-12-02 20:30:40 +08:00
  • 951445a52d Remove default values from InitVars so that they're not stored (#29859) Harry Mellor 2025-12-02 12:16:37 +00:00
  • d8c6210eea Add Mistral Large 3 and Ministral 3 (#29757) Julien Denize 2025-12-02 11:29:00 +01:00
  • 8bbcf8b6e7 [vLLM Benchmark Suite] Add default parameters section and update CPU benchmark cases (#29381) Louie Tsai 2025-12-02 01:00:23 -08:00
  • 70fb77b4dc [BugFix] add max-num-batched-token to scheduler hash (#29829) Boyuan Feng 2025-12-02 00:55:02 -08:00
  • 48d15a32aa [CI] Fix Bad_words test for tokenizer encode/decode asymmetry (#28193) 杰兮 2025-12-02 16:02:12 +08:00
  • 3b221cb661 [BugFix] respect VLLM_LOGGING_LEVEL in logger (#29761) Boyuan Feng 2025-12-01 23:49:16 -08:00
  • 0037b5746a [Core] Eliminate redundant is_encoder_decoder lookups (20-40us/step) (#29800) Wushi Dong 2025-12-01 23:08:07 -08:00
  • f5b0846ba0 Fix some Transformers nightly tests (#29802) Harry Mellor 2025-12-02 07:05:27 +00:00
  • 13ea39bc09 [CPU]Parallelize over tokens in int4 moe (#29600) Zhang Xiangze 2025-12-02 14:21:39 +08:00
  • 4b612664fd [CI] Renovation of nightly wheel build & generation (take 2) (#29838) Shengqi Chen 2025-12-02 14:17:10 +08:00
  • 653591d5e7 [Chore] Move tokenizer initialization methods (#29793) Cyrus Leung 2025-12-02 13:33:37 +08:00
  • e2fbfc955e [CI][AMD] spec_decode:eagle skip FLASH_ATTN for deepseek on ROCm (#29827) Divakar Verma 2025-12-01 23:27:46 -06:00
  • a690fb5bd6 [CI][ROCm] Fix test_correctness_sliding_window (#29243) Divakar Verma 2025-12-01 22:53:27 -06:00
  • 81fe3f82af [BugFix] Fix index error in ngram_proposer (#29779) usberkeley 2025-12-02 12:48:11 +08:00
  • 53bf71b0f0 [Misc] Update conftest for entrypoints/sagemaker test folder (#29799) Zuyi Zhao 2025-12-01 19:56:39 -08:00
  • f441d36cee Add missing return in _check_vllm_model_embed_input_ids (#29834) Johnny Yang 2025-12-01 19:22:50 -08:00
  • 22274b2184 [Misc] Add ReplicaId to Ray metrics (#24267) Seiji Eicher 2025-12-01 19:21:44 -08:00
  • fc95521ba5 [Misc] Throw error on unintended access to scheduler_config.max_model_len (#29771) Wei Wei 2025-12-01 18:58:44 -08:00
  • d0cd728907 [Core] Support resetting all running requests' KV while calling reset_prefix_cache (#28827) Zhuohan Li 2025-12-01 18:25:05 -08:00
  • fa8804ad9c [responsesAPI][4] fix responseOutputItem Kimi K2 thinking bug (#29555) Andrew Xia 2025-12-01 18:11:35 -08:00
  • 4b40924998 [ROCm] Fallback pytorch GELU with tanh approximation to GELU() (#29244) Divakar Verma 2025-12-01 20:02:22 -06:00
  • c0dfc89485 SM120 / NVFP4: add device guard and runtime SM dispatch to cutlass_scaled_fp4_mm (#29711) Hendrik Holtmann 2025-12-02 02:24:18 +01:00
  • 44822d7ff2 [BugFix] Preserve spec decoding uniform decode when scheduling (#29759) Nick Hill 2025-12-01 17:15:52 -08:00
  • 342c4f1472 Updated CI mirror 2025-11-25 (#29434) Alexei-V-Ivanov-AMD 2025-12-01 17:44:33 -06:00
  • 1336a1ea24 Revert #29787 and #29690 (#29815) Kevin H. Luu 2025-12-01 13:42:03 -08:00
  • eaf81485ed [Ascend]: Fixed the issue where OOT Platform vllm-ascend could not enable SP in Eager mode (#28935) Nengjun Ma 2025-12-02 04:02:18 +08:00
  • 38caf7fa1a Update FAQ on interleaving sliding windows support (#29796) Finbarr Timbers 2025-12-01 12:15:19 -07:00
  • cabc77cc86 [Core][Observability] Add KV cache residency metrics (#27793) shivampr 2025-12-01 10:27:53 -08:00
  • ec7035c9d4 [ci] Make distributed 8 gpus test optional (#29801) Kevin H. Luu 2025-12-01 10:22:05 -08:00
  • fc6acc88ca [Bugfix] Missing cached item in the MultiModalReceiverCache (#28525) knlnguyen1802 2025-12-02 02:18:07 +08:00
  • d0985c5feb [Hardware][AMD] Remove ROCm skip conditions for transformers backend tests (#29782) BADAOUI Abdennacer 2025-12-01 19:03:13 +01:00
  • 092bb73b8a [Frontend] add 'verbose_json' and 'timestamp' feature on Whisper Transcription/Translation (#24209) sangbumlikeagod 2025-12-02 02:19:17 +09:00
  • 5d43f7372e [Doc] Update description disable_any_whitespace (#29784) FredericOdermatt 2025-12-01 17:48:33 +01:00
  • 37593deb02 [CI] fix url-encoding behavior in nightly metadata generation (#29787) Shengqi Chen 2025-12-01 23:17:20 +08:00
  • f5516039c5 [Doc] fix heading levels (#29783) Liu Jinyi 2025-12-01 22:49:22 +08:00
  • 36db0a35e4 [CI] Renovation of nightly wheel build & generation (#29690) Shengqi Chen 2025-12-01 21:25:39 +08:00
  • 5cfa967efa [Bugfix] TypeError: 'NoneType' object is not callable (#29414) Marcin Ostrowski 2025-12-01 14:16:44 +01:00
  • b95db244ee [v1] Add real sliding window calculation to FlexAttention direct BlockMask building (#26015) Isotr0py 2025-12-01 21:12:51 +08:00
  • ad9d656bfa [multimodal][test] Reduce memory utilization for test_siglip to avoid OOM (#29504) Zhengxu Chen 2025-12-01 07:41:48 -05:00
  • f37e8938d2 [XPU] Fix AWQ skipped layer detection in IPEX quantization (#29774) Fanli Lin 2025-12-01 20:00:52 +08:00
  • f0a28bf661 [Misc] Unify tokenizer registration (#29767) Cyrus Leung 2025-12-01 19:34:58 +08:00
  • 86e178f7c4 [crashfix] Eagle + multimodal can crash on mm cache miss (#29750) Mickaël Seznec 2025-12-01 10:29:33 +01:00
  • 014ece97c7 [Frontend] Add tool filtering support to ToolServer (#29224) daniel-salib 2025-12-01 03:03:57 -05:00
  • 62de4f4257 [Frontend] Resettle pooling entrypoints (#29634) wang.yuqi 2025-12-01 15:30:43 +08:00
  • 83805a6078 [CI] Skip paddleocr_vl for transformer 4.57.3 (#29758) Huamin Li 2025-11-30 20:38:06 -08:00