Commit Graph

  • ffbcc9e757 [BugFix] Fix VllmConfig() construction on all platforms (#20695) Nick Hill 2025-07-10 08:00:20 +01:00
  • 59389c927b [BugFix][CPU] Fix CPU worker dependency on cumem_allocator (#20696) Nick Hill 2025-07-10 07:24:20 +01:00
  • 8f2720def9 [Frontend] Support Tool Calling with both tool_choice='required' and $defs. (#20629) Chauncey 2025-07-10 13:56:35 +08:00
  • ad6c2e1a0b Correct PPMissingLayer handling in Deepseek-V2-Lite PP deployment (#20665) Seiji Eicher 2025-07-09 20:34:40 -07:00
  • 49e8c7ea25 Use NVCC --compress-mode to reduce binary size by 30% (#20694) Michael Goin 2025-07-10 10:26:48 +09:00
  • 805d62ca88 [Misc] DP : Add ExpertTokensMetadata (#20332) Varun Sundar Rabindranath 2025-07-09 20:33:14 -04:00
  • b7d9e9416f [CI/Build] Fix FlashInfer double build in Dockerfile (#20651) Michael Goin 2025-07-10 08:41:56 +09:00
  • 7c12a765aa [Misc] Simplify the prefix caching logic on draft tokens (#20701) Woosuk Kwon 2025-07-09 14:48:35 -07:00
  • cd587c93ef [BugFix]: Properly set engine_id when using multi connector (#19487) Yiming 2025-07-10 04:32:44 +08:00
  • 332d4cb17b [Feature][Quantization] MXFP4 support for MOE models (#17888) fxmarty-amd 2025-07-09 22:19:02 +02:00
  • bf03ff3575 [Kernel] Add Conch backend for mixed-precision linear layer (#19818) Jacob Manning 2025-07-09 16:17:55 -04:00
  • 47043eb678 [Kernel] Triton implementation of causal-conv1d for Mamba-based models (#18218) Tuan, Hoang-Trong 2025-07-09 15:53:55 -04:00
  • 31b96d1c64 Support Llama 4 for cutlass_moe_fp4 (#20453) Michael Goin 2025-07-10 04:53:38 +09:00
  • e59ba9e142 [CI/Build] Enlarge tolerance for a CPU multi-modal test (#20684) Li, Jiang 2025-07-10 01:48:52 +08:00
  • 403b481573 Remove heading form installation inc.md file (#20697) Harry Mellor 2025-07-09 18:42:51 +01:00
  • 138709f8d1 [Doc] Update CPU doc (#20676) Li, Jiang 2025-07-10 01:28:30 +08:00
  • 0bbac1c1b4 [Bench] Add NVFP4 GEMM benchmark script (#20578) Michael Goin 2025-07-10 02:23:48 +09:00
  • a3e4e85ece [XPU][CI] enhance xpu test support (#20652) Liangliang Ma 2025-07-10 00:53:09 +08:00
  • eb58f5953d [TPU][Bugfix] fix test_pallas (#20666) Chengji Yao 2025-07-09 09:32:48 -07:00
  • 4ac9c33f78 [Bugfix] Fix handling of Tensorizer arguments for LoadConfig (#20643) Sanger Steel 2025-07-09 11:36:37 -04:00
  • efe73d0575 [doc] update doc format (#20673) Reid 2025-07-09 23:08:19 +08:00
  • 853487bc1b [Docs] Improve docs for RLHF co-location example (#20599) Ricardo Decal 2025-07-09 08:06:43 -07:00
  • 9ff2af6d2b [Benchmark] Parameterization of streaming loading of multimodal datasets (#20528) Li Wang 2025-07-09 21:35:16 +08:00
  • 70ca5484f5 [Doc] Update notes (#20668) Cyrus Leung 2025-07-09 18:46:36 +08:00
  • 5358cce5ff [V1] [Doc] Update V1 docs for Mamba models (#20499) Thomas Parnell 2025-07-09 10:02:41 +02:00
  • 2155e95ef1 [Bugfix] Fix the issue where reasoning_content is None when Thinkng is enabled and tool_choice is set to 'required'. (#20662) Chauncey 2025-07-09 15:39:58 +08:00
  • f95570a52d [Docs] fix minimax tool_calling docs error (#20667) qscqesze 2025-07-09 15:37:07 +08:00
  • b6e7e3d58f [Intel GPU] support ray as distributed executor backend for XPU. (#20659) Kunshang Ji 2025-07-09 15:36:58 +08:00
  • e760fcef22 [XPU] Use spawn with XPU multiprocessing (#20649) Dmitry Rogozhkin 2025-07-09 00:34:28 -07:00
  • 6bbf1795b7 [Misc] Fix the size of batched_dummy_mm_inputs in profile_run (#20434) B-201 2025-07-09 11:15:44 +08:00
  • 9e0ef888f0 Fix bullets in incremental_build.md (#20642) Michael Goin 2025-07-09 12:03:41 +09:00
  • 97abeb1daa [feat] enable SM100 CUTLASS block scaled group gemm for smaller batch sizes (#20640) Duncan Moss 2025-07-08 20:03:35 -07:00
  • 34dad19e7b [Bugfix] set default set cuda_graph_sizes to min(self.max_num_seqs * 2, 512) (#20628) zhrrr 2025-07-09 11:02:51 +08:00
  • 6db31e7a27 [Hardware][PPC64LE] Enable V1 for ppc64le and ARM (#20554) Akash kaothalkar 2025-07-09 08:30:41 +05:30
  • 977180c912 [Docs] Improve documentation for multi-node service helper script (#20600) Ricardo Decal 2025-07-08 19:44:26 -07:00
  • c40784c794 [BugFix][Intel GPU] Use refactored API for dist_backend in V1 worker (#20596) Ratnam Parikh 2025-07-08 19:44:23 -07:00
  • baed180aa0 [tech debt] Revisit lora request model checker (#20636) kourosh hakhamaneshi 2025-07-08 18:42:41 -07:00
  • 0b407479ef [misc]refactor Platform.set_device method (#20262) Kunshang Ji 2025-07-09 09:39:47 +08:00
  • 5eaf570050 Replace multiply_add with homogeneous_multiply_add to Address Clang Template Parameter Issue (#20142) Wenxin Cheng 2025-07-08 17:30:18 -07:00
  • d8ee5a2ca4 [TPU][Bugfix] disable phi-3 test (#20632) QiliangCui 2025-07-08 16:14:26 -07:00
  • b9fca83256 [Bugfix] Fix GLM-4.1-V video prompt update (#20635) Isotr0py 2025-07-09 07:13:58 +08:00
  • 32dffc2772 [Core] Rename get_max_tokens_per_item for backward compatibility (#20630) Cyrus Leung 2025-07-09 07:11:30 +08:00
  • c438183e99 [Bugfix] Fix topk_ids indices_type for CUTLASS w8a8 FP8 MoE (#20166) Ming Yang 2025-07-08 16:10:57 -07:00
  • baba0389f7 [CI] Increase the threshold of the MTEB RERANK tests (#20615) wang.yuqi 2025-07-08 23:10:11 +08:00
  • c6c22f16d3 Revert invalid spellchecker fix on deepseek_vl2 (#20618) viravera 2025-07-08 08:07:14 -07:00
  • dd382e0fe3 [Model] Implement missing get_language_model for Keye-VL (#20631) Cyrus Leung 2025-07-08 22:47:46 +08:00
  • 849590a2a7 Update torch/xla pin to 20250703 (#20589) XiongfeiWei 2025-07-08 07:44:02 -07:00
  • a4c23314c0 [xpu]feat: support multi-lora on xpu (#20616) Yan Ma 2025-07-08 22:07:10 +08:00
  • b942c094e3 Stop using title frontmatter and fix doc that can only be reached by search (#20623) Harry Mellor 2025-07-08 11:27:40 +01:00
  • b4bab81660 Remove unnecessary explicit title anchors and use relative links instead (#20620) Harry Mellor 2025-07-08 10:49:13 +01:00
  • b91cb3fa5c [Docs] Improve documentation for Deepseek R1 on Ray Serve LLM (#20601) Ricardo Decal 2025-07-08 02:09:06 -07:00
  • 71d1d75b7a [PD][Nixl] Remote consumer READ timeout for clearing request blocks (#20139) Nicolò Lucchesi 2025-07-08 09:56:40 +02:00
  • 72d14d0eed [Frontend] [Core] Integrate Tensorizer in to S3 loading machinery, allow passing arbitrary arguments during save/load (#19619) Sanger Steel 2025-07-08 01:47:43 -04:00
  • e34d130c16 [TPU] Temporary fix vmem oom for long model len by reducing page size (#20278) Chenyaaang 2025-07-07 22:16:16 -07:00
  • 7721ef1786 [CI/Build][CPU] Fix CPU CI and remove all CPU V0 files (#20560) Li, Jiang 2025-07-08 13:13:44 +08:00
  • 8369b7c2a9 [Misc] improve error msg (#20604) Reid 2025-07-08 12:45:18 +08:00
  • 3eb4ad53f3 [Docs] Add Anyscale to frameworks (#20590) Ricardo Decal 2025-07-07 20:09:13 -07:00
  • 90a2769f20 [Docs] Add Ray Serve LLM section to openai compatible server guide (#20595) Ricardo Decal 2025-07-07 20:08:05 -07:00
  • e60d422f19 [Docs] Improve docstring for ray data llm example (#20597) Ricardo Decal 2025-07-07 20:06:26 -07:00
  • 0d914c81a2 [Docs] Rewrite offline inference guide (#20594) Ricardo Decal 2025-07-07 20:06:02 -07:00
  • 6e428cdd7a [Doc] Syntax highlight request responses as JSON instead of bash (#20582) Harry Mellor 2025-07-08 04:02:45 +01:00
  • 93b9d9f499 [Bugfix]: Fix messy code when using logprobs (#19209) Chauncey 2025-07-08 11:02:15 +08:00
  • af107d5a0e Make distinct code and console admonitions so readers are less likely to miss them (#20585) Harry Mellor 2025-07-08 03:55:28 +01:00
  • 31c5d0a1b7 [Optimize] Don't send token ids when kv connector is not used (#20586) Woosuk Kwon 2025-07-07 19:04:54 -07:00
  • afb7cff1b9 [Bugfix] Fix Maverick correctness by filling zero to cache space in cutlass_moe (#20167) Ming Yang 2025-07-07 18:07:22 -07:00
  • d2e841a10a [Misc] Improve logging for dynamic shape cache compilation (#20573) Kyle Yu 2025-07-07 20:48:09 -04:00
  • 14601f5fba [Config] Refactor mistral configs (#20570) Patrick von Platen 2025-07-08 00:25:10 +02:00
  • 042d131f39 Fix links in multi-modal model contributing page (#18615) Harry Mellor 2025-07-07 22:13:52 +01:00
  • 8e807cdfa4 [Misc] feat output content in stream response (#19608) rongfu.leng 2025-07-08 04:45:10 +08:00
  • e601efcb10 [Misc] Add fully interleaved support for multimodal 'string' content format (#14047) Anton 2025-07-07 22:43:08 +03:00
  • 22dd9c2730 [Kernel] Optimize Prefill Attention in Unified Triton Attention Kernel (#20308) jvlunteren 2025-07-07 21:08:12 +02:00
  • a6d795d593 [DP] Copy environment variables to Ray DPEngineCoreActors (#20344) Rui Qiao 2025-07-07 10:14:22 -07:00
  • a37d75bbec [Front-end] microbatch tokenization (#19334) ztang2370 2025-07-08 00:54:10 +08:00
  • edd270bc78 [Bugfix] Prevent IndexError for cached requests when pipeline parallelism is disabled (#20486) Peter Pan 2025-07-08 00:41:15 +08:00
  • 110df74332 [Model][Last/4] Automatic conversion of CrossEncoding model (#19675) wang.yuqi 2025-07-07 22:46:04 +08:00
  • 1ad69e8375 [Doc] Fix some MkDocs snippets used in the installation docs (#20572) Harry Mellor 2025-07-07 15:44:34 +01:00
  • b8a498c9b2 [Doc] Add outline for content tabs (#20571) Harry Mellor 2025-07-07 15:43:26 +01:00
  • 923147b5e8 [Doc] Fix internal links so they don't always point to latest (#20563) Harry Mellor 2025-07-07 12:15:50 +01:00
  • 45877ef740 [Doc] Use gh-pr and gh-issue everywhere we can in the docs (#20564) Harry Mellor 2025-07-07 11:54:22 +01:00
  • 6e4bef1bea [Doc] Remove extra whitespace from CI failures doc (#20565) Harry Mellor 2025-07-07 11:35:47 +01:00
  • 4ff79a136e [Misc] Set the minimum openai version (#20539) Jee Jee Li 2025-07-07 17:15:26 +08:00
  • 448acad31e [Misc] remove unused jinaai_serving_reranking (#18878) Abirdcfly 2025-07-07 17:14:12 +08:00
  • eb0b2d2f08 [Docs] Clean up tables in supported_models.md (#20552) Michael Yao 2025-07-07 16:46:31 +08:00
  • 3112271f6e [XPU] log clean up for XPU platform (#20553) Yan Ma 2025-07-07 16:38:22 +08:00
  • 1fd471e957 Add docstrings to url_schemes.py to improve readability (#20545) Michael Yao 2025-07-07 16:31:49 +08:00
  • 2c5ebec064 [XPU][CI] add v1/core test in xpu hardware ci (#20537) Liangliang Ma 2025-07-07 16:16:40 +08:00
  • 2e610deb72 [CI/Build] Enable phi2 lora test (#20540) Jee Jee Li 2025-07-07 13:10:41 +08:00
  • 6e2c19ce22 [Refactor]Abstract Platform Interface for Distributed Backend and Add xccl Support for Intel XPU (#19410) Yang Yang 2025-07-07 12:32:32 +08:00
  • 47db8c2c15 [Misc] add a tip for pre-commit (#20536) Reid 2025-07-07 10:42:06 +08:00
  • 462b269280 Implement OpenAI Responses API [1/N] (#20504) Woosuk Kwon 2025-07-06 18:32:13 -07:00
  • a5dd03c1eb Revert "[V0 deprecation] Remove V0 CPU/XPU/TPU backends (#20412)" (tags: v0.9.2rc2, v0.9.2) simon-mo 2025-07-06 14:02:36 -07:00
  • c18b3b8e8b [Bugfix] Add use_cross_encoder flag to use correct activation in ClassifierPooler (#20527) Cyrus Leung 2025-07-07 05:01:48 +08:00
  • 9528e3a05e [BugFix][Spec Decode] Fix spec token ids in model runner (#20530) Woosuk Kwon 2025-07-06 12:44:52 -07:00
  • 9fb52e523a [V1] Support any head size for FlexAttention backend (#20467) Cyrus Leung 2025-07-07 00:54:36 +08:00
  • e202dd2736 [V0 deprecation] Remove V0 CPU/XPU/TPU backends (#20412) Woosuk Kwon 2025-07-06 08:48:13 -07:00
  • 43813e6361 [Misc] call the pre-defined func (#20518) Reid 2025-07-06 18:25:29 +08:00
  • cede942b87 [Benchmark] Add support for multiple batch size benchmark through CLI in benchmark_moe.py (#20516) Brayden Zhong 2025-07-06 05:20:11 -04:00
  • fe1e924811 [Frontend] Support image object in llm.chat (#19635) Flora Feng 2025-07-05 23:47:13 -07:00
  • 4548c03c50 [TPU][Bugfix] fix the MoE OOM issue (#20339) Chengji Yao 2025-07-05 21:19:09 -07:00
  • 40b86aa05e [BugFix] Fix: ImportError when building on hopper systems (#20513) Lucas Wilkinson 2025-07-06 00:17:30 -04:00