Commit Graph

  • e54ee3ea33 [Core] Deduplicate generate/encode logic in AsyncLLM (#31510) Nick Hill 2025-12-29 18:42:45 -08:00
  • 358bfd315c fix: update kimi k2 tool parser logic (#31207) wangln19 2025-12-30 10:01:58 +08:00
  • 39512aba72 [Prefix Cache] Include lora_name in BlockStored event for deterministic KV-cache reconstruction (#27577) Sage 2025-12-30 02:17:16 +02:00
  • 0f35429a0c [CI]Test Group 'NixlConnector PD accuracy tests' is fixed (#31460) qli88 2025-12-29 17:48:56 -06:00
  • d63b969675 [CI/ROCm] Fixing "V1 Test attention (H100)" test group. (#31187) Alexei-V-Ivanov-AMD 2025-12-29 15:53:59 -06:00
  • 56f516254c [Bugfix][ROCm] Fix Static Quant Issue (#31502) Robert Shaw 2025-12-29 16:27:55 -05:00
  • 9152a30d8f [MoE Refactor][12/N] Marlin Fp8 MoE Pure Function (#31499) Robert Shaw 2025-12-29 16:27:00 -05:00
  • c2ff33cc8c [Core] Enable async scheduling by default (#27614) Nick Hill 2025-12-29 12:20:55 -08:00
  • b12cb38398 implements register kv caches in lmcache connector (#31397) chunxiaozheng 2025-12-30 03:13:42 +08:00
  • 5bc664110f Optimize QKNorm for MiniMax-M2/M2.1 (#31493) Roger Young 2025-12-30 00:30:18 +08:00
  • b3a2bdf1ac [Feature] Add offline FastAPI documentation support for air-gapped environments (#30184) RickyChen / 陳昭儒 2025-12-30 00:22:39 +08:00
  • e37e7349e6 Replace nn.ConvNd with vLLM's ConvNdLayer for Transformers modeling backend (#31498) Harry Mellor 2025-12-29 16:20:01 +00:00
  • b5d2d71d26 Migrate doc to website: Hardware Plugins (1/N) (#31496) Roy Wang 2025-12-29 23:55:20 +08:00
  • decc244767 [Docs] Use relative md links instead of absolute html links for cross referencing (#31494) Harry Mellor 2025-12-29 13:33:44 +00:00
  • 9c884faa95 [Bugfix] Preserve tool call id/type/name in streaming finish chunk (#31438) amittell 2025-12-29 08:10:52 -05:00
  • 48d5ca4e8b [CI] fix test_chat_truncation_content_not_null test (#31488) Chauncey 2025-12-29 20:47:08 +08:00
  • bf73a3e4d7 [Bugfix][Frontend] Fix Jina reranker multimodal input compatibility (#31445) twj 2025-12-29 17:13:18 +08:00
  • 3ecfdc3776 [ROCm][GPTQ][Bugfix] Fix GPTQ GEMM kernel output zeroing race condition (#30719) Andreas Karatzas 2025-12-29 03:13:14 -06:00
  • 45c1ca1ca1 [ROCm][CI] Skip DeepGemm-dependent test on ROCm platform (#31462) Andreas Karatzas 2025-12-29 01:31:10 -06:00
  • 17347daaa2 [CI/Build][CPU] Update CPU CI test cases (#31466) Li, Jiang 2025-12-29 14:17:52 +08:00
  • b9793e6a8c Add Fused MoE Triton kernels for GLM-4.5-Air, GLM-4.5v, GLM-4.6v on 2x RTX Pro 6000 (#31407) Mamy Ratsimbazafy 2025-12-28 17:38:33 +01:00
  • 0b6b701050 [Model] Add tuned triton fused_moe configs for Qwen3Moe on B200 (#31448) Jzz1943 2025-12-29 00:38:07 +08:00
  • 094fcce250 [BugFix] Re-fix async multimodal cpu tensor race condition (#31373) Nick Hill 2025-12-28 03:05:08 -08:00
  • 573dd0e6f0 [ROCm] Migrate xgrammar to upstream release (#31327) Andreas Karatzas 2025-12-28 02:08:29 -06:00
  • f70368867e [ROCm][CI] Add TorchCodec source build for transcription tests (#31323) Andreas Karatzas 2025-12-28 02:06:05 -06:00
  • 96142f2094 [ROCm][CI] Added perceptron lib in requirements for isaac multi-modal test (#31441) Andreas Karatzas 2025-12-27 22:15:14 -06:00
  • 62def07d67 [BugFix] register quant scale tensors as buffer (#31395) Boyuan Feng 2025-12-27 19:20:02 -08:00
  • b326598e97 add tip for VLLM_USE_PRECOMPILED arg to reduce docker build time (#31385) yitingdc 2025-12-28 11:19:47 +08:00
  • 727c41f3fd [MoE Refactor][10/N] Cleanup Fp8 Process Weights After Loading (#31169) Robert Shaw 2025-12-27 15:22:48 -05:00
  • 2f12cd32c0 [BugFix] Fix cache issue in compilation_config (#31376) Boyuan Feng 2025-12-27 06:30:39 -08:00
  • 40a8756224 [Chore]: Remove HF format Phi4-MM examples (#31405) Isotr0py 2025-12-27 21:42:02 +08:00
  • 3d024985ab [CI/Build] Ignore max transformers version for more common tests (#31401) Isotr0py 2025-12-27 21:06:26 +08:00
  • 8711b21676 Fix/get raw stream patch #30905 (#30912) baonudesifeizhai 2025-12-26 23:08:47 -05:00
  • 52bf066516 [Core][Hybrid allocator + connector] Support hybrid allocator + kv cache connector (#30166) Yifan Qiao 2025-12-26 18:25:46 -08:00
  • 5326c89803 [XPU][CI]skip test_preprocess_error_handling due to fork/spawn issue (#31381) Kunshang Ji 2025-12-27 05:40:44 +08:00
  • 87f1b8ca2c CustomOp: Unify aiter impl into GroupedTopk (#31221) Xinyu Chen 2025-12-27 01:44:29 +08:00
  • 887e900b77 [Docs] Add profiler user docs for http request (#31370) rongfu.leng 2025-12-26 23:48:15 +08:00
  • 48e744976c [Mistral common] Ensure all functions are imported from the top & only use public methods (#31138) Patrick von Platen 2025-12-26 13:48:24 +01:00
  • ce1eafd1a5 [Core] Initialize LoRA support for tower and connector in multi-modal models (#26674) Jee Jee Li 2025-12-26 20:48:20 +08:00
  • 0b544e6476 [Docs] Fix some snippets (#31378) Harry Mellor 2025-12-26 12:47:41 +00:00
  • c3666f56fd [Misc] Fix Qwen2-MoE shared_expert_gate (#31339) Jee Jee Li 2025-12-26 13:10:39 +08:00
  • c79dbfa9ad [CI] Fix flaky vision beam search test with flexible semantic validation (#31324) Andreas Karatzas 2025-12-25 22:39:32 -06:00
  • 9ee05cbe7f Support LoRA and GPTQModel for PLaMo 2/3 (#31322) Shinichi Hemmi 2025-12-26 12:41:33 +09:00
  • 3b8f31b362 [benchmark] use model card root instead of id (#31329) Ning Xie 2025-12-26 10:55:56 +08:00
  • 2cd94259c8 [CI/Build] Ignore max transformers version skipping for initialization tests (#30619) Isotr0py 2025-12-26 10:50:32 +08:00
  • b7165d53c6 Feature/isaac 0.1 (#28367) oscardev256 2025-12-25 21:49:11 -05:00
  • 81786c8774 [BugFix] Fix async scheduling + reasoning with struct output (#31332) Nick Hill 2025-12-25 15:01:02 -08:00
  • f1531d9f2a [Hybrid] Mamba2 prefix cache blocks freeing for running requests (#28047) Stan Wozniak 2025-12-25 21:54:06 +01:00
  • 2d6001f491 [Model][Ernie4.5-VL] Support video metadata for timestamp rendering (#31274) SongHe 2025-12-25 22:07:15 +08:00
  • 030fc44914 use the same stream for cuda graph capture and replay for NCCL (#29207) Amir Samani 2025-12-25 03:10:03 -08:00
  • 2532f437ee [Doc] Add troubleshooting for Triton PTX error about undefined gpu-name (#31338) Isotr0py 2025-12-25 18:26:34 +08:00
  • f15185fbdb [Benchmark Suite] improve cpu Benchmark Suite tests and comparison report for 0.12.0 (#30994) Louie Tsai 2025-12-25 00:51:45 -08:00
  • ba25a65992 [Frontend] add FunctionGemma tool parser support (#31218) Mark Gatere 2025-12-25 10:29:25 +03:00
  • 42826bbccd [Doc] Add tool call parser documentation for GPT-OSS models (#31212) Amith KK 2025-12-25 10:59:10 +05:30
  • 254f6b9867 [Bugfix] Fix eagle dp tests on A100 (#31241) Richard Zou 2025-12-24 19:05:04 -05:00
  • bc5ef333e0 [Perf] Add skip_clone to SamplingParams for internal request handling (#31041) Michael Goin 2025-12-24 17:35:57 -05:00
  • 09dc7c690c [Chore][1/2] Drop v0.14 deprecations (#31285) Cyrus Leung 2025-12-25 01:54:01 +08:00
  • 506eb0f454 [Bugfix] Remove dead block_quant_to_tensor_quant function (#31294) ゆり 2025-12-25 01:22:48 +08:00
  • 5d93089686 [cli] complete vllm cli help message (#31226) Ning Xie 2025-12-24 23:45:47 +08:00
  • 66c9887440 [Bugfix][Hardware][AMD] Fix FP8 dtype in silu_mul quantization (#31179) Kevin McKay 2025-12-24 09:37:11 -06:00
  • 1ff67df182 [CI] Reorganization pooling_mteb_test (#31265) wang.yuqi 2025-12-24 23:36:20 +08:00
  • 7cd288a4b3 [PERF] Add interleaved memory allocation to NUMA module (#30800) skaraban3807 2025-12-24 19:17:49 +05:30
  • d201807339 [Chore] Bump lm-eval version (#31264) Cyrus Leung 2025-12-24 21:39:13 +08:00
  • aa3868ecfe [Chore] Remove unused noqas (#31263) Cyrus Leung 2025-12-24 21:38:46 +08:00
  • 7adeb4bfa8 [Bugfix] Fix max_model_len="auto" handling (#31260) Cyrus Leung 2025-12-24 19:15:27 +08:00
  • bd89ce16d2 [Model] Introduce verify_and_update_model_config for VerifyAndUpdateConfig. (#31131) wang.yuqi 2025-12-24 17:54:57 +08:00
  • b41aeb3468 [Bugfix][ROCm] Fix load issue on deepseek quark quantization when shared expert enabled (#31261) Pleaplusone 2025-12-24 16:47:44 +08:00
  • ddfac7034e [CI/Build] Ignore data_parallel_size_local (#30281) Ryan Rock 2025-12-24 01:40:54 -06:00
  • 6559d96796 [ROCm][CI] Set TORCH_NCCL_BLOCKING_WAIT Distributed Tests On ROCm (#31259) Micah Williamson 2025-12-24 01:19:07 -06:00
  • 1c74150bca [ROCm][CI] Fix "Distributed Tests (H200)" Test (#31227) kliuae 2025-12-24 14:56:30 +08:00
  • 0247a91e00 [ROCm][CI] Fix entrypoints tests and Python-only installation test on ROCm (#28979) Andreas Karatzas 2025-12-24 00:42:30 -06:00
  • 8ee90c83f8 Add --max-model-len auto to auto-fit context to available memory (#29431) Michael Goin 2025-12-24 00:37:14 -05:00
  • d7e05ac743 [docker] Fix downloading sccache on aarch64 platform (#30070) Nick Cao 2025-12-23 21:36:33 -08:00
  • 471ddb99a0 [XPU] Remove distributed_executor_backend check (#30760) sihao_li 2025-12-24 13:34:33 +08:00
  • bb24592d13 [Qwen3-Omni] fixed _get_feat_extract_output_lengths function (#31007) Xiong Wang 2025-12-24 13:33:54 +08:00
  • 369f47aa0f [DeepSeek v3.2] Remove unnecessary syncwarps (#31047) Matthew Bonanni 2025-12-24 00:33:30 -05:00
  • dabff12ed3 [Bugfix][ROCm][Dynamo][DS 3.1][FP8] fix unsupported hasattr call when Dynamo tracing for ROCm device (#31149) zejunchen-zejun 2025-12-24 13:32:19 +08:00
  • 3bb9561928 Revert "[bench] Support common prefix len config (for decode-only bench)" (#31240) Ming Yang 2025-12-23 21:17:23 -08:00
  • 3ce791ac77 [ROCm][CI] Set VLLM_FLOAT32_MATMUL_PRECISION="tf32" For terratorch Tests In AMD CI (#31242) Micah Williamson 2025-12-23 21:21:50 -06:00
  • e42894f5b5 [ROCm][CI][Bugfix] Fix Siglip2 rotary embedding dispatch and InternVL video test tolerance (#31235) Andreas Karatzas 2025-12-23 20:56:58 -06:00
  • 76e6a95192 [Bug] Fix Number of dimensions of tensors must match. for Deepseek V3.2 (#31160) Wentao Ye 2025-12-23 21:41:09 -05:00
  • 8b59753cdb [P/D] Mooncake connector support more protocols (#30133) Chao Lei 2025-12-24 10:24:07 +08:00
  • 538e830caa [KVEvent] User request.block_hash for parent block_hash (#30544) Chen Zhang 2025-12-23 18:23:43 -08:00
  • 4ed11105d7 [Misc] Remove unused custom ops copy_blocks and copy_blocks_mla (#30967) rongfu.leng 2025-12-24 10:22:35 +08:00
  • dd424571c8 [Bugfix] Enable dynamic_dims for different embeds shape (#31223) Cyrus Leung 2025-12-24 10:15:47 +08:00
  • ca6a95ba25 [Chore] Simplify logic of _execute_mm_encoder (#31222) Cyrus Leung 2025-12-24 10:15:16 +08:00
  • bc0a5a0c08 [CI] Add Qwen3-Next-FP8 to Blackwell model tests (#31049) Vadim Gimpelson 2025-12-24 05:21:50 +04:00
  • bfa2c0bbb9 [ROCm][Bugfix] Fix RuntimeError in MMEncoderAttention by replacing .view() with .reshape() (#31203) Andreas Karatzas 2025-12-23 15:48:01 -06:00
  • f790068600 [Core] Add a random suffix to frontend-provided request IDs (#27987) Mark McLoughlin 2025-12-23 21:05:39 +00:00
  • 34916ae37f [Mamba] - Consolidate Mambas Attention Logic (#28133) Asaf Joseph Gardin 2025-12-23 22:57:00 +02:00
  • 0736f901e7 docs: Add llm-d integration to the website (#31234) Yuan Tang 2025-12-23 15:27:22 -05:00
  • c016c95b45 Use helper function instead of looping through attribute names (#29788) Harry Mellor 2025-12-23 17:31:56 +00:00
  • 1339878e13 Only patch original_max_position_embeddings for Transformers v4 (#31214) Harry Mellor 2025-12-23 16:46:32 +00:00
  • b94f80ffb8 [FIX] FP4 quantization kernel padding initialization bug (#31097) danielafrimi 2025-12-23 18:45:18 +02:00
  • 38c361f99d Fix edge case Mistral tool parser (#30724) Joachim Studnia 2025-12-23 15:19:58 +01:00
  • bb62dda2c3 [Misc] Introduce encode_*_url utility function (#31208) Cyrus Leung 2025-12-23 21:45:21 +08:00
  • 3faa8bee57 adapt voxtral (#31095) Patrick von Platen 2025-12-23 14:31:55 +01:00
  • b10d47e0e0 Add util function for checking nesting of rope parameters (#31146) Harry Mellor 2025-12-23 11:41:49 +00:00
  • 769f27e701 [OpenAI] Add parameter metadata to validation errors (#30134) R3hankhan 2025-12-23 17:00:12 +05:30
  • 23daef548d [Frontend] Support using chat template as custom score template for reranking models (#30550) Jakub Zakrzewski 2025-12-23 12:19:16 +01:00