Commit Graph

  • 7b5575fa7d [Bug] Fix vLLM config is not set error (#29999) Wentao Ye 2025-12-05 16:42:12 -05:00
  • 77e4472809 let draft model follow target model's config_format (#30152) Bangsheng Tang 2025-12-05 13:33:42 -08:00
  • 962d703818 [Bugfix][llama4_eagle] Fix missing 'lm_head' attribute (#29926) Divakar Verma 2025-12-05 13:57:26 -06:00
  • e23ca3a0e8 [CI] Re-use whisper_client for all tests (#30148) Nicolò Lucchesi 2025-12-05 20:47:37 +01:00
  • 3633035a3f [Misc] Rename CohereForAI references to CohereLabs (#30147) Russell Bryant 2025-12-05 14:41:40 -05:00
  • bff78310d9 [Enc-Dec] Fix OOT tokenizer issue (#30144) Nicolò Lucchesi 2025-12-05 20:23:33 +01:00
  • adb315060c [KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache (#27170) Tova Movshovitz 2025-12-05 20:33:26 +02:00
  • 4e26d3b09e [Compile] Conditional compilation. Introduce compile_ranges (#24252) Ilya Markov 2025-12-05 19:17:32 +01:00
  • 66e674cdd5 [Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315) Matthew Bonanni 2025-12-05 12:48:43 -05:00
  • dff0a2b394 [NIXL] Add remote_request_id to kv_transfer_params (#29665) Mark McLoughlin 2025-12-05 17:43:48 +00:00
  • dc264bcea1 [BugFix] Eagerly abort cancelled final-step requests (#29987) Nick Hill 2025-12-05 09:28:32 -08:00
  • 78c44fd722 [NIXL] Small cleanup of unused variables (#29618) Nicolò Lucchesi 2025-12-05 18:17:36 +01:00
  • e7296b08da [bugfix] Pass globals to aot_compiled function (#29428) Angela Yi 2025-12-05 08:54:26 -08:00
  • da7bc54ea8 [responsesAPI][5] ResponsesParser with tools for full MCP python loop (#29798) Andrew Xia 2025-12-05 08:11:50 -08:00
  • 949a6a19d2 [NIXL] Add compatibility checking to NIXL KV connector handshake (#29503) Mark McLoughlin 2025-12-05 14:52:45 +00:00
  • 2c174420f5 Reduce validation to a warning (#28749) Alec S 2025-12-05 09:02:49 -05:00
  • 0d8a7d8a26 [Compressed Tensors] Add XPU wNa16 support (#29484) Yi Liu 2025-12-05 22:02:09 +08:00
  • 9843e332da [CPU][Perf] Add fast vectorized exp impl from Arm Optimized Routines (#30068) Elham 2025-12-05 08:09:20 -05:00
  • b7d85cf25c [CI] Have pre-commit comment on a PR if pre-commit was not used (#30077) Harry Mellor 2025-12-05 13:03:45 +00:00
  • c2894d3883 [Feature] Add Layer-wise NVTX Support (#29990) Max Hu 2025-12-05 06:20:07 -05:00
  • 3628bcaaf2 [ROCm][MXFP4] Infer w4a4 quant method in rocm aiter fused moe (#29775) Zhiwei 2025-12-05 19:01:16 +08:00
  • b73b158ab0 [Bugfix] Fix parse_output_message crash on commentary with no recipient (#29972) strinczer 2025-12-05 10:51:12 +00:00
  • 7ae13c66ba [typing] fix type (#29964) Ning Xie 2025-12-05 18:46:08 +08:00
  • f16356fe36 [bench] Support common prefix len config (for decode-only bench) (#29934) Ming Yang 2025-12-05 02:26:52 -08:00
  • 65ee97288a [BugFix] Adding env variable to disable async grammar compilation (#29996) Alec S 2025-12-05 03:49:37 -05:00
  • 62b3333448 [Frontend] Remove deprecated -O.xx flag (#29991) Yanan Cao 2025-12-05 00:47:22 -08:00
  • feecba09af [CI/Build][AMD] Use float16 in test_reset_prefix_cache_e2e to avoid accuracy issues (#29997) rasmith 2025-12-05 02:42:25 -06:00
  • 6038b1b04b [Frontend][Model] Add 'float16' to possible mamba cache dtype values, override mamba SSM cache dtype value for NemotronH (#29978) amitz-nv 2025-12-05 10:34:33 +02:00
  • 60a66ea2dc [DOC]: Add kthena to integrations (#29931) Tiger Xu / Zhonghu Xu 2025-12-05 16:11:03 +08:00
  • 06579f9a82 [AMD][CI] Add ray[default] Dependency On ROCm To Pass v1/metrics/test_engine_logger_apis.py (#30110) Micah Williamson 2025-12-05 00:48:23 -06:00
  • 6e865b6a83 Refactor example prompts fixture (#29854) Chukwuma Nwaugha 2025-12-05 06:44:32 +00:00
  • d698bb382d [Bugfix] Correct num_q_heads on DCP for Flashinfer backends (#29487) Jingchun Gao 2025-12-05 13:54:31 +08:00
  • 2c22c4ca2d [ROCm][CI] Increase the memory threshold for test_deep_sleep_fp8_kvcache (#30104) Charlie Fu 2025-12-04 22:51:44 -06:00
  • 5867819eaf Do not guard during noop elimination pass (#30095) Laith Sakka 2025-12-04 20:10:12 -08:00
  • 7c9b2c8f81 [ROCm][CI] Add jiwer dependency for testing (#30081) Charlie Fu 2025-12-04 21:34:51 -06:00
  • 0098a6e3da [PCP&DCP] move CUDAGraph check for PCP&DCP to the check func of platforms (#29952) Qiu 2025-12-05 10:40:51 +08:00
  • befb59e5b1 [Model] Add Holo2 reasoning parser (#30048) Hubert de La Jonquiere 2025-12-05 03:38:45 +01:00
  • aaddc9c82a [CI] fix silent error in nightly wheel index generation script, add generation time to HTML index (#30060) Shengqi Chen 2025-12-05 08:48:59 +08:00
  • 263c38d74d [CI/Build] Update batch invariant test trigger (#30080) Zhewen Li 2025-12-04 16:42:37 -08:00
  • bcf43ab1f3 [CI/Build][AMD] Add Llama4 Maverick FP8 to AMD CI (#28695) Zhewen Li 2025-12-04 16:07:20 -08:00
  • 4470ee2f90 [Perf] Enable separate shared_experts stream only for CUDA (#30085) Alexander Matveev 2025-12-04 19:03:17 -05:00
  • 690cc3ef20 docs: update metrics design doc to use new vllm:kv_cache_usage_perc (#30041) TimWang 2025-12-05 07:37:14 +08:00
  • 1f0d184590 [aot_compile]change VLLM backend to read fake args from example_value (#29104) Laith Sakka 2025-12-04 14:33:45 -08:00
  • c8ab988b15 [BugFix] Fix DBO assert assert B_block_table == B_q (#29933) Lucas Wilkinson 2025-12-04 14:48:54 -05:00
  • 48a5fff66e [Bugfix] Missing tokens in return_token_ids when tool parsers is enabled in streaming mode (#29074) Peng-YM 2025-12-05 03:09:39 +08:00
  • 1119f6e47a Abstract eplb algo (#26471) Mercykid-bash 2025-12-05 03:09:09 +08:00
  • e10c84e06a Access partial_rotary_factor from rope_parameters (#29966) Harry Mellor 2025-12-04 18:42:49 +00:00
  • ece2825a29 [KVConnector] Remove v0-related kv connector components such as kv pipe and kv lookup buffer (#29705) Kuntai Du 2025-12-05 02:20:48 +08:00
  • 652ba93da3 [Bugfix] Fix FP8 MoE LoRA (#29890) Jee Jee Li 2025-12-05 02:17:49 +08:00
  • 6dcb07f676 support qwen3-vl handle requests with embeddings (#30037) Tao Yun 2025-12-05 01:34:06 +08:00
  • 46cbbca05c [CI][DCP][Perf] reduce DCP CI execution time (#29858) Qiu 2025-12-05 01:28:21 +08:00
  • b286a311c2 [Chore] Deprecate merge_by_field_config arg (#30035) Cyrus Leung 2025-12-05 01:21:24 +08:00
  • 990f806473 [Doc] clarify nightly builds in developer docs (#30019) Shengqi Chen 2025-12-05 00:28:37 +08:00
  • 5b4b42c0b6 Mark DBO test as flaky on b200 for Distributed B200 test (#29913) Doug Smith 2025-12-04 10:38:03 -05:00
  • cc050558f4 [Model Runner V2] Implement get_num_sampled_and_rejected kernel (#30029) Woosuk Kwon 2025-12-04 07:19:42 -08:00
  • 5c32a06a04 Use Transformers v5 RoPE standardisation and validation (#30046) Harry Mellor 2025-12-04 14:54:28 +00:00
  • dd97e047e0 Fix broken multiline assert in LoRAModelManager.register_module (#30032) Yongtao Huang 2025-12-04 22:04:42 +08:00
  • 9998ea5b57 Delete HF version of Phi 4 MM (#30049) Harry Mellor 2025-12-04 13:44:50 +00:00
  • 74c4d80c6c [Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling (#27145) wang.yuqi 2025-12-04 21:44:15 +08:00
  • 1b7c7f5159 [release] install regex (#30008) Kevin H. Luu 2025-12-04 03:18:29 -08:00
  • 6796ce8bdb [Bugfix] Fix the issue with interleaved thinking when using streaming (#30033) Chauncey 2025-12-04 19:11:59 +08:00
  • e96a6a6dca [ROCm][CI][Bugfix] Fixing the Multi-Modal Models Test (Extended) 1 group (#30013) Andreas Karatzas 2025-12-04 05:00:16 -06:00
  • 6366c098d7 Validating Runai Model Streamer Integration with S3 Object Storage (#29320) Noa Neria 2025-12-04 12:04:43 +02:00
  • 842aba501d [P/D] Introduce Mooncake Transfer Engine as kv_connector (#24718) dtc 2025-12-04 17:51:36 +08:00
  • f2f4cea6cc [CI/Build][AMD] Skip test on test_hybrid_attention_mamba_tensor_shapes on ROCm, requires FLASHINFER (#29995) rasmith 2025-12-04 03:30:22 -06:00
  • dfdda96747 [Core] Remove forced None assignment for deprecated PassConfig flags (#29994) Arpit Khandelwal 2025-12-04 04:15:04 -05:00
  • ffdd18111b Add DeepSeek-V3.2 tool parser. (#29848) Xu Wenqing 2025-12-04 16:46:34 +08:00
  • b8a6ae4158 [ROCm] add fallback for aiter fp8 decode mla (#30005) Ye (Charlotte) Qi 2025-12-04 00:45:57 -08:00
  • 899e2ef558 [Core] Fix standalone runs of test_reset_prefix_cache_e2e (#29899) Mark McLoughlin 2025-12-04 08:22:03 +00:00
  • 68eb5c8d97 [Misc] Move functions into PoolingMetadata (#30027) Cyrus Leung 2025-12-04 16:21:19 +08:00
  • 5430e110c0 [CI][AMD] Match Main CI Behavior By Skipping test_eplb_spec_decode In AMD CI (#30006) Micah Williamson 2025-12-04 02:20:54 -06:00
  • 3f1b03739a [ROCm] [Bugfix] compute_attn_mask_seqlen for qwen3 omni (#29974) TJian 2025-12-04 16:20:24 +08:00
  • 9aa33a74b0 [Rocm][CI] Fix test_speculator_eagle3 by skipping the CompressedTensorw4a16 Model (#30001) Charlie Fu 2025-12-04 01:52:28 -06:00
  • fd68e909db [docs] Remove _total from counter metrics names (#30028) CYJiang 2025-12-04 15:46:15 +08:00
  • 404fc4bfc0 [Frontend] refactor harmony utils output message parsing (#29820) daniel-salib 2025-12-03 23:36:57 -08:00
  • 82a64b3d8f [Bugfix] fixed deepseekv32 tool calling error (#30025) Chauncey 2025-12-04 15:12:12 +08:00
  • 9ae2f60374 [Misc] Various cleanups for MM input processing (#29970) Cyrus Leung 2025-12-04 14:22:20 +08:00
  • 80f8af4b2f Fix error while downloading dependencies for CPU backend (#29797) Jianwei Mao 2025-12-04 14:04:44 +08:00
  • 8aaa81b35f [KVConnector] remove unused code (the model aware kv ops class) (#29709) Kuntai Du 2025-12-04 14:00:52 +08:00
  • fca3f46658 [Frontend] Fixes anthropic /v1/messages streaming not containing input_tokens on first chunk (#29971) Benjamin Bartels 2025-12-04 05:50:27 +00:00
  • 28097d5638 [Bugfix][CPU] Fix CPU KV cache fallback memory allocation (#29604) gausah01 2025-12-04 05:01:15 +00:00
  • dd38ba3a26 [Bugfix] Fix adapter_enabled IMA (#29977) Jee Jee Li 2025-12-04 12:51:15 +08:00
  • 5f91cdda75 [Misc] Add docker build env for Ascend NPU (#30015) Li Wang 2025-12-04 11:53:00 +08:00
  • 33a3d6c798 fix LoRA-related examples (#29956) Iceber Gu 2025-12-04 11:48:30 +08:00
  • c493b9d092 [CI/Build] Add MM code path to Examples Test (#29986) Zhewen Li 2025-12-03 19:21:45 -08:00
  • ad32e3e19c enable multi-node in external launcher mode (#29833) Xieyang Xu 2025-12-03 17:02:02 -08:00
  • 1109f98288 [CI] fix docker image build by specifying merge-base commit id when downloading pre-compiled wheels (#29930) Shengqi Chen 2025-12-04 06:08:19 +08:00
  • b5407869c8 [Bugfix] Respect VLLM_CONFIGURE_LOGGING value (#28671) Elizabeth Thomas 2025-12-03 16:00:52 -06:00
  • 2902c34826 [Kernels] Remove BatchedTritonOrDeepGemmExperts and default fallback to Triton (#29929) bnellnm 2025-12-03 15:49:00 -05:00
  • ac1886588f [CI] Fix re import error (#29973) Wentao Ye 2025-12-03 15:16:54 -05:00
  • 2fc5d6e0d7 Fix LLMEngine.__del__ dp_group cleanup condition (#29954) Yongtao Huang 2025-12-04 04:14:44 +08:00
  • afe9eb408e [Bugfix] Fix flashinfer ar+norm kernel not available issue (#29960) elvischenv 2025-12-04 02:50:53 +08:00
  • 19bee6d12d [Performance][DP/EP] Add silu_mul_per_token_group_quant_fp8_colmajor kernel (#29470) Varun Sundar Rabindranath 2025-12-03 13:04:59 -05:00
  • dd5d1ef780 [Bugfix] Mistral tool parser streaming update (#19425) avigny 2025-12-03 18:45:31 +01:00
  • d1f7392c5f [ROCm][CI] Fix v1/logits_processors failure on ROCm (#29927) Micah Williamson 2025-12-03 11:17:07 -06:00
  • 9ae3c55b10 SigLIP example add chat_template (#29902) Yu Jiaqi 2025-12-04 00:12:58 +08:00
  • 9bcf92295a [Core] Add xxHash as a high-performance hash option for accelerating prefix caching (#29163) Lumis Chen 2025-12-04 00:06:57 +08:00
  • 5aa9b09040 [CI/Build][AMD] Skip test_shared_storage_connector_hashes in test_shared_storage_connector.py due to hipErrorLaunchFailure when calling .cpu() (#29839) rasmith 2025-12-03 08:56:35 -06:00
  • 1bb17ecb39 [CPU Backend] [Doc]: Update Installation Docs for CPUs (#29868) ioana ghiban 2025-12-03 14:33:50 +01:00
  • 15b1511a15 [GPU Backend] [Doc]: Remove duplicate statements on missing GPU wheels. (#29962) ioana ghiban 2025-12-03 13:56:47 +01:00