Commit Graph

  • ac201a0eaf [Feature] Support Decode Context Parallel (DCP) for MLA (#23734) yzds 2025-09-06 13:24:05 +08:00
  • 3c529fc994 [KV Sharing] Raise error if using eagle with fast prefill (#24350) Yong Hoon Shin 2025-09-05 20:22:40 -07:00
  • 35bf193864 [Doc]: fix typos in Python comments (#24294) Didier Durand 2025-09-06 04:41:12 +02:00
  • 35efa70297 Add @22quinn as code reviewer for RL related components (#24346) 22quinn 2025-09-05 18:56:15 -07:00
  • cee182b297 [Perf][V1] Fully overlap model execution (#23569) Benjamin Chislett 2025-09-05 21:20:17 -04:00
  • c954c6629c [CI] Add timeouts to tests (#24260) Rafael Vasquez 2025-09-05 20:26:22 -04:00
  • 9dfbeb41e5 [RFC] allow cancelation after shutdown in blocking collective_rpc (#23390) Shiyan Deng 2025-09-05 14:14:18 -07:00
  • eedb2a2a10 [Bugfix] Fix silu_mul+quant fusion test (#24341) elvischenv 2025-09-06 04:13:42 +08:00
  • 23a6c5280e [gpt-oss][Bugfix]Fix streamableparser for missing handling of certain token_ids (#24306) Chauncey 2025-09-06 01:26:00 +08:00
  • 7812bcf278 [docs] add shenzhen meetup (#24326) youkaichao 2025-09-05 22:48:42 +08:00
  • 006e7a34ae Adding int4 and int8 models for CPU benchmarking (#23709) Louie Tsai 2025-09-05 05:08:50 -07:00
  • e599e2c65e [XPU][P/D] Add XPU support in NixlConnector (#22436) liuzhenwei 2025-09-05 12:03:12 +08:00
  • c29fb540ff [gpt-oss] tool parser supports for /chat/completions [1/n] (#22386) Aaron Pham 2025-09-04 23:39:12 -04:00
  • 65e038931d [Frontend] Skip unnecessary detokenization when token_id is requested (#24236) Nicolò Lucchesi 2025-09-05 01:04:12 +02:00
  • 886ccbe5ba [CI/Build] Reduce the number of redundant cases to test for LoRA (#24276) Zhuohan Li 2025-09-04 14:58:44 -07:00
  • adc3ddb430 [Bugfix][Misc] Fix silu_and_mul_nvfp4_quant issue and extract common utils for nvfp4 kernel source files (#23727) elvischenv 2025-09-05 05:25:45 +08:00
  • 60b755cbcb [Misc] Have AsyncLLM custom_stat_loggers extend default logger list (#20952) Seiji Eicher 2025-09-04 14:25:30 -07:00
  • 482e52f56c QWEN3 Coder Fused MoE kernels Optimization configs (#24266) Saman A. Pour 2025-09-04 13:33:43 -07:00
  • 78336a0c3e Upgrade FlashInfer to v0.3.0 (#24086) Po-Han Huang (NVIDIA) 2025-09-05 00:49:20 +08:00
  • 94866d7c93 [Misc] Slight improve deepgemm print (#24085) Jee Jee Li 2025-09-05 00:06:51 +08:00
  • 83609ca91d [Doc]: fix typos in Python comments (#24173) Didier Durand 2025-09-04 17:52:17 +02:00
  • e41a0fa377 [Perf] Freeze core engine proc heap after init (#24008) Nick Hill 2025-09-04 07:55:23 -07:00
  • 37241077d5 [Misc] Removed force_fp8_e4m3fnuz from FP8LinearOp (#23725) nvjullin 2025-09-04 21:25:40 +08:00
  • c9f7081f9c [LoRA]: Add lora support to qwen-2.5-omni (#24231) Yash Pratap Singh 2025-09-04 18:20:50 +05:30
  • 16ded21eeb [XPU] support Triton Attention backend on Intel GPU (#24149) Kunshang Ji 2025-09-04 20:41:08 +08:00
  • 2b30afa442 Use hidden_size_per_head as head_size fallback (#24221) nopperl 2025-09-04 20:59:16 +09:00
  • eafa8dcde6 [Model] Add pp support for hunyuan (#24212) Jiangyun Zhu 2025-09-04 18:58:26 +08:00
  • 6c7af8110a [Doc] Update vLLM Singapore Meetup info (#24234) TJian 2025-09-04 02:58:18 -07:00
  • 8f423e5f43 [Feature][Response API] Add streaming support for non-harmony (#23741) Kebe 2025-09-04 18:49:06 +09:00
  • 369a079568 [Hardware][Apple-CPU] Disable OneDNN build for Apple Silicon (#24200) Ignacio Sica 2025-09-04 06:48:25 -03:00
  • 402759d472 [Attention] FlashAttn MLA (#14258) Lucas Wilkinson 2025-09-04 05:47:59 -04:00
  • 2c301ee2eb [Bugfix] Fix Incremental Detokenization with tokenizers == 0.22.0 (#24159) Fanli Lin 2025-09-04 17:47:08 +08:00
  • 3efb9f4d95 [Attention][Platform] Refactor MLA to support Custom Op (#23332) whx 2025-09-04 17:46:37 +08:00
  • 04f3c35cff Improve flexibility of auto_tune.sh execution. (#23766) anthonsu 2025-09-04 02:41:41 -07:00
  • 51d5e9be7d [Core][Model] Terratorch backend integration (#23513) mgazz 2025-09-04 08:22:41 +01:00
  • e7fc70016f [Model] Add MiDashengLM model support (#23652) bingchen-mi 2025-09-04 15:08:09 +08:00
  • 12e1e63cc5 [Misc] Enhance output readability of helper script (#24214) Weida Hong 2025-09-04 14:38:26 +08:00
  • 57b1ce94f7 [CPU] Refactor CPU unquantized linear (#24150) Li, Jiang 2025-09-04 14:28:45 +08:00
  • cb55ad86fe Migrate ultravox inputs to TensorSchema (#23503) Benji Beck 2025-09-03 23:09:11 -07:00
  • 712b273f65 [Refactor] Introduce basic Renderer for completion-style request (#24010) Flora Feng 2025-09-03 22:21:12 -07:00
  • e919d6f549 [Kernel][Bugfix] Fix grouped topk cu (#24146) Qiming Zhang 2025-09-03 21:37:37 -07:00
  • a38f8bd54c [Feature][Responses API]Support MCP tools with streaming mode + background mode (#23927) wuhang 2025-09-04 12:05:10 +08:00
  • b5ee1e3261 Remove deprecated PyNcclConnector (#24151) Peter Pan 2025-09-04 06:49:16 +08:00
  • 36c260dad6 [Feature][gpt-oss] Add support for num_cached_tokens and num_reasoning_tokens tracking (#23460) George Nagy II 2025-09-03 15:08:47 -06:00
  • a43a3f1770 [Bugfix][DP] DP distribution does not require ray[default] (#23822) Kebe 2025-09-04 05:21:36 +09:00
  • 6adaed42f4 [Feature][P/D]: Optimize NIXL Connector xfer Launch (#23887) WeiQing Chen 2025-09-04 03:14:30 +08:00
  • a742322092 [Attention] Blackwell FP8 MLA support with CUTLASS_MLA backend (#23289) Matthew Bonanni 2025-09-03 14:05:24 -04:00
  • 731a6940e3 Migrate whisper inputs to TensorSchema (#23505) Benji Beck 2025-09-03 11:04:00 -07:00
  • e9b92dcd89 [Kernels] Overlap shared experts with send/recv (#23273) bnellnm 2025-09-03 12:35:18 -04:00
  • fa4311d85f [V1] v1 engine + full CUDA graph support for PLaMo2 (#23998) nopperl 2025-09-04 00:24:02 +09:00
  • 6d80ae83e1 [Bugfix] Fixing division by zero in triton_attn if query_heads/kv_heads > 16 (#23424) Burkhard Ringlein 2025-09-03 17:01:09 +02:00
  • 4ba0c587ba FIX: Add libnuma-dev to Dockerfile for dev stage (#20388) dongbo910220 2025-09-03 22:17:20 +08:00
  • 6997a25ac6 [Model] Remove useless code from MiniMax implementation (#23982) qscqesze 2025-09-03 19:27:04 +08:00
  • 28f350e147 Support add_generation_prompt in embeddings endpoint with chat request (#23931) Jakub Smid 2025-09-03 12:47:55 +02:00
  • 51383bd472 [CI] Accelerate mteb test by setting SentenceTransformers mteb score to a constant (#24088) wang.yuqi 2025-09-03 17:23:56 +08:00
  • 9c99e4871f [Misc] Clean up deadcode for legacy processing pipeline (#24153) Isotr0py 2025-09-03 16:34:29 +08:00
  • 70549c1245 [CI/Build] Serve images used by multimodal tests through local HTTP Server (#23907) dsinghvi 2025-09-03 13:43:11 +05:30
  • f0c503f66e [Nixl] Heterogeneous TP support FlashInfer (#20189) Nicolò Lucchesi 2025-09-03 09:19:54 +02:00
  • f38035c123 [distributed][rl] remove nccl cumem env var override (#24141) youkaichao 2025-09-03 14:45:25 +08:00
  • 426cc8629f [BugFix] Fix routed_scaling_factor double mul for dots1 and glm4 MoE models (#24132) Yong Hoon Shin 2025-09-02 21:57:59 -07:00
  • e81d4e69c1 [Misc] Add check for dual_chunk_attention (#24070) Jiangyun Zhu 2025-09-03 12:19:14 +08:00
  • 02d411fdb2 [Doc]: fix typos in Python comments (#24115) Didier Durand 2025-09-03 06:14:07 +02:00
  • d7e1e59972 [Doc]: fix typos in Python comments (#24093) Didier Durand 2025-09-03 06:05:45 +02:00
  • c4ed78b14f [Compile] Fix Compile Warning for w4a8_mm_entry.cu (#23660) Wentao Ye 2025-09-02 23:45:52 -04:00
  • 1bd007f234 fix some typos (#24071) co63oc 2025-09-03 11:44:50 +08:00
  • 136d853e65 [V1] Wrapper which plumbs request-level logits processors into vLLM batch-level logits processing (#23656) afeldman-nm 2025-09-02 22:52:51 -04:00
  • e32a0e8678 Upgrade xgrammar to 0.1.23 (#22988) Russell Bryant 2025-09-02 22:32:59 -04:00
  • 42dc59dbac Update release pipeline post PyTorch 2.8.0 update (#24073) youkaichao 2025-09-03 10:09:19 +08:00
  • 862f2ef893 [XPU] Fix the bug of LoRA logits on the XPU platform (#24081) Chaojun Zhang 2025-09-03 08:21:18 +08:00
  • 2fd1a40a54 [CI/Build] Disable SiluMul NVFP4 quant fusion tests (#24121) Matthew Bonanni 2025-09-02 19:50:28 -04:00
  • 930a24144c [Bug] R1 Accuracy: Fix routed_scaling_factor Double Mul Issue (#24119) Wentao Ye 2025-09-02 18:22:30 -04:00
  • 457e471971 [AMD][Kernel][Bugfix] Cast offsets tensor bn to tl.int64 to avoid GPU segfault (#23692) rasmith 2025-09-02 17:13:57 -05:00
  • d328f7894f [CI] Enable all hf transformers baselines in test_hybrid (#23936) Thomas Parnell 2025-09-02 22:15:06 +02:00
  • 98aee612aa [Log] Only Print Profiler Results on Rank 0 (#23370) Wentao Ye 2025-09-02 14:53:34 -04:00
  • 598bd74cf8 Fix weights loading for Apertus (#24100) nathan 2025-09-02 20:34:28 +02:00
  • 2417798471 [Metrics] Deprecate TPOT in favor of ITL (#24110) Mark McLoughlin 2025-09-02 19:10:10 +01:00
  • 9480ae24e3 [Bugfix] Fix packed_factor missing attribute error (#23902) Kyuyeun Kim 2025-09-02 10:56:31 -07:00
  • f399182e8c Run ruff format on a few files. (#24075) Chenheli Hua 2025-09-02 10:55:32 -07:00
  • 1c41310584 [Bugfix] Fix transform_config parsing in Compressed Tensors (#23945) Kyle Sayers 2025-09-02 13:54:10 -04:00
  • c83c4ff815 [Benchmark] Add support for local hf dataset path in benchmark (#23999) Jiangyun Zhu 2025-09-03 01:49:16 +08:00
  • 0e1759cd54 [docs] add SYS_NICE cap & security-opt for docker/k8s (#24017) Peter Pan 2025-09-03 01:27:20 +08:00
  • e66ed3e675 [CI Failure] Skip failing nvfp4 silu test (#23959) Michael Goin 2025-09-02 13:18:15 -04:00
  • e0653f6c0b [Model] Classification models support logit_bias / sigmoid_normalize (#24031) wang.yuqi 2025-09-03 00:48:57 +08:00
  • 38ba061f6f [BugFix] Fix EXAONE4 rotary embeddings (#23918) Kyungmin Lee 2025-09-02 23:40:55 +09:00
  • 0a74e9d0f2 [Gemma3n] Fix audio batching (#24052) Nicolò Lucchesi 2025-09-02 16:23:35 +02:00
  • 8bd5844989 correct LWS deployment yaml (#23104) Christian Berge 2025-09-02 14:04:59 +02:00
  • ce30dca5c4 [CI]: reduce HTTP calls inside entrypoints openai tests (#23646) Aziz 2025-09-02 12:49:32 +02:00
  • 2f0bab3f26 [Model] Support dp on ViT on GLM-4.5V (#23168) WeiQing Chen 2025-09-02 18:48:18 +08:00
  • fad73be1a5 [Doc]: fix typos in Python comments (#24077) Didier Durand 2025-09-02 11:38:55 +02:00
  • 56d04089ef Migrate Interns1 inputs to TensorSchema (#23510) Benji Beck 2025-09-01 21:35:45 -07:00
  • 7be0cb8e9e [XPU][Feature] fp8 online quantization support for XPU (#23148) Yan Ma 2025-09-02 12:06:53 +08:00
  • 1fa1d6a9a0 Migrate OvisImagePatchInputs to TensorSchema (#22024) Benji Beck 2025-09-01 21:01:36 -07:00
  • d59c986444 Remove runtime checks based on pooling params (#24051) Maximilien de Bayser 2025-09-02 00:54:37 -03:00
  • 04d0c60770 [Bugfix] Fix the issue that Blip2ForConditionalGeneration' object has… (#24028) damon 2025-09-02 11:54:20 +08:00
  • 2b41cbbf03 [V1][Mamba1] - FP32 SSM Kernel Support (#23506) Asaf Joseph Gardin 2025-09-02 06:53:00 +03:00
  • 0235103cbb [Doc]: fix typos in Python comments (#24042) Didier Durand 2025-09-02 04:07:45 +02:00
  • a344a5aa0a [bugfix]fix MTP hidden states (#24056) Lucia Fang 2025-09-01 14:09:37 -07:00
  • 5685370271 [Chore][V0 Deprecation] Move LogProb to a separate file (#24055) Woosuk Kwon 2025-09-01 12:07:53 -07:00
  • a0e0efd6bd [Model] Support DP for ViT on Kimi-VL-A3B-Thinking-2506 (#23817) WeiQing Chen 2025-09-02 00:56:56 +08:00
  • cf91a89dd2 [docs][misc] IOProcessor plugins fixes (#24046) Christian Pinto 2025-09-01 17:17:41 +01:00