Commit Graph

  • 756848e79e [Bugfix] Fix Lora Name Parsing (#17196) Alex Brooks 2025-04-27 06:33:09 -06:00
  • 18445edd0f [Misc] Change buckets of histogram_iteration_tokens to [1, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8096] to represent number of tokens (#17033) Flex Wang 2025-04-27 05:30:53 -07:00
  • 30215ca61f [MISC] Use string annotation types for class definitions (#17244) Jade Zheng 2025-04-27 16:39:57 +08:00
  • 838cedade7 [Bugfix] Get a specific type of layer from forward context (#17222) Chen Zhang 2025-04-27 15:58:05 +08:00
  • 4283a28c2f [Bugfix] Fix QWen2 VL multimodal mapping (#17240) Jee Jee Li 2025-04-27 13:53:23 +08:00
  • 93a126fbc7 [Misc] Make cached tokenizer pickle-compatible (#17048) Cyrus Leung 2025-04-27 13:05:00 +08:00
  • 8e4b351a0c [Kernel][Triton][FP8] Adding fp8 and variable length sequence support to Triton FAv2 kernel (#12591) rasmith 2025-04-26 19:35:08 -05:00
  • 9869453c42 Update test_flash_attn.py (#17102) Happy 2025-04-27 06:17:35 +08:00
  • 3642c59aa8 [CI/Build] remove -t for run-lm-eval-gsm-hf-baseline.sh (#16271) Reid 2025-04-27 02:25:05 +08:00
  • 43eea2953b [Minor] Fix lint error in main branch (#17233) Woosuk Kwon 2025-04-26 11:10:14 -07:00
  • de7eb10ce4 [Bugfix] Fix Qwen2.5-Omni M-RoPE position ids generation (#16878) Kero Liang 2025-04-27 01:41:35 +08:00
  • fd11a325b8 [MISC] rename interval to max_recent_requests (#14285) Ning Xie 2025-04-27 00:59:18 +08:00
  • 4d17e20310 Disable the torch.compile cache checks when VLLM_DISABLE_COMPILE_CACHE=1 (#16573) Lu Fang 2025-04-26 09:17:58 -07:00
  • 10fd1d7380 [Bugfix] fix error due to an uninitialized tokenizer when using skip_tokenizer_init with num_scheduler_steps (#9276) changjun.lee 2025-04-27 00:51:17 +09:00
  • 52b4f4a8d7 [Docs] Update structured output doc for V1 (#17135) Russell Bryant 2025-04-26 11:12:18 -04:00
  • e782e0a170 [Chore] added stubs for vllm_flash_attn during development mode (#17228) Aaron Pham 2025-04-26 10:45:26 -04:00
  • dc2ceca5c5 [BUGFIX] use random for NONE_HASH only when PYTHONHASHSEED not set (#17088) Ning Xie 2025-04-26 22:34:24 +08:00
  • f8acd01ff7 [V1] Add structural_tag support using xgrammar (#17085) Russell Bryant 2025-04-26 10:06:37 -04:00
  • c48334d405 [Hardware][Intel-Gaudi] Update hpu-extension and update bucketing system for HPU device (#17186) Agata Dobrzyniewicz 2025-04-26 14:55:14 +02:00
  • 909fdaf152 [Bugfix] Fix standard models tests (#17217) Cyrus Leung 2025-04-26 17:26:41 +08:00
  • 8c1c926d00 [Bugfix] Fix missing int type for -n in multi-image example (#17223) Isotr0py 2025-04-26 16:49:52 +08:00
  • df6f3ce883 [Core] Remove prompt string from engine core data structures (#17214) Nick Hill 2025-04-25 23:41:05 -07:00
  • 513f074766 [CI/test] Fix Eagle Correctness Test (#17209) Woosuk Kwon 2025-04-25 23:40:36 -07:00
  • b07bf83c7d [BugFix] Avoid race conditions in zero-copy tensor transmission (#17203) Nick Hill 2025-04-25 23:00:07 -07:00
  • 53e8cf53a4 [V1][Metrics] Allow V1 AsyncLLM to use custom logger (#14661) Zijing Liu 2025-04-25 22:05:40 -07:00
  • 54271bb766 [ROCm][Misc] Follow-ups for Skinny Gemms on ROCm. (#17011) Charlie Fu 2025-04-26 00:05:10 -05:00
  • 9e96f56efb Allocate kv_cache with stride order (#16605) Shu Wang 2025-04-26 00:03:31 -05:00
  • b278911229 [Minor][Models] Fix Return Types of Llama & Eagle (#17220) Woosuk Kwon 2025-04-25 21:54:47 -07:00
  • 7bd0c7745c [Doc] Minor fix for the vLLM TPU setup page (#17206) yarongmu-google 2025-04-25 21:39:56 -07:00
  • 1cf0719ebd [Minor][Spec Decode] Add use_eagle to SpeculativeConfig (#17213) Woosuk Kwon 2025-04-25 21:08:15 -07:00
  • 537d5ee025 [doc] add Anything LLM integration (#17216) Reid 2025-04-26 12:03:23 +08:00
  • c8e5be35f7 [MISC][AMD] Add unused annotation to rocm kernel file (#17097) Lu Fang 2025-04-25 20:33:35 -07:00
  • a6e72e1e4f [Bugfix] [pytorch] Patch AOTAutogradCache._get_shape_env (#17142) James Wu 2025-04-25 23:28:20 -04:00
  • 5e83a7277f [v1] [P/D] Adding LMCache KV connector for v1 (#16625) Yihua Cheng 2025-04-25 22:03:38 -05:00
  • 68af5f6c5c [AMD][FP8][BugFix] Remove V1 check in arg_utils.py for FP8 since it is not necessary (#17215) rasmith 2025-04-25 21:55:05 -05:00
  • 8de2901fea [Bugfix] gemma[2,3] interleaved attention when sliding window is disabled (#17180) Chen Zhang 2025-04-26 10:53:51 +08:00
  • c53e0730cb [Misc] Refine ray_serve_deepseek example (#17204) Rui Qiao 2025-04-25 16:06:59 -07:00
  • a0e619e62a [V1][Spec Decode] EAGLE-3 Support (#16937) Benjamin Chislett 2025-04-25 18:43:07 -04:00
  • 70116459c3 [BugFix][Frontend] Fix LLM.chat() tokenization (#16081) Nick Hill 2025-04-25 15:20:05 -07:00
  • 65e262b93b Fix Python packaging edge cases (#17159) Christian Heimes 2025-04-26 00:15:07 +02:00
  • 43faa0461a [Bugfix] Fix hybrid model tests (#17182) Cyrus Leung 2025-04-26 06:14:37 +08:00
  • 48cb2109b6 [V1] Move usage stats to worker and start logging TPU hardware (#16211) Daniel Li 2025-04-25 13:06:01 -07:00
  • a5450f11c9 [Security] Use safe serialization and fix zmq setup for mooncake pipe (#17192) Russell Bryant 2025-04-25 12:53:23 -04:00
  • 9d98ab5ec6 [Misc] Inline Molmo requirements (#17190) Cyrus Leung 2025-04-26 00:41:44 +08:00
  • df5c879527 [doc] update wrong hf model links (#17184) Reid 2025-04-26 00:40:54 +08:00
  • 423e9f1cbe Use Transformers helper get_text_config() instead of checking for text_config (#17105) Harry Mellor 2025-04-25 16:47:35 +01:00
  • 0bd7f8fca5 Bump Transformers to 4.51.3 (#17116) Harry Mellor 2025-04-25 16:34:34 +01:00
  • d5615af9ae [Bugfix] Fix Mistral ChatCompletionRequest Body Exception (#16769) Jasmond L 2025-04-25 22:26:30 +08:00
  • 19dcc02a72 [Bugfix] Fix mistral model tests (#17181) Cyrus Leung 2025-04-25 21:03:34 +08:00
  • 7feae92c1f [Doc] Move todo out of beam search docstring (#17183) Alex Brooks 2025-04-25 05:44:58 -06:00
  • f851b84266 [Doc] Add two links to disagg_prefill.md (#17168) Michael Yao 2025-04-25 18:23:57 +08:00
  • fc966e9cc6 Only turn on FastIncrementalDetokenizer when tokenizers >= 0.21.1 (#17158) Lu Fang 2025-04-25 02:10:32 -07:00
  • ef19e67d2c [Doc] Add headings to improve gptqmodel.md (#17164) Michael Yao 2025-04-25 16:13:13 +08:00
  • a41351f363 [Quantization][FP8] Add support for FP8 models with input_scale for output projection and QK quantization (#15734) rasmith 2025-04-25 02:45:02 -05:00
  • 6aae216b4e [Bugfix] remove fallback in guided_json (int range, patterns) (#16725) Sangyeon Cho 2025-04-25 15:54:43 +09:00
  • b22980a1dc [Perf]Optimize rotary_emb implementation to use Triton operator for improved inference performance (#16457) yexin(叶鑫) 2025-04-25 14:52:28 +08:00
  • 881f735827 [Misc] Benchmark Serving Script Support Appending Results (#17028) Lucas Wilkinson 2025-04-25 01:53:55 -04:00
  • 2f54045508 [Bugfix][Misc] Use TritonPlaceholderModule to defensively import triton (#15099) Mengqing Cao 2025-04-25 13:51:02 +08:00
  • 5aa6efb9a5 [Misc] Clean up redundant code in uniproc_executor.py (#16762) Lifu Huang 2025-04-24 22:49:30 -07:00
  • 6ca0234478 Move missed SchedulerConfig args into scheduler config group in EngineArgs (#17131) Harry Mellor 2025-04-25 06:48:53 +01:00
  • 649818995f [Docs] Fix True->true in supported_models.md (#17141) Michael Goin 2025-04-24 22:20:04 -06:00
  • 7a0a9da72b [Doc] V1 : Update LoRA status (#17133) Varun Sundar Rabindranath 2025-04-24 23:17:22 -04:00
  • 69bff9bc89 fix float16 support for kimi-vl (#17156) Zaida Zhou 2025-04-25 11:16:32 +08:00
  • 41ca7eb491 [Attention] FA3 decode perf improvement - single mma warp group support for head dim 128 (#16864) Lucas Wilkinson 2025-04-24 23:12:21 -04:00
  • eef364723c [FEAT] [ROCm]: AITER Fused MOE V1 Support (#16752) vllmellm 2025-04-25 11:06:50 +08:00
  • 0d6e187e88 Use custom address for listening socket (#15988) jglaser 2025-04-24 21:57:16 -04:00
  • 9420a1fc30 Better error message for missing mistral params.json (#17132) Michael Goin 2025-04-24 17:43:08 -06:00
  • 583e900996 [Misc] Add example to run DeepSeek with Ray Serve LLM (#17134) Rui Qiao 2025-04-24 15:25:21 -07:00
  • 05e1fbfc52 Add chat template for Llama 4 models (#16428) Maximilien de Bayser 2025-04-24 17:19:36 -03:00
  • fe92176321 Add collective_rpc to llm engine (#16999) Yinghai Lu 2025-04-24 13:16:52 -07:00
  • 6d0df0ebeb [Docs] Generate correct github links for decorated functions (#17125) Russell Bryant 2025-04-24 13:39:43 -04:00
  • 0fa939e2d1 Improve configs - LoRAConfig + PromptAdapterConfig (#16980) Harry Mellor 2025-04-24 18:29:34 +01:00
  • 0422ce109f Add :markdownhelp: to EngineArgs docs so markdown docstrings render properly (#17124) Harry Mellor 2025-04-24 18:28:45 +01:00
  • 47bdee409c Molmo Requirements (#17026) Eyshika Agarwal 2025-04-24 12:08:37 -05:00
  • 49f189439d existing torch installation pip command fix for docs (#17059) Atilla 2025-04-24 20:07:21 +03:00
  • 5adf6f6b7f Updating Buildkite job for IBM Power (#17111) Aaruni Aggarwal 2025-04-24 22:36:17 +05:30
  • 4115f19958 [CI] Add automation for the tool-calling github label (#17118) Russell Bryant 2025-04-24 12:22:00 -04:00
  • 340d7b1b21 [V1][Spec Decoding] Add num_drafts and num_accepted_tokens_per_position metrics (#16665) Mark McLoughlin 2025-04-24 16:57:40 +01:00
  • 1bcbcbf574 [Misc] refactor example series - structured outputs (#17040) Reid 2025-04-24 22:49:48 +08:00
  • 82e43b2d7e Add missing rocm_skinny_gemms kernel test to CI (#17060) Michael Goin 2025-04-24 08:49:37 -06:00
  • 67309a1cb5 [Frontend] Using matryoshka_dimensions control the allowed output dimensions. (#16970) wang.yuqi 2025-04-24 22:06:28 +08:00
  • b724afe343 [V1][Structured Output] Clear xgrammar compiler object when engine core shut down to avoid nanobind leaked warning (#16954) Shanshan Shen 2025-04-24 21:15:03 +08:00
  • 21f4f1c9a4 Improve static type checking in LoRAModelRunnerMixin (#17104) Harry Mellor 2025-04-24 14:14:47 +01:00
  • b0c1f6202d [Misc] Remove OLMo2 config copy (#17066) Isotr0py 2025-04-24 21:14:32 +08:00
  • c0dfd97519 [V1][PP] Optimization: continue scheduling prefill chunks (#17080) Rui Qiao 2025-04-24 05:27:08 -07:00
  • a9138e85b1 Fix OOT registration test (#17099) Harry Mellor 2025-04-24 12:44:12 +01:00
  • 0a05ed57e6 Simplify TokenizerGroup (#16790) Harry Mellor 2025-04-24 12:43:56 +01:00
  • 14288d1332 Disable enforce_eager for V1 TPU sampler and structured output tests (#17016) Michael Goin 2025-04-24 03:50:09 -06:00
  • b411418ff0 [Chore] Remove Sampler from Model Code (#17084) Woosuk Kwon 2025-04-24 02:49:33 -07:00
  • 2bc0f72ae5 Add docs for runai_streamer_sharded (#17093) omer-dayan 2025-04-24 11:03:21 +03:00
  • 9c1244de57 [doc] update to hyperlink (#17096) Reid 2025-04-24 15:58:08 +08:00
  • db2f8d915c [V1] Update structured output (#16812) Reid 2025-04-24 14:57:17 +08:00
  • 6167c0e5d2 [Bugfix][Core] add seq_id_to_seq_group clearing to avoid memory leak when s… (#16472) 张宇 2025-04-24 11:25:37 +08:00
  • ed2e464653 Addendum Fix to support FIPS enabled machines with MD5 hashing (#17043) Areeb Syed 2025-04-24 08:25:00 +05:30
  • 2c8ed8ee48 More informative error when using Transformers backend (#16988) Harry Mellor 2025-04-24 03:54:03 +01:00
  • ed50f46641 [Bugfix] Enable V1 usage stats (#16986) Michael Goin 2025-04-23 20:54:00 -06:00
  • 46e678bcff [Minor] Use larger batch sizes for A100/B100/B200/MI300x (#17073) Woosuk Kwon 2025-04-23 19:18:59 -07:00
  • 6b2427f995 [Quantization]add prefix for commandA quantized model (#17017) Chen Xia 2025-04-23 17:32:40 -07:00
  • b07d741661 [CI/Build] workaround for CI build failure (#17070) Sangyeon Cho 2025-04-24 08:14:18 +09:00
  • 41fb013d29 [V1][Spec Decode] Always use argmax for sampling draft tokens (#16899) Woosuk Kwon 2025-04-23 14:57:43 -07:00