Commit Graph

  • ce75efeecb [BugFix] FA2 MLA Accuracy Issue (#18807) Lucas Wilkinson 2025-05-28 04:59:39 -04:00
  • aa42561e40 Fix PiecewiseCompileInterpreter (#17338) Richard Zou 2025-05-28 04:40:53 -04:00
  • de65fc8e1e [CI] improve embed testing (#18747) wang.yuqi 2025-05-28 15:16:35 +08:00
  • 0c492b7824 [Deprecation] Remove fallbacks for Embeddings API (#18795) Cyrus Leung 2025-05-28 15:09:04 +08:00
  • 0f0926b43f [Deprecation] Remove unused sync methods in async_timeout (#18792) Cyrus Leung 2025-05-28 15:08:48 +08:00
  • 7f2c1a87e9 [Deprecation] Require overriding get_dummy_text and get_dummy_mm_data (#18796) Cyrus Leung 2025-05-28 15:08:35 +08:00
  • b78f844a67 [Bugfix][FailingTest]Fix test_model_load_with_params.py (#18758) Rabi Mishra 2025-05-28 11:12:54 +05:30
  • 5e13c07d00 [V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (2) (#18781) RonaldBXu 2025-05-27 22:09:14 -07:00
  • 774c5fde30 [V1] fix torch profiling for V1 offline scenarios (#18445) Divakar Verma 2025-05-27 23:16:30 -05:00
  • 9a21e331ff [Bugfix]: correctly propagate errors message caught at the chat_templating step to the client (#18769) Guillaume Calmettes 2025-05-28 05:35:43 +02:00
  • 3e9ce609bd [Bugfix] Fix nomic max_model_len (#18755) wang.yuqi 2025-05-28 11:29:53 +08:00
  • 794ae1f551 [rocm] Fix wrong attention log (#18764) fxmarty-amd 2025-05-28 04:45:41 +02:00
  • d73a9457a5 [Core] Improve Tensor serialisation (#18774) Lukas Geiger 2025-05-28 02:46:21 +01:00
  • a3896c7f02 [Build] Fixes for CMake install (#18570) Luka Govedič 2025-05-27 20:49:24 -04:00
  • 51e98e4ffd [Bugfix] Disable prefix caching by default for benchmark (#18771) cascade 2025-05-27 17:18:09 -07:00
  • e56f44d9ec Support datasets in vllm bench serve and sync with benchmark_[serving,datasets].py (#18566) Michael Goin 2025-05-27 19:59:48 -04:00
  • e0cbad4e30 [Neuron] Support quantization on neuron (#18283) Satyajith Chilappagari 2025-05-27 15:10:33 -07:00
  • b48d5cca16 [CI/Build] [TPU] Fix TPU CI exit code (#18282) Carol Zheng 2025-05-27 14:54:59 -07:00
  • 5873877241 [Bugfix] Mistral tool calling when content is list (#18729) (tag: v0.9.0) Michael Goin 2025-05-27 12:05:37 -04:00
  • 696259ca01 [Core] Automatically cast multi-modal input dtype (#18756) Cyrus Leung 2025-05-27 23:45:48 +08:00
  • 6b6d496114 optimize get_kv_cache_torch_dtype (#18531) chunxiaozheng 2025-05-27 21:08:44 +08:00
  • aaa4ac1c95 Disable prefix cache by default for benchmark (#18639) cascade 2025-05-27 05:06:34 -07:00
  • 06a0338015 [V1][Metrics] Add API for accessing in-memory Prometheus metrics (#17010) Mark McLoughlin 2025-05-27 10:37:06 +01:00
  • 4318c0559d [CI/Build] Remove imports of built-in re (#18750) Cyrus Leung 2025-05-27 17:19:18 +08:00
  • a68e293cb9 [Doc] Convert Sphinx directives ( {class}, {meth}, {attr}, ...) to MkDocs format for better documentation linking (#18663) Hyogeun Oh (오효근) 2025-05-27 17:44:20 +09:00
  • 6881107948 [BUG FIX] minicpm (#18739) Shawn Huang 2025-05-27 16:04:49 +08:00
  • e0f0ff87b8 [Build] fix cpu build missing libtbbmalloc.so (#18744) Kebe 2025-05-27 16:03:56 +08:00
  • c24b1572ac Minor fix about MooncakeStoreConnector (#18721) maobaolong 2025-05-27 16:02:28 +08:00
  • 4693a3438c [Doc] cleanup deprecated flag for doc (#18715) Calvin Chen 2025-05-27 15:12:02 +08:00
  • bbd9a84dc5 [Hardware][Intel-Gaudi] [CI/Build] Fix multiple containers using the same name in run-hpu-test.sh (#18752) Łukasz Durejko 2025-05-27 09:10:26 +02:00
  • a547aeb828 feat(rocm-support): support mamba2 on rocm (#18565) almersawi 2025-05-27 11:07:53 +04:00
  • fc6d0c290f [Misc] improve docs (#18734) Reid 2025-05-27 15:07:01 +08:00
  • 753944fa9b [Doc] Update reproducibility doc and example (#18741) Cyrus Leung 2025-05-27 15:03:13 +08:00
  • 25a817f202 [Doc] Update OOT model docs (#18742) Cyrus Leung 2025-05-27 14:30:31 +08:00
  • d260f799a9 [FEAT] [ROCm] Upgrade AITER Fused MoE kernels. (#18271) vllmellm 2025-05-27 14:14:07 +08:00
  • b50602d5f0 [Model][Gemma3] Cast image pixel values already on CPU (#18732) Lukas Geiger 2025-05-27 06:42:54 +01:00
  • 1f1b1bc03b [V1][Quantization] Add CUDA graph compatible v1 GGUF support (#18646) Isotr0py 2025-05-27 12:40:28 +08:00
  • 1f88dbd2bb [Misc] improve web section group title display (#18684) Reid 2025-05-27 12:35:16 +08:00
  • 0eebd74842 [Model][Gemma3] Simplify image input validation (#18710) Lukas Geiger 2025-05-27 04:13:37 +01:00
  • 27bebcd897 Convert examples to ruff-format (#18400) Harry Mellor 2025-05-26 17:57:54 +01:00
  • e7523c2e03 [V1][Sampler] Improve performance of FlashInfer sampling by sampling logits instead of probs (#18608) Lukas Geiger 2025-05-26 16:49:36 +01:00
  • a869baca73 [Bugfix] Fix Llama GGUF initialization (#18717) Cyrus Leung 2025-05-26 22:49:22 +08:00
  • 82e2339b06 [Doc] Move examples and further reorganize user guide (#18666) Cyrus Leung 2025-05-26 22:38:04 +08:00
  • 9553fdb41e [Doc] Improve API docs (#18713) Cyrus Leung 2025-05-26 22:33:34 +08:00
  • 243eb9199f [Bugfix]: handle hf-xet CAS error when loading Qwen3 weights in vLLM (#18701) dylan 2025-05-26 22:10:56 +08:00
  • 0665e29998 [Misc] add AutoGen integration (#18712) Reid 2025-05-26 21:56:18 +08:00
  • e76be06550 [Hardware][Intel-Gaudi] [CI/Build] Add tensor parallel size = 2 test to HPU CI (#18709) Łukasz Durejko 2025-05-26 14:26:07 +02:00
  • 0877750029 [CI/Build] Split pooling and generation extended language models tests in CI (#18705) Isotr0py 2025-05-26 19:00:08 +08:00
  • 6d68030f1c [Model] Add support for YARN in NemotronNAS models (#18427) Naveassaf 2025-05-26 13:31:49 +03:00
  • 5a2c76cbe1 [CI] fix dump_input for str type (#18697) Ning Xie 2025-05-26 18:23:35 +08:00
  • 38b13dfe78 [CI/Build] Replace math.isclose with pytest.approx (#18703) Cyrus Leung 2025-05-26 17:05:17 +08:00
  • 61a45e7a72 [Bugfix] Fix Mistral-format models with sliding window (#18693) Cyrus Leung 2025-05-26 16:44:04 +08:00
  • 65523a0995 [Doc] Fix issue template format (#18699) Cyrus Leung 2025-05-26 15:45:39 +08:00
  • 4b7740a105 [GH] Add issue template for reporting CI failures (#18696) Cyrus Leung 2025-05-26 15:42:04 +08:00
  • 4ea62c0ea0 [CI] add missing argument (#18694) Ning Xie 2025-05-26 15:22:04 +08:00
  • 561b77a0d6 [Bugfix] Fix the lm_head in gpt_bigcode in lora mode (#6357) Maximilien de Bayser 2025-05-26 03:52:25 -03:00
  • abd4030d94 refactor: simplify request handler, use positive condition check for handler assignment (#18690) CYJiang 2025-05-26 14:32:28 +08:00
  • 8820821b59 [Misc] Fixed the abnormally high TTFT issue in the PD disaggregation example (#18644) AlexZhao 2025-05-26 13:51:27 +08:00
  • fba0642704 [CI/Build][Doc] Update gte-Qwen2-1.5B-instruct usage (#18683) Cyrus Leung 2025-05-26 11:27:50 +08:00
  • 6071e989df [Core][Multimodal] Convert PIL Image to array without data copy when hashing (#18682) Lukas Geiger 2025-05-25 18:33:35 +01:00
  • 57fd13a707 [Bugfix] Fix profiling dummy data for Pixtral (#18677) Cyrus Leung 2025-05-25 22:05:30 +08:00
  • 3a886bd58c [Misc] small improve (#18680) Reid 2025-05-25 21:05:38 +08:00
  • 35be8fad62 [CI/build] fix no regex (#18676) Reid 2025-05-25 18:10:51 +08:00
  • f2faac745d [Bugfix] Fix cpu usage and cache hit stats reporting on cpu environment (#18674) Yuqi Zhang 2025-05-25 02:36:06 -07:00
  • 279f854519 [doc] improve readability (#18675) Reid 2025-05-25 16:40:31 +08:00
  • 624b77a2b3 [doc] fix broken links (#18671) Reid 2025-05-25 16:36:33 +08:00
  • 503f8487c2 [Misc] Reduce logs on startup (#18649) Cyrus Leung 2025-05-25 14:03:53 +08:00
  • 44073a7ac3 [BUGFIX] catch subclass first for try...except (#18672) Ning Xie 2025-05-25 13:34:24 +08:00
  • 63934543a0 Speed up the kernels/quantization/ tests (#18669) Michael Goin 2025-05-25 01:02:59 -04:00
  • 75f81750f3 [VLM] Initialize video input support for InternVL models (#18499) Isotr0py 2025-05-25 12:51:25 +08:00
  • 6ab681bcbe [Misc][ModelScope] Change to use runtime VLLM_USE_MODELSCOPE (#18655) Mengqing Cao 2025-05-25 12:51:21 +08:00
  • cebc22f3b6 [Misc]Replace cuda hard code with current_platform in Ray (#14668) Chenguang Li 2025-05-25 11:26:31 +08:00
  • 6c6dcd8611 [MISC] correct signature for LoaderFunction (#18670) Ning Xie 2025-05-25 11:17:47 +08:00
  • 7891fdf0c6 [V1] Fix _pickle.PicklingError: Can't pickle <class 'transformers_modules.deepseek-ai.DeepSeek-V2-Lite... (#18640) Seiji Eicher 2025-05-24 20:07:20 -07:00
  • 6825d9a998 [BugFix][Spec Decode] Improve Prefix Caching Logic in Speculative Decoding (#18668) Woosuk Kwon 2025-05-24 17:33:46 -07:00
  • b554ab736e [CI/Build] fix permission denied issue (#18645) Reid 2025-05-25 00:09:10 +08:00
  • 9ea7f1abf3 fix(regression): clone from reference items (#18662) Aaron Pham 2025-05-24 11:25:20 -04:00
  • 2807271c86 [CI] enforce import regex instead of re (#18665) Aaron Pham 2025-05-24 11:04:14 -04:00
  • b9018a3f9f [BugFix] Fix import error for fused_moe (#18642) wangxiyuan 2025-05-24 22:53:36 +08:00
  • 4ceafb6299 [MISC] typo fix and clean import (#18664) Ning Xie 2025-05-24 22:52:09 +08:00
  • 2e6705784f [CI/Build] chmod +x to cleanup_pr_body.sh (#18650) Cyrus Leung 2025-05-24 22:26:45 +08:00
  • 1cb194a018 [Doc] Reorganize user guide (#18661) Cyrus Leung 2025-05-24 22:25:33 +08:00
  • 2cd4d58df4 [Model] use AutoWeightsLoader for gpt2 (#18625) ztang2370 2025-05-24 21:36:13 +08:00
  • 6d166a8d35 [Doc] Add community links (#18657) Cyrus Leung 2025-05-24 21:06:38 +08:00
  • ef1dd6870f [Doc] Fix indentation problems in V0 Paged Attention docs (#18659) Cyrus Leung 2025-05-24 21:06:35 +08:00
  • e77dc4bad8 [MISC][pre-commit] Add pre-commit check for triton import (#17716) Mengqing Cao 2025-05-24 20:09:15 +08:00
  • 07458a51ce [Doc] Update README links, mark external links (#18635) Cyrus Leung 2025-05-24 17:57:15 +08:00
  • c1e4a4052d [V1][Spec Decode] Support multi-layer eagle draft model (#18030) qizixi 2025-05-24 02:45:34 -07:00
  • a859320575 [Model] Add support for Qwen2.5-Omni-7B-AWQ (Qwen2_5OmniForConditionalGeneration) (#18647) Yuanhao WU 2025-05-24 17:15:36 +08:00
  • 441dc63ac7 [Frontend] improve vllm serve --help display (#18643) Reid 2025-05-24 15:53:22 +08:00
  • d55e446d13 [V1][Spec Decode] Small refactors to improve eagle bookkeeping performance (#18424) qizixi 2025-05-23 23:51:22 -07:00
  • ec82c3e388 FIX MOE issue in AutoRound format (#18586) Wenhua Cheng 2025-05-24 13:01:40 +08:00
  • 45ab403a1f config.py: Clarify that only local GGUF checkpoints are supported. (#18623) Mathieu Borderé 2025-05-24 02:46:34 +02:00
  • 2b10ba7491 [Bugfix][Nixl] Fix Preemption Bug (#18631) Robert Shaw 2025-05-23 19:30:16 -04:00
  • 4fc1bf813a [Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454) Feng XiaoLong 2025-05-24 07:16:26 +08:00
  • f2036734fb [ModelOpt] Introduce VLLM_MAX_TOKENS_PER_EXPERT_FP4_MOE env var to control blockscale tensor allocation (#18160) Pavani Majety 2025-05-23 15:52:20 -07:00
  • 7d9216495c [Doc] Update references to doc files (#18637) Cyrus Leung 2025-05-24 06:49:21 +08:00
  • 0ddf88e16e [CI] Enable test_initialization to run on V1 (#16736) Michael Goin 2025-05-23 18:09:44 -04:00
  • 1645b60196 Use prebuilt FlashInfer x86_64 PyTorch 2.7 CUDA 12.8 wheel for CI (#18537) Huy Do 2025-05-23 14:17:16 -07:00
  • 2628a69e35 [V1] Support Deepseek MTP (#18435) Jiayi Yao 2025-05-23 12:26:28 -05:00