Commit Graph

  • 8d6cf89526 [V1] [Spec Decode] Support random sampling for spec decode (#13933) (tag: v0.8.0rc1) Lily Liu 2025-03-16 22:00:20 -07:00
  • 583a9778e0 [Benchmark] Do not save detailed info to json by default (#14879) Simon Mo 2025-03-16 21:48:11 -07:00
  • a73e183e36 [Misc] Replace os environ to monkeypatch in test suite (#14516) Sibi 2025-03-17 11:35:57 +08:00
  • 1e799b7ec1 [BugFix] Fix MLA + V1 + TP==1 causing reinitialization of cuda context (#14910) Lucas Wilkinson 2025-03-16 23:35:37 -04:00
  • 7f6c5ee06c [V1][Minor] Add __repr__ to ConstantList (#14907) Woosuk Kwon 2025-03-16 20:20:15 -07:00
  • faa0275730 [V1] Optimize the overhead of rewinding (#14905) Woosuk Kwon 2025-03-16 20:19:30 -07:00
  • 8a5a9b70d7 [CI/Build] Update defaults for test reproducibility (#14893) Cyrus Leung 2025-03-17 10:38:15 +08:00
  • bb3aeddfaf [CI] Nightly Tests (#14898) Robert Shaw 2025-03-16 22:06:43 -04:00
  • aecc780dba [V1] Enable Entrypoints Tests (#14903) Robert Shaw 2025-03-16 20:56:16 -04:00
  • 90df7f23aa [Doc] Add guidance for using ccache with pip install -e . in doc (#14901) Vadim Gimpelson 2025-03-17 03:10:04 +04:00
  • b9b5bdfc7d [Misc] Catching Ray Compiled Graph PP test failures for V1 (#14847) Rui Qiao 2025-03-16 15:46:42 -07:00
  • 31060b2757 [V1][BugFix] Detect interleaved sliding window attention (#14896) Woosuk Kwon 2025-03-16 14:53:53 -07:00
  • fc1f67715d [BugFix][V1] Fix overhead related to bad_words sampling when not in use (#14894) Nick Hill 2025-03-16 14:53:34 -07:00
  • f6137adbcb Revert "[Bugfix] Limit profiling run sequence length by max_model_len (#14785) (#14892) Cyrus Leung 2025-03-17 00:13:46 +08:00
  • e53b1350f2 [Bugfix] Explicitly disable Phi-4-multimodal in V1 (#14889) Cyrus Leung 2025-03-17 00:05:40 +08:00
  • d30aa7e9e6 [Bugfix] Limit profiling run sequence length by max_model_len (#14785) Kyle Sayers 2025-03-16 10:44:19 -04:00
  • d1ad2a57af [V1] [Spec Decode] Fix ngram tests (#14878) Lily Liu 2025-03-16 00:29:22 -07:00
  • b82662d952 [BugFix] Fix torch distributed stateless PG backend init (#14870) Nick Hill 2025-03-15 20:26:19 -07:00
  • 71c1e07107 [Kernel] Add more tuned configs (#14877) Simon Mo 2025-03-15 20:25:03 -07:00
  • b30c75dda4 [V1] Remove V0 fallback for mistral-tokenizer (#14873) Roger Wang 2025-03-15 20:21:11 -07:00
  • def232e122 [VLM] Clean up Phi-4-MM ViT implementation (#14812) Isotr0py 2025-03-16 09:53:52 +08:00
  • 3453b964a3 [Misc][Doc] Minor benchmark README update (#14874) Roger Wang 2025-03-15 18:46:17 -07:00
  • 61c6a5a796 [VLM] Merged multi-modal processor for Pixtral (#12211) Rémi Delacourt 2025-03-15 14:28:27 +01:00
  • 74bc397b0a [Core] Expose API endpoint /is_sleeping (#14312) Jun Duan 2025-03-15 09:28:14 -04:00
  • f58aea002c [CI][Intel GPU] refine intel GPU ci docker build (#14860) Kunshang Ji 2025-03-15 04:58:53 -07:00
  • 3556a41434 [VLM] Limit multimodal input cache by memory (#14805) Cyrus Leung 2025-03-15 17:52:05 +08:00
  • 9ed6ee92d6 [Bugfix] EAGLE output norm bug (#14464) Bryan Lu 2025-03-14 23:50:33 -07:00
  • ee3778d5fc [Build/CI] Upgrade jinja2 to get 3 moderate CVE fixes (#14839) Russell Bryant 2025-03-15 01:38:19 -04:00
  • aaacf17324 [Doc] V1 user guide (#13991) Jennifer Zhao 2025-03-14 22:17:59 -07:00
  • 4c7629cae9 [V1][Structured Output] calculate vocab_size eagerly (#14851) Aaron Pham 2025-03-15 01:09:51 -04:00
  • e0fdfa1608 [CI/Build] Delete LoRA bias test (#14849) Jee Jee Li 2025-03-15 13:09:25 +08:00
  • 5952d8ab61 [Attention] Get rid of mla cache alignment (#14842) Lucas Wilkinson 2025-03-15 01:08:25 -04:00
  • a2ae496589 [CPU] Support FP8 KV cache (#14741) Li, Jiang 2025-03-15 13:07:36 +08:00
  • 877e352262 [Docs] Add new East Coast vLLM Meetup slides to README and meetups.md (#14852) Simon Mo 2025-03-14 22:06:38 -07:00
  • d4d93db2c5 [V1] V1 Enablement Oracle (#13726) Robert Shaw 2025-03-15 01:02:20 -04:00
  • 8c0d15d5c5 [Misc][Easy] Annotate unused vars in the csrc files (#14798) Lu Fang 2025-03-14 21:40:09 -07:00
  • 97ac781c62 [Misc] Remove misleading message in gemma2 and gemma3 (#14850) Isotr0py 2025-03-15 12:35:12 +08:00
  • 776dcec8fe Disable outlines cache by default (#14837) Russell Bryant 2025-03-14 23:57:55 -04:00
  • ccf02fcbae Revert "[Model] Mamba2 Prefill Performance Tweaks: Fixing Flurry of U… (#14848) Tyler Michael Smith 2025-03-14 23:45:42 -04:00
  • acaea3bb07 [Bugfix][V1] Fix flashinfer sampling (#14815) DefTruth 2025-03-15 11:42:38 +08:00
  • 9f37422779 [Neuron][CI] update docker run command (#14829) Liangfu Chen 2025-03-14 18:51:35 -07:00
  • dd344e0342 [Bugfix] Fix torch_xla in V0 which can't handle None seed introduced … (#14844) yarongmu-google 2025-03-14 17:41:15 -07:00
  • 54a8804455 [Doc] More neutral K8s deployment guide (#14084) Yuan Tang 2025-03-14 19:12:36 -04:00
  • bbd94a19fc [Build/CI] Upgrade aiohttp to incldue CVE fix (#14840) Russell Bryant 2025-03-14 19:11:28 -04:00
  • 233ffce1eb [Build/CI] Move ninja to common deps (#14835) Russell Bryant 2025-03-14 17:25:28 -04:00
  • 40677783aa [CI] Add TPU v1 test (#14834) Richard Liu 2025-03-14 14:13:30 -07:00
  • 14f301b541 Update to torch==2.6.0 (#12721) Michael Goin 2025-03-14 16:58:30 -04:00
  • 46f98893dd [V1] Fix model parameterization for structured output tests (#14833) Russell Bryant 2025-03-14 16:55:18 -04:00
  • fe66b34728 [Model] Mamba2 Prefill Performance Tweaks: Fixing Flurry of Unnecessary Memory Copies (#14778) Chih-Chieh Yang 2025-03-14 16:36:18 -04:00
  • 270a5da495 Re-enable the AMD Entrypoints Test (#14711) Alexei-V-Ivanov-AMD 2025-03-14 14:18:13 -05:00
  • 7097b4cc1c [release] Remove log cleanup commands from TPU job (#14838) Kevin H. Luu 2025-03-14 11:59:52 -07:00
  • 977a16772c [Bugfix][Kernel]: Fix AllSpark kernel compilation errors and enable for CUDA < 12.0 (#14430) Yajie Wang 2025-03-15 00:55:14 +08:00
  • 73deea2fdb [Frontend] track server_load (#13950) daniel-salib 2025-03-14 09:53:17 -07:00
  • 9d2b4a70f4 [V1][Metrics] Updated list of deprecated metrics in v0.8 (#14695) Mark McLoughlin 2025-03-14 16:45:25 +00:00
  • 0b0d6421b2 [Frontend] Fix log message to use http vs https (#14774) Russell Bryant 2025-03-14 12:21:09 -04:00
  • 1140991a7b [V1] Fix vocab size calculation for structured output (#14826) Russell Bryant 2025-03-14 12:18:38 -04:00
  • 613c5bb945 [Bugfix] Fix Aria test loading (#14823) Cyrus Leung 2025-03-15 00:11:23 +08:00
  • fd8e055ffb [BugFix]: properly catch templating error when preprocess input (#13976) Guillaume Calmettes 2025-03-14 08:58:34 -04:00
  • ab93f1360f [VLM] Various cleanup and fixes (#14806) Cyrus Leung 2025-03-14 20:58:19 +08:00
  • 40253bab44 [Bugfix][W8A8] fixed cutlass block fp8 binding (#14796) DefTruth 2025-03-14 18:32:42 +08:00
  • c77620d22d [V1][Minor] Minor code cleanup for scheduling metrics (#14800) Woosuk Kwon 2025-03-14 01:21:28 -07:00
  • 989ecd2007 [Misc] Gemma3ForConditionalGeneration supports LoRA (#14797) Jee Jee Li 2025-03-14 16:07:30 +08:00
  • 54cc46f3eb [Bugfix] Fix small typo in the example of Streaming delimiter (#14793) WeiCheng 2025-03-14 16:05:17 +08:00
  • 601bd3268e [Misc] Clean up type annotation for SupportsMultiModal (#14794) Cyrus Leung 2025-03-14 15:59:56 +08:00
  • 09269b3127 [BugFix]Fix performance serving benchmark when enable profiling (#14737) Li Wang 2025-03-14 15:02:05 +08:00
  • 27b50f1fe6 [Bugfix][Kernel][CPU] Fix num_tokens in CPU rotary embedding kernel (#14667) Thien Tran 2025-03-14 14:47:49 +08:00
  • 9532c49836 [Attention] MLA get rid of materialization (#14770) Lucas Wilkinson 2025-03-14 02:39:02 -04:00
  • 0c2af17c76 [CI] Fix missing example model id in processor test (#14787) Roger Wang 2025-03-13 22:52:15 -07:00
  • a6e0d096dd [Feature] Add visionarena offline support for benchmark_throughput (#14654) Jennifer Zhao 2025-03-13 21:07:54 -07:00
  • d3d4956261 [Neuron] flatten test parameterization for neuron attention kernels (#14712) Liangfu Chen 2025-03-13 20:46:56 -07:00
  • 4059adc31b [Misc][Minor] Simplify SamplingParams.__post_init__() (#14772) Nick Hill 2025-03-13 23:44:20 -04:00
  • f1f632d9ec [ci] Reduce number of tests in fastcheck (#14782) Kevin H. Luu 2025-03-13 20:43:45 -07:00
  • 95d680b862 [Bugfix][IPEX] Add VLLM_CPU_MOE_PREPACK to allow disabling MoE prepack when CPU does not support it (#14681) Thien Tran 2025-03-14 11:43:18 +08:00
  • fb4c7f8ef0 [Kernel] [V1] Further optimizations to ROCm (Triton) Backend to better handle GQA. (#14431) Thomas Parnell 2025-03-14 04:42:27 +01:00
  • 0b1cfa6180 [Kernel] LoRA - Enable CUDAGraphs for V1 (#14626) Varun Sundar Rabindranath 2025-03-13 23:42:04 -04:00
  • 32ef4983cd [V1] Temporarily disable FlashInfer Rejection Sampler (#14788) Woosuk Kwon 2025-03-13 20:40:35 -07:00
  • ad19c8a003 [V1] Move OOM check into sampler run (#14728) Roger Wang 2025-03-13 20:40:23 -07:00
  • 2a602b055a forward fix PR 14245, restore build on ROCm 6.2 (#14709) Jeff Daily 2025-03-13 20:40:15 -07:00
  • 7888e1d0a3 [V1] TPU - Enable prefix caching by default (#14773) Alexander Matveev 2025-03-13 23:40:05 -04:00
  • 60c872d4b6 [Doc] Fix small typo in Transformers fallback (#14791) Chen Zhang 2025-03-14 11:33:12 +08:00
  • 3fb17d26c8 [Doc] Fix typo in documentation (#14783) yasu52 2025-03-13 20:33:09 -07:00
  • d47807ba08 [Attention] Remove slow setattr in MLA (#14769) Lucas Wilkinson 2025-03-13 17:31:14 -04:00
  • 02fcaa3d0a [V1] Detokenizer: Respect Stop Tokens + not include_stop_str_in_output (#14624) afeldman-nm 2025-03-13 15:07:34 -04:00
  • 8a4a2efc6f [V1][Core] using cached vocab_size for Structured Outputs (#14630) Aaron Pham 2025-03-13 14:39:28 -04:00
  • 8e9ffd37d6 [Misc] Clean up processor tests (#14771) Cyrus Leung 2025-03-14 02:25:37 +08:00
  • 01b3fd0af7 [V1][Minor] Minor enhancements on scheduler (#14732) Woosuk Kwon 2025-03-13 08:53:22 -07:00
  • f53a0586b9 [Bugfix] Fix prompt format of GLM4V (#14539) Cyrus Leung 2025-03-13 19:37:17 +08:00
  • b1cc4dfef5 [VLM] Support loading InternVideo2.5 models as original InternVLChatModel (#14738) Isotr0py 2025-03-13 18:10:02 +08:00
  • 382403921f [VLM] Support pan-and-scan for Gemma3 multi-modal processor (#14672) Cyrus Leung 2025-03-13 17:23:12 +08:00
  • a73122de96 [Bugfix] fix benchmark moe (#14653) Jee Jee Li 2025-03-13 16:12:42 +08:00
  • bd44b812cb [CI/Build] Delete ultravox LoRA test (#14730) Jee Jee Li 2025-03-13 15:57:39 +08:00
  • 55211b01e8 [Bugfix] Fix chunked prefill for GGUF (#14666) Szymon Ożóg 2025-03-13 08:19:03 +01:00
  • 5d043c1685 [Quant] Bamba SupportsQuant (#14698) Kyle Sayers 2025-03-13 00:57:05 -04:00
  • 36d1ccb286 [Quant] BartModel SupportsQuant (#14699) Kyle Sayers 2025-03-13 00:55:59 -04:00
  • 1bc3b739c4 [V1][TPU] Add assertion on multi-step-scheduler (#14707) Siyuan Liu 2025-03-12 21:37:58 -07:00
  • 1bd32bc8dd [Config][Disaggregated] Add timeout configuration for the torch.store and add KVTransferConfig.kv_connector_extra_config (#14367) Mathis Felardos 2025-03-13 04:15:20 +01:00
  • 128bf75283 [BugFix][TritonMLA] Process weights after model loading for GGUF (#14555) TY-AMD 2025-03-13 11:14:36 +08:00
  • a94a699c3f [ROCm][FP8] Fix for adjustments needed only for fnuz (#14689) Gregory Shtrasberg 2025-03-12 23:14:04 -04:00
  • ab426ec9c0 Add ray[data] as tpu dependency (#14691) Richard Liu 2025-03-12 20:13:48 -07:00
  • 165290d357 [bugfix] fixup warning message for plugged schedulers for v1 (#14700) Joe Runde 2025-03-12 21:12:13 -06:00