Commit Graph

  • 992e5c3d34 Merge similar examples in offline_inference into single basic example (#12737) Harry Mellor 2025-02-20 12:53:51 +00:00
  • b69692a2d8 [Kernel] LoRA - Refactor sgmv kernels (#13110) Varun Sundar Rabindranath 2025-02-20 17:58:06 +05:30
  • a64a84433d [2/n][ci] S3: Use full model path (#13564) Kevin H. Luu 2025-02-20 01:20:15 -08:00
  • aa1e62d0db [ci] Fix spec decode test (#13600) Kevin H. Luu 2025-02-20 00:56:00 -08:00
  • 497bc83124 [CI/Build] Use uv in the Dockerfile (#13566) Michael Goin 2025-02-20 02:05:44 -05:00
  • 3738e6fa80 [API Server] Add port number range validation (#13506) Yuan Tang 2025-02-20 02:05:13 -05:00
  • 0023cd2b9d [ROCm] MI300A compile targets deprecation (#13560) Gregory Shtrasberg 2025-02-20 02:05:00 -05:00
  • 041e294716 [Misc] add mm_processor_kwargs to extra_body for Qwen2.5-VL (#13533) 2025-02-20 15:04:30 +08:00
  • 9621667874 [Misc] Warn if the vLLM version can't be retrieved (#13501) Alex Brooks 2025-02-19 23:24:48 -07:00
  • 8c755c3b6d [bugfix] spec decode worker get tp group only when initialized (#13578) Simon Mo 2025-02-19 20:46:28 -08:00
  • ba81163997 [core] add sleep and wake up endpoint and v1 support (#12987) youkaichao 2025-02-20 12:41:17 +08:00
  • 0d243f2a54 [ROCm][MoE] mi300 mixtral8x7B perf for specific BS (#13577) Divakar Verma 2025-02-19 22:01:02 -06:00
  • 88f6ba3281 [ci] Add AWS creds for AMD (#13572) Kevin H. Luu 2025-02-19 19:56:06 -08:00
  • 512368e34a [Misc] Qwen2.5 VL support LoRA (#13261) Jee Jee Li 2025-02-20 10:37:55 +08:00
  • 473f51cfd9 [3/n][CI] Load Quantization test models with S3 (#13570) Kevin H. Luu 2025-02-19 18:12:30 -08:00
  • a4c402a756 [BugFix] Avoid error traceback in logs when V1 LLM terminates (#13565) Nick Hill 2025-02-19 16:49:01 -08:00
  • 550d97eb58 [Misc] Avoid calling unnecessary hf_list_repo_files for local model path (#13348) Isotr0py 2025-02-20 02:57:48 +08:00
  • fbbe1fbac6 [MISC] Logging the message about Ray teardown (#13502) Cody Yu 2025-02-19 09:40:50 -08:00
  • 01c184b8f3 Fix copyright year to auto get current year (#13561) Wilson Wu 2025-02-20 00:55:34 +08:00
  • ad5a35c21b [doc] clarify multi-node serving doc (#13558) youkaichao 2025-02-19 22:32:17 +08:00
  • 5ae9f26a5a [Bugfix] Fix device ordinal for multi-node spec decode (#13269) shangmingc 2025-02-19 22:13:15 +08:00
  • 377d10bd14 [VLM][Bugfix] Pass processor kwargs properly on init (#13516) Cyrus Leung 2025-02-19 21:13:50 +08:00
  • 52ce14d31f [doc] clarify profiling is only for developers (#13554) youkaichao 2025-02-19 20:55:58 +08:00
  • 81dabf24a8 [CI/Build] force writing version file (#13544) Daniele 2025-02-19 11:48:03 +01:00
  • 423330263b [Feature] Pluggable platform-specific scheduler (#13161) Yannick Schnider 2025-02-19 10:16:38 +01:00
  • caf7ff4456 [V1][Core] Generic mechanism for handling engine utility (#13060) Nick Hill 2025-02-19 01:09:22 -08:00
  • f525c0be8b [Model][Speculative Decoding] DeepSeek MTP spec decode (#12755) Lucia Fang 2025-02-19 01:06:23 -08:00
  • 983a40a8bb [Bugfix] Fix Positive Feature Layers in Llava Models (#13514) Alex Brooks 2025-02-19 01:50:07 -07:00
  • fdc5df6f54 use device param in load_model method (#13037) Zhe Zhang 2025-02-19 16:05:02 +08:00
  • 3b05cd4555 [perf-benchmark] Fix ECR path for premerge benchmark (#13512) Kevin H. Luu 2025-02-18 23:56:11 -08:00
  • d5d214ac7f [1/n][CI] Load models in CI from S3 instead of HF (#13205) Kevin H. Luu 2025-02-18 23:34:59 -08:00
  • fd84857f64 [Doc] Add clarification note regarding paligemma (#13511) Roger Wang 2025-02-18 22:24:03 -08:00
  • 8aada19dfc [ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503) Divakar Verma 2025-02-19 00:23:24 -06:00
  • 9aa95b0e6a [perf-benchmark] Allow premerge ECR (#13509) Kevin H. Luu 2025-02-18 21:13:41 -08:00
  • d0a7a2769d [Hardware][Gaudi][Feature] Support Contiguous Cache Fetch (#12139) Yu-Zhou 2025-02-19 11:40:19 +08:00
  • 00b69c2d27 [Misc] Remove dangling references to --use-v2-block-manager (#13492) Harry Mellor 2025-02-19 03:37:26 +00:00
  • 4c82229898 [V1][Spec Decode] Optimize N-gram matching with Numba (#13365) Woosuk Kwon 2025-02-18 13:19:58 -08:00
  • c8d70e2437 Pin Ray version to 2.40.0 (#13490) Woosuk Kwon 2025-02-18 12:50:31 -08:00
  • 30172b4947 [V1] Optimize handling of sampling metadata and req_ids list (#13244) Nick Hill 2025-02-18 12:15:33 -08:00
  • a4d577b379 [V1][Tests] Adding additional testing for multimodal models to V1 (#13308) Murali Andoorveedu 2025-02-18 09:53:14 -08:00
  • 7b203b7694 [misc] fix debugging code (#13487) youkaichao 2025-02-19 01:37:11 +08:00
  • 4fb8142a0e [V1][PP] Enable true PP with Ray executor (#13472) Woosuk Kwon 2025-02-18 09:15:32 -08:00
  • a02c86b4dd [CI/Build] migrate static project metadata from setup.py to pyproject.toml (#8772) Daniele 2025-02-18 17:02:49 +01:00
  • 3809458456 [Bugfix] Fix invalid rotary embedding unit test (#13431) Liangfu Chen 2025-02-18 03:52:03 -08:00
  • d3231cb436 [Bugfix] Handle content type with optional parameters (#13383) zifeitong 2025-02-18 03:29:13 -08:00
  • 435b502a6e [ROCm] Make amdsmi import optional for other platforms (#13460) Cyrus Leung 2025-02-18 19:15:56 +08:00
  • 29fc5772c4 [Bugfix] Remove noisy error logging during local model loading (#13458) Isotr0py 2025-02-18 19:15:48 +08:00
  • 2358ca527b [Doc]: Improve feature tables (#13224) Harry Mellor 2025-02-18 10:52:39 +00:00
  • 8cf97f8661 [Bugfix] Fix failing transformers dynamic module resolving with spawn multiproc method (#13403) Isotr0py 2025-02-18 18:25:53 +08:00
  • e2603fefb8 [Bugfix] Ensure LoRA path from the request can be included in err msg (#13450) Yuan Tang 2025-02-18 03:19:15 -05:00
  • b53d79983c Add outlines fallback when JSON schema has enum (#13449) Michael Goin 2025-02-18 01:49:41 -05:00
  • 9915912f7f [V1][PP] Fix & Pin Ray version in requirements-cuda.txt (#13436) Woosuk Kwon 2025-02-17 21:58:06 -08:00
  • d1b649f1ef [Quant] Aria SupportsQuant (#13416) Kyle Sayers 2025-02-18 00:51:09 -05:00
  • ac19b519ed [core] fix sleep mode in pytorch 2.6 (#13456) youkaichao 2025-02-18 13:48:10 +08:00
  • a1074b3efe [Bugfix] Only print out chat template when supplied (#13444) Yuan Tang 2025-02-18 00:43:31 -05:00
  • 00294e1bc6 [Quant] Arctic SupportsQuant (#13366) Kyle Sayers 2025-02-18 00:35:09 -05:00
  • 88787bce1d [Quant] Molmo SupportsQuant (#13336) Kyle Sayers 2025-02-18 00:34:47 -05:00
  • 932b51cedd [v1] fix parallel config rank (#13445) youkaichao 2025-02-18 12:33:45 +08:00
  • 7c7adf81fc [ROCm] fix get_device_name for rocm (#13438) Divakar Verma 2025-02-17 22:07:12 -06:00
  • 67ef8f666a [Model] Enable quantization support for transformers backend (#12960) Isotr0py 2025-02-18 11:52:47 +08:00
  • efbe854448 [Misc] Remove dangling references to SamplingType.BEAM (#13402) Harry Mellor 2025-02-18 03:52:35 +00:00
  • b3942e157e [Bugfix][CI][V1] Work around V1 + CUDA Graph + torch._scaled_mm fallback issue (#13425) Tyler Michael Smith 2025-02-17 19:32:48 -05:00
  • cd4a72a28d [V1][Spec decode] Move drafter to model runner (#13363) Woosuk Kwon 2025-02-17 15:40:12 -08:00
  • 6ac485a953 [V1][PP] Fix intermediate tensor values (#13417) Cody Yu 2025-02-17 13:37:45 -08:00
  • 4c21ce9eba [V1] Get input tokens from scheduler (#13339) Woosuk Kwon 2025-02-17 11:01:07 -08:00
  • ce77eb9410 [Bugfix] Fix VLLM_USE_MODELSCOPE issue (#13384) r.4ntix 2025-02-17 22:22:01 +08:00
  • 30513d1cb6 [Bugfix] fix xpu communicator (#13368) Yan Ma 2025-02-17 20:59:18 +08:00
  • 1f69c4a892 [Model] Support Mamba2 (Codestral Mamba) (#9292) Tyler Michael Smith 2025-02-17 07:17:50 -05:00
  • 7b623fca0b [VLM] Check required fields before initializing field config in DictEmbeddingItems (#13380) Cyrus Leung 2025-02-17 17:36:07 +08:00
  • 238dfc8ac3 [MISC] tiny fixes (#13378) Mengqing Cao 2025-02-17 16:57:13 +08:00
  • 45186834a0 Run v1 benchmark and integrate with PyTorch OSS benchmark database (#13068) Huy Do 2025-02-17 00:16:32 -08:00
  • f857311d13 Fix spelling error in index.md (#13369) yankooo 2025-02-17 14:53:20 +08:00
  • 46cdd59577 [Feature][Spec Decode] Simplify the use of Eagle Spec Decode (#12304) shangmingc 2025-02-17 11:32:26 +08:00
  • 2010f04c17 [V1][Misc] Avoid unnecessary log output (#13289) Jee Jee Li 2025-02-17 11:26:24 +08:00
  • 69e1d23e1e [V1][BugFix] Clean up rejection sampler & Fix warning msg (#13362) Woosuk Kwon 2025-02-16 12:25:29 -08:00
  • d67cc21b78 [Bugfix][Platform][CPU] Fix cuda platform detection on CPU backend edge case (#13358) Isotr0py 2025-02-17 02:55:27 +08:00
  • e18227b04a [V1][PP] Cache Intermediate Tensors (#13353) Woosuk Kwon 2025-02-16 10:02:27 -08:00
  • 7b89386553 [V1][BugFix] Add __init__.py to v1/spec_decode/ (#13359) Woosuk Kwon 2025-02-16 09:39:08 -08:00
  • da833b0aee [Docs] Change myenv to vllm. Update python_env_setup.inc.md (#13325) 2025-02-17 00:04:21 +08:00
  • 5d2965b7d7 [Bugfix] Fix 2 Node and Spec Decode tests (#13341) Cyrus Leung 2025-02-16 22:20:22 +08:00
  • a0231b7c25 [platform] add base class for communicators (#13208) youkaichao 2025-02-16 22:14:22 +08:00
  • 124776ebd5 [ci] skip failed tests for flashinfer (#13352) youkaichao 2025-02-16 22:09:15 +08:00
  • b7d309860e [V1] Update doc and examples for H2O-VL (#13349) Roger Wang 2025-02-16 02:35:54 -08:00
  • dc0f7ccf8b [BugFix] Enhance test_pos_encoding to support execution on multi-devices (#13187) wchen61 2025-02-16 16:59:49 +08:00
  • d3d547e057 [Bugfix] Pin xgrammar to 0.1.11 (#13338) Michael Goin 2025-02-15 22:42:25 -05:00
  • 12913d17ba [Quant] Add SupportsQuant to phi3 and clip (#13104) Kyle Sayers 2025-02-15 22:28:33 -05:00
  • 80f63a3966 [V1][Spec Decode] Ngram Spec Decode (#12193) Lily Liu 2025-02-15 18:05:11 -08:00
  • 367cb8ce8c [Doc] [2/N] Add Fuyu E2E example for multimodal processor (#13331) Cyrus Leung 2025-02-15 23:06:23 +08:00
  • 54ed913f34 [ci/build] update flashinfer (#13323) youkaichao 2025-02-15 21:33:13 +08:00
  • 9206b3d7ec [V1][PP] Run engine busy loop with batch queue (#13064) Cody Yu 2025-02-15 03:59:01 -08:00
  • ed0de3e4b8 [AMD] [Model] DeepSeek tunings (#13199) rasmith 2025-02-15 05:58:09 -06:00
  • 2ad1bc7afe [V1][Metrics] Add iteration_tokens_total histogram from V0 (#13288) Mark McLoughlin 2025-02-15 11:56:19 +00:00
  • 7fdaaf48ef [Bugfix] Fix qwen2.5-vl image processor (#13286) Isotr0py 2025-02-15 19:00:11 +08:00
  • 067fa2255b [Bugfix]Fix search start_index of stop_checker (#13280) Xu Song 2025-02-15 13:39:42 +08:00
  • 9076325677 [BugFix] Don't scan entire cache dir when loading model (#13302) Nick Hill 2025-02-14 21:33:31 -08:00
  • 97a3d6d995 [Bugfix] Massage MLA's usage of flash attn for RoCM (#13310) Tyler Michael Smith 2025-02-15 00:33:25 -05:00
  • 579d7a63b2 [Bugfix][Docs] Fix offline Whisper (#13274) Nicolò Lucchesi 2025-02-15 06:32:37 +01:00
  • c9f9d5b397 [Bugfix][AMD] Update torch_bindings so that scaled_fp4_quant isn't build on ROCm (#13235) Sage Moore 2025-02-14 20:30:42 -08:00
  • 0c73026844 [V1][PP] Fix memory profiling in PP (#13315) Woosuk Kwon 2025-02-14 20:17:25 -08:00
  • 6a854c7a2b [V1][Sampler] Don't apply temp for greedy-only (#13311) Nick Hill 2025-02-14 18:10:53 -08:00