Commit Graph

  • fc601665eb [Misc] Update disaggregation benchmark scripts and test logs (#11456) Jiaxin Shan 2024-12-24 22:58:48 -08:00
  • 9832e5572a [V1] Unify VLLM_ENABLE_V1_MULTIPROCESSING handling in RayExecutor (#11472) Rui Qiao 2024-12-24 19:49:46 -08:00
  • 3f3e92e1f2 [Model] Automatic conversion of classification and reward models (#11469) Cyrus Leung 2024-12-25 02:22:22 +08:00
  • 409475a827 [Bugfix] Fix issues in CPU build Dockerfile. Fixes #9182 (#11435) Yuan Tang 2024-12-24 11:53:28 -05:00
  • 196c34b0ac [Misc] Move weights mapper (#11443) Jee Jee Li 2024-12-24 21:05:25 +08:00
  • 5c7963249d [attn][tiny fix] fix attn backend in MultiHeadAttention (#11463) Mengqing Cao 2024-12-24 20:39:36 +08:00
  • 461cde2080 [OpenVINO] Fixed installation conflicts (#11458) Ilya Lavrenov 2024-12-24 15:38:21 +04:00
  • 7a5286cc04 [Bugfix][Hardware][CPU] Fix CPU input_positions creation for text-only inputs with mrope (#11434) Isotr0py 2024-12-24 17:59:51 +08:00
  • b1b1038fbd [Bugfix] Fix Qwen2-VL LoRA weight loading (#11430) Jee Jee Li 2024-12-24 17:56:10 +08:00
  • 9edca6bf8f [Frontend] Online Pooling API (#11457) Cyrus Leung 2024-12-24 17:54:30 +08:00
  • 4f074fbf53 [Misc]Suppress irrelevant exception stack trace information when CUDA… (#11438) dpxa 2024-12-24 16:43:39 +08:00
  • a491d6f535 [V1] TP Ray executor (#11107) Rui Qiao 2024-12-23 15:00:12 -08:00
  • 32aa2059ad [Docs] Convert rST to MyST (Markdown) (#11145) Rafael Vasquez 2024-12-23 17:35:38 -05:00
  • 94d545a1a1 [Doc] Fix typo in the help message of '--guided-decoding-backend' (#11440) yansh97 2024-12-24 04:20:44 +08:00
  • 60fb4f3bcf [Bugfix] Add kv cache scales to gemma2.py (#11269) Michael Goin 2024-12-23 14:30:45 -05:00
  • 63afbe9215 [CI] Expand OpenAI test_chat.py guided decoding tests (#11048) Michael Goin 2024-12-23 13:35:38 -05:00
  • 8cef6e02dc [Misc] add w8a8 asym models (#11075) Dipika Sikka 2024-12-23 13:33:20 -05:00
  • b866cdbd05 [Misc] Add assertion and helpful message for marlin24 compressed models (#11388) Dipika Sikka 2024-12-23 13:23:38 -05:00
  • 2e726680b3 [Bugfix] torch nightly version in ROCm installation guide (#11423) Yuan Tang 2024-12-23 12:20:22 -05:00
  • 5bfb30a529 [Bugfix] Fix CFGGuide and use outlines for grammars that can't convert to GBNF (#11389) Michael Goin 2024-12-23 10:06:20 -05:00
  • e51719ae72 mypy type checking for vllm/worker (#11418) Lucas Tucker 2024-12-23 07:55:49 -06:00
  • f30581c518 [misc][perf] remove old code (#11425) youkaichao 2024-12-23 00:01:08 -08:00
  • 048fc57a0f [CI] Unboock H100 Benchmark (#11419) Simon Mo 2024-12-22 14:17:43 -08:00
  • f1d1bf6288 [Bugfix] Fix fully sharded LoRAs with Mixtral (#11390) Jason T. Greene 2024-12-22 09:25:10 -06:00
  • 72d9c316d3 [cd][release] fix race conditions (#11407) youkaichao 2024-12-22 00:39:11 -08:00
  • 4a9139780a [cd][release] add pypi index for every commit and nightly build (#11404) youkaichao 2024-12-21 23:53:44 -08:00
  • 29c748930e [CI] Fix flaky entrypoint tests (#11403) Roger Wang 2024-12-21 21:08:44 -08:00
  • c2d1b075ba [Bugfix] Fix issues for Pixtral-Large-Instruct-2411 (#11393) Roger Wang 2024-12-21 02:15:03 -08:00
  • 584f0ae40d [V1] Make AsyncLLMEngine v1-v0 opaque (#11383) Ricky Xu 2024-12-20 23:14:08 -08:00
  • 51ff216d85 [Bugfix] update should_ignore_layer (#11354) George 2024-12-21 01:36:23 -05:00
  • dd2b5633dd [V1][Bugfix] Skip hashing empty or None mm_data (#11386) Woosuk Kwon 2024-12-21 14:22:21 +09:00
  • 47a0b615b4 Add ray[default] to wget to run distributed inference out of box (#11265) Jiaxin Shan 2024-12-20 13:54:55 -08:00
  • 5d2248d81a [doc] explain nccl requirements for rlhf (#11381) youkaichao 2024-12-20 13:00:56 -08:00
  • d573aeadcc [Bugfix] Don't log OpenAI field aliases as ignored (#11378) Michael Goin 2024-12-20 14:03:50 -05:00
  • 995f56236b [Core] Loading model from S3 using RunAI Model Streamer as optional loader (#10192) omer-dayan 2024-12-20 18:46:24 +02:00
  • 7c7aa37c69 [CI/Build] fix pre-compiled wheel install for exact tag (#11373) Daniele 2024-12-20 17:14:40 +01:00
  • 04139ade59 [V1] Fix profiling for models with merged input processor (#11370) Roger Wang 2024-12-20 04:04:21 -08:00
  • 1ecc645b8f [doc] backward compatibility for 0.6.4 (#11359) youkaichao 2024-12-19 21:33:53 -08:00
  • c954f21ac0 [misc] add early error message for custom ops (#11355) youkaichao 2024-12-19 21:18:25 -08:00
  • 86c2d8fd1c [Bugfix] Fix spec decoding when seed is none in a batch (#10863) Wallas Henrique 2024-12-20 02:15:31 -03:00
  • b880ffb87e [Misc] Add tqdm progress bar during graph capture (#11349) Michael Goin 2024-12-19 23:35:18 -05:00
  • 7801f56ed7 [ci][gh200] dockerfile clean up (#11351) youkaichao 2024-12-19 18:13:06 -08:00
  • 48edab8041 [Bugfix][Hardware][POWERPC] Fix auto dtype failure in case of POWER10 (#11331) Akash kaothalkar 2024-12-20 07:02:07 +05:30
  • a985f7af9f [CI] Adding CPU docker pipeline (#11261) Yuan 2024-12-20 03:46:55 +08:00
  • e461c262f0 [Misc] Remove unused vllm/block.py (#11336) yangzhibin 2024-12-20 01:54:24 +08:00
  • 276738ce0f [Bugfix] Fix broken CPU compressed-tensors test (#11338) Isotr0py 2024-12-20 01:37:31 +08:00
  • cdf22afdda [Misc] Clean up and consolidate LRUCache (#11339) Cyrus Leung 2024-12-20 00:59:32 +08:00
  • e24113a8fe [Model] Refactor Qwen2-VL to use merged multimodal processor (#11258) Isotr0py 2024-12-20 00:28:00 +08:00
  • 7379b3d4b2 [V1] Fix multimodal profiling for Molmo (#11325) Roger Wang 2024-12-19 08:27:22 -08:00
  • 6c7f881541 [Model] Add JambaForSequenceClassification model (#10860) Yehoshua Cohen 2024-12-19 16:48:06 +02:00
  • a0f7d53beb [Bugfix] Cleanup Pixtral HF code (#11333) Cyrus Leung 2024-12-19 21:22:00 +08:00
  • 5aef49806d [Feature] Add load generation config from model (#11164) Yanyi Liu 2024-12-19 18:50:38 +08:00
  • 98356735ac [misc] benchmark_throughput : Add LoRA (#11267) Varun Sundar Rabindranath 2024-12-19 02:43:16 -05:00
  • f26c4aeecb [Misc] Optimize ray worker initialization time (#11275) Rui Qiao 2024-12-18 23:38:02 -08:00
  • 8936316d58 [Kernel] Refactor Cutlass c3x (#10049) Varun Sundar Rabindranath 2024-12-19 02:00:18 -05:00
  • 6142ef0ada [VLM] Merged multimodal processor for Qwen2-Audio (#11303) Cyrus Leung 2024-12-19 14:14:17 +08:00
  • c6b0a7d3ba [V1] Simplify prefix caching logic by removing num_evictable_computed_blocks (#11310) Chen Zhang 2024-12-18 20:17:12 -08:00
  • a30482f054 [CI] Expand test_guided_generate to test all backends (#11313) Michael Goin 2024-12-18 23:00:38 -05:00
  • 17ca964273 [Model] IBM Granite 3.1 (#11307) Travis Johnson 2024-12-18 20:27:24 -07:00
  • 5a9da2e6e9 [Bugfix][Build/CI] Fix sparse CUTLASS compilation on CUDA [12.0, 12.2) (#11311) Tyler Michael Smith 2024-12-18 21:43:30 -05:00
  • fdea8ec167 [V1] VLM - enable processor cache by default (#11305) Alexander Matveev 2024-12-18 18:54:46 -05:00
  • ca5f54a9b9 [Bugfix] fix minicpmv test (#11304) Joe Runde 2024-12-18 10:34:26 -08:00
  • f954fe0e65 [FIX] update openai version (#11287) Kunshang Ji 2024-12-19 02:17:05 +08:00
  • 362cff1eb3 [CI][Misc] Remove Github Action Release Workflow (#11274) Simon Mo 2024-12-18 10:16:53 -08:00
  • 996aa70f00 [Bugfix] Fix broken phi3-v mm_processor_kwargs tests (#11263) Isotr0py 2024-12-19 02:16:40 +08:00
  • 60508ffda9 [Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support (#10995) Dipika Sikka 2024-12-18 09:57:16 -05:00
  • f04e407e6b [MISC][XPU]update ipex link for CI fix (#11278) Yan Ma 2024-12-18 14:34:23 +08:00
  • 8b79f9e107 [Bugfix] Fix guided decoding with tokenizer mode mistral (#11046) Wallas Henrique 2024-12-18 03:34:08 -03:00
  • 866fa4550d [Bugfix] Restore support for larger block sizes (#11259) Konrad Zawora 2024-12-18 01:39:07 +01:00
  • bf8717ebae [V1] Prefix caching for vision language models (#11187) Cody Yu 2024-12-17 16:37:59 -08:00
  • c77eb8a33c [Bugfix] Set temperature=0.7 in test_guided_choice_chat (#11264) Michael Goin 2024-12-17 19:34:06 -05:00
  • 2d1b9baa8f [Bugfix] Fix request cancellation without polling (#11190) (tag: v0.6.5) Joe Runde 2024-12-17 13:26:32 -07:00
  • f9ecbb18bf [Misc] Allow passing logits_soft_cap for xformers backend (#11252) Isotr0py 2024-12-17 16:37:04 +08:00
  • 02222a0256 [Misc] Kernel Benchmark for RMSNorm (#11241) Roger Wang 2024-12-16 22:57:02 -08:00
  • 2bfdbf2a36 [V1][Core] Use weakref.finalize instead of atexit (#11242) Tyler Michael Smith 2024-12-17 01:11:33 -05:00
  • e88db68cf5 [Platform] platform agnostic for EngineArgs initialization (#11225) wangxiyuan 2024-12-17 14:11:06 +08:00
  • 59c9b6ebeb [V1][VLM] Proper memory profiling for image language models (#11210) Roger Wang 2024-12-16 22:10:57 -08:00
  • 66d4b16724 [Frontend] Add OpenAI API support for input_audio (#11027) kYLe 2024-12-17 00:09:58 -06:00
  • 0064f697d3 [CI] Add test case with JSON schema using references + use xgrammar by default with OpenAI parse (#10935) Michael Goin 2024-12-16 22:39:58 -05:00
  • 35bae114a8 fix gh200 tests on main (#11246) youkaichao 2024-12-16 17:22:38 -08:00
  • 88a412ed3d [torch.compile] fast inductor (#11108) youkaichao 2024-12-16 16:15:22 -08:00
  • c301616ed2 [ci][tests] add gh200 tests (#11244) youkaichao 2024-12-16 15:53:18 -08:00
  • 35ffa682b1 [Docs] hint to enable use of GPU performance counters in profiling tools for multi-node distributed serving (#11235) bk-TurbaAI 2024-12-16 23:20:39 +01:00
  • 551603feff [core] overhaul memory profiling and fix backward compatibility (#10511) youkaichao 2024-12-16 13:32:25 -08:00
  • efbce85f4d [misc] Layerwise profile updates (#10242) Varun Sundar Rabindranath 2024-12-16 13:14:57 -05:00
  • 2ca830dbaa [Doc] Reorder vision language examples in alphabet order (#11228) Isotr0py 2024-12-16 19:23:33 +08:00
  • d927dbcd88 [Model] Refactor Ultravox to use merged input processor (#11198) Isotr0py 2024-12-16 18:09:53 +08:00
  • bddbbcb132 [Model] Support Cohere2ForCausalLM (Cohere R7B) (#11203) Jani Monoses 2024-12-16 11:56:19 +02:00
  • b3b1526f03 WIP: [CI/Build] simplify Dockerfile build for ARM64 / GH200 (#11212) cennn 2024-12-16 17:20:49 +08:00
  • 17138af7c4 [Bugfix] Fix the default value for temperature in ChatCompletionRequest (#11219) yansh97 2024-12-16 16:15:40 +08:00
  • 69ba344de8 [Bugfix] Fix block size validation (#10938) chenqianfzh 2024-12-15 16:38:40 -08:00
  • da6f409246 Update deploying_with_k8s.rst (#10922) AlexHe99 2024-12-16 08:33:58 +08:00
  • 25ebed2f8c [V1][Minor] Cache np arange to reduce input preparation overhead (#11214) Woosuk Kwon 2024-12-15 13:33:00 -08:00
  • d263bd9df7 [Core] Support disaggregated prefill with Mooncake Transfer Engine (#10884) shangmingc 2024-12-16 05:28:18 +08:00
  • 38e599d6a8 [Doc] add documentation for disaggregated prefilling (#11197) Kuntai Du 2024-12-15 13:31:16 -06:00
  • 96d673e0f8 [Bugfix] Fix error handling of unsupported sliding window (#11213) Cyrus Leung 2024-12-16 01:59:42 +08:00
  • b10609e6a1 [Misc] Clean up multi-modal processor (#11207) Cyrus Leung 2024-12-15 14:30:28 +08:00
  • a1c02058ba [torch.compile] allow tracking forward time (#11081) youkaichao 2024-12-14 19:45:00 -08:00
  • 15859f2357 [[Misc]Upgrade bitsandbytes to the latest version 0.45.0 (#11201) Jee Jee Li 2024-12-15 11:03:06 +08:00
  • 886936837c [Performance][Core] Optimize the performance of evictor v1 and v2 by applying a priority queue and lazy deletion (#7209) Sungjae Lee 2024-12-15 04:38:10 +09:00