Commit Graph

  • bb7991aa29 [V1] Add missing tokenizer options for Detokenizer (#10288) Roger Wang 2024-11-13 03:02:56 -08:00
  • d909acf9fe [Model][LoRA]LoRA support added for idefics3 (#10281) B-201 2024-11-13 17:25:59 +08:00
  • b6dde33019 [Core] Flashinfer - Remove advance step size restriction (#10282) Pavani Majety 2024-11-13 00:29:32 -08:00
  • 1b886aa104 [Model] Adding Support for Qwen2VL as an Embedding Model. Using MrLight/dse-qwen2-2b-mrl-v1 (#9944) Austin Veselka 2024-11-13 02:28:13 -06:00
  • 3945c82346 [Model] Add support for Qwen2-VL video embeddings input & multiple image embeddings input with varied resolutions (#10221) 电脑星人 2024-11-13 15:07:22 +08:00
  • 032fcf16ae [Doc] Fix typo in arg_utils.py (#10264) Xin Yang 2024-11-12 21:54:52 -08:00
  • 56a955e774 Bump to compressed-tensors v0.8.0 (#10279) Dipika Sikka 2024-11-13 00:54:10 -05:00
  • bbd3e86926 [V1] Support VLMs with fine-grained scheduling (#9871) Woosuk Kwon 2024-11-12 20:53:13 -08:00
  • 0d4ea3fb5c [core][distributed] use tcp store directly (#10275) youkaichao 2024-11-12 17:36:08 -08:00
  • 112fa0bbe5 [V1] Fix CI tests on V1 engine (#10272) Woosuk Kwon 2024-11-12 16:17:20 -08:00
  • 377b74fe87 Revert "[ci][build] limit cmake version" (#10271) youkaichao 2024-11-12 15:06:48 -08:00
  • 18081451f9 [doc] improve debugging doc (#10270) youkaichao 2024-11-12 14:43:52 -08:00
  • 96ae0eaeb2 [doc] fix location of runllm widget (#10266) youkaichao 2024-11-12 14:34:39 -08:00
  • 1f55e05713 [V1] Enable Inductor when using piecewise CUDA graphs (#10268) Woosuk Kwon 2024-11-12 13:39:56 -08:00
  • 8a06428c70 [LoRA] Adds support for bias in LoRA (#5733) Umesh 2024-11-12 11:08:40 -08:00
  • b41fb9d3b1 [Encoder Decoder] Update Mllama to run with both FlashAttention and XFormers (#9982) sroy745 2024-11-12 10:53:57 -08:00
  • 7c65527918 [V1] Use pickle for serializing EngineCoreRequest & Add multimodal inputs to EngineCoreRequest (#10245) Woosuk Kwon 2024-11-12 08:57:14 -08:00
  • 47db6ec831 [Frontend] Add per-request number of cached token stats (#10174) zifeitong 2024-11-12 08:42:28 -08:00
  • 176fcb1c71 [Bugfix] Fix QwenModel argument (#10262) Jie Fu (傅杰) 2024-11-13 00:36:51 +08:00
  • a838ba7254 [Misc]Fix Idefics3Model argument (#10255) Jee Jee Li 2024-11-12 21:07:11 +08:00
  • 36c513a076 [BugFix] Do not raise a ValueError when tool_choice is set to the supported none option and tools are not defined. (#10000) Guillaume Calmettes 2024-11-12 12:13:46 +01:00
  • d201d41973 [CI][CPU]refactor CPU tests to allow to bind with different cores (#10222) Yuan 2024-11-12 18:07:32 +08:00
  • 3a28f18b0b [doc] explain the class hierarchy in vLLM (#10240) youkaichao 2024-11-11 22:56:44 -08:00
  • 812c981fa0 Splitting attention kernel file (#10091) Aleksandr Malyshev 2024-11-11 22:55:07 -08:00
  • 7f5edb5900 [Misc][LoRA] Replace hardcoded cuda device with configurable argument (#10223) Jee Jee Li 2024-11-12 11:10:15 +08:00
  • eea55cca5b [1/N] torch.compile user interface design (#10237) youkaichao 2024-11-11 18:01:06 -08:00
  • 9cdba9669c [Doc] Update help text for --distributed-executor-backend (#10231) Russell Bryant 2024-11-11 20:55:09 -05:00
  • d1c6799b88 [doc] update debugging guide (#10236) youkaichao 2024-11-11 15:21:12 -08:00
  • 6ace6fba2c [V1] AsyncLLM Implementation (#9826) Robert Shaw 2024-11-11 18:05:38 -05:00
  • 08f93e7439 Make shutil rename in python_only_dev (#10233) Nikolai Shcheglov 2024-11-11 16:29:19 -06:00
  • 9d5b4e4dea [V1] Enable custom ops with piecewise CUDA graphs (#10228) Woosuk Kwon 2024-11-11 11:58:07 -08:00
  • 8a7fe47d32 [misc][distributed] auto port selection and disable tests (#10226) youkaichao 2024-11-11 11:54:59 -08:00
  • 4800339c62 Add docs on serving with Llama Stack (#10183) Yuan Tang 2024-11-11 14:28:55 -05:00
  • fe15729a2b [V1] Use custom ops for piecewise CUDA graphs (#10227) Woosuk Kwon 2024-11-11 11:26:48 -08:00
  • 330e82d34a [v1][torch.compile] support managing cudagraph buffer (#10203) youkaichao 2024-11-11 11:10:27 -08:00
  • d7a4f2207b [V1] Do not use inductor for piecewise CUDA graphs (#10225) Woosuk Kwon 2024-11-11 11:05:57 -08:00
  • f9dadfbee3 [V1] Fix detokenizer ports (#10224) Woosuk Kwon 2024-11-11 10:42:07 -08:00
  • 25144ceed0 Bump actions/setup-python from 5.2.0 to 5.3.0 (#10209) dependabot[bot] 2024-11-11 17:24:10 +00:00
  • e6de9784d2 [core][distributed] add stateless process group (#10216) youkaichao 2024-11-11 09:02:14 -08:00
  • 36fc439de0 [Doc] fix doc string typo in block_manager swap_out function (#10212) Yangcheng Li 2024-11-12 00:53:07 +08:00
  • 874f551b36 [Metrics] add more metrics (#4464) harrywu 2024-11-12 00:17:38 +08:00
  • 2cebda42bb [Bugfix][Hardware][CPU] Fix broken encoder-decoder CPU runner (#10218) Isotr0py 2024-11-11 20:37:58 +08:00
  • 5fb1f935b0 [V1] Allow tokenizer_mode and trust_remote_code for Detokenizer (#10211) Roger Wang 2024-11-11 02:01:18 -08:00
  • 36e4acd02a [LoRA][Kernel] Remove the unused libentry module (#10214) Jee Jee Li 2024-11-11 17:43:23 +08:00
  • 58170d6503 [Hardware][CPU] Add embedding models support for CPU backend (#10193) Isotr0py 2024-11-11 16:54:28 +08:00
  • 9804ac7c7c Bump the patch-update group with 5 updates (#10210) dependabot[bot] 2024-11-11 07:22:40 +00:00
  • f89d18ff74 [6/N] pass whole config to inner model (#10205) youkaichao 2024-11-10 22:41:46 -08:00
  • f0f2e5638e [doc] improve debugging code (#10206) youkaichao 2024-11-10 17:49:40 -08:00
  • ad9a78bf64 [Doc] Fix typo error in vllm/entrypoints/openai/cli_args.py (#10196) yansh97 2024-11-11 08:14:22 +08:00
  • 73b9083e99 [misc] improve cloudpickle registration and tests (#10202) youkaichao 2024-11-10 16:10:53 -08:00
  • 20cf2f553c [Misc] small fixes to function tracing file path (#9543) Shawn Du 2024-11-11 07:21:06 +08:00
  • bfb7d61a7c [doc] Polish the integration with huggingface doc (#10195) Yongzao 2024-11-11 02:22:04 +08:00
  • 19682023b6 [Doc] Fix typo error in CONTRIBUTING.md (#10190) FuryMartin 2024-11-10 15:47:24 +08:00
  • 9fa4bdde9d [ci][build] limit cmake version (#10188) youkaichao 2024-11-09 16:27:26 -08:00
  • 51c2e1fcef [CI/Build] Split up models tests (#10069) Cyrus Leung 2024-11-10 03:39:14 +08:00
  • b09895a618 [Frontend][Core] Override HF config.json via CLI (#5836) Krishna Mandal 2024-11-09 08:19:27 -08:00
  • d88bff1b96 [Frontend] add add_request_id middleware (#9594) cjackal 2024-11-09 19:18:29 +09:00
  • 9e37266420 bugfix: fix the bug that stream generate not work (#2756) Zhao Yingzhuo 2024-11-09 18:09:48 +08:00
  • 8a4358ecb5 [doc] explaining the integration with huggingface (#10173) youkaichao 2024-11-09 01:02:54 -08:00
  • bd46357ad9 [bugfix] fix broken tests of mlp speculator (#10177) youkaichao 2024-11-09 00:04:50 -08:00
  • f192aeba74 [Bugfix] Enable some fp8 and quantized fullgraph tests (#10171) bnellnm 2024-11-09 03:01:27 -05:00
  • 8e1529dc57 [CI/Build] Add run-hpu-test.sh script (#10167) Chendi.Xue 2024-11-09 00:26:52 -06:00
  • 1a95f10ee7 [5/N] pass the whole config to model (#9983) youkaichao 2024-11-08 22:17:28 -08:00
  • 49d2a41a86 [Doc] Adjust RunLLM location (#10176) Cyrus Leung 2024-11-09 12:07:10 +08:00
  • 47672f38b5 [CI/Build] Fix VLM broadcast tests tensor_parallel_size passing (#10161) Isotr0py 2024-11-09 12:02:59 +08:00
  • f83feccd7f [Bugfix] Ignore GPTQ quantization of Qwen2-VL visual module (#10169) Michael Goin 2024-11-08 22:36:46 -05:00
  • e0191a95d8 [0/N] Rename MultiModalInputs to MultiModalKwargs (#10040) Cyrus Leung 2024-11-09 11:31:02 +08:00
  • d7edca1dee [CI/Build] Adding timeout in CPU CI to avoid CPU test queue blocking (#6892) Li, Jiang 2024-11-09 11:27:11 +08:00
  • 127c07480e [Kernel][Triton] Add Triton implementation for scaled_mm_triton to support fp8 and int8 SmoothQuant, symmetric case (#9857) rasmith 2024-11-08 18:59:22 -06:00
  • 10b67d865d [Bugfix] SymIntArrayRef expected to contain concrete integers (#10170) bnellnm 2024-11-08 17:44:18 -05:00
  • 4f93dfe952 [torch.compile] Fuse RMSNorm with quant (#9138) Luka Govedič 2024-11-08 16:20:08 -05:00
  • e1b5a82179 Rename vllm.logging to vllm.logging_utils (#10134) Florian Zimmermeister 2024-11-08 21:53:24 +01:00
  • 87713c6053 [CI/Build] Ignore .gitignored files for shellcheck (#10162) Luka Govedič 2024-11-08 14:53:36 -05:00
  • b5815c8413 [V1] Fix non-cudagraph op name (#10166) Woosuk Kwon 2024-11-08 10:23:04 -08:00
  • 6b30471586 [Misc] Improve Web UI (#10090) Rafael Vasquez 2024-11-08 12:51:04 -05:00
  • f6778620a9 Disable spec-decode + chunked-prefill for draft models with tensor parallelism > 1 (#10136) sroy745 2024-11-08 07:56:18 -08:00
  • 0535e5fe6c Fix edge case Mistral tokenizer (#10152) Patrick von Platen 2024-11-08 16:42:27 +01:00
  • b489fc3c91 [CI/Build] Update CPU tests to include all "standard" tests (#5481) Cyrus Leung 2024-11-08 23:30:04 +08:00
  • 208ce622c7 [V1]Enable APC by default only for text models (#10148) Roger Wang 2024-11-08 06:39:41 -08:00
  • 1ff4aed5bd [Model] Expose size to Idefics3 as mm_processor_kwargs (#10146) Isotr0py 2024-11-08 17:56:58 +08:00
  • f10797c0ce [Bugfix][XPU] Fix xpu tp by introducing XpuCommunicator (#10144) Yan Ma 2024-11-08 17:41:03 +08:00
  • f4c2187e29 [Misc] Fix typo in #5895 (#10145) Cyrus Leung 2024-11-08 17:07:01 +08:00
  • aea6ad629f Add hf_transfer to testing image (#10096) Michael Goin 2024-11-08 03:35:25 -05:00
  • da07a9ead7 Fixes a typo about 'max_decode_seq_len' which causes crashes with cuda graph. (#9285) Tao He 2024-11-08 13:31:28 +08:00
  • 3a7f15a398 [Doc] Move CONTRIBUTING to docs site (#9924) Russell Bryant 2024-11-08 00:15:12 -05:00
  • 7371749d54 [Misc] Fix ImportError causing by triton (#9493) Mengqing Cao 2024-11-08 13:08:51 +08:00
  • ad39bd640c [Bugfix] Add error handling when server cannot respond any valid tokens (#5895) DearPlanet 2024-11-08 12:58:37 +08:00
  • 40d0e7411d [Doc] Update FAQ links in spec_decode.rst (#9662) whyiug 2024-11-08 12:44:58 +08:00
  • 6bb52b0f97 [CI/Build] Give PR cleanup job PR write access (#10139) Russell Bryant 2024-11-07 23:10:20 -05:00
  • 201fc07730 [V1] Prefix caching (take 2) (#9972) Cody Yu 2024-11-07 17:34:44 -08:00
  • 42b4f46b71 [V1] Add all_token_ids attribute to Request (#10135) Woosuk Kwon 2024-11-07 17:08:24 -08:00
  • 073a472728 [Misc] report relevant env vars in collect_env.py tool (#9293) Jiangtao Hu 2024-11-07 16:14:01 -08:00
  • 93bff421bc Bump actions/checkout from 4.2.1 to 4.2.2 (#9746) dependabot[bot] 2024-11-07 21:44:58 +00:00
  • 28b2877d30 Online video support for VLMs (#10020) litianjian 2024-11-08 04:25:59 +08:00
  • 97b8475beb Bump actions/setup-python from 5.2.0 to 5.3.0 (#9745) dependabot[bot] 2024-11-07 18:55:35 +00:00
  • a2f1f3b089 [CI/Build] Automate PR body text cleanup (#10082) Russell Bryant 2024-11-07 13:26:28 -05:00
  • 3be5b26a76 [CI/Build] Add shell script linting using shellcheck (#7925) Russell Bryant 2024-11-07 13:17:29 -05:00
  • de0e61a323 [CI/Build] Always run mypy (#10122) Russell Bryant 2024-11-07 11:43:16 -05:00
  • 9d43afcc53 [Feature] [Spec decode]: Combine chunked prefill with speculative decoding (#9291) Nicolò Lucchesi 2024-11-07 17:15:14 +01:00
  • ae62fd17c0 [Frontend] Tool calling parser for Granite 3.0 models (#9027) Maximilien de Bayser 2024-11-07 12:09:02 -03:00