Commit Graph

  • 415f76a9cb Support mistral interleaved attn (#9414) Patrick von Platen 2024-10-16 15:28:30 +02:00
  • cf1d62a644 [Model] Support SDPA attention for Molmo vision backbone (#9410) Isotr0py 2024-10-16 19:52:01 +08:00
  • 59230ef32b [Misc] Consolidate example usage of OpenAI client for multimodal models (#9412) Roger Wang 2024-10-16 04:20:51 -07:00
  • cee711fdbb [Core] Rename input data types (#8688) Cyrus Leung 2024-10-16 18:49:37 +08:00
  • 1de76a0e55 [CI/Build] Test VLM embeddings (#9406) Cyrus Leung 2024-10-16 17:44:30 +08:00
  • 7abba39ee6 [Model] VLM2Vec, the first multimodal embedding model in vLLM (#9303) Cyrus Leung 2024-10-16 14:31:00 +08:00
  • 7e7eae338d [Misc] Standardize RoPE handling for Qwen2-VL (#9250) Cyrus Leung 2024-10-16 13:56:17 +08:00
  • ed920135c8 [Bugfix] Molmo text-only input bug fix (#9397) Reza Salehi 2024-10-15 21:56:09 -07:00
  • 717a5f82cd [Bugfix][CI/Build] Fix CUDA 11.8 Build (#9386) Lucas Wilkinson 2024-10-15 20:15:21 -04:00
  • ba30942240 [Bugfix] Fix vLLM UsageInfo and logprobs None AssertionError with empty token_ids (#9034) Chang Su 2024-10-15 15:40:43 -07:00
  • 22f8a69549 [Misc] Directly use compressed-tensors for checkpoint definitions (#8909) Michael Goin 2024-10-15 18:40:25 -04:00
  • 5d264f4ab8 pass ignore_eos parameter to all benchmark_serving calls (#9349) Grace Ho 2024-10-15 13:30:44 -07:00
  • e9d517f276 [BugFix] Fix chat API continuous usage stats (#9357) Nick Hill 2024-10-15 07:19:48 +01:00
  • 55e081fbad [Bugfix] Update InternVL input mapper to support image embeds (#9351) hhzhang16 2024-10-14 21:29:19 -07:00
  • 8e836d982a [Doc] Fix code formatting in spec_decode.rst (#9348) Michael Goin 2024-10-15 00:29:11 -04:00
  • 44eaa5a5d9 [Frontend] Clarify model_type error messages (#9345) Steve Grubb 2024-10-15 00:29:01 -04:00
  • 169b530607 [Bugfix] Clean up some cruft in mamba.py (#9343) Tyler Michael Smith 2024-10-14 20:24:25 -04:00
  • f0fe4fe86d [Model] Make llama3.2 support multiple and interleaved images (#9095) Xiang Xu 2024-10-14 15:24:26 -07:00
  • 4d31cd424b [Frontend] merge beam search implementations (#9296) Brendan Wong 2024-10-14 15:05:52 -07:00
  • 473e7b3606 [TPU] Fix TPU SMEM OOM by Pallas paged attention kernel (#9350) Woosuk Kwon 2024-10-14 15:02:06 -07:00
  • fd47e57f4b [Docs] Remove PDF build from Readthedocs (#9347) (tag: v0.6.3) Simon Mo 2024-10-14 11:57:47 -07:00
  • 203ab8f80f [CI/Build] setuptools-scm fixes (#8900) Daniele 2024-10-14 20:34:47 +02:00
  • 4141608c6a [Hardware][intel GPU] add async output process for xpu (#8897) Kunshang Ji 2024-10-15 02:23:33 +08:00
  • dfe43a2071 [Model] Molmo vLLM Integration (#9016) Reza Salehi 2024-10-14 07:56:24 -07:00
  • 16b24e7dcd [Bugfix] Bandaid fix for speculative decoding tests (#9327) Tyler Michael Smith 2024-10-13 19:02:11 -04:00
  • f519902c52 [CI] Fix merge conflict (#9317) Lily Liu 2024-10-12 23:41:23 -07:00
  • 250e26a63e [Bugfix]Fix MiniCPM's LoRA bug (#9286) Jee Jee Li 2024-10-13 00:36:47 +08:00
  • 2b184ddd4f [Misc][Installation] Improve source installation script and doc (#9309) Yunmeng 2024-10-13 00:36:40 +08:00
  • 00298e092c [Bugfix] Fix bug of xformer prefill for encoder-decoder (#9026) Xiang Xu 2024-10-12 00:00:43 -07:00
  • 89feb4c84d [SpecDec] Remove Batch Expansion (2/3) (#9298) Lily Liu 2024-10-11 22:13:37 -07:00
  • ec10cb8511 [BugFix] Fix tool call finish reason in streaming case (#9209) Maximilien de Bayser 2024-10-11 22:24:26 -03:00
  • d11b46f3a5 [bugfix] fix f-string for error (#9295) Prashant Gupta 2024-10-11 17:03:48 -07:00
  • c6cf9295e1 [Bugfix] Sets is_first_step_output for TPUModelRunner (#9202) Allen Wang 2024-10-11 15:28:10 -05:00
  • de9fb4bef8 [Bugfix][CI/Build] Fix docker build where CUDA archs < 7.0 are being detected (#9254) Lucas Wilkinson 2024-10-11 15:57:39 -04:00
  • 8baf85e4e9 [Doc] Compatibility matrix for mutual exclusive features (#8512) Wallas Henrique 2024-10-11 15:18:50 -03:00
  • 1a1823871d [Doc] Remove outdated comment to avoid misunderstanding (#9287) homeffjy 2024-10-12 02:02:03 +08:00
  • 6cf1167c1a [Model] Add GLM-4v support and meet vllm==0.6.2 (#9242) sixgod 2024-10-12 01:36:13 +08:00
  • f710090d8e [Kernel] adding fused moe kernel config for L40S TP4 (#9245) Burkhard Ringlein 2024-10-11 08:54:22 -07:00
  • 7342a7d7f8 [Model] Support Mamba (#6484) Tyler Michael Smith 2024-10-11 11:40:06 -04:00
  • df3dcdf49d [Bugfix] Fix priority in multiprocessing engine (#9277) Sebastian Schoennenbeck 2024-10-11 17:35:35 +02:00
  • 36ea79079b [Misc][LoRA] Support loading LoRA weights for target_modules in reg format (#9275) Jee Jee Li 2024-10-11 20:31:21 +08:00
  • e808156f30 [Misc] Collect model support info in a single process per model (#9233) Cyrus Leung 2024-10-11 19:08:11 +08:00
  • cbc2ef5529 [misc] hide best_of from engine (#9261) youkaichao 2024-10-10 21:30:44 -07:00
  • 94bf9ae4e9 [Misc] Fix sampling from sonnet for long context case (#9235) Andy Dai 2024-10-10 17:33:16 -07:00
  • f990bab2a4 [Doc][Neuron] add note to neuron documentation about resolving triton issue (#9257) omrishiv 2024-10-10 16:36:32 -07:00
  • e00c094f15 [torch.compile] generic decorators (#9258) youkaichao 2024-10-10 15:54:23 -07:00
  • a78c6ba7c8 [ci/build] Add placeholder command for custom models test (#9262) Kevin H. Luu 2024-10-10 15:45:09 -07:00
  • fb870fd491 Bump actions/setup-python from 3 to 5 (#9195) dependabot[bot] 2024-10-10 13:30:46 -07:00
  • 270953bafb Bump actions/checkout from 3 to 4 (#9196) dependabot[bot] 2024-10-10 13:30:35 -07:00
  • 9cc811c4ff Bump actions/github-script from 6 to 7 (#9197) dependabot[bot] 2024-10-10 13:30:24 -07:00
  • e4d652ea3e [torch.compile] integration with compilation control (#9058) youkaichao 2024-10-10 12:39:36 -07:00
  • 78c0b4166c Suggest codeowners for the core components (#9210) Simon Mo 2024-10-10 12:29:24 -07:00
  • 21efb603f5 [CI/Build] Make the Dockerfile.cpu file's PIP_EXTRA_INDEX_URL Configurable as a Build Argument (#9252) jordanyono 2024-10-10 14:18:18 -04:00
  • 055f3270d4 [Doc] Improve debugging documentation (#9204) Rafael Vasquez 2024-10-10 13:48:51 -04:00
  • 18511aeda6 [Bugfix] Fix Machete unittests failing with NotImplementedError (#9218) Lucas Wilkinson 2024-10-10 13:39:56 -04:00
  • 83ea5c72b9 [OpenVINO] Use torch 2.4.0 and newer optimum version (#9121) Ilya Lavrenov 2024-10-10 21:18:58 +04:00
  • 04de9057ab [Model] support input image embedding for minicpmv (#9237) whyiug 2024-10-10 23:00:47 +08:00
  • 07c11cf4d4 [Bugfix] Fix lm_head weights tying with lora for llama (#9227) Isotr0py 2024-10-10 21:11:56 +08:00
  • f3a507f1d3 [Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 (#9149) sroy745 2024-10-09 23:17:17 -07:00
  • a64e7b9407 [Bugfix] Machete garbage results for some models (large K dim) (#9212) Lucas Wilkinson 2024-10-10 02:16:17 -04:00
  • ce00231a8b [Bugfix] Fix Weight Loading Multiple GPU Test - Large Models (#9213) Michael Goin 2024-10-10 02:15:40 -04:00
  • de895f1697 [misc] improve model support check in another process (#9208) youkaichao 2024-10-09 21:58:27 -07:00
  • cf25b93bdd [Core] Fix invalid args to _process_request (#9201) Russell Bryant 2024-10-10 00:10:09 -04:00
  • d5fbb8706d [CI/Build] Update Dockerfile install+deploy image to ubuntu 22.04 (#9130) Michael Goin 2024-10-09 14:51:47 -04:00
  • cdca8994bd [CI/Build] mypy: check vllm/entrypoints (#9194) Russell Bryant 2024-10-09 13:15:28 -04:00
  • ca77dd7a44 [Hardware][CPU] Support AWQ for CPU backend (#7515) Li, Jiang 2024-10-10 00:28:08 +08:00
  • 7dea289066 Add Dependabot configuration for GitHub Actions updates (#1217) Ewout ter Hoeven 2024-10-09 17:16:26 +02:00
  • cfaa6008e6 [Bugfix] Access get_vocab instead of vocab in tool parsers (#9188) Cyrus Leung 2024-10-09 22:59:57 +08:00
  • 21906a6f50 [Bugfix] Fix lora loading for Compressed Tensors in #9120 (#9179) Ahmad Fahadh Ilyas 2024-10-09 05:10:44 -07:00
  • dc4aea677a [Doc] Fix VLM prompt placeholder sample bug (#9170) Jiangtao Hu 2024-10-09 16:59:42 +08:00
  • c8627cd41b [ci][test] use load dummy for testing (#9165) youkaichao 2024-10-09 00:38:40 -07:00
  • 8bfaa4e31e [Bugfix] fix composite weight loading and EAGLE weight loading (#9160) Cyrus Leung 2024-10-09 15:36:55 +08:00
  • 0b5b5d767e [Frontend] Log the maximum supported concurrency (#8831) AlpinDale 2024-10-09 07:03:14 +00:00
  • cdc72e3c80 [Model] Remap FP8 kv_scale in CommandR and DBRX (#9174) Hui Liu 2024-10-08 23:43:06 -07:00
  • 7627172bf4 [Bugfix][Doc] Report neuron error in output (#9159) Joe Rowell 2024-10-09 06:43:34 +01:00
  • 480b7f40cf [Misc] Improve validation errors around best_of and n (#9167) Travis Johnson 2024-10-08 22:54:48 -06:00
  • acce7630c1 Update link to KServe deployment guide (#9173) Yuan Tang 2024-10-08 23:58:49 -04:00
  • ffc4b27ea8 Add classifiers in setup.py (#9171) Yuan Tang 2024-10-08 22:30:48 -04:00
  • 2f4117c38e support bitsandbytes quantization with more models (#9148) chenqianfzh 2024-10-08 18:52:19 -07:00
  • 9ba0bd6aa6 Add lm-eval directly to requirements-test.txt (#9161) Michael Goin 2024-10-08 21:22:31 -04:00
  • 2a131965a8 mypy: check additional directories (#9162) Russell Bryant 2024-10-08 18:08:22 -04:00
  • bd37b9fbe2 [Bugfix] Try to handle older versions of pytorch (#9086) bnellnm 2024-10-08 17:28:12 -04:00
  • de24046fcd [Doc] Improve contributing and installation documentation (#9132) Rafael Vasquez 2024-10-08 16:22:08 -04:00
  • 1874c6a1b0 [Doc] Update vlm.rst to include an example on videos (#9155) Sayak Paul 2024-10-08 23:42:29 +05:30
  • 9a94ca4a5d [Bugfix] fix OpenAI API server startup with --disable-frontend-multiprocessing (#8537) Daniele 2024-10-08 18:38:40 +02:00
  • cfba685bd4 [CI/Build] Add examples folder into Docker image so that we can leverage the templates*.jinja when serving models (#8758) Peter Pan 2024-10-09 00:37:34 +08:00
  • 069d3bd8d0 [Frontend] Add Early Validation For Chat Template / Tool Call Parser (#9151) Alex Brooks 2024-10-08 08:31:26 -06:00
  • a3691b6b5e [Core][Frontend] Add Support for Inference Time mm_processor_kwargs (#9131) Alex Brooks 2024-10-08 08:12:56 -06:00
  • 8c746226c9 [Frontend] API support for beam search for MQLLMEngine (#9117) Brendan Wong 2024-10-07 22:51:43 -07:00
  • e1faa2a598 [misc] improve ux on readme (#9147) youkaichao 2024-10-07 22:26:25 -07:00
  • 80b57f00d5 [Intel GPU] Fix xpu decode input (#9145) Kunshang Ji 2024-10-08 11:51:14 +08:00
  • 04c12f8157 [misc] update utils to support comparing multiple settings (#9140) youkaichao 2024-10-07 19:51:49 -07:00
  • 8eeb857084 Add Slack to README (#9137) Simon Mo 2024-10-07 17:06:21 -07:00
  • fa45513a51 [misc] fix comment and variable name (#9139) youkaichao 2024-10-07 16:07:05 -07:00
  • c0d9a98d0c [Doc] Include performance benchmark in README (#9135) Kuntai Du 2024-10-07 15:04:06 -07:00
  • e0dbdb013d [CI/Build] Add linting for github actions workflows (#7876) Russell Bryant 2024-10-07 17:18:10 -04:00
  • 93cf74a8a7 [Doc]: Add deploying_with_k8s guide (#8451) TimWang 2024-10-08 04:31:45 +08:00
  • 151ef4efd2 [Model] Support NVLM-D and fix QK Norm in InternViT (#9045) Cyrus Leung 2024-10-07 19:55:12 +08:00
  • f19da64871 [Core] Refactor GGUF parameters packing and forwarding (#8859) Isotr0py 2024-10-07 18:01:46 +08:00
  • 4f95ffee6f [Hardware][CPU] Cross-attention and Encoder-Decoder models support on CPU backend (#9089) Isotr0py 2024-10-07 14:50:35 +08:00