Commit Graph

  • 772a66732d [platforms] restore xpu check for parallel config (#10479) youkaichao 2024-11-20 09:13:28 -08:00
  • 63f1fde277 [Hardware][CPU] Support chunked-prefill and prefix-caching on CPU (#10355) Li, Jiang 2024-11-20 18:57:39 +08:00
  • d5b28447e0 [Platforms] Refactor xpu code (#10468) Mengqing Cao 2024-11-20 14:52:13 +08:00
  • 09dbf9ff16 [Bugfix] Handle conflicts between modern and legacy fields (#10471) Cyrus Leung 2024-11-20 14:45:08 +08:00
  • 343041c4c4 [model] Reduce medusa weight (#10454) Sky Lee 2024-11-20 14:05:55 +08:00
  • ed701ca963 [ci/build] Combine nightly and optional (#10465) Kevin H. Luu 2024-11-19 19:36:03 -10:00
  • 7629a9c6e5 [CI/Build] Support compilation with local cutlass path (#10423) (#10424) wchen61 2024-11-20 13:35:50 +08:00
  • 709c9f1f25 [CI/Build] Add sphinx/rst linter for docs (#10366) Rafael Vasquez 2024-11-20 00:35:31 -05:00
  • b4be5a8adb [Bugfix] Enforce no chunked prefill for embedding models (#10470) Cyrus Leung 2024-11-20 13:12:51 +08:00
  • ad44437ba3 [Bugfix] Fix Mamba model initialization and MLP Speculator weights loading (#10456) Isotr0py 2024-11-20 13:04:05 +08:00
  • 9e05252b46 [Misc] Add __setitem__ for LazyDict (#10469) Yanyi Liu 2024-11-20 12:44:57 +08:00
  • d200972e7f [Bugfix] Marlin 2:4 temp fix for large M dim (>256) (#10464) Lucas Wilkinson 2024-11-19 22:40:33 -05:00
  • d5b68aba2f [CI/Build] Update Dockerfile.rocm (#10434) Alexei-V-Ivanov-AMD 2024-11-19 19:19:59 -06:00
  • a324d3a1a7 Change granite chat template to keep json list formatting for tool calls (#10452) Maximilien de Bayser 2024-11-19 22:16:54 -03:00
  • b00b33d77e [Model][Quantization] HQQ support through Marlin kernel expansion (#9766) ElizaWszola 2024-11-19 22:31:12 +01:00
  • efa9084628 [Core] Avoid metrics log noise when idle (#8868) Russell Bryant 2024-11-19 16:05:25 -05:00
  • 803f37eaaa [6/N] torch.compile rollout to users (#10437) youkaichao 2024-11-19 10:09:03 -08:00
  • fd9f124971 [Doc] fix link for page that was renamed (#10455) Russell Bryant 2024-11-19 12:48:30 -05:00
  • 1ea291a417 Fix: Build error seen on Power Architecture (#10421) Manjul Mohan 2024-11-19 23:04:57 +05:30
  • 11fd7ea639 [Pixtral-Large] Pixtral actually has no bias in vision-lang adapter (#10449) Patrick von Platen 2024-11-19 18:33:06 +01:00
  • f028dff33d [BugFix] Fix hermes tool parser output error stream arguments in some cases (#10395) (#10398) COSMOPlat 2024-11-19 21:42:50 +08:00
  • b4614656b8 [CI][CPU] adding numa node number as container name suffix (#10441) Yuan 2024-11-19 21:16:43 +08:00
  • 25f9c78961 [misc][plugin] improve plugin loading (#10443) youkaichao 2024-11-19 02:43:21 -08:00
  • 5390d6664f [Doc] Add the start of an arch overview page (#10368) Russell Bryant 2024-11-19 04:52:11 -05:00
  • 382b6a4852 [Misc] Avoid misleading warning messages (#10438) Jee Jee Li 2024-11-19 16:54:58 +08:00
  • 272e31c0bd [Bugfix] Guard for negative counter metrics to prevent crash (#10430) Travis Johnson 2024-11-18 21:57:10 -07:00
  • 74f8c2cf5f Add openai.beta.chat.completions.parse example to structured_outputs.rst (#10433) Michael Goin 2024-11-18 23:37:46 -05:00
  • 8c1fb50705 [Platform][Refactor] Extract func get_default_attn_backend to Platform (#10358) Mengqing Cao 2024-11-19 11:22:26 +08:00
  • 7eb719df13 [Bugfix]Fix Phi-3 BNB online quantization (#10417) Jee Jee Li 2024-11-19 11:21:42 +08:00
  • 284203f171 [ci/build] Have dependabot ignore all patch update (#10436) Kevin H. Luu 2024-11-18 15:04:25 -10:00
  • 90a6c759ca [misc] partial prefix & random input generation benchmark (#9929) Ricky Xu 2024-11-18 15:39:14 -08:00
  • 2298e69b5f [ci][bugfix] fix kernel tests (#10431) youkaichao 2024-11-18 15:29:37 -08:00
  • a03ea40792 [3/N][torch.compile] consolidate custom op logging (#10399) youkaichao 2024-11-18 15:14:59 -08:00
  • 96d999fbe8 [Kernel] Initial Machete W4A8 support + Refactors (#9855) Lucas Wilkinson 2024-11-18 14:59:29 -05:00
  • c2170a5b39 [Kernel] Explicitly specify other value in tl.load calls (#9014) Angus Wang 2024-11-18 11:39:40 -08:00
  • 6b2d25efc7 [Hardware][XPU] AWQ/GPTQ support for xpu backend (#10107) Yan Ma 2024-11-19 02:18:05 +08:00
  • 281cc4b3cd [Model][Bugfix] Support TP for PixtralHF ViT (#10405) Michael Goin 2024-11-18 13:04:14 -05:00
  • 4f686d139f Fix open_collective value in FUNDING.yml (#10426) Andrew Nesbitt 2024-11-18 17:52:42 +00:00
  • 31894a2155 [Doc] Add documentation for Structured Outputs (#9943) ismael-dm 2024-11-18 18:52:12 +01:00
  • 7851b45196 [5/N][torch.compile] torch.jit.script --> torch.compile (#10406) youkaichao 2024-11-18 07:20:06 -08:00
  • 4186be8111 [Doc] Update doc for LoRA support in GLM-4V (#10425) B-201 2024-11-18 23:08:30 +08:00
  • e7ebb662d7 [Model] Remove transformers attention porting in VITs (#10414) Isotr0py 2024-11-18 21:45:21 +08:00
  • 5be4e52b65 [Model][LoRA]LoRA support added for glm-4v (#10418) B-201 2024-11-18 20:57:10 +08:00
  • 01aae1cc68 [Model] Remove redundant softmax when using PoolingType.STEP (#10415) Maybewuss 2024-11-18 18:05:36 +08:00
  • c7dec926f6 [VLM] Report multi_modal_placeholders in output (#10407) lkchen 2024-11-18 00:06:16 -08:00
  • 51bb12d17b [4/N][torch.compile] clean up set_torch_compile_backend (#10401) youkaichao 2024-11-17 23:57:20 -08:00
  • 47826cacf0 [Bugfix] Ignore ray reinit error when current platform is ROCm or XPU (#10375) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2024-11-18 05:29:26 +02:00
  • c4e464333e [Misc] Add uninitialized params tracking for AutoWeightsLoader (#10327) Isotr0py 2024-11-18 09:07:46 +08:00
  • d1557e66d3 [Misc] Enhance offline_inference to support user-configurable paramet… (#10392) wchen61 2024-11-17 19:32:40 +08:00
  • 80d85c5d7b [Bugfix] Fix mrope_position_delta in non-last prefill chunk (#10403) 电脑星人 2024-11-17 16:50:24 +08:00
  • 76aab90ab6 [Hardware] [HPU]add mark_step for hpu (#10239) Kunshang Ji 2024-11-17 16:44:44 +08:00
  • 8d74b5aee9 [platforms] refactor cpu code (#10402) youkaichao 2024-11-16 23:14:23 -08:00
  • cf349c4a97 [Bugfix][CPU] Fix CPU embedding runner with tensor parallel (#10394) Isotr0py 2024-11-17 15:12:04 +08:00
  • 905d0f0af4 [CI/Build] Fix IDC hpu [Device not found] issue (#10384) Chendi.Xue 2024-11-17 00:58:22 -06:00
  • 643ecf7b11 [V1] Refactor model executable interface for all text-only language models (#10374) Roger Wang 2024-11-16 21:18:46 -08:00
  • 4fd9375028 [2/N][torch.compile] make compilation cfg part of vllm cfg (#10383) youkaichao 2024-11-16 18:02:14 -08:00
  • 661a34fd4f [V1] Add code owners for V1 (#10397) Woosuk Kwon 2024-11-16 10:45:26 -08:00
  • 361c29e174 [Bugfix] Fix M-RoPE position calculation when chunked prefill is enabled (#10388) 电脑星人 2024-11-17 02:10:00 +08:00
  • b98d89efd4 [Misc] Medusa supports custom bias (#10361) Sky Lee 2024-11-17 00:33:01 +08:00
  • 8b6725b0cf [Misc] Update benchmark to support image_url file or http (#10287) Jaehyun An 2024-11-16 19:15:40 +09:00
  • 1d75472626 [BugFix] [Kernel] Fix GPU SEGV occuring in fused_moe kernel (#10385) rasmith 2024-11-16 03:55:05 -06:00
  • 2f427c2d16 [misc][plugin] improve log messages (#10386) youkaichao 2024-11-16 01:23:20 -08:00
  • 755b85359b [doc] add doc for the plugin system (#10372) youkaichao 2024-11-15 21:46:27 -08:00
  • 32e46e000f [Frontend] Automatic detection of chat content format from AST (#9919) Cyrus Leung 2024-11-16 13:35:40 +08:00
  • 4f168f69a3 [Docs] Misc updates to TPU installation instructions (#10165) Michael Green 2024-11-15 21:26:17 +00:00
  • 3e8d14d8a1 [Doc] Move PR template content to docs (#10159) Russell Bryant 2024-11-15 16:20:20 -05:00
  • a067f85e08 [Frontend] Add --version flag to CLI (#10369) Russell Bryant 2024-11-15 16:13:53 -05:00
  • c76ac49d26 [Docs] Add Nebius as sponsors (#10371) Simon Mo 2024-11-15 12:47:40 -08:00
  • a6221a144a [Misc] bump mistral common version (#10367) (tag: v0.6.4.post1) Simon Mo 2024-11-15 09:48:07 -08:00
  • 79ee45b428 [Misc] Bump up test_fused_moe tolerance (#10364) ElizaWszola 2024-11-15 17:31:18 +01:00
  • 691a3ec047 [Bugfix] Ensure special tokens are properly filtered out for guided structured output with MistralTokenizer (#10363) Guillaume Calmettes 2024-11-15 15:50:40 +01:00
  • 3a763ba0c3 [core][misc] keep compatibility for old-style classes (#10356) youkaichao 2024-11-15 05:55:51 -08:00
  • f2056f726d [Misc] Fix some help info of arg_utils to improve readability (#10362) shangmingc 2024-11-15 20:40:30 +08:00
  • 1d65ec7eeb [Bugfix] Fix fully sharded LoRA bug (#10352) Jee Jee Li 2024-11-15 18:34:58 +08:00
  • 26908554b2 [Doc] Remove float32 choice from --lora-dtype (#10348) Xin Yang 2024-11-15 02:22:57 -08:00
  • b311efd0bd [Misc] Fix import error in tensorizer tests and cleanup some code (#10349) Cyrus Leung 2024-11-15 17:34:17 +08:00
  • 3d158cdc8d Add default value to avoid Falcon crash (#5363) (#10347) wchen61 2024-11-15 16:52:20 +08:00
  • 02dbf30e9a [Build] skip renaming files for release wheels pipeline (#9671) (tag: v0.6.4) Simon Mo 2024-11-14 23:31:52 -08:00
  • 2ac6d0e75b [Misc] Consolidate pooler config overrides (#10351) Cyrus Leung 2024-11-15 14:59:00 +08:00
  • 2ec8827288 [Bugfix] Qwen-vl output is inconsistent in speculative decoding (#10350) Sky Lee 2024-11-15 13:40:10 +08:00
  • b40cf6402e [Model] Support Qwen2 embeddings and use tags to select model tests (#10184) Cyrus Leung 2024-11-15 12:23:09 +08:00
  • 2885ba0e24 [Misc] Change RedundantReshapesPass and FusionPass logging from info to debug (#10308) Tyler Michael Smith 2024-11-14 21:44:26 -05:00
  • bf2ddc6610 [bugfix] Fix static asymmetric quantization case (#10334) Luka Govedič 2024-11-14 20:35:11 -05:00
  • 972112d82f [Bugfix] Fix unable to load some models (#10312) Cyrus Leung 2024-11-15 08:55:54 +08:00
  • 11cd1ae6ad [Tool parsing] Improve / correct mistral tool parsing (#10333) Patrick von Platen 2024-11-15 01:42:49 +01:00
  • 554af9228d [Bugfix] use AF_INET6 for OpenAI Compatible Server with ipv6 (#9583) Zijin Xiao 2024-11-15 08:38:53 +08:00
  • b2e0ad3b59 [Perf] Reduce peak memory usage of llama (#10339) Murali Andoorveedu 2024-11-14 16:38:20 -08:00
  • 4a18fd14ba Support Roberta embedding models (#9387) Maximilien de Bayser 2024-11-14 18:23:29 -03:00
  • 1dbae0329c [Docs] Publish meetup slides (#10331) Woosuk Kwon 2024-11-14 08:19:38 -08:00
  • 675d603400 [CI/Build] Make shellcheck happy (#10285) Cyrus Leung 2024-11-14 17:47:53 +08:00
  • 03025c023f [CI/Build] Fix CPU CI online inference timeout (#10314) Isotr0py 2024-11-14 16:45:32 +08:00
  • 29f3ef26a3 [ci][distributed] disable hanging tests (#10317) youkaichao 2024-11-14 00:23:39 -08:00
  • 294bf467ba [Model] Add BNB quantization support for Idefics3 (#10310) B-201 2024-11-14 14:31:44 +08:00
  • 52b48c1ead [BugFix]: properly deserialize tool_calls iterator before processing by mistral-common when MistralTokenizer is used (#9951) Guillaume Calmettes 2024-11-14 05:48:16 +01:00
  • f67ce05d0b [Frontend] Pythonic tool parser (#9859) Mike Depinet 2024-11-13 20:14:34 -08:00
  • e0853b6508 [Misc] format.sh: Simplify tool_version_check (#10305) Russell Bryant 2024-11-13 22:12:35 -05:00
  • 504ac53d18 [misc] error early for old-style class (#10304) youkaichao 2024-11-13 18:55:39 -08:00
  • 15bb8330aa [Bugfix] Fix tensor parallel for qwen2 classification model (#10297) Isotr0py 2024-11-14 10:54:59 +08:00
  • ac49b59d8b [Bugfix] bitsandbytes models fail to run pipeline parallel (#10200) HoangCongDuc 2024-11-14 00:56:39 +08:00
  • 0b8bb86bf1 [1/N] Initial prototype for multi-modal processor (#10044) Cyrus Leung 2024-11-13 20:39:03 +08:00