Commit Graph

  • 2b5bf20988 [torch.compile] Adding torch compile annotations to some models (#9876) Yongzao 2024-11-01 15:25:47 +08:00
  • 93a76dd21d [Model] Support bitsandbytes for MiniCPMV (#9891) Michael Goin 2024-11-01 01:31:56 -04:00
  • 566cd27797 [torch.compile] rework test plans (#9866) youkaichao 2024-10-31 22:20:17 -07:00
  • 37a4947dcd [Bugfix] Fix layer skip logic with bitsandbytes (#9887) Michael Goin 2024-11-01 01:12:44 -04:00
  • 96e0c9cbbd [torch.compile] directly register custom op (#9896) youkaichao 2024-10-31 21:56:09 -07:00
  • 031a7995f3 [Bugfix][Frontend] Reject guided decoding in multistep mode (#9892) Joe Runde 2024-10-31 19:09:46 -06:00
  • b63c64d95b [ci/build] Configure dependabot to update pip dependencies (#9811) Kevin H. Luu 2024-10-31 12:55:38 -10:00
  • 9fb12f7848 [BugFix][Kernel] Fix Illegal memory access in causal_conv1d in H100 (#9838) Mor Zusman 2024-10-31 22:06:25 +02:00
  • 55650c83a0 [Bugfix] Fix illegal memory access error with chunked prefill, prefix caching, block manager v2 and xformers enabled together (#9532) sasha0552 2024-10-31 18:46:36 +00:00
  • 77f7ef2908 [CI/Build] Adding a forced docker system prune to clean up space (#9849) Alexei-V-Ivanov-AMD 2024-10-31 12:02:58 -05:00
  • 16b8f7a86f [CI/Build] Add Model Tests for Qwen2-VL (#9846) Alex Brooks 2024-10-31 10:10:52 -06:00
  • 5608e611c2 [Doc] Update Qwen documentation (#9869) Jee Jee Li 2024-10-31 16:54:18 +08:00
  • 3ea2dc2ec4 [Misc] Remove deprecated arg for cuda graph capture (#9864) Roger Wang 2024-10-31 00:22:07 -07:00
  • d087bf863e [Model] Support quantization of Qwen2VisionTransformer (#9817) Michael Goin 2024-10-31 01:41:20 -04:00
  • 890ca36072 Revert "[Bugfix] Use host argument to bind to interface (#9798)" (#9852) Kevin H. Luu 2024-10-30 15:44:51 -10:00
  • abbfb6134d [Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint (#9837) Guillaume Calmettes 2024-10-31 02:15:56 +01:00
  • 64384bbcdf [torch.compile] upgrade tests (#9858) youkaichao 2024-10-30 16:34:22 -07:00
  • 00d91c8a2c [CI/Build] Simplify exception trace in api server tests (#9787) Yongzao 2024-10-31 05:52:05 +08:00
  • c2cd1a2142 [doc] update pp support (#9853) youkaichao 2024-10-30 13:36:51 -07:00
  • c787f2d81d [Neuron] Update Dockerfile.neuron to fix build failure (#9822) Harsha vardhan manoj Bikki 2024-10-30 12:22:02 -07:00
  • 33d257735f [Doc] link bug for multistep guided decoding (#9843) Joe Runde 2024-10-30 11:28:29 -06:00
  • 3b3f1e7436 [Bugfix][core] replace heartbeat with pid check (#9818) Joe Runde 2024-10-30 10:34:07 -06:00
  • 9ff4511e43 [Misc] Add chunked-prefill support on FlashInfer. (#9781) Elfie Guo 2024-10-30 09:33:53 -07:00
  • 81f09cfd80 [Model] Support math-shepherd-mistral-7b-prm model (#9697) Went-Liang 2024-10-31 00:33:42 +08:00
  • cc98f1e079 [CI/Build] VLM Test Consolidation (#9372) Alex Brooks 2024-10-30 10:32:17 -06:00
  • 211fe91aa8 [TPU] Correctly profile peak memory usage & Upgrade PyTorch XLA (#9438) Woosuk Kwon 2024-10-30 02:41:38 -07:00
  • 6aa6020f9b [Misc] Specify minimum pynvml version (#9827) Jee Jee Li 2024-10-30 14:05:43 +08:00
  • ff5ed6e1bc [torch.compile] rework compile control with piecewise cudagraph (#9715) youkaichao 2024-10-29 23:03:49 -07:00
  • 7b0365efef [Doc] Add the DCO to CONTRIBUTING.md (#9803) Russell Bryant 2024-10-30 01:22:23 -04:00
  • 04a3ae0aca [Bugfix] Fix multi nodes TP+PP for XPU (#8884) Yan Ma 2024-10-30 12:34:45 +08:00
  • 62fac4b9aa [ci/build] Pin CI dependencies version with pip-compile (#9810) Kevin H. Luu 2024-10-29 17:34:55 -10:00
  • 226688bd61 [Bugfix][VLM] Make apply_fp8_linear work with >2D input (#9812) Michael Goin 2024-10-29 22:49:44 -04:00
  • 64cb1cdc3f Update README.md (#9819) Lily Liu 2024-10-29 17:28:43 -07:00
  • 1ab6f6b4ad [core][distributed] fix custom allreduce in pytorch 2.5 (#9815) youkaichao 2024-10-29 17:06:24 -07:00
  • bc73e9821c [Bugfix] Fix prefix strings for quantized VLMs (#9772) Michael Goin 2024-10-29 19:02:59 -04:00
  • 8d7724104a [Docs] Add notes about Snowflake Meetup (#9814) Simon Mo 2024-10-29 15:19:02 -07:00
  • 882a1ad0de [Model] tool calling support for ibm-granite/granite-20b-functioncalling (#8339) Will Eaton 2024-10-29 18:07:37 -04:00
  • 67bdf8e523 [Bugfix][Frontend] Guard against bad token ids (#9634) Joe Runde 2024-10-29 16:13:20 -05:00
  • 0ad216f575 [MISC] Set label value to timestamp over 0, to keep track of recent history (#9777) Kunjan 2024-10-29 12:52:19 -07:00
  • 7585ec996f [CI/Build] mergify: fix rules for ci/build label (#9804) Russell Bryant 2024-10-29 15:24:42 -04:00
  • ab6f981671 [CI][Bugfix] Skip chameleon for transformers 4.46.1 (#9808) Michael Goin 2024-10-29 14:12:43 -04:00
  • ac3d748dba [Model] Add LlamaEmbeddingModel as an embedding Implementation of LlamaModel (#9806) Junichi Sato 2024-10-30 02:40:35 +09:00
  • 0ce7798f44 [Misc]: Typo fix: Renaming classes (casualLM -> causalLM) (#9801) yannicks1 2024-10-29 18:39:20 +01:00
  • 0f43387157 [Bugfix] Use host argument to bind to interface (#9798) Sven Seeberg 2024-10-29 18:37:59 +01:00
  • 08600ddc68 Fix the log to correct guide user to install modelscope (#9793) tastelikefeet 2024-10-30 01:36:59 +08:00
  • 74fc2d77ae [Misc] Add metrics for request queue time, forward time, and execute time (#9659) 科英 2024-10-30 01:32:56 +08:00
  • 622b7ab955 [Hardware] using current_platform.seed_everything (#9785) wangshuai09 2024-10-29 22:47:44 +08:00
  • 09500f7dde [Model] Add BNB quantization support for Mllama (#9720) Isotr0py 2024-10-29 20:20:02 +08:00
  • ef7865b4f9 [Frontend] re-enable multi-modality input in the new beam search implementation (#9427) Zhong Qishuai 2024-10-29 19:49:47 +08:00
  • eae3d48181 [Bugfix] Use temporary directory in registry (#9721) Cyrus Leung 2024-10-29 13:08:20 +08:00
  • e74f2d448c [Doc] Specify async engine args in docs (#9726) Cyrus Leung 2024-10-29 13:07:57 +08:00
  • 7a4df5f200 [Model][LoRA]LoRA support added for Qwen (#9622) Jee Jee Li 2024-10-29 12:14:07 +08:00
  • c5d7fb9ddc [Doc] fix third-party model example (#9771) Russell Bryant 2024-10-28 22:39:21 -04:00
  • 76ed5340f0 [torch.compile] add deepseek v2 compile (#9775) youkaichao 2024-10-28 14:35:17 -07:00
  • 97b61bfae6 [misc] avoid circular import (#9765) youkaichao 2024-10-28 13:51:23 -07:00
  • aa0addb397 Adding "torch compile" annotations to moe models (#9758) Yongzao 2024-10-29 04:49:56 +08:00
  • 5f8d8075f9 [Model][VLM] Add multi-video support for LLaVA-Onevision (#8905) litianjian 2024-10-29 02:04:10 +08:00
  • 8b0e4f2ad7 [CI/Build] Adopt Mergify for auto-labeling PRs (#9259) Russell Bryant 2024-10-28 12:38:09 -04:00
  • 2adb4409e0 [Bugfix] Fix ray instance detect issue (#9439) Yan Ma 2024-10-28 15:13:03 +08:00
  • feb92fbe4a Fix beam search eos (#9627) Robert Shaw 2024-10-28 02:59:37 -04:00
  • 32176fee73 [torch.compile] support moe models (#9632) youkaichao 2024-10-27 21:58:04 -07:00
  • 4e2d95e372 [Hardware][ROCM] using current_platform.is_rocm (#9642) wangshuai09 2024-10-28 12:07:00 +08:00
  • 34a9941620 [Bugfix] Fix load config when using bools (#9533) madt2709 2024-10-27 10:46:41 -07:00
  • e130c40e4e Fix cache management in "Close inactive issues and PRs" actions workflow (#9734) Harry Mellor 2024-10-27 17:30:03 +00:00
  • 3cb07a36a2 [Misc] Upgrade to pytorch 2.5 (#9588) bnellnm 2024-10-27 05:44:24 -04:00
  • 8549c82660 [core] cudagraph output with tensor weak reference (#9724) youkaichao 2024-10-27 00:19:28 -07:00
  • 67a6882da4 [Misc] SpecDecodeWorker supports profiling (#9719) 科英 2024-10-27 12:18:03 +08:00
  • 6650e6a930 [Model] Add classification Task with Qwen2ForSequenceClassification (#9704) kakao-kevin-us 2024-10-27 02:53:35 +09:00
  • 07e981fdf4 [Frontend] Bad words sampling parameter (#9717) Vasiliy Alekseev 2024-10-26 19:29:38 +03:00
  • 55137e8ee3 Fix: MI100 Support By Bypassing Custom Paged Attention (#9560) ErkinSagiroglu 2024-10-26 13:12:57 +01:00
  • 5cbdccd151 [Hardware][openvino] is_openvino --> current_platform.is_openvino (#9716) Mengqing Cao 2024-10-26 18:59:06 +08:00
  • 067e77f9a8 [Bugfix] Steaming continuous_usage_stats default to False (#9709) Sam Stoelinga 2024-10-25 22:05:47 -07:00
  • 6567e13724 [Bugfix] Fix crash with llama 3.2 vision models and guided decoding (#9631) Travis Johnson 2024-10-25 16:42:56 -06:00
  • 228cfbd03f [Doc] Improve quickstart documentation (#9256) Rafael Vasquez 2024-10-25 17:32:10 -04:00
  • ca0d92227e [Bugfix] Fix compressed_tensors_moe bad config.strategy (#9677) Michael Goin 2024-10-25 15:40:33 -04:00
  • 9645b9f646 [V1] Support sliding window attention (#9679) Woosuk Kwon 2024-10-24 22:20:37 -07:00
  • a6f3721861 [Model] add a lora module for granite 3.0 MoE models (#9673) Will Johnson 2024-10-25 01:00:17 -04:00
  • 9f7b4ba865 [ci/Build] Skip Chameleon for transformers 4.46.0 on broadcast test #9675 (#9676) Kevin H. Luu 2024-10-24 17:59:00 -10:00
  • c91ed47c43 [Bugfix] Remove xformers requirement for Pixtral (#9597) Michael Goin 2024-10-24 18:38:05 -04:00
  • 59449095ab [Performance][Kernel] Fused_moe Performance Improvement (#9384) Charlie Fu 2024-10-24 17:37:52 -05:00
  • e26d37a185 [Log][Bugfix] Fix default value check for image_url.detail (#9663) Michael Goin 2024-10-24 13:44:38 -04:00
  • 722d46edb9 [Model] Compute Llava Next Max Tokens / Dummy Data From Gridpoints (#9650) Alex Brooks 2024-10-24 11:42:24 -06:00
  • c866e0079d [CI/Build] Fix VLM test failures when using transformers v4.46 (#9666) Cyrus Leung 2024-10-25 01:40:40 +08:00
  • d27cfbf791 [torch.compile] Adding torch compile annotations to some models (#9641) Yongzao 2024-10-25 00:31:42 +08:00
  • de662d32b5 Increase operation per run limit for "Close inactive issues and PRs" workflow (#9661) Harry Mellor 2024-10-24 17:17:45 +01:00
  • f58454968f [Bugfix]Disable the post_norm layer of the vision encoder for LLaVA models (#9653) litianjian 2024-10-24 22:52:07 +08:00
  • b979143d5b [Doc] Move additional tips/notes to the top (#9647) Cyrus Leung 2024-10-24 17:43:59 +08:00
  • ad6f78053e [torch.compile] expanding support and fix allgather compilation (#9637) Yongzao 2024-10-24 16:32:15 +08:00
  • 295a061fb3 [Kernel] add kernel for FATReLU (#9610) Jee Jee Li 2024-10-24 16:18:27 +08:00
  • 8a02cd045a [torch.compile] Adding torch compile annotations to some models (#9639) Yongzao 2024-10-24 15:54:57 +08:00
  • 4fdc581f9e [core] simplify seq group code (#9569) youkaichao 2024-10-24 00:16:44 -07:00
  • 3770071eb4 [V1][Bugfix] Clean up requests when aborted (#9629) Woosuk Kwon 2024-10-23 23:33:22 -07:00
  • 836e8ef6ee [Bugfix] Fix PP for ChatGLM and Molmo (#9422) Cyrus Leung 2024-10-24 14:12:05 +08:00
  • 056a68c7db [XPU] avoid triton import for xpu (#9440) Yan Ma 2024-10-24 13:14:00 +08:00
  • 33bab41060 [Bugfix]: Make chat content text allow type content (#9358) Vinay R Damodaran 2024-10-24 01:05:49 -04:00
  • b7df53cd42 [Bugfix] Use "vision_model" prefix for MllamaVisionModel (#9628) Michael Goin 2024-10-23 22:07:44 -04:00
  • bb01f2915e [Bugfix][Model] Fix Mllama SDPA illegal memory access for batched multi-image (#9626) Michael Goin 2024-10-23 22:03:44 -04:00
  • b548d7a5f4 [CI/Build] Add bot to close stale issues and PRs (#9436) Russell Bryant 2024-10-23 18:45:26 -04:00
  • fc6c274626 [Model] Add Qwen2-Audio model support (#9248) Yunfei Chu 2024-10-24 01:54:22 +08:00
  • 150b779081 [Frontend] Enable Online Multi-image Support for MLlama (#9393) Alex Brooks 2024-10-23 11:28:57 -06:00