Commit Graph

  • f03cc667a0 [Misc] Minor fixes in requirements.txt (#3769) Woosuk Kwon 2024-04-01 03:15:48 -07:00
  • 563c1d7ec5 [CI/Build] Make Marlin Tests Green (#3753) Robert Shaw 2024-03-30 21:18:34 -05:00
  • 9c82a1bec3 [Doc] Update installation doc (#3746) youkaichao 2024-03-30 16:34:38 -07:00
  • b6d103542c [Kernel] Layernorm performance optimization (#3662) mawong-amd 2024-03-30 14:26:38 -07:00
  • 51c31bc10c CMake build elf without PTX (#3739) v0.4.0 Simon Mo 2024-03-29 18:53:08 -07:00
  • 3ad438c66f Fix build when nvtools is missing (#3698) bnellnm 2024-03-29 21:52:39 -04:00
  • 203d4f82ac [Core][Bugfix] cache len of tokenizer (#3741) youkaichao 2024-03-29 18:46:39 -07:00
  • 991143cfcd [BugFix] Use consistent logger everywhere (#3738) Nick Hill 2024-03-29 16:26:44 -07:00
  • 8b2d3cbc1b usage lib get version another way (#3735) Simon Mo 2024-03-29 15:57:08 -07:00
  • 9765b5c406 [ROCm][Bugfix] Fixed several bugs related to rccl path and attention selector logic (#3699) Hongxia Yang 2024-03-29 17:52:36 -04:00
  • 430530fc18 bump version to v0.4.0 (#3712) Simon Mo 2024-03-29 12:28:33 -07:00
  • 97356f3c7e [Bugfix] Command-R Max Model Length (#3727) Roger Wang 2024-03-29 12:27:51 -07:00
  • f510395bbf [BugFix][Frontend] Fix completion logprobs=0 error (#3731) Roy 2024-03-30 00:38:21 +08:00
  • 6110c39dc8 [BugFix] Fix tokenizer out of vocab size (#3685) Roy 2024-03-29 23:18:59 +08:00
  • d8658c8cc1 Usage Stats Collection (#2852) yhu422 2024-03-28 22:16:12 -07:00
  • 7bc94a0fdd add ccache to docker build image (#3704) Simon Mo 2024-03-28 22:14:24 -07:00
  • 756b30a5f3 [Core][Test] move local_rank to the last arg with default value(#3711) youkaichao 2024-03-28 21:19:45 -07:00
  • 395aa823ea [Misc] Minor type annotation fix (#3716) Woosuk Kwon 2024-03-28 21:12:24 -07:00
  • 26422e477b [Test] Make model tests run again and remove --forked from pytest (#3631) SangBin Cho 2024-03-29 13:06:40 +09:00
  • f342153b48 Revert "bump version to v0.4.0" (#3708) youkaichao 2024-03-28 18:49:42 -07:00
  • 27a57cad52 bump version to v0.4.0 (#3705) Simon Mo 2024-03-28 18:26:51 -07:00
  • 98a42e7078 [Benchmark] Change mii to use persistent deployment and support tensor parallel (#3628) Yile (Michael) Gu 2024-03-28 17:33:52 -07:00
  • 0267fef52a [Core] fix del of communicator (#3702) youkaichao 2024-03-28 17:24:58 -07:00
  • 4716a32dd4 fix logging msg for block manager (#3701) Simon Mo 2024-03-28 16:29:55 -07:00
  • c0935c96d3 [Bugfix] Set enable_prefix_caching=True in prefix caching example (#3703) Woosuk Kwon 2024-03-28 16:26:30 -07:00
  • cb40b3ab6b [Kernel] Add MoE Triton kernel configs for A100 40GB (#3700) Woosuk Kwon 2024-03-28 15:26:24 -07:00
  • 515386ef3c [Core] Support multi-node inference(eager and cuda graph) (#3686) Roy 2024-03-29 06:01:55 +08:00
  • a4075cba4d [CI] Add test case to run examples scripts (#3638) Simon Mo 2024-03-28 14:36:10 -07:00
  • 96aa014d1e fix benchmark format reporting in buildkite (#3693) Simon Mo 2024-03-28 14:35:16 -07:00
  • 1715056fef [Bugfix] Update neuron_executor.py to add optional vision_language_config (#3695) Adam Boeglin 2024-03-28 10:43:34 -07:00
  • b51c1cc9d2 [2/N] Chunked prefill data update (#3538) SangBin Cho 2024-03-29 02:06:01 +09:00
  • ce567a2926 [Kernel] DBRX Triton MoE kernel H100 (#3692) Roger Wang 2024-03-28 10:05:34 -07:00
  • d6ea427f04 [Model] Add support for Qwen2MoeModel (#3346) wenyujin333 2024-03-28 23:19:59 +08:00
  • 14ccd94c89 [Core][Bugfix]Refactor block manager for better testability (#3492) Cade Daniel 2024-03-27 23:59:28 -07:00
  • 8267b06c30 [Kernel] Add Triton MoE kernel configs for DBRX on A100 (#3679) Woosuk Kwon 2024-03-27 22:22:25 -07:00
  • 3492859b68 [CI/Build] update default number of jobs and nvcc threads to avoid overloading the system (#3675) youkaichao 2024-03-27 21:18:54 -07:00
  • 098e1776ba [Model] Add support for xverse (#3610) hxer7963 2024-03-28 09:12:54 +08:00
  • 10e6322283 [Model] Fix and clean commandr (#3671) Roy 2024-03-28 08:20:00 +08:00
  • 6d9aa00fc4 [Docs] Add Command-R to supported models (#3669) Woosuk Kwon 2024-03-27 15:20:00 -07:00
  • 1182607e18 Add support for Cohere's Command-R model (#3433) zeppombal 2024-03-27 21:19:32 +00:00
  • 45b6ef6513 feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark (#3277) Roger Wang 2024-03-27 13:39:26 -07:00
  • 1956931436 [Misc] add the "download-dir" option to the latency/throughput benchmarks (#3621) AmadeusChan 2024-03-27 16:39:05 -04:00
  • e24336b5a7 [Model] Add support for DBRX (#3660) Megha Agarwal 2024-03-27 13:01:46 -07:00
  • d18f4e73f3 [Bugfix] [Hotfix] fix nccl library name (#3661) youkaichao 2024-03-27 10:23:54 -07:00
  • 82c540bebf [Bugfix] More faithful implementation of Gemma (#3653) Woosuk Kwon 2024-03-27 09:37:18 -07:00
  • 8f44facddd [Core] remove cupy dependency (#3625) youkaichao 2024-03-27 00:33:26 -07:00
  • e66b629c04 [Misc] Minor fix in KVCache type (#3652) Woosuk Kwon 2024-03-26 23:14:06 -07:00
  • 76879342a3 [Doc]add lora support (#3649) Jee Li 2024-03-27 10:06:46 +08:00
  • 566b57c5c4 [Kernel] support non-zero cuda devices in punica kernels (#3636) Jee Li 2024-03-27 08:37:42 +08:00
  • 0dc72273b8 [BugFix] Fix ipv4 address parsing regression (#3645) Nick Hill 2024-03-26 14:39:44 -07:00
  • a979d9771e [Bugfix] Fix ipv6 address parsing bug (#3641) liiliiliil 2024-03-27 02:58:20 +08:00
  • 8af890a865 Enable more models to inference based on LoRA (#3382) Jee Li 2024-03-26 09:09:31 +08:00
  • dfeb2ecc3a [Misc] Include matched stop string/token in responses (#2976) Nick Hill 2024-03-25 17:31:32 -07:00
  • 3a243095e5 Optimize _get_ranks in Sampler (#3623) Antoni Baum 2024-03-25 16:03:02 -07:00
  • 64172a976c [Feature] Add vision language model support. (#3042) xwjiang2010 2024-03-25 14:16:30 -07:00
  • f408d05c52 hotfix isort on logprobs ranks pr (#3622) Simon Mo 2024-03-25 11:55:46 -07:00
  • 0b4997e05c [Bugfix] API stream returning two stops (#3450) Dylan Hawk 2024-03-25 10:14:34 -07:00
  • c13ad1b7bd feat: implement the min_tokens sampling parameter (#3124) Travis Johnson 2024-03-25 11:14:26 -06:00
  • 819924e749 [Core] Adding token ranks along with logprobs (#3516) Swapnil Parekh 2024-03-25 13:13:10 -04:00
  • 01bfb22b41 [CI] Try introducing isort. (#3495) SangBin Cho 2024-03-25 23:59:47 +09:00
  • e67c295b0c [Bugfix] fix automatic prefix args and add log info (#3608) TianYu GUO 2024-03-25 20:35:22 +08:00
  • 925f3332ca [Core] Refactor Attention Take 2 (#3462) Woosuk Kwon 2024-03-24 21:39:33 -07:00
  • b0dfa91dd7 [Model] Add starcoder2 awq support (#3569) 少年 2024-03-25 12:07:36 +08:00
  • 56a8652f33 Revert "[Bugfix] store lock file in tmp directory (#3578)" (#3599) Woosuk Kwon 2024-03-24 20:06:50 -07:00
  • 6d93d35308 [BugFix] tensor.get_device() -> tensor.device (#3604) Kunshang Ji 2024-03-25 10:01:13 +08:00
  • 837e185142 [CI/Build] fix flaky test (#3602) youkaichao 2024-03-24 17:43:05 -07:00
  • 42bc386129 [CI/Build] respect the common environment variable MAX_JOBS (#3600) youkaichao 2024-03-24 17:04:00 -07:00
  • 8b268a46a7 [CI] typo fix: is_hip --> is_hip() (#3595) youkaichao 2024-03-24 16:03:06 -07:00
  • 41deac4a3d [BugFix] 1D query fix for MoE models (#3597) Nick Hill 2024-03-24 16:00:16 -07:00
  • af9e53496f [BugFix] Fix Falcon tied embeddings (#3590) Woosuk Kwon 2024-03-24 06:34:01 -07:00
  • f8a12ecc7f [Misc] Bump transformers version (#3592) Roger Wang 2024-03-24 06:32:45 -07:00
  • 3c5ab9b811 [Misc] Fix BLOOM copyright notice (#3591) Woosuk Kwon 2024-03-23 23:30:56 -07:00
  • 743a0b7402 [Bugfix] use SoftLockFile instead of LockFile (#3578) kota-iizuka 2024-03-24 03:43:11 +09:00
  • bfdb1ba5c3 [Core] Improve detokenization performance for prefill (#3469) Antoni Baum 2024-03-22 13:44:12 -07:00
  • cf2f084d56 Dynamic scheduler delay to improve ITL performance (#3279) Thomas Parnell 2024-03-22 20:28:14 +01:00
  • f721096d48 [BugFix] Some fixes for custom allreduce kernels (#2760) Hanzhi Zhou 2024-03-21 23:02:58 -07:00
  • e90fc21f2e [Hardware][Neuron] Refactor neuron support (#3471) Zhuohan Li 2024-03-21 18:22:17 -07:00
  • ea5f14e6ff [Bugfix][Model] Fix Qwen2 (#3554) Roy 2024-03-22 08:18:58 +08:00
  • b7050ca7df [BugFix] gemma loading after quantization or LoRA. (#3553) Taemin Lee 2024-03-22 05:16:57 +09:00
  • c188ecb080 [Misc] Bump up transformers to v4.39.0 & Remove StarCoder2Config (#3551) Woosuk Kwon 2024-03-21 07:58:12 -07:00
  • 865732342b [Misc][Log] Add log for tokenizer length not equal to vocabulary size (#3500) Roy 2024-03-21 18:07:48 +08:00
  • 4c07dd28c0 [🚀 Ready to be merged] Added support for Jais models (#3183) Lalit Pradhan 2024-03-21 13:45:24 +04:00
  • 3bbff9e5ab Fix 1D query issue from _prune_hidden_states (#3539) SangBin Cho 2024-03-21 17:49:06 +09:00
  • 6ebd02bdef [PREFIX CACHING FOLLOW UP] OrderedDict-based evictor (#3431) ElizaWszola 2024-03-21 07:20:04 +01:00
  • 523e30ea0c [BugFix] Hot fix in setup.py for neuron build (#3537) Zhuohan Li 2024-03-20 17:59:52 -07:00
  • f1c0fc3919 Migrate logits computation and gather to model_runner (#3233) Roy 2024-03-21 07:25:01 +08:00
  • 6e435de766 [1/n][Chunked Prefill] Refactor input query shapes (#3236) SangBin Cho 2024-03-21 06:46:05 +09:00
  • 426ec4ec67 [1/n] Triton sampling kernel (#3186) Antoni Baum 2024-03-20 14:45:08 -07:00
  • 80e254834d [Bugfix] Fix ROCm support in CMakeLists.txt (#3534) James Whedbee 2024-03-20 16:05:03 -05:00
  • ba8ae1d84f Check for _is_cuda() in compute_num_jobs (#3481) bnellnm 2024-03-20 13:06:56 -04:00
  • 84eaa68425 Abort when nvcc command is not found in the PATH (#3527) Allen.Dou 2024-03-21 00:28:29 +08:00
  • 5ee14494e4 [Misc] Remove cache stream and cache events (#3461) Woosuk Kwon 2024-03-20 00:38:53 -07:00
  • 4ad521d8b5 [Core] Add generic typing to LRUCache (#3511) Nick Hill 2024-03-20 00:36:09 -07:00
  • 9474e89ba4 [PREFIX CACHING FOLLOW UP] A bunch of fixes to block allocator performance when automatic prefix caching is disabled (#3357) ElizaWszola 2024-03-20 08:11:11 +01:00
  • 20478c4d3a Use lru_cache for some environment detection utils (#3508) Simon Mo 2024-03-19 14:34:15 -07:00
  • 63e8b28a99 [Doc] minor fix of spelling in amd-installation.rst (#3506) Jim Burtoft 2024-03-19 16:32:30 -04:00
  • cc63d03fbb Revert "[Core] Cache some utils" (#3507) Simon Mo 2024-03-19 13:22:58 -07:00
  • 2a60c9bd17 [Doc] minor fix to neuron-installation.rst (#3505) Jim Burtoft 2024-03-19 16:21:35 -04:00
  • c614cfee58 Update dockerfile with ModelScope support (#3429) ifsheldon 2024-03-20 01:54:59 +08:00
  • 7341c77d69 [BugFix] Avoid initializing CUDA too early (#3487) Nick Hill 2024-03-18 23:05:20 -07:00