Commit Graph

  • 324960a95c [TPU][CI] Update torchxla version in requirement-tpu.txt (#12422) Siyuan Liu 2025-01-24 23:23:03 -08:00
  • f1fc0510df [Misc] Add FA2 support to ViT MHA layer (#12355) Isotr0py 2025-01-25 15:07:35 +08:00
  • bf21481dde [ROCm][MoE] MI300 tuned configs Mixtral-8x(7B,22B) | fp16, fp8 (#12408) Divakar Verma 2025-01-24 22:17:19 -06:00
  • fb30ee92ee [Bugfix] Fix BLIP-2 processing (#12412) Cyrus Leung 2025-01-25 11:42:42 +08:00
  • 221d388cc5 [Bugfix][Kernel] Fix moe align block issue for mixtral (#12413) ElizaWszola 2025-01-24 20:49:28 -05:00
  • 3132a933b6 [Bugfix][Kernel] FA3 Fix - RuntimeError: This flash attention build only supports pack_gqa (for build size reasons). (#12405) Lucas Wilkinson 2025-01-24 15:20:59 -05:00
  • df5dafaa5b [Misc] Remove deprecated code (#12383) Cyrus Leung 2025-01-25 03:45:20 +08:00
  • ab5bbf5ae3 [Bugfix][Kernel] Fix CUDA 11.8 being broken by FA3 build (#12375) Lucas Wilkinson 2025-01-24 10:27:59 -05:00
  • 3bb8e2c9a2 [Misc] Enable proxy support in benchmark script (#12356) Junichi Sato 2025-01-24 23:58:26 +09:00
  • e784c6b998 [ci/build] sync default value for wheel size (#12398) youkaichao 2025-01-24 17:54:29 +08:00
  • 9a0f3bdbe5 [Hardware][Gaudi][Doc] Add missing step in setup instructions (#12382) Mohit Deopujari 2025-01-24 01:43:49 -08:00
  • c7c9851036 [ci/build] fix wheel size check (#12396) youkaichao 2025-01-24 17:31:25 +08:00
  • 3c818bdb42 [Misc] Use VisionArena Dataset for VLM Benchmarking (#12389) Roger Wang 2025-01-24 00:22:04 -08:00
  • 6dd94dbe94 [perf] fix perf regression from #12253 (#12380) youkaichao 2025-01-24 11:34:27 +08:00
  • 0e74d797ce [V1] Increase default batch size for H100/H200 (#12369) Woosuk Kwon 2025-01-23 19:19:55 -08:00
  • 55ef66edf4 Update compressed-tensors version (#12367) Dipika Sikka 2025-01-23 22:19:42 -05:00
  • 5e5630a478 [Bugfix] Path join when building local path for S3 clone (#12353) omer-dayan 2025-01-24 05:06:07 +02:00
  • d3d6bb13fb Set weights_only=True when using torch.load() (#12366) Russell Bryant 2025-01-23 21:17:30 -05:00
  • 24b0205f58 [V1][Frontend] Coalesce bunched RequestOutputs (#12298) Nick Hill 2025-01-23 17:17:41 -08:00
  • c5cffcd0cd [Docs] Update spec decode + structured output in compat matrix (#12373) Russell Bryant 2025-01-23 20:15:52 -05:00
  • 682b55bc07 [Docs] Add meetup slides (#12345) Woosuk Kwon 2025-01-23 14:10:03 -08:00
  • 9726ad676d [Misc] Fix OpenAI API Compatibility Issues in Benchmark Script (#12357) Junichi Sato 2025-01-24 07:02:13 +09:00
  • eb5cb5e528 [BugFix] Fix parameter names and process_after_weight_loading for W4A16 MoE Group Act Order (#11528) Dipika Sikka 2025-01-23 16:40:33 -05:00
  • 2cbeedad09 [Docs] Document Phi-4 support (#12362) Isotr0py 2025-01-24 03:18:51 +08:00
  • 2c85529bfc [TPU] Update TPU CI to use torchxla nightly on 20250122 (#12334) Siyuan Liu 2025-01-23 10:50:16 -08:00
  • e97f802b2d [FP8][Kernel] Dynamic kv cache scaling factors computation (#11906) Gregory Shtrasberg 2025-01-23 13:04:03 -05:00
  • 6e650f56a1 [torch.compile] decouple compile sizes and cudagraph sizes (#12243) youkaichao 2025-01-24 02:01:30 +08:00
  • 3f50c148fd [core] add wake_up doc and some sanity check (#12361) youkaichao 2025-01-24 02:00:50 +08:00
  • 8c01b8022c [Bugfix] Fix broken internvl2 inference with v1 (#12360) Isotr0py 2025-01-24 01:20:33 +08:00
  • 99d01a5e3d [V1] Simplify M-RoPE (#12352) Roger Wang 2025-01-23 07:13:23 -08:00
  • d07efb31c5 [Doc] Troubleshooting errors during model inspection (#12351) Cyrus Leung 2025-01-23 22:46:58 +08:00
  • 978b45f399 [Kernel] Flash Attention 3 Support (#12093) Lucas Wilkinson 2025-01-23 09:45:48 -05:00
  • c5b4b11d7f [Bugfix] Fix k_proj's bias for whisper self attention (#12342) Isotr0py 2025-01-23 18:15:33 +08:00
  • 8ae5ff2009 [Hardware][Gaudi][BugFix] Fix dataclass error due to triton package update (#12338) liuzhenwei 2025-01-23 16:35:46 +08:00
  • 511627445e [doc] explain common errors around torch.compile (#12340) youkaichao 2025-01-23 14:56:02 +08:00
  • f0ef37233e [V1] Add uncache_blocks (#12333) Cody Yu 2025-01-22 20:19:21 -08:00
  • 7551a34032 [Docs] Document vulnerability disclosure process (#12326) Russell Bryant 2025-01-22 22:44:09 -05:00
  • 01a55941f5 [Docs] Update FP8 KV Cache documentation (#12238) Michael Goin 2025-01-22 22:18:09 -05:00
  • 8d7aa9de71 [Bugfix] Fixing AMD LoRA CI test. (#12329) Alexei-V-Ivanov-AMD 2025-01-22 20:53:02 -06:00
  • 68c4421b6d [AMD][Quantization] Add TritonScaledMMLinearKernel since int8 is broken for AMD (#12282) rasmith 2025-01-22 18:10:37 -06:00
  • aea94362c9 [Frontend][V1] Online serving performance improvements (#12287) Nick Hill 2025-01-22 14:22:12 -08:00
  • 7206ce4ce1 [Core] Support reset_prefix_cache (#12284) Cody Yu 2025-01-22 10:52:27 -08:00
  • 96f6a7596f [Bugfix] Fix HPU multiprocessing executor (#12167) Konrad Zawora 2025-01-22 19:07:07 +01:00
  • 84bee4bd5c [Misc] Improve the readability of BNB error messages (#12320) Jee Jee Li 2025-01-23 00:56:54 +08:00
  • fc66dee76d [Misc] Fix the error in the tip for the --lora-modules parameter (#12319) Robin 2025-01-23 00:48:41 +08:00
  • 6609cdf019 [Doc] Add docs for prompt replacement (#12318) Cyrus Leung 2025-01-22 22:56:29 +08:00
  • 16366ee8bb [Bugfix][VLM] Fix mixed-modality inference backward compatibility for V0 (#12313) Roger Wang 2025-01-22 05:06:36 -08:00
  • 528dbcac7d [Model][Bugfix]: correct Aria model output (#12309) zhou fan 2025-01-22 19:39:19 +08:00
  • cd7b6f0857 [VLM] Avoid unnecessary tokenization (#12310) Cyrus Leung 2025-01-22 19:08:31 +08:00
  • 68ad4e3a8d [Core] Support fully transparent sleep mode (#11743) youkaichao 2025-01-22 14:39:32 +08:00
  • 4004f144f3 [Build] update requirements of no-device (#12299) Mengqing Cao 2025-01-22 14:29:31 +08:00
  • 66818e5b63 [core] separate builder init and builder prepare for each batch (#12253) youkaichao 2025-01-22 14:13:52 +08:00
  • 222a9dc350 [Benchmark] More accurate TPOT calc in benchmark_serving.py (#12288) Nick Hill 2025-01-21 21:46:14 -08:00
  • cbdc4ad5a5 [Ci/Build] Fix mypy errors on main (#12296) Cyrus Leung 2025-01-22 12:06:54 +08:00
  • 016e3676e7 [CI] add docker volume prune to neuron CI (#12291) Liangfu Chen 2025-01-21 18:47:49 -08:00
  • 64ea24d0b3 [ci/lint] Add back default arg for pre-commit (#12279) Kevin H. Luu 2025-01-21 17:15:27 -08:00
  • df76e5af26 [VLM] Simplify post-processing of replacement info (#12269) Cyrus Leung 2025-01-22 08:48:13 +08:00
  • 09ccc9c8f7 [Documentation][AMD] Add information about prebuilt ROCm vLLM docker for perf validation purpose (#12281) Hongxia Yang 2025-01-21 18:49:22 -05:00
  • 69196a9bc7 [BUGFIX] When skip_tokenize_init and multistep are set, execution crashes (#12277) Aleksandr Malyshev 2025-01-21 15:30:46 -08:00
  • 2acba47d9b [bugfix] moe tuning. rm is_navi() (#12273) Divakar Verma 2025-01-21 16:47:32 -06:00
  • 9c485d9e25 [Core] Free CPU pinned memory on environment cleanup (#10477) Jani Monoses 2025-01-21 21:56:41 +02:00
  • fa9ee08121 [Misc] Set default backend to SDPA for get_vit_attn_backend (#12235) wangxiyuan 2025-01-22 03:52:11 +08:00
  • 347eeebe3b [Misc] Remove experimental dep from tracing.py (#12007) Adrian Cole 2025-01-21 11:51:55 -08:00
  • 18fd4a8331 [Bugfix] Multi-sequence broken (#11898) Andy Lo 2025-01-21 19:51:35 +00:00
  • 132a132100 [v1][stats][1/n] Add RequestStatsUpdate and RequestStats types (#10907) Ricky Xu 2025-01-21 11:51:13 -08:00
  • 1e60f87bb3 [Kernel] fix moe_align_block_size error condition (#12239) Jinzhen Lin 2025-01-22 02:30:28 +08:00
  • 9705b90bcf [Bugfix] fix race condition that leads to wrong order of token returned (#10802) Jannis Schönleber 2025-01-21 18:47:04 +01:00
  • 3aec49e56f [ci/build] update nightly torch for gh200 test (#12270) youkaichao 2025-01-21 23:03:17 +08:00
  • c64612802b [Platform] improve platforms getattr (#12264) Mengqing Cao 2025-01-21 22:42:41 +08:00
  • 9a7c3a0042 Remove pytorch comments for outlines + compressed-tensors (#12260) Thomas Parnell 2025-01-21 14:49:08 +01:00
  • b197a5ccfd [V1][Bugfix] Fix data item ordering in mixed-modality inference (#12259) Roger Wang 2025-01-21 05:18:43 -08:00
  • c81081fece [torch.compile] transparent compilation with more logging (#12246) youkaichao 2025-01-21 19:32:55 +08:00
  • a94eee4456 [Bugfix] Fix mm_limits access for merged multi-modal processor (#12252) Cyrus Leung 2025-01-21 18:09:39 +08:00
  • f2e9f2a3be [Misc] Remove redundant TypeVar from base model (#12248) Cyrus Leung 2025-01-21 16:40:39 +08:00
  • 1f1542afa9 [Misc]Add BNB quantization for PaliGemmaForConditionalGeneration (#12237) Jee Jee Li 2025-01-21 15:49:08 +08:00
  • 96912550c8 [Misc] Rename MultiModalInputsV2 -> MultiModalInputs (#12244) Cyrus Leung 2025-01-21 15:31:19 +08:00
  • 2fc6944c5e [ci/build] disable failed and flaky tests (#12240) youkaichao 2025-01-21 13:25:03 +08:00
  • 5fe6bf29d6 [BugFix] Fix GGUF tp>1 when vocab_size is not divisible by 64 (#12230) Nicolò Lucchesi 2025-01-21 05:23:14 +01:00
  • d4b62d4641 [AMD][Build] Porting dockerfiles from the ROCm/vllm fork (#11777) Gregory Shtrasberg 2025-01-20 23:22:23 -05:00
  • ecf67814f1 Add quantization and guided decoding CODEOWNERS (#12228) Michael Goin 2025-01-20 20:23:40 -05:00
  • 750f4cabfa [Kernel] optimize moe_align_block_size for cuda graph and large num_experts (e.g. DeepSeek-V3) (#12222) Jinzhen Lin 2025-01-21 08:42:16 +08:00
  • 06a760d6e8 [bugfix] catch xgrammar unsupported array constraints (#12210) Cheng Kuan Yong Jason 2025-01-21 08:42:02 +08:00
  • da7512215f [misc] add cuda runtime version to usage data (#12190) youkaichao 2025-01-21 08:31:01 +08:00
  • af69a6aded fix: update platform detection for M-series arm based MacBook processors (#12227) Işık 2025-01-20 22:23:28 +00:00
  • 7bd3630067 [Misc] Update CODEOWNERS (#12229) Roger Wang 2025-01-20 14:19:09 -08:00
  • 96663699b2 [CI] Pass local python version explicitly to pre-commit mypy.sh (#12224) Chen Zhang 2025-01-20 23:49:18 +08:00
  • 18572e3384 [Bugfix] Fix HfExampleModels.find_hf_info (#12223) Cyrus Leung 2025-01-20 23:35:36 +08:00
  • 86bfb6dba7 [Misc] Pass attention to impl backend (#12218) wangxiyuan 2025-01-20 23:25:28 +08:00
  • 5f0ec3935a [V1] Remove _get_cache_block_size (#12214) Chen Zhang 2025-01-20 21:54:16 +08:00
  • c222f47992 [core][bugfix] configure env var during import vllm (#12209) youkaichao 2025-01-20 19:35:59 +08:00
  • 170eb35079 [misc] print a message to suggest how to bypass commit hooks (#12217) youkaichao 2025-01-20 18:06:24 +08:00
  • b37d82791e [Model] Upgrade Aria to transformers 4.48 (#12203) Cyrus Leung 2025-01-20 17:58:48 +08:00
  • 3127e975fb [CI/Build] Make pre-commit faster (#12212) Cyrus Leung 2025-01-20 17:36:24 +08:00
  • 4001ea1266 [CI/Build] Remove dummy CI steps (#12208) Cyrus Leung 2025-01-20 16:41:57 +08:00
  • 5c89a29c22 [misc] add placeholder format.sh (#12206) youkaichao 2025-01-20 16:04:49 +08:00
  • 59a0192fb9 [Core] Interface for accessing model from VllmRunner (#10353) Cyrus Leung 2025-01-20 15:00:59 +08:00
  • 83609791d2 [Model] Add Qwen2 PRM model support (#12202) Isotr0py 2025-01-20 14:59:46 +08:00
  • 0974c9bc5c [Bugfix] Fix incorrect types in LayerwiseProfileResults (#12196) Yuan Tang 2025-01-20 01:59:20 -05:00
  • d2643128f7 [DOC] Add missing docstring in LLMEngine.add_request() (#12195) Yuan Tang 2025-01-20 01:59:00 -05:00
  • c5c06209ec [DOC] Fix typo in docstring and assert message (#12194) Yuan Tang 2025-01-20 01:58:29 -05:00