Commit Graph

  • 7c1f760024 [Kernel][TPU][ragged-paged-attn] vLLM code change for PR#8896 (#15659) yarongmu-google 2025-03-28 21:13:15 -07:00
  • da461f3cbf [TPU][V1][Bugfix] Fix w8a8 recompilation with GSM8K (#15714) Nicolò Lucchesi 2025-03-29 05:13:06 +01:00
  • 5b800f0932 [Bugfix] set VLLM_WORKER_MULTIPROC_METHOD=spawn for vllm.entrypoints.openai.api_server (#15700) Jinzhen Lin 2025-03-29 12:12:26 +08:00
  • 8427f70493 Use numba 0.61 for python 3.10+ to support numpy>=2 (#15692) cyyever 2025-03-29 12:11:51 +08:00
  • 7a7992085b [CI] Speed up V1 structured output tests (#15718) Russell Bryant 2025-03-29 00:10:45 -04:00
  • 1286211f57 [Bugfix] LoRA V1: add and fix entrypoints tests (#15715) Varun Sundar Rabindranath 2025-03-28 21:10:41 -07:00
  • 6d531ad7b8 [Misc][V1] Misc code streamlining (#15723) Nick Hill 2025-03-28 20:59:47 -07:00
  • 762b424a52 [Docs] Document v0 engine support in reasoning outputs (#15739) Ce Gao 2025-03-29 11:46:57 +08:00
  • de1cb38769 [Model] Support Skywork-R1V (#15397) pengyuange 2025-03-29 11:39:21 +08:00
  • c802f5430d [ROCm][AMD][Build] Update AMD supported arch list (#15632) Gregory Shtrasberg 2025-03-28 23:39:18 -04:00
  • cff8991a50 [Docs][V1] Optimize diagrams in prefix caching design (#15716) simpx 2025-03-29 11:33:58 +08:00
  • f3f8d8fff4 implement prometheus fast-api-instrumentor for http service metrics (#15657) daniel-salib 2025-03-28 17:12:02 -07:00
  • 26df46ee59 [Misc] cli auto show default value (#15582) Reid 2025-03-29 06:23:00 +08:00
  • c3f687ac22 [V1] TPU - Fix the chunked prompt bug (#15713) Alexander Matveev 2025-03-28 16:19:04 -04:00
  • 04437e313d [Bugfix] [torch.compile] Add Dynamo metrics context during compilation (#15639) Luka Govedič 2025-03-28 16:01:09 -04:00
  • 038bededba [TPU] [Perf] Improve Memory Usage Estimation (#15671) Robert Shaw 2025-03-28 10:37:52 -07:00
  • d03308be0c [Misc] Remove stale func in KVTransferConfig (#14746) shangmingc 2025-03-29 01:33:32 +08:00
  • c6bc0034d0 [Misc] Remove unused utils and clean up imports (#15708) Cyrus Leung 2025-03-29 00:41:16 +08:00
  • 70e132244a [Minor] Remove TGI launching script (#15646) Woosuk Kwon 2025-03-28 09:30:08 -07:00
  • 47e9038d23 Fix cpu offload testing for gptq/awq/ct (#15648) Michael Goin 2025-03-28 10:29:32 -06:00
  • 432cf22a6a [Bugfix] Fix regex compile display format (#15368) Kebe 2025-03-28 23:58:44 +08:00
  • 2914006fe0 [doc] add missing imports (#15699) Reid 2025-03-28 23:56:48 +08:00
  • 7329ff5468 [V1] Support disable_any_whitespace for guidance backend (#15584) Russell Bryant 2025-03-28 11:46:45 -04:00
  • 541d1df486 [Bugfix] embed_is_patch for Idefics3 (#15696) Cyrus Leung 2025-03-28 23:27:52 +08:00
  • 3b00ff9138 [Bugfix][v1] xgrammar structured output supports Enum. (#15594) Chauncey 2025-03-28 21:14:53 +08:00
  • 91276c5721 [Model] Adding torch compile annotations to chatglm (#15624) Jee Jee Li 2025-03-28 21:14:09 +08:00
  • 0b4167526d [Docs] Add "Generation quality changed" section to troubleshooting (#15701) Harry Mellor 2025-03-28 13:03:21 +00:00
  • fd5fd26902 [Frontend] update priority for --api-key and VLLM_API_KEY (#15588) Reid 2025-03-28 19:40:12 +08:00
  • 3bbaacbe15 [Bugfix][Frontend] Eliminate regex based check in reasoning full generator (#14821) Ce Gao 2025-03-28 19:20:35 +08:00
  • a10314c6b3 [Misc] Fix test_sleep to use query parameters (#14373) Lize Cai 2025-03-28 19:00:14 +09:00
  • 70f2c2a709 [Bugfix] Fix "'InductorAdaptor' object has no attribute 'cache_dir'" (#15674) Jee Jee Li 2025-03-28 17:10:40 +08:00
  • 280d074103 [CPU][CI] Improve CPU Dockerfile (#15690) Li, Jiang 2025-03-28 16:36:31 +08:00
  • 32b14baf8a [Refactor][Frontend] Keep all logic about reasoning into one class (#14428) Ce Gao 2025-03-28 15:23:30 +08:00
  • 2d9045fce8 [TPU][CI] Fix TPUModelRunner Test (#15667) Robert Shaw 2025-03-28 03:01:26 -04:00
  • 355f66348c [V1] Remove legacy input registry (#15673) Cyrus Leung 2025-03-28 14:34:34 +08:00
  • 8693e47e6a [Bugfix] Fix mm_hashes forgetting to be passed (#15668) Cyrus Leung 2025-03-28 13:51:05 +08:00
  • cec8c7d7f8 Refactor error handling for multiple exceptions in preprocessing (#15650) Jason (Siyu) Zhu 2025-03-27 20:27:20 -07:00
  • 4d0ec37267 [Quantization][FP8] Adding support for fp8 gemm layer input in fp8 (#14578) Gregory Shtrasberg 2025-03-27 22:58:16 -04:00
  • e7f720ea56 [Misc] add coding benchmark for speculative decoding (#15303) Chen Xia 2025-03-27 19:47:05 -07:00
  • 4ae17bf1e2 Revert "Use Cache Hinting for fused_moe kernel (#15511)" (#15645) Wes 2025-03-27 20:45:55 -06:00
  • 8a49eea74b [CI][TPU] Temporarily Disable Quant Test on TPU (#15649) Robert Shaw 2025-03-27 22:45:05 -04:00
  • b4245a48df [Doc] Fix dead links in Job Board (#15637) wwl2755 2025-03-27 21:43:40 -05:00
  • 4e0f6076be [Bugfix] Fix failure to launch in Tensor Parallel TP mode on macOS. (#14948) Kebe 2025-03-28 10:13:41 +08:00
  • 726efc6a32 [Quantization][V1] BitsAndBytes support V1 (#15611) Jee Jee Li 2025-03-28 10:12:47 +08:00
  • bd45912b99 [TPU] Lazy Import (#15656) Robert Shaw 2025-03-27 21:57:01 -04:00
  • 15dac210f0 [V1] AsyncLLM data parallel (#13923) Nick Hill 2025-03-27 16:14:41 -07:00
  • 112b3e5b3b [CI] Update rules for applying tpu label. (#15634) Russell Bryant 2025-03-27 18:15:26 -04:00
  • 32d669275b Correct PowerPC to modern IBM Power (#15635) cnorman 2025-03-27 17:04:32 -05:00
  • 4098b72210 [Bugfix][TPU][V1] Fix recompilation (#15553) Nicolò Lucchesi 2025-03-27 20:15:06 +01:00
  • 46450b8d33 Use absolute placement for Ask AI button (#15628) Harry Mellor 2025-03-27 18:52:18 +00:00
  • 13ac9cab21 [Misc] Avoid direct access of global mm_registry in compute_encoder_budget (#15621) Cyrus Leung 2025-03-28 01:52:00 +08:00
  • 66aa4c0bf4 [Feature] Add middleware to log API Server responses (#15593) Yuan Tang 2025-03-27 13:49:38 -04:00
  • 247181536f [Misc] Replace is_encoder_decoder_inputs with split_enc_dec_inputs (#15620) Cyrus Leung 2025-03-28 01:36:32 +08:00
  • 07bf813fb5 [Doc] Link to onboarding tasks (#15629) Cyrus Leung 2025-03-28 00:30:53 +08:00
  • 8958217ad5 [Bugfix] Fix use_cascade_attention handling for Alibi-based models on vllm/v1 (#15211) Hiroaki Sugiyama 2025-03-27 23:29:29 +09:00
  • ac5bc615b0 [Model] MiniCPM-V/O supports V1 (#15487) Cyrus Leung 2025-03-27 21:07:29 +08:00
  • 8063dfc61a [Doc] update --system for transformers installation in docker doc (#15616) Reid 2025-03-27 20:38:46 +08:00
  • 6278bc829e Fix incorrect filenames in vllm_compile_cache.py (#15494) Richard Zou 2025-03-27 06:33:41 -04:00
  • 3f532cb6a6 [Misc] Use model_redirect to redirect the model name to a local folder. (#14116) wang.yuqi 2025-03-27 17:21:23 +08:00
  • e6c9053f9e [Misc] Clean up scatter_patch_features (#15559) Cyrus Leung 2025-03-27 15:45:00 +08:00
  • 43ed4143c4 [Quantization] Fp8 Channelwise Dynamic Per Token GroupedGEMM (#15587) Robert Shaw 2025-03-27 02:47:25 -04:00
  • f4c98b4d4c [Misc] Consolidate LRUCache implementations (#15481) Bella kira 2025-03-27 14:43:43 +08:00
  • e1e0fd7543 [TPU] Avoid Triton Import (#15589) Robert Shaw 2025-03-27 02:43:02 -04:00
  • df8d3d1287 [Misc] Restrict ray version dependency and update PP feature warning in V1 (#15556) Rui Qiao 2025-03-26 23:21:07 -07:00
  • 619d3de8bd [TPU] [V1] fix cases when max_num_reqs is set smaller than MIN_NUM_SEQS (#15583) Chengji Yao 2025-03-26 22:46:26 -07:00
  • ecff8309a3 [ROCm] Env variable to trigger custom PA (#15557) Gregory Shtrasberg 2025-03-27 01:46:12 -04:00
  • dcf2a590f5 Allow torchao quantization in SiglipMLP (#15575) Jerry Zhang 2025-03-26 22:45:51 -07:00
  • 54aa619459 [V1] Refactor num_computed_tokens logic (#15307) Cody Yu 2025-03-26 21:54:36 -07:00
  • fb22be5817 [moe][quant] add weight name case for offset (#15515) Mengqing Cao 2025-03-27 12:50:29 +08:00
  • 7f301dd8ef [Doc] Update V1 user guide for fp8 kv cache support (#15585) Wei Zeng 2025-03-26 19:39:03 -07:00
  • 8095341a01 [misc] LoRA: Remove unused long context test data (#15558) Varun Sundar Rabindranath 2025-03-26 19:04:51 -07:00
  • 69db16a46a add platform check back (#15578) Chenyaaang 2025-03-26 18:50:27 -07:00
  • ce78f9af4e Add automatic tpu label to mergify.yml (#15560) Michael Goin 2025-03-26 19:39:58 -06:00
  • 9239bf718e [Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972) ElizaWszola 2025-03-27 01:54:44 +01:00
  • 7a6d45bc8a Support FIPS enabled machines with MD5 hashing (#15299) Matthew Vine 2025-03-26 20:19:46 -04:00
  • e74ff409e0 [TPU] support disabling xla compilation cache (#15567) Chengji Yao 2025-03-26 17:09:28 -07:00
  • 7a888271f5 Use Cache Hinting for fused_moe kernel (#15511) Wes 2025-03-26 17:21:34 -06:00
  • 9d119a86ae [V1] TPU CI - Fix test_compilation.py (#15570) Alexander Matveev 2025-03-26 17:51:54 -04:00
  • b2e85e26f4 [V1] TPU - Revert to exponential padding by default (#15565) Alexander Matveev 2025-03-26 17:35:05 -04:00
  • dd8a29da99 Applying some fixes for K8s agents in CI (#15493) Alexei-V-Ivanov-AMD 2025-03-26 15:35:11 -05:00
  • 27df5199d9 Support SHA256 as hash function in prefix caching (#15297) marko 2025-03-26 19:11:28 +01:00
  • 35fad35a48 [V1][Sampler] Faster top-k only implementation (#15478) Nick Hill 2025-03-26 10:56:47 -07:00
  • 733e7c9e95 [Refactor] Remove unnecessary backend parameter in structured output interface (#15317) Aaron Pham 2025-03-26 13:51:56 -04:00
  • 0af4d764d6 Fix weight loading for some models in Transformers backend (#15544) Harry Mellor 2025-03-26 17:17:53 +00:00
  • e64afa455c multi-node offline DP+EP example (#15484) youkaichao 2025-03-26 23:54:24 +08:00
  • 1711b929b6 [Model] Add Reasoning Parser for Granite Models (#14202) Alex Brooks 2025-03-26 08:28:07 -06:00
  • c091c0a588 Improve validation of TP in Transformers backend (#15540) Harry Mellor 2025-03-26 14:26:48 +00:00
  • 1aa162e030 Apply torchfix (#15532) cyyever 2025-03-26 20:09:06 +08:00
  • cf5c8f1686 Separate base model from TransformersModel (#15467) Harry Mellor 2025-03-26 10:13:38 +00:00
  • 4ec2cee000 [Misc] improve example script output (#15528) Reid 2025-03-26 18:12:47 +08:00
  • 99f536f830 [Misc] Enhance warning information to user-defined chat template (#15408) wwl2755 2025-03-26 04:21:15 -05:00
  • 5ebf66748b [FEAT][ROCm] Integrate Fused MoE Kernels from AITER (#14967) vllmellm 2025-03-26 16:30:30 +08:00
  • 781d056280 [Feature] Enhance EAGLE Architecture with Proper RMS Norms (#14990) Bryan Lu 2025-03-26 01:24:07 -07:00
  • 5aefd6ac31 Fix raw_request extraction in load_aware_call decorator (#15382) daniel-salib 2025-03-25 22:29:54 -07:00
  • 6c663dfd5e [misc] LoRA - Skip LoRA kernels when not required (#15152) Varun Sundar Rabindranath 2025-03-25 20:33:45 -07:00
  • 33437bc6e7 [BugFix] Fix nightly MLA failure (FA2 + MLA chunked prefill, i.e. V1, producing bad results) (#15492) Lucas Wilkinson 2025-03-25 23:33:22 -04:00
  • 23114d3364 [Misc] Warn about v0 in benchmark_paged_attn.py (#15495) Tyler Michael Smith 2025-03-25 23:31:04 -04:00
  • 997c8811d6 [Model] Support multi-image for Molmo (#15438) Cyrus Leung 2025-03-26 11:26:33 +08:00
  • e42389f9d7 Transformers backend already supports V1 (#15463) Harry Mellor 2025-03-26 03:26:16 +00:00
  • ff38f0a32c [CI/Build] LoRA: Delete long context tests (#15503) Varun Sundar Rabindranath 2025-03-25 17:18:34 -07:00