Commit Graph

  • 01b6113659 [TPU] optimize the all-reduce performance (#15903) Chengji Yao 2025-04-02 17:25:14 -07:00
  • 1b84eff03a [V1][TPU] TPU-optimized top-p implementation (avoids scattering). (#15736) Hyesoo Yang 2025-04-02 17:18:08 -07:00
  • 55acf86bf8 Fix huggingface-cli[hf-xet] -> huggingface-cli[hf_xet] (#15969) Harry Mellor 2025-04-03 00:37:30 +01:00
  • f021b97993 [V1] Support Mistral3 in V1 (#15950) Michael Goin 2025-04-02 16:36:24 -06:00
  • 1cab43c2d2 [misc] instruct pytorch to use nvml-based cuda check (#15951) youkaichao 2025-04-03 01:02:58 +08:00
  • 8bd651b318 Restricted cmake to be less than version 4 as 4.x breaks the build of… (#15859) Nishidha 2025-04-02 21:49:39 +05:30
  • 58e234a754 [Misc] V1 LoRA support CPU offload (#15843) Jee Jee Li 2025-04-02 23:04:43 +08:00
  • e86c414d6a [Model] use AutoWeightsLoader in model load_weights (#15770) rongfu.leng 2025-04-02 22:47:31 +08:00
  • 550b2801ad [CPU][Bugfix] Using custom allreduce for CPU backend (#15934) Li, Jiang 2025-04-02 22:46:47 +08:00
  • cefb9e5a28 [Frontend] Implement Tool Calling with tool_choice='required' (#13483) Matthias Matt 2025-04-02 16:45:45 +02:00
  • 98d7367b61 [Metrics] Hide deprecated metrics (#15458) Mark McLoughlin 2025-04-02 15:37:19 +01:00
  • 594a8b9030 [Bugfix] Fix the issue where the model name is empty string, causing no response with the model name. (#15938) Chauncey 2025-04-02 21:33:52 +08:00
  • 44f990515b [CI] Remove duplicate entrypoints-test (#15940) Kay Yan 2025-04-02 17:44:01 +08:00
  • 252937806c [Bugfix][Benchmarks] Ensure async_request_deepspeed_mii uses the OpenAI choices key (#15926) Brayden Zhong 2025-04-02 05:19:35 -04:00
  • 51826d51fa Add minimum version for huggingface_hub to enable Xet downloads (#15873) Harry Mellor 2025-04-02 10:03:36 +01:00
  • 14e53ed11f [V1] Fix json_object support with xgrammar (#15488) Russell Bryant 2025-04-02 05:00:08 -04:00
  • ddb94c2605 [core] Add tags parameter to wake_up() (#15500) Eric Tang 2025-04-02 01:59:27 -07:00
  • 90969fb39a [Kernel] Add more dtype support for GGUF dequantization (#15879) LukasBluebaum 2025-04-02 10:58:48 +02:00
  • 101f1481f9 [Build/CI] Update lm-eval to 0.4.8 (#15912) Chris Thi 2025-04-02 04:47:57 -04:00
  • 2edc87b161 [Bugfix] Fix cache block size calculation for CPU MLA (#15848) Thien Tran 2025-04-02 16:45:02 +08:00
  • 4203926f10 [CI/Build] Further clean up LoRA tests (#15920) Jee Jee Li 2025-04-02 16:39:09 +08:00
  • cdb57015a7 [Misc] Replace print with logger (#15923) Chauncey 2025-04-02 16:37:38 +08:00
  • aa557e6422 [Benchmark]Fix error message (#15866) Li Wang 2025-04-02 16:32:24 +08:00
  • 0e00d40e4f [V1][Bugfix] Fix typo in MoE TPU checking (#15927) Roger Wang 2025-04-01 23:46:42 -07:00
  • c920e01242 [Doc] Update rocm.inc.md (#15917) chun 2025-04-02 15:38:26 +09:00
  • 274d8e8818 [V1][Minor] Enhance SpecDecoding Metrics Log in V1 (#15902) Woosuk Kwon 2025-04-01 23:38:02 -07:00
  • 2039c6305b [Bugfix] Fix imports for MoE on CPU (#15841) Thien Tran 2025-04-02 11:33:55 +08:00
  • 6efb195a6e [V1] Fix: make sure k_index is int64 for apply_top_k_only (#15907) Brayden Zhong 2025-04-01 22:06:44 -04:00
  • 24b7fb455a [Spec Decode] Fix input triton kernel for eagle (#15909) Ekagra Ranjan 2025-04-01 21:15:14 -04:00
  • 58f5a59769 [Docs] Add Intel as Sponsor (#15913) Simon Mo 2025-04-01 17:16:55 -07:00
  • db9dfcfa6a [Docs] Add Ollama meetup slides (#15905) Simon Mo 2025-04-01 13:58:59 -07:00
  • 9ef98d527e [Model][MiniMaxText01] Support MiniMaxText01 model inference (#13454) Gerald 2025-04-02 04:23:55 +08:00
  • 93491aefc7 [BugFix] make sure socket close (#15875) yihong 2025-04-02 04:10:24 +08:00
  • 7acd539cd7 [Docs] update usage stats language (#15898) Simon Mo 2025-04-01 12:54:13 -07:00
  • e75a6301bd [V1][Spec Decode] Implement Eagle Proposer [1/N] (#15729) Woosuk Kwon 2025-04-01 12:33:16 -07:00
  • a79cc68b3a [V1][Metrics] Initial speculative decoding metrics (#15151) Mark McLoughlin 2025-04-01 18:45:04 +01:00
  • 7e3f7a4ee7 [CI] Disable flaky structure decoding test temporarily. (#15892) Roger Wang 2025-04-01 10:42:34 -07:00
  • 9ec8257914 [Model] Add module name prefixes to gemma3 (#15889) cloud11665 2025-04-02 02:13:40 +09:00
  • 38327cf454 [Model] Aya Vision (#15441) Jennifer Zhao 2025-04-01 09:30:43 -07:00
  • dfa82e2a3d [CI/Build] Clean up LoRA tests (#15867) Jee Jee Li 2025-04-02 00:28:50 +08:00
  • e59ca942f5 Add option to use DeepGemm contiguous grouped gemm kernel for fused MoE operations. (#13932) bnellnm 2025-04-01 12:07:43 -04:00
  • a57a3044aa [ROCm][Build][Bugfix] Bring the base dockerfile in sync with the ROCm fork (#15820) Gregory Shtrasberg 2025-04-01 11:56:39 -04:00
  • 4e5a0f6ae2 [Misc] Allow using OpenCV as video IO fallback (#15055) Isotr0py 2025-04-01 23:55:13 +08:00
  • b63bd14999 Reinstate format.sh and make pre-commit installation simpler (#15890) Harry Mellor 2025-04-01 16:41:30 +01:00
  • 2041c0e360 [Doc] Quark quantization documentation (#15861) chaow-amd 2025-04-01 23:32:45 +08:00
  • 085cbc4f9f [New Model]: jinaai/jina-reranker-v2-base-multilingual (#15876) wang.yuqi 2025-04-01 23:32:26 +08:00
  • 2b93162fb0 Remove format.sh as it's been unsupported >70 days (#15884) Harry Mellor 2025-04-01 15:27:46 +01:00
  • 2e45bd29fe [Misc] remove unused script (#15746) Reid 2025-04-01 21:58:05 +08:00
  • 51d7c6a2b2 [Model] Support Mistral3 in the HF Transformers format (#15505) Michael Goin 2025-04-01 07:10:05 -06:00
  • f3aca1ee30 setup correct nvcc version with CUDA_HOME (#15725) Yang Chen 2025-04-01 06:09:40 -07:00
  • 8dd41d6bcc [Misc] Use envs.VLLM_USE_RAY_COMPILED_DAG_CHANNEL_TYPE (#15831) Rui Qiao 2025-04-01 06:07:53 -07:00
  • 0a298ea418 [Bugfix] Fix no video/image profiling edge case for MultiModalDataParser (#15828) Isotr0py 2025-04-01 18:17:11 +08:00
  • d330558bab [Docs] Fix small error in link text (#15868) Harry Mellor 2025-04-01 11:05:14 +01:00
  • 656fd72976 [Misc] Fix speculative config repr string (#15860) shangmingc 2025-04-01 17:26:22 +08:00
  • 79455cf421 [Misc] Enable V1 LoRA by default (#15320) Varun Sundar Rabindranath 2025-04-01 04:53:56 -04:00
  • 30d6a015e0 [Feature] specify model in config.yaml (#15798) Wei Zeng 2025-04-01 01:20:06 -07:00
  • 8af5a5c4e5 fix: can not use uv run collect_env close #13888 (#15792) yihong 2025-04-01 15:45:49 +08:00
  • 3a5f0afcd2 [V1] Implement sliding window attention in kv_cache_manager (#14097) Chen Zhang 2025-04-01 15:33:17 +08:00
  • c7e63aa4d8 [ROCm] Use device name in the warning (#15838) Gregory Shtrasberg 2025-04-01 03:10:48 -04:00
  • 4a9ce1784c [sleep mode] clear pytorch cache after sleep (#15248) Lionel Villard 2025-04-01 01:58:58 -04:00
  • 7e4e709b43 [V1] TPU - Fix fused MOE (#15834) Alexander Matveev 2025-04-01 01:58:07 -04:00
  • 63d8eabed0 [Bugfix]: Fix is_embedding_layer condition in VocabParallelEmbedding (#15824) Alexey Kiryushin 2025-04-01 05:57:59 +00:00
  • e830b01383 [Bugfix] Fix extra comma (#15851) Percy 2025-04-01 00:57:28 -05:00
  • ff6473980d [Bugfix][Model] fix mllama multi-image (#14883) Yan Ma 2025-04-01 13:53:37 +08:00
  • a164aea35d [Frontend] Add Phi-4-mini function calling support (#14886) Kinfey 2025-04-01 13:50:05 +08:00
  • a76f547e11 Rename fallback model and refactor supported models section (#15829) Harry Mellor 2025-04-01 06:49:41 +01:00
  • b7b7676d67 [Distributed] Add custom allreduce support for ROCM (#14125) Ilya Markov 2025-04-01 07:49:12 +02:00
  • e6e3c55ef2 Move dockerfiles into their own directory (#14549) Harry Mellor 2025-03-31 21:47:32 +01:00
  • f98a4920f9 [V1][Core] Remove unused speculative config from scheduler (#15818) Mark McLoughlin 2025-03-31 20:15:21 +01:00
  • d4bfc23ef0 Fix Transformers backend compatibility check (#15290) Harry Mellor 2025-03-31 18:27:07 +01:00
  • 9a2160fa55 [V1] TPU CI - Add basic perf regression test (#15414) Alexander Matveev 2025-03-31 13:25:20 -04:00
  • 2de4118243 fix: change GB to GiB in logging close #14979 (#15807) yihong 2025-04-01 01:00:50 +08:00
  • 239b7befdd [V1][Spec Decode] Remove deprecated spec decode config params (#15466) shangmingc 2025-04-01 00:19:35 +08:00
  • 09e974d483 [Bugfix] Check dimensions of multimodal embeddings in V1 (#15816) Cyrus Leung 2025-04-01 00:01:35 +08:00
  • e5ef4fa99a Upgrade transformers to v4.50.3 (#13905) Harry Mellor 2025-03-31 16:59:37 +01:00
  • 037bcd942c [Bugfix] Fix missing return value in load_weights method of adapters.py (#15542) Mrm 2025-03-31 21:56:42 +08:00
  • c2e7507ad4 [Bugfix] Fix Crashing When Loading Modules With Batchnorm Stats (#15813) Alex Brooks 2025-03-31 07:23:53 -06:00
  • 3aa2b6a637 [Model] Update support for NemotronNAS models (#15008) Naveassaf 2025-03-31 15:35:14 +03:00
  • 555aa21905 [V1] Fully Transparent Implementation of CPU Offloading (#15354) youkaichao 2025-03-31 20:22:34 +08:00
  • e7ae3bf3d6 fix: better install requirement for install in setup.py (#15796) yihong 2025-03-31 20:13:32 +08:00
  • b932c048ac Recommend developing with Python 3.12 in developer guide (#15811) Harry Mellor 2025-03-31 12:54:49 +01:00
  • e85829450d [Feature][ROCm]Enable fusion pass for torch.compile on ROCm (#15050) Charlie Fu 2025-03-31 06:42:18 -05:00
  • effc5d24fa [Benchmark] Update Vision Arena Dataset and HuggingFaceDataset Setup (#15748) Jennifer Zhao 2025-03-31 00:38:58 -07:00
  • 18ed3132d2 [Misc] update the comments (#15780) Chengyang LIU 2025-03-30 19:39:56 -07:00
  • 9b459eca88 [V1][Scheduler] Avoid calling _try_schedule_encoder_inputs for every request (#15778) Woosuk Kwon 2025-03-30 14:10:42 -07:00
  • 70fedd0f79 fix: Comments to English for better dev experience (#15768) yihong 2025-03-31 01:47:57 +08:00
  • bb103b29bf [Bugfix] Added embed_is_patch mask for fuyu model (#15731) kYLe 2025-03-30 05:45:08 -05:00
  • 248e76c4df fix: lint fix a ruff checkout syntax error (#15767) yihong 2025-03-30 18:36:02 +08:00
  • 803d5c35f3 [V1] Override mm_counts for dummy data creation (#15703) Cyrus Leung 2025-03-30 18:20:42 +08:00
  • 7fd8c0f85c fix test_phi3v (#15321) pansicheng 2025-03-30 17:01:34 +08:00
  • 44c3a5abc3 [doc] update conda to usage link in installation (#15761) Reid 2025-03-30 16:12:13 +08:00
  • 6909a76201 [Bugfix] Fix Mistral guided generation using xgrammar (#15704) Julien Denize 2025-03-30 05:20:19 +02:00
  • 045533716b [CI] xgrammar structured output supports Enum. (#15757) Chauncey 2025-03-30 11:20:02 +08:00
  • 3c0ff914ac [Bugfix] Fix Mllama interleaved images input support (#15564) Isotr0py 2025-03-30 02:11:15 +08:00
  • 2bc4be4e32 [V1][Minor] Simplify rejection sampler's parse_output (#15741) Woosuk Kwon 2025-03-29 09:25:17 -07:00
  • c67abd614f [V1] Support interleaved modality items (#15605) Roger Wang 2025-03-29 06:30:09 -07:00
  • 6fa7cd3dbc [Feature][Disaggregated] Support XpYd disaggregated prefill with MooncakeStore (#12957) shangmingc 2025-03-29 19:01:46 +08:00
  • 94744ba41a [V1] [Feature] Collective RPC (#15444) wwl2755 2025-03-29 05:39:14 -05:00
  • 4965ec42d2 [FEAT] [ROCm] Add AITER int8 scaled gemm kernel (#15433) TJian 2025-03-29 18:33:56 +08:00
  • 73aa7041bf [doc] update doc (#15740) Reid 2025-03-29 12:27:22 +08:00