Commit Graph

  • 371f7e4ca2 [Doc] Fix broken links and unlinked docs, add shortcuts to home sidebar (#18627) Cyrus Leung 2025-05-24 01:22:40 +08:00
  • 15b45ffb9a [Doc] Avoid documenting dynamic / internal modules (#18626) Cyrus Leung 2025-05-24 00:58:02 +08:00
  • 273cb3b4d9 [Doc] Fix top-level API links/docs (#18621) Cyrus Leung 2025-05-24 00:46:56 +08:00
  • 8ddd1cf26a [Doc] fix list formatting (#18624) David Xia 2025-05-23 12:41:17 -04:00
  • 6550114c9c [v1] Redo "Support multiple KV cache groups in GPU model runner (#17945)" (#18593) Chen Zhang 2025-05-24 00:39:47 +08:00
  • 9520a989df [Docs] Change mkdocs to not use directory urls (#18622) Michael Goin 2025-05-23 12:33:21 -04:00
  • 3d28ad343f Fix figures in design doc (#18612) Harry Mellor 2025-05-23 17:09:54 +01:00
  • 6a7988c55b Refactor pplx init logic to make it modular (prepare for deepep) (#18200) youkaichao 2025-05-23 23:43:43 +08:00
  • 022d8abe29 [Doc] Use a different color for the announcement (#18616) Cyrus Leung 2025-05-23 23:25:03 +08:00
  • 5221815a00 [Doc] Fix markdown list indentation for MkDocs rendering (#18620) Hyogeun Oh (오효근) 2025-05-24 00:23:21 +09:00
  • 1068556b2c [Bugfix][Build/CI] Fixup CUDA compiler version check for CUDA_SUPPORTED_ARCHS (#18579) Simon Mo 2025-05-23 07:43:58 -07:00
  • 2cd1fa4556 [Misc] add Haystack integration (#18601) Reid 2025-05-23 21:21:19 +08:00
  • d4c2919760 Include private attributes in API documentation (#18614) Harry Mellor 2025-05-23 15:18:31 +02:00
  • 6220f3c6b0 [Bugfix] Fix transformers model impl ignored for mixtral quant (#18602) Tristan Leclercq 2025-05-23 14:54:13 +02:00
  • 52fb23f47e Fix examples with code blocks in docs (#18609) Harry Mellor 2025-05-23 14:53:44 +02:00
  • 6dd51c7ef1 [CI/Build] Fix V1 flag being set in entrypoints tests (#18598) Cyrus Leung 2025-05-23 20:51:53 +08:00
  • 2edb533af2 Replace {func} with mkdocs style links (#18610) Harry Mellor 2025-05-23 14:51:38 +02:00
  • 38a95cb4a8 [Doc] Fix indent of contributing to vllm (#18611) Hyogeun Oh (오효근) 2025-05-23 21:50:07 +09:00
  • cd821ea5d2 [CI] fix kv_cache_type argument (#18594) Ning Xie 2025-05-23 19:49:18 +08:00
  • 7ab056c273 [Hardware][CPU] Update intel_extension_for_pytorch 2.7.0 and move to requirements/cpu.txt (#18542) Kay Yan 2025-05-23 19:38:42 +08:00
  • 6526e05111 Add myself as docs code owner (#18605) Harry Mellor 2025-05-23 13:08:31 +02:00
  • e493e48524 [V0][Bugfix] Fix parallel sampling performance regression when guided decoding is enabled (#17731) Madeesh Kannan 2025-05-23 12:38:23 +02:00
  • 4ce64e2df4 [Bugfix][Model] Fix baichuan model loader for tp (#18597) Mengqing Cao 2025-05-23 17:39:05 +08:00
  • fbb13a2c15 Revert "[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (#18034)" (#18600) Cyrus Leung 2025-05-23 17:18:22 +08:00
  • a1fe24d961 Migrate docs from Sphinx to MkDocs (#18145) Harry Mellor 2025-05-23 11:09:53 +02:00
  • d0bc2f810b [Bugfix] Add half type support in reshape_and_cache_cpu_impl on x86 cpu platform (#18430) Yuqi Zhang 2025-05-23 01:41:37 -07:00
  • b046cf792d [Feature][V1]: suupports cached_tokens in response usage (#18149) Chauncey 2025-05-23 16:41:03 +08:00
  • 54af915949 [Doc] Update quickstart and install for cu128 using --torch-backend=auto (#18505) Michael Goin 2025-05-23 04:36:37 -04:00
  • 71ea614d4a [Feature]Add async tensor parallelism using compilation pass (#17882) cascade 2025-05-23 01:03:34 -07:00
  • 4c611348a7 [V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (#18034) RonaldBXu 2025-05-23 00:37:18 -07:00
  • 60cad94b86 [Hardware] correct method signatures for HPU,ROCm,XPU (#18551) Ning Xie 2025-05-23 13:31:59 +08:00
  • 9c1baa5bc6 [Misc] Replace cuda hard code with current_platform (#16983) Shanshan Shen 2025-05-23 12:38:50 +08:00
  • 4be2255c81 [Bugfix][Benchmarks] Fix a benchmark of deepspeed-mii backend to use api_key (#17291) Teruaki Ishizaki 2025-05-23 13:30:47 +09:00
  • ed5d408255 [Neuron] Remove bypass on EAGLEConfig and add a test (#18514) aws-elaineyz 2025-05-22 21:26:32 -07:00
  • 583507d130 [Spec Decode] Make EAGLE3 draft token ID mapping optional (#18488) Benjamin Chislett 2025-05-22 23:17:39 -04:00
  • e44d8ce8c7 [Bugfix] Set KVTransferConfig.engine_id in post_init (#18576) lkchen 2025-05-22 19:54:42 -07:00
  • 93ecb8139c [BugFix] Increase TP execute_model timeout (#18558) Nick Hill 2025-05-22 19:22:11 -07:00
  • fae453f8ce [Misc] refactor: simplify input validation and num_requests handling in _convert_v1_inputs (#18482) CYJiang 2025-05-23 10:15:32 +08:00
  • 4b0da7b60e Enable hybrid attention models for Transformers backend (#18494) Harry Mellor 2025-05-23 04:12:08 +02:00
  • c6b636f9fb [V1][Spec Decoding] Use model_loader.get_model() to load models (#18273) Mark McLoughlin 2025-05-23 03:05:44 +01:00
  • 04eb88dc80 Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. (#18569) Chenheli Hua 2025-05-22 18:59:18 -07:00
  • 46791e1b4b [AMD] [P/D] Compute num gpus for ROCm correctly in run_accuracy_test.sh (#18568) rasmith 2025-05-22 20:45:35 -05:00
  • c32e249a23 [Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization (#17926) Sanger Steel 2025-05-22 21:44:18 -04:00
  • c91fe7b1b9 [Frontend][Bug Fix] Update llama4 pythonic jinja template and llama4_pythonic parser (#17917) Kai Wu 2025-05-22 16:44:08 -07:00
  • a04720bc36 [V1][Spec Decode][Bugfix] Load quantize weights for EAGLE (#18290) Ekagra Ranjan 2025-05-22 18:17:33 -04:00
  • 7b9d832c80 [Tool] Add NIXL installation script (#18172) lkchen 2025-05-22 14:33:16 -07:00
  • 6e588da0f4 [Build/CI] Fix CUDA 11.8 build (#17679) Tyler Michael Smith 2025-05-22 15:13:54 -04:00
  • f8d2cc5f55 [Compile][Platform] Make PiecewiseBackend pluggable and extendable (#18076) Mengqing Cao 2025-05-23 03:11:53 +08:00
  • 721fb9b181 [Platform] Move platform check to right place (#18470) wangxiyuan 2025-05-23 03:11:28 +08:00
  • 1f3a1200e4 [Bugfix] make test_openai_schema.py pass (#18224) David Xia 2025-05-22 14:34:06 -04:00
  • 54631f8262 [Misc] Call ndarray.tobytes() directly instead of ndarray.data.tobytes() (#18347) Lukas Geiger 2025-05-22 17:00:13 +01:00
  • cb506ecb5a [Misc] improve Automatic Prefix Caching example (#18554) Reid 2025-05-22 22:50:46 +08:00
  • 93f71673ce [BugFix][CPU] Fix x86 SHM distributed module initialization (#18536) Li, Jiang 2025-05-22 22:35:00 +08:00
  • 3f505233fd [Doc] Add stream flag for chat completion example (#18524) Calvin Chen 2025-05-22 22:07:10 +08:00
  • 4e04eceb58 [Bugfix] Use random hidden states in dummy sampler run (#18543) Bowen Wang 2025-05-22 06:48:56 -07:00
  • 71075029f2 [Doc] Support --stream arg in openai_completion_client.py script (#18388) CYJiang 2025-05-22 21:20:17 +08:00
  • ca86a7cf6e [CI/Build] Update bamba test model location (#18544) Harry Mellor 2025-05-22 15:01:07 +02:00
  • a35a494745 [Bugfix] Add kwargs to RequestOutput __init__ to be forward compatible (#18513) lkchen 2025-05-22 05:24:43 -07:00
  • f6037d1907 [Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (#18526) 2025-05-22 20:22:53 +08:00
  • fa72f9a812 Order sequence ids + config update to support specifying custom quantization layers (#18279) aws-elaineyz 2025-05-22 02:20:36 -07:00
  • ebed81fbf5 Update default neuron config for speculation (#18274) aws-elaineyz 2025-05-22 02:18:55 -07:00
  • e2d7d31244 [Neuron] Update Dockerfile.neuron to use latest neuron release (2.23) (#18512) Satyajith Chilappagari 2025-05-22 02:17:34 -07:00
  • 23b67b37b2 [Doc] Fix invalid JSON in example args (#18527) Cyrus Leung 2025-05-22 15:11:46 +08:00
  • db5a29ba19 [Bugfix] Fix LoRA test (#18518) Jee Jee Li 2025-05-22 12:48:53 +08:00
  • 51797775c3 [Bugfix][Model] Make Olmo2Model weight loading return loaded weights (#18504) Shane A 2025-05-21 21:17:03 -07:00
  • cf5984b2fe [BugFix][DP] Send DP wave completion only from dp_rank==0 (#18502) Nick Hill 2025-05-21 20:25:25 -07:00
  • d022115cc6 [Bugfix] Inconsistent token calculation compared to HF in llava family (#18479) youngrok cha 2025-05-22 12:21:47 +09:00
  • acb54ca8e1 Intialize io_thread_pool attribute in the beginning. (#18331) Rabi Mishra 2025-05-22 08:51:14 +05:30
  • 6e0fd34d3c [CI] Fix race condition with StatelessProcessGroup.barrier (#18506) Russell Bryant 2025-05-21 23:19:13 -04:00
  • 176d62e4ea [MISC] update project urls in pyproject.toml (#18519) Ning Xie 2025-05-22 11:17:34 +08:00
  • 20bd6f4d2e [FalconH1] Fix output dtype in RMSNorm fallback path for Falcon-H1 (e.g. 0.5B) (#18500) Dhia Eddine Rhaiem 2025-05-22 06:23:59 +04:00
  • 1f079540db [Bugfix] Consistent ascii handling in tool parsers (#17704) Sebastian Schoennenbeck 2025-05-21 22:41:23 +02:00
  • 94d8ec8d2b [FEAT][ROCm] Upgrade AITER MLA v1 backend (#18338) vllmellm 2025-05-22 01:34:28 +08:00
  • bb0a311213 Revert "[v1] Support multiple KV cache groups in GPU model runner (#17945) (#18459) Mark McLoughlin 2025-05-21 18:25:23 +01:00
  • dd5fa7e04f [ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 (#17004) Hosang 2025-05-21 11:35:00 -04:00
  • 2b16104557 [Misc] Update deprecation message for --enable-reasoning (#18404) Hyogeun Oh (오효근) 2025-05-21 23:33:11 +09:00
  • 371376f996 [Build] fix Dockerfile shell (#18402) Kebe 2025-05-21 22:32:06 +08:00
  • c6c10ca920 [Bugfix] Reduce moe_sum test size to avoid OOM (#18484) bnellnm 2025-05-21 09:46:39 -04:00
  • c154d89306 [Doc] fix arg docstring in linear layers (#18410) GiantCroc 2025-05-21 21:45:57 +08:00
  • eca18691d2 [MODEL] FalconH1 (#18406) Dhia Eddine Rhaiem 2025-05-21 15:59:06 +04:00
  • 61acfc45bc [Bugfix][Failing Test] Fix test_events.py (#18460) Rabi Mishra 2025-05-21 17:27:28 +05:30
  • 107f5fc4cb [Misc] refactor disaggregated-prefill-v1 example (#18474) Reid 2025-05-21 19:10:14 +08:00
  • 907f935de9 [V1] Fix general plugins not loaded in engine for multiproc (#18326) Yong Hoon Shin 2025-05-21 01:21:49 -07:00
  • 5d7f545204 [Frontend] deprecate --device arg (#18399) Kebe 2025-05-21 16:21:17 +08:00
  • cd8dfc6dfc [Misc] MultiConnector._connectors type (#18423) Nicolò Lucchesi 2025-05-21 07:48:43 +02:00
  • d06dd72ba9 [Bugfix][Failing Test] Fix nixl connector test when promt size < block size (#18429) wwl2755 2025-05-21 00:41:44 -05:00
  • ad0012a0ac Revert "[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (#18407)" (#18456) Cyrus Leung 2025-05-21 13:39:22 +08:00
  • 92247c522e [Bug] Fix moe_sum signature (#18440) bnellnm 2025-05-21 01:37:08 -04:00
  • 0c15c2e486 [Bugfix] config.head_dim is now explicitly set to None (#18432) Gregory Shtrasberg 2025-05-21 00:04:33 -04:00
  • 3b17ea26e4 [TPU] Re-enable the Pallas MoE kernel (#18025) Michael Goin 2025-05-20 22:52:27 -04:00
  • 23baa2180b fix:Build torch wheel inline rather than picking from nightly (#18351) Dilip Gowda Bhagavan 2025-05-21 03:52:24 +05:30
  • 980a172474 [Kernel] update comment for KV shape in unified triton attn (#18099) Percy 2025-05-20 13:19:34 -05:00
  • e1f5a71ed7 [Model] use AutoWeightsLoader for bloom (#18300) Calvin Chen 2025-05-21 00:40:05 +08:00
  • f4a8a37465 [Minor] Rename quantization nvfp4 to modelopt_fp4 (#18356) Michael Goin 2025-05-20 12:08:37 -04:00
  • 8f55962a7f [Misc] refactor prompt embedding examples (#18405) Reid 2025-05-20 23:26:12 +08:00
  • be48360c1f [Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (#18407) 2025-05-20 21:59:48 +08:00
  • 86847700d7 [CI] Add mteb testing to test the accuracy of the embedding model (#17175) wang.yuqi 2025-05-20 21:51:12 +08:00
  • d6c86d09ae Update cpu.txt (#18398) 汪志鹏 2025-05-20 18:53:23 +08:00
  • 6b35cb10a0 [Misc] Add LoRA code owner (#18387) Jee Jee Li 2025-05-20 18:27:30 +08:00
  • 1b1e8e05ff [doc] update env variable export (#18391) Reid 2025-05-20 16:53:27 +08:00