Commit Graph

  • 1b49148e47 [Installation] Allow lower versions of FastAPI to maintain Ray 2.9 compatibility (#8764) Cyrus Leung 2024-09-27 07:54:09 +08:00
  • 4b377d6feb [BugFix] Fix test breakages from transformers 4.45 upgrade (#8829) Nick Hill 2024-09-27 00:46:43 +01:00
  • 71d21c73ab [Bugfix] Fixup advance_step.cu warning (#8815) Tyler Michael Smith 2024-09-26 19:23:45 -04:00
  • ee2da3e9ef fix validation: Only set tool_choice auto if at least one tool is provided (#8568) Chirag Jain 2024-09-27 04:53:17 +05:30
  • e2f6f26e86 [Bugfix] Fix print_warning_once's line info (#8867) Tyler Michael Smith 2024-09-26 19:18:26 -04:00
  • b28d2104de [Misc] Change dummy profiling and BOS fallback warns to log once (#8820) Michael Goin 2024-09-26 19:18:14 -04:00
  • 93d364da34 [Bugfix] Include encoder prompts len to non-stream api usage response (#8861) Pernekhan Utemuratov 2024-09-26 15:47:00 -07:00
  • d9cfbc891e [ci] Soft fail Entrypoints, Samplers, LoRA, Decoder-only VLM (#8872) Kevin H. Luu 2024-09-26 15:02:16 -07:00
  • 70de39f6b4 [misc][installation] build from source without compilation (#8818) youkaichao 2024-09-26 13:19:04 -07:00
  • 68988d4e0d [CI/Build] Fix missing ci dependencies (#8834) fyuan1316 2024-09-27 02:04:39 +08:00
  • 520db4dbc1 [Docs] Add README to the build docker image (#8825) Michael Goin 2024-09-26 14:02:52 -04:00
  • f70bccac75 [Build/CI] Upgrade to gcc 10 in the base build Docker image (#8814) Tyler Michael Smith 2024-09-26 13:07:18 -04:00
  • 4bb98f2190 [Misc] Update config loading for Qwen2-VL and remove Granite (#8837) Roger Wang 2024-09-26 07:45:30 -07:00
  • 7193774b1f [Misc] Support quantization of MllamaForCausalLM (#8822) (tag: v0.6.2) Michael Goin 2024-09-25 17:46:22 -04:00
  • e2c6e0a829 [Doc] Update doc for Transformers 4.45 (#8817) Roger Wang 2024-09-25 13:29:48 -07:00
  • 770ec6024f [Model] Add support for the multi-modal Llama 3.2 model (#8811) Chen Zhang 2024-09-25 13:29:32 -07:00
  • 4f1ba0844b Revert "rename PromptInputs and inputs with backward compatibility (#8760)" (#8810) Simon Mo 2024-09-25 10:36:26 -07:00
  • 873edda6cf [Misc] Support FP8 MoE for compressed-tensors (#8588) Michael Goin 2024-09-25 12:43:36 -04:00
  • 64840dfae4 [Frontend] MQLLMEngine supports profiling. (#8761) 科英 2024-09-26 00:37:41 +08:00
  • 28e1299e60 rename PromptInputs and inputs with backward compatibility (#8760) Cyrus Leung 2024-09-26 00:36:47 +08:00
  • 0c4d2ad5e6 [VLM][Bugfix] internvl with num_scheduler_steps > 1 (#8614) DefTruth 2024-09-26 00:35:53 +08:00
  • c6f2485c82 [Misc] Add extra deps for openai server image (#8792) Jee Jee Li 2024-09-26 00:35:23 +08:00
  • 300da09177 [Kernel] Fullgraph and opcheck tests (#8479) bnellnm 2024-09-25 10:35:52 -04:00
  • 1c046447a6 [CI/Build][Bugfix][Doc][ROCm] CI fix and doc update after ROCm 6.2 upgrade (#8777) Hongxia Yang 2024-09-25 10:26:37 -04:00
  • 8fae5ed7f6 [Misc] Fix minor typo in scheduler (#8765) Woo-Yeon Lee 2024-09-25 16:53:03 +09:00
  • 3368c3ab36 [Bugfix] Ray 2.9.x doesn't expose available_resources_per_node (#8767) David Newman 2024-09-25 17:52:26 +10:00
  • 1ac3de09cd [Frontend] OpenAI server: propagate usage accounting to FastAPI middleware layer (#8672) Adam Tilghman 2024-09-25 00:49:26 -07:00
  • 3e073e66f1 [Bugfix] load fc bias from config for eagle (#8790) sohamparikh 2024-09-25 02:16:30 -04:00
  • c23953675f [Hardware][CPU] Enable mrope and support Qwen2-VL on CPU backend (#8770) Isotr0py 2024-09-25 14:16:11 +08:00
  • e3dd0692fa [BugFix] Propagate 'trust_remote_code' setting in internvl and minicpmv (#8250) zifeitong 2024-09-24 22:53:43 -07:00
  • fc3afc20df Fix tests in test_chunked_prefill_scheduler which fail with BlockManager V2 (#8752) sroy745 2024-09-24 21:26:36 -07:00
  • b4522474a3 [Bugfix][Kernel] Implement acquire/release polyfill for Pascal (#8776) sasha0552 2024-09-25 04:26:33 +00:00
  • ee777d9c30 Fix test_schedule_swapped_simple in test_scheduler.py (#8780) sroy745 2024-09-24 21:26:18 -07:00
  • 6e0c9d6bd0 [Bugfix] Use heartbeats instead of health checks (#8583) Joe Runde 2024-09-24 21:37:38 -06:00
  • 6da1ab6b41 [Core] Adding Priority Scheduling (#5958) Archit Patke 2024-09-24 21:50:50 -05:00
  • 01b6f9e1f0 [Core][Bugfix] Support prompt_logprobs returned with speculative decoding (#8047) Travis Johnson 2024-09-24 18:29:56 -06:00
  • 13f9f7a3d0 [Misc] Upgrade bitsandbytes to the latest version 0.44.0 (#8768) Jee Jee Li 2024-09-25 08:08:55 +08:00
  • 1e7d5c01f5 [misc] soft drop beam search (#8763) youkaichao 2024-09-24 15:48:39 -07:00
  • 2467b642dd [CI/Build] fix setuptools-scm usage (#8771) Daniele 2024-09-24 21:38:12 +02:00
  • 72fc97a0f1 [Bugfix] Fix torch dynamo fixes caused by replace_parameters (#8748) Lucas Wilkinson 2024-09-24 14:33:21 -04:00
  • 2529d09b5a [Frontend] Batch inference for llm.chat() API (#8648) Andy 2024-09-24 12:44:11 -04:00
  • a928ded995 [Kernel] Split Marlin MoE kernels into multiple files (#8661) ElizaWszola 2024-09-24 18:31:42 +02:00
  • cc4325b66a [Bugfix] Fix potentially unsafe custom allreduce synchronization (#8558) Hanzhi Zhou 2024-09-24 01:08:14 -07:00
  • 8ff7ced996 [Model] Expose Phi3v num_crops as a mm_processor_kwarg (#8658) Alex Brooks 2024-09-24 01:36:46 -06:00
  • 3f06bae907 [Core][Model] Support loading weights by ID within models (#7931) Peter Salas 2024-09-24 00:14:15 -07:00
  • b8747e8a7c [MISC] Skip dumping inputs when unpicklable (#8744) Cody Yu 2024-09-23 23:10:03 -07:00
  • 3185fb0cca Revert "[Core] Rename PromptInputs to PromptType, and inputs to prompt" (#8750) Simon Mo 2024-09-23 22:45:20 -07:00
  • 0250dd68c5 re-implement beam search on top of vllm core (#8726) youkaichao 2024-09-23 22:08:12 -07:00
  • 88577ac928 Fix tests in test_scheduler.py that fail with BlockManager V2 (#8728) sroy745 2024-09-23 21:43:13 -07:00
  • 530821d00c [Hardware][AMD] ROCm6.2 upgrade (#8674) Hongxia Yang 2024-09-23 21:52:39 -04:00
  • 1a2aef3e59 Add output streaming support to multi-step + async while ensuring RequestOutput obj reuse (#8335) Alexander Matveev 2024-09-23 18:38:04 -04:00
  • 5f7bb58427 Fix typical acceptance sampler with correct recovered token ids (#8562) jiqing-feng 2024-09-24 03:32:27 +08:00
  • b05f5c9238 [Core] Allow IPv6 in VLLM_HOST_IP with zmq (#8575) Russell Bryant 2024-09-23 15:15:41 -04:00
  • 9b0e3ec970 [Kernel][LoRA] Add assertion for punica sgmv kernels (#7585) Jee Jee Li 2024-09-24 02:57:42 +08:00
  • 86e9c8df29 [Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701) Lucas Wilkinson 2024-09-23 13:46:26 -04:00
  • ee5f34b1c2 [CI/Build] use setuptools-scm to set __version__ (#4738) Daniele 2024-09-23 18:44:26 +02:00
  • f2bd246c17 [VLM] Fix paligemma, fuyu and persimmon with transformers 4.45 : use config.text_config.vocab_size (#8707) Jani Monoses 2024-09-23 17:43:09 +03:00
  • a79e522984 [Model] Support pp for qwen2-vl (#8696) Yanyi Liu 2024-09-23 21:46:59 +08:00
  • 3e83c12b5c [Bugfix][CPU] fix missing input intermediate_tensors in the cpu_model_runner (#8733) Li, Jiang 2024-09-23 21:15:16 +08:00
  • e551ca1555 [Hardware][CPU] Refactor CPU model runner (#8729) Isotr0py 2024-09-23 20:12:20 +08:00
  • 9b8c8ba119 [Core][Frontend] Support Passing Multimodal Processor Kwargs (#8657) Alex Brooks 2024-09-23 01:44:48 -06:00
  • d23679eb99 [Bugfix] fix docker build for xpu (#8652) Yan Ma 2024-09-23 13:54:18 +08:00
  • 57a0702e63 [Bugfix] Fix CPU CMake build (#8723) Luka Govedič 2024-09-22 23:40:46 -04:00
  • 3dda7c2250 [Bugfix] Avoid some bogus messages RE CUTLASS's revision when building (#8702) Tyler Michael Smith 2024-09-22 22:24:59 -04:00
  • 92ba7e7477 [misc] upgrade mistral-common (#8715) youkaichao 2024-09-22 15:41:59 -07:00
  • d4a2ac8302 [build] enable existing pytorch (for GH200, aarch64, nightly) (#8713) youkaichao 2024-09-22 12:47:54 -07:00
  • c6bd70d772 [SpecDec][Misc] Cleanup, remove bonus token logic. (#8701) Lily Liu 2024-09-22 12:34:14 -07:00
  • 5b59532760 [Model][VLM] Add LLaVA-Onevision model support (#8486) litianjian 2024-09-23 01:51:44 +08:00
  • ca2b628b3c [MISC] rename CudaMemoryProfiler to DeviceMemoryProfiler (#8703) Huazhong Ji 2024-09-23 01:44:09 +08:00
  • 8ca5051b9a [Misc] Use NamedTuple in Multi-image example (#8705) Alex Brooks 2024-09-22 06:56:20 -06:00
  • 06ed2815e2 [Model] Refactor BLIP/BLIP-2 to support composite model loading (#8407) Cyrus Leung 2024-09-22 20:24:21 +08:00
  • 0e40ac9b7b [ci][build] fix vllm-flash-attn (#8699) youkaichao 2024-09-21 23:24:58 -07:00
  • 13d88d4137 [Bugfix] Refactor composite weight loading logic (#8656) Isotr0py 2024-09-22 12:33:27 +08:00
  • d66ac62854 [Kernel][Bugfix] Delete some more useless code in marlin_moe_ops.cu (#8643) Tyler Michael Smith 2024-09-21 19:45:02 -04:00
  • 9dc7c6c7f3 [dbrx] refactor dbrx experts to extend FusedMoe class (#8518) Divakar Verma 2024-09-21 16:09:39 -05:00
  • ec4aaad812 [Kernel][Triton][AMD] Remove tl.atomic_add from awq_gemm_kernel, 2-5x speedup MI300, minor improvement for MI250 (#8646) rasmith 2024-09-21 04:20:54 -05:00
  • 4dfdf43196 [Doc] Fix typo in AMD installation guide (#8689) Andy Dai 2024-09-21 00:24:12 -07:00
  • 5e85f4f82a [VLM] Use SequenceData.from_token_counts to create dummy data (#8687) Cyrus Leung 2024-09-21 14:28:56 +08:00
  • 71c60491f2 [Kernel] Build flash-attn from source (#8245) Luka Govedič 2024-09-21 02:27:10 -04:00
  • 0faab90eb0 [beam search] add output for manually checking the correctness (#8684) youkaichao 2024-09-20 19:55:33 -07:00
  • 0455c46ed4 [Core] Factor out common code in SequenceData and Sequence (#8675) Cyrus Leung 2024-09-21 10:30:39 +08:00
  • d4bf085ad0 [MISC] add support custom_op check (#8557) Kunshang Ji 2024-09-21 10:03:55 +08:00
  • 0057894ef7 [Core] Rename PromptInputs and inputs (#8673) Cyrus Leung 2024-09-21 10:00:54 +08:00
  • 0f961b3ce9 [Bugfix] Fix incorrect llava next feature size calculation (#8496) zyddnys 2024-09-20 18:48:32 -04:00
  • 7f9c8902e3 [Hardware][AWS] update neuron to 2.20 (#8676) omrishiv 2024-09-20 15:19:44 -07:00
  • 7c8566aa4f [Doc] neuron documentation update (#8671) omrishiv 2024-09-20 15:04:37 -07:00
  • b4e4eda92e [Bugfix][Core] Fix tekken edge case for mistral tokenizer (#8640) Patrick von Platen 2024-09-20 23:33:03 +02:00
  • 2874bac618 [Bugfix] Config got an unexpected keyword argument 'engine' (#8556) Pastel! 2024-09-21 05:00:45 +08:00
  • 035fa895ec [Misc] Show AMD GPU topology in collect_env.py (#8649) Cyrus Leung 2024-09-21 04:52:19 +08:00
  • b28298f2f4 [Bugfix] Validate SamplingParam n is an int (#8548) saumya-saran 2024-09-20 12:46:02 -07:00
  • 2940afa04e [CI/Build] Removing entrypoints/openai/test_embedding.py test from ROCm build (#8670) Alexey Kondratiev(AMD) 2024-09-20 13:27:44 -04:00
  • 3b63de9353 [Model] Add OLMoE (#7922) Niklas Muennighoff 2024-09-20 09:31:41 -07:00
  • 260d40b5ea [Core] Support Lora lineage and base model metadata management (#6315) Jiaxin Shan 2024-09-19 23:20:56 -07:00
  • 9e5ec35b1f [bugfix] [AMD] add multi-step advance_step to ROCmFlashAttentionMetadata (#8474) William Lin 2024-09-19 20:49:54 -07:00
  • 18ae428a0d [Bugfix] Fix Phi3.5 mini and MoE LoRA inference (#8571) Amit Garg 2024-09-19 17:54:02 -07:00
  • de6f90a13d [Misc] guard against change in cuda library name (#8609) bnellnm 2024-09-19 18:36:30 -04:00
  • 6cb748e190 [CI/Build] Re-enabling Entrypoints tests on ROCm, excluding ones that fail (#8551) Alexey Kondratiev(AMD) 2024-09-19 16:06:32 -04:00
  • 9e99407e3c Create SECURITY.md (#8642) Simon Mo 2024-09-19 12:16:28 -07:00
  • ea4647b7d7 [Doc] Add documentation for GGUF quantization (#8618) Isotr0py 2024-09-20 03:15:55 +08:00
  • e42c634acb [Core] simplify logits resort in _apply_top_k_top_p (#8619) 盏一 2024-09-20 02:28:25 +08:00