Commit Graph

  • a62bc0109c [Misc] Add Gamma-Distribution Request Generation Support for Serving Benchmark. (#10105) Atlas 2024-11-07 19:20:30 +08:00
  • 999df95b4e [Bugfix] Make image processor respect mm_processor_kwargs for Qwen2-VL (#10112) Jiahao Li 2024-11-07 18:50:44 +08:00
  • a6f332d0d9 [Hardware][CPU][bugfix] Fix half dtype support on AVX2-only target (#10108) Li, Jiang 2024-11-07 18:42:50 +08:00
  • 0dfba97b42 [Frontend] Fix multiple values for keyword argument error (#10075) (#10076) Lei Yang 2024-11-07 17:07:19 +08:00
  • aa9078fa03 Adds method to read the pooling types from model's files (#9506) Flávia Béo 2024-11-07 05:42:40 -03:00
  • e036e527a0 [CI/Build] Improve mypy + python version matrix (#10041) Russell Bryant 2024-11-07 02:54:16 -05:00
  • 6192e9b8fe [Core][Distributed] Refactor ipc buffer init in CustomAllreduce (#10030) Hanzhi Zhou 2024-11-06 23:50:47 -08:00
  • d7263a1bb8 Doc: Improve benchmark documentation (#9927) Rafael Vasquez 2024-11-07 02:50:35 -05:00
  • 104d729656 [CI/Build] re-add codespell to CI (#10083) Russell Bryant 2024-11-07 01:54:46 -05:00
  • db7db4aab9 [Misc] Consolidate ModelConfig code related to HF config (#10104) Cyrus Leung 2024-11-07 14:00:21 +08:00
  • 1fa020c539 [V1][BugFix] Fix Generator construction in greedy + seed case (#10097) Nick Hill 2024-11-07 05:06:57 +00:00
  • e7b84c394d [doc] add back Python 3.8 ABI (#10100) youkaichao 2024-11-06 21:06:41 -08:00
  • a4b3e0c1e9 [Hardware][CPU] Update torch 2.5 (#9911) Li, Jiang 2024-11-07 12:43:08 +08:00
  • 29862b884b [Frontend] Adjust try/except blocks in API impl (#10056) Nick Hill 2024-11-07 04:07:51 +00:00
  • d3859f1891 [Misc][XPU] Upgrade to Pytorch 2.5 for xpu backend (#9823) Yan Ma 2024-11-07 09:29:03 +08:00
  • 4ab3256644 [Bugfix] Fix FP8 torch._scaled_mm fallback for torch>2.5 with CUDA<12.4 (#10095) Michael Goin 2024-11-06 19:54:13 -05:00
  • 719c1ca468 [core][distributed] add stateless_init_process_group (#10072) youkaichao 2024-11-06 16:42:09 -08:00
  • 74f2f8a0f1 [CI/Build] Always run the ruff workflow (#10092) Russell Bryant 2024-11-06 17:25:23 -05:00
  • d58268c56a [V1] Make v1 more testable (#9888) Joe Runde 2024-11-06 12:57:35 -07:00
  • 87bd7e0515 [CI/Build] change conflict PR comment from mergify (#10080) Russell Bryant 2024-11-06 13:15:42 -05:00
  • 098f94de42 [CI/Build] Drop Python 3.8 support (#10038) Russell Bryant 2024-11-06 09:31:01 -05:00
  • 399c798608 Remove ScaledActivation for AWQ (#10057) Michael Goin 2024-11-06 09:27:06 -05:00
  • 406d4cc480 [Model][LoRA]LoRA support added for Qwen2VLForConditionalGeneration (#10022) Eric 2024-11-06 22:13:15 +08:00
  • a5bba7d234 [Model] Add Idefics3 support (#9767) Jee Jee Li 2024-11-06 19:41:17 +08:00
  • 2003cc3513 [Model][LoRA]LoRA support added for LlamaEmbeddingModel (#10071) Jee Jee Li 2024-11-06 17:49:19 +08:00
  • 6a585a23d2 [Hotfix] Fix ruff errors (#10073) Woosuk Kwon 2024-11-06 01:24:28 -08:00
  • a02a50e6e5 [Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend (#6143) Konrad Zawora 2024-11-06 10:09:10 +01:00
  • a5fda50a10 [CI/Build] Fix large_gpu_mark reason (#10070) Isotr0py 2024-11-06 16:50:37 +08:00
  • 21063c11c7 [CI/Build] drop support for Python 3.8 EOL (#8464) Aaron Pham 2024-11-06 02:11:55 -05:00
  • 4be3a45158 [distributed] add function to create ipc buffers directly (#10064) youkaichao 2024-11-05 22:35:03 -08:00
  • 4089985552 [V1] Integrate Piecewise CUDA graphs (#10058) Woosuk Kwon 2024-11-05 22:16:04 -08:00
  • 9d59b75593 [Bugfix] Remove CustomChatCompletionContentPartParam multimodal input type (#10054) zifeitong 2024-11-05 21:13:09 -08:00
  • ea928f608c [Bugfix] Gpt-j-6B patch kv_scale to k_scale path (#10063) arakowsk-amd 2024-11-05 21:10:40 -08:00
  • 2bcbae704c [Bugfix] Fix edge-case crash when using chat with the Mistral Tekken Tokenizer (#10051) Travis Johnson 2024-11-05 21:28:29 -07:00
  • ffc0f2b47a [Model][OpenVINO] Fix regressions from #8346 (#10045) Peter Salas 2024-11-05 20:19:15 -08:00
  • 82bfc38d07 [Misc] Sort the list of embedding models (#10037) Cyrus Leung 2024-11-06 12:05:05 +08:00
  • c4cacbaa7f [v1] reduce graph capture time for piecewise cudagraph (#10059) youkaichao 2024-11-05 18:19:50 -08:00
  • 0c63c34f72 [Bugfix][SpecDecode] kv corruption with bonus tokens in spec decode (#9730) Sungjae Lee 2024-11-06 10:45:45 +09:00
  • 966e31697b [Bugfix] Fix pickle of input when async output processing is on (#9931) Wallas Henrique 2024-11-05 21:39:26 -03:00
  • 43300bd98a [Bugfix] Properly propagate trust_remote_code settings (#10047) zifeitong 2024-11-05 16:34:40 -08:00
  • ca9844b340 [bugfix] fix weak ref in piecewise cudagraph and tractable test (#10048) youkaichao 2024-11-05 14:49:20 -08:00
  • 235366fe2e [CI] Prune back the number of tests in tests/kernels/* (#9932) Michael Goin 2024-11-05 16:02:32 -05:00
  • 02462465ea [CI] Prune tests/models/decoder_only/language/* tests (#9940) Michael Goin 2024-11-05 16:02:23 -05:00
  • b9c64c0ca7 [Misc] Modify BNB parameter name (#9997) Jee Jee Li 2024-11-06 03:40:08 +08:00
  • d2e80332a7 [Feature] Update benchmark_throughput.py to support image input (#9851) lkchen 2024-11-05 11:30:02 -08:00
  • a53046b16f [Model] Support quantization of PixtralHFTransformer for PixtralHF (#9921) Michael Goin 2024-11-05 13:42:20 -05:00
  • 731aec5be7 [CI/Build] Limit github CI jobs based on files changed (#9928) Russell Bryant 2024-11-05 13:30:42 -05:00
  • 09d3550372 [Misc] Add logging for CUDA memory (#10027) Chenghao (Alan) Yang 2024-11-05 11:50:50 -06:00
  • cd34029e91 Refactor TPU requirements file and pin build dependencies (#10010) Richard Liu 2024-11-05 08:48:44 -08:00
  • 5952d81139 [Frontend] Fix tcp port reservation for api server (#10012) Russell Bryant 2024-11-05 10:50:57 -05:00
  • 93dee88f6b [Misc] vllm CLI flags should be ordered for better user readability (#10017) Chauncey 2024-11-05 18:59:56 +08:00
  • 7a83b1aec0 [BugFix] Lazy import ray (#10021) Gene Der Su 2024-11-05 02:04:10 -08:00
  • ad23318928 [Bugfix] Fixup Mamba (#10004) Tyler Michael Smith 2024-11-04 22:46:38 -05:00
  • bbc3619dc8 [Core] Make encoder-decoder inputs a nested structure to be more composable (#9604) Cyrus Leung 2024-11-05 10:07:31 +08:00
  • 04bbf38e05 [Core] Use os.sched_yield in ShmRingBuffer instead of time.sleep (#9994) Tyler Michael Smith 2024-11-04 20:08:21 -05:00
  • 8f0a9ca890 [Bugfix] Respect modules_to_not_convert within awq_marlin (#9895) Michael Goin 2024-11-04 18:57:44 -05:00
  • 2094062b4e [4.5/N] bugfix for quant config in speculative decode (#10007) youkaichao 2024-11-04 15:11:59 -08:00
  • d93478b399 [Bugfix] Upgrade to pytorch 2.5.1 (#10001) bnellnm 2024-11-04 18:11:28 -05:00
  • ac04a97a9f [Frontend] Add max_tokens prometheus metric (#9881) tomeras91 2024-11-05 00:53:24 +02:00
  • 9a5664d4a4 [Misc] Refactor benchmark_throughput.py (#9779) lkchen 2024-11-04 14:32:16 -08:00
  • 04cef2c6ab [Bugfix] Fix MQLLMEngine hanging (#9973) Robert Shaw 2024-11-04 16:01:43 -05:00
  • 6e056bcf04 [Doc] Update VLM doc about loading from local files (#9999) Roger Wang 2024-11-04 11:47:11 -08:00
  • 5208dc7a20 [Bugfix][CI/Build][Hardware][AMD] Shard ID parameters in AMD tests running parallel jobs (#9279) hissu-hyvarinen 2024-11-04 21:37:46 +02:00
  • 1c45f4c385 [CI] Basic Integration Test For TPU (#9968) Robert Shaw 2024-11-04 14:34:26 -05:00
  • 603a661ae8 [Model] factoring out MambaMixer out of Jamba (#8993) Mor Zusman 2024-11-04 20:00:00 +02:00
  • fb2716d641 [Misc]Reduce BNB static variable (#9987) Jee Jee Li 2024-11-05 01:04:40 +08:00
  • 8d72bb20fa [4/N] make quant config first-class citizen (#9978) youkaichao 2024-11-04 08:51:31 -08:00
  • ac6b8f19b9 [Frontend] Multi-Modality Support for Loading Local Image Files (#9915) Chauncey 2024-11-04 23:34:57 +08:00
  • ccb5376a9a [Bugfix][OpenVINO] Fix circular reference #9939 (#9974) Mengqing Cao 2024-11-04 18:14:13 +08:00
  • ea4adeddc1 [Bugfix] Fix E2EL mean and median stats (#9984) Tran Quang Dai 2024-11-04 16:37:58 +07:00
  • 4dbcbbeb09 [Misc] Compute query_start_loc/seq_start_loc on CPU (#9447) Yang Zheng 2024-11-04 16:54:37 +08:00
  • b67feb1274 [Bugfix]Using the correct type hints (#9885) Gregory Shtrasberg 2024-11-04 01:19:51 -05:00
  • c49f0407ba [Bugfix] Fix MiniCPMV and Mllama BNB bug (#9917) Jee Jee Li 2024-11-04 11:36:41 +08:00
  • 91c9ebbb1b [V1] Fix Configs (#9971) Robert Shaw 2024-11-03 19:24:40 -05:00
  • 54597724f4 [Model] Add support for H2OVL-Mississippi models (#9747) shanshan wang 2024-11-03 18:15:36 -06:00
  • 1f1b6d6eda [V1] Support per-request seed (#9945) Nick Hill 2024-11-03 17:14:17 +00:00
  • 3bb4befea7 [bugfix] fix tsts (#9959) youkaichao 2024-11-02 15:54:05 -07:00
  • ae5279a163 [torch.compile] Adding torch compile to vision-language models (#9946) Yongzao 2024-11-03 03:56:05 +08:00
  • 1b73ab2a1f [CI/Build] Quoting around > (#9956) Nikita Furin 2024-11-02 22:50:28 +03:00
  • cea808f325 [3/N] model runner pass the whole config to model (#9958) youkaichao 2024-11-02 12:08:49 -07:00
  • 74b529ceee [bugfix] fix chatglm dummy_data_for_glmv (#9955) youkaichao 2024-11-02 08:03:33 -07:00
  • d6459b4516 [V1] Fix EngineArgs refactor on V1 (#9954) Robert Shaw 2024-11-02 10:44:38 -04:00
  • e893795443 [2/N] executor pass the complete config to worker/modelrunner (#9938) youkaichao 2024-11-02 07:35:05 -07:00
  • 1d4cfe2be1 [Doc] Updated tpu-installation.rst with more details (#9926) Michael Green 2024-11-02 14:06:45 +00:00
  • eed92f12fc [Docs] Update Granite 3.0 models in supported models table (#9930) Nick Hill 2024-11-02 09:02:18 +00:00
  • af7380d83b [torch.compile] fix cpu broken code (#9947) youkaichao 2024-11-01 23:35:47 -07:00
  • a78dd3303e [Encoder Decoder] Add flash_attn kernel support for encoder-decoder models (#9559) sroy745 2024-11-01 23:22:49 -07:00
  • d522034c85 [ci/build] Have dependabot ignore pinned dependencies (#9935) Kevin H. Luu 2024-11-01 13:56:13 -10:00
  • 6c0b7f548d [Core][VLM] Add precise multi-modal placeholder tracking (#8346) Peter Salas 2024-11-01 16:21:10 -07:00
  • d151fde834 [ci/build] Bump the patch-update group with 10 updates (#9897) dependabot[bot] 2024-11-01 23:04:42 +00:00
  • 27cd36e6e2 [Bugfix] PicklingError on RayTaskError (#9934) Gene Der Su 2024-11-01 15:08:23 -07:00
  • 18bd7587b7 [1/N] pass the complete config from engine to executor (#9933) youkaichao 2024-11-01 13:51:57 -07:00
  • 598b6d7b07 [Bugfix/Core] Flashinfer k_scale and v_scale (#9861) Pavani Majety 2024-11-01 12:15:05 -07:00
  • aff1fd8188 [torch.compile] use interpreter with stable api from pytorch (#9889) youkaichao 2024-11-01 11:50:37 -07:00
  • 4581d2cc02 [Core] Refactor: Clean up unused argument in Scheduler._preempt (#9696) André Jonasson 2024-11-01 19:41:38 +01:00
  • 1dd4cb2935 [Bugfix] Fix edge cases for MistralTokenizer (#9625) Travis Johnson 2024-11-01 11:33:15 -06:00
  • ba0d892074 [Frontend] Use a proper chat template for VLM2Vec (#9912) Cyrus Leung 2024-11-01 22:09:07 +08:00
  • 30a2e80742 [CI/Build] Add Model Tests for PixtralHF (#9813) Michael Goin 2024-11-01 09:55:29 -04:00
  • 06386a64dd [Frontend] Chat-based Embeddings API (#9759) Cyrus Leung 2024-11-01 16:13:35 +08:00
  • d3aa2a8b2f [Doc] Update multi-input support (#9906) Cyrus Leung 2024-11-01 15:34:49 +08:00