Commit Graph

  • 7a3a83e3b8 [CI/Build] Move model-specific multi-modal processing tests (#11934) Cyrus Leung 2025-01-11 13:50:05 +08:00
  • c32a7c7c0c [Bugfix] fused_experts_impl wrong compute type for float32 (#11921) shaochangxu 2025-01-11 13:49:39 +08:00
  • 2118d0565c [Bugfix][SpecDecode] Adjust Eagle model architecture to align with intended design (#11672) Sungjae Lee 2025-01-11 13:49:38 +09:00
  • 899136b857 [ci] fix broken distributed-tests-4-gpus (#11937) youkaichao 2025-01-11 09:07:24 +08:00
  • c9f09a4fe8 [mypy] Fix mypy warnings in api_server.py (#11941) Fred Reiss 2025-01-10 17:04:58 -08:00
  • d45cbe70f5 [Bugfix] Check that number of images matches number of <|image|> tokens with mllama (#11939) Travis Johnson 2025-01-10 16:26:00 -07:00
  • 8a579408f3 [Misc] Update benchmark_prefix_caching.py fixed example usage (#11920) minmin 2025-01-11 04:39:22 +08:00
  • 46fa98ccad [Misc] Clean up debug code in Deepseek-V3 (#11930) Isotr0py 2025-01-11 03:19:15 +08:00
  • aa1e77a19c [Hardware][CPU] Support MOE models on x86 CPU (#11831) Li, Jiang 2025-01-11 00:07:58 +08:00
  • 5959564f94 Doc fix in benchmark_long_document_qa_throughput.py (#11933) Kuntai Du 2025-01-10 23:51:43 +08:00
  • f33e033e27 [Docs] Fix docstring in get_ip function (#11932) Kuntai Du 2025-01-10 23:51:02 +08:00
  • 482cdc494e [Doc] Rename offline inference examples (#11927) Harry Mellor 2025-01-10 15:50:29 +00:00
  • 20410b2fda [platform] support custom torch.compile backend key (#11318) wangxiyuan 2025-01-10 23:46:51 +08:00
  • 12664ddda5 [Doc] [1/N] Initial guide for merged multi-modal processor (#11925) Cyrus Leung 2025-01-10 22:30:25 +08:00
  • 241ad7b301 [ci] Fix sampler tests (#11922) youkaichao 2025-01-10 20:45:33 +08:00
  • d85c47d6ad Replace "online inference" with "online serving" (#11923) Harry Mellor 2025-01-10 12:05:56 +00:00
  • ef725feafc [platform] support pytorch custom op pluggable (#11328) wangxiyuan 2025-01-10 18:02:38 +08:00
  • d907be7dc7 [misc] remove python function call for custom activation op (#11885) cennn 2025-01-10 17:18:25 +08:00
  • d53575a5f0 [ci] fix gh200 tests (#11919) youkaichao 2025-01-10 16:25:17 +08:00
  • 61af633256 [BUGFIX] Fix UnspecifiedPlatform package name (#11916) Kunshang Ji 2025-01-10 16:20:46 +08:00
  • ac2f3f7fee [Bugfix] Validate lora adapters to avoid crashing server (#11727) Joe Runde 2025-01-10 00:56:36 -07:00
  • cf5f000d21 [torch.compile] Hide KV cache behind torch.compile boundary (#11677) Chen Zhang 2025-01-10 13:14:42 +08:00
  • 3de2b1eafb [Doc] Show default pooling method in a table (#11904) Cyrus Leung 2025-01-10 11:25:20 +08:00
  • b844b99ad3 [VLM] Enable tokenized inputs for merged multi-modal processor (#11900) Cyrus Leung 2025-01-10 11:24:00 +08:00
  • c3cf54dda4 [Doc][5/N] Move Community and API Reference to the bottom (#11896) Cyrus Leung 2025-01-10 11:10:12 +08:00
  • 36f5303578 [Docs] Add Modal to deployment frameworks (#11907) Charles Frye 2025-01-09 15:26:37 -08:00
  • 9a228348d2 [Misc] Provide correct Pixtral-HF chat template (#11891) Cyrus Leung 2025-01-10 01:19:37 +08:00
  • bd82872211 [ci]try to fix flaky multi-step tests (#11894) youkaichao 2025-01-09 22:47:29 +08:00
  • 405eb8e396 [platform] Allow platform specify attention backend (#11609) wangxiyuan 2025-01-09 21:46:50 +08:00
  • 65097ca0af [Doc] Add model development API Reference (#11884) Cyrus Leung 2025-01-09 17:43:40 +08:00
  • 1d967acb45 [Bugfix] fix beam search input errors and latency benchmark script (#11875) Ye (Charlotte) Qi 2025-01-09 01:36:39 -08:00
  • 0bd1ff4346 [Bugfix] Override dunder methods of placeholder modules (#11882) Cyrus Leung 2025-01-09 17:02:53 +08:00
  • 310aca88c9 [perf]fix current stream (#11870) youkaichao 2025-01-09 15:18:21 +08:00
  • a732900efc [Doc] Intended links Python multiprocessing library (#11878) Guspan Tanadi 2025-01-09 12:39:39 +07:00
  • d848800e88 [Misc] Move print_*_once from utils to logger (#11298) Cyrus Leung 2025-01-09 12:48:12 +08:00
  • 730e9592e9 [Doc] Recommend uv and python 3.12 for quickstart guide (#11849) Michael Goin 2025-01-08 22:37:48 -05:00
  • 1fe554bac3 treat do_lower_case in the same way as the sentence-transformers library (#11815) Maximilien de Bayser 2025-01-09 00:05:43 -03:00
  • 615e4a5401 [CI] Turn on basic correctness tests for V1 (#10864) Tyler Michael Smith 2025-01-08 21:20:44 -05:00
  • 3db0cafdf1 [Docs] Add Google Cloud Meetup (#11864) Simon Mo 2025-01-08 12:38:28 -08:00
  • 526de822d5 [Kernel][Triton][AMD] Use block size heuristic for avg 2.8x speedup for int8 models (#11698) rasmith 2025-01-08 14:23:15 -06:00
  • 56fe4c297c [TPU][Quantization] TPU W8A8 (#11785) Robert Shaw 2025-01-08 14:33:29 -05:00
  • 47de8821d3 [Misc]add some explanations for BlockHashType (#11847) WangErXiao 2025-01-09 02:21:30 +08:00
  • 5984499e47 [Doc] Expand Multimodal API Reference (#11852) Cyrus Leung 2025-01-09 01:14:14 +08:00
  • ca47e176af [Misc] Move some model utils into vision file (#11848) Cyrus Leung 2025-01-09 01:04:46 +08:00
  • 78f4590b60 [Bugfix][XPU] fix silu_and_mul (#11823) Yan Ma 2025-01-09 00:11:50 +08:00
  • 2f7024987e [CI/Build][Bugfix] Fix CPU CI image clean up (#11836) Li, Jiang 2025-01-08 23:18:28 +08:00
  • 6cd40a5bfe [Doc][4/N] Reorganize API Reference (#11843) Cyrus Leung 2025-01-08 21:34:44 +08:00
  • aba8d6ee00 [Doc] Move examples into categories (#11840) Harry Mellor 2025-01-08 13:09:53 +00:00
  • 2a0596bc48 [VLM] Reorganize profiling/processing-related code (#11812) Cyrus Leung 2025-01-08 18:59:58 +08:00
  • f12141170a [torch.compile] consider relevant code in compilation cache (#11614) youkaichao 2025-01-08 18:46:43 +08:00
  • cfd3219f58 [Hardware][Apple] Native support for macOS Apple Silicon (#11696) Wallas Henrique 2025-01-08 05:35:49 -03:00
  • a1b2b8606e [Docs] Update sponsor name: 'Novita' to 'Novita AI' (#11833) Simon Mo 2025-01-07 23:05:46 -08:00
  • ad9f1aa679 [doc] update wheels url (#11830) youkaichao 2025-01-08 14:36:49 +08:00
  • 889e662eae [misc] improve memory profiling (#11809) youkaichao 2025-01-08 14:36:03 +08:00
  • ef68eb28d8 [Bug] Fix pickling of ModelConfig when RunAI Model Streamer is used (#11825) Cyrus Leung 2025-01-08 13:40:09 +08:00
  • 259abd8953 [Docs] reorganize sponsorship page (#11639) Simon Mo 2025-01-07 21:16:08 -08:00
  • f645eb6954 [Bugfix] Add checks for LoRA and CPU offload (#11810) Jee Jee Li 2025-01-08 13:08:48 +08:00
  • f4923cb8bc [OpenVINO] Fixed Docker.openvino build (#11732) Ilya Lavrenov 2025-01-08 09:08:30 +04:00
  • b640b19cc0 Fixed docker build for ppc64le (#11518) Nishidha 2025-01-08 10:35:37 +05:30
  • dc71af0a71 Remove the duplicate imports of MultiModalKwargs and PlaceholderRange… (#11824) WangErXiao 2025-01-08 12:09:25 +08:00
  • 4d29e91be8 [Misc] sort torch profiler table by kernel timing (#11813) Divakar Verma 2025-01-07 20:57:04 -06:00
  • 91445c7bc8 [Bugfix] Fix image input for Pixtral-HF (#11741) Cyrus Leung 2025-01-08 10:17:16 +08:00
  • 5950f555a1 [Doc] Group examples into categories (#11782) Harry Mellor 2025-01-08 01:20:12 +00:00
  • a4e2b26856 [Bugfix] Significant performance drop on CPUs with --num-scheduler-steps > 1 (#11794) Jie Fu (傅杰) 2025-01-08 08:15:50 +08:00
  • 973f5dc581 [Doc]Add documentation for using EAGLE in vLLM (#11417) sroy745 2025-01-07 11:19:12 -08:00
  • c994223d56 [Bugfix] update the prefix for qwen2 (#11795) jiangjiadi 2025-01-08 02:36:34 +08:00
  • 869579a702 [optimization] remove python function call for custom op (#11750) youkaichao 2025-01-08 01:04:28 +08:00
  • c0efe92d8b [Doc] Add note to gte-Qwen2 models (#11808) Cyrus Leung 2025-01-07 21:50:58 +08:00
  • d9fa1c05ad [doc] update how pip can install nightly wheels (#11806) youkaichao 2025-01-07 21:42:58 +08:00
  • 2de197bdd4 [V1] Support audio language models on V1 (#11733) Roger Wang 2025-01-07 03:47:36 -08:00
  • 869e829b85 [doc] add doc to explain how to use uv (#11773) youkaichao 2025-01-07 18:41:17 +08:00
  • 8f37be38eb [Bugfix] Comprehensively test and fix LLaVA-NeXT feature size calculation (#11800) Cyrus Leung 2025-01-07 18:25:02 +08:00
  • 8082ad7950 [V1][Doc] Update V1 support for LLaVa-NeXT-Video (#11798) Roger Wang 2025-01-07 01:55:39 -08:00
  • 1e4ce295ae [CI][CPU] adding build number to docker image name (#11788) Yuan 2025-01-07 15:28:01 +08:00
  • ce1917fcf2 [Doc] Create a vulnerability management team (#9925) Russell Bryant 2025-01-07 01:57:32 -05:00
  • e512f76a89 fix init error for MessageQueue when n_local_reader is zero (#11768) XiaobingZhang 2025-01-07 14:12:48 +08:00
  • 898cdf033e [CI] Fix neuron CI and run offline tests (#11779) Liangfu Chen 2025-01-06 21:36:10 -08:00
  • 0f3f3c86ec [Bugfix] Update attention interface in Whisper (#11784) Roger Wang 2025-01-06 20:36:24 -08:00
  • b278557935 [Kernel][LoRA]Punica prefill kernels fusion (#11234) Jee Jee Li 2025-01-07 12:01:39 +08:00
  • 8ceffbf315 [Doc][3/N] Reorganize Serving section (#11766) Cyrus Leung 2025-01-07 11:20:01 +08:00
  • d93d2d74fd [XPU] Make pp group initilized for pipeline-parallelism (#11648) YiSheng5 2025-01-07 11:09:58 +08:00
  • d0169e1b0f [Model] Future-proof Qwen2-Audio multi-modal processor (#11776) Cyrus Leung 2025-01-07 11:05:17 +08:00
  • 08fb75c72e [Bugfix] Fix LLaVA-NeXT feature size precision error (for real) (#11772) Cyrus Leung 2025-01-07 09:10:54 +08:00
  • 91b361ae89 [V1] Extend beyond image modality and support mixed-modality inference with Llava-OneVision (#11685) Roger Wang 2025-01-06 11:58:16 -08:00
  • e20c92bb61 [Kernel] Move attn_type to Attention.__init__() (#11690) Chen Zhang 2025-01-07 00:11:28 +08:00
  • 32c9eff2ff [Bugfix][V1] Fix molmo text-only inputs (#11676) Jee Jee Li 2025-01-06 23:22:25 +08:00
  • 4ca5d40adc [doc] explain how to add interleaving sliding window support (#11771) youkaichao 2025-01-06 21:57:44 +08:00
  • 9279b9f83d [Bugfix] Fix max image size for LLaVA-Onevision (#11769) Roger Wang 2025-01-06 05:48:53 -08:00
  • ee77fdb5de [Doc][2/N] Reorganize Models and Usage sections (#11755) Cyrus Leung 2025-01-06 21:40:31 +08:00
  • 996357e480 [VLM] Separate out profiling-related logic (#11746) Cyrus Leung 2025-01-06 16:02:21 +08:00
  • 2a622d704a k8s-config: Update the secret to use stringData (#11679) Suraj Deshmukh 2025-01-06 00:01:22 -08:00
  • 9c749713f6 [mypy] Forward pass function type hints in lora (#11740) Lucas Tucker 2025-01-06 01:59:36 -06:00
  • 022c5c6944 [V1] Refactor get_executor_cls (#11754) Rui Qiao 2025-01-05 23:59:16 -08:00
  • f8fcca100b [Misc] Fix typo for valid_tool_parses (#11753) Rui Qiao 2025-01-05 23:12:38 -08:00
  • 06bfb51963 [V1] Add BlockTable class (#11693) Woosuk Kwon 2025-01-06 14:24:42 +09:00
  • 408e560015 [Bugfix] Remove block size constraint (#11723) Cody Yu 2025-01-05 20:49:55 -08:00
  • 402d378360 [Doc] [1/N] Reorganize Getting Started section (#11645) Cyrus Leung 2025-01-06 10:18:33 +08:00
  • 9e764e7b10 [distributed] remove pynccl's redundant change_state (#11749) cennn 2025-01-06 09:05:48 +08:00
  • 33fc1e2e86 [Frontend] Improve StreamingResponse Exception Handling (#11752) Robert Shaw 2025-01-05 16:35:01 -05:00
  • eba17173d3 fix: [doc] fix typo (#11751) Lancer 2025-01-06 00:48:16 +08:00