Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

9013e24f7b [torch.compile] Adding torch compile annotations to some models (#9614) Yongzao 2024-10-24 01:07:48 +08:00
fd0e2cfdb2 [Misc] Separate total and output tokens in benchmark_throughput.py (#8914) Michael Goin 2024-10-23 12:47:20 -04:00
e5ac6a4199 [Bugfix] Fix divide by zero when serving Mamba models (#9617) Tyler Michael Smith 2024-10-23 12:40:43 -04:00
dbdd3b5e5a [misc] comment to avoid future confusion about baichuan (#9620) youkaichao 2024-10-23 09:14:44 -07:00
e7116c017c [Bugfix] Fix _init_vision_model in NVLM_D model (#9611) Cyrus Leung 2024-10-23 22:09:04 +08:00
31a08f5bd2 [Model] Add min_pixels / max_pixels to Qwen2VL as mm_processor_kwargs (#9612) Alex Brooks 2024-10-23 08:05:18 -06:00
c18e1a3418 [VLM] Enable overriding whether post layernorm is used in vision encoder + fix quant args (#9217) Cyrus Leung 2024-10-23 19:27:37 +08:00
3ff57ebfca [Model] Initialize Florence-2 language backbone support (#9555) Isotr0py 2024-10-23 18:42:47 +08:00
2394962d70 [Hardware][XPU] using current_platform.is_xpu (#9605) Mengqing Cao 2024-10-23 16:28:21 +08:00
51c24c9736 [Build] Fix FetchContent multiple build issue (#9596) Luka Govedič 2024-10-23 00:43:07 -04:00
831540cf04 [Model] Support E5-V (#9576) Cyrus Leung 2024-10-23 11:35:29 +08:00
29061ed9df [Misc] Add an env var VLLM_LOGGING_PREFIX, if set, it will be prepend to all logging messages (#9590) Flex Wang 2024-10-22 20:17:28 -07:00
65050a40e6 [Bugfix] Generate exactly input_len tokens in benchmark_throughput (#9592) Chen Zhang 2024-10-22 17:45:35 -07:00
208cb34c81 [Doc]: Update tensorizer docs to include vllm[tensorizer] (#7889) Seth Kimmel 2024-10-22 15:43:25 -07:00
b17046e298 [BugFix] Fix metrics error for --num-scheduler-steps > 1 (#8234) yulei 2024-10-23 06:43:03 +08:00
d1e8240875 [Bugfix] Fix spurious "No compiled cutlass_scaled_mm ..." for W8A8 on Turing (#9487) Lucas Wilkinson 2024-10-22 18:41:13 -04:00
cb6fdaa0a0 [Misc] Make benchmarks use EngineArgs (#9529) Jeremy Arnold 2024-10-22 17:40:38 -05:00
23b899a8e6 [Bugfix] fix detokenizer shallow copy (#5919) Aurick Qiao 2024-10-22 18:38:12 -04:00
17c79f3c36 [torch.compile] auto infer dynamic_arg_dims from type annotation (#9589) youkaichao 2024-10-22 13:43:37 -07:00
cd5601ac37 [BugFix] Prevent exporting duplicate OpenTelemetry spans (#9017) Ronen Schaffer 2024-10-22 21:11:53 +03:00
434984e665 [Frontend] Support custom request_id from request (#9550) Yuhong Guo 2024-10-23 02:07:30 +08:00
32a1ee74a0 [Hardware][Intel CPU][DOC] Update docs for CPU backend (#6212) Yuan 2024-10-22 10:38:04 -07:00
08075c3448 [Bugfix] Eagle: change config name for fc bias (#9580) gopalsarda 2024-10-22 21:44:22 +05:30
bb392ea2d2 [Model][VLM] Initialize support for Mono-InternVL model (#9528) Isotr0py 2024-10-23 00:01:46 +08:00
9dbcce84a7 [Neuron] [Bugfix] Fix neuron startup (#9374) xendo 2024-10-22 14:51:41 +02:00
a48e3ec052 [CI/Build][LoRA] Temporarily fix long context failure issue (#9579) Jee Jee Li 2024-10-22 19:32:51 +08:00
6c5af09b39 [V1] Implement vLLM V1 [1/N] (#9289) Woosuk Kwon 2024-10-22 01:24:07 -07:00
3ddbe25502 [Hardware][CPU] using current_platform.is_cpu (#9536) wangshuai09 2024-10-22 15:50:43 +08:00
0d02747f2e support TP in qwen2 bnb (#9574) chenqianfzh 2024-10-22 00:13:23 -07:00
f7db5f0fa9 [Doc] Use shell code-blocks and fix section headers (#9508) Rafael Vasquez 2024-10-22 02:43:24 -04:00
ca30c3c84b [Core] Remove evictor_v1 (#9572) Kuntai Du 2024-10-21 23:55:49 -05:00
c0292211ce [CI/Build] Replaced some models on tests for smaller ones (#9570) Wallas Henrique 2024-10-22 01:52:14 -03:00
74692421f7 [Bugfix]: phi.py get rope_theta from config file (#9503) Falko1 2024-10-22 04:53:36 +02:00
29acd2c34c [Bugfix][OpenVINO] fix_dockerfile_openvino (#9552) ngrozae 2024-10-22 04:47:52 +02:00
f085995a7b [CI/Build] Remove unnecessary fork_new_process (#9484) Cyrus Leung 2024-10-22 10:47:29 +08:00
b729901139 [Bugfix]: serialize config by value for --trust-remote-code (#6751) Travis Johnson 2024-10-21 20:46:24 -06:00
76a5e13270 [core] move parallel sampling out from vllm core (#9302) youkaichao 2024-10-21 17:31:44 -07:00
ef7faad1b8 🐛 Fixup more test failures from memory profiling (#9563) Joe Runde 2024-10-21 19:10:56 -05:00
575dcebe9a [CI] Make format checker error message more user-friendly by using emoji (#9564) Kuntai Du 2024-10-21 18:45:15 -05:00
711f3a7806 [Frontend] Don't log duplicate error stacktrace for every request in the batch (#9023) Wallas Henrique 2024-10-21 18:49:41 -03:00
15713e3b75 [BugFix] Update draft model TP size check to allow matching target TP size (#9394) Nick Hill 2024-10-21 22:14:29 +01:00
d621c43df7 [doc] fix format (#9562) youkaichao 2024-10-21 13:54:57 -07:00
9d9186be97 [Frontend] Reduce frequency of client cancellation checking (#7959) Nick Hill 2024-10-21 21:28:10 +01:00
5241aa1494 [Model][Bugfix] Fix batching with multi-image in PixtralHF (#9518) Michael Goin 2024-10-21 14:20:07 -04:00
ec6bd6c4c6 [BugFix] Use correct python3 binary in Docker.ppc64le entrypoint (#9492) Varad Ahirwadkar 2024-10-21 23:13:02 +05:30
8ca8954841 [Bugfix][Misc]: fix graph capture for decoder (#9549) yudian0504 2024-10-22 01:33:30 +08:00
f6b97293aa [Model] FalconMamba Support (#9325) Dhia Eddine Rhaiem 2024-10-21 20:50:16 +04:00
496e991da8 [Doc] Consistent naming of attention backends (#9498) Thomas Parnell 2024-10-21 16:29:57 +02:00
696b01af8f [CI/Build] Split up decoder-only LM tests (#9488) Cyrus Leung 2024-10-21 12:27:50 +08:00
855e0e6f97 [Frontend][Misc] Goodput metric support (#9338) Andy Dai 2024-10-20 11:39:32 -07:00
4fa3e33349 [Kernel] Support sliding window in flash attention backend (#9403) Chen Zhang 2024-10-20 10:57:52 -07:00
962d2c6349 [Model][Pixtral] Use memory_efficient_attention for PixtralHFVision (#9520) Michael Goin 2024-10-20 01:29:14 -04:00
5b59fe0f08 [Bugfix] Pass json-schema to GuidedDecodingParams and make test stronger (#9530) Chen Zhang 2024-10-19 17:05:02 -07:00
8e3e7f2713 [Model][Pixtral] Optimizations for input_processor_for_pixtral_hf (#9514) Michael Goin 2024-10-19 10:44:29 -04:00
263d8ee150 [Bugfix] Fix missing task for speculative decoding (#9524) Cyrus Leung 2024-10-19 14:49:40 +08:00
c5eea3c8ba [Frontend] Support simpler image input format (#9478) Yue Zhang 2024-10-18 23:17:07 -07:00
85dc92fc98 [CI/Build] Configure matcher for actionlint workflow (#9511) Russell Bryant 2024-10-19 02:04:18 -04:00
dfd951ed9b [CI/Build] Add error matching for ruff output (#9513) Russell Bryant 2024-10-19 01:42:20 -04:00
82c25151ec [Doc] update gpu-memory-utilization flag docs (#9507) Joe Runde 2024-10-18 22:26:36 -05:00
1325872ec8 [Frontend] Avoid creating guided decoding LogitsProcessor unnecessarily (#9521) Nick Hill 2024-10-19 04:21:01 +01:00
380e18639f 🐛 fix torch memory profiling (#9516) Joe Runde 2024-10-18 20:25:19 -05:00
337ed76671 [Bugfix] Fix offline mode when using mistral_common (#9457) sasha0552 2024-10-19 01:12:32 +00:00
0c9a5258f9 [Kernel] Add env variable to force flashinfer backend to enable tensor cores (#9497) Thomas Parnell 2024-10-19 02:55:48 +02:00
d11bf435a0 [MISC] Consolidate cleanup() and refactor offline_inference_with_prefix.py (#9510) Cody Yu 2024-10-18 14:30:55 -07:00
9bb10a7d27 [MISC] Add lora requests to metrics (#9477) Kunjan 2024-10-18 13:50:18 -07:00
3921a2f29e [Model] Support Pixtral models in the HF Transformers format (#9036) Michael Goin 2024-10-18 15:29:56 -04:00
67a7e5ef38 [CI/Build] Add error matching config for mypy (#9512) Russell Bryant 2024-10-18 15:17:53 -04:00
051eaf6db3 [Model] Add user-configurable task for models that support both generation and embedding (#9424) Cyrus Leung 2024-10-19 02:31:58 +08:00
7dbe738d65 [Misc] benchmark: Add option to set max concurrency (#9390) Russell Bryant 2024-10-18 14:15:28 -04:00
ae8b633ba3 [Bugfix] Fix offline_inference_with_prefix.py (#9505) Tyler Michael Smith 2024-10-18 12:59:19 -04:00
1bbbcc0b1d [CI/Build] Fix lint errors in mistral tokenizer (#9504) Cyrus Leung 2024-10-19 00:09:35 +08:00
25aeb7d4c9 [BugFix] Fix and simplify completion API usage streaming (#9475) Nick Hill 2024-10-18 15:10:26 +01:00
d2b1bf55ec [Frontend][Feature] Add jamba tool parser (#9154) tomeras91 2024-10-18 13:27:48 +03:00
1ffc8a7362 [BugFix] Typing fixes to RequestOutput.prompt and beam search (#9473) Nick Hill 2024-10-18 08:19:53 +01:00
944dd8edaf [CI/Build] Use commit hash references for github actions (#9430) Russell Bryant 2024-10-18 00:54:58 -04:00
154a8ae880 [Qwen2.5] Support bnb quant for Qwen2.5 (#9467) Haoyu Wang 2024-10-18 12:40:14 +08:00
de4008e2ab [Bugfix][Core] Use torch.cuda.memory_stats() to profile peak memory usage (#9352) Joe Runde 2024-10-17 21:47:27 -05:00
48138a8415 [BugFix] Stop silent failures on compressed-tensors parsing (#9381) Dipika Sikka 2024-10-17 21:54:00 -04:00
343f8e0905 Support BERTModel (first encoder-only embedding model) (#9056) Robert Shaw 2024-10-17 19:21:01 -04:00
bb76538bbd [Hardwware][Neuron] Simplify model load for transformers-neuronx library (#9380) Shashwat Srijan 2024-10-17 15:39:39 -07:00
d615b5c9f8 [Bugfix] Print warnings related to mistral_common tokenizer only once (#9468) sasha0552 2024-10-17 21:44:20 +00:00
d65049daab [Bugfix] Add random_seed to sample_hf_requests in benchmark_serving script (#9013) Kai Wu 2024-10-17 14:11:11 -07:00
eca2c5f7c0 [Bugfix] Fix support for dimension like integers and ScalarType (#9299) bnellnm 2024-10-17 15:08:34 -04:00
0f41fbe5a3 [torch.compile] Fine-grained CustomOp enabling mechanism (#9300) Luka Govedič 2024-10-17 14:36:37 -04:00
7871659abb [Misc] Remove commit id file (#9470) Cyrus Leung 2024-10-18 01:34:37 +08:00
a2c71c5405 [CI/Build] remove .github from .dockerignore, add dirty repo check (#9375) (tag: v0.6.3.post1) Daniele 2024-10-17 19:25:06 +02:00
81ede99ca4 [Core] Deprecating block manager v1 and make block manager v2 default (#8704) Kuntai Du 2024-10-17 11:38:15 -05:00
5eda21e773 [Hardware][CPU] compressed-tensor INT8 W8A8 AZP support (#9344) Li, Jiang 2024-10-18 00:21:04 +08:00
8e1cddcd44 [TPU] Call torch._sync(param) during weight loading (#9437) Woosuk Kwon 2024-10-17 09:00:11 -07:00
5e443b594f [Bugfix] Allow prefill of assistant response when using mistral_common (#9446) sasha0552 2024-10-17 15:06:37 +00:00
9d30a056e7 [misc] CUDA Time Layerwise Profiler (#8337) Lucas Wilkinson 2024-10-17 10:36:09 -04:00
390be74649 [Misc] Print stack trace using logger.exception (#9461) Cyrus Leung 2024-10-17 21:55:48 +08:00
e312e52b44 [Kernel] Add Exllama as a backend for compressed-tensors (#9395) Lucas Wilkinson 2024-10-17 09:48:26 -04:00
dbfa8d31d5 Add notes on the use of Slack (#9442) Yuan Tang 2024-10-17 00:46:46 -04:00
92d86da217 [BugFix] [Kernel] Fix GPU SEGV occurring in int8 kernels (#9391) rasmith 2024-10-16 20:34:06 -05:00
c3fab5f769 [Bugfix][Kernel] Prevent integer overflow in fp8 dynamic per-token quantize kernel (#9425) Tyler Michael Smith 2024-10-16 19:46:06 -04:00
776dbd74f1 [CI/Build] mypy: Resolve some errors from checking vllm/engine (#9267) Russell Bryant 2024-10-16 18:55:59 -04:00
8345045833 [Performance][Spec Decode] Optimize ngram lookup performance (#9333) Lily Liu 2024-10-16 12:37:45 -07:00
5b8a1fde84 [Model][Bugfix] Add FATReLU activation and support for openbmb/MiniCPM-S-1B-sft (#9396) Junhao Li 2024-10-16 12:40:24 -04:00
fb60ae9b91 [Kernel][Model] Improve continuous batching for Jamba and Mamba (#9189) Mor Zusman 2024-10-17 00:12:43 +08:00