1b49148e47
[Installation] Allow lower versions of FastAPI to maintain Ray 2.9 compatibility (#8764 )
Cyrus Leung
2024-09-27 07:54:09 +08:00
4b377d6feb
[BugFix] Fix test breakages from transformers 4.45 upgrade (#8829 )
Nick Hill
2024-09-27 00:46:43 +01:00
71d21c73ab
[Bugfix] Fixup advance_step.cu warning (#8815 )
Tyler Michael Smith
2024-09-26 19:23:45 -04:00
ee2da3e9ef
fix validation: Only set tool_choice auto if at least one tool is provided (#8568 )
Chirag Jain
2024-09-27 04:53:17 +05:30
e2f6f26e86
[Bugfix] Fix print_warning_once's line info (#8867 )
Tyler Michael Smith
2024-09-26 19:18:26 -04:00
b28d2104de
[Misc] Change dummy profiling and BOS fallback warns to log once (#8820 )
Michael Goin
2024-09-26 19:18:14 -04:00
93d364da34
[Bugfix] Include encoder prompts len to non-stream api usage response (#8861 )
Pernekhan Utemuratov
2024-09-26 15:47:00 -07:00
d9cfbc891e
[ci] Soft fail Entrypoints, Samplers, LoRA, Decoder-only VLM (#8872 )
Kevin H. Luu
2024-09-26 15:02:16 -07:00
70de39f6b4
[misc][installation] build from source without compilation (#8818 )
youkaichao
2024-09-26 13:19:04 -07:00
68988d4e0d
[CI/Build] Fix missing ci dependencies (#8834 )
fyuan1316
2024-09-27 02:04:39 +08:00
520db4dbc1
[Docs] Add README to the build docker image (#8825 )
Michael Goin
2024-09-26 14:02:52 -04:00
f70bccac75
[Build/CI] Upgrade to gcc 10 in the base build Docker image (#8814 )
Tyler Michael Smith
2024-09-26 13:07:18 -04:00
4bb98f2190
[Misc] Update config loading for Qwen2-VL and remove Granite (#8837 )
Roger Wang
2024-09-26 07:45:30 -07:00
7193774b1f
[Misc] Support quantization of MllamaForCausalLM (#8822 )
v0.6.2
Michael Goin
2024-09-25 17:46:22 -04:00
e2c6e0a829
[Doc] Update doc for Transformers 4.45 (#8817 )
Roger Wang
2024-09-25 13:29:48 -07:00
770ec6024f
[Model] Add support for the multi-modal Llama 3.2 model (#8811 )
Chen Zhang
2024-09-25 13:29:32 -07:00
4f1ba0844b
Revert "rename PromptInputs and inputs with backward compatibility (#8760 ) (#8810 )
Simon Mo
2024-09-25 10:36:26 -07:00
873edda6cf
[Misc] Support FP8 MoE for compressed-tensors (#8588 )
Michael Goin
2024-09-25 12:43:36 -04:00
64840dfae4
[Frontend] MQLLMEngine supports profiling. (#8761 )
科英
2024-09-26 00:37:41 +08:00
28e1299e60
rename PromptInputs and inputs with backward compatibility (#8760 )
Cyrus Leung
2024-09-26 00:36:47 +08:00
0c4d2ad5e6
[VLM][Bugfix] internvl with num_scheduler_steps > 1 (#8614 )
DefTruth
2024-09-26 00:35:53 +08:00
c6f2485c82
[[Misc]] Add extra deps for openai server image (#8792 )
Jee Jee Li
2024-09-26 00:35:23 +08:00
300da09177
[Kernel] Fullgraph and opcheck tests (#8479 )
bnellnm
2024-09-25 10:35:52 -04:00
1c046447a6
[CI/Build][Bugfix][Doc][ROCm] CI fix and doc update after ROCm 6.2 upgrade (#8777 )
Hongxia Yang
2024-09-25 10:26:37 -04:00
8fae5ed7f6
[Misc] Fix minor typo in scheduler (#8765 )
Woo-Yeon Lee
2024-09-25 16:53:03 +09:00
3368c3ab36
[Bugfix] Ray 2.9.x doesn't expose available_resources_per_node (#8767 )
David Newman
2024-09-25 17:52:26 +10:00
1ac3de09cd
[Frontend] OpenAI server: propagate usage accounting to FastAPI middleware layer (#8672 )
Adam Tilghman
2024-09-25 00:49:26 -07:00
3e073e66f1
[Bugfix] load fc bias from config for eagle (#8790 )
sohamparikh
2024-09-25 02:16:30 -04:00
c23953675f
[Hardware][CPU] Enable mrope and support Qwen2-VL on CPU backend (#8770 )
Isotr0py
2024-09-25 14:16:11 +08:00
e3dd0692fa
[BugFix] Propagate 'trust_remote_code' setting in internvl and minicpmv (#8250 )
zifeitong
2024-09-24 22:53:43 -07:00
fc3afc20df
Fix tests in test_chunked_prefill_scheduler which fail with BlockManager V2 (#8752 )
sroy745
2024-09-24 21:26:36 -07:00
b4522474a3
[Bugfix][Kernel] Implement acquire/release polyfill for Pascal (#8776 )
sasha0552
2024-09-25 04:26:33 +00:00
ee777d9c30
Fix test_schedule_swapped_simple in test_scheduler.py (#8780 )
sroy745
2024-09-24 21:26:18 -07:00
6e0c9d6bd0
[Bugfix] Use heartbeats instead of health checks (#8583 )
Joe Runde
2024-09-24 21:37:38 -06:00
6da1ab6b41
[Core] Adding Priority Scheduling (#5958 )
Archit Patke
2024-09-24 21:50:50 -05:00
01b6f9e1f0
[Core][Bugfix] Support prompt_logprobs returned with speculative decoding (#8047 )
Travis Johnson
2024-09-24 18:29:56 -06:00
13f9f7a3d0
[[Misc]Upgrade bitsandbytes to the latest version 0.44.0 (#8768 )
Jee Jee Li
2024-09-25 08:08:55 +08:00
1e7d5c01f5
[misc] soft drop beam search (#8763 )
youkaichao
2024-09-24 15:48:39 -07:00
2467b642dd
[CI/Build] fix setuptools-scm usage (#8771 )
Daniele
2024-09-24 21:38:12 +02:00
72fc97a0f1
[Bugfix] Fix torch dynamo fixes caused by replace_parameters (#8748 )
Lucas Wilkinson
2024-09-24 14:33:21 -04:00
2529d09b5a
[Frontend] Batch inference for llm.chat() API (#8648 )
Andy
2024-09-24 12:44:11 -04:00
a928ded995
[Kernel] Split Marlin MoE kernels into multiple files (#8661 )
ElizaWszola
2024-09-24 18:31:42 +02:00
cc4325b66a
[Bugfix] Fix potentially unsafe custom allreduce synchronization (#8558 )
Hanzhi Zhou
2024-09-24 01:08:14 -07:00
8ff7ced996
[Model] Expose Phi3v num_crops as a mm_processor_kwarg (#8658 )
Alex Brooks
2024-09-24 01:36:46 -06:00
3f06bae907
[Core][Model] Support loading weights by ID within models (#7931 )
Peter Salas
2024-09-24 00:14:15 -07:00
b8747e8a7c
[MISC] Skip dumping inputs when unpicklable (#8744 )
Cody Yu
2024-09-23 23:10:03 -07:00
3185fb0cca
Revert "[Core] Rename PromptInputs to PromptType, and inputs to prompt" (#8750 )
Simon Mo
2024-09-23 22:45:20 -07:00
0250dd68c5
re-implement beam search on top of vllm core (#8726 )
youkaichao
2024-09-23 22:08:12 -07:00
88577ac928
Fix tests in test_scheduler.py that fail with BlockManager V2 (#8728 )
sroy745
2024-09-23 21:43:13 -07:00
530821d00c
[Hardware][AMD] ROCm6.2 upgrade (#8674 )
Hongxia Yang
2024-09-23 21:52:39 -04:00
1a2aef3e59
Add output streaming support to multi-step + async while ensuring RequestOutput obj reuse (#8335 )
Alexander Matveev
2024-09-23 18:38:04 -04:00
5f7bb58427
Fix typical acceptance sampler with correct recovered token ids (#8562 )
jiqing-feng
2024-09-24 03:32:27 +08:00
b05f5c9238
[Core] Allow IPv6 in VLLM_HOST_IP with zmq (#8575 )
Russell Bryant
2024-09-23 15:15:41 -04:00
9b0e3ec970
[Kernel][LoRA] Add assertion for punica sgmv kernels (#7585 )
Jee Jee Li
2024-09-24 02:57:42 +08:00
86e9c8df29
[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701 )
Lucas Wilkinson
2024-09-23 13:46:26 -04:00
ee5f34b1c2
[CI/Build] use setuptools-scm to set __version__ (#4738 )
Daniele
2024-09-23 18:44:26 +02:00
f2bd246c17
[VLM] Fix paligemma, fuyu and persimmon with transformers 4.45 : use config.text_config.vocab_size (#8707 )
Jani Monoses
2024-09-23 17:43:09 +03:00
a79e522984
[Model] Support pp for qwen2-vl (#8696 )
Yanyi Liu
2024-09-23 21:46:59 +08:00
3e83c12b5c
[Bugfix][CPU] fix missing input intermediate_tensors in the cpu_model_runner (#8733 )
Li, Jiang
2024-09-23 21:15:16 +08:00
e551ca1555
[Hardware][CPU] Refactor CPU model runner (#8729 )
Isotr0py
2024-09-23 20:12:20 +08:00
9b8c8ba119
[Core][Frontend] Support Passing Multimodal Processor Kwargs (#8657 )
Alex Brooks
2024-09-23 01:44:48 -06:00
d23679eb99
[Bugfix] fix docker build for xpu (#8652 )
Yan Ma
2024-09-23 13:54:18 +08:00
57a0702e63
[Bugfix] Fix CPU CMake build (#8723 )
Luka Govedič
2024-09-22 23:40:46 -04:00
3dda7c2250
[Bugfix] Avoid some bogus messages RE CUTLASS's revision when building (#8702 )
Tyler Michael Smith
2024-09-22 22:24:59 -04:00
92ba7e7477
[misc] upgrade mistral-common (#8715 )
youkaichao
2024-09-22 15:41:59 -07:00
d4a2ac8302
[build] enable existing pytorch (for GH200, aarch64, nightly) (#8713 )
youkaichao
2024-09-22 12:47:54 -07:00
c6bd70d772
[SpecDec][Misc] Cleanup, remove bonus token logic. (#8701 )
Lily Liu
2024-09-22 12:34:14 -07:00
5b59532760
[Model][VLM] Add LLaVA-Onevision model support (#8486 )
litianjian
2024-09-23 01:51:44 +08:00
ca2b628b3c
[MISC] rename CudaMemoryProfiler to DeviceMemoryProfiler (#8703 )
Huazhong Ji
2024-09-23 01:44:09 +08:00
8ca5051b9a
[Misc] Use NamedTuple in Multi-image example (#8705 )
Alex Brooks
2024-09-22 06:56:20 -06:00
06ed2815e2
[Model] Refactor BLIP/BLIP-2 to support composite model loading (#8407 )
Cyrus Leung
2024-09-22 20:24:21 +08:00
0e40ac9b7b
[ci][build] fix vllm-flash-attn (#8699 )
youkaichao
2024-09-21 23:24:58 -07:00
13d88d4137
[Bugfix] Refactor composite weight loading logic (#8656 )
Isotr0py
2024-09-22 12:33:27 +08:00
d66ac62854
[Kernel][Bugfix] Delete some more useless code in marlin_moe_ops.cu (#8643 )
Tyler Michael Smith
2024-09-21 19:45:02 -04:00
9dc7c6c7f3
[dbrx] refactor dbrx experts to extend FusedMoe class (#8518 )
Divakar Verma
2024-09-21 16:09:39 -05:00
ec4aaad812
[Kernel][Triton][AMD] Remove tl.atomic_add from awq_gemm_kernel, 2-5x speedup MI300, minor improvement for MI250 (#8646 )
rasmith
2024-09-21 04:20:54 -05:00
4dfdf43196
[Doc] Fix typo in AMD installation guide (#8689 )
Andy Dai
2024-09-21 00:24:12 -07:00
5e85f4f82a
[VLM] Use SequenceData.from_token_counts to create dummy data (#8687 )
Cyrus Leung
2024-09-21 14:28:56 +08:00
71c60491f2
[Kernel] Build flash-attn from source (#8245 )
Luka Govedič
2024-09-21 02:27:10 -04:00
0faab90eb0
[beam search] add output for manually checking the correctness (#8684 )
youkaichao
2024-09-20 19:55:33 -07:00
0455c46ed4
[Core] Factor out common code in SequenceData and Sequence (#8675 )
Cyrus Leung
2024-09-21 10:30:39 +08:00
d4bf085ad0
[MISC] add support custom_op check (#8557 )
Kunshang Ji
2024-09-21 10:03:55 +08:00
0057894ef7
[Core] Rename PromptInputs and inputs(#8673 )
Cyrus Leung
2024-09-21 10:00:54 +08:00
0f961b3ce9
[Bugfix] Fix incorrect llava next feature size calculation (#8496 )
zyddnys
2024-09-20 18:48:32 -04:00
7f9c8902e3
[Hardware][AWS] update neuron to 2.20 (#8676 )
omrishiv
2024-09-20 15:19:44 -07:00
7c8566aa4f
[Doc] neuron documentation update (#8671 )
omrishiv
2024-09-20 15:04:37 -07:00
b4e4eda92e
[Bugfix][Core] Fix tekken edge case for mistral tokenizer (#8640 )
Patrick von Platen
2024-09-20 23:33:03 +02:00
2874bac618
[Bugfix] Config got an unexpected keyword argument 'engine' (#8556 )
Pastel!
2024-09-21 05:00:45 +08:00
035fa895ec
[Misc] Show AMD GPU topology in collect_env.py (#8649 )
Cyrus Leung
2024-09-21 04:52:19 +08:00
b28298f2f4
[Bugfix] Validate SamplingParam n is an int (#8548 )
saumya-saran
2024-09-20 12:46:02 -07:00
2940afa04e
[CI/Build] Removing entrypoints/openai/test_embedding.py test from ROCm build (#8670 )
Alexey Kondratiev(AMD)
2024-09-20 13:27:44 -04:00
3b63de9353
[Model] Add OLMoE (#7922 )
Niklas Muennighoff
2024-09-20 09:31:41 -07:00
260d40b5ea
[Core] Support Lora lineage and base model metadata management (#6315 )
Jiaxin Shan
2024-09-19 23:20:56 -07:00
9e5ec35b1f
[bugfix] [AMD] add multi-step advance_step to ROCmFlashAttentionMetadata (#8474 )
William Lin
2024-09-19 20:49:54 -07:00
18ae428a0d
[Bugfix] Fix Phi3.5 mini and MoE LoRA inference (#8571 )
Amit Garg
2024-09-19 17:54:02 -07:00
de6f90a13d
[Misc] guard against change in cuda library name (#8609 )
bnellnm
2024-09-19 18:36:30 -04:00
6cb748e190
[CI/Build] Re-enabling Entrypoints tests on ROCm, excluding ones that fail (#8551 )
Alexey Kondratiev(AMD)
2024-09-19 16:06:32 -04:00
9e99407e3c
Create SECURITY.md (#8642 )
Simon Mo
2024-09-19 12:16:28 -07:00
ea4647b7d7
[Doc] Add documentation for GGUF quantization (#8618 )
Isotr0py
2024-09-20 03:15:55 +08:00
e42c634acb
[Core] simplify logits resort in _apply_top_k_top_p (#8619 )
盏一
2024-09-20 02:28:25 +08:00