Michael Goin
|
520db4dbc1
|
[Docs] Add README to the build docker image (#8825)
|
2024-09-26 11:02:52 -07:00 |
|
Tyler Michael Smith
|
f70bccac75
|
[Build/CI] Upgrade to gcc 10 in the base build Docker image (#8814)
|
2024-09-26 10:07:18 -07:00 |
|
Roger Wang
|
4bb98f2190
|
[Misc] Update config loading for Qwen2-VL and remove Granite (#8837)
|
2024-09-26 07:45:30 -07:00 |
|
Michael Goin
|
7193774b1f
|
[Misc] Support quantization of MllamaForCausalLM (#8822)
v0.6.2
|
2024-09-25 14:46:22 -07:00 |
|
Roger Wang
|
e2c6e0a829
|
[Doc] Update doc for Transformers 4.45 (#8817)
|
2024-09-25 13:29:48 -07:00 |
|
Chen Zhang
|
770ec6024f
|
[Model] Add support for the multi-modal Llama 3.2 model (#8811)
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-09-25 13:29:32 -07:00 |
|
Simon Mo
|
4f1ba0844b
|
Revert "rename PromptInputs and inputs with backward compatibility (#8760)" (#8810)
|
2024-09-25 10:36:26 -07:00 |
|
Michael Goin
|
873edda6cf
|
[Misc] Support FP8 MoE for compressed-tensors (#8588)
|
2024-09-25 09:43:36 -07:00 |
|
科英
|
64840dfae4
|
[Frontend] MQLLMEngine supports profiling. (#8761)
|
2024-09-25 09:37:41 -07:00 |
|
Cyrus Leung
|
28e1299e60
|
rename PromptInputs and inputs with backward compatibility (#8760)
|
2024-09-25 09:36:47 -07:00 |
|
DefTruth
|
0c4d2ad5e6
|
[VLM][Bugfix] internvl with num_scheduler_steps > 1 (#8614)
|
2024-09-25 09:35:53 -07:00 |
|
Jee Jee Li
|
c6f2485c82
|
[Misc] Add extra deps for openai server image (#8792)
|
2024-09-25 09:35:23 -07:00 |
|
bnellnm
|
300da09177
|
[Kernel] Fullgraph and opcheck tests (#8479)
|
2024-09-25 08:35:52 -06:00 |
|
Hongxia Yang
|
1c046447a6
|
[CI/Build][Bugfix][Doc][ROCm] CI fix and doc update after ROCm 6.2 upgrade (#8777)
|
2024-09-25 22:26:37 +08:00 |
|
Woo-Yeon Lee
|
8fae5ed7f6
|
[Misc] Fix minor typo in scheduler (#8765)
|
2024-09-25 00:53:03 -07:00 |
|
David Newman
|
3368c3ab36
|
[Bugfix] Ray 2.9.x doesn't expose available_resources_per_node (#8767)
Signed-off-by: darthhexx <darthhexx@gmail.com>
|
2024-09-25 00:52:26 -07:00 |
|
Adam Tilghman
|
1ac3de09cd
|
[Frontend] OpenAI server: propagate usage accounting to FastAPI middleware layer (#8672)
|
2024-09-25 07:49:26 +00:00 |
|
sohamparikh
|
3e073e66f1
|
[Bugfix] load fc bias from config for eagle (#8790)
|
2024-09-24 23:16:30 -07:00 |
|
Isotr0py
|
c23953675f
|
[Hardware][CPU] Enable mrope and support Qwen2-VL on CPU backend (#8770)
|
2024-09-24 23:16:11 -07:00 |
|
zifeitong
|
e3dd0692fa
|
[BugFix] Propagate 'trust_remote_code' setting in internvl and minicpmv (#8250)
|
2024-09-25 05:53:43 +00:00 |
|
sroy745
|
fc3afc20df
|
Fix tests in test_chunked_prefill_scheduler which fail with BlockManager V2 (#8752)
|
2024-09-24 21:26:36 -07:00 |
|
sasha0552
|
b4522474a3
|
[Bugfix][Kernel] Implement acquire/release polyfill for Pascal (#8776)
|
2024-09-24 21:26:33 -07:00 |
|
sroy745
|
ee777d9c30
|
Fix test_schedule_swapped_simple in test_scheduler.py (#8780)
|
2024-09-24 21:26:18 -07:00 |
|
Joe Runde
|
6e0c9d6bd0
|
[Bugfix] Use heartbeats instead of health checks (#8583)
|
2024-09-24 20:37:38 -07:00 |
|
Archit Patke
|
6da1ab6b41
|
[Core] Adding Priority Scheduling (#5958)
|
2024-09-24 19:50:50 -07:00 |
|
Travis Johnson
|
01b6f9e1f0
|
[Core][Bugfix] Support prompt_logprobs returned with speculative decoding (#8047)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2024-09-24 17:29:56 -07:00 |
|
Jee Jee Li
|
13f9f7a3d0
|
[Misc] Upgrade bitsandbytes to the latest version 0.44.0 (#8768)
|
2024-09-24 17:08:55 -07:00 |
|
youkaichao
|
1e7d5c01f5
|
[misc] soft drop beam search (#8763)
|
2024-09-24 15:48:39 -07:00 |
|
Daniele
|
2467b642dd
|
[CI/Build] fix setuptools-scm usage (#8771)
|
2024-09-24 12:38:12 -07:00 |
|
Lucas Wilkinson
|
72fc97a0f1
|
[Bugfix] Fix torch dynamo fixes caused by replace_parameters (#8748)
|
2024-09-24 14:33:21 -04:00 |
|
Andy
|
2529d09b5a
|
[Frontend] Batch inference for llm.chat() API (#8648)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2024-09-24 09:44:11 -07:00 |
|
ElizaWszola
|
a928ded995
|
[Kernel] Split Marlin MoE kernels into multiple files (#8661)
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2024-09-24 09:31:42 -07:00 |
|
Hanzhi Zhou
|
cc4325b66a
|
[Bugfix] Fix potentially unsafe custom allreduce synchronization (#8558)
|
2024-09-24 01:08:14 -07:00 |
|
Alex Brooks
|
8ff7ced996
|
[Model] Expose Phi3v num_crops as a mm_processor_kwarg (#8658)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-09-24 07:36:46 +00:00 |
|
Peter Salas
|
3f06bae907
|
[Core][Model] Support loading weights by ID within models (#7931)
|
2024-09-24 07:14:15 +00:00 |
|
Cody Yu
|
b8747e8a7c
|
[MISC] Skip dumping inputs when unpicklable (#8744)
|
2024-09-24 06:10:03 +00:00 |
|
Simon Mo
|
3185fb0cca
|
Revert "[Core] Rename PromptInputs to PromptType, and inputs to prompt" (#8750)
|
2024-09-24 05:45:20 +00:00 |
|
youkaichao
|
0250dd68c5
|
re-implement beam search on top of vllm core (#8726)
Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>
|
2024-09-23 22:08:12 -07:00 |
|
sroy745
|
88577ac928
|
Fix tests in test_scheduler.py that fail with BlockManager V2 (#8728)
|
2024-09-24 04:43:13 +00:00 |
|
Hongxia Yang
|
530821d00c
|
[Hardware][AMD] ROCm6.2 upgrade (#8674)
|
2024-09-23 18:52:39 -07:00 |
|
Alexander Matveev
|
1a2aef3e59
|
Add output streaming support to multi-step + async while ensuring RequestOutput obj reuse (#8335)
|
2024-09-23 15:38:04 -07:00 |
|
jiqing-feng
|
5f7bb58427
|
Fix typical acceptance sampler with correct recovered token ids (#8562)
|
2024-09-23 12:32:27 -07:00 |
|
Russell Bryant
|
b05f5c9238
|
[Core] Allow IPv6 in VLLM_HOST_IP with zmq (#8575)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-09-23 12:15:41 -07:00 |
|
Jee Jee Li
|
9b0e3ec970
|
[Kernel][LoRA] Add assertion for punica sgmv kernels (#7585)
|
2024-09-23 18:57:42 +00:00 |
|
Lucas Wilkinson
|
86e9c8df29
|
[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701)
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-09-23 13:46:26 -04:00 |
|
Daniele
|
ee5f34b1c2
|
[CI/Build] use setuptools-scm to set __version__ (#4738)
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-09-23 09:44:26 -07:00 |
|
Jani Monoses
|
f2bd246c17
|
[VLM] Fix paligemma, fuyu and persimmon with transformers 4.45 : use config.text_config.vocab_size (#8707)
|
2024-09-23 14:43:09 +00:00 |
|
Yanyi Liu
|
a79e522984
|
[Model] Support pp for qwen2-vl (#8696)
|
2024-09-23 13:46:59 +00:00 |
|
Li, Jiang
|
3e83c12b5c
|
[Bugfix][CPU] fix missing input intermediate_tensors in the cpu_model_runner (#8733)
|
2024-09-23 13:15:16 +00:00 |
|
Isotr0py
|
e551ca1555
|
[Hardware][CPU] Refactor CPU model runner (#8729)
|
2024-09-23 20:12:20 +08:00 |
|