Brittany
|
8df2dc3c88
|
[TPU] Update pallas.py to support trillium (#8871)
|
2024-09-27 01:16:55 -07:00 |
|
Isotr0py
|
6d792d2f31
|
[Bugfix][VLM] Fix Fuyu batching inference with max_num_seqs>1 (#8892)
|
2024-09-27 01:15:58 -07:00 |
|
Peter Pan
|
0e088750af
|
[MISC] Fix invalid escape sequence '\' (#8830)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
|
2024-09-27 01:13:25 -07:00 |
|
youkaichao
|
dc4e3df5c2
|
[misc] fix collect env (#8894)
|
2024-09-27 00:26:38 -07:00 |
|
Cyrus Leung
|
3b00b9c26c
|
[Core] rename PromptInputs and inputs (#8876)
|
2024-09-26 20:35:15 -07:00 |
|
Maximilien de Bayser
|
344cd2b6f4
|
[Feature] Add support for Llama 3.1 and 3.2 tool use (#8343)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2024-09-26 17:01:42 -07:00 |
|
Cyrus Leung
|
1b49148e47
|
[Installation] Allow lower versions of FastAPI to maintain Ray 2.9 compatibility (#8764)
|
2024-09-26 16:54:09 -07:00 |
|
Nick Hill
|
4b377d6feb
|
[BugFix] Fix test breakages from transformers 4.45 upgrade (#8829)
|
2024-09-26 16:46:43 -07:00 |
|
Tyler Michael Smith
|
71d21c73ab
|
[Bugfix] Fixup advance_step.cu warning (#8815)
|
2024-09-26 16:23:45 -07:00 |
|
Chirag Jain
|
ee2da3e9ef
|
fix validation: Only set tool_choice auto if at least one tool is provided (#8568)
|
2024-09-26 16:23:17 -07:00 |
|
Tyler Michael Smith
|
e2f6f26e86
|
[Bugfix] Fix print_warning_once's line info (#8867)
|
2024-09-26 16:18:26 -07:00 |
|
Michael Goin
|
b28d2104de
|
[Misc] Change dummy profiling and BOS fallback warns to log once (#8820)
|
2024-09-26 16:18:14 -07:00 |
|
Pernekhan Utemuratov
|
93d364da34
|
[Bugfix] Include encoder prompt length in non-streaming API usage response (#8861)
|
2024-09-26 15:47:00 -07:00 |
|
Kevin H. Luu
|
d9cfbc891e
|
[ci] Soft fail Entrypoints, Samplers, LoRA, Decoder-only VLM (#8872)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-09-26 15:02:16 -07:00 |
|
youkaichao
|
70de39f6b4
|
[misc][installation] build from source without compilation (#8818)
|
2024-09-26 13:19:04 -07:00 |
|
fyuan1316
|
68988d4e0d
|
[CI/Build] Fix missing ci dependencies (#8834)
|
2024-09-26 11:04:39 -07:00 |
|
Michael Goin
|
520db4dbc1
|
[Docs] Add README to the build docker image (#8825)
|
2024-09-26 11:02:52 -07:00 |
|
Tyler Michael Smith
|
f70bccac75
|
[Build/CI] Upgrade to gcc 10 in the base build Docker image (#8814)
|
2024-09-26 10:07:18 -07:00 |
|
Roger Wang
|
4bb98f2190
|
[Misc] Update config loading for Qwen2-VL and remove Granite (#8837)
|
2024-09-26 07:45:30 -07:00 |
|
Michael Goin
|
7193774b1f
|
[Misc] Support quantization of MllamaForCausalLM (#8822)
v0.6.2
|
2024-09-25 14:46:22 -07:00 |
|
Roger Wang
|
e2c6e0a829
|
[Doc] Update doc for Transformers 4.45 (#8817)
|
2024-09-25 13:29:48 -07:00 |
|
Chen Zhang
|
770ec6024f
|
[Model] Add support for the multi-modal Llama 3.2 model (#8811)
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-09-25 13:29:32 -07:00 |
|
Simon Mo
|
4f1ba0844b
|
Revert "rename PromptInputs and inputs with backward compatibility (#8760)" (#8810)
|
2024-09-25 10:36:26 -07:00 |
|
Michael Goin
|
873edda6cf
|
[Misc] Support FP8 MoE for compressed-tensors (#8588)
|
2024-09-25 09:43:36 -07:00 |
|
科英
|
64840dfae4
|
[Frontend] MQLLMEngine supports profiling. (#8761)
|
2024-09-25 09:37:41 -07:00 |
|
Cyrus Leung
|
28e1299e60
|
rename PromptInputs and inputs with backward compatibility (#8760)
|
2024-09-25 09:36:47 -07:00 |
|
DefTruth
|
0c4d2ad5e6
|
[VLM][Bugfix] Fix InternVL with num_scheduler_steps > 1 (#8614)
|
2024-09-25 09:35:53 -07:00 |
|
Jee Jee Li
|
c6f2485c82
|
[Misc] Add extra deps for openai server image (#8792)
|
2024-09-25 09:35:23 -07:00 |
|
bnellnm
|
300da09177
|
[Kernel] Fullgraph and opcheck tests (#8479)
|
2024-09-25 08:35:52 -06:00 |
|
Hongxia Yang
|
1c046447a6
|
[CI/Build][Bugfix][Doc][ROCm] CI fix and doc update after ROCm 6.2 upgrade (#8777)
|
2024-09-25 22:26:37 +08:00 |
|
Woo-Yeon Lee
|
8fae5ed7f6
|
[Misc] Fix minor typo in scheduler (#8765)
|
2024-09-25 00:53:03 -07:00 |
|
David Newman
|
3368c3ab36
|
[Bugfix] Ray 2.9.x doesn't expose available_resources_per_node (#8767)
Signed-off-by: darthhexx <darthhexx@gmail.com>
|
2024-09-25 00:52:26 -07:00 |
|
Adam Tilghman
|
1ac3de09cd
|
[Frontend] OpenAI server: propagate usage accounting to FastAPI middleware layer (#8672)
|
2024-09-25 07:49:26 +00:00 |
|
sohamparikh
|
3e073e66f1
|
[Bugfix] load fc bias from config for eagle (#8790)
|
2024-09-24 23:16:30 -07:00 |
|
Isotr0py
|
c23953675f
|
[Hardware][CPU] Enable mrope and support Qwen2-VL on CPU backend (#8770)
|
2024-09-24 23:16:11 -07:00 |
|
zifeitong
|
e3dd0692fa
|
[BugFix] Propagate 'trust_remote_code' setting in internvl and minicpmv (#8250)
|
2024-09-25 05:53:43 +00:00 |
|
sroy745
|
fc3afc20df
|
Fix tests in test_chunked_prefill_scheduler which fail with BlockManager V2 (#8752)
|
2024-09-24 21:26:36 -07:00 |
|
sasha0552
|
b4522474a3
|
[Bugfix][Kernel] Implement acquire/release polyfill for Pascal (#8776)
|
2024-09-24 21:26:33 -07:00 |
|
sroy745
|
ee777d9c30
|
Fix test_schedule_swapped_simple in test_scheduler.py (#8780)
|
2024-09-24 21:26:18 -07:00 |
|
Joe Runde
|
6e0c9d6bd0
|
[Bugfix] Use heartbeats instead of health checks (#8583)
|
2024-09-24 20:37:38 -07:00 |
|
Archit Patke
|
6da1ab6b41
|
[Core] Adding Priority Scheduling (#5958)
|
2024-09-24 19:50:50 -07:00 |
|
Travis Johnson
|
01b6f9e1f0
|
[Core][Bugfix] Support prompt_logprobs returned with speculative decoding (#8047)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2024-09-24 17:29:56 -07:00 |
|
Jee Jee Li
|
13f9f7a3d0
|
[Misc] Upgrade bitsandbytes to the latest version, 0.44.0 (#8768)
|
2024-09-24 17:08:55 -07:00 |
|
youkaichao
|
1e7d5c01f5
|
[misc] soft drop beam search (#8763)
|
2024-09-24 15:48:39 -07:00 |
|
Daniele
|
2467b642dd
|
[CI/Build] fix setuptools-scm usage (#8771)
|
2024-09-24 12:38:12 -07:00 |
|
Lucas Wilkinson
|
72fc97a0f1
|
[Bugfix] Fix torch dynamo fixes caused by replace_parameters (#8748)
|
2024-09-24 14:33:21 -04:00 |
|
Andy
|
2529d09b5a
|
[Frontend] Batch inference for llm.chat() API (#8648)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2024-09-24 09:44:11 -07:00 |
|
ElizaWszola
|
a928ded995
|
[Kernel] Split Marlin MoE kernels into multiple files (#8661)
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2024-09-24 09:31:42 -07:00 |
|
Hanzhi Zhou
|
cc4325b66a
|
[Bugfix] Fix potentially unsafe custom allreduce synchronization (#8558)
|
2024-09-24 01:08:14 -07:00 |
|
Alex Brooks
|
8ff7ced996
|
[Model] Expose Phi3v num_crops as a mm_processor_kwarg (#8658)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-09-24 07:36:46 +00:00 |
|