344cd2b6f4  2024-09-26 17:01:42 -07:00  Maximilien de Bayser
  [Feature] Add support for Llama 3.1 and 3.2 tool use (#8343)
  Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

4b377d6feb  2024-09-26 16:46:43 -07:00  Nick Hill
  [BugFix] Fix test breakages from transformers 4.45 upgrade (#8829)

ee2da3e9ef  2024-09-26 16:23:17 -07:00  Chirag Jain
  fix validation: Only set tool_choice auto if at least one tool is provided (#8568)

93d364da34  2024-09-26 15:47:00 -07:00  Pernekhan Utemuratov
  [Bugfix] Include encoder prompts len to non-stream api usage response (#8861)

770ec6024f  2024-09-25 13:29:32 -07:00  Chen Zhang
  [Model] Add support for the multi-modal Llama 3.2 model (#8811)
  Co-authored-by: simon-mo <xmo@berkeley.edu>
  Co-authored-by: Chang Su <chang.s.su@oracle.com>
  Co-authored-by: Simon Mo <simon.mo@hey.com>
  Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
  Co-authored-by: Roger Wang <ywang@roblox.com>

1ac3de09cd  2024-09-25 07:49:26 +00:00  Adam Tilghman
  [Frontend] OpenAI server: propagate usage accounting to FastAPI middleware layer (#8672)

260d40b5ea  2024-09-20 06:20:56 +00:00  Jiaxin Shan
  [Core] Support Lora lineage and base model metadata management (#6315)

d9cd78eb71  2024-09-18 20:17:55 +00:00  Nick Hill
  [BugFix] Nonzero exit code if MQLLMEngine startup fails (#8572)

7c7714d856  2024-09-18 13:56:58 +00:00  Alexander Matveev
  [Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH (#8157)
  Co-authored-by: Nick Hill <nickhill@us.ibm.com>
  Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
  Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
  Co-authored-by: Simon Mo <simon.mo@hey.com>

e351572900  2024-09-18 09:51:59 +00:00  Jiaxin Shan
  [Misc] Add argument to disable FastAPI docs (#8554)

a54ed80249  2024-09-17 17:50:37 +00:00  Patrick von Platen
  [Model] Add mistral function calling format to all models loaded with "mistral" format (#8515)
  Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

47f5e03b5b  2024-09-16 13:56:28 -07:00  Kevin Lin
  [Bugfix] Bind api server port before starting engine (#8491)

acd5511b6d  2024-09-16 09:33:46 -07:00  Nick Hill
  [BugFix] Fix clean shutdown issues (#8492)

837c1968f9  2024-09-16 15:55:26 +00:00  lewtun
  [Frontend] Expose revision arg in OpenAI server (#8501)

551ce01078  2024-09-12 12:02:00 -07:00  Nick Hill
  [Core] Add engine option to return only deltas or final output (#7381)

1f0c75afa9  2024-09-12 11:10:11 -07:00  Luis Vega
  [BugFix] Fix Duplicate Assignment in Hermes2ProToolParser (#8423)

5a60699c45  2024-09-12 03:55:30 +00:00  tomeras91
  [Bugfix]: Fix the logic for deciding if tool parsing is used (#8366)

f842a7aff1  2024-09-11 18:23:36 -07:00  youkaichao
  [misc] remove engine_use_ray (#8126)

cea95dfb94  2024-09-11 05:30:11 +00:00  Pooya Davoodi
  [Frontend] Create ErrorResponse instead of raising exceptions in run_batch (#8347)

8c054b7a62  2024-09-10 16:49:11 +00:00  Cyrus Leung
  [Frontend] Clean up type annotations for mistral tokenizer (#8314)

58fcc8545a  2024-09-09 11:16:37 -07:00  Adam Lugowski
  [Frontend] Add progress reporting to run_batch.py (#8060)
  Co-authored-by: Adam Lugowski <adam.lugowski@parasail.io>

08287ef675  2024-09-09 10:45:11 -04:00  Kyle Mistele
  [Bugfix] Streamed tool calls now more strictly follow OpenAI's format; ensures Vercel AI SDK compatibility (#8272)

db3bf7c991  2024-09-05 18:10:33 -07:00  Jiaxin Shan
  [Core] Support load and unload LoRA in api server (#6566)
  Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

e02ce498be  2024-09-04 13:18:13 -07:00  Kyle Mistele
  [Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649)
  Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com>
  Co-authored-by: Kyle Mistele <kyle@constellate.ai>

855c262a6b  2024-09-04 05:22:17 +00:00  Cyrus Leung
  [Frontend] Multimodal support in offline chat (#8098)

d4db9f53c8  2024-09-03 20:57:41 -04:00  Nick Hill
  [Benchmark] Add --async-engine option to benchmark_throughput.py (#7964)

5231f0898e  2024-08-31 16:35:53 -07:00  Roger Wang
  [Frontend][VLM] Add support for multiple multi-modal items (#8049)

4289cad37f  2024-08-28 17:22:43 -07:00  Nick Hill
  [Frontend] Minor optimizations to zmq decoupled front-end (#7957)
  Co-authored-by: Robert Shaw <rshaw@neuralmagic>

51f86bf487  2024-08-27 23:47:44 -07:00  Cyrus Leung
  [mypy][CI/Build] Fix mypy errors (#7929)

6fc4e6e07a  2024-08-27 12:40:02 +00:00  Patrick von Platen
  [Model] Add Mistral Tokenization to improve robustness and chat encoding (#7739)

0b769992ec  2024-08-26 03:16:38 +00:00  ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
  [Bugfix]: Use float32 for base64 embedding (#7855)
  Signed-off-by: Hollow Man <hollowman@opensuse.org>

d81abefd2e  2024-08-23 23:07:24 -07:00  Tyler Rockwood
  [Frontend] add json_schema support from OpenAI protocol (#7654)

8da48e4d95  2024-08-23 23:04:22 -07:00  Pooya Davoodi
  [Frontend] Publish Prometheus metrics in run_batch API (#7641)

6885fde317  2024-08-23 13:58:26 -07:00  Pooya Davoodi
  [Bugfix] Fix run_batch logger (#7640)

b903e1ba7f  2024-08-22 21:50:21 +00:00  Joe Runde
  [Frontend] error suppression cleanup (#7786)
  Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>

cde9183b40  2024-08-22 02:18:11 +00:00  Joe Runde
  [Bug][Frontend] Improve ZMQ client robustness (#7443)
  Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>

dd53c4b023  2024-08-21 15:39:26 -07:00  William Lin
  [misc] Add Torch profiler support (#7451)
  Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>

970dfdc01d  2024-08-21 19:53:01 +00:00  Robert Shaw
  [Frontend] Improve Startup Failure UX (#7716)

f7e3b0c5aa  2024-08-21 13:34:14 -04:00  Robert Shaw
  [Bugfix][Frontend] Fix Issues Under High Load With zeromq Frontend (#7394)
  Co-authored-by: Nick Hill <nickhill@us.ibm.com>

baaedfdb2d  2024-08-20 23:28:21 -07:00  Cyrus Leung
  [mypy] Enable following imports for entrypoints (#7248)
  Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
  Co-authored-by: Fei <dfdfcai4@gmail.com>

e3b318216d  2024-08-18 20:19:48 +00:00  Robert Shaw
  [ Bugfix ] Fix Prometheus Metrics With zeromq Frontend (#7279)
  Co-authored-by: Nick Hill <nickhill@us.ibm.com>

bae888cb8e  2024-08-16 20:44:05 -07:00  Rui Qiao
  [Bugfix] Clear engine reference in AsyncEngineRPCServer (#7618)
  Signed-off-by: Rui Qiao <ruisearch42@gmail.com>

0e39a33c6d  2024-08-16 10:05:18 -06:00  Gordon Wong
  [Bugfix][Hardware][AMD][Frontend] add quantization param to embedding checking method (#7513)

9587b050fb  2024-08-15 22:48:07 -07:00  Nick Hill
  [Core] Use uvloop with zmq-decoupled front-end (#7570)

f878c8feb0  2024-08-16 02:38:08 +00:00  Grant Pinkert
  [Feature]: Add OpenAI server prompt_logprobs support #6508 (#7453)

9c8e2d1161  2024-08-15 18:26:19 -07:00  Michael Goin
  [Bugfix][Harmless] Fix float16 dtype for model_is_embedding (#7566)

67d115db08  2024-08-14 09:15:19 -07:00  jack
  [Bugfix][Frontend] Disable embedding API for chat models (#7504)
  Co-authored-by: jack <jack@alex>

33e5d7e6b6  2024-08-13 15:40:17 -07:00  youkaichao
  [frontend] spawn engine process from api server process (#7484)

97a6be95ba  2024-08-13 02:29:34 +00:00  Andrew Wang
  [Misc] improve logits processors logging message (#7435)

198d6a2898  2024-08-12 17:57:16 -07:00  Rui Qiao
  [Core] Shut down aDAG workers with clean async llm engine exit (#7224)
  Signed-off-by: Rui Qiao <ruisearch42@gmail.com>