biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Maximilien de Bayser	344cd2b6f4	[Feature] Add support for Llama 3.1 and 3.2 tool use (#8343 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2024-09-26 17:01:42 -07:00
Nick Hill	4b377d6feb	[BugFix] Fix test breakages from transformers 4.45 upgrade (#8829 )	2024-09-26 16:46:43 -07:00
Chirag Jain	ee2da3e9ef	fix validation: Only set tool_choice `auto` if at least one tool is provided (#8568 )	2024-09-26 16:23:17 -07:00
Pernekhan Utemuratov	93d364da34	[Bugfix] Include encoder prompts len to non-stream api usage response (#8861 )	2024-09-26 15:47:00 -07:00
Chen Zhang	770ec6024f	[Model] Add support for the multi-modal Llama 3.2 model (#8811 ) Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: Chang Su <chang.s.su@oracle.com> Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-09-25 13:29:32 -07:00
Adam Tilghman	1ac3de09cd	[Frontend] OpenAI server: propagate usage accounting to FastAPI middleware layer (#8672 )	2024-09-25 07:49:26 +00:00
Jiaxin Shan	260d40b5ea	[Core] Support Lora lineage and base model metadata management (#6315 )	2024-09-20 06:20:56 +00:00
Nick Hill	d9cd78eb71	[BugFix] Nonzero exit code if MQLLMEngine startup fails (#8572 )	2024-09-18 20:17:55 +00:00
Alexander Matveev	7c7714d856	[Core][Bugfix][Perf] Introduce `MQLLMEngine` to avoid `asyncio` OH (#8157 ) Co-authored-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-09-18 13:56:58 +00:00
Jiaxin Shan	e351572900	[Misc] Add argument to disable FastAPI docs (#8554 )	2024-09-18 09:51:59 +00:00
Patrick von Platen	a54ed80249	[Model] Add mistral function calling format to all models loaded with "mistral" format (#8515 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-09-17 17:50:37 +00:00
Kevin Lin	47f5e03b5b	[Bugfix] Bind api server port before starting engine (#8491 )	2024-09-16 13:56:28 -07:00
Nick Hill	acd5511b6d	[BugFix] Fix clean shutdown issues (#8492 )	2024-09-16 09:33:46 -07:00
lewtun	837c1968f9	[Frontend] Expose revision arg in OpenAI server (#8501 )	2024-09-16 15:55:26 +00:00
Nick Hill	551ce01078	[Core] Add engine option to return only deltas or final output (#7381 )	2024-09-12 12:02:00 -07:00
Luis Vega	1f0c75afa9	[BugFix] Fix Duplicate Assignment in Hermes2ProToolParser (#8423 )	2024-09-12 11:10:11 -07:00
tomeras91	5a60699c45	[Bugfix]: Fix the logic for deciding if tool parsing is used (#8366 )	2024-09-12 03:55:30 +00:00
youkaichao	f842a7aff1	[misc] remove engine_use_ray (#8126 )	2024-09-11 18:23:36 -07:00
Pooya Davoodi	cea95dfb94	[Frontend] Create ErrorResponse instead of raising exceptions in run_batch (#8347 )	2024-09-11 05:30:11 +00:00
Cyrus Leung	8c054b7a62	[Frontend] Clean up type annotations for mistral tokenizer (#8314 )	2024-09-10 16:49:11 +00:00
Adam Lugowski	58fcc8545a	[Frontend] Add progress reporting to run_batch.py (#8060 ) Co-authored-by: Adam Lugowski <adam.lugowski@parasail.io>	2024-09-09 11:16:37 -07:00
Kyle Mistele	08287ef675	[Bugfix] Streamed tool calls now more strictly follow OpenAI's format; ensures Vercel AI SDK compatibility (#8272 )	2024-09-09 10:45:11 -04:00
Jiaxin Shan	db3bf7c991	[Core] Support load and unload LoRA in api server (#6566 ) Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2024-09-05 18:10:33 -07:00
Kyle Mistele	e02ce498be	[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649 ) Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com> Co-authored-by: Kyle Mistele <kyle@constellate.ai>	2024-09-04 13:18:13 -07:00
Cyrus Leung	855c262a6b	[Frontend] Multimodal support in offline chat (#8098 )	2024-09-04 05:22:17 +00:00
Nick Hill	d4db9f53c8	[Benchmark] Add `--async-engine` option to benchmark_throughput.py (#7964 )	2024-09-03 20:57:41 -04:00
Roger Wang	5231f0898e	[Frontend][VLM] Add support for multiple multi-modal items (#8049 )	2024-08-31 16:35:53 -07:00
Nick Hill	4289cad37f	[Frontend] Minor optimizations to zmq decoupled front-end (#7957 ) Co-authored-by: Robert Shaw <rshaw@neuralmagic>	2024-08-28 17:22:43 -07:00
Cyrus Leung	51f86bf487	[mypy][CI/Build] Fix mypy errors (#7929 )	2024-08-27 23:47:44 -07:00
Patrick von Platen	6fc4e6e07a	[Model] Add Mistral Tokenization to improve robustness and chat encoding (#7739 )	2024-08-27 12:40:02 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	0b769992ec	[Bugfix]: Use float32 for base64 embedding (#7855 ) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2024-08-26 03:16:38 +00:00
Tyler Rockwood	d81abefd2e	[Frontend] add json_schema support from OpenAI protocol (#7654 )	2024-08-23 23:07:24 -07:00
Pooya Davoodi	8da48e4d95	[Frontend] Publish Prometheus metrics in run_batch API (#7641 )	2024-08-23 23:04:22 -07:00
Pooya Davoodi	6885fde317	[Bugfix] Fix run_batch logger (#7640 )	2024-08-23 13:58:26 -07:00
Joe Runde	b903e1ba7f	[Frontend] error suppression cleanup (#7786 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-08-22 21:50:21 +00:00
Joe Runde	cde9183b40	[Bug][Frontend] Improve ZMQ client robustness (#7443 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-08-22 02:18:11 +00:00
William Lin	dd53c4b023	[misc] Add Torch profiler support (#7451 ) Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2024-08-21 15:39:26 -07:00
Robert Shaw	970dfdc01d	[Frontend] Improve Startup Failure UX (#7716 )	2024-08-21 19:53:01 +00:00
Robert Shaw	f7e3b0c5aa	[Bugfix][Frontend] Fix Issues Under High Load With `zeromq` Frontend (#7394 ) Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-08-21 13:34:14 -04:00
Cyrus Leung	baaedfdb2d	[mypy] Enable following imports for entrypoints (#7248 ) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Fei <dfdfcai4@gmail.com>	2024-08-20 23:28:21 -07:00
Robert Shaw	e3b318216d	[ Bugfix ] Fix Prometheus Metrics With `zeromq` Frontend (#7279 ) Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-08-18 20:19:48 +00:00
Rui Qiao	bae888cb8e	[Bugfix] Clear engine reference in AsyncEngineRPCServer (#7618 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2024-08-16 20:44:05 -07:00
Gordon Wong	0e39a33c6d	[Bugfix][Hardware][AMD][Frontend] add quantization param to embedding checking method (#7513 )	2024-08-16 10:05:18 -06:00
Nick Hill	9587b050fb	[Core] Use uvloop with zmq-decoupled front-end (#7570 )	2024-08-15 22:48:07 -07:00
Grant Pinkert	f878c8feb0	[Feature]: Add OpenAI server prompt_logprobs support #6508 (#7453 )	2024-08-16 02:38:08 +00:00
Michael Goin	9c8e2d1161	[Bugfix][Harmless] Fix float16 dtype for model_is_embedding (#7566 )	2024-08-15 18:26:19 -07:00
jack	67d115db08	[Bugfix][Frontend] Disable embedding API for chat models (#7504 ) Co-authored-by: jack <jack@alex>	2024-08-14 09:15:19 -07:00
youkaichao	33e5d7e6b6	[frontend] spawn engine process from api server process (#7484 )	2024-08-13 15:40:17 -07:00
Andrew Wang	97a6be95ba	[Misc] improve logits processors logging message (#7435 )	2024-08-13 02:29:34 +00:00
Rui Qiao	198d6a2898	[Core] Shut down aDAG workers with clean async llm engine exit (#7224 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2024-08-12 17:57:16 -07:00

1 2 3 4 5 ...

278 Commits