biondizzle/vllm

Author	SHA1	Message	Date
Brendan Wong	4d31cd424b	[Frontend] merge beam search implementations (#9296)	2024-10-14 15:05:52 -07:00
Maximilien de Bayser	ec10cb8511	[BugFix] Fix tool call finish reason in streaming case (#9209) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2024-10-11 18:24:26 -07:00
Cyrus Leung	cfaa6008e6	[Bugfix] Access `get_vocab` instead of `vocab` in tool parsers (#9188)	2024-10-09 08:59:57 -06:00
Daniele	9a94ca4a5d	[Bugfix] fix OpenAI API server startup with --disable-frontend-multiprocessing (#8537)	2024-10-08 09:38:40 -07:00
Alex Brooks	069d3bd8d0	[Frontend] Add Early Validation For Chat Template / Tool Call Parser (#9151) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2024-10-08 14:31:26 +00:00
Brendan Wong	8c746226c9	[Frontend] API support for beam search for MQLLMEngine (#9117)	2024-10-08 05:51:43 +00:00
youkaichao	18b296fdb2	[core] remove beam search from the core (#9105)	2024-10-07 05:47:04 +00:00
Yanyi Liu	fdf59d30ea	[Bugfix] fix tool_parser error handling when serve a model not support it (#8709)	2024-10-06 12:51:08 +00:00
Brendan Wong	168cab6bbf	[Frontend] API support for beam search (#9087) Co-authored-by: youkaichao <youkaichao@126.com>	2024-10-05 23:39:03 -07:00
Flávia Béo	0dcc8cbe5a	Adds truncate_prompt_tokens param for embeddings creation (#8999) Signed-off-by: Flavia Beo <flavia.beo@ibm.com>	2024-10-04 18:31:40 +00:00
代君	3dbb215b38	[Frontend][Feature] support tool calling for internlm/internlm2_5-7b-chat model (#8405)	2024-10-04 10:36:39 +08:00
Guillaume Calmettes	83caf35e08	[BugFix] Enforce Mistral ToolCall id constraint when using the Mistral tool call parser (#9020)	2024-10-03 16:44:52 +08:00
Sebastian Schoennenbeck	35bd215168	[Core] [Frontend] Priority scheduling for embeddings and in the OpenAI-API (#8965)	2024-10-01 09:58:06 +00:00
Joe Runde	062c89e7c9	[Frontend][Core] Move guided decoding params into sampling params (#8252) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-10-01 09:34:25 +08:00
danieljannai21	6c9ba48fde	[Frontend] Added support for HF's new `continue_final_message` parameter (#8942)	2024-09-29 17:59:47 +00:00
Maximilien de Bayser	344cd2b6f4	[Feature] Add support for Llama 3.1 and 3.2 tool use (#8343) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2024-09-26 17:01:42 -07:00
Nick Hill	4b377d6feb	[BugFix] Fix test breakages from transformers 4.45 upgrade (#8829)	2024-09-26 16:46:43 -07:00
Chirag Jain	ee2da3e9ef	fix validation: Only set tool_choice `auto` if at least one tool is provided (#8568)	2024-09-26 16:23:17 -07:00
Pernekhan Utemuratov	93d364da34	[Bugfix] Include encoder prompts len to non-stream api usage response (#8861)	2024-09-26 15:47:00 -07:00
Chen Zhang	770ec6024f	[Model] Add support for the multi-modal Llama 3.2 model (#8811) Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: Chang Su <chang.s.su@oracle.com> Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-09-25 13:29:32 -07:00
Adam Tilghman	1ac3de09cd	[Frontend] OpenAI server: propagate usage accounting to FastAPI middleware layer (#8672)	2024-09-25 07:49:26 +00:00
Jiaxin Shan	260d40b5ea	[Core] Support Lora lineage and base model metadata management (#6315)	2024-09-20 06:20:56 +00:00
Nick Hill	d9cd78eb71	[BugFix] Nonzero exit code if MQLLMEngine startup fails (#8572)	2024-09-18 20:17:55 +00:00
Alexander Matveev	7c7714d856	[Core][Bugfix][Perf] Introduce `MQLLMEngine` to avoid `asyncio` OH (#8157) Co-authored-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-09-18 13:56:58 +00:00
Jiaxin Shan	e351572900	[Misc] Add argument to disable FastAPI docs (#8554)	2024-09-18 09:51:59 +00:00
Patrick von Platen	a54ed80249	[Model] Add mistral function calling format to all models loaded with "mistral" format (#8515) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-09-17 17:50:37 +00:00
Kevin Lin	47f5e03b5b	[Bugfix] Bind api server port before starting engine (#8491)	2024-09-16 13:56:28 -07:00
Nick Hill	acd5511b6d	[BugFix] Fix clean shutdown issues (#8492)	2024-09-16 09:33:46 -07:00
lewtun	837c1968f9	[Frontend] Expose revision arg in OpenAI server (#8501)	2024-09-16 15:55:26 +00:00
Nick Hill	551ce01078	[Core] Add engine option to return only deltas or final output (#7381)	2024-09-12 12:02:00 -07:00
Luis Vega	1f0c75afa9	[BugFix] Fix Duplicate Assignment in Hermes2ProToolParser (#8423)	2024-09-12 11:10:11 -07:00
tomeras91	5a60699c45	[Bugfix]: Fix the logic for deciding if tool parsing is used (#8366)	2024-09-12 03:55:30 +00:00
youkaichao	f842a7aff1	[misc] remove engine_use_ray (#8126)	2024-09-11 18:23:36 -07:00
Pooya Davoodi	cea95dfb94	[Frontend] Create ErrorResponse instead of raising exceptions in run_batch (#8347)	2024-09-11 05:30:11 +00:00
Cyrus Leung	8c054b7a62	[Frontend] Clean up type annotations for mistral tokenizer (#8314)	2024-09-10 16:49:11 +00:00
Adam Lugowski	58fcc8545a	[Frontend] Add progress reporting to run_batch.py (#8060) Co-authored-by: Adam Lugowski <adam.lugowski@parasail.io>	2024-09-09 11:16:37 -07:00
Kyle Mistele	08287ef675	[Bugfix] Streamed tool calls now more strictly follow OpenAI's format; ensures Vercel AI SDK compatibility (#8272)	2024-09-09 10:45:11 -04:00
Jiaxin Shan	db3bf7c991	[Core] Support load and unload LoRA in api server (#6566) Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2024-09-05 18:10:33 -07:00
Kyle Mistele	e02ce498be	[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649) Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com> Co-authored-by: Kyle Mistele <kyle@constellate.ai>	2024-09-04 13:18:13 -07:00
Cyrus Leung	855c262a6b	[Frontend] Multimodal support in offline chat (#8098)	2024-09-04 05:22:17 +00:00
Nick Hill	d4db9f53c8	[Benchmark] Add `--async-engine` option to benchmark_throughput.py (#7964)	2024-09-03 20:57:41 -04:00
Roger Wang	5231f0898e	[Frontend][VLM] Add support for multiple multi-modal items (#8049)	2024-08-31 16:35:53 -07:00
Nick Hill	4289cad37f	[Frontend] Minor optimizations to zmq decoupled front-end (#7957) Co-authored-by: Robert Shaw <rshaw@neuralmagic>	2024-08-28 17:22:43 -07:00
Cyrus Leung	51f86bf487	[mypy][CI/Build] Fix mypy errors (#7929)	2024-08-27 23:47:44 -07:00
Patrick von Platen	6fc4e6e07a	[Model] Add Mistral Tokenization to improve robustness and chat encoding (#7739)	2024-08-27 12:40:02 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	0b769992ec	[Bugfix]: Use float32 for base64 embedding (#7855) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2024-08-26 03:16:38 +00:00
Tyler Rockwood	d81abefd2e	[Frontend] add json_schema support from OpenAI protocol (#7654)	2024-08-23 23:07:24 -07:00
Pooya Davoodi	8da48e4d95	[Frontend] Publish Prometheus metrics in run_batch API (#7641)	2024-08-23 23:04:22 -07:00
Pooya Davoodi	6885fde317	[Bugfix] Fix run_batch logger (#7640)	2024-08-23 13:58:26 -07:00
Joe Runde	b903e1ba7f	[Frontend] error suppression cleanup (#7786) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-08-22 21:50:21 +00:00

243 Commits