youkaichao
|
04c12f8157
|
[misc] update utils to support comparing multiple settings (#9140)
|
2024-10-08 02:51:49 +00:00 |
|
Isotr0py
|
f19da64871
|
[Core] Refactor GGUF parameters packing and forwarding (#8859)
|
2024-10-07 10:01:46 +00:00 |
|
Isotr0py
|
4f95ffee6f
|
[Hardware][CPU] Cross-attention and Encoder-Decoder models support on CPU backend (#9089)
|
2024-10-07 06:50:35 +00:00 |
|
Cyrus Leung
|
8c6de96ea1
|
[Model] Explicit interface for vLLM models and support OOT embedding models (#9108)
|
2024-10-07 06:10:35 +00:00 |
|
youkaichao
|
18b296fdb2
|
[core] remove beam search from the core (#9105)
|
2024-10-07 05:47:04 +00:00 |
|
Varun Sundar Rabindranath
|
cb3b2b9ba4
|
[Bugfix] Fix incorrect updates to num_computed_tokens in multi-step scheduling (#9038)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2024-10-06 12:48:11 -07:00 |
|
Cyrus Leung
|
b22b798471
|
[Model] PP support for embedding models and update docs (#9090)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2024-10-06 16:35:27 +08:00 |
|
Brendan Wong
|
168cab6bbf
|
[Frontend] API support for beam search (#9087)
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-10-05 23:39:03 -07:00 |
|
Andy Dai
|
5df1834895
|
[Bugfix] Fix order of arguments matters in config.yaml (#8960)
|
2024-10-05 17:35:11 +00:00 |
|
Chen Zhang
|
cfadb9c687
|
[Bugfix] Deprecate registration of custom configs to huggingface (#9083)
|
2024-10-05 21:56:40 +08:00 |
|
Xin Yang
|
15986f598c
|
[Model] Support Gemma2 embedding model (#9004)
|
2024-10-05 06:57:05 +00:00 |
|
ElizaWszola
|
05d686432f
|
[Kernel] Zero point support in fused MarlinMoE kernel + AWQ Fused MoE (#8973)
Co-authored-by: Dipika <dipikasikka1@gmail.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>
|
2024-10-04 12:34:44 -06:00 |
|
Flávia Béo
|
0dcc8cbe5a
|
Adds truncate_prompt_tokens param for embeddings creation (#8999)
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
|
2024-10-04 18:31:40 +00:00 |
|
Roger Wang
|
26aa325f4f
|
[Core][VLM] Test registration for OOT multimodal models (#8717)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-10-04 10:38:25 -07:00 |
|
Prashant Gupta
|
9ade8bbc8d
|
[Model] add a bunch of supported lora modules for mixtral (#9008)
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
|
2024-10-04 16:24:40 +00:00 |
|
Cyrus Leung
|
0e36fd4909
|
[Misc] Move registry to its own file (#9064)
|
2024-10-04 10:01:37 +00:00 |
|
Murali Andoorveedu
|
0f6d7a9a34
|
[Models] Add remaining model PP support (#7168)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-10-04 10:56:58 +08:00 |
|
代君
|
3dbb215b38
|
[Frontend][Feature] support tool calling for internlm/internlm2_5-7b-chat model (#8405)
|
2024-10-04 10:36:39 +08:00 |
|
sroy745
|
91add85ec4
|
Fix failing spec decode test (#9054)
|
2024-10-03 23:07:29 +00:00 |
|
youkaichao
|
9aaf14c62e
|
[misc] add forward context for attention (#9029)
|
2024-10-03 12:09:42 -07:00 |
|
xendo
|
63e39937f9
|
[Frontend] [Neuron] Parse literals out of override-neuron-config (#8959)
Co-authored-by: Jerzy Zagorski <jzagorsk@amazon.com>
|
2024-10-03 18:02:07 +00:00 |
|
Guillaume Calmettes
|
83caf35e08
|
[BugFix] Enforce Mistral ToolCall id constraint when using the Mistral tool call parser (#9020)
|
2024-10-03 16:44:52 +08:00 |
|
Shawn Tan
|
19f0d25796
|
[Model] Adding Granite MoE. (#8206)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-10-03 09:33:57 +08:00 |
|
afeldman-nm
|
563649aafe
|
[Core] Combined support for multi-step scheduling, chunked prefill & prefix caching (#8804)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
|
2024-10-02 07:52:20 +00:00 |
|
Lily Liu
|
1570203864
|
[Spec Decode] (1/2) Remove batch expansion (#8839)
|
2024-10-01 16:04:42 -07:00 |
|
Isotr0py
|
bc4eb65b54
|
[Bugfix] Fix Fuyu tensor parallel inference (#8986)
|
2024-10-01 17:51:41 +08:00 |
|
Joe Runde
|
062c89e7c9
|
[Frontend][Core] Move guided decoding params into sampling params (#8252)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-10-01 09:34:25 +08:00 |
|
Lily Liu
|
bce324487a
|
[CI][SpecDecode] Fix spec decode tests, use flash attention backend for spec decode CI tests. (#8975)
|
2024-10-01 00:51:40 +00:00 |
|
Mor Zusman
|
f13a07b1f8
|
[Kernel][Model] Varlen prefill + Prefill chunking support for mamba kernels and Jamba model (#8533)
|
2024-09-29 17:35:58 -04:00 |
|
danieljannai21
|
6c9ba48fde
|
[Frontend] Added support for HF's new continue_final_message parameter (#8942)
|
2024-09-29 17:59:47 +00:00 |
|
Jee Jee Li
|
3d49776bbb
|
[Model][LoRA]LoRA support added for MiniCPMV2.5 (#7199)
|
2024-09-29 06:59:45 +00:00 |
|
Cyrus Leung
|
26a68d5d7e
|
[CI/Build] Add test decorator for minimum GPU memory (#8925)
|
2024-09-29 02:50:51 +00:00 |
|
ElizaWszola
|
d081da0064
|
[Bugfix] Fix Marlin MoE act order when is_k_full == False (#8741)
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-09-28 18:19:40 -07:00 |
|
sroy745
|
5bf8789b2a
|
[Bugfix] Block manager v2 with preemption and lookahead slots (#8824)
|
2024-09-29 09:17:45 +08:00 |
|
Cyrus Leung
|
e1a3f5e831
|
[CI/Build] Update models tests & examples (#8874)
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-09-28 09:54:35 -07:00 |
|
Varun Sundar Rabindranath
|
19d02ff938
|
[Bugfix] Fix PP for Multi-Step (#8887)
|
2024-09-28 08:52:46 -07:00 |
|
Varun Sundar Rabindranath
|
c2ec430ab5
|
[Core] Multi-Step + Single Step Prefills via Chunked Prefill code path (#8378)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2024-09-27 13:32:07 -07:00 |
|
Luka Govedič
|
172d1cd276
|
[Kernel] AQ AZP 4/4: Integrate asymmetric quantization to linear method (#7271)
|
2024-09-27 14:25:10 -04:00 |
|
youkaichao
|
a9b15c606f
|
[torch.compile] use empty tensor instead of None for profiling (#8875)
|
2024-09-27 08:11:32 -07:00 |
|
Isotr0py
|
6d792d2f31
|
[Bugfix][VLM] Fix Fuyu batching inference with max_num_seqs>1 (#8892)
|
2024-09-27 01:15:58 -07:00 |
|
Cyrus Leung
|
3b00b9c26c
|
[Core] renamePromptInputs and inputs (#8876)
|
2024-09-26 20:35:15 -07:00 |
|
Maximilien de Bayser
|
344cd2b6f4
|
[Feature] Add support for Llama 3.1 and 3.2 tool use (#8343)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2024-09-26 17:01:42 -07:00 |
|
Nick Hill
|
4b377d6feb
|
[BugFix] Fix test breakages from transformers 4.45 upgrade (#8829)
|
2024-09-26 16:46:43 -07:00 |
|
Chirag Jain
|
ee2da3e9ef
|
fix validation: Only set tool_choice auto if at least one tool is provided (#8568)
|
2024-09-26 16:23:17 -07:00 |
|
Chen Zhang
|
770ec6024f
|
[Model] Add support for the multi-modal Llama 3.2 model (#8811)
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-09-25 13:29:32 -07:00 |
|
Simon Mo
|
4f1ba0844b
|
Revert "rename PromptInputs and inputs with backward compatibility (#8760) (#8810)
|
2024-09-25 10:36:26 -07:00 |
|
Michael Goin
|
873edda6cf
|
[Misc] Support FP8 MoE for compressed-tensors (#8588)
|
2024-09-25 09:43:36 -07:00 |
|
Cyrus Leung
|
28e1299e60
|
rename PromptInputs and inputs with backward compatibility (#8760)
|
2024-09-25 09:36:47 -07:00 |
|
bnellnm
|
300da09177
|
[Kernel] Fullgraph and opcheck tests (#8479)
|
2024-09-25 08:35:52 -06:00 |
|
sroy745
|
fc3afc20df
|
Fix tests in test_chunked_prefill_scheduler which fail with BlockManager V2 (#8752)
|
2024-09-24 21:26:36 -07:00 |
|