Chauncey
|
3b00ff9138
|
[Bugfix][v1] xgrammar structured output supports Enum. (#15594)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-03-28 06:14:53 -07:00 |
|
Robert Shaw
|
2d9045fce8
|
[TPU][CI] Fix TPUModelRunner Test (#15667)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-03-28 00:01:26 -07:00 |
|
Robert Shaw
|
8a49eea74b
|
[CI][TPU] Temporarily Disable Quant Test on TPU (#15649)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-03-27 19:45:05 -07:00 |
|
Nick Hill
|
15dac210f0
|
[V1] AsyncLLM data parallel (#13923)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-27 16:14:41 -07:00 |
|
Nicolò Lucchesi
|
4098b72210
|
[Bugfix][TPU][V1] Fix recompilation (#15553)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-03-27 19:15:06 +00:00 |
|
Cody Yu
|
54aa619459
|
[V1] Refactor num_computed_tokens logic (#15307)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-27 04:54:36 +00:00 |
|
marko
|
27df5199d9
|
Support SHA256 as hash function in prefix caching (#15297)
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
|
2025-03-26 11:11:28 -07:00 |
|
Nick Hill
|
35fad35a48
|
[V1][Sampler] Faster top-k only implementation (#15478)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-26 10:56:47 -07:00 |
|
Chenyaaang
|
ac3cd6e83c
|
[core] add bucket padding to tpu_model_runner (#14995)
Signed-off-by: Chenyaaang <llccyy1212@gmail.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-03-25 17:27:22 -04:00 |
|
Lu Fang
|
082ab86f5f
|
[V1] Support long_prefill_token_threshold in v1 scheduler (#15419)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-03-25 14:22:26 -07:00 |
|
yarongmu-google
|
0a049c7d86
|
[CI/Build] Add tests for the V1 tpu_model_runner. (#14843)
Signed-off-by: Yarong Mu <ymu@google.com>
|
2025-03-25 12:27:16 -04:00 |
|
Russell Bryant
|
a09ad90a72
|
[V1] guidance backend for structured output + auto fallback mode (#14779)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <jc1da.3011@gmail.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
|
2025-03-24 21:02:33 -07:00 |
|
Woosuk Kwon
|
ebcebeeb6b
|
[V1][Spec Decode] Enable spec decode for top-p & top-k sampling (#15063)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-24 17:16:46 -07:00 |
|
Nick Hill
|
9d72daf4ce
|
[V1][Perf] Simpler request output queues (#15156)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-03-24 22:44:08 +00:00 |
|
Woosuk Kwon
|
b9bd76ca14
|
[V1][Spec Decode] Respect prompt_lookup_max (#15348)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-23 10:41:44 -07:00 |
|
shangmingc
|
50c9636d87
|
[V1][Usage] Refactor speculative decoding configuration and tests (#14434)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-03-22 19:28:10 -10:00 |
|
Russell Bryant
|
eb63ea1e18
|
[V1] Add disable-any-whitespace option support for xgrammar (#15316)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-22 15:56:17 +00:00 |
|
Nicolò Lucchesi
|
cfbb8c930f
|
[TPU][V1] MHA Pallas backend (#15288)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-03-21 08:50:39 -07:00 |
|
Chen Zhang
|
93a00d7dde
|
[v1] Refactor KVCacheConfig (#14079)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-03-21 04:56:27 -07:00 |
|
Hyesoo Yang
|
47195057e9
|
[V1][TPU] Speed up top-k on TPU by using torch.topk (#15242)
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
|
2025-03-20 19:19:40 -07:00 |
|
Woosuk Kwon
|
0c6f5023c3
|
[V1] Scheduler Refactoring [1/N] - Add Scheduler Interface (#15250)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-03-20 17:50:43 -07:00 |
|
Jason
|
d8e82bc06d
|
[Bugfix] fix V1 Engine crash while handling requests with duplicate request id (#15043)
Signed-off-by: Jiahui Sun <jhsun2020@gmail.com>
|
2025-03-20 10:01:02 -07:00 |
|
Nicolò Lucchesi
|
d8c6d7d6b5
|
[V1][TPU] Support V1 Sampler for ragged attention (#14227)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-03-19 21:00:39 -07:00 |
|
Murali Andoorveedu
|
61c7a1b856
|
[V1] Minor V1 async engine test refactor (#15075)
Signed-off-by: andoorve <murali.andoorveedu@mail.utoronto.ca>
Co-authored-by: andoorve <murali.andoorveedu@mail.utoronto.ca>
|
2025-03-19 10:37:17 -07:00 |
|
Cyrus Leung
|
f690372b68
|
[Core] Update dtype detection and defaults (#14858)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-19 13:49:33 +08:00 |
|
Alexander Matveev
|
72a8639b68
|
[V1] TPU - CI/CD use smaller model (#15054)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-03-18 21:39:21 +00:00 |
|
Woosuk Kwon
|
99abb8b650
|
[V1][Spec Decode] Optimize Rejection Sampler with Triton Kernels (#14930)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-18 14:31:54 -07:00 |
|
Aaron Pham
|
c0efdd655b
|
[Fix][Structured Output] using vocab_size to construct matcher (#14868)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-03-17 11:42:45 -04:00 |
|
vllmellm
|
2bb0e1a799
|
[Bugfix][ROCm] running new process using spawn method for rocm in tests. (#14810)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-03-17 11:33:35 +00:00 |
|
Lily Liu
|
8d6cf89526
|
[V1] [Spec Decode] Support random sampling for spec decode (#13933)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-16 22:00:20 -07:00 |
|
Sibi
|
a73e183e36
|
[Misc] Replace os environ to monkeypatch in test suite (#14516)
Signed-off-by: sibi <85477603+t-sibiraj@users.noreply.github.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-03-16 20:35:57 -07:00 |
|
Robert Shaw
|
aecc780dba
|
[V1] Enable Entrypoints Tests (#14903)
|
2025-03-16 17:56:16 -07:00 |
|
Nick Hill
|
fc1f67715d
|
[BugFix][V1] Fix overhead related to bad_words sampling when not in use (#14894)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-16 14:53:34 -07:00 |
|
Lily Liu
|
d1ad2a57af
|
[V1] [Spec Decode] Fix ngram tests (#14878)
|
2025-03-16 00:29:22 -07:00 |
|
Robert Shaw
|
d4d93db2c5
|
[V1] V1 Enablement Oracle (#13726)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2025-03-14 22:02:20 -07:00 |
|
Russell Bryant
|
46f98893dd
|
[V1] Fix model parameterization for structured output tests (#14833)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-14 20:55:18 +00:00 |
|
afeldman-nm
|
02fcaa3d0a
|
[V1] Detokenizer: Respect Stop Tokens + not include_stop_str_in_output (#14624)
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
|
2025-03-13 19:07:34 +00:00 |
|
Nick Hill
|
f5d3acd474
|
[BugFix][V1] Fix parallel sampling finishing/aborts (#14512)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-12 10:29:48 -07:00 |
|
Benjamin Chislett
|
5c538c37b2
|
[V1][Bugfix][Spec Decode] Fix incorrect outputs in V1 speculative decoding due to batch indexing (#14645)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
|
2025-03-11 22:12:41 -07:00 |
|
Aaron Pham
|
77a318bd01
|
[V1][Core] Support MistralTokenizer for Structured Output (#14625)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-03-12 10:40:09 +08:00 |
|
Russell Bryant
|
4bf82d4b90
|
[V1] Add regex structured output support with xgrammar (#14590)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-11 23:03:44 +08:00 |
|
22quinn
|
eb8b5eb183
|
[V1] Support bad_words in sampler (#13376)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-03-08 14:50:26 -08:00 |
|
Alexander Matveev
|
cb8bdfade2
|
[V1] TPU - Add tensor parallel support via Ray (#13618)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-03-08 08:19:38 -05:00 |
|
afeldman-nm
|
ef64044079
|
[V1] Prompt logprobs + APC compatibility; prompt logprobs reqs cannot fill APC (#13949)
|
2025-03-08 01:48:12 +00:00 |
|
Nick Hill
|
8ed5421aaa
|
[V1] Eagerly remove finished requests from the batch (#14388)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-07 10:56:00 -08:00 |
|
Aaron Pham
|
80e9afb5bc
|
[V1][Core] Support for Structured Outputs (#12388)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-03-07 07:19:11 -08:00 |
|
Himanshu Jaju
|
cd579352bf
|
[V1] Do not detokenize if sampling param detokenize is False (#14224)
Signed-off-by: Himanshu Jaju <hj@mistral.ai>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-03-06 10:40:24 -08:00 |
|
Harry Mellor
|
bf0560bda9
|
Reinstate best_of for V0 (#14356)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-06 08:34:22 -08:00 |
|
Lucas Wilkinson
|
f6bb18fd9a
|
[BugFix] MLA + V1, illegal memory access and accuracy issues (#14253)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-03-05 17:10:13 -08:00 |
|
Lu Fang
|
53ea6ad830
|
[V1][Easy] Add empty allowed_token_ids in the v1 sampler test (#14308)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-03-05 21:41:18 +00:00 |
|