biondizzle / vllm
Files: vllm / vllm / attention (at commit 9da4aad44b7878032ef2bb32eb1b4e1ab86f8351)
Latest commit: e1684a766a by Thomas Parnell, 2024-07-12 18:30:54 -07:00
[Bugfix] Fix hard-coded value of x in context_attention_fwd (#6373)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
backends     | [Bugfix][TPU] Fix megacore setting for v5e-litepod (#6397)                                                                              | 2024-07-12 15:59:47 -07:00
ops          | [Bugfix] Fix hard-coded value of x in context_attention_fwd (#6373)                                                                     | 2024-07-12 18:30:54 -07:00
__init__.py  | [Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681)                                   | 2024-05-15 14:00:10 +09:00
layer.py     | [Kernel] Correctly invoke prefill & decode kernels for cross-attention (towards eventual encoder/decoder model support) (#4888)         | 2024-07-08 17:12:15 +00:00
selector.py  | [Misc] Remove flashinfer warning, add flashinfer tests to CI (#6351)                                                                    | 2024-07-12 01:32:06 +00:00