biondizzle/vllm
Path: vllm/vllm/model_executor
Commit: 4abed65c5806d0514432d102f959a1c84d341171
Latest commit: chenqianfzh 4664ceaad6 support bitsandbytes 8-bit and FP4 quantized models (#7445), 2024-08-29 19:09:08 -04:00
Name                  Last commit                                                                          Date
guided_decoding/      [misc][core] lazy import outlines (#7831)                                            2024-08-24 00:51:38 -07:00
layers/               support bitsandbytes 8-bit and FP4 quantized models (#7445)                          2024-08-29 19:09:08 -04:00
model_loader/         support bitsandbytes 8-bit and FP4 quantized models (#7445)                          2024-08-29 19:09:08 -04:00
models/               [VLM][Core] Fix exceptions on ragged NestedTensors (#7974)                           2024-08-29 03:24:31 +00:00
__init__.py           [Performance] Optimize e2e overheads: Reduce python allocations (#7162)              2024-08-08 21:34:28 -07:00
custom_op.py          [XPU] fallback to native implementation for xpu custom op (#7670)                    2024-08-20 00:26:09 -07:00
parameter.py          [Misc] update fp8 to use vLLMParameter (#7437)                                        2024-08-22 08:36:18 -04:00
pooling_metadata.py   [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734)                   2024-05-11 11:30:37 -07:00
sampling_metadata.py  [Core] Optimize SPMD architecture with delta + serialization optimization (#7109)   2024-08-18 17:57:20 -07:00
utils.py              [Hardware][Neuron] Refactor neuron support (#3471)                                   2024-03-22 01:22:17 +00:00
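The latest change to guided_decoding/ (#7831) moves the outlines import out of module scope so that a plain `import vllm` does not pay outlines' startup cost. A minimal sketch of that lazy-import pattern, with a hypothetical helper name (it assumes the `outlines` package is installed):

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def _outlines_module():
        """Hypothetical helper illustrating the lazy-import idea of #7831."""
        # Importing outlines is slow, so defer it until guided decoding is
        # first requested, then cache the loaded module for later calls.
        import outlines
        return outlines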
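The headline commit here (#7445) wires bitsandbytes 8-bit and FP4 quantization through layers/ and model_loader/. A sketch of how a vLLM build at or after this commit might exercise that path from the Python API; the model ID is illustrative, and the `bitsandbytes` package must be installed:

    from vllm import LLM, SamplingParams

    # In-flight bitsandbytes quantization: weights are quantized at load time.
    # quantization/load_format="bitsandbytes" is the pairing used for this
    # path; the model below is only an example.
    llm = LLM(
        model="huggyllama/llama-7b",
        quantization="bitsandbytes",
        load_format="bitsandbytes",
    )

    out = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
    print(out[0].outputs[0].text)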
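custom_op.py defines the CustomOp base class, which binds a platform-specific forward implementation at construction time; #7670 makes the XPU path fall back to the native PyTorch implementation. A simplified sketch of that dispatch, paraphrased rather than copied from the file:

    import torch
    from torch import nn

    class CustomOp(nn.Module):
        """Simplified sketch of the dispatch in model_executor/custom_op.py."""

        def __init__(self):
            super().__init__()
            # Pick the platform-specific implementation once, up front.
            self._forward_method = self.dispatch_forward()

        def forward(self, *args, **kwargs):
            return self._forward_method(*args, **kwargs)

        def forward_native(self, *args, **kwargs):
            # Pure-PyTorch reference implementation; subclasses override this.
            raise NotImplementedError

        def forward_cuda(self, *args, **kwargs):
            return self.forward_native(*args, **kwargs)

        def forward_xpu(self, *args, **kwargs):
            # Per #7670: no dedicated XPU kernel, so fall back to native.
            return self.forward_native(*args, **kwargs)

        def dispatch_forward(self):
            if torch.cuda.is_available():
                return self.forward_cuda
            if hasattr(torch, "xpu") and torch.xpu.is_available():
                return self.forward_xpu
            return self.forward_native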