vllm/vllm/model_executor at 2709c0009aa434fbf2ef0fe48ca3094a2268b190

biondizzle/vllm


Simon Mo dd7e8f5f64 refactor complemention api for readability (#2499)

2024-01-18 16:45:14 -08:00


[Experimental] Prefix Caching Support (#1669)

2024-01-17 16:32:10 -08:00

fix stablelm.py tensor-parallel-size bug (#2482)

2024-01-18 09:39:46 -08:00

Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)

2024-01-03 11:30:22 -08:00

__init__.py

Refactor Worker & InputMetadata (#1843)

2023-11-29 22:16:37 -08:00

input_metadata.py

[Experimental] Prefix Caching Support (#1669)

2024-01-17 16:32:10 -08:00

model_loader.py

Implement lazy model loader (#2044)

2023-12-12 22:21:45 -08:00

sampling_metadata.py

Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)

2024-01-03 11:30:22 -08:00

utils.py

TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622)

2023-11-15 22:50:41 -08:00

weight_utils.py

refactor complemention api for readability (#2499)

2024-01-18 16:45:14 -08:00