biondizzle/vllm
Files in vllm/vllm/v1/engine at commit 54271bb7661be9a05dde5ac1164f47659add2f6f

Latest commit: 48cb2109b6 by Daniel Li, [V1] Move usage stats to worker and start logging TPU hardware (#16211), 2025-04-25 14:06:01 -06:00
File                    Date                          Last commit
__init__.py             2025-04-22 19:12:15 -07:00    [V1][DP] More robust DP/EP dummy request coordination (#16277)
async_llm.py            2025-04-25 14:06:01 -06:00    [V1] Move usage stats to worker and start logging TPU hardware (#16211)
core_client.py          2025-04-23 08:50:08 -07:00    Use @property and private field for data_parallel_rank_local (#17053)
core.py                 2025-04-24 06:15:03 -07:00    [V1][Structured Output] Clear xgrammar compiler object when engine core shut down to avoid nanobind leaked warning (#16954)
detokenizer.py          2025-04-25 17:10:32 +08:00    Only turn on FastIncrementalDetokenizer when tokenizers >= 0.21.1 (#17158)
exceptions.py           2025-04-16 19:48:34 -07:00    [V1][Frontend] Improve Shutdown And Logs (#11737)
llm_engine.py           2025-04-25 14:06:01 -06:00    [V1] Move usage stats to worker and start logging TPU hardware (#16211)
logprobs.py             2025-03-24 12:27:57 -04:00    [V1] Aggregate chunked prompt logprobs in model runner (#14875)
mm_input_cache.py       2025-04-14 09:24:16 -07:00    [Bugfix] Multi-modal caches not acting like LRU caches (#16593)
output_processor.py     2025-04-24 04:43:56 -07:00    Simplify TokenizerGroup (#16790)
parallel_sampling.py    2025-03-20 22:24:10 -07:00    [V1] Avoid redundant input processing in n>1 case (#14985)
processor.py            2025-04-24 04:43:56 -07:00    Simplify TokenizerGroup (#16790)
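
Notes on a few of the changes above. The sketches that follow are illustrative only and are not taken from vLLM's source.

The core_client.py change (#17053) names a standard Python pattern: expose a value through a read-only @property backed by a private field. A minimal sketch, assuming a hypothetical client class (only the attribute name data_parallel_rank_local comes from the commit message):

```python
class EngineCoreClient:
    """Hypothetical class illustrating the @property-plus-private-field
    pattern named in the core_client.py commit (#17053); not vLLM code."""

    def __init__(self, data_parallel_rank_local=None):
        # Private backing field; callers read it via the property below.
        self._data_parallel_rank_local = data_parallel_rank_local

    @property
    def data_parallel_rank_local(self):
        # Read-only accessor. Routing reads through a property means
        # validation or lazy computation can be added later without
        # changing the public attribute name.
        return self._data_parallel_rank_local
```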
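The mm_input_cache.py bugfix (#16593) hinges on what "acting like an LRU cache" means: a lookup must refresh an entry's recency, and eviction must drop the least-recently-used entry. A minimal sketch of that contract using collections.OrderedDict (a generic illustration, not vLLM's cache):

```python
from collections import OrderedDict


class SimpleLRUCache:
    """Generic LRU cache sketch; not vLLM's implementation."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        # Refresh recency on read -- the step a broken "LRU" often omits.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            # Evict the least-recently-used entry (front of the dict).
            self._data.popitem(last=False)
```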
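The detokenizer.py change (#17158) gates FastIncrementalDetokenizer on the installed tokenizers version. A common way to implement such a gate looks like the following; the flag name here is hypothetical:

```python
# Hypothetical version gate: enable the fast path only when the
# installed `tokenizers` package is at least 0.21.1.
from importlib.metadata import version

from packaging.version import Version

USE_FAST_INCREMENTAL_DETOKENIZER = (
    Version(version("tokenizers")) >= Version("0.21.1")
)
```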