0 votes
in Machine Learning by (19k points)

While training a TensorFlow seq2seq model, I see the following messages:

W tensorflow/core/common_runtime/gpu/] PoolAllocator: After 0 get requests, put_count=2362 evicted_count=2000 eviction_rate=0.84674 and unsatisfied allocation rate=-nan

W tensorflow/core/common_runtime/gpu/] PoolAllocator: After 38 get requests, put_count=5436 evicted_count=5000 eviction_rate=0.919794 and unsatisfied allocation rate=0

What do these messages mean? Do they indicate a resource allocation problem? I am running on a Titan X GPU (3500+ CUDA cores, 12 GB of memory).

1 Answer

0 votes
by (33.1k points)

TensorFlow has multiple memory allocators, each for memory that will be used in a different way. Their behavior has some adaptive aspects.

For GPU users, there is a PoolAllocator for CPU memory that is pre-registered with the GPU for fast DMA. For example, a tensor that is expected to be transferred from CPU to GPU will be allocated from this pool.

The main idea behind a PoolAllocator is to amortize the cost of calling a more expensive underlying allocator by keeping a pool of allocated-then-freed chunks that are eligible for immediate reuse. By default, the pool grows slowly until the eviction rate drops below some constant. (The eviction rate is the proportion of free calls in which an unused chunk has to be returned from the pool to the underlying allocator so that the pool does not exceed its size limit.) Alongside messages like the ones above, the log may also contain "Raising pool_size_limit_" lines, which show the pool size growing. If your program has a steady-state behavior with a maximum-size collection of chunks that it needs, the pool will grow to accommodate it and then grow no more. It behaves this way, rather than simply retaining every chunk ever allocated, so that sizes needed only rarely, or only during program startup, are less likely to be retained in the pool.
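To make the "grow until evictions stop" behavior concrete, here is a minimal toy model in Python. It is only a sketch and not TensorFlow's actual implementation: the class name ToyPoolAllocator, the check interval, the tolerable rate, and the one-slot growth step are all assumptions chosen to keep the demo short. It tracks the same kind of counters that appear in the log (get requests, put_count, evicted_count, eviction_rate) and raises its size limit only while the measured eviction rate stays above the threshold.

```python
class ToyPoolAllocator:
    """Toy size-limited pool of freed chunks; NOT TensorFlow's implementation."""

    def __init__(self, pool_size_limit=2, check_interval=80, tolerable_rate=0.1):
        # All three defaults are made-up values chosen to keep the demo short.
        self.pool_size_limit = pool_size_limit
        self.check_interval = check_interval
        self.tolerable_rate = tolerable_rate
        self.pool = []                 # freed chunks, oldest first
        self.get_count = 0
        self.put_count = 0
        self.allocated_count = 0       # gets the pool could not satisfy
        self.evicted_count = 0         # chunks handed back to the underlying allocator
        self.limit_history = [pool_size_limit]
        self.last_eviction_rate = None

    def get(self, size):
        self.get_count += 1
        for i, chunk in enumerate(self.pool):
            if len(chunk) == size:     # reuse a pooled chunk of the right size
                return self.pool.pop(i)
        self.allocated_count += 1      # pool miss: use the "expensive" allocator
        return bytearray(size)

    def put(self, chunk):
        self.put_count += 1
        self.pool.append(chunk)
        if len(self.pool) > self.pool_size_limit:
            self.pool.pop(0)           # evict the oldest chunk to respect the limit
            self.evicted_count += 1
        if self.put_count >= self.check_interval:
            self._check_eviction_rate()

    def _check_eviction_rate(self):
        self.last_eviction_rate = self.evicted_count / self.put_count
        if self.last_eviction_rate > self.tolerable_rate:
            self.pool_size_limit += 1  # grow slowly, one slot at a time
            self.limit_history.append(self.pool_size_limit)
        # Start a fresh measurement window.
        self.get_count = self.put_count = 0
        self.allocated_count = self.evicted_count = 0


def simulate(steps=100, chunks_per_step=8, chunk_size=1024):
    """Steady-state workload: every step borrows and returns the same 8 chunks."""
    alloc = ToyPoolAllocator()
    for _ in range(steps):
        chunks = [alloc.get(chunk_size) for _ in range(chunks_per_step)]
        for chunk in chunks:
            alloc.put(chunk)
    return alloc


alloc = simulate()
print(alloc.limit_history, alloc.last_eviction_rate)
# prints: [2, 3, 4, 5, 6, 7, 8] 0.0
```

In this sketch the limit climbs from 2 to 8, which is exactly the number of chunks the steady-state workload needs, and then stops because the eviction rate has fallen to zero. A chunk size that was used only during startup would be evicted along the way instead of being kept forever.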

These messages should only be a cause for concern if you run out of memory, in which case they may help diagnose the problem. Note also that peak execution speed may only be reached after the memory pools have grown to the proper size.
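If you want to inspect these messages programmatically, a small parser is enough. The sketch below is an illustration only: the regular expression and the helper name parse_pool_allocator_line are my own, and the pattern is based solely on the two lines quoted in the question, so other TensorFlow versions may format the message differently. It also recomputes the eviction rate as evicted_count / put_count, which is consistent with the numbers in both quoted lines.

```python
import re

# Pattern inferred from the two log lines in the question (an assumption,
# not an official format specification).
LOG_PATTERN = re.compile(
    r"After (?P<get>\d+) get requests, put_count=(?P<put>\d+) "
    r"evicted_count=(?P<evicted>\d+) eviction_rate=(?P<rate>[\d.]+)"
)

LOG_LINES = [
    "W tensorflow/core/common_runtime/gpu/] PoolAllocator: After 0 get requests, "
    "put_count=2362 evicted_count=2000 eviction_rate=0.84674 "
    "and unsatisfied allocation rate=-nan",
    "W tensorflow/core/common_runtime/gpu/] PoolAllocator: After 38 get requests, "
    "put_count=5436 evicted_count=5000 eviction_rate=0.919794 "
    "and unsatisfied allocation rate=0",
]


def parse_pool_allocator_line(line):
    """Return the counters from one PoolAllocator line, or None if it does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    put_count = int(match.group("put"))
    evicted_count = int(match.group("evicted"))
    return {
        "get_count": int(match.group("get")),
        "put_count": put_count,
        "evicted_count": evicted_count,
        "logged_eviction_rate": float(match.group("rate")),
        # Recomputed from the raw counters; None when nothing was freed yet.
        "recomputed_eviction_rate": evicted_count / put_count if put_count else None,
    }


results = [parse_pool_allocator_line(line) for line in LOG_LINES]
for result in results:
    print(result)
```

A high eviction rate in early lines, as here, simply means the pool limit has not yet caught up with the working set. The first line reports -nan for the unsatisfied allocation rate because no get requests had been made at that point.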
