in Machine Learning by (12.5k points)

I want to create a dataset in the same format as the CIFAR-10 dataset to use with TensorFlow. It should have images and labels. Basically, I'd like to take the CIFAR-10 code and run it with different images and labels. I haven't found any information online on how to do this, and I am completely new to machine learning.

1 Answer

by (32.8k points)

The following code converts a single image into the binary layout used by the CIFAR-10 files:

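from PIL import Image
import numpy as np

# Load the image and force 3-channel RGB so the channel indexing below is valid
im = Image.open('images.jpeg').convert('RGB')
im = np.array(im)  # shape: (height, width, 3), dtype uint8

# Flatten each colour plane in row-major order
r = im[:, :, 0].flatten()
g = im[:, :, 1].flatten()
b = im[:, :, 2].flatten()

# One label byte (the class index) precedes the pixel data
label = [1]

# Record layout: <1 label byte><red plane><green plane><blue plane>
out = np.array(list(label) + list(r) + list(g) + list(b), np.uint8)
out.tofile("out.bin")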

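This code converts an image into a binary record laid out like the CIFAR-10 binary files: one label byte, followed by the red, green, and blue planes. For multiple images, append the records one after another in the same file. Assuming your picture is RGB with values in the range 0-255, a 427x427 image gives a file size of 427*427*3 + 1 = 546988 bytes. Note that CIFAR-10 images are 32x32, so each record there is 32*32*3 + 1 = 3073 bytes and labels are class indices from 0 to 9; resize your images to 32x32 if you want to run the CIFAR-10 code without changing its input dimensions. Then run the result through TensorFlow to check that it loads.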
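For several images, the same idea can be wrapped in a few helpers. The following is only a minimal sketch, assuming every image should be resized to 32x32 and that labels fit in one byte; the names `to_record`, `write_batch`, and `read_batch` are made up for this example and are not part of the CIFAR-10 or TensorFlow code.

```python
import numpy as np
from PIL import Image

SIDE = 32                            # CIFAR-10 image width and height in pixels
RECORD_BYTES = 1 + SIDE * SIDE * 3   # 1 label byte + 3072 pixel bytes = 3073


def to_record(image, label):
    """Return one CIFAR-10-style record as a uint8 array of length 3073."""
    rgb = image.convert("RGB").resize((SIDE, SIDE))
    pixels = np.asarray(rgb, dtype=np.uint8)          # (32, 32, 3): row, column, channel
    planes = pixels.transpose(2, 0, 1).reshape(-1)    # red plane, then green, then blue
    return np.concatenate((np.array([label], dtype=np.uint8), planes))


def write_batch(samples, path):
    """Write an iterable of (PIL image, int label) pairs as consecutive records."""
    records = [to_record(img, lbl) for img, lbl in samples]
    np.concatenate(records).tofile(path)


def read_batch(path):
    """Read the file back as (labels, images) to sanity-check what was written."""
    raw = np.fromfile(path, dtype=np.uint8).reshape(-1, RECORD_BYTES)
    labels = raw[:, 0]
    images = raw[:, 1:].reshape(-1, 3, SIDE, SIDE).transpose(0, 2, 3, 1)
    return labels, images
```

Usage would look something like `write_batch([(Image.open('cat.jpeg'), 0), (Image.open('dog.jpeg'), 1)], 'data_batch_1.bin')`, where the file names are placeholders. The resulting file should be exactly 3073 bytes per image, which is easy to confirm with `read_batch` before pointing the CIFAR-10 code at it.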

Hope this answer helps you!
