I'm trying to put together a linear regression model, but some of my features are not numerical (e.g. "Car Colour"), whereas others are (e.g. "Engine Size"). For the non-numerical features, I'm unsure how to represent them as inputs. The only way I could think of would be to assign each colour a different value (e.g. red = 1, blue = 2, green = 3, ...); however, this doesn't seem acceptable, as it implies an ordering in which green is "better" than red.
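
To make the concern concrete, here is a minimal sketch of the integer encoding described above. The class name `ColourEncoding`, the method `toFeatureRow`, and the specific codes are placeholders made up for illustration, not part of any library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ColourEncoding {
    // Hypothetical mapping: the codes are arbitrary and only illustrate the idea.
    static final Map<String, Integer> COLOUR_CODES = new LinkedHashMap<>();
    static {
        COLOUR_CODES.put("red", 1);
        COLOUR_CODES.put("blue", 2);
        COLOUR_CODES.put("green", 3);
    }

    // Builds one input row for the regression: [colour code, engine size].
    static double[] toFeatureRow(String colour, double engineSize) {
        Integer code = COLOUR_CODES.get(colour);
        if (code == null) {
            throw new IllegalArgumentException("Unknown colour: " + colour);
        }
        return new double[] { code, engineSize };
    }

    public static void main(String[] args) {
        double[] red = toFeatureRow("red", 1.6);
        double[] green = toFeatureRow("green", 1.6);
        // The colour column is the only difference between these two rows.
        System.out.println(red[0] + " " + green[0]);
    }
}
```

With a single weight `w` on the colour column, a green car contributes `3 * w` to the prediction and a red car `1 * w`, so the model is forced to treat green as three times red. That is the artificial ordering I'd like to avoid.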

Can anybody help? I'm implementing this in Java, so I'd appreciate an algorithm expressed either in Java or in a language-independent form.