Deep Q-learning, a machine learning model from the field of reinforcement learning, has recently emerged. This model can find the best possible solution in systems with millions of choices without having encountered them before, and it has been used to beat the best human players at complex games such as Go and chess, both of which have a huge number of possible decisions and outcomes for each move. With human-level performance, it has solved problems that no other machine learning model had solved before. Here, we show the steps needed to apply this model to an optical problem. We investigate colour generation by dielectric nanostructures and show that the model can find geometrical properties that generate much purer red, green and blue colours than previously reported results. The model found these results in 9000 steps, out of a possible 34.5 million solutions. This technique can readily be extended to predict and optimise the design parameters of other optical structures.