I am using Word2Vec on a dataset of roughly 11,000,000 tokens to compute word similarity (as part of synonym extraction for a downstream task), but I don't have a good sense of how many dimensions I should use. Does anyone have a good heuristic for the range of dimensions to consider based on the number of tokens/sentences?

A typical range is 100 to 300 dimensions. I would treat 50 as a practical minimum for acceptable accuracy; with fewer dimensions than that, you start to lose the properties that make high-dimensional spaces useful for separating words. If training time is not a concern for your application, I would go with 200 dimensions, which usually gives good-quality features. The best accuracy is usually obtained at around 300 dimensions. Beyond 300, training becomes very slow.
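Since these numbers are only rules of thumb, one way to settle on a value for your own corpus is to train at a few sizes and compare. Below is a minimal sketch of that idea, not a definitive recipe. The helper names (`sweep_dimensions`, `smallest_good_dim`, `synonym_hit_rate`, `train_gensim`) and the tolerance are my own assumptions for illustration; `train_gensim` assumes gensim 4.x, where the dimensionality parameter is called `vector_size`, and `synonym_hit_rate` assumes you have a small hand-made list of known synonym pairs to score against.

```python
import time
from typing import Any, Callable, Iterable, List, Tuple

Row = Tuple[int, float, float]  # (dimensions, score, training seconds)


def sweep_dimensions(train: Callable[[int], Any],
                     score: Callable[[Any], float],
                     dims: Iterable[int] = (50, 100, 200, 300)) -> List[Row]:
    """Train one model per dimensionality and record its score and training time."""
    results = []
    for dim in dims:
        start = time.perf_counter()
        model = train(dim)
        elapsed = time.perf_counter() - start
        results.append((dim, score(model), elapsed))
    return results


def smallest_good_dim(results: List[Row], tolerance: float = 0.01) -> int:
    """Smallest dimensionality whose score is within `tolerance` of the best score."""
    best = max(s for _, s, _ in results)
    return min(d for d, s, _ in results if s >= best - tolerance)


def synonym_hit_rate(model: Any, pairs, topn: int = 10) -> float:
    """Fraction of (word, synonym) pairs where the synonym is among the word's
    `topn` nearest neighbours. Pairs with out-of-vocabulary words are skipped."""
    hits = total = 0
    for word, synonym in pairs:
        if word not in model.wv or synonym not in model.wv:
            continue
        total += 1
        neighbours = {w for w, _ in model.wv.most_similar(word, topn=topn)}
        hits += synonym in neighbours
    return hits / total if total else 0.0


def train_gensim(sentences, dim: int):
    """Assumption: gensim >= 4.0 is installed. Imported lazily so the helpers
    above can be used without it. The other hyperparameters are placeholders."""
    from gensim.models import Word2Vec
    return Word2Vec(sentences=sentences, vector_size=dim,
                    window=5, min_count=5, workers=4, seed=1)
```

With your tokenised sentences in `sentences` and a list of known pairs in `pairs`, usage would look something like `results = sweep_dimensions(lambda d: train_gensim(sentences, d), lambda m: synonym_hit_rate(m, pairs))`, followed by `smallest_good_dim(results)`. The recorded training times let you judge whether the extra accuracy at 300 dimensions is worth the cost on your data.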

