The noise shape
In order to understand SpatialDropout1D, you should get used to the notion of the noise shape. In plain vanilla dropout, every part is unbroken or dropped severally. For example, if the tensor is [2, 2, 2], each of 8 elements can be zeroed out depending on random coin flip (with certain "heads" probability); in total, there will be 8 independent coin flips and any number of values may become zero, from 0 to 8.
Sometimes there is a need to do more than that. For example, one may need to drop the whole slice along 0 axis. The noise_shape during this case is [1, 2, 2] and the dropout involves only 4 independent random coin flips. The first part can either be unbroken along or be born along. The number of zeroed elements can be 0, 2, 4, 6 or 8. It cannot be 1 or 5.
Another way to view this is to imagine that input tensor is in fact [2, 2], but each value is double-precision (or multi-precision). Instead of dropping the bytes within the middle, the layer drops the full multi-byte value.
Why is it useful?
The example above is just for illustration and isn't common in real applications. More realistic example is this: shape(x) = [k, l, m, n] and noise_shape = [k, 1, 1, n]. In this case, each batch and channel component will be kept independently, but each row and column will be kept or not kept together. In alternative words, the whole [l, m] feature map will be either kept or dropped.
You may want to do this to account for adjacent pixels correlation, especially in the early convolutional layers. Effectively, you want to prevent co-adaptation of pixels with its neighbors across the feature maps, and make them learn as if no other feature maps exist. This is specifically what SpatialDropout2D is doing: it promotes independence between feature maps.
The SpatialDropout1D is very similar: given shape(x) = [k, l, m] it uses noise_shape = [k, 1, m] and drops entire 1-D feature maps.