I understand that neural networks with any number of hidden layers can approximate nonlinear functions. However, can they approximate:

f(x) = x^2

I can't think of how they could. It seems like a very obvious limitation that could restrict what neural networks can do. For example, because of this limitation, neural networks probably can't properly approximate many functions used in statistics, such as an exponential moving average or even variance.

Speaking of moving averages, can recurrent neural networks properly approximate them? I understand how a feedforward neural network, or even a single linear neuron, can output a moving average using the sliding window technique, but how would a recurrent neural network do it without X hidden layers (X being the moving average window size)?
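To make the sliding window case concrete, here is a minimal sketch of what I mean by a single linear neuron computing a moving average. It assumes NumPy; the function name `sliding_window_average` and the toy input are made up for illustration. The neuron has fixed weights of 1/window and no bias, and is applied to every window of consecutive inputs.

```python
import numpy as np

def sliding_window_average(x, window):
    """Moving average as one linear neuron applied to each input window.

    Hypothetical helper for illustration: the weights are fixed at
    1/window and the bias is zero, so nothing is learned here.
    """
    x = np.asarray(x, dtype=float)
    weights = np.full(window, 1.0 / window)
    # Each row holds `window` consecutive inputs, i.e. one neuron input.
    windows = np.array([x[i:i + window] for i in range(len(x) - window + 1)])
    # Weighted sum of each row: the linear neuron's output per window.
    return windows @ weights

averages = sliding_window_average([1, 2, 3, 4, 5], window=3)
print(averages)  # one average per full window
```

The window size is baked into the number of inputs, which is exactly the part I don't see how to carry over to a recurrent network.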

Also, suppose the original function f takes the average of the last 500 inputs and outputs 1 if that average is higher than 3, and 0 otherwise. But pretend for a moment that we don't know that; f is a black box.
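For reference, the black box I have in mind could be sketched roughly like this. The name `make_black_box` is hypothetical, and the behavior before 500 inputs have arrived (averaging whatever has been seen so far) is my own assumption, since the description above doesn't cover it.

```python
from collections import deque

def make_black_box(window=500, threshold=3.0):
    """Return f: outputs 1 if the mean of the last `window` inputs > threshold.

    Assumption: until `window` inputs have been seen, the mean is taken
    over the inputs received so far.
    """
    recent = deque(maxlen=window)  # automatically drops the oldest input

    def f(x):
        recent.append(x)
        mean = sum(recent) / len(recent)
        return 1 if mean > threshold else 0

    return f

# Small window so the behavior is easy to follow by hand.
f = make_black_box(window=3, threshold=3.0)
outputs = [f(v) for v in [1, 5, 6, 1, 1]]
print(outputs)
```

The learner would only ever see the input stream and the 0/1 outputs, not the window size or the threshold.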

How would a recurrent neural network approximate that? We would first need to know how many timesteps it should have, which we don't. Perhaps an LSTM network could, but even then, what if it's not a simple moving average but an exponential moving average? I don't think even an LSTM can do it.
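By exponential moving average I mean the usual recursive definition, sketched below. The smoothing factor `alpha` and the choice to start from the first input are assumptions for illustration; other initializations are in use.

```python
def exponential_moving_average(xs, alpha):
    """EMA via the recurrence ema_t = alpha * x_t + (1 - alpha) * ema_{t-1}.

    Assumption: the first output equals the first input.
    """
    ema = None
    out = []
    for x in xs:
        # Blend the new input with the previous running value.
        ema = x if ema is None else alpha * x + (1 - alpha) * ema
        out.append(ema)
    return out

emas = exponential_moving_average([1.0, 2.0, 3.0], alpha=0.5)
print(emas)
```

Every past input contributes to the current value with a decaying weight, so there is no fixed number of timesteps to pick, which is why I doubt a network unrolled over a fixed window can match it.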

Even worse, what if the function f(x,x1) that we are trying to learn is simply

f(x,x1) = x * x1

That seems very simple and straightforward. Can a neural network learn it? I don't see how.

Am I missing something huge here or are machine learning algorithms extremely limited? Are there other learning techniques besides neural networks that can actually do any of this?