You can do that, but it is not remotely AI. The term "understand" in this case means “is able to find a relevant answering phrase”. In a product such as Alexa (Amazon), the speech is preprocessed and converted into words. The sentence or phrase is then linguistically parsed to yield either a “planned for” request or weasel words about not understanding. The request is mapped to a set of potential answers, which may require further lookups to get information from a site. The result is converted from words to voice and emitted. NONE of this is AI; it is pure computing in a very open field in which answers do not have to be perfect… but great if they are!
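The pipeline described above can be sketched in a few lines. This is only an illustration of the idea, not how Alexa is actually built: the intent table, the `FAKE_WEATHER` lookup and the function names are all hypothetical, and the speech-to-text and text-to-voice stages are left out, so the input and output are plain strings.

```python
import re

# Hypothetical table of "planned for" requests: pattern -> intent name.
INTENTS = [
    (re.compile(r"\bwhat time is it\b"), "time"),
    (re.compile(r"\bweather in (?P<city>[a-z ]+)"), "weather"),
]

# Stand-in for the "further lookup" against an external site (made-up data).
FAKE_WEATHER = {"london": "rainy", "cairo": "sunny"}


def parse(utterance):
    """Match the recognised words against the planned-for requests."""
    text = utterance.lower().strip(" ?!.")
    for pattern, name in INTENTS:
        match = pattern.search(text)
        if match:
            return name, match.groupdict()
    return None, {}


def respond(utterance):
    """Turn an utterance into an answering phrase, or into weasel words."""
    intent, slots = parse(utterance)
    if intent == "time":
        # A real system would read a clock; fixed here to keep the sketch simple.
        return "It is twelve o'clock."
    if intent == "weather":
        city = slots["city"].strip()
        sky = FAKE_WEATHER.get(city)
        if sky is None:
            return f"I have no weather data for {city.title()}."
        return f"It is {sky} in {city.title()}."
    # Nothing matched: fall back to weasel words about not understanding.
    return "Sorry, I don't know that one."
```

Calling `respond("What is the weather in London?")` returns a canned phrase filled in from the lookup table, while `respond("Why is the sky blue?")` falls through to the apology. Nothing in the code knows what weather or a city is; it only matches patterns and fills templates.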
In a genuine AI, by contrast, the hearing and conversion to words include a link to remembered experiences related to the domain addressed by the words. The AI has an understanding of reality and so can check the validity of the question (contextually and factually). If the knowledge needed for the answer is available, then you will get a freshly constructed (original) sentence that takes your expectations into account and so fulfills your requirements as to an answer. What then follows is a potential dialogue about the question, which can last for an indefinite period of time. The key component here is something called discourse production: the ability to communicate with someone who shares a common experiential basis.
This common basis requires a common set of sensors and sampling of the same environment as you are in. At this point sensor interpretation, data fusion, qualia and consciousness all interact.