Impala runs its own long-lived daemons on the nodes of your cluster. They read the data in HDFS directly and can cache some of it, so they can return results quickly without launching a MapReduce job.
So, while processing SQL-like queries, Impala does not write intermediate results to disk between stages. Instead it does the SQL processing in memory, which is what lets its daemons return data so quickly.
Hive, on the other hand, traditionally runs queries on the underlying MapReduce architecture, which adds an extra layer to go through. This is the main reason Impala has an edge over Hive in processing speed.
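To make the difference concrete, here is a toy Python sketch of the two execution styles. It is not how Impala or Hive are implemented internally; the table, the function names (`staged_sum`, `pipelined_sum`, `compare`) and the file name are all made up for illustration. It only shows the idea that one style persists intermediate output to disk between stages while the other streams rows through memory.

```python
import json
import os
import tempfile
from collections import defaultdict

# Toy input: (department, salary) rows standing in for a table in HDFS.
ROWS = [("eng", 100), ("ops", 80), ("eng", 120), ("ops", 60), ("hr", 70)]


def staged_sum(rows, workdir):
    """MapReduce-like: the map stage writes to disk before the reduce stage starts."""
    # Stage 1 (map): emit key/value pairs and persist them to a file.
    map_out = os.path.join(workdir, "map_output.jsonl")
    with open(map_out, "w") as f:
        for dept, salary in rows:
            f.write(json.dumps([dept, salary]) + "\n")

    # Stage 2 (reduce): read the persisted pairs back and aggregate per key.
    totals = defaultdict(int)
    with open(map_out) as f:
        for line in f:
            dept, salary = json.loads(line)
            totals[dept] += salary
    return dict(totals)


def pipelined_sum(rows):
    """In-memory style: rows stream from the scan into the aggregation, no files."""
    scan = ((dept, salary) for dept, salary in rows)  # lazy scan operator
    totals = defaultdict(int)
    for dept, salary in scan:
        totals[dept] += salary
    return dict(totals)


def compare():
    """Run both styles on the same rows and return (staged, pipelined) results."""
    with tempfile.TemporaryDirectory() as workdir:
        staged = staged_sum(ROWS, workdir)
    return staged, pipelined_sum(ROWS)
```

Both functions compute the same per-department totals, roughly what `SELECT dept, SUM(salary) ... GROUP BY dept` would return. The staged version pays for an extra write and read of the intermediate data, which is the kind of overhead the in-memory approach avoids.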
But Impala is not the usual choice for heavy, long-running analysis of large datasets. It is mainly used for running interactive queries directly against data in HDFS and Apache HBase, since it does not require the data to be moved or transformed first.
It can be a great tool for small ad-hoc queries, but when it comes to data-intensive tasks, where you need to analyze and process large datasets, Hive is your guy. Hive greatly simplifies data processing at scale.