Incremental data load is the process Sqoop uses to synchronize data from a relational database management system to Hadoop by importing only the rows that are new or changed since the previous import. Sqoop provides an incremental import option on its import command for this purpose.
The load can be done either with Sqoop alone, in which case the data already present in HDFS is changed, or by way of Hive, in which case the actual data is kept unchanged. Keep the following attributes in mind when running an incremental load:
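- Mode (--incremental): Tells Sqoop how to determine which rows are new. There are two modes, 'append' (for tables that only gain new rows) and 'lastmodified' (for tables whose existing rows are also updated).
- Col (--check-column): Specifies the column Sqoop examines to decide which rows to import.
- Value (--last-value): Denotes the maximum value of the check column from the previous import; only rows with a greater value are imported.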
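The three attributes above correspond to the `--incremental`, `--check-column`, and `--last-value` arguments of `sqoop import`. As a rough sketch of how they fit together, the Python helper below assembles such a command line. The helper name `build_incremental_import`, the JDBC URL, the table, the target directory, and the last value are all hypothetical placeholders chosen for illustration, not values from any real setup.

```python
import shlex
from typing import List, Union

# The two incremental modes Sqoop accepts (lowercase on the command line).
VALID_MODES = ("append", "lastmodified")


def build_incremental_import(
    connect: str,
    table: str,
    target_dir: str,
    mode: str,
    check_column: str,
    last_value: Union[int, str],
) -> List[str]:
    """Assemble the argument list for an incremental `sqoop import`.

    This only builds the command; it does not run Sqoop.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", mode,           # Mode: how new rows are detected
        "--check-column", check_column,  # Col: column examined for new rows
        "--last-value", str(last_value), # Value: max value from the last import
    ]


if __name__ == "__main__":
    # Hypothetical example: import rows of "orders" whose id exceeds 1000.
    cmd = build_incremental_import(
        connect="jdbc:mysql://dbhost/sales",
        table="orders",
        target_dir="/data/orders",
        mode="append",
        check_column="id",
        last_value=1000,
    )
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each argument separate, so it could be passed to `subprocess.run` without shell quoting problems. For 'lastmodified' mode the check column would typically be a timestamp column, and the last value a timestamp rather than an integer.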