A pre-copy script in a copy activity runs SQL commands against the sink (destination) before new data is copied from the source. The sink table is the destination table that holds the result once any required changes have been applied, such as removing records listed in the change tracking table.
Steps to Use Pre-Copy Script from the Copy Activity to Remove Records from Sink Based on Change Tracking Table from Source
Step 1: Identify the change tracking table
Make sure that the source database has Change Tracking enabled on every table whose changes you want to follow. This lets you identify which rows were added, modified, or deleted.
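To make the idea of tracking added, modified, and deleted rows concrete, here is a minimal sketch that emulates it with SQLite triggers. This is only an illustration: the source database's built-in Change Tracking feature keeps this information for you, and the change_log table and trigger names below are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the source (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_table (
    record_id   INTEGER PRIMARY KEY,
    data_column TEXT
);
-- Hypothetical stand-in for the metadata that change tracking maintains.
CREATE TABLE change_log (
    record_id   INTEGER,
    change_type TEXT
);
-- One trigger per operation records which row changed and how.
CREATE TRIGGER trg_insert AFTER INSERT ON source_table
BEGIN
    INSERT INTO change_log VALUES (NEW.record_id, 'INSERT');
END;
CREATE TRIGGER trg_update AFTER UPDATE ON source_table
BEGIN
    INSERT INTO change_log VALUES (NEW.record_id, 'UPDATE');
END;
CREATE TRIGGER trg_delete AFTER DELETE ON source_table
BEGIN
    INSERT INTO change_log VALUES (OLD.record_id, 'DELETE');
END;
""")

# Make a few changes at the source.
conn.execute("INSERT INTO source_table VALUES (1, 'Data1')")
conn.execute("INSERT INTO source_table VALUES (2, 'Data2')")
conn.execute("UPDATE source_table SET data_column = 'Data2b' WHERE record_id = 2")
conn.execute("DELETE FROM source_table WHERE record_id = 1")

# Read the recorded changes in the order they happened.
changes = conn.execute(
    "SELECT record_id, change_type FROM change_log ORDER BY rowid"
).fetchall()
print(changes)
# [(1, 'INSERT'), (2, 'INSERT'), (2, 'UPDATE'), (1, 'DELETE')]
```

The rows marked 'DELETE' are the ones a pre-copy script would later remove from the sink.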
Step 2: Create the Source and Sink Datasets
Next, set up the source dataset and the sink dataset. The source dataset should point to the table that has change tracking enabled, and the sink dataset should point to the destination table that receives the copied records.
Step 3: Configure the Copy Activity
Create a Copy Activity to move data from source to sink, then configure your destination table under the Sink tab. In the “Pre-Copy Script” section, enter the SQL script that will run on the sink database before the data copy operation starts. This script should remove or clean up the records in the sink table that are identified by the change tracking mechanism.
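If you define the pipeline as JSON rather than through the UI, the same setting lives in the sink section of the copy activity. The sketch below builds such a definition as a Python dictionary and prints it as JSON. It assumes an Azure SQL Database source and sink in Azure Data Factory; the activity name, dataset names, and table names are hypothetical placeholders.

```python
import json

# SQL to run on the sink before the copy starts (table names are illustrative).
pre_copy_script = (
    "DELETE FROM sink_table "
    "WHERE record_id IN ("
    "SELECT record_id FROM change_tracking_table WHERE change_type = 'DELETE')"
)

# Copy activity definition; names marked "hypothetical" are placeholders.
copy_activity = {
    "name": "CopyChangedRecords",  # hypothetical activity name
    "type": "Copy",
    "inputs": [
        {"referenceName": "SourceDataset", "type": "DatasetReference"}  # hypothetical
    ],
    "outputs": [
        {"referenceName": "SinkDataset", "type": "DatasetReference"}  # hypothetical
    ],
    "typeProperties": {
        "source": {"type": "AzureSqlSource"},
        "sink": {
            "type": "AzureSqlSink",
            # The pre-copy script is a property of the sink, not of the source.
            "preCopyScript": pre_copy_script,
        },
    },
}

activity_json = json.dumps(copy_activity, indent=2)
print(activity_json)
```

Other sink types may name or support this property differently, so check the connector you actually use.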
Step 4: Run the pipeline
Execute the data pipeline. The pre-copy script will run first, removing specified records from the sink. Then, the Copy Activity will transfer new data from the source to the sink.
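The run order described above can be sketched locally with SQLite standing in for the sink. This is an emulation under stated assumptions, not how the pipeline is implemented internally: run_copy_activity is a hypothetical helper, and the incoming rows are made-up sample data.

```python
import sqlite3


def run_copy_activity(sink, pre_copy_script, incoming_rows):
    """Emulate the run order: pre-copy script first, then the copy itself."""
    # Step 1: the pre-copy script cleans up the sink.
    sink.execute(pre_copy_script)
    # Step 2: the copy inserts the rows read from the source.
    sink.executemany(
        "INSERT INTO sink_table (record_id, data_column) VALUES (?, ?)",
        incoming_rows,
    )
    sink.commit()


# SQLite database standing in for the sink (assumption for illustration).
sink = sqlite3.connect(":memory:")
sink.execute(
    "CREATE TABLE sink_table (record_id INTEGER PRIMARY KEY, data_column TEXT)"
)
sink.executemany(
    "INSERT INTO sink_table VALUES (?, ?)", [(1, "Data1"), (2, "Data2")]
)
sink.commit()

# Record 2 changed at the source and record 4 is new.
incoming = [(2, "Data2-updated"), (4, "Data4")]

run_copy_activity(
    sink, "DELETE FROM sink_table WHERE record_id IN (2, 4)", incoming
)

result = sink.execute(
    "SELECT record_id, data_column FROM sink_table ORDER BY record_id"
).fetchall()
print(result)
# [(1, 'Data1'), (2, 'Data2-updated'), (4, 'Data4')]
```

In this sketch, skipping the pre-copy step would make the insert of record 2 fail with a primary key conflict, which is why the cleanup has to run before the copy.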
Example to Use Pre-Copy Script
Let’s look at the concept in more detail with an example.
-- Create source and change tracking tables
CREATE TABLE source_table (
    record_id INT PRIMARY KEY,
    data_column VARCHAR(100)
);
CREATE TABLE change_tracking_table (
    record_id INT,
    change_type VARCHAR(10),
    change_timestamp TIMESTAMP
);
-- Provide sample data
INSERT INTO source_table (record_id, data_column) VALUES (1, 'Data1'), (2, 'Data2'), (3, 'Data3');
INSERT INTO change_tracking_table (record_id, change_type, change_timestamp) VALUES (1, 'DELETE', NOW()), (3, 'DELETE', NOW());
-- Create a sink table
CREATE TABLE sink_table (record_id INT PRIMARY KEY, data_column VARCHAR(100));
INSERT INTO sink_table (record_id, data_column) VALUES (1, 'Data1'), (2, 'Data2'), (3, 'Data3');
-- Pre-copy script: remove sink rows that the change tracking table marks as deleted
DELETE FROM sink_table
WHERE record_id IN (
    SELECT record_id FROM change_tracking_table
    WHERE change_type = 'DELETE'
);
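Enter the pre-copy script in the “Pre-Copy Script” section of the Copy Activity and run the pipeline. Because the script runs on the sink database, the change tracking table (or a copy of it) must be reachable from there.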
Output:
Before Execution of Pre-Copy Script:
sink_table
record_id | data_column |
1 | Data1 |
2 | Data2 |
3 | Data3 |
After Execution of Pre-Copy Script:
sink_table
record_id | data_column |
2 | Data2 |
In this example, the sink table contained the records ‘Data1’, ‘Data2’, and ‘Data3’ before execution. After execution, only ‘Data2’ remained, because records 1 and 3 were marked as deleted in the change tracking table.
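To check the before and after tables yourself, the example can be replayed in an in-memory SQLite database. This is a sketch under a few assumptions: SQLite stands in for the sink database, CURRENT_TIMESTAMP replaces NOW() (which SQLite does not provide), and the DELETE is filtered on change_type = 'DELETE', which is what the output tables imply.

```python
import sqlite3

# In-memory SQLite database standing in for the sink database (assumption).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE change_tracking_table (
    record_id INT,
    change_type VARCHAR(10),
    change_timestamp TIMESTAMP
);
-- CURRENT_TIMESTAMP is used here because SQLite has no NOW() function.
INSERT INTO change_tracking_table (record_id, change_type, change_timestamp)
VALUES (1, 'DELETE', CURRENT_TIMESTAMP), (3, 'DELETE', CURRENT_TIMESTAMP);

CREATE TABLE sink_table (record_id INT PRIMARY KEY, data_column VARCHAR(100));
INSERT INTO sink_table (record_id, data_column)
VALUES (1, 'Data1'), (2, 'Data2'), (3, 'Data3');
""")

query = "SELECT record_id, data_column FROM sink_table ORDER BY record_id"
before = db.execute(query).fetchall()

# Pre-copy script: delete sink rows marked as deleted in the tracking table.
db.execute("""
DELETE FROM sink_table
WHERE record_id IN (
    SELECT record_id FROM change_tracking_table WHERE change_type = 'DELETE'
)
""")
after = db.execute(query).fetchall()

print("before:", before)  # before: [(1, 'Data1'), (2, 'Data2'), (3, 'Data3')]
print("after: ", after)   # after:  [(2, 'Data2')]
```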
Conclusion
In data integration, it is important to make sure that the destination table accurately matches the current state of the source data. A pre-copy script helps with this by removing outdated records from the sink before new data is copied. In this blog, we covered identifying the change tracking table, writing the pre-copy script, configuring the copy activity, and running the pipeline.
FAQs
1. What is a pre-copy script?
A pre-copy script is a SQL script that runs on the sink before the actual data copy process in a copy activity.
2. How can I use a pre-copy script to remove records in the sink based on a change tracking table from the source?
You can write a SQL script that deletes records from the sink table based on the change tracking information from the source table. For example, assuming both tables have an id column and the source table has a change_date column:
DELETE FROM sink_table
WHERE id IN (
    SELECT id FROM source_table
    WHERE change_date > '2023-01-01'
);
3. How do I configure the pre-copy script in a copy activity?
Open your data integration tool, such as Azure Data Factory. In the pipeline, select the copy activity and open its Sink tab. In the “Pre-Copy Script” section, enter the SQL script.