SSIS User Handbook
If you are looking for a tool to automate the maintenance of the SQL Server database, which can perform data integration and workflow application, for the extraction, transformation, and loading of data, then SSIS can be taken into consideration. This cheat sheet will guide you through the basic concepts that are required to get started with it. It is a handy reference sheet for beginners and those who are familiar with various BI tools.
In this MSBI tutorial, we will be covering the following aspects of the SSIS user handbook:
- SQL Server Integration Server (SSIS)
- Features of SSIS
- Uses of SSIS
- SSIS Date/Time Data Types
- Data Warehouse
- Data Warehousing
- Transformations Available in SSIS
- SSIS Expressions
You can also download the printable PDF of this SSIS cheat sheet.
SQL Server Integration Service (SSIS):
SSIS is a component of the Microsoft SQL Server database that can be used to perform data migration tasks, by collecting data from various data sources and storing it in the central location. The main functions performed by SSIS are the extraction, transformation, and loading of the data.
SSIS Architecture
- Extraction: It is the collection of data from various sources.
- Transformation: It is converting the data in different forms collected from different sources according to the business requirement.
- Loading: Then, the data is loaded to a data warehouse.
- Data warehouse: It captures the data from various sources for useful analysis and access.
- Data Warehousing: It is the data storage system used for assembling and managing data from various sources for better decision-making.
Watch this SSIS Tutorial Full Course for Beginners video
Features of SSIS
Studio environment: SSIS includes two studios
- SSDT: Referred to as SQL Server Data Tools which is used to develop integrated service packages and perform several tasks like
-
- Copying basic package data from source to destination.
- Updating the properties of packages while run time.
- Creating deployment.
- Saving the package copies to SQL Server
- SSMS: Referred to as SQL Server management studio which is used for the package management in a production environment and perform tasks such as
-
- To create a folder for package management
- Executive Package utility is used to run the packages and store on a local system
- Generates a command line while running the executive package utility
- To store and fetch packages from SQL Server
Packages: The SSIS package is a collection of control flow and data flow. Control flow includes tasks and data flow task and data flow including source, transformation, and destination
Expressions: SSIS Expressions are the combination of literals, identifiers, and operators
Event handling: Handling a workflow that is designed for the number of events in different ways where it might occur and similar to building the packages.
SSIS Date/Time Data Types:
- DT_BOOL: It is a 1-bit Boolean Value
- DT_BYTES: It is a binary data value and the length is variable, the maximum length is up to 8000 bytes.
- DT_CY: It is a currency value and this data type is an eight-byte signed integer with a scale of 4 and a maximum precision of 19 digits
- DT_DATE (Format: yyyy-mm-dd): It is a data structure that consists of year, month, day, hour, minute, seconds, and fractional seconds. The maximum scale of fractional seconds is 7 digits
- DT_DBDATE: A date structure that consists of year, month, and day.
- DT_DBTIM (Format: hh:mm: ss): A time structure that consists of the hour, minute, and second.
- DT_DBTIME2 (Format: hh:mm: ss[.fffffff]): A time structure that consists of hour, minute, second, and fractional seconds. The maximum scale of fractional seconds is 7 digits
- DT_DBTIMESTAMP (Format: yyyy-mm-dd hh:mm: ss[. fff]): A timestamp structure that consists of year, month, day, hour, minute, second, and fractional seconds. The maximum scale of fractional seconds is 3 digits
- DT_DBTIMESTAMP2 (Format: yyyy-mm-dd hh:mm: ss[.fffffff]): A timestamp structure that consists of year, month, day, hour, minute, second, and fractional seconds. The maximum scale of fractional seconds is 7 digits
- DT_DBTIMESTAMPOFFSET (Format: yyyy-mm-dd hh:mm: ss[.fffffff] [{+|-} hh:mm]):A timestamp structure that consists of year, month, day, hour, minute, second, and fractional seconds. The maximum scale of fractional seconds is 7 digits
- DT_DECIMAL: An exact numeric value with a fixed precision and a fixed scale. This data type is a 12-byte unsigned integer with a separate sign, a scale of 0 to 28, and a maximum precision of 29
- DT_FILETIME (Format: yyyy-mm-dd hh:mm:ss:fff): A 64-bit value that represents the number of 100-nanosecond intervals maximum scale of fractional seconds is 3 digits
- DT_GUID: It is a globally unique identifier (GUID).
- DT_I1: It is a one-byte, signed integer.
- DT_I2: It is a two-byte, signed integer
- DT_I4: It is a four-byte, signed integer.
- DT_I8: It is an eight-byte, signed integer
- DT_NUMERIC: An exact numeric value with a fixed precision and scale. This data type is a 16-byte unsigned integer with a separate sign.
- DT_R4: It is a single-precision floating-point value.
- DT_R8: It is a double-precision floating-point value
- DT_STR: It is a null-terminated ANSI/MBCS character string with a maximum length of 8000 characters
- DT_UI1: It is a one-byte, unsigned integer
- DT_UI2: It is a two-byte, unsigned integer
- DT_UI4: It is a four-byte, unsigned integer.
- DT_UI8: It is an eight-byte, unsigned integer.
- DT_WSTR: It is a null-terminated Unicode character string with a maximum length of 4000 characters
- DT_IMAGE: It is a binary value with a maximum size of 2^31 -1 byte
- DT_NTEXT: It is a Unicode character string with a maximum length of 2^30-1 characters
- DT_TEXT: An ANSI character string with a maximum length of 2^31-1 characters
Get 100% Hike!
Master Most in Demand Skills Now!
Data Warehouse
A data warehouse is a collection of business data streaming in from different sources used by a business or enterprise for data analytics and decision-making. It is a single source of information where a company stores its data related to customers, sales, employees, etc.
The raw data is fetched from various sources and stored in the layer called the staging layer. From there it goes to the warehouse, where controlled access is given to various users with the help of SQL commands.
The key features of a data warehouse are as follows:
- Subject-oriented: This allows an organization to build a warehouse based on the type of information it wants to extract and analyze.
- Time-variant: Data is maintained through different time intervals, such as weekly, monthly, quarterly, etc.
- Integration: Organizations consistently add and connect data from diverse sources and arrange it uniformly.
- Non-volatile: The data only flows inside the warehouse. Once stored there, it cannot be changed or deleted by anyone.
Data Warehousing
Data Warehousing is the process of constructing and managing a data warehouse to collect data from various sources and provide business insights. It is carried out by transforming the normal transaction processes in the integration layer and storing the data in the data vault in the second normalized form (2NF).
From the data vault, separate entities called ‘data marts’ can be created for an individual user, having specific queries for the database. To create a data warehouse, organizations have to follow the ETL process, where ETL stands for extraction, transformation, and loading the data into the warehouse.
- Extraction: Getting the data from the original database into the stage layer
- Transformation: Transforming the data in a consistent format, e.g., transforming the profit made in different countries into a single currency type
- Loading: Once the data is transformed, it will be loaded to the data warehouse.
Transformations available in SSIS:
- Aggregate – It is used to apply aggregate functions to Record Sets to produce new output records from aggregated values.
- Audit – It is used to add Package and Task Level Metadata – such as Machine Name, Execution Instance, Package Name, Package ID, etc.
- Character map – It is used to perform SQL Server operations such as changing data from lower case to upper case.
- Conditional split – It is used to separate available input into separate output pipelines based on Boolean Expressions designed for each output.
- Copy column – It is used to add a copy of the column to the output so that we can later transform the copy keeping the original for auditing.
- Cache: It is used to write the data from a connected data source to a cache file
- Data conversion – It is used to convert columns data types from one type to another type. It stands for Explicit Column Conversion.
- Data mining query – It is used to perform data mining queries against analysis services and to manage Predictions, Graphs, and Controls.
- Derived column – Used to create a new column from given expressions.
- Export column – It is used to export an Image specific column from the database to a flat file.
- Fuzzy grouping – It is used for data cleansing by finding rows that are likely duplicates.
- Fuzzy lookup – It is used for Pattern Matching and Ranking based on fuzzy logic.
- Import column – It reads the image-specific column from the database onto a flat-file.
- Lookup – It performs a search of a given reference object set against a data source and is used for exact matches only.
- Merge – It is used to Merge two sorted data sets into a single data set into a single data flow.
- Merge join – It is used to merge two data sets into a single dataset using a join junction.
- Multicast – It sends a copy of supplied Data Source onto multiple Destinations.
- Row count – It is used to store the resulting row count from the data flow/transformation into a variable.
- Row sampling – It captures the sample data by using a row count of the total rows in dataflow specified by rows or percentage.
- Union all – It is used to Merge multiple data sets into a single dataset.
- Pivot – It is used for data source normalization by converting rows into columns
SSIS Expressions:
Some common expressions used to interact with your data with SSIS packages are given in the below table:
Statement |
Expression |
Create a file name with today’s date |
The expression on the Flat File or File Connection Manager:
“C:\\Project\\MyExtract”+(DT_WSTR,30)(DT_DBDATE)GETDATE() + “.csv”
Expression Output Example: C:\Project\MyExtract2009-03-20.csv |
Use a two-digit date |
RIGHT(“0” + (DT_WSTR, 2)MONTH(GETDATE()),2)
Expression Output: 03 (if the month is March) |
Multiple conditions if statement |
In this example, the statement determines that if the ColumnName is blank or NULL, it will be set to unknown. To make a Logical AND condition, use “&&” instead of the “||” operator.
ISNULL(ColumnName)||TRIM(ColumnName)==””?”Unknown”: ColumnName |
Returns the first five characters from a zip code |
Derived Column Transform in the Data Flow:
SUBSTRING(ZipCodePlus4,1,5) |
Remove a given character from a string |
Derived Column Transform in the Data Flow:
REPLACE(SocialSecurityNumber, “-“,””) |
Uppercase data |
Derived Column Transfrom in the Data Flow:
UPPER(ColumnName) |
Replace NULL with another value |
Derived Column Transform in the Data flow:
ISNULL(ColumnName)?”New Value”: ColumnName |
Replace blanks with NULL values |
Derived Column Transform in the Data Flow:
TRIM(ColumnName)==”” ? (DT_STR, 4, 1252)NULL(DT_STR, 4, 1252) : ColumnName |
Remove any non-numeric data from a column |
Script Transform in the Data Flow Task with the code as follows:
Imports System.Text.RegularExpressions
Public Overrides Sub Input()_ProcessInputRows(ByVal Row As Input()Buffer)
If Row.ColumnName_IsNull = False Or Row.ColumnName = “” Then
Dim pattern As String = String.Empty
Dim r As Regex = Nothing
pattern = “[^0-9]”
r = New Regex(pattern, RegexOptions.Compiled)
Row.ColumnName = Regex.Replace(Row.ColumnName, pattern, “”)
End If
End Sub |
Convert text to proper case |
Script Transform with the line of partial code as follows:
Row.OutputName = StrConv(Row.InputName, VBStrConv.ProperCase) |
Build dynamic SQL statement |
Expression on the SQLStatementSource property of Execute SQL Task(VB 2008):
“SELECT Column From ” + @[User::TableName] +WHERE
DateFilterColumn = ‘” + (DT_WSTR,4)YEAR(@
[User::DateTimeVar]) + RIGHT(“0” + (DT_WSTR,2)MONTH(@
[User::DateTimeVar]),2) + RIGHT(“0” + (DT_WSTR,2)DAY(@
[User::DateTimeVar]),2) + “‘” |
Calculate the beginning of the previous month |
Expression on component or task:
(DT_DATE)(DT_DBDATE)DATEADD
(“dd”,-1 * (DAY( GETDATE() )-1), DATEADD(“month”,-1, GETDATE() )) |
Round to the nearest two-decimal mark |
The expression on Derived Column Transform:
ROUND(YourNumber, 2)
Expression Output Example: 1.2600000 |
Uses of SSIS
- To combine data from multiple data sources
- For populating data warehouses and data marts
- For cleaning and standardizing the data
- To integrate BI with the data transformation process
- For automating administrative functions and loading the data
Download a Printable PDF of this Cheat Sheet
Conclusion
With this, we come to the end of the SSIS cheat sheet. To get in-depth knowledge, check out our interactive, live-online SSIS Certification Training here, which comes with 24/7 support to guide you throughout your learning period. Intellipaat’s SSIS online training will make you proficient in gathering data from flat files, XML, and relational data sources. It involves working on data extraction, integration, and transformation to align with business needs.