SAP HANA Architecture
SAP is one of the largest software companies in the world. It is best known for its Enterprise Resource Planning (ERP) software, which is used by most of the world’s largest companies.
The SAP BW system (Business Warehouse; this term describes the underlying technology, whereas Business Intelligence describes the user-facing tools) lets users report on the data stored in the ERP system. Reports range from simple analysis to complex simulations of sales forecasts under different assumptions.
The BW system usually does not run on the same database server as the ERP system. Instead, data is copied from the ERP system to the BW server for reporting. This keeps someone running a report on last year’s sales (a less important task) from slowing down the vital data-entry functions. As a result, the data in BW, as in data warehouses in general, is not always up to date: data loads are typically run once per day, so reports may not reflect the most recent transactions.
SAP’s software therefore has two principal functions:
- SAP ERP stores data in a database
- SAP BW takes the data in the database, aggregates it, and presents totals and trends to the user
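The split between storing transactions and reporting on a periodically loaded copy can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the `SalesOrder` record, its fields, and the `nightly_load` copy step are hypothetical, and real ERP and BW systems use database tables and extraction jobs rather than Python lists.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SalesOrder:
    """One transactional record as an ERP system might store it (hypothetical fields)."""
    order_id: int
    region: str
    amount: float


# Hypothetical ERP store: every individual order is kept exactly as entered.
erp_orders = [
    SalesOrder(1, "EMEA", 1200.0),
    SalesOrder(2, "AMER", 800.0),
    SalesOrder(3, "EMEA", 300.0),
]


def nightly_load(source):
    """Copy a snapshot of the ERP data to the reporting side (the BW role)."""
    return list(source)


def totals_by_region(orders):
    """Aggregate individual orders into per-region totals, as a report would."""
    totals = defaultdict(float)
    for order in orders:
        totals[order.region] += order.amount
    return dict(totals)


bw_snapshot = nightly_load(erp_orders)

# An order entered after the load is visible in ERP but not yet in BW.
erp_orders.append(SalesOrder(4, "AMER", 500.0))

print(totals_by_region(bw_snapshot))  # {'EMEA': 1500.0, 'AMER': 800.0}
print(totals_by_region(erp_orders))   # {'EMEA': 1500.0, 'AMER': 1300.0}
```

The two printed totals differ because the reporting copy only sees data as of its last load, which is the freshness trade-off described above.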
SAP HANA architecture
The two main parts of the SAP HANA server system are the hardware and the software. On the client side, SAP provides SAP HANA Studio, which is used for application modeling.
For reporting on an SAP HANA system, SAP’s BusinessObjects software can connect to SAP HANA natively. Because SAP HANA also supports MDX natively, reporting can be done in any other program that can create and consume MDX queries, such as Microsoft Excel pivot tables.
The following diagram is an overview (provided by SAP) of the SAP HANA system architecture, showing clearly the different components and integration between them:
The SAP HANA box itself is a massively multi-core, multi-CPU server with a great deal of memory, up to several terabytes. One of SAP HANA’s main strengths is its ability to process data in parallel: it cuts a large initial data set into small chunks and gives each chunk to a separate CPU core to work on, which is why the server needs so many cores.
Another aspect of the system is that, wherever possible, data is kept in memory to speed up access. Where a traditional database system might set aside a gigabyte or two of memory as a cache, SAP HANA uses nearly all of the server’s memory to hold the data itself, so most reads never have to go to disk and access times are far shorter.
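The chunk-and-distribute idea can be sketched with Python’s standard `concurrent.futures` module. This is a minimal illustration of partitioned parallel aggregation, not SAP HANA’s engine: `split_into_chunks` and `parallel_sum` are hypothetical helpers. Because CPython threads share one interpreter lock, CPU-bound work like this only gets a true multi-core speedup if `ThreadPoolExecutor` is swapped for `ProcessPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, n_chunks):
    """Cut the input into roughly equal, contiguous chunks."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum(values, n_workers=4):
    """Sum each chunk on its own worker, then combine the partial results."""
    chunks = split_into_chunks(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)


amounts = list(range(1, 1_000_001))
print(parallel_sum(amounts))  # 500000500000
```

Each worker only ever sees its own chunk, and the final step combines a handful of partial results. That is why adding cores helps: the expensive per-row work is split, while the merge stays small.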
The database software powering SAP HANA is what is known as a column-based RDBMS. It is a logical evolution of three technologies that were already in use at SAP:
- TREX: SAP’s search engine, a component of SAP NetWeaver since 2000. TREX already had in-memory and columnar storage features, designed to improve performance by searching data that was already in main memory and already held in highly optimized data structures.
- MaxDB: SAP’s own RDBMS technology. MaxDB is a very capable RDBMS that is relatively simple compared with big players such as Oracle. It can run SAP ERP or SAP BW while having very low system requirements and a fairly shallow learning curve. MaxDB contributed the persistence layer (what happens when the power goes off, a crucial question for an in-memory system) and the backup layer to SAP HANA.
- P*Time: a lightweight, in-memory OLTP RDBMS, acquired by SAP in 2005 when it bought Transact in Memory. P*Time provided the in-memory backbone of the SAP HANA software. It is worth noting that P*Time is a traditional row-based data store, not a column-based one.
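To make the persistence question concrete, here is a minimal Python sketch of one common technique, a redo (write-ahead) log: every change is written and flushed to disk before it is applied in memory, and on restart the log is replayed to rebuild the in-memory state. The `InMemoryStore` class and its JSON-lines log format are hypothetical; this shows only the general idea, not how MaxDB or SAP HANA actually implement persistence (real systems also take periodic snapshots so the log does not grow forever).

```python
import json
import os
import tempfile


class InMemoryStore:
    """Keeps all data in a dict, but appends every change to a log file on disk
    so the in-memory state can be rebuilt after a restart (hypothetical sketch)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        # On startup, rebuild memory by re-applying every logged change in order.
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as log:
                for line in log:
                    entry = json.loads(line)
                    self.data[entry["key"]] = entry["value"]

    def put(self, key, value):
        # Make the change durable on disk first, then apply it in memory.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


log_file = os.path.join(tempfile.mkdtemp(), "redo.log")
store = InMemoryStore(log_file)
store.put("order:1", 1200)
store.put("order:2", 800)

# Simulate losing the in-memory state: drop the object and start again from the log.
del store
recovered = InMemoryStore(log_file)
print(recovered.get("order:2"))  # 800
```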
How a column-based database works
Consider a simple table in which each row holds an ID, a first name, a last name, an amount, and a currency. A traditional, row-based database system stores this data one row after another, as follows:
1, Joe, Smith, 35000, EUR
2, Emma, Thomson, 40000, USD
3, Sam, Wiggins, 42500, USD
A column-based RDBMS instead stores the values of each column together:
1, 2, 3
Joe, Emma, Sam
Smith, Thomson, Wiggins
35000, 40000, 42500
EUR, USD, USD
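Because the values of a column sit next to each other, a column-based database can scan a single column much faster than a row-based system, which has to step through every complete row to reach the same values. This is especially important for data reporting (as in SAP BW), where queries typically total or filter a few columns across many rows, and it lets results reach the user much sooner.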
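The difference between the two layouts can be sketched in Python using the example rows from the text. This is conceptual only: the column names are hypothetical labels, and Python lists hold references rather than the tightly packed, typed arrays a real column store uses, so the sketch shows which data each layout has to touch, not real performance.

```python
# The example rows from the text, stored row by row (one tuple per record).
rows = [
    (1, "Joe", "Smith", 35000, "EUR"),
    (2, "Emma", "Thomson", 40000, "USD"),
    (3, "Sam", "Wiggins", 42500, "USD"),
]

# The same data stored column by column (one list per column).
# Column names are hypothetical labels for this sketch.
COLUMN_NAMES = ("id", "first_name", "last_name", "amount", "currency")
columns = {name: list(values) for name, values in zip(COLUMN_NAMES, zip(*rows))}


def total_row_store(table, col_index):
    # Every full row is visited just to pick out one field.
    return sum(row[col_index] for row in table)


def total_column_store(table, col_name):
    # Only the one column is read; the other columns are never touched.
    return sum(table[col_name])


print(total_row_store(rows, 3))               # 117500
print(total_column_store(columns, "amount"))  # 117500
```

Both functions return the same total; the point is that the column layout reads one list of amounts, while the row layout walks through names and currencies it does not need.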
Another important aspect of a column-based RDBMS is data compression. Because all the values in a column are stored together, a repeated value can be stored only once, alongside the number of times it occurs. In the example data above, for instance, the last column might be stored as follows: