Data Warehouse Concepts; Architecture; Data Warehouse vs. Database; OLAP vs. OLTP
Last Updated on July 17, 2023 by Editorial Team
Author(s): Muttineni Sai Rohith
Originally published on Towards AI.
Data Warehouse is an information system that contains historical and commutative data from single or multiple sources. It is a centralized storage system that allows storing, analyzing, and interpretation of data in order to facilitate better decision-making. It is a vital component of business intelligence which is developed with the goal of creating a trove of historical data that can be retrieved and analyzed to provide useful insights according to the organizationβs operations.
Data Warehouse Architecture
A Data Warehouse architecture consists of different standard vital components. Each Data Warehouse is defined differently but is mostly characterized using these vital components. Below is the snapshot of the Data Warehouse architecture used across multiple groups within the organization.
Operational System β An operational system is a method used in data warehousing to refer to a system that is used to process the day-to-day transactions of an organization.
Flat Files β A Flat file system is a system of files in which transactional data is stored, and every file in the system must have a different name.
Meta Data β A set of data that defines and gives information about other data. Meta Data summarizes necessary information about data, which can make finding and work with particular instances of data more accessible. For example, author, data build, data changed, and file size is examples of very basic document metadata.
Metadata is used to direct a query to the most appropriate data source.
We must clean and process your operational information before putting it into the warehouse.
Staging area β simplifies data cleansing and consolidation for operational methods coming from multiple source systems, especially for enterprise data warehouses where all relevant data of an enterprise is consolidated.
Data Warehouse Staging Area is a temporary location where a record from source systems is copied.
We may want to customize our warehouseβs architecture for multiple groups within our organization. We can do this by adding data marts.
A data mart is a segment of a data warehouse that can provide information for reporting and analysis on a section, unit, department, or operation in the company, e.g., sales, payroll, production, etc.
Data Warehouse vs. Database
A database is a collection of related data that represents some elements of the real world, whereas a Data warehouse is an information system that stores historical and commutative data from single or multiple sources. The database helps to perform the fundamental operation of the business, while the data warehouse helps us to analyze our business.
For example, In the banking sector, to handle transactions and maintain consistency, we should use Database, whereas to analyze the data, Banking statements and predict user credibility, we have to use Data Warehouse.
Some key Differences between a Database and a Data Warehouse are listed below:
As said above, a Database uses the OLTP processing method, whereas Datawarehouse uses the OLAP processing method. So Now letβs understand the difference between OLTP and OLAP β
OLTP vs OLAP
OLTP (On-Line Transaction Processing) is featured by a large number of short online transactions (INSERT, UPDATE, and DELETE). The primary significance of OLTP operations is very rapid query processing, maintaining record integrity in multi-access environments, and effective consistency with the number of transactions per second. In the OLTP database, there is an accurate and current record, and the schema used to save the transactional database is the entity model (usually 3NF).
OLAP (On-Line Analytical Processing) is represented by a relatively low volume of transactions. Queries are very difficult and involve aggregations. For OLAP operations, response time is an effective measure. OLAP applications are generally used by Data Mining techniques. In the OLAP database, there is aggregated, historical information stored in multi-dimensional schemas (generally star schema or snowflake schema).
Below are some differences between the OLTP and OLAP β
The biggest difference between an OLTP and OLAP system is the amount of data analyzed in a single transaction. OLTP handles many concurrent customers and queries touching only a single data or limited collection of records at a time, whereas an OLAP system must have the efficiency to operate on millions of data to answer a single query.
The above information is being extracted from multiple sites, and I am storing this here so that it will be helpful for me or someone when they want to refer to this again.
Happy Learningβ¦
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming aΒ sponsor.
Published via Towards AI