In a single-load pattern, current data warehouse systems spend much effort addressing the problems of every stage at once: gathering and integrating the source systems, transforming the data, and loading the targets.
The motto of Data Vault is “Divide and Conquer!”, a simple strategy. It wins by analyzing the problems at each stage of data processing separately: sourcing, transforming and targeting.

Sourcing problems:
· Cross-system joins, filters and aggregations.
· Synchronizing data availability across multiple sources.
· Indexing issues, leading to performance problems.
· Source system password issues, bad and out-of-range source data, structural complexity and transactional record locks.

Transforming problems:
· Data quality and alignment.
· Sequence assignment, often leading to a lack of parallelism.
· Data type corrections.
· Error handling and tracking of invalid data.
· Multiple sources, targets and target errors.
· Implementing business rules, especially across split data streams.

Target problems:
· Lack of database tuning.
· Mixed update, insert and delete statements, which force a specific data order and rule out parallel execution.
· Handling errors with multiple targets in one stream.
· Index issues while loading the target Data Marts.

Sourcing problems are addressed by separating each source load and loading its data into a staging area as-is: no business rules are applied, and no data type conversions or cross-system joins are made. These simple rules let each load get the data from its source and get out as soon as that source's data is ready, avoiding waiting, joins, and the performance and timing complications they bring. Transforming problems are kept away by moving the business rules, such as joins, filters and aggregations, downstream, between the Data Vault and the Data Marts.
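To make the staging rules concrete, here is a minimal Python sketch, not a prescribed implementation. It assumes two hypothetical in-memory sources, `crm` and `billing`, and a hypothetical `load_to_staging` helper; the `load_date` and `record_source` metadata columns follow a common Data Vault convention but are an assumption here. Each source is copied into its own staging table exactly as delivered (values stay strings, nothing is joined or converted), so the loads are independent and can run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone

# Hypothetical source extracts (assumption): each source is read on its own.
SOURCES = {
    "crm": [{"cust_id": "42", "name": "Ada", "signup": "2021-03-01"}],
    "billing": [{"account": "A-7", "cust_id": "42", "amount": "19.90"}],
}

def extract(source_name):
    # Stand-in for a real extract: returns rows exactly as the source delivers them.
    return SOURCES[source_name]

def load_to_staging(source_name, staging):
    """Copy one source's rows into its own staging table, unchanged.

    No business rules, no type conversions, no joins with other sources;
    only load metadata (assumed column names) is added to each row.
    """
    load_date = datetime.now(timezone.utc).isoformat()
    rows = [
        {**row, "load_date": load_date, "record_source": source_name}
        for row in extract(source_name)
    ]
    staging[f"stg_{source_name}"] = rows
    return len(rows)

staging = {}
# The source loads do not depend on each other, so they run concurrently.
with ThreadPoolExecutor() as pool:
    counts = dict(zip(SOURCES, pool.map(lambda s: load_to_staging(s, staging), SOURCES)))

print(counts)  # {'crm': 1, 'billing': 1}
```

Because no load waits on another, a slow or late source delays only its own staging table; the `amount` value is still the string `"19.90"`, since type conversion is left for later stages.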
This allows each Data Mart to be targeted effectively, with the proper set of business rules for it. Loading raw data into the Data Vault area yields simple, maintainable and easy-to-use load code that meets the business needs. The pattern also avoids re-engineering the loading routines to integrate new systems or add new data. Because the loads only insert, the traditional locking problems disappear, allowing high degrees of parallelism, partitioning and performance. Although we end up with many more routines, each one is much less complex and easier to manage.
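The insert-only idea can be sketched with Python's built-in `sqlite3` module. The text does not describe the Data Vault tables themselves, so the simplified hub and satellite layout below (`hub_customer`, `sat_customer`, MD5 hash keys, a `hash_diff` column) is an assumption for illustration only. Loads never update or delete: the hub gains a row only for a new business key, and the satellite gains a row only when the descriptive attributes change. The "current name" query at the end stands in for a rule applied downstream, when building a Data Mart.

```python
import hashlib
import sqlite3

def md5_hex(text):
    # MD5 hex digest; the choice of MD5 for keys is an assumption.
    return hashlib.md5(text.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (hk TEXT PRIMARY KEY, cust_id TEXT,
                           load_date TEXT, record_source TEXT);
CREATE TABLE sat_customer (hk TEXT, load_date TEXT, hash_diff TEXT, name TEXT,
                           PRIMARY KEY (hk, load_date));
""")

def load_customer(row, load_date, record_source):
    """Insert-only load of one staged customer row (hypothetical helper)."""
    hk = md5_hex(row["cust_id"].strip().upper())
    # Hub: add the business key only if it is new; existing rows are never touched.
    conn.execute("INSERT OR IGNORE INTO hub_customer VALUES (?, ?, ?, ?)",
                 (hk, row["cust_id"], load_date, record_source))
    # Satellite: add a new version only when the attributes' hash differs.
    diff = md5_hex(row["name"])
    latest = conn.execute(
        "SELECT hash_diff FROM sat_customer WHERE hk = ? "
        "ORDER BY load_date DESC LIMIT 1", (hk,)).fetchone()
    if latest is None or latest[0] != diff:
        conn.execute("INSERT INTO sat_customer VALUES (?, ?, ?, ?)",
                     (hk, load_date, diff, row["name"]))

load_customer({"cust_id": "42", "name": "Ada"}, "2024-01-01", "crm")
load_customer({"cust_id": "42", "name": "Ada"}, "2024-01-02", "crm")     # unchanged: no new row
load_customer({"cust_id": "42", "name": "Ada L."}, "2024-01-03", "crm")  # changed: new version

# Downstream (Data Mart) rule: the current name per customer.
current = conn.execute("""
    SELECT h.cust_id, s.name
    FROM hub_customer h JOIN sat_customer s ON s.hk = h.hk
    WHERE s.load_date = (SELECT MAX(load_date) FROM sat_customer WHERE hk = h.hk)
""").fetchall()
print(current)  # [('42', 'Ada L.')]
```

Re-running a load with unchanged data adds no rows, and history is kept by appending versions rather than overwriting them. SQLite itself serializes writes, so this shows the shape of the pattern rather than a parallel loading engine.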