Data integration technologies have evolved over the past decade, but advances to support big data are more recent. Our research shows a disparity in how well organizations handle big data integration tasks. The tasks most often rated mostly or completely adequate are accessing (by 63%), loading (60%), extracting (59%), archiving (55%) and copying (52%) data, while the areas most in need of improvement are virtualizing (39%), profiling (37%), blending (34%), master data management (33%) and masking for privacy (33%). At the system level, the research finds that conventional enterprise capabilities are most often needed: load balancing (cited by 51%), cross-platform support (47%), a development and testing environment (42%), systems management (40%) and scalable execution of tasks (39%). To test the range of big data integration capabilities before they are applied to production projects, the “sandbox” has become the standard approach; for their development and testing environment, the largest percentage (36%) said they will use an internal sandbox with specialized big data technology. This group of findings reveals that big data integration has enterprise-level requirements that go well beyond simply loading data and that build on broader advances in data integration.
Big data must not be a separate store of data but part of the overall enterprise and data architecture; only then can organizations fully integrate and use the data. Organizations that see data integration as critical to big data are embarking on sophisticated efforts to achieve it. The data integration capabilities they rate most critical to their big data efforts are developing and managing metadata that can be shared across BI systems (cited by 58%), joining disparate data sources during transformation (56%) and establishing rules for processing and routing data (56%).
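As a minimal sketch of the second and third of those capabilities, the Python example below joins two hypothetical disparate sources during a transformation step and applies a simple routing rule; the source names, columns and rule are assumptions for illustration, not drawn from the research.

```python
import pandas as pd

# Hypothetical sources: a relational customer extract and an event feed
# that arrives with different key names and string-typed amounts.
customers = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "region": ["East", "West", "East"],
})
events = pd.DataFrame({
    "cust": [101, 103, 104],
    "amount": ["12.50", "7.00", "99.99"],  # arrives as strings
})

# Transformation step: normalize keys and types before joining.
events = events.rename(columns={"cust": "customer_id"})
events["amount"] = events["amount"].astype(float)

# Join the disparate sources; a left join keeps unmatched events visible.
joined = events.merge(customers, on="customer_id", how="left")

# Simple processing/routing rule: events with no matching customer go to a
# remediation queue, the rest continue on to the analytics store.
to_remediation = joined[joined["region"].isna()]
to_analytics = joined[joined["region"].notna()]

print(to_analytics)
print(to_remediation)
```

In practice a data integration tool would express the join and the routing rule declaratively and manage the metadata for both sources, which is why shared metadata tops the list above.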
For a process as complex as big data integration, choosing the right technology tool can be difficult. More than half (55%) of organizations are planning to change the way they assess and select such technology. Evaluations of big data integration tools should consider how they will be deployed and which vendors can provide them. Almost half (46%) of organizations prefer to integrate big data on-premises, while 28 percent opt for cloud-based software as a service and 17 percent have no preference. Half of organizations plan to use cloud computing for managing big data; another one-third (32%) don’t know whether they will. The research shows that the most important technology and vendor criteria used to evaluate big data integration technology are usability (very important for 53%), reliability (52%) and functionality (49%). These top three evaluation criteria are followed by manageability, TCO/ROI, adaptability and validation of vendors. In short, organizations most want technology that is easy to use and can scale to meet their needs.
Big data cannot be used effectively without integration. We observe that the big data industry has not paid as much attention to information management as it should; after all, information management is what enables automating the flow of data. Organizations trying to use big data without a focus on information management will have difficulty optimizing the use of their data assets for business needs. Our research into big data integration finds that the proper technology is critical to meeting these needs. We also learned from our benchmark research into big data analytics that data preparation is the largest and most time-consuming set of tasks and must be streamlined to make the best use of the analytics that reveal actionable insights; the sketch below gives a sense of the work involved. Organizations that are initiating or expanding their big data deployments, whether on-premises or in cloud computing environments, should put integration at the top of their priority list to ensure they do not create silos of data that they cannot fully exploit.
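To illustrate why data preparation consumes so much time, here is a minimal sketch in Python with pandas of typical preparation steps (deduplication, type coercion and quarantining unusable rows); the raw fields, the duplicate record and the fixes are illustrative assumptions, not data from our research.

```python
import pandas as pd

# Hypothetical raw extract with a duplicate order, inconsistent dates
# and totals that arrive as strings with stray whitespace.
raw = pd.DataFrame({
    "order_id": [1, 1, 2, 3],
    "order_date": ["2024-01-05", "2024-01-05", "2024-1-7", None],
    "total": [" 10.00", "10.00", "25.5", "8"],
})

prepared = (
    raw.drop_duplicates(subset="order_id")  # remove duplicate records
       .assign(
           # coerce messy dates; unparseable values become NaT
           order_date=lambda d: pd.to_datetime(d["order_date"], errors="coerce"),
           # strip whitespace and convert totals to numeric values
           total=lambda d: d["total"].str.strip().astype(float),
       )
       .dropna(subset=["order_date"])  # in practice, route these to review
)

print(prepared)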
Regards,
Mark Smith
CEO and Chief Research Officer