Big data analytics is the use of advanced analytic techniques against very large, diverse data sets that include structured, semi-structured and unstructured data, from different sources, and in different sizes from terabytes to zettabytes.
Big data is a term applied to data sets whose size or type is beyond the ability of traditional relational databases to capture, manage and process the data with low latency. Big data has one or more of the following characteristics: high volume, high velocity or high variety. Artificial intelligence (AI), mobile, social and the Internet of Things (IoT) are driving data complexity through new forms and sources of data. For example, big data comes from sensors, devices, video/audio, networks, log files, transactional applications, web, and social media — much of it generated in real time and at a very large scale.
Analysis of big data allows analysts, researchers and business users to make better and faster decisions using data that was previously inaccessible or unusable. Businesses can use advanced analytics techniques such as text analytics, machine learning, predictive analytics, data mining, statistics and natural language processing to gain new insights from previously untapped data sources independently or together with existing enterprise data.
The life of an enterprise architect is becoming busier and more difficult. Before the era of big data, the enterprise architect “only” had to worry about the data and systems within their own data center. However, over the past decade there have been revolutionary changes to the way information is used by businesses and to how data management platforms support the information available from modern data sources.
Cloud broke down the boundaries of enterprise data centers, with applications housed and data created outside the “four walls” of an organization. This introduced a host of complexities for enterprise architects focused on security, privacy, and control. Mobile influences continued to push data outside the data center. Maintaining data flows to each of those data access points, often tablets or mobile phones, introduced additional difficulties. Incoming data from mobile devices brought new data formats and a flood of information to the enterprise architect.
These changes in the formats and locations of systems and data created massive change for data-driven organizations that want to develop competitive advantage. To gain that advantage, they may adopt new data sources such as device sensor logs, social media streams, and mobile device geolocation information; create new projects to take advantage of these new data sources; and establish environments with diverse data management platforms to support these efforts.
New and Exciting Data Sources
The transformations over the past decade mandated and created a range of additional data sources, both inside and outside an organization, for enterprise architects to consider.
External data sources from third-party content, often via cloud-based providers, can change their data structure without notifying the downstream organizations that use that information. Event log and device sensor information also varies with each device's individual configuration and frequently arrives in multistructured formats such as XML or JSON. Social data sources created by customers and the general public consist of text and audio/video content. Both text and audio/video are difficult to store and utilize because they are unstructured.
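As an illustration of the structure changes described above, here is a minimal Python sketch of how a downstream consumer might notice that a JSON feed has changed shape. The field names (`device_id`, `timestamp`, `temperature`) and the `check_record` helper are hypothetical and do not come from the white paper; a real feed would call for its own list of expected fields.

```python
import json

# Hypothetical set of fields a downstream consumer expects in each record.
EXPECTED_FIELDS = {"device_id", "timestamp", "temperature"}


def check_record(raw, expected=EXPECTED_FIELDS):
    """Parse one JSON record and report missing or unexpected top-level fields."""
    record = json.loads(raw)
    present = set(record)
    return {
        "missing": sorted(expected - present),
        "unexpected": sorted(present - expected),
    }


# A record in the format the consumer was built for.
old_format = '{"device_id": "a1", "timestamp": 1700000000, "temperature": 21.5}'
# The same feed after the provider renamed one field and added another.
new_format = '{"device_id": "a1", "ts": 1700000000, "temperature": 21.5, "humidity": 40}'

print(check_record(old_format))
print(check_record(new_format))
```

A check like this only inspects top-level field names. It would not catch a change in a field's type or in nested structure, which would need a fuller schema validation step.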
Traditional relational data sources are also included in the wave of big data change, but they have their own challenges. Increasingly, the information is coming from outside the data center from third parties or cloud-based implementations of corporate data, which requires enterprise architects to seek out and learn how to utilize that information.
Business-oriented data consumers can manage their own pursuits, which can take the form of data discovery or exploratory activities to find new uses for big data sources. They can be analytical projects to align costs via cost-management activities, or advanced analytical projects to find the next set of attributes for high-revenue customers.
This is not a one-off implementation like a “spreadmart” or an unmanaged shadow IT project. Instead, it is a supervised and administered environment strategically provided to business stakeholders so they can meet their own needs while utilizing corporate assets. This may be in the form of new big data sources and platforms, and capturing valuable metadata that can be utilized across the organization.
Enterprise architects create self-service capabilities for big data. By designing and implementing application environments with configurable software components, empowering technologists through skills development, and actively encouraging interaction between technology implementation teams and the various lines of business, enterprise architects provide environments where business stakeholders can have requirements met at the “speed of business” rather than the speed of an implementation backlog. The business can focus on how best to use new big data resources without waiting on tactical IT workflows, while the implementation components are maintained for distribution and continued development across the organization.
The above post is an extract from a white paper from IBM entitled “Making Sense of Big Data”. The full paper can be downloaded here.