The term “Big Data” is used to describe information so large and complex that it becomes almost impossible to process using traditional methods. Because of the volume and/or unstructured nature of this data, making it useful or turning it into what the industry is calling “operational intelligence” (OI) is extremely difficult.
According to information provided by International Data Corporation (IDC), unstructured data, much of it machine-generated, may account for more than 90% of the data in today’s organizations.
This type of data, usually found in enormous and ever-increasing volumes, records some sort of activity, behavior, or measurement of performance. Today, organizations are missing the opportunities that big data can provide because they are focused on structured data, using traditional tools for business intelligence (BI) and data warehousing (DW).
Using these mainstream methods, such as relational or multidimensional databases, to understand big data is problematic, to say the least. Attempting to use these tools for big data solution development requires serious experience and very complex designs, and even then, in practice, they do not allow enough flexibility to “ask any question” or to get those questions answered in real time, which is now the expectation rather than a “nice-to-have” feature.
Splunk – “solution accelerator”
Splunk started by focusing on the information technology department, supporting the monitoring of servers, messaging queues, websites, and so on. It is now recognized for its ability to help with the specific challenges (and opportunities) of effectively organizing and managing massive amounts of machine-generated big data of any kind.
Getting “right down to it”, Splunk reads almost any data, even in real time, into its internal repository, quickly indexes it, and makes it available for immediate analysis and reporting.
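As a minimal sketch of this flow (the index, sourcetype, and field names here, such as web and status, are hypothetical), a single Splunk search can filter freshly indexed events and summarize them on the spot:

    index=web sourcetype=access_combined status>=500
    | stats count AS errors BY host
    | sort - errors

This counts server-error events per host and ranks the hosts, with no data modeling done in advance.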
Typical query languages depend on schemas. A (database) schema describes how the data is to be “placed together” or structured. This structure is based upon knowledge of the possible applications that will consume the data, the facts or types of information that will be loaded into the database, or the (identified) interests of the likely end users. Splunk uses a “NoSQL” approach, reportedly based on UNIX concepts, that does not require any predefined schema.
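To illustrate what schema-free means in practice, fields can be carved out of raw events at search time rather than declared up front. The following sketch (hypothetical index, log format, and field names) uses Splunk’s rex command to extract fields with a regular expression and then aggregates on them immediately:

    index=app_logs
    | rex field=_raw "user=(?<user>\w+)\s+duration=(?<duration>\d+)"
    | stats avg(duration) AS avg_duration BY user

Nothing about user or duration was defined when the data was indexed; the “schema” exists only for the duration of the search.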
Correlating Information
Using Splunk Search, it is possible to easily identify relationships and patterns in your data and data sources based upon the following (a combined example appears after this list):
- Time, proximity, or distance
- Transactions, either a single transaction or a series
- Subsearches (searches that take the results of one search and use them as input to, or to affect, other searches)
- Lookups to external data and data sources
- SQL-like “joins”
- And so on
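As a sketch of several of these capabilities combined in one search (the index, field, and lookup names are hypothetical, and the user_info lookup is assumed to be defined separately), the following uses a subsearch to restrict results to hosts that raised high-severity alerts, groups the matching web events into per-user transactions, and enriches them from an external lookup:

    index=web [ search index=alerts severity=high | fields host ]
    | transaction user maxpause=5m
    | lookup user_info user OUTPUT department
    | stats count BY department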
Keep in mind that the powerful capabilities built into Splunk do not stop with flexible searching and correlating. With Splunk, users can also quickly create reports and dashboards with charts, histograms, trend lines, and many other visualizations, without the cost associated with structuring or modeling the data first.
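For instance, a one-line sketch (again with hypothetical index and field names) turns raw events directly into a trend line suitable for a dashboard panel, one data point per hour:

    index=web status>=500
    | timechart span=1h count BY host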
Conclusion
Splunk has emerged as a definitive leader in collecting, analyzing, and visualizing machine-generated big data. Its universal method of organizing and extracting insights from massive amounts of data, from virtually any data source, has opened up, and will continue to open up, new opportunities for itself in unconventional areas. Bottom line: you’ll be seeing much more from Splunk!