Channel: CoE – Data & Analytics

Sub-Searching – with Splunk


You’ll find that sub-searching is a pretty common technique in Splunk.

A “sub-search” is simply a “search within a search” or, put another way, a search whose results are used as an argument to another search. Sub-searches in Splunk must be contained in square brackets and are evaluated first by the Splunk interpreter.

Think of a sub-search as being similar to a SQL subquery (a SQL query nested inside a larger query).

Sub-searches are mainly used for three purposes:

  • Parametrization (feeding the output of one search into another search as search terms).
  • Appending (running a separate search and stitching its output onto the first search using the Splunk append command).
  • Conditions (creating a conditional search where you only see results if they meet the criteria, or perhaps a threshold, set by the sub-search).

Normally, you’ll use a sub-search to take the results of one search and use them in another search, all in a single Splunk search pipeline. Because of how this works, the command that receives the sub-search must be able to accept it as an argument, as the search command and the append command (which I’ve already mentioned) both do.
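To make the “single pipeline” point concrete, here is a rough sketch of what you would otherwise do by hand. It assumes web access logs with a clientip field (the same access_* data used later in this post); the placeholder in the second line is not real syntax, it just marks where you would paste the value from the first search.

```spl
sourcetype=access_* | top limit=1 clientip
sourcetype=access_* clientip=<value copied from the first search>
```

With a sub-search, the same two steps should collapse into one search, with the bracketed part run first:

```spl
sourcetype=access_* [search sourcetype=access_* | top limit=1 clientip | fields clientip]
```

The fields command at the end of the sub-search matters: only the fields it keeps are handed back to the outer search as search terms.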

Parametrization

sourcetype=TM1* ERROR [search earliest=-30d | top limit=1 date_mday | fields + date_mday]

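The above Splunk search uses a sub-search to parametrize a search of all TM1 logs indexed within a Splunk instance for “ERROR” events. The sub-search (enclosed in the square brackets) runs first: it looks at the events of the past 30 days, finds the day of the month (date_mday) which had the most events, and hands that day back as a search term. The outer search then returns only the TM1 error events logged on that day of the month. Note that earliest=-30d applies only to the sub-search; the outer search still uses whatever time range you selected for the search as a whole.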
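As an illustration of what the sub-search hands back, suppose (purely hypothetically) that the busiest day of the month turned out to be the 15th. The outer search would then be expected to behave as if you had typed:

```spl
sourcetype=TM1* ERROR date_mday=15
```

Keep in mind that the sub-search as written has no sourcetype filter, so it picks the busiest day across all events it can see, not just the TM1 ones. If you want the busiest TM1 day, a variation along these lines should do it:

```spl
sourcetype=TM1* ERROR [search sourcetype=TM1* earliest=-30d | top limit=1 date_mday | fields + date_mday]
```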

Appending

The Splunk append command can be used to append the results of a sub-search to the results of the current search:

sourcetype=TM1* ERROR | stats dc(date_year), count by sourcetype | append [search sourcetype=TM1* | top 1 sourcetype by date_year]

The above Splunk search uses a sub-search with the append command to combine two TM1 server log searches. The first search looks through all indexed TM1 sources for “ERROR” events and yields, for each TM1 source type, the number of distinct years and a count of events. The second (sub) search returns the top (or most active) TM1 source type for each year; note that, as written, it looks at all TM1 events, not only the errors. The results of the sub-search are then appended, as extra rows, to the results of the first search.
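If you want both halves of the appended output to describe error events only, one possible variation is sketched below. The field names years and errors are my own labels (they are not in the original search) and are there only to make the combined table easier to read.

```spl
sourcetype=TM1* ERROR | stats dc(date_year) AS years, count AS errors by sourcetype | append [search sourcetype=TM1* ERROR | top limit=1 sourcetype by date_year]
```

Because append simply adds rows, the two result sets keep their own columns; rows from the first search will have no date_year or percent values, and rows from the sub-search will have no years or errors values.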

Conditional

sourcetype=access_* | stats dc(clientip), count by method | append [search sourcetype=access_* | where action="addtocart" | top 1 clientip by method]

The above Splunk search counts the number of different IP addresses which accessed a server and also finds the user who accessed the server the most for each type of page request (method). The sub-search is modified with a “where” clause to limit its results to only those events that are “addtocart” actions (in other words, which user added the most to their online shopping cart, whether they actually purchased anything or not).
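For a search that is conditional in the threshold sense described at the top of this post, a sketch along the following lines may be closer to what you want. The threshold of 10 is an arbitrary number I picked for illustration.

```spl
sourcetype=access_* [search sourcetype=access_* action=addtocart | stats count by clientip | where count > 10 | fields clientip]
```

Here the sub-search returns only the client IP addresses with more than 10 “addtocart” events, and the outer search shows events for just those clients. If no client meets the threshold, the sub-search comes back empty and the outer search should, in turn, return no events.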

Output Settings for Sub-searches

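Splunk sub-searches rely on the format command, which Splunk applies implicitly at the end of a sub-search and which you can also call explicitly. This command takes the results of a sub-search and formats them into a single result: a search string that the outer search can use.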

Depending upon the search pipeline, the results returned may be numerous, which will impact the performance of your search. To remedy this, you can change the number of results that the format command operates over, in-line with your search, by appending the following to the end of your sub-search:

| format maxresults=<integer>
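To see what format actually produces, you can run it on its own at the end of an ordinary search. This is only a sketch against the access_* data used earlier:

```spl
sourcetype=access_* | top limit=2 clientip | fields clientip | format
```

The output should be a single row with a field named search, looking something like this (the two IP addresses are made up):

```
( ( clientip="10.0.0.1" ) OR ( clientip="10.0.0.2" ) )
```

Placed at the end of a sub-search, maxresults caps how many rows are turned into that string. In the example below, the numbers 50 and 10 are arbitrary:

```spl
sourcetype=access_* [search sourcetype=access_* action=addtocart | top limit=50 clientip | fields clientip | format maxresults=10]
```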

I recommend that you take a very conservative approach and use the Splunk limits.conf file to enforce limits on all your sub-searches. The shipped defaults live in the $SPLUNK_HOME/etc/system/default/ folder. Rather than editing that copy, which Splunk advises against, find (or create) a copy in the $SPLUNK_HOME/etc/system/local/ folder and put your overrides there.

The file controls all Splunk searches (providing it is coded correctly, based upon your environment) but also contains a stanza specific to Splunk sub-searches, titled [subsearch].

Within this stanza, there are 3 important settings:

  • maxout (the maximum number of results to return from a subsearch. The default is 100 in older releases; more recent releases document a default of 10000, so check limits.conf.spec for your version).
  • maxtime (the maximum number of seconds to run a subsearch before finalizing. The default is 60).
  • ttl (the time, in seconds, to cache a given subsearch’s results. The default is 300).
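As a rough sketch, an override in $SPLUNK_HOME/etc/system/local/limits.conf might look like the following. The values shown are simply the defaults quoted above, used here as placeholders; pick numbers that suit your own environment.

```ini
# $SPLUNK_HOME/etc/system/local/limits.conf
# Illustrative values only; tune them for your environment.
[subsearch]
# maximum number of results a subsearch may return
maxout = 100
# maximum number of seconds a subsearch may run before it is finalized
maxtime = 60
# number of seconds to cache a subsearch's results
ttl = 300
```

Settings in the local folder take precedence over those in the default folder, so only the settings you want to change need to appear here.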

Splunk On!

 

