Flat 20% & up to 50% off + Free additional Courses. Hurry up!

Grouping Events


7.1 Introduction

There are several ways to group events. The most common approach uses either the transaction or stats command. But when should you use transaction and when should you use stats?

The rule of thumb: If you can use stats, use stats. It’s faster than transaction, especially in a distributed environment. With that speed, however, comes some limitations. You can only group events with stats if they have at least one common field value and if you require no other constraints. Typically, the raw event text is discarded.

Like stats, the transaction command can group events based on common field values, but it can also use more complex constraints such as total time span of the transaction, delays between events within the transaction, and required beginning and ending events. Unlike stats, transaction retains the raw event text and field values from the original events, but it does not compute any statistics over the grouped events, other than the duration (the delta of the _time field between oldest and newest events in the transaction) and the eventcount (the total number of events in the transaction).

The transaction command is most useful in two specific cases:

  • When unique field values (also known as identifiers) are not sufficient to discriminate between discrete transactions.
  • When it is desirable to see the raw text of the events rather than an analysis on the constituent fields of the events.

Again, when neither of these cases is applicable, it is a better practice to use stats, as search performance for stats is generally better than transaction. Often there is a unique identifier, and stats can be used.

For example, to compute statistics on the duration of trades identified by the unique identifier trade_id, the following searches yield the same answer:

… | transaction trade_id

| chart count by duration

… | stats range(_time) as duration by trade_id

| chart count by duration

The second search is more efficient.


However, if trade_id values are reused but the last event of each trade is indicated by the text “END”, the only viable solution is:

… | transaction trade_id endswith=END

| chart count by duration

If, instead of an end condition, trade_id values are not reused within 10 minutes, the most viable solution is:

… | transaction trade_id maxpause=10m

| chart count by duration

Finally, a brief word about performance. No matter what search commands you use, it’s imperative for performance that you make the base search as specific as possible. Consider this search:

sourcetype=x | transaction field=ip maxpause=15s | search


Here we are retrieving all events of sourcetype=x, building up transactions, and then throwing away any that don’t have an ip= If all your events have the same ip value, this search should be:

sourcetype=x ip= | transaction field=ip maxpause=15s

This search retrieves only the events it needs to and is much more efficient.


7.2 Recipes


7.2.1 Unifying Field Names


You need to build transactions from multiple data sources that use different field names for the same identifier.



Typically, you can join transactions with common fields like:

… | transaction username

But when the username identifier is called different names (login, name, user, owner, and so on) in different data sources, you need to normalize the field names.

If sourcetype A only contains field_A and sourcetype B only contains field_B, create a new field called field_Z which is either field_A or field_B, depending on which is present in an event. You can then build the transaction based on the value of field_Z.

sourcetype=A OR sourcetype=B

| eval field_Z = coalesce(field_A, field_B)

| transaction field_Z


7.2.2 Finding Incomplete Transactions


You need to report on incomplete transactions, such as users who have logged in but not logged out.



Suppose you are searching for user sessions starting with a login and ending with a logout:

… | transaction userid startswith=”login”


You would like to build a report that shows incomplete transactions—users who have logged in but not logged out. How can you achieve this?

The transaction command creates an internal boolean field named closed_txn to indicate if a given transaction is complete or not. Normally incomplete transactions are not returned, but you can ask for these “evicted” partial transactions by specifying the parameter keepevicted=true. Evicted transactions are sets of events that do not match all the transaction parameters.

For example, the time requirements are not met in an evicted transaction. Transactions that fulfill all the requirements are marked as complete by having the field closed_txn set to 1 (rather than 0 for incomplete transactions). So the pattern for finding incomplete transactions would generally be:

… | transaction <conditions> keepevicted=true

| search closed_txn=0

In our case, however, there’s a wrinkle. An endswith condition not matching will not set the closed_txn=0 because events are processed from newest to oldest. Technically, the endswith condition starts the transaction, in terms of processing. To get around this, we need to filter transactions based on the closed_txn field, as well as make sure that our transactions don’t have both a login and a logout:

… | transaction userid startswith=”login”



| search closed_txn=0 NOT (login logout)


7.2.3 Calculating Times within Transactions


You need to find the duration times between events in a transaction.



The basic approach is to use the eval command to mark the points in time needed to measure the different durations, and then calculate the durations between these points using eval after the transaction command. but you have the full range of eval functions available to you for more complex situations.

…| eval p1start = if(searchmatch(“phase1”), _time, null())

| eval p2start = if(searchmatch(“phase2”), _time, null())

Next we make the actual transactions:

… | transaction id startswith=”start of event”

endswith=“end of event”

Finally we calculate the duration for each transaction, using the values calculated above.

…| eval p1_duration = p2start - p1start

| eval p2_duration = (_time + duration) - p2start


In this example, we calculated the time of the last event by added _time (the time of the first event) and adding duration to it. Once we knew the last event’s time, we calculated p2_duration as the difference between the last event and the start of phase2.


7.2.4 Finding the Latest Events


You need to find the latest event for each unique field value. For example, when was the last time each user logged in?



At first, you might be tempted to use the transaction or stats command. For example, this search returns, for each unique userid, the first value seen for each field:

… | stats first(*) by userid


Note that this search returns the first value of each field seen for events that have the same userid. It provides a union of all events that have that user ID, which is not what we want. What we want is the first event with a unique userid. The proper way to do that is with the dedup command:

… | dedup userid


7.2.5 Finding Repeated Events


You want to group all events with repeated occurrences of a value in order to remove noise from reports and alerts.



Suppose you have events as follows:

2012-07-22 11:45:23 code=239

2012-07-22 11:45:25 code=773

2012-07-22 11:45:26 code=-1

2012-07-22 11:45:27 code=-1

2012-07-22 11:45:28 code=-1

2012-07-22 11:45:29 code=292

2012-07-22 11:45:30 code=292

2012-07-22 11:45:32 code=-1

2012-07-22 11:45:33 code=444

2012-07-22 11:45:35 code=-1

2012-07-22 11:45:36 code=-1

Your goal is to get 7 events, one for each of the code values in a row: 239, 773, -1, 292, -1, 444, -1. You might be tempted to use the transaction command as follows:

… | transaction code


Using transaction here is a case of applying the wrong tool for the job. As long as we don’t really care about the number of repeated runs of duplicates, the more straightforward approach is to use dedup, which removes duplicates. By default, dedup will remove all duplicate events (where an event is a duplicate if it has the same values for the specified fields). But that’s not what we want; we want to remove duplicates that appear in a cluster. To do this, dedup has a consecutive=true option that tells it to remove only duplicates that are consecutive.

… | dedup code consecutive=true


7.2.6 Time Between Transactions


You want to determine the time between transactions, such as how long it’s been between user visits to your website.



Suppose we have a basic transaction search that groups all events by a given user (clientip-cookie pair), but splits the transactions when the user is inactive for 10 minutes:

… | transaction clientip, cookie maxpause=10m

Ultimately, our goal is to calculate, for each clientip-cookie pair, the difference in time between the end time of a transaction and the start time of a more recent (i.e. ‘previous’ in order of events returned) transaction.

That time difference is the gap between transactions. For example, suppose we had two pseudo transactions, returned from most recent to oldest:

T1: start=10:30 end=10:40 clientip=a cookie=x

T2: start=10:10 end=10:20 clientip=a cookie=x

The gap in time between these two transactions is the difference between the start time of T1 (10:30) and the end time of T2 (10:20), or 10 minutes. The rest of this recipe explains how to calculate these values.

First, we need to calculate the end time of each transaction, keeping in mind that the timestamp of a transaction is the time that the first event occurred and the duration is the number of seconds that elapsed between the first and last event in the transaction:

… | eval end_time = _time + duration


Next we need to add the start time from the previous (i.e., more recent) transaction to each transaction. That will allow us to calculate the difference between the start time of the previous transaction and our calculated end_time.

To do this we can use streamstats to calculate the last value of the start time (_time) seen in a sliding window of just one transaction— global=false and window=1—and to ignore the current event in that sliding window—current=false. In effect, we’re instructing streamstats to look only at the previous event’s value. Finally, note that we’re specifying that this window is only applicable to the given user (clientip-cookie pair):

… | streamstats first(_time) as prev_starttime

global=false window=1 current=false by clientip, cookie


At this point, the relevant fields might look something like this:

T1: _time=10:00:06, duration=4, end_time=10:00:10

T2: _time=10:00:01, duration=2, end_time=10:00:03


T3: _time=10:00:00, duration=0, end_time=10:00:01



Now, we can finally calculate the difference in time between the previous transaction’s start time (prev_starttime) and the calculated end_time. That difference is the gap between transactions, the amount of time (in seconds) passed between two consecutive transactions from the same user (clientip-cookie pair).

… | eval gap_time = prev_starttime – end_time

A detailed understanding of Splunk is available in this blog post for your perusal!

"0 Responses on Grouping Events"

Training in Cities

Bangalore, Hyderabad, Chennai, Delhi, Kolkata, UK, London, Chicago, San Francisco, Dallas, Washington, New York, Orlando, Boston

100% Secure Payments. All major credit & debit cards accepted Or Pay by Paypal.


Sales Offer

  • To avail this offer, enroll before 26th October 2016.
  • This offer cannot be combined with any other offer.
  • This offer is valid on selected courses only.
  • Please use coupon codes mentioned below to avail the offer
DW offer

Sign Up or Login to view the Free Grouping Events.