Lookup feature in Splunk
These lookup table recipes briefly show advanced solutions to common, real-world problems. Splunk’s lookup feature lets you reference fields in an external CSV file that match fields in your event data. Using this match, you can enrich your event data with additional fields. Note that we do not cover external scripted lookups or time-based lookups.
Watch this Splunk Tutorial video
These recipes extensively use three lookup search commands: lookup, inputlookup, and outputlookup.
For each event, this command finds matching rows in an external CSV table and returns the other column values, enriching the events. For example, an event with a host field value and a lookup table that has host and machine_type rows, specifying …| lookup mylookup host adds the machine_type value corresponding to the host value to each event.
Bydefault, matching is case-sensitive and does not support wildcards, but you can configure these options. Using the lookup command matches values in external tables explicitly. Automatic lookups, which are set up using Splunk Manager, match values implicitly.
This command returns the whole lookup table as search results. For example,… | inputlookup mylookup returns a search result for each row in the table mylookup, which has two field values: host and machine_type.
You might wonder how to create a lookup table. This command outputs the current search results to a lookup table on disk. For example, … |outputlookup mytable.csv saves all the results into mytable.csv.
Setting Default Lookup Values
You need a default field value if an event’s value is not in the lookup table.
There are several solutions. Using an explicit lookup, you can simply use the eval coalesce function:
… | lookup mylookup ip | eval domain=coalesce(domain,”unknown”)
Using automatic lookups, there’s a setting for that. Go to Manager >>Lookups >> Lookup Definition >> mylookup, select the Advanced options checkbox, and make the following changes:
Set Minimum matches: 1
Set Default matches: unknown
Save the changes.
Using Reverse Lookups
You need to search for events based on the output of a lookup table.
Splunk permits you to use reverse lookup searches, meaning you can search for the output value of an automatic lookup and Splunk can translate that into a search for the corresponding input fields of the lookup.
Using a Two-Tiered Lookup
You need to do a two-layered lookup. For example, look up an IP address in a table of common, well-known hosts and, if that fails for a given event, then and only then use a secondary, more expensive full DNS lookup.
After we’ve retrieved events, we do our initial lookup against local_dns.csv, a local lookup file:
… | lookup local_dns ip OUTPUT hostname
If the lookup doesn’t match, the hostname field is null for that event.We now perform the second, expensive lookup on events that have no hostname. By using OUTPUTNEW instead of OUTPUT, the lookup will only run on events that have a null value for hostname.
… | lookup dnslookup ip OUTPUTNEW hostname
Putting it all together:
... | lookup local_dns ip OUTPUT hostname
| lookup dnslookup ip OUTPUTNEW hostname
Using Multistep Lookups
You need to look up a value in one lookup file and use a returned field value from that first lookup to do a second lookup using a different lookup file.
You can do this manually by running sequential lookup commands. For example, if a first lookup table takes values of field A and outputs values of field B, and a second lookup table takes values of field B and outputs values of field C:
… | lookup my_first_lookup A | lookup my_second_lookup B
More interestingly, this can be done using automatic lookups, where this chaining happens automatically. It is imperative, however, that the lookups are run in the correct order, by using the alphanumeric precedence of property names.
Go to Manager >> Lookups >> Automatic lookups, and create two automatic lookups, making sure that the one to run later has a named value greater than the previous lookup name. For example:
0_first_lookup = my_first_lookup A OUTPUT B
1_second_lookup = my_second_lookup B OUTPUT C
Creating a Lookup Table from Search Results
You want to create a lookup table from search results.
If you were to simply do:
<some search> | outputlookup mylookupfile.csv
you might encounter two problems. First, events have many fields, including internal fields like _raw, and _time, which you don’t want in your lookup table. Second, of the fields you do care about, most likely there are duplicate values on the events retrieved. To handle the first problem, we won’t use the fields command because it’s inconvenient to remove internal fields. Instead, we’ll use the table command to better limit the fields to what we explicitly specify. To solve the second problem, use the dedup command.
Putting it all together:
… | table field1, field2
| dedup field1
| outputlookup mylookupfile.csv
Appending Results to Lookup Tables
You need to append results to an existing lookup table. For example, you want to create a single lookup table based on the results of multiple iterations of the same search. Specifically, suppose you wanted to keep track of the last IP each user logged in from. You might want to run a job every 15 minutes to look that up and update the lookup table with new users.
The basic procedure is to get the set of results you want to append to the lookup table, use inputlookup to append the current contents of the lookup, and use outputlookup to write the lookup. The command looks like this:
| fields the_interesting_fields
| inputlookup mylookup append=true
| dedup the_interesting_fields
| outputlookup mylookup
First, we told Splunk to retrieve the new data and retain only the fields needed for the lookup table. Next, we used inputlookup to append the existing rows in mylookup, by using the append=true option. Next we remove duplicates with dedup. Finally, we used outputlookup to output all these results to mylookup.
Comparing Results to Lookup Values
You want to compare the values in the lookup list with those in your events. For example, you have a lookup table with IP addresses and want to know which IP addresses occur in your data.
If events with particular field values are a small subset of your events, you can efficiently use subsearches to find relevant events. Use inputlookup in a subsearch to generate a large OR search of all the values seen in your lookup table. The size of the list returned from a subsearch can be 10,000 items in size (modifiable in limits.conf).
yoursearch [ inputlookup mylookup | fields ip ]
The resulting search executed looks similar to:
yoursearch AND ( ip=18.104.22.168 OR ip=22.214.171.124 OR … )
You can test what the subsearch returns by running the search that is inside the subsearch and appending the format command:
| inputlookup mylookup | fields ip | format
Controlling Lookup Matches
You have multiple entries in a lookup table for a given combination of input fields and want the first value to match. For example, your lookup table maps hostnames to several host aliases, and you want the first alias.
By default, Splunk returns up to 100 matches for lookups not involving a time element. You can update it to return only one. Using the UI, go to Manager >> Lookups >> Lookup definitions and edit or create your lookup definition. Select the Advanced options checkbox and enter 1 for Maximum matches.
Alternatively, you can edit the applicable transforms.conf. Add max_matches=1 to your lookups stanza.
You have a lookup table with ranges of IP addresses that you want to match.
Suppose your events have IP addresses in them and you have a table of IP ranges and ISPs:
You can specify a match_type for a lookup. Unfortunately, this functionality isn’t available in the UI but you can set it in the transforms.conf config file.
Set the match_type to CIDR for your network_range.
match_type = CIDR(network_range)
Matching with Wildcards
You need wildcard matching for your lookup table.
Suppose you have a lookup table with URLs you’d like to match on:
By including wildcard (*) characters in your lookup table values, you can direct Splunk to match on wildcards. As in the “Matching IPs” recipe, you can specify a match_type for a lookup in the transforms.conf config file:
match_type = WILDCARD(url)
Learn more about Splunk in this insightful blog now!