• Articles
  • Tutorials
  • Interview Questions

Lookup Tables

Lookup feature in Splunk

These lookup table recipes briefly show the advanced solutions to a common and real-world problem. Splunk lookup feature lets you reference fields in an external CSV file that match fields in your event data. Using this match, you can enrich your event data with additional fields. Note that we do not cover external scripted lookups or time-based lookups.

Watch this Splunk Tutorial video

Video Thumbnail

Introduction

These recipes extensively use three lookup search commands: lookup, inputlookup, and outputlookup.

lookup

For each event, this command finds matching rows in an external CSV table and returns the other column values, enriching the events. For example, an event with a host field value and a lookup table that has a host and machine_type rows, specifying …| lookup mylookup host adds the machine_type value corresponding to the host value to each event.

Bydefault, matching is case-sensitive and does not support wildcards, but you can configure these options. Using the lookup command matches values in external tables explicitly. Automatic lookups, which are set up using Splunk Manager, match values implicitly.

inputlookup

This command returns the whole lookup table as search results. For example,… | inputlookup mylookup returns a search result for each row in the table mylookup, which has two field values: host and machine_type.

outputlookup

You might wonder how to create a lookup table. This command outputs the current search results to a lookup table on disk. For example, … |outputlookup mytable.csv saves all the results into mytable.csv.

Recipes

Setting Default Lookup Values

Problem
You need a default field value if an event’s value is not in the lookup table.
Solution
There are several solutions. Using an explicit lookup, you can simply use the eval coalesce function:

… | lookup mylookup ip | eval domain=coalesce(domain,”unknown”)

Using automatic lookups, there’s a setting for that. Go to Manager >>Lookups >> Lookup Definition >> mylookup, select the Advanced options checkbox, and make the following changes:

 
 Set Minimum matches: 1
Set Default matches: unknown
Save the changes.

Using Reverse Lookups

Problem
You need to search for events based on the output of a lookup table.
Solution
Splunk permits you to use reverse lookup searches, meaning you can search for the output value of an automatic lookup and Splunk can translate that into a search for the corresponding input fields of the lookup.

Get 100% Hike!

Master Most in Demand Skills Now!

Using a Two-Tiered Lookup

Problem
You need to do a two-layered lookup. For example, look up an IP address in a table of common, well-known hosts and, if that fails for a given event, then and only then use a secondary, more expensive full DNS lookup.
Solution
After we’ve retrieved events, we do our initial lookup against local_dns.csv, a local lookup file:
… | lookup local_dns ip OUTPUT hostname
If the lookup doesn’t match, the hostname field is null for that event. We now perform the second, expensive lookup on events that have no hostname. By using OUTPUTNEW instead of OUTPUT, the lookup will only run on events that have a null value for the hostname.
… | lookup dnslookup ip OUTPUTNEW hostname
Putting it all together:

... | lookup local_dns ip OUTPUT hostname
| lookup dnslookup ip OUTPUTNEW hostname

Using Multistep Lookups

Problem
You need to look up a value in one lookup file and use a returned field value from that first lookup to do a second lookup using a different lookup file.

Solution
You can do this manually by running sequential lookup commands. For example, if a first lookup table takes values of field A and outputs values of field B, and a second lookup table takes values of field B and outputs values of field C:

… | lookup my_first_lookup A | lookup my_second_lookup B

More interestingly, this can be done using automatic lookups, where this chaining happens automatically. It is imperative, however, that the lookups are run in the correct order, by using the alphanumeric precedence of property names.
Go to Manager >> Lookups >> Automatic lookups, and create two automatic lookups, making sure that the one to run later has a named value greater than the previous lookup name. For example:

0_first_lookup = my_first_lookup A OUTPUT B
1_second_lookup = my_second_lookup B OUTPUT C

Creating a Lookup Table from Search Results

Problem
You want to create a lookup table from search results.
Solution
If you were to simply do:
<some search> | outputlookup mylookupfile.csv
you might encounter two problems. First, events have many fields, including internal fields like _raw, and _time, which you don’t want in your lookup table. Second, of the fields you do care about, most likely there are duplicate values on the events retrieved.

To handle the first problem, we won’t use the fields command because it’s inconvenient to remove internal fields. Instead, we’ll use the table command to better limit the fields to what we explicitly specify. To solve the second problem, use the dedup command.
Putting it all together:

… | table field1, field2
| dedup field1
| outputlookup mylookupfile.csv

Appending Results to Lookup Tables

Problem
You need to append results to an existing lookup table. For example, you want to create a single lookup table based on the results of multiple iterations of the same search. Specifically, suppose you wanted to keep track of the last IP each user logged in from. You might want to run a job every 15 minutes to look that up and update the lookup table with new users.
Solution
The basic procedure is to get the set of results you want to append to the lookup table, use inputlookup to append the current contents of the lookup, and use outputlookup to write the lookup. The command looks like this:
your_search_to_retrieve_values_needed

| fields the_interesting_fields
| inputlookup mylookup append=true
| dedup the_interesting_fields
| outputlookup mylookup

First, we told Splunk to retrieve the new data and retain only the fields needed for the lookup table. Next, we used inputlookup to append the existing rows in mylookup, by using the append=true option. Next, we remove duplicates with dedup. Finally, we used outputlookup to output all these results to mylookup.

Comparing Results to Lookup Values

Problem
You want to compare the values in the lookup list with those in your events. For example, you have a lookup table with IP addresses and want to know which IP addresses occur in your data.
Solution
If events with particular field values are a small subset of your events, you can efficiently use subsearches to find relevant events. Use inputlookup in a subsearch to generate a large OR search of all the values seen in your lookup table. The size of the list returned from a subsearch can be 10,000 items in size (modifiable in limits.conf).
yoursearch [ inputlookup mylookup | fields ip ]
The resulting search executed looks similar to:
yoursearch AND ( ip=1.2.3.4 OR ip=1.2.3.5 OR … )
You can test what the subsearch returns by running the search that is inside the subsearch and appending the format command:
| inputlookup mylookup | fields ip | format

Controlling Lookup Matches

Problem
You have multiple entries in a lookup table for a given combination of input fields and want the first value to match. For example, your lookup table maps hostnames to several host aliases, and you want the first alias.
Solution

By default, Splunk returns up to 100 matches for lookups not involving a time element. You can update it to return only one. Using the UI, go to Manager >> Lookups >> Lookup definitions and edit or create your lookup definition. Select the Advanced options checkbox and enter 1 for Maximum matches.
Alternatively, you can edit the applicable transforms.conf. Add max_matches=1 to your lookups stanza.

Matching IPs

Problem
You have a lookup table with ranges of IP addresses that you want to match.
Solution
Suppose your events have IP addresses in them and you have a table of IP ranges and ISPs:
network_range, isp
220.165.96.0/19, isp_name1
220.64.192.0/19, isp_name2

You can specify a match_type for a lookup. Unfortunately, this functionality isn’t available in the UI but you can set it in the transforms.conf config file.
Set the match_type to CIDR for your network_range.
In transforms.conf:
[mylookup]
match_type = CIDR(network_range)

Matching with Wildcards

Problem
You need wildcard matching for your lookup table.
Solution
Suppose you have a lookup table with URLs you’d like to match on:
url, allowed
*.google.com/*, True
www.blacklist.org*, False
*/img/*jpg, False
By including wildcard (*) characters in your lookup table values, you can direct Splunk to match wildcards. As in the “Matching IPs” recipe, you can specify a match_type for a lookup in the transforms.conf config file:
[mylookup]
match_type = WILDCARD(url)

Course Schedule

Name Date Details
Big Data Course 23 Nov 2024(Sat-Sun) Weekend Batch View Details
30 Nov 2024(Sat-Sun) Weekend Batch
07 Dec 2024(Sat-Sun) Weekend Batch

About the Author

Technical Research Analyst - Big Data Engineering

Abhijit is a Technical Research Analyst specialising in Big Data and Azure Data Engineering. He has 4+ years of experience in the Big data domain and provides consultancy services to several Fortune 500 companies. His expertise includes breaking down highly technical concepts into easy-to-understand content.