Environment Setup to Install Splunk
This section covers installing Splunk, importing your data, and a bit about how the data is organized to facilitate searching.
Machine Data Basics
Splunk’s mission is to make machine data useful for people. Splunk divides raw machine data into discrete pieces of information known as events. When you do a simple search, Splunk retrieves the events that match your search terms. Each event consists of discrete pieces of data known as fields. In clock data, the fields might include second, minute, hour, day, month, and year.
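As a rough illustration of events and fields (not how Splunk works internally), the sketch below treats each line of a hypothetical log as one event. It breaks each event's timestamp into the clock fields mentioned above. The log format, the `RAW` sample, and the `to_events` helper are assumptions made for this example.

```python
from datetime import datetime

# Hypothetical raw machine data: one event per line, timestamp first.
RAW = """2024-03-05 14:07:09 INFO user=alice action=login
2024-03-05 14:07:12 WARN user=bob action=failed_login"""


def to_events(raw: str) -> list[dict]:
    """Split raw text into events and extract clock fields from each timestamp."""
    events = []
    for line in raw.splitlines():
        # Assumed layout: "YYYY-MM-DD HH:MM:SS" occupies the first 19 characters.
        ts = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        events.append({
            "raw": line,  # the original text is kept unchanged
            "second": ts.second,
            "minute": ts.minute,
            "hour": ts.hour,
            "day": ts.day,
            "month": ts.month,
            "year": ts.year,
        })
    return events


for event in to_events(RAW):
    print(event["hour"], event["minute"], event["second"], event["raw"])
```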
Types of Data Splunk Can Read
One of the common characteristics of machine data is that it almost always contains some indication of when the data was created or when an event described by the data occurred.
Given this characteristic, Splunk’s indexes are optimized to retrieve events in time-series order. If the raw data does not have an explicit timestamp, Splunk assigns each event the time at which it was indexed. Alternatively, it uses other approximations, such as the time the file was last modified or the timestamp of previous events.
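The following minimal sketch shows the fallback idea. An event with an explicit timestamp keeps it. An event without one borrows the previous event's time, the file's modification time, or the index time. The timestamp format, the `assign_time` helper, and the order in which the fallbacks are tried are all assumptions for illustration. They are not Splunk's actual timestamp-recognition rules.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical timestamp format; real data may use many other layouts.
TS_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def assign_time(
    event: str,
    index_time: datetime,
    previous: Optional[datetime] = None,
    file_mtime: Optional[datetime] = None,
) -> datetime:
    """Pick a timestamp for one event, falling back when none is explicit.

    The fallback order below is an assumption made for illustration.
    """
    match = TS_PATTERN.search(event)
    if match:
        return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")
    if previous is not None:
        return previous  # reuse the previous event's timestamp
    if file_mtime is not None:
        return file_mtime  # the file's last-modified time
    return index_time  # last resort: the time the event was indexed


indexed_at = datetime(2024, 3, 5, 15, 0, 0)
raw_events = [
    "2024-03-05 14:07:12 second event",
    "2024-03-05 14:07:09 first event",
    "no timestamp in this line",
]

stamped = []
previous = None
for raw in raw_events:
    previous = assign_time(raw, indexed_at, previous)
    stamped.append((previous, raw))

# Time-series order: sort on the assigned timestamp only (sorted() is stable).
for ts, raw in sorted(stamped, key=lambda pair: pair[0]):
    print(ts.isoformat(), raw)
```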
The only other requirement is that the machine data be textual, not binary, data. Image and sound files are common examples of binary data files. Some types of binary files, like the core dump produced when a program crashes, can be converted to textual information, such as a stack trace. Splunk can call your scripts to do that conversion before indexing the data. Ultimately, though, Splunk data must have a textual representation to be indexed and searched.
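To make the text-versus-binary requirement concrete, here is a hypothetical preprocessing step of the kind a conversion script might perform. It treats data as textual if it contains no NUL bytes and decodes as UTF-8. That heuristic, the `looks_textual` and `prepare_for_indexing` helpers, and the hex-dump converter are assumptions for this sketch, not Splunk's own checks.

```python
def looks_textual(data: bytes) -> bool:
    """Heuristic (an assumption, not Splunk's check): no NUL bytes and valid UTF-8."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def prepare_for_indexing(data: bytes, converter=None) -> str:
    """Return indexable text, calling a caller-supplied converter for binary input."""
    if looks_textual(data):
        return data.decode("utf-8")
    if converter is None:
        raise ValueError("binary data needs a converter to produce text")
    # e.g. turn a core dump into a stack trace; here any bytes -> str callable works
    return converter(data)


print(prepare_for_indexing(b"ERROR disk full\n"))
# Hypothetical converter: render the binary blob as a hex string.
print(prepare_for_indexing(b"\x89PNG\x00\x01", converter=lambda blob: blob.hex()))
```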
Splunk Data Sources
During indexing, Splunk can read machine data from any number of sources. The most common input sources are:
- Files: Splunk can monitor specific files or directories. If data is added to a file or a new file is added to a monitored directory, Splunk reads that data.
- The Network: Splunk can listen on TCP or UDP ports, reading any data sent.
- Scripted Inputs: Splunk can read the machine data output by programs or scripts, such as a Unix® command or a custom script that monitors sensors.
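For a scripted input, Splunk indexes whatever the script writes to standard output. The sketch below is a hypothetical sensor-monitoring script of that kind. It prints one timestamped `key=value` line per reading. The sensor names, value ranges, and simulated readings are invented for illustration; a real script would query actual hardware.

```python
import random
from datetime import datetime, timezone

# Hypothetical sensors and value ranges; a real script would query actual hardware.
SENSORS = {"boiler_temp": (60.0, 90.0), "intake_pressure": (1.0, 3.0)}


def read_sensor(name: str, rng: random.Random) -> float:
    """Stand-in for a real sensor read: a random value in the sensor's range."""
    low, high = SENSORS[name]
    return rng.uniform(low, high)


def sensor_events(rng: random.Random) -> list[str]:
    """Build one timestamped key=value line per sensor."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return [
        f"{now} sensor={name} value={read_sensor(name, rng):.2f}"
        for name in SENSORS
    ]


if __name__ == "__main__":
    # The data Splunk reads from a scripted input is the script's standard output.
    for line in sensor_events(random.Random()):
        print(line, flush=True)
```

Emitting an explicit timestamp on every line means Splunk does not have to fall back to the index time for these events.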
Downloading and Installing Splunk
You can download a fully functional version of Splunk for free, either for learning or to support small to moderate use. After downloading it, install Splunk and then start it.
To start Splunk on Windows, launch the application from the Start menu. To start Splunk on Mac OS X or Unix, open a terminal window. Go to the directory where you installed Splunk, go to the bin subdirectory, and, at the command prompt, type:
./splunk start
The very last line of the information you see when Splunk starts is:
The Splunk web interface is at http://your-machinename:8000
Follow that link to the login screen. If you don’t have a username and password, the default credentials are admin and changeme. After you log in, the Welcome screen appears. It shows what you can do with your pristine instance of Splunk: add data or launch the search app.
Bringing Data in for Indexing
The next step in learning and exploring Splunk is to add some data to the index so you can explore it.
There are two steps to the indexing process:
- Downloading the sample file from the Splunk website
- Telling Splunk to index that file
To add the file to Splunk:
- From the Welcome screen, click Add Data.
- Click From files and directories on the bottom half of the screen.
- Select Skip preview.
- Click the radio button next to Upload and index a file.
- Select the file you downloaded to your desktop.
- Click Save.
Understanding How Splunk Indexes Data
Splunk’s core value to most organizations is its unique ability to index machine data so that it can be quickly searched for analysis, reporting, and alerts. The data that you start with is called raw data. Splunk indexes raw data by creating a time-based map of the words in the data without modifying the data itself.
Before Splunk can search massive amounts of data, it must index the data. The Splunk index is similar to indexes in the back of textbooks, which point to pages with specific keywords. In Splunk, the “pages” are called events.
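The textbook analogy can be sketched as a tiny inverted index. Events are kept in time order with their raw text untouched. Each word maps to the positions ("pages") of the events that contain it. This is a simplified illustration with hypothetical events and helper names; Splunk's real on-disk index format is far more involved.

```python
import re
from collections import defaultdict

# Hypothetical events: (epoch seconds, raw text). The raw text is never modified.
EVENTS = [
    (1709647640, "ERROR timeout contacting db01"),
    (1709647629, "ERROR disk full on host web01"),
    (1709647632, "INFO user alice logged in"),
]


def build_index(events):
    """Return (events sorted by time, word -> list of positions in that order)."""
    ordered = sorted(events)  # time-based ordering; raw strings stored as-is
    index = defaultdict(list)
    for pos, (_, raw) in enumerate(ordered):
        for word in set(re.findall(r"\w+", raw.lower())):
            index[word].append(pos)  # positions are appended in time order
    return ordered, index


def search(ordered, index, term):
    """Look the word up in the index, then fetch the matching events ("pages")."""
    return [ordered[pos][1] for pos in index.get(term.lower(), [])]


ordered, index = build_index(EVENTS)
print(search(ordered, index, "error"))
```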
Splunk divides a stream of machine data into individual events. Remember, an event in machine data can be as simple as one line in a log file or as complicated as a stack trace containing several hundred lines.
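One common way to split a stream into events, including multi-line ones, is sketched below. A line that starts with a timestamp begins a new event. Any other line, such as a stack-trace frame, is appended to the current event. The timestamp pattern and the `break_events` helper are assumptions for this example; Splunk's actual line-breaking behavior is configurable and more sophisticated.

```python
import re

# Assumption: a new event starts with "YYYY-MM-DD HH:MM:SS"; other lines continue it.
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def break_events(stream: str) -> list[str]:
    """Group lines into events, keeping multi-line entries such as stack traces together."""
    events: list[str] = []
    for line in stream.splitlines():
        if EVENT_START.match(line) or not events:
            events.append(line)  # start a new event
        else:
            events[-1] += "\n" + line  # continuation line of the current event
    return events


LOG = """2024-03-05 14:07:09 INFO request served
2024-03-05 14:07:10 ERROR unhandled exception
Traceback (most recent call last):
  File "app.py", line 12, in handler
ZeroDivisionError: division by zero
2024-03-05 14:07:11 INFO request served"""

for event in break_events(LOG):
    print("---")
    print(event)
```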
Every event in Splunk has at least four default fields. Default fields are indexed along with the raw data. The timestamp (_time) field is special because Splunk indexers use it to order events, enabling Splunk to efficiently retrieve events within a time range.
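Keeping events ordered by _time is what makes time-range retrieval cheap. Two binary searches find the boundaries, so the store does not have to scan every event. The sketch below shows that idea with Python's `bisect` module. host, source, and sourcetype are real Splunk default fields, but the events, their values, and the `time_range` helper are hypothetical.

```python
from bisect import bisect_left

# Hypothetical events, already sorted by _time (epoch seconds). The field names
# are real Splunk default fields; the values are made up for illustration.
EVENTS = [
    {"_time": 1709647629, "host": "web01", "source": "/var/log/app.log", "sourcetype": "app_log"},
    {"_time": 1709647632, "host": "web02", "source": "/var/log/app.log", "sourcetype": "app_log"},
    {"_time": 1709647640, "host": "db01", "source": "/var/log/db.log", "sourcetype": "db_log"},
    {"_time": 1709647701, "host": "web01", "source": "/var/log/app.log", "sourcetype": "app_log"},
]
# Built once; stays sorted because EVENTS is sorted by _time.
TIMES = [event["_time"] for event in EVENTS]


def time_range(events, times, earliest, latest):
    """Return events with earliest <= _time < latest, using binary search on times."""
    start = bisect_left(times, earliest)
    end = bisect_left(times, latest)
    return events[start:end]


for event in time_range(EVENTS, TIMES, 1709647630, 1709647700):
    print(event["_time"], event["host"])
```

The range is half-open (latest is excluded), so adjacent ranges never return the same event twice.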