Splunk – 8.0.1 Metrics vs Events Licensing Comparison - Updated with Metrics MK¶
Note: This page is hidden from view due to inaccuracies discovered long after the article was published. The method below is flawed: it records only the first instance from each perfmon category collected.
Info
This post was originally published in 2020, and has been adapted to this static site from WordPress.
This is an updated version of the original 8.0.1 test, located here. The reason for the update: Splunk reached out to me and provided me with a method of ingesting metrics that was newly introduced in version 8.0.
As a result, I implemented the new method and re-ran the tests, including the original methods alongside the new one.
[su_spoiler title="TL;DR Spoiler"]
By leveraging the Metrics MK format, I was able to reduce my license requirement by over 90% compared to the PerfmonMK format as events. Compared to the default out-of-the-box Perfmon data, that is a reduction of over 98% in licensing!
At the same time, it used less overall disk storage than any of the other current methods, while performing faster than either events format!
If you aren't evaluating converting your perfmon data to metrics, you need to start!
[/su_spoiler]
How testing will be performed¶
For testing purposes, I will have four inputs, each pointing at its own separate index. Each input is configured with the same data collection and interval.
- Regular Perfmon as Events (Default for TA_Windows)
- Regular Perfmon as Metrics
- Perfmon MK as Events
- Perfmon MK as Metrics MK (New Method)
For testing, I will be looking at the LogicalDisk perfmon object, collecting data at a 15-second interval, with a very generous handful of counters selected so that a lot of data is collected quickly.
[su_accordion] [su_spoiler title="inputs.conf" open="no" style="fancy" icon="plus" anchor="" class=""]
# Regular Perfmon Data, stored in Events index.
[perfmon://LogicalDisk_Event]
counters = % Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; % Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; % Idle Time; Split IO/Sec
object = LogicalDisk
instances = *
disabled = 1
interval = 15
useEnglishOnly = true
index=perfmon_disk_events
showZeroValue=1
# Regular Perfmon Data, stored in Metrics index.
[perfmon://LogicalDisk_Metric]
counters = % Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; % Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; % Idle Time; Split IO/Sec
object = LogicalDisk
instances = *
disabled = 1
interval = 15
useEnglishOnly = true
index=perfmon_disk_metrics
showZeroValue=1
sourcetype=Perfmon_To_Metric
# Perfmon MK Data, Stored in Events index.
[perfmon://LogicalDisk_MK_Event]
counters = % Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; % Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; % Idle Time; Split IO/Sec
object = LogicalDisk
instances = *
disabled = 1
interval = 15
useEnglishOnly = true
index=perfmon_mk_disk_events
mode=multikv
showZeroValue=1
# Perfmon MK Data, Stored in Metrics Index.
[perfmon://LogicalDisk_MK_MVMetric]
counters = % Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; % Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; % Idle Time; Split IO/Sec
object = LogicalDisk
instances = *
disabled = 1
interval = 15
mode=multikv
useEnglishOnly = true
index=perfmon_mk_disk_metrics_mk
showZeroValue=1
sourcetype=PerfmonMK_To_MetricMK_AUTO
[/su_spoiler] [su_spoiler title="props.conf" open="no" style="fancy" icon="plus" anchor="" class=""]
#Convert Regular Perfmon Event, into a Metric
[Perfmon_To_Metric]
TRANSFORMS-_value = value
TRANSFORMS-metric_name = perfmon_metric_name
TRANSFORMS-instance = instance
SEDCMD-remove-whitespace = s/ /_/g s/\s/ /g
#Convert Perfmon MK Event, into a multi-key Metric
[PerfmonMK_To_MetricMK_AUTO]
INDEXED_EXTRACTIONS = tsv
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = 1
category = Log To Metrics
pulldown_type = 1
METRIC-SCHEMA-TRANSFORMS = metric-schema:PerfmonMK_To_MetricMK_AUTO
TRANSFORMS-perfmonmk = perfmonmk:PerfmonMK_To_MetricMK_AUTO
[/su_spoiler] [su_spoiler title="transforms.conf" open="no" style="fancy" icon="plus" anchor="" class=""]
[value]
REGEX = .*Value=(\S+).*
FORMAT = _value::$1
WRITE_META = true
[perfmon_metric_name]
REGEX = .*object=(\S+).*counter=(\S+).*
FORMAT = metric_name::$1.$2 metric_type::$1
WRITE_META = true
[instance]
REGEX = .*instance=(\S+).*
FORMAT = instance::$1
WRITE_META = true
[metric-schema:PerfmonMK_To_MetricMK_AUTO]
METRIC-SCHEMA-MEASURES = _ALLNUMS_
[perfmonmk:PerfmonMK_To_MetricMK_AUTO]
REGEX = collection=\"?(?<collection>[^\"\n]+)\"?\ncategory=\"?(?<category>[^\"\n]+)\"?\nobject=\"?(?<object>[^\"\n]+)\"?\n([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t\n([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t\n
FORMAT = collection::"$1" category::"$2" object::"$3" "$4"::"$28" "$5"::"$29" "$6"::"$30" "$7"::"$31" "$8"::"$32" "$9"::"$33" "$10"::"$34" "$11"::"$35" "$12"::"$36" "$13"::"$37" "$14"::"$38" "$15"::"$39" "$16"::"$40" "$17"::"$41" "$18"::"$42" "$19"::"$43" "$20"::"$44" "$21"::"$45" "$22"::"$46" "$23"::"$47" "$24"::"$48" "$25"::"$49" "$26"::"$50" "$27"::"$51"
WRITE_META = true
[/su_spoiler] [su_spoiler title="indexes.conf" open="no" style="fancy" icon="plus" anchor="" class=""]
# Regular Perfmon Data, Events Index.
[perfmon_disk_events]
coldPath = $SPLUNK_DB\$_index_name\colddb
enableDataIntegrityControl = 0
enableTsidxReduction = 0
homePath = $SPLUNK_DB\$_index_name\db
maxTotalDataSizeMB = 512000
thawedPath = $SPLUNK_DB\$_index_name\thaweddb
# Regular Perfmon Data, Metrics Index.
[perfmon_disk_metrics]
coldPath = $SPLUNK_DB\$_index_name\colddb
datatype = metric
enableDataIntegrityControl = 0
enableTsidxReduction = 0
homePath = $SPLUNK_DB\$_index_name\db
maxTotalDataSizeMB = 512000
thawedPath = $SPLUNK_DB\$_index_name\thaweddb
# Perfmon MK Data, Events Index.
[perfmon_mk_disk_events]
coldPath = $SPLUNK_DB\$_index_name\colddb
enableDataIntegrityControl = 0
enableTsidxReduction = 0
homePath = $SPLUNK_DB\$_index_name\db
maxTotalDataSizeMB = 512000
thawedPath = $SPLUNK_DB\$_index_name\thaweddb
# Perfmon MK Data, Metrics Index.
[perfmon_mk_disk_metrics_mk]
coldPath = $SPLUNK_DB\$_index_name\colddb
enableDataIntegrityControl = 0
datatype = metric
enableTsidxReduction = 0
homePath = $SPLUNK_DB\$_index_name\db
maxTotalDataSizeMB = 512000
thawedPath = $SPLUNK_DB\$_index_name\thaweddb
[/su_spoiler] [/su_accordion]
Testing will be performed on a new install of Splunk Enterprise 8.0.1 on my workstation: 32 GB RAM, Xeon processor. (Don't worry, I am still trying to get ahold of a Ryzen....)
No additional or third-party apps are installed. Only the above configuration files were added to the fresh install.
The tests were started at 8:57am, and ended at 9:27am.
Data Collection Methods¶
Event Count¶
Count of events was obtained by recording the number displayed at http://localhost:8000/en-US/manager/search/data/indexes
Storage Usage¶
Storage utilization was obtained in Windows Explorer by manually going to C:\Program Files\Splunk\var\lib\splunk, right-clicking the folder for each index, and recording "Size on disk".
License Utilization¶
index=_internal source="C:\\Program Files\\Splunk\\var\\log\\splunk\\license_usage.log"
| stats sum(b) as Size by idx
| eval Size= Size/1024
Performance Testing¶
Performance tests will be done with a specific query for each index. Due to the limited amount of data (30 minutes, at a 15-second interval), there may not be enough data to do a "production" test. Tests will be run on the same timespan, from 9am to 9:30am. Each query will be run 5 times, and the average of the 5 query times will be recorded.
Here are the individual searches:
[su_accordion] [su_spoiler title="perfmon_disk_events" open="no" style="fancy" icon="plus" anchor="" class=""]
index=perfmon_disk_events instance="C:" counter="% Disk Read Time" | timechart span=15s avg(Value)
[/su_spoiler] [su_spoiler title="perfmon_mk_disk_events" open="no" style="fancy" icon="plus" anchor="" class=""]
index=perfmon_mk_disk_events | timechart span=15s avg(%_Disk_Read_Time)
[/su_spoiler] [su_spoiler title="perfmon_disk_metrics" open="no" style="fancy" icon="plus" anchor="" class=""]
| mstats avg(_value) WHERE metric_name="LogicalDisk.%_Disk_Read_Time" AND index="perfmon_disk_metrics" span=15s
[/su_spoiler] [su_spoiler title="perfmon_mk_disk_metrics_mk" open="no" style="fancy" icon="plus" anchor="" class=""]
| mstats avg(_value) WHERE metric_name="%_Disk_Read_Time" AND index="perfmon_mk_disk_metrics_mk" span=15s
[/su_spoiler] [/su_accordion]
Test Results - 30 Minutes¶
| Index | Event Count | Disk Size | License Usage |
|---|---|---|---|
| perfmon -> events | 10,856 | 572 KB | 1,431 KB |
| perfmon -> metrics | 10,856 | 516 KB | 1,508 KB |
| perfmon_mk -> events | 118 | 292 KB | 173 KB |
| perfmon_mk -> metrics_mk | 118 | 256 KB | 16 KB |
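The gap between the two event counts can be sanity-checked with a little arithmetic. The sketch below uses only the figures from the table and the 23 counters listed in each input stanza. It rests on two assumptions of mine that the post does not state: that each multikv collection produced exactly one event, and that regular perfmon produced one event per counter per instance.

```python
# Figures copied from the results table and the inputs.conf above.
perfmon_events = 10_856      # "perfmon -> events" event count
perfmon_mk_events = 118      # "perfmon_mk -> events" event count
counters_per_input = 23      # counters listed in each perfmon stanza

# Assumption: one multikv event per collection, so 118 is the sample count.
events_per_sample = perfmon_events / perfmon_mk_events

# Assumption: regular perfmon emits one event per counter per instance.
implied_instances = events_per_sample / counters_per_input

# 30 minutes at a 15-second interval allows at most this many collections.
max_samples = 30 * 60 // 15

print(f"events per sample: {events_per_sample:g}")
print(f"implied instances: {implied_instances:g}")
print(f"samples: {perfmon_mk_events} observed of {max_samples} possible")
```

Under those assumptions the numbers divide evenly, which suggests the regular perfmon inputs were writing one event per counter for each of four LogicalDisk instances, while the MK inputs wrote a single event per collection.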
PerfmonMK -> MetricsMK Statistics¶
| % License Decrease compared to Perfmon Events | 98% |
| % License Decrease compared to Perfmon MK | 90.7% |
| % Disk Usage Decrease compared to Perfmon Events | 55% |
| % Disk Usage Decrease compared to Perfmon MK | 12% |
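For anyone who wants to reproduce these percentages, here is a minimal sketch that recomputes them from the KB figures in the 30-minute results table. The function name `pct_decrease` and the dictionary keys are my own labels for illustration, not anything from Splunk.

```python
def pct_decrease(baseline_kb: float, new_kb: float) -> float:
    """Percentage by which new_kb is smaller than baseline_kb."""
    return (1 - new_kb / baseline_kb) * 100


# KB figures copied from the 30-minute results table.
license_kb = {"perfmon_events": 1431, "perfmon_mk_events": 173, "metrics_mk": 16}
disk_kb = {"perfmon_events": 572, "perfmon_mk_events": 292, "metrics_mk": 256}

for label, table in (("License", license_kb), ("Disk usage", disk_kb)):
    for baseline in ("perfmon_events", "perfmon_mk_events"):
        pct = pct_decrease(table[baseline], table["metrics_mk"])
        print(f"{label} decrease vs {baseline}: {pct:.1f}%")
```

The printed values are rounded to one decimal place, while the figures in the table above match the same calculation truncated, so the last digit can differ slightly.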
Performance Results¶
[su_spoiler title="Performance Testing - Raw Data" style="fancy"]
index=perfmon_disk_events instance="C:" counter="% Disk Read Time" | timechart span=15s avg(Value)
This search has completed and has returned 121 results by scanning 109 events in 0.132 seconds
This search has completed and has returned 121 results by scanning 109 events in 0.223 seconds
This search has completed and has returned 121 results by scanning 109 events in 0.136 seconds
This search has completed and has returned 121 results by scanning 109 events in 0.139 seconds
This search has completed and has returned 121 results by scanning 109 events in 0.122 seconds
index=perfmon_mk_disk_events | timechart span=15s avg(%_Disk_Read_Time)
This search has completed and has returned 121 results by scanning 109 events in 0.161 seconds
This search has completed and has returned 121 results by scanning 109 events in 0.159 seconds
This search has completed and has returned 121 results by scanning 109 events in 0.149 seconds
This search has completed and has returned 121 results by scanning 109 events in 0.142 seconds
This search has completed and has returned 121 results by scanning 109 events in 0.195 seconds
| mstats avg(_value) WHERE metric_name="LogicalDisk.%_Disk_Read_Time" AND index="perfmon_disk_metrics" span=15s
This search has completed and has returned 109 results by scanning 436 events in 0.079 seconds
This search has completed and has returned 109 results by scanning 436 events in 0.088 seconds
This search has completed and has returned 109 results by scanning 436 events in 0.081 seconds
This search has completed and has returned 109 results by scanning 436 events in 0.15 seconds
This search has completed and has returned 109 results by scanning 436 events in 0.087 seconds
| mstats avg(_value) WHERE metric_name=%_Disk_Read_Time AND index=perfmon_mk_disk_metrics_mk span=15s
This search has completed and has returned 109 results by scanning 109 events in 0.19 seconds
This search has completed and has returned 109 results by scanning 109 events in 0.076 seconds
This search has completed and has returned 109 results by scanning 109 events in 0.09 seconds
This search has completed and has returned 109 results by scanning 109 events in 0.158 seconds
This search has completed and has returned 109 results by scanning 109 events in 0.081 seconds
[/su_spoiler]
| Index Name | Average Speed (Seconds) |
|---|---|
| perfmon_disk_events | 0.1504 |
| perfmon_mk_disk_events | 0.1612 |
| perfmon_disk_metrics | 0.097 |
| perfmon_mk_disk_metrics_mk | 0.119 |
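The averages in this table can be recomputed from the five run times recorded for each search in the raw data. A minimal sketch, with the timings copied by hand from the raw results:

```python
from statistics import mean

# Run times in seconds, copied from the raw performance data.
timings = {
    "perfmon_disk_events": [0.132, 0.223, 0.136, 0.139, 0.122],
    "perfmon_mk_disk_events": [0.161, 0.159, 0.149, 0.142, 0.195],
    "perfmon_disk_metrics": [0.079, 0.088, 0.081, 0.15, 0.087],
    "perfmon_mk_disk_metrics_mk": [0.19, 0.076, 0.09, 0.158, 0.081],
}

# Average the five runs for each index.
averages = {index: mean(runs) for index, runs in timings.items()}
for index, avg in averages.items():
    print(f"{index}: {avg:.4f}")
```

With only five runs per search, a single slow run (for example the 0.19 second run against the metrics MK index) moves the average noticeably, which is worth keeping in mind when comparing the rows.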
Disclaimer: 30 minutes of data is not enough data to do a real-world comparison test.
If you want an accurate test, I would recommend searching at least one month of data in a production system. These tests were performed on my local machine, and are subject to variance caused by other processes running in the background.
My conclusion:
Metrics are faster than events. I will not give a percentage here, because I do not feel there is enough data to measure performance accurately.
Conclusions¶
In the original post, the method used to convert Perfmon MK events to metrics was a pretty old method introduced in the Splunk infrastructure app a few years back. After I made the post, Splunk's engineering team reached out to me and provided a lot of technical insight and documentation on the Metrics MK format.
After converting my tests to use the Metrics MK format, I am completely blown away by the reduction in licensing and disk usage. Compared to the PerfmonMK format I currently use in production, I can save over 90% on licensing, and over 10% on storage consumption, by switching to a much faster format that is also easier for users to ingest.
If you are interested in converting your perfmon data to metrics, I am in the process of finishing up a Python script which will automatically build out the props.conf and transforms.conf to do so, with no manual configuration adjustments required.
If you are interested in contributing to this project, please visit the GitHub page here.
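To give a rough idea of what generating that configuration could look like, here is a minimal sketch. It is not the script mentioned above. The function name `build_perfmonmk_transform` and its `columns` parameter are hypothetical, and the output simply mirrors the shape of the hand-written `perfmonmk:` stanza from the transforms.conf in this post: three header lines, one row of column names, and one row of values. That means it shares the limitation noted at the top of this page, since only a single row of values is captured.

```python
def build_perfmonmk_transform(sourcetype: str, columns: int) -> str:
    """Return a transforms.conf stanza for PerfmonMK data with `columns` fields.

    Hypothetical helper for illustration only.
    """
    # Three header lines: collection, category and object, optionally quoted.
    header = "".join(
        rf'{key}=\"?(?<{key}>[^\"\n]+)\"?\n'
        for key in ("collection", "category", "object")
    )

    # One capture group per tab-terminated cell; each row ends with a newline.
    row = r'([^\t]+)\t' * columns + r'\n'
    regex = header + row + row  # first row: field names, second row: values

    first_name = 4                      # groups 1-3 are the header fields
    first_value = first_name + columns  # value groups follow the name groups
    pairs = " ".join(
        f'"${first_name + i}"::"${first_value + i}"' for i in range(columns)
    )
    fmt = f'collection::"$1" category::"$2" object::"$3" {pairs}'

    return "\n".join([
        f"[perfmonmk:{sourcetype}]",
        "WRITE_META = true",
        f"REGEX = {regex}",
        f"FORMAT = {fmt}",
    ])


# 24 columns matches the 24 name/value pairs in the FORMAT line of this post.
stanza = build_perfmonmk_transform("PerfmonMK_To_MetricMK_AUTO", columns=24)
print(stanza)
```

Generating the stanza from a column count avoids hand-counting capture groups, which is the easiest place to make a mistake when a counter is added to or removed from the input.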
My two cents: if you are not in the process of converting your data to MetricsMK, you should be! I cannot express how much better the performance, license usage, and disk usage are compared to the out-of-the-box perfmon format.
Documentation¶
Metrics Overview: https://docs.splunk.com/Documentation/Splunk/8.0.1/Metrics/Overview
Using Multi-Value Metrics: https://docs.splunk.com/Documentation/Splunk/8.0.1/Metrics/GetMetricsInOther
Log to Metrics Overview: https://docs.splunk.com/Documentation/Splunk/8.0.1/Metrics/L2MOverview
Special Thanks¶
I have received a lot of assistance from the Splunk team in putting this article together, and I would like to acknowledge it here.
- David Maislin @ Splunk has greatly assisted with issues related to Metrics, and has provided a lot of recommendations on putting together this content.
- (More coming after I get their permission to post their names.)