Complete Latency Troubleshooting Command Reference

How to Read This Guide: Each command shows the actual output you’ll see on your system. The green/red examples below each command show real outputs – green means your system is optimized for low latency, red means there are problems that will cause latency spikes. Compare your actual output to these examples to quickly identify issues.

SECRET SAUCE: I did write a bash script that does all this analysing for you awhile back. Been meaning to push to my repos.

Its sitting in one my 1000’s of text files of how to do’s. 😁. Im sure you all have those…..more to come…

System Information Commands

uname -a

uname -a

Flags:

  • -a: Print all system information

Example Output:

Linux trading-server 5.15.0-rt64 #1 SMP PREEMPT_RT Thu Mar 21 13:30:15 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

What to look for: PREEMPT_RT indicates real-time kernel is active

✓ GOOD OUTPUT (real-time kernel):
Linux server 5.15.0-rt64 #1 SMP PREEMPT_RT Thu Mar 21 13:30:15 UTC 2024
Shows “PREEMPT_RT” = real-time kernel for predictable latency
✗ BAD OUTPUT (standard kernel):
Linux server 5.15.0-generic #1 SMP Thu Mar 21 13:30:15 UTC 2024
Shows “generic” with no “PREEMPT_RT” = standard kernel with unpredictable latency

Performance Profiling Commands

perf stat

perf stat [options] [command]

Key flags:

  • -e <events>: Specific events to count
  • -a: Monitor all CPUs
  • -p <pid>: Monitor specific process

Example Usage & Output:

perf stat -e cycles,instructions,cache-misses,branch-misses ./trading_app

 Performance counter stats for './trading_app':

     4,234,567,890      cycles                    #    3.456 GHz
     2,987,654,321      instructions              #    0.71  insn per cycle
        45,678,901      cache-misses              #   10.789 % of all cache refs
         5,432,109      branch-misses             #    0.234 % of all branches

What to look for: Instructions per cycle (should be >1), cache miss rate (<5% is good), branch miss rate (<1% is good)

✓ GOOD OUTPUT:
2,987,654,321 instructions # 2.15 insn per cycle
45,678,901 cache-misses # 3.2 % of all cache refs
5,432,109 branch-misses # 0.8 % of all branches
Why: Good = >2.0 IPC (CPU efficient), <5% cache misses, <1% branch misses.

✗ BAD OUTPUT:
1,234,567,890 instructions # 0.65 insn per cycle
156,789,012 cache-misses # 15.7 % of all cache refs
89,432,109 branch-misses # 4.2 % of all branches

Why: Bad = <1.0 IPC (CPU starved), >10% cache misses, >4% branch misses.


eBPF Tools

Note: eBPF tools are part of the BCC toolkit. Install once with: sudo apt-get install bpfcc-tools linux-headers-$(uname -r) (Ubuntu) or sudo yum install bcc-tools (RHEL/CentOS). After installation, these become system-wide commands.

funclatency

sudo funclatency [options] 'function_pattern'

Key flags:

  • -p <pid>: Trace specific process
  • -u: Show in microseconds instead of nanoseconds

Example Output:

sudo funclatency 'c:malloc' -p 1234 -u

     usecs               : count     distribution
         0 -> 1          : 1234     |****************************************|
         2 -> 3          : 567      |******************                      |
         4 -> 7          : 234      |*******                                 |
         8 -> 15         : 89       |**                                      |
        16 -> 31         : 23       |                                        |
        32 -> 63         : 5        |                                        |

What to look for: Long tail distributions indicate inconsistent performance

✓ GOOD OUTPUT (consistent performance):

usecs : count distribution
0 -> 1 : 4567 |****************************************|
2 -> 3 : 234 |** |
4 -> 7 : 12 | |
Why: Good shows 95%+ calls in 0-3μs (predictable).

✗ BAD OUTPUT (inconsistent performance):
usecs : count distribution
0 -> 1 : 1234 |****************** |
2 -> 3 : 567 |******** |
4 -> 7 : 234 |*** |
8 -> 15 : 189 |** |
16 -> 31 : 89 |* |
32 -> 63 : 45 | |

Why: Bad shows calls scattered across many latency ranges (unpredictable).


Network Monitoring Commands

netstat -i

netstat -i

Example Output:

Kernel Interface table
Iface      MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0      1500  1234567      0      0 0       987654      0      0      0 BMRU
lo       65536    45678      0      0 0        45678      0      0      0 LRU

What to look for:

  • RX-ERR, TX-ERR: Hardware errors
  • RX-DRP, TX-DRP: Dropped packets (buffer overruns)
  • RX-OVR, TX-OVR: FIFO overruns
✓ GOOD OUTPUT:
eth0 1500 1234567 0 0 0 987654 0 0 0 BMRU
Why: Good = all error/drop counters are 0.

✗ BAD OUTPUT:
eth0 1500 1234567 5 1247 23 987654 12 89 7 BMRU

Why:Bad
= RX-ERR=5, RX-DRP=1247, TX-ERR=12, TX-DRP=89 means network problems causing packet loss and latency spikes.


CPU and Memory Analysis

vmstat 1

vmstat [delay] [count]

Example Output:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 789456  12345 234567    0    0     0     5 1234 2345  5  2 93  0  0
 0  0      0 789234  12345 234678    0    0     0     0 1456 2567  3  1 96  0  0

What to look for:

  • r: Running processes (should be ≤ CPU count)
  • si/so: Swap in/out (should be 0)
  • cs: Context switches per second (lower is better for latency)
  • wa: I/O wait percentage (should be low)
✓ GOOD OUTPUT (8-CPU system):
procs -----memory------ ---swap-- --system-- ------cpu-----
r b si so in cs us sy id wa st
2 0 0 0 1234 2345 5 2 93 0 0
Why: Good: r=2 (≤8 CPUs), si/so=0 (no swap), cs=2345 (low context switches), wa=0 (no I/O wait).

✗ BAD OUTPUT (8-CPU system):
procs -----memory------ ---swap-- --system-- ------cpu-----
r b si so in cs us sy id wa st
12 1 45 67 8234 15678 85 8 2 15 0

Why Bad: r=12 (>8 CPUs = overloaded), si/so>0 (swapping = latency spikes), cs=15678 (high context switches), wa=15 (I/O blocked).


Interpreting the Results

Good Latency Indicators:

  • perf stat: >2.0 instructions per cycle
  • Cache misses: <5% of references
  • Branch misses: <1% of branches
  • Context switches: <1000/sec per core
  • IRQ latency: <10 microseconds
  • Run queue length: Mostly 0
  • No swap activity (si/so = 0)
  • CPUs at max frequency
  • Temperature <80°C

Red Flags:

  • Instructions per cycle <1.0
  • Cache miss rate >10%
  • High context switch rate (>10k/sec)
  • IRQ processing >50us
  • Consistent run queue length >1
  • Any swap activity
  • CPU frequency scaling active
  • Memory fragmentation (no high-order pages)
  • Thermal throttling events

This reference guide provides the foundation for systematic latency troubleshooting – use the baseline measurements to identify problematic areas, then dive deeper with the appropriate tools!

Leave a Reply

Your email address will not be published. Required fields are marked *

0