Monitoring

now browsing by category

 

HOW TO CHECK CPU, MEMORY, & DISKS THRESHHOLDS on an ARRAY of HOSTS.

So I was tinkering around as usual. I thought this will come in handy for other engineers

If you a large cluster of servers that can suddenly over night loose all its MEM,CPU,DISK due to the nature of your businesses. Its difficult to monitor that from a GUI and on an array of hosts more often  than not.

Cloud Scenario……

Say you find a node that is dying because too many clients are using resources and you need migrate instances off to another node, only you don’t know which nodes have the needed resources without having to go look at all the nodes individually.

This tends be every engineers pain point. So I decide to come up with quick easy solution for emergency situations, where you don’t have time to sifting through alert systems that only show you data on a per host basis, that tend to load very slowly.

This bash script will check the CPU, MEM, DISK MOUNTS (including NFS) and tell which ones are okay and which ones are

CPU – calculated by the = 100MaxThrottle – Cpu-idle = CPU-usage
note: it also creates a log /opt/cpu.log on each host

MEM – calculate by Total Mem / Used Memory * 100 = Percentage of Used Memory
note: it also creates a log /opt/mem.log on each host

Disk – Any mount that reaches the warn threshold… COMPLAIN

.

Now, itemised the bash script so you can just comment out item you don’t want to use at the bottom of the script if you wanted to say just check CPU/MEM

#Written By Nick Tailor

#!/bin/bash

now=`date -u -d”+8 hour” +’%Y-%m-%d %H:%M:%S’`

#cpu use threshold

cpu_warn=’75’

#disk use threshold

disk_warn=’80’

#—cpu

item_cpu () {

cpu_idle=`top -b -n 1 | grep Cpu | awk ‘{print $8}’|cut -f 1 -d “.”`

cpu_use=`expr 100 – $cpu_idle`

echo “now current cpu utilization rate of $cpu_use $(hostname) as on $(date)” >> /opt/cpu.log

if [ $cpu_use -gt $cpu_warn ]

then

echo “cpu warning!!! $cpu_use Currently HIGH $(hostname)”

else

echo “cpu ok!!! $cpu_use% use Currently LOW $(hostname)”

fi

}

#—mem

item_mem () {

#MB units

LOAD=’80.00′

mem_free_read=`free -h | grep “Mem” | awk ‘{print $4+$6}’`

MEM_LOAD=`free -t | awk ‘FNR == 2 {printf(“%.2f%”), $3/$2*100}’`

echo “Now the current memory space remaining ${mem_free_read} GB $(hostname) as on $(date)” >> /opt/mem.log

if [[ $MEM_LOAD > $LOAD ]]

then

echo “$MEM_LOAD not good!! MEM USEAGE is HIGH – Free-MEM-${mem_free_read}GB $(hostname)”

else

echo “$MEM_LOAD ok!! MEM USAGE is beLOW 80% – Free-MEM-${mem_free_read}GB $(hostname)”

fi

}

#—disk

item_disk () {

df -H | grep -vE ‘^Filesystem|tmpfs|cdrom’ | awk ‘{ print $5 ” ” $1 }’ | while read output;

do

echo $output

  usep=$(echo $output | awk ‘{ print $1}’ | cut -d’%’ -f1 )

partition=$(echo $output | awk ‘{ print $2 }’ )

if [ $usep -ge $disk_warn ]; then

echo “AHH SHIT!, MOVE SOME VOLUMES IDIOT…. \”$partition ($usep%)\” on $(hostname) as on $(date)”

fi

done

}

item_cpu

item_mem

#item_disk – This is so you can comment out whole sections of the script without having to do the whole section by individual lines.

Now the cool part.

Now if you have a centrally managed jump host that allows you to get out from your estate. Ideally you would want to setup ssh keys on the hosts and ensure you have sudo permissions on the those hosts.

We want to loop this script through an array of hosts and have it run and then report back all the findings in once place. This is extremely handy if your in resource crunch.

This assumes you have SSH KEYS SETUP & SUDO for your user setup.

Create the script

1.On your jump host as your “user” not root
a.vi coolchecks.sh
b.Copy the above code and paste
c.Save the file
2.Next chmod the permission to executable
d.chmod +x coolcheck.sh

Next

3.Create a servers.txt file
e.vi servers.txt
f.List out servers in a column

Server1
Server2

Server3

Server4

g.Save the file.
4.Now we want to loop that through the list of servers and then have it spit out the results and pipe the information to a file on the jumps host.

Run your forloop with ssh keys and sudo already setup.

.

1.for HOST in $(cat servers.txt); do ssh $HOST “sudo bash -s” < coolcheck.sh; done 2>&1 | tee -a cpumem.status.DEV

Logfile – cpumem.status.DEVwill be the log file that has all the info

Output:

cpu ok!!! 3% use Currently dev1.nicktailor.com

17.07% ok!! MEM USAGE is beLOW 80% – Free-MEM-312.7GB dev1.nicktailor.com

5% /dev/mapper/VolGroup00-root

3% /dev/sda2

5% /dev/sda1

1% /dev/mapper/VolGroup00-var_log

72% 192.168.1.101:/data_1

28% 192.168.1.102:/data_2

80% 192.168.1.103:/data_3

AHH SHIT!, MOVE SOME VOLUMES IDIOT…. “192.168.1.104:/data4 (80%)” on dev1.nicktailor.com as on Fri Apr 30 11:55:16 EDT 2021

.

Okay so now I’m gonna show you a dirty way to do it, because im just dirty. So say your in horrible place that doesn’t use keys, because they’re waiting to be hacked by password. 😛

.

DIRTY WAY – So this assumes you have sudo permissions on the hosts.

Note: I do not recommend doing this way if you are a newb. Doing it this way will basically log your password in the bash history and if you don’t know how to clean up after yourself, well………………….you’re going to get owned.

I’m only showing you this because some cyber security “folks” believe that not using keys is easier to deal with in some parallel realities iv visited… You can do the exact same thing above, without keys. But leave massive trail behind you. Hence why you should use secure keys with passwords.

.

Not Recommended for Newbies:
Forloop AND passing your ssh password inside it.

2.for HOST in $(cat servers.txt); do sshpass -p’SHHPASSWORD!‘ ssh -o ‘StrictHostKeyChecking no’ -p 22 $HOST “sudo bash -s” < coolcheck.sh; done 2>&1 | tee -a cpumem.status.DEV

.

Log file – cpumem.status.DEVwill be the log file that has all the info

Output:

cpu ok!!! 3% use Currently dev1.nicktailor.com

17.07% ok!! MEM USAGE is beLOW 80% – Free-MEM-312.7GB dev1.nicktailor.com

5% /dev/mapper/VolGroup00-root

3% /dev/sda2

5% /dev/sda1

1% /dev/mapper/VolGroup00-var_log

72% 192.168.1.101:/data_1

28% 192.168.1.102:/data_2

80% 192.168.1.103:/data_3 

AHH SHIT!, MOVE SOME VOLUMES IDIOT…. “192.168.1.104:/data4 (80%)” on dev1.nicktailor.com as on Fri Apr 30 11:55:16 EDT 2021

.