
Node Down Because Of /var Reaching 100% Disk Space

Overview

You may have noticed alarms reporting that an STP node is down. The problem occurs when the /var file system reaches 100% disk usage. In syslog, you may find these two errors:

  • Audit daemon is low on disk space for logging

  • The audit daemon is now halting the system

 

Solution

Both errors have the same cause: the /var file system has run out of free space, so the audit daemon can no longer write its logs and halts the server. Start by removing unwanted files from this partition to free up space.
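To decide what to remove, it helps to see how full the file system is and which entries take the most space. The following is a minimal sketch using standard POSIX tools; the function names `fs_usage_pct` and `largest_entries` are illustrative and not part of the product.

```shell
#!/bin/sh
# Print the usage percentage (digits only) of the file system holding a path.
fs_usage_pct() {
    # df -P prints one line per file system; column 5 is the capacity, e.g. "97%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# List the largest entries directly under a directory, biggest first.
# $1 = directory, $2 = number of entries to show (defaults to 10).
largest_entries() {
    # -x stays on one file system, -s summarizes each entry, -k reports KiB.
    du -xsk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Example (run as root so that every directory is readable):
#   fs_usage_pct /var
#   largest_entries /var 10
```

Review the listed entries before deleting anything; rotated or compressed logs under /var/log are usually the safest candidates.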

The audit daemon is probably flooding the /var/log/messages file, which consumes space on the /var partition and leads to the issue. If that is the case, temporarily stop the audit service on all servers to prevent it from shutting down any other node. To do so, run the following commands as root:

  • systemctl stop auditd
  • systemctl disable auditd
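Before stopping the service, you may want to confirm that audit messages really dominate the log. The sketch below estimates the share of lines in a log file that mention "audit"; the function name `audit_share_pct` is illustrative, and a simple case-insensitive match is assumed to be a good enough indicator.

```shell
#!/bin/sh
# Print the percentage of lines in a log file that mention "audit".
# $1 = path to the log file, for example /var/log/messages.
audit_share_pct() {
    total=$(( $(wc -l < "$1") ))
    if [ "$total" -eq 0 ]; then
        echo 0
        return 0
    fi
    # grep -c prints the number of matching lines (0 when nothing matches).
    hits=$(grep -ci 'audit' "$1")
    echo $(( hits * 100 / total ))
}

# Example:
#   audit_share_pct /var/log/messages
```

A high percentage supports the flooding hypothesis. On some RHEL-family releases, `systemctl stop auditd` may be refused by the unit configuration; in that case `service auditd stop` is the usual alternative.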

Please note that the application raises no alarm when disk usage reaches a critical threshold. Disk space monitoring must be handled at the OS level, because the application does not manage OS traps. To create the OS trap manually, follow these steps:

  1. Update /etc/sysconfig/snmpd as follows:
    OPTIONS="-Ln -I -smux -p /var/run/snmpd.pid -c /etc/snmp/snmpd.conf udp:11114 udp:161"
  2. Edit /etc/sysconfig/snmptrapd as follows:
    OPTIONS=-Lsd -m ALL -F "%02.2h:%02.2j:%02.2k TRAP%w.%q %v from %B" 11173 162
  3. Make sure disableAuthorization is set as follows in /etc/snmp/snmptrapd.conf:
    disableAuthorization yes
  4. Edit the /etc/snmp/snmpd.conf file as follows, so that the traps are also sent to a remote node instead of appearing only in /var/log/messages on the local host (replace [remote_IP_address] with the IP address of the node that should receive the trap, and set the threshold to the desired value):
    ###########################################################################
    # snmpd.conf
    ###########################################################################
    
    rocommunity public
    rwcommunity private
    
    syslocation test
    syscontact mbalance
    
    sysservices 78
    trapsink 127.0.0.1 public
    
    trapcommunity public
    
    authtrapenable 1
    
    createUser internal MD5 "snmp1234"
    iquerySecName internal
    rouser internal
    
    defaultMonitors yes
    monitor -r 1 -u internal -o dskErrorMsg "Disk Usage Error" dskErrorFlag != 0
    
    trap2sink [remote_IP_address] public 162
    
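    # Minimum free space per disk: 30% free raises the error at 70% usage.
    # Replace 30% with 100 minus the usage threshold you want.
    includeAllDisks 30%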
    
  5. Restart the SNMPD service: systemctl restart snmpd
  6. Restart the SNMPTrapD service: systemctl restart snmptrapd
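The threshold in step 4 deserves care: net-snmp reads the includeAllDisks percentage as the minimum free space, not as the usage level. The sketch below converts a desired usage threshold into the corresponding directive; the function name `include_all_disks_line` is illustrative.

```shell
#!/bin/sh
# Print the includeAllDisks directive for a desired usage threshold.
# $1 = usage percentage (1-99) at which the disk error should be raised.
include_all_disks_line() {
    used=$1
    case "$used" in
        ''|*[!0-9]*)
            echo "usage threshold must be an integer between 1 and 99" >&2
            return 1
            ;;
    esac
    if [ "$used" -lt 1 ] || [ "$used" -gt 99 ]; then
        echo "usage threshold must be an integer between 1 and 99" >&2
        return 1
    fi
    # includeAllDisks takes the minimum FREE percentage, hence 100 - used.
    echo "includeAllDisks $(( 100 - used ))%"
}

# Example:
#   include_all_disks_line 70
```

After both services are restarted, a query such as `snmpwalk -v 2c -c public localhost UCD-SNMP-MIB::dskTable` should list the monitored disks, assuming the `public` community from the configuration above and that the net-snmp command-line utilities are installed.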

Make sure the OS MIBs listed below are installed on the supervision system:

  • HOST-RESOURCES-MIB.txt
  • HOST-RESOURCES-TYPES.txt
  • UCD-DISKIO-MIB.txt
  • RFC1213-MIB
  • UCD-SNMP-MIB
  • DISMAN-EVENT-MIB

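The MIB files can be extracted directly from the servers under /usr/local/share/snmp/mibs. This configuration sends an SNMP trap when disk usage crosses the configured threshold. Note that net-snmp interprets the includeAllDisks percentage as the minimum free space, so a trap at 70% usage corresponds to a value of 30%.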
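To check which of the MIBs listed above are still missing from a directory, a small helper like the following can be used. It is only a sketch: the function name `missing_mibs` is illustrative, and it assumes each MIB is stored as a file named after the module, with or without a .txt suffix.

```shell
#!/bin/sh
# Print the name of every required MIB that is not present in a directory.
# $1 = directory holding the MIB files.
missing_mibs() {
    dir=$1
    for mib in HOST-RESOURCES-MIB HOST-RESOURCES-TYPES UCD-DISKIO-MIB \
               RFC1213-MIB UCD-SNMP-MIB DISMAN-EVENT-MIB; do
        # Accept the file with or without the .txt suffix.
        if [ ! -e "$dir/$mib" ] && [ ! -e "$dir/$mib.txt" ]; then
            echo "$mib"
        fi
    done
}

# Example:
#   missing_mibs /usr/local/share/snmp/mibs
```

An empty output means that all six MIBs were found in the given directory.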

  1. Priyanka Bhotika
