Overview
A sudden and sustained increase in CPU utilization was observed on two SMSC RTR nodes (RTR01 and RTR04). This increase was identified via internal monitoring tools and confirmed by performance logs and system diagnostics. The abnormal CPU usage was not linked to any configuration or software changes within the Lithium platform. Initial system logs indicated abnormal behavior commencing late evening on April 23 and early morning April 24, prompting further investigation.
Solution
The root cause of the high CPU usage was identified as an external process called xagt, which is associated with a third-party security agent developed by FireEye (now Trellix). This agent is not a component of the NewNet Lithium system.
Steps and Observations:
- Initial Diagnostics:
  - CPU increase was reported to be abrupt, peaking around 45%.
  - The issue was observed across the day with no specific time pattern.
  - No changes were made to the affected systems on or around April 23 or 24.
- Log Analysis:
  - Syslog entries showed relevant activity aligning with the CPU spike:
    - RTR01: Significant activity started at 8 PM on April 23.
    - RTR04: Similar activity began at 3 AM on April 24.
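One way to locate onset times like those noted in the log analysis is to count syslog lines per hour and look for the hour where volume jumps. The sketch below assumes the traditional syslog timestamp layout (`Mon DD HH:MM:SS ...`) and input in chronological order. The function name `lines_per_hour` and the log path in the usage comment are illustrative assumptions, not details from the original investigation.

```shell
#!/bin/sh
# Count syslog lines per hour. Reads log lines on stdin.
# Assumes each line starts with "Mon DD HH:MM:SS" and that input is in time order,
# so identical hour keys are adjacent and `uniq -c` can count them.
lines_per_hour() {
    awk '{ print $1, $2, substr($3, 1, 2) ":00" }' | uniq -c
}

# Example (the path is an assumption; adjust to the node's syslog location):
#   lines_per_hour < /var/log/messages
```

A sharp rise in the per-hour count between two adjacent rows gives a rough start time that can be compared against the CPU graphs.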
- Process Identification:
  - Using top command output captured during periods of high CPU usage, xagt was identified as the top resource-consuming process.
  - This process is not part of the Lithium or SMSC software stack.
  - To check CPU usage and resident memory (RSS) usage, execute the commands below:
    top
    ps aux --sort -rss
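To capture evidence outside an interactive top session, the process check can be scripted. The following is a minimal sketch, assuming a procps-style `ps` that accepts `-eo pid=,pcpu=,rss=,comm=`. The helper names `top_cpu` and `snapshot` are hypothetical. On procps, `ps` reports CPU as an average over the process lifetime, so the figure may read lower than the instantaneous value shown in top.

```shell
#!/bin/sh
# Rank processes by CPU share. Reads "PID %CPU RSS_KB COMMAND" lines on stdin
# and prints the N lines with the highest %CPU (default 5).
# LC_ALL=C keeps "." as the decimal separator for the numeric sort.
top_cpu() {
    LC_ALL=C sort -k2,2nr | head -n "${1:-5}"
}

# Take one snapshot from the live system (assumes procps `ps`).
snapshot() {
    ps -eo pid=,pcpu=,rss=,comm= | top_cpu "${1:-5}"
}

# Example: log the top three consumers on one timestamped line.
#   echo "$(date '+%F %T') $(snapshot 3 | tr '\n' ';')"
```

Running the example line on a schedule while the spike is active would give a simple record of which process leads the list over time.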
- Conclusion and Recommendation:
  - The xagt agent is a FireEye (Trellix) endpoint protection or monitoring service.
  - Its high CPU usage suggests possible misconfiguration, resource contention, or malfunction.
  - To resolve this issue, coordination with the system or OS administration team is necessary to:
    - Review the behavior of the xagt service.
    - Evaluate recent updates or configuration changes to this agent.
    - Consider disabling or reconfiguring the agent to mitigate its impact on system performance.
  - The same steps apply to any other third-party agent causing CPU issues.
Randall Shawver