Overview
The error "DAL_open_env BDB error: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery" occurs when attempting to start the tp_ams process on your node. The usual cause is a corrupted Berkeley DB environment, for example after an unclean shutdown or a disk problem. It is resolved by applying the database recovery steps described under Resolution Steps and then restarting the AMS process.
Information
Error Message: "DAL_open_env BDB error: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery"
Cause: The error indicates that the Berkeley DB environment is corrupted or was left in a bad state, potentially due to unclean shutdowns, concurrent writers, or disk/permission issues.
Resolution Steps:
- Verify System Status:
  - Check for any recent crashes or power losses.
  - Ensure there is sufficient free disk space and inodes using df -h and df -i.
  - Confirm that file permissions have not changed unexpectedly.
- Apply Database Recovery Steps:
  mkdir /data/Backup1/log_replica
  mkdir /data/Backup1/store_replica
  mkdir /data/Backup1/log_master
  mkdir /data/Backup1/store_master
  mv /dbamslog/master/* /data/Backup1/log_master
  mv /dbamsstore/master/* /data/Backup1/store_master
  mv /dbamslog/replica/* /data/Backup1/log_replica
  mv /dbamsstore/replica/* /data/Backup1/store_replica
  chmod 777 /data
  chmod -R 777 /dbamsstore
  chmod -R 777 /dbamslog
  chown textpass:textpass /dbamsstore/master/
  chown textpass:textpass /dbamsstore/replica/
  chown textpass:textpass /dbamslog/replica/
  chown textpass:textpass /dbamslog/master/
- Restart the AMS Process:
  tp_start --tp_ams
- Verify Resolution:
  - Check the status of the tp_ams process using tp_status to ensure it is operating correctly.
Note: If the issue persists, check with the infrastructure team for underlying hardware or hypervisor issues that might affect the filesystem.
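The checks in the Verify System Status step can be scripted along the following lines. This is a minimal sketch, not part of the product: the function names usage_pct and check_dir are hypothetical, and the 90% threshold is an assumed value rather than a documented limit. It relies only on df, awk, and ls.

```shell
#!/bin/sh
# Pre-flight check sketch for the AMS database directories.
# THRESHOLD (percent) is an assumed value, not a product recommendation.
THRESHOLD=${THRESHOLD:-90}

# usage_pct DIR FLAG: print the usage percentage of the filesystem holding DIR.
# FLAG is -k for disk blocks or -i for inodes.
usage_pct() {
    df -P "$2" "$1" 2>/dev/null | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# check_dir DIR: report a missing directory, high block or inode usage,
# and the current owner and mode. Returns 1 if something needs attention.
check_dir() {
    dir=$1
    rc=0
    if [ ! -d "$dir" ]; then
        echo "MISSING $dir"
        return 1
    fi
    for flag in -k -i; do
        pct=$(usage_pct "$dir" "$flag")
        case $pct in
            ''|*[!0-9]*)
                # Some filesystems do not report inode usage as a number.
                echo "UNKNOWN usage for $dir (df $flag)" ;;
            *)
                if [ "$pct" -ge "$THRESHOLD" ]; then
                    echo "HIGH $dir: ${pct}% used (df $flag)"
                    rc=1
                fi ;;
        esac
    done
    ls -ld "$dir"   # owner and mode, to compare against the expected ownership
    return $rc
}
```

A possible invocation, using the directories named in the recovery steps: for d in /dbamslog /dbamsstore /data; do check_dir "$d"; done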
Frequently Asked Questions
- How do I know if this error applies to my situation?
- You will see the error message "DAL_open_env BDB error: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery" when attempting to start the tp_ams process.
- What should I do if the recovery steps do not resolve the issue?
- Ensure there are no underlying hardware or hypervisor issues affecting the filesystem. If the problem persists, contact support for further assistance.
- Are the recovery steps valid every time this issue occurs?
- Yes, the recovery steps are generally low risk and should resolve the issue. However, the AMS may need to rebuild messages from replica nodes, which could result in some message loss depending on the state at the time of the incident.
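If the procedure has to be repeated, the fixed /data/Backup1 target from the recovery steps will already exist and may still hold files from the earlier incident. The sketch below shows one way to move the environment files aside into a backup directory passed as a parameter. The function names move_aside and backup_env are hypothetical, and the script assumes a find that supports -mindepth and -maxdepth (GNU or BSD). It covers only the mkdir and mv steps; the chmod and chown steps from the article still apply afterwards.

```shell
#!/bin/sh
# Sketch: move Berkeley DB environment files aside into a chosen backup root.
# Function names are illustrative assumptions, not part of the product.

# move_aside SRC_DIR DEST_DIR: move every entry in SRC_DIR into DEST_DIR.
move_aside() {
    src=$1
    dest=$2
    if [ ! -d "$src" ]; then
        echo "skip: $src is not a directory" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # find copes with an empty directory and with dotfiles,
    # both of which a plain "mv $src/*" would mishandle.
    find "$src" -mindepth 1 -maxdepth 1 -exec mv {} "$dest"/ \;
}

# backup_env BACKUP_ROOT LOG_ROOT STORE_ROOT: move the master and replica
# subdirectory contents of both roots, mirroring the layout in the article
# (log_master, store_master, log_replica, store_replica).
backup_env() {
    backup_root=$1
    log_root=$2
    store_root=$3
    for role in master replica; do
        move_aside "$log_root/$role" "$backup_root/log_$role" || return 1
        move_aside "$store_root/$role" "$backup_root/store_$role" || return 1
    done
}
```

A possible invocation with a timestamped backup directory: backup_env "/data/Backup_$(date +%Y%m%d_%H%M%S)" /dbamslog /dbamsstore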
Mohammed Amer