
Resolving Berkeley DB Environment Corruption for tp_ams Process

Overview

The error "DAL_open_env BDB error: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery" occurs when you try to start the tp_ams process on a node. It is typically caused by corruption of the Berkeley DB environment, for example after an unclean shutdown or a disk problem. The fix is to move the damaged environment files aside to a backup location, reset permissions, and restart the AMS process.

Information

Error Message: "DAL_open_env BDB error: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery"

Cause: The error indicates that the Berkeley DB environment is corrupted or was left in an inconsistent state. Typical causes are unclean shutdowns, concurrent writers, and disk or permission problems.

Resolution Steps:

  1. Verify System Status:
    • Check for any recent crashes or power losses.
    • Ensure the filesystems holding /dbamslog, /dbamsstore, and /data have sufficient free disk space and inodes (check with df -h and df -i).
    • Confirm that file permissions have not changed unexpectedly.
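  2. Apply Database Recovery Steps:
    Move the existing Berkeley DB log and store files aside to a backup location, then reset permissions and ownership:

    # Create the backup directories (-p avoids an error if they already exist;
    # on a repeat incident, use a new directory so earlier backups are not overwritten)
    mkdir -p /data/Backup1/log_master /data/Backup1/store_master
    mkdir -p /data/Backup1/log_replica /data/Backup1/store_replica
    
    # Move the current master and replica files out of the environment directories
    mv /dbamslog/master/* /data/Backup1/log_master
    mv /dbamsstore/master/* /data/Backup1/store_master
    mv /dbamslog/replica/* /data/Backup1/log_replica
    mv /dbamsstore/replica/* /data/Backup1/store_replica
    
    # Reset permissions and return the now-empty directories to the textpass user
    chmod 777 /data
    chmod -R 777 /dbamsstore
    chmod -R 777 /dbamslog
    chown textpass:textpass /dbamsstore/master/
    chown textpass:textpass /dbamsstore/replica/
    chown textpass:textpass /dbamslog/replica/
    chown textpass:textpass /dbamslog/master/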
  3. Restart the AMS Process:
    tp_start --tp_ams
  4. Verify Resolution:
    • Run tp_status and confirm that the tp_ams process is reported as running.
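Steps 1 and 2 can also be wrapped in a small script so that the disk check and the file moves are repeatable. The following is a minimal sketch, not a vendor-supplied tool: the function names check_fs_headroom and move_bdb_env_aside, the example thresholds, and the timestamped backup directory are hypothetical, and the sketch assumes a Linux host with GNU coreutils df. The chmod and chown commands from step 2 still need to be run afterwards.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for steps 1 and 2 (assumes Linux, GNU coreutils df).
set -eu

# check_fs_headroom PATH MIN_KB MIN_INODES
# Returns non-zero if the filesystem holding PATH has fewer than MIN_KB KiB
# available or fewer than MIN_INODES free inodes.
check_fs_headroom() {
  # -P: one output line per filesystem; column 4 is "Available" in KiB (-k)
  avail_kb=$(df -Pk -- "$1" | awk 'NR == 2 { print $4 }')
  # With -i, column 4 is the number of free inodes instead
  free_inodes=$(df -Pi -- "$1" | awk 'NR == 2 { print $4 }')
  if [ "$avail_kb" -lt "$2" ]; then
    echo "low disk space on $1: $avail_kb KiB available" >&2
    return 1
  fi
  # Some filesystems do not report inode counts; compare only numeric values
  case $free_inodes in
    '' | *[!0-9]*) ;;
    *)
      if [ "$free_inodes" -lt "$3" ]; then
        echo "low inodes on $1: $free_inodes free" >&2
        return 1
      fi
      ;;
  esac
}

# move_bdb_env_aside ROOT BACKUP_DIR
# Same mkdir/mv sequence as step 2. ROOT is prefixed to the source paths so
# the function can be tried on a scratch tree; pass "" for the real
# /dbamslog and /dbamsstore directories.
move_bdb_env_aside() {
  for pair in dbamslog/master:log_master dbamsstore/master:store_master \
              dbamslog/replica:log_replica dbamsstore/replica:store_replica; do
    src="$1/${pair%%:*}"     # e.g. /dbamslog/master
    dest="$2/${pair##*:}"    # e.g. BACKUP_DIR/log_master
    mkdir -p -- "$dest"
    for f in "$src"/*; do
      # In an empty directory the glob stays unexpanded; skip it
      [ -e "$f" ] || continue
      mv -- "$f" "$dest"/
    done
  done
}

# Example (not run here): require about 1 GiB and 1000 free inodes, then move
# the files into a new timestamped directory so an earlier backup is kept:
#   check_fs_headroom /data 1048576 1000 &&
#     move_bdb_env_aside "" "/data/Backup1/$(date +%Y%m%d-%H%M%S)"
```

Passing the backup directory as an argument, rather than hard-coding /data/Backup1, is what lets the steps be repeated safely on a later incident.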

Note: If the issue persists, check with your infrastructure team for any underlying hardware or hypervisor issues that might affect the filesystem.
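Before escalating, a quick look at the kernel log can show whether the filesystem is reporting errors. The sketch below is an illustrative helper, not part of the documented procedure: the function name scan_for_storage_errors is hypothetical, and its patterns are an assumption that covers only a few common Linux messages (block I/O errors, ext4 and XFS errors, read-only remounts).

```shell
#!/bin/sh
# Sketch only: the pattern list is an assumption and is not exhaustive.
set -eu

# scan_for_storage_errors: reads kernel log text on stdin, prints lines that
# look like storage or filesystem errors, and returns 0 if any were found.
scan_for_storage_errors() {
  grep -Ei 'i/o error|ext4-fs error|xfs .*(error|corrupt)|remounting filesystem read-only'
}

# Example (not run here; dmesg may need root when kernel.dmesg_restrict=1):
#   dmesg | scan_for_storage_errors
#   journalctl -k | scan_for_storage_errors
```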

Frequently Asked Questions

How do I know if this error applies to my situation?
You will see the error message "DAL_open_env BDB error: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery" when attempting to start the tp_ams process.
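To check an existing log for this error, a one-line helper is enough. This is a minimal sketch: bdb_runrecovery_seen is a hypothetical name, and because this article does not give the tp_ams log location, the file path is passed in as an argument.

```shell
#!/bin/sh
# Sketch only: the tp_ams log location is not documented here, so the
# caller supplies the file path(s).
set -eu

# bdb_runrecovery_seen FILE...
# Returns 0 if any FILE contains the BDB0087 DB_RUNRECOVERY error text.
bdb_runrecovery_seen() {
  # -F: fixed string, -q: no output, exit status only
  grep -qF -e 'BDB0087 DB_RUNRECOVERY' -- "$@"
}

# Example (the path is a placeholder):
#   bdb_runrecovery_seen /path/to/tp_ams.log && echo "This article applies"
```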
What should I do if the recovery steps do not resolve the issue?
Ensure there are no underlying hardware or hypervisor issues affecting the filesystem. If the problem persists, contact support for further assistance.
Are the recovery steps valid every time this issue occurs?
Generally, yes. The steps are low risk and normally resolve the issue. However, because the existing store and log files are moved aside, AMS may need to rebuild messages from replica nodes, and some messages may be lost depending on the state at the time of the incident.
Posted by Mohammed Amer
