Bash script to manage file directory by size



  • I am housekeeping some directories by running a script to delete everything older than X days.

    find /var/backups/app1* -mtime +30 -exec rm {} \;

    That seems to be working OK, but the size varies among servers. I would rather delete the oldest files once a threshold is met.

    Example: I have a hard limit of 5GB that I cannot exceed. After keeping 30 days' worth of files, I am only using 500MB on some servers and 3GB on others. I would rather keep more days and just not exceed 5GB so I can use all available space.

    So I am thinking something like this:

    If directory exceeds 4GB, then delete the 5 oldest days of logs
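
    In script form, maybe something like this (a rough, untested sketch; it assumes GNU find and du, and borrows the path and 4GB figure from above):

    #!/bin/bash
    # Sketch only: prune the 5 oldest days once the directory tops 4GB.
    dir="/var/backups/app1"              # assumed path, adjust per server
    limit=$((4 * 1024 * 1024 * 1024))    # 4GB threshold in bytes

    used=$(du -sb "$dir" | cut -f1)      # total usage in bytes (GNU du)
    if [ "$used" -gt "$limit" ]; then
        # Fifth-oldest modification day present in the tree (YYYY-MM-DD).
        # (Naive: with fewer than 5 distinct days present this removes them all.)
        cutoff=$(find "$dir" -type f -printf '%TY-%Tm-%Td\n' | sort -u | head -n 5 | tail -n 1)
        # Delete every file from that day and earlier.
        find "$dir" -type f ! -newermt "$cutoff +1 day" -delete
    fi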



  • @IRJ said in Bash script to manage file directory by size:

    I would rather keep more days and just not exceed 5GB so I can use all available space.

    If directory exceeds 4GB, then delete the 5 oldest days of logs

    You can use du -h to get directory usage and then evaluate it.

    Edit: I believe du -hs will give you a summarized total in human-readable form.
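
    For scripting, though, du -sb (summary in bytes, GNU du) is easier to compare than the human-readable -h output; a quick example, with the path borrowed from the original post:

    used=$(du -sb /var/backups/app1 | cut -f1)   # total bytes used
    if [ "$used" -gt $((4 * 1024 * 1024 * 1024)) ]; then
        echo "over 4GB, time to prune"
    fi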



  • @IRJ said in Bash script to manage file directory by size:

    If directory exceeds 4GB, then delete the 5 oldest days of logs

    Why 5 days? Why not just one, if that gets you below 4GB, and if not, run again, etc.?
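
    As a rough, untested sketch of that loop (again assuming GNU find and du, with the path and threshold from the original post):

    dir="/var/backups/app1"              # assumed path
    limit=$((4 * 1024 * 1024 * 1024))    # 4GB in bytes

    while [ "$(du -sb "$dir" | cut -f1)" -gt "$limit" ]; do
        # Oldest modification day still present (YYYY-MM-DD).
        oldest=$(find "$dir" -type f -printf '%TY-%Tm-%Td\n' | sort | head -n 1)
        [ -z "$oldest" ] && break        # nothing left to delete
        # Remove that day's files, then re-check the total size.
        find "$dir" -type f ! -newermt "$oldest +1 day" -delete
    done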



  • Bash is so primitive. Use Python, PHP, Perl, or whatever you feel comfortable with instead.
    Then you can make the script do whatever you want and also produce a meaningful log file.



  • I recently went through this with some backups. I was originally just doing the mtime thing. But then we found one of the servers sending the backups was screwing up and not sending all 4 parts, so we suddenly had no valid backups.

    We fixed the server, and then I made this script, file_cleanup.sh:

    #!/bin/bash
    # Send everything to the logs and the screen.
    exec 1> >(logger -s -t "$(basename "$0")") 2>&1
    
    # Variables and descriptions of their use.
    # Array of dates found in the filenames of the backup files.
    arrDates=()
    # Number of full backup sets to keep.
    keep=4
    # How many full backup sets have been found.
    found=0
    # Base path to the backup files, minus the last folder.
    base="/home/username/"
    # Full path to the backup files, populated by the script.
    path=""
    
    # This script requires that the final folder name be passed as a parameter.
    # This is because it is designed to be run independently for each subfolder.
    # ex: ./file_cleanup.sh FolderA
    # ex: ./file_cleanup.sh FolderB
    
    # Check for the path to be passed.
    if [ -n "$1" ]
    then
        # Create the full path to be checked based on the passed parameter.
        path=$base$1
    else
        exit 127
    fi
    
    printf 'Executing cleanup of backup files located in %s.\n' "$path"
    
    # Loop through all of the files in the path and parse out an array of the
    # file dates from the file names.
    # All backups are named `backup-0000001-YYYYMMDD-XXXX*`.
    cd "$path" || exit 1
    for f in backup-*
    do
        # The date is 8 characters starting at character 15.
        arrDates+=("${f:15:8}")
    done
    cd ~ || exit 1
    
    # Sort in reverse order and only keep unique dates.
    arrDates=($(printf '%s\n' "${arrDates[@]}" | sort -ru))
    
    # Loop through the array of dates and check for there to be 4 files for each date.
    for checkdate in "${arrDates[@]}"
    do
        count=$(find "$path"/backup-0000001-"$checkdate"-* -type f -printf '.' | wc -c)
        if [ "$count" -eq 4 ] && [ "$found" -lt "$keep" ]
        then
            found=$((found+1))
            printf 'Checking %s, we found %s files. We are keeping this date, currently we have %s dates saved.\n' "$checkdate" "$count" "$found"
        elif [ "$count" -gt 0 ] && [ "$count" -ne 4 ]
        then
            printf 'Incorrect number of files (%s) found, removing invalid backup dated %s.\n' "$count" "$checkdate"
            rm "$path"/backup-*-"$checkdate"-*
        elif [ "$count" -gt 0 ] && [ "$found" -eq "$keep" ]
        then
            printf 'We have already found %s full sets of backup files. Removing backup files dated %s.\n' "$keep" "$checkdate"
            rm "$path"/backup-*-"$checkdate"-*
        else
            printf 'The date %s returned %s files. This is an unhandled scenario, doing nothing.\n' "$checkdate" "$count"
        fi
    done
    


  • @pmoncho said in Bash script to manage file directory by size:


    You can use du -h to get directory usage and then evaluate it.

    Edit: I believe du -hs will give you a summarized total in human-readable form.

    Yeah I think this is the route I will need to go.



  • @IRJ said in Bash script to manage file directory by size:


    Yeah I think this is the route I will need to go.

    Can you install PowerShell on it? Then it'd be really easy for me to help 😉



  • @JaredBusch This is what it looks like on a run:

    journalctl -u backup-cleanup -f
    -- Logs begin at Wed 2020-01-08 22:25:52 CST. --
    Jan 31 08:00:39 ftp.domain.local file_cleanup.sh[37655]: Executing cleanup of backup files located in /home/toptech/FolderA.
    Jan 31 08:00:39 ftp.domain.local file_cleanup.sh[37655]: Checking 20200131, we found 4 files. We are keeping this date, currently we have 1 dates saved.
    Jan 31 08:00:39 ftp.domain.local file_cleanup.sh[37655]: Checking 20200130, we found 4 files. We are keeping this date, currently we have 2 dates saved.
    Jan 31 08:00:39 ftp.domain.local file_cleanup.sh[37655]: Checking 20200129, we found 4 files. We are keeping this date, currently we have 3 dates saved.
    Jan 31 08:00:39 ftp.domain.local file_cleanup.sh[37655]: Checking 20200128, we found 4 files. We are keeping this date, currently we have 4 dates saved.
    Jan 31 08:00:39 ftp.domain.local file_cleanup.sh[37655]: We have already found 4 full sets of backup files. Removing backup files dated 20200127.
    
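    For reference, the unit behind that journalctl command could be a simple oneshot service plus timer. A minimal sketch (the paths, folder name, and schedule here are guesses; only the backup-cleanup unit name comes from the command above):

    # /etc/systemd/system/backup-cleanup.service
    [Unit]
    Description=Clean up old backup file sets

    [Service]
    Type=oneshot
    ExecStart=/usr/local/bin/file_cleanup.sh FolderA

    # /etc/systemd/system/backup-cleanup.timer
    [Unit]
    Description=Run backup cleanup daily

    [Timer]
    OnCalendar=*-*-* 08:00:00
    Persistent=true

    [Install]
    WantedBy=timers.target

    Enabled with systemctl enable --now backup-cleanup.timer.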


  • @JaredBusch said in Bash script to manage file directory by size:

    This is what it looks like on a run:

    journalctl -u backup-cleanup -f

    That is nice and clean. Easy to integrate with a SIEM if you'd like as well.

