Cleanup script help



  • I currently have an FTP server at a site that is the backup target of some bespoke software.

    Currently, a daily cron job simply deletes files based on an mtime parameter.

    [[email protected] ~]# crontab -l
    #Delete all files older than 30 days. Check daily beginning at 06:00
    0 6 * * * find /home/tt/h* -mtime +10 -type f -delete
    1 6 * * * find /home/tt/nc* -mtime +10 -type f -delete
    2 6 * * * find /home/tt/nlr* -mtime +10 -type f -delete
    3 6 * * * find /home/tt/s* -mtime +10 -type f -delete
    4 6 * * * find /home/tt/th* -mtime +10 -type f -delete
    

    I now need to improve this to do a smarter delete and ensure that there are always 4 full backups. Each backup consists of 4 files, as pictured here:

    (image: directory listing showing a backup set of four files)

    Any recommendations for a starting point?



  • OK, this is what I came up with.

    #!/bin/bash
    # Send everything to logs and screen.
    exec 1> >(logger -s -t $(basename $0)) 2>&1
    
    # Variables and descriptions of their use.
    # Array of dates found in the filename of the backup files.
    arrDates=()
    # Number of full backup sets to keep.
    keep=4
    # How many full backup sets have been found.
    found=0
    # Base path to the backup files, minus the last folder.
    base="/home/jbusch/"
    # Full path to the backup files, populated by the script.
    path=""
    
    # This script requires that the final folder name be passed as a parameter.
    # This is because it is designed to be run independently for each subfolder.
    # ex: ./file_cleanup.sh Hartford
    # ex: ./file_cleanup.sh Seymour
    
    # Check for the path to be passed.
    if [ ! -z "$1" ]
    then
        # Create the full path to be checked based on the passed parameter.
        path=$base$1
    else
        exit 127
    fi
    
    printf "Executing cleanup of backup files located in $path.\n"
    
    # Loop through all of the files in the path and parse out an array of the file dates from the file names.
    # All backups are named `backup-0000001-YYYYMMDD-XXXX*`.
    cd "$path" || exit 1
    for f in backup-*
    do
        # The date is from character 15 for 8 characters.
        arrDates=("${arrDates[@]}" "${f:15:8}")
    done
    cd ~
    
    # Sort in reverse order and only show unique dates.
    arrDates=($(printf '%s\n' "${arrDates[@]}" | sort -ru))
    
    # Loop through the array of dates and check for there to be 4 files for each date.
    for checkdate in "${arrDates[@]}"
    do
        count=$(find "$path"/backup-0000001-"$checkdate"-* -type f -printf '.' | wc -c)
        if [ $count -eq 4 ] && [ $found -lt $keep ]
        then
            found=$((found+1))
            printf "Checking $checkdate, we found $count files. We are keeping this date, currently we have $found dates saved.\n"
        elif [ $count -gt 0 ] && [ ! $count -eq 4 ]
        then
        printf "Incorrect number of files ($count) found, removing invalid backup dated $checkdate.\n"
        rm "$path"/backup-*-"$checkdate"-*
        elif [ $count -gt 0 ] && [ $found -eq $keep ]
        then
            printf "We have already found $keep full sets of backup files. Removing backup files dated $checkdate.\n"
        rm "$path"/backup-*-"$checkdate"-*
        else
            printf "The date $checkdate returned $count files. This is an unhandled scenario, doing nothing.\n"
        fi
    done
    

    The output looks like this:

    [[email protected] FTPTest]$ ./file_cleanup.sh FTPTest
    <13>Jan  7 16:51:59 file_cleanup.sh: Checking 20200107, we found 4 files. We are keeping this date, currently we have 1 dates saved.
    <13>Jan  7 16:51:59 file_cleanup.sh: Checking 20200105, we found 4 files. We are keeping this date, currently we have 2 dates saved.
    <13>Jan  7 16:51:59 file_cleanup.sh: Checking 20200104, we found 4 files. We are keeping this date, currently we have 3 dates saved.
    <13>Jan  7 16:51:59 file_cleanup.sh: Checking 20200103, we found 4 files. We are keeping this date, currently we have 4 dates saved.
    <13>Jan  7 16:51:59 file_cleanup.sh: We have already found 4 full sets of backup files. Removing backup files dated 20200102.
    <13>Jan  7 16:51:59 file_cleanup.sh: We have already found 4 full sets of backup files. Removing backup files dated 20191230.
    <13>Jan  7 16:51:59 file_cleanup.sh: We have already found 4 full sets of backup files. Removing backup files dated 20191228.
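
    The next step is to cron it per site. Something along these lines should work (the script location under /home/jbusch/ is an assumption; Hartford and Seymour are the example folders from the script comments):

```shell
# Hypothetical crontab entries, assuming the script is saved as
# /home/jbusch/file_cleanup.sh and each site has its own subfolder.
0 6 * * * /home/jbusch/file_cleanup.sh Hartford
5 6 * * * /home/jbusch/file_cleanup.sh Seymour
```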
    


  • Not my code (but I do use a variation of it), and maybe not your solution; but I hope this gets you going in the direction you need to go.

    #  A "safe" function for removing backups older than REMOVE_AGE + 1 day(s), always keeping at least the ALWAYS_KEEP youngest
    remove_old_backups() {
        local file_prefix="${backup_file_prefix:-$1}"
        local temp=$(( REMOVE_AGE+1 ))  # for inverting the mtime argument: it's quirky ;)
        # We consider backups made on the same day to be one (commonly these are temporary backups in manual intervention scenarios)
        local keeping_n=`/usr/bin/find . -maxdepth 1 \( -name "$file_prefix*.tgz" -or -name "$file_prefix*.gz" \) -type f -mtime -"$temp" -printf '%Td-%Tm-%TY\n' | sort -d | uniq | wc -l`
        local extra_keep=$(( $ALWAYS_KEEP-$keeping_n ))
    
    /usr/bin/find . -maxdepth 1 \( -name "$file_prefix*.tgz" -or -name "$file_prefix*.gz" \) -type f -mtime +$REMOVE_AGE -printf '%T@ %p\n' | sort -n | head -n -$extra_keep | cut -d ' ' -f2 | xargs -r rm
    }
    

    It takes a backup_file_prefix env variable or it can be passed as the first argument and expects environment variables ALWAYS_KEEP (minimum number of files to keep) and REMOVE_AGE (num days to pass to -mtime). It expects a gz or tgz extension. There are a few other assumptions as you can see in the comments, mostly in the name of safety.
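
    The part that makes it "safe" is the `head -n -$extra_keep` step, which drops the newest candidates from the deletion pipeline before anything reaches `rm`. A minimal sketch of that idiom in isolation, using plain numbers in place of file timestamps:

```shell
# Sort the candidates, then drop the last 2 lines (the "newest")
# so they are never passed on for deletion.
printf '%s\n' 3 1 4 1 5 9 2 6 | sort -n | head -n -2
# → prints 1 1 2 3 4 5 (one per line); 6 and 9 survive the cut
```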

    Credit this post: https://stackoverflow.com/questions/20358865/remove-all-files-older-than-x-days-but-keep-at-least-the-y-youngest/52230709#52230709

    Good luck! And be sure to post your solution.



  • This is what I have at the moment.

    Stick it in a directory and run it.

    I think I would be better served looping over the known dates, but I would have to figure out how to parse them out of ls or find.

    #!/bin/bash
    
    # create test files
    rm backup-*
    touch backup-0000001-20191228-1182-critical-data.tar.gz
    touch backup-0000001-20191228-1182.log
    touch backup-0000001-20191228-1182-mysqldump.sql.gz
    #touch backup-0000001-20191228-1182-toptech-software.tar.gz
    touch backup-0000001-20191229-1183-critical-data.tar.gz
    touch backup-0000001-20191229-1183.log
    touch backup-0000001-20191229-1183-mysqldump.sql.gz
    touch backup-0000001-20191229-1183-toptech-software.tar.gz
    touch backup-0000001-20191230-1184-critical-data.tar.gz
    touch backup-0000001-20191230-1184.log
    touch backup-0000001-20191230-1184-mysqldump.sql.gz
    touch backup-0000001-20191230-1184-toptech-software.tar.gz
    touch backup-0000001-20191231-1185-critical-data.tar.gz
    touch backup-0000001-20191231-1185.log
    touch backup-0000001-20191231-1185-mysqldump.sql.gz
    touch backup-0000001-20191231-1185-toptech-software.tar.gz
    touch backup-0000001-20200101-1186-critical-data.tar.gz
    touch backup-0000001-20200101-1186.log
    touch backup-0000001-20200101-1186-mysqldump.sql.gz
    touch backup-0000001-20200101-1186-toptech-software.tar.gz
    touch backup-0000001-20200102-1187-critical-data.tar.gz
    touch backup-0000001-20200102-1187.log
    touch backup-0000001-20200102-1187-mysqldump.sql.gz
    touch backup-0000001-20200102-1187-toptech-software.tar.gz
    touch backup-0000001-20200103-1188-critical-data.tar.gz
    touch backup-0000001-20200103-1188.log
    #touch backup-0000001-20200103-1188-mysqldump.sql.gz
    touch backup-0000001-20200103-1188-toptech-software.tar.gz
    touch backup-0000001-20200104-1189-critical-data.tar.gz
    touch backup-0000001-20200104-1189.log
    touch backup-0000001-20200104-1189-mysqldump.sql.gz
    touch backup-0000001-20200104-1189-toptech-software.tar.gz
    touch backup-0000001-20200105-1190-critical-data.tar.gz
    touch backup-0000001-20200105-1190.log
    touch backup-0000001-20200105-1190-mysqldump.sql.gz
    touch backup-0000001-20200105-1190-toptech-software.tar.gz
    #touch backup-0000001-20200106-1191-critical-data.tar.gz
    touch backup-0000001-20200106-1191.log
    touch backup-0000001-20200106-1191-mysqldump.sql.gz
    touch backup-0000001-20200106-1191-toptech-software.tar.gz
    touch backup-0000001-20200107-1192-critical-data.tar.gz
    touch backup-0000001-20200107-1192.log
    touch backup-0000001-20200107-1192-mysqldump.sql.gz
    touch backup-0000001-20200107-1192-toptech-software.tar.gz
    
    
    
    keep=4
    found=0
    for i in {0..13}
    do
        checkdate=$(date --date="-$i days" +"%Y%m%d")
        count=$(find backup-0000001-"$checkdate"-* -type f -printf '.' | wc -c)
        if [ $count -eq 4 ] && [ $found -lt $keep ]
        then
            found=$((found+1))
            echo Checking $checkdate, we found $count files. We are keeping this date, currently we have $found dates saved.
        elif [ $count -gt 0 ] && [ ! $count -eq 4 ]
        then
            echo Incorrect number of files '('$count')' found, removing invalid backup
        elif [ $count -gt 0 ] && [ $found -eq $keep ]
        then
            echo We have already found $keep full sets of backup files. Removing backup files dated $checkdate.
        else
            echo The date $checkdate returned $count files. 
        fi
    done
    


  • @JaredBusch Could you use the stat command?

    stat -c "%y" /path/*
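
    If only the date part is needed, it can be cut out of that output, for example:

```shell
# %y prints the mtime as "YYYY-MM-DD HH:MM:SS.NNNNNNNNN +ZZZZ",
# so the first space-separated field is just the date. (GNU stat assumed.)
stat -c "%y" /path/* | cut -d' ' -f1
```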



  • @DustinB3403 said in Cleanup script help:

    @JaredBusch Could you use the stat command?

    stat -c "%y" /path/*

    I don't know that I can 100% trust the file date to match the date in the filename.



  • OK, this gets me just the date bit. Now to get it into an array of unique dates only.

    for f in backup-*
    do
        echo ${f:15:8}
    done
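
    Collecting those into a unique, newest-first array can be done by piping through `sort -ru`. A sketch, assuming the same `backup-0000001-YYYYMMDD-*` naming (bash-specific: substring expansion and `mapfile`):

```shell
# Slice the 8-character date out of each filename, then
# de-duplicate and reverse-sort so the newest date comes first.
arrDates=()
for f in backup-*; do
    arrDates+=("${f:15:8}")
done
mapfile -t arrDates < <(printf '%s\n' "${arrDates[@]}" | sort -ru)
```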
    

