Solved: Cleanup script help
-
I currently have an FTP server at a site that is the backup target of some bespoke software. Right now, it simply deletes files via a daily cron based on an `mtime` parameter.

```
[root@ftp ~]# crontab -l
#Delete all files older than 30 days. Check daily beginning at 06:00
0 6 * * * find /home/tt/h* -mtime +10 -type f -delete
1 6 * * * find /home/tt/nc* -mtime +10 -type f -delete
2 6 * * * find /home/tt/nlr* -mtime +10 -type f -delete
3 6 * * * find /home/tt/s* -mtime +10 -type f -delete
4 6 * * * find /home/tt/th* -mtime +10 -type f -delete
```
I need to improve this now to be an actual smart delete that ensures there are always 4 full backups on hand. Each backup set is 4 files, for example:
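```
backup-0000001-20191228-1182-critical-data.tar.gz
backup-0000001-20191228-1182.log
backup-0000001-20191228-1182-mysqldump.sql.gz
backup-0000001-20191228-1182-toptech-software.tar.gz
```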
Any recommendations for a starting point?
-
Not my code (though I use a variation of it), and maybe not your solution, but I hope this gets you going in the direction you need.
# A "safe" function for removing backups older than REMOVE_AGE + 1 day(s), always keeping at least the ALWAYS_KEEP youngest remove_old_backups() { local file_prefix="${backup_file_prefix:-$1}" local temp=$(( REMOVE_AGE+1 )) # for inverting the mtime argument: it's quirky ;) # We consider backups made on the same day to be one (commonly these are temporary backups in manual intervention scenarios) local keeping_n=`/usr/bin/find . -maxdepth 1 \( -name "$file_prefix*.tgz" -or -name "$file_prefix*.gz" \) -type f -mtime -"$temp" -printf '%Td-%Tm-%TY\n' | sort -d | uniq | wc -l` local extra_keep=$(( $ALWAYS_KEEP-$keeping_n )) /usr/bin/find . -maxdepth 1 \( -name "$file_prefix*.tgz" -or -name "$file_prefix*.gz" \) -type f -mtime +$REMOVE_AGE -printf '%T@ %p\n' | sort -n | head -n -$extra_keep | cut -d ' ' -f2 | xargs -r rm }
It takes a `backup_file_prefix` env variable (or it can be passed as the first argument) and expects the environment variables `ALWAYS_KEEP` (minimum number of files to keep) and `REMOVE_AGE` (number of days to pass to `-mtime`). It expects a `gz` or `tgz` extension. There are a few other assumptions, as you can see in the comments, mostly in the name of safety.
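For example, a call might look like this (just a sketch; the directory and values are illustrative, and note it only looks at the current directory because of the `-maxdepth 1` `find` on `.`):

```bash
# Illustrative invocation: run from the folder holding the backups.
cd /home/tt/hartford || exit 1
ALWAYS_KEEP=4 REMOVE_AGE=30 remove_old_backups "backup-0000001-"
```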
Credit this post: https://stackoverflow.com/questions/20358865/remove-all-files-older-than-x-days-but-keep-at-least-the-y-youngest/52230709#52230709
Good luck! And be sure to post your solution.
-
This is what I have at the moment.
Stick it in a directory and run it.
I think I would be better served to loop on the known dates, but I would have to figure out how to parse them out of `ls` or `find` (one possible approach is sketched after the script).
```bash
#!/bin/bash
# create test files
rm backup-*
touch backup-0000001-20191228-1182-critical-data.tar.gz
touch backup-0000001-20191228-1182.log
touch backup-0000001-20191228-1182-mysqldump.sql.gz
#touch backup-0000001-20191228-1182-toptech-software.tar.gz
touch backup-0000001-20191229-1183-critical-data.tar.gz
touch backup-0000001-20191229-1183.log
touch backup-0000001-20191229-1183-mysqldump.sql.gz
touch backup-0000001-20191229-1183-toptech-software.tar.gz
touch backup-0000001-20191230-1184-critical-data.tar.gz
touch backup-0000001-20191230-1184.log
touch backup-0000001-20191230-1184-mysqldump.sql.gz
touch backup-0000001-20191230-1184-toptech-software.tar.gz
touch backup-0000001-20191231-1185-critical-data.tar.gz
touch backup-0000001-20191231-1185.log
touch backup-0000001-20191231-1185-mysqldump.sql.gz
touch backup-0000001-20191231-1185-toptech-software.tar.gz
touch backup-0000001-20200101-1186-critical-data.tar.gz
touch backup-0000001-20200101-1186.log
touch backup-0000001-20200101-1186-mysqldump.sql.gz
touch backup-0000001-20200101-1186-toptech-software.tar.gz
touch backup-0000001-20200102-1187-critical-data.tar.gz
touch backup-0000001-20200102-1187.log
touch backup-0000001-20200102-1187-mysqldump.sql.gz
touch backup-0000001-20200102-1187-toptech-software.tar.gz
touch backup-0000001-20200103-1188-critical-data.tar.gz
touch backup-0000001-20200103-1188.log
#touch backup-0000001-20200103-1188-mysqldump.sql.gz
touch backup-0000001-20200103-1188-toptech-software.tar.gz
touch backup-0000001-20200104-1189-critical-data.tar.gz
touch backup-0000001-20200104-1189.log
touch backup-0000001-20200104-1189-mysqldump.sql.gz
touch backup-0000001-20200104-1189-toptech-software.tar.gz
touch backup-0000001-20200105-1190-critical-data.tar.gz
touch backup-0000001-20200105-1190.log
touch backup-0000001-20200105-1190-mysqldump.sql.gz
touch backup-0000001-20200105-1190-toptech-software.tar.gz
#touch backup-0000001-20200106-1191-critical-data.tar.gz
touch backup-0000001-20200106-1191.log
touch backup-0000001-20200106-1191-mysqldump.sql.gz
touch backup-0000001-20200106-1191-toptech-software.tar.gz
touch backup-0000001-20200107-1192-critical-data.tar.gz
touch backup-0000001-20200107-1192.log
touch backup-0000001-20200107-1192-mysqldump.sql.gz
touch backup-0000001-20200107-1192-toptech-software.tar.gz

keep=4
found=0

for i in {0..13}
do
    checkdate=$(date --date="-$i days" +"%Y%m%d")
    count=$(find backup-0000001-"$checkdate"-* -type f -printf '.' | wc -c)
    if [ $count -eq 4 ] && [ $found -lt $keep ]
    then
        found=$((found+1))
        echo Checking $checkdate, we found $count files. We are keeping this date, currently we have $found dates saved.
    elif [ $count -gt 0 ] && [ ! $count -eq 4 ]
    then
        echo Incorrect number of files '('$count')' found, removing invalid backup
    elif [ $count -gt 0 ] && [ $found -eq $keep ]
    then
        echo We have already found $keep full sets of backup files. Removing backup files dated $checkdate.
    else
        echo The date $checkdate returned $count files.
    fi
done
```
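One way to pull the dates without looping over calendar days would be to parse them straight out of the filenames, something like this (just a sketch, assuming the fixed `backup-0000001-YYYYMMDD-...` naming holds):

```bash
# Field 3 of the dash-delimited filename is the YYYYMMDD stamp;
# de-duplicate and reverse-sort so the newest date comes first.
find . -maxdepth 1 -type f -name 'backup-*' -printf '%f\n' | cut -d- -f3 | sort -ru
```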
-
@JaredBusch Could you use the stat command?
```
stat -c "%y" /path/*
```
-
@DustinB3403 said in Cleanup script help:
@JaredBusch Could you use the stat command?
```
stat -c "%y" /path/*
```
I don't know that I can 100% trust the file date to match the date in the filename.
-
OK, this gets me just the date bit. Now to get it into an array of unique values only.
```bash
for f in backup-*
do
    echo ${f:15:8}
done
```
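Something like this should get them into a de-duplicated array (a sketch; `mapfile` needs bash 4+):

```bash
# Read the unique, reverse-sorted dates into an array.
mapfile -t arrDates < <(for f in backup-*; do echo "${f:15:8}"; done | sort -ru)
printf '%s\n' "${arrDates[@]}"
```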
-
OK, this is what I came up with.
```bash
#!/bin/bash
# Send everything to logs and screen.
exec 1> >(logger -s -t $(basename $0)) 2>&1

# Variables and descriptions of their use.
# Array of dates found in the filename of the backup files.
arrDates=()
# Number of full backup sets to keep.
keep=4
# How many full backup sets have been found.
found=0
# Base path to the backup files, minus the last folder.
base="/home/jbusch/"
# Full path to the backup files, populated by the script.
path=""

# This script requires that the final folder name be passed as a parameter.
# This is because it is designed to be run independently for each subfolder.
# ex: ./file_cleanup.sh Hartford
# ex: ./file_cleanup.sh Seymour
# Check for the path to be passed.
if [ ! -z "$1" ]
then
    # Create the full path to be checked based on the passed parameter.
    path=$base$1
else
    exit 127
fi

printf "Executing cleanup of backup files located in $path.\n"

# Loop through all of the files in the path and parse out an array of the file dates from the file names.
# All backups are named `backup-0000001-YYYYMMDD-XXXX*`.
cd $path
for f in backup-*
do
    # The date is from character 15 for 8 characters.
    arrDates=("${arrDates[@]}" "${f:15:8}")
done
cd ~

# Sort in reverse order and only show unique dates.
arrDates=($(printf '%s\n' "${arrDates[@]}" | sort -ru))

# Loop through the array of dates and check for there to be 4 files for each date.
for checkdate in "${arrDates[@]}"
do
    count=$(find "$path"/backup-0000001-"$checkdate"-* -type f -printf '.' | wc -c)
    if [ $count -eq 4 ] && [ $found -lt $keep ]
    then
        found=$((found+1))
        printf "Checking $checkdate, we found $count files. We are keeping this date, currently we have $found dates saved.\n"
    elif [ $count -gt 0 ] && [ ! $count -eq 4 ]
    then
        printf "Incorrect number of files '('$count')' found, removing invalid backup dated $checkdate.\n"
        rm $path/backup-*-$checkdate-*
    elif [ $count -gt 0 ] && [ $found -eq $keep ]
    then
        printf "We have already found $keep full sets of backup files. Removing backup files dated $checkdate.\n"
        rm $path/backup-*-$checkdate-*
    else
        printf "The date $checkdate returned $count files. This is an unhandled scenario, doing nothing.\n"
    fi
done
```
Output looks like this:
```
[jbusch@dt-jared FTPTest]$ ./file_cleanup.sh FTPTest
<13>Jan 7 16:51:59 file_cleanup.sh: Checking 20200107, we found 4 files. We are keeping this date, currently we have 1 dates saved.
<13>Jan 7 16:51:59 file_cleanup.sh: Checking 20200105, we found 4 files. We are keeping this date, currently we have 2 dates saved.
<13>Jan 7 16:51:59 file_cleanup.sh: Checking 20200104, we found 4 files. We are keeping this date, currently we have 3 dates saved.
<13>Jan 7 16:51:59 file_cleanup.sh: Checking 20200103, we found 4 files. We are keeping this date, currently we have 4 dates saved.
<13>Jan 7 16:51:59 file_cleanup.sh: We have already found 4 full sets of backup files. Removing backup files dated 20200102.
<13>Jan 7 16:51:59 file_cleanup.sh: We have already found 4 full sets of backup files. Removing backup files dated 20191230.
<13>Jan 7 16:51:59 file_cleanup.sh: We have already found 4 full sets of backup files. Removing backup files dated 20191228.
```
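To put it into service, the old crontab entries could be replaced with something along these lines (script location and site names are illustrative):

```
# Run the per-site cleanup daily at 06:00, one entry per backup subfolder.
0 6 * * * /home/jbusch/file_cleanup.sh Hartford
1 6 * * * /home/jbusch/file_cleanup.sh Seymour
```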