Working with Files in Linux



  • I am working on document cleanup in an ancient custom (shitty) application we are trying to retire. Basically, there are files everywhere, and I need to find which files on the filesystem are referenced in the application's database. My plan is to dump the file references from the application's database into one table and a listing of the filesystem into another table, then match them by filename and go from there.

    However, I'm not sure how to capture the files at the filesystem level. Say the files live under /this/directory; what would be the best way to capture the following data?

    Filename | Absolute Path | Modified Date

    Any advice would be appreciated. For what it's worth, this is on CentOS 7.

    Thanks!!


  • Service Provider

    No need to get the filename separately; the absolute path already includes it.


  • Service Provider

    I'm not clear what you are asking. Do you want a list of ALL files under said /directory or are you looking for only certain ones?



  • @scottalanmiller said:

    No need to get the filename, the absolute path will include that already.

    I want the filename and the path to the file as separate fields, though I suppose I could split them in another step. I'm going to be matching by filename, basically table1.filename = table2.filename.



  • @scottalanmiller said:

    I'm not clear what you are asking. Do you want a list of ALL files under said /directory or are you looking for only certain ones?

    Every single file under /this/directory.


  • Service Provider

    @anthonyh said:

    I want the file name and path to said file separate, but I suppose I could separate them through another step. I'm going to be matching by file name. basically table1.filename = table2.filename

    Just use a filter on the existing file; no need to make a separate file for that.
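    A minimal sketch of such a filter, assuming a GNU/Linux userland and paths without embedded newlines or tabs. It reads the absolute paths printed by find (one per line) and splits each into filename and directory with awk, so the filename column can be matched on its own. The helper name split_paths is hypothetical, not something from this thread.

```shell
# split_paths: read absolute paths (one per line) on stdin and print
# filename<TAB>directory for each one.
split_paths() {
  awk -F/ 'BEGIN { OFS = "\t" } {
    name = $NF                                          # last path component = filename
    dir = substr($0, 1, length($0) - length(name) - 1)  # everything before the final "/"
    print name, dir
  }'
}
```

    For example, find /this/directory -type f -print | split_paths > files.tsv would produce a tab-separated file (files.tsv is a hypothetical name) that could be loaded into the filesystem table.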


  • Service Provider

    @anthonyh said:

    Every single file under /this/directory.

    Oh okay.

    find /dir -type f -print
    

    Where /dir is the directory in question. See if that gives you what you want.



  • This is super easy to do in Linux... if you know all the commands like @scottalanmiller! 😃



  • @scottalanmiller said:

    Oh okay.

    find /dir -type f -print
    

    Where /dir is the directory in question. See if that gives you what you want.

    That gives me the absolute path, but no date. I found this command that gets me a little closer:

    find /this/directory -type f -exec stat -c "%n %y" {} \;

    Gives me this:

    /this/directory/data/EFile/DOC/227349_FS86478.pdf 2011-08-19 10:21:22.000000000 -0700

    But it's not ideal yet. I'd need to delimit the path and the timestamp with something other than a space. I would love to drop the fractional seconds and the timezone as well, but I can work around those.
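    A sketch of how the same stat approach could handle both wishes, assuming GNU find and GNU coreutils stat (as on CentOS 7). Unlike stat -c, stat --printf interprets \t and \n, so a tab can serve as the delimiter. A sed step then strips the fractional seconds and the UTC offset from the end of each line. The helper name list_with_stat is hypothetical.

```shell
# list_with_stat DIR: print absolute path<TAB>YYYY-MM-DD HH:MM:SS for every
# regular file under DIR.
list_with_stat() {
  # %n = file name as passed by find (absolute if DIR is absolute)
  # %y = modification time, e.g. "2011-08-19 10:21:22.000000000 -0700"
  # "{} +" hands many files to each stat call instead of one per file.
  # sed removes the optional ".fraction" and the trailing " -0700"-style offset.
  find "$1" -type f -exec stat --printf '%n\t%y\n' {} + |
    sed 's/\(\.[0-9]*\)\{0,1\} [-+][0-9]\{4\}$//'
}
```

    For example, list_with_stat /this/directory would print lines like the sample above, but with a tab in place of the space and without ".000000000 -0700".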



  • Ooh, I'm very close!

    find /this/directory -type f -printf "%f\t" -printf "%h\t" -printf "%Tc\n"

    Gets me this:

    254405_FS85691.pdf /this/directory/data/EFile/CASEDOC Mon 27 Aug 2012 08:52:15 AM PDT

    If I can get the timestamp formatted as YYYY-MM-DD HH:MM:SS (24-hour time), I will be golden! I don't care about PDT vs. PST.



  • I think I've got it close enough!

    find /this/directory -type f -printf "%f\t" -printf "%h\t" -printf "%TY-%Tm-%Td %TH:%TM\n"

    Result:

    101581_PR78450.pdf /this/directory/data/EFile/MO 2007-10-30 11:16
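    For comparison, a sketch that also keeps the seconds, assuming GNU find. Here the three -printf actions are folded into a single format string. In GNU find, %TT prints the modification time as hh:mm:ss followed by a fractional part, and a sed step trims that fraction. The helper name list_files is hypothetical.

```shell
# list_files DIR: print filename<TAB>directory<TAB>YYYY-MM-DD HH:MM:SS
# (24-hour, local time) for every regular file under DIR.
list_files() {
  # %f = filename, %h = directory, %TY-%Tm-%Td = mtime date,
  # %TT = mtime as hh:mm:ss plus a fractional part; sed strips the fraction.
  find "$1" -type f -printf '%f\t%h\t%TY-%Tm-%Td %TT\n' | sed 's/\.[0-9]*$//'
}
```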


  • Service Provider

    @anthonyh said:

    That gives me the absolute path, but no date. I found this command that gets me a little closer:

    find /this/directory -type f -exec stat -c "%n %y" {} \;

    Gives me this:

    /this/directory/data/EFile/DOC/227349_FS86478.pdf 2011-08-19 10:21:22.000000000 -0700

    But it's not ideal, yet. I'd need to delimit the file and timestamp with something other than a space. I would love to eliminate the decimal on the seconds as well as the timezone, but I can work around those.

    It's easier to work with the date if you use UNIX time instead of a human-readable format. You can also use the cut command to trim off any trailing part you don't want.
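    A sketch of both suggestions, assuming GNU find and GNU coreutils stat. stat's %Y prints the modification time as whole seconds since the UNIX epoch, with no fraction and no timezone. For a readable timestamp, cut -c1-19 keeps only the YYYY-MM-DD HH:MM:SS part of stat's %y output. The helper names list_epoch and human_mtime are hypothetical.

```shell
# list_epoch DIR: print mtime in seconds since the epoch<TAB>absolute path
# for every regular file under DIR.
list_epoch() {
  find "$1" -type f -exec stat --printf '%Y\t%n\n' {} +
}

# human_mtime FILE: print FILE's mtime as YYYY-MM-DD HH:MM:SS (local time).
human_mtime() {
  # %y looks like "2011-08-19 10:21:22.000000000 -0700"; keep the first 19 characters.
  stat -c '%y' "$1" | cut -c1-19
}
```

    Epoch seconds compare and sort as plain integers, and a database can convert them back to a date on import if needed.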