    Linux: Looking for Large Folders with du

    scottalanmiller

      One of the most common tasks in system administration is locating "what is using up the disk space" on a machine. You might look at a filesystem using df, find that it is using more space than you expected, and want to find out where that space is being used.

      Filesystem      Size  Used Avail Use% Mounted on
      /dev/sda1       114G   18G   91G  17% /
      

      Using the du command, and combining it with a few simple command line tools, we can quickly and manually explore the filesystem to look for important space wasters. We start by doing a summary du on the root of the filesystem in question, which is the root / in this case.

      # du -smx --exclude=proc /* | sort -n | tail -n 5
      705	/opt
      1050	/root
      2755	/var
      5572	/usr
      7135	/home
      

      Wow, that seems like a long command. We should break it down to understand what we just did. First the du portion. We start with the -smx flags: -s summarizes each directory that we encounter, -m displays the output in megabytes, and -x limits the recursive file discovery to the current filesystem (any other filesystem mounted under one of those locations will be skipped). The --exclude=proc portion tells the du command not to read the /proc folder, as that is not an on-disk filesystem and would cause errors and delays in the command unnecessarily. The /* argument tells du to read everything (the * wildcard) under the root / mount point. Then the output of that statement is piped (see our lesson on BASH redirection and pipes) into the sort command, where we use the -n option to make it sort numerically instead of alphabetically. Finally we pipe that output into the tail command, where we limit the output to the final (and therefore largest) five items discovered by the initial du command. It is because of the sorting that we need to use megabytes instead of human readable form in the initial command.

      That might seem like a lot at first, but once you know the simple building blocks of du, sort and tail, along with BASH command structures, it is quite simple and straightforward, and similar to many tasks that we will do as system administrators using standard tools.
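
      If it helps to see how the pieces fit together, the same pipeline can be assembled one stage at a time; these are just the commands from above run piece by piece, and each stage only changes the ordering and the length of the output:

      # du -smx --exclude=proc /*
      # du -smx --exclude=proc /* | sort -n
      # du -smx --exclude=proc /* | sort -n | tail -n 5
      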

      Now, given the output of the command that we just saw, we can delve deeper into the directory structure to narrow down where the culprits may be. One of the reasons that we often do this task manually is that it is simply quick and easy and does not require a more complicated tool, but also because we can easily massage the data to take into account things that we know about the system, like that the /home directory contains things that we cannot delete and investigating it would be a waste of time (that is just an example and would not normally be true). In this case, we will assume that /var is using more space than we feel is appropriate and we will look there to see what is taking up that space.

      We will change directory into the folder in question and run the original command again (removing the absolute path starting point to make it generic so that we can run it again and again).

      # cd /var
      # du -smx --exclude=proc * | sort -n | tail -n 5
      5	log
      14	backups
      176	tmp
      297	lib
      2265	cache
      

      From this we now see that the cache is the big user of space within the /var directory. We can learn more about what is using space within that by repeating our steps from above.

      # cd cache
      # du -smx --exclude=proc * | sort -n | tail -n 5
      2	man
      7	cups
      7	debconf
      87	apt-xapian-index
      2163	apt
      

      And now we see that the apt directory (its absolute path at this point is /var/cache/apt) is what is using nearly all of the space, not only of cache but of /var above it.

      # cd apt
      # du -smx --exclude=proc * | sort -n | tail -n 5
      45	pkgcache.bin
      45	srcpkgcache.bin
      2074	archives
      

      Going down into apt we see that archives accounts for nearly all of the space used within apt. We are learning a lot from a single, simple exercise. One more level down and we will find what is going on:

      # cd archives
      # du -smx --exclude=proc * | sort -n | tail -n 5
      62	chromium-browser_48.0.2564.116-0ubuntu0.14.04.1.1111_amd64.deb
      66	chromium-browser_49.0.2623.108-0ubuntu0.14.04.1.1113_amd64.deb
      66	chromium-browser_49.0.2623.87-0ubuntu0.14.04.1.1112_amd64.deb
      79	duck_4.9.2.19773_amd64.deb
      82	duck_4.7.5.18825_amd64.deb
      # pwd
      /var/cache/apt/archives
      

      At this bottommost level our command turns up individual files that are of roughly the same size; this tells us that the final directory that we have arrived at (as shown with the pwd command) contains a large number of relatively small files that together add up to the large amount of space that we had observed. We can verify this, of course, using either the ls or du commands, but we already know it to be true. We can also do a quick count of the files in the directory to understand the scope:

      # ls | wc -l
      1251
      

      That is a lot of files; no wonder that, even being generally pretty small, they are taking up so much space.
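
      If you want to double check that these many small files really do account for the total, du itself can confirm it from inside the directory; the figure it reports should line up with the 2074 megabytes that we saw for archives above:

      # du -sm .
      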

      I recommend doing this as an exercise on your own system. Use du to delve into the filesystem and see what is taking up a large amount of space in different areas.
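
      If you find yourself repeating the same pipeline at every level while you do, one small convenience (purely optional, and not something used above) is to wrap it in a throwaway shell function for the session; the name ducheck here is made up and any name would do:

      # ducheck() { du -smx --exclude=proc * | sort -n | tail -n 5; }
      # cd /var && ducheck
      # cd cache && ducheck
      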

      Part of a series on Linux Systems Administration by Scott Alan Miller

      stacksofplates

        I've always just used -h for human readable. I never realized -m would give you the MB size.
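
        For comparison, the two flags side by side on the same /var from the post above would look something like this (the human readable figure is just the 2755 megabytes from earlier converted, so treat the numbers as illustrative):

        # du -sm /var
        2755	/var
        # du -sh /var
        2.7G	/var
        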

        scottalanmiller @stacksofplates

          @johnhooks said in Linux: Looking for Large Folders with du:

          I've always just used -h for human readable. I never realized -m would give you the MB size.

          I have the "advantage" of having learned this stuff before the human readable flag was added 😉
