    Downloading full Website offline

    IT Discussion
    wget website download
    • dbeato

      So I have been playing around with downloading a site offline for archiving purposes. In this case I have written scripts for the below:

      For a Full Website (This will download the whole site as it is)

      wget -mkEpnp https://mangolassi.it
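
For reference, the bundled short flags in the command above can be spelled out with their long names. This is a minimal sketch, assuming GNU wget; `mirror_site` and the `WGET` override variable are hypothetical names used only for illustration and are not part of wget.

```shell
#!/bin/bash
# mirror_site URL: long-option spelling of `wget -mkEpnp URL`.
# mirror_site and WGET are hypothetical names, not part of wget.
mirror_site() {
  # --mirror            (-m)  recursive download with timestamping, unlimited depth
  # --convert-links     (-k)  rewrite links so the copy can be browsed offline
  # --adjust-extension  (-E)  save HTML pages with an .html suffix
  # --page-requisites   (-p)  also fetch the images, CSS, and scripts a page needs
  # --no-parent         (-np) never climb above the starting directory
  # ${WGET:-wget} lets a dry run substitute another command, e.g. WGET=echo
  "${WGET:-wget}" --mirror --convert-links --adjust-extension \
    --page-requisites --no-parent "$1"
}
```

Calling `mirror_site https://mangolassi.it` should behave like the one-liner. Setting `WGET=echo` first prints the arguments instead of downloading anything.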
      

      For a group of Posts in numerical order (This example downloads all the topics from Mangolassi)

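      #!/bin/bash
      # Walk every possible topic ID; IDs that do not exist just fail and the loop moves on.
      for i in {1..2200000}
      do
        # -m mirror, -k convert links, -E add .html extensions, -p page requisites, -np no parent
        wget -mkEpnp "https://mangolassi.it/topic/$i"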
      done
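
A variation on the loop above, as a rough sketch: it takes the ID range as arguments and records the IDs that wget could not fetch, so a long run can be split up or reviewed later. `fetch_topics`, `WGET`, and `MISSING_LOG` are hypothetical names, and the `/topic/ID` URL layout is taken from the example in this post.

```shell
#!/bin/bash
# fetch_topics BASE_URL FIRST LAST: try each topic ID in the range.
# The body runs in a subshell ( ... ) so its variables do not leak out.
fetch_topics() (
  base="$1"
  i="$2"
  last="$3"
  while [ "$i" -le "$last" ]; do
    # wget exits non-zero when a topic cannot be fetched (for example HTTP 404)
    if ! "${WGET:-wget}" -mkEpnp "$base/topic/$i"; then
      # Remember the failed ID; the log file name is an assumption
      echo "$i" >> "${MISSING_LOG:-missing-topics.txt}"
    fi
    i=$((i + 1))
  done
)
```

For example, `fetch_topics https://mangolassi.it 1 500` would try the first 500 topic IDs and append the ones that failed to `missing-topics.txt`.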
      
      • scottalanmiller

        I wish that we had that many topics, ha.

        • dbeato @scottalanmiller

          @scottalanmiller said in Downloading full Website offline:

          I wish that we had that many topics, ha.

          #!/bin/bash
          for i in {1..2200000}
          do
            wget -mkEpnp https://community.spiceworks.com/topic/$i
          done
          

          Example for Spiceworks.

          • scottalanmiller

            Works really well. Testing against ML now. Not the most elegant way to get full content, kind of brute force. But it gets it all, and that's the important part. Gets all the media along with it, like images. So you end up with multiple copies of a lot of that stuff, I would imagine. Takes a while to run because it is following the millions of links to get everything related to a page, not just the page itself. But boy is it fast.

            • dbeato

              I also found this application for Windows that does the same
              https://www.cyotek.com/cyotek-webcopy

              • scottalanmiller @dbeato

                @dbeato I've got a cool script that tells you which threads on ML are popular. Kind of heavy, and only so useful alone, but it is trivial to modify it to track scripts that you care about. I'll make a post for it.

                • scottalanmiller

                  You can tell when you run that thing: both the NodeBB platform and CloudFlare really show the traffic shape change. One big thing is that, because it harvests old threads, it hits content that is not cached. So the cache hit ratio just takes a beating.
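
If the load on the target site is a concern, wget can be told to slow down. A minimal sketch, assuming GNU wget: `polite_mirror`, `WGET`, `WAIT_SECONDS`, and `RATE_LIMIT` are hypothetical names, and the default values are arbitrary picks rather than recommendations from this thread.

```shell
#!/bin/bash
# polite_mirror URL: the same mirror flags, plus throttling.
polite_mirror() {
  # --wait        pause this many seconds between requests
  # --random-wait vary each pause so requests are not evenly spaced
  # --limit-rate  cap the download bandwidth
  "${WGET:-wget}" -mkEpnp \
    --wait="${WAIT_SECONDS:-2}" \
    --random-wait \
    --limit-rate="${RATE_LIMIT:-500k}" \
    "$1"
}
```

This makes a full run slower, but it spreads the uncached requests out over time instead of sending them as fast as the server will answer.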

                  • Emad R @dbeato

                    @dbeato

                    Save Page WE
                    https://chrome.google.com/webstore/detail/save-page-we/dhhpefjklgkmgeafimnjhojgjamoafof

                    An extension for Chrome and Firefox. It saves a single page using MHT and does that well, if a single page is all you want.

                    • dbeato @scottalanmiller

                      @scottalanmiller said in Downloading full Website offline:

                      @dbeato I've got a cool script that tells you which threads on ML are popular. Kind of heavy, and only so useful alone, but it is trivial to modify it to track scripts that you care about. I'll make a post for it.

                      That would be cool to see.

                      • scottalanmiller @dbeato

                        @dbeato said in Downloading full Website offline:

                        @scottalanmiller said in Downloading full Website offline:

                        @dbeato I've got a cool script that tells you which threads on ML are popular. Kind of heavy, and only so useful alone, but it is trivial to modify it to track scripts that you care about. I'll make a post for it.

                        That would be cool to see.

                        It has been posted.

                        • dbeato @scottalanmiller

                          @scottalanmiller said in Downloading full Website offline:

                          @dbeato said in Downloading full Website offline:

                          @scottalanmiller said in Downloading full Website offline:

                          @dbeato I've got a cool script that tells you which threads on ML are popular. Kind of heavy, and only so useful alone, but it is trivial to modify it to track scripts that you care about. I'll make a post for it.

                          That would be cool to see.

                          It has been posted.

                          I saw... I am slow

                          • Obsolesce

                            Why back it up that way vs the server or files and DB?

                            • dbeato @Obsolesce

                              @Obsolesce said in Downloading full Website offline:

                              Why back it up that way vs the server or files and DB?

                              Because I don't have access to them, at least in those two examples.

                              • Obsolesce @dbeato

                                @dbeato said in Downloading full Website offline:

                                @Obsolesce said in Downloading full Website offline:

                                Why back it up that way vs the server or files and DB?

                                Because I don't have access to them at least on those two examples.

                                Oh, why are you supposed to back up ML without having access to the back-end?

                                And, how would you restore anything with those backups?

                                • dbeato @Obsolesce

                                  @Obsolesce said in Downloading full Website offline:

                                  @dbeato said in Downloading full Website offline:

                                  @Obsolesce said in Downloading full Website offline:

                                  Why back it up that way vs the server or files and DB?

                                  Because I don't have access to them at least on those two examples.

                                  Oh, why are you supposed to back up ML without having access to the back-end?

                                  And, how would you restore anything with those backups?

                                  I am not. I was just downloading an offline version as a test. ML is pretty big, and so are other forums, so this is not meant as a backup.

                                  • scottalanmiller @Obsolesce

                                    @Obsolesce said in Downloading full Website offline:

                                    @dbeato said in Downloading full Website offline:

                                    @Obsolesce said in Downloading full Website offline:

                                    Why back it up that way vs the server or files and DB?

                                    Because I don't have access to them at least on those two examples.

                                    Oh, why are you supposed to back up ML without having access to the back-end?

                                    And, how would you restore anything with those backups?

                                    It's an emergency procedure for someone who worries that something might happen to the community and it might disappear. You could programmatically reconstruct the community if you had to.

                                    DB access is way better. Obviously.

                                    • scottalanmiller @Obsolesce

                                      @Obsolesce said in Downloading full Website offline:

                                      @dbeato said in Downloading full Website offline:

                                      @Obsolesce said in Downloading full Website offline:

                                      Why back it up that way vs the server or files and DB?

                                      Because I don't have access to them at least on those two examples.

                                      Oh, why are you supposed to back up ML without having access to the back-end?

                                      And, how would you restore anything with those backups?

                                      RE: Restore

                                      It builds a static version of the site that you could host.
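
As a rough illustration of hosting the static copy, the mirrored directory can be served locally for browsing. This sketch assumes Python 3.7 or later is installed (any static file server would do) and that wget saved the mirror into a directory named after the host, which is its default for recursive downloads. `serve_mirror` and `PYTHON` are hypothetical names.

```shell
#!/bin/bash
# serve_mirror DIR [PORT]: serve a mirrored directory on localhost only.
serve_mirror() {
  # http.server is Python's built-in static file server;
  # --directory requires Python 3.7 or later
  "${PYTHON:-python3}" -m http.server "${2:-8000}" \
    --bind 127.0.0.1 --directory "$1"
}
```

Running `serve_mirror mangolassi.it` and opening `http://127.0.0.1:8000/` in a browser should show the mirrored pages, with links working as far as `-k` was able to convert them.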

                                      • Dashrender

                                        So - is someone considering doing that in case another site fails? I wonder how much storage is needed?

                                        • dbeato @Dashrender

                                          @Dashrender said in Downloading full Website offline:

                                          I wonder how much storage is needed?

                                          For example, ML took about 24 GB over two days of downloading. I stopped it because I didn't need it.
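
To keep an eye on storage during a run like this, the mirror directory can be measured with `du`, and wget can be given a download quota. A minimal sketch, assuming GNU wget and a standard `du`; `mirror_size`, `capped_mirror`, `WGET`, and `QUOTA` are hypothetical names, and the default quota is an arbitrary value.

```shell
#!/bin/bash
# mirror_size DIR: print the disk space a mirror directory uses, human readable.
mirror_size() {
  # du prints "SIZE<TAB>PATH"; keep only the size column
  du -sh "$1" | cut -f1
}

# capped_mirror URL: the same mirror flags plus a download quota.
# wget checks the quota between files during recursive retrieval,
# so the final size can overshoot by roughly one file.
capped_mirror() {
  "${WGET:-wget}" -mkEpnp --quota="${QUOTA:-25000m}" "$1"
}
```

For example, `mirror_size mangolassi.it` reports how large the copy has grown so far, and `QUOTA=5000m capped_mirror https://mangolassi.it` would stop the crawl at roughly 5 GB.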

                                          • Dashrender @dbeato

                                            @dbeato said in Downloading full Website offline:

                                            @Dashrender said in Downloading full Website offline:

                                            I wonder how much storage is needed?

                                            For example ML took about 24 GB of two days downloading, I stopped it because I didn't need it.

                                            lol, not the site I was talking about 😛
