I wrote a thing



  • I wrote a utility in Go that prints unique strings, like the uniq utility. The advantages of mine: you don't need to sort the input first, it's faster, and it's cross-platform. It can read from stdin or from a file.

    https://gitlab.com/hooksie1/goniq

    Here's a timed run with 2,799,264 words. It's a list of 466,544 words repeated 6 times.

    time sort allwords.txt | uniq
    sort allwords.txt  6.58s user 0.21s system 127% cpu 5.348 total
    uniq  2.88s user 0.79s system 68% cpu 5.347 total
    
    time goniq allwords.txt
    goniq allwords.txt  1.96s user 0.81s system 114% cpu 2.428 total
    

    But even with a sorted list it's still faster:

    uniq allwordssorted.txt  2.90s user 0.73s system 99% cpu 3.651 total
    
    goniq allwordssorted.txt  1.66s user 0.74s system 120% cpu 1.986 total
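
For readers wondering how a tool can skip the sort step: uniq only collapses adjacent duplicate lines, which is why unsorted input has to go through sort first. One common alternative is to remember every line already seen in a hash set. The sketch below illustrates that general idea only. It is not goniq's actual source, and the helper name `isNew` is hypothetical. It assumes one string per line and keeps every distinct line in memory.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
)

// isNew reports whether line has not been seen before, and records it.
// (Hypothetical helper for illustration; not taken from goniq.)
func isNew(seen map[string]struct{}, line string) bool {
	if _, ok := seen[line]; ok {
		return false
	}
	seen[line] = struct{}{}
	return true
}

func main() {
	// Read from the file named in the first argument, else from stdin.
	in := os.Stdin
	if len(os.Args) > 1 {
		f, err := os.Open(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		defer f.Close()
		in = f
	}

	seen := make(map[string]struct{})
	out := bufio.NewWriter(os.Stdout)
	defer out.Flush()

	// bufio.Scanner splits on newlines by default; lines longer than
	// its default 64 KiB token limit would need a larger buffer.
	sc := bufio.NewScanner(in)
	for sc.Scan() {
		if line := sc.Text(); isNew(seen, line) {
			fmt.Fprintln(out, line) // first occurrence: print in input order
		}
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

A set lookup per line avoids the O(n log n) sort and keeps the original input order, at the cost of memory proportional to the number of distinct lines.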


  • @stacksofplates Very interesting, thanks!

