Dedupe A Huge List In Powershell/DOS (Windows Command Line)?

Discussion in 'Mail Chat' started by phdesign, May 27, 2016.

  1. phdesign

    phdesign Active Member

    Joined:
    Dec 12, 2011
    Messages:
    123
    Likes Received:
    32
    Trophy Points:
    28
    Hey All,

    Every once in a while when I have a new machine to set up, I grab a bunch of current full suppression files I want to combine. I extract all the files and concatenate them with a quick DOS command:
    copy *.txt list.txt
    and end up with a list that is well over 1.5GB, emails only, one column.

    First off, it is too big for my good old ElistPro to dedupe. I'm on Windows 10, Core i7 16GB RAM with 1TB of disk space so the computer isn't the holdup.

    Sure, I can aggregate the 30+ individual files 8 or 10 at a time and use ElistPro to get this done. Ok... already did that. Not a big deal, but not automated either.

    But for the sake of knowing if it is possible, is there a Powershell or DOS script that can do this in less than a few minutes?

    I tried this and had to kill it after 20 minutes (it had 11.4GB of my RAM tied up and I got bored of waiting):

    gc list.txt | sort | get-unique > list_unique.txt

    I did test it on some small files and it worked. On the big one, however, it never finishes.
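
One likely reason that pipeline struggles: Sort-Object has to buffer every line in memory before it emits anything, and Get-Content wraps each line in a pipeline object along the way. A minimal sketch of an alternative, assuming PowerShell 5 or later (stock on Windows 10), is to stream the file through .NET and keep only a hash set of the lines already seen, with no sort step at all. `Remove-DuplicateLines` is a hypothetical helper name, and trimming whitespace and ignoring case are assumptions about how the addresses should be compared. It has not been timed on a 1.5GB file, and the set still has to hold every unique address in RAM.

```powershell
# Hypothetical helper; not part of any built-in module.
function Remove-DuplicateLines {
    param(
        [Parameter(Mandatory)] [string] $InputPath,
        [Parameter(Mandatory)] [string] $OutputPath
    )

    # .NET resolves relative paths against the process directory, not the
    # current PowerShell location, so convert both paths to absolute ones.
    $inFile  = (Resolve-Path -LiteralPath $InputPath).ProviderPath
    $outFile = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($OutputPath)

    # Assumption: addresses that differ only by case are the same address.
    $seen = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)

    $writer = [System.IO.StreamWriter]::new($outFile)
    try {
        # ReadLines yields one line at a time instead of loading the whole file.
        foreach ($line in [System.IO.File]::ReadLines($inFile)) {
            $email = $line.Trim()
            # Add() returns $true only the first time a value is seen.
            if ($email.Length -gt 0 -and $seen.Add($email)) {
                $writer.WriteLine($email)
            }
        }
    }
    finally {
        $writer.Dispose()
    }
}
```

After defining the function, a call such as `Remove-DuplicateLines -InputPath .\list.txt -OutputPath .\list_unique.txt` writes each address once, in first-seen order.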

    Perhaps someone knows the way to take the 30+ files and sort/dedupe them when combining them?


    Or does anyone know an easy DOS command that does it similar to how I combine them above:

    copy *.txt {some command to only get unique rows} list.txt
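
As far as I know, cmd's copy has no switch that filters to unique rows, so the combine-and-dedupe step needs a script. A sketch of one way to do it in a single pass, again assuming PowerShell 5 or later: read every matching file into one sorted set and write the set out at the end, which gives sorted, unique output without ever building the combined 1.5GB file. `Merge-UniqueLines` and its parameters are hypothetical names, and the case-insensitive comparison is an assumption. All unique addresses are held in memory until the final write.

```powershell
# Hypothetical helper; not part of any built-in module.
function Merge-UniqueLines {
    param(
        [Parameter(Mandatory)] [string] $Pattern,     # e.g. .\*.txt
        [Parameter(Mandatory)] [string] $OutputPath
    )

    $outFile = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($OutputPath)

    # Leave the output file out of the inputs in case it matches the pattern.
    $inputs = @(Get-ChildItem -Path $Pattern -File |
        Where-Object { $_.FullName -ne $outFile })

    # A sorted set rejects duplicates and keeps its members in order.
    # Assumption: addresses that differ only by case are the same address.
    $unique = [System.Collections.Generic.SortedSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)

    foreach ($file in $inputs) {
        foreach ($line in [System.IO.File]::ReadLines($file.FullName)) {
            $email = $line.Trim()
            if ($email.Length -gt 0) { [void] $unique.Add($email) }
        }
    }

    $writer = [System.IO.StreamWriter]::new($outFile)
    try {
        foreach ($email in $unique) { $writer.WriteLine($email) }
    }
    finally {
        $writer.Dispose()
    }
}
```

A call such as `Merge-UniqueLines -Pattern .\*.txt -OutputPath .\list_unique.txt` could then be run from a scheduled task or another script.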

    I know there are some easier ways on a Linux machine, but I run some automated scripts on my Windows laptop and this would make my life a few steps simpler. (And yes, I know DOS isn't really DOS anymore... but I think most everyone knows what I mean when I say DOS, right?)

    Thanks
     
  2. damian

    damian Active Member

    Joined:
    Oct 2, 2012
    Messages:
    228
    Likes Received:
    35
    Trophy Points:
    28
    Maybe it's not the computer but ElistPro? I've never used that one, but I've deduped plenty of 1GB+ files using SSLM with no problem.
     
  3. phdesign

    phdesign Active Member

    Yeah, ElistPro is outdated and seems problematic with 64-bit and/or newer versions of Windows.

    Really looking for a single command that can be called automatically and do it in not more than a couple of minutes. I have never used SSLM - seems someone was passing it around here recently or wanted to put it in the tools. But again, not sure that solves my desire for automation.
     
  4. nickphx

    nickphx VIP

    Joined:
    Apr 2, 2011
    Messages:
    1,139
    Likes Received:
    363
    Trophy Points:
    83
    Gender:
    Male
    Location:
    guadalajara, chiuhuahua
  5. swank

    swank Member

    Joined:
    Dec 12, 2013
    Messages:
    282
    Likes Received:
    20
    Trophy Points:
    18
    Location:
    Internet
    I never thought about trying PowerShell. I'd imagine there is a script that would work, though. What about trying to use AutoIt for some of the work? You'd just have to find a script out there. I know SSLM should do the trick; I have a registered copy of it. There is also Lmate Plat you could try. Not sure if Shawn still sells or supports Lmate Plat (I don't think so).

    nickphx knows his shit, so I'd probably try his advice before looking to purchase or seek out these apps.
     
  6. guidito

    guidito VIP

    Joined:
    Mar 25, 2014
    Messages:
    56
    Likes Received:
    8
    Trophy Points:
    8
    Gender:
    Male
    Location:
    Buenos Aires, Argentina
    lol, who uses PowerShell?
    Cygwin is the way to go; I use it too.
    Or get a copy of SSLM.
     
  7. phdesign

    phdesign Active Member

    Sadly I do use a few "DOS" commands for easy file manipulation, but I don't actually use PowerShell. I was looking at PowerShell not because I actually use it, but as something that works better than DOS without having to install yet another tool... and that can be driven automatically.

    I don't yet have a copy of SSLM... can you even buy that these days and does it deal well with massive files on Windoze 10?
    Someone was going to send me SSLM to "evaluate" but I never connected with him. I've just been using Listmate and running the files in segments. A couple of extra steps, but it does the job.
     
  8. swank

    swank Member

  9. guidito

    guidito VIP

