Waraxe IT Security Portal  
a problem with removing duplicates from wordlists
Posted: Fri Jun 13, 2008 4:40 am
earthquaker
Advanced user
Joined: Jun 02, 2008
Posts: 111
Location: q8

Hey, I am having a problem removing duplicates from my wordlists. Some of the lists are over 2 GB, so I tried a couple of programs, but they suddenly crash!

Does anyone know how I can remove duplicates from such large files? I have a total of 46 GB of wordlists and I'm willing to share them, but it's hard to share a huge amount of wordlists with a lot of duplicates in them.

Thanks
Posted: Tue Sep 23, 2008 8:37 pm
stereoa
Beginner
Joined: Sep 23, 2008
Posts: 4

usort?
Posted: Tue Sep 23, 2008 8:39 pm
waraxe
Site admin
Joined: May 11, 2004
Posts: 2407
Location: Estonia, Tartu

I am using PHP/MySQL for wordlist storage and compilation, but that's just because PHP is my favourite tool :)
Posted: Thu Nov 06, 2008 12:48 am
Sm0ke
Moderator
Joined: Nov 25, 2006
Posts: 141
Location: Finland

Use the dictionary sorting in PasswordsPro; it's fast for big wordlists.
Posted: Mon Feb 09, 2009 4:33 pm
Baston
Regular user
Joined: Dec 16, 2008
Posts: 17

I had the same problem, so I wrote a little Perl script that splits my wordlists into many files based on the first character of each line.
You can then dedup each file separately and be sure that when you put them back together you won't have any dupes, because identical lines always end up in the same file.

Code:

#!/usr/bin/perl
use strict;
use warnings;

my $dir   = 'alpha';
my $pre   = 'singles-';
my $post  = '.txt';
my @chars = ('A' .. 'Z', 0 .. 9);
my %files;

## Make the dir
if (!-d $dir) {
    mkdir $dir or die "Cannot create $dir: $!\n";
}

## Open one output file per leading character (append mode)
for my $char (@chars) {
    my $file = "$dir/$pre$char$post";
    open $files{$char}, '>>', $file or die "Can't open $file: $!\n";
}
## Lines starting with any other character go to a catch-all file
my $other = "$dir/${pre}!${post}";
open my $other_fh, '>>', $other or die "Cannot open output file $other: $!\n";

## Processing files
foreach my $param (@ARGV) {
    print "processing $param\n";
    open my $in, '<', $param or die "Cannot open input file $param: $!\n";
    while (my $line = <$in>) {
        chomp $line;
        my $start = uc(substr($line, 0, 1));    # first char of the line, uppercased

        if ($start =~ /[A-Z0-9]/) {
            print { $files{$start} } "$line\n";
        }
        else {
            print {$other_fh} "$line\n";
        }
    }
    close $in;
    print "$param processed\n";
}

## Closing files
close $_ foreach values %files;
close $other_fh;
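
The script above only does the splitting; the dedup step itself is not shown. A minimal sketch of that step, assuming the unique lines of any one split file fit in memory, could look like the following. The sub name dedup_file and the output name merged.txt are made up for this example and are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the split script): copy the lines of
# $in_path to the already-open handle $out_fh, skipping every line that was
# already seen earlier in the same file. Returns the number of lines written.
# Assumes the unique lines of ONE split file fit in memory.
sub dedup_file {
    my ($in_path, $out_fh) = @_;
    my %seen;
    my $written = 0;
    open my $in, '<', $in_path or die "Cannot open $in_path: $!\n";
    while (my $line = <$in>) {
        chomp $line;
        next if $seen{$line}++;    # exact, case-sensitive duplicate
        print {$out_fh} "$line\n";
        $written++;
    }
    close $in;
    return $written;
}

# Usage: perl dedup.pl merged.txt alpha/singles-*.txt
# (merged.txt is an assumed output name; it is overwritten)
if (@ARGV >= 2) {
    my $out_path = shift @ARGV;
    open my $out, '>', $out_path or die "Cannot open $out_path: $!\n";
    for my $file (@ARGV) {
        my $n = dedup_file($file, $out);
        print "$file: $n unique lines\n";
    }
    close $out or die "Cannot close $out_path: $!\n";
}
```

The %seen hash is emptied for each file, so memory use is bounded by the largest split file rather than by the whole collection. If one split file is still too big for that, it could probably be split again on the second character in the same way.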
Posted: Tue Aug 25, 2009 6:00 am
Mooka91
Advanced user
Joined: Aug 15, 2009
Posts: 73

http://hashkiller.com/files/downloads/wordlist-tools/

Take your pick; there are plenty of dupe removers there.