Figure 5.20 The LinkedIn Profile of the Author of a Government Document
Can this process of grabbing documents and analyzing them be automated? Of course!
As a start we can build a scraper that will find the URLs of Office documents (.doc, .ppt, .xls, .pps). We then need to download each document and push it through the meta information parser. Finally, we can extract the interesting bits and do some post-processing on them. We already have a scraper (see the previous section), so all we need is something that will extract the meta information from the file. Thomas Springer at ServerSniff.net was kind enough to provide me with the source of his document information script. After some slight changes it looks like this:
#!/usr/bin/perl
# File-analyzer 0.1, 07/08/2007, thomas springer
# stripped-down version
# slightly modified by roelof temmingh @ paterva.com
# this code is public domain - use at own risk
# this code is using phil harveys ExifTool - THANK YOU, PHIL!!!!
# http://www.ebv4linux.de/images/articles/Phil1.jpg
use strict;
use Image::ExifTool;
#passed parameter is a URL
my ($url)=@ARGV;
# get file and make a nice filename
my $file=get_page($url);
my $time=time;
my $frand=rand(10000);
my $fname="/tmp/".$time.$frand;
# write stuff to a file
open(FL, ">$fname");
print FL $file;
close(FL);
# Get EXIF-INFO
my $exifTool=new Image::ExifTool;
$exifTool->Options(FastScan => '1');
$exifTool->Options(Binary => '1');
$exifTool->Options(Unknown => '2');
$exifTool->Options(IgnoreMinorErrors => '1');
my $info = $exifTool->ImageInfo($fname); # feed standard info into a hash
# delete tempfile
unlink ("$fname");
my @names;
print "Author:".$$info{"Author"}."\n";
print "LastSaved:".$$info{"LastSavedBy"}."\n";
print "Creator:".$$info{"creator"}."\n";
print "Company:".$$info{"Company"}."\n";
print "Email:".$$info{"AuthorEmail"}."\n";
exit; # comment out this line to see all the fields
foreach (keys %$info){
print "$_ = $$info{$_}\n";
}
sub get_page{
my ($url)=@_;
#use curl to get it - you might want change this
# 25 second timeout - also modify as you see fit
my $res=`curl -s -m 25 $url`;
return $res;
}
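The get_page routine simply shells out to curl, and the comment in it invites you to change that. If you would rather stay inside Perl, a rough sketch of an equivalent routine using the LWP::UserAgent module (assuming it is installed) could look like this:
# sketch of a pure-Perl alternative to get_page - assumes LWP::UserAgent
# the 25-second timeout mirrors the curl version above
use LWP::UserAgent;
sub get_page_lwp{
  my ($url)=@_;
  my $ua=LWP::UserAgent->new(agent=>'moo', timeout=>25);
  my $res=$ua->get($url);
  return $res->is_success ? $res->content : "";
}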
Save this script as docinfo.pl. You will notice that you'll need some Perl libraries to use this, specifically the Image::ExifTool library, which is used to get the metadata from the files. The script uses curl to download the documents from the server, so you'll need that as well. Curl is set to a 25-second timeout; on a slow link you might want to increase that. Let's see how this script works:
$ perl docinfo.pl http://www.elsevier.com/framework_support/permreq.doc
Author:Catherine Nielsen
LastSaved:Administrator
Creator:
Company:Elsevier Science
Email:
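To poke at a single downloaded file and see every tag ExifTool recovers (the same output you would get by removing the exit line in docinfo.pl), a minimal standalone sketch could look like this; the file name is just an example:
#!/usr/bin/perl
# sketch: dump every metadata tag ExifTool finds in a local file
# usage: perl dumpinfo.pl /tmp/somefile.doc
use strict;
use Image::ExifTool;
my ($fname)=@ARGV;
my $exifTool=new Image::ExifTool;
my $info=$exifTool->ImageInfo($fname);
foreach my $tag (sort keys %$info){
  print "$tag = $$info{$tag}\n";
}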
The script looks for five fields in a document: Author, LastSavedBy, Creator, Company, and AuthorEmail. There are many other fields that might be of interest (like the software used to create the document). On its own this script is only mildly interesting, but it really starts to become powerful when you combine it with a scraper and do some post-processing on the results. Let's modify the existing scraper a bit to look like this:
#!/usr/bin/perl
use strict;
my ($domain,$num)=@ARGV;
my @types=("doc","xls","ppt","pps");
my $result;
foreach my $type (@types){
$result=`curl -s -A moo "http://www.google.com/search?q=filetype:$type+site:$domain&hl=en&num=$num&filter=0"`;
parse($result);
}
sub parse {
my $start;
my $end;
my $token="<div class=g>";
my $count=1;
while (1){
$start=index($result,$token,$start);
$end=index($result,$token,$start+1);
if ($start == -1 || $end == -1 || $start == $end){
last;
}
my $snippet=substr($result,$start,$end-$start);
my ($pos,$url) = cutter("<a href=\"","\"",0,$snippet);
my ($pos,$heading) = cutter(">","</a>",$pos,$snippet);
my ($pos,$summary) = cutter("<font size=-1>","<br>",$pos,$snippet);
# remove <b> and </b>
$heading=cleanB($heading);
$url=cleanB($url);
$summary=cleanB($summary);
print $url."\n";
$start=$end;
$count++;
}
}
sub cutter{
my ($starttok,$endtok,$where,$str)=@_;
my $startcut=index($str,$starttok,$where)+length($starttok);
my $endcut=index($str,$endtok,$startcut+1);
my $returner=substr($str,$startcut,$endcut-$startcut);
my @res;
push @res,$endcut;
push @res,$returner;
return @res;
}
sub cleanB{
my ($str)=@_;
$str=~s/<b>//g;
$str=~s/<\/b>//g;
return $str;
}
Save this script as scraper.pl. The scraper takes a domain and a number as parameters. The number is the number of results to return, but multiple-page support is not included in the code. However, it's child's play to modify the script to scrape multiple pages from Google (a rough sketch of that change follows).
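Using Google's start parameter to page through results, the main loop could be replaced with something like this; the $pages count here is just an example:
# sketch: walk $pages result pages per file type instead of one request
my $pages=3;
foreach my $type (@types){
  for (my $page=0; $page<$pages; $page++){
    my $start=$page*100;
    $result=`curl -s -A moo "http://www.google.com/search?q=filetype:$type+site:$domain&hl=en&num=100&start=$start&filter=0"`;
    parse($result);
  }
}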
Note that the scraper has been modified to look for some common Microsoft Office formats and will loop through them with a filetype:XX site:domain search term. Now all that is needed is something that will put everything together and do some post-processing on the results. The code could look like this:
#!/usr/bin/perl
use strict;
my ($domain,$num)=@ARGV;
my %ALLEMAIL=(); my %ALLNAMES=();
my %ALLUNAME=(); my %ALLCOMP=();
my $scraper="scrape.pl";
my $docinfo="docinfo.pl";
print "Scraping please wait \n";
my @all_urls=`perl $scraper $domain $num`;
if ($#all_urls == -1 ){
print "Sorry - no results!\n";
exit;
}
my $count=0;
foreach my $url (@all_urls){
print "$count / $#all_urls : Fetching $url";
my @meta=`perl $docinfo $url`;
foreach my $item (@meta){
process($item);
}
$count++;
}
#show results
print "\nEmails:\n -\n";
foreach my $item (keys %ALLEMAIL){
print "$ALLEMAIL{$item}:\t$item";
}
print "\nNames (Person):\n -\n";
foreach my $item (keys %ALLNAMES){
print "$ALLNAMES{$item}:\t$item";
}
print "\nUsernames:\n -\n";
foreach my $item (keys %ALLUNAME){
print "$ALLUNAME{$item}:\t$item";
}
print "\nCompanies:\n -\n";
foreach my $item (keys %ALLCOMP){
print "$ALLCOMP{$item}:\t$item";
}
sub process {
my ($passed)=@_;
my ($type,$value)=split(/:/,$passed);
$value=~tr/A-Z/a-z/;
if (length($value)<=1) {return;}
if ($value =~ /[a-zA-Z0-9]/){
if ($type eq "Company"){$ALLCOMP{$value}++;}
else {
if (index($value,"\@")>2){$ALLEMAIL{$value}++;}
elsif (index($value," ")>0){$ALLNAMES{$value}++;}
else {$ALLUNAME{$value}++;}
}
}
}
This script first kicks off scraper.pl with the domain and the number of results that were passed to it as parameters. It captures the output (a list of URLs) of that process in an array, and then runs the docinfo.pl script against every URL. The output of docinfo.pl is then sent for further processing, where some basic checking is done to see whether each value is a company name, an e-mail address, a user name, or a person's name. These are stored in separate hash tables for later use. When everything is done, the script displays each collected piece of information and the number of times it occurred across all pages. Does it actually work? Have a look:
# perl combined.pl xxx.gov 10
Scraping please wait
0 / 35 : Fetching http://www.xxx.gov/8878main_C_PDP03.DOC
1 / 35 : Fetching http://***.xxx.gov/1329NEW.doc
2 / 35 : Fetching http://***.xxx.gov/LP_Evaluation.doc
3 / 35 : Fetching http://*******.xxx.gov/305.doc
<cut>
Emails:
-
1: ***zgpt@***.ksc.xxx.gov
1: ***ikrb@kscems.ksc.xxx.gov
1: ***ald.l.***mack@xxx.gov
1: ****ie.king@****.xxx.gov
Names (Person):
-
1: audrey sch***
1: corina mo****
1: frank ma****
2: eileen wa****
2: saic-odin-**** hq
1: chris wil****
1: nand lal****
1: susan ho****
2: john jaa****
1: dr paul a cu****
1: *** project/code 470
1: bill mah****
1: goddard, pwdo - bernadette fo****
1: joanne wo****
2: tom naro****
1: lucero ja****
1: jenny rumb****
1: blade ru****
1: lmit odi****
2: **** odin/osf seat
1: scott w mci****
2: philip t me****
1: annie ki****
Usernames:
-
1: cgro****
1: gidel****
1: rdcho****
1: fbuchan****
2: sst****
1: rbene****
1: rpan****
2: l.j.klau****
1: gane****h
1: amh****
1: caroles****
2: mic****e
1: baltn****r
3: pcu****
1: md****
1: ****wxpadmin
1: mabis****
1: ebo****
2: grid****
1: bkst****
1: ***(at&l)
Companies:
-
1: shadow conservatory
[SNIP]
The list of companies has been chopped way down to protect the identity of the government agency in question, but the script seems to work well. The script can easily be modified to scrape many more results (across many pages), extract more fields, and get other file types. By the way, what the heck is the one unedited company known as the "Shadow Conservatory"?
Figure 5.21 Zero Results for "Shadow Conservatory"
The tool also works well for finding out what user name format is used (and whether one is used at all). Consider this list of user names mined from somewhere:
Usernames:
-
1: 79241234
1: 78610276
1: 98229941
1: 86232477
2: 82733791
2: 02000537
1: 79704862
1: 73641355
2: 85700136
From the list it is clear that an eight-digit number is used as the user name. This information might be very useful in later stages of an attack.
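Once you suspect a convention like this, a quick pass over the mined user names confirms how consistently it holds. A small sketch, reusing the %ALLUNAME hash from the combined script and the eight-digit pattern spotted above:
# sketch: count how many mined user names match the suspected format
my ($matches,$total)=(0,0);
foreach my $uname (keys %ALLUNAME){
  $total++;
  $matches++ if ($uname =~ /^\d{8}\s*$/);
}
print "$matches of $total user names are eight-digit numbers\n";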
Taking It One Step Further
Sometimes you end up in a situation where you want to hook the output of one search up as the input for another process. This process might be another search, or it might be something like looking up an e-mail address on a social network, converting a DNS name to a domain, resolving a DNS name, or verifying the existence of an e-mail account. How do I link two e-mail addresses together? Consider Johnny's e-mail address johnny@ihackstuff.com and my previous e-mail address at SensePost, roelof@sensepost.com. To link these two addresses together we can start by searching for one of the e-mail addresses and extracting sites, e-mail addresses, and phone numbers. Once we have these results we can do the same for the other e-mail address and then compare them to see if there are any common results (or nodes). In this case there are common nodes (see Figure 5.22).
Figure 5.22 Relating Two E-mail Addresses from Common Data Sources
If there are no matches, we can loop through all of the results of the first e-mail address, again extracting e-mail addresses, sites, and telephone numbers, and then repeat it for the second address in the hope that there are common nodes.
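The comparison step itself is trivial once the mining is done. A short sketch, with placeholder node lists standing in for whatever sites, e-mail addresses, and phone numbers were actually extracted for each address:
# sketch: find the common nodes between two mined result lists
# the values below are placeholders, not real mined data
use strict;
my @nodes_a=('ihackstuff.com','+1 555 0100','someone@example.com');
my @nodes_b=('sensepost.com','+1 555 0100','other@example.com');
# build a lookup from the first list, keep anything from the second
# list that also appears in it (case-insensitive)
my %seen=map { lc($_) => 1 } @nodes_a;
my @common=grep { $seen{lc($_)} } @nodes_b;
print "Common node: $_\n" foreach @common;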
What about more complex sequences that involve more than searching? Can you get locations of the Pentagon data centers by simply looking at public information? Consider Figure 5.23.
What's happening here? While it looks seriously complex, it really isn't. The procedure to get to the locations shown in this figure is as follows: