Figure 12.7 Gooscan’s Usage
Gooscan’s most commonly used options are outlined in the included README file.
Let’s take a look at how the various options work:
■ <-t target> (required argument) This is the Google appliance or server to scan.
An IP address or host name can be used here. Caution: Entering www.google.com
here violates Google’s terms of service and is neither recommended nor condoned
by the author.
■ <-q query | -i query_file> (required argument) The query or query file to send. Gooscan can be used to send an individual query or a series of queries read
from a file. The -q option takes one argument, which can be any valid Google
query. For example, these are valid options:
-q googledorks
-q "microsoft sucks"
-q "intitle:index.of secret"
■ [-i input_file] (optional argument) The -i option takes one argument: the
name of a Gooscan data file. Using a data file allows you to perform multiple queries with Gooscan. See the following list for information about the included Gooscan data files.
■ [-o output_file] (optional argument) Gooscan can create a nice HTML output file. This file includes links to the actual Google search results pages for each query.
■ [-p proxy:port] (optional argument) This is the address and port of an HTTP proxy server. Queries will be sent here and bounced off to the appliance indicated
with the -t argument. The format can be similar to 10.1.1.150:80 or proxy.validcompany.com:8080.
■ [-v] (optional argument) Verbose mode. Every program needs a verbose mode, especially when the author sucks with a command-line debugger.
■ [-s site] (optional argument) This filters only results from a certain site, adding
the site operator to each query Gooscan submits. This argument has absolutely no
meaning when used against Google appliances, since Google appliances are already site filtered. For example, consider the following Google queries:
site:microsoft.com linux
site:apple.com microsoft
site:linux.org microsoft
With advance express permission from Google (you do have advance permission from Google, don’t you?) you could run the following with Gooscan to achieve the same results:
$ ./gooscan -t www.google.com -s microsoft.com -q linux
$ ./gooscan -t www.google.com -s apple.com -q microsoft
$ ./gooscan -t www.google.com -s linux.org -q microsoft
The [-x] and [-d] options are used with the Google appliance. We don’t talk too much
about the Google appliance in this book. Suffice it to say that the vast majority of the techniques that work against Google.com will work against a Google appliance as well.
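The switches described above can be assembled programmatically. The following is a minimal Python sketch, not part of Gooscan itself: the function name build_gooscan_args, the ./gooscan path, and the appliance host name are all hypothetical. It uses only the switches documented in this section, and it assumes that exactly one of -q or -i is supplied on each run.

```python
from typing import List, Optional


def build_gooscan_args(
    target: str,
    query: Optional[str] = None,
    query_file: Optional[str] = None,
    output_file: Optional[str] = None,
    proxy: Optional[str] = None,
    site: Optional[str] = None,
    verbose: bool = False,
    binary: str = "./gooscan",  # assumed location of the compiled tool
) -> List[str]:
    """Return an argument list built from the switches described above."""
    # Assumption: a run supplies either a single query (-q) or a data file (-i).
    if (query is None) == (query_file is None):
        raise ValueError("supply exactly one of query (-q) or query_file (-i)")
    args = [binary, "-t", target]
    args += ["-q", query] if query is not None else ["-i", query_file]
    if output_file:
        args += ["-o", output_file]  # HTML report
    if proxy:
        args += ["-p", proxy]        # proxy:port
    if site:
        args += ["-s", site]         # adds the site operator to each query
    if verbose:
        args.append("-v")
    return args


if __name__ == "__main__":
    # Hypothetical appliance host; do not target www.google.com without permission.
    print(build_gooscan_args("appliance.example.com",
                             query="intitle:index.of secret",
                             output_file="results.html"))
```

Returning a list rather than a single string means a query containing spaces or quotes can be handed to a process launcher without any additional shell quoting.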
Gooscan’s Data Files
Used in multiple-query mode, Gooscan reads queries from a data file. The format of the data files is as follows:
search_type | search_string | count | description
search_type can be one of the following:
■ intitle Finds search_string in the title of the page. If requested on the command
line, Gooscan will append the site query. Example:
intitle|error||
This will find the word error in the title of a page.
■ inurl Finds search_string in the URL of the page. If requested on the command
line, Gooscan will append the site query. Example:
inurl|admin||
This will find the word admin in the URL of a page.
■ indexof Finds search_string in a directory listing. If requested on the command line,
Gooscan will append the site query. Directory listings often will have the term
index of in the title of the page. Gooscan will generate a Google query that looks
something like this:
intitle:index.of search_string
NOTE
When using the site switch, Gooscan automatically performs a generic search for directory listings. That query looks like this: intitle:index.of site:site_name
If this generic query returns no results, Gooscan will skip any subsequent indexof searches. It is a logical conclusion to skip specific indexof searches if the most generic of indexof searches returns nothing.
■ filetype Finds search_string as a filename, inserting the site query if requested on the
command line. For example:
filetype|cgi cgi||
This search will find files that have an extension of cgi.
■ raw This search_type allows the user to build custom queries. The query is passed to
Google unmodified, adding a site query if requested on the command line. For example:
raw|filetype:xls email username password||
This example will find Excel spreadsheets with the words email, username, and password inside the document.
■ search_string The search_string is fairly straightforward. Any string is allowed here except the newline (\n) and | characters. This string is URL-encoded before it is sent to Google: the A character is converted to %41 (its hexadecimal ASCII code), and so on. There are some exceptions, such as the
fact that spaces are converted to the + character.
■ count This field records the approximate number of hits found when a similar
query is run against all of Google. Site is not applied. This value is somewhat
arbitrary in that it is based on the rounded numbers supplied by Google and that this number can vary widely based on when and how the search is performed. Still, this number can provide a valuable watermark for sorting data files and creating custom
data files. For example, zero count records could safely be eliminated before running a large search. (This field is currently not used by Gooscan.)
■ description This field describes the search type. Currently, only the filetype.gs data file populates this field. Keep reading for more information on the filetype.gs data file.
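The record format and the per-type query construction described above can be sketched in Python. This is an illustration written from the description in this section, not Gooscan's actual source: the names Record, parse_record, build_query, and encode_query are hypothetical, and the filetype: prefix is an assumption about how the filetype search type is expanded. Note also that quote_plus leaves letters and digits untouched, whereas the text above says Gooscan encodes them as well; spaces become + in both cases.

```python
from typing import NamedTuple, Optional
from urllib.parse import quote_plus


class Record(NamedTuple):
    search_type: str
    search_string: str
    count: Optional[int]  # None when the field is empty
    description: str


# Prefix placed in front of search_string for each search_type.
PREFIXES = {
    "intitle": "intitle:",
    "inurl": "inurl:",
    "indexof": "intitle:index.of ",
    "filetype": "filetype:",  # assumption: not spelled out in the text
    "raw": "",                # passed through unmodified
}


def parse_record(line: str) -> Record:
    """Split one 'search_type|search_string|count|description' line."""
    search_type, search_string, count, description = line.rstrip("\n").split("|", 3)
    count = count.strip()
    return Record(search_type.strip(), search_string.strip(),
                  int(count) if count.isdigit() else None,
                  description.strip())


def build_query(record: Record, site: Optional[str] = None) -> str:
    """Build the Google query for a record, appending site: if requested."""
    if record.search_type not in PREFIXES:
        raise ValueError("unknown search_type: " + record.search_type)
    query = PREFIXES[record.search_type] + record.search_string
    if site:
        query += " site:" + site
    return query


def encode_query(query: str) -> str:
    """Percent-encode a query for use in a URL; spaces become '+'."""
    return quote_plus(query)


if __name__ == "__main__":
    for line in ("intitle|error||", "indexof|secret||",
                 "raw|filetype:xls email username password||"):
        query = build_query(parse_record(line), site="example.com")
        print(query, "->", encode_query(query))
```

Keeping the prefixes in a single table makes it easy to see at a glance how each search_type differs, and an unknown type fails loudly instead of silently producing a malformed query.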
Several data files are included with Gooscan, each with a distinct purpose:
■ gdork.gs This file includes excerpts from the Google Hacking Database (GHDB) hosted at http://johnny.ihackstuff.com. The GHDB is the Internet’s largest database
of Google hacking queries, maintained by thousands of members who make up the Search Engine Hacking Forums, also hosted at http://johnny.ihackstuff.com. Updated many times a week, the GHDB currently sits at around 1500 unique queries.
■ filetype.gs This huge file contains every known filetype in existence, according to
www.filext.com. By selecting interesting lines from this file, you can quickly determine the types of files that exist on a server that might warrant further investigation. We suggest creating a subset of this file (with a Linux command such as head -50 filetype.gs > short_filetype.gs)
for use in the field. Do not run this file as is. It’s too big. With over 8,000 queries, this search would certainly take quite a while and burn precious
resources on the target server. Instead, rely on the numbers in the count field to
tell you how many (approximate) sites contain these files in Google, selecting only those that are the most common or relevant to your site. The filetype.gs file lists the most commonly found extensions at the top.
■ inurl.gs This very large data file contains strings from the most popular CGI
scanners, which excel at locating programs on Web servers. Sorted by the approximate number of Google hits, this file lists the most common strings at the top, with very esoteric CGI vulnerability strings listed near the bottom. This data file locates the strings in the URL of a page. This is another file that shouldn’t be run in its entirety.
■ indexof.gs Nearly identical to the inurl.gs file, this data file finds the strings in a directory listing. Run portions of this file, not all of it!
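Trimming a large data file by its count field, as suggested above, could look something like the following Python sketch. The function name select_common and the sample records (including their counts and descriptions) are made up for illustration, and the sketch assumes that comment lines begin with a # character.

```python
from typing import Iterable, List


def select_common(lines: Iterable[str], limit: int, min_count: int = 1) -> List[str]:
    """Keep the `limit` records with the highest count, dropping low counts."""
    scored = []
    for line in lines:
        text = line.rstrip("\n")
        fields = text.split("|")
        # Skip comments (assumed to start with '#'), blank lines, and
        # records whose count field is missing or not a number.
        if text.startswith("#") or len(fields) < 3 or not fields[2].strip().isdigit():
            continue
        count = int(fields[2])
        if count >= min_count:
            scored.append((count, text))
    # The sort is stable, so records with equal counts keep their file order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:limit]]


if __name__ == "__main__":
    # Hypothetical records; real counts come from the data file itself.
    sample = [
        "# hypothetical header line",
        "filetype|pdf pdf|5000|Portable Document Format",
        "filetype|xyz xyz|0|Rarely seen extension",
        "filetype|doc doc|9000|Word document",
    ]
    for record in select_common(sample, limit=2):
        print(record)
```

With the default min_count of 1, zero-count records are dropped before the limit is applied, which matches the advice about eliminating them before a large search.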
Using Gooscan
Gooscan can be used in two distinct ways: single-query mode or multiple-query mode.
Single-query mode is little better than using Google’s Web search feature, with the exception
that Gooscan will provide you with Google’s number of results in a more portable format.
As shown in Figure 12.8, a search for the term daemon9 returns 2440 results from all of
Google. To narrow this search to a specific site, such as phrack.org, add the [-s] option. For
example:
gooscan -q "daemon9" -t www.google.com -s phrack.org
Figure 12.8 Gooscan’s Single-Query Mode
Notice that Gooscan presents a very lengthy disclaimer when you select www.google.com as the target server. This disclaimer is only presented when you submit a
search that potentially violates Google’s TOS. The output from a standard Gooscan run is
fairly paltry, listing only the number of hits from the Google search. You can apply the [-o]
option to create a nicer HTML output format. To run the daemon9 query with nicer output,
run:
gooscan -q "daemon9" -t www.google.com -o daemon9.html
As shown in Figure 12.9, the HTML output lists the options that were applied to the
Gooscan run, the date the scan was performed, a list of the queries, a link to the actual
Google search, and the number of results.
Figure 12.9 Gooscan’s HTML Output in Single-Query Mode
The link in the HTML output points to Google. Clicking the link will perform the Google search for you. Don’t be too surprised if the numbers on Google’s page differ from what is shown in the Gooscan output; Google’s search results are sometimes only approximations.
Running Gooscan in multiple-query mode is a blatant violation of Google’s TOS but shouldn’t cause too much of a Google-stink if it’s done judiciously. One way to keep Google
on your good side is to respect the spirit of its TOS by sending small batches of queries and not pounding the server with huge data files. As shown in Figure 12.10, you can create a
small data file using the head command. A command such as:
head -5 data_files/gdork.gs > data_files/little_gdork.gs
will create a four-query data file, since the gdork.gs file has a commented header line.
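The same idea extends beyond a single head command. As a rough Python sketch, a data file can be split into several small batches, each of which can then be written to its own file and run separately. The function name batches is hypothetical, and the sketch assumes that comment lines such as the gdork.gs header begin with a # character.

```python
from typing import Iterable, Iterator, List


def batches(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield lists of at most `size` queries, skipping comments and blanks."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for line in lines:
        text = line.strip()
        # Assumption: comment lines start with '#'.
        if not text or text.startswith("#"):
            continue
        batch.append(text)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


if __name__ == "__main__":
    # Hypothetical records in the data file format described earlier.
    sample = ["# header", "intitle|error||", "inurl|admin||", "indexof|secret||"]
    for number, batch in enumerate(batches(sample, 2), start=1):
        print(number, batch)
```

Because comment lines are skipped before counting, every batch holds real queries only, unlike head -5, which spends one of its five lines on the header.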
Figure 12.10 Running Small Data Files Could Keep Google from Frowning at You
The output from the multiple-query run of Gooscan is still paltry, so let’s take a look at the HTML output shown in Figure 12.11.
Figure 12.11 Gooscan’s HTML Output in Multiple-Query Mode
Using Gooscan with the [-s] switch we can narrow our results to one particular site, in
this case http://johnny.ihackstuff.com, with a command such as:
gooscan -t www.google.com -i data_files/little_gdork.gs -o ihackstuff.html -s johnny.ihackstuff.com
as shown in Figure 12.12. (Don’t worry, that Johnny guy won’t mind!)
Figure 12.12 A Site-Narrowed Gooscan Run
Most site-narrowed Gooscan runs should come back pretty clean, as this run did. If you see hits that look suspicious, click the link to see exactly what Google saw. Figure 12.13 shows the Google search in its entirety.
In this case, we managed to locate the Google Hacking Database itself, which included a reference that matched our Google query. The other searches didn’t return any results, because they were a tad more specific than the Calamaris query, which didn’t search titles, URLs, filetypes, and the like.
In summary, Gooscan is a great tool for checking your Web site’s exposure, but it should
be used cautiously since it does not use the Google API. Break your scans into small batches, unless you (unwisely) like thumbing your nose at the Establishment.
Figure 12.13 Linking to Google’s Results from Gooscan
Windows Tools and the .NET Framework
The Windows tools we’ll look at all require the Microsoft .NET Framework, which can be
located with a Google query of .NET framework download. The successful installation of the
framework depends on a number of factors, but regardless of the version of Windows you’re
running, assume that you must be current on all the latest service packs and updates. If
Windows Update is available on your version of Windows, run it. The Internet Explorer
upgrade, available from the Microsoft Web site (Google query: Internet Explorer upgrade), is the
most common required update for successful installation of the .NET Framework. Before
downloading and installing Athena or Wikto, make sure you’ve got the .NET Framework
(versions 1.1 or 2.0 respectively) properly installed.
NOTE
The only way Google will explicitly allow you to automate your queries is via the Google Application Programming Interface. Some of the API tools covered in this book rely on the SOAP API, which Google discontinued in favor
of the AJAX API. If you have an old SOAP API key, you’re in luck. That key will still work with API-based tools. However, if you don’t have a SOAP key, you should consider using SensePost’s Aura program
(www.sensepost.com/research/aura) as an alternative to the old SOAP API.
Athena by Steve Lord (steve@buyukada.co.uk) is a Windows-based Google scanner that is not based on the Google API. As with Gooscan, the use of this tool is in violation of
Google’s TOS, and as a result, Google can block your IP range from using its search engine. Athena is potentially less intrusive than Gooscan, since Athena only allows you to perform one search at a time, but Google’s TOS is clear: no automated scanning is allowed. Just as we discussed with Gooscan, use any non-API tool judiciously. History suggests that if you’re nice to Google, Google will be nice to you.
Athena can be downloaded from http://snakeoillabs.com/. The download consists of a single MSI file. Assuming you’ve installed version 1.1 of the .NET Framework, the Athena installer is a simple wizard, much like most Windows-based software. Once installed and run, Athena presents the main screen, as shown in Figure 12.14.
Figure 12.14 Athena’s Main Screen
As shown, this screen resembles a simple Web browser. The Refine Search field allows you
to enter or refine an existing query. The Search button is similar to Google’s Search button and executes a search, the results of which are shown in the browser window.
To perform basic searches with Athena, you need to load an XML file containing your desired search strings. Simply open the file from within Athena and all the searches will
appear in the Select Query drop-down box. For example, loading the digicams XML file