Figure 12.24 GSI Options Screen
Execution is simple as well. Simply fill in the name of the target website and click Start GSI.
The results will be shown in a hierarchical format, as shown in Figure 12.25.
Figure 12.25 GSI Output
Notice that the results are presented in a hierarchical tree that represents the files and directories on the target site. Each link can be clicked to browse to the appropriate page.
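The hierarchical view described above can be approximated in a few lines of Python. This is only a minimal sketch of the idea, not GSI's actual code: the function names `build_tree` and `render` are made up for illustration, and the second URL in the sample list is a hypothetical result added to show how sibling entries nest.

```python
from urllib.parse import urlparse


def build_tree(urls):
    """Nest the path segments of each URL into a dict-of-dicts tree."""
    tree = {}
    for url in urls:
        node = tree
        for segment in urlparse(url).path.strip("/").split("/"):
            if segment:
                node = node.setdefault(segment, {})
    return tree


def render(tree, depth=0):
    """Return one indented line per file or directory, sorted by name."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * depth + name)
        lines.extend(render(tree[name], depth + 1))
    return lines


# The first URL comes from the text; the second is a hypothetical result.
urls = [
    "http://www.tankedgenius.com/blog/cp/index.html",
    "http://www.tankedgenius.com/blog/index.html",
]
index_lines = render(build_tree(urls))
print("\n".join(index_lines))
```

Each level of indentation corresponds to one directory level on the target site, which is the same information the GSI output tree conveys.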
Alternatively, you can right-click within Firefox and select GSI. In this case, GSI will launch with the query filled in based on the selected text or, if no text is selected, with the name of the current website filled in automatically.
GSI has several options to select from, as shown in Table 12.2.
Table 12.2 GSI Options

GSI Option | Description
Recursive Search | If you choose to use a recursive search, GSI will use inurl searches. For example, if you run a Google Site Index on tankedgenius.com, GSI first sends the query site:tankedgenius.com. Suppose that query returns the result http://www.tankedgenius.com/blog/cp/index.html. If the recursion level is set to 1, GSI also sends the query site:tankedgenius.com inurl:blog and adds those results to the index. If the recursion level is set to 2, it also sends the query site:tankedgenius.com inurl:cp and adds those results as well.
Full website names | By default, GSI displays an indented site index with only the directory name showing for each link. If you prefer, you can set this option so that it shows the entire link.
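The recursive search behavior described in Table 12.2 can be sketched as follows. This is an illustration under stated assumptions rather than GSI's real logic: the function name `recursive_queries` is hypothetical, and the sketch assumes that the last path segment of a result is a file name unless the path ends with a slash.

```python
from urllib.parse import urlparse


def recursive_queries(site, result_urls, level):
    """Build the site:/inurl: queries for a given recursion level.

    Level 0 yields only the base site: query. Each further level adds one
    more directory (counted from the site root) as an inurl: term.
    """
    queries = [f"site:{site}"]
    for url in result_urls:
        path = urlparse(url).path
        parts = [p for p in path.split("/") if p]
        if not path.endswith("/"):
            parts = parts[:-1]  # assumption: the last segment is a file name
        for directory in parts[:level]:
            query = f"site:{site} inurl:{directory}"
            if query not in queries:
                queries.append(query)
    return queries


results = ["http://www.tankedgenius.com/blog/cp/index.html"]
level_1 = recursive_queries("tankedgenius.com", results, level=1)
level_2 = recursive_queries("tankedgenius.com", results, level=2)
print(level_2)
```

With the example result from the table, level 1 adds the inurl:blog query and level 2 adds inurl:cp, matching the sequence described above.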
NOTE
Due to the nature of the Google queries that GSI sends, GSI may get 403 errors from Google. These errors are normal when sending queries with multiple operators.
Advanced Dork
Advanced Dork is an extension for Firefox and Mozilla browsers that provides Google Advanced Operators for use directly from the right-click context menu. Written by CP, the tool is available from https://addons.mozilla.org/en-US/firefox/addon/2144.
Like all Firefox extensions, installation is a snap: simply click the link to the xpi file from within Firefox and the installation will launch.
Advanced Dork is context sensitive: right-clicking will invoke Advanced Dork based on where the right-click was performed. For example, right-clicking on a link will invoke link-specific options, as shown in Figure 12.26.
Figure 12.26 Advanced Dork Link Context
Right-clicking on highlighted text will invoke the highlighted text search mode of Advanced Dork, as shown in Figure 12.27.
Figure 12.27
This mode will allow you to use the highlighted word in an intitle, inurl, intext, site, or ext search. Several awesome options are available in Advanced Dork, as shown in Figures 12.28 and 12.29.
Figure 12.28
Figure 12.29
Some of these options are explained in Table 12.3.
Table 12.3 Advanced Dork Options

Option | Description
Highlight Text Functions | Right-click to choose from over 15 advanced Google operators. This function can be disabled in the options menu.
Right-Click HTML Page Info | Right-click anywhere on a page with no text selected, and Advanced Dork will focus on the page's HTML title and ALT tags for searching using the intitle and allintext operators, respectively. This function can be disabled in the options menu.
Right-Click Links | Right-click on a link to choose from site: links domain, link: this link, and cache: this link. Site: links domain will only search the domain name, not the full URL.
Right-Click URL Bar | Right-click the URL Bar (Address Bar) and choose from site, inurl, link, and cache. Inurl works with the highlighted portion of text only. Site will only search the domain name, not the full URL.
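To make the link and highlighted-text behaviors concrete, here is a minimal Python sketch of how such queries could be composed. It is not taken from the Advanced Dork source: `link_queries` and `highlight_query` are hypothetical helper names, and quoting multi-word selections is an assumption made for the example.

```python
from urllib.parse import urlparse

# Operators the text lists for the highlighted text search mode.
HIGHLIGHT_OPERATORS = {"intitle", "inurl", "intext", "site", "ext"}


def link_queries(url):
    """Build the three link-context queries for a right-clicked link.

    The site: query uses only the host name, while link: and cache:
    use the full URL.
    """
    host = urlparse(url).hostname or ""
    return {
        "site": f"site:{host}",
        "link": f"link:{url}",
        "cache": f"cache:{url}",
    }


def highlight_query(operator, text):
    """Apply one operator to the highlighted text."""
    if operator not in HIGHLIGHT_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    text = text.strip()
    if " " in text:
        text = f'"{text}"'  # assumption: quote multi-word selections
    return f"{operator}:{text}"


queries = link_queries("http://www.tankedgenius.com/blog/cp/index.html")
print(queries["site"])
print(highlight_query("intitle", "index of"))
```

The point to notice is the asymmetry the table describes: the site query is built from the domain name alone, not from the full URL.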
Advanced Dork is an amazing tool for any serious Google user. You should definitely add it to your arsenal.
Getting Help from Google
So far we’ve looked at various ways of checking your site for potential information leaks, but what can you do if you detect such leaks? First and foremost, you should remove the
offending content from your site. This may be a fairly involved process, but to do it right,
you should always figure out the source of the leak, to ensure that similar leaks don’t happen
in the future. Information leaks don’t just happen; they are the result of some event that
occurred. Figure out the event, resolve it, and you can begin to stem the source of the
problem. Google makes a great Web page available that helps answer some of the most commonly asked questions from a Webmaster’s perspective. The “Google Information for
Webmasters” page, located at www.google.com/webmasters, lists all sorts of answers to
commonly asked questions.
Solving the local problem is only half the battle. In some cases, Google has a cached copy of your information leak just waiting to be picked up by a Google hacker. There are
two ways you can delete a cached version of a page. The first method involves the automatic URL removal system at http://www.google.com/webmasters/tools/removals. This page,
shown in Figure 12.30, requires that you first verify your e-mail address. Although this
appears to be a login for a Google account, Google accounts don’t seem to provide you access. In most cases, you will have to reregister, even if you have a Google account. The exception seems to be Google Groups accounts, which appear to allow access to this page without a problem.
Figure 12.30 Google’s Automatic URL Removal Tool
The URL removal tool will walk you through a series of questions that will verify your ownership of the content and determine what it is that you are trying to remove. Each of the options is fairly self-explanatory, but remember that the responsibility for content removal rests with you. You should ensure that your content is indeed removed from your site, and follow up the URL removal process with manual checks.
The subject of Web server security is too big for any one book. There are so many varied
requirements combined with so many different types of Web server software, application
software, and operating system software that no one book could do the topic justice.
However, a few general principles can at least help you prevent the devastating effects a
malicious Google hacker could inflict on a site you’re charged with protecting.
First, understand how the Web server software operates in the event of an unexpected condition. Directory listings, missing index files, and specific error messages can all open up
avenues for offensive information gathering. Robots.txt files, simple password authentication,
and effective use of META tags can help steer Web crawlers away from specific areas of your
site. Although Web data is generally considered public, remember that Google hackers might
take interest in your site if it appears as a result of a malicious Google search. Default pages,
directories, and programs can serve as an indicator that there is a low level of technical
know-how behind a site. Servers with this type of default information serve as targets for
hackers. Get a handle on what, exactly, a search engine needs to know about your site to
draw visitors without attracting undue attention as a result of too much exposure. Use any
of the available tools, such as Gooscan, Athena, Wikto, GSI, Google Rower, and Advanced
Dork, to help you search Google for your site’s information leaks. If you locate a page that
shouldn’t be public, use Google’s removal tools to flush the page from Google’s database.
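One way to confirm that a robots.txt file steers crawlers the way you intend is to test it with Python's standard urllib.robotparser module. The sketch below parses an in-memory file rather than fetching a live one. The directory names /admin/ and /backup/ and the helper name `is_crawlable` are hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the directory names are illustrative.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(path, agent="Googlebot"):
    """Return True if the parsed rules allow `agent` to fetch `path`."""
    return parser.can_fetch(agent, path)


for path in ("/index.html", "/admin/login.php", "/backup/site.tar"):
    status = "crawlable" if is_crawlable(path) else "blocked"
    print(f"{path}: {status}")
```

Keep in mind that robots.txt only steers well-behaved crawlers and is itself readable by anyone, so it works alongside the password mechanisms mentioned above rather than replacing them.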
Solutions Fast Track
A Good, Solid Security Policy
■ An enforceable, solid security policy should serve as the foundation of any security effort
■ Without a policy, your safeguards could be inefficient or unenforceable
Web Server Safeguards
■ Directory listings, error messages, and misconfigurations can provide too much information
■ Robots.txt files and specialized META tags can help direct search engine crawlers
away from specific pages or directories
■ Password mechanisms, even basic ones, keep crawlers away from protected content
■ Default pages and settings indicate that a server is not well maintained and can make that server a target
Hacking Your Own Site
■ Use the site operator to browse the servers you’re charged with protecting. Keep an
eye out for any pages that don’t belong.
■ Use a tool like Gooscan, Athena, GSI, Google Rower, or Advanced Dork to assess your exposure. These tools do not use the Google API, so be aware that any blatant abuse or excessive activity could get your IP range cut off from Google.
■ Use a tool like Wikto, which uses the Google API and should free you from fear of getting shut down.
■ Use the Google Hacking Database to monitor the latest Google hacking queries. Use the GHDB exports with tools like Gooscan, Athena, or Wikto.
Getting Help from Google
■ Use Google’s Webmaster page for information specifically geared toward Webmasters
■ Use Google’s URL removal tools to get sensitive data out of Google’s databases
Links to Sites
■ http://johnny.ihackstuff.com The home of the Google Hacking Database (GHDB), the search engine hacking forums, the Gooscan tool, and the GHDB export files
■ www.snakeoillabs.com Home of Athena
■ http://www.seorank.com/robots-tutorial.htm A good tutorial on using the robots.txt file
■ http://googleblog.blogspot.com/2007/02/robots-exclusion-protocol.html Information about Google’s Robots policy
■ http://www.microsoft.com/technet/archive/security/chklist/iis5cl.mspx The IIS 5.0 Security Checklist
■ http://technet2.microsoft.com/windowsserver/en/library/ace052a0-a713-423e-8e8c-4bf198f597b81033.mspx The IIS 6.0 Security Best Practices
■ http://httpd.apache.org/docs/2.0/misc/security_tips.html Apache Security Tips document
■ www.sensepost.com/research/aura Sensepost’s AURA, which simulates Google SOAP API calls
■ http://www.tankedgenius.com Home of JeffBall and Cp’s GSI and Google Rower tools
■ https://addons.mozilla.org/en-US/firefox/addon/2144 Home of Cp’s Advanced Dork
Q: What is the no-cache pragma? Will it keep my pages from caching on Google’s servers?
A: The no-cache pragma is a META tag that can be entered into a document to instruct
the browser not to load the page into the browser’s cache. This does not affect Google’s caching feature; it is strictly an instruction to a client’s browser. See
www.htmlgoodies.com/beyond/nocache.html for more information.
Q: I’d like to know more about securing Web servers. Can you make any
recommendations?
A:
Q: Can you provide any more details about securing IIS?
A: Microsoft makes available a very nice IIS Security Planning Tool. Try a Google search for
IIS Security Planning Tool. Microsoft also makes available an IIS 5 security checklist;
Google for IIS 5 services checklist. An excellent read pertaining to IIS 6 can be found with
a query like “elements of IIS security”. Also, frequent the IIS Security Center. Try querying for IIS security center.
Q: Okay, enough about IIS. What about securing Apache servers?
A: Securityfocus.com has a great article, “Securing Apache: Step-by-Step,” available from
www.securityfocus.com/infocus/1694.
Frequently Asked Questions
The following Frequently Asked Questions, answered by the authors of this book, are designed to both measure your understanding of the concepts presented in this chapter and to assist you with real-life implementation of these concepts. To have
your questions about this chapter answered by the author, browse to
www.syngress.com/solutions and click on the “Ask the Author” form.
Q: Which is the best tool for checking my Google exposure?
A: That’s a tough question, and the answer depends on your needs. The absolute most
thorough way to check your Web site’s exposure is to use the site operator. A query such
as site:gulftech.org will show you all the pages on gulftech.org that Google knows about.
By looking at each and every page, you’ll absolutely know what Google has on you. Repeat this process once a week.
If this is too tedious, you’ll need to consider an automation tool. A step above the
site technique is Athena. Athena reads the full contents of the GHDB and allows you to step through each query, applying a site value to each search. This allows you to step
through the comprehensive list of “bad searches” to see if your site is affected. Athena does not use the Google API but is not automated in the truest sense of the word. Gooscan is potentially the biggest Google automation offender when used improperly, since it is built on the GHDB and will crank through the entire GHDB in fairly short order. It does not use the Google API, and Google will most certainly notice you using
it in its wide-open configuration. This type of usage is not recommended, since Google could make for a nasty enemy, but when Gooscan is used with discretion and respect for the spirit of Google’s no-automation rule, it is a most thorough automated tool.
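The idea of applying a site value to each search can be sketched in Python as follows. This only builds query strings and search URLs for you to step through by hand in a browser; it sends nothing to Google. The three sample queries are illustrative stand-ins rather than actual GHDB entries, and `scope_queries` and `search_url` are hypothetical helper names.

```python
from urllib.parse import urlencode

# Illustrative stand-ins for GHDB-style queries, not real GHDB entries.
SAMPLE_QUERIES = [
    'intitle:"index of"',
    "inurl:admin",
    "ext:bak",
]


def scope_queries(site, queries):
    """Prefix each query with a site: restriction."""
    return [f"site:{site} {query}" for query in queries]


def search_url(query):
    """Build a Google search URL to open manually in a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})


scoped = scope_queries("gulftech.org", SAMPLE_QUERIES)
for query in scoped:
    print(query)
    print("  " + search_url(query))
```

Stepping through the printed URLs one at a time keeps a human in the loop, in the same spirit as the discretion recommended above.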