<input type="submit" name="submit"
value="Remove Address From C++ Mailing List">
Each form contains one data-entry field called email-address, as well as a couple of hidden
fields which don’t provide for user input but carry information back to the server nonetheless.
The subject-field tells the CGI program the subdirectory where the resulting file should be placed. The command-field tells the CGI program whether the user is requesting that they be added to or removed from the list. From the action, you can see that a GET is used with a
program called mlm.exe (for “mailing list manager”). Here it is:
const string contact("Bruce@EckelObjects.com");
// Paths in this program are for Linux/Unix. You
// must use backslashes (two for each single
// slash) on Win32 servers:
const string rootpath("/home/eckel/");
cout << "<h2>You cannot use white space "
"in your email address" << endl;
return 0;
}
if(email.find('@') == string::npos) {
cout << "<h2>You must use a proper email"
" address including an '@' sign" << endl;
return 0;
}
if(email.find('.') == string::npos) {
cout << "<h2>You must use a proper email"
" address including a '.'" << endl;
return 0;
}
out << email << endl;
cout << "<br><H2>" << email << " has been ";
if(query["command-field"] == "add")
cout << "added";
else if(query["command-field"] == "remove")
cout << "removed";
cout << "<br>Thank you</H2>" << endl;
} ///:~
Again, all the CGI work is done by the CGImap. From then on it’s a matter of pulling the
fields out, looking at them, and deciding what to do about them. That part is easy because of the
way you can index into a map, and also because of the tools available for standard strings.
Here, most of the programming has to do with checking for a valid email address. Then a file is created whose name is the email address with “.add” or “.remove” as the extension, and the email address is placed in the file.
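The validation and naming steps just described can be sketched as two small functions. This is a minimal sketch, not the code from mlm.cpp. The names plausibleEmail() and requestFileName() are made up for illustration, and so is the assumption that subject-field names a subdirectory directly under rootpath. Like the original checks, this only screens out obvious mistakes and cannot prove that an address is deliverable.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper approximating the checks described for mlm.cpp:
// reject empty input, white space, and addresses lacking '@' or '.'.
bool plausibleEmail(const std::string& email) {
  return !email.empty() &&
         email.find_first_of(" \t\r\n") == std::string::npos &&
         email.find('@') != std::string::npos &&
         email.find('.') != std::string::npos;
}

// Hypothetical helper: the request file is named after the address,
// with "add" or "remove" as the extension, inside the subdirectory
// named by subject-field (an assumed directory layout).
std::string requestFileName(const std::string& rootpath,
                            const std::string& subject,
                            const std::string& email,
                            const std::string& command) {
  return rootpath + subject + "/" + email + "." + command;
}
```

For example, requestFileName("/home/eckel/", "cpp-list", "bob@example.com", "add") yields "/home/eckel/cpp-list/bob@example.com.add".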
Maintaining your list
Once you have a list of names to add, you can just paste them onto the end of your list. However, you might get some duplicates, so you need a program to remove them. Because your names may differ only by upper and lowercase, it’s useful to create a tool that will read a list of names from a file and place them into a container of strings, forcing all the names to
lowercase as it does so:
//: C10:readLower.h
// Read a file into a container of string,
// forcing each line to lower case
inline char downcase(char c) {
using namespace std; // Compiler bug
sized buffer, which is more fragile.
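Here is a minimal sketch of a readLower( )-style template, assuming C++11. Unlike the original, it reads from any std::istream rather than opening a file by name. That is a hypothetical change to keep it easy to test; pass a std::ifstream to read a file. It uses std::getline( ) into a std::string, so no fixed-size line buffer is involved. The names lcase() and readLowerFrom() are illustrative, not the ones in readLower.h.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Lowercase one string. Passing each char through unsigned char
// avoids undefined behavior in std::tolower for negative char values.
inline std::string lcase(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char c) {
                   return static_cast<char>(std::tolower(c));
                 });
  return s;
}

// Read every line of a stream into any container that supports
// push_back(), forcing each line to lowercase. std::getline into a
// std::string imposes no fixed line-length limit.
template<class SContainer>
void readLowerFrom(std::istream& in, SContainer& c) {
  std::string line;
  while (std::getline(in, line))
    c.push_back(lcase(line));
}
```

Because it is a template, the same function fills a vector<string> or a list<string>.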
Once the names are read into the list and forced to lowercase, removing duplicates is trivial:
//: C10:RemoveDuplicates.cpp
// Remove duplicate names from a mailing list
long before = names.size();
// You must sort first for unique() to work:
The sort must be performed so that all duplicates are adjacent to each other. Then unique( )
can remove all the adjacent duplicates. The program also keeps track of how many duplicate names were removed.
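One point worth spelling out: the std::unique( ) algorithm never changes the size of a container. It moves one copy of each run of adjacent equal elements to the front and returns an iterator to the new logical end. With a vector you therefore normally pair it with erase( ). (std::list has its own unique( ) member function, which does erase the duplicates.) Here is a minimal sketch of the sort/unique/erase sequence, using a hypothetical removeDuplicates() function:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Remove duplicate names and return how many were removed.
// std::unique() only collapses *adjacent* duplicates, so sort first.
// unique() does not shrink the vector; erase() trims the leftovers
// between the returned logical end and the physical end.
long removeDuplicates(std::vector<std::string>& names) {
  long before = static_cast<long>(names.size());
  std::sort(names.begin(), names.end());
  names.erase(std::unique(names.begin(), names.end()), names.end());
  return before - static_cast<long>(names.size());
}
```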
When you have a file of names to remove from your list, readLower( ) comes in handy again:
typedef list<string> Container;
int main(int argc, char* argv[]) {
requireArgs(argc, 3);
Container names, removals;
readLower(argv[1], names);
readLower(argv[2], removals);
long original = names.size();
Container::iterator rmit = removals.begin();
// Remove every instance of each name:
while(rmit != removals.end())
names.remove(*rmit++);
long removed = original - names.size();
cout << "On removal list: " << removals.size()
<< "\n Removed: " << removed << endl;
} ///:~
Here, a list is used instead of a vector (since readLower( ) is a template, it adapts). Although there is a remove( ) algorithm that can be applied to containers, the built-in list::remove( ) is a better fit. The generic algorithm only shifts the remaining elements forward and returns a new logical end, so the container keeps its size unless you follow it with erase( ). list::remove( ), by contrast, actually erases the matching elements. The second command-line argument is the file containing the list of names to be removed. An iterator is used to step through that list, and the list::remove( )
function removes every instance of each name from the master list. Here, the list doesn’t need
to be sorted first.
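The two kinds of remove( ) can be compared side by side in this sketch. The overloaded removeGroup() function is hypothetical. The list version mirrors the loop described above. The vector version uses the erase-remove idiom that the generic algorithm requires.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>
#include <vector>

// std::list::remove() erases every element equal to its argument
// and shrinks the list; no sorting is needed.
void removeGroup(std::list<std::string>& names,
                 const std::list<std::string>& removals) {
  for (const std::string& r : removals)
    names.remove(r);
}

// std::remove() cannot change a container's size: it shifts the
// survivors forward and returns the new logical end, so a vector
// needs erase() to discard the tail (the erase-remove idiom).
void removeGroup(std::vector<std::string>& names,
                 const std::vector<std::string>& removals) {
  for (const std::string& r : removals)
    names.erase(std::remove(names.begin(), names.end(), r), names.end());
}
```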
Unfortunately, that’s not all there is to it. The messiest part of maintaining a mailing list is the bounced messages. Presumably, you’ll just want to remove the addresses that produce bounces. If you can combine all the bounced messages into a single file, the following
program has a pretty good chance of extracting the email addresses; then you can use
RemoveGroup to delete them from your list.
//: C10:ExtractUndeliverable.cpp
// Find undeliverable names to remove from
// mailing list from within a mail file
// containing many messages
// The in() function allows you to check whether
// a string in this set is part of your argument
// Calculate array length:
#define ALEN(A) ((sizeof A)/(sizeof *A))
StringSet
starts(start_str, ALEN(start_str)),
continues(continue_str, ALEN(continue_str));
int main(int argc, char* argv[]) {
requireArgs(argc, 2,
"Usage:ExtractUndeliverable infile outfile");
FILE* infile = fopen(argv[1], "rb");
FILE* outfile = fopen(argv[2], "w");
const char* delimiters= " \t<>():;,\n\"";
char* name = strtok(buf, delimiters);
A lot of what this program does is read lines looking for string matches. To make this
convenient, I created a StringSet class with a member function in( ) that tells you whether any of the strings in the set occur in the argument. The StringSet is initialized with a constant array of strings and the size of that array. Not only does the StringSet make the code
easier to read, it also makes it easy to add new strings to the arrays.
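StringSet itself isn't shown in this excerpt, so the following is only a guess at its shape, assuming C++11. It copies the array of C strings into a vector<string>, and in( ) returns true if any stored string occurs as a substring of the line. The marker strings used in practice would be whatever you find in your own bounce messages. None are given here, and the ones in the test are invented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical reconstruction of StringSet: holds a fixed set of
// marker strings; in() reports whether any of them occurs anywhere
// inside the line being examined.
class StringSet {
  std::vector<std::string> strings;
public:
  StringSet(const char* const* sa, std::size_t n) : strings(sa, sa + n) {}
  bool in(const std::string& line) const {
    for (const std::string& s : strings)
      if (line.find(s) != std::string::npos)
        return true;
    return false;
  }
};
```

Construction follows the pattern in ExtractUndeliverable.cpp, StringSet starts(start_str, ALEN(start_str)), and a test against a line read from the file is starts.in(line).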
Both the input file and the output file in main( ) are manipulated with standard I/O, since it’s not a good idea to mix I/O types in a program. Each line is read using fgets( ). If a line matches the starts StringSet, then what follows will contain email addresses until
you see some dashes (I figured this out empirically, by hunting through a file full of bounced
email). The continues StringSet contains strings whose lines should be ignored. For each of
the lines that potentially contains an address, each address is extracted using the Standard C
Library function strtok( ) and then added to the set<string> called names. Using a set
eliminates duplicates (you may still have duplicates that differ only in case, but those are dealt with by
RemoveGroup.cpp). The resulting set of names is then printed to the output file.
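The address-extraction step can be sketched as follows, using the same delimiter string as ExtractUndeliverable.cpp. Two details are assumptions of this sketch rather than the original. First, the line is copied into a std::string, because strtok( ) writes null characters into the buffer it scans. Second, only tokens containing an '@' are kept. Note that strtok( ) keeps hidden state between calls, so two tokenizing loops must not be interleaved. extractAddresses() is a hypothetical name.

```cpp
#include <cassert>
#include <cstring>
#include <set>
#include <string>

// Split one line with strtok() and keep tokens that look like
// addresses (here: they contain an '@'). The set drops duplicates.
void extractAddresses(const std::string& line,
                      std::set<std::string>& names) {
  const char* delimiters = " \t<>():;,\n\"";
  std::string buf(line);            // strtok() modifies its input
  for (char* name = std::strtok(&buf[0], delimiters);
       name != nullptr;
       name = std::strtok(nullptr, delimiters))
    if (std::strchr(name, '@') != nullptr)
      names.insert(name);
}
```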
Mailing to your list
There are a number of ways to connect to your system’s mailer, but the following program takes the simple approach of calling an external command (“fastmail,” a command-line mailer available on many Unix systems) using the Standard C library function system( ). The program spends all its time
building the external command.
When people don’t want to be on a list anymore, they will often ignore instructions and just reply to the message. This can be a problem if the email address they’re replying with is different from the one that’s on your list (sometimes it has been routed to a new or aliased address). To solve the problem, this program prepends a message to the text file that
informs them that they can remove themselves from the list by visiting a URL. Since many email programs present a URL in a form you can just click on, this can
produce a very simple removal process. If you look at the URL, you can see it’s a call to the
mlm.exe CGI program, including removal information that incorporates the same email
address the message was sent to. That way, even if the user just replies to the message, all you have to do is click on the URL that comes back with their reply (assuming the message is
automatically copied back to you).
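The removal URL itself isn't reproduced in this excerpt. As a hedged sketch of how such a GET URL could be built, the code below assumes the same field names as the subscription form (subject-field, command-field, email-address) and percent-encodes each value. The base URL and the functions urlEncode() and removalURL() are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <string>

// Percent-encode every byte except unreserved URL characters,
// so '@' becomes "%40" and '+' becomes "%2B".
std::string urlEncode(const std::string& s) {
  std::string out;
  for (unsigned char c : s) {
    if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
      out += static_cast<char>(c);
    } else {
      char hex[4];
      std::snprintf(hex, sizeof hex, "%%%02X", static_cast<unsigned>(c));
      out += hex;
    }
  }
  return out;
}

// Build a GET request for mlm.exe asking it to remove an address.
std::string removalURL(const std::string& base,
                       const std::string& subject,
                       const std::string& email) {
  return base + "?subject-field=" + urlEncode(subject) +
         "&command-field=remove&email-address=" + urlEncode(email);
}
```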
msg << "To be removed from this list, "
"DO NOT REPLY TO THIS MESSAGE. Instead, \n"
"click on the following URL, or visit it "
"using your Web browser. This \n"
"way, the proper email address will be "
"removed. Here's the URL:\n"
logfile << command << endl;
static int mailcounter = 0;
The first command-line argument is the list of email addresses, one per line. The names are
read one at a time into the string called name using getline( ). Then a temporary file called m.txt is created to build the customized message for that individual; the customization is the
note about how to remove themselves, along with the URL. Then the message body, which is
in the file specified by the second command-line argument, is appended to m.txt. Finally, the command is built inside a string: the “-F” argument to fastmail is who it’s from, and the “-r”
argument is who to reply to. The “-s” argument is the subject line, the next argument is the file
containing the mail, and the last argument is the email address to send it to.
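Putting the fastmail arguments in the order just described gives a command string like the one this sketch assembles. fastmailCommand() is a hypothetical helper; the resulting string would be passed to system( ) as system(cmd.c_str()). Because the command goes through the shell, any value that can contain spaces or shell metacharacters needs quoting. This sketch quotes only the subject, so its inputs should not come straight from untrusted users.

```cpp
#include <cassert>
#include <string>

// Assemble: fastmail -F <from> -r <replyto> -s "<subject>" <file> <to>
// Only the subject is quoted, so it may contain spaces.
std::string fastmailCommand(const std::string& from,
                            const std::string& replyto,
                            const std::string& subject,
                            const std::string& file,
                            const std::string& to) {
  return "fastmail -F " + from + " -r " + replyto +
         " -s \"" + subject + "\" " + file + " " + to;
}
```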
You can start this program in the background and tell Unix not to stop the program when you sign off of the server. However, it takes a while to run for a long list (not because of the program itself, but because of the mailing process). I like to keep track of the progress of the program by sending a status message to another email account, which is accomplished in the last few lines.
information could be stored in a uniform format, in a subdirectory specified by a hidden field
in the HTML form, and in a file that included the user’s email address. Of course, in the general case the email address doesn’t guarantee uniqueness (the user may post more than one submission), so the date and time of the submission can be mangled in with the file name to make it unique. If you can do this, then you can create a new data-collection page just by defining the HTML and creating a new subdirectory on your server. For example, every time I come up with a new class or workshop, all I have to do is create the HTML form for signups;
no CGI programming is required.
The following HTML page shows the format for this scheme. A CGI POST is more general than a GET, and unlike a GET it doesn’t carry its data in the URL, where length is limited. So it will always be
used instead of a GET for the ExtractInfo.cpp program that will implement this system.
Although this form is simple, yours can be as complicated as you need it to be.
<form action="/cgi-bin/ExtractInfo.exe"
method="POST">
<input type="hidden" name="subject-field"
value="test-extract-info">
<input type="hidden" name="reminder"
value="Remember your lunch!">
<input type="hidden" name="test-field"
<p>Email address (Required): <input
type="text" size="45" name="email-address" >
</p>Comment:<br>
<textarea name="Comment" rows="6" cols="55">
</textarea>
<p><input type="submit" name="submit">
<input type="reset" name="reset"></p>
</form><hr></body></html>
///:~
Right after the form’s action statement, you see
<input type="hidden"
This means that particular field will not appear on the form that the user sees, but the
information will still be submitted as part of the data for the CGI program.
The value of this field named “subject-field” is used by ExtractInfo.cpp to determine the
subdirectory in which to place the resulting file (in this case, the subdirectory will be “test-extract-info”). Because of this technique and the generality of the program, the only thing you’ll usually need to do to start a new database is to create the subdirectory on the
server and then create an HTML page like the one above. The ExtractInfo.cpp program will
do the rest for you by creating a unique file for each submission. Of course, you can always change the program if you want it to do something more unusual, but the system as shown will work most of the time.
The contents of the “reminder” field will be displayed on the form that is sent back to the user when their data is accepted. The “test-field” indicates whether to dump test information to the resulting Web page. If “mail-copy” exists and contains anything other than “no”, the value string will be parsed for mailing addresses separated by ‘;’, and each of these addresses will get a mail message with the data in it. The “email-address” field is required in each case, and the email address will be checked to ensure that it conforms to some basic standards.
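Parsing the “mail-copy” value is a simple split on ‘;’. This sketch assumes that blank entries should be skipped and surrounding spaces trimmed, details this excerpt doesn't specify. mailCopyRecipients() is a hypothetical name.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split the "mail-copy" value on ';' and trim surrounding blanks.
// An empty value or "no" yields no recipients.
std::vector<std::string> mailCopyRecipients(const std::string& value) {
  std::vector<std::string> recipients;
  if (value.empty() || value == "no") return recipients;
  std::istringstream in(value);
  std::string item;
  while (std::getline(in, item, ';')) {
    std::string::size_type b = item.find_first_not_of(" \t");
    if (b == std::string::npos) continue;       // skip blank entries
    std::string::size_type e = item.find_last_not_of(" \t");
    recipients.push_back(item.substr(b, e - b + 1));
  }
  return recipients;
}
```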
The “confirmation” field causes a second program to be executed when the form is posted. This program parses the information that was stored from the form into a file, turns it into
human-readable form, and sends an email message back to the client to confirm that their information was received. (This is useful because the user may not have entered their email address correctly; if they don’t get a confirmation message, they’ll know something is wrong.) The design of the “confirmation” field allows the person creating the HTML page to select more than one type of confirmation. Your first solution may be to put the name of the program to run directly in the field, rather than selecting it indirectly as was done here. But you don’t want to allow someone else to choose (by modifying the Web page that’s downloaded to them) what programs they can run on your machine.
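The indirect approach amounts to a lookup table on the server: the form may only name a key, and the server decides what runs. In this sketch the key "1", the table contents, and the function name confirmationCommand() are all hypothetical. Only the program name ProcessApplication.exe comes from the text.

```cpp
#include <cassert>
#include <map>
#include <string>

// Map the hidden "confirmation" value to a command fixed on the
// server. Unknown values produce an empty string, so editing the
// downloaded page cannot make the server run an arbitrary program.
std::string confirmationCommand(const std::string& kind,
                                const std::string& dataFile) {
  static const std::map<std::string, std::string> programs = {
    {"1", "ProcessApplication.exe"}   // hypothetical key
  };
  auto it = programs.find(kind);
  if (it == programs.end()) return "";   // unknown kind: run nothing
  return it->second + " " + dataFile;
}
```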
Here is the program that will extract the information from the CGI request:
//: C10:ExtractInfo.cpp
// Extracts all the information from a CGI POST
// submission, generates a file and stores the
// information on the server. By generating a
// unique file name, there are no clashes like
// you get when storing to a single file
const string contact("Bruce@EckelObjects.com");
// Paths in this program are for Linux/Unix. You
// must use backslashes (two for each single
// slash) on Win32 servers:
const string rootpath("/home/eckel/");
void show(CGImap& m, ostream& o);
// The definition for the following is the only
// thing you must change to customize the program
void
store(CGImap& m, ostream& o, string nl = "\n");
int main() {
cout << "Content-type: text/html\n"<< endl;
Post p; // Collect the POST data
cout << "<h2>You cannot include white space "
"in your email address" << endl;
return 0;
}
if(email.find('@') == string::npos) {
cout << "<h2>You must include a proper email"
" address including an '@' sign" << endl;
return 0;
}
if(email.find('.') == string::npos) {
cout << "<h2>You must include a proper email"
" address including a '.'" << endl;
return 0;
}
// Create a unique file name with the user's
// email address and the current time in hex
const int bsz = 1024;
char fname[bsz];
time_t now;
time(&now); // Encoded date & time
sprintf(fname, "%s%lX.txt", email.c_str(),
static_cast<unsigned long>(now)); // %lX needs unsigned long
string path(rootpath + query["subject-field"] +
out << "///{" << path << endl;
// Display optional reminder:
if(query["reminder"].size() != 0)
cout <<"<H1>" << query["reminder"] <<"</H1>";
show(query, cout); // For results page
store(query, out); // Stash data in file
cout << "<br><H2>Your submission has been "
"posted as<br>" << fname << endl
<< "<br>Thank you</H2>" << endl;
out.close();
// Optionally send generated file as email
// to recipients specified in the field:
recipients.push_back(to); // Last one
// "fastmail" only available on Linux/Unix:
for(int i = 0; i < recipients.size(); i++) {
// Execute a confirmation program on the file
// Typically, this is so you can email a
// processed data file to the client along with
// ampersand runs it as a separate process:
if(name != "email-address" &&
name != "confirmation" &&
name != "submit" &&
name != "mail-copy" &&
name != "test-field" &&
// Change this to customize the program:
void store(CGImap& m, ostream& o, string nl) {
if(name != "email-address" &&
name != "confirmation" &&
name != "submit" &&
name != "mail-copy" &&
name != "test-field" &&
name != "reminder")
o << nl << "[{[" << name << "]}]" << nl
<< "[([" << nl << value << nl << "])]"
<< nl;
// Delimiters were added to aid parsing of
// the resulting text file
}
} ///:~
The program is designed to be as generic as possible, but if you want to change something, it
is most likely the way that the data is stored in a file (for example, you may want to store it in
a comma-separated ASCII format so that you can easily read it into a spreadsheet). You can
change the storage format by modifying store( ), and the way the data is
displayed by modifying show( ).
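As an example of such a change, here is a sketch of a CSV-writing alternative to store( ). Since CGImap isn't defined in this excerpt, a std::map<string, string> stands in for it. The function names are hypothetical, and the skipped field names are the ones store( ) excludes. Values are quoted and embedded quotes doubled, so commas and line breaks in a comment field don't split a row.

```cpp
#include <cassert>
#include <map>
#include <ostream>
#include <set>
#include <sstream>
#include <string>

// Quote a field for CSV: wrap it in double quotes and double any
// embedded quote characters.
std::string csvField(const std::string& s) {
  std::string out = "\"";
  for (char c : s) {
    if (c == '"') out += '"';
    out += c;
  }
  return out + "\"";
}

// Write one submission as a single CSV row: the email address first,
// then every non-control field in the map's (sorted) key order.
void storeCSV(const std::map<std::string, std::string>& m, std::ostream& o) {
  static const std::set<std::string> skip = {
    "email-address", "confirmation", "submit",
    "mail-copy", "test-field", "reminder"};
  auto em = m.find("email-address");
  o << csvField(em == m.end() ? "" : em->second);
  for (const auto& p : m)
    if (!skip.count(p.first))
      o << ',' << csvField(p.second);
  o << '\n';
}
```

Because std::map iterates in key order, the columns after the email address come out sorted by field name. A real spreadsheet export would probably fix the column order explicitly.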
main( ) begins with the same three lines you’ll start with for any POST program. The rest of the program is similar to mlm.cpp because it looks at the “test-field” and “email-address”
(checking it for correctness). The file name combines the user’s email address and the current
date and time in hex; notice that sprintf( ) is used because it has a convenient way to convert
a value to a hex representation. The entire file and path information is stored in the file, along with all the data from the form, which is tagged as it is stored so that it’s easy to parse (you’ll see a program to parse the files a bit later). All the information is also sent back to the user as
a simply-formatted HTML page, along with the reminder, if there is one. If “mail-copy” exists and is not “no,” then the names in the “mail-copy” value are parsed and an email containing the tagged data is sent to each one. Finally, if there is a “confirmation” field, its value selects the type of confirmation (there’s only one type implemented here, but you can easily add others), and a command is built that passes the generated data file to the program (called
ProcessApplication.exe). That program will be created in the next section.
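One caution about the sprintf( ) approach: a "%X" conversion expects an unsigned int, while time_t is often a wider type, so passing the time value directly is not portable. Here is a minimal sketch of the same naming scheme using an ostringstream and the std::hex manipulator instead. uniqueFileName() is a hypothetical name.

```cpp
#include <cassert>
#include <ctime>
#include <sstream>
#include <string>

// Build "<email><time in uppercase hex>.txt". Widening time_t to
// unsigned long long keeps the conversion well defined.
std::string uniqueFileName(const std::string& email, std::time_t now) {
  std::ostringstream os;
  os << email << std::hex << std::uppercase
     << static_cast<unsigned long long>(now) << ".txt";
  return os.str();
}
```

In the CGI program the call would be uniqueFileName(email, std::time(nullptr)).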
Parsing the data files
You now have a lot of data files accumulating on your Web site, as people sign up for
whatever you’re offering. Here’s what one of them might look like:
This is a brief example, but there are as many fields as you have on your HTML form. Now,
if your event is compelling, you’ll have a whole lot of these files, and what you’d like to do is automatically extract the information from them and put that data in any format you like.
For example, the ProcessApplication.exe program mentioned above will use the data in an
email confirmation message. You’ll also probably want to put the data in a form that can be
easily brought into a spreadsheet. So it makes sense to start by creating a general-purpose tool
that will automatically parse any file that is created by ExtractInfo.cpp:
DataPair(istream& in) { get(in); }
DataPair& get(istream& in);
string filePath, email;
// Parse the data from a file:
FormData(char* fileName);
void dump(ostream& os = cout);
string operator[](const string& key);
}; ///:~
The DataPair class looks a bit like the CGIpair class, but it’s simpler. When you create a DataPair, the constructor calls get( ) to extract the next pair from the input stream. The operator bool indicates an empty DataPair, which usually signals the end of an input stream. FormData contains the path where the original file was placed (this path information is stored within the file), the email address of the user, and a vector<DataPair> to hold the information. The operator[ ] allows you to perform a map-like lookup, just as in CGImap.
Here are the definitions:
getline(in,ln);
while(ln.find("[{[") == string::npos)
if(!getline(in, ln)) return *this; // End
first = ln.substr(3, ln.find("]}]") - 3);
getline(in, ln); // Throw away [([
int begin = strlen("From[");
int end = email.find("]");
int length = end - begin;
email = email.substr(begin, length);
// Get the rest of the data:
void FormData::dump(ostream& os) {
os << "filePath = " << filePath << endl;
os << "email = " << email << endl;
for(iterator i = begin(); i != end(); i++)
by a begin-marker of “[([” and an end-marker of “])]”), which it places in the first and second
members, respectively.
The FormData constructor is given a file name to open and read. The FormData object
always expects there to be a file path and an email address, so it reads those itself before
getting the rest of the data as DataPairs.
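The record format that store( ) writes can be read back with a loop like the one in DataPair::get( ). Each record is a "[{[name]}]" line, a "[([" line, the value, and a "])]" line. This is a sketch under stated assumptions. It fills a std::pair instead of a DataPair, and it joins multi-line values (such as a textarea) with newlines, which the excerpt doesn't show the original doing. nextPair() is a hypothetical name.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>

// Read the next "[{[name]}]" / "[([" ... "])]" record written by
// store(). Returns false when no further record is found.
bool nextPair(std::istream& in, std::pair<std::string, std::string>& p) {
  std::string ln;
  do {
    if (!std::getline(in, ln)) return false;   // end of input
  } while (ln.find("[{[") == std::string::npos);
  std::string::size_type b = ln.find("[{[") + 3;
  p.first = ln.substr(b, ln.find("]}]", b) - b);
  std::getline(in, ln);                         // discard "[(["
  p.second.clear();
  bool firstLine = true;
  while (std::getline(in, ln) && ln.find("])]") == std::string::npos) {
    if (!firstLine) p.second += '\n';           // rejoin multi-line values
    p.second += ln;
    firstLine = false;
  }
  return true;
}
```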
With these tools in hand, extracting the data becomes quite easy:
const string from("Bruce@EckelObjects.com");
const string replyto("Bruce@EckelObjects.com");
const string basepath("/home/eckel");
int main(int argc, char* argv[]) {
requireArgs(argc, 1);
FormData fd(argv[1]);
char tfname[L_tmpnam];
tmpnam(tfname); // Create a temporary file name
string tempfile(basepath + tfname + fd.email);
ofstream reply(tempfile.c_str());
assure(reply, tempfile.c_str());
reply << "This message is to verify that you "
"have been added to the list for the "
<< fd["subject-field"] << ". Your signup "
"form included the following data; please "
"ensure it is correct. You will receive "
"further updates via email. Thanks for your "
"interest in the class!" << endl;
// "fastmail" only available on Linux/Unix:
string command("fastmail -F " + from +
" -r " + replyto + " -s \"" +
fd["subject-field"] + "\" " +
tempfile + " " + fd.email);
system(command.c_str()); // Wait to finish
remove(tempfile.c_str()); // Erase the file
} ///:~
This program first creates a temporary file to build the email message in. Although it uses the
Standard C library function tmpnam( ) to create a temporary file name, this program takes
the paranoid step of assuming that, since there can be many instances of this program running
at once, a temporary name in one instance of the program could collide with the temporary name in another instance. (tmpnam( ) only produces a name; it does not create the file, so nothing reserves that name until the file is opened.) So to be extra careful, the email address is appended onto the end of the temporary file name.
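On POSIX systems, mkstemp( ) addresses the collision problem directly. It replaces a trailing "XXXXXX" in a template with a unique suffix and creates and opens the file in one atomic step, so no other process can be handed the same name. This is a minimal Linux/Unix-only sketch; makeTempFile() and the "reply" prefix are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <stdlib.h>   // mkstemp() (POSIX)
#include <string>
#include <unistd.h>   // close() (POSIX)

// Create a uniquely named empty file in dir and return its path,
// or "" on failure. The file already exists when this returns, so
// another mkstemp() caller cannot be given the same name.
std::string makeTempFile(const std::string& dir) {
  std::string templ = dir + "/replyXXXXXX";
  int fd = mkstemp(&templ[0]);
  if (fd == -1) return "";
  close(fd);                 // reopen later by name, e.g. with an ofstream
  return templ;
}
```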
The message is built, the DataPairs are added to the end of the message, and once again the Linux/Unix fastmail command is built to send the information. An interesting note: if, in Linux/Unix, you add an ampersand (&) to the end of the command before giving it to
system( ), then this command will be spawned as a background process and system( ) will return immediately (the same effect can be achieved in Win32 with start). Here, no
ampersand is used, so system( ) does not return until the command is finished. That is a
good thing, since the next operation is to delete the temporary file that is used in the
command.