
Professional Perl Programming (Wrox, 2001), part 10


Title: Perl And Bidi

Pages: 120

Size: 1.69 MB



Every paragraph has a default embedding level, and thus a default direction associated with it. This is also called the base direction of the paragraph. For example:

A paragraph with a beginning like this one, in the Latin script, would have a default embedding level of 0, and hence its base direction would be left to right.

What the bidi Algorithm Does

The bidi algorithm uses all these formatting codes and embedding levels to analyze text and decide how it should be rendered. Here, briefly, is how it goes about doing it:

❑ It breaks up the text into paragraphs by locating the paragraph separators. This is necessary because all the directional formatting codes are only effective within a paragraph. Furthermore, this is where the base direction is set. The rest of the algorithm treats the text on
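As an illustration of the paragraph step and the choice of base direction, here is a minimal sketch in Perl using the capRTL convention described later in this section (uppercase letters stand for strong right-to-left characters, lowercase letters for strong left-to-right ones). It is a toy: real paragraph separators and character classes come from the Unicode Character Database, and the helper names paragraphs and base_level are hypothetical. The rule it applies, taking the direction of the first strong character and defaulting to left to right when there is none, is the rule the Unicode bidi algorithm uses for the base level.

```perl
use strict;
use warnings;

# Toy version of the first two steps, in capRTL notation: uppercase A-Z
# stand for strong right-to-left letters, lowercase a-z for strong
# left-to-right letters, and everything else is treated as neutral.

# Step 1: split the text into paragraphs (blank lines stand in for the
# real paragraph separators).
sub paragraphs {
    my ($text) = @_;
    my @paras = split /\n\s*\n/, $text;
    s/\s+\z// for @paras;    # drop trailing newlines and spaces
    return grep { length } @paras;
}

# Step 2: the base embedding level is 1 (right to left) if the first
# strong character is right-to-left, otherwise 0 (left to right); a
# paragraph with no strong character at all also gets level 0.
sub base_level {
    my ($para) = @_;
    my ($first) = $para =~ /([A-Za-z])/;
    return 0 unless defined $first;
    return $first =~ /[A-Z]/ ? 1 : 0;
}

my $text = qq{" there" SAID THE CAMEL\n\nTHUS, said the camel\n};
for my $p (paragraphs($text)) {
    my $level = base_level($p);
    printf "level %d (%s): %s\n", $level, ($level ? 'RTL' : 'LTR'), $p;
}
```

Note that leading neutrals such as the quotation mark are skipped, so the first paragraph is left to right even though most of its letters are capitals.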

Perl and bidi

Since Perl is a language frequently used for text processing, it is natural that Perl should have bidi capabilities. We have an implementation of the bidi algorithm on Linux that can be used by Perl.

We require a C library named FriBidi, which is basically a free implementation of the bidi algorithm, written by Dov Grobgeld. A Perl module has also been written by the same author, acting as an interface to the C library; it is available as FriBidi-0.03.tar.gz from http://imagic.weizmann.ac.il/~dov/freesw/FriBidi. The FriBidi module enables us to do the following:

❑ Convert an ISO 8859-8 string to a FriBidi Unicode string:


❑ Convert the string obtained above to an ISO 8859-8 character set:

unicode_to_iso8859_8($toDisplay);

This makes sure that it is in a 'ready-to-display' format, assuming the terminal can display ISO 8859-8 characters (as xterm can).

❑ Translate a string from a FriBidi Unicode string to capRTL and vice versa:

caprtl_to_unicode($capRTLString);

unicode_to_caprtl($fribidiString);

The capRTL format is one in which CAPITAL LETTERS are mapped as having a strong right-to-left (RTL) character property. This format is frequently used for illustrating bidi properties on displays with limited ability, such as ASCII-only displays.
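The FriBidi conversions above depend on the C library. For comparison only, the same ISO 8859-8 to Unicode conversion (without any reordering) can be done with Perl's core Encode module; this is a swapped-in technique, not part of the FriBidi API. The sketch assumes Hebrew text, whose letters occupy bytes 0xE0 to 0xFA in ISO 8859-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# In ISO 8859-8 the Hebrew letters alef..tav occupy bytes 0xE0..0xFA,
# which map to U+05D0..U+05EA in Unicode.
my $bytes = "\xE0\xE1\xE2";                 # alef, bet, gimel as ISO 8859-8 bytes
my $chars = decode('iso-8859-8', $bytes);   # now a string of Unicode characters
printf "U+%04X\n", ord $_ for split //, $chars;

# And back again, e.g. for a terminal that displays ISO 8859-8.
my $back = encode('iso-8859-8', $chars);
print $back eq $bytes ? "round trip ok\n" : "round trip failed\n";
```

Encode only changes the representation; putting the characters into visual order is still the job of the bidi algorithm.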

The following is a small example to demonstrate FriBidi's capabilities.

First, we create a small file with the following text, named bidisample:

THUS, SAID THE CAMEL TO THE MEN, " there is more than one way to do it."

AND THE MEN REPLIED " now we see what you mean by bidi",

RISING WITH CONTENTMENT WRIT ON THEIR FACES

This is the code to render the above file in bidi fashion:

#!/usr/bin/perl
# bidirender.pl
use warnings;
use strict;
use FriBidi;   # the Perl interface to the FriBidi C library described above

while (<>) {
    chomp;                                    # remove line separator
    my $uniStr = caprtl_to_unicode($_);       # convert line to FriBidi string
    my $visStr = log2vis($uniStr);            # run it through the bidi algorithm
    my $outStr = unicode_to_caprtl($visStr);  # convert it back to a format that can be
                                              # displayed on a usual ASCII terminal
    print $outStr, "\n";
}

> perl bidirender.pl

"theres more than one way to do it " ,NEM EHT OT LEMAC EHT DIAS SUHT

,"now we see what you mean by bidi " DEILPER NEM EHT DNA

.SECAF RIEHT NO TIRW TNEMTNETNOC HTIW GNISIR
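To make the output above less mysterious, here is a toy sketch of what visual reordering does for a right-to-left paragraph written in capRTL. It is neither the Unicode bidi algorithm nor FriBidi's log2vis: it assumes a single embedding level and ignores numbers, explicit formatting codes, and mirroring of brackets, and the function name toy_log2vis_rtl is hypothetical. The idea is to reverse the whole line, then restore the internal order of each left-to-right (lowercase) run.

```perl
use strict;
use warnings;

# Toy visual reordering for a capRTL line whose base direction is right
# to left. Simplifications (not the Unicode bidi algorithm):
#   * a single embedding level; no numbers, explicit codes or mirroring
#   * a left-to-right run starts and ends with a lowercase letter and
#     contains no uppercase letter; everything else is laid out RTL
sub toy_log2vis_rtl {
    my ($logical) = @_;
    my $visual = reverse $logical;    # scalar context: reverse the characters
    # Put the characters of each left-to-right run back in reading order.
    $visual =~ s/([a-z](?:[^A-Z]*[a-z])?)/scalar reverse $1/ge;
    return $visual;
}

print toy_log2vis_rtl($_), "\n"
    for 'THE CAMEL SAID hello there', 'THUS, SAID THE MEN';
```

Its results will not always match FriBidi's, particularly around punctuation and quotation marks, which the real algorithm resolves more carefully.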


Perl, I18n and Unicode

Now let us take a brief look at a solution to the problem of language barriers. A more extensive view of internationalization can be found in Chapter 26. Unicode helps us out in this matter by providing a uniform way of representing all possible characters of all the living languages in this world.

We are about to see how easy it is to enable people all over the world to understand what we are saying in their own language. This example may be tried out by anyone with a day or two of Perl experience. Although it is in no way complete, with no error checking and no pretense of handling any real-world complexity, it demonstrates the ease with which Perl handles Unicode. Let us imagine the following scenario:

An airport wants to have information kiosks at various locations outside the arrival lounge for foreign tourists. They need the information to be displayed in Arabic, Japanese, Russian, Greek, English, Spanish, Portuguese, and a whole host of other languages. They would like the kiosks to enable the user to view information about the city, events, weather, flight schedules, and sight-seeing tours, and also to make and confirm reservations in affiliated hotels.

Our task here is obviously to create a Perl program that is able to handle Unicode and, therefore, to an extent, solve this problem. The first thing we need to do is create a template HTML file containing a few HTML tags, but with the text replaced by text 'markers' (M1, M2, M3, and so on). We need one file for each language, in the following format (obviously, all the files should contain Unicode text encoded in UTF-8):

M1:charset "string corresponding to charset"

M2:title "string corresponding to title"

M3:heading "string corresponding to heading"

M4:text "text string"

To put the task another way, we need to write a program that takes the language name as input and accordingly generates an HTML file named after that language (for example, arabic.html) by filling in the template file with the strings in the requested language. The output file should be UTF-8 encoded Unicode.

This involves a few things, such as installing Unicode fonts, installing a Unicode editor, creating template HTML files, writing scripts, and so on. Let us look at these step by step.

Installing Unicode Fonts

For UNIX with the X Window System and Netscape Navigator, information regarding Unicode fonts for X11 in general can be found at http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html. The latest version of the UCS fonts package is available from http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz.

For Windows with IE 5.5, Unicode fonts can be selected during installation or can be downloaded from http://www.microsoft.com/typography/multilang/default.htm. Another good place for links to fonts is http://www.ccss.de/slovo/unifonts.htm.

Installing a Unicode Editor

For UNIX with the X Window System, Yudit is a good choice of an editor that supports UTF-8. It is available from http://www.yudit.org/.

For Windows 95 and 98, Sharmahd Computing's UniPad is a good editor, available from http://www.sharmahd.com/unipad/. For Windows NT and 2000, Notepad is able to handle Unicode.


Creating the HTML Template

Now we can go about creating the template HTML files and the string resource files. The next listing is simply an HTML template, called templateLeft.html, that we will use with our program:

to left. This is the template, templateRight.html, that we will use with right-to-left languages:

The following image is a screenshot of a sample string file for the Arabic language. Note the rendering of Arabic text from left to right:

Processing the Resource Files

The fourth stage in our solution to the problem is creating the Perl script. This script will process the resource files it is given and generate the localized pages:

#!/usr/bin/perl
# Xlate.pl
use warnings;
use strict;

my ($langname, $filename, $marker, $mark, $value, $wholefile, $thisval, $template,
    %valueof);

print "Enter the language for the output html file: \n";
$langname = lc <STDIN>;           # get language name and turn it into lowercase
chomp $langname;

$filename = $langname . ".str";   # generate filename from language name
open(LANGFILE, $filename);

# read in the markers & values in a hash
while (<LANGFILE>) {
    chomp($_);
    ($marker, $value) = split("\t", $_);
    $valueof{$marker} = $value;
}
close(LANGFILE);

# use the correct template
# (assumption: the original selection code is missing; here only Arabic
# is treated as a right-to-left language)
$template = ($langname eq 'arabic') ? 'templateRight.html' : 'templateLeft.html';
open(TMPLT, $template);
$wholefile = join('', <TMPLT>);   # slurp entire file into a string
close TMPLT;

# try longer markers first so that M1 does not clobber the start of M10;
# \Q...\E makes each marker match literally
foreach $mark (sort { length($b) <=> length($a) } keys %valueof) {
    $thisval = $valueof{$mark};             # get the value related to the marker
    $wholefile =~ s/\Q$mark\E/$thisval/g;   # do the replacement
}

open(OUTFILE, ">$langname.html");
print OUTFILE $wholefile;         # write out complete langname.html file
print "output written to $langname.html \n";
close OUTFILE;

This is the big surprise: the script looks too simple. In fact, no extra processing is required to handle Unicode.
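Treating UTF-8 files as plain bytes works here because every byte of a multi-byte UTF-8 character has its high bit set, so an ASCII marker such as M1 can never match inside one. The substitution loop in Xlate.pl does have one subtle trap: with ten or more markers, replacing M1 first also rewrites the beginning of M10. The following sketch isolates the parsing and substitution into two hypothetical helpers, parse_resources and fill_template, which replace all markers in a single pass with the longest markers tried first.

```perl
use strict;
use warnings;

# Parse "MARKER<TAB>value" lines, the layout Xlate.pl splits on.
sub parse_resources {
    my ($text) = @_;
    my %valueof;
    for my $line (split /\n/, $text) {
        next unless length $line;
        my ($marker, $value) = split /\t/, $line, 2;
        $valueof{$marker} = $value;
    }
    return \%valueof;
}

# Replace all markers in a single pass. Longer markers are tried first so
# that M1 cannot match the start of M10, quotemeta makes each marker match
# literally, and inserted values are never rescanned for markers.
sub fill_template {
    my ($template, $valueof) = @_;
    return $template unless %$valueof;
    my $alternation = join '|',
        map { quotemeta } sort { length($b) <=> length($a) } keys %$valueof;
    $template =~ s/($alternation)/$valueof->{$1}/g;
    return $template;
}

# Values stay as raw UTF-8 bytes: here the Arabic word "marhaba" (hello).
my $res = parse_resources("M1\tutf-8\nM10\t\xD9\x85\xD8\xB1\xD8\xAD\xD8\xA8\xD8\xA7\n");
print fill_template('<meta charset="M1"><h1>M10</h1>', $res), "\n";
```

Because nothing is ever decoded, the script neither knows nor cares which script the values are written in; that is exactly why no extra processing is required.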

Running the Script

Now we can execute the code in the usual way and provide the language required:

output written to arabic.html

The Output Files

After running the script and having it produce the *.html files, we can open them in a browser and see what has been written.

The following is a screenshot of the english.html file generated by the script:


Next is the arabic.html file generated by the script. Note the direction of the Arabic script as rendered by the browser:

There are more examples of the same phrase written in different languages at http://www.trigeminal.com/samples/provincial.html, thanks to Michael Kaplan. A few more such sites are http://www.columbia.edu/kermit/utf8.html (hosted by the Kermit Project), http://www.unicode.org/unicode/standard/WhatIsUnicode.html (hosted by the Unicode Consortium), and http://hcs.harvard.edu/~igp/glass.html (hosted by the IGP).

This simple method of replacing text markers is still widely used. However, localizing a large web site takes much more than just being able to handle Unicode strings. Things such as cultural preferences, date formats, and currency (which are covered in Chapter 26) need to be taken into consideration. This means we should probably turn to tools such as HTML::Template, HTML::Mason, or HTML::Embperl, or maybe something like XML::Parser, in order to create an industrial-strength multilingual site.

Work in Progress

There are still a few things about Unicode support in Perl that are under development. For instance, it is not possible right now to determine whether an arbitrary piece of data is UTF-8 encoded or not. We cannot force I/O to use any encoding other than UTF-8, and we will have a problem if the pattern in a match does not contain Unicode but the string to be matched at runtime does. Also, the use utf8 pragma is on its way out. In order to follow the current state of Unicode support in Perl, one can join the perl-unicode mailing list by sending a blank message to majordomo@perl.org with the subject line subscribe perl-unicode.


❑ Seen how people have tackled the issue of providing an international coding system.

❑ Looked at how Unicode can be used in regular expressions and tried our hand at writing our own character property.

❑ Demonstrated how Perl can be used to deal with texts in languages that are written from right to left, as opposed to left to right like English.

❑ Provided a real world example of how we can deal with language barriers across the world.


Locale and Internationalization

Not everybody understands English, and not everybody expresses dates in the standard US format (for example, 2/20/01). Instead, people worldwide speak hundreds of different languages and express themselves in almost as many alphabets. Furthermore, even if they do speak the same language, they may have different ways of expressing dates, times, and so on. This chapter aims to provide us with the tools to write Perl programs in many different languages, and to show the kinds of problems we may encounter along the way.

We will take a look at the ways we can develop a multilingual application in Perl, be it for a web site or for something entirely different. Going multilingual does not only mean expressing messages in different languages, but also adapting the minor details of our site, for instance, to obey the conventions of other cultures. To use the example mentioned above, it means that we must change the presentation of a date or time. Catering for other languages is always a good thing to do, since it attracts more visitors to our website.

As an example, we may be surprised by the results if we translated a personal home page from Spanish into German. Its popularity and number of hits would rise rapidly, and visitors would be especially surprised at how the page is instantly shown in German when someone in Germany accesses it. The same goes for all multilingual websites: the site does its best to guess the country of origin of the user, and shows messages accordingly. For instance, if we connect to Google from a Spanish domain (.es), for example melmac.ugr.es, we will immediately get the URL http://www.google.com/intl/es/, which is in Spanish. This is a neat trick that deduces the location from the Top-Level Domain (TLD) of our machine, and it will score points with the Spanish-speaking population.

However, localization is not as easy as showing two different pages depending on where the client comes from. It is also a matter of knowing the language that the user (or, more correctly, the user's client application) is immersed in. The site will then show information in such a way that the client, and therefore the user, can understand it. For example, a quantity such as 3'5 will not mean much to an English person, but it means 3.5 to a Spaniard. The application must be aware of its cultural 'location', and act accordingly.


If we live in a country whose native language is not English, undergoing this process is a must. We will need it for tasks as simple as alphabetical sorting, showing quantities, or matching regular expressions. Our program will not work correctly, at least from the point of view of the user, if we do not use the right locale settings.

The good news is that many people have already thought about this problem in depth. There are several frameworks that make writing multilingual and localized applications easier. It will come as even better news to readers of this book that Perl has excellent support for this, as we shall soon see.

In this chapter, we will see several ways of creating Perl applications for a multilingual environment, from simple tricks such as storing messages in different languages and sorting according to local usage, to recognizing a foreign language or conjugating foreign verbs. At this early stage it is important to note that most of the programs presented in this chapter will not work on non-POSIX machines, including Win9x and Macs.

Why Go Locale?

Suppose we want to create a Spanish web site to show the number of firms investing in a particular venture. Users should be able to access the site and find out the names of the primary investors, and the site should be designed accordingly. One basic hurdle to overcome would be the need to show a list of those firms in alphabetical order.

First, we need to create a plain text file in which we can list the firms. Note that these are not in alphabetical order. We call this file firms.txt:

> perldoc perllocale


> perl -e 'use locale; print sort <>;' firms.txt

be a different letter, but at least accented vowels were not alphabetized as different characters from their unaccented counterparts. The bad ordering of the 'ch' could be a bug in the locale implementation of our system (or maybe a bug in our understanding of the local alphabetization rules, as we will see later on). It could be improved in other Spanish locale implementations, so in order to fix it, we have to delve a little deeper into what locale actually means.

The locale framework follows the maxim 'When in Rome, do as the Romans do': computers have to use the local alphabet and local numbers. This is the reason why localization was included in the POSIX standard (that is, the set of functions and files that should be understood by all UNIX platforms). Some non-UNIX platforms, such as Windows NT (which has a slightly flawed implementation of the POSIX standard), can also use the locale set of functions, also known as NLS (National Language Support). For readers coming to Perl from C, all the functions, constants, and so on related to locale are in the locale.h header file.

This header distributes localization elements into several categories:

LC_COLLATE Collation or sorting

LC_CTYPE Character types, distinguishing, for example, whether a character is alphanumeric or not

LC_NUMERIC Formatting of non-monetary numbers, such as the decimal point

LC_MONETARY Handling monetary amounts

LC_TIME Displaying time

LC_MESSAGES Messages, not used in Perl by default

There are a few more categories that are not so widely used (they are GNU C library extensions rather than POSIX), whose names should give away their meanings: LC_PAPER, LC_NAME, LC_ADDRESS, LC_MEASUREMENT, and LC_IDENTIFICATION.


These constants can be used as environment variables, or as arguments to the POSIX setlocale function. This means we have to use POSIX (that is, load the POSIX module) to have them available in our program. Going back to our problem regarding the alphabet of the Spanish language, we can locate a locale definition on the Internet which includes the traditional sorting, that is, 'ch' after 'c'. One place in which such a locale definition can be found, amongst many others, is under the name

There is a more complete list of locales for Linux available from IBM DeveloperWorks, at

http://oss.software.ibm.com/developerworks/opensource/locale/download.html

This includes locales such as Maltese, which will be required later These locales are still in beta stages,

and thus look set to change considerably in the future

After installing the package, we can create the following short script:

#!/usr/bin/perl
# sort.pl
use warnings;
use strict;
use POSIX qw(LC_COLLATE setlocale strcoll);

# setlocale returns undef if the locale is not installed
setlocale(LC_COLLATE, 'es_US') or die "Locale es_US is not available\n";
print sort { strcoll($a, $b) } <>;

When we run this program we will obtain the following output:

> perl sort.pl firms.txt

Before we continue, this program probably needs a bit of explanation. Instead of using the use locale pragma, it relies on POSIX calls to sort correctly. By default, the locale pragma uses the locale settings contained in the LC_* UNIX environment variables. However, in this case we want to be able to change the locale at runtime. The POSIX setlocale function allows us to do this: it takes the locale category we want to change as its first argument, and the locale we want to change it to as its second argument. The locale must be valid, that is, it must correspond to a file already installed on our system.

If it works correctly, setlocale returns the name of the locale it has been set to; in this case, es_US. The reason for using this locale is that, for some strange reason, it seems to be the only one to use the 'traditional' Spanish ordering.
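Because setlocale returns undef when the requested locale is not installed, a script can try a few candidates in turn instead of failing outright. A minimal sketch; the function name first_working_locale and the candidate list are illustrative assumptions, and which names succeed depends entirely on the locales installed on the machine.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

# setlocale returns the name of the new locale on success and undef when
# the requested locale is not installed, so candidates can be tried in turn.
sub first_working_locale {
    my ($category, @candidates) = @_;
    for my $name (@candidates) {
        my $set = setlocale($category, $name);
        return $set if defined $set;
    }
    return;    # none of the candidates is available
}

# 'C' is always available, so it makes a safe last resort.
my $got = first_working_locale(LC_COLLATE, 'es_US', 'es_ES.UTF-8', 'C');
print "collating with: $got\n";
```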


The problem with setting locales using this function (instead of the pragma) is that the Perl operators cmp and sort only honor the locale inside the scope of the use locale pragma (see perllocale), so here they do not use it by default. For this reason we need to use an alternative form of sort, sort {expr}, and the POSIX function strcoll, which compares (or collates, hence the name) strings using the setting specified in the setlocale call.
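Where a suitable system locale is not installed, a pure-Perl alternative (swapped in here; it is not what the text uses) is the Unicode::Collate::Locale module, which ships collation tailorings for many languages. Its es__traditional tailoring treats 'ch' (and 'll') as separate letters, which is the traditional Spanish order discussed above. A small sketch comparing it with modern Spanish collation:

```perl
use strict;
use warnings;
use Unicode::Collate::Locale;

my @words = qw(chico cuna dado cama);

# Modern Spanish: "ch" is just "c" followed by "h".
my $modern = Unicode::Collate::Locale->new(locale => 'es');
# Traditional Spanish: "ch" is a letter of its own, sorted after "c".
my $traditional = Unicode::Collate::Locale->new(locale => 'es__traditional');

print 'modern:      ', join(' ', $modern->sort(@words)), "\n";
print 'traditional: ', join(' ', $traditional->sort(@words)), "\n";
```

Since it does not depend on the operating system's locale files, this behaves the same on every platform; non-ASCII input must first be decoded to Perl characters (for example with use utf8 or Encode).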

Delving Deeper into Local Culture

Now, suppose we have a scenario where a user views the site and wants to add another firm to the list of investors. Using the current code, they would have to undertake this process in Spanish, therefore eliminating the likelihood of non-Spanish-speaking firms investing through the site. At first this may seem a difficult task, because it is impossible to find out where the user is from, since Spanish autonomous regions do not have their own top-level domains. It would, however, be possible if there was only one user in each region responsible for liaising through our web site.

So, how can we use those different locales? To start with, we need to find out how locales are named. Locale names usually have three, sometimes four, parts, written in the following way:

xx_YY.charset@dialect

Breaking this down should make it easier to understand:

❑ xx represents the language

❑ YY represents the country

❑ charset is the character set, such as ISO8859-15 (for Latin languages) or KOI8 (for Russian)

❑ @dialect is a particular modality, for instance a dialectal variety, such as no@nynorsk in Norwegian. It can also represent special symbols, as in ca_ES@euro (a variant of the ca_ES locale including the euro symbol)

Following this form, a typical Spanish locale would be es_ES.iso885915, or es_ES@euro, which can be found in any of the above-mentioned web sites if they are not already installed on the system. On the same page we can find locales for Basque, Catalan, and Galician, the three other official languages in Spain.
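The naming scheme can be taken apart with a single regular expression. The sketch below assumes exactly the xx_YY.charset@dialect layout described above, with only the language part mandatory; parse_locale is a hypothetical helper, and real systems also accept names such as C or POSIX, which it rejects.

```perl
use strict;
use warnings;

# Split a locale name of the form language_COUNTRY.charset@modifier.
# Returns a hash reference, or nothing if the name does not fit the pattern.
sub parse_locale {
    my ($name) = @_;
    my ($lang, $country, $charset, $modifier) =
        $name =~ /^([a-z]{2,3})(?:_([A-Z]{2}))?(?:\.([^@]+))?(?:@(.+))?$/
        or return;
    return { language => $lang,    country  => $country,
             charset  => $charset, modifier => $modifier };
}

for my $name ('es_ES', 'es_ES.iso885915', 'ca_ES@euro', 'no@nynorsk') {
    my $parts = parse_locale($name);
    printf "%-16s language=%s country=%s charset=%s modifier=%s\n", $name,
        map { $_ // '-' } @{$parts}{qw(language country charset modifier)};
}
```

For example, uc(parse_locale('ca_ES')->{language}) gives CA, a two-letter upper-case language code.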

Getting back to the example, we decide to greet the visitors to our site in their own language and show them the local time (of our site) in the format they favor. The site will then show them how much money each company has invested. In order to do this, we start by creating a new plain text file called firms_money.txt:

to use the site to find out the amount of money invested in their own currency. A more general solution would be to consider the names of the people connecting to our site from different parts of the world.


my ($locale, $hello, $notice) = ('es_ES', 'Hola',

'aquí está tu informe para');

if ($clientMap{$clientName}) {

($locale, $hello, $notice) = @{$clientMap{$clientName}};

}

setlocale(LC_ALL, $locale); # use settings

Arguments are read in the lines above, and locales assigned. The default value in Spain is (somewhat unsurprisingly) Spanish. If the name of the client, given as the first argument on the command line, is a key in %clientMap, the values are assigned to the three variables that represent the locale name, the hello message, and the message to be given as an introduction. The locale name is then used in the setlocale function to assign that value to all categories.

The file has been read, and entered into the %investments hash, keyed by the name of the firm

# obtain time info

my ($sec, $min, $hour, $mday, $mon, $year, $wday) = localtime (time);

print "$hello $clientName, $notice ",

strftime("%c", $sec, $min, $hour, $mday, $mon, $year, $wday), "\n";


Currency conversion is used in the lines below. The objective of these lines is to show that information about the local currency's international name and symbol is also contained in the locale settings; it is retrieved using the localeconv POSIX function. This function returns a hash reference, but we only use two of its keys. int_curr_symbol holds the three-letter international name of the currency, such as USD (for US dollars) or ESP (Spanish pesetas); strictly, the value is four characters long, the code followed by a separator character (usually a space), which is why the code below chops it off. currency_symbol holds the local symbol of the currency, such as $ for the US dollar (amongst others), and £ for the British pound.

#Set up currency conversion

my $q = Finance::Quote->new;

$q->timeout(60);

my $lconv = localeconv(); chop($lconv->{int_curr_symbol});

my $conversion_rate=$q->currency("ESP",$lconv->{int_curr_symbol}) || 1.0;

for (sort { strcoll($a, $b) } keys %investments) {
    printf("%s %.2f ESP %.2f %s %s \n",
        $_, $investments{$_},
        $investments{$_} * $conversion_rate,
        $lconv->{int_curr_symbol},
        $lconv->{currency_symbol}
    );
}

Now we can use the script and obtain output such as the following. Of course, the exact output will depend on the country we are in, the date, the time, currency rates, and so on.

> perl invest.pl Arnaldo firms_money.txt

Holá Arnaldo, aquí está tu informe para 13 dic 2000 14:56:22 CET

Andalia 4567987.00 ESP 52530160.34 USD $

Ántico 1000000.00 ESP 11499630.00 USD $

Cántaliping 46168.50 ESP 530920.67 USD $

Cantamornings 6669876.00 ESP 76701106.15 USD $

Chilindrina 2000.35 ESP 23003.28 USD $

Cflab.org 123456.70 ESP 1419706.37 USD $

Zinzun.com 33445.00 ESP 384605.13 USD $

> perl invest.pl Patxi firms_money.txt

Kaixo Patxi, egunkaria hemen hire 00-12-13 14:57:38 CET

Andalia 4567987.00 ESP 4567987.00 ESP Pts

Ántico 1000000.00 ESP 1000000.00 ESP Pts

Cántaliping 46168.50 ESP 46168.50 ESP Pts

Cantamornings 6669876.00 ESP 6669876.00 ESP Pts

Chilindrina 2000.35 ESP 2000.35 ESP Pts

Cflab.org 123456.70 ESP 123456.70 ESP Pts

Zinzun.com 33445.00 ESP 33445.00 ESP Pts

> perl invest.pl Orlando firms_money.txt

Holá Orlando, aquí está tu informe para 13 dic 2000 14:57:59 CET

Andalia 4567987.00 ESP 23995.27 USD $

Ántico 1000000.00 ESP 5252.92 USD $

Cántaliping 46168.50 ESP 242.52 USD $

Cantamornings 6669876.00 ESP 35036.33 USD $

Chilindrina 2000.35 ESP 10.51 USD $

Cflab.org 123456.70 ESP 648.51 USD $

Zinzun.com 33445.00 ESP 175.68 USD $


> perl invest.pl Joe firms_money.txt

Hi Joe, here's your report for Wed 13 Dec 2000 02:59:49 PM CET

Andalia 4567987.00 ESP 24019.30 USD $

Cantamornings 6669876.00 ESP 35071.41 USD $

Chilindrina 2000.35 ESP 10.52 USD $

Cflab.org 123456.70 ESP 649.16 USD $

Cántaliping 46168.50 ESP 242.76 USD $

Zinzun.com 33445.00 ESP 175.86 USD $

Ántico 1000000.00 ESP 5258.18 USD $

Barring bugs, such as the alphabetic ordering in Basque (whose alphabet does not include the letter 'c', so no word starting with 'c' can possibly go before 'd') and the dancing of the 'ch' from one place to another, this should look like something native speakers would feel comfortable with.

The first thing we put into this program is the Finance::Quote module (available from CPAN). The main use of this module is for stock quotes, but it is basically a front end for doing searches on the Yahoo Finance servers. This also means that we must be online to take advantage of it. Each time we try to get a currency conversion rate, the request is forwarded to a UserAgent object, which requests the rate from Yahoo, manipulates it, and gives it back to us. This might take a while, depending on the speed of the connection, and it could result in additional errors, depending on the state of the Yahoo site. The program would work perfectly without this currency conversion, but the point of including it here is to show another aspect of localization: local currencies.

The subsequent lines create a hash of arrays with the (theoretically unique) names of the customers, and set a different locale, currency, and greeting for each, according to the local language. If the name given is none of the above, it defaults to the es_ES locale and greetings. From that line on, the following actions take place: the locale is set, the file is read, and the system time is read and converted to local time. Then the strftime function is used; this function formats a time into a string according to a format string. In this case, %c instructs it to use the preferred date and time representation for the local settings. This format will even translate weekday names, in addition to putting the date and time in the locally favored arrangement. For instance, Americans prefer 12-hour clocks plus AM/PM, while many Europeans opt for 24-hour clocks.
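The effect of LC_TIME on strftime can be isolated in a small helper that switches the category temporarily and then restores it. A sketch, assuming the fixed date from the sample output (13 December 2000, 14:56:22) and the always-available C locale; with a locale such as es_ES installed, the same call with %c would produce the Spanish representation. strftime_in_locale is a hypothetical name.

```perl
use strict;
use warnings;
use POSIX qw(setlocale strftime LC_TIME);

# Format a broken-down time under a given LC_TIME locale, restoring the
# previous setting afterwards so the rest of the program is unaffected.
sub strftime_in_locale {
    my ($locale, $format, @tm) = @_;
    my $old = setlocale(LC_TIME);    # one argument: query only
    defined(setlocale(LC_TIME, $locale))
        or die "locale '$locale' is not installed\n";
    my $text = strftime($format, @tm);
    setlocale(LC_TIME, $old);
    return $text;
}

# 13 Dec 2000, 14:56:22: (sec, min, hour, mday, mon from 0, year - 1900)
my @tm = (22, 56, 14, 13, 11, 100);
print strftime_in_locale('C', '%c', @tm), "\n";
```

POSIX::strftime normalizes its arguments before formatting, so the weekday is computed even though it is not passed in.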

A Finance::Quote object is then created, with a timeout set so that it will only wait 60 seconds for the answer. The object is used shortly afterwards, where it is asked for the conversion rate between the Spanish peseta (ESP) and the local currency. If the quantities in the file were in euros, we could have used the es_ES@euro locale instead, and conversions would have been made from euros instead of from pesetas.

Finally, the hash containing the firm names and investments is printed in three columns: name, investment in pesetas, and investment translated into the local currency. It should be noted, however, that the period (.) is used as the decimal separator in English-speaking countries, whilst the comma is used in Latin countries.
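The locale tells us which separators to use: the hash returned by POSIX::localeconv() has decimal_point and thousands_sep entries. A minimal sketch of a hypothetical helper, format_amount, that applies them by hand; it takes a hash shaped like localeconv's result and assumes grouping in threes (real locales describe their grouping in a separate grouping field, which this sketch ignores).

```perl
use strict;
use warnings;

# Format a number using the separators from a localeconv()-style hash.
# Assumption: digits are grouped in threes.
sub format_amount {
    my ($amount, $conv, $decimals) = @_;
    $decimals //= 2;
    my ($int, $frac) = split /\./, sprintf("%.*f", $decimals, abs $amount);
    my $sep = $conv->{thousands_sep} // '';
    if (length $sep) {
        1 while $int =~ s/^(\d+)(\d{3})/$1$sep$2/;   # insert separators right to left
    }
    my $dp  = $conv->{decimal_point} // '.';
    my $out = defined $frac ? "$int$dp$frac" : $int;
    return $amount < 0 ? "-$out" : $out;
}

print format_amount(4567987, { decimal_point => ',', thousands_sep => '.' }), "\n";
print format_amount(4567987, { decimal_point => '.', thousands_sep => ',' }), "\n";
```

In a real program the second argument would be POSIX::localeconv() called after setting LC_NUMERIC (for plain numbers) or LC_MONETARY (whose separators are in mon_decimal_point and mon_thousands_sep).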

We could even mix and match several settings. For instance, in the case of our friend Alberto, who lives in Miami, we would have used en_US for currency, numbers, and time, and es_MX (for México) or es_CU (Cuba) for sorting:

setlocale(LC_COLLATE, 'es_MX');

setlocale(LC_NUMERIC, 'en_US');

setlocale(LC_MONETARY, 'en_US');

setlocale(LC_TIME, 'en_US');
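After mixing settings like this, it can be useful to check which locale each category actually ended up with. Calling setlocale with just a category queries the current value without changing it. The sketch below uses the C locale so that it runs anywhere; with the locales above installed, the setlocale calls for Alberto would go in its place. The hash %category and the helper current_locales are illustrative.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE LC_CTYPE LC_NUMERIC LC_MONETARY LC_TIME);

# Category names mapped to the POSIX constants (the left side of => is
# quoted automatically, the right side calls the constant).
my %category = (
    LC_COLLATE  => LC_COLLATE,
    LC_CTYPE    => LC_CTYPE,
    LC_NUMERIC  => LC_NUMERIC,
    LC_MONETARY => LC_MONETARY,
    LC_TIME     => LC_TIME,
);

# With a single argument, setlocale reports the current setting of that
# category without changing it.
sub current_locales {
    return +{ map { ($_ => setlocale($category{$_})) } keys %category };
}

# Stand-in for the mixed settings above: 'C' exists everywhere.
setlocale($_, 'C') for values %category;
my $now = current_locales();
printf "%-12s %s\n", $_, $now->{$_} for sort keys %$now;
```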


framework for internationalization of programs. The idea would be to have a way to store program messages in several languages, and retrieve them using a key, preferably the short version of the message. In reality there are two of them:

❑ gettext, a GNU standard tool for internationalization. It is widely used by many GNU applications. The current version at the time of writing is 0.10.35. It is directly available from http://www.gnu.org/software/gettext/gettext.html and is supported by several tools, and an EMACS mode. For more information regarding this subject, refer to a book such as Professional Linux Programming from Wrox Press, ISBN 1861003013.

❑ Locale::Maketext, a complete and purely Perl-based solution. At the time of writing the latest version is 0.18. The documentation (available by typing perldoc Locale::Maketext after installation) includes a synopsis. It is important to note that at present, Locale::Maketext is still in its early stages.

For the purposes of our site, we have decided upon gettext, since it has very good support in Perl.

The first thing we have to do is to create a so-called Portable Object (PO) file, which contains the necessary information to translate messages. The main elements of PO files are the keywords msgid and msgstr, which contain the original (in this case, Spanish) and the translated string, respectively. A file can simply be created using a text editor (or EMACS in PO mode) in this form:

msgid "Hola"

msgstr "Olá"

A set of these messages in a file, along with other information such as comments and context, is called a catalogue, and corresponds to a domain, which is usually a language. For instance, the file above could be saved as GA.po, for the Galician language.
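To see what a catalogue boils down to, here is a minimal reader for single-line msgid/msgstr pairs, together with the gettext convention of falling back to the original string when no translation is available. It is an illustrative sketch (read_po and translate are hypothetical names) that ignores multi-line strings, plural forms, and all escape sequences except \" and \\; in practice Locale::PO or the gettext tools should be used.

```perl
use strict;
use warnings;

# Minimal reader for single-line PO entries: msgid "..." followed by msgstr "...".
sub read_po {
    my ($text) = @_;
    my (%catalog, $id);
    for my $line (split /\n/, $text) {
        if ($line =~ /^msgid\s+"((?:[^"\\]|\\.)*)"\s*$/) {
            $id = _unescape($1);
        }
        elsif ($line =~ /^msgstr\s+"((?:[^"\\]|\\.)*)"\s*$/ && defined $id) {
            $catalog{$id} = _unescape($1);
            undef $id;
        }
    }
    return \%catalog;
}

sub _unescape { my ($s) = @_; $s =~ s/\\(["\\])/$1/g; return $s }

# gettext convention: a missing or empty translation falls back to the key.
sub translate {
    my ($catalog, $msgid) = @_;
    my $str = $catalog->{$msgid};
    return defined $str && length $str ? $str : $msgid;
}

my $ga = read_po(qq{msgid "Hola"\nmsgstr "Olá"\n});   # Galician catalogue
print translate($ga, 'Hola'), "\n";
print translate($ga, 'Adiós'), "\n";                   # untranslated: key comes back
```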

There are also a couple of editors we can use to edit PO files. POedit, which is available from http://www.volny.cz/v.slavik/poedit/, has a simple-to-use interface based on the GTK library. As an alternative, if we favor the KDE desktop environment, KBabel is available as part of the KDE software development kit, which can be obtained from http://www.kde.org. For the time being, we opt for the Perl way, and choose to use Locale::PO, an object-oriented class for creating PO files. The following program is written using this module:

#!/usr/bin/perl
# pocreate.pl
use warnings;
use Locale::PO;

my %hash = (

"EN" => ['Hello', "here's your report for"],

"EU" => ['Kaixo', "egunkaria hemen hire"],

"CA" => ['Hola', "aquí está el teu inform per"],

"GA" => ['Olá', "aquí está o seu relatório pra"],

"FR" => ['Salut', "voici votre raport pour"],

"DE" => ['Hallo', "ist hier ihr Report für"],

"IT" => ['Ciao', "qui è il vostro rapporto per"]

);

my @orig = ("Hola", "aquí está tu informe para");

For each element in this hash, a PO file is created. The file is built from a reference to an array of Locale::PO objects, each of which has two main elements: a msgid, containing the key to the message, and a msgstr, the translation of the message into the language represented in the file.

After the file is saved using the Locale::PO->save_file_fromarray function, it is converted to an internal format using the MsgFormat command. We are using the Spanish originals as keys, but any other language, such as English, could be used. In theory, these are the files that would be handled by the team of translators in charge of localizing a program.

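for (keys %hash) {
    my @po;
    for my $i (0 .. $#orig) {
        $po[$i] = Locale::PO->new();
        $po[$i]->msgid($orig[$i]);          # the Spanish original is the key
        $po[$i]->msgstr($hash{$_}->[$i]);   # translation for this language
    }
    Locale::PO->save_file_fromarray("$_.po", \@po);
    system("MsgFormat $_ < $_.po");         # compile with Locale::PGetText's MsgFormat
}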

msgid "aquí está tu informe para"

msgstr "aquí está el teu inform per"

These files can be used by any program compliant with gettext (that is, any program that uses the libgettext library). In the case of Perl, there are two modules that use them: Locale::PGetText and Locale::gettext. The main difference between them is that the first is a pure Perl implementation of gettext, using its own machine-readable file format, while the second needs to have gettext installed. In both cases, PO files have to be compiled into a machine-readable file. We will opt for the first, but both are perfectly valid. Indeed, a program that belongs to the Locale::PGetText module is called in the following line of the above code:

MsgFormat $_ < $_.po;


MsgFormat produces a database file, which can then be read by Locale::PGetText. The implementation of this module is reduced to a few lines; it merely reads the DB file and ties it to a hash. The hash is accessed using a couple of functions, shown in the following example, which is an elaboration of the previous invest.pl program:

#!/usr/bin/perl

# pouse.pl
use warnings;

use strict;

use Finance::Quote;

use POSIX qw(localeconv setlocale LC_ALL LC_CTYPE strftime strcoll);

use Locale::PGetText; # this is new

# in this case only the locale is needed, the rest is stored in PO files

my %clientMap = ('Jordi' => 'ca_ES','Patxi' => 'eu_ES','Miguelanxo' => 'gl_ES','Orlando' => 'es_AR','Arnaldo' => 'es_CO','Joe' => 'en_US');

die "Usage: $0 <clientName> <fileName> \n" if $#ARGV <1;

my ($clientName,$fileName) = @ARGV;

my $locale = $clientMap{$clientName} || 'es_ES';

setlocale(LC_ALL, $locale); # use settings

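Local messages are retrieved from the compiled message files using functions from the Locale::PGetText module. The directory where the files are located is first set with setLocaleDir, and then the corresponding file is loaded with setLanguage, which takes the two-letter language code in capitals:

setLocaleDir('/root/txt/properl');        # directory holding our compiled PO files
setLanguage(uc(substr($locale, 0, 2)));   # e.g. ca_ES -> CA

#Spanish for 'Here is your report for'

my $notice = gettext("Aquí está tu informe para");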


#The rest is the same as invest.pl

#Set up currency conversion

my $q = Finance::Quote->new;

$q->timeout(60);

my $lconv = localeconv(); chop($lconv->{int_curr_symbol});

my $conversion_rate=$q->currency("ESP", $lconv->{int_curr_symbol}) || 1.0;

This will output exactly the same as the last-but-one example, invest.pl, the only difference being that in this case we did not need to code the translated messages explicitly within the program. Adding new languages and messages is just a matter of adding the name of a person, the locale, and the currency they use. The program selects the relevant file itself, extracting the language code from the first two letters of the locale name. In this case, the directory /root/txt/properl should be replaced by the directory in which our own PO files reside. The main change from the previous version is around the lines where Locale::PGetText functions are used: the two-letter language code is extracted from the locale name and converted to upper case; this code is used to set the language with the function setLanguage. This in turn loads the corresponding database file; messages are then obtained from the DB and used as they were in the previous example. This is not a huge improvement, but at least we can now build, bit by bit, a multilingual dictionary of words and phrases that can be used and reused in all of our applications, with a common (and simple) interface.
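Locale::PGetText's on-disk format is not reproduced here. The sketch below only illustrates the idea just described, assuming the core SDBM_File module as a stand-in for the DB file: the language code is derived from the locale name, a per-language DB file is tied to a hash, and a lookup falls back to the untranslated key, as gettext does. The function names lang_from_locale, write_catalog and lookup are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT O_RDONLY);
use SDBM_File;
use File::Temp qw(tempdir);

# 'ca_ES' -> 'CA': the language part of the locale, in capitals
sub lang_from_locale {
    my ($locale) = @_;
    return uc substr($locale, 0, 2);
}

# Store msgid => msgstr pairs in a DB file named after the language
sub write_catalog {
    my ($dir, $lang, %pairs) = @_;
    tie(my %db, 'SDBM_File', "$dir/$lang", O_RDWR | O_CREAT, 0644)
        or die "Cannot open $dir/$lang: $!";
    $db{$_} = $pairs{$_} for keys %pairs;
    untie %db;
}

# gettext-style lookup: return the msgid itself if there is no translation
sub lookup {
    my ($dir, $lang, $msgid) = @_;
    tie(my %db, 'SDBM_File', "$dir/$lang", O_RDONLY, 0644)
        or return $msgid;                     # no file for this language
    my $msgstr = $db{$msgid};
    untie %db;
    return defined $msgstr ? $msgstr : $msgid;
}

my $dir  = tempdir(CLEANUP => 1);
my $lang = lang_from_locale('ca_ES');
write_catalog($dir, $lang,
    'Here is your report for' => 'Aqui tens el teu informe per');
print lookup($dir, $lang, 'Here is your report for'), "\n";
```

Keeping one file per language means adding a language never touches the existing files, which is the property the text relies on.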

We soon realize that looking for an archetypal name in each place and deducing their mother tongue (and everything else) is not very efficient. Besides, having to translate every single message by hand is tedious. What is needed is a way of automating the process; for example, using the Internet to do the job.

Suppose now that we want to go one step further and design a web site (which we will call El Escalón), enabling users all over the world to log in using their own specific username, and then to type a short report. The following program implements these ideas (note that it will not work on Windows as it uses the WWW::Babelfish module, which is only available on UNIX systems):

print header, start_html(-bgcolor => 'white', -title => "El Escalón");

We find out the language from the TLD, or by using any of the other tricks contained in the grokLanguage subroutine. If nothing works, we can always query the user:

my ($TLD) = (remote_host() =~ /\.(\D+)$/);

print start_form;

my $language = grokLanguage($TLD);

# if nothing worked, ask for the English name of the language
print "Your language ", textfield('language') unless $language;


          textfield('codename')])), "\n",
    Tr(td([translate('Type in your report', $language),
           textfield('report')])), "\n",
    Tr(td([translate('What is your assessment of the situation', $language),
           popup_menu('situation',
                      [translate('Could not be worse', $language),
                       translate('Pretty good', $language),
                       translate('Could be worse', $language)])])), "\n",
    end_table;
}
print p({align => 'center'}, submit()), end_form, end_html;

For a fuller description of the methods supported by the WWW::Babelfish module, we can use the usual command:

> perldoc WWW::Babelfish

return $toTranslate;

}

my $translated = $tr->translate('source' => 'English',

'destination' => $language,'text' => $toTranslate );

print $language, " ", $tr->error if $tr->error;

$translated = $toTranslate unless defined ($translated);

return $translated;

}


This subroutine tries to find out the language of the client using several tricks, which are explained below.

sub grokLanguage {
    # Returns the name of the language, in English, if possible
    my $TLD = shift;
    return param('language') if param('language');     # try predefined
    my ($ua) = user_agent() =~ /\[(\w+)\]/;             # and now from the user agent
    return code2language($ua) if $ua;
    return code2language($TLD) if defined $TLD;         # or from the top-level domain
    return;
}

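First, the subroutine checks whether the user has already told us the language through the form. Next, it looks at the user-agent string sent by the browser: Mozilla-based clients may include a language tag in square brackets, as in Mozilla/4.76 [es] (X11; U; Linux i686), where [es] means it is a client compiled for the Spanish language. Not all languages are represented, and this might not be valid for browsers outside the Mozilla realm, but if this information is there, we can take advantage of it. This code is translated to a language name using code2language, which is part of the Locale::Language module.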

Then it tries to deduce the client's language from the top-level domain of the client's computer. Our server receives, along with the name of the file the client requires, a wealth of information from the client. For example, it receives the host name of the client's computer. In some cases, the host name will not be resolved, that is, converted from its numerical address, and the server will only get an address of the form 154.67.33.3. From this address it would be difficult to ascertain the native language of the client; in many other cases, generic TLDs such as com or net are used.

Finally, the computer may be revealed as something like somecomputer.somedomain.it, which means that the calling computer is most probably in Italy. In this case we use the Perl module Locale::Language, which comes along with Locale::Country, to deduce the language from the two-letter TLD code. This is a good hunch at best, since in many countries more than one language is spoken, but at least we can use it as a fallback method if the other two fail. The code2language function in this module understands most codes, and translates them to the English name of the language. However, there is a small caveat here: the language code need not be the same as the TLD code; for instance, uk is the top-level domain for the United Kingdom, but it is the code for the Ukrainian language. Still, we can expect most English clients to be caught by one of the other methods. The right way to proceed would be to convert the TLD to a country name, and then to one or several language codes or names. This is not practical, however, so we have to make do with the above method, which will work in at least some cases.
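The fallback chain can be sketched without any CGI machinery. This is a minimal sketch, assuming a hand-made (hypothetical) %tld_language table that maps a handful of country TLDs to language codes, which sidesteps the uk/Ukrainian clash described above; the function name guess_language_code is also made up. It returns a language code rather than a name, so its result could be passed to code2language.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, deliberately tiny TLD => language-code table.
# 'uk' maps to English here, not to Ukrainian (whose language code is 'uk').
my %tld_language = (
    uk => 'en', us => 'en', ie => 'en',
    es => 'es', ar => 'es', mx => 'es',
    fr => 'fr', it => 'it', de => 'de', at => 'de', nl => 'nl',
);

sub guess_language_code {
    my ($host, $user_agent) = @_;
    # 1. A Mozilla-style tag such as "[es]" in the user-agent string
    if (defined $user_agent && $user_agent =~ /\[(\w\w)\]/) {
        return lc $1;
    }
    # 2. The top-level domain, only if the host name was resolved
    return undef unless defined $host && $host =~ /\.([a-z]+)\z/i;
    return $tld_language{ lc $1 };    # undef for com, net or unknown TLDs
}

print guess_language_code('somecomputer.somedomain.it', ''), "\n";
```

A numeric address such as 154.67.33.3 ends in digits, so the TLD pattern does not match and the function gives up, leaving the caller to ask the user.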


Once the language is known (or guessed), it is used to translate messages from English into that language (the translate subroutine performs this task). It uses the WWW::Babelfish module (available on CPAN), which is a front end to the translating site at http://babelfish.altavista.com/. This site takes phrases in several languages, translates them to English, and back again. All this implies two requirements:

❑ The user must be online; each call to translate means a call to Babelfish.

❑ The language requiring translation must be among the languages Babelfish supports.

In summary, the program can guess which language the user speaks and translate its messages on the fly. Of course, this is not the best solution in terms of speed or completeness for a multilingual web site. If we receive more than a couple of requests an hour, we will be placing a heavy load on Babelfish, and our access may be cut off. A solution based on gettext can suffice for a few languages and a few messages, but it will not scale up particularly well. The best solution is to keep messages in a relational database, indexed by message key and language, and have a certified translator do all the translation. Another possible solution in Windows (and other operating systems with shared libraries) is to use one file for each language, and load the file which best suits the language of the user at run-time.
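One way to reduce the load on a remote service such as Babelfish is to remember what has already been translated. The sketch below wraps a translation backend in a cache keyed by language and text, and falls back to the original text on failure, as the translate subroutine above does. The backend here is a fake code reference standing in for a real service; make_cached_translator is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Wrap a translation backend (a code ref taking ($text, $language) and
# returning a translation or undef) so each pair is requested only once.
sub make_cached_translator {
    my ($backend) = @_;
    my %cache;
    return sub {
        my ($text, $language) = @_;
        my $key = "$language\0$text";
        unless (exists $cache{$key}) {
            my $result = eval { $backend->($text, $language) };
            return $text unless defined $result;    # failures are not cached
            $cache{$key} = $result;
        }
        return $cache{$key};
    };
}

# A fake backend standing in for the real service
my $calls = 0;
my $fake = sub {
    my ($text, $language) = @_;
    $calls++;
    return $language eq 'Spanish' ? "[es] $text" : undef;
};

my $translate = make_cached_translator($fake);
print $translate->('Pretty good', 'Spanish'), "\n";
print $translate->('Pretty good', 'Spanish'), "\n";    # served from the cache
print "backend calls: $calls\n";
```

Failures are deliberately left out of the cache, so a language that the service later supports is retried on the next request.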

It's About Time: Time Zones

It goes without saying that different areas of the globe fall into different time zones, and users from different countries will log in at different local times. To take this into account, we can create a Perl program to remind us of the local time. Just by inputting the name of the city and the general zone we are in, our computer returns the local time:

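#!/usr/bin/perl
# zones.pl
use warnings;
use strict;
use Time::Zone;                       # provides tz_offset()
use POSIX qw(tzset tzname);

die "Usage: $0 City Zone\n" if $#ARGV < 1;
my ($city, $zone) = @ARGV;

$ENV{TZ} = "$zone/$city";             # e.g. Asia/Tokyo
tzset();                              # make the C library re-read TZ
my ($thisTZ, $thisDST) = tzname();    # standard and DST names, e.g. JST
my $offset = tz_offset($thisTZ);      # standard-time offset from GMT, in seconds
die "Unknown time zone $thisTZ\n" unless defined $offset;

my ($sec, $min, $hour) = gmtime(time + $offset);   # GMT plus offset = local time
print "You are in zone $thisTZ\n",
      "Difference with respect to GMT is ", $offset/3600, " hours\n",
      "And local time is $hour hours $min minutes $sec seconds\n";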


When issued with the following example commands, the program outputs something similar to:

> perl zones.pl Cocos Indian

You are in zone CCT

Difference with respect to GMT is 8 hours

And local time is 14 hours 34 minutes 38 seconds

> perl zones.pl Tokyo Asia

You are in zone JST

Difference with respect to GMT is 9 hours

And local time is 18 hours 4 minutes 42 seconds

> perl zones.pl Tegucigalpa America

You are in zone CST

Difference with respect to GMT is -6 hours

And local time is 12 hours 4 minutes 48 seconds

Time zone notation is hairy stuff. There are at least three ways of denoting it:

❑ Using a (normally) three-letter code, such as GMT (Greenwich Mean Time), CET (Central European Time), or JST (Japan Standard Time). A whole list can be found in the file /usr/share/zoneinfo/zone.tab (on a Red Hat 6.x or later system; we might have to look for it in a different place, or not find it at all, on a different system). The Locale::Zonetab module also provides a list, and comes with examples. Here is a list of the codes for all time-lines. These are not exactly the same as time zones, as a country does not necessarily follow the time-line it falls into:

(Table: change with respect to GMT, and the corresponding time zone denomination)


❑ Using delta with respect to GMT; GMT+1 for instance, or GMT-12.

❑ Using the name of a principal city and zone; but zones are not exactly continents. There are ten zones: Europe, Asia, America, Antarctica, Australia, Pacific, Indian, Atlantic, Africa, and Arctic. For instance, the time zone a Spaniard is in would be Europe/Madrid. This denomination has the advantage of also including Daylight Saving Time information (more on this later).

❑ There are several other notations: A to Z (used by the US military), national notations (for instance, in the US), well, a whole lot of them. There is a lot of information in the Date::Manip pod page.

The whole time zone aspect is a bit more complicated, since local time changes according to Daylight Saving Time, a French invention to make people go to work by night and come back in broad daylight.

We therefore have two options: go back to old times, when people worked by the sun, or use local rules by asking the users, which is the approach most operating systems take. Most operating systems include information for all possible time zones; for instance, Linux includes local time and time zone information in the /usr/share/zoneinfo directory, which has been compiled using the zic compiler from time zone rule files. The bottom line is that we really have to go out of our way, and ask for a lot of local information, to find out what the local time zone is. Most people just ask, or take the time from the local computer. Ideally, a GPS-equipped computer would have a lookup table allowing it to know the local time for any co-ordinates in the world.

In our case, we solve the problem the fast way. In most cases, knowing the name of the most populous city in the surroundings, and the general zone out of the ten mentioned above, is all we need. From this information, the three-letter code of the time zone, along with DST information, is obtained using the POSIX functions tzset and tzname. The first sets the time zone to the one contained in the TZ environment variable, and the second returns the standard and DST names of that zone. The zone name is then used to compute the offset in seconds with respect to GMT, and the local time is obtained by adding that offset to the current time. Finally, all the above information is printed out. We will skip formatting it in the way required by the locale, since we have seen that already in a previous section. Besides, city and general zone might not be enough information to find out which locale the user prefers.
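The program above relies on the standard-time abbreviation returned by tzname, so it ignores DST, and some abbreviations (CST, for instance) are ambiguous. As an alternative sketch using only POSIX and the core Time::Local module, we can let the C library do the work: set TZ, call tzset, take localtime, and read the wall-clock fields back with timegm; the difference from the real time is the offset, including DST if it is in effect. The function name gmt_offset is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(tzset);
use Time::Local qw(timegm);

# Offset of zone $tz from GMT, in seconds, at time $t (default: now)
sub gmt_offset {
    my ($tz, $t) = @_;
    $t = time unless defined $t;
    my $saved = $ENV{TZ};
    $ENV{TZ} = $tz;
    tzset();                                  # re-read TZ
    my @local = localtime $t;                 # wall-clock time in $tz
    if (defined $saved) { $ENV{TZ} = $saved } else { delete $ENV{TZ} }
    tzset();                                  # restore the previous zone
    # Reading the wall-clock fields back as if they were GMT gives the offset
    return timegm(@local[0 .. 4], $local[5] + 1900) - $t;
}

my $zone = @ARGV ? $ARGV[0] : 'Asia/Tokyo';
printf "%s is %+.1f hours from GMT\n", $zone, gmt_offset($zone) / 3600;
```

Names such as Asia/Tokyo need the zoneinfo files described above; POSIX-style strings such as JST-9 (nine hours east of GMT) work without them.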


Looks Like a Foreign Language

Continuing with our scenario, we were not too convinced by the setup of our reporting web site. There were many reports, coming in from many well-known (and not so well-known) domains, and we could not find out which language they were written in. Even though we cannot translate between these languages, we would like at least to know which language is being used. The following script goes some way to remedying this situation:

#!/usr/bin/perl -T

# langident.pl

use warnings;

# lack of use strict;

use CGI qw(:standard);

use Locale::Language;

use Lingua::Ident;

print header, start_html;

if (!param('phrase')) {

print start_form,

"->", textfield('phrase'), "<-", end_form;

exit;

}
print em(param('phrase')), br,

Before the script can identify anything, the Lingua::Ident module has to be trained. To do that, a previous step is needed:

> trainlid xx < sample.xx > data.xx

This has to be done for each sample.* file we have. There are some samples supplied with the Lingua::Ident module, but we also used three samples extracted from appropriate web pages, namely Maltese, Basque, and Dutch (sample.mt, sample.eu and sample.nl). The trainlid program, which comes with the module, creates a data.* file with letter-combination frequencies. As an argument, it takes the name of the locale the language refers to; as input, a file with a sample of the language; and it outputs the frequency file.


The data.* files are then used to initialize the Lingua::Ident module, which enables it to identify as many languages as it is given. Unfortunately, identification is done statistically, and it is not guaranteed to find the correct language; Dutch and German are often mistaken for each other, but it will work for most languages, as can be seen in the following figures:
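Lingua::Ident's exact algorithm is not reproduced here. As a rough illustration of identification by letter-combination statistics, the toy sketch below builds trigram frequency profiles from samples (the role trainlid plays) and picks the profile that best matches a new text. The function names and the two tiny training samples are made up, and with samples this small the result is only indicative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count overlapping three-letter sequences ("trigrams") in a text
sub trigrams {
    my ($text) = @_;
    $text = lc $text;
    $text =~ s/[^a-z]+/ /g;              # keep letters only (ASCII sketch)
    $text = " $text ";
    my %count;
    $count{ substr($text, $_, 3) }++ for 0 .. length($text) - 3;
    return \%count;
}

# Turn a training sample into relative trigram frequencies
sub profile {
    my $count = trigrams(shift);
    my $total = 0;
    $total += $_ for values %$count;
    return { map { $_ => $count->{$_} / $total } keys %$count };
}

# Score the text against each profile; the highest score wins
sub identify {
    my ($text, %profiles) = @_;
    my $grams = trigrams($text);
    my ($best, $best_score);
    for my $lang (sort keys %profiles) {
        my $p = $profiles{$lang};
        my $score = 0;
        for my $g (keys %$grams) {
            $score += $grams->{$g} * ($p->{$g} || 0);
        }
        if (!defined $best_score || $score > $best_score) {
            ($best, $best_score) = ($lang, $score);
        }
    }
    return $best;
}

my %profiles = (
    en => profile('the quick brown fox jumps over the lazy dog and the cat sits on the mat'),
    es => profile('el zorro salta sobre el perro y el gato se sienta en la alfombra de la casa'),
);
print identify('the dog and the cat', %profiles), "\n";
```

Closely related languages share many trigrams, which is why pairs such as Dutch and German are easy to confuse with this kind of statistic.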


Conjugating in Portuguese

A rather striking way of learning a foreign language would be to use Perl. Surprisingly, a very useful module called Lingua::PT::Conjugate has been written for the purpose of conjugating Portuguese verbs. It includes two utilities, conjug and treinar, the first for conjugating, and the second for interactive training with Portuguese verbs. For instance, we could type:

> conjug i ficar pres

ficar :

pres fico ficas fica ficamos ficais ficam

This is the verb 'to stay' in English, conjugated in the present tense, from the first person singular to the third person plural. It does not use ISO-8859-1 encoding (due to the i in conjug i). The following script uses this module, although it should be noted that it is not Windows compliant, because the WWW::Babelfish module cannot be installed on Windows systems:

#!/usr/bin/perl -T

# conjug.pl

use warnings;

# lack of use strict here

use CGI qw(:standard *table);

pp => 'Past Participle',grd => Gerund);

my @labels = keys %tenses;

my %persons =(1 => I,


After setting up data for the combo boxes (popup_menu), here comes the program itself:

print start_table({bgcolor => 'lightblue', cellspacing => 0,

cellpadding => 5}),start_form,

Tr(td(["English Verb",textfield('verb')])), "\n",Tr(td(['Tense',,

popup_menu('tense' , \@labels, 'pres', \%tenses)])), "\n",Tr(td(['Person',

popup_menu('person', \@labelsP, 1, \%persons)])),Tr(td({colspan => 2, align => 'center'}, submit())),end_form, end_table;

This is the part of the script that processes the form, translates the verb, and conjugates it:

} else {
    print h1("Translating ", param('verb'), " to ", param('tense'),
             " in person ", param('person')),
          p(em(conjug(translate(param('verb')), param('tense'), param('person'))));
}
print end_html;

sub translate {
    my ($toTranslate) = @_;
    my $tr = WWW::Babelfish->new() or return $toTranslate;
    my $translated = $tr->translate('source'      => 'English',
                                    'destination' => 'Portuguese',
                                    'text'        => $toTranslate);
    print "Portuguese ", $tr->error if $tr->error;    # report Babelfish errors
    $translated = $toTranslate unless defined($translated);
    return $translated;
}

In this script, our knowledge of the WWW::Babelfish module is used to translate the gerund (that is, the English form of the verb ending in '-ing') of the verb from English to Portuguese, and then conjugate the verb in Portuguese. Verbs in Portuguese are pretty much the same as in many other Romance languages: they have to be conjugated, and have different forms for different tenses and different persons, not just a couple of forms as in English. Thus, we have to state the person (first, second, or third, singular or plural) and the tense.
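To see why the person and tense are needed, here is a toy sketch, not the algorithm used by Lingua::PT::Conjugate, that conjugates regular Portuguese verbs in the present tense by replacing the infinitive ending. Irregular verbs and all other tenses are out of its scope, and the function name conjugate_present is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Present-tense endings for the three regular verb classes, in the order
# eu, tu, ele/ela, nós, vós, eles/elas
my %present = (
    ar => [qw(o as a amos ais am)],
    er => [qw(o es e emos eis em)],
    ir => [qw(o es e imos is em)],
);

# Conjugate a REGULAR verb in the present tense; returns an empty list
# for anything that does not look like an -ar, -er or -ir infinitive
sub conjugate_present {
    my ($verb) = @_;
    my ($stem, $class) = $verb =~ /^(\w+)(ar|er|ir)$/ or return;
    return map { $stem . $_ } @{ $present{$class} };
}

print join(' ', conjugate_present('ficar')), "\n";
```

For ficar this reproduces the six forms printed by conjug above; irregular verbs such as ser or ter would need their own tables.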

The script follows two paths, depending on whether it is called for the first time or called with a verb. If it is called directly, it prints a form that asks for the verb, which should be entered in gerund form, since this seems to be the way of getting infinitives out of Babelfish. The tense is then requested: the menu shows the English name of each tense, while the value passed to the script is the Portuguese abbreviation (for instance, ivo, as in imperativo, which means, obviously enough, imperative).

Once the verb has been set, the script, using a version of the translate subroutine locked to Portuguese, translates the verb into Portuguese, then conjugates it by calling the Lingua::PT::Conjugate::conjug function. This function takes the verb, tense, and person as arguments, and returns the verb form. For instance, typing in driving and selecting the first person plural in the future tense (we will drive) will be processed by the program as:


The only problem is that there are no such modules for other languages. Still, if we want to work with other languages, CPAN includes several 'national' modules that take into account national peculiarities. The two main groups are the No::* set and the Cz::* set.

The Cz:: set includes Cz::Cstocs, Cz::Sort, and Cz::Time. These modules exist basically because the locale settings for the Czech and Slovak languages seem to be broken, so the modules include their own sorting and time display routines. Cz::Cstocs converts text from il2 (ISO 8859-2) to ASCII and back. These modules will probably be made obsolete by better locale modules and Unicode/UTF-8 encoding.
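As an example of the Unicode route, the il2-to-ASCII direction can be sketched with the core Encode and Unicode::Normalize modules. This is a swapped-in technique, not the interface of Cz::Cstocs: decode the ISO-8859-2 bytes, decompose accented letters into base letter plus combining mark (NFD), and drop the marks. The function name il2_to_ascii is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);
use Unicode::Normalize qw(NFD);

# Convert ISO-8859-2 ("il2") bytes to plain ASCII
sub il2_to_ascii {
    my ($bytes) = @_;
    my $text = NFD(decode('ISO-8859-2', $bytes));   # e.g. r-caron -> r + caron
    $text =~ s/\p{Mn}+//g;                           # drop the combining marks
    $text =~ s/[^\x00-\x7F]/?/g;    # letters with no decomposition, e.g. l with stroke
    return $text;
}

# "Příliš žluťoučký kůň" encoded as ISO-8859-2 bytes
my $bytes = encode('ISO-8859-2',
    "P\x{159}\x{ed}li\x{161} \x{17e}lu\x{165}ou\x{10d}k\x{fd} k\x{16f}\x{148}");
print il2_to_ascii($bytes), "\n";    # Prilis zlutoucky kun
```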

The No:: set also includes modules for dates and sorting in Norway, No::Dato and No::Sort respectively. In addition, several modules that might be useful for e-commerce applications are available:

❑ No::KontoNr – checks bank account numbers

❑ No::PersonNr – checks personal identification (social security) numbers

❑ No::Telenor – computes phone call prices for different telephone companies, times, and days of the week
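The interfaces of these modules are not reproduced here. To give an idea of what an account number check involves, here is a sketch of a modulus-11 check-digit test for an 11-digit number. The weights (5, 4, 3, 2, 7, 6, 5, 4, 3, 2) are our assumption of the Norwegian scheme, and the function name valid_account_number is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Modulus-11 check of an 11-digit account number: the first ten digits are
# weighted, and the eleventh must equal the computed check digit.
sub valid_account_number {
    my ($number) = @_;
    (my $digits = $number) =~ s/[\s.]//g;    # allow "8601.11.17947" or spaces
    return 0 unless $digits =~ /^\d{11}$/;
    my @d       = split //, $digits;
    my @weights = (5, 4, 3, 2, 7, 6, 5, 4, 3, 2);   # assumed weights
    my $sum = 0;
    $sum += $d[$_] * $weights[$_] for 0 .. 9;
    my $check = 11 - $sum % 11;
    $check = 0 if $check == 11;
    return 0 if $check == 10;                # remainder 1: no valid check digit
    return $check == $d[10] ? 1 : 0;
}

print valid_account_number('8601 11 17947') ? "valid\n" : "invalid\n";
```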

The 'Lingua::*' Modules

We can extend our login/report-based web site so that users can be known by a pseudonym or nickname (this may not seem very useful, but it is not an uncommon feature on many sites these days). The major new feature, though, is that users should now be able to type their report into the site, and the site can give them feedback on their report, judging it on its readability rating. The following code implements these new ideas (note that the Lingua::EN::Fathom module cannot be installed on a Windows machine as yet):

#!/usr/bin/perl -T

# report.pl

use warnings;

# lack of use strict

use CGI qw( :standard *table *ul);

use Lingua::EN::Syllable;

use Lingua::EN::Nickname;

use Lingua::EN::Fathom;


print header, start_html(-bgcolor => 'white', -title => "Reporting");

if (!param('report')) {
    print start_table({bgcolor => 'lightblue',
                       cellspacing => 0, cellpadding => 5}),
          start_form,
          Tr(td(["Name", textfield('name')])), "\n",
          Tr(td(["Report", textarea('report')])), "\n",
          Tr(td({colspan => 2, align => 'center'}, submit())),
          end_form, end_table;

      p("And you will be paid ", $numSyll*0.25,
        "\$ for $numSyll syllables"),
}

print end_html;

This script manages to use all three Lingua modules mentioned above (we may have some problems building the Lingua::EN::Fathom module on Windows, but we can try to build it on Linux and then copy the resulting files to Windows). The main topic of the Lingua modules is (the English) language; there are modules for syntactic parsing, recognizing gender, and so forth. There are also a few modules that deal with Russian (Lingua::RU::Charset), and the Portuguese module mentioned above. Hopefully, this will be extended in the future, but for the moment we have to make do with the many modules that deal with the English language. In this case, we are using modules that count syllables (Lingua::EN::Syllable), and find the real name that corresponds to a nickname such as 'Joe' or 'Jimmy' (Lingua::EN::Nickname). Most curious of all, Lingua::EN::Fathom analyzes text for readability, assigning it three standard indices called Kincaid, Flesch, and Fog. Kincaid returns the Flesch-Kincaid grade level of the text. Flesch grades the text from 0 to 100 according to its readability, a score between 60 and 70 being ideal. Fog returns the (theoretical) number of years of formal education needed to understand the text.
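For reference, the three indices are defined by standard published formulas; the sketch below computes them from raw counts (words, sentences, syllables, and "complex" words of three or more syllables). Whether Lingua::EN::Fathom counts words and syllables in exactly this way is not claimed here, and the function name readability is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Flesch reading ease, Flesch-Kincaid grade and Gunning Fog index from counts.
# 'complex' is the number of words with three or more syllables.
sub readability {
    my (%n) = @_;
    my $wps = $n{words} / $n{sentences};    # average words per sentence
    my $spw = $n{syllables} / $n{words};    # average syllables per word
    return (
        flesch  => 206.835 - 1.015 * $wps - 84.6 * $spw,
        kincaid => 0.39 * $wps + 11.8 * $spw - 15.59,
        fog     => 0.4 * ($wps + 100 * $n{complex} / $n{words}),
    );
}

my %score = readability(words => 100, sentences => 5,
                        syllables => 150, complex => 10);
printf "%-8s %6.2f\n", $_, $score{$_} for sort keys %score;
```

Long sentences raise all three indices, which is why shortening sentences is usually the quickest way to a better score.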

The structure of the script is very similar to the others in this chapter. It checks whether the CGI parameter report is present. If it is not, it prints the form, requesting the nickname and the report. If it is, the report is processed: the script creates a Fathom object, which analyzes the text, counts syllables, and computes the name corresponding to the nickname using nickroot, in the following lines (taken from the above code):


The rest is printing, and calling the fog, kincaid, and flesch methods to get the respective indices. The expected style of result is shown in the following figure:

This does not say much about the linguistic reporting skills of 'Tamara'. A Fog index of 21 is way out (18 would already be pretty difficult to understand), and the Kincaid index is also quite far from the optimum (between 7 and 8).

A small piece of advice here: nickroot does not appear in the Nickname POD; rootname is mentioned instead. The documentation (version 1.1) is badly out of sync with the current release.

Spelling and Stemming

A simple way to improve the readability ratings of the reports submitted by users would be to check the spelling for errors. The previous program has therefore been modified to check English spelling and correct it, if possible:

This code is not Windows compliant. Although the Lingua::Ispell module can be installed on Windows, the code will not work, because the Windows spellchecker does not work in the same way as the ispell program on UNIX.


use Lingua::Ispell qw(spellcheck);

print header, start_html(-bgcolor => 'white',
                         -title => "Reporting");

if (!param('report')) {
    print start_table({bgcolor => 'lightblue', cellspacing => 0,
                       cellpadding => 5}),
          start_form,
          Tr(td(["Name", textfield('name')])), "\n",
          Tr(td(["Report", textarea('report')])), "\n",
          Tr(td({colspan => 2, align => 'center'}, submit())),
          end_form, end_table;

      p("Which has received the following ratings"), start_ul,
      li("Fog = ", $text->fog),
      li("Kincaid = ", $text->kincaid),
      li("Flesch = ", $text->flesch), end_ul,
      p("And had the following misspellings "), start_ul;

    for (keys %errors) {
        print li($_, " Suggestions :", join(" ", @{$errors{$_}}));
    }
    print end_ul, p("And you will be paid ", $numSyll*0.25,
                    "\$ for $numSyll syllables ");
}
print end_html;

As we can see, this program is quite similar to the previous example. In the highlighted lines, the path to the ispell program is set in the following line:


This function returns an array of references to hashes, one for each word in the input string, each with several keys: term contains the original word, and type is the flag set by ispell. We should pay attention if type is none or miss. In the latter case, a further key, misses, can be used; it contains an array of possible correct spellings of the word. We only flag near misses, place them in the %errors hash, and print them afterwards.
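The filtering step can be sketched on its own. The results below are made-up sample data shaped as just described (hash references with term, type and, for near misses, misses), not real ispell output; as in the text, only near misses go into %errors, and words of type none are skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up results in the shape described above, one hash ref per word
my @results = (
    { term => 'report', type => 'ok' },
    { term => 'teh',    type => 'miss', misses => ['the', 'ten', 'tea'] },
    { term => 'xqzt',   type => 'none' },
);

# Keep only near misses, with their suggested spellings
my %errors;
for my $r (@results) {
    next unless $r->{type} eq 'miss';
    $errors{ $r->{term} } = $r->{misses};
}

for my $word (sort keys %errors) {
    print "$word Suggestions: ", join(' ', @{ $errors{$word} }), "\n";
}
```

A real application might also want to report words of type none, since they are unknown to the dictionary but have no suggestion to offer.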

Take into account that the ispell utility seems to be a bit obsolete, and may not be widely available. Indeed, it has been replaced with aspell in current Red Hat Linux distributions, but there is no difference in results, since aspell includes an ispell wrapper in the package. If we type in an English name and a report composed of the following phrases, the program will return something like this:

Perl can play other word games besides the spell check. For instance, in many information retrieval applications, it is sometimes useful to know the root of a word. Take the case where somebody is searching for the word 'birds'. They may also be looking for 'birdies', 'bird' or 'bird's nest'. The operation that extracts the root of a word is called stemming, and it is language dependent. Incidentally, this operation is very easy in English, with conjugations that have only a few forms, no declensions, and no male/female variation in adjectives. Difficulties may arise in other languages such as Basque, which has 14 different declensions or cases, and 3 different numbers: singular, plural and mugagabe (indefinite or collective). There is, however, a Lingua::Stem module that can help (in the English case at least):

#!/usr/bin/perl

# stem.pl

use warnings;

use strict;


use Lingua::Stem qw(stem);

my @words = qw(birds birdie flying flown worked working);

my $stemmer = Lingua::Stem->new(-locale => 'EN-US');

my $stems = $stemmer->stem(@words);

print "Word \tStem \n";

for my $i (0 .. $#words) {          # print each word next to its stem, in order
    print "$words[$i]\t$stems->[$i]\n";
}

This returns something like this:

Writing Multilingual Web Pages

In order to give an example of a multilingual web page, we are going to create two forms. The first allows users to input their name and the language they understand. User information will be stored in cookies by the first script for use in the second script. The second script will then display, among other things, a greeting in the user's chosen language, and will also allow users to type in text and have it translated into Spanish.

#!/usr/bin/perl -T

# form.pl
use warnings;

# lack of use strict;

use CGI qw(:standard *table);

use Locale::Language;

use Locale::Country;

if (!param('name')) {
    print header,
          start_html(-bgcolor => 'white', -title => "ID card"),
          start_table({bgcolor => 'lightblue', cellspacing => 0, cellpadding => 5}),
          start_form,
          Tr(td("Name"), td({colspan => 3}, textfield('name'))), "\n",
          Tr(td(["Language you understand",
                 radio_group('lang', ['English', 'French', 'Spanish'])])), "\n",
          Tr(td({colspan => 4, align => 'center'}, submit())),
          end_form, end_table;
} else { # process form and set cookies
    my @cookies;
    push(@cookies, cookie(-name => 'name', -value => param('name'),
                          -expires => '+3M', -path => '/cgi-bin'));
    push(@cookies, cookie(-name => 'lang',
                          -value => language2code(param('lang')),
                          -expires => '+3M', -path => '/cgi-bin'));
    print header(-cookie => \@cookies),
          start_html(-bgcolor => 'white', -title => "ID card accepted"),
          "Application accepted \n";
}

print end_html;

This example is pretty straightforward: by default, it prints a form, and if the name field is already filled in, the form is processed. The language is also converted into a language code: en, fr, or es. The path in the cookie function will have to be changed to whatever path our script resides in. The expiry date can also be changed to suit other needs; in our case it was set to 3 months.

Next, we will use a comma-separated flat file called messages.txt to store our multilingual messages. Our application will then search through it, displaying the correct language on our report form. Of course, this depends on the language that the user has indicated that they understand. The columns are English, Spanish, and French, in that order. All messages are normalized, with no capitals, so that capitalization can be established by the application:

hello, hola, salut

please, por favor, s'il vous plait

here is, aquí está, voici

welcome, bienvenido, bienvenu

today's date is, la fecha del día es, aujourd'hui est le

your name, su nombre, votre nom

report, informe, rapport

send, enviar, envoyer

received, recibido, reçu

by, por, par
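The getMessage subroutine used by the script below searches the file on every call. An alternative, sketched here with hypothetical function names, reads the file once into a nested hash indexed by English key and language code, falling back to the key itself for unknown messages. An in-memory string stands in for messages.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column order of messages.txt: English, Spanish, French
my @languages = qw(en es fr);

# Read "english, spanish, french" lines into $table{$key}{$lang}
sub load_messages {
    my ($fh) = @_;
    my %table;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;                   # skip blank lines
        my @fields = split /\s*,\s*/, $line;
        @{ $table{ $fields[0] } }{@languages} = @fields;
    }
    return \%table;
}

# Look up a message, falling back to the English key itself
sub message {
    my ($table, $key, $lang) = @_;
    my $text = $table->{$key} && $table->{$key}{$lang};
    return defined $text ? $text : $key;
}

my $data = "hello, hola, salut\nyour name, su nombre, votre nom\n";
open(my $fh, '<', \$data) or die "Cannot read messages: $!";
my $table = load_messages($fh);
print ucfirst(message($table, 'hello', 'fr')), "\n";
```

Because the messages are stored in lower case, ucfirst (as used by the script) can capitalize them wherever they start a sentence.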

Since languages are taken from cookies, and we set the cookies ourselves, we would not expect this to give an error (but there will always be invalid input when one expects valid input). Again, note that the script will not work on Windows, since the WWW::Babelfish module cannot be installed there:

#!/usr/bin/perl -T

# babel.pl

use warnings;

# lack of strict


print header, start_html(-bgcolor => 'white', -title => "Reporting");

setlocale(LC_TIME, cookie('locale'));

# If it does not exist, defaults to local

So far, common processing is used for both showing and processing the form Now, if the report CGIparameter is not present, it will show the form If it is present, it will process it

if (!param('report')) { #Show form

my ($sec, $min, $hour, $mday, $mon, $year, $wday) = localtime (time);

    print h2(ucfirst(getMessage('hello', $language)),
             " $name,", getMessage("today's date is", $language),
             strftime("%c", $sec, $min, $hour, $mday, $mon, $year, $wday), "\n");
    print start_table({bgcolor => 'lightblue', cellspacing => 0,
                       cellpadding => 5}),
          start_form,
          Tr(td([ucfirst(getMessage('report', $language)),
                 textarea(-rows => 10, -columns => 50, -name => 'report')])), "\n",
          Tr(td({colspan => 2, align => 'center'},
                submit(ucfirst(getMessage('send', $language))))),
          end_form, end_table;

If the form has been completed, we have the parameter report We then have to process it:

} else { # process the report and show it translated into Spanish
    print h2(ucfirst(getMessage('received', $language)),
             getMessage('report', $language),
             getMessage('by', $language), " $name"),
          p(b("Original"), param('report')),
          p(em(translate(param('report'), code2language($language))));
}
} else { # ask for login
    print redirect(-uri => '/cgi-bin/form.pl');
}
print end_html;

The getMessage subroutine (below) opens a filehandle to the flat-file database, taking the key and the language code as parameters. It then splits each line into an array of phrases, looks for the key, and returns the desired element. The other subroutine, translate, does exactly what is expected: it translates our report into Spanish. Beware, however, that Babelfish cannot translate every language into every other language. In our case, users selecting French as the language they understand will not have their report translated into Spanish. For a full listing of languages Babelfish can translate to and from, visit http://www.infotektur.com/demos/babelfish/en.html


This is the getMessage subroutine used in the babel.pl script:

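sub getMessage {
    my ($key, $language) = @_;
    chomp($key, $language);
    # columns of messages.txt: English, Spanish, French
    my %column = (en => 0, es => 1, fr => 2);
    my $col = exists $column{$language} ? $column{$language} : 0;
    open(my $fh, '<', 'messages.txt') or return $key;
    while (my $line = <$fh>) {
        chomp $line;
        my @final = split /\s*,\s*/, $line;
        next unless defined $final[0] && $final[0] eq $key;   # exact match on the key
        close($fh);
        return defined $final[$col] ? $final[$col] : $key;
    }
    close($fh);
    return $key;                         # unknown key: use it untranslated
}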

and here is the translate subroutine:

sub translate {

my ($toTranslate, $language) = @_;

my $tr = WWW::Babelfish->new() or return $toTranslate;

my $translated = $tr->translate('source' => $language,

'destination' => 'Spanish','text' => $toTranslate);

print $language, " ", $tr->error if $tr->error;

$translated = $toTranslate unless defined ($translated);

return $translated;

}
