automated data extraction from the web

Scientific report document: "Extraction and Approximation of Numerical Attributes from the Web" pdf

... of the addressed numerical attributes. Evaluation was done using human subjects. It is difficult to do an automated evaluation, since the nature of the data is different from that of the QA dataset. ... indeed most (≥ 50%) of the retrieved values fit the retrieved bounds. If the lower and/or upper bound contradicts more than half of the data, we reject the bound. Otherwise we remove all ... value for the given object. During the first stage it is possible that we directly extract from the text a set of values for the requested object. The bounds processing step rejects some of these...
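The bound-rejection rule quoted in this excerpt can be illustrated with a minimal sketch. The function name `apply_bounds` is hypothetical, and the excerpt elides what is removed when a bound is kept; the sketch assumes that the values violating an accepted bound are the ones dropped.

```python
from typing import List, Optional, Sequence, Tuple


def apply_bounds(values: Sequence[float],
                 lower: Optional[float] = None,
                 upper: Optional[float] = None
                 ) -> Tuple[List[float], Optional[float], Optional[float]]:
    """Reject a bound that contradicts more than half of the values.

    Assumption (not stated in the excerpt): when a bound is kept,
    the values that violate it are removed.
    """
    values = list(values)
    half = len(values) / 2
    # A bound is rejected when more than half of the values contradict it.
    if lower is not None and sum(v < lower for v in values) > half:
        lower = None
    if upper is not None and sum(v > upper for v in values) > half:
        upper = None
    # Keep only the values consistent with the surviving bounds.
    kept = [v for v in values
            if (lower is None or v >= lower)
            and (upper is None or v <= upper)]
    return kept, lower, upper
```

With extracted values `[10, 12, 11, 50]` and bounds 5 and 20, only one value contradicts the upper bound, so both bounds are kept and 50 is dropped. A lower bound of 30 would contradict three of the four values and would itself be rejected.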

Uploaded: 20/02/2014, 04:20

10 467 0
Scientific report: "A DOM Tree Alignment Model for Mining Parallel Data from the Web" doc

... that, using the new web mining scheme, the web mining throughput is increased by 32%; (ii) The quality of the mined data is improved. By leveraging the web pages’ HTML structures, the sentence ... English-Chinese parallel data from the web. The mining procedure is initiated by acquiring Chinese website list. We have downloaded about 300,000 URLs of Chinese websites from the web directories ... performance on the web data, the similarity of the HTML tag structures between the parallel web documents should be leveraged properly in the sentence alignment model. In order to improve the quality...

Uploaded: 08/03/2014, 02:21

8 436 0
Scientific report document: "Automatic Collection of Related Terms from the Web" pptx

... query is a term, its hit is the number of pages that contain the term on the Web. We use the following notation. H(x) = the number of pages that contain the term x. The number H(x) can be used ... in the compiled corpus. R: the target term did not exist on the collected web pages. Only 43 terms (20%) out of 210 terms were collected by the system. This low recall primarily comes from the ... Sentence extraction: The system decomposes each page into sentences, and extracts the sentences that contain the seed term s. The reason why we use the additional three queries is that they work...
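The sentence-extraction step in this excerpt can be sketched as follows. This is only an illustration: the function name `extract_seed_sentences` is hypothetical, and the splitter assumes English-style sentence punctuation, which may not match the tokenization used in the report.

```python
import re
from typing import List


def extract_seed_sentences(page_text: str, seed: str) -> List[str]:
    """Split a page into sentences and keep those that contain the seed term."""
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", page_text.strip())
    # Whole-word, case-insensitive match of the seed term.
    pattern = re.compile(r"\b" + re.escape(seed) + r"\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]
```

For a page reading "Jaguar is a big cat. It lives in the Americas. The jaguar hunts at night." and the seed term `jaguar`, the first and third sentences are kept.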

Uploaded: 20/02/2014, 16:20

4 438 0
Scientific report: "Automatic Set Instance Extraction using the Web" pptx

... components: the Fetcher, Extractor, and Ranker. The Fetcher is responsible for fetching web documents, and the URLs of the documents come from top results retrieved from the search engine using the ... a page. All other candidate instances bracketed by these contextual strings derived from a particular page are extracted from the same page. After the candidates are extracted, the Ranker constructs ... instance extraction for each dataset measured in MAP. NP is the Noisy Instance Provider, NE is the Noisy Instance Expander, and BS is the Bootstrapper. quality of the initial list, and the Bootstrapper...

Uploaded: 08/03/2014, 00:20

9 336 0
Scientific report: "Automatic Acquisition of Ranked Qualia Structures from the Web" potx

... (not calculated over the Web) as well as the conditional probability calculated over the Web (Web-P) delivered the best results, while the PMI-based ranking measure yielded the worst results. ... coefficient (Web-Jac), the Pointwise Mutual Information (Web-PMI) and the conditional probability (Web-P). We also present a version of the conditional probability which does not use the Web but merely ... appropriate queries to the web search engine and choosing the article leading to the highest number of results. The corresponding patterns are then matched in the 50 snippets returned by the search engine...
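The three Web-based ranking measures named in this excerpt can be illustrated from raw hit counts. The sketch below uses the textbook definitions of the Jaccard coefficient, pointwise mutual information, and conditional probability; the exact formulas in the report are not visible in the excerpt and may differ. The function names and the `n_pages` parameter (an assumed total number of indexed pages) are hypothetical.

```python
import math


def web_jaccard(h_x: int, h_y: int, h_xy: int) -> float:
    """Jaccard coefficient: joint hits over hits for either term."""
    union = h_x + h_y - h_xy
    return h_xy / union if union > 0 else 0.0


def web_pmi(h_x: int, h_y: int, h_xy: int, n_pages: int) -> float:
    """Pointwise mutual information, log2(P(x,y) / (P(x) P(y))).

    n_pages is an assumed estimate of the total number of indexed pages.
    """
    if min(h_x, h_y, h_xy) <= 0:
        return float("-inf")
    return math.log2((h_xy * n_pages) / (h_x * h_y))


def web_conditional(h_x: int, h_xy: int) -> float:
    """Conditional probability P(y | x) estimated from hit counts."""
    return h_xy / h_x if h_x > 0 else 0.0
```

Here `h_x` and `h_y` stand for the hit counts of the two terms queried separately and `h_xy` for the hit count of the joint query.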

Uploaded: 08/03/2014, 02:21

8 381 0
Scientific report: "Mining Parenthetical Translations from the Web by Word Alignment" potx

... our modified version of the competitive linking algorithm, the link score of a pair of words is the sum of the φ² scores of the words themselves, their prefixes and their suffixes. In addition ... BLEU score based on the test data in the 2006 NIST MT Evaluation Workshop. 6 Related Work: Nagata et al. (2001) made the first proposal to mine translations from the web. Their work was concentrated ... pairs, where the translation of the in-parenthesis terms is a suffix of the pre-parenthesis text. The lengths and frequency counts of the suffixes have been used to determine what is the translation...
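The link score described in this excerpt sums φ² scores over a word pair and its prefixes and suffixes. A minimal sketch follows, assuming the standard φ² statistic on a 2×2 co-occurrence table. How the report builds the prefix and suffix tables is not shown in the excerpt, so the tables passed in here are hypothetical inputs.

```python
from typing import Iterable, Tuple

# (a, b, c, d): both words co-occur, only the source word occurs,
# only the target word occurs, neither occurs.
Table = Tuple[int, int, int, int]


def phi_squared(a: int, b: int, c: int, d: int) -> float:
    """Standard phi-squared association score for a 2x2 contingency table."""
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return (a * d - b * c) ** 2 / denom if denom else 0.0


def link_score(tables: Iterable[Table]) -> float:
    """Sum the phi-squared scores of the word pair and its affix pairs.

    `tables` is assumed to hold one contingency table for the words
    themselves plus one per prefix pair and suffix pair considered.
    """
    return sum(phi_squared(*t) for t in tables)
```

A perfectly associated pair such as `(10, 0, 0, 10)` scores 1.0, and an independent pair such as `(5, 5, 5, 5)` scores 0.0.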

Uploaded: 17/03/2014, 02:20

9 612 0
Scientific report: "Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs" pdf

... hyponym patterns to extract class instances from the web and then evaluates them further by computing mutual information scores based on web queries. The work by (Widdows and Dorow, 2002) on lexical ... to instantiate the pattern. On the first iteration, the pattern is given to Google as a web query, and new class members are extracted from the retrieved text snippets. We wanted the system to ... progresses. Initially, the seed is the only trusted class member and the only vertex in the graph. The bootstrapping process begins by instantiating the doubly-anchored pattern with the seed class...

Uploaded: 17/03/2014, 02:20

9 345 0
Scientific report: "Extracting Hypernym Pairs from the Web" potx

... relations from the web. We compare our approach with hypernym extraction from morphological clues and from large text corpora. We show that the abundance of available data on the web enables obtaining ... in employing the web for the extraction of hypernym relations. We are especially curious about whether the size of the web allows to achieve meaningful results with basic extraction techniques. In ... WordNet. In the center group of ten pairs all errors are caused by the morphological approach while all other errors originate from the web extraction method. 4 Concluding remarks: The contributions...

Uploaded: 17/03/2014, 04:20

4 396 0
Document: Module 11: Accessing Data from the Outlook 2000 Client ppt

... Accessing Data from the Outlook 2000 Client. Using the Data Source Control: function of the Data Source Control; used as the reporting engine; manages the connection to the underlying data ... list from a relational data source, the PivotTable Service is used to create a multidimensional data cube from the relational data bound to the Data Source control. This data cube is then used ... manipulate data from the data source, and disconnect from the data source when you finish using the data. One of the major benefits of ADO is that it requires fewer calls to achieve the same...

Uploaded: 21/12/2013, 06:15

62 409 0
Document: Fertility, Family Planning, and Women’s Health: New Data From the 1995 National Survey of Family Growth pptx

... nonvoluntary intercourse. One set of questions was in the interviewer-administered portion of the survey and the second was in the self-administered portion (Audio CASI). In the interviewer-administered series, they were asked whether their first intercourse was "voluntary or not voluntary." For about 8 percent of women 15–44 years of age who have had intercourse, their first intercourse was not voluntary (table 21). For those whose first intercourse occurred at age 15 or younger, that first intercourse was nonvoluntary for 16 percent compared with 7 percent or less for those whose first intercourse occurred at age 16 or older. The percent whose first intercourse was nonvoluntary is nearly 10 percent among women whose first intercourse was before 1975 compared with about 6 percent among women who first had intercourse in the 1990’s (table 21). In the self-administered (Audio CASI) portion of the interview, women were asked a related but different question: whether they had ever been forced by a man to have sexual intercourse against their will. About 20 percent of women reported that they had been forced by a man to have intercourse against their will at some time in their lives (table 22). Thus, table 21 shows that for 8 percent of women, their first intercourse was nonvoluntary; table 22 shows that 20 percent had had nonvoluntary intercourse at some time, not necessarily at first intercourse. Table 22 also shows that 6 percent of women reported that they were forced to have intercourse before they were 15 and another 6 percent before they were 18. A fairly high percent of formerly married (divorced or separated) women, about 35 percent, reported that they had been forced to have intercourse. This finding deserves further study.
First Sexual Partner. There has been much public discussion about the partners of sexually active teenagers. Table 23 profiles the age of male partners at women’s first voluntary intercourse. About two-thirds (66 percent) of women who had their first voluntary intercourse before they were 16 had first partners who were under 18 years of age; 21 percent had first partners 18–19 years of age; 7 percent had first partners 20–22 years of age, 2 percent had first partners 23–24 years of age, and 4 percent had first partners 25 years of age or older (table 23). Only 3 percent of women had their first intercourse with a man they just met. About 3 out of 5 women (61 percent) were "going steady" or "going together" with the man they had intercourse with the first time, and about 1 in 5 were engaged or married to him. About 12 percent of all women were married when they had their first intercourse. Among women 40–44 years of age (born in 1951–55), 23 percent were married to their partner at first intercourse while about 2 percent of women 15–19 years of age (born 1971–75) were married to their first partner. Women who lived with both of their parents throughout their childhood were more likely than other women to have been married to their partner at first intercourse (table 24). First Intercourse Relative to First Marriage. Among ever-married women 15–44 years of age, 82 percent had first intercourse before they were married. About 69 percent of those first married in 1965–74 had their first intercourse before marriage compared with 89 percent of those first married in the 1990’s. Only 2 percent of those first married in 1965–74 had their first intercourse 5 years or more before marriage compared with 56 percent of those first married in the 1990’s (table 25). Number of Sexual Partners. As mentioned previously, some questions on abortion, sexual partners, and forced sexual intercourse were asked in both the interviewer-administered and the self-administered (Audio CASI) portions of the interview. Responses to sensitive questions appear to have been affected by the computer self-administered mode of interviewing.
Tables 26–31 show data on the number of sexual partners in the last 1 year, 5 years, and lifetime, using both the interviewer-administered and self-administered methods. Presenting data based on both modes of interviewing allows the examination of differences in reporting due to the mode of interviewing (table 26 versus 27, table 28 versus 29, and table 30 versus 31); and the selection of findings most appropriate for comparison to other surveys. About 3 percent of unmarried women told the interviewer that they had had four or more male sexual partners in the last 12 months (table 26), compared with 9 percent reporting four or more partners in Audio CASI (table 27). A similar disparity was found when comparing the interviewer results with Audio CASI results for the number of partners since January 1991 (a little less than 5 years, on average). Among unmarried women, 14 percent told the interviewer they had four or more male sexual partners since January 1991 (table 28) while 18 percent reported in Audio CASI that they had had four or more partners in that time (table 29). This topic deserves more detailed study, but it appears that using the more private interview technique gave a higher and presumably more complete estimate of the number of partners among unmarried women (8,11). Marriage and Cohabitation. Tables 32–37 show 1995 data on formal marriage and unmarried cohabitation. About 38 percent of women 15–44 years of age had never been married when interviewed in 1995 (table 32). The percent never married was higher in every age group in 1995 than it was in 1982 (24). About half of women 25–39 years of age have had an unmarried cohabitation with a man at some time in their lives; 10 to 11 percent of women in their twenties are currently cohabiting with a man (table 33). About 30 percent of women 25–39 years of age lived with a man (cohabited) before their first marriage (table 34). Over one-half (57 percent) of [Series 23, No. 19, Page 5] ...
the population. The number of women she represents in the population is called her sampling weight. Sampling weights may vary considerably from this average value depending on the respondent’s race, the response rate for similar women, and other factors. As with any sample survey, the estimates in this report are subject to sampling variability. Significance tests on NSFG data should be done taking the sampling design into account. Nonsampling errors were minimized by stringent quality-control procedures that included thorough interviewer training, checking the consistency of answers during and after the interview, imputing missing data, and adjusting the sampling weights for nonresponse and undercoverage to match national totals. Estimates of sampling errors and other statistical aspects of the survey are described in more detail in another separate report (13). This report shows findings by characteristics of the woman interviewed, including her age, marital status, education, parity, household income divided by the poverty level, and race and Hispanic origin. It has been shown that black and Hispanic women have markedly lower levels of income, education, and access to health care and health insurance, than white women (14). These and other factors, rather than race or origin per se, probably account for differences in the behaviors and outcomes studied in this report among white, black, and Hispanic women (15). Table B shows a factor that should be considered in interpreting trends in pregnancy-related behavior in the United States: the changing age composition of the reproductive-age population. In 1982, there were 54.1 million women of reproductive age in the United States; in 1988, 57.9 million; and in 1995, 60.2 million (16). The large baby boom cohort, born between 1946 and 1964, was 18–34 years of age in 1982, 24–42 years of age in 1988, and 31–49 years of age in 1995. These large birth cohorts were preceded (up to 1945) and followed (1965–80) by smaller cohorts.
While the overall number of women 15–44 years of age rose by 6 million, or 11 percent between 1982 and 1995, the number of teenage women dropped by about 6 percent, the number of women 20–24 years of age dropped by 15 percent, and the number of women 25–29 dropped by 6 percent (table B). In contrast, the number of women 30–44 years of age increased sharply; for example, the number of women 40–44 years of age increased by 59 percent between 1982 and 1995. Also, women 30–44 years of age accounted for 54 percent of women 15–44 years of age in 1995 compared with 44 percent in 1982. These differences in age composition may be relevant whenever time trends among women 15–44 years of age are being discussed. Public use files based on the 1995 NSFG are available on computer tape. They will also be available on Compact Disc Read-Only Memory (CD-ROM). Questions about the cost and availability of the computer tapes should be directed to the National Technical Information Service (NTIS), 5285 Port Royal Road, Springfield, VA 22161, 703-487-4650, or 1-800-553-NTIS. Questions regarding the CD-ROM files should be directed to NCHS Data Dissemination Branch at 301-436-8500. Results. Tables 1–17 contain measures of pregnancy and birth in the United States. Children Ever Born and Total Births Expected. In 1995, women 15–44 years of age in the United States had had an average of 1.2 births per woman (table 1). This compares with 1.2 in 1988 and 1.3 in 1982 (17). In 1995, women 15–44 years of age expected to finish their childbearing with an average of 2.2 children per woman (table 1) compared with 2.2 in 1988 and 2.4 in 1982 (17). The proportion who report that they have never been pregnant was markedly higher for college graduates than for those who did not complete high school (table 3). This same pattern by education is also seen when data for live births are examined (tables 4–5): about 49 percent of women 22–44 years of age who had graduated from college had had no live births as of the date of interview compared with just 8 percent of women 22–44 years of age without a high school diploma (table 4). Within race and Hispanic origin groups, the pattern was the same: college graduates had markedly higher percents childless than women with less education (table 5).
Table 6 shows a comparison between live births reported in the NSFG and live births registered on birth certificates in the years 1991–94. In each individual calendar year and for the sum of the years 1991–94, the NSFG estimate of the number of births is very close to the birth certificate total and differs from it by less than the NSFG’s sampling error. The NSFG estimate is also very close for white women. The NSFG estimate for black women is slightly lower, and the estimate for other races somewhat higher than the birth certificate data. A discussion of this difference is given in the definition of "Race and Hispanic origin" in the "Definitions of Terms." Overall, and by characteristics other than race, however, table 6 shows that [Table B. Number of women, by age: United States, 1982, 1988, and 1995] ... Human Services. These organizations, along with leading researchers from outside the government, helped to design the survey. Further details on the planning and operation of the survey are given...

Uploaded: 12/02/2014, 23:20

125 760 0