Scientific report: "WORD AND OBJECT IN DISEASE DESCRIPTIONS"

Title: Word and object in disease descriptions
Authors: M.S. Blois, D.D. Sherertz, M.S. Tuttle
Institution: University of California, San Francisco
Field: Medical information science
Type: Research paper
City: San Francisco

M.S. Blois, D.D. Sherertz, M.S. Tuttle
Section on Medical Information Science
University of California, San Francisco

Experiments were conducted on a book, Current Medical Information and Terminology (AMA, Chicago, 1971, edited by Burgess Gordon, M.D.), which is a compendium of 3262 diseases, each of which is defined by a collection of attributes. The original purpose of the book was to introduce a standard nomenclature of disease names, and the attributes are organized in conventional medical form: a definition consists of a brief description of the relevant symptoms, signs, laboratory findings, and the like. Each disease is, in addition, assigned to one (or at most two) of eleven disease categories which enumerate physiological systems (skin, respiratory, cardiovascular, etc.). While the editorial style of the book is highly telegraphic, with many attributes being expressed as single words, it is nevertheless easily readable (see Figure 1).

The vocabulary employed consists of about 19,000 distinct "words" (determined by a lexical definition), roughly divided equally between common English words and medical terms. We measured word frequency by "disease occurrence" (the number of disease definitions in which a given word occurs one or more times). By this measure, only seven words occurred in more than half the disease definitions, and about 40% of the vocabulary occurred in [...] words at the top of the frequency list together with the number of occurrences.)
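The "disease occurrence" count described above can be illustrated with a minimal sketch. The tokenizer (lowercased runs of letters and digits) and the toy definitions are assumptions made for illustration; the paper says only that words were "determined by a lexical definition".

```python
import re
from collections import Counter


def disease_occurrence(definitions):
    """For each word, count the definitions in which it occurs at least once."""
    counts = Counter()
    for text in definitions.values():
        # A set makes a word repeated inside one definition count only once.
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        counts.update(words)
    return counts


# Invented toy definitions, not taken from CMIT.
toy_definitions = {
    "d1": "FEVER; CHILLS; FEVER SUSTAINED",
    "d2": "MURMUR; FEVER",
    "d3": "PRURITUS; PAPULES",
}
freq = disease_occurrence(toy_definitions)
print(freq.most_common(3))
```

Here 'fever' counts twice (d1 and d2) even though it appears three times in the text, which is the distinction between disease occurrence and raw word frequency.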

Assisted by the facilities of the UNIX operating system, we created a series of inverted files (from a magnetic tape of the CMIT text), and developed a set of interactive programs to form a word-and-context query system. This system has enabled us to study the problem of inferring term reference in this large sample of text (some 333,000 word occurrences), within the context of diseases.
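As a rough illustration of what an inverted file provides, the sketch below maps each word to the set of disease definitions containing it and answers a simple conjunctive query. The function names, the tokenizer, and the toy data are hypothetical; the paper does not describe the layout of its files or the form of its queries.

```python
import re
from collections import defaultdict


def build_inverted_index(definitions):
    """Map each word to the set of disease ids whose definition contains it."""
    index = defaultdict(set)
    for disease_id, text in definitions.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(disease_id)
    return index


def diseases_with_all(index, *words):
    """Return the disease ids whose definitions contain every query word."""
    postings = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*postings) if postings else set()


# Invented toy definitions, not taken from CMIT.
toy_definitions = {
    "d1": "FEVER; CHILLS; HEADACHE",
    "d2": "MURMUR; FEVER",
    "d3": "PRURITUS; PAPULES",
}
index = build_inverted_index(toy_definitions)
print(diseases_with_all(index, "fever", "chills"))
```

Because the index stores disease ids rather than text positions, lookups of this kind stay cheap even for a corpus of a few hundred thousand word occurrences.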

An interesting early result was the ease with which many medical terms could be algorithmically separated from common English words. After adjusting for the fact that some disease categories are larger than others, we defined an entropy-like measure of the distribution of word occurrences over the eleven physiological categories as a measure of category specificity. We reasoned that some medical terms such as 'murmur', while not specific to any particular heart disease, are specific to heart disease generally. This term would not, for example, be used in describing endocrine disorders. Such a word would be expected to occur in category 04 (cardiovascular disease) frequently, and not in the other categories. Such a term would, by our measure, have a low 'entropy'. A common English word like 'of' would be used in the descriptions of all kinds of disease, and would accordingly have a high 'entropy'.

Tables 2 and 3 show the top and bottom of the list of all words occurring in two or more diseases sorted by this entropy measure. In these lists, as our hypothesis seems to imply, low 'entropy' corresponds to high 'specificity', and high 'entropy' to low 'specificity'. This separation of medical terms from common English words, by algorithmic means, is facilitated by the context supplied by the notion of 'disease category', and the fact that this was represented in the CMIT text.
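The entropy-like measure can be sketched as follows. The exact size adjustment is not spelled out in the text, so this sketch assumes the simplest one: divide a word's disease occurrences in each category by the number of diseases in that category, normalize the resulting rates to sum to 1, and take the Shannon entropy with natural logarithms. The category sizes below are invented.

```python
import math


def category_entropy(word_counts, category_sizes):
    """Entropy (natural log) of a word's size-adjusted distribution over categories."""
    # Occurrence rate per disease in each category (assumed form of the adjustment).
    rates = [c / n for c, n in zip(word_counts, category_sizes) if n > 0]
    total = sum(rates)
    if total == 0:
        raise ValueError("word does not occur in any category")
    probs = [r / total for r in rates]
    return -sum(p * math.log(p) for p in probs if p > 0)


# Invented sizes for eleven categories (number of diseases in each).
sizes = [300, 400, 250, 200, 350, 150, 400, 300, 200, 450, 262]

# A 'murmur'-like word: all occurrences fall in one category.
specific = [0, 0, 0, 0, 30, 0, 0, 0, 0, 0, 0]
# An 'of'-like word: occurs in every disease of every category.
common = list(sizes)

h_specific = category_entropy(specific, sizes)
h_common = category_entropy(common, sizes)
print(round(h_specific, 4), round(h_common, 4))
```

Under these assumptions a word confined to one category scores 0, and a word spread evenly across all eleven scores ln 11, about 2.398, which is the upper bound of the measure.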

* This work was supported in part by grants from The Commonwealth Fund, and from the National Library of Medicine (1 K10 LM00014).

Our second experiment investigated the co-occurrence properties of some medical terms. Aware that many medical diagnostic programs have assumed attribute independence, we sought to shed light on the appropriateness of the assumption by evaluating it in terms of word co-occurrence in disease definitions.

Since the previously described procedure had given us a means of selecting medical terms from common English words, it was possible to produce lists of 'pure' medical terms. We then wrote a program which formed all pairs of such terms (ignoring order). We defined an 'association measure' (A) which measured the difference between the observed co-occurrences of term-pairs (they could co-occur in any location in the definition and in either order), and the co-occurrences expected from chance alone. Tables 4 and 5 show the top and bottom of a list of all pairs formed from the low entropy terms in the previous experiment. The first 1120 terms were chosen, that is, those having an entropy of 2.0 napiers or less. The pair list was then sorted by this association measure, A.
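The text does not give the formula for A, so the sketch below should be read as a stand-in rather than the authors' definition. It uses the phi coefficient, one common way to normalize the difference between the observed co-occurrence probability and the probability expected if the two terms were independent. All names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def association(n_i, n_j, n_ij, n_total):
    """Phi coefficient: (observed - expected) co-occurrence, normalized to [-1, 1]."""
    p_i, p_j, p_ij = n_i / n_total, n_j / n_total, n_ij / n_total
    denom = math.sqrt(p_i * (1 - p_i) * p_j * (1 - p_j))
    if denom == 0:
        return 0.0  # a term that occurs in no disease or in every disease
    return (p_ij - p_i * p_j) / denom


def rank_pairs(term_to_diseases, n_total):
    """Score every unordered term pair and sort by decreasing association."""
    scored = []
    for a, b in combinations(sorted(term_to_diseases), 2):
        da, db = term_to_diseases[a], term_to_diseases[b]
        scored.append((association(len(da), len(db), len(da & db), n_total), a, b))
    return sorted(scored, reverse=True)


# Invented toy data: each term maps to the ids of the diseases that mention it.
toy_terms = {
    "vena": {1, 2, 3},
    "cava": {1, 2, 3},
    "bone": {4, 5, 6, 7},
}
ranked = rank_pairs(toy_terms, n_total=10)
for score, a, b in ranked:
    print(f"{score:+.3f} {a}-{b}")
```

On real data `n_total` would be the 3262 disease definitions. Pairs that always appear together score near +1, while pairs that never share a definition score below zero, which mirrors the top and bottom of the sorted pair list.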

Word pairs which are found to be highly associated appear to be so for two reasons. The first, which is trivial, is that some word pairs are semantically one word despite being lexically two. Common examples would be 'White House' and 'Hong Kong'; medical examples are 'vital capacity', 'axis deviation', and 'slit lamp'. These could have been avoided algorithmically by not taking adjacent words in forming the term-pairs, without any significant overall effect. The second reason for high-frequency word co-occurrence is that both words are causally related through underlying physiological mechanisms. It is these which had the greatest interest for us, and the measure A may be viewed as a measure of the non-independence of the symptoms or signs themselves.

The term pairs which are negatively associated have this property for the same reason. If the two terms are used typically in the descriptions of different diseases, they are less likely to co-occur than by chance. (In a baseball story on the sports page, we would not find 'pass', 'punt', or 'tackle'.) These negatively associated pairs may have value in diagnostic programs for the recognition of two or more diseases in a given patient, a problem not satisfactorily dealt with by even the most sophisticated of current programs.

Finally, an extension of the entropy concept permits one to generate (algorithmically) the vocabularies used by the medical specialties (which correspond to the disease categories represented in CMIT). This is done by assigning terms which occur predominantly in one category to a single vocabulary and then sorting by entropy. Tables 6 and 7 show the vocabularies used in dermatology and gastroenterology (as derived from CMIT). These vocabularies, it will be noted, can be used as 'hit lists' for the purpose of recognizing the content of medical texts.
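A minimal sketch of this vocabulary-building step is given below. It assumes each word already has a size-adjusted distribution over the categories, and it reads "predominantly" as "at least half of the distribution falls in one category"; that 0.5 threshold, the function names, and the three-category toy data are illustrative assumptions, not values from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy with natural logarithms."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def specialty_vocabularies(word_dists, threshold=0.5):
    """Assign each word to its dominant category, then sort each list by entropy."""
    vocab = {}
    for word, probs in word_dists.items():
        top = max(range(len(probs)), key=probs.__getitem__)
        if probs[top] >= threshold:  # hypothetical cutoff for 'predominantly'
            vocab.setdefault(top, []).append((entropy(probs), word))
    # Lowest entropy (most category-specific) first.
    return {cat: [w for _, w in sorted(items)] for cat, items in vocab.items()}


# Invented distributions over three categories (0: general, 1: skin, 2: heart).
toy_dists = {
    "papules": [0.05, 0.90, 0.05],
    "pruritus": [0.20, 0.60, 0.20],
    "murmur": [0.00, 0.00, 1.00],
    "of": [0.34, 0.33, 0.33],
}
vocab = specialty_vocabularies(toy_dists)
print(vocab)
```

Words such as 'of' that are spread across categories fall below the cutoff and are left out of every vocabulary, so the resulting lists contain only category-specific terms.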

In summary, we see the ability to differentiate medical terms from common words by context, and the ability to relate the medical words by meaning, as two of the first steps toward text processing algorithms that preserve and can manipulate the semantic content of words in medical texts.


AT  FEVER, MOUNTAIN; FEVER, MOUNTAIN TICK
ET  VIRUS TRANSMITTED BY TICK DERMACENTOR ANDERSONI
SM  CHILLS; HEADACHE; PHOTOPHOBIA; BACKACHE; PAIN IN EYE; MYALGIA; ANOREXIA; NAUSEA; VOMITING; PROSTRATION
SG  SEASONAL, MARCH TO JULY, IN WESTERN UNITED STATES; INCUBATION PERIOD 4-6 DAYS; ONSET ABRUPT; POSSIBLY SLIGHT ERYTHEMA; SUSTAINED FEVER, 102-104 F OR HIGHER SIGNIFICANT; PULSE RATE INCREASED COURSE: IN PREVENTION, REMOVAL OF TICK FROM SKIN; APPLICATIONS TO SKIN OF TURPENTINE, IODINE, ACETONE; REMOVAL OF TICK BY INSERTION OF NEEDLE BETWEEN MOUTH PARTS; ASPIRIN FOR PAIN; ANTIBIOTIC TREATMENT INEFFECTIVE
CM  ENCEPHALITIS, MENINGITIS ESPECIALLY IN CHILDREN
LB  WBC DECREASED; MONOCYTOSIS; COMPLEMENT-FIXATION TEST POSITIVE; INJECTION OF SERUM OR CSF KILLING SUCKLING MICE; NEUTRALIZATION OF VIRUS WITH IMMUNE SERUM RESULTING IN SURVIVAL

Figure 1. Typical disease 'definition' taken from CMIT

possibly       489   severe         368   years
usually        443   chronic        340   weakness
on             434   treatment      337   inflammation
infection      431   later          336   age
features       426   absent         338   within
at             421   asymptomatic   331   lower
associated     415   rarely         327   necrosis
increased      414   hereditary     325   positive
blood          398   abdominal      316   whe
fever           75   involvement    381   congenital
loss           369   especially     361   enlargement

Table 1. The highest-frequency words used in CMIT, together with the number of disease definitions in which each word occurs at least once

[Table 2 body: the numeric columns are illegible in the scan. Legible entries in the word column include: bronchoscopy, cataract, urethral, urethra, cervix, intraocular, pyuria, ventricle, adenoma, splenectomy, valve, pneumothorax.]

Table 2. The lowest 'entropy' words in CMIT, in order of increasing 'entropy'. The entropy is given in the first column; the entries in the next 11 columns are the percent of occurrences in the 11 disease categories (body as a whole, skin, musculo-skeletal, respiratory, cardiovascular, hemic and lymphatic, GI, GU, endocrine, nervous, organs of special sense)


[Table 3 body: the numeric columns are illegible in the scan. Entropy values in the first column run from about 2.36 to 2.40. Legible entries in the word column include: common, marked, absence, simple, severe, and, cases, usually, general, as, of, from, after, early, by.]

Table 3. The highest 'entropy' words in CMIT. Note that these are common English words

[Table 4 body: the numeric columns are illegible in the scan. Association values (A) in the first column run from about 0.95 down to about 0.87. Legible word pairs include: vena-cava, saline-catharsis, nasal-rhinoscopy.]

Table 4. The top of the word-pair list in decreasing order of association value (A)


[Table 5 body: the numeric columns are illegible in the scan. The two legible association values (A) are -0.0974 and -0.0891. Legible word pairs include: bone-vaginal, bone-qrs, bone-wave, bone-atrial, bone-ovary, bone-artery, bone-leads, bone-exertional, bone-pupil, bone-gallbladder, bone-dysarthria, bone-abortion, bone-urethra, dyspnea-nerves, dyspnea-scalp, dyspnea-urethral, dyspnea-gait, dyspnea-cystoscopy, dyspnea-penis.]

Table 5. The bottom of the word-pair list, showing the negatively correlating words


papules          76
acanthosis       44
hyperkeratosis   56
macules          11
involution       22
sebaceous
horny            21
keratin
stratum          21
pruritus        185
soles            40
itches
crust            16
keratosis        17
circumscribed    65
crusting         27
leaving           4
plaques          57
sunlight         25
verrucous        14
nail
scaly            22
ridges           25
hyperkeratotic   17

Table 6. A word list generated algorithmically which constitutes a dermatological vocabulary. The disease category 'skin' is represented by the third column

Table 7. A word list generated algorithmically which constitutes a vocabulary of gastroenterology. The eighth column represents the disease category 'digestive system'
