The rapid spread of information technology and internet applications into many areas of social life, economic management, science and technology, and so on has created many very large databases.
4.1 INTRODUCTION
This chapter presents several applications, such as:
... Minh [82]
... the E-Coli bacillus [81]
... of the Onchocerca parasite [14], [81]
... 5,000,000 residents, recording basic information on households and on the residents registered as permanent or temporary members of each household, changes to households and residents such as increases and decreases (births, deaths, departures, arrivals), and other changes of status. Some attributes of the resident data table are presented in Table 4.1.
Value domains of some attributes:
a) Field of training
'Chemicals', 'Construction materials', 'Paper, wood, matches', 'Ceramics', 'Food and foodstuffs', 'Textiles, leather, garments', 'Natural sciences', 'Mathematics', 'Physics',
b) Occupation
Dom(NGHE_ID)={'No occupation', 'Leaders and managers of Party organs, mass organizations and their subdivisions', 'Scientific research staff', 'Education and training staff'}
Table 4.1 Some attributes of the resident data table
c) Degree
... college, university', 'College, university', 'Postgraduate', 'Level not determined', 'None of the levels listed above'}.
Attribute name   Meaning                     Attribute name   Meaning
NK_ID            Resident code               LOP_HOC_ID       Educational level code
NG_PSINH         Date of occurrence          TT_LAODONG       Employment status
TUOI_ID          Age code                    NGANH_DTAO       Field of training
DAN_TOC_ID       Ethnic group                NGHE_ID          Occupation
TON_GIAO_I       Religion                    CAP_QLY_ID       Management level
PX_CU_TRU        Ward/commune of residence   CHUC_NANG        Function
LY_DO_CU_T       Reason for residence        TP_KINHTE        Economic sector
d) Employment
Dom(TT_LAODONG)={'Not yet employed', 'Stable employment', 'Temporary employment ...
4.2.2 Finding association rules
The algorithms for finding frequent itemsets and association rules are applied to the problem of finding association rules in the resident database. The matrix representing the data mining context has residents as rows and indicators of the form <a,v> as columns, where a is an attribute and v is a value of that attribute. Some typical indicators are:
<NGANH_DTAO, 'No field of training'>
<NGANH_DTAO, 'Surveying, exploration, extraction'>
<BANG_CAP_I, 'College, university'>; <BANG_CAP_I, 'Postgraduate'>
<BANG_CAP_I, 'Level not determined'>
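The context matrix described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_context` and the three sample records are hypothetical, and only the attribute names NGANH_DTAO and BANG_CAP_I come from the indicators listed above.

```python
from typing import Dict, List, Tuple

Indicator = Tuple[str, str]  # an <a, v> pair: attribute name, attribute value


def build_context(records: List[Dict[str, str]]) -> Tuple[List[Indicator], List[List[int]]]:
    """Return the sorted indicator list and one binary row per record."""
    # Every distinct (attribute, value) pair becomes one column of the matrix.
    indicators = sorted({(a, v) for rec in records for a, v in rec.items()})
    column = {ind: j for j, ind in enumerate(indicators)}
    matrix = []
    for rec in records:
        row = [0] * len(indicators)
        for a, v in rec.items():
            row[column[(a, v)]] = 1  # the resident has value v for attribute a
        matrix.append(row)
    return indicators, matrix


# Hypothetical sample records, invented for illustration only.
records = [
    {"NGANH_DTAO": "No field of training", "BANG_CAP_I": "Level not determined"},
    {"NGANH_DTAO": "Surveying, exploration, extraction", "BANG_CAP_I": "College, university"},
    {"NGANH_DTAO": "No field of training", "BANG_CAP_I": "College, university"},
]
indicators, matrix = build_context(records)
for row in matrix:
    print(row)
```

Each row is then an itemset over the indicator columns, which is the input form that frequent-itemset algorithms usually expect.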
... in Appendix A1, which includes the following rules:
... state-owned) (only the levels of deputy director or equivalent and above) => College, university, CF=0.59%. Rule R68 shows the educational level of the leaders of state agencies and state-owned production and business enterprises ...
CF=1.0
4.2.3 Finding classification rules
The algorithms for finding classification rules based on association rules are applied, where the left-hand side of a classification rule consists of the attributes Field of training, Occupation and age group, and the right-hand side is the employment status. The classification rules found are listed in Appendix A2. Some classification rules worth noting are:
R5: Field of training = Construction => Not yet employed CF=0.85
R8: Field of training = Education, pedagogy => Not yet employed CF=0.84
R16: Field of training = Biology => Not yet employed CF=0.85
Figure 4.1: Interface of the software for finding classification rules in the database. The screenshot lists rules with their support and confidence, for example <GIOI_TINH_, Nam> -> <TT_LAODONG, chua co viec lam> (male, not yet employed) with supp = 0.46 and conf = 0.82, and <GIOI_TINH_, Nu> -> <TT_LAODONG, chua co viec lam> (female, not yet employed) with supp = 0.35 and conf = 0.78.
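As a rough illustration of how the support and confidence of such a classification rule could be computed, the sketch below counts matching records. It assumes that CF denotes the usual rule confidence, which the text does not state explicitly; the function name, the attribute name GIOI_TINH and the five sample records are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def support_confidence(records: List[Record], lhs: Record, rhs: Record) -> Tuple[float, float]:
    """Return (support, confidence) of the rule lhs -> rhs over the records."""

    def matches(record: Record, pattern: Record) -> bool:
        return all(record.get(a) == v for a, v in pattern.items())

    n_lhs = sum(1 for r in records if matches(r, lhs))
    n_both = sum(1 for r in records if matches(r, lhs) and matches(r, rhs))
    support = n_both / len(records) if records else 0.0
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence


# Hypothetical sample data, invented for illustration only.
records = [
    {"GIOI_TINH": "Nam", "TT_LAODONG": "chua co viec lam"},
    {"GIOI_TINH": "Nam", "TT_LAODONG": "chua co viec lam"},
    {"GIOI_TINH": "Nam", "TT_LAODONG": "viec lam on dinh"},
    {"GIOI_TINH": "Nu", "TT_LAODONG": "chua co viec lam"},
    {"GIOI_TINH": "Nu", "TT_LAODONG": "viec lam on dinh"},
]
supp, conf = support_confidence(
    records, {"GIOI_TINH": "Nam"}, {"TT_LAODONG": "chua co viec lam"}
)
print(f"supp = {supp:.2f}, conf = {conf:.2f}")
```

Restricting the right-hand side to the class attribute (here TT_LAODONG) is what turns an association rule into a classification rule in this setting.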
4.3 FINDING FREQUENT REPEATED SEGMENTS IN THE DATA SET OF DNA SEQUENCES OF THE E-COLI BACILLUS [81]
4.3.1 The biological sequence data set of E-Coli Promoter
... signal marking the start of a gene (Appendix B1). The data examined in this thesis are ...
... http://www.ics.uci.edu. The E-Coli Promoter sequence data are presented in Appendix ...
... Promoter (the positive group), and the remaining 53 sequences do not belong to the E-Coli Promoter group (the negative group).
4.3.2 Finding frequent repeated segments in the positive group
Applying the algorithms presented in Chapter 2 to the data set of 53 biological sequences in the E-Coli Promoter positive group reveals the frequent repeated segments. Some repeated segments worth noting are: ATTTTT CAAAAA, TTTTTT TTTGTT, GTATAA, ATAATGCGCC, TATAATGCGC.
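The idea of a frequent repeated segment can be illustrated with a simple fixed-length count. This is a minimal sketch and not the algorithm of Chapter 2: it enumerates every substring of length k and keeps those that occur in at least `min_support` sequences. The function name, the parameters and the three toy sequences are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def frequent_segments(sequences: List[str], k: int, min_support: int) -> Dict[str, int]:
    """Map each length-k segment to the number of sequences containing it,
    keeping only segments found in at least min_support sequences."""
    support = Counter()
    for seq in sequences:
        # A set ensures a segment is counted once per sequence.
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(seen)
    return {seg: n for seg, n in support.items() if n >= min_support}


# Toy sequences, invented for illustration only.
toy = ["TATAATGC", "GGTATAAT", "CCCCGGGG"]
frequent = frequent_segments(toy, k=6, min_support=2)
print(frequent)
```

Enumerating all substrings this way costs time proportional to the total sequence length times k, so it is practical only for short segments; an index such as a suffix tree avoids recomputing shared prefixes.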
4.3.3 Finding combinations of frequent repeated segments with classification ability
Appendix B5 shows the possibility of frequent repeated segments occurring together ...
Figure 4.2 Occurrence map of the combination {ATA, ATT}
4.3.4 Finding ordered combinations of frequent repeated segments with classification ability
The algorithms in Chapter 2 are used to find ordered combinations of frequent repeated segments that can classify the E-Coli Promoter positive group. Some such ordered combinations are:
... high
*** E-COLI PROMOTER ***
DISCOVERING 03 CONSECUTIVE PATTERNS
2 sequences contain TAT->AAT->GCGC
2 sequences in the positive class
0 sequences in the negative class
Positive classification error 1.00
Id# 15, Id# 38
Figure 4.3 Occurrence map of the ordered combination of frequent repeated segments TAT -> AAT -> GCGC
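Checking an ordered combination such as TAT -> AAT -> GCGC against the two classes can be sketched as below. The sketch assumes that the segments must occur left to right without overlapping, which may differ from the definition used in Chapter 2; the function names and the toy sequences are hypothetical.

```python
from typing import List, Sequence, Tuple


def contains_in_order(sequence: str, segments: Sequence[str]) -> bool:
    """True if the segments occur in the given order, without overlapping."""
    pos = 0
    for seg in segments:
        hit = sequence.find(seg, pos)  # leftmost occurrence at or after pos
        if hit < 0:
            return False
        pos = hit + len(seg)
    return True


def class_counts(positive: List[str], negative: List[str],
                 segments: Sequence[str]) -> Tuple[int, int]:
    """Count the sequences of each class that contain the ordered combination."""
    pos_hits = sum(contains_in_order(s, segments) for s in positive)
    neg_hits = sum(contains_in_order(s, segments) for s in negative)
    return pos_hits, neg_hits


# Toy sequences, invented for illustration only.
positive = ["CCTATGGAATCGCGC", "TATAATGCGC", "GCGCAATTAT"]
negative = ["AATTATGCGC", "CCCCCC"]
counts = class_counts(positive, negative, ["TAT", "AAT", "GCGC"])
print(counts)
```

A combination that matches many positive sequences and few negative ones is a candidate classifier for the positive group. Taking the leftmost match at each step is enough to decide whether any ordered, non-overlapping placement exists.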
4.4 BUILDING A HIERARCHICAL STRUCTURE FOR DNA SEQUENCE DATA AND EXAMINING SIMILAR SEQUENCES OF THE ONCHOCERCA PARASITE [14], [81]
4.4.1 The Onchocerca parasite
One of the leading causes of blindness in humans is onchocerciasis, a disease caused by the Onchocerca parasite. This parasite is found in Africa, South America and Central America. The Onchocerca data were taken from the NCBI web site (http://www.ncbi.nlm.nih.gov). Appendix B3 contains the biological sequence data set of Onchocerca.
4.4.2 Building the hierarchical cluster structure of the Onchocerca data
The algorithms presented in Chapter 3 are used to build the hierarchical structure ...
... of the Onchocerca data is presented in Figure 4.4.
Codes such as OVU02740 and OVU02866 correspond to the accession codes U02740 and U02866 on the web site http://www.ncbi.nlm.nih.gov/entrez. For the code OVU02740, the detailed sequence information is as follows:
DNA linear INV
Onchocerca volvulus isolate Liberia10 tandem repeat sequence U02740
Metazoa; Nematoda; Chromadorea; Spirurida;
1 attttgcaaa attgattatt aacagatgac ctatgacata taatttcaaa
61 gtaccttcaa attgagtcct aagaaaaata ttcgactata tttttt
Figure 4.4 Hierarchical cluster structure of the Onchocerca data
Based on the information provided by NCBI, detailed information is established about ...
Details of the clusters [1 2], [1 2 4] and [1 2 5] are as follows:
[1] #Entries: 2, Level 2 (Internal), Sibling: 0, Size: 7322/53248
OVU02592; OVU02593
[1 2] #Entries: 9, Level 1 (Internal), Sibling: 0, Size: 32886/53248
OVU02594; OVU02731; OVU02732; OVU02733;
OVU02734: O. volvulus isolate Liberia04 tandem repeat sequence.
OVU02776: O. volvulus isolate Mali12 tandem repeat sequence.
OVU02737; OVU02738
[1 2 4] #Entries: 11, Level 0 (Leaf), Sibling: 0, Size: 40190/53248
OVU02740: O. volvulus isolate Liberia10 tandem repeat
OVU02766: O. volvulus isolate Mali02 tandem repeat sequence.
OVU02813: O. volvulus isolate Zaire05 tandem repeat sequence.
OVU02856: O. volvulus isolate Guatemala07 tandem repeat sequence.
OVU02872: O. volvulus isolate Guatemala23 tandem repeat sequence.
OVU02776: O. volvulus isolate Mali12 tandem repeat sequence.
OVU02852: O. volvulus isolate Guatemala03 tandem repeat sequence.
OVU02734: O. volvulus isolate Liberia04 tandem repeat sequence.
OVU02786: O. volvulus isolate Mali22 tandem repeat sequence.
OVU02806: O. volvulus isolate Liberia53 tandem repeat sequence.
OVU02771: O. volvulus isolate Mali07 tandem repeat sequence.
[1 2 5] #Entries: 14, Level 0 (Leaf), Sibling: 0, Size: 51146/53248
OVU02817: O. volvulus isolate Zaire09 tandem repeat sequence.
OVU02750: O. volvulus isolate Liberia20 tandem repeat sequence.
OVU02780: O. volvulus isolate Mali16 tandem repeat sequence.
OVU02784: O. volvulus isolate Mali20 tandem repeat sequence.
OVU02796: O. volvulus isolate Liberia43 tandem repeat sequence.
OVU02822: O. volvulus isolate Zaire14 tandem repeat sequence.
OVU02778: O. volvulus isolate Mali14 tandem repeat sequence.
OVU02823: O. volvulus isolate Zaire15 tandem repeat sequence.
OVU02831: O. volvulus isolate Brazil02 tandem repeat sequence.
OVU02855: O. volvulus isolate Guatemala06 tandem repeat sequence.
The sub-cluster [1 2 5] contains the sequences:
OVU02831: O. volvulus isolate Brazil02 tandem repeat sequence.
OVU02855: O. volvulus isolate Guatemala06 tandem repeat sequence.
which correspond to Onchocerca found in Central America and South America.
The parent cluster [1 2] corresponds to the sequences:
OVU02734: O. volvulus isolate Liberia04 tandem repeat sequence.
OVU02776: O. volvulus isolate Mali12 tandem repeat sequence.
which correspond to Onchocerca found in Africa. Thanks to the detailed information ...
... originated in Africa and then spread to Central America and South America. The clustering result ...
[Screenshot of the clustering software listing the sequences of a cluster, including OVU02731 (Liberia01), OVU02732 (Liberia02), OVU02733 (Liberia03), OVU02734 (Liberia04), OVU02786 (Mali22) and OVU02806 (Liberia53).]
4.4.3 Finding conserved gene regions across evolutionary generations
Appendix B8 presents the consensus sequences of the clusters [1 3 13], [1 3 14], [1 3 15] and [1 3 16]. Comparing the consensus sequences can reveal the gene regions conserved across evolutionary generations. Below is the first segment of the sequences ...
[1 3 13]: AGTTTTCAAAATTGCCTAA-TAACTGATGACCTA-G
[1 3 14]: AATTTTCAAAATTGCCTAA-TAACTGATGACCTATG
[1 3 15]: AGTTTTCAAAATTGC-TAA-TAACTGA---C-TATG
[1 3 16]: -GTTTTCAAAATTGCCTAA-TAA-TGATGACCTATG
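The column-by-column comparison of aligned consensus sequences can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the thesis: a column counts as conserved only when every row carries the same nucleotide, and any other column is written as a gap. The function name is hypothetical, and the run of gaps in the third row is an assumption about the alignment.

```python
from typing import List


def conserved_region(aligned: List[str], gap: str = "-") -> str:
    """Keep a column's symbol where all rows agree on a non-gap symbol,
    and write the gap symbol elsewhere."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    out = []
    for column in zip(*aligned):
        first = column[0]
        agree = first != gap and all(c == first for c in column)
        out.append(first if agree else gap)
    return "".join(out)


rows = [
    "AGTTTTCAAAATTGCCTAA-TAACTGATGACCTA-G",  # cluster [1 3 13]
    "AATTTTCAAAATTGCCTAA-TAACTGATGACCTATG",  # cluster [1 3 14]
    "AGTTTTCAAAATTGC-TAA-TAACTGA---C-TATG",  # cluster [1 3 15], gap run assumed
    "-GTTTTCAAAATTGCCTAA-TAA-TGATGACCTATG",  # cluster [1 3 16]
]
conserved = conserved_region(rows)
print(conserved)
```

Because the rows are already aligned, a single pass over the columns is enough; no pairwise alignment is recomputed here.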
Conserved region of the sequences in the clusters:
--TTTTCAAAATTGC-TAA-TAA-TGA---C-TA-G
4.4.4 Querying similar sequences [14]
Find the sequences similar to the sequence with code OVU02776 (NCBI) and ...
... accession OVU02776.
Figure 4.6 Result of the query for sequences similar to sequence OVU02776 (NCBI)
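A similarity query of this kind can be illustrated with a plain edit-distance scan. This is only a sketch under assumptions: the thesis answers the query through the hierarchical structure of Chapter 3, whereas the code below compares the query with every stored sequence. The distance measure, the `max_distance` threshold, the function names and the toy database are all hypothetical.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def similar_sequences(query: str, database: Dict[str, str],
                      max_distance: int) -> List[Tuple[int, str]]:
    """Return (distance, name) pairs within max_distance, closest first."""
    hits = ((edit_distance(query, seq), name) for name, seq in database.items())
    return sorted((d, name) for d, name in hits if d <= max_distance)


# Toy database, invented for illustration only.
database = {
    "seq_a": "ATTTTGCAAA",
    "seq_b": "GTTTTGCAAA",
    "seq_c": "CCCCCCCCCC",
}
result = similar_sequences("ATTTTGCAAA", database, max_distance=2)
print(result)
```

A linear scan like this costs one distance computation per stored sequence; a hierarchical index is typically used to prune whole clusters whose representative is already too far from the query.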
List of result sequences:
OVU02783 -ATTTTGCAAAATTGATTAT-TAACAGATGACCTATGA-CATATAATTTCAAAAAACGGGTACGT-ACCTTCAAACTGAGTCCTAAGAAAAATATTCGACTATATTTTTT-
OVU02812 -ATTTTGCAAAATTGATTAT-TAACAGATGACCTATGA-CATATAATTTCAAAAAACGGGTACGT-ACCTTCAAACTGAGTCCTAAGAAAAATATTCGACTATATTTTTT-
OVU02825 -ATTTTGCAAAATTGATTAT-TAACAGATGACCTATGA-CATATAATTTCAAAAAACGGGTACGT-ACCTTCAAACTGAGTCCTAAAAAAAGTTTTCGACTATATTTTTT-
OVU02828 -ATTTTGCAAAATTGATTAT-TAACAGATGACCTATGA-CATATAATTTCAAAAAACGGGTACGT-ACCTTCAAACTGAGTCCTAAAAAAAGTTTTCGACTATATTTTTT-
OVU02835 -ATTTTGCAAAATTGATTAT-TAACAGATGACCTATGA-CATATAATTTCAAAAAACGGGTACGT-ACCTTCAAACTGAGTCCTAAAAAAAGTTTTCGACTATATTTTTT-
OVU02840 -ATTTTGCAAAATTGATTAT-TAACAGATGACCTATGA-CATATAATTTCAAAAAACGGGTACGT-ACCTTCAAACTGAGTCCTAAAAAAAGTTTTCGACTATATTTTTT-
OVU02833 -GTTTTGCAAAATTGATTAT-TAACAGATGACCTATGA-CATATAATTTCAAAAAACGGGTACGT-ACCTTCAAACTGAGTCCTAAAAAAAGTTTTCGACTATATTTTTT-
4.5 FINDING FREQUENT WORD SEQUENCES TO CHARACTERIZE A CORPUS
4.5.1 Corpus of poems by the poet Xuân Diệu
A frequent word sequence is a group of several words standing next to one another, which may have no syntactic or semantic meaning. This part uses a corpus consisting of a collection of poems by the poet Xuân Diệu. The suffix tree developed in Chapter 2 is then used to find frequent word sequences in a single poem and in a corpus of many poems.
4.5.2 Using the extended suffix tree to detect frequent word sequences
Here the lines of the text are the units used to determine the frequency of ...
... if the number of lines containing the word sequence is greater than or equal to T. The following example finds the frequent word sequences in the poem "Gửi hương cho gió" by the poet Xuân Diệu:
Number of lines: 20
s1: Biết bao hoa đẹp trong rừng thẳm.
s2: Đem gửi hương cho gió phụ phàng
s3: Mất một đời thơm trong kẽ núi
s4: Không người du tử đến nhằm hang
s5: Hoa ngỡ đem hương gửi gió kiều
s6: Là truyền tin thắm gọi tình yêu
s7: Song le hoa đợi càng thêm tủi.
s8: Gió mặc hồn hương nhạt với chiều.
s9: Tản mác phương ngàn lạc gió câm
s10: Dưới rừng hương đẹp chẳng tri âm
s11: Trên rừng hoa đẹp rơi trên đá
s12: Lặng lẽ hoàng hôn phủ bước thầm
s13: Tình yêu muôn thuở vẫn là hương
s14: Biết mấy lòng thơm mở giữa đường.
s15: Đã mất tình yêu trong gió rủi
s16: Không người thấu rõ đến nguồn thương
s17: Thiên hạ vô tình nhận ước mơ
s18: Nhận rồi không hiểu mộng và thơ
s20: Uổng nhụy lòng tươi tặng khách hờ
... (2); hoa đẹp (2); đem (2); gửi (2); mất (2); thơm (2); núi (2); không người (2); đến (2); muôn (2); lòng (2); nhận (2).
The maximal frequent word sequences are: biết (2); gió (5); gửi (2); hoa đẹp (2); hương ...
Figure 4.7 shows the interface of the software for finding frequent word sequences in a text.
Figure 4.7: Frequent word sequences in the poem "Gửi hương cho gió" by Xuân Diệu
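The line-based notion of frequency used in this example can be sketched with a direct count of word n-grams. This is a simplified illustration and not the extended suffix tree of the thesis: it enumerates every run of up to `max_len` adjacent words and keeps those that appear in at least `min_lines` lines. The function name, the parameters and the choice of four lines from the poem are assumptions.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, List


def frequent_word_sequences(lines: List[str], min_lines: int,
                            max_len: int = 2) -> Dict[str, int]:
    """Map each run of adjacent words to the number of lines containing it,
    keeping runs found in at least min_lines lines."""
    support = defaultdict(set)
    for idx, line in enumerate(lines):
        # Normalise so that visually identical Vietnamese words compare equal.
        words = unicodedata.normalize("NFC", line).lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                support[tuple(words[i:i + n])].add(idx)
    return {" ".join(seq): len(ids)
            for seq, ids in support.items() if len(ids) >= min_lines}


# Four lines of the poem (s1, s2, s5, s11), without punctuation.
lines = [
    "Biết bao hoa đẹp trong rừng thẳm",
    "Đem gửi hương cho gió phụ phàng",
    "Hoa ngỡ đem hương gửi gió kiều",
    "Trên rừng hoa đẹp rơi trên đá",
]
result = frequent_word_sequences(lines, min_lines=2)
for sequence, count in sorted(result.items()):
    print(sequence, count)
```

Storing the set of line indices rather than a raw counter means a word repeated inside one line still contributes a single line to its frequency, which matches the line-based threshold T described above.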
... by the writer Nguyễn Công Hoan
4.5.3 Finding frequent word sequences in a corpus of multiple texts
Frequent word sequences can also be extracted from a corpus of multiple texts (sequences occurring in at least two texts). Given $\omega \in O$, let $Fr(\omega)$ be the set of frequent word sequences of the text $\omega$. Define $Fr(O) = \bigcup_{O_k \in O} Fr(O_k)$; $Fr(O)$ is the set of frequent word sequences that characterize the texts.
Let $n = |Fr(O)|$. Each text $D_i \in D$ is characterized by a binary vector $v_i$ with $n$ components, in which the $j$-th component of $v_i$ has value 1 if the word sequence $t_j$ is ...