Classification in Data Mining

Classification is a data mining technique that assigns categories to a collection of data in order to support more accurate predictions and analysis. It is sometimes loosely referred to as a decision tree, although decision trees are strictly one family of classification methods. Classification is one of several methods intended to make the analysis of very large datasets effective.

Why Classification?

Very large databases are becoming the norm in today's world of "big data." Imagine a database holding multiple terabytes of data, where one terabyte is one trillion bytes.

Facebook alone took in 600 terabytes of new data every single day (as of 2014, the last time it reported these figures). The primary challenge of big data is how to make sense of it.

And sheer volume is not the only problem: big data also tends to be diverse, unstructured, and fast-changing. Consider audio and video data, social media posts, 3D data, or geospatial data. This kind of data is not easily categorized or organized.

To meet this challenge, a range of automatic methods for extracting useful information has been developed, classification among them.

How Classification Works

At the risk of moving too far into tech-speak, let's discuss how classification works. The goal is to create a set of classification rules that will answer a question, make a decision, or predict behavior. To start, a set of training data is assembled that contains a certain set of attributes along with the known outcome for each record.

The job of the classification algorithm is to discover how that set of attributes leads to its outcome.

Scenario: Suppose a credit card company is trying to determine which prospects should receive a credit card offer.

This might be its set of training data:

Training Data
Name        Age   Gender   Annual Income   Credit Card Offer
John Doe    25    M        $39,500         No
Jane Doe    56    F        $125,000        Yes

The "predictor" columns Age, Gender, and Annual Income determine the value of the "prediction attribute," Credit Card Offer. In a training set, the prediction attribute is known. The classification algorithm then tries to work out how the value of the prediction attribute was reached: what relationships exist between the predictors and the decision? It develops a set of prediction rules, usually in the form of IF/THEN statements, for example:

IF (Age > 18 AND Age < 75) AND Annual Income > 40,000 THEN Credit Card Offer = Yes
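
To make the rule concrete, here is a minimal Python sketch of it, checked against the two training records above. The `Prospect` class and the `credit_card_offer` function are names made up for this illustration. The sketch reads the age condition as a range (older than 18 and younger than 75), which is an assumption about the intent, since joining the two age tests with OR would match every possible age.

```python
from dataclasses import dataclass


@dataclass
class Prospect:
    """One record from the example tables (hypothetical helper type)."""
    name: str
    age: int
    gender: str
    annual_income: int  # US dollars


def credit_card_offer(p: Prospect) -> bool:
    """Hand-written version of the example rule, with age read as a range."""
    return 18 < p.age < 75 and p.annual_income > 40_000


# The two training records, paired with the known outcome from the table.
training = [
    (Prospect("John Doe", 25, "M", 39_500), False),
    (Prospect("Jane Doe", 56, "F", 125_000), True),
]

for prospect, known_outcome in training:
    print(prospect.name, credit_card_offer(prospect), known_outcome)
```

For both records the rule's answer matches the known outcome. Gender is carried in the record but not used, because the example rule does not mention it.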

Obviously, this is a simple example, and the algorithm would need a far larger data sample than the two records shown here. Further, the prediction rules are likely to be far more complex, including sub-rules that capture attribute details.
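
As a rough illustration of what "working out a rule from the data" can mean, the sketch below searches for the single income cut-off that misclassifies the fewest training records. This is a deliberately tiny one-attribute learner (sometimes called a decision stump), not the method any particular product uses, and `learn_income_threshold` is a name invented here.

```python
def learn_income_threshold(records):
    """Return the income cut-off that misclassifies the fewest records.

    records: list of (annual_income, offer_made) pairs.
    The learned rule has the form: offer = annual_income > threshold.
    """
    incomes = sorted({income for income, _ in records})
    # Candidate cut-offs: below the lowest income, midway between each pair
    # of neighbouring incomes, and at the highest income (nobody qualifies).
    midpoints = [(a + b) / 2 for a, b in zip(incomes, incomes[1:])]
    candidates = [incomes[0] - 1] + midpoints + [incomes[-1]]

    def errors(threshold):
        return sum((income > threshold) != label for income, label in records)

    return min(candidates, key=errors)


# Income and outcome for the two training records in the table above.
training = [(39_500, False), (125_000, True)]
print(learn_income_threshold(training))
```

With only two records, the learner settles on the midpoint between the two incomes, which is nowhere near the 40,000 used in the example rule. That gap is the point made above: a rule learned from so little data is essentially arbitrary.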

Next, the algorithm is given a "prediction set" of data to analyze, but this set lacks the prediction attribute (or decision):

Prediction Data
Name          Age   Gender   Annual Income   Credit Card Offer
Jack Frost    42    M        $88,000
Mary Murray   16    F        $0

This prediction data helps estimate the accuracy of the prediction rules, and the rules are then tweaked until the developer considers the predictions effective and useful.
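
The check-and-tweak loop can be sketched as follows. The code applies the example rule to the two prediction records and then scores the result against outcomes supplied by a reviewer. Those reviewer outcomes are hypothetical and exist only to show how an accuracy figure is computed; `offer_rule` and its `min_income` parameter are likewise names assumed for this sketch, with age read as a range between 18 and 75.

```python
def offer_rule(age, annual_income, min_income=40_000):
    """Example rule with the income cut-off exposed so it can be tweaked."""
    return 18 < age < 75 and annual_income > min_income


# The prediction set from the table above: no outcome column.
prediction_set = [
    ("Jack Frost", 42, 88_000),
    ("Mary Murray", 16, 0),
]

predictions = {
    name: offer_rule(age, income) for name, age, income in prediction_set
}
print(predictions)

# HYPOTHETICAL outcomes a developer might assign after reviewing each record.
reviewed = {"Jack Frost": True, "Mary Murray": False}

accuracy = sum(predictions[n] == reviewed[n] for n in reviewed) / len(reviewed)
print(accuracy)
```

If the accuracy were too low, the developer could rerun the same check with a different `min_income` (or a different rule altogether) and keep whichever version agrees best with the reviewed outcomes.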

Day-to-Day Examples of Classification

Classification, along with other data mining techniques, is behind much of our day-to-day experience as consumers.

Weather forecasts might use classification to report whether the day will be rainy, sunny, or cloudy. The medical profession might analyze health conditions to predict medical outcomes. One type of classification method, Naive Bayes, uses conditional probability to categorize spam emails. From fraud detection to product offers, classification works behind the scenes every day, analyzing data and producing predictions.
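
To give a feel for the spam example, here is a minimal sketch of a Naive Bayes text classifier using only the Python standard library. The four toy messages are made up for this illustration, the function names `train` and `classify` are arbitrary, and real spam filters are considerably more elaborate. The sketch uses word counts with add-one smoothing and compares log probabilities for the two labels.

```python
import math
from collections import Counter


def train(messages):
    """messages: list of (text, label) pairs with labels 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the higher Naive Bayes score for the text."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(label_counts.values())
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        # log P(label) plus the sum of log P(word | label), add-one smoothed.
        score = math.log(label_counts[label] / total_messages)
        for word in text.lower().split():
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up toy training messages, for illustration only.
toy_messages = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting moved to monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train(toy_messages)
print(classify("free prize", *model))
print(classify("monday meeting", *model))
```

The "naive" part is the assumption that each word contributes independently to the score, which is what lets the per-word probabilities simply be multiplied (added, in log form).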