
1-8:FFTFFTTF

9-14: BBDBCB

 

15. Give two examples of business applications for classification.

 

Automatic message classification and spam filtering.

Malicious emails and links. Detecting phishing attacks is critical for IT departments in every organization, especially after the Petya virus spread through corporate infrastructures via email attachments. Many public datasets now provide labeled records of malware samples and malicious URLs that can be used directly to build classification models to protect your organization.

 

Fraud detection. Fraudulent behavior is highly domain-specific, so detection mostly relies on the private datasets organizations hold. For example, banks with labeled fraud cases in their data use supervised fraud-detection techniques to block potentially fraudulent money transactions, using variables such as transaction time, location, and amount.

 

16. Explain the difference between structured and unstructured data

 

Structured data is row-oriented data stored in a database that can be expressed logically as a two-dimensional table. Unstructured data is data that cannot conveniently be represented by such a two-dimensional logical table: office documents of every format, plain text, images, XML, HTML, assorted reports, and audio/video content, among others.

17. What does over-fitting mean in Machine Learning?

 

Overfitting: the fitted model predicts the training set almost perfectly but predicts poorly on new test data. Root causes include too many features, overly strong model assumptions, too many parameters, too little training data, and too much noise: the model memorizes the training data, noise included, without retaining the ability to generalize.

Solutions: (1) reduce the number of features; (2) regularize, shrinking the parameter values.
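As a sketch of remedy (2), ridge regression adds an L2 penalty that shrinks the fitted coefficients. The data, degree, and alpha below are illustrative, not from the question:

```python
import numpy as np

# Illustrative data: 10 noisy samples of a sine wave, fit with a
# degree-9 polynomial (enough parameters to interpolate the noise).
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 10)
X = np.vander(x, 10)  # degree-9 polynomial features

# Plain least squares: interpolates the noisy points (overfits).
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge: closed form w = (X'X + alpha*I)^(-1) X'y shrinks the weights.
alpha = 1e-3
w_reg = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# Regularization trades a little training error for much smaller
# coefficients, which is what tames the wild oscillations of an overfit.
print(np.linalg.norm(w_ols), np.linalg.norm(w_reg))
```

The ridge weights are guaranteed to have a smaller norm than the least-squares weights, at the cost of a slightly larger training error.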

 

18. Why does it become difficult to use Exact Bayes for classification with many features?

 

Exact Bayes requires estimating Pr(x1, ..., xn | class) for every possible combination of feature values. As the number of features grows, the number of combinations grows exponentially, so most combinations appear rarely or never in the training data and their probabilities cannot be estimated reliably. Naive Bayes sidesteps this by assuming the features are conditionally independent given the class.
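The difficulty can be made concrete by counting the probabilities each approach must estimate. A minimal sketch assuming binary features; the function names are illustrative:

```python
# Exact Bayes needs one probability per joint combination of feature
# values, per class; Naive Bayes needs only one per feature value.
def exact_cells(n_features, n_values=2):
    return n_values ** n_features  # joint combinations per class

def naive_cells(n_features, n_values=2):
    return n_features * n_values   # marginal estimates per class

for n in (5, 20, 50):
    print(n, exact_cells(n), naive_cells(n))
```

With 20 binary features, exact Bayes already needs over a million cells per class, far more than any realistic training set can populate; Naive Bayes needs 40.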

 

19. What are the ‘Training set’ and ‘Test set’?

 

Training Dataset: the data set used to fit the model's internal parameters. The classifier adjusts itself directly on the training set to obtain better classification results.

Test Dataset: the data set used to evaluate the generalization ability of the model. After the validation set has been used to pick hyperparameters and the training set to fit parameters, a data set the model has never seen shows whether the model really works.
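A minimal sketch of such a split: shuffle, then hold out 20% of the rows as the test set. The ratio, seed, and toy data are illustrative, assuming NumPy:

```python
import numpy as np

# Shuffle row indices, then cut: first 20% become the test set,
# the rest the training set.
def train_test_split(X, y, test_ratio=0.2, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_ratio)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X = np.arange(100).reshape(50, 2)   # 50 rows of toy features
y = np.arange(50)                   # toy labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y)
print(len(X_tr), len(X_te))  # 40 10
```

The classifier is fit only on (X_tr, y_tr); (X_te, y_te) is touched once, at the end, to estimate generalization.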


 

a)

Pr(Drink Preference = Wine | click on Tinder advertisement = Yes) = 412/(398+412+175+209+78) = 412/1272 = 0.32

Pr(FB Friends > 300 = No | click on Tinder advertisement = Yes)

=(1070/1270) = 0.84

b)

P(HIGH| click on Tinder advertisement=Yes) = 120/(1270) = 0.09

P(HIGH| click on Tinder advertisement=No) = 1200/(3730) = 0.32

 

P(BACHE | click on Tinder advertisement=Yes ) = 900/(1270) = 0.71

P(BACHE | click on Tinder advertisement=No ) = 400/(3730) = 0.11

 

P(MAS | click on Tinder advertisement=Yes ) = 180/(1270) = 0.14

P(MAS | click on Tinder advertisement=No ) = 1200/(3730) = 0.32

 

P(OTHER | click on Tinder advertisement=Yes ) = 70/(1270) = 0.06

P(OTHER | click on Tinder advertisement=No ) = 930/(3730) = 0.25

Information gain

Entropy of Click within each education branch (using the click / non-click counts per level):

H(Click | HIGH) = -[(120/1320)*log2(120/1320)]-[(1200/1320)*log2(1200/1320)] = 0.4395

H(Click | BACHE) = -[(900/1300)*log2(900/1300)]-[(400/1300)*log2(400/1300)] = 0.8905

H(Click | MAS) = -[(180/1380)*log2(180/1380)]-[(1200/1380)*log2(1200/1380)] = 0.5586

H(Click | OTHER) = -[(70/1000)*log2(70/1000)]-[(930/1000)*log2(930/1000)] = 0.3659

Branch weights:

P(HIGH) = 1320/5000 = 0.264

P(BACHE) = 1300/5000 = 0.26

P(MAS) = 1380/5000 = 0.276

P(OTHER) = 1000/5000 = 0.2

Weighted H(Click | Edu) = 0.264*0.4395 + 0.26*0.8905 + 0.276*0.5586 + 0.2*0.3659 = 0.5749

Entropy before the split:

P(Click = Yes) = 1270/5000 = 0.254

P(Click = No) = 3730/5000 = 0.746

H(Click) = -[(0.254)*log2(0.254)]-[(0.746)*log2(0.746)] = 0.8176

Info Gain = 0.8176 – 0.5749 = 0.2427
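The entropy terms and the information gain can be recomputed directly from the raw counts (clicks / non-clicks per education level, 5000 records in total):

```python
from math import log2

# Clicks / non-clicks per education level, from the worked answer.
counts = {
    "HIGH":  (120, 1200),
    "BACHE": (900, 400),
    "MAS":   (180, 1200),
    "OTHER": (70, 930),
}
total = sum(y + n for y, n in counts.values())  # 5000

def entropy(p):
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

# Entropy of Click before the split: P(Yes) = 1270/5000.
p_yes = sum(y for y, _ in counts.values()) / total
h_before = entropy(p_yes)

# Weighted entropy of Click after splitting on Education.
h_after = sum((y + n) / total * entropy(y / (y + n))
              for y, n in counts.values())

gain = h_before - h_after
print(round(h_before, 4), round(h_after, 4), round(gain, 4))
# 0.8176 0.5749 0.2426
```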

 


P(HIGH| Click = Yes) = 120/1270

P(BACHE| Click = Yes) = 900/1270

P(MAS| Click = Yes) = 180/1270

P(HIGH| Click = No) = 1200/3730

P(BACHE| Click = No) = 400/3730

P(MAS| Click = No) = 1200/3730

 

P (Male| Click = Yes) = 880/1270

P (Male| Click = No) = 2020/3730

 

P(FB>300| Click = Yes) = 200/1270

P(FB<300| Click = Yes) = 1070/1270

 

P (Female| Click = No) = 1710/3730

P (Female| Click = Yes) = 390/1270

 

P(FB>300| Click = No) = 3500/3730

P(FB<300| Click = No) = 230/3730

 

Incident 1:

Pr[E|Click = Yes] = (120/1270*880/1270*200/1270*0.25)/Pr[E] = 0.0026/Pr[E]

Pr[E|Click = No] = (1200/3730*2020/3730*3500/3730*0.75)/Pr[E] =0.1226/Pr[E]

Pr[Yes|E] = 0.0026/(0.0026 +0.1226) = 0.0208

Pr[No|E] = 0.1226/(0.0026 +0.1226) = 0.9792

 

Incident 2:


Pr[E|Click = Yes] = (900/1270*880/1270*1070/1270*0.25)/Pr[E] = 0.1034/Pr[E]

Pr[E|Click = No] = (400/3730*2020/3730*230/3730*0.75)/Pr[E] =0.0027/Pr[E]

Pr[Yes|E] = 0.1034/(0.1034 +0.0027) = 0.9746

Pr[No|E] = 0.0027/(0.1034 +0.0027) = 0.0254

 

Incident 3:

Pr[E|Click = Yes] = (180/1270*390/1270*200/1270*0.25)/Pr[E] = 0.0017/Pr[E]

Pr[E|Click = No] = (1200/3730*1710/3730*3500/3730*0.75)/Pr[E] =0.1038/Pr[E]

Pr[Yes|E] = 0.0017/(0.0017 + 0.1038) = 0.0161

Pr[No|E] = 0.1038/(0.0017 + 0.1038) = 0.9839
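As a check, Incident 1's posterior can be recomputed from the conditional frequencies, using the same rounded priors (0.25 / 0.75) as the worked answer:

```python
# Naive Bayes posterior for Incident 1
# (Education = HIGH, Gender = Male, FB friends > 300).
def nb_score(prior, likelihoods):
    s = prior
    for p in likelihoods:
        s *= p
    return s

num_yes = nb_score(0.25, [120/1270, 880/1270, 200/1270])
num_no = nb_score(0.75, [1200/3730, 2020/3730, 3500/3730])
p_yes = num_yes / (num_yes + num_no)   # normalize over the two classes
print(round(p_yes, 3))  # ~0.021, matching the 0.0208 above up to rounding
```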

 


1.    1506

2.    1329

3.    177

4.    129

5.    48

 

Overall Accuracy= 1329/1506 = 0.8825

Stratified:

Accuracy of “Positive” = 317/ (317+48) = 0.8685

Accuracy of “Negative” = 1012/ (1012+129) = 0.8869
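These figures follow directly from the four confusion-matrix cells (317 true positives, 48 false negatives, 1012 true negatives, 129 false positives):

```python
# Confusion-matrix cells from the answers above.
tp, fn, tn, fp = 317, 48, 1012, 129

overall = (tp + tn) / (tp + fn + tn + fp)   # 1329 / 1506
sensitivity = tp / (tp + fn)                # accuracy on "Positive"
specificity = tn / (tn + fp)                # accuracy on "Negative"
print(round(overall, 4), round(sensitivity, 4), round(specificity, 4))
# 0.8825 0.8685 0.8869
```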


 

Because the diabetes sample is highly imbalanced, overall accuracy alone is misleading. A false positive means a non-diabetic person is predicted to be diabetic; a false negative means a diabetic patient is predicted to be non-diabetic, and the two errors carry very different costs.


 

a)

Pr(Buy = 1| Gender = F, Online Customer= Yes, Color = B) = 0.7*0.4*0.5*0.3= 0.042

b)

Pr(Buy = 1| Gender = M, Online Customer= Yes, Color = R) = 0.3*0.5*0.2*1= 0.03

c)

Pr(Buy = 1| Gender = X, Online Customer= Yes, Color = B) =

Pr(Buy = 1| Gender = F, Online Customer= Yes, Color = B) + Pr(Buy = 1| Gender = M, Online Customer= Yes, Color = B)

=0.7*0.4*0.5*0.3 + 0.3*0.5*0.2*0.7 = 0.063

Pr(Buy = 1| Gender = F, Online Customer= Yes, Color = R) + Pr(Buy = 1| Gender = M, Online Customer= Yes, Color = R)

=0.7*0.4*0.5*0.8 + 0.3*0.5*0.2 = 0.142

 

Red is the best ad to show.
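The two scores can be checked in a few lines, marginalizing over the unknown gender. The Blue, Gender = M term is written with the 0.2 factor needed for the stated totals (0.063 and 0.142) to come out:

```python
# Naive Bayes scores for each ad color, summed over Gender = F and M.
score_blue = 0.7 * 0.4 * 0.5 * 0.3 + 0.3 * 0.5 * 0.2 * 0.7
score_red = 0.7 * 0.4 * 0.5 * 0.8 + 0.3 * 0.5 * 0.2 * 1.0

best = "Red" if score_red > score_blue else "Blue"
print(round(score_blue, 3), round(score_red, 3), best)  # 0.063 0.142 Red
```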


 

1: Market Uncertainty

2:

Each probability comes from the logistic function P = e^z / (1 + e^z), where z is that firm's fitted linear score:

P(Ford) = e^z / (1 + e^z) = 0.4722

P(SAMSUNG) = e^z / (1 + e^z) = 0.6059

P(Google) = e^z / (1 + e^z) = 0.5846

P(Blizzard) = e^z / (1 + e^z) = 0.4899

With the 0.5 cut-off, SAMSUNG and Google are predicted to succeed.

3: change the threshold to 60%
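The effect of moving the threshold can be seen by applying both cut-offs to the four predicted probabilities (rounded to four decimals):

```python
# Predicted success probabilities from the logistic model above.
probs = {"Ford": 0.4722, "SAMSUNG": 0.6059,
         "Google": 0.5846, "Blizzard": 0.4899}

at_50 = [firm for firm, p in probs.items() if p > 0.5]
at_60 = [firm for firm, p in probs.items() if p > 0.6]
print(at_50)  # ['SAMSUNG', 'Google']
print(at_60)  # ['SAMSUNG']
```

Raising the threshold to 60% keeps only the single most confident prediction.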

 

Qestion代写
Qestion代写

 

a) There are 3 frequent itemsets:

{Soda, Chips, Cookies}, {Chips, Yogurt, Detergent}, {Chips, Yogurt, Cookies}

b) confidence({Chips, Cookies} → Yogurt) = support(Chips, Cookies, Yogurt) / support(Chips, Cookies) = 1/2

confidence({Yogurt, Cookies} → Chips) = support(Yogurt, Cookies, Chips) / support(Yogurt, Cookies) = 1/2

c) Lift = confidence / benchmark confidence of Chips = (1/2) / (5/6) = 0.6

Since the lift is 0.6, which is less than 1, the rule performs worse than chance: customers who buy Yogurt and Cookies together are less likely than average to buy Chips, so buying Yogurt and Cookies together is not a good predictor of buying Chips.
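The arithmetic in c) in a few lines, with the confidence taken from part b) and the benchmark being the overall support of Chips (5 of 6 baskets):

```python
# Lift of {Yogurt, Cookies} -> {Chips}: rule confidence divided by the
# benchmark confidence (the overall support of Chips).
confidence = 1 / 2   # from part b)
benchmark = 5 / 6    # fraction of all baskets containing Chips
lift = confidence / benchmark
print(round(lift, 2))  # 0.6 -> lift < 1, the rule is worse than chance
```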

 

29:

IS Distance = sqrt[(4-4)^2+(51-49)^2+(49-46)^2] = 3.6056

Mkt Distance = sqrt[(5-4)^2+(48-49)^2+(51-46)^2] = 5.1962

Acct Distance = sqrt[(6-4)^2+(46-49)^2+(48-46)^2] = 4.1231

Sophia is closest to the IS centroid, so IS is the best-matching job field.
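A minimal sketch of the nearest-centroid computation, with Sophia's profile and the three field centroids read off from the distance formulas above:

```python
import math

# Sophia's profile and the three job-field centroids.
sophia = (4, 49, 46)
fields = {"IS": (4, 51, 49), "Mkt": (5, 48, 51), "Acct": (6, 46, 48)}

def dist(a, b):
    # Euclidean distance between two profiles.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

for name, centroid in fields.items():
    print(name, round(dist(sophia, centroid), 4))
best = min(fields, key=lambda f: dist(sophia, fields[f]))
print(best)  # IS
```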