Data Mining Assignment: Experimental Design For Westie Institute Of Studies

Question

Task:
There are two scenarios in this data mining assignment. The first requires the design of an empirical experiment. The second requires the design of a small survey. For each scenario, you will be required to identify the potential biases, how these can be avoided, and the ethical issues involved.

Scenario 1: Student Data Mining
The Westie Institute of Studies (WIS) has several years of data on the qualification completion of its students. This data covers the qualification started by the student, the level of the qualification, the number of courses in the qualification, the student’s grades in each qualification, student attendance rates, and whether the student completed the qualification. There is also data on the students themselves, including their gender, age, country of origin and ethnic group. Student attendance in class week-by-week is also available.

WIS wants to mine this data with a view to creating a system that can predict whether a student will complete a qualification or withdraw from it. Several data mining algorithms will be evaluated and the best (i.e. the most accurate) will be embedded into the production environment. Data mining algorithms use data to produce models that can make predictions. Some algorithms are stochastic, that is, they produce slightly different models each time they are run.

Your task is the following:

  1. Identify and explain the ethical issues that are present in this data set and this application.
  2. Identify and explain any potential biases in this data set, which might influence the accuracy of the data mining algorithms.
  3. Design an experimental process for evaluating any data mining algorithm over this data set. The goal of these experiments is to compare the performance of each algorithm: the experimental design must allow this in an unbiased manner.

Scenario 2: Attitudes Towards and Impact of Tertiary Education Policies
The New Zealand government has recently introduced a new tax to fund tertiary education. The tax is levied on housing developers and the stated goal of the tax is to allow more people to study at tertiary level by eliminating fees and increasing living costs support for students. A local lobby group feels that this tax is unfair and wants to perform a survey to prove this. They want to establish what the public think of the tax in terms of satisfaction with it, whether the public think it is justifiable and will accomplish government goals. The lobby group also wants to establish whether the tax is placing an excessive burden on developers. The results of this survey will be used by the lobby group in submissions to government. Results will also be released to the media.

Your task is the following:

  1. Identify and explain the ethical issues that are present in this survey, and how to avoid them.
  2. Identify and explain any biases that could be present in this survey, and how to avoid them.
  3. Design a short survey to investigate the questions posed by the lobby group. This should include a preamble / introduction and the proposed survey questions.

Answer

Scenario 1: Student Data Mining
1. Ethical Issues Present in the Data Set and Application
The data mining system that the Westie Institute of Studies (WIS) wants to design could be very beneficial for the organisation. WIS holds several years of student data, and putting that information to use could open a new dimension in student behavioural analysis (Moro, Rita, & Vala, 2016). Understanding the behaviour and outcomes of students has always been a major challenge for WIS, and data mining could go a long way towards addressing it. However, collecting and analysing this data raises several ethical issues. The main ones are examined below.

Lack of User Consent
When an educational institute collects information about its students, it typically has no agreement with them covering future uses of that information (Moscoso-Zea, Saa, & Luján-Mora, 2019). As a result, the students may have no legal basis to object to the use of their personal information. Nevertheless, there are strong ethical grounds for the organisation to seek the consent of the people the information describes. The students did not give the Westie Institute of Studies any verbal or written permission to use their information for this purpose.

Privacy of Data
Student data should not be seen only as an asset of the organisation. It also forms part of each person's identity and must be kept secure. The institute has a moral responsibility to protect the datasets from outsiders (Senosi & Sibiya, 2017), and the data should not be used for commercial purposes. The data analysis system that the Westie Institute of Studies plans to build risks crossing these boundaries: student data would be used for the institute's business processes, which could compromise the students' privacy.

Security of Sensitive Information
The student records may also contain sensitive or private information. Examples include a student's identity card number or details of an illness that a student suffered during their studies. If these datasets were leaked, the institute could be liable for substantial compensation to the affected students, and its reputation would be seriously damaged.

2. Identification and Explanation of Biases in the Data Set
Certain conditions can bias a data mining tool (Pattanaprateep et al., 2017). Bias affects the overall results of the system and can significantly reduce the usefulness of its predictions. The Westie Institute of Studies will want to avoid this so that the data analysis system remains fit for purpose. The first issue is the degree of randomness in how the data are selected. The datasets fed into the system must be selected on the basis of appropriate attributes, and selecting only similar types of records must be avoided, as this would produce similar patterns in the results (Zimmermann, 2020). The main types of bias that may arise in the design of the data mining tool are explained below.

Sample Selection Bias
This is a common form of bias and could affect the data mining tool at the Westie Institute of Studies. It occurs when the data available to the system lack certain types of records that are needed for the analysis, so that the sample does not represent the full student population.

Confirmation Bias
Confirmation bias is the most dangerous form of bias that the data analysis process can fall into. It arises when the process is focused on proving a pre-determined theory (Moro, Rita, & Vala, 2016). If the makers of the data mining tool set out to show that the institute is heading towards a positive future and that most students will complete their qualifications, the system will be heavily biased. The organisation must make no prior assumptions and must accept the results that the data mining process produces.

Outliers
Outliers are data points that differ greatly from the others in a dataset (Velmurugan & Anuradha, 2016). For example, if a group of students spends $1000 on their studies on average and one student has spent $5000, the latter may need to be excluded from the analysis. A single extreme value like this can shift the mean considerably.
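
The effect of a single extreme value on the mean can be checked with a short calculation. The sketch below is illustrative only: the group of ten students and their spending figures are assumed values chosen to match the example above, not data from WIS.

```python
from statistics import mean, median

# Hypothetical spending figures (in dollars): nine students at $1000
# and one outlier at $5000. These values are assumptions for illustration.
spending = [1000] * 9 + [5000]

# Mean and median of the full group, outlier included.
mean_with_outlier = mean(spending)
median_with_outlier = median(spending)

# Mean after dropping the single extreme value.
mean_without_outlier = mean([s for s in spending if s < 5000])

print(mean_with_outlier, mean_without_outlier, median_with_outlier)
```

With these assumed figures, one student out of ten lifts the mean from $1000 to $1400, while the median stays at $1000. Reporting a robust statistic such as the median is one alternative to discarding the record outright.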

3. Experimental Process for Evaluating Data Mining Algorithm
Determining the efficiency and performance of a machine learning model can be a complicated task, and much depends on the requirements of the organisation. There are various measures for evaluating a machine learning system (Moscoso-Zea, Saa, & Luján-Mora, 2019), but the choice ultimately depends on the function of the system and the needs of the organisation. The evaluation must also examine the predictions that the tool produces. The main measures are described below.

1. Accuracy
The accuracy of a classifier is the percentage of its predictions that are correct (Pattanaprateep, McEvoy, Attia, & Thakkinstian, 2017). Accuracy is a reliable measure when the classes in the dataset are present in roughly equal proportions. If the dataset is unbalanced, accuracy becomes unreliable.
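
Why accuracy misleads on an unbalanced dataset can be seen in a small worked example. The sketch below assumes a hypothetical set of 100 students in which 90 completed and 10 withdrew, and a trivial model that predicts "completed" for everyone. The figures and function names are illustrative assumptions.

```python
# Hypothetical labels: 1 = withdrew, 0 = completed (assumed 10/90 split).
actual = [1] * 10 + [0] * 90

# A trivial "model" that predicts completion for every student.
predicted = [0] * len(actual)

def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def withdrawals_found(actual, predicted):
    """Number of actual withdrawals that the model also predicted."""
    return sum(a == 1 and p == 1 for a, p in zip(actual, predicted))

print(accuracy(actual, predicted))           # 0.9
print(withdrawals_found(actual, predicted))  # 0
```

The trivial model scores 90% accuracy without identifying a single withdrawal, which is the case the system is meant to detect.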

2. Recall
Recall is mostly used for unbalanced datasets. It measures the proportion of the actual positives in the dataset that the model finds (Velmurugan & Anuradha, 2016). If the model misses many actual positives (false negatives), recall will be low even when accuracy is high.

3. Precision
Producing predictions is not the only goal of the data mining model. The precision of those predictions, that is, the proportion of predicted positives that are actually positive, must also be assessed. If the model tends to produce many false positives, precision is the more useful measure of its performance.
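
Recall and precision can both be computed from the counts of true positives, false positives and false negatives. The sketch below is a minimal illustration. The function names and the example counts are assumptions, with "positive" taken to mean a student who withdraws.

```python
def recall(true_pos, false_neg):
    """Share of actual positives that the model identified."""
    return true_pos / (true_pos + false_neg)

def precision(true_pos, false_pos):
    """Share of predicted positives that were actually positive."""
    return true_pos / (true_pos + false_pos)

# Hypothetical confusion counts for a withdrawal classifier.
tp, fp, fn = 8, 8, 2

print(recall(tp, fn))     # 0.8
print(precision(tp, fp))  # 0.5
```

In this assumed case the model finds most withdrawals (high recall), but half of the students it flags would in fact have completed (low precision).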

4. F1 Score
In some cases both precision and recall matter when evaluating a data mining model. The F1 score can therefore also be used in developing the data mining tool for the Westie Institute of Studies (Zimmermann, 2020). Mathematically, it is the harmonic mean of precision and recall. Used correctly, it can be more informative than either precision or recall alone.
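
The harmonic mean can be written as F1 = 2 × precision × recall / (precision + recall). The following sketch computes it for assumed example values and compares it with the ordinary arithmetic mean. The function name and the input values are illustrative.

```python
from statistics import harmonic_mean

def f1_score(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)

# Assumed example values.
p, r = 0.5, 0.8

f1 = f1_score(p, r)
arithmetic = (p + r) / 2

print(round(f1, 3))          # 0.615
print(round(arithmetic, 3))  # 0.65
```

The F1 score sits below the arithmetic mean because the harmonic mean is pulled towards the lower of the two values, so a model cannot score well on F1 by doing well on only one of precision and recall.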

Scenario 2: Attitudes Towards and Impact of Tertiary Education Policies
1. Ethical Issues Present in the Survey
A detailed look at the survey proposed by the local lobby group shows that, in its current form, the survey would be unethical. This is due to the lack of information given to participants and the absence of any provision for ethical consent (Speklé & Widener, 2018). In the proposed scenario the survey is aimed at the local public, and the lobby group would run it without reference to any standard code of ethics.

Most academic and private sector surveys are conducted under a specific code of ethics. In this case, the lobby group should follow a code such as that of the American Association for Public Opinion Research (AAPOR). Such a code addresses the ethical considerations involved in the research, including respect, integrity and honesty in dealing with respondents (Gupta, 2017). The current plan of the lobby group lacks these safeguards, which is a significant threat to the validity and effectiveness of the survey.

Additionally, the lobby group's approach may leave participants unwilling to take part. The group collectively assumes that those affected by the new tax are unhappy with it, although the views of the public have not yet been collected or researched (Zimmermann, 2020). Approaching respondents with the aim of persuading them towards that conclusion would be an unethical practice, because participants need to understand properly what they are participating in.

Furthermore, participants need to be given a consent form that explains the purpose of the survey and how the data will be used. Without these standards, and without any guarantee of anonymity, the proposed survey is unlikely to be credible.

2. Potential Biases in the Survey
An examination of the proposed survey shows that a specific bias is built into its results. It is important to note that the lobby group itself would conduct the survey, and that it has not yet specified the sampling method or how participants would be selected. The main bias, however, comes from the results the group expects (Facca et al., 2020), and can be described as response bias. The lobby group already feels that the tax is unfair, so it would run the survey to “prove” a conclusion it has already reached. The results would therefore be driven towards expressions of dissatisfaction, and this would be reflected in the primary data collected.

Sampling bias could also be present. Because the lobby group has not defined a sampling process, it would be free to select only respondents who oppose the new tax, which would inflate the level of dissatisfaction in the responses (Yip, Han, & Sng, 2016). This would help the group argue that the new policy will not benefit society or allow the government to achieve its goals. It is also unclear whether respondents who are well informed about the tax would be included, which is a further threat to the reliability of the survey (Gelinas et al., 2017). Finally, the survey results would be used to further the lobby group's own interests, which again calls the impartiality of the survey into question.

3. A Short Survey to Investigate the New Tax Policy
This section sets out questions that could be used to evaluate the level of satisfaction with the new tax policy. The credibility of the survey is open to question given the approach of the lobby group, so the following survey is offered as one possible design (Gelinas et al., 2017). Because no format was specified for the survey, it uses a mix of question formats, from five-point agreement scales to 1-10 ratings. An introduction is included to explain the survey to respondents and guide them in responding.

Introduction: This survey has been designed to collect your views on tertiary education policies. Your responses will be stored anonymously and will be used to assess public attitudes towards the new tax policy (Newransky et al., 2020). Please answer to the best of your knowledge. Your answers will help the local lobby group to assess whether the new tax policy is justifiable and whether it is likely to achieve the government's goals.

Question 1: I have comprehensive knowledge of the new tax policy implemented by the New Zealand government to fund tertiary education.
Strongly agree
Agree
Neutral
Disagree
Strongly disagree

Question 2: The policy will eliminate the fees associated with tertiary education.
Strongly agree
Agree
Neutral
Disagree
Strongly disagree

Question 3: People seeking tertiary education will be affected by the policy. (Please rate this on a scale of 1-10, where 10 indicates the strongest agreement.)
1   2   3   4   5   6   7   8   9   10

Question 4: The increased living cost support will lead more people to seek tertiary education.
Strongly agree
Agree
Neutral
Disagree
Strongly disagree

Question 5: The tax levied on housing developers is unfair.
Strongly agree
Agree
Neutral
Disagree
Strongly disagree

Question 6: How likely are you to support the implementation of the new tax policy by the New Zealand government? (Please rate this on a scale of 1-10, where 10 denotes “extremely likely”.)
1   2   3   4   5   6   7   8   9   10

Question 7: The tax policy has placed an excessive burden on housing developers in New Zealand.
Strongly agree
Agree
Neutral
Disagree
Strongly disagree

Question 8: There is room for the government to improve the tax policy.
Strongly agree
Agree
Neutral
Disagree
Strongly disagree

References
Facca, D., Smith, M. J., Shelley, J., Lizotte, D., & Donelle, L. (2020). Exploring the ethical issues in research using digital data collection strategies with minors: A scoping review. PLoS ONE, 15(8), e0237875. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0237875

Gelinas, L., Pierce, R., Winkler, S., Cohen, I. G., Lynch, H. F., & Bierer, B. E. (2017). Using social media as a research recruitment tool: Ethical issues and recommendations. The American Journal of Bioethics, 17(3), 3-14. https://www.tandfonline.com/doi/abs/10.1080/15265161.2016.1276644

Gupta, S. (2017). Ethical issues in designing Internet-based research: Recommendations for good practice. Journal of Research Practice, 13(2), D1-D1. http://jrp.icaap.org/index.php/jrp/article/view/576

Moro, S., Rita, P., & Vala, B. (2016). Predicting social media performance metrics and evaluation of the impact on brand building: A data mining approach. Journal of Business Research, 69(9), 3341-3351. https://www.academia.edu/download/48030422/2016_JBR-MoroRitaVala.pdf

Moscoso-Zea, O., Saa, P., & Luján-Mora, S. (2019). Evaluation of algorithms to predict graduation rate in higher education institutions by applying educational data mining. Australasian Journal of Engineering Education, 24(1), 4-13. http://rua.ua.es/dspace/bitstream/10045/93995/5/2019_Moscoso-Zea_etal_AustralasianJEngEdu_preprint.pdf

Newransky, C., Kyriakakis, S., Samaroo, K. D., Owens, D. D., & Abu Hassan Shaari, A. (2020). Ethical and methodological challenges of implementing social work survey research in schools: A perspective from the suburban United States. International Journal of School Social Work, 5(1), 4. https://newprairiepress.org/ijssw/vol5/iss1/4/

Pattanaprateep, O., McEvoy, M., Attia, J., & Thakkinstian, A. (2017). Evaluation of rational nonsteroidal anti-inflammatory drugs and gastro-protective agents use; association rule data mining using outpatient prescription patterns. BMC Medical Informatics and Decision Making, 17(1), 96. https://link.springer.com/article/10.1186/s12911-017-0496-3

Speklé, R. F., & Widener, S. K. (2018). Challenging issues in survey research: Discussion and suggestions. Journal of Management Accounting Research, 30(2), 3-21. https://meridian.allenpress.com/jmar/article-abstract/30/2/3/80958

Velmurugan, T., & Anuradha, C. (2016). Performance evaluation of feature selection algorithms in educational data mining. International Journal of Data Mining Techniques and Applications, 5(2), 131-139. http://www.hindex.org/2016/article.php?page=1176

Yip, C., Han, N. L. R., & Sng, B. L. (2016). Legal and ethical issues in research. Indian Journal of Anaesthesia, 60(9), 684. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5037952/

Zimmermann, A. (2020). Method evaluation, parameterization, and result validation in unsupervised data mining: A critical survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 10(2), e1330. https://hal.archives-ouvertes.fr/hal-02284852/file/dm-problem-r3.pdf
