Big Data Assignment Discussing Cloud Security Threats in the Case of Global Entertainments (GE)
Global Entertainments (GE) is an entertainment company established in 1995 which has grown into a company with a turnover of £10 million. Its headquarters is in Slough, where the company was originally founded, and it has regional offices in London, New York and Singapore. GE maintains a centralised database in Slough which supports a number of transaction processing systems. Recently the management team has focused its attention on improving the decision-making capabilities of the organisation. In particular, they want to provide regional managers with information and insights from existing data to enable more efficient decision making. At present, the database (RDBMS) is updated continually throughout the day. For example, entertainment products (online film streaming/DVDs) are constantly requested by customers, and the online transaction processing system records these requests and updates inventory records accordingly. During sales and promotions, regional managers often need online analysis reports to monitor sales performance so that they can take corrective action for any deviations in performance. They also need timely analysis reports to assist in making long-term decisions. Currently, a great deal of effort is spent on collecting data from various systems before any analysis can be undertaken. Managers want and need more information, but analysts can provide only minimal information, at a high cost, within the desired timeframes. It is clear that, to provide the necessary information more efficiently, the IT infrastructure needs to move to a cloud environment and use big data for this purpose. However, there is a concern that this will cause some security issues.
You have been hired as a Big Data Cloud Engineer to migrate the company’s centralized database (RDBMS) to Big Data in the Cloud and identify security vulnerabilities and potential security threats caused by migrating the company’s data to cloud and big data environments. You should also suggest ways of protecting the company from these dangers.
Tasks to be addressed in the big data assignment:
1. Discuss and analyse the principles of Big Data in the Cloud - how this methodology can be helpful for Global Entertainments (GE) (case study) - Suggested word limit for this task is 1000 words
2. Critically evaluate and discuss the security risks within IAAS, PAAS and SAAS and decide which one is more suitable and secure for Global Entertainments (GE) (case study) - Suggested word limit for this task is 1000 words
3. Critically evaluate and discuss Big Data frameworks in the cloud and how the frameworks you discuss counter the security concerns raised during GE investigations – Suggested word limit for this task is 800 words
4. Produce a high-level infrastructure diagram to support GE data migration project (your diagram should show the tools/solutions)
5. Discuss which professional practices can help the Global Entertainments (GE) to comply with relevant legal, ethical and social issues within the Cloud and Big data environment.
Big data is a collection of structured, unstructured, and semi-structured data gathered by an organization that can be mined to infer useful information. It has five main characteristics, known as the 5 Vs: volume, velocity, variety, veracity, and value (Schneider, et al., 2018).
5 Vs of Big Data
Source: (Analytics Vidhya, 2020)
Volume represents the amount of data. Big data systems process huge volumes of low-density data, much of it unstructured. For GE, records of online film streaming, customer requests, and transactions all add to this volume. Such data can range from tens of terabytes to hundreds of petabytes, and volumes keep growing with cloud computing traffic, IoT devices, etc.
Velocity represents the rate at which data is received. The highest-velocity data streams directly into memory rather than being written to disk first. Real-time data needs real-time evaluation, which requires a high data-receiving rate.
Variety represents the different data types: structured, unstructured, and semi-structured. Structured data is the traditional type that fits easily into a relational database. Unstructured and semi-structured data, such as text, audio, and video, became common with the rise of big data; they require additional pre-processing, and supporting metadata, before meaningful information can be inferred.
Veracity refers to the uncertainty and trustworthiness of data, for example when data is unavailable at times because of an issue at the source or with the data controller. Veracity varies because the data comes from multiple sources and in multiple types.
Value: huge volumes of data are of no use to the company unless the data is converted into valuable information by extracting insights from it. Value is arguably the most important of the 5 Vs.
Principles of Big Data
Scalability – as the data in big data databases grows, a data pipeline needs to be built on top of the architecture, and this architecture has to scale with the stream of data in size, source, and type. In general, the data pipeline must scale along with the organization. To achieve this, an organization can take advantage of the cloud, which provides automatic scaling (for example, on AWS). A cloud platform allows an organization to adjust its computing and storage resources according to a predefined set of rules and policies (Sagiroglu, et al., 2013).
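The rule-based scaling described above can be sketched as a target-tracking calculation. This is a minimal sketch under stated assumptions: the CPU-utilisation percentages, the target, and the minimum/maximum node counts are hypothetical, and the function only mimics the idea behind managed auto-scaling rather than calling any real cloud API.

```python
import math


def desired_nodes(current_nodes: int, current_util_pct: int, target_util_pct: int,
                  min_nodes: int = 2, max_nodes: int = 20) -> int:
    """Node count a simple (hypothetical) target-tracking rule would request.

    Scales the cluster in proportion to how far observed utilisation is from
    the target, then clamps the result to the allowed range.
    """
    if current_nodes <= 0 or target_util_pct <= 0:
        raise ValueError("node count and target utilisation must be positive")
    proposed = math.ceil(current_nodes * current_util_pct / target_util_pct)
    return max(min_nodes, min(max_nodes, proposed))


# Hypothetical evening streaming peak: 4 nodes at 90% CPU against a 60% target.
print(desired_nodes(4, 90, 60))   # 6 (scale out)
# Hypothetical quiet period: 10 nodes at 5% CPU against a 60% target.
print(desired_nodes(10, 5, 60))   # 2 (scale in, but not below the minimum)
```

Clamping to a minimum keeps baseline capacity for availability, while the maximum caps cost during unexpected spikes.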
High availability – as data plays an important role, an organization needs to ensure that its data, the foundation of its business, is usable, consistent, and accessible. The system should be designed to keep operating even when some components fail. To achieve high availability, a system should decouple storage from computing, provide fault tolerance with failover policies, and keep backups, all of which should run automatically.
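The failover behaviour described above can be illustrated with a small read path that tries replicas in order. This is a sketch assuming hypothetical replica names and a simulated fetch function; a real deployment would rely on the database's or cloud provider's own replication and health checks.

```python
from typing import Callable, Dict, Sequence, TypeVar

T = TypeVar("T")


class AllReplicasFailed(Exception):
    """Raised when no replica could serve the request."""


def read_with_failover(replicas: Sequence[str], fetch: Callable[[str], T]) -> T:
    """Try each replica in order and return the first successful result."""
    errors: Dict[str, Exception] = {}
    for replica in replicas:
        try:
            return fetch(replica)
        except ConnectionError as exc:  # only connection failures trigger failover
            errors[replica] = exc
    raise AllReplicasFailed(f"no replica available: {list(errors)}")


# Hypothetical replica names; the Slough primary is simulated as unreachable.
def fake_fetch(replica: str) -> str:
    if replica == "slough-primary":
        raise ConnectionError(f"{replica} unreachable")
    return f"inventory snapshot from {replica}"


print(read_with_failover(["slough-primary", "london-replica", "singapore-replica"], fake_fetch))
```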
Business agility – business management has been changing for over a decade in many sectors as organizations move from traditional paper-based methods to digital methods supported by technology. This shift requires an organization to adapt quickly to new changes to support its business requirements. A data lake is a centralized location that stores huge volumes of data of any type, which can easily be processed, analyzed, and consumed by data consumers in the organization. This architectural approach is recommended for maintaining the business agility of an organization.
Architecture principles of the big data pipeline
Ingest – this stage collects raw data, whether structured, unstructured, or semi-structured, from any source, such as logs, IoT devices, transactions, and live streams, at any speed and scale. It accepts both batch and streaming data.
Store – this stage keeps the data in a secure, flexible, and efficient storage location. Data can be stored in a data lake, data warehouse, graph database, relational database, or NoSQL database, all of which are then used by the consume layer.
Process – this stage turns the raw data into a consumable form from which useful information can be inferred, by filtering, sorting, joining, splitting, and enriching it and applying business logic.
Consume – this is the final stage, which provides the post-processed data to data consumers. They can run ad-hoc queries on this data to produce views in reports and dashboards.
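The four stages can be sketched end to end in plain Python. This is an illustration under assumptions: the sample records are hypothetical, an in-memory dictionary stands in for the storage layer, and a real pipeline would use dedicated ingestion, storage, and processing services.

```python
import csv
import io
import json
from collections import defaultdict
from typing import Dict, List

# Hypothetical raw inputs: structured CSV from the transaction system and
# semi-structured JSON events from the streaming service.
RAW_CSV = "region,product,amount\nLondon,DVD,12.50\nNew York,Stream,7.99\n"
RAW_JSON = [
    '{"region": "Singapore", "product": "Stream", "amount": 7.99}',
    '{"region": "London", "product": "Stream", "amount": 7.99}',
]


def ingest() -> List[dict]:
    """Ingest: collect raw records from both sources (as a batch here)."""
    rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
    rows += [json.loads(line) for line in RAW_JSON]
    return rows


def store(rows: List[dict], lake: Dict[str, List[dict]]) -> None:
    """Store: keep raw records unchanged (a dict stands in for a data lake)."""
    lake.setdefault("raw/sales", []).extend(rows)


def process(lake: Dict[str, List[dict]]) -> Dict[str, float]:
    """Process: fix types and apply business logic (revenue per region)."""
    totals: Dict[str, float] = defaultdict(float)
    for row in lake["raw/sales"]:
        totals[row["region"]] += float(row["amount"])  # CSV gives strings, JSON gives floats
    return {region: round(total, 2) for region, total in totals.items()}


def consume(totals: Dict[str, float]) -> str:
    """Consume: a simple report view for regional managers."""
    return "\n".join(f"{region}: £{total:.2f}" for region, total in sorted(totals.items()))


lake: Dict[str, List[dict]] = {}
store(ingest(), lake)
print(consume(process(lake)))
```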
Big data helps Global Entertainments examine large volumes of data to discover hidden patterns. GE maintains a centralized database in Slough, has regional offices in London, New York, and Singapore, and processes a huge number of transactions, so it needs to improve its decision-making capabilities to maintain its business standards and monitor sales performance. The advantages of big data in business include:
• Better decision making
• Greater innovation
• Improvement in the education sector
• Product price optimization
• Recommendation engines
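As an illustration of better decision making for GE's sales monitoring, the check below flags regions whose sales deviate from target during a promotion. It is a minimal sketch: the figures, the regional targets, and the 10% tolerance are hypothetical assumptions.

```python
from typing import Dict


def flag_deviations(actual: Dict[str, float], target: Dict[str, float],
                    tolerance: float = 0.10) -> Dict[str, float]:
    """Regions whose sales differ from target by more than `tolerance`.

    Values are relative deviations: -0.3 means 30% below target.
    """
    flagged: Dict[str, float] = {}
    for region, goal in target.items():
        if goal <= 0:
            continue  # a relative deviation is undefined without a positive target
        deviation = (actual.get(region, 0.0) - goal) / goal
        if abs(deviation) > tolerance:
            flagged[region] = round(deviation, 3)
    return flagged


# Hypothetical weekly promotion figures in £ thousands.
target = {"London": 120.0, "New York": 150.0, "Singapore": 80.0}
actual = {"London": 118.0, "New York": 105.0, "Singapore": 96.0}
print(flag_deviations(actual, target))  # {'New York': -0.3, 'Singapore': 0.2}
```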
Global Entertainments' data can be secured with big data security tools and measures that safeguard transaction data and the analytic processes used for decision making. GE can use encryption, user access control, physical security, and centralized key management to maintain the security of its data. Because GE's RDBMS handles live streaming requests from users, encryption will protect large volumes of data of various types. User access control lets GE grant access privileges so that data at each level is available only to the users who need it. Physical security will protect the big data platform in GE's data center in Slough. Because GE has regional offices in London, New York, and Singapore, it should use centralized key management, which provides policy-driven automation, logging, key management abstracted from the applications that use the keys, and on-demand key delivery.
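User access control by data level can be sketched as a deny-by-default check that compares a role's clearance with a dataset's sensitivity label. The roles, datasets, and levels below are hypothetical examples, not GE's actual policy.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2  # e.g. customer payment details


# Hypothetical roles and the highest sensitivity level each may read.
ROLE_CLEARANCE = {
    "regional_manager": Sensitivity.INTERNAL,
    "analyst": Sensitivity.INTERNAL,
    "finance_admin": Sensitivity.CONFIDENTIAL,
    "guest": Sensitivity.PUBLIC,
}

# Hypothetical datasets and their sensitivity labels.
DATASET_LEVEL = {
    "catalogue": Sensitivity.PUBLIC,
    "regional_sales": Sensitivity.INTERNAL,
    "customer_payments": Sensitivity.CONFIDENTIAL,
}


def can_read(role: str, dataset: str) -> bool:
    """Deny by default: unknown roles or datasets get no access."""
    if role not in ROLE_CLEARANCE or dataset not in DATASET_LEVEL:
        return False
    return ROLE_CLEARANCE[role] >= DATASET_LEVEL[dataset]


print(can_read("regional_manager", "regional_sales"))     # True
print(can_read("regional_manager", "customer_payments"))  # False
```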
GE can use Cloudwick to manage its security hub, IBM tools to monitor its big data and NoSQL environments, Logtrust to detect suspicious online behaviour, and Gemalto to protect the entire big data platform across the cloud, the data center, and virtual environments; these offerings include digital signature solutions, encryption, authentication, and key management. Together they address the security issues GE faces around access control, real-time security for live streaming, endpoints, stored data, data mining, and non-relational data in Slough.
Security Issues in Big Data
Global Entertainments focuses on entertainment and has a turnover of £10 million. Currently, it has a centralized database that supports its transaction processing systems. GE wants to expand its storage system by taking advantage of the better services offered by the cloud. It currently uses a relational database management system for online transaction processing, recording requests, and updating inventory records. During the sales phase, managers need analysis of performance so that they can rectify any deviations. These reports depend on timely analysis, so GE needs a better service for accessing information efficiently (Sadiku, et al., 2014).
Cloud computing offers three service models: Infrastructure as a Service, Platform as a Service, and Software as a Service (Rani, et al., 2014) (Mohammed, et al., 2021).
IAAS – Infrastructure As A Service
Infrastructure as a service offers cloud-based services such as storage, networking, and virtualization. Its advantage is that it removes labour-intensive on-premises maintenance, because no physical hardware needs to be installed to use the infrastructure. It is highly flexible and scalable, and resources can be changed whenever the organization requires (Mohammed, et al., 2021).
The security risks associated with IaaS are discussed below (Rani, et al., 2014):
The common security issues or risks in IaaS are turned-off encryption, misconfiguration, unnecessary cloud accounts, and excessive role-based privileges for users.
Encryption turned off
Encryption plays an important role in data security; without it, data is easily exposed to theft through unauthorized access to the system. Data in transit also needs encryption, whether it moves between on-premises systems and the cloud or between cloud applications. An organization can use its own encryption keys or keys offered by the cloud provider.
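For data in transit, encryption normally means TLS. The sketch below uses Python's standard `ssl` module to build a client context that verifies server certificates and refuses protocol versions older than TLS 1.2; the chosen minimum version is an assumption, and encryption at rest would be handled separately (for example by the storage service or a key management service).

```python
import ssl


def make_client_tls_context() -> ssl.SSLContext:
    """Client-side TLS settings for data in transit (a sketch, not a full policy)."""
    ctx = ssl.create_default_context()            # verifies certificates and hostnames
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return ctx


ctx = make_client_tls_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```

A client would then pass `ctx` to `ctx.wrap_socket(sock, server_hostname=...)` or to an HTTPS client library, so every connection is both encrypted and authenticated.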
Misconfiguration
The most common problem in organizations is storage that is open to the internet and readable by the public. This is caused by misconfiguring the IaaS instance during deployment.
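A periodic configuration audit can catch this kind of misconfiguration before data leaks. The sketch below assumes a hypothetical inventory format with two boolean settings per storage bucket; real providers expose equivalent settings (public access and default encryption on object storage) through their own APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BucketConfig:
    name: str
    public_read: bool         # readable by anyone on the internet?
    encryption_at_rest: bool  # default encryption enabled?


def audit(buckets: List[BucketConfig]) -> List[str]:
    """Return findings for risky storage settings."""
    findings = []
    for b in buckets:
        if b.public_read:
            findings.append(f"{b.name}: readable by anyone on the internet")
        if not b.encryption_at_rest:
            findings.append(f"{b.name}: encryption at rest is turned off")
    return findings


# Hypothetical inventory exported from GE's IaaS account.
inventory = [
    BucketConfig("ge-streaming-logs", public_read=False, encryption_at_rest=True),
    BucketConfig("ge-sales-exports", public_read=True, encryption_at_rest=False),
]
for finding in audit(inventory):
    print(finding)
```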
Unnecessary cloud accounts
Unsanctioned use of cloud services is common in SaaS, but it can also occur in IaaS when users use applications that their employer has not approved. Company data then ends up with a cloud provider that the organization does not know about.
Excessive role-based privileges
When employees are given more access privileges than their work requires, there is a high risk of data being exposed if a hacker compromises their accounts.
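A least-privilege review can be sketched by comparing the permissions each user has been granted with the permissions they actually used during a review period. The user names, permission strings, and usage log are hypothetical.

```python
from typing import Dict, Set


def excess_permissions(granted: Dict[str, Set[str]],
                       used: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Permissions each user holds but did not use during the review period."""
    report: Dict[str, Set[str]] = {}
    for user, perms in granted.items():
        unused = perms - used.get(user, set())
        if unused:
            report[user] = unused
    return report


# Hypothetical grants and a hypothetical 90-day usage log.
granted = {
    "alice": {"sales:read", "sales:write", "iam:admin"},
    "bob": {"sales:read"},
}
used = {"alice": {"sales:read"}, "bob": {"sales:read"}}
print(excess_permissions(granted, used))  # alice's unused write and admin rights
```

Permissions flagged this way are candidates for removal, which shrinks what an attacker gains from a compromised account.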
PAAS – Platform As A Service
Platform as a service offers hardware and software tools over the internet. Cloud users use these tools to develop applications, so developers do not need to build everything from scratch. This saves time and makes PaaS a cost-effective and efficient way to develop a unique application (Mohammed, et al., 2021).
The security risks associated with PaaS are discussed below (Rani, et al., 2014):
The common security issues in PaaS mainly concern data protection. As with IaaS, they include turned-off encryption and role-based access privileges; in addition, PaaS has SLA issues.
Encryption turned off
As with IaaS, turning off encryption in PaaS leaves data vulnerable to unauthorized access.
Role-based access privilege
Giving every user full access will lead to misuse of data, so role-based access privileges are important for protecting organizational data.
Unrevised SLAs
An SLA that is never revised leads to security issues, because every SLA defines how and which data may be used, which determines how the organization's sensitive data is treated. An organization has to understand and negotiate these terms to improve its security, so it is important to validate and update security protocols regularly.
SAAS – Software As A Service
Software as a service offers third-party software over the internet, usually through a subscription, often monthly. The advantage of SaaS is that software applications do not need to be installed on premises; users only need to log in to their account and access the application over the internet, from any device and any location. The payment structure for SaaS is also different. A typical example is a hosted email service used for sending and receiving email. If anything goes wrong, the SaaS provider is responsible for providing a solution (Mohammed, et al., 2021).
The security risks associated with SaaS are discussed below (Rani, et al., 2014):
There are fewer security issues in software as a service than in IaaS and PaaS, because the client has less responsibility for security: the cloud service provider hosts and manages the infrastructure and applications. The client's main responsibility is protecting user access, for example with strong passwords.
The most common mistake is users choosing weak passwords, which leads to serious security issues in SaaS.
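The password mistake described above can be reduced on two fronts: rejecting weak passwords and storing only salted, slow hashes. This is a minimal standard-library sketch; the 12-character minimum, the three-of-four character-class rule, and the iteration count are assumptions, and in practice GE would mostly rely on the SaaS provider's identity features, ideally with multi-factor authentication.

```python
import hashlib
import hmac
import os
from typing import Tuple

MIN_LENGTH = 12       # assumed policy value
ITERATIONS = 600_000  # assumed PBKDF2 work factor


def is_strong(password: str) -> bool:
    """Length check plus at least three of four character classes."""
    classes = [
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(not c.isalnum() for c in password),
    ]
    return len(password) >= MIN_LENGTH and sum(classes) >= 3


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest); only these are stored, never the password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison


print(is_strong("password123"))         # False: too short, only two classes
print(is_strong("Slough-Stream-2024"))  # True
salt, digest = hash_password("Slough-Stream-2024")
print(verify_password("Slough-Stream-2024", salt, digest))  # True
```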
The security risks associated with IaaS, PaaS, and SaaS are summarized below.
IaaS: encryption turned off; misconfiguration; unnecessary cloud accounts; excessive role-based privileges
PaaS: encryption turned off; excessive role-based access privileges; unrevised SLAs
SaaS: weak passwords
PaaS is suitable for Global Entertainments because GE uses various software applications to monitor sales performance and handle live streaming requests for entertainment products. It also needs to record requests from its processing systems and update inventory records. Access to its centralized database has to be improved so that transactions between users in all regional offices remain secure.
GE wants to monitor sales performance rather than manage networks, storage, and virtualization itself, so it needs hardware and software tools delivered over the internet. PaaS therefore best fits GE's requirements. A data framework such as Hadoop is recommended for storing data effectively. Hadoop is scalable and distributed, so it can store and process large volumes of data. It uses a master-slave topology, in which nodes are classified as masters and slaves, with different layers, including intermediary layers. To store the large volumes of data produced by online transactions, GE can use Hadoop's storage layer (HDFS), which splits the data into blocks and stores them across nodes. To protect the PaaS environment from unauthorized users, GE's online processing system should use strong passwords and encryption, revised SLAs with cloud providers, and role-based access privileges; together these will secure the system while users make live streaming requests.
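The splitting and storing that HDFS performs can be sketched as follows. This is a simplified model: it assumes the common HDFS defaults of 128 MB blocks and three replicas, uses hypothetical DataNode names, and places replicas round-robin, whereas real HDFS placement is rack-aware.

```python
import math
from typing import List

BLOCK_SIZE_MB = 128  # common HDFS default block size
REPLICATION = 3      # common HDFS default replication factor


def plan_blocks(file_size_mb: int, datanodes: List[str]) -> List[List[str]]:
    """Split a file into fixed-size blocks and assign each block to distinct nodes.

    Simplified round-robin placement; real HDFS placement is rack-aware.
    """
    if len(datanodes) < REPLICATION:
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    return [
        [datanodes[(i + r) % len(datanodes)] for r in range(REPLICATION)]
        for i in range(n_blocks)
    ]


nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical DataNodes
for block_id, replicas in enumerate(plan_blocks(300, nodes)):
    print(block_id, replicas)
```

A 300 MB transaction export becomes three blocks, each held on three different nodes, so losing one node does not lose data. This replication is also why, as discussed later in this assignment, it becomes hard to say exactly where each piece of data lives.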
Big Data Frameworks
The case study shows that GE uses its data for marketing and research. Many companies working with big data and the cloud may not have the fundamental protections for their business assets in place, and if a security breach occurs, the brand can lose business value that took years to build. GE's requirement is an entertainment service in which a huge volume of content must be moved to the cloud. GE should be able to move all its data to the cloud and scale when demand for particular content increases, which it can do only if it can analyze the data. Common techniques for making big data secure include encryption, logging, and honeypot detection. Every company should treat security as an asset, because it gives good business value to customers and stakeholders.
The data stored by the company may be structured or unstructured, so the company must use a big data solution to store it efficiently. Security issues such as authentication and authorization can arise when people retrieve data from the server (Alkatheri, et al., 2019).
When data is moved to the cloud, it faces many security issues, such as data breaches, data loss, and denial of service. Companies that use these cloud services often fail to address the required levels of security and privacy in the SLA.
Big data frameworks such as Hadoop and Spark work on distributed nodes. Because it is difficult to know where computation takes place, pinpointing a particular cluster of resources and securing each node is very difficult.
Data is also distributed across different machines in the cloud environment, and multiple copies may be kept to make it more reliable, so it is very difficult to know where each piece of data is stored. In a traditional environment, confidential data can be wrapped in several security tools, but in the cloud it is difficult to secure individual pieces of data because they can be stored anywhere on the cloud servers (Inoubli, et al., 2018).
Hadoop transfers data between nodes using RPC over TCP/IP, on wired and wireless networks. If this traffic is unprotected, anyone with access to the network could tap into and modify the inter-node communication.
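One protection against tampering is to authenticate every inter-node message so that any modification is detected. Hadoop has its own settings for this, such as `hadoop.rpc.protection` (whose `integrity` and `privacy` levels protect RPC traffic) and `dfs.encrypt.data.transfer` for block transfers; the sketch below only illustrates the underlying idea with an HMAC from Python's standard library, using a hypothetical shared key and message format. An HMAC detects modification but does not hide the content, so confidentiality additionally needs encryption.

```python
import hashlib
import hmac
import json
import secrets

SHARED_KEY = secrets.token_bytes(32)  # hypothetical key distributed to cluster nodes


def sign(message: dict, key: bytes = SHARED_KEY) -> dict:
    """Attach an HMAC-SHA256 tag computed over a canonical JSON encoding."""
    body = json.dumps(message, sort_keys=True).encode()
    return {"body": message, "tag": hmac.new(key, body, hashlib.sha256).hexdigest()}


def verify(envelope: dict, key: bytes = SHARED_KEY) -> bool:
    """Recompute the tag and compare in constant time."""
    body = json.dumps(envelope["body"], sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["tag"])


msg = sign({"task": "map-17", "input_block": 42, "node": "dn2"})
print(verify(msg))               # True
msg["body"]["input_block"] = 99  # an attacker tampers with the message in transit
print(verify(msg))               # False
```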
A major flaw in Hadoop is that, by default, data is not encrypted while stored in the Hadoop environment, in favour of high availability, so a hacker who gains access to the system can easily read the data. Many nodes are connected in parallel, and if nodes are not authenticated, a third-party node can join the existing cluster, a trick hackers can use to steal data. Logging is also often missing when Hadoop infrastructure runs in the cloud: without records of which node joined which cluster or which MapReduce job was executed, it is very difficult to tell whether the cluster has been breached. Without logs, a malicious user can manipulate the cluster without getting caught.
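The two gaps described above, unauthenticated nodes joining and missing logs, can be illustrated together. This is a simplified sketch: a hypothetical allow-list of node names stands in for real node authentication (secure Hadoop deployments use Kerberos, because a name alone can be spoofed), and Python's `logging` module stands in for cluster audit logs.

```python
import logging
from typing import Set

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
audit_log = logging.getLogger("cluster.audit")


class Cluster:
    """Toy cluster that admits only approved nodes and logs every event."""

    def __init__(self, approved_nodes: Set[str]) -> None:
        self.approved_nodes = approved_nodes  # hypothetical allow-list
        self.members: Set[str] = set()

    def join(self, node_id: str) -> bool:
        if node_id not in self.approved_nodes:
            audit_log.warning("rejected join attempt from %s", node_id)
            return False
        self.members.add(node_id)
        audit_log.info("node %s joined the cluster", node_id)
        return True

    def submit_job(self, job_id: str, user: str) -> None:
        audit_log.info("job %s submitted by %s on %d nodes", job_id, user, len(self.members))


cluster = Cluster({"dn1", "dn2", "dn3"})
cluster.join("dn1")
cluster.join("rogue-node")  # refused and logged as a warning
cluster.submit_job("mapreduce-0042", "analyst_london")
```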
Spark is a framework that performs tasks quickly on huge datasets using multiple computers and distributed computing tools, and it is one of the best-known big data frameworks. It provides APIs for Java, Python, and R, and supports machine learning, graph processing, and data streaming. It has two main components: a driver and executors. It can run in standalone cluster mode, and it allocates resources using a cluster management system. It overcomes a key disadvantage of Hadoop MapReduce by keeping intermediate data in memory instead of writing it to disk. Spark's main advantages are speed and a developer-friendly API.
Hive facilitates reading, writing, and managing large datasets stored in a distributed environment using SQL. It is open source, processes structured data in Hadoop, and supports analysis through queries. Hive includes a driver and a compiler. It is not a relational database and is not suitable for online transaction processing.
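Hive's SQL interface can be illustrated with a query that summarizes sales by region. Here Python's built-in `sqlite3` stands in for Hive so the example runs anywhere; the table and rows are hypothetical, and in Hive a broadly similar HiveQL query would run over data stored in Hadoop.

```python
import sqlite3

# sqlite3 stands in for Hive: analysts query stored data with SQL
# instead of writing MapReduce code by hand.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("London", "DVD", 12.5), ("London", "Stream", 7.99),
     ("New York", "Stream", 7.99), ("Singapore", "Stream", 7.99)],
)
query = """
    SELECT region, COUNT(*) AS orders, ROUND(SUM(amount), 2) AS revenue
    FROM sales
    GROUP BY region
    ORDER BY revenue DESC, region
"""
for row in conn.execute(query):
    print(row)
```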
Apache Storm is a stream-based processing framework that focuses on low latency, which makes it suitable for real-time processing of data. It can handle large volumes of data and return results with low latency. Its stream-based model represents each job as a directed acyclic graph, called a topology, made up of streams, spouts, and bolts.
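The spout and bolt model can be illustrated with plain Python generators: a spout emits events and each bolt transforms the stream it receives. This is an analogy, not the Storm API; the event format and titles are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def spout(raw_lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Spout: turns raw 'title,region' lines into a stream of events."""
    for line in raw_lines:
        title, region = line.split(",")
        yield {"title": title, "region": region}


def region_filter_bolt(events: Iterable[Dict[str, str]], region: str) -> Iterator[Dict[str, str]]:
    """Bolt: passes on only the events for one region."""
    for event in events:
        if event["region"] == region:
            yield event


def running_count_bolt(events: Iterable[Dict[str, str]]) -> Iterator[Tuple[str, int]]:
    """Bolt: emits an updated count for a title as each event arrives."""
    counts: Counter = Counter()
    for event in events:
        counts[event["title"]] += 1
        yield event["title"], counts[event["title"]]


# Hypothetical stream of play events; in Storm these would arrive continuously.
stream = ["Film A,London", "Film B,Singapore", "Film A,London", "Film C,London"]
for title, count in running_count_bolt(region_filter_bolt(spout(stream), "London")):
    print(title, count)
```

Because each stage processes one event at a time, results are available as soon as an event arrives, which is the low-latency property described above.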
Countermeasures for the Security Risks:
The ways of securing data in the cloud are as follows:
· Encrypting the data provides an additional layer of security.
· A cloud environment is built for security, but users may unknowingly leave a digital footprint in the private cloud, and this exposed data can be misused to gain unauthorized access.
· Every change in the database should be logged in the cloud.
· Only authorized people should have access to the servers running in the cloud.
· The company should always have well-defined SLAs.
· Always use a strong password for applications that can be accessed in the cloud.
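Logging every change is more useful when the log itself is hard to alter silently. The sketch below chains each log entry to the hash of the previous one, a common tamper-evidence technique; the users and actions are hypothetical, and in practice GE would also use its cloud provider's audit logging service. An attacker who can rewrite every entry could rebuild the chain, so the latest hash should also be stored somewhere separate.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(entry: Dict[str, str]) -> str:
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()


class ChangeLog:
    """Append-only log; each entry stores the previous entry's hash,
    so editing or deleting an old entry breaks the chain."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, str]] = []

    def record(self, user: str, action: str) -> None:
        entry = {
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "prev_hash": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry["hash"] = _digest(entry)  # hash covers every field except itself
        self.entries.append(entry)

    def is_intact(self) -> bool:
        prev_hash = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True


log = ChangeLog()
log.record("analyst_ny", "UPDATE inventory SET stock = 40 WHERE sku = 'DVD-001'")
log.record("admin_slough", "GRANT SELECT ON sales TO analyst_sg")
print(log.is_intact())                   # True
log.entries[0]["user"] = "someone_else"  # tamper with history
print(log.is_intact())                   # False
```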
The company may have many servers and storage locations that need to be moved to the cloud, and one of the major security issues during this move is authenticating access to the data. The company should establish rules and policies so that only specific people can access the data (Awaysheh, et al., 2021).
Big data and its business uses are big news, but the legal issues related to big data are not clearly understood. Enterprises have started to adopt and integrate cloud technologies, and now is the time to take the necessary action on the legal issues. GE wants to address legal, social, and ethical issues, because deploying the cloud without investigating and addressing the legal issues of big data can lead to litigation (Cornetta, et al., 2020).
The following legal issues are associated with big data.
Data ownership and security
When considering the security and privacy of data, it is necessary to consider its ownership. Identifying ownership is a big challenge in big data, because it depends on the nature of the data, how it is generated and collected, and how it is delivered to machines, devices, or users. The main issue with big data is privacy, because of the huge volume of data. Legal analysis is important to identify the rights attached to the data; this is a complex and quickly developing area of legal practice.
Because of the large volume of data, the security risk of a data breach is high: the more data there is, the more attractive a target it is for hackers. Breaches are costly for businesses, through payments, regulatory responses, and civil litigation. Organizations should establish a legal framework for handling sensitive data, social data analysis, etc. Elements of such a framework include data integrity and privacy, encryption, access control, laws and regulations, and corporate policies. Vendor agreements, data ownership, custody requirements, confidentiality terms, data archiving, and international issues should also be monitored.
In the cloud, third-party service providers are heavily involved, so restrictions and protections should be in place to prevent problems. Third parties provide various services for data analysis, management, and storage. Issues related to third-party contracts include data loss, data control, data custody, disagreements over policies and procedures, regulations, and international rules.
Regulatory compliance and underlying contracts
Due diligence needs to examine the contracts with all the parties involved in order to understand the legal issues of data transactions. These legal issues include warranties, data control, indemnification agreements, contract termination, etc.
Enterprises should be aware that big data analysis can be used for legal discovery by opposing parties and government regulators. With big data, the technical limitations on producing raw data for analysis have decreased. When a legal discovery process starts, the company can focus on the risks associated with it and limit the scope of the investigation (Mantelero, 2018).
To mitigate legal risk, the following approaches are recommended:
• Using a cross-functional approach
• Producing standard procedures and protocols
• Leveraging and responding to legal requests for information
• Implementing the changes in all departments
Respecting individuals' autonomy
Autonomy means allowing individuals to make informed decisions about the use of their data. Consent documents are usually written for a particular project, whereas big data uses are more general. When big data is used only as non-identifying public information, consent from participants may not be needed; however, the information in big data is created by individual users, so there should be consent models for the big data that is created.
When big data is obtained from a particular group, the analysis focuses on that group's particular characteristics, and the benefits to the group should be considered.
Securing data is one of the greatest ethical issues in big data. Every user should protect private information from third-party attacks.
Sharing private information
It is not realistic to expect complete security and privacy of information. In some cases, data designs are built on trust, but the focus should be on sharing only general information.
Big data uses secondary data to produce new inferences and predictions, which leads businesses to deal with data brokers who collect large volumes of data in different ways. Data owners should be transparent about how data is used or sold.
Privacy protections alone are not enough, because big data analytics can allow institutions to infer information that individuals are not aware of. Such predictions and inferences should be identified before privacy is compromised.
The social issues of big data and the cloud mainly concern data sharing (Hayashi, 2013).
Sharing data from one user to another raises issues: with whom the data should be shared, and how it can be shared while ensuring data privacy.
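One way to share data while protecting privacy is to pseudonymise direct identifiers and drop fields the recipient does not need. The sketch below uses a keyed hash (HMAC) so that only GE, which holds the key, can regenerate the pseudonyms; the record fields and key handling are hypothetical. Pseudonymised records can still be linked to one another, so this reduces rather than removes privacy risk.

```python
import hashlib
import hmac
import secrets
from typing import Dict, List

# Hypothetical key held only by GE (in practice, kept in a key management service).
PSEUDONYM_KEY = secrets.token_bytes(32)


def pseudonymise(customer_id: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Keyed hash of the ID: stable for linking; without the key,
    pseudonyms cannot be recomputed from known customer IDs."""
    return hmac.new(key, customer_id.encode(), hashlib.sha256).hexdigest()[:16]


def prepare_for_sharing(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Replace the direct identifier and copy only the fields the recipient needs."""
    return [
        {"customer": pseudonymise(r["customer_id"]), "region": r["region"], "title": r["title"]}
        for r in records  # email is deliberately not copied
    ]


records = [
    {"customer_id": "C1001", "email": "jane@example.com", "region": "London", "title": "Film A"},
    {"customer_id": "C1001", "email": "jane@example.com", "region": "London", "title": "Film B"},
    {"customer_id": "C2002", "email": "raj@example.com", "region": "Singapore", "title": "Film A"},
]
for row in prepare_for_sharing(records):
    print(row)
```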
Confidentiality concerns information shared within a relationship of trust, which should not be passed on to anyone else without consent. The most important relationship is between users; when it is not properly maintained, confidentiality issues arise.
Data privacy has many dimensions. Privacy means the right to prevent information from being disclosed to other individuals, so information privacy should be recognized. Individuals need to be able to restrict information about them in research, the workplace, or any other setting; if this is not recognized, there will be no measures to prevent the risk.
Even when data is disclosed with the individual's permission, authorization remains a major problem. The data should be protected with access control for users, which helps preserve its privacy and confidentiality.
To mitigate GE's risks, the legal, social, and ethical issues have been identified. Issues associated with the data must be identified before they can be mitigated, so this analysis is done early to mitigate them in advance.
References
Schneider, I., & Green, N. (2018). The Politics of Big Data: principles, policies, practices. In The Politics of Big Data (pp. 1-18). Routledge.
Sagiroglu, S., & Sinanc, D. (2013, May). Big data: A review. In 2013 International Conference on Collaboration Technologies and Systems (CTS) (pp. 42-47). IEEE.
Rani, D., & Ranjan, R. K. (2014). A comparative study of SaaS, PaaS, and IaaS in cloud computing. International Journal of Advanced Research in Computer Science and Software Engineering, 4(6).
Mohammed, C. M., & Zebaree, S. R. (2021). Sufficient comparison among cloud computing services: IaaS, PaaS, and SaaS: A review. International Journal of Science and Business, 5(2), 17-30.
Sadiku, M. N., Musa, S. M., & Momoh, O. D. (2014). Cloud computing: opportunities and challenges. IEEE Potentials, 33(1), 34-36.
Alkatheri, S., Abbas, S., & Siddiqui, M. (2019). A comparative study of big data frameworks. International Journal of Computer Science and Information Security (IJCSIS), 17(1).
Inoubli, W., Aridhi, S., Mezni, H., Maddouri, M., & Nguifo, E. M. (2018). An experimental survey on big data frameworks. Future Generation Computer Systems, 86, 546-564.
Ansari, M. H., Vakili, V. T., & Bahrak, B. (2019). Evaluation of big data frameworks for analysis of smart grids. Journal of Big Data, 6(1), 1-14.
Hayashi, K. (2013, September). Social issues of big data and Cloud: privacy, confidentiality, and public utility. In 2013 International Conference on Availability, Reliability, and Security (pp. 506-511). IEEE.
Cornetta, G., Touhafi, A., & Muntean, G. M. (2020). Social, Legal, and Ethical Implications of IoT, Cloud, and Edge Computing Technologies.
Awaysheh, F. M., Aladwan, M. N., Alazab, M., Alawadi, S., Cabaleiro, J. C., & Pena, T. F. (2021). Security by Design for Big Data Frameworks Over Cloud Computing. IEEE Transactions on Engineering Management.
Howe III, E. G., & Elenberg, F. (2020). Ethical Challenges Posed by Big Data. Innovations in Clinical Neuroscience, 17(10-12), 24.
Mantelero, A. (2018). AI and Big Data: A blueprint for a human rights, social and ethical impact assessment. Computer Law & Security Review, 34(4), 754-772.
Saqr, M. (2017). Big data and the emerging ethical challenges. International Journal of Health Sciences, 11(4), 1.
Analytics Vidhya, (2020). Data Science: The 5 V’s of Big Data. Available at: https://medium.com/analytics-vidhya/the-5-vs-of-big-data-2758bfcc51d [Accessed 14 December 2021]