Information Technology Assignment: Platforms for Big Data Analytics
Task: This information technology assignment expects the student to submit a critique report on a research article related to Big Data and its current trend technologies. The assessment is designed to improve students' presentation skills and to give them experience in researching a topic and writing a report or critique relevant to the Unit of Study subject matter. The research articles are intended to explore further research trends relevant to the unit content.
For this component you will prepare a report or critique on an academic paper related to Big Data or Big Data technologies, Big Data analytics, Big Data security and so on. The paper you select must be directly relevant to one of these major topics. The paper can come from any academic conference, relevant journal or online source such as Google Scholar or academic department repositories. The topic and the related paper need not be later than 2013-14.
Introduction to the theme of Information Technology Assignment
In the era of big data, simpler and more streamlined services are used to analyze and forecast customer behavior from information gathered from social networking sites, GPS-enabled systems and CCTV video. Big Data analytics deals with very large, highly varied and high-speed datasets, and handling such data with traditional methods, hardware or software systems is a great challenge. Big Data involves vast volumes of rapidly expanding data drawn from dynamic databases across heterogeneous domains. Rapid progress in networking, data management and data collection capacity has considerably increased the amount of data produced in almost all physical, clinical, biological and technical applications. The study presented in the article “Platforms for Big Data Analytics: Trends Towards Hybrid era” shows how the analysis of data samples reveals correlations with industrial patterns, disease control, crime prevention and so on. The article describes how scientists, corporate executives, medical practitioners, advertisers and administrators regularly face challenges with large volumes of detail in areas such as online search, finishing technologies, medical informatics, international information technology and corporate statistics. Various Big Data platforms with different features are available, and a thorough understanding of their functionality is needed to choose a suitable one. In particular, the ability of a platform to adapt to growing data management demands is a crucial element when developing analytics solutions on that platform.
Purpose of the article:
To analyze different platforms that can be used for Big Data analytics
Which platforms are appropriate for Big Data Processing?
Big Data analytics applies sophisticated statistical techniques to broad and varied data sets, which include structured, semi-structured and unstructured data from various sources. With the help of distributed networks, Big Data technologies can store data elements in many locations and retrieve them together through the recognition system. Big Data calls for a number of analytical methods and approaches to extract insights. Big Data means processing enormous volumes of data, and the modular architecture of the cloud allows data-intensive systems that support enterprise analytics to be deployed. The cloud further simplifies networking and coordination within an enterprise, giving more staff access to appropriate analytics and facilitating data exchange. The market offers a range of Big Data tools, including Hadoop for storing and manipulating massive amounts of information, Spark for in-memory computation, Storm for fast processing of unbounded data streams, Apache Cassandra for high reliability and scalability of the data store, and MongoDB for cross-platform capabilities, so each Big Data technology provides different functionality. The author discusses different data mining techniques in the article. Big Data analysis currently makes it possible to exploit and combine traditional, more hierarchical data with other types of data, such as tweets, images, social media communications, computer records, and video or voice recordings. Only the features of Big Data and the analytical technologies currently available allow this kind of data to be processed. Large amounts of both structured and unstructured information require additional capacity, storage and processing. Cloud computing refers to the delivery of computing services over the internet. These services include servers, computing, databases, applications, analytics and networking, and can be configured according to the client's requirements.
Cloud services are billed on a pay-per-use basis. They are highly scalable, and services can be scaled efficiently according to demand. The cloud not only offers readily accessible resources but can also extend them very rapidly to handle large traffic or usage spikes. Big Data mining in the cloud has made analysis less expensive: in addition to reducing local infrastructure, it saves expenses associated with equipment repairs and upgrades, electricity usage, plant operations and more. Big Data can also be used very effectively in risk management. It plays an important role in analyzing the efficiency of product development, and it is considerably quicker than other frameworks. It is therefore essential to consider the requirements for selecting the most suitable and flexible Big Data tools. Cloud computing is important to Big Data because it can provide many kinds of technology and diverse resources on request. In addition, Big Data analytics needs to embrace this pattern because cloud computing is a dominant and extremely scalable business strategy. Although the article highlights many positive points about the different platforms for Big Data analytics, the study still leaves a research gap regarding which platforms most efficiently solve the problems arising from the performance and scalability of Big Data algorithms.
The author conducted qualitative research on the application of different platforms for Big Data analysis. A large-scale data mining framework is discussed to meet the needs of Big Data analysts who handle real-world domain problems. By applying these well documented and commonly deployed data mining techniques, future patterns in Big Data handling and analysis can be forecast by taking into account the characteristics of the available application frameworks and platforms. As the amount of stored data increases exponentially, the flood of data that many businesses have to confront overwhelms them. Many organizations working with such data sets therefore need a technique known as data scaling to handle, store and process this excess. A scalable data platform adapts quickly to shifts in traffic or to mass growth of information. Such platforms use software or additional hardware to improve data throughput and capacity. An organization with a scalable data platform is therefore equipped to meet growing data requirements. According to the author, data scaling can be divided into two groups: horizontal and vertical scaling. Horizontal scaling improves efficiency while reducing financial expenditure. It is very powerful and can in principle be extended without limit, because the amount of scaling is not restricted. Its key downside is the limited availability of software systems that make adequate use of horizontal scaling. With vertical scaling, infrastructure management and implementation are convenient, but the major downsides are the limit on how far a single platform's capacity can be scaled and the considerable financial investment required. Massive, network-based systems also produce Big Data. For files in non-standard formats, machine learning services from the cloud service provider can be used to standardize the information.
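The idea behind horizontal scaling described above can be illustrated with a minimal sketch. It is not taken from the article: the function names `assign_node` and `load_per_node`, the record keys and the node counts are all hypothetical, and hash partitioning is only one of several ways a platform might spread data over machines. The sketch shows that adding nodes lowers the number of records any single node has to hold.

```python
import hashlib
from collections import Counter


def assign_node(key: str, node_count: int) -> int:
    """Map a record key to a node index using a stable hash (hypothetical scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count


def load_per_node(keys: list[str], node_count: int) -> Counter:
    """Count how many records each node would hold for a given cluster size."""
    counts = Counter({node: 0 for node in range(node_count)})
    counts.update(assign_node(key, node_count) for key in keys)
    return counts


# Illustrative data only: 10,000 made-up record keys.
keys = [f"record-{i}" for i in range(10_000)]
for nodes in (2, 4, 8):
    load = load_per_node(keys, nodes)
    print(nodes, "nodes -> max records on one node:", max(load.values()))
```

Vertical scaling would instead keep `node_count` at 1 and rely on a larger machine, which is why its capacity is bounded by what a single platform can offer.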
The information can be used in various ways through the cloud platform. In Big Data analysis, time and technology are key considerations. Because the tasks are time-critical, the analysis requires huge computations, a range of costly applications and enormous integration efforts. Cloud technology can address all these issues, since it delivers on-demand services with prices commensurate with actual use. Cloud technology also enables Big Data to be processed in real time: it can take large bursts of data and process them in real time over high-capacity networks. The capacity of the cloud allows Big Data processing to happen in a fraction of the time. The author also discusses different tools and analyzes their efficiency for Big Data analysis. Currently, open-source technology plays a pioneering role in the Big Data realm, and various open-source, distributed data processing tools for companies are now widespread. In this sector, however, there are several data mining applications aimed at users who are able to handle and keep up with new requirements and trends. Hadoop is an open-source framework for storing data and running applications on clusters of multiple machines. It offers large storage, huge computing capacity and the ability to handle a nearly unlimited number of concurrent tasks for all types of data. MapReduce is a versatile programming interface for many kinds of Big Data processing. However, there are a few cases where this model does not fit well, and alternative solutions are then required. In recent years, there has been a great deal of interest in scaling machine learning techniques through cluster-based approaches. There are no straightforward or final solutions. So far, only the scalability challenges of machine learning algorithms have been given much consideration; integration problems have been addressed considerably less.
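The MapReduce model mentioned above can be sketched in a few lines of plain Python. This is an in-memory illustration of the map, shuffle and reduce phases on a word-count task, not Hadoop's actual API; the function names and the sample documents are assumptions made for the example. On a real cluster, each phase would run in parallel across many machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents: list[str]) -> dict[str, int]:
    """Run the three phases in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))


# Illustrative input only.
documents = ["big data needs big storage", "data moves fast"]
print(word_count(documents))
```

Because the map step treats each document independently and the reduce step treats each key independently, both can be distributed; tasks that need many passes over the same data fit this shape less well, which is one reason alternatives are sought.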
Along with the discussion of different Big Data tools, the author also highlights some issues with their use. Apache Mahout is the leading machine learning library among these implementations. Mahout is implemented primarily in Java and is designed for Apache Hadoop. While it is valuable, it has some serious problems, such as the fact that it is only a library and does not offer a standalone user interface. It is used as a machine learning platform on Hadoop, and work on a complete deployment is still under way. The volume, velocity and variety of Big Data also create storage problems. Storing data on traditional physical media is difficult, because hard drives sometimes fail and conventional data retention systems are not effective for petabyte-scale storage. The consistency and scalability features of such software vary across systems and frameworks. To use these features adequately, quality and scalability for Big Data cannot be achieved within a single platform or system alone. One of the main problems is integrating these application components, whose interfaces often do not match. This is the biggest obstacle to data mining.
After detailed research on the different frameworks and tools for Big Data analysis, the author assessed their efficiency with proper justification. A powerful, scalable hybrid framework with sufficient plumbing, combining the right mix of data mining (DM) algorithm efficiency and scalability, can be an ideal approach to Big Data problems. While this hybrid pattern is still being explored, sustained attention to choosing suitable, interconnected platforms is likely to yield exciting developments in data mining in the coming years.
Hence, the study shows that Big Data is essential for data management in the modern era. Managing such huge amounts of data efficiently is very difficult. Big Data comprises structured, semi-structured and unstructured information. Standard data processing tools cannot store and analyze data at this scale; advanced tools for Big Data analytics are needed. Big Data applies to huge and diverse collections of data. Special technologies, such as massively parallel processing, relational computing, data processing grids, flexible data storage, and advanced computing infrastructure, frameworks, networks and applications, are also essential for efficient Big Data management. In response to the research question of which Big Data platforms are suitable, different hybrid frameworks and techniques have been introduced to handle the data properly. The author has analyzed the application of different Big Data tools and discussed their efficiency. In the modern world, Big Data and cloud services play a major part. Together, the two offer business opportunities to people with brilliant ideas and little capital. They also enable existing enterprises to use data that was collected but never analyzed before. Further research on other tools and frameworks for Big Data analytics is required to gain a clearer view of their efficiency.
 Acharjya, D.P. and Ahmed, K., 2016. A survey on big data analytics: challenges, open research issues and tools. International Journal of Advanced Computer Science and Applications, 7(2), pp.511-518.
 Anuradha, J., 2015. A brief introduction on Big Data 5Vs characteristics and Hadoop technology. Procedia computer science, 48, pp.319-324.
 Babar, M., Arif, F., Jan, M.A., Tan, Z. and Khan, F., 2019. Urban data management system: Towards Big Data analytics for Internet of Things based smart urban environment using customized Hadoop. Future Generation Computer Systems, 96, pp.398-409.
 Dagade, V., Lagali, M., Avadhani, S. and Kalekar, P., 2015. Big data weather analytics using hadoop. International Journal of Emerging Technology in Computer Science & Electronics (IJETCSE) ISSN, pp.0976-1353.
 Gahi, Y., Guennoun, M. and Mouftah, H.T., 2016, June. Big data analytics: Security and privacy challenges. In 2016 IEEE Symposium on Computers and Communication (ISCC) (pp. 952-957). IEEE.
 Griffith, E., 2016. What is cloud computing. Retrieved from PC Mag: http://au.pcmag.com/networking-communications-software-products/29902/feature/what-is-cloud-computing.
 Grover, P. and Kar, A.K., 2017. Big data analytics: A review on theoretical contributions and tools used in literature. Global Journal of Flexible Systems Management, 18(3), pp.203-229.
 Jain, V.K., 2017. Big Data and Hadoop. Khanna Publishing.
 Marinescu, D.C., 2017. Cloud computing: theory and practice. Morgan Kaufmann.
 Reyes-Ortiz, J.L., Oneto, L. and Anguita, D., 2015. Big data analytics in the cloud: Spark on Hadoop vs MPI/OpenMP on Beowulf. Procedia Computer Science, 53, pp.121-130.
 Rittinghouse, J.W. and Ransome, J.F., 2016. Cloud computing: implementation, management, and security. CRC Press.
 Saggi, M.K. and Jain, S., 2018. A survey towards an integration of big data analytics to big insights for value-creation. Information Processing & Management, 54(5), pp.758-790.
 Sogodekar, M., Pandey, S., Tupkari, I. and Manekar, A., 2016, December. Big data analytics: hadoop and tools. In 2016 IEEE Bombay Section Symposium (IBSS) (pp. 1-6). IEEE.
 Wu, W., Lin, W., Hsu, C.H. and He, L., 2018. Energy-efficient hadoop for big data analytics and computing: A systematic review and research insights. Future Generation Computer Systems, 86, pp.1351-1367.
 Yao, Q., Tian, Y., Li, P.F., Tian, L.L., Qian, Y.M. and Li, J.S., 2015. Design and development of a medical big data processing system based on Hadoop. Journal of medical systems, 39(3), pp.1-11.