Overview: The purpose of this assignment is to explore the current challenges, trends, and tools associated with conducting a cluster analysis.
Data storage used to be the biggest challenge with big data. However, due to advances in cloud infrastructure, storing data is no longer a key concern. Accessing data is now the imperative that data scientists face. Clustering has made big data analysis much easier, but it has introduced its own set of challenges that data engineers must address.
Primary challenges with data clustering: Describe and explain one or two of the primary challenges associated with data clustering techniques. For example, selecting the correct algorithm for the analysis can be difficult. The literature on cluster analysis includes thousands of different algorithms that can be used. Finding the right one for a particular problem requires expertise in the myriad mathematical and computational options that are available.
Current trends in technology for mitigating challenges: Describe one trend in data science that can help organizations overcome today’s data clustering challenges. For example, hierarchical clustering has gained traction in machine learning because it does not require the data scientist to pre-specify the number of clusters.
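As a rough illustration of the hierarchical clustering example, the sketch below merges the two closest clusters repeatedly and stops once the closest remaining pair is farther apart than a distance threshold, so the number of clusters is a result rather than an input. The function name `agglomerative`, the single-linkage rule, the threshold value, and the sample points are all assumptions made for illustration. They are not taken from any particular tool or data set.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def agglomerative(points, threshold):
    """Single-linkage agglomerative clustering (illustrative sketch).

    Starts with every point in its own cluster and merges the two
    closest clusters until the closest pair is more than `threshold`
    apart. The number of clusters is never specified in advance.
    """
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None  # (distance, index_i, index_j) of the closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the nearest members.
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # remaining clusters are too far apart to merge
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical 2-D observations chosen only to make the groups obvious.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.1, 4.8), (9.0, 1.0)]
clusters = agglomerative(points, threshold=1.0)
print(len(clusters), sorted(len(c) for c in clusters))
```

The trade-off is that the analyst still has to choose a linkage rule and a cut-off distance, so the decision about how many clusters to keep is moved rather than eliminated.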
DM tools and real-world scenario: Describe one tool that can support the process of data clustering and provide a real-world example or scenario where this tool has helped an organization overcome its data clustering challenges. For example, the open-source tool Weka has a Cluster panel that can be used to identify the attributes of a bank's most loyal customers so that specialized digital marketing campaigns can be targeted at them.
Things to be covered in the essay: