Data Mining Research Assignment.

Discusses strategy and how ERM can be integrated with an organization’s overall strategy. Prepare a research paper on some of the various issues, protocols, methods, and frameworks you found, and discuss how, if possible, organizations can use ERM as strategy. It is perfectly acceptable if you deem ERM cannot be used as strategy; just back up your claim with scholarly research and justifications.

Your paper should meet these requirements:

  • Be approximately four to six pages in length, not including the required cover page and reference page.
  • Follow APA 7 guidelines. Your paper should include an introduction, a body with fully developed content, and a conclusion.
  • Support your answers with the readings from the course and at least two scholarly journal articles to support your positions, claims, and observations, in addition to your textbook. The UC Library is a great place to find resources.
  • Be clearly and well-written, concise, and logical, using excellent grammar and style techniques. You are being graded in part on the quality of your writing.
  • Data mining is the exploration and analysis of large data sets to discover meaningful patterns and rules. It’s considered a discipline under the data science field of study and differs from predictive analytics in that data mining describes historical data, while predictive analytics aims to predict future outcomes. Additionally, data mining techniques are used to build machine learning (ML) models that power modern artificial intelligence (AI) applications such as search engine algorithms and recommendation systems.

    Applications of Data Mining

    • Database Marketing and Targeting
    • Credit Risk Management and Credit Scoring
    • Fraud Detection and Prevention
    • Healthcare Bioinformatics
    • Spam Filtering
    • Recommendation Systems
    • Sentiment Analysis
    • Qualitative Data Mining (QDM)

    Database Marketing and Targeting

    Retailers use data mining to better understand their customers. Data mining allows them to segment their market more precisely and offer customized promotions to different groups of consumers.

    How to do Data Mining

    The accepted data mining process involves six steps:

    1. Business understanding

      The first step is establishing what the goals of the project are and how data mining can help you reach them. A plan should be developed at this stage to include timelines, actions, and role assignments.

    2. Data understanding

      Data is collected from all applicable data sources in this step. Data visualization tools are often used in this stage to explore the properties of the data to ensure it will help achieve the business goals.

    3. Data preparation

      Data is then cleansed, and missing values are filled in to ensure the data is ready to be mined. Data processing can take enormous amounts of time depending on the amount of data analyzed and the number of data sources. Therefore, distributed systems are used in modern database management systems (DBMS) to improve the speed of the data mining process rather than burdening a single system. They’re also more secure than having all of an organization’s data in a single data warehouse. It’s important to include failsafe measures in the data manipulation stage so data is not permanently lost.
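
      The cleansing step can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the records, the column names, and the choice of mean imputation are all assumptions made for the example. Both helpers return new lists and leave the original records untouched, which is one simple way to keep the raw data recoverable.

```python
from statistics import mean

# Hypothetical customer records; None marks a missing value.
records = [
    {"customer": "A", "age": 34, "spend": 120.0},
    {"customer": "B", "age": None, "spend": 80.0},
    {"customer": "C", "age": 46, "spend": None},
    {"customer": "A", "age": 34, "spend": 120.0},  # exact duplicate of the first row
]

def drop_duplicates(rows):
    """Keep the first occurrence of each identical record."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def impute_mean(rows, column):
    """Return copies of rows with missing values in `column` replaced by the column mean."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

clean = drop_duplicates(records)
for col in ("age", "spend"):
    clean = impute_mean(clean, col)

for row in clean:
    print(row)
```

      Mean imputation is only one option; depending on the data, dropping incomplete rows or using a median may be more appropriate.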

    4. Data Modeling

      Mathematical models are then used to find patterns in the data using sophisticated data tools.

    5. Evaluation

      The findings are evaluated and compared to business objectives to determine if they should be deployed across the organization.

    6. Deployment

      In the final stage, the data mining findings are shared across everyday business operations. An enterprise business intelligence platform can be used to provide a single source of the truth for self-service data discovery.

    Benefits of Data Mining

    • Automated Decision-Making

      Data Mining allows organizations to continually analyze data and automate both routine and critical decisions without the delay of human judgment. Banks can instantly detect fraudulent transactions, request verification, and even secure personal information to protect customers against identity theft. Deployed within a firm’s operational algorithms, these models can collect, analyze, and act on data independently to streamline decision making and enhance the daily processes of an organization.

    • Accurate Prediction and Forecasting

      Planning is a critical process within every organization. Data mining facilitates planning and provides managers with reliable forecasts based on past trends and current conditions. Macy’s implements demand forecasting models to predict the demand for each clothing category at each store and route the appropriate inventory to efficiently meet the market’s needs.

    • Cost Reduction

      Data mining allows for more efficient use and allocation of resources. Organizations can plan and make automated decisions with accurate forecasts that will result in maximum cost reduction. Delta embedded RFID chips in passengers’ checked baggage and deployed data mining models to identify holes in their process and reduce the number of bags mishandled. This process improvement increases passenger satisfaction and decreases the cost of searching for and re-routing lost baggage.

    • Customer Insights

      Firms deploy data mining models from customer data to uncover key characteristics and differences among their customers. Data mining can be used to create personas and personalize each touchpoint to improve overall customer experience. In 2017, Disney invested over one billion dollars to create and implement “Magic Bands.” These bands have a symbiotic relationship with consumers, working to increase their overall experience at the resort while simultaneously collecting data on their activities for Disney to analyze to further enhance their customer experience.

    Challenges of Data Mining

    While a powerful process, data mining is hindered by the increasing quantity and complexity of big data. With exabytes of data collected by firms every day, decision-makers need ways to extract, analyze, and gain insight from their abundant repository of data.

    • Big Data

      The challenges of big data are prolific and penetrate every field that collects, stores, and analyzes data. Big data is characterized by four major challenges: volume, variety, veracity, and velocity. The goal of data mining is to mediate these challenges and unlock the data’s value.

      Volume describes the challenge of storing and processing the enormous quantity of data collected by organizations. This enormous amount of data presents two major challenges: first, it is more difficult to find the correct data, and second, it slows down the processing speed of data mining tools.

      Variety encompasses the many different types of data collected and stored. Data mining tools must be equipped to simultaneously process a wide array of data formats. Failing to focus an analysis on both structured and unstructured data inhibits the value added by data mining.

      Velocity details the increasing speed at which new data is created, collected, and stored. While volume refers to increasing storage requirements and variety refers to the increasing types of data, velocity is the challenge associated with the rapidly increasing rate of data generation.

      Finally, veracity acknowledges that not all data is equally accurate. Data can be messy, incomplete, improperly collected, and even biased. As with anything, the quicker data is collected, the more errors will manifest within the data. The challenge of veracity is to balance the quantity of data with its quality.

    • Over-Fitting Models

      Over-fitting occurs when a model explains the natural errors within the sample instead of the underlying trends of the population. Over-fitted models are often overly complex and utilize an excess of independent variables to generate a prediction. Therefore, the risk of over-fitting is heightened by the increase in volume and variety of data. Too few variables make the model irrelevant, whereas too many variables restrict the model to the known sample data. The challenge is to moderate the number of variables used in data mining models and balance their predictive power with accuracy.
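
      A small sketch can make the effect concrete. Everything here is an assumption chosen for illustration: the sample follows the trend y = 2x with alternating noise of +1 and -1, and the over-fitted model is a stand-in that simply memorizes the sample (it returns the y value of the closest training point) rather than a model with too many variables. The memorizing model reproduces the training data, noise included, and tends to do worse on held-out points than a plain straight-line fit.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical sample: underlying trend y = 2x plus alternating noise (+1, -1).
train_x = [0, 1, 2, 3, 4, 5]
train_y = [1, 1, 5, 5, 9, 9]
# Held-out observations that follow the underlying trend.
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [2 * x for x in test_x]

slope, intercept = fit_line(train_x, train_y)

def simple_model(x):
    """Straight line fitted to the sample."""
    return slope * x + intercept

def memorizing_model(x):
    """Return the y of the closest training x, reproducing the sample's noise."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

for name, model in [("simple", simple_model), ("memorizing", memorizing_model)]:
    print(name,
          "train MSE:", round(mse(model, train_x, train_y), 3),
          "test MSE:", round(mse(model, test_x, test_y), 3))
```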

    • Cost of Scale

      As data velocity continues to increase data’s volume and variety, firms must scale these models and apply them across the entire organization. Unlocking the full benefits of data mining with these models requires significant investment in computing infrastructure and processing power. To reach scale, organizations must purchase and maintain powerful computers, servers, and software designed to handle the firm’s large quantity and variety of data.

    • Privacy and Security

      The increased storage requirement of data has forced many firms to turn toward cloud computing and storage. While the cloud has empowered many modern advances in data mining, the nature of the service creates significant privacy and security threats. Organizations must protect their data from malicious figures to maintain the trust of their partners and customers.

      With data privacy comes the need for organizations to develop internal rules and constraints on the use and implementation of a customer’s data. Data mining is a powerful tool that provides businesses with compelling insights into their consumers. However, at what point do these insights infringe on an individual’s privacy? Organizations must weigh this relationship with their customers, develop policies to benefit consumers, and communicate these policies to the consumers to maintain a trustworthy relationship.

    Types of Data Mining

    Data mining has two primary processes: supervised and unsupervised learning.

    • Supervised Learning

      The goal of supervised learning is prediction or classification. The easiest way to conceptualize this process is to look for a single output variable. A process is considered supervised learning if the goal of the model is to predict the value of an observation. One example is spam filters, which use supervised learning to classify incoming emails as unwanted content and automatically remove these messages from your inbox.

      Common analytical models used in supervised data mining approaches are:

      • Linear Regressions

        Linear regressions predict the value of a continuous variable using one or more independent inputs. Realtors use linear regressions to predict the value of a house based on square footage, bed-to-bath ratio, year built, and zip code.
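
        As a rough sketch of the idea, the following fits a straight line with a single input, square footage, using ordinary least squares. The house data is made up for the example, and a realistic model would use several inputs rather than one.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical sales: square footage and sale price.
sqft = [1000, 1500, 2000, 2500]
price = [200_000, 275_000, 350_000, 425_000]

slope, intercept = fit_line(sqft, price)

# Predict the price of a house that was not in the sample.
predicted = slope * 1800 + intercept
print(f"price per sq ft: {slope:.2f}, predicted price for 1800 sq ft: {predicted:.0f}")
```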

      • Logistic Regressions

        Logistic regressions predict the probability of a categorical variable using one or more independent inputs. Banks use logistic regressions to predict the probability that a loan applicant will default based on credit score, household income, age, and other personal factors.
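
        A minimal sketch of the loan example, assuming a single input (credit score) and a made-up set of seven applicants. The model is fitted with plain gradient descent; the learning rate, step count, and the rescaling of scores around 650 are illustrative choices, not recommendations.

```python
from math import exp

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, steps=2000):
    """Fit weight and bias by gradient descent on the average log loss."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(weight * x + bias) - y for x, y in zip(xs, ys)]
        weight -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        bias -= learning_rate * sum(errors) / n
    return weight, bias

# Hypothetical applicants: credit score and whether they defaulted (1) or not (0).
scores = [500, 550, 600, 650, 700, 750, 800]
defaulted = [1, 1, 1, 0, 1, 0, 0]

# Rescale scores so gradient descent behaves well.
scaled = [(s - 650) / 100 for s in scores]
weight, bias = fit_logistic(scaled, defaulted)

def default_probability(score):
    """Estimated probability of default for a given credit score."""
    return sigmoid(weight * (score - 650) / 100 + bias)

for s in (520, 650, 780):
    print(s, round(default_probability(s), 2))
```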

      • Time Series

        Time series models are forecasting tools which use time as the primary independent variable. Retailers, such as Macy’s, deploy time series models to predict the demand for products as a function of time and use the forecast to accurately plan and stock stores with the required level of inventory.
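
        One of the simplest members of this family is exponential smoothing, sketched below. It is used here only as an illustration; the weekly sales figures and the smoothing factor `alpha` are assumptions, and this is not a description of any particular retailer's forecasting model.

```python
def exponential_smoothing(series, alpha):
    """Return a one-step-ahead forecast: each new value pulls the level toward it by `alpha`."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

# Hypothetical weekly unit sales for one product at one store, oldest first.
weekly_units = [100, 120, 110, 130]

forecast = exponential_smoothing(weekly_units, alpha=0.5)
print("forecast for next week:", forecast)
```

        A larger `alpha` reacts faster to recent weeks; a smaller one smooths out short-term swings.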

      • Classification or Regression Trees

        Classification and regression trees are a predictive modeling technique that can be used to predict the value of both categorical and continuous target variables. Based on the data, the model will create sets of binary rules to split and group the highest proportion of similar target values together. Following those rules, the group that a new observation falls into will become its predicted value.
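
        The search for a single binary rule can be sketched as follows. This is a one-split tree (a "stump") on one input, scored with Gini impurity, which is one common splitting criterion; the income figures and labels are hypothetical. A full tree would repeat the same search inside each resulting group.

```python
def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Try a threshold between each pair of adjacent values; return (threshold, impurity)."""
    best = None
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        # Weighted average impurity of the two groups the rule creates.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical applicants: income in thousands and loan outcome.
income = [20, 25, 30, 60, 70, 80]
outcome = ["default", "default", "default", "repay", "repay", "repay"]

threshold, impurity = best_split(income, outcome)
print(f"rule: income <= {threshold}, weighted impurity: {impurity}")
```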

      • Neural Networks

        A neural network is an analytical model inspired by the structure of the brain, its neurons, and their connections. These models were originally created in the 1940s but have only recently gained popularity with statisticians and data scientists. Each node in a neural network receives inputs and, based on their magnitude, will “fire” or “not fire” depending on its threshold requirement. This signal, or lack thereof, is then combined with the other “fired” signals in the hidden layers of the network, where the process repeats itself until an output is created. Since one of the benefits of neural networks is a near-instant output, self-driving cars are deploying these models to accurately and efficiently process data to autonomously make critical decisions.
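
        The “fire” or “not fire” behavior can be sketched with a tiny two-layer network of threshold nodes. The weights and thresholds below are hand-picked for the example so that the network computes exclusive-or; they are not learned from data, and modern networks typically use smooth activation functions rather than hard thresholds.

```python
def fires(inputs, weights, threshold):
    """A node fires (returns 1) when its weighted input reaches its threshold."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def xor_network(x1, x2):
    """Two hidden nodes feed one output node; weights are chosen by hand."""
    h_or = fires([x1, x2], [1, 1], threshold=1)    # fires if either input is on
    h_and = fires([x1, x2], [1, 1], threshold=2)   # fires only if both inputs are on
    # Output fires when "either" fired but "both" did not.
    return fires([h_or, h_and], [1, -1], threshold=1)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_network(a, b))
```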

      • K-Nearest Neighbor

        The K-nearest neighbor method is used to categorize a new observation based on past observations. Unlike the previous methods, k-nearest neighbor is data-driven, not model-driven. This method makes no underlying assumptions about the data, nor does it employ complex processes to interpret its inputs. The basic idea of the k-nearest neighbor model is that it classifies a new observation by identifying its K closest neighbors and assigning it the majority’s value. Many recommender systems nest this method to identify and classify similar content, which will later be pulled by the greater algorithm.
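
        The majority vote can be sketched directly. The two-dimensional points and genre labels below are hypothetical, and Euclidean distance is assumed as the measure of closeness.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k):
    """Label `query` with the majority label among its k closest training points."""
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical past observations: (feature vector, label).
train = [
    ((1, 1), "comedy"), ((1, 2), "comedy"), ((2, 1), "comedy"),
    ((6, 6), "drama"), ((7, 6), "drama"), ((6, 7), "drama"),
]

print(knn_predict(train, (2, 2), k=3))
print(knn_predict(train, (6, 5), k=3))
```

        An odd value of K avoids ties when there are two classes.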

    • Unsupervised Learning

      Unsupervised tasks focus on understanding and describing data to reveal underlying patterns within it. Recommendation systems employ unsupervised learning to track user patterns and provide them with personalized recommendations to enhance their customer experience.

      Common analytical models used in unsupervised data mining approaches are:

      • Clustering

        Clustering models group similar data together. They are best employed with complex data sets describing a single entity. One example is lookalike modeling, to group similarities between segments, identify clusters, and target new groups who look like an existing group.
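
        As an illustration, the sketch below uses k-means, one widely used clustering method. The points and the starting centroids are assumptions chosen for the example; in practice the starting centroids are usually picked at random and the algorithm is run several times.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-centering."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers described by two numeric attributes.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]

centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
for center, members in zip(centroids, clusters):
    print(center, members)
```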

      • Association Analysis

        Association analysis is also known as market basket analysis and is used to identify items that frequently occur together. Supermarkets commonly use this tool to identify paired products and spread them out in the store to encourage customers to pass by more merchandise and increase their purchases.
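
        Two standard measures in association analysis are support (how often a pair appears across all baskets) and confidence (how often the second item appears given the first). The sketch below computes both for a handful of made-up baskets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

# Count how many baskets contain each item and each pair of items.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

def support(item_a, item_b):
    """Share of all baskets that contain both items."""
    return pair_counts[tuple(sorted((item_a, item_b)))] / len(baskets)

def confidence(antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    return pair_counts[tuple(sorted((antecedent, consequent)))] / item_counts[antecedent]

print("support(bread, butter):", support("bread", "butter"))
print("confidence(bread -> butter):", confidence("bread", "butter"))
```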

      • Principal Component Analysis

        Principal component analysis is used to illustrate hidden correlations between input variables and create new variables, called principal components, which capture the same information contained in the original data, but with fewer variables. By reducing the number of variables used to convey the same level of information, analysts can increase the utility and accuracy of supervised data mining models.
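
        For two input variables, the first principal component can be found in closed form from the covariance matrix. The sketch below does this for a small made-up data set in which the second variable is roughly twice the first; with more than two variables a linear algebra library would normally be used instead.

```python
from math import atan2, cos, sin, sqrt

def pca_2d(xs, ys):
    """Return (direction, scores, explained_ratio) for the first principal component."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum(d * d for d in dx) / (n - 1)
    syy = sum(d * d for d in dy) / (n - 1)
    sxy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    half_gap = sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    largest = (sxx + syy) / 2 + half_gap
    smallest = (sxx + syy) / 2 - half_gap
    # Angle of the axis along which the data varies most.
    theta = 0.5 * atan2(2 * sxy, sxx - syy)
    direction = (cos(theta), sin(theta))
    # One new variable per observation: its position along that axis.
    scores = [a * direction[0] + b * direction[1] for a, b in zip(dx, dy)]
    return direction, scores, largest / (largest + smallest)

# Hypothetical, strongly correlated inputs.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

direction, scores, explained = pca_2d(xs, ys)
print("direction:", direction)
print("share of variance kept by one component:", round(explained, 4))
```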

    • Supervised and Unsupervised Approaches in Practice

      While you can use each approach independently, it is quite common to use both during an analysis. Each approach has unique advantages, and they combine to increase the robustness, stability, and overall utility of data mining models. Supervised models can benefit from nesting variables derived from unsupervised methods. For example, a cluster variable within a regression model allows analysts to eliminate redundant variables from the model and improve its accuracy. Because unsupervised approaches reveal the underlying relationships within data, analysts should use the insights from unsupervised learning to springboard their supervised analysis.