Saturday, April 2, 2011

Ensuring Data Quality: Negative Feedback Control System

Good-quality data is essential for intelligent decision making. We can build a top-class data analytics system, but if the raw data is distorted, the decisions based on that data will also be wrong (garbage in, garbage out).
We have over-invested in the development of decision-making systems but under-invested in the effort, processes, and systems needed to ensure quality data entry. As a result, our systems hold huge amounts of junk data, and even with smart analytics methods we end up making biased or wrong decisions.

Junk data enters the system for the following reasons:
  1. Users are careless about entering accurate data. For example, in a web survey, people may answer the questions without even reading them (just to save time).
  2. Users can purposefully enter wrong data to make their case and meet their short-term needs. For example, companies can overstate profits to secure better year-end bonuses for executives.
  3. Users are aware that most of the time they will not be caught entering wrong data. They can manipulate data in different systems as and when needed; there is no integration, verification, or reconciliation of data in most systems.

The proposed idea is a generic framework that can be used to ensure good-quality data is fed into the system.

Basic Principle

Design an intelligent IT system with controls in place to persuade users to enter correct data. This can be done by integrating IT systems across the enterprise and putting logic in place to validate entered data against the data already available in the system.

Case for Better Technology Investment Process
For example, consider the investment decision-making process. These decisions are often not objective and depend on the face value of the project's champion. To make the case for a project, the champion can present very high Cost Benefit Analysis (CBA) numbers and get Finance/top-management approval. The decision-making body does not have adequate tools to analyze the case against historical data, and the champion never fears being questioned for artificially inflating CBA/ROI numbers.

If we can create a system where the project cost (including maintenance and operational cost) and the business benefits (revenue growth, cost savings, etc.) are tracked and then compared against the proposed CBA/ROI, we can close this gap. The Finance department/investment decision-making body will have better information with which to make investment decisions. The project owner/champion will also make sure the CBA/ROI numbers are realistic, since they will be tracked against his or her performance (if the proposed CBA does not match the actual CBA). This way we can make sure realistic data is fed into the system, and the decision-making process improves. Just as a data point, Gartner research suggests that 80% of IT investment money is 'dead money', a simple waste of resources. Even a 50% improvement would be a huge benefit.
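As a minimal sketch of the tracking idea, the check below compares each project's proposed CBA against the actuals recorded after delivery and flags large shortfalls for review. The project names, figures, and the 25% tolerance are all illustrative assumptions, not real data.

```python
# Hypothetical sketch: compare each project's proposed CBA against the
# actuals tracked after delivery, and flag large gaps for review.
# Project names, figures, and the 25% tolerance are invented examples.

def flag_cba_gaps(projects, tolerance=0.25):
    """Return (name, shortfall) pairs for projects whose actual benefit
    fell short of the proposal by more than `tolerance` (as a fraction
    of the proposed value)."""
    flagged = []
    for p in projects:
        shortfall = (p["proposed_cba"] - p["actual_cba"]) / p["proposed_cba"]
        if shortfall > tolerance:
            flagged.append((p["name"], round(shortfall, 2)))
    return flagged

projects = [
    {"name": "CRM upgrade", "proposed_cba": 1_000_000, "actual_cba": 950_000},
    {"name": "Data lake",   "proposed_cba": 2_000_000, "actual_cba": 900_000},
]

print(flag_cba_gaps(projects))  # [('Data lake', 0.55)]
```

A champion who knows this report lands on the decision-making body's desk has a direct incentive to propose realistic numbers.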

How can we do that?

We need to integrate the IT systems that currently work in silos, then build a decision-making system that reconciles data from them to create business insights. In the previous example, we would integrate the time tracking/project management systems (the source of cost numbers) with the marketing and finance systems (the source of profit/revenue numbers).
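A reconciliation of this kind could be sketched as a simple join on project id across the two silos; records that exist in only one system are themselves a data-quality finding. The system names, ids, and amounts below are invented for illustration.

```python
# Illustrative sketch: reconcile two siloed systems by joining on
# project id. The data and field names are invented assumptions.

cost_system = {        # project_id -> total cost from timesheets
    "P-101": 400_000,
    "P-102": 250_000,
}
finance_system = {     # project_id -> realized revenue/savings
    "P-101": 700_000,
    "P-103": 120_000,  # present in finance but missing from time tracking
}

def reconcile(costs, benefits):
    """Join the two systems on project id; report the net benefit, and
    flag any project that exists in only one system (a quality gap)."""
    report = {}
    for pid in sorted(set(costs) | set(benefits)):
        if pid in costs and pid in benefits:
            report[pid] = benefits[pid] - costs[pid]
        else:
            missing_side = "FINANCE" if pid in costs else "TIME TRACKING"
            report[pid] = "MISSING IN " + missing_side
    return report

print(reconcile(cost_system, finance_system))
```

The "missing" entries are exactly the integration gaps the post argues allow junk data to go unchallenged.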

What is the USP of this solution?

An intelligent system that ensures users cannot manipulate data for their own needs and therefore enter correct data. This can be achieved by integrating the enterprise's different systems of record (a single view across the project management system, revenue accounting system, HR management system, etc.) and creating intelligent business logic to verify the entered data (query systems, reports).

This follows the well-known principle of negative feedback control: users strive to enter correct data in their own interest. Let's take the same example of the IT decision-making system.

If users enter abnormally high CBA values for a project to push its case (perhaps out of emotional attachment to the project or its people, or some other personal motive) and the project later fails to deliver the promised benefits, they will be asked to explain and justify the high CBA numbers.

At the same time, if someone enters a very low CBA (just to be safe and conservative), the project will not be rolled out, as projects with higher CBA numbers will be considered more attractive investments.

Hence users will do their due diligence and provide honest numbers, minimizing both the risk of their initiative not making the cut and the risk of being held responsible later for providing false data (the negative feedback).

The intelligent decision-making system should become an ideal repository for analyzing past data and thus improving the estimation numbers for future projects. It will also help in creating an enterprise-wide technology spending management system in which unallocated or unused funds and resources are tracked and utilized in a timely fashion (early automatic adjustment of resources for maximum utilization).
Enterprise-wide operational data consolidation and intelligent analysis for improving operational efficiency: In a similar fashion, we can create intelligent data analysis systems to check enterprise-wide Product Lifecycle Management (PLM) and operational efficiency data (system availability, uptime, confidentiality, data integrity, server/CPU and storage cost, etc.). For example, if we see a lot of problem tickets from a system, it indicates the application needs re-engineering and that an improvement opportunity exists (a case for development/re-engineering projects).
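The problem-ticket signal can be sketched in a few lines: count tickets per application and surface those above a threshold as re-engineering candidates. The application names and the threshold of 3 are assumptions for illustration.

```python
# Sketch: count problem tickets per application and flag systems whose
# ticket volume crosses a threshold as re-engineering candidates.
# Application names and the threshold are illustrative assumptions.

from collections import Counter

tickets = ["billing", "billing", "crm", "billing", "payroll", "billing", "crm"]

def reengineering_candidates(ticket_log, threshold=3):
    counts = Counter(ticket_log)  # tickets per application
    return sorted(app for app, n in counts.items() if n >= threshold)

print(reengineering_candidates(tickets))  # ['billing']
```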
This idea can be implemented in various systems, as described below:

Case Studies:

  1. Better Quality Compliance: Historically speaking, the existing compliance laws (SOX etc.) have failed miserably to stop corruption, while imposing huge costs on corporations. We create an enormous number of compliance-related documents, which takes a great deal of cost and effort. Many of these documents are dead data: not updated properly and not verifiable. Worse, as the quantity grows, the quality of the people who create these documents falls, and there is limited audit capacity (lack of manpower) to check whether the entered data is correct. This is not a smart system, and a huge productivity gain is possible by freeing the resources tied up in compliance-related documentation. The Fed, SEC, and other compliance bodies could standardize the whole audit process and make it smarter. What I suggest is a generic data-system framework where companies feed in data in a specific format (web-based forms/questionnaires). There would be guidelines for generating these data from the raw, granular-level data, and companies would need to align their internal information systems to support these numbers. The compliance and regulatory bodies would have access to the data, plus intelligent tools to automatically verify its truthfulness. The data points need to be designed so that manipulation is not possible (and improved on a continuous basis). The regulatory bodies benefit from an efficient (verifiable), low-cost (automated) audit system, and thus better compliance. The companies benefit by reducing staff in compliance and documentation work (reducing cost) and engaging them in revenue-generating activities (sales, innovation, etc.).
Overall, the economy benefits from a more efficient audit and compliance process through fully automated validation and verification, better resource utilization, and lower compliance-related costs.
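One piece of such automated verification could be a consistency check at intake: the regulator's system confirms that each reported headline figure matches the sum of the granular line items filed with it. The filing format and numbers below are invented for illustration.

```python
# Hedged sketch of automated compliance verification: check that each
# reported total is consistent with its supporting line items.
# The filing format and figures are illustrative assumptions.

def verify_filing(filing, tolerance=0.01):
    """Return (field, reported, computed) for every field whose reported
    total disagrees with the sum of its line items."""
    issues = []
    for field, data in filing.items():
        computed = sum(data["line_items"])
        if abs(computed - data["reported_total"]) > tolerance:
            issues.append((field, data["reported_total"], computed))
    return issues

filing = {
    "revenue":  {"reported_total": 500.0, "line_items": [200.0, 180.0, 120.0]},
    "expenses": {"reported_total": 350.0, "line_items": [100.0, 90.0, 110.0]},
}

print(verify_filing(filing))  # [('expenses', 350.0, 300.0)]
```

A filing that fails the check is bounced back to the company automatically, with no auditor time spent, which is the cost saving the post describes.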
  2. Better Cost Transparency of Customer Servicing and Intelligent Customer Segmentation: Even with smart CRM systems in place, most companies still have very inefficient customer servicing and segmentation processes. We spend huge amounts of money on wrongly targeted promotional activities. For example, a big US bank lets customers customize their debit card with personalized photographs. Whenever such an activity is initiated, a business rule generates a promotional mailer for a travel-related service. The service is free for a month, and you get a risk-free prize for enrolling (an iPod, a portable DVD player, etc.). If you cancel within a month, you keep the prize and are not charged. This looks like a pretty standard promotion and should help the bank generate charges for the travelers program. But the problem is that even if someone has already cancelled the program and received the prize, the same promotional offer is sent again when he requests another customized card. A better approach would be to integrate the prize-fulfillment system with the marketing department and not send the promotional offer again (there is a high chance he will repeat the same activity). We need a system that shows the total cost of ownership of a customer (servicing cost etc.) and the potential revenue he can generate over time. We need this kind of CBA transparency at the most granular level (the customer level) to make the right segmentation decisions.
  3. Better People Management: We don't have objective systems in place for performance appraisal of people, business units, etc. In many cases the appraisal process is subjective and not based on true data. We have a very inefficient compensation system in which we don't objectively measure the return on investment of a specific employee. Most of the time employees can manipulate the data and take undue advantage.

Let me provide some more detail on the opportunity. A typical IT organization runs several employee activity tracking systems (this is typical in India; in the US we generally don't track and micromanage employee activities and rely more on results for performance appraisal). We track an employee's entry and exit times (for attendance records); computer login and logout times (to make sure systems are shut down before going home, for green IT, and to track activity for security/audit reasons); timesheets for measuring effort data (the baseline for future estimation); and a performance appraisal system (which ultimately decides salary, promotion, etc.). Unfortunately, these systems are not integrated and have no intelligence to verify whether employees are entering junk data. This creates a bad employee culture: people stay late in the office, either genuinely working more or just to appear to be working more (perhaps doing personal work or surfing the Internet), while reporting only eight hours in the timesheet, thereby inflating their productivity numbers. This has multiple effects: underestimated effort data (if a critical project demands 100 units of effort and the team took 100 units but reports 70 in the timesheet, a project that actually requires 100 units gets baselined at 70, causing missed revenue opportunity, high employee stress, poor work-life balance, and increased risk to the overall project timeline); misrepresented employee productivity (taking 100 hours to complete the work but reporting 80, improving the chance of a better performance rating); and high operational cost (employees staying late in the office even with no work, increasing electricity costs for air-conditioning and computers).
But if we integrate these systems and put negative feedback controls in place, we can reduce this problem. If an employee reports 8 hours of work in the timesheet but the computer login/logout data shows 16 hours, or the office entry/exit data shows a similar trend, he or she will be asked questions. This ensures honest data in the system, and management gets a true picture of employee effort and productivity.
A smart HR system, integrated with various other systems, can ensure that the right data is put into the system and that the decisions made are objective. This can be extended even to outsourcing decisions and vendor management: we can create intelligent systems that track vendor performance against the contract and inform the next contract decision (whether to renew, renegotiate, etc.). Benchmarking against industry standards when estimating the effort needed to develop a system or solution will ensure vendors do not overcharge.
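The timesheet check described above can be sketched as a simple cross-system comparison: flag any employee whose observed hours (from login/logout records) exceed reported timesheet hours by more than a small slack. The names, records, and 2-hour slack are illustrative assumptions.

```python
# Minimal negative-feedback check for the HR example: compare reported
# timesheet hours with hours inferred from computer login/logout data,
# and flag large mismatches for a follow-up question.
# Names, hours, and the 2-hour slack are illustrative assumptions.

timesheet_hours = {"alice": 8, "bob": 8}
login_hours     = {"alice": 9, "bob": 16}   # derived from login/logout events

def flag_mismatches(reported, observed, slack=2):
    """Flag employees whose observed hours exceed reported hours by
    more than `slack` hours."""
    return sorted(
        name for name in reported
        if observed.get(name, 0) - reported[name] > slack
    )

print(flag_mismatches(timesheet_hours, login_hours))  # ['bob']
```

Because everyone knows the mismatch report exists, the cheapest strategy becomes reporting honest hours, which is exactly the negative feedback the post proposes.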
  4. Create Better Surveys: My gut feeling is that most survey responses are junk data. People generally hate surveys and tend to be negligent when filling them in; sometimes they enter biased data. Executives make the mistake of taking those survey results as truth and acting on them. We need intelligent surveys in which contradictory responses are questioned, generating better-quality data.
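An "intelligent survey" of this kind could cross-check pairs of related questions and prompt the respondent when the answers contradict each other, rather than silently accepting them. The questions and consistency rules below are invented for illustration.

```python
# Sketch of a survey consistency check: pairs of related questions are
# cross-checked, and contradictory answers are flagged for a follow-up
# prompt. Questions and rules are illustrative assumptions.

def find_contradictions(answers, rules):
    """Each rule is (question_a, question_b, predicate); the predicate
    returns True when the two answers are mutually consistent."""
    return [
        (qa, qb) for qa, qb, ok in rules
        if qa in answers and qb in answers and not ok(answers[qa], answers[qb])
    ]

rules = [
    # Years using the product cannot exceed years since first purchase.
    ("years_using", "years_since_purchase", lambda a, b: a <= b),
    # Someone who "never" uses the product should not rate daily use 5/5.
    ("usage_freq", "daily_usefulness", lambda a, b: not (a == "never" and b == 5)),
]

answers = {"years_using": 6, "years_since_purchase": 4,
           "usage_freq": "weekly", "daily_usefulness": 3}

print(find_contradictions(answers, rules))
# [('years_using', 'years_since_purchase')]
```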

