“Data is like garbage. You’d better know what you are going to do with it before you collect it.”

Mark Twain


The conventional view on data and garbage is “garbage in, garbage out.” But Mark Twain raises a more fundamental point, that data is only as useful as its potential applications, and these must be thoroughly investigated before the collection begins. This is important because collecting data for the sake of having more data is wasteful of both time and money. Furthermore, focusing on data can distort behaviour, limit the development of new questions and challenging insights and also delay action. Twain’s warning is even more relevant today with our continuous focus on data collection, our increased ability to acquire data and AI generating new data. As a result, we need a new approach that redefines the relationship between data collection, measurement, insight and action, which is more focused on effectiveness and impact. Without such a transition, our ability to respond to social, economic and climate challenges will be constrained and will eventually stall.

The costs and time associated with collecting data are not immaterial. Raoul Rupare identified that the cost of planning and consultation for the development of the Lower Thames Crossing in London has already reached £800 million, and construction has not started yet. To put this in context, this is “twice as much as building the world’s longest road tunnel”.

Our focus on data can lead to misrepresentation and distortions, depending on the data collected and how it is used. “What gets measured gets managed” is a truism commonly attributed to the leading management thinker Peter Drucker. Unfortunately, the quote is incomplete: “What gets measured gets managed – even when it’s pointless to measure and manage it, and even if it harms the purpose of the organization to do so.” There are multiple examples of inappropriate management measures: the use of ‘body counts’ by the US Army in Vietnam; pharmaceutical companies setting annual targets for new drug development; and measuring urban development using simple models with only two data points, e.g. designing towns around key ratios like schools per number of houses. During the Vietnam War, the US government reported on the ‘body count’ of enemy soldiers killed. This measure allowed the US to present a favourable impression of its progress, but it also distracted the military from effectively developing new tactics to combat guerrilla warfare. More alarming was the army’s manipulation of the numbers to create the impression that the US was ‘winning the war’ against the North Vietnamese. In the second example, pharmaceutical companies set annual targets for new drug development, aiming for four to five new compounds a year, which resulted in organisations over-emphasising quantitative targets and under-appreciating the importance of culture in driving innovation. In urban planning, the use of simplistic mathematical models to understand and design cities resulted in sterile and disconnected redevelopment projects, which inadvertently destroyed the variety and connectivity that made the city worthwhile in the first place. In all these examples, an overreliance on data collection and a mismanagement of measurement objectives resulted in a flattening of complexity with damaging results.

In addressing complex problems, there is a growing issue of ‘repetitive analysis’, where researchers address the same issues, reuse comparable data, produce similar conclusions and only rarely develop practical new insights. One example is the study of the relationship between income inequality and unequal health outcomes. There has been a plethora of reports on the subject; our research identified 129 papers referenced by Rowlingson, 223 by Lynch and 63 by Schenkman, going over similar ground. The reports primarily concentrate on defining the problem, with limited emphasis on developing new policies and assessing their impact.

There is an alternative approach. Three homelessness charities, instead of just recounting the number of people sleeping on the streets, challenged conventional wisdom and analysed when it was best to support people at risk of homelessness, how financial aid should best be delivered and what the end goal of the support should be. They developed three important insights: support given before a person becomes homeless is much more effective than later support; giving funds directly to the individual rather than through support staff is the most efficient approach; and effective interventions focus on creating agency for individuals rather than dependency. The London charity Passage developed No Night Out, which provides safe, temporary accommodation to those about to spend their very first night on the streets, so that the charity can work with them to build tailored support matched to their specific needs. Since 2021, this programme has successfully prevented over 300 people from spending their first night on the street, and over 90 percent of people going through the programme are in housing and sustaining their tenancies. The New Leaf project, based in Vancouver, Canada, gave 50 people who were homeless around £4,000 directly rather than via a support worker. Their programme demonstrated that, after one year, those people who had received the money directly spent fewer days without a home, with no evidence of increased spending on drugs, alcohol and tobacco. The London charity Greater Change’s approach is to ask people experiencing homelessness what would be effective in supporting them out of homelessness. The result is a customised, empathetic approach with an emphasis on agency. Research demonstrated that of the 1,824 people the charity supported, 85 percent have been housed, and almost 50 percent have now found employment.

The work of these three charities points to a more effective way of using data: asking questions to challenge conventional wisdom, developing new insights and ideas, testing solutions in the field, and comparing different approaches when assessing impact.

Our current approach to data creates a third risk: that the focus on collecting data can result in delayed action. This trend, as Klein argues, is partially an unintended consequence of living in a society which emphasises individual over societal needs. One consequence of this is that ‘Liberals speak as if they believe in government and then pass policy after policy hamstringing what it can actually do’. Organisations are laden with requirements to understand everyone’s views and to conduct detailed policy assessments and planning, which slows down decision-making and action. Davies suggests a second cause for delays and lack of action: the tendency of organisations, in complex situations, to create layers of process, which often involve the collection and organisation of data, leading to reduced accountability. Davies argues that “the general flight from accountability wasn’t necessarily being caused by the sneakiness of professional managers, or the psychological and legal considerations that made taking responsibility intolerable. It was more likely that managers didn’t feel accountable for the actions of organisations because it didn’t seem to them as if they were the actual decision makers.” As a result of these trends, the distance between problem and person has grown as we focus more and more on data, which is de-energising for citizens, communities and policymakers. Finally, there is an observed pattern that when issues become very problematic, organisations consciously move away from action towards collecting data and providing analysis, advocacy and advice. The net result is a crowding out of action by a focus on data, where organisations measure their impact by the number of questionnaires collected, reports produced, web platforms launched, conferences attended, and workshops run.

A lack of awareness of the real world is a major cause of poor policy formation to address complex issues. However, the risk is even more extensive. This overreliance on, and misuse of, data can result in metrics which distort policy outcomes, fail to support the design of effective policy and delay action in the delivery of results. Organisations of any kind (governments, businesses, think-tanks and charities) should all be aware of these risks and ensure their use of data emphasises both relevance and impact. We need a new approach to address complex problems, with emphasis on thoughtful measures based on a precise understanding of the nature of the problem to be addressed and linked to specific implementable actions. For example, in city development, Jacob identified that ‘clues’, like the length of opening hours of shops, were an effective measure of urban vitality and success. With counterinsurgency programmes, measuring increasing land values, which signal growing confidence in a peaceful future, is a more accurate measure of successful implementation. Using thoughtful approaches to the collection, analysis and application of data results in greater impact, more accurate assessments of policy and less distortion.

If we are unable to shift to this mindset, there is a deeper and longer-term risk. The sum of the effect of reducing everyone to small data points over time creates a problem of standardisation, of depersonalisation, of making every problem the same ‘size’. There is no natural prioritisation of issues, with the result that the whole system will start to stall and overload. The reality is that choices must be made and decisions implemented. It is necessary to accept that there is no single, universal solution that can be applied to everyone at once. Falling back on data collection and analysis may feel constructive, but it is not productive. As Davies states, “Information only counts if it’s being delivered in a form in which it can be translated into action.” This requires a different framing of questions to not only examine the nature of problems but also link to viable policy levers and to focus on the minimum information necessary to create action.

‘The Architect’ from The Matrix Reloaded (2003).

2 responses to “Are we drowning in data?”

  1. amcintyre13

    Love the Twain quote. You are right to identify the core issue as a drift towards data accumulation in lieu of structured thinking and hypothesis generation. The scientific method identifies the theory to be tested and then assembles the data required to test that theory rather than mining the data to generate potential theories. The firehose of data we currently have allows spurious correlations to be elevated, anomalous data to be cherry-picked, and analysis paralysis to take hold, as in the Lower Thames Crossing example. Climate science is a good example of huge datasets being collected and analyzed to improve and test complex theoretical models – yet you only have to look at the US at the moment to validate another Twain quote, that “A lie can travel half way around the world while the truth is putting on its shoes”. Given the low marginal cost of storage we are unfortunately in a situation where not only are we in a situation where we are collecting a lot of data garbage without a clear plan for what we are going to do with it, but more worryingly bad actors can sift through that garbage for evidence that makes good but misleading headlines.

    1. Mike standing

      amcintyre13’s comment, “Given the low marginal cost of storage we are unfortunately in a situation where not only are we in a situation where we are collecting a lot of data garbage without a clear plan for what we are going to do with it, but more worryingly bad actors can sift through that garbage for evidence that makes good but misleading headlines.”, highlights an important additional issue associated with our current predicament of drowning in data.

      An eminent Oxford Professor observed recently that the internet had created an exponential increase in our ability to collect and distribute data with no corresponding capability to ensure its accuracy. 

      One wonders how long it will be until a new AI-powered search engine emerges which will address the data quality and distortion issues we are currently facing.
