Much of what we know about mental health comes from a clinical setting. When health commissioners, policy-makers and practitioners make decisions on the nation’s healthcare, most of the information at their disposal has been gathered from hospitals, GP surgeries, satisfaction surveys and controlled clinical trials. These are well suited to collecting the precise measurements often demanded by medicine, but they miss an important part of the story of what it is like to live with a mental health condition in the UK today – the everyday struggles, successes and frustrations experienced by patients when they’re not interacting with the health system.
This imbalance of knowledge is evident in the online world, too. Engagement with large, official sources of health information is carefully measured and often well understood – NHS Choices, for example, collects an impressive amount of data on how people are engaging with its site, and for what reason. Again, however, these controlled, regulated environments do not give the whole picture. Important online communities have developed on healthcare-focused web forums, freely open to the public and often centred around specific conditions or therapies.
These allow users, often anonymously, to talk to others who have experienced the problems they’re grappling with – to ask for help, share advice or air grievances. As well as providing a crucial resource for those who use them, these forums represent a large and growing source of open, publicly accessible information on a wide spectrum of issues surrounding mental health, many of which are seldom recorded through official means. They range from large, carefully maintained libraries of thousands of posts to small communities discussing highly specific conditions and challenges.
Unregulated forums, however, have been generally disregarded, distrusted or misunderstood by the healthcare profession. Even for those interested, the question of how to turn the large volumes of unstructured and potentially sensitive material into actionable insight raises serious technical, methodological and ethical issues. First among them is whether the health service or other official actors should be using this data in the first place. While it has been shared publicly, and usually through the use of a pseudonym, forum data is often highly sensitive, containing the experiences of vulnerable communities who often use these spaces to talk about their frustrations with the public sector. It is imperative that any investigation into public forums is conducted with the utmost sensitivity, upholds ethical standards and protects the privacy of those who use them.
Over the last year, CASM has been working with The King’s Fund to begin to address these issues, and to investigate the potential of online health forum data to transform the way we understand mental health provision in the UK. Using customised web-scraping software, we collected and anonymised over a million posts shared on relevant public health forums over 12 years. We then trialled a number of technical approaches to find out what could usefully be discovered from this huge, messy dataset, without compromising the privacy of the users involved.
Throughout this investigation, questions of privacy and the ethical implications of our work were paramount. We did not seek to identify individual users, and usernames and other identifying data were automatically removed before posts were presented to our researchers. All data published has been fully anonymised, either through aggregation or careful manual editing – we have not specified the names of the forums involved, and any quotes have been altered in such a way as to preserve meaning while protecting their authors from identification through online searches. Furthermore, in deciding which areas to study, we were guided throughout by input from healthcare professionals and experts at The King’s Fund.
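To give a flavour of what automatic removal of identifying data can look like, here is a minimal Python sketch. It is an illustration only, not the pipeline used in the study: the `scrub_post` function, the `author` and `text` field names, the placeholder tokens and the idea of passing in a set of known usernames are all assumptions made for the example. A real system would need to handle far more (real names, locations, phone numbers) and would still require manual review.

```python
import re

# Simple pattern for email addresses (an assumption; real-world addresses vary).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scrub_post(post, known_usernames):
    """Return a copy of a post with the author dropped and identifiers masked.

    `post` is a hypothetical dict with "author" and "text" keys;
    `known_usernames` is the set of usernames seen on the forum.
    """
    text = EMAIL_RE.sub("[email]", post["text"])
    # Mask longer usernames first so that a short name which is a prefix
    # of a longer one does not leave part of the longer name behind.
    for name in sorted(known_usernames, key=len, reverse=True):
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        text = re.sub(pattern, "[user]", text, flags=re.IGNORECASE)
    # The author field is deliberately not carried over.
    return {"text": text}


example = {
    "author": "sunny_day42",
    "text": "Thanks sunny_day42, email me at a.b@example.com",
}
cleaned = scrub_post(example, {"sunny_day42"})
print(cleaned["text"])
```

Masking rather than deleting keeps sentences readable for researchers while removing the identifier itself.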
Full details of our investigations, and our findings, are available in our paper Online Support, published today. In short, we found:
- Online forums contain valuable discussion from users who have personal experience with specific therapies – we used Cognitive Behavioural Therapy as a case study, finding over 5,000 relevant posts. This discussion can be reliably sifted out using Natural Language Processing classifiers, and was found to include, for example, comments on the perceived effectiveness of the therapy and accounts of its use in self-care.
- Natural language classification can also be used to find posts which are likely to be sent by users turning to the forums in an emergency – making an urgent ‘cry for help’ – though making this classification is more difficult. Further, many of these users were found to have mentioned a previous experience with the healthcare system.
- There is a sizable amount of information shared on online forums concerning comorbidities – physical conditions experienced alongside a mental health issue. This discussion can be effectively identified and categorised through analysis of keywords.
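As a rough illustration of the keyword approach mentioned in the last finding, the Python sketch below tags posts with comorbidity categories and counts how often each appears. The category names and keyword lists are invented for the example and are not those used in the paper, and `tag_post` and `count_categories` are hypothetical helper names. Simple matching like this will miss misspellings and cannot tell a personal experience from a passing mention.

```python
import re
from collections import Counter

# Hypothetical keyword lists, invented for this example.
COMORBIDITY_KEYWORDS = {
    "diabetes": ["diabetes", "diabetic", "insulin"],
    "chronic pain": ["chronic pain", "fibromyalgia"],
    "heart disease": ["heart disease", "angina"],
}


def tag_post(text):
    """Return the set of categories whose keywords appear in the text."""
    lowered = text.lower()
    tags = set()
    for category, keywords in COMORBIDITY_KEYWORDS.items():
        for keyword in keywords:
            # Word boundaries stop a keyword matching inside a longer word.
            if re.search(r"\b" + re.escape(keyword) + r"\b", lowered):
                tags.add(category)
                break
    return tags


def count_categories(posts):
    """Count how many posts mention each category at least once."""
    counts = Counter()
    for text in posts:
        counts.update(tag_post(text))
    return counts


posts = [
    "My diabetes makes the anxiety worse",
    "Insulin and diabetic checks every day",
    "Feeling low today",
]
counts = count_categories(posts)
print(counts)
```

Counting each category once per post, rather than once per keyword hit, keeps a single long post from dominating the totals.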
We feel there is significant potential for emerging technologies to provide a new window into mental health, though we found that these approaches worked better in some areas than in others. The technical approach taken is outlined in detail in the paper. The development of these techniques, however, cannot be left to software developers and analytics firms. It is imperative that those who truly understand mental health and online communities are involved in and understand any approaches used, not merely in a consulting role but in the training of the algorithms themselves.
This paper raises a number of questions about the future of health data online, and there is much work to do on the effectiveness, potential applications and ethical standing of examining forum data. If we can answer them, however, we have an opportunity to shape mental health provision for the better.