Fortunately, good information is available if you know not only where to look but also how to look. Living in the open information age means that many large datasets on the coronavirus and Covid-19 are publicly available, which is particularly important for scientific collaboration. These include curated libraries of scientific papers such as CORD-19, with over 63,000 articles and growing.
The scale of this dataset means that effective search is key to extracting valuable information. If information = data + context, then the search itself has to understand context. Keyword search does not, because it treats each word independently of the others, but cognitive search is up to the task. Cognitive search processes words semantically, working out what each word means from the words that come before and after it.
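To see why treating words individually loses context, here is a minimal sketch of a bag-of-words keyword scorer. It is a deliberately simple term-overlap count, not the scoring used by any particular search engine, and the `keyword_score` function and the example sentences are hypothetical. Two sentences with opposite meanings get identical scores because they contain the same words:

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9-]+", text.lower())


def keyword_score(query: str, document: str) -> int:
    """Hypothetical keyword scorer: count how often each query term appears
    in the document. Word order and surrounding words are ignored entirely."""
    doc_counts = Counter(tokenize(document))
    query_terms = set(tokenize(query))
    return sum(doc_counts[term] for term in query_terms)


query = "do bats spread the virus to humans"
doc_a = "Bats spread the virus to humans."
doc_b = "Humans spread the virus to bats."

# Both documents contain exactly the same words, so they score the same,
# even though they say opposite things.
print(keyword_score(query, doc_a), keyword_score(query, doc_b))  # prints "6 6"
```

A semantic model, by contrast, encodes each word in the context of its neighbours, so the two sentences above would receive different representations.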
So we set about applying our knowledge of Transformer-based NLP models to train a question answering system for Covid-19. We ingest the CORD-19 dataset and use our model to create a cognitive embedding (a numerical representation of the text's semantics) for every paragraph. We then tag each paragraph with its embedding and index the data. When a question is asked, we create an embedding for the question, search the dataset, and use embedding similarity to determine the best response to present.
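The retrieval step can be sketched as follows, assuming cosine similarity as the similarity measure (a common choice for embeddings; the measure used in our system is not specified here). The Transformer model that produces the embeddings is not shown: the paragraph IDs and the three-dimensional vectors are made-up placeholders, whereas real embeddings typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # guard against zero vectors
    return dot / (norm_a * norm_b)


# Hypothetical index: paragraph ID -> paragraph embedding.
# In the real pipeline these vectors would come from the Transformer model.
index = {
    "paper1#p3": [0.9, 0.1, 0.0],
    "paper2#p7": [0.1, 0.8, 0.3],
    "paper3#p1": [0.0, 0.2, 0.9],
}


def rank(question_vec: list[float], index: dict, top_k: int = 2) -> list:
    """Score every indexed paragraph against the question embedding and
    return the top_k (paragraph_id, similarity) pairs, best first."""
    scored = [(pid, cosine_similarity(question_vec, vec)) for pid, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


question_vec = [0.2, 0.9, 0.2]  # hypothetical embedding of the user's question
best_id, best_score = rank(question_vec, index)[0]
print(best_id)  # prints "paper2#p7"
```

This brute-force scan compares the question against every paragraph, which is easy to follow but grows linearly with the corpus; at the scale of tens of thousands of papers, an approximate nearest-neighbour index is a typical way to keep lookups fast.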
The key, then, is accessibility: how do you make the question answering system accessible? For this we turn to consumer messaging, now a ubiquitous form of communication familiar to most people. We picked WhatsApp: users send questions to the service and receive a summary of the passage that best answers each question, along with a link to the original paper. Have a look at the demo below.
WhatsApp demo