One of the essential differences between the epidemic that struck the world a hundred years ago, known as the “Spanish flu,” and what the world has faced since the beginning of 2020 with the novel coronavirus is the huge volume of data flowing from official reports and from scientific studies on virology and epidemiology in general, and on the coronavirus family in particular, including the earlier identification of the MERS and SARS viruses.
This is in addition to the big data now available all around us, generated by users through social networks, communication networks, search engines, and many kinds of electronic transactions. For example, it is possible to track the day-by-day increase in searches for the symptoms of a disease on a search engine such as Google, which may serve as an early indicator of an imminent epidemic in a particular region.
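The search-trend idea can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the method used by any search engine or health agency: the function name `rising_search_signal`, the 7-day window, and the 1.5 ratio threshold are all hypothetical choices, and the daily counts would have to come from a real data source.

```python
from statistics import mean


def rising_search_signal(daily_counts, window=7, ratio_threshold=1.5):
    """Flag a sustained rise in daily symptom searches.

    Compares the mean of the latest `window` days with the mean of the
    `window` days before them. The window and threshold are assumed
    values for illustration, not calibrated epidemiological parameters.
    """
    if len(daily_counts) < 2 * window:
        raise ValueError("need at least two full windows of data")
    recent = mean(daily_counts[-window:])
    previous = mean(daily_counts[-2 * window:-window])
    if previous == 0:
        # No searches in the earlier window: any activity counts as a rise.
        return recent > 0
    return recent / previous >= ratio_threshold
```

Comparing two week-long averages rather than single days smooths out weekday effects; a rise flagged this way is a prompt for closer investigation, not evidence of an outbreak on its own.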
Data science combines three main bundles of disciplines, skills, and knowledge: statistics and mathematics; programming skills, especially in artificial intelligence and machine learning; and knowledge of the field whose data is being monitored and analyzed.
Data science has been one of the most in-demand fields in the global labor market over the past five years. According to a report by the “Glassdoor” website, it was the most in-demand job in the American market in 2018, a finding that LinkedIn’s annual report for 2017 had also pointed to.
Perhaps the most prominent current effort to employ data science against the novel coronavirus is the challenge launched by the White House Office of Science and Technology Policy (OSTP) in mid-March 2020 to build a huge open dataset, CORD-19. Government institutions, academia, and technology companies are taking part, among them the National Library of Medicine at the US National Institutes of Health (NIH), the Allen Institute for Artificial Intelligence, Cold Spring Harbor Laboratory, Georgetown University, Google, Microsoft Research, the Chan Zuckerberg Initiative, and dozens of other institutions.
Real-world questions and challenges
This challenge raises a set of important questions for researchers and data scientists everywhere in the world about everything related to the virus.
The challenge is hosted on the Kaggle platform, which brings together a large number of data scientists and machine learning developers. To date, the initiative has collected more than 40,000 specialized research papers. “The epidemic is worsening very quickly, and we need to understand the different aspects of the virus, learn what works and what fails among the measures taken in each country, and anticipate the economic repercussions, all of which are essential points where data science can be very useful,” says Ahmed Metwally, a researcher in bioinformatics in the Department of Genetics at Stanford University, USA.
Metwally is taking part in another challenge on the use of data science and artificial intelligence to confront the coronavirus, launched by the Massachusetts Institute of Technology as part of global efforts to find solutions to the crisis, especially regarding how best to protect the groups most vulnerable to infection and how to support hospitals and health institutions with staff, equipment, and resources. Metwally stresses that “there are many efforts that data can contribute to, such as the interactive dashboard developed by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, and also the tracking of epidemiological models, such as the one presented in the Imperial College research report.”
Another useful application, which combines the Internet of Things and data science, is the use of wearable biosensors to measure temperature, pulse, and other health variables in patients, or even in healthy people whose watches measure pulse, temperature, and some other bodily functions.
Other research collects and analyzes genomic data on the evolution of previous coronaviruses, which helps predict the virus's next mutation; one example is the real-time information provided by the Fred Hutch Center's Nextstrain platform. This open-source project aims to make this data and powerful analysis tools available to the general public in order to increase understanding of the epidemic and improve the response to the outbreak.
Data is also being used to simulate medical experiments in order to select the drugs most effective against the virus. Summit, the world's fastest computer, which was built by IBM, analyzed 8,000 compounds in search of the most effective drugs and produced a shortlist of 77, ranked by how promising they are.
Data accuracy challenge
With the need to work around the clock to collect, refine, organize, and analyze data, accuracy is a key issue, especially when the available data is disputed or even contradictory. This can extend to reliable sources: the World Health Organization advises keeping more than one meter away from others to prevent transmission of the coronavirus, while the Centers for Disease Control and Prevention (CDC) adopted a study saying that the distance should be no less than 4 meters.
There is a long list of such disagreements, including the extent to which the virus is affected by heat, the controversy over whether it can be transmitted through the air, and how long it survives on surfaces. Paul McLachlan, Director of Data Science at Ericsson's Artificial Intelligence Research Center, sees the problem of data accuracy around the novel coronavirus as a global one, linked to the nature of this still-unknown virus.
However, he does not think the problem is insurmountable. “Fortunately, data science has many ways to assess data quality and measure the uncertainty around estimates, such as sensitivity analysis and testing simulation systems for robustness,” he says.
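One common way to measure the uncertainty around an estimate is the percentile bootstrap. The Python sketch below is a minimal illustration, not the method McLachlan or any agency uses: the function name `bootstrap_ci`, the number of resamples, and the fixed seed are assumptions, and `outcomes` stands for a hypothetical list of 0/1 values (for example, 1 for a fatal case and 0 otherwise).

```python
import random


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for the proportion of 1s in `outcomes`.

    Returns (point_estimate, lower, upper). The seed is fixed only so the
    illustration is reproducible; it is not part of the technique.
    """
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    rng = random.Random(seed)
    n = len(outcomes)
    estimates = []
    for _ in range(n_resamples):
        # Resample the observed cases with replacement and re-estimate.
        sample = rng.choices(outcomes, k=n)
        estimates.append(sum(sample) / n)
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(outcomes) / n, lower, upper
```

A wide interval signals that the available records are too few to support a firm figure. The interval reflects sampling variability only, so it does not correct for cases that were never recorded.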
Some specialists believe that not having enough data is a greater challenge than having data some of which is inaccurate. “Basic data must be available first for data science to play its role, specifically at spatio-temporal scales, and then the basic variables are added, such as incidence, recovery, and mortality,” says Ali Abdel-Hadi, a professor of mathematics and founder of the actuarial science program at the American University in Cairo.
He adds: “Our work depends on the availability of data. If it is not available, future prevalence rates cannot be predicted, which affects the ability to proactively direct appropriate support to critical areas.”
McLachlan, who is one of the world's leading data scientists, agrees: “Many countries do not have enough data to characterize the scope and nature of the spread of the virus, especially with a large proportion of infected people being asymptomatic. This is no small obstacle, because it makes it difficult for governments to deal with the virus and undermines their ability to identify the right resources, take the right actions at the right time, or trace contacts.”
Virus, data and privacy
Countries rely on different methods of collecting and presenting data related to the virus, but according to statements by the World Health Organization, most Middle Eastern countries are cooperative and send their information regularly, including Egypt. Commenting on this, Amr Talaat, the Egyptian Minister of Communications, said: “Since the beginning of the crisis, the government has been working to employ applied research and artificial intelligence solutions to implement the best procedures according to the available data,” stressing that the Ministry of Health owns the data on the spread of the novel coronavirus in Egypt.
Asked whether the Egyptian government would use applications that draw on user data from mobile phones and can monitor users' movements to warn them when they approach places with recorded infections, as China did and as in the United States, where a similar model was developed by Google and Apple, the Egyptian Minister of Communications and Information Technology acknowledged that the government is considering such applications.
On the extent of their impact on user privacy, he emphasized: “This is one of the reasons why we are deliberate about such options, to ensure that they will be useful without violating the privacy of users.”
Following these statements, the Ministry of Health launched the “Health Egypt” application, which crowdsources data from users and warns them against approaching places where cases of infection have been recorded.
According to Ahmed Metwally, the privacy of individuals is a major challenge for data scientists in a sudden crisis like COVID-19. It is not an easy challenge, especially in jurisdictions with legislation protecting data privacy, such as the European Union's General Data Protection Regulation (GDPR) and the US HIPAA privacy rules that protect citizens' medical records. He calls for a serious approach to this challenge in order to prevent the dangerous consequences of a data breach.
For his part, Imad Al-Azhari, Director of Big Data and Artificial Intelligence Strategy at Vodafone Egypt, said that attitudes to privacy differ from one generation to another, according to research conducted by the company's research and development laboratories. He confirmed that younger generations, especially Generation Z and those after it, are less concerned about data privacy than older generations. This generation grew up in the age of the Internet, social networks, and smartphones, which has greatly shaped its skills, knowledge, and behavior, and this must always be taken into account.
McLachlan, meanwhile, believes that a great deal can be done without lifting the veil of data privacy. He cited the example of the CORD-19 challenge launched by the White House Office of Science and Technology Policy, saying: “I work with 400 researchers and volunteers from within Ericsson's research laboratories, using artificial intelligence and data science to draw on the findings of thousands of full research papers.”