December 3, 2020

The FAIR and fair R&D

Making materials R&D more FAIR and fair at the same time.

Making materials modeling more FAIR and fair at the same time.

2020 gave us two important vectors: (1) digitalization due to disruptions related to the coronavirus pandemic, and (2) a global movement for equal opportunities regardless of racial, social, and economic backgrounds. In the context of materials science, we can “codename” these two vectors as the push for (1) FAIR, and (2) fair R&D.

The ever-increasing complexity of materials R&D

Materials drive innovation in energy, semiconductors, manufacturing, chemical engineering, food, agriculture, cosmetics, pharmaceuticals, and many other industrial sectors. Since the dawn of humanity, from the Stone Age to the Silicon Age, the boundaries of our ability to progress and innovate have been defined by the materials we have in hand.

As we progress, materials requirements become more and more complex. This ever-growing complexity has reached a point where no human alone can comprehend it. This fact, combined with the advent of machine learning and the availability of databases of materials properties measured or simulated in the past, marks the transition to a new era where materials science is fueled by data-driven approaches.

This new era of data-driven R&D requires new digital tools and a new mindset. What worked yesterday will no longer work tomorrow. That is why we regularly hear about the Digital Transformation of R&D and the rise of Artificial Intelligence (AI) today. Due to the extreme complexity and diversity of materials science, we cannot go full speed with AI until we solve the problem of obtaining well-organized, structured materials data and making it accessible to AI/ML.

Consider a large Fortune 500 organization: dozens of departmental units are managed by different people, each with its own “politically charged”, non-communicating IT resources, and with zillions of binary files, Excel sheets, and ad-hoc notes scattered across all of them. There is little clarity about the scientific approaches deployed, the equipment or simulation characteristics, or the nature of the research contained in the data. This is the reality of today, and it makes it impossible to efficiently build and apply AI on such fragmented and disparate data.

We believe two urgent drivers are moving R&D into this new age:

  • (1) building a FAIR (Findable, Accessible, Interoperable, Reusable) data infrastructure that enables AI/ML,
  • (2) embracing a collaborative and inclusive (fair) approach through digital means and cooperative gains.

1. The FAIR concept

The FAIR data principles. [Source: www.ands.org.au]

In order to accelerate the development of new materials, we urgently need a digital ecosystem for the design and execution of modeling/simulation workflows and for the exchange of data about materials and their properties. This can be done by embracing the FAIR (Findability, Accessibility, Interoperability, Reusability) concepts [1]. Such an ecosystem provides the information infrastructure that enables the application of advanced machine learning and artificial intelligence techniques, reduces the complexity of materials data, and allows researchers worldwide to develop materials faster, with potential impact on the electronics, aerospace, automotive, chemical, manufacturing, defense, and many other sectors [2].

Beyond just enabling data storage, such an ecosystem has to provide an incentive for the global community to evolve their digital practices. This way, instead of continuing to contribute data into the disconnected and fragmented landscape of today, researchers will be able to generate any new data in adherence to the new standards and “automatically” have it organized in the future. This is similar to transitioning from paper notes to digital: instead of writing notes on paper and converting them to digital again and again, why not just type them on a computer in the first place?
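To make the idea of generating data “in adherence to the new standards” more concrete, here is a minimal Python sketch of a structured, self-describing data record. Everything in it is an assumption made for illustration: the field names, the REQUIRED_FIELDS list, and the make_record helper are hypothetical and do not come from any published FAIR standard or from a particular platform's API, and the example value is a placeholder.

```python
import hashlib
import json

# Hypothetical minimal schema: these field names are illustrative assumptions,
# not part of any published FAIR standard.
REQUIRED_FIELDS = ("material", "property", "value", "units", "method", "license")


def make_record(**fields):
    """Build a self-describing record with a content-derived identifier."""
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError(f"missing metadata fields: {missing}")
    # Canonical JSON (sorted keys) so identical content always yields the same id.
    payload = json.dumps(fields, sort_keys=True)
    record_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    return {"id": record_id, **fields}


# Placeholder example entry; the numbers are for illustration only.
record = make_record(
    material="Si",
    property="band_gap",
    value=1.12,
    units="eV",
    method="experiment",
    license="CC-BY-4.0",
)
print(json.dumps(record, indent=2))
```

In this sketch, the identifier loosely stands in for findability, the plain JSON format for accessibility and interoperability, and the explicit units, method, and license fields for reusability. A record that lacks any of the required fields is rejected at creation time, which is the “type it on a computer in the first place” idea applied to research data.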

As with any innovation, there is a generational aspect to this problem. People have preferred ways to do their work and are averse to change. After a certain age, this becomes especially evident. There are, of course, exceptions, but the trend is hard to argue with. That’s why we have to pay close attention to how we educate the next generations of scientists and to the digital tools we provide them with. These tools have to embrace the FAIR concept and, over time, will become the new norm.

2. The fair R&D

African children play with a laptop computer. [Source: www.globaltimes.cn]

Besides the ability to facilitate data-driven research, the FAIR ecosystem has another key advantage: it promotes collaboration. No matter how smart you are, two heads are better than one, especially when dealing with the extremely complex landscape of materials science. When data is standardized, collaboration becomes easier to establish, and any intellectual property can be secured through long-established means (think Atlassian, GitHub).

The value of digital collaboration becomes apparent when we think about the global distribution of talent, software outsourcing, and coronavirus-related lockdowns. Digital tools gave rise to software outsourcing and made it possible for a 17-year-old software-savvy kid from Mumbai to earn more money today than the rest of his 8-member immediate family combined. For materials science, the right digital ecosystem will make it possible for scientists from developing countries to access state-of-the-art resources, democratizing research and making it “fairer”.

There are numerous examples of R&D outsourcing. Bengaluru is often referred to as the Indian Silicon Valley, a reputation rooted in the software outsourcing initiatives of the 1990s. Today we see companies in energy, electronics, manufacturing, pharmaceuticals, and many other fields establishing R&D centers and actively hiring in India because of its large and attractive talent pool. The attractiveness of South and Southeast Asia is growing rapidly, with Bangladesh, Pakistan, Sri Lanka, Indonesia, Malaysia, and other countries as important players. Beyond that, Latin America is steadily growing its footprint, and Africa, with its young and growing population, will be making many significant contributions to materials science soon.

The right digital tools make materials R&D fair, leveling the field for underserved communities, ethnic and racial minorities, and even whole countries. The truly global problems we face today (climate change, pollution, pandemics) require a truly global response. As long as your idea works, it doesn’t matter whether you are Harvard-educated or a polymath from an African village: we all share the same planet.

We believe in FAIR and Fair

The future is driven by data and AI. But just like anything else, it needs infrastructure: a data infrastructure that can facilitate the applications of AI.

In order to build this infrastructure, we have to embrace the FAIR principles and allow people all around the globe to put their brainpower together.

At Exabyte.io, our mission is to make this possible.

Links

[1] “Big-Data-Driven Materials Science and its FAIR Data Infrastructure”, C. Draxl, M. Scheffler, https://arxiv.org/abs/1904.05859
[2] “From DFT to machine learning: recent approaches to materials science – a review”, G. R. Schleder et al., J. Phys. Mater. 2, 032001 (2019), https://iopscience.iop.org/article/10.1088/2515-7639/ab084b/pdf

Team Exabyte.io: Marta Bulaich, Timur Bazhirov.

Register to join Exabyte.io at https://platform.exabyte.io/register.