January 24, 2024

WTH is Materials Informatics?

Richard Feynman once said that nobody really understands Quantum Mechanics. Today, the same can be said about Materials Informatics. The industry can be confusing, with "apples" often compared to "oranges," leading to wasted time researching and testing the wrong solutions. Mat3ra.com (formerly known as Exabyte.io) was founded before the term "Materials Informatics" even existed, and we have experienced the industry's emergence firsthand. In fact, Mat3ra was named among the key players in the Materials Informatics industry at least four times in 2023 alone, and yet even we are still confused about what the term means! Why? Read on to find out.

A futuristic rendering of a workstation for a Materials Informatics scientist/engineer by ChatGPT.

In 2023, Meta, Google, and Microsoft [1-3] were among the main newsmakers in the rapidly growing [4,5] field of Materials Informatics.

But what exactly does Materials Informatics mean, and why does it matter?

Richard Feynman once said: “I think I can safely say that nobody really understands Quantum Mechanics”. Today, it is safe to say the same about Materials Informatics. Market analysts, venture capitalists, corporate executives, startup founders, and anyone else who touches the industry operate in a confusing environment where “apples” are often compared to “oranges,” resulting in wasted time researching and/or testing the wrong solutions.

I started Mat3ra before "Materials Informatics" was a term, and I have experienced the emergence of the industry first-hand. In 2023, Mat3ra.com (formerly known as Exabyte.io) was mentioned among the key players in the Materials Informatics industry at least four times [4], yet even I am confused about what the term means! So here’s my attempt at explaining it.

1. A bit of history.

The term "Materials Informatics" gained traction around 2017, primarily through Lux Research's efforts, when they began tracking this emerging landscape. Here's a summary of the Materials Informatics timeline.

2000s: Foundational Period

  • The concept of applying informatics to materials science began to gain traction. This period is marked by foundational research exploring how data-driven approaches can be used in materials science for energy, semiconductor, manufacturing, and other industries.
  • Emergence of key papers that lay the groundwork. These papers discuss the potential of using databases and computational methods to analyze and predict material properties.

2010s: Growth and Development

  • 2010-2013: Rapid development in the field, with an increase in publications discussing the integration of machine learning and AI in materials science. The Materials Genome Initiative (MGI) was launched by the Obama administration, significantly boosting the field by promoting the integration of computational tools in materials science.
  • 2014-2016: Emergence of significant materials databases like the Materials Project, AFLOWlib, and others, providing essential data for research and development.
  • 2017: Lux Research starts tracking the landscape of Materials Informatics, marking formal recognition of the field [6]. This year is often cited as a pivotal point in the history of Materials Informatics, with increased industry and academic attention.

Lux Research's early release about Materials Informatics, circa 2017.

2020s: Rapid Advancements and Diversification

  • 2020-2023: Further advancements in machine learning and AI techniques specifically tailored for materials science. This period sees a diversification in the application of Materials Informatics, extending to areas like battery development, renewable energy, and biotechnology.

2. The Methodology.

2.1. Major Trends.

We speak about Materials Informatics today because of the emergence of a new R&D paradigm: data-driven research. Previously, we could rely on (1) experimentation, (2) theory, that is, solving analytical equations with pen and paper, and (3) computer simulations that calculate what cannot be solved analytically. Recently, (1)-(3) began producing enough data for us to see trends in it and to rely on AI/ML techniques for scientific insights. The Materials Informatics industry players thus represent multiple facets of this complex "tectonic" shift. They all aim to facilitate the transition from an "Edisonian" approach, which requires thousands of slow and expensive experiments, to an "Einsteinian" approach guided by computation and AI.

The emergence of the data- and AI/ML-driven paradigm of R&D is the key driver behind the growth of the Materials Informatics industry.

Similarly, all solutions in this field touch two or more of the following three domains: (1) Materials Science and Chemistry, (2) Data Science, and (3) Computer Science.

The scientific disciplines involved in digital materials R&D. Materials Informatics sits at their intersection.

2.2. Distinguishing characteristics.

Let's identify specific areas of R&D that are grouped into "Materials Informatics" today. To help "decompose" the problem space into domains, we suggest the following vectors:

  1. Primary source of data: (1) experimentation, (2) computer simulations. This reflects the root R&D paradigm that a solution embraces. For most practical materials science, we can only get information from experiments. For some, we can use physics-based simulations like Density Functional Theory (DFT). Some solutions aim to combine experimental and computational data. However, one or the other always prevails, and there is no 50/50 split.
  2. Ability to generate new independent data: (1) present, (2) not present. Here, we track whether a solution provides a way to generate new data that can be used to refine the resulting AI/ML models. This can be confusing since ML models can generate new data, too. However, we use "independent" for a reason: any ML model requires training data originating outside of it to improve itself. Otherwise, we get Baron Munchausen pulling himself out of a swamp by his own hair. A self-driving lab or a physics-based simulation platform that produces data independently of the ML model can provide a critical advantage.
  3. Focus stage of R&D: (1) early-stage, (2) mid-stage, (3) late-stage. To clarify, "Early Stage" means highly creative research, needing many design and prototyping capabilities, and staying close to academic research. "Mid Stage" means less creative research directly affecting the manufacturing, business, and product capabilities. "Late Stage" primarily includes low-creativity work with a direct impact on the business and product, such as optimizing the yield of manufacturing processes.
  4. Focus on specific application areas: (1) deep, (2) shallow, (3) none. Here we consider how transferable the solutions are from one application area to another. A solution with a "deep" focus concentrates on a specific narrow domain only, such as "iron-based high-temperature superconductors". One with a "shallow" focus allows users to study all electronic materials, for instance. One with no focus is a horizontal solution that provides a way to improve studies of any materials (obviously, with limited depth).
  5. Root scientific discipline: (1) computer science / high-performance computing, (2) materials and chemistry, (3) data science, (4) the intersection of the three. Here we trace the origins of the solution to one of the three scientific disciplines, usually more pronounced within the product. The fourth option here is an exception where the focus of the solution is to build "bridges" between the disciplines.
  6. Other considerations:
     • Company/Product development stage: (1) early, (2) mid, (3) mature.
     • Level of interaction with customers: (1) self-served SaaS with no sales/support oversight required, (2) high-touch enterprise deployment requiring a sales/support representative, (3) consulting-based engagement.
     • Level of collaboration: (1) global online platform, (2) local organizational-level deployment, (3) none (standalone software application).
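
To make the vectors above easier to apply, here is a minimal Python sketch of how a solution could be recorded along the first five of them. The class names, field names, and the two example entries ("Platform A" and "Platform B") are hypothetical and exist only for illustration. They do not describe any real product, and this is one possible encoding rather than an established schema.

```python
from dataclasses import dataclass, fields
from enum import Enum


class DataSource(Enum):
    EXPERIMENT = "experimentation"
    SIMULATION = "computer simulations"


class RnDStage(Enum):
    EARLY = "early-stage"
    MID = "mid-stage"
    LATE = "late-stage"


class ApplicationFocus(Enum):
    DEEP = "deep"
    SHALLOW = "shallow"
    NONE = "none"


class RootDiscipline(Enum):
    COMPUTER_SCIENCE = "computer science / high-performance computing"
    MATERIALS_CHEMISTRY = "materials and chemistry"
    DATA_SCIENCE = "data science"
    INTERSECTION = "the intersection of the three"


@dataclass(frozen=True)
class SolutionProfile:
    """One solution described along vectors 1-5 (hypothetical schema)."""
    name: str
    primary_data_source: DataSource
    generates_independent_data: bool
    focus_stage: RnDStage
    application_focus: ApplicationFocus
    root_discipline: RootDiscipline


def differing_vectors(a: SolutionProfile, b: SolutionProfile) -> list[str]:
    """Return the names of the vectors on which two profiles differ."""
    return [
        f.name
        for f in fields(SolutionProfile)
        if f.name != "name" and getattr(a, f.name) != getattr(b, f.name)
    ]


# Two made-up entries, not real products.
platform_a = SolutionProfile(
    name="Platform A",
    primary_data_source=DataSource.SIMULATION,
    generates_independent_data=True,
    focus_stage=RnDStage.EARLY,
    application_focus=ApplicationFocus.SHALLOW,
    root_discipline=RootDiscipline.INTERSECTION,
)
platform_b = SolutionProfile(
    name="Platform B",
    primary_data_source=DataSource.EXPERIMENT,
    generates_independent_data=False,
    focus_stage=RnDStage.EARLY,
    application_focus=ApplicationFocus.SHALLOW,
    root_discipline=RootDiscipline.DATA_SCIENCE,
)

print(differing_vectors(platform_a, platform_b))
```

Listing the vectors on which two offerings differ is a quick way to check whether a comparison is "apples" to "apples": the more vectors differ, the less meaningful a head-to-head comparison is likely to be.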

Not to overcomplicate an already complex picture, but all the above can be further subdivided according to the type of materials concerned: Electronic Materials, Metals, Ceramics, Chemicals, Composites, Polymers, and Other.

Also, notably, item #2 above overlaps with the more established "Computer Aided Design" and "Computer Aided Engineering" industries, which have existed for decades but have had relatively little penetration in the materials and chemistry space (compared to mechanical and/or electrical engineering, for example).

3. Key Players and Sub-classification.

3.1. The Competitive Landscape.

Before continuing, let me reiterate that the thoughts below reflect my personal experience and understanding of the field. They are not meant to accurately represent the objective state of the industry or the way that other players see themselves. The online information is often limited, especially for the high-touch/consultative solutions. That said, here's my attempt to present the Materials Informatics "playing field" and where some solutions would stand based on the above categorization.

The Materials Informatics landscape: some of the players in and around the industry, and their characteristics according to the classification in 2.2 above.

3.2. Sub-classification.

There is no single industry of Materials Informatics. Despite the efforts of market research professionals to combine everyone into a single bucket, it would be better to "divide and conquer".

One of the challenges in understanding Materials Informatics lies in its broad application and interpretation. Currently, the term encompasses many companies and technologies which are fundamentally different. To make sense of this diversity, we need to subclassify by the following dimensions:

I. Independent Data Synthesis.

  1. (Physics-Based) Data Synthesis Solutions: These focus on creating data through physics-based approaches, providing insights grounded in fundamental scientific principles. Self-driving labs for high-throughput experimentation also belong here.
  2. Data Analysis Solutions: These involve analyzing existing data provided by customers and extracting valuable information from pre-existing datasets.

II. Black-box solutions vs. Open platforms.

  1. Black-Box AI/ML Tools: Such tools use artificial intelligence and machine learning in a 'black-box' manner, offering solutions without needing users to understand the underlying mechanics.
  2. Open Platforms with Multiple Toolchains: These platforms provide a range of tools and functionalities, facilitating various approaches in Materials Informatics.

III. Horizontal vs. Vertical Solutions.

  1. Horizontal Solutions: These span multiple material domains.
  2. Vertical Solutions: These focus on specific material areas.

The above three are not exhaustive but should provide a clearer way to distinguish between different offerings. The ability to perform data synthesis outside the AI/ML toolchain is a key differentiating factor. Many outsiders have difficulty understanding the difference because AI/ML generates data, too. However, using only AI/ML-generated data to train more AI/ML violates the "causality" principle. That's why we need input from the other layers of the figure in 3.1. In many practical cases, only physics-based simulations (maybe accelerated with AI/ML) can provide the required volume, velocity, variety, and veracity of data.
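
The point about independent data can be illustrated with a deliberately simple toy, sketched below under stated assumptions. A straight line is fitted by ordinary least squares to a handful of made-up "measurements". The fit is then repeated after adding the model's own predictions as extra training points, and once more after adding one new independent measurement. All numbers are invented, and real ML pipelines are far more complex than a line fit, so this shows only the underlying principle.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up "independent" measurements (e.g., from experiment or simulation).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.2, 2.8]
slope, intercept = fit_line(xs, ys)

# Self-generated data: the model's own predictions at new inputs.
new_xs = [4.0, 5.0, 6.0]
new_ys = [slope * x + intercept for x in new_xs]
slope_self, intercept_self = fit_line(xs + new_xs, ys + new_ys)

# Independent data: one new (made-up) measurement that the model did not produce.
slope_ind, intercept_ind = fit_line(xs + [4.0], ys + [4.6])

print("original fit:        ", slope, intercept)
print("after self-training: ", slope_self, intercept_self)
print("after new measurement:", slope_ind, intercept_ind)
```

In this idealized setting, the self-generated points lie exactly on the fitted line, so refitting reproduces the same parameters up to floating-point rounding: no new information has entered. Only the independent measurement moves the model.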

4. Striking gold in the Wild-Wild West!

To conclude, Materials Informatics is a field of research facilitating the creation, deployment, and exchange of data-driven digital approaches involving AI/ML. At the core, it can be subdivided into two sections: (I) approaches allowing for Data Synthesis independent of the resulting AI/ML, and (II) pure Data Analysis approaches.

How important is Materials Informatics? Well, everything you touch is a material. If we can improve how we discover and develop new materials by even small margins, we can directly affect many aspects of our lives, including important areas like decarbonization, reducing pollution, transition to renewable energy, and electronics beyond Moore. Materials Informatics is a pivotal field transforming how materials are discovered, developed, and applied, making it a vital area in modern science and technology.

Materials Informatics is like the Wild-Wild West today: a vast, uncharted territory that has recently piqued the interest of Big Tech. It's a field ripe with potential but also complex and confusing. Today, like during the early days of Quantum Mechanics, no one truly understands what's happening. To make sense of it and to move forward faster, we need to sub-classify and draw the boundaries of the individual states of the Wild-Wild West ("California", "Arizona", "New Mexico," etc.) to avoid wasted time and missed expectations. By sub-classifying and focusing our attention, we can navigate the landscape faster and pave the way for groundbreaking discoveries.


  1. Meta AI Research, the "Open Catalyst" project: https://ai.meta.com/research/impact/open-catalyst/
  2. Google DeepMind, GNoME and accelerating materials discovery with AI: https://deepmind.google/discover/blog/millions-of-new-materials-discovered-with-deep-learning/
  3. Microsoft Research, MatterGen, AI-driven materials design: https://www.microsoft.com/en-us/research/blog/mattergen-property-guided-materials-design/
  4. Market research data from (1) Research And Markets: https://www.researchandmarkets.com/report/material-informatics; (2) Value Market Research: https://www.valuemarketresearch.com/report/material-informatics-market; (3) Industry Arc: https://www.industryarc.com/Report/19609/material-informatics-market.html; (4) OpenPR: https://www.openpr.com/news/3213126/global-material-informatics-market-size-share-and-forecast
  5. Precedence Research, Materials Informatics market forecast: https://www.precedenceresearch.com/material-informatics-market
  6. Lux Research, news release about Materials Informatics, 2017/08/08: https://www.globenewswire.com/news-release/2017/08/08/1081561/0/en/Materials-Informatics-is-a-Disruptive-Technology-for-Chemicals-and-Materials-R-D-Says-Lux-Research.html


NOTE: this article was originally published at https://www.linkedin.com/pulse/wth-materials-informatics-timur-bazhirov-sophc