Digitizing a massive dye library

[The text below is an excerpt from Chemical & Engineering News (C&EN), written by Celia Henry Arnaud.]

It takes a lot of drawers to hold the Max Weaver Dye Library. Each drawer holds between 600 and 1,000 vials.

A college of textiles is an unexpected place to find a chemist trained in high-resolution mass spectrometry. But the college of textiles at North Carolina State University had something that attracted Nelson R. Vinueza: It is home to the Max A. Weaver Dye Library.

Weaver, a longtime researcher at Eastman Chemical, and his team collected dyes from the company over a period of more than 30 years. His efforts resulted in a treasure trove of about 98,000 vials of dye molecules dating from the 1960s to the 1980s, which Eastman donated to NC State in 2013, along with accompanying fabric swatches. NC State hired Vinueza that same year, and he’s now codirector, with Harold S. Freeman, of the dye library.

The library team is working to make the collection publicly available, in part to encourage researchers to find new uses for the compounds other than as textile dyes. As a first step toward that goal, Vinueza, computational chemist Denis Fourches, and coworkers have digitized and analyzed the structures of 2,700 of the dyes. The team recently reported the results of this work, giving chemists a first glimpse at the structural diversity within the collection (Chem. Sci. 2017, DOI: 10.1039/C7SC00567A).

The team wanted this first set to be representative of the collection, so their students opened drawers and randomly picked dyes to be included. They avoided the earliest dyes because they worried the older ones might have degraded.

“This sample is quite representative of the entire library in terms of the expected color distribution that Eastman provided,” Fourches says. “If we had obtained different distributions, we could have questioned whether this sample was really random.”

Each vial in the collection is labeled with the chemical structure of the dye. The team wanted to make sure every structure they analyzed was unique. So after digitizing the structures, the researchers applied a standardization protocol that looked for salts, mixtures, and duplicates. Compounds flagged by the protocol were removed from further analysis, although the salts and mixtures remain in the database. “We want to make sure that in the final set, there is a very specific, well-defined structure for each dye that is curated,” Fourches says. The standardization protocol resulted in a set of 2,196 unique dyes.
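To make the curation step concrete, here is a minimal Python sketch of that kind of filter. It is an illustration only, not the team's actual protocol. It assumes each digitized structure is stored as a SMILES string that a cheminformatics toolkit has already canonicalized, and it treats any multi-component SMILES (one containing a ".") as a salt or mixture. The `DyeRecord` fields and vial IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DyeRecord:
    vial_id: str  # hypothetical identifier, not the library's real labeling scheme
    smiles: str   # assumed to be a canonical SMILES string


def curate(records):
    """Split records into unique single-component structures and flagged entries."""
    unique, flagged, seen = [], [], set()
    for rec in records:
        if "." in rec.smiles:
            # A dot separates disconnected components: a salt or a mixture.
            flagged.append((rec, "salt_or_mixture"))
        elif rec.smiles in seen:
            # Same canonical string as an earlier vial: a duplicate.
            flagged.append((rec, "duplicate"))
        else:
            seen.add(rec.smiles)
            unique.append(rec)
    return unique, flagged


records = [
    DyeRecord("V001", "c1ccc(cc1)N=Nc1ccccc1"),        # azobenzene
    DyeRecord("V002", "c1ccc(cc1)N=Nc1ccccc1"),        # same structure again
    DyeRecord("V003", "[Na+].[O-]S(=O)(=O)c1ccccc1"),  # sodium benzenesulfonate, a salt
    DyeRecord("V004", "O=C1c2ccccc2C(=O)c2ccccc12"),   # anthraquinone
]
unique, flagged = curate(records)
```

Flagged records are returned rather than discarded, which mirrors the article's point that salts and mixtures stay in the database even though they are left out of the analysis.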

They searched these structures for well-known chromophores, such as anthraquinone and stilbene substructures. The azo group is the most common substructure in the data set.

The team also acquired high-resolution mass spectra of a randomly selected 74-compound subset to make sure the mass, isotopic distribution, and elemental composition matched the expected values from the structures on the labels.

Swapping the positions of amino and hydroxyl groups is enough to change the colors of these dyes from yellow (left) to orange (right), even though they have the same absorption maximum (387 nm).

“We just had the vials and structure,” Vinueza says. “High-resolution mass spectrometry gives us an idea if what we are searching is correct. That gives us more confidence about the structures.”
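The mass part of that check comes down to simple arithmetic: add up the monoisotopic masses implied by the labeled structure and compare the total with the measured value. The Python sketch below shows the idea under stated assumptions. It handles only simple formulas without brackets, assumes a singly protonated [M+H]+ ion, and uses a 5 ppm tolerance chosen for illustration; the article does not give the team's ionization mode or acceptance criteria, and the sketch does not cover the isotopic-distribution comparison.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307400,
    "O": 15.99491462,
    "S": 31.97207117,
    "Cl": 34.96885268,
}
PROTON = 1.00727647  # proton mass, added for an [M+H]+ ion (assumed ion type)


def monoisotopic_mass(formula):
    """Sum isotope masses for a simple formula such as 'C12H10N2' (no brackets)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def ppm_error(measured_mz, formula):
    """Signed mass error in parts per million for a singly protonated ion."""
    expected = monoisotopic_mass(formula) + PROTON
    return (measured_mz - expected) / expected * 1e6


def matches_label(measured_mz, formula, tolerance_ppm=5.0):
    """True if the measured m/z agrees with the labeled formula within tolerance."""
    return abs(ppm_error(measured_mz, formula)) <= tolerance_ppm


# Azobenzene (C12H10N2) as a stand-in; the m/z values are made up for illustration.
good = matches_label(183.0919, "C12H10N2")
bad = matches_label(183.1000, "C12H10N2")
```

With these made-up readings, the first value falls within a few ppm of the expected ion mass and the second is tens of ppm away, so only the first would count as agreeing with the label.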

They characterized the dyes’ properties using cheminformatics software. “We can screen the library for dyes that are similar in terms of structure, shape, volume, or charge distribution to other dyes that have known biological or physicochemical properties of interest,” Fourches says. “It is very likely that in this library we have potential antibiotics, anticancer agents, new ways of coating materials.” This modeling provides a fast and inexpensive way to prioritize dyes for testing for these other properties.
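A similarity screen of this kind can be sketched with a Tanimoto score, a common measure for comparing sets of structural features. The article does not say which descriptors or similarity measure the team used, so the choice here is an assumption, and the feature sets and dye names are hypothetical stand-ins for real fingerprints.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_similarity(query, library):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scores = [(name, tanimoto(query, feats)) for name, feats in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical feature sets; a real screen would use computed fingerprints.
library = {
    "dye_A": {"azo", "phenyl", "amine", "nitro"},
    "dye_B": {"anthraquinone", "amine", "hydroxyl"},
    "dye_C": {"azo", "phenyl", "hydroxyl"},
}
query = {"azo", "phenyl", "amine"}  # features of a compound with a known property
ranking = rank_by_similarity(query, library)
```

The top of the ranking is the short list a lab would test first, which is the prioritization role the article describes for the modeling.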

The team has entered 150 of the dyes into the ChemSpider online structure database. When their collaborator Antony J. Williams of the National Center for Computational Toxicology compared the structures to the 58 million in ChemSpider, only seven were already in the database. “It’s almost the same probability as winning the lottery,” Fourches says. “It shows how unique the chemistry is in the dye library.”

A close-up of some of the vials in the dye library.

The NC State researchers also used modeling methods to cluster the dyes according to their structural similarity and displayed the results as dendrograms.

“If two dyes are close together on the dendrogram, it means they are structurally similar,” Fourches says. “You expect to see clusters where the dyes have similar structures and similar colors.” Through this analysis, the scientists found that blue dyes, which are the largest color family both in the overall library and in the recently published subset, had multiple structural scaffolds.
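The clustering behind a dendrogram can be illustrated with a small agglomerative routine that repeatedly merges the two closest groups. The Python sketch below assumes single linkage and a Tanimoto distance over hypothetical feature sets; the article does not specify the team's method, and in practice a library routine such as SciPy's hierarchical clustering would likely replace this hand-written loop.

```python
def tanimoto_distance(a, b):
    """1 minus Tanimoto similarity; 0 means identical feature sets."""
    return 1.0 - len(a & b) / len(a | b)


def single_linkage(features):
    """Agglomerative clustering; returns the merges in the order they occur."""
    clusters = [(name,) for name in features]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(
                    tanimoto_distance(features[x], features[y])
                    for x in clusters[i]
                    for y in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], round(d, 3)))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical feature sets for four dyes.
features = {
    "blue_1": {"anthraquinone", "amine", "hydroxyl"},
    "blue_2": {"anthraquinone", "amine", "hydroxyl", "methyl"},
    "red_1": {"azo", "phenyl", "nitro"},
    "red_2": {"azo", "phenyl", "nitro", "chloro"},
}
merges = single_linkage(features)
```

Each merge corresponds to one junction in a dendrogram, and the recorded distance is the height at which the two branches join. Here the two anthraquinone-like entries pair up first, then the two azo-like entries, and the two groups join last.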

But sometimes very similar structures lead to different colors. When the team searched the database for constitutional isomers, they found one pair of compounds in which the only difference was in the location of a chlorine atom. But that difference was enough to make one dye red and the other one orange. In another pair, swapping the positions of a primary amine and a hydroxyl group meant the difference between a yellow and an orange dye, even though both dyes have the same experimentally measured absorption maximum.
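Finding constitutional isomers in a structure table amounts to grouping entries by molecular formula and keeping pairs whose structures differ. The Python sketch below is one simple way to do that, not the team's method. It assumes formulas are written in one consistent notation and that the SMILES strings are canonical, and it uses two chloroanilines as stand-ins rather than dyes from the library.

```python
from collections import defaultdict
from itertools import combinations


def isomer_pairs(records):
    """Return (formula, name1, name2) for entries sharing a formula but not a structure."""
    by_formula = defaultdict(list)
    for name, formula, smiles in records:
        by_formula[formula].append((name, smiles))
    pairs = []
    for formula, group in by_formula.items():
        for (n1, s1), (n2, s2) in combinations(group, 2):
            if s1 != s2:  # same atoms, different connectivity
                pairs.append((formula, n1, n2))
    return pairs


# Stand-in compounds: the two chloroanilines differ only in where the chlorine sits.
records = [
    ("2-chloroaniline", "C6H6ClN", "Nc1ccccc1Cl"),
    ("4-chloroaniline", "C6H6ClN", "Nc1ccc(Cl)cc1"),
    ("aniline", "C6H7N", "Nc1ccccc1"),
]
pairs = isomer_pairs(records)
```

Each pair found this way can then be compared on color or absorption data to spot cases like the ones described above.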

The dye library is of interest to researchers in the wider chemistry community. “Any large, well-documented collection of colorants like this is of interest to those of us who work in conservation science and technical art history,” says Gregory D. Smith, senior conservation scientist at the Indianapolis Museum of Art. “While the cheminformatics approach is interesting in distilling meaning out of this large group of materials, these collections are most useful to our field when the samples have been fully characterized and the resulting spectral libraries can be used to assist in dye identification from artworks and artifacts.”

NC State scientists clustered the structures of 2,196 dyes based on similarity to create these dendrograms, which are colored according to the dyes’ color (left) and molecular weight (right).

But it will be a while before the entire collection is digitized, let alone fully characterized. Students have been drawing the structures so they can be converted into a digital format compatible with modeling software.

“From my perspective on the modeling side, the more unique structures I have to feed my computer, the better,” Fourches says. “But I know from the experimental point of view, the number of hours for undergrads and graduate students to write down the structures is enormous. We’re not sure yet what is the best strategy, but we need to find a path to get this library entirely digitized and modeled as soon as possible. And that requires funding.” The NC State team plans to seek grant funding and establish collaborations to continue characterizing the library.