Digitizing a massive dye library

[below text is an excerpt from C&EN written by Celia Henry Arnaud]

It takes a lot of drawers to hold the Max Weaver Dye Library. Each drawer holds between 600 and 1,000 vials.

A college of textiles is an unexpected place to find a chemist trained in high-resolution mass spectrometry. But the Wilson College of Textiles at North Carolina State University had something that attracted Nelson R. Vinueza: It is home to the Max A. Weaver Dye Library.

Weaver, a longtime researcher at Eastman Chemical, and his team collected dyes from the company over a period of more than 30 years. His efforts resulted in a treasure trove of about 98,000 vials of dye molecules dating from the 1960s to the 1980s, which Eastman donated to NC State in 2013, along with accompanying fabric swatches. NC State hired Vinueza that same year, and he’s now codirector, with Harold S. Freeman, of the dye library.

The library team is working to make the collection publicly available, in part to encourage researchers to find new uses for the compounds other than as textile dyes. As a first step toward that goal, Vinueza, computational chemist Denis Fourches, and coworkers have digitized and analyzed the structures of 2,700 of the dyes. The team recently reported the results of this work, giving chemists a first glimpse at the structural diversity within the collection (Chem. Sci. 2017, DOI: 10.1039/C7SC00567A).

The team wanted this first set to be representative of the collection, so their students opened drawers and randomly picked dyes to be included. They avoided the earliest dyes because they worried the older ones might have degraded.

“This sample is quite representative of the entire library in terms of the expected color distribution that Eastman provided,” Fourches says. “If we had obtained different distributions, we could have questioned whether this sample was really random.”

Each vial in the collection is labeled with the chemical structure of the dye. The team wanted to make sure the structures they analyzed were all unique ones. So after digitizing the structures, the researchers applied a standardization protocol that looked for salts, mixtures, and duplicates. Those compounds were removed from further analysis; the salts and mixtures remain in the database, however. “We want to make sure that in the final set, there is a very specific, well-defined structure for each dye that is curated,” Fourches says. The standardization protocol resulted in a set of 2,196 unique dyes.
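
The curation step described above can be illustrated with a minimal sketch. The article does not spell out the team's actual protocol, so this only shows the general idea. It assumes each record already carries a canonical SMILES string (the field names `vial_id` and `smiles` are hypothetical), treats any SMILES containing a `.` as a multi-component entry (a salt or mixture), and treats repeated strings as duplicates.

```python
def curate(records):
    """Split records into unique single-component dyes and flagged entries.

    records: list of dicts with hypothetical keys 'vial_id' and 'smiles'.
    Assumes the SMILES strings are already canonical, so that equal
    strings mean equal structures.
    """
    unique, flagged = [], []
    seen = set()
    for rec in records:
        smiles = rec["smiles"]
        if "." in smiles:
            # A dot separates disconnected components: salt or mixture.
            flagged.append((rec["vial_id"], "multi-component"))
        elif smiles in seen:
            flagged.append((rec["vial_id"], "duplicate"))
        else:
            seen.add(smiles)
            unique.append(rec)
    return unique, flagged


# Made-up example records, not entries from the dye library.
records = [
    {"vial_id": "A1", "smiles": "c1ccccc1N=Nc1ccccc1"},
    {"vial_id": "A2", "smiles": "c1ccccc1N=Nc1ccccc1"},
    {"vial_id": "A3", "smiles": "[Na+].[O-]S(=O)(=O)c1ccccc1"},
]
unique, flagged = curate(records)
```

Flagged entries are returned rather than discarded, which mirrors the article's point that salts and mixtures stay in the database even though they are left out of the analysis.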

They searched these structures for well-known chromophores, such as anthraquinone and stilbene substructures. The azo group is the most common substructure in the data set.

The team also acquired high-resolution mass spectra of a randomly selected 74-compound subset to make sure the mass, isotopic distribution, and elemental composition matched the expected values from the structures on the labels.
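
A check of this kind usually comes down to comparing a measured mass with the monoisotopic mass calculated from the labeled structure. The sketch below is an illustration under assumptions, not the team's workflow: it handles only a few elements, assumes a singly protonated ion, and uses azobenzene with a made-up measured value as a stand-in for a library dye.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207117,
    "Cl": 34.96885268,
}
PROTON_MASS = 1.00727647


def monoisotopic_mass(formula):
    """Sum monoisotopic masses for a simple formula such as 'C12H10N2'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def ppm_error(measured, expected):
    """Relative mass error in parts per million."""
    return (measured - expected) / expected * 1e6


# Azobenzene as a stand-in; the measured m/z is hypothetical.
expected_mz = monoisotopic_mass("C12H10N2") + PROTON_MASS  # [M+H]+
error = ppm_error(183.0919, expected_mz)
```

An error of a few parts per million is commonly taken as consistent with the proposed elemental composition. The acceptance threshold used by the team is not given in the article.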

Swapping the positions of amino and hydroxyl groups is enough to change the colors of these dyes from yellow (left) to orange (right), even though they have the same absorption maximum (387 nm).

“We just had the vials and structure,” Vinueza says. “High-resolution mass spectrometry gives us an idea if what we are searching is correct. That gives us more confidence about the structures.”

They characterized the dyes’ properties using cheminformatics software. “We can screen the library for dyes that are similar in terms of structure, shape, volume, or charge distribution to other dyes that have known biological or physicochemical properties of interest,” Fourches says. “It is very likely that in this library we have potential antibiotics, anticancer agents, new ways of coating materials.” This modeling provides a fast and inexpensive way to prioritize dyes for testing for these other properties.
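
One common way to score structural similarity in this kind of screen is the Tanimoto coefficient between molecular fingerprints. The article does not say which measure the team used, so the following is only a sketch of the idea. Fingerprints are represented as plain sets of integer feature IDs, and the dye names and feature values are invented.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_similarity(query, library):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints: each integer stands for a structural feature.
known_active = {1, 2, 3, 4}
library = {
    "dye_a": {1, 2, 3, 5},
    "dye_b": {7, 8},
    "dye_c": {1, 2, 3, 4, 6},
}
ranking = rank_by_similarity(known_active, library)
```

Dyes at the top of such a ranking would be the first candidates to test for the property of the known compound.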

The team has entered 150 of the dyes into the ChemSpider online structure database. When their collaborator Antony J. Williams of the National Center for Computational Toxicology compared the structures to the 58 million in ChemSpider, only seven were already in the database. “It’s almost the same probability as winning the lottery,” Fourches says. “It shows how unique the chemistry is in the dye library.”

A close-up of some of the vials in the dye library.

The NC State researchers also used modeling methods to cluster the dyes according to their structural similarity and displayed the results as dendrograms.

“If two dyes are close together on the dendrogram, it means they are structurally similar,” Fourches says. “You expect to see clusters where the dyes have similar structures and similar colors.” Through this analysis, the scientists found that blue dyes, which are the largest color family both in the overall library and in the recently published subset, had multiple structural scaffolds.
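
A dendrogram records the order in which items are merged during hierarchical clustering. The sketch below uses single-linkage agglomerative clustering on a small, invented distance table. The linkage method and distance measure the team actually used are not stated in the article, so both are assumptions here.

```python
def single_linkage(distances, labels):
    """Agglomerative single-linkage clustering.

    distances: dict mapping frozenset({a, b}) to the distance between a and b.
    Returns the merges as (cluster_a, cluster_b, distance) in the order
    they are performed, which is the information a dendrogram draws.
    """
    clusters = [frozenset([label]) for label in labels]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(
                    distances[frozenset((a, b))]
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((set(clusters[i]), set(clusters[j]), d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Invented structural distances (for example, 1 minus a similarity score).
labels = ["blue_1", "blue_2", "red_1"]
distances = {
    frozenset(("blue_1", "blue_2")): 0.1,
    frozenset(("blue_1", "red_1")): 0.8,
    frozenset(("blue_2", "red_1")): 0.7,
}
merges = single_linkage(distances, labels)
```

In this toy case the two blue dyes merge first at a small distance, and the red dye joins much later, which is the pattern Fourches describes for structurally similar dyes sitting close together.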

But sometimes very similar structures lead to different colors. When the team searched the database for constitutional isomers, they found one pair of compounds in which the only difference was in the location of a chlorine atom. But that difference was enough to make one dye red and the other one orange. In another pair, swapping the positions of a primary amine and a hydroxyl group meant the difference between a yellow and an orange dye, even though both dyes have the same experimentally measured absorption maximum.
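
Constitutional isomers share a molecular formula but differ in connectivity, so a simple way to find candidates is to group structures by formula. The sketch below assumes formulas are written in a consistent element order and that SMILES strings are canonical. The two chlorophenols and phenol are textbook stand-ins, not dyes from the library.

```python
from collections import defaultdict


def isomer_groups(dyes):
    """Group distinct structures that share a molecular formula.

    dyes: iterable of (formula, smiles) pairs.
    Returns only the formulas that map to more than one structure.
    """
    by_formula = defaultdict(set)
    for formula, smiles in dyes:
        by_formula[formula].add(smiles)
    return {f: sorted(s) for f, s in by_formula.items() if len(s) > 1}


# 2-chlorophenol and 4-chlorophenol differ only in the chlorine position.
dyes = [
    ("C6H5ClO", "Oc1ccccc1Cl"),
    ("C6H5ClO", "Oc1ccc(Cl)cc1"),
    ("C6H6O", "Oc1ccccc1"),
]
groups = isomer_groups(dyes)
```

Each group is a starting point for the kind of comparison described above, where recorded colors can be checked against small positional differences.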

The dye library is of interest to researchers in the wider chemistry community. “Any large, well-documented collection of colorants like this is of interest to those of us who work in conservation science and technical art history,” says Gregory D. Smith, senior conservation scientist at the Indianapolis Museum of Art. “While the cheminformatics approach is interesting in distilling meaning out of this large group of materials, these collections are most useful to our field when the samples have been fully characterized and the resulting spectral libraries can be used to assist in dye identification from artworks and artifacts.”

NC State scientists clustered the structures of 2,196 dyes based on similarity to create these dendrograms, which are colored according to the dyes’ color (left) and molecular weight (right).

But it will be a while before the entire collection is digitized, let alone fully characterized. Students have been drawing the structures so they can be converted into a digital format compatible with modeling software.

“From my perspective on the modeling side, the more unique structures I have to feed my computer, the better,” Fourches says. “But I know from the experimental point of view, the number of hours for undergrads and graduate students to write down the structures is enormous. We’re not sure yet what is the best strategy, but we need to find a path to get this library entirely digitized and modeled as soon as possible. And that requires funding.” The NC State team plans to seek grant funding and establish collaborations to continue characterizing the library.

C&EN Magazine Explores the Weaver Dye Library

[below text is an excerpt from C&EN Magazine written by Carmen Drahl]

Demands for colorful, cheap, and diverse dyes drove the transformation of chemistry into a modern science. To some, it may seem like dyestuffs are the stuff of times past. A dye collection recently donated to a university, however, might show that these compounds still hold the key to some cutting-edge chemistry problems.

Eastman Chemical has donated a dye collection spanning 50 years of the company’s research, beginning in the 1940s, to North Carolina State University. The library includes vials filled with vibrant powders, each meticulously hand-labeled; coordinating envelopes stuffed with dyed fabric swatches and testing data; and post-World War II intelligence reports on the German dye industry. The collection is named for the late Max A. Weaver, a longtime Eastman research leader who made the library his life’s work.

“This dye collection is a research treasure trove,” says David Hinks, director of the university’s Forensic Sciences Institute. NC State has agreed to build a digital database of the approximately 98,000 compounds. All of the structures, previously trade secrets, will become available to the public on the Royal Society of Chemistry’s ChemSpider database.


Kelsey puts her passion for sharing science into practice

Green Chemical Graphic in Round Bottom Flask

Kelsey Boes, one of our current PhD students, took her enthusiasm for strong science communication to DC. With degrees in both chemistry and studio art, Kelsey is passionate about both science and design. What truly excites her is not only discoveries but also sharing them in dynamic and thoughtful ways. As an undergraduate, Kelsey travelled to the 246th American Chemical Society (ACS) National Conference to present her work and became aware of the complex and prohibitive way research is often shared at such events. "Science, as a field," Kelsey admits, "has the reputation of being difficult and uninteresting," but she believes this can be remedied by presenting information in a more diverse variety of forms, specifically graphics. "Convincing visuals," she says, "excel at engaging viewers of all backgrounds."

Wanting to put these ideas into practice, Kelsey set a goal to share a poster at the ACS Green Chemistry & Engineering Conference this past summer, redesigning the mass spectrometer schema used by Vinueza Labs into a visually expressive teaching tool. She applied for and received a competitive NSF Scholars travel scholarship to attend a green chemistry workshop and to present her work in biofuels at the opening session of the 19th ACS Green Chemistry & Engineering Conference in Washington, DC this past June. Below you can find her graphic representation of her biofuel analysis project.

Biofuel Refinery Byproduct Analysis Project Graphic

Project Summary: Economic success of a biofuel refinery requires efficiency at every step. This includes analyzing byproducts for potential value and reuse. One such byproduct is the water stream produced after pretreatment, labelled as autohydrolyzate, which contains several valuable organic derivatives of hemicellulose and lignin from within the biomass. Unfortunately, this mixture is highly complex and difficult to analyze fully with just one instrument. However, mass spectrometry (MS), with its high sensitivity and versatility, can be a very useful tool in the analysis of these complex mixtures. We explored the use of dopants (sodium hydroxide and ammonium chloride) in electrospray ionization in combination with tandem MS (MS/MS) for characterization.

Yufei Chen receives Master's Degree and leaps into PhD

Yufei meditating on his progress through grad school.

Yufei Chen, the longest-standing member of the group, received his Master's Degree in July 2015. His project explored fragmentation pathways of anthraquinone-based dyes from the Eastman Dye Library bequeathed to the NC State Wilson College of Textiles in 2013. Yufei's work confirmed and corrected the vast database of dye structures through careful mass spectrometric analysis. He is excited to begin work on his PhD under Dr. Vinueza and Dr. Freeman.