Creating an inexhaustible database for drug molecules

Ground-breaking drug discovery work by UCSF School of Pharmacy scientists, who search for drug candidates from stores of small molecules and through computer simulations, continued this week with the announcement of a new virtual library of more than 250 million drug-like compounds for researchers to mine for tomorrow’s cures.

Brian Shoichet, PhD, and John Irwin, PhD, both faculty members in the School’s Department of Pharmaceutical Chemistry, unveiled the new database, which was developed in collaboration with colleagues at the University of North Carolina and described on February 6, 2019, in Nature.

To build the database, they merged a commercial list of more than one billion novel chemical compounds, provided by Enamine Ltd., a company based in Kiev, Ukraine, with ZINC, a free, 3-D library of molecules designed by Irwin at UCSF and available for virtual screening. To demonstrate its potential, they used Shoichet’s DOCK software to search the database for potential drugs, targeting two unrelated proteins: a bacterial enzyme, beta-lactamase, which is involved in antibiotic resistance, and the neuronal D4 dopamine receptor, which has been implicated in psychosis and addictive behavior.

For each protein target, they discovered a new compound that showed an extremely high preference for binding with it.

Shoichet spoke to the School of Pharmacy’s editorial director, Grant Burningham, about the promise of this massive virtual library and its potential to accelerate the work of scientists and clinicians.

Burningham: Tell me about Enamine’s database. How can scientists use it?

Shoichet: Well, it’s not really a database; it’s more like a list of molecules they say they can make in a lab. What we’ve done is make that list structured and searchable using John Irwin’s ZINC database. ZINC launched back in 2005 and has long been the main database for finding molecules with certain properties. You can look online and see how much it gets used; it’s really astounding.

We [virtually] imported all of Enamine’s molecules, and it increased the number [of compounds] in ZINC by about 100-fold. It went from around 3.5 million to a quarter of a billion compounds, and we expect it to exceed a billion by early next year. It’s an amazing leap for scientists who look at small molecules. The number of molecules we have access to is almost unbounded now. The only thing limiting us is our ability to enumerate molecules and then search them.

Burningham: What is “docking” and why will pairing it with these molecules help drug discovery?

Shoichet: Well, the problem is finding things. Do you like shopping?

Burningham: Not really.

Shoichet: Me either. Imagine you suddenly heard that the clothing store was going to be 100 times bigger. So you need to buy a shirt and tie, and there are hundreds of aisles of shirts and ties. How do you find what you need? And what would be even worse is for these shirts to be clustered in a way where they look the same, aisles and aisles of slightly different brown shirts, for instance.

What docking does is take each molecule and fit it into a structure and organize these molecules by how they’re shaped. It can handle one possibility every microsecond. Now you’re walking into that massive store but there’s a salesman who can point you in the right direction, and can show you a few shirts that fit your needs.
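
To make the shopping analogy concrete, here is a minimal Python sketch of the "score every molecule, keep the best few" idea. It is not the DOCK program or its scoring method: the `toy_score` function, the `pocket_size` parameter, and the five-molecule library are hypothetical stand-ins chosen only for illustration, and the score (how far a string's length is from a target size) has no chemical meaning.

```python
import heapq

def toy_score(molecule: str, pocket_size: int) -> int:
    """Hypothetical stand-in for a docking score; lower is better.

    Assumption for illustration only: the 'fit' is the gap between the
    length of the molecule's text string and the pocket size.
    """
    return abs(len(molecule) - pocket_size)

def top_candidates(library, pocket_size, k):
    """Score every molecule and return the k best-scoring ones."""
    return heapq.nsmallest(k, library, key=lambda m: toy_score(m, pocket_size))

# A tiny made-up library of molecules written as text strings.
library = ["CCO", "CCCCCC", "c1ccccc1", "CC(=O)O", "CCN(CC)CC"]

best = top_candidates(library, pocket_size=8, k=2)
print(best)
```

The point of the sketch is the shape of the search: every item in the store gets a score, and the "salesman" hands back only a short list, so the shopper never has to walk every aisle.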

Burningham: So put the number one billion into perspective. How many different drug molecules are there now?

Shoichet: There are only about 1,700 drugs, of which about 1,300 are small-molecule drugs; the rest are protein drugs or antibodies. But there is a larger set of molecules that has been tested in humans or in other tests, and there are a bunch of molecules that are just interesting and get used a lot by biologists or chemists.

In our old database we had 3.5 million of these molecules, which you could actually buy from labs and weren’t too hard to make or dissolve. A lot of molecules, which we excluded, are just too greasy to use.

Burningham: As in actually greasy?

Shoichet: Yeah, they’re too hydrophobic. They won’t play well with water; they’re repelled by it, which makes them hard to work with.

Burningham: As a test, you found two molecules, one that inhibited a bacterial enzyme, beta-lactamase, which is involved in antibiotic resistance, and another that activated the neuronal D4 dopamine receptor. Did you confirm that these molecules work in petri dishes?

Shoichet: Yes, we ordered the molecules and then tested them, and they worked. And these are molecules that have never existed before on the planet. The astonishing thing was that we found those two molecules using a screen of just 549 molecules against the D4 receptor, which was a big undertaking. The simulation suggests that over 450,000 molecules in the same library could have potentially worked. We literally tested one-thousandth of the molecules the program found interesting, and several were among the strongest ever found. It was just a small slice of the total library. Who knows what’s to be found in the 99.9 percent of the molecules that remain untested.
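
The "one-thousandth" figure can be checked from the two numbers quoted above. The short Python sketch below uses only those figures (549 molecules tested, 450,000 predicted candidates); because the interview says "over 450,000," the computed fraction should be read as an approximate upper bound. The variable names are illustrative.

```python
# Figures quoted in the interview.
tested = 549            # molecules ordered and tested against the D4 receptor
predicted = 450_000     # lower bound on molecules the simulation flagged

fraction_tested = tested / predicted
fraction_untested = 1 - fraction_tested

print(f"tested:   {fraction_tested:.3%}")
print(f"untested: {fraction_untested:.3%}")
```

The tested share works out to roughly 0.12 percent, which is consistent with the rounded "one-thousandth" and "99.9 percent" in the interview.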

Burningham: How long until researchers will exhaust this new database? How many drugs do you think could come out of this?

Shoichet: Oh, forever. They won’t exhaust it. There are always new targets you’re using it for. And 99.9 percent of the things we could have legitimately tested, we were unable to test, just because we didn’t have the resources to do so.

Burningham: Who will have access to this?

Shoichet: Everybody. We make the program and the library, both developed with patient support from the National Institutes of Health (NIH) over many years, completely open access. This is a general technology and the potential treatments could fit almost every disease, from cancer to epilepsy to pain to disorders of the central nervous system to heart disease, you name it.

Burningham: Thank you so much for talking to me today.

Shoichet: My pleasure.


About the School: The UCSF School of Pharmacy aims to solve the most pressing health care problems and strives to ensure that each patient receives the safest, most effective treatments. Our discoveries seed the development of novel therapies, and our researchers consistently lead the nation in NIH funding. The School’s doctor of pharmacy (PharmD) degree program, with its unique emphasis on scientific thinking, prepares students to be critical thinkers and leaders in their field.