Sunday, November 13, 2016

OpenTox Euro 2016: "Data integration with identifiers and ontologies"

Results from a project by MSP students.
J. Windsor et al. (2016): Volatile Organic Compounds: A Detailed Account of Identity, Origin, Activity and Pathways. Figshare.
A few weeks ago the OpenTox Euro 2016 meeting was held in Rheinfelden at the German/Swiss border (which allowed me a nice stroll across the Rhine into Switzerland and past a nice x-mas countdown clock). The meeting was co-located with eNanoMapper-hosted meetings, where we discussed, among other things, the nanoinformatics roadmaps that outline where research in this area should go.

There were many interesting talks around various data initiatives, adverse outcome pathways (AOPs) and their links to molecular initiating events (MIEs), and ontologies (like the AOP ontology talk). In fact, I quite enjoyed the discussion with Chris Grulke about ontologies during the panel discussion. The central question was where the border lies between data and ontological concepts. Some slides are available via Lanyrd.

During the Emerging Methods and Practice session hosted by Ola Spjuth, I presented the work at the BiGCaT department on identifier mapping and the use of ontologies for linking data sets.

The presentation integrates a lot of things I have been working on in the last few years. Please note the second slide, which lists all the people I have worked with on the things presented in these slides.

Recent presentation: "Open Access: a practical perspective"

Source: Wikimedia Commons
For a local grant acquisition course I recently gave a presentation about Open Access (OA). My interest in OA started from my Open Science background, where lack of access to literature was a serious problem. Journals were invented to make knowledge dissemination easier, but many publishers are stuck with outdated technologies, so their knowledge dissemination has not caught up with the 21st century. BTW, OA to me is the kind that actually really helps knowledge dissemination and allows:
  1. download and use (text mining!)
  2. modification (format change!)
  3. redistribution (allow others to read it too! share your modifications!)
There are several stories around showing that fast knowledge exchange saves lives (is there an overview of well-documented examples?). Honestly, I would be surprised if people do not also die because of disseminated knowledge, but then it is because of misuse of knowledge, and not because of knowledge denied. And this is what access to knowledge can mean:
It shows that you can get far with access to the right knowledge (here in the form of data). This must be a right every human has. In fact, it is in part, but as often, legal wording complicates things. Wikipedia has a good overview. Like with free speech, it tries to find a balance between the rights of all people: the right of one cannot restrict the rights of others. Well, I don't know if "cashing in" is a human right, but surely many people believe so.

And not every human has the opportunity that Pepke had. Access to knowledge is a serious problem. It is a problem I am facing every week myself, and I find myself at a relatively well-equipped Maastricht University Library. A recent study found that even researchers at my university found Sci-Hub an important resource, as can be seen in the slides below. I do not encourage Sci-Hub. The legal basis is unclear, but at least it has not been found illegal at this moment (as far as I could keep up with the process). And there are many alternatives, which I blogged about earlier.

Fact is, we have a knowledge dissemination issue. And that was the main message of my presentation, because it is easy to solve as an author: don't give away your IP to publishers, and choose an Open Access license for your work (the gold OA version, as green OA is like the Rolex you buy for 10 euro at the black market).

And I'll end with this quote from John Oliver:

"Knowledge dissemination: a topic you know so little about, you think the best kind of dissemination is a Nature journal ReadCube."

Pepke, S., Steeg, G. V., Sep. 2016. Comprehensive discovery of subsample gene expression components by information explanation: therapeutic implications in cancer. bioRxiv, 043257+.

Friday, November 11, 2016

New paper: "SPLASH, a hashed identifier for mass spectra"

I'm excited to have contributed to this important (IMHO) interoperability paper around metabolomics data: "SPLASH, a hashed identifier for mass spectra" (doi:10.1038/nbt.3689, readcube:msZj). A huge thanks to all involved in this great collaborative project! The source code is fully open source and the project is coordinated by Gert Wohlgemuth, the lead author on this paper. It provides an implementation of the algorithm in various programming languages, and I'm happy that the splash functionality is available in the just-released Bioclipse 2.6.2 (taking advantage of the Java library). An R package by Steffen Neumann is also available.

This new identifier greatly simplifies linking between spectral databases and will in the end contribute to a Linked Data network. Furthermore, journals can start adopting this identifier and list the 'splash' for mass spectra in documents, allowing for simplified dereplication and finding additional information around spectra.
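
To make the linking idea a bit more concrete, here is a minimal sketch in Python. It does not compute a SPLASH (use one of the project's libraries for that); it only assumes each record already carries its SPLASH string and treats an identical string in two sources as a sign that both describe the same spectrum. The function name, the record layout, the database names, and the SPLASH-like strings are all made up for illustration.

```python
from collections import defaultdict


def link_by_splash(sources):
    """Group spectrum records from several sources by their SPLASH string.

    `sources` maps a (hypothetical) database name to a list of records,
    each a dict with an "id" and a precomputed "splash" string.
    Returns only the SPLASH strings found in more than one source.
    """
    hits = defaultdict(list)
    for source_name, records in sources.items():
        for record in records:
            hits[record["splash"]].append((source_name, record["id"]))
    # Keep identifiers that occur in at least two different sources.
    return {
        splash: found
        for splash, found in hits.items()
        if len({name for name, _ in found}) > 1
    }


# Made-up records; these strings are placeholders, not real SPLASH values.
sources = {
    "database_a": [
        {"id": "A-001", "splash": "splash10-placeholder-0001"},
        {"id": "A-002", "splash": "splash10-placeholder-0002"},
    ],
    "database_b": [
        {"id": "B-917", "splash": "splash10-placeholder-0002"},
        {"id": "B-918", "splash": "splash10-placeholder-0003"},
    ],
}

links = link_by_splash(sources)
for splash, found in links.items():
    print(splash, found)
```

The point is only that a shared identifier turns cross-database linking into a plain key lookup, without comparing peak lists.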

There are several databases that have adopted the SPLASH already, such as MassBank, HMDB, MetaboLights, and the OSDB published in JCheminf recently (doi:10.1186/s13321-016-0170-2).

Screenshot snippet of a spectrum in the OSDB.

PS. I personally don't like the idea of ReadCubes (which I may blog about at some point) and how they have been pitched as a "legal" way of sharing papers, but this journal does not have a gold Open Access option, unfortunately.

Wohlgemuth, G., Mehta, S. S., Mejia, R. F., Neumann, S., Pedrosa, D., Pluskal, T., Schymanski, E. L., Willighagen, E. L., Wilson, M., Wishart, D. S., Arita, M., Dorrestein, P. C., Bandeira, N., Wang, M., Schulze, T., Salek, R. M., Steinbeck, C., Nainala, V. C., Mistrik, R., Nishioka, T., Fiehn, O., Nov. 2016. SPLASH, a hashed identifier for mass spectra. Nature Biotechnology 34 (11), 1099-1101.

Sunday, October 16, 2016

New paper: "XMetDB: an open access database for xenobiotic metabolism"

Back in 2013 at the OpenTox conference in Mainz I spoke with Ola, Patrik, and Nina. They were working on a database for CYP metabolism, XMetDB, which I joined on the spot. The database has Open Data, an Application Programming Interface (API), is Open Source, and has a good amount of experimental detail, like the specific enzyme involved and the actual atom mapping of the reaction. A few weeks ago, the paper describing the database was published in the Journal of Cheminformatics (doi:10.1186/s13321-016-0161-3). It's not perfect, but we hope it is a seed for more to follow.

The data, it turns out, is really hard to come by. While I was adding data to the database for the best-selling drugs, it was hard to find publications where a human experiment was done (many experiments use rat microsomes). Not only does that make it hard to identify the specific CYP enzyme, it also is not the human homologue. BTW, since the background of this paper is to create a knowledge base for computational prediction of CYP metabolism, ideally we would even have a specific protein sequence, including any missense SNPs affecting the 3D structure of the enzyme.

However, even for the (at least then) best-selling drug aripiprazole, literature was really hard to find! There is a lot of literature just copy/pasting knowledge from other papers, and those other "papers" may in fact be the information sheet you get when you buy the actual drug. Alternatively, personal communication and conference posters can be cited as primary literature too. This only stresses the importance of a database like this.

At this moment the project is stalled. None of the currently involved groups has funding for continued development. I guess collaborations are welcome! ChEMBL 22 now has metabolism data for compounds, but I have not explored yet if it has all the details for the transformations needed for XMetDB. At the very least, it may serve as a source of primary literature references.

Spjuth, O., Rydberg, P., Willighagen, E. L., Evelo, C. T., Jeliazkova, N., Sep. 2016. XMetDB: an open access database for xenobiotic metabolism. Journal of Cheminformatics 8 (1). doi:10.1186/s13321-016-0161-3

Friday, September 30, 2016

NanoSafety Cluster presentation: Open Data & NSC Activities

Two weeks ago (already!), the NanoSafety Cluster (NSC) organized two meetings. First, on Wednesday afternoon, there was the NSC half-yearly meeting. Second, on Thursday and Friday, in the beautiful Visby on Gotland, there was the 2nd NanoSafety Forum for Young Scientists. I ran an experiment there, which I will blog about later. Here, please find the slides of the presentation about Open Data I gave on Wednesday:

Oh, and I also presented a few slides about the Working Group 4 activities: