novo|seek is an information extraction system developed by the Spanish IT company Bioalma for searching published knowledge in biomedical literature. Last month, I had the opportunity to talk in depth about novo|seek with Ramón Alonso-Allende, the Marketing and Business Development Director at Bioalma.
I first heard about novo|seek earlier this year through the editor of AltSearchEngines.com, Charles Knight. I then began to see references to it on Twitter and posts by it in The Life Scientists room on FriendFeed. I tried out novo|seek and wrote a glowing review for AltSearchEngines. I then spoke with Ramón Alonso-Allende by phone, he in Madrid and me in my home in Oregon. During the course of our interview he said some fascinating things that are included below.
The Interview
First of all, you began by discussing the origins of Bioalma, the parent firm of novo|seek. You mentioned a research scientist who helped found the company and who developed some key techniques in biomedical search. Could you please tell us who that person is, what search tools he developed and how they led to the development of novo|seek?
The technology base of novo|seek originated from the National Center of Biotechnology, and was largely invented by Prof. Alfonso Valencia, a former director at the Protein Design Group at the NCB, who now serves as Director of the Structural Biology and the biocomputational program at the Spanish National Center for Cancer Research (CNIO).
This technology was formed when Bioalma became involved in examining literary resources on the topic of predicting the function of genes. We found that using text mining had not been implemented as a solution as much as it could have and we saw an opportunity because the use of text mining technologies reveals the relations among concepts within literature — relations that would be difficult to expose over the course of reading a wide breadth of articles to find what you’re looking for. We saw the potential for text mining as a way to extract the information stored within scientific literature.
What is your own professional background and role in novo|seek?
I’ve been working in the biomedical IT sector for 10 years. I initially joined Bioalma in the scientific research unit and now lead the marketing and business development team. I started my career as a CRA developing a new electronic data collection system and then moved to the Protein Design Group at the National Center of Biotechnology to coordinate the bioinformatic development of a European molecular biology project, during which I developed molecular data integration systems. I have participated in other European projects developing data integration interfaces and have a degree in Pharmacy from the Universidad San Pablo in Madrid and an MBA from the Instituto de Empresa, Madrid.
Who is the primary audience for novo|seek? Basic scientists? Frontline medical providers? Can you give us some real-world examples? Say I’m a molecular biologist. How would I use novo|seek? Say I am a practicing neurologist: how would I use novo|seek to help me determine the best treatment for a 40-year-old man with longstanding epilepsy whom I have never treated before?
novo|seek is aimed at a variety of audiences within the biomedical and medical community including: medical doctors and students, medical librarians, and biomedical scientists and researchers.
For example, novo|seek can be used by scientists researching a cure for disease, or the research associated with a particular gene, by allowing them to find the right information that’s the most relevant and the most comprehensive. Another example might be for medical doctors conducting patient-related queries — novo|seek enables them to execute searches that will bring up the most relevant information immediately, without requiring a labor-intensive search. In terms of medical librarians, novo|seek provides the comprehensive, updated search results that they require as it is equipped to process and catalogue thousands of new articles per day.
There are several ways to perform a search, but we should start as a regular user, namely from the main search box. If we type in the main search terms as specified for this research, such as “longstanding epilepsy”, we get 46 results (as of July 6th, 2009).
At this point, when the results are sorted by date in novo|seek, they are very similar to PubMed’s. The next step is to make the search more precise by adding the filter “epilepsy” to the current search. To do this, we click on the first concept in the category “Diseases or Syndromes” in the left sidebar. We now have 34 results in Medline.
One of the main issues we face is that “longstanding epilepsy” is not recognized as a disease and therefore is not mapped. This is why both “longstanding” and “epilepsy” appear in bold in the results. However, we chose to sort results by relevance and start looking at them. The first one looks promising.
In the case that it is not, we can keep scrolling for more results.
Novice researchers can use the concepts sidebar to refine the search even further. On the other hand, advanced users could use the following query to extract results for a 40-year-old male suffering from longstanding epilepsy:
"adult"[mesh] "male"[mesh] longstanding epilepsy
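To make the structure of that advanced query concrete, here is a minimal Python sketch that assembles such a string from tagged concepts plus free text. The function name build_query and its parameters are hypothetical and used only for illustration; only the "term"[tag] syntax comes from the interview, and this is not novo|seek’s own code or API.

```python
def build_query(tagged_terms, free_text=""):
    """Join (term, tag) pairs as "term"[tag] and append optional free text.

    Hypothetical helper: it only formats a string in the style shown
    in the interview and does not contact any search service.
    """
    parts = ['"{}"[{}]'.format(term, tag) for term, tag in tagged_terms]
    if free_text:
        parts.append(free_text)
    return " ".join(parts)


# Restrict to the MeSH concepts "adult" and "male", then add free-text terms.
query = build_query([("adult", "mesh"), ("male", "mesh")], "longstanding epilepsy")
print(query)
```

The tagged terms pin down which concept is meant, while the free-text part is left for the engine to match as ordinary words.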
You said during the phone interview that you have been pleasantly surprised by the interest librarians have been showing in novo|seek. What feedback have they been giving you? How are they learning about it?
As potential users of novo|seek, librarians are a very important audience for us. From our point of view, knowing how they use novo|seek and what improvements they can suggest is priceless. Sometimes they contact us directly and sometimes we converse with them on Twitter, where we address their queries directly. These are very useful interactions during which we can help each other. We enjoy having the opportunity to show them how to use novo|seek properly. On the other hand, some of them have already adopted novo|seek and have written detailed reviews of it on their blogs so their colleagues can understand it better. One example is María García-Puente Sánchez, a Spanish librarian at Biblioteca Médica Virtual.
Also, we try to address researchers’ needs, such as this one, which was brought to us by @Puldiszek and received a positive response.
Additionally, we have published tutorial videos to explain a particular search and how to make better use of novo|seek.
Last but not least, users are interested in giving us feedback as we continue to develop and enhance novo|seek. We are currently preparing a PowerPoint presentation for librarians to use as a classroom tool, as well as for their own use should they wish to understand novo|seek better. We are already sharing it on Slideshare, so they can review it and help us make it as helpful as possible.
In order for readers to understand what makes novo|seek such a valuable tool, they need to understand certain terms. Could you please define for us the following terms:
ONTOLOGY: Ontology refers to a specification of a conceptualization. It is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents. This definition is consistent with the usage of ontology as set-of-concept-definitions, but more general, and is a different sense of the word than its use in philosophy.
TEXT MINING: In general, text-mining applications take advantage of a range of domain-independent methods such as part-of-speech (POS) taggers, which label each word with its corresponding part of speech (e.g. noun, verb or adjective), or stemmers, which are algorithms that return the morphological root of a word form.
Instead of just searching for relevant terms, novo|seek goes a step further by indexing the literature using text mining to combine:
- information extraction (attempts to identify meaningful semantic structures within free text using strategies based on POS taggers);
- named entity recognition (term recognition, classification and mapping to a desired concept);
- and knowledge discovery technologies (which find hidden information in the literature by exploring the internal structure of the knowledge network created by the textual information; techniques such as pattern matching and syntactic analysis can highlight relevant text passages from large abstract collections) in order to provide the most relevant and meaningful biomedical search results.
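As a rough illustration of the two domain-independent methods named in the definition above, the following Python sketch pairs a toy suffix-stripping stemmer with a toy dictionary-based POS tagger. The lexicon, the suffix list and the function names are all assumptions made up for this example; real text-mining systems use trained taggers and far more careful stemming rules, and nothing here reflects how novo|seek is implemented.

```python
import re

# Toy lexicon (an assumption for this sketch); real taggers are trained on annotated text.
LEXICON = {
    "the": "DET",
    "protein": "NOUN",
    "receptors": "NOUN",
    "binds": "VERB",
}

# Suffixes removed by the toy stemmer, tried in this order.
SUFFIXES = ("ing", "ed", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tag(sentence):
    """Return (token, part_of_speech, stem) triples; unknown words get 'UNK'."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return [(token, LEXICON.get(token, "UNK"), stem(token)) for token in tokens]


for triple in tag("The protein binds the receptors"):
    print(triple)
```

Once word forms such as “binds” and “binding” reduce to a shared stem and each token carries a part-of-speech label, later steps like entity recognition have more regular input to work with.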
DATA MINING: Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. As opposed to text mining, data mining technologies operate with structured data.
CONCEPTUAL SEARCH: Conceptual search is defined as the ability to retrieve relevant information without requiring the occurrence of the search terms in the retrieved documents. Most search technology in use today is traditional keyword search that requires the search term to appear in the retrieved documents. Many of these traditional search engines have mimicked conceptual search through the use of synonym lists and other human-maintained query expansion approaches. True conceptual search retrieves relevant information without requiring the presence of the search terms, and it does so without relying on query expansion or independently maintained lexicons, taxonomies or synonym lists. This is why conceptual search is distinctly different from keyword search and why it is able to adapt to changes in language and the use of slang. Conceptual search allows you to locate information about a topic by understanding what words mean in a given context.
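The contrast drawn in this definition between plain keyword search and synonym-list query expansion can be sketched in a few lines of Python. The three documents, the synonym list and the function names below are invented for illustration. Note that this shows only the keyword approach and the hand-maintained expansion that mimics conceptual search; it does not implement true conceptual search, which by the definition above works without such lists.

```python
# Tiny invented corpus: document id -> text.
DOCS = {
    1: "vitamin c intake and immune response",
    2: "ascorbic acid supplementation in adults",
    3: "iron deficiency in children",
}

# Hypothetical hand-maintained synonym list.
SYNONYMS = {"vitamin c": ["ascorbic acid"]}


def keyword_search(query, docs):
    """Return ids of documents that literally contain the query string."""
    needle = query.lower()
    return sorted(doc_id for doc_id, text in docs.items() if needle in text)


def expanded_search(query, docs, synonyms):
    """Return ids of documents containing the query or any listed synonym."""
    needle = query.lower()
    terms = [needle] + synonyms.get(needle, [])
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if any(term in text for term in terms)
    )


print(keyword_search("Vitamin C", DOCS))
print(expanded_search("Vitamin C", DOCS, SYNONYMS))
```

The keyword version misses the document that says “ascorbic acid”, and the expanded version finds it only because someone added that synonym by hand, which is the maintenance burden the definition points to.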
There is a lot going on in biomedical search these days. Could you please tell us how Novoseek differs from GoPubMed, DeepDyve, Mednar, Wolfram Alpha and from the gold standard of medical search, PubMed?
What is different about novo|seek is that it takes the analysis out of it for users and doesn’t require them to know every single synonym for their search term. It uses conceptual search to analyze the results, rank documents in order of relevance, understand the relevant biomedical concepts and recognize their relationships by extracting the key biomedical concepts mentioned in the document set.
novo|seek basically lets users search more accurately, in less time and with less effort than any other search offering in the market today, including PubMed or Google Scholar. novo|seek lets users leverage Web 3.0 concepts to interact with the system and build dictionaries and ontologies, analyze different text sources and structure the results of the information extraction analyses.
With respect to other search engines mentioned here, novo|seek is definitely different. Indeed, when you perform a search, it displays the corresponding results and allows you to quickly preview each of them. No matter what you type in, you will see the publications directly as well as how a concept profile is created in the left column. This profile is a great way to navigate through results and make a search more specific. In addition, the results are presented in a user-friendly way. Clicking on one of them takes you to the publication page (or concept page or author page), and there are always the links to the original website. The interface and user experience are also something we are constantly working to improve and optimize for users.
Can you give us a specific example of how novo|seek would be a better choice compared to each of the others? What does it do better than each of these?
With novo|seek you can provide the system with the specific meaning of the term that you need. This will extract the most accurate documents. For example, in novo|seek if you want to find the documents for the gene PAP, you can type pap[gene] and find the grants and scientific literature you are looking for. This is something you cannot do in any of the search engines mentioned above.
novo|seek also gives you the most relevant biomedical concepts even if they are not in the MeSH dictionaries.
We help the user find relevant information faster through color-coded highlighting as well as with the option to look at the specific sentence that mentions the user’s query term.
One of the things I found most exciting and useful about novo|seek is its capacity to find grants that have been made. I work on the site ScanGrants, which lists funding opportunities in the health sciences. I liked what you said on the phone, “The grants of today are the papers of tomorrow.” And as we know, the papers of tomorrow can lead to therapies or even cures for diseases or additional uses of existing drugs. That is what I found so potentially useful for empowered patients and researchers about your grant search feature. With ScanGrants, I list grants that have not yet been awarded. I am often very curious about what research comes out of those grants once they have been awarded and the projects are underway or completed. You have partnered with SciSight. I take it they provide the actual data that novo|seek enables its users to search. Please tell us about the relationship between novo|seek and SciSight. For instance, it is vital that scientists be able to know what work is currently underway, and much of the important work that patient advocacy and disease-centered organizations like the members of the Genetic Alliance and the National Organization for Rare Disorders (NORD) do is insufficiently publicized among the general public (potential donors to those organizations and thus potential funders of research) and among scientists themselves (who may not realize that key research is being undertaken). Thus, I could see SciSight, novo|seek and the firm ResearchScorecard being key players in a future network of researchers and funders.
Yes, we partnered with SciSight for integration within novo|seek so users can search overviews of life science research grants from both U.S. and Canadian sources. This partnership basically enables scientists to search and find awarded grants within their area of research. This is a big advantage for us, particularly as the ability to search not only the “who” (who is doing the research) and “what” (area of focus) of biomedical research but also the “how” of the research (the funding source) is extremely valuable. As you may know, the NIH awarded almost $30 billion in grants in 2008, and with the NIH expected to receive an additional $8.2 billion under the American Recovery & Reinvestment Act of 2009 (ARRA) to help stimulate the U.S. economy through the support and advancement of scientific research, it is more important than ever to unlock that valuable insight into federally funded research.
Please discuss how you see the grants feature of Novoseek developing.
There are plenty of interesting things to do with the grant information, such as answering questions like: what results have been published related to this grant, what are the trends of research, or which grants have led to interesting scientific results. However, we always welcome posts through our feedback site about the kinds of information users would like to see coming out of novo|seek.
A related question pertains to your business model for novo|seek. It is advertising based, correct? Whom do you see as potential advertisers? I would say that if I were Elsevier, for example, I would certainly take a look at novo|seek as a venue for touting my new products SciVal Spotlight and SciVal Funding. And if I were a sci/tech publisher like Springer or Sage, I would certainly take a look at novo|seek. Are those the kinds of advertisers you are looking for? Let’s say I am a headhunter for a life sciences firm — would I be wise to advertise on novo|seek?
Online advertising is a great way to promote your product or services. You get a lot of control over the campaigns and the ROI is readily identifiable.
We have identified a lot of potential advertisers. These range from publishers, as you suggest, to organizations that want to promote an event such as the CHI, to schools willing to promote educational courses such as the biotechnology program in the Instituto de Empresa. These also include bioinformatic software platforms such as Geneious or scientific providers such as Invitrogen, Active Motif, Sigma Aldrich, as well as pharmaceutical companies or any service provider in the biomedical sector (lawyers, consultants, human resources companies, etc.).
And apropos of ads, I just tried in novo|seek my favorite search term, amyotrophic lateral sclerosis. The ads generated by Google Ads were interesting — one was to the ALS patient community page of Patients Like Me and one was to the results of the subject of Lou Gehrig’s Disease on Bing. I just clicked on both from novo|seek. How do you leverage clicks like mine just now?
The amount that Google pays for clicks like yours varies depending on how strong the relationship is between the ad and the content of the site. We are working to have Google display the most relevant ads according to the search terms, which would provide the most interesting information for the user.
You have a widget for novo|seek. That is pretty cool. Can you give us examples of who would want to install it on their Web sites? Medical librarians in charge of Web sites? Bloggers on biomedical topics? Have you had pretty good success with adoption of the widget? What difference has it made to your bottom line? What would be the advantages of its adoption by various sites? Where has it been adopted so far? I would think that the webmasters of the sites of scientific societies and professional associations in the health sciences would see the value of providing visitors to their sites with sophisticated search functionality while there. For example, just out of curiosity I just visited the Web site of the American Chemical Society and they seem to have a search box for site search but not for general searches. Is this the kind of site that could benefit from a novo|seek widget? I could see it sitting right next to the feature Molecule of the Week (which is a pretty cute feature, by the way, ACS!).
The widget is another tool we provide for life sciences professionals and they can install it where they want. It allows them to look for literature in Medline, grants and full text. Having the widget, and thus novo|seek’s technology, at hand is very helpful.
Additionally, we appreciate when users install the widget on their blog or web site because it shows they are recommending it to their readers and/or students. The widget is easy to install and users can customize it (though it is not compatible yet with blogs hosted on wordpress.com due to restrictions in java). We have learned that some people would rather install a “classical” search box, so we have also designed one for them. For example, see the sidebar of Documentación, biblioteconomía e información. In total there are currently 6 blogs that have installed our widget.
You say on your site that, “Soon new sources will be available …” Like what? Will we be able to search easily through ScienceDirect, for instance? Have publishers been willing to talk to you or do you find that some of them are still surprisingly reluctant to deal with search engine firms, so mired are they in pre-Web 2.0 ways of doing business?
We are currently doing some research on our users to find out what other content could be of interest to them. We have a few in mind but we need to test them out with our technology and see the quality of the results.
Regarding publishers, we have been getting some interesting responses. The editors from Elsevier are interested in this type of project, so we have been in talks with them. They have some really interesting projects for ScienceDirect with which we believe we could help.
I am really fascinated by the liveliness of The Life Scientists room of FriendFeed. What has been your experience with it as a search engine firm among the scientist members?
It is true that the Life Scientists room is one of the most active for biomedical professionals on FriendFeed and that is why we like it. We first discovered it by following a conversation through Twitter and decided to take part in it right away. Once we subscribed, we were able to learn a bit more about the people in the room and enjoy the discussions and the information that is shared on it. We post to the site on a regular basis, either about news at novo|seek or some information we have come across that we find interesting.
From the search engine point of view, it has allowed us to contact some people about Bioalma and novo|seek, which has enabled us to build new relationships and get great feedback!
Can you tell us about your use of other social networking tools such as FriendFeed and Twitter? Any advice on their use you can offer your colleagues in the search engine field for generating buzz and eliciting feedback?
Of course! We are experts in text mining, and also in Web 2.0. As we were using these tools before (many of us have a personal Twitter account, use delicious or FriendFeed, or share information via Diigo), it was only logical that we create similar accounts for novo|seek as a product. We think they are great resources to allow us to get closer to our users. For instance, one day a scientist asked on Twitter if novo|seek was down. Within 5 minutes, 5 people (including us) had answered that it was working fine!
I do not think there are any secrets with these tools; they are a simple way to share great information and people can answer or share with their contacts as they see fit. I do not consider them a promotion channel but rather another method of communication. It’s simply about talking and listening. Also, be patient. You can’t be the “new big thing” overnight. For us, these tools have helped build strong relationships, meet people and find opportunities. And last but not least, we’ve gotten great feedback that helps us meet our customers’ needs a bit more every day.
Where do you want Novoseek to be in a year? In five? Can you tell us about premium products in the pipeline?
Our plan for the long term is for novo|seek to be the best platform for biomedical search information. To accomplish this, we plan to work on three main areas:
- Introducing new functionality that helps users access and work with the information.
- Displaying new information extracted from our text mining technology.
- Developing analytical functions that cover the needs of specific scientific areas.
We talked a bit on the phone about the challenges you face as a European firm (headquartered in Spain) operating in a world in which so much medical research and search technology is located in the U.S. and given that the European market is so fragmented. Can you tell us a little about that?
With a web-based product, being in Spain is not that much of a problem since the Internet and conversations over different social media flow freely across all regions. However, it is true that face-to-face conversations and events always give us more insight and feedback, and that is something we cannot do as much as we would like to.
I can’t seem to save my searches in novo|seek and set up email alerts of them to myself (as I can in Mednar) or contact the researchers (as I can in a mediated way in GoPubMed) or highlight paragraphs and click to get similar results or quickly get “More Like This” (as I can in DeepDyve). Will you be offering features like that soon? What feature of Novoseek is your particular favorite?
We have some functionality that will be ready by the end of July that will let the user save a search, create alerts or tag documents. We will also let the user filter the search by document type and we are improving the authorship search as well.
The functionality I like most is the filter and the visualization by sentence. These two functionalities are really useful to quickly and easily distinguish the document of interest.
Now I want you to prognosticate for a moment. How do you see the Semantic Web affecting biomedical search? Are you better positioned than other search engines for that brave new world?
I think that Semantic Web — meaning a new way of data tagging and structure — is a very long-term approach to extract relevant information from text sources. It is an approach that relies on the user’s willingness to structure the meaning of the context in a way so that a machine can understand it. This approach is very time consuming and you have to get everyone who produces content involved. Now that the generation of content in the Web is at everybody’s fingertips, a lot of people are involved.
If you think of Semantic Web as a way of exploiting the semantics in the text to extract relevant information, we are better positioned since we have been developing that technology for nine years.
Finally, who are your personal heroes in search, science and history and in your own personal experience?
I do not really believe in heroes, but rather I see everyday people that I admire for their commitment and the passion they have for what they do, in either their personal or professional lives, and for how they deal with everyday situations. I think that anyone who can put into practice all or just some of Kipling’s “If” poem could be considered a hero.
Thank you for your time.
You can follow novo|seek @novoseek on Twitter or novoseek on FriendFeed.






Dear Ramón Alonso-Allende,
regarding your answer to the question “how NovoSeek differs from GoPubMed …”, I must say I’m disappointed. Let me tell you why: first, what is described here for your search platform was done 6 years ago when GoPubMed first went online. I don’t see any new features beyond those of GoPubMed or others. Secondly, your statement: “… lets users search more accurately, in less time and with less effort than any other search offering in the market today” is unproven and just another marketing statement. For “Heart Diseases” you find 163,595 results. We know that at least 848,633 are semantically relevant. It is hard to believe that Novoseek involves any semantics besides synonyms. Using the query “Vitamin C”, NovoSeek is missing (using MEDLINE source) more than 5,000 papers. Additionally, I’m missing all the following features:
· Statistics about queries, ontology terms, persons (co-author network diagram)
· Top author for your query showing the automatically compiled vita references
· Editable author profiles, contact authors
· Curation tools for scientific crowd computing
· Navigation based on MeSH, Gene Ontology and proteins ontologies
· Auto-completion based on MeSH and the Gene Ontology
· Syntax like Pubmed, related articles extracted from Pubmed
· Wiki links with concepts found in abstracts
· UNIPROT links with proteins/genes found in abstracts
· Using the tree: easy option to include and/or exclude concepts in queries
· Easy navigation to dive into the ontologies and find related terms and functions
· The WWWW (WHAT, WHO, WHERE, WHEN) concept: navigation using combination of queries based on concepts, persons, places of affiliation (countries, cities) and/or places cited in abstracts, journals (reviews only, high impact journals) and date (year, months, days)
We are open to any discussion! But let’s move on to the next semantic stages. There are so many open, scientifically interesting and unsolved issues where all should collaborate: ontology alignment, unlearning “old” facts from the ontology and, last but not least, ontology generation. Let’s go for it!
Sincerely yours
Michael
Gentlemen: Thank you both for your fascinating contributions to this discussion of important topics.
Hope
Dear Mike,
The differences between novoseek and goPubMed are clear from my point of view. As you put it in your comment, there is a lot of pending research to do regarding ontologies; therefore, although we take them into account, we don’t rely that much on them for our analysis.
As you know, it is not all about getting results or reordering pubmed results based on any type of ontology. It is more about differentiating those documents with the most relevant meaning from those that are there just to keep the user busy.
Regarding functionality, novoseek’s approach is based on keeping the user’s focus on getting the information and we like to make it simple. When we decide to add extra features, we want to make sure that they are really interesting to the user. If we see that there is functionality that people are not using, we just take it away, because it does not help the final experience. However, I modestly suggest you take a closer look at novoseek because, to be fair and not simply recite the goPubMed feature list, some of the functionalities you claim to miss are actually present in the tool.
Thank you for your comments.
A very gentlemanly exchange. Thank you both for educating all of us on these matters.