CureHunter is a web accessible, fully integrated scientific search, data retrieval and analysis engine. Developed by a team of scientists with expertise in medical data mining, artificial intelligence software development, computational linguistics and computational biology, CureHunter “reads” the entire U.S. National Library of Medicine Medline Archive and automatically extracts and quantifies the evidence for successful clinical outcomes of all known drugs for all known human diseases.
Hope had an opportunity to talk with Judge Schonfeld, CEO and Chief Scientist of CureHunter. In part I of their interview below, Judge talks about the development of CureHunter, the definition of “autonomous search” and the difference between CureHunter and other authoritative online reference services.
Before we begin, Judge, I’d like to give our readers a little background. I had looked at CureHunter some months ago but didn’t really get it. You and I then chatted for quite some time on the phone and you walked me through the many features of CureHunter and I was very impressed both by CureHunter and by the depth and breadth of your knowledge about the complex fields of medical data mining, artificial intelligence software and other topics. I think our readers need to know about CureHunter and would benefit from your comments on the state of search and the place of medical data mining in the worlds of healthcare information management, healthcare IT, the electronic medical record and the whole field of semantic search, specifically in medicine.
The Interview, part I
First of all, please tell us about the origins of CureHunter.
I and my sons have built a super computational expert biomedical system that at the push of a button — no more user interaction required — reads the National Library of Medicine archive from 1932 to 2009, thinks about what it has learned and completely autonomously discovers [directly computes] new cures for human disease.
Tonight, as I have done every night for the last 4 years, I will push a button and the baby CureHunter Machine will re-read the entire National Library of Medicine Medline archive, update all its knowledge modules, and e-mail me in the morning with any new cures it has found for cancer or any other of 11,600 diseases.
And by the way, Hope, CureHunter can email any of your readers directly, too, on illnesses they treat, research or suffer from. Just go to www.curehunter.com, search on a disease and click the orange RSS news feed button at the top of the next page to which you are taken. There’s no charge to get the feeds.
And best of all CureHunter is not Medical Google or WebMD. It doesn’t think like they do, because they don’t think, and it does. Don’t be fooled by the presence of a simple search box. CureHunter doesn’t make lists of articles for you to read: it has read everything, evaluated the evidence itself and gives you an answer. Its “Machine 2nd Opinion” of the best drugs to treat any known disease.
From my point of view, it was really the integration of those three professional impulse engines of scholarship, business and science that trained me in how to build our machine.
I think the common denominator in our scientific swat team of a family is a love of symbolic languages — natural and computational and understanding how they function at a very deep level.
Every nursing and medical student and doctor and scientist who wants to understand CureHunter should start by playing around with the Visual Medical Dictionary function Alex built especially for students visiting our web site: http://www.curehunter.com/public/dictionary.do
When the Network Graph appears start double clicking each of the nodes that appear … until you can see how CureHunter’s brain connects the dots of discovery using its built in graph engine. What ideas link? Why?
The CureHunter Visual Dictionary will show you how clusters of medical and biomedical notions extend themselves from formal taxonomies into functional clinical data and leading hypotheses. It shows you that the machine thinks like you do with a neural network: it keeps putting two and two together … until you go, ah ha.
Dig a little deeper and you will discover that whole beautiful dedicated storehouses of alpha wisdom like the National Library of Medicine and Entrez databases holding trillions of words aimed at cataloguing close to a century of empirical observations from chemistry, biology and clinical medicine — are beginning to come together in one integrated data space.
Unified by a shared taxonomy and ontology.
So much data, so little time.
Scientists are born to search, but more importantly to measure. Often finding the key variable in phenomena to measure is at the center of new scientific understandings. CureHunter was born not to search for medical ideas, but to measure them. Born to instrument and quantify medical knowledge in a consistent, replicable way that would allow machines to think about specific findings, form hypotheses and predict new cures based on “quanta of biomedical knowledge.” How much do we know? What is the volume of our knowledge? What is the density of our knowledge? How strong is the force of what we know?
Like data clouds in modern wireless networks, information packets could be modeled as dispersing along vectors and instead of Internet TCP (transmission control protocol), CureHunter TCP would be based on transmissive curative packet networking.
How many messages does a drug send to its disease target?
You have a checksum, message receipt confirmation, when the disease changes its behavior and goes into decline. “Die evil tumor die” is the targeted message sent by most chemotherapy drugs. Tumor regression, tumor shrinkage, normal blood counts, no metastases, returning patient strength and a myriad of other measurable signs prove to you that the message got through. Incremented change was empirically knowable.
All of this thinking about CureHunter as a hard science instrument, grounded in information theory itself, is fundamental to why and how it can actually self-discover new cures for human disease.
Consider these three expressions:
Search for the Cure.
Compute for the Cure.
Imagine these are linguistic equations — instructions in a high level programming language — that a machine can understand.
What comes after the = sign?
Search = ?
Compute = ?
Change one key variable term in the NLP and you have a game changer technology. You have reached The CureHunter Center for the Direct Computation of the Cures for Human Disease.
You are no longer talking about search engines with people writing random queries into text boxes that direct them to lists of millions of articles they are supposed to go off and read one by one to determine if there was any meaningful data in the returns.
You are instead asking a computer for its answer. Its end point calculation of this thing called variable name cure.
Functionally, it’s very similar to the question our doctors answer 100x every day: what drug = X should (not can) I take for my Y = disease?
Many health information sites will tell you what drugs you CAN take … only CureHunter actually calculates from the evidence those you SHOULD take: the ones with the highest probability of making you well.
Every web site in vast homage to Google has today a search box where the users compose their own queries.
CureHunter, however, doesn’t work that way. We have a “search” box, but you don’t compose a query of any kind — because you are only asking one question.
At CureHunter you are always asking: “Machine, may I have your 2nd opinion, consult please on the best meds to treat my disease.”
So all you enter is the name of the disease you want to cure.
It is important for your readers to understand that in CureHunter all of the data inputs are “relevant” to start with, all of the time. They are precise measurements of quanta of key information about clinical efficacy. There is no such thing as relevance ranking algorithms — the basis of most web search engines — in the system. There is no laundry list of vaguely related articles you might want to read some day if you live long enough.
By rigorously defining the properties of trillions of variables existing in the medical knowledge universe, the CureHunter machine can systematically, algorithmically measure the expansion of what we know about what heals and predict the evolution of clinical efficacy.
Operationalized, the CureHunter Machine is a massive clinical outcomes relational database holding about a trillion variables that are fine focused and networked as functionally related agents and factors that achieve efficacy in human systems biology.
Ultra high precision context sensitive search feeds its natural language parser whose only goal is to extract with very high accuracy, 95% +, key clinical findings from raw text in the Medline archive.
These measured “facts” about what really worked to heal someone are stored in a single numeric array over which Network Graph Theory models and predictive analytics run transparently to the user and autonomously predict the best medications to treat any human disease based only on the evidence.
The user of CureHunter whether doctor, patient, or biomedical research scientist never writes a traditional search engine query. Never enters Boolean operators or filters, stop or non-stop words or alternate spellings and phrases — or dates, or times, or authors’ names.
He or she only enters one word: the name of a target disease or the name of a target drug or biological factor (protein, gene, vitamin, ligand, kinase, mineral et al). CureHunter’s brain automatically connects all the dots (theoretically possible queries) that contribute to successful clinical outcomes: individual disease-drug sets of relations are compared automatically to all possible relationships based on our fundamental concept of clinical efficacy indexes.
What data point can we measure that tells us this drug works against that disease? What did the instrument see?
Example of a typical Medline raw text sentence auto extracted by CureHunter from the peer-reviewed literature for analysis: “Remicade, a trade name for infliximab, a monoclonal antibody, achieved significant remission in PASI scores and reported pain in distal phalanges for 32% of the double-blind trial patients with both psoriasis and arthritis by suppressing Tumor Necrosis Factor-alpha induction of cytokine cascade and autoimmune inflammatory response.”
You can see that this single sentence extraction (one of many millions in CureHunter) has data points on multiple diseases, several mechanisms of action, key bio factors and pathology data along with specifics of clinical outcome response on established severity scales related to the illness.
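To make the idea of "data points in one sentence" concrete, here is a minimal sketch of how the example sentence above might be held as structured data. The class name, field names and layout are assumptions made for illustration only; the interview does not describe CureHunter's actual schema. Every value is copied from the example sentence itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExtractedFinding:
    """Hypothetical record for one parsed sentence (not CureHunter's real schema)."""
    drug_generic: str
    drug_trade_names: List[str]
    drug_class: str
    diseases: List[str]
    outcome: str
    outcome_measures: List[str]
    responder_fraction: float  # share of trial patients reported as responding
    trial_design: str
    mechanism: str
    bio_factors: List[str] = field(default_factory=list)


# Values below are taken from the example sentence, not from any real database.
finding = ExtractedFinding(
    drug_generic="infliximab",
    drug_trade_names=["Remicade"],
    drug_class="monoclonal antibody",
    diseases=["psoriasis", "arthritis"],
    outcome="significant remission",
    outcome_measures=["PASI score", "reported pain in distal phalanges"],
    responder_fraction=0.32,
    trial_design="double-blind",
    mechanism="suppression of TNF-alpha induction of cytokine cascade",
    bio_factors=["Tumor Necrosis Factor-alpha"],
)
```

A record like this is what makes the sentence computable: once the diseases, outcome measures and responder fraction are separate fields, they can be counted and compared across many sentences.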
Now imagine that every sentence in Medline 1932 — 2009 updated daily has been read by CureHunter and parsed to a similar depth and far deeper (cross linked to chemical abstracts, protein databank and molecular visualization systems) to store a network model of this idea about autoimmunity as structured data in its monolithic array of all causes, and cures, and outcomes.
Now imagine a doctor or scientist with a large enough brain to have read 20 million research articles, remember every data point in every article, and cross connect every similar finding to see patterns in that data leading to the cure of a human disease. That’s CureHunter the AI Machine. It’s not a search engine like any on this planet.
It self-authors the 11,600 Patient-Physician Summary Report studies available at our site as disease-specific monographs, untouched by a human hand or editor.
In those reports every drug that ever showed significant clinical efficacy against the target illness is meta-analyzed for its relativistic clinical utility; and CureHunter’s machine-written meta-analyses rival those produced by human expert teams in precision and accuracy. And often they are more comprehensive than human meta-analyses by an order of magnitude because the machine has no limits on how much it will read until it gets to the totality of all that is known, i.e. has been published in peer-review 70 years ago or last night.
Could you go to Google, or WebMD or Microsoft Health, or GoPubMed, or NextBio, or Elsevier, or Wolters or Thomson or UpToDate and ask with a mouse click that a fully documented peer reviewed evidence-based meta-analysis of over 200 drugs for your particular disease of choice be sent to you in 10 seconds?
No you cannot.
But you can at the CureHunter web site. Just enter the name of your target disease in the Patient text box at the upper left of the home page: www.curehunter.com
The rest is automatic.
CureHunter updates itself every night and changes its on line “machine 2nd opinion” consult drug recommendations if new data has just been published. Its self-written meta-analysis of effective meds for neoplasms is 770 pages long and hot links the reader of its PDF version to the source data for every drug ever found clinically useful against a cancer in peer review.
CureHunter is not a black box — if you want to audit the data trail from source to conclusion, with one mouse click you can from hot links in the report. And you can buy this book with all the best cancer research of the last 70 years meta-analyzed for $24.00, in 2 minutes on line at our site. Just enter “neoplasms” in the text box upper left of the home page. Read it, and you will pretty much know what all our best oncologists think.
Not only does the CureHunter Engine write its own reports with one mouse click, it also learns by cross connecting the clinical data from every study in its network graph head. And that’s how it can discover new cures and off-label applications for all existing medications. In a sense it knows how to think about everything we know and solves the burial by data problem by thinking across massive clinical and scientific silos of data to come up with a calculated answer to the question: what cures?
Please tell us about your two sons Alexander and Justin and their role in the development of CureHunter.
If you think AI, Medical Lexicography, Computational Linguistics, Artificial Life, Gene Sequencing and Code Pattern Recognition, Computational Biology, Machine Translation and very deep understanding of symbolic and natural languages — you begin to get the picture of the family business and scientific swat team behind CureHunter.
If you add in the availability of vast amounts of low cost computing power, on line access to the world’s best bioinformatics databases via the emergent Net, and years of best in class programming experience to solve these complex problems, you can see how our small team was able to produce a very powerful and innovative machine.
With zero advertising, PR, marketing or promotion within one year, the British National Health Service had become the largest single user of CureHunter in the world, with its Physicians asking for its Machine 2nd Opinion of the best medication to treat a target human disease over 250,000 times per month.
Doctors and Research Scientists at Stanford, Harvard, the Mayo Clinic, the Pasteur Institute, NIH, FDA, Pfizer and many other major pharma were also audited heavy web users of the server.
Independent Web auditors measured the CureHunter audience as most often visiting also: the British Medical Journal Group, Science, Nature and the New England Journal of Medicine. Out of the box and on to the Web, our young company was matching the best in class authoritative sources for the highest quality scientific medical information.
How did you come up with the name CureHunter? Why not, say, “TreatmentFinder?”
A CURE is the ultimate benefit of all medical treatments and the central focus of the CureHunter Engine: the provision of an evidence-based cure with one mouse click.
Sick people aren’t looking for treatments. They are looking to be well, to feel right, to function correctly — to be cured.
The ability to deliver a cure is why we have doctors, biomedical researchers, health care providers, and pharmaceutical scientists in the first place.
A cure is the gold in any medical data mining system. And contrary to popular opinion, the term appears extensively in the scientific literature as the desired end point of a clinical treatment.
What do you mean by “autonomous search” — what is the origin of that phrase and how does it relate to semantic search? What is the difference?
Those of us who have studied search, formal semantics, and linguistic technology for many years know that the way you ask a question will determine both the quantity and quality of the answer. A single an/or/of/the or other stop word can vary returns by many orders of magnitude and degrees of accuracy.
“Autonomous Search,” the Expert System search in CureHunter, has a virtual expert medical librarian built into its cognitive model with the ability to resolve synonymous cross references, multiple naming conventions and multiple meanings with great accuracy. Notice that if you pause your mouse over any technical term in CureHunter, canonical definitions of all the various names and abbreviations for the drug or disease or bio agent you have selected pop up in a yellow window block.
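The synonym resolution described here can be pictured with a small lookup sketch. The table and function below are hypothetical and are not CureHunter's implementation. The Remicade/infliximab pair comes from the example sentence earlier in the interview, and the "TNF-alpha" abbreviation is a common one added for illustration.

```python
# Hypothetical synonym table; the structure is an assumption for illustration.
CANONICAL_NAMES = {
    "remicade": "infliximab",
    "infliximab": "infliximab",
    "tnf-alpha": "tumor necrosis factor-alpha",
    "tumor necrosis factor-alpha": "tumor necrosis factor-alpha",
}


def resolve_term(user_input: str) -> str:
    """Map a trade name, abbreviation or spelling variant to one canonical term."""
    key = " ".join(user_input.lower().split())  # normalize case and whitespace
    # Unknown terms fall through unchanged rather than being guessed at.
    return CANONICAL_NAMES.get(key, key)
```

With a table like this, a user who types a trade name and a user who types the generic name end up asking the same question, which is the point of removing hand-written queries.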
Notice when using CureHunter, you write no queries. The machine knows all possible queries, and they are hard wired into it for speed, accuracy, precision and direct computation of scientifically verifiable results. You get no results that are sort of relevant or sort of more like this or sort of more like that. All of its results are specific data points suitable for use in computations that reduce error, rather than induce it.
In general, the phrases Semantic Web Search and Web 2.0 technologies refer to all text search and analysis capabilities that have some level of semantic intelligence embedded in their technical implementations, with the goal always being to resolve ambiguity in the original query to the greatest extent possible.
Various engines are more or less successful at dealing with the problem. I will say for the record that Google and Microsoft just don’t understand the problem very well at all and don’t seem to know the correct way to approach solving it because they have been trying for over a decade with many brilliant people and a small fortune in funding — and still they don’t get it.
This is demonstrable on any given day just by putting the same exact query into multiple big market share search engines and studying what they bring back as “relevant.” Imagine, in analogy with machine translation, that you ask someone to bring you an apple and they keep coming back with a giraffe. Some party to the bipartite communication isn’t getting the message accurately at all … and that’s why you the questioner aren’t getting the answer you seek.
You say on your site, “… our mission is to help people get well by offering both patients and physicians the opportunity to get a “machine 2nd opinion” on all complex drug decisions.” Could you explain how CureHunter differs from UpToDate, MD Consult and McGraw-Hill’s AccessMedicine?
The high quality consult products you list are very good — sound, ethical products with good science underlying them in general. They don’t bring back search noise in the manner of a Google, Yahoo or MSN general purpose engine as a general rule relative to the volume of valid signal they produce.
But fundamentally they are created by asking a theoretically expert group of human specialists — with the necessary tunnel vision and limited currency of directed expertise — to periodically up date the rest of us on what they know.
That assumes the experts — many of whom are in daily practice — have the time to read all the new research, compare it to all the old established research, take notes, structure data, and carry out an objective meta-analysis for every opinion they write.
Do you believe they are working that way? Honestly, I do not.
It is not humanly possible to carry out their tasks that way any more.
Consider again that burial by data is the problem and it is getting worse all the time for all of us and all our experts. UpToDate, while it is an excellent quality resource, is ironically nowhere near as “up to date” as CureHunter, and every second it is less so.
Over 50% of the peer-reviewed literature reporting a successful clinical outcome is likely NOT to be published in the Journal of Record for the specialty experts nominally caring for that disease — or the specific expert contracted by UpToDate, Consumer Reports or WebMD to write for that specialty.
For example, a key finding in an oncology or neurology journal relates directly to an autoimmune disease but is published in a journal on cardiology or dermatology and is never read by the oncologist or neurologist.
This is the law of diminishing knowledge returns that is proportional to the increasing depth of specialization in many scientific fields: it is burial by data in action.
One of our early QA studies to see if CureHunter was missing important findings was a histogram showing the distribution of good journal data over sources not read by the primary treating physician. In short, how measurably ignorant — unread — is the treating doctor likely to be?
No good doctor is stupid, but many are ignorant of massive bodies of data that could change the way they treat their patients.
You can also see this result empirically by comparing meta-analysis papers retrieved from Medline to those created by CureHunter. Invariably, the human authors miss many important and substantive bodies of data because their original search query for source papers was totally inadequate, even though in many cases they collected a lot of data.
To be expert in the way that CureHunter is expert, the human would have to read all of the journals all of the time for all the specialties and general practice journals and cross connect significantly related findings from all of them automatically.
CureHunter does that every time you click your mouse and it is easy to see measurable results by clicking the Related Diseases TAB where diseases sharing key drugs and bio agents across their cure or pathology history are automatically connected.
Example: What do ADHD, Dementia, Parkinson, Multiple Sclerosis and Schizophrenia all have in common? What proteins, what treatment drugs, what genes? What do the interconnects tell you?
You can’t find those answers with the tools you mentioned, unless perhaps the medical librarian takes months off to go and look down every interconnect trail and document her every finding.
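The Related Diseases idea, connecting diseases that share key drugs or bio agents, can be sketched as a set intersection over a disease-to-agent map. This is a toy illustration under stated assumptions: the function name is hypothetical, the agent names are placeholders, and the associations are invented, not real clinical data or CureHunter output.

```python
from itertools import combinations

# Toy data with placeholder agent names; these are NOT real clinical associations.
DISEASE_AGENTS = {
    "ADHD": {"agent_a", "agent_b"},
    "Parkinson": {"agent_b", "agent_c"},
    "Schizophrenia": {"agent_c", "agent_d"},
    "Dementia": {"agent_e"},
}


def related_diseases(disease_agents):
    """Return {(disease1, disease2): shared_agents} for every pair that overlaps."""
    links = {}
    # Sort by disease name so each pair is reported in a stable order.
    for (d1, a1), (d2, a2) in combinations(sorted(disease_agents.items()), 2):
        shared = a1 & a2
        if shared:
            links[(d1, d2)] = shared
    return links
```

Calling `related_diseases(DISEASE_AGENTS)` on the toy data links ADHD to Parkinson through `agent_b` and Parkinson to Schizophrenia through `agent_c`, while Dementia stays unconnected. A real system would weight these links by evidence rather than treat every shared agent equally.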
On the other hand, if you want to wrap a lot of medical canonical and often commodity context information around an established treatment, they are excellent tools; and I would hope that all my physicians have access to them.
Sometimes a doctor will want to check his diagnosis, a list of symptoms, possible complications, possible co-morbidity, or default med selection against another authority; the above are good tools for the task.
I have great respect for authoritative textbooks, canon, careful incremental science and complete documentation especially in the field of clinical medicine where the doctor is burdened with life and death responsibility every time he writes a prescription whether for aspirin or exotically toxic chemo.
In our phone conversation you stressed that CureHunter is more of a clinical decision support system sort of tool than a search engine. For those of us with medical library rather than healthcare information management backgrounds, could you please discuss how you would classify such things as CureHunter, UpToDate, MD Consult and AccessMedicine versus what I would call search engines such as DeepDyve, Mednar and the gold standard of medical search, PubMed? To wit, who uses what and for what?
If you think about CureHunter, you realize that its instrumental job was to put its “machine eyeballs” on mountains of raw text and convert it consistently to digital format that can be used to directly compute the answer to the problem of the best drugs to treat human diseases. And, furthermore, to extrapolate from those known data stores to new cures.
Thus the big difference in a lot of the products in the medical information space has to do with their original specified design purpose. What information are they trying to model: pharmaceutical research, treatment research, drug delivery, or other and combinations? How well have they digitized input sources?
What answers are the packages trying to compute, if any? How much interaction with the interface do they require of their users? Do they provide results that are better, faster, deeper and cheaper than those that can be retrieved by human hands in the old-fashioned way of graduate researchers everywhere: go to the library, find the right articles, read a few, make notes, derive a conclusion from the data?
Some software tools are screwdrivers with unique tips; others are ratchets with multiple heads (functions); some are one-size-fits-all adjustable information tools that can tell you a little bit about almost everything, but nothing very original or with great precision about anything.
Each one of the tools you mention has value:
MD Consult is good in the same way. Increasingly one site or another will offer more or less full free text based on their business and licensing models.
Epocrates, DynaMed and Inforetriever offer pocket PC and student versions.
In the final analysis, some products are more useful for research — off line — than in daily practice and some solve immediate clinical problems. For example, most modern pharmacies run contraindication alert software that flags possible drug hazards for patients already on multiple drugs or about to be prescribed a new med. That’s a very specific narrow, but important function. No research or intelligence is in the tool. Tables are compared.
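The "tables are compared" point can be illustrated with a very small sketch of a pairwise hazard lookup. The table contents, drug names and function name are invented placeholders; this is not how any particular pharmacy product works.

```python
# Placeholder hazard table; the drug names and the pairing are invented.
HAZARD_PAIRS = {frozenset({"drug_a", "drug_b"})}


def interaction_alerts(current_meds, new_med):
    """Flag every listed hazardous pair formed by adding new_med to current_meds."""
    alerts = []
    for med in sorted(set(current_meds)):
        # frozenset makes the lookup independent of the order of the two drugs.
        if frozenset({med, new_med}) in HAZARD_PAIRS:
            alerts.append((med, new_med))
    return alerts
```

Nothing here reasons about evidence. The function only checks whether a pair appears in a fixed table, which is the narrow, non-research behavior described above.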
The CDSS class of software, the clinical decision support products, often yield immediately actionable diagnostic or prescription results: e.g. If patient shows fever with X, then Y is the highest probability of diagnosis.
DxPlain, the software Dr. Octo Barnett developed with his Harvard Medical School students and which has been in use for years at Mass General, is a good example of a diagnostic answer system supporting Clinical Decisions.
If patient is contraindicated for drug X, then 2nd line default is Y. Typically the CDSS has significantly more intelligence than drug alerts, but they are designed to provide alerting answers to clinical questions in as close to real time as possible. They are logic, flow chart, and table driven to put a known answer, and not another question in the doctor’s head.
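The "if contraindicated for X, then second line default is Y" pattern is table driven, and a minimal sketch might look like the following. All table contents, condition names, patient flags and the function name are hypothetical placeholders, not rules from any real CDSS.

```python
# Placeholder rule tables; every name below is invented for illustration.
FIRST_LINE = {"condition_1": "drug_x"}
SECOND_LINE = {"condition_1": "drug_y"}
CONTRAINDICATED = {"drug_x": {"flag_1", "flag_2"}}


def recommend(condition, patient_flags):
    """Return the first-line drug unless the patient is contraindicated for it."""
    first = FIRST_LINE.get(condition)
    if first is None:
        return None  # no rule in the table, so the sketch gives no answer
    if CONTRAINDICATED.get(first, set()) & set(patient_flags):
        return SECOND_LINE.get(condition)  # fall back to the second-line default
    return first
```

For example, `recommend("condition_1", ["flag_1"])` returns the second-line placeholder `drug_y`, while a patient with no flags gets `drug_x`. The answer is always one already stored in a table, which is what separates this class of tool from a system that computes new findings.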
Mednar, while trying to add search value through focus on the medical domain, does not clearly (to me) offer any technical advantage over any other generic medical search engine whose primary output is a list of articles for the reader to go read him or herself on the generic topic they have entered in the query field. Their top 10 returns are usually from Medline Plus or PubMed. If WebMD, Mednar, Google, MS Health and Medline Plus itself all deliver the same top 10 articles for the human to go read personally from primarily the same free sources, what’s the added technical value of one over the other?
PubMed as the grandmother of all biomedical archival and search systems is comprehensive and authoritative, up to date, and properly maintained year over year; but it is not an immediate answer or action system.
It does not let you directly compute any treatment answers per se, automatically generate reports or automatically discover new cures for human disease. You can’t directly export key findings in its holdings to computable analytics systems and, of course, that’s exactly why you would use CureHunter.
PubMed is the best biomedical STUDY resource in the world, but it does not deliver STAT answers, generally speaking: it gives you a list of articles to read.
CureHunter combines both STAT and STUDY functionality: the blue meta-analytic graphs of relativistic drug efficacy appear in 10 seconds … if you then want to study the data sources supporting the graph — which has plotted the best drugs based on the evidence — click on through to the hot links to the sources.
I wouldn’t want to criticize the science of others further without doing a detailed analysis of their algorithms and methods and sources and design goals. That not always being possible, the best thing for medical librarians to do is what IT managers have done for years: meet with their users and understand the applications in great detail. Have product demos for your user stakeholders. Sign up for trial licenses, etc. Test and you shall find.
Seeking and searching alone are not enough.
In part II of Hope’s interview with Judge Schonfeld, Hope focuses on the users and uses of CureHunter; Judge discusses the differences between CureHunter and Wolfram|Alpha, and compares search results from CureHunter to novo|seek, GoPubMed and PubMed.





