BarCamp Cambridge - teaching computers to understand text, Peter Corbett
he has a desk at the computer lab and at the chemistry lab.
computational linguistic chemistry
auto-detect language in chemistry papers to try to recognise chemicals and
mark them up.
supplement the mark-up from publishers.
can draw the chemicals and overlay the annotations on the paper
some problems are that papers can contain new names, compact names, or
names with extra hyphens; this program can deal with these kinds of
things.
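As a rough illustration of the extra-hyphen problem, here is a minimal sketch of hyphen-insensitive name matching. It is not how the software from the talk works; `name_key` and `same_name` are hypothetical helpers, and stripping every hyphen is a deliberately crude assumption.

```python
import re

# Hypothetical helpers, not taken from the talk.
# Matches runs of ASCII hyphens, Unicode hyphen/dash variants and whitespace.
_SEPARATORS = re.compile(r"[\u2010-\u2015\s-]+")

def name_key(name: str) -> str:
    """Return a crude matching key: lowercase, hyphens and spaces removed."""
    return _SEPARATORS.sub("", name).lower()

def same_name(a: str, b: str) -> bool:
    """True if two spellings differ only in hyphens, spacing or case."""
    return name_key(a) == name_key(b)

print(same_name("2-methyl-propane", "2-Methylpropane"))
```

A key this aggressive can also merge names where a hyphen or space carries meaning, so it would only be a first filter before any real name parsing.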
can also use systematic name parsing.
this is the core technology, you can do things like search for alkaloids in
your paper, or document dump
this seems to run within a browser.
ran the software over a corpus of about 100 papers, and created a search
engine out of this?? I might be wrong about that.
can create an svg
can go from plain text to something like a connection layout using an
information-rich markup
the RSC is using this software along with human clean-up to create markup of
chemistry papers.
can then do semantic search over papers.
Small natural language processing trick
imagine we were interested in opiates,
we could just search Google for opiates
but a query like "opiates such as" will give you much better results.
I just checked this and it works
there are many patterns like this, they are known as Hearst patterns.
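The "such as" trick can be sketched with a regular expression. This is only an illustration of one Hearst pattern under simplifying assumptions (single-word terms, no hyphenated chemical names); `hearst_pairs` is a hypothetical name, not part of the software from the talk.

```python
import re

# Separators between list items; longer alternatives come first so that
# ", and " is not read as ", " followed by an item called "and".
SEP = r"(?:, and |, or |, | and | or )"

# "<class> such as <item>, <item> and <item>", single-word terms only.
SUCH_AS = re.compile(rf"\b(\w+) such as ((?:\w+{SEP})*\w+)")

def hearst_pairs(text):
    """Return (class, member) pairs found by the "such as" pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        parent = match.group(1).lower()
        for child in re.split(SEP, match.group(2)):
            pairs.append((parent, child.lower()))
    return pairs

print(hearst_pairs("Opiates such as morphine, codeine and heroin bind to receptors."))
```

A real system would presumably use noun-phrase chunking rather than `\w+`, since chemical names often contain hyphens, digits and spaces.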
he did a pass over abstracts on PubMed for these kinds of patterns to make a
network of relationships
the result is not a connected graph
dot fails on large graphs, but the demo does show that you can automate the
discovery of reaction networks.
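To make the graph remarks concrete, here is a minimal sketch that takes (class, member) pairs, splits them into connected components, and writes Graphviz DOT text. The pair format and both function names are assumptions for illustration; the talk did not show how the network was actually built.

```python
from collections import defaultdict

def connected_components(pairs):
    """Group terms linked by (parent, child) pairs, ignoring edge direction."""
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first walk from an unvisited node collects one component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components

def to_dot(pairs):
    """Render the pairs as a Graphviz digraph in DOT syntax."""
    lines = ["digraph relations {"]
    for parent, child in sorted(set(pairs)):
        lines.append(f'  "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)

pairs = [("opiates", "morphine"), ("opiates", "codeine"), ("alkaloids", "caffeine")]
print(len(connected_components(pairs)))
print(to_dot(pairs))
```

One possible way around dot struggling with a very large graph would be to lay out each component separately, though that is a guess rather than something from the talk.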
you can do reasoning on structure as well as process (now he mentions lots
of chemical names that I know nothing about)
a few bits of wisdom from this
most of the information has come from biochemists rather than chemists,
more biologists are into open science, and open databases
chemistry has been mostly captured by commercial interests,
hard to get free chemistry data.
next is to define what you are looking for?
you want to be able to evaluate how well the software has done
how do you post-annotate the documents?
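One common way to score such software against human annotations is precision, recall and F1. The talk did not say which measure is used, so this is an assumption; the `(start, end, label)` span format and the exact-match rule are also illustrative choices.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted annotations against gold ones using exact matches."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical spans: (start offset, end offset, label).
predicted = {(0, 8, "CHEM"), (20, 27, "CHEM"), (40, 45, "CHEM")}
gold = {(0, 8, "CHEM"), (20, 27, "CHEM"), (60, 66, "CHEM"), (70, 75, "CHEM")}
print(precision_recall_f1(predicted, gold))
```

Exact matching is strict: a span that is off by one character counts as both a miss and a false alarm, so looser overlap rules are sometimes preferred.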
in a lot of text there is a difference between what you think the world
looks like and
how it is described in the literature, so even when you get people to ..
question about confidence levels,
the most recent piece of the software has confidence levels. rare events
don't provide
good confidence levels
it could depend on what you are looking for,
Peter thinks that confidence is important for these systems
e.g. "a such as b" if b might be a chemical but you are not sure. if later
you find in your search that a is indeed a chemical it raises your
confidence that b is indeed a chemical
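The confidence idea above can be sketched as a small update rule. This uses a noisy-OR combination, which is my assumption and not the method the software uses; `raise_confidence` is a hypothetical name and the `pattern_reliability` default of 0.8 is an arbitrary illustrative value.

```python
def raise_confidence(conf_b, conf_a, pattern_reliability=0.8):
    """Update confidence that b is a chemical after seeing "a such as b".

    conf_b: current confidence that b is a chemical (0..1).
    conf_a: confidence that a is a chemical class (0..1).
    pattern_reliability: assumed chance the pattern is right (illustrative).
    """
    # Evidence from the pattern is only as strong as our belief in a.
    evidence = conf_a * pattern_reliability
    # Noisy-OR: b is a chemical unless both sources of belief are wrong.
    return 1 - (1 - conf_b) * (1 - evidence)

print(raise_confidence(0.5, 0.0))  # a unknown: confidence in b is unchanged
print(raise_confidence(0.5, 1.0))  # a confirmed: confidence in b rises
```

With this rule confidence can only go up, so it would not capture evidence against b; that limitation is a property of the sketch, not a claim about the real system.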
Q: is there any way to automate the handling of chemical acronyms?
turns out that this is not always nice. you can do some of this.