10 posts tagged “barcamp”
what's cool about the microformats web?
is it the stickers?
the t-shirts
the community process
urlb.at/2f
personal information disaster
travel
airlines don't talk to railroads
microformats asks: what problem does it solve?
perhaps there is no problem at all
what problem does blogging solve?
Twitter, for Christ's sake?
no one knows what they do until they are popular
e.g. Yahoo Pipes is not practical yet
it is a user experience nightmare
and it doesn't have a clearly defined purpose
but it is useful because there is a lot of data available via RSS out there
it gives us room to play
microformats is not compatible with this
we should put up data and let people play
adding some richness to the data
if it does not get used, then Darwin will clear up the mess
the amount of interesting data is greater than the possible number of microformats
this is a mind flip from SQL
RDF is to SQL what dynamic is to static typing
use eRDF
gives an example of putting the SHA-1 hash of an email address into HTML
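A minimal Python sketch of that idea, assuming FOAF's `mbox_sha1sum` convention (the SHA-1 of the full `mailto:` URI) and eRDF's `class="prefix-property"` markup. The function names and the example address are hypothetical, and the page would also need the eRDF profile plus a `<link rel="schema.foaf" href="http://xmlns.com/foaf/0.1/" />` in its head for an eRDF parser to pick this up.

```python
import hashlib
from html import escape


def mbox_sha1sum(email: str) -> str:
    """SHA-1 hex digest of the mailto: URI, as FOAF's mbox_sha1sum expects."""
    uri = "mailto:" + email.strip()
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


def erdf_mbox_span(email: str) -> str:
    """Wrap the hash in an eRDF-style property: class="foaf-mbox_sha1sum".

    Assumes the page head declares the "foaf" prefix via
    <link rel="schema.foaf" href="http://xmlns.com/foaf/0.1/" />.
    """
    return f'<span class="foaf-mbox_sha1sum">{escape(mbox_sha1sum(email))}</span>'


if __name__ == "__main__":
    # Hypothetical address; only its hash ends up in the page
    print(erdf_mbox_span("alice@example.org"))
```

Publishing the hash rather than the address lets tools match the same person across pages without printing the address itself.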
if it maps to an RDF schema then you can use it today
if not, it is based on URIs and you can make your own schema
for a free market everyone needs to take part
GRDDL: only the W3C could come up with a name like that
triplr.org does this
triplr.org will tell you what data your page is putting out
the semantic web is only scary if you make it scary
look at Cwm, a Python tool (the "closed world machine")
can use with FOAF to make a page of hCards
you can combine them together to see all of your friends
uses APIs
gives you a FOAF document and an HTML page with hCard/XFN for import into, for example, Dopplr
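To make the hCard/XFN half of that concrete, here is a hypothetical Python sketch (not the code behind the tool that was demoed) that renders a friends list as hCards, with XFN `rel` values on the links. The `Friend` class and `hcard_list` function are made up for illustration.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Friend:
    name: str
    url: str
    rel: str = "friend"  # XFN value(s), space-separated, e.g. "colleague met"


def hcard_list(friends: list[Friend]) -> str:
    """Render friends as an HTML list: one hCard per person, XFN on the link."""
    items = [
        '  <li class="vcard">'
        f'<a class="fn url" href="{escape(f.url)}" rel="{escape(f.rel)}">'
        f"{escape(f.name)}</a></li>"
        for f in friends
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(hcard_list([Friend("Alice", "http://alice.example/"),
                      Friend("Bob", "http://bob.example/", "colleague met")]))
```

Putting `fn` and `url` on the same link is the usual compact hCard idiom; an importer like the one mentioned above then only has to look for `vcard` blocks.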
tommorris.org
homework:
add some rdf data to your site
getsemantic.com
OS (open source) CMS systems beat the crap out of the commercial ones for what you get for your money, including support.
If you have a budget, you can easily get in touch with the authors of the OS systems
the only thing they sometimes don't win on is polish
with a commercial product, it is someone's job to look at each piece and make sure that it is slick
Drupal is free,
It upgrades about twice a year: one major release and one minor
is built on PHP and MySQL
it scales pretty well, though perhaps not to the size that the Sanger would need
but it does scale on small hardware to hundreds of thousands of items and users
is very modular.
it has lots of modules
core modules are very well written
It is very flexible. The core bit of content is a node:
any content that you have, if it is a node, inherits a lot of features,
such as commenting, revision control, access control
and categorisation
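As a rough mental model only: Drupal itself is PHP and attaches these features through hooks rather than subclassing, but the "every node gets comments, revisions, access control and categories for free" idea can be sketched in Python like this (all class and method names are made up):

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Base content item: every node gets these features for free."""
    title: str
    body: str
    author: str
    comments: list[tuple[str, str]] = field(default_factory=list)  # (user, text)
    revisions: list[str] = field(default_factory=list)             # older bodies
    terms: set[str] = field(default_factory=set)                   # categories
    view_roles: set[str] = field(default_factory=lambda: {"anonymous"})

    def add_comment(self, user: str, text: str) -> None:
        self.comments.append((user, text))

    def edit(self, new_body: str) -> None:
        self.revisions.append(self.body)  # revision control: keep the old body
        self.body = new_body

    def can_view(self, role: str) -> bool:
        return role in self.view_roles    # access control


@dataclass
class Event(Node):
    """A new content type adds only its own fields; the rest is inherited."""
    starts: str = ""
```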
CCK is the Content Construction Kit
the Views module is for making custom lists of pages and custom lists of nodes
for example, in the Drupal site there is a blog module,
but you can make any custom view with the views module
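In the same spirit, a view is roughly "filter these nodes, sort them, show the first N". A hypothetical Python sketch of a blog listing built that way (again, not the Views module's real API; `Item`, `view` and `blog_view` are invented names):

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    kind: str          # content type, e.g. "blog" or "page"
    created: int       # e.g. a Unix timestamp
    published: bool = True


def view(nodes, *, filters=(), sort_key=None, reverse=False, limit=None):
    """Filter, then sort, then trim: the core of a Views-style listing."""
    rows = [n for n in nodes if all(f(n) for f in filters)]
    if sort_key is not None:
        rows.sort(key=sort_key, reverse=reverse)
    return rows[:limit]  # a limit of None keeps every row


def blog_view(nodes):
    """A 'blog' page is then just one configuration of the generic view."""
    return view(
        nodes,
        filters=(lambda n: n.kind == "blog", lambda n: n.published),
        sort_key=lambda n: n.created,
        reverse=True,
        limit=10,
    )
```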
overview of admin page
and of the modules page; it is quite hard to see, as the light is a bit bright at the back of the room at the moment.
There is some disagreement about the status of Casablanca as a great movie; the suggestion gets a derisory snort from Matt, c'est la vie.
a staging system would be nice, but is not there at the moment
Ensembl came out of the Human Genome Project about 8 years ago, to prevent the commercialization of genomic data.
the idea was to have an open source human genome
companies would have to do some work before they could make money off of sequences.
the Ensembl project takes the raw genome data and adds other data to it, such as reference data from other experiments
there is the Ensembl code
and there is the data
there are 41 genomes,
the code is also used outside this project
everything is OS
there are probably about 100 installed copies worldwide
it is 1.5 million lines of Perl code
major pharma companies use it and layer their in-house data on top of the public data
there is a public MySQL interface
www.ensembl.org (no e on the end)
there is also an archive system to see old data
everything is in CVS
there are about 40 people involved directly from the gene builders through
to the comparative groups
there is a functional annotation of the genome
there is the web team, an outreach team, a helpdesk team, a warehouse team, and others
there is support from the core web team,
scale
35 species in Ensembl: human, mouse, rat, zebrafish
then there are random mammals:
hedgehogs, many mammals from Madagascar
the platypus has a poisonous claw
they are running half a million search index queries on one machine; this makes them about the 5th largest search index in the world
about 2 million page impressions a week
100 GB of data traffic
they have 20 four-core machines, 80 cores in all, to run the site
BLAST and SSAHA servers
using 40 TB of data at the moment
you expect a hardware failure every week, and the hardware doesn't always let you know
at this point it is about one hardware failure every day
currently on the 3rd version of the web code
2000 human
2001 mouse
2001 fly
2003 Vega site
2004 archive site started
2005 web code v3
2006 users and groups
in about a month Ensembl 50 will be released
also have a number of other sites
they have a two-month release cycle for both data and code.
the day after each release they start building genes again
many data sets take longer than this: the new mouse sequence was released by NCBI 6 months ago,
but it has taken this long for the Sanger to do the annotation and comparative work.
there is a pre-site for data that didn't quite finish within the two-month cycle
VectorBase - Ensembl for disease vectors
Gramene - Ensembl for plants
Cosmic - uses the drawing code
they are moving over to AJAX because people don't realize that items in the interface are buttons or forms.
a lot of the interaction is human interaction
they hope they can make AJAX that does not break screen readers, and hope that AJAX will offer a web services platform. this leads to issues of display vs. data markup.
the web code is extensible by plug-ins.
you can add code which resides outside the main Ensembl CVS tree but is accessible from within.
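The talk didn't show how the plug-in mechanism works internally. One common way to get "code outside the main tree that is accessible from within" is a layered registry where later layers override earlier ones; here is a hypothetical Python sketch (the component names and the plug-in path are invented):

```python
from typing import Callable

Handler = Callable[[str], str]


class PluginRegistry:
    """Core components first, then plug-ins; later registrations win."""

    def __init__(self) -> None:
        self._layers: list[tuple[str, dict[str, Handler]]] = []

    def register(self, source: str, components: dict[str, Handler]) -> None:
        # `source` can be the core tree or a directory outside it
        self._layers.append((source, components))

    def resolve(self, name: str) -> tuple[str, Handler]:
        # Search the most recently registered layer first
        for source, components in reversed(self._layers):
            if name in components:
                return source, components[name]
        raise KeyError(name)


registry = PluginRegistry()
registry.register("core", {
    "gene_page": lambda gid: f"core page for {gid}",
    "search": lambda q: f"core search: {q}",
})
# A hypothetical plug-in kept outside the main tree overrides one component only
registry.register("/opt/site-plugins/mylab", {
    "gene_page": lambda gid: f"mylab page for {gid} (+ in-house data)",
})
```

Anything the plug-in doesn't override (here, `search`) still comes from the core layer, which is what lets a site keep its own code out of the shared tree.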
and that's it
Questions:
Q: how does MySQL cope?
it copes really well: they have 150 GB, of which about 5 GB is in a read-write DB and the rest is in read-only DBs
the issue is not the size of the data, but the number of tables.
one of the DBs has 3000 tables, so they do very careful balancing of data across the servers
some problems come from MySQL not being able to have key buffers larger than 4 GB,
and when you have 60 GB of memory you run into this problem.
the bottlenecks tend to be in the code layer, not in the DB
this is one of the largest MySQL DBs in the world
currently using 4.something; they keep planning to move to 5, but keep finding other things that are more important.
there are a lot of left joins in some queries.
sometimes it is easier to do these joins in Perl rather than in MySQL,
and that can be millions of times faster than in MySQL
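One plausible reason application-side joins can win is a hash join: index one table once, then probe it per row, instead of relying on the database's plan for a big multi-table LEFT JOIN. A minimal Python sketch of that technique (Python rather than Perl, and not Ensembl's actual code; the table and column names are made up):

```python
from collections import defaultdict


def left_join(left, right, key):
    """LEFT JOIN two lists of dicts on `key`, done in application code.

    Builds a hash index over the right-hand rows once, then probes it for
    each left-hand row, so the cost is roughly len(left) + len(right).
    Left rows with no match get None for every right-hand column, as in SQL.
    On a column-name clash, the right-hand value wins.
    """
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    right_cols = {c for row in right for c in row if c != key}
    out = []
    for row in left:
        matches = index.get(row[key])  # .get() does not insert empty lists
        if matches:
            for m in matches:
                out.append({**row, **m})
        else:
            out.append({**row, **{c: None for c in right_cols}})
    return out
```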
connected to the internet via a 1 Gb link to JANET.
James is just an interested bystander on the HTML 5 mailing list; hey, it's a barcamp
HTML5 is the new version of HTML, with input from
Apple
Mozilla
Opera
anyone who joins the mailing list
and the W3C (which means MS, which means this is going to work in IE)
if you have ideas, then you can join the mailing list and put ideas forward for the specification
why should we?
lots of information is locked up in HTML, not XML, not SVG
most of it is invalid
it's important to know how to parse this invalid HTML; at the moment all of this understanding is locked up in browsers, and you have to reverse-engineer the Mozilla source or IE, not a good situation to be in
HTML4 is underspecified
inconsistent
does not match reality
for example, when Google Maps launched it didn't work in Safari, because no one knew exactly how to parse HTML
another example is video: you need a proprietary plug-in to watch videos on YouTube, this is nuts
what about XHTML? It requires XML, and for a lot of people this is also nuts
most browser vendors can't implement XHTML2 in their browsers
what is the procedure?
identify use cases
and look for solutions to use cases
this is more contentious than you would think
html 5 looks like html
some changes
the doctype is shorter
a short charset declaration is supported
what interesting features are implemented?
things like
nav, which screen readers could choose to skip
aside, which is designed for pull-out boxes.
some of these new elements came from a Google survey of the most popular class names used in HTML
they closely follow names that are already in use on a huge number of pages
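A toy version of that kind of class-name survey, using only Python's standard `html.parser` (the real survey ran over a huge crawl; the sample pages in the test are invented):

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Tally every class name used on the start tags of one page."""

    def __init__(self) -> None:
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.counts.update(value.split())


def class_names(pages):
    """Sum class-name counts over many pages (a fresh parser per page)."""
    total = Counter()
    for html in pages:
        parser = ClassCounter()
        parser.feed(html)
        parser.close()
        total += parser.counts
    return total
```

Calling `class_names(pages).most_common(20)` gives the kind of "favourite class names" list that names like `nav` and `footer` came out of.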
HTML 5 specifies more algorithms in more detail
you can associate a visible caption with an image
(it is called a legend, not a caption, for historical reasons)
finally good support for video
multiple encodings supported with source elements
with fallback content
autoplay attribute for audio
lots of DOM support, so you could write a media player in HTML 5
this is working already
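A small Python helper that writes that markup, as a sketch: one `<source>` per encoding in order of preference, then fallback content for browsers that don't know `<video>`. The function name is made up, and the file names and MIME types in the test are illustrative.

```python
from html import escape


def video_element(sources, fallback_html, *, controls=True, autoplay=False):
    """Build an HTML5 <video> with one <source> per encoding plus fallback.

    `sources` is a list of (url, mime_type) pairs in order of preference;
    the browser uses the first one it can play. Browsers that don't know
    <video> ignore the tags and show `fallback_html` (trusted raw HTML).
    """
    attrs = ("" if not controls else " controls") + (" autoplay" if autoplay else "")
    lines = [f"<video{attrs}>"]
    for url, mime in sources:
        lines.append(f'  <source src="{escape(url)}" type="{escape(mime)}">')
    lines.append(f"  {fallback_html}")
    lines.append("</video>")
    return "\n".join(lines)
```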
new inline elements, e.g. time (with a datetime attribute), progress, meter (is the value specified through attributes, or set via script??)
lots of support for forms:
a sample shows many high-level form inputs with easy coding.
lots more, e.g. the canvas element, used by Yahoo Pipes
parsing: HTML5 provides a detailed parsing algorithm that can deal with malformed HTML
it is designed with desktop browsers in mind.
implementation in html5lib (originally written in Python, ported to Ruby).
you can use this on the web and see how the parser works there.
Discussion
he has a desk at the Computer Lab and at the chemistry lab.
computational linguistics for chemistry
automatically detect chemical language in chemistry papers, to try to recognise chemicals and mark them up.
supplement the markup from publishers.
can draw the chemicals and overlay the annotations on the paper
some problems are that there can be new names in papers,
compact names, names that include extra hyphens; this program can deal with these kinds of things.
it can also parse systematic names.
this is the core technology; you can do things like search for alkaloids in your paper, or in a document dump
this seems to run within a browser.
ran the software over a corpus of about 100 papers, and created a search engine out of this?? I might be wrong about that.
can create an SVG
can go from plain text to something like a connection layout using information-rich markup
the RSC is using this software along with human clean-up to create markup of chemistry papers.
can then do semantic search over papers.
Small natural language processing trick
imagine we were interested in opiates,
we could just ask Google for "opiates"
but asking for "opiates such as" will give you much better results.
I just checked this and it works
there are many patterns like this; they are known as Hearst patterns.
he did a pass over abstracts on PubMed for these kinds of patterns to make a network of relationships
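A minimal Python sketch of the "X such as A, B and C" Hearst pattern feeding an is-a network. This is a crude regex stand-in (single-word noun phrases only, so hyphenated or numbered chemical names would be split), not the speaker's software:

```python
import re
from collections import defaultdict

# "X such as A, B(,) and/or C", with noun phrases crudely cut down to single
# words; the lookahead stops "and"/"or" being taken as a list item.
ITEM = r"(?!(?:and|or)\b)\w+"
SUCH_AS = re.compile(
    rf"(\w+)\s+such\s+as\s+((?:{ITEM}\s*,\s*)*{ITEM}(?:\s*,?\s*(?:and|or)\s+{ITEM})?)",
    re.IGNORECASE,
)
SEPARATOR = re.compile(r"\s*,\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+", re.IGNORECASE)


def hyponyms(text):
    """Yield (hypernym, hyponym) pairs, e.g. ("opiates", "morphine")."""
    for m in SUCH_AS.finditer(text):
        hypernym = m.group(1).lower()
        for item in SEPARATOR.split(m.group(2)):
            if item:
                yield hypernym, item.lower()


def is_a_network(abstracts):
    """Merge the pairs found in many abstracts into one is-a graph."""
    graph = defaultdict(set)
    for text in abstracts:
        for parent, child in hyponyms(text):
            graph[parent].add(child)
    return graph
```

Feeding `is_a_network` a large set of abstracts gives the kind of relationship graph described above, which is also why a layout tool like dot can struggle with the result.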
it is not a connected graph
dot fails on large graphs, but the demo does show that you can automate the discovery of reaction networks.
you can do reasoning on structure as well as process (now he mentions lots of chemical names that I know nothing about)
a few bits of wisdom from this
most of the information has come from biochemists rather than chemists,
more biologists are into open science and open databases
chemistry has been mostly captured by commercial interests,
it is hard to get free chemistry data.
next: how do you define what you are looking for?
you want to be able to evaluate how well the software has done
how do you post-annotate the documents?
in a lot of text there is a difference between what you think the world looks like and how it is described in the literature, so even when you get people to ..
question about confidence levels:
the most recent piece of the software has confidence levels. rare events don't provide good confidence levels
it could depend on what you are looking for,
Peter thinks that confidence is important for these systems
e.g. in "a such as b", b might be a chemical but you are not sure. if later in your search you find that a is indeed a chemical, it raises your confidence that b is indeed a chemical
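The talk didn't say how the confidence is actually combined. One simple, swapped-in way to get the "confirming a raises our belief in b" behaviour is a noisy-OR over independent pieces of evidence; a Python sketch, where the probabilities and the pattern reliability are assumptions:

```python
def noisy_or(*probabilities: float) -> float:
    """Combine independent evidence: P = 1 - product of (1 - p_i)."""
    remaining = 1.0
    for p in probabilities:
        remaining *= 1.0 - p
    return 1.0 - remaining


def update_from_pattern(p_b: float, p_a: float, pattern_reliability: float) -> float:
    """Confidence that b is a chemical, given the text "a such as b".

    The pattern passes evidence to b in proportion to how sure we are that
    a is a chemical (p_a) and how often the pattern links like with like
    (pattern_reliability); both numbers are assumptions, not measured values.
    """
    return noisy_or(p_b, p_a * pattern_reliability)
```

With `p_b = 0.4` and a pattern reliability of 0.8, confidence in b goes to about 0.64 when a is only 50% likely to be a chemical, and to about 0.88 once a is confirmed.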
Q: is there any way to automate resolving the acronyms of chemicals?
it turns out that this is not always nice; you can do some of this.
ARM microcontroller dev guy.
these processors are hard to use; he wanted to make something like this available to normal people
want internet- and Bluetooth-connected devices
so they built something
he just plugged in a microcontroller with a wireless sensor
his machine thinks that it's a flash drive
he dragged over a binary
the device started blinking; this is the hello world of hardware hacking,
cool
what can you do now? well, cool stuff obviously!
the other cool thing is that there is a compiler on a website
so you can talk about things in the context that people imagine them
this is built on top of C++
you can save straight onto the device, as the computer just thinks that it's a hard drive.
now he has a flashing light on this
it's about giving people confidence in the tools that they are working with.
there was no software that needed to be installed, reducing the chain
before you get a response.
if you have a long chain, compiling and so forth, by the time you get a response your confidence that you have done the right thing can be low.
short chain, high level of confidence.
he then hacks the light to flash at a variable rate depending on how you twist a switch. this normally takes about two days to get working in the usual embedded programming systems.
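On the board itself this would be a few lines of C++. As a hardware-free Python sketch of the same logic: `read_knob` and `set_led` are hypothetical stand-ins for the analogue input and the LED pin, and the 0.05 to 1.0 second period range is an arbitrary choice.

```python
import time
from typing import Callable


def blink_period(reading: float, fastest: float = 0.05, slowest: float = 1.0) -> float:
    """Map a 0.0-1.0 analogue reading (a twisted knob) to a blink period in seconds."""
    reading = min(max(reading, 0.0), 1.0)  # clamp noisy readings into range
    return slowest - reading * (slowest - fastest)


def blink(read_knob: Callable[[], float], set_led: Callable[[bool], None],
          cycles: int, sleep: Callable[[float], None] = time.sleep) -> None:
    """Toggle the LED, re-reading the knob every half-cycle."""
    on = False
    for _ in range(cycles * 2):
        on = not on
        set_led(on)
        sleep(blink_period(read_knob()) / 2)
```

Passing `sleep` in as a parameter means the loop can be exercised on a laptop with a fake clock before anything is flashed to the device.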
the difference between this and Lego Mindstorms is that you can make this system do anything you want
and it can talk to the internet.
he hacked it with a GPS sensor in his garden, and it told him via Google Maps that he was 2 miles away from his garden. It was outputting degrees and minutes, but needed to be in decimal degrees.
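That bug is easy to reproduce. Assuming the sensor reported NMEA-style `ddmm.mmmm` fields (the talk didn't say which format), the minutes have to be divided by 60 before Google Maps can use the value; a Python sketch with a made-up coordinate:

```python
def nmea_to_decimal(value: str, hemisphere: str) -> float:
    """Convert an NMEA-style ddmm.mmmm (or dddmm.mmmm) field to decimal degrees.

    "5212.3400", "N" means 52 degrees 12.34 minutes north, i.e. about 52.2057.
    """
    raw = float(value)
    degrees = int(raw // 100)        # everything above the last two integer digits
    minutes = raw - degrees * 100    # the rest is minutes, not a decimal fraction
    decimal = degrees + minutes / 60
    return -decimal if hemisphere.upper() in ("S", "W") else decimal
```

Reading the same field naively as 52.1234 degrees is off by about 0.08 degrees of latitude (roughly 9 km); how big the error is depends on the minutes value.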
how does it compare to Sun SPOTs?
the main audience for this is people who want to add some control to their design, but for whom it is not their core competence.
For people who want to bridge the physical world and the internet world.
could put an accelerometer in a rocket, and fly it for school kids. Change 'fly a rocket, and it's cool' to 'fly a rocket, and learn about acceleration'.
I am at BarCamp Cambridge.
We have had the three-word introductions and are just running through the morning talks now. It's pretty cool.
The coffee is good, and the cookies are great.
Let's see how the day goes.
Laura James
AlertMe.com
might get too corporate.
trying to do the internet of things: internet access to small devices
they are implementing today, and will be shipping later this year.
comes from R&D, but now working on a shipping product
they are going to ship a home security system, but they are actually building a platform
that can connect anything that does not require full audio and video,
using a mesh network that connects to a hub using ZigBee
output can be things like a lamp that has a color-dependent state
the hub runs on Linux with Python on top
run by an XML doc
can do things like tell you which day is bin day
is mains powered with battery backup
plugs in to Ethernet
sounds just too cool
connects to a hub server
they have GPRS to connect to the internet if the router goes out
talks to a dialog server and a DB
there is data from loads of sensors in the home
goes to the hub,
the hub has a logic engine
go to the website and set it up to let you know
can send you a Twitter message when your doorbell rings
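AlertMe's actual rule format wasn't shown, so the following is purely hypothetical: a Python sketch of an XML-configured logic engine that maps sensor events to notifications, such as a Twitter message when the doorbell rings. The element and attribute names (`rule`, `sensor`, `event`, `channel`, `when`) are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule document; not AlertMe's real schema
RULES_XML = """
<rules>
  <rule sensor="doorbell" event="pressed" action="notify" channel="twitter"
        message="Someone is at the door"/>
  <rule sensor="back-door" event="opened" action="notify" channel="sms"
        message="Back door opened" when="armed"/>
</rules>
"""


def load_rules(xml_text):
    """Parse the rule document into a list of attribute dicts."""
    return [rule.attrib for rule in ET.fromstring(xml_text.strip()).iter("rule")]


def handle_event(rules, sensor, event, mode="disarmed"):
    """Return the (channel, message) notifications the hub should send."""
    actions = []
    for r in rules:
        if r["sensor"] != sensor or r["event"] != event:
            continue
        if r.get("when") and r["when"] != mode:
            continue  # this rule only applies in one alarm mode
        if r["action"] == "notify":
            actions.append((r["channel"], r["message"]))
    return actions
```

Keeping the rules in a document rather than in code is one way a website could let users "set it up to let you know" without touching the hub's software.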
there are two really big trade-offs
security vs. usability
need to make sure that your home cannot get hacked
and that your data does not get leaked
e.g. you don't want people to have to type in the MAC address of every device in the network
reliability vs. extensibility
the network at home is based on ZigBee, a low-power wireless network
open standards
the smallest item is a ZigBee tile, about 2 cm square. range will be about 300 m with one tile.
can cover a standard home with one base station.
Q- could you set this up for a wet lab or other lab?
discussion
Q: GRDDL is raised, along with RDFa
if you don't like the way that the XSLT is working, you can make your own.
however, most domain experts can't write XSLT
if you just let the domain experts create microformats, you may leave the ontological definitions to the people who are creating the XSLT
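To illustrate that split of responsibilities: the page author only chooses class names, and whoever writes the transform decides what they mean. GRDDL would normally point at an XSLT for this; the Python below is a stand-in for such a transform, not a GRDDL processor, with an illustrative vCard property mapping and an assumption of well-formed markup.

```python
from html.parser import HTMLParser

# The meaning lives in the transform, not the page: authors only pick class
# names. These vCard-style property URIs are an illustrative choice.
CLASS_TO_PROPERTY = {
    "fn": "http://www.w3.org/2006/vcard/ns#fn",
    "org": "http://www.w3.org/2006/vcard/ns#org",
}
VOID = {"br", "hr", "img", "input", "link", "meta"}  # no end tag to pop


class ClassTriples(HTMLParser):
    """Turn class-annotated text inside class="vcard" blocks into triples.

    Assumes well-formed markup: every non-void start tag gets an end tag.
    """

    def __init__(self, page_uri: str) -> None:
        super().__init__()
        self.page_uri = page_uri
        self.triples = []     # (subject, property, literal)
        self._open = []       # class lists of the currently open elements
        self._subject = None  # most recent vcard seen
        self._cards = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        if "vcard" in classes:
            self._cards += 1
            self._subject = f"{self.page_uri}#card{self._cards}"
        self._open.append(classes)

    def handle_endtag(self, tag):
        if tag not in VOID and self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or self._subject is None:
            return
        for classes in self._open:
            for c in classes:
                if c in CLASS_TO_PROPERTY:
                    self.triples.append((self._subject, CLASS_TO_PROPERTY[c], text))
```

Swapping in a different `CLASS_TO_PROPERTY` mapping changes the ontology without the page author touching their markup, which is the point made above.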
AG says that there may be two different problems: getting an address from a page into an address book vs. data harvesting
Matt talking about semantic web for science, an introduction
XML, URI, namespaces, RDF, OWL,
standards are often argued about, but it's just XML
we are supposed to be able to publish semantic data easily,
at the moment it's not just an extension but a whole other world,
people won't learn SPARQL
Matt believes that we can get the benefits of semantics now, but without ...
in any case, it's hard to get funding
should consider semantic web, rather than Semantic Web,
how do we add semantic value to existing documents?
Ensembl was originally the public interface to the Human Genome Project, but there are lots of other genes in there now
set up to fight against the patenting of genes
contains microformats in its web output now!
Ensembl: it's open source and open data
people understand how to look at HTML source, so it's easy to add; the value is there without the overhead
just add standard HTML classes, and let people do what they want to do with it.
Q - will this lead to islands of parsing?
The idea is that Ensembl is a big resource, and they hope that people will follow, creating a de facto standard
Can style it,
Parse it
can the website be the API?
can use a standard URI to access the data
cut down on the amount of code that gets written
there is more data on the web than is available through the API
(we are not the only ones)
Q - What if your API is just your search?
Flickr and Yahoo do this
Why not a Pipes for biology?
see microformats.org
have started bioformats.org
the microformats approach is slow: they have the idea of the process, then you have to go through a standard, and it goes round after round,
we should just get started
(see the Operator plugin: the browser becomes the broker for data)
slideshare.net/mza