
10 Years of Semantic Web:

does it work in theory?

Keynote at ISWC 2011,

Frank van Harmelen

Original PowerPoint source without the script is here.

Duck & Birdie



Apology for wrong title

  • Conference organisers always ask you for a title when you only have a rough idea what you want to talk about.
    Then you get esprit d'escalier and realise what you should have said. A better title would have been "The Semantic Web: does it work in theory?"
  • When looking back at 10 years of Semantic Web, there's no question about the engineering feats we have achieved, and I'll have a bit more to say about that later.
  • But: besides the engineering, did we learn any permanent, generic, scientific knowledge? Can we discover any laws that arise from our decade of work?

Jeff Naughton slide





  • talking about science is always a pretentious thing to do (and I mean talking about science instead of just "talking science").
  • Jeff Naughton (leader in DB field) recently gave a talk at ICDE, which was all about how we have organised the scientific process, and he used the following health warning at the start of his talk. I've freely borrowed his slide to give you the same health warning:

  • But notice the flawed piece of logic in the final bullet:
    if you don't give the keynote, you might well still be a washed-up has-been. In which case I thought I might as well give the keynote anyway.

  • talking about generic laws is certainly a pretentious thing to do (and certainly in computer science), so you have been warned.


Philosophical confession



  • and even worse: before I can speak about our community discovering any scientific laws, I must first explain how I think about scientific laws, and how there could ever be any scientific laws in computer science.

Telescope slide



  • My view on science is that of a "realist".

  • Quote: philosophical realism is the belief that our reality is ontologically independent of our conceptual schemes, linguistic practices, beliefs, etc.

  • a realist believes in a world out there, existing independently of us

  • the task of science is to find the laws that govern that independently existing world.
  • so, I'm not a constructivist.

  • Constructivists maintain that scientific knowledge is constructed by scientists and not discovered from the world. Constructivists claim that the concepts of science are mental constructs.

I do not believe that scientific knowledge is just a mental or social construction, or that our scientific laws have only relative and subjective value.

  • Clearly, such a realist view, with laws describing an independently existing world, would apply to physics, but what does it have to do with computer science?

Laws about the Information Universe



  • Well, I believe that data, information, and knowledge have inherent structure & properties, and
  • that there are laws that govern these structures & properties.

  • I believe we can discover these laws (just like we can discover physics laws).

  • thus: just like the physical universe "exists out there" (and is not just a mental or social or cultural construction), so is the information universe "out there" (and is not just a mental or social or cultural construction).
  • Of course, many of the actual objects in the physical universe are our own construction (billiard balls, space ships, nuclear power stations), but the laws that govern these objects are not just mental/social constructs, these laws are "objective", "real", they are "out there to be discovered".
  • In the same way, the actual objects in the informational universe are our own constructs (programs, databases, languages, URI's), but the laws that govern these objects are not just mental/social constructs, these laws are "objective", "real", they are "out there to be discovered".

Distorted Mirror slide



  • of course it is the case that our perception of these laws at any particular time during our scientific progress will be somehow coloured by our perceptions and social and mental constructs.

  • what we perceive to be the universe may well be coloured

    • by the limitations of our cognitive machinery,
    • by culturally shaped expectations and desires
    • by the limitations and distortions of our experimental apparatus
  • and in general it is hard to distinguish the "real" laws about the external universe from cognitive artifacts and observational bias.

  • but that doesn't imply that all laws are only fictions of our culturally biased imaginations.

  • and it is the role of science to continuously chip away at these cognitive, cultural and historical biases to find out what the "real" laws are like.

  • Now, the parallel with physics is of course a bit pretentious.
  • Physics is a very mature science, with a high degree of mathematisation.
  • and it will be a long time before Computer Science will reach the same degree of maturity,

Physics slide



  • and before we can write the beautiful sets of concise equations about the information universe.

  • we cannot yet hope for such beautifully mathematised laws, in such a concise language that fits in a very compact space.


  • in fact, Computer Science is a very young field, and I think that instead of comparing ourselves with physics, maybe we are more comparable to something like alchemy,

Alchemy slide



  • historians of science describe alchemy as a "protoscience"
  • it was not just a failure to turn lead into gold,
  • it was a "protoscience",
  • searching for proper goals,
  • proper conceptual framework
  • developing their experimental apparatus

  • and this is now recognised as having led to the more mature sciences of chemistry and physics that we now know.

  • and in fact, one of the originators of modern science, Isaac Newton, was an active alchemist.
  • So there's really no negative connotation to the description of computer science as alchemy; it just describes the fact that our science is very young, and that perhaps we have not discovered many of the laws about the information universe yet.

So, the central question that I will boldly (and perhaps rather foolishly) tackle in the rest of this talk is this one:

Question slide



Did a decade of Semantic Web work help to discover any Computing Science laws?


What have we built over the past 10 years



So let's first take a look at what we actually built in the past decade.

We can characterise what we have built over the past 10 years in 3 parts:


Babel Towers slide

  1. We built a whole lot of vocabularies (including the languages to represent them, the tools to construct and deploy them, etc.)

Naming slide

  2. We built a whole lot of URI's to name lots of things in the world, in fact, many billions of URI's

Neural Network slide

  3. We connected all of these in a very large network


Engineer slide



But all of these have been mostly treated as one very large engineering exercise.

And it's obvious that as engineers we have succeeded.

  • Governments (and not just US and UK anymore)
  • BBC (World Cup football web site)
  • Retail (GoodRelations),
  • search engines (schema.org)
  • Oracle (DB product),
  • publishing industry (e.g. New York Times)
  • Electricité de France (personalised energy saving plans for 350.000 customers a day)
  • etc.; just look at my Good News Quiz slide deck on slideshare for many more examples.

Now, remember the goal of this talk is:

  • Did we learn any science, ideally science that is valid beyond the particular artifacts that we have so successfully built over the past 10 years?

10 years experiment



So what I'm going to do now is to treat the past 10 years of SemWeb engineering as one giant experiment:



  • designing languages for representing information and knowledge on the web
  • building very many ontologies in all kinds of domains
  • building many ontologies in a single domain (e.g. medicine)
  • building DBPedia,
  • building, populating and linking the Linked Data cloud
  • the widespread use of RDF, RDFS and OWL across very many domains (these are now the most widely used knowledge representation languages ever, by a very large margin).

So take that as a giant experiment and ask the question:

If we were to build the Semantic Web again, surely some things would end up looking different, but are there things that would end up looking the same, simply because they have to be that way?

For example:

  • languages full of angle brackets. If you reran the experiment, surely it would be different, because it's just an accidental choice. That feature isn't governed by any "law in the Information Universe" (or at least not one that I can imagine).

  • but other features of what we've built would turn out in essentially the same way,

  • you would find the same pattern over and over again, every time we ran the experiment.
  • And that is because they are governed by fundamental laws that rule the structure and behaviour of information and knowledge.

So, let's see if we can discover any such laws, such stable patterns that we would rediscover by necessity every time we ran the experiment.




Now, fortunately, we don't have to start from scratch. Some well-known laws of Computer Science can already be seen to apply to our 10 year experiment as well. I'll give you two examples:

Zipf law



  • Zipf's law says that many datasets have long-tail distributions
  • Roughly this means that the vast majority of some phenomenon of interest is caused by a vast minority of items, and that the vast majority of items (the long tail) each barely contribute to the phenomenon

We know from our 10-year-long experiment that our datasets also obey Zipf's law, and this has been well documented in a number of empirical studies.

  • this phenomenon is sometimes a blessing, sometimes a curse
    • nice for compression
    • awful for load balancing

It's important to realise that knowing Zipf's law helps us deal with the phenomenon, both in the cases where it's a blessing (so we can exploit it) and in the cases where it's a curse (so that we can try to avoid it).

That's why it is worth trying to discover these laws.
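
As a toy aside (mine, not from the talk): here is a minimal Python sketch of how one might observe such a long-tail distribution in a triple dataset, by counting how often each predicate is used. The triples and predicate names are invented for the illustration.

```python
# Hedged sketch: in real Linked Data sets, a handful of predicates
# account for most triples, while a long tail of predicates is rarely used.
from collections import Counter

triples = [  # hypothetical (subject, predicate, object) triples
    ("alice", "rdf:type", "Person"), ("bob", "rdf:type", "Person"),
    ("carol", "rdf:type", "Person"), ("Amsterdam", "rdf:type", "City"),
    ("alice", "foaf:knows", "bob"), ("bob", "foaf:knows", "carol"),
    ("alice", "foaf:name", "Alice"), ("alice", "ex:shoeSize", "38"),
]

counts = Counter(p for (_s, p, _o) in triples).most_common()
total = sum(c for _, c in counts)
head = counts[: max(1, len(counts) // 4)]      # the few most frequent predicates
head_share = sum(c for _, c in head) / total   # fraction of triples they cover

print(counts)       # frequency-ranked predicates: the head dominates
print(head_share)   # in real datasets this share is typically very large
```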


Here's a second well known law from Computer Science:

Use vs Re-use



Another known law also applies:

  • Use vs reuse: use = 1 - re-use
  • (of course, don’t take the linear form literally)

  • lesson from ontologies

  • Law of conservation of misery: you can’t have it both ways

OK, so now I'll start proposing some "laws" that originate from our own field, and from our own 10 year experiment:

Factual knowledge is a graph





The dominant life-form in our information space is the graph.

  • The vast majority of our factual knowledge consists of simple relationships between things,
  • represented as a ground instance of a binary predicate.
  • And lots of these relations between things together form a giant graph.

Now this may sound obvious to us in this community, but stating that factual knowledge is a graph is not obvious at all.

For example, if you were to ask this question to a DB person, they'd say: factual knowledge is a table. And a logician would say: knowledge is a set of sentences.

I know that you can convert one form into the other

  • every table is a (simple) graph, and every graph can be hacked into table format (but not so nicely)
  • every graph is a (simple) set of sentences, but not always the other way round,

but that's a bit beside the point: just because all our programming languages are Turing complete doesn't mean that there aren't very real and important differences between them.

So in the same way, graphs, tables and sets of sentences are all really different representations, even with these theoretical transformations.
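
To make the contrast concrete, here is a small illustrative sketch of my own (in Python, with an invented fact) of the same piece of factual knowledge written in the three styles just mentioned:

```python
# One fact, three representations (illustration only).

# 1. As a graph edge: a ground instance of a binary predicate.
edge = ("Amsterdam", "capitalOf", "Netherlands")

# 2. As a row in a relational table capitalOf(city, country).
capital_of_table = [
    {"city": "Amsterdam", "country": "Netherlands"},
]

# 3. As a logical sentence (here simply a string in first-order syntax).
sentence = "capitalOf(Amsterdam, Netherlands)"

# Many such edges together form one giant graph:
graph = {
    ("Amsterdam", "capitalOf", "Netherlands"),
    ("Amsterdam", "locatedIn", "Europe"),
    ("Netherlands", "memberOf", "EU"),
}
print(len(graph))  # 3 edges, i.e. 3 ground binary facts
```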

  • And the law that I propose says that factual knowledge is a graph
  • and the DB people may think it's a table, but actually, many of their tables with lots of foreign keys are really encoding graphs.
  • and the logicians may think it's a set of sentences, but that representation is wildly overshooting the mark (and typically not even aimed at or used for representing factual knowledge)

So let's switch to a less controversial law:

Terminological knowledge is a hierarchy





  • this law has been rediscovered in knowledge representation and information modelling many times over.
  • the details may differ, but the notion of simple hierarchies with property inheritance is widely recognised as the right way to represent terminological knowledge.

And this observed repeated invention makes this a much stronger law.

So to speak: this experiment has already been rerun many times in the history of computer science, and this has proven to be a stable finding.
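
As a minimal sketch of what this law talks about (my own toy example, with invented class names), terminological knowledge can be captured as a simple hierarchy over which properties are inherited:

```python
# A simple hierarchy with property inheritance (illustration only).

subclass_of = {            # child -> parent
    "Dog": "Mammal",
    "Cat": "Mammal",
    "Mammal": "Animal",
}

own_properties = {          # properties asserted directly on a class
    "Animal": {"alive": True},
    "Mammal": {"has_fur": True},
    "Dog": {"barks": True},
}

def properties(cls):
    """Collect the properties of cls, inheriting from all its ancestors."""
    props = {}
    while cls is not None:
        # properties declared lower in the hierarchy take precedence
        props = {**own_properties.get(cls, {}), **props}
        cls = subclass_of.get(cls)
    return props

print(properties("Dog"))   # {'alive': True, 'has_fur': True, 'barks': True}
```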


So now I've talked about both factual and hierarchical knowledge. But how do these two types of knowledge compare?

Terminological knowledge is much smaller than the factual knowledge



or alternatively, in a picture:

Small hierarchy, big graph



And again, this may sound obvious to all of us in this audience, but really it wasn't all that obvious before we started the 10 year experiment. And in fact, it sharply contrasts with a long history of knowledge representation:

  • traditionally, KR has focussed on small and very intricate sets of axioms: a bunch of universally quantified complex sentences

  • but now it turns out that much of our knowledge comes in the form of very large but shallow sets of axioms.

  • lots of the knowledge is in the ground facts (not in the quantified formulas)

And with this law, we can even venture to go beyond just a qualitative law, and put some quantitative numbers on it.

Jacopo numbers



Here are some numbers obtained by Jacopo Urbani, a PhD student in our lab (and some of you will have seen these figures in his presentation yesterday, in the session on reasoners):

  • three of the largest datasets around (two real, one artificial)
  • compute full deductive closure of schema hierarchy only
  • runtime counted in seconds or a small number of minutes
  • then compute full deductive closure of schema + instances
  • then runtime counted in hours

Notice that we are now using an interesting measure of "size" here: we're not just counting triples, but we're measuring somehow the complexity of these triples by seeing how expensive it is to do deduction over them.

And we observe that the graph is 1-2 orders of magnitude "larger" than the schema.
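
To make the kind of computation behind these numbers concrete, here is a hedged toy sketch (not Jacopo's actual system, which runs in parallel over billions of triples) of RDFS-style forward chaining: closing the schema alone touches only a handful of triples, while closing schema plus instances is dominated by the A-box. Dataset and vocabulary are invented.

```python
# Toy RDFS-style forward chaining: subClassOf transitivity and type propagation.
# Real systems do this in parallel over billions of triples.

def closure(triples):
    """Naive fixpoint computation of two RDFS-style rules."""
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        sub = {(s, o) for (s, p, o) in triples if p == "subClassOf"}
        for (a, b) in sub:                       # subClassOf is transitive
            for (c, d) in sub:
                if b == c:
                    new.add((a, "subClassOf", d))
        for (x, p, c) in triples:                # type propagates up the hierarchy
            if p == "type":
                for (c1, c2) in sub:
                    if c == c1:
                        new.add((x, "type", c2))
        if not new <= triples:
            triples |= new
            changed = True
    return triples

schema = [("Dog", "subClassOf", "Mammal"), ("Mammal", "subClassOf", "Animal")]
instances = [(f"dog{i}", "type", "Dog") for i in range(10_000)]

print(len(closure(schema)))              # schema-only closure stays tiny
print(len(closure(schema + instances)))  # schema + instances: dominated by the A-box
```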


So, if we revisit the diagram I sketched before:

Small hierarchy, big graph



then the size of the hierarchy (although already small) is actually still vastly overstated. If we are to believe the numbers on the previous slide, the real size of the terminological knowledge relative to the size of the factual knowledge is like this:



Now the black dot representing terminological knowledge is 2 orders of magnitude smaller than the size of the factual graph.

To put this in a slogan:

  • "It's the A-box, stupid"
  • knowledge is much more dominated by specific instances than by general rules

Apparently, the power of represented knowledge comes from representing a very small set of general rules that are true about the world in general,

together with a huge body of rather trivial assertions that describe things as they happen to be in the current world (even though they could easily have been different).

And again, understanding this law helps us to design our distributed reasoners. It is the justification for the fact that, when building parallel reasoners, many of us just take the small schema and simply replicate it across all the machines: it's small enough that we can afford to do this.
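
Here is a hedged sketch (my own, with invented data, not any particular system) of that design: the instance triples are partitioned over workers, while the small schema is simply copied into every partition, so each partition can be closed independently.

```python
# Toy illustration of schema replication in a parallel reasoner:
# the large A-box is partitioned, the small T-box travels with every partition.
from multiprocessing import Pool

SCHEMA = [("Dog", "subClassOf", "Mammal"), ("Mammal", "subClassOf", "Animal")]

def close_partition(args):
    schema, instances = args           # every worker receives the full schema
    parents = {}                       # class -> all (transitive) superclasses
    for (c, _p, d) in schema:
        parents.setdefault(c, set()).add(d)
    changed = True                     # naive transitive closure of the tiny schema
    while changed:
        changed = False
        for c, ps in parents.items():
            for p in list(ps):
                extra = parents.get(p, set()) - ps
                if extra:
                    ps |= extra
                    changed = True
    derived = set(instances)
    for (x, _p, c) in instances:       # propagate rdf:type up the hierarchy
        for sup in parents.get(c, set()):
            derived.add((x, "type", sup))
    return derived

if __name__ == "__main__":
    instances = [(f"dog{i}", "type", "Dog") for i in range(100_000)]
    partitions = [instances[i::4] for i in range(4)]            # split the A-box
    with Pool(4) as pool:
        closed = pool.map(close_partition, [(SCHEMA, p) for p in partitions])
    print(sum(len(part) for part in closed))  # 300000: every dog also typed Mammal and Animal
```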


We've already seen that the factual knowledge is very large but very simple. We can ask ourselves how simple or complex the terminological knowledge is.

Terminological knowledge is of low complexity



When we go around with our data telescope, and we try to observe what real ontologies look like when they are out there in the world, what do we see?

Telescope with OWL



We see a very wide spread of expressivity in ontologies, all the way from undecidable OWL Full to very simple RDF hierarchies. But this spread is very uneven: there are very many lightweight ontologies, and very few heavyweight ones.

This is of course well captured by Jim Hendler's timeless phrase:

  • A little semantics goes a long way (JH)

And combining both this law and the previous law, we can now see that his "little semantics" means both: low expressivity and low volume.

We could also phrase this as "the unreasonable effectiveness of low-expressive KR".

And there is another way in which this law is true:

  • Of course it is nice that we can also express highly expressive ontologies in our languages (like OWL 2).
  • And some of these languages have very scary worst-case complexity bounds.

  • But when writing ontologies in these expressive languages, we often find that the reasoners for these expressive languages perform quite well.

  • In other words: the information universe is apparently structured in such a way that the double exponential worst-case complexity bounds don't hit us in practice.

If the world of information were worst case, we wouldn't have been able to deal with it, but apparently the laws of information make the world such that we can deal with the practical cases.

So: for highly expressive KR we could say that it works better in practice than in theory.


The next law has of course been staring us in the face ever since we started this work on the semantic web (and it has been staring database people in the face for quite a bit longer):

Heterogeneity is unavoidable



It's for a good reason of course that I chose a Tower of Babel to symbolise our vocabularies:

Tower of Babel slide



A crucial insight that perhaps distinguishes the work in this community from many earlier pieces of work is that instead of fighting heterogeneity, we have seen that it's inevitable anyway, and that we might as well live with it.

And actually, I would claim that the fact that we have embraced this law (instead of fighting it) has enabled the enormous growth of the Web of Data.

Compared to many previous attempts, which tried to impose a single ontology, the approach of letting a 1000 ontologies blossom has been a key factor for the growth of our datasets.


But of course, embracing heterogeneity is nice when you are publishing data, but it's not so nice when you are consuming data. So heterogeneity is not only an opportunity, it's also a problem. And the question is: can we solve that problem?

Heterogeneity is solvable



I'll argue that yes, heterogeneity is solvable, but maybe not in the way that our community likes to hear.

We can see what's going on by looking at the Linked Data cloud.

LOD cloud



  • This is the picture we all know so well,
  • it's carefully hand crafted, and kudos for the hard work that went into it,
  • but actually the picture is also somewhat misleading.

  • It (no doubt unintentionally) suggests an evenly spread out cloud of lots of colourful datasets.

  • The true image of "let a 1000 ontologies blossom".
  • It suggests lots of connections between lots of datasets

But that's not actually the structure of the Linked Data cloud.

Instead, the Linked Data cloud looks like this:

circular cluster map



  • This is a picture generated from the LOD cloud as it was last week,
  • it shows a heavily clustered structure.

  • And here's the same picture,

  • but now with some more emphasis on displaying the clusters;

linear cluster map



  • so, the LOD cloud is not evenly connected
  • (unlike the traditional LOD cloud diagram),
  • but highly clustered
  • with strong links inside the clusters
  • and few links between the clusters

  • And how did these clusters come about?

  • not by ontology mapping,
  • but mostly by a combination of social, economic and cultural processes:

  • Why is SNOMED so important in the medical domain? Partly because it was the first to be around.

  • Why will schema.org be so important? Because it carries the economic weight of 90% of the web-search market.

etc.

  • Does that mean that ontology mapping should be abandoned?
  • No, it doesn't.
  • Many of the links inside these clusters are created by algorithmic ontology mapping (a toy sketch of such a mapping follows below).
  • But I would claim that this is only possible inside such a cluster, i.e. in the fine-grained structure of the graph,
  • whereas the coarse-grained structure of the graph is determined through social, economic and cultural processes.
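
As promised above, here is a toy sketch (mine, with two invented vocabularies) of the label-similarity matching that algorithmic ontology mapping typically starts from; real matchers combine many more signals (structure, instances, background lexicons):

```python
# Toy label-based matcher between two invented vocabularies.
from difflib import SequenceMatcher

onto_a = ["Person", "PostalAddress", "Organisation", "Employee"]
onto_b = ["person", "address", "organization", "staff_member", "project"]

def similarity(x, y):
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()

mappings = []
for a in onto_a:
    best = max(onto_b, key=lambda b: similarity(a, b))
    score = similarity(a, best)
    if score >= 0.7:                   # keep only reasonably confident matches
        mappings.append((a, best, round(score, 2)))

print(mappings)
# roughly: [('Person', 'person', 1.0), ('PostalAddress', 'address', 0.7),
#           ('Organisation', 'organization', 0.92)]
```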

For the next law, we must remember that we are not only a "semantic" web community, but also a semantic "web" community. So let's look at distribution:

speed decreases with distribution, centralisation is necessary



The original dream of this community has sometimes been formulated as turning the Web into a database.

earth globe slide



But unfortunately, observations from our 10 year experiment tell us rather the opposite:

  • the Web is a good platform for data publication,
  • but it's a pretty bad platform for data consumption.

Indeed, the distributed model for data-publishing is a key factor that has enabled the growth of the Web and indeed of the Web of Data, but for data-consumption, physical centralisation works surprisingly well.

And this is not just us finding this out.

  • Google is combining our distributed publishing with their centralised processing,
  • Facebook is combining our distributed publishing with their centralised processing,
  • Wikipedia, etc.

  • So, you might think that centralisation would become a bottleneck. Wrong: distribution is the bottleneck.

The Web is not a database, and I don't think it ever will be.


So if all this massive data has to be in one central place to process it, how are we going to cope? Well, the good news from the Information Universe is that

speed increases with parallelisation



at least for our types of data. I'll show you how well this works.

Jacopo graph 1



This was the performance of triple stores on forward inferencing, somewhere in 2009.

Jacopo graph 2



and this is how much parallelisation improved the performance. So apparently, the types of knowledge and data that we deal with are very suitable for parallelisation.

And it's interesting to see that the previous laws actually help us to make this possible: the combination of

  • factual knowledge being a graph
  • terminology being hierarchical
  • terminological knowledge being small
  • and of low complexity

(which were my proposed laws 1-4) makes the design of our parallel reasoners possible.


So, that brings me to the final law

knowledge is layered



Contrary to the other laws, this law does not yet come so much from our own observations in this field. But other fields tell us that knowledge is like a set of Russian dolls:

Russian dolls



with one doll nested inside the other.

From fields like

  • Cognitive Science,
  • Logic,
  • Linguistics,
  • Knowledge Representation

we know that statements of knowledge need not only refer to the world, but that they may refer to other bits of knowledge, creating a multi-layered structure.

There are plenty of examples: we may say that a fact in the world is true, and then we can say

  • what the certainty of that statement is,
  • or what the provenance of that statement is,
  • or what our trust in that statement is
  • or at what date that statement was made, etc.

Now curiously enough, there is lots and lots of demand in our community for this kind of layered representation, but our representation languages serve this need very poorly. Reification can be seen as a failed experiment to obtain such layering, and now people are abusing named graphs because there is nothing better.
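
As a hedged illustration of the layering meant here (my own minimal sketch, using quads in the spirit of named graphs; all data invented), a base-level fact can itself become the subject of further statements about certainty, provenance and time:

```python
# Layered knowledge as quads: the fourth element names the graph that
# holds the statement, and that name can then be described in its own right.

# Base layer: a plain factual statement, placed in a named graph ":g1".
quads = [
    ("Amsterdam", "capitalOf", "Netherlands", ":g1"),
]

# Meta layer: statements *about* :g1, i.e. about the statement it contains.
quads += [
    (":g1", "certainty", "0.98", ":meta"),
    (":g1", "source", "dbpedia.org", ":meta"),
    (":g1", "assertedOn", "2011-10-25", ":meta"),
]

# And nothing stops a further layer from describing the meta layer itself:
quads += [
    (":meta", "trustedBy", "FrankVanHarmelen", ":meta2"),
]

def about(graph_name):
    """All statements made about a given (named) layer."""
    return [q for q in quads if q[0] == graph_name]

print(about(":g1"))    # the certainty, provenance and date of the base fact
```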

So, being more aware of this law would have helped us to create better representation languages sooner.


So, we're reaching the end of the talk, final slide in sight:

Final slide



and I'll end with the same slide that I started with:

  • does it work in theory?
  • well, what theory?



My hope for this talk is that

  • many of you might disagree with some of my proposed "laws"
  • and some of you may even disagree with all of them

  • but regardless of that,
  • I hope that I will have prompted you to start thinking about the notion of laws in the Information Universe:
    • that such laws may exist
    • and it's our task to discover them

And this has very concrete impact on how we organise our community:

  • it's an invitation to journal editors and conference chairs to also consider papers that have the ridiculously ambitious aim to discuss one of these laws
  • and it's also a challenge to you:

Of course we won't really redo the last 10 years of our experiment, but when you do your research and write your papers, try to think about what the repeatable patterns (these laws) are, and try to separate the incidental choices you make from the fundamental patterns you are uncovering.


Acknowledgments