<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Totuba Labs</title>
	<atom:link href="http://labs.totuba.com/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://labs.totuba.com</link>
	<description>Tracking Totuba&#039;s research activities</description>
	<lastBuildDate>Thu, 22 Apr 2010 07:19:56 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Porting Totuba MySQL data to RDF</title>
		<link>http://labs.totuba.com/?p=51</link>
		<comments>http://labs.totuba.com/?p=51#comments</comments>
		<pubDate>Thu, 22 Apr 2010 07:19:56 +0000</pubDate>
		<dc:creator>carsten.ullrich</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[linkeddata]]></category>
		<category><![CDATA[semantic]]></category>

		<guid isPermaLink="false">http://labs.totuba.com/?p=51</guid>
		<description><![CDATA[In the last post, I talked about our goal: to represent our course data as Linked Data. It is an ongoing process; here is a description of our first step, the &#8220;conversion&#8221; of Totuba&#8217;s MySQL data into RDF.
One good starting point for how to proceed when generating Linked Data is &#8220;How to Publish Linked Data [...]]]></description>
			<content:encoded><![CDATA[<p>In the last post, I talked about our goal: to represent our course data as Linked Data. It is an ongoing process; here is a description of our first step, the &#8220;conversion&#8221; of Totuba&#8217;s MySQL data into RDF.</p>
<p>One good starting point for how to proceed when generating Linked Data is &#8220;<a href="http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/">How to Publish Linked Data on the Web</a>&#8221; by Chris Bizer, Richard Cyganiak and Tom Heath. The document introduces the background, concepts and mechanisms, and also has recipes for <a href="http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/#relationaldata">quickstarting with MySQL data</a>. Look for it near the end of the document.</p>
<p>My first goal was to simply make our data available as RDF, based on our original data model.</p>
<p>This task turns out to be remarkably easy if you know which tool to use. The <a href="http://sites.wiwiss.fu-berlin.de/suhl/bizer/d2r-server/index.html">D2R server</a> provides clear instructions, and within minutes our MySQL data was available as RDF and awaiting SPARQL queries.</p>
<p>Here is a screenshot of the results of a query asking for the names of all organizations offering courses in Shanghai:<br />
<div id="attachment_68" class="wp-caption alignnone" style="width: 710px"><img src="http://labs.totuba.com/wp-content/uploads/2010/01/Snorql-Exploring-http-localhost-2020-sparql_1262935550908.png" alt="All Mandarin courses in Shanghai" title="Snorql- Exploring http---localhost-2020-sparql_1262935550908" width="700" class="size-full wp-image-68" /><p class="wp-caption-text">All Mandarin courses in Shanghai</p></div></p>
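<p>To give an idea of what such a query might look like, here is a minimal sketch in Python. The endpoint address is taken from the screenshot title; the <code>vocab:</code> prefix and all property names are assumptions made up for illustration, since D2R derives the real terms from the table and column names of the database. The sketch only builds the request URL according to the SPARQL protocol (the query is passed in the <code>query</code> parameter); it does not contact a server.</p>

```python
from urllib.parse import urlencode

# Local D2R endpoint, as it appears in the screenshot title.
ENDPOINT = "http://localhost:2020/sparql"

# Hypothetical vocabulary: the prefix and the property names below are
# illustrative assumptions, not Totuba's actual schema.
QUERY = """
PREFIX vocab: <http://localhost:2020/vocab/resource/>
SELECT DISTINCT ?name WHERE {
  ?course vocab:course_city "Shanghai" ;
          vocab:course_organization ?org .
  ?org vocab:organization_name ?name .
}
ORDER BY ?name
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Return a GET URL that passes the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


if __name__ == "__main__":
    # Print the URL; fetching it requires a running D2R server.
    print(build_request_url(ENDPOINT, QUERY))
```

<p>Pasting the query text into the Snorql browser that ships with D2R should work just as well; the URL form is mainly useful when calling the endpoint from a script.</p>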
<p>In the next posts I will describe our ontology that refines the original database structure.</p>
]]></content:encoded>
			<wfw:commentRss>http://labs.totuba.com/?feed=rss2&amp;p=51</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>First steps towards Linked Course Data</title>
		<link>http://labs.totuba.com/?p=47</link>
		<comments>http://labs.totuba.com/?p=47#comments</comments>
		<pubDate>Mon, 11 Jan 2010 05:54:31 +0000</pubDate>
		<dc:creator>carsten.ullrich</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[linkeddata]]></category>
		<category><![CDATA[semantic]]></category>

		<guid isPermaLink="false">http://labs.totuba.com/?p=47</guid>
		<description><![CDATA[Totuba has collected an impressive amount of data about courses in China, for Chinese (Mandarin), Business Management and others. It is not just a copy of some data found in the Web, but has been carefully selected and processed to allow our users to compare courses and quickly find the one course that meets their [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.totuba.com/">Totuba</a> has collected an impressive amount of data about courses in China, for <a href="http://www.totuba.com/en_GB/8/1/find+courses/Chinese+(Mandarin)/all+majors/all+geographic+regions/China/all+states/all+cities/all+districts/in+class">Chinese (Mandarin)</a>, <a href="http://www.totuba.com/en_GB/8/1/find+courses/MBA/all+majors/all+geographic+regions/China/all+states/all+cities/all+districts/in+class">Business Management</a> and others. It is not just a copy of data found on the Web; it has been carefully selected and processed so that our users can compare courses and quickly find the one that meets their requirements.</p>
<p>We are now exploring whether and how this data and our services can profit from making the data available to third parties. It is widely argued that exposing data and services to the world actually increases their value and benefits the provider. Twitter is often cited as an example: its interface allows third parties not only to provide alternative interfaces but also to build new kinds of applications on top of Twitter.</p>
<p>Instead of starting with a Web 2.0 style interface with our own function names that interested developers would have to learn, we decided to first take the <a href="http://www.w3.org/2001/sw/">Semantic Web/Web of Data</a> approach. This means that we will build on top of the standards developed by the W3C and provide an <a href="http://www.w3.org/TR/rdf-primer/">RDF</a> representation of our data, with <a href="http://www.w3.org/TR/rdf-sparql-query/">SPARQL</a> access to query and retrieve it. This way we publish our data on the Web in a standard way, just as HTML is a standard way to publish content on the Web.</p>
<p>We also want to link our data to other data already out there. So, instead of having our own entity for a city, e.g., &#8220;Shanghai&#8221;, we will reuse existing entities, for instance from <a href="http://dbpedia.org">DBPedia</a>, a semantic version of Wikipedia. What is the advantage? Well, by reusing existing entities, people using our data will immediately know what we are talking about, e.g., <a href="http://en.wikipedia.org/wiki/Shanghai">Shanghai</a>, the city in China, or <a href="http://en.wikipedia.org/wiki/Master_of_Business_and_Management">the Master of Business and Management</a>. They can recognize when their data talks about the same thing, for instance that the content of their book is also relevant for a <a href="http://en.wikipedia.org/wiki/Master_of_Business_and_Management">Master of Business and Management</a>. They can combine their data with ours and come up with new applications. So can we.</p>
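<p>A small sketch in Python may make the benefit of shared identifiers more concrete. The DBpedia URI for Shanghai is real; the course URI, the book URI and both property names are hypothetical and exist only for this example. Two independent datasets point at the same URI, so finding out that they talk about the same city is a simple set intersection.</p>

```python
# Real DBpedia identifier for the city of Shanghai.
DBPEDIA_SHANGHAI = "http://dbpedia.org/resource/Shanghai"

# Hypothetical identifiers, invented for this illustration only.
COURSE_URI = "http://www.totuba.com/resource/course/123"
LOCATED_IN = "http://www.totuba.com/vocab/locatedIn"
BOOK_URI = "http://example.org/books/travel-guide"
ABOUT = "http://example.org/vocab/about"


def triple(subject: str, predicate: str, obj: str) -> str:
    """Serialize one triple whose three terms are URIs as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def shared_entities(a, b):
    """Return the URIs that appear as objects in both datasets."""
    return {o for _, _, o in a} & {o for _, _, o in b}


# Our data and somebody else's data, each as (subject, predicate, object).
ours = [(COURSE_URI, LOCATED_IN, DBPEDIA_SHANGHAI)]
theirs = [(BOOK_URI, ABOUT, DBPEDIA_SHANGHAI)]

if __name__ == "__main__":
    for s, p, o in ours + theirs:
        print(triple(s, p, o))
    print(shared_entities(ours, theirs))
```

<p>Had we minted our own identifier for Shanghai, the intersection would be empty and somebody would first have to state that the two identifiers mean the same city.</p>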
<p>Well, that is the theory. We are keen to learn what comes out of it. But first we have to transform our data into Linked Data. More about that in one of the next posts.</p>
]]></content:encoded>
			<wfw:commentRss>http://labs.totuba.com/?feed=rss2&amp;p=47</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Paper accepted: Opportunities for AI in Intelligent Web-based Technology-Supported Learning</title>
		<link>http://labs.totuba.com/?p=42</link>
		<comments>http://labs.totuba.com/?p=42#comments</comments>
		<pubDate>Mon, 24 Aug 2009 06:01:10 +0000</pubDate>
		<dc:creator>carsten.ullrich</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[semantic]]></category>
		<category><![CDATA[workspace]]></category>

		<guid isPermaLink="false">http://labs.totuba.com/?p=42</guid>
		<description><![CDATA[A paper that takes the work that we are doing at Totuba as an example of the power of the Web 2.0 and Semantic Web was accepted at AICI&#8217;09, the 2009 International Conference on Artificial Intelligence and Computational Intelligence. The conference will take place in Shanghai, 7./8. November 2009.
Here is the abstract:
The Web is still [...]]]></description>
			<content:encoded><![CDATA[<p>A paper that takes the work that we are doing at Totuba as an example of the power of the Web 2.0 and Semantic Web was accepted at <a href="http://wism-aici2009.shiep.edu.cn/">AICI&#8217;09, the 2009 International Conference on Artificial Intelligence and Computational Intelligence</a>. The conference will take place in Shanghai on 7 and 8 November 2009.</p>
<p>Here is the abstract:<br />
<em>The Web is still changing at a rapid rate. Principles that define the Web 2.0 are now better understood than a few years ago, but continue to evolve. The Semantic Web is coming of age, resulting, for instance, in a significant amount of data now being available as Linked Open Data. Advanced Web services make sophisticated functionality such as entity extraction and location-data available for free. Against this background, we feel that the potential of today&#8217;s Web for intelligent technology-supported learning is under-exploited. Especially for AI exciting research opportunities arise from building on top of available data and services. In this paper, we sketch the current state of the art of Web data and services, highlight potential future development and point out opportunities and challenges for Web-based technology-supported learning.</em><br />
<a href="http://www.carstenullrich.net/pubs/Ullrich09Opportunities.pdf">Download the preprint here</a>. What do you think? I&#8217;m looking forward to your comments!</p>
]]></content:encoded>
			<wfw:commentRss>http://labs.totuba.com/?feed=rss2&amp;p=42</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Presentation at the UDS-SJTU Joint Research Lab for Language Technology</title>
		<link>http://labs.totuba.com/?p=8</link>
		<comments>http://labs.totuba.com/?p=8#comments</comments>
		<pubDate>Fri, 31 Jul 2009 07:02:40 +0000</pubDate>
		<dc:creator>carsten.ullrich</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[workspace]]></category>

		<guid isPermaLink="false">http://labs.totuba.com/?p=8</guid>
		<description><![CDATA[I was invited to give a presentation at a workshop at the UDS-SJTU Joint Research Lab for Language Technology, a joint research lab of Saarland University, Germany and Shanghai Jiao Tong University, China. I gave a brief overview on how we have been building Totuba&#8217;s research workspace based on existing services and data. It an [...]]]></description>
			<content:encoded><![CDATA[<p>I was invited to give a presentation at a workshop at the UDS-SJTU Joint Research Lab for Language Technology, a joint research lab of Saarland University, Germany and Shanghai Jiao Tong University, China. I gave a brief overview of how we have been building Totuba&#8217;s research workspace based on existing services and data. It is an interesting time to be an AI researcher: thanks to the <a href="http://linkeddata.org/">Linked Open Data Initiative</a>, huge amounts of interlinked machine-processable data are available on the Web; similarly, Web services exist that enable sophisticated processing of text, for instance <a href="http://www.opencalais.com/">OpenCalais</a>, a service that extracts the topics a text is about. In plain English, this means that we were able to reuse these tools and data for Totuba and could quickly build a prototype that would have been prohibitively expensive only a few years ago.<br />
In my talk I stressed that a landscape that encourages reuse creates advantages for both research and commercial applications.<br />
Here are the slides:</p>
<div id="__ss_1699283" style="width: 425px; text-align: left;"><a style="font:14px Helvetica,Arial,Sans-serif;display:block;margin:12px 0 3px 0;text-decoration:underline;" title="Rapid Prototyping of a Semantic-Web-based Research Workbench" href="http://www.slideshare.net/ullrich/rapid-prototyping-of-a-semanticwebbased-research-workbench">Rapid Prototyping of a Semantic-Web-based Research Workbench</a><object width="425" height="355" data="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=workbenchcoli-090708235232-phpapp02&amp;stripped_title=rapid-prototyping-of-a-semanticwebbased-research-workbench" type="application/x-shockwave-flash"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=workbenchcoli-090708235232-phpapp02&amp;stripped_title=rapid-prototyping-of-a-semanticwebbased-research-workbench" /><param name="allowfullscreen" value="true" /></object></p>
<div style="font-size: 11px; font-family: tahoma,arial; height: 26px; padding-top: 2px;">View more <a style="text-decoration:underline;" href="http://www.slideshare.net/">documents</a> from <a style="text-decoration:underline;" href="http://www.slideshare.net/ullrich">Carsten Ullrich</a>.</div>
</div>
<p>What follows is a brief overview of some of the other talks.</p>
<p>Prof. Hans Uszkoreit presented Hybrid Machine Translation, which combines the two leading paradigms of machine translation: statistical machine translation and rule-based machine translation. Both ways have their advantages: statistical systems are ahead in closed domains, while in an open domain, rule-based systems do better. The main idea of hybrid machine translation is to substitute phrases from the rule-based translation with phrases from the statistical machine translation.<br />
Prof. Uszkoreit also presented Project EuroMatrix, a championship for translation with European languages.</p>
<p>Feiyu Xu explained how to use &#8220;seeds&#8221; to extract information from a text, e.g., the seed (ElBaradei, Nobel prize, peace, 2005) can help find similar information. Feiyu showed how important it is to select the right seed and how negative seeds (e.g., (nominated, Nobel Prize)) can improve precision (though recall then suffers).</p>
<p>Xiwen Cheng presented the EU project <a href="http://www.ofai.at/rascalli/project/project.html">RASCALLI</a> and gave a demo of their gossip agent.</p>
<p>Jun Liu gave an overview of the Chinese opinion analysis evaluation (COAE 2008), organized by the Chinese Information Processing Society of China. This could have been a very interesting starting point to learn more about this topic, but I was unable to find any Web page for it, just <a href="http://www.nlpr.ia.ac.cn/2008papers/gnhy/nh10.pdf">one paper</a>. Additionally, the results are completely anonymous, so you don&#8217;t even know who performed at what level. Not really useful. Related competitions for English are TREC and NTCIR MOAT (Multilingual opinion analysis task).</p>
<p>Hongyan Song discussed the problem that evaluating opinion mining requires an annotated opinion corpus, which is labor-intensive to produce. He showed how active learning can speed up the annotation. In his approach, the active learning algorithm queries the user for labels for selected training instances, which suits situations in which unlabeled data is abundant but labeling is expensive. The basic idea is to take those instances that the classifier is most unsure about and query the user about them. He uses <a href="http://www.keenage.com/html/e_index.html">HowNet, a Chinese common-sense knowledge system based on WordNet</a> for this purpose.</p>
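<p>The selection step described above is commonly called uncertainty sampling. The following Python sketch is my own illustration of that general idea, not the code or the exact method from the talk. The sentences and the classifier scores are made-up toy values, and <code>select_queries</code> is a hypothetical helper name.</p>

```python
def uncertainty(prob_positive: float) -> float:
    """Score how unsure a binary classifier is.

    The score is highest when the predicted probability is 0.5 and
    lowest when it is 0 or 1.
    """
    return 1.0 - abs(prob_positive - 0.5) * 2.0


def select_queries(unlabeled, predict_proba, batch_size=2):
    """Return the `batch_size` instances the classifier is least sure about."""
    ranked = sorted(
        unlabeled,
        key=lambda item: uncertainty(predict_proba(item)),
        reverse=True,
    )
    return ranked[:batch_size]


# Toy stand-in for a trained sentiment classifier: sentence -> P(positive).
# These scores are invented for the example.
TOY_SCORES = {
    "great teacher": 0.95,
    "the room was fine": 0.55,
    "terrible schedule": 0.05,
    "not bad at all": 0.40,
}

# The two sentences with scores closest to 0.5 are handed to the annotator.
to_label = select_queries(list(TOY_SCORES), TOY_SCORES.get)

if __name__ == "__main__":
    print(to_label)  # ['the room was fine', 'not bad at all']
```

<p>In a full loop, the annotator&#8217;s labels would be added to the training set, the classifier retrained, and the selection repeated until the annotation budget is used up.</p>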
<p>Xiaojun Zhang presented an iterative reinforcement approach for attribution-sentiment pair extraction. His approach starts with attribution/sentiment seeds and then retrieves further candidate attribution/sentiment pairs from the training data. He too uses HowNet to compute the similarity to the seeds.</p>
<p>I&#8217;m not well-versed in Machine Translation, so this workshop helped me get an idea of what has become possible today and gave me an introduction to how it is done. At Totuba we are looking at many ways to automatically extract information about courses from the Web. However, we face many challenges, as there is a lack of standardisation and of open source libraries that we can draw data from. Prof. Uszkoreit suggested investigating more deeply how people encode addresses, pricing information, etc., not only in the language course domain but in other domains, too, to enlarge the number of examples we can use for training. We have already included this suggestion in our ongoing investigations. Prof. Uszkoreit cited the work of Frank Puppe on <a href="http://textmarker.sourceforge.net/">Textmarker, a system for learning meta knowledge for rule-based knowledge-extraction</a>.</p>
<p>To sum up, a very interesting workshop and one that shows how research can be facilitated and enabled by existing tools, libraries and data.</p>
]]></content:encoded>
			<wfw:commentRss>http://labs.totuba.com/?feed=rss2&amp;p=8</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Research topics at Totuba</title>
		<link>http://labs.totuba.com/?p=5</link>
		<comments>http://labs.totuba.com/?p=5#comments</comments>
		<pubDate>Tue, 21 Jul 2009 05:07:30 +0000</pubDate>
		<dc:creator>carsten.ullrich</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[semantic]]></category>
		<category><![CDATA[toolkit]]></category>
		<category><![CDATA[workspace]]></category>

		<guid isPermaLink="false">http://labs.totuba.com/?p=5</guid>
		<description><![CDATA[I am involved in two major research projects here at Totuba, and in the following posts, I will present these in more detail. The first one is the Totuba Toolkit, a set of tools that make life easier for learners and researchers. The toolkit consists of the following components:

The Totuba Research Assistant helps students to [...]]]></description>
			<content:encoded><![CDATA[<p>I am involved in two major research projects here at Totuba, and in the following posts, I will present these in more detail. The first one is the Totuba Toolkit, a set of tools that make life easier for learners and researchers. The toolkit consists of the following components:</p>
<ul>
<li>The Totuba Research Assistant helps students to capture, categorize and reference useful information off the web.</li>
<li>The Totuba Knowledge Repository stores information saved from the Totuba Research Assistant and acts as the student&#8217;s long-term personal information repository.</li>
<li>The Totuba Workspace provides support while writing research and term papers. It extracts the main topics from your document and suggests further reading about these topics. The Workspace also shows which topics are related to yours, and how. This enables you to quickly find the additional information you need.</li>
</ul>
<p>The Totuba Toolkit is in an open, rolling development mode. If you want to give it a try, please <a href="mailto:toolkit@totuba.com">send an email to toolkit@totuba.com</a>.</p>
<p>Our main goal was to find out what can be done with today&#8217;s technology. For instance, the Totuba Workspace is based on <a href="http://en.wikipedia.org/wiki/Semantic_Web">Semantic Web technology</a>. We won&#8217;t go into detail right now, but we are using <a href="http://dbpedia.org/">DBPedia</a>, a service that represents <a href="http://en.wikipedia.org/wiki/Main_Page">Wikipedia</a> in a Semantic Web format, that is, in a form that machines can process automatically.</p>
<p>The second research project we are working on is also related to the Semantic Web. Here at Totuba, we have <a href="http://www.totuba.com/">a public database containing thousands of courses (e.g., Mandarin courses, MBA, etc.)</a>. We collected this data from various sites and manually curated it so that you as a learner can easily find and compare the courses best suited for you. This is valuable data. We believe that this data will be most beneficial to us and to you if we share it. One principle underlying the big shift from &#8220;Web 1.0&#8221; to Web 2.0 was that Web sites became more open. <a href="http://twitter.com">Twitter</a> is the best example of that. Twitter makes its functionality (posting updates, reading updates) and data (the updates) available to other sites and tools via Web interfaces. This allowed a garden of services and tools to grow around Twitter: desktop applications, search engines, <a href="http://beta.twittervision.com/">visualization sites</a>, even <a href="http://twistori.com/">art projects</a>.<br />
The idea behind the Semantic Web is to take this openness one step further. Today, Twitter has its own interface, Flickr has its own, and so on. The Semantic Web specifies a set of standards for making your data available so that other services can access it.<br />
That is what we want to explore at Totuba. What will happen once we make valuable data like ours available? What kind of data exchange opportunities and mash-up ideas will our efforts generate? More about this in a later post.</p>
]]></content:encoded>
			<wfw:commentRss>http://labs.totuba.com/?feed=rss2&amp;p=5</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Welcome to Totuba Labs: Blogging our investigations and innovations</title>
		<link>http://labs.totuba.com/?p=3</link>
		<comments>http://labs.totuba.com/?p=3#comments</comments>
		<pubDate>Tue, 21 Jul 2009 04:45:15 +0000</pubDate>
		<dc:creator>carsten.ullrich</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[general]]></category>

		<guid isPermaLink="false">http://labs.totuba.com/?p=3</guid>
		<description><![CDATA[Welcome to the Totuba Labs, the new home for discussion of Totuba&#8217;s research and development activities. Totuba is a Shanghai-based start-up company that supports education ventures in China via its consulting services, the totuba.com course search and comparison website, the China Education Blog, and its research and development effort to improve educational tools and practices. Here in [...]]]></description>
			<content:encoded><![CDATA[<p>Welcome to the Totuba Labs, the new home for discussion of Totuba&#8217;s research and development activities. <a title="Labs link to Totuba Corporate" href="http://corporate.totuba.com/about/">Totuba</a> is a Shanghai-based start-up company that <a title="Labs link to Corporate services" href="http://corporate.totuba.com/supporting-china-education-ventures/">supports education ventures in China</a> via its <a title="Labs link to Corporate services" href="http://corporate.totuba.com/supporting-china-education-ventures/">consulting services</a>, the <a title="Labs link to Totuba Course Search" href="http://www.totuba.com/">totuba.com course search and comparison website</a>, <a title="Labs link to China Education Blog" href="http://www.chinaeducationblog.com/">the China Education Blog</a>, and its research and development effort to improve educational tools and practices. Here in this blog, Carsten Ullrich, that is me, will keep you up-to-date about how we apply the latest research results here at Totuba to serve these goals.</p>
<p>I have over ten years of experience in the field of Artificial Intelligence (AI) with a focus on web-based learning. As a researcher at <a href="http://www.uni-saarland.de">Saarland University</a> and the <a href="http://www.dfki.de">German Research Center for AI (DFKI)</a>, I have worked on international projects and was a founding member of the <a href="http://www.activemath.org">ActiveMath</a> group. Currently, I&#8217;m a researcher at the <a href="http://www.dlc.sjtu.edu.cn/">e-learning lab of Shanghai Jiao Tong University</a>. There, I&#8217;m involved in the <a href="http://www.role-project.eu/">European Project ROLE</a> and am working on mobile learning and active language learning.</p>
<p>Here at Totuba I am working on harnessing the potential of the Semantic Web for innovative learning services. More on that in the next posts.<br />
For more information about me, visit my <a href="http://www.carstenullrich.net/">homepage</a>. From time to time I write about things I find interesting on <a href="http://bloggingullrich.blogspot.com/">my blog</a>. You can also friend me on <a href="http://twitter.com/ullrich">Twitter</a> and <a href="http://friendfeed.com/ullrich">FriendFeed</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://labs.totuba.com/?feed=rss2&amp;p=3</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
