<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-1374054023917435198</id><updated>2011-12-26T16:02:38.088-07:00</updated><category term='bayesian analysis'/><category term='technology'/><category term='tools'/><category term='learning theory'/><category term='sample complexity'/><category term='reinforcement learning'/><category term='technical'/><category term='approximation theory'/><category term='bayesian models'/><category term='representation learning'/><category term='latex'/><category term='annoyance'/><category term='models'/><category term='Beamer style'/><category term='frequentist approach'/><category term='curse of dimensionality'/><category term='pdf'/><category term='presentation'/><category term='djvu'/><category term='matlab'/><category term='non-parametric statistics'/><category term='compression'/><category term='mac osx'/><category term='AI'/><category term='model selection'/><category term='Occam&apos;s razor'/><category term='mathematics'/><category term='Keynote'/><category term='aggregation'/><category term='clinical trials'/><category term='statistics'/><category term='optimization tools'/><category term='machine learning'/><category term='image processing'/><category term='thunderbird'/><category term='blogging'/><category term='X11'/><category term='Powerpoint'/><category term='optogenetics'/><category term='artificial intelligence'/><category term='jbig2'/><category term='mixing'/><category term='exploration'/><category term='svn'/><category term='science'/><title type='text'>Readings in Machine Learning</title><subtitle type='html'>Machine learning related posts, thoughts, ideas..</subtitle><link 
rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://readingsml.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://readingsml.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Csaba Szepesvári</name><uri>http://www.blogger.com/profile/13790307935040509983</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://www.cs.ualberta.ca/~szepesva/image_new.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>20</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-1374054023917435198.post-6600661100250465029</id><published>2011-05-04T08:43:00.007-07:00</published><updated>2011-05-04T11:57:31.040-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='AI'/><category scheme='http://www.blogger.com/atom/ns#' term='artificial intelligence'/><category scheme='http://www.blogger.com/atom/ns#' term='machine learning'/><title type='text'>Brains, Minds and Machines</title><content type='html'>Former UofA student,&lt;a href="http://people.csail.mit.edu/agf/Homepage/Welcome.html"&gt; Alborz&lt;/a&gt;, shared a &lt;a href="http://amps-web.amps.ms.mit.edu/public/150th/may3/"&gt;link&lt;/a&gt; to a video recording of a recent MIT150 symposium on Brains, Minds and Machines on facebook. I watched the video yesterday (guess what, I need to mark 40 something finals, hehe:)).&lt;br /&gt;I wrote a comment back to Alborz on facebook and then I thought, why not make this a blog post? So, here it goes, edited, expanded. Warning: Spoilers ahead and the summary will be biased. 
Anyhow..&lt;br /&gt;&lt;br /&gt;The title of the panel was: "&lt;span style="font-style: italic;"&gt;The Golden Age — A Look at the Original Roots of Artificial Intelligence, Cognitive Science, and Neuroscience&lt;/span&gt; " and the panelists were &lt;strong&gt;Emilio Bizzi, &lt;/strong&gt;&lt;strong&gt;Sydney Brenner, &lt;/strong&gt;&lt;strong&gt;Noam Chomsky, &lt;/strong&gt;&lt;strong&gt;Marvin Minsky, &lt;/strong&gt;&lt;strong&gt;Barbara H. Partee and &lt;/strong&gt;&lt;strong&gt;Patrick H. Winston. &lt;/strong&gt;The panel was moderated by &lt;strong&gt;Steven Pinker &lt;/strong&gt;who started with a 20-30 minute introduction. Once this was done, each of the panelists delivered a little speech, and at the end Pinker asked about two questions.&lt;br /&gt;&lt;br /&gt;My heroes in the panel were &lt;span style="font-weight: bold;"&gt;Minsky&lt;/span&gt; and &lt;span style="font-weight: bold;"&gt;Winston&lt;/span&gt;. They rocked! Minsky almost fell asleep during his talk, but he was well aware of this and I loved him. He told a story about Asimov not wanting to come to his lab to see the real robots (he did not want to get disappointed) and about von Neumann, who said that he did not know whether Minsky's thesis could qualify as a thesis in mathematics (they were both at Princeton in the math department), but that soon it would. I really enjoyed this part. Winston acted a bit like a comedian. I did not mind this either. One thing that Minsky and Winston both said is that the mistakes happened when AI became successful and everyone from then on seemed to forget the science part of AI. But they did not say much about how we can get back on track (except that we should try). Winston blamed the short-sightedness of funding agencies, and who would disagree?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Chomsky&lt;/span&gt; made some interesting claims. He claimed that language is designed for thought and not for communication. 
This was a pretty interesting claim. He claimed other things supporting his idea about innate, universal grammars, but I did not know how much credit to give to those as I have many linguist friends who strongly disagree. Things became interesting when, answering a question at the end, he dismissed the whole of machine learning. He talked about "a novel scientific criterion" that was never heard of before, referring to being able to predict "on unanalyzed data". He said that "of course" with enough data you will do better, but it seemed that he thinks that the evaluation criterion is already ridiculous. He also said that a little statistics does not hurt, but he still seems to think that the big deal is the engineering part. (He did not say this in these words, but this is what I got from what he said.)&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Sydney Brenner&lt;/strong&gt; (pioneer in genetics and molecular biology) was puzzled about why he was invited, though he had some good stories. I liked it when he said that in 50 years people will not understand why everyone talked about consciousness 50 years ago.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Emilio Bizzi &lt;/strong&gt;(a big shot in neuroscience, in particular in motor control) talked about modularity, "dimension reduction" and generalization, and he looked like a fine Italian gentleman, but I have to confess that I don't remember anything else, though this could have been because it was late.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Barbara H. Partee&lt;/span&gt; read a script. In the first 5 minutes or so she was mostly praising Chomsky. Then she started to talk about her own work, which was foundational in semantics. She talked about how semantics is the real thing. By semantics she means formal semantics (as in logic). While in general I am fond of this work, I am not sure if anything like this is going on in our brains and if sentences really do have a meaning in a formal sense. 
It seems to me that the idea that sentences can have a formal meaning is more likely an illusion, a post-hoc thought, than the real thing. And it is unclear if bringing in formal logic is going to get us anywhere. Unfortunately, there was no discussion of this at all. At one point she made the remark that "search engines do not use semantics", but then left us in the dark about how we could do any better. Oh, well..&lt;br /&gt;&lt;br /&gt;In summary, an impressive set of people, some nice stories, but little cutting edge science. Loads of romanticism about the 50s and 60s and no advice for the young generation. The title tells it all. It was still nice to see these people.&lt;div class="blogger-post-footer"&gt;MLReadings&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1374054023917435198-6600661100250465029?l=readingsml.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsml.blogspot.com/feeds/6600661100250465029/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1374054023917435198&amp;postID=6600661100250465029&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/6600661100250465029'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/6600661100250465029'/><link rel='alternate' type='text/html' href='http://readingsml.blogspot.com/2011/05/brains-minds-and-machines.html' title='Brains, Minds and Machines'/><author><name>Csaba Szepesvári</name><uri>http://www.blogger.com/profile/13790307935040509983</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://www.cs.ualberta.ca/~szepesva/image_new.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1374054023917435198.post-6952237298442385661</id><published>2011-04-13T20:09:00.006-07:00</published><updated>2011-05-17T09:54:09.261-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='latex'/><category scheme='http://www.blogger.com/atom/ns#' term='tools'/><category scheme='http://www.blogger.com/atom/ns#' term='technology'/><category scheme='http://www.blogger.com/atom/ns#' term='technical'/><category scheme='http://www.blogger.com/atom/ns#' term='svn'/><title type='text'>Useful latex/svn tools (merge, clean, svn, diff)</title><content type='html'>This post is about some tools that I have developed (and yet another one that I have downloaded) which help me to streamline my latex work cycle. I make the tools available, hoping that other people will find them useful. However, they are admittedly limited (more about this below) and, as usual for free stuff, they come with zero guarantee. Use them at your own risk.&lt;br /&gt;&lt;br /&gt;The first little tool is for creating a cleaned up file before submitting it to a publisher who asks for source files. I call it &lt;span style="font-style: italic;"&gt;ltxclean.pl&lt;/span&gt;; it is written in Perl. 
It can be downloaded from &lt;a href="http://www.ualberta.ca/%7Eszepesva/sourcecode/ltxclean.pl"&gt;here&lt;/a&gt;.&lt;br /&gt;The functionality is&lt;br /&gt;(1) to remove latex comments&lt;br /&gt;(2) to remove \todo{} commands&lt;br /&gt;(3) to merge files included from a main file into the main file&lt;br /&gt;(4) to merge the bbl file into the same main file&lt;br /&gt;&lt;br /&gt;If you make the tool executable (chmod a+x ltxclean.pl), you can use it like this:&lt;br /&gt;&lt;br /&gt;$ ltxclean.pl  main.tex &amp;gt; cleaned.tex&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;How does this work?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The tool reads in the source tex file, processes it line by line and writes its output to the standard output stream, which you can redirect (as shown above) to a file.&lt;br /&gt;Thus, whatever the tool does is limited to the individual lines. This is a limitation, but it made it possible for me to write this tool in probably less time than I am spending on writing about it now.&lt;br /&gt;There are other limitations, see below. Now, how do we know that this worked? The advice is to run &lt;span style="font-family:courier new;"&gt;latex+dvips&lt;/span&gt; on both the original and the cleaned file and then &lt;span style="font-family:courier new;"&gt;diff original.ps new.ps&lt;/span&gt; to see if there is any significant change. On the files I have tried, the only difference was the filename and the date.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Why this functionality and the gory details&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;As it happens, &lt;span style="font-style: italic;"&gt;removing the comments&lt;/span&gt; before you submit a source file is &lt;span style="font-style: italic;"&gt;crucial&lt;/span&gt;. Not long ago, I submitted a source file to a publisher and did not bother to remove the comments. 
At the publisher, they loaded the file into a program, which wrapped the long lines, including the ones with comments! This created a lot of garbage in the middle of the text. We were pressed for time, so I could not check the text in detail. The result: The text was printed with a lot of garbage! Too bad!! A painful experience for me.. I will never again submit source files with the comments kept in the file! Now, the above utility is meant to handle comments correctly. It takes care not to create empty lines (and thus new lines) inadvertently, not to remove end-of-line comments, etc.&lt;br /&gt;&lt;br /&gt;The &lt;span style="font-style: italic;"&gt;\todo{}&lt;/span&gt; commands belong to the same category: They are better removed before submitting the file. For my todos, I use the &lt;a href="http://www.ctan.org/tex-archive/macros/latex/contrib/todonotes/"&gt;todonotes&lt;/a&gt; package, which puts todo notes on the margin (or within the text). This package supplies the \todo[XX]{ZZZ} command, where [XX] is optional. The above little script removes such todo commands, but only if they span a single line. For now, you would need to remove multi-line todos by hand.&lt;br /&gt;&lt;br /&gt;Another service of this little tool is to &lt;span style="font-style: italic;"&gt;merge multiple files&lt;/span&gt; into a single one. Oftentimes, we use the latex command &lt;span style="font-style: italic;"&gt;\input&lt;/span&gt; to break a large source file into multiple files. However, publishers typically want just one file. So this tool reads in the main file and then, recursively, whenever it sees \input{FILE} in the source, it reads in the corresponding file and processes it before it continues with the current file (just like latex would work).&lt;br /&gt;&lt;br /&gt;Finally, if the tool finds a \bibliography{...} command, it will take that out and open the .bbl file sharing the same base name as the input to the tool. 
Thus, if the tool was called on the file main.tex, when seeing a bibliography command, the tool will attempt to open main.bbl and include it in place of the \bibliography command. (If you use &lt;a href="http://www.tug.org/applications/hyperref/manual.html"&gt;hyperref&lt;/a&gt;, turn off &lt;span class="ec-lmtt-10"&gt;pagebackref&lt;/span&gt;, otherwise this functionality will not work.)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Managing revisions with svn&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Two other small utilities that I make available are &lt;a href="http://www.ualberta.ca/%7Eszepesva/sourcecode/svnprevdiff"&gt;svnprevdiff&lt;/a&gt; and &lt;a href="http://www.ualberta.ca/%7Eszepesva/sourcecode/svnreviewchanges"&gt;svnreviewchanges&lt;/a&gt;.&lt;br /&gt;The purpose of these scripts is to help one review changes to files which are under &lt;a href="http://en.wikipedia.org/wiki/Apache_Subversion"&gt;svn&lt;/a&gt; control.&lt;br /&gt;There is a third script, &lt;a href="http://www.ualberta.ca/%7Eszepesva/sourcecode/diffmerge"&gt;diffmerge&lt;/a&gt;, called by the above two scripts. This script takes two file arguments and loads these into the program &lt;a href="http://www.sourcegear.com/diffmerge/"&gt;DiffMerge&lt;/a&gt;, which allows you to visually inspect the differences between the two files and make changes to the second one loaded. On a different platform/installation, or if you want to use a different tool for comparing/merging files, you will need to modify this script.&lt;br /&gt;&lt;br /&gt;The utility &lt;span style="font-style: italic;"&gt;svnreviewchanges&lt;/span&gt; takes a file as an argument, compares it to its base version stored on your disk and opens up the two versions for comparison using diffmerge. 
The purpose is to allow one to quickly review how a file was changed before committing it to the svn server (so that you can write meaningful comments in the commit message).&lt;br /&gt;&lt;br /&gt;The utility &lt;span style="font-style: italic;"&gt;svnprevdiff&lt;/span&gt; takes a filename as an argument, compares it to its &lt;span style="font-style: italic;"&gt;previous&lt;/span&gt; version &lt;span style="font-style: italic;"&gt;stored on the svn server&lt;/span&gt; and then opens up the two versions using diffmerge. The purpose of this is to check the changes implemented by your pals &lt;span style="font-style: italic;"&gt;after&lt;/span&gt; an update. A future version will take an optional argument which when present will be interpreted as a revision number. Maybe.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Advice on using latex when working in a team: Break long lines&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;A small but useful thing is to put every sentence on its own line and generally avoid long lines (even when writing equations). The reason is that this will make the job of diff much easier. And believe me, diffing is something people will end up doing for good or bad (mostly good) when they are on a team.&lt;br /&gt;&lt;br /&gt;Some of my friends, like &lt;a href="http://www.szit.bme.hu/%7Eantos/"&gt;Antoska&lt;/a&gt;, would recommend breaking up the individual sentences into multiple lines. You can do this, but if you overdo it, you will find yourself fiddling way too much with what goes into which line.&lt;br /&gt;&lt;br /&gt;Finally, a tool which does this, written by &lt;a href="http://www.math.ntnu.no/%7Estacey/"&gt;Andrew Stacey&lt;/a&gt;, is &lt;a href="http://www.math.ntnu.no/%7Estacey/HowDidIDoThat/LaTeX/fmtlatex"&gt;fmtlatex.pl&lt;/a&gt;.&lt;br /&gt;This is also in Perl and its documentation will be printed to the screen if you use &lt;code&gt;perldoc fmtlatex. 
&lt;/code&gt;I still have to try this.&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;&lt;br /&gt;&lt;/code&gt;&lt;div class="blogger-post-footer"&gt;MLReadings&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1374054023917435198-6952237298442385661?l=readingsml.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsml.blogspot.com/feeds/6952237298442385661/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1374054023917435198&amp;postID=6952237298442385661&amp;isPopup=true' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/6952237298442385661'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/6952237298442385661'/><link rel='alternate' type='text/html' href='http://readingsml.blogspot.com/2011/04/useful-latexsvn-tools-merge-clean-svn.html' title='Useful latex/svn tools (merge, clean, svn, diff)'/><author><name>Csaba Szepesvári</name><uri>http://www.blogger.com/profile/13790307935040509983</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://www.cs.ualberta.ca/~szepesva/image_new.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1374054023917435198.post-6006039475470116561</id><published>2011-04-13T19:22:00.005-07:00</published><updated>2011-04-13T19:53:17.621-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mac osx'/><category scheme='http://www.blogger.com/atom/ns#' term='technology'/><category scheme='http://www.blogger.com/atom/ns#' term='technical'/><category scheme='http://www.blogger.com/atom/ns#' term='matlab'/><category scheme='http://www.blogger.com/atom/ns#' term='X11'/><title type='text'>I can run Matlab on my 
Mac again!</title><content type='html'>After much struggling today I managed to make Matlab run again on my Mac.&lt;br /&gt;The major problem was that Matlab complained that I had the wrong version of X11 installed on my system and would not start. As I finished teaching for the semester today, I thought I would celebrate by resolving this issue, which I had been struggling with for a year or so. On the internet you will see a lot of advice on what to do, and as they say, the truth is indeed out there; however, it is not so easy to find. In a nutshell what seems to happen is this:&lt;br /&gt;&lt;br /&gt;Why does Matlab not start when other applications (say, Gimp, Gnuplot using X11, etc.) do?&lt;br /&gt;Matlab seems to make the assumption that the X11 libraries are located at /usr/X11/lib and it &lt;span style="font-style: italic;"&gt;sticks to this assumption no matter how your system is configured&lt;/span&gt;. I use XQuartz and macports' X11 and they put stuff elsewhere. I had some legacy code sitting in /usr/X11/, which I did not use. It was a remnant of some version of X11 that I used probably 2 or 3 laptops ago. Matlab reported that the lib was found, but the "architecture was wrong". The error message contained something like:&lt;br /&gt;&lt;br /&gt;..&lt;span style="font-size:85%;"&gt;&lt;span style="font-family: courier new;"&gt; &lt;/span&gt;&lt;span style="font-family: courier new;"&gt; Did find: &lt;/span&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family: courier new;"&gt;/usr/X11R6/lib/libXext.6.dylib: mach-o, but wrong architecture&lt;/span&gt;&lt;/span&gt;..&lt;br /&gt;&lt;br /&gt;Anyhow, here is one solution.&lt;br /&gt;You have to arrange that /usr/X11 points to a directory that has a working X11 copy.&lt;br /&gt;It is probably a good idea to first clean up the old X11 installation. 
You can do this by following the advice on the &lt;a href="http://xquartz.macosforge.org/trac/wiki/X11-UsersFAQ"&gt;XQuartz FAQ&lt;/a&gt; page and issuing the following commands in the terminal, one per line:&lt;br /&gt;&lt;pre class="wiki"&gt;&lt;span style="font-family: courier new;"&gt;sudo rm -rf /usr/X11* /System/Library/Launch*/org.x.* /Applications/Utilities/X11.app /etc/*paths.d/X11&lt;br /&gt;sudo pkgutil --forget com.apple.pkg.X11DocumentationLeo&lt;br /&gt;sudo pkgutil --forget com.apple.pkg.X11User&lt;br /&gt;sudo pkgutil --forget com.apple.pkg.X11SDKLeo&lt;br /&gt;sudo pkgutil --forget org.x.X11.pkg&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;Then I reinstalled the latest XQuartz copy (not all these steps might be necessary, but in order to stay on the safe side, I will describe everything I did).&lt;br /&gt;I also have &lt;a href="http://www.macports.org/"&gt;macports&lt;/a&gt; installed, and the ports xorg-libX11, xorg-libXp and xorg-server seem necessary for the following steps to succeed (but possibly other xorg-* ports are also needed). I am guessing that XQuartz does not install all the libraries, but after installing enough xorg-* ports through macports, all the libraries used by Matlab will be installed.&lt;br /&gt;&lt;br /&gt;Now, my X11 is located at /opt/X11 and some additional libs are found at /opt/local/lib.&lt;br /&gt;&lt;br /&gt;So I created a bunch of symbolic links:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family: courier new;"&gt;sudo ln -s /opt/X11 /usr/X11&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;for i in /opt/local/lib/libX* ; do sudo ln -s $i /usr/X11/lib; done&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;The first line creates a symbolic link to /opt/X11, while the second is necessary because of the additional libX* libraries which, for some reason, macports puts into /opt/local/lib instead of putting them into /opt/X11/lib. 
Initially I did not know that I needed these libs, and then Matlab complained that it did not find the image for some lib (it was /usr/X11/lib/libXp.6.dylib).&lt;br /&gt;&lt;br /&gt;Anyhow, I am really happy that this worked!&lt;br /&gt;I hope people who have the same trouble will find my post useful.&lt;div class="blogger-post-footer"&gt;MLReadings&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1374054023917435198-6006039475470116561?l=readingsml.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsml.blogspot.com/feeds/6006039475470116561/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1374054023917435198&amp;postID=6006039475470116561&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/6006039475470116561'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/6006039475470116561'/><link rel='alternate' type='text/html' href='http://readingsml.blogspot.com/2011/04/i-can-run-matlab-on-my-mac-again.html' title='I can run Matlab on my Mac again!'/><author><name>Csaba Szepesvári</name><uri>http://www.blogger.com/profile/13790307935040509983</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://www.cs.ualberta.ca/~szepesva/image_new.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1374054023917435198.post-7127922686046549879</id><published>2009-11-17T08:55:00.008-07:00</published><updated>2009-11-18T09:12:11.397-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='jbig2'/><category scheme='http://www.blogger.com/atom/ns#' term='image processing'/><category scheme='http://www.blogger.com/atom/ns#' 
term='technology'/><category scheme='http://www.blogger.com/atom/ns#' term='pdf'/><category scheme='http://www.blogger.com/atom/ns#' term='djvu'/><category scheme='http://www.blogger.com/atom/ns#' term='compression'/><title type='text'>Djvu vs. Pdf</title><content type='html'>Long post again, so here is the &lt;span style="font-weight: bold;"&gt;executive summary&lt;/span&gt;: Djvu files are typically smaller than Pdf files. Why? Can we further compress pdf files? Yes, we can, but the current best solution has limitations. And you can forget all "advanced" commercial solutions. They are not as good as a free solution.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Introduction&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://en.wikipedia.org/wiki/DjVu"&gt;DJVU&lt;/a&gt; is a proprietary file format by LizardTech. Incidentally, it was invented by some machine learning researchers, &lt;a href="http://yann.lecun.com/" title="Yann LeCun"&gt;Yann LeCun&lt;/a&gt;, &lt;a href="http://leon.bottou.org/" title="Léon Bottou"&gt;Léon Bottou&lt;/a&gt;, &lt;a href="http://www2.research.att.com/%7Ehaffner/" class="new" title="Patrick Haffner (page does not exist)"&gt;Patrick Haffner&lt;/a&gt; and the image compression researcher &lt;a href="http://www.informatik.uni-trier.de/%7Eley/db/indices/a-tree/h/Howard:Paul_G=.html" class="new" title="Paul G. Howard (page does not exist)"&gt;Paul G. Howard&lt;/a&gt; at AT&amp;amp;T back in 1996. The &lt;a href="http://djvu.sourceforge.net/"&gt;DJVULibre&lt;/a&gt; library provides a free implementation, but is GPLd and hence is not suitable for certain commercial software, like &lt;a href="http://mekentosj.com/papers/"&gt;Papers&lt;/a&gt;, which I am using to organize my electronic paper collection. 
Hence, Papers might not support djvu in the near future (the authors of Papers do not want to make it free, and, well, this is their software, their call).&lt;br /&gt;Djvu files can be converted to Pdf files using &lt;a href="http://djvu.sourceforge.net/doc/man/ddjvu.html"&gt;ddjvu&lt;/a&gt;, a command line tool which is part of DJVULibre (&lt;a href="http://freshmeat.net/projects/djvu2pdf/"&gt;djvu2pdf&lt;/a&gt; is a script that calls this tool). Djvu can also be converted into PS files using &lt;a href="http://djvu.sourceforge.net/doc/man/djvups.html"&gt;djvups&lt;/a&gt; (then use ps2pdf). However, all these leave us with pretty big files compared to the originals and, on top of it, if there was an OCR layer in the Djvu file, it gets lost, but this is another story. How much bigger? Here is an illustration:&lt;br /&gt;&lt;br /&gt;Original djvu file:           9.9MB&lt;br /&gt;djvu2pdf file:              427.6MB(!)&lt;br /&gt;djvu2ps file:                     1.0GB&lt;br /&gt;djvu2ps, ps2pdf file:  162.6MB&lt;br /&gt;&lt;br /&gt;Note that I have turned on compression in the conversion process (-quality=50). (The quality degradation was not really noticeable at this level.) So, at best, I got more than 16 times the original file size. Going mad about it, I started to search the internet for better solutions. I spent almost a day on this (don't do this, especially if you are a student!)..&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;JBig2 and the tale of commercial solutions&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;First, I figured, the difference is that these use general image compression techniques (like jpeg), while djvu is specialized to text and black&amp;amp;white images. Thus, for example, it can recognize when the same character appears multiple times on a page, store a single template, and then store only references to that template. This is clever. 
I then figured that PDF files support the so-called &lt;a href="http://en.wikipedia.org/wiki/JBIG2"&gt;jbig2&lt;/a&gt; encoding standard, which is built around this idea. Hence the quest for software that would encode a document with a jbig2 encoder and put the result into a pdf file. The easiest would be if such software already existed. A few commercial packages indeed mention jbig2. I felt lucky (especially seeing that there are a few cheap ones). So, I started to download trial versions. Here are the results:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.imagepdf.com/"&gt;PDFJB2&lt;/a&gt;:                                              34.1MB&lt;br /&gt;&lt;a href="http://www.cvisiontech.com/products/general/pdfcompressor.html"&gt;CVision PdfCompressor&lt;/a&gt;:                    48MB&lt;br /&gt;CVision PdfCompressor with OCR: 49MB&lt;br /&gt;&lt;a href="http://www.a-pdf.com/"&gt;A-PDF&lt;/a&gt;:                                              106.8MB&lt;br /&gt;A-PDF + PDFCompress:               106.8MB&lt;br /&gt;djvu2pdf + &lt;a href="http://www.bureausoft.com/products.html#PDF%20Compress"&gt;PDFCompress&lt;/a&gt;:     conversion failed&lt;br /&gt;&lt;br /&gt;Hmm, interesting. 34MB is much better than 160MB, but it is still a long way from 9.9MB. (After a superficial look at the resulting files I concluded that only the A-PDF compressed file lost quality. What happened with this file is that on some page, in some line containing a mathematical formula, the tops of the letters got chopped off.)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Free, open source solutions&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Becoming desperate, I continued hunting for better solutions. Searching around, I found &lt;a href="http://www.lowagie.com/iText/"&gt;iText&lt;/a&gt;, which is an open source, free Java library supporting all kinds of manipulations of Pdf files. 
I figured that it "uses" Jbig2, but it was not clear if it uses it for compression or just knows how to handle the encoding. So, here I go, I wrote a java program opening a pdf file and then writing it out in "compressed" mode. Hmm, these few lines of code allowed me to create a file of size 26MB, smaller than what I could ever get previously. Exciting! Unfortunately, opening the file revealed the `secret': Quality was gone. The file looked to be seriously downsampled (i.e., the resolution was decreased). Not good.&lt;br /&gt;&lt;br /&gt;Then I found &lt;a href="http://code.google.com/p/pdfsizeopt" style="text-decoration: none; color: rgb(0, 0, 0);"&gt;pdfsizeopt&lt;/a&gt; on google code, which aims exactly at reducing the size of pdf files! The Holy Grail? Well, installing pdfsizeopt on my mac was far from easy (I use a Mac, which also runs Windows; quite handy as some of the above software runs only under Windows..). However, finally, I was able to run pdfsizeopt. Unfortunately, it seems to crash, without even looking at my pdf file (I hope the bug will be corrected soon and then I can report results using it). Along the way, I had to install &lt;a href="http://github.com/agl/jbig2enc"&gt;jbig2enc&lt;/a&gt;. For this, I just had to install &lt;a href="http://www.leptonica.com/"&gt;leptonica&lt;/a&gt; (version 1.62, not the latest one), which is really the part that does the image processing. JBig2Enc expects a tif file and produces "pdf" ready output (every page is put in a separate file), which can be concatenated into a single pdf file by the python script provided. Having jbig2enc on my system, I gave it a shot. I first used ddjvu to transform the input to a tif file (using the command line option, "-quality=75", resulting in a file of size 1GB). Then I used the jbig2 encoder with the command line arguments "-p -s". The result is this:&lt;br /&gt;&lt;br /&gt;jbig2enc: 3.8MB&lt;br /&gt;&lt;br /&gt;Wow!! 
Opening the file revealed a dirty little secret: Color images are gone, and the quality of some halftoned gray-scale images got degraded. However, line drawings were kept nicely and, in general, the quality was good (comparable to the original djvu file). Conversion to tif took 5 minutes, conversion from tif to jbig2 took ca. 4 minutes, altogether making the whole process take close to 10 minutes. (The other solutions were no faster. And the tests were run on a well-equipped MacBook Pro.)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Conclusions&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;jbig2enc seems to work, but you will lose colors. If you are happy with this, jbig2enc is the solution, though the process should be streamlined a bit (a small script could do this). Oh yes, I did not mention that these processes are not fast. I did not attempt to measure the speed, but conversion takes a lot of time. Jbig2Enc is maybe on the faster end of the spectrum.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Future work&lt;/span&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;pdfsizeopt is a good idea. It should be made to work.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;It would be nice to create a jbig2enc wrapper.&lt;/li&gt;&lt;li&gt;ddjvu is open source: Maybe it can be rewritten to support jbig2 directly. The added benefit could be that one could also keep the OCR layer of the original djvu file if one existed.&lt;/li&gt;&lt;li&gt;Along the way, I have found a cool google code project, &lt;a href="http://code.google.com/p/tesseract-ocr/"&gt;Tesseract&lt;/a&gt;, which is an open source OCR engine. How cool would it be if we had an OCR engine that helps the compression algorithm and eventually also puts an OCR layer on top of documents which lack text information (think of scanned documents, or documents converted from an old postscript file)? 
Currently, I am using Nuance's &lt;a href="http://www.nuance.com/imaging/products/pdfconverter.asp"&gt;Pdf Converter Professional&lt;/a&gt; (yes, I paid for it..), which I am generally very satisfied with apart from its speed. However, this could be the subject of another post.&lt;/li&gt;&lt;/ol&gt;PS: I have also tested the compression capabilities of Nuance's Pdf Converter Professional and Abbyy's product:&lt;br /&gt;Nuance:            132MB&lt;br /&gt;Abbyy:                 129MB&lt;br /&gt;Yes, I tried their advanced "MRC" compression, and in Nuance I explicitly selected jbig2. No luck.&lt;div class="blogger-post-footer"&gt;MLReadings&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1374054023917435198-7127922686046549879?l=readingsml.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsml.blogspot.com/feeds/7127922686046549879/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1374054023917435198&amp;postID=7127922686046549879&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/7127922686046549879'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/7127922686046549879'/><link rel='alternate' type='text/html' href='http://readingsml.blogspot.com/2009/11/djvu-vs-pdf.html' title='Djvu vs. 
Pdf'/><author><name>Csaba Szepesvári</name><uri>http://www.blogger.com/profile/13790307935040509983</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://www.cs.ualberta.ca/~szepesva/image_new.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1374054023917435198.post-9034117616933721945</id><published>2009-11-14T12:47:00.006-07:00</published><updated>2009-11-14T16:46:49.769-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Keynote'/><category scheme='http://www.blogger.com/atom/ns#' term='mac osx'/><category scheme='http://www.blogger.com/atom/ns#' term='Powerpoint'/><category scheme='http://www.blogger.com/atom/ns#' term='Beamer style'/><category scheme='http://www.blogger.com/atom/ns#' term='presentation'/><title type='text'>Keynote vs. Powerpoint vs. Beamer</title><content type='html'>A few days ago I decided to give &lt;a href="http://www.apple.com/iwork/keynote/"&gt;Keynote&lt;/a&gt;, Apple's presentation software, a try (part of iWork '09). Beforehand I used MS Powerpoint 2003, Impress from &lt;a href="http://www.neooffice.org/neojava/en/index.php"&gt;NeoOffice 3.0&lt;/a&gt; (OpenOffice's native Mac version) and LaTeX with &lt;a href="http://latex-beamer.sourceforge.net/"&gt;beamer&lt;/a&gt;. Here is a comparison of the ups and downs of these programs, mainly to remind myself when I reconsider my choice in half a year and also to help people decide what to use for their presentations. Comments, suggestions, criticism are absolutely welcome, as usual. Btw, while preparing this note I have learned that &lt;a href="http://go-oo.org/"&gt;go-oo.org&lt;/a&gt; has a native Mac Aqua version of OpenOffice. Maybe I will try it some day and update the post. 
It would also be good to include a recent version of Powerpoint in the comparison.&lt;br /&gt;&lt;h3&gt;Stability&lt;/h3&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Keynote: Excellent &lt;/span&gt;&lt;br /&gt;This is based on only a few days of usage, so take this statement with a grain of salt..&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;MS Powerpoint 2003: Excellent&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Impress: Poor&lt;/span&gt;&lt;br /&gt;Save your work very often.&lt;/li&gt;&lt;li style="font-weight: bold;"&gt;Beamer: Excellent&lt;/li&gt;&lt;/ul&gt;&lt;h3&gt;Creating visually appealing slides, graphics on slides&lt;br /&gt;&lt;/h3&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Keynote: Excellent&lt;/span&gt;&lt;br /&gt;Positioning rulers help a lot. The process is really smooth. Keynote forces you to use less text. The built-in templates look professional. Adding presentation graphics (tables, basic charts) is very easy. Fancier graphics (technical drawings) are better done with &lt;a href="http://www.omnigroup.com/applications/OmniGraffle/"&gt;OmniGraffle&lt;/a&gt;. You can also easily animate the graphics and tables. Overall, very impressive.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;MS Powerpoint 2003: Good&lt;/span&gt;&lt;br /&gt;Aligning to other objects is more cumbersome than in Keynote. The quality of fonts, color palettes, templates is not as good as in Keynote.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Impress: Good&lt;/span&gt;&lt;br /&gt;Same as MS Powerpoint, maybe somewhat below (but the difference is not big).&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Beamer: Poor&lt;/span&gt;&lt;br /&gt;The fonts and styles (templates) are great. 
However, creating slides with lively graphics is a nightmare (due to the lack of a GUI): You will end up with a few standard layouts, you will in general not use graphics, let alone animated graphics (or you will spend days on creating your slides). Also, departing from the styles is difficult and I am just bored of some of these styles that everyone seems to use.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3&gt;LaTeX (math) support&lt;/h3&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Keynote: Poor&lt;/span&gt;&lt;br /&gt;Supported through &lt;a href="http://pierre.chachatelier.fr/programmation/latexit_en.php"&gt;LatexIt&lt;/a&gt; (free), but overall a cumbersome process. Details below.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;MS Powerpoint 2003: &lt;/span&gt;&lt;span style="font-weight: bold;"&gt;Medium&lt;/span&gt;&lt;br /&gt;Supported through &lt;a href="http://texpoint.necula.org/buy.html"&gt;TexPoint&lt;/a&gt; (commercial, USD30). The process is roughly the same as with LatexIt and Keynote, with slightly better integration.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Impress: Medium&lt;/span&gt;&lt;br /&gt;Supported through &lt;a href="http://ooolatex.sourceforge.net/"&gt;OOoLatex&lt;/a&gt; (free). Same as MS Powerpoint + TexPoint, but the integration is slightly better.&lt;br /&gt;&lt;/li&gt;&lt;li style="font-weight: bold;"&gt;Beamer: Excellent&lt;br /&gt;&lt;span style="font-weight: normal;"&gt;Beamer is built for this!&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3&gt;Animations&lt;/h3&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Keynote: Near perfect&lt;/span&gt;&lt;br /&gt;The magic slide transition helps a lot with continuity across slides. What does this do? If you have the same object on two consecutive slides, Keynote will create an animation, keeping the object on screen and flying it to its new position. It works with multiple objects, too. 
I have found this very helpful for presenting a multi-slide argument. In general, Keynote animations are slick and polished, and the flexibility is great. I miss some features of Beamer, such as animated highlighting and in-place replacement of some text (these can all be simulated with the existing tools, but only with difficulty).&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;MS Powerpoint 2003: Basic&lt;/span&gt;&lt;br /&gt;I miss Keynote's magic transitions. In general, Keynote is richer in animations. Again, some features of Beamer would be nice to have.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Impress: Weak&lt;/span&gt;&lt;br /&gt;Impress is inferior to MS Powerpoint in terms of its animation capabilities.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Beamer: Good&lt;/span&gt;&lt;br /&gt;If only someone added support for magic transitions between slides. Some other cool effects would also come in handy.&lt;/li&gt;&lt;/ul&gt;&lt;h3&gt;Dual screen presentation support&lt;/h3&gt;The idea is to show notes and the time left in addition to the current and next slide on your screen, while showing the current slide on the big screen.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Keynote: Excellent&lt;/span&gt;&lt;br /&gt;Keynote supports dual screen presentations natively. If you need to swap displays, use the options menu on the notes screen. (If the displays need swapping, the notes screen will, obviously, be the one showing on the big screen.)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;MS Powerpoint 2003: Not available&lt;/span&gt;&lt;br /&gt;I have no experience with this feature of MS Powerpoint. Maybe you can use an add-on or something, but the basic software does not support it. 
I am pretty sure newer versions of Powerpoint must support this.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Impress: Excellent(?)&lt;/span&gt;&lt;br /&gt;The "&lt;a href="http://extensions.services.openoffice.org/project/presenter-screen"&gt;Sun Presenter Console&lt;/a&gt;" extension supposedly supports dual screen presentations just like Keynote, but I have never had the chance to test it. Hence, the question mark. Some posts on the internet indicate that the extension might leak memory.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Beamer: Basic support&lt;/span&gt;&lt;br /&gt;Use &lt;a href="http://code.google.com/p/splitshow/"&gt;Splitshow&lt;/a&gt; for this purpose. However, as far as I know, you cannot show the current time or the time remaining on the notes screen.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3&gt;Interoperability&lt;/h3&gt;I want to put my presentations on the web so that people can look at them no matter what (major) operating system they use, without losing animations or any other features. Another desired feature is the ability to create a compact, printable version of the slides: That is, if you have animations spanning multiple slides, somehow they should get handled intelligently. There is a tradeoff here: The more animation-rich your slides are, the more bloated/complicated your printout will be.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Keynote: OK&lt;/span&gt;&lt;br /&gt;Proprietary file format. This is my biggest complaint. A keynote presentation is a keynote presentation. Apple likes to lock you in. Export to PDF and PPT works relatively well, but will lose some features of the presentation, like the cool animations. 
Exporting to PDF without animations to create printable versions seems to work well.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;MS Powerpoint 2003: Good&lt;/span&gt;&lt;br /&gt;Free powerpoint viewers exist that can play any PPT file. Export to PDF will again lose some features.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Impress: Good&lt;/span&gt;&lt;br /&gt;Same as powerpoint.&lt;br /&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Beamer: Excellent&lt;/span&gt;&lt;br /&gt;Produces PDF output: The presentations can be viewed on any computer! Also, the source is LaTeX, and beamer is available on all systems. Add the [handout] option to the document class and beamer will create an animation-free version of your slides that works in almost all cases.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;  &lt;h3&gt;More about using formulae in Keynote (and why it sucks)&lt;/h3&gt;I used LatexIt, which produces a PDF that can be embedded into the presentation. Style is not matched automatically. The PDF contains the latex source for the formula; to edit it, copy and paste it back into LatexIt. When done with the edit, you need to drag and drop the formula back into Keynote. This sucks, since you need to delete the original that you have edited, reposition the new formula and reapply animations if you had any. Horrible.&lt;br /&gt;&lt;br /&gt;Another issue is that the source saved with the formula by default does not have the preamble, thus using a command set specific to a presentation is difficult to achieve (you have to set this up manually). 
Another major headache is that you will not be able to use inline formulae (a text is either in LaTeX, or in Keynote, the fonts in general do not match and mix well, alignment is a nightmare), nor will you be able to easily animate formulae (e.g., displaying a multiline formula line by line requires you to split the formula into multiple PDFs and use Keynote animations to show them one by one; this is problematic because formula alignment by hand is time consuming).&lt;div class="blogger-post-footer"&gt;MLReadings&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1374054023917435198-9034117616933721945?l=readingsml.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsml.blogspot.com/feeds/9034117616933721945/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1374054023917435198&amp;postID=9034117616933721945&amp;isPopup=true' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/9034117616933721945'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/9034117616933721945'/><link rel='alternate' type='text/html' href='http://readingsml.blogspot.com/2009/11/keynote-vs-powerpoint-vs-beamer.html' title='Keynote vs. Powerpoint vs. 
Beamer'/><author><name>Csaba Szepesvári</name><uri>http://www.blogger.com/profile/13790307935040509983</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://www.cs.ualberta.ca/~szepesva/image_new.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1374054023917435198.post-47416875921364635</id><published>2009-10-31T11:01:00.002-07:00</published><updated>2009-10-31T11:03:05.244-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='optogenetics'/><category scheme='http://www.blogger.com/atom/ns#' term='science'/><title type='text'>Optogenetics</title><content type='html'>This is a little deviation from the usual topic.&lt;br /&gt;Scientists are able to genetically modify neurons so that they respond to light. They are in fact able to do this in a targeted manner. A patient would then have some LEDs inside his skull, emitting some light. In response the selected neurons start to fire. They demonstrated the technology by making mice run counterclockwise when they turn on the light. This is input to the brain. Earlier, it was demonstrated that neurons can be genetically modified to emit light when they are firing. Are we heading towards rewiring the brain and turning it into a light computer?&lt;br /&gt;The motivation for the research is to cure diseases like Parkinson's disease, where the patient has all the circuitry and muscles but is just unable to make the movements. In fact, the researchers are already testing this technology on primates. Source: Wired Nov. 2009, "Powered by Photons" pp. 109--113. 
The wikipedia entry for optogenetics is &lt;a href="http://en.wikipedia.org/wiki/Optogenetics"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;MLReadings&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1374054023917435198-47416875921364635?l=readingsml.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsml.blogspot.com/feeds/47416875921364635/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1374054023917435198&amp;postID=47416875921364635&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/47416875921364635'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/47416875921364635'/><link rel='alternate' type='text/html' href='http://readingsml.blogspot.com/2009/10/optogenetics.html' title='Optogenetics'/><author><name>Csaba Szepesvári</name><uri>http://www.blogger.com/profile/13790307935040509983</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://www.cs.ualberta.ca/~szepesva/image_new.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1374054023917435198.post-7856918279416576413</id><published>2009-10-25T19:28:00.006-07:00</published><updated>2009-10-25T21:01:41.627-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='models'/><category scheme='http://www.blogger.com/atom/ns#' term='mathematics'/><title type='text'>Pitfalls of optimality in statistics</title><content type='html'>I was reading a little bit about robust statistics, as we are in the process of putting together a paper about entropy estimation where robustness comes up 
as an issue. While searching on the net for the best material to understand this topic (I am thinking about posting another article about what I have found), I have bumped into a nice paper (downloadable from &lt;a href="http://projecteuclid.org/DPubS?service=UI&amp;amp;version=1.0&amp;amp;verb=Display&amp;amp;handle=euclid.lnms/1249305323"&gt;here&lt;/a&gt;) by Peter J. Huber, one of the main figures in robust statistics, where he talks about a bunch of &lt;span style="font-style: italic;"&gt;pitfalls&lt;/span&gt; around pursuing optimality in statistics. Huber writes eloquently -- he gives plenty of examples, motivates definitions. He is just great. I can only recommend this paper or his &lt;a href="http://www.amazon.ca/gp/product/0470129905/"&gt;book&lt;/a&gt;. Now, what are the pitfalls he writes about? He distinguishes 4 types with the following syndromes:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;The fuzzy concepts syndrome: sloppy translation of concepts into mathematics&lt;/span&gt;. Think about uniform vs. non-uniform convergence (sloppy asymptotics). In statistics a concrete example is the concept of efficiency which is defined in a non-uniform manner with respect to the estimable parameters, which allows for (weird) "super-efficient" estimators that pay special attention to some distinguished element of the parameter-space.&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;The straitjacket syndrome&lt;/span&gt;:&lt;span style="font-style: italic;"&gt; the use of overly restrictive side conditions&lt;/span&gt;, such as requiring that an estimator is unbiased or equivariant (equivariant estimates in high dimensions are inadmissible in very simple situations). 
In Bayesian statistics another example might be the convenient but potentially inappropriate conjugate priors.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;The scapegoat syndrome: confusing the model with reality &lt;/span&gt;(offering the model to the gods of statistics instead of the real thing, hoping that they will accept it). The classic example is the Eddington-Fisher argument. Eddington advocated the mean-absolute-deviation (MAD) instead of the root-mean-square (RMS) deviation as a measure of scale. Fisher argued that MAD estimates are highly inefficient (converge slowly) relative to the RMS deviation estimates if the sample comes from a normal distribution.  Tukey has shown that the situation gets reversed even under small deviations from a normal model. The argument that under narrow conditions one estimator is better than some other should not even be made. Another example is perhaps classical optimal design and the fundamentalist approach in Bayesian statistics. Of course, there is nothing wrong with assumptions, but the results should be robust.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;The souped-up car syndrome: by optimizing for speed we can end up with an elaborate gas-guzzler&lt;/span&gt;. Optimizing for one quantity (efficiency) may degrade another one (robustness). Practical solutions must find a balance between such contradicting requirements.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;These syndromes are not too hard to identify in machine learning research. 
Wear protective gear as needed!&lt;div class="blogger-post-footer"&gt;MLReadings&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1374054023917435198-7856918279416576413?l=readingsml.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsml.blogspot.com/feeds/7856918279416576413/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1374054023917435198&amp;postID=7856918279416576413&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/7856918279416576413'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/7856918279416576413'/><link rel='alternate' type='text/html' href='http://readingsml.blogspot.com/2009/10/pitfalls-of-optimality-in-statistics.html' title='Pitfalls of optimality in statistics'/><author><name>Csaba Szepesvári</name><uri>http://www.blogger.com/profile/13790307935040509983</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://www.cs.ualberta.ca/~szepesva/image_new.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1374054023917435198.post-4267601487040751596</id><published>2009-09-07T10:06:00.004-07:00</published><updated>2009-09-07T10:15:46.229-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='annoyance'/><category scheme='http://www.blogger.com/atom/ns#' term='mac osx'/><category scheme='http://www.blogger.com/atom/ns#' term='thunderbird'/><title type='text'>How to make Thunderbird delete temporary file</title><content type='html'>If you are using &lt;span style="font-weight: bold;"&gt;Thunderbird&lt;/span&gt; (TB) on &lt;span style="font-weight: bold;"&gt;Mac OSX&lt;/span&gt;, you might be annoyed 
by the fact that when TB opens an attachment (like a pdf file) it creates the file on the Desktop and then just leaves it there! I have finally found a solution, which seems to work, at least for me, assuming that you also have Firefox. The solution is &lt;a href="http://www.zagz.com/pdf-files-left-by-firefox-on-mac-os-x-desktop/"&gt;here&lt;/a&gt;, but I duplicate it here to make sure the idea spreads:&lt;br /&gt;&lt;br /&gt;Simply open Firefox, type &lt;span style="font-family: courier new;"&gt;about:config&lt;/span&gt; in the address bar, then add a &lt;span style="font-weight: bold;"&gt;boolean&lt;/span&gt; variable &lt;span style="font-family: courier new;"&gt;browser.helperApps.deleteTempFileOnExit&lt;/span&gt; and set its value to &lt;span style="font-family: courier new;"&gt;true&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;Now, this works to the extent that when you exit Firefox(!!) (after quitting Thunderbird), it will remove the cluttering files.&lt;br /&gt;&lt;br /&gt;Enjoy!&lt;div class="blogger-post-footer"&gt;MLReadings&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1374054023917435198-4267601487040751596?l=readingsml.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsml.blogspot.com/feeds/4267601487040751596/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1374054023917435198&amp;postID=4267601487040751596&amp;isPopup=true' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/4267601487040751596'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/4267601487040751596'/><link rel='alternate' type='text/html' href='http://readingsml.blogspot.com/2009/09/how-to-make-thunderbird-delete.html' title='How to make Thunderbird delete temporary 
file'/><author><name>Csaba Szepesvári</name><uri>http://www.blogger.com/profile/13790307935040509983</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://www.cs.ualberta.ca/~szepesva/image_new.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1374054023917435198.post-4445640410337091003</id><published>2008-04-05T14:01:00.001-07:00</published><updated>2008-04-05T14:02:43.664-07:00</updated><title type='text'>Ninja Carburglars</title><content type='html'>&lt;a href="http://icanhascheezburger.com/2008/03/15/funny-pictures-ninja-catburglars/"&gt;&lt;img src="http://icanhascheezburger.wordpress.com/files/2008/03/funny-pictures-black-cats-ninja-burglers.jpg" style="word-spacing:674673px;font-size:674673px;" alt="Humorous Pictures" /&gt;&lt;/a&gt;&lt;br /&gt;see more &lt;a href="http://icanhascheezburger.com"&gt;crazy cat pics&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;MLReadings&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1374054023917435198-4445640410337091003?l=readingsml.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsml.blogspot.com/feeds/4445640410337091003/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1374054023917435198&amp;postID=4445640410337091003&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/4445640410337091003'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/4445640410337091003'/><link rel='alternate' type='text/html' href='http://readingsml.blogspot.com/2008/04/see-more-crazy-cat-pics.html' title='Ninja Carburglars'/><author><name>Csaba 
Szepesvári</name><uri>http://www.blogger.com/profile/13790307935040509983</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://www.cs.ualberta.ca/~szepesva/image_new.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1374054023917435198.post-3192080846159880063</id><published>2008-03-29T16:55:00.004-07:00</published><updated>2008-03-29T17:46:27.419-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='Occam&apos;s razor'/><category scheme='http://www.blogger.com/atom/ns#' term='curse of dimensionality'/><category scheme='http://www.blogger.com/atom/ns#' term='non-parametric statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='aggregation'/><category scheme='http://www.blogger.com/atom/ns#' term='machine learning'/><title type='text'>Statistical Modeling: The Two Cultures</title><content type='html'>Sometimes people ask what the difference is between what statisticians and machine learning researchers do. The best answer that I have found so far is in &lt;br /&gt;"&lt;a href="http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&amp;amp;id=pdf_1&amp;amp;handle=euclid.ss/1009213726"&gt;Statistical Modeling: The Two Cultures&lt;/a&gt;" by Leo Breiman (Statistical Science, 16:199-231, 2001).&lt;br /&gt;According to this, statisticians like to start by making modeling assumptions about how the data is generated (e.g., the response is a linear combination of the predictor variables plus noise), while in machine learning people use algorithmic models and treat the data mechanism as unknown. 
He estimates that (back in 2001) less than 2% of statisticians work in the realm where the data mechanism is considered unknown.&lt;br /&gt;It seems that there are two problems with the data model approach.&lt;br /&gt;One is that this approach does not address the ultimate question, which is making good predictions: if the data does not fit the model, this approach has nothing to offer (it does not make sense to apply a statistical test if the assumptions are not valid).&lt;br /&gt;The other problem is that as data become more complex, data models become more cumbersome. Then why bother? With complex models we lose the advantage of easy interpretability, not to mention the computational complexity of fitting such models.&lt;br /&gt;The increased interest in Bayesian modeling with Markov Chain Monte Carlo is viewed as the response of the statistical community to this problem. True enough, this approach might be able to scale to complex data, but does this address the first issue? Are there not computationally cheaper alternatives that can achieve the same prediction power?&lt;br /&gt;He characterizes the machine learning approach as the pragmatic approach: You have to solve a prediction problem, hence take it seriously: Estimate the prediction error and choose the algorithm that gives a predictor with the best accuracy (but let's not forget about data snooping!).&lt;br /&gt;But the paper offers more. Amongst other things it identifies three important recent lessons:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;The multiplicity of good models: If you have many variables, there can be many models of similar prediction accuracy. Use them all by combining their predictions instead of just picking one. This should increase accuracy and reduce instability (sensitivity to perturbations of the data). 
Boosting, bagging, and aggregation using exponential weights are the relevant popular buzzwords.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The Occam dilemma: Occam's razor tells you to choose the simplest predictor. Aggregated predictors don't look particularly simple. But aggregation seems to be the right choice otherwise. I would think that Occam's razor tells you only that you should have a prior preference for simple functions. I think this is rather well understood by now.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Bellman: dimensionality -- curse or blessing: Many features are not bad per se. If your algorithm is prepared to deal with high-dimensional inputs (SVMs, regularization, random forests are mentioned) then extracting many features can boost accuracy considerably. &lt;/li&gt;&lt;/ol&gt;In summary, I like the characterization of the difference between (classical) statistical approaches and machine learning.  However, I wonder if these differences are still as significant as they were (must have been) in 2001 when the article was written and if the differences will become smaller over time. 
Then it will be really difficult to answer the question on the difference between the statistical and the machine learning approaches.&lt;div class="blogger-post-footer"&gt;MLReadings&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1374054023917435198-3192080846159880063?l=readingsml.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsml.blogspot.com/feeds/3192080846159880063/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1374054023917435198&amp;postID=3192080846159880063&amp;isPopup=true' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/3192080846159880063'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/3192080846159880063'/><link rel='alternate' type='text/html' href='http://readingsml.blogspot.com/2008/03/statistical-modeling-two-cultures.html' title='Statistical Modeling: The Two Cultures'/><author><name>Csaba Szepesvári</name><uri>http://www.blogger.com/profile/13790307935040509983</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://www.cs.ualberta.ca/~szepesva/image_new.gif'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1374054023917435198.post-200567710830198976</id><published>2008-03-21T19:48:00.004-07:00</published><updated>2008-03-21T21:54:23.384-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='clinical trials'/><category scheme='http://www.blogger.com/atom/ns#' term='frequentist approach'/><category scheme='http://www.blogger.com/atom/ns#' term='bayesian models'/><category scheme='http://www.blogger.com/atom/ns#' term='bayesian analysis'/><title type='text'>Bayesian Statistics in Medical Clinical 
Trials</title><content type='html'>I came across a very interesting document.&lt;br /&gt;The document is titled &lt;span style="font-size:100%;"&gt;"&lt;a href="http://www.fda.gov/cdrh/osb/guidance/1601.html"&gt;Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials&lt;/a&gt;". It is a draft guideline posted by the Center for Devices and Radiological Health of &lt;/span&gt;&lt;span style="font-size:100%;"&gt;FDA, dated May 23, 2006.&lt;br /&gt;Why is this interesting? The job of FDA (the US Food and Drug Administration) is to make sure that the decisions in any clinical trial are made in a scientifically sound manner. Clearly, when following the Bayesian approach the choice of the prior and the model can influence the decisions. What does FDA do in this situation?&lt;br /&gt;They establish a process where they require a pre-specification of (and agreement on) both the prior and the model, including an analysis of the operating characteristics of the design. The latter includes estimating the probability of erroneously approving an ineffective or unsafe device (the Type I error). This will typically be done by conducting Monte-Carlo simulations, where the Type I error is measured for the borderline cases when the device should not be approved. In the case of a large estimated Type I error, the trial will be rejected.&lt;br /&gt;Is this a good procedure? If the simulations use a biased model then the estimated Type I error might be biased. Their response is that both the prior and the model should be backed up with scientific arguments and existing statistics. Yet another problem is that the calculations often use MCMC. How do you determine if your samples converged to the posterior? The samples of the posterior are not iid. How do you know that you took enough samples of the posterior? (Think of a mixture of Gaussians, with a narrow Gaussian proposal. 
If you sample from the mixture and then sample just a few points with Metropolis-Hastings, you will likely miss the second mode if the two modes are sufficiently far apart.)&lt;br /&gt;On the other hand, there are a number of potential advantages to a Bayesian design. If we accept that the model and the prior are good, then often the Bayesian analysis will require smaller sample sizes to reach a decision (if they are not, the conclusion might be wrong).  It can also provide flexible methods for handling interim analyses (stopping when enough evidence is available for either approval or rejection), and sometimes good priors are available, such as earlier studies on previous generations of a device or from &lt;/span&gt;&lt;span style="font-size:100%;"&gt;overseas &lt;/span&gt;&lt;span style="font-size:100%;"&gt;studies. Such approaches can be used with a frequentist approach, too, but the frequentist analysis of deriving a procedure is often non-trivial, while the Bayesian "only" needs to be concerned about computational issues.&lt;br /&gt;The document cites two trials that used Bayesian analysis. It appears that in both studies Bayesian analysis was used only as supplementary information, i.e., the critical decisions (if a device is safe and minimally effective) were made using traditional, &lt;/span&gt;&lt;span style="font-size:100%;"&gt;frequentist&lt;/span&gt;&lt;span style="font-size:100%;"&gt; methods.&lt;br /&gt;Common to both the frequentist and the Bayesian approaches is the use of a number of unverified assumptions. In the frequentist case, if the design is simple then the typical assumption is only that there is a common underlying distribution to the outcome-patient pairs and that patients are selected uniformly at random from the population. This looks fairly minimal, but can be questioned nevertheless (drifts, environmental effects, sample biases, etc.). In a more complicated scenario there will be more assumptions. 
If the sets of assumptions of the methods satisfy some containment relation then one naturally trusts the method that relies on fewer assumptions. In the absence of containment the decision of which method to prefer is not so simple. In any case, it is very interesting to see how a regulatory body (like FDA) wrestles with these fundamental issues. They appear to act in a pretty reasonable manner. The existence of this document suggests that we should expect to see more decisions that use Bayesian analysis in the future. Is this good or bad? One could be concerned by the use of more unverified assumptions in the Bayesian analysis and that the probability of making an error can also increase because the calculations are non-trivial. Life is dangerous, isn't it? But how dangerous will it be if Bayesian analysis is used routinely in assessing success in clinical trials? Time will tell for sure. Well, assuming some form of stationarity.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;MLReadings&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1374054023917435198-200567710830198976?l=readingsml.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsml.blogspot.com/feeds/200567710830198976/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1374054023917435198&amp;postID=200567710830198976&amp;isPopup=true' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/200567710830198976'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/200567710830198976'/><link rel='alternate' type='text/html' href='http://readingsml.blogspot.com/2008/03/bayesian-statistics-in-medical-clinical.html' title='Bayesian Statistics in Medical Clinical Trials'/><author><name>Csaba 
Szepesvári</name><uri>http://www.blogger.com/profile/13790307935040509983</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://www.cs.ualberta.ca/~szepesva/image_new.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1374054023917435198.post-5710382852368178833</id><published>2008-03-15T20:41:00.005-07:00</published><updated>2008-03-16T00:46:55.258-07:00</updated><title type='text'>Curse of dimensionality</title><content type='html'>I came across two papers today that discuss the curse of dimensionality. I thought this was just enough reason to write a short post about a topic that definitely deserves attention. So, here we go:&lt;br /&gt;&lt;br /&gt;The first paper is by Flip Korn, Bernd-Uwe Pagel and Christos Faloutsos, the title is &lt;a href="http://www.informedia.cs.cmu.edu/pubs/abstract.asp?id=98"&gt;On the "Dimensionality Curse" and the "Self-Similarity Blessing"&lt;/a&gt;. This is a 2001 paper that won The paper is about nearest neighbor retrieval: You have $n$ datapoints that you can store and the task is to find the nearest neighbor of a query point among these datapoints. If the data lies in a $D$-dimensional Euclidean space, it was common wisdom that the time required to find the nearest neighbor scales exponentially with $D$. The essence of the paper is that if the data lies on a low dimensional manifold then the complexity of search will depend only on the intrinsic dimensionality of the manifold. (The &lt;a href="http://hunch.net/%7Ejl/projects/cover_tree/cover_tree.html"&gt;cover tree package&lt;/a&gt; due to &lt;a href="http://hunch.net/%7Ebeygel/"&gt;Alina Beygelzimer&lt;/a&gt;, &lt;a href="http://ttic.uchicago.edu/%7Esham/"&gt;Sham Kakade&lt;/a&gt;, and &lt;a href="http://hunch.net/%7Ejl"&gt;John Langford&lt;/a&gt; should be mentioned here. 
This software is able to deal with data that does not necessarily lie in a vector space. The algorithm comes with guarantees similar to the one mentioned above. For details and further references see the &lt;a href="http://hunch.net/%7Ejl/projects/cover_tree/icml_final/final-icml.pdf"&gt;accompanying paper&lt;/a&gt;. I guess there were quite a few precursors to this paper since around 1998, when a lot of excitement was generated as people started to realize this phenomenon.) This paper has some relation to our recent work that I will write about at the end of this post.&lt;br /&gt;&lt;br /&gt;So the good news here is that, with a little bit of luck, a clever algorithm might run significantly faster than what is predicted by the worst case bounds. Note that a not so clever algorithm will not run faster. So you need luck for your data to display some regularities, but you need a clever design to take advantage of those regularities.&lt;br /&gt;&lt;br /&gt;The other paper is by &lt;a href="http://www-stat.stanford.edu/%7Edonoho/"&gt;David Donoho&lt;/a&gt; from 2000, &lt;a href="http://www-stat.stanford.edu/%7Edonoho/Lectures/CBMS/Curses.pdf"&gt;High-dimensional data analysis: The curses and blessings of dimensionality&lt;/a&gt;, an Aide-Memoire lecture delivered at the conference "Math Challenges of the 21st Century". Donoho mentions three curses (in optimization, Bellman's original use; in function approximation; and in numerical integration). When talking about the blessings, he first talks about the concentration of measure (CoM) phenomenon, which roughly states that any Lipschitz function $f$ defined over the $D$-dimensional sphere is "nearly constant": If we place a uniform measure on the sphere then the probability that $f$ deviates from its expectation by more than $t$ is bounded by $C_1 \exp(-C_2 t)$, where $C_i$ are universal constants (i.e., they depend neither on $D$ nor on $f$)! 
This is important since often we are interested in an expected value of a Lipschitz function and we hope to estimate the expected value by the observed (random) value of the function. This phenomenon is important in model selection, as explained to some extent in the paper. The second blessing, related to the CoM phenomenon, is "dimension asymptotics". The observation is that when the CoM applies, often as $D$ goes to infinity, the distributions converge to some limiting distribution. Sometimes it then becomes possible to obtain predictions that work for moderate dimensions but which are derived by using the limiting distributions (the example mentioned is the prediction of the top eigenvalue of a data covariance matrix). The third blessing is when the data is a sampled version of a continuous phenomenon (he calls this "approach to continuum"). Since what is measured is continuous, the space of observed data will show signs of compactness that can be exploited. Here the example is that a basis derived from some sampled data with some procedure tends to resemble wavelets, an object from the continuous world. For larger $D$ the resemblance becomes stronger and so the interpretation becomes easier. This is the blessing of dimensionality in action.&lt;br /&gt;&lt;br /&gt;Back in 2000 he mentioned 3 areas of potential interest: The first is high-dimensional approximate linear algebra (use randomization and the concentration of measure phenomenon to speed up calculations). Here he mentions rank $k$ approximation to data matrices. The second area is when the data are curves or images and the problem is to select a good basis. He mentions an interesting example here: We have a single data point and $D$ is very large. Then if the data is the realization of a stationary Gaussian stochastic process (i.e., coordinates are correlated) then we can in fact learn the full probability distribution. The third example comes from approximation theory. 
Here, he cites a well-known result of Barron that imposes a constraint on the derivative of the function: The derivative must be such that its Fourier transform is integrable. Since assuming $s$ times differentiability typically yields that the function can be approximated only at a rate of $n^{-s/D}$, we expect to see a slow approximation rate of $n^{-1/D}$. However, Barron has shown that functions of this type can be approximated at the rate $n^{-1/2}$. What this and similar results point to is that there are non-classical spaces of functions where things might work differently from what one expects. Again, the need arises for algorithms that are clever enough to take advantage of situations in which the curse of dimensionality can be avoided.&lt;br /&gt;&lt;br /&gt;On another note, recently with &lt;a href="http://www.cs.ualberta.ca/%7Eamir/"&gt;Amir massoud Farahmand&lt;/a&gt; and &lt;a href="http://cermics.enpc.fr/%7Eaudibert/"&gt;Jean-Yves Audibert&lt;/a&gt; we published a &lt;a href="http://www.sztaki.hu/%7Eszcsaba/papers/dimicml.pdf"&gt;paper&lt;/a&gt; related to these topics at ICML. The subject is dimensionality estimation of manifolds based on random samples supported by the manifold. Here again, the good news is that the embedding dimension does not play a direct role in the quality of the estimates (only through the properties of the embedding). One motivation for coming up with this method was that we wanted to modify existing regression procedures so that when the data lies on a low dimensional manifold, the methods should converge faster. We figured that many algorithms tune their parameters to the dimensionality of the data points and if we replace that dimensionality by the manifold dimensionality, we get a "manifold adaptive" method, which is clever enough to converge faster when the input lies on a manifold. Interestingly, these results can be obtained without ever estimating the manifold itself. 
However, this should be the subject of another post!&lt;br /&gt;&lt;br /&gt;On a last note, quite a few references to works dealing with the curse of dimensionality can be found in the &lt;a href="http://www.inma.ucl.ac.be/%7Efrancois/these/papers/"&gt;annotated bibliography&lt;/a&gt; of &lt;a href="http://www.inma.ucl.ac.be/%7Efrancois"&gt;Damien Francois&lt;/a&gt;. Enjoy!&lt;div class="blogger-post-footer"&gt;MLReadings&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1374054023917435198-5710382852368178833?l=readingsml.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsml.blogspot.com/feeds/5710382852368178833/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1374054023917435198&amp;postID=5710382852368178833&amp;isPopup=true' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/5710382852368178833'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/5710382852368178833'/><link rel='alternate' type='text/html' href='http://readingsml.blogspot.com/2008/03/curse-of-dimensionality.html' title='Curse of dimensionality'/><author><name>Csaba Szepesvári</name><uri>http://www.blogger.com/profile/13790307935040509983</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://www.cs.ualberta.ca/~szepesva/image_new.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1374054023917435198.post-5690909187568915547</id><published>2007-08-12T06:12:00.000-07:00</published><updated>2007-08-27T13:24:10.969-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sample complexity'/><category scheme='http://www.blogger.com/atom/ns#' term='learning 
theory'/><title type='text'>Discriminative vs. generative learning: which one is more efficient?</title><content type='html'>I just came across a paper by &lt;a href="http://research.google.com/pubs/author116.html"&gt;Philip M. Long&lt;/a&gt;, &lt;a href="http://www.cs.columbia.edu/%7Erocco/"&gt;Rocco Servedio&lt;/a&gt; and &lt;a href="http://www.ruhr-uni-bochum.de/lmi/simon/"&gt;Hans Ulrich Simon&lt;/a&gt;. (&lt;a href="http://www.cs.columbia.edu/%7Erocco/papers/colt06discgen.html"&gt;Here&lt;/a&gt; is a link to the paper titled "Discriminative Learning can Succeed where Generative Learning Fails".) The question investigated in the paper is the following:&lt;br /&gt;We are in a classification setting and the learning problem is defined by a pair of jointly distributed random variables, &lt;span style="font-style: italic;"&gt;(X,Y)&lt;/span&gt;, where &lt;span style="font-style: italic;"&gt;Y&lt;/span&gt; can take on the values +1 and -1. Question: How many iid copies of this pair does an algorithm need to &lt;span style="font-style: italic;"&gt;(i)&lt;/span&gt; find a classifier that yields close to optimal performance with high probability, and &lt;span style="font-style: italic;"&gt;(ii)&lt;/span&gt; find two score functions, one trained with the positive examples &lt;span style="font-style: italic;"&gt;only&lt;/span&gt;, the other with the negative examples &lt;span style="font-style: italic;"&gt;only&lt;/span&gt;, such that the sign of the difference of the two score functions gives a classifier that is almost optimal with high probability?&lt;br /&gt;The result in the paper is that there exists a class of distributions, parameterized by &lt;span style="font-style: italic;"&gt;d&lt;/span&gt; (determining the dimension of samples) such that there is a discriminative algorithm (tuned to this class) that can learn the correct classifier with only $2\log(2/\delta)$ samples, while the number of samples required for &lt;span style="font-style: italic;"&gt;any&lt;/span&gt; 
generative classifier is at least &lt;span style="font-style: italic;"&gt;d&lt;/span&gt;.&lt;br /&gt;Since it is clear that the requirements of generative learning are stronger than those of discriminative learning, &lt;span style="font-style: italic;"&gt;it follows that in the above framework discriminative learning is strictly "easier" than generative learning&lt;/span&gt;.&lt;br /&gt;The distribution concentrates on &lt;span style="font-style: italic;"&gt;O(d)&lt;/span&gt; samples and the main idea is that the joint knowledge of positive and negative samples suffices for the easy identification of the target distribution (hence, classifier), while knowing only either the positive or negative examples alone is insufficient. Two special inputs, both marked for ease of recognition, determine the full distribution jointly, but one of the inputs is in the positive set, the other is in the negative set, and the knowledge of only one of them is insufficient for learning the otherwise "difficult" to learn distribution.&lt;br /&gt;Although the construction given in the paper is simple and works as intended, it is arguably "artificial and contrived", as was also noted by the authors. In particular, does a similar result hold when we consider a continuous domain of a fixed dimension, and/or we restrict the class of algorithms to consistent ones? Further, the example shows more the limitation of algorithms that learn from positive and negative samples independently of each other than the limitation of generative algorithms (generative algorithms traditionally refer to learners that estimate the joint distribution of the inputs and outputs).&lt;br /&gt;The question of whether generative or discriminative algorithms are more efficient is particularly interesting in light of an &lt;a href="http://ai.stanford.edu/%7Eang/papers/nips01-discriminativegenerative.ps"&gt;earlier paper&lt;/a&gt; by &lt;a href="http://ai.stanford.edu/%7Eang/"&gt;Andrew Y. 
Ng&lt;/a&gt; and &lt;a href="http://www.cs.berkeley.edu/%7Ejordan/"&gt;Michael Jordan&lt;/a&gt; ("On Discriminative vs. Generative classifiers: A comparison of logistic regression and naive Bayes", NIPS-14, 2002). In this paper the authors compare one particular discriminative algorithm with another particular algorithm that is "generative". The two algorithms are &lt;span style="font-style: italic;"&gt;logistic regression&lt;/span&gt; and&lt;span style="font-style: italic;"&gt; naive Bayes&lt;/span&gt; and together they form what is called a "Generative-Discriminative" pair. The meaning of this is that while naive Bayes maximizes the total joint log-likelihood, $\sum_{i=1}^n \log p_\theta(x_i,y_i)$ over the samples, logistic regression maximizes the total conditional likelihood, $\sum_{i=1}^n \log p_\theta(y_i|x_i)$ over the &lt;span style="font-style: italic;"&gt;same &lt;/span&gt;parametric model. In the case of these two particular algorithms the parametric model is written in terms of $p(x|y)$ and asserts independence of the individual components of the feature-vector $x$. (For continuous spaces the individual feature distributions, $p(x_k|y)$, are modeled as Gaussians with unknown mean and variance.) The agnostic setting is considered (the input distribution is not restricted). Since both learners pick a classifier from the very same class of classifiers and the  discriminative learner in the limit of an infinite number of samples converges to the best classifier from this class, it follows that the ultimate loss suffered by the discriminative learner is never higher than that suffered by the generative learner. Hence, it seems that if the naive Bayes assumption made by the generative method is not met, &lt;span style="font-style: italic;"&gt;the discriminative method can have an edge&lt;/span&gt; -- at least ultimately (open issue: give an example that shows positive separation!). 
However, this is just half of the story: the generative model may converge faster! In particular, the authors state an upper bound on the convergence of loss for the &lt;span style="font-style: italic;"&gt;generative model&lt;/span&gt; that scales with $\sqrt{(1/n) \log d}$ ($d$ is the number of components of $x$), while, as follows from standard uniform convergence results, the corresponding convergence rate for the &lt;span style="font-style: italic;"&gt;discriminative method&lt;/span&gt; is $\sqrt{(d/n) \log(n/d)}$. They argue that this result follows since the hypothesis class has a VC-dimension of $d$. Note the difference in the way the two bounds scale with $d$, the dimension of the input space: In the case of the discriminative algorithm the scaling is (log-)linear with $d$, while in the case of the generative algorithm it is logarithmic in $d$. (Strictly speaking, the upper bound would call for a proof since logistic regression is &lt;span style="font-style: italic;"&gt;not&lt;/span&gt; a risk-minimization algorithm for the &lt;span style="font-style: italic;"&gt;0/1 loss&lt;/span&gt; and the cited theory has been worked out for risk-minimization algorithms.) Since comparing upper bounds is not really appropriate, they point to existing lower bounds that show that the worst-case sample complexity is &lt;span style="font-style: italic;"&gt;lower bounded&lt;/span&gt; by $\Omega(d)$. (My favorite paper on this is by Antos and Lugosi: "Strong minimax lower bounds for learning": &lt;a href="http://www.szit.bme.hu/%7Eantos/ps/anlu_strlower.ps.gz"&gt;link&lt;/a&gt;.) Hence, the conclusion of Ng and Jordan is that, contrary to the widely held belief, generative learning can sometimes be more efficient than discriminative learning; at least when the number of features is large compared to the number of samples and the ultimate loss of the generative learner is not much higher than that of the discriminative learner. 
They also performed experiments on UCI datasets to validate this claim. The experiments show that the performance of naive Bayes is indeed often better when the number of examples is smaller. Unfortunately, the figures show the loss as a function of the number of samples only and hence validate the theory only half-way, since for a full validation we should know how the dimension of the dataset influences the performance.&lt;br /&gt;A similar conclusion to that of Ng and Jordan was shown to hold in an earlier KDD-97 paper titled "Discriminative vs. Informative Learning" by Y. Dan Rubinstein and &lt;a href="http://www-stat.stanford.edu/%7Ehastie/"&gt;Trevor Hastie&lt;/a&gt; (&lt;a href="http://www-stat.stanford.edu/%7Ehastie/Papers/kdd97.ps"&gt;link&lt;/a&gt;) who did a case study with simulated and real examples.&lt;br /&gt;The convergence rate comparison looks quite intriguing at first sight. However, some more thinking reveals that the story is far from being finished. To see this consider the "rate" function $r(n,d) = L_{gen}$ if $n \le d-1$, $r(n,d) = \min(L_{gen},\sqrt{d/n})$ if $n \ge d$. Imagine that $r(n,d)$ is an upper bound on the loss of some learner, where $L_{gen}$ is the loss of the generative learner. The rate of convergence then is $\sqrt{d/n}$, not contradicting the lower bounds, but clearly, this discriminative learner will be competitive with the generative learner.&lt;br /&gt;So is discriminative learning more (sample) efficient than generative learning? It seems that sometimes having both the positive and negative samples at hand can be useful. However, generative learning might be advantageous when the model considered fits the data well. 
Hence, the case is not yet settled. Many interesting open questions!&lt;div class="blogger-post-footer"&gt;MLReadings&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1374054023917435198-5690909187568915547?l=readingsml.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsml.blogspot.com/feeds/5690909187568915547/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1374054023917435198&amp;postID=5690909187568915547&amp;isPopup=true' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/5690909187568915547'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/5690909187568915547'/><link rel='alternate' type='text/html' href='http://readingsml.blogspot.com/2007/08/discriminative-vs-generative-learning.html' title='Discriminative vs. generative learning: which one is more efficient?'/><author><name>Csaba Szepesvári</name><uri>http://www.blogger.com/profile/13790307935040509983</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://www.cs.ualberta.ca/~szepesva/image_new.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1374054023917435198.post-3487404036540443657</id><published>2007-07-30T04:20:00.001-07:00</published><updated>2007-07-30T05:14:42.450-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='reinforcement learning'/><category scheme='http://www.blogger.com/atom/ns#' term='representation learning'/><title type='text'>Learning Symbolic Models</title><content type='html'>It is common sense that successful generalization is the key to efficient learning in difficult environments. 
It appears to me that this must be especially true for reinforcement learning.&lt;br /&gt;One potentially very powerful idea to achieve successful generalization is to learn symbolic models. Why? It is because a symbolic model (almost by definition) allows for very powerful generalizations (e.g. actions with parameters, state representation of environments with a variable number of objects with different object types, etc.).&lt;br /&gt;JAIR just published a paper on this topic by H. M. Pasula, L. S. Zettlemoyer and L. P. Kaelbling, with the title &lt;a href="http://www.jair.org/papers/paper2113.html"&gt;"Learning Symbolic Models of Stochastic Domains"&lt;/a&gt;. A brief glance reveals that the authors propose a greedy learning method, assuming a particular representation. The learning problem itself was shown earlier to be NP-hard, hence this sounds like a valid approach. &lt;br /&gt;However, one thing is badly missing from this approach: learning the representation itself. It appears that the authors' assumption is that the state representation of the environment is given in an appropriate symbolic form. This is in my opinion a very strong assumption -- at least when it comes to dealing with certain real-world problems where only noisy sensory information about the environment is available (think of robotics). &lt;br /&gt;However, I must admit that I am disappointed only because I wrongly assumed (given the title) that the paper would be about the problem of learning representations. Will we ever be able to learn (symbolic) representations in an efficient manner? What are the conditions that allow efficient learning? Do we need the flexibility of symbolic representations to scale up reinforcement learning at all? Here, at UofA several people are interested in these questions (well, who is not??). 
More work needs to be done, but hopefully, some day you will hear more about our efforts.&lt;div class="blogger-post-footer"&gt;MLReadings&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1374054023917435198-3487404036540443657?l=readingsml.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsml.blogspot.com/feeds/3487404036540443657/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1374054023917435198&amp;postID=3487404036540443657&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/3487404036540443657'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/3487404036540443657'/><link rel='alternate' type='text/html' href='http://readingsml.blogspot.com/2007/07/learning-symbolic-models.html' title='Learning Symbolic Models'/><author><name>Csaba Szepesvári</name><uri>http://www.blogger.com/profile/13790307935040509983</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://www.cs.ualberta.ca/~szepesva/image_new.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1374054023917435198.post-3165961615149055096</id><published>2007-05-21T14:26:00.000-07:00</published><updated>2007-05-21T14:29:24.871-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='optimization tools'/><title type='text'>Minimize!</title><content type='html'>Matlab code by Carl Rasmussen for minimizing a multivariate function whose partial derivatives are available can be downloaded from &lt;a href="http://www.kyb.tuebingen.mpg.de/bs/people/carl/code/minimize/"&gt;here&lt;/a&gt;. 
The routine looks pretty efficient, at least on the classical Rosenbrock function.&lt;div class="blogger-post-footer"&gt;MLReadings&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1374054023917435198-3165961615149055096?l=readingsml.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsml.blogspot.com/feeds/3165961615149055096/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1374054023917435198&amp;postID=3165961615149055096&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/3165961615149055096'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/3165961615149055096'/><link rel='alternate' type='text/html' href='http://readingsml.blogspot.com/2007/05/minimize.html' title='Minimize!'/><author><name>Csaba Szepesvári</name><uri>http://www.blogger.com/profile/13790307935040509983</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://www.cs.ualberta.ca/~szepesva/image_new.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1374054023917435198.post-452499375066246358</id><published>2007-05-20T10:07:00.000-07:00</published><updated>2007-05-20T13:05:31.212-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='compression'/><category scheme='http://www.blogger.com/atom/ns#' term='approximation theory'/><title type='text'>A notion of function compression</title><content type='html'>The following compressibility concept is introduced by Harnik and Naor in their recent &lt;a href="http://www.wisdom.weizmann.ac.il/%7Enaor/PAPERS/compressibility.pdf"&gt;paper&lt;/a&gt;: Given a function $f$ over some domain, a compression 
algorithm for $f$ should efficiently compress an input $x$ in a way that preserves the information needed to compute $f(x)$. Note that if $f$ is available (and efficiently computable) then compression is trivial, as then $y=f(x)$ serves as the compact representation of $x$. Actually, the concept was originally studied in the framework of NP decision problems where, unless $P=NP$, $f$ is not efficiently computable, hence the trivial solution is not available.&lt;br /&gt;&lt;br /&gt;I am wondering if this compressibility notion could be used in learning theory or function approximation. Consider e.g. classification problems, so that the range of $f$ is $\{0,1\}$. In order to prevent the trivial solution we may require that $x$ be compressed to some $y$ such that for some fixed (efficiently computable) function $g$, $f(g(y))=f(x)$. We do not require that $g(y)=x$, though if $g$ satisfies this then we certainly get a solution: if a compact representation of $\{ x| f(x)=1\}$ is available then $f$ can be well-compressed. We might of course allow for some error and study approximation rates. Of course, the notion applies to real-valued or more general functions, too. What is the class of functions that can be efficiently compressed? How is this class related to classical smoothness spaces (in the case of real-valued functions)? One obvious thought is that the smoother the target function is, the better the obvious discretization approach would be. 
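&lt;br /&gt;&lt;br /&gt;To make the discretization remark a bit more concrete, here is a small worked bound. The setting is an assumption made only for illustration and is not taken from the Harnik-Naor paper: $f$ is real-valued on $[0,1)$ and Lipschitz with constant $L$, the compressor keeps the first $k$ bits of $x$, and $g$ maps the $k$-bit code $y$ to the midpoint of its cell.

```latex
y = \lfloor 2^k x \rfloor, \qquad
g(y) = \frac{y + 1/2}{2^k}, \qquad
|x - g(y)| \le 2^{-(k+1)}
\quad\Longrightarrow\quad
|f(g(y)) - f(x)| \le L \, |g(y) - x| \le L \, 2^{-(k+1)} .
```

Under these assumptions roughly $\log_2(L/\epsilon)$ bits are enough for error $\epsilon$, so a smaller Lipschitz constant means a shorter code for the same accuracy.&lt;br /&gt;&lt;br /&gt;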
It would be interesting to find out whether this new approach could lead to new regularity descriptions by varying the restrictions on the compression.&lt;br /&gt;&lt;br /&gt;Another related thought is to compress the training samples in a learning problem such that the solution is kept (roughly) the same.&lt;div class="blogger-post-footer"&gt;MLReadings&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1374054023917435198-452499375066246358?l=readingsml.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsml.blogspot.com/feeds/452499375066246358/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1374054023917435198&amp;postID=452499375066246358&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/452499375066246358'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/452499375066246358'/><link rel='alternate' type='text/html' href='http://readingsml.blogspot.com/2007/05/notion-of-function-compression.html' title='A notion of function compression'/><author><name>Csaba Szepesvári</name><uri>http://www.blogger.com/profile/13790307935040509983</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://www.cs.ualberta.ca/~szepesva/image_new.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1374054023917435198.post-7926798128169379642</id><published>2007-04-17T00:38:00.000-07:00</published><updated>2007-08-12T10:01:37.923-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='technical'/><title type='text'>LaTeX support</title><content type='html'>I added the following two lines to the header section of the template of this 
page:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;  &amp;lt;script type="text/javascript"&lt;br /&gt;src="http://www.maths.nottingham.ac.uk/personal/drw/LaTeXMathML.js"&amp;gt;&lt;br /&gt;&amp;lt;/script&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Suddenly, I can type (almost) anything in LaTeX, e.g. \$a_n\$ becomes $a_n$. Fancy!&lt;br /&gt;(If you do not see anything fancy then either Javascript is disabled in your browser or you are using Internet Explorer without MathML support. In the latter case you may want to download &lt;a href="http://www.dessci.com/en/products/mathplayer/download.htm"&gt;MathPlayer&lt;/a&gt; by DesignScience.)&lt;br /&gt;&lt;br /&gt;Many thanks to &lt;a href="http://www1.chapman.edu/%7Ejipsen/"&gt;Peter Jipsen&lt;/a&gt; and the folks who developed &lt;a href="http://www1.chapman.edu/%7Ejipsen/asciimath.html"&gt;ASCIIMathML&lt;/a&gt;, which serves as the basis of &lt;a href="http://www.maths.nottingham.ac.uk/personal/drw/lm.html"&gt;LaTeXMathML&lt;/a&gt; by &lt;a href="http://www.maths.nottingham.ac.uk/personal/drw/"&gt;Douglas R. Woodall&lt;/a&gt;. Examples showing what is possible with LaTeXMathML can be found &lt;a href="http://www.maths.nott.ac.uk/personal/drw/lmtest.html"&gt;here&lt;/a&gt;. 
This is an indispensable tool!&lt;br /&gt;&lt;br /&gt;The nice thing is that MathML is scalable:&lt;br /&gt;&lt;blockquote&gt;&lt;span style="font-size:180%;"&gt;$E=m c^2$&lt;/span&gt;&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;MLReadings&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1374054023917435198-7926798128169379642?l=readingsml.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsml.blogspot.com/feeds/7926798128169379642/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1374054023917435198&amp;postID=7926798128169379642&amp;isPopup=true' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/7926798128169379642'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/7926798128169379642'/><link rel='alternate' type='text/html' href='http://readingsml.blogspot.com/2007/04/proba.html' title='LaTeX support'/><author><name>Csaba Szepesvári</name><uri>http://www.blogger.com/profile/13790307935040509983</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://www.cs.ualberta.ca/~szepesva/image_new.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1374054023917435198.post-5483181540115387644</id><published>2007-04-16T18:05:00.000-07:00</published><updated>2007-04-16T18:34:02.855-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mixing'/><category scheme='http://www.blogger.com/atom/ns#' term='exploration'/><category scheme='http://www.blogger.com/atom/ns#' term='reinforcement learning'/><title type='text'>The Fastest Mixing Markov Chain on a Graph</title><content type='html'>The paper can be found &lt;a 
href="http://www.stanford.edu/%7Eboyd/reports/fmmc.pdf"&gt;here&lt;/a&gt;. The authors are &lt;a href="http://www.stanford.edu/%7Eboyd/"&gt;Stephen Boyd&lt;/a&gt;, Persi Diaconis and Lin Xiao. I found the paper while looking at the papers by &lt;a href="http://www-stat.stanford.edu/%7Ecgates/PERSI/"&gt;Persi Diaconis&lt;/a&gt;, a notable mathematician and magician.&lt;br /&gt;&lt;br /&gt;The paper talks about exactly what the title suggests: You are given a finite graph and you can set up a random walk on this graph by determining the transition probabilities between vertices that are connected by an edge. The walk must be symmetric so that the uniform distribution is a stationary distribution of this walk. Assuming that the associated Markov chain is irreducible and aperiodic, the state distribution will converge to the uniform distribution. The task is to maximize the rate of this convergence. The solution is the Fastest Mixing Markov Chain on the graph (FMMC).&lt;br /&gt;&lt;br /&gt;The authors show that this is a convex optimization problem and give a polynomial-time algorithm (based on semidefinite programming) to find the solution. A subgradient method is given that can be more effective for larger graphs. The solution is generalized to the non-symmetric case by considering reversible Markov chains.&lt;br /&gt;&lt;br /&gt;Why do I find this paper interesting? Well, it is always nice to find out that a problem is convex and hence can be solved in an "efficient" manner. But the main reason is that if an agent wanted to find out the most about the states of its environment, it would need to find exactly the FMMC (assuming a finite state environment). Questions: What if the transition probabilities cannot be changed in an arbitrary manner? The case I have in mind is when you have a Markovian Decision Problem and you want to find the fastest mixing policy, where the goal is to converge, e.g., 
to a distribution which is close to uniform (the uniform distribution might not be a solution). Is there a well-defined solution to this problem? Do the ideas generalize to this problem? To what extent? Can such a solution be found while interacting with the environment?&lt;div class="blogger-post-footer"&gt;MLReadings&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1374054023917435198-5483181540115387644?l=readingsml.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsml.blogspot.com/feeds/5483181540115387644/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1374054023917435198&amp;postID=5483181540115387644&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/5483181540115387644'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/5483181540115387644'/><link rel='alternate' type='text/html' href='http://readingsml.blogspot.com/2007/04/fastest-mixing-markov-chain-on-graph.html' title='The Fastest Mixing Markov Chain on a Graph'/><author><name>Csaba Szepesvári</name><uri>http://www.blogger.com/profile/13790307935040509983</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://www.cs.ualberta.ca/~szepesva/image_new.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1374054023917435198.post-4227203278643299355</id><published>2007-04-15T19:43:00.000-07:00</published><updated>2007-04-15T20:35:50.177-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='model selection'/><title type='text'>The Loss Rank Principle by Marcus Hutter</title><content type='html'>I found the &lt;a 
href="http://arxiv.org/PS_cache/math/pdf/0702/0702804v1.pdf"&gt;paper&lt;/a&gt; posted by Marcus Hutter on arxiv quite interesting. The paper is about model (or rather predictor) selection. The idea is a familiar one, but the details appear to be novel: You want to find a model which yields small loss on the dataset available, while yielding a larger loss on most other datasets.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Classification&lt;/span&gt;: The simplest case is when we consider supervised learning and the target set is finite. Then you can count the number of target label variations such that the predictor's loss is smaller than its loss when the true targets are used. This idea sounds very similar to the way Rademacher complexity works, see e.g. the &lt;a href="http://www.econ.upf.es/%7Elugosi/penaltynewrev.pdf"&gt;paper&lt;/a&gt; of Lugosi and Wegkamp, where a localized version of Rademacher complexity is investigated.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Regression&lt;/span&gt;: For continuous targets you can use a grid with an increasing resolution (assume that the range of targets is bounded) and count the number of gridpoints such that the predictor's loss is less than its loss on the true dataset.&lt;br /&gt;With an appropriate normalization this converges to the volume of such target values (hopefully this set is measurable:)).&lt;br /&gt;&lt;br /&gt;The paper does not go very far: Some examples are given that demonstrate that the criterion gives a computable procedure and that this procedure is reasonable. A quick comparison to alternatives is given. 
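&lt;br /&gt;&lt;br /&gt;For the classification case the counting can be sketched in a few lines of Python. This is only an illustration of the idea as summarized above, not code from the paper, and it rests on assumptions that are labelled as such: the predictor is refit on every relabeling, ties are counted (loss not larger than the true one), and the learner is a toy k-NN rule on six one-dimensional points. The names knn_predict, empirical_loss and loss_rank are made up for this example.

```python
from itertools import product


def knn_predict(xs, ys, k, query):
    """Majority vote among the k training points nearest to query (ties -> label 1)."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - query))[:k]
    votes = sum(ys[i] for i in nearest)
    return 1 if 2 * votes >= k else 0


def empirical_loss(xs, ys, k):
    """Number of training points the k-NN rule misclassifies on its own training set."""
    return sum(knn_predict(xs, ys, k, x) != y for x, y in zip(xs, ys))


def loss_rank(xs, ys, k):
    """Count the labelings y' whose refit training loss does not exceed the true one.

    Assumption: the rule is refit on each relabeling; brute force over 2**n labelings.
    """
    true_loss = empirical_loss(xs, ys, k)
    return sum(
        1
        for labels in product((0, 1), repeat=len(xs))
        if true_loss >= empirical_loss(xs, labels, k)
    )


# Toy data (hypothetical): six points on a line, labels split in the middle.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]

ranks = {k: loss_rank(xs, ys, k) for k in (1, 3, 5)}
best_k = min(ranks, key=ranks.get)  # smaller rank is preferred
```

With k=1 every point is its own nearest neighbour, so each of the 64 labelings is fit perfectly and the rank is as large as it can be; on this toy data the criterion therefore prefers a larger k.&lt;br /&gt;&lt;br /&gt;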
It will be interesting to see the further developments!&lt;div class="blogger-post-footer"&gt;MLReadings&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1374054023917435198-4227203278643299355?l=readingsml.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsml.blogspot.com/feeds/4227203278643299355/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1374054023917435198&amp;postID=4227203278643299355&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/4227203278643299355'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/4227203278643299355'/><link rel='alternate' type='text/html' href='http://readingsml.blogspot.com/2007/04/loss-rank-principle-by-marcus-hutter.html' title='The Loss Rank Principle by Marcus Hutter'/><author><name>Csaba Szepesvári</name><uri>http://www.blogger.com/profile/13790307935040509983</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://www.cs.ualberta.ca/~szepesva/image_new.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1374054023917435198.post-2173854530732427915</id><published>2007-04-14T19:48:00.000-07:00</published><updated>2007-04-14T20:41:51.522-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='blogging'/><title type='text'>Why this Blog??</title><content type='html'>I am struggling with organizing the notes I make after lectures or after reading a paper. Hence I will experiment with this fancy way of keeping track of my thoughts. 
Of course, I will be happy to receive feedback from the occasional readers.&lt;br /&gt;&lt;br /&gt;We will see how well it goes!&lt;div class="blogger-post-footer"&gt;MLReadings&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1374054023917435198-2173854530732427915?l=readingsml.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://readingsml.blogspot.com/feeds/2173854530732427915/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1374054023917435198&amp;postID=2173854530732427915&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/2173854530732427915'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1374054023917435198/posts/default/2173854530732427915'/><link rel='alternate' type='text/html' href='http://readingsml.blogspot.com/2007/04/why-this-blog.html' title='Why this Blog??'/><author><name>Csaba Szepesvári</name><uri>http://www.blogger.com/profile/13790307935040509983</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://www.cs.ualberta.ca/~szepesva/image_new.gif'/></author><thr:total>0</thr:total></entry></feed>
