I want to keep telling you about information geometry… but I got sidetracked into thinking about something slightly different, thanks to some fascinating discussions here at the CQT.

There are a lot of people interested in entropy here, so some of us (Oscar Dahlsten, Mile Gu, Elisabeth Rieper, Wonmin Son and me) decided to start meeting more or less regularly. I call it the Entropy Club. I’m learning a lot of wonderful things, and I hope to tell you about them someday. But for now, here’s a little idea I came up with, triggered by our conversations:

- John Baez, Rényi entropy and free energy.

In 1960, Alfred Rényi defined a generalization of the usual Shannon entropy that depends on a parameter. If $latex p$ is a probability distribution on a finite set, its **Rényi entropy** of order $latex \beta$ is defined to be

View original post 685 more words

My first blog post on machine learning discusses a pet peeve I have about working in the industry: why not to apply an RBF kernel to text classification tasks.

I wrote this as a follow up to a Quora Answer on the subject:

*I will eventually re-write this entry once I get better at LaTeX. For now, refer to:*

Smola, Schölkopf, and Müller, *The connection between regularization operators and support vector kernels*, http://cbio.ensmp.fr/~jvert/svn/bibli/local/Smola1998connection.pdf

I expand on one point: *why not to use Radial Basis Function (RBF) Kernels for Text Classification.* I encountered this while consulting at eBay a few years ago, where not one but three of the teams (local, German, and Indian) were all doing this, with no success. They were treating a multi-class text classification problem using an SVM with an RBF Kernel. What is worse, they were claiming the RBF calculations…

View original post 686 more words

**Applications**

- Matrix Factorization Techniques for Recommender Systems: An article by Koren, Bell and Volinsky in IEEE Computer magazine.

**Algorithms and Code**

- Netflix Update: Try this at Home: This is such a nice online resource that it has been cited by many research papers.
- Matrix Factorization: A Simple Tutorial and Implementation in Python: This page explains all the math behind stochastic gradient descent for matrix factorization.
- Timely Development: Some Analysis of the Netflix data with C++ code.
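The gradient-descent approach these resources describe can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from any of the linked pages: the function names (`factorize`, `rmse`), the hyperparameters (`rank`, `lr`, `reg`, `epochs`) and the toy ratings are all assumptions chosen for the example.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def rmse(ratings, P, Q):
    """Root-mean-square error over the observed (user, item, rating) triples."""
    squared = [(r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings]
    return (sum(squared) / len(squared)) ** 0.5


def factorize(ratings, n_users, n_items, rank=2, lr=0.01, reg=0.02,
              epochs=200, seed=0):
    """Fit user factors P and item factors Q so that dot(P[u], Q[i]) ~ rating.

    Only observed ratings are visited; each visit takes one gradient step
    on the regularized squared error of that single rating.
    """
    rng = random.Random(seed)
    P = [[rng.random() for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(rank)] for _ in range(n_items)]
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)  # visit the ratings in a fresh random order
        for u, i, r in order:
            err = r - dot(P[u], Q[i])
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q


# Toy data (hypothetical): 5 users, 4 items, 13 observed ratings.
ratings = [(0, 0, 5), (0, 1, 3), (0, 3, 1), (1, 0, 4), (1, 3, 1),
           (2, 0, 1), (2, 1, 1), (2, 3, 5), (3, 0, 1), (3, 3, 4),
           (4, 1, 1), (4, 2, 5), (4, 3, 4)]

P0, Q0 = factorize(ratings, 5, 4, epochs=0)  # random initial factors only
P, Q = factorize(ratings, 5, 4)              # factors after training
print("RMSE before:", rmse(ratings, P0, Q0))
print("RMSE after: ", rmse(ratings, P, Q))
```

The unobserved entries are then predicted as `dot(P[u], Q[i])` for any user `u` and item `i`.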

**Large Scale Implementation**

- Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent: *KDD 2011* paper by R. Gemulla *et al.*
- Distributed Nonnegative Matrix Factorization for Web-Scale Dyadic Data Analysis on MapReduce: *WWW 2010* paper by C. Liu *et al.*

    import scipy.io as sio
    # A is the required matrix, sparse or dense
    sio.mmwrite(filename, A)
    # note: scipy gives the filename the extension .mtx

To work with the Matrix Market format in MATLAB, we need the files *mminfo.m, mmread.m, mmwrite.m*, all of which can be found on the Matrix Market website. These files must be either in MATLAB's current directory or in one of its path directories. Suppose the file ‘Mat1.txt.mtx’ contains the matrix that we saved from Python. To read it in MATLAB, we just need to write the following code.

A = mmread('Mat1.txt.mtx')

The required matrix will be stored in the variable A.
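The file can also be read back in Python to check it before handing it to MATLAB. The sketch below assumes SciPy and NumPy are installed. The small test matrix and the file name `Mat1.mtx` are made up for illustration, and the name is given an explicit `.mtx` extension so that the example does not depend on how a particular SciPy version handles extensions.

```python
import os
import tempfile

import numpy as np
import scipy.io as sio
import scipy.sparse as sp

# A small sparse matrix standing in for the matrix A from the text.
A = sp.coo_matrix(np.array([[1.0, 0.0], [0.0, 2.5]]))

# Write to a temporary directory; the file name is hypothetical.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "Mat1.mtx")
sio.mmwrite(path, A)

# mmread returns a sparse matrix for coordinate files and a dense
# ndarray for array files, so normalize to a dense array for comparison.
B = sio.mmread(path)
B_dense = B.toarray() if sp.issparse(B) else np.asarray(B)

print(B_dense)
```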

    # True if the list l is sorted in non-decreasing order
    all(l[i] <= l[i+1] for i in range(len(l)-1))
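The same check can be written without index arithmetic by pairing each element with its successor. This is only a sketch; the function name `is_sorted` is my own choice, not something from the original snippet.

```python
def is_sorted(seq):
    """Return True if seq (a list or tuple) is in non-decreasing order."""
    # zip stops at the shorter input, so this compares
    # (seq[0], seq[1]), (seq[1], seq[2]), ... and nothing else.
    return all(a <= b for a, b in zip(seq, seq[1:]))


print(is_sorted([1, 2, 2, 3]))
print(is_sorted([3, 1, 2]))
```

Empty and single-element sequences count as sorted, because `all` of an empty sequence of comparisons is `True`.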

- Install the following packages: pymacs, python-mode, python-rope, python-ropemacs, auto-complete-el
- Update the .emacs file as given in: http://stackoverflow.com/questions/2855378/ropemacs-usage-tutorial/2855895#2855895

scikits.timeseries has a `timeseries.fill_missing_dates()` method. One of the arguments it takes is `fill_value`, the default value we want to set for the missing data. But it does not work as intended: the missing data is masked instead of being filled. To fill in the required data, one must use the `timeseries.filled(fill_value)` method. Here is an example:

    >>> import scikits.timeseries as ts
    >>> datarr = ts.date_array(['2009-01-01', '2009-01-05'], freq='D')
    >>> datarr
    DateArray([01-Jan-2009, 05-Jan-2009],
              freq='D')
    >>> sr1 = ts.time_series([3,4], datarr)
    >>> sr1
    timeseries([3 4],
               dates = [01-Jan-2009 05-Jan-2009],
               freq = D)
    >>> m1 = sr1.fill_missing_dates(fill_value=0)
    >>> m1
    timeseries([3 -- -- -- 4],
               dates = [01-Jan-2009 ... 05-Jan-2009],
               freq = D)
    >>> m1.filled(0)
    timeseries([3 0 0 0 4],
               dates = [01-Jan-2009 ... 05-Jan-2009],
               freq = D)
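The `--` entries and the `filled` method look like the behaviour of NumPy masked arrays, which, as far as I know, scikits.timeseries series are built on. Under that assumption, the difference between a stored fill value and actually filled data can be reproduced with `numpy.ma` alone. The placeholder value 99 and the variable names below are made up for the sketch.

```python
import numpy.ma as ma

# Three masked entries between two real observations; 99 is an arbitrary
# placeholder sitting underneath the mask. fill_value is only stored here.
m1 = ma.masked_array([3, 99, 99, 99, 4],
                     mask=[False, True, True, True, False],
                     fill_value=0)

print(m1)              # masked entries are displayed as --
print(m1.fill_value)   # the stored fill value, not yet applied

filled_default = m1.filled()    # uses the stored fill_value
filled_explicit = m1.filled(-1) # an explicit value overrides it
print(filled_default)
print(filled_explicit)
```

Storing a fill value does not unmask anything. The masked entries stay masked until `filled` is called.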