<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Semi-supervised Naive Bayes in Python</title>
	<atom:link href="http://www.mblondel.org/journal/2010/06/21/semi-supervised-naive-bayes-in-python/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mblondel.org/journal/2010/06/21/semi-supervised-naive-bayes-in-python/</link>
	<description>Machine Learning, Data Mining, Natural Language Processing…</description>
	<lastBuildDate>Mon, 02 Jan 2012 10:53:06 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.4</generator>
	<item>
		<title>By: Purvi</title>
		<link>http://www.mblondel.org/journal/2010/06/21/semi-supervised-naive-bayes-in-python/#comment-229015</link>
		<dc:creator>Purvi</dc:creator>
		<pubDate>Sat, 26 Nov 2011 16:27:25 +0000</pubDate>
		<guid isPermaLink="false">http://www.mblondel.org/journal/?p=126#comment-229015</guid>
		<description>Hello rick wei,

As you suggested, I provided input data for td and delta to the train function, but I am getting an &quot;objects are not aligned&quot; error. Here is the output:

 File &quot;/var/test/seminb.py&quot;, line 156, in train
    self.p_w_c[w,c] += td[w,d] * delta[d,c]
  File &quot;/usr/lib/python2.6/dist-packages/numpy/core/defmatrix.py&quot;, line 290, in __mul__
    return N.dot(self, asmatrix(other))
ValueError: objects are not aligned

I am running it on an Ubuntu Linux system.

Please advise what I should do to run train and the other functions successfully.</description>
		<content:encoded><![CDATA[<p>Hello rick wei,</p>
<p>As you suggested, I provided input data for td and delta to the train function, but I am getting an "objects are not aligned" error. Here is the output:</p>
<p> File &#8220;/var/test/seminb.py&#8221;, line 156, in train<br />
    self.p_w_c[w,c] += td[w,d] * delta[d,c]<br />
  File &#8220;/usr/lib/python2.6/dist-packages/numpy/core/defmatrix.py&#8221;, line 290, in __mul__<br />
    return N.dot(self, asmatrix(other))<br />
ValueError: objects are not aligned</p>
<p>I am running it on an Ubuntu Linux system.</p>
<p>Please advise what I should do to run train and the other functions successfully.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Purvi</title>
		<link>http://www.mblondel.org/journal/2010/06/21/semi-supervised-naive-bayes-in-python/#comment-228980</link>
		<dc:creator>Purvi</dc:creator>
		<pubDate>Thu, 24 Nov 2011 10:26:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.mblondel.org/journal/?p=126#comment-228980</guid>
		<description>Hello,
 thanks for your code.
 Can anyone give complete working code with an input matrix? Thanks in advance.</description>
		<content:encoded><![CDATA[<p>Hello,<br />
 thanks for your code.<br />
 Can anyone give complete working code with an input matrix? Thanks in advance.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: hoh</title>
		<link>http://www.mblondel.org/journal/2010/06/21/semi-supervised-naive-bayes-in-python/#comment-228535</link>
		<dc:creator>hoh</dc:creator>
		<pubDate>Thu, 10 Nov 2011 15:56:10 +0000</pubDate>
		<guid isPermaLink="false">http://www.mblondel.org/journal/?p=126#comment-228535</guid>
		<description>Does Weka support semi-supervised learning?</description>
		<content:encoded><![CDATA[<p>Does Weka support semi-supervised learning?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mathieu</title>
		<link>http://www.mblondel.org/journal/2010/06/21/semi-supervised-naive-bayes-in-python/#comment-227952</link>
		<dc:creator>Mathieu</dc:creator>
		<pubDate>Thu, 22 Sep 2011 07:42:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.mblondel.org/journal/?p=126#comment-227952</guid>
		<description>Just quickly looking at the equations in this post, there doesn&#039;t seem to be any reason that it shouldn&#039;t work. So I guess it&#039;s probably a bug in your implementation. Try to keep in mind the big picture: first train your classifier on the labeled data, then use your current classifier to find probabilistic labels of the unlabeled data, retrain with all the data (labeled and probabilistically labeled) and repeat. Also keep your implementation simple until you get it right and optimize performance afterwards.</description>
		<content:encoded><![CDATA[<p>Just quickly looking at the equations in this post, there doesn&#8217;t seem to be any reason that it shouldn&#8217;t work. So I guess it&#8217;s probably a bug in your implementation. Try to keep in mind the big picture: first train your classifier on the labeled data, then use your current classifier to find probabilistic labels of the unlabeled data, retrain with all the data (labeled and probabilistically labeled) and repeat. Also keep your implementation simple until you get it right and optimize performance afterwards.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Abhaya</title>
		<link>http://www.mblondel.org/journal/2010/06/21/semi-supervised-naive-bayes-in-python/#comment-227928</link>
		<dc:creator>Abhaya</dc:creator>
		<pubDate>Tue, 20 Sep 2011 11:18:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.mblondel.org/journal/?p=126#comment-227928</guid>
		<description>Hi Mathieu,

I am trying to use Kamal Nigam&#039;s algorithm for the multivariate Bernoulli NB. I am running into the strange problem of the likelihood of the data decreasing with every EM step! I believe that I might have made a mistake in adapting the algorithm to the Bernoulli case. Do you have any thoughts on how that can be done correctly?

Regards,
Abhaya</description>
		<content:encoded><![CDATA[<p>Hi Mathieu,</p>
<p>I am trying to use Kamal Nigam&#8217;s algorithm for the multivariate Bernoulli NB. I am running into the strange problem of the likelihood of the data decreasing with every EM step! I believe that I might have made a mistake in adapting the algorithm to the Bernoulli case. Do you have any thoughts on how that can be done correctly?</p>
<p>Regards,<br />
Abhaya</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Quora</title>
		<link>http://www.mblondel.org/journal/2010/06/21/semi-supervised-naive-bayes-in-python/#comment-226928</link>
		<dc:creator>Quora</dc:creator>
		<pubDate>Thu, 05 May 2011 16:38:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.mblondel.org/journal/?p=126#comment-226928</guid>
		<description>&lt;strong&gt;How big the training set should be in the Naive Bayes text classification?...&lt;/strong&gt;

* First make sure that the data is balanced, since the simple naive Bayes algorithm won&#039;t work for an unbalanced dataset. If the dataset is unbalanced, then I suggest trying the complement naive Bayes algorithm. Weka (http://www.cs.waikato.ac.nz/ml/weka/) and m...</description>
		<content:encoded><![CDATA[<p><strong>How big the training set should be in the Naive Bayes text classification?&#8230;</strong></p>
<p>* First make sure that the data is balanced, since the simple naive Bayes algorithm won&#8217;t work for an unbalanced dataset. If the dataset is unbalanced, then I suggest trying the complement naive Bayes algorithm. Weka (http://www.cs.waikato.ac.nz/ml/weka/) and m&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mathieu</title>
		<link>http://www.mblondel.org/journal/2010/06/21/semi-supervised-naive-bayes-in-python/#comment-226875</link>
		<dc:creator>Mathieu</dc:creator>
		<pubDate>Fri, 22 Apr 2011 09:09:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.mblondel.org/journal/?p=126#comment-226875</guid>
		<description>I think it&#039;s a bug in your program. It means that you&#039;re doing something like this: arr[i, j] = [1, 2, 3, ...]. You cannot set the element of an array with a sequence.</description>
		<content:encoded><![CDATA[<p>I think it&#8217;s a bug in your program. It means that you&#8217;re doing something like this: arr[i, j] = [1, 2, 3, ...]. You cannot set the element of an array with a sequence.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rick wei</title>
		<link>http://www.mblondel.org/journal/2010/06/21/semi-supervised-naive-bayes-in-python/#comment-226873</link>
		<dc:creator>Rick wei</dc:creator>
		<pubDate>Thu, 21 Apr 2011 21:14:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.mblondel.org/journal/?p=126#comment-226873</guid>
		<description>Hi,
Thanks for sharing your program.

I get an error when the numpy.array has more than 1600 rows:
&quot;&quot;&quot; ValueError: setting an array element with a sequence. &quot;&quot;&quot;

Does NumPy limit the size of an array? If the answer is yes, do you know how to expand the array size?</description>
		<content:encoded><![CDATA[<p>Hi,<br />
Thanks for sharing your program.</p>
<p>I get an error when the numpy.array has more than 1600 rows:<br />
""" ValueError: setting an array element with a sequence. """</p>
<p>Does NumPy limit the size of an array? If the answer is yes, do you know how to expand the array size?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: rick wei</title>
		<link>http://www.mblondel.org/journal/2010/06/21/semi-supervised-naive-bayes-in-python/#comment-226849</link>
		<dc:creator>rick wei</dc:creator>
		<pubDate>Fri, 15 Apr 2011 14:08:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.mblondel.org/journal/?p=126#comment-226849</guid>
		<description>Hi,
I really appreciate your response. It&#039;s very helpful. Thank you.</description>
		<content:encoded><![CDATA[<p>Hi,<br />
I really appreciate your response. It&#8217;s very helpful. Thank you.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mathieu</title>
		<link>http://www.mblondel.org/journal/2010/06/21/semi-supervised-naive-bayes-in-python/#comment-226831</link>
		<dc:creator>Mathieu</dc:creator>
		<pubDate>Wed, 13 Apr 2011 07:11:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.mblondel.org/journal/?p=126#comment-226831</guid>
		<description>In the E-step, I use the current model to label unlabeled data, i.e., compute the fractional labels for the unlabeled data.
In the M-step, I compute the counts for unlabeled data, I add the counts for labeled data  (those don&#039;t change so they are computed outside of the loop), and finally I normalize to get probabilities.
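
A minimal sketch of that loop in NumPy (an illustration only, not the code from the post or the linked repository). The names train_semi, e_step, normalize_columns, td_l, delta_l, td_u and alpha are assumptions made up for this example; td is taken to be a word-by-document count matrix and delta a document-by-class label matrix, and a multinomial model with add-alpha smoothing is assumed:

```python
import numpy as np

def normalize_columns(counts):
    # Scale each column so that it sums to 1.
    return counts / counts.sum(axis=0, keepdims=True)

def e_step(td, p_c, p_w_c):
    # Fractional labels P(c|d) for each document, computed in log space.
    log_post = np.log(p_c) + td.T.dot(np.log(p_w_c))
    log_post = log_post - log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def train_semi(td_l, delta_l, td_u, n_iter=10, alpha=1.0):
    # td_l, td_u: (words, docs) counts; delta_l: (docs, classes) labels.
    # Counts from labeled data never change, so compute them once.
    word_counts_l = td_l.dot(delta_l)
    class_counts_l = delta_l.sum(axis=0)
    # Initial model: labeled data only.
    p_w_c = normalize_columns(word_counts_l + alpha)
    p_c = class_counts_l / class_counts_l.sum()
    for _ in range(n_iter):
        # E-step: label the unlabeled documents with the current model.
        delta_u = e_step(td_u, p_c, p_w_c)
        # M-step: unlabeled counts plus labeled counts, then normalize.
        p_w_c = normalize_columns(word_counts_l + td_u.dot(delta_u) + alpha)
        class_counts = class_counts_l + delta_u.sum(axis=0)
        p_c = class_counts / class_counts.sum()
    return p_c, p_w_c
```

The labeled counts are computed once before the loop because they never change; only the fractional counts from the unlabeled documents are recomputed at each iteration.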

You can find a more recent version of my code here:
https://github.com/mblondel/scikit-learn/blob/semisupervised/scikits/learn/naive_bayes.py

You can find unit tests here:
https://github.com/mblondel/scikit-learn/blob/semisupervised/scikits/learn/tests/test_naive_bayes_semi.py

(Sorry, I don&#039;t have time to provide support for these programs.)</description>
		<content:encoded><![CDATA[<p>In the E-step, I use the current model to label unlabeled data, i.e., compute the fractional labels for the unlabeled data.<br />
In the M-step, I compute the counts for unlabeled data, I add the counts for labeled data  (those don&#8217;t change so they are computed outside of the loop), and finally I normalize to get probabilities.</p>
<p>You can find a more recent version of my code here:<br />
<a href="https://github.com/mblondel/scikit-learn/blob/semisupervised/scikits/learn/naive_bayes.py" rel="nofollow">https://github.com/mblondel/scikit-learn/blob/semisupervised/scikits/learn/naive_bayes.py</a></p>
<p>You can find unit tests here:<br />
<a href="https://github.com/mblondel/scikit-learn/blob/semisupervised/scikits/learn/tests/test_naive_bayes_semi.py" rel="nofollow">https://github.com/mblondel/scikit-learn/blob/semisupervised/scikits/learn/tests/test_naive_bayes_semi.py</a></p>
<p>(Sorry, I don&#8217;t have time to provide support for these programs.)</p>
]]></content:encoded>
	</item>
</channel>
</rss>

