How much data is needed to make a good prediction?

Jul 30, 2010 at 3:39 PM
Edited Jul 30, 2010 at 3:47 PM
This is my product class:
public class Product
{
  public string Description { get; set; }
  [Feature]
  public double Price { get; set; }
  [Label]
  public bool WasBoughtOrWillbuy { get; set; }
}
-------------------------------------------------------------------------------

This is my data:
public static Product[] GetAll()
{
  return new Product[]
  {
    new Product{ Description="Keurig k-cup coffee maker deluxe", Price = 79.99, WasBoughtOrWillbuy = true },
    new Product{ Description="Keurig coffee cups", Price = 79.99, WasBoughtOrWillbuy = true },
    new Product{ Description="Keurig k-cup coffee maker premium", Price = 159.99, WasBoughtOrWillbuy = false },
    new Product{ Description="Xbox 360 arcade", Price = 199.99, WasBoughtOrWillbuy = false },
    new Product{ Description="Xbox 360 pro", Price = 299.99, WasBoughtOrWillbuy = false }
  };
}
-------------------------------------------------------------------------------

Test code:
Product p = new Product { Description = "Keurig k-cup coffee maker deluxe", Price = 79.99, WasBoughtOrWillbuy = false };
linearClassifier.Predict(p);
-------------------------------------------------------------------------------

Result:
p.WasBoughtOrWillbuy = FALSE;

I was expecting "TRUE" because there is a very direct match in the training data:
WasBoughtOrWillbuy = true when Price = 79.99

But when I increased the size of the training set, as shown below, it made the right prediction.

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------

My new data:
public static Product[] GetAll()
{
  return new Product[]
  {
    new Product{ Description="Keurig k-cup coffee maker deluxe", Price = 79.99, WasBoughtOrWillbuy = true },
    new Product{ Description="Keurig coffee cups", Price = 79.99, WasBoughtOrWillbuy = true },
    new Product{ Description="Keurig coffee cups", Price = 79.99, WasBoughtOrWillbuy = true },
    new Product{ Description="Keurig coffee cups", Price = 79.99, WasBoughtOrWillbuy = true },
    new Product{ Description="Keurig coffee cups", Price = 79.99, WasBoughtOrWillbuy = true },
    new Product{ Description="Keurig coffee cups", Price = 79.99, WasBoughtOrWillbuy = true },
    new Product{ Description="Keurig coffee cups", Price = 79.99, WasBoughtOrWillbuy = true },
    new Product{ Description="Keurig coffee cups", Price = 79.99, WasBoughtOrWillbuy = true },
    new Product{ Description="Keurig k-cup coffee maker premium", Price = 159.99, WasBoughtOrWillbuy = false },
    new Product{ Description="Xbox 360 arcade", Price = 199.99, WasBoughtOrWillbuy = false },
    new Product{ Description="Xbox 360 pro", Price = 299.99, WasBoughtOrWillbuy = false }
  };
}
-------------------------------------------------------------------------------

Same Test code:
Product p = new Product { Description = "Keurig k-cup coffee maker deluxe", Price = 79.99, WasBoughtOrWillbuy = false };
linearClassifier.Predict(p);
-------------------------------------------------------------------------------

Result:
p.WasBoughtOrWillbuy = TRUE;
-------------------------------------------------------------------------------

Coordinator
Jul 30, 2010 at 3:47 PM
Edited Jul 30, 2010 at 3:49 PM

That is an excellent question. Based on what you have sent, the system will basically learn a price threshold for what the user will buy. The name of the product is pretty much a wash because it gets converted into its corresponding length. Can you print out the XML of the learned predictor? As far as how much data is needed, there are some really cool theories based on PAC learning. I will look into it a bit more for binary linear classifiers.

 

-- EDIT --

It will basically learn that anything below 159.99 is something worth buying (now that I've looked at the code more closely).

Jul 30, 2010 at 3:50 PM
I think I accidentally threw you off. Here "Description" is not even a [Feature]; I was just playing around with Price.
Coordinator
Jul 30, 2010 at 3:53 PM

Yeah, I just saw that! Sorry for my hastiness. The system will basically learn that some point between 79.99 and 159.99 is the division point (the separator) between Buy and Not Buy.
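To make that concrete, here is a tiny sketch of what a learned separator looks like at prediction time. The weight and bias below are made-up stand-ins (the real values come out of training), chosen so the decision boundary -bias/weight lands at 120, between the two price clusters:

```csharp
class ThresholdSketch
{
    // Hypothetical learned parameters -- placeholders, not the library's real output.
    // The decision boundary sits where Weight * price + Bias == 0, i.e. at price = 120.
    const double Weight = -1.0;
    const double Bias = 120.0;

    // A linear classifier predicts by checking which side of the boundary the input falls on.
    public static bool Predict(double price) => Weight * price + Bias > 0;
}
```

With these stand-in values, ThresholdSketch.Predict(79.99) comes out true (Buy) and ThresholdSketch.Predict(159.99) comes out false (Not Buy), which is the behavior you saw with the larger training set.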

Jul 30, 2010 at 4:14 PM
Edited Jul 30, 2010 at 6:35 PM
This is what I got when I used the smaller learning set:
<?xml version="1.0"?>
<perceptron type="mlmine.Product">
  <weight>
    <v size="1">
      <e>-1.5778431372549198</e>
    </v>
  </weight>
  <bias>0.92156862745098045</bias>
  <!--The following section is for informational purposes only-->
  <model>
    <features>
      <feature type="System.Double" converter="None">Price</feature>
    </features>
    <learn>WasBoughtOrWillbuy</learn>
  </model>
</perceptron>
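If the predictor simply computes sign(weight * Price + bias) on the raw price (an assumption on my part; the library may scale or normalize features internally, so treat this as a rough check), then the numbers in this XML put every positive price on the negative side, which would line up with the FALSE result from the small training set:

```csharp
// Weight and bias copied from the learned XML above.
double weight = -1.5778431372549198;
double bias = 0.92156862745098045;

// Activation for the test product priced at 79.99.
double activation = weight * 79.99 + bias; // roughly -125.3, i.e. negative
bool buy = activation > 0;                 // false: everything looks like Not Buy
```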
Jul 30, 2010 at 4:16 PM
Edited Jul 30, 2010 at 4:17 PM
This is what I got when I used the large set:
<?xml version="1.0"?>
<perceptron type="mlmine.Product">
  <weight>
    <v size="1">
      <e>3.5939639639639465</e>
    </v>
  </weight>
  <bias>0.963963963963964</bias>
  <!--The following section is for informational purposes only-->
  <model>
    <features>
      <feature type="System.Double" converter="None">Price</feature>
    </features>
    <learn>WasBoughtOrWillbuy</learn>
  </model>
</perceptron>
Coordinator
Jul 30, 2010 at 5:56 PM

Generally, the larger the dataset, the better the machine is at predicting the correct thing. Also, single-feature learning sets are not *super* interesting, since the system essentially has to find a single point that divides the set (you could do this yourself by putting the data in an Excel spreadsheet, sorting on the feature, and finding the best split point). Hope this makes sense.
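That spreadsheet exercise can be sketched directly. This is my own illustration, not the library's training algorithm: sort the prices, try a midpoint between each adjacent pair of distinct values, and keep the threshold that misclassifies the fewest examples:

```csharp
using System;
using System.Linq;

class SplitFinder
{
    // Returns the price threshold (predict Buy when price < threshold)
    // that minimizes misclassifications on the labeled data.
    public static double BestSplit(double[] prices, bool[] bought)
    {
        double[] candidates = prices.Distinct().OrderBy(p => p).ToArray();
        double best = 0;
        int fewestErrors = int.MaxValue;
        for (int i = 0; i + 1 < candidates.Length; i++)
        {
            // Candidate threshold midway between two adjacent distinct prices.
            double threshold = (candidates[i] + candidates[i + 1]) / 2;
            int errors = prices
                .Select((p, j) => (p < threshold) != bought[j])
                .Count(wrong => wrong);
            if (errors < fewestErrors) { fewestErrors = errors; best = threshold; }
        }
        return best;
    }
}
```

On the original five products (prices 79.99, 79.99, 159.99, 199.99, 299.99 with the first two labeled Buy), this lands on a threshold midway between 79.99 and 159.99 with zero misclassifications.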

Jul 30, 2010 at 6:41 PM
Edited Jul 30, 2010 at 7:22 PM
Yes, it does make sense. I will start adding meaningful dimensions and see how that changes the behavior. Thx!