Multi-class Classification

Nov 9, 2010 at 5:17 PM

Seth, first off, thank you so much for making this an open source project. I am really learning a lot about learning algorithms and finding good uses for them. I have successfully coded ~8,000 job titles from a set of 12,000 members needing job title codes within our association. I wanted to get your input on whether I am taking the best approach.

The basic problem is that we have input from our members about what their job title is. This is a free-form text string. Based on this input, we need to assign a job title code to each record. There are approximately 50 codes (a finite set). Using perceptron models, I created one model for each code and fed it examples from a set of 24,000 records that were already coded. This generated 50 models. I then fed each new job title through all 50 models to predict its job title code. Around 6% had hits for more than one job title code. What I want to do is get someone to manually correct those and then regenerate the models based on the new data.

I was reading your blog and you mention multi-class classification. This is basically what I am doing. Do you have any plans to implement a model that supports it? If so, how would it work? Would you use rough sets to accomplish this?

Let me know; I am really interested in this stuff.

Nate
Dev Manager @ mgma.org 

Coordinator
Nov 10, 2010 at 3:45 AM
Edited Nov 10, 2010 at 3:46 AM

It looks like you implemented the one-versus-all method for multi-class classification. I definitely need to implement this! It is probably at the forefront of my mind (along with word exclusions for string features).

The way to decide which classifier "wins" (if you can call it that) is to calculate the signed distance between each model's linear separator and the point in question, normalized by the length of that model's weight vector so the scores are comparable across models. The one with the largest normalized distance should be the "guess." Hope this makes sense!
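
To make the idea concrete, here is a minimal sketch in plain Python. It is not the library's API, just an illustration under a few assumptions: every function name below is hypothetical, the features are toy 2-D numeric vectors standing in for whatever numeric encoding the job title strings get, and the codes "A", "B", "C" stand in for the ~50 job title codes.

```python
import math

def train_perceptron(X, y, epochs=1000):
    """Train one binary perceptron. y holds +1 / -1. Returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, t in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if t * score <= 0:  # misclassified (or on the boundary): update
                w = [wi + t * xi for wi, xi in zip(w, x)]
                b += t
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: converged
            break
    return w, b

def normalized_distance(model, x):
    """Signed distance from x to the separator: (w.x + b) / ||w||."""
    w, b = model
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0.0:
        return 0.0  # untrained model: no meaningful distance
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def train_one_vs_all(X, labels):
    """One model per code: examples of that code are +1, everything else -1."""
    return {
        code: train_perceptron(X, [1 if l == code else -1 for l in labels])
        for code in sorted(set(labels))
    }

def predict(models, x):
    """Pick the code whose separator is farthest on the positive side."""
    return max(models, key=lambda code: normalized_distance(models[code], x))

# Toy data (made up for illustration): three well-separated clusters.
X = [(0, 5), (1, 6), (-1, 5), (5, 0), (6, 1), (5, -1), (-5, -5), (-6, -4), (-4, -6)]
labels = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]

models = train_one_vs_all(X, labels)
for x in X:
    scores = {c: round(normalized_distance(m, x), 2) for c, m in models.items()}
    print(x, predict(models, x), scores)
```

With this rule, a record that gets a positive "hit" from several models still receives a single code, because only the largest normalized distance counts. The raw scores are divided by the weight norm because each perceptron ends up with weights on a different scale, so unnormalized scores are not comparable between models.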

Coordinator
Feb 23, 2011 at 11:18 PM

Nate:
    Finally added Decision Trees which natively support multi-class classification. Hope this helps!

-Seth