+2 votes
1 view
in AI and Deep Learning by (43.2k points)

I'm planning to develop a program in Java that provides a diagnosis. The data set (about 1,000 records) is divided into two parts, one for training and one for testing. Each record is on its own line and contains the answers to 30 questions, one per column, with the diagnosis (0 or 1) in the last column. In the testing part, the diagnosis column is empty. My program should learn to classify from the training data and then make predictions for the testing part :/

I've never done anything similar so I'll appreciate any advice or information about the solution to a similar problem.

I was thinking about the Java Machine Learning Library or the Java Data Mining Package, but I'm not sure if that's the right direction, and I'm still not sure how to tackle this challenge.

Please advise.

All the best!

2 Answers

+2 votes
by (92.8k points)

For implementations of classification algorithms, your best bet is Weka (http://www.cs.waikato.ac.nz/ml/weka/). Weka is a collection of machine learning algorithms for data mining tasks. It contains tools for data preparation, classification, regression, clustering, association rule mining, and visualization.

You can do a lot of sophisticated work with it without really having to do any coding or math. Once you get used to it, you can use its API to integrate any of its classifiers into your own Java programs.

+1 vote
by (8k points)

There are numerous algorithms that fall into the category of "machine learning", and which one is right for your scenario depends on the kind of data you are dealing with. If your data essentially consists of mappings from a set of questions to a set of diagnoses, each of which can be yes/no, then I think methods that could potentially work include neural networks and methods for automatically building a decision tree based on the training data.

I'd have a look at some of the standard texts such as Russell & Norvig ("Artificial Intelligence: A Modern Approach") and other introductions to AI/machine learning, and see if you can easily adapt the algorithms they mention to your particular data. See also O'Reilly, "Programming Collective Intelligence" for some sample Python code of one or two algorithms that might be adaptable to your case. If you can read Spanish, the Mexican publishing house Alfaomega has also published various good AI-related introductions in recent years.

This is a classification problem, not really data mining. The general approach is to extract features from every data instance and let the classification algorithm learn a model from the features and the outcome (which for you is 0 or 1). Presumably, each of your 30 questions would be its own feature.
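As a rough illustration of the "one feature per question" idea, here is a minimal sketch in plain Java. It assumes the records are comma-separated and that every answer is numeric, neither of which is stated in the question. The class and method names (`RecordParser`, `parseFeatures`, `parseLabel`) are made up for this example.

```java
/** Hypothetical helper: turns one text record into a feature vector and a label. */
class RecordParser {
    // One feature per question, as described in the question above.
    static final int NUM_FEATURES = 30;

    /** Parses the first 30 columns of a record into a numeric feature vector. */
    static double[] parseFeatures(String line) {
        // Assumption: columns are comma-separated. The -1 limit keeps a trailing
        // empty column, which is how an empty diagnosis would show up.
        String[] cols = line.split(",", -1);
        if (cols.length < NUM_FEATURES) {
            throw new IllegalArgumentException(
                "Expected at least " + NUM_FEATURES + " columns, got " + cols.length);
        }
        double[] features = new double[NUM_FEATURES];
        for (int i = 0; i < NUM_FEATURES; i++) {
            // Assumption: every answer is numeric (for example 0/1 or a score).
            features[i] = Double.parseDouble(cols[i].trim());
        }
        return features;
    }

    /** Returns the diagnosis (0 or 1), or -1 when the diagnosis column is empty or missing. */
    static int parseLabel(String line) {
        String[] cols = line.split(",", -1);
        if (cols.length <= NUM_FEATURES || cols[NUM_FEATURES].trim().isEmpty()) {
            return -1; // unlabeled record, as in the testing part of the data
        }
        return Integer.parseInt(cols[NUM_FEATURES].trim());
    }
}
```

Records for which `parseLabel` returns -1 would be the ones to predict. The rest can be handed to whichever classifier you choose as training data.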

There are many classification techniques you can use. Support vector machines are popular, as are maximum entropy models. I haven't used the Java Machine Learning Library, but at a glance I don't see either of those in it. The OpenNLP project has a maximum entropy implementation. LibSVM has a support vector machine implementation.
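To give a feel for what such a classifier does internally: with only two classes, a maximum entropy model reduces to logistic regression. The sketch below trains one by batch gradient descent in plain Java. It does not use the OpenNLP or LibSVM APIs. The class name, the method names, and the hyperparameters (`epochs`, `learningRate`) are illustrative assumptions, not taken from any of the libraries mentioned.

```java
/** Minimal binary logistic regression, trained with batch gradient descent. */
class LogisticRegression {
    private final double[] weights;
    private double bias;

    LogisticRegression(int numFeatures) {
        this.weights = new double[numFeatures]; // weights start at zero
    }

    private static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Estimated probability that the diagnosis is 1 for feature vector x. */
    double probability(double[] x) {
        double z = bias;
        for (int j = 0; j < weights.length; j++) {
            z += weights[j] * x[j];
        }
        return sigmoid(z);
    }

    /** Predicts 1 when the estimated probability is at least 0.5, otherwise 0. */
    int predict(double[] x) {
        return probability(x) >= 0.5 ? 1 : 0;
    }

    /** Fits weights and bias by following the gradient of the average log loss. */
    void train(double[][] xs, int[] ys, int epochs, double learningRate) {
        int n = xs.length;
        for (int epoch = 0; epoch < epochs; epoch++) {
            double[] gradW = new double[weights.length];
            double gradB = 0.0;
            for (int i = 0; i < n; i++) {
                // For log loss, the gradient term per example is (prediction - label).
                double error = probability(xs[i]) - ys[i];
                for (int j = 0; j < weights.length; j++) {
                    gradW[j] += error * xs[i][j];
                }
                gradB += error;
            }
            for (int j = 0; j < weights.length; j++) {
                weights[j] -= learningRate * gradW[j] / n;
            }
            bias -= learningRate * gradB / n;
        }
    }
}
```

For the data in the question you would construct it with 30 features, call `train` on the labeled records, and call `predict` on each unlabeled one. A library implementation is still the better choice for real use, since it typically handles regularization, feature scaling, and evaluation for you.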
