
Assume you know a student who wants to study Machine Learning and Natural Language Processing.

What specific computer science subjects should they focus on and which programming languages are specifically designed to solve these types of problems?

I am not looking for your favorite subjects and tools, but rather industry standards.

Example: I'm guessing that knowing Prolog and Matlab might help them. They also might want to study Discrete Structures*, Calculus, and Statistics.

*Graphs and trees. Functions: properties, recursive definitions, solving recurrences. Relations: properties, equivalence, partial order. Proof techniques, inductive proof. Counting techniques and discrete probability. Logic: propositional calculus, first-order predicate calculus. Formal reasoning: natural deduction, resolution. Applications to program correctness and automatic reasoning. Introduction to algebraic structures in computing.

by (33.1k points)

Machine learning and Natural Language Processing (NLP) are vast fields. You should first cover a few prerequisites to gain a basic understanding of them.

The prerequisites are:

• Probability and Statistics

• Linear algebra

• Basic computer science

For NLP, extracting logic from a text depends on several steps:

• Tokenization

• Chunking

• Disambiguation on a lexical level

• Syntactic Parsing

• Morphological analysis

These are only a few of the many methods used in NLP.
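To make the first step in the list above concrete, here is a minimal tokenization sketch in Python using only the standard library. The `tokenize` function and its regular expression are hypothetical names chosen for illustration, not part of any NLP toolkit, and the pattern assumes English-like text with straight apostrophes. Real tokenizers handle many more cases (URLs, hyphenation, abbreviations).

```python
import re

# Illustrative pattern (an assumption, not a standard):
#   - a run of word characters, optionally followed by an apostrophe
#     and more word characters (so "Don't" stays one token), or
#   - any single character that is neither a word character nor whitespace
#     (so punctuation becomes its own token).
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Split text into word tokens and single-character punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(tokenize("Don't panic, it's fine."))
```

Later steps such as chunking and syntactic parsing typically operate on the token list produced by a step like this one.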

For Natural Language Processing, the NLP group at Stanford provides many good resources. The introductory course, Stanford CS 224N: Natural Language Processing, makes all of its lectures available online.

Some recommended texts are:

The prerequisite computational linguistics course requires basic computer programming and data structures knowledge and uses the same textbooks. The required artificial intelligence course is also available online along with all the lecture notes and uses:

This is the standard Artificial Intelligence text and is also worth reading.

Both Python and R are widely used for machine learning. For the underlying theory, I would suggest The Elements of Statistical Learning, whose full text is available online for free. For specific functionality in R, refer to the Machine Learning and Natural Language Processing task views on CRAN.
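As a small illustration of how the statistics and linear algebra prerequisites show up in practice, the sketch below fits a straight line to data by ordinary least squares in plain Python. The function name `fit_line` is hypothetical and chosen for this example. It assumes a single input variable and at least two distinct x values. In real work you would normally use an established library instead.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    Assumes xs and ys have equal length and xs contains at least
    two distinct values (otherwise the slope is undefined).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Points generated from y = 2x + 1, so the fit should recover that line.
    slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
    print(slope, intercept)
```

The slope here is the sample covariance of x and y divided by the sample variance of x, which is why probability and statistics appear in the prerequisite list.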