Python Machine Learning/Data Science Project Structure

Question

asked Jul 10, 2019 in Data Science by sourav (17.6k points)

I'm looking for information on how a Python machine learning project should be organized. For general Python projects there is Cookiecutter, and for R there is ProjectTemplate.

This is my current folder structure, but I'm mixing Jupyter Notebooks with actual Python code and it does not seem very clear.

.
├── cache
├── data
├── my_module
├── logs
├── notebooks
├── scripts
├── snippets
└── tools

I work in the scripts folder and currently add all the functions to files under my_module, but that leads to errors when loading data (relative vs. absolute paths) and other problems.

I could not find proper best practices or good examples on this topic, besides this Kaggle competition solution and some notebooks that have all the functions condensed at the start of the notebook.

1 Answer

Shlok Pandey · Answer 1 · 2019-07-12T07:05:53+0000

You can append your source directory to the system path. This is the cleanest and most direct way to import your own code into a notebook, because it avoids extra module boilerplate and an editable install (pip install -e).

You should also combine this with the %autoreload and %aimport magics, as shown in the code below, so that changes to the imported module are picked up without restarting the kernel.

# Load the "autoreload" extension
%load_ext autoreload

# Reload only the modules marked with "%aimport" before each cell runs
%autoreload 1

import os
import sys

# Add the 'src' directory (a sibling of the notebook's folder) to the import path
src_dir = os.path.abspath(os.path.join(os.getcwd(), os.pardir, 'src'))
if src_dir not in sys.path:
    sys.path.append(src_dir)

# Import the module from the source code and mark it for autoreload
%aimport preprocess.build_features
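
The snippet above depends on the notebook's current working directory, which is also where the relative/absolute path errors mentioned in the question tend to come from. One possible way around this, offered as a rough sketch rather than an established convention, is to locate the project root once and build every path from it. The helper names find_project_root and add_src_to_path, the 'src' marker directory, and the data file name in the comments are all assumptions made for illustration.

```python
import sys
from pathlib import Path


def find_project_root(start, marker="src"):
    """Walk upward from `start` until a directory containing `marker` is found.

    `marker` is an assumed convention: the name of a folder that only
    exists at the top level of the project.
    """
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"no '{marker}' directory found at or above {start}")


def add_src_to_path(project_root, src_name="src"):
    """Append the source directory to sys.path once, and return it."""
    src_dir = str(Path(project_root) / src_name)
    if src_dir not in sys.path:
        sys.path.append(src_dir)
    return src_dir


# Typical use at the top of a notebook (file name below is hypothetical):
# root = find_project_root(Path.cwd())
# add_src_to_path(root)
# data_file = root / "data" / "train.csv"
```

With this approach a notebook can be moved into a deeper subfolder without changing its imports or data paths, since everything is resolved from the detected root instead of from os.getcwd().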

