in Data Science by (17.6k points)

What is the correct naming convention for files in a data science and machine learning project?

I believe the file name for a Python class should be a noun. However, I am not sure whether it should be a subject noun or an object noun.

Which of these should I use?

1) The class that outputs plots.

visualization.py, visualizer.py, vis.py, or ...

2) The class that analyses the dataset and outputs files that contain the results.

analysis.py, analyzer.py, or ...

3) The class that converts the dataset to pickle files.

preprocessor.py, preprocessing.py, prepare.py, or ...

(I checked PEP 8 but couldn't find a clear naming convention for file names.)

1 Answer

by (41.4k points)

Here are some file naming conventions for datasets, in brief.

1. Give each file a descriptive name that identifies its contents.

2. Keep the naming format the same throughout, including dataset files and zip or tar archives.

3. Use only numbers, letters, and underscores. Do not use special characters, spaces, dashes, or multiple dots.

4. Do not use more than 32 characters.

5. Use consistent case: either all UPPERCASE or all lowercase.

6. Sequential numbering should allow for growth and include leading zeros. If you have 100 files, number them from 001 to 100.

7. Avoid common terms as names (‘hello’, ‘champ’, ‘use’, or ‘python’).

8. Dates should be in a standard format, YYYYMMDD, which sorts chronologically.
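
As a rough illustration, rules 3 to 6 and 8 could be checked and applied with a small helper like the one below. This is only a sketch: `check_filename` and `build_filename` are made-up names, and it assumes that the 32-character limit counts the extension and that the case rule applies to the part before the extension.

```python
import re
from datetime import date

MAX_LENGTH = 32  # rule 4 (assumption: the limit includes the extension)
STEM_PATTERN = re.compile(r"[A-Za-z0-9_]+")  # rule 3: letters, digits, underscores


def check_filename(name: str) -> list[str]:
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    # Split on the last dot so the extension is checked separately.
    stem, dot, extension = name.rpartition(".")
    if not dot:
        stem = name  # no extension present
    if not STEM_PATTERN.fullmatch(stem):
        problems.append("use only letters, numbers, and underscores, with a single dot")
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if stem != stem.lower() and stem != stem.upper():
        problems.append("mixed case")
    return problems


def build_filename(label: str, index: int, total: int, day: date, extension: str) -> str:
    """Build a name with a YYYYMMDD date (rule 8) and a zero-padded index (rule 6)."""
    width = len(str(total))  # 100 files -> 3 digits, so 001 to 100
    return f"{label}_{day:%Y%m%d}_{index:0{width}d}.{extension}"


name = build_filename("survey", 7, 100, date(2024, 3, 5), "csv")
print(name)                                       # survey_20240305_007.csv
print(check_filename(name))                       # []
print(check_filename("My Results-v2.final.csv"))  # two violations: characters and case
```

Putting the date before the index is just one possible ordering; what matters is that the same pattern is used for every file.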

