I started a P2 instance with this AMI (Linux with Nvidia Tesla). I installed some tools such as screen and torch, then successfully ran some experiments on the GPU. I created an image of the instance so that I could terminate it and run it again later.

Later I started a new instance from the AMI I had created. Everything looked fine: screen, torch, and my experiments were all present on the system. But I couldn't run the same experiments as before:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

To me, it looks like the drivers might be installed (because all the other tools I installed are still there) but are not running. Is that a correct assumption? How can I start them?

1 Answer

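If the default kernel on your AWS instance gets updated, the Nvidia kernel modules may exist only for the old kernel and not for the new one. Here the new kernel (1077) has no Nvidia modules, while the old one (1049) does: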

ubuntu@ip-XXX-XXX-XXX-XXX:~$ ls -laR /lib/modules/4.4.0-1077-aws | grep -i nvidia

ubuntu@ip-XXX-XXX-XXX-XXX:~$ ls -laR /lib/modules/4.4.0-1049-aws | grep -i nvidia

-rw-r--r--  1 root root   87368 Jul 17 10:21 nvidia-drm.ko

-rw-r--r--  1 root root 1155304 Jul 17 10:21 nvidia-modeset.ko

-rw-r--r--  1 root root 1163016 Jul 17 10:21 nvidia-uvm.ko

-rw-r--r--  1 root root 18014088 Jul 17 10:21 nvidia.ko
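The same check can be run across every installed kernel at once. The following is a minimal sketch, assuming the modules live under /lib/modules/<version> as in the listing above; the function name has_nvidia_modules is hypothetical and only for illustration.

```shell
#!/bin/sh
# Return 0 if the given module tree contains at least one nvidia kernel module.
# has_nvidia_modules is a hypothetical helper name, not part of any tool.
has_nvidia_modules() {
    moddir="$1"
    [ -d "$moddir" ] || return 1
    # 'nvidia*.ko*' also matches compressed modules such as nvidia.ko.xz
    found=$(find "$moddir" -type f -name 'nvidia*.ko*' 2>/dev/null | head -n 1)
    [ -n "$found" ]
}

running="$(uname -r)"
for dir in /lib/modules/*/; do
    [ -d "$dir" ] || continue
    ver="$(basename "$dir")"
    if has_nvidia_modules "$dir"; then
        status="nvidia modules present"
    else
        status="no nvidia modules"
    fi
    # Mark the kernel that is currently running
    if [ "$ver" = "$running" ]; then mark=" (running)"; else mark=""; fi
    echo "$ver: $status$mark"
done
```

If the line marked "(running)" reports no nvidia modules while an older kernel reports them as present, you are likely in the situation described here.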

The GRUB config still allowed booting the old kernel (1049), but it booted the new one (1077) by default. The relevant portion of /boot/grub/grub.cfg:

ubuntu@ip-XXX-XXX-XXX-XXX:~$ grep -i -e "ubuntu, with linux" /boot/grub/grub.cfg

    menuentry 'Ubuntu, with Linux 4.4.0-1077-aws' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-4.4.0-1077-aws-advanced-XXXX' {

    menuentry 'Ubuntu, with Linux 4.4.0-1077-aws (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-4.4.0-1077-aws-recovery-XXXX' {

    menuentry 'Ubuntu, with Linux 4.4.0-1049-aws' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-4.4.0-1049-aws-advanced-XXXX' {

    menuentry 'Ubuntu, with Linux 4.4.0-1049-aws (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-4.4.0-1049-aws-recovery-XXXX' {
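The title given to grub-reboot has to match one of these menu entries, so it can help to print just the titles. This is a rough sketch, assuming the titles are single-quoted as in the grep output above; list_menu_entries is a hypothetical helper name.

```shell
#!/bin/sh
# Print the title of every menuentry line in a GRUB config file.
# list_menu_entries is a hypothetical helper name, for illustration only.
list_menu_entries() {
    # Capture the text between the first pair of single quotes after "menuentry"
    sed -n "s/^[[:space:]]*menuentry '\([^']*\)'.*/\1/p" "$1"
}

cfg=/boot/grub/grub.cfg
if [ -r "$cfg" ]; then
    # Hide the recovery-mode entries; "|| true" keeps the script going if nothing is left
    list_menu_entries "$cfg" | grep -v 'recovery mode' || true
fi
```

Entries nested in a submenu are addressed as "submenu title>entry title", which is why the grub-reboot argument starts with "Advanced options for Ubuntu>".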

You can force GRUB to boot the old kernel, which has the Nvidia drivers installed, on the next reboot:

sudo /usr/sbin/grub-reboot "Advanced options for Ubuntu>Ubuntu, with Linux 4.4.0-1049-aws"

sudo reboot

After the reboot, the old kernel is running and the Nvidia drivers are available again. Note that grub-reboot selects the entry for the next boot only.
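Once the instance is back up, it is worth confirming which kernel is running and whether the nvidia module is loaded. A minimal sketch, assuming the standard /proc/modules format where the module name is the first field; nvidia_loaded is a hypothetical helper name.

```shell
#!/bin/sh
# Return 0 if a module named exactly "nvidia" is listed in the given file.
# Defaults to /proc/modules; nvidia_loaded is a hypothetical helper name.
nvidia_loaded() {
    awk '$1 == "nvidia" { found = 1 } END { exit !found }' "${1:-/proc/modules}" 2>/dev/null
}

echo "Running kernel: $(uname -r)"
if nvidia_loaded; then
    echo "nvidia module is loaded"
else
    echo "nvidia module is NOT loaded"
fi
```

If the module is loaded, running nvidia-smi should no longer report the communication error.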
