
Installing macOS Mojave with an NVIDIA Graphics Card

Warning: Do not try this at home.

macOS Mojave has been around for a couple of months now, and while it hasn’t been a monumental release, it does offer a long-sought cosmetic feature: a true dark mode. Sadly, there’s no such thing as a free lunch. Apple have once again flip-flopped graphics providers, moving back to AMD and leaving NVIDIA users like myself out in the cold, dependent on NVIDIA’s lagging driver release schedule. Thus the question becomes: how far am I willing to go in the name of sweet, sweet dark mode?

Apparently, about as far as the end of this post.

Upgrade to macOS High Sierra

Apple advises against upgrading a Mac Pro to Mojave directly from macOS versions prior to 10.13.6 (High Sierra). If you’re already on 10.13.6, skip this section.

  1. Download macOS High Sierra from the App Store.

  2. The installer includes a firmware update and requires that a Mac-EFI-flashed graphics card (e.g. the OEM graphics card) be installed.

  3. Once you’ve verified that you have a proper graphics card installed, follow the instructions in the installer.

Upgrade to macOS Mojave

  1. Turn off FileVault, as it is not supported in Mojave on Mac Pro 5,1.

  2. Download macOS Mojave from the App Store.

  3. While the installer downloads, check for unsupported hardware. Mojave requires Metal-capable graphics cards. To verify that your graphics card is supported, look in the Graphics/Displays section of System Information.

System Information showing our Metal-capable NVIDIA graphics card.

  4. Like 10.13.6, macOS Mojave includes a firmware update. The update requires that all installed cards support Metal, so remove any unsupported cards before proceeding (this update does not require that a Mac-EFI-flashed graphics card be installed).

macOS Mojave firmware update dialog.

  5. If the installation fails like mine did, you may reinstall a Mac-EFI-flashed graphics card, even one that does not support Metal, and continue. The graphics won’t be pretty, but they’ll be sufficient to complete the upgrade.

Install Web Drivers

  1. Download the Webdriver Manager app.

  2. Launch the Webdriver Manager and select Reinstall Webdriver, then Download Webdriver.

  3. Once the download completes, relaunch the app and select Existing Webdriver Patching. The app requires access to Terminal, so be sure to grant it when prompted.

  4. After the patch has been applied, open Terminal and execute the following commands:

$ sudo chmod -R 755 /Library/Extensions/NVDAStartupWeb.kext
$ sudo chown -R root:wheel /Library/Extensions/NVDAStartupWeb.kext
$ sudo touch /System/Library/Extensions/ && sudo kextcache -u /
$ sudo touch /Library/Extensions && sudo kextcache -u /
$ sudo reboot
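After the machine comes back up, you can sanity-check that the patched kext actually loaded before judging the graphics themselves. This check is my own addition rather than part of the patcher’s instructions; kextstat ships with macOS, and NVDAStartupWeb is the kext we just patched:

$ kextstat | grep -i nvda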

Upon restart, NVIDIA graphics should be "working", in the most liberal sense of the word. Without hardware acceleration, Safari is a mess and Launchpad slows to a crawl. As a member of teams Ungoogled Chromium and Spotlight, those aren’t huge losses for me; the real problem is that the drivers do not seem to be able to wake from sleep. That is less than desirable, obviously (needless to say, if you’re a video editor working in Final Cut all day, hold off on this hack), and it’s incredibly frustrating that we more than likely won’t be seeing drivers from NVIDIA before the new year. One could assume the drivers are delayed by a combination of the significant changes to the graphics stack in Mojave and the tremendously small market share NVIDIA Macs possess, but there have been rumors that Apple have been the ones dragging their feet in signing them. I won’t speculate here, but I will say that as a user, Apple have enough vendor lock-in across the platform, and it would be greatly appreciated if we could have some say in what we put into our modular towers. One can only hope that this philosophy is embodied when the new Mac Pros arrive next year.

❋❋❋

Building PyTorch from Source with CUDA on macOS

PyTorch 1.0.0 is here! Unfortunately, if you want GPU support on macOS, you’ll have to get your hands dirty. Here’s how to build PyTorch from source with CUDA 10.0 on macOS High Sierra.

Prerequisites

Xcode

In this tutorial we’ll be building PyTorch with CUDA 10.0. Xcode 9.4 is required to install CUDA and can be downloaded from Apple. Be sure to download the Command Line Tools for Xcode 9.4 as well. Extract Xcode and move it into /Applications (renaming it in case another version of Xcode already exists):

$ xip --expand Xcode_9.4.1.xip
$ mv Xcode.app /Applications/Xcode-9.4.app

Command Line Tools can be installed by following the instructions given in the installer.
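If another Xcode lives on the machine, you may also need to point the active developer directory at this copy so the CUDA installer and nvcc pick up the right toolchain. This step is my own addition, and the path assumes the rename above:

$ sudo xcode-select --switch /Applications/Xcode-9.4.app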

CUDA and cuDNN

As previously mentioned, we’ll be building against the latest releases of CUDA and cuDNN, versions 10.0 and 7.4.1, respectively. CUDA can be downloaded directly from NVIDIA, while downloading cuDNN requires a (free) developer account. Installation of CUDA is extremely straightforward: just follow the instructions provided by the downloaded installer. cuDNN, on the other hand, must be extracted and copied to the proper directories by hand.

Navigate to the directory containing the downloaded tarball and extract it:

$ tar -xzvf cudnn-10.0-osx-x64-v7.4.1.5.tgz

Copy the library components into your CUDA installation, and make them readable:

$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib/libcudnn* /usr/local/cuda/lib
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib/libcudnn*

You can delete the extracted directory afterward if you’d like:

$ rm -rf cuda

Lastly, set an environment variable, DYLD_LIBRARY_PATH, to point to cuDNN’s location:

$ export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:$DYLD_LIBRARY_PATH
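Note that export only lasts for the current shell session. To make the variable stick, you could append the same line to your shell profile; ~/.bash_profile here is an assumption based on the default macOS shell of the era, so adjust for yours:

$ echo 'export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:$DYLD_LIBRARY_PATH' >> ~/.bash_profile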

Of course, it never hurts to check your work. Verify cuDNN with the following command (it may return some warnings, but no errors):

$ echo -e '#include "cudnn.h"\nint main(){return 0;}' | nvcc -x c - -o /dev/null -I/usr/local/cuda/include -L/usr/local/cuda/lib -lcudnn

Building PyTorch

Clone the GitHub repository and navigate into it:

$ git clone --recursive https://github.com/pytorch/pytorch && cd pytorch

Let’s check out the v1.0.0 release tag and pull in the required submodules:

$ git checkout v1.0.0
$ git submodule update --init

PyTorch requires a few additional Python dependencies. I’d recommend installing these into a virtual environment for the build; a minimal sketch follows the install command below, though I’ll leave the finer implementation details up to the reader.

$ pip install numpy pyyaml mkl mkl-include setuptools cmake cffi typing
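For the virtual environment, here’s a minimal sketch using the standard library’s venv module (the name pytorch-build is arbitrary):

$ python3 -m venv pytorch-build
$ source pytorch-build/bin/activate

With the environment activated, the pip install above lands inside it rather than in the global site-packages.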

Add the path to the Python executable you’ll be using as an environment variable for CMake:

$ export CMAKE_PREFIX_PATH=$(dirname $(which python))/../

Finally, we’re ready to go! If you’d rather build a .whl for installation with pip, replace install in the command below with bdist_wheel, and you’ll find it in a dist directory upon completion.

$ MACOSX_DEPLOYMENT_TARGET=10.13 CC=clang CXX=clang++ python setup.py install
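When the build finishes, a quick smoke test never hurts. If all went well, the following should print the version along with True:

$ python -c "import torch; print(torch.__version__, torch.cuda.is_available())"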

Let’s give PyTorch 1.0 a try by training one of the example models on the MNIST dataset.

from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


def train(args, model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))


def test(args, model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.max(1, keepdim=True)[1]  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


def main():
    # Training settings
    parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
    parser.add_argument('--batch-size', type=int, default=64, metavar='N',
                        help='input batch size for training (default: 64)')
    parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N',
                        help='input batch size for testing (default: 1000)')
    parser.add_argument('--epochs', type=int, default=10, metavar='N',
                        help='number of epochs to train (default: 10)')
    parser.add_argument('--lr', type=float, default=0.01, metavar='LR',
                        help='learning rate (default: 0.01)')
    parser.add_argument('--momentum', type=float, default=0.5, metavar='M',
                        help='SGD momentum (default: 0.5)')
    parser.add_argument('--no-cuda', action='store_true', default=False,
                        help='disables CUDA training')
    parser.add_argument('--seed', type=int, default=1, metavar='S',
                        help='random seed (default: 1)')
    parser.add_argument('--log-interval', type=int, default=10, metavar='N',
                        help='how many batches to wait before logging training status')
    args = parser.parse_args()
    use_cuda = not args.no_cuda and torch.cuda.is_available()

    torch.manual_seed(args.seed)

    device = torch.device("cuda" if use_cuda else "cpu")

    kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=args.batch_size, shuffle=True, **kwargs)
    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=False, transform=transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,))
        ])),
        batch_size=args.test_batch_size, shuffle=True, **kwargs)

    model = Net().to(device)
    optimizer = optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)

    for epoch in range(1, args.epochs + 1):
        train(args, model, device, train_loader, optimizer, epoch)
        test(args, model, device, test_loader)


if __name__ == '__main__':
    main()
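Save the script (I’ll call it mnist.py; the name is arbitrary) and run it twice, once on the GPU and once with the script’s own --no-cuda flag, for a CPU-only baseline:

$ python mnist.py
$ python mnist.py --no-cuda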

Running the example CPU-only and again with a GPU produced models with extremely similar accuracy, 98.41% and 98.38% respectively; however, GPU training was, on average, 27% faster than CPU-only.

It’s unfortunate that first-party support for GPU-accelerated machine learning on macOS leaves much to be desired; however, given the circumstances surrounding Apple and NVIDIA, you really can’t blame the developers of libraries like PyTorch or TensorFlow for not devoting more engineering resources to a largely front-end platform like macOS. That being said, diversity is better for everyone, and hopefully Apple will continue to prove their renewed commitment to Pro users with hardware capable of leveraging these advanced libraries. Likewise, we should all hope that machine learning libraries will eventually break free of the chokehold NVIDIA has on them and begin supporting alternative frameworks, such as ROCm. But I digress; I’ll save that discussion for another day.

❋❋❋

Managing Multiple Python Installations

With the recent release of Python 3.7.0, I’ve found myself in a position I was warned of early on when I started using Python, but thought I had solved with virtual environments. I was sorely mistaken when my PyTorch GPU build broke after Homebrew upgraded Python 3.6.5.

Virtual environments are Python’s answer to the frustrating problem of package management. Different projects and utilities often rely on particular versions of packages or modules that can conflict with one another. Virtual environments allow different versions of the same package to live happily together on the same system, each partitioned in its own little universe that includes an independent Python installation as well. They have the added advantage of keeping more obscure dependencies grouped with the projects that use them (just run pip freeze against a global installation and try to guess what goes with which project).

Great! We’ve got packages sorted out, but what about Python installations themselves? This is where things get a little murky. First of all, it goes without saying that you should never, ever mess with the system’s installation of Python (located at /usr/bin/python on macOS). There are plenty of ways to install an additional binary: installer packages as well as source release archives are available directly. As you may already have guessed, clever reader, there are a few reasons these are less than ideal, and you’re right! Building from source is a hassle, and unless you’re a masochist like me, or need system-specific optimizations (like GPU support *cough* PyTorch *cough*), it’s generally more trouble than it’s worth. The installer packages, on the other hand, are stupid easy to install, but add unnecessary (and unwanted) cruft to your Applications folder. Homebrew, however, will install Python (2 and/or 3) alongside your system’s installation and update your PATH variable (more on that below) so they all play along nicely. What’s not to love?

Installing multiple versions of Python with Homebrew

Homebrew is great for installing the latest version of Python, but anything else requires diving into git to check out the correct commit of the formula. For example, this is how to install Python 3.6.5_1:

$ brew install https://raw.githubusercontent.com/Homebrew/homebrew-core/f2a764ef944b1080be64bd88dca9a1d80130c558/Formula/python.rb

And to add insult to injury, the commit history is too long for GitHub to generate online, so one would need to clone the entire repository to access the history. Luckily, there is a better way: enter pyenv. A fork of the Ruby environment manager rbenv, pyenv works the same way, by injecting shim executables into your PATH. What?

Follow the Yellow/Blue Brick Road

The PATH environment variable was shrouded in mystery to me for longer than I care to admit, but it’s way simpler than it’s made out to be. PATH is just a list of paths to directories (hence the name) for the operating system to search through when looking for an executable, from first to last, separated by colons. That search order is an important caveat. For example, calling python in a shell launches the REPL. If I have two Python installations in my PATH, the one found first is the one used. If one installation is the system installation in /usr/bin, and the other I’ve installed to /usr/local/bin, I’ll want to export my PATH with /usr/local/bin ahead of /usr/bin if it isn’t already (though it usually is). A couple of commands make this concrete, as shown below.
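Both commands here are standard: echo prints the search list itself, and which -a lists every python it finds, in PATH order (the PATH value shown is a typical macOS default; yours may differ):

$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
$ which -a python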

Pyenv’s shims are injected at the head of the PATH variable, so they will intercept any calls to python and direct them to the appropriate version’s binary through a symlink, as dictated by the current directory’s .python-version file, or the default version if no such file exists. You can set the desired version for a directory with:

$ pyenv local <version number>
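As a quick illustration (my-project is a hypothetical directory, and this assumes 3.6.5 has already been installed through pyenv):

$ cd my-project
$ pyenv local 3.6.5
$ python --version
Python 3.6.5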

To install pyenv, simply run:

$ brew install pyenv

Once installed, you can get a list of available Python versions with:

$ pyenv install --list

And install with:

$ pyenv install <version number>

To see a list of all installed Python versions:

$ pyenv versions

Venv and virtualenv and pipenv, oh my!

In the beginning, there was virtualenv, a tool for creating isolated Python environments. It’s an indispensable tool when more than one project is being developed on the same system, as it allows each project to install its own copies of its dependencies. This allows multiple versions of the same dependency to exist on a system without fear of name collisions, and helps keep the global package namespace clean. Virtualenv is so useful that in Python 3 it was folded into the standard library as venv. If you don’t need or care about Python 2 support, you can use venv out of the box and have all the essential features of virtualenv.

Pipenv is the latest iteration of virtual environment tools, and the officially recommended dependency manager for Python applications. While virtualenv and venv are great, they fall short in certain use cases, such as collaborative or version-controlled projects. Pipenv bakes pip and the Pipfile into a single workflow, and stores the virtualenv directory in the user’s home directory (~/.local/share/virtualenvs/), as opposed to the current working directory. Additionally, the required dependencies are listed in a Pipfile, a more robust alternative to requirements.txt, analogous to package.json in npm. A minimal side-by-side sketch follows.
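Here’s what each looks like in practice; the environment name .venv and the package requests are arbitrary examples:

$ virtualenv .venv              # virtualenv: works for Python 2 or 3
$ python3 -m venv .venv         # venv: the Python 3 standard library equivalent
$ source .venv/bin/activate     # activate either of the above
$ pipenv install requests       # pipenv: creates the environment and Pipfile in one step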

Developing in multiple versions of any language is no trivial task, but for Python, with the right tools, it’s become easier than ever.