A Support Vector Machine (SVM) is a very powerful and versatile Machine Learning model that is capable of performing linear or nonlinear classification, regression, and even outlier detection.
SVMs are particularly well-suited for classification of complex but small- or medium-sized datasets.
Here are a few ways to build more advanced versions of them.
Linear SVM for Nonlinear Datasets
Although linear SVM classifiers are efficient and work well in many cases, many datasets are not even close to being linearly separable.
One approach to handling nonlinear datasets is to add more features, such as polynomial features, which may result in a linearly separable dataset.
To implement this in Scikit-Learn, we simply create a `Pipeline` containing a `PolynomialFeatures` transformer, followed by a `StandardScaler` and a `LinearSVC`:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC

polynomial_svm_clf = Pipeline([
    ("poly_features", PolynomialFeatures(degree=3)),
    ("scaler", StandardScaler()),
    ("svm_clf", LinearSVC(C=10, loss="hinge"))
])
# polynomial_svm_clf.fit(X, y)
```
Note: If your SVM model is overfitting, you can try regularizing it by reducing `C`. Also, the `LinearSVC` class regularizes the bias term, so you should center the training set first by subtracting its mean; this happens automatically if you scale the data with `StandardScaler`. Make sure you set the `loss` hyperparameter to `"hinge"`, as it is not the default value (the default is `"squared_hinge"`). Finally, for better performance, you should set the `dual` hyperparameter to `False`, especially when `n_samples > n_features`.
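As a minimal sketch of these tips (the `make_moons` toy dataset below is just an illustrative stand-in for your own `X, y`; note that liblinear only supports `loss="hinge"` in the dual formulation, so `dual=False` implies the default squared hinge loss):

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Hypothetical toy dataset standing in for your own data.
X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

# StandardScaler centers the data, which matters because LinearSVC
# regularizes the bias term.  dual=False is the faster formulation
# when n_samples > n_features (it uses the default squared hinge loss,
# since loss="hinge" is only available in the dual).
clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", LinearSVC(C=10, dual=False, random_state=42)),
])
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy
```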
SVM with Polynomial Kernels
As we have seen previously, adding polynomial features via the `PolynomialFeatures` transformer is simple and straightforward to implement. However, the polynomial degree should not be too high, as this would create a huge number of features and make the model too slow.
An alternative is to use the kernel trick: it makes it possible to get the same result as if you had added many polynomial features, even with very high-degree polynomials, without actually having to add them.
```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# With a polynomial kernel
poly_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5))
])

# With an RBF (Gaussian) kernel
rbf_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="rbf", gamma=5, C=0.001))
])
```
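These pipelines train like any other Scikit-Learn estimator. As a quick usage sketch, again using the toy `make_moons` dataset as a stand-in for your own data:

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative stand-in dataset; substitute your own X, y.
X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

poly_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5)),
])
poly_kernel_svm_clf.fit(X, y)
print(poly_kernel_svm_clf.score(X, y))  # training accuracy
```

The `coef0` hyperparameter controls how much the model is influenced by high-degree polynomial terms versus low-degree ones, so it is worth tuning alongside `degree` and `C`.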
LinearSVC vs. SVC
The `LinearSVC` class is based on the liblinear library, which implements an optimized algorithm for linear SVMs. It does not support the kernel trick, but it scales almost linearly with the number of training instances and the number of features: its training time complexity is roughly O(m × n).
On the other hand, the `SVC` class is based on the libsvm library, which implements an algorithm that supports the kernel trick. Its training time complexity is usually between O(m² × n) and O(m³ × n). Unfortunately, this means that it gets dreadfully slow when the number of training instances gets large.
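You can see this difference for yourself with a rough, assumed toy benchmark (not from any published measurement): fit `LinearSVC` and `SVC(kernel="linear")` on the same synthetic dataset and compare wall-clock times.

```python
import time

from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

# Synthetic dataset; large enough that libsvm's superlinear
# scaling in m starts to show.
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)

t0 = time.perf_counter()
lin_clf = LinearSVC(dual=False).fit(X, y)   # liblinear, roughly O(m * n)
linear_time = time.perf_counter() - t0

t0 = time.perf_counter()
svc_clf = SVC(kernel="linear").fit(X, y)    # libsvm, O(m^2 * n) or worse
svc_time = time.perf_counter() - t0

print(f"LinearSVC: {linear_time:.3f}s, SVC(kernel='linear'): {svc_time:.3f}s")
```

Exact timings depend on your machine, but the gap widens quickly as you increase `n_samples`.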
In a Nutshell
With so many kernels to choose from, how can you decide which one to use?
As a rule of thumb, you should always try the linear kernel first (remember that `LinearSVC` is much faster than `SVC(kernel="linear")`), especially if the training set is very large or if it has plenty of features.
And if the training set is not too large, you should try the Gaussian RBF kernel as well; it works well in most cases.
Finally, should you have spare time and computing power, you can also experiment with a few other kernels using cross-validation and grid search, especially if there are kernels specialized for your training set’s data structure.
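Such an experiment can be sketched as a cross-validated grid search over kernels and their hyperparameters. The parameter grid below is purely illustrative, not a recommendation, and the `make_moons` dataset again stands in for your own data:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC()),
])

# One sub-grid per kernel, since each kernel has its own hyperparameters.
param_grid = [
    {"svm_clf__kernel": ["rbf"],
     "svm_clf__gamma": [0.1, 1, 5],
     "svm_clf__C": [0.1, 1, 10]},
    {"svm_clf__kernel": ["poly"],
     "svm_clf__degree": [2, 3],
     "svm_clf__coef0": [0, 1],
     "svm_clf__C": [0.1, 1, 10]},
]

grid = GridSearchCV(pipe, param_grid, cv=3)
grid.fit(X, y)
print(grid.best_params_)
print(grid.best_score_)
```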