Using ML.NET CLI To Automate Model Training

As we continue our article series about ML.NET, we will look at the ML.NET Command Line Interface (CLI) tool.

Like ML.NET Model Builder, the ML.NET CLI uses AutoML to produce machine learning models.

To download the source code for this article, you can visit our GitHub repository.

Let’s dive in and see how we can use it to automate model generation.

What is the ML.NET CLI?

The ML.NET CLI, as the name suggests, is a command-line tool that uses AutoML to search for the best-performing machine learning model it can find for a given dataset and selected scenario.

The ML.NET CLI is a .NET tool, so it is available for Windows, macOS, and Linux.

For a dataset provided as an input parameter, the ML.NET CLI generates both an ML model and the C# code to train and run that model.

The current ML.NET CLI version supports the following commands:

classification
regression
recommendation
train
image-classification
text-classification
forecasting

ML.NET CLI Installation

Before using the tool, we need to install it. It requires the .NET SDK to be installed on the machine. To verify the installed version, let’s open a command line or terminal and run the command:

PM> dotnet --version

If it returns a version number, we are good to go. If it fails, we must install the .NET SDK before installing the ML.NET CLI tool.

The install command format is:

dotnet tool install --global mlnet-<OS>-<ARCH>

So, to install it on 64-bit Windows, we run this command:

PM> dotnet tool install --global mlnet-win-x64
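On macOS or Linux, the <OS> and <ARCH> parts of the package name can be derived from the output of uname. The following is a minimal sketch, not an official script: the function name mlnet_package_name is our own, and it assumes the packages follow the mlnet-linux-x64 / mlnet-osx-arm64 naming pattern, which is worth verifying on NuGet before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: map `uname -s` / `uname -m` values to an
# ML.NET CLI package name of the form mlnet-<OS>-<ARCH>.
# Assumption: packages are published as mlnet-linux-* and mlnet-osx-*.
mlnet_package_name() {
    os_name="$1"   # e.g. the output of `uname -s`
    machine="$2"   # e.g. the output of `uname -m`

    case "$os_name" in
        Linux)  os="linux" ;;
        Darwin) os="osx" ;;
        *) echo "unsupported OS: $os_name" >&2; return 1 ;;
    esac

    case "$machine" in
        x86_64|amd64)  arch="x64" ;;
        arm64|aarch64) arch="arm64" ;;
        *) echo "unsupported architecture: $machine" >&2; return 1 ;;
    esac

    echo "mlnet-$os-$arch"
}

# Usage (not executed here):
#   dotnet tool install --global "$(mlnet_package_name "$(uname -s)" "$(uname -m)")"
```

Keeping the mapping in a function that only prints the name lets us check the result before handing it to dotnet tool install.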

Occasionally, the installation might return the following error:

dotnet : Failed to create shell shim for tool 'mlnet-win-x64': Command 'mlnet' conflicts with an existing command from another tool.

This indicates that an older (obsolete) version of the ML.NET CLI is already installed on the system. We need to uninstall it before we can install the latest version. To uninstall the tool, we run the command:

PM> dotnet tool uninstall --global mlnet

Finally, we can repeat the install command to install the latest version on our system.

Using the ML.NET CLI

Now, we are ready to see the tool in action.

Calling the tool without any parameters will give us the basic information about its usage and available commands:

PM> mlnet
mlnet : Required command was not provided.

mlnet

Usage:
  mlnet [options] [command]

Options:
  --version <version>    Show version information.
  -?, -h, --help <help>  Show help and usage information.

Commands:
  classification        Train a custom ML.NET model for classification...
  regression            Train a custom ML.NET model for regression...
  recommendation        Train a custom ML.NET model for recommendation...
  train                 train using training config file
  image-classification  Train a custom ML.NET model for image classification...
  text-classification   Train a custom ML.NET model for text classification ...
  forecasting           Train a custom ML.NET model for time series forecasting...

Let’s use it on our dataset. We will use the same Credit Risk Customers dataset as in the previous article.

Since we want to classify the credit submission, we’ll use the classification command.

First of all, let’s see the available options by calling the tool with only the task name:

PM> mlnet classification
mlnet : Option '--dataset' is required.

Option '--label-col' is required.

classification
  Train a custom ML.NET model for classification...
Usage:
  mlnet [options] classification

Options:
  --dataset <dataset> (REQUIRED)             File path to single dataset or training dataset...
  --label-col <label-col> (REQUIRED)         Name or zero-based index of label (target)...
  --cache <Auto|Off|On>                      Specify [On|Off|Auto] for cache to be turned...
  --cv-fold <cv-fold>                        Number of folds used for cross-validation...
  --has-header                               Specify [true|false] depending if dataset file(s)...
  --ignore-cols <ignore-cols>                Specify columns to be ignored in given dataset....
  --log-file-path <log-file-path>            Path to log file.                             
  --name <name>                              Name for output project or solution to create...
  -o, --output <output>                      Location folder for generated output. Default...
  --split-ratio <split-ratio>                Percent of dataset to use for validation...
  --train-time <train-time>                  Maximum time in seconds for exploring models...
  --validation-dataset <validation-dataset>  File path for validation dataset in train/valid...
  -v, --verbosity <verbosity>                Output verbosity choices: q[uiet], m[inimal]...

Required options: --dataset, --label-col

As we can see, the tool provides clear information about its usage and about the available and required options.

In our example, we want to predict the value of the class column, and we want to set the training time to 10 seconds.

Taking all this into consideration, we run the command, passing 20 as the zero-based index of the class column:

mlnet classification --dataset "DataSets/credit_customers.csv" --label-col 20 --has-header true --train-time 10

Since our dataset has headers, we can also use a column name for the label column:

mlnet classification --dataset "DataSets/credit_customers.csv" --label-col "class" --has-header true --train-time 10
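When we prefer to pass an index, we can look it up from the header row instead of counting columns by hand. The sketch below is only an illustration: label_index is a hypothetical helper, and it assumes a simple comma-separated header with no commas inside quoted column names.

```shell
#!/bin/sh
# Hypothetical helper: print the zero-based index of a column name
# found in the header (first line) of a CSV file.
# Assumption: the header is plain comma-separated text; names may be
# wrapped in double quotes but never contain commas themselves.
label_index() {
    csv_path="$1"
    column_name="$2"

    head -n 1 "$csv_path" | tr -d '\r' | awk -F',' -v name="$column_name" '
        {
            for (i = 1; i <= NF; i++) {
                col = $i
                gsub(/^"|"$/, "", col)      # strip surrounding quotes
                if (col == name) {
                    print i - 1             # awk fields are 1-based
                    found = 1
                    exit
                }
            }
        }
        END { exit !found }                 # non-zero status if missing
    '
}

# Usage (not executed here):
#   mlnet classification --dataset "DataSets/credit_customers.csv" \
#       --label-col "$(label_index DataSets/credit_customers.csv class)" \
#       --has-header true --train-time 10
```

The function exits with a non-zero status when the column is missing, so a script can stop before starting a training run with the wrong label.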

As a result, we see the training process details and the location of the generated assets:

Start Training
start multiclass classification
Evaluate Metric: MacroAccuracy
Available Trainers: LGBM,FASTFOREST,FASTTREE,LBFGS,SDCA
Training time in seconds: 10
|      Trainer                             MacroAccuracy Duration    |
|--------------------------------------------------------------------|
|0     FastTreeOva                         0.5751     0.6000         |
|1     FastTreeOva                         0.5854     0.2860         |
|2     FastTreeOva                         0.6822     0.4950         |
|3     FastForestOva                       0.6059     0.3550         |
|4     FastTreeOva                         0.5826     0.5150         |
|5     LightGbmMulti                       0.6465     0.1350         |
|6     LightGbmMulti                       0.6483     0.1390         |
|7     FastTreeOva                         0.6737     1.1230         |
|8     FastTreeOva                         0.6380     0.4040         |
|9     LightGbmMulti                       0.7010     0.1160         |
[Source=AutoMLExperiment, Kind=Info] cancel training because cancellation token is invoked...
|--------------------------------------------------------------------|
|                          Experiment Results                        |
|--------------------------------------------------------------------|
|                               Summary                              |
|--------------------------------------------------------------------|
|ML Task: multiclass classification                                  |
|Dataset: DataSets\credit_customers.csv|
|Label : class                                                       |
|Total experiment time :     9.0000 Secs                             |
|Total number of models explored: 11                                 |
|--------------------------------------------------------------------|
|                        Top 5 models explored                       |
|--------------------------------------------------------------------|
|      Trainer                             MacroAccuracy Duration    |
|--------------------------------------------------------------------|
|9     LightGbmMulti                       0.7010     0.1160         |
|2     FastTreeOva                         0.6822     0.4950         |
|7     FastTreeOva                         0.6737     1.1230         |
|6     LightGbmMulti                       0.6483     0.1390         |
|5     LightGbmMulti                       0.6465     0.1350         |
|--------------------------------------------------------------------|
[Source=AutoMLExperiment, Kind=Info] cancel training because cancellation token is invoked...
save SampleClassification.mbconfig to ML.NET_CLI\SampleClassification
Generating a console project for the best pipeline at location : ML.NET_CLI\SampleClassification

Among the generated files, we have the SampleClassification.mbconfig file:

{
	"Scenario": "Classification",
	"DataSource": {
		"Type": "TabularFile",
		"Version": 1,
		"FilePath": "DataSets\\credit_customers.csv",
		"Delimiter": ",",
		"DecimalMarker": ".",
		"HasHeader": true,
		"ColumnProperties": [...]
	},
	"Environment": {
		"Type": "LocalCPU",
		"Version": 1
	},
	...
}

It is the same configuration file used in the ML.NET Model Builder tool. You can check the entire file in our source code.

ML.NET CLI Integration

A common scenario where the ML.NET CLI tool comes in handy is integrating with different CI/CD tools.

For example, we can automate model retraining in an Azure DevOps pipeline:

# Run the pipeline only when the dataset file changes
trigger:
  paths:
    include:
    - DataSets/credit_customers.csv

pool:
  vmImage: 'windows-latest'

steps:
# The script uses PowerShell variables, so it runs as a PowerShell step
- powershell: |
    dotnet tool install --global mlnet-win-x64

    $dataPath = 'DataSets/credit_customers.csv'
    # --output expects a folder for the generated assets
    $outputPath = 'CreditCustomerClassificationModel'

    mlnet classification --dataset $dataPath --label-col "class" --has-header true --train-time 10 --output $outputPath

  displayName: 'Retrain Credit Customer Classification Model'

This script defines a pipeline to retrain the model when the credit_customers.csv file is changed.
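In a CI/CD setting, we may also want the run to fail when the retrained model scores too low. The following is a rough sketch under stated assumptions: the console output shown earlier has been saved to a file (training.log is a hypothetical name), result rows keep the "|index trainer score duration|" layout from that output, and the function names are our own rather than part of the ML.NET CLI.

```shell
#!/bin/sh
# Print the highest MacroAccuracy found in the result rows of a saved log.
# Result rows look like: |9     LightGbmMulti     0.7010     0.1160     |
# Assumption: the score is the third whitespace-separated field.
best_macro_accuracy() {
    awk '
        /^[|][0-9]+[ \t]/ {
            if (!seen || $3 > best) { best = $3 + 0; seen = 1 }
        }
        END {
            if (!seen) exit 1               # no result rows in the log
            printf "%.4f\n", best
        }
    ' "$1"
}

# Succeed only when the best score in the log reaches the given minimum.
meets_threshold() {
    best=$(best_macro_accuracy "$1") || return 1
    awk -v best="$best" -v min="$2" 'BEGIN { exit !(best + 0 >= min + 0) }'
}

# Usage (not executed here):
#   meets_threshold training.log 0.65 || { echo "model below threshold" >&2; exit 1; }
```

Parsing console output is fragile, so a check like this is best treated as a stopgap that needs updating if the log format changes.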

Model Evaluation

The ML.NET CLI generates the “best model” based on the quality metrics. Depending on the task type, different metrics are used.

The default metric for binary classification problems is Accuracy. The usual metrics for multiclass classification tasks are Micro Accuracy, which measures overall accuracy across all samples, and Macro Accuracy, which averages the accuracy calculated separately for each class. Finally, the default metric for value prediction (regression) tasks is RSquared, where values closer to 1 indicate a better fit.
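To see how the two multiclass metrics can differ, here is a small worked example. The numbers are made up for illustration and are not results from our training run: we assume 700 "good" samples with 630 predicted correctly and 300 "bad" samples with 150 predicted correctly, where $C$ is the number of classes, $n_c$ the number of samples in class $c$, and $k_c$ the number of those predicted correctly.

```latex
\text{MicroAccuracy} = \frac{\sum_{c=1}^{C} k_c}{\sum_{c=1}^{C} n_c}
                     = \frac{630 + 150}{700 + 300} = 0.78

\text{MacroAccuracy} = \frac{1}{C} \sum_{c=1}^{C} \frac{k_c}{n_c}
                     = \frac{1}{2}\left(\frac{630}{700} + \frac{150}{300}\right)
                     = \frac{0.90 + 0.50}{2} = 0.70
```

Micro Accuracy is dominated by the larger class, while Macro Accuracy gives each class equal weight, so it drops when a minority class is predicted poorly.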

Occasionally, there might be situations that require the usage of an additional metric. For details about the metrics available, please see the official documentation from Microsoft.

Conclusion

In this article, we’ve explored the ML.NET CLI tool, which enables us to automate and optimize machine learning model development and generation. It provides a simple but clear way to generate different ML models from a given dataset through the command line interface or a script.

Overall, the ML.NET CLI is a useful tool for any developer or data scientist who needs to develop ML models in an easy and automated way.
