As we continue our article series about ML.NET, we will look at the ML.NET Command Line Interface (CLI) tool.
Like ML.NET Model Builder, the ML.NET CLI uses AutoML to produce machine learning models.
Let’s dive in and see how we can use it to automate model generation.
What is the ML.NET CLI?
The ML.NET CLI, as the name suggests, is a command-line tool that uses AutoML to search for the best-performing machine learning model for a given dataset and selected scenario.
The ML.NET CLI is a .NET tool, so it is available for Windows, macOS, and Linux.
For a given dataset provided as an input parameter, ML.NET CLI generates both an ML model and C# code to run and train the model.
The current ML.NET CLI version supports the following commands:
- classification
- regression
- recommendation
- train
- image-classification
- text-classification
- forecasting
ML.NET CLI Installation
Before using the tool, we need to install it. It requires the .NET SDK to be installed on the machine. To verify the installed version, let’s open a command prompt or terminal and run the command:
PM> dotnet --version
If it returns a version number, we are good to go. If it fails, we must install the .NET SDK before installing the ML.NET CLI tool.
The install command format is:
dotnet tool install --global mlnet-<OS>-<ARCH>
So, to install it on 64-bit Windows, we run this command:
PM> dotnet tool install --global mlnet-win-x64
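As a rough illustration of the `mlnet-<OS>-<ARCH>` naming scheme, the sketch below builds the package name from an operating system and architecture. The `OS_NAMES` and `ARCH_NAMES` tables and the `mlnet_package` helper are assumptions made for this example and are not part of the tool; it is worth confirming the package name for a given platform on NuGet before relying on it.

```python
# Assumed mapping from Python's platform names to the <OS> and <ARCH>
# parts of the package name; these tables are illustrative only.
OS_NAMES = {"Windows": "win", "Linux": "linux", "Darwin": "osx"}
ARCH_NAMES = {
    "AMD64": "x64",
    "x86_64": "x64",
    "arm64": "arm64",
    "aarch64": "arm64",
    "ARM64": "arm64",
}


def mlnet_package(system: str, machine: str) -> str:
    """Build the tool package name following the mlnet-<OS>-<ARCH> format."""
    try:
        return f"mlnet-{OS_NAMES[system]}-{ARCH_NAMES[machine]}"
    except KeyError:
        raise ValueError(f"unsupported platform: {system}/{machine}") from None


# Example for 64-bit Windows, matching the command shown above.
print("dotnet tool install --global", mlnet_package("Windows", "AMD64"))
```

On a real machine, the two arguments could come from `platform.system()` and `platform.machine()` in Python's standard library.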
Occasionally, the installation might return the following error:
dotnet : Failed to create shell shim for tool 'mlnet-win-x64': Command 'mlnet' conflicts with an existing command from another tool.
This indicates that an older (obsolete) ML.NET CLI version is already installed on the system. We need to uninstall it first before we can install the latest version. To uninstall the tool, run the command:
PM> dotnet tool uninstall --global mlnet
Finally, we can repeat the install command to install the latest version on our system.
Using the ML.NET CLI
Now, we are ready to see the tool in action.
Calling the tool without any parameters will give us the basic information about its usage and available commands:
PM> mlnet
mlnet : Required command was not provided.
mlnet
Usage:
  mlnet [options] [command]
Options:
  --version <version>      Show version information.
  -?, -h, --help <help>    Show help and usage information.
Commands:
  classification           Train a custom ML.NET model for classification...
  regression               Train a custom ML.NET model for regression...
  recommendation           Train a custom ML.NET model for recommendation...
  train                    train using training config file
  image-classification     Train a custom ML.NET model for image classification...
  text-classification      Train a custom ML.NET model for text classification ...
  forecasting              Train a custom ML.NET model for time series forecasting...
Let’s use it on our dataset. We will use the same Credit Risk Customers dataset as in the previous article.
Since we want to classify the credit submission, we’ll use the classification command.
First of all, let’s see the available options by calling the tool with only the task name:
PM> mlnet classification
mlnet : Option '--dataset' is required.
Option '--label-col' is required.
classification
  Train a custom ML.NET model for classification...
Usage:
  mlnet [options] classification
Options:
  --dataset <dataset> (REQUIRED)               File path to single dataset or training dataset...
  --label-col <label-col> (REQUIRED)           Name or zero-based index of label (target)...
  --cache <Auto|Off|On>                        Specify [On|Off|Auto] for cache to be turned...
  --cv-fold <cv-fold>                          Number of folds used for cross-validation...
  --has-header                                 Specify [true|false] depending if dataset file(s)...
  --ignore-cols <ignore-cols>                  Specify columns to be ignored in given dataset....
  --log-file-path <log-file-path>              Path to log file.
  --name <name>                                Name for output project or solution to create...
  -o, --output <output>                        Location folder for generated output. Default...
  --split-ratio <split-ratio>                  Percent of dataset to use for validation...
  --train-time <train-time>                    Maximum time in seconds for exploring models...
  --validation-dataset <validation-dataset>    File path for validation dataset in train/valid...
  -v, --verbosity <verbosity>                  Output verbosity choices: q[uiet], m[inimal]...
Required options: --dataset, --label-col
As we can see, the tool provides clear information about its usage, including which options are available and which are required.
In our example, we want to predict the value of the class column, and we want to set the training time to 10 seconds.
Taking all this into consideration, we run the command:
mlnet classification --dataset "DataSets/credit_customers.csv" --label-col 20 --has-header true --train-time 10
Since our dataset has headers, we can also use a column name for the label column:
mlnet classification --dataset "DataSets/credit_customers.csv" --label-col "class" --has-header true --train-time 10
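When the same command has to be issued from a script, it can help to assemble the arguments programmatically. The following is a minimal sketch that only builds and prints the command line using the options shown above; the `classification_command` helper is a hypothetical name introduced for this example, and nothing here actually invokes `mlnet`.

```python
import shlex


def classification_command(dataset, label_col, has_header=True,
                           train_time=10, output=None):
    """Build the argument list for an `mlnet classification` run."""
    args = [
        "mlnet", "classification",
        "--dataset", dataset,
        "--label-col", str(label_col),  # column name or zero-based index
        "--has-header", "true" if has_header else "false",
        "--train-time", str(train_time),  # seconds
    ]
    if output is not None:
        args += ["--output", output]  # folder for the generated assets
    return args


cmd = classification_command("DataSets/credit_customers.csv", "class")
print(shlex.join(cmd))
```

On a machine where the tool is installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`.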
As a result, we see the training process details and the location of the generated assets:
Start Training
start multiclass classification
Evaluate Metric: MacroAccuracy
Available Trainers: LGBM,FASTFOREST,FASTTREE,LBFGS,SDCA
Training time in seconds: 10
|     Trainer                             MacroAccuracy Duration     |
|--------------------------------------------------------------------|
|0    FastTreeOva                                0.5751   0.6000     |
|1    FastTreeOva                                0.5854   0.2860     |
|2    FastTreeOva                                0.6822   0.4950     |
|3    FastForestOva                              0.6059   0.3550     |
|4    FastTreeOva                                0.5826   0.5150     |
|5    LightGbmMulti                              0.6465   0.1350     |
|6    LightGbmMulti                              0.6483   0.1390     |
|7    FastTreeOva                                0.6737   1.1230     |
|8    FastTreeOva                                0.6380   0.4040     |
|9    LightGbmMulti                              0.7010   0.1160     |
[Source=AutoMLExperiment, Kind=Info] cancel training because cancellation token is invoked...
|--------------------------------------------------------------------|
|                         Experiment Results                         |
|--------------------------------------------------------------------|
|                              Summary                               |
|--------------------------------------------------------------------|
|ML Task: multiclass classification                                  |
|Dataset: DataSets\credit_customers.csv                              |
|Label : class                                                       |
|Total experiment time : 9.0000 Secs                                 |
|Total number of models explored: 11                                 |
|--------------------------------------------------------------------|
|                       Top 5 models explored                        |
|--------------------------------------------------------------------|
|     Trainer                             MacroAccuracy Duration     |
|--------------------------------------------------------------------|
|9    LightGbmMulti                              0.7010   0.1160     |
|2    FastTreeOva                                0.6822   0.4950     |
|7    FastTreeOva                                0.6737   1.1230     |
|6    LightGbmMulti                              0.6483   0.1390     |
|5    LightGbmMulti                              0.6465   0.1350     |
|--------------------------------------------------------------------|
[Source=AutoMLExperiment, Kind=Info] cancel training because cancellation token is invoked...
save SampleClassification.mbconfig to ML.NET_CLI\SampleClassification
Generating a console project for the best pipeline at location : ML.NET_CLI\SampleClassification
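If we capture this output in a script, the trial rows can be read back to find the best-scoring trial. The sketch below assumes the row layout seen in the output above (index, trainer, score, duration between pipe characters); that layout is not a documented contract and may differ between tool versions. The `ROW` pattern and the `parse_trials` and `best_trial` helpers are names made up for this example.

```python
import re

# A few rows copied from the training output above, plus one non-table line.
LOG = """\
|5    LightGbmMulti    0.6465   0.1350 |
|7    FastTreeOva      0.6737   1.1230 |
|9    LightGbmMulti    0.7010   0.1160 |
[Source=AutoMLExperiment, Kind=Info] cancel training because cancellation token is invoked...
"""

# Assumed row shape: |<index> <trainer> <score> <duration> |
ROW = re.compile(r"^\|(\d+)\s+(\w+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s*\|$")


def parse_trials(text):
    """Return (index, trainer, score, duration) tuples for each trial row."""
    trials = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            index, trainer, score, duration = match.groups()
            trials.append((int(index), trainer, float(score), float(duration)))
    return trials


def best_trial(trials):
    """Pick the trial with the highest score (third tuple element)."""
    return max(trials, key=lambda trial: trial[2])


trials = parse_trials(LOG)
best = best_trial(trials)
print(f"best trial: #{best[0]} {best[1]} (MacroAccuracy {best[2]:.4f})")
```

A script could compare `best[2]` against a minimum score of its own choosing before accepting a retrained model.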
Among the generated files, we have the SampleClassification.mbconfig file:
{
  "Scenario": "Classification",
  "DataSource": {
    "Type": "TabularFile",
    "Version": 1,
    "FilePath": "DataSets\\credit_customers.csv",
    "Delimiter": ",",
    "DecimalMarker": ".",
    "HasHeader": true,
    "ColumnProperties": [...]
  },
  "Environment": {
    "Type": "LocalCPU",
    "Version": 1
  },
  ...
}
It is the same configuration file used in the ML.NET Model Builder tool. You can check the entire file in our source code.
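Because the configuration is plain JSON, a script can inspect it with any JSON parser. Here is a small sketch that reads only the fields visible in the excerpt above; the `MBCONFIG_EXCERPT` string and the `summarize` helper are illustrative, and `ColumnProperties` is left as an empty list because its contents are elided in the excerpt.

```python
import json

# Excerpt limited to the fields shown in the article. "ColumnProperties"
# is an empty placeholder here; the real file lists the dataset columns.
MBCONFIG_EXCERPT = r"""
{
  "Scenario": "Classification",
  "DataSource": {
    "Type": "TabularFile",
    "Version": 1,
    "FilePath": "DataSets\\credit_customers.csv",
    "Delimiter": ",",
    "DecimalMarker": ".",
    "HasHeader": true,
    "ColumnProperties": []
  },
  "Environment": { "Type": "LocalCPU", "Version": 1 }
}
"""


def summarize(config):
    """Collect the scenario, data source and environment settings."""
    source = config["DataSource"]
    return {
        "scenario": config["Scenario"],
        "dataset": source["FilePath"],
        "delimiter": source["Delimiter"],
        "has_header": source["HasHeader"],
        "environment": config["Environment"]["Type"],
    }


summary = summarize(json.loads(MBCONFIG_EXCERPT))
for key, value in summary.items():
    print(f"{key}: {value}")
```

For a real project, the same `summarize` function could be applied to the result of `json.load` on the generated `.mbconfig` file.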
ML.NET CLI Integration
A common scenario where the ML.NET CLI tool comes in handy is integration with CI/CD tools.
For example, we can automate model retraining in an Azure DevOps pipeline:
# Run on any branch, but only when the dataset file changes
trigger:
  branches:
    include:
    - '*'
  paths:
    include:
    - DataSets/credit_customers.csv

pool:
  vmImage: 'windows-latest'

steps:
# The script uses PowerShell variables, so it runs as a pwsh step
- pwsh: |
    dotnet tool install --global mlnet-win-x64
    $dataPath = 'DataSets/credit_customers.csv'
    $outputModelPath = 'CreditCustomerClassificationModel.zip'
    mlnet classification --dataset $dataPath --label-col "class" --has-header true --train-time 10 --output $outputModelPath
  displayName: 'Retrain Credit Customer Classification Model'
This script defines a pipeline to retrain the model when the credit_customers.csv file is changed.
Model Evaluation
The ML.NET CLI generates the “best model” based on the quality metrics. Depending on the task type, different metrics are used.
The default metric for binary classification problems is accuracy. For multiclass classification tasks, the usual metrics are micro-accuracy, which measures overall accuracy across all samples, and macro-accuracy, which averages the accuracy computed for each class; the run above was evaluated with MacroAccuracy. Finally, the default metric for value prediction (regression) tasks is RSquared, where values closer to 1 indicate a better fit.
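To make the difference between the two multiclass metrics concrete, here is a small worked example. It computes both values by hand on made-up labels, taking micro-accuracy as the fraction of correct predictions overall and macro-accuracy as the mean of the per-class accuracies. The data and function names are hypothetical, and this is not the code ML.NET itself runs.

```python
from collections import defaultdict


def micro_accuracy(y_true, y_pred):
    """Fraction of all samples that were predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def macro_accuracy(y_true, y_pred):
    """Average of the per-class accuracies, each class weighted equally."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical imbalanced labels: 8 "good" and 2 "bad" samples.
y_true = ["good"] * 8 + ["bad"] * 2
# A model that always answers "good".
y_pred = ["good"] * 10

print(micro_accuracy(y_true, y_pred))  # 0.8
print(macro_accuracy(y_true, y_pred))  # 0.5
```

The always-"good" model looks acceptable by micro-accuracy (8 of 10 correct) but scores only 0.5 on macro-accuracy, because it gets every sample of the smaller class wrong. This is why macro-accuracy is the more telling number on imbalanced datasets.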
Occasionally, there might be situations that require the usage of an additional metric. For details about the metrics available, please see the official documentation from Microsoft.
Conclusion
In this article, we’ve explored the ML.NET CLI tool, which enables us to automate and optimize machine learning model development and generation. It provides a simple but clear way to generate different ML models from a given dataset through the command line interface or a script.
ML.NET CLI is a useful tool for any developer or data scientist who needs to develop ML models in an easy and automated way.