🤖DecentrAI Whitepaper
A Blockchain-Based Decentralized AI Model Training Platform with Hybrid Cluster Computing
Abstract
The burgeoning demand for artificial intelligence (AI) and machine learning / deep learning applications has intensified the need for efficient, scalable, secure, and cost-effective AI model training solutions. Traditional centralized approaches to AI model training face challenges related to transparency, privacy, accountability, and cost. To address these issues, we present DecentrAI, a novel AI model training platform that combines a hybrid cluster computing model with blockchain technology, staking mechanisms, and enhanced privacy using zero-knowledge proofs (ZKP) to offer a decentralized, incentivized, privacy-preserving, and cost-effective solution for AI model training.
DecentrAI leverages data and model parallelism within a cluster computing framework to efficiently distribute the computational workload of large-scale AI model training across multiple nodes. This approach reduces the cost of training by utilizing the spare computing capacity of participating users, who can earn rewards for their contributions. The platform employs blockchain technology to create a secure, tamper-proof, and transparent environment for model training. Nodes participating in the training process stake cryptocurrency as collateral, which incentivizes accurate results and deters malicious behavior.
To enhance privacy, DecentrAI incorporates zero-knowledge proofs in the verification process. ZKPs allow nodes to prove the correctness of their training results without revealing the underlying data or model parameters, protecting sensitive information and ensuring privacy for all participants. This not only maintains the confidentiality of the training data but also prevents potential leakage of proprietary model information.
This paper presents the design, implementation, and evaluation of DecentrAI, a cost-effective blockchain-based AI model training platform that incorporates a hybrid cluster computing model, staking mechanisms, and enhanced privacy using zero-knowledge proofs. We believe that DecentrAI can provide scalability, security, privacy, and cost reduction, paving the way for a more democratic, collaborative, and accountable approach to AI model training with enhanced privacy, transparency, and financial incentives for users with spare computing capacity.
Introduction
Blockchain, a distributed ledger technology that has transformed many industries with its decentralized and transparent nature, provides a tamper-proof and immutable record of transactions. Its cryptographic algorithms ensure the integrity of the data, while its distributed consensus mechanism eliminates the need for intermediaries, making it an ideal solution for trustless environments. With the advent of smart contracts, blockchain has become even more powerful, enabling automated and transparent execution of complex business logic.[1]
The integration of Artificial Intelligence (AI) and big data, on the other hand, has seen significant growth in recent years, with a focus on developing more advanced and complex applications involving data-intensive workloads, data analytics, predictive analysis, generative AI, and automation. AI and big data have become integral components across various domains. The demand for powerful AI models has led to a surge in data collection and processing requirements, resulting in an exponential increase in required computational resources [2].
To meet these requirements, centralized server-based approaches have been the norm, in which a single entity or organization is responsible for collecting, storing, and processing vast amounts of data in a central location [3]. Though these cloud-based centralized servers have enabled the development of highly complex applications and models, they have limitations ranging from data privacy and cost-effectiveness to transparency and trust, and above all the risk of a single point of failure.
Recently, decentralized and distributed computation using blockchain technology has emerged as a promising solution that can address these concerns and limitations and provide a secure and transparent platform for data-intensive and CPU-intensive tasks [4]. In a decentralized setting, the data can be partitioned and distributed among nodes in a blockchain network, with each node responsible for running the appropriate computation model and producing the results. Decentralization is also useful in exploiting model parallelism, such that a computationally intensive task is divided among worker nodes. Such an arrangement may also require synchronization among nodes during iterations of computation. In either case, the results are combined in a collaborative and privacy-enabled manner, where the privacy of results can be ensured using the zero-knowledge proof (ZKP) protocol. The protocol enables nodes to prove that they have performed honest computation without revealing any sensitive information [5]. Coupled with an efficient staking mechanism, in which validator nodes stake cryptocurrency as collateral for honest participation in model training, this makes the decentralized approach to AI model training more secure and trustworthy [6].
Decentralized computation can provide several advantages over centralized approaches, including faster computation, privacy, security, and scalability. The distributed nature of blockchain technology ensures that there is no central point of failure or attack, making the training process more secure. Furthermore, the use of multi-party computation (MPC) can ensure that the privacy of the training data is maintained throughout the training process, mitigating concerns over data privacy [7]. Decentralized computation also enables more efficient use of computational resources and promotes parallelization: since the workload is distributed among nodes in the network, the need for large centralized data centers is mitigated.
The motivation behind DecentrAI stems from the need to develop a decentralized platform for data-intensive and computationally intensive tasks. Foremost among these is an AI model training platform that combines the benefits of a hybrid cluster computing model, blockchain technology, staking mechanisms, and zero-knowledge proofs. By leveraging these technologies, DecentrAI aims to provide a more democratic, collaborative, and accountable approach to AI model training, enabling users with spare computing capacity to contribute to the training process and earn rewards while maintaining privacy and security. In doing so, DecentrAI seeks to empower a wider range of stakeholders to participate in the development and deployment of advanced AI models, fostering innovation and driving the AI field forward.
Related Work
The rapid advancements in Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) technologies have revolutionized various industries, including healthcare, finance, automotive, and entertainment. These technologies rely on the training of sophisticated AI models to make accurate predictions and decisions based on vast amounts of data. As the complexity of AI models and the size of training datasets continue to grow, the computational requirements for training these models have increased substantially. This has led to a rising demand for more efficient, scalable, and cost-effective AI model training solutions.
Traditionally, AI model training has been carried out using centralized approaches, where powerful computing resources, such as cloud-based servers or high-performance computing clusters, are employed to handle the demanding computational tasks. While these centralized solutions have been effective in many cases, they present several challenges, including limited scalability, data privacy concerns, high costs, lack of transparency, and single points of failure.
In recent years, there has been growing interest in exploring decentralized and distributed computing solutions to address these challenges. Decentralized AI model training platforms can distribute the computational workload across multiple nodes, harnessing the power of a diverse network of participants to collaboratively train AI models. This approach offers the potential to overcome the limitations of centralized systems, enhance privacy and security, reduce costs, and foster greater transparency and trust among stakeholders.
The Decentralized and Collaborative AI (DCAI) on Blockchain project by a Microsoft team, conducted in 2019 and since renamed Sharing Updatable Models (SUM) on Blockchain, demonstrated through open-source code how blockchain technology can be used to democratize artificial intelligence (AI) by providing a framework for collaborative and decentralized AI, in which participants can collaboratively train and maintain models on public blockchains, making them free to use for evaluating predictions. In their framework, using smart contracts, models can be updated on chain for a small transaction fee or used for inference off-chain with no transaction costs. They incorporated a staking mechanism, requiring collaborating nodes to stake a collateral amount for reward and punishment purposes, to necessitate honest behavior. [8]
BOINC (Berkeley Open Infrastructure for Network Computing) is an open-source platform designed to support volunteer and grid computing projects [9]. Developed at the University of California, Berkeley, BOINC enables researchers to harness the unused processing power of volunteers’ computers to perform large-scale scientific computing tasks. Volunteers download the BOINC software, which runs as a background process on their computers, and then choose the projects they want to support[10].
The paper titled, "Towards Federated Learning at Scale: System Design" by researchers from Google Research discusses the design, implementation, and deployment of a large-scale federated learning system[11]. The authors address key aspects of federated learning, such as secure aggregation, on-device training, and fault tolerance, to enable privacy-preserving and efficient machine learning across multiple devices. The system is designed to overcome challenges associated with distributed learning, including communication constraints, device heterogeneity, and data privacy. The work provides valuable insights into the practical considerations of deploying federated learning at scale and serves as a foundational reference for researchers and practitioners working on similar systems.
In other research by Google Research [12], the authors introduce techniques such as structured updates, sketched updates, and quantization to reduce the communication overhead during the training process. This work offers valuable insights into optimizing federated learning systems, addresses the critical challenge of communication bottlenecks, and can be considered a significant contribution to the development of more efficient privacy-preserving machine learning / deep learning approaches.
Another paper presents a Secure Aggregation protocol for federated learning on user-held data, which preserves the privacy of each user's model gradient. The protocol aims to be efficient, robust, and secure against various attack scenarios. The authors employ a double-masking scheme to protect user data even when the server can recover a user's perturbations; the protocol is communication-efficient and can handle up to 1/3 of users dropping out. It uses secret sharing, masking, and unmasking rounds to ensure privacy and security. The authors also suggest a refinement of the protocol to reduce communication costs and a practical deployment using server-mediated key agreement. [13]
Challenges in Centralized AI Model Training
Centralized AI model training, which involves using a single or few powerful computing resources to train AI models, has been the dominant approach in the field of machine learning / deep learning. Despite its widespread use, centralized training presents a number of challenges that may limit its effectiveness and applicability in various scenarios.
These challenges include:
Data Privacy: In a centralized training environment, sensitive data must often be shared with third-party cloud providers or centralized data centers, raising concerns about data privacy and security. Ensuring that sensitive data is protected and compliant with regulations such as GDPR and HIPAA can be challenging in a centralized setting.[14][15]
Cost: The computational resources required for training large-scale AI models can be expensive, especially when relying on cloud-based services or high-performance computing infrastructure. These costs can be prohibitive for small organizations or individual researchers, limiting their ability to develop and deploy advanced AI models.[16]
Transparency and Trust: In centralized training settings, it can be difficult to ensure transparency and trust between different stakeholders, such as data providers, model developers, and users. This lack of transparency can lead to concerns about data misuse, biased models, or unfair distribution of rewards and incentives.[17]
Single Point of Failure: Centralized AI model training systems are vulnerable to single points of failure, where issues such as hardware failures, software bugs, or security breaches can impact the entire training process. This can result in significant delays or loss of valuable data and model progress.[18]
These challenges highlight the need for alternative approaches to AI model training that can address the limitations of centralized systems. Decentralized AI model training platforms, like DecentrAI, offer promising solutions to overcome these challenges by distributing the training workload across multiple nodes, enhancing privacy and security, reducing costs, and fostering greater transparency and trust among participants.
In the next part of this section, we present various projects and frameworks that provide distributed deep learning or machine learning as well as high-performance functionalities, which reinforce the DecentrAI platform's viability and applicability.
Projects and Frameworks
Distributed Machine Learning / Deep Learning
High-Performance Deep Learning and Machine Learning Project at Ohio State University
The HiDL/HiML initiatives aim to create innovative parallelization methods for training advanced out-of-core models and to optimize distributed DL/ML training and inference performance through modern HPC technologies. To date, numerous papers have been published as part of these projects, a few of which are [19; 20; 21].
ChainerMN: Scalable Distributed Deep Learning Framework
The utilization of high computing power, primarily GPUs, has been a key factor in the breakthrough of deep learning across various fields. To address unsolved challenges and further enhance deep learning, the ability to leverage distributed processing for increased computing power is crucial. In this regard, ChainerMN [22], a distributed deep learning framework, was designed, implemented, and evaluated. The tests demonstrate that ChainerMN can effectively scale the learning of the ResNet-50 model on the ImageNet dataset, utilizing up to 128 GPUs with a parallel efficiency of 90%.
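The parallel-efficiency figure quoted above follows the standard definition of speedup divided by worker count; a minimal Python sketch with hypothetical timings (not figures reported by the ChainerMN authors):

```python
# Parallel efficiency (a standard metric, not specific to ChainerMN):
# efficiency = speedup / workers, where speedup = T1 / TN.
def parallel_efficiency(t_single: float, t_parallel: float, workers: int) -> float:
    speedup = t_single / t_parallel
    return speedup / workers

# Hypothetical timings: 128 workers finishing in 1/115.2 of the
# single-GPU time corresponds to 90% efficiency.
print(round(parallel_efficiency(1.0, 1.0 / 115.2, 128), 2))  # 0.9
```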
Decentralized and Collaborative AI on Blockchain
The centralization of large datasets and proprietary algorithms used in machine learning / deep learning hinders collaboration and the ability to keep models up to date. To address this issue, DCAI [23] suggests a collaborative framework where participants can build a dataset and host a continually updated model using smart contracts on a blockchain. This model would be publicly shared for inference without charge. The framework includes financial and gamified incentives for contributors to provide high-quality data, and it is suitable for learning problems where the model is used repeatedly for similar input. An open-source implementation for Ethereum blockchain is provided.
Horovod
Uber Engineering developed Horovod [24] in 2017 as an open-source distributed training framework for deep learning models. Its primary aim is to enhance the efficiency and speed of training large-scale datasets. Horovod is compatible with various deep learning frameworks, including TensorFlow, PyTorch, and Keras, enabling effortless parallelization of training across multiple GPUs or nodes in a cluster.
Deeplearning4J (DL4J)
DeepLearning4J (DL4J) [25] is a potent open-source deep learning library created for the Java Virtual Machine (JVM). It was built to be fast, adaptable, and scalable, making it perfect for distributed training across multiple machines. DL4J offers various tools for distributed training, including support for Spark and Hadoop, as well as its own built-in distributed training capabilities. The library also provides support for a range of neural network architectures and training algorithms, and includes tools for data preprocessing, visualization, and evaluation.
PyTorch
Developed by Facebook, PyTorch [26; 27] is one of the most widely used and easy-to-learn deep learning frameworks. With PyTorch, creating and executing neural network modules is highly efficient, and the framework's distributed training modules allow for parallel training with minimal code. PyTorch offers three approaches to distributed training:
nn.DataParallel: This package allows you to perform parallel training on a single machine with multiple GPUs. One advantage is that it requires minimal code changes.
nn.DistributedDataParallel: This package allows you to perform parallel training across multiple GPUs on multiple machines. It requires a few extra steps to configure the training process.
torch.distributed.rpc: This package allows you to implement a model-parallelism strategy. It is very useful when your model is large and does not fit on a single GPU.
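As an illustration of the first approach, a minimal nn.DataParallel sketch; the model architecture and batch sizes here are illustrative assumptions, not part of the platform:

```python
import torch
import torch.nn as nn

# A tiny illustrative model wrapped in nn.DataParallel. With multiple
# GPUs available, the input batch is split across devices and the
# partial outputs are gathered; on a CPU-only machine it transparently
# falls back to ordinary single-device execution.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
parallel_model = nn.DataParallel(model)  # one extra line vs. plain training

batch = torch.randn(8, 10)   # batch of 8 samples, 10 features each
out = parallel_model(batch)  # forward pass is parallelized when GPUs exist
print(out.shape)             # torch.Size([8, 2])
```

The same training loop code works with or without the wrapper, which is why the section above describes it as the minimal-change option.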
DeepSpeed
PyTorch’s distributed training approach is primarily focused on data parallelism, while DeepSpeed [28], a framework built on top of PyTorch by Microsoft, targets model-parallelism and offers distributed training for large-scale models.
Distributed TensorFlow
Google’s TensorFlow supports distributed training [29; 30], utilizing data-parallel techniques. The tf.distribute API is available for leveraging distributed training on TensorFlow, offering a configurable training setup according to specific requirements.
TensorFlowOnSpark
Apache Spark is an open-source big data processing platform, capable of performing various data-related tasks such as data engineering, data science, and machine learning / deep learning. In order to use TensorFlow on Apache Spark, TensorFlowOnSpark must be utilized. Developed by Yahoo, TensorFlowOnSpark [31] is a machine learning framework that enables distributed training and inference on Apache Spark Clusters and Apache Hadoop. The framework facilitates distributed training and inference with minimal code changes to existing TensorFlow code on the shared grid.
BigDL
BigDL is an open-source framework that enables distributed training on Apache Spark, developed by Intel to enable running deep learning algorithms on Hadoop and Spark clusters. It offers an end-to-end pipeline for both data analysis and deep learning applications, which makes it easier to build and process production data [32].
Ray
Ray [33] is an open-source framework for distributed computing and training that includes tools for launching GPU clusters on any cloud provider. In contrast to other libraries, Ray is highly flexible and can operate on a variety of platforms, such as Azure, GCP, AWS, Apache Spark, and Kubernetes.
Flux
Flux is an open-source deep learning library that is designed for building and training machine learning models in a distributed environment [34]. It is written in the Julia programming language, making it a popular choice among researchers and developers. Flux has the ability to scale to multiple processors and GPUs, which is an advantage for faster training of large-scale models. It supports distributed training by splitting the workload across multiple devices and combining the results at the end.
Data Intensive High-Performance Computing
Apache Hadoop
Apache Hadoop is an open-source framework for distributed storage and processing of large data sets across clusters of computers [35; 36; 37]. It was initially inspired by Google’s MapReduce and Google File System (GFS) papers and was designed to handle Big Data applications such as web search indexing, log processing, and large-scale data analytics. Hadoop provides a distributed file system called Hadoop Distributed File System (HDFS) that can store large data sets across multiple nodes in a cluster, as well as a processing framework called MapReduce that allows data processing to be distributed across the nodes. With its ability to scale horizontally, Hadoop has become a popular solution for organizations that need to process and analyze large amounts of data. It has also spawned a rich ecosystem of related tools and technologies, including Hive, Pig, Spark, and HBase, among others, making it a versatile and widely-used solution for Big Data processing.
Apache Spark
Apache Spark [38] is a popular open-source distributed computing system that is widely used for big data processing. It offers a unified platform to process and analyze large-scale data in a distributed computing environment. Spark allows developers to write applications in various programming languages, including Scala, Java, Python, and R. It provides various libraries for different data processing tasks, such as Spark SQL for data processing and analysis, MLlib for machine learning, GraphX for graph processing, and Streaming for real-time data processing. Spark can run on various big data platforms, such as Hadoop, Kubernetes, and Apache Mesos. Its ability to process data in-memory, with fault tolerance, makes it a popular choice for big data processing tasks.
OpenMPI
OpenMPI is an open-source message passing interface (MPI) that is widely used for parallel computing [39]. It provides a standardized interface for communication between processes and is compatible with a wide range of computing platforms, including clusters, grids, and supercomputers. OpenMPI is designed to be highly scalable, allowing users to run parallel applications on a large number of computing nodes. It also includes a number of advanced features, such as dynamic process spawning and process fault tolerance, which make it a popular choice for large-scale distributed computing. OpenMPI has been used in a wide range of scientific and engineering applications, from astrophysics simulations to climate modeling and molecular dynamics.
Apache Flink
Apache Flink is an open-source distributed stream processing framework for high performance, scalable, and fault-tolerant data streaming applications [40; 41]. It is designed to process large volumes of data streams in real-time and offers a unified programming model for both batch and stream processing. One of the key strengths of Flink is its ability to support event-driven, data-driven, and time-driven processing models. It provides various APIs for building stream processing applications, including a DataStream API for building streaming pipelines and a Table API for building SQL-like queries on streaming data. Flink also offers various connectors to integrate with other systems and tools, such as Apache Kafka, Apache Hadoop, and Elasticsearch.
Apache Storm
Apache Storm [42] is an open-source distributed real-time computation system that processes large streams of data in real-time. It is designed to be fault-tolerant, scalable, and reliable. Apache Storm provides a programming model that allows users to define processing pipelines using a set of primitives, such as spouts and bolts. Spouts are the source of the data stream, while bolts perform transformations and aggregations on the data. Storm provides support for multiple programming languages, including Java, Python, and Clojure. The system also integrates with various other Apache projects, such as Hadoop and Kafka, to provide a complete real-time data processing solution. Storm is commonly used in industries such as finance, telecommunications, and advertising for real-time data analysis and processing.
Problem Statement
In this section, we describe the problem of maintaining a blockchain-based distributed computation and storage environment. We also highlight key requirements for developing an enriched solution.
The overall goal is motivated by the concept of volunteer computing. However, there are additional requirements of ensuring privacy and data integrity. Following are the key requirements:
The system should facilitate model and data distribution to achieve parallelism.
The proposed system should be able to cater to the threat of incorrect results from faulty and malicious nodes.
It should have an intelligent mechanism of data and model aggregation for compiling results.
It should ensure data integrity and user privacy.
Possible Use Cases
In this section, we describe possible use cases of the DecentrAI network. Below is a non-exhaustive list of applications that can benefit from DecentrAI:
For Partition/Aggregate Problems: Such problems follow the Single Instruction/Program Multiple Data (SIMD/SPMD) pattern, in which a large dataset is divided into smaller chunks. Each node computes the same instructions on a different set of chunks, and the final result is computed by aggregating the results from each node. These problems are inherently parallel. Examples include wordcount [43], item frequency, market basket analysis [44], and distributed search.
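The partition/aggregate pattern can be sketched with a local word-count example, where threads stand in for decentrides and the data is illustrative:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Partition/aggregate sketch of word counting. In DecentrAI each chunk
# would be handled by a separate decentride rather than a local thread.
def count_words(chunk):
    # Same instructions applied to a different chunk of the data (SPMD).
    return Counter(word for line in chunk for word in line.split())

lines = ["the quick brown fox", "the lazy dog", "the fox jumps"]
chunks = [lines[0:1], lines[1:2], lines[2:3]]       # partition step

with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(count_words, chunks))  # parallel map

total = sum(partials, Counter())                    # aggregate step
print(total["the"])  # 3
```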
Distributed Machine Learning / Deep Learning Problems: DecentrAI can be extremely beneficial in accommodating various node topologies. Besides the peer-to-peer arrangement, computing nodes can be arranged in other topologies such as tree and centralized topologies (ensemble learning). It can also utilize different synchronization protocols, such as the Bulk Synchronous Protocol (BSP) and the Stale Synchronous Protocol (SSP) [45]. We anticipate that this flexibility of DecentrAI will facilitate its wide-scale usability.
For Ensemble Learning Methods: In such methods, the Machine Learning (ML) / Deep Learning (DL) model needs to be evaluated using different models to determine the most appropriate method. Since the models are independently executed (in parallel) on different resources, DecentrAI can provide a useful platform for such tasks [46].
Privacy-aware Computing: DecentrAI can ensure privacy for users and providers through blockchain-based privacy measures. For instance, Zero-Knowledge Proofs (ZKP) can ensure user anonymity [47]. For this purpose, DecentrAI provides an API.
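As a simplified illustration of the commitment idea underlying such protocols, the sketch below is a plain hash commit/reveal scheme, not a zero-knowledge proof; real ZKP constructions, as referenced in [47], are considerably more involved:

```python
import hashlib
import secrets

# Commit/reveal sketch (NOT a full ZKP): a node commits to its result
# without revealing it, and later proves the commitment matched.
def commit(result: bytes):
    nonce = secrets.token_bytes(16)                   # blinding factor
    digest = hashlib.sha256(nonce + result).digest()  # published commitment
    return digest, nonce

def verify(commitment: bytes, nonce: bytes, revealed: bytes) -> bool:
    return hashlib.sha256(nonce + revealed).digest() == commitment

c, n = commit(b"model-accuracy:0.93")
print(verify(c, n, b"model-accuracy:0.93"))  # True
print(verify(c, n, b"model-accuracy:0.10"))  # False
```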
Data and Model Integrity: DecentrAI could be highly useful in scenarios where the integrity of results and models is desired. The inherent data-integrity mechanism of blockchain can be leveraged to provide model and result integrity. For instance, results are signed with the user's private key so that their integrity can be maintained. Similarly, any model or application published through the DecentrAI marketplace can also be verified [48].
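The signing step described above can be sketched as follows; to keep the example self-contained it uses a symmetric HMAC as a stand-in, whereas the platform would use an asymmetric private-key signature scheme such as ECDSA or Ed25519:

```python
import hashlib
import hmac

# Integrity sketch: tag a result so tampering is detectable. The key
# name and payloads are hypothetical.
def sign_result(key: bytes, result: bytes) -> str:
    return hmac.new(key, result, hashlib.sha256).hexdigest()

def verify_result(key: bytes, result: bytes, tag: str) -> bool:
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(sign_result(key, result), tag)

key = b"decentride-secret"            # hypothetical per-user key
tag = sign_result(key, b"result-chunk-42")
print(verify_result(key, b"result-chunk-42", tag))  # True
print(verify_result(key, b"tampered-chunk", tag))   # False
```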
DecentrAI - Platform Overview
The prime objective of the DecentrAI framework is to offer distributed training and testing of ML- and AI-based jobs. However, a user can also submit high-performance computing jobs to the platform. This section provides a complete overview of the DecentrAI platform. In Section 4.1, we present users and their types; in Section 4.2, we discuss the different types of supported jobs; in Section 4.3, we present the core services provided by the platform; in Section 4.4, we present the different categories of nodes; and in Section 4.5, we discuss necessary platform operations.
DecentrAI User Types
There can be three different types of users on the DecentrAI platform, as presented below.
DecentrAI User (DU): A DecentrAI user refers to an end-user who wants to execute his machine-learning / deep learning task on the DecentrAI platform. A DU can be an individual, such as a student or researcher, who wants to use blockchain capabilities to build, train, and test his model in a distributed manner to achieve faster and more reliable results. Another type of user on the DecentrAI platform is an organization or an institute whose objective is to bring new products to market more quickly by leveraging blockchain and distributed computing. These organizations can be commercial, nonprofit, educational, or research institutes.
DecentrAI Service Providers (DSP): DecentrAI service providers are another type of user, offering their computational power for decentralized model training, testing, and job execution services. A DSP can consist of a single decentride or a cluster of decentrides on the DecentrAI platform. A decentride is an end node on the DecentrAI platform that is responsible for the execution of an actual job.
DecentrAI App Developer (App Dev): A DecentrAI app developer is essentially a developer who creates and sells his application services on the DecentrAI marketplace.
After completing the registration process, users who offer a variety of services, such as DecentrAI Service Providers (DSP) and DecentrAI App Developers (App Dev), become available on the DecentrAI Marketplace.
DecentrAI Job Types
A DU can submit different types of jobs on the DecentrAI platform. Each job type has different requirements and resource needs that must be met before the job is approved for execution. In this section, we present and discuss a few job types we plan to support initially.
Model Training Job - Cat I: A user can submit a request to train the model on the DecentrAI platform. In this scenario, the outcome is the trained model.
Model Training and Testing Job - Cat II: A user wants to use the DecentrAI platform for model training and testing purposes. In this scenario, the outcomes are (i) a trained model and (ii) testing services.
Testing Job - Cat III: A user who already has a trained model can use the DecentrAI platform to test it. Alternatively, a user can use an already trained model available on the DecentrAI marketplace. The outcome for this scenario is testing services.
High performance computing (HPC) Job - Cat IV: In this job category, a user can submit any other job that requires high-performance computing.
Text to Image Generation Job - Cat V: In this job category, a user can submit text and the DecentrAI system will generate an image based on the provided text.
Image to 3D Model - Cat VI: In this job category, a user can submit image(s) of an object and the DecentrAI system will create a 3D model of the provided object.
Custom Job Type - Cat VII: In this category, a user can submit a custom job type not listed in the above categories. The system is capable of executing the job and generating the desired output.
Core Services
DecentrAI core services provide the basic building blocks necessary for the operation of the platform. Core services can be categorized into different groups, as shown in Figure 1. Each group of services can be executed on a separate node. However, the platform also allows its users to run all services on a single node.
Application Services:
Application Registry Services (ARS): An approved application must first be registered. Application registry services provide application registration and maintain the application registries.
Application Approval Services (AAS): An application built by a developer must be approved before it becomes available on the application registry. Application approval services handle this approval process.
Application Lookup Services (ALS): The job of application lookup services is to provide a search service from which users can search for a particular application.
Job Services:
Job Submission Services (JSS): A user can submit a job to the DecentrAI platform via job submission services running on different nodes on the platform.
Job Validation Services (JVS): DecentrAI Job Validation Services (JVS) validate and verify the job requirements as per the agreed protocol. After the verification and validation processes, the job is approved and its state is set to ready. The identification of the job category is also the responsibility of JVS.
Job Splitting Services (JSS): The responsibility of the job splitting services is to split the submitted job into small chunks of parallel execution units. Each job chunk can be scheduled independently by the job scheduling layer.
Job Scheduling Services (JSS): The responsibility of the job scheduling services is to schedule the split job chunks. The job scheduling services make use of different parameters in order to schedule a job, for example, the reputation score of the decentride, the reliability factor presented next, the job type as presented in Section 4.2, DU requirements, etc.
Result Evaluation and Validation Services (REVS): The evaluation and validation of the results submitted by different decentrides is the responsibility of the result evaluation and validation services. The DecentrAI platform offers three reliability models.
High Reliability: In high reliability model, the same job chunk is scheduled to five different decentrides.
Standard Reliability: In standard reliability model, the same job chunk is scheduled to three different decentrides.
No Reliability: In the no reliability model, the same job chunk is scheduled to one decentride only.
In all three models, REVS decides based on majority similarity of the submitted results.
Results Integration Services (RIS): Result integration services play a crucial role in combining the results received from different decentrides.
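The majority-similarity rule used by REVS across the three reliability models can be sketched as follows. This is a minimal illustration; comparing results by digest and all function names are assumptions, not part of the protocol specification.

```python
import hashlib
from collections import Counter

REPLICAS = {"high": 5, "standard": 3, "none": 1}  # decentrides per job chunk

def result_digest(result_bytes: bytes) -> str:
    """Compare results by digest so large outputs need not be diffed byte-by-byte."""
    return hashlib.sha256(result_bytes).hexdigest()

def evaluate_chunk(results: list, reliability: str):
    """Accept the result returned by a strict majority of decentrides, or None."""
    expected = REPLICAS[reliability]
    if len(results) < expected:
        return None  # wait for the remaining submissions
    counts = Counter(result_digest(r) for r in results)
    digest, votes = counts.most_common(1)[0]
    if votes * 2 > expected:  # strict majority
        return next(r for r in results if result_digest(r) == digest)
    return None  # no consensus: the chunk must be rescheduled

# Example: standard reliability, two of three decentrides agree.
winner = evaluate_chunk([b"w=0.91", b"w=0.91", b"w=0.42"], "standard")
```

In the no reliability model the single result trivially forms the "majority", which is why that model trades assurance for cost.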
Resource Services:
Resource Discovery: This service is responsible for resource discovery on the platform.
Resource Monitoring Services (RMS): The monitoring of available resources on a decentride is a crucial task on which the entire process of job scheduling depends. The resource monitoring service monitors the available and occupied resources on a node and sends notifications to the job scheduling layer. When the job scheduling layer misses three consecutive health-status notifications from a decentride, the decentride is assumed to have gone offline, and the job scheduling layer reassigns its job to the next decentride.
Resource Broker Services (RBS): DecentrAI resource brokers can be considered decentralized brokers whose job is to select the most appropriate resource (in terms of CPU/GPU, memory, storage, network, etc.).
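The RMS rule above, three missed consecutive notifications before a decentride is considered offline, can be sketched as a small state machine. The class and method names are illustrative assumptions.

```python
MISSED_LIMIT = 3  # consecutive missed health notifications before a node is marked offline

class DecentrideHealth:
    """Tracks one decentride; tick() is invoked once per heartbeat interval."""

    def __init__(self):
        self.missed = 0
        self.online = True

    def on_notification(self):
        """A health-status notification arrived: reset the missed counter."""
        self.missed = 0
        self.online = True

    def tick(self, notified: bool) -> bool:
        """Advance one heartbeat interval; returns whether the node is still online."""
        if notified:
            self.on_notification()
        else:
            self.missed += 1
            if self.missed >= MISSED_LIMIT:
                self.online = False  # scheduler reassigns this node's job chunks
        return self.online
```

A node that misses two intervals and then reports back is still online; only the third consecutive miss triggers the offline transition and job reassignment.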
Economy Services:
DecentrAI Trading Services (DTS): DTS enables resource trading and the execution of user requests directed through DRBs.
DecentrAI Market Directory (DMD): The interaction between DUs and DSPs during the resource trading (service cost establishment) process is mediated through a DecentrAI market directory (DMD).
DecentrAI Nodes Types
Nodes on the DecentrAI platform can be of different types based on the types of services they are rendering. For example, a core node provides all core services of the platform. Initially, our plan is to start with two types of nodes. First is the decentride node, which is responsible for the execution of the jobs, and second is the core node, which provides all core services. However, on the basis of the type of services provided by a node, it can be further classified into app nodes, job nodes, resource nodes, eco nodes, etc.
DecentrAI Platform Operations
In this section we present the different platform services or processes necessary for the operation of the DecentrAI platform’s overarching functionality.
DecentrAI Service Provider (DSP) Registration Process
The following steps must be performed by a DSP in order to offer its services.
A DSP must be registered on the DecentrAI platform by creating a special wallet address or account.
After the initial registration process, the DSP must deposit 1000 DCE coins into the DSP deposit contract.
When the above steps are performed successfully, a DSP is required to onboard at least one decentride on the network. The decentride registration and onboarding process is presented in Section 4.5.2.
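The DSP registration steps above can be sketched as a simple registry check. The 1000 DCE deposit comes from the text; the class shape and activation rule are illustrative assumptions.

```python
DSP_DEPOSIT = 1000  # DCE coins required in the DSP deposit contract (from the text)

class DSPRegistry:
    """Minimal sketch of DSP registration state; not an on-chain contract."""

    def __init__(self):
        self.deposits = {}     # wallet address -> staked DCE
        self.decentrides = {}  # wallet address -> onboarded node addresses

    def register(self, wallet: str, deposit: int) -> bool:
        """Steps 1-2: create the account and check the required deposit."""
        if deposit < DSP_DEPOSIT:
            return False
        self.deposits[wallet] = deposit
        self.decentrides[wallet] = []
        return True

    def is_active(self, wallet: str) -> bool:
        """Step 3: a DSP must onboard at least one decentride to be active."""
        return wallet in self.deposits and len(self.decentrides[wallet]) >= 1
```
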
Decentride Registration and Onboarding Process
A DSP is required to follow the steps below for a successful registration and onboarding of a decentride on the network.
A DSP must install Decentride Node Agent (DNA) on the end node.
The decentride node agent (DNA) can be installed within any operating system (OS) as an application service in a secure containerized environment. A node must also meet basic hardware and software requirements to become a decentride.
A unique DNA node public address, associated with its private key, is generated.
When the DNA is successfully installed on a node, it connects with the Decentride Registration Services (DRS). The interaction between DRS provider nodes and a potential decentride takes place over a secure communication channel using the DecentrAI Zero-Knowledge Protocol (DZKP).
Decentride Registration Services (DRS) are hosted on different nodes of the blockchain network.
The job of DRS providers is to verify and validate the node’s minimum requirements to become a decentride. These requirements include:
The coins staked by the DSP.
The agreed minimum CPU/GPU cycles or percentage staked.
The agreed minimum memory staked.
The agreed minimum storage staked.
The agreed minimum network bandwidth staked.
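The DRS-side verification of these minimums can be sketched as a checklist walk. The threshold values below (other than the 1000 DCE DSP stake) are assumptions for illustration; the document does not specify them.

```python
# Assumed minimums; only dsp_stake_dce (1000 DCE) is stated in the text.
MINIMUMS = {
    "dsp_stake_dce": 1000,   # coins staked by the DSP
    "cpu_gpu_pct": 25,       # percentage of CPU/GPU cycles staked
    "memory_gb": 8,
    "storage_gb": 100,
    "bandwidth_mbps": 50,
}

def verify_decentride(candidate: dict) -> list:
    """Return the unmet requirements; an empty list means registration may proceed."""
    return [key for key, minimum in MINIMUMS.items()
            if candidate.get(key, 0) < minimum]

failures = verify_decentride({"dsp_stake_dce": 1000, "cpu_gpu_pct": 30,
                              "memory_gb": 16, "storage_gb": 250,
                              "bandwidth_mbps": 20})
# only the bandwidth check fails for this candidate
```
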
DecentrAI User (DU) Registration Process
In this section, we present DU registration process.
A user account must be created to submit a job. It can be created using either the DecentrAI web application or the DecentrAI mobile application.
In both cases, a unique DU public address associated with the private key is generated.
The user must deposit the necessary DCE coins to submit and execute a task.
There can be different account types on the DecentrAI platform. Each account type is used for a different purpose and may have a different process for account creation and activation. For example:
Individual account
Group account
Business account
University account, etc.
DecentrAI Job Execution Lifecycle
In this section, we present the entire job execution lifecycle.
The first step is the submission of a job by a DU to the DecentrAI platform.
The DecentrAI platform receives the submitted job and forwards it to a core node.
The Job Validation Services (JVS) running on the core node validate and verify the job requirements as per the agreed protocol.
Once the job validation services have validated all requirements and prerequisites for the job, the job splitting service splits the job.
Several parameters are considered by the job splitting services before splitting begins, e.g., the job category as discussed in Section 4.2, the application types to be used, the storage location and availability of the dataset, network delays, potential DSPs and decentrides, the job's economic and scheduling needs, etc.
When the job is split successfully, the job scheduling algorithm schedules the individual chunks to the appropriate decentrides.
Decentrides execute the job and submit the results back to the core nodes.
When the results are received from the decentrides, the result evaluation and validation services running on the core nodes validate the results as per the requested reliability model.
The validated results are combined using the result integration services.
The final result is sent back to the DU.
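The lifecycle above can be sketched end to end as one pipeline. Every function passed in is an illustrative stand-in for the corresponding service (JVS, splitting, scheduling, REVS, RIS); the toy usage "trains" by summing per-chunk partial sums.

```python
def execute_job(job, validate, split, schedule, run, evaluate, integrate):
    """Validate, split, schedule, execute, evaluate, and integrate one job."""
    if not validate(job):                      # JVS checks requirements
        raise ValueError("job rejected by JVS")
    chunks = split(job)                        # job splitting services
    results = []
    for chunk in chunks:
        decentrides = schedule(chunk)          # job scheduling services
        outputs = [run(node, chunk) for node in decentrides]
        accepted = evaluate(outputs)           # REVS, per requested reliability
        if accepted is None:
            raise RuntimeError("no majority for chunk; reschedule")
        results.append(accepted)
    return integrate(results)                  # RIS combines chunk results

# Toy usage: split [1, 2, 3, 4] into two chunks, sum each, integrate by summing.
total = execute_job(
    job=[1, 2, 3, 4],
    validate=lambda j: len(j) > 0,
    split=lambda j: [j[:2], j[2:]],
    schedule=lambda c: ["node-a"],
    run=lambda node, c: sum(c),
    evaluate=lambda outs: outs[0],
    integrate=sum,
)
# total == 10
```
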
DecentrAI Job Scheduling Approaches
DecentrAI primarily provides two approaches to job scheduling, which are presented next.
Manually Scheduled Job: In this type, a job is opened for bidding, and DSPs are allowed to submit bids against it. The coordination between the DU and the DSP is achieved with the help of the DMD. The DecentrAI Trading Service (DTS) plays its role in negotiating the service cost. Figure 24 presents the manually scheduled job submission process.
Auto Scheduled Job: In this type, a DU submits its job to the DecentrAI Front-end User Interface (DFUI) manually or via the DecentrAI Self Submission APIs (DSS-API). In both cases, the DecentrAI decentralized Resource Brokers (DRB) select the most appropriate DSPs for the execution of the job based on the requirement checklist submitted by the user, which is presented in Section 4.6.1. A DU must submit a requirement checklist for both types of jobs. Figure 25 presents the auto-scheduled job submission process.
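One way a DRB might rank DSP bids for an auto-scheduled job is to combine the bid price with the reputation score maintained by the platform. The weighting scheme below is an assumption for illustration; the document does not prescribe a ranking formula.

```python
def rank_bids(bids, reputation, price_weight=0.5):
    """Rank DSP bids, best first.

    bids:       list of (dsp, price) tuples submitted against the job
    reputation: dsp -> score in [0, 1] (from the reward/penalty mechanism)
    Lower price and higher reputation both improve the rank.
    """
    max_price = max(price for _, price in bids)

    def score(bid):
        dsp, price = bid
        cheapness = 1 - price / max_price           # 0 for the priciest bid
        return price_weight * cheapness + (1 - price_weight) * reputation.get(dsp, 0)

    return [dsp for dsp, _ in sorted(bids, key=score, reverse=True)]

order = rank_bids([("dsp-1", 100), ("dsp-2", 80), ("dsp-3", 80)],
                  {"dsp-1": 0.9, "dsp-2": 0.4, "dsp-3": 0.8})
# dsp-3 wins: it matches dsp-2's price but carries a much better reputation
```
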
Requirement Checklist for Job Submission
In this section, we discuss a few generic requirements common to all job categories. A DecentrAI user is required to provide a set of requirements that the DRBs need in order to select the most appropriate DSPs for the smooth execution of a job. For example, consider a scenario in which a commercial organization must build and train its model at maximum speed under a time constraint. In this case, the DU is required to submit the details and specifications of the dataset, the type of machine learning (supervised, unsupervised, etc.) or deep learning to be performed, and all necessary hyper-parameters of the algorithm, in addition to the maximum time threshold. On the other hand, consider another user, a student with a limited budget, who wants to utilize the DecentrAI platform. In this case, the user will submit all the details related to the learning algorithm, in addition to the fee they are willing to pay.
On DecentrAI, a user can choose from the available standard machine learning algorithms or deep learning architectures, or submit their own algorithm for training. In both cases, we refer to the MApp, which stands for a machine learning or deep learning application.
An MJob can have one to several constraints and different input data. For example, job j1 is time-constrained and must be completed within one hour, while job j2 is fee-constrained, and the user is willing to pay just $50 for the training. Similarly, confidentiality of the dataset or the training results may be a constraint for other jobs. Therefore, MJobs must have a 'constraint set' and a 'resource set'. A resource set includes MApp details, training data, hyper-parameters, etc.
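A possible shape for an MJob with its constraint set and resource set is sketched below, mirroring the examples j1 and j2 above. The field names are assumptions, not a specified schema.

```python
from dataclasses import dataclass, field

@dataclass
class MJob:
    """An MJob: the resource set (MApp, data, hyper-parameters) plus a constraint set."""
    mapp: str                                       # ML/DL application to run
    training_data: str                              # dataset location (illustrative URI)
    hyper_parameters: dict = field(default_factory=dict)
    constraints: dict = field(default_factory=dict)  # the 'constraint set'

# j1: time-constrained, must finish within one hour.
j1 = MJob("resnet-training", "ipfs://dataset-1",
          {"lr": 0.01}, {"max_time_s": 3600})

# j2: fee-constrained, the user will pay at most $50.
j2 = MJob("resnet-training", "ipfs://dataset-1",
          constraints={"max_fee_usd": 50})
```

Keeping constraints separate from resources lets the scheduler reason about them independently: the resource set tells it what to run, the constraint set tells it how the run may be traded off.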
Reward Distribution and Penalization
Every DecentrAI Service Provider (DSP) is eligible to receive a reward for the jobs it has completed. The reward for a job is distributed at the end of each epoch after the job completes. All honest decentride nodes receive a reward and a positive reputation score. Any dishonest decentride is penalized and receives a negative reputation score. The reputation score of a decentride is used by the DecentrAI Job Scheduling Services to schedule future jobs.
Similarly, any decentride that does not meet the requirements is penalized. A dishonest node is penalized by slashing its staked DCE and by a negative reputation score. A decentride submitting a bad result, submitting late, or not submitting results at all is liable for penalization.
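The reward and penalty rule can be sketched as a per-epoch settlement. The reward, slash, and reputation amounts below are assumptions; the document specifies the mechanism but not the magnitudes.

```python
def settle_epoch(node, outcome, reward=10, slash=50, rep_delta=1):
    """Apply the end-of-epoch settlement for one decentride.

    node:    dict with 'stake', 'balance', and 'reputation'
    outcome: 'honest' | 'bad_result' | 'late' | 'no_result'
    """
    if outcome == "honest":
        node["balance"] += reward        # honest nodes earn the reward...
        node["reputation"] += rep_delta  # ...and a positive reputation score
    else:
        # bad result, late submission, or no submission: slash staked DCE
        node["stake"] = max(0, node["stake"] - slash)
        node["reputation"] -= rep_delta
    return node

node = settle_epoch({"stake": 1000, "balance": 0, "reputation": 5}, "late")
# a late submission costs 50 DCE of stake and one reputation point
```
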
DecentrAI Zero Knowledge Services
In the proposed DecentrAI platform, Zero-Knowledge Proofs (ZKPs) will be utilized to enhance privacy, security, and trust among users and nodes participating in the decentralized machine learning / deep learning process. ZKPs will play a significant role in several aspects of the platform:
Data privacy: ZKPs will be used to prove that a node possesses a certain piece of data or has performed specific computations without revealing the underlying data itself. This capability will be particularly useful in preserving the privacy of sensitive information, such as user datasets and model parameters, during the training and testing processes [49].
Model validation: In scenarios where a user submits a custom machine learning algorithm or deep learning architecture for training, ZKPs will be employed to validate the correctness and integrity of the algorithm without disclosing its actual implementation. This will allow other nodes to trust the submitted algorithm without gaining access to its proprietary details [50; 51; 52].
Access control and authentication: ZKPs will be utilized to verify the authenticity of users and nodes, as well as to manage access control within the platform. By employing ZKPs, nodes can authenticate each other without sharing sensitive credentials, thereby reducing the risk of identity theft or unauthorized access [53; 54].
Secure aggregation and model updates: In federated learning scenarios, ZKPs will be used to prove the correctness and validity of the aggregated model updates or gradient contributions from participating nodes. This will ensure that the central server can trust the received updates without having direct access to the data used for training at each node [55; 56].
Incentive mechanisms and rewards: If the DecentrAI platform incorporates a token-based incentive system, ZKPs will be used to prove the completion of tasks or the provision of resources by nodes without disclosing the details of the work performed. This will allow for a secure and transparent distribution of rewards without compromising privacy [57; 58].
The integration of ZKPs into the DecentrAI platform will help maintain a high level of privacy and security, fostering trust among participants and enabling efficient, privacy-preserving machine learning / deep learning processes in a decentralized environment.
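To make the ZKP idea concrete, here is a minimal non-interactive Schnorr proof of knowledge (made non-interactive via the Fiat-Shamir heuristic): a prover demonstrates knowledge of a secret x with y = g^x mod p without revealing x. This is a textbook sketch, not the DZKP protocol itself; the group parameters are illustrative, and a production system would use vetted libraries and parameter choices.

```python
import hashlib
import secrets

P = 2**255 - 19  # a large prime (illustrative choice, not a vetted group)
G = 5            # generator (illustrative)

def prove(x: int):
    """Prove knowledge of x for the public value y = G^x mod P."""
    y = pow(G, x, P)
    r = secrets.randbelow(P - 1)
    t = pow(G, r, P)                       # commitment
    c = int.from_bytes(                    # Fiat-Shamir challenge: hash of transcript
        hashlib.sha256(f"{t}{y}".encode()).digest(), "big")
    s = (r + c * x) % (P - 1)              # response; reveals nothing about x alone
    return y, t, s

def verify(y: int, t: int, s: int) -> bool:
    """Check G^s == t * y^c (mod P) without ever seeing x."""
    c = int.from_bytes(hashlib.sha256(f"{t}{y}".encode()).digest(), "big")
    return pow(G, s, P) == (t * pow(y, c, P)) % P

secret = secrets.randbelow(P - 1)
assert verify(*prove(secret))  # the verifier accepts without learning `secret`
```

The same pattern, commit, derive a challenge by hashing, respond, underlies the more elaborate proof systems cited above; DecentrAI's use cases (proving a computation was performed, authenticating without credentials) swap the simple discrete-log relation for richer statements.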
DecentrAI Token Utility
There are various ways in which a token or coin can provide utility within a blockchain ecosystem. In this section, we present how the DecentrAI DCE coin or token will be used within the DecentrAI platform.
Access to Services or Features
Tokens can be used to gain access to certain services or features within a platform or ecosystem. For example, in a decentralized file storage network, users may need to hold a certain amount of the platform's native token to be able to store or retrieve files. On the DecentrAI platform, there are three types of users, as discussed in Section 4.1: (i) DecentrAI User (DU), (ii) DecentrAI Service Provider (DSP), and (iii) DecentrAI App Developer (DAD).
DecentrAI User (DU): A DU is fundamentally a consumer who submits a high-performance computing or AI job to the platform. A DU is required to deposit a certain minimum amount of DCE coins so that the submitted job can be executed on the platform.
DecentrAI Service Provider (DSP): A DSP is responsible for onboarding decentrides (nodes) onto the platform so that submitted jobs can be executed. A DSP may also enroll core nodes, whose job is to provide the core services required to operate the DecentrAI platform. All core services are discussed in Section 4.3. A DSP must deposit a certain amount of DCE coins into the deposit contract before onboarding a decentride node or providing core services.
DecentrAI App Developer (DAD): A DAD is the third type of user who will utilize the DecentrAI platform. App developers are responsible for creating apps on the DecentrAI platform by using APIs provided by the platform, and they earn rewards for doing so. An app developer can utilize the sandbox test environment free of charge to develop apps on the platform. However, the DecentrAI platform is required to verify the submitted app for security, performance, and compliance purposes. After the evaluation, the signed application becomes available on the DecentrAI market directory. An app developer is required to pay a minimal fee for application evaluation, signing, and enlistment on the marketplace. This fee is also paid in the platform's native coin.
Payment for Goods and Services
As discussed above, tokens can be used as a means of payment for goods and services within a particular marketplace or platform. On the DecentrAI platform, tokens are generally used for offering or consuming services.
Staking and Rewards
For any blockchain that uses a Proof of Stake (PoS)-based consensus protocol, staking is mandatory. Users on the DecentrAI platform can stake coins and earn rewards on them. Staking involves holding tokens in a wallet or account to support the security or operation of a blockchain network, and in return, users may receive rewards or incentives for their contributions.
Incentivizing Participation
Tokens can be used to incentivize users to perform certain actions within the platform. Below, we present a few ways in which tokens can be used to incentivize participation:
Referral Programs: On the DecentrAI platform, a user can be incentivized via a referral program. The DecentrAI platform can offer token rewards to users who refer new users to the platform. This can help drive user acquisition and growth within the ecosystem.
Bug Bounties: DecentrAI can offer token rewards to users who identify and report bugs or security vulnerabilities within the platform. This can help improve the security and reliability of the platform.
Encouraging Feedback and Reviews: Tokens can be used to incentivize users to provide feedback or reviews about a particular application or service within a platform. This can help improve the quality of the platform and build trust within the community.
Rewarding Content Creation: The DecentrAI platform can reward users with tokens for creating valuable content, such as an application or a dataset that user-submitted jobs can utilize. This can encourage users to create more appropriate applications and datasets, which can increase engagement and drive growth within the ecosystem.
Governance and Voting
Governance and voting are another way in which tokens can provide utility within a blockchain ecosystem, and they are often used to give users a say in the decision-making process of the network. On the DecentrAI platform, tokens can be used to give users a stake in the governance process, as they may have voting rights that allow them to participate in decision-making. These voting rights can be proportional to the number of tokens held by the user, or they can be based on other factors, such as reputation or contributions to the network.
By using tokens for governance and voting, developers can create a more democratic and decentralized ecosystem, in which decisions are made by the community rather than centralized entities. This can help build trust within the community and promote the long-term sustainability of the network.
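A token-weighted tally, the proportional voting rule described above, can be sketched in a few lines. The proportionality rule comes from the text; the data shapes and addresses are illustrative assumptions.

```python
def tally(votes: dict, balances: dict) -> dict:
    """Token-weighted governance tally.

    votes:    address -> 'yes' | 'no'
    balances: address -> tokens held; each vote weighs as many tokens as the voter holds
    """
    totals = {"yes": 0, "no": 0}
    for addr, choice in votes.items():
        totals[choice] += balances.get(addr, 0)
    return totals

result = tally({"0xA": "yes", "0xB": "no", "0xC": "yes"},
               {"0xA": 100, "0xB": 250, "0xC": 40})
# yes: 140, no: 250 -- the proposal fails despite having more voters in favour
```

The example also shows the familiar trade-off of token-weighted voting: a single large holder can outvote a numerical majority, which is one reason the text mentions reputation or contributions as alternative weighting factors.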
Conclusion
In conclusion, the DecentrAI platform offers a diverse range of services, catering to the varied needs of users in the machine learning, deep learning, and high-performance computing landscape. Initially, the platform is designed to support key job categories including Model Training, Model Training and Testing, Testing, and High-Performance Computing.
The first category, Model Training, focuses on providing users with the ability to train models on the platform, yielding a trained model as the outcome. The second category, Model Training and Testing, extends this functionality by not only offering model training but also testing services to evaluate the model’s performance. The third category, Testing, is tailored for users who have a pre-trained model or want to utilize a model from the DecentrAI marketplace, providing them with testing services to assess their model’s effectiveness. Finally, the High-Performance Computing job category addresses the needs of users who require significant computational resources for tasks beyond the realm of machine learning / deep learning.
By providing these diverse services, the DecentrAI platform aims to be a comprehensive solution for users with different requirements and resources. As the platform evolves, it will continue to adapt and expand its offerings to stay at the forefront of the ever-changing machine learning, deep learning and high-performance computing landscape, delivering cutting-edge, privacy-preserving, and scalable solutions for its users.
References
[1]A. G. Gad, D. T. Mosa, L. Abualigah, A. A. Abohany, Emerging trends in blockchain technology and applications: A review and outlook, Journal of King Saud University - Computer and Information Sciences 34 (9) (2022) 6719–6742.
[2]L. Shen, Y. Sun, Z. Yu, L. Ding, X. Tian, D. Tao, On efficient training of large-scale deep learning models: A literature review, arXiv preprint arXiv:2304.03589 (2023).
[3]C. Lauren, Training deep learning models at scale in azure, Microsoft AI/ML Blogs (May 2020).
URL https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/training-deep-learning-models-at-scale-in-azure/ba-p/1399647
[4]I. Gupta, Decentralization of artificial intelligence: Analyzing developments in decentralized learning and distributed ai networks (05 2020).
[5]N. Singh, P. Dayama, V. Pandit, Zero knowledge proofs towards verifiable decentralized ai pipelines, in: I. Eyal, J. Garay (Eds.), Financial Cryptography and Data Security, Springer International Publishing, Cham, 2022, pp. 248–275.
[6]K. Toyoda, A. N. Zhang, Mechanism design for an incentive-aware blockchain- enabled federated learning platform, in: 2019 IEEE International Conference on Big Data (Big Data), 2019, pp. 395–403.
[7]Y. Yang, L. Wei, J. Wu, C. Long, Block-smpc: A blockchain-based secure multiparty computation for privacy-protected data sharing, in: Proceedings of the 2020 The 2nd International Conference on Blockchain Technology, ICBCT’20, Association for Computing Machinery, New York, NY, USA, 2020, p. 46–51.
[8]J. D. Harris, B. Waggoner, Decentralized collaborative ai on blockchain, in: 2019 IEEE International Conference on Blockchain (Blockchain), IEEE, 2019.
[9]D. P. Anderson, Boinc: A system for public-resource computing and storage, in: Fifth IEEE/ACM international workshop on grid computing, IEEE, 2004, pp. 4–10.
[10]N. e. Høimyr, J. Blomer, P. Buncic, M. Giovannozzi, A. Gonzalez, A. Harutyunyan, P. Jones, A. Karneyeu, M. Marquina, E. Mcintosh, et al., Boinc service for volunteer cloud computing, in: Journal of Physics: Conference Series, Vol. 396, IOP Publishing, 2012, p. 032057.
[11]K. Bonawitz, H. Eichner, W. Grieskamp, D. Huba, A. Ingerman, V. Ivanov, C. Kiddon, J. Konečny`, S. Mazzocchi, B. McMahan, et al., Towards federated learning at scale: System design, Proceedings of machine learning and systems 1 (2019) 374– 388.
[12]J. Konečny`, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, D. Bacon, Federated learning: Strategies for improving communication efficiency, arXiv preprint arXiv:1610.05492 (2016).
[13]K. A. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, K. Seth, Practical secure aggregation for federated learning on user-held data, CoRR abs/1611.04482 (2016). arXiv:1611.04482.
URL http://arxiv.org/abs/1611.04482
[14]U. Mattsson, Practical data security and privacy for gdpr and ccpa, ISACA Journal 2 (2020).
URL https://www.isaca.org/resources/isaca-journal/issues/2020/volume-3/practical-data-security-and-privacy-for-gdpr-and-ccpa
[15]V. Ruehle, R. Sim, S. Yekhanin, N. Chandran, M. Chase, D. Jones, K. Laine, B. Köpf, J. Teevan, J. Kleewein, S. Rajmohan, Privacy preserving machine learning: Maintaining confidentiality and preserving trust (10 2021).
URL https://www.microsoft.com/en-us/research/blog/privacy-preserving-machine-learning-maintaining-confidentiality-and-preserving-trust/
[16]O. Sharir, B. Peleg, Y. Shoham, The cost of training NLP models: A concise overview, CoRR abs/2004.08900 (2020). arXiv:2004.08900.
[17]P. Schmidt, F. Biessmann, T. Teubner, Transparency and trust in artificial intelligence systems, Journal of Decision Systems 29 (4) (2020) 260–278. doi: 10.1080/12460125.2020.1819094.
[18]A. Householder, J. M. Spring, N. VanHoudnos, O. Wright, Machine learning classifiers trained via gradient descent are vulnerable to arbitrary misclassification attack, https://kb.cert.org/vuls/id/425163 (3 2020).
[19]N. Alnaasan, A. Jain, A. Shafi, H. Subramoni, D. K. Panda, Accdp: Accelerated data-parallel distributed dnn training for modern gpu-based hpc clusters, in: 2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC), IEEE, 2022, pp. 32–41.
[20]A. Jain, A. Shafi, Q. Anthony, P. Kousha, H. Subramoni, D. K. Panda, Hy-fi: Hybrid five-dimensional parallel dnn training on high-performance gpu clusters, in: High Performance Computing: 37th International Conference, ISC High Performance 2022, Hamburg, Germany, May 29–June 2, 2022, Proceedings, Springer, 2022, pp. 109–130.
[21]A. Jain, N. Alnaasan, A. Shafi, H. Subramoni, D. K. Panda, Accelerating cpu-based distributed dnn training on modern hpc clusters using bluefield-2 dpus, in: 2021 IEEE Symposium on High-Performance Interconnects (HOTI), IEEE, 2021, pp. 17– 24.
[22]T. Akiba, K. Fukuda, S. Suzuki, Chainermn: Scalable distributed deep learning framework, arXiv preprint arXiv:1710.11351 (2017).
[23]D. Justin, B. Harris, Decentralized & collaborative ai on blockchain, in: Proceedings of the 2019 IEEE International Conference on Blockchain (Blockchain), Atlanta, GA, USA, 2019, pp. 14–17.
[24]A. Sergeev, M. Del Balso, Horovod: fast and easy distributed deep learning in tensorflow, arXiv preprint arXiv:1802.05799 (2018).
[25]Z. Wang, K. Liu, J. Li, Y. Zhu, Y. Zhang, Various frameworks and libraries of machine learning and deep learning: a survey, Archives of computational methods in engineering (2019) 1–24.
[26]A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., Pytorch: An imperative style, high-performance deep learning library, Advances in neural information processing systems 32 (2019).
[27]M. Chen, Analysis of data parallelism methods with deep neural network, in: Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering, 2022, pp. 1857–1861.
[28]J. Rasley, S. Rajbhandari, O. Ruwase, Y. He, Deepspeed: System optimizations enable training deep learning models with over 100 billion parameters, in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020, pp. 3505–3506.
[29]Y. Kim, H. Choi, J. Lee, J.-S. Kim, H. Jei, H. Roh, Towards an optimized distributed deep learning framework for a heterogeneous multi-gpu cluster, Cluster Computing 23 (2020) 2287–2300.
[30]A. Malik, M. Lu, N. Wang, Y. Lin, S. Yoo, Detailed performance analysis of distributed tensorflow on a gpu cluster using deep learning algorithms, in: 2018 New York Scientific Data Summit (NYSDS), IEEE, 2018, pp. 1–8.
[31]X. Lu, H. Shi, R. Biswas, M. H. Javed, D. K. Panda, Dlobd: a comprehensive study of deep learning over big data stacks on hpc clusters, IEEE Transactions on Multi-Scale Computing Systems 4 (4) (2018) 635–648.
[32]J. J. Dai, Y. Wang, X. Qiu, D. Ding, Y. Zhang, Y. Wang, X. Jia, C. L. Zhang, Y. Wan, Z. Li, et al., Bigdl: A distributed deep learning framework for big data, in: Proceedings of the ACM Symposium on Cloud Computing, 2019, pp. 50–60.
[33]P. Moritz, R. Nishihara, S. Wang, A. Tumanov, R. Liaw, E. Liang, M. Elibol, Z. Yang, W. Paul, M. I. Jordan, et al., Ray: A distributed framework for emerging AI applications, in: 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), 2018, pp. 561–577.
[34]M. Innes, Flux: Elegant machine learning with julia, Journal of Open Source Software 3 (25) (2018) 602.
[35]A. Ghosh, V. Bhatnagar, Big data analytics using hadoop ecosystem, International Journal of Advanced Research in Computer Science and Software Engineering 5 (2) (2015) 351–354.
[36]S. Rizvi, M. B. Amin, S. Bashir, Big data analytics in healthcare using hadoop ecosystem: A review, Journal of Ambient Intelligence and Humanized Computing 9 (1) (2018) 141–153.
[37]X. Lu, H. Song, J. Sun, A big data processing architecture using hadoop ecosystem, Journal of Computational Science 16 (2016) 63–68.
[38]Databricks, Apache spark (2021).
URL https://databricks.com/spark/about
[39]E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra, J. M. Squyres, V. Sahay, Openmpi: A high-performance, portable implementation of the mpi message passing interface standard, Parallel Computing 30 (7) (2004) 1–32. doi:10.1016/j.parco.2004.03.002.
[40]P. Carbone, A. Katsifodimos, S. Ewen, V. Markl, S. Haridi, K. Tzoumas, Apache flink: Stream and batch processing in a single engine, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering 36 (4) (2015) 28–38. doi:10.1109/BUPT-TCDE.2015.25.
[41]M. E. Kharbili, N. Rivierre, L. Tissot, Apache flink: a comprehensive review, Journal of Big Data 8 (1) (2021) 1–29. doi:10.1186/s40537-020-00377-5.
[42]D. Kaur, P. Bansal, Performance analysis of apache storm with different data distribution techniques, Journal of Ambient Intelligence and Humanized Computing 12 (2) (2021) 1367–1375.
[43]J. Dean, S. Ghemawat, Mapreduce: simplified data processing on large clusters, Communications of the ACM 51 (1) (2008) 107–113.
[44]J. Woo, Market basket analysis algorithms with mapreduce, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 3 (6) (2013) 445–452.
[45]J. Verbraeken, M. Wolting, J. Katzy, J. Kloppenburg, T. Verbelen, J. S. Rellermeyer, A survey on distributed machine learning, Acm computing surveys (csur) 53 (2) (2020) 1–33.
[46]U. Majeed, C. S. Hong, Eflchain: Ensemble learning via federated learning over blockchain network: a framework, (2019) 845–847.
[47]X. Sun, F. R. Yu, P. Zhang, Z. Sun, W. Xie, X. Peng, A survey on zero-knowledge proof in blockchain, IEEE network 35 (4) (2021) 198–205.
[48]P. Wei, D. Wang, Y. Zhao, S. K. S. Tyagi, N. Kumar, Blockchain data-based cloud data integrity protection mechanism, Future Generation Computer Systems 102 (2020) 902–911.
[49]Y. Zhang, A. Juels, M. K. Reiter, A survey of private machine learning, Foundations and Trends® in Privacy and Security 5 (1-2) (2020) 1–196.
[50]X. Huang, D. Evans, J. Katz, Private, verifiable, and efficient inference of non-interactive machine learning models, in: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 2019, pp. 2039–2056.
[51]S. Yang, N. Papernot, Y. Chen, D. Song, Scaling private learning with sat-based verification, in: Proceedings of the 37th International Conference on Machine Learning, 2020, pp. 10481–10491.
[52]P. Mohassel, Y. Zhang, Secureml: A system for scalable privacy-preserving machine learning, in: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017, pp. 195–212.
[53]B. Zhang, Z. Han, X. Cheng, J. Chen, H. Chen, Zero-knowledge proofs for decentralized identity and access control: A survey, IEEE Transactions on Information Forensics and Security 16 (2021) 771–793.
[54]A. Sonnino, E. De Cristofaro, F. Dünkel, C. Fournet, P. Pietzuch, Zero-knowledge proofs for authentication in decentralized systems, IEEE Security & Privacy 19 (2021) 24–33.
[55]I. Alabdulmohsin, R. Wahby, M. Polychronakis, C. Papamanthou, Efficient privacy-preserving deep learning using authenticated encryption with secure aggregation, IEEE Transactions on Dependable and Secure Computing 19 (2020) 1034–1048.
[56]Q. Yang, Y. Liu, T. Chen, Y. Tong, W. Zhang, X. Wang, Federated machine learning: Concept and applications, ACM Transactions on Intelligent Systems and Technology (TIST) 10 (2) (2019) 1–19.
[57]T. Park, T. Kim, J. Moon, H. Kim, J. Kang, Efficient and privacy-preserving incentives for blockchain-based federated learning, in: IEEE International Conference on Blockchain, IEEE, 2021, pp. 77–82.
[58]F. Li, J. Zheng, X. Zhu, Y. Wu, J. Wang, Zero-knowledge proof based blockchain- assisted secure data sharing framework for iot, in: IEEE Global Communications Conference, IEEE, 2021, pp. 1–6.