-
EdgeConvEns: Convolutional Ensemble Learning for Edge Intelligence
Authors:
Ilkay Sikdokur,
İnci M. Baytaş,
Arda Yurdakul
Abstract:
Deep edge intelligence aims to deploy deep learning models that demand computationally expensive training in the edge network with limited computational power. Moreover, many deep edge intelligence applications require handling distributed data that cannot be transferred to a central server due to privacy concerns. Decentralized learning methods, such as federated learning, offer solutions where models are learned collectively by exchanging learned weights. However, they often require complex models that edge devices may not be able to handle, as well as multiple rounds of network communication, to achieve state-of-the-art performance. This study proposes a convolutional ensemble learning approach, coined EdgeConvEns, that facilitates training heterogeneous weak models on the edge and learning to ensemble them where data on the edge are heterogeneously distributed. Edge models are implemented and trained independently on Field-Programmable Gate Array (FPGA) devices with various computational capacities. Learned data representations are transferred to a central server where the ensemble model is trained with the learned features received from the edge devices to boost the overall prediction performance. Extensive experiments demonstrate that EdgeConvEns can outperform state-of-the-art methods with fewer communications and less data in various training scenarios.
Submitted 25 July, 2023;
originally announced July 2023.
-
Common Subexpression-based Compression and Multiplication of Sparse Constant Matrices
Authors:
Emre Bilgili,
Arda Yurdakul
Abstract:
In deep learning inference, model parameters are pruned and quantized to reduce the model size. Compression methods and common subexpression (CSE) elimination algorithms are applied to sparse constant matrices to deploy the models on low-cost embedded devices. However, state-of-the-art CSE elimination methods do not scale well to large matrices. They take hours to extract CSEs in a $200 \times 200$ matrix, while their matrix multiplication algorithms run longer than conventional matrix multiplication methods. Besides, there exist no compression methods for matrices utilizing CSEs. As a remedy to these problems, a random search-based algorithm is proposed in this paper to extract CSEs in the column pairs of a constant matrix. It produces an adder tree for a $1000 \times 1000$ matrix in a minute. To compress the adder tree, this paper presents a compression format that extends Compressed Sparse Row (CSR) to include CSEs. While compression rates of more than $50\%$ can be achieved compared to the original CSR format, simulations for a single-core embedded system show that the matrix multiplication execution time can be reduced by $20\%$.
Submitted 26 March, 2023;
originally announced March 2023.
-
A Decentralized Framework with Dynamic and Event-Driven Container Orchestration at the Edge
Authors:
Umut Can Özyar,
Arda Yurdakul
Abstract:
Virtualization provides an abstraction layer for Internet of Things technology to tackle the heterogeneity of edge networks. It enables the deployment of an application on devices with different architectures to achieve uniformity. This study lays down the fundamentals of a framework for dynamic and event-driven orchestration towards a fully decentralized edge. It provides a blockchain-based delivery platform for containerized applications registered with their resource requirements through a registry on a distributed file system, namely the InterPlanetary File System (IPFS). The decentralized resource manager, running on the metrics scraped from the host and the virtualization platform (Docker in our implementation), dynamically optimizes the resources allocated to each container. The framework ensures that variable workloads of a heterogeneous environment can co-exist on multiple edge devices. An event-driven architecture is built over a lightweight messaging protocol, MQTT, capitalizing on the asynchronous and distributed nature of the publish/subscribe pattern to achieve a truly distributed system.
Submitted 26 September, 2022; v1 submitted 23 June, 2022;
originally announced June 2022.
-
ElectAnon: A Blockchain-Based, Anonymous, Robust and Scalable Ranked-Choice Voting Protocol
Authors:
Ceyhun Onur,
Arda Yurdakul
Abstract:
Remote voting has become more critical in recent years, especially after the Covid-19 outbreak. Blockchain technology and its benefits, such as decentralization, security, and transparency, have encouraged remote voting systems to use blockchains. Analysis of existing solutions reveals that anonymity, robustness, and scalability are common problems in blockchain-based election systems. In this work, we propose ElectAnon, a blockchain-based, ranked-choice election protocol focusing on anonymity, robustness, and scalability. ElectAnon achieves anonymity by enabling voters to cast their votes anonymously via zero-knowledge proofs. Robustness is realized by using timed-state machines to remove the direct control of the authorities over the voting process. Results show that ElectAnon is more scalable than existing works, reducing gas consumption by up to 89% compared to previous works. The proposed protocol includes a candidate proposal system and swappable tallying libraries. An extension is also proposed to minimize the trust assumption on election authorities. Our code is available at https://github.com/ceyonur/electanon.
Submitted 8 November, 2022; v1 submitted 31 March, 2022;
originally announced April 2022.
-
Image Classification on Accelerated Neural Networks
Authors:
Ilkay Sikdokur,
Inci Baytas,
Arda Yurdakul
Abstract:
For image classification problems, various neural network models are commonly used due to their success in yielding high accuracies. The Convolutional Neural Network (CNN) is one of the most frequently used deep learning methods for image classification applications. It may produce extraordinarily accurate results relative to its complexity. However, the more complex the model is, the longer it takes to train. In this paper, an FPGA-based acceleration design is presented for the training phase of the fully connected layer of a basic CNN model that consists of one convolutional layer and one fully connected layer. The inference phase is also accelerated automatically, because the training phase includes inference. In this design, the convolutional layer is calculated by the host computer and the fully connected layer is calculated by an FPGA board. It should be noted that the training of the convolutional layer is not taken into account in this design and is left for future research. The results are quite encouraging, as this FPGA design outperforms some state-of-the-art deep learning platforms, such as TensorFlow running on the host computer, by approximately 2 times in both training and inference.
Submitted 21 March, 2022;
originally announced March 2022.
-
An Embedded RISC-V Core with Fast Modular Multiplication
Authors:
Ömer Faruk Irmak,
Arda Yurdakul
Abstract:
One of the biggest concerns in IoT is privacy and security. Encryption and authentication need big power budgets, which battery-operated IoT end-nodes do not have. Hardware accelerators designed for specific cryptographic operations provide little to no flexibility for future updates. Custom instruction solutions are smaller in area and provide more flexibility for new methods to be implemented. One drawback of custom instructions is that the processor has to wait for the operation to finish. As a result, the response time of the device to real-time events gets longer. In this work, we propose a processor with an extended custom instruction for modular multiplication, which typically blocks the processor for two cycles for any size of modular multiplication when used in Partial Execution mode. We adopted the embedded and compressed extensions of RISC-V for our proof-of-concept CPU. Our design is benchmarked on recent cryptographic algorithms in the field of elliptic-curve cryptography. Our CPU with 128-bit modular multiplication operates at 136 MHz on ASIC and 81 MHz on FPGA. It achieves up to 13x speedup over software implementations while reducing overall power consumption by up to 95% with 41% average area overhead over our base architecture.
Submitted 30 September, 2020;
originally announced September 2020.
-
IDMoB: IoT Data Marketplace on Blockchain
Authors:
Kazım Rıfat Özyılmaz,
Mehmet Doğan,
Arda Yurdakul
Abstract:
Today, Internet of Things (IoT) devices are the powerhouse of data generation with their ever-increasing numbers and widespread penetration. Similarly, artificial intelligence (AI) and machine learning (ML) solutions are being integrated into all kinds of services, making products significantly "smarter". The centerpiece of these technologies is "data". IoT device vendors should be able to keep up with the increased throughput and come up with new business models. On the other hand, AI/ML solutions will produce better results if training data is diverse and plentiful.
In this paper, we propose a blockchain-based, decentralized and trustless data marketplace where IoT device vendors and AI/ML solution providers may interact and collaborate. By facilitating a transparent data exchange platform, access to consented data will be democratized and the variety of services targeting end-users will increase. The proposed data marketplace is implemented as a smart contract on the Ethereum blockchain, and Swarm is used as the distributed storage platform.
Submitted 30 September, 2018;
originally announced October 2018.
-
Designing a blockchain-based IoT infrastructure with Ethereum, Swarm and LoRa
Authors:
Kazım Rıfat Özyılmaz,
Arda Yurdakul
Abstract:
Today, the number of IoT devices in all aspects of life is increasing exponentially. The cities we are living in are getting smarter and informing us about our surroundings in a contextual manner. However, there are significant challenges in deploying, managing and collecting data from these devices, in addition to the problem of storing and mining that data for higher-quality IoT services. Blockchain technology, even in today's nascent form, contains the pillars to create a common, distributed, trustless and autonomous infrastructure system. This paper describes a standardized IoT infrastructure, where data is stored on a DDoS-resistant, fault-tolerant, distributed storage service and data access is managed by a decentralized, trustless blockchain. The illustrated system uses LoRa as the emerging network technology, Swarm as the distributed data storage and Ethereum as the blockchain platform. Such a data backend will ensure high availability with minimal security risks while replacing traditional backend systems with a single "smart contract".
Submitted 22 September, 2018; v1 submitted 20 September, 2018;
originally announced September 2018.
-
Taxim: A Toolchain for Automated and Configurable Simulation for Embedded Multiprocessor Design
Authors:
Gorker Alp Malazgirt,
Deniz Candas,
Arda Yurdakul
Abstract:
Multicore embedded systems have been constantly researched to improve efficiency by varying design parameters such as processor, memory, cache hierarchies and their cache configurations. Using the Multi2Sim and McPAT simulators in combination allows the user to design various multiprocessing architectures and estimate performance, power, area and timing metrics. However, the design effort required to simulate these systems is daunting, and the process is prone to human error. In this paper, we introduce Taxim, a toolchain that can automatically create requested multicore on-chip topologies while minimizing the simulation time spent on repetitive tasks between architectural power, energy and timing simulations. Taxim's decision-tree-based topology synthesis tool creates processor configuration files, which are highly error-prone when written manually. The toolchain also automates the steps from design entry to output report extraction by running automation scripts and listing the results. Our experiments show that multiprocessing architectures with 32 cores and irregular cache hierarchies require more than 1k lines in Multi2Sim's processor configuration format, and Taxim can create such a file in less than 10 milliseconds. The source code is freely available at https://github.com/bouncaslab/TaXim/.
Submitted 13 January, 2016;
originally announced January 2016.