Mikhail Dorojevets
Associate Professor · Stony Brook University · Electrical and Computer Engineering
Active 1995–2018
About
Mikhail Dorojevets is an Associate Professor in the Department of Electrical and Computer Engineering at Stony Brook University. His research focuses on parallel computer architecture, high-performance systems design, and superconductor processors, with the aim of improving processing capability and energy efficiency in high-performance computing.
Research topics
- Computer science
- Parallel computing
- Electrical engineering
- Computer hardware
- Embedded system
Selected publications
Energy-Efficient Superconductor Bloom Filters for Streaming Data Inspection
IEEE Transactions on Dependable and Secure Computing · 2018-06-04 · 4 citations
Article · 1st author · Corresponding
Bloom filters can be used in network intrusion detection systems to detect known attack signatures in packet payloads. In this paper we propose and analyze the potential application of superconductor flux quantum technology for streaming data inspection with Bloom filters designed with Reciprocal Quantum Logic (RQL). This paper describes the gate-level design, performance, and energy-efficiency analysis of three superconductor 2 Kbit Bloom filters with 1) the run-time selection of the number of hashes per stream, and 2) different numbers of input streams per Bloom filter. The Bloom filter circuits were designed using a bottom-up approach with manual placing and routing of basic RQL gates. The design complexity is below 97K Josephson junctions. The highest clock frequency reached in the simulation of the circuits is 14.7 GHz. The false positive rates of the RQL Bloom filters are in very close agreement with the theoretical expectations of the false positive probability for the filters. For the cryocooling efficiency of 0.1 percent, the RQL Bloom filters demonstrate high energy efficiency in the range of ~1.5-43.6 pJ/stream/operation at room temperature for stream lengths from 16 to 256 bits. All circuits are designed and simulated for the 248 nm MIT Lincoln Laboratory SFQ5ee fabrication process.
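To make the Bloom filter idea in this abstract concrete, here is a minimal software sketch, not the RQL circuit described in the paper. It models a 2 Kbit filter in which the number of hashes `k` is chosen per call, loosely mirroring the run-time hash selection mentioned above. The class name, the use of SHA-256 as the hash source, and the example signatures are all assumptions made for illustration; the abstract does not say which hash functions the hardware uses.

```python
import hashlib
import math


class BloomFilter:
    """Software model of an m-bit Bloom filter (illustrative, not the RQL design)."""

    def __init__(self, m_bits=2048):
        self.m = m_bits
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item: bytes, k: int):
        # Derive k bit positions by salting the item with the hash index.
        # SHA-256 is an assumption; the hardware hash functions are not given.
        for i in range(k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:4], "big") % self.m

    def add(self, item: bytes, k: int):
        for pos in self._positions(item, k):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes, k: int) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits[pos // 8] >> (pos % 8)) & 1
                   for pos in self._positions(item, k))


def false_positive_probability(m: int, n: int, k: int) -> float:
    """Standard approximation (1 - e^(-k*n/m))^k for n items in m bits."""
    return (1.0 - math.exp(-k * n / m)) ** k


signatures = [f"attack-signature-{i}".encode() for i in range(100)]
bloom = BloomFilter(m_bits=2048)
for sig in signatures:
    bloom.add(sig, k=4)
```

A Bloom filter never reports a stored item as absent, so every inserted signature passes `might_contain`. The false positive rate for items that were not stored is what `false_positive_probability` estimates.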
FPGA-based satisfiability filters for deep packet inspection
2018-05-01 · 1 citation
Article · Senior author
Satisfiability (SAT) filters have been recently proposed as a fast and storage-efficient way of implementing set membership operations. In this paper we discuss the application of the random SAT filters with k hash functions (k-SAT filters) for detecting the potential presence of known malicious signatures (byte patterns) in packet payloads to prevent cyber attacks. We developed and verified the operation of an FPGA-based 3-SAT filter with 3 hash functions per signature. The hash functions are implemented with bit stream processing circuits using the CRC-32 polynomial. The 3-SAT filter with 1,024 variables has a single-instance architecture with 64 solutions for a set of 3,360 input test patterns extracted from the content fields of the known malicious signatures in the Snort intrusion detection system database. During a filter construction phase, the 64 “good” solutions with the maximum Hamming distance between them have been selected among the 8,000 solutions found by a SAT solver. A Digilent Arty A7 with an Artix-7 FPGA was used to implement the filter design. The complete FPGA filter system operates at a 200 MHz clock rate and uses 720 Kbit of BRAM, 17,606 LUTs, and 20,296 flip-flops. The experimentally observed false positive rate for 50,000 randomly generated signatures of different lengths was ~1.6%. The 3-SAT FPGA design can be used to work with any set of signatures of interest with no need for changing and re-synthesizing VHDL code and reprogramming the entire FPGA. The results of this project allow for better understanding and planning of our next steps in the work on k-SAT filter applications for deep packet inspection.
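The k-SAT filter principle in this abstract can be sketched in software as follows. Each signature is hashed into one clause of three literals over 1,024 variables, the filter stores several assignments that satisfy every clause, and a query passes only if its clause is satisfied by all stored assignments. This is a toy model under stated assumptions: the salted `zlib.crc32` hashing, the naive random-walk solver, the choice of 8 solutions, and the example signatures are illustrative and are not taken from the paper, which selects 64 solutions by Hamming distance from those found by a SAT solver.

```python
import random
import zlib

N_VARS = 1024   # number of Boolean variables, as in the abstract
K = 3           # literals per clause (3-SAT)


def clause_for(item: bytes):
    """Hash an item into K literals, each a (variable index, wanted value) pair."""
    clause = []
    for salt in range(K):
        h = zlib.crc32(bytes([salt]) + item)  # salted CRC-32 (assumed scheme)
        clause.append(((h >> 1) % N_VARS, bool(h & 1)))
    return clause


def satisfied(clause, assignment):
    return any(assignment[var] == wanted for var, wanted in clause)


def solve(clauses, seed, max_flips=100_000):
    """Naive random-walk local search; a stand-in for a real SAT solver."""
    rng = random.Random(seed)
    assignment = [rng.random() < 0.5 for _ in range(N_VARS)]
    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c, assignment)]
        if not unsat:
            return assignment
        # Flipping any variable of an unsatisfied clause satisfies that clause.
        var, _ = rng.choice(rng.choice(unsat))
        assignment[var] = not assignment[var]
    raise RuntimeError("no solution found within max_flips")


def build_filter(items, n_solutions=8):
    clauses = [clause_for(item) for item in items]
    return [solve(clauses, seed) for seed in range(n_solutions)]


def might_contain(solutions, item: bytes) -> bool:
    clause = clause_for(item)
    return all(satisfied(clause, s) for s in solutions)


signatures = [f"signature-{i}".encode() for i in range(40)]
solutions = build_filter(signatures)
```

Stored signatures always pass the query because every solution satisfies their clauses. A random non-member passes a single independent solution with probability 7/8, which is why the filter combines many well-separated solutions to drive the false positive rate down.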
Novel integration of Dimetheus and WalkSAT solvers for k-SAT filter construction
2017-05-01 · 2 citations
Article · Senior author
This paper describes a novel approach used to integrate two leading satisfiability (SAT) solvers, Dimetheus and WalkSAT, into a system that provides users with solver selection and solution customization capabilities. The two solvers are efficient for two different cases, have different execution procedures, and each generates a single solution from a single command line. This integration provides the most efficient way to find multiple random solutions of any set membership problem from a common single command line. Multiple random solutions are essential for building an effective k-SAT filter. The integration also provides a unified solution output format in place of the two different output formats of the two solvers. The theoretical approach and a practical C program were developed and tested during an ongoing project at Stony Brook University to build the world's first practical k-SAT filter for deep packet inspection in network intrusion detection systems.
Design and Demonstration of a 30 GHz 16-bit Superconductor RSFQ Microprocessor
2015-03-10 · 1 citation
Article · 1st author · Corresponding
The major objective of the project was to design and demonstrate operation of key components of a 30 GHz 16-bit RSFQ processor prototype implemented with the AIST/ISTEC 10 kA/cm² fabrication process. Our team has developed complete logical and physical designs of five RSFQ chips using the CONNECT cell library and RSFQ CAD tools developed at the Universities of Yokohama and Nagoya (Japan). The major results are the world's first successful design, fabrication, and demonstration of correct operation of a 20 GHz 8x8-bit parallel carry-save RSFQ multiplier with approximately 6K JJs, a 16-bit sparse-tree wave-pipelined RSFQ adder with approximately 10K JJs, and partial operation of an 8-bit ALU chip with approximately 9K JJs. The goal of the second phase of the project was to gain a detailed understanding of the performance, complexity, and energy efficiency of on-chip storage units implemented with superconductor Reciprocal Quantum Logic (RQL) using our RQL VHDL cell library tuned to the MIT Lincoln Laboratory 10 kA/cm² 248 nm process. The 8.5 GHz 1-4 Kbit 32-/64-bit multi-ported scratchpad memory, register files, write-through and write-back caches designed with RQL Non-Destructive Read-Out storage cells have the average energy consumption of 3.0-9.5 fJ/bit/operation at room temperature using the cryocooling efficiency of 0.1%.
2015-10-01 · 12 citations
Article · 1st author · Corresponding
New superconductor single flux quantum (SFQ) technology, such as Reciprocal Quantum Logic (RQL), is currently considered one of the promising candidates for high-performance energy-efficient computing. This paper presents our work on the design and detailed energy efficiency analysis of three types of 32- and 64-bit RQL multi-ported pipelined local storage structures (13 total), namely 1) random access memory (RAM) and register files, 2) direct-mapped write-through and write-back caches, and 3) first-in-first-out (FIFO) buffers. Our layout-aware cell-level design process uses a VHDL RQL cell library developed at the Ultra High Speed Computing Laboratory at Stony Brook University (SBU). The SBU VHDL RQL cell library specifies the dynamic and standby energy consumption, gate delays, the number of Josephson junctions (JJs) per cell, and approximate sizes of individual cells based on the parameters of the 248 nm 100 μA/μm² 10 Nb metal layer SFQ fabrication process currently under development at the MIT Lincoln Laboratory. Gate and wire delays as well as clock skew are taken into account during digital circuit simulation done with Mentor Graphics CAD tools. After completing a physical chip layout, the circuit models need to be updated and re-simulated to include the effects of parasitic inductances and actual wire lengths on signal propagation delays. To meet both performance and energy efficiency targets, the RQL storage structures were designed with RQL non-destructive read-out single-bit storage cells. We chose a relatively moderate clock frequency of 8.5 GHz for all storage units to keep their read latencies in the range of 1-3 cycles. The most complex design in terms of JJs is a triple-ported 4 Kbit 64x64-bit register file with 253,918 JJs and a read access latency of 338 ps. The highest energy consumption in terms of energy/operation/bit (~9.5 aJ at 4.2 K) is for a write hit in a 2 Kbit 32-bit wide write-back cache. The average energy consumption of the RQL storage designs varies from ~1.6 aJ/operation/bit for a small 4x32-bit FIFO to 7.3 aJ/operation/bit for the 2 Kbit write-back cache at 4.2 K. Given the cryocooler efficiency of 0.1%, this means the energy consumption of ~1.6-7.3 fJ/operation/bit at room temperature. The physical implementation of the RQL storage units will become feasible upon the development of the target MIT fabrication process and CAD tools for VLSI RQL chip design in 2015-2016.
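The conversion from energy at 4.2 K to energy at room temperature quoted in this abstract is a single division by the cryocooler efficiency. The short sketch below reproduces that arithmetic for the figures given above; the function and variable names are illustrative only.

```python
def room_temperature_energy(energy_at_4k_joules, cryocooler_efficiency=0.001):
    """Wall-plug energy needed to dissipate the given energy at 4.2 K.

    An efficiency of 0.1% (0.001) means about 1000 W at room temperature
    are spent per 1 W removed at 4.2 K.
    """
    return energy_at_4k_joules / cryocooler_efficiency


ATTOJOULE = 1e-18
fifo_room = room_temperature_energy(1.6 * ATTOJOULE)    # ~1.6 fJ/operation/bit
cache_room = room_temperature_energy(7.3 * ATTOJOULE)   # ~7.3 fJ/operation/bit
```

With the 0.1% efficiency stated in the abstract, attojoule figures at 4.2 K map one-to-one onto femtojoule figures at room temperature.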
IEEE Transactions on Applied Superconductivity · 2014-11-06 · 31 citations
Article · 1st author · Corresponding
New superconductor single flux quantum logics with no static power dissipation in bias resistors, such as Reciprocal Quantum Logic (RQL), offer opportunities to create energy-efficient superconductor processors operating at high frequencies with ultra-low power consumption. This paper discusses the results of our work on the cell-level design and analysis of a benchmark set of 32-/64-bit RQL processor integer and floating-point units such as adders, multipliers, an arithmetic-logic unit, and an array shifter, as well as small 1-4 Kbit RQL on-chip storage components such as register files, on-chip memory, and top-level caches. Our layout-aware design process includes the complete cell-level design and approximate physical layout of the circuits followed by the VHDL simulation, verification, and energy profiling using our RQL VHDL cell library tuned to the future MIT Lincoln Laboratory 10 kA/cm² 248 nm process with 10 Nb metal layers and the minimum JJ critical current of 38 μA. Our designs have the energy efficiency of ~1.0 single-precision TFLOPS/W and ~0.5 double-precision TFLOPS/W for floating-point units, and ~1-24 TOPS/W for 32-bit integer units at room temperature using the cryocooling efficiency of 0.1% (1000 W/W). The 1-4 Kbit 32-/64-bit multi-ported scratchpad memory, register files, write-through and write-back caches designed with RQL Non-Destructive Read-Out storage cells have the average energy consumption of 3.0-9.5 fJ/bit/operation at room temperature using the cryocooling efficiency of 0.1%. While these results are very promising, more work is needed to evaluate the contribution of the energy costs of instruction scheduling and off-chip main memory access to the energy efficiency of RQL computing across a whole system.
Demonstration of an 8×8-bit RSFQ multi-port register file
2013-07-01 · 8 citations
Article
As a part of the 8-bit RSFQ processor datapath development, we have designed, fabricated, and experimentally demonstrated an 8×8-bit RSFQ multi-port register file. The register file provides input data operands and stores Arithmetic Logic Unit (ALU) results. It can perform two simultaneous non-destructive “read” operations and one “write” operation and is capable of storing eight 8-bit words. The distinct feature of the design is an extensive use of passive transmission lines (PTLs) for very complex interconnects inside the register file. The register file is designed for integration with the recently demonstrated 20-GHz 8-bit RSFQ ALU. It is fabricated with the standard HYPRES 1.0-μm 4.5-kA/cm² process. The circuit is placed on a 1 cm × 1 cm chip and consists of ~4,000 Josephson junctions.
16-Bit Wave-Pipelined Sparse-Tree RSFQ Adder
IEEE Transactions on Applied Superconductivity · 2012-12-12 · 35 citations
Article · 1st author · Corresponding
In this paper, we discuss the architecture, design, and testing of the first 16-bit asynchronous wave-pipelined sparse-tree superconductor rapid single flux quantum adder implemented using the ISTEC 10 kA/cm² ADP2.1 fabrication process. Compared to the Kogge-Stone adder, our parallel-prefix sparse-tree adder has better energy efficiency with significantly reduced complexity (at the expense of latency) and almost no decrease in operation frequency. The 16-bit adder core (without SFQ-to-dc and dc-to-SFQ converters) has 9941 Josephson junctions occupying an area of 8.5 mm². It is designed for the target operation frequency of 30 GHz with the expected latency of 352 ps at the bias voltage of 2.5 mV. The adder chip was fabricated and successfully tested at low frequency for all test patterns with measured bias margins of +9.8%/-10.7%.
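The trade-off this abstract describes between a sparse-tree and a full Kogge-Stone adder can be illustrated with a small functional model. In the sketch below, the parallel-prefix network computes carries only at group boundaries and the bits inside each group are resolved by a short ripple, which is what reduces the prefix logic at the cost of some latency. The group size of 4 and all names are assumptions for illustration; the abstract does not state the sparseness of the fabricated adder, and this model says nothing about its wave-pipelined timing.

```python
def combine(hi, lo):
    """Prefix operator: merge (generate, propagate) of a higher span with a lower one."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def sparse_tree_add(a, b, width=16, group=4):
    """Add a and b modulo 2**width using group-level prefix carries."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]   # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # bit propagate

    # One (G, P) pair per group of bits (the "sparse" part of the tree).
    groups = []
    for start in range(0, width, group):
        gp = (g[start], p[start])
        for i in range(start + 1, min(start + group, width)):
            gp = combine((g[i], p[i]), gp)
        groups.append(gp)

    # Kogge-Stone style prefix over the group pairs only.
    prefix = list(groups)
    dist = 1
    while dist < len(prefix):
        prefix = [combine(prefix[j], prefix[j - dist]) if j >= dist else prefix[j]
                  for j in range(len(prefix))]
        dist *= 2

    # Ripple inside each group, seeded with the carry from the prefix network.
    result = 0
    for j, start in enumerate(range(0, width, group)):
        carry = prefix[j - 1][0] if j > 0 else 0
        for i in range(start, min(start + group, width)):
            result |= (p[i] ^ carry) << i
            carry = g[i] | (p[i] & carry)
    return result
```

Setting `group=1` makes the prefix network cover every bit, which corresponds to the dense Kogge-Stone arrangement that the sparse tree is compared against.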
20-GHz 8 × 8-bit Parallel Carry-Save Pipelined RSFQ Multiplier
IEEE Transactions on Applied Superconductivity · 2012-11-21 · 22 citations
Article · 1st author · Corresponding
We will discuss the microarchitecture, design, and testing of the first 8 × 8-bit (by modulo 256) parallel carry-save RSFQ multiplier implemented using the ISTEC 10-kA/cm² 1.0-μm fabrication technology. Partial products are asynchronously generated and sent to the reduction stage at the internal “hardwired” rate of 80 GHz. The 8 × 8-bit RSFQ multiplier uses a two-level parallel carry-save reduction tree that significantly reduces the multiplier latency. The 80-GHz carry-save reduction is implemented with asynchronous data-driven wave-pipelined [4:2] compressors built with toggle flip-flop cells. The design has mostly regular layout with both local and global connections between modules. The multiplier core (without SFQ-to-DC and DC-to-SFQ converters) has 5948 Josephson junctions occupying the area of 3.5 mm². The multiplier is designed with the target operation frequency of 20 GHz and has the latency of 447 ps at the bias voltage of 2.5 mV. Despite some challenges due to fabrication process parameter variations and flux trapping, the multiplier chip was fabricated and successfully tested for the vast majority of test vectors by the Stony Brook designers with the assistance of colleagues from Yokohama National University in February 2012. While multiplier test operations were generated at low frequency, each of these operations was executed at the “hardwired” rate of 80 GHz. The fabricated chip operated with the measured DC bias margins of ±5%.
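The two-level carry-save reduction described in this abstract can be modeled functionally in a few lines. In the sketch below, eight partial products are reduced to four and then to two by [4:2] compressors, and a final addition produces the product modulo 256. Building each [4:2] compressor from two 3:2 carry-save adders is a common textbook construction and an assumption here; the chip itself uses asynchronous wave-pipelined compressors built with toggle flip-flop cells, which this word-level model does not attempt to represent.

```python
MASK = 0xFF  # results are kept modulo 256, as in the abstract


def csa(a, b, c):
    """3:2 carry-save adder: returns (sum, carry) with a + b + c == sum + carry (mod 256)."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s & MASK, carry & MASK


def compress_4_2(a, b, c, d):
    """[4:2] compressor modeled as two chained 3:2 carry-save adders."""
    s1, c1 = csa(a, b, c)
    return csa(s1, c1, d)


def multiply_mod256(a, b):
    # One partial product per bit of b, truncated to 8 bits.
    partial = [(a << i) & MASK if (b >> i) & 1 else 0 for i in range(8)]
    # Level 1: 8 partial products -> 4 values.
    level1 = compress_4_2(*partial[:4]) + compress_4_2(*partial[4:])
    # Level 2: 4 values -> 2 values.
    s, c = compress_4_2(*level1)
    # Final carry-propagate addition.
    return (s + c) & MASK
```

Because every carry-save step preserves the running sum modulo 256, carries are only propagated once, in the final addition.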
IEEE Transactions on Applied Superconductivity · 2012-11-21 · 34 citations
Article · 1st author · Corresponding
This paper describes the design and testing of an 8-bit asynchronous wave-pipelined sparse-tree Rapid Single Flux Quantum (RSFQ) Arithmetic Logic Unit (ALU). Compared to previously developed RSFQ ALUs, this unit features an extensive set of 8 arithmetic and 12 logical operations. The execution of ALU operations consists of two steps. First, when necessary, one or both operands are inverted, and then operations are performed on these pre-processed data. Unlike the RSFQ Kogge-Stone-based designs, our parallel-prefix sparse-tree ALU has significantly reduced circuit complexity while maintaining robust operational margins at high frequency. An 8-bit ALU has been implemented with the International Superconductivity Technology Center (ISTEC) 10 kA/cm² 1.0 μm 9-metal ADP2.1 fabrication process as a joint effort between Stony Brook University, Yokohama National University, and Nagoya University. Using the CONNECT cell library and SFQ CAD tools developed at Nagoya and Yokohama, the Stony Brook team has developed a complete logical and physical design of the ALU chip. The 8-bit ALU core (without SFQ-to-dc and dc-to-SFQ converters) consists of 8832 Josephson junctions with an area of 7.2 mm². Simulations show that the ALU can operate at the maximum rate of 42 GHz. It has the latency of 374 ps at a bias voltage of 2.5 mV. The chip was fabricated and tested at low frequency in 2012. Testing results showed malfunctioning of some gates but despite these shortcomings we still verified several ALU operations with the measured DC bias voltage margins of ±1.8%.
Frequent coauthors
- 16 shared
P. Bunyk
D-Wave Systems (Canada)
- 9 shared
Dmitry Zinoviev
Suffolk University
- 9 shared
Artur K. Kasperek
Stony Brook University
- 6 shared
Christopher L. Ayala
Yokohama National University
- 5 shared
Q.P. Herr
- 5 shared
A. H. Silver
Yokohama National University
- 4 shared
Anubhav Sahu
Hypres (United States)
- 4 shared
L.A. Abelson
Northrop Grumman (United States)
Labs
Electrical and Computer Engineering · PI
Education
- 2005
Ph.D., Computer Science
University of California, Los Angeles
- 2001
M.S., Computer Science
University of California, Los Angeles
- 1999
B.S., Computer Science
University of California, Los Angeles