Introduction

This short survey post gives you a basic overview of simulation and emulation software for microprocessors and the ecosystems around them (caches, memories, busses, peripherals, etc.), so basically everything that falls into the category of computer architecture. Whenever the words simulation and emulation are used in the following, they refer to this context.
In general, this post does not dive into the technical details of how to implement such a simulator/emulator, but rather looks at the topic from a high-level perspective. I hope that in the end the following questions are answered:

  • What is a simulator/emulator? How do they differ?
  • Which simulators/emulators exist?
  • What can I expect from a simulator/emulator?
So, let's start with the first question :)

What is a Simulator/Emulator?

Do you know that feeling when a simple question about a certain topic makes you question your expertise? In my case it was the question in the heading above.
If you asked me, a guy pursuing a PhD in computer engineering, "What is a simulator/emulator?", I'd confidently tell you something about how they mimic real-world computer systems and how they have dozens of use-cases ranging from software and hardware co-development to efficient design space exploration.
Well, that answer is probably not wrong, but what distinguishes an emulator from a simulator?
Hrm, I didn't have an answer to that question, so I consulted my typical sources: IEEE search, Wikipedia, Stack Overflow, Google, etc.
Let me already give you a short conclusion: They don't know either.
Most sources distinguish simulators and emulators similarly, but a significant number of sources define these terms in exactly the opposite way. And then there are some people who basically don't care and use these terms interchangeably...
In the following I'll begin with my "personal" definition, which reflects the majority of sources, and then proceed with some contrary opinions.

The Majority Definition

In theory, the basic difference between simulation and emulation according to most sources can be easily defined as follows [1]:

  • A simulator mimics the inner and outward behaviour of a system
  • An emulator mimics only the outward behaviour of a system

So, when using an emulator we regard our system as a black box and only care about the input-output relation (see Fig. 1). A good example of this are video game console emulators like Bleem! from the Bleem Company, which allowed the user to play PlayStation 1 (PS1) games on a desktop PC [2]. In this case the user has no interest in how the emulator mimics the PS1 system. Providing the same input-output relation, and therefore the same gaming experience as the PS1, is sufficient. In fact, Bleem! even had better graphics than the PS1.
In a simulation, on the other hand, the internal states of the system are also modelled, as they are of particular interest (see Fig. 1). For example, gem5 [3] can simulate an x86 processor and provide information about the exact state of the pipeline or the caches. Yet it also shows the correct input-output behaviour.
Fig.1 - Simulation vs. Emulation

In practice, however, the border between emulation and simulation is often blurred, because a simulation rarely models a target system perfectly and an emulator rarely gives no insight at all into its inner workings. Furthermore, any internal state can also be regarded as an output variable.
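To make the black-box/white-box distinction a bit more tangible, here is a minimal sketch. It assumes a purely hypothetical target, a shift-and-add multiplier for non-negative integers; the function names and the trace format are invented for illustration and are not taken from any real tool.

```python
def emulate_multiplier(a, b):
    """Black box: reproduce only the input-output relation of the target."""
    return a * b


def simulate_multiplier(a, b):
    """White box: model a hypothetical shift-and-add multiplier step by step.

    Returns the product plus a trace of the internal accumulator per cycle.
    Assumes non-negative integer operands.
    """
    acc = 0
    cycle = 0
    trace = []
    multiplicand, multiplier = a, b
    while multiplier:
        if multiplier & 1:          # lowest bit set: add the shifted multiplicand
            acc += multiplicand
        multiplicand <<= 1          # shift for the next bit position
        multiplier >>= 1
        cycle += 1
        trace.append({"cycle": cycle, "acc": acc})
    return acc, trace


result_emu = emulate_multiplier(6, 5)
result_sim, trace = simulate_multiplier(6, 5)
```

Both functions produce the same product, but only the second one can tell you how many cycles the modelled hardware would have needed. Returning the trace also shows how an internal state can simply be treated as one more output.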

It can therefore sometimes be useful to define the terms not only by how the simulator/emulator works, but also by their use-case [1]:

  • Emulators are often used as a substitute for the target system.
  • Simulators are often used for research and study.
These definitions fit well with the examples mentioned above. A gamer would probably use the Bleem! emulator because he or she cannot afford a real PlayStation 1. It is therefore a substitute for the target system.
A simulator like gem5 targets computer architecture researchers, who might want to study the performance of some cool new cache protocol they invented.
In addition, emulators are usually several orders of magnitude faster than simulators when modelling the same system, but more on that later.

Contrary Definitions

- IEEE search

More Definitions...

In the field of computer architecture the terms simulation and emulation are often replaced by other terms that provide more detail about the system being mimicked. Or they are just some fancy marketing terms... Anyway, here's a list of the most common simulator/emulator variants used in literature and industry:

Instruction Set Simulator (ISS)
An ISS usually simulates the behaviour of a microprocessor. This is more than the name might suggest, as many ISSs do not only simulate the instruction set but also the architecture of a specific processor, like ARMulator [4].

Full-System Simulator (FSS)
Basically an ISS plus simulated hardware such as caches, busses, or peripheral devices. A popular example is gem5 [3]. Accordingly, if the full system is emulated rather than simulated, one speaks of a Full-System Emulator (FSE). Here, a popular example is QEMU (short for Quick Emulator) [23].

Virtual Prototype (VP)
This is just another term for Full-System Simulator/Emulator which is often used by companies. Depending on the company, the term Virtual Platform is also used equivalently (luckily it has the same acronym). To verify this, let's have a look at definitions from several companies and organizations.

ESA [5]: A Virtual Platform is a software based system that can fully mirror the functionality of a target System-on-Chip or board.
Imperas [6]: Virtual Platforms are really just another name for "simulation of your system".
Synopsys [7]: Virtual prototypes are fast, fully functional software models of complete systems that execute unmodified production code and provide unparalleled debug efficiency.

Imperas even divides this term further into Hardware VPs (basically a Full-System Simulator) and Software VPs (basically a Full-System Emulator).
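The step from an ISS to a full-system simulator in the list above can be sketched in a few lines. The following is only a toy model under stated assumptions: the accumulator ISA (LOADI, ADDM, STORE, HALT), the class names, and the direct-mapped cache geometry are all invented for illustration and do not correspond to any of the tools mentioned.

```python
class DirectMappedCache:
    """Hypothetical direct-mapped cache model that only counts hits and misses."""

    def __init__(self, num_lines=4, line_size=4):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines   # block number currently held by each line
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        block = addr // self.line_size
        index = block % self.num_lines
        if self.tags[index] == block:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[index] = block     # fill the line on a miss


class ToyISS:
    """Instruction set simulator for an invented accumulator ISA."""

    def __init__(self, memory, cache=None):
        self.memory = list(memory)
        self.cache = cache               # optional: turns the ISS into a tiny "FSS"
        self.acc = 0
        self.pc = 0
        self.executed = 0

    def _touch(self, addr):
        if self.cache is not None:
            self.cache.access(addr)

    def run(self, program):
        while True:
            op, arg = program[self.pc]
            self.pc += 1
            self.executed += 1
            if op == "LOADI":            # load immediate into the accumulator
                self.acc = arg
            elif op == "ADDM":           # add a memory word to the accumulator
                self._touch(arg)
                self.acc += self.memory[arg]
            elif op == "STORE":          # write the accumulator to memory
                self._touch(arg)
                self.memory[arg] = self.acc
            elif op == "HALT":
                return self.acc
            else:
                raise ValueError(f"unknown opcode {op!r}")


# Sum memory[0..7] and store the result in memory[8].
program = [("LOADI", 0)] + [("ADDM", a) for a in range(8)] + [("STORE", 8), ("HALT", None)]
memory = list(range(1, 9)) + [0]
cache = DirectMappedCache(num_lines=4, line_size=4)
iss = ToyISS(memory, cache)
total = iss.run(program)
```

Without the cache object this is a plain ISS: it executes instructions and nothing else. Attaching the cache model adds the kind of internal statistics (hits, misses) that a full-system simulator reports, at the cost of extra work per memory access.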

Which Simulators/Emulators Exist?

To make it easy, let's begin with a list of computer architecture simulators and emulators first:

Name | Type | Last change | Target | License | Speed (IPS) | Source
--- | --- | --- | --- | --- | --- | ---
ARMISS | ISS | 2008 | ARM | proprietary | ? | [17]
ARMulator | ISS | 2003 | ARM | proprietary | ? | [18], [19]
Bochs | FSE | 2020 | x86 | LGPL | ? | [26], [27]
gem5 | FSS | 2020 | Alpha, ARM, MIPS, PowerPC, RISC V, SPARC, x86, Caches, Busses, NoCs, ... | BSD like | 0.2-3 MIPS | [3], [8], [9], [10]
MARSSx86 | FSS | 2019 | x86 | open-source | 0.15-0.3 MIPS | [24], [25]
or1kiss | ISS | 2020 | OpenRISC 1000 | Apache License 2.0 | 10-50 MIPS | [16]
OVPsim | FSS | 2020 | ARM, Synopsys ARC, MIPS, PowerPC, ... | parts are open-source, free of charge for non-commercial use | up to 1000 MIPS | [8], [15]
PTLsim | FSS | 2012 | x86 | GNU GPLv2 | ? | [8], [13]
QEMU | FSE | 2020 | ARM, Microblaze, SPARC, PowerPC, x86 | GPLv2 | 50-200 MIPS | [10], [22], [23]
Shade | ISS | 1994 | SPARC, MIPS | proprietary | ? | [21]
SimpleScalar | FSS | 2011 | Alpha, ARM, PowerPC, x86, Caches | open-source, custom license | 0.35-10 MIPS | [8], [14]
sniper | ISS | 2020 | x86 | MIT | ? | [29]
Synopsys DesignWare ARC nSIM | ISS | 2020 | Synopsys ARC | proprietary | 20 MIPS (interpretive), 475 MIPS (JIT) | [20]
Wind River Simics | FSS | 2020 | Alpha, ARM, MIPS, PowerPC, SPARC, x86 | proprietary | ? | [8], [11], [12]

And this is not everything... There are probably a few more simulators/emulators out in the wild (searching "Instruction Set Simulator" on GitHub yields 201 results). But this list is already a good starting point covering the most important ones.

What can I expect from a simulator/emulator?

As already mentioned above, the main differences for you as a user are the simulation/emulation speed, the amount of insight you get, and how accurate the results are. Let's start with the FSS gem5 as an example to understand how this works out in detail.

I guess one of the reasons why gem5 is so popular nowadays is its extensive yet well-arranged modularity and flexibility. For many components like caches or CPUs there are multiple versions at different levels of abstraction, ranging from nearly cycle-accurate out-of-order CPUs to simple atomic CPUs which just execute the instructions one by one without any pipeline. Intuitively, the out-of-order CPU will yield more accurate results but will be slower than its atomic counterpart. In fact, I ran a simulation and the atomic CPU was 5.7x faster than the out-of-order CPU. In general, the rule of thumb for simulation techniques is: you can trade simulation performance for accuracy and detail, as shown in Fig. 2.

Fig.2 - Performance vs. Accuracy/Detail
Anyway, what you get at the end of a simulation is a file with a lot of statistics. The statistics file of my out-of-order ARM system had over 6000 variables covering things like various cache hit rates, the number and type of executed instructions, the execution time, and so on. These statistics can be used, for example, to find a good configuration for a computing system that is still to be developed (this is also called design space exploration). However, you need to be patient, as full-system simulators are usually slow.
I simulated a full out-of-order, multi-core ARM SoC with busses, caches and DRAM. The result: 0.132273 MIPS. With the atomic CPU it was 0.757062 MIPS. This seems terribly slow (a Linux boot takes more than an hour at this speed), but other FSSs with this level of detail won't be much faster.
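The two MIPS figures above can be turned into a speedup and into rough wall-clock estimates with a little arithmetic. In the sketch below only the two MIPS values come from my measurement; the workload of one billion instructions is a hypothetical number chosen for illustration, not a measured Linux boot.

```python
O3_MIPS = 0.132273       # measured: out-of-order CPU model
ATOMIC_MIPS = 0.757062   # measured: atomic CPU model


def wall_clock_seconds(instructions, mips):
    """Host time needed to simulate `instructions` at a rate of `mips`."""
    return instructions / (mips * 1e6)


# Ratio of the two simulation speeds (the "5.7x" mentioned earlier).
speedup = ATOMIC_MIPS / O3_MIPS

# Hypothetical workload size, assumed only for this example.
WORKLOAD_INSTRUCTIONS = 1_000_000_000

o3_hours = wall_clock_seconds(WORKLOAD_INSTRUCTIONS, O3_MIPS) / 3600
atomic_minutes = wall_clock_seconds(WORKLOAD_INSTRUCTIONS, ATOMIC_MIPS) / 60

print(f"speedup atomic vs. out-of-order: {speedup:.1f}x")
print(f"out-of-order: {o3_hours:.1f} h, atomic: {atomic_minutes:.0f} min")
```

Under this assumed workload size the out-of-order model needs a bit over two hours of host time, while the atomic model finishes in roughly 22 minutes.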
A possible way to tackle this problem is parallelism. As most simulated systems are multi-core or multi-socket systems, one could try to parallelize the simulation itself at the right points. But there are some problems with that approach: First, some aspects like cache coherency are inherently sequential and cannot be parallelized. Second, this is easier said than done. To my knowledge only two FSSs make use of parallelism. The first one is dist-gem5 [9]. According to the paper it achieves a speedup of 83x compared to standard gem5. However, this is strictly limited to the simulation of highly distributed systems. If you simulate a typical smartphone multi-core system, you won't benefit from dist-gem5.
Parallelism also seems to be implemented in OVPsim, where this acceleration technique is called QuantumLeap. However, information about it seems to be intended for paying customers, as their website suggests [15].
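How much the sequential parts limit a parallel simulation can be estimated with Amdahl's law. This is a generic textbook formula that I am adding here for illustration; it is not taken from the dist-gem5 or OVPsim material, and the parallel fraction of 0.9 below is a made-up value.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only `parallel_fraction` of the work scales."""
    sequential_fraction = 1.0 - parallel_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / workers)


# Hypothetical simulation in which 90% of the work can run in parallel.
for workers in (2, 8, 64):
    print(workers, "host cores ->", round(amdahl_speedup(0.9, workers), 2), "x")
```

Even with an unlimited number of host cores, this hypothetical simulation could never run more than 10x faster, because the remaining 10% stays sequential.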

Looking at the table above, one can expect a performance of 0.1-10 MIPS from an open-source, sequential full-system simulator with today's technology. OVPsim, the only actively developed proprietary FSS, promises a speed of up to 1000 MIPS, which is something I won't believe until I see it. But I could imagine that this is related to the mentioned QuantumLeap technology and might work for some very special use-cases.
Using an instruction set simulator, one can expect around 10-50 MIPS. Again a proprietary product, this time from Synopsys, stands out with 475 MIPS. According to their website this is achieved by running a multithreaded JIT compiler.
Unsurprisingly, full-system emulators achieve the highest performance with approx. 50-200 MIPS. And you might achieve an even higher performance when using a parallelized one like PQEMU [28]. According to their paper, the performance can scale nearly linearly with the number of cores you provide to the simulation. But this speedup depends heavily on the workload executed.

One thing that has been left out so far is accuracy. The problem with accuracy is that it is difficult to make a general statement for a simulator (emulators don't have an accuracy in that sense, as they don't record any inner states/values). Depending on the value examined and the workload used, there are often huge differences. Butko et al. [8] show, for example, that the absolute execution time error (AETE) of gem5 ranges from 1.39% to 17.94%. The error therefore varies by more than a factor of 10 depending on the workload.
The accuracy of the individual values examined can also vary considerably. In a case study of PTLsim, the AETE is stated as 4.30%. However, the error of the DTLB miss rate is 245%.
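To get a feeling for these percentages, here is a small sketch. It assumes that such an error is the absolute deviation of the simulated value from the value measured on real hardware, relative to the measured value; the exact definitions used in the cited studies may differ. All sample measurements are invented so that they reproduce the percentages quoted above.

```python
def relative_error_percent(simulated, measured):
    """Absolute deviation of a simulated value from the real one, in percent."""
    return abs(simulated - measured) / abs(measured) * 100.0


# Hypothetical values: 10.00 s on real hardware vs. 10.43 s in the simulator.
aete = relative_error_percent(simulated=10.43, measured=10.00)

# Hypothetical DTLB miss rates: 2.0% on real hardware vs. 6.9% in the simulator.
dtlb_error = relative_error_percent(simulated=0.069, measured=0.020)

# Spread between the best and worst gem5 AETE reported in [8].
spread = 17.94 / 1.39

print(f"AETE: {aete:.2f}%, DTLB miss rate error: {dtlb_error:.0f}%, spread: {spread:.1f}x")
```

The same simulator run can therefore look quite accurate when judged by execution time and badly wrong when judged by a single microarchitectural counter.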

Summary

In this post a high-level introduction to computer architecture simulators and emulators was given. We answered three basic questions, which are summarized here in TL;DR style:

What is simulation/emulation?
Simulation is slow. Models all internal states. Used for study and research.
Emulation is fast. Only cares for the input-output relation. Used for substitution.

Which simulators/emulators exist?
A lot (see the list). They can primarily be distinguished by how they work and what they model.

What can I expect from a simulator/emulator?
Performance (with 2020 technology):
0.1-10 MIPS for FSS, 10-50 MIPS for ISS, 50-200 MIPS for FSE
This is just a rough classification. There might be some outliers.
Accuracy:
Really hard to make a precise statement.
AETE of gem5 and PTLsim somewhere in the range of 1%-18%.
Other values can be significantly inaccurate (more than 100% error).

If you have any questions or additions, feel free to write me an e-mail (see About).

References

[1] https://stackoverflow.com/questions/1584617/simulator-or-emulator-what-is-the-difference
[2] https://v1.escapistmagazine.com/articles/view/video-games/issues/issue_117/2295-Best-Little-Emulator-Ever-Made
[3] https://www.gem5.org/
[4] https://developer.arm.com/documentation/dui0058/d/armulator-basics/about-armulator
[5] VP at ESA
[6] VP at Imperas
[7] VP at Synopsys
[8] Butko et al. (2012): Accuracy evaluation of GEM5 Simulator System
[9] Mohammad et al. (2018): dist-gem5: Distributed Simulation of Computer Clusters
[10] dist-gem5 Tutorial at IEEE International Symposium on Workload Characterization (IISWC) 2017
[11] Magnusson et al. (2002): Simics: A full system simulation platform
[12] Simics Product Overview
[13] Yourst et al. (2007): PTLsim: A Cycle Accurate Full System x86-64 Microarchitectural Simulator
[14] Simple Scalar tutorial
[15] OVPsim website
[16] or1kiss
[17] ARMISS: An Instruction Set Simulator for the ARM Architecture
[18] ARMulator about
[19] ARMulator docs
[20] Synopsys DesignWare ARC nSIM about
[21] R. Cmelik, D. Keppel (1994): Shade: A Fast Instruction-Set Simulator for Execution Profiling
[22] QEMU website
[23] F. Bellard (2015): QEMU, a Fast and Portable Dynamic Translator.
[24] Patel et al. (2011): MARSS-x86: A QEMU-Based Micro-Architectural and Systems Simulator for x86 Multicore Processors
[25] MARSS-x86 website
[26] Bochs website
[27] Bochs in Linux Journal Volume 1996, Issue 29es
[28] Ding et al. (2011): PQEMU: A Parallel System Emulator Based on QEMU
[29] snipersim website