Pyramix and MassCore
FPGAs (field-programmable gate arrays) and DSPs (digital signal processors) were created for different purposes. DSPs were designed to provide an optimized platform for signal processing algorithms implemented in software, while FPGAs were initially created to provide glue logic. FPGAs are by far the superior choice for networking applications that move traffic at gigabit-per-second data rates. DSPs, meanwhile, have historically been the superior choice for video and audio applications. Today, however, there are many applications where the two device types overlap.
If we focus our analysis on one particular audio application, the mix engine, it is clear that as the number of channel strips and busses grows and sampling rates get higher, a digital audio designer has a growing incentive to consider an FPGA over a DSP for the sheer cost advantage that some of the latest-generation FPGAs offer (as low as 0.2 cent/MMAC, MMAC meaning a million multiply-accumulate operations). The raw computational power of these new devices makes it easy to build mixers offering over 200 input strips (with full multi-band EQ and dynamics on each channel) into 80 or more busses. This is simply impossible to achieve even with the fastest and most recent DSPs on the market; such a mix engine would have to be split over several tens of DSPs to achieve the same result. Given their fine-grained structure, FPGAs are also able to process audio with minimal latency and are hence optimal for ultra-low Live In to Live Out latency, which is clearly a major advantage for digital mix engines intended for live applications.
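To put these figures in perspective, the following Python sketch estimates the multiply-accumulate load of the bus-summing matrix alone. It assumes one MAC per (input, bus) pair per sample and reads the 0.2 cent/MMAC figure as a price per MMAC/s of sustained processing capacity; both are simplifying assumptions, EQ, dynamics, panning and metering are ignored, and the function names are hypothetical.

```python
def bus_matrix_macs_per_second(inputs: int, busses: int, sample_rate_hz: int) -> int:
    """MACs per second needed to sum every input into every bus.

    Assumption: each bus output sample is the sum of all input samples, each
    multiplied by a send gain, i.e. one MAC per (input, bus) pair per sample.
    """
    return inputs * busses * sample_rate_hz


def capacity_cost_cents(macs_per_second: float, cents_per_mmac: float) -> float:
    """Silicon cost, assuming the price is quoted per MMAC/s of capacity."""
    return macs_per_second / 1e6 * cents_per_mmac


if __name__ == "__main__":
    load = bus_matrix_macs_per_second(inputs=200, busses=80, sample_rate_hz=48_000)
    print(f"{load / 1e6:.0f} MMAC/s")  # 768 MMAC/s for the matrix alone
    print(f"{capacity_cost_cents(load, 0.2):.1f} cents at 0.2 cent/MMAC")
```

At 96 kHz the same matrix needs twice as much, about 1.5 GMAC/s, before any per-channel processing is added.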
There are, however, a number of issues that made the FPGA a much less attractive proposition for Merging. Listed below are some of the most obvious drawbacks:
1) Since not only the algorithms but also the underlying fabric (or structure) of the signal path must be fully defined in an FPGA, typical high-level languages such as C or C++ cannot be employed. One has to resort to a rather low-level design-entry language (such as VHDL, the VHSIC Hardware Description Language), which takes us back to the Middle Ages of DSP programming, when everything had to be written in assembly language.
2) While relatively simple structures such as a mix buss, an equalizer or even a dynamics section are fairly straightforward to program in such low-level languages, more complex algorithms (such as a reverb or a pitch/time compression-expansion plugin) represent a major challenge to implement in an FPGA, not only at the design stage but throughout the entire optimization and maintenance life cycle.
3) Finally, one of the most significant drawbacks when designing with large FPGAs is how slow and tedious the process can be. At each design and debugging iteration, an enormous amount of time is wasted waiting for the new design to be compiled into a configuration that can be tested on the chip. Every new compilation may take up to several hours (typically an overnight job) before the result of even the simplest modification can be evaluated. The same job for a DSP, or for code running on an x86 CPU, will take at most a few minutes per iteration.
Mykerinos PCI DSP card
We already had quite extensive experience with DSP-based audio engines: our first ISA DSP card design dates back to 1994, and our current VLIW (Very Long Instruction Word) DSP-based board was first put on the market in 1999. Two years ago, we started a complete re-evaluation of the state of the art. We probably looked at and tried just about every possible solution, even the most exotic components, some of them containing up to 96 parallel processing CPUs in a single device. We also designed an entire mix engine capable of mixing hundreds of channels on a brand-new FPGA, and we still have it on our prototype bench, but we were striving for a more elegant and straightforward solution.
In accordance with Moore’s law, one would expect the power of all these solutions to roughly double every 18 months. On closer analysis, while DSPs and FPGAs came quite close to following Moore’s law, nothing could beat the tremendous performance scaling that the x86 platform has enjoyed over the past years. As we already had considerable experience on the Native side with our Pyramix Native software, we were looking for a way to combine all the advantages of native x86 programming (high-level languages, easy-to-use design, maintenance and debugging tools, the speed and flexibility to test a number of design variants, etc.) with the advantages of FPGA- or DSP-based designs (ultra-low, deterministic latencies under a strict real-time operating system). While it may now seem a rather obvious solution, coming up with MassCore (patent pending) was not a trivial task.
MassCore (yet another solution combining the best of both worlds): what is it?
After many battles and many attempts to control (and limit) the impact of the Windows OS on the real-time response of an audio engine, with its formidable ultra-low round-trip latency requirements, we concluded that it was basically impossible to prevent such a general-purpose operating system from occasionally disrupting everything by delaying a critical audio task by as much as 10 to 20 ms (if not more). The next question was therefore: would it be possible to keep an audio engine running on an x86 CPU (such as a Pentium) while isolating it totally from the Windows OS?
At the same time, the first Core2 Duo and Core2 Quad processors were being introduced by Intel, with similar multicore processors introduced by AMD. It then seemed quite tempting to try the following: isolate one of the cores of a CPU from Windows entirely, while leaving Windows to operate on the remaining cores and accomplish all the rather mundane tasks of updating the screen, fetching (or storing) data from/to hard disks, receiving new user input from the keyboard or the mouse, transferring files over networks, etc. In brief, Windows would keep all the tasks it is rather good at, while the audio engine is kept as far away from it as possible. The way we made it work is basically to hand the core on which the audio mix engine runs over to a true real-time kernel, which takes possession of that core and even makes it essentially invisible to Windows. It was with great amazement that we soon realized this was not only possible, but that the achievable deterministic latency was even better than what we had managed until then with our DSP-based cards. With guaranteed response times in the order of a couple of microseconds (compared to tens of milliseconds under Windows), we were able to bring the total Live-in to Live-out audio processing delay of the entire mix engine (including complex plugins) down to under 1.4 ms, which is at least a factor of 10 better than what even the best fine-tuned Native systems can achieve. The best part, however, was yet to come. By relaxing the latency constraints slightly (to, say, 5.4 ms, which is still acceptable for all but the most stringent live mixing situations), we just couldn’t believe how much raw power was available from a single core of those new Core2 Duo or Core2 Quad processors.
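Why scheduling jitter matters so much can be shown with a simplified deadline model: an audio block must be finished before the next block is due. In this Python sketch, the 32-sample buffer, the 0.3 ms compute time and the function names are hypothetical values chosen for illustration, not MassCore’s actual settings.

```python
import math


def buffer_period_ms(buffer_samples: int, sample_rate_hz: float) -> float:
    """Time between two consecutive audio blocks, in milliseconds."""
    return 1000.0 * buffer_samples / sample_rate_hz


def misses_deadline(buffer_samples: int, sample_rate_hz: float,
                    wakeup_delay_ms: float, compute_ms: float) -> bool:
    """True if a block cannot be completed before the next one is due.

    Simplified model: the audio task becomes ready at the start of a period,
    is only scheduled after `wakeup_delay_ms`, then needs `compute_ms` of CPU
    time. Any overrun of the period is an audible dropout.
    """
    return wakeup_delay_ms + compute_ms > buffer_period_ms(buffer_samples, sample_rate_hz)


def min_buffer_samples(wakeup_delay_ms: float, compute_ms: float,
                       sample_rate_hz: float) -> int:
    """Smallest block size whose period absorbs the given worst-case delay."""
    return math.ceil((wakeup_delay_ms + compute_ms) * sample_rate_hz / 1000.0)


if __name__ == "__main__":
    fs = 48_000
    # Hypothetical 32-sample block: about 0.67 ms between deadlines.
    print(misses_deadline(32, fs, wakeup_delay_ms=10.0, compute_ms=0.3))   # True: a 10 ms OS stall
    print(misses_deadline(32, fs, wakeup_delay_ms=0.002, compute_ms=0.3))  # False: ~2 µs wakeup
    print(min_buffer_samples(20.0, 0.0, fs))  # 960 samples just to ride out a 20 ms stall
```

The last line illustrates the trade-off: absorbing a 20 ms worst-case stall under a general-purpose scheduler would need a block of roughly 960 samples (20 ms) at 48 kHz, whereas a microsecond-level wakeup leaves almost the whole period for audio processing.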
We kept adding input strip after input strip and mix buss after mix buss, and quickly reached the limit of what our software (Pyramix V5 at the time) was able to support, while the CPU usage still showed a moderate 10 to 20 percent of the total available power. We immediately spent the next few days feverishly lifting all the software limits we had put in place in the code, and discovered to our great astonishment that the real limit was well beyond anything we had envisioned in the first place. Not only could we build mix engines offering up to 384 concurrent channels on over 100 mix and aux busses, including full EQ and dynamics on all input strips plus generous finalizing plugins on the master outputs, at a 48 kHz sampling rate; we were also able to create the first mix engine that could do all that and then be reconfigured within seconds to mix, for instance, 48 channels in DXD (352.8 kHz). Any other combination of sample rate in between is of course also possible, such as 96 channels at 192 kHz.
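The channel counts quoted above follow roughly from a fixed budget of channel-samples per second. The Python sketch below, with hypothetical function names, checks this first-order model; it ignores busses, plugins and any per-rate overhead, so it is an approximation only.

```python
def channel_samples_per_second(channels: int, sample_rate_hz: int) -> int:
    """Total samples per second the engine must process across all channels."""
    return channels * sample_rate_hz


def channels_at_rate(budget: int, sample_rate_hz: int) -> int:
    """Channels that fit a given channel-samples-per-second budget."""
    return budget // sample_rate_hz


if __name__ == "__main__":
    quoted = [(384, 48_000), (96, 192_000), (48, 352_800)]  # figures from the text
    for channels, fs in quoted:
        print(f"{channels:4d} ch @ {fs / 1000:g} kHz -> "
              f"{channel_samples_per_second(channels, fs) / 1e6:.2f} M samples/s")
    budget = channel_samples_per_second(384, 48_000)
    print(channels_at_rate(budget, 96_000))   # 192
    print(channels_at_rate(budget, 352_800))  # 52
```

The 48 kHz and 192 kHz figures land on exactly the same budget (18.432 million samples per second), while the quoted 48 DXD channels sit a little below the 52 that the linear model predicts, so the model should be read as a rule of thumb.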
Such a large mix engine of course also needs commensurate audio I/O, which is something one will not find by default in a general-purpose computer. This is where our existing Mykerinos range of DSP cards comes into action. While their DSP power is no longer needed in full, since the main mixing now takes place on a Pentium core, they are still put to good use for everything related to I/O acceleration and handling. Their I/O and sync capabilities remain in full use, both to support the I/O requirements with MADI, AES, SDIF, TDIF or analog daughtercards and to provide the heartbeat to the MassCore engine from whatever sync source is selected (Wordclock, video or HDTV sync, LTC, etc.). This alone is probably one of the most cost-effective side effects of the MassCore technology. Even our earliest customers, who might have purchased their first Mykerinos card as far back as 1999, can enjoy the power of MassCore almost 10 years later, without having had to scrap their entire hardware investment two or three times in the meantime. It’s almost like giving their investment a second life and keeping it young for many more years!
MassCore and plugins (VS3 and now VST)
Well, while it is nice to come up with one of the most powerful mix engines available these days, many of our customers do not necessarily need such a huge number of tracks, channels and mix busses. However, one almost universally sought-after feature is the ability to insert numerous plugins of their choice at every stage of the mix engine, be it in the strips, the auxes or the main busses, without limitations (or with as few as possible). While Pyramix V5 provided quite restricted connectivity to VST plugins (limited to playback strips), Pyramix V6 and MassCore make it possible to connect any VST plugin (or chain of plugins) in any conceivable order at every possible stage of the mix engine, limited only by the total CPU and memory available in the system. On top of that flexibility, MassCore also automatically compensates for any VST plugin’s latency and provides full automation over all the automatable parameters of such plugins. Furthermore, the fact that the MassCore engine operates on a core sitting just a few microns away from the other core(s) within the same chip provides an ideal performance boost: the buffers used to transfer audio data between MassCore and Windows share the same CPU memory, which avoids many costly data transfers over PCI or PCIe busses from/to dedicated DSP or FPGA hardware. Even better, in most cases this data will already be available in the on-chip cache memory of the multi-core CPU itself, removing any need to fetch it from slower external memory.
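Automatic plugin delay compensation is commonly done by delaying every signal path that feeds a bus so that it lines up with the slowest one. The Python sketch below shows that general technique; the path names, plugin latencies and helper names are hypothetical, and it is not presented as Pyramix’s actual implementation.

```python
from collections import deque


def chain_latency(plugin_latencies: list[int]) -> int:
    """Plugins in series add their reported latencies (in samples)."""
    return sum(plugin_latencies)


def compensation_delays(path_latencies: dict[str, int]) -> dict[str, int]:
    """Extra delay, in samples, for each path feeding the same bus.

    The slowest path is the reference; every other path is delayed by the
    difference so that all signals reach the bus sample-aligned.
    """
    worst = max(path_latencies.values())
    return {name: worst - latency for name, latency in path_latencies.items()}


class DelayLine:
    """Fixed integer-sample delay: a FIFO pre-filled with silence."""

    def __init__(self, samples: int) -> None:
        self._fifo = deque([0.0] * samples)

    def process(self, x: float) -> float:
        self._fifo.append(x)
        return self._fifo.popleft()


if __name__ == "__main__":
    # Hypothetical strips and reported plugin latencies, in samples.
    paths = {
        "vocal": chain_latency([64, 0]),   # e.g. look-ahead limiter, then a zero-latency EQ
        "drums": chain_latency([]),        # no plugins
        "bass": chain_latency([1024]),     # e.g. a linear-phase EQ
    }
    print(compensation_delays(paths))  # {'vocal': 960, 'drums': 1024, 'bass': 0}
```

Each path then gets a `DelayLine` of the computed length, so the cost of compensation is memory for the delay buffers rather than extra processing.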
So while our users can now enjoy the most successful plugin platform in the best possible way, choosing from hundreds if not thousands of plugins written to the VST standard, there is still a small snag that we would like to share with you. While the MassCore engine runs on a tightly controlled platform (timewise), this is unfortunately not the case for VST plugins, which are designed to operate under Windows (or OS X in the Mac world) and therefore cannot currently enjoy the strict deterministic behaviour of MassCore. In fact, as soon as even a single VST plugin is inserted in the MassCore engine, the entire system’s latency must be relaxed to cope with the much larger audio buffers required by VST plugins (usually ranging from 256 to well over 1000 samples in the worst case). So while all VS3 plugins have been ported to run under MassCore and benefit from the ultra-low latency inherent to the real-time kernel, VST plugins were never conceived to live outside of Windows or OS X.
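To translate the VST buffer sizes quoted above into time: a block of N samples at sample rate fs lasts 1000 × N / fs milliseconds. A quick Python check, using a hypothetical helper name:

```python
def samples_to_ms(samples: int, sample_rate_hz: float) -> float:
    """Duration of one audio buffer in milliseconds."""
    return 1000.0 * samples / sample_rate_hz


if __name__ == "__main__":
    for size in (256, 512, 1024):
        for fs in (48_000, 96_000):
            print(f"{size:5d} samples @ {fs // 1000} kHz = {samples_to_ms(size, fs):6.2f} ms")
```

A single 256-sample buffer at 48 kHz already lasts about 5.3 ms and a 1024-sample buffer about 21.3 ms, and any further buffering in the signal path adds to that, which is why one VST insert is enough to move the whole engine out of the ultra-low-latency range.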
VS4L: merging VS3 (Merging’s own proprietary plugin architecture) and the VST platform to benefit from MassCore
We strongly believe that, just as the mix engine was ported to run under a real-time kernel, a number of VST plugins could soon follow. VST plugins ported to VS4L can run under MassCore and benefit from a tremendous improvement in their worst-case latency, going from several tens of milliseconds to virtually zero, which makes them suitable for live applications for the first time. Merging has recently started the VS4L (Virtual Studio for Live) initiative and is talking with a limited number of VST plugin manufacturers about porting some of their highest-grade plugins to what could become the next step for the very successful VST architecture. The great news is that initial tests showed that a properly written VST plugin can be ported to the MassCore environment in a matter of a few hours, often requiring little more than a simple recompilation with the proper tools. Since several VST plugins could be ported in a single afternoon, and no tedious rewriting of their code is necessary, it becomes virtually a no-brainer for VST manufacturers. Furthermore, the MassCore version of their plugins enjoys additional protection against piracy, since it will not work on a plain standard Windows PC. As it is Merging’s intention to make VS4L an open format, we will gladly welcome additional VST partners over the course of this summer.
MassCore, a bright future
Well, that might be the best part of the MassCore technology, as it is indeed possible to envisage a future where not just one but several cores share a huge mix engine. Merging is already prototyping a solution where up to 7 cores of an octo-core system (out of 8, as we are nice guys and will let Windows do its stuff on one of them) could be running MassCore, so stay tuned, as this could be announced as an upgrade in the near future. And with Moore’s help, there’s little doubt that 16- or 32-core systems are not that far away!
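Sharing one mix engine among several cores raises the question of how to split the work. One common heuristic, shown here purely as an illustrative assumption and not as Merging’s design, is longest-processing-time-first scheduling: sort the channel strips by estimated load and give each one to the currently least-loaded core. The strip names, loads and function name below are hypothetical.

```python
import heapq


def assign_strips(strip_loads: dict[str, float], cores: int) -> list[list[str]]:
    """Greedy longest-processing-time-first assignment of strips to cores.

    Heaviest strips are placed first, each on the core with the smallest
    total load so far (ties go to the lower core index).
    """
    heap = [(0.0, core) for core in range(cores)]  # (accumulated load, core index)
    heapq.heapify(heap)
    assignment: list[list[str]] = [[] for _ in range(cores)]
    for name, load in sorted(strip_loads.items(), key=lambda kv: kv[1], reverse=True):
        total, core = heapq.heappop(heap)
        assignment[core].append(name)
        heapq.heappush(heap, (total + load, core))
    return assignment


if __name__ == "__main__":
    loads = {"vox": 5.0, "gtr": 4.0, "kick": 3.0, "snare": 3.0}  # e.g. % of one core
    print(assign_strips(loads, cores=2))  # [['vox', 'snare'], ['gtr', 'kick']]
```

On the 8-core system described above, this would be called with `cores=7`. A real engine would also have to respect dependencies, since every strip feeding a bus must be finished before that bus is summed, which this sketch ignores.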