
Multicore Processing

What Is Multicore Processing?


Multicore (or multi-core) processing runs software, designed either as parallel or asynchronous tasks or as multiple independent applications, across the cores of a multicore processor. The efficiency of multicore processing depends on how well the software is optimized to take advantage of the multiple cores, on the composition of the multicore processor, and on the speed of the external interfaces and related hardware components in the system. Multicore processing can be especially useful for low-latency applications, where the largest performance gain is likely to be improved response time while running CPU-intensive processes.

Interphase Multicore Solutions

Multicore Processors - A Hardware View

Multicore Software

Multicore and Virtualization

Key Multicore Applications


Interphase Multicore Solutions


Interphase offers a family of wire-speed packet processing cards for use in many packet processing applications such as the delivery of broadband services in the 3G Wireless, Voice Over IP, and IMS network infrastructure. These iSPAN® cards are extremely versatile and provide the functionality necessary for migrating to next-generation infrastructures and converged networks. They are ideal for designing into wireless network elements such as node-B/RNCs, media servers, line-rate encryption and security functions, VoIP, stateful protocol identification, edge/access routers, and deep packet inspection (DPI) uses such as policy enforcement applications.




Multicore Processors - A Hardware View


A multicore processor is a single integrated circuit chip or die that includes more than one processing unit. The cores share the same interconnect to the rest of the system. Each core may independently implement optimizations such as superscalar execution (a CPU architecture that allows more than one instruction to be executed in one clock cycle), pipelining (a standard feature in RISC processors that works much like an assembly line: the processor works on different steps of several instructions at the same time), and multithreading (a specialized form of multitasking enabling concurrent execution of pieces of the same program). Using multiple processors on a single piece of silicon:

    • Enables increased parallelism
    • Saves space on the printed circuit board, which enables smaller-footprint boards and related cost savings
    • Reduces the distance between processors, which enables faster intercommunication with less signal degradation than if signals had to travel off-chip between processors
    • Reduces the dependence on growth of processor speeds and the related increasing gap between processor and memory speeds, the additional power needed to run higher-speed processors, and the difficulty in finding enough parallelism in the instruction stream of a single process to efficiently use higher-speed processors


Typical Multicore Processor Design

Illustration source: Cavium Networks


In addition to general-purpose multicore processors, there are specialized versions for certain applications:

    • Network services processor (NSP), which is optimized for network service providers in the converged communication network
    • Multimedia
    • Recognition


Here is a closer look at some of the features that make multicore NSPs particularly suited for network services.

    • Optimized for packet handling with massive computation power required for providing network services that involve deep packet inspection for all network traffic
    • Use multiple cores with separate packet-scheduling hardware
    • Use a reduced instruction set based on a standard instruction set architecture (ISA), so software architects and developers can leverage an existing code base and use popular development environments with standard programming models
    • Provide an on-chip packet workload balancer that simplifies software programming and allows multiple cores to coordinate work on various packets and flows without stepping on each other's toes
    • Offer operating system support for Linux or a real-time operating system (RTOS)
    • Include hardware accelerators for security encryption and hashing, which eliminate not only off-chip co-processors but also the system interconnect bottlenecks caused by off-core traffic for each packet
    • Built-in TCP co-processors since TCP termination is an important aspect of application-aware services and emerging storage networking applications
    • Available from multiple vendors such as Cavium and RMI
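
The workload balancing mentioned in the list above is done in hardware on an NSP, and each vendor has its own scheme. As a rough software illustration of the underlying idea only, the sketch below hashes a flow identifier to a core index so that every packet of a flow is handled by the same core and no flow state has to be shared between cores. The names `FlowKey`, `core_for_flow`, and `distribute` are hypothetical, and the use of CRC32 is an assumption chosen for simplicity, not a description of any vendor's balancer.

```python
import zlib
from collections import namedtuple

# Hypothetical 5-tuple identifying a flow; the field names are illustrative.
FlowKey = namedtuple("FlowKey", "src_ip dst_ip src_port dst_port proto")


def core_for_flow(flow, num_cores):
    """Map a flow to a core index so all packets of the flow land on one core."""
    key = "|".join(str(field) for field in flow).encode()
    # CRC32 is deterministic across runs, unlike Python's built-in str hash.
    return zlib.crc32(key) % num_cores


def distribute(packets, num_cores):
    """Group (flow, payload) pairs into per-core queues, keeping arrival order."""
    queues = [[] for _ in range(num_cores)]
    for flow, payload in packets:
        queues[core_for_flow(flow, num_cores)].append((flow, payload))
    return queues
```

Because the mapping depends only on the flow identifier, packets of one flow stay in order on one core, while different flows spread across the available cores.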


Heat dissipation and data synchronization are issues that arise with the use of multicore processors. Heat dissipation must be considered when designing high-powered, multicore-driven applications: the design of the board or card that carries the multicore processor plays a major role in keeping system heat within the acceptable range, as do the system cooling methods. The data synchronization issues can be addressed by the multicore and application software, as covered in the next section.



Multicore Software


Managing concurrency is crucial in developing parallel applications using any multiprocessing model. When designing such applications, the developer must consider: 1) partitioning, which is identifying individual tasks and deciding which tasks can be executed in parallel, 2) mapping the information flow, which is determining how tasks communicate where they must share data, 3) combining or agglomerating the smaller tasks and deciding whether to replicate or share data or computations, and 4) mapping where each task is to execute.
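
The four design considerations above can be made concrete with a small sketch. The example below sums the sizes of a list of packets: the work is partitioned into independent tasks, the tasks are agglomerated into chunks, the only communication is the return of partial sums, and a standard-library executor maps chunks to workers. The function names and the chunk size are illustrative assumptions. Note that in CPython a thread pool does not run pure Python code on several cores at once; `ProcessPoolExecutor` offers the same interface when real CPU parallelism is needed.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, chunk_size):
    """Steps 1 and 3: split the work into independent tasks, grouped into chunks."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def process_chunk(chunk):
    """One task: needs no data from other tasks and returns only a partial sum."""
    return sum(len(packet) for packet in chunk)


def total_bytes(packets, workers=4, chunk_size=256):
    chunks = partition(packets, chunk_size)
    # Step 4: the executor decides which worker executes each chunk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    # Step 2: the only information flow is combining the partial results.
    return sum(partials)
```

Larger chunks reduce scheduling overhead per task, while smaller chunks balance the load more evenly; the right trade-off depends on the application.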


Multicore software can be implemented in various forms.¹ In a system implementing symmetrical multiprocessing (SMP), multiple cores are essentially interchangeable as they execute the operating system and tasks. A variation on SMP uses affinity or CPU reservation to bind specific tasks to specific cores, which effectively makes them dedicated processors.
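
On Linux, affinity can be tried out from user space. The sketch below uses Python's `os.sched_setaffinity` and `os.sched_getaffinity`, which are only available on platforms that support them (Linux in practice); `pin_to_core` is a hypothetical helper name, and the example assumes the caller is allowed to change its own affinity.

```python
import os


def pin_to_core(core_id):
    """Bind the calling process to a single core.

    Returns the resulting affinity set, or None where the platform
    does not expose the scheduler affinity calls.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # not supported on this platform
    allowed = os.sched_getaffinity(0)  # 0 means the calling process
    if core_id not in allowed:
        raise ValueError(f"core {core_id} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {core_id})
    return os.sched_getaffinity(0)
```

After a successful call, the scheduler runs the process only on the chosen core, which is the software side of treating that core as a dedicated processor.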


Asymmetrical multiprocessing (AMP) is typically used to describe systems in which multiple independent operating systems are implemented. Supervised AMP uses virtualization² to abstract processing elements such as memory, cores, or devices. More on virtualization can be found in the next section of this article.


¹Device Software Optimization for Concurrent and Consecutive Systems, Wind River, whitepapers
²Achieving Business Goals with Wind River's Multicore Solution, Wind River, whitepaper


To address the complexity of programming for multicore architectures, vendors have come up with software toolkits and more programming-friendly core architectures. For example, the Software Tools Ecosystem for Intel IXP tried to simplify the software development task. However, the MIPS processor core architecture has emerged as a preferred architecture. It supports well-known operating systems (OS) such as Linux, VxWorks, and other real-time operating systems (RTOS), which allows software tool vendors to port applications across processors from multiple vendors.


The different OS types have their own strengths and are typically used for different portions of an application. An RTOS supports wire-rate packet processing, which makes it well suited for pattern matching, routing, compression/decompression, encryption/decryption, and similar tasks. Processor-specific RTOSs are available from the multicore processor vendors, for example Cavium Simple Executive and RMI RTOS. VxWorks is another popular RTOS available for a number of multicore processors. Linux supports management and exception processing quite well, for functions such as user/device authentication, security associations, and accounting. It is available from various vendors, including Wind River PNE, MontaVista CGE, and Red Hat, and is also available as open source from Debian and others.


Here is an illustration of how these operating systems may be applied in a multicore-architected communication networking offload application.


Multicore Software Scheme for Network Equipment


This illustration shows the networking equipment architecture and the relationship of each task to the OS used:


    • Linux OS is used for the signaling function (control plane), for protocol handling and maintaining information for the data plane
    • Linux OS is used for the forwarding function (data plane) where slow path is used for complex processing of exception packets
    • RTOS is used for the forwarding function (data plane) where fast path is used for simplified processing of the majority of packets
    • The control plane and data plane can be co-localized or distributed over multiple processors
    • High performance is achieved by running the fast path on dedicated hardware
    • The data plane is easily parallelized and well suited for running on multiple cores in parallel
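
The fast path and slow path split described in the list above can be sketched in a few lines. In this illustration the fast path is a single table lookup for flows that are already known, and the slow path handles exception packets and installs a table entry so later packets of the same flow stay on the fast path. The `Forwarder` class, the flow tuple layout, and the `lookup_route` policy are all hypothetical stand-ins, not the design of any particular product.

```python
class Forwarder:
    def __init__(self):
        self.flow_table = {}  # flow -> output port, state kept for the data plane
        self.fast_hits = 0
        self.slow_hits = 0

    def lookup_route(self, flow):
        """Placeholder policy standing in for routing or authentication logic."""
        dst_ip = flow[1]
        return 1 if dst_ip.startswith("10.") else 0

    def slow_path(self, flow):
        """Complex processing of an exception packet, then install a flow entry."""
        self.slow_hits += 1
        port = self.lookup_route(flow)
        self.flow_table[flow] = port
        return port

    def forward(self, flow):
        """Fast path: one lookup for known flows, otherwise fall back to the slow path."""
        port = self.flow_table.get(flow)
        if port is not None:
            self.fast_hits += 1
            return port
        return self.slow_path(flow)
```

For example, forwarding the same flow twice takes the slow path once and the fast path once, which reflects the expectation that the majority of packets need only the simplified processing.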


Toolkits are now emerging that exploit the power of a multicore processor and use the real-time operating systems provided with the different processors. They include:


    • 6WIND 6WINDGate™
    • SafeNet Quicksec™
    • Interphase Protocol Accelerators for Session Initiation Protocol (SIP)/Real-time Transport Protocol (RTP), GPRS Tunneling Protocol (GTP-u), and Signaling Transport protocol (SIGTRAN)
    • Qosmos ixDPI



Multicore and Virtualization


Virtualization is the abstraction of computer resources such as memory, cores, and devices. It requires a scalable hypervisor, also called a virtual machine monitor (VMM): platform virtualization software that allows multiple operating systems to run on a host computer concurrently while sharing the underlying hardware resources. Like multicore processing, virtualization presents opportunities to reduce hardware costs and power consumption while enabling new platform-level capabilities. Combining the two offers more options for enhancing platform performance, security, scalability, certifiability, and usability. The following illustration shows four software configurations enabled by multicore and virtualization:


    • symmetric multiprocessing, with one OS running across two cores
    • asymmetric multiprocessing, with one OS per core
    • single-core virtualization, with two OS managed by a hypervisor running on one CPU
    • multicore virtualization, the most flexible of the four, with two OS (or more) managed by a hypervisor running on two or more cores


Multicore Virtualization Models

Illustration Source: Achieving Business Goals with Wind River's Multicore Solution, Wind River, whitepaper


Operating system software tends to run many threads as part of its normal operation, making multicore a great choice for virtual machines, since each virtual machine runs independently of others and can be executed in parallel. This makes multicore processing ideal for application servers that allow many users to connect to a site simultaneously and have independent threads of execution. In many cases, multicore and virtualization technologies can help device makers meet business goals regarding performance, cost, and differentiation.



Key Applications for Multicore Network Processors


Packet processing building blocks are found in applications across all elements in the converged communication network. Evolution in the technology and market uses of these elements is driving the need for more advanced packet processing:


    • The 3G evolution to HSPA and packet-based access architecture affects base stations (Node-B), radio network controllers (RNC), and media gateways
    • Security and encryption requirements affect network access and the core elements including network address translation (NAT), session border control (SBC), call session control functions (x-CSCF) in the IP Multimedia Subsystem (IMS) architecture, Lawful Intercept, and wireless access gateways (WiMAX, 3GPP Long Term Evolution (LTE), and femtocells)
    • Increased traffic demands affect packet switches and routers
    • Evolution in content delivery, management, and transcoding affects media servers, content servers in IPTV, mobile TV application servers, and related billing


Key Applications for Multicore Across Network Elements


Multicore packet processing not only improves network element performance, it also enables the convergence of multiple functions and diverse services from different systems into a single system. This raises challenges related to the need for speed: 1) network traffic needs to be handled at up to multi-gigabit rates, 2) deep packet inspection is generally required for all packets, 3) wire-speed security needs to be applied at each network layer, from Layer 3 to Layer 7, and 4) services need to be performed on the processed network packets.
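
To give a sense of scale for the first challenge, the following sketch works out the time available per packet at wire speed. It assumes Ethernet framing, where each frame carries 20 extra bytes on the wire (8-byte preamble and start delimiter plus a 12-byte inter-frame gap), and uses a 10 Gb/s link and minimum-size 64-byte frames purely as example inputs; neither figure comes from a specific product.

```python
ETHERNET_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte inter-frame gap


def packets_per_second(link_bps, frame_bytes=64):
    """Maximum frame rate on a link for a given frame size."""
    wire_bits = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps / wire_bits


def per_packet_budget_ns(link_bps, cores=1, frame_bytes=64):
    """Time each core may spend on a packet if the cores share the load evenly."""
    return cores * 1e9 / packets_per_second(link_bps, frame_bytes)
```

Under these assumptions a 10 Gb/s link delivers about 14.88 million minimum-size packets per second, which leaves 67.2 ns per packet on a single core. Spreading the load evenly over 16 cores raises the per-core budget to about 1,075 ns, which is why the parallel data plane matters for deep packet inspection at line rate.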



Next Steps


Contact us for more information about our COTS or custom packet processor solutions.