Chapter 4 - Threads

References:

  1. Abraham Silberschatz, Greg Gagne, and Peter Baer Galvin, 'Operating System Concepts, Ninth Edition', Chapter 4

4.1 Overview

  • A thread is a basic unit of CPU utilization, consisting of a program counter, a stack, a set of registers, and a thread ID.
  • Traditional ( heavyweight ) processes have a single thread of control - There is one program counter, and one sequence of instructions that can be carried out at any given time.
  • As shown in Figure 4.1, multi-threaded applications have multiple threads within a single process, each having their own program counter, stack and set of registers, but sharing common code, data, and certain structures such as open files.


Figure 4.1 - Single-threaded and multithreaded processes

4.1.1 Motivation

  • Threads are very useful in modern programming whenever a process has multiple tasks that can be performed independently of one another.
  • This is particularly true when one of the tasks may block, and it is desired to allow the other tasks to proceed without blocking.
  • For example in a word processor, a background thread may check spelling and grammar while a foreground thread processes user input ( keystrokes ), while yet a third thread loads images from the hard drive, and a fourth does periodic automatic backups of the file being edited.
  • Another example is a web server - Multiple threads allow for multiple requests to be satisfied simultaneously, without having to service requests sequentially or to fork off separate processes for every incoming request. ( The latter is how this sort of thing was done before the concept of threads was developed. A daemon would listen at a port, fork off a child for every incoming request to be processed, and then go back to listening to the port. )


Figure 4.2 - Multithreaded server architecture

4.1.2 Benefits

  • There are four major categories of benefits to multi-threading:
    1. Responsiveness - One thread may provide rapid response while other threads are blocked or slowed down doing intensive calculations.
    2. Resource sharing - By default threads share common code, data, and other resources, which allows multiple tasks to be performed simultaneously in a single address space.
    3. Economy - Creating and managing threads ( and context switches between them ) is much faster than performing the same tasks for processes.
    4. Scalability, i.e. utilization of multiprocessor architectures - A single-threaded process can only run on one CPU, no matter how many may be available, whereas the execution of a multi-threaded application may be split amongst available processors. ( Note that single-threaded processes can still benefit from multi-processor architectures when there are multiple processes contending for the CPU, i.e. when the load average is above some threshold. )

4.2 Multicore Programming

  • A recent trend in computer architecture is to produce chips with multiple cores, or CPUs on a single chip.
  • A multi-threaded application running on a traditional single-core chip would have to interleave the threads, as shown in Figure 4.3. On a multi-core chip, however, the threads could be spread across the available cores, allowing true parallel processing, as shown in Figure 4.4.


Figure 4.3 - Concurrent execution on a single-core system.


Figure 4.4 - Parallel execution on a multicore system

  • For operating systems, multi-core chips require new scheduling algorithms to make better use of the multiple cores available.
  • As multi-threading becomes more pervasive and more important ( thousands instead of tens of threads ), CPUs have been developed to support more simultaneous threads per core in hardware.

4.2.1 Programming Challenges

  • For application programmers, there are five areas where multi-core chips present new challenges:
    1. Identifying tasks - Examining applications to find activities that can be performed concurrently.
    2. Balance - Finding tasks to run concurrently that provide equal value, i.e. not wasting a separate thread on trivial tasks.
    3. Data splitting - To prevent the threads from interfering with one another.
    4. Data dependency - If one task is dependent upon the results of another, then the tasks need to be synchronized to assure access in the proper order.
    5. Testing and debugging - Inherently more difficult in parallel processing situations, as the race conditions become much more complex and difficult to identify.

4.2.2 Types of Parallelism

In theory there are two different ways to parallelize the workload:

  1. Data parallelism divides the data up amongst multiple cores ( threads ), and performs the same task on each subset of the data. For example dividing a large image up into pieces and performing the same digital image processing on each piece on different cores.
  2. Task parallelism divides the different tasks to be performed among the different cores and performs them simultaneously.

In practice no program is ever divided up solely by one or the other of these, but instead by some sort of hybrid combination.

4.3 Multithreading Models

  • There are two types of threads to be managed in a modern system: User threads and kernel threads.
  • User threads are supported above the kernel, without kernel support. These are the threads that application programmers would put into their programs.
  • Kernel threads are supported within the kernel of the OS itself. All modern OSes support kernel level threads, allowing the kernel to perform multiple simultaneous tasks and/or to service multiple kernel system calls simultaneously.
  • In a specific implementation, the user threads must be mapped to kernel threads, using one of the following strategies.

4.3.1 Many-To-One Model

  • In the many-to-one model, many user-level threads are all mapped onto a single kernel thread.
  • Thread management is handled by the thread library in user space, which is very efficient.
  • However, if a blocking system call is made, then the entire process blocks, even if the other user threads would otherwise be able to continue.
  • Because a single kernel thread can operate only on a single CPU, the many-to-one model does not allow individual processes to be split across multiple CPUs.
  • Green threads ( on early Solaris ) and GNU Portable Threads implemented the many-to-one model in the past, but few systems continue to use it today.


Figure 4.5 - Many-to-one model

4.3.2 One-To-One Model

  • The one-to-one model creates a separate kernel thread to handle each user thread.
  • The one-to-one model overcomes the problems listed above involving blocking system calls and the splitting of processes across multiple CPUs.
  • However, creating a kernel thread for every user thread adds overhead, which can slow the system down.
  • Most implementations of this model place a limit on how many threads can be created.
  • Linux and Windows from 95 to XP implement the one-to-one model for threads.


Figure 4.6 - One-to-one model

4.3.3 Many-To-Many Model

  • The many-to-many model multiplexes any number of user threads onto an equal or smaller number of kernel threads, combining the best features of the one-to-one and many-to-one models.
  • Users have no restrictions on the number of threads created.
  • Blocking kernel system calls do not block the entire process.
  • Processes can be split across multiple processors.
  • Individual processes may be allocated variable numbers of kernel threads, depending on the number of CPUs present and other factors.


Figure 4.7 - Many-to-many model

  • One popular variation of the many-to-many model is the two-tier model, which allows either many-to-many or one-to-one operation.
  • IRIX, HP-UX, and Tru64 UNIX use the two-tier model, as did Solaris prior to Solaris 9.


Figure 4.8 - Two-level model

4.4 Thread Libraries

  • Thread libraries provide programmers with an API for creating and managing threads.
  • Thread libraries may be implemented either in user space or in kernel space. The former involves API functions implemented solely within user space, with no kernel support. The latter involves system calls, and requires a kernel with thread library support.
  • There are three main thread libraries in use today:
    1. POSIX Pthreads - may be provided as either a user or kernel library, as an extension to the POSIX standard.
    2. Win32 threads - provided as a kernel-level library on Windows systems.
    3. Java threads - Since Java generally runs on a Java Virtual Machine, the implementation of threads is based upon whatever OS and hardware the JVM is running on, i.e. either Pthreads or Win32 threads depending on the system.
  • The following sections will demonstrate the use of threads in all three systems for calculating the sum of integers from 0 to N in a separate thread, and storing the result in a variable 'sum'.

4.4.1 Pthreads

  • The POSIX standard ( IEEE 1003.1c ) defines the specification for Pthreads, not the implementation.
  • Pthreads are available on Solaris, Linux, Mac OS X, Tru64, and via public domain shareware for Windows.
  • Global variables are shared amongst all threads.
  • One thread can wait for the others to rejoin before continuing.
  • Pthreads begin execution in a specified function, in this example the runner( ) function:


Figure 4.9



4.4.2 Windows Threads

  • Similar to Pthreads. Examine the code example to see the differences, which are mostly matters of syntax and naming:


Figure 4.11

4.4.3 Java Threads

  • ALL Java programs use Threads - even 'common' single-threaded ones.
  • The creation of new Threads requires Objects that implement the Runnable Interface, which means they contain a method 'public void run( )' . Any descendant of the Thread class will naturally contain such a method. ( In practice the run( ) method must be overridden / provided for the thread to have any practical functionality. )
  • Creating a Thread Object does not start the thread running - To do that the program must call the Thread's 'start( )' method. start( ) allocates and initializes memory for the Thread, and then calls the run( ) method. ( Programmers do not call run( ) directly. )
  • Because Java does not support global variables, Threads must be passed a reference to a shared Object in order to share data, in this example the 'Sum' Object.
  • Note that the JVM runs on top of a native OS, and that the JVM specification does not specify what model to use for mapping Java threads to kernel threads. This decision is JVM implementation dependent, and may be one-to-one, many-to-many, or many-to-one. ( On a UNIX system the JVM normally uses Pthreads and on a Windows system it normally uses Windows threads. )


Figure 4.12

4.5 Implicit Threading ( Optional )

Implicit threading shifts the burden of addressing the programming challenges outlined in section 4.2.1 above from the application programmer to the compiler and run-time libraries.

4.5.1 Thread Pools

  • Creating a new thread every time one is needed and then deleting it when it is done can be inefficient, and can also lead to a very large ( unlimited ) number of threads being created.
  • An alternative solution is to create a number of threads when the process first starts, and put those threads into a thread pool.
    • Threads are allocated from the pool as needed, and returned to the pool when no longer needed.
    • When no threads are available in the pool, the process may have to wait until one becomes available.
  • The ( maximum ) number of threads available in a thread pool may be determined by adjustable parameters, possibly dynamically in response to changing system loads.
  • Win32 provides thread pools through the 'PoolFunction' function. Java also provides support for thread pools through the java.util.concurrent package, and Apple supports thread pools under the Grand Central Dispatch architecture.

4.5.2 OpenMP

  • OpenMP is a set of compiler directives available for C, C++, or FORTRAN programs that instruct the compiler to automatically generate parallel code where appropriate.
  • For example, the directive:

      #pragma omp parallel

would cause the compiler to create as many threads as the machine has cores available, ( e.g. 4 on a quad-core machine ), and to run the parallel block of code, ( known as a parallel region ) on each of the threads.

  • Another sample directive is '#pragma omp parallel for', which causes the for loop immediately following it to be parallelized, dividing the iterations up amongst the available cores.

4.5.3 Grand Central Dispatch, GCD

  • GCD is an extension to C and C++ available on Apple's OSX and iOS operating systems to support parallelism.
  • Similar to OpenMP, users of GCD define blocks of code to be executed either serially or in parallel by placing a caret just before an opening curly brace, i.e. ^{ printf( "I am a block.\n" ); }
  • GCD schedules blocks by placing them on one of several dispatch queues.
    • Blocks placed on a serial queue are removed one by one. The next block cannot be removed for scheduling until the previous block has completed.
    • There are three concurrent queues, corresponding roughly to low, medium, or high priority. Blocks are also removed from these queues one by one, but several may be removed and dispatched without waiting for others to finish first, depending on the availability of threads.
  • Internally GCD manages a pool of POSIX threads which may fluctuate in size depending on load conditions.

4.5.4 Other Approaches

There are several other approaches available, including Intel's Threading Building Blocks ( TBB ) and other products, and Java's java.util.concurrent package.


4.6 Threading Issues

4.6.1 The fork( ) and exec( ) System Calls

  • Q: If one thread forks, is the entire process copied, or is the new process single-threaded?
  • A: System dependent.
  • A: If the new process execs right away, there is no need to copy all the other threads. If it doesn't, then the entire process should be copied.
  • A: Many versions of UNIX provide multiple versions of the fork call for this purpose.

4.6.2 Signal Handling

  • Q: When a multi-threaded process receives a signal, to what thread should that signal be delivered?
  • A: There are four major options:
    1. Deliver the signal to the thread to which the signal applies.
    2. Deliver the signal to every thread in the process.
    3. Deliver the signal to certain threads in the process.
    4. Assign a specific thread to receive all signals in a process.
  • The best choice may depend on which specific signal is involved.
  • UNIX allows individual threads to indicate which signals they are accepting and which they are ignoring. However the signal can only be delivered to one thread, which is generally the first thread that is accepting that particular signal.
  • UNIX provides two separate system calls, kill( pid, signal ) and pthread_kill( tid, signal ), for delivering signals to processes or specific threads respectively.
  • Windows does not support signals, but they can be emulated using Asynchronous Procedure Calls ( APCs ). APCs are delivered to specific threads, not processes.

4.6.3 Thread Cancellation

  • Threads that are no longer needed may be cancelled by another thread in one of two ways:
    1. Asynchronous Cancellation cancels the thread immediately.
    2. Deferred Cancellation sets a flag indicating the thread should cancel itself when it is convenient. It is then up to the cancelled thread to check this flag periodically and exit nicely when it sees the flag set.
  • ( Shared ) resource allocation and inter-thread data transfers can be problematic with asynchronous cancellation.

4.6.4 Thread-Local Storage ( was 4.4.5 Thread-Specific Data )

  • Most data is shared among threads, and this is one of the major benefits of using threads in the first place.
  • However sometimes threads need thread-specific data also.
  • Most major thread libraries ( Pthreads, Win32, Java ) provide support for thread-specific data, known as thread-local storage or TLS. Note that this is more like static data than local variables, because it does not cease to exist when the function ends.

4.6.5 Scheduler Activations

  • Many implementations of threads provide a virtual processor as an interface between the user thread and the kernel thread, particularly for the many-to-many or two-tier models.
  • This virtual processor is known as a 'Lightweight Process', LWP.
    • There is a one-to-one correspondence between LWPs and kernel threads.
    • The number of kernel threads available, ( and hence the number of LWPs ) may change dynamically.
    • The application ( user level thread library ) maps user threads onto available LWPs.
    • Kernel threads are scheduled onto the real processor(s) by the OS.
    • The kernel communicates to the user-level thread library when certain events occur ( such as a thread about to block ) via an upcall, which is handled in the thread library by an upcall handler. The upcall also provides a new LWP for the upcall handler to run on, which it can then use to reschedule the user thread that is about to become blocked. The OS will also issue upcalls when a thread becomes unblocked, so the thread library can make appropriate adjustments.
  • If the kernel thread blocks, then the LWP blocks, which blocks the user thread.
  • Ideally there should be at least as many LWPs available as there could be concurrently blocked kernel threads. Otherwise if all LWPs are blocked, then user threads will have to wait for one to become available.



Figure 4.13 - Lightweight process ( LWP )

4.7 Operating-System Examples ( Optional )

4.7.1 Windows XP Threads

  • The Win32 API thread library supports the one-to-one thread model.
  • Win32 also provides the fiber library, which supports the many-to-many model.
  • Win32 thread components include:
    • Thread ID
    • Registers
    • A user stack used in user mode, and a kernel stack used in kernel mode.
    • A private storage area used by various run-time libraries and dynamic link libraries ( DLLs ).
  • The key data structures for Windows threads are the ETHREAD ( executive thread block ), KTHREAD ( kernel thread block ), and the TEB ( thread environment block ). The ETHREAD and KTHREAD structures exist entirely within kernel space, and hence are only accessible by the kernel, whereas the TEB lies within user space, as illustrated in Figure 4.14:


Figure 4.14 - Data structures of a Windows thread

4.7.2 Linux Threads

  • Linux does not distinguish between processes and threads - It uses the more generic term 'tasks'.
  • The traditional fork( ) system call completely duplicates a process ( task ), as described earlier.
  • An alternative system call, clone( ) allows for varying degrees of sharing between the parent and child tasks, controlled by flags such as those shown in the following table:
    flag            Meaning
    CLONE_FS        File-system information is shared
    CLONE_VM        The same memory space is shared
    CLONE_SIGHAND   Signal handlers are shared
    CLONE_FILES     The set of open files is shared
  • Calling clone( ) with no flags set is equivalent to fork( ). Calling clone( ) with CLONE_FS, CLONE_VM, CLONE_SIGHAND, and CLONE_FILES is equivalent to creating a thread, as all of these data structures will be shared.
  • Linux implements this using a structure task_struct, which essentially provides a level of indirection to task resources. When the flags are not set, then the resources pointed to by the structure are copied, but if the flags are set, then only the pointers to the resources are copied, and hence the resources are shared. ( Think of a deep copy versus a shallow copy in OO programming. )
  • ( Removed from 9th edition ) Several distributions of Linux now support the NPTL ( Native POSIX Thread Library )
    • POSIX compliant.
    • Support for SMP ( symmetric multiprocessing ), NUMA ( non-uniform memory access ), and multicore processors.
    • Support for hundreds to thousands of threads.

4.8 Summary


GeForce 6800 Ultra DDL graphics card
versus others

Originally posted October 12th 2004, by rob-ART morgan, mad scientist
Updated October 13th, 2004 with Quake3 results
Updated October 15th, 2004, with Motion results
Special thanks to 'remote mad scientist,' hackintosh,
for his help in making this article happen

The NVIDIA GeForce 6800 Ultra DDL is the new top end 8X AGP graphics card for the G5 Power Mac. It's the only card that will drive the new 30-inch Apple Cinema Display. But you don't have to have that display to take advantage of the card's speed. You can use it with any display. Though our test unit hasn't arrived yet, with the help of a 'remote mad scientist,' we have some performance data for you. This page has results running on a 17-inch LCD display at 1280x1024. For results at 1920x1200, see our OTHER GeForce 6800 PAGE.

The Unreal Tournament 2004 (UT2004) test was done using SantaDuck LCDBench at Maximum Settings. We chose the FLYBY posted above because it's one of the best ways to measure the contribution of the GPU over the CPU. In the graph below, we show the BOTMATCH using the same tool and settings. It is a CPU 'bound' test, so it makes little or no difference what card you use.

Halo tends to use a combination of CPU and GPU to do its thing. We either turned on or set 'high' every feature to stress the GPU. The exceptions were Lens Flare, Sound Quality, and Sound Variety which we set to low since those are CPU functions.

In consultation with the Halo developers, we came up with these settings to stress the graphics cards:
HW Shaders = ATI Pixel and Vertex Shaders*, FSAA = 4X,
Lens Flare = low, Model Quality = high
VIDEO: Resolution 1280x1024, Refresh = 0, Framerate Throttle = no vsync
Specular = on, Shadows = on, Decals = on,
Particles = high, Texture Quality = high
SOUND: Sound = on, Sound Quality = low, Sound Variety = low

(* At the suggestion of the Halo developers, we made two kinds of runs with the GeForce 6800 -- one using NVIDIA Pixel and Vertex Shaders and one using ATI Pixel and Vertex Shaders. Note that ATI shaders produced the higher frame rate.)

Quake 3 Arena fans will be happy to know that it ROCKS on the GeForce 6800:

Motion is the newest test of graphic cards. As you can see below, how fast you can render a project for preview depends on your graphics card's speed as much as it does on your cpu speed. A G5/2.5GHz Power Mac with a GeForce 6800 renders the 300 frame 'Fire - Mortise 2' template 38% faster than the same computer with a Radeon 9800 XT and 86% faster than the same computer with the Radeon 9600 XT. That's almost like having a third CPU.


GRAPH LEGEND
Graphics Cards
GeF68 = nVidia GeForce 6800 Ultra DDL (8X, 256MB)
Rad98 XT = Radeon 9800 XT OEM (8X, 256MB)
Rad98 SE = Radeon 9800 Pro Mac Special Edition (8X, 256MB)
Rad96 XT = Radeon 9600 XT OEM (8X, 128MB)
Rad98 R = Radeon 9800 Pro Mac Retail Edition (2X/4X 128MB)
CPUs
G5/2.5 = G5/2.5GHz MP Power Mac
G5/2 = G5/2.0GHz MP Power Mac

CONCLUSIONS
The NVIDIA GeForce 6800 Ultra DDL is more than a card for driving a 30-inch display. It's got some real horsepower for running 3D accelerated games. And as we found out, it screams when you render Motion RAM PREVIEWs.

Being a 'madman' for speed, I am definitely lusting over this card. However, be aware of the fact that it 'eats' one of your PCI-X slots. The large heatsink/fan assembly encroaches on the PCI-X slot adjacent to the AGP slot. So if you are depending on having all three PCI-X slots available for use in your G5, this is not the card for you. Ditto for the Radeon 9800 XT. That might be why Apple is offering the Radeon 9800 Pro SE as a kit.

If you are one of those who bought the RocketRAID 8 port SATA PCI-X adapter, you might still be able to route the data cables out through the back of slot 2 with the backplate off.

Is the $599 aftermarket GeForce 6800 kit worth 50% more than the Radeon 9800 Pro Special Edition kit to get up to 115% more 3D game speed? Yeah.

Is the Configure-To-Order (CTO) $450-$500 GeForce 6800 worth 41%-50% more than the CTO Radeon 9800 XT to get up to 102% more 3D game speed? Duh.

It may seem cheaper if you buy the GeForce 6800 as a CTO option on your new G5 -- but remember, that $450-$500 addon price includes the 'credit' for the Radeon 9600 XT or GeForceFX 5200 it replaces. Hmmm. That gives me an idea: Why not sell your old card on eBay to help recover some of the cost of your 'smokin' GeForce 6800?

FLASH: ATI showed off a Radeon X800 graphics card running on a G5 at the Digital Life Expo in New York this week. It only uses one slot, and they are confident it will match the performance of the GeForce 6800 Ultra. We hope to verify that soon.

CAUTION: I've been informed that a bunch of guys who got their 6800's are reporting that the card causes OS X 10.3.5 or lower to crash if the driver is not installed. This means that the Apple Install DVD is useless with the card in the slot. It also means that you can't reinstall the OS or fix permissions while booting off the DVD. It's probably a good idea to install the driver before you install the card. Then I suggest using Carbon Copy Cloner to make a backup drive that can be booted from with the card installed. You may also want to make a bootable Install DVD as suggested on Blargatron.

RELATED ARTICLES
The GeForce 6800 Ultra running at 1920x1200 resolution
The Radeon 9800 XT running Motion, UT2003, and Quake3.

WHERE TO BUY FAST GRAPHICS CARDS
The Apple Online Store offers the 9600 XT ($50), 9800 XT ($300-$350), and GeForce 6800 ($450-500) as 'configure-to-order' (CTO) options when purchasing a new G5 Power Mac.

If you already own a G5 Power Mac and want to upgrade to the 9600 XT or 9800 XT, you can't. They aren't available as kits from Apple (yet). And they aren't in retail channels (yet). Apple's Online Store does, however, sell aftermarket kits for the Radeon 9800 Pro 'Mac Special Edition' ($399) and the GeForce 6800 Ultra DDL ($599).

There are various resellers carrying the Radeon 9800 Pro Mac Special Edition. Buy.com has it and it qualifies for the $10 off coupon and free shipping. Check also with Other World Computing and Small Dog Electronics.

If you own a G4 Power Mac with a 2X or 4X AGP slot, your best option is the Radeon 9800 Pro Retail Edition (128MB). I included it in the test graphs above so you can see that it is almost as fast as the other Radeon 9800 models. It is also at Buy.com and also qualifies for the $10 off coupon and free shipping. Check also with Other World Computing or Small Dog Electronics or the Apple Online Store.

If you own a Power Mac with only PCI slots, you might want to upgrade to the Radeon 9200 Mac Edition. But read my report on that card first.

Has Bare Feats helped you? How about helping Bare Feats?








© 2004 Rob Art Morgan
'BARE facts on Macintosh speed FEATS'
Email the webmaster and mad scientist