Bits and Bytes

Here is a sort of glossary of computer buzzwords you will encounter in computer use:

Computer processors can only tell if a wire is on or off. Luckily, they can look at lots of wires at a time (see buss), and react to a complex pattern of ons and offs in pretty sophisticated ways. To translate these patterns into something that makes sense to humans, we consider a wire that is on to be a "1" and a wire that is off to be a "0". Then we can look at the wires leading into a computer and read something like 00110111 00010000. We don't know what that represents to the processor; it's just a pattern. Each place in the pattern is a bit, which may be 1 or 0. If it means a number to the processor, the bits make up a binary number.

Binary Numbers
Most of us count by tens these days. Ancient cultures used to count by 5s or 12s or 24s, but for the last thousand years, counting by tens has been the norm. When you see the number 145, you just know it includes one group of ten tens, plus four groups of ten, and five more. Ten tens is a hundred, or ten squared. Ten hundreds is a thousand, or ten to the third. There's a pattern here. Each digit tells how many times to count ten raised to the power of the digit's position, provided you start counting positions with zero and count right to left.

If you do the same thing with bits that can only be 1 or 0, each position in the list of bits represents some power of two. 1001 means one eight plus no fours plus no twos plus one extra. This is called binary notation. You can convert numbers from binary notation to decimal notation, but you seldom have to.
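
If you do want to see the conversion spelled out, here is a minimal sketch in Python. The function name binary_to_decimal is made up for this illustration; it simply adds up a power of two for every position that holds a 1.

```python
def binary_to_decimal(bits: str) -> int:
    """Add up the power of two for every position that holds a 1."""
    total = 0
    # Walk right to left so the position count starts at zero.
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** position
    return total


print(binary_to_decimal("1001"))  # one eight plus one extra: 9
```

Python's built-in int("1001", 2) gives the same answer, so in practice you would rarely write this loop yourself.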

Numbers like 00110111 10110000 are a lot easier to read if you put spaces every 8 bits. In decimal notation, we use commas every three digits for the same reason. There's nothing special about 8 bits, it just kind of got started that way. Hardware is easier to build if you group the wires consistently from one piece to another. Some older hardware used to group wires in 10s, but in the 70s the idea of working in groups of 8 really took over, especially in the design of integrated circuits. Somebody made a joke about a group carrying a byte of the data, and the term stuck. Sometimes you hear a group of four bits called a nibble.

The largest number you can represent with 8 bits is 11111111, or 255 in decimal notation. Since 00000000 is the smallest, you can represent 256 things with a byte. (Remember, a byte is just a pattern. It can represent a letter or a shade of green.) The bits in a byte have numbers. The rightmost bit is bit 0, and the left hand one is bit 7. Those two bits also have names. The rightmost is the least significant bit or lsb. It is least significant, because changing it has the smallest effect on the value. Which is the msb? (Bytes in larger numbers can also be called least significant and most significant.)
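
To see how much each bit matters, here is a small Python sketch that flips bit 0 and bit 7 of the same byte and compares the results. The helper flip_bit is a hypothetical name used only for this example.

```python
def flip_bit(value: int, bit_number: int) -> int:
    """Return value with one bit inverted (bit 0 is the rightmost)."""
    return value ^ (1 << bit_number)


byte = 0b00110111                   # 55 in decimal
bit0_flipped = flip_bit(byte, 0)    # 0b00110110, which is 54
bit7_flipped = flip_bit(byte, 7)    # 0b10110111, which is 183

print(abs(bit0_flipped - byte))     # 1
print(abs(bit7_flipped - byte))     # 128
print(2 ** 8)                       # 256 different patterns fit in a byte
```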


Hexadecimal Numbers
Even with the space, 00110111 10110000 is pretty hard to read. Software writers often use a code called hexadecimal to represent binary patterns. Hexadecimal was created by taking the decimal to binary idea and going the other way. Someone added six digits to the normal 0-9 so a number up to 15 can be represented by a single symbol. Since they had to be typed on a normal keyboard, the letters A-F were used. One of these can represent four bits worth, so a byte is written as two hexadecimal digits. 00110111 10110000 becomes 37B0.

Here's a handy table:
Hex   Binary   Decimal
0     0000     0
1     0001     1
2     0010     2
3     0011     3
4     0100     4
5     0101     5
6     0110     6
7     0111     7
8     1000     8
9     1001     9
A     1010     10
B     1011     11
C     1100     12
D     1101     13
E     1110     14
F     1111     15
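
The table lookup is easy to hand to a program. The following Python sketch (bits_to_hex and HEX_DIGITS are names invented for this example) takes the bits four at a time and swaps each group for its hexadecimal digit.

```python
HEX_DIGITS = "0123456789ABCDEF"


def bits_to_hex(bits: str) -> str:
    """Translate a string of 1s and 0s into hexadecimal, four bits at a time."""
    bits = bits.replace(" ", "")          # the spaces are only there for readers
    if len(bits) % 4 != 0:
        raise ValueError("expected a multiple of four bits")
    digits = []
    for start in range(0, len(bits), 4):
        group = bits[start:start + 4]     # one group of four bits
        digits.append(HEX_DIGITS[int(group, 2)])
    return "".join(digits)


print(bits_to_hex("00110111 10110000"))   # 37B0
```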


With three different schemes running around, it's easy to confuse numbers. 1000 can translate to a thousand, eight, or four thousand and ninety six. You have to indicate which system you are using. The fact that you still sometimes see an obsolete system called octal (digits 0-7; you can work it out) adds to the potential for confusion. Hexadecimal numbers can be indicated by writing them 1000hex, 1000h, or 0x1000. Binary numbers can be written 1000bin. Octal numbers were just written with an extra leading 0. Decimal numbers are not indicated, unless there's some possibility of confusion, such as one in a page of hex numbers.
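
Here is a quick Python illustration of how the same four digits come out in each system. The variable names are just for this example. Note that Python marks octal with a 0o prefix rather than the bare leading 0 described above.

```python
digits = "1000"

as_decimal = int(digits, 10)
as_binary = int(digits, 2)
as_hex = int(digits, 16)
as_octal = int(digits, 8)

print(as_decimal, as_binary, as_hex, as_octal)  # 1000 8 4096 512

# The same values written with Python's prefixes for hex, binary and octal.
print(0x1000, 0b1000, 0o1000)                   # 4096 8 512
```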

In electrical systems, a wire that connects to more than two devices is called a buss. Typically you have a power buss that supplies current to all of the parts that need it, and a ground buss that takes the current back to the power supply. (All current paths must be a round trip.)

In computer engineering, the concept of a buss has been expanded to mean a group of wires that carries data around the system. There are usually enough wires to handle one to four bytes. The size of these busses has a big effect on the efficiency of the system. A 32 bit buss can handle numbers twice as long as a 16 bit buss can (meaning the largest number is about 2 to the 16th times bigger).
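
The arithmetic behind that comparison can be checked with a few lines of Python. This is only a sketch: largest_value is a made-up helper, and it assumes the buss carries plain unsigned numbers.

```python
def largest_value(bus_width_bits: int) -> int:
    """Largest unsigned number that fits on a buss of the given width."""
    return 2 ** bus_width_bits - 1


print(largest_value(16))    # 65535
print(largest_value(32))    # 4294967295

# How many times more patterns the wider buss can carry.
print(2 ** 32 // 2 ** 16)   # 65536, which is 2 to the 16th
```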

Serial Data
You can send big numbers down a narrow buss if you send them in chunks. If you have an eight bit buss, you can send bytes one after another, and the processor can put the bytes together. This can even be done with a single wire buss. Then the bits come one at a time -- this is called serial data transmission.
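
Here is a rough Python sketch of the idea: a 16 bit number is chopped into bytes, each byte is chopped into bits, and the receiver shifts the bits back together. All of the function names are invented for this example. Sending the most significant bit first is an arbitrary choice made for the sketch; real serial links define their own bit order and add extra framing bits.

```python
def to_bytes_msb_first(number: int, byte_count: int) -> list[int]:
    """Chop a number into 8-bit chunks, most significant byte first."""
    chunks = []
    for index in reversed(range(byte_count)):
        chunks.append((number >> (8 * index)) & 0xFF)
    return chunks


def to_bits_msb_first(byte: int) -> list[int]:
    """Chop one byte into single bits for a one-wire (serial) link."""
    return [(byte >> position) & 1 for position in range(7, -1, -1)]


def reassemble(bits: list[int]) -> int:
    """Receiver side: shift each arriving bit into place."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


number = 0x37B0
sent_bytes = to_bytes_msb_first(number, 2)     # [0x37, 0xB0]
sent_bits = [bit for byte in sent_bytes for bit in to_bits_msb_first(byte)]

print(sent_bits)
print(hex(reassemble(sent_bits)))              # 0x37b0
```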

A computer wouldn't be much use if it couldn't store data. There have been many schemes for storing data over the years, but the way it's done today involves wiring transistors so they stay on when turned on and stay off when turned off. A transistor can then store a bit. The transistors are organized in groups of 8, so each group can store a byte. A single integrated circuit may have millions of these groups.

Each member of the group is connected to one wire of the data buss. A group can be instructed by some other wires to copy the state of the buss, or to connect their outputs to the buss, so the buss reflects what's in this group. These other wires are in fact a second buss called the address buss. By manipulating the address buss, the central processor can choose which particular group of transistors (or memory location) to read or modify. The number of wires in the address buss determines how many memory locations it could possibly address.
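
The relationship between the address buss and the number of memory locations can be modeled in a few lines of Python. ToyMemory is a hypothetical class written for this illustration, not a description of any real chip. It assumes 8 data wires and a selectable number of address wires.

```python
class ToyMemory:
    """A toy RAM: each address selects one byte-sized group of bits."""

    def __init__(self, address_wires: int):
        # n address wires can select 2**n different locations.
        self.size = 2 ** address_wires
        self.cells = [0] * self.size

    def write(self, address: int, byte: int) -> None:
        """The selected location copies what is on the data buss."""
        self._check(address)
        self.cells[address] = byte & 0xFF    # only 8 data wires, so keep 8 bits

    def read(self, address: int) -> int:
        """The selected location puts its contents on the data buss."""
        self._check(address)
        return self.cells[address]

    def _check(self, address: int) -> None:
        if not 0 <= address < self.size:
            raise IndexError("address needs more wires than the buss has")


ram = ToyMemory(address_wires=16)
print(ram.size)                  # 65536
ram.write(0x1000, 0x37)
print(hex(ram.read(0x1000)))     # 0x37
```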

This kind of memory is called RAM for random access memory. Since it depends on transistors to stay on, all data goes away when the power is turned off. Some computers can keep the memory by never really turning off. They have a battery that keeps enough power to the memory transistors that they don't forget.

Another kind of memory is called ROM, for read only memory. There are various types of this, but the most common is like an array of fuses. Any that are blown represent a 0. Nothing can change what's in read only memory, so any program or data in there is available as soon as the computer is turned on.

Since the memory is cleared when the power goes off, there needs to be some mechanical system for keeping data between jobs. The medium used for storing the data can vary from magnetic tape to optical discs, and some devices allow the media to be easily removed and replaced. Most of these storage systems involve some kind of spinning disc. There is an elaborate scheme for keeping track of the data on a disk - the bytes are grouped into blocks, the blocks into files, the files into directories (or folders), and directories into partitions (or volumes). The user generally only sees files and above.
The Central Processing Unit
The central processing unit, or CPU, is the heart of the computer. The CPU reads an instruction from memory (instructions are bit patterns, just like anything else), carries it out, and looks for the next instruction. The instructions are simple things like copy a value from memory. The CPU has its own memory locations called registers. Special hardware makes it possible to add or subtract the registers from each other. To add two numbers, the CPU must fetch the first number and put it in a register, fetch the other number and put it in another register, add the two registers, and put the result back into memory. Each of these operations requires an instruction.
Luckily the CPU can do all of this very quickly. The whole operation is controlled by an oscillator circuit called the system clock, which runs at millions of hertz (cycles per second). It would be simple to think one clock cycle means one instruction, but instructions vary in complexity, and take anywhere from 4 to 20 cycles to complete. Operations are further slowed down by the memory, which has trouble keeping up. Some CPUs have super high speed memory called cache where numbers that are needed a lot can be stored and retrieved more quickly.
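
The four-instruction addition described above can be sketched as a toy processor in Python. Everything here is invented for illustration: the instruction names LOAD, ADD and STORE, the registers A and B, and the 8 bit register size are assumptions, and the instructions are written as readable tuples rather than the bit patterns a real CPU would fetch.

```python
def run(program, memory):
    """Fetch each instruction in turn and carry it out."""
    registers = {"A": 0, "B": 0}
    program_counter = 0
    while program_counter < len(program):
        operation, *operands = program[program_counter]   # fetch
        if operation == "LOAD":         # copy a memory value into a register
            register, address = operands
            registers[register] = memory[address]
        elif operation == "ADD":        # add the second register to the first
            target, source = operands
            registers[target] = (registers[target] + registers[source]) & 0xFF
        elif operation == "STORE":      # copy a register back into memory
            register, address = operands
            memory[address] = registers[register]
        else:
            raise ValueError(f"unknown instruction {operation!r}")
        program_counter += 1            # look for the next instruction
    return memory


memory = [0] * 16
memory[0] = 20
memory[1] = 22

program = [
    ("LOAD", "A", 0),     # fetch the first number
    ("LOAD", "B", 1),     # fetch the other number
    ("ADD", "A", "B"),    # add the two registers
    ("STORE", "A", 2),    # put the result back into memory
]

run(program, memory)
print(memory[2])          # 42
```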


Peripheral devices
The CPU communicates with memory via the address and data busses. To communicate with the rest of the world, other busses are used. (Places where external devices can be connected are sometimes called ports.) These busses may be shared or connected to a single device. They may be serial or the multi wire type called parallel. Devices connected to the system are called peripherals; this includes keyboards, monitors, mice, graphics tablets, printers, MIDI systems and a lot more. Each has its own kind of data and electrical characteristics, but the connection at the port has to be standardized enough to allow interchange of similar devices. The following are the kinds of connections found in various systems.

Parallel Port
This is an old standard, originally designed for printers, so it's often called the printer port, although other things can be connected here and printers can be connected in other ways. As data ports go, this one is pretty slow.

IDE
This is a parallel buss designed for bulk data storage devices. It is usually hidden inside the box, since the connectors used aren't very strong. There are wires in the IDE buss that select which device is active, so the logical location of a device (drive A, B and so on) depends on which connector it's on.

SCSI
This is another type of parallel buss for bulk storage. It's a lot stronger mechanically than IDE, so it's often used between boxes. SCSI is an evolving standard that is periodically adapted to work at faster speeds. SCSI accommodates seven devices on a buss, and each must have a unique ID number set on its back panel.

This is a type of video connector. It's one of many, but the most common right now.

Comm Port
This is a type of serial port that has been around for decades. Another name for it is RS-232, which is the name of a technical document that describes how it should work. It's the slowest port of all. Only very simple devices are connected here.

One thing often found connected to a serial port is a modem, which is a box that converts data into tones that can be transmitted over the telephone. In many cases a modem is built into the computer, so the modem connection goes right to a phone line.

There are many systems designed to connect computers to each other. Ethernet is one of the most popular because it is very fast and relatively cheap to build. Computers don't connect directly to each other with Ethernet; they go by way of a box called a hub or switch that allows several computers to talk on a party line. If there are only two computers, or if Ethernet is used to connect a computer to a printer, a special cable can be used without a hub.

USB is a new high speed serial system. It's supposed to accommodate up to 127 devices, and allows the devices to be connected without turning the power off. (Fussing with IDE or SCSI with the power on can damage things.)

Firewire, also known as IEEE 1394, is an even faster serial system. It's also more reliable than USB for a variety of reasons. There is a contest going on between firewire and SCSI to see which is faster. Firewire is definitely more convenient.

MIDI is a communications system designed for musical instruments. It is used to control other things, but music is the main thing. MIDI is discussed at great length elsewhere on this site.