JTAG explorer toy
- 1 What is this, and, why..?
- 2 How JTAG works (as I understood it)
- 3 JTAG-toy hardware
- 4 Software
- 5 The actual tests
What is this, and, why..?
This is of course bad, because...
JTAG is THE SHIT -- like Batman!
...or well, it's nice anyway.
I interpreted the ATmega32-datasheet and (quite) some on-line docs as best as I could; however, if there's a bug/flaw somewhere, let me know please.
Note that this thing is totally useless, so questions/comments regarding this issue will be ignored. Xmas-season, too much time, so there :-)
The idea for me was to make a simple toy-board with 2 MCU's on it -- one acting as JTAG host/master, and the other being JTAG-victim.
The PC talks to the master and slave through a serial protocol, to read/set pins, and so initiate JTAG-actions; master talks to slave only through its JTAG-port.
The master itself is not JTAG-enabled, but drives/reads the slave's JTAG-port I/O-pins.
What I would like to see
As I understood, JTAG offers a nice 'backdoor' into a (slave-)chip's state, and so that's what I would like to see; I'd like to...
- put it in, and take it out of reset (reset/suspend/resume)
- decouple core logic from I/O pins, and read/set them from boundary cells instead
- and some more, basically toy with it.
How JTAG works (as I understood it)
JTAG normally uses a single master/host to read/set states of one or more chips. There may be separate (JTAG) communication-channels between the host and each chip, or chips can be daisy-chained, so that the system's interface is kept simple and small.
Of course this is not complete; for more details, browse the lovely Internet.
Communication comes down to selecting which data to operate on, and then reading/writing that data.
In a 1-chip JTAG-setup, the chip contains an instruction-register and a number of task-specific data-registers. The host can shift bits into each register; when a bit is shifted in, a bit falls out at the other end. Shifting always occurs in the same direction.
The host selects which data-register is active by entering an instruction-code, and then operates on the corresponding data-register. Writing/reading of both instruction- and data-registers occurs in the same way, by shifting bits in/out.
A multi-chip (daisy-chained) setup works like above, except the complete system (more chips together) can be viewed as a number of big registers (where different parts/offsets in a register may live on different chips).
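To make the 'a bit goes in, a bit falls out' idea concrete, here is a minimal Python sketch. It is not code from this project; the class, the function and the register contents are made up for illustration. It models one shift register, and two of them daisy-chained so that they behave like one longer register.

```python
class ShiftRegister:
    """Fixed-length register: shifting a bit in pushes one bit out the other end."""

    def __init__(self, bits):
        # cells[0] is the TDI end, cells[-1] is the TDO end
        self.cells = list(bits)

    def shift(self, bit_in):
        bit_out = self.cells.pop()      # the bit at the TDO end falls out
        self.cells.insert(0, bit_in)    # the new bit enters at the TDI end
        return bit_out


def shift_chain(registers, bit_in):
    """Shift one bit through daisy-chained registers (TDO of one feeds TDI of the next)."""
    for reg in registers:
        bit_in = reg.shift(bit_in)
    return bit_in


chip1 = ShiftRegister([1, 0])       # hypothetical 2-bit register on chip #1
chip2 = ShiftRegister([1, 1, 0])    # hypothetical 3-bit register on chip #2
out = [shift_chain([chip1, chip2], b) for b in [0, 0, 0, 0, 0]]
print(out)  # [0, 1, 1, 0, 1]: chip #2's bits come out first, then chip #1's
```

After five clock pulses the host has read all five bits and both registers hold the zeros that were shifted in, just as if there had been a single 5-bit register.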
A chip offers access to its JTAG-subsystem through a port (more about this in another section), consisting of 4 (or 5) I/O-lines:
- TCK (Test ClocK) will be used to clock bits in/out of the device, and initiate mode-change.
- TMS (Test Mode Select) is used to traverse through the TAP-controller's state-machine. It is sampled on rising edge of TCK. Values in the state-machine diagram indicate values for TMS.
- TDI (Test Data In) contains values to be shifted into the device. It is sampled on rising edge of TCK.
- TDO (Test Data out) will contain bits shifted out of the device. It changes on falling edge of TCK.
In short:
- TCK is used for every single action,
- TMS is used to explicitly navigate, or stay put, in the TAP-controller's state-machine; the state machine always applies, so TMS can never be ignored,
- TDI and TDO are only relevant when actually shifting bits in/out of the chip. Bits shifted in or out of the chip are always passed via these 2 lines.
See Fig.5 for an idea about the timing of these signals.
The TDI- and TDO-lines may be used to daisy-chain chips together -- chip #1's TDO is connected to chip #2's TDI. The host then shifts bits into chip #1's TDI, and catches bits falling out of the TDO of the last chip in the chain.
In such a daisy-chain, all chips share the TCK- and TMS-lines. Although sharing a clock may be normal, the common TMS-line was quite surprising to yours truly. But it still makes sense!
The TAP (Test Access Port) controller is part of every JTAG-enabled chip, and basically implements a state machine. The TMS-line navigates between states on each rising edge of TCK.
Instructions and data
As mentioned before, the TAP-controller has 1 (fixed-size) instruction-register and multiple, arbitrarily-sized data-registers. Although there are many data-registers, only one can be operated on at a time.
Poetic sidenote: one of the beautiful things, IMHO, about JTAG is that the chip 'connects' one of its data-registers between TDI and TDO on request. Subsequent operations then take place on that DR. This is no different than address-/data-selection (in that order) elsewhere, but this 'connecting' actually makes the bit-path through the chip (from TDI to TDO) shorter and longer on request!
See Fig.4: each state leads to 2 states (one of which may be itself); the selection is done by the value of TMS at the next clock pulse.
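For readers without the figure at hand, the state machine can be written down as a small lookup table. The Python sketch below is an illustration, not part of this project's code; the state names follow the usual IEEE 1149.1 naming, and `walk` is a made-up helper that follows a list of TMS values, one per TCK pulse.

```python
# state -> (next state if TMS=0, next state if TMS=1)
TAP = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def walk(state, tms_bits):
    """Return the state reached after clocking in the given TMS values."""
    for tms in tms_bits:
        state = TAP[state][tms]
    return state


# TMS = 0,1,1,0,0 from reset is the path to Shift-IR used in the tests on this page.
print(walk("Test-Logic-Reset", [0, 1, 1, 0, 0]))  # Shift-IR
```

Two things are easy to check with this table: five pulses with TMS high end up in Test-Logic-Reset no matter where you start, and both Shift states loop onto themselves as long as TMS stays low.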
For example, to write a specific register, the following must be done (in 'pseudo-actions' -- see the notes about 'last bit', below):
- navigate to state 'Shift-IR' to clock in instruction-bits, by clocking in the right sequence of values from TMS,
- clock in all instruction-bits,
- update the instruction-register by moving to state 'Update-IR',
- navigate to state 'Shift-DR' to clock in data-bits,
- clock in all data-bits.
Some states 'do 1 thing, once', e.g. have the chip fill a parallel shift-register to be subsequently shifted out on TDO, but other states 'keep on doing something' during multiple TCK-cycles. To keep them in that state, TMS must be set accordingly.
A state-transition takes place after the data shifted in at that clock-pulse, if relevant, is processed. In other words, bits shifted in always apply to the current state, not the possibly new one indicated by TMS at that clock-pulse. So, if N bits must be shifted in at a state, this is how it's done:
- enter the state by setting TMS and raising TCK,
- clock the 1st (N-1) bits in, keeping TMS so that the state is not left,
- set TMS to navigate to the next state, and clock the last bit in.
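The 'last bit' rule is easy to get wrong, so here is a small Python sketch of it. This is an illustration only: `shift_sequence` is a made-up name, and it assumes we are already in a Shift state, that TMS=0 stays there and TMS=1 leaves it, and that bits go in least-significant bit first (the order the tests later on this page use).

```python
def shift_sequence(value, nbits):
    """Return one (TMS, TDI) pair per TCK pulse for shifting in `nbits` bits.

    The first nbits-1 pulses keep TMS low (stay in the Shift state);
    the last pulse carries the final bit *and* TMS high (leave the state).
    """
    pairs = []
    for i in range(nbits):
        tdi = (value >> i) & 1                  # LSB first
        tms = 1 if i == nbits - 1 else 0        # leave the state on the last bit
        pairs.append((tms, tdi))
    return pairs


print(shift_sequence(0b0010, 4))  # [(0, 0), (0, 1), (0, 0), (1, 0)]
```

Note there is no separate 'leave the state' pulse: N bits take exactly N clock pulses.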
Analogously, for a positive clock pulse, the bit falling out at TDO applies to the new state, if there was a state-change at the rising edge of TCK.
JTAG can be used to read/stimulate I/O at a chip's pins. The chip's core logic can be decoupled from its I/O-pins as well; see Fig.3 for an idea: 'boundary-scan cells' sit in between the chip's core logic and the actual pin.
A chain of these cells ('boundary-scan chain') may be selected to sit between TDI and TDO by issuing the proper instruction, so that current/new values to/from the outside world or the chip's core logic can be shifted in or out.
Together with daisy-chaining multiple chips, this can give the developer a system where chips ('inside the socket') and e.g. PCB-tracks (between sockets) can be tested, with only 4 pins dedicated for this purpose!
JTAG-toy hardware
So ok, enough about that -- time to solder! As mentioned before, the proto-board to play with all this technology will have 2 MCU's: a JTAG-slave and a JTAG-master. The PC initiates commands to either MCU through a serial link (at a whopping 1200 bps, plenty of time to handle commands -- CBA to write proper interrupt-handlers and do proper buffering ;-)
Wiring and MCU-overkill
The slave-MCU is a big ATmega32; although we will only use about 10 I/O pins, it was the only JTAG-capable Atmel MCU I had. So there. For master I used the wonderful ATtiny2313.
As can be seen in the schematics in Fig.6, for performing actual tests using boundary-scan chain, there are dedicated fixed-direction I/O-lines...
- 'AB', running from master ('A') to slave ('B'),
- 'PQ', running from slave ('P') to master ('Q'),
- 'XY', running from slave ('X') to itself ('Y').
These lines have LEDs so I can see what's going on. Apart from the 4 other lines to the slave's JTAG-port, the master also controls the slave's reset-line.
From an idea by MHL, both master and slave share the Tx- and Rx-lines to a MAX232 receiver/driver. Both always receive, but only one can/should transmit at a time.
Since communication between PC and board is done in a query/response fashion, the MCU for which a query was intended enables its transmitter from software; its Tx-line then changes from hi-Z to active, and it sends its reply. The master and slave have totally disjoint command-sets for this to work.
Software
There are 3 'softwares' here: 2 run on master- and slave-MCU, and one runs on the PC. The PC software is really not necessary for anything, except that I got a bit mad trying to use minicom sensibly.
The software for both master- and slave-MCU's is quite similar; they both implement a simple command-interpreter. No interrupts are used at all.
Basically, command-bytes sent from the host are read from serial port and are processed immediately; on error, chars until next newline are eaten. Upon receiving a newline after a successful command, a reply is sent back to the host. This basically means that, for a long/compound command, stuff actually happens when you type it, not when you press Enter.
See the source if interested; basically it handles the following commands:
|rPin||m> Pin:State||Read I/O-pin status, where Pin is one of 'a' (outgoing end of test-line 'AB'), 'q' (incoming end of test-line 'PQ'), 'r' (slave-reset pin), or one of 'c', 'm', 'i' or 'o', for JTAG-pins TCK, TMS, TDI and TDO, respectively; State is one of '0' (low) or '1' (high).|
|wPinState||m> Pin:Oldstate->Newstate||Write I/O-pin status, where Pin can be one of 'm', 'c', 'i', 'r' or 'a' as in the previous command. Note that you can read more pins than you can write. State, Oldstate and Newstate as State in the previous command.|
|j(([mi]State)|^)*||m> State* ok||JTAG (TAP-lines) command. Meh... '(', ')', '[', ']', '|' and the Kleene-star '*' are meant as regexp symbols. Use letters for pins, as described above; use '^' to emit a positive pulse on TCK. The value at TDI is clocked in at the rising edge; values of TDO at each falling edge are collected and output as State*, in order of clock pulses.|
|#.*||(nothing)||Comment; everything up to and including the next newline char(s) will be eaten/ignored.|
Note that the master controls the slave's TAP-lines and reset-line. It's so friggin' simple I'll stop before embarrassing myself even more.
Some examples (MCU-reply not shown):
    # Negative pulse on slave's (low-active) reset-line
    wr0
    wr1
    # Read incoming end of test-line 'PQ'
    rq
    # Enter some imaginary state (0->1 on TMS-line),
    # clock 3 bits ('010') over TDI, and leave state (1->0 on TMS-line).
    # Note that we clock the last bit *while* leaving the state.
    jm1^
    ji0^i1^
    jm0i0^
    # Some other way to write that last sequence:
    jm1^i0^i1^m0i0^
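To see what a 'j' command boils down to on the wires, here is a rough Python sketch that expands one into a (TMS, TDI) pair per TCK pulse. It is not the MCU code (see the source for that); `expand_j` is a made-up helper, and the pin levels before the command are passed in as assumptions, since the pins simply keep whatever was written last.

```python
def expand_j(command, tms=0, tdi=0):
    """Expand a master 'j' command into one (TMS, TDI) pair per TCK pulse.

    `tms` and `tdi` are the (assumed) pin levels before the command starts.
    """
    if not command.startswith("j"):
        raise ValueError("not a 'j' command")
    pulses = []
    chars = iter(command[1:])
    for ch in chars:
        if ch == "^":
            # positive pulse on TCK: the current TMS/TDI levels get clocked in
            pulses.append((tms, tdi))
        elif ch in ("m", "i"):
            level = next(chars, None)
            if level not in ("0", "1"):
                raise ValueError("pin letter must be followed by 0 or 1")
            if ch == "m":
                tms = int(level)
            else:
                tdi = int(level)
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return pulses


print(expand_j("jm1^i0^i1^m0i0^"))  # [(1, 0), (1, 0), (1, 1), (0, 0)]
```

The last pulse carries both the final data bit and the TMS change, which is the 'clock the last bit while leaving the state' trick from the example above.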
Equally boring; see C'n'P source for details. It's basically a stripped version of the master's code. The implemented command-set is also very similar:
|rPin||s> Pin:State||See description of the master command-set; Pin can be one of 'b' (incoming end of test-line 'AB'), 'p' (outgoing end of test-line 'PQ'), and 'x' and 'y' for outgoing, respectively incoming ends of test-line 'XY'.|
|wPinState||s> Pin:Oldstate->Newstate||See description of the master command-set; Pin can only be 'p' or 'x'.|
Note that the slave cannot read its own TAP-lines. I often used 'rb' to see if the slave is alive or not. Since the slave can read/write lines like the master, and nothing else, no examples here.
Serial terminal emulator
This runs on PC, and is a sorry excuse for a terminal, really. The examples (with MCU-replies) on this page are all coming from this program. It does the following:
- read line from stdin
- echo line to stdout and send over serial port
- sleep 500 ms
- receive all pending incoming stuff from serial port and echo to stdout
- repeat from the first step until EOF
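The loop above is simple enough to sketch in a few lines of Python. This is not the actual program, just an illustration of the same steps; `run` is a made-up name, and `port` is assumed to be any object with a pyserial-style `write()`, `in_waiting` and `read()` (for example a `serial.Serial` opened at 1200 bps, if pyserial is installed).

```python
import sys
import time


def run(lines, port, out=sys.stdout, delay=0.5):
    """Send each input line to the board and echo whatever comes back."""
    for line in lines:                           # read line (e.g. from sys.stdin)
        out.write(line)                          # echo line to stdout...
        port.write(line.encode("ascii"))         # ...and send it over the serial port
        time.sleep(delay)                        # give the MCU time to answer
        pending = port.read(port.in_waiting)     # receive all pending incoming stuff
        out.write(pending.decode("ascii", errors="replace"))
```

Something like `run(sys.stdin, port)` would then behave like the steps listed above; the fixed sleep is crude, but at 1200 bps with short replies it is good enough.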
An example of use:
    sh$ cat write_this.txt
    # On master: read slave's reset-pin status
    rb
    # On slave: write '0' to pin 'X' (test-line 'XY')
    wx0
    # On master: clock 4 '1's in current TAP-state, and catch output
    j^^^^
    sh$ ./wr < write_this.txt
    # On master: read slave's reset-pin status
    rb
    m> b:1
    # On slave: write '0' to pin 'X' (test-line 'XY')
    wx0
    s> x:1->0
    # On master: clock 4 '1's in current TAP-state, and catch output
    j^^^^
    m> 1010 ok
    sh$
The actual tests
Ok, since this was a useless project (remember..?), we're satisfied if JTAG works. Since it already does, we are satisfied already, but we'd still like to see it with our own eyes.
Wussie basic tests
Ok, not that interesting; we'll just test slave-reset and all test-lines. Lo and behold (all tested using the lovely terminal-emulator mentioned earlier)...
    # Read a pin on the slave; verify it's alive
    rb
    s> b:1
    # Lower slave's reset-pin and verify it's dead
    wr0
    m> r:1->0
    rb
    # (slave does not answer anymore)
    # Raise reset-pin again and behold, slave lives again
    wr1
    m> r:0->1
    rb
    s> b:1
    # Raise outgoing end, and read on both sides of the line
    wa1
    m> a:1->1
    # (ok, so it was already high, who cares)
    ra
    m> a:1
    rb
    s> b:1
    # Lower outgoing end, and read both ends again
    wa0
    m> a:1->0
    ra
    m> a:0
    rb
    s> b:0
(Similar for test-lines 'PQ' and 'XY'.) Goodie, everything works. It's also nice to see our shared Tx-line works :-)
Ok, now let's do some real work! BTW, if you spot a flaw, feel free to mail me and tell me all about it.
There is the 'AVR_RESET' instruction (0x0c), placing a 1-bit register between TDI and TDO. Bits shifted in either reset ('1') or unreset ('0') the MCU. We try both:
    $ ./wr < reset_and_unreset_slave.scr
    #
    # Reset and unreset MCU
    #
    # reset FSM
    jm1^^^^^
    m> 11111 ok
    # observe, slave still responds to commands:
    rb
    s> b:0
    # enter ShiftIR
    jm0^m1^^m0^^
    m> 11111 ok
    # write AVR_RESET (0x0c) instruction
    ji0^^i1^m1^
    m> 0001 ok
    # update IR and move to ShiftDR
    j^m0^m1^m0^^
    m> 11110 ok
    # shift a reset-bit in DR (maintain state)
    ji1^
    m> 1 ok
    # ...and slave is dead/silent now:
    rb
    # (there should be no reply, anyway)
    # shift a reset-clear bit in DR, leave state
    ji0m1^
    m> 1 ok
    # behold, slave lives again
    rb
    s> b:0
    # (there should be a reply)
    $
Reading the chip's ID-code
It's always very reassuring (to me) to do something which should have a certain effect, and then actually see this effect. The 'IDCODE' instruction (0x01) is ideal for that: a 32-bit register is placed between TDI and TDO, containing chip-specific information.
For the ATmega32, this information should be:
|part number:||ATmega32 = 0x9502|
|manufacturer ID:||ATMEL = 0x01f|
Let's see if it works:
    $ ./wr < read_idcode.scr
    #
    # Read ID-code from device.
    #
    # reset TAP-controller
    jm1^^^^^
    m> 11111 ok
    # enter Shift-IR state
    jm0^m1^^m0^^
    m> 11111 ok
    # write 'IDCODE' instruction (0x01) and leave state
    ji1^i0^^m1^
    m> 0001 ok
    # enter Shift-DR state and read IDCODE<0> (fixed '1')
    jm1^m0^m1^m0^^
    m> 11111 ok
    # read IDCODE<1..11> (manufacturer ID);
    # it should read 0x01f, for 'atmel'
    j^^^^
    m> 1111 ok
    j^^^^
    m> 1000 ok
    j^^^
    m> 000 ok
    # read IDCODE<12..27> (part number);
    # it should read 0x9502 (ATmega32)
    j^^^^
    m> 0100 ok
    j^^^^
    m> 0000 ok
    j^^^^
    m> 1010 ok
    j^^^^
    m> 1001 ok
    # read IDCODE<28..31> (JTAG-version, '1'='A')
    j^^^
    m> 100 ok
    $
Goodie, everything works! Note that output-bits are printed in order of appearance, i.e. reversed. (e.g. '0100' means '0010b', means 2)
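For those who don't like reversing bits in their head, the check can be done with a few lines of Python. This is only a sketch (the helper name `bits_to_int` is made up); the bit strings are copied from the log above, in order of appearance, and only the manufacturer and part-number fields are decoded.

```python
def bits_to_int(bits):
    """Convert a bit string in order of appearance (LSB first) to an integer."""
    return int(bits[::-1], 2)


# Bits exactly as logged above, in order of appearance.
idcode_bits = (
    "1"                              # IDCODE<0>, last bit of the '11111' reply
    "1111" "1000" "000"              # IDCODE<1..11>, manufacturer ID
    "0100" "0000" "1010" "1001"      # IDCODE<12..27>, part number
)

manufacturer = bits_to_int(idcode_bits[1:12])
part_number = bits_to_int(idcode_bits[12:28])
print(hex(manufacturer), hex(part_number))  # 0x1f 0x9502
```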
So ok, one of the really beautiful things, once again, is the varying datapath length through the chip. We will now make a 1-cell datapath using the 'BYPASS' instruction (0x0f), which is there just for this purpose :-)
Since the data-register consists of only 1 cell, this means whatever is shifted in at the TDI-end, falls right out at TDO, with a delay of 1 clock-cycle. Or rather, since bits are clocked in at rising edge, and clocked out at falling edge of TCK, this means we'll see whatever we shifted in appear at the other end, when using our 'j'-command:
    $ ./wr < bypass.scr
    #
    # Put device in bypass-mode, shift a pattern in,
    # and see it being output in corresponding falling
    # clock-edges right away.
    #
    # reset TAP-controller
    jm1^^^^^
    m> 11111 ok
    # enter Shift-IR state
    jm0^m1^^m0^^
    m> 11111 ok
    # write 'bypass' instruction (0x0f) and leave state
    ji1^^^m1^
    m> 0001 ok
    # enter Shift-DR state
    jm1^m0^m1^m0^^
    m> 11110 ok
    # input a pattern ('0000011111', in order of writing),
    # observe it passing through the chain, and leave state
    ji0^^^^^i1^^^^m1^
    m> 0000011111 ok
    $
Setting pin from software, reading from JTAG
Ok, let's drive pins low/high from software, and read them back using JTAG. This can be done by selecting the boundary-chain register to sit in between TDI and TDO, and then shifting its contents out.
The 'SAMPLE/PRELOAD' instruction (0x02) is used to take a snapshot of the external pins (in state Capture-DR), and to load latches from boundary-chain cells (in state Update-DR). The latches are still decoupled from output-pins using this instruction - the 'EXTEST' instruction can be selected to actually drive the pins with preloaded values.
The boundary-chain register itself is big, BTW - a whopping 140 bits. Many cells are reserved for the ADC circuitry, which was quite surprising to yours truly. For each I/O-pin, 'pull-up enable' ('PUE'), 'output control' ('OC') and 'output data' ('OD') bits occur in the boundary-chain.
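The interesting offsets in the chain (per pin, cells appear in the order PUE, OC, OD) are as follows:
|pin 'Y'||BSC<38..40>|
|pin 'X'||BSC<41..43>|
|pin 'P'||BSC<47..49>|
|pin 'B'||BSC<53..55>|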
To set pins 'A', 'P' and 'X' (outgoing sides of all 3 lines between master and slave) low:
    $ ./wr < set_pins_low.scr
    # Set all pins ('A' on master-side, 'P' and 'X' on slave-side) low
    wa0
    m> a:1->0
    wp0
    s> p:1->0
    wx0
    s> x:1->0
    $
To verify/read pin-states from JTAG ('BSC' = boundary-chain cells):
    $ ./wr < read_pins.scr
    #
    # Read pins 'P', 'B', 'X' and 'Y'.
    #
    # reset TAP-controller
    jm1^^^^^
    m> 11111 ok
    # enter Shift-IR state
    jm0^m1^^m0^^
    m> 11111 ok
    # write 'SAMPLE/PRELOAD' instruction (0x02) and leave state
    ji0^i1^i0^m1^
    m> 0001 ok
    # enter Shift-DR state and read BSC<0> (irrelevant)
    jm1^m0^m1^m0^^
    m> 11110 ok
    # read BSC<1..37> (irrelevant)
    j^^^^^^^^^^
    m> 0100100100 ok
    j^^^^^^^^^^
    m> 1001001001 ok
    j^^^^^^^^^^
    m> 0010000100 ok
    j^^^^^^^
    m> 1001001 ok
    # read BSC<38..40> ('Y' PUE/OC/OD)
    j^^^
    m> 000 ok
    # confirm by reading same pin from software (slave-side only)
    ry
    s> y:0
    # read BSC<41..43> ('X' PUE/OC/OD)
    j^^^
    m> 010 ok
    # confirm by reading same pin from software (slave only)
    rx
    s> x:0
    # read BSC<44..46> (irrelevant)
    j^^^
    m> 001 ok
    # read BSC<47..49> ('P' PUE/OC/OD)
    j^^^
    m> 010 ok
    # confirm by reading same pin from software (slave-, then master-side)
    rp
    s> p:0
    rq
    m> q:0
    # read BSC<50..52> (irrelevant)
    j^^^
    m> 001 ok
    # read BSC<53..55> ('B' PUE/OC/OD)
    j^^^
    m> 000 ok
    # confirm by reading same pin from software (master-, then slave-side)
    ra
    m> a:0
    rb
    s> b:0
    $
Pins 'X' and 'P' show 'OC=1', which makes sense, since they are the only output-pins. All pins show low state (OD=0).
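Picking 3-bit groups out of a long bit string by eye gets old quickly, so here is a small Python sketch that does it. It is an illustration only: `PIN_OFFSETS` and `decode_pins` are made-up names, the offsets are the ones from the log comments above, and the bit string is the 'pins low' run copied in order of appearance.

```python
# Offsets as used in the log comments above; each pin has three cells,
# appearing in the order PUE, OC, OD.
PIN_OFFSETS = {"Y": 38, "X": 41, "P": 47, "B": 53}


def decode_pins(bsc_bits):
    """Map pin name -> its PUE/OC/OD bits, taken from a captured chain."""
    result = {}
    for pin, start in PIN_OFFSETS.items():
        pue, oc, od = (int(b) for b in bsc_bits[start:start + 3])
        result[pin] = {"PUE": pue, "OC": oc, "OD": od}
    return result


# BSC<0..55> from the 'pins low' run, in order of appearance.
low_run = (
    "0"                                                # BSC<0>
    "0100100100" "1001001001" "0010000100" "1001001"   # BSC<1..37>
    "000" "010" "001" "010" "001" "000"                # BSC<38..55>
)

pins = decode_pins(low_run)
for name, cells in pins.items():
    print(name, cells)
```

This prints OC=1 for 'X' and 'P' only, and OD=0 for all four pins, matching the observation above.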
To set pins high from software:
    $ ./wr < set_pins_high.scr
    # Set all pins ('A' on master-side, 'P' and 'X' on slave-side) high
    wa1
    m> a:0->1
    wp1
    s> p:0->1
    wx1
    s> x:0->1
    $
...and to verify they are indeed seen as 'high' from JTAG:
    $ ./wr < read_pins.scr
    #
    # Read pins 'P', 'B', 'X' and 'Y'.
    #
    # reset TAP-controller
    jm1^^^^^
    m> 11111 ok
    # enter Shift-IR state
    jm0^m1^^m0^^
    m> 11111 ok
    # write 'SAMPLE/PRELOAD' instruction (0x02) and leave state
    ji0^i1^i0^m1^
    m> 0001 ok
    # enter Shift-DR state and read BSC<0> (irrelevant)
    jm1^m0^m1^m0^^
    m> 11110 ok
    # read BSC<1..37> (irrelevant)
    j^^^^^^^^^^
    m> 0000100100 ok
    j^^^^^^^^^^
    m> 1001001001 ok
    j^^^^^^^^^^
    m> 0010000100 ok
    j^^^^^^^
    m> 1001001 ok
    # read BSC<38..40> ('Y' PUE/OC/OD)
    j^^^
    m> 001 ok
    # confirm by reading same pin from software (slave-side only)
    ry
    s> y:1
    # read BSC<41..43> ('X' PUE/OC/OD)
    j^^^
    m> 011 ok
    # confirm by reading same pin from software (slave only)
    rx
    s> x:1
    # read BSC<44..46> (irrelevant)
    j^^^
    m> 001 ok
    # read BSC<47..49> ('P' PUE/OC/OD)
    j^^^
    m> 011 ok
    # confirm by reading same pin from software (slave-, then master-side)
    rp
    s> p:1
    rq
    m> q:1
    # read BSC<50..52> (irrelevant)
    j^^^
    m> 001 ok
    # read BSC<53..55> ('B' PUE/OC/OD)
    j^^^
    m> 001 ok
    # confirm by reading same pin from software (master-, then slave-side)
    ra
    m> a:1
    rb
    s> b:1
    $
Note that directions of relevant pins ('B', 'P', 'X' and 'Y') are still as they were, but all pins show high state now (OD=1)!
Setting pins from JTAG, reading from software
Aargh... I cannot seem to get this working. I am using the 'EXTEST' instruction to drive preloaded latch-values to output pins, but it just doesn't seem to work.
Meanwhile, it works! - sort of. At this point yours truly is slightly drunk, listening to Johnny Cash, but still able enough to at least verify some pins change from hi to lo:
    $ ./wr < write_pin_P.scr
    #
    # Toggle pin P from JTAG, read back from software on master/slave
    #
    # set pin P high from software, verify on other end
    wp1
    s> p:0->1
    rq
    m> q:1
    # reset TAP-controller
    jm1^^^^^
    m> 11111 ok
    # enter ShiftIR
    jm0^m1^^m0^^
    m> 11111 ok
    # enter 'EXTEST' instruction (0x00) to drive pins, and leave state
    ji0^^^m1^
    m> 0001 ok
    # enter Shift-DR state
    jm1^m0^m1^m0^^
    m> 11110 ok
    # clock a shitload of bits in (and out), and leave state
    ji0
    m> ok
    j^^^^^^^^^^
    m> 0100100100 ok
    j^^^^^^^^^^
    m> 1001001001 ok
    j^^^^^^^^^^
    m> 0010000100 ok
    j^^^^^^^^^^
    m> 1001001000 ok
    j^^^^^^^^^^
    m> 0000010000 ok
    j^^^^^^^^^^
    m> 0100000100 ok
    j^^^^^^^^^^
    m> 0001100010 ok
    j^^^^^^^^^^
    m> 1100100100 ok
    j^^^^^^^^^^
    m> 1001001001 ok
    j^^^^^^^^^^
    m> 0010000101 ok
    j^^^^^^^^^^
    m> 1000100000 ok
    j^^^^^^^^^^
    m> 0001000100 ok
    j^^^^^^^^^^
    m> 0000000100 ok
    j^^^^^^^^^^
    m> 0000110010 ok
    jm0^
    m> 0 ok
    # enter Shift-DR to capture
    jm1^m0^m1^m0^^
    m> 11100 ok
    # see what software says now
    rp
    rq
    m> q:0
    # Q is low! hurrah! :-)
    $
Note that at this point, slave seems dead (this might be because oscillator is disabled), but master reads low voltage on incoming end of line 'PQ'! I really don't care anymore at this point. This is what I have been trying to achieve for days -- THE END :-)
Therefore, from slight disappointment, this experiment turned into glamorous success! :-)
Have fun -- Michai