1. Undetectable UWB Networks http://bwrc.eecs.berkeley.edu
2. The problem of UWB interference Other users are upset (to say the least) about the potential of UWB transmissions in their band
UWB transmission at Part 15 spectral mask levels would severely degrade narrowband systems
How low should the requirement be?
Make the transmission level so low that it is completely masked by kT thermal noise
3. Some numbers Received thermal noise level is -174 dBm/Hz
Over a 20 MHz band (let's take 802.11a as an example) the rms noise is -101 dBm.
The use of Part 15 EIRP levels would yield -91 dBm at 8 meters (6 dB of antenna gain)
This gives a sensitivity loss of 10 dB
No wonder they are unhappy!
(Bill McFarlin, Atheros Comm.)
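The arithmetic on this slide can be checked with a short sketch. The -174 dBm/Hz density, the 20 MHz bandwidth and the -91 dBm received level are taken from the slide; the function name is only illustrative, and the "loss" is computed the way the slide does, as the simple difference between the received UWB level and the noise floor.

```python
import math

THERMAL_NOISE_DBM_PER_HZ = -174.0  # thermal noise density quoted on the slide


def noise_floor_dbm(bandwidth_hz):
    """Thermal noise power in dBm integrated over the given bandwidth."""
    return THERMAL_NOISE_DBM_PER_HZ + 10.0 * math.log10(bandwidth_hz)


noise = noise_floor_dbm(20e6)   # 802.11a channel bandwidth
uwb_received_dbm = -91.0        # Part 15 level at 8 m, as quoted on the slide
excess_db = uwb_received_dbm - noise  # how far the UWB signal sits above the noise

print(f"noise floor: {noise:.1f} dBm, UWB above noise by {excess_db:.1f} dB")
```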
4. One answer Transmit at power levels that do not degrade sensitivity (i.e. at levels so that the received signals at some distance are below the thermal noise level)
Then the question is what data rate can we support?
More numbers
5. More numbers Capacity(AWGN) = BW * ln(1+SNR)
If SNR << 1 then: Capacity = BW * SNR
For BW = 3 GHz
SNR = .1
Capacity = 300 Mbits/sec
(Not bad.)
OK is this achievable???
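A quick numerical check of the capacity estimate, as a sketch. Because the slide's formula uses the natural logarithm, the result is strictly in nats per second; the 300 figure is the low-SNR approximation BW * SNR. The function names below are illustrative.

```python
import math


def awgn_capacity_nats(bw_hz, snr):
    """Shannon capacity with the natural log, in nats per second."""
    return bw_hz * math.log(1.0 + snr)


def awgn_capacity_bits(bw_hz, snr):
    """The same capacity expressed in bits per second (log base 2)."""
    return bw_hz * math.log2(1.0 + snr)


bw, snr = 3e9, 0.1
low_snr_approx = bw * snr                 # the slide's BW * SNR shortcut
exact_nats = awgn_capacity_nats(bw, snr)
exact_bits = awgn_capacity_bits(bw, snr)

print(f"BW*SNR approximation: {low_snr_approx / 1e6:.0f} M/s")
print(f"exact: {exact_nats / 1e6:.0f} Mnats/s = {exact_bits / 1e6:.0f} Mbits/s")
```

Either way the estimate stays in the range of a few hundred megabits per second, which is the point the slide makes.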
6. Realizing 300MBits/second Two issues
Are we in the overspread regime, where too much bandwidth causes a loss in capacity?
Is it implementable at low cost in a reasonable time frame?
7. The Overspreading Story (Tse) As the bandwidth increases the number of resolved multipath rays, L, increases proportionally: L ~ BW * Td.s. (Td.s. = delay spread)
The coherence time, Tcoh, tells us how long the multipath remains stationary
The energy available for estimation of the multipath components is Power*Tcoh
(These channel estimates are necessary to recover the energy using a rake receiver)
8. The overspreading problem Therefore the energy to noise ratio of each ray decreases with increasing bandwidth and thus becomes increasingly difficult to estimate
Multipath ray ENR = P*Tcoh / (L * No) = P*Tcoh / (No*BW*Td.s.)
~ 1/BW
9. When is it a problem? What happens when we get into an overspreading situation? (from Telatar and Tse)
Critical parameter a = P*Tcoh / (No*L)
P = Total power
No = Additive noise
Tcoh = Coherence time
L = # of resolvable paths (>>1)
If a << 1 then Capacity = a * Capacity(AWGN)
What regime are we in???
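The critical parameter can be explored numerically. The sketch below combines the two relations from the slides, L = BW * Td.s. and a = P*Tcoh / (No*L). Every numeric value is a made-up placeholder chosen only to exercise the formula, not a measurement from the talk, so the printed numbers say nothing about which regime the real system is in.

```python
def resolvable_paths(bw_hz, delay_spread_s):
    """Number of resolved multipath rays, L = BW * Td.s."""
    return bw_hz * delay_spread_s


def critical_parameter(power_w, t_coh_s, n0_w_per_hz, bw_hz, delay_spread_s):
    """a = P*Tcoh / (No*L): per-ray energy-to-noise ratio available for estimation."""
    paths = resolvable_paths(bw_hz, delay_spread_s)
    return power_w * t_coh_s / (n0_w_per_hz * paths)


# Hypothetical channel and power values, for illustration only.
example = dict(power_w=1.2e-12, t_coh_s=1e-3, n0_w_per_hz=4e-21,
               delay_spread_s=50e-9)

for bw in (0.5e9, 1e9, 3e9):
    a = critical_parameter(bw_hz=bw, **example)
    print(f"BW = {bw / 1e9:.1f} GHz, L = {resolvable_paths(bw, 50e-9):.0f}, a = {a:.3g}")
```

Whatever values are plugged in, doubling the bandwidth halves a, which is the 1/BW scaling noted on the previous slide.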
10. Implementation Issues To achieve the 300 Mbit/sec limit we need to A/D at twice the total bandwidth or 6 Gsamples/second.
We then need to perform 100 correlations of 100 points each at a rate of 300 Msamples/sec, or 3 x 10^12 operations/second
(100 rays in the rake, length 100 codes)
Is that really possible?
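The operation count follows directly from the numbers on this slide. In the sketch below, treating each code point of each correlation as one operation is an assumption made to match the slide's total; the constant names are illustrative.

```python
SIGNAL_BW_HZ = 3e9        # total bandwidth from the capacity slide
RAKE_FINGERS = 100        # rays in the rake
CODE_LENGTH = 100         # points per correlation
OUTPUT_RATE_HZ = 300e6    # correlation outputs per finger per second

adc_rate = 2 * SIGNAL_BW_HZ  # sampling at twice the total bandwidth
ops_per_second = RAKE_FINGERS * CODE_LENGTH * OUTPUT_RATE_HZ

print(f"A/D rate: {adc_rate / 1e9:.0f} Gsamples/s")
print(f"correlator load: {ops_per_second:.1e} operations/s")
```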
11. What are alternative architectures?
12. Direct mapping architectures
13. Energy and Area Efficiency Metrics Definition: MOPs
Millions of algorithmically defined arithmetic operations (e.g. multiply, add, shift, delay); in a GP processor, several instructions are needed per useful operation
Figures of merit
MOPs/mW: Energy efficiency
MOPs/mm²: Area (cost) efficiency
What are these efficiencies for advanced CMOS using the most highly optimized architecture?
14. What can a fully parallel CMOS solution potentially do? In 0.13 micron a multiplier requires 0.02 mm² and 3 pJ per operation at 1 V. Adders and registers are about 10 times smaller and 10 times lower energy
Let's implement a 50 mm² chip using adders, registers and multipliers
We can have 10,000 adders/registers and 500 multipliers in about 1/2 of the chip; also assume 1/3 of power goes into clocks
100 MHz clock (1 volt) gives ~1000 Bops at 150mW
3000 Mops/mW and 20,000 Mops/mm2
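The throughput and area figures can be reproduced roughly from the per-unit numbers on this slide. The sketch assumes every unit completes one operation per clock cycle and takes "about 10 times smaller" literally as a factor of 10. The power figure is not recomputed because the slide does not spell out the assumptions behind it.

```python
MULT_AREA_MM2 = 0.02
ADDER_AREA_MM2 = MULT_AREA_MM2 / 10   # "about 10 times smaller", taken literally
N_MULT, N_ADD = 500, 10_000
CHIP_AREA_MM2 = 50.0
CLOCK_HZ = 100e6

# Area occupied by the arithmetic units themselves.
datapath_area = N_MULT * MULT_AREA_MM2 + N_ADD * ADDER_AREA_MM2

# Assumption: one operation per unit per clock cycle.
ops_per_second = (N_MULT + N_ADD) * CLOCK_HZ
mops_per_mm2 = ops_per_second / 1e6 / CHIP_AREA_MM2

print(f"datapath area: {datapath_area:.0f} mm2 of {CHIP_AREA_MM2:.0f} mm2")
print(f"throughput: {ops_per_second / 1e9:.0f} Bops")
print(f"area efficiency: {mops_per_mm2:.0f} Mops/mm2")
```

Under these assumptions the units occupy 30 mm², and the throughput and area efficiency come out slightly above the rounded 1000 Bops and 20,000 Mops/mm² quoted on the slide.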
15. Chip power and size What does it take to provide 3000 Bops?
Power = 3 x 10^12 / 3000 Mops/mW = 1 Watt
Area = 3 x 10^12 / 20,000 Mops/mm² = 150 mm²
Reasonable.
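The unit conversion behind these two results is easy to get wrong, so here is a small sketch of it. The efficiency figures are the ones from the previous slide; the variable names are illustrative.

```python
REQUIRED_OPS_PER_S = 3e12      # 3000 Bops for the rake correlators
ENERGY_EFF_MOPS_PER_MW = 3000
AREA_EFF_MOPS_PER_MM2 = 20_000

required_mops = REQUIRED_OPS_PER_S / 1e6            # convert ops/s to Mops
power_w = required_mops / ENERGY_EFF_MOPS_PER_MW / 1000  # mW to W
area_mm2 = required_mops / AREA_EFF_MOPS_PER_MM2

print(f"power: {power_w:.1f} W, area: {area_mm2:.0f} mm2")
```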
16. Our Approach Transmission at below thermal noise levels
use matched filter with processing gain to improve SNR
Highly optimized analog and digital architectures
Design and implement radio at frequencies up to 60 GHz
17. System Modeling Develop models for antenna, packaging and CMOS circuitry appropriate for UWB (wideband vs. narrowband approximations)
Simulink model of complete baseline system to drive SShaft and Bee design flows
18. Transmitter Modeling
19. Receiver Modeling
20. Simulink System Simulation Transmit ideal Gaussian doublets modulated with a length 2200 code, whose chip time is 10ns at a 2MHz rate
Reception: simple correlator
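The idea of recovering a below-noise signal with a simple correlator can be sketched in a few lines. This is not the Simulink model from the talk: it replaces the Gaussian doublets with plain +1/-1 chips, and the amplitude, noise level and random seed are arbitrary illustrative choices. Only the code length of 2200 comes from the slide.

```python
import math
import random


def despread(received, code):
    """Correlate the received chips with the known code and normalize by its length."""
    return sum(r * c for r, c in zip(received, code)) / len(code)


rng = random.Random(1)   # fixed seed so the run is repeatable
CODE_LENGTH = 2200       # from the slide
AMPLITUDE = 0.3          # assumed per-chip signal amplitude (below the noise)
NOISE_SIGMA = 1.0        # assumed per-chip noise standard deviation

code = [rng.choice((-1.0, 1.0)) for _ in range(CODE_LENGTH)]
received = [AMPLITUDE * c + rng.gauss(0.0, NOISE_SIGMA) for c in code]

estimate = despread(received, code)
chip_snr_db = 10 * math.log10(AMPLITUDE ** 2 / NOISE_SIGMA ** 2)
processing_gain_db = 10 * math.log10(CODE_LENGTH)

print(f"per-chip SNR: {chip_snr_db:.1f} dB")
print(f"processing gain: {processing_gain_db:.1f} dB")
print(f"recovered amplitude: {estimate:.2f} (transmitted {AMPLITUDE})")
```

Correlating over the whole code averages the noise down by the code length, so a signal about 10 dB below the per-chip noise is still recovered with a comfortable margin.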
21. Undetectable UWB Real time waveforms in multipath with and w/o noise:
22. Undetectable UWB Correlation profiles in multipath with and w/o noise:
First arrival signal arrives at 103 ns
23. Undetectable UWB Correlation profiles in analog and digital domain after 3 bit A/D:
24. Analog Circuits - Pulse Reception
25. Proposed System Architecture As we only need to convert a window of time quickly, we might borrow the architecture used for a sampling oscilloscope. Essentially a bank of parallel sample and holds grab the signal at offsets. We then have the rest of the pulse repetition period to convert the result.
Essentially this brings the digital A/Ds about as close to the antenna as they can get. This has the advantage that only the front-end gain blocks and the S/H switches need to have a large BW. The A/D rate is lower, hence they are not constrained to fast operation (and hence larger power). Note that if we attempted an analog correlator prior to the A/D (perhaps to mitigate the effect of interferers on the number of bits we need in the A/D), then we would need an array of additional integrators, all operating at 1 GHz, plus we would have to generate the received pulse waveform and its offsets, which costs much more power.
Digital correlation is flexible. Could program up the MultAcc filter.
Problems with this front-end: Doesn't scale well to a wide window. Wind up with too many parallel S/Hs and A/Ds. (Could look at resource sharing).
Still need to design clock edges, gain and S/H tracking BW at 1 GHz (although in 0.12um, this doesn't seem immensely difficult.)
I have only just started looking at the circuits, but as it now stands: Current Estimates need 24 S/H/A/Ds at 0.5 ns spacing (12 ns window). A/D 4 to 6 bits. Gain ~ 45 dB. NF, don't know, been using 10 dB.
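The timing budget implied by these estimates can be sketched as follows. The 24 samplers and 0.5 ns spacing come from the notes above. Using the 2 MHz rate from the Simulink slide as the pulse repetition rate is an assumption made here only to put a number on "the rest of the pulse repetition period".

```python
N_SAMPLERS = 24          # parallel S/H + A/D channels, from the notes
SPACING_S = 0.5e-9       # sampling offset between adjacent channels
PULSE_RATE_HZ = 2e6      # assumed pulse repetition rate (from the Simulink slide)

window_s = N_SAMPLERS * SPACING_S          # time span captured per pulse
equivalent_rate_hz = 1 / SPACING_S         # effective sample rate inside the window
period_s = 1 / PULSE_RATE_HZ
conversion_time_s = period_s - window_s    # time left for each A/D to convert

print(f"capture window: {window_s * 1e9:.0f} ns")
print(f"equivalent sample rate: {equivalent_rate_hz / 1e9:.0f} Gsamples/s")
print(f"time left per conversion: {conversion_time_s * 1e9:.0f} ns")
```

Under that assumption each A/D only has to complete one conversion per pulse period, which is why the individual converters can be slow and low power.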
26. Power Budget for a Low Power version
27. How do we do the digital design? New Software:
Generation of netlists from Simulink
Merging of floorplan from last iteration
Automatic routing and performance analysis
Automation of flow as a dependency graph (like the UNIX MAKE program)
We found that in order for this to happen, we had to write a lot of new software. First, we wrote software to translate data-flow graphs from Simulink, our chosen editor, to an electronic design format. This elaboration step must also invoke macro generators and stitch everything into a netlist of routable objects. Next, we wrote programs to merge placement information from the floorplan views with the netlist, creating autoLayout views. Physical designers modify these autoLayout views and save them as floorplans for the next iteration. We also wrote programs which automatically route, verify, and characterize the design. Lastly, we described our design flow as a dependency graph and created a tool much like the UNIX MAKE program to automate it.
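A make-like flow of this kind can be sketched with Python's standard `graphlib` module. This is not the tool described in the talk: the step names are hypothetical labels for the stages mentioned above, and the functions only illustrate how a dependency graph decides what to run and in which order.

```python
from graphlib import TopologicalSorter

# Hypothetical flow: each step maps to the set of things it depends on.
FLOW = {
    "netlist": {"simulink_model"},             # elaboration from Simulink
    "auto_layout": {"netlist", "floorplan"},   # merge placement with the netlist
    "routed": {"auto_layout"},                 # automatic routing
    "verified": {"routed"},
    "characterized": {"routed"},
}


def build_order(flow):
    """Return all nodes in an order that respects every dependency."""
    return list(TopologicalSorter(flow).static_order())


def stale_steps(flow, changed):
    """Steps that must be rerun when the inputs in `changed` are modified."""
    dirty = set(changed)
    rerun = []
    for step in build_order(flow):
        # A step is stale if anything it depends on is dirty.
        if step in flow and dirty & flow[step]:
            dirty.add(step)
            rerun.append(step)
    return rerun


print(stale_steps(FLOW, {"floorplan"}))
```

Editing only the floorplan leaves the netlist untouched and reruns layout, routing, verification and characterization, which mirrors the iteration loop the notes describe.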
28. Simulink for Design Entry Simulink is an easy sell to algorithm developers
Closely integrated with popular system design tool Matlab
Successfully models digital and analog circuits
Why did we choose Simulink as our data-flow graph editor? In short, Simulink is an easy sell to algorithm developers. This is primarily because it is closely integrated with Matlab, which is popular among algorithm experts and system designers. If we're serious about getting our algorithm people to create layout, then we need to make it as easy as possible for them to approach our environment. Furthermore, we have successfully modeled a variety of digital data-paths with Simulink as well as co-simulating them with models of analog circuits. Thus, we know that Simulink is sufficient for the kinds of wireless baseband algorithms which are of most interest to us. This simple example of a time-multiplexed FIR filter illustrates how we use Simulink. Here we see a multiply-accumulate block being fed by an input data stream and tap coefficients from an SRAM and control logic.
29. Example 1: Test Chip 300k transistors
0.25 µm
1.0 V
25 MHz
6.8 mm²
14 mW
2 phase clock
3 layers of P&R hierarchy
Here is a die photo of the first test-chip made with our flow. It is a version of the parallel pipelined FIR filter shown in the last slide using the hierarchical floorplan shown earlier. The design has 3 layers of routing hierarchy. This was more layers than necessary, but it allowed us to exercise our hierarchical place & route flow more thoroughly. The chip has 300,000 transistors and consumes 14 mW at 25 MHz. This chip demonstrates our entire methodology except for race-immune clock-tree synthesis. A 2-phase clock was used to avoid race problems. The low ratio of transistors to area is due to the excessive detail of the floorplanning. Later versions of the flow allow selective flattening of the hierarchy to improve density.
30. Example 2: CDMA Baseband Receiver 500k transistors
0.18 µm
1.0 V
25 MHz
1.1 mm²
21 mW
single phase clock
5 clock domains
2 layers of P&R hierarchy
A complete baseband receiver chip which exercises the flow more thoroughly is scheduled to be taped out within the next month. The design includes 3 Module Compiler macros: a carrier detection macro to recover coarse timing, a frequency estimation block to achieve fine timing, and a rotate and correlate block with a phase locked loop to coherently provide soft symbols. The design also features control logic generated from Stateflow. This design achieves a greater density than the test chip by having only 2 layers of routing hierarchy. This design also has a single phase clock with 5 domains, allowing the clock to be switched off when not in use to save power.
31. Test Bed for System Verification Goal: Evaluation of algorithms
Approaches
Simulation: inexpensive, but slow and inaccurate (Simulink)
HW Prototyping: accurate, but slow and expensive (Sshaft)
HW Emulation: fast, accurate, & inexpensive (Bee)
Our HW Emulator is called BEE
32. What's BEE? A real time hardware emulator built with multiple high density Field Programmable Gate Arrays (FPGAs), ~ 5,000 Bops
Designed to directly emulate the digital portion of the chip and interface with analog front-ends.
Fully automated design flow from Simulink to FPGA configuration bit stream.
33. BEE Architecture Processing Board
Total 20 Xilinx VirtexE 2000 chips, 16 for processing, 4 for interchip routing; 16 ZBT SRAM chips, 1MB each.
Control module
Intel StrongARM 1110, on board 10 Base-T Ethernet, Linux OS
Radio Rx/Tx Boards
UWB transceiver: 1 GHz 4 bit A/D, ECL edge generator, discrete amplifiers
Design Flow
From Simulink MDL to FPGA bit stream
34. BEE Processing Board
35. BEE UWB Front-End 1 to 1.5 GHz A/D (4-8 bits)
1 ns pulse generation
Modular front-end for evaluation to evolve over time
36. Conclusions Test circuits, analog and digital, to test out undetectable UWB operation
Bee test bed: fast evaluation of algorithms for base band processing
Evaluation of UWB and frequency bands ranging up to 60 GHz
Most importantly, we need to support a UWB approach which no one can logically object to because they can't detect it!
http://bwrc.eecs.berkeley.edu
In conclusion, we believe that the way to realize the benefits of direct-mapped architectures is by giving system designers the means to create layout and explore the design space. With this approach, the phases of the design process are determined by which part of the hierarchy is currently being hardened, rather than by which designer's expertise is currently being used. The focus on low supply voltages reduces the impact of interconnect and makes the design flow easier to automate. And lastly, we have demonstrated that by expressing our design flow as a dependency graph and writing a few pieces of new software, getting layout from a system level description within a day is feasible. This concludes my talk. Thank you for your attention.