
Echoes and Gradients: The Hidden Unity Between Radar Imaging and Deep Learning

Dec. 19, 2025


At first glance, a passive radar system and a Deep Neural Network (DNN) seem to inhabit different worlds. One deals with wave physics, antennas, and scattering coefficients; the other deals with tensors, activation functions, and gradient descent.

However, if you strip away the domain-specific jargon, you find that the algorithm used to form an image from radar echoes (Back-Projection) and the algorithm used to train a neural network (Back-Propagation) are, at their core, the same mathematical operation.

They are both applications of the Adjoint Operator to solve an Inverse (Reconstruction) Problem.

In this post, we will explore this duality, using the specific example of a passive imaging system to illuminate the inner workings of deep learning.

1.  The Forward Pass: Scattering vs. Inference

To understand the backward process, we must first look at the forward process—how data generates an output.

1.1.  The Imaging System

Imagine a passive radar system with multiple receivers (Rx) to detect active targets.

  1. Transmission: The transmission process follows the physics of electromagnetic wave propagation, which can be modeled by the Green’s function $\mathcal{H}(\mathbf r)$. In this article, we use the 2D free-space Green’s function:

     $$\mathcal{H}(\mathbf r) = \dfrac{1}{\sqrt{j\pi |\mathbf r|}}\exp(j2\pi|\mathbf r|/\lambda)$$

     where $\lambda$ is the wavelength and $|\mathbf r|$ is the distance from the transmitter to the observation point.
  2. Target: The target emits an electromagnetic wave, denoted $s(\mathbf r)$, which propagates through space.
  3. Reception: The receivers collect the waves from the target. The received signal $g(\mathbf{r})$ is a convolution of the emitted wave and the Green’s function:

     $$g(\mathbf r) = s(\mathbf r) * \mathcal{H}(\mathbf r)$$

     where $*$ denotes convolution over space $\mathbf r$.

Following is an interactive demo of the Green’s function and the conjugate Green’s function.
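Since the interactive demo does not embed in plain text, here is a minimal numpy sketch of the forward model above. The wavelength, grid, and target position are illustrative assumptions rather than values from any real system; the Green’s function is implemented exactly as written above.

```python
import numpy as np
from scipy.signal import fftconvolve

# Illustrative parameters (not from any real system): wavelength and a small 2D grid.
lam = 0.5                          # wavelength [m]
x = np.linspace(-5.0, 5.0, 201)    # grid coordinates [m]
dx = x[1] - x[0]
X, Y = np.meshgrid(x, x)

def greens_2d(X, Y, lam, eps=1e-3):
    """2D free-space Green's function as written above:
    H(r) = exp(j*2*pi*|r|/lam) / sqrt(j*pi*|r|)."""
    r = np.hypot(X, Y) + eps                       # avoid the singularity at r = 0
    return np.exp(1j * 2.0 * np.pi * r / lam) / np.sqrt(1j * np.pi * r)

H = greens_2d(X, Y, lam)

# A single point target s(r): a delta at (x, y) = (1.0, -2.0).
s = np.zeros_like(X, dtype=complex)
s[np.argmin(np.abs(x + 2.0)), np.argmin(np.abs(x - 1.0))] = 1.0

# Forward model g = s * H: the field that a receiver anywhere on the grid would record.
g = fftconvolve(s, H, mode="same") * dx**2
```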

1.2.  The Neural Network

Now, look at a single connection in a neural network between an input neuron and an output neuron.

  1. Input: An activation value passes from the previous layer.
  2. Interaction: The value is multiplied by a Weight $\theta$.
  3. Output: The result contributes to the next layer.

Mathematically, the output $\hat{y}$ is the input $x$ modulated by the model weights $\theta$:

$$\hat{y} = \pi_\theta(x)$$

The loss function $\mathcal{L}(\hat{y}, y)$ measures the difference between the predicted output $\hat{y}$ and the true label $y$. Our goal is to minimize this loss by adjusting the weights $\theta$:

$$\min_{\theta} \sum \mathcal{L}(\pi_\theta(x), y)$$
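As a concrete (and deliberately tiny) illustration, here is a numpy sketch of this forward pass and loss. The one-hidden-layer architecture, tanh activation, and squared-error loss are arbitrary choices for the example, not anything prescribed by the post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative one-hidden-layer network (sizes, activation, and loss are assumptions).
W1, b1 = rng.normal(size=(8, 2)), np.zeros(8)
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)
sigma = np.tanh                      # element-wise activation

def pi_theta(x):
    """Forward pass: y_hat = pi_theta(x)."""
    return W2 @ sigma(W1 @ x + b1) + b2

def loss(y_hat, y):
    """Squared-error loss L(y_hat, y)."""
    return 0.5 * np.sum((y_hat - y) ** 2)

x, y = np.array([0.3, 0.9]), np.array([1.0])
print(loss(pi_theta(x), y))          # the quantity we minimize over theta
```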

2.  The Inverse Problem: “Who is Responsible?”

In both fields, we face the same inverse question: given the observed output, which hidden source produced it?

2.1.  Imaging: Time Reversal

How do we find the target? We take the signal received at the Rx and perform Time Reversal. We mathematically “play the tape backward.”
We project the received signal back into space, applying a negative time delay corresponding to the distance from Rx to the target, and then from the target to Tx.

2.1.1.  Back-Projection in Imaging

Now, the imaging problem: we have the received data $g(\mathbf r)$ and we want to find the target $s(\mathbf r)$. How do we do it? We mathematically “propagate” the received waves back into space to see where they intersect. This is often called Matched Filtering or Back-Projection. Mathematically, we convolve the received signal with the complex conjugate of the Green’s function, $\mathcal{H}^*(\mathbf r)$:

$$\begin{aligned} \hat s(\mathbf r) &= g(\mathbf r) * \mathcal{H}^*(\mathbf r) \\ &= \left[ s(\mathbf r) * \mathcal{H}(\mathbf r) \right] * \mathcal{H}^*(\mathbf r) \\ &= s(\mathbf r) * \left[ \mathcal{H}(\mathbf r) * \mathcal{H}^*(\mathbf r) \right] \\ &= s(\mathbf r) * \delta(\mathbf r) \\ &= s(\mathbf r) \end{aligned}$$

The physical meaning of $\mathcal{H}^*(\mathbf r)$ is that we are reversing the wave propagation process. By convolving with the conjugate Green’s function, we assume the sensor behaves like a transmitter in reverse, sending a signal back through the same medium.

Note that this is an idealized model; in practice we cannot perfectly capture the whole received field in space, and noise will be present. But the core idea remains: by back-projecting the received signals using the conjugate of the forward operator, we can reconstruct the target location.
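The back-projection step can be sketched in a few lines, continuing the forward-model snippet from Section 1.1 (the illustrative setup is repeated so this runs on its own). Because $\mathcal{H}$ depends only on $|\mathbf r|$, convolving with its conjugate is exactly the matched filter, and the magnitude of the estimate peaks at the target.

```python
import numpy as np
from scipy.signal import fftconvolve

# Same illustrative setup as the forward-model sketch (repeated so this runs on its own).
lam = 0.5
x = np.linspace(-5.0, 5.0, 201)
dx = x[1] - x[0]
X, Y = np.meshgrid(x, x)
r = np.hypot(X, Y) + 1e-3
H = np.exp(1j * 2.0 * np.pi * r / lam) / np.sqrt(1j * np.pi * r)

# Point target and its simulated received field g = s * H.
s = np.zeros_like(X, dtype=complex)
s[np.argmin(np.abs(x + 2.0)), np.argmin(np.abs(x - 1.0))] = 1.0
g = fftconvolve(s, H, mode="same") * dx**2

# Back-projection: convolve g with the conjugate Green's function. (Because H depends
# only on |r|, the spatial reversal in the matched filter is a no-op here.)
s_hat = fftconvolve(g, np.conj(H), mode="same") * dx**2

# The magnitude of the estimate peaks at the true target location (1.0, -2.0).
row, col = np.unravel_index(np.argmax(np.abs(s_hat)), s_hat.shape)
print("estimated target at (x, y) =", (x[col], x[row]))
```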

2.2.  Deep Learning: Gradient Descent

How do we find the optimal weights? We can use Gradient Descent. However, computing the gradient of the loss with respect to each weight independently (for example, by perturbing one weight at a time) is computationally expensive. Instead, we use the Back-Propagation algorithm, which computes all the gradients in a single backward sweep.

Consider a simple neural network with $N$ layers:

  • Input tensor: $\alpha^{(0)} = x$.
  • Parameters at layer $l$: parameter matrix $W^{(l)}$ and bias vector $b^{(l)}$.
  • Pre-activation at layer $l$: $z^{(l)} = W^{(l)} \alpha^{(l-1)} + b^{(l)}$.
  • Nonlinear activation at layer $l$: $\alpha^{(l)} = \sigma(z^{(l)})$.
  • Output tensor: $\hat{y} = \alpha^{(N)} = \pi_\theta(x)$.

Then the loss is $\mathcal{L}(\hat{y}, y) = \mathcal{L}(\alpha^{(N)}, y)$, and we want to compute its gradient $\nabla_\theta \mathcal{L}$, which contains $\frac{\partial \mathcal{L}}{\partial W^{(l)}}$ and $\frac{\partial \mathcal{L}}{\partial b^{(l)}}$ for all layers $l$.

BP introduces intermediate variables called Error Signals $\delta^{(l)}$, defined as:

$$\delta^{(l)} \equiv \dfrac{\partial \mathcal{L}}{\partial z^{(l)}}$$

  • Error signal at the output layer $N$:
    Applying the chain rule through the output activation, we have:

    $$\frac{\partial \mathcal{L}}{\partial z_i^{(N)}} = \sum_j \frac{\partial \mathcal{L}}{\partial \alpha_j^{(N)}} \frac{\partial \alpha_j^{(N)}}{\partial z_i^{(N)}}$$

    Because the activation function $\sigma$ is applied element-wise, we have:

    $$\frac{\partial \alpha_j^{(N)}}{\partial z_i^{(N)}} = \sigma'(z_i^{(N)})\, \delta_{ij}$$

    where $\delta_{ij} = 1$ if $i = j$ and $0$ otherwise (the Kronecker delta, not to be confused with the error signal). Therefore, the error signal simplifies to:

    $$\delta_i^{(N)} = \frac{\partial \mathcal{L}}{\partial z_i^{(N)}} = \frac{\partial \mathcal{L}}{\partial \alpha_i^{(N)}} \sigma'(z_i^{(N)})$$

    To simplify notation, we can express this in vector form:

    $$\delta^{(N)} = \frac{\partial \mathcal{L}}{\partial \alpha^{(N)}} \odot \sigma'(z^{(N)})$$

    where $\odot$ denotes the element-wise (Hadamard) product.

  • Error signal at hidden layers $l < N$:
    Again, applying the chain rule, we have:

    $$\frac{\partial \mathcal{L}}{\partial z_i^{(l)}} = \sum_{k} \frac{\partial \mathcal{L}}{\partial z_k^{(l+1)}} \frac{\partial z_k^{(l+1)}}{\partial z_i^{(l)}}$$

    According to the definition $\delta_k^{(l+1)} = \frac{\partial \mathcal{L}}{\partial z_k^{(l+1)}}$, we can rewrite the above equation as:

    $$\frac{\partial \mathcal{L}}{\partial z_i^{(l)}} = \sum_{k} \delta_k^{(l+1)} \frac{\partial z_k^{(l+1)}}{\partial z_i^{(l)}}$$

    The factor $\frac{\partial z_k^{(l+1)}}{\partial z_i^{(l)}}$ is an entry of the Jacobian matrix:

    $$\left[ \frac{\partial z^{(l+1)}}{\partial z^{(l)}} \right]_{ki} = \frac{\partial z_k^{(l+1)}}{\partial z_i^{(l)}}$$

    Thus, we can express the error signal at layer $l$ as:

    $$\delta^{(l)} = \left( \frac{\partial z^{(l+1)}}{\partial z^{(l)}} \right)^T \delta^{(l+1)}$$

    Since $z^{(l+1)} = W^{(l+1)} \alpha^{(l)} + b^{(l+1)}$ and $\alpha^{(l)} = \sigma(z^{(l)})$, we have:

    $$\delta^{(l)} = \left( W^{(l+1)} \right)^T \delta^{(l+1)} \odot \sigma'(z^{(l)})$$

Together, the output-layer formula and the hidden-layer recursion let us compute the error signals recursively from the output layer back to the input layer. This is the essence of the Back-Propagation algorithm.
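Here is a compact numpy sketch of this recursion for a small fully connected network. The layer sizes, tanh activation, squared-error loss, and sample input are all illustrative assumptions; in the code, `a[l]` plays the role of $\alpha^{(l)}$ in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = np.tanh
dsigma = lambda z: 1.0 - np.tanh(z) ** 2      # derivative of tanh

# Illustrative 2-16-8-1 fully connected network with squared-error loss.
sizes = [2, 16, 8, 1]
W = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
b = [np.zeros(m) for m in sizes[1:]]

x, y = np.array([0.3, 0.9]), np.array([1.0])

# Forward pass, caching pre-activations z^(l) and activations a^(l) (alpha^(l) in the text).
a, zs = [x], []
for Wl, bl in zip(W, b):
    zs.append(Wl @ a[-1] + bl)
    a.append(sigma(zs[-1]))

# Output layer: delta^(N) = dL/da^(N) elementwise sigma'(z^(N)); for L = 0.5*||a - y||^2, dL/da = a - y.
deltas = [None] * len(W)
deltas[-1] = (a[-1] - y) * dsigma(zs[-1])

# Hidden layers: delta^(l) = (W^(l+1))^T delta^(l+1) elementwise sigma'(z^(l)).
for l in range(len(W) - 2, -1, -1):
    deltas[l] = (W[l + 1].T @ deltas[l + 1]) * dsigma(zs[l])
```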

After computing the error signals for all layers, we can compute the gradients with respect to the weights and biases:

$$\frac{\partial \mathcal{L}}{\partial b_i^{(l)}} = \frac{\partial \mathcal{L}}{\partial z_i^{(l)}} \frac{\partial z_i^{(l)}}{\partial b_i^{(l)}}$$

Because $z_i^{(l)} = W_i^{(l)} \alpha^{(l-1)} + b_i^{(l)}$, we have $\frac{\partial z_i^{(l)}}{\partial b_i^{(l)}} = 1$. Therefore:

$$\frac{\partial \mathcal{L}}{\partial b_i^{(l)}} = \delta_i^{(l)}$$

Vectorizing this, we get:

$$\frac{\partial \mathcal{L}}{\partial b^{(l)}} = \delta^{(l)}$$

Similarly, for the weights:

$$\frac{\partial \mathcal{L}}{\partial W_{ij}^{(l)}} = \frac{\partial \mathcal{L}}{\partial z_i^{(l)}} \frac{\partial z_i^{(l)}}{\partial W_{ij}^{(l)}}$$

Since $z_i^{(l)} = W_i^{(l)} \alpha^{(l-1)} + b_i^{(l)}$, we have $\frac{\partial z_i^{(l)}}{\partial W_{ij}^{(l)}} = \alpha_j^{(l-1)}$. Therefore:

$$\frac{\partial \mathcal{L}}{\partial W_{ij}^{(l)}} = \delta_i^{(l)} \alpha_j^{(l-1)}$$

In matrix form, this becomes:

$$\frac{\partial \mathcal{L}}{\partial W^{(l)}} = \delta^{(l)} \left( \alpha^{(l-1)} \right)^T$$
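Picking up the cached activations and error signals from the sketch above, each of the two gradient formulas becomes a single line. The finite-difference comparison at the end is just a sanity check on the derivation; the perturbed index and step size are arbitrary.

```python
# Continuing the sketch above: the gradient formulas, one line per layer.
dW = [np.outer(d, al) for d, al in zip(deltas, a[:-1])]   # dL/dW^(l) = delta^(l) (alpha^(l-1))^T
db = [d.copy() for d in deltas]                           # dL/db^(l) = delta^(l)

# Sanity check: compare one analytic entry against a finite difference (illustrative).
eps, (l, i, j) = 1e-6, (0, 3, 1)
W[l][i, j] += eps
a_pert = x
for Wl, bl in zip(W, b):
    a_pert = sigma(Wl @ a_pert + bl)
W[l][i, j] -= eps
numeric = (0.5 * np.sum((a_pert - y) ** 2) - 0.5 * np.sum((a[-1] - y) ** 2)) / eps
print(dW[l][i, j], numeric)    # the two numbers should agree closely
```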

Following is an interactive demo of Back-Propagation in Deep Learning. The task is to fit a simple 2D function:

$$f(x_1, x_2) = \begin{cases} 1 & \text{if } x_1 + x_2 > 1 \\ 0 & \text{otherwise} \end{cases}$$
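The interactive demo itself cannot be embedded here, but the same task fits in a short, self-contained numpy script. The architecture (one hidden layer of 8 tanh units with a sigmoid output), the learning rate, and the squared-error loss are arbitrary choices; any reasonable setup learns this boundary.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy dataset: label is 1 when x1 + x2 > 1, else 0 (sample count and range are arbitrary).
X_train = rng.uniform(0.0, 1.0, size=(1000, 2))
y_train = (X_train.sum(axis=1) > 1.0).astype(float)

# One hidden layer of tanh units, sigmoid output, squared-error loss (all illustrative).
W1, b1 = rng.normal(scale=0.5, size=(8, 2)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(1, 8)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
eta = 0.5                                    # learning rate

for epoch in range(100):
    for xi, yi in zip(X_train, y_train):
        # Forward pass.
        z1 = W1 @ xi + b1; a1 = np.tanh(z1)
        z2 = W2 @ a1 + b2; a2 = sigmoid(z2)
        # Error signals: output layer, then hidden layer.
        d2 = (a2 - yi) * a2 * (1.0 - a2)
        d1 = (W2.T @ d2) * (1.0 - a1 ** 2)
        # Gradient-descent update: dL/dW = delta a^T, dL/db = delta.
        W2 -= eta * np.outer(d2, a1); b2 -= eta * d2
        W1 -= eta * np.outer(d1, xi); b1 -= eta * d1

pred = sigmoid(W2 @ np.tanh(W1 @ X_train.T + b1[:, None]) + b2[:, None]).ravel() > 0.5
print("training accuracy:", (pred == (y_train > 0.5)).mean())
```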

3.  The Synthesis: Adjoint Symmetry

The striking similarity between the equations of wave imaging and neural network training is not a coincidence. Both systems are governed by the same mathematical principle: the duality between a linear operator and its adjoint.

To see this clearly, let us unify our notation. In both cases, we have a Forward Operator that maps an internal state (the target $s$ or the weights $\theta$) to an observation (the signal $g$ or the prediction $\hat{y}$).

3.1.  The Unified Forward Map

In a linearized or local sense, both processes can be viewed as an operator acting on a hidden source to produce a measurable result:

| Feature | Passive Radar (Imaging) | Deep Learning (Training) |
|---|---|---|
| Hidden Source | Target reflectivity $s(\mathbf{r})$ | Parameter update $\Delta \theta$ |
| Forward Operator | Green’s function convolution $\mathcal{H}$ | Jacobian matrix $J = \frac{\partial \pi}{\partial \theta}$ |
| Observation | Received echoes $g(\mathbf{r})$ | Prediction error $\Delta y = \hat{y} - y$ |

Mathematically, we represent the forward pass as:

$$y_{obs} = A x$$

where $A$ is our forward transformation.

3.2.  The Inverse Problem and the Adjoint

The “Inverse Problem” asks: given the observation $y_{obs}$, what was the source $x$?

Solving this perfectly usually requires computing the inverse $A^{-1}$, which is often computationally impossible or “ill-posed” (highly sensitive to noise). Instead, both fields use the Adjoint Operator $A^\dagger$ (the complex conjugate transpose in matrix terms).
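A small numerical illustration of the difference, with a random ill-conditioned matrix standing in for the forward operator $A$ (the construction is arbitrary): the exact inverse amplifies even tiny measurement noise, while the adjoint stays bounded at the cost of blurring.

```python
import numpy as np

rng = np.random.default_rng(3)

# A deliberately ill-conditioned 50x50 forward operator A (arbitrary construction).
U, _ = np.linalg.qr(rng.normal(size=(50, 50)))
V, _ = np.linalg.qr(rng.normal(size=(50, 50)))
A = U @ np.diag(np.logspace(0, -8, 50)) @ V.T    # condition number ~1e8

x_true = rng.normal(size=50)
y_obs = A @ x_true + 1e-6 * rng.normal(size=50)  # observation with tiny noise

x_inv = np.linalg.solve(A, y_obs)   # exact inverse: the noise is amplified enormously
x_adj = A.T @ y_obs                 # adjoint: a blurred but stable estimate

print("inverse error:", np.linalg.norm(x_inv - x_true))
print("adjoint error:", np.linalg.norm(x_adj - x_true))
```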

3.2.1.  In Radar: Back-Projection

The reconstructed image $\hat{s}$ is formed by applying the adjoint of the propagation operator:

$$\hat{s} = \mathcal{H}^\dagger g$$

As derived earlier, $\mathcal{H}^\dagger$ corresponds to convolution with the conjugate Green’s function $\mathcal{H}^*(\mathbf{r})$. Physically, this is Time Reversal.

3.2.2.  In Deep Learning: Back-Propagation

The gradient used to update weights is found by applying the adjoint of the network’s local sensitivity:

$$\nabla_\theta \mathcal{L} = \left( \frac{\partial \hat{y}}{\partial \theta} \right)^T \delta$$

Here, the transpose $(W^{(l)})^T$ is the adjoint of the forward weight matrix $W^{(l)}$. Just as the radar signal is sent back through the channel, the error signal is “sent back” through the weights.

3.3.  Why the Adjoint Works

The effectiveness of both algorithms stems from the Adjoint Identity:

$$\langle Ax, y \rangle = \langle x, A^\dagger y \rangle$$

In imaging, this means the energy we collect at the receivers ($\langle \text{signal}, \text{echo} \rangle$) is mathematically equivalent to the energy at the target location when we project our “knowledge” back into the scene.

In deep learning, this identity ensures that the most efficient way to reduce the loss in the high-dimensional output space is to “project” that error back onto the weights.
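The identity is easy to verify numerically, with a random complex matrix standing in for $A$ (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Random complex operator and vectors standing in for A, x, y (purely illustrative).
A = rng.normal(size=(6, 4)) + 1j * rng.normal(size=(6, 4))
x = rng.normal(size=4) + 1j * rng.normal(size=4)
y = rng.normal(size=6) + 1j * rng.normal(size=6)

lhs = np.vdot(A @ x, y)            # <Ax, y>  (np.vdot conjugates its first argument)
rhs = np.vdot(x, A.conj().T @ y)   # <x, A† y>, with A† the conjugate transpose
print(np.allclose(lhs, rhs))       # True
```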

4.  Conclusion: One Algorithm, Two Realities

The “Back” in both Back-Projection and Back-Propagation refers to the same fundamental movement: reversing the flow of information.

  • In Radar, we reverse Time to locate a target in physical space.
  • In Deep Learning, we reverse Causality to locate the source of an error in parameter space.

Whether we are “seeing” a stealth aircraft through passive echoes or “learning” to recognize a cat in a photo, we are performing the same mathematical ritual: calculating the forward transformation, measuring the discrepancy, and using the Adjoint Operator to project that discrepancy back to its origin.

Deep learning, in this light, is not just a branch of computer science; it is a form of Computational Imaging where the “scene” being reconstructed is the internal logic of the model itself.
