# [SOLVED] broadcasting and reducing dimension in tensorflow

## Issue

I have the following:

``````
tensor A with A.shape = (N, 2)
tensor B with B.shape = (3, 2)
``````

Physically, I am visualizing A as N data points in 2 dimensions, and B as 3 centers in the same 2 dimensions.

My objective is to compute the squared distance of every point in A from each of the 3 centers and then add everything up (that is, the total inertia of the system with respect to the 3 centers).
I want to compute

``````
$$ D = \sum_{i,j} \left[ (A_{i,j} - B_{1,j})^2 + (A_{i,j} - B_{2,j})^2 + (A_{i,j} - B_{3,j})^2 \right] $$
``````
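
Spelled out with plain (slow) Python loops, the quantity I am after would look like this reference sketch (the `inertia_loops` helper name is just illustrative, not how I intend to compute it):

``````
import tensorflow as tf

# Reference semantics only: D spelled out with explicit loops.
def inertia_loops(A, B):
    D = 0.0
    for i in range(A.shape[0]):          # N data points
        for j in range(A.shape[1]):      # 2 coordinates
            for k in range(B.shape[0]):  # 3 centers
                D += float(A[i, j] - B[k, j]) ** 2
    return D
``````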

## Solution

The first solution one might come up with is the direct translation of your formula into code:

``````
tf.reduce_sum(tf.square(A - B[0]) + tf.square(A - B[1]) + tf.square(A - B[2]))
``````

However, using TensorFlow's implicit broadcasting is marginally more efficient. Here `A[:, None, :]` has shape `(N, 1, 2)` and `B[None, :, :]` has shape `(1, 3, 2)`, so the subtraction broadcasts to a `(N, 3, 2)` tensor and a single `reduce_sum` collapses all three axes:

``````
tf.reduce_sum(tf.square(A[:, None, :] - B[None, :, :]))
``````
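
As a quick sanity check that both expressions compute the same quantity (this snippet is an addition, not part of the benchmarked answer), they agree up to floating-point rounding:

``````
import tensorflow as tf

A = tf.random.normal((5, 2))
B = tf.random.normal((3, 2))

d_explicit = tf.reduce_sum(tf.square(A - B[0])
                           + tf.square(A - B[1])
                           + tf.square(A - B[2]))
d_broadcast = tf.reduce_sum(tf.square(A[:, None, :] - B[None, :, :]))

# Raises InvalidArgumentError if the results differ beyond float tolerance
tf.debugging.assert_near(d_explicit, d_broadcast)
``````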

Microbenchmark 1 (large dataset):

``````
A = tf.random.normal((2**25, 2))
B = tf.random.normal((3, 2))
%timeit tf.reduce_sum(tf.square(A[:, None, :] - B[None, :, :]))
%timeit tf.reduce_sum(tf.square(A - B[0]) + tf.square(A - B[1]) + tf.square(A - B[2]))
``````

output:

``````
13.1 ms ± 38.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
15.7 ms ± 7.83 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
``````

Microbenchmark 2 (small dataset):

``````
A = tf.random.normal((3, 2))
B = tf.random.normal((3, 2))
%timeit tf.reduce_sum(tf.square(A[:, None, :] - B[None, :, :]))
%timeit tf.reduce_sum(tf.square(A - B[0]) + tf.square(A - B[1]) + tf.square(A - B[2]))
``````

output:

``````
175 µs ± 731 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
391 µs ± 18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
``````
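
One caveat to keep in mind: the broadcast version materializes an intermediate tensor of shape `(N, 3, 2)`, which can dominate memory for very large `N`. As a sketch (obtained by expanding the square algebraically; this variant is not part of the benchmarks above), the same sum can be computed without any `(N, 3, 2)` intermediate:

``````
import tensorflow as tf

A = tf.random.normal((2**25, 2))
B = tf.random.normal((3, 2))

# Expanding (A_ij - B_kj)^2 and summing over i, j, k gives
#   D = K * sum(A^2) + N * sum(B^2) - 2 * sum_j (sum_i A_ij) * (sum_k B_kj)
# with N = A.shape[0] points and K = B.shape[0] centers.
N = tf.cast(tf.shape(A)[0], A.dtype)
K = tf.cast(tf.shape(B)[0], A.dtype)
D = (K * tf.reduce_sum(tf.square(A))
     + N * tf.reduce_sum(tf.square(B))
     - 2.0 * tf.reduce_sum(tf.reduce_sum(A, axis=0) * tf.reduce_sum(B, axis=0)))
``````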