## Issue

I’m trying to accelerate my `numpy` code using `dask`. The following is part of my `numpy` code:

```
import numpy as np

arr_1 = np.load('<arr1_path>.npy')
arr_2 = np.load('<arr2_path>.npy')
arr_3 = np.load('<arr3_path>.npy')
arr_1 = np.concatenate((arr_1, arr_2[:, :, np.newaxis]), axis=2)
arr_1_half = arr_1.shape[0] // 2
arr_4 = arr_3[:arr_1_half]
[r, c] = np.where(arr_4 == True)
[rn, cn] = np.where(arr_4 == False)
print(len(r))
```

This prints valid results and works fine. However, the following `dask` equivalent

```
import numpy as np
import dask.array as da

arr_1 = da.from_zarr('<arr1_path>.zarr')
arr_2 = da.from_zarr('<arr2_path>.zarr')
arr_3 = da.from_zarr('<arr3_path>.zarr')
arr_1 = da.concatenate((arr_1, arr_2[:, :, np.newaxis]), axis=2)
arr_1_half = arr_1.shape[0] // 2
arr_4 = arr_3[:arr_1_half]
[r, c] = da.where(arr_4 == True)
[rn, cn] = da.where(arr_4 == False)
print(len(r))  # <----- Error: 'float' object cannot be interpreted as an integer
```

results in `r` being

```
dask.array<getitem, shape=(nan,), dtype=int64, chunksize=(nan,), chunktype=numpy.ndarray>
```

and thus the above-mentioned error. Since `dask` arrays are lazily evaluated, do I have to explicitly call `compute()` or similar somewhere? Or am I missing something basic? Any help will be appreciated.

## Solution

The array you’ve constructed with `da.where` has unknown chunk sizes, which can happen whenever the size of an array depends on lazy computations that haven’t yet been performed. Unknown values within the shape or chunks are designated using `np.nan` rather than an integer, which is why you see the `ValueError` (this error message was improved in the last few months). The solution is to use `compute_chunk_sizes`:

```
import numpy as np
import dask.array as da

x = da.from_array(np.random.randn(100), chunks=20)
y = x[x > 0]  # how many elements pass the filter is unknown until computed
# len(y)  # ValueError: Cannot call len() on object with unknown chunk size.
y.compute_chunk_sizes()  # modifies y in place
len(y)
```
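Applied to your code, the same fix works on the outputs of `da.where`: call `compute_chunk_sizes()` on `r` (and `rn`) before using `len()`. A minimal sketch, using a small hypothetical boolean array in place of your zarr-backed `arr_4`:

```python
import numpy as np
import dask.array as da

# Hypothetical stand-in for arr_4; in your code this comes from zarr.
arr_4 = da.from_array(np.array([[True, False], [False, True]]), chunks=1)

r, c = da.where(arr_4 == True)  # r and c have unknown (nan) chunk sizes
r.compute_chunk_sizes()         # resolves the chunk sizes in place
print(len(r))                   # now works
```

Note that `compute_chunk_sizes` triggers computation to determine the sizes, so on large arrays it is not free; if you only need the concrete index values anyway, calling `r.compute()` once and working with the resulting `numpy` array is an alternative.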

Answered By – scj13
