NumPy and Pandas Interview Questions and Answers
1. What is NumPy and why is it used in data science?
NumPy (Numerical Python) is a powerful library for numerical computations in Python. It provides support
for arrays, matrices, and a large number of mathematical functions. In data science, NumPy is used for fast
numerical computations, efficient handling of large datasets, and serves as the foundation for libraries like
Pandas and SciPy.
2. What is the difference between arange and range function?
• range() is a built-in Python function that returns a range object.
• np.arange() is NumPy’s version, which returns a NumPy array.
• np.arange() accepts float steps (e.g., np.arange(0, 1, 0.1)), unlike range(), which accepts only integers.
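The points above can be sketched as follows. The variable names are illustrative, and np.linspace() is shown only as a common alternative when the number of points matters more than the step size.

```python
import numpy as np

# range() is a lazy sequence of integers; np.arange() builds an ndarray.
r = range(0, 5)
a = np.arange(0, 5)

# Float steps work with np.arange(), but floating-point rounding can make
# the resulting length hard to predict for some start/stop/step values.
steps = np.arange(0, 1, 0.1)

# np.linspace() fixes the number of points and includes the endpoint.
points = np.linspace(0, 1, 11)

print(type(r).__name__, type(a).__name__)
print(steps)
print(points)
```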
3. How do you create a NumPy array? Provide examples.
• From list: np.array([1, 2, 3])
• Using functions: np.zeros((2,3)), np.ones((3,3)), np.eye(3), np.linspace(0,10,5)
• Random: np.random.rand(2,2)
4. Difference between Python list and NumPy array
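• List: Heterogeneous (can mix types); element-wise math needs an explicit loop, so operations are slower.
• NumPy Array: Homogeneous (one dtype for all elements); vectorized operations run in compiled code and are faster.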
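A small sketch of the contrast, with made-up values. Note that * on a list means repetition, not arithmetic.

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

doubled_list = [v * 2 for v in values]  # explicit Python loop
doubled_arr = arr * 2                   # single vectorized operation

repeated = values * 2                   # list repetition, not multiplication

# Mixed int/float input is upcast to one common dtype.
mixed = np.array([1, 2.5])

print(doubled_list, doubled_arr, repeated, mixed.dtype)
```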
5. Element-wise operations in NumPy
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 + arr2 # [5 7 9]
Supports +, -, *, /, ** etc.
6. NumPy dimensions and shapes
• Shape: Tuple giving the size along each dimension, e.g., (3, 4) for 3 rows and 4 columns
• Manipulate using reshape(), ravel(), flatten(), transpose()
7. Broadcasting in NumPy
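Broadcasting allows arithmetic operations between arrays of different shapes by virtually stretching
size-1 (or missing) dimensions to a compatible shape, without copying data. Shapes are compared from
the trailing dimension: each pair of sizes must be equal, or one of them must be 1.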
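A minimal illustration with made-up arrays: a (2, 3) matrix combined with a (3,) row and a (2, 1) column.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])       # shape (2, 3)
row = np.array([10, 20, 30])         # shape (3,)
col = np.array([[100], [200]])       # shape (2, 1)

plus_row = matrix + row   # row is applied to each of the 2 rows
plus_col = matrix + col   # col is applied to each of the 3 columns

print(plus_row)
print(plus_col)
```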
8. Select specific subset
arr = np.array([1, 2, 3, 4])
arr[1:3] # [2 3]
Supports slicing, boolean indexing, and fancy indexing.
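The three selection styles side by side, using an illustrative array. Basic slicing returns a view, while boolean and fancy indexing return copies.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

sliced = data[1:4]            # slicing: positions 1, 2, 3
filtered = data[data > 25]    # boolean indexing: elements matching a mask
picked = data[[0, 2, 4]]      # fancy indexing: explicit list of positions

print(sliced, filtered, picked)
```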
9. Aggregation functions
np.sum(), np.mean(), np.min(), np.max(), np.std(), np.var()
np.mean([1, 2, 3]) # 2.0
10. Handling missing values
Represent missing values with np.nan, detect them with np.isnan(), and aggregate with NaN-aware functions like np.nanmean(), np.nansum()
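A short sketch of why the NaN-aware variants matter, on a made-up array: a plain aggregation propagates NaN, while the nan* functions skip it.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0])

is_missing = np.isnan(data)    # boolean mask of missing entries
plain_mean = np.mean(data)     # NaN propagates through the mean
nan_mean = np.nanmean(data)    # NaN entries are ignored
nan_total = np.nansum(data)    # NaN entries are treated as zero

print(is_missing, plain_mean, nan_mean, nan_total)
```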
11. Handling large datasets
NumPy uses contiguous memory blocks and vectorized operations, making it efficient in handling large
arrays with minimal memory overhead.
12. Matrix multiplication
np.dot(A, B) or A @ B
13. Linear algebra module
np.linalg: functions like inv(), eig(), svd(), solve() for solving linear systems, finding inverses,
eigenvalues, etc.
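As an example, a small linear system solved with np.linalg.solve(). The coefficients are made up; solve() is generally preferred over computing inv(A) @ b explicitly.

```python
import numpy as np

# Solve:  2x +  y = 3
#          x + 3y = 5
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)

print(x)
print(np.allclose(A @ x, b))  # check that the solution satisfies the system
```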
14. Random functions
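np.random.rand(), np.random.randint(), np.random.normal(), np.random.seed(). Newer code typically creates a generator with np.random.default_rng(seed) and calls its methods instead.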
15. Vectorization
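Instead of loops, apply operations to whole arrays:
np.array([1,2,3]) ** 2
np.vectorize() is only a convenience wrapper: it still loops in Python and does not give the same speedup.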
16. np.where() usage
Conditional selection (chooses between two values element-wise):
np.where(arr > 0, 1, 0)
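np.where() has two forms, sketched here on an illustrative array: with three arguments it selects values element-wise, and with one argument it returns the indices where the condition holds.

```python
import numpy as np

arr = np.array([-2, 0, 3, 5])

flags = np.where(arr > 0, 1, 0)     # 1 where positive, else 0
positions = np.where(arr > 0)[0]    # indices of the positive elements

print(flags, positions)
```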
17. Statistical operations
np.mean(), np.median(), np.percentile(), np.std(), etc.
18. Masked arrays
Useful for ignoring invalid entries:
masked = np.ma.masked_array(data, mask=condition)
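19. copy() vs view()
• arr.copy() creates a new array with its own data.
• arr.view() creates a new array object over the same data, so changes made through one are visible in the other. (view() is an array method; there is no np.view() function.)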
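The difference shows up as soon as the original is modified. A minimal sketch with made-up values:

```python
import numpy as np

base = np.array([1, 2, 3])
viewed = base.view()   # shares the underlying buffer
copied = base.copy()   # owns an independent buffer

base[0] = 99           # modify the original in place

print(viewed)          # reflects the change
print(copied)          # unchanged
print(np.shares_memory(base, viewed), np.shares_memory(base, copied))
```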
20. Array reshaping
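Use reshape() (new shape, same number of elements), arr.resize() (changes the array in place and may change the element count), flatten() (returns a 1-D copy)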
21. Concatenate vs vstack
• np.concatenate([a, b], axis=0): joins arrays along an existing axis
• np.vstack([a, b]): vertical stack; 1-D inputs are first treated as rows of shape (1, N)
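With 1-D inputs the two functions give different shapes, as this small sketch with illustrative arrays shows:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

joined = np.concatenate([a, b], axis=0)  # stays 1-D, length 6
stacked = np.vstack([a, b])              # becomes 2-D, one row per input

print(joined.shape, stacked.shape)
```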
22. Polynomial functions
np.poly1d(), np.polyfit(), np.polyval() for polynomial creation, fitting, and evaluation.
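A brief sketch of the three functions, using an arbitrary quadratic and a made-up straight-line dataset:

```python
import numpy as np

# p(x) = x^2 - 3x + 2, coefficients listed from highest degree down
p = np.poly1d([1, -3, 2])
value_poly1d = p(3)
value_polyval = np.polyval([1, -3, 2], 3)
roots = np.sort(p.roots)

# Fit a degree-1 polynomial to points lying on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1
slope, intercept = np.polyfit(x, y, 1)

print(value_poly1d, value_polyval, roots, slope, intercept)
```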
23. Memory layout
Arrays are stored in contiguous blocks, row-major (C order) by default. Contiguous storage is cache-friendly, which enables faster computations.
24. Statistical tests
NumPy has no hypothesis tests; it offers np.corrcoef() and np.cov() for correlation and covariance. Actual tests (t-tests, etc.) are in scipy.stats.
25. np.histogram()
Used to compute the frequency distribution:
np.histogram(data, bins=5)
26. Array initialization
np.zeros(), np.ones(), np.full(), np.eye(), np.empty()
27. Complex numbers
arr = np.array([1+2j, 3+4j])
np.real(arr), np.imag(arr)
28. FFT
spectrum = np.fft.fft(signal)  # time domain -> frequency domain
np.fft.ifft(spectrum)  # inverse transform, recovers the signal
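A round trip makes the relationship between the two calls concrete. The signal values here are arbitrary.

```python
import numpy as np

signal = np.array([1.0, 2.0, 0.0, -1.0])

spectrum = np.fft.fft(signal)       # complex frequency components
recovered = np.fft.ifft(spectrum)   # complex result; imaginary part ~ 0

# The zero-frequency bin equals the sum of the samples.
dc_component = spectrum[0].real

print(dc_component)
print(np.allclose(recovered.real, signal))
```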
29. np.unique()
Returns the sorted unique elements; with return_counts=True it also returns how often each one occurs.
np.unique(arr, return_counts=True)
30. What is Pandas?
Pandas is a data analysis and manipulation library built on NumPy. It provides two core structures: Series (1-D, labeled) and DataFrame (2-D, tabular).
31. Create DataFrame
df = pd.DataFrame({'A':[1,2], 'B':[3,4]})
32. Reading data
pd.read_csv(), pd.read_excel(), pd.read_sql(), pd.read_json()
33. Missing data handling
df.isna(), df.fillna(), df.dropna()
34. Aggregation and grouping
df.groupby('column').agg(['sum','mean'])
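A minimal sketch on a made-up frame. The column names team and score are illustrative; selecting the value column before agg() keeps non-numeric columns out of the aggregation.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["a", "a", "b"],
    "score": [1, 3, 10],
})

# One row per group, one column per aggregation
summary = df.groupby("team")["score"].agg(["sum", "mean"])

print(summary)
```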
35. Merge and join
• pd.merge() for joining on keys.
• df.join() for joining on index.
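The two approaches can be compared on a pair of small, made-up frames. The key and column names are hypothetical.

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "x": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "y": [20, 30, 40]})

# merge(): join on a key column; inner keeps only keys present in both
merged = pd.merge(left, right, on="key", how="inner")

# join(): join on the index; left keeps every row of the calling frame
joined = left.set_index("key").join(right.set_index("key"), how="left")

print(merged)
print(joined)
```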
36. Filtering and sorting
df[df['col'] > 5], df.sort_values('col')
37. Row/column manipulation
df.drop(), df.insert(), df.rename()
38. apply() method
Applies a function along an axis of a DataFrame: axis=0 (the default) passes each column to the function, axis=1 passes each row.
39. Indexing methods
.loc[] (label-based), .iloc[] (position-based), .at[], .iat[] (fast scalar access by label / position)
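The four accessors on a small frame with a string index (all names illustrative). Label slices with .loc include the end label; position slices with .iloc exclude the end position.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["x", "y", "z"])

by_label = df.loc["y", "B"]        # row label "y", column label "B"
by_position = df.iloc[1, 1]        # second row, second column
scalar_label = df.at["z", "A"]     # scalar lookup by label
scalar_position = df.iat[0, 0]     # scalar lookup by position

label_slice = df.loc["x":"y"]      # includes "y"
position_slice = df.iloc[0:1]      # excludes position 1

print(by_label, by_position, scalar_label, scalar_position)
print(len(label_slice), len(position_slice))
```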
40. Time series handling
pd.to_datetime(), df.resample(), df.rolling()
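A minimal sketch of parsing dates and resampling to weekly totals. The dates and values are made up; the "W" frequency used here groups into weeks ending on Sunday.

```python
import pandas as pd

# 2024-01-01 is a Monday, so the first two dates fall in the same week.
idx = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-08"])
series = pd.Series([1, 2, 10], index=idx)

weekly = series.resample("W").sum()

print(weekly)
```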
41. Pivot table
pd.pivot_table(df, values='val', index='A', columns='B')
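The same call on a tiny, made-up frame. The default aggregation is the mean, and combinations with no data become NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["x", "x", "y", "y"],
    "B": ["p", "q", "p", "p"],
    "val": [1, 2, 3, 5],
})

table = pd.pivot_table(df, values="val", index="A", columns="B", aggfunc="mean")

print(table)
```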
42. Normalization
df['col'] = (df['col'] - df['col'].mean()) / df['col'].std()  # z-score standardization
43. pd.concat()
Used for appending/combining dataframes row-wise or column-wise.
44. rolling()
Window-based calculations:
df.rolling(window=3).mean()
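A quick sketch on a made-up Series. By default the first window - 1 results are NaN because the window is incomplete; min_periods relaxes that.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

strict = s.rolling(window=3).mean()                  # needs 3 values
relaxed = s.rolling(window=3, min_periods=1).mean()  # uses what is available

print(strict.tolist())
print(relaxed.tolist())
```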
45. Transformation and aggregation
df.groupby('A').transform('mean')
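The key difference from agg() is the output shape: transform() returns one value per original row, so the result can be assigned back as a column. A sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "x", "y"], "B": [1, 3, 10]})

per_group = df.groupby("A")["B"].mean()              # one row per group
df["B_mean"] = df.groupby("A")["B"].transform("mean")  # one row per input row

print(per_group)
print(df)
```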
46. Multi-index
df.set_index(['col1', 'col2'])
47. query() method
Filter using string expressions.
df.query('col > 5')
48. Large datasets
Use chunksize in readers, filter early, optimize data types.
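These three ideas can be combined in one loop. The sketch below reads from an in-memory string so that it is self-contained; in practice the first argument would be a file path. The column names and the threshold are made up.

```python
import io

import pandas as pd

csv_text = "id,value\n1,10\n2,20\n3,30\n4,40\n5,50\n"

total = 0
# chunksize yields DataFrames of at most 2 rows instead of loading everything;
# dtype picks a smaller integer type for the value column.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=2, dtype={"value": "int32"})
for chunk in reader:
    # Filter early, then aggregate only what is needed from each chunk.
    total += chunk.loc[chunk["value"] > 15, "value"].sum()

print(total)
```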
49. Categorical data
astype('category') reduces memory for columns with few distinct values, since each value is stored as a small integer code.
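A rough way to see the effect, using a made-up column with three distinct strings repeated many times. Exact byte counts depend on the pandas version and string storage, so only the comparison is shown.

```python
import pandas as pd

colors = pd.Series(["red", "green", "blue"] * 1000)
as_category = colors.astype("category")

plain_bytes = colors.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(plain_bytes, category_bytes)
print(list(as_category.cat.categories))
```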
50. Merge vs Join
merge() is more versatile, join() is convenient for index-based joins.
51. Slicing and selection
Using .loc[], .iloc[], and slicing syntax df[1:5] (which selects rows by position)
52. Airbnb-style question: Efficiently handling, transforming, and visualizing data using Pandas for
business decision-making.
53. Complex groupby()
df.groupby(['A','B']).agg({'C':'sum', 'D':'mean'})
54. applymap()
Element-wise function application for DataFrames. Deprecated since pandas 2.1 in favor of DataFrame.map().
55. pd.to_datetime()
pd.to_datetime(df['date_column'])
56. Advanced missing values
Interpolate, forward/backward fill: df.interpolate(), df.ffill(), df.bfill() (fillna(method=...) is deprecated in recent pandas versions)
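The three strategies fill the same gap differently, as this sketch on a made-up Series shows:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

interpolated = s.interpolate()  # linear: midpoint between neighbours
forward = s.ffill()             # carry the previous value forward
backward = s.bfill()            # pull the next value backward

print(interpolated.tolist(), forward.tolist(), backward.tolist())
```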
57. pd.cut() and pd.qcut()
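Binning continuous data into discrete intervals: pd.cut() uses bin edges you specify (or equal-width bins), pd.qcut() uses quantiles so that bins hold roughly equal counts.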
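A small comparison on made-up ages. The bin edges and labels are illustrative.

```python
import pandas as pd

ages = pd.Series([5, 17, 30, 45, 70, 90])

# cut(): explicit edges; intervals are (0, 18], (18, 65], (65, 100]
by_edges = pd.cut(ages, bins=[0, 18, 65, 100], labels=["child", "adult", "senior"])

# qcut(): split at the median so each bin holds half the values
by_quantile = pd.qcut(ages, q=2, labels=["low", "high"])

print(by_edges.tolist())
print(by_quantile.tolist())
```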
58. Hierarchical indexing
Used for multi-level indexes, especially after groupby or pivot.
59. pd.melt()
Unpivots a DataFrame from wide to long format.
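For instance, monthly columns can be turned into one month column and one value column. The frame and the names month and sales are hypothetical.

```python
import pandas as pd

wide = pd.DataFrame({"id": [1, 2], "jan": [10, 20], "feb": [30, 40]})

# id stays as an identifier; the remaining columns become (month, sales) pairs
long = wide.melt(id_vars="id", var_name="month", value_name="sales")

print(long)
```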
60. Custom aggregation
df.groupby('A').agg({'B': lambda x: x.max() - x.min()})
61. Performance considerations
Avoid loops, use vectorized ops, downcast data types, filter early.
62. query() for efficient selection
Evaluates the string with pandas' expression engine, which can use numexpr when it is installed; this can be faster than boolean indexing on large data.