Num Py Pandas Interview Qa

The document provides a comprehensive overview of NumPy and Pandas, including key concepts, functions, and differences between them. It covers topics such as array creation, data manipulation, handling missing values, and statistical operations. Additionally, it addresses performance considerations and methods for efficiently managing large datasets.

Uploaded by

bamboocader
Copyright © All Rights Reserved

NumPy and Pandas Interview Questions and Answers

1. What is NumPy and why is it used in data science?

NumPy (Numerical Python) is a powerful library for numerical computations in Python. It provides support
for arrays, matrices, and a large number of mathematical functions. In data science, NumPy is used for fast
numerical computations, efficient handling of large datasets, and serves as the foundation for libraries like
Pandas and SciPy.

2. What is the difference between np.arange() and range()?

• range() is a built-in Python function that returns a lazy range object of integers.
• np.arange() is NumPy's counterpart, which returns a NumPy array.
• np.arange() supports float steps (e.g., np.arange(0, 1, 0.1)), unlike range(), which accepts only integers.
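A quick side-by-side, assuming NumPy is imported as np (the usual convention; the values are illustrative). The last line also shows np.linspace(), which is often preferred when you need an exact number of points, because floating-point rounding can make the length of a float-step arange() hard to predict.

```python
import numpy as np

r = range(0, 5)            # lazy range object of Python ints
a = np.arange(0, 5)        # ndarray: [0 1 2 3 4]
f = np.arange(0, 1, 0.25)  # float step works: [0.   0.25 0.5  0.75]
# range(0, 1, 0.25) would raise TypeError because range() needs integers
l = np.linspace(0, 1, 5)   # exactly 5 points, endpoint included

print(type(r).__name__, type(a).__name__)  # range ndarray
print(f, l)
```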

3. How do you create a NumPy array? Provide examples.

• From list: np.array([1, 2, 3])
• Using functions: np.zeros((2, 3)), np.ones((3, 3)), np.eye(3), np.linspace(0, 10, 5)
• Random: np.random.rand(2, 2)

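4. Difference between a Python list and a NumPy array

• List: can hold elements of different types (heterogeneous) and stores each element as a separate Python object; there is no vectorized arithmetic, so numeric work needs Python-level loops and is slower.
• NumPy array: holds a single dtype (homogeneous), typically in one contiguous buffer, and supports vectorized operations that run in compiled code, which makes them much faster.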
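A small illustration (hypothetical values) of how the same operators mean different things: on lists, + and * concatenate and repeat, while on arrays they act element-wise. The last lines show homogeneity: mixing ints and floats upcasts everything to one dtype.

```python
import numpy as np

lst = [1, 2, 3]
arr = np.array([1, 2, 3])

print(lst + lst)  # [1, 2, 3, 1, 2, 3]  (list concatenation)
print(arr + arr)  # [2 4 6]             (element-wise addition)
print(lst * 2)    # [1, 2, 3, 1, 2, 3]  (list repetition)
print(arr * 2)    # [2 4 6]             (element-wise multiplication)

mixed = np.array([1, 2.5, 3])  # one dtype for all elements: ints upcast
print(mixed.dtype)             # float64
```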

5. Element-wise operations in NumPy

import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 + arr2  # [5 7 9]

Supports +, -, *, /, ** and other arithmetic operators, each applied element by element.

6. NumPy dimensions and shapes

• Shape: a tuple giving the size along each dimension, e.g., (3, 4) for 3 rows and 4 columns; ndim gives the number of dimensions.
• Manipulate using reshape(), ravel(), flatten(), transpose()
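A minimal sketch of the shape-manipulation functions listed above, on a hypothetical 12-element array. It also shows the practical difference between ravel() (a view when possible) and flatten() (always a copy), checked with np.shares_memory().

```python
import numpy as np

a = np.arange(12)               # shape (12,)
m = a.reshape(3, 4)             # shape (3, 4); shares data with a when possible
print(m.shape, m.ndim)          # (3, 4) 2
print(m.T.shape)                # (4, 3): transpose() / .T swaps the axes
flat_view = m.ravel()           # 1-D; returns a view when it can
flat_copy = m.flatten()         # 1-D; always a new copy
print(m.reshape(2, -1).shape)   # (2, 6): -1 lets NumPy infer that dimension
print(np.shares_memory(flat_view, a), np.shares_memory(flat_copy, a))  # True False
```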

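7. Broadcasting in NumPy

Broadcasting allows arithmetic operations between arrays of different shapes. NumPy compares the shapes starting from the trailing (rightmost) dimension, treating an array with fewer dimensions as if it had extra leading dimensions of size 1. Two dimensions are compatible if they are equal or one of them is 1; a size-1 dimension is stretched to match the other without copying data. If any pair is incompatible, NumPy raises a ValueError.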
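The rules above in action, using hypothetical arrays: a (3, 1) column and a length-3 row broadcast to a (3, 3) grid, a scalar broadcasts to any shape, and mismatched trailing dimensions fail.

```python
import numpy as np

col = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3])          # shape (3,), treated as (1, 3)
grid = col + row                   # both stretched to (3, 3)
print(grid)
# [[ 1  2  3]
#  [11 12 13]
#  [21 22 23]]

scaled = np.ones((2, 3)) * 5       # a scalar broadcasts to any shape

try:
    np.ones((2, 3)) + np.ones(2)   # trailing dims 3 vs 2: incompatible
except ValueError as err:
    print("broadcast error:", err)
```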

8. Select specific subset

arr = np.array([1, 2, 3, 4])
arr[1:3] # [2 3]

Supports slicing, boolean indexing, and fancy indexing.
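A short sketch of the three indexing styles named above, plus row and column selection on a 2-D array; the data are made up for illustration.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
print(arr[1:3])        # slicing: [20 30]
print(arr[arr > 25])   # boolean indexing: [30 40 50]
print(arr[[0, 2, 4]])  # fancy (integer-list) indexing: [10 30 50]

m = np.arange(9).reshape(3, 3)
print(m[1, :])         # second row: [3 4 5]
print(m[:, 2])         # third column: [2 5 8]
```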

9. Aggregation functions

np.sum(), np.mean(), np.min(), np.max(), np.std(), np.var()

np.mean([1, 2, 3]) # 2.0
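The aggregation functions also accept an axis argument, which is where most interview follow-ups go. A minimal sketch on a hypothetical 2×3 matrix:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])
print(np.sum(m))           # 21: all elements
print(np.sum(m, axis=0))   # [5 7 9]: collapse the rows, one value per column
print(np.mean(m, axis=1))  # [2. 5.]: collapse the columns, one value per row
print(m.max())             # 6: most aggregations also exist as array methods
```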

10. Handling missing values

Use np.nan, np.isnan(), and NaN-ignoring aggregation functions such as np.nanmean() and np.nansum().
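A small sketch (illustrative data) of why the nan-prefixed functions exist: NaN propagates through ordinary reductions, and NaN never compares equal to itself, so np.isnan() is the way to detect it.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0])
print(np.mean(data))            # nan: NaN propagates through ordinary reductions
print(np.nanmean(data))         # 2.0: NaN entries are ignored
print(np.isnan(data))           # [False  True False]
clean = data[~np.isnan(data)]   # drop NaNs with a boolean mask
print(np.nan == np.nan)         # False, so test with np.isnan(), not ==
```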

11. Handling large datasets

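NumPy stores array elements as raw fixed-size values in contiguous memory (e.g., 8 bytes per float64) rather than as separate Python objects, and vectorized operations process them in compiled loops. This makes it efficient for large arrays with little memory overhead.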

12. Matrix multiplication

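np.dot(A, B) or A @ B (equivalent to np.matmul(A, B) for 2-D arrays). Note that A * B is element-wise multiplication, not matrix multiplication.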
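A worked 2×2 example with hypothetical matrices, contrasting the matrix product with the element-wise product:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A @ B)         # matrix product: [[19 22] [43 50]]
print(np.dot(A, B))  # same result for 2-D arrays
print(A * B)         # element-wise product, NOT matrix multiplication: [[ 5 12] [21 32]]
```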

13. Linear algebra module

np.linalg: functions like inv(), eig(), svd(), and solve() for solving systems, finding inverses, eigenvalues, etc.
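A minimal sketch solving the hypothetical system 2x + y = 5, x + 3y = 10 (solution x = 1, y = 3). np.linalg.solve() is generally preferred over computing inv(A) @ b, since it avoids forming the inverse explicitly.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])

x = np.linalg.solve(A, b)        # [1. 3.]
A_inv = np.linalg.inv(A)         # explicit inverse, shown for comparison
vals, vecs = np.linalg.eig(A)    # eigenvalues and eigenvectors (as columns)
print(x, np.allclose(A @ x, b))  # [1. 3.] True
```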

14. Random functions

np.random.rand(), np.random.randint(), np.random.normal(), np.random.seed()
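The functions above use NumPy's global (legacy) random state. Newer NumPy versions also offer a Generator object from np.random.default_rng(), which NumPy's documentation recommends for new code. A sketch of both, with an arbitrary seed of 42:

```python
import numpy as np

np.random.seed(42)                  # legacy API: seeds the global generator
u = np.random.rand(3)               # uniform floats in [0, 1)

rng = np.random.default_rng(42)     # Generator API with its own independent state
ints = rng.integers(0, 10, size=5)  # ints in [0, 10)
normal = rng.normal(loc=0.0, scale=1.0, size=(2, 3))

rng2 = np.random.default_rng(42)    # same seed -> same sequence (reproducibility)
assert (rng2.integers(0, 10, size=5) == ints).all()
```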

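15. Vectorization

Vectorization means applying an operation to whole arrays at once, so the loop runs inside NumPy's compiled code rather than in Python. Instead of a loop, write arr ** 2. np.vectorize() gives loop-free syntax:

np.vectorize(lambda x: x**2)(np.array([1,2,3]))

but it still calls the Python function once per element, so it is a convenience, not a performance optimization.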
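Three ways to square the same hypothetical array. All give the same values; only arr ** 2 is truly vectorized, while the list comprehension and np.vectorize() both call Python code per element.

```python
import numpy as np

arr = np.arange(1, 6)

squares_loop = np.array([x ** 2 for x in arr])       # explicit Python loop
squares_vec = arr ** 2                               # true vectorization (compiled loop)
squares_npvec = np.vectorize(lambda x: x ** 2)(arr)  # loop-free syntax, Python-speed loop

print(squares_vec)  # [ 1  4  9 16 25]
```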

16. np.where() usage

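np.where(condition, x, y) selects element-wise from x where condition is True and from y elsewhere:

np.where(arr > 0, 1, 0)

Called with only a condition, np.where(condition) returns the indices where the condition is True.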
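Both forms of np.where() on a hypothetical array: the three-argument form builds a new array, and the one-argument form returns a tuple of index arrays (one per dimension).

```python
import numpy as np

arr = np.array([-2, 0, 3, -1, 5])
signs = np.where(arr > 0, 1, 0)      # 1 where positive, else 0: [0 0 1 0 1]
clipped = np.where(arr < 0, 0, arr)  # replace negatives, keep the rest: [0 0 3 0 5]
idx = np.where(arr > 0)              # one argument: tuple of index arrays
print(idx[0])                        # [2 4]
```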

17. Statistical operations

np.mean(), np.median(), np.percentile(), np.std(), etc.

18. Masked arrays

Useful for ignoring invalid entries:

masked = np.ma.masked_array(data, mask=condition)
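A sketch assuming a hypothetical dataset where -999 is a sentinel for "missing". The mask is built from a boolean condition, and reductions on the masked array skip the masked entries.

```python
import numpy as np

data = np.array([5, -999, 7, 8, -999])  # -999 marks missing readings
masked = np.ma.masked_array(data, mask=(data == -999))
print(masked)         # masked entries are displayed as --
print(masked.mean())  # about 6.67: (5 + 7 + 8) / 3
print(data.mean())    # -395.6: the sentinels badly skew the plain mean
```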

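19. np.copy() vs arr.view()

• arr.copy() (or np.copy(arr)) creates a new array with its own data; changes to it do not affect the original.
• arr.view() creates a new array object that shares the same data, so changes made through either are visible in both. (There is no top-level np.view(); view() is an ndarray method.) Basic slicing such as arr[1:3] also returns a view.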
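A minimal demonstration with a hypothetical array: after modifying the original, the view and the slice see the change, the copy does not.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
v = a.view()   # new array object, same underlying buffer
s = a[1:3]     # basic slicing also returns a view
c = a.copy()   # independent buffer (np.copy(a) is equivalent)

a[1] = 99
print(v[1], s[0], c[1])                                # 99 99 2
print(np.shares_memory(a, v), np.shares_memory(a, c))  # True False
```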

20. Array reshaping

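Use reshape() (returns the data with a new shape, as a view when possible), resize() (ndarray.resize() changes the array in place, while np.resize() returns a new array, repeating the data if the new size is larger), and flatten() (always returns a 1-D copy).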

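21. Concatenate vs vstack

• np.concatenate([a, b], axis=0): joins arrays along an existing axis; 1-D inputs stay 1-D.
• np.vstack([a, b]): vertical stack along the first axis; 1-D inputs are treated as rows, so two 1-D arrays of length n become an array of shape (2, n).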
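The difference shows up clearly with 1-D inputs (hypothetical arrays below); for 2-D inputs, concatenate along axis=0 and vstack agree.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.concatenate([a, b], axis=0))  # [1 2 3 4 5 6], still 1-D
print(np.vstack([a, b]))               # [[1 2 3] [4 5 6]], 1-D inputs become rows
print(np.hstack([a, b]))               # [1 2 3 4 5 6]

m = np.ones((2, 3))
print(np.concatenate([m, m], axis=0).shape)  # (4, 3): same as vstack for 2-D input
print(np.concatenate([m, m], axis=1).shape)  # (2, 6)
```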

22. Polynomial functions

np.poly1d(), np.polyfit(), np.polyval() for polynomial creation, least-squares fitting, and evaluation.
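A sketch that fits noise-free points from the hypothetical polynomial y = 2x² + 3x + 1, so the fitted coefficients should come back as approximately [2, 3, 1] (highest power first). NumPy's documentation points new code toward the np.polynomial.Polynomial class, but the functions below are the ones named in the answer.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4], dtype=float)
y = 2 * x**2 + 3 * x + 1             # exact quadratic, no noise

coeffs = np.polyfit(x, y, deg=2)     # least-squares fit, highest power first
p = np.poly1d(coeffs)                # callable polynomial object
print(coeffs)                        # approximately [2. 3. 1.]
print(np.polyval(coeffs, 5.0), p(5.0))  # both approximately 66.0
```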

23. Memory layout

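By default, arrays are stored in a single contiguous block in row-major (C) order. Column-major (Fortran) order is available with order='F', and views such as transposes or strided slices may be non-contiguous. Contiguous layout enables faster computations because elements are read sequentially from memory.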
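Layout can be inspected through the flags and strides attributes. This sketch fixes the dtype to int64 (an assumption made so the byte strides are predictable across platforms).

```python
import numpy as np

c = np.arange(6, dtype=np.int64).reshape(2, 3)  # default C (row-major) order
f = np.asfortranarray(c)                        # same values, column-major layout

print(c.flags['C_CONTIGUOUS'], f.flags['F_CONTIGUOUS'])  # True True
print(c.T.flags['C_CONTIGUOUS'])  # False: the transpose is a view with swapped strides
print(c.strides)                  # (24, 8): next row is 3 * 8 bytes away, next column 8
```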

24. Statistical tests

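NumPy has no hypothesis tests; it offers descriptive measures such as np.corrcoef() (correlation coefficients) and np.cov() (covariance). Tests such as t-tests and chi-square tests are in scipy.stats.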

25. np.histogram()

Computes a frequency distribution and returns a tuple (counts, bin_edges):

np.histogram(data, bins=5)
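A small worked case with hypothetical data: 3 equal-width bins between the minimum (1) and maximum (3) give 4 edges, and the last bin includes its right edge.

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3])
counts, edges = np.histogram(data, bins=3)
print(counts)  # [1 2 3]
print(edges)   # 4 edges: 1, about 1.67, about 2.33, 3
```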

26. Array initialization

np.zeros(), np.ones(), np.full(), np.eye(), np.empty() (np.empty() does not initialize the memory, so its contents are arbitrary until assigned)

27. Complex numbers

arr = np.array([1+2j, 3+4j])
np.real(arr), np.imag(arr)

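28. FFT

spectrum = np.fft.fft(signal)      # time domain -> frequency domain
recovered = np.fft.ifft(spectrum)  # inverse FFT of the spectrum, back to the time domain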
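A round-trip sketch on a hypothetical 5 Hz sine wave sampled 64 times over one second. np.fft.fftfreq() maps FFT bins to frequencies, so the largest magnitude in the positive half of the spectrum should sit at 5 Hz, and ifft() of the spectrum should recover the signal up to floating-point error.

```python
import numpy as np

fs = 64                             # samples per second (assumed)
t = np.arange(fs) / fs              # one second of sample times
signal = np.sin(2 * np.pi * 5 * t)  # 5 Hz sine wave

spectrum = np.fft.fft(signal)
recovered = np.fft.ifft(spectrum)   # complex result, imaginary part ~0
freqs = np.fft.fftfreq(len(signal), d=1 / fs)

peak = freqs[np.argmax(np.abs(spectrum[: fs // 2]))]  # search positive frequencies only
print(peak)                                 # 5.0
print(np.allclose(recovered.real, signal))  # True
```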

29. np.unique()

Returns the sorted unique elements; with return_counts=True it also returns how many times each one occurs.

np.unique(arr, return_counts=True)
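A short sketch on hypothetical data, including a common follow-up: finding the most frequent value from the counts.

```python
import numpy as np

arr = np.array([3, 1, 2, 3, 1, 3])
values = np.unique(arr)                              # [1 2 3]
values, counts = np.unique(arr, return_counts=True)  # counts: [2 1 3]
most_common = values[np.argmax(counts)]              # 3
```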

30. What is Pandas?

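Pandas is a data analysis and manipulation library built on NumPy. Its two core data structures are Series (a one-dimensional labeled array) and DataFrame (a two-dimensional labeled table whose columns can have different types).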

31. Create DataFrame

df = pd.DataFrame({'A':[1,2], 'B':[3,4]})

32. Reading data

pd.read_csv(), pd.read_excel(), pd.read_sql(), pd.read_json()

33. Missing data handling

df.isna(), df.fillna(), df.dropna()
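A minimal sketch on a hypothetical two-column DataFrame: count missing values per column, fill each column with its own value (the mean for the numeric column, a placeholder label for the text column), or drop incomplete rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": ["x", None, "z"]})
print(df.isna().sum())  # missing values per column: A 1, B 1

filled = df.fillna({"A": df["A"].mean(), "B": "unknown"})  # per-column fill values
dropped = df.dropna()   # removes every row that contains a missing value
print(len(dropped))     # 2
```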

34. Aggregation and grouping

df.groupby('column').agg(['sum','mean'])
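A concrete version with hypothetical team scores; selecting a column after groupby() limits the aggregation to that column.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["a", "b", "a", "b", "a"],
    "score": [10, 20, 30, 40, 50],
})
summary = df.groupby("team")["score"].agg(["sum", "mean"])
print(summary)
# team a: sum 90, mean 30.0
# team b: sum 60, mean 30.0
```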

4
35. Merge and join

• pd.merge() for joining on keys.
• df.join() for joining on index.
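A sketch with hypothetical orders and customers tables. Both calls below produce a left join on the customer id: merge() matches a key column directly, while join() matches the caller's column against the other frame's index. Customer 3 has no match, so its name is missing.

```python
import pandas as pd

orders = pd.DataFrame({"cust_id": [1, 2, 2, 3], "amount": [50, 20, 30, 10]})
customers = pd.DataFrame({"cust_id": [1, 2], "name": ["Ann", "Bo"]})

# merge: join on a key column; how="left" keeps every order
m = pd.merge(orders, customers, on="cust_id", how="left")

# join: align the caller's "cust_id" column with the other frame's index
j = orders.join(customers.set_index("cust_id"), on="cust_id")
print(m)
```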

36. Filtering and sorting

df[df['col'] > 5], df.sort_values('col')

37. Row/column manipulation

df.drop(), df.insert(), df.rename()

38. apply() method

Applies a function along an axis: with axis=0 (the default) the function receives each column; with axis=1 it receives each row.
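Both directions on a hypothetical DataFrame. For simple arithmetic like the row sum, a vectorized expression such as df["A"] + df["B"] is usually much faster than apply(axis=1).

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})
col_range = df.apply(lambda col: col.max() - col.min())        # axis=0: once per column
row_total = df.apply(lambda row: row["A"] + row["B"], axis=1)  # axis=1: once per row
print(col_range.tolist())  # [2, 20]
print(row_total.tolist())  # [11, 22, 33]
```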

39. Indexing methods

.loc[] (label-based), .iloc[] (integer position-based), .at[] and .iat[] (fast access to a single value, by label or by position)
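All four accessors reaching the same cell of a hypothetical DataFrame with string labels, plus the slicing difference that often comes up: label slices include the end label, position slices do not.

```python
import pandas as pd

df = pd.DataFrame({"price": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])
print(df.loc["b", "price"])      # 20: by label
print(df.iloc[1, 0])             # 20: by position
print(df.at["b", "price"])       # 20: fast scalar access by label
print(df.iat[1, 0])              # 20: fast scalar access by position
print(df.loc["a":"c"].shape[0])  # 3: label slices include the end label
print(df.iloc[0:2].shape[0])     # 2: position slices exclude the end, like Python
```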

40. Time series handling

pd.to_datetime(), resample(), rolling()
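A compact sketch of the three tools on six hypothetical daily values: resample() regroups by time frequency, rolling() computes a moving statistic (the first window - 1 values are NaN), and to_datetime() parses strings.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=6, freq="D")
s = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

three_day = s.resample("3D").sum()   # downsample to 3-day bins: [6, 15]
smooth = s.rolling(window=2).mean()  # NaN, then 1.5, 2.5, 3.5, ...
dates = pd.to_datetime(["2024-01-01", "2024-02-15"])  # strings -> DatetimeIndex
```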

41. Pivot table

pd.pivot_table(df, values='val', index='A', columns='B')
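A worked example with hypothetical columns A, B, and val. pivot_table() aggregates duplicate (index, column) pairs; its default aggfunc is the mean, and this sketch passes aggfunc="sum" explicitly so the duplicated (x, p) pair becomes 1 + 5 = 6.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["x", "x", "y", "y", "x"],
    "B": ["p", "q", "p", "q", "p"],
    "val": [1, 2, 3, 4, 5],
})
pt = pd.pivot_table(df, values="val", index="A", columns="B", aggfunc="sum")
print(pt)
# rows x and y, columns p and q; cell (x, p) = 6
```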

42. Normalization

df['col'] = (df['col'] - df['col'].mean()) / df['col'].std()

This is z-score standardization: the column is rescaled to mean 0 and standard deviation 1.

43. pd.concat()

Combines DataFrames row-wise (axis=0, the default) or column-wise (axis=1).
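Both directions on hypothetical frames. ignore_index=True renumbers the rows after stacking; with axis=1, rows are aligned on the index. pd.concat() is also the replacement for DataFrame.append(), which was removed in pandas 2.0.

```python
import pandas as pd

a = pd.DataFrame({"id": [1, 2], "v": [10, 20]})
b = pd.DataFrame({"id": [3], "v": [30]})
rows = pd.concat([a, b], ignore_index=True)  # stack vertically (axis=0), index 0..2

extra = pd.DataFrame({"w": [0.1, 0.2]})
cols = pd.concat([a, extra], axis=1)         # side by side, aligned on the index
print(rows.shape, cols.shape)                # (3, 2) (2, 3)
```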

44. rolling()

Window-based calculations:

df.rolling(window=3).mean()

45. Transformation and aggregation

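df.groupby('A').transform('mean')

agg() returns one row per group, while transform() returns a result aligned with the original rows, which makes it convenient for adding group-level statistics as a new column.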
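The contrast on a hypothetical frame: the aggregated result has one value per group, the transformed one has one value per original row, so it can be used directly in row-level arithmetic such as centering within groups.

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "x", "y"], "B": [1, 3, 10]})
agg = df.groupby("A")["B"].mean()  # one value per group: x 2.0, y 10.0
df["B_group_mean"] = df.groupby("A")["B"].transform("mean")  # one value per row
df["B_centered"] = df["B"] - df["B_group_mean"]
print(df["B_group_mean"].tolist())  # [2.0, 2.0, 10.0]
```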

46. Multi-index

df.set_index(['col1', 'col2'])

47. query() method

Filter using string expressions.

df.query('col > 5')
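A sketch on hypothetical data showing two details worth knowing: @name refers to a Python variable inside the expression, and the query is equivalent to a boolean mask (where each condition must be parenthesized and combined with &).

```python
import pandas as pd

df = pd.DataFrame({"col": [3, 6, 9], "city": ["NY", "LA", "NY"]})
threshold = 5

q = df.query("col > @threshold and city == 'NY'")          # @ refers to a Python variable
same = df[(df["col"] > threshold) & (df["city"] == "NY")]  # equivalent boolean mask
print(q)
```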

48. Large datasets

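Read in pieces with the chunksize parameter of readers such as pd.read_csv(), load only the columns you need (usecols), filter early, and use compact data types.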
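A minimal chunked-processing sketch. An in-memory CSV string (hypothetical) stands in for a large file; only the needed column is loaded, with a compact dtype, and only a running total is kept between chunks.

```python
import io

import pandas as pd

csv_text = "user,amount\na,10\nb,20\na,30\nc,40\n"  # stand-in for a large file

total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2,
                         usecols=["amount"], dtype={"amount": "int32"}):
    total += chunk["amount"].sum()  # process each chunk, keep only the running result
print(total)  # 100
```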

49. Categorical data

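astype('category') stores each distinct value once, plus small integer codes per row, which reduces memory for columns with few unique values.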
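A quick way to check the saving, on a hypothetical column that repeats three color labels 30,000 times. memory_usage(deep=True) counts the actual string storage, not just the pointers.

```python
import pandas as pd

s = pd.Series(["red", "green", "blue"] * 10_000)
cat = s.astype("category")

as_text = s.memory_usage(deep=True)
as_cat = cat.memory_usage(deep=True)
print(as_text > as_cat)            # True
print(list(cat.cat.categories))    # ['blue', 'green', 'red']
```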

50. Merge vs Join

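merge() is more versatile (it can join on any columns or on the index, and defaults to an inner join); join() is convenient for index-based joins and defaults to a left join.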

51. Slicing and selection

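Using .loc[] (label-based; slices include the end label), .iloc[] (position-based; slices exclude the end), or the slicing syntax df[1:5], which selects rows by position.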

52. Airbnb-style question: Efficiently handling, transforming, and visualizing data using Pandas for
business decision-making.

53. Complex groupby()

df.groupby(['A','B']).agg({'C':'sum', 'D':'mean'})
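The same aggregation written with named aggregation (keyword=(column, function)), which also lets you choose the output column names; the data and the names C_total and D_avg are hypothetical. Grouping by two keys produces a MultiIndex, which reset_index() turns back into ordinary columns.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["x", "x", "y"], "B": ["p", "p", "q"],
    "C": [1, 2, 3], "D": [10.0, 20.0, 30.0],
})
out = df.groupby(["A", "B"]).agg(C_total=("C", "sum"), D_avg=("D", "mean"))
print(out)
# MultiIndex (A, B): (x, p) -> C_total 3, D_avg 15.0; (y, q) -> 3, 30.0
flat = out.reset_index()  # group keys back as regular columns
```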

54. applymap()

Element-wise function application for DataFrames. Since pandas 2.1, applymap() is deprecated and renamed DataFrame.map().

55. pd.to_datetime()

pd.to_datetime(df['date_column'])
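Two parameters are often useful in practice: format makes parsing explicit (and faster), and errors="coerce" turns unparseable values into NaT instead of raising. A sketch on a hypothetical column containing one bad value:

```python
import pandas as pd

df = pd.DataFrame({"date_column": ["2024-03-01", "2024-03-15", "not a date"]})
df["date"] = pd.to_datetime(df["date_column"], format="%Y-%m-%d", errors="coerce")
print(df["date"].isna().sum())         # 1: the bad string became NaT
print(df["date"].dt.day.tolist()[:2])  # [1.0, 15.0] (float because of the NaT)
```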

56. Advanced missing values

Interpolate, or fill forward/backward: df.interpolate(), df.ffill(), df.bfill() (the older df.fillna(method='bfill') form is deprecated since pandas 2.1)
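The three strategies on the same hypothetical Series with a two-value gap:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])
print(s.interpolate().tolist())  # [1.0, 2.0, 3.0, 4.0]: linear between neighbors
print(s.ffill().tolist())        # [1.0, 1.0, 1.0, 4.0]: carry the last value forward
print(s.bfill().tolist())        # [1.0, 4.0, 4.0, 4.0]: pull the next value backward
```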

57. pd.cut() and pd.qcut()

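Binning continuous data into discrete intervals: pd.cut() uses equal-width bins or explicit bin edges, while pd.qcut() uses quantiles, so each bin holds roughly the same number of values.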
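A sketch on hypothetical ages. With cut(), the bin edges are chosen by hand (intervals are right-inclusive by default, e.g. (0, 18]); with qcut(q=2), the split point is the median, so each half gets three values.

```python
import pandas as pd

ages = pd.Series([5, 17, 25, 40, 62, 80])
groups = pd.cut(ages, bins=[0, 18, 65, 100], labels=["child", "adult", "senior"])
halves = pd.qcut(ages, q=2, labels=["lower", "upper"])  # quantile bins, equal counts

print(groups.tolist())  # ['child', 'child', 'adult', 'adult', 'adult', 'senior']
print(halves.tolist())  # ['lower', 'lower', 'lower', 'upper', 'upper', 'upper']
```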

58. Hierarchical indexing

Used for multi-level indexes, especially after groupby or pivot.

59. pd.melt()

Unpivots a DataFrame from wide to long format.
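A round trip on a hypothetical sales table with one column per year: melt() produces one row per (id, year) pair, and pivot() turns the long table back into the wide one.

```python
import pandas as pd

wide = pd.DataFrame({"id": [1, 2], "2023": [10, 20], "2024": [15, 25]})
long = pd.melt(wide, id_vars="id", var_name="year", value_name="sales")
print(long)
# 4 rows with columns id, year, sales

back = long.pivot(index="id", columns="year", values="sales")  # long -> wide again
```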

60. Custom aggregation

df.groupby('A').agg({'B': lambda x: x.max() - x.min()})

61. Performance considerations

Avoid loops, use vectorized ops, downcast data types, filter early.

62. query() for efficient selection

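When the numexpr library is installed, pandas can use it to evaluate the expression, which can be faster and use less memory on large DataFrames; on small DataFrames, plain boolean indexing is often just as fast.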

End of Document.