Capgemini Data Analyst Interview Prep

QA

Uploaded by

shankar das

Question set 1

Company name: Capgemini


Role: Data Analyst

1. What is the difference between a shallow copy and a deep copy in Python?

A deep copy creates a new object and recursively populates it with copies of the child
objects of the original object. Therefore, changes to the original's child objects are not
reflected in the copy. copy.deepcopy() creates a deep copy. A shallow copy creates a new
object but populates it with references to the child objects of the original object.
Therefore, changes made to a mutable child object through the original are also visible in
the copy, although adding or removing top-level items in the original is not.
copy.copy() creates a shallow copy.
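
As a minimal sketch of the difference, the snippet below copies a nested list both ways and then mutates the original. The variable names (original, shallow, deep) are only illustrative.

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)      # new outer list, same inner lists
deep = copy.deepcopy(original)     # new outer list, new inner lists

original[0].append(99)             # mutate a child object of the original
original.append([5, 6])            # add a new top-level item to the original

print(shallow)  # the child mutation is visible, the new top-level item is not
print(deep)     # neither change is visible
```

The shallow copy shares its inner lists with the original, so only the change to the existing child shows up in it; the deep copy is unaffected by both changes.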

2. How can you remove duplicate values in a range of cells?

1. To delete duplicate values in a column, first highlight them (Home tab >
‘Conditional Formatting’ > ‘Highlight Cells Rules’ > ‘Duplicate Values’), then select
the highlighted cells and press the Delete key. After deleting the values, go back to
the ‘Conditional Formatting’ option in the Home tab and choose ‘Clear Rules’ to
remove the rules from the sheet.
2. You can also delete duplicate values by selecting the ‘Remove Duplicates’ option
under Data Tools in the Data tab.

3. Define shelves and sets in Tableau.

Shelves: Every worksheet in Tableau has shelves such as Columns, Rows, Marks,
Filters, Pages, and more. By placing fields on shelves we build the structure of our
visualization. We can also control the marks by including or excluding data.

Sets: Sets are used to define a condition on which a subset of the data is prepared. Data
is grouped together based on that condition. The fields responsible for this grouping are
known as sets. For example: students having grades of more than 70%.

4. Given a table Employee having columns empName and empId, what will be the
result of the SQL query below?

select empName from Employee order by 2 asc;

“order by 2” sorts by the second column of the SELECT list, so it is valid only when
the SELECT statement returns at least 2 columns. Here the query will throw an error
because only one column is used in the SELECT statement.
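
The behaviour can be checked with a small sketch using Python's built-in sqlite3 module. This assumes SQLite as the database (other engines report a similar error with different wording), and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (empName TEXT, empId INTEGER)")
# Hypothetical sample rows, used only for this demonstration.
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [("Ravi", 2), ("Anita", 1)])

# One column in the SELECT list: position 2 does not exist, so SQLite rejects the query.
try:
    conn.execute("SELECT empName FROM Employee ORDER BY 2 ASC").fetchall()
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)
print(error_message)

# Two columns in the SELECT list: position 2 refers to empId, so the query runs.
names_by_id = [row[0] for row in
               conn.execute("SELECT empName, empId FROM Employee ORDER BY 2 ASC")]
print(names_by_id)
```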

Question set 2

1. Explain the split(), sub(), and subn() functions of the “re” module in Python.

Ans: Python’s “re” module provides 3 functions for modifying strings. They are:

● split() – uses a regex pattern to split a given string into a list.
● sub() – finds all substrings where the regex pattern matches and replaces
them with a different string.
● subn() – works like sub(), but returns a tuple containing the new string and the
number of replacements made.
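
The three functions listed above can be sketched on a single string as follows. The sample text and patterns are chosen only for illustration.

```python
import re

text = "2023-01-15, 2024-02-20"

# split(): break the string wherever the pattern (a comma plus optional spaces) matches
parts = re.split(r",\s*", text)

# sub(): replace every digit with '#'
masked = re.sub(r"\d", "#", text)

# subn(): same replacement, but also report how many substitutions were made
masked_again, count = re.subn(r"\d", "#", text)

print(parts)
print(masked)
print(masked_again, count)
```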

2. Explain how relationships are defined in Power BI Desktop.

Relationships between tables are defined in two ways:

● Manually - Relationships between tables are manually defined using primary
and foreign keys.
● Automatically - When enabled, this automated feature of Power BI detects
relationships between tables and creates them automatically.

3. Explain the commands in TCL.

TCL stands for Transaction Control Language. These commands are used to
maintain the consistency of the database and to manage the transactions
made by the DML commands.
1. COMMIT:
This command is used to save the changes made in the current transaction permanently.
2. ROLLBACK:
This command is used to undo uncommitted changes, restoring the data to the last
committed state or to a named savepoint.
3. SAVEPOINT:
This command is used to mark a particular point within a transaction, so that
whenever needed the transaction can be rolled back to that particular point.
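
The three commands can be tried out with a short sketch using Python's built-in sqlite3 module. SQLite is an assumption here (exact TCL syntax varies slightly between databases), and the accounts table and savepoint name are hypothetical.

```python
import sqlite3

# isolation_level=None lets us issue BEGIN/COMMIT/ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")

cur.execute("BEGIN")
cur.execute("INSERT INTO accounts VALUES ('alice', 100)")
cur.execute("SAVEPOINT after_alice")               # mark a point in the transaction
cur.execute("INSERT INTO accounts VALUES ('bob', 50)")
cur.execute("ROLLBACK TO SAVEPOINT after_alice")   # undo only the insert of bob
cur.execute("COMMIT")                              # make the insert of alice permanent
rows_after_commit = cur.execute("SELECT name, balance FROM accounts").fetchall()

cur.execute("BEGIN")
cur.execute("DELETE FROM accounts")
cur.execute("ROLLBACK")                            # back to the last committed state
rows_after_rollback = cur.execute("SELECT name, balance FROM accounts").fetchall()

print(rows_after_commit)
print(rows_after_rollback)
```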

4. How can we calculate the standard deviation from a Series in pandas?

The pandas std() method calculates the standard deviation of a Series, or of the
rows or columns of a DataFrame. With the default ddof=1 it returns the sample
standard deviation. The signature below is from older pandas releases; pandas 2.x no
longer accepts the level parameter.

Series.std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)
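
A minimal usage sketch, assuming pandas is installed; the scores values are made up for illustration. It contrasts the default sample standard deviation (ddof=1) with the population standard deviation (ddof=0).

```python
import pandas as pd

scores = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

sample_std = scores.std()            # default ddof=1: divide by n - 1
population_std = scores.std(ddof=0)  # ddof=0: divide by n

print(sample_std)
print(population_std)
```

For these values the mean is 5 and the squared deviations sum to 32, so the population standard deviation is sqrt(32/8) = 2 and the sample standard deviation is sqrt(32/7).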
