HAL Gnanajyoti School
Java Character Set
A character set is the set of valid characters that a language can recognize. A
character may be a letter, a digit or any other symbol. Java uses the Unicode
character set. Java's char type holds a 2-byte (16-bit) Unicode value, and Unicode
has codes for the characters of almost all human alphabets and writing systems
around the world, including English, Arabic, Greek and Chinese. In a Java program,
a Unicode character can be written as a Unicode escape: ‘\u’ followed by four
hexadecimal digits.
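The Unicode escape can be tried out with a short program. This is only a sketch: the class name UnicodeEscapeDemo and the helper method codeOf are made up for this illustration, while Character.BYTES is a standard constant of the Java library.

```java
class UnicodeEscapeDemo {
    // Hypothetical helper: returns the numeric Unicode value stored in a char.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        char plain = 'A';
        char escaped = '\u0041';                // the same letter written as a Unicode escape
        System.out.println(plain == escaped);   // prints true
        System.out.println(codeOf('\u03A9'));   // Greek capital omega, prints 937
        System.out.println(Character.BYTES);    // size of a char in bytes, prints 2
    }
}
```

The hexadecimal value 0041 is 65 in decimal, which is the Unicode value of the letter A.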
Token: The smallest individual unit in a program that is meaningful to the
compiler is called a token.
Tokens
Java tokens are of five types:
• Keywords
• Identifiers
• Literals/Constants
• Operators
• Punctuators/Separators
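As a small illustration, one Java statement can be split into its tokens. The class name TokenDemo, the method addBonus and the variable names are invented for this example only.

```java
class TokenDemo {
    // Hypothetical method used only to hold one statement for analysis.
    static int addBonus(int mark1) {
        // The statement below is made of these tokens:
        //   int     keyword
        //   total   identifier
        //   =       operator
        //   mark1   identifier
        //   +       operator
        //   10      literal
        //   ;       separator
        int total = mark1 + 10;
        return total;
    }

    public static void main(String[] args) {
        System.out.println(addBonus(75));   // prints 85
    }
}
```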
Keywords: Keywords are words that convey a special meaning to the language
compiler. They are reserved for special purposes and must not be used as normal
identifier names. E.g. public, class, void, int, double. (Note that main is a
method name, not a keyword.)
Identifiers: Identifiers are the names given to different parts of a program, such
as variables, classes, objects and arrays.
Rules for naming identifiers
• Identifiers can contain letters, digits, the underscore (_) and the dollar sign ($).
• They must not be a keyword.
• They must not begin with a digit.
• They can be of any length.
• They cannot contain spaces.
• Java is case sensitive, i.e. uppercase and lowercase letters are treated
differently, so Mark and mark are two different identifiers.
A few examples of valid Java identifiers:
Student_Name
Mark1
this_is_a_very_long_identifier
The following Java identifiers are invalid:
Student name    Contains a blank space
1Mark_Comp      Starts with a digit
new             Reserved Java keyword
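The naming rules can also be checked by a program. The sketch below is one possible way, not the compiler's actual method: the class IdentifierCheck, the method isValidIdentifier and the short keyword list SOME_KEYWORDS are assumptions made for this example (the list is deliberately incomplete). Character.isJavaIdentifierStart and Character.isJavaIdentifierPart are standard library methods.

```java
import java.util.Arrays;
import java.util.List;

class IdentifierCheck {
    // Assumption: a small sample of keywords, not the full reserved-word list.
    static final List<String> SOME_KEYWORDS =
            Arrays.asList("new", "class", "public", "void", "int", "double");

    static boolean isValidIdentifier(String name) {
        // An identifier must not be empty and must not be a keyword.
        if (name.isEmpty() || SOME_KEYWORDS.contains(name)) {
            return false;
        }
        // The first character may be a letter, _ or $, but not a digit.
        if (!Character.isJavaIdentifierStart(name.charAt(0))) {
            return false;
        }
        // The remaining characters may also be digits; spaces are rejected.
        for (int i = 1; i < name.length(); i++) {
            if (!Character.isJavaIdentifierPart(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidIdentifier("Student_Name"));  // prints true
        System.out.println(isValidIdentifier("Student name"));  // prints false
        System.out.println(isValidIdentifier("1Mark_Comp"));    // prints false
        System.out.println(isValidIdentifier("new"));           // prints false
    }
}
```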
Literals/Constants
A literal is a sequence of characters used in a program to represent a constant value
(fixed data), such as an integer literal, a floating-point literal or a boolean literal.
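A few kinds of literals are shown in the sketch below. The class name LiteralDemo, the method summary and all variable names and values are chosen only for illustration.

```java
class LiteralDemo {
    // Hypothetical method: stores one literal of each kind and joins them into a String.
    static String summary() {
        int count = 42;           // integer literal
        double price = 3.14;      // floating-point literal
        char grade = 'A';         // character literal
        boolean passed = true;    // boolean literal
        String school = "HAL";    // string literal
        return count + " " + price + " " + grade + " " + passed + " " + school;
    }

    public static void main(String[] args) {
        System.out.println(summary());   // prints 42 3.14 A true HAL
    }
}
```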
Operators
Operators are symbols (tokens) that perform specific operations on operands. For
example, in a + b the operator is + and the operands are a and b.
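Some common operators are used in the sketch below. The class name OperatorDemo and its methods are made up for this example, and the numbers 17 and 5 are arbitrary.

```java
class OperatorDemo {
    static int quotient(int a, int b) {
        return a / b;            // / on two int operands gives integer division
    }

    static int remainder(int a, int b) {
        return a % b;            // % gives the remainder
    }

    static boolean bothPositive(int a, int b) {
        return a > 0 && b > 0;   // > is a relational operator, && a logical operator
    }

    public static void main(String[] args) {
        System.out.println(17 + 5);                 // prints 22
        System.out.println(quotient(17, 5));        // prints 3
        System.out.println(remainder(17, 5));       // prints 2
        System.out.println(bothPositive(17, -5));   // prints false
    }
}
```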
Separators/Punctuators
They are special characters in Java that separate or group parts of a program, such
as variables, statements and blocks. Common separators include ( ) { } [ ] ; , and .
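The sketch below points out the separators in a short program. The class name SeparatorDemo, the method sum and the marks in the array are assumptions made for this example.

```java
class SeparatorDemo {
    static int sum(int[] values) {     // ( ) enclose the parameters, [ ] mark an array type
        int total = 0;                 // ; ends a statement
        for (int v : values) {         // { } group statements into a block
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] marks = {70, 85, 90};        // , separates the elements
        System.out.println(sum(marks));    // . selects a member; prints 245
    }
}
```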