diff --git a/Course.md b/Course.md
index e10d789afca79ab2df93834ba5e54d75d4d9964f..bfc101b9bc1de11b5f260cebd48af4e031325a3e 100644
--- a/Course.md
+++ b/Course.md
@@ -19,14 +19,14 @@ header-includes:

 # Introduction — programming

-Before we start learning the Java language, it might be useful to take some time
-to think about programming in general. We will make broad remarks that are not
-things to know for themselves but will hopefully shed an interesting light when
-we actually start doing things in Java. There are many ways to program, many
-languages, many paradigms, but capturing their similarities will not only
-prevent us from getting lost in what we're trying to achieve, it will also at
-the same time allow us to understand what is unique to Java and how we can
-leverage that to ease our task.
+Before we start learning the Java language, it might be useful to take some
+time to think about programming in general. We will make broad remarks that are
+not things to know for themselves but will hopefully shed an interesting light
+on what we are going to do when we actually start doing things in Java. There
+are many ways to program, many languages, many paradigms, but capturing their
+similarities will not only prevent us from getting lost in what we're trying to
+achieve, it will also allow us to understand what is unique to Java and how we
+can leverage that to ease our task.

 ## Formulas and computations

@@ -48,9 +48,9 @@ object and assigns it its value.
 But this symbolic definition is of no use when actual values are needed in a
 concrete application like engineering or statistics. Converting those formulas
 into actionable data is an entirely different matter that doesn't always rely on
-the same kind of skills: there starts the realm of computing where notions like
+the same kind of skills : there starts the realm of computing where notions like
 efficiency start to make sense. There may be several paths to get the result's
-digits, not all of which are necessarily equivalent: writing a calculation one
+digits, not all of which are necessarily equivalent : writing a calculation one
 way or another can take more or less time, and use more or less space (think of
 the number of intermediate steps you have to write down, or the difficulty to
 memorize them if you're doing it mentally).
@@ -62,12 +62,12 @@ pre-defined by our numeral system. No choice can be made to change the result.
 This probably explains at least partly why many people find calculation tedious
 and boring, but it is actually a very good property for several reasons.
 Obviously, this objectivity allows people to agree on the actual results, but
-there's much. Those rules aren't tedious to apply because they are too
+there's more. Those rules aren't tedious to apply because they are too
 complicated, on the contrary they are tedious because they are so simple. The
-difficulty in doing a calculation isn't in figuring what two digits make when
-summed together, it lies in the repetition of many such easy steps. This is
-crucial to be able to verify each step and find errors. The most important
-property of this system, however, is possibly that it is able to define the
+difficulty in doing a calculation isn't in figuring out what two digits make
+when summed together, it lies in the repetition of many such easy steps. This
+is crucial to be able to verify each step and find errors. The most important
+property of this system, however, is possibly that its ability to define the
 result of an infinite number of calculations with only a (small) set of rules.
 With this system, anyone given enough time is able to compute the sum or the
 product of any two arbitrarily large numbers, that one has never ever seen
@@ -76,7 +76,7 @@ before.
 ## Models (or "you were already programming this whole time")

 This example is very precious in the way it captures a fundamental part of
-programming: evaluating formulas to get a meaningful result for humans requires
+programming : evaluating formulas to get a meaningful result for humans requires
 to *model* the entities involved in order to be able to derive the expected
 result by applying a set of *simple* and *objective* rules. In this particular
 exemple, the entities we consider are numbers, which may seem natural and
@@ -88,23 +88,23 @@ that you may not even have thought about the fact that an integer exists in
 itself, outside of its base-10 writing which is only one way (among many others)
 to represent it.

-Note that these representations work both ways: given a number, you are able to
+Note that these representations work both ways : given a number, you are able to
 *encode* it into a string of digits, but conversely seeing a string of digits is
 enough for you to conceptualize the number it represents (you can *decode* it).

-- real useful concept : number
-- model : a string of digits
-- conversion between reality and model : numeral system
-- operations on the model : the arithmetic rules ("9 + 3 make 12, write 2, carry
+- real useful concept : number
+- model : a string of digits
+- conversion between reality and model : numeral system
+- operations on the model : the arithmetic rules ("9 + 3 make 12, write 2, carry
   1 and…")

 Though very simple here — so simple that it was hard at first to notice it's even
 there — this very process is the heart of programming. All that's ever been done
-with computers is this: defining a layer useful enough to solve a certain class
-of problems, and then encode more abstract problems in term of this layer to be
+with computers is this : defining a layer useful enough to solve a certain class
+of problems, and then encode more abstract problems on top of this layer to be
 able to solve them as well.

-A way to visualize this pattern is building the following parallelogram:
+A way to visualize this pattern is to build the following parallelogram :

 \begin{tabular}{l r}
 Want to solve a problem B about A ? &
@@ -140,16 +140,16 @@ objects, such as polynoms, by numbers to be able to compute approximations of
 interesting functions (logarithms, trigonometric functions, etc.). The
 analytical engine had abstracted several key concepts that appear during
 computations, like the "memory" required to hold intermediate computation steps
-or the notion of "program", written on punched cards : designing a machine with
+or the notion of "program", written on punched cards : designing a machine with
 more expressive power than a basic calculator required a language more complex
 that one button for each arithmetic operator to describe what was expected of
 it, and this language had to be readable by mechanical means.

 This shows that humans didn't wait for silicon to be able to automate
 calculations. But computers wouldn't have been very useful if they had been
-restricted to mathematical problems: like the Analytical Engine, Java and other
+restricted to mathematical problems : like the Analytical Engine, Java and other
 modern programming languages are more than the language of expressions you've
-been punching on calculators in junior high-school and they are not limited to
+been punching into calculators in junior high-school and they are not limited to
 representing numbers, or rather, since electronic and mechanical devices can
 handle only numbers natively, they have ways to represent more complex data.

@@ -160,7 +160,7 @@ Indeed, if for instance numbers seem more than enough to model physics problems
 other fields may appear at first less straightforward to solve with numbers. We
 will see that numbers can do much though.

-History is full of events which outcome was strongly influenced by the secrecy
+History is full of events whose outcome was strongly influenced by the secrecy
 of strategic intelligence. For such purposes, spies, generals and children alike
 have created tables to convert letters into numbers and back in order to be able
 to encrypt their messages by applying mathematical operations on the coded
@@ -181,7 +181,7 @@ begining of a sequence" (by, say, storing it in the first or last digit of each
 unit).

 Actually, there are at least two major strategies to represent sequences of
-objects, and this alternative is a common "sightseeing" spot on the learning
+objects, and this alternative is a common "landmark" spot on the learning
 route to C, a notable ancestor of Java. The first one is to directly store
 object after object, and to reserve a special value which no object can take to
 mark the end of the sequence. The other stores the length of the sequence at its
@@ -191,23 +191,23 @@ the language already provides a built-in way to represent sequences.

 Once we have sequences, the expressivity of numbers rises seriously. Encoding
 sequences of sequences, we have access to multi-dimensional objects such as
-pictures; storing different kind of objects side by side (text, numbers,
+pictures; storing different kinds of objects side by side (text, numbers,
 sequences, other objects…) we can describe more complex real-world concepts and
 start developing useful models that can do useful things for us (think, for
 instance, of the representation of a music track if you're trying to develop a
-music player: you'll probably need to store the actual sound data, but also its
+music player : you'll probably need to store the actual sound data, but also its
 title, the name of the band, the title of the album, its duration, possibly the
 lyrics, etc.). All numbers.

 The above isn't of course a very precise blueprint for a general-purpose format
-to store binary data, but this rough sketch hopes to show a way how very complex
-data can be represented by numbers, *while retaining some of its structure* so
-that useful things can be said on the actual real-world objects by considering
-their models only. The most important thing to remark here is that some objects
-have a natural direct representation as numbers while others need a little more
-work to be digitized. This distinction matters because it will have consequences
-on how both groups will be handled by programming languages in general and Java
-in particular.
+to store binary data, but this rough sketch hopes to show a way in which very
+complex data can be represented by numbers, *while retaining some of its
+structure* so that useful things can be said on the actual real-world objects
+by considering their models only. The most important thing to remark here is
+that some objects have a natural direct representation as numbers while others
+need a little more work to be digitized. This distinction matters because it
+will have consequences on how both groups will be handled by programming
+languages in general and Java in particular.

 ## Up to… the Java Machine

@@ -215,15 +215,15 @@ in particular.

 We noticed earlier that numbers themselves were too abstract to be handled
 directly and that numeral systems with digits and arithmetic rules were devised
-as a model that allowed to do calculations on them. But this model has a limit:
+as a model that allowed to do calculations on them. But this model has a limit :
 a physical world full of upper bounds with finite space accessible in a finite
 time can only accomodate a finite number of digits.

-Likewise, practical computing device from the abacus to the processor all have a
-fixed predefined number of digits and can't represent any number but only a
+Likewise, practical computing devices from the abacus to the processor all have
+a fixed predefined number of digits and can't represent any number but only a
 finite subset of them. While each rod on an abacus can represent the ten digits
-we're used to, electronic circuits were designed to represent only two digits, 0
-and 1, hence favouring the use of a binary notation of numbers, that is, the
+we're used to, electronic circuits were designed to represent only two digits,
+0 and 1, hence favouring the use of a binary notation of numbers, that is, the
 base 2 over base 10. A "32-bit" processor has thus 32 (binary) digits to
 represent numbers, and a "64-bit" processor 64 digits, meaning that they can
 only represent respectively $2^{32}$ and $2^{64}$ different values, usually
@@ -234,28 +234,28 @@ Such a set of bits of fixed length is called a *word* (and the number of bits it
 takes its *size*). When a carry occurring during a computation needs a new bit
 to be represented that exceeds the size, the physical device has no other option
 than to merely drop it, and "loop" over from 0. The same would happen if you had
-for instance an abacus with 5 rods, representing 99999, and you added 1:
+for instance an abacus with 5 rods, representing 99999, and you added 1 :
 applying the rules, all '9' digits would shift to '0', but you couldn't carry
 the last '1' to a new rod. When this happens the device is said to *overflow*.

 Natively, the processor can only handle words, which means that the distinction
 we made earlier between what was immediately numbers and more complex data
-wasn't enough: actually, even numbers greater than the maximum value of a word
+wasn't enough : actually, even numbers greater than the maximum value of a word
 are complex data to a processor and require a special encoding just like
-sequences. For us, it means that Java won't be able to handle the same way a
-"small" integer and one of arbitrary size and this is a key to understand why
+sequences. For us, it means that Java won't be able to handle a "small" integer
+and one of arbitrary size the same way and this is a key to understand why
 there are different types of numbers in programming languages.

 ### Layers

 We already knew that handling complex objects would come at a cost, but we've
-just understood why even large integers would be exactly as painful to handle.
+just understood why even large integers would be just as painful to handle.
 You should be convinced by now that no one really wants to operate directly on
 their word representations. A complex representation of data requires more
-abstract ways to process it, while always in the end relying on the same base
-layer: the processor.
+abstract ways to process it, while always relying on the same base layer in the
+end : the processor.

-Processors operate on numbers stored in registered, some for input data and others
+Processors operate on numbers stored in registeres, some for input data and others
 to know the next instruction they should execute on those values. They are thus
 built with a certain number of predefined operations they can perform on input
 words. The size of their words as well as this set of *instructions* determines
@@ -270,12 +270,13 @@ to things to make programs more readable than actual binary code but one still
 has to manually put the values needed into the proper registers.

 The next big step was to introduce the notion of variables to free the
-programmer from the administrative work of loading and saving registers. This is
-a huge conceptual step because it enables one to refer to virtually an infinity
-of objects at once just like in human languages. The price to pay for this power
-is the difficulty to name things, one of the hardest problems in programming. As
-we will see later in this course, a great care should be taken in choosing names
-for objects, as poor names make bad code which is way worse than no code at all.
+programmer from the administrative work of loading and saving registers. This
+is a huge conceptual step because it enables one to refer to arbitrarily many
+objects at once just like in human languages. The price to pay for this power
+is the difficulty to name things, one of the hardest problems in programming.
+As we will see later in this course, great care should be taken in choosing
+names for objects, as poor names make bad code which is much worse than no code
+at all.

 Languages of this level, like ALGOL, originally developed in 1958, still
 describe algorithms as sequences of instructions to perform in a given order
@@ -284,35 +285,30 @@ core of the "imperative" approach, probably culminating in the C language, a
 successor which added a lot of expressivity by allowing to define data types and
 really start the modeling process we've been discussing all along.

-At this level of abstraction, code starts being regrouped into *procedures* or
-*functions* (some people make a difference but they are synonyms to a large
-extent) and they provide loops to ease the control of the flow of the program,
-by allowing to iterate on the values of a variable (in assembly, this was only
-possible by jumping to specific parts of the program after running a test on the
-value in a specific register).
-
-Java is yet a level higher in term of abstraction as it is an object-oriented
-language: in addition to the definition of data structures like C, it allows to
-regroup functions operating on a given data structures and attach them tightly
-to it for clarity (and other benefits we will see later), as a metaphor of
-physical devices (think of a washing machine or a hi-fi system) which have
-programmed routines which can be triggered by pressing buttons. It also
+At this level of abstraction, code is regrouped into *procedures* or
+*functions* (some people don't make a difference and t they are indeed synonyms
+to a large extent but the distinction matters for Java) and they provide loops
+to ease the control of the flow of the program, by allowing to iterate on the
+values of a variable (in assembly, this was only possible by jumping to
+specific parts of the program after running a test on the value in a specific
+register).
+
+Java is yet a level higher in terms of abstraction as it is an object-oriented
+language : in addition to the definition of data structures like C, it allows
+to regroup functions operating on a given data structure and attach them
+tightly to it for clarity (and other benefits we will see later), as a metaphor
+of physical devices (think of a washing machine or a hi-fi system) which have
+programmed routines that can be triggered by pressing buttons. It also
 introduces the notion of packages and namespaces which are ways to separate
 cleanly independent parts of programs, avoiding name conflicts and misuse of
 internal states of implementations.

 ### Compilers and interpreters

-Each new language introduces new abstractions which semantics must be eventually
-described in term of what can be run, that is, machine code. But the translation
-doesn't have to go all the way down for each language. In fact, most languages
-are defined only in term of an existing lower-level language. If you set out to
-create a language you would probably want to define its semantic in term of
-assembly or even C, you wouldn't need to be able to convert its source code
-directly to machine code because it's easier to produce C and we already have
-tools to convert C code to a binary executable: the ones distributed to support
-C by one of its many implementations. There are two major strategies to actually
-perform this conversion from a higher-level language to a lower-level.
+Each new language introduces new abstractions whose semantics must eventually
+be described in terms of what can be run, that is, machine code. But there are
+two major strategies to actually perform this conversion from a higher-level
+language to a lower-level.

 One is to perform the translation once and produce code in the destination
 language, a process called *compilation*. This is conceptually the more simple
@@ -323,7 +319,7 @@ that contains only the instructions needed for your purpose so it can be rather
 small.

 The other approach is to translate instructions on the fly by writing an
-*interpreter*: a sort of general purpose program which reads what to do during a
+*interpreter* : a sort of general purpose program which reads what to do during a
 given run from a *script* written in the abstract language defined by that
 interpreter. It is generally slower than compiled code because the instructions
 need to be understood and translated while the program is running. In addition
@@ -336,7 +332,20 @@ you get all the scripts written for that interpreter to work, with no additional
 work if the interpreter really abstracts correctly from the layer it's
 implemented in (which is always more or less the case in practice).

-Java combines a bit of both: it's a compiled language, but instead of compiling
+the translation
+
+
+doesn't have to go all the way down for each language. In fact, most languages
+are implemented in an existing lower-level language and not directly . If you set out to
+create a language you would probably want to define its semantic in term of
+assembly or even C, you wouldn't need to be able to convert its source code
+directly to machine code because it's easier to produce C and we already have
+tools to convert C code to a binary executable : the ones distributed to support
+
+C by one of its many implementations. There are two major strategies to actually
+perform this conversion from a higher-level language to a lower-level.
+
+Java combines a bit of both : it's a compiled language, but instead of compiling
 directly to machine code it compiles to an intermediate binary format containing
 instructions for a sort of interpreter, the *Java Machine* (this type of
 interpreter for binary code is more generally called a *virtual machine*).
@@ -345,7 +354,7 @@ Our Java program will hence have a compiler, `javac`, to produce binary files
 (called *bytecode*) that are not directly executable but sorts of "binary
 scripts" for the `java` interpreter. It's usually a bit slower than a true
 executable but the bytecode is way more optimized than a textual script, and its
-strong selling point is the portability it provides: a Java program, once
+strong selling point is the portability it provides : a Java program, once
 compiled, can be copied to any machine where the Java machine has been ported
 and run there ("Write Once Run Anywhere" was the slogan used by Sun Microsystem
 in advertisement material).
@@ -360,7 +369,7 @@ So let's sum up the important ideas we've found thinking about computing.

 Programming is about developing models of problems to be able to solve them
 automatically. This is conceptually very similar to numeral systems used to
-perform calculations on numbers: a programming language plays the part of
+perform calculations on numbers : a programming language plays the part of
 arithmetic, a numeral system with operators, that let one assemble programs, the
 equivalent of expressions, that can then be run automatically by a machine
 applying simple rules to get a result just like applying simple rules on the
@@ -372,7 +381,7 @@ processor let one use many expressive and complex types of data, which will
 however have to be handled a little differently from the types directly
 translatable into numbers.

-The Java language is implemented as a compiler for a virtual machine: a program
+The Java language is implemented as a compiler for a virtual machine : a program
 transforms textual source code into a set of instructions stored in a binary
 file, that can then be run by the virtual machine with no additional work on any
 platform where the virtual machine was ported.
@@ -390,7 +399,7 @@ special typography to mean that the text written in it is legit Java.

 The introduction has (hopefully) shown that the core of programming is to build
 useful *models* of actual problems we want to solve. Doing so requires a toolbox
-to define those models : data structures.
+to define those models : data structures.

 Contrary to what one could believe thinking of programs as "cooking recipe" for
 computers, an import part of a program is *describing* things instead of *doing*
@@ -399,19 +408,20 @@ they don't really achieve much in themselves. They are nonetheless as important
 as the next section and should be properly understood before attempting to write
 anything.

-Because it is easier to describe a set by giving example elements it contains,
-we will in this section present built-in Java types as well as valid *litterals*
-for each type.
+Because it is easier to describe a set by giving examples of elements it
+contains, we will in this section present built-in Java types as well as valid
+*litterals* for each type, that is, atomic snippets of code that Java
+interprets as direct values.

 ### "Numbers"

 The above remarks on architectures have taught us that the machines we use to
 program can only handle directly (a finite subset of) natural numbers. But those
-finite subset come in different flavours for different purposes:
+finite subset come in different flavours for different purposes :

 **boolean**

-The simplest piece of information that can be stored is a binary truth value:
+The simplest piece of information that can be stored is a binary truth value :
 either `true` or `false`, as described in Boole's algebra. They are of course a
 lot shorter than a machine word and are hence obviously immediate values.

@@ -432,14 +442,14 @@ sockets.

 **int**

-Short for *integer*, `int` are just that: a truncated version of the set of
+Short for *integer*, `int` are just that : a truncated version of the set of
 natural numbers. Since Java was normalized in the 90s, at a time when 32-bit
 architecture dominated the market, they are of course 32-bit integers, the size
 of machine words at the time, to fit entirely in a processor register for the
 sake of efficiency.

 Values of this type can be created in programs by simply writing a string digit
-like we are used to, for instance on calculators. These are valid values:
+like we are used to, for instance on calculators. These are valid values :

 - `0`
 - `1`
@@ -450,16 +460,16 @@ As all primitive number types, they can take positive or negative values, so
 - `-2`

 is valid too. Since we know they can go from $-2^{31}$ to $2^{31}-1$ it means
-that:
+that :

 - `-2147483648`
 - `2147483647`

 are valid too and are the bounds of the values for this type. Finally, Java
-supports notation in other bases: binary, octal and hexadecimal. Binary values
+supports notation in other bases : binary, octal and hexadecimal. Binary values
 are prefixed by "0b", octal only by a "0" (this is a frequent caveat, don't pad
 the numbers you write in a program with 0, it would change their meaning and
-possibly their value) and hexadecimal by "0x":
+possibly their value) and hexadecimal by "0x" :

 - `0b10` (which is equal to `2`)
 - `010` (which is equal to `8`)
@@ -472,7 +482,7 @@ size when the code is expected to run in production on embedded devices with
 smaller architecture, but it could also allow some compiler optimization and is
 also a way to state you don't expect the values handled to go above $2^{31}-1$
 or under $-2^{31}$. For this reason, all the previous valid values can only be
-`short` except of course the two bounds, which are instead:
+`short` except of course the two bounds, which are instead :

 - `-32768`
 - `32767`
@@ -491,7 +501,7 @@ Java also has representation for decimal numbers on 32-bit with the `float`
 type, allowing for efficient decimal computations still handled natively by the
 processor. Numbers of this type can be written as the previous integers (granted
 they don't exceed the maximum precision for this type), with a point separating
-an integer part and a fraction part (which can be empty to mean $0$), like this:
+an integer part and a fraction part (which can be empty to mean $0$), like this :

 - `1.03`
 - `-0.041`
@@ -501,7 +511,7 @@ an integer part and a fraction part (which can be empty to mean $0$), like this:
 It also accepts scientific notation with a *significand* (sometimes also called
 *mantissa*) and an *exponent*, the power of ten by which to multiply the
 significand, separated by a lowercase or uppercase 'e'. The previous numbers can
-thus also be written:
+thus also be written :

 - `10.3e-1`
 - `-4.1e-2`
@@ -516,8 +526,8 @@ values. For instance, the value `1.00000001e8` for a float is equal to

 **double**

-The `double` type is to `float` what `long` is to `int`: semantically the same,
-accepting the same litteral values, but with twice as much precision by storing
+The `double` type is to `float` what `long` is to `int` : semantically the same,
+accepting the same literal values, but with twice as much precision by storing
 the numbers on 32-bit instead. If stored in a `double`, then

 - `1.00000001e8` is different from
@@ -525,7 +535,7 @@ the numbers on 32-bit instead. If stored in a `double`, then

 By default, any value in decimal notation is expected by Java to be a `double`,
 which is why the previous constant values can be suffixed by a lowercase or
-uppercase `f` or `d` to mean they are respectively `float` or `double` like so:
+uppercase `f` or `d` to mean they are respectively `float` or `double` like so :

 - `1.03f` (explicitly float)
 - `1.03d` (explicitly double)
@@ -535,7 +545,7 @@ uppercase `f` or `d` to mean they are respectively `float` or `double` like so:

 Finally, characters are represented by a their numeric code in Unicode. They are
 stored on 16 bits. A value of this type must be written between single quotes
-likes this:
+likes this :

 - `'a'`
 - `'$'`
@@ -543,24 +553,24 @@ likes this:
 - `'Z'`

 Some special characters can't be entered conveniently in programs and are
-represented by *escape sequences*: a '\\' (backslash) followed by a character
+represented by *escape sequences* : a '\\' (backslash) followed by a character
 which isn't interpreted litteraly but is translated. This notation isn't
 specific to Java and can be looked up in any ASCII table, but the most useful
-are:
+are :

-- `'\n'`: newline character
-- `'\r'`: carriage return (with the previous character, involved in the various
-  ways to represent the actual end of line: while UNIX generally considers
+- `'\n'` : newline character
+- `'\r'` : carriage return (with the previous character, involved in the various
+  ways to represent the actual end of line : while UNIX generally considers
   `'\n'` to be enough, Windows wants both `'\r'`, then `'\n'` and MacOS is
   satisfied with `'\r'`)
-- `'\t'`: tabulation
+- `'\t'` : tabulation

-Of course, since the single quote marks the end of a character litteral, it too
-must be escaped to be entered:
+Of course, since the single quote marks the end of a character literal, it
+must be escaped to be entered too :

 - `'\''`

-And that entails that the backslash too must be escaped
+And that entails that the backslash must be escaped too

 - `'\\'`

@@ -581,7 +591,7 @@ For instance there's a type for arbitrary-large numbers, called `BigInteger`.
 Don't believe it's actually infinite, since your machine has only finite memory
 in the end, but it's much larger than the 64 bits allowed to a `long` and
 reaching that limit will mean you'll have exhausted all your computing resources
-so you'll have way more serious problem to worry about than the representation
+so you'll have much more serious problems to worry about than the representation
 overflow.

 Possibly the most useful compound type is `String`. It allows to represent text,
@@ -591,12 +601,12 @@ Integers (!)
 Strings
 «struct» -> in fact classes (but let's not talk about the object model just yet)

-tip: lower-case -> primitive type, upper-case first -> class
+tip : lower-case -> primitive type, upper-case first -> class

 There are of course many more useful built-in types but the main important thing
 to

-Please note the attention taken to the typographic conventions: while primitive
+Please note the attention taken to the typographic conventions : while primitive
 types were written all lowercase, classes (we're not quite ready to define them
 but we'll get [there](#objects) soon) in Java start with an uppercase letter.
 Since spaces are not allowed within names, when a name is made of several words,
@@ -615,13 +625,13 @@ convention is called "CamelCase".

 ### Statements

-### Applied magics: built-ins
+### Applied magics : built-ins

 ## Control structures

 ### Functions

-Parallel : constants -> variables / list of predefined instructions ->
+Parallel : constants -> variables / list of predefined instructions ->
 dynamic route

 ### Conditionals