cctg: Programming Concepts in Julia

Julia provides many features aimed at data scientists and statisticians, and its syntax is close to that of MATLAB, Python, and R.

Revisiting programming paradigms

There are four main paradigms:

Imperative: C

Statements execute sequentially and can change the values of variables. The paradigm grew out of the Von Neumann computer model: memory is reusable and state can be modified.

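Advantages:
- Uses system resources efficiently
- Close to machine language
Disadvantages:
- Some problems cannot be solved simply by following the order of statements
- Mutable state makes programs harder to understand
- Debugging is not intuitive
- Only limited abstraction is possible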

Logical: Prolog

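Also called the rule-based paradigm. A program processes data, builds combinations of rules, and evaluates logical expressions. There are no functions; relations take their place, so Y = f(X) becomes r(X, Y).

Example: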

male(X) // X is a male
father(F,X) // F is father of X
father(F,Y) // F is father of Y
mother(M,X) // M is mother of X
mother(M,Y) // M is mother of Y

// The preceding relationship implies:
brother(X,Y) // X is a brother of Y

Functional: Haskell

Derived from mathematics, this paradigm treats every subprogram as a function. Functions can be passed as arguments and returned as results. An imperative program can be converted into a functional one.
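
As a rough illustration in Julia, the sketch below computes the same sum in an imperative and a functional style, and shows a function that returns a function. The names sum_of_squares_loop, sum_of_squares, and make_adder are made up for this example.

```julia
# Imperative style: mutate an accumulator step by step.
function sum_of_squares_loop(xs)
    total = 0
    for x in xs
        total += x^2
    end
    return total
end

# Functional style: compose functions, no mutation.
sum_of_squares(xs) = sum(map(x -> x^2, xs))

# A function that returns a function (a closure over n).
make_adder(n) = x -> x + n
add5 = make_adder(5)

println(sum_of_squares_loop(1:4))  # 30
println(sum_of_squares(1:4))       # 30
println(add5(10))                  # 15
```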

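Advantages:
- Functions are a high-level abstraction, which reduces errors
- Well suited to parallel computation
- Values are immutable
Disadvantages:
- Becomes complicated when a lot of sequential activity is required; the imperative or object-oriented paradigm is a better fit in that case
- Programs may be less efficient

Object-oriented: Smalltalk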

Everything is an object, and an object's state is modified through its behaviors (methods).

Four basic concepts:
- encapsulation
- abstraction
- inheritance
- polymorphism

Starting with Julia REPL

The Julia REPL executes statements directly. From the shell, julia -e runs a one-line program:

$ julia -e 'println("HelloWorld")'
HelloWorld

$ julia -e 'for i=1:5; println("HelloWorld"); end'
HelloWorld
HelloWorld
HelloWorld
HelloWorld
HelloWorld

$ julia -e 'for i in ARGS; println(i); end' test1 test2 test3
test1
test2
test3

ARGS holds the command-line arguments.
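
The same idea can be put in a script file. This is a minimal sketch; describe_args is a hypothetical helper, not part of Julia.

```julia
# Hypothetical helper: label each argument with its position.
describe_args(args) = ["arg $i: $a" for (i, a) in enumerate(args)]

# When run as `julia script.jl test1 test2`, ARGS is ["test1", "test2"].
# With no arguments, ARGS is empty and nothing is printed.
foreach(println, describe_args(ARGS))
```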

Variables

Swapping two variables:

julia> x=24
24

julia> y=10
10

julia> x,y=y,x
(10, 24)

julia> x
10

Variable names may contain Unicode characters and may start with _.

julia> 測試=100
100

julia> 測試
100

julia> _ab=40
40

pi is a built-in constant; do not reuse the name for your own variables.

julia> pi
π = 3.1415926535897...

Naming Conventions

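- Variable names are usually lowercase.
- Underscores may be used in names, but are discouraged.
- Function and macro names are lowercase.
- Module and type names start with a capital letter, and each word in the name is capitalized (Upper Camel Case).
- Functions that modify their arguments (the input data) end with !.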

a = Int64[]
push!(a, 1)    # => [1]
push!(a, 2)    # => [1,2]
push!(a, 3)    # => [1,2,3]
pop!(a)  # => 3
a # => [1,2]
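
The ! convention applies to your own functions as well. A small sketch, with doubled and double! as made-up names:

```julia
# Non-mutating: returns a new array, the input is untouched.
doubled(v) = [2x for x in v]

# Mutating: overwrites the input in place, hence the trailing !.
function double!(v)
    for i in eachindex(v)
        v[i] *= 2
    end
    return v
end

a = [1, 2, 3]
b = doubled(a)   # b is [2, 4, 6], a is still [1, 2, 3]
double!(a)       # a is now [2, 4, 6]
println(a, " ", b)
```

The standard library follows the same pattern, for example sort (returns a sorted copy) and sort! (sorts in place).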

typeof() reports the data type of a value:

julia> typeof(a)
Array{Int64,1}

Numeric literals may contain _ as a digit separator to improve readability:

julia> 100_000_000
100000000

julia> 1_0_0
100

Integers, bits, bytes, and bools

primitive numeric types

Type	number of bits	smallest value	largest value
Int8	8	-2^7	2^7-1
UInt8	8	0	2^8-1
Int16	16	-2^15	2^15-1
UInt16	16	0	2^16-1
Int32	32	-2^31	2^31-1
UInt32	32	0	2^32-1
Int64	64	-2^63	2^63-1
UInt64	64	0	2^64-1
Int128	128	-2^127	2^127-1
UInt128	128	0	2^128-1
Bool	8	false(0)	true(1)

typemax and typemin return the largest and smallest value of a type:

julia> typemax(Int32)
2147483647

julia> typemin(Int32)
-2147483648

On a 64-bit machine, integer literals default to Int64. Sys.WORD_SIZE reports the machine's word size in bits.

julia> typeof(10)
Int64

julia> Sys.WORD_SIZE
64

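Because Julia is a strongly typed language, the result of integer arithmetic keeps the type of its operands. When an overflow occurs, the value wraps around without raising an error, so the answer can be wrong: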

julia> x = Int16(10000)
10000

julia> println(x*x)
-7936

julia> x=typemax(Int16)
32767

julia> x+Int16(1)
-32768
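
If wrap-around is not acceptable, two options are to widen the operands first or to use checked arithmetic. The sketch below uses widemul and Base.Checked.checked_mul; which one fits depends on the program.

```julia
using Base.Checked: checked_mul

x = Int16(10000)

wrapped = x * x          # Int16 arithmetic wraps around: -7936
wide = widemul(x, x)     # operands are widened to Int32 first: 100000000

# checked_mul throws an OverflowError instead of wrapping.
overflowed = try
    checked_mul(x, x)
    false
catch err
    err isa OverflowError
end

println(wrapped, " ", wide, " ", overflowed)   # -7936 100000000 true
```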

A Bool is either true or false. Unlike in some other languages, 0, null values, and the empty string cannot be used in place of false:

julia> 1>2
false

julia> typeof(ans)
Bool

julia> if 0
       println("hello")
       end
ERROR: TypeError: non-boolean (Int64) used in boolean context
Stacktrace:
 [1] top-level scope at none:0
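
In practice this means every condition has to be written as an explicit comparison or predicate. A minimal sketch (the variable names are arbitrary):

```julia
n = 0
s = ""
r = nothing

# Each condition evaluates to a Bool explicitly.
if n == 0
    println("n is zero")
end
if isempty(s)
    println("s is empty")
end
if r === nothing
    println("r is nothing")
end
```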

Floating point number in Julia

type	precision	number of bits
Float16	half	16
Float32	single	32
Float64	double	64

julia> 100.0
100.0

julia> 24.
24.0

julia> .1
0.1

julia> typeof(ans)
Float64

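Floating-point numbers have both a positive and a negative zero. They compare equal, but their bit patterns differ in the sign bit:

julia> 0.0 == -0.0
true

julia> bitstring(0.0)
"0000000000000000000000000000000000000000000000000000000000000000"

julia> bitstring(-0.0)
"1000000000000000000000000000000000000000000000000000000000000000"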

Exponential notation is supported. An e exponent produces a Float64 and an f exponent produces a Float32:

julia> 2.99e8
2.99e8

julia> 2.99e8 > 99999
true

julia> 2.99f8
2.99f8

julia> 2.99e8 == 2.99f8
true

julia> typeof(2.99e8)
Float64

julia> typeof(2.99f8)
Float32

Hexadecimal floating-point literals are also supported, but only as Float64:

julia> 0x4.1p1
8.125

julia> typeof(ans)
Float64

Special values: Inf, -Inf, and NaN

julia> 1/0
Inf

julia> -1/0
-Inf

julia> 0/0
NaN

julia> Inf/Inf
NaN

nextfloat() and prevfloat() return the next and the previous representable floating-point number:

julia> nextfloat(0.0)
5.0e-324

julia> prevfloat(0.0)
-5.0e-324

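Sometimes a computed result differs from what you expect. This happens when a number has no exact floating-point representation and has to be rounded to a representable value. The default rounding mode is RoundNearest.

julia> 0.1+0.2
0.30000000000000004

julia> 1.1+0.1
1.2000000000000002

This is because numbers such as 0.1, 0.2, and 0.3 cannot be represented exactly in binary floating point.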
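
A common way to cope with this is to compare floats approximately, or to use exact rationals where that is practical. A short sketch:

```julia
a = 0.1 + 0.2

println(a == 0.3)                  # false: exact comparison fails
println(isapprox(a, 0.3))          # true: comparison within a tolerance
println(a ≈ 0.3)                   # true: ≈ is the operator form of isapprox
println(1//10 + 2//10 == 3//10)    # true: Rational arithmetic is exact
println(eps(Float64))              # 2.220446049250313e-16, spacing of floats near 1.0
```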

ref: Strange behaviour when adding floating numbers

ref: What Every Programmer Should Know About Floating-Point Arithmetic

BigInt, BigFloat

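Julia supports arbitrary-precision arithmetic. It wraps the GNU Multiple Precision Arithmetic Library (GMP) and the GNU MPFR Library, and exposes them as BigInt and BigFloat for arbitrary-precision integers and floating-point numbers.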

julia> setrounding(BigFloat, RoundUp) do
                  BigFloat(1) + parse(BigFloat, "0.1")
              end
1.100000000000000000000000000000000000000000000000000000000000000000000000000003

julia> setrounding(BigFloat, RoundDown) do
                  BigFloat(1) + parse(BigFloat, "0.1")
              end
1.099999999999999999999999999999999999999999999999999999999999999999999999999986

julia> setprecision(40) do
                  BigFloat(1) + parse(BigFloat, "0.1")
              end
1.1000000000004
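
The examples above cover BigFloat. For BigInt, a small sketch of where it helps, using big() to promote ordinary values:

```julia
println(typemax(Int64) + 1 == typemin(Int64))   # true: Int64 wraps around
println(big(typemax(Int64)) + 1)                # 9223372036854775808
println(big(2)^100)                             # 1267650600228229401496703205376
println(typeof(big(2)^100))                     # BigInt
println(typeof(big(1.5)))                       # BigFloat
```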

When writing mathematical expressions, the * can be omitted between a numeric literal and a variable:

julia> x=4;y=5
5

julia> 3x+4y+5
37
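
Implicit multiplication binds more tightly than an explicit * or /, which can be surprising. A few cases worth checking, as a sketch:

```julia
x = 4

println(2x^2)      # 32: parsed as 2 * (x^2)
println(2(x + 1))  # 10: a literal in front of parentheses also multiplies
println(1 / 2x)    # 0.125: parsed as 1 / (2x), not (1 / 2) * x

# Note: x(x + 1) would be parsed as a function call, not a product.
```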

Logical and arithmetic operations

+, -, *, /, ^, !, and %

julia> a=20;b=10
10

julia> a+b
30

julia> -a
-20

julia> !(4>2)
false

julia> -(-a)
20

bitwise operations

expression	name
~x	bitwise not
x & y	bitwise and
x | y	bitwise or
x ⊻ y	bitwise xor
x >>> y	logical shift right
x >> y	arithmetic shift right
x << y	logical/arithmetic shift left
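
A quick sketch of the operators in the table, using small integers so the bit patterns are easy to follow:

```julia
a, b = 12, 10          # 0b1100 and 0b1010

println(a & b)         # 8   (0b1000)
println(a | b)         # 14  (0b1110)
println(a ⊻ b)         # 6   (0b0110); xor(a, b) is the ASCII spelling
println(~a)            # -13
println(1 << 4)        # 16
println(-16 >> 2)      # -4: the sign bit is preserved
println(Int64(-16) >>> 2)   # 4611686018427387900: zeros are shifted in
```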

+=, -=, *=, /=

julia> x=4
4

julia> x+=10
14

julia> x/=2
7.0

==, !=, <, <=, >, >=

julia> 100 > 99.99
true

julia> 24 == 24.0
true

julia> 24 === 24.0
false

julia> 24 !== 24.0
true

julia> NaN == NaN
false

julia> NaN === NaN
true

julia> Inf == Inf
true

julia> Inf >= NaN
false

Comparisons can be chained:

julia> 10 > 20 < 30 >= 30.0 == 100 > 101
false
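
A chain is true only if every adjacent comparison is true, which makes range checks read naturally. A sketch, with in_range as a made-up helper:

```julia
# Equivalent to (0 <= x) && (x < 10).
in_range(x) = 0 <= x < 10

println(in_range(5))      # true
println(in_range(10))     # false
println(1 < 2 <= 2 < 3)   # true
```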

Operator precedence, from highest to lowest:

Syntax (. followed by ::)
Exponentiation (^)
Bitshifts (<<, >>, and >>>)
Fractions (//)
Multiplication (*, /, %, &, and \)
Addition (+, -, |, and ⊻)
Syntax (:, .., and |>)
Comparisons (>, <, >=, <=, ==, ===, !=, !==, and <:)
Control flow (&& followed by || followed by ?)
Assignments (=, +=, -=, *=, /=, //=, \=, ^=, ÷=, %=, |=, &=, ⊻=, <<=, >>=, and >>>=)
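
A few expressions whose results follow from these rules. This is only a sketch; when in doubt, parentheses make the intent explicit.

```julia
println(2 + 3 * 4)             # 14: multiplication before addition
println((2 + 3) * 4)           # 20: parentheses override precedence
println(-2^2)                  # -4: exponentiation before unary minus
println(2^3^2)                 # 512: ^ is right-associative, 2^(3^2)
println(1 << 2 + 1)            # 5: bitshift before addition, unlike C
println(1 + 2 > 2 && 3 < 4)    # true: arithmetic, then comparison, then &&
```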

Type conversion

julia> Int8(100)
100

julia> Int8(100*10)
ERROR: InexactError: trunc(Int8, 1000)
Stacktrace:
 [1] throw_inexacterror(::Symbol, ::Any, ::Int64) at ./boot.jl:567
 [2] checked_trunc_sint at ./boot.jl:589 [inlined]
 [3] toInt8 at ./boot.jl:604 [inlined]
 [4] Int8(::Int64) at ./boot.jl:714
 [5] top-level scope at none:0

julia> Int16(100*10)
1000
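
Constructors such as Int8(...) only succeed when the value fits exactly. Other conversions have to be requested explicitly, as in this sketch:

```julia
println(Int(3.0))           # 3: exact, so the conversion succeeds
println(round(Int, 3.6))    # 4: round to the nearest integer
println(trunc(Int, 3.9))    # 3: drop the fractional part
println(Float64(7))         # 7.0
println(parse(Int, "42"))   # 42: strings are parsed, not converted
println(1000 % Int8)        # -24: keep only the low 8 bits (wraps)

# Int(3.5) is not exact and throws an InexactError.
inexact = try
    Int(3.5)
    false
catch err
    err isa InexactError
end
println(inexact)            # true
```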

Arrays and matrices

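Array indices start at 1, not 0. Every array has a single element type; when numeric values of different types are mixed in a literal, they are promoted to a common type.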

julia> a=[1,2,3,4]
4-element Array{Int64,1}:
 1
 2
 3
 4

julia> a[2]
2

julia> a[2:4]
3-element Array{Int64,1}:
 2
 3
 4

julia> ra=rand(1:10, 6)
6-element Array{Int64,1}:
  5
  2
  6
 10
  1
  5

julia> b=[1,2,2.5]
3-element Array{Float64,1}:
 1.0
 2.0
 2.5
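
A few more indexing and element-type checks, as a sketch. The Int64 results assume a 64-bit machine.

```julia
a = [10, 20, 30, 40]

println(a[1])                  # 10: the first element
println(a[end])                # 40: end is the last index
println(a[end-1])              # 30
println(length(a))             # 4
println(eltype(a))             # Int64
println(eltype([1, 2, 2.5]))   # Float64: the integers were promoted
println(eltype([1, "two"]))    # Any: no common concrete type

# a[0] throws a BoundsError because indexing is 1-based.
```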

List Comprehension in Julia

julia> pow2 = Array{Int64}(undef, 10)
10-element Array{Int64,1}:
 4634856432
 4634856352
          1
 4634856512
 4634856592
 4634856672
          8
          4
          2
          1

julia> pow2[1] = 2
2

julia> [pow2[i] = 2^i for i = 2:length(pow2)]; pow2
10-element Array{Int64,1}:
    2
    4
    8
   16
   32
   64
  128
  256
  512
 1024
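
The example above fills a preallocated array through the side effect of a comprehension. A comprehension can also build the array directly, as sketched here (the names are arbitrary):

```julia
pow2_direct = [2^i for i in 1:10]             # 2, 4, 8, ..., 1024
evens = [i for i in 1:10 if iseven(i)]        # with a filter
table = [i * j for i in 1:3, j in 1:3]        # two ranges give a 3x3 matrix

println(pow2_direct[10])   # 1024
println(evens)             # [2, 4, 6, 8, 10]
println(table[2, 3])       # 6
```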

Creating an empty array with no elements:

julia> empty_array = Float64[]
0-element Array{Float64,1}

julia> println(empty_array)
Float64[]

julia> push!(empty_array,1.1)
1-element Array{Float64,1}:
 1.1

julia> push!(empty_array,2.2,3.3)
3-element Array{Float64,1}:
 1.1
 2.2
 3.3

julia> append!(empty_array,[101.1,202.2,303.3])
6-element Array{Float64,1}:
   1.1
   2.2
   3.3
 101.1
 202.2
 303.3
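
push! and append! have counterparts for the front of the array and for arbitrary positions. A brief sketch:

```julia
v = [1.1, 2.2, 3.3]

pushfirst!(v, 0.0)       # v is [0.0, 1.1, 2.2, 3.3]
tail = pop!(v)           # removes and returns the last element, 3.3
head = popfirst!(v)      # removes and returns the first element, 0.0
insert!(v, 2, 1.5)       # v is [1.1, 1.5, 2.2]
deleteat!(v, 1)          # v is [1.5, 2.2]

println(v, " ", tail, " ", head)
```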

Creating a 4x1 matrix:

julia> X = Array{Int64}(undef, 4,1)
4×1 Array{Int64,2}:
 4371931344
 4367076560
 4496885520
 4405372192

julia> fill!(X,4)
4×1 Array{Int64,2}:
 4
 4
 4
 4

julia> X[2] = 10; X
4×1 Array{Int64,2}:
  4
 10
  4
  4
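
Allocating with undef leaves arbitrary values in memory, as the first output shows. When the initial value is known, zeros, ones, or fill avoid the separate fill! step. A sketch:

```julia
Z = zeros(Int, 4, 1)     # 4x1 matrix of 0
F = fill(4, 4, 1)        # 4x1 matrix of 4
W = ones(4, 1)           # 4x1 matrix of 1.0 (Float64 by default)

F[2] = 10                # linear index 2 is the same element as F[2, 1]

println(size(F))         # (4, 1)
println(F[2, 1])         # 10
println(eltype(W))       # Float64
```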

Matrices

# Create a 3x2 matrix
julia> A = [2 4; 8 16; 32 64]
3×2 Array{Int64,2}:
  2   4
  8  16
 32  64

# reshape only changes the dimensions and keeps the column-major element order, so the 2x3 result is not the transpose
julia> println(reshape(A,2,3))
[2 32 16; 8 4 64]

julia> println(reshape(A,1,6))
[2 8 32 4 16 64]

# transpose returns the transposed matrix (as a lazy wrapper around A)
julia> transpose(A)
2×3 LinearAlgebra.Transpose{Int64,Array{Int64,2}}:
 2   8  32
 4  16  64
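
To make the difference between reshape and transpose concrete, the sketch below compares the two. permutedims is used here as the eager counterpart that returns a plain Array.

```julia
using LinearAlgebra

A = [2 4; 8 16; 32 64]

R = reshape(A, 2, 3)     # same elements in column-major order, new shape
T = permutedims(A)       # rows and columns swapped, as a new Array

println(vec(A))              # [2, 8, 32, 4, 16, 64]: the storage order
println(R == T)              # false: reshape is not a transpose
println(T == transpose(A))   # true
println(T isa Array)         # true: no lazy wrapper
```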

Matrix addition and multiplication

julia> B = [1 1 2; 3 5 8]
2×3 Array{Int64,2}:
 1  1  2
 3  5  8

julia> transpose(A)+B
2×3 Array{Int64,2}:
 3   9  34
 7  21  72

julia> transpose(A)*transpose(B)
2×2 Array{Int64,2}:
  74  302
 148  604

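The dotted operator .* multiplies corresponding elements (element-wise multiplication) instead of performing standard matrix multiplication. Other operators, such as ==, can be dotted in the same way.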

julia> transpose(A).*B
2×3 Array{Int64,2}:
  2   8   64
 12  80  512

julia> transpose(A) .== B
2×3 BitArray{2}:
 false  false  false
 false  false  false
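
The dot syntax works for any operator or function, not just * and ==. A sketch on a small matrix:

```julia
M = [1 2; 3 4]

println(M * M)     # [7 10; 15 22]: matrix product
println(M .* M)    # [1 4; 9 16]: element-wise product
println(M .+ 1)    # [2 3; 4 5]: the scalar is applied to every element
println(M .^ 2)    # [1 4; 9 16]
println(sqrt.([1.0 4.0; 9.0 16.0]))   # [1.0 2.0; 3.0 4.0]: f.(x) applies f element-wise
```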

rand can generate arrays of random numbers, here a 3x3x3 array:

julia> multiA = rand(3,3,3)
3×3×3 Array{Float64,3}:

Sparse matrix: a matrix in which most elements are 0.

julia> using SparseArrays

julia> sm = spzeros(5,5)
5×5 SparseMatrixCSC{Float64,Int64} with 0 stored entries

julia> sm[1,1] = 10
10

julia> sm
5×5 SparseMatrixCSC{Float64,Int64} with 1 stored entry:
  [1, 1]  =  10.0
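
A sparse matrix can also be built in one call from row indices, column indices, and values. A sketch using the sparse and nnz functions from SparseArrays:

```julia
using SparseArrays

# Rows, columns, values, then the overall size (5x5).
S = sparse([1, 2, 3], [1, 2, 3], [10.0, 20.0, 30.0], 5, 5)

println(nnz(S))             # 3: number of stored entries
println(S[2, 2])            # 20.0
println(S[4, 4])            # 0.0: entries that are not stored read as zero
println(size(Matrix(S)))    # (5, 5): Matrix(S) converts to a dense array
```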

Understanding DataFrames

A DataFrame is a two-dimensional data structure with labeled columns, much like a SQL table or a spreadsheet. It can be thought of as a list of dictionaries.
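
To illustrate the "list of dictionaries" view without any package, the sketch below models two rows as named tuples in plain Julia. This is only an analogy and not the DataFrames.jl API; rows, columns, and newest are made-up names.

```julia
# Row-oriented view: one record per row, fields labeled by name.
rows = [(Name = "Julia", Version = 0.5), (Name = "Python", Version = 3.6)]

# Column-oriented view: one labeled vector per column.
columns = (Name = [r.Name for r in rows], Version = [r.Version for r in rows])

# Selecting rows by a condition on a column.
newest = filter(r -> r.Version > 1.0, rows)

println(columns.Version)   # [0.5, 3.6]
println(newest[1].Name)    # Python
```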

DataFrames are provided by the DataFrames.jl package, which is recommended for statistical analysis.

Since Julia 1.0, DataArray can no longer be used; missing values are represented with missing instead.
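
A brief sketch of how missing behaves in computations: it propagates through arithmetic, so it usually has to be skipped or replaced explicitly.

```julia
x = [1, missing, 3]              # element type is Union{Missing, Int}

println(missing + 1)             # missing: operations on missing propagate
println(ismissing(x[2]))         # true
println(ismissing(sum(x)))       # true: one missing value taints the sum
println(sum(skipmissing(x)))     # 4: skip the missing entries
println(coalesce.(x, 0))         # [1, 0, 3]: replace missing with a default
```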

# The array must be declared up front as allowing both element types
julia> x = Union{Missing, String}["a", "b"]
2-element Array{Union{Missing, String},1}:
 "a"
 "b"

# An element can then be set to missing
julia> x[1] = missing
missing

julia> x
2-element Array{Union{Missing, String},1}:
 missing
 "b"

julia> using Pkg

julia> Pkg.add("DataFrames")

julia> using DataFrames

julia> df = DataFrame(Name = ["Julia", "Python"], Version = [0.5, 3.6])
2×2 DataFrame
│ Row │ Name   │ Version │
│     │ String │ Float64 │
├─────┼────────┼─────────┤
│ 1   │ Julia  │ 0.5     │
│ 2   │ Python │ 3.6     │

References

Learning Julia

cctg

2019/5/6
