<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>bits_by_seng.log</title>
        <link>https://velog.io/</link>
        <description>An Aspiring Back-end Developer</description>
        <lastBuildDate>Sun, 12 Nov 2023 03:12:34 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>bits_by_seng.log</title>
            <url>https://velog.velcdn.com/images/bits_by_seng/profile/37a9583f-3d25-41eb-b9e7-3df39fe5424f/image.png</url>
            <link>https://velog.io/</link>
        </image>
        <copyright>Copyright (C) 2019. bits_by_seng.log. All rights reserved.</copyright>
        <atom:link href="https://v2.velog.io/rss/bits_by_seng" rel="self" type="application/rss+xml"/>
        <item>
            <title><![CDATA[[Basic Stats] 01.Hypothesis Testing]]></title>
            <link>https://velog.io/@bits_by_seng/Basic-Stats-01.Hypothesis-Testing</link>
            <guid>https://velog.io/@bits_by_seng/Basic-Stats-01.Hypothesis-Testing</guid>
            <pubDate>Sun, 12 Nov 2023 03:12:34 GMT</pubDate>
            <description><![CDATA[<p>This series will be followed by a more advanced series on mathematical statistics, which I&#39;m still getting the hang of!</p>
<p>This post is meant to be a refresher on the topic.
If this is your first time with these topics, please check out more thorough materials.</p>
<h2 id="hypothesis-testing">Hypothesis Testing</h2>
<h3 id="terminology">Terminology</h3>
<p><strong>Hypothesis:</strong> A statement about something we are trying to investigate, normally based on a belief about the population.</p>
<p><strong>Null Hypothesis ($H_{0}$):</strong> The original statement about the population that we want to test.</p>
<p><strong>Alternative Hypothesis ($H_{a}$):</strong> The statement that contradicts the null hypothesis; it is the one we side with when the data lead us to reject $H_{0}$.</p>
<p><strong>Type I Error:</strong> Rejecting the null hypothesis although it is true. The probability of a Type I error is the significance level $a$.</p>
<p><strong>Type II Error:</strong> Failing to reject $H_{0}$ although it is false. Its probability is usually written $\beta$.</p>
<p><strong>P-Value:</strong> The probability, assuming $H_{0}$ is true, of observing a test statistic at least as extreme as the one actually observed. It is a number between 0 and 1, and we reject $H_{0}$ when the p-value is at most $a$.</p>
<h3 id="two-side-test-vs-one-side-test">Two-sided Test vs. One-sided Test</h3>
<p><img src="https://velog.velcdn.com/images/bits_by_seng/post/6c6a60f8-80f9-4892-99a0-79afdf92b728/image.png" alt=""></p>
<table>
<thead>
<tr>
<th align="center"></th>
<th align="center">Two-tailed test</th>
<th align="center">Left-tailed test</th>
<th align="center">Right-tailed test</th>
</tr>
</thead>
<tbody><tr>
<td align="center">Sign in $H_{a}$ rejection region</td>
<td align="center">$\neq$</td>
<td align="center">&lt;</td>
<td align="center">&gt;</td>
</tr>
</tbody></table>
<h2 id="one-sample-hypothesis-test">One Sample Hypothesis Test</h2>
<h3 id="assumptions-about-the-population-mean---population-variance-is-known">Assumptions about the population mean - population variance is KNOWN</h3>
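<p>If we know the population variance, we can use the z-statistic regardless of the sample size, provided the population is normal (for large samples the central limit theorem makes it approximately valid anyway).</p>
<p>$a = 0.05$, $z_{0} = \frac{\bar{x} - \mu_{0}}{\sigma/\sqrt{n}}$, where $\mu_{0}$ is the population mean claimed by $H_{0}$</p>
<p><strong>a) Two-tailed: $|z_{0}| \geq z_{a/2}$ Reject $H_{0}$</strong><br>
<strong>b) Right-tailed: $z_{0} \geq z_{a}$ Reject $H_{0}$</strong><br>
<strong>c) Left-tailed: $z_{0} \leq -z_{a}$ Reject $H_{0}$</strong></p>
<p><strong>Otherwise we FAIL to reject $H_{0}$</strong></p>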
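<p>The two-tailed rule can be sketched with nothing but the Python standard library. This is a minimal illustration, not a full testing routine: the function name <code>one_sample_z_test</code> and the sample numbers are made up for the example, and the population standard deviation is assumed to be known.</p>
<pre><code class="language-python">from math import sqrt
from statistics import NormalDist

def one_sample_z_test(x_bar, mu_0, sigma, n, alpha=0.05):
    """Return (z_0, two-tailed critical value, two-tailed p-value)."""
    # Test statistic: how many standard errors x_bar lies from mu_0
    z_0 = (x_bar - mu_0) / (sigma / sqrt(n))
    # Critical value z_{a/2} of the standard normal distribution
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability of a statistic at least this extreme if H0 is true
    p_value = 2 * (1 - NormalDist().cdf(abs(z_0)))
    return z_0, z_crit, p_value

# Hypothetical numbers: sample mean 52 from n = 36, H0 mean 50, known sigma 6
z_0, z_crit, p_value = one_sample_z_test(52, 50, 6, 36)
print(round(z_0, 2), round(z_crit, 2), round(p_value, 4))</code></pre>
<p>Here $|z_{0}| = 2.0$ is larger than the critical value of about 1.96, so this made-up sample would lead us to reject $H_{0}$ at $a = 0.05$.</p>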
<h3 id="assumptions-about-the-population-mean---the-population-variance-is-unknown">Assumptions about the population mean - the population variance is UNKNOWN</h3>
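<p>$a = 0.05$, $t_{0} = \frac{\bar{x} - \mu_{0}}{s/\sqrt{n}} \sim t(n-1)$, where $s$ is the sample standard deviation and $\mu_{0}$ is the population mean claimed by $H_{0}$</p>
<p><strong>The t-distribution approaches the normal distribution as the degrees of freedom increase, so for $n&gt;30$ the z critical values are a common approximation.</strong></p>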
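<p>As a rough sketch, the statistic for the unknown-variance case can be computed from raw data with the standard library. The helper name <code>one_sample_t_statistic</code> and the sample values are hypothetical; the critical value still has to come from a $t(n-1)$ table or a statistics package.</p>
<pre><code class="language-python">from math import sqrt
from statistics import mean, stdev

def one_sample_t_statistic(sample, mu_0):
    """Return (t_0, degrees of freedom) for H0: population mean equals mu_0."""
    n = len(sample)
    # stdev() is the sample standard deviation s (it divides by n - 1)
    t_0 = (mean(sample) - mu_0) / (stdev(sample) / sqrt(n))
    return t_0, n - 1

# Hypothetical sample of four observations, H0 mean 3
t_0, dof = one_sample_t_statistic([2, 4, 4, 6], 3)
print(round(t_0, 3), dof)</code></pre>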
<h3 id="assumptions-about-the-population-proportion">Assumptions about the population proportion</h3>
<p><strong>Remember that the sampling distribution of $\hat{p}$ has a mean of $p$ and a standard deviation of $\sqrt{p(1-p)/n}$.</strong></p>
<p>$a = 0.05$, $z_{0} = \frac{\hat{p}-p_{0}}{\sqrt{p_{0}(1-p_{0})/n}}$, where $p_{0}$ is the proportion claimed by $H_{0}$</p>
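<p>The proportion test follows the same pattern. Below is a minimal right-tailed sketch ($H_{a}$: the true proportion is larger than the hypothesized one); the function name <code>one_sample_proportion_z</code> and the counts are invented for illustration.</p>
<pre><code class="language-python">from math import sqrt
from statistics import NormalDist

def one_sample_proportion_z(successes, n, p_0):
    """Return (p_hat, z_0, right-tailed p-value) for H0: p equals p_0."""
    p_hat = successes / n
    # The standard error uses p_0 because the test assumes H0 is true
    std_err = sqrt(p_0 * (1 - p_0) / n)
    z_0 = (p_hat - p_0) / std_err
    p_value = 1 - NormalDist().cdf(z_0)
    return p_hat, z_0, p_value

# Hypothetical numbers: 60 successes in 100 trials, H0 proportion 0.5
p_hat, z_0, p_value = one_sample_proportion_z(60, 100, 0.5)
print(round(p_hat, 2), round(z_0, 2), round(p_value, 4))</code></pre>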
]]></description>
        </item>
        <item>
            <title><![CDATA[[ML] Data Preprocessing📚]]></title>
            <link>https://velog.io/@bits_by_seng/ML-Data-Preprocessing</link>
            <guid>https://velog.io/@bits_by_seng/ML-Data-Preprocessing</guid>
            <pubDate>Sun, 05 Nov 2023 05:12:06 GMT</pubDate>
            <description><![CDATA[<h2 id="assigning-numbers-to-non-number-values-using-label-encoder">Assigning numbers to non-number values using Label Encoder</h2>
<pre><code class="language-python">import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({&#39;A&#39;: [&#39;a&#39;, &#39;b&#39;, &#39;c&#39;, &#39;a&#39;,&#39;b&#39;],
                   &#39;B&#39;: [1, 2, 3, 1, 0]})

# Map each distinct string in column A to an integer code
le = LabelEncoder()
df[&#39;le_A&#39;] = le.fit_transform(df[&#39;A&#39;])
df</code></pre>
<blockquote>
<p><img src="https://velog.velcdn.com/images/bits_by_seng/post/7b1a482e-3b65-4351-a2cb-d43b64d3c148/image.png" alt=""></p>
</blockquote>
<p>We can also map the integer codes back to the original strings:</p>
<pre><code class="language-python">le.inverse_transform(df[&#39;le_A&#39;])</code></pre>
<h2 id="feature-scaling">Feature Scaling</h2>
<ul>
<li>Feature scaling puts features on comparable ranges, which helps gradient descent converge faster</li>
</ul>
<h3 id="min-max-scaling">Min-max Scaling</h3>
<h2 id="x--fracx-minxmaxx---minx">$x&#39; = \frac{x-min(x)}{max(x) - min(x)}$</h2>
<pre><code class="language-python">df = pd.DataFrame({
    &#39;A&#39;: [10, 20, -10, 0, 25],
    &#39;B&#39;: [1, 2, 3, 1, 0]
})

from sklearn.preprocessing import MinMaxScaler
mms = MinMaxScaler()
mms.fit(df)

df_mms = mms.transform(df)</code></pre>
<blockquote>
<p><img src="https://velog.velcdn.com/images/bits_by_seng/post/0fd27e59-bc52-4614-af66-b5a440c18102/image.png" alt=""></p>
</blockquote>
<h3 id="standard-scaler-z-score">Standard Scaler (Z-score)</h3>
<h2 id="x--fracx---musigma">$x&#39; = \frac{x - \mu}{\sigma}$</h2>
<pre><code class="language-python">from sklearn.preprocessing import StandardScaler

ss = StandardScaler()
ss.fit(df)
df_ss = ss.transform(df)</code></pre>
<blockquote>
<p><img src="https://velog.velcdn.com/images/bits_by_seng/post/eadf2560-648f-4131-a9aa-0c52fa253fea/image.png" alt=""></p>
</blockquote>
<h3 id="robust-scaler">Robust Scaler</h3>
<h2 id="x--fracx---q2q3---q1">$x&#39; = \frac{x - Q2}{Q3 - Q1}$</h2>
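<pre><code class="language-python">from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler

mm = MinMaxScaler()
ss = StandardScaler()
rs = RobustScaler() 

# Scale column A only: fit_transform returns an (n, 1) array,
# and ravel() flattens it so it fits into a single new column
df_scaler = df.copy()
df_scaler[&#39;MinMax&#39;] = mm.fit_transform(df[[&#39;A&#39;]]).ravel()
df_scaler[&#39;Standard&#39;] = ss.fit_transform(df[[&#39;A&#39;]]).ravel()
df_scaler[&#39;Robust&#39;] = rs.fit_transform(df[[&#39;A&#39;]]).ravel()

df_scaler</code></pre>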
<blockquote>
<p><img src="https://velog.velcdn.com/images/bits_by_seng/post/e640139b-37b4-43fc-860e-a433b8369116/image.png" alt=""></p>
</blockquote>
<p>In general there is very little difference in performance between the MinMaxScaler and the StandardScaler. The RobustScaler is more &#39;robust&#39; against outliers because it centers on the median and scales by the interquartile range, both of which are barely affected by extreme values. If I am using ReLU as the activation function, for instance, I would lean towards the Min-Max Scaler, as this yields feature values in [0, 1].</p>
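<p>To see what the three formulas do, here is a small standard-library sketch that applies them by hand. The helper names <code>min_max</code>, <code>standard</code> and <code>robust</code> and the appended outlier are assumptions made for the example; the quartiles use linear interpolation between data points, which I assume matches the scikit-learn default.</p>
<pre><code class="language-python">from statistics import mean, pstdev, quantiles

def min_max(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standard(xs):
    # Population standard deviation, as in the z-score formula
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]

def robust(xs):
    # Inclusive quartiles interpolate linearly between data points
    q1, q2, q3 = quantiles(xs, n=4, method="inclusive")
    return [(x - q2) / (q3 - q1) for x in xs]

a = [10, 20, -10, 0, 25]          # same values as column A in the scaler examples
with_outlier = a + [1000]         # hypothetical outlier appended

print(min_max(a))
print(robust(a))
print(min_max(with_outlier))
print(robust(with_outlier))</code></pre>
<p>With the outlier appended, min-max scaling squeezes the five original values into a narrow band near 0, while the robust version keeps them spread out around the median.</p>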
<h2 id="creating-a-pipeline">Creating a Pipeline</h2>
<pre><code class="language-python">import pandas as pd
red_url = &quot;https://raw.githubusercontent.com/PinkWink/ML_tutorial/master/dataset/winequality-red.csv&quot;
white_url = &quot;https://raw.githubusercontent.com/PinkWink/ML_tutorial/master/dataset/winequality-white.csv&quot;

red_wine = pd.read_csv(red_url, sep=&quot;;&quot;)
white_wine = pd.read_csv(white_url, sep=&quot;;&quot;)

red_wine[&#39;color&#39;] = 1 
white_wine[&#39;color&#39;] = 0 

wine = pd.concat([red_wine, white_wine])

X = wine.drop([&#39;color&#39;], axis = 1)
y = wine[&#39;color&#39;]

wine.head()</code></pre>
<blockquote>
<p><img src="https://velog.velcdn.com/images/bits_by_seng/post/a61be309-0981-48e8-ac00-f18cd2bfe2cd/image.png" alt=""></p>
</blockquote>
<pre><code class="language-python">from sklearn.pipeline import Pipeline 
from sklearn.tree import DecisionTreeClassifier 
from sklearn.preprocessing import StandardScaler 

estimators = [
    (&#39;scaler&#39;, StandardScaler()), 
    (&#39;clf&#39;, DecisionTreeClassifier())
]

pipe = Pipeline(estimators)

pipe.steps</code></pre>
<blockquote>
<p>[(&#39;scaler&#39;, StandardScaler()), (&#39;clf&#39;, DecisionTreeClassifier())]</p>
</blockquote>
<pre><code class="language-python"># Parameters of a step are addressed as &lt;step name&gt;__&lt;parameter&gt; (double underscore)
pipe.set_params(clf__max_depth=2)
pipe.set_params(clf__random_state=13)</code></pre>
<blockquote>
<p><img src="https://velog.velcdn.com/images/bits_by_seng/post/bd43dc2c-f266-4b63-8c20-37b890d6fe00/image.png" alt=""></p>
</blockquote>
]]></description>
        </item>
        <item>
            <title><![CDATA[[EDA/Python] Playing with Pandas 📊🐼 Part 2]]></title>
            <link>https://velog.io/@bits_by_seng/EDAPython-Playing-with-Pandas-2%ED%8E%B8</link>
            <guid>https://velog.io/@bits_by_seng/EDAPython-Playing-with-Pandas-2%ED%8E%B8</guid>
            <pubDate>Sat, 21 Oct 2023 09:36:14 GMT</pubDate>
            <description><![CDATA[<h3 id="applying-functions-to-dataframes">Applying Functions to Dataframes</h3>
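<p>To be honest, I am more comfortable in English. Since I write this blog for my own learning, I will simply write in English whenever the Korean word does not come to mind. 🐼</p>
<p>With apply() we can apply a function to a DataFrame or a Series. </p>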
<p>&quot;Dataframe C&quot;
<img src="https://velog.velcdn.com/images/bits_by_seng/post/ef5547ea-6470-49ba-b3b6-d44dbb0c08d9/image.png" alt=""></p>
<pre><code class="language-python">display(C)
G = C.copy() # Without the copy, the original DataFrame C would be modified
G[&#39;year&#39;] = G[&#39;year&#39;].apply(lambda x: &quot;&#39;{:02d}&quot;.format(x % 100)) 
display(G) </code></pre>
<p>&quot;Dataframe G&quot;
<img src="https://velog.velcdn.com/images/bits_by_seng/post/8ca7adb7-a693-4fc9-832c-3cb8836af1b5/image.png" alt=""></p>
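<p>That is how it is used. From here it gets considerably more elaborate, and more useful. </p>
<h3 id="두-열에-대한-연산을-통해-새로운-열-생성하기">Creating a New Column from an Operation on Two Columns</h3>
<p><img src="https://velog.velcdn.com/images/bits_by_seng/post/e60c864f-0215-48e0-bd1a-5ac475356a07/image.png" alt="">
First, keep the directions of axis = 0 and axis = 1 in mind: with apply(), axis = 0 passes each column to the function and axis = 1 passes each row. </p>
<pre><code class="language-python">G[&#39;prevalence&#39;] = G[&#39;cases&#39;] / G[&#39;population&#39;] 
</code></pre>
<p>The line above is of course the simplest way, but let&#39;s also write a function that uses apply(). </p>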
<pre><code class="language-python">def calc_prevalence(G):
    assert &#39;cases&#39; in G.columns and &#39;population&#39; in G.columns
    F = G.copy()
    F[&#39;prevalence&#39;] = F.apply(lambda row : row[&#39;cases&#39;]/row[&#39;population&#39;], axis=1)

    return F
display(calc_prevalence(G))</code></pre>
]]></description>
        </item>
        <item>
            <title><![CDATA[[EDA/Python] Playing with Pandas 📊🐼 Part 1]]></title>
            <link>https://velog.io/@bits_by_seng/EDAPython-Playing-with-Pandas-1%ED%8E%B8</link>
            <guid>https://velog.io/@bits_by_seng/EDAPython-Playing-with-Pandas-1%ED%8E%B8</guid>
            <pubDate>Sat, 21 Oct 2023 09:14:47 GMT</pubDate>
            <description><![CDATA[<p>🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼🐼</p>
<h3 id="tidy-data">Tidy Data</h3>
<p>Tidy data means that the data is in a form that suits its purpose. According to Hadley Wickham, statistician and R programming expert, <strong>tidy data is a 2-D table that satisfies the following conditions:</strong></p>
<blockquote>
<p><em>1. each column represents a variable; 
2. each row represents an observation; 
3. each entry of the table represents a single value, which may come from either categorical(discrete) or continuous spaces.</em></p>
</blockquote>
<p><img src="https://velog.velcdn.com/images/bits_by_seng/post/dde5b991-6c55-49ce-afad-701710e897a4/image.png" alt="">
<strong><em>A &#39;tidy&#39; table is sometimes also called a &#39;tibble&#39;</em></strong></p>
<pre><code class="language-python">import pandas as pd 
from io import StringIO
from IPython.display import display            # handy for showing plots and DataFrames

A_csv = &quot;&quot;&quot;country,year,cases
Afghanistan,1999,745
Brazil,1999,37737
China,1999,212258
Afghanistan,2000,2666
Brazil,2000,80488
China,2000,213766&quot;&quot;&quot;

with StringIO(A_csv) as fp:
    A = pd.read_csv(fp)
print(&quot;=== A ===&quot;)
display(A)</code></pre>
<blockquote>
<p><img src="https://velog.velcdn.com/images/bits_by_seng/post/180844f2-7be6-458d-a16d-7ce4da1ee286/image.png" alt=""></p>
</blockquote>
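<pre><code class="language-python"># Second table B: population per country and year.
# The values are an assumption, taken from the `table1` example dataset in R&#39;s tidyr package.
B_csv = &quot;&quot;&quot;country,year,population
Afghanistan,1999,19987071
Brazil,1999,172006362
China,1999,1272915272
Afghanistan,2000,20595360
Brazil,2000,174504898
China,2000,1280428583&quot;&quot;&quot;

with StringIO(B_csv) as fp:
    B = pd.read_csv(fp)
print(&quot;\n=== B ===&quot;)
display(B)</code></pre>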
<blockquote>
<p><img src="https://velog.velcdn.com/images/bits_by_seng/post/3ccecd7e-e1aa-4954-9483-82e772b16a36/image.png" alt=""></p>
</blockquote>
<p>With the merge() function we can easily combine these two DataFrames. </p>
<pre><code class="language-python">C = A.merge(B, on=[&#39;country&#39;, &#39;year&#39;])
print(&quot;\n=== C = merge(A, B) ===&quot;)
display(C)</code></pre>
<blockquote>
<p><img src="https://velog.velcdn.com/images/bits_by_seng/post/8031b431-53ee-4ca8-9960-f99f7da80fe0/image.png" alt=""></p>
</blockquote>
<h3 id="joins">Joins</h3>
<p>Put simply: </p>
<ul>
<li>Inner-join(A,B) (default): keeps only the rows whose keys appear in both tables and drops the rest</li>
<li>Outer-join(A,B): keeps the union of both tables and fills the non-matching cells with NaN </li>
<li>Left-join(A,B): keeps every row of A, plus the matching rows of B</li>
<li>Right-join(A,B): the mirror image of the left join </li>
</ul>
<pre><code class="language-python">with StringIO(&quot;&quot;&quot;x,y,z
bug,1,d
rug,2,d
lug,3,d
mug,4,d&quot;&quot;&quot;) as fp:
    D = pd.read_csv(fp)
print(&quot;=== D ===&quot;)
display(D)

with StringIO(&quot;&quot;&quot;x,y,w
hug,-1,e
smug,-2,e
rug,-3,e
tug,-4,e
bug,1,e&quot;&quot;&quot;) as fp:
    E = pd.read_csv(fp)
print(&quot;\n=== E ===&quot;)
display(E)

print(&quot;\n=== Outer-join (D, E) ===&quot;)
display(D.merge(E, on=[&#39;x&#39;, &#39;y&#39;], how=&#39;outer&#39;))

print(&quot;\n=== Left-join (D, E) ===&quot;)
display(D.merge(E, on=[&#39;x&#39;, &#39;y&#39;], how=&#39;left&#39;))

print(&quot;\n=== Right-join (D, E) ===&quot;)
display(D.merge(E, on=[&#39;x&#39;, &#39;y&#39;], how=&#39;right&#39;))


print(&quot;\n=== Inner-join (D, E) ===&quot;)
display(D.merge(E, on=[&#39;x&#39;, &#39;y&#39;]))</code></pre>
<blockquote>
<p><img src="https://velog.velcdn.com/images/bits_by_seng/post/8210350a-377e-4d53-abfe-983020bfea69/image.png" alt="">
<img src="https://velog.velcdn.com/images/bits_by_seng/post/49dfe96c-5576-4693-ac46-12c53f731f5c/image.png" alt="">
<img src="https://velog.velcdn.com/images/bits_by_seng/post/e875944a-d2e7-42a6-a99d-398be7af18a2/image.png" alt=""></p>
</blockquote>
<p>Easy, right?</p>
]]></description>
        </item>
        <item>
            <title><![CDATA[[EDA/Python] Row Major v.s. Column Major 📊]]></title>
            <link>https://velog.io/@bits_by_seng/EDAPython-Row-Major-v.s.-Column-Major</link>
            <guid>https://velog.io/@bits_by_seng/EDAPython-Row-Major-v.s.-Column-Major</guid>
            <pubDate>Sat, 21 Oct 2023 03:43:26 GMT</pubDate>
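            <description><![CDATA[<h3 id="이것은-무엇인교">What Is This?</h3>
<p>When working with arrays of two or more dimensions, row-major versus column-major order is something to watch out for. </p>
<p>Whatever the dimensionality of an array, it is always stored in memory as a one-dimensional sequence. </p>
<p>So how can a 2-D array be flattened into 1-D? </p>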
<h3 id="row-major">Row-major</h3>
<p><img src="https://velog.velcdn.com/images/bits_by_seng/post/e7393636-7e4c-493c-a958-9e415f5aa2a8/image.png" alt=""></p>
<p>Row-major means <strong>storing the array one row at a time.</strong> </p>
<p>In other words, it is stored like this. </p>
<blockquote>
<p>[a11    a12    a13    a21    a22    a23    a31    a32    a33] </p>
</blockquote>
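<p>Let&#39;s write a function that converts the original 2-D index into an index of the 1-D row-major list! </p>
<p>m = number of rows 
n = number of columns 
i = row index (0-based)
j = column index (0-based) </p>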
<pre><code class="language-python">def linearize_rowmajor(i, j, m, n): # calculate `v`

    return i * n + j
</code></pre>
<p>Easy, right? </p>
<h3 id="column-major">Column-major</h3>
<p><img src="https://velog.velcdn.com/images/bits_by_seng/post/e4f6d22b-14fa-420a-b562-0c95de6ef198/image.png" alt="">
The same principle applies, so I will skip the explanation. 
<img src="https://velog.velcdn.com/images/bits_by_seng/post/83205983-c71d-4bba-b9b3-08cbef787914/image.png" alt=""></p>
<p>The column-major function looks like this. </p>
<pre><code class="language-python">def linearize_colmajor(i, j, m, n): # calculate `u`

    return i + (j*m)</code></pre>
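<p>A quick way to convince yourself that both formulas are right is to flatten a small matrix by hand and look the elements up again. The sketch below assumes m is the number of rows and n the number of columns with 0-based indices, which is the convention under which both formulas hold; the 2 x 3 matrix is made up for the check.</p>
<pre><code class="language-python">def linearize_rowmajor(i, j, m, n):
    return i * n + j

def linearize_colmajor(i, j, m, n):
    return i + j * m

m, n = 2, 3                        # 2 rows, 3 columns
A = [["a11", "a12", "a13"],
     ["a21", "a22", "a23"]]

# Flatten by hand: row by row, then column by column
row_flat = [A[i][j] for i in range(m) for j in range(n)]
col_flat = [A[i][j] for j in range(n) for i in range(m)]

# Every 2-D index must land on the same element in the flat lists
for i in range(m):
    for j in range(n):
        assert row_flat[linearize_rowmajor(i, j, m, n)] == A[i][j]
        assert col_flat[linearize_colmajor(i, j, m, n)] == A[i][j]

print(row_flat)
print(col_flat)</code></pre>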
]]></description>
        </item>
        <item>
            <title><![CDATA[[EDA/Python] Numpy! Numpy What and Why? 📊]]></title>
            <link>https://velog.io/@bits_by_seng/EDAPython-Numpy-Numpy-What-and-Why</link>
            <guid>https://velog.io/@bits_by_seng/EDAPython-Numpy-Numpy-What-and-Why</guid>
            <pubDate>Sat, 21 Oct 2023 02:25:09 GMT</pubDate>
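            <description><![CDATA[<h3 id="numpy란">What Is NumPy?</h3>
<p>It feels good that the math I have studied almost every day for the past year is finally paying off. Since I recently started studying machine learning in earnest, I have rarely been stuck because of the math (being stuck because I am a few IQ points short happens frequently).</p>
<p>Anyway, NumPy is a library that makes operations on &#39;multidimensional arrays&#39; easy. It is much faster than using plain lists or dictionaries. Thinking of &#39;gradient descent&#39; in particular, using np.dot or np.matmul performs the matrix operations far faster than updating the parameters in a for loop. I will go into this in more detail in later machine learning posts. </p>
<h3 id="why-numpy">Why Numpy?</h3>
<p>&#39;Vectorization&#39; is very important for machine learning algorithms. Take a look at the code below. </p>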
<pre><code class="language-python">import numpy
import time
size = 1000000  

list1 = range(size)
list2 = range(size)

array1 = numpy.arange(size)  
array2 = numpy.arange(size)

initialTime = time.time()
resultantList = [(a * b) for a, b in zip(list1, list2)]

print(&quot;Time taken by Lists :&quot;, 
      (time.time() - initialTime),
      &quot;seconds&quot;)

initialTime = time.time()
resultantArray = array1 * array2

print(&quot;Time taken by NumPy Arrays :&quot;,
      (time.time() - initialTime),
      &quot;seconds&quot;)</code></pre>
<pre><code class="language-python">&gt; Time taken by Lists : 1.1984527111053467 seconds
  Time taken by NumPy Arrays : 0.13434123992919922 seconds
</code></pre>
<p>By &#39;vectorizing&#39; the lists, that is, treating them as arrays, we get the result much faster. </p>
<h3 id="기본문법">Basic Syntax</h3>
<ul>
<li><p>Creating a matrix </p>
<pre><code class="language-python">import numpy as np

B = np.array([[0, 1, 2, 3], 
            [4, 5, 6, 7], 
            [8, 9, 10, 11]])</code></pre>
</li>
<li><p>Checking the shape of B </p>
<pre><code class="language-python">print(B.shape)</code></pre>
<pre><code class="language-python">&gt; (3, 4)</code></pre>
</li>
<li><p>Creating a 3 x 4 zero matrix and a 3 x 3 identity matrix </p>
<pre><code class="language-python">print(np.zeros((3,4)))
print(np.eye(3))</code></pre>
<pre><code class="language-python">&gt; [[0. 0. 0. 0.]
   [0. 0. 0. 0.]
   [0. 0. 0. 0.]]
  [[1. 0. 0.]
   [0. 1. 0.]
   [0. 0. 1.]]</code></pre>
</li>
</ul>
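<h3 id="indexing-and-slicing">Indexing and Slicing</h3>
<p><img src="https://velog.velcdn.com/images/bits_by_seng/post/93717366-0f2c-4ba1-bbdb-0cd2405651fd/image.png" alt=""></p>
<pre><code class="language-python">Z= np.array([[0,1,2,3,4,5],
             [10,11,12,13,14,15],
             [20,21,22,23,24,25],
             [30,31,32,33,34,35],
             [40,41,42,43,44,45],
             [50,51,52,53,54,55]])

# Construct `Z_green`, `Z_red`, `Z_orange`, and `Z_cyan`:
Z_green = Z[(2,4), ::2]
Z_red = Z[:, 2]
Z_orange = Z[0, 3:5]
Z_cyan = Z[(4,5), 4:6]</code></pre><p>Nothing too difficult here. Think of it as similar to list indexing. 
In terms of memory, the results of plain slicing such as Z_red and Z_orange are just &#39;views&#39;: assigning a slice to a variable does not allocate new memory. Z_green and Z_cyan are the exception, because indexing with a tuple of row numbers like (2,4) is integer-array indexing and returns a copy. To get an independent object from a slice, call Z[:, 2].copy(). </p>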
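<p>Whether an indexing result shares memory with the original array can be checked directly. The following is a small sketch using <code>np.shares_memory</code>; the 6 x 6 array built with <code>np.arange</code> is only a stand-in for Z, and <code>Z_red_copy</code> is a name introduced for the example.</p>
<pre><code class="language-python">import numpy as np

Z = np.arange(36).reshape(6, 6)

Z_red = Z[:, 2]              # basic slicing
Z_green = Z[(2, 4), ::2]     # integer-array (fancy) indexing on the rows
Z_red_copy = Z[:, 2].copy()  # explicit copy

print(np.shares_memory(Z, Z_red))       # True: a view on the buffer of Z
print(np.shares_memory(Z, Z_green))     # False: a separate copy
print(np.shares_memory(Z, Z_red_copy))  # False

Z_red[0] = -1                # writing through the view changes Z itself
print(Z[0, 2])               # -1</code></pre>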
<h3 id="indirect-addressing">Indirect Addressing</h3>
<p>Indexing is also possible through an array made of a &#39;Boolean mask&#39; or of &#39;indices&#39;. </p>
<pre><code class="language-python">import numpy as np
from numpy.random import default_rng 
rng = default_rng(12345) 

x = rng.integers(0, 20, 15) 
print(x)
# &gt; [13 4 15 6 4 15 12 13 19 7 16 6 11 11 4]

inds = np.array([3, 7, 7, 12])
print(x[inds])
# &gt; [6 13 13 11]   (index 7 is picked twice)

mask_mult_3 = (x &gt; 0) &amp; (x % 3 ==0) 
print(&quot;x:&quot;, x)
print(&quot;mask_mult_3:&quot;, mask_mult_3)
print(&quot;==&gt; x[mask_mult_3]:&quot;, x[mask_mult_3]) 
# &gt; x: [13 4 15 6 4 15 12 13 19 7 16 6 11 11 4]
# &gt; mask_mult_3: [False False  True  True False  True  True False False False False  True
#    False False False]
# &gt; ==&gt; x[mask_mult_3]: [15 6 15 12 6]</code></pre>
<h3 id="응용">Application</h3>
<p>Let&#39;s write an algorithm that finds every prime up to 20. The sieve of Eratosthenes can be written with numpy. Honestly it is unnecessary, and in a coding test I would probably just use a list. </p>
<pre><code class="language-python">import numpy as np
from math import sqrt

def sieve(n):

    is_prime = np.empty(n+1, dtype=bool) # the &quot;sieve&quot;

    # Initial values
    is_prime[0:2] = False # {0, 1} are _not_ considered prime
    is_prime[2:] = True # All other values might be prime

    m = int(sqrt(n)) + 1

    for i in range(2, m):
        if is_prime[i]:
            # Cross out every multiple of i above i itself
            for j in range(i+i, n+1, i):
                is_prime[j] = False 

    return is_prime

# Prints your primes
print(&quot;==&gt; Primes through 20:\n&quot;, np.nonzero(sieve(20))[0])
# ==&gt; Primes through 20:
#  [ 2  3  5  7 11 13 17 19]</code></pre>
]]></description>
        </item>
        <item>
            <title><![CDATA[[Algorithms/Python] Useful Math Algorithms, Part 1 📒]]></title>
            <link>https://velog.io/@bits_by_seng/AlgorithmsPython-%EC%9C%A0%EC%9A%A9%ED%95%9C-%EC%88%98%ED%95%99-%EC%95%8C%EA%B3%A0%EB%A6%AC%EC%A6%98-%EC%A0%95%EB%A6%AC-1%ED%8E%B8</link>
            <guid>https://velog.io/@bits_by_seng/AlgorithmsPython-%EC%9C%A0%EC%9A%A9%ED%95%9C-%EC%88%98%ED%95%99-%EC%95%8C%EA%B3%A0%EB%A6%AC%EC%A6%98-%EC%A0%95%EB%A6%AC-1%ED%8E%B8</guid>
            <pubDate>Mon, 16 Oct 2023 12:26:52 GMT</pubDate>
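            <description><![CDATA[<h1 id="에라토스테네스의-체eratosthenes-sieve로-소수-구하기">Finding Primes with the Sieve of Eratosthenes</h1>
<p>To decide whether a natural number n is prime, we can loop from 2 to n-1 and check whether any number divides it evenly. This naive approach, however, has <strong><em>O(n)</em></strong> complexity per number, so it is likely to exceed the time limit.</p>
<p>So we will use an algorithm called the <strong>sieve of Eratosthenes</strong>. </p>
<h3 id="논리">Logic</h3>
<p>The divisors of a number come in pairs that mirror each other around its square root, so it is enough to check for divisors up to the square root. </p>
<p><strong>This means we only need an algorithm that eliminates all multiples of the numbers up to the square root!</strong></p>
<blockquote>
<p>Let&#39;s implement it in Python</p>
</blockquote>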
<pre><code class="language-python">def prime_list(n):
    # sieve[k] says whether k is prime, for k in 0..n-1 (assume all are prime at first)
    sieve = [True] * n

    m = int(n ** 0.5)
    for i in range(2, m + 1):
        if sieve[i]:  # i is prime
            for j in range(i + i, n, i):  # mark the later multiples of i as not prime
                sieve[j] = False

    # Collect the primes below n
    return [i for i in range(2, n) if sieve[i]]</code></pre>
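<p><strong><em>A single primality check drops to roughly O(√n)</em></strong>, and the sieve itself finds every prime below n in about O(n log log n)!
<strong>The efficiency gain is huge for algorithms that need all the primes up to some number.</strong>
Isn&#39;t that delightful?!</p>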
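<p>The square-root argument also gives a direct check for a single number. This is a minimal sketch for non-negative integers; the function name <code>is_prime</code> is chosen for the example, and it uses trial division rather than the sieve.</p>
<pre><code class="language-python">from math import isqrt

def is_prime(n):
    """Trial division by every d from 2 up to the integer square root of n."""
    if n in (0, 1):
        return False
    # all() stops at the first divisor found
    return all(n % d != 0 for d in range(2, isqrt(n) + 1))

print([k for k in range(20) if is_prime(k)])</code></pre>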
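<h3 id="에라이토레타의-체를-이용한-소인수분해">Prime Factorization with the Same Square-Root Idea</h3>
<p>The same logic gives us prime factorization. A composite number n always has a prime factor no larger than √n, so we do not need to keep trying divisors all the way up to n: once every factor up to the square root has been divided out, whatever remains above 1 is itself prime.</p>
<p>The code makes it clear </p>
<blockquote>
<p>Implementing it in Python~</p>
</blockquote>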
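<pre><code class="language-python">import sys

N = int(sys.stdin.readline().strip())
d = 2

# Only divisors up to the square root of what is left need to be tried
while d * d &lt;= N:
    if N % d != 0:
        d +=1
    else:
        print(d)
        N //= d
# Whatever remains above 1 is a prime factor larger than the square root
if N &gt; 1:
    print(N)</code></pre>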
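<h1 id="유클리드-호제로-gcd--lcm-구하기">Finding the GCD &amp; LCM with the Euclidean Algorithm</h1>
<p>Given a and b, we could set i to min(a,b) and run a while loop doing i -= 1 until a % i == 0 and b % i == 0. In the worst case, however, the loop runs i times, so it is likely to exceed the time limit. </p>
<p><strong>That is why we should use the Euclidean algorithm!</strong></p>
<h3 id="논리-1">Logic</h3>
<p>For two natural numbers a and b (a &gt; b), if r is the remainder of a divided by b, then the greatest common divisor of a and b equals the greatest common divisor of b and r. Doesn&#39;t that smell strongly of recursion?!</p>
<blockquote>
<p>Let&#39;s implement it in Python</p>
</blockquote>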
<pre><code class="language-python">def gcd_u(a,b):
    bigger = max(a,b)
    smaller = min(a,b)
    if smaller == 0:
        return bigger
    return gcd_u(smaller, bigger % smaller)</code></pre>
<p><strong>Again, this saves a lot of time!</strong></p>
<p>So how do we get the LCM? </p>
<p><strong>The least common multiple of a and b equals the product of a and b divided by their greatest common divisor!</strong></p>
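<p>That relationship turns into a two-line function. The sketch below is for positive integers; <code>lcm_u</code> is a name made up to match <code>gcd_u</code>, and the GCD is written here as a loop instead of the recursive version so that the block runs on its own.</p>
<pre><code class="language-python">def gcd_u(a, b):
    # Euclid: gcd(a, b) equals gcd(b, a mod b); stop when the remainder is 0
    while b:
        a, b = b, a % b
    return a

def lcm_u(a, b):
    # Divide before multiplying to keep the intermediate value small
    return a // gcd_u(a, b) * b

print(gcd_u(48, 18), lcm_u(48, 18))</code></pre>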
]]></description>
        </item>
    </channel>
</rss>