The following is a journal article I wrote for a science writing course. It was originally written as an hwp file and converted to HTML, so the figures and tables were removed. Though incomplete, the general content should still be clear. I should note that I posted this piece more for the sake of English study than for computer science. The original Korean (hwp) file is here.
Title: The Optimal Sorting Method According to the Characteristics of the Data
Abstract
Sorting is the process of rearranging the elements of a list into some linear
order. Sorting is a very important subject in computer science because of its
practical nature and its role as an example of many different approaches to problem solving.
In this article, five internal sorting algorithms - selection sort, insertion
sort, bubble sort, quicksort, and merge sort - are presented and compared with
each other. Although quicksort is often the best sorting method, different
sorting methods can be optimal depending on the characteristics of the data.
Introduction
Sorting is the process of rearranging the elements of a list into
some linear order[1]. Sorting is one of the most important problems in computer science,
especially in the field of algorithms. First, sorting is a very practical problem.
By experimental estimation, almost 70% of total computer time - across every computer
program on every computer on earth - is spent sorting and searching. In
the case of searching, before an element can be found in a list efficiently, the
list should first be sorted[2]. Second, the study of sorting illustrates
many different approaches to the same problem. This variety of ideas about the
same topic is an essential point in studying algorithms, which is one of the
most fundamental aspects of computer programming.
In this review, five internal sorting algorithms - selection sort, insertion
sort, bubble sort, quicksort, and merge sort - will be introduced. Internal sorting
takes place in the main memory of a computer, where its random-access capability
can be exploited. The opposite process, external sorting, is necessary when the number of
elements to be sorted is too large to fit in main memory. In such cases, the
bottleneck between main memory and the external storage device becomes
critical, and a completely different approach is needed[3].
The selection sort is a very elementary algorithm. In this sorting process,
first the smallest element among the n elements (where n is the list size)
is selected and swapped with the first element of the list. Next, the smallest
element among the remaining n-1 elements, from the second to the last, is selected
and swapped with the second element of the list. As a result, after the mth
pass, the m smallest elements occupy the first m positions of the list in
sorted order.
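The selection sort described above can be sketched in a few lines of Python (an illustrative implementation, not taken from the cited texts):

```python
def selection_sort(a):
    """Sort list a in place by repeatedly selecting the minimum."""
    n = len(a)
    for m in range(n - 1):
        # find the index of the smallest element among a[m:]
        smallest = m
        for i in range(m + 1, n):
            if a[i] < a[smallest]:
                smallest = i
        # swap it into position m; after this pass the first m+1
        # positions hold the m+1 smallest values in sorted order
        a[m], a[smallest] = a[smallest], a[m]
    return a
```

Note that the number of comparisons, (n-1) + (n-2) + ... + 1, is the same regardless of the input order, which is why selection sort is O(n^2) even on already-sorted data.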
The insertion sort is another elementary algorithm. Its main idea is
inserting an element into its proper place in an already sorted list. In this sorting
process, a single element can be considered sorted (though it is only one element),
and the size of the sorted list grows by insertion while the list's order
is maintained. As a result, after the mth pass, the first m elements are
sorted and the remaining n-m elements, from the (m+1)th to the nth (last)
element, are not yet sorted (m<n).
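A minimal Python sketch of this insertion process (again illustrative, not from the cited texts) follows; after each pass of the outer loop, the prefix a[:m+1] is sorted:

```python
def insertion_sort(a):
    """Sort list a in place by growing a sorted prefix."""
    for m in range(1, len(a)):
        key = a[m]
        i = m - 1
        # shift sorted elements to the right until key's place is found
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a
```

On data that are already nearly sorted the inner while loop exits almost immediately, which is why insertion sort performs so well on such inputs.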
The basic idea behind the bubble sort is that the elements to be sorted
are kept in a linear list. Starting with the first two elements, the larger
of two adjacent elements moves to the right, and the smaller moves to the left.
Next, the same operation is applied to the second and third elements of the list.
After repeating this swapping operation up to the last two elements of
the list, the largest element ends up at the right end. The same procedure is then
repeated from the first element to the (n-1)th element; the last element
is the largest, in other words, it is already in its correct position.
After the mth repetition, the m largest elements are in their correct
positions.
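The passes of adjacent swaps described above can be sketched as follows (an illustrative Python implementation, not from the cited texts):

```python
def bubble_sort(a):
    """Sort list a in place by repeated passes of adjacent swaps."""
    n = len(a)
    for m in range(n - 1):
        # one pass: the largest remaining element "bubbles" to
        # position n-1-m; the last m positions are already final
        for i in range(n - 1 - m):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```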
Quicksort is more complicated. Its main idea originates
from the "divide and conquer" methodology[4]. First, an element is
selected, and the list is rearranged around the selected element as shown in Fig. 1.
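One simple way to realize this idea in Python is shown below. This sketch takes the first element as the selected (pivot) element and builds new sublists rather than partitioning in place, so it is illustrative rather than the exact procedure of Fig. 1:

```python
def quick_sort(a):
    """Return a sorted copy of a, partitioning around the first element."""
    if len(a) <= 1:
        return a
    pivot = a[0]
    # divide: split the rest of the list around the pivot
    smaller = [x for x in a[1:] if x < pivot]
    larger = [x for x in a[1:] if x >= pivot]
    # conquer: sort each part recursively, then combine
    return quick_sort(smaller) + [pivot] + quick_sort(larger)
```

When the pivot splits the list evenly the recursion depth is about lg n, giving the O(n lg n) average; a consistently bad pivot (e.g. on already-sorted input with this pivot choice) degenerates to the O(n^2) worst case discussed later.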
The merge sort also uses the "divide and conquer" methodology[6]. Whereas the quicksort algorithm decomposes the list into two sublists of generally different sizes, merge sort decomposes it into two sublists of equal size. After decomposing the list, the two sublists are sorted separately by recursively applying the merge sort algorithm, and then merged. The strategy of the merge sort is shown in Fig. 2.
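The halving-and-merging strategy can be sketched in Python as follows (an illustrative implementation, not the original Fig. 2):

```python
def merge_sort(a):
    """Return a sorted copy of a by splitting into halves and merging."""
    if len(a) <= 1:
        return a
    # divide: split the list into two halves of (nearly) equal size
    mid = len(a) // 2
    left = merge_sort(a[:mid])
    right = merge_sort(a[mid:])
    # combine: merge the two sorted halves into one sorted list
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

Because the split is always into equal halves, the recursion depth is always about lg n, which is why merge sort keeps its O(n lg n) behavior even in the worst case.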
Discussion
The O notation is a good measure for comparing the efficiency of the sorting methods[7]. In other words, the O notation is a useful measure for estimating the running time of a program or subroutine in computer science (in such cases the O notation describes the time complexity of the program). Mathematically, the O notation is defined as follows:
The running time T(n) of some program is O(f(n)) if there are positive constants c and n0 such that for all n equal to or greater than n0, we have T(n) <= c*f(n).
For the sorting methods mentioned above, the time complexity is O(n^2) or O(nlgn). On average, the selection, insertion, and bubble sorts are O(n^2) algorithms, while quicksort and merge sort are O(nlgn) algorithms. By mathematical intuition, the latter are more efficient than the former, and when sufficiently large amounts of data are sorted, the former are almost useless. The relation between the data size and the sorting time of O(nlgn) and O(n^2) algorithms is shown in Table 1[8].
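The gap between the two growth rates can be checked numerically with a short sketch (illustrative values only, not a reproduction of the original Table 1):

```python
import math

# compare the growth of n^2 against n*lg(n) for increasing n
for n in [10, 100, 1000, 10000]:
    n_squared = n * n
    n_lg_n = round(n * math.log2(n))
    print(f"n={n:>6}  n^2={n_squared:>10}  n*lg(n)={n_lg_n:>8}")
```

At n = 10 the two are comparable, but by n = 10000 the quadratic cost is already more than seven hundred times larger, which is the sense in which O(n^2) methods become "almost useless" for large data.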
Beyond the average case, the O notation can differ in the worst case. For example, the worst-case time complexity of quicksort is O(n^2). The average-case and worst-case time complexities of each sorting method are shown below (Table 2).
Despite its inefficiency in the worst case, quicksort is the most
efficient method for average data. For this reason, quicksort is widely used
in spite of its worst-case inefficiency[9].
The speed of each sorting method can vary with data size, and
the size of the data to be sorted is sometimes the essential factor in choosing
the optimal sorting method. For this reason, it is necessary to measure the time
used for sorting at various data sizes. Fig. 3 below shows the results[10].
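Measurements of this kind can be reproduced with a rough sketch like the following, which times an O(n^2) insertion sort against Python's built-in O(nlgn) sort on random data (the data sizes and the use of the built-in sort are assumptions for illustration, not the setup behind Fig. 3):

```python
import random
import timeit

def insertion_sort(a):
    """A simple O(n^2) sort used as the quadratic baseline."""
    for m in range(1, len(a)):
        key, i = a[m], m - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a

# time each method once per data size on the same random input
for n in [100, 1000, 5000]:
    data = [random.random() for _ in range(n)]
    t_quad = timeit.timeit(lambda: insertion_sort(data.copy()), number=1)
    t_nlgn = timeit.timeit(lambda: sorted(data), number=1)
    print(f"n={n:>5}  insertion={t_quad:.4f}s  built-in={t_nlgn:.4f}s")
```

As n grows, the quadratic method's time grows roughly a hundredfold for each tenfold increase in n, while the O(nlgn) method grows only slightly faster than linearly.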
Conclusion
When the data size is sufficiently large, quicksort is the best sorting
method, in spite of its inefficiency on worst-case data; but if stable
efficiency is required, merge sort is the best method. When the data size is
small (less than 20), insertion or selection sort is the best method[11].
This is because the overhead required by the O(nlgn) algorithms is quite large.
In other words, for such small amounts of data, the asymptotic efficiency of
O(nlgn) algorithms does not outweigh their overhead. In particular, when the
data are almost sorted before sorting begins, insertion sort is the most effective[12].
References
[1] Aho, Hopcroft, Ullman, Data Structures and Algorithms, Addison-Wesley, p. 253
[2] Baase, Computer Algorithms: Introduction to Design and Analysis, Addison-Wesley, p. 48
[3] Baase, Computer Algorithms: Introduction to Design and Analysis, Addison-Wesley, p. 48
[4] Baase, Computer Algorithms: Introduction to Design and Analysis, Addison-Wesley, p. 53; Aho, Hopcroft, Ullman, Data Structures and Algorithms, Addison-Wesley, p. 306
[5] Horowitz, Sahni, Anderson-Freed, Fundamentals of Data Structures in C, Computer Science Press, p. 332
[6] Baase, Computer Algorithms: Introduction to Design and Analysis, Addison-Wesley, p. 53
[7] Aho, Hopcroft, Ullman, Data Structures and Algorithms, Addison-Wesley, p. 17
[8] Horowitz, Sahni, Anderson-Freed, Fundamentals of Data Structures in C, Computer Science Press, p. 40
[9] Horowitz, Sahni, Anderson-Freed, Fundamentals of Data Structures in C, Computer Science Press, p. 329
[10] Horowitz, Sahni, Anderson-Freed, Fundamentals of Data Structures in C, Computer Science Press, p. 370
[11] Aho, Hopcroft, Ullman, Data Structures and Algorithms, Addison-Wesley, p. 260
[12] Baase, Computer Algorithms: Introduction to Design and Analysis, Addison-Wesley, pp. 49-50