LIBERAL STUDIES CHANNEL

Liberal Studies Channel selects and recombines fragments of knowledge in the humanities, science and technology from all over the world. It offers interviews with professors that link various disciplines, recommends courses, and invites readers to enjoy knowledge.
Created by Xiamen University Liberal Studies Center, updated every Tuesday / Wednesday

“All data is put into codes.” – Me

FOUR STEPS OF OBTAINING ONLINE BIG DATA:
1.Analyzing the source code of the web page and locating the target code in it.
2.Obtaining the source code with Python (or other software).
3.Extracting the target code with “Tool”.
4.Outputting the results.
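
The four steps above can be sketched in Python roughly as follows. This is only a minimal illustration, not the method used in the video: the “Tool” of step 3 is not named in the text, so Python’s built-in html.parser module stands in for it here, and the names fetch_source, extract_target and send_out are made up for this example. Step 1 is done by hand in the browser and appears only as the tag name passed to extract_target.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_source(url):
    """Step 2: download the source code of a page as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


class TargetExtractor(HTMLParser):
    """Step 3: collect the text inside every tag named `target_tag`."""

    def __init__(self, target_tag):
        super().__init__()
        self.target_tag = target_tag
        self.depth = 0      # greater than 0 while inside a target tag
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag == self.target_tag:
            self.depth += 1
            if self.depth == 1:
                self.results.append("")

    def handle_endtag(self, tag):
        if tag == self.target_tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.results[-1] += data


def extract_target(source, target_tag):
    """Return the stripped text of each `target_tag` element in `source`."""
    parser = TargetExtractor(target_tag)
    parser.feed(source)
    parser.close()
    return [text.strip() for text in parser.results]


def send_out(items, path):
    """Step 4: write the results to a text file, one item per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(items))
```

Under these assumptions, calling send_out(extract_target(fetch_source(url), "h1"), "titles.txt") with a page address of your own choice would save that page’s first-level headings to a file. The tag name "h1" and the file name are placeholders.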

INTRODUCING WEB CRAWLERS
A so-called “web crawler” is an automatic program that downloads the content of web pages and supplies it to a search engine, so that search results can be returned quickly. Starting from the link address of one page, it follows the links on that page to find further pages, and then follows the links provided on those pages in turn. It applies the method of recursion, which means that it keeps working until it has captured all available pages of a given website.
The best-known web crawler is Googlebot, which supports the Google search engine by visiting billions of web pages and downloading their content for indexing.
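
The recursive link-following described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Googlebot or any real search engine works: the names LinkCollector, fetch and crawl are invented for this example, the crawler stays on the host of the starting page, and a max_pages limit is added so that the recursion always stops.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its HTML as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(url, fetch_page=fetch, visited=None, max_pages=50):
    """Recursively visit pages on the same host as `url`.

    Returns the set of page addresses that were visited.
    """
    if visited is None:
        visited = set()
    if url in visited or len(visited) >= max_pages:
        return visited
    visited.add(url)
    try:
        html = fetch_page(url)
    except (OSError, ValueError):
        return visited  # unreachable or malformed address: skip it
    collector = LinkCollector()
    collector.feed(html)
    for href in collector.links:
        # Resolve relative links and drop the "#fragment" part.
        target = urljoin(url, href).split("#")[0]
        # Follow only links that stay on the starting site.
        if urlparse(target).netloc == urlparse(url).netloc:
            crawl(target, fetch_page, visited, max_pages)
    return visited
```

The visited set is what keeps the recursion from looping forever when two pages link to each other. Passing the download function in as fetch_page is a design choice made here so that the crawler can be tried on a small dictionary of sample pages before it is pointed at a real website.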
THROUGH WEB CRAWLERS, MALICIOUS USERS MIGHT OBTAIN SENSITIVE DATA FOR IMPROPER USE

SEARCHING THROUGH WEB PAGES, FILES AND PROGRAMS, AND THE POSSIBLE DRAWBACKS
Most web servers come with test pages, support files, sample programs and supplementary debugging programs attached. These files can leak large amounts of system information, and may provide a means to bypass authentication and directly access data stored on the web server. They have therefore become effective sources of information for malicious users attempting to analyze and attack web servers.
SEARCHING FOR ADMINISTRATOR LOGIN PAGES
Many network products provide web-based management interfaces, allowing administrators to manage them remotely online. If the administrator is not vigilant and does not change the default admin name and password supplied with the product, a malicious user might one day find the login page, and the security of the network will be in great danger.

SEARCHING THROUGH THE PERSONAL DATA OF INTERNET USERS
The personal data of internet users include name, ID number, phone number, e-mail address, QQ number, postal address and other personal information. After acquiring these, malicious users can easily carry out attacks and fraud through the use of social engineering.

HOW TO STRENGTHEN PERSONAL PRIVACY
1.Reduce the amount of personal information you provide during online registration, and do not provide personal information on unverified websites.
2.Avoid interactive activities on social platforms (e.g. tests, sharing and lotteries) if their origin is unclear.
3.Use safe computers for going online, avoid entering your passwords in net cafes and other public places, and install and regularly update anti-virus and firewall programs.
4.Do not connect to unfamiliar Wi-Fi networks in public places, and be wary of phishing traps.
5.Be wary of e-mail fraud, and do not fall easily for advertisements about prize winnings and offers of credit.
6.Pay attention to the proper disposal of documents containing personal data, such as bank receipts, express delivery receipts and bills.

RECOMMENDED COURSE
Course name: Information security in the age of Big Data
Lecturer: Zeng Jiwen
Time: 1st to 8th week, Tuesday 10:10-11:50 (Section 3-4), Thursday 14:30-16:10 (Section 5-6)
Place: Student Dormitories (学生公寓) 104

Course content:
David Beckham’s mailbox was hacked, leading to a scandal; my neighbor Mr. Wang’s Alipay account was hacked and charged, and all his savings disappeared… Whether you are a celebrity or an ordinary person, keeping your information secure in the age of Big Data is a hard task. Hush, let me tell you a secret: don’t forget to take the key called “mathematics” with you. It will help you unlock quite a few secrets of information security in the age of Big Data!

Lecturer:
Prof. Zeng Jiwen, Department of Mathematics and Applied Mathematics at the School of Mathematical Sciences of Xiamen University, Master’s supervisor. Main research fields include group representation theory and the application of group theory in cryptography.

How to download the tools for building web crawlers:
1.“Python” can be downloaded from its official website.
2.“PyCharm” can be downloaded from its official website.
3.The simple installation package program “setuptools” can be downloaded from http://pan.baidu.com/s/1pKGgQGN, or you can search for it on Baidu using the keyword “setuptools”.
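
Once everything is installed, a short script like the one below can confirm that Python runs and that “setuptools” can be found. It is only a quick check written for this article, and the function name check_environment is made up for the example.

```python
import importlib.util
import sys


def check_environment():
    """Report the running Python version and whether setuptools is importable."""
    return {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "setuptools": importlib.util.find_spec("setuptools") is not None,
    }


print(check_environment())
```

If the value shown for "setuptools" is False, the package has not been installed for the Python interpreter that ran the script.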

Tencent address of the original video: https://v.qq.com/x/page/b0385jdszeq.html
Bilibili account: http://space.bilibili.com/94252767/#!/

Text and video by: Chen Dongyi
Edited by: Chen Xi
Translation by: Sebestyen Hompot

Click on “Read the original text” to watch our previous video “Aerial documentaries: Looking through all corners of the university”

Teachers | Obtaining Big Data is super easy: Introducing web crawlers