News Similarity: Biases and Similarities of News Sources


News Sources Feb 12 2020

“http://www.rt.com”, “http://www.cnn.com”, “https://www.rte.ie/”,
“https://www.breitbart.com/”, “https://www.bbc.com/”,
“https://usatoday.com/”, “http://www.xinhuanet.com/english/”,
“https://mexiconewsdaily.com/”, “https://www.cbc.ca/”,
“https://www.huffpost.com/”, “https://www.nytimes.com/”,
“https://www.foxnews.com/”,
“https://www.theguardian.com/international”, “https://www.wsj.com/”,
“https://www.latimes.com/”, “https://www.aljazeera.com/”

Sample Word Vector

Text is extracted from the news websites and turned into a variety of word lists.

 

 

Words turned into numerical vectors called Features

TF-IDF or another algorithm converts the word lists into numerical vectors with thousands of columns.

{0., -0.00336273, -0.0213353, -0.00224182, -0.0100882, 0.,
-0.000560454, 0., 0., -0.00574844, 0., 0., 0., 0., -0.00171913,
-0.071092, 0., -0.00952772, 0., 0., 0., -0.000859564, -0.117617,
-0.0322764, 0., -0.0443117, -0.000938294, 0., 0., 0., 0.,
-0.00171913, 0., 0., -0.00392318, -0.00765881, 0., 0., -0.0156927,
0., -0.00172453, -0.00107288, -0.00347522, -0.00711175, -0.00437646,
0., -0.00171913, 0., 0., 0., -0.00492352, 0., 0., 0., 0., 0.,
-0.0257117, 0., 0., -0.00177116, 0., -0.00107288, 0., -0.0235235, 0.,
0., 0., -0.000760377, 0., 0., 0., -0.000859564, 0., 0., 0.,
-0.000859564, 0., -0.0151323, 0., 0., 0., -0.0100882, 0., 0.,
-0.006165, -0.000938294, 0., 0., 0., -0.00129249, -0.000938294, 0.,
0., 0., -0.000938294, 0., 0., -0.0289941, -0.14646, -0.134029,
-0.211672, 0., 0., -0.00875293, 0., 0., -0.00114969, 0.,
-0.000646245, 0., 0., -0.00129249, -0.000803705, 0., -0.0158647, 0.,
0., -0.00168136, -0.0231541, -0.00448363, 0., 0., -0.00295193,
-0.141141, 0., 0., -0.00109412, -0.000938294, 0., -0.0251647,
-0.0443117, -0.000803705, 0., -0.0160588, 0., -0.000574844, 0.,
-0.151805, -0.00187659, -0.00448363, 0., -0.0114882, -0.0224294, 0.,
0., -0.000760377, 0., 0., -0.0530646, 0., -0.000938294, 0.,
-0.00114969, -0.056347, 0., -0.00118077, -0.0082654, -0.00107288,
-0.00214577, -0.0158647, 0., -0.0134509, -0.00347522, 0.,
-0.00187659, 0., 0., 0., -0.000938294, 0., 0., -0.00224182,
-0.626381, 0., 0., -0.165759, -0.000859564, 0., 0., 0., 0., 0., 0.,
0., -0.00765881, 0., -0.000574844, -0.00437646, -0.00171913, 0.,
-0.131294, 0., 0., -0.000859564, -0.0112091, 0., 0., 0., -0.00336273,
-0.00751018, 0., 0., 0., 0., 0., 0., 0., 0., -0.0128904, 0.,
-0.000938294, 0., 0., 0., 0., 0., -0.00448363, 0., 0., -0.00784636,
0., 0., -0.028447, 0., 0., 0., -0.00152075, -0.0164117, -0.00672545,
0., -0.0694884, 0., 0., -0.00214577, 0., 0., 0., 0., -0.00382941, 0.,
0., 0., 0., 0., 0., 0., -0.00107288, 0., -0.000938294, -0.00515738,
-0.0798704}
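
To make the word-list-to-vector step concrete, here is a minimal sketch of plain TF-IDF in Python. It is an illustration only, not the procedure that produced the vector above: the function name `tfidf_vectors` and the toy headlines are assumptions made for this example. Plain TF-IDF weights are never negative, so the sample vector above presumably went through further processing that this sketch does not attempt to reproduce.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) for a list of tokenized documents.

    Each weight is term frequency (count / document length) multiplied by
    inverse document frequency, log(N / number of documents with the word).
    """
    vocab = sorted({word for doc in docs for word in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append([
            (counts[w] / total) * math.log(n_docs / df[w]) if total else 0.0
            for w in vocab
        ])
    return vocab, vectors


# Hypothetical toy "headlines", not text taken from the sites listed above.
docs = [
    "election results announced today".split(),
    "election campaign continues today".split(),
    "storm hits coast".split(),
]
vocab, vectors = tfidf_vectors(docs)

for doc, vec in zip(docs, vectors):
    print(" ".join(doc), [round(x, 3) for x in vec])
```

Every document gets one column per vocabulary word, which is why real news text quickly leads to vectors with thousands of mostly zero columns.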

 

Dendrograms

https://en.wikipedia.org/wiki/Dendrogram

The similarity matrix below is annotated with dendrograms along its axes.

Each leaf of the dendrogram is a news URL. A black tile (pixel) in the similarity matrix indicates a very dissimilar pair of sources and a white tile a very similar pair, with shades of gray in between.

The visualization shows that Breitbart is quite dissimilar from all the other news sources.
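
The idea behind a similarity matrix with a dendrogram can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: cosine similarity and single-linkage clustering are choices made for this example (the measure and linkage used for the figure are not stated), and the four feature vectors and the names `similarity_matrix` and `single_linkage_merges` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise similarities; entry [i][j] compares source i with source j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


def single_linkage_merges(sim):
    """Agglomerative clustering: repeatedly merge the two most similar clusters.

    Returns the merge order as (cluster_a, cluster_b, similarity) triples,
    where clusters are tuples of leaf indices. This order is what a
    dendrogram draws, from the leaves upward.
    """
    clusters = [(i,) for i in range(len(sim))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: similarity of the closest pair of leaves.
                s = max(sim[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        s, i, j = best
        merges.append((clusters[i], clusters[j], s))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical feature vectors for four unnamed sources.
vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.2],
    [0.0, 0.0, 1.0],
]
sim = similarity_matrix(vectors)
merges = single_linkage_merges(sim)

for a, b, s in merges:
    print(f"merge {a} + {b} at similarity {s:.3f}")
```

A source that is dissimilar from everything else shows up as a leaf that joins the tree only at the last, lowest-similarity merge, and as a dark row and column in the matrix.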

 

Nearest Neighbor Graph (K-NN-G)

https://en.wikipedia.org/wiki/Nearest_neighbor_graph

An element elem_j is a k-nearest neighbor of an element elem_i whenever the distance from elem_i to elem_j is among the k smallest distances from elem_i to any other element.

The following K-NN-G graph uses k = 2.

Each vertex is a news source; an arrow from vertex A to vertex B indicates that B is one of the two sources most similar to A.
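
The k-nearest-neighbor definition above translates almost directly into code. The following Python sketch is illustrative only: the function name `knn_graph`, the source labels, and the distance matrix are hypothetical, and ties are broken by index order, which is an assumption of this example.

```python
def knn_graph(dist, k):
    """Return directed edges (i, j) where j is one of the k nearest neighbors of i.

    dist is a square matrix of pairwise distances; an element is never its
    own neighbor.
    """
    n = len(dist)
    edges = []
    for i in range(n):
        # Rank every other element by its distance from i (ties keep index order).
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        edges.extend((i, j) for j in others[:k])
    return edges


# Hypothetical symmetric distance matrix for four sources.
names = ["A", "B", "C", "D"]
dist = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.95],
    [0.8, 0.7, 0.0, 0.6],
    [0.9, 0.95, 0.6, 0.0],
]
edges = knn_graph(dist, k=2)

for i, j in edges:
    print(f"{names[i]} -> {names[j]}")
```

Every vertex has exactly k outgoing arrows, but the relation is not symmetric: here A points to C because C is A's second-closest source, while C does not point back to A.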