Text Network Analysis: Theory and Practice | by Petr Korab | Jun, 2022

Text network analysis is part of the broader skill set of most analysts who work with text data.

This article begins a journey through this area. From theory, data preparation, and network construction to visualization and forecasting, the series covers the most fundamental concepts of text networks in Python.

Figure 1. Text network plot via Pyvis. Image by author

Developments since the late 20th century, such as…

…and potentially many other factors have resulted in a vast amount of text data easily accessible to analysts, students, and researchers. Over time, scientists developed numerous complex methods to understand the relations in the text datasets, including text network analysis.

This first article on text network analysis in Python will briefly survey the underpinnings of text network analysis, real-world applications of text networks, and their implementation in major data science and business intelligence (BI) software.

In the academic literature, networks are more formally referred to as graphs. More rigorous theoretical propositions of graph theory can be traced back to the 1950s (Berge, 1958). Over time, text network literature has evolved into several streams:

This field has rapidly gained popularity among academic researchers, as reflected in the volume of papers with the “network” keyword in the JSTOR database and in the frequency of the terms “text networks” and “semantic networks” in Google Books.

More formally, networks are comprised of two sets of objects (Ma & Seth, 2022):

• A node set: the “entities” in a graph

• An edge set: the record of “relationships” between the entities in the graph.

For example, if a node set n is comprised of elements:

Then, the edge set e would be represented as pairs of elements:

If we draw out a network, nodes are commonly represented as shapes, such as circles, while edges are the lines between the shapes. In text mining, edges and nodes might be represented by:

  • Social and professional networks: nodes are individual users; edges record that “one user has decided to follow another”
  • Identification of fake news in newspaper articles: nodes are the most frequent and relevant words associated with fake news articles; edges are co-occurrences of words in articles (Segev, 2020)
  • Discovering public discourse of US senators about impeachment: nodes are senators; edges are similarities in the senators’ public statements
  • Understanding policy communication by analyzing written texts: nodes are concepts in written communication; edges are co-occurrences of concepts within a sentence or paragraph (Shim et al., 2015)
  • Role of advocacy organizations in shaping conversation on social media: nodes are actors engaged in public conversation about an advocacy issue; edges are similarities in the content of their messages (Bail, 2016).
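To make the node set and edge set concrete, here is a minimal sketch of a word co-occurrence network built with the Python standard library only. The toy sentences, the variable names, and the choice to count co-occurrence per sentence are all assumptions made for illustration; they are not taken from any of the studies cited above.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each string is treated as one sentence.
sentences = [
    "networks connect nodes with edges",
    "text networks connect words",
    "words become nodes",
]

# Node set: the unique words in the corpus.
nodes = set()
# Weighted edge set: number of sentences in which both words of a pair appear.
edges = Counter()

for sentence in sentences:
    # Deduplicate and sort so each pair is always stored in the same order.
    words = sorted(set(sentence.lower().split()))
    nodes.update(words)
    # Every unordered pair of distinct words in a sentence forms an edge.
    for pair in combinations(words, 2):
        edges[pair] += 1

print(sorted(nodes))
print(edges.most_common(3))
```

Real text would usually need tokenization, stop-word removal, and lemmatization before this step. The resulting pairs and weights can then be passed to a graph library for analysis or plotting.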

Generally, the workflow used in network approaches starts with the definition of research questions and leads to inference and decision-making. Less complex tasks can omit several steps, but a complete workflow involves the steps shown in Figure 2.

Figure 2. Schematic representation of the workflow used in network approaches. Source: Borsboom et al., 2021. Image by draw.io

One interesting stream in the literature uses text networks for forecasting. Here, graph structures serve as a model whose weights can be optimized by a neural network and used to predict specific variables of interest. Krenn & Zeilinger (2020), for example, use semantic networks and deep learning to predict research trends in quantum physics over the next five years.
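The idea of forecasting from a network can be illustrated with a much simpler stand-in than deep learning. The sketch below scores pairs of concepts that are not yet connected by the Jaccard similarity of their neighbor sets, a classic link-prediction heuristic. This is not the method of Krenn & Zeilinger (2020), who train a neural network on network features; the toy graph, the concept names, and the `jaccard` helper are hypothetical.

```python
from itertools import combinations

# Hypothetical semantic network: concept -> set of concepts it co-occurs with.
graph = {
    "qubit": {"entanglement", "decoherence"},
    "entanglement": {"qubit", "teleportation"},
    "decoherence": {"qubit", "error correction"},
    "teleportation": {"entanglement"},
    "error correction": {"decoherence"},
}


def jaccard(a, b):
    """Share of neighbors that two concepts have in common."""
    union = graph[a] | graph[b]
    return len(graph[a] & graph[b]) / len(union) if union else 0.0


# Score every pair of concepts that is not yet connected by an edge.
candidates = {
    (a, b): jaccard(a, b)
    for a, b in combinations(sorted(graph), 2)
    if b not in graph[a]
}

# The highest-scoring pairs are the heuristic's guesses for future links.
ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
for pair, score in ranked[:3]:
    print(pair, round(score, 2))
```

In a learned model, hand-crafted scores like this one would typically become input features rather than the final prediction.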

Network analysis methods are implemented in all major data science and BI tools. Here is a list of the most common libraries and commercial programs. Some of them are not primarily designed for text data analysis, but if we feed them correctly transformed data, they can display the network structure of the text:

Python:

Network construction:

Network visualization:

Both network construction and visualization:

Julia:

R:

Because companies need to understand complex network data structures, most BI programs include network methods and graphics. See the tutorial by The Data Surfers (2019) on network visualizations for Tableau and the list of network methods in Power BI. Among other commercial and open-source software, options include InfraNodus (text network analysis), Neo4j (graph data science), Gephi (network visualization and exploration), and SocNetV (social network analysis).

This article is the first part of an upcoming series on analyzing text networks in Python. Stay tuned for the following pieces:

Text Network Analysis From Scratch #1 — Data Prep and Network Construction

Text Network Analysis from Scratch #2 — Make Beautiful Network Visualisations

Text Network Analysis from scratch #3 — Using the Net as Model for Predictions

[1] Bail, C. A. 2016. Combining natural language processing and network analysis to examine how advocacy organizations stimulate conversation on social media. Proceedings of the National Academy of Sciences, vol. 113, no. 42.

[2] Berge, C. 1958. Théorie des graphes et ses applications. Paris: Dunod Editeur.

[3] Borsboom, D., et al. 2021. Network analysis of multivariate data in psychological science. Nature Reviews Methods Primers, vol. 1, no. 58.

[4] Celardo, L., Everett, M. G. 2020. Network text analysis: A two-way classification approach. International Journal of Information Management, vol. 51, April.

[5] Griffiths, T. L., Steyvers, M., Firl, A. 2007. Google and the Mind: Predicting Fluency With PageRank. Psychological Science, vol. 18, no. 12.

[6] Krenn, M., Zeilinger, A. 2020. Predicting research trends with semantic and neural networks with an application in quantum physics. Proceedings of the National Academy of Sciences, vol. 117, no. 4.

[7] Liao, W., Zeng, B., Liu, J., Wei, P., Cheng, X., Zhang, W. 2021. Multi-level graph neural network for text sentiment analysis. Computers & Electrical Engineering, vol. 92, June.

[8] Ma, E., Seth, M. 2022. Network Analysis Made Simple: An Introduction to Network Analysis and Applied Graph Theory using Python and NetworkX. Lean Publishing. 2022–05–16 ed.

[9] Netzer, O., Feldman, R., Goldenberg, J., Fresko, M. 2012. Mine Your Own Business: Market-Structure Surveillance Through Text Mining. Marketing Science, vol. 31, no. 3.

[10] Paranyushkin, D. 2019. InfraNodus: Generating Insight Using Text Network Analysis. In Proceedings of WWW ’19: The Web Conference, May 13, 2019, San Francisco, USA.

[11] Segev, E. 2020. Textual network analysis: Detecting prevailing themes and biases in international news and social media. Sociology Compass, vol. 14, no. 4.

[12] Shim, J., Park, C., Wilding, M. 2015. Identifying policy frames through semantic network analysis: an examination of nuclear energy policy across six countries. Policy Sciences, vol. 48.

[13] The Data Surfers. 2019. How to use Gephi to create Network Visualizations for Tableau. Retrieved 2022–5–31 from https://thedatasurfers.com/2019/08/27/how-to-use-gephi-to-create-network-visualizations-for-tableau/.

[14] Yao, L., Mao, C., Luo, Y. 2019. Graph Convolutional Networks for Text Classification. In Proceedings from The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), Hawaii, USA.

