Document Type: Original Article


Department of Information Technology Engineering, Industrial Engineering Faculty, K.N. Toosi University of Technology, Tehran, Iran.


Social networks give marketing managers and businesses an opportunity to target their customers. By understanding the demographics of users, marketing managers can offer suitable products and services. Although users can be asked directly for demographics such as age, some are reluctant to reveal personal information because of privacy concerns, and direct questioning is of little use for identifying potential customers. The large volume of data that social networks generate can help address this problem. Previous studies on predicting demographic characteristics were mainly text-based and hence language-bound, which limits their applicability. This study investigates how some interactive data can predict users’ age. Further, it examines whether classification methods can be used for age prediction. The results revealed that the number of friends, the number of opposite-sex friends, the number of comments received, and the number of photos that users share can predict users’ age. A linear relationship between the interactive data and users’ age was also found.

